Wiktionary:Beer parlour/2008/March

Definition from Wiktionary, the free dictionary
Jump to: navigation, search
This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives edit
2002
December
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016

Contents

March 2008

Translation bars

What do you think could "show" in the translation bars be changed to be right after the bar title? It's now a bit difficult to find for someone who browses here but doesn't edit, especially if one has never seen that kind of bar before. And It would be great if the bar floated better with pictures, now if there's a picture on the right, and the translation bar on the left, the bar starts after the picture and leaves blank above. Best regard Rhanyeia 15:54, 8 February 2008 (UTC)

I think maybe it should actually go right before the bar title (instead of or as well as at the right end), but right after the bar title would be a good second choice. —RuakhTALK 00:35, 9 February 2008 (UTC)
That sounds good. :) Best regards Rhanyeia 10:53, 9 February 2008 (UTC)
I don't like this idea, having the [show] before the text would (I think) make it messier as they are currently styled like the [edit] links which always float to the right. I can't see how there can be that much different in finding it (and it is the same as the 'pedia bars) Conrad.Irwin 21:10, 9 February 2008 (UTC)
I'm trying to imagine how the pages look to someone who doesn't use computers much, and who doesn't know how their layout works. That person doesn't know there's something hidden inside the bar and his eyes may just go over it fast. Translations are a very valuable part of Wiktionary pages and I think they should be found easily. I can't remember that Wikipedia would use those bars inside articles, does it? That "show" resembles the "edit" tags is not very good because they are for so distinct purposes, a casual reader might avoid clicking anywhere around the "edit" tags fearing it could change the page. I think the "show" tags could be far more distinct from the "edit" tags. If not before the bar titles, could they be after or under them? Best regards Rhanyeia 09:12, 10 February 2008 (UTC)
Assuming that we implemented a change in the "trans" and "rel" bars, how would we know that it was a good change after we had done it? I know that we should and do eat our own dog food, but we aren't representative of the larger population that we serve. DCDuring TALK 11:21, 10 February 2008 (UTC)
This problem bothers me too. What about a companion icon displayed along with the text, for example a downwards-pointing arrow alongside [Show] and an upwards-pointing arrow alongside [Hide] ? --EncycloPetey 22:11, 10 February 2008 (UTC)
I like the idea of the arrows - but does it explain the concept enough - perhaps a "+" sign would be better [show ▼] [show +] - what does Windows use for this kind of thing? Conrad.Irwin 22:21, 10 February 2008 (UTC)
But a plus (+) could still be interpreted to mean that it is for editing (add additional material). The reason for suggesting arrows is that people with poor English (or who aren't watching) might assume the bracketed text allows editing. --EncycloPetey 23:19, 10 February 2008 (UTC)
Indeed. When I created {trans-top} as a demonstration, I just borrowed the "nav" CSS. I pointed very loudly and repeatedly that it should be designed before being put into use, but was utterly ignored by those eager to plow ahead.
The show/hide link should immediately follow the gloss, and the table not display full width unless "opened". Someone ought to create separate CSS for these (and rel-top as well) and use it. Likewise, where a conjugation/inflection table is built to be collapsed, it should not be a full width bar. Robert Ullmann 14:06, 12 February 2008 (UTC)
EncycloPetey's idea about an arrow was excellent. Were you thinking about a graphical arrow or a text arrow? I could try to create something, but does anyone know how to do these things technically so that we can try out how it looks? Best regards Rhanyeia 19:11, 12 February 2008 (UTC)
If there is an arrow character (like ▼) that will display properly in a majority of current browsers and platforms, then that would be ideal. A single text character would load faster and have fewer format pitfalls. I know that the arrow character I used in the first sentence of this paragraph will display correctly in IE6, IE7 and Safari for MacOS X. --EncycloPetey 22:38, 12 February 2008 (UTC)
See http://www.alanwood.net/unicode/arrows.html and take your pick, then finding out whether it will display on IE6 will be harder. Conrad.Irwin 22:46, 12 February 2008 (UTC)
It isn't hard for me to check IE6, since I'm forced to use it at work. I checked the page and nearly all fail to display in IE6; only the first six items display, as well as item 8616. Personally, I prefer 9650 and 9660, the black triangles, which are included in WGL4. --EncycloPetey 02:28, 13 February 2008 (UTC)
A desirable design for a control that we wish to encourage people to use would be for it to be close to where their eye is and where their pointer is and fairly large. A short bar is good, but also a large target. See Nielsen and Loranger (2006), Prioritizing Web Usability DCDuring TALK 02:53, 13 February 2008 (UTC)
The triangles 9650 and 9660 sound good. I think "show" could also be a little bigger. Under the title text might also look good, then the bar would become thicker and one would pay more attention to it. Can the "show" be changed from the template itself, or is it changed from the same place than the "edit" tags? Best regards Rhanyeia 09:41, 17 February 2008 (UTC)
Does someone know who is the editor who can change the place of the "edit" tags? He would probably know what to do with the "show" tags too, and if I remember correct the "edit" tags have been in different places sometimes. Best regards Rhanyeia 08:36, 23 February 2008 (UTC)
Any admin who understands the necessary CSS or markup changes should be able to make them. For reference, one place that has the edit tags in a different place is German Wikipedia from what I recall (on the left instead of the right). Mike Dillon 16:46, 23 February 2008 (UTC)


Did anyone have any objections to the modification of the "show" location on the translation bars? They are so clearly good from the point of view of basic user-oriented computer interface design that it seems a shame to revert them unless there is some extremely good reason, which has not been provided. DCDuring TALK 17:07, 5 March 2008 (UTC)
Hello. I strongly object to the insinuation above that making mouse-click targets move all over the place is an "expert user-interface opinion." That, on the face of it, is a bald lie. Do you have an "[X]" in the top corner of this window? How would you feel if that were placed randomly? Placing widgets consistently, is much larger UI issue than conjoining. I know I commented somewhere when I rolled back the experiment - but am not finding any traces of that comment, now. If my initial post on it, did not save, then I apologize (and am curious as to why.) --Connel MacKenzie 17:07, 10 March 2008 (UTC)
There's another possibility too. They could be placed under the title text, either on the left or in the middle of the bar. Best regards Rhanyeia 17:13, 10 March 2008 (UTC)

So Connel MacKenzie has brought out one downside what the "show" tag being right after the title text has, so the tag wouldn't be in the same place in all bars. Could we try to explore the possibility where the "show" would be on the left side? How would be on the left and under the title text, any opinions about this please? :) Best regards Rhanyeia 16:50, 11 March 2008 (UTC)

For those to know who have not watched Grease pit, there was also a discussion about this there a little while ago. Personally I think "show" was quite good after the text, but it might be even better on the left side one way or another. In the beginning of this conversation Ruakh suggested that it could be before the title text, and Conrad.Irwin thought that might be messy. Do you Conrad.Irwin still after these discussions think so if there was "show", a little space in between and then the title text? There's also one more way to do it. On the vote page the title texts are centered. If the translation bar title text was centered and the "show" was on the left side it would look almost like the vote page now, except that the "show" would be on the other side. How do these things sound? Best regards Rhanyeia 17:59, 13 March 2008 (UTC)

Placing "Show" immediately to the right of the text is a superior solution for users who are beginning to use the system, I think. As I noticed and continue to notice in my own experience, it doesn't work very well for those who use the system heavily, ie, us (regular users). Given a choice (in WT:PREFS or "my preferences"), we could set our preferences to support our needs. Anons have no persisting choices and inexperienced registered users mostly don't know they have choices. To the extent that we can do so without overtaxing our tech experts, the servers, and user patience (download times and other latency delays), it would be nice to have accommodation for each broad group of users (anons/new users, registered users (mostly newish), admins and other regulars, experts). But the needs of anons and other new users would and should determine our defaults. In the absence of any specific data about their behavior on Wiktionary or Wikis in general, we are forced to resort to general principles of naive-user usability. Or we could just make all user interface choices for our own convenience and amusement until someone pulls the plug on all this. DCDuring TALK 19:16, 13 March 2008 (UTC)
It would be nice to have a 'show all' option at the top to expand all boxes at once. Pistachio 15:33, 14 March 2008 (UTC)
Both of these ideas sound good, that one could set these things from preferences and that there would be a "show all" tag too, but I don't know if there would be a volunteer to code them at least at this moment. In the meanwhile, DCDuring I agree with you that right after the text could be good, but what if we tried also some place on the left of the bar to be able to compare them? Best regards Rhanyeia 18:04, 16 March 2008 (UTC)
I didn't mean to suggest not to do that. In some ways it is a perfect solution, providing both predictability for experienced users and obviousness for inexperienced users. DCDuring TALK 18:18, 16 March 2008 (UTC)
There is already a WT:PREF for showing all tables, and it is something that shouldn't take too long to code - though I would quite like to rewrite that section of javascript completely - I don't have time for the moment though. It is possible to override the position of the [show] buttons using personal monobook.css by adding the following lines - but whether to set it as default is a different matter. (This could easily be made into a WT:PREF if people want)
.NavToggle {
  display: inline !important;
  position: static !important;
  float: none !important;
}

Conrad.Irwin 11:46, 20 March 2008 (UTC)

It's great if there may become new features to the preferences regarding these things in the future. Maybe we could try to think about where the default place of "show" would be? Right after the title text has that one downside but might still be possible, on the left side before the title text has not yet been tried. Are there opposing opinions for trying how it looks? Best regards Rhanyeia 16:13, 27 March 2008 (UTC)
I guess the next thing to do would be trying that. It takes an administrator to edit that file, Conrad.Irwin you'd know how to do it, do you think this is something which you could try there please? Best regards Rhanyeia 14:42, 3 April 2008 (UTC)

I've fixed the current NavFrames so that this is now possible to choose with only CSS - if you like it then I'm happy to give it a site wide go, but I'm not sure it works that well.

.NavToggle {
  float: left !important;
  position: static !important;
  right: inherit;
  margin-top: 0.1em; /* To counter the 90% font size used */
  margin-right: 5px;
}

Just add the above to Special:Mypage/monobook.css. Conrad.Irwin 11:34, 7 April 2008 (UTC)

I tested it and I think that's quite good and the tag would be much easier to find. For me the template rel-top looks better with it than the template trans-top. Could they both be like rel-top? Best regards Rhanyeia 16:00, 9 April 2008 (UTC)
Thank you Conrad Irwin for fixing the templates to be similar and explaining why it's better that way. Best regards Rhanyeia 15:27, 14 April 2008 (UTC)
If this becomes changed to the left, how about the first bar of any entry having "show all" tag on the right side? Would that be difficult to make? Best regards Rhanyeia 16:31, 9 April 2008 (UTC)
What Rhanyeia said. And a big thanks for this, Conrad. I hope that there are more than the two of us using it. It would seem likely to nicely facilitate display for ordinary users without making it too easy for them to accidentally click on edit. Fixed position is good. DCDuring TALK 18:04, 9 April 2008 (UTC)
Yes, thank you Conrad.Irwin. I plan to begin a vote so that the default place of the "show" could be changed. Before that, since these templates are used in the mainspace for important things, maybe "show" wouldn't need to be only 90% font size? And thank you Robert Ullmann for making the bars float better. Best regards Rhanyeia 09:24, 13 April 2008 (UTC)
Just because voting sounds like too much bureaucracy, I've gone ahead and implemented it. If people really object to it, please undo the edit - but comment here to let us know why. Conrad.Irwin 15:59, 14 April 2008 (UTC)
Thank you for testing, although it had to be changed back. I'll continue on Wiktionary:Beer parlour#"Show" tags. Best regards Rhanyeia 14:23, 15 April 2008 (UTC)

Inconsistent quotation examples

There's an inconsistency between what follows the year in these two places. In the first it's colon and the second a comma. Looks like the comma is preferred? - dougher 05:17, 1 March 2008 (UTC)

I always use the comma, since that's what's shown repeatedly on the detailed WT:QUOTE page (your second source). Wiktionary entries are all over the map when it comes to formatting quotations, but I usually change them to conform to WT:QUOTE. I don't really consider this format to be particularly elegant but it is simple (every element of the citation is separated off by a comma) and it is important to be consistent. -- WikiPedant 06:06, 1 March 2008 (UTC)
Looks like {{quote}} is a necessity, the only problem is that no-one can agree how it should work. I, by copying other entries always assumes we should use '—' after the year... Conrad.Irwin 13:20, 2 March 2008 (UTC)
Good grief, not this discussion again. No one has successfully developed a template that will do what such a template would need to do. Look at the many previous discussions on this issue; it is pointless to start it again here. --EncycloPetey 15:06, 4 March 2008 (UTC)

AOL

Isn't Wiktionary:AOL and the link from the main page unnecessary now since the X-forwarded-for is provided so AOL users can be identified by their individual IP? At least that was my understanding from wikipedia, see Wikipedia:Wikipedia:AOL Nil Einne 06:52, 1 March 2008 (UTC)

OK, I've just unblocked. Let's see how this goes, shall we? --Connel MacKenzie 20:50, 1 March 2008 (UTC)
In response to the various IRC questions: I unblocked the forward-facing AOL proxy servers that now supposedly forward the XFF-forwarding information correctly. AOL was not "blocked" for years - it was "blocked from invisible access" while allowing all AOL users access over the https: servers. This was no big, dramatic change - individual AOL blocks should still be limited to 15 minutes to 1 hour. --Connel MacKenzie 03:03, 2 March 2008 (UTC)

Encyclopedia of Life

This online resource (at [1]) would seem to be a good source of animals, plants etc. I have added one of their featured entries - green anole. SemperBlotto 12:15, 1 March 2008 (UTC)

I take that you mean it as a reference. We can't use their content (esp. the very nice images), can we? DCDuring TALK 19:58, 1 March 2008 (UTC)
Their terms of use page says this:
Please note that a single page may be made up of many different data elements, each covered by a different license. You are required to check to see which license applies to any portion(s) of the page you wish to re-use and to abide by any restrictions on that content. ... In most cases, EOL data partners have made content available for re-use under one of the following Creative Commons licenses: ... To identify the terms of re-use of a photograph or drawing, click on the green information button on the bottom left corner of the picture.
I spot checked a couple of images from their home page and one of them was CC-BY-NC and the other was CC-BY-NC-SA (meaning we can't use either of them). We should be able to use any images that are CC-BY-SA; in general we can use CC content that has "SA" (share alike) and does not have "NC" (non-commercial). The "BY" (attribution) doesn't make a difference as to whether or not we can use it, only whether we have to give attribution or not (which we would probably do regardless). Mike Dillon 01:49, 2 March 2008 (UTC)
Could anyone with a zoology, botany, or microbiology (not to slight viruses, molds, and fungi, et al.) background compare and contrast this to WikiSpecies? DCDuring TALK 15:57, 14 March 2008 (UTC)

plural and uncountable

Have we currently got a consistent way of marking entries where some senses are the plural forms of single nouns, and other senses are uncountable nouns?

The most recent example I've come across is hostilities, but there are many others. Thryduulf 17:20, 1 March 2008 (UTC)

See weeds. --EncycloPetey 17:49, 1 March 2008 (UTC)
Would it not be reasonable to insert a link to plurale tantum at the sense lines (there might be a template to make the format uniform.)? DCDuring TALK 19:54, 1 March 2008 (UTC)

Language index as category

I am still thinking about how to keep the language index current. Could an index page be treated as a category? Then when I create an entry, I would add it to the appropriate letter category, for example a word starting with m would be added to Category:Hungarian index m. I don't know enough about the consequences. An entry belonging to too many categories, performance issues maybe, too much work for editors, although a bot could also add the categories. Or is it easier to regenerate the index every month using the monthly dump? Thanks. --Panda10 14:38, 2 March 2008 (UTC)

English dictionary should only have English words?

Aren't there Wiktionaries in other languages for words in other languages? For instance, why does the English Wiktionary have entries for être, φλόος, Աստանա, 가마우지 and many others which clearly aren't English words? Should these not be in http://fr.wiktionary.org etc. - for instance, why do we have an être page at the English Wiktionary when this seems the only natural place for it to be? After all, the English Wikipedia has no article entitled "Աստանա", as the English Wikipedia is written in English. Should this not apply here too?

Any comments are appreciated. It Is Me Here 07:27, 3 March 2008 (UTC)

Wiktionary is a dictionary written in English of all words in all languages, just as Wikipedia is an encyclopedia written in English of all topics from all language-areas. I could look up fr:être at fr.wiktionary, but I don't speak French, so I would still not know what it means. All of the Wiktionaries define all words from all languages in their own language, so a reader can access the definitions of all of them in their own language. Dmcdevit·t 07:37, 3 March 2008 (UTC)
This explanation is so perfect that it should not be lost. Could it be pasted to some help page? Lmaltier 20:42, 3 March 2008 (UTC)
Agreed. WT:NOT would also be a good place for this, with a label like "Wiktionary is not monolingual" or some such. -- Visviva 13:39, 4 March 2008 (UTC)
Done. Well, doing now. --Neskaya talk 21:50, 12 May 2008 (UTC)

Romance verb and past part. forms

Because I'm so fond of adding forms etc., and because I personally feel that our non-lemma entries could use the attention, I'm going to be designing verb-form templates for the Romance languages that I can. I'll be doing them in the same way I did {{ro-form-verb}} ({{fr-form-verb}}, {{es-form-verb}}, {{pt-form-verb}}, etc. SemperBlotto handles Italian like it's cool, so I'm going to leave that one to him.) This is the easy part and I'm not too worried about progress here. (Except in Spanish, I think our verb form entries may have gotten a little out of control here. Especially the categories.)

One of the serious things I want to see under control stat is past participles and their forms, though. Spanish pp forms I haven't seen anywhere, nor have I seen the forms listed in the base past participle entries. ({{ca-pp}}, {{es-pp}}, {{fr-pp}} and {{pt-pp}} are all good for use in inflection lines now, so this shouldn't be an issue for new entries if we all know they're there.)

Now the formatting of past participle forms is probably what needs the most help. Keene's runs a bot (ăsta, de fapt) that adds French verb forms (which is awesomeness, I just want to get everything ironed out smooth before we get started for realreal.) French and Catalan pp forms can still function as verbs or adjectives, so they need to go under Category:French past participle forms and Category:Catalan past participle forms. Others I'm just about certain only function as adjectives, so I don't see a problem with putting them in Category:Spanish adjective forms etc, but if anyone disagrees, I'd like to know why so we can do what's best, etc. See User:Opiaterein/Basic past participle format to check out the standardized format I'm looking at. I think It'll work pretty well translingually.

The languages I'm most concerned with, because they represent the bulk of our Romance languages that still need direction here, are:

  • French
  • Spanish
  • Portuguese
  • Catalan

If there are any concerns, I'd like to get them out now so I'm not messin' anything up, keh? :) Let's get to it — [ ric ] opiaterein — 14:18, 3 March 2008 (UTC)

All of them need something that clearly identifies them as minimal stubs, needing definitions, example sentences etc. Users must never get the unfortunate idea that content is somehow not allowed in "non-lemma" entries, or think they can or are allowed to remove it.
Do you mean that all bot-formed entries should have a tag which puts them in a cleanup/stub category, for example in Category:Keenebot2 entries? This is feasible of course, but I can't see how it would be helpful. --Keene 15:11, 3 March 2008 (UTC)
The inflection must be on the inflection line, the definition in English on the definition line(s). We should have a firm policy prohibiting bot creation of entries that do not contain English definitions. If they can generate all the names for inflections, they can generate the English definitions in the correct forms. If the operator is unable/unwilling to do that, he/she should not be creating them.
Please, can you give a link to such a page with the inflection on the inflection line? I've seen a few already, bu I forget where. --Keene 15:11, 3 March 2008 (UTC)
Trying to clean up details in the "form of" entries is pointless, they are fundamentally wrong; they will all have to be done over again, either with a decent bot, or by hand. Robert Ullmann 14:42, 3 March 2008 (UTC)
re: Opiaterein mentions starting for "real real". Keenebot2 has already started for "real real". The bot's almost auto-added conjugations of all verbs tagged with an inflection template so far. Changing it now is possible of course, and if necessary I could cease bot activity until such a time when we're all happy with how inflected entries should look (at the time of the bot vote, there were a couple of oppose votes, so obviously it isn't perfect). However, when Robert Ullmann says that form-of entries are "fundamentally wrong", I must disagree. --Keene 15:11, 3 March 2008 (UTC)
As for past participles and pp forms, the adding of adjective definitions to them is on my to-do list. I started adding a few adjective sections at the beginning, but haven't done many since then. The same with present participles, many (most? all?) of which can be adjectives in French too. At least these ones are all together in Category:French past participles and in Category:French present participles, so when anyone wants to trawl through them it's easier. However IMHO having only these stub-form entries for forms of verbs is better than nothing, and you'd do well to find a website out there with better entries for each "form-of". Regards, --Keene 15:21, 3 March 2008 (UTC)
When the same form can be either a participle or an adjective (which is sometimes, but not always, the case), they should be listed separately.
About definitions in English: they are in English! Actually, I find there are 3 kinds of definitions in use:
  • traditional definitions: they are used for lemma forms of ENglish words, but should also be used whenever appropriate,
  • translation definitions: one word translating the word defined + a gloss.
  • grammatical definitions: used for inflected forms. I cannot find a better way of defining the meaning of inflected forms. Take an example: aima. You cannot understand what this word means if you don't understand several concepts, and at least singular, third person, past historic (and its not obvious use) + the appropriate meaning of aimer. Impossible. Try to build a better definition (it's a challenge) and you'll understand what I mean. You could try to define all this in the page itself, but this would still not provide a definition of aima (only of aimer). You could also try to provide a translation (loved) but, clearly, this is only a translation, not a good definition, and this would not help to understand the difference between aima, aimas, aimai, aimé, aimée, aimés, aimées... as all these words have the same translation. In such cases, grammatical information is part of the meaning. Lmaltier 17:45, 3 March 2008 (UTC)
I strongly agree with Lmaltier. Non-lemma entries require a grammar-focused description of the relationship between the headword and the corresponding lemma. Rod (A. Smith) 18:03, 3 March 2008 (UTC)
Regardless of the debate regarding use of definitions in non-lemmatic entries (disclosure: I am most firmly against Ullman's crusade), there is already a template for French, it's {{fr-verb-form}}. I've created a redirect from the name there (I believe the POS should come first, but then I think we need a naming convention for these templates, and getting agrement on style or naming issues here is even more difficult than on wp...) 18:09, 3 March 2008 (UTC)
{{fr-form-verb}} isn't meant to be used in the inflection (header) line, it's meant to show the actual forms. Check out vorbesc to see the corresponding Romanian template in action.
From this point on, I want no more talk of definitions or glosses in form-of entries. We've been over it a hundred times and it's not the subject of this discussion, thanks. :) — [ ric ] opiaterein — 18:42, 3 March 2008 (UTC)
Okay, that makes more sense, but there are still quite a few kinks to smooth out IMHO, though:
  • I definitely think this ought to use the {{form of}} meta-template, if only for formatting (And they need to start with a capital letter, too!), because there is no specific reason to set them apart from stuff as formated by {{feminine of}} and {{plural of}}.
  • These should be something like {{verb form of}} (or if limited {{romace verb form of}}) for making their purpose clearer ("XXX of" is the format used by all such templates)
  • How about a master template? 90% of the romance languages share the same name of verb tenses, persons and mode. Heck, this could easily be one generic template with extra tenses and modes covering most other languages with person-tense-mode agreement.
  • OTOH, I've by far favored formulations of the sort "first- and second-person" because I find equally readable and slightly more compact (yes I know Wiki is not paper), although that could possibly be handled with special abbreviation for the combinations (there are only 3 or 4 different ones for french). In any case, there is no absolute need to have a "I read" bit after the lemma definition: we can always have it formatted as an example if we insist upon having it (although I'd again favor not using them at all).
Oh boy that was wordy... But necessary. Currently, different editors have applied different methods of formating and of applying this formatting, and I think we should seriously consider sorting that. And finally, there is also an {{es-verb form of}} (which I didn't know about when I suggested that name). Circeus 22:52, 3 March 2008 (UTC)
Funny that folks should mention "standard template" and "Romance verbs all have the same stuff". I recently set up {{conjugation of}} to do exactly this. It accepts a language argument, so no new template is required to add all the various Romance language verb forms; the template is flexible enough already to handle them as it is. --EncycloPetey 02:51, 4 March 2008 (UTC)
Re: "Others I'm just about certain only function as adjectives, so I don't see a problem with putting them in Category:Spanish adjective forms etc, but if anyone disagrees, I'd like to know why so we can do what's best, etc.": Spanish past participle forms often function as verb forms: just like in English and French, they're used in forming the perfect aspect and the true passive voice. (Neither of these is This isn't as common in Spanish as in English or French, but they do it does exist.) —RuakhTALK 20:48, 3 March 2008 (UTC)
No no no, Ruakh, I know that the past participles are used as both verbs and adjectives, I mean the forms of the past participles. Preservada, escondidos, habladas :) — [ ric ] opiaterein — 23:48, 3 March 2008 (UTC)
I've edited my comment accordingly. :-)   —RuakhTALK 01:29, 4 March 2008 (UTC)
Participles are a really sticky problem in Romance languages. I haven't made up my mind how I'd like to see them handled. Most Latin past participles inflect and function as adjectives, but all grammars treat them as a verb form, and they are used to form certain verb tenses. I've even considered the possibility of just using the POS Participle, but that too leads to problems. --EncycloPetey 02:54, 4 March 2008 (UTC)
In French, there is no such rule. Adjectives are ofteh created by using participle (mostly from past participles, and it is usual that only very common such adjectives are mentioned as adjectives in dictionaries, to save space). For French, at least, the simplest and best way is to mention them as verb forms and to mention them separately as adjectives.

[reset tabs] I have a separate, but related, worry. Many past participles have taken on a life of their own, and have an adjectival meaning which is wider than their verbal meaning: for an example, see tired, fatigué in French, cansat in Catalan (the meaning is pretty much identical in the three languages). I think these need to have two PoS headers, one for the adjectival meaning and one as the past participle of the verb. On the other hand, if the adjectival use is completely subsumed by the verbal use (eg, underlined), I don't see any need to list them separately. My second, more minor, complaint against ric's suggestion is that the participle forms do not link back directly to the lemma form, which for most romance languages is the infinitive. The very definition of a participle is "a form of a verb that may function as an adjective or noun" (present participles may also function as adverbs in most romance languages)—the link back to the root-form of the verb seems essential to me. Physchim62 14:13, 4 March 2008 (UTC)

Maybe I'm missing something with this participle talk. Speaking for Spanish only, past participles used as a verb only have one form (-o) unlike how it is currently listed in preservado. They only have 4 forms (gender x number) when used as an adjective (in which case we use {{es-adj}}) or as a noun (in which we use {{es-noun-mf}}) as most Spanish adjectives can be used as nouns. In the passive sense they are used like adjectives. See desaparecido for a quick try at seperating out all three senses. What's the need for new templates like {{es-pp}}, shouldn't we be just standardizing use of the old ones? --Bequw¢τ 14:48, 4 March 2008 (UTC)
You understand the grammar, but the point of the discussion is that this is a regular pattern for Spanish participles. They behave almost as their own part of speech with a consistent set of rules. So, instead of discussing each participle as a verb form and and adjective and a noun, and giving all the inflections and notes for every participle, it almost makes more sense to just call it a Participle and point to an Appendix on spanish participles to explain what that means and give all the definitions in one section. It's a bit like the way we agreed that attributive use of English nouns does not make them adjectives; it's just that English nouns can function attributively as a regular part of English grammar. Or the way that English cardinal numbers can function as both noun and adjective, but we can catch both functions by pointing out that the word is a cardinal number. So, one way to handle Spanish participles is to call them Participles. --EncycloPetey 15:03, 4 March 2008 (UTC)
Bequw just explained that, in Spanish, it's not a regular pattern. It's not a regular pattern in French either: many past participles cannot function as adjectives. And most present participles cannot function as adjectives (when they are used as adjectives, they are not considered as verb forms). Why inventing misleading solutions? Lmaltier 19:32, 4 March 2008 (UTC)
I find it ironic that in accusing me of "inventing solutions" and saying I didn't read what was said, you're showing that you didn't read my post. As I said earlier, I have not decided how I feel about this because both approaches have problems. I do not see where Bequw explained anything about a lack of pattern in Spanish, so if you could show me where he said that, I'd be grateful. I did not propose a solution, I explained one possible position because (as Bequw said) "maybe I'm missing something". I therefore laid out the full discussion so we could be clear about what's being discussed. I proposed no solution. Sheesh! --EncycloPetey 02:38, 5 March 2008 (UTC)
I don't accuse you of anything. I can't speak Spanish, but I understand from past participles used as a verb only have one form that some participles are used as a verb only, and some may be used as adjectives (but I may misunderstand the sentence). This would mean that there is no general rule applying to all participles. Even if I misunderstand it, it's clear that word forms to be listed are not the same when the word is a verb form and when the word is an adjective, and this suggests separate headers. For French, I'm sure that paper dictionaries always consider participles as verb forms (and they are right) and, therefore, do not list them, but they list adjectives derived from participles (systematically for adjectives such as souriant, as readers would not be able to guess that souriante exists, and only when they are common enough for adjectives such as fumé, in order to save space). Stating that souriante is a participle would simply be wrong, nobody considers this word as a participle. I only mean that there is no need for inventing something more than current headers (this is not the same as cardinal numbers, which follow a regular, systematic, rule). Lmaltier 07:08, 5 March 2008 (UTC)
Then you misread it, didn't you? Go back and read it again. What Bequw said was that, of the forms a Spanish participle has, only one of those forms (the one that ends in -o) functions to form verb tenses; the other forms function as adjectives. Everything else you just wrote about Spanish participles is incorrect. And you're missing the point about Participles. You are beginning with the assumption that they must be forced into the existing categories of Verb & Adjective. The point of my discussion is that we want to consider the possibility that they deserve separate recognition. Consider: English nouns can modify other nouns and function like an adjective, but we don't call them an Adjective here just because they function like one. They're still nouns. The point I'm making is that perticiples are etymologically verbs, function like verbs, but that they also function like adjectives. If we use the POS Participle for Romance languages, then we can include the adjective function in the same POS section without having to pretend there are two separate words involved. I don't pretend that I've solved anything here, but please do consider the possibility and help discuss both the pros and cons, rather than simply dismissing it. This may be different for French, but I don't know French. --EncycloPetey 14:58, 5 March 2008 (UTC)
(Ears burning) The process of Participle → Adjective → Noun is standard (in terms of changes in the suffix), though it's not universally applicable to all Participles. Some Participles would just be weird to use as nouns, and if you used one in that case people would understand you but in the way some neologisms are easily understandable. Some Particples used as nouns have a slightly different meaning (secado is "the action of drying", not "the dried thing" that one might infer from its descent from secar ‎(to dry)). So if a participle heading were to be used, it would be helpful if it could show in an integrated way how a word could be used in the standard POS terms. Maybe like Particple (Verb or Adjective) or Particple (Verb or Adjective or Noun). Possibly this would be in the header or maybe the inflection line. --Bequw¢τ 19:35, 5 March 2008 (UTC)
This may or may not be relevant, but it may interest you to know that Ancient Greek will definitely be using the POS "Participle." Obviously it's not a Romance language, but I thought you might find that information useful anyway. -Atelaes λάλει ἐμοί 16:36, 5 March 2008 (UTC)
I might have misunderstood the intention, too. But changing the existing entry for sucré would be wrong. In French, an adjective is not a participle, a participle is not an adjective, and they have different meanings: an adjective is about a characteristic feature of something and has nothing to do with the verb (except etymologically), a participle has a meaning related to the meaning of the verb. sucré, as an adjective, means sugared (the adjective) or sweet, as a participle, it means sugared (the participle). This is important, because it makes clear that many French sentences are ambiguous. An example: Il a sucré son café, puis a bu le café sucré. Le café était très sucré.. The first use of sucré is a participle, the third one an adjective, and the second one is ambiguous (the intended meaning may be 'sweet', in which case it's an adjective, not a participle, or 'which has just been sugared', in which case it's a participle, not an adjective). Lmaltier 17:41, 5 March 2008 (UTC)

Can anyone quote me a grammar of any IE language that defines 'participle' as a distinct part of speech, not the usual catch-all term sense for 'verbal adjective/adverb' or as an unseparable compound in tense-formation? Why do you think there are none?
This is an English wiktionary and all the entries must be normalized to English senses/terminology. It would be a dangerous precedent to impose additional headers just because the editors are lazy to provide quality content. --Ivan Štambuk 17:30, 5 March 2008 (UTC)

Wrong on all counts. (1) I have grammars of Ancient Greek and Latin that treat Participle as a part of speech. Grammars are inconsistent about how they treat participles; some treat them exclusively as verb forms, while others treat them primarily as adjecitves. Other grammars treat them as if they were two separate words. Some of these same grammars will treat adjectives used as if they were nouns solely as adjectives. We should not blindly follow mono-lingual grammars when we are creating a multi-lingual dictionary, but should consider how best to treat the words themselves for the use of our readers. (2) Not all entries must be "normalized to English terminology". That argument went out the window the moment we started including African and East Asian languages. Take a look at the Japanese POS list sometime; it does not normalize to English POS categories because Japanese is not like English or western languages. (3) It is not laziness to discuss a topic and come to a consensus that fits the needs best. What is lazy is to throw out an idea simply because you assume it is a bad idea, assume the world works a particular way, and assume that you're right. The participle discussion is about how to provide quality content. --EncycloPetey 01:15, 6 March 2008 (UTC)
What are you talking about?? What Latin and Ancient Greek grammars list participles as distinct part of speech? Nouns, verbs, pronouns, adjectives...participles? Don't be silly. "Participle" in linguistics is just an umbrella term meaning "this is really an adjective/adverb/inseparable component... derived from verbal root by regular morphology, meaning exactly what verbal root does in the context of it's application". If it inflects exactly like an adjective, translates in English exactly like an adjective, and is used like an adjective - it's an adjective allright. The fact that there are well-defined rules for producing participles from verbal roots does not mean that all of them are verbal forms that should be put under =Verb=, or worse, under =Participle=; when both their semantic value an inflection properties demonstrate that they act as a separate PoS, you must treat them as such.
Nobody advocates blindly following of a mono-lingual grammar, I't very rude of you to put words like these in my mouth.
I'm sure readers will be much more delighted if they saw a real definition line for a participle, not some "Xx participle of " stub. Advocating separate PoS heder for paticiples that would only be populated with stubs cannot be a quality argument for end users.
Your argument on Japanese is what exactly? We're including non-IE languages and that fact legitimizes this linguistic perversion participle-as-a-PoS? I don't see anything unusal on Wiktionary:About_Japanese#Parts_of_speech, what specifically did you have in mind?
You wrote it yourself: So, instead of discussing each participle as a verb form and and adjective and a noun, and giving all the inflections and notes for every participle, it almost makes more sense to just call it a Participle and point to an Appendix on spanish participles to explain what that means and give all the definitions in one section. - this sounds like laziness to me. For langauges in which participles themselves could have many different inflected forms, this would link form-of entries to form-of entries, which would admittedly make the automated generation of entries much easier, but certainly not for the users to understand them.
Sorry, but te idea of =Participle= section appears to me as silly as that of =Infinitive= and =Gerund= (there are also languages with plenty of those too). --Ivan Štambuk 16:24, 6 March 2008 (UTC)
Again, you have claimed one thing, but said differently in the same post. You siad "Nobody advocates blindly following of a mono-lingual grammar". OK, so why then does my discussion of treating Spanish participles lead you to ask how this would affect other languages? You keep assuming that whatever we decide to do for one language must necessarily be applied equally to all languages, i.e. a mono-lingula grammar. No one believes this must happen except you. That is why I pointed you to the Japanese parts of speech list. Japanese POS headers include "Quasi-adjective" and "Counter word", but these headers do not have to be applied to any other language. We can and do recognize different POS headers for different languages.
I also have no lcue why you think having "Participle" as a POS header would lead to stubs. If it has its own header, then it must have its own lemma; it will not be a "form of" entry, but will have definitions, inflection tables, related terms, descendants, and all the other sections a lemma would have. The only way a stub happens is if someone doesn;t add the additional information. Stub-ness is completely unrelated to the question of what POS we use. Let's bury that straw man argument now. --EncycloPetey 16:39, 14 March 2008 (UTC)
Nearly all of my Ancient Greek grammars treat participles as a separate category. -Atelaes λάλει ἐμοί 01:24, 6 March 2008 (UTC)
Sanskrit has been using exactly four PoS categories for the last 2500 years (since it codification): nouns (nāman, in broader sense - substantives), verbs, prefixes and particles. In all Sanskrit dictionaries today you'll find conjunctions/adverbs/interjections/... combined in a category of "indeclinables" named "avyaya". Does that mean that we should dismiss usual English/Western terminology in favour of the Pāṇini's scheme? NO! --Ivan Štambuk 16:34, 6 March 2008 (UTC)
Ok, but the mere fact that a number of editors working of a number of different languages are considering participle as a POS should certainly indicate that this is not some esoteric POS invented by a single author. And, for clarification, in Koine Greek (the form of Greek I received my formal education in and am most knowledgeable of) participles did not function as adjectives all that often. I would say the most common usage is along the lines of the English infinitive, probably followed by a noun usage. Smyth, Black, Long, and Wallace all treat participles as separate POS's. -Atelaes λάλει ἐμοί 18:54, 6 March 2008 (UTC)
Some of the AGr. grammar books I'm browsing on books.google.com really say participle as a PoS, like this and this one, but at the same time they don't treat adjectives as a separate PoS at all ^_^ w:Lexical category claims that "It wasn't until 1767 that the adjective was taken as a separate class", so those participle-as-a-PoS could indicate obsolete terminology used by ancient grammarians, and continued in modern tradition. And all of those AGr. books have a separate chapter on infinitives just next to the one on participles, and I don't see anyone advocating =Infinitive= header. Encompassing under =Participle= both adjectival and substantival meaning of participles which they could acquire through context, and translating them as different English PoS seems utterly wrong to me. In Polish, mało ‎(little) is classified as a Numeral (liczebnik), but not so usually in other languages; - should the article conform to English or Polish notation of a numeral? Dumping all of those participles under unique =Participle= section that would be used for different things in different languages seems additional argument against to me, because it would almost certainly be the vaguestly defined lexical category of them all. At any case, the thing that should not be forbidden is promoting those =Participle= or =Verb= (form) stubs (stubs (stubs with declension tables though!) to proper full-blown =Adjective=, =Noun= or =Verb= section, with normal English gloss and usage samples. --Ivan Štambuk 20:28, 6 March 2008 (UTC)
Huh? How the translation is used in another language has no bearing on what the POS is in the native language. In Slovene, the names of languages are adjecitves and adverbs, never nouns, but the translations of those terms into English is a noun. That doesn't mean the Slovene word changes POS to match its English translation. I really don't follow your arguments. A few paragrpahs ago you were telling us we should use Participle as a header because nobody does that and we don't want to set a "dangerous precedent". Now that we've pointed out it is used (and has been for a long time), you a balking that it's "obsolete". So, you don't want to follow established tradition, and you don't want to try something new. So what exactly do you want to do? The grammars for Latin and Ancient Greek are not obsolete, and they have changed the way they treat POS over the years to better reflect accumulated understanding of the language. The retention of Participle is charcteristic of many of the most progressive and modern Classical language texts. Please don't dismiss the work of renown experts just because you have a bee in your bonnet about Participles. And, by the way, mało is a Determiner, or more specifically an "indefinite numeral". We actually do use Determiner as a POS here. --EncycloPetey 00:38, 7 March 2008 (UTC)
1) slovenščina, angleščina, nemščina etc. are, of course, nouns in Slovenian. Language names ending in -čki/-ski/-ški/-ský/-ский are adjectives/adverbs exclusively both in all Slavic languages and in their English translations. If you see a Slovenian adjective/adverb formated as a =Noun=, that's a mistake that needs to be corrected.
2) I never advocated using =Participle=, you must have misread something. Yes, I'm "balking" that it's obsolete, because those same books that list "participle" as a PoS don't list adjective as a PoS at all. Every time you use =Adjective= in Latin/AGr. your'e already not following "established tradition". If we were to follow traditions of particular languages, PoS header names would be a complete mess. The only possible solution that doesn't lead to chaos is to normalize everything to English.
3) I've never seen a "Determiner" used as a PoS in any Slavic language. Other Slavic langauge's correspondents of mało, pół, dużo,ćwierć etc. are always treated as adjectives/adverbs. This treatmeang of "fractional numerals" is Polish-only. Formating pariciples that are obvious adjectives as =Participles= makes as much sense as formatting Polish adverbs/adjectives as numerals.
4) Here's an excerpt from Encyclopeda of Language and Linguistics (Elsevier, 2005): p330, on Applicational Grammar: "Other items belonging to the category 'adjective' are prepositional phrases – about gardening combines with the term books to yield the term books about gardening – and participles, such as English sleeping from sleep and Russian igrajushchij (‘playing’) from the stem igra-. The primary function of verbs is to apply to a noun to yield a sentence. One secondary function is to act as an adjective, which is signaled by the participial suffix." On most other places on the Internet I found that the separate lexical category of participles is included by some authors, but at best it remains an exception rather than a general rule. --Ivan Štambuk 20:25, 7 March 2008 (UTC)
Why do you continue to assert things that are flatly untrue and which contradict each other?
(2) "those same books that list "participle" as a PoS don't list adjective as a PoS at all" This is flatly untrue. The modern textbooks on Classical languages have both parts of speech listed. "The only possible solution that doesn't lead to chaos is to normalize everything to English." This assertion was thoroughly refuted the last time we discussed Japanese and Korean grammar. You can go look at those arguments yourself, since there is no point in repearting the whole discussion here. We cannot and should not try to shoehorn all languages to fit an English model of language. If that were possible, then linguists would not have abondoned the idea of "universal grammar" as most now have.
(3) "I've never seen a "Determiner" used as a PoS in any Slavic language". Just because you've never seen it does not invalidate it. It's in standard modern English grammars, so if we standardize to modern English as you suggest, then we'll have to apply them to Slavic languages, won't we? In short, your arguments in (2) and (3) above are inconsistent. One of them will have to give way to the other. In any case, I have several Slavic language grammars that recognize "Numerals" as parts of speech, including Slovak for Slavicists by Baláž et al., Czech for English Speaking Students by Šára et al., Polish: an Essential Grammar by Bielec, A Basic Reference Grammar of Slovene by Derbyshire, and Introduction to the Croatian and Serbian Language by Magner. Likewise, I have a range of grammars for several languages (including English) that recognize the "Determiner" as a separate part of speech that includes numerals, articles, and demonstratives as subtypes.
(4) If most of the places you found on the internet have a separate lexical category, then how does that make it an "exception"? If most sites are using it, then it is the general rule. Are you arguing that, on the basis of an Elsevier Encyclopedia, that we classify prepositional phrases as adjectives? Please note that the source you quoted says that a participle is an adjecitve and says it is a verb that functions as an adjective. That is it is both simultaneously, and not just one or the other. That's what the POS of "Participle" means.
--EncycloPetey 03:54, 12 March 2008 (UTC)
2) Those that I've looked on b.g.c that list participle as a separate lexical category and that I've linked above really don't list adjectives as PoS. This accords with that 'pedia article that asserts that they weren't recognized as such up until recently. That the participles retained separate grouping is more a continuation of an established tradition, not because of a real necessity of doing so. Even today in Sanskrit adjectives are not really recognized as separate category from nouns (if it inflects exactly like a noun, sometimes behaves exactly like a noun - why treat it separate?). What is a pure convenience for one language's descriptive grammar's tradition cannot be used as a general argument for all languages. It's pointless to introduce =Participle= only for a couple of languages, and not doing so for all the other ones. Linguist didn't exactly abandon the idea of "universal grammar", for you see 99.999% of morphosyntactic constructions in all natural languages are describable with CFGs and nicely fit some general framework.
3) It doesn't, but maybe that indicates something, doesn't it? What is a "determiner" in English (according to w:Determiner (class) not even widely excepted term in English grammars) are really all adjectives in Slavic. You can write something like "Big likes something different." that it still makes sense, because this "big" will then have an ending that will indicate gender & case. I didn't claim that numerals weren't classified as PoS in Slavic (where did you get that?), but was just refering to this special-case Polish "partitive numerals" ("little", "half" etc.) that are really classified as adjectives/adverbs in all the other languages. That kind of chaos you get when you apply local standards as general formatting scheme.
4) Most sites are not classifying participles as separate lex. category (you again must have misread something). That OR relation in that sentence was not inclusive at all; participle can behave like multiple PoS according to it's context, and each of those can be separated into it's own L3 header. Just like almost every English present participle can be used as a verb, noun or an adjective. In some other languages these are on not "syncretized" into one form and each of those would have different translations in their ====Translations==== section. Similarly, participles in other languages that can have multiple PoS functions must be formatted as such, and have different translations in English. It's like MxN relation.
5) My point is: you can't just treat something collectively because it's convenient to do so, under the assumption that the reader will know all the details under what conditions participle can act in what ways. I don't know what were you exactly discussing on those Altaic languages, but I fail to see the implications to this topic. Participles could have verbal, adjectival and nominal functions 6000 years ago, have lost some of them in some daughter languages, but the point remained the same: they're not some "special" PoS just because they can have multiple lexical functions depending on the context. --Ivan Štambuk 09:23, 13 March 2008 (UTC)
2) Then perhaps you should visit a library and look at physical books. I have not seen any recent grmmars of Latin or Ancient Greek that completely failed to have an "Adjective" category. You are focussing only on the older books if you aren't finding adjective, which means you are ignoring modern research.
3) Clearly you do not know whicj words in English are Determiners. "Big" is not a determiner; it is an adjective. "little, "half", "this", "that", etc. are Determiners. It is not a "local" standard -- the behavior of determiners is a commonality among most European languages.
4) Yes, and almost every adjective can function as a noun. Almost every noun can function attributively as an adjective. But, we deliberately do not have a separate Adjective section for each attributive use of a noun, because that would be silly. The meaning is present in the noun, so we only list it as a noun. In other words, we already have situations parallel to the idea of listing under "Particple" all the various POS functions.
5) If you think my entire line of reasoning is based upon "convenience", then you haven't been paying attention to anything I've said. And what Altaic languages are you talking about? There is no point in having a discussion if you are going to keep jumping to topics that haven't been raised instead of dealing with the questions put to you. I have pointed out severl times that your arguments contradict themselves, and you have yet to address that very critical point.
6) Worse, your arguments keep jumping all over the map. Let's return to something you said above: "when both their semantic value an inflection properties demonstrate that they act as a separate PoS, you must treat them as such." OK, so if Latin participles decline wiht a present, perfect, and future form, how can we stuff them into the Adjective category? Latin participles have tense, which adjectives do not. We therefore have inflection properties that demonstrate they act as a separate part of speech. The same is true for Ancient Greek. And, no, they don't simply function like adjecitves, since they are used to form compound verbs as well, and adjectives don't do that. --EncycloPetey 16:19, 14 March 2008 (UTC)

In no way is it "laziness" to provide an accurate description of the grammars of the different languages we include. It is smallminded to pretend that everything can be neatly fitted into some form of bastardized English grammar. Physchim62 13:49, 6 March 2008 (UTC)

Yes, the description must be accurate, this is why I insist so much. I feel I must mention the 'adjectif verbal' (verbal adjective) concept, defined as adjective formed after a participle. The name is misleading, but the definition is clear, and this name seems to be fairly common in French. Try to google 'adjectif verbal', and you'll find that sites consistently insist on one point: verbal adjectives should not be confused with participles. It's probably equally true in many languages (including Spanish, as I understand it, and English), but it's especially important in French, because the singular, plural, feminine forms are often pronounced the same, and because the basic spelling itself sometimes depends on this distinction (e.g. intriguant = present participle, intrigant (same pronunciation) = associated 'verbal' adjective). If you don't clearly understand the difference, you are likely to misspell many words. Lmaltier 17:50, 6 March 2008 (UTC)
That's great for French, but more than French is at issue here. If you look in most Ancient Greek and LAtin grammars, they will define a participle as a "verbal adjective". So what may be true of French is not true in other languages. The spelling is certainly not an issue in either Spanish or English. In those languages, the "participle" is spelled the same way whether it functions as a verb or adjective. Spanish participles can change their spelling when used as a modifier, but the msculine singular is the same as the spelling used in constructing compound verb tenses. So, it seems that Participle as a header may not work for French, but that only addresses one of the languages under consideration. --EncycloPetey 18:55, 6 March 2008 (UTC)
I'm happy you are convinced. I just want to add that, IMO, the only good general solution is to list all adjectives as adjectives, all nouns as nouns, whatever their etymologies. Bulgarian verbal nouns can be found in conjugation tables. Nonetheless, and fortunately, they can be found in dictionaries, as nouns, because they are nouns. Lmaltier 07:17, 7 March 2008 (UTC)
The problem with that suggestion as I understand it is that we will end up systematically having entries for adjectives and verb forms which mean exactly the same thing in English. I am happy to have separate PoS entries whenever there is a difficulty, but to make this universal seems to be overkill. Physchim62 15:14, 7 March 2008 (UTC)
I can personally assure you that defining sugared as a verb form and as an adjective is not overkill, despite the fact that the translation in French is the same in both cases. I was not aware of this use. Now, this must be done only when it is considered as an adjective (or a noun...) in the language. Churchill is considered only as a proper noun in English, so I don't propose to define it as a common noun, even if you can say like a Churchill (this is only a figure of speech, and all proper nouns can be used this way). Lmaltier 17:41, 7 March 2008 (UTC)
Two cents: in Modern Greek, participle is one of the POS, and in the grammars it's listed that way. One class of them is used in forming certain verb tenses. I want to keep the participle header for these reasons. -- ArielGlenn 09:34, 8 March 2008 (UTC)
Before making a decision, the meaning of participle should be clarified. The current definition states: A form of a verb that may function as an adjective or noun. When combined with a form of auxiliary verbs, such as have or be, they form certain tenses or moods of the verb. But this definition does not work for all participles. Take brumassé, a standard past participle in French. This word is used in compound tenses, but I cannot imagine how it could function as an adjective or as a noun (it would not make sense). I propose to change the definition to something like A form of a verb often used to form certain tenses or moods of the verb (when combined with a form of auxiliary verbs, such as have or be) and that often tends to be used as an adjective or as a noun. The important thing is that you cannot generalize. Lmaltier 21:36, 14 March 2008 (UTC)
In Lithuanian, participles are definitely a painfully distinct part of speech. The davylis participles in particular are... well, just check this out for a minute or two and you might get what I mean. They're heavily inflected and function very differently from verbs, and most of the time differently from adjectives. — [ ric ] opiaterein — 11:01, 3 April 2008 (UTC)

New Free Corpus of American English

"The BYU Corpus of American English is the first large corpus of American English, and it is freely available online. It contains more than 360 million words of text, including 20 million words each year from 1990-2007, and it is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts (more information). The corpus will also be updated at least twice each year from this point on, and will therefore serve as a unique record of linguistic changes in American English."--BrettR 15:57, 3 March 2008 (UTC)

That's "free as in beer" only, meaning that results cannot be copied wholesale, but nonetheless is excellent news. I haven't tested it thoroughly, but this has the potential to be a great boon for sense-verification work. -- Visviva 13:36, 4 March 2008 (UTC)
It should proove an excellent reference source. Someone may like to create a R:Reference style template for it.--Williamsayers79 08:27, 7 March 2008 (UTC)

Stylistic recommendations for wikilinks

Is there a written consensus/guide on wikilinks where the target is different than the linking-text? I'm not so much concerned about the general case of [[x|y]] links as they are obviously useful. I am wondering about a the specific shorthand of appending suffixes to the wikilinks so as to link to the "lemma" entry rather than the inflected entry. For example writing [[run]]ning to produce running or [[building]]s to produce buildings. This is really common in Wikipedia where they don't have separate articles or redirects for every word/phrase form. Here in Wiktionary, though, if the inflected entry exists isn't this kind of linking a little gauche as you think you're going to the definition of one word and actually end up somewhere else? I've seen a lot that I'd like to change. Suggestions? --Bequw¢τ 21:26, 4 March 2008 (UTC)

I always use this to get straight to the lemma form, an entry like swooning contains nothing I couldn't guess, however the meaning of "swooning" can be ascertained from the entry at swoon, hence I link <code>[[swoon]]ing</code>. This is either a problem with our form of entries, or is how the world should be. Conrad.Irwin 00:45, 5 March 2008 (UTC)
Agree with Conrad.Irwin. Links should go to the lemma, unless the inflected form is intrinsically important to the meaning. -Atelaes λάλει ἐμοί 01:02, 5 March 2008 (UTC)
I agree with Atelaes and Conrad.Irwin about inflected forms. However, I'm iffier when it comes to trivially derived terms, like nouns in -ity that we define as "the quality of being …y" and so on. —RuakhTALK 01:35, 5 March 2008 (UTC)
The other place where the link is "hidden" is when the wikilink is at the beginning of a sentence or definition that start start with a capital letter, but the entry to be linked does not. Since Wiktionary is case-sensitive, and Wikipedia is not, this is a rather significant difference between the two projects. --EncycloPetey 02:35, 5 March 2008 (UTC) I thought I'd throw this in so that someone might take what folks say in this discussion and create a guide to wikilinks on Wiktionary.
And another thing: It wouldn't hurt to recommend that wikilinks to long lemma entries go to the appropriate Level-2 (for non-English terms) or Level-3 heading (Etymology or PoS). DCDuring TALK 03:27, 5 March 2008 (UTC)

I've started Wiktionary:Links. Please contribute to it. :-)   —RuakhTALK 01:44, 6 March 2008 (UTC)

Us versus Les Grenouilles

One of my favorite restaurants in Paris was frequented in the years following WWII by businessmen from England, who referred to the proprietor as Roger the Frog. The restaurant is long and narrow [2], with tables for 4 on both sides, the walls are covered with photographs and many other memorabilia from the 1940's and 50s. It is now run by Roger's daughters. On my first visit, I ordered ris de veau (sweetbreads) in passable French; the woman went to the other end and said to her sister "the American tourist doesn't know what he is ordering". As she passed my table again, I said, in English: "is the abuse part of the service?" ... it is on the Left Bank, and is called proudly: Roger Le Grenouille (26 rue Grands Augustins)

I set out to find out how and why the Wiktionairre was 20K or so entries "ahead" of the en.wikt. We have ~720K entries counted in statistics, they have ~750K. By reading the XML dumps, I figured I could find out the 20 or 30K entries we didn't have, and then sample them to see what they were. But I got a surprise.

We have 712,823 (3 March) that would be counted in the statistics. They have 744,620. So there are 31,797 entries we are missing, plus some number more because we have a different set of entries. Right?

Turns out that they have 600,508 entries we don't have. See User:Robert Ullmann/en v fr.

A very large percentage are form-of entries, not "real" entries that someone has worked on. Given that some are also form-of entries here, I'd take a WAG that they have < 100K real entries ("base forms"), while we have 406,112 WT:STAT. Which would make a great deal of sense, given the number of contributors, etc. So I'd say forget about chasing fr.wikt, we are way ahead. (Reminds me of the cold war, in which the US built thirty thousand nuclear warheads out of fearmongering that the Soviet Union was getting ahead, when all the time it was desperately far behind ...) Robert Ullmann 14:50, 5 March 2008 (UTC)

Though I like to think this competition is a little friendlier than the stockpiling of nuclear weapons.. Widsith 16:50, 5 March 2008 (UTC)
No, much worse ... "Academic politics are so vicious precisely because the stakes are so small." ;-) (Attrib. Woodrow Wilson, modern form Wallace Sayre Robert Ullmann 13:51, 6 March 2008 (UTC)

For more precise statistics (% of form-of entries), look at the Total line of the table, at fr:Wiktionnaire:Statistiques. But it seems obvious to me that, overall, yes, the Wiktionary is ahead, and it's not surprising. Lmaltier 17:49, 5 March 2008 (UTC)

Am I going to get arrested for treason if I added a grc etymology to an entry on fr? -Atelaes λάλει ἐμοί 17:53, 5 March 2008 (UTC)
No, but if you create an entry, we might have to dig out the old "giving aid and comfort to the enemy" clause. :-) (U.S. Const. Article III at (3): "Treason against the United States, shall consist only in levying war against them, or in adhering to their enemies, giving them aid and comfort.") Robert Ullmann 13:51, 6 March 2008 (UTC)

I've update the analysis, screening out (roughly) the form-of entries. The results are more interesting. More words of interest that we will want to add. Robert Ullmann 13:51, 6 March 2008 (UTC)

I'd reckon that en: has about that many of non-:fr entries as :fr has non-:en entries. --Keene 01:16, 7 March 2008 (UTC)
There is some difference: 189,651 in fr, not in en.wikt; 262,239 in en, not in fr.wikt (keeping in mind that these numbers are based on a very rough screening of form of entries) A lot of the ones fr has are French or Vietnamese, which makes sense.
Do note that all my us v them rhetoric in this section is in fun ;-) Robert Ullmann 14:12, 7 March 2008 (UTC)
This is great stuff. Would it be feasible to generate the entire list without a lot of extra hassle? (I'm particularly interested in the Korean entries they have that we don't). -- Visviva 04:24, 8 March 2008 (UTC)
Never mind, I figured out how to get that data. Fortunately nobody on wikt has started worrying about Korean form-of entries yet. -- Visviva 07:16, 8 March 2008 (UTC)

Wiktionary:What Wiktionary is not

Should the Wiktionary:What Wiktionary is not page be made policy? A vote could be started if so, but I don't think it's a particulary pressing task. --Keene 15:48, 6 March 2008 (UTC)

No, I dislike the formal policy status, it should be unnecessary - particularly for pages like this which exist only because they exist on Wikipedia. See WT:NPOV for a particular example of this problem; that page may not be modified without a vote, yet its contents at the moment simply does not apply to Wiktionary. Conrad.Irwin 16:39, 6 March 2008 (UTC)
Well, WT:NPOV has been modified by vote, to reflect the userbox ban. It could (and should) be modified further, to reflect Wiktionary's unique characteristics. Ultimately, however, neutrality is as crucial here as it is on Wikipedia. We just happen to have a mercifully low level of content disputes, so haven't needed to develop the sort of robust dispute-resolution process which would require a robust NPOV policy. -- Visviva 03:35, 8 March 2008 (UTC)
Sure, this should be a policy if anything is. However, it's no big deal at this point. In practice the real boundary-defining page is WT:CFI anyway. -- Visviva 03:35, 8 March 2008 (UTC)
No, it should not. I can just see it: make WT:WIN and WT:CFI both policy, and then have disputes over slight nuances and shades that seem to contradict one another, until we wind up in the end with both pages essentially quoting one another with no difference in wording, whereupon someone will suggest merging them into WT:CFI and getting rid of the duplicate. Then someone will suggest having a more user-friendly explanation of what Wiktionary is not, and we'll start all over again. (That was with a touch of humor, but still.)—msh210 18:43, 13 March 2008 (UTC)

The inclusionist/deletionist divide

Hi all, one of Wikipedia's great problems is the division between those who wish to include everything and those who wish to include only useful information. It seems to me that the same divide is growing on Wiktionary and I would like to discuss ways in which some of the angst it causes on Wikipedia can be avoided here. Firstly, I am an inclusionist and I can see no reason whatever for deleting anything. (Well except for vandalism and nonce words perhaps).

There are two cases that I feel we can do better than the current situation, and I would like to propose a solution: For proper nouns with no Idiomatic meaning (and thus should fail CFI), we can replace the page with a template that looks similar to MediaWiki:Noarticletext however contains a link to the Wikipedia article. The same can happen for words which do not meet our CFI because of the Independance criteria, however do merit an entry in an appendix - with the template linking to the correct page.

I can see some objections to this, "it turns red links blue", "it messes up our statistics", "it is misleading for the bots/humans", however I still feel the useful service we would be providing to our readers counteracts this. Thoughts? Conrad.Irwin 17:02, 6 March 2008 (UTC)

  • I don't know which one I am (I think I'm an inclusionist). My criteria is simple - does the word / term / combination of letters and symbols etc. have a meaning in the real world. If yes, keep it, if not delete it. SemperBlotto 17:07, 6 March 2008 (UTC)
If we need to create a second-class status to include more entries of the kind that fill the specilized dictionaries and references that fill the shelves of bookstores and libraries, then we should. I can't imagine that there would be insuperable actual technical difficulties although there may be other insuperable difficulties. DCDuring TALK 17:11, 6 March 2008 (UTC)
As a hesitant inclusionist, I agree in parts with both sides. SemperBlotto's view of "does the word / term / combination of letters and symbols etc. have a meaning in the real world" is very ideological. and I have followed that notion before. But I think our RFV system might be a little out-dated. 3 decent Google Books hits was resaonable, but today seems too inclusionist-y. I#m not sure how this could be improved, but it is still lacking something. Much like maybe all these thousands of French verb forms - I'm no expert, but am sure that some declensions would be unfindable on b.g.c.--Keene 01:14, 7 March 2008 (UTC)
Are you saying that we should be a little tighter on lemmas and a bit looser on forms-of? DCDuring TALK 02:52, 7 March 2008 (UTC)
Are there terms that are passing RfV that you think should not be included? -- Visviva 05:59, 7 March 2008 (UTC)
Personally, I'm an inclusionist with a strong we-​have-​rules-​we've-​agreed-​on-​so-​let's-​follow-​them bent (-slash- a strong we-​have-​common-​practices-​so-​let's-​codify-​them bent) and a weak our-​readers-​already-​don't-​understand-​the-​difference-​between-​an-​encyclopedia-​and-​a-​dictionary,-​so-​why-​blur-​the-​line? bent. Overall, this disposes me favorably toward your solution. Regarding the possible objections you mention: (1) If we don't want an entry, then there's no particular need for links to it to be red edit-links (there probably shouldn't be any links to it, period, but between red and blue links there's no advantage one way or the other). (2) As long as we don't include [[ in these pages, I believe it won't affect our statistics. (3) Humans can learn, bots can learn, mirrors can learn, everyone will be happy. —RuakhTALK 03:17, 7 March 2008 (UTC)
Ruakh's rationale sounds good to me. In a perfect world, this {{nothing to see here}} template would also accept unwikified translations as arguments, something along the lines of wikispecies:Template:VN; that would I think dispense with the primary argument made in favor of keeping these entries. -- Visviva 05:59, 7 March 2008 (UTC)
I am an unabashed deletionist (and proud of it). I think this is a reasonable idea, but I would prefer it if the WM message could do a search and make a larger link if found. However, if this is not feasible, I am less concerned by our numbers than by our usability. -Atelaes λάλει ἐμοί 06:22, 7 March 2008 (UTC)
I have created an entry at Isaac Newton to show off the new {{only wikipedia}}, I am not sure what the best way of adding translations to it will be, both in terms of formatting and in terms of coding, so if someone else wants to experiment with that then please do. Conrad.Irwin 11:46, 7 March 2008 (UTC)
That seems like a good idea. It might be better though to have a generic {{otherprojectonly}} template with a parameter for the relevant project. Admittedly I can't think of any occasion it would be used for a project other than Wikipedia, but then I'm not familiar with most Wikimedia projects. Thryduulf 14:23, 7 March 2008 (UTC)
I wonder whether this would be a good way to handle two-part species names: referral to Wikispecies. That would be an argument for genericization or possibly for a parallel template. Wikispecies doesn't have etymology and we have some of the individual words (from Classical through Medieval Latin, less in New Latin). Although perhaps directly linking our redlinks in this area would be even better. DCDuring TALK 16:03, 7 March 2008 (UTC)
I would envisage there being a few of these (probably all using one template behind the scenes), one in particular could go from main namespace to the appendices, and I see no reason why we shouldn't also link to other projects if necessary. We have to be careful to ensure that this is only used for cases when Wiktionary should not have an entry and not used as a quick substitute for getting an entry written. Conrad.Irwin 21:10, 7 March 2008 (UTC)
I'm inclined to agree that binomial (and trinomial) scientific names should be on Wikispecies and not here, as there is not much we can say about them. However, this needs more discussion, as we have welcomed this sort of material heretofore. -- Visviva 03:28, 8 March 2008 (UTC)
The template looks great. -- Visviva 03:28, 8 March 2008 (UTC)
Not bad, now just make it multilingual. Yeah, I'm all for this idea, though I want to be sure that it's still possible to create an entry when the template needs to be overridden. For instance, Google Books has 91 hits for "the Isaac Newton of" as an exact phrase. I think the way the template is set up now (for English Wikipedia) already addresses my concern. DAVilla 22:48, 8 March 2008 (UTC)
Yes, the whole point of the template is for cases where this is not the case. If Wiktionary should have the word then it shouldn't have that template. I am not sure why we would want to link to other language Wikipedias, given that we should be providing information for English readers here. See 毛泽东 for how I think that situation should be treated. Conrad.Irwin 23:36, 8 March 2008 (UTC)
Obviously we should give top billing to the English Wikipedia's article if it has one, but I really don't see why we'd avoid linking to other languages' Wikipedias. BTW, a technical question: is there any way we can get these non-entries to not be indexed by Google? Currently we have <meta name="robots" content="noindex,nofollow" /> on redlinks; we probably want something like that on these entries as well, or at least the noindex part. (Actually, I'm not sure why we have nofollow on redlinks, either, but what do I know?) —RuakhTALK 00:24, 9 March 2008 (UTC)

CFI for languages

So, our mission statement is "Every word in every language!" But, at the same time, we don't include every langauge. For example, we don't include Ionic Greek. However, we do include the Ionic dialect of Ancient Greek. But, how do we divide our languages? Everyone who has studied linguistics knows that a language is only a well marked dialect. How do we decide what's a language and what's a dialect or period? With the more common languages it's generally fairly clear. However, for less common languages and extinct languages, where to draw the line is difficult. To a certain extent, it's not that important. If we didn't include Ancient Greek, but only included Greek, then all the words would still get covered, I'd simply have to include {{obsolete}} in nearly every term I enter. :) However, dividing between Greek and Ancient Greek works well, as it splits words up into groupings which are convenient for both users and editors. So far, the standard we have been following, for the most part, is SIL. SIL does a fantastic job of splitting up the world's languages and it gives us a bit of official credence in our divisions when we follow them. Most importantly, it prevents a lot of waffling and unending arguments. While it works to have Greek and Ancient Greek, and it would probably work to have just Greek, it would absolutely not work to have both randomly from day to day, or some editors doing one thing and some editors another. That would be an awful mess. However, SIL's not perfect. A few editors of obscure languages have noted a few deficiencies in SIL's groupings. So, here's what I'm thinking: I propose that we retain SIL as the standard which we use for making divisions between languages. If a language has a SIL 693-3 code, it gets approved for its own L2 header and for general use on Wiktionary. However, people are free to propose amendments to SIL's decisions. If someone thinks a language which SIL does not recognize should exist on Wiktionary, or perhaps what SIL considers two languages should be treated as one on Wiktionary, they can contest SIL's grouping. So, they start a BP topic on the issue and con a bunch of editors into buying their story. Since languages are the apical sorting method on Wiktionary, I think this is important enough that every single modification should be officiated by a vote, if the BP discussion goes well. Obviously everyone will be convinced in different ways, but I think, in general, an editor should have to write more than five entries to justify changing up the format of Wiktionary like that. What does everyone else think? -Atelaes λάλει ἐμοί 03:05, 8 March 2008 (UTC)

A minor point, but I think it is important to distinguish the SIL (Summer Institute of Linguistics, originally a missionary-training outfit) from the ISO (International Standards Organisation). As far as I'm aware, though, the SIL version is a faithful reflection of the ISO 693-3 standard. -- Visviva 03:20, 8 March 2008 (UTC)
Sorry, but that's not correct. SIL International (formerly known as the Summer Institute of Linguistics) is the official registration authority for ISO 639-3; I don't know whether ISO (English name: the International Organization for Standardization) maintains its own copy of SIL's list. —RuakhTALK 13:39, 8 March 2008 (UTC)
Um, no. ISO has the official list, IS 639-3. SIL is the designated registration authority [[3]] for requests for new codes and changes. Proposals by SIL have to be approved by ISO JTC1/SC2/WG2 and then are published by ISO. (Yes, the 3-letter code list was originally developed by SIL, contributed to ISO, updated, and now SIL uses the ISO codes in the Ethnologue, etc. That's why you'll find different coding in the 14th and 15th editions of the Ethnologue.) Visiva's comment is correct. We should generally just refer to the ISO codes, rather than SIL. Robert Ullmann 14:40, 8 March 2008 (UTC)
According to the ISO Web site, "ISO 639-3:2007 provides a code, published by the Registration Authority of ISO 639-3, consisting of language code elements comprising three-letter language identifiers for the representation of languages."[4] That is, according to ISO, it's SIL that publishes the official list. Your statement that SIL proposals have to be approved by ISO JTC1/SC2/WG2 is interesting — I can't find evidence of that, but will take your word for it — but doesn't seem to be relevant. SIL is the public face of ISO 639-3, responsible both for handling change requests from the public, and for publishing the standard. If ISO keeps its hand in the process, bully for them, but for us they're relevant only in that they gave their imprimatur to SIL. —RuakhTALK 15:12, 8 March 2008 (UTC)
This is very interesting, thanks. I had not looked into the matter properly, and had not been aware of the special relationship between SIL and ISO 693-3. That said, I still object to referring to these as "SIL 693-3 codes" as the OP did (which was all that I meant to object to in my response above). -- Visviva 15:20, 8 March 2008 (UTC)
Yes, I think we should be calling them ISO 639 codes, not "SIL codes" (SIL itself calls them ISO 639-3). (The document in question says "International Organization for Standardization" (and "Organisation internationale de normalisation" :-) on the cover, and the copyright is ISO, not SIL.) As Visiva notes, that is the only point being made here. Robert Ullmann 15:40, 8 March 2008 (UTC)
Oh, yes, agreed, sorry. The name of the standard is ISO 639-3 (note the number, BTW), no matter who is in charge of it. —RuakhTALK 20:24, 8 March 2008 (UTC)
I think it is important that we be open to including as many languages (natural languages) as we can fit, meaning all of them. If a language is only spoken by a tribe of 373 natives in the remotest jungles of Brazil, well, where better to record their lexicon? I am absolutely not an inclusionist in general, but I like the idea of recording 'all words in all languages' quite a lot, (when it comes to real words), so let's get them in here. The sticky wicket comes along when you talk about Level 2 headers, and what should and should not be included in them. I don't think we should go exclusively by the ISO list, but I also don't think we should let anyone who feels like it decide that a language deserves a L2. Perhaps we can get together a few of our true linguists (of whom I am not one) and get some kind of Language Committee or something, a group of people who are willing to put in the legwork and document reasoning behind calling a language a language, and then saying 'Yes, use an L2' or 'No, Appendix only'. I think that a 'vote' is a bad idea, it doesn't represent anything close to objective results, I know because I have voted before. I think getting qualified people to make the recommendations is a better way to go about these sorts of things. - TheDaveRoss 03:35, 8 March 2008 (UTC)
Excellent idea. Some of the discussions on the inclusion of a particular language end up in WT:RFDO, and it would be nice centralize them somewhere formally. --Ivan Štambuk 10:53, 8 March 2008 (UTC)
I agree, and would like to take this opportunity to propose officially that "Hebrew" (he/heb) and "Ancient Hebrew" (hbo) be taken as one language, "Hebrew". The two languages are not without their differences, but most Ancient Hebrew are still considered correct (if odd) in Modern Hebrew, and quotes from Ancient Hebrew texts are still widespread in Modern Hebrew, much like quotes from Early Modern English (Shakespeare, the KJV, etc.) are still widespread in English today. (This has previously been discussed, and appeared to have consensus; I re–bring it up now only because you're proposing an official mechanism for deciding these things, and I'm not a huge fan of implicit grandfathering.) —RuakhTALK 13:39, 8 March 2008 (UTC)
While I have absolutely no problem with this proposal, I request it not take place in the middle of this particular thread. This is meant to be a discussion of whether we can deviate from 693-3, and if so, how. If we throw every language proposal here it will be unteneable. Besides that, we really should figure out what the policy is for these changes before we make them. -Atelaes λάλει ἐμοί 19:01, 8 March 2008 (UTC)
O.K., sorry. In my own defense, the SIL-vs.-ISO discussion above takes up a lot more space. :-P —RuakhTALK 20:24, 8 March 2008 (UTC)
Indeed it does, but it's almost sort of vaguely related to the topic at hand. More importantly, I figured it would fizzle out as it's such a minute detail. ISO 639 is the official standard. SIL is the organization charged with producing that standard and one of the easiest places to find out what the standard is. Whatever. -Atelaes λάλει ἐμοί 20:38, 8 March 2008 (UTC)
I run into problems with the differences between what the "WMF language committee" and ISO use. For example, Min Nan is "nan", but WMF created the Min Nan wikipedia using "zh-min-nan". I have been trying (not terribly hard ;-) to find out who the committee is, and possibly get on it. There are definitely codes we need to add, probably mostly as subcodes (fiu-vro for Template:fiu-vro), but they should get WMF-wide coordination if possible. At least others should know what we are doing. For languages that need separate codes, we should find out if there is a proposed -3 or -4/-5 code; if not we should be contributing our findings to ISO TC37/SC2 via SIL. Robert Ullmann 15:40, 8 March 2008 (UTC)
That would be the m:Language subcommittee. It's worth noting that the language proposal policy specifically requires that new proposals have an ISO 639-x code, but this seems to be honored more in the breach than the observance. -- Visviva 13:08, 9 March 2008 (UTC)
The proposal seems good, as I understand it; i.e. that we should follow ISO 639-3 except when it is in the project's interests to do otherwise, and that such exceptions should only be approved upon thorough community-wide deliberation (WT:VOTE or equivalent). -- Visviva 13:08, 9 March 2008 (UTC)
As I think about it, TheDaveRoss's idea seems better and better. As evidenced by two of the following threads, this issue is one which will appear over and over, with increased frequency as time goes on. Inasmuch as overall community consensus is the best way to decide things when possible, I just don't know if its feasible for the community to make this decision every time it comes up. Most of our editors simply don't have the time to read a ten page Beer Parlour discussion and weigh all the political, historical, and linguistic controversies present. I think it will work much better if we pick two or three solid, well-respected editors with a good track record and give them dictatorial powers on the issue. I know everyone cringes at the very mention of the word "dictatorial", but we've already tried this method (with WotD), and it's turned out pretty damned well, as far as I'm concerned. -Atelaes λάλει ἐμοί 21:23, 11 March 2008 (UTC)

I don't know about these constructed languages, as certainly some such as Esperanto may be legitimate as a native tongue, while others are so obscure that not a single word in the language would pass CFI independently for three authors, so there is definitely a gray area for those. But forgetting about artificial languages for a moment, at least I can be certain that if it flows out of someone's mouth or pen and conveys a message that is understood by others then it is a word in some natural language, though I may not know which one. So in some sense I see the question as to which language a word belongs to be completely independent of the question of whether it belongs on Wiktionary. If it's spoken or written and it can be attested then it belongs in a project that documents every word in every language, and if there is controversy over which language it belongs to then the heading may change and change back and change again, but the substance will remain. 71.129.48.8 06:46, 13 March 2008 (UTC)

Category:Old Korean language

This category currently contains (only) Hangul reconstructions. This is absurd, since even the most generous definitions of Old Korean define it as Korean in use before the introduction of Hangul. All extant OKO texts (which are very few) use Chinese and/or gugyeol characters. I am seeking clearance to move all entries herein to their attested forms, if any, and delete the redirects. No objection to Appendix:Reconstructed pronunciation of Old Korean if anyone wishes to create it. -- Visviva 09:38, 8 March 2008 (UTC)

That seems reasonable to me. Thank you. I was wondering what I was going to do with that. You've saved me a great deal of stress. -Atelaes λάλει ἐμοί 09:39, 8 March 2008 (UTC)

Grammar gurus in the house? See entry for "hence"

Hey, I'm not a grammar guru, but I think there's a comma splice in the examples for hence. I left a note on that page's talk, but assume it will be ignored. 163.28.49.4 12:45, 9 March 2008 (UTC)

This belongs in the Wiktionary:Tea Room. I have removed the comma, though do note that you can fix things like this yourself using the edit button at the top of an entry. Conrad.Irwin 12:52, 9 March 2008 (UTC)
Wasn't sure if I was right, so didn't edit.. plus didn't know the diff b/w a tea room and a beer parlour. :-) Thanks! 163.28.49.6 13:01, 9 March 2008 (UTC)

Naming of categories of non-English proverbs and idioms

Currently, there is a mixed naming of categories for non-English proverbs and idioms, like:

These seem to stem from two different sources of modeling and imitation:

Depending on whether idioms and proverbs are considered more like the items in the first group or the second group, the naming of proverb and idiom categories could be chosen, fixed in policy, and then the categories could be renamed. Any opinion on this from the policy makers? --Daniel Polansky 08:47, 10 March 2008 (UTC)

Actually, in the case of Mandarin, we have both (Category:Mandarin proverbs, Category:zh:Proverbs, Category:Mandarin idioms, Category:zh:Idioms)! There's no reason that it has to be an "either/or" choice. Furthermore, provided that the contributor uses the inflection templates recommended by WT:AC (Template:cmn-proverb and Template:cmn-idiom), it involves no more work for the contributor than a single category. Incidently, the benefit of having both categories is that you can use it as an opportunity to provide different sort options (as well as options for which entries to include in which category). -- A-cai 10:06, 10 March 2008 (UTC)
I see. I did not notice there were both types of categories for Mandarin. I have now noticed that there is also the category Category:zh:Nouns, and that Category:zh:Proverbs is further split into subcategories while Category:Mandarin proverbs is not.
I see the benefit of less confusion, for me anyway, if there is only type of naming scheme for categories. As regards the downsides of one type of naming scheme, unfortunately, I do not understand what you mean by "different sort options"; do you think you could explain that to me?
Is there any other language using both naming schemes, or is this just Mandarin?
Do you expect that all the languages should have both categories, like having Category:cs:Proverbs and Category:Czech proverbs? --Daniel Polansky 12:35, 10 March 2008 (UTC)
It seems to me that, for other languages but Mandarin, having a consistent naming scheme would be valuable. --Daniel Polansky 12:35, 10 March 2008 (UTC)
We do have a consistent scheme, except for Chinese languages and a handful of categories that have not yet been cleaned up to standards. The two forms of category name (one for parts of speech, the other for topics) is a deliberate and consistent distinction. --EncycloPetey 17:01, 10 March 2008 (UTC)
So is it correct that there should be Category:German proverbs and not Category:de:Proverbs? And can I move Category:fr:Proverbs to Category:French proverbs? Connel seems to have a different view, judging from the tags he added to Category:French proverbs and Category:German proverbs. --Daniel Polansky 18:35, 10 March 2008 (UTC)
See the ongoing discussion at Wiktionary:Requests for deletion/Others#Category:French proverbs. --EncycloPetey 19:24, 10 March 2008 (UTC)

Norwegian language classification

I have been engaging in conversations with a couple of users about what to do with Norwegian, and it has gotten to the point where I figure it would be good to have the larger community's input into the subject. The issue centers around Bokmal and Nynorsk. The introductory section of the Wikipedia article on Norwegian sums it up rather nicely. Please do the background reading there if you are not already acquainted with the subject. The two conversations can be found at User talk:EivindJ#Norwegian Questions and User talk:Robert Ullmann#Norwegian language templates. So, there are a number of different ways we can sort between these types. I am advocating following the Norwegian Wiktionary. What this would entail is having the L2 header "Norwegian" for all words which are used and spelled the same in both Bokmal and Nynorsk. Any word which only exists in (or has a usage unique to) Bokmal receives the L2 header "Norwegian (Bokmal)" and any word which only exists (or has a unique usage to) Nynorsk receives the header "Norwegian (Nynorsk). I am as yet, unsure of how the categorization would work on that. Another option is to put all Norwegian only under the L2 "Norwegian", and then use Nynorsk and Bokmal as context tags (i.e. treating Bokmal and Nynorsk as dialects, instead of languages). A third option which has been advocated is to equate Bokmal with Norwegian (i.e. Bokmal terms go under the header "Norwegian") and treat Nynorsk as something else entirely, going under the header "Nynorsk." Thoughts? -Atelaes λάλει ἐμοί 00:22, 11 March 2008 (UTC)

Prior to having done all the background reading, my initial thoughts are the simplest to explain and the easiest in terms of categorisation would be either your second option (treating Nynorsk and Bokmal as effectively dialects); or one based on your first option but where the header "Norwegian" is not used, where words are the same in Nynorsk and Bokmal the page would have two L2 sections "Norwegian (Bokmal)" and "Norwegian (Nynorsk)". Your first option would follow this format if there are homographs?
In terms of translations into Norwegian of English words, I'd recommend using the same format as Serbian does for the different scripts. Thryduulf 01:08, 11 March 2008 (UTC)
Yes, the more I think about it, the better I think it is to simply use "Norwegian" and then use context tags. This solves the problem of sorting (as words would go into [Category:Norwegian POS's], as well as [Category:Nynorsk], etc.). Additionally, this provides an intuitive format for any other dialects we want to include in the future. -Atelaes λάλει ἐμοί 01:29, 11 March 2008 (UTC)
I'd prefer either (1) to separate them completely, with "Norwegian Nynorsk" and "Norwegian Bokmal" being valid L2 headers, and "Norwegian" alone being invalid, or (2) to treat them as a single language, "Norwegian", with two regional variants, like we do with U.S. and U.K. English. I don't like the idea of a half-and-half approach where we'd have three L2 headers for two forms of arguably one language; and I'm only O.K. with "Norwegian" vs. "Nynorsk" if we can also give U.K. English the boot — say, "English" vs. "British". Or maybe "Limey". ;-) —RuakhTALK 01:33, 11 March 2008 (UTC)
Remember, English is English: it is "American" that would be given the boot. English and USpeak. ;-) Robert Ullmann 15:56, 11 March 2008 (UTC)
Actually, in this situation British English resembles Nynorsk much more. The reason why some consider Bokmål more "properly" Norwegian is simply because it has a distinctly larger population. And if we're talking about population, American English definitely has British English beat. -Atelaes λάλει ἐμοί 21:11, 11 March 2008 (UTC)
Tee-hee: everyone says this as if it is gospel, because it is so "obvious". The actual numbers show Commonwealth English over American about 3:1. Sure, people hear a lot of American on TV, but if you write "color" in your schoolwork you will lose marks for spelling ;-). Robert Ullmann 12:36, 13 March 2008 (UTC)
To refer to Bokmål and Nynorsk as dialects is a misinterpretation, I am afraid. It is wrong to think about these two as dialects, or simply as if they were Norway's answer to British and American English. They are two equivalent languages, by law, and differ almost as much as Norwegian and Danish in it's written form (not verbally). When Robert Ullmann utters that "The only issue is a few words that can't be in any way considered Nynorsk" he is sadly mistaken. There is heaps of words that in no way can be considered bokmål and the other way around. There is not for fun that we have one Nynorsk Wikipeda and one Bokmål/Riksmål Wikipedia. I reckon, and some other no.-admins with me, that "Norwegian (Bokmål)" and "Norwegian (Nynorsk)" is most correct, but I will give notice to the Norwegian Wikipedias' fellowships and make them come with their thoughts. Thanks (: --EivindJ 07:27, 11 March 2008 (UTC)
Using "Norsk" and "Nynorsk" is wrong, as it gives the impression that nynorsk is less "norsk" than bokmål is. I am a bokmål user myself, but I would dislike seeing nynorsk discriminated in such a way. I think that either you should use two different L2 headers, or you should use the same method as with UK/US English (though i think the differences between bokmål and nynorsk are both bigger and more numerous than the differences between UK English and US English). - Soulkeeper 07:57, 11 March 2008 (UTC)

A symmetrical solution has to be used. The option “Norwegian (bokmål)” vs. “Norwegian (nynorsk)” is probably the best one. As for the two ofen having similar forms, that goes almost equally much for the two of them vs. Swedish (esp. nynorsk) and Danish (esp. bokmål). -- Olve Utne 12:09, 11 March 2008 (UTC)

Look, we have to have Norwegian as Norwegian. (Else everyone will be saying forever "I can't find Norwegian"). We then have the same case as with other languages, the most common/standard/default language (from an English POV; this is the English wikt ;-) gets the name. Then we have words that people will insist absolutely must be identified as only Bokmål (which we call Norwegian ...) and others that are Nynorsk.

ISO codes 3 different languages:

  • no = nor = Norwegian
  • nb = nob = Norwegian Bokmål
  • nn = nno = Norwegian Nynorsk

(Technically, we have a few constraints that must be observed: the names must match, e.g. no = nor, and the names must not contain parens or be partly linked. But that isn't a problem.)

I had/have it set up as:

  • no = nor = Norwegian
  • nb = nob = Norwegian Bokmål
  • nn = nno = Nynorsk

Note there is only one difference from ISO (and the Norwegian Government recommendation for the names of Bokmål and Nynorsk). I don't see a problem with "Nynorsk" -> "Norwegian Nynorsk" The problem will be keeping the majority of the language in Norwegian where people expect it to be. See euro. Robert Ullmann 15:56, 11 March 2008 (UTC)

Even if the translations are the same, keep the two different headers. One could always put a bigger header atop called Norwegian, then have two sub-headers: one for bokmål and one for nynorsk. To suddenly have one header, whereas there usually is two, is going to confuse more than anything else. --Harald Khan Ճ 16:16, 11 March 2008 (UTC)
no, we don't use subheads like that. (See WT:ELE, header levels are very significant.) And there won't be "suddenly" one header; there will be almost always only one header (Norwegian), and only occasionally two. Note that the case at euro is instructive, it occurs only when the word is the same and the inflection different (yes, I know this occurs with some frequency), it could be handled right in the inflection line, and save having two sections. Robert Ullmann 16:19, 11 March 2008 (UTC)
OK. Still there should be two different headers: Norwegian (Bokmål) and Norwegian (Nynorsk). No official written language is called Norwegian. It is discriminating to hint that the inflection of Bokmål is more Norwegian than that of Nynorsk or vice versa. --Harald Khan Ճ 17:28, 11 March 2008 (UTC)

The WP article is not clear on whether the languages are mutually intelligible. Are they? If so, I see no reason at all to split them here on enwikt. If not, then they should be different L2 sections even where the contents of those sections would coincide. What to call them is then another question, and I don't like any of the solutions, to be honest.—msh210 17:36, 11 March 2008 (UTC)

All the four langugages, Danish, Swedish, Bokmål and Nynorsk is mutually intelligible. --EivindJ 17:49, 11 March 2008 (UTC)
While I don't have much background in linguistics, I thought that that was the criterion on which linguists decided to consider dialects languages. Am I wrong? Or are Danish, Swedish, Norwegian, and Norwegian considered one language by linguists? Or is this an exception for some reason?—msh210 17:53, 11 March 2008 (UTC)
I don't have any background in linguistics either, but I strongly doubt that they are considered as one language. --EivindJ 17:59, 11 March 2008 (UTC)
For more on this, see w:Dialect, w:Dialect continuum#Scandinavian_languages, and w:Ausbausprache - Abstandsprache - Dachsprache.—msh210 18:58, 11 March 2008 (UTC)
Note that there are already a number of languages that are separated here, and that are mutually intelligible and naturally form a dialect continuum, like Croatian/Serbian, Hindi/Urdu, Moldovan/Romanian, Macedonian/Bulgarian... So this kind of separation for Norwegian wouldn't be a precedent, but a continuation of a common practice. --Ivan Štambuk 19:11, 11 March 2008 (UTC)
Mutual intelligibility has nothing to do with a definition of a 'language'. Natural languages are not like biological species that that there is a hard line cut between them, that prohibits mixing, at the DNA level. Such kind of analogy a grossly misleading simplification. They are exclusively defined by national committees, and in this particular case NLC recommends "Norwegian Bokmål" and "Norwegian Nynorsk" terms respectively. There's no reason to enforce politically incorrect terms into L2 section names when there such clear alternatives that all native speakers agree on. nor/no itself is a macrolanguage code, not individual language, and these normally don't get included at all. --Ivan Štambuk 18:48, 11 March 2008 (UTC)
Actually, they are very much like biological species. Because, you see, some species can (and do) interbreed, while other species form a continuum wherein which some members can interbreed and others cannot. The various species of oaks regularly form fertile offspring, and many orchid genera can form hybrids with enough regularity that some of these hybrids have their own names. There is a ring species of birds in the northern hemisphere around the pole, where individuals at wither end of the bird's range cannot interbreed, but interbreeding happens everywhere in between over short distances. It is a myth that biological species cannot ever interbreed, and, in most cases, there is no experimental data verifying that two species cannot interbreed. So, the interbreeding of species is very like the intelligibility of language, that is: thoroughly muddled. --EncycloPetey 01:16, 12 March 2008 (UTC)
Nice link on ring species, read about it in Richard Dawkins - 'A Devil's Chaplain' (ingeniously written book :). However, obstacle that regulates interbreeding of species has been discovered recently at the DNA level. Escherichia coli and Salmonella typhimurium, that evolution separated > 150 million years ago and have ~20% mismatch in DNA, have proven to be compatible under some circumstances. This means that the barrier between species is very discrete, and this does not occur in natural languages in which lexemes are tossed in all directions and adapted. --Ivan Štambuk 19:14, 12 March 2008 (UTC)
A famous geneticist once remarked "If it's true for E. coli then it's true for elephants." Since that time, the statement has proved false many times. There are many ways in which the genetics of bacteria and elephants are very, very different. There is not just one single factor that regulates interbreeding of species, there are many, many different mechanisms that can and do come into play. Consider that species with separate males and females automatically limit the possibilities of pairings that will result in fertile offspring, even if the DNA itself is 100% compatible. In some species of plants and fungi, there are single allele mating cofactors that function in the same way to control mating type. And even when species cannot mate themselves, there are viruses that act as agents transferring DNA laterally from one species to another, just as happens with languages. Language has not been around so very long, compared with the age of biological species, so intercompatibility and hybridization is to be expected. --EncycloPetey 03:33, 13 March 2008 (UTC)

The solution used in euro is not good since it strongly implies that Bokmål is Norwegian and Nynorsk is something else than Norwegian. I don't understand why it is desirable that en.wikt should imply something like that just because some users here prefer the one language before the other when it comes to what is Norwegian. There is no doubt that the two of them are independent languages (a quick call to the Norwegian Language Council should prove me right) ... none of the two is closer to spoken Norwegian (almost everyone speaks something in between). I do also not understand why users who don't have proper acquaintance with both of the languages can utter statements that implies that they are extremely similar and that partition only is necessary in a few cases. With all respect, it isn't out of national sentiment that we argue for having a clear distinction between the two. --EivindJ 18:15, 11 March 2008 (UTC)

Agreed. Robert's proposed format is untenable. Since all of our Norwegian friends seem to insist that we must make a language level distinction between the Bokmål and Nynorsk, that seems the way we must go. However, this means a few things: "Norwegian" as a language header is out. All Norwegian entries must go under "Norwegian Bokmål" or "Norwegian Nynorsk." However, we must keep the the word Norwegian in these headers, because, as Robert rightfully notes, otherwise people will be complaining that they can't find Norwegian. Certainly we will have a lot of duplication (i.e. a lot of entries with both "Norwegian Bokmål" and "Norwegian Nynorsk" headers), but we already have a lot of duplication with Scandinavian languages anyway. This also means that every single entry which currently uses the L2 "Norwegian" (according to WT:STATS, there's about 3,000 of them) must be changed. Obviously we should wait to get an official consensus (a vote is really required), but that looks to be where we're heading. -Atelaes λάλει ἐμοί 20:03, 11 March 2008 (UTC)
So you mean it will not be sufficient to name all words who are equal in spelling and meaning in both languages as "Norwegian", and only use the spesific headers for the words that differs from the one or the other? --EivindJ 20:43, 11 March 2008 (UTC)
The more I think about it, the more I have to say no, it isn't possible to do that. It's simply not the way we do things here. We don't have varying levels within our language headers. We can treat Norwegian as a language, or a language family containing Bokmål and Nynorsk, but not both. If we use Bokmål and Nynorsk as language headers, that makes Norwegian a language family, and we don't put language families in L2 headers. Ultimately, there is no qualitative distinction between language and well marked dialect, except the distinction of politics, as Ivan mentioned. Because of this we generally divide stuff in the way which will be most useful to our readers and easiest on our editors. However, the politics of living languages sometimes forces our hand. I will say that treating Norwegian as a language and Bokmål and Nynorsk as dialects would be easier to edit (and probably more useful to our readers), but if degrading the two to dialects would cause us uproars, it's not worth it, and we'll have to treat the two as distinct languages. -Atelaes λάλει ἐμοί 21:08, 11 March 2008 (UTC)
Thanks for the explanation. The way I see it there is not much other options than what you describe here; at least if we're going to do this properly. --EivindJ 21:27, 11 March 2008 (UTC)
Ok, can everyone who cares about the issue state whether they can accept the following two languages on Wiktionary: "Norwegian Bokmål" and "Norwegian Nynorsk," with "Norwegian" being a deprecated language. If this seems to be acceptable after a few days, we'll put it through a short vote, and allow you Norwegian folks to get back to work. :) -Atelaes λάλει ἐμοί 21:34, 11 March 2008 (UTC)
I accept that. And hopefully a bot (maybe AutoFormat?) can start s/Bokmal/Bokmål/g-ing. —RuakhTALK 23:21, 11 March 2008 (UTC)
No, they must be "Norwegian" (no) and "Nynorsk" or "Norwegian Nynorsk" (nn). We can not deprecate "Norwegian". Note that this is the WMF standard: we have no.wp and nn.wp, and no.wikt and nn.wikt, and so forth. (Also, nb would exclude Riksmål, which creates another utterly un-necessary problem.) Robert Ullmann 13:30, 12 March 2008 (UTC)

That's a little tricky. If most of the entries are from a single editor, and we find out that they've only been working with Bokmål, that might work. However, the best thing would be for real live people to go through these by hand and figure out whether the word is B, or N, or both. -Atelaes λάλει ἐμοί 23:33, 11 March 2008 (UTC)

If there is no label, then it will require human attention. I think Ruakh's suggestion, however, was about changing instances of "Bokmal" to "Bokmål" (i.e. changing the second vowel from "a" to "å") which is certainly automatable. Certainly changing it for new/edited entries would fit will with AutoFormat's work. Thryduulf 00:06, 12 March 2008 (UTC)
In principle, I am opposed to the idea of splitting the two into separate L2s, as I have been saying to Atelaes and Robert Ullman. Bokmål and Nynorsk are not different languages; they are two differing written standards based on differing dialects (but by no means necessarily faithfully reflective of any one dialect) of the same language. It is thus my preference that they should be unified under the same L2 header (Norwegian) and any (all) standard-specific terms indicated with context tags. I would also see Riksmål- and Høgnorsk-specific terms (and anything else we can think of) likewise indicated, where they exist, and are appropriate and relevant. However, if it's deemed more useful by the community to split them into separate L2s (although it does seem rather like duplicating work), I will not object, so long as it's split as indicated by Atelaes above. In addition, we need an About Norwegian page (either containing information on both standards or linking to separate pages with a note as to why) for those who just don't get the idea of there being no Norwegian language header. Release the shoats! --Wytukaze 01:31, 12 March 2008 (UTC)
As you say, a tag might be better, except that Nynorsk is treated as a separate language and code from Norwegian across projects. (Think about iwiki links, wp links, etc. etc.) This is already extremely well established, which is why I am very annoyed with Atelaes for creating a large discussion on BP about something that is not going to change anyway. You can go around in circles forever on this, and it will all come back to cleaning up entries so that the headers are "Norwegian" and "Nynorsk" (or "Norwegian Nynorsk") This is a solved problem. Creating a very large mess from a solved problem is not productive. I will be trouting Ateleas when I can catch him on IRC ;-) Robert Ullmann 13:30, 12 March 2008 (UTC)
I would like to know why users continously comes with utterings like "Bokmål and Nynorsk are not different languages; they are two differing written standards based on differing dialects". This isn't clear facts. Nynorsk is based on all Norwegian dialects together while Bokmål is based on the old standard strongly influenced from Danish. And as a fact I can state: "They are two independent languages". I note that several users by the way they express themselves are implying that these two are very close to each other, and not at all two languages. The iws to other articles on Wikipedia use "Norsk (Bokmål)" and "Norsk (Nynorsk)", so why do Robert want "Norwegian" to mean "Norwegian (Bokmål)" when it it means both? The only reason why "no" is used instead of "nb" (bokmål) on no.wiki is because no.wiki first was for both lanugages, but after a while "nn" got it's own wikipedia, but "no" was never changed to "nb". --EivindJ 13:42, 12 March 2008 (UTC)
I concur that Nynorsk should be separate. What I am saying is that the header here for no must be "Norwegian" and we must use no and nn because all the rest of WMF does (we can't use "nb" for Bokmål without breaking everything in sight). Note that no is then Norwegian-not-Nynorsk, as you point out is the established usage in WMF for (e.g.) no.wikt, nn.wikt and so on. And we simply can't use "Norwegian Bokmål" for what in English is called "Norwegian". (NLC POV, etc, etc, notwithstanding.) As I have pointed out, this is a solved-set problem; nothing to see here. Robert Ullmann 13:53, 12 March 2008 (UTC)
Yes there is a major problem here: Despite what you imply, there is no written language called Norwegian; only Norwegian (bokmål) and Norwegian (nynorsk).
If bokmål is best left under the no template, then change it into Norwegian (bokmål). Wiktionary's accuracy regarding the Norwegian language starts here. --Harald Khan Ճ 16:35, 12 March 2008 (UTC)
Just to clarify it further: making one header called Norwegian and one called Nynorsk is like under the Wikipedia entry of Norway to present the official languages of Norway as "Norwegian" and "Nynorsk", which is nothing but a factual mistake. --Harald Khan Ճ 16:48, 12 March 2008 (UTC)
Umm....I'm seeing Norwegian (Bokmål and Nynorsk). -Atelaes λάλει ἐμοί 16:55, 12 March 2008 (UTC)
As you should. That is opposed to Norwegian AND Nynorsk. The term Norwegian does exclusively refer to the Norwegian language as a whole including all dialects or both written languages. --Harald Khan Ճ 20:02, 12 March 2008 (UTC)
While I'm sure this will prolong my trouting, I must say Robert, that you've never offered any convincing reason why we can't use "Norwegian Bokmål" and "Norwegian Nynorsk." If the angstrom will break bots, perhaps we can substitute an "a" for it (and note the deficiency in a few places). It really can't be a matter of users not being able to finding Norwegian, as Norwegian's included in there. The worst case scenario there is that people will wonder what Bokmål and Nynorsk mean, and educate themselves on the subject (a situation which is, admittedly, completely opposed to WM values :-)). If it's out of concern for following WM precedent, then a quick look at the main page of Wikipedia solves that. The interwiki links to both Norwegian 'pedias are titled ‪Norsk (bokmål) and ‪Norsk (nynorsk). We don't seem to have an interwiki on en wikt for the nn wikt (it looks like a fairly new wikt). You say that using nb would break stuff (such as t-bot, I presume), then we can use no and title it Norwegian (Bokmål). If it's simply your odd POV in considering Nynorsk to somehow be inferior to Bokmål or just plain not a part of the Norwegian language, then I guess I'm not worried, as you seem to be singing a solo tune on that one, and a vote will put an end to that (the tyranny of the majority can be a nice thing when you're in the majority). -Atelaes λάλει ἐμοί 16:43, 12 March 2008 (UTC)
(note that the previous entirely misrepresents my "odd POV". If Språkrådet itself can call it (just) "Nynorsk" in its literature, it is at the very least not wrong ;-) Robert Ullmann 13:00, 13 March 2008 (UTC)
I'll say it again. Bokmål and Nynorsk are not different languages and it is misleading to refer to them as such. They are, and I have never said otherwise, (add 'completely' if you wish) separate written standards of the same language. They are similarly not independent; they do influence each other and are influenced by speech (and this is, if you'll forgive me for saying so, rather obvious). The spoken language, as it happens, is not separated into the two standards of Bokmål and Nynorsk; yes, there's some correlation, but speech is split among the various dialects, some of which correspond more or less closely with one of the written standards or with the Swedish standard, and on and on and on. Many languages have considerable dialectal variation in speech and Norwegian is in no way special in this area. What is special is the written situation, but a written standard does not constitute a language and a language can have more than one written standard. It is not difficult to do. This is, as I believe has been stated, a different matter to the differences between US and UK spelling differences, true; it is, however, comparable with that, the pluricentric standardisation of German, the Cyrillic and Latin orthographies of Serbian and the situation with modern Welsh, where the written, literary standard differs considerably from modern dialects. I do believe useful information on all of those can be gleaned from Wikipedia. As such, it is still my opinion that we should not be splitting the two written standards into separate L2s. However, as we and the WMF as a whole already do split them, we should continue to name both Norwegian and specify which standard we are referring to at every instance. That is, I firmly do not support naming one "Norwegian" and the other "Nynorsk" or even "Norwegian Nynorsk". They are both equally Norwegian, and Atelaes' proposal that we use "Norwegian (Bokmål)" and "Norwegian (Nynorsk)" shows this and the secondary nature of writing in language admirably. --Wytukaze 20:15, 12 March 2008 (UTC)

Okay people ... have we discussed this to death? (note: I only used "Nynorsk" itself because "Norwegian Nynorsk" seemed like a redundant pleonasm. Språkrådet (NLC) itself uses just "Nynorsk" in its English language literature.)

How about we simply follow the standards? ISO, SIL, NLC, the de-facto setup of WMF, the setup of the no.wikt, and the status quo pro ante here? Eh? Especially because they all say the same thing?

ISO/SIL codes:

  • no = nor = Norwegian
  • nb = nob = Norwegian Bokmål
  • nn = nno = Norwegian Nynorsk

Språkrådet (NLC, Norwegian Language Council) recommends that in English the two written forms be called "Norwegian Bokmål" and "Norwegian Nynorsk" when they need to be distinguished. (Notice no parenthesis; we don't want parens in L2 names anyway.)

WMF uses no = Norwegian in the project naming, (although this has become usually more Bokmål as the Nynorsk projects were added).

The no.wikt uses no = Norsk, nb = Norsk (Bokmål), nn = Norsk (Nynorsk) (see no:Mal:=no=, no:Mal:=nn=, no:Mal:=nb=), using no = Norsk for most entries, and the other two when they need to be distinguished:


from the no.wikt, as of 6 March 2008

Header Occurs
no 674
nb 72
nn 64


We use no = Norwegian, and have been using the other two (with some variations in form) when they need to be distinguished. (A very small number at this point, L2/invalid shows exactly five.) Note that Norwegian not Bokmål or Nynorsk is no = Norwegian in all the standards; we presumably want some context tag or usage notes in these cases.

As I've said above more than once, there isn't any problem here; all the standards and the de-facto setup(s) agree. There are just a few entries that need fixing, which is where all this [redacted] started.

See mandag and måndag, compare no:mandag and no:måndag. Robert Ullmann 12:29, 13 March 2008 (UTC)

And note that all this is pretty much exactly what Atelaes said at the top of this section, which is why I want to trout him for raising essentially a non-issue and creating a lot of sound and fury. Robert Ullmann 12:36, 13 March 2008 (UTC)
Note that using en.wikt and no.wikt as examples for how often we need to distinguish between the two is not good. A quick look through Category:Norwegian language tells me that there are heaps of words that needs to be changed into either nb or nn. However, I understand it so that Robert says we can have the headers "Norwegian", "Norwegian Bokmål" and "Norwegian Nynorsk". If that's correct then everything's ok for me. I just don't want to see "Norwegian" for "Norwegian Bokmål" and "Nynorsk" for "Norwegian Nynorsk". --EivindJ 13:49, 13 March 2008 (UTC)
This is my first comment on Wictionary; I don't know the technicalities. But I know American English, Nynorsk and Bokmål - in that order. It is totally unacceptable to in any way suggest that Bokmål is more Norwegian than Nynorsk is.
It should perhaps also be pointed out that the differences between the two include more than just words; there are grammatical differences as well. --Hordaland 22:40, 17 March 2008 (UTC)

Please note that the use of the language code “no:” for “Bokmål” on Wikipedia is a leftover from the time when no.wikipedia.org included both Bokmål and Nynorsk. This code has been kept as the main one, with nb being a redirect, as a “compromise” — to facilitate good diplomatic relations between the Bokmål and Nynorsk wikipedias. To use this compromise to misrepresent the language names here would be a mistake — regardless of what Mr. Ullmann’s feelings are. -- Olve Utne 19:09, 19 March 2008 (UTC)

Hmmm.....this could be a sticky issue. The simple fact is that we have a lot of bots running information back and forth between wikt's, such as User:Tbot. The work they do for us is invaluable. Because of this fact, we may be stuck retaining this incorrect usage simply because the Wiktionaries themselves retain it. What we really need is a comment from Robert Ullmann on whether it would be possible to use nb and nn, and still have the bots function properly when the Wiktionaries are no and nn. While I strongly disagree with him on how to treat the Norwegian languages, there is no denying that he is easily the most knowledgeable editor on this particular aspect. If this screws up the bots, I think we may have to retain no and nn until such time as the Norwegian Wiktionaries themselves make the appropriate switch. -Atelaes λάλει ἐμοί 19:47, 19 March 2008 (UTC)
Words that are not different between Bokmål and Nynorsk remain in no/Norwegian. The variant codes and names are only to be used when there are differences. Specifically: only when there are corresponding entries which are differently spelled, and refer to each other. Understand that the no/nb/nn distinction, however real, was a political result of the ISO 639-1 process, which was intended to produce a stopgap coding until something better could be done. The differences between Bokmål and Nynorsk are very small compared to the differences in English, which we code and represent as one language. The differences in Norwegian between dialects and regions are much larger than the Bokmål/Nynorsk written form representation. (A serious argument could be made that Nynorsk is (yet another ;-) 19th century spelling reform that has now failed....)
However, given that Nynorsk is coded, and has at least some people who want to represent it, it is very reasonable that we include it and document it. At the same time, forcing all of the rest of Norwegian into the "Bokmål" pigeonhole is not acceptable. Most terms in spoken and written Norwegian are just that: Norwegian.
So we use no/Norwegian for most of the language, quite properly; nb/Norwegian Bokmål when, and only when it must be distinguished from Nynorsk, and nn/Norwegian Nynorsk for those words that must be distinguished from Bokmål.
And yes all this works correctly with the automation, which has long since changed "nb" to "no" for the iwikis. Robert Ullmann 01:13, 20 March 2008 (UTC)
I hesitate to comment that all written languages are, in fact, different and often divergent from their spoken counterparts. That said, the only slight modification to Robert Ullmann's reasonable offering is that when nn/Nynorsk is used is exactly the same as when nb/Bokmål would be used: when either is variant. - Amgine/talk 02:31, 21 March 2008 (UTC)
Of course; we often document the spoken and written differences, and the frequent derivation of one from the other. (An interesting case is German pfui > pfui > spoken form > phooey ;-) And yes, you can look at it either way 'round ;-) Robert Ullmann 08:52, 23 March 2008 (UTC)

Please do not let yourselves be mislead by Mr. Ullmann’s current “solution”. (What are his qualifications in this matter, by the way?) It does not work in practice without some major tweaking of templates etc. to bridge the very frequent differences in grammar — see dag#Norwegian, where the common Scandinavian word dag is presented under the header “Norwegian”, but given only the “Norwegian Bokmål” plural forms. While it is true that the singular forms in this word (dag, dagen) are the same in Norwegian Bokmål and Norwegian Nynorsk (and Danish and Swedish at that), Mr. Ullmann may not be aware of the fact that the plural forms are different. Actually, when taking morphology into consideration, most words are different between Nynorsk and Bokmål, despite the (false) impression one gets from the fact that the indefinite singular forms of nouns and adjectives often are the same in both (and in Swedish/Danish). The solution is very simple:

  • Treat Norwegian Bokmål and Norwegian Nynorsk, under those exact names, as separate languages — the same way as the two other Scandinavian languages (Swedish and Danish) are treated.
  • Use either “nb” (correct language code) or “no” (not correct language code, but currently used interwiki code on Wikipedia) for Norwegian Bokmål. A robot can easily standardise the entries either way.
  • Use “nn” (correct language code and currently used interwiki code) for Norwegian Nynorsk.

Respectfully, Olve Utne 07:42, 23 March 2008 (UTC)

"(What are his qualifications in this matter, by the way?)" That is argumentum ad hominem and pretty much discredits anything else you have to say. My qualifications would probably floor you. (for one thing, I am 1/2 Norwegian :-) In the 639-1 process, where we were developing a 2-letter language code to cover a number of languages, we had several serious political problems. The nn and nb codings are the result of one of those problems: the Norwegians vociferously insisted that Bokmål and Nynorsk be coded separately (not all the Norwegian contributors to the committee, just the Nynorsk proponents...) even though the -1 two letter coding should have had only one code for that level of classification. The distinction should have been left to (what is now) -3, or more properly -4. In the end a political "solution" was reached, no, nb, and nn all coded, with the expectation that software implementors would use the correct no code, and just ignore nb and nn. (Which is what they have mostly done. Until one or more very vocal proponents of Nynorsk show up, such as Olve from the nn.wp ;-) Please note that the linguistic credentials of the several hundred people both on the committee and in support (such as myself) were/are very extensive. A similar political problem—in reverse—was with the Chinese languages; the rapporteur from the PRC insisted that there was only one "Standard Written Chinese" (meaning Mandarin written in simplified characters) disregarding that -1 should have coded 11-14 of the languages. The result was zh, which was useful to some extent, but quickly had to be extended by software implementations (zh-CN, zh-TW, zh-min-nan, etc). What I wrote above is the resulting political standard(s), IMHO a better solution for us would be what Atelaes said right at the top: use no/nor "Norwegian", and distinguish with proper tags and such the Nynorsk written forms from Bokmål. Robert Ullmann 08:52, 23 March 2008 (UTC)
Whether ISO should have been this or that way in your opinion, the fact stands that Nynorsk and Bokmål have separate language tags — just like the two other Scandinavian languages. Since this distinction does exist — and in all of ISO 639-1, ISO 639-2 and ISO 639-3 at that — I do not see any reason why we should not use it.
You are of course free to prove me wrong through addressing the problems I pointed out rather than bragging of your one parent from Norway (I have... — TWO (!) (Unbelievably impressive, eh? ;-) But beside the point.)) and claiming ad hominem attacks (In my book, those are about attacking the person rather than their arguments.. But who am I — a mere ignorant Norwegian linguist — to know what ad hominem means...) as an excuse for avoiding the legitimate questions — see dag#Norwegian.
While you are at it, feel free to explain what makes Nynorsk/Bokmål less of a legitimate distinction than Bokmål/Danish, Czech/Slovakian, Macedonian/Bulgarian — or Nynorsk/Swedish at that. That those of us who write both Nynorsk and Bokmål (like myself) or only Nynorsk (like quite a few others). Isn’t having a separate literary tradition for a century and a half enough for a language to be treated as one? That Cantonese and some other languages within the Chinese languages have not had a full literary tradition until recently is not a reason to treat Nynorsk differently from Bokmål, Swedish, Danish, Faroese, Czech, Slovakian, Macedonian, Bulgarian, Catalonian, etc.
-- Olve Utne 11:00, 23 March 2008 (UTC)
Actually, since you hint to your knowing Norwegian (?): How do you want to solve, e.g., the following problems: sei, elv, bli, bok, rot, sau, kjerring? -- Olve Utne 11:27, 23 March 2008 (UTC)


"(not all the Norwegian contributors to the committee, just the Nynorsk proponents...)" Semantics. How there came to be two different ISO-codes is irrelevant. The fact that there indeed is two different ISO-codes is what you should note.
"Look, we have to have Norwegian as Norwegian. (Else everyone will be saying forever "I can't find Norwegian")." Another argument you used earlier which is also pure semantics. The fact that there IS two different official versions of Norwegian, and that both are equally Norwegian, you cannot alter. By your logic, for those who do not know that there are different official languages in the three Scandinavian countries, we should lump the languages together under a Scandinavian header and use Swedish as the norm of Scandinavian languages since it is the one with the most speakers/users, and have the other languages as mere sub-sections. --Harald Khan Ճ 19:01, 24 March 2008 (UTC)
I agree, we should delete it. - TheDaveRoss 19:58, 24 March 2008 (UTC)
  • I am shocked to find out that the large number of SIL's macrolanguages we regularly use here as normal L2 headers. Technically, on the basis of large lexis/morphology variations between individual languages with separate -3 codes, anyone could ask for their separation. I've heard that the differences between some of those Arabic/Chinese dialects can be quite enormous. Sometimes the orthogrophy used for lemmatization (non-phonetic logograms, or consonant-based such as for Arabic/Aramaic..) can be a binding factor, in addition to, of course, shared cultural/religious heritage that strongly emphasizes common treatment. In other cases, when the separation itself is a preferred option for various reasons (usually just the opposite of those used for common treatment - I've read that dozens of almost exact Aboriginal languages are treated separately because no tribe want's to have it language named ofter neighbouring tribe's name..), forced unification on the basis of intelligibility arguments would necessarily be enforcing a particular POV, that would encounter sharp criticism all along the way of wiktionary's lifetime, assuming that the separation meme is typically not peculiar to some small and loud group. For once, almost all Norwegian-language contributors that have voiced their opinion here support the separation of Bokmal and Nynorsk. If it's true what has been said that, notwithstanding the common lemma form, there is a lot of disparity (approx. how much?) in inflected forms, especially those that display next to the headword line (like plurals, participles, definite/indefinite forms) that would be hard to solve correctly, the separation would probably be the best choice. --Ivan Štambuk 21:36, 24 March 2008 (UTC)
  • Huh?? We don't use Chinese as a L2 header, except accidentally or as holdover from early additions. See Wiktionary:About Chinese. We do treat Arabic as a single language, but that's partly driven by the fact that there's an Arabic Wiktionary and an Arabic Wikipedia. We also don't get many contributors who add terms in specific regional variants. We did have someone adding Egyptian Arabic for a time, but he has long since disappeared. Without more contributors for a macrolanguage, there just isn't much we can do. --EncycloPetey 06:03, 25 March 2008 (UTC)
I was under the impression that here L2 Arabic = Standard Arabic (we use macrolanguage -1 ar=ara as synonymous with -3 individual language arb, and Category:Egyptian Arabic language has only 2 entries, so I guess the dude you're referring to must have been formatting his entries with ==Arabic==, and all of the regional variants in translation tables I saw all point to entries with normal ==Arabic== L2), the exactly the same situation occurs with Aramaic where 334a is adding regional-agnostic spellings (-3 arc = Imperial Aramaic = just Aramaic here). Now had you done your homework and actually studied that list, you'd find out that almost all of other families on that list beside Chinese and Serbo-Croatian are used here as normal L2 headers; in alphabetical order per -3 code; beside already mentioned Aramaic and Arabic:
Edit history of those categories shows that you edited most of those, so you should have known better before generalizing everything to Chinese family (which represents 0.01% of world's languages anyway). --Ivan Štambuk 10:22, 25 March 2008 (UTC)
I would like to point out the following:
1) Nynorsk and Bokmål booth have their own separate literature.
2) As for the similarities with the other Scandinavian language, Bokmål is probably closer to Danish than to Nynorsk. --Sigmundg 05:23, 25 March 2008 (UTC)

Hm, I've been absent for a while, and haven't got to follow this discussion. Is there any conclusions yet? Regardless of what people think is a language or not, I would like an answer to the following:

  1. What to do when a word is the same in both Nynorsk and Bokmål, but has different grammar.
  2. How to categorize
  3. When a word is in both of them, but with different meanings

I also think we need to create som Bokmål and Nynorsk templates. It's about time we can get back to work ... The Norwegian part of this wiktionary is poor :S --EivindJ 09:45, 25 March 2008 (UTC)

My gut feeling (an my brain feeling too) is that deprecating “Norwegian” as an unqualified term is the best solution. Norwegian is a conglomerate of Scandinavian dialects that happen to be spoken in Norway, but it is not a written language, and not even one written language continuum. Rather, there are two written languages — currently known as Bokmål and Nynorsk. Each of these has, like English, its own continuum of conventions and standards, and some of these again have their own names. Thus, we have — from Høgnorsk and archaic Nynorsk through archaic Riksmål to Danish:

Høgnorsk - conservative Nynorsk - NYNORSK - radical Nynorsk || Samnorsk | radical Bokmål - BOKMÅL - conservative Bokmål - Riksmål - Dano-Norwegian - DANISH

(The above is a bit simplified, but should give a reasonably good impression of the situation.)

The continuum from høgnorsk through radical Nynorsk are mainly predictable phonological variations over one common morphological system.

The continuum from radical Bokmål though Riksmål is a bit more complicated, but is also a continuum which has reasonable predictability based on a common morphological system.

Setting up grammar tags to cover the whole Nynorsk spectrum is uncomplicated. Setting up grammar tags to cover the whole Bokmål spectrum from radical Bokmål through Riksmål is also quite easy.

But setting up grammar tags that cover both the Bokmål and the Nynorsk spectrum in one is a daunting task which takes the invention of new grammatical tags that will make the entries much more opaque for the average editor, since one would have to disregard the tradtionally numbered classes of nouns, verbs, etc. and in effect invent a new system.

The current unqualified “Norwegian” entries are, as far as I have checked, all actually Bokmål, and by far the easiest solution, practically speaking, would be to have a bot rename them all to Norwegian Bokmål. That being done, one would already have achieved a system which is practical, intuitive and — last, but not least: not original research...!

In addition to the differences in vocabulary and morphology, please note that the syntax is also quite different — to the degree that a literal, word-by-word translation from Bokmål to Nynorsk will, actually, sound very awkward in most cases.

My proposal is therefore:

  • Have a bot move all current “Norwegian” entries (which are de facto Bokmål already) to “Norwegian Bokmål”.
  • Keep all “Norwegian Nynorsk” entries under “Norwegian Nynorsk”.

This reflects the fact that Bokmål and Nynorsk are not one written language continuum, but two. The names reflect the officially designated terms by the official Norwegian Language Council, as well as, except for not having parentheses, the already established Mediawiki interwiki names. -- Olve Utne 15:06, 27 March 2008 (UTC)

WT:MILE

I propose that we re-format Wiktionary:Milestones with User:Nadando/milestone, or something similar. The current page is all of the place formatting-wise, and the new page is sortable. Nadando 03:44, 11 March 2008 (UTC)

Be bold, kick Milestone's ass. - TheDaveRoss 03:48, 11 March 2008 (UTC)
I can't, it's protected :) Nadando 03:49, 11 March 2008 (UTC)
Thanks. Nadando 03:50, 11 March 2008 (UTC)

ő and ű in .ogg audio filenames

I am seeking BP's help in a problem I'm having with two characters (ő and ű) in .ogg filenames. I am trying to record audio for Hungarian words. I've used both Audacity and Shtooka. The other special characters work fine (á, é, í, ó, ö, ú, ü), but ő and ű are changed to o and u. I'd prefer Shtooka since it is so easy to use. In Hungarian, the accents mean a different letter, not simply the stress in the word. So bor = wine, and bőr = skin, but Shtooka will not create bőr.ogg, only bor.ogg, and if I change the filename to bőr.ogg RealPlayer will not play it. I don't have this problem with mp3 files and I can display these special characters on my PC. I use the hu-%STR mask in Shtooka because this will give the preferred filename format. I copy/paste the words from Wiktionary to Shtooka before recording. If the pasted word contains ő and ű, these two characters are displayed as vertical bars. I was told that other languages with special characters work fine, but I don't know how to fix this. Thanks. --Panda10 22:02, 12 March 2008 (UTC)

As a work-around is there any standard transliteration scheme for ő and ű to ascii characters (like ä and ö can be written ae and oe)? Thryduulf 22:21, 12 March 2008 (UTC)
Unfortunately, no or at least I've never seen it. --Panda10 22:23, 12 March 2008 (UTC)
I recall seeing ooe and uue.—msh210 22:40, 12 March 2008 (UTC)
I've just tested ô and û, both worked fine. It seems that the Latin-Extended characters will not work, but Basic Latin and Latin-1 will. If nothing else works, I will use these characters. They are not correct, but look close enough to the original. Thanks. --Panda10 23:41, 12 March 2008 (UTC)
Is this a problem when naming the files on your own computer, or when you upload them to Commons? Did you know that you can upload the file under a different name than the one saved on your computer? Have you tried doing that? Have you tried using QuickTime instead of RealPlayer? This same problem can potentially affect many other languages, so I'd rather we didn't try to "work around" it by giving such files a different name. The audio file name should always match the entry name. If there is a deficiency in RealPlayer, then we should tell them and let them decide whether they want to fix the shortcomings in their software on their own. --EncycloPetey 03:23, 13 March 2008 (UTC)
This is not a problem with naming files on my computer. I have several mp3 files with ő and ű in the filename, they can be played fine with Windows Media Player. Uploading to Common - I uploaded only words that did not contain ő and ű, so I can't answer this part of your question. For now, I am trying to play the files with ő and ű in the .ogg filename on my computer, no success. I tried QuickTime, it displays an error message for any .ogg file. What is the recommended player for .ogg files? How will Wiktionary users play these files? --Panda10 19:25, 13 March 2008 (UTC)
I use Quicktime to play all .ogg files, but then I am using a Mac. I would try uploading some files with the problem charatcers to Commons, changing the name at upload to the Hungarian spelling, and see if you can then play them from within a page. --EncycloPetey 19:43, 13 March 2008 (UTC)
Thanks for the idea. I uploaded the audio for ősz (autumn) and played it back successfully from Wiktionary using the same RealPlayer that is not willing to play it on my computer... Does this make sense to you? --Panda10 20:55, 13 March 2008 (UTC)
It makes sense in that I understand what are are saying, and am not surpised by the vagaries of internet interaction. Do I comprehend the reason it works when done this way? No. But I do know that I've successfully played .ogg files for odd script words before this way. --EncycloPetey 23:29, 13 March 2008 (UTC)

Even sysops are ignoring my questions

Which box(es) should be utilized on failure, zero gravity, vampire, and martial art? I am being told of a non-existent policy which is in practice (WTF?) albeit it is not written. Ergo, I was temporarily banned for making what I thought were bold edits, and some of my thoughts continued to go unanswered. Can anyone lend a voice? This is probably the last place I'll be repeating myself. Sesshomaru 05:42, 13 March 2008 (UTC)

Relevant discussion at User talk:Sesshomaru. -Atelaes λάλει ἐμοί 05:52, 13 March 2008 (UTC)
This seems like a strange use of blocking, although I haven't looked into it thoroughly, and don't plan to.
Personally I think both boxes should be deprecated in favor of {{pedialite}} in a ===See also=== section. As an added bonus, that template is unobtrusive enough that it can be repeated as needed. If we must choose, then at least 99% of the time we should be linking to the disambiguation page. Dab pages seek to provide the full range of encyclopedic meanings, just as we seek to provide the full range of lexical meanings, and are thus the most appropriate next stop for someone who didn't find what they were looking for in our entry. -- Visviva 05:59, 13 March 2008 (UTC)
Can you clarify? All I understood was "don't use 'em" and "use dabs 99% of the time". Sesshomaru 06:05, 13 March 2008 (UTC)
Visviva suggest you use {{pedialite}} as an inline, unobtrusive link in a ===See also=== section within the part of speech section. (Notice the level of the header, this is important.) - Amgine/talk 06:11, 13 March 2008 (UTC)
Yes, try at most one box, and preferably only {{pedialite}}, although we really need to update the WT:FAQ to encourage the inline template. If they are as such, more than one link to Wikipedia is NOT a problem, provided they are relevant. A disambiguation page is fine, and I would say a link corresponding to any definition where the Wikipedia article is directly related. 71.129.48.8 06:17, 13 March 2008 (UTC)
Okay. I'm starting to understand. Can someone provide sample(s) of links which share this tag? I'm more curious in seeing the layout than anything else. Sesshomaru 06:28, 13 March 2008 (UTC)
Have a look at Special:Whatlinkshere/Template:PL:pedia. One example is at pigment#See also. Mike Dillon 06:42, 13 March 2008 (UTC)
I'm sorry zero gravity was reverted time and again instead of indicating a better way to accomplish the goal you had in mind. I agree with Visviva that a box makes sense for disambiguation, but the community is splintered on the whole issue. Personally I don't think many people have put the necessary thought into it, assuming one page on Wiktionary equates to one page on Wikipedia, but I digress. To address your question, no one would object to the way I've set up zero gravity now, using {{pedialite|...}} and {{pedialite|dab=...}}. 71.129.48.8 06:57, 13 March 2008 (UTC)
Although I have never seen a policy about it I have always used links to Wikipedia either when they have a page at the same title, or when I am using terms in the definition that are not explained on Wiktionary (for example expanding abbreviations to places or entity names). As a dictionary I feel that Wiktionary should definitely not be aiming to provide background information about related topics, though linking to related words on Wiktionary is useful for broadening vocabulary, and that is why we have lots of synonyms, antonyms, etc.etc. sections. It makes clear sense to me to link to the disambiguation page on Wikipedia, unless we want to have an interwiki for each sense of the word, which is redundant and ugly, as there is no way we can tell which article people will be interested in. On a related note, if Wikipedia's article at the identical title is actually a redirect, it is still preferable to link directly to it than to disambiguate manually, this is so that if the redirect is converted into an article or pointed to a different place then the link will still make sense.
For example, with zero gravity, the link to w:Weightlessness is irrelevant, if people want to know what weightlessness is they can click on the link to weightlessness in the definition (where I assume they will find a link to 'pedia should one be necessary). It is worthy of note that we have a javascript extension that provides interwiki links to Wikipedia whereever {{pedialite}} and related templates are used and that at the moment it is assuming an interwiki between Wiktionaries "zero gravity" and Wikipedia's "weightlessness", which should not be there. Conrad.Irwin 10:55, 13 March 2008 (UTC)
I would strongly suggest that in the case of redirects, "Zero gravity" should be preferred over "Weightlessness", basically, to avoid confusion. And you never know if the redirect could very well become its own article (this is what happened to w:Kristin Wells; for a long time the link targeted w:Superwoman and now it has its own page). Per this discussion, I made these changes, but was I correct in doing this edit (martial art or martial arts)? And back to the fundamental inquiry: what about an instance such as Batman vs. batman, Superman vs. superman, etc.? Sesshomaru 22:10, 13 March 2008 (UTC)
In the case of zero gravity, WP's disambiguation page is not relevant as there is only one definition, I think therefore it should link to w:Zero gravity - though that is a personal preference. This discussion looks in favour of having only one Wikipedia link per entry, yet both of the "these changes" links contain two links, so I am not sure why you are referring people to here to justify them. The martial arts edit was fine. For Batman I think we should link to w:Batman (disambiguation) as we have more than one meaning in common. For batman I feel that we can link to w:Batman (military) as that is the meaning given - though I wouldn't object to linking to the disambiguation page there too. The links on superman and Superman feel right to me, though I can't see the link on superman being much use, and wouldn't have added it myself. I don't think that we shold be using Wikipedia as a place for people to find new meanings of words, but instead to enhance their knowledge and understanding of the words we define. Conrad.Irwin 14:58, 14 March 2008 (UTC)
I don't understand. Is this a reversal of what you said above? Should we link to w:Zero gravity or to w:Zero gravity (disambiguation)? First you said that w:Weightlessness, which the first redirects to, is irrelevant, and now you say that the disambiguation page is not relevant. I would imagine either could be depending on what the user is looking for, so why not have both? If there were more definitions, as with trunk, there would have to be more links anyways. DAVilla 11:03, 19 March 2008 (UTC)
The thing is that "weightlessness" is not the same as "zero gravity", the fact that at the moment Wikipedia treats it the same at the moment is irrelevant. Whether to link to a disambiguation page or a specific article is a choice that needs making on a per entry basis, but I can never see the use of linking to a different word on Wikipedia. Our aim is not to provide people with information about topics, it is to provide them with a better understanding of words. Conrad.Irwin 01:12, 20 March 2008 (UTC)
There's a policy draft at Wiktionary:Links; it definitely needs more work before it's representative of community standards, but it's a start. —RuakhTALK 12:24, 13 March 2008 (UTC)

Can you see this: 𐎧𐏁𐎹𐎠𐎼𐏁𐎠?

I am interested in knowing what percentage of people can see the Old Persian script without installing extra fonts. If the majority of people can see only '??????' then it might be good to add a template or some information on adding the font and making it display properly. There is such a template on Wikipedia for Chinese [5] (although I think most people can see Chinese by default). Pistachio 00:25, 15 March 2008 (UTC)

There are numerous fonts (including this one) that are not installed in my Windows setup. Mostly I don't miss them. DCDuring TALK 00:33, 15 March 2008 (UTC)
Something like 0.01%. Lots of those ancient languages' scripts requre special fonts; in this case Aegean.otf or Xerxes (those two are supported by Xpeo). I don't think it would be good to clutter pages with "this needs special fonts" messages; how about instead providing image display in the headword by default for all of these? Like in e.g. Phoenician 𐤀𐤍𐤊𐤉 or Gothic 𐌷𐌰𐌹𐍂𐍄𐍉 ‎(hairtō) ? --Ivan Štambuk 00:44, 15 March 2008 (UTC)
I can see it, incidentally, but I doubtless have extra fonts installed for a lot of things. --Wytukaze 00:48, 15 March 2008 (UTC)
I can't believe a Wiktionary admin can't see Old Persian cuneiform. How embarrassing. I move to desysop DCDuring for their lack of interest in esoteric languages. :-) -Atelaes λάλει ἐμοί 00:51, 15 March 2008 (UTC)
Now that I've outed myself, I should admit that even if it were visible, I couldn't pronounce or read this script or, er umm, several others, including, er umm, a couple that do display on the screen (though I couldn't say what they display). I blame my failing eyesight. DCDuring TALK 00:58, 15 March 2008 (UTC)
I am able to view it (on Debian Linux) without doing anything special, though there are several scripts which I can't decipher. Conrad.Irwin 01:01, 15 March 2008 (UTC)
I can see it in Linux (Kubuntu), the only non-default font I've installed on this machine is Gothic. I can't see in Windows XP on a machine that is completely unadulterated font-wise. Thryduulf 01:03, 15 March 2008 (UTC)
I can't read it (or several other scripts, including Gothic) though. Thryduulf 01:09, 15 March 2008 (UTC)

Thinking about the suggestion of image display, is there any way to automate the generation of these images in the same way that complicated maths formulae are? 01:12, 15 March 2008 (UTC)

I was thinking of a javascript solution which allowed people to define which fonts should be replaced in this way, and get it to insert the images in place of the text as and when it was required. But I then got distracted and ran out of ideas, maybe I'll come back to it some other time. Conrad.Irwin 01:26, 15 March 2008 (UTC)
Do we have one or more tables of languages, scripts, and fontnames (or link to same) that one could refer to find what one needed to install in each major operating system/browser pending more user-friendly solutions? DCDuring TALK 01:34, 15 March 2008 (UTC)
Not that I'm aware of. I always use [6]. It's an excellent site. You may want to simply begin with Code2000. While it's not always the prettiest of fonts, it covers a fairly broad swathe. -Atelaes λάλει ἐμοί 01:36, 15 March 2008 (UTC)
Perhaps there could be a section in the 'help' with this information. Also, the idea of having images to display seems lovely (it goes way over my head though). Pistachio 01:40, 15 March 2008 (UTC)
A help page with a table showing samples from each script along with a link to get a copy of that font if you can't see it, and a link to a how to install fonts on various operating systems seems like a very useful thing to have. Thryduulf 02:05, 15 March 2008 (UTC)

I wonder if it would make sense to change some of the script templates for less common scripts to link to a help page or appendix giving information about installing fonts or an option to have those scripts turned into images via JavaScript (for logged-in users). Alternative, JavaScript code could use the CSS classes used by these scripts only when they are the head-word to avoid cluttering pages with tons of help links. Mike Dillon 02:50, 15 March 2008 (UTC)

There are no OPC signs on Commons, so I made some ad-hoc images (that really ought to be turned into SVGs by a knowledgeable wizard). Nothing impressive, but it's better than nothing. I like the idea of some sort of superscript over a headword with a message like "Problem with fonts?" linking to Appendix: with detailed instructions. --Ivan Štambuk 03:23, 15 March 2008 (UTC)

Cross-dictionary bookmarklet

I have just made a bookmarklet that you can use when on one online dictionary to add links to other online dictionaries for the same word. So far only Merriam-Webster, Microsoft Encarta, and the English Wiktionary are supported.

It's only tested on Firefox so far and for some reason the links are not clickable on Wiktionary. Improvements welcome. Copy and paste this code into a bookmark:

javascript:if(location.host=='www.merriam-webster.com')w=decodeURIComponent(location.pathname.substr(12));else if(location.host=='encarta.msn.com'){t=document.getElementsByTagName('title')[0].firstChild.nodeValue;w=t.substr(0,t.length-38).replace('’',"'");}else if(location.host=='en.wiktionary.org')w=decodeURIComponent(location.pathname.substr(6));else w=null;di=document.createElement('div');di.innerHTML='<a href="http://en.wiktionary.org/wiki/'+w+'">Wiktionary</a> <a href="http://encarta.msn.com/dictionary_/'+w+'.html">Encarta</a> <a href="http://www.m-w.com/dictionary/'+w+'">Merriam-Webster</a>';di.align='center';bod=document.getElementsByTagName('body')[0];if(w)bod.insertBefore(di,bod.firstChild);void(1)hippietrail 11:18, 15 March 2008 (UTC)

Template:defective verb

Can somebody explain to me why {{defective}} includes the POS while no ther such templates ({{ambitransitive}}, {{ergative}}, {{ditransitive}}, {{impersonal}} and even {{auxiliary}}) don't? Circeus 15:58, 15 March 2008 (UTC)

Well, it's a different kind of description; {{defective}} has to do with the forms a verb takes (and doesn't take), while the others have to do with the grammatical frames it's used in. That said, I don't think we need {{defective}} at all; it's not really a context label, and is best covered by the inflection line, the conjugation table (if any), and/or usage notes. I mean, marking a sense "defective" is really useless anyway, since it doesn't tell you what forms exist and what forms don't. —RuakhTALK 19:23, 15 March 2008 (UTC)
Where is this template intended for use? The only place this information is meaningful is where the full set of inflected forms is presented. It might appear on an inflection line, to alert a user that the verb does not follow the full normal pattern, or it might appear in an Inflection / Conjugation section for a similar reason. There is no other place I can imagine it being useful. I therefore don't see any use for this template. When we redesigned the {{la-verb}} tempate, we set it up so that "pattern=defective" could be used. This way, it displays in the inflection line and it provides a link explaining what it means. So, we don't need the template under discussion for Latin entries. --EncycloPetey 19:28, 15 March 2008 (UTC)
I don't see why evry single bit of information should be included on an inflection line. This template is useful for signaling verbs which might need special treatment, both for users and for those who look after the formatting on various languages. Physchim62 14:43, 19 March 2008 (UTC)

talkative - category?

I'd like to add talkative and its synonyms to a category. Would you recommend the existing Category:Behaviour? --Panda10 22:01, 15 March 2008 (UTC)

If you can come up with enough words, it might be worth starting a Category:Talking as a subcategory of Behavior (and of Category:Sound). I imagine it would be fairly easy to come up with enough terms for such a category. (e.g. loquacious, chatty, talk, speak, speech, say, blab, gossip, chat) --EncycloPetey 23:28, 15 March 2008 (UTC)
If it is created, Category:Talking should probably be a child of Category:Language as well. Another possible name would be Category:Oral communication, but that could include things that aren't "language" like grunting or humming. Mike Dillon 23:43, 15 March 2008 (UTC)
I created the category and added the synonyms I've found so far. Thanks. --Panda10 01:56, 16 March 2008 (UTC)
How should the new Category:Talking relate to Category:Communication? Several of the above mentioned words are already there. --Panda10 15:29, 16 March 2008 (UTC)
It would be a subcategory of that as well. We have some categories that are listed in three or more locations because of their breadth and importance. --EncycloPetey 15:30, 16 March 2008 (UTC)
Mike Dillon changed the code of Category:Talking to {{topic cat|lang=en|current=Talking}}. I used nav before, but I don't know how to update topic cat. Can I just add Category:Communication in the second line? --Panda10 15:37, 16 March 2008 (UTC)
To answer you more directly: Yes you could just add Category:Communication. You could also add "parent=Communication" to the call to {{topic cat}} and it would get merged with the parents defined at Template:topic_cat_parents/Talking. I'm planning to set up a process to watch for these sorts of changes to allow people who want to be familiar with the internals to add them into Template:topic_cat_parents/Talking and make it easier for others to just make their changes and go about their work. Mike Dillon 05:04, 17 March 2008 (UTC)
You merely have to add the extra listing to Template:topic_cat_parents/Talking. --EncycloPetey 15:42, 16 March 2008 (UTC)
I guess this is the tradeoff we have with {{topic cat}} if we decide to adopt it instead of {{nav}}. Adding a parent category is not as obvious, but once it is done it is done for the category in all languages and description changes can be managed from one place as well. I'm planning to set up a read-only bot to watch for changes in the topic category tree and parent/description configuration, so it would actually be OK to just add [[Category:Communication]] directly to Category:Talking and the fact that it is missing from the parent configuration would be noticed and reported. The code is partially written, but I'll try to get a report running soon. Mike Dillon 04:38, 17 March 2008 (UTC)
Thanks for the information, Mike. I do like the new system because of its obvious advantages. It would help to add more usage information to the template talk page. For example, what to do if an existing category has to be added to another existing category as a subcategory. I did read the template talk page before but it was not as clear as now after reading your explanation and seeing how EncycloPetey modified it. --Panda10 18:24, 17 March 2008 (UTC)

Persian,Urdu, Hebrew, Arabic, Korean entry keyboards

Following on from the above discussion about less-commonly installed fonts, I want to raise the point that some users will have difficulty entering search terms in languages with non-Latin script such as Persian,Urdu, Hebrew, Arabic and Korean and so on. They may lack administrator rights to install the languages themselves (students, people in the office), they could be using someone else's computer whilst travelling or they may not know how to install input for extra languages. Also, for many Persian-speaking people, a fault with their computer means inputting Persian produces Arabic letters instead of the modified Persian versions, for example ي instead of . A search for a Persian word in Wiktionary using those Arabic letters will produce no results unless there is a redirect in place: searching for "ايراني" instead of "ایرانی" produces no results. Therefore perhaps creating online keyboards to facilitate input and searching in some languages would be really helpful for some people. Does anyone think this is a good idea? Pistachio 02:30, 17 March 2008 (UTC)

Yes. It can be a page (Wiktionary:Search using various character sets or some such) which uses JavaScript to paste characters to a search box, and then uses the usual Wiktionary search as the search mechanism. Cf. [7].—msh210 16:10, 17 March 2008 (UTC)

Images

How do people feel about something like this, adding an image for each sense for which an image is available? In general, do galleries have a place in Wiktionary entries, and if so where should they be placed within the entry? -- Visviva 07:28, 17 March 2008 (UTC)

Certainly for nouns and proper nouns, images are often good at aiding understanding of the word being defined. Some verbs can also be imaged, but animations would be better for some. Adjectives, etc. are far more difficult to illustrate (e.g. what image would you add to eloquent?).
Regarding the placement of the images, generally I don't like the use of galleries outside Commons, as in most cases inline images work better (imo). Compare ring with router. This can cause problems when the picture(s) extend beyond the definition lines, but this can be overcome - see bassoon (compare with this old revision). There is probably a better way of doing this with js or css than the table format I used there, but I don't know any js or css). Thryduulf 14:45, 17 March 2008 (UTC)
I agree that galleries aren't ideal (at least in their current default format). In general, I'd like the images to be as close as possible to the definitions. But it seems to me that the bassoon solution breaks down rather quickly when there are 4 or more illustrable senses (and most senses of concrete nouns are illustrable, although the store of images on Commons still leaves much to be desired in this regard). -- Visviva 14:57, 17 March 2008 (UTC)
How about a Gallery: namespace, for use in situations where there need to be lots of images for lots of senses? If there is only one image needed, it can go on the page as we do now, but in situations where many are needed, we could have one on the page, then a link to the Gallery namespace for additional illustrations. --EncycloPetey 15:37, 17 March 2008 (UTC)
I think a gallery namespace is a bad idea. Finding uses of a word throughout history is one of the basal functions of a dictionary, showing picture examples is not. While there's nothing wrong with adding pictures, I don't think our emphasis falls on that area enough to justify a new namespace. -Atelaes λάλει ἐμοί 18:44, 17 March 2008 (UTC)
I really prefer the use of images off to the right side of the page. Also I think it is important to use the (#) notation to associate the image with it's definition. The gallery might be able to work, but we don't use the width of the page as much as we could, which is a shame. We often have great big areas off to the right and pages that go on forever vertically, with most of the content well out of sight. Off to the right the image balance the text density of the left side of the page and keep the right side from being mostly blank and boring. We really don't need images for every sense, I think it might be best to only use them when they clarify or clearly illustrate the definition, or when they just look really good :). For some words that will mean 10 images are called for, for others one or none. - TheDaveRoss 19:59, 17 March 2008 (UTC)
In general, I don't think numbered notation works for images any more than for translations (although the consequences of confusion are less severe). Numbers change, and the people making the changes don't always notice the by-number references elsewhere in the entry. That's why I usually try to put some sort of short gloss in the caption (as in ring). -- Visviva 06:28, 20 March 2008 (UTC)
I like it. It's certainly better than messing things up with lots of miscellaneous floaty stuff. Conrad.Irwin 19:27, 17 March 2008 (UTC)
Without intending to stir up this issue any further, it does occur to me that the problems with associating images to senses would disappear if we did begin using the sense, rather than the POS, as our primary unit of organization. -- Visviva 06:28, 20 March 2008 (UTC)

Category:Computer Science vs Category:Computer science

Is there a reason why the Category:Computer Science has "science" with the first capital letter? If not, may I move it to Category:Computer science? --Daniel Polansky 10:57, 9 February 2008 (UTC)

No, yes, go ahead. H. (talk) 09:14, 17 March 2008 (UTC)
I've wondered the same thing about Category:Food and Drink, though moving all those categories by hand seemed like more trouble than it could possibly be worth. -- Visviva 14:58, 17 March 2008 (UTC)
I've made those sort of moves by hand before. If someone will poke me (after Thursday) about getting this done, I'll do the dirty work. --EncycloPetey 15:33, 17 March 2008 (UTC)
I have moved the computer science category manually. It seems to have worked nicely, also because of the heavy use of the {{computer science}} in the entries. --Daniel Polansky 19:14, 17 March 2008 (UTC)

Basque etymological dictionary

For those interested in Basque: http://linguistlist.org/issues/19/19-863.html#1 H. (talk) 09:13, 17 March 2008 (UTC)

Category:Food and Drink

I have manually moved Category:Food and Drink to Category:Food and drink. What I have not moved are the numerous non-English subcategories of that category, listed in the new category. It would be nice if other people could help. Otherwise, I am planning to slowly work on the non-English categories for food and drink too.

Another category worth fixing is Category:Spices and Herbs, fortunately having fewer non-English subcategories. --Daniel Polansky 09:04, 18 March 2008 (UTC)

I'm not sure that I'd say it's complete ready yet, but the {{topic cat}} stuff could help with future moves like this. I've configured Template:topic cat parents/Food and drink, Template:topic cat parents/Foods, Template:topic cat parents/Breads, and Template:topic cat parents/Desserts, so any of those categories can be configured by replacing their contents with {{topic cat|lang=XX|current=CATEGORY}}. If we had mw:Extension:StringFunctions, it would just be {{topic cat}} since we could have the template take care of splitting the language code from the category name. Mike Dillon 15:13, 18 March 2008 (UTC)
I've populated all of the category parent entries for the Category:Food and drink tree under Template:topic cat parents. I'm sure some of them still need descriptions under Template:topic cat description, but many will be fine with the standard description. Mike Dillon 15:54, 18 March 2008 (UTC)

Category:Automotive

Should Category:Automotive be moved to become a sub-category of Category:Road transport? Thryduulf 12:23, 18 March 2008 (UTC)

I think so, yes. -- Visviva 14:31, 18 March 2008 (UTC)

category:zh:Adverbs

Is there a reason category:zh:Adverbs is a sub-category of category:Adverbs rather than category:Adverbs by language? Thryduulf 17:37, 18 March 2008 (UTC)

Yes. Physchim62 14:44, 19 March 2008 (UTC)
The whole situation with "zh" seems very weird to me. I can see that we have Category:Mandarin adverbs, Category:zh:Adverbs, Category:zh-cn:Adverbs, and Category:zh-tw:Adverbs. We don't have Category:Cantonese adverbs, but we do have Category:Cantonese nouns. The "zh" categories seem entirely inappropriate to me for an "adverbs" category. The Chinese languages are the only languages that are handled like this. This seems to be a lowest-common denominator type thing where the words made up of simplified Chinese characters end up in the "zh-cn" categories and the ones made up of traditional Chinese characters end up in the "zh-tw" categories instead of identifying them in each of the actual languages that the words belong to. No doubt someone who understands these matters better will show me the error of my ways, but it doesn't seem that there is a good reason that Chinese languages should be using language code-prefixed categories for parts of speech. Mike Dillon 14:56, 19 March 2008 (UTC)

Consideration on the order of context tags

Just to stir up some discussions and thought, this is the ordering I've been using (and switching articles to):

  1. Grammar information
  2. Topical labels
  3. Regional labels
  4. Formality labels (in practice, mostly "informal" and "slang", but also stuff like "literary" and "baby talk/childish")
  5. Politeness labels ("euphemism", "derogatory", "pejorative", "vulgar", "jocular" etc.)
  6. Temporal/frequency labels ("rare" goes here)

This has felt to me the most natural ordering, though I'm not clear myself why. Typically, I'm more confortable with the Grammar labels not being in the same parenthesis as the other ones, hough I haven't been too consistent on that. I want to point that in quite a few cases, I've been removing templates that felt redundant (e.g. "informal" alongside "slang", "vulgar" or "jocular"). Circeus 14:49, 19 March 2008 (UTC)

The order seems agreeable. This might be worthy of becoming at least a guideline. I'd favor facilitating the placement of all of these inside a single set of brackets, which usually works with our existng tags, but perhaps not every one. Are there any limits on the number of these in {{context}}? Six seems like it would be insufficient since there can be multiple topics and regions {and possibly others). Also editors use up slots with qualifiers like "mostly". DCDuring TALK 15:33, 19 March 2008 (UTC)
After some experimenting, it seems you can have up to 10 labels, subsequent ones are ignored. However, "usually" and "mostly" (and presumably other similar ones) use 2 slots. When using "usually" or "mostly" means there are too many arguments there is a red link to "Template:context 10" at the end. In contrast to what I expected, "and" and "or" each only take 1 slot. |_| of course also uses 1 slot. Thryduulf 18:13, 19 March 2008 (UTC)
It is unusual for more than 4 or 5 labels to appear (including non-label stuff such as Thryduulf just mentioned). At that point a usage note typically becomes a good idea. Circeus 19:07, 19 March 2008 (UTC)
Let me stick a technical note in here: the number of "slots" is not built into {{context}}, it is just a result of the number of {context n}'s that have been set up. Each of {{context 1}} through {{context 9}} is a redirect to Template:context. This "tells" the parser that the resulting recursion is intentional. Create {{context 10}} as the same redirect, and you'll have 11 "slots", and so forth. At some point the tags will be longer than desirable, but this isn't an implementation limit. (So there isn't any reason to worry about "using up" slots.) Robert Ullmann 09:13, 23 March 2008 (UTC)
As the discussion suggests, 10 seems to exceed what we feel would be useful to show in a sense line. How hard would it be to have a trustworthy bot that put context tags in our desired order (assuming that we get consensus on it)? I am having trouble seeing how that could lead to trouble (ignoring tech glitches for the moment). DCDuring TALK 12:06, 23 March 2008 (UTC)
The context slot contents get processed to determine the need for new categories, I believe, so I is useful for things that could be categories to be there. That's one reason I don't like using up slots for "empty" qualifiers, although I suppose that there were enough demand we could have some special means for handling them. I like to use Usage notes for anything complicated or nuanced - or just not likely to become a category in my assessment. I suppose there would be nothing that would prevent successive context tags, if it came to that. Extra brackets might be useful to separate a list of regional or topical contexts from other types. DCDuring TALK 19:39, 19 March 2008 (UTC)
I think the context notes are getting overused, if there are any more than 2-3 the qualifications should be in the usage notes section, perhaps with a pointer to that section as the sole contents of {{context}}. I have seen context fields with 6-7 notes, taking up more than half of the definition line. This is just plain confusing, because inevitably there are regional nuances and things going on which can't be fully explained by a list of context tags. I say limit the contents of context to 3, force all complex contexts to a more clear usage notes paragraph. - TheDaveRoss 19:47, 19 March 2008 (UTC)
Once more, it is the most polysemic words and PoSs that create the best test cases for any options. Relying on usage notes for items that might have been placed on the sense line means that the usage note may not appear on the same screen as the sense line. This puts a big cognitive load on the user at best and may mean they never even know that there is a usage note. The solution of having rel templates with glosses has serious maintainability drawbacks when definitions are edited. This doesn't have so much bearing on the narrowest construction of the subject of this heading, but provides another example of how the structure of our complex entries limits us. If we treated a given sense as a collapsable mini-entry with its own context, def, semantic relations, translations, usage examples, and usage notes, we would have almost guaranteed that almost all the information a user could want would be on a single screen. DCDuring TALK 20:06, 19 March 2008 (UTC)
I think in the end this is going to come down to "we need to rethink our entry layout, because it doesn't work." There is a lot of data which is associated with a particular sense which is kept far away from that sense, often with a lot of other unrelated junk in between. Translations, citations, usage notes, examples, images, and to a lesser extent (or more general) pronunciations, etymologies and conjugation/declension data, should all be clearly associated with particular senses, and the current method of doing so is by using a gloss to indicate which sense the subsequent information should be associated. What we should probably be doing is using collapsible fields directly beneath every definition containing all associated data, so once the reader finds the sense they want they simply click once and get all kinds of additional information pertaining only to that sense. No hunting around across the page, no question about which sense the information is associated with. It isn't that big a technological problem, it is a decent-sized organizational problem and a GIGANTIC effort in manually reorganizing the entries problem, but it is something we will end up having to do anyway, and there is a good chance that while we are cleaning things up we will be able to format them in a very standard way, hopefully a way which is easy to convert to xml or other associatively-structured output so that _all_ of the information on Wiktionary is friendly to machine reading rather than just the definitions. How best to affect this is a matter of discussion of course, but I think everyone knows, deep down, that our entry layout is far more complicated and far less useful than it could and ought to be. - TheDaveRoss 20:18, 19 March 2008 (UTC)
And in the meantime, the best we can do is standardize everything that can be effectively standardized (without losing information) to maximize the chances that some of the restructuring can be automated. Well, it won't be long until we'll have gotten rid of all the English transitive and intransitive verb headings, occasionally yielding more than 10 sense lines and 10 usage examples (more than a screenful) for a single ety's verb PoS. DCDuring TALK 20:36, 19 March 2008 (UTC)
Just to add, despite how deep the above discussion seems to go, the vast, vast majority of cases use no more than 2 or 3non-grammatical labels at the same time, and appropriately used. As I said, having more than tree labels is already very rare. Circeus 21:54, 19 March 2008 (UTC)
The initial suggestion seemed like a good idea, but the opportunity to learn something new and to put some mileage on a hobby-horse was irresistible (to me, anyway). DCDuring TALK 22:02, 19 March 2008 (UTC)

While I agree with TheDaveRoss that we should be nesting information under definitions (which sucks for the JavaScript-less, but I don't think the current approach is much better), it's obvious that we don't currently have consensus to do that. So in the meantime, I think it would be helpful to create a family of "see-usage-notes" templates that provide a standard appearance and functionality for links to and from usage notes. (Actually, in general it would be nice if we had a convenient way for glossed onym sections and so on to link to the senses they belong to, until such time as we simply attach them to those senses.) —RuakhTALK 00:26, 20 March 2008 (UTC)

BTW, the idea was mentioned above of putting grammatical information in a separate set of parentheses; I don't think that's a good idea, as it will look strange, and the distinction we're making will not be obvious to the casual reader. —RuakhTALK 00:32, 20 March 2008 (UTC)
From the amount of stuff I've looked at, even with only topical templates, you can believe ma that we are not exactly consistent. Circeus 00:52, 21 March 2008 (UTC)

Name of Category:Spices and Herbs

I would like to move Category:Spices and Herbs to Category:Spices and herbs, but it seems much more work than with the previous categories that I have moved. What could be done robotically is that all the word entries in the various languages are moved to the new categories, such as Category:de:Spices and herbs; a simple regular expression replace should achive this, such as r/\[\[Category:\(.*?\):Spices and Herbs\]\]/\[\[Category:\1:Spices and herbs\]\]. Afterwards, the category pages could be moved manually. Anyone volunteer on having his robot do the work? Or is it more complex than I imagine it to be? --Daniel Polansky 06:14, 20 March 2008 (UTC)

You could try doing it with autoedit, which will make it quicker though not as easy as getting a friendly bot. Conrad.Irwin 11:38, 20 March 2008 (UTC)
Thanks for the hint. --Daniel Polansky 12:24, 20 March 2008 (UTC)

entries with illegal titles

A feedback message prompted my creation of Appendix:Entries with illegal titles Appendix:Unsupported titles, which still perhaps needs a better name, and definitely need expansion (and being linked to).—msh210 19:41, 20 March 2008 (UTC)

Seems like a great idea. I suppose we could create some Javascript that redirects from those pages to that Appendix. Conrad.Irwin 17:55, 20 March 2008 (UTC)
Sounds good to me.—msh210 19:41, 20 March 2008 (UTC)
Better idea, that appendix is now transcluded into MediaWiki:Badtitletext. For example, see http://en.wiktionary.org/wiki/%7C . Conrad.Irwin 20:17, 20 March 2008 (UTC)
I imagine this is going to grow fairly long. Can we make it a list of links, to e.g. Appendix:.7C etc? Also, where do we name symbols? See e.g. ^ which is not acually defined as exponentiation, conjunction, or however else it's used, but is named unlike the symbols on this new page. DAVilla 20:56, 20 March 2008 (UTC)
I would imagine that this page would become an index, and the individual entries would be moved out: it would make sense to use that kind of encoding for them. Appendix:UT/.5B or summat. Conrad.Irwin 22:04, 20 March 2008 (UTC)

Language hierarchies

What is the preferred language heading, ==Mandarin== or ==Mandarin Chinese==, or something else? ==Cantonese== or ==Cantonese Chinese==?

Which of these is preferred in the translations section?

  • Chinese
    • Cantonese
    • Mandarin

with or without the extra bullets, or

  • Cantonese (Chinese)
  • Mandarin (Chinese)

with or without "Chinese"? My opinion is based on simplicity in the layout, which rules out the indentation that other users like, and on normality, which rules out "Cantonese Chinese" but allows "Mandarin Chinese" for clarification. DAVilla 20:14, 20 March 2008 (UTC)

I like the first one because of the fact that it would also work for regional variations:
But that's just me. — [ ric ] opiaterein — 12:52, 21 March 2008 (UTC)
I really like that too (and we can discuss bullets and italicization of region name). Note that there may be Portuguese translations that are not regional which would go on the first line. But the question posed is a little different. I think that the indentation should consistently match region, and the language/dialect listed should consistently match what we use as level-two headers. Otherwise you would end up with
  • Chinese
    • Mandarin: ...
      Taiwan: ...
which is a bit crazy. On the other hand, the problem with not doing it this way is that Mandarin translations wind up under "M" instead of "C", which in my opinion is mitigated by the use of "Mandarin Chinese". DAVilla 22:26, 22 March 2008 (UTC)
The preferred L2 header is ==Mandarin== or ==Cantonese== or the like. The way Translations are listed is a current topic on Wiktionary talk:About Chinese. --EncycloPetey 22:05, 21 March 2008 (UTC)
Great, thanks for the link! If the translation section were to list
  • Cantonese
  • Mandarin Chinese
Then would it be objectable to change all the language headers to ==Mandarin Chinese== to match? DAVilla 22:26, 22 March 2008 (UTC)
I would like to say that there are occassions when it would be good to point out both region and language/dialect in the translations section. In such cases, I don't think it would be crazy to have (for computer):
The above example is not necessarily the norm, but there are times when you may want to point out this kind of information. However, I'm not arguing for a specific format, just the ability to include such information when necessary. -- A-cai 08:16, 29 March 2008 (UTC)

I would like to propose that:

  1. We use the same names for languages in translations as those which are used for L2 language headers
  2. We use Mandarin Chinese instead of just "Mandarin", but otherwise the name of the dialect such as Cantonese without "Chinese"... unless we decide otherwise for a specific dialect
  3. Within the translations section, we list all languages at the same level, and indent only regions which would not be accepted as L2 headers

The major objection to this of course is that Mandarin translations would not be listed under "C". Does the simplicity have enough benefit to overweigh that objection? DAVilla 04:50, 31 March 2008 (UTC)

That sounds good to me, though I'm not sure regions should generally be indented. I think something like Mandarin Chinese: (PRC only) trad. 計算機, simpl. 计算机 (jìsuànjī), (PRC and Taiwan) trad. 電腦, simpl. 电脑 (diànnǎo) would be better in most cases. —RuakhTALK 05:17, 31 March 2008 (UTC)
Sounds reasonable to me as well. Thinking ahead (but not discussing the issue yet), we should consider how this would impact grouping and alphbetization for languages like Ancient Greek, Old French, and Northern Saami. IF the Translations labels and L2 headers always match (which I think is good), can we live with the impact it would have in situations where languages have a qualifier of time or location? --EncycloPetey 05:35, 31 March 2008 (UTC)
Same as the reasoning for listing language sections alphabetically, this is already an unavoidable issue where the newer and older languages do not even share a name. For not remembering a good example of that I'd make a terrible linguist. DAVilla 05:00, 7 April 2008 (UTC)

Separate new page templates for simple past and past participle

The new page template for "Past" creates "Simple past tense and past participle". It would be nice to have separate templates for "simple past" and "past participle" for words where they're different. - dougher 00:48, 21 March 2008 (UTC)

Maybe I'm just dumb, but can you give an example of a word where they are different? Nadando 02:11, 21 March 2008 (UTC)
sing - simple past sang (I sang in the choir), past participle sung (I have sung in the choir). Thryduulf 02:19, 21 March 2008 (UTC)
For more examples, see Category:English irregular verbs. Thryduulf 02:22, 21 March 2008 (UTC)
This would be very useful in corner cases such as proved vs. proven where the regular inflection is both the simple past and past participle, but where use of an irregular past participle needs distinction of context such as region. DAVilla 07:37, 21 March 2008 (UTC)
What new template are you talking about? We've always had possible the option of listing these separately. --EncycloPetey 22:03, 21 March 2008 (UTC)
{{past of}}:{{simple past of}}:{{past participle of}}::{{new en verb past}}:{{_____}}:{{_____}}, where the blanks denote the templates I believe he's proposing. —RuakhTALK 22:23, 21 March 2008 (UTC)
No, he hasn't proposed a new template. He said 'The new page template for "Past" creates...' which implies that the template he's talking about already exists. I'm asking what "new" template that might be. --EncycloPetey 01:17, 22 March 2008 (UTC)
When you search for a word that isn't in Wiktionary, you are shown a table with the title "You can create a new entry with one of the following preloaded entry templates:" [8].
The options given on this table are "Basic", "Noun", "Plural", "Adjective", "Adverb", "Comparative", "Superlative", "Verb", "3rd Person", "Participle" and "Past".
If the word you want to create is a past participle then obviously the option you choose is "past".
However, this preloads a page inclduing the template {{past of}}, which outputs "Simple past tense and past participle of", and categorises your word into category:English simple past forms and category:English past participles. This is correct for English regular verbs.
If your word is a past participle but not a simple past form (or vice versa) though then this is incorrect. The original questioner is asking for separate preloaded templates for past tense words that are not both the past participle and simple past form (i.e. they are one but not the other). Thryduulf 01:35, 22 March 2008 (UTC)
Ah, thanks. I understand the question now. --EncycloPetey 19:03, 22 March 2008 (UTC)
FWIW, at least two other editors (myself and RJFJR) also originally parsed "new page template" as "new {page template}" rather than as "{new page} template". I don't know why it's so confusing, but you're not alone. —RuakhTALK 21:04, 22 March 2008 (UTC)
Right, so how about if we split {{new en verb past}} into two definition lines, {{simple past of}} and {{past participle of}}, and then when appropriate the user could simply delete one of the lines? DAVilla 22:10, 22 March 2008 (UTC)

Projects being neglected

Wiktionary:Translations of the week and Wiktionary:Collaboration of the week have not been updated regularly, Translations for a few weeks and Collaboration for much longer. Is anyone interested in keeping these up to date or should we take them out of prominence rather than have repeats showing up? - TheDaveRoss 21:29, 21 March 2008 (UTC)

If nobody is working on them, then I suggest you reduce their prominence and mark them as inactive with a note to start a discussion here if anyone ones to revitalise them in future. My plate is currently full working on category:Requests for pronunciation and special:UncategorizedPages. Thryduulf 21:42, 21 March 2008 (UTC)
Connel, DAVilla, and I (and others) have all tried at times to get these projects going. Ethusiasm and participation seldom continues beyond a couple of weeks, and then the community forgets about them again. I stopped bothering because it just wasn't worth my time and effort to try to get people involved. --EncycloPetey 22:02, 21 March 2008 (UTC)
Projects only work properly if people work together as a team. But we are all individuals doing our own thing. They should probably all be scrapped. SemperBlotto 22:21, 21 March 2008 (UTC)
Both have been marked {{inactive}}. Conrad.Irwin 10:24, 22 March 2008 (UTC)
Well, we include them elsewhere, should we drop the inclusion on places like Tea Room? - TheDaveRoss 01:52, 23 March 2008 (UTC)
That would seem sensible, though we should probably leave a few links hanging around in the hope that someone enthusiastic finds them one day. Conrad.Irwin 16:10, 23 March 2008 (UTC)
I went ahead and filled up ToW for the next ~25 weeks, took about 20 minutes. Apparently this isn't a time consuming chore it is just one that has slipped by for whatever reason. If anyone wants to take a crack at CoW I am sure it would be similarly quick and painless and these rather useful Things to Do(tm) wouldn't go by the wayside. - TheDaveRoss 16:42, 29 March 2008 (UTC)

Interwikis and ta.wikt (Tamil)

moved from Grease Pit, as this is no longer a technical issue ...

I've disabled the adding of iwikis for ta.wikt for now (except when the bot is making an edit anyway, so you'll see "iwiki +fr, ta" but not "iwiki +ta"). Reason is that the bot was spending 60-70% of its time on these.

The problem is that the Tamil wiktionary has gone from <10K entries to over 100K in about two weeks. (Urp!) There is a bot loading lots and lots of "English" words. Some of the entries are clearly crap. Some are dubious even without knowing any Tamil. The rest? Well, I tried looking up a few dozen words that appear in what seems to be the definition line, and most of them gave me the magic zero google. Of course that might be a lot of technical words that no-one has yet written in Tamil on the net, but it makes me wonder.

Also, all of the entries reference tamilvu.org, (Tamil Virtual University) and everything there is copyright. It could be that the source is elsewhere, and that is just a "convenient" link, but I dunno.

Someone ought to look at this ... anyone know any Tamil? Robert Ullmann 08:14, 21 March 2008 (UTC)

You could ask ravishankar. --EncycloPetey 20:04, 21 March 2008 (UTC)

uh oh ... from my talk page:

Hello Robert Ullmann. I am User:Sundar in English and Tamil Wikipedias and the Tamil Wiktionary. I saw your message to Ravi concerning the bot-created articles in Tamil Wiktionary. As I wrote up SundarBot that uploaded articles, let me answer your questions:

Firstly, while there could be some unforeseen bugs in transcoding to Unicode, there's no junk uploaded by the bot. Secondly, we got the glossary from Tamil Virtual University which developed that dictionary from numerous public domain sources, volunteer effort, and fully funded by the Government of Tamil Nadu. Also, we believed that words of a language can't be copyrighted and are naturally in the public domain. The bot took the meanings from www.tamilvu.org, transliterated them to Unicode (from TAB encoding), categorised them, formatted per wiktionary conventions, added pronunciation where one exists in the commons, and uploaded it to Tamil Wiktionary citing TVU and providing a link to their page. Errors from the original source have since been corrected by users too. Being words of a language (actively encouraged by the creator for wide public use) compiled using public funds copied with proper citation, processed and value-added in Wiktionary is fair-use according to Tamil Wiktionary editors. Also, let me state that we didn't use any style or artistic product of TVU. -- 122.167.242.183 14:25, 22 March 2008 (UTC)

sigh

As far as I can tell, Tamil Virtual University claims copyright on the dictionary (as part of a blanket claim on the website). And his definition of "fair use" is very very wrong, and "fair use" doesn't apply anyway if you are copying entries from another dictionary.

Where do we take this? Robert Ullmann 14:48, 22 March 2008 (UTC)

This is a very serious problem, though arguably one to which we could simply shut our eyes, since in the end their content is not ours. Assuming that Sundar's explanation above is accurate, TA is in direct violation of foundation:Resolution:Licensing_policy. Stewards refuse as a matter of principle to be involved in these matters, so I guess the best that could be done would be to contact foundation staff directly (or semi-directly via a post to foundation-L). -- Visviva 15:38, 22 March 2008 (UTC)
Foundation-L? Is there not enough noise there? I sent email to WMF counsel, Mike Godwin. (who is the eponymous Godwin! very cool :-) Robert Ullmann
You're right that if fair use is being claimed here, it's likely a bad claim, and unless ta.wikt has an EDP (unlikely) even a good fair use claim is in violation of policy. However, I would note that while it makes me uncomfortable, Mike Godwin has said in the past that things like dictionary definitions and etymologies are "facts" that can't really be subject to copyright, even though the companies that publish dictionaries will claim that it is. To me, those seem like the most creative parts of the dictionary, too, and Mike seems to think that there isn't much to a dictionary that is protected. Aside from that being startlingly permissive, it would also be a direct commercial threat if we, say, copied OED wholesale, so they'd likely challenge it in some way eventually. It would be good to hear from his own mouth what he has to say, but I remember being surprised the last time he was asked about Wiktionary and copyrights. Dmcdevit·t 09:52, 23 March 2008 (UTC)
If anyone is interested in the finer details on copyrights on facts, and collections of facts, in US law, the seminal case is 499 US 340 (Feist v Rural), in which the Court ruled on the (non-)copyright in a telephone directory (the content being facts, with no originality in selection or arrangement: all the numbers in the area, in alphabetical order by subscriber name). However, dictionaries contain significant creativity in selection, composition of definitions, etc. So Sundar has part of it right, but the reference to "fair use" implies that he has knowingly used copyright material, rather than simply using the "fact collection". IMHO, Mike is correct that there isn't much to a dictionary that can be copyright; but the entire work almost certainly is. And Sundar copied/is copying the entire work from TVU. It will be interesting to see what he (Mike) says. Robert Ullmann 10:19, 23 March 2008 (UTC)
<IP lawyer hat on>If an action were brought against anyone in this matter, the action would likely be brought in India and the applicable law would be the copyright law of India, where the work was produced. Under the prevailing law, the INDIAN COPYRIGHT ACT, 1957, the Indian government does own copyright in its works, and works produced by a "public undertaking" (a cooperative venture between a government and private actor) are owned by that "public undertaking". Feist v. Rural is not applicable law to an Indian copyright. The courts of India have apparently rejected Feist, at least with regard to the "sweat of the brow" doctrine, in Burlington's Home Shopping Ltd. v. Chibber (1995), in which the Delhi High Court applied copyright protection to a computer database of contact info from mail order customers.
Even if Feist were applicable, dictionaries are generally covered by copyright to the extent that, for example, creativity goes into the writing of definitions. Of course, if a copyrighted dictionary purports to define all words in a language, its owner can not prevent another person from similarly attempting to define all words in the same language, and from referring to the copyrighted work to determine if that list is complete. However, if there is anything more than that - say, one dictionary copying another's select (but not "complete") list of words, then we run into copyright problems. Furthermore, although India has some broad fair-use exemptions, I don't see how this use falls within them - there is no criticism or artistic statement made about the Indian government's work, nor the reporting of a news event. In short, I wouldn't be plucking definitions from a copyrighted work in India. </IP lawyer hat off> bd2412 T 01:18, 24 March 2008 (UTC)
Yes, thank you, I should have been clearer than "in US law", pointing out that the copyright material was in India, so Feist is illuminating, but not controlling. Thanks, Robert Ullmann 13:54, 24 March 2008 (UTC)

{{plurale tantum}} vs {{pluralonly}}

We currently have two different templates for use on words used only in the plural, they both categorise articles into the appropriate language subcategory of category:Pluralia tantum but display different text.

  {{plurale tantum}} {{pluralonly}}
Display text (plurale tantum) (plural only; not used in singular form)
Inclusions 245* 159

* Including inclusions of {{pluralia tantum}}, which redirects to {{plurale tantum}}

Having a quick look at the entries they are used on, there is a very slight preference for {{pluralonly}} to be used on more basic words, but this is not going to be at all statistically significant. The history seems to show that they have developed independently, with the creation of {{plurale tantum}} postdating that of {{plural only}} by about 5 months.

I don't see the benefit of maintaining two separate templates which are doing the same job, however as the wording is different it isn't just a case of redirecting one to the other. While "plurale tantum" is the correct name for the class of words, it isn't a term with which most non-specialists are familiar. The "plural only; not used in singular form" is a very good definition of the meaning of "plurale tantum" but it doesn't educate people about the technical name. As combining the descriptions will make the text too long, perhaps "(plurale tantum)", including a link to the entry would be best? Thryduulf 16:39, 22 March 2008 (UTC)

I feel that we should use the correct term but provide a quick definition on hover, maybe something like (plurale tantum). I also quite like the idea of the link, but it is quite slow compared to a hover text. Conrad.Irwin 16:55, 22 March 2008 (UTC)
I think something like (plurale tantum) might be the best of both worlds. —RuakhTALK 17:08, 22 March 2008 (UTC)
Is there any English-language dictionary that uses "plurale tantum" instead of "plural construction only" or a similar English wording? The term is not even defined in MW3. What is our justification for ignoring the needs of our users or, actually, our potential users? That botanists descirbe newly discovered plants in Latin does not mean that gardening books are writen in Latin. DCDuring TALK 17:26, 22 March 2008 (UTC)
The one hardcopy dictionary I have to hand, the 1998 edition of The Chambers Dictionary, doesn't define "plurale tantum" and uses the abbreviation "n pl" for "noun plural" to mark such entries. This dictionary bills itself as "the most comprehensive single-volume dictionary" so I would not expect to find it in concise dictionaries. Thryduulf 17:38, 22 March 2008 (UTC)
Both the OED and MW3 use pl.. The AHD is inconsistent, sometimes using the text "Often used in the plural" (cf. pant) and other times putting the plural form in bold at the head of the numbered sense (cf. color). Random House doesn't bother to mark these at all. --EncycloPetey 19:02, 22 March 2008 (UTC)
No clue. An English wording would be better, if there's a brief one. How about (in plural)? That would also reduce the suggestion that a word is simply never used in the singular: after all, most pluralia tantum, despite the name, are sometimes used in the singular (especially in jocular, nonce, or nonstandard usages, but even certain standard forms of the language tend to singular-ize them in attributive use, and some words, like paparazzi, are freely used both ways by different people). —RuakhTALK 21:13, 22 March 2008 (UTC)
That's sounds like a good reason to use plurale tantum. If we use an English wording, it will be positively misleading. A technical term like plurale tantum will have to be looked up by people less familiar with the grammar, and can be linked to a definition or Appendix that points out that the plural is not strictly absolute, as you say. The probelm with "in plural" is that it isn't strong enough. The problem with "only plural" is that it's too strong. Any English wording likely to be of the appropriate strength is also likely to be too wordy. --EncycloPetey 16:28, 23 March 2008 (UTC)
The wordings "rarely" "sometimes", "usually", and "always", with "plural" would seem to encompass the range of possibilities, with "usually" being an appropriately cautionary and flexible default. Having a special link from plural to a WT Appendix article on plurals would give us the chance to clarify exactly what we mean, if any ambiguity remained. "Tantum" only means "only".
No matter how we do this, some users will be mislead. The Latin, I fear, will put off a noticable percentage (5%?) of viewers of the article, whether or not they care about plurals. Only some (50%?) of viewers will really care all that much about the plural question. I would expect that not all users (20-50%?) will click through to any link on a grammar point, even if that's what they were looking for. I would expect that a Latin term would lead to fewer (50%?) click-throughs than an English one. Few users (20%?) would get the right idea from the Latim term itself. More (3X) would get it from an English term. If one does the arithmetic one would conclude that fewer users are helped by Latin than English and that some folks would be put off by the very presence of Latin.
Finally, I don't think that the plurale tantum and singulare tantum entries really explain things well enough. DCDuring TALK 17:04, 23 March 2008 (UTC)

Agree with using {{plurale tantum}} as a specific term to desribe what is a phenomenon rather than a rule. Those users who would ever care, or in any way be affected by the phenomenon, would click the link and thus learn a fancy new term. -- Thisis0 17:27, 29 March 2008 (UTC)

So, if I understand the argument correctly, we don't care what our (potential) users understand, but prefer non-english terms in the english wiktionary because they're more precise? and if they're curious enough, they'll click through. Just asking for clarity on this issue. - Amgine/talk 20:13, 29 March 2008 (UTC)
We care very much what our users understand and how they are best able to consume our content. We also care how to increase their understanding. In this particular case, I agree with using the specific Latin term because what we are describing is a linguistic phenomenon with it's own peculiar attributes. Using actual short English phrases to try and describe it becomes misleading and inaccurate. Saying "plural only; not used in singular form" is both inaccurate and proscriptive -- two things Wiktionary aims not to be. Other attempts in English to briefly capture the existence, function, and use of terms like trousers, glasses, scissors, and clothes are certain to be imprecise and misleading. If you can coin a template-worthy phrase to describe the way these terms really work, go for it. Until then, I advocate using the detached, erudite phrase (with click-through to thorough explanation) to tag the peculiar phenomenon. -- Thisis0 00:44, 1 April 2008 (UTC)
That is saying that we only care about those who are willing to put in the time, not those who are too busy, i.e., who have a life. That's like only teaching the students who are going to go to grad school. "Plurale tantum" and "singulare tantum" aren't clearly explained once someone clicks through and have a deterrent effect on users. Or are you saying that the value of plurale tantum is precisely that it isn't clear that it means "plural only"? DCDuring TALK 01:16, 1 April 2008 (UTC)
Partly, yes, indeed I am saying that is its value. Actually saying "plural only" would be quite wrong, and would teach the wrong thing to the most nonchalant user, which is worse than teaching him nothing. And the fact that the click-through entries aren't yet thoroughly explained isn't a reason not to proceed with the correct course. Make them thoroughly explained. Also, I'm not gonna give any weight in my consideration to anyone "taking offense" or being deterred by the mere presence of things that actually have Latin names. What is a dictionary if not instructive and illuminating on the subject of words? We will be just that. -- Thisis0 03:33, 1 April 2008 (UTC)
If we decide to go with "Plurale tantum" (which I'm less keen on now that when I started this thread), then I think Ruakh's hover and link would be the most useful for users with and without a life. What might be better though would be being able to express degrees of commonality - "not used in singular form", "rarely used in singular form", "usually not used in singular form", "normally not used in singular form", "not used formally in singular form", etc. Although the last would probably work better expanded to a usage note. Thryduulf 01:34, 1 April 2008 (UTC)
If any of those phrases happened to be accurate for a specific entry, they are Usage-Notey and belong in that section. The actual real name for the thing is plurale tantum, which is the right substance for a tag. Besides, it's not as if the Latin name is some obtuse thing. It starts with the word "plural", and most who glance at it can sense the phenomenon to which it refers. The fact that the tag exists is the only reason I know the term today. I learned it here, and have since researched, explored and categorized the phenomenon in my brain, i.e. I learned a new thing because of it. Don't rob the next guy of the chance to learn. -- Thisis0 03:33, 1 April 2008 (UTC)
I think we would do well to express degrees of prevalence in the sense line (or in the inflection line if appropriate). It would be nice to direct users to the Usage notes if there is more, but some generic cases could be handled with some kind of link to an appendix, an article, or even a "Plural notes" subheading under Usage notes. I'm not at all certain that this can be handled separately from plurals in general. Generic cases could include "pairs of" words (scissors and other tools, various kinds of pants, spectacles/eyeglasses) and other cases which I am too tired to reliably characterize at this time. If we do this, I don't see how we could do it in Latin and expect to be understood or indeed expect all editors and admins apply it properly. DCDuring TALK 03:23, 1 April 2008 (UTC)
Keep it simple, friend. -- Thisis0 03:33, 1 April 2008 (UTC)

The dictionary should be as clear as possible to all readers. We diminish it by imposing our idea of what they should learn. Plurale tantum is rare jargon – does any recent dictionary define or employ it?

The CanOD and NOAD don't define it (nor does Dictionary.com). They simply use the description "plural noun". It's self-explanatory, unambiguous, generally useful, and can be made more specific with labels. For example, in CanOD individual senses are sometimes labelled like "jean noun 1 a heavy twilled cotton fabric... 2 (usu. in pl.) hard-wearing pants..."

Let's use plain English instead of obscure Latin. —Michael Z. 05:40, 1 April 2008 (UTC)

Teeth, men, and horses are also all "plural nouns", however they hardly relate to the linguistic category we are discussing. That phrase is neither unambiguous nor particularly useful in separating these terms out from other plurals. In line with the goal of being "as clear as possible to all readers", the phrase 'plural noun' is lacking. Also, the fact that other dictionaries deal with this phenomenon in divergent, non-uniform, and sometimes inaccurate ways, does not mean we should follow suit. Other dictionaries also have been highly confused and divergent on the issue of "Noun Adjunct/sort-of Adjectives", which Wiktionary is also digesting right now and attempting to handle properly. So. A) Plurale tantum has a real name. We didn't make it up. B) It's the most accurate way of labeling a specific phenomenon that is more complex than a few English words can describe. C) It avoids the misleading nature of those inaccurate phrases, and giving potentially false impressions is worse than teaching nothing. D) We're really only talking about a very small number of entries to which this term even applies. E) Being a project that values knowledge, I without question err on the side of imparting more of it rather than catering to those uninterested in knowledge. F) The presence of two italicised Latin words in an entry does not prevent in any way the most casual user from learning the definition of trousers. Incidentally, the best way to teach this user the same information is with example sentences. We have those too. -- Thisis0 18:08, 1 April 2008 (UTC)
Hm, those are the plural forms of regular nouns, nouns in the plural. E.g. "men pl / Plural form of man," as opposed to "scissors plural noun / A type of tool..." That's just an example; I haven't claimed to have worked out the best method, but I do see plenty of precedents indicating that we can express this in English.
Many items in Category:English pluralia tantum simply aren't, and many more will require notes explaining the nuance. I think something like "usually in the plural" is better than mixing languages with "usually plurale tantum." Does that even make sense, or is plurale tantumness an absolute quality, requiring that such cases be labelled in English anyway?
As to the superior accuracy of the Latin, I do see the benefits of using a technical term in its appropriate context, and also the disadvantages of using it elsewhere. To most readers, p.t. will remain jargon (again: does any recent dictionary use or even define the term). Indeed the concept is "more complex than a few English words can describe," but most of our readers won't benefit from a familiarity with lexicographical literature debating its meaning,[9][10] but will have to settle for an (adequate?) 11-word English definition anyway. The same goes for our editors, so the category will continue to be full of nouns which are conventionally, often, usually, or mostly plural, but may not in fact be plurale tanta. —Michael Z. 20:59, 1 April 2008 (UTC)
I don't understand the point of men, teeth and horses. As to your helpfully labelled points:
A. I do have another term that we didn't make up: "plural in construction". This has the advantage (from your point of view) of not being 100% self-explanatory, but also the advantage (from my PoV) of being in English.
B. As for "plurale tantum" accurately describing things in a few words: it does not describe things for non-Latinists, it merely labels them; and it does not well address the problem of usage that are "usually", "often", or "sometimes" "plural in construction" without the use of macaronic, oxymoronic constructions that would give most of us a chance to test our gag reflex, such as "sometimes plurale tantum".
C. I do not see the inaccuracy of "plural in construction".
D. To the 400+ templates should be added the entries that link to plurale tantum and singulare tantum, those using {{singulare tantum}} and {{singular only}}, those that might be using some unlinked variations, and those that should have an appropriate label.
E. I favor imparting knowledge especially to those who are our most marginal visitors, because there are so many of them. (Evolution must favor them because there are so many of them.)
F. The impact of Latinisms on our users is not one that we have any facts about that I am aware of. Anecdotally and analogically, I would draw your attention to the replacement in the paper and printing and paper industry of such terms for page sizes as sexto and sextodecimo with sixmo and sixteenmo. This is suggestive of a certain concern within that industry with whether it is worth more to respect old practice or make things intelligible for newbies.
I strongly favor usage examples. I also favor usage notes for nuances. These last two possibilities do not differentially favor either English or Latin labels on the sense lines. DCDuring TALK 19:10, 1 April 2008 (UTC)

Forcing fonts in script templates

The issue of forcing fonts in script templates has come up a couple of times lately, so I thought I'd bring up the issue here. Here are a couple recent discussions:

There are a couple of questions raised by changes that Conrad.Irwin and I have made lately (some of them unimportant technical details), but I think the important issue is whether or not we should be forcing fonts for particular scripts to use the ones that our local language experts have deemed to be the "best" fonts for that script.

There are a couple of cases where we have to force fonts to accomodate Internet Explorer 6, but is already done in a way that only affects that browser. There are also cases like {{Cyrl}} where the default fonts used by most of our readers don't render the text correctly (combining accents in the case of Cyrillic). Since I think that forcing fonts in those cases is broadly considered a necessary evil, I'd like to focus on the cases forcing fonts is done solely to make things look "optimal", not to work around broken browsers or incorrect default font rendering in common browser setups.

My personal feeling is that we shouldn't force fonts unless it is to correct a widespread problem that actually results in incorrect display, not just suboptimal display. If we want to provide a way for logged-in users to easily choose to have the "best" fonts as determined by people who are familiar with the script in question, it could be done with WT:PREFS. Mike Dillon 00:11, 23 March 2008 (UTC)

P.S. There are currently big timing problems related to certain types of changes to the handling of script-specific fonts due to the caching strategy used by the Wikimedia Foundation's settings for running MediaWiki. If you're interested in giving us back some control of the timing of these things, please vote for bugzilla:8433 or add your comments there. Mike Dillon 00:13, 23 March 2008 (UTC)
I agree. Accessibility is paramount, so we should take the necessary steps to ameliorate known breakage. For example, the known MSIE 6 bug renders international text unreadable, and even a technical user would have no way to fix without our help. (Anyone know if this still affects MSIE 7 and 8?)
But aside from that, users' preferences should be respected, also to follow accessibility principals. There are more web browsers, versions, and configurations out there than we can ever test, and countless different sets of user preferences and personal style sheets. Many of them have been specifically chosen by users for their own preferred fonts, to display their own preferred or required language scripts. (Even a tiny fraction of our audience is very many in absolute terms.) Any unjustified fiddling with fonts and styles is bound to degrade or break the display for someone out there, so there should be as little intervention as possible.
Personally, I dislike that some templates currently override my font choice and eliminate the default bolding for Cyrillic in all browsers, but I can live with it. I'm looking forward to some changes in the translation templates which have to wait for the proxy caching to refresh. —Michael Z. 08:30, 26 March 2008 (UTC)
WT:PREFS is already too complex. I don’t understand a lot of it, and I seem to have to spend half an hour a week reticking the Expand translations box. So far, the only result I’ve seen from removing forced fonts is bad display, sometimes to be point of being unreadable. Perhaps you could add code to all of the font-call templates that will allow users to tick a box in WT:PREFS if they want the template to be ignored. I have no understanding of the timing problem you mention at bugzilla:8433, so it would be silly for me to vote either way. —Stephen 13:04, 26 March 2008 (UTC)
What I am suggesting is forcing fonts only for badly broken browsers (i.e. for MSIE's inability to deal with mixed language scripts), but not in other browsers (i.e. pretty much the rest). This is what's done in Wikipedia, and mostly here, except IPA and Cyrillic fonts are currently forced for all browsers. I agree that there's already way too many preferences.
Twiddling the display without necessity will have unpredictable effects in minority browsers, and could break the display for someone who has an otherwise working browser, or who has already taken steps to make the text he's interested in work right by default.
Is it MSIE that are you experiencing problems with, or am I being too presumptuous? Which writing systems suffer in Wiktionary? —Michael Z. 21:24, 1 April 2008 (UTC)
The fonts are actually forced for quite a number of scripts, they're just forced with inline styles instead of in the site-wide CSS file. Pretty much all of the font templates in Category:Font templates are used in an active script template. Mike Dillon 02:14, 2 April 2008 (UTC)

Customizing appearance of FL links in translation tables

for those that don't usually read the Grease pit, you might be interested in WT:GP#Customizing t, t-, t+ templates, please keep the discussion there, this is just a pointer Robert Ullmann 14:07, 24 March 2008 (UTC)

clung

This entry has two verb headers with corresponding translations. I'd like to create one verb header with two senses and one translation header with two trans tables. Would this be a good approach? --Panda10 16:01, 24 March 2008 (UTC)

Sounds good to me.—msh210 16:07, 24 March 2008 (UTC)
Putting both senses under the same header is an excellent idea. However, whether the translation tables should stay at all is a question which the larger community has not yet decided. Some of us (especially those who work with highly inflected languages) would argue that a "form of" entry like that should not have translations at all. If I were to reformat the entry I would put both senses under the same verb header and then nix the translation tables altogether (although I was take a look at cling and see if some of the information is lacking there). However, if.....say...Robert Ullmann or Connel MacKenzie were to reformat the entry, they would keep the translation tables and format them as you say (see this vote for more info. In any case, formatting the entry as you proposed is certainly acceptable. -Atelaes λάλει ἐμοί 16:23, 24 March 2008 (UTC)
Thanks. I made the change. --Panda10 21:10, 24 March 2008 (UTC)

-in' forms

Two RFD nominations of -in'-form verbs, bein' and frontin', led a somewhat more general discussion of whether we should generally have entries for such words. Before debating that, I think it would be instructive to have people's views on what the state of the CFI is currently. Do you read the current CFI as allowing:

  • all -in' words (minus perhaps some exceptions) whose corresponding -ing words are attested;
  • individually attested in' words; or
  • none of the -in' words (minus perhaps some exceptions) even if the corresponding ing word is attested?

Once we have people's views on the current CFI, we can, perhaps, talk about whether they ought to be amended.—msh210 16:05, 24 March 2008 (UTC)

Attested.msh210 16:05, 24 March 2008 (UTC)
Attested.RuakhTALK 23:06, 24 March 2008 (UTC)
Actually, that's not true. I definitely think the current CFI allow -in' forms, but I'm not sure they currently require those forms to be cited separately from other forms of their respective words. (This is actually a question that applies to normal wordforms as well: my opinion for that is that we have to cite words, not wordforms, but I'm not sure that would apply to something like this. I guess I find it hard to separate how I think it should be from how I think it is, heh.) —RuakhTALK 01:29, 25 March 2008 (UTC)
Well, I await the outcome of this discussion, since it can impact how we do Latin, where individual forms are often not attested, but are assumed from the pattern of the verb conjugation. --EncycloPetey 01:36, 25 March 2008 (UTC)
For what it's worth, my plan is to only include forms which are attested in Ancient Greek. However, this doesn't necessarily mean that such a route is best for Latin, as grc inflection is a bit more varied and regionally dependent than Latin's. -Atelaes λάλει ἐμοί 05:49, 25 March 2008 (UTC)
Attested, but uncertain as to whether each "-ing" PoS (verb, noun, and adjective) needs to be separately attested in the "-in'" form. Pretty sure not each sense. DCDuring TALK 00:56, 25 March 2008 (UTC)
Attested. In my opinion CFI, as it currently stands is fairly silent on this particular issue. However, I see no reason to exempt forms from the normal criteria (both in' or otherwise). In the future, if we manage to find a more automated fashion of finding cites, such as TheDaveRoss's citebot (the possibility of which is freakin' awesome for a number of reasons), we may want to trim some of them out which are not attested. However, for the time-being, I think that we should stick with our current practice of allowing all normally inflected forms of a cited lemma. Irregular forms, such as -in', should be handled and cited on a case by case basis. -Atelaes λάλει ἐμοί 05:49, 25 March 2008 (UTC)

Random Page

A few dozen clicks of "Random page" indicate that more than 80% of en.wiktionary.org now consists of words in languages other than English. (Is the proportion published anywhere?)
I think this has been suggested before, but does anyone have the skill to program a "Random English page"? (I don't have the programming skill, unfortunately.)
Also, many of the non-English word entries contain just the part of speech. It would be very useful to have a translation too, without having to follow through the grammar trail (which is OK for grammarians, but frustrating for those who just want a quick meaning in English). I realise there are difficulties with this because the meaning often depends on context, but at least some attempt to translate into English would be useful, wouldn't it? Dbfirs 17:42, 24 March 2008 (UTC)

  • This has been mentioned many times before. When we achieve our aim of ALL words in ALL languages (not this year!) then English will probably accoubt for less than 1% of the dictionary. The part of speech entries for languages such as French, Italian and Spanish are mostly generated by bots. The work involved in adding translations, or simple examples of use would be horrendous - but anyone with a large amount of time could have a go. SemperBlotto 17:49, 24 March 2008 (UTC)
Then again, some have argued that adding translations to these is a bad idea. How do you translate an Ancient Greek perfect optative middle/passive second dual? The answer is, you don't; not in isolation anyway. The fact is that, for most languages, having a basic understanding of the language and its grammar is essential to being able to translate it. Adding translations to these will help no one, and mislead many. Also, Connel Mackenzie has a random English word program on his userpage (5th line down, rnd-En). -Atelaes λάλει ἐμοί 17:57, 24 March 2008 (UTC)
Connel has an external tool which will send you to a random page in a given language, based on dumps I think. In order to have a link to a random page in English either the wiki would have to parse all the pages to decide if it included anywhere ==English== or not, or it would have to give a random page from a category "English Words". Currently I don't think 100% of our English words (or 100% of any language's words) are in a category "languagex Words", it wouldn't be difficult to categorize them as such with a bot but I seem to recall a discussion about how silly it was to have categories as broad as "languagex words"...imagine that someone found a good use for them :). So, a couple steps would need to take place to have it local, or we could use Connel's tool and call it a day.
Also, WT:GP might be a better place for this particular suggestion. - TheDaveRoss 20:05, 24 March 2008 (UTC)
  • Thanks. Questions answered. Answers and reasons understood. I'm impressed by the vision of "all words in all languages"! Dbfirs 21:00, 24 March 2008 (UTC)
Whoops, I'm a bit late - but see Wiktionary:Random page for more info on Connel's solution. Conrad.Irwin 17:28, 25 March 2008 (UTC)

Admin Notice: Special:MergeAccount

Finally, a reward, of sorts, for all your hard work! Administrators on any Wikimedia site can access the Special:MergeAccount page, which allows them to unify their logins across the whole of the Wikimedia Foundation. This is not yet available to non-administrator users however I am sure it won't be too much longer. Thank you to all the people who have been involved in the implementation of this great, long anticipated, feature. Conrad.Irwin 11:07, 25 March 2008 (UTC)

Cheers for the note Conrad, I'm now the only Thryduulf you'll see on any Wikimedia project - and it only took three tries to remember the old password I still had set on cy.wikipedia!
Its worth noting that it works even if all the accounts you have don't have admin status, but whether it requires it on your "home wiki" or just on more or more I don't know. Thryduulf 13:49, 25 March 2008 (UTC)
I did not even remember that I had an account at Wikispecies! Circeus 13:26, 26 March 2008 (UTC)
Likewise I don't remember registering at sv.wikipedia! Thryduulf 14:20, 26 March 2008 (UTC)
It requires (for now) that you initiate the merge from "my preferences" on a wiki where you have admin status. Having done that, everything is symmetrical again. Robert Ullmann 14:47, 26 March 2008 (UTC)
I've done this using my admin status on simple.wikt across a variety of sites, but the user:Brett account is taken here, though it's completely unused. If a bureaucrat here could rename that, I could unify my logins. I wonder if somebody would have a look over here.--BrettR 19:26, 26 March 2008 (UTC)

Does this work if your password is not the same for all your accounts? That is, do I need to sync my passwords first? --EncycloPetey 19:33, 26 March 2008 (UTC)

If the password and/or (confirmed?) email address are the same as the account from which you initiate the merge, then the merge will happen automatically. If neither of these are true, then you are given the option to log into those accounts. Once the correct password is supplied, the accounts are merged. For example
  • Most of my accounts had a password starting with X and/or had the same email address as my Wiktionary account - these were automatically merged.
  • All but one of the rest had a password starting with k, once I provided this password all the accounts with it were merged.
  • The remaining 1 account (cy.wikipedia) was not included in any of the above. I tried at least two different passwords without success until I remembered an old password that worked and it was then added to the merge.
If I hadn't remembered the password, I could have left it unmerged to merge it in at a later date after I'd reset the password. I presume that had I not been able to remember or reset the password that I could have gone through a forced merge procedure or something. Thryduulf 19:51, 26 March 2008 (UTC)
Seemed to go very smoothly for me. The one thing the doco didn't say is that it does set the unified password to the one you used in the merge procedure. Instead of having multiple passwords, there is now a single password for all merged accounts. --EncycloPetey 20:16, 26 March 2008 (UTC)
That's because there is only one account after the merge. The only thing on the local wikis after the merge are user preferences (along with contributions, action logs, user page(s), and user talk page). Mike Dillon 03:50, 28 March 2008 (UTC)

Bot flag request for User:Computer

Hi, I would like to request a bot flag. I already have a bot flag on many wikis m:User:White Cat#Bots and I am from commons:User:White Cat at which I am a sysop. My language skills: en-n, tr-4, az-2, ja-1.

I hope to help with the following tasks:

  • Interwikilinking using interwiki.py
  • Double redirect fixing using redirect.py
  • Commons delinking using delinker.py

I can also help with tasks like recategorization or any other bulk find & replace tasks.

-- Cat chi? 00:18, 27 March 2008 (UTC)

We do not need another interwiki bot (if you look in the archives the last few requests to run one have been denied) as User:Interwicket is much more efficient than interwiki.py. We also have User:CommonsDelinker - though I don't know if it needs a hand or not. Double redirects aren't often a problem (as we generally shoot redirects on sight and so don't want them fixing). In terms of bulk find and replace tasks I would prefer these to be in the hands of a user with more Wiktionary experience, though they are always reversible so the damage is minimal. It is probably better to run with the flag off until you begin to annoy the RC patrollers. Conrad.Irwin 00:39, 27 March 2008 (UTC)
There are a few reservations I have with some of that. Let me number list them.
  1. I intend to run my interwiki bot on all wiktionaries. Not running it here would create additional load to the existing bots. Having multiple interwiki bots do not disrupt the operation of each other. Wiktionary is a colossal project. I am looking at http://www.wiktionary.org/ Just adding the largest four wikis reveal: 769 000 + 753 000 + 225 000 + 187 000 = 2 121 000. Processing all of that regularly would require lots of interwiki bots. I am talking about all article scans.
  2. Even if there are one or two double redirects (there can be valid reasons to have redirects even if most of them are shot on sight). This wiki had 6 such redirects as of this post of which four you fixed manually (example). Bot could have done that for you. Any unnecesary redirect can be processed from Special:ListRedirects. Looking at there I see plenty of redirects, well over 5000... Broken redirects are a navigation hazard.
  3. The bot acts as a backup to CommonsDelinker. If for any reason commons delinker fails to operate (such as toolserver going down), my bot would fill in for it.
  4. Find & replace tasks requires no real experience. Its merely case sensitive here.
  5. Operating the bot without a bot flag decreases its efficiency. Wikimedia servers limits how many edits people can make to prevent spam bots. Because I operate my bot it many wikis the bot flag is particularly helpful.
-- Cat chi? 15:43, 27 March 2008 (UTC)
  1. I still don't see the need for more interwiki bots - I'm not aware that the current ones are struggling at all?
  2. Fair enough if that happens - CommonsDelinker doesn't have much work to do here (1 edit this month, the 50th most recent was on 12 November), given there are not many images. If the toolserver does go down (I was under the impression it was far more stable these days?) I'm not certain there is a need for a backup bot - if there is an extended outage, then we can discuss a need then.
  3. The most recent find and replace operation I'm aware of was moving a category, which you are right doesn't need much experience, but more likely will be ones that you need to understand the formatting and many templates we use here - I'd rather the bot be operated by someone who understands how to fix any accidental damage it causes.
  4. To my mind you still haven't explained why we need the bots in the first place. Thryduulf 22:21, 27 March 2008 (UTC)
  1. You cannot deal with well over two million pages with just 2 bots. How much time do you think the bot spends on each page? 2 000 000 / (24*60*60) = 23.148... meaning the bot would need to review 23 pages per second assuming all editions of wiktionary has 2 million pages. Assuming the current bot can handle such a thing. Dividing that workload by two is only logical.
  2. No. En.wiktionary is not at the center of the universe. Commonsdelinker operates on over 800 wikis. I will not have the time to discuss this on so many wikis. I am a human being and I will need to be sleeping, eating, working when the next outage happens which could be the next hour when a lightning strikes. You are right there aren't a whole lot of images here. If there isn't a backup bot you will have a red link. Commons administrators are neither required nor expected to manually delink (or relink, images can be renamed) images.
  3. It would take me a few minutes to figure out the fine details. I am not a 5 year old. I can do the same kind of fixes with trivial amount of attention.
  4. I am trying to offer a service. I want to help out. Having two or more bots help out a demanding task is something productive. You share the workload, you cooperate. I can help with the bulky issues like interwiki linking and double redirect fixing. Other bot operators the would be freer to focus on tasks that requires more fine tuning and experience in wiktionary.
-- Cat chi? 00:45, 28 March 2008 (UTC)
While its true there are 2 million pages (edit, this is not true for a single Wiktionary the largest is fr with 169,000 pages, we have 153,000. The 10 largest wikis total 2.6 million), not all of them need monitoring constantly - all the bots need to do is to watch for page creations and additions of other interwiki links. The latter is less important for Wiktionaries than for Wikipedias (and perhaps other projects) as the only interwiki links we want are between identically spelled entries, i.e. house should link to fr:house, de:house, pl:house, etc. whereas w:en:House links to w:fr:Maison, w:de:Haus, w:pl:Dom. The latter needs inteligence to know that w:pl:Dom should link to w:de:Haus (house) not w:de:Dom (cathedral), whereas the Wiktionary links need only a dumb bot. Also, we have at least two interwiking bots currently - you have not shown they cannot cope and the owners of those bots have not said they cannot cope. Additionally the last 100 recent changes covers 1 hour and 20 minutes of edits here at one of the busiest Wiktionaries (I don't know if its the busiest or not) At Wikipedia there were more than 100 changes in the past 1 minute, which shows the different scale of the projects - smaller wiktionaries are even more stark - 100 changes on pt.wiktionary took 18 hours, on cy.wiktionary it took 6 days (and many of them were dealing with a spambot). You appear to be thinking in Wikipedian terms. Thryduulf 01:26, 28 March 2008 (UTC)
Please see http://www.wiktionary.org/ fr.wikt has 769 000 alone en.wikt is at 753 000 followed by vi.wict at 225 000 and tr.wikit 187 000 (big 4). I do not know where you get your numbers.
Sorry, my numbers are from the same source as yours, but for some reason I typed an initial 1 instead of an initial 7. Thryduulf 13:13, 28 March 2008 (UTC)
All wikis should be scanned that they contain the correct interwiki links. That includes links for ts.wikt. Remember we are not writing this project for technically advanced people like you and me but for the casual reader who barely knows how to use a mouse.
The latter indeed need greater intelligence. But thats half of the work. If you for example want to add Polish to the chain, you would have to edit every wiki that is in the chain. Interwiki.py spreads it to every wiki for you. All you need to do is add a single interwiki link on one wiki, the bot can spread it for you. Thats what I mean by constant scanning.
But this is exactly what is already happening with VolkovBot and Interwicket here - look at their contribs. Thryduulf 13:13, 28 March 2008 (UTC)
It is of course managavle if you restrict the bots sensors to recent changes and hope all interwiki links are properly placed. How does a bot operating on en.wiki RC feed know about the addition of a page on Polish wiki? I scan individual articles, I pay no attention to the RC feed. Regularly scanning every page on every edition of wiktionary is the task I want to fulfill. Which of the two bots are doing this?
-- Cat chi? 12:29, 28 March 2008 (UTC)
See User talk:Interwicket#VolkovBot - VolkovBot does what you are proposing (I think), Interwicket reads the recent changes. So you see the task you are proposing is already being doing on the English Wiktionary, If you want to run your bot on other Wikitionaries then you will have to ask there we can't give or refuse you permission. Thryduulf 13:13, 28 March 2008 (UTC)
Agree with Thryduulf. Non of your bots seem to fill any need here on Wiktionary. I feel far more comfortable having one of our own fulfill these tasks, as they know Wiktionary and its needs far better. Ullmann's bots are like magic, and not just because he's a skilled programmer, but also because he has been here for a long time and has a thorough understanding of what needs to happen. While we appreciate your offer to help, it is not required here right now. -Atelaes λάλει ἐμοί 22:28, 27 March 2008 (UTC)
In other words if you are new you are unwelcome. :D I know this wasn't intended to be confrontations but I don't like it. :P -- Cat chi? 00:46, 28 March 2008 (UTC)
Not at all - you are very welcome, we would just like you to gain experience as a human editor before you run bots we aren't sure are needed. Thryduulf 01:26, 28 March 2008 (UTC)
It's much more manageable for each wiki community to run its own bots locally. The interwiki bot is an example of that; the one in use here is much more efficient than the standard bot based on the pywikipedia framework, and we can tailor it to our specific needs. I find it odd that you are so hostile to the idea that your bot may be duplicating work we do already. Instead of taking offense, why not join the local community and see where a new bot might be useful? -- ArielGlenn 01:29, 28 March 2008 (UTC)
I am not the one hostile. My concern is all editions of wiktionary not just en.wikitonary. We seem to be talking in different scales. I am interested in the macro-scale not micro. --12:29, 28 March 2008 (UTC)
Our concern is for the English Wiktionary primarily, and with the bots we already have here, our interwiki links are kept up-to-date already. As I said above we cannot give you permission to run your bot on any Wiki other than the English Wikipedia - you will have to ask on them. But a look at a random selection of wiktionaries (cy, pt, id, ts, vo and fr) suggests that VolkovBot is keeping their interwikis up-to-date as well. Thryduulf 13:13, 28 March 2008 (UTC)
  • Random question for the techies: Are non-mainspace interwikis (for project pages, templates, etc.) currently being handled adequately? Of course there are project-unique concerns here as well (particularly wrt templates and categories), but the pywikipedia approach would seem to be more applicable to these. -- Visviva 03:13, 28 March 2008 (UTC)
    • We don't seem to do those, for some reason. Some would need to be done in the Wikipedia way, though, since project/template/appendix/etc. pages should link to pages with the same focus, not just the same spelling, and some, like rhymes/wikisaurus/etc. should be linked the normal way. Dmcdevit·t 03:20, 28 March 2008 (UTC)
      • My bot can deal with this. -- Cat chi? 12:29, 28 March 2008 (UTC)
        • Would you accept a bot flag that was only for handling non-main-namespace interwikis, without permission to touch entries? —RuakhTALK 16:55, 28 March 2008 (UTC)
I'm not sure why, but you are considered as not having registered.
To solve this issue once and for all, why not publishing Interwicket in pywikipedia, so that it is clear that there are two standard interwiki bots, one for wiktionaries, and one for other projects?
If you are willing to use Interwicket on wiktionaries, I'm sure you would be very welcome on all wiktionaries (including here, providing there is some coordination between Interwicket users). Lmaltier 07:33, 29 March 2008 (UTC)
I can use seperate code yes. But the two different codes do the same thing. Interwicket is simply more efficient but does the same thing as interwiki.py. I will always use the more efficient code. I value my CPU time after all. Also interwicket cannot handle non-mainspace tasks while interwiki.py can. I think a good use of both will be the best course of action. -- Cat chi? 19:03, 16 April 2008 (UTC)

Wiktionary:Easter Competition 2008

Announcing Wiktionary:Easter Competition 2008. Discussion can be at that page (or its talk page).—msh210 17:08, 27 March 2008 (UTC)

Citations pages: let's be specific.

I was under the impression that Citations pages were to hold evidence for whether or not a term meets the CFI, and in this case I am defining 'term' as a specific set of characters unique with regards to order and capitalization. This means that MySpace != myspace, and Kind != kind when it comes to Citations (altho first word in the sentence caps are acceptable). Am I the only one thinking this way? MySpace's citations should be on Citation:MySpace and myspace on Citation:myspace? Inflected forms and declined forms and conjugated forms and all other forms should be on separate pages, to verify their own existence. - TheDaveRoss 20:32, 28 March 2008 (UTC)

I was under that impression, yes, except that inflected forms may appear on the "lemma" page in addition to the page for the specific form. In other words, the citations for "let" as an entry do not have to be the infinitive form, but can be for other forms as well. We do the same for example sentences; such sentences do not have to be artificially worded to use a particular form of the word, but may use any form. --EncycloPetey 20:47, 28 March 2008 (UTC)
Agreed, however there is also merit to citing specific forms. I am considering dividing up the citations pages for Ancient Greek lemma entries in two, with one half for any form of the lemma, and one half for the specific form (this is especially relevant because some words may not have their lemma form attested, a situation which isn't really the case in living languages). However, the point TheDaveRoss raises about different spellings, such as capitalized and non-capitalized remains a valid one. This probably ties in to the discussion we were having about -in' forms. -Atelaes λάλει ἐμοί 20:59, 28 March 2008 (UTC)
Capitalization does create a bit of an extra problem for Latin, since capitalization rules differ between Classical Latin on the one hand and Medieval and Later Latin on the other. We'll probably have to set special guidelines for languages that went through such a shift. --EncycloPetey 21:16, 28 March 2008 (UTC)
Such as English. :-) -- Visviva 23:50, 30 March 2008 (UTC)
When a word can be cited only at the beginning of a sentence, this is not a sufficient reason to capitalize it in the entry. The same should apply to normal inflected forms (if applicable rules are clear): it can be helpful to provide examples and citations for each form, including the lemma form, but they should not be a requirement before creating the entries (once again, when applicable rules are clear). Lmaltier 07:44, 29 March 2008 (UTC)
Even citation in the middle of a sentence isn't always reason to capitalize the entry. English allows capiptalization for the purpose of emphasis, and this used to be quite common in written English. If you read some Shakespeare's plays or some of Locke's essays with the original capitlaization, you will see many, many common nouns capitalized in the middle of sentences. --EncycloPetey 00:06, 31 March 2008 (UTC)
You are right. And I think this is a reason not to create Lion or Beluga as English words. After all, we don't create When, which may be capitalized or not, but follows general rules. Lmaltier 16:46, 31 March 2008 (UTC)

FL name as template in trans section

I've seen this in prologue, in the translation section the FL name is not Finnish but {fi}. Is this standard? --Panda10 11:44, 29 March 2008 (UTC)

No, they should always be subst'd. They occasionally show up because of edits by people more familiar with other wikts, which sometimes use them as standard. The primary reason we don't is that we have a lot of languages in translation tables that are either not coded, or have codes not known to the editors. It also makes it very hard to alphabetize when changing the wikitext. Often the FL.wikts that use them just put the table in code order; but with the number of languages we have code order looks random after a while. (Tocharian A is xto, Tocharian B is txb, etc, etc).
If you just leave them, AF will fix them as it rechecks the entry after your edit. Robert Ullmann 13:22, 29 March 2008 (UTC)
Thank you for the explanation. --Panda10 21:52, 29 March 2008 (UTC)
Could we make it alphabetize automatically using class="sortable" or something? --Ptcamn 22:55, 30 March 2008 (UTC)
If AutoFormat always sorts them then there is no need. Conrad.Irwin 10:51, 31 March 2008 (UTC)
AF doesn't sort them yet, but it has been on the "to do" list for a long time ... perhaps in a few hours from now? ;-) Do note that in some cases people have intentionally used a different order (Like listing "Ancient Greek" after "Greek" without using any nested syntax) and this will "fix" those. I had brought up the issue of a well-defined but not strict alpha order previously, and the sort-of conclusion was just to stick to strict alpha. The "nesting" (see something on hierarchy supra) is another issue, as it is used for several different things. AF will just preserve it for now. (Of course!). Robert Ullmann 11:56, 2 April 2008 (UTC)
AF will now sort and rebalance translations blocks that use {{trans-top}}. See price for example. Robert Ullmann 15:36, 2 April 2008 (UTC)

How should Wiktionary distinguish between two classes of non-English words?

How should Wiktionary distinguish between two classes of non-English words?: The superclass of "all foreign words which will eventually be added to Wiktionary" and the much smaller subclass of "foreign words that are somewhat regularly used in some English literature (technical or otherwise)". For example, dirigist is used in the English languarge social science literature whereas many words are not (such as milieuverontreiniging, which is not, I suspect, used in the English-language environmental literature). Yet both are listed in Related or Derived terms lists of other foreign words (dirigisme and milieu, respectively). My question is how does an English-language only Wiktionary user know which words might make sense to use in English literature and which ones are merely foreign words being added to the big giant Wiktionary project and would therfore likely NOT be useful/normal in any standard English literature? This seems like a problem that will become more and more pronounced as a larger number of foreign words are added. N2e 23:45, 30 March 2008 (UTC)

Personally, I think that if a foreign word is used regularly in English, then in some sense it's an English word, and it should have an English language section; the etymology and usage notes sections can discuss its foreignness, and some sort of context template can be devised if that's considered necessary. However, at least one editor has in the past objected rather strenuously to giving such words English-language sections; I don't know if he still would. —RuakhTALK 03:02, 31 March 2008 (UTC)
I agree with Ruakh; any foreign word used in English literature (or used in English sentences- online, etc) should count as an English word, as a general rule of thumb. sewnmouthsecret 03:51, 31 March 2008 (UTC)
Basically agreed; IMO the question that actually confronts us is how best to differentiate between two classes of English words, viz. English words in the strict sense and foreignisms. But let's note that this inclusive approach also entails a large number of FL sections for the many English words that are used as deliberate anglicisms in French, German, Italian, et al. We do need some way of handling these cases without absurdity, and I'm not sure if any suitable model has yet been presented (maybe a Translingual section for widely-used anglicisms?). And on the other hand there is still a line that must be drawn between foreignisms and code-switching (as when for example someone quotes a German movie title in English -- the words of the title are not being used as English words in any sense). -- Visviva 04:17, 31 March 2008 (UTC)
If a word has been partly or fully naturalized in English in a certain field of study or practice, then it should have appropriate context labels or notes explaining this (slang, jargon, technical, the field, etc). Sounds to me like a case of overlapping restrictions make it less common than something purely technical or purely a loanword, but still no less a part of English.
In such cases, I'd like to see attention paid to including references attesting to how established they are in their field, as well as good citations demonstrating their use. Of course, it may be difficult finding such information, but glossaries specific to the field may be useful. —Michael Z. 04:18, 31 March 2008 (UTC)
That makes sense. But how would you approach the case where the only field of study or practice where the word is partially naturalized is directly related to the source language? For example, mashallah appears chiefly in narratives set in the Arabic- and Farsi-speaking world; devotchka (Burgess aside) chiefly in literary works featuring Russians. Both are overwhelmingly italicized, and may occur alongside other words (like malenkaya) which no sane person would consider English. -- Visviva 10:28, 31 March 2008 (UTC)
Well, many Russian words used in fiction about Russia, in works translated from Russian, or in the field of Slavistics are simply foreign terms, which are being used for the benefit of some chelovicks who may razoomy them. I think these don't belong under the "English" heading. —Michael Z. 21:40, 1 April 2008 (UTC)
Re: Ruakh's comment. Thanks, that helps me a lot. I looked back at the two words I orginally used as examples (dirigist and milieuverontreiniging) and found them actually coded correctly to Ruahk's suggested standard: dirigist as English and milieuverontreiniging as Dutch. So far, so good. That shows that my comment was about a "user error" -- I didn't perceive the difference even though it was articulated in the entry for each word, plain as day. That helps me; I will try to be more careful in the future.
HOWEVER, it does suggest another question for all of the serious wordsmiths and Wiktionarians who frequent the Beer Parlor: How to "design" a good (better?) user interface for Wiktionary that makes this English/Foreign word distinction more apparent to the casual Wiktionary user? N2e 14:46, 31 March 2008 (UTC)
My CanOD italicizes a headword "if the word is originally a foreign word and not naturalized in English." It's a great way to demonstrate its nature by simulating its natural environment. —Michael Z. 21:40, 1 April 2008 (UTC)

Bot flag request for User:Computer

Hi, I would like to request a bot flag. I already have a bot flag on many wikis m:User:White Cat#Bots and I am from commons:User:White Cat at which I am a sysop. My language skills: en-n, tr-4, az-2, ja-1.

I hope to help with the following tasks:

  • Interwikilinking using interwiki.py
  • Double redirect fixing using redirect.py
  • Commons delinking using delinker.py

I can also help with tasks like recategorization or any other bulk find & replace tasks.

-- Cat chi? 00:18, 27 March 2008 (UTC)

We do not need another interwiki bot (if you look in the archives the last few requests to run one have been denied) as User:Interwicket is much more efficient than interwiki.py. We also have User:CommonsDelinker - though I don't know if it needs a hand or not. Double redirects aren't often a problem (as we generally shoot redirects on sight and so don't want them fixing). In terms of bulk find and replace tasks I would prefer these to be in the hands of a user with more Wiktionary experience, though they are always reversible so the damage is minimal. It is probably better to run with the flag off until you begin to annoy the RC patrollers. Conrad.Irwin 00:39, 27 March 2008 (UTC)
There are a few reservations I have with some of that. Let me number list them.
  1. I intend to run my interwiki bot on all wiktionaries. Not running it here would create additional load to the existing bots. Having multiple interwiki bots do not disrupt the operation of each other. Wiktionary is a colossal project. I am looking at http://www.wiktionary.org/ Just adding the largest four wikis reveal: 769 000 + 753 000 + 225 000 + 187 000 = 2 121 000. Processing all of that regularly would require lots of interwiki bots. I am talking about all article scans.
  2. Even if there are one or two double redirects (there can be valid reasons to have redirects even if most of them are shot on sight). This wiki had 6 such redirects as of this post of which four you fixed manually (example). Bot could have done that for you. Any unnecesary redirect can be processed from Special:ListRedirects. Looking at there I see plenty of redirects, well over 5000... Broken redirects are a navigation hazard.
  3. The bot acts as a backup to CommonsDelinker. If for any reason commons delinker fails to operate (such as toolserver going down), my bot would fill in for it.
  4. Find & replace tasks requires no real experience. Its merely case sensitive here.
  5. Operating the bot without a bot flag decreases its efficiency. Wikimedia servers limits how many edits people can make to prevent spam bots. Because I operate my bot it many wikis the bot flag is particularly helpful.
-- Cat chi? 15:43, 27 March 2008 (UTC)
  1. I still don't see the need for more interwiki bots - I'm not aware that the current ones are struggling at all?
  2. Fair enough if that happens - CommonsDelinker doesn't have much work to do here (1 edit this month, the 50th most recent was on 12 November), given there are not many images. If the toolserver does go down (I was under the impression it was far more stable these days?) I'm not certain there is a need for a backup bot - if there is an extended outage, then we can discuss a need then.
  3. The most recent find and replace operation I'm aware of was moving a category, which you are right doesn't need much experience, but more likely will be ones that you need to understand the formatting and many templates we use here - I'd rather the bot be operated by someone who understands how to fix any accidental damage it causes.
  4. To my mind you still haven't explained why we need the bots in the first place. Thryduulf 22:21, 27 March 2008 (UTC)
  1. You cannot deal with well over two million pages with just 2 bots. How much time do you think the bot spends on each page? 2 000 000 / (24*60*60) = 23.148... meaning the bot would need to review 23 pages per second assuming all editions of wiktionary has 2 million pages. Assuming the current bot can handle such a thing. Dividing that workload by two is only logical.
  2. No. En.wiktionary is not at the center of the universe. Commonsdelinker operates on over 800 wikis. I will not have the time to discuss this on so many wikis. I am a human being and I will need to be sleeping, eating, working when the next outage happens which could be the next hour when a lightning strikes. You are right there aren't a whole lot of images here. If there isn't a backup bot you will have a red link. Commons administrators are neither required nor expected to manually delink (or relink, images can be renamed) images.
  3. It would take me a few minutes to figure out the fine details. I am not a 5 year old. I can do the same kind of fixes with trivial amount of attention.
  4. I am trying to offer a service. I want to help out. Having two or more bots help out a demanding task is something productive. You share the workload, you cooperate. I can help with the bulky issues like interwiki linking and double redirect fixing. Other bot operators the would be freer to focus on tasks that requires more fine tuning and experience in wiktionary.
-- Cat chi? 00:45, 28 March 2008 (UTC)
While its true there are 2 million pages (edit, this is not true for a single Wiktionary the largest is fr with 169,000 pages, we have 153,000. The 10 largest wikis total 2.6 million), not all of them need monitoring constantly - all the bots need to do is to watch for page creations and additions of other interwiki links. The latter is less important for Wiktionaries than for Wikipedias (and perhaps other projects) as the only interwiki links we want are between identically spelled entries, i.e. house should link to fr:house, de:house, pl:house, etc. whereas w:en:House links to w:fr:Maison, w:de:Haus, w:pl:Dom. The latter needs inteligence to know that w:pl:Dom should link to w:de:Haus (house) not w:de:Dom (cathedral), whereas the Wiktionary links need only a dumb bot. Also, we have at least two interwiking bots currently - you have not shown they cannot cope and the owners of those bots have not said they cannot cope. Additionally the last 100 recent changes covers 1 hour and 20 minutes of edits here at one of the busiest Wiktionaries (I don't know if its the busiest or not) At Wikipedia there were more than 100 changes in the past 1 minute, which shows the different scale of the projects - smaller wiktionaries are even more stark - 100 changes on pt.wiktionary took 18 hours, on cy.wiktionary it took 6 days (and many of them were dealing with a spambot). You appear to be thinking in Wikipedian terms. Thryduulf 01:26, 28 March 2008 (UTC)
Please see http://www.wiktionary.org/ fr.wikt has 769 000 alone en.wikt is at 753 000 followed by vi.wict at 225 000 and tr.wikit 187 000 (big 4). I do not know where you get your numbers.
Sorry, my numbers are from the same source as yours, but for some reason I typed an initial 1 instead of an initial 7. Thryduulf 13:13, 28 March 2008 (UTC)
All wikis should be scanned that they contain the correct interwiki links. That includes links for ts.wikt. Remember we are not writing this project for technically advanced people like you and me but for the casual reader who barely knows how to use a mouse.
The latter indeed need greater intelligence. But thats half of the work. If you for example want to add Polish to the chain, you would have to edit every wiki that is in the chain. Interwiki.py spreads it to every wiki for you. All you need to do is add a single interwiki link on one wiki, the bot can spread it for you. Thats what I mean by constant scanning.
But this is exactly what is already happening with VolkovBot and Interwicket here - look at their contribs. Thryduulf 13:13, 28 March 2008 (UTC)
It is of course managavle if you restrict the bots sensors to recent changes and hope all interwiki links are properly placed. How does a bot operating on en.wiki RC feed know about the addition of a page on Polish wiki? I scan individual articles, I pay no attention to the RC feed. Regularly scanning every page on every edition of wiktionary is the task I want to fulfill. Which of the two bots are doing this?
-- Cat chi? 12:29, 28 March 2008 (UTC)
See User talk:Interwicket#VolkovBot - VolkovBot does what you are proposing (I think), Interwicket reads the recent changes. So you see the task you are proposing is already being doing on the English Wiktionary, If you want to run your bot on other Wikitionaries then you will have to ask there we can't give or refuse you permission. Thryduulf 13:13, 28 March 2008 (UTC)
Agree with Thryduulf. Non of your bots seem to fill any need here on Wiktionary. I feel far more comfortable having one of our own fulfill these tasks, as they know Wiktionary and its needs far better. Ullmann's bots are like magic, and not just because he's a skilled programmer, but also because he has been here for a long time and has a thorough understanding of what needs to happen. While we appreciate your offer to help, it is not required here right now. -Atelaes λάλει ἐμοί 22:28, 27 March 2008 (UTC)
In other words if you are new you are unwelcome. :D I know this wasn't intended to be confrontations but I don't like it. :P -- Cat chi? 00:46, 28 March 2008 (UTC)
Not at all - you are very welcome, we would just like you to gain experience as a human editor before you run bots we aren't sure are needed. Thryduulf 01:26, 28 March 2008 (UTC)
It's much more manageable for each wiki community to run its own bots locally. The interwiki bot is an example of that; the one in use here is much more efficient than the standard bot based on the pywikipedia framework, and we can tailor it to our specific needs. I find it odd that you are so hostile to the idea that your bot may be duplicating work we do already. Instead of taking offense, why not join the local community and see where a new bot might be useful? -- ArielGlenn 01:29, 28 March 2008 (UTC)
I am not the one hostile. My concern is all editions of wiktionary not just en.wikitonary. We seem to be talking in different scales. I am interested in the macro-scale not micro. --12:29, 28 March 2008 (UTC)
Our concern is for the English Wiktionary primarily, and with the bots we already have here, our interwiki links are kept up-to-date already. As I said above we cannot give you permission to run your bot on any Wiki other than the English Wikipedia - you will have to ask on them. But a look at a random selection of wiktionaries (cy, pt, id, ts, vo and fr) suggests that VolkovBot is keeping their interwikis up-to-date as well. Thryduulf 13:13, 28 March 2008 (UTC)
  • Random question for the techies: Are non-mainspace interwikis (for project pages, templates, etc.) currently being handled adequately? Of course there are project-unique concerns here as well (particularly wrt templates and categories), but the pywikipedia approach would seem to be more applicable to these. -- Visviva 03:13, 28 March 2008 (UTC)
    • We don't seem to do those, for some reason. Some would need to be done in the Wikipedia way, though, since project/template/appendix/etc. pages should link to pages with the same focus, not just the same spelling, and some, like rhymes/wikisaurus/etc. should be linked the normal way. Dmcdevit·t 03:20, 28 March 2008 (UTC)
      • My bot can deal with this. -- Cat chi? 12:29, 28 March 2008 (UTC)
        • Would you accept a bot flag that was only for handling non-main-namespace interwikis, without permission to touch entries? —RuakhTALK 16:55, 28 March 2008 (UTC)
I'm not sure why, but you are considered as not having registered.
To solve this issue once and for all, why not publishing Interwicket in pywikipedia, so that it is clear that there are two standard interwiki bots, one for wiktionaries, and one for other projects?
If you are willing to use Interwicket on wiktionaries, I'm sure you would be very welcome on all wiktionaries (including here, providing there is some coordination between Interwicket users). Lmaltier 07:33, 29 March 2008 (UTC)
I can use seperate code yes. But the two different codes do the same thing. Interwicket is simply more efficient but does the same thing as interwiki.py. I will always use the more efficient code. I value my CPU time after all. Also interwicket cannot handle non-mainspace tasks while interwiki.py can. I think a good use of both will be the best course of action. -- Cat chi? 19:03, 16 April 2008 (UTC)

<section end="archive_march">

April 2008

<section begin="archive_april">

Wholesale conversion to "Determiner"

A user Contribution of BrettR_aka_Mr._Determiner has changed the entries for a dozen frequently used words to eliminate all PoS headings except "Determiner". Does such a drastic step have community approval? It seems simply undiscussed. It seems to me to be going in the direction against intelligibility for ordinary users and favoring some current fashion in lingustics and language education. DCDuring TALK 20:11, 1 April 2008 (UTC)

Wiktionary:Beer_parlour_archive/2007/April#Determiner_vs_Determinative is the last major discussion and seems hardly conclusive. Has there been a policy vote? Or is this an April Fools thing? DCDuring TALK 20:36, 1 April 2008 (UTC)

I don't think that's O.K. It might be O.K. to unify ===Adjective=== and ===Pronoun===, and maybe even ===Noun===, under ===Determiner===; but eliminating ===Adverb=== sections? —RuakhTALK 22:28, 1 April 2008 (UTC)
Is "determiner" really widely accepted in books for those outside of the language and linguistics community. If it is leading (bleeding?) edge, then perhaps we could find a way of transitioning to its use that allowed for folks like me to get behind it. I really don't see why this header ought to be in use at all until it has been well discussed. Because we can't do any real user research, we have to pay close attention to the practice of other publishers. Their practice does not support this. If we think that we can achieve a competitive advantage by being cutting edge, free of some unhelpful traditional categories, then we should go ahead and implement the change. I haven't heard the rationale for the superiority of the unknown-to-users category "Determiner" to the widely known old-fashioned categories. Maybe it doesn't really matter much because hardly anyone will use a dictionary for these words anyway and we don't have all that big a user base so we can just do what pleases us. DCDuring TALK 23:39, 1 April 2008 (UTC)
I'm not certain whether we should use "Determiner" or not, but I am certain we should not use it until we've discussed it and come to a conclusion to use it. The previous discussions are neither recent enough nor conclusive enough imho. Thryduulf 00:01, 2 April 2008 (UTC)
That wish is not stopping the process which continues as we wring our hands. Judging from the lack of action I would say that this is something most don't care about or support. DCDuring TALK 01:26, 2 April 2008 (UTC)
We just don't particularly have time or whatever. What "BrettR" doesn't realize it that he is likely to cause a vote on using "determiner" fo English (see below), and if it is barred, we will calmly revert all of his edits and remove the Determiner heading from the entries it was already used in. (I recall reading something on the 'pedia about someone (ahem) determined to show the wikt people who don't even know what a determiner is; I suppose I should have seen this coming? ;-). It is not a classic English POS, and there is little reason to allow it. "BrettR" is likely just wasting his time. But note, not much of ours, is easy to rip out. OTOH, maybe it is just an April Fool's joke, albeit not a very amusing one. Robert Ullmann 01:39, 2 April 2008 (UTC)

No, not April fools. There are a small number of English determiners and half of them were already listed as determiner. The others have been categorized that way for a long time. I just though I'd be bold. I'm certainly willing to discuss. I've stopped the process for now.--BrettR 01:45, 2 April 2008 (UTC)

By the way, the vast majority of my edits last night were simply adding or fixing up the {{en-det}} template to sections with existing L3/POS Determiner headers.--BrettR 13:51, 2 April 2008 (UTC)

  • I really don't understand all of the hullabaloo here. "Determiner" (or "demonstrative determiner") has been in use as an English POS header here since at least May 2004, without causing much of a fuss, and it seems to me that all BrettR has done is introduce a bit of much-needed consistency. -- Visviva 06:42, 3 April 2008 (UTC)
I'm asking what it's for. If it's good for users, then it deserves to be documented as policy. If it's not, then it deserves to be eliminated. The other dictionary operations seem divided on its goodness. Should we stay with the set of categories most widely known to users and contributors or should we push them in a progressive direction (if it really is progressive)?
As wiktionary matures some of the issues that have been swept under the rug ought to be addressed to allow progress toward consistent practice to help our users get more out of us than they would get out of a list of definitions. Our entries are far from uniform in quality and extremely inconsistent in the use of fundamental categories. The appearance of uniformity forced by the software and the heading structure belies the depth of the inconsistency. The level three headers are of fundamental importance in structuring entries. Questionable areas include the treatment of abbreviations (written vs. non-written, actual PoS for abbrevs. other than nouns); phrases, idioms, and proverbs; interjections; numbers; and other symbols. Do we need adjectve headings for attributive use of nouns? When do participles become adjectives or nouns? When are related etymologies worth splitting? Codifying much of this now might be premature, but it would help Wiktionary improve to convert our experience and beliefs concerning what works from a user perspective to policies and guidelines. DCDuring TALK 10:14, 3 April 2008 (UTC)

I believe that all the English determiners are now identified with an L3 Determiner heading and {{en-det}}.--Brett 12:09, 26 May 2008 (UTC)

Proposal to use "Determiner" as an L3/POS header.

Per the discussion immediately preceding this one, I'd like to propose that we begin using "Determiner" as an L3/POS header for the unambiguous English determiners (CGEL determinatives). It's a small enough group of words — larger than you might expect, but still pretty small — that it shouldn't be too bad to undo if we later decide it was a bad idea for whatever reason. Conversely, if we decide that it was a good move, we can open the door to determiners in other languages, and loosen the "unambiguous" criterion.

Advantages to doing this:

  • It's more accurate than trying to force determiners into the traditional lexical categories (parts of speech).
  • It's more concise than giving determiners several different POS sections with the same definitions over and over again.
  • It's in better keeping with our sister project Wikipedia's (accurate, or trying-to-be-accurate) descriptions of the various parts of speech.

Disadvantages:

  • The term "Determiner" will be less familiar to many readers than more traditional terms. (Of course, most of our readers probably have a fairly unclear notion of the traditional terms' meanings, anyway.)
  • From what I understand, there's not total consensus in the linguistic community about exactly which words are determiners. (Of course, "we can't do this perfectly" is a far cry from "we shouldn't bother trying" — and it's not like there's total consensus in the traditional-grammarian community about exactly which words belong to which traditional parts of speech, either.)
  • It might be a slippery slope from this to less obviously positive adoptions of modern linguistic theory — will we next adopt the notion of intransitive prepositions? Will we create an entry at Ø with screenful after screenful of definitions of null determiners and whatnot in every language known to man? (Of course, knowingly accepting inaccuracies is also a slippery slope. As a wiki, we really have no choice but to trust our future selves and future members of the community.)

Thoughts? (I'd like to bring this to a vote within a week or two, if there seems to be agreement.)

RuakhTALK 01:23, 2 April 2008 (UTC)

Why should we buy what the linguists are selling? Why should we buy it before OED, MW, Collins, AHD, Longmans, Random House, et al.? What are the actual advantages of this to our anonymous users? What evidence supports our beliefs about these possible advantages? What are the words that would be affected? It would be handy if we had one or more categories for them. If this vote is to be meaningful, would it not be useful to rein in Mr. Determiner and get him involved in the discussion? What else has to be done for this change to actually have a good effect on the experience of our supposed user base? DCDuring TALK 01:41, 2 April 2008 (UTC)
I'm in for a vote. Obviously I think determiner is a very useful category. Why should we buy what the linguists are selling? Can you imagine editors over at wikipedia (I know this isn't wikipedia) asking "Why should we buy what the physicists are selling? Newton was good enough for me." We should buy it because they're the experts. Why before OED? Because OED only comes out every few decades. Why before Longman? Longman has used determiner for years. For a list, see here Category:English_determiners (that doesn't include cardinal numbers which are all determiners as well as being nouns). I hope that helps.--BrettR 01:57, 2 April 2008 (UTC)
Longman's competitive strategy seems to be to try take market share among those who are dissatisfied with existing dictionaries. They seem to try novel approaches in many ways: typesetting, layout, font selection. So it is hardly a shock that they would be an early adopter. I wonder what percentage of the things that they have been early adopters of have been taken up by the others, that is, been successful. And what percentage they have abandoned, that is, failed. OED and MW also have online dictionaries which could readily adopt these innovations if they thought there were an advantage. —This unsigned comment was added by DCDuring (talkcontribs) at 02:52, 2 April 2008 (UTC).
Here on Earth there is very little reason for an engineer to care about theories of everything, general relativity, gauge theory, string theory, etc. Truth is truth for a purpose. Newtonian physics is very practical for many purposes. How will it help an anonymous user to use an unfamiliar category such as "Determiner"? How will it help me? It is not compelling and more than a little troubling that the rationale is: "Trust us, we're the experts." I certainly have no objection to having entries for words like "determiner" or using it for categorizing. It is for more fundmental purposes that I am concerned. You are imposing a change in the structure of what we are doing. It reminds me more of a Websterian or Shavian spelling reform proposal than something productive. DCDuring TALK 02:21, 2 April 2008 (UTC)
How will it help an anonymous user to use an inaccurate category such as "Adjective" for a determiner like both? Determiners are some of the most common words in the language; if someone's looking one up, I don't think we can assume that they'll be terribly familiar with "Adjective", either — and it certainly won't help them understand how to use the word. Incidentally, I really don't understand your comparison of this to Webster's and Shaw's proposals: we most certainly are not trying to change how people use these words. We're not trying to turn these words into determiners, only to accurately identify the words that already are determiners. —RuakhTALK 02:39, 2 April 2008 (UTC)
Accuracy-shmaccuracy. They are no more Determiners than they are Adjectives. These are just categories, tools of thought. Planets don't have elliptical orbits any more than they have circular ones. Elliptical orbits are more useful tools for thinking about their orbits, not ultimate truth. What are the constructive benefits of using this category? Will users be happier? Will they leave the site knowing more? Will we gain users? Will we gain funding? Will linguists cheer us? DCDuring TALK 02:52, 2 April 2008 (UTC)
I don't think it would be constructive to get rid of the category -- there are English determiners (or words which act like determiners), just as there are determiners (or words which act like them) in many other languages, and it makes sense to have a category for such words. The real question is whether we should use this as a header for English, as we have already been doing for many years. If we reject it for English entries, we have the odd phenomenon of a category, accepted for English entries, which is (and must be) a POS header for other languages such as Korean, but which is not accepted as a POS header for English entries. That would be inelegant at least, and elegance does have a certain value of its own. -- Visviva 04:41, 2 April 2008 (UTC)
I notice that above "determiner" is called "leading (bleeding?) edge". That's an odd way to refer to a concept that is at least 75 years old. Leonard Bloomfield wrote in 1933 (likely echoing earlier writings) that "The determiners are defined by the fact that certain types of noun expressions (such as house or big house) are always accompanied by a determiner (as, this house, a big house)." Since the 1960s, the concept of English determiners has been positively mainstream within linguistics, admittedly with some disagreement around the edges (do words like my, his, etc. belong?), but with broad agreement that there exists this class of words that are not properly described as adjectives or pronouns.
"How will it help an anonymous user to use an unfamiliar category such as 'Determiner'?" How will it help them to find things called "demonstrative adjectives"? And maybe they'll click on the word and learn something.
Nobody is asking anyone to take this one faith. The topic is clearly laid out in many books and articles with data and argument to back it up. Anyone who hasn't read much about the topic could do worse than look at the wikipedia entry. From there I would suggest moving on the A Student's Introduction to English Grammar or, for the more ambitious, The Cambridge Grammar of the English Language. You could also check out John Payne's 1993 chapter in Heads in Grammatical Theory by Corbett et al (eds.). Hope that's useful.--BrettR 13:17, 2 April 2008 (UTC)
If it has been around then it has had ample opportunity to have proven its practical utility to OED, MW, et al. The article of faith is whether it will prove useful to the practical understanding of users. This appears to be a fashion on interest to linguists with no positive practical consequences that anyone seems to be aware of. I was hoping that there would turn out to be some positive consequences that could emerge from the discussion. I remain hopeful that someone will be able to articulate an advantage to the population at large from making this change. If it has value in making linguists want to participate in Wikitionary, that might count for something. DCDuring TALK 14:11, 2 April 2008 (UTC)
If we can agree that there is a practical utility to having POS at all (and there seems to be consensus that there is), then presumably there is utility in making them perform consistently. The OED defines adjective as "a word standing for the name of an attribute, which being added to the name of a thing describes the thing more fully or definitely, as a black coat, a body politic." It further defines an attribute as "A quality or character ascribed to any person or thing." The determiners don't look at the qualities or characters of things. Instead, they have a pointing function that tell us which things are being discussed. Unlike adjectives, they cannot typically be graded or modified by too, so, very, or other adverbs that typically modify adjectives. They cannot typically appear predictively as most adjectives can (which also relates back to the fact that they are not qualities). That is, you can't say "The people are some." They are usually mandatory where adjectives are (always?) optional. You can't typically use them together (where you can string adjectives together to your heart's content). Adjectives are independent of number where determiners typically much match the number of the noun. In fact, about the only similarity is that they are both appear before nouns. So one practical utility of calling these determiners is that you convey all this information in a single word.
What, may I ask, is the benefit of calling these adjectives apart from tradition?--BrettR 14:37, 2 April 2008 (UTC)
The principal advantages are that:
  1. users believe they know the implications of something being labelled an adjective and
  2. there is some validity to that belief.
I don't see any reason to cause them to investigate the meaning of determiner if doing so does not offer some real benefits to those not on a career path in language. That the concept has appeal and value to those in the field I do not doubt. I am a little disappointed that there seems to be so little of practical value in the concept. When writers about language a writing a book to sell they don't seem to find the word "determiner" of great value in explaining things. It doesn't appear in the index or glossary of many books on language {Chomsky's "Aspects", Pinker's "The Language Instinct", 3 Safire works, 1 Crystal, 1 Fischer), though it does appear in the index to Pinker's "Words and Rules" and Crytal's "Encyclopedia of Language".
In short, "determiner" has been around, but has not swept the field among the producers of on-line and print dictionaries or language authors. It's merits do not speak for themselves. The stated benefits seem limited to "elegance". DCDuring TALK 15:38, 2 April 2008 (UTC)

You seem to be implicitly assuming

  1. said implications apply to determiners, such that this belief is a good thing here.

which is not an assumption I'm ready to make with you. That said, if we had some sort of (determiner) context tag, I'd feel more comfortable than I currently do with labeling these "adjectives". (That might also help streamline the entry for זֶה ‎(ze, this), which can serve either as an adjective or as a determiner, the latter being more formal/poetic/archaic, but the meaning being the same either way.)

RuakhTALK 18:10, 2 April 2008 (UTC)

I await an itemization of the erroneous conclusions that users draw from the old-style PoS headers that will be corrected by the use of Determiner for the words that really are Determiners as opposed to the pretenders that have been nominated Determiners by some linguists for their own nefarious ends. I don't see why the Hebrew need for the determiner category has any particular implication for how we treat English. Or is there a procrustean imperative in Wiktionary's constitution that I missed. I don't see how we can have "proof" of benefits of one kind of catagorization over another, but I would like to see some presentation of the possible benefits that offset the "cost" of introducing a term that Longman's identifies as "technical" in the two of its dictionaries I have looked at. This seems of a piece with "plurale tantum". If Wiktionary is "by linguists, for linguists", then we ought to reconsider the logo design to make that clear. DCDuring TALK 19:47, 2 April 2008 (UTC)
I thought I had done that. I guess we're thinking about different kinds of conclusions. Could you give me an example of the type of thing you mean, say with 'verb' in the role of determiner (nefarious interloper/deprecated POS header)?--BrettR 20:47, 2 April 2008 (UTC)
I have no idea how to put 'verb' in the role of determiner or how that connects to what I think I've said or was trying to say. I'll look at this again tomorrow with some sleep. My underlying goal is to make sure that we have our poor occasional users in mind with any change because they are the source of the growth of Wikitionary's impact on the world. It seems moderately useful to me personally, but I don't believe myself to be representative, nor do I delieve the participants in this forum to be representative, of our target users. DCDuring TALK 01:01, 3 April 2008 (UTC)
Erm, sorry, but I think you misread my comment? In no way was I suggesting that "the Hebrew need for the determiner category ha[d] any particular implication for how we treat English." What I was suggesting is almost the opposite, actually: some of our Hebrew entries currently make use of ===Determiner=== even though that category doesn't apply as perfectly in Hebrew as it does in English, and I was suggesting that if we figure out a decent way to handle English determiners without using ===Determiner===, we can apply that same method to Hebrew. (Obviously ignoring the distinction altogether is not a decent way in either language.) —RuakhTALK 23:16, 2 April 2008 (UTC)
OK DCDuring TALK 01:01, 3 April 2008 (UTC)
"I have no idea how to put 'verb' in the role of determiner" Sorry, what I meant was, if the 'verb' heading didn't exist, and all these verb things were being called 'nouns', what arguments would you use to argue that we should split them out into a new category? Anyhow, have a good sleep.--BrettR 01:40, 3 April 2008 (UTC)
As a general comment: the additional heading is a bad idea. Adding the categories seems fine. Don't those categories already exist anyhow? The heading might simplify typing an entry in once, instead of listing the POS headings that apply. But a heading like 'determiner' does nothing to clarify the word for typical readers. --Connel MacKenzie 06:28, 3 April 2008 (UTC)
Yes, the categories already exist. The discussion is about the heading.
I have trouble imagining what kind of typical readers are being imagined here. What typical reader looks up words like many, each, this, etc. in a dictionary? Presumably we have someone curious who has noticed something about the word and wants to understand it better. Such a person is much more likely to meet their goal (i.e., learn something) if they find the heading Determiner. If they don't know what it means, the answer is a click away. And if they've already gone to the trouble of looking up one of these words, then its seems a good bet that they'd go that extra step.--BrettR 11:41, 3 April 2008 (UTC)
I agree with this. If a person is looking up a word like this, then they should be told about what it is. To call this and a word like nice by the same label will probably lead to an erroneous conclusion that they have the same syntactic behavior. The reason for using two labels is because they occur in different positions in grammatical constructions and because they have different functions. And that's why they are used in detailed descriptions of English grammar. It is probably just resistance to change (prescriptivists dont usually like change) that other dictionaries have not followed Longman. I suggest you add the new label. Ishwar 19:54, 15 April 2008 (UTC)

Demonstrative adjective

I seem to have started this. Demonstrative adjective is an old established term for this/that, but certainly not for both. I again remind you all that only this/that, these/those inflect for number, unlike any other English adjectives. A fun bit is contracting "is" while putting these adjective into the possessive: "this's this's", "that's that's". —This unsigned comment was added by Allamakee Democrat (talkcontribs) at 16:27, 2 April 2008 (UTC).

Note: it is an error in English to contract this' with is. Regarding "demonstrative adjective", that term is possibly established for linguists, but not used in any recognized general use dictionary, to describe them. Making up headings certainly is not useful. A category with two members is a fine bit of overkill. A simple usage note on those two entries, would be better. --Connel MacKenzie 06:24, 3 April 2008 (UTC)
Your pronouncement is a bit misleading. Webster's New 20th Century Dictinary includes demonstrative with the following as a definition: "in grammar, pointing out; as that is a demonstrative pronoun." It does not use demonstrative to identify parts of speech of entries, but the concept is in the dictionary as it is in most dictionaries.
The problem with this approach can be seen (for example) in that same dictionary's entries for this. They give both a listing marked as pronoun followed by half a dozen definitions, then a listing marked as adjective followed by the same half a dozen definitions with the wording alterted only slightly for the use as an adjective.
You see, demonstrative adjective/pronoun words are one class of Determiners. One of the great advantages of the Determiner header in English is that we can combine and simplfy many repeated senses. This has much potential for reducing confusion on the part of the casual user, since they won't have to look through two sets of definitions that look almost identical. The translations will also be greatly simplified. In most languages I've studied, the translation for adjectival use of this and pronoun use of this are the same or nearly so. Thus, separate Adjective and Pronoun sections (in addition to other uses) doubles the necessary length of all words that are Determiners. Consolidating under a single header will simplify and cleanup these entries enormously. --EncycloPetey 18:06, 3 April 2008 (UTC)
Demonstratives this, that, etc. can function as determiners and as pronouns. But, not all determiners have the same syntactic behavior as this, that. For example, no, every, the cannot function as pronouns. So, you should use both labels for this reason. It, of course, doesnt necessarily follow that you need to repeat the definition multiple times Ishwar 19:35, 15 April 2008 (UTC)
I'm not quite sure I understand why the matter needs to be debated at all. Determiner is not an accepted part of speech among legitimate grammarians; if you look in one of good old Webster's dictionaries, you'll never find a word classified as a "determiner." If it is not accepted by the experts, why should it be on Wiktionary? Are we striving for incorrectness? Elfred 03:51, 20 April 2008 (UTC)
That's a funny thing to say. Picking up the first three relevant books that come to hand on my bookshelf -- Longman's Dictionary of Contemporary English, Leech & Svartvik's Communicative Grammar of English, and van Ek & Robat's Student's Grammar of English -- I find that all three use "determiner" quite routinely. The last two have a fairly canonical status in the field of English language teaching.
There are at least two things that distinguish these works from many others: they are written for an international audience, and they are based on methods and theories that are reasonably up to date. I would hope that we would try to emulate them in both regards. Dictionaries written for monoglot audiences, based on obsolete theories and methods, are not a particularly good role model. -- Visviva 04:56, 20 April 2008 (UTC)
It would be much easier to make a decision about this if we actually knew something about our anon users are and who they should be. Longman's DCE (definitely a learner's dictionary) is a good model for a dictionary, but it seems to be the only dictionary that uses "determiner" as a PoS. If someone could articulate the benefits of the "determiner" concept .... DCDuring TALK 09:47, 20 April 2008 (UTC)

As of the March, 2008 revisions, the OED uses Determiner. See, for example, the entry for many.--Brett 12:11, 26 May 2008 (UTC)

Internet slang

We currently have {{Internet}}, {{slang}}, and {{Internet slang}}, all of which are context tags categorizing eponymously. This means that some Internet slang is categorized under Internet and slang, whereas other Internet slang is categorized under category:Internet slang. This is far from ideal. I can think of two (mutually exclusive) solutions:

  • Get rid of category:Internet slang and force template:Internet slang to categorize into Internet and Slang. This seems reasonable to me.
  • Have AutoFormat change every context tag that includes both Internet and slang to include Internet slang instead (and remove spaces, or whatever, of course). This seems like a bad idea to me, as there could be an entry that is both Internet (not necessarily slang) and slang (outside of the Internet), so would need both tags.

Thoughts?—msh210 21:18, 2 April 2008 (UTC)

Seems to me that the vast majority of these should be in Category:Internet slang. You are probably right that the conversion cannot be automated, but perhaps we could get a cleanup list? -- Visviva 06:33, 3 April 2008 (UTC)
A rough cleanup list.msh210 16:18, 3 April 2008 (UTC)
It would be nice to redesign the context labels to carry information such as formality and region. Yes, {{internet slang}} could expand, as you say, but why shouldn't{{context|internet|slang}} classify it under all three categories? DAVilla 06:53, 7 April 2008 (UTC)
Is that doable with template:context?—msh210 19:00, 9 April 2008 (UTC)
See my response here. DAVilla 17:05, 1 June 2008 (UTC)

full-width characters

I think we should reprogram wiktionary so that when people look up something with full-width characters, it treats it as a lookup using half-width characters. Full-width characters are non-ASCII variations of half-width characters, made wider to match the width of other languages like Japanese. So words in them would show up now and then in foreign text for example CD in Japanese. If someone looks up CD on Wiktionary, we will either...

1) Tell them the word doesn't exist,

2) Give them a definition like "full-width version of CD",

or 3) Just automatically redirect them to CD (hardcoded, not a manually-added redirect, because it'd be too much work to manually add redirects for all English words).

I suggest option 3. Right now we do option 1. Language Lover 22:23, 2 April 2008 (UTC)

How does one come to look up something with full-width characters? By copying text from another website, and pasting it in Google or Wiktionary's search field?
If full-width characters are non-ASCII, do they carry other meaning? Or are they just a presentation form which always represents the same as their Latin equivalents?
Sounds like it might be a problem to be solved by text encoding standards bodies, and the makers of system software and web browsers, rather than by the maintainers of each website. —Michael Z. 23:13, 2 April 2008 (UTC)
Copy-and-paste, yes. Google is smart enough to fix them; MediaWiki (the software the Wiktionary runs on) is not — which is just as well, because we probably do want entries for and , if only to answer the questions in your second paragraph. However, CD should probably JavaScript-redirect to CD. Shall we discuss this at Wiktionary:Grease pit? —RuakhTALK 01:09, 3 April 2008 (UTC)
Okay, perhaps it is a feature we could use, but it should be pretty well thought out before implementing.
But this brings up another question, about which symbols are appropriate Wiktionary entries. WT:CFI doesn't really cover this adequately. The technical implementation is different, but whether in the full-width or ASCII code range, "CD" means "CD". As far as I know, Wiktionary entries are about words, and maybe some symbols or the concepts they represent, but they are not about code points. The full-width code point is a symbol representing the exact same concept as the ASCII code point C, and I don't think it should have a separate dictionary entry. —Michael Z. 21:30, 3 April 2008 (UTC)
Yes, hallelujah! And combine A with Cyrillic А and Greek Α etc. Differentiating script is a great feature of Unicode if you're interested in automated text manipulation, but when it comes to defining symbols, these are indistinguishable glyphs. In fact, I would say that if anything deserves another page it's the italic text, the script text, etc. even if it may be the same code point. DAVilla 06:48, 7 April 2008 (UTC)
I believe this is yet another issue that Hippietrail's Extension:DidYouMean would deal with for us, lets hope we can get it tested and implemented soon. Conrad.Irwin 11:36, 3 April 2008 (UTC)
If you have the right keyboard setup, you would come across it just by typing. And don't think for a second that just because some Asian language Wiktionary exists that a single translation of ko:CD is going to keep a non-native English speaker from coming here. Even with as little knowledge of other languages as I have, I know that to get the best explanation of a word you have to look on the foreign language dictionary. A survey of foreign language terms on this English Wiktionary and the all-too-often need for a {{gloss}} drive in the point. So yes, there is every reason to have this incorporated into DidYouMean or some other solution. DAVilla 06:48, 7 April 2008 (UTC)

Colloquial and slang: a sensible combination?

I'm in a disagreement with User:Amgine over at doggie as to whether having both {{colloquial}} and {{slang}} simultaneously is appropriate. I'm firmly in the camp that only "slang" is needed and if the fact nobody commented on that above is any indication, other people seem to agree. Circeus 01:05, 3 April 2008 (UTC)

I agree with you. As Amgine says, "[t]he terms are not synonymous", but as you say, "slang is by its very definition 'colloquial'". "Colloquial" does not imply "slang", but "slang" does imply "colloquial", so there's never any need to list both. (That's going by what I currently know. If someone can give a decent rationale for ever including both, I'm open to the possibility.) —RuakhTALK 01:25, 3 April 2008 (UTC)
Colloquial, by our glossary, indicates a term in common, often informal, parlance, as opposed to jargon usage. Slang, in contrast, is characterized by its limited use, unconventionally or as informal jargon. ie. "anon" on en.wiktionary is slang, but "bike" is colloquial. (Incidentally, I thought informal was defined in the glossary at one point as indicating a term for which there is a more-formal synonym term used in formal circumstances, such as the bike/bicycle dichotomy. Currently there is no definition in the glossary for informal, so should we be using it at all?) - Amgine/talk 14:06, 3 April 2008 (UTC)
Given that the distinction you refer to was added a few days ago (after your revert, I think—would need to check;I've started a discussion about it below), I definitely don't consider it to have any bearing on the discussion. Circeus 17:23, 3 April 2008 (UTC)
Even given your definitions of informal (the last of which I dispute, btw) the primary definitions are to contrast with formal uses. And, by implication, must clearly not be considered to have any bearing on the discussion.</irony> I was unaware of the edits to the glossary, assuming your good faith in the matter. - Amgine/talk 20:13, 3 April 2008 (UTC)
Re: the edits, I was just pointing it out before someone looked there and noticed it bore on the discussion.
As for "my definitions" (which, despite pointed discussions on the subject, have not been altered since), they are directly in line with those of, e.g. Merriam-Webster. In fact, one could argue that the replacement definition is entirely superfluous since meaning 1 arguably cover it (and the entire entry could use a better swipe: it's considered bad form to define a word in by saying what it doesn't mean).
I've made the point in the past, with several references to scholarly sources, that no modern dictionary makes a meaning distinction between the "colloquial" and "informal" labels (which is why they always use only one). Furthermore, "informal" is clearly never used in the very restricted meaning of "spoken" (which is what "colloquial" was originally explicitly coined to cover). Circeus 21:24, 3 April 2008 (UTC)
Now that you mention it, I don't recall any dictionary using the informal label except the US ones. However, I have not often examined dictionaries specifically for this use. I should examining those near to hand to me before I comment further:
  • Oxford Dictionary of Current English: informal
  • M-W Pocket, Online: neither
  • Various nautical, rare: colloq and formal (presumably because all else is informal?) But these are not valid as they are, by definition, non-standard.
I believe this does not indicate a consensus amongst modern publishers, but it may raise question as to the value of *either* label. - Amgine/talk 03:55, 4 April 2008 (UTC)
There are disagreement over the philosophical application of tags between lexicographers, and although some general language dictionaries do use colloquial, they use it instead of informal, not as a separate category. When M-W published (in th 50s) a dictionary that dropped the colloquial and (IIRC) slang tags, amongst others, they got an incredible amount of flak, but that dictionary is considered one of the most progressive of its time.
To give aother example of peculiar tag use, my harraps-chambers billingual is mildly idiosyncratic itself: it uses neither formal or informal, but has "ironic" and "humorous" (arguably billingual dictionaries have different needs and must carry more connotative information than the average monolingual). It uses "familiar" and adds "colloquial" next to the explanation of the abbreviation (Fam). It does use "formal" though. This is likely because "informal" is not used of language in French, and they strived to use a single tag for both languages whenever possible. Circeus 06:02, 4 April 2008 (UTC)
Longman's Dictionary of Contemporary English 3rd ed (1987) uses: formal, informal, literary, pompous, poetical, slang, dialect, technical, old use, old-fashioned, appreciative, derogatory, taboo, trademark, slang, humorous, and appreciative, but not "colloquial". I included more than have a direct bearing to illustrate that they seem to go out of their way to select straight-forward terms whose ordinary meaning is very close to their intended meaning instead of relying on jargon. Their definition of colloquial is that it indicates usage "suitable for ordinary, informal, or familiar conversation, not formal or special to literature." They are rather progressive in their handling of such matters. (These are the fellows who use "determiner" as a PoS.)
MW3 (principal copyright 1961) must be the M-W dictionary referred to above. The 1993 ed. retains "slang", but not "colloquial", "formal", or "informal". It does have "standard", "substandard", and "nonstandard". DCDuring TALK 12:03, 4 April 2008 (UTC)
From this I conclude that "colloquial" is less precise. I wonder if it wouldn't be desirable to allow all of these tags and indicate which ones are more and which ones less precise. That way, even less precise knowledge that users might have would be included. Categories would make the less precise items reviewable if someone actually believed that had knowledge or belief tha they should be more precise. We can take advantage of our non-print wiki nature to be more dynamic and evolutionary than print dictionaries. DCDuring TALK 12:03, 4 April 2008 (UTC)

I find humorous the claim that all slang is colloquial just below another topic called "Internet slang". Certainly Internet slang or other types of esoteric jargon did not necessarily originate in speech. The categories slang, colloquial, and informal all overlap greatly, but there are clearly some terms that will be only one or the other. This is another big reason to maintain the distinctions. -- Thisis0 23:50, 5 April 2008 (UTC)

Well, obviously the reason I do not normally accept the combination is properly that I consider that "colloquial" and "informal" are one and the same for all practical purposes, so obviously "informal" and "slang" make no sense to use together. Circeus 00:26, 6 April 2008 (UTC)
Likewise, I assume all informal falls within the purview of colloquial, but not always the reverse; that is, I feel colloquial is more inclusive as some colloquial speech is also formal. However, these discussions suggest the labels are ill-defined and, possibly, driven by editors' opinions more than by objective measure. As such it seems likely they are inappropriate labels to be used at this time. - Amgine/talk 01:58, 6 April 2008 (UTC)

I see a misspelled entry

Just passing by, but I noticed that paranomasia and paranomasias are misspelled: the correct spelling is paronomasia, with an o as the second vowel. --124.178.50.148 01:05, 3 April 2008 (UTC)

Actually, both spellings seem to be fairly common; the a spellings should probably be marked as alternative spellings of the o spellings. —RuakhTALK 01:13, 3 April 2008 (UTC)

Recent edits to Appendix:Glossary needs community discussion

I think these (especially those to "Colloquial" and "Informal") really ought to be discussed. While I appreciate the addition of a {{familiar}} tag and entry, I'm really dubious of the need to add entries for informal and formal, unless we want to throw meanings on them they don't have or (not to mention the information given is sometimes contradictory with that found in Wiktionary: Glossary). The links in {{informal}} and {{slang}} are at best superfluous, at worst an insult to our readers. Also, the attempt to specifically distinguish "slang" and "colloquial" (although it does formally establish that the tags are inappropriate in combination, see related discussion above), is fairly ridiculous, as slang is colloquial (if we take the "originating in speech" definition as a basis) by its very definition.

I'll again request a thorough discussion about the basis of distinguishing informal from a purportedly distinct and easy to attribute (because if we have to argue over the tagging every single word, the tag is pointless) "colloquial" category , as it is indirectly related to the other discussion above.Circeus 01:21, 3 April 2008 (UTC)

I think of "informal" as being used mostly to create the contrast with "formal", rather than being used in the same context as "colloquial" and "slang". There are certainly words that are mastly used in "formal" contexts, Madame Chairman. "Formal" is not a synonym for standard. It refers to words used on ceremonial and official occasions and contexts, but not limited to any one of them (like courts of law). DCDuring TALK 11:24, 3 April 2008 (UTC)
I don't really dispute the usefulness of the template, but I do strongly doubt the necessity of defining it into the user glossary. Circeus 21:27, 3 April 2008 (UTC)
I don't wish to cause trouble, but if we have a term we use in these tags, I would argue that we need to have a link from the tag to either an entry or, if we are using the word in an even slightly idiosyncratic way, to the glossary. DCDuring TALK 21:49, 3 April 2008 (UTC)
Given the number of various terms we use like this, I agree. If we're going to make distinctions beween "informal, colloquial, slang, vulgar, etc" then it seems reasonable to explain the distinctions to our users by means of a link to some kind of explanation. --EncycloPetey 22:21, 3 April 2008 (UTC)
Of course, it would be a good idea if we could actually agree about the meanings first I think the previous discussion firmly established there is at best disagreement as to what should be done of colloquial. Circeus 23:58, 3 April 2008 (UTC)
The tracks are parallel. The only technical decision is whether to use the normal entry definition, a category page, or the glossary. I would argue that the best technical solution would be to use the Wiktionary Glossary definition unless there wasn't one, in which case the normal entry would suffice. Then we can hash this out without further tech involvement. Or perhaps we could simply insist that the Glossary be the sole source for such tag definitions. The category pages would be inconvenient for maintaining the sets of related terms, I suppose. DCDuring TALK 00:40, 4 April 2008 (UTC)

Citations_talk: is deprecated

There has been scattered discussion in various places about the [[Citations_talk:]] namespace with the general feeling being that, given the phenomenal amount of use that our Talk: pages get, we may as well have one talk page to discuss the word - rather than two talk pages to discuss two intimately related pages. The Citations_talk: namespace should now be empty, and the site javascript conspires heavily against users finding themselves there. Conrad.Irwin 11:33, 3 April 2008 (UTC)

That's nice. Now, given that one of the primary purposes of the Citations namespace is to collect citations for words that we don't yet have in NS:0, and may or may not have in the future, where do you suggest putting any resulting discussion? An NS:1 Talk page will look orphaned, and such are routinely deleted. Eh? Robert Ullmann 11:58, 3 April 2008 (UTC)
They should still be in the Talk: page, as - should the entry ever be created - previous discussion about the citations is likely to have an impact on it. It should be trivial to ask whatever (or whoever) deletes orphaned talk pages to check for Citations pages of the same name. (off topic) I feel that the obsession with deleting orphaned talk pages is unnecessary and indeed harmful - it is possible for the community to talk about entries that will never exist or have been deleted. It makes sense to store all this talk in a central place where it can be found instantly should the need arise - the talk page is the ideal page for this. Take for example the {{rfvfailed}} template - which archives deleted information easily accessible on the talk page, yet if a whole entry gets deleted the archive is somewhere in a page history accessible from some index that I can't remember the name of at the moment. Conrad.Irwin 12:36, 3 April 2008 (UTC)
Perhaps a standard link (tab?) from Citations to the associated talk page would facilitate finding it. And if it were red, it would convey non-existence. I don't see any reason to delete our previous work, either citations or discussion. Page history is not readily searchable, AFAIK. DCDuring TALK 16:53, 3 April 2008 (UTC)
Re: Ullmann's comment: why are any discussion pages with valuable content being deleted at all? Orphaned talk pages should only be deleted if they contain vandalism, discussions about terms which may not presently exist are valid if only to show that someone has discussed them and a decision was made that the word didn't merit inclusion. Orphaned talk pages should not be deleted by rote.
Re: the general discussion: I am opposed to splitting discussions between the cites and talk page, since the cites page is simply another type of "discussion" about the NS:0 term. It isn't as if the talk pages are swamped, the average defined term has no extant talk page and the average talk page is tiny, even compared to the entry. Best to keep all meta information in one place. - TheDaveRoss 20:27, 3 April 2008 (UTC)
So apparently there is this fancy thing in MW now called "namespace alias" where we can make all "Citations_talk" pages point directly at "Talk" if we wish (also "WT" at "Wiktionary" if we want that. - TheDaveRoss 20:54, 3 April 2008 (UTC)
Oooh. That sounds a lot like what I would want. Any drawbacks? Presumably it would need a vote to be implemented. DCDuring TALK 17:36, 4 April 2008 (UTC)
Yeah, on wiki consensus would need to be displayed, and the namespace which was made into an alias would never-ever be editable, at all. - TheDaveRoss 20:30, 4 April 2008 (UTC)
I think it would be more useful to wait and see how Citations talk might be used than to declare this early that it could not be used. Primarily I am concerned that there is not the 1-to-1 correspondence between entries and Citations pages that people wish there to be. I could be proven wrong, but I would want time itself to do that.
However, I am certainly interested by this ability to create namespaces that are not the usual duals. I never did understand, for instance, how WT talk: could ever be of use. WT: isn't really a "full" namespace anyway. At the same time, it would be incredibly useful to have the Template: space for documentation and another space that actually holds the code, sharing a single talk page. But you've heard me say that before. DAVilla 06:12, 7 April 2008 (UTC)
Well <noinclude> allows documentation on the Template page, it is just more common to see it on the talk page, no reason really to do it that way. I agree that WT_talk is useless, I personally think we ought to make it an NS alias, but I wasn't greeted with full support when I voiced that opinion... - TheDaveRoss 20:55, 9 April 2008 (UTC)

maritime pine

The definition is embedded in {1} code and references contain {2} but they don't seem to do anything. Is this something I have to watch for or can I delete it when I see it? --Panda10 20:34, 3 April 2008 (UTC)

You can delete that on sight, it is meant to be used when a definition is subst:ed in from a template, but it ends up as a relic when people who don't know how to use the template do. - TheDaveRoss 20:56, 3 April 2008 (UTC)

Formatting of Idioms, Proverbs in non-English entries

I haven't found any precise information on how to format Idioms or Proverbs in non-English entries. The main problem we have is where to put the literal translation. Here is a summary of the different elements to put in a standard entry.

  1. Wiki links on the words or group of word of the Czech proverb: yes, in lemma form
  2. Translation with the equivalent proverb in English: with a # after the entry, if not just omit
  3. Literal translation of the Czech proverb: on the same line as the entry, at the end, in bracket using the tr= parameter in the template {{infl}} but should only be used for transliteration or at the end of the definition in bracket or in the Etymology section ?
  4. Explanation of the idiomatic meaning of the Czech proverb: after the translation with #:, in italic ?

See About Czech for more information and also Translation of idioms. --Thomas was here  12:13, 4 April 2008 (UTC)

I usually put the literal translation as part of the etymology section. There may be times where explaining the literal translation further can help, and sometimes that can go under "Usage notes". I usually wouldn't put the literal translation as a "definition" unless there was no English expression that came close to the meaning. --EncycloPetey 18:00, 4 April 2008 (UTC)
I also use the etymology section for this. I can't think of a specific case where a usage note would be helpful, but I can abstractly imagine that sometimes an idiom's literal meaning could have implications a user should know about. I agree that the literal translation shouldn't be given a sense line unless the idiom is sometimes used literally, and even then, caution is warranted. —RuakhTALK 22:39, 4 April 2008 (UTC)
Me too. -- Visviva 02:51, 5 April 2008 (UTC)
Fine, so there seems to be a consensus on putting the literal translation in the Etymology section. Thanks for your answers. Another and I hope last question is: should we copy the idiomatic meaning from the English entry in the non-English entries ? I can also add the final version of this formatting to a Wiktionary:Proverbs page and put this page into the Category:Proverbs. It also is maybe time to add a section Non-English entries in the Entry layout explained. --Thomas was here  16:32, 6 April 2008 (UTC)
We already have Wiktionary:Language considerations for that; the page is just severely underused. --EncycloPetey 21:35, 6 April 2008 (UTC)

Illustrations

I suggest a new guideline for illustrations: they must be helpful when trying to understand the meaning of the word (if not, they belong to Wikipedia). As an example, I think that the picture in maritime pine does not belong here, but a picture of the tree would help. Lmaltier 16:47, 4 April 2008 (UTC)

Not a bad idea. But a little visual interest and color is better than an absence of any illustrations. At least the picture is (purportedly) actually of a maritime pine, albeit a rather young one. DCDuring TALK 17:30, 4 April 2008 (UTC)
Perhaps whoever took that picture is planning on returning in 30 years to get us a followup...assume good faith :) - TheDaveRoss 20:29, 4 April 2008 (UTC)
Changed that pic, hope it's more satisfactory (others at Commons:Pinus pinaster). In general, I agree; we should generally have no more than one image per sense, and the image should be chosen to illustrate that sense as clearly as possible -- which in the case of an organism, means preferably a picture of the whole, mature organism. We should link to Commons or Wikipedia for additional visuals. However, if the previous picture had been the only one available on Commons, I think it would have been better than nothing. -- Visviva 02:50, 5 April 2008 (UTC)
It's a fine pic and probably better than the other, but from a distance many pines look alike. If I wanted to know how a maritime was different from the pines I know, probably neither picture would help. I wonder whether we should make all the wikicommons pictures on the subject just one click away. DCDuring TALK 04:00, 5 April 2008 (UTC)
All 3 sister project links in lite form seem useful in this case. Lite form makes it all appear on a single screen. I suppose there might be other cases where the bigger links would be OK. And yes that probably is the best single picture, whatever its shortcomings might be for specific purposes. And with tabbed browsing it is so easy to compare images of two or more kinds of pines. DCDuring TALK 04:10, 5 April 2008 (UTC)
My point was that, sometimes, no picture is better: everything that blurs the limit with Wikipedia should be avoided, including pictures when their only interest is encyclopedic. Pictures should illustrate definitions, they should be visual definitions. A good example, for a country name, is a map showing where the country is. This is not original, I think that this principle is applied for picture selection in most dictionaries (except encyclopedic dictionaries). Lmaltier 21:01, 5 April 2008 (UTC)

Category for rulers king, tzar, pharaoh

I'd like to find a category to put these words in. I think it should be a subcategory of People, but what would be a good name? Thanks. --Panda10 22:04, 4 April 2008 (UTC)

Category:Titles is related, but not quite the same. I had been thinking a while ago that it would be good to have a category for ranks of nobility like duke and baronet. I think whatever is used for "rulers" should probably encompass "nobility" as well. Mike Dillon 02:37, 5 April 2008 (UTC)
hereditary heads of state seems reasonable. --Allamakee Democrat 03:32, 5 April 2008 (UTC)
Wouldn't that exclude many founders of dynasties? DCDuring TALK 18:34, 5 April 2008 (UTC)
I'm not sure we need or want categories that are that granular. We already have too many categories that will never have more than a dozen entries; it's not like we're going to be adding individual heads of state a la Wikipedia. Mike Dillon 03:59, 5 April 2008 (UTC)
What is the minimum number of entries to create a category? How about Leaders for category name with this list as a starter: autocrat, crown prince, czar, czarina, despot, dictator, dynast, emperor, empress, generalissimo, governor, head of state, imperator, kaiser, khan, king, leader, magistrate, magnate, maharajah, maharani, mikado, mogul, monarch, Negus, pharaoh, potentate, prince, princess, queen, rajah, rani, regent, ruler, shah, shogun, sovereign, sultan, tsar, tsarina, tycoon, tyrant, viceroy
Or I could add these words as a See also to king without creating a category. Panda10 13:04, 6 April 2008 (UTC)
I'd go with Category:Rulers personally; "leaders" is awfully vague and doesn't describe a very clearly-delineated semantic field; it could just as easily include things like pastor and principal. But a category should (IMO) definitely exist which includes most if not all of the terms you have listed above. -- Visviva 14:08, 6 April 2008 (UTC)
I'd avoid Category:Rulers personally. The category name is ambiguous. Does it refer to people who rule, or to instruments used to measure distance? Such ambiguous names should be avoided. --EncycloPetey 16:24, 6 April 2008 (UTC)
How about Category:Monarchs? See Wikipedia w:Monarch listing a lot more words. This categorization would exclude words for modern leaders, though. Or better: Category:Positions of authority? This is a wider category and could include a lot more. --Panda10 17:53, 6 April 2008 (UTC)
"Monarchs" is too narrow a description. I fear that "positions of authority" may be too broad, unless we want to include abbot, schoolmaster, manager, etc. I haven't thought of a good name, but I have come up with many bad ones. --EncycloPetey 21:33, 6 April 2008 (UTC)
"Heads of state" would probably work. "Nobility" should be a separate category, although obviously some entries would be in both. Thryduulf 21:57, 6 April 2008 (UTC)
"Heads of state" doesn't really work for princess, queen, or prince (at least not in general). There seem to be a few of cross-cutting things going on here, mainly heads of state and royalty. Mike Dillon 22:13, 6 April 2008 (UTC)
I am not certain that there is a need for one category to contain all these entries, categories "Heads of state", "Royalty" and "Nobility" linked by see alsos would seem to be the best solution to me. "Marquess", "King" and "Prime Minister" don't seem to belong to a single class of positions to me. Thryduulf 22:31, 6 April 2008 (UTC)
I concur. That looks like a pretty good breakdown. Now the question is where to put them. "Heads of state" probably makes sense under Category:Government and/or Category:Titles. Royalty and Nobility could make sense under Category:Society and/or Category:Titles. Mike Dillon 01:31, 7 April 2008 (UTC)

I created Category:Heads of state and set up the parents and a customized description for using {{topic cat}}. We can adjust if necessary. Mike Dillon 01:44, 7 April 2008 (UTC)

P.S. We already have Category:Monarchy too under Category:Forms of government. Mike Dillon 01:46, 7 April 2008 (UTC)

Dating

Would anyone object to categorizing words by the year/decade/century of their introduction to the language/earliest attestation, where such information is known? --Ptcamn 08:02, 5 April 2008 (UTC)

Not really, but it seems problematic. What really counts as "knowing" that information? To prove that a word was introduced in the year/decade/century X involves proving that was not in use at any time before X -- and proving a negative is a difficult task under the best of circumstances. Third-party sources aren't necessarily a reliable fallback, either -- I think there have been a few recent cases where we've found citations for terms which substantially predate the dates of introduction given in reputable sources. -- Visviva 16:32, 5 April 2008 (UTC)
You want to introduce this concept for all languages? or limit it to English? or to Modern English? Even limiting to modern English, you'd be talking about adding hundreads of categories. Personally, I'm not sure they would be very useful. Why would anyone want to look up "English words first attested in 1712"? --EncycloPetey 18:12, 5 April 2008 (UTC)
Folks might be more interested in words first attested after a certain date so categories as we now know them might not be the right approach. We already have, in principle, the concept of definitions that are "dated", which implies that we know when they died or, at least, retired from active service. A bit of unstructured user Feedback suggested the notion. Perhaps that will be something we will further develop in the second decade of Wiktionary. DCDuring TALK 18:29, 5 April 2008 (UTC)
I can't imagine a category more useless than Category:English nouns, so if you want to create these, I'd have no problem with them. But, as Visviva notes, caution is warranted. —RuakhTALK 19:40, 5 April 2008 (UTC)
Not the topic, but I find the Wiktionnaire category for French words useful, despite its 350 000 entries. I already used it several times. Lmaltier 21:14, 5 April 2008 (UTC)
Categorize after earliest and newest added quotation? Nah, I don't know... Still, it would be a hassle, if not outright confusing, for words of several meanings. \Mike 12:26, 7 April 2008 (UTC)

Keenebot2 to branch out?

This thread is to inquire about User:Keenebot2's generation of conjugated forms of verbs, nouns, adjectives and other parts of speech in foreign languages other than French. Hopefully I've proved myself capable of running a bot - around 50000 entries created in the last couple of months for French stuff. I'd like to branch out into other languages now, only ones which have logical conjugations. Examples of Keenebot2's non-French FL entries are for inflections of dificultar (Catalan), precisar (Catalan), soma (Romanian) and lyste (Norwegian). Essentially, I'm asking to be allowed to use the bot for whatever I feel might be useful, as long as it's within the "get data from Wiktionary, process offline and spout out tonnes of declension/conjugation forms in WT format" form. I'd hate to have to go the an uber-bureacratic 2-week vote for every language I use, and also am not keen on abusing the bot status to pick things willy-nilly and add unfamliliar languages so nobody notices, so can I get "free reign" for my bot? Keene2 13:50, 5 April 2008 (UTC)

The bot authorization is only for French, so you'll need a vote. A very important consideration is the involvement of a native speaker in the process; otherwise serious errors will occur. A native speaker need only to look at the entries being made and say wait a sec, that's wrong!. The vote process is very simple if not controversial, and if you are too impatient for the time you have to wait, you probably should wait a bit anyway ;-). For now: got a native speaker of Romanian? Robert Ullmann 14:41, 5 April 2008 (UTC)
"Native speaker" is pushing it, but yes, there's needs to be someone who knows the language well enough to vouch for the resultant entries. —RuakhTALK 16:09, 5 April 2008 (UTC)
2 week vote started. Any question are best posed there. Keene2 14:48, 5 April 2008 (UTC)
I will point out here that prior to running TheDaveBot for Spanish verbs I looked over it, two native speakers looked over it, the pages sat for several months and THEN two errors were discovered (two per verb, not two total), I think Keene will act in the best interests of Wiktionary and I don't expect perfection, even though I am certain Keene will aim for it. - TheDaveRoss 16:25, 5 April 2008 (UTC)

Unicode 5.1

Is finally officially out:

There is a lot of cool new stuff inside, so check it out. Unfortunately still no Avestan and Egyptian hieroglyphs :/ --Ivan Štambuk 21:15, 5 April 2008 (UTC)

Early Cyrillic

Unicode 5.1 includes some revisions and very significant additions for the range of Cyrillic characters used for Old Church Slavonic (cu, chu), Old East Slavic (orv) – used in etymologies, e.g. горілка#Etymology – and modern Church Slavonic languages (and probably many others).[11] A small selection is already available in the Dilyana font,[12] and undoubtedly there is more font support to come.

It's likely that only obscure Slavistics fonts will support this range, at least at first. Will we have to fork the current Cyrillic style (.RU) into a second version with its own list of fonts for this purpose? When the fonts do become available, the preference be to imitate traditional typography, which uses old-fashioned manuscript-style typefaces for these languages? —Michael Z. 01:30, 6 April 2008 (UTC)

Such significant additions merit their own brand new ISO 15924 code, so they created Cyrs - Cyrillic (Old Church Slavonic variant). So far OCS entries (2158 of them according to WT:STATS) are using exclusively {{Cyrl}}, which should be changed to {{Cyrs}} once the particular font issues are inspected and settled. Dilyana is so far used for Glagolitic ({{Glag}}). --Ivan Štambuk 02:06, 6 April 2008 (UTC)
I've created a draft {{Cyrs}}, based on the existing pattern, specifying Dilyana font followed by other Cyrillic ones, and applying class="Cyrs".
Is there any reason not to add a class="Glag" to {{Glag}}, for future use and user customization? —Michael Z. 04:46, 6 April 2008 (UTC)
We should have Glagolitic. It was in use in the Balkans long after it disappeared elsewhere, so there are documented forms for relatively modern words in Glagolitic spellings. --EncycloPetey 04:57, 6 April 2008 (UTC)
The template is already there, and seems to be in use on 226 pages. I'll just go ahead and add the class, so it will be there if anyone needs it. It seems clear that there will be no conflict. —Michael Z. 05:06, 6 April 2008 (UTC)

Hm, it seems that applying "font-family:Dilyana;" breaks glagolitic text in my browser (Safari/Mac, but it doesn't affect the display in Firefox 2/Mac). Evidence for "don't mess with it if it's not broken". —Michael Z. 05:19, 6 April 2008 (UTC) NM; restarting the browser fixed it.Michael Z. 05:26, 6 April 2008 (UTC)

Still lacking credibility as a decent dictionary

I wrote the following in the Requests for Cleanup, but though it worth copying here.--Richardb 00:24, 6 April 2008 (UTC)

It is still too easy to find basic words, such as head, which have far fewer meanings listed in Wiktionary than in many a concise dictionary. I pointed this out about head a couple of years ago. Yet it is still missing some simple definitions:-

  • head of steam, head of pressure.
  • head of a door frame
  • it cost him his head (it cost him his ilfe, but his head may still be in place!)
  • $10 per head
  • side of a coin
  • part of a tape or disc player, printer etc
  • promontory
  • events come to a head; a climax
  • the top of a pimple;spot;boil
  • out of one's head; off one's head

etc etc.


some parts are confused:-

  • (countable) The topmost, foremost, leading or principal operative part of anything.

What does it say on the head of the page?
Principal operative part of a machine has nothing in common with head of the page

I previously tried to get some sort of Quality Control Project going on the top 1000 words, but was defeated by apathy (mine and everyone else's). It has to be a team effort, but team efforts never seem to succeed here. Everyone seems to want to do their own thing. So Wiktionary still seriously lacks credibility in it's most basic function - as an English Dictionary.

I'm no longer interested in trying to take this on. But unless quite a decent group takes it on, the dictionary is still going to be lacking credibility, despite all the other wonderous stuff which people spend time adding.--Richardb 00:24, 6 April 2008 (UTC)

  • So what you are saying is - you can't be bothered to fix it yourself, but are complaining that others aren't fixing it for you? Hardly the spirit of a Wiki. SemperBlotto 07:12, 6 April 2008 (UTC)
Nonetheless I think the basic point is valid -- we are unlikely ever to be taken seriously as a dictionary unless we have exemplary coverage of core English vocabulary. Part of the problem here is that writing a reasonably comprehensive entry for a common word like head is easily a day's work; personally, on the rare occasions when I have that kind of time available, I find it difficult to justify spending the day improving one entry rather than creating 50 or 100 entries for words that we don't have yet. But I do agree that this is our single most significant failing at present. The next time I actually have an 8-hour block available, I'm bringin' it to the GSL. -- Visviva 13:01, 6 April 2008 (UTC)

moo goo gai pan: What if you don't speak the language?

Copied from: Talk:蘑菇鸡片

start ...omitted... The Cantonese definition seems to have been removed, although this is fairly clearly a dish of Cantonese origin. 24.29.228.33 02:59, 6 April 2008 (UTC)

I agree. The problem that I'm running into is that I can't find any corroborating material. I don't think citing the Wikipedia article is appropriate at this point, since it also lacks proper citations. I don't speak Cantonese myself, so I don't know if I can trust the accuracy of the scant materials that I've found online. Already, I've found one descrepancy. Wikipedia says that 鸡片 is "gai1 pin3" but Cantodict says that its gai1 pin3*2.[13] I think I'll post this to WT:BP, and find out what others want to do. I'd honestly rather having nothing, than to include potentially inaccurate information. My reason is that I've seen how errors and inaccuracies are quite easy to perpetuate online, once they are out there. -- A-cai 05:44, 6 April 2008 (UTC)

end

First of all, are there any Cantonese speakers that could help out with these two entries (蘑菇雞片 and 蘑菇鸡片)? If not, what do we want to do about such entries (words in languages which are difficult to verify, especially because we lack the appropriate native speakers at Wiktionary)? Opinions? -- A-cai 05:44, 6 April 2008 (UTC)

Wiktionary:Transliteration

The guidelines at Wiktionary:Transliteration and the contents of Category:Wiktionary:Transliteration need some attention. There are a few independent issues I'd like to address, so I'll place them under separate subheadings here. —Michael Z. 16:45, 6 April 2008 (UTC)

Forked guideline

I'd like to merge Wiktionary:Transliteration with Wiktionary:Transliteration and romanization. There doesn't seem to be any reason for two essentially redundant guidelines. Any objections? —Michael Z. 16:45, 6 April 2008 (UTC)

Sounds good to me. :-) —RuakhTALK 19:36, 6 April 2008 (UTC)
Since there's no objection, I'll go ahead and merge these shortly. I'll add merge notices to the pages immediately, in case someone missed this discussion. —Michael Z. 18:33, 10 April 2008 (UTC)
Done merging, and then made some additions and reorganization on the page. Please look over Wiktionary:Transliteration and romanization. —Michael Z. 21:51, 10 April 2008 (UTC)

Nomenclature

Romanization is the more general category, with transliteration being more limited in scope. In one case (Wiktionary:About Japanese/Transliteration, dealing with w:Hepburn romanization) a guideline seems to be incorrectly named. Since we're dealing with less than a dozen guidelines so far, I've proposed moving the guideline to Wiktionary:Romanization and re-categorizing it under Category:Wiktionary:Romanization. Comments or objections? —Michael Z. 16:45, 6 April 2008 (UTC)

That makes sense. It's not like we'll ever be transliterating into any non-roman script. —RuakhTALK 19:36, 6 April 2008 (UTC)
Incorrect. Transliteration has the broader scope, since any script may theoretically be translated into any other script. Romanization is a specific subset of transliteration in which the result is written with Roman letters. There are some languages included here (like Serbian or Crimean Tatar) which in fact are written in multiple scripts. --EncycloPetey 21:29, 6 April 2008 (UTC)
Eh, that's iffy. I'd argue that neither has broader scope, since "transliteration" typically implies a character-by-character mapping scheme (so, you can transliterate Greek writing, but not Hanzi writing, into Latin script), which "romanization" does not; but "romanization" necessarily implies mapping into Latin script, which "transliteration" does not. But for Wiktionary purposes, where we only ever map text into Latin script, "romanization" has the broader scope. (Your comment about languages in multiple scripts strikes me as a red herring, since in that case we're not transliterating Serbian+Latin into Serbian+Cyrillic, but rather including the already-existent Serbian+Cyrillic alongside the already-existent Serbian+Latin. If there's a Serbian+Cyrillic word that's spelled funkily for whatever reason, we'd use that funky spelling, rather than providing a straightforward Cyrillic transliteration of its Serbian+Latin counterpart.) —RuakhTALK 21:55, 6 April 2008 (UTC)
You may be right about Serbian, but I maintain that Transliteration has the broader scope. Strictly speaking, Romanization implies the use of only Roman letters, but Pinyin "Romanization" includes Arabic numerals to transcribe tone. On reflection, there are aspects to Romanization that are not covered under the term Transliteration, just as there are aspects of Transliteration not covered under the term Romanization. So, I retract what I said about one being the subset of the other; they are two items which have significant overlap on Wiktionary, but neither is wholly included within the other. --EncycloPetey 22:05, 6 April 2008 (UTC)
It's true that I only considered transliteration into Latin when I suggested that romanization is the more encompassing concept, but that is what we're concerned with. Romanization from another alphabet is also transliteration, while romanization from a logographic system is not (the addition of numbers to pinyin is just a detail of a romanization system). In en.Wiktionary, "romanization" covers the whole topic more precisely than "transliteration" does.
The broad standards bodies have run into these constraints too, so while Slavicists usually say "transliteration", the BGN/PCGN refers to all of their standards as "romanization" systems. —Michael Z. 22:30, 6 April 2008 (UTC)

I'll leave this alone for now, since there doesn't seem to be active support for changing the names. —18:37, 10 April 2008 (UTC)

Standards

For romanization and transliteration we are using a mix of established standards, slightly modified standards, and systems created specifically for Wiktionary. Some of our romanization guides emphatically state that romanization is distinct from pronunciation, while at least one novel phonetic system is under development with the explicit assumption that the need for transliteration "is now suddenly past."

I think we need to develop some basic guidance for the use of romanization and transliteration in Wikipedia.

  1. Briefly, what is the purpose of romanization in Wiktionary? Is it distinct from pronunciation, and if not then should it be merged with the latter or deleted altogether?
  2. What circumstances justify developing our own novel system instead of adopting an established standard, created and used by professionals?

Michael Z. 16:45, 6 April 2008 (UTC)

  1. Briefly, the purpose is to enable the casual reader to look at a string of characters they don't know, ignore that string, and look at the string right next to it of characters that they do know, so they'll have some idea of the word in question, will easily be able to tell if the same word is mentioned in more than once place in an entry (assuming they can distinguish between the various scripts it might be written in that could all produce the same romanization), and so on. It is definitely distinct from pronunciation, because many languages are like English in that a single word can have vastly different pronunciations over space and time, but we don't want to have to provide all those pronunciations every single time we mention a word in any entry. Also, because we typically aim to provide pronunciations in a fairly technical form (IPA, SAMPA, etc.) that are hard to guess at if you're not familiar with them; romanizations, by contrast, should be easily (if sometimes ambiguously) intelligible to the casual reader.
  2. I think for most languages, there exist various co-existing de facto standards, and I'd almost say that in most cases it's better to form our own balance than to try to impose some de jure standard that's not representative of our needs and those of our readership. (This is also complicated by the fact that the de jure standards are often tied to specific organizations and specific kinds of goals, and therefore are potentially POV; and even when this isn't the case, they're frequently way too technical for our purposes.)
RuakhTALK 19:36, 6 April 2008 (UTC)
I don't think it's a good idea to just dump standard transliterations schemes used by thousands of publications (including most of the real-world dictionaries) in favour of some ad-hoc designed ones that are Wiktionary-specific and which should somehow approximate phonetic value of a word to a clueless reader who just happens to randomly open some FL entry. If someone is supposed to actually learn a FL word using Wiktionary (assuming that that is the primary purpose of FL entries), he is expected to be familiar with some basic properites of it, like phonology and transliteration system. --Ivan Štambuk 20:28, 6 April 2008 (UTC)
In the case of “standard transliteration schemes used by […] most […] real-world dictionaries”, I agree with you; but I think that for most languages, the so-called "standard" transliteration schemes are really not the most widely used. (Perhaps I'm simply mistaken; perhaps Hebrew, the only non-Latin-alphabet-using language I know well enough to really form my own opinion about, is simply an exception in this regard.) Also, transliterations are not just for foreign-language entries, but also for English-language etymology sections and so on. And even foreign-language entries are not just for people actually learning the foreign language, but also for people who encounter a foreign-language word in some context that makes them want to know more about it. (And I don't think that for most languages there's any transliteration scheme that could be considered a "basic property" that a language learner is expected to know.) —RuakhTALK 21:09, 6 April 2008 (UTC)
Transliteration also helps non-readers of foreign scripts discern and compare the structure and possibly the phonemics (not the phonetics) of words of various languages.
I would suggest that for some of these reasons it would be best to use a system in use in dictionaries or in linguistics. I think it is generally better to use an established system than to invent our own—even if it is a rare one, then it would be used in at least two places, not just one. Some systems may have variations or not be well defined, in which case we may choose to nail down the fuzzy details.
I also believe that the the wiki principal of reliance on documented knowledge strongly discourages us from presuming to have the expertise to develop or modify a better romanization method than those that have been developed or used by professionals or academics.
But the cases of some languages, the choice of a best system may be debatable, or there may be no good candidate (e.g., Wiktionary:About Thai#Transliteration). —Michael Z. 22:55, 6 April 2008 (UTC)
The primary purpose of FL entries on Wiktionary is to help English-speaking users (not necessarily native English speakers; English being de facto world's only lingua franca, and the defining vocabulary much more easier to acquire than any specific terminology, cross-language learning opportunities are much more bigger here than in e.g. Wikipedia) learn what do FL lexemes mean, with as much additional data that could enhance learning experience. Experiences of others (those who happen to randomly open a FL entry or navigate to it via ===Etymology===) that have absolutely no interest in the FL entry itself, nor wish to spend a reasonably small amount of time acquainting themselves with transcription/transliteration system usually used for it, should be of little or no concern. Transliterations can sometimes convey much important data - stress/pitch/tone via diacritics that could sometimes be phonemic but not marked in usual orthography, or hyphenation for separating clitics and compounds (which are due to various peculiarities sometimes very difficult for beginners to distinguish).
Most "important" scripts have some sort of standard transliteration system (in lots of cases an ISO standard), or usually a half a dozen of them (The great thing about standards is that there are so many to choose from) that are widely used, and Wiktionary should follow the most common practice employed by real-world FL-English and English-FL dictionaries. Significant deviations should be thoroughly discussed an voted on (like when community decided to dump /r/ and use /ɹ/ despite the fact most (>90%) English-English, FL-English and English-FL dictionaries uses /r/ and that almost no one except trained linguists and knowledgeable enthusiasts knows wtf "alveolar trill" means ^_^).
It might make some sense to account for those who want to see "ch" instead of "č", "sh" instead of "š", "ś", /ts/ instead of /c/ etc. - but not at the expense of all the others who could use the Wiktionary to learn the language, and would expect it to follow the scheme used by most of the others FL-English dictionaries. Maybe in simple.wiktionary.org, or some "dumb-mode" WT:PREFS option ^_^ --Ivan Štambuk 19:51, 7 April 2008 (UTC)

In a while, I will try to incorporate some of these thoughts into the transliteration guidelines. When/if I formulate some concrete wording, I'll introduce it here before changing the guideline. —Michael Z. 18:43, 10 April 2008 (UTC)

Organization

Transliteration guides are spread out under different namespaces and categories, and inconsistently titled. Some merely refer to standards outlined in Wikipedia articles. [please add any omissions]

Where does all of this belong?

  • It seems to me that any "wiki-romanization" originated by this project belongs in the Wiktionary: namespace, and not in an Appendix:.
  • Is it better to point to Wikipedia, or to duplicate that material here, in cases where only standardized systems are used?
  • Should we present alternative standards, or only include Wiktionary's selected or created romanization systems?

Michael Z. 16:45, 6 April 2008 (UTC)

I think for some languages, such as Han-using languages, it does make sense for the romanizations to be described in appendices, since a reader might find it useful to learn the details of our system. But for other languages, such as Greek or Hebrew, an interested reader would find it much more useful to simply learn the script for his or herself, and the romanizations are probably only needed in the Wiktionary namespace. (Even when we do have an appendix, it might be best to have both an appendix and a project page, aimed at different audiences. Keeping them in sync would be a bit annoying, but when you consider that we also have to keep all main-namespace romanizations in sync, it's really nothing by comparison. :-P) —RuakhTALK 19:36, 6 April 2008 (UTC)
These pages do NOT necessarily all belong in the Wiktionary namespace. If a page is about standards of transliteration used specifically for Wiktionary, then it belongs in the Wiktionary namespace, either within an "About Language" page, or as a page or subpage of its own linked from that "About" page. On the other hand, if the page is about a variety of transliteration schemes, for the benefit of users who may have a work with an unusual transliteration scheme, then it should be an Appendix. The Wiktionary namespace is set aside for information about practice on Wiktionary, and should include only the standard selected for Wikationary. The Appendix namespace covers supplementary material not specific to Wiktionary, and should include any major system likely to be encountered. --EncycloPetey 21:22, 6 April 2008 (UTC)
Sensible, but it results in the guides for Wiktionary's romanization/transliteration being split between two different namespaces, or having some Appendix information repeated in the Wiktionary: namespace. I guess this could be ameliorated using categories, and by adding a definitive list to the main romanization/transliteration guide. Which is the tidiest solution? —Michael Z. 21:04, 7 April 2008 (UTC)
That will depend on what information currently exists on Wiktionary for a given language. I would think that having an "About:LAnguage" page would be an important first step, since there is the possibility of listing and linking such key pages and sections from the bottom of the page. --EncycloPetey 22:52, 7 April 2008 (UTC)
There shouldn't be any significant duplication. The project page should describe what is required/recommended for entries, and the appendix should explain how a given system works. So if it is the consensus that for language A, romanization X should be used, the "Wiktionary:About A" page should say "Entries in language A should use romanization system X," and link to "Appendix:X Romanization." Beyond this, considerations that affect how a romanization system is used in entries (layout, templates, etc.) go in project space; but the description of the system (insofar as it is not unique to Wiktionary) goes in appendix space. -- Visviva 09:17, 9 April 2008 (UTC)
That's a good summary, Visviva. I will review the relevant guidelines and appendices, and perhaps shuffle things around a bit to fit this picture. —Michael Z. 18:45, 10 April 2008 (UTC)

Have a look at the pages in Category:Transliteration appendices. Most of them are simply labelled "Wiktionary standard translation", with no explanation or citation. I'll move these from the Appendix namespace into Wiktionary, and post a note on each requesting a reference. —Michael Z. 16:59, 11 April 2008 (UTC)

AutoFormat and Category:Entries with level or structure problems

Does anyone still object to AutoFormat just fixing these? I don't think I've ever seen a case where its explanation of what it would do was wrong; granted, in plenty of cases it was incomplete, but I think that's because it only adds one {{rfc-*}} tag at a time, so if it actually just fixed things, I think it would have done a complete job. It's annoying that we have to do these manually, and frankly, I'm not convinced that manual intervention is any less error-prone than AutoFormat would be. —RuakhTALK 19:20, 6 April 2008 (UTC)

Agreed. I can't think of an instance where AF would have done something I disagreed with. However, it might be nice to have an official proposal of new things we're giving AF license to do, so we can specifically agree to them (if Robert's willing to throw together such a list). -Atelaes λάλει ἐμοί 21:27, 6 April 2008 (UTC)
Likewise. I can imagine that some of the more complicated pages could present a problem (mulitple POS sections with a single Translation section at the bottom), but these are very rare and are problematic anyway. --EncycloPetey 22:07, 6 April 2008 (UTC)
I've asked the same question here myself a while back, and since then I think I've seen an error, but just one, which I can't believe I didn't report. Suggestions aren't something people jump on, but if the bot does something wrong then we can yell at Robert to tweek it. DAVilla 05:02, 7 April 2008 (UTC)
Yes. Go for it. SemperBlotto 07:26, 7 April 2008 (UTC)

Appendix:Old Cyrillic alphabet

I've created a new Appendix:Old Cyrillic alphabet, including transliteration. Please review and correct any mistakes. —Michael Z. 02:58, 7 April 2008 (UTC)

Placement of terms consisting of multiple words

I am by convention placing terms consisting of multiple words such as complex analysis under the Derived terms header of the article, in this case analysis, as it is my understanding of WT:ELE that they belong there. Is my understanding shared by the community?

My placement of these under the Derived terms header in the article analysis has been reversed. Before launching an edit war, I'd like to be sure I am on the right side. --Daniel Polansky 09:38, 7 April 2008 (UTC)

  • I always put such terms in the "Derived terms" section (see sulfate as an example). I don't see the problem. SemperBlotto 09:45, 7 April 2008 (UTC)
  • Me too. I always figured this was the reason for using "terms" rather than "words" in the headings of these sections. -- Visviva 09:52, 7 April 2008 (UTC)
    • This has always been my understanding as well. Thryduulf 11:35, 7 April 2008 (UTC)
The edit summary that moved these to "See also" claimed that compounds should get different treatment. Was there a discussion to that effect before the creation of {{rel-top}}? Without the template, any justification to push some of this perhaps lower-value material lower on the page is understandable. But with collapsible tables I can't see any justificiation at all for separating them. Sometimes I wonder about the point of having big tables of such derived and related terms at all, collapsed or not. DCDuring TALK 11:47, 7 April 2008 (UTC)
Not to my knowledge; I think the editor was just confused. -- Visviva 12:12, 7 April 2008 (UTC)
Okay, thank you all. As an aside, I am very fond of these big tables, although not yet sure why. --Daniel Polansky 12:26, 7 April 2008 (UTC)
Even if your fondness were very neurotic (;-}), it would almost certainly be shared by some meaningful fraction of our users. I'd be interested in why you like them or how you might use them. DCDuring TALK 13:07, 7 April 2008 (UTC)
So (a) one reason I have discovered is that when looking for a compound term, I like only typing one word of the several and then navigate myself to the term with mouse. That is on days on which I type a lot and am glad to get a relief from typing. Another one (b) is that some substantives get extended by adjectives (e.g. philosophy, analytical philosophy, continental philosophy, pain, physical pain, emotional pain), and when these multi-term extensions are listed, the page of the substantive kind of documents its subclasses, or attibutes. I admit that the latter could be partially served by the Hyponyms header.--Daniel Polansky 17:52, 7 April 2008 (UTC)
And (c) is the reason (or use case?) given by Mike below: I know or assume that the phrase contains a specific word, but am uncertain about the exact reading of the phrase. --Daniel Polansky 18:05, 7 April 2008 (UTC)
Adding after the discussion: (d) specifically for adjectives, derived multi-word terms tell me on what classes the adjective is defined as a value of an attribute, so to speak. Phrased differently and modeled differently, it tells me what types the predicate of the adjective is ready to accept as its parameter. --Daniel Polansky 13:45, 9 April 2008 (UTC)
Likewise, I like the tables of related and derived terms. I always try to add them to Latin entries because I find it helps enormously with learning the vocabulary. Being able to see a host of related terms, and click on each to get the specifics, really is enlightening in terms of understanding Latin word relationships. The commonalities among the various words allow insight into the scope of the root word, and provide a survey of what ending created words in other parts of speech from that root. They're also really handy in the case of verbs for finding (and learning) all the compounded verbs that come from a particular root, and which differ in the addition of a prepositional prefix. --EncycloPetey 01:15, 8 April 2008 (UTC)
Thanks for the explanations. Though this seems like something only a veteran would use, Daniel has articulated how the tables might help an ordinary user who had come to Wiktionary to look up a complex concept. It's similar to having a lot of usage examples and citations in principal name-space, enabling certain (correctly spelled) searches to find useful entries. That kind of use would not put any limits on how phrases or compound words appeared, so that esthetics and the interests of etymlogic/morphologic/ally oriented user needs could legitimately govern. Would subject matter grouping help in the case of long lists. I would have thought that time-zone names would have been a helpful categorization in the useful extreme case of time. DCDuring TALK 02:46, 8 April 2008 (UTC)

Wow! So everyone here is happy with the fact that timely is buried deep within time? DAVilla 16:04, 7 April 2008 (UTC)

I kinda would prefer to split it into several tables, say one for "Derived terms" (which would include e.g. timely), one for "compounds" and one for "phrases", though I understand that such division is not popular here, and a distinction between "compound" and "phrase" is perhaps more difficult to keep up in English than in other languages (like Swedish). But for the information those lists contain: yes, I like them as they allow me to scan the list to find an expression I know contain a given word, but am uncertain how it would be written in the "lemma form"; or I may see which options there are to add a particle or a preposition to get to the appropriate expression, even if I don't remember which one should be used. (Trying to keep track of English prepositions in general, and prepositions used in various fixed expressions in particular, is nothing but Sisyphus work... ;) \Mike 16:36, 7 April 2008 (UTC)
Perhaps other grammatical forms, or transformations of a word deserve a special status. The plural times appears right next to time, so maybe the adjective timed, adverb (and adj.) timely, etc, belong closer to the top than, say time-honoured or Australian Eastern Daylight Time.
Is it possible to describe a logical, but fairly limited list of such forms? —Michael Z. 16:46, 7 April 2008 (UTC)
I would then say that anything which is not a compound/phrase, that is, anything which is not possible to split into more than one independent 'proper' word, would qualify. Thence timely would qualify, but not time-honored (= time + honored). Would there be any ambiguity in such a split? \Mike 17:42, 7 April 2008 (UTC)
I'm happy with timely's being s.v. time's "Derived terms" section if it's a derived term. If it's in fact descended from an older version of timely then I'm not sure. In the case of complex analysis, I suspect strongly that it is derived from analysis and so belongs in its "Derived terms" list.—msh210 16:55, 7 April 2008 (UTC)
If it is descented from an older version of timely then I believe it should be in a related terms section. Thryduulf 17:04, 7 April 2008 (UTC)
Ah, yes, agreed.—msh210 18:39, 7 April 2008 (UTC)
Interesting ... the discussion at #Ambiguous etymologies (above) seemed to reach the opposite conclusion. It remains my opinion that both forms of etymology need to be presented on Wiktionary; "timely" is formed from time + -ly in contemporary English, but it is also a linear descendant of OE tīmlīce. We would be doing an unforgivable disservice to our readers if we discounted either of these facts... and in this case it seems like "Derived terms" is the more transparent choice. -- Visviva 09:25, 9 April 2008 (UTC)
Hm, I'm thinking in terms of grammatical morphology rather than etymology. Timely is an adv/adj sense of the lemma time, regardless of whether it sprang from time or has always been used alongside it. Maybe I'm being too ambitious, as this may require a separate section, or something like a declension or conjugation block, rather than being sorted at the top of "derived terms". —Michael Z. 17:08, 7 April 2008 (UTC)
  1. Do ordinary passive users actually use derived terms and related terms? How do we use it? I use it as a kind of memory exercise when working on an entry sometimes, but rarely follow the links just to get information.
  2. Are "Derived terms" and "Related terms" ever split by sense? I assume that we wouldn't want them to be.
It doesn't seem silly to divide the contents of these into single words, compounds, and phrases/idioms/proverbs if the single block is "too big" as time's certainly is. DCDuring TALK 17:21, 7 April 2008 (UTC)
Separating (in lion) the lioness from the lion cub would be cruel. It is painful to have to look in different sections depending on the precise spelling of words (space inside or not?) when there is no other reason to separate them. But I agree that proverbs should be put in a different section. Lmaltier 17:48, 7 April 2008 (UTC)
2. Splitting by sense seems like a good idea in many cases. For example, why would I want pressure head or Korboggen head to be in the same table as head lettuce at head#Derived terms? This goes several times over for Chinese characters in the East Asian languages. But I suspect there are many other cases where splitting by sense would cause all hell to break loose.-- Visviva 15:15, 8 April 2008 (UTC)
The time page is indeed an extreme example, showing the downsides of what I so often like. As regards my preference, the main point is that the compounds are listed somewhere, not that they are listed under Derived terms heading. For me, it would be perfectly okay to have Compound terms heading, or whatever is considered appropriate. --Daniel Polansky 17:58, 7 April 2008 (UTC)

Inflections

We conventionally list certain inflections by the headword: plurals of nouns and pronouns, comparatives and superlatives of adjectives and adverbs, other cases of pronouns (he > him, himself, his), key inflections of verbs.

My paper dictionary (Canadian Oxford) also lists such inflections when they are irregular or "may cause difficulty". But it goes beyond Wiktionary by adding the simple past tense, present and past participles, adjectives in -able formed from transitive verbs, e.g., achieve (achievable), exchange (exchangeable). It may include versions restricted to U.S., British, or Canadian English, etc, e.g. "car·olled, car·ol·ling; US car·oled, car·ol·ing".

Regardless of their shared or separate etymologies, timed, timely, and timeful are inflections of time. It makes sense that we would make this intimate relationship clear somehow. Perhaps we should consider expanding the inflection templates like {{en-noun}}, or adding an "Inflections" section before "Derived terms" —Michael Z. 19:48, 7 April 2008 (UTC)

We already have an "Inflections" section before "Derived terms".—msh210 20:29, 7 April 2008 (UTC)
By my reading of WT:ELE, in English words only some inflections belong next to the headwords, but an inflection heading is only to be included in non-English words.
Perhaps the latter restriction should be relaxed. —Michael Z. 21:11, 7 April 2008 (UTC)
You have read ELE correctly; we do not use the Inflections section in English entries, and we do not need to. Adjectives formed in "-able" are separate words with separate entries and etymologies. They are listed in the "Derived terms" section. English does not treat timely as an inflection of time, nor do I know of any European language where the adverbs are considered inflections of nouns or vervs. Adverbs are typically regarded as a separate part of speech, though they are Derived from nouns, verbs, or adjectives. --EncycloPetey 22:47, 7 April 2008 (UTC)
Agreed 100%. —RuakhTALK 22:59, 7 April 2008 (UTC)
Okay, I see that the verb forms can be handled as in carol#Verb.
So inflection isn't the correct term, but this still means that some of a word's closely-related cognates can get lost in a sea of compound words, idiomatic phrases, and relative neologisms. In this way, Wiktionary's presentation suffers in a few cases, compared to a paper dictionary. —Michael Z. 00:45, 8 April 2008 (UTC)
I agree that it would be nice to have separate sections for (on the one hand) words derived by the addition of affixes and (on the other) phrases “derived” by the addition of words. —RuakhTALK 01:00, 8 April 2008 (UTC)
Yes, they can get lost, but this only happens on a very small number of pages. I suspect there are fewer than 50 such pages on all of Wiktionary. --EncycloPetey 01:11, 8 April 2008 (UTC)
But our hope is for them to get lost on many words. We might as well formulate some thoughts about how to separate them now, when it's still quite rare. Would anyone object to my splitting the section at time#Noun into two tables, one glossed as "words derived from the noun time", one as "idioms and set phrases using the noun time", just to see if we like the result? —RuakhTALK 01:38, 8 April 2008 (UTC)
I'd like to see how it looks. I can't see any serious objections to trying it, as long as none of the information is removed. Idioms, especially, seems to be a different thing from derived terms
My paper dictionary groups these as "idioms and phrasal verbs", and groups them after the main definitions. It also differentiates "derivatives" (formed with suffixes and are appended to an entry unless further definition is required) from compound words (which are always main entries, whether they are formed as one word or not, e.g., bathroom, serial number, and mega-musical). —Michael Z. 02:49, 8 April 2008 (UTC)
With collapsible tables, we don't even need an additional section. We could have multiple tables just as we do for Translations, as long as each table is appropriately labelled. --EncycloPetey 02:58, 8 April 2008 (UTC)

{{lookfrom}}

Something useful I've recently found is {{lookfrom}}, which directs the user to a Special:Prefix index page. It isn't perfect, but could be used to make the expansion of the ====Derived terms==== section redundant. Keene 14:22, 8 April 2008 (UTC)

I don't think that would do much to help things, as it only can find pages starting with the selected word. Things like "in time" or "on time" would still need a manually made list in time. On a related term: ould it be possible to tweak the search function so that it only looks for pages which includes the search string in the title of the page, and doesn't care if it occurs in the body? That would IMO make more sense as a replacement for ====Derived terms==== (except of course such derived terms which are based on some kind of mutation or stem change...) \Mike 14:42, 8 April 2008 (UTC)
Additionally, that template function doesn't distinguish languages. Nor does it restrict the listing to words etymologically derived from the start term; it simply lists entries that start with the specified set of characters. For example, boggy is derived from bog, but boggle is not (even though it shares the same start letters). It also is case-sensitive, so it wouldn't list New York if you were looking at "n". We don't have anything that would make Derived terms redundant. --EncycloPetey 21:43, 8 April 2008 (UTC)

Treatment of certain types of compound terms

I asked a question about formatting Spanish entries that applies to many other languages, so I brought it here. Compound words can often be formed by affixing pronouns (or participles particles) to verbs. In Spanish, for instance, practically any combination of one or two reflexive/direct/indirect pronouns can be attached to infinitive verbs, present participles, or affirmative commands. So I have a couple questions. 1) How should these compound words be treated? Should they be listed as "Conjugations" or under "Derived Terms"? Should one place (for instance the infinitive entry) list all the permutations or should they be scattered on the pages of each stem? See redactar for an example of putting them all under "Derived Terms". 2) The list was made by blindly following the rules of grammar, so many of the permutations are rare, possibly unattestable. Should we include links for as yet unattested terms (does the CFI apply to links)? I'd love ideas. --Bequw¢τ 19:34, 8 April 2008 (UTC)

Note: that should say "pronouns (or particles)". Some languages have particles (small words) that can be attached to the verb. I know Hungarian does this, and IIRC German and Dutch do as well. Indonesian will have problems with this that I don't fully understand.
To elaborate for those who aren't familiar with Spanish, many verbs can have a pronoun (indirect object or direct object) affixed to the end of a verb. So, "kiss me" would be bésame (besa + me). "Give it to me" would be dámelo (using da + me + lo). How should these affixed forms be treated, and where/how should they be linked or listed? The issue is compunded in Spanish by the fact that the meaning of some verbs changes depending on the presence or absence of these pronouns. --EncycloPetey 21:34, 8 April 2008 (UTC)
Hebrew does this also, although it is far more common in older texts than it is in current speech. I say have all these attested forms as entries, and list all of them that are definitely the correct form, even if unattested. I'm not sure where to list them, though: probably under Conjugation.—msh210 22:07, 8 April 2008 (UTC)
I think Hebrew's a bit different from Spanish in this regard: in Hebrew I think it's actually part of the verb's morphology — for example, in a form like עשני ‎(asáni, made me), I couldn't say where the verb ended and the direct object began. So in Hebrew, I think these forms definitely warrant their own entries, and in fact some of them (such as קדשנו ‎(kidshánu, sanctified us)) already have them. By contrast, in Spanish I think there's a separate verb and enclitic pronoun, and the spacelessness and accent are strictly written phenomena. (The -monos thing also affects pronunciation, but still I think falls into the same general category.) They may or may not warrant their own entries — my sense is not, but I see it both ways — but I don't think that bears on handling of Hebrew. —RuakhTALK 04:32, 9 April 2008 (UTC)
Even if Ruakh's correct that Hebrew is different from the Romance languages in this regard, I still maintain that both should have such entries and lists per my comment just above.—msh210 16:46, 9 April 2008 (UTC)
As I've always seen "Sum of Parts" reasoning used to remove phrases, and not applied at the sub-word level, I'd suppose that all Romance language compound term that are attestable would merit their own entry. This is especially the case in Spanish because some of the affixed pronouns could be ambiguous (whether direct or indirect objects) and meanings can change slightly. --Bequw¢τ 15:50, 9 April 2008 (UTC)
I'm not suggesting we apply it at the sub-word level, only that we adopt a more useful definition of word. If me hablaste is two words, then I think so is háblame. That said, it would be nice to have an entry for hábla- defined as “Form of habla used with following clitic pronouns; see hablar.”, and perhaps háblame should have an entry that says simply “habla (see hablar) + me.” I don't know what POS to use, though; in many cases it's a verb phrase (which we'd call a “verb”), but in other cases it's an odd verb-phrase fragment, like in either dalo al profesor ‎(give it to the teacher) or dame el libro ‎(give me the book). (Perhaps either V+DO or V+IO can be considered a constituent, I don't know, but certainly it doesn't flip back and forth whenever we change which object is a clitic.) And IMHO it is in no case a good idea for dar to link to dame, etc., though it should perhaps give the relevant imperative as “da/da-/dá-” (and likewise for other affected forms). —RuakhTALK 17:10, 10 April 2008 (UTC)

Treatment of other types of compound terms

Hebrew has a number of terms that translate into English as prepositions (from, to, others) and conjunctions (and, that), but which are attached to the fronts of words in the Hebrew. These are ב-,‎ ו-,‎ כ-,‎ ל-,‎ מ-, and ש-. Words formed of these, like בארץ ‎(b'eretz, in a land) (equals ב- ‎(b, in) plus ארץ ‎(eretz, land)), are written without space in the middle, and are, I think, recognized as one word by, for example, schoolchildren. Linguists consider them two words each, with the prefix counting separately from the rest. (Ruakh informs me it's actually a clitic rather than a prefix.) Certainly anyone who knows Hebrew can figure out the meaning if he can figure out where the prefix ends: if it's two words, then it's a sum of its parts. On the other hand, someone who doesn't know where the prefix ends will likely look up the whole thing. Ruakh says these are not entry-worthy; I say they are. I decided to take this issue here to the BP because it may well be relevant to other languages (Finnish, Hungarian, others) as well. What do you all think?—msh210 16:41, 9 April 2008 (UTC)

If they're written without a space in the middle, and words in Hebrew are otherwise spaced, then I say we might as well have them. What harm does it do after all? (A bigger problem is with languages like Thai or (what I've been learning recently) Lao, which are not written with spaces between words at all. It is often impossible to judge what is a compound noun and what is just sum-of-parts.) Widsith 16:50, 9 April 2008 (UTC)
The problem is that there's no limit; you can (and frequently do) put more than one of them together, as in [v'][she][k'][she][mi][b'][tokh], "and that when from within" — which contains the clitic [she] twice, four other clitics once each, and the preposition [tokh]. (Note: the brackets here are just for ease of reading; this isn't IPA or anything.) In this example, I think [b'][tokh] "inside, within" warrants inclusion as a fixed expression, as does [k'][she] "when"; but certainly the whole thing together doesn't. —RuakhTALK 17:06, 9 April 2008 (UTC)
I'm not sure what the problem is with adding many such entries. Not all will be attested, and those probably are the only ones we should add, but among those that are attested, I'll grant that there will still be quite a few. But this is a wiki: we've got time.—msh210 17:28, 9 April 2008 (UTC)
I think the same rule as in English should be followed, if possible: if the compound term refers to a specific term, then it should be included (like "motorcar", "railroad car", "carwash"), but not if it is a simple sum of parts ("red car", "Japanese car", "dad's car"). Of course, it is often not clear if the term is specific or not. In that case I don't see any problem including the word.--Jyril 17:41, 9 April 2008 (UTC)
I agree. —RuakhTALK 17:55, 9 April 2008 (UTC)
Sorry, but I don't think that makes sense. The CFI are designed to use the words found in permanently accessible works (books, journal articles, etc.) as a proxy for the words that a typical reader might encounter and want to look up. That works fine — not perfectly, but fine — for individual words; but I don't think it'll work at all for a stray series of prepositions and conjunctions that all happen to appear together at the start of a phrase, plus whatever word happens to follow them. There are virtually unlimited real combinations, and the CFI's standards of attestation won't reflect which ones are worth including and which worth aren't. (Partly because in some sense none of them are worth including, partly because in some sense they all would be if that were even remotely possible, just as it would be great to include every possible Lao sentence.) —RuakhTALK 17:55, 9 April 2008 (UTC)

I think a dictionary is intended to be (among other things) an aid for learning a language, not a substitute, or the only source for understanding. Therefore it is not necessary to include every possible form of every word in every language. It would probably be impossible as well. Just as an example, every Finnish verb has five infinitives and six particips, some of which can be inflected in fourteen cases in singular and plural, and combined with six possessive suffixes and a host of clitics. This adds up to dozens, possibly hundreds of forms derived from each verb. The numbers are smaller with nouns, but they are still large. Check one composition at: järjestelmällistyttämättömyydellänsäkään, which I added for fun, and because I have learned that it is the longest "word" in Finnish that is not a compound term. Nykysuomen Sanakirja (the Dictionary of Modern Finnish) has some 200.000 entries. In order to list all forms that may exist in Finnish alone, one would probably need tens of millions of entries. As a matter of fact I think we have too many forms already. As a simple example, most of the English plurals are plain old SoP's and completely unnecessary for anyone who knows even the basics of the language. Finnish plurals are a bit more complicated because the stem often changes, but for the most part they are as useless (or to be exact, the stem does not change, but the nominative form and stem are very often not the same thing). Only irregular plurals and those which have an independent meaning, or are "pluralia tantum" would suffice, IMHO. Having said this I do not know where to draw the line. Hekaheka 18:54, 9 April 2008 (UTC)

Why did you added järjestelmällistyttämättömyydellänsäkään because the word usually used is epäjärjestelmällistyttämättömyydellänsäkään? ;) In the case of words with enclitic suffixes or other "obvious" cases, I would accept those which are very common in that language or otherwise special. That is not the case of "proper" inflections, which should be included. --Jyril 19:18, 9 April 2008 (UTC)
This is out of the point of the discussion, but epäjärjestelmällistyttämättömyys includes a double negation and is therefore not a meaningful word - rather a collection of clitics. Hekaheka 21:50, 9 April 2008 (UTC)

A dictionary is not used only by people learning a language. You might want to try to understand a message you received by e-mail in a language you don't know, and cut and paste every word. Another use of Wiktionary might be by a browser allowing a simple search by double-clicking on any word of any website. Such uses require that as many forms as possible are included. Lmaltier 21:05, 9 April 2008 (UTC)

Translations robots do that job better. Besides, how much does it help to know that presupuestares is "the second-person singular of presupuestar in the future subjunctive", if you don't know what a subjunctive is?. It takes quite a bit of language-specific knowledge to understand the glosses. Hekaheka 21:50, 9 April 2008 (UTC)
There are no translation robots for some languages. Of course, you cannot translate a text well just by searching each word. But you can know what the lemma form is, and what it means, and this is important, even if you don't know what subjunctive means. Paper dictionaries don't allow that (I already searched for a word in a paper dictionary, and concluded it was absent; actually, it was present, but the lemma form was not obvious). Lmaltier 06:08, 10 April 2008 (UTC)
Well I guess there is some precedent for excluding single words that are Some of Parts as in English we now exclude the possessive case. Per language policies could be written to exclude Sum of Parts words according to the rules of that language, but I don't think it should happen until our wiki technical abilities mature. For example, finally when dealing with the case of the first letter of a word, our technical ability (auto-redirecting Omphaloskeptic -> omphaloskeptic) matches our policy (not allowing multiple entries for different cases of the same word/lexeme). (That was such a great enhancement, by the way.) I'd be great if a user's search came up empty, we could ask them for the language and we could check to see if the word is decomposable by that language's wiki-programmed rules. But until then, let's leave it open. --Bequw¢τ 21:36, 9 April 2008 (UTC)

Wandering [edit] links

These should now appear on the correct line; also trans and rel tables play nicely with images. See WT:GP for more info. If you see anything odd, tell me or us there. Some float boxes still need some extra code removed for IE. Robert Ullmann 18:10, 9 April 2008 (UTC)

Demonyms

A demonym is the name for a person from a place: European, Basotho, Iowan, Winnipegger, Haligonian, Smithereen. It's a specific, and sometimes interesting, kind of word.

I'd like to create a context template {{demonym}} applying a new category:Demonyms, which would in turn fall into category:People and either category:Geography or category:Place names.

(See also category:Exonyms, category:Endonyms, category:Xenonyms.)

Is this a sensible idea? —Michael Z. 07:01, 10 April 2008 (UTC)

It sounds like a sensible idea to me. Thryduulf 09:09, 10 April 2008 (UTC)
The category sounds good, but why would we need the context label? The sense would not be used in the context of demonyms: it'd be used in a general context.—msh210 16:38, 10 April 2008 (UTC)
I see, demonym doesn't describe where the word is used, rather it's a sub-category of nouns (and come to think of it, they also belong in category:English nouns). I guess it's just as easy to add [[category:demonyms]] as it is to use a template. —Michael Z. 17:37, 10 April 2008 (UTC)
OTOH, demonym is specific to a sense, not an entry, not a Language, not an Etymology, not a PoS. (Not that that problem is in anyway limited to demonyms.) DCDuring TALK 17:48, 10 April 2008 (UTC)
So a Berliner is both a native and a pastry.
General question: is it a good idea to place the category tag in its context in an entry (e.g., in the same line as definition no. 1), or should they all remain at the bottom of the page? —Michael Z. 18:03, 10 April 2008 (UTC)
End of the language section; see Wiktionary:Votes/2007-05/Categories at end of language section. —RuakhTALK 20:07, 10 April 2008 (UTC)
This would be a topical category, and should be a subcategory of Category:People, and might also be listed as a subcategory of Category:Etymology for each language. I wouldn't use a context template, because "Demonym" is not a context; it is a class of words. That is (astronomy) and (sports) say something about the context in which a sense is used, but demonym describe the kind of word. --EncycloPetey 23:25, 10 April 2008 (UTC)
  • Er. Is this really what demonym means? The only place I've seen it used that way is on Wikipedia, and I always assumed someone there just invented it cos it sounds important. Obviously I get the formation, but I am just a bit cautious about our adopting something if it is really only a protologism. A look at books.google shows few hits, and a lot of those seem to be using it with the sense of "name used by the people", i.e. "colloquial pseudonym". Widsith 18:49, 11 April 2008 (UTC)
    2007 dictionary where it is used as defined, 1870 dictionary where it is defined differently, 2003 book where the word is used and mentioned as defined, an 1895 dictionary where it apparently means 'name based one what one does' (I think, not too great with Greek), 2005 textbook used as defined. I think is used some if not widely, but it's present definition is very recent, late 1990s or early 2000s. The old definition seems to have fallen off at the onset of the 20th century. - TheDaveRoss 20:35, 11 April 2008 (UTC)

Thanks for the input, and thanks for checking the attributions, TRD. I've created Category:Demonyms, and a couple of language subcategories, under People, Geography, Etymology, and Names. Please review. Michael Z. 2008-06-26 21:36 z

Guidelines to correct structures with multiple ety and pron

I've been going through Category:Entries with level or structure problems and found several entries where I did not know how to correct the structure. These are mostly the ones with multiple etymology and pronunciation in different variations. One example is Sofia. Is there a guideline I could look at? Thanks. --Panda10 21:52, 10 April 2008 (UTC)

==Italian==

===Pronunciation 1===
{{IPA|/soˈfia/}}

====Proper noun====
{{rfc-level|Proper noun at L4+ not in L3 Ety section}}
'''Sofia''' ''f''
# {{given name|female||it:}}, cognate to [[Sophia]].
WT:ELE is your best bet, normally it is broken down by etymologies, then parts of speech. I am sure that somewhere there are a pair of lexemes derived from the same etymology with different pronunciations...I think that the best idea in this case would be to list both pronunciations in the pronunciation section and then note the pronunciation differences, rather than break the page into yet more sections. - TheDaveRoss 22:21, 10 April 2008 (UTC)
I've also got a "Model Pages" project. For a word with a single pronunciation and multiple etymologies, refer to round. For a word with 2 etymologies, each with its own pronunciation, see hinder. For a word with a single etymology and multiple pronunciations, see predicate (though you'll have to look at my last edit, since Widsith disagrees and thinks this is two separate etymologies). --EncycloPetey 23:15, 10 April 2008 (UTC)
Thanks for the model entries. What are the headers that can be numbered? Only etymology? It seems that AutoFormat will add rfc-level to entries with numbered pronunciations as above. So even if I remove rfc-level because I think the structure is correct, it will be added back next time. --Panda10 23:56, 10 April 2008 (UTC)
There is debate about which headers may be numbered. Personally, I believe that etymology and pronunciation headers should be numbered iff they are parallel headers under the same over-header. So, I would only number pronunciation sections if (1) there were more than one pronunciation under a single etymology and the pronunciations were tied to the particular POS sections underneath them, or (2) there were multiple pronunciations tied to particular POS sections and the etymologies had not yet been put in. But, in the latter case, the addition of etymologies might eliminate the need for numbering the pronunciations, if they were located under different etymology headers.
There has been discussion off and on about numbering Verb or Noun sections in certain languages. However, there are several approaches to how this gets handled and there has never been a focussed discussion or conclusive decision. Some regulars are strictly opposed to the idea, while others think it is useful. But, it is almost never needed in English, so it isn't usually a concern to the community. It's more a problem in languages where the gender, inflection, or other aspects of a word are tied to specific senses, so that there must be separate inflection lines and separate inflection sections for the different definitions. --EncycloPetey 00:11, 11 April 2008 (UTC)

Sorry, I am getting a little confused. I've just got a message from Hikui87 that I edited some of the Japanese entries incorrectly. They were marked with rfc-level and I moved the Alternate forms section from below the POS above it and probably renamed them Alternative spellings because I thought all languages followed the same basic layout. It seems that Japanese entries follow different layout rules. So are we discussing only English entries here? --Panda10 11:47, 11 April 2008 (UTC)

All languages do follow the same basic layout, but Alternative forms can appear under the POS when the forms are specific to a particular POS. This sometimes happens in English entries, but not so often as in some other languages. In the case of Japanese, they've chosen to make that placement of the section all the time because it's a more common problem than in other languages. The kanji used to write Japanese have more than one reading, and a particular romaji may come from more than one set of characters too. So, there are so many cases where the alternative form depends on POS, or even on sense, that ALternative forms is placed under the POS every time in Japanese entries. This is one reason I don't try to clean up Japanese (or Chinese, Korean) entries myself. There are a number of special considerations. --EncycloPetey 12:29, 11 April 2008 (UTC)

Glossaries on Wiktionary

Are there planned any glossaries on Wiktionary other than Wiktionary:Glossary? What is planned to happen with Transwiki:Glossary of library and information science? Does anyone know whether Wikipedia plans to keep its glossaries? I am asking because I find the glossaries useful, simplyfying the work of extracting all the definitions of a given domain from Wiktionary, which is something that can in principle be tediously done using categories. Thanks for any hints. --Daniel Polansky 17:22, 11 April 2008 (UTC)

Yes, there are others. They are mostly in the Appendix: or Transwiki namespace because they are not Wiktionary-specific. You can find them by sifting through Category:Appendices. --EncycloPetey 17:45, 11 April 2008 (UTC)
So is it that the glossaries in Transwiki: namespace are planned to be moved to Appendix: namespace? I mean, I thought that Transwiki means that these things are yet to be processed. Is there any policy, even if in the making, on how to deal with glossaries, like how to format the entries? --Daniel Polansky 18:20, 11 April 2008 (UTC)
As far as I know, there's not really any policy or guideline on transwikis. In my eyes, Generally the transwiki namespace is full of the unformatted crap that Wikipedia didn't want, in essence in limbo between the 2 projects. Keene 21:44, 11 April 2008 (UTC)
Items in the Transwiki: namespace have been moved here, and may be cleaned up and moved to the appropriate location. However, some of these items are duplicates of what we have, or are non-Wiktionary items, and will be deleted. --EncycloPetey 21:52, 11 April 2008 (UTC)

Inflection line for nouns used only in the plural.

Encyclopetey and I have come to a disagreement at template talk:en-noun#For a plural regarding how we should note pluralia tantum on the inflection line.

My position is that we should use the format:

'''noun''' {{pluralonly}}
(or '''noun''' {{plurale tantum}} depending on the outcome of the discussion further up this page).

This categorises words into category:English pluralia tantum, a sub-category of category:English plurals, which is a sub-category of category:English nouns

Enyclopetey is advocating the alternative:

{{en-noun|''[[plurale tantum]]''}}

This categorises words into category:English nouns

Before this discussion descends (further?) into acrimony I feel that more opinions are needed. I suggest that the discussion take place here rather than there. Thryduulf 18:43, 11 April 2008 (UTC)

It seems like a good idea that it be regularized. I would expect that it would need a vote if it is actually to become mandatory. I certainly hope it isn't going to be backdoored. Taking the point of view of an ordinary user would suggest that it should be intelligible. Learning from what other dictionaries (with vastly more resources and a pecuniary interest in trying to make their product useful) seems useful even if we reject their choices. MW Collegiate shows:
  1. no plural and no notation when the singular and plural are the same, but also for regular plurals;
  2. "pl" when the noun in plural in form and is usually used in a plural sense;
  3. "pl but sing in constr when the noun looks like a plural but takes a verb in a singular inflection
  4. "pl but sing or pl in constr when the noun lools like a plural but may take a verb either singular or plural.
MW3 (unabridged) does the same, but also always shows a plural form if there is one, reducing some ambiguity in "singular only" cases, and adds the qualifier "usu" (usually) for more common plural forms and sing vs pl 'construction'.
MW Online seems to be the same as MW Collegiate.
Longmans DCE, an ESL/learner's dictionary, dispenses with the idea of "construction" and has just "P" for plural and "U" for uncountable, slightly restricting the acceptable choices to simplify matters for their users.
I'd be interested in what OED, AHD, Collins, Random House, and Chambers do.
I think we've already established that nobody with normal users uses Latin. DCDuring TALK 19:52, 11 April 2008 (UTC)
Both the OED and MW3 use pl. for plurale tantum nouns. The AHD is inconsistent, sometimes using the text "Often used in the plural" (cf. pant) and other times putting the plural form in bold at the head of the numbered sense (cf. color). Random House doesn't bother to mark these at all. --EncycloPetey 21:58, 11 April 2008 (UTC)
Hm, that may not be inconsistent. "Pant leg" sometimes appears in the singular, but as far as I know, the "regimental colours" never does. —Michael Z. 08:22, 12 April 2008 (UTC)

Sorry, I missed the point of the discussion which relates to the structure of categories and the use of template. Does this affect non-editing users? Not initially, as I understand it.

When using the category intersection tools, what would happen? I hope that such tools will become available to regular users. If a p.t. noun is treated no differently than a normal noun most of the time, then everything should be fine. If not then, then we will have a problem.
Making something templated means its appearance can be changed by changing the template only. The EP approach would seem to give more scope to change things because there would be a template for the noun itself instead of the noun appearing in a "hard-coded" way. My 2 cents. DCDuring TALK 20:09, 11 April 2008 (UTC)
This came up because I couldn't figure out how to do it with {{en-noun}}, then went searching for the right template, and finally asked for help.
Doing this with a template would be advantageous, because it's already expected by semi-newbs like myself. All the better if it's an option on en-noun rather than a separate template. It would also guarantee consistent formatting and categorization, and allow us to decide on the exact categories and wording independently (the template can always be updated).
I think the wording is a separate discussion, but FYI, the Canadian Oxford saves space by saying plural noun or (pl. same or -es) in many cases, and letting you figure out the details—it's obviously aimed primarily at native English speakers. —Michael Z. 20:33, 11 April 2008 (UTC)
We could adjust the {{en-noun}} template to accept something along the lines of "plural=only" / "plural=tantum" that would add the expected formatting and category. However, I'm not sure how best this could be done. --EncycloPetey 21:55, 11 April 2008 (UTC)
If we are adjusting the {{en-noun}} template (which I assume would be no more difficult than the existing way we get {{en-noun|-}} to categorise into category:English uncountable nouns), then I'd hope we'd stick with the "pl=" format already used. I don't think "pl=tantum" would be possible as the plural of "plurale tantum" is apparently "pluralia tantum", so logically the/a plural of "tantum" is "tantum". "pl=pluralonly" or "pl=plurale tantum" would be possible, both displaying whichever form of words we agree on (which I agree with Michael is a separate issue). I think however the best form might be {{en-noun|sg=-}}, which to me implies there is no singular form in the same way {{en-noun|-}} signifies there is no plural form, and has the benefit (I presume its a benefit anyway) of using the existing "sg" parameter. This might fall down though for two-word pluralia tantum, e.g. glad rags and checks and balances where we use the sg parameter to link to the individual words. Perhaps then use either the "sg" or "pl" parameters with some other symbol to denote this status, ! perhaps?
If we do this, then I think the {{pluralonly}} and {{plurale tantum}} templates should be depricated in the inflection line, but remain for use in the sense line.
Whichever solution we have, I think it is useful to retain categorisation of the pluralia tantum, either there solely or there and in category:English nouns or category:English plurals. The existing templates could of course very easily be modified to do this as well. Thryduulf 23:34, 11 April 2008 (UTC)
[I adjusted some text above which didn't display correctly —Michael Z. 00:28, 12 April 2008 (UTC)]
My preference would be for dual categorization in those cases. --EncycloPetey 00:17, 12 April 2008 (UTC)
Dual categorisation in which two categories? Thryduulf 00:59, 12 April 2008 (UTC)
Sorry, Category:English nouns and Category:English pluralia tantum. --EncycloPetey 02:30, 12 April 2008 (UTC)

Glossary - formatting

Is there any consensus on how to format glossaries? I have put up for myself a provisional policy at User:Daniel Polansky#Glossary, still wondering whether I should use (a) bullets, boldface, and "-" separator or (b) definition lists with ";" and ":". Today, I have formatted two glossaries, using the option (a). The option (a) is used in Wiktionary:Glossary and is more compact than definition lists. Still, definition lists are a standard HTML means of entering terms and their definitions. --Daniel Polansky 10:32, 12 April 2008 (UTC)

I am definitely in favour of (b), see the utterly hated Appendix:List of Harry Potter terms, it adds proper structure and (as a result) looks neater. Conrad.Irwin 18:18, 12 April 2008 (UTC)
That is a woefully incomplete list...if you are going to do it at least do it right :p - TheDaveRoss 20:07, 12 April 2008 (UTC)
Having a look through the various glossaries, I agree that a guideline would be helpful.
{{compactTOS}} doesn't need rules to be added, because it already stands out on the page. If they are desirable, then we can add them as CSS border-top and border-bottom in the template, instead of tossing in more wikitext.
I would suggest that bulleted glossaries don't need bold formatting, especially if the terms are linked. A colon may be a less obtrusive and more natural separator. The terms don't get lost in Appendix:Bagpipe terms, and I think it is more readable than many of the others.
The guideline mention consistent copywriting, too. Does each term begin a sentence, or is it followed by one? Does the definition begin with a capital letter? I think the answers are different for each glossary, but the definitions in a glossary should be consistently written.
I am very much in favour of using structural HTML, but unfortunately Wikipedia's semicolon-colon wiki lists are styled for discussion. Definition lists are well structured, but not particularly attractive. The lists also have the advantage that one term can be associated with several definitions (but unfortunately there's no way to put more than one paragraph or another list into a single definition). —Michael Z. 03:08, 13 April 2008 (UTC)
IMHO the defined term should better appear in boldface. It does so in Wiktionary entries of terms, it is so formatted when the HTML definition list is used, and it is a Wikipedia convention to have newly defined terms in boldface. --Daniel Polansky 06:17, 13 April 2008 (UTC)
It's good typographic practice to use the formatting appropriate for the context.
Headwords in entries are the most important thing on the page, and have to visually compete with the headings. Each entry has one or only a few of them. Boldface is appropriate here, and it is a convention inherited from many paper dictionaries. But dictionary terms also appear in etymologies, where they are italicized, and in lists of related terms, etc, where they are in roman font and linked. We wouldn't boldface any of these instances.
In contrast to Wiktionary entries, glossaries have dozens or even hundreds of terms, and are made of many blocks of running text, rather than a collection of headings and bulleted lists. They are more similar to the lists of terms appearing in entries than to the headwords. Glossary entries don't have to compete with other bold elements, they just have to be found by the reader, and then not distract her from reading the definitions. A term here is already flagged by coming at the beginning of the line, by being marked with a bullet, by being linked, and is set off with prominent punctuation after. Boldfacing every one is just adding icing to the gravy. —Michael Z. 17:40, 13 April 2008 (UTC)
Okay. I will format poorly formatted glossaries using the option (a), as I prefer it and I can see no clear consensus against it, but I will refrain from turning well-formatted glossaries formatted using the option (b) into the formatting (a). Please, let me know if you think it a poor personal policy. --Daniel Polansky 06:47, 13 April 2008 (UTC)
A and B are both much better than any of the other formats which appear in some of those glossaries. —Michael Z. 17:25, 13 April 2008 (UTC)

a bot to capture Wikipedia on Wiktionary

I was just now editing 狐獴 (the Mandarin entry for meerkat), and thought of something (maybe someone else has already thought of it, but anyway). Wikipedia now has thousands of articles with versions in multiple languages. The titles of most of these articles are either nouns or proper nouns. Perhaps a bot could be written to create Wiktionary formatted entries (similar to Tbot) for these words. For example, if the bot noticed that the English Wikipedia article for meerkat had a Mandarin equivalent article called 狐獴, the bot would create a formatted entry on Wiktionary that would look something like what you now see at 狐獴. A category could be slapped onto such entries (similar to the way Tbot tags things), so that a human editor could verify the contents, and add extra things (ex. Pinyin romanization for Mandarin entries etc.). Thoughts? -- A-cai 13:31, 12 April 2008 (UTC)

  • This would be fine if the interwiki links between the various Wikipedias always used the same conventions. Take your example of w:meerkat - the Italian interwiki link points to "Suricata suricatta", which is the translingual (or modern Latin) name, not the Italian word suricato. So your bit would generate incorrect entries. SemperBlotto 14:28, 12 April 2008 (UTC)
That's a bad idea. Many of the organism articles on most Wikipedias are based on the scientific name, and most of the plant articles (over 15,000 and growing) use the Latin binomial for the name. some other Wikipedias do the same. When an organism belongs to a higher taxon, and is the only organism in that taxon, the two articles may be the same, but not all Wikipedias divide them the same way. As a result, the article on the Ginkgo genus on one Wikipedia may be titled for the Ginkgoaceae family on another Wikipedia.
There are also many cases where the bots propogate incorrect links. I have had an ongoing tussle with bot operators over w:Monoicous because they don't understand that w:es:Monoica and w:fr:Monoécie are not about the same topic (they should link to an article about monoecious, not monoicous). The plant editors keep removing the incorrect links; the bots keep adding back the links; and the bot operators believe they are absolved of any fault. There are also many cases where the article titles don't even remotely mean the same thing, even when the topic is the same. English wikipedia has an article on Plant sexuality (which we would delete as sum of parts), but it covers the same subject as w:es:Monoica does. In short, we're seeing links between articles that do not have titles of the same meaning.
There are also many, many articles with titles that do not merit an entry because the entry would not meet our CFI. I can't see any reasonable way to get a bot to distinguish between cases or use appropriate selectivity in choosing which articles to create entries for. --EncycloPetey 14:29, 12 April 2008 (UTC)
I agree with EncycloPetey. It's a great thought, but I think there are too many problems with it. In addition to the ones he mentions, there are also cases like wikipedia:Fixed-wing aircraft where Wikipedia's noble quest for NPOV has led it linguistically astray (the normal words being airplane and aeroplane, depending on dialect). (There are also issues with figuring out whether a Wikipedia article corresponds to a lowercase Wiktionary entry or an uppercase one, but an intelligent bot might be able to handle those by searching the article for non-sentence-initial uses.) —RuakhTALK 18:09, 12 April 2008 (UTC)
Do note that while "airplane" and "aeroplane" may be the "normal" layman's terms, engineers (aircraft engineers) generally use the technical term aircraft. When Boeing talks to the public and to the Street, they use "airplane" (Boeing Commercial Airplanes Division), when their engineers and pilots talk, it is "aircraft". "Fixed-wing aircraft" is correct. (Besides evading the Pondian Problem ;-) Robert Ullmann 12:10, 20 April 2008 (UTC)
I agree that aircraft is the more formal/technical term, and that fixed-wing aircraft is correct and precise, but even a Boeing engineer or pilot would presumably choose airplane over fixed-wing aircraft in a context where aircraft alone didn't suffice, right? (And remember that Wikipedia's main naming convention is that articles should be named using the most common term for their referents.) —RuakhTALK 16:09, 20 April 2008 (UTC)
I think this would work. Ullmann has already put significant effort into Tbot's checking mechanism and the quality of the Wikipedia data would seem not too much lower than the quality of some of our translations. I think that (if this is possible) a modified version of Tbot that accepted input from Wikipedia interwiki's instead of Wiktionary's translation tables would be very good. (And as an added bonus it could add the word to the translation tables at the same time ;). For more information on exactly what checks Tbot does you'd have to ask Ullmann, but I believe they require a foreign Wikt entry to exist and contain a translation in common with the English Wiktionary entry. Certainly this bot shouldn't create translations of articles that don't have entries in Wiktionary. I don't know how many of the Wikipedia interwiki's would pass Tbots checker, but I would think a significant enough number to want to give this a go. Conrad.Irwin 20:21, 12 April 2008 (UTC)
I think it is a bad idea to try and get data from information which isn't designed to be a direct translation. Even if one article corresponds to another that does not mean that the titles are translations of one another. - TheDaveRoss 20:43, 12 April 2008 (UTC)
You make some good points, but the thing is, when Tbot creates a bad entry based on one of our translations tables, that's still useful: it calls attention to a problem in that translations table. When Tbot creates a bad entry based on Wikipedia interwikis, that's just annoying. (And, would it re-create the entry every time we deleted it?) —RuakhTALK 20:45, 12 April 2008 (UTC)
Based on recent discussion here, I think it would be trivially easy to stop a bot creating an entry when a previous page with that title had been deleted. When a section on a page was deleted, but the entry still exists (e.g. the Dutch section was deleted but the English section remains) I think it would be harder (disclaimer: I am not a programmer). Thryduulf 22:08, 12 April 2008 (UTC)
Besides the points raised above, I think that Wiktionary should try to create it's own content, and not rely on other projects and any mistakes they might make. Nadando 07:29, 13 April 2008 (UTC)

I should have pointed out one of my justifications for such a bot. A lot of contributors are already introducing such entries ... by hand! Many such entries are poorly formatted, and rarely tagged with any kind of "blindly copied from Wikipedia" tag. The entry for 狐獴 was one such example. It originally looked like this. I'm proposing to standardize the process with a bot. Such a bot (if done correctly) would give me the means to efficiently verify such entries, without a lot of additional formatting work. -- A-cai 07:46, 13 April 2008 (UTC)

One option that we might consider is writing a bot to add the interwikis as ttbc's to the English entry. The beauty of this option is that we get a whole bunch of data, it's pretagged to be looked at by human editors, and can then be fed into Tbot in the normal fashion. Additionally, our readers will be forewarned that it's questionable data and so, hopefully, won't be led astray. This does, however, have the downside that it would flood ttbc categories, which would, admittedly, be irritating. -Atelaes λάλει ἐμοί 07:39, 13 April 2008 (UTC)

A starting point would be to try to extract some entries automatically, and create a report or list of what the automation thinks it world do. As noted above, Tbot's primary check uses the translation table in the FL.wikt entry; this won't work that way. A serious technical issue is that Tbot works by digesting the entire en.wikt XML, and then looking for specific FL entries; what method would be used to extract the 'pedia data? The en.wp XML by itself is not manageable. (one could get "langlinks.sql.gz" for a given set of wps, and then do some analysis) I myself have not tried to parse anything out of wp entries; they look superficially consistent in many ways, but that may not make them tractable. Robert Ullmann 12:10, 20 April 2008 (UTC)

Transliteration appendices

The various transliteration systems in category:Transliteration appendices need references attesting to their wider usage or indicating their source. If they are systems modified or created specifically for Wiktionary, then they ought to be moved to the Wiktionary: namespace, per earlier discussion (Wiktionary:Beer parlour#Organization).

I added Wikipedia links to all of these appendices, and here is a link to Thomas T. Pedersen's reference to many transliteration systems.

Please add a reference to any of these appendices you are familiar with. —Michael Z. 20:07, 12 April 2008 (UTC)

It also looks like Wiktionary:About Greek/Transliteration may be a candidate to become an appendix. —Michael Z. 20:53, 12 April 2008 (UTC)

Most of those where created specifically for Wiktionary, and for some (New Persian) editors seem to be using multiple transliteration systems simultaneously. If you have specific objections to any of those, I'm sure people will be happy to discuss on respective talk pages. --Ivan Štambuk 21:39, 12 April 2008 (UTC)
The translaiteration Appendices should have references, yes, but when the Transliteration system is part of a Wiktionary: namespace page, then its an internal transcription system that may or may not be used elsewhere. --EncycloPetey 03:19, 13 April 2008 (UTC)
Right, but I'd like to just figure out which is which, and put each one in the right place.
A reader should be able to look at each table, and:
  • Know if they can expect to see the system used in other publications, and if so then in which field (linguistics, publishing, other dictionaries, etc).
  • Know if they shouldn't use it in e.g. an academic paper and expect their peers to be familiar with it.
  • Have confidence that what they are reading observes the Wiki principal of verifiable accuracy, and clearly identifies any original research.
From browsing through them and having a look at the relevant Wikipedia articles, it does look like a significant proportion of them are systems used in academia, or possibly dangerously close to such systems. —Michael Z. 03:30, 13 April 2008 (UTC)

Proposal: a template for linking prominently to foreign-language Wiktionaries.

When I add a foreign-language word that has a that-language Wikipedia article, I typically add a prominent link to that article, using {{projectlink|pedia|lang=fr}} or whathaveyou. However, if the word has a that-language Wiktionary entry, the link to it only shows up in the sidebar, which is fairly useless unless the reader knows to look there. I think it makes sense for the that-language Wiktionary entry to have a more prominent link, if only because the that-language entry is generally more complete (if only because it will have translations to languages other than English). So, I've created {{PL:wt}}, such that this:

* {{projectlink|wt|español|lang=es}}

will produce this:

español on the Spanish Wiktionary.

It's pretty much like all the other {{PL:*}} templates, except that it doesn't create a sidebar link (since the normal bot-managed interwiki link serves that purpose).

Before I start using it widely, mentioning it at Wiktionary:Links, etc.: does anyone object to this? (And, does everyone agree that this should only be used for linking to the that-language Wiktionary entry, never to other foreign-language Wiktionary entries?)

RuakhTALK 00:36, 13 April 2008 (UTC)

The icon is pretty pointless. It's just just a vague smudge, in my browser. Better to use nothing at all.
Or perhaps the favourite icon, which was designed to display at 16px size. But then it should be the bullet, created using the list-style-image CSS property, not an icon placed next to a bullet. —Michael Z. 02:30, 13 April 2008 (UTC)

Wiktionary-favicon.png

But that icon is also used by Wikipedia. How about expanding the function of {{infl}} (and similar templates) to include a link to the appropriate wiktionary, if it exists? How feasible in this? Would this look OK in the inflection line? --EncycloPetey 03:16, 13 April 2008 (UTC)
Yeah, Wikt: and W: are the only two projects that share an icon, but that's what we have (see the selection at commons:Wikimedia#Favicon). But we do use the globe for Wikipedia, so links with the W would still be distinctive.
(There's also an SVG version that looks pretty sharp at various sizes Wikipedia's W.svg and the puzzle piece Wiki letter w.svg )
But that's also why I think it may be better to use nothing. The full logo at 16px size is not attractive or even identifiable. It doesn't even serve as an eye-catching bullet, especially if it is right next to a standard bullet. A graphical element that serves no function simply makes things worse than they would be without it. —Michael Z. 03:46, 13 April 2008 (UTC)
WiktionarySv.svg
Some other-language Wiktionaries use the scrabble tiles for their logo. The W tile clipped out of this image might make a usable 16-pixel bullet (and favicon). —Michael Z. 04:01, 13 April 2008 (UTC)
(edit conflict) O.K., I've removed the logo. I don't like the favicon idea, for the same reason that EP gives; and even if I did like that idea, I wouldn't like the idea of not including a bullet from just one element of a bulleted list. (I mean, technically it would be multiple unordered lists in the HTML, but to our readers it would look like one bulleted list with just one non-bulleted element, so, same effect.) And anyway, on reflection it doesn't make much sense for us to use some form of our own logo as a means of identifying a foreign-language Wiktionary. However, it might be nice to use a bit of markup, something like (es), as the “logo”. —RuakhTALK 04:02, 13 April 2008 (UTC)
The puzzle piece is typically used for stubs, not for projects. I wasn't clear what I meant; I was thinking of adding functionality to the {{infl}} template along the lines of what we do for {{t}}. So, an inflection line might look like: hablar (es) so that the interwiki link appears in the inflection line. The {{infl}} template already includes the language code, which simplifies the process a bit, should we decided to do this. --EncycloPetey 04:03, 13 April 2008 (UTC)
I'm not opposed to {{infl}} having such a link, but I don't think it's enough: I don't think most readers will understand it. To be honest, I'm not sure most readers recognize the significance of those translation-table links, either, but at least there we provide only two links, so most readers who are interested in a translation will probably try them out once to see what they are. That's not true of an interwiki link representing the entire language section. —RuakhTALK 04:02, 13 April 2008 (UTC)
Here's an idea: take the actual wikitext one would use to create an interwiki link, and realize it on the page. Its meaning actually is vaguely self-evident, even for non-wiki editors. All the better if there was a way to add a tooltip reading ‘hablar’ in Spanish Wiktionary. —Michael Z. 04:27, 13 April 2008 (UTC)
Why shouldn't a self-referential (inter-wiki) link make use of the vernacular?
Here's what it could look like, with the es: link after the definitions: User:Mzajac/hablar. It also looks fine just below the headword, but interrupts the flow. I think it's a bit cluttered if placed at the end of the headword line. —Michael Z. 05:00, 13 April 2008 (UTC)

Another option altogether would be to leave the link in the left sidebar, but make it more prominent (bold font?). Wikipedia uses a javascript trick to make other-language featured articles have a star for a bullet (w:Template:Link FA), so it would be possible to manipulate other attributes of the link. —Michael Z. 05:08, 13 April 2008 (UTC)

This idea has been suggested before, I think that javascript is the best solution, as the interwiki link would already be there and we can then just create add a more prominent link to a suitable place in the entry. Obviously the formatting and position of these links has yet to be determined, but I feel that they should be part of, or near to, the language heading. Here are a few of my ideas for layout:

Spanish (es)


Spanish
The Spanish Wiktionary contains hablar


Spanish
español:hablar

As above I think that adding these with Javascript is better than adding these to all the entries, though I suppose it could be added to Interwicket. Conrad.Irwin 11:24, 13 April 2008 (UTC)

Well, I'd still like to be able to list them in "see also" sections like with other sister projects, but if people want your approach: I like option #2 (though "has an entry for" might sound better than "contains"). Option #1 is the most prominent, but again, I'm not sure it's obvious to most people what (es) means: if I saw it, I think I'd interpret it as "Hey, we know that the name Spanish is slightly controversial, so we're also including this unambiguous language code for clarity's sake." I'm not sure if I'd bother to click the link, since I'd assume that the link was to inform readers of what the language code meant (since not everyone is familiar with language codes). Option #3 is kind of cool, but less prominent, and it's not instantly obvious what it means. There are fairly few cases where we just include a link that would be absolutely meaningless in a print edition (aside from the edit-links, sidebar links, etc.), and I don't think this link needs to be an exception. But option #2 is great; I'd be very happy with it, especially if it were in concert with "see also"-s instead of instead of them. In anticipation of it, and of other scripts that might need such a thing, I've created MediaWiki:langcode2name.js, which offers functions for handling the language code, English name, and FL name of each Wiktionary language. —RuakhTALK 14:12, 13 April 2008 (UTC)
Inspired by your quick response, and finding myself at a loose end I have implemented #2. It can be trialled at the bottom of WT:PREFS "Trial the javascript prominent interwiki links." (try a hard refresh if the option doesn't appear straight away). Any thoughts would be appreciated. Conrad.Irwin 16:58, 13 April 2008 (UTC)
No. 2 is my favourite too. No. 1 destroys the graphic effect of the title, so I wouldn't want to see it implemented. —Michael Z. 17:01, 13 April 2008 (UTC)
It looks pretty good. I'm going to try to get used to it a bit
Would it be acceptable to remove the "the" and the period at the end? It might look cleaner without the accoutrements, and if the beginning mirrored the language heading. I don't think the boldface is necessary to draw the eye to the link, and normal-weight text will probably be more readable at such small size, especially when it appear in certain foreign-language scripts.
Spanish
Spanish Wiktionary has an entry for hablar
Since the note is already written out in full, the tooltip may be an opportunity to include the destination language, but this would require compiling many translations of in Wiktionary.
Spanish
hablar in the Spanish Wiktionary
Starting with the term helps emphasize the link, and reduces the verbiage. —Michael Z. 17:53, 13 April 2008 (UTC)
Hmm, thinking about it - do we need "Wiktionary" in there, seeing as we are all one big project. Could we get away with "Spanish entry forhablar"? I like the idea of having the Foriegn language in the title, or even in the heading, as this makes it clear what to expect from the link. I think it would make it hard to include "In WIktionary" in the foreign language, as this would need different layout for each language. Perhaps the title could just be "español: hablar" or something that requires little effort ;). Incidentally please feel free to bugfix/experiment with the javascript so long as you bear in mind that it is being used by an unknown number of other people. I'll give your second idea a go now. Conrad.Irwin 18:06, 13 April 2008 (UTC)
I think there needs to be a reference to the other project. We are already looking at a Spanish word's entry in English Wiktionary, and "Spanish entry" doesn't make it perfectly clear how the context will change when we click.
"Español: hablar" looks fine to me, but does this construction work in every language? Probably, but some of them may have to be bundled with whatever passes for a colon. —Michael Z. 18:15, 13 April 2008 (UTC)
I have no idea, I think we can leave it as a colon unless anyone else has an opinion. Conrad.Irwin 20:10, 13 April 2008 (UTC)
Spanish
hablar in the Spanish Wiktionary
I prefer having just the term linked, which becomes the self-evident subject of the note. When the whole note is linked, there is no differentiation, and it looks like a subtitle for the heading. On the other hand, this may be a problem for short words like i. —Michael Z. 19:55, 13 April 2008 (UTC)
Yes, I was looking at one, and it didn't seem to be prominent enough, though I agree that it is better with only the word linked for anything longer. Conrad.Irwin 20:10, 13 April 2008 (UTC)

Weird: the link appears, but is gone if I refresh a page in my browser (Safari 3.1/Mac). —Michael Z. 19:48, 13 April 2008 (UTC)

Yes, this is partly caused by bugzilla:12773, and partly because it is including an external dependancy that may or may not have downloaded before the script runs, I'll have a think about the best way to fix this. Conrad.Irwin 20:10, 13 April 2008 (UTC)

The action word for a link is a good idea, but the verb is look up, not lookup.

Not sure if I like the indentation breaking up the left margin, though. —Michael Z. 16:02, 14 April 2008 (UTC)

Compounds and grammar

1. The article forgo describes this English verb's grammar as {{en-verb|forgoes|forgoing|forwent|forgone}}. Very similar patterns appear in the articles go and forego, and likely a few dozen other compounds ending in -go. Isn't it a waste of energy to repeat the pattern go/goes/going/went/gone in so many places? Shouldn't a reference to go be enough for the grammar?

2. In the many years of Wiktionary, many people must already have asked this question and so I would have thought that there should be a page about this question and its answer somewhere in the Wiktionary: or Help: namespaces. Is there? I can't seem to find one.

3. It can be argued that this is a minor problem for the English language, but in German and Swedish where compounds are so much more common and inflexion patterns are more complicated, the question takes on a completely different dimension. Still, new methods are seldom introduced in sv.wikt or de.wikt unless they already exist in en.wikt. It appears sv:föregå and de:vergehen use the same methods as forego, the full grammar pattern is repeated in every article for every compound. --LA2 15:46, 13 April 2008 (UTC)

Well if it's the wikitext duplication that bothers you, we could create a new special template for this group, allowing go to use {{en-verb-go}}, undergo to use {{en-verb-go|under}}, etc.; but I don't see why we wouldn't want the displayed version to show all the forms. —RuakhTALK 15:54, 13 April 2008 (UTC)
Personally, I think that we can be of most use to our audience if we show the inflected forms for each compound. Unlike print dictionaries we are not limited by space. Thryduulf 15:56, 13 April 2008 (UTC)
My primary concern is with question 2: Where has this been discussed before? A special template for "go" could be one solution, but for German and Swedish it would mean hundreds of thousands of templates. Saving space is not one of our needs, of course, but adding new compounds is a problem if you have to repeat the grammar pattern each time, without the ability to use a template. I don't expect a solution and consensus to appear in ten minutes, but I was expecting this question to have been discussed before. --LA2 16:23, 13 April 2008 (UTC)
For compounds with regular inflection, then perhaps a compound noun template(s) would be possible, depending on the grammar of the language in question of course. I can't see a way of automating irregular conjugation unfortunately. If the person defining the compound terms doesn't want to enter the inflections then they can use perhaps the {{infl}} template and categorise it somewhere where others can find the entry to add them later. Thryduulf 16:40, 13 April 2008 (UTC)
A reference back to a main entry isn't always satisfactory. Often, compounds follow the inflection of the parent verb, but sometimes they do not. For example, Latin faciō has an irregular passive voice conjugation. Some of the compounds from faciō have this same irregularity in the passive (e.g. patefaciō), but others do not (e.g. cōnficiō). It's therefore better if each entry contains full information of its own. --EncycloPetey 16:52, 13 April 2008 (UTC)
Well, the inflection line doesn't normally show all the inflected forms of a word, just the principal parts (whatever those are considered to be for a given language). In highly inflected languages, the other noteworthy forms are placed in a separate section (Conjugation/Declension/Inflection). But the idea of shunting this information off to a single "core" entry for each group of words is a non-starter, for reasons that are pretty basic to the philosophy of Wiktionary. Entries should stand on their own as comprehensive treatments of the word or form in question; the user should never be required to go to another page in order to get basic information on inflectional (or any other) properties of a word. Even for languages with quite regular inflectional patterns, like Latin, the principal parts for each verb are given in the inflection line (e.g. video#Latin). Users could be directed to an appendix to figure out this information for themselves, but that just isn't the way we do things here. I'm not sure if your specific proposal has been discussed before, but I don't think it would ever fly. -- Visviva 16:41, 13 April 2008 (UTC)
When I search the Wiktionary: and Help: namespaces for "basic to the philosophy of Wiktionary. Entries should stand on their own, I get lots of hits in the Beer parlour, but no obvious policy page. Where should I look? --LA2 16:48, 13 April 2008 (UTC)

edit conflict:

FWIW, Longman's DCE shows the past as "rare".
As to question 1, In the case of compounds not separated by a space or hyphen, I would think that we would want to show the inflection in each entry because it is not necessarily obvious that one could click on part of the blue headword in the inflection line to determine inflection. I use the infl tmplt to suppress inflections that would result from using {{en-verb}} for compound words separated by a space or hyphen, phrasal verbs, idioms, and other phrases that I put under the verb PoS header. I inflect compounds that are not separated. I justify the different treatment by saying that it ought to be obvious to even a casual user that, for such entries, if the inflection is needed, one would click on the word involved. "Obviousness" is in the eye of the beholder, of course, so this is not entirely satisfactory. HTH.
As to question 2, unfortunately you have to use Google-type search skills in the various spaces (talk, wiktionary, WT, Appendix) to try to exhume old discussions of this and sometimes even a description of current practice. We don't seem to have that many policy systematizers active here and many practices have not gotten beyond disagreements. For example, there is a school of thought that would object to my suppression of inflection of phrases and others who would disagree with the very idea of trying to assign "real" parts of speech to idioms. DCDuring TALK 16:59, 13 April 2008 (UTC)

I think there is some confusion here, the inflection line contains a few key forms of a verb, for very regular forms we do have templates (like the infamous {{en-verb}} which can handle far more than just vanilla regular English verbs) but for the less regular forms the effort of creating, cataloging and looking up these templates takes much longer than typing 30 characters. For the full conjugation of inflected languages we do have templates (see the list of French conjugation tables that use the standard patter) because the effort saved is very large and overcomes the maintenance costs. I don't think anyone has asked about this before because what we do does actually make sense. As for point three... the whole point of having seperate Wiktionaries is because different solutions work better for different languages readers and editors. I think it would be much easier for a newbie (and experienced editors) to find and use (or read and understand) {{el-verb|egkataleípo|εγκατέλειψα|egkatéleipsa}} than, for example {{el-verb-λειπω|εγκατ|α|έ|egkat|a|é}} even though it is shorter (see εγκαταλείπω) [I admint this is somewhat contrived, but the same principle holds for smaller idiosyncrasies in many places]. Conrad.Irwin 21:49, 13 April 2008 (UTC)

Scientific names and Taxonomy headers

Are these standard headers? If not, how should I correct them? They usually contain Latin taxonomy specifications for the entry. Example: bar-winged rail. --Panda10 17:15, 13 April 2008 (UTC)

  • I prefer to incorporate the taxonomic name in the definition. That is what I have done with the example (as well as giving it a proper definition, and linking to the correct Wikipedia article). SemperBlotto 17:23, 13 April 2008 (UTC)
I would do as SB does and:
  1. add a link to wikispecies ({{wikispecies}} or {{specieslite}}}. Though they have 130K+ articles, they might not have the one you want.
  2. look for picture in wikicommons. If there are multiple ones, I usually put in a commonslite link as well as the most helpful or interesting image.
  3. make these links external
  4. make in-line links to the individual words of the two-part species name, on the theory that we should be happy if we have the numerous component words for these names in Latin or Translingual and not try to keep up with the complexities and changes of taxonomic classifications and offer at best terse uniformative entries.
I think it should be clear that we do not have the ability to provide comprehensive coverage of compond names in taxonomy (2- or 3-part) and chemistry (n-part). If we could provide a reasonable mapping from vernacular names to taxonomic names as well as definitions of the parts of taxonomic names, we would be doing things that WP and Wikispecies do not and are not likely to unless we fail. DCDuring TALK 17:49, 13 April 2008 (UTC)
People have been using L4 headers Scientific names and Taxonomic names. While it may be useful to incorporate one or two into the definition, this pattern doesn't hold up well when there are several, or dozens. AF has been treating "Scientific names" as recognized (understand this is not an application of policy, of which there is none!), and "Taxonomic names" as unknown; I think we should treat "Taxonomic names" as a recognized L4 header, and convert "Scientific names" (which is sort of ambiguous, could be any kind of "scientific" naming ;-) to "Taxonomic names" Robert Ullmann 22:30, 14 April 2008 (UTC)
It would be very desirable for us to have maps from vernacular names (in, say, English) to the one or more species or genera that may be appropriate. These names are the key to a vast amount of good information that is not necessariy accessible from the vernacular name. Providing this key is something useful to users and not done well by Wikispecies or even Wikipedia. It would pay to make it as complete as we can. But if the list of species would become overwhelming than we can limit ourselves to the genus names.
In the case of dozens of scientific names you must be referring to cases where there are numerous derived or related species names appearing under a genus name that ought to be Translingual, that is an illiustration of the problem. These lists or often not complete, use obsolete or disputed names, or are wrong in other ways. I doubt that we will succeed in recruiting many taxonomists to maintain these listings. We also lack the specific structure that Wikispecies has for this kind of information and the breadth of info that WP offers.
I would think that we would want to discourage the creation of new entries with those headings and with the extensive derived and related terms lists and exploit Wikispecies' and WP's work. Linguistically, the language of the species names includes vast numbers that cannot be said to have been adopted into English, but are essentially Translingual, with components that are Latin (or latinized Ancient Greek). We can provide a usual service to WikiSpecies and to WP by handling the linguistic aspects of these names (etymology, morphology, inflection) as well as association with vernacular names. DCDuring TALK 23:30, 14 April 2008 (UTC)

(after edit conflit)

I agree that "Taxonomic names" is better than "Scientific names", but I'm not certain about either of them. Are these not Translingual proper nouns? e.g currently we have

Categories also seem all over the place, with entries in at least: category:Taxonomic names, Category:Taxonomy, category:Zoology, Category:Botany, category:Entomology and Category:Biology. Thryduulf 23:44, 14 April 2008 (UTC)

Lynx ought not be Translingual and should be lower case. I believe that the taxonomic names that appear as Translingual should all have at least the first letter of the first word be capitalized and should be proper nouns. The second part of species names should be entered as Latin, usually/always(?) an adjective, usually/always(?) uncapitalized. Some Latin-derived species and genus names have become part of English and often follow English rather than Latin pluralization. I rely on Stearns' Botanical Latin as a basic source, but haven't finished reading it yet. DCDuring TALK 00:02, 15 April 2008 (UTC)
Note that lynx exists as an English common noun for the wild cats, while Lynx is a translingual proper noun entry for the taxonomic genus.
I don't understand why the second part of a taxonomic name should be a different language to the first part? Just because it has a Latin etymology, and has a Latin homograph doesn't mean it isn't also translingual - particularly if it doesn't follow the Latin pluralisation rules. Thryduulf 00:10, 15 April 2008 (UTC)
I was deferring to my understanding of EP thoughts on the subject. The theory might be that it doesn't become Translingual until it is an officially recognized name. Until that time it is New Latin, a variety of Latin.
You are so right about Lynx. Sorry. "cycad" seems right as English, derived from Cycas, a genus. Each item would have to be checked for correctness.
If it follows English pluralizaton, then it would certainly warrant an English entry. If it appears in non-technical English documents, it might warrant an English entry, but the Translingual really should be sufficient.
As to categories: "Taxonomic names" serves to distinguish these Translingual from others; the discipline contexts/categories seem to be a shortcut for animal/plant/bacteria/mold/fungus/virus distinctions. cat:Taxonomy might be useful for New Latin terms used in taxonomy. That's my take, but based only on limited ill-remembered anecdotal experience and not systematic analysis. DCDuring TALK 00:43, 15 April 2008 (UTC)

My reading of the convenience samples of entries:

OK, I think
Abutilon is a redirect to abutilon, rather than being a Translingual proper noun, reducing the ratio of L2/L3-correct entries to 8 / 15. DCDuring TALK 12:36, 15 April 2008 (UTC)
Not OK
  • Ponginae: English proper noun => Translingual proper noun ("TPN")
  • chironomidae: English noun => u.c. TPN
  • Insecta: English proper noun => TPN
  • platanifolia: Translingual proper noun => Latin adj
  • accipitres: English noun => Latin obsolete taxonomic name
  • mongolica: Translingual noun => Latin adjective

All of them could stand a link to Wikispecies. Some don't even have WP links. Etymology would be fairly straightforward using a Classical or New Latin suffix and a Latin or latinized Greek head. DCDuring TALK 01:16, 15 April 2008 (UTC)

Just noting that the conversation so far looks all good to me (speaking as a trained botanist with a specialty in systematics). However, we might want to consider subdividing the Category:Taxonomic names (or whatever we call it). We probably ought to subdivide out (1) the binomials used for species, (2) names of genera, and (3) higher-level taxa. Putting the whole shebang into a single category seems, well... unhelpful. Particularly so since the lexical use and structure will differ. Species names are binomials, including a functional noun and descriptor. Genera are singular nouns. Higher-level taxa are often constructed as plural nouns, descriptions, or substantive adjectives derived from the names of genera or from characteristics of the group. --EncycloPetey 02:16, 15 April 2008 (UTC)

Regarding the categories, my initial thoughts are

  • category:Taxonomy should be used for taxonomic terms, e.g. genera, not individual species, etc names (there is already a note to this effect at category:Taxonomy)
  • category:Taxonomic names should be a sub-cat of category:Taxonomy (and probably others as well) and either contain all the taxonomic names or be a parent cat to more specific categories (EP please could you suggest appropriate category and context labels names).
  • Taxonomic name entries should also be in categories such as category:Entomology where appropriate but should also be categorised above.
  • Individual species, genera, family and order names, etc should not appear in Category:Biology. Thryduulf 11:25, 15 April 2008 (UTC)
The last suggestion might give us trouble in the cases where we do not have a category for the class of life or near-life (viruses?) under discussion or where the person making the entry does not have specific-enough knowledge. The less precise tag would help users in the meantime by providing a clue as where to look for more information.
I think that there is a strong case for a specialized rfc tag ({{rfc-taxon}} ?) for this kind of entry. The text field can convey information about the issues that previous editor had not yet resolved, with more detail always possible in the Talk page.
I still feel that we are wasting our time insofar as we are duplicating work being done by WikiSpecies. The entries already are misusing the Related terms heading. We should have the linguistic relationships (etymological, morphological, "Derived terms", "Descendants"). Maintaining the hierarchy is for Wikispecies. Perhaps we need templates that read from Wikispecies and provide their best information on next higher element and next lower elements in the taxonomic tree.
To me the vernacular name to taxonomic name mapping is a matter of great importance and value both to normal users and students of biological fields and one only partially addressed even by, say, the USDA plant database.
The use of the "Scientific name" heading is particularly troubling to me because it is so ambiguous. Is it supposed to be a synonym? A hypernym? A hyponym? A translation into Translingual ? DCDuring TALK 12:36, 15 April 2008 (UTC)
I disagree with point 3 above. We should not add all the scientific names of insects to Category:Entomology because they will overwhelm the other terms in the list. Besides, scientific names of insects are not used solely in an entomological context; they may be used when discussing evolution, ecology, botany, agriculture, strict taxonomy, etc. A Category:Taxonomic names of insects is possible, but that opens up the possibility of thousands of other similar categories that I certainly wouldn't want to have to maintain.
I agree somewhat with DCDuring. We don't want to be duplicating work that is covered on Wikispeices. However, Wikispecies does not cover the etymological origin of names and name components. That information falls under our mandate, and it is useful to be able to look up such things. Also, I would rather see the scientific name included in a definition, when it is added to a common name entry, rather than under a separate section header. --EncycloPetey 18:03, 15 April 2008 (UTC)
I absolutely agree that we need to do what Wikispecies does not do: coverage of the language that they use, Etymology, etc. I had tried to say that somewhere above. The only taxonomic entries that I don't think are worthwhile for us are the two-part (or 3-part) species names except in cases where the usage is common (Homo sapiens being the most common of these).
I also think we try to figure out how to get the latest information on taxonomic-tree navigaton (one level up or down) from Wikispecies at run-time. It might be that we can take advantage of the work Wikispecies has done to obviate the need any kind of category structure for taxonomic names at all. Perhaps we could read from them what kingdom (?) a given taxon is in. To me the question is whether our integration with Wikispecies is at run-time (ie, on demand) or periodically. If run-time integration can give adequate performance and transparency from a user perspective and is not too difficult technically, it would be highly desirable. But periodic (weekly, monthly, quarterly) updates (from Wikispecies (and to ???)) could be good enough. DCDuring TALK 19:30, 15 April 2008 (UTC)
Run-time integration would just substitue one problem for another. Our definitions include all major uses of a term, and not just the singlemost current one. So, while Wikispecies uses the current APGII circumscription of Liliaceae, our entry should have at least three definitions, each based on one of the common major senses meant in scientific literature. We also experience the problem of more than one taxon sharing a name. Some of Wikispecies' pages have a parenthetical inclusion to disambiguate names assigned to zoological and botanical groups. So, a plant and animal, or an animal and alga, may share the same name. Each will have its own kingdom, included taxa, etc. We also include obsolete terms among our entries that have no corresponding page on Wikispecies because the name isn't used anymore. We also allow for any taxonomic name at any rank, whereas Wikispecies sometimes skips levels that aren't used often (like infraorder). Wikispecies also has huge holes in its coverage. Run-time integration is not a pipe dream, but at the present we have nowhere near the readiness to implement something like that. --EncycloPetey 21:47, 16 April 2008 (UTC)

British != UK

I use {{British}}, especially when a reference says so. This renders as (UK), and adds category:UK, but the United Kingdom isn't Britain. It would seem more natural to refer to the language as used by people from a place, rather than within the borders of a polity. —Michael Z. 18:38, 14 April 2008 (UTC)

One major us of the UK tag is to discriminate UK-only from all-English or US usage. And linguistic place in this case is roughly equal to polity. As I understand it, UK = Great Britain = Northern Ireland + Britain; Britain = England + Wales + Scotland. I'm not too sure about Cornwall and Channel Islands. The UK tag is intended to include those places covered by the school system and English-language media located there and relates to contemporary usage that more or less covers the whole place. The languages and dialects that exist there are supposed to be covered by other tags, some of them somewhat controversial. I'm not sure how this set of tags can be improved. I'm also not sure how the English spoken in Ireland insofar as it differs from UK English fits into the tag system. DCDuring TALK 20:35, 14 April 2008 (UTC)
Actually, Great Britain refers to England, Scotland and Wales as a unit, and corresponds to the island of Britain (Cornwall is part of England, and I'm not sure about all of the little islands either). The UK is the United Kingdom of Great Britain and Northern Ireland, comprising Britain plus one sixth of Ireland. —Michael Z. 22:01, 14 April 2008 (UTC)
It is overly specific. The designation "UK" means that Northern Ireland is included, but the Republic of Ireland is not, and implies that the English of the former is closer to the language of the rest of the UK than to the English of the latter. But if an editor really means to be this specific, he will never use this tag, but a more specific one, e.g. {{Ireland}}, {{Northern Ireland}}, {{Ulster}}, etc.
It doesn't appear to correspond to any variety of the language, specific or general. We have British English and Wikipedia has an article on w:British English. My paper dictionary uses Brit. for "chiefly in British English..." It appears that dictionary.com, the online etymology dictionary, AHD and M-W also use "British". Is there any precedent in published dictionaries or in linguistics for a "United Kingdom English" or "UK English?"
As far as I can tell, the phrase "UK English" only appears when the country is juxtaposed with English, for example English lessons for students visiting the UK.
In the UK could be useful as a context label (not a language label) for e.g., institutions in the United Kingdom. For example, SAS means Special Air Service (in the UK) and Scandinavian Airlines System (in Scandinavia), but the abbreviation is used for both these institutions in British, American, and every other variety of English.
I think the tag text should probably be changed to Britain or British. —Michael Z. 21:24, 14 April 2008 (UTC)

And Commonwealth English (most of the world) goes where? Robert Ullmann 22:32, 14 April 2008 (UTC)

Why should it go somewhere?
I hadn't given it any thought, but there isn't really a language variety such as "Commonwealth English", especially from the point of view of labelling individual words. Wikipedia's w:English in the Commonwealth of Nations is a list of local varieties, and see also w:Regional accents of English.
Much of the language and usage is the same as its source, British English. Individual regions have developed their own features, but most of these will fall under one or two of Template:Australia, Template:New Zealand, Template:Hong Kong, Template:India etc.
Template:Canada stands out, because it is considered a category of "American English" or General American, has inherited many "Britishisms", and generally accepts much language from either. {{American English}} renders as (US), so this just means that maybe 4/5 of category:US will have to be moved to Template:North America.
Template:South Africa and Template:Philippines may be special cases of their own. —Michael Z. 23:53, 14 April 2008 (UTC)
The idea of "Commonwealth English" only applies to spellings. When we mark pronunciations, we must identify particular accents or dialects: AusE, RP, etc. --EncycloPetey 02:04, 15 April 2008 (UTC)
I hadn't even thought of pronunciations, only spelling and vocabulary. I suspect that (Commonwealth) is a synonym for (British, Canadian), but is in danger of being used to mark up terms which are really British and not Canadian
[A review shows that Commonwealth has some issues. I'll start a new topic at #Commonwealth, below.]
Anyway, what do you think of changing the text from UK to British? —Michael Z. 02:27, 15 April 2008 (UTC)
For spelling, I think it's fine, but for pronunciations it would be inappropriate. --EncycloPetey 03:35, 15 April 2008 (UTC)
Oh, man, right. I never do listen to the pronunciations. Is it a problem because there are transcribed or recorded pronunciations with Irish accents marked up as UK?Michael Z. 03:42, 15 April 2008 (UTC)
Yes, that counts as an error. Any pronunciation specifically transcribed or recorded for an Irish English accent should be labeled with (Ireland), or something more specific like (Ulster), where appropriate. We only use UK when (1) the IPA/enPR is general to most areas of the UK, or (2) the editor isn't sure which UK accent is represented. I tend to stick with (Received Pronunciation) because I know what it sounds like (from years of watching BBC programs), but other UK regional accents are possible. --EncycloPetey 17:56, 15 April 2008 (UTC)
Then it sounds like labelling the UK recordings as British would actually be more accurate, since Irish has its own label. —Michael Z. 18:09, 15 April 2008 (UTC)
No, because "British" is not a current geographic location, unless you are assuming that the people of northern Ireland sound more like Londoners than like the rest of English-speakers in Ireland. We prefer geographic labels when a specific accent is not used, and the highest geographic label I've ssen anyone use is national. --EncycloPetey 21:37, 16 April 2008 (UTC)
I'm confused. Am I getting something wrong?
UK includes Northern Ireland, so it includes English, Scottish, and Irish accents.
Britain or Great Britain are both current geographic locations which correspond to the big island, so either one includes English and Scottish. Since Britain includes many diverse accents, we prefer RP or mainstream English, and use other labels for regional speech including Scottish, Cockney, Cornish, etc.
The latter also corresponds to other dictionaries' use of the term to label senses, spellings, and transcribed pronunciation. Michael Z. 2008-04-16 21:47 Z
Sorry, I should have said "Britain is not currently a nation". We tend to prefer country names as the broadest item in {{a}}, so UK would be preferred if a specific regional accent cannot be given. "British" doesn't really improve specificity, since the island of Britain includes myriad English dialects and accents. The only way that "British" could ever be truly useful is if it is known that the pronunciation is identical throughout England, Scotland, and Wales, but is different in northern Ireland. There is little likelihood of that, and less that it would be known for certain. Eevn then, the question would be open in the mind of the reader about what was intended. --EncycloPetey 02:58, 17 April 2008 (UTC)

Am I the only one...

...who hates the [show]/[hide] buttons being on the left side of inflection tables? I miss the old ways! — [ ric ] opiaterein — 21:38, 14 April 2008 (UTC)

No, I also prefer them on the right. (Thus starts a quick opinion poll) Conrad.Irwin 21:41, 14 April 2008 (UTC)
Right, please. "show ▼" should not look like a subheading. —Michael Z. 22:05, 14 April 2008 (UTC)
Both., with ability to choose whichever you like via prefs. (left by default as I have heard several complaints about people not even knowing they could expand them, it would be nice if the entire div were clickable for expansion...) - TheDaveRoss 22:07, 14 April 2008 (UTC)
I too prefer it on the right, as my mouse is normally on the right hand side of the page for edit links / scroll bars (when using a trackpad it is even more annoying than a proper mouse). I do like the "▼" symbol though. Based on TDR's comment above, I think a WT:PREFS setting for left or right would be ideal. Thryduulf 23:24, 14 April 2008 (UTC)
Right. In principle I prefer the left, but "show" and "hide" come out to slightly different widths, which means that the header moves a bit in a way I don't like. (Yes, I'm picky. What else is new?) —RuakhTALK 23:29, 14 April 2008 (UTC)
Right as well. (Functionality to right, layout to left) - Amgine/talk 23:33, 14 April 2008 (UTC)
Could not a "left" option place the link just to the right of the subheading, instead of next to the right margin? —Michael Z. 23:36, 14 April 2008 (UTC)
I would expect that non-editing users would be better off with the show/hide more visible (almost certainly on the left) and editing users might well prefer them on the right, where their cursor often is. I don't see why the preferences of those of us here (almost by definition editing users) should determine what non-editing users see and use. The solution of having editing users be able to select the look in WT:PREFS or, better, "my preferences" would seem to be the best of both worlds. DCDuring TALK 23:46, 14 April 2008 (UTC)
The preferences are pretty much unanimous, so wondering what hypothetical non-voters might find useful should not be an issue. A lack of data should not be used to support the alternative. I too prefer them on the right. That is where the scroll bar usually is for a broswer window, so that is where I want the collapsing/expanding arrows. Putting the arrows all the way on the left side means more unnecessary hand and mouse movement. That will be true regardless of whether the person is editing or simply reading. --EncycloPetey 03:47, 15 April 2008 (UTC)

Reveal arrows are a common interface element in operating systems. On the Mac, they are a grey triangle facing right, just to the left of the heading/text, which rotates to point down when opened. they don't require brackets or "show/hide" text. Doesn't Windows use something like an elbow-down arrow the same way? I have no idea about Linuxes

Why not just copy what readers already expect to see, instead of inventing a new interface? It may even be possible to use MSIE conditionals to give a separate presentation for many Windows users. —Michael Z. 00:15, 15 April 2008 (UTC)

w:Disclosure widget, and Mac example (PNG)Michael Z. 00:30, 15 April 2008 (UTC)
That is a good point, I think it was mentioned in the previous discussion about these - but not as forcefully. Windows, iirc, uses a [+] symbol. This code is of course copied from Wikipedia, an so it might be better to match what they do rather than try and get back to operating system level (which would not be how people expect websites to act). Maybe the sideways arrow would be better if the link is on the left, though it doesn't make sense if the link is on the right. Conrad.Irwin 01:02, 15 April 2008 (UTC)
I have implemented the idea for making the whold NavHead (grey bit) clickable, and so this is yet another option to be considered, I am fairly ambivalent as to whether it is used, on the one side it is a bigger target to hit, on the other it is slightly less obvious what is going on. It does of course solve the problem of experienced contributors having the mouse on the wrong side of the screen ;). (Note that this method removes the functionality from the [show] button making it just a status indicator). Conrad.Irwin 01:02, 15 April 2008 (UTC)
Forgot to mention... The idea of having the link after the text was trailed and dismissed - though that was before the whole NavHead could be clicked. <techy note>Also, there is a yucky hack repeated all over the place - "    " should NOT appear a the start of <div class="NavHead">, the necessary space is now added by CSS. See this edit for what needs fixing</techy note>
Additional notes: hard refresh if it isn't working for you yet. Also, you can put your own "show/hide" anywhere you want by setting the .NavToggle{ float: [left|right]; };" in your monobook. You can hide it altogether with display:none;. - TheDaveRoss 01:20, 15 April 2008 (UTC)
Having the whole bar clickable is a slightly weird—we are not used to things popping open unless we click on a widget. (The exception is labels for form widgets, but they only change cursor focus.) Unexpected behaviour is not normally desirable, especially if it can change the user's context, say by unexpectedly opening a large translations section, and pushing the text they were reading down out of the window.
Frankly, we already have a very prominent and self-explanatory click target ([show▼]), so I don't think it improves things to make an invisible hot-zone about 15 times bigger.
A quibble: it also activates if I click and drag to select text in the title, and behaves a bit strangely if I double-click or double-click and drag. —Michael Z. 03:37, 15 April 2008 (UTC)

All changes have been reverted due to lack of support here. I am happy to play around with other ideas, but the current situation seems to have been annoying a lot of people ;) Conrad.Irwin 11:17, 15 April 2008 (UTC)

Thank you for your edits Conrad Irwin. I'll continue on Wiktionary:Beer parlour#"Show" tags. Best regards Rhanyeia 14:21, 15 April 2008 (UTC)
Please make it a PREF for those of us who aren't dumb like it that way. - TheDaveRoss 20:13, 15 April 2008 (UTC)

To Wiktionarians who frequent WT:RFD and WT:RFV

First of all, thank you to the people who put in a lot of time consuming and not always interesting leg work to question, list, verify and clean up entries.

Second, I have read something like 200 RFV and RFD conversations over the past few days, and many more than that in the past, and I have a wishlist or set of tips that I think will make following discussions and cleaning up after discussions a lot easier. This will hopefully cut down on the time it takes to archive, and make less tedious the archiving tasks causing more people to be willing to do them.

  1. Please make your "vote" clear. Right at the top of whatever statement you are making is best, bold "keep", "delete", "merge", "cited" statements make it much easier to see quickly, without rereading a 30 statement discussion what the gist of the content was. Make any qualifying statements, blanket statements, policy statements or general statements after the vote, so those archiving later can skip the information not directly relating to the status of the page.
  2. Please close discussions which are completed. The best way to do this involves <s>striking through</s> the title, noting the changes which were made to the page/sense/entry at the bottom of the discussion, and removing the tags from the target page. If this process is followed it is much easier to archive the page, it doesn't even have to be done by hand.
  3. Please format your comments so they are easily delineated from other's comments. If everyone's comments are at the same level it is much harder to figure out who said what, meaning we have to reread the entire discussion. If comments are on different levels it is much easier to weigh the different opinions.

Some of these requests may seem like laziness, not wanting to have to read the entire discussion, but I think if we all try to be more clear in our intentions on these discussion pages it will be less painful for people to archive old discussions and they will be archived more often. The second benefit for clarity in discussions is that they are much more useful down the road, we archive most substantial discussions so that people can read them in the future to understand why decisions were made, and the clearer our statements the more readily future readers can utilize them. Thanks. - TheDaveRoss 23:19, 14 April 2008 (UTC)

+1. —RuakhTALK 23:29, 14 April 2008 (UTC)
+1 and thanks to TDR for the colossal effort to close and archive. DCDuring TALK 23:46, 14 April 2008 (UTC)
I appreciate these thoughts not only to aid helpful archivers like DaveRoss, but to promote more closure on the RFD and RFV pages. Many of the discussions there are left ambiguous and unanswered, leaving a difficult and potentially controversial task in the hands of the archiver. The burden of accomplishing consensus and actually altering/deleting articles in the RFV/RFD process should fall to those active participants in the discussion and not be left to interpret by someone archiving 100 entries. -- Thisis0 23:40, 14 April 2008 (UTC)
Suggestion: Can we create a brightly-colored icon that can be placed on discussions which ought to have been closed, but for which the outcome is still unclear? Such a bright icon, added to discussions that have languished for a month (or more) might help draw attention to users who can help. --EncycloPetey 01:59, 15 April 2008 (UTC)
something like:
Question book magnify2.svg Input needed
This discussion needs further input in order to be successfully closed. Please take a look!
(perhaps with less obnoxious colors) - TheDaveRoss 02:16, 15 April 2008 (UTC)
I think the obnoxious colors will achieve the desired effect. In essence it says, "You must attend to this issue to make the ugly banner go away." --EncycloPetey 03:34, 15 April 2008 (UTC)
I think that's a good idea. How about setting it up as [[template:more input needed}} or template:unfinished discussion? Thryduulf 11:12, 15 April 2008 (UTC)
I've gone for the slightly snappier, imperative. {{look}}, which adds pages to Category:Input needed (though I suspect this feature is fairly useless) Conrad.Irwin 11:37, 15 April 2008 (UTC)
I was under the impression that these would go in the discussions on RFV/D, which would list RFV/D in that category but not much else :). - TheDaveRoss 20:09, 15 April 2008 (UTC)
That was also my understanding. We wanted to draw attention to neglected discussion within the RFD/RFV pages with an attention-grabbing template that lets folks know the discussion needs resolution. --EncycloPetey 21:33, 16 April 2008 (UTC)
As a trial I am going to start plunking this into old discussions without resolution, we will see how it goes. - TheDaveRoss 13:59, 19 April 2008 (UTC)

IPA

Isn't the IPA for Wiktionary wrong? (the last letter) —This unsigned comment was added by Fshfsh (talkcontribs) at 22:50, 14 April 2008.

No, not for some pronunciations. --EncycloPetey 03:56, 15 April 2008 (UTC)
The IPA is godly. I might start praying to it.
But if you're talking about the thing in the upper-left corner of every page, yes. It is. lol — [ ric ] opiaterein — 12:05, 15 April 2008 (UTC)

Commonwealth

{{Commonwealth}} includes Canada, but much of Canadian English spelling, vocabulary, and pronunciation differs.

Specifically, the following entries have senses or spellings categorized as Template:Commonwealth, but are little-used in Canada: alphabetise, alphabetised, appal, archæological, archæologist, brew, discretise, editorialising, fanny, first floor, heads of agreement, homoeomorphic, homoeomorphism, hospitalisation, phonograph, point, pretence, seagulling, superfund, tea cosy.

I notice some are marked up (Commonwealth, except Canada), but they still end up incorrectly included in category:Commonwealth English. Is it okay to mark these as British (UK) instead? (Does category:UK represent UK dialect of English, or UK regional context, or both?) Is there a better way to mark them up? —Michael Z. 04:07, 15 April 2008 (UTC)

If Canada is the only exception, then I think it makes sense for these to remain in the Commonwealth category. Otherwise they would have to be tagged {{UK|Australia|New Zealand|South Africa|India}} at the very least (and that would still leave out a bunch). It might be interesting to have a separate template and subcategory for Commonwealth spellings/words which are not used in Canada. -- Visviva 05:09, 20 April 2008 (UTC)
Perhaps Canada is the most common exception, since its language is in many ways a part of American English, but a regionalism from any of the Commonwealth countries is an exception.
I think paper dictionaries all refer to those as British spellings, and don't use the term Commonwealth for this reason. Michael Z. 2008-04-23 01:15 Z

(Roman) numerals: which ones to include?

Recently 70.55.85.225 (talk) has been adding content to a lot of non-standard roman numerals such as mmmmm and LLM. Some of these edits are not good, since inconsistent and wrong, but some of it is usable as well, and he gave a reference on my talk page which defends them, although nothing about double subtractions can be found on w:Roman numerals.

However, I am thinkin that most if not all of these entries are sum of parts and therefore do not deserve a lemma. Of course we want C, M, and the basic ones, just like we have entries for digits, but not for numbers (or are supposed to). But as I look at it, these are a mess as well. 09 exist (though with very different levels of detail), but 1019 as well, whereas 20 and on are redirects to the corresponding spelt-out pages. Someone really needs to do some cleanup here! See Category:Arabic numerals. (Note that entries such as 180 and 1337 have other reasons of existence, though there also, I’d propose to remove the ‘Translingual’ section.)

What do others think? H. (talk) 14:05, 15 April 2008 (UTC)

I don't mind including SoP entries - particularly in cases like this where it isn't necessarily obvious what parts it is made up of, however I wouldn't ask anyone to go round and create these. It seems to me that it is better to return something rather than nothing for cases like this, and if these words are deemed not necessary for inclusion I would like to replace them with {{only in|{{pedia|Roman Numerals}}}} or something useful (see XVII) perhaps linking to an appendix. As I said though, I would prefer them to have full entries if someone wants to create them. Conrad.Irwin 15:29, 15 April 2008 (UTC)

"Show" tags

This conversation has started on Wiktionary:Beer parlour#Translation bars and continued on Wiktionary:Beer parlour#Am I the only one.... The "show" tags of translation bars are now far on the right, and they were tested on the left. This led to a lot of comments and I'll continue here by commenting some of them:

"'show ▼' should not look like a subheading"   This may be true, and what I'd maybe be interested in trying next would be the "show" under the title text, but something else too as I'll write in the end of this message.

"Functionality to right, layout to left"    I think this is part of the "layout" since it includes important mainspace content. Those who don't edit are not likely to pay attention to a small "show" on the right and they don't even know they are supposed to be looking for something.

"...right. That is where the scroll bar usually is for a broswer window, so that is where I want the collapsing/expanding arrows. Putting the arrows all the way on the left side means more unnecessary hand and mouse movement. That will be true regardless of whether the person is editing or simply reading."    I haven't even thought about the scroll bar. Most softwares I use have most of the things on the left side (except the scroll bar), and because English (and many other language) text naturally starts from the left, that's where people are looking at. And there are things here also on the left side, like "save page" or "search". How about if there was a clickable area all the way under the title text, with ▼ both on the left and on the right, and "show" on the left? Best regards Rhanyeia 14:18, 15 April 2008 (UTC)

The fundamental issue is that we (those participating in community forums) are not a good model for "normal" non-editing users. As long as this is a construction site organized for the the convenience of the craftsmen rather than as a building to be used by normals while being renovated and maintained, it will not be possible to build any enthusiasm for efforts focused on the needs of normals. Even the limited amount of free-text Feedback seems to have been considered an annoyance. The lack of interest in our page-view statistics and in steps to improve our Google visibility are troubling to me. DCDuring TALK 14:57, 15 April 2008 (UTC)
Well, any wiki which tries to orient itself to the needs of non-editing users at the expense of the needs of editors is not going to get far. That's not because editors are somehow better or more important, but because ultimately any work that gets done is going to be done by editors, simply because they feel like doing it. People who want to work on usability and feedback response are welcome to, and I for one applaud the efforts that have been made so far; but I've never had much interest in presentation (on-wiki or off) and personally prefer to spend my limited time working on content. Anyway, in the present case, IMO the more critical concern is that a poor interface may cost us editors (and therefore content) in the long run, when people who want to add translations (etc.) can't figure out where they are or how to do so properly. -- Visviva 01:46, 16 April 2008 (UTC)
I like this last proposal (a bar beneath the gloss with down-arrows at either end); AFAICS it would solve the opacity issue without creating any new problems. Can we get a demo? -- Visviva 01:46, 16 April 2008 (UTC)
I wouldn't mind seeing a demo either, but if one control is wanting, then I suppose duplicating it would be twice so.
Tiger Dictionary screen
Another model is the OS X dictionary app's display of multiple references. It uses a reveal arrow at the left-hand side of a divider, but the whole divider line and label act as a control. The best thing about it is the simplicity. One widget, one rule, and a simple label, all in the same low-key colour. The content relies solely on typography to reveal its structure, with no, bullets, double rules, background colours, or unnecessary labels or punctuation.
I'm not suggesting copying it verbatim, but keep removing elements until there's nothing left to remove. —Michael Z. 05:53, 16 April 2008 (UTC)

Misspelled words

I was told that it would not be an appropriate to use a redirect to help people spelling words incorrectly to the correct dictionary page. Could anyone tell me if Wiktionary allows us to help people searching for a definition to the right page. I'm not really looking for Template:misspelling of but just a general spellchecker to aid people looking for the definitions of words in which they do not know the spelling of. If a redirect is not how to do it, then how? -- penubag  (talk) 09:19, 16 April 2008 (UTC)

I have done some experimenting with using aspell to spell-check my searches,(<tech>this currently works by using javascrpt to open an iframe to a python script on localhost which in turn uses the api to colour links correctly - so it is very inefficient</tech>). This seems to be very effective, though there are issues - particularly in that you need to guess what language was being searched for (<tech>currently guesses english + the ACCEPT_LANGUAGE header of the browser - but should be possible to get the guess to change for other scripts</tech>). There are plans for us to get mediawiki:Extension:DidYouMean which deals with diacritics etc. and so it may be possible to integrate aspell into an extension like it at a similar time. I thing that this is very important, as a large proportion of our feedback has said "I can't find what I'm looking for" Conrad.Irwin 10:35, 16 April 2008 (UTC)
You are doing what used to be called God's work, Conrad. DCDuring TALK 11:23, 16 April 2008 (UTC)
I just looked at usage of "God's work". Let me make clear I meant that in a good, non-ironic way. DCDuring TALK 13:48, 16 April 2008 (UTC)

context...sense...qualifier...italbrac...a...

Are all of these really necessary? Do we really need more than 2, maybe 3 of these? — [ ric ] opiaterein — 12:19, 16 April 2008 (UTC)

Italbrac does less than the others, I think. I think they are intended to have mnemonic names suggestive of their application, even if they don't do things very differently. But, as I understand it, context, for example, puts things in categories and also creates a list of candidate context categories. To me italbrac is the one that has been superseded. But I suspect it would not be wise to automatically replace it with one of the newer more specific tags. It is tedious to hand-check each one, so it doesn't seem to be a high priority to replace it so that the apparently redundant template can be deleted.
Maybe we need to have template categories, like:

Encouraged, Standard, Permitted, Deprecated, To Be Removed, Experimental. Encouraged might be rendered ultra-accessible. Deprecated would be removed opportunistically and earn the scorn of an editor's peers if used. "To Be Removed" could have lists and a project page to encourage removal. Maybe we already have or have had such a system. Perhaps it has been found wanting. DCDuring TALK 13:46, 16 April 2008 (UTC)

This seems plausible, though it might lead to some unnecessary political drama if imposed too rigorously. But let's not forget that a) this is an open wiki, where overt structure is frequently harmful; b) in the absence of real policies, templates often fill this role de facto. Nobody seems to mind when I and others use Template:quote-book in entries, but I imagine EP (and perhaps others) would have a fit if I tried to mark it "encouraged." Likewise I would have a fit if someone marked it "deprecated" (without a better replacement) or "experimental."
I think we do have a deprecated template category somewhere, and possibly one for "subst-only" (which is a very important category, since templates of this type often to be orphaned). If not, those should certainly be created; "experimental" too. For "encouraged" one would want to see some sort of non-bureaucratic approval process; a flash poll on the BP/GP, maybe? -- Visviva 02:53, 17 April 2008 (UTC)

{{context}} to provide labels before the definition and if the appropriate template exists it will categorise etc. {{qualifier}} is used to qualify items in lists, it is often used next to links to alternative spellings, in translation tables etc.

{{sense}} is used under the Synonyms and Antonyms sections to refer to the sense/definition to which the listed terms apply. {{italbrac}} as mentioned is mainly redundant. Hope this helps--Williamsayers79 18:15, 16 April 2008 (UTC)

The talk pages of these templates are usually a good place to start if you want to know what they do. Some of them however, may need a documentation updated.--Williamsayers79 18:17, 16 April 2008 (UTC)

I don't use {{qualifier}} myself, but all of the others are definitely necessary (though {{italbrac}} may not be used much anymore). Although the output will look very similar for most of these, the function and location for their use are very different. Each exists so that, in the event we decide to format a particular section differently, we need only adjust the appropriate template. They also exist so that users can customize display for certain sections. Using the separate templates for the different functions keeps format and customization of different sections separate, rather than forcing a single format for all of those locations. --EncycloPetey 21:28, 16 April 2008 (UTC)

Checking my own uderstanding: {{italbrac}} is designed soley to give users who are "in the know" the option of viewing given text one way or another based on personal preference, instead of having the text "hard-coded". Normal users would see it in the default setting. The same capability is a by-product of the other templates (but I hope the primary justificaton is what EP suggests: flexibility in altering presentation for the benefit of actual end users). There are analogous templates for controlling appearance and functionality in etymologies {{etyl}} and {{term}}. {{term}} is actually supposed to be used when a term is referred to within text like usage notes, excluding definitions. It is particularly useful for non-English terms, especially non-Roman scripts. I would interpret all this as meaning that the more specific template are to be preferred over both hard-coding and {{italbrac}} and "italbrac" is preferred over hard-coding. It would seem to suggest that we will have "italbrac" with us for a while. DCDuring TALK 22:02, 16 April 2008 (UTC)

That seems about right. Some background reading: 2007: discussion leading to qualifier, 2007: qualifier is born, 2006: italbrac discussion. Frankly I'm still not sure why we are still using {{italbrac}}, except for inertia; all the standard cases where italics and parentheses would normally be required are already covered by specialized templates. The problem being, as mentioned above, that each case needs to be hand-checked. -- Visviva 02:53, 17 April 2008 (UTC)

List of descendents

What if we added a parameter to the Term template like |desc=true en to populate a list of descendants of a word? For example, on sandal the word σανδάλιον would make a list (somewhere) and include English: sandal. I have no idea if this is even possible or how it would be done, just putting it out there as something that could be pretty interesting. Nadando 23:38, 16 April 2008 (UTC)

We already have a Descendants section. I'm not sure how such a parameter would work, since it would have to specify a language, yes? --EncycloPetey 02:50, 17 April 2008 (UTC)
Yeah, that's why I put |desc=true|en or something like that. Nadando 02:52, 17 April 2008 (UTC)
That sounds difficult or impossible to work. There would have to be a way that the descendant words were all marked (and many entries currently have no etymology at all!). Nothing we currently have does that. I think this would more easily be coded into the kinds of tables we currently have. --EncycloPetey 03:06, 17 April 2008 (UTC)
To be done natively within MediaWiki this would require some sort of fancy-shmancy (and unapproved) extension. But I suppose some sort of automated script (that would sift Special:Whatlinkshere for links from Etymology sections preceded by an appropriate etymology template) could do a good deal of this work. Visviva 13:22, 17 April 2008 (UTC)
That would certainly be worth a shot. I'd suggest, though, that instead of simply generating Descendants, it should also tackle Derived terms. After all, Derived terms are simply Descendants, but in the same language (at least the way we have the sections defined). So, the bot would need to know both the source and target language, and compare them. --EncycloPetey 21:40, 17 April 2008 (UTC)

en-adj (not comparable) link

Can the wikilink from the (not comparable) option in the en-adj template link to Appendix:Glossary‎#comparable instead of the more general wiktionary entry? -- Thisis0 07:01, 17 April 2008 (UTC)

I've been bold and changed it per your suggestion. I've also changed the target of the "comparative" and "superlative" links to the same glossary entry. Thryduulf 11:56, 17 April 2008 (UTC)
I've made the same edits to the {{en-adv}} template, the {{en-noun}} template already links to the glossary. Are there any others that would benefit from this? Thryduulf 14:24, 17 April 2008 (UTC)
Good. Now all we need to do is work on the Appendix language about "the controversy", that is, vestigial prescriptivism. DCDuring TALK 14:38, 17 April 2008 (UTC)
What do you mean? what do you want addressed? -- Thisis0 18:58, 17 April 2008 (UTC)
The {{comparable}}, the sense-line tag, important for long entries, would benefit from the same link. I believe that {{not comparable}}, {{countable}}, {{uncountable}} (Should that be displayed as "not countable"?) have links. I forgot to check whether all the links are to the appendix glossary. DCDuring TALK 14:50, 17 April 2008 (UTC)

Other project sidebar links

Where the target page in another project has the same title as the Wiktionary page the sidebar displays just the name of the other project (e.g. at frog). Where the target page has a different name, the box-style templates (e.g. {{wikipedia}}) display just the project name, the lite templates (e.g. {{pedialite}}) display the project name and the title.

Where more than one page is linked to on another project this can be very confusing, for example:

pantograph

go

Lynx

I can see two ways around this - the first is to display the target page name, even if it is the same as the Wiktionary page title. The second is to allow a parameter that contains custom text to display as the name. Thryduulf 15:24, 17 April 2008 (UTC)

Are you talking about "in-text" (aka "in-line") or "in-list" links? Have you looked at WT:LINKS? DCDuring TALK 16:02, 17 April 2008 (UTC)
I'm talking about the "in other projects" links to Wikipedia, Commons, Wikispeicies, etc in the sidebar. The lists above are copies of what appears in these boxes in the pantograph, go and Lynx entries. What I'm saying is that we need to change how these links are displayed so they are less confusing. Thryduulf 22:34, 17 April 2008 (UTC)
The problem is that there is not really enough space for any more information, see Template_talk:wikipedia2 where I have included some longer iwiki links. I'm really not sure how to make this less confusing. Conrad.Irwin 00:14, 18 April 2008 (UTC)
Look at Afar. For this page, the main link in the disambiguation; other links have specific names. Is that what you mean? If so, I think it best to link to the disambiguation in such cases, and only link to more specific pages via pedialite if there are specific pages with direct connection to specific definition senses. --EncycloPetey 00:49, 18 April 2008 (UTC)
I agree. —RuakhTALK 02:03, 18 April 2008 (UTC)
That sounds good, but for entries like pantograph there are two Wikipedia articles, one at w:Pantograph the other at w:Pantograph (rail)· What I'm saying is that the link to w:Pantograph currently appears in teh sidebar as "Wikipedia" and the link to w:Pantograph (rail) appears as "Wikipedia: Pantograph (rail)", but that they should appear as "Wikipedia: Pantograph" and "Wikipedia: Pantograph (rail).
At Lynx the links appear as "Wikipedia", "Wikipedia", "Wikipedia: Lynx (disambiguation)" and "Wikispecies". They would be much better as "Wikipedia: Lynx (cat)", "Wikipedia: Lynx (constellation)", "Wikipedia: Lynx (disambiguation)" and "Wikispecies: Lynx". Thryduulf 10:02, 18 April 2008 (UTC)
I'm not so sure. I think it would be better to have only the sidebar links for wikipedia:Pantograph ("Wikipedia"), wikipedia:Lynx (disambiguation) ("Wikipedia"), and wikispecies:Lynx ("Wikispecies"). I think the sidebar links should just point to the project with more information, and that project can help its reader navigate around. (I'm not sure if it's best to try to implement this in the wiki-code for the templates, or in the JavaScript that produces the sidebar links.) —RuakhTALK 12:24, 18 April 2008 (UTC)
I'm inclined to agree with both of you. That is, it would be nice to be able to suppress sidebar placement, but when present, sidebar links should clearly identify their destination (at least when != w:PAGENAME). As seen in the Lynx example, {{PL:pedia}} seems to behave in the desired way while {{wikipedia}} does not; this is governed by the stuff in the "interProject" span at the end of the template. I can't see any reason why the two templates should behave differently in this regard. -- Visviva 13:46, 19 April 2008 (UTC)

What happened to the main page?

What happened to the main page? RJFJR 15:51, 17 April 2008 (UTC)

  • You need to delete Wiktionary:Main Page‎ (added by "Bbnbcm") (- but there don't seem to be any sysops when you need them.
    • When I clicked history there were only three entries. (Its since been fixed) RJFJR 16:09, 17 April 2008 (UTC)

Has been restored. The renegade admin has been de-sysopped by a steward (thanks Spacebirdy!) Robert Ullmann 16:27, 17 April 2008 (UTC)

I suspect that the loss of images may be related. DCDuring TALK 13:19, 18 April 2008 (UTC)
His deletion log doesn't show any recent image deletions that we're unreasonable. What surprises me is that we are not using local protected copies of the Commons images on the main page. It doesn't look like they're protected on Commons either... Mike Dillon 15:07, 18 April 2008 (UTC)
It might have been a temporary problem at Commons or even more specific to me. It would only have taken a rename/move at Commons, which could be under another name. An incident always makes one a little skittish. But it is a vulnerability, not that a wiki won't always have plenty of them. I don't have the mindset for security. I appreciate whatever protection can be afforded our efforts, as long as there isn't excessive inconsistency with the fundamental philosophy. DCDuring TALK 16:12, 18 April 2008 (UTC)

FL links in translations

I've updated the CSS and the {t} templates for these. The appearance should be improved on most browsers (but I can only test a few); the links will have less effect (if any) on line spacing, and on IE's irritating habit of vertically centering a line with a superscript on the bullet. I may have made them just a little too small, tell me? (It has to do with how many pixels, so on my screen 70% and 75% are one pixel different, others may see 0 pixels difference.)

The result is that you can now do a whole series of customizations. You can change colours, font size and font, adjust the degree of super-scripting (including none), leave off the parentheses, or suppress the links entirely.

On all current browsers except Internet Explorer, you can also replace the parentheses with (e.g.) brackets, add symbols, or replace the language code with symbol(s).

See Customization at template {{t}}. Robert Ullmann 15:51, 18 April 2008 (UTC)

Inactive Sysops

I just did a quick, informal audit of Wiktionary sysops. I was looking at the total number we had (75) and it seemed high. The reason that it seemed high is that 11 (~15%) of our sysops are relatively (or completely) inactive. We have, in the past, removed sysops who were no longer active on the project, and I was wondering if the time had come to do so again. If not removing the sysop flag, perhaps removing them from the list on WT:A, which gives the impression that we have a lot more help than we actually do. Here are the sysops which I found to be "inactive" by my own impromptu standards:

  1. User:Ortonmc - diff declaring he no longer wanted to be an admin.
  2. User:Jun-Dai - 4 edits since 2006, not sure about interest in the project any longer.
  3. User:Tawker - 1 edit in past year, not sure about interest in the project any longer.
  4. User:Kipmaster - Not much use of the tools, we can ask easily enough about interest.
  5. User:Psy guy - no edits since 2006
  6. User:Aulis Eskola - ~12 edits in the past year, not sure about interest in the project any longer.
  7. User:Andrew massyn - inactive since May 5 2007.
  8. User:Pathoschild - not terribly active here anymore, easy to ask about interest.
  9. User:Alhen - not very active here, still active on es.wikt, easy to ask about interest.
  10. User:Tohru - 5 months inactive, not sure about interest.
  11. User:Enginear - ~1 year inactive, not sure about interest.

Now, lest anyone say so, I have nothing against any of these people, I quite like all of them who I have interacted with. My primary concern is that we actually do need more sysops, and right now it looks like we have a lot more than we really do. I consider the sysop flag something which indicates a participation and commitment to the project (I know this isn't shared universally) and would like to see more active contributors flagged to help out, but when folks are done I think it is also a good idea to remove the flag. This is certainly not an all or nothing thing, there are varying degrees of inactivity amongst the people I have listed, but I am interested to hear thoughts on what is in the best interest of Wiktionary. I think the ideal here would be to come out with a clear idea of what we consider inactivity, what we consider a standard practice when a sysop is inactive, and then apply it now and in the future. - TheDaveRoss 22:03, 18 April 2008 (UTC)

Perhaps, instead of removing the sysop flag - which I see no reason to do if we want more sysops ;), we could just remove them from the main list at WT:A and use that as our count instead - there is no need to rely on the software's counter. For me, i would say that inactivity sets in after 6 months of no edits. When people declare that they no-longer wish to be sysops they should have the flag removed. Conrad.Irwin 22:17, 18 April 2008 (UTC)
Agree with Conrad.Irwin. I see little harm in allowing inactive sysops to retain their flags, at least until we see evidence that this is dangerous. However, it is nice to have an accurate list of active sysops. What might also be nice is if one of our technical folks could write a dynamic list of sysops active within the previous five minutes, so that folks looking for an active admin at the moment (to block a rampaging vandal, ask a question and get an immediate response, etc.) could find one. -Atelaes λάλει ἐμοί 22:29, 18 April 2008 (UTC)
I am curious as to why inactive sysops ought to retain the flag? It is my understanding that the sysop flag is not a merit badge, it is a set of tools entrusted to certain editors in order to help the project proceed. People who don't edit Wiktionary anymore don't need the tools. - TheDaveRoss 22:51, 18 April 2008 (UTC)
Simply because removing them seems like a waste of effort - and, should they pop back one day, then we'd suddenly have more sysops without having to wait for whatever procedure to give them the tools back. The tools should be given to anyone we can trust to use them properly, not just to those who can demonstrate that they are using them As a period of inactivity does not change a user's trustworthyness (we've been through the compromised account arguments before, it is exceedingly unlikely) it should have no impact on the tools. Conrad.Irwin 22:58, 18 April 2008 (UTC)
The "trustworthiness" metric is an interesting one...some of the older sysops were simply appointed, others got 3-4 votes...this is neither here nor there, but it isn't as if there were sweeping mandates, just a confirmation that folks knew and trusted them. As it is, people who have been inactive for more than a year can hardly be expected to step in and know all the current policy (especially since we don't really write it down). Moreover, I think "sysop for life" adds to the notion that being a sysop is some kind of honor, it should be a set of tools that people have when they need them and don't when they don't. Several retirees have voluntarily dropped the flag when their work was done, others seem to just have disappeared. - TheDaveRoss 23:12, 18 April 2008 (UTC)
I agree with both points of view. I think it logically makes sense to remove an unused sysop flag after a while, especially since (pace Conrad) there is a bit of a risk of compromise now with the advent of unified accounts; but it just doesn't seem worth the effort to develop formal criteria (or vote on each individual admin) and petition stewards accordingly. —RuakhTALK 04:31, 19 April 2008 (UTC)
Actually avoiding a bunch of votes is what I had in mind, I figured if we can just say "one year without using any sysop tools indicates inactivity" or some similar basic criteria then we wouldn't have to vote individually. That was basically the criteria we have used in the past. It is also of note that none of the people who have left for that length of time have ever come back, Kevin Rector was pretty close but never actually had his tools removed. - TheDaveRoss 13:55, 19 April 2008 (UTC)
<detab> At the heart this is a suggestion for a change of community trust metric. Some projects have chosen to address this with success by either an inactivity limit (if you are inactive x amount of time, the bits are removed) or your position as admin is scheduled for a community reconfirm vote x amount of time after your successful RfA. There may be other solutions as well. - Amgine/talk 14:15, 24 April 2008 (UTC)
I think what would be good would be something like what Commons has introduced recently, whereby if a user doesn't use any admin tools for X amount of time (6 months I think there), then they are given a note on their talk page that their adminship is under review. If after a further month they still haven't become active again (or given a good reason why they still need the tools), then they are de-sysopped. If they become active again, they can reapply for adminship with a much lower threshold - I think the intention is that they get it back if nobody objects within a few days. If someone does object then the application reverts to a standard period and threshold RFA. Thryduulf 15:37, 24 April 2008 (UTC)
I have divided the admin list by activity, feel free to modify as needed: [14]. Dmcdevit·t 23:02, 18 April 2008 (UTC)

Display of attributive use of nouns

We often have nouns that are used attibutively, but which do not seem to warrant an entry as an adjective because they do not have enough attributes of an adjective. amazon is an illustration with an unresolved RfV. We need, IMHO, some way of indicating that a noun can be used as an adjective. I would have in mind its use not necessarily for all nouns, but at least for those that have gone through an RfV process that has determined that the adjectival use of the noun does not warrant an adjective PoS. This might discourage the reentry of the adjective PoS and provide helpful information to users.

MW3 indicates such usage by : "often attrib".

Some options I can imagine are:

  1. No Adjective PoS header
    1. inflection line label along the lines "(often|sometimes|rarely) used as attributive adjective"
    2. a "Level 4" heading under a Noun heading: "Adjective use", with text as above and usage examples
    3. a link to citations page with a standard heading on the citations page for attributive use of a noun.
    4. a label on the definition line (per Visviva)
    5. a templated usage note (per Visviva)
  2. Adjective PoS
    1. a standard bit of explanatory text on attributive use of nouns and a link to an Appendix or WP article with more.

Has this been addressed before? Was it resolved in the negative or left open? DCDuring TALK 17:58, 19 April 2008 (UTC)

And also possibly for those that are often used attributively (or better - for all nouns), when there is no corresponding English adjective (in -ic, -an...) meaning "of or pertaining to <noun sense>", to have an optional adjectival translation in noun's ====Translations==== section. Quite a lot of FL adjectives don't link proparly to base English forms because of that, and have "of or pertaining to" stubs. It would be much easier to use [[noun]] (''attibutively'') instead (or standardise this typical usage via some template). --Ivan Štambuk 18:12, 19 April 2008 (UTC)
At the moment I would tend to favor option 1.4, a label on the definition line. A label on the inflection line is problematic, since one sense may have a much stronger attributive tendency than another. Actually I prefer option 1.5, a templated usage note in the noun section, but this has met with opposition, apparently on the grounds that too many electrons would be consumed. ;-) -- Visviva 05:04, 20 April 2008 (UTC)
I have added the suggestions to the list above. DCDuring TALK 09:53, 20 April 2008 (UTC)
I like the idea of a templated usage note, particularly since attributive use may apply to more than one sense of the noun. The templated note could include a link to an Appendix:English nouns, specifically to a section on attributive use. I would also include a link to a special section of the Citations namespace page associated with the entry, demonstrating attributive use. --EncycloPetey 15:25, 20 April 2008 (UTC)
I like 1.4 and 1.5, and see no need to choose between them. The more, the merrier. :-) —RuakhTALK 16:13, 20 April 2008 (UTC)
To clarify and develop 1.5 a little further: Under the heading "Usage notes", under the "Noun" PoS, we would have a template available (not mandatory) for insertion containing a link to the Appendix section referred to by EP, with text that said "Used attributively as an adjective", with attributively being the link word to the Appendix section.
Please feel free to suggest modifications, radical or minor.
One of the advantages of keeping the adjective PoS header is that a user who has looking for a usage that seemed adjectival for a word that had many Noun definitions would be able to click on Adjective in the ToC and go right to an abbreviated section that referred the user to the noun section with the templated explanatory text. Putting something on the Noun inflection line is better than forcing the user to page down for a usage note that the user didn't know s/he needed, but doesn't appear on the first screen. DCDuring TALK 16:46, 20 April 2008 (UTC)

I'm glad to see this brought up again. Previous attempts at a solution here, here, and here. It's clear to me that these nouns are not in any way adjectives, and referring to them as such doesn't cut the mustard. To me the test is if they are exchangeable with other adjectives. Her "voluptuous, attractive, amazon physique" could not be rendered "her attractive, amazon, voluptuous physique." If it could, and make sense, it's made the crossover to adjective. You read "amazon physique" as a unified noun phrase. That's what it is. Not an adjective. The Germans usually make one word out of it -- that's another test. Other obvious tests that go along with this are the predicative: ("Her physique was amazon.") and comparative forms ("more amazon") that aren't figurative/humorous/a colloquial slip of the tongue for "amazonian". As for a solution, see what I did at satellite, senses 5 & 6. This thing has a name, Noun adjunct. I once thought a templated usage note was the best idea, but the Noun Adjunct tag (preferably blue-linked to it's own appendix with thorough explanation) is the most appropriate to our format while getting to the most truth. -- Thisis0 16:53, 20 April 2008 (UTC)

I don't think of a PoS header as an assertion of what something "is" (although that might depend on.... Oh, never mind.). An Adjective PoS header that was immediately followed by an explicit assertion that the word should not be considered a true adjective would probably serve to prevent users from going the wrong way with it.
The term "noun adjunct" does not have the advantage of being widely understood without clicking on a link (which link is not yet present at satellite).
It really depends on whether we are trying to create a dictionary that is primarily intended to be a map of linguists' current understanding of language or something helpful to learners and non-linguists, antiquated concepts and all. I think we have to build more bridges to the benighted minds of the mythical anon users, about whom we know so little, but who are the source of future registered users and contributors and the ability to win funds from users and grant-givers. DCDuring TALK 17:24, 20 April 2008 (UTC)
An erroneous adjective section only reinforces the idea that these are, or might be, adjectives. There is clearly confusion surrounding this issue. It's our job to properly categorize and define language so it can be properly understood. "Dumbing it down" for the mythical anon is not at all productive, accurate, or desirable. We aim to have clear, usable definitions that inform the most casual user, and also more informative categories and tools for those who care to make one click. The limits of this dictionary won't be set by the least concerned user. If you fear offending him, why are the esoteric multi-Latin Etymologies at the top of every entry? Certainly that is more oblique at first glance than a Noun Adjunct tag or, on another topic, plurale tantum. -- Thisis0 17:43, 20 April 2008 (UTC)
I would not object to hiding the Etymology and Pronunciation sections under show/hide bars or moving them out of precious first-screen space. It is hardly a question of "dumbing it down" to treat the mind of our archetypical user as someting other than a tabula rasa. I don't think of users who haven't spent much or their life on grammar as dumb even when they don't speak or write to my taste. We need to accept the realities of their prior education and other experience.
The vast majority of users think of "Adjective" (when forced to think of it at all) as meaning modifier or describer of a noun. They do not differentiate attributive vs. predicative usage, comparability/gradability, let alone other more subtle attributes. I don't see what beneficial goals we achieve by adding the additional conditions if by so doing we limit a user's access to the most basic information that might be sought. It should be clear from Feedback that users don't find all that we do helpful. DCDuring TALK 18:29, 20 April 2008 (UTC)
Can I tell you what doesn't make any sense in what you just said? First you say, (unafraid to use an esoteric term, I should point out) that we would do best to approach our users as a tabula rasa ("blank slate"), but then you are wanting to operate on a premise that they do have preconceived notions that 'noun modifiers must be adjectives', etc. Which is it? I agree with 'blank slate', believing we should impart complete and correct information. Second, you twisted my intent for "dumbing it down", assuming I somehow said non-grammar-philes are "dumb." No way. On the contrary, my point is that average users are intelligent enough to digest an accurate grammar tag, and we should not ever assume they are "too dumb to get it." This seems to be your assumption.
Problems with your solution (Adjective POS with note saying "it's not really an adjective"): 1) That's dumb. 2) If a casual user has any defining characteristic, it's a tendency to glance or skim; a misleading adjective POS for a non-adjective is wrong. 3) It takes up a lot more room. 4) It separates definable noun senses from the Noun POS. 5) It doesn't make any more sense to a non-interested user than an appropriate, succint tag. Yes, there is no Appendix:Noun Adjuncts currently, but there will be. This approach (as in satellite senses 5 & 6) is accurate, succinct, non-intrusive to the uninterested user, and educational to the interested user. Please, please, can the conversation be about this, and not about how we should make this place to cater to the least user. That's what happened discussing plurale tantum, and I do not want this one to fizzle out 'cause it's just turns into you and me bantering about how you want to appease the least user. I really want to hear what others think about the proposed solutions. -- Thisis0 19:37, 20 April 2008 (UTC)
I believe that it is our job to cater to "ignorant", impatient users first and other users (or the same users when they have more time) later. Esoteric terms seem fine for this forum, not for the target or our basic entries' first screens. If we are any good at language, we should be able to figure out how to "dumb it down" without doing violence to a deeper and more subtle understanding.
I think our users' pre-existing understandings are the facts of life that we must accommodate to serve a non-elitist version of the mission of WMF. The first skim of the ToC is the first place that we can lose users of our longer entries. If they are looking for something that behaves a lot like an Adjective in the most central way (modifying nouns) and don't find Adjective, they will most likely go to another dictionary and be annoyed at us. Neither outcome will increase the chances that they will click on Wiktionary again. Like you, I had thought we ought to be able to count on our users to know enough about the language that we could completely dispense with an Adjective PoS for Nouns where the only adjectival use was as an attributive. But, 1., seeing that contributors often insert Adjective PoS sections after Noun sections and, 2., examining dictionary definitions of adjective have led me to question my own beliefs and preferences. My thought about using an Adjective PoS was that we could direct users from an Adjective PoS heading to both the Noun (for definitions) and to a helpful explanation of attributive use of nouns. I simply don't see how that inherently constitutes a problem. It might not be the best solution, of course. DCDuring TALK 20:15, 20 April 2008 (UTC)
I think a separate 'Ajective' PoS would be unnecessary. As long as we mention somewhere (definition line, or usage notes) that it can act similarly to an adjective our entry should jive with what the user was maybe expecting. As for prempting users from creating an 'Adjective' PoS, well as long as we use standard template(s) we should be able to flag entries that have an attributive noun sense, and a separate Adjective PoS, and someone can cleanup afterwards. That being said, I like the combo of a definition-line 'context'-like tag/template and a templated usage note. To make sure users understand the entry, we can worry about the exact wording later. --Bequw¢τ 20:10, 20 April 2008 (UTC)
I don't think that contributor creation of an Adjective PoS is a problem that has to be controlled and corrected as much as it is a concrete demonstration of how non-expert users look at PoS. It seems that if they know a word to be used in an adjectival way to modify a noun, then they think that a dictionary ought to show it as an adjective. If someone has facts that say, for example, that non-expert users have a category called two-word nouns and do not expect that the first of the two words is likely to be in a dictionary under Adjective, then I could put my concern to rest. I would even settle for a good sample of what ESL and grammar books would say about atributive use of nouns.
My Longman's DCE (for learners) doesn't note attributive use in individual entries at all. My MW3 (unabridged, US) has a generous number of nouns marked often attrib immediately after n as well as having seperate entries for neer-SoP phrases like "beer hall". DCDuring TALK 00:26, 21 April 2008 (UTC)
I agree that the creation of Adjective POS headers is a sign of a problem with our current approach (as are some of the messages received on WT:FEED), and will be a useful metric for any solution. But note that the current approach is to have no special notice of attributive use. If we start using usage notes and/or labels, particularly ones that contain the word "adjective" somewhere, I expect that user confusion (and the ensuing creation of spurious Adjective sections) will drop substantially. The proof of the pudding will be in the eating. -- Visviva 06:06, 21 April 2008 (UTC)
So, a usage note in the noun PoS is one thing that might be able to agree on. It doesn't seem to require a vote, AFAICT. To generate a real test, we would probably need to find numerous entries of the following classes:
  1. Noun PoSs that have Adjective PoSs under the same heading (to prevent senses being added in addition to the existing presumably appropriate Adjective senses).
  2. Noun PoSs that have had Adjective PoSs added under the same etymology which Adjective PoS has been removed.
Is the version of 1.5 laid out above after Ruakh's comment the best we can do? I wish I felt that we had a real metric: an actual share of additions of new Adjective PoS sections to English Nouns as a percent of total new PoS creations as well as a listing of the entries involved so we could make sure there wasn't too much large-scale irrelevancy. Can we flag entries that are having new Adjective PoSs added to existing noun PoSs (same English Etymology)? DCDuring TALK 10:36, 21 April 2008 (UTC)

You are all aware that the translations of these "noun adjunct" senses of nouns will actually be adjectives in most languages, even in Old English (the ones that have usually retained genders, have distinctive adjectival inflection etc.) ? --Ivan Štambuk 07:20, 21 April 2008 (UTC)

Actually, look at the current translations at satellite. Other languages have different inflections for the compound-forming nouns, but they don't usually become adjectives. That's actually part of the reason I favor calling them what they are. Other languages know they are nouns, and actually have an inflection case for compound-forming nouns. Yes, some languages will have these as adjectives, but words that are nouns in English should be called that, and you'll find many other languages agree. -- Thisis0 07:30, 21 April 2008 (UTC)
Well, I can tell you that every single English "noun adjunct" translated in Slavic languages (usally with -ni/-ski suffix) would be a classifier-type adjective, that Czech translation included. Lexical content of a first noun is used as a qualifier for the second noun, and every language that 1) has the abovementoined properties 2) favours adjective-noun vs noun-noun constructs (that is, not like modern German) would pretty much always use adjectival translation. Tbot-generation of entries from translation tables would have to be disabled for this "noun adjunct" senses. --Ivan Štambuk 07:49, 21 April 2008 (UTC)
Why? Because the part of speech doesn't match between languages? If we did that, then we wouldn't be translating the names of languages. Translations are about translating, and understanding that the grammar in the target language may very well be different. Besides, in most cases there will not be a separate "noun adjunct" sense. A separate sense for "noun adjuncts" is only useful when the sense (when used as an attributive) is more specific or limited than the noun in general. --EncycloPetey 12:32, 21 April 2008 (UTC)
yes, and also when it is used frequently in noun compounds (dairy, chicken), and when there would be any potential confusion or desire for an Adjective section. -- Thisis0 17:16, 21 April 2008 (UTC)
Yes, and Tbot can't differentiate between those. I remember correcting about a dozen Croatian nouns into adjectives generated by Tbot that were incorrectly placed in the translation tables of English language names (whose noun senses are mostly fossilized adjectives anyway ^_^). Mismatching between the basic PoS categories such as nouns/adjectives that almost all the (relevant) languages of the world have is not a good choice IMHO. --Ivan Štambuk 14:42, 21 April 2008 (UTC)
On the other hand, Spanish (and probably other romance languages) most often translate "<noun1> <noun2>" into "<noun2> de <noun1>". Spanish does have separate adjectives sometimes, but the rearrangement with de ‎(of) is more common. --Bequw¢τ 21:31, 21 April 2008 (UTC)

If you label such things adjective, learners of English, who have studied many grammar rules but don't really know the language, will assume that you can do adjectivy things to them: modify them by very, too, so, use them attributively and predictively, grade them, etc. The other thing is, it's pretty hard to think of a noun that CAN'T be used attributively. I mean, maybe there are some words that only appear in certain constructions dint as in by dint of or sake, but apart from that... You can even do it with proper nouns. Why would this need to be mentioned at all? (I'm speaking of English nouns here).--Brett 01:41, 24 April 2008 (UTC)

You're right. You can do it with all nouns. It's a regular property of nouns in modern English. We're just talking about those that are used most commonly in an attributive sense (dairy, satellite, chicken, etc.), those that have an attributive sense with a slightly different meaning (amazon), or anywhere there might be confusion or a desire for an Adjective header. (Unless of course they've made the full crossover to adjective, then that's what they are.) -- Thisis0 02:58, 24 April 2008 (UTC)
Because our contributors regularly attempt to add adjective PoSs to nouns because they feel that the adjective sense is missing. The proposal at hand is to come up with some way of preventing that and to also direct users to the noun PoS definitions to find the meaning of the adjectival use they might be interested in and to a helpful note explaining attributive use of nouns (and what shouldn't normally done to such nouns) so they don't waste time looking in the wrong place in the future. I believe there are many users who do not remember this kind of thing or were never taught it. I was one of them, though blessed with the tendency to use nouns attributively without support from any rule. I expect speakers and writers of English to "adjectify" almost any noun they can in all the ways that you seem negatively disposed toward. Sometimes I think that this censoriousness must be much more UK than US (;-)). Wasn't that last just so George W. Bush of me? DCDuring TALK 02:34, 24 April 2008 (UTC)
I don't think we need to prevent people from creating such adjective sections, nor am I advocating some such mechanism that will prevent or flag such contributions. No. All we are doing is making them more correct and imparting a little educational info. Like Brett said, as long as people think these are adjectives, they "will assume that you can do adjectivy things to them". They don't behave like adjectives because they aren't. Let's just start fixing the most common ones in a simple straightforward manner (Noun sense with tag), and get a good Appendix:Noun Adjuncts or somesuch going. We don't need a software flag or anything. -- Thisis0 02:58, 24 April 2008 (UTC)
Thanks, I think I understand the situation better now. And, yes, we've run into the same issue at the Simple English wiktionary. Currently, it seems to be under control, but we have only a very small number of editors.
By the way, I wasn't stating a preference but rather a fact about English. It is ungrammatical to say this is a very faculty office or ask how soccer is your ball? Yes, you can playfully force nouns to be adjectives, but this anthimeria is at a rather different level from what we were discussing.--Brett 12:01, 24 April 2008 (UTC)

The necessity of a new etymology header

Should the verb form entry be under a new etymology header like I have done with mast or is that unecessary? __meco 08:21, 20 April 2008 (UTC)

Yes, I think it is necessary. If the verb form were placed in parallel with the noun, this would imply that they share the same etymology, which would be misleading. The extra header is somewhat annoying, but I don't see any way out of it while maintaining a sound ontology. -- Visviva 09:37, 20 April 2008 (UTC)
I agree that a second etymology header is appropriate in cases like this, and is necessary to avoid misleading users. --EncycloPetey 23:22, 20 April 2008 (UTC)
I think there is a page somewhere that actively recommends it.Circeus 23:08, 21 April 2008 (UTC)
To push the matter closer to a conclusion, the entry might have "See [[mase#Norwegian|mase]]" under the etymology. Another possibility is to use the template {{term|mase||lang=no|insert gloss here}}. I also noted that mast's Norwegian section heading were not all at the right level after the insertion of the etymology. If "mase" does not actually have an etymology shown, then the Etymology heading at "mase" should have {{rfe|lang=no}}. Finally, I noted that mase, the lemma entry for the verb, as I understand it, did not have a Norwegian section. Following such trails can lead to valuable new entries when you have the energy and knowledge or reference materials needed. DCDuring TALK 01:31, 22 April 2008 (UTC)
Pages like this I've reorganized before so that the definitions that don't have etymologies go above all the ones that do. That avoids the problem of an empty etymology section, but it often puts much less important definitions first, so honestly I don't think it would be an improvement over what you have. What we definitely don't want to do is write "Unknown" or the like as the etymology, unless the origin of the word had been thoroughly researched with no conclusion reached. DAVilla 18:59, 23 April 2008 (UTC)
The priority would be to get the lemma form of the Norwegian verb entered, I would think. DCDuring TALK 19:47, 23 April 2008 (UTC)

Dutch gender

At long last we have a policy on this at the Dutch wikti, or at least I have proposed one and nobody objected. With as few as we are that is pretty much law. I have tried to explain the situation and its most reasonable remedy at Wiktionary:About Dutch#Gender and had a bit of a discussion with Visviva. I encourage the anglophone community (including its Dutch speakers, mothertongue or no) to support us in the chosen solution. It is admittedly a compromise, not of my making but that of the Taalunie. I must say though that the latter body has done a pretty good job imho. Jcwf 21:39, 20 April 2008 (UTC) nl:Gebruiker:Jcwf

Thanks Jcwf for taking this on. I know absolutely nothing about the background issues here, but deferring to the Taalunie seems like the most sensible option. (This would, I guess, mean barring "common" from inflection lines.) Perhaps {{nl-noun}} could also link to an appendix where these issues are discussed? Word-specific details could be discussed in Usage notes (or Etymology), as and if appropriate. -- Visviva 23:38, 20 April 2008 (UTC)
I'm a Flemish speaker and I didn't know the northern Dutch situation well.. In the entries I made, I looked to the gender Van Dale uses, and if there wasn't any I used {c} or {m|f}, but I support this proposal and I'll now use {f|m}. SPQRobin 15:20, 21 April 2008 (UTC)
This seems like an excellent solution, giving information rather than dictating how an individual speaker should speak their own language. I think we also need an appendix where this is explained, as many (if not most) Dutch courses for English-speakers use northern Dutch and the terminology of "common gender". Physchim62 17:10, 21 April 2008 (UTC) (non-native speaker, level nl-2 on a very good day!)

Category for agent nouns?

Would anyone mind terribly if I created a Category:English agent nouns, and categorized accordingly? bd2412 T 05:12, 21 April 2008 (UTC)

Not I. Seems a meritorious act. -- Visviva 05:53, 21 April 2008 (UTC)
So, the category is for Bond, M, and Q? ;) --EncycloPetey 12:28, 21 April 2008 (UTC)
Do we have an entry on too cute by half? bd2412 T 14:16, 21 April 2008 (UTC)

American

Folks who also play on Wikipedia may be interested in commenting on w:Wikipedia_talk:Manual_of_Style#American, regarding use of the term American to mean "United States". --EncycloPetey 12:27, 21 April 2008 (UTC)

Gaps in entry titles.

Do we have a good way of representing gaps in entry titles? Like, too … by half probably warrants an entry, which too and half should link to; but what should its title be? —RuakhTALK 17:18, 21 April 2008 (UTC)

I think we just hope like crazy that we can always split it into connected parts. "[too clever] [ by half ]" works for me, but I appreciate this ignores the main issue. Conrad.Irwin 17:21, 21 April 2008 (UTC)
Well, it was just a few months ago that we finished deleting all of the "X the Y"-type entries, so I'm guessing that wouldn't be the preferred approach this time (though it does seem logical). As I recall, a primary justification for deleting those was that no one would ever look them up -- something which I'm afraid would apply to pretty much any other way of representing these. This is part of our larger difficulty in handling collocational information, I'm afraid. An interim step would perhaps be to have an Appendix: page detailing the behavior of the given frame (Appendix:Too X by half?), housing various and sundry usage examples. - Visviva 09:06, 22 April 2008 (UTC)
Formulas for constructions could be permitted in any space outside of principal namespace where mostly more experienced users roamed. It would help if we had some agreement on which space had which kind of content. Appendix space would seem like a good place, but perhaps a more entry-like space that allowed constructions that used a Wiktionary-standard notation would be useful. Perhaps there is a suitable commonly used notation that we could appropriate. Such "entries" might be useful link targets from principal namespace. DCDuring TALK 10:58, 22 April 2008 (UTC)
I doubt not looking them up that way is a good enough reason to delete scare the X out of, which would be found in searches or as a derived term. Sure, scare/frighten/knock the living daylights out of/the wits out of/... could be reduced to living daylights out of, wits out of, etc., which is a better way to handle those. And sure, no one would ever look up X like Y. However, there are already tens of hundreds of entries with "one" or "someone" as placeholders that would be found the same way as scare the X out of. No one would ever look up one and one's either.
The question here I think is what to use as a general placeholder for an adjective. I would propose "thus" or "such" as options, but I'm not sure even "do" is used as a placeholder, and it seems like there's a bit of stigma against creativity, which would be very unfortunate. Nonetheless, we already have placeholders, exactly as do traditional dictionaries, and there's a lingering broader question of how to demarcate them. It's not apparent in the title, but I've generally made an indication in the entry itself by italicizing the marker in the heading instead of bolding it. For instance, compare take someone's point with someone else, up one's alley with one's self. But I seem to be in the minority, since some people like to not only bold the word, but also uselessly link it.
On the other hand, Hippitrail pointed out that italicization is ambiguous in contexts like nth where the italicization is normal for part of the term. So... maybe there is a better solution for the heading, or maybe the solution requires rethinking the titles. While mind one's p's and q's would imply that "his" or "her" could be substituted, mind one's p's and q's is actually used that way very commonly, so to some extent, both entries are needed. Probably a usage note, or even just a couple of examples, one with "mind one's p's and q's" and one with "mind your p's and q's", are enough. Otherwise, how would you know?
We don't generally use "..." in titles even for unclosed fragments, I think primarily at the insistence of Connel, who has argued against even such punctuation as (s)he] and s/he. While I strongly disagree with the priciple that including punctuation is always incorrect, and in fact find the wiki software too limiting, I'm fine with eliminating it when it's superfluous. However, I'm not sure that "..." is always superfluous. It isn't polite to say, by itself, "I'd like to know." It's only polite if it precedes something else. And anyways, what happens when we take an expression like that and translate it into another language where the "..." goes in the middle?
"Too by half" needs "..." or something else in there. Never having heard the expression before today, I definitely think it warrants an entry somewhere that can be searched from "by half". However, I don't think that saying by half is good enough is good enough in the general case. This is obviously a much more general problem. DAVilla 21:45, 23 April 2008 (UTC)
Appendices and similar places outside namespace 0 are useful for information not suitable for entries for various reasons, but are not very useful for inexperienced normal users using our search box. Notation algebra isn't going to work for them either. The best we can do for them with current software without silly proliferation of phrase entries is to have good usage examples and default-searchable citations that contain the search words they are looking for in such a way as to bring the best entry to the top of the search. Without better search, this kind of entry won't be found very often. I wonder how many actually look up this kind of article -- and how many find it. DCDuring TALK 08:43, 24 April 2008 (UTC)
We could take 4 or 5 examples representing typical problems and try to analyse our way to a/some solution(s) I was thinking about don't come the X with me. bgc gives the first 5 option for X as "acid", "raw prawn", "cowboy", "tin soldier", and "orator". It will be impossible to give a typical X for this phrase, as it takes almost any noun phrase you can think of. What would a schoolchild enter if s/he came across don't come the tin soldier with me and wanted to understand it? If we can analyse that to a solution, and do the same with too clever by half, and some other specific examples, we will be well on the way to finding an answer. Regarding the too X by half, I must admit that I lean towards an entry at by half. It seems to be the logical first search, and should come up in the search list for approximate entries. By the same reasoning, perhaps an entry at don't come would also work? I find it useful to analyse to find the smallest "chunk" of meaning. -- Algrif 13:02, 24 April 2008 (UTC)
If the schoolchild knew anything about the internet, s/he would enter the phrase in Google rather than Wiktionary's search box. (and if the schoolchild didn't know anything about the internet, s/he probably wouldn't know about Wiktionary either). If we had a Concordance:Don't come the N with me or similar, including a "tin soldier" use among others, that would presumably appear somewhere in the results (though not prominently, at least not until our content improves to the point where people actually start linking to us). On reflection I think Concordance: makes more sense than Appendix: for phrasal template entries, at least in most cases; of course that will require allowing a bit more content in concordance pages than we have done heretofore. -- Visviva 13:14, 24 April 2008 (UTC)
I like both of the above.
The chunking approach is immediately feasible and may help some users right after implementation.
Concordance space would be good for this if it were part of our default full-text search or of a fall-back if the namespace-0 didn't have results.
Usage examples, usage notes, and namespace-0 citations give more searchable material both for Google (???) and our own full-text search.
I wonder if these would also increase the number of hits we would get from Google. The more entries only we cover, the more often we are on the top of their search results, the more click-throughs we get, the better we do in their algorithm (a virtuous cycle). Google drops very common stopwords unless linked by hyphen to non-stopwords. Phrases/chunks/formula(e/s) that have only stopwords are not going to be found via Google. If we could figure out a way to get people to come to Wiktionary for constructions involving mostly stop words, we would be offering something that might win us a certain type of user who would, of course, become a loyal fan because of our superior content. DCDuring TALK 17:35, 24 April 2008 (UTC)
  • I note that some phrasal templates -- or at least snowclones of a sort -- have been put on Wikiquote by our friend BD2412, using the X-Y notation. The man can cite! See for example wikiquote:An X among Ys, a Y among Xs. These do seem to get good Googles, for what it's worth; the Wikiquote page for X me no Xs was #4 in a search for "but me no buts," just above the first actual scholarly treatment. Personally I would prefer, assuming we are going to have these on Wiktionary in some form, that we use a more linguistically-aware notation such as NP/VP or simply N/V/etc., as "N me no Ns." But first, perhaps we should consider what value we can provide that Wikiquote cannot. -- Visviva 11:00, 27 April 2008 (UTC)
  • Neither the X/Y type or NP/VP type notations are going to serve inexperienced users well. How someone might learn that the possibility of such a search is not at all clear to me. Everything we put outside main namespace has second-class citizenship and will not be found by those who are not adepts. I suppose that there is value in having such rewards for becoming an adept.
    The value we create would simply be that we helped meet an expectation that someone had about what should be in a dictionary. I believe that dictionary users want help in understanding odd constructions. Certainly most dictionary have some kind of grammar and usage content. Neither WP nor Wikiquote would be my go-to Wikis for grammar and usage information. DCDuring TALK 11:45, 27 April 2008 (UTC)
Well, insofar as we are an online reference work (and that has to be how at least 99.9999% of our current users use us), most people are going to come to any given page through a search engine, portal, or direct link, not by going to the main page and typing in the search box. So in the case of a snowclone, they will probably find the page by searching the web for information on a particular instance of that snowclone, in the same way that I found BD's pages on Wikiquote. It would never have occurred to me that Wikiquote might have such a page, but in the glorious world of Web 2.0, that was irrelevant; Google did my thinking for me. Appendix pages are indexed by Google et al., so that shouldn't be a concern for us. Likewise people will be able to find the content regardless of the page title. -- Visviva 12:04, 27 April 2008 (UTC)
It would be nice to know how users actually get here. How many are just from sister project links? I don't think we can rely solely on individual-page attractiveness. We deliver branded information to a certain extent, so that users may select us from a search result page because of the good things that have happened to them on our site in the past. I would think that we would want to offer some kinds of search that Google doesn't (and can't) offer. Their orthography limitations are an opportunity. And so too might be some kind of grammar-restricted searches with "variables". "NP1 NP2 no NP1s"? Could we use categories (visible or invisible) to go in this direction for idioms and contructions? DCDuring TALK 12:26, 27 April 2008 (UTC)

Have we come to a decision about how we treat these sorts of entries? When trying to find if we have any sort of entry for the "s/x/y/" type of self-correction notation used frequently by those familiar with regular expressions (if we do, I haven't been able to find it.), I stumbled upon X one's Y off. Thryduulf 01:13, 2 May 2008 (UTC)

Certainly not yet. I would favor Visviva's NP, VP, N, V formulation for anything that didn't fit the one('s)/someone('s)/something('s) approach. That won't cover animate/inanimate and other more semantic categories without modification, but linguists must have suitable vocabularies for such distinctions that we could try out. DCDuring TALK 01:33, 2 May 2008 (UTC)
If the way of filling the blanks is kept simple, then users will be able to form them easily once they see one example, which they will eventually see somewhere in Wiktionary (synonyms, see alsos, translations, search results, maybe redirects for common searches for particular terms). They may also see it on the information desk. "User: How is (something with gaps) used? Wiktionarian: Look at (link). User: Oh, they are formatted like that, neat." I doubt that NP, VP, etc. are simple enough. -- Coffee2theorems 12:20, 3 May 2008 (UTC)
I agree. —RuakhTALK 13:05, 3 May 2008 (UTC)

The simplest possible thing would be to pick a sequence of characters that is used for every gap, i.e. instead of "X one's Y off" you'd have e.g. "... one's ... off", "* one's * off", "? one's ? off", or some such. Using non-letters would be best (if the software allows it..?), because the choice is more clearly unique. With letters there are choices such as upper/lower case and choice of letter ("X one's X off" and "X one's Y off" would seem equally plausible ways of generalizing "too X by half" to me), whereas e.g. "..." doesn't suggest any alternatives. It would also be a plus if the characters can be easily typed ("… " can't, "..." can). -- Coffee2theorems 12:23, 4 May 2008 (UTC)

I agree, though I think … (horizontal ellipsis) is fine provided ... (three periods) is a redirect. —RuakhTALK 15:37, 4 May 2008 (UTC)
What expressions are there that have the same variable in two positions? I can think of a few in English, but there must me more. "X me no Xs" is almost an unfair example, having no restrictions on X other than it being mainly a noun. The switch in PoS defeats even Visviva's approach. "X after X" (periods of time > hour; distance; repetitive task; or object of repetitive task; indeed, anything repetitive) and "X in, X out" (day, week, month, year) are the two cases that first came to mind. "X upon X" and "X by X" are similar.
I personally would prefer an approach that could handle these cases and that made clear PoS restrictions, which are common, but not always obvious from the ellipsis approach. That is a significant advantage of what Visviva had offered. We may need a more flexible framework to reflect the particular restriction on the variables such as (animate, human or near-human, mass noun, countable noun, etc.) DCDuring TALK 16:29, 4 May 2008 (UTC)

Pending an eventual resolution to this, I've started a list of entries that we should add when we decide how. The list is at User:Thryduulf/phrasal entries with variables, feel free to add any others you think of. Thryduulf 10:27, 12 May 2008 (UTC)

Search enhancement

Typing in the search box now shows you the words we have that match your typing. I like it! However, I notice that newly added words don't show up - is it running off of a preproduced list? SemperBlotto 08:27, 22 April 2008 (UTC)

To answer my own question - no, there is just a short time delay - excellent. SemperBlotto 08:37, 22 April 2008 (UTC)

Marvelous. I'd always wondered if that sort of functionality would ever come to us. The responsible dev(s) deserve a signed thank-you note. -- Visviva 09:08, 22 April 2008 (UTC)
That's awesome! (Though sadly, it gives our "misspelling of" entries greater potential to be detrimental. :-/ ) —RuakhTALK 14:31, 22 April 2008 (UTC)
Excellent. It's a very good step, keeping us competitive with the well-funded sites. Something like "soundex" search or a list of aliases would be a wonderful next step for the many instances where user doesn't enter a spelling we have. It might be more important for us than for others in the WMF ambit. DCDuring TALK 14:41, 22 April 2008 (UTC)
Yeah, pretty neat. I'm not sure how soundex works, but definitely it would be good to allow people like Hippietrail to tamper with this, letting it search the "did you mean" results instead of just the page titles.
One problem though, it's rather a pain to do a search when the lists drops because it covers the search button. It only applies if what you've typed is a prefix to something else, but that condition is pretty easy to meet. Maybe it could "drop" up instead of down? DAVilla 21:55, 23 April 2008 (UTC)
Is this CSS-adjustable? -- Visviva 13:26, 24 April 2008 (UTC)
It's miraculous enough that this much has happened; I wouldn't put too much hope in the possibility of future improvements of the same kind. (For example, I wouldn't assume that the DidYouMean extension will be approved this year or this decade.) But in any case I don't think that w:soundex would work for the default search box, since the sounde algorithm is limited to English. On a hypothetical future version of Special:Search, with language selection etc., it would be a great addition. But if we want something like that, which involves front-end functionality rather than anything at the content end, it probably makes the most sense to set up a demo mirror of our own. Got cash? -- Visviva 13:26, 24 April 2008 (UTC)
If we could get our usage up, perhaps we would be more useful to WMF for fund-raising and more "deserving" of technical attention. WMF did get a $500K from Sloan Foundation recently. I wonder what we could do that would help in that regard in terms of identifying funders whose interests coincide with where we might want Wiktionary to go. UK Prime Minister on recent visit to US spoke about the English language as a tool of joint national interest with US. Maybe govt. money has too many rules for WMF and is seen as tainted and insufficiently international, but there should be suitable funders somewhere. DCDuring TALK 17:16, 24 April 2008 (UTC)

Well, it seems to me silly to re-invent wheels (however fun it is ;) so I have wrapped aspell in some python and added a callback to WT:PREFS. If you want to test this feature then go to WT:PREFS and enable "aspell on http://devtionary.org (WARNING...". This is not a feasible long term solution, and if aspell turns out to do the right thing then I will implement it as a proper extension for MediaWiki. The javascript code is User:Conrad.Irwin/aspell.js. Known problems with the current implementation: It is very slow (this is because devtionary has to query the wiktionary API to provide colourful links, aspell is plenty fast enough ;), It only supports English (this is a limitation in the current installation, not aspell or the python script in general). I would appreciate comments on how well aspell performs, and any other ideas people have for doing this kind of thing.Conrad.Irwin 10:53, 25 April 2008 (UTC)

To test what this does, visit a misspelled page (i.e. http://en.wiktionary.org/wiki/alhpabet ) or do a full text search for a word (i.e. Special:Search/hunderd ). Conrad.Irwin 12:26, 25 April 2008 (UTC)

"misspelling of" template

The {misspelling of|} template does not allow wikilinks within the template. Entries with this template are usually simple and do not contain other wikilinks, therefore they are not included in the page count. Is this by design? --Panda10 00:00, 23 April 2008 (UTC)

Yes, this is by design. A number of editors here feel that misspellings aren't really words anyway, so they ought not to count towards our total number of entries. --EncycloPetey 00:02, 23 April 2008 (UTC)

alternative spellings of only some sense

nonpartisan had an alternative spellings section for non-partisan and an adjective section. Then I added noun section. But the alternative spelling only goes with the adjective (I think). How do we indicate an alternative spelling for only POS? RJFJR 16:35, 24 April 2008 (UTC)

I'm pretty sure that there was recent discussion about this. BP, TR? DCDuring TALK 18:26, 24 April 2008 (UTC)
IIRC, when we voted on the order of L4 headers, this case was considered. The Alternative spellings header may occur at L4 when it is specific to only one part of speech. --EncycloPetey 21:40, 24 April 2008 (UTC)


Election Notice

The 2008 Board election committee announces the 2008 election process. Wikimedians will have the opportunity to elect one candidate from the Wikimedia community to serve as a representative on the Board of Trustees. The successful candidate will serve a one-year term, ending in July 2009.

Candidates may nominate themselves for election between May 8 and May 22, and the voting will occur between 1 June and 21 June. For more information on the voting and candidate requirements, see <http://meta.wikimedia.org/wiki/Board_elections/2008>.

The voting system to be used in this election has not yet been confirmed, however voting will be by secret ballot, and confidentiality will be strictly maintained.

Votes will again be cast and counted on a server owned by an independent, neutral third party, Software in the Public Interest (SPI). SPI will hold cryptographic keys and be responsible for tallying the votes and providing final vote counts to the Election Committee. SPI provided excellent help during the 2007 elections.

Further information can be found at <http://meta.wikimedia.org/wiki/Board_elections/2008/en>. Questions may be directed to the Election Committee at <http://meta.wikimedia.org/wiki/Talk:Board_elections/2008/en>. If you are interested in translating official election pages into your own language, please see <http://meta.wikimedia.org/wiki/Board_elections/2008/Translation>.

For the election committee,
Philippe Beaudette

trans gloss in morna

This word is listed in Category:Translation table header lacks gloss even though there is a gloss. The structure looks fine to me. Can you take a look? What is it that I don't see? Thanks. --Panda10 11:00, 27 April 2008 (UTC)

Fixed; just a top where a middle should have been. -- Visviva 11:02, 27 April 2008 (UTC)
Thanks! --Panda10 11:32, 27 April 2008 (UTC)

{{compound}}, {{suffix}}, {{prefix}}

I recently made {{compound}}, modeled after {{suffix}}. I think we ought to promote these templates, since they offer a possibility to keep etymology sections consistent and uniform. However, maybe a little more information than just a ‘+’ would be good. Therefore here a call for better wordings for those templates, maybe in the style of belofteploeg, where I didn’t replace the etymology by the template (yet).

Hoping for your input (but feel free to implement it yourself, I’m not frequenting this page anymore)! H. (talk) 17:17, 27 April 2008 (UTC)

Comparable to {{blend}}, which has a more specialized role. Might allow more automatization of derived terms and, with lang parameters, enable identification of macaronics. DCDuring TALK 16:43, 4 May 2008 (UTC)

Petition on Meta

Hello,

I would like to notify you of a petition against the recent decision by the board to reduce community representation. Please find it here. I am sending this message to most English Wikimedia projects as I think it is important the community is informed. If you have any questions please ask me at my Wikinews talk page.

Thanks,

Anon101 (on Wikinews) 20:23, 28 April 2008 (UTC)

(Note- I did not create the petition)

That petition gives half of one point of view and no place to voice opposition... Where do the people who are pleased that the board is looking to add professional voices to the discourse in order to make the most of the contributions that the community generate? I don't like one sided politics. - TheDaveRoss 20:30, 28 April 2008 (UTC)
I suppose that periodically or as the occasion warrants, we might remind people that Wikimedia Foundation ("WMF") provides the umbrella for us and all our sister projects.
For those interested in the governance of WMF the and the issues that it deals with, here is the contact information for the mailing list:
  • foundation-l mailing list
<mailto:foundation-l-request@lists.wikimedia.org?subject=subscribe>
Unless there is an issue that is specific to en.Wiktionary, or wiktionaries in general or all of WMF's en sites, the discussion is best carried out there. DCDuring TALK 20:55, 28 April 2008 (UTC)

Rohingya (cit) split

News flash! SIL has split cit into rhg (Rohingya) and ctg (Chittagonian) [15]. We'll need to update the appropriate templates and Category:Rohingya language. --EncycloPetey 01:10, 29 April 2008 (UTC)

Now that is unusual, (a split, usually it is additions) apparently because "cit" was an error to begin with? You changed {{cit}} to "Chittagonian", which puts things in a non-existent cat; all of the existing entries label themselves as "Rohingya" (although most are User:Drago ...). I've fixed it to redirect to {{rhg}}, on the way to orphaning it. (This will show up in my language templates table as something to be fixed.) Robert Ullmann 12:06, 1 May 2008 (UTC)

Sorting clicks

yet another trivial issue on which much heat can be expended ...

I've notice in working out the implementation of sorting translations tables in AutoFormat that humans have put !Xũ under X, rather than at the top of the table, where the simple code order would place it. This seems reasonable. Do note that the "!" is a click, not punctuation. This would also apply to ǂHõã, ǀXam, etc which would otherwise sort at the end (IPA characters, "ǀXam" starts with an IPA dental click, not a vbar/pipe). And doing the same sort for language headers in an entry. Robert Ullmann 09:55, 30 April 2008 (UTC)

Makes sense. Dictionary sorting ought to consider letters only, ignoring case, punctuation, and spaces. I realize these clicks are more significant than most punctuation in their native language, but to English-language readers they are not letters. Michael Z. 2008-04-30 18:51 Z
On second thought, we are alphabetizing text in many languages and foreign scripts. Is there a native sort order for these symbols? Is there any reason not to use the default Unicode collation algorithm for all places where we have mixed languages? Michael Z. 2008-04-30 18:55 Z
That's not exactly true. There are places where we alphabetize text across language and script (e.g. at category pages), but the language names in the translation tables are supposed to be English names in the Latin script. I think we should probably use the Ethnologue name (Kung-Ekoka), but if there's a good reason to use !Xu as our name for it, then we should do so, and IMHO we should ignore the ! in collating, just as we do spaces and punctuation and whatnot. (Likewise if there's a good reason to use !Xũ, but that strikes me as unlikely.) —RuakhTALK 20:09, 30 April 2008 (UTC)
Quite right. (I have since had my morning cup, and see that collation depends on the context)
I didn't realize we were talking about !Kung, the name I have heard before, and what Wikipedia calls it. Michael Z. 2008-04-30 21:55 Z
Yes, among others, and we use "!Kung" in the language template. There are others besides clicks, such as 'Auhelawa. I've added a line to the collation order lambda in AF. Robert Ullmann 11:56, 1 May 2008 (UTC)

People might also be interested in the description of what AF has been taught to recognize in tables at Category:Entries with translation table format problems, specifically the handling of grouped languages and subsidiary lines with qualifiers, e.g. doing things like:

(at butterfly) where either * or ** can be used with a language name, this makes it easy for applications parsing wikitext as they can treat * and ** identically, and expect the full language name. (And we don't have to have "Greek, Modern" ;-) Subsidiary notes that are not languages use *: as with

(although the Serbian things really ought to just be on one line ;-) The stuff described at the cat page is not policy, just what I've found that seems reasonably structured and useful. AF has been tagging things for a few days to see what will be found. Robert Ullmann 11:56, 1 May 2008 (UTC)

On the last example, I agree that it should be one line. It's not a transliteration, so it shouldn't be parenthesized, but it could more easily be listed as we do with simple and traditional Chinese:
Is it really necessary to inform the world that the first are Cyrillic characters and the latter Roman? DAVilla 20:53, 19 May 2008 (UTC)

<section end="archive_april">

May 2008

Special:ListUsers, Does this bug anybody besides me?

I think the first page of the master user list is rather unfortunate. Yes, the user's account was deleted--so the link's red now, if it used to be blue. Can't any more be done? Is there an überdelete? If not, how about resurrecting the user, moving the account to a plain vanilla name, and then deleting again? Snakesteuben 07:11, 1 May 2008 (UTC)

  • There doesn't seem to be a "delete user" function (I was asked to delete one yesterday). All we can do is rename them (but they can then be recreated). SemperBlotto 07:24, 1 May 2008 (UTC)
Block user --> Prevent account creation won't keep it from being recreated? If the page and user functions are indeed separate, then the blocked user name shouldn't change when the page name is changed. But I admit I haven't tried it, yet... Snakesteuben 08:45, 1 May 2008 (UTC)
Edit: Nah, that doesn't work, not exactly like that, anyway... Snakesteuben 08:47, 1 May 2008 (UTC)
Well, a bureaucrat can actually rename a user; see WT:MV. (This will automatically move the user and user discussion pages as well, but as you note, that's conceptually a separate step.) So we could rename the user to something innocuous, and then add the original username to the accounts-blocked-from-creation list. Actually, we could more generally add ^! to that list and consider the problem solved. —RuakhTALK 12:18, 1 May 2008 (UTC)
It is easier to use the list at Special:Allpages/User_talk: - though that only lists user's who have been talked to. I believe there are plans to kill the thousands of dead accounts as Unified Login progresses - but 'til then we'll have to just put up with it. Conrad.Irwin 09:12, 1 May 2008 (UTC)
Ruakh: There ya go!
Hi, Conrad. That's not my concern; I haven't much use for the page myself. What about the public, or the casual contributor who isn't part of this community--and hasn't ferreted out the easier ways to do things? I think one might reasonably look at the registered contributor list to figure out who's behind all this, and then gauge the credibility of the source. If I'm right, then that page doesn't help us, and might lend credence to the anarchy-ergo-unreliability theory. 'Course if I had my way, for that very reason, the default display would sort users by UserPage.Qualifications.Impressive + Contributions * Signal_to_Noise_ratio or some such, rather than alphanumerically. ;-)
But seriously folks, yes, you and I can be expected to put up with a fair bit. But just like paper books, wikts must provide value to more than just their authors and developers. While the user interface we present to the public might not be as important as a book jacket, it's still part of the package, not irrelevant. I think anyway. Snakesteuben 15:23, 1 May 2008 (UTC)
Afaik the official mediawiki answer is that there is no user delete function and the database should not be touched because it could be dangerous for db integrity. But on my personal mediawiki i just did a "delete from user.." on the MySQL console anyways and mediawiki didnt explode, didnt seem to cause any trouble so far. Mutante 21:40, 4 May 2008 (UTC)
The question isn't whether or not it can be done, it is whether or not it will be done. Yes users can be deleted, anything can be deleted given the proper amount of legwork, but we don't generally do it without very good reason. I think that it is far easier just to rename the one offensive username from the first page of the list and move on, we can blacklist any names which are recreated if it comes to that. It isn't worth bugging a dev about it, unless someone wants to create a username_delete extension... - TheDaveRoss 21:49, 4 May 2008 (UTC)
Someone, somehow, made it go away. I am content. Snakesteuben 03:32, 9 May 2008 (UTC)

But no one knows the POS!

I just created the entry for θεπτάνων ‎(theptánōn). It's an incredibly obscure Ancient Greek word, which is only attested in a fifth century Ancient Greek dictionary of obscure and archaic Ancient Greek words, written by w:Hesychius of Alexandria. So here's the fun part: No one knows what the POS was for sure. It could be an adjective (on fire), a noun (something which is scorched, on fire), or as I expect, a participle (≈ being on fire). Now, these are close enough together that we can reasonably give it a definition, but I don't feel confident giving it a POS (hence the POS is listed as Unknown). One of the really cool things about Wiktionary is that we can include incredibly obscure and esoteric words like these. However, we may want to discuss how we want to do things when dealing with words which have incomplete information. To give an even more interesting example, take the Phaistos Disc. No one knows what any of that means. However, for ancient Aegean linguists, it's very important stuff. I want to, eventually, include all of these words on Wiktionary (Unicode is waffling over whether to encode them). If they do, Wiktionary is the perfect place to have them. We can discuss various theories as to their meaning, look at similar characters in other scripts, etc. It's a classicist's wet dream. But, how do they get formatted. Now, this is not an urgent thing, as there are plenty of less esoteric words which we still lack (like the verb ἅπτω ‎(háptō), the participle of which is given as the definition of θεπτάνων ‎(theptánōn)). However, I thought I'd throw it out there just to get the ball rolling. -Atelaes λάλει ἐμοί 23:51, 1 May 2008 (UTC)

I think you should at least put Unknown Part of Speech instead of just Unknown, without Part of Speech a casual reader would not know what Unknown refers to. Nadando 23:54, 1 May 2008 (UTC)
I would think you would take a best guess at PoS. In the leading case you mention, you have a fairly good idea, so to say unknown seems to mislead. The underlying question is which of the specialized needs of researchers can productively co-exist with the needs of the more ordinary users. In the case of the Phaistos Disc, it would seem to belong in something more specialized or perhaps the Ancient Greek Wiktionary (if it comes into being), where it would attract all of those most able to decipher the material. Frankly, if there isn't unicode, then it would seem to be more of a WikiCommons thing for the images. DCDuring TALK 01:07, 2 May 2008 (UTC)
According to Wikipedia, Unicode codes now exist for these symbols. Lmaltier 21:42, 2 May 2008 (UTC)
For really exceptional cases like this, IMO it makes sense to put scholarly interpretations in place of documented use. That is, if there is a school of thought that this is actually a genitive-plural noun, have a "Noun" heading, with appropriate qualifiers in the sense & usage lines. If some possible POS's have only been mentioned as possible interpretations (and never seriously championed), those should be relegated to the notes.
In the case of the Phaistos Disc (likewise, other undeciphered writings), I would expect us to use ===Symbol===, though what the L2 header might be I haven't a clue -- perhaps there is a better use for "Unknown". -- Visviva 14:19, 2 May 2008 (UTC)
I’d also suggest you pick one POS, which seems to be the most probable or less contested, and add a Usage notes section explaining the issue.
By all means, do include those Phaistos symbols. Although maybe an Appendix would be more appropriate. H. (talk) 08:17, 9 May 2008 (UTC)

Proposed change in ToC display

In this discussion, Conrad.Irwin has proposed some CSS code that would cause ToCs to float to the right of the entry text, instead of sitting on the left and pushing everything down. This probably would not have worked smoothly in the past, but seems to work fine now, thanks to Robert Ullmann's great work in sorting out the float properties of various floaty things (here endeth my understanding of that matter).

The specific code would be:

#toc {
  float: right;
  clear: both;
  margin-left: 5px;
  margin-bottom: 5px;
}

Since this would be a very significant change to entry display sitewide, I am posting this here to the Beer Parlour rather than the Grease Pit. Please voice any concerns or objections here. For my part I support this change, which IMO makes entry navigation significantly more straightforward. -- Visviva 15:53, 2 May 2008 (UTC)

Perhaps the margins could reflect the existing document grid. The bottom could match an image thumb (6px), or use the line-height of text (1.5em, computed as 19px in my browser). The left could use the same margin as the navigation boxes in the left column (7px), or the main column of text (12px). Michael Z. 2008-05-02 18:21 Z
I'm pro–, but I think there should probably be a corresponding #toc-float-none #toc rule-set that undoes it, and perhaps a #toc-float-left #toc rule-set that floats left instead (with appropriate margin changes), so we can have {{tocnonfloat}} (and perhaps {{tocleft}}) for cases where they might be useful. (Those are probably bad names, but you get the idea.) —RuakhTALK 21:51, 2 May 2008 (UTC)
I think this is an excellent notion. One area where I'm not thrilled with this is in big community pages like this one, where the standard-issue TOC is actually better; since there aren't many such pages, it would be easy to apply a template where appropriate. -- Visviva 01:38, 3 May 2008 (UTC)
This would be a problem for all the right-floating items we have, such as WP link boxes, {{was wotd}}, and images. If the TOC floats right, then these items either (1) are shoved left into the entry text, (2) hidden by the TOC, or (3) shoved down into the collapsible tables. Is there a proposal to deal with these problems? --EncycloPetey 22:16, 2 May 2008 (UTC)
The TOC on the right seems to work on WP without problems where it is used. However like EP I'm uncertain if it would work with our page structure - take a look at pages like head, bassoon and router. How would these work with a TOC on the right. Until I've seen mockupsthat show how these entries (or ones like them) would be with the TOC on the right I would oppose any changes to the status quo in this regard. Thryduulf 00:02, 3 May 2008 (UTC)
For pages like head, see User:Visviva/head. added: note that that matches the CSS behavior in FF, but not in IE, where it behaves weirdly.
For bassoon, the right sidebar currently renders like this: pediabox, TOC, image. Not ideal, but I think we might reasonably ask whether a TOC is needed on that page at all.
The images in router get pushed down, but not too far: on FF for me, the first image is level with sense 1, and the others stack neatly down to the carpentry sense. (hey, those images should really be in a gallery anyway.) ;-) -- Visviva 01:38, 3 May 2008 (UTC)
Here's how it renders for me in FF: any right-floating thing (image, pediabox) above the first language header displays above the TOC; any right-floating thing below the first language header displays below the TOC. Using the __ TOC__ magic word, it is possible to tweak this if it's not quite the desired behavior -- i.e., forcing the TOC to appear above or below a certain point. (Though honestly, I'd been thinking more in terms of preventing pediaboxes from grabbing this prime real estate.)  :-)
AFAICT, this doesn't affect {{was wotd}} at all. -- Visviva 01:38, 3 May 2008 (UTC)
Having difficulty getting this to work in IE6; perhaps someone CSS-knowledgeable can explain why? -- Visviva 01:38, 3 May 2008 (UTC)
I don't have IE6 to test with, but I understand that it has a bunch of problems with margins on floats, and that some of those problems go away if you set display:inline on the floating element. (This is discussed in various places online, e.g. at <http://www.positioniseverything.net/explorer/floatIndent.html>.) If the difficulty you're experiencing has to do with the margins, it might be worth trying. —RuakhTALK 04:28, 3 May 2008 (UTC)
Thanks. Actually the difficulty I'm experiencing is that absolutely nothing changes, even after a full cache dump. Tried on another computer, since this one is having issues -- still no change, and likewise when I add "display:inline". Odd; I suppose it wouldn't be a deal-breaker (since IE6 users would just get the same layout as before), but it doesn't seem right. -- Visviva 06:00, 3 May 2008 (UTC)
BTW, wouldn't clear:right make more sense for this than clear:both? —RuakhTALK 04:22, 3 May 2008 (UTC)
Update to code to reflect idea's given - this works for me in IE6. Conrad.Irwin 20:45, 3 May 2008 (UTC)
.ns-0 #toc {
  float: right;
  clear: right;
  margin-left: 7px;
  margin-bottom: 6px;
  display: inline; 
}
As this seems to be popular I suggest we give it a trial in the next few days. Conrad.Irwin 20:45, 3 May 2008 (UTC)
Popular? I see two people who've voiced support and two who've raised objections. What definition of popular does that fall under? --EncycloPetey 03:46, 4 May 2008 (UTC)
My definition 'cos I like it - I would count it 4/2 :). Anyway, there is now a new item in WT:PREFS to allow this to be previewed more easily. It should appear at the bottom of the list under the search spellchecker - if not then you will need to hard refresh. Conrad.Irwin 11:04, 4 May 2008 (UTC)
It would be helpful if you can clarify what concerns you feel haven't been addressed. To review your previous concerns again: due to the recent CSS revisions, if the TOC pushes down into the collapsible tables, the collapsible tables simply shift left (no collision). Images and boxes above the first language header render just as they do now; images and boxes below the first language header render below the TOC. In entries that already have a cluttered right margin (multiple boxes), this can get a little messy, but I would submit that those entries are in need of cleaning anyway (we have {{pedialite}} et al. for a reason). In any event, AFAIHS the TOC does not collide with anything, nothing is pushed into the entry text, and {{was wotd}} is not affected.
Anyway, I've been running this for a few days and I'm sure not going back to the old layout. I just think it would be nice if we can offer this improvement to the general user; it seems unfair to keep usability improvements to ourselves.  ;-) -- Visviva 07:03, 5 May 2008 (UTC)
  • One issue I have seen in FF: when the browser window is reduced, a {{wikipedia}} box rendering below the TOC actually blocks out some of the definition text. This seems to be a problem in {{wikipedia}}, just made more apparent by this change; however, I'm not sure which of the various style declarations therein might be responsible. -- Visviva 07:03, 5 May 2008 (UTC)

Wiktionary:About Ukrainian

I've started this language article. Michael Z. 2008-05-03 05:02 z

A modest proposal, re: bad translations on the internet

I'm sure everyone who has dabbled in ttbc has noticed a lot of misspellings, and even totally mythical translations on purported "dictionary" and "translation" sites around the internet. It seems any time one site starts a rumour, the others pick it up. And before long, they drop in a citation to "various references," which I assume means each other. Unfortunately, en wikt is frequently a member of this group. (I recently deleted one such "translation" for the second time--and it was in the main section, not ttbc unfortunately.)

Is there some policy governing how to deal with/prevent this kind of thing?

If not, what do you guys think of maybe creating either:

  1. a list or category of these things, with links to/from the affected words, or perhaps even better
  2. an entry akin to "common misspellings" for the myth word. (If we do this, we should call it something else--these aren't common misspellings, they occur nowhere except on the braindead sites, and sometimes in posted messages by obvious English speakers who were duped into using them.)

Snakesteuben 02:42, 4 May 2008 (UTC)

I like option 2, at least for the serious problem cases. If we don't actively address these misconceptions, they will keep getting added when no one is looking. There could be a standard "Common Foovian mistranslations" (or something) category generated by the template. The exact format of the template bears some thought -- should it include the preferred translation, the putative English equivalent, or both? And what should these be called? "Mistranslations" is a bit too broad. -- Visviva 09:15, 4 May 2008 (UTC)
Good ideas, Visviva. In the mean time, I think I'll start noting such things as hidden text comments next to the relevant entry in the translations sections. (I'm guessing a quick consensus is unlikely, and I'm not senior enough here to take semi-unilateral action ... though you probably are.) Winter (Username:Snakesteuben 02:44, 12 May 2008 (UTC))
Maybe something like the way I dealt with the "phobias" would work here, namely {{only in}}. See Appendix:Invented phobias and aurophobia or Category:Wiktionary pages that don't exist. Though I'm not sure how much support this has either it strikes me that the situations are similar. Conrad.Irwin 09:33, 12 May 2008 (UTC)
{{only in}} (or an enhanced descendant) could be of great use for keeping persistent bad full entries out of principal namespace and for getting some better use out of the Appendices. Addressing more of the full range of user and contributor "errors" might be valuable for reducing vandalism and speeding users toward the entries that they really need. In contrast to redirects, but like {{misspelling of}}, it compels users to note that they have made an error. This seems like yet another good use. DCDuring TALK 12:26, 12 May 2008 (UTC)

User Richardb

User:Richardb has been confronted with repeat copyright violation (see User_talk:Richardb#Citations pages) which was dumped into the Citations namespace. His responses were "I'll leave them there and let another editor format them" [16]; "since you like being the policeman so much I'm sure you'll get far more enjoyment out of doing the deletions" [17]; and "Aw sod off the lot of you. Get a life. I'm busy putting decent stuff into the Wiktionary. Can't be bothered with you boring lot. Won't be replying to this crap any more." [18].

Copyright violation is too serious for such a flippant attitude, particularly for a Wiktionary administrator. I'm now of the opinion Richardb should be desysopped (at the least) and possibly banned if this continues to be his attitude towards violating copyright law. --EncycloPetey 06:49, 4 May 2008 (UTC)

Agree. I'm sorry to say that this user's words and deeds have pretty thoroughly ruled out good faith, and appear to indicate that he poses an unacceptable risk to the project. Further, this is not the first occasion that Richardb has indicated he does not consider himself bound by community norms. Lapses of judgment or temper are one thing, but that is not really an acceptable attitude for an admin.
I don't think an outright ban is necessary, provided that Richardb lives up to his (apparent) commitment to stop engaging in copyvio. He has made valuable contributions here, and hopefully will do so again in the future. -- Visviva 07:10, 4 May 2008 (UTC)
However good a contributor is otherwise, this attitude towards copyright violations is completely incompatible with Wiktionary, and doubly so of administrators. Unless he changes his tune very quickly I don't see an option other than formally requesting he be desysopped. Thryduulf 08:11, 4 May 2008 (UTC)
Well, although I don't know if it counts for a change of tune, his remark of 22:04, 3 May 2008 (UTC) on User talk:Richardb#Citations pages seems to indicate that he does not intend to continue, although he also does not intend to clean up after himself. AFAICT cleanup is now complete in any case, so there does not appear to be an imminent threat to the project. Nonetheless, IMO the risk of having an admin with such open disregard (as it appears) for the most fundamental principles of Wiktionary is still great enough to justify desysopping. Not sure what the procedure for that is... I believe stewards look for clear local consensus, but I'm not sure if that would require a formal Vote or not. -- Visviva 09:10, 4 May 2008 (UTC)
De-sysopping would not prevent a recurrence of the specific copyvio problem and risks causing worse problems. The copyvio issue is easy to fall afoul of, speaking from experience. I'm more concerned with seemingly petulant responses to reasonably polite and even very polite feedback. We have seen some fairly disruptive behavior by some of those who feel unhappy with and alienated from the Wiktionary culture. The disruption wastes our time when it occurs, even though it is remediable. In this case "you boring lot" is a possible sign of that kind of unhappiness and alienation. It would be better to have Richardb on board and contributing than hostile and non-contributing. Realistically, we are better off to let slide troubling incidents separated by months. But the reservoir of AGF good will does decline with every incident. Ordinary contributions alone do not restore it, IMHO. DCDuring TALK 09:44, 4 May 2008 (UTC)
Hear, hear. Widsith 12:07, 4 May 2008 (UTC)
The only reason for removing the sysop flag that I can see is that were anyone to sue WMF about copyright issues the fact that he is an "administrator" doesn't look good. There hasn't been an abuse of the tools. I do agree that it would be best if Richardb at least stopped behaving the way he has been, and perhaps is willing to clean up the stuff that is questionable. I don't think that it is necessary for us to de-sysop him, if he doesn't want to play along anymore maybe he wants to volunteer to step down. - TheDaveRoss 16:54, 4 May 2008 (UTC)
I don't know. He seems to be fairly iffy now on the general topic of following community rules; for example, when I mentioned AGF to him, his response was basically a flat-out refusal to abide by it. So far, none of his willful rules violations has involved admin tools (granted, he deleted RFDO once, but that seems to have been mostly accidental), but do we think he draws a clear line there — "I'll break the rules that anyone can break, but not the ones that only admins can break"? If not, I don't see the point of waiting until he's actually abused the admin tools. Adminship is a matter of active trust, not a passive "benefit of the doubt"–type trust. Personally, I'd have preferred that we talk to him about this; but as soon as he stated openly on his talk-page that he refused to engage in further discussion of his misbehavior, I think EP did rightly in bringing this here and raising the possibility of de-sysopping. (That said, DCDuring makes an eloquent appeal for not de-sysopping him, and assuming that he does indeed stop with the blatant copyright violations, I'm quite happy to hold off until the next "troubling incident", whenever and whatever that might be.) —RuakhTALK 21:17, 4 May 2008 (UTC)
If people feel strongly about it it is worth a vote. You are right about the trust issue, and I think that there have been a few questionable behavioral issues in the recent past with Richardb. A "no confidence" type vote might give people the opportunity to voice their concerns and comments. I don't know that it would succeed, but it would bring out some until now silent voices of defense. - TheDaveRoss 21:32, 4 May 2008 (UTC)
Some people make it really easy to flip from being a good sysop to a bad one. Not that we don't have bad ones to begin with, they're just bad in different ways. Certain sysops have made me a lot less willing to take part in community discussions. Kinda sucks that people who can make wiktionary so unpleasant can still be considered worthy of their sysop powers while other transgressions are held to be more...diabolical. mwahahaha (Kinda not on the exact point, I just wanted to say this) — [ ric ] opiaterein — 13:15, 5 May 2008 (UTC)

Pinyin without tone markings

We are suffering from an epidemic of these lately. The entries added seem to fall into three categories:

  1. Entries that are tone-marking-free versions of otherwise valid Pinyin words: jinu, tiao
  2. Words of type 1 that may actually be used in English and other diacritic-averse languages: Hanyu Pinyin, Guomindang (?)
  3. Alleged Pinyin misspellings, particularly involving the letter "v" (is there some sort of variant system at work here?): lvxing, jinv

I'm assuming that types 1 and 3 should be deleted with prejudice, while type 2 should be converted to English. Is that correct? It would be nice if some guidance on these points was added to WT:AZH. -- Visviva 03:40, 5 May 2008 (UTC)

There's nothing wrong with entries without the tones, as long as the tones are specified. They're useful because you can see which words have almost the same pronunciations except for the tones. In theory, we could keep ONLY these while specifying the tones in the headword, instead of keeping entries for every different pronunciation with tone marks in the page title. (Latin doesn't specify which characters have macrons in the page title, why should Chinese specify the tones?) Note also that we don't really have a system for marking tones for Cantonese, without the numbers. So when we manage to find a good Cantonese contributor, what then? Entries like wong4fan1? Whatever decisions we make about this can't be so hasty. — [ ric ] opiaterein — 13:09, 5 May 2008 (UTC)
Well, personally I'd like to see evidence that pinyin is ever used to write Chinese by actual Chinese speakers communicating with other Chinese speakers. Here and elsewhere, I've seen claims that it is used in children's books (but nobody seems to have a specific book title or ISBN handy), and that it is or has been used for internet communication due to the complexities of encoding (but the only uses of Pinyin on Usenet seem to be by/for learners). But that's more of a general issue... At any rate, if Pinyin is really used for Mandarin, but not really used for Cantonese (etc.), then it seems obvious to me that only Mandarin Pinyin entries should be permitted here, and only in the form in which they are actually used.
Regarding Latin, I'm given to understand that the reason is that Latin has seldom/never actually been written with the macrons; they are purely a lexicographer's convention. If that's also the case for tone markings in pinyin, then by all means we should eschew tone markings. But in any event, we shouldn't get into the trap of having entries for "words" that are never used for communication in any language. We are the dictionary of all words in all languages, not the dictionary of all words in all languages transliterated into all possible writing systems. -- Visviva 13:27, 5 May 2008 (UTC)
Let's not forget that pinyin is the official transileration system even in China. Also, why discriminate against "learners" in favor of native speakers? What good is the English wiktionary going to do for native speakers of Chinese? Unless they're learners :) — [ ric ] opiaterein — 14:33, 5 May 2008 (UTC)
The point is that all words have to pass WT:CFI, which means basically that they have to be verifiably used to convey meaning in the given language. People trying to learn a language online are not a valid source of information here (they are a big part of our target user base, but that's another thing entirely). Treating interlanguage as a language in its own right makes sense in studies of second language acquisition, but it is not a very useful approach for lexicographers to take. Also, I may be wrong, but I'm fairly sure the Pinyin which is official in the PRC uses tone marks.
Anyway, sorry to have driven this off-course ... while I remain dubious of Pinyin entries in general, what I really want to know about is what the community thinks of these nonstandard, ad-hoc Pinyin entries. Is there some unique rationale for keeping these entries that would not apply to any ad-hoc romanization of any language written in non-Roman script? -- Visviva 14:52, 5 May 2008 (UTC)
I used to try to create separate pinyin entries, but no longer do so. If someone else creates a pinyin entry, I make an attempt to correctly format it (time permitting). The reason I no longer create pinyin entries is that if you create a proper Mandarin entry using simplified or traditional characters, you should be able to type pinyin into the search box, and find what you're looking for.
As for the "lvxing" spelling, it is not "legitimate" pinyin. It should be lüxing, or more correctly, lǚxíng (旅行). If I were to use a Pinyin-based IME to type 旅行, I would have to type "lvxing" in order to get what I want. Most English keyboards don't come with a "ü," so many IME's substitute a "v" for purposes of typing. -- A-cai 13:51, 5 May 2008 (UTC)
Aha! Thanks for that info. Is there a good way to note this in the 旅行 entry (and others, as appropriate)? That could help to resolve the anon's concern at Talk:lvxing. -- Visviva 13:57, 5 May 2008 (UTC)
As a Chinese learner, I often want to verify a word I've learned. For example, looking up wanshang (without tone markings, which are hard to type in the search box). So I don't think such entries should be deleted; they can redirect (except in the case when there is more than one word for a single romanized spelling). 24.29.228.33 16:21, 5 May 2008 (UTC)
Whenever you edit an entry, at the very bottom you have this drop-down list with Pinyin section that contains character with tone marks, which you can insert upon clicking. Search results should include entries with transliterations with tone marks, even when you don't type them explicitly. --Ivan Štambuk 19:13, 5 May 2008 (UTC)
No, they can't redirect, because there is every likelihood that wanshang (inter alia) is an actual word in another of the thousands of languages we seek to cover.
If the non-marked Pinyin is included in the relevant entries for real words (real Pinyin and Hanzi), those pages will appear in both internal and external search results. Would that be sufficient? -- Visviva 09:10, 6 May 2008 (UTC)
  • Tone-marked Pinyin is very hard to input especially for Chinese beginner. However, non-tone-marked Pinyin is convenient for processing. Actually, an entry is tune marked in the content of the entry (for example: tongyi).
  • User-friendliness is great, but should not come at the cost of allowing entries for non-words. There has to be a better solution. -- Visviva 09:10, 6 May 2008 (UTC)
Completely agree with Visviva. While I've never been a huge fan of having transliteration entries at all, people keep saying that Mandarin's a special exception, and I'm willing to take them at their word on that. However, there absolutely needs to be a standard. That standard should be whatever it is that people are actually using to communicate, be it with accents, without, whatever. And if both are used, then we need to pick one of them, because having two sets of transliteration entries is simply too much. -Atelaes λάλει ἐμοί 17:02, 6 May 2008 (UTC)
It is not an exception. The pinyin entries are there not because they are transliterations, but because pinyin is very often used to write Mandarin. Chats, IRC, SMS messages, email, etc frequently use pinyin (usually sans tone markings). We don't want any "transliteration" entries. We have entries for the single syllable words with and without tone markings (this is a finite set, about 1700 IIRC), we should have entries for common words often written in pinyin; this is what (e.g.) A-Cai has been doing. Robert Ullmann 14:48, 9 May 2008 (UTC)
Just to reinforce Robert's point, here is a link to a picture of a book cover (note the Pinyin without tone marks). I occasionally cite this dictionary in my entries. Another use of Pinyin, which Robert did not mention (but seems worthy of notice), is in URLs. For example: http://www.kexue.com.cn (kexue means science). -- A-cai 11:39, 10 May 2008 (UTC)
Well, people do lots of funny things on book covers. Does the dictionary also use pinyin without tone markings in the entries? That would be interesting. I have a hanja dictionary that includes pinyin (along with kana), but it uses tone markings.
The URL argument would surely apply to all romanizations, including those of Korean, Arabic, etc. Probably not a road we want to go down. :-) -- Visviva 12:15, 10 May 2008 (UTC)
I'm not disagreeing -- I honestly don't have enough information -- but I'm troubled that in the several times this issue has been raised, not one verifiable case of Pinyin being used for authentic communication among native or native-like speakers of Mandarin has been provided. Obviously chats and IRC aren't normally archived in a durable (or even non-durable) fashion. But we don't normally accept words under these conditions. -- Visviva 12:15, 10 May 2008 (UTC)
I'm only half-heartedly defending Pinyin entries. In truth, if we have them at all, I would be more in favor of them being created by a bot (i.e. converted from a simplified or traditional entry). The fact of the matter is that a number of contributors have added Pinyin entries (with and without tones). The real question is whether we want to encourage or discourage them from contributing in this way. Personally, I've always felt that multiple entries for a single word is one of Wiktionary's drawbacks. It creates too much busy work for contributors (particularly in Chinese), and often results in multiple inconsistent entries for the same word (no matter how hard I try to sync them up). However, given Wiktionary's current technical limitations, I'm not sure that we have another good option to multiple entries. -- A-cai 12:55, 10 May 2008 (UTC)

template:hockey

Following the RFDO discussion that resulted in terms relating to field hockey now being categorised in category:Field hockey rather than directly in category:Hockey, this template now labels entries as (field hockey). Thus I propose it should be renamed to template:field hockey. I would just do this, but I'm unsure if this would cause any issues for the articles that transclude it. Additionally, I'm not certain that we would want to keep the resultant redirect. Thryduulf 12:00, 5 May 2008 (UTC)

Done We definitely should keep the resultant redirect, I would (as I said on RFD) never refer to Field hockey as anything other than "hockey". Conrad.Irwin 16:24, 5 May 2008 (UTC)
As a Canadian, I take exception. Hockey always means ice hockey, and this view is supported by the Canadian Oxford Dictionary. The primary sense is our unofficial national sport, while other sense is a mere Britishism. :-)
I suggest creating a neutral template:hockey which places entries in category:Hockey, where they can be easily found and assigned to the correct subcategory(ies). Michael Z. 2008-05-05 17:03 z
Would that category be for words that are used in the contexts of all forms of hockey, or for no words at all? Either way, the category text should state the category's purpose clearly, so that people familiar with only one type of hockey or the other don't assume it's talking about their type.—msh210 17:33, 5 May 2008 (UTC)
Good question. Do we prefer to see the general (hockey), or the wordier (ice hockey, field hockey) in a sense? Personally I think it is best to reduce the number of unique terms used, and remain unambiguous, so I think the latter might be preferable. If we see (hockey), we may not know whether a Canadian or British editor meant only the type they are familiar with, or hockey in general.
I don't know enough about field hockey to compare, but the terminology looks to have a lot of differences. Terms which overlap are goalie, hockey stick, wing, hookMichael Z. 2008-05-05 18:09 z
I don't know huge amounts about either sport, but I know a little more about field hockey. Basically they are different sports that have evolved from a common premise (i.e. a team sport the object of which is to score goals by using a stick to hit a small object into a net) - they are different enough that even for something as basic as hickey stick, we need separate definitions. I wouldn't object to keeping template:hockey as a way to categorise words temporarily until they can be sorted into the correct sport. "Field hockey" is not a term that I have ever seen or heard in the UK, so it wouldn't be intuitive to British editors to categorise their word such. I guess the same may be true of "ice hockey" in North America? Thryduulf 18:29, 5 May 2008 (UTC)
Ice hockey is heard in Canada, but rarely used, except when it is specifically needed to differentiate from the variations floor hockey (e.g. in gym class, with a light plastic puck), field hockey, street hockey, etc. The CanOD's definition of ice hockey is "= hockey 1". I believe that British-style field hockey is played, but it is unfamiliar, and most Canadians would assume that field hockey is just ice hockey played outdoors in summer = ball hockey.
I don't know if this holds true in the central and southern USA, where winter ice rinks aren't ubiquitous. Michael Z. 2008-05-05 18:42 z


By the way, it sounds like the main definition should be moved from field hockey to hockey (2). Michael Z. 2008-05-05 19:19 z

Yes it should. 19:31, 5 May 2008 (UTC)
Done, please review. Michael Z. 2008-05-05 20:27 z


One more: please review street hockey, to which I added the Canadian form. These could be reasonably combined into a single definition based on hockey, but that would be treating the two different senses as one, and worse, presenting two distinct games of street hockey as one. Michael Z. 2008-05-05 20:49 z

They all look good to me. Thryduulf 21:23, 5 May 2008 (UTC)

While I have your attention, please check the descriptions at hockey stick ("primary implement" just didn't seem that useful, and there was no indication that they were different). Michael Z. 2008-05-05 22:37 z

Interwiki links to redirects

There is some debate about whether we should use interwiki links to link to redirects on foreign Wiktionaries. In particular the issue is centered around User:RobotGMwikt, which is currently set to remove interwiki links that point to redirects, though I hasten to add that the underlying issue is far more important than the bot issue (for this discussion). As this has been raging on IRC for the last 48 hours, I hope that posting it here will help to resolve the situation.

For those who don't know, the interwiki links are used on Wiktionary to link pages with exactly the same title on each Wiktionary. i.e. our entry hello links to the French hello. An issue presents itself when the other Wiktionary has a redirect at the page title, should we link to it (on the grounds that there is definitely some information there) or not link to it (on the grounds that it is not the kind of information that people are expecting from the interwiki links). There are no doubt stronger arguments both ways, and GerardM has written a summary of his thoughts at http://ultimategerardm.blogspot.com/2008/05/robotgmwikt.html which are worth reading before entering the discussion.

I would prefer if redirects on other wiktionaries were not interwiki-linked to. If they were real words, why are they then redirects? And if they aren't real words, why imply that you can get somewhere by clicking the interwiki link? I think this will increase as wiktionaries grow. ~ Dodde 22:56, 5 May 2008 (UTC)
Likewise, I would not want links to redirects. I can imagine hypothetical cases where iw links to redirects might be desirable, but to date I have not encountered such cases except as mistakes. --EncycloPetey 23:19, 5 May 2008 (UTC)
I do want redirect-links. If another Wiktionary sees fit to make use of a redirect, then I see fit for us to respect that use, link to that redirect, etc. This is especially true because in many cases, it's fairly arbitrary which entry is the redirect and which is the main one; for example, our don’t is a redirect to don't, but another Wiktionary might well do the reverse. Would y'all suggest that our entries shouldn't link to each other? —RuakhTALK 00:31, 6 May 2008 (UTC)
Personally I think soft redirects are better suited for intended redirects, because you are able to explain why you are redirecting and let the user stop guessing, and as such they will qualify for interwiki links anyway. What a hard redirect mean is so different between projects and also from case to case, that I see no reason to include these pages into the web of interwiki links. Some interwikilinks could probably be discussed to be justified, but I am afraid alot more would not, and this would imho end with more confusion than clarity. You always have to be aware of what the redirect mean on that particular wiktionary, if any system is present at all. ~ Dodde 00:59, 6 May 2008 (UTC)
I agree with Ruakh. The whole point of linking to redirects is precisely because other wikis use redirects differently, and no one Wiktionary should dictate their use. If another wiki wants to redirect all alternative spellings, or plurals, or whatever, to a single article, we shouldn't then remove all links to those redirects, as if that wiki doesn't have the content. Likewise, in the rare cases where redirects are used on en.wikt, aside from the conversion script's little droppings, they are done to consciously take someone searching for one thing to the page where the content actually is. Dmcdevit·t 01:18, 6 May 2008 (UTC)
A good way to look at this is to view it as if you are deciding what to do on another wikt about links to the en.wikt. If you have a local entry for an idiom (one of the cases where we use redirects), but not in the same canonical (or "citation form") as the en.wikt, would you want to link to the redirect? If your entry is apple of one's eye do you want to link to our redirect? Of course, you aren't as likely to have apple of somebody's eye. Likewise if you have have Arabic or Hebrew forms that we have redirected to the forms w/o vowel markings, and so on. In the same way, when the FL wikt redirects forms or variants, we want to link to them, respecting whatever policy they have. If the FL wikt changes something, we just link to whatever they are doing (and see next section). Robert Ullmann 11:08, 6 May 2008 (UTC)
I agree with the above points to include links to redirects, since there is no way to know whether the redirect is useful or not. Of course, an option would be to immediately link to the page which is redirected to. Would that be a problem?
Note that at the same time I am a fan of sort redirects as well, it’s just that for some cases, they don’t make sense, as Robert pointed out. H. (talk) 07:54, 9 May 2008 (UTC)
I belive that each language community should be able to decide how to structure their data: if they want to use soft redirects or hard ones, what to do with alternate spellings or clitic forms, whether to include romanized entries or not, and all of these things mean that we should allow iwikis to redirects. Sure, there are going to be some mistakes. Over time, those will be fewer and fewer (we hope). Right now we have iwikis from a page which has a word in one language to a page on another wiki with no entry for the same language. That's an iwiki that's not so helpful; but no one suggests doing away with them. We just figure that over time we'll get it right. -- ArielGlenn 20:46, 14 May 2008 (UTC)
What you may or may not do in the future is a hand waving exercise.. We are talking the current state of play. Currently there are four solid reasons why we should not refer to redirect pages and your argument does not diminish any of them. GerardM 07:38, 15 May 2008 (UTC)
Forgive me, but we have two solid reasons why a Wiktionary shouldn't (in general) use them (our multilingual nature and the issue of homonymy), one solid reason why a lot of Wiktionaries have them anyway (case conversion), and one supposed reason why we shouldn't link to them (a claim that they don't really have the target entry anyway). Ariel's argument diminishes none of them because none of them needs diminishing: they range from petty to irrelevant. —RuakhTALK 22:35, 16 May 2008 (UTC)

There is no way in which you can distinction between intended and unintended redirects. This is why the argument is moot. GerardM 10:34, 13 May 2008 (UTC)

I'm not following this. Bots cannot make the distinction (presumably), but humans can; so these should probably not be added by bot, but there is no reason they cannot be added by humans. -- Visviva 11:00, 13 May 2008 (UTC)
If you suggest that all interwiki links are to be created by humans I think you are completely right. In that case we do not need to argue about the algorithm used by bots. GerardM 13:51, 13 May 2008 (UTC)
So is a middle path here to allow people to add iw-links to redirects manually in specific cases, while bots shouln't add, nor remove iw-links to redirects? It seems the argument to include iw-links to redirects is that we might miss a useful link here and there, but the negative effect of adding alot of "false" iw-links seems at the same time to be completely overseen. Allowing to add this manually will give the positive effect that only iw-links are added where there is a good reason, and where there is not we become without alot of "crap" iw-links. (Regarding the bot, is this kind of extinction possible/easy to implement?) ~ Dodde 14:32, 16 May 2008 (UTC)
If in many Wiktionaries, "bad" redirects continue to exist erroneously after ConversionScrtipt, iwiki linking to them will draw attention to that and in the long-run improve quality. There's a reason red-links are red; it draws attention to ways to possibly improve Wiktionary. Interwiki links to unwanted redirects is not desireable, but the problem is that the bad redirect, not that we link to them. Don't hide the error, let each Wiktionary determine how to use redirects, have the wiki bots link to them, and when we iwiki to a "bad" one, let someone clean it up. --Bequw¢τ 15:50, 17 May 2008 (UTC)
I have given this some thought, and it is possible that my mind has been affected by narrowness to some extent. I have taken in some of the arguments for allowing redirects to be iw-linked to, and all-in-all I think I agree more than disagree now, that redirects should be linked to, given the variaty in how different language editions of Wiktionary choose to use their redirects within the project. It's not just a matter of using the character ' or ´, but the way of presenting some words in determined form or not (like some languages names: Canarias - or - Las Canarias etc.) - probably there are quite a few examples of likewise differenties between language editions of Wiktionary. I understand it was quite some time since this was discussed, but since I was one of those arguing against iw-linking to redirects I felt appelled to acknowledge my change-of-mind. ~ Dodde 03:03, 14 July 2008 (UTC)

Interwicket and Arabic

Why is Interwicket removing so many interwiki links to ar? I've checked, and the links do not exist, so the bot seems to be functioning properly. Did a mass deletion happen at ar.wiktionary.org? Anyone know what's happened? --02:06, 6 May 2008 (UTC)

ar:User:Lord Anubis did remove a large set of entries, all capitalized forms (Destiny, Comoros, etc) so the bot is functioning properly. Why they were removed is not known (the edit summary is "Bot: deleting a list of files", very helpful); I've dropped the user a note expressing curiousity. They were not uc→lc redirects: the would-be lc targets don't exist in at least some cases I've looked at. (e.g. ar:destiny doesn't exist) In any case, not our problem? (;-) Robert Ullmann 10:46, 6 May 2008 (UTC)
The entries I'm noticing are for proper names of stars (e.g. Algol, Deneb, etc.) and constellations (e.g. Cancer). And earlier today the link to Deutsch disappeared. Do you suppose they've eliminated capitalization althogether? --EncycloPetey 13:22, 6 May 2008 (UTC)
There are lots of words, all capitalized, see log there. But not redirects, since lc form isn't there (deleted Crossover, but no crossover). So it was some sort of content page? Perhaps a bunch of stuff imported a long time ago that they decided to just trash? No way to tell. Robert Ullmann 13:32, 6 May 2008 (UTC)
We were removing content that was imported from GPL-licensed lists, because GPL is not compatible with GFDL.
We are trying to get these lists licensed under a dual GFDL/GPL license, but till this is done, we cannot use the content on Arabic Wiktionary.
It's all under control :).
Oh and btw, the edit summary was initially "Deleting GFDL-incompatible files", but my bot goes crazy sometimes. :).

--Lord Anubis 15:43, 7 May 2008 (UTC)

Thank you for the explanation (:-) Robert Ullmann 15:44, 7 May 2008 (UTC)
Oh ye, and next time we add them, we 'll ensure that they are not unnecessarily capitalised.--Lord Anubis 15:45, 7 May 2008 (UTC)

proposed vote on inclusion of WMF jargon in the main namespace

Per a discussion on RFV, I have proposed a vote on the inclusion of WMF jargon into the main namespace. Please change its wording as needed and comment (there, not here).—msh210 19:02, 10 April 2008 (UTC)

I've now modified it; please have a look.—msh210 18:26, 1 May 2008 (UTC)
And now it's live.—msh210 16:04, 8 May 2008 (UTC)

Appendix:List of Latin phrases

This is useful, but should it be moved to Appendix:List of Latin phrases in English? Which is what it seems to be. Widsith 11:42, 9 May 2008 (UTC)

Only if these phrases are not used in other languages. I can't say because I'm uninformaed on the possible use of these phrases in French, German, Polish, etc. However, on a quick look, they seem to be phrases that would likely have been used either in conversational Classical Latin or written Latin of the medieval and later periods. --EncycloPetey 13:39, 9 May 2008 (UTC)
To call it something other than its current name seems premature. What we have is a list of Latin phrases some of which may sometimes be embedded in English text, with English translations and English commentary. I consider it a useful document for adding new entries and for facilitating certain kinds of searches. It is likely to have other uses. Verification that the majority of the phrase appear in English, let alone other languages, is not available.
If the entries for the listed items are done under the Latin L2 heading, should we indicate that the headword was commonly used in English? Does that fact need to be attested? DCDuring TALK 14:08, 9 May 2008 (UTC)
  • A list of phrases which simply existed in Latin would already seem to be covered by Category:Latin phrases. Whether or not they exist in French/Polish etc does not seem to be addressed by the Appendix, which translates them into English and explains them in English. So it seems to me that the Appendix is designed to list all the Latin phrases which are used by English writers, and its current pagename is confusing (to me at least) because of its apparent crossover with Category:Latin phrases. Widsith 15:43, 9 May 2008 (UTC)
    How did you conclude that the Appendix is designed for that? Solely because it explains them in English? This is the English Wiktionary, so everything is explained in English. I don't see any evidence on the Appendix page that indicates the list was developed specifically for terms that appear in otherwise English texts. --EncycloPetey 17:58, 9 May 2008 (UTC)
OK. So does that mean its content will be the same as that of Category:Latin phrases? Widsith 18:03, 9 May 2008 (UTC)
No. The main namespace requires entries to have 3 citations (or 1 in some cases), and forbids entries that are mere sum of parts. An Appendix is often freer in what it permits, and may include items that would not be included in the main namespace. --EncycloPetey 21:47, 9 May 2008 (UTC)

Yeah, I don't know if there is a limited source or sources, but the list appears to be a compilation of phrases used in English as well as specific mottos and quotations (e.g., "cave canem" from a Pompeiian doormat). I suspect it's too late to apply a specified scope to this large list, but it could be split off into other more specific lists, if someone wants to take it on as a project. Best just to let it continue to grow, and continue to apply the normal attestation requirements on Wiktionary entries for both Latin and "English" Latin terms. Michael Z. 2008-05-09 22:59 z

  • The reason I asked is because you can get those dictionaries of Latin terms in English, and I thought this was maybe our own useful version of such a thing. But apparently not. Widsith 07:38, 10 May 2008 (UTC)

Fixing wikisyntax typos

I've created a punch list of mis-matched ( ), [ ], and { } in entries; they are often very hard to notice even when looking right at them. If anyone would like to help fix them, see User:Robert Ullmann/Mismatched wikisyntax and of course comments and suggestions are wanted. There may also be other things it can look for. Robert Ullmann 14:36, 9 May 2008 (UTC)

in vacuo

I have made an initial entry under the English heading. But our collegue EP suggests there could be a case for a Translingual header. What is the general opinion? Is this term widely enough used in most languages? -- Algrif 14:06, 10 May 2008 (UTC)

Google finds a few instances of the phrase in German, Dutch, and Italian Wikipedia, so it's possible that it is more widely used (although the vast majority are on en: and la:).
Is there a guideline with our definition of translingual: how many languages do we need attestations in to call it translingual, rather than simply a Latin borrowing into several languages? Or do we reserve the designation for things which are more inherently universal, like chemical symbols and proper names of species? (see children of Category:Translingual.) Michael Z. 2008-05-10 19:05 z
Well, it can't be a Latin borrowing if it's not a set phrase in Latin. In Latin, this entry is merely sum of parts, and so would not merit an entry. However, if it occurs in the middle of texts of various languages in this set form, then we have a case for a Translingual entry. There are a lanrge number of chemical symbols and scientific names of taxa that are translignual, yes, but also some abstract symbols, numbers, and some abbreviations and codes. There are also a few phrases or abbreviations of Latin origin, like sp., spp., etc. that have been adopted into many languages. --EncycloPetey 19:15, 10 May 2008 (UTC)
Perhaps it was a set phrase in the sciences, when European scientists still spoke and wrote in Latin. Michael Z. 2008-05-10 21:48 z
WT:ELE says "this heading includes terms that remain the same in all languages. The symbols for the chemical elements and the abbreviations for international units of measurement are but two examples of translingual terms" (my emph.). We should find attestations in a diverse selection of widely-used languages, say, Chinese, Spanish, Arabic, Hindi, and Russian, before we can conclude that it is truly translingual.
I guessed that in Cyrillic it might be ин вакуо, but only found a single Russian citation on the web, which appears to be quoting some Latin text. Of course, it might be Cyrillicized differently. Michael Z. 2008-05-10 19:20 z
Hm, things like chemical symbols, math, taxonomic names, metric units, internet top-level domain names are truly translingual, used everywhere. Perhaps etcetera is too, but I'm skeptical about things like sp., spp., for which other languages have their own names (e.g., Ukrainian вид, Turkish tür). Michael Z. 2008-05-10 19:28 z
I think we have interpreted Translingual to include terms in "scientific Latin" that have achieved some acceptance in the international scientific community. This doesn't seem clearly consistent with the phrase "in all languages" in WT:ELE.
This works pretty well for the taxonomic names and possibly for the language used to describe species and specimens. The extension to a term like "in vacuo" is a more modest stretch from the descriptive language used in botany. OTOH, EP has instructed me that the adjectives used in species names (eg, multiflora, latifolia, carolinensis) are Latin, albeit New Latin, not Translingual. DCDuring TALK 20:41, 10 May 2008 (UTC)
Oops, now that I think of it, Russian documents would probably write in vacuo in Latin characters, since they are not as foreign as Cyrillic characters are to English readers. I don't really read the language, but there appear to be a few cases of this in the first couple of pages of Russian-language search resultsMichael Z. 2008-05-10 21:33 z


  • Surely it will have different pronunciations in different languages? Widsith 20:49, 10 May 2008 (UTC)
    How is that different from the chemical symbols for the elements? In English Hg is pronounced [eɪtʃ dʒiː], but this is not how it would be pronounced in French or Spanish. The scientific name for the Asteraceae (sunflower family) is pronounced differently in different countries as well. The "Translingual" label indicates only that the written form is common to many languages, and does not speak to the pronunciation. I know of no Translingual entries that wold have the same pronunciation in multiple languages. --EncycloPetey 21:08, 10 May 2008 (UTC)
We have:
  1. unpronouncable or unpronounced (g2g) entries
  2. symbols (eg letters, digits) that do not have their own pronunciation, instead taking it from the associated word
  3. multiple pronunciations for the same word in the same language.
I don't think that pronunciation has enough muscle to determine this. DCDuring TALK 21:16, 10 May 2008 (UTC)
Also note that a symbol is different from a word. Hg can be spelled /eɪtʃ dʒiː/ or simply read /mṛkjuri/ in English, corresponding to two different pronunciations, /ha ge/ or /rtutʲ/ in Ukrainian. The Latin (translingual?) term in vacuo may be pronounced something like /ɪn vækjuo:/ in English, and practically identically /in vakjuo/, if read from a Ukrainian text. Michael Z. 2008-05-10 21:33 z

Perhaps the yardstick for translinguality is when something becomes a symbol, and is released from the restrictions of pronunciation in its original language. Asteraceae is still a Latin word: Canadian /æstəɹ'eɪjsiə/ or Eastern European /asterat͡s'eja/ are still examples of people reading Latin with their own accent. But $, mm, Hg, 42, °, =, .de would be spoken in the local language, and are going to need a very large "pronunciation" section. (.com may be an exception, because it is an acronym "dot com", not "dot see o em"). Michael Z. 2008-05-10 22:06 z

  • The fact that several languages may or may not use a Latin term does not make the term Translingual. It's totally different from a Chemical symbol like Hg. Someone writing in Taino or Xhosa can use only the symbol Hg if they are composing a professional chemical document. There is no analogous situation for a phrase like in vacuo, which I simply do not believe can be valid usage in every language in the world.