Wiktionary:Beer parlour/2008/March

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

March 2008

Translation bars

What do you think: could "show" in the translation bars be moved to right after the bar title? It's now a bit difficult to find for someone who browses here but doesn't edit, especially if one has never seen that kind of bar before. It would also be great if the bar floated better with pictures; now, if there's a picture on the right and the translation bar on the left, the bar starts after the picture and leaves a blank space above. Best regards Rhanyeia 15:54, 8 February 2008 (UTC)

I think maybe it should actually go right before the bar title (instead of or as well as at the right end), but right after the bar title would be a good second choice. —RuakhTALK 00:35, 9 February 2008 (UTC)[reply]
That sounds good. :) Best regards Rhanyeia 10:53, 9 February 2008 (UTC)[reply]
I don't like this idea; having the [show] before the text would (I think) make it messier, as they are currently styled like the [edit] links, which always float to the right. I can't see how there can be that much difference in finding it (and it is the same as the 'pedia bars). Conrad.Irwin 21:10, 9 February 2008 (UTC)
I'm trying to imagine how the pages look to someone who doesn't use computers much, and who doesn't know how their layout works. That person doesn't know there's something hidden inside the bar and his eyes may just go over it fast. Translations are a very valuable part of Wiktionary pages and I think they should be found easily. I can't remember that Wikipedia would use those bars inside articles, does it? That "show" resembles the "edit" tags is not very good because they are for so distinct purposes, a casual reader might avoid clicking anywhere around the "edit" tags fearing it could change the page. I think the "show" tags could be far more distinct from the "edit" tags. If not before the bar titles, could they be after or under them? Best regards Rhanyeia 09:12, 10 February 2008 (UTC)[reply]
Assuming that we implemented a change in the "trans" and "rel" bars, how would we know that it was a good change after we had done it? I know that we should and do eat our own dog food, but we aren't representative of the larger population that we serve. DCDuring TALK 11:21, 10 February 2008 (UTC)[reply]
This problem bothers me too. What about a companion icon displayed along with the text, for example a downwards-pointing arrow alongside [Show] and an upwards-pointing arrow alongside [Hide] ? --EncycloPetey 22:11, 10 February 2008 (UTC)[reply]
I like the idea of the arrows - but does it explain the concept enough - perhaps a "+" sign would be better [show ▼] [show +] - what does Windows use for this kind of thing? Conrad.Irwin 22:21, 10 February 2008 (UTC)[reply]
But a plus (+) could still be interpreted to mean that it is for editing (add additional material). The reason for suggesting arrows is that people with poor English (or who aren't watching) might assume the bracketed text allows editing. --EncycloPetey 23:19, 10 February 2008 (UTC)[reply]
Indeed. When I created {trans-top} as a demonstration, I just borrowed the "nav" CSS. I pointed out very loudly and repeatedly that it should be designed before being put into use, but was utterly ignored by those eager to plow ahead.
The show/hide link should immediately follow the gloss, and the table not display full width unless "opened". Someone ought to create separate CSS for these (and rel-top as well) and use it. Likewise, where a conjugation/inflection table is built to be collapsed, it should not be a full width bar. Robert Ullmann 14:06, 12 February 2008 (UTC)[reply]
EncycloPetey's idea about an arrow was excellent. Were you thinking about a graphical arrow or a text arrow? I could try to create something, but does anyone know how to do these things technically so that we can try out how it looks? Best regards Rhanyeia 19:11, 12 February 2008 (UTC)[reply]
If there is an arrow character (like ▼) that will display properly in a majority of current browsers and platforms, then that would be ideal. A single text character would load faster and have fewer format pitfalls. I know that the arrow character I used in the first sentence of this paragraph will display correctly in IE6, IE7 and Safari for MacOS X. --EncycloPetey 22:38, 12 February 2008 (UTC)[reply]
See http://www.alanwood.net/unicode/arrows.html and take your pick, then finding out whether it will display on IE6 will be harder. Conrad.Irwin 22:46, 12 February 2008 (UTC)[reply]
It isn't hard for me to check IE6, since I'm forced to use it at work. I checked the page and nearly all fail to display in IE6; only the first six items display, as well as item 8616. Personally, I prefer 9650 and 9660, the black triangles, which are included in WGL4. --EncycloPetey 02:28, 13 February 2008 (UTC)[reply]
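For readers who want to see which characters the decimal numbers above refer to, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration; the script only looks up the code points in the Unicode database and says nothing about whether a given browser or font can display them.

```python
import unicodedata

# Decimal code points mentioned in the comment above.
CODE_POINTS = [9650, 9660, 8616]


def describe(code_point: int) -> str:
    """Return 'decimal  U+HEX  character  NAME' for one code point."""
    char = chr(code_point)
    return f"{code_point}  U+{code_point:04X}  {char}  {unicodedata.name(char)}"


for cp in CODE_POINTS:
    print(describe(cp))
```

The two "black triangles" are U+25B2 (up-pointing) and U+25BC (down-pointing); 8616 is U+21A8.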
A desirable design for a control that we wish to encourage people to use would be for it to be close to where their eye is and where their pointer is and fairly large. A short bar is good, but also a large target. See Nielsen and Loranger (2006), Prioritizing Web Usability DCDuring TALK 02:53, 13 February 2008 (UTC)[reply]
The triangles 9650 and 9660 sound good. I think "show" could also be a little bigger. Under the title text might also look good; then the bar would become thicker and one would pay more attention to it. Can the "show" be changed from the template itself, or is it changed from the same place as the "edit" tags? Best regards Rhanyeia 09:41, 17 February 2008 (UTC)
Does someone know which editor can change the place of the "edit" tags? He would probably know what to do with the "show" tags too, and if I remember correctly the "edit" tags have sometimes been in different places. Best regards Rhanyeia 08:36, 23 February 2008 (UTC)
Any admin who understands the necessary CSS or markup changes should be able to make them. For reference, one place that has the edit tags in a different place is German Wikipedia from what I recall (on the left instead of the right). Mike Dillon 16:46, 23 February 2008 (UTC)[reply]


Did anyone have any objections to the modification of the "show" location on the translation bars? They are so clearly good from the point of view of basic user-oriented computer interface design that it seems a shame to revert them unless there is some extremely good reason, which has not been provided. DCDuring TALK 17:07, 5 March 2008 (UTC)[reply]
Hello. I strongly object to the insinuation above that making mouse-click targets move all over the place is an "expert user-interface opinion." That, on the face of it, is a bald lie. Do you have an "[X]" in the top corner of this window? How would you feel if that were placed randomly? Placing widgets consistently is a much larger UI issue than conjoining. I know I commented somewhere when I rolled back the experiment, but I am not finding any traces of that comment now. If my initial post on it did not save, then I apologize (and am curious as to why). --Connel MacKenzie 17:07, 10 March 2008 (UTC)
There's another possibility too. They could be placed under the title text, either on the left or in the middle of the bar. Best regards Rhanyeia 17:13, 10 March 2008 (UTC)[reply]

So Connel MacKenzie has pointed out one downside of the "show" tag being right after the title text: the tag wouldn't be in the same place in all bars. Could we explore the possibility of putting "show" on the left side? How would it be on the left and under the title text? Any opinions about this, please? :) Best regards Rhanyeia 16:50, 11 March 2008 (UTC)

For those who have not been watching the Grease pit: there was also a discussion about this there a little while ago. Personally I think "show" was quite good after the text, but it might be even better on the left side one way or another. At the beginning of this conversation Ruakh suggested that it could be before the title text, and Conrad.Irwin thought that might be messy. Conrad.Irwin, after these discussions, do you still think so if there was "show", a little space, and then the title text? There's also one more way to do it. On the vote page the title texts are centered. If the translation bar title text was centered and the "show" was on the left side, it would look almost like the vote page now, except that the "show" would be on the other side. How do these things sound? Best regards Rhanyeia 17:59, 13 March 2008 (UTC)

Placing "Show" immediately to the right of the text is a superior solution for users who are beginning to use the system, I think. As I noticed and continue to notice in my own experience, it doesn't work very well for those who use the system heavily, ie, us (regular users). Given a choice (in WT:PREFS or "my preferences"), we could set our preferences to support our needs. Anons have no persisting choices and inexperienced registered users mostly don't know they have choices. To the extent that we can do so without overtaxing our tech experts, the servers, and user patience (download times and other latency delays), it would be nice to have accommodation for each broad group of users (anons/new users, registered users (mostly newish), admins and other regulars, experts). But the needs of anons and other new users would and should determine our defaults. In the absence of any specific data about their behavior on Wiktionary or Wikis in general, we are forced to resort to general principles of naive-user usability. Or we could just make all user interface choices for our own convenience and amusement until someone pulls the plug on all this. DCDuring TALK 19:16, 13 March 2008 (UTC)[reply]
It would be nice to have a 'show all' option at the top to expand all boxes at once. Pistachio 15:33, 14 March 2008 (UTC)[reply]
Both of these ideas sound good, that one could set these things from preferences and that there would be a "show all" tag too, but I don't know if there would be a volunteer to code them at least at this moment. In the meanwhile, DCDuring I agree with you that right after the text could be good, but what if we tried also some place on the left of the bar to be able to compare them? Best regards Rhanyeia 18:04, 16 March 2008 (UTC)[reply]
I didn't mean to suggest not to do that. In some ways it is a perfect solution, providing both predictability for experienced users and obviousness for inexperienced users. DCDuring TALK 18:18, 16 March 2008 (UTC)[reply]
There is already a WT:PREF for showing all tables, and it is something that shouldn't take too long to code - though I would quite like to rewrite that section of javascript completely - I don't have time for the moment though. It is possible to override the position of the [show] buttons using personal monobook.css by adding the following lines - but whether to set it as default is a different matter. (This could easily be made into a WT:PREF if people want)
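.NavToggle {
  display: inline !important;  /* keep [show] in the text flow, next to the title */
  position: static !important; /* cancel any absolute positioning */
  float: none !important;      /* cancel any float to the right */
}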

Conrad.Irwin 11:46, 20 March 2008 (UTC)[reply]

It's great if new features regarding these things may be added to the preferences in the future. Maybe we could try to think about where the default place of "show" would be? Right after the title text has that one downside but might still be possible; on the left side before the title text has not yet been tried. Are there any objections to trying how it looks? Best regards Rhanyeia 16:13, 27 March 2008 (UTC)
I guess the next thing to do would be to try that. It takes an administrator to edit that file. Conrad.Irwin, you'd know how to do it; is this something you could try, please? Best regards Rhanyeia 14:42, 3 April 2008 (UTC)

I've fixed the current NavFrames so that this is now possible to choose with only CSS - if you like it then I'm happy to give it a site wide go, but I'm not sure it works that well.

.NavToggle {
  float: left !important;
  position: static !important;
  right: inherit;
  margin-top: 0.1em; /* To counter the 90% font size used */
  margin-right: 5px;
}

Just add the above to Special:Mypage/monobook.css. Conrad.Irwin 11:34, 7 April 2008 (UTC)[reply]

I tested it and I think that's quite good and the tag would be much easier to find. For me the template rel-top looks better with it than the template trans-top. Could they both be like rel-top? Best regards Rhanyeia 16:00, 9 April 2008 (UTC)[reply]
Thank you Conrad Irwin for fixing the templates to be similar and explaining why it's better that way. Best regards Rhanyeia 15:27, 14 April 2008 (UTC)[reply]
If this is changed to the left, how about the first bar of any entry having a "show all" tag on the right side? Would that be difficult to make? Best regards Rhanyeia 16:31, 9 April 2008 (UTC)
What Rhanyeia said. And a big thanks for this, Conrad. I hope that there are more than the two of us using it. It would seem likely to nicely facilitate display for ordinary users without making it too easy for them to accidentally click on edit. Fixed position is good. DCDuring TALK 18:04, 9 April 2008 (UTC)[reply]
Yes, thank you Conrad.Irwin. I plan to begin a vote so that the default place of the "show" could be changed. Before that, since these templates are used in the mainspace for important things, maybe "show" wouldn't need to be only 90% font size? And thank you Robert Ullmann for making the bars float better. Best regards Rhanyeia 09:24, 13 April 2008 (UTC)[reply]
Just because voting sounds like too much bureaucracy, I've gone ahead and implemented it. If people really object to it, please undo the edit - but comment here to let us know why. Conrad.Irwin 15:59, 14 April 2008 (UTC)[reply]
Thank you for testing, although it had to be changed back. I'll continue on Wiktionary:Beer parlour#"Show" tags. Best regards Rhanyeia 14:23, 15 April 2008 (UTC)[reply]

Inconsistent quotation examples

There's an inconsistency between what follows the year in these two places. In the first it's colon and the second a comma. Looks like the comma is preferred? - dougher 05:17, 1 March 2008 (UTC)[reply]

I always use the comma, since that's what's shown repeatedly on the detailed WT:QUOTE page (your second source). Wiktionary entries are all over the map when it comes to formatting quotations, but I usually change them to conform to WT:QUOTE. I don't really consider this format to be particularly elegant but it is simple (every element of the citation is separated off by a comma) and it is important to be consistent. -- WikiPedant 06:06, 1 March 2008 (UTC)[reply]
Looks like {{quote}} is a necessity; the only problem is that no-one can agree how it should work. I, by copying other entries, always assume we should use '—' after the year... Conrad.Irwin 13:20, 2 March 2008 (UTC)
Good grief, not this discussion again. No one has successfully developed a template that will do what such a template would need to do. Look at the many previous discussions on this issue; it is pointless to start it again here. --EncycloPetey 15:06, 4 March 2008 (UTC)[reply]

AOL

Aren't Wiktionary:AOL and the link from the main page unnecessary now, since the X-Forwarded-For header is provided so AOL users can be identified by their individual IP? At least that was my understanding from Wikipedia; see Wikipedia:Wikipedia:AOL. Nil Einne 06:52, 1 March 2008 (UTC)

OK, I've just unblocked. Let's see how this goes, shall we? --Connel MacKenzie 20:50, 1 March 2008 (UTC)[reply]
In response to the various IRC questions: I unblocked the forward-facing AOL proxy servers that now supposedly forward the XFF-forwarding information correctly. AOL was not "blocked" for years - it was "blocked from invisible access" while allowing all AOL users access over the https: servers. This was no big, dramatic change - individual AOL blocks should still be limited to 15 minutes to 1 hour. --Connel MacKenzie 03:03, 2 March 2008 (UTC)[reply]

Encyclopedia of Life

This online resource (at [1]) would seem to be a good source of animals, plants etc. I have added one of their featured entries - green anole. SemperBlotto 12:15, 1 March 2008 (UTC)[reply]

I take it that you mean it as a reference. We can't use their content (esp. the very nice images), can we? DCDuring TALK 19:58, 1 March 2008 (UTC)
Their terms of use page says this:
Please note that a single page may be made up of many different data elements, each covered by a different license. You are required to check to see which license applies to any portion(s) of the page you wish to re-use and to abide by any restrictions on that content. ... In most cases, EOL data partners have made content available for re-use under one of the following Creative Commons licenses: ... To identify the terms of re-use of a photograph or drawing, click on the green information button on the bottom left corner of the picture.
I spot checked a couple of images from their home page and one of them was CC-BY-NC and the other was CC-BY-NC-SA (meaning we can't use either of them). We should be able to use any images that are CC-BY-SA; in general we can use CC content that has "SA" (share alike) and does not have "NC" (non-commercial). The "BY" (attribution) doesn't make a difference as to whether or not we can use it, only whether we have to give attribution or not (which we would probably do regardless). Mike Dillon 01:49, 2 March 2008 (UTC)[reply]
Could anyone with a zoology, botany, or microbiology (not to slight viruses, molds, and fungi, et al.) background compare and contrast this to WikiSpecies? DCDuring TALK 15:57, 14 March 2008 (UTC)[reply]

plural and uncountable

Have we currently got a consistent way of marking entries where some senses are the plural forms of single nouns, and other senses are uncountable nouns?

The most recent example I've come across is hostilities, but there are many others. Thryduulf 17:20, 1 March 2008 (UTC)[reply]

See weeds. --EncycloPetey 17:49, 1 March 2008 (UTC)[reply]
Would it not be reasonable to insert a link to plurale tantum at the sense lines (there might be a template to make the format uniform.)? DCDuring TALK 19:54, 1 March 2008 (UTC)[reply]

Language index as category

I am still thinking about how to keep the language index current. Could an index page be treated as a category? Then when I create an entry, I would add it to the appropriate letter category, for example a word starting with m would be added to Category:Hungarian index m. I don't know enough about the consequences. An entry belonging to too many categories, performance issues maybe, too much work for editors, although a bot could also add the categories. Or is it easier to regenerate the index every month using the monthly dump? Thanks. --Panda10 14:38, 2 March 2008 (UTC)[reply]
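As a rough illustration of the per-letter category idea above, here is a minimal sketch of how a bot might derive the category name from an entry title. The function names `index_category` and `build_index` are hypothetical, the category naming simply follows the "Category:Hungarian index m" example in the question, and the first-character rule is a simplification (it ignores Hungarian digraph letters such as "cs" or "sz").

```python
from collections import defaultdict


def index_category(language: str, word: str) -> str:
    """Name of the per-letter index category for a word (simplified rule)."""
    first_letter = word[0].lower()
    return f"Category:{language} index {first_letter}"


def build_index(language: str, words: list[str]) -> dict[str, list[str]]:
    """Group words under their index category, sorted within each group."""
    index = defaultdict(list)
    for word in sorted(words):
        index[index_category(language, word)].append(word)
    return dict(index)


print(index_category("Hungarian", "macska"))
```

The same grouping could just as well be regenerated from a monthly dump; the sketch only shows that the category name is cheap to compute, and says nothing about the server-side performance concerns raised above.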

English dictionary should only have English words?

Aren't there Wiktionaries in other languages for words in other languages? For instance, why does the English Wiktionary have entries for être, φλόος, Աստանա, 가마우지 and many others which clearly aren't English words? Should these not be in http://fr.wiktionary.org etc. - for instance, why do we have an être page at the English Wiktionary when this seems the only natural place for it to be? After all, the English Wikipedia has no article entitled "Աստանա", as the English Wikipedia is written in English. Should this not apply here too?

Any comments are appreciated. It Is Me Here 07:27, 3 March 2008 (UTC)[reply]

Wiktionary is a dictionary written in English of all words in all languages, just as Wikipedia is an encyclopedia written in English of all topics from all language-areas. I could look up fr:être at fr.wiktionary, but I don't speak French, so I would still not know what it means. All of the Wiktionaries define all words from all languages in their own language, so a reader can access the definitions of all of them in their own language. Dmcdevit·t 07:37, 3 March 2008 (UTC)[reply]
This explanation is so perfect that it should not be lost. Could it be pasted to some help page? Lmaltier 20:42, 3 March 2008 (UTC)[reply]
Agreed. WT:NOT would also be a good place for this, with a label like "Wiktionary is not monolingual" or some such. -- Visviva 13:39, 4 March 2008 (UTC)[reply]
Done. Well, doing now. --Neskaya talk 21:50, 12 May 2008 (UTC)[reply]

Romance verb and past part. forms

Because I'm so fond of adding forms etc., and because I personally feel that our non-lemma entries could use the attention, I'm going to be designing verb-form templates for the Romance languages that I can. I'll be doing them in the same way I did {{ro-form-verb}} ({{fr-form-verb}}, {{es-form-verb}}, {{pt-form-verb}}, etc.). SemperBlotto handles Italian like it's cool, so I'm going to leave that one to him. This is the easy part and I'm not too worried about progress here. (Except in Spanish, I think our verb form entries may have gotten a little out of control here. Especially the categories.)

One of the serious things I want to see under control stat is past participles and their forms, though. Spanish pp forms I haven't seen anywhere, nor have I seen the forms listed in the base past participle entries. ({{ca-pp}}, {{es-pp}}, {{fr-pp}} and {{pt-pp}} are all good for use in inflection lines now, so this shouldn't be an issue for new entries if we all know they're there.)

Now the formatting of past participle forms is probably what needs the most help. Keene runs a bot (ăsta, de fapt) that adds French verb forms (which is awesomeness, I just want to get everything ironed out smooth before we get started for realreal). French and Catalan pp forms can still function as verbs or adjectives, so they need to go under Category:French past participle forms and Category:Catalan past participle forms. Others I'm just about certain only function as adjectives, so I don't see a problem with putting them in Category:Spanish adjective forms etc, but if anyone disagrees, I'd like to know why so we can do what's best, etc. See User:Opiaterein/Basic past participle format to check out the standardized format I'm looking at. I think it'll work pretty well translingually.

The languages I'm most concerned with, because they represent the bulk of our Romance languages that still need direction here, are:

  • French
  • Spanish
  • Portuguese
  • Catalan

If there are any concerns, I'd like to get them out now so I'm not messin' anything up, keh? :) Let's get to it — [ ric ] opiaterein14:18, 3 March 2008 (UTC)[reply]

All of them need something that clearly identifies them as minimal stubs, needing definitions, example sentences etc. Users must never get the unfortunate idea that content is somehow not allowed in "non-lemma" entries, or think they can or are allowed to remove it.
Do you mean that all bot-formed entries should have a tag which puts them in a cleanup/stub category, for example in Category:Keenebot2 entries? This is feasible of course, but I can't see how it would be helpful. --Keene 15:11, 3 March 2008 (UTC)[reply]
The inflection must be on the inflection line, the definition in English on the definition line(s). We should have a firm policy prohibiting bot creation of entries that do not contain English definitions. If they can generate all the names for inflections, they can generate the English definitions in the correct forms. If the operator is unable/unwilling to do that, he/she should not be creating them.
Please, can you give a link to such a page with the inflection on the inflection line? I've seen a few already, but I forget where. --Keene 15:11, 3 March 2008 (UTC)
Trying to clean up details in the "form of" entries is pointless, they are fundamentally wrong; they will all have to be done over again, either with a decent bot, or by hand. Robert Ullmann 14:42, 3 March 2008 (UTC)[reply]
re: Opiaterein mentions starting for "real real". Keenebot2 has already started for "real real". The bot's almost auto-added conjugations of all verbs tagged with an inflection template so far. Changing it now is possible of course, and if necessary I could cease bot activity until such a time when we're all happy with how inflected entries should look (at the time of the bot vote, there were a couple of oppose votes, so obviously it isn't perfect). However, when Robert Ullmann says that form-of entries are "fundamentally wrong", I must disagree. --Keene 15:11, 3 March 2008 (UTC)[reply]
As for past participles and pp forms, the adding of adjective definitions to them is on my to-do list. I started adding a few adjective sections at the beginning, but haven't done many since then. The same with present participles, many (most? all?) of which can be adjectives in French too. At least these ones are all together in Category:French past participles and in Category:French present participles, so when anyone wants to trawl through them it's easier. However IMHO having only these stub-form entries for forms of verbs is better than nothing, and you'd do well to find a website out there with better entries for each "form-of". Regards, --Keene 15:21, 3 March 2008 (UTC)[reply]
When the same form can be either a participle or an adjective (which is sometimes, but not always, the case), they should be listed separately.
About definitions in English: they are in English! Actually, I find there are 3 kinds of definitions in use:
  • traditional definitions: they are used for lemma forms of English words, but should also be used whenever appropriate,
  • translation definitions: one word translating the word defined + a gloss.
  • grammatical definitions: used for inflected forms. I cannot find a better way of defining the meaning of inflected forms. Take an example: aima. You cannot understand what this word means if you don't understand several concepts, and at least singular, third person, past historic (and its not obvious use) + the appropriate meaning of aimer. Impossible. Try to build a better definition (it's a challenge) and you'll understand what I mean. You could try to define all this in the page itself, but this would still not provide a definition of aima (only of aimer). You could also try to provide a translation (loved) but, clearly, this is only a translation, not a good definition, and this would not help to understand the difference between aima, aimas, aimai, aimé, aimée, aimés, aimées... as all these words have the same translation. In such cases, grammatical information is part of the meaning. Lmaltier 17:45, 3 March 2008 (UTC)[reply]
I strongly agree with Lmaltier. Non-lemma entries require a grammar-focused description of the relationship between the headword and the corresponding lemma. Rod (A. Smith) 18:03, 3 March 2008 (UTC)[reply]
Regardless of the debate regarding use of definitions in non-lemmatic entries (disclosure: I am most firmly against Ullmann's crusade), there is already a template for French; it's {{fr-verb-form}}. I've created a redirect from the name there (I believe the POS should come first, but then I think we need a naming convention for these templates, and getting agreement on style or naming issues here is even more difficult than on wp...) 18:09, 3 March 2008 (UTC)
{{fr-form-verb}} isn't meant to be used in the inflection (header) line, it's meant to show the actual forms. Check out vorbesc to see the corresponding Romanian template in action.
From this point on, I want no more talk of definitions or glosses in form-of entries. We've been over it a hundred times and it's not the subject of this discussion, thanks. :) — [ ric ] opiaterein18:42, 3 March 2008 (UTC)[reply]
Okay, that makes more sense, but there are still quite a few kinks to smooth out IMHO, though:
  • I definitely think this ought to use the {{form of}} meta-template, if only for formatting (and they need to start with a capital letter, too!), because there is no specific reason to set them apart from stuff as formatted by {{feminine of}} and {{plural of}}.
  • These should be something like {{verb form of}} (or, if limited, {{romance verb form of}}) to make their purpose clearer ("XXX of" is the format used by all such templates).
  • How about a master template? 90% of the Romance languages share the same names of verb tenses, persons and modes. Heck, this could easily be one generic template with extra tenses and modes covering most other languages with person-tense-mode agreement.
  • OTOH, I've by far favored formulations of the sort "first- and second-person" because I find them equally readable and slightly more compact (yes I know Wiki is not paper), although that could possibly be handled with a special abbreviation for the combinations (there are only 3 or 4 different ones for French). In any case, there is no absolute need to have an "I read" bit after the lemma definition: we can always have it formatted as an example if we insist upon having it (although I'd again favor not using them at all).
Oh boy that was wordy... But necessary. Currently, different editors have applied different methods of formatting and of applying this formatting, and I think we should seriously consider sorting that out. And finally, there is also an {{es-verb form of}} (which I didn't know about when I suggested that name). Circeus 22:52, 3 March 2008 (UTC)
Funny that folks should mention "standard template" and "Romance verbs all have the same stuff". I recently set up {{conjugation of}} to do exactly this. It accepts a language argument, so no new template is required to add all the various Romance language verb forms; the template is flexible enough already to handle them as it is. --EncycloPetey 02:51, 4 March 2008 (UTC)[reply]
Re: "Others I'm just about certain only function as adjectives, so I don't see a problem with putting them in Category:Spanish adjective forms etc, but if anyone disagrees, I'd like to know why so we can do what's best, etc.": Spanish past participle forms often function as verb forms: just like in English and French, they're used in forming the perfect aspect and the true passive voice. (This isn't as common in Spanish as in English or French, but it does exist.) —RuakhTALK 20:48, 3 March 2008 (UTC)
No no no, Ruakh, I know that the past participles are used as both verbs and adjectives, I mean the forms of the past participles. Preservada, escondidos, habladas :) — [ ric ] opiaterein23:48, 3 March 2008 (UTC)[reply]
I've edited my comment accordingly. :-)   —RuakhTALK 01:29, 4 March 2008 (UTC)[reply]
Participles are a really sticky problem in Romance languages. I haven't made up my mind how I'd like to see them handled. Most Latin past participles inflect and function as adjectives, but all grammars treat them as a verb form, and they are used to form certain verb tenses. I've even considered the possibility of just using the POS Participle, but that too leads to problems. --EncycloPetey 02:54, 4 March 2008 (UTC)[reply]
In French, there is no such rule. Adjectives are often created by using participles (mostly from past participles, and it is usual that only very common such adjectives are mentioned as adjectives in dictionaries, to save space). For French, at least, the simplest and best way is to mention them as verb forms and to mention them separately as adjectives.

[reset tabs] I have a separate, but related, worry. Many past participles have taken on a life of their own, and have an adjectival meaning which is wider than their verbal meaning: for an example, see tired, fatigué in French, cansat in Catalan (the meaning is pretty much identical in the three languages). I think these need to have two PoS headers, one for the adjectival meaning and one as the past participle of the verb. On the other hand, if the adjectival use is completely subsumed by the verbal use (eg, underlined), I don't see any need to list them separately. My second, more minor, complaint against ric's suggestion is that the participle forms do not link back directly to the lemma form, which for most romance languages is the infinitive. The very definition of a participle is "a form of a verb that may function as an adjective or noun" (present participles may also function as adverbs in most romance languages)—the link back to the root-form of the verb seems essential to me. Physchim62 14:13, 4 March 2008 (UTC)[reply]

Maybe I'm missing something with this participle talk. Speaking for Spanish only, past participles used as a verb only have one form (-o) unlike how it is currently listed in preservado. They only have 4 forms (gender x number) when used as an adjective (in which case we use {{es-adj}}) or as a noun (in which we use {{es-noun-mf}}) as most Spanish adjectives can be used as nouns. In the passive sense they are used like adjectives. See desaparecido for a quick try at separating out all three senses. What's the need for new templates like {{es-pp}}, shouldn't we be just standardizing use of the old ones? --Bequw¢τ 14:48, 4 March 2008 (UTC)[reply]
You understand the grammar, but the point of the discussion is that this is a regular pattern for Spanish participles. They behave almost as their own part of speech with a consistent set of rules. So, instead of discussing each participle as a verb form and an adjective and a noun, and giving all the inflections and notes for every participle, it almost makes more sense to just call it a Participle and point to an Appendix on Spanish participles to explain what that means and give all the definitions in one section. It's a bit like the way we agreed that attributive use of English nouns does not make them adjectives; it's just that English nouns can function attributively as a regular part of English grammar. Or the way that English cardinal numbers can function as both noun and adjective, but we can catch both functions by pointing out that the word is a cardinal number. So, one way to handle Spanish participles is to call them Participles. --EncycloPetey 15:03, 4 March 2008 (UTC)[reply]
Bequw just explained that, in Spanish, it's not a regular pattern. It's not a regular pattern in French either: many past participles cannot function as adjectives. And most present participles cannot function as adjectives (when they are used as adjectives, they are not considered as verb forms). Why invent misleading solutions? Lmaltier 19:32, 4 March 2008 (UTC)[reply]
I find it ironic that in accusing me of "inventing solutions" and saying I didn't read what was said, you're showing that you didn't read my post. As I said earlier, I have not decided how I feel about this because both approaches have problems. I do not see where Bequw explained anything about a lack of pattern in Spanish, so if you could show me where he said that, I'd be grateful. I did not propose a solution, I explained one possible position because (as Bequw said) "maybe I'm missing something". I therefore laid out the full discussion so we could be clear about what's being discussed. I proposed no solution. Sheesh! --EncycloPetey 02:38, 5 March 2008 (UTC)[reply]
I don't accuse you of anything. I can't speak Spanish, but I understand from past participles used as a verb only have one form that some participles are used as a verb only, and some may be used as adjectives (but I may misunderstand the sentence). This would mean that there is no general rule applying to all participles. Even if I misunderstand it, it's clear that word forms to be listed are not the same when the word is a verb form and when the word is an adjective, and this suggests separate headers. For French, I'm sure that paper dictionaries always consider participles as verb forms (and they are right) and, therefore, do not list them, but they list adjectives derived from participles (systematically for adjectives such as souriant, as readers would not be able to guess that souriante exists, and only when they are common enough for adjectives such as fumé, in order to save space). Stating that souriante is a participle would simply be wrong, nobody considers this word as a participle. I only mean that there is no need for inventing something more than current headers (this is not the same as cardinal numbers, which follow a regular, systematic, rule). Lmaltier 07:08, 5 March 2008 (UTC)[reply]
Then you misread it, didn't you? Go back and read it again. What Bequw said was that, of the forms a Spanish participle has, only one of those forms (the one that ends in -o) functions to form verb tenses; the other forms function as adjectives. Everything else you just wrote about Spanish participles is incorrect. And you're missing the point about Participles. You are beginning with the assumption that they must be forced into the existing categories of Verb & Adjective. The point of my discussion is that we want to consider the possibility that they deserve separate recognition. Consider: English nouns can modify other nouns and function like an adjective, but we don't call them an Adjective here just because they function like one. They're still nouns. The point I'm making is that participles are etymologically verbs, function like verbs, but that they also function like adjectives. If we use the POS Participle for Romance languages, then we can include the adjective function in the same POS section without having to pretend there are two separate words involved. I don't pretend that I've solved anything here, but please do consider the possibility and help discuss both the pros and cons, rather than simply dismissing it. This may be different for French, but I don't know French. --EncycloPetey 14:58, 5 March 2008 (UTC)[reply]
(Ears burning) The process of Participle → Adjective → Noun is standard (in terms of changes in the suffix), though it's not universally applicable to all Participles. Some Participles would just be weird to use as nouns, and if you used one in that case people would understand you but in the way some neologisms are easily understandable. Some Participles used as nouns have a slightly different meaning (secado is "the action of drying", not "the dried thing" that one might infer from its descent from secar (to dry)). So if a participle heading were to be used, it would be helpful if it could show in an integrated way how a word could be used in the standard POS terms. Maybe like Participle (Verb or Adjective) or Participle (Verb or Adjective or Noun). Possibly this would be in the header or maybe the inflection line. --Bequw¢τ 19:35, 5 March 2008 (UTC)[reply]
This may or may not be relevant, but it may interest you to know that Ancient Greek will definitely be using the POS "Participle." Obviously it's not a Romance language, but I thought you might find that information useful anyway. -Atelaes λάλει ἐμοί 16:36, 5 March 2008 (UTC)[reply]
I might have misunderstood the intention, too. But changing the existing entry for sucré would be wrong. In French, an adjective is not a participle, a participle is not an adjective, and they have different meanings: an adjective is about a characteristic feature of something and has nothing to do with the verb (except etymologically), a participle has a meaning related to the meaning of the verb. sucré, as an adjective, means sugared (the adjective) or sweet, as a participle, it means sugared (the participle). This is important, because it makes clear that many French sentences are ambiguous. An example: Il a sucré son café, puis a bu le café sucré. Le café était très sucré. The first use of sucré is a participle, the third one an adjective, and the second one is ambiguous (the intended meaning may be 'sweet', in which case it's an adjective, not a participle, or 'which has just been sugared', in which case it's a participle, not an adjective). Lmaltier 17:41, 5 March 2008 (UTC)[reply]

Can anyone quote me a grammar of any IE language that defines 'participle' as a distinct part of speech, not the usual catch-all term sense for 'verbal adjective/adverb' or as an unseparable compound in tense-formation? Why do you think there are none?
This is an English wiktionary and all the entries must be normalized to English senses/terminology. It would be a dangerous precedent to impose additional headers just because the editors are too lazy to provide quality content. --Ivan Štambuk 17:30, 5 March 2008 (UTC)[reply]

Wrong on all counts. (1) I have grammars of Ancient Greek and Latin that treat Participle as a part of speech. Grammars are inconsistent about how they treat participles; some treat them exclusively as verb forms, while others treat them primarily as adjectives. Other grammars treat them as if they were two separate words. Some of these same grammars will treat adjectives used as if they were nouns solely as adjectives. We should not blindly follow mono-lingual grammars when we are creating a multi-lingual dictionary, but should consider how best to treat the words themselves for the use of our readers. (2) Not all entries must be "normalized to English terminology". That argument went out the window the moment we started including African and East Asian languages. Take a look at the Japanese POS list sometime; it does not normalize to English POS categories because Japanese is not like English or western languages. (3) It is not laziness to discuss a topic and come to a consensus that fits the needs best. What is lazy is to throw out an idea simply because you assume it is a bad idea, assume the world works a particular way, and assume that you're right. The participle discussion is about how to provide quality content. --EncycloPetey 01:15, 6 March 2008 (UTC)[reply]
What are you talking about?? What Latin and Ancient Greek grammars list participles as distinct part of speech? Nouns, verbs, pronouns, adjectives...participles? Don't be silly. "Participle" in linguistics is just an umbrella term meaning "this is really an adjective/adverb/inseparable component... derived from verbal root by regular morphology, meaning exactly what verbal root does in the context of its application". If it inflects exactly like an adjective, translates in English exactly like an adjective, and is used like an adjective - it's an adjective all right. The fact that there are well-defined rules for producing participles from verbal roots does not mean that all of them are verbal forms that should be put under =Verb=, or worse, under =Participle=; when both their semantic value and inflection properties demonstrate that they act as a separate PoS, you must treat them as such.
Nobody advocates blindly following of a mono-lingual grammar, it's very rude of you to put words like these in my mouth.
I'm sure readers will be much more delighted if they saw a real definition line for a participle, not some "Xx participle of " stub. Advocating separate PoS header for participles that would only be populated with stubs cannot be a quality argument for end users.
Your argument on Japanese is what exactly? We're including non-IE languages and that fact legitimizes this linguistic perversion participle-as-a-PoS? I don't see anything unusual on Wiktionary:About_Japanese#Parts_of_speech, what specifically did you have in mind?
You wrote it yourself: "So, instead of discussing each participle as a verb form and an adjective and a noun, and giving all the inflections and notes for every participle, it almost makes more sense to just call it a Participle and point to an Appendix on Spanish participles to explain what that means and give all the definitions in one section." - this sounds like laziness to me. For languages in which participles themselves could have many different inflected forms, this would link form-of entries to form-of entries, which would admittedly make the automated generation of entries much easier, but certainly would not make it easier for users to understand them.
Sorry, but the idea of =Participle= section appears to me as silly as that of =Infinitive= and =Gerund= (there are also languages with plenty of those too). --Ivan Štambuk 16:24, 6 March 2008 (UTC)[reply]
Again, you have claimed one thing, but said differently in the same post. You said "Nobody advocates blindly following of a mono-lingual grammar". OK, so why then does my discussion of treating Spanish participles lead you to ask how this would affect other languages? You keep assuming that whatever we decide to do for one language must necessarily be applied equally to all languages, i.e. a mono-lingual grammar. No one believes this must happen except you. That is why I pointed you to the Japanese parts of speech list. Japanese POS headers include "Quasi-adjective" and "Counter word", but these headers do not have to be applied to any other language. We can and do recognize different POS headers for different languages.
I also have no clue why you think having "Participle" as a POS header would lead to stubs. If it has its own header, then it must have its own lemma; it will not be a "form of" entry, but will have definitions, inflection tables, related terms, descendants, and all the other sections a lemma would have. The only way a stub happens is if someone doesn't add the additional information. Stub-ness is completely unrelated to the question of what POS we use. Let's bury that straw man argument now. --EncycloPetey 16:39, 14 March 2008 (UTC)[reply]
Nearly all of my Ancient Greek grammars treat participles as a separate category. -Atelaes λάλει ἐμοί 01:24, 6 March 2008 (UTC)[reply]
Sanskrit has been using exactly four PoS categories for the last 2500 years (since its codification): nouns (nāman, in broader sense - substantives), verbs, prefixes and particles. In all Sanskrit dictionaries today you'll find conjunctions/adverbs/interjections/... combined in a category of "indeclinables" named "avyaya". Does that mean that we should dismiss usual English/Western terminology in favour of the Pāṇini's scheme? NO! --Ivan Štambuk 16:34, 6 March 2008 (UTC)[reply]
Ok, but the mere fact that a number of editors working on a number of different languages are considering participle as a POS should certainly indicate that this is not some esoteric POS invented by a single author. And, for clarification, in Koine Greek (the form of Greek I received my formal education in and am most knowledgeable of) participles did not function as adjectives all that often. I would say the most common usage is along the lines of the English infinitive, probably followed by a noun usage. Smyth, Black, Long, and Wallace all treat participles as separate POS's. -Atelaes λάλει ἐμοί 18:54, 6 March 2008 (UTC)[reply]
Some of the AGr. grammar books I'm browsing on books.google.com really list participle as a PoS, like this and this one, but at the same time they don't treat adjectives as a separate PoS at all ^_^ w:Lexical category claims that "It wasn't until 1767 that the adjective was taken as a separate class", so those participle-as-a-PoS could indicate obsolete terminology used by ancient grammarians, and continued in modern tradition. And all of those AGr. books have a separate chapter on infinitives just next to the one on participles, and I don't see anyone advocating =Infinitive= header. Encompassing under =Participle= both adjectival and substantival meaning of participles which they could acquire through context, and translating them as different English PoS seems utterly wrong to me. In Polish, mało (little) is classified as a Numeral (liczebnik), but not so usually in other languages; should the article conform to English or Polish notation of a numeral? Dumping all of those participles under unique =Participle= section that would be used for different things in different languages seems additional argument against to me, because it would almost certainly be the most vaguely defined lexical category of them all. In any case, the thing that should not be forbidden is promoting those =Participle= or =Verb= (form) stubs (stubs with declension tables though!) to proper full-blown =Adjective=, =Noun= or =Verb= section, with normal English gloss and usage samples. --Ivan Štambuk 20:28, 6 March 2008 (UTC)[reply]
Huh? How the translation is used in another language has no bearing on what the POS is in the native language. In Slovene, the names of languages are adjectives and adverbs, never nouns, but the translations of those terms into English is a noun. That doesn't mean the Slovene word changes POS to match its English translation. I really don't follow your arguments. A few paragraphs ago you were telling us we should use Participle as a header because nobody does that and we don't want to set a "dangerous precedent". Now that we've pointed out it is used (and has been for a long time), you are balking that it's "obsolete". So, you don't want to follow established tradition, and you don't want to try something new. So what exactly do you want to do? The grammars for Latin and Ancient Greek are not obsolete, and they have changed the way they treat POS over the years to better reflect accumulated understanding of the language. The retention of Participle is characteristic of many of the most progressive and modern Classical language texts. Please don't dismiss the work of renowned experts just because you have a bee in your bonnet about Participles. And, by the way, mało is a Determiner, or more specifically an "indefinite numeral". We actually do use Determiner as a POS here. --EncycloPetey 00:38, 7 March 2008 (UTC)[reply]
1) slovenščina, angleščina, nemščina etc. are, of course, nouns in Slovenian. Language names ending in -čki/-ski/-ški/-ský/-ский are adjectives/adverbs exclusively both in all Slavic languages and in their English translations. If you see a Slovenian adjective/adverb formatted as a =Noun=, that's a mistake that needs to be corrected.
2) I never advocated using =Participle=, you must have misread something. Yes, I'm "balking" that it's obsolete, because those same books that list "participle" as a PoS don't list adjective as a PoS at all. Every time you use =Adjective= in Latin/AGr. you're already not following "established tradition". If we were to follow traditions of particular languages, PoS header names would be a complete mess. The only possible solution that doesn't lead to chaos is to normalize everything to English.
3) I've never seen a "Determiner" used as a PoS in any Slavic language. Other Slavic languages' correspondents of mało, pół, dużo, ćwierć etc. are always treated as adjectives/adverbs. This treatment of "fractional numerals" is Polish-only. Formatting participles that are obvious adjectives as =Participles= makes as much sense as formatting Polish adverbs/adjectives as numerals.
4) Here's an excerpt from Encyclopedia of Language and Linguistics (Elsevier, 2005): p330, on Applicational Grammar: "Other items belonging to the category 'adjective' are prepositional phrases – about gardening combines with the term books to yield the term books about gardening – and participles, such as English sleeping from sleep and Russian igrajushchij (‘playing’) from the stem igra-. The primary function of verbs is to apply to a noun to yield a sentence. One secondary function is to act as an adjective, which is signaled by the participial suffix." In most other places on the Internet I found that the separate lexical category of participles is included by some authors, but at best it remains an exception rather than a general rule. --Ivan Štambuk 20:25, 7 March 2008 (UTC)[reply]
Why do you continue to assert things that are flatly untrue and which contradict each other?
(2) "those same books that list "participle" as a PoS don't list adjective as a PoS at all" This is flatly untrue. The modern textbooks on Classical languages have both parts of speech listed. "The only possible solution that doesn't lead to chaos is to normalize everything to English." This assertion was thoroughly refuted the last time we discussed Japanese and Korean grammar. You can go look at those arguments yourself, since there is no point in repeating the whole discussion here. We cannot and should not try to shoehorn all languages to fit an English model of language. If that were possible, then linguists would not have abandoned the idea of "universal grammar" as most now have.
(3) "I've never seen a "Determiner" used as a PoS in any Slavic language". Just because you've never seen it does not invalidate it. It's in standard modern English grammars, so if we standardize to modern English as you suggest, then we'll have to apply them to Slavic languages, won't we? In short, your arguments in (2) and (3) above are inconsistent. One of them will have to give way to the other. In any case, I have several Slavic language grammars that recognize "Numerals" as parts of speech, including Slovak for Slavicists by Baláž et al., Czech for English Speaking Students by Šára et al., Polish: an Essential Grammar by Bielec, A Basic Reference Grammar of Slovene by Derbyshire, and Introduction to the Croatian and Serbian Language by Magner. Likewise, I have a range of grammars for several languages (including English) that recognize the "Determiner" as a separate part of speech that includes numerals, articles, and demonstratives as subtypes.
(4) If most of the places you found on the internet have a separate lexical category, then how does that make it an "exception"? If most sites are using it, then it is the general rule. Are you arguing that, on the basis of an Elsevier Encyclopedia, that we classify prepositional phrases as adjectives? Please note that the source you quoted says that a participle is an adjective and says it is a verb that functions as an adjective. That is it is both simultaneously, and not just one or the other. That's what the POS of "Participle" means.
--EncycloPetey 03:54, 12 March 2008 (UTC)[reply]
2) Those that I've looked on b.g.c that list participle as a separate lexical category and that I've linked above really don't list adjectives as PoS. This accords with that 'pedia article that asserts that they weren't recognized as such up until recently. That the participles retained separate grouping is more a continuation of an established tradition, not because of a real necessity of doing so. Even today in Sanskrit adjectives are not really recognized as separate category from nouns (if it inflects exactly like a noun, sometimes behaves exactly like a noun - why treat it separate?). What is a pure convenience for one language's descriptive grammar's tradition cannot be used as a general argument for all languages. It's pointless to introduce =Participle= only for a couple of languages, and not doing so for all the other ones. Linguists didn't exactly abandon the idea of "universal grammar", for you see 99.999% of morphosyntactic constructions in all natural languages are describable with CFGs and nicely fit some general framework.
3) It doesn't, but maybe that indicates something, doesn't it? What is a "determiner" in English (according to w:Determiner (class) not even widely accepted term in English grammars) are really all adjectives in Slavic. You can write something like "Big likes something different." that it still makes sense, because this "big" will then have an ending that will indicate gender & case. I didn't claim that numerals weren't classified as PoS in Slavic (where did you get that?), but was just referring to this special-case Polish "partitive numerals" ("little", "half" etc.) that are really classified as adjectives/adverbs in all the other languages. That kind of chaos you get when you apply local standards as general formatting scheme.
4) Most sites are not classifying participles as separate lex. category (you again must have misread something). That OR relation in that sentence was not inclusive at all; participle can behave like multiple PoS according to its context, and each of those can be separated into its own L3 header. Just like almost every English present participle can be used as a verb, noun or an adjective. In some other languages these are not "syncretized" into one form and each of those would have different translations in their ====Translations==== section. Similarly, participles in other languages that can have multiple PoS functions must be formatted as such, and have different translations in English. It's like MxN relation.
5) My point is: you can't just treat something collectively because it's convenient to do so, under the assumption that the reader will know all the details under what conditions participle can act in what ways. I don't know what were you exactly discussing on those Altaic languages, but I fail to see the implications to this topic. Participles could have verbal, adjectival and nominal functions 6000 years ago, have lost some of them in some daughter languages, but the point remained the same: they're not some "special" PoS just because they can have multiple lexical functions depending on the context. --Ivan Štambuk 09:23, 13 March 2008 (UTC)[reply]
2) Then perhaps you should visit a library and look at physical books. I have not seen any recent grammars of Latin or Ancient Greek that completely failed to have an "Adjective" category. You are focussing only on the older books if you aren't finding adjective, which means you are ignoring modern research.
3) Clearly you do not know which words in English are Determiners. "Big" is not a determiner; it is an adjective. "little", "half", "this", "that", etc. are Determiners. It is not a "local" standard -- the behavior of determiners is a commonality among most European languages.
4) Yes, and almost every adjective can function as a noun. Almost every noun can function attributively as an adjective. But, we deliberately do not have a separate Adjective section for each attributive use of a noun, because that would be silly. The meaning is present in the noun, so we only list it as a noun. In other words, we already have situations parallel to the idea of listing under "Participle" all the various POS functions.
5) If you think my entire line of reasoning is based upon "convenience", then you haven't been paying attention to anything I've said. And what Altaic languages are you talking about? There is no point in having a discussion if you are going to keep jumping to topics that haven't been raised instead of dealing with the questions put to you. I have pointed out several times that your arguments contradict themselves, and you have yet to address that very critical point.
6) Worse, your arguments keep jumping all over the map. Let's return to something you said above: "when both their semantic value and inflection properties demonstrate that they act as a separate PoS, you must treat them as such." OK, so if Latin participles decline with a present, perfect, and future form, how can we stuff them into the Adjective category? Latin participles have tense, which adjectives do not. We therefore have inflection properties that demonstrate they act as a separate part of speech. The same is true for Ancient Greek. And, no, they don't simply function like adjectives, since they are used to form compound verbs as well, and adjectives don't do that. --EncycloPetey 16:19, 14 March 2008 (UTC)[reply]

In no way is it "laziness" to provide an accurate description of the grammars of the different languages we include. It is smallminded to pretend that everything can be neatly fitted into some form of bastardized English grammar. Physchim62 13:49, 6 March 2008 (UTC)[reply]

Yes, the description must be accurate, this is why I insist so much. I feel I must mention the 'adjectif verbal' (verbal adjective) concept, defined as adjective formed after a participle. The name is misleading, but the definition is clear, and this name seems to be fairly common in French. Try to google 'adjectif verbal', and you'll find that sites consistently insist on one point: verbal adjectives should not be confused with participles. It's probably equally true in many languages (including Spanish, as I understand it, and English), but it's especially important in French, because the singular, plural, feminine forms are often pronounced the same, and because the basic spelling itself sometimes depends on this distinction (e.g. intriguant = present participle, intrigant (same pronunciation) = associated 'verbal' adjective). If you don't clearly understand the difference, you are likely to misspell many words. Lmaltier 17:50, 6 March 2008 (UTC)[reply]
That's great for French, but more than French is at issue here. If you look in most Ancient Greek and Latin grammars, they will define a participle as a "verbal adjective". So what may be true of French is not true in other languages. The spelling is certainly not an issue in either Spanish or English. In those languages, the "participle" is spelled the same way whether it functions as a verb or adjective. Spanish participles can change their spelling when used as a modifier, but the masculine singular is the same as the spelling used in constructing compound verb tenses. So, it seems that Participle as a header may not work for French, but that only addresses one of the languages under consideration. --EncycloPetey 18:55, 6 March 2008 (UTC)[reply]
I'm happy you are convinced. I just want to add that, IMO, the only good general solution is to list all adjectives as adjectives, all nouns as nouns, whatever their etymologies. Bulgarian verbal nouns can be found in conjugation tables. Nonetheless, and fortunately, they can be found in dictionaries, as nouns, because they are nouns. Lmaltier 07:17, 7 March 2008 (UTC)[reply]
The problem with that suggestion as I understand it is that we will end up systematically having entries for adjectives and verb forms which mean exactly the same thing in English. I am happy to have separate PoS entries whenever there is a difficulty, but to make this universal seems to be overkill. Physchim62 15:14, 7 March 2008 (UTC)[reply]
I can personally assure you that defining sugared as a verb form and as an adjective is not overkill, despite the fact that the translation in French is the same in both cases. I was not aware of this use. Now, this must be done only when it is considered as an adjective (or a noun...) in the language. Churchill is considered only as a proper noun in English, so I don't propose to define it as a common noun, even if you can say like a Churchill (this is only a figure of speech, and all proper nouns can be used this way). Lmaltier 17:41, 7 March 2008 (UTC)[reply]
Two cents: in Modern Greek, participle is one of the POS, and in the grammars it's listed that way. One class of them is used in forming certain verb tenses. I want to keep the participle header for these reasons. -- ArielGlenn 09:34, 8 March 2008 (UTC)[reply]
Before making a decision, the meaning of participle should be clarified. The current definition states: A form of a verb that may function as an adjective or noun. When combined with a form of auxiliary verbs, such as have or be, they form certain tenses or moods of the verb. But this definition does not work for all participles. Take brumassé, a standard past participle in French. This word is used in compound tenses, but I cannot imagine how it could function as an adjective or as a noun (it would not make sense). I propose to change the definition to something like A form of a verb often used to form certain tenses or moods of the verb (when combined with a form of auxiliary verbs, such as have or be) and that often tends to be used as an adjective or as a noun. The important thing is that you cannot generalize. Lmaltier 21:36, 14 March 2008 (UTC)[reply]
In Lithuanian, participles are definitely a painfully distinct part of speech. The davylis participles in particular are... well, just check this out for a minute or two and you might get what I mean. They're heavily inflected and function very differently from verbs, and most of the time differently from adjectives. — [ ric ] opiaterein 11:01, 3 April 2008 (UTC)[reply]

New Free Corpus of American English

"The BYU Corpus of American English is the first large corpus of American English, and it is freely available online. It contains more than 360 million words of text, including 20 million words each year from 1990-2007, and it is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts (more information). The corpus will also be updated at least twice each year from this point on, and will therefore serve as a unique record of linguistic changes in American English."--BrettR 15:57, 3 March 2008 (UTC)[reply]

That's "free as in beer" only, meaning that results cannot be copied wholesale, but nonetheless is excellent news. I haven't tested it thoroughly, but this has the potential to be a great boon for sense-verification work. -- Visviva 13:36, 4 March 2008 (UTC)[reply]
It should prove an excellent reference source. Someone may like to create a R:Reference style template for it.--Williamsayers79 08:27, 7 March 2008 (UTC)[reply]

Stylistic recommendations for wikilinks

Is there a written consensus/guide on wikilinks where the target is different than the linking-text? I'm not so much concerned about the general case of [[x|y]] links as they are obviously useful. I am wondering about the specific shorthand of appending suffixes to the wikilinks so as to link to the "lemma" entry rather than the inflected entry. For example writing [[run]]ning to produce running or [[building]]s to produce buildings. This is really common in Wikipedia where they don't have separate articles or redirects for every word/phrase form. Here in Wiktionary, though, if the inflected entry exists isn't this kind of linking a little gauche as you think you're going to the definition of one word and actually end up somewhere else? I've seen a lot that I'd like to change. Suggestions? --Bequw¢τ 21:26, 4 March 2008 (UTC)[reply]

I always use this to get straight to the lemma form, an entry like swooning contains nothing I couldn't guess, however the meaning of "swooning" can be ascertained from the entry at swoon, hence I link <code>[[swoon]]ing</code>. This is either a problem with our form of entries, or is how the world should be. Conrad.Irwin 00:45, 5 March 2008 (UTC)[reply]
Agree with Conrad.Irwin. Links should go to the lemma, unless the inflected form is intrinsically important to the meaning. -Atelaes λάλει ἐμοί 01:02, 5 March 2008 (UTC)[reply]
I agree with Atelaes and Conrad.Irwin about inflected forms. However, I'm iffier when it comes to trivially derived terms, like nouns in -ity that we define as "the quality of being …y" and so on. —RuakhTALK 01:35, 5 March 2008 (UTC)[reply]
The other place where the link is "hidden" is when the wikilink is at the beginning of a sentence or definition that starts with a capital letter, but the entry to be linked does not. Since Wiktionary is case-sensitive, and Wikipedia is not, this is a rather significant difference between the two projects. --EncycloPetey 02:35, 5 March 2008 (UTC) I thought I'd throw this in so that someone might take what folks say in this discussion and create a guide to wikilinks on Wiktionary.[reply]
And another thing: It wouldn't hurt to recommend that wikilinks to long lemma entries go to the appropriate Level-2 (for non-English terms) or Level-3 heading (Etymology or PoS). DCDuring TALK 03:27, 5 March 2008 (UTC)[reply]

I've started Wiktionary:Links. Please contribute to it. :-)   —RuakhTALK 01:44, 6 March 2008 (UTC)[reply]
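
For anyone who wants to hunt down links of the kind discussed above, here is a rough sketch in Python. It assumes a simplified view of wikilink syntax: a target, an optional piped label, and a trailing run of lowercase ASCII letters that gets folded into the displayed text. The function names resolve_links and hidden_targets are made up for illustration and are not part of any existing tool; real wikitext has more cases (templates, the pipe trick, non-ASCII trails) that this ignores.

```python
import re

# Simplified pattern: [[target]] or [[target|label]], plus an optional
# trailing run of lowercase ASCII letters that is shown as part of the link.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]([a-z]*)")


def resolve_links(wikitext):
    """Return (target, displayed_text) pairs for each wikilink found."""
    pairs = []
    for target, label, trail in WIKILINK.findall(wikitext):
        # An unpiped link shows its target; either way the trail is appended.
        shown = (label or target) + trail
        pairs.append((target, shown))
    return pairs


def hidden_targets(wikitext):
    """Keep only the links whose displayed text differs from the target."""
    return [(t, s) for t, s in resolve_links(wikitext) if t != s]


if __name__ == "__main__":
    sample = "He was [[run]]ning past the [[building]]s, [[swoon]]ing."
    for target, shown in hidden_targets(sample):
        print(f"{shown!r} links to {target!r}")
```

A list like this only finds candidates. Whether a given link should go to the lemma or to the inflected entry is still the editorial question being discussed here.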

Us versus Les Grenouilles

One of my favorite restaurants in Paris was frequented in the years following WWII by businessmen from England, who referred to the proprietor as Roger the Frog. The restaurant is long and narrow [2], with tables for 4 on both sides; the walls are covered with photographs and many other memorabilia from the 1940s and 50s. It is now run by Roger's daughters. On my first visit, I ordered ris de veau (sweetbreads) in passable French; the woman went to the other end and said to her sister "the American tourist doesn't know what he is ordering". As she passed my table again, I said, in English: "is the abuse part of the service?" ... it is on the Left Bank, and is called proudly: Roger Le Grenouille (26 rue Grands Augustins)

I set out to find out how and why the Wiktionnaire was 20K or so entries "ahead" of the en.wikt. We have ~720K entries counted in statistics, they have ~750K. By reading the XML dumps, I figured I could find out the 20 or 30K entries we didn't have, and then sample them to see what they were. But I got a surprise.

We have 712,823 (3 March) that would be counted in the statistics. They have 744,620. So there are 31,797 entries we are missing, plus some number more because we have a different set of entries. Right?

Turns out that they have 600,508 entries we don't have. See User:Robert Ullmann/en v fr.
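
To make the surprise concrete, here is a small Python sketch with made-up titles (not taken from the dumps). It shows why a gap in totals says little about how many entries each side lacks: the gap is only the difference between the two one-sided sets. The last lines apply the same identity to the figures quoted above, on the assumption that all three numbers were counted the same way, which may not hold exactly.

```python
# Toy illustration only: these titles are invented, not read from any dump.
en_titles = {"chat", "dog", "dogs", "run", "swoon"}
fr_titles = {"chat", "chien", "chiens", "courir", "dog", "run"}

# Gap in totals, which is all the statistics pages show.
count_gap = len(fr_titles) - len(en_titles)

# Entries each project has that the other lacks, which comparing titles reveals.
only_fr = fr_titles - en_titles
only_en = en_titles - fr_titles

# The gap in totals is the difference between the two one-sided counts.
assert count_gap == len(only_fr) - len(only_en)

# Same identity with the quoted figures (assumes consistent counting).
quoted_gap = 744_620 - 712_823
implied_only_en = 600_508 - quoted_gap

if __name__ == "__main__":
    print(count_gap, sorted(only_fr), sorted(only_en))
    print(quoted_gap, implied_only_en)
```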

A very large percentage are form-of entries, not "real" entries that someone has worked on. Given that some are also form-of entries here, I'd take a WAG that they have < 100K real entries ("base forms"), while we have 406,112 WT:STAT. Which would make a great deal of sense, given the number of contributors, etc. So I'd say forget about chasing fr.wikt, we are way ahead. (Reminds me of the cold war, in which the US built thirty thousand nuclear warheads out of fearmongering that the Soviet Union was getting ahead, when all the time it was desperately far behind ...) Robert Ullmann 14:50, 5 March 2008 (UTC)[reply]

Though I like to think this competition is a little friendlier than the stockpiling of nuclear weapons.. Widsith 16:50, 5 March 2008 (UTC)[reply]
No, much worse ... "Academic politics are so vicious precisely because the stakes are so small." ;-) (Attrib. Woodrow Wilson, modern form Wallace Sayre) Robert Ullmann 13:51, 6 March 2008 (UTC)[reply]

For more precise statistics (% of form-of entries), look at the Total line of the table, at fr:Wiktionnaire:Statistiques. But it seems obvious to me that, overall, yes, the Wiktionary is ahead, and it's not surprising. Lmaltier 17:49, 5 March 2008 (UTC)[reply]

Am I going to get arrested for treason if I added a grc etymology to an entry on fr? -Atelaes λάλει ἐμοί 17:53, 5 March 2008 (UTC)[reply]
No, but if you create an entry, we might have to dig out the old "giving aid and comfort to the enemy" clause. :-) (U.S. Const. Article III at (3): "Treason against the United States, shall consist only in levying war against them, or in adhering to their enemies, giving them aid and comfort.") Robert Ullmann 13:51, 6 March 2008 (UTC)[reply]

I've updated the analysis, screening out (roughly) the form-of entries. The results are more interesting. More words of interest that we will want to add. Robert Ullmann 13:51, 6 March 2008 (UTC)[reply]

I'd reckon that en: has about that many of non-:fr entries as :fr has non-:en entries. --Keene 01:16, 7 March 2008 (UTC)[reply]
There is some difference: 189,651 in fr, not in en.wikt; 262,239 in en, not in fr.wikt (keeping in mind that these numbers are based on a very rough screening of form of entries) A lot of the ones fr has are French or Vietnamese, which makes sense.
Do note that all my us v them rhetoric in this section is in fun ;-) Robert Ullmann 14:12, 7 March 2008 (UTC)[reply]
This is great stuff. Would it be feasible to generate the entire list without a lot of extra hassle? (I'm particularly interested in the Korean entries they have that we don't). -- Visviva 04:24, 8 March 2008 (UTC)[reply]
Never mind, I figured out how to get that data. Fortunately nobody on wikt has started worrying about Korean form-of entries yet. -- Visviva 07:16, 8 March 2008 (UTC)[reply]

Should the Wiktionary:What Wiktionary is not page be made policy? A vote could be started if so, but I don't think it's a particularly pressing task. --Keene 15:48, 6 March 2008 (UTC)[reply]

No, I dislike the formal policy status, it should be unnecessary - particularly for pages like this which exist only because they exist on Wikipedia. See WT:NPOV for a particular example of this problem; that page may not be modified without a vote, yet its contents at the moment simply do not apply to Wiktionary. Conrad.Irwin 16:39, 6 March 2008 (UTC)[reply]
Well, WT:NPOV has been modified by vote, to reflect the userbox ban. It could (and should) be modified further, to reflect Wiktionary's unique characteristics. Ultimately, however, neutrality is as crucial here as it is on Wikipedia. We just happen to have a mercifully low level of content disputes, so haven't needed to develop the sort of robust dispute-resolution process which would require a robust NPOV policy. -- Visviva 03:35, 8 March 2008 (UTC)[reply]
Sure, this should be a policy if anything is. However, it's no big deal at this point. In practice the real boundary-defining page is WT:CFI anyway. -- Visviva 03:35, 8 March 2008 (UTC)[reply]
No, it should not. I can just see it: make WT:WIN and WT:CFI both policy, and then have disputes over slight nuances and shades that seem to contradict one another, until we wind up in the end with both pages essentially quoting one another with no difference in wording, whereupon someone will suggest merging them into WT:CFI and getting rid of the duplicate. Then someone will suggest having a more user-friendly explanation of what Wiktionary is not, and we'll start all over again. (That was with a touch of humor, but still.)—msh210 18:43, 13 March 2008 (UTC)[reply]

The inclusionist/deletionist divide

Hi all, one of Wikipedia's great problems is the division between those who wish to include everything and those who wish to include only useful information. It seems to me that the same divide is growing on Wiktionary and I would like to discuss ways in which some of the angst it causes on Wikipedia can be avoided here. Firstly, I am an inclusionist and I can see no reason whatever for deleting anything. (Well except for vandalism and nonce words perhaps).

There are two cases where I feel we can do better than the current situation, and I would like to propose a solution: For proper nouns with no idiomatic meaning (which thus should fail CFI), we can replace the page with a template that looks similar to MediaWiki:Noarticletext but contains a link to the Wikipedia article. The same can happen for words which do not meet our CFI because of the Independence criteria but do merit an entry in an appendix - with the template linking to the correct page.

I can see some objections to this, "it turns red links blue", "it messes up our statistics", "it is misleading for the bots/humans", however I still feel the useful service we would be providing to our readers counteracts this. Thoughts? Conrad.Irwin 17:02, 6 March 2008 (UTC)[reply]

  • I don't know which one I am (I think I'm an inclusionist). My criteria is simple - does the word / term / combination of letters and symbols etc. have a meaning in the real world. If yes, keep it, if not delete it. SemperBlotto 17:07, 6 March 2008 (UTC)[reply]
If we need to create a second-class status to include more entries of the kind that fill the specialized dictionaries and references that fill the shelves of bookstores and libraries, then we should. I can't imagine that there would be insuperable actual technical difficulties although there may be other insuperable difficulties. DCDuring TALK 17:11, 6 March 2008 (UTC)[reply]
As a hesitant inclusionist, I agree in parts with both sides. SemperBlotto's view of "does the word / term / combination of letters and symbols etc. have a meaning in the real world" is very ideological, and I have followed that notion before. But I think our RFV system might be a little out-dated. 3 decent Google Books hits was reasonable, but today seems too inclusionist-y. I'm not sure how this could be improved, but it is still lacking something. Much like maybe all these thousands of French verb forms - I'm no expert, but am sure that some declensions would be unfindable on b.g.c.--Keene 01:14, 7 March 2008 (UTC)[reply]
Are you saying that we should be a little tighter on lemmas and a bit looser on forms-of? DCDuring TALK 02:52, 7 March 2008 (UTC)[reply]
Are there terms that are passing RfV that you think should not be included? -- Visviva 05:59, 7 March 2008 (UTC)[reply]
Personally, I'm an inclusionist with a strong we-​have-​rules-​we've-​agreed-​on-​so-​let's-​follow-​them bent (-slash- a strong we-​have-​common-​practices-​so-​let's-​codify-​them bent) and a weak our-​readers-​already-​don't-​understand-​the-​difference-​between-​an-​encyclopedia-​and-​a-​dictionary,-​so-​why-​blur-​the-​line? bent. Overall, this disposes me favorably toward your solution. Regarding the possible objections you mention: (1) If we don't want an entry, then there's no particular need for links to it to be red edit-links (there probably shouldn't be any links to it, period, but between red and blue links there's no advantage one way or the other). (2) As long as we don't include [[ in these pages, I believe it won't affect our statistics. (3) Humans can learn, bots can learn, mirrors can learn, everyone will be happy. —RuakhTALK 03:17, 7 March 2008 (UTC)[reply]
Ruakh's rationale sounds good to me. In a perfect world, this {{nothing to see here}} template would also accept unwikified translations as arguments, something along the lines of wikispecies:Template:VN; that would I think dispense with the primary argument made in favor of keeping these entries. -- Visviva 05:59, 7 March 2008 (UTC)[reply]
I am an unabashed deletionist (and proud of it). I think this is a reasonable idea, but I would prefer it if the WM message could do a search and make a larger link if found. However, if this is not feasible, I am less concerned by our numbers than by our usability. -Atelaes λάλει ἐμοί 06:22, 7 March 2008 (UTC)[reply]
I have created an entry at Isaac Newton to show off the new {{only wikipedia}}, I am not sure what the best way of adding translations to it will be, both in terms of formatting and in terms of coding, so if someone else wants to experiment with that then please do. Conrad.Irwin 11:46, 7 March 2008 (UTC)[reply]
That seems like a good idea. It might be better though to have a generic {{otherprojectonly}} template with a parameter for the relevant project. Admittedly I can't think of any occasion it would be used for a project other than Wikipedia, but then I'm not familiar with most Wikimedia projects. Thryduulf 14:23, 7 March 2008 (UTC)[reply]
I wonder whether this would be a good way to handle two-part species names: referral to Wikispecies. That would be an argument for genericization or possibly for a parallel template. Wikispecies doesn't have etymology and we have some of the individual words (from Classical through Medieval Latin, less in New Latin). Although perhaps directly linking our redlinks in this area would be even better. DCDuring TALK 16:03, 7 March 2008 (UTC)[reply]
I would envisage there being a few of these (probably all using one template behind the scenes), one in particular could go from main namespace to the appendices, and I see no reason why we shouldn't also link to other projects if necessary. We have to be careful to ensure that this is only used for cases when Wiktionary should not have an entry and not used as a quick substitute for getting an entry written. Conrad.Irwin 21:10, 7 March 2008 (UTC)[reply]
I'm inclined to agree that binomial (and trinomial) scientific names should be on Wikispecies and not here, as there is not much we can say about them. However, this needs more discussion, as we have welcomed this sort of material heretofore. -- Visviva 03:28, 8 March 2008 (UTC)[reply]
The template looks great. -- Visviva 03:28, 8 March 2008 (UTC)[reply]
Not bad, now just make it multilingual. Yeah, I'm all for this idea, though I want to be sure that it's still possible to create an entry when the template needs to be overridden. For instance, Google Books has 91 hits for "the Isaac Newton of" as an exact phrase. I think the way the template is set up now (for English Wikipedia) already addresses my concern. DAVilla 22:48, 8 March 2008 (UTC)[reply]
Yes, the whole point of the template is for cases where this is not the case. If Wiktionary should have the word then it shouldn't have that template. I am not sure why we would want to link to other language Wikipedias, given that we should be providing information for English readers here. See 毛泽东 for how I think that situation should be treated. Conrad.Irwin 23:36, 8 March 2008 (UTC)[reply]
Obviously we should give top billing to the English Wikipedia's article if it has one, but I really don't see why we'd avoid linking to other languages' Wikipedias. BTW, a technical question: is there any way we can get these non-entries to not be indexed by Google? Currently we have <meta name="robots" content="noindex,nofollow" /> on redlinks; we probably want something like that on these entries as well, or at least the noindex part. (Actually, I'm not sure why we have nofollow on redlinks, either, but what do I know?) —RuakhTALK 00:24, 9 March 2008 (UTC)[reply]

CFI for languages

So, our mission statement is "Every word in every language!" But, at the same time, we don't include every language. For example, we don't include Ionic Greek. However, we do include the Ionic dialect of Ancient Greek. But, how do we divide our languages? Everyone who has studied linguistics knows that a language is only a well marked dialect. How do we decide what's a language and what's a dialect or period? With the more common languages it's generally fairly clear. However, for less common languages and extinct languages, where to draw the line is difficult. To a certain extent, it's not that important. If we didn't include Ancient Greek, but only included Greek, then all the words would still get covered, I'd simply have to include {{obsolete}} in nearly every term I enter. :) However, dividing between Greek and Ancient Greek works well, as it splits words up into groupings which are convenient for both users and editors. So far, the standard we have been following, for the most part, is SIL. SIL does a fantastic job of splitting up the world's languages and it gives us a bit of official credence in our divisions when we follow them. Most importantly, it prevents a lot of waffling and unending arguments. While it works to have Greek and Ancient Greek, and it would probably work to have just Greek, it would absolutely not work to have both randomly from day to day, or some editors doing one thing and some editors another. That would be an awful mess. However, SIL's not perfect. A few editors of obscure languages have noted a few deficiencies in SIL's groupings. So, here's what I'm thinking: I propose that we retain SIL as the standard which we use for making divisions between languages. If a language has a SIL 693-3 code, it gets approved for its own L2 header and for general use on Wiktionary. However, people are free to propose amendments to SIL's decisions.
If someone thinks a language which SIL does not recognize should exist on Wiktionary, or perhaps what SIL considers two languages should be treated as one on Wiktionary, they can contest SIL's grouping. So, they start a BP topic on the issue and con a bunch of editors into buying their story. Since languages are the apical sorting method on Wiktionary, I think this is important enough that every single modification should be officiated by a vote, if the BP discussion goes well. Obviously everyone will be convinced in different ways, but I think, in general, an editor should have to write more than five entries to justify changing up the format of Wiktionary like that. What does everyone else think? -Atelaes λάλει ἐμοί 03:05, 8 March 2008 (UTC)[reply]

A minor point, but I think it is important to distinguish the SIL (Summer Institute of Linguistics, originally a missionary-training outfit) from the ISO (International Standards Organisation). As far as I'm aware, though, the SIL version is a faithful reflection of the ISO 693-3 standard. -- Visviva 03:20, 8 March 2008 (UTC)[reply]
Sorry, but that's not correct. SIL International (formerly known as the Summer Institute of Linguistics) is the official registration authority for ISO 639-3; I don't know whether ISO (English name: the International Organization for Standardization) maintains its own copy of SIL's list. —RuakhTALK 13:39, 8 March 2008 (UTC)[reply]
Um, no. ISO has the official list, IS 639-3. SIL is the designated registration authority [[3]] for requests for new codes and changes. Proposals by SIL have to be approved by ISO JTC1/SC2/WG2 and then are published by ISO. (Yes, the 3-letter code list was originally developed by SIL, contributed to ISO, updated, and now SIL uses the ISO codes in the Ethnologue, etc. That's why you'll find different coding in the 14th and 15th editions of the Ethnologue.) Visviva's comment is correct. We should generally just refer to the ISO codes, rather than SIL. Robert Ullmann 14:40, 8 March 2008 (UTC)[reply]
According to the ISO Web site, "ISO 639-3:2007 provides a code, published by the Registration Authority of ISO 639-3, consisting of language code elements comprising three-letter language identifiers for the representation of languages."[4] That is, according to ISO, it's SIL that publishes the official list. Your statement that SIL proposals have to be approved by ISO JTC1/SC2/WG2 is interesting — I can't find evidence of that, but will take your word for it — but doesn't seem to be relevant. SIL is the public face of ISO 639-3, responsible both for handling change requests from the public, and for publishing the standard. If ISO keeps its hand in the process, bully for them, but for us they're relevant only in that they gave their imprimatur to SIL. —RuakhTALK 15:12, 8 March 2008 (UTC)[reply]
This is very interesting, thanks. I had not looked into the matter properly, and had not been aware of the special relationship between SIL and ISO 693-3. That said, I still object to referring to these as "SIL 693-3 codes" as the OP did (which was all that I meant to object to in my response above). -- Visviva 15:20, 8 March 2008 (UTC)[reply]
Yes, I think we should be calling them ISO 639 codes, not "SIL codes" (SIL itself calls them ISO 639-3). (The document in question says "International Organization for Standardization" (and "Organisation internationale de normalisation" :-) on the cover, and the copyright is ISO, not SIL.) As Visviva notes, that is the only point being made here. Robert Ullmann 15:40, 8 March 2008 (UTC)[reply]
Oh, yes, agreed, sorry. The name of the standard is ISO 639-3 (note the number, BTW), no matter who is in charge of it. —RuakhTALK 20:24, 8 March 2008 (UTC)[reply]
I think it is important that we be open to including as many languages (natural languages) as we can fit, meaning all of them. If a language is only spoken by a tribe of 373 natives in the remotest jungles of Brazil, well, where better to record their lexicon? I am absolutely not an inclusionist in general, but I like the idea of recording 'all words in all languages' quite a lot, (when it comes to real words), so let's get them in here. The sticky wicket comes along when you talk about Level 2 headers, and what should and should not be included in them. I don't think we should go exclusively by the ISO list, but I also don't think we should let anyone who feels like it decide that a language deserves a L2. Perhaps we can get together a few of our true linguists (of whom I am not one) and get some kind of Language Committee or something, a group of people who are willing to put in the legwork and document reasoning behind calling a language a language, and then saying 'Yes, use an L2' or 'No, Appendix only'. I think that a 'vote' is a bad idea, it doesn't represent anything close to objective results, I know because I have voted before. I think getting qualified people to make the recommendations is a better way to go about these sorts of things. - [The]DaveRoss 03:35, 8 March 2008 (UTC)[reply]
Excellent idea. Some of the discussions on the inclusion of a particular language end up in WT:RFDO, and it would be nice centralize them somewhere formally. --Ivan Štambuk 10:53, 8 March 2008 (UTC)[reply]
I agree, and would like to take this opportunity to propose officially that "Hebrew" (he/heb) and "Ancient Hebrew" (hbo) be taken as one language, "Hebrew". The two languages are not without their differences, but most Ancient Hebrew words are still considered correct (if odd) in Modern Hebrew, and quotes from Ancient Hebrew texts are still widespread in Modern Hebrew, much like quotes from Early Modern English (Shakespeare, the KJV, etc.) are still widespread in English today. (This has previously been discussed, and appeared to have consensus; I re–bring it up now only because you're proposing an official mechanism for deciding these things, and I'm not a huge fan of implicit grandfathering.) —RuakhTALK 13:39, 8 March 2008 (UTC)[reply]
While I have absolutely no problem with this proposal, I request it not take place in the middle of this particular thread. This is meant to be a discussion of whether we can deviate from 693-3, and if so, how. If we throw every language proposal here it will be untenable. Besides that, we really should figure out what the policy is for these changes before we make them. -Atelaes λάλει ἐμοί 19:01, 8 March 2008 (UTC)[reply]
O.K., sorry. In my own defense, the SIL-vs.-ISO discussion above takes up a lot more space. :-P —RuakhTALK 20:24, 8 March 2008 (UTC)[reply]
Indeed it does, but it's almost sort of vaguely related to the topic at hand. More importantly, I figured it would fizzle out as it's such a minute detail. ISO 639 is the official standard. SIL is the organization charged with producing that standard and one of the easiest places to find out what the standard is. Whatever. -Atelaes λάλει ἐμοί 20:38, 8 March 2008 (UTC)[reply]
I run into problems with the differences between what the "WMF language committee" and ISO use. For example, Min Nan is "nan", but WMF created the Min Nan wikipedia using "zh-min-nan". I have been trying (not terribly hard ;-) to find out who the committee is, and possibly get on it. There are definitely codes we need to add, probably mostly as subcodes (fiu-vro for Template:fiu-vro), but they should get WMF-wide coordination if possible. At least others should know what we are doing. For languages that need separate codes, we should find out if there is a proposed -3 or -4/-5 code; if not we should be contributing our findings to ISO TC37/SC2 via SIL. Robert Ullmann 15:40, 8 March 2008 (UTC)[reply]
That would be the m:Language subcommittee. It's worth noting that the language proposal policy specifically requires that new proposals have an ISO 639-x code, but this seems to be honored more in the breach than the observance. -- Visviva 13:08, 9 March 2008 (UTC)[reply]
The proposal seems good, as I understand it; i.e. that we should follow ISO 639-3 except when it is in the project's interests to do otherwise, and that such exceptions should only be approved upon thorough community-wide deliberation (WT:VOTE or equivalent). -- Visviva 13:08, 9 March 2008 (UTC)[reply]
As I think about it, TheDaveRoss's idea seems better and better. As evidenced by two of the following threads, this issue is one which will appear over and over, with increased frequency as time goes on. Inasmuch as overall community consensus is the best way to decide things when possible, I just don't know if it's feasible for the community to make this decision every time it comes up. Most of our editors simply don't have the time to read a ten page Beer Parlour discussion and weigh all the political, historical, and linguistic controversies present. I think it will work much better if we pick two or three solid, well-respected editors with a good track record and give them dictatorial powers on the issue. I know everyone cringes at the very mention of the word "dictatorial", but we've already tried this method (with WotD), and it's turned out pretty damned well, as far as I'm concerned. -Atelaes λάλει ἐμοί 21:23, 11 March 2008 (UTC)[reply]

I don't know about these constructed languages, as certainly some such as Esperanto may be legitimate as a native tongue, while others are so obscure that not a single word in the language would pass CFI independently for three authors, so there is definitely a gray area for those. But forgetting about artificial languages for a moment, at least I can be certain that if it flows out of someone's mouth or pen and conveys a message that is understood by others then it is a word in some natural language, though I may not know which one. So in some sense I see the question as to which language a word belongs to be completely independent of the question of whether it belongs on Wiktionary. If it's spoken or written and it can be attested then it belongs in a project that documents every word in every language, and if there is controversy over which language it belongs to then the heading may change and change back and change again, but the substance will remain. 71.129.48.8 06:46, 13 March 2008 (UTC)[reply]

This category currently contains (only) Hangul reconstructions. This is absurd, since even the most generous definitions of Old Korean define it as Korean in use before the introduction of Hangul. All extant OKO texts (which are very few) use Chinese and/or gugyeol characters. I am seeking clearance to move all entries herein to their attested forms, if any, and delete the redirects. No objection to Appendix:Reconstructed pronunciation of Old Korean if anyone wishes to create it. -- Visviva 09:38, 8 March 2008 (UTC)[reply]

That seems reasonable to me. Thank you. I was wondering what I was going to do with that. You've saved me a great deal of stress. -Atelaes λάλει ἐμοί 09:39, 8 March 2008 (UTC)[reply]

Grammar gurus in the house? See entry for "hence"

Hey, I'm not a grammar guru, but I think there's a comma splice in the examples for hence. I left a note on that page's talk, but assume it will be ignored. 163.28.49.4 12:45, 9 March 2008 (UTC)[reply]

This belongs in the Wiktionary:Tea Room. I have removed the comma, though do note that you can fix things like this yourself using the edit button at the top of an entry. Conrad.Irwin 12:52, 9 March 2008 (UTC)[reply]
Wasn't sure if I was right, so didn't edit.. plus didn't know the diff b/w a tea room and a beer parlour. :-) Thanks! 163.28.49.6 13:01, 9 March 2008 (UTC)[reply]

Naming of categories of non-English proverbs and idioms

Currently, there is a mixed naming of categories for non-English proverbs and idioms, like:

These seem to stem from two different sources of modeling and imitation:

Depending on whether idioms and proverbs are considered more like the items in the first group or the second group, the naming of proverb and idiom categories could be chosen, fixed in policy, and then the categories could be renamed. Any opinion on this from the policy makers? --Daniel Polansky 08:47, 10 March 2008 (UTC)[reply]

Actually, in the case of Mandarin, we have both (Category:Mandarin proverbs, Category:zh:Proverbs, Category:Mandarin idioms, Category:zh:Idioms)! There's no reason that it has to be an "either/or" choice. Furthermore, provided that the contributor uses the inflection templates recommended by WT:AC (Template:cmn-proverb and Template:cmn-idiom), it involves no more work for the contributor than a single category. Incidentally, the benefit of having both categories is that you can use it as an opportunity to provide different sort options (as well as options for which entries to include in which category). -- A-cai 10:06, 10 March 2008 (UTC)[reply]
I see. I did not notice there were both types of categories for Mandarin. I have now noticed that there is also the category Category:zh:Nouns, and that Category:zh:Proverbs is further split into subcategories while Category:Mandarin proverbs is not.
I see the benefit of less confusion, for me anyway, if there is only one type of naming scheme for categories. As regards the downsides of one type of naming scheme, unfortunately, I do not understand what you mean by "different sort options"; do you think you could explain that to me?
Is there any other language using both naming schemes, or is this just Mandarin?
Do you expect that all the languages should have both categories, like having Category:cs:Proverbs and Category:Czech proverbs? --Daniel Polansky 12:35, 10 March 2008 (UTC)[reply]
It seems to me that, for other languages but Mandarin, having a consistent naming scheme would be valuable. --Daniel Polansky 12:35, 10 March 2008 (UTC)[reply]
We do have a consistent scheme, except for Chinese languages and a handful of categories that have not yet been cleaned up to standards. The two forms of category name (one for parts of speech, the other for topics) is a deliberate and consistent distinction. --EncycloPetey 17:01, 10 March 2008 (UTC)[reply]
So is it correct that there should be Category:German proverbs and not Category:de:Proverbs? And can I move Category:fr:Proverbs to Category:French proverbs? Connel seems to have a different view, judging from the tags he added to Category:French proverbs and Category:German proverbs. --Daniel Polansky 18:35, 10 March 2008 (UTC)[reply]
See the ongoing discussion at Wiktionary:Requests for deletion/Others#Category:French proverbs. --EncycloPetey 19:24, 10 March 2008 (UTC)[reply]
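The two naming patterns discussed above can be sketched roughly as follows. This is only an illustration: the function names and the small language table are hypothetical, and which pattern proverbs and idioms belong to is exactly what the thread leaves open.

```python
# Hypothetical table for illustration; only languages named in the thread are listed.
LANGUAGE_NAMES = {"de": "German", "fr": "French", "cs": "Czech"}


def pos_style_category(code: str, label: str) -> str:
    """Pattern used for parts of speech, e.g. 'Category:German proverbs'."""
    return f"Category:{LANGUAGE_NAMES[code]} {label}"


def topic_style_category(code: str, label: str) -> str:
    """Pattern used for topics, e.g. 'Category:fr:Proverbs'."""
    # Topic-style names start with a capital letter after the language code.
    return f"Category:{code}:{label[:1].upper()}{label[1:]}"


if __name__ == "__main__":
    print(pos_style_category("de", "proverbs"))
    print(topic_style_category("fr", "proverbs"))
```

Either helper produces a consistent name; the open policy question is only which of the two a given kind of entry should use.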

Norwegian language classification

I have been engaging in conversations with a couple of users about what to do with Norwegian, and it has gotten to the point where I figure it would be good to have the larger community's input into the subject. The issue centers around Bokmal and Nynorsk. The introductory section of the Wikipedia article on Norwegian sums it up rather nicely. Please do the background reading there if you are not already acquainted with the subject. The two conversations can be found at User talk:EivindJ#Norwegian Questions and User talk:Robert Ullmann#Norwegian language templates. So, there are a number of different ways we can sort between these types. I am advocating following the Norwegian Wiktionary. What this would entail is having the L2 header "Norwegian" for all words which are used and spelled the same in both Bokmal and Nynorsk. Any word which only exists in (or has a usage unique to) Bokmal receives the L2 header "Norwegian (Bokmal)" and any word which only exists in (or has a usage unique to) Nynorsk receives the header "Norwegian (Nynorsk)". I am, as yet, unsure of how the categorization would work on that. Another option is to put all Norwegian only under the L2 "Norwegian", and then use Nynorsk and Bokmal as context tags (i.e. treating Bokmal and Nynorsk as dialects, instead of languages). A third option which has been advocated is to equate Bokmal with Norwegian (i.e. Bokmal terms go under the header "Norwegian") and treat Nynorsk as something else entirely, going under the header "Nynorsk." Thoughts? -Atelaes λάλει ἐμοί 00:22, 11 March 2008 (UTC)[reply]

Prior to having done all the background reading, my initial thoughts are the simplest to explain and the easiest in terms of categorisation would be either your second option (treating Nynorsk and Bokmal as effectively dialects); or one based on your first option but where the header "Norwegian" is not used, where words are the same in Nynorsk and Bokmal the page would have two L2 sections "Norwegian (Bokmal)" and "Norwegian (Nynorsk)". Your first option would follow this format if there are homographs?
In terms of translations into Norwegian of English words, I'd recommend using the same format as Serbian does for the different scripts. Thryduulf 01:08, 11 March 2008 (UTC)[reply]
Yes, the more I think about it, the better I think it is to simply use "Norwegian" and then use context tags. This solves the problem of sorting (as words would go into [Category:Norwegian POS's], as well as [Category:Nynorsk], etc.). Additionally, this provides an intuitive format for any other dialects we want to include in the future. -Atelaes λάλει ἐμοί 01:29, 11 March 2008 (UTC)[reply]
I'd prefer either (1) to separate them completely, with "Norwegian Nynorsk" and "Norwegian Bokmal" being valid L2 headers, and "Norwegian" alone being invalid, or (2) to treat them as a single language, "Norwegian", with two regional variants, like we do with U.S. and U.K. English. I don't like the idea of a half-and-half approach where we'd have three L2 headers for two forms of arguably one language; and I'm only O.K. with "Norwegian" vs. "Nynorsk" if we can also give U.K. English the boot — say, "English" vs. "British". Or maybe "Limey". ;-) —RuakhTALK 01:33, 11 March 2008 (UTC)[reply]
Remember, English is English: it is "American" that would be given the boot. English and USpeak. ;-) Robert Ullmann 15:56, 11 March 2008 (UTC)[reply]
Actually, in this situation British English resembles Nynorsk much more. The reason why some consider Bokmål more "properly" Norwegian is simply because it has a distinctly larger population. And if we're talking about population, American English definitely has British English beat. -Atelaes λάλει ἐμοί 21:11, 11 March 2008 (UTC)[reply]
Tee-hee: everyone says this as if it is gospel, because it is so "obvious". The actual numbers show Commonwealth English over American about 3:1. Sure, people hear a lot of American on TV, but if you write "color" in your schoolwork you will lose marks for spelling ;-). Robert Ullmann 12:36, 13 March 2008 (UTC)[reply]
To refer to Bokmål and Nynorsk as dialects is a misinterpretation, I am afraid. It is wrong to think about these two as dialects, or simply as if they were Norway's answer to British and American English. They are two equivalent languages, by law, and differ almost as much as Norwegian and Danish in their written form (not verbally). When Robert Ullmann utters that "The only issue is a few words that can't be in any way considered Nynorsk" he is sadly mistaken. There are heaps of words that in no way can be considered bokmål and the other way around. It is not for fun that we have one Nynorsk Wikipedia and one Bokmål/Riksmål Wikipedia. I reckon, and some other no.-admins with me, that "Norwegian (Bokmål)" and "Norwegian (Nynorsk)" is most correct, but I will give notice to the Norwegian Wikipedias' fellowships and make them come with their thoughts. Thanks (: --EivindJ 07:27, 11 March 2008 (UTC)[reply]
Using "Norsk" and "Nynorsk" is wrong, as it gives the impression that nynorsk is less "norsk" than bokmål is. I am a bokmål user myself, but I would dislike seeing nynorsk discriminated against in such a way. I think that either you should use two different L2 headers, or you should use the same method as with UK/US English (though I think the differences between bokmål and nynorsk are both bigger and more numerous than the differences between UK English and US English). - Soulkeeper 07:57, 11 March 2008 (UTC)[reply]

A symmetrical solution has to be used. The option “Norwegian (bokmål)” vs. “Norwegian (nynorsk)” is probably the best one. As for the two often having similar forms, that goes almost equally much for the two of them vs. Swedish (esp. nynorsk) and Danish (esp. bokmål). -- Olve Utne 12:09, 11 March 2008 (UTC)

Look, we have to have Norwegian as Norwegian. (Else everyone will be saying forever "I can't find Norwegian"). We then have the same case as with other languages, the most common/standard/default language (from an English POV; this is the English wikt ;-) gets the name. Then we have words that people will insist absolutely must be identified as only Bokmål (which we call Norwegian ...) and others that are Nynorsk.

ISO codes 3 different languages:

  • no = nor = Norwegian
  • nb = nob = Norwegian Bokmål
  • nn = nno = Norwegian Nynorsk

(Technically, we have a few constraints that must be observed: the names must match, e.g. no = nor, and the names must not contain parens or be partly linked. But that isn't a problem.)
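As a rough sketch of the code table and the two constraints just mentioned, something like the following could be used. The names `ISO_NAMES`, `NAME_BY_CODE` and `is_valid_header_name` are hypothetical, and reading "partly linked" as "contains wiki link brackets" is an assumption.

```python
# (two-letter code, three-letter code) -> language name, as listed above.
ISO_NAMES = {
    ("no", "nor"): "Norwegian",
    ("nb", "nob"): "Norwegian Bokmål",
    ("nn", "nno"): "Norwegian Nynorsk",
}

# Both codes of a pair resolve to the same name, so "no" and "nor" always match.
NAME_BY_CODE = {}
for (two_letter, three_letter), name in ISO_NAMES.items():
    NAME_BY_CODE[two_letter] = name
    NAME_BY_CODE[three_letter] = name


def is_valid_header_name(name: str) -> bool:
    """Reject names with parentheses or (assumed) wiki link brackets."""
    forbidden = ("(", ")", "[[", "]]")
    return not any(token in name for token in forbidden)


if __name__ == "__main__":
    for code in ("no", "nob", "nn"):
        print(code, "=", NAME_BY_CODE[code])
    print(is_valid_header_name("Norwegian (Bokmål)"))
```

Deriving both lookups from one table is what keeps the two-letter and three-letter names from drifting apart.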

I had/have it set up as:

  • no = nor = Norwegian
  • nb = nob = Norwegian Bokmål
  • nn = nno = Nynorsk

Note there is only one difference from ISO (and the Norwegian Government recommendation for the names of Bokmål and Nynorsk). I don't see a problem with "Nynorsk" -> "Norwegian Nynorsk" The problem will be keeping the majority of the language in Norwegian where people expect it to be. See euro. Robert Ullmann 15:56, 11 March 2008 (UTC)[reply]

Even if the translations are the same, keep the two different headers. One could always put a bigger header atop called Norwegian, then have two sub-headers: one for bokmål and one for nynorsk. To suddenly have one header, whereas there usually are two, is going to confuse more than anything else. --Harald Khan Ճ 16:16, 11 March 2008 (UTC)[reply]
no, we don't use subheads like that. (See WT:ELE, header levels are very significant.) And there won't be "suddenly" one header; there will be almost always only one header (Norwegian), and only occasionally two. Note that the case at euro is instructive, it occurs only when the word is the same and the inflection different (yes, I know this occurs with some frequency), it could be handled right in the inflection line, and save having two sections. Robert Ullmann 16:19, 11 March 2008 (UTC)[reply]
OK. Still there should be two different headers: Norwegian (Bokmål) and Norwegian (Nynorsk). No official written language is called Norwegian. It is discriminating to hint that the inflection of Bokmål is more Norwegian than that of Nynorsk or vice versa. --Harald Khan Ճ 17:28, 11 March 2008 (UTC)[reply]

The WP article is not clear on whether the languages are mutually intelligible. Are they? If so, I see no reason at all to split them here on enwikt. If not, then they should be different L2 sections even where the contents of those sections would coincide. What to call them is then another question, and I don't like any of the solutions, to be honest.—msh210 17:36, 11 March 2008 (UTC)[reply]

All the four languages, Danish, Swedish, Bokmål and Nynorsk, are mutually intelligible. --EivindJ 17:49, 11 March 2008 (UTC)[reply]
While I don't have much background in linguistics, I thought that that was the criterion on which linguists decided to consider dialects languages. Am I wrong? Or are Danish, Swedish, Norwegian, and Norwegian considered one language by linguists? Or is this an exception for some reason?—msh210 17:53, 11 March 2008 (UTC)[reply]
I don't have any background in linguistics either, but I strongly doubt that they are considered as one language. --EivindJ 17:59, 11 March 2008 (UTC)[reply]
For more on this, see w:Dialect, w:Dialect continuum#Scandinavian_languages, and w:Ausbausprache - Abstandsprache - Dachsprache.—msh210 18:58, 11 March 2008 (UTC)[reply]
Note that there are already a number of languages that are separated here, and that are mutually intelligible and naturally form a dialect continuum, like Croatian/Serbian, Hindi/Urdu, Moldovan/Romanian, Macedonian/Bulgarian... So this kind of separation for Norwegian wouldn't be a precedent, but a continuation of a common practice. --Ivan Štambuk 19:11, 11 March 2008 (UTC)[reply]
Mutual intelligibility has nothing to do with a definition of a 'language'. Natural languages are not like biological species, with a hard line between them that prohibits mixing at the DNA level. Such an analogy is a grossly misleading simplification. They are exclusively defined by national committees, and in this particular case NLC recommends "Norwegian Bokmål" and "Norwegian Nynorsk" terms respectively. There's no reason to enforce politically incorrect terms into L2 section names when there are such clear alternatives that all native speakers agree on. nor/no itself is a macrolanguage code, not individual language, and these normally don't get included at all. --Ivan Štambuk 18:48, 11 March 2008 (UTC)[reply]
Actually, they are very much like biological species. Because, you see, some species can (and do) interbreed, while other species form a continuum in which some members can interbreed and others cannot. The various species of oaks regularly form fertile offspring, and many orchid genera can form hybrids with enough regularity that some of these hybrids have their own names. There is a ring species of birds in the northern hemisphere around the pole, where individuals at either end of the bird's range cannot interbreed, but interbreeding happens everywhere in between over short distances. It is a myth that biological species cannot ever interbreed, and, in most cases, there is no experimental data verifying that two species cannot interbreed. So, the interbreeding of species is very like the intelligibility of language, that is: thoroughly muddled. --EncycloPetey 01:16, 12 March 2008 (UTC)[reply]
Nice link on ring species, read about it in Richard Dawkins - 'A Devil's Chaplain' (ingeniously written book :). However, obstacle that regulates interbreeding of species has been discovered recently at the DNA level. Escherichia coli and Salmonella typhimurium, that evolution separated > 150 million years ago and have ~20% mismatch in DNA, have proven to be compatible under some circumstances. This means that the barrier between species is very discrete, and this does not occur in natural languages in which lexemes are tossed in all directions and adapted. --Ivan Štambuk 19:14, 12 March 2008 (UTC)[reply]
A famous geneticist once remarked "If it's true for E. coli then it's true for elephants." Since that time, the statement has proved false many times. There are many ways in which the genetics of bacteria and elephants are very, very different. There is not just one single factor that regulates interbreeding of species, there are many, many different mechanisms that can and do come into play. Consider that species with separate males and females automatically limit the possibilities of pairings that will result in fertile offspring, even if the DNA itself is 100% compatible. In some species of plants and fungi, there are single allele mating cofactors that function in the same way to control mating type. And even when species cannot mate themselves, there are viruses that act as agents transferring DNA laterally from one species to another, just as happens with languages. Language has not been around so very long, compared with the age of biological species, so intercompatibility and hybridization is to be expected. --EncycloPetey 03:33, 13 March 2008 (UTC)[reply]

The solution used in euro is not good since it strongly implies that Bokmål is Norwegian and Nynorsk is something else than Norwegian. I don't understand why it is desirable that en.wikt should imply something like that just because some users here prefer the one language before the other when it comes to what is Norwegian. There is no doubt that the two of them are independent languages (a quick call to the Norwegian Language Council should prove me right) ... neither of the two is closer to spoken Norwegian (almost everyone speaks something in between). I also do not understand why users who don't have proper acquaintance with both of the languages can utter statements that imply that they are extremely similar and that partition is only necessary in a few cases. With all respect, it isn't out of national sentiment that we argue for having a clear distinction between the two. --EivindJ 18:15, 11 March 2008 (UTC)[reply]

Agreed. Robert's proposed format is untenable. Since all of our Norwegian friends seem to insist that we must make a language level distinction between the Bokmål and Nynorsk, that seems the way we must go. However, this means a few things: "Norwegian" as a language header is out. All Norwegian entries must go under "Norwegian Bokmål" or "Norwegian Nynorsk." However, we must keep the word Norwegian in these headers, because, as Robert rightfully notes, otherwise people will be complaining that they can't find Norwegian. Certainly we will have a lot of duplication (i.e. a lot of entries with both "Norwegian Bokmål" and "Norwegian Nynorsk" headers), but we already have a lot of duplication with Scandinavian languages anyway. This also means that every single entry which currently uses the L2 "Norwegian" (according to WT:STATS, there's about 3,000 of them) must be changed. Obviously we should wait to get an official consensus (a vote is really required), but that looks to be where we're heading. -Atelaes λάλει ἐμοί 20:03, 11 March 2008 (UTC)[reply]
So you mean it will not be sufficient to name all words that are equal in spelling and meaning in both languages as "Norwegian", and only use the specific headers for the words that differ from the one or the other? --EivindJ 20:43, 11 March 2008 (UTC)[reply]
The more I think about it, the more I have to say no, it isn't possible to do that. It's simply not the way we do things here. We don't have varying levels within our language headers. We can treat Norwegian as a language, or a language family containing Bokmål and Nynorsk, but not both. If we use Bokmål and Nynorsk as language headers, that makes Norwegian a language family, and we don't put language families in L2 headers. Ultimately, there is no qualitative distinction between language and well marked dialect, except the distinction of politics, as Ivan mentioned. Because of this we generally divide stuff in the way which will be most useful to our readers and easiest on our editors. However, the politics of living languages sometimes forces our hand. I will say that treating Norwegian as a language and Bokmål and Nynorsk as dialects would be easier to edit (and probably more useful to our readers), but if degrading the two to dialects would cause us uproars, it's not worth it, and we'll have to treat the two as distinct languages. -Atelaes λάλει ἐμοί 21:08, 11 March 2008 (UTC)[reply]
Thanks for the explanation. The way I see it there are not many other options than what you describe here; at least if we're going to do this properly. --EivindJ 21:27, 11 March 2008 (UTC)[reply]
Ok, can everyone who cares about the issue state whether they can accept the following two languages on Wiktionary: "Norwegian Bokmål" and "Norwegian Nynorsk," with "Norwegian" being a deprecated language. If this seems to be acceptable after a few days, we'll put it through a short vote, and allow you Norwegian folks to get back to work. :) -Atelaes λάλει ἐμοί 21:34, 11 March 2008 (UTC)[reply]
I accept that. And hopefully a bot (maybe AutoFormat?) can start s/Bokmal/Bokmål/g-ing. —RuakhTALK 23:21, 11 March 2008 (UTC)[reply]
No, they must be "Norwegian" (no) and "Nynorsk" or "Norwegian Nynorsk" (nn). We can not deprecate "Norwegian". Note that this is the WMF standard: we have no.wp and nn.wp, and no.wikt and nn.wikt, and so forth. (Also, nb would exclude Riksmål, which creates another utterly un-necessary problem.) Robert Ullmann 13:30, 12 March 2008 (UTC)[reply]

That's a little tricky. If most of the entries are from a single editor, and we find out that they've only been working with Bokmål, that might work. However, the best thing would be for real live people to go through these by hand and figure out whether the word is B, or N, or both. -Atelaes λάλει ἐμοί 23:33, 11 March 2008 (UTC)[reply]

If there is no label, then it will require human attention. I think Ruakh's suggestion, however, was about changing instances of "Bokmal" to "Bokmål" (i.e. changing the second vowel from "a" to "å") which is certainly automatable. Certainly changing it for new/edited entries would fit well with AutoFormat's work. Thryduulf 00:06, 12 March 2008 (UTC)[reply]
In principle, I am opposed to the idea of splitting the two into separate L2s, as I have been saying to Atelaes and Robert Ullmann. Bokmål and Nynorsk are not different languages; they are two differing written standards based on differing dialects (but by no means necessarily faithfully reflective of any one dialect) of the same language. It is thus my preference that they should be unified under the same L2 header (Norwegian) and any (all) standard-specific terms indicated with context tags. I would also see Riksmål- and Høgnorsk-specific terms (and anything else we can think of) likewise indicated, where they exist, and are appropriate and relevant. However, if it's deemed more useful by the community to split them into separate L2s (although it does seem rather like duplicating work), I will not object, so long as it's split as indicated by Atelaes above. In addition, we need an About Norwegian page (either containing information on both standards or linking to separate pages with a note as to why) for those who just don't get the idea of there being no Norwegian language header. Release the shoats! --Wytukaze 01:31, 12 March 2008 (UTC)[reply]
As you say, a tag might be better, except that Nynorsk is treated as a separate language and code from Norwegian across projects. (Think about iwiki links, wp links, etc. etc.) This is already extremely well established, which is why I am very annoyed with Atelaes for creating a large discussion on BP about something that is not going to change anyway. You can go around in circles forever on this, and it will all come back to cleaning up entries so that the headers are "Norwegian" and "Nynorsk" (or "Norwegian Nynorsk"). This is a solved problem. Creating a very large mess from a solved problem is not productive. I will be trouting Atelaes when I can catch him on IRC ;-) Robert Ullmann 13:30, 12 March 2008 (UTC)[reply]
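For what it's worth, the spelling replacement itself could look something like the sketch below. This is not AutoFormat's actual code; the function name is hypothetical, and only the capitalised whole word "Bokmal" is handled.

```python
import re

# Whole-word match only, so already-correct "Bokmål" is never touched.
BOKMAL_ASCII = re.compile(r"\bBokmal\b")


def fix_bokmal_spelling(wikitext: str) -> str:
    """Replace the ASCII spelling 'Bokmal' with 'Bokmål'."""
    return BOKMAL_ASCII.sub("Bokmål", wikitext)


if __name__ == "__main__":
    print(fix_bokmal_spelling("==Norwegian Bokmal=="))
```

Deciding whether an unlabelled "Norwegian" entry is Bokmål, Nynorsk, or both is a separate job that this does not attempt.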
I would like to know why users continually come up with statements like "Bokmål and Nynorsk are not different languages; they are two differing written standards based on differing dialects". These aren't clear facts. Nynorsk is based on all Norwegian dialects together while Bokmål is based on the old standard strongly influenced by Danish. And as a fact I can state: "They are two independent languages". I note that several users, by the way they express themselves, are implying that these two are very close to each other, and not at all two languages. The iws to other articles on Wikipedia use "Norsk (Bokmål)" and "Norsk (Nynorsk)", so why does Robert want "Norwegian" to mean "Norwegian (Bokmål)" when it means both? The only reason why "no" is used instead of "nb" (bokmål) on no.wiki is because no.wiki was at first for both languages, but after a while "nn" got its own wikipedia, and "no" was never changed to "nb". --EivindJ 13:42, 12 March 2008 (UTC)[reply]
I concur that Nynorsk should be separate. What I am saying is that the header here for no must be "Norwegian" and we must use no and nn because all the rest of WMF does (we can't use "nb" for Bokmål without breaking everything in sight). Note that no is then Norwegian-not-Nynorsk, as you point out is the established usage in WMF for (e.g.) no.wikt, nn.wikt and so on. And we simply can't use "Norwegian Bokmål" for what in English is called "Norwegian". (NLC POV, etc, etc, notwithstanding.) As I have pointed out, this is a solved-set problem; nothing to see here. Robert Ullmann 13:53, 12 March 2008 (UTC)[reply]
Yes there is a major problem here: Despite what you imply, there is no written language called Norwegian; only Norwegian (bokmål) and Norwegian (nynorsk).
If bokmål is best left under the no template, then change it into Norwegian (bokmål). Wiktionary's accuracy regarding the Norwegian language starts here. --Harald Khan Ճ 16:35, 12 March 2008 (UTC)[reply]
Just to clarify it further: making one header called Norwegian and one called Nynorsk is like under the Wikipedia entry of Norway to present the official languages of Norway as "Norwegian" and "Nynorsk", which is nothing but a factual mistake. --Harald Khan Ճ 16:48, 12 March 2008 (UTC)[reply]
Umm....I'm seeing Norwegian (Bokmål and Nynorsk). -Atelaes λάλει ἐμοί 16:55, 12 March 2008 (UTC)[reply]
As you should. That is opposed to Norwegian AND Nynorsk. The term Norwegian does exclusively refer to the Norwegian language as a whole including all dialects or both written languages. --Harald Khan Ճ 20:02, 12 March 2008 (UTC)[reply]
While I'm sure this will prolong my trouting, I must say Robert, that you've never offered any convincing reason why we can't use "Norwegian Bokmål" and "Norwegian Nynorsk." If the angstrom will break bots, perhaps we can substitute an "a" for it (and note the deficiency in a few places). It really can't be a matter of users not being able to find Norwegian, as Norwegian's included in there. The worst case scenario there is that people will wonder what Bokmål and Nynorsk mean, and educate themselves on the subject (a situation which is, admittedly, completely opposed to WM values :-)). If it's out of concern for following WM precedent, then a quick look at the main page of Wikipedia solves that. The interwiki links to both Norwegian 'pedias are titled Norsk (bokmål) and Norsk (nynorsk). We don't seem to have an interwiki on en wikt for the nn wikt (it looks like a fairly new wikt). You say that using nb would break stuff (such as t-bot, I presume), then we can use no and title it Norwegian (Bokmål). If it's simply your odd POV in considering Nynorsk to somehow be inferior to Bokmål or just plain not a part of the Norwegian language, then I guess I'm not worried, as you seem to be singing a solo tune on that one, and a vote will put an end to that (the tyranny of the majority can be a nice thing when you're in the majority). -Atelaes λάλει ἐμοί 16:43, 12 March 2008 (UTC)[reply]
(note that the previous entirely misrepresents my "odd POV". If Språkrådet itself can call it (just) "Nynorsk" in its literature, it is at the very least not wrong ;-) Robert Ullmann 13:00, 13 March 2008 (UTC)[reply]
I'll say it again. Bokmål and Nynorsk are not different languages and it is misleading to refer to them as such. They are, and I have never said otherwise, (add 'completely' if you wish) separate written standards of the same language. They are similarly not independent; they do influence each other and are influenced by speech (and this is, if you'll forgive me for saying so, rather obvious). The spoken language, as it happens, is not separated into the two standards of Bokmål and Nynorsk; yes, there's some correlation, but speech is split among the various dialects, some of which correspond more or less closely with one of the written standards or with the Swedish standard, and on and on and on. Many languages have considerable dialectal variation in speech and Norwegian is in no way special in this area. What is special is the written situation, but a written standard does not constitute a language and a language can have more than one written standard. It is not difficult to do. This is, as I believe has been stated, a different matter to the differences between US and UK spelling differences, true; it is, however, comparable with that, the pluricentric standardisation of German, the Cyrillic and Latin orthographies of Serbian and the situation with modern Welsh, where the written, literary standard differs considerably from modern dialects. I do believe useful information on all of those can be gleaned from Wikipedia. As such, it is still my opinion that we should not be splitting the two written standards into separate L2s. However, as we and the WMF as a whole already do split them, we should continue to name both Norwegian and specify which standard we are referring to at every instance. That is, I firmly do not support naming one "Norwegian" and the other "Nynorsk" or even "Norwegian Nynorsk". 
They are both equally Norwegian, and Atelaes' proposal that we use "Norwegian (Bokmål)" and "Norwegian (Nynorsk)" shows this and the secondary nature of writing in language admirably. --Wytukaze 20:15, 12 March 2008 (UTC)[reply]

Okay people ... have we discussed this to death? (note: I only used "Nynorsk" itself because "Norwegian Nynorsk" seemed like a redundant pleonasm. Språkrådet (NLC) itself uses just "Nynorsk" in its English language literature.)

How about we simply follow the standards? ISO, SIL, NLC, the de-facto setup of WMF, the setup of the no.wikt, and the status quo pro ante here? Eh? Especially because they all say the same thing?

ISO/SIL codes:

  • no = nor = Norwegian
  • nb = nob = Norwegian Bokmål
  • nn = nno = Norwegian Nynorsk

Språkrådet (NLC, Norwegian Language Council) recommends that in English the two written forms be called "Norwegian Bokmål" and "Norwegian Nynorsk" when they need to be distinguished. (Notice no parentheses; we don't want parens in L2 names anyway.)

WMF uses no = Norwegian in the project naming, (although this has become usually more Bokmål as the Nynorsk projects were added).

The no.wikt uses no = Norsk, nb = Norsk (Bokmål), nn = Norsk (Nynorsk) (see no:Mal:=no=, no:Mal:=nn=, no:Mal:=nb=), using no = Norsk for most entries, and the other two when they need to be distinguished:


from the no.wikt, as of 6 March 2008

Header Occurs
no 674
nb 72
nn 64


We use no = Norwegian, and have been using the other two (with some variations in form) when they need to be distinguished. (A very small number at this point, L2/invalid shows exactly five.) Note that Norwegian not Bokmål or Nynorsk is no = Norwegian in all the standards; we presumably want some context tag or usage notes in these cases.

As I've said above more than once, there isn't any problem here; all the standards and the de-facto setup(s) agree. There are just a few entries that need fixing, which is where all this [redacted] started.

See mandag and måndag, compare no:mandag and no:måndag. Robert Ullmann 12:29, 13 March 2008 (UTC)[reply]

And note that all this is pretty much exactly what Atelaes said at the top of this section, which is why I want to trout him for raising essentially a non-issue and creating a lot of sound and fury. Robert Ullmann 12:36, 13 March 2008 (UTC)[reply]
Note that using en.wikt and no.wikt as examples for how often we need to distinguish between the two is not good. A quick look through Category:Norwegian language tells me that there are heaps of words that need to be changed into either nb or nn. However, I understand it so that Robert says we can have the headers "Norwegian", "Norwegian Bokmål" and "Norwegian Nynorsk". If that's correct then everything's ok for me. I just don't want to see "Norwegian" for "Norwegian Bokmål" and "Nynorsk" for "Norwegian Nynorsk". --EivindJ 13:49, 13 March 2008 (UTC)[reply]
This is my first comment on Wiktionary; I don't know the technicalities. But I know American English, Nynorsk and Bokmål - in that order. It is totally unacceptable to in any way suggest that Bokmål is more Norwegian than Nynorsk is.
It should perhaps also be pointed out that the differences between the two include more than just words; there are grammatical differences as well. --Hordaland 22:40, 17 March 2008 (UTC)[reply]

Please note that the use of the language code “no:” for “Bokmål” on Wikipedia is a leftover from the time when no.wikipedia.org included both Bokmål and Nynorsk. This code has been kept as the main one, with nb being a redirect, as a “compromise” — to facilitate good diplomatic relations between the Bokmål and Nynorsk wikipedias. To use this compromise to misrepresent the language names here would be a mistake — regardless of what Mr. Ullmann’s feelings are. -- Olve Utne 19:09, 19 March 2008 (UTC)

Hmmm.....this could be a sticky issue. The simple fact is that we have a lot of bots running information back and forth between wikt's, such as User:Tbot. The work they do for us is invaluable. Because of this fact, we may be stuck retaining this incorrect usage simply because the Wiktionaries themselves retain it. What we really need is a comment from Robert Ullmann on whether it would be possible to use nb and nn, and still have the bots function properly when the Wiktionaries are no and nn. While I strongly disagree with him on how to treat the Norwegian languages, there is no denying that he is easily the most knowledgeable editor on this particular aspect. If this screws up the bots, I think we may have to retain no and nn until such time as the Norwegian Wiktionaries themselves make the appropriate switch. -Atelaes λάλει ἐμοί 19:47, 19 March 2008 (UTC)[reply]
Words that are not different between Bokmål and Nynorsk remain in no/Norwegian. The variant codes and names are only to be used when there are differences. Specifically: only when there are corresponding entries which are differently spelled, and refer to each other. Understand that the no/nb/nn distinction, however real, was a political result of the ISO 639-1 process, which was intended to produce a stopgap coding until something better could be done. The differences between Bokmål and Nynorsk are very small compared to the differences in English, which we code and represent as one language. The differences in Norwegian between dialects and regions are much larger than the Bokmål/Nynorsk written form representation. (A serious argument could be made that Nynorsk is (yet another ;-) 19th century spelling reform that has now failed....)
However, given that Nynorsk is coded, and has at least some people who want to represent it, it is very reasonable that we include it and document it. At the same time, forcing all of the rest of Norwegian into the "Bokmål" pigeonhole is not acceptable. Most terms in spoken and written Norwegian are just that: Norwegian.
So we use no/Norwegian for most of the language, quite properly; nb/Norwegian Bokmål when, and only when it must be distinguished from Nynorsk, and nn/Norwegian Nynorsk for those words that must be distinguished from Bokmål.
And yes all this works correctly with the automation, which has long since changed "nb" to "no" for the iwikis. Robert Ullmann 01:13, 20 March 2008 (UTC)[reply]
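The coding scheme described in the comment above can be sketched as a small lookup. This is only an illustration of the rules as stated there, not the actual bot code; the names `headerCode` and `interwikiCode` and the `differs` flag are hypothetical.

```javascript
// Language names as used in this discussion for each ISO 639-1 code.
const LANGUAGE_NAMES = {
  no: 'Norwegian',
  nb: 'Norwegian Bokmål',
  nn: 'Norwegian Nynorsk',
};

// Hypothetical helper: pick the code for an entry's language header.
// `variant` is 'nb' or 'nn'; `differs` is true only when the Bokmål and
// Nynorsk entries are spelled differently and refer to each other.
function headerCode(variant, differs) {
  return differs ? variant : 'no';
}

// Hypothetical helper: interwiki links use "no" where an entry is coded "nb".
function interwikiCode(code) {
  return code === 'nb' ? 'no' : code;
}

console.log(LANGUAGE_NAMES[headerCode('nn', true)]);
console.log(interwikiCode('nb'));
```

Under the alternative proposed elsewhere in this thread (Bokmål and Nynorsk as fully separate languages), `headerCode` would simply return `variant` in every case.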
I hesitate to comment that all written languages are, in fact, different and often divergent from their spoken counterparts. That said, my only slight modification to Robert Ullmann's reasonable offering is that nn/Nynorsk should be used under exactly the same condition as nb/Bokmål: when either is variant. - Amgine/talk 02:31, 21 March 2008 (UTC)
Of course; we often document the spoken and written differences, and the frequent derivation of one from the other. (An interesting case is German pfui > pfui > spoken form > phooey ;-) And yes, you can look at it either way 'round ;-) Robert Ullmann 08:52, 23 March 2008 (UTC)[reply]

Please do not let yourselves be misled by Mr. Ullmann’s current “solution”. (What are his qualifications in this matter, by the way?) It does not work in practice without some major tweaking of templates etc. to bridge the very frequent differences in grammar — see dag#Norwegian, where the common Scandinavian word dag is presented under the header “Norwegian”, but given only the “Norwegian Bokmål” plural forms. While it is true that the singular forms of this word (dag, dagen) are the same in Norwegian Bokmål and Norwegian Nynorsk (and Danish and Swedish at that), Mr. Ullmann may not be aware of the fact that the plural forms are different. Actually, when taking morphology into consideration, most words are different between Nynorsk and Bokmål, despite the (false) impression one gets from the fact that the indefinite singular forms of nouns and adjectives often are the same in both (and in Swedish/Danish). The solution is very simple:

  • Treat Norwegian Bokmål and Norwegian Nynorsk, under those exact names, as separate languages — the same way as the two other Scandinavian languages (Swedish and Danish) are treated.
  • Use either “nb” (correct language code) or “no” (not correct language code, but currently used interwiki code on Wikipedia) for Norwegian Bokmål. A robot can easily standardise the entries either way.
  • Use “nn” (correct language code and currently used interwiki code) for Norwegian Nynorsk.

Respectfully, Olve Utne 07:42, 23 March 2008 (UTC)

"(What are his qualifications in this matter, by the way?)" That is argumentum ad hominem and pretty much discredits anything else you have to say. My qualifications would probably floor you. (for one thing, I am 1/2 Norwegian :-) In the 639-1 process, where we were developing a 2-letter language code to cover a number of languages, we had several serious political problems. The nn and nb codings are the result of one of those problems: the Norwegians vociferously insisted that Bokmål and Nynorsk be coded separately (not all the Norwegian contributors to the committee, just the Nynorsk proponents...) even though the -1 two letter coding should have had only one code for that level of classification. The distinction should have been left to (what is now) -3, or more properly -4. In the end a political "solution" was reached, no, nb, and nn all coded, with the expectation that software implementors would use the correct no code, and just ignore nb and nn. (Which is what they have mostly done. Until one or more very vocal proponents of Nynorsk show up, such as Olve from the nn.wp ;-) Please note that the linguistic credentials of the several hundred people both on the committee and in support (such as myself) were/are very extensive. A similar political problem—in reverse—was with the Chinese languages; the rapporteur from the PRC insisted that there was only one "Standard Written Chinese" (meaning Mandarin written in simplified characters) disregarding that -1 should have coded 11-14 of the languages. The result was zh, which was useful to some extent, but quickly had to be extended by software implementations (zh-CN, zh-TW, zh-min-nan, etc). What I wrote above is the resulting political standard(s), IMHO a better solution for us would be what Atelaes said right at the top: use no/nor "Norwegian", and distinguish with proper tags and such the Nynorsk written forms from Bokmål. Robert Ullmann 08:52, 23 March 2008 (UTC)[reply]
Whether ISO should have been this or that way in your opinion, the fact stands that Nynorsk and Bokmål have separate language tags — just like the two other Scandinavian languages. Since this distinction does exist — and in all of ISO 639-1, ISO 639-2 and ISO 639-3 at that — I do not see any reason why we should not use it.
You are of course free to prove me wrong through addressing the problems I pointed out rather than bragging of your one parent from Norway (I have... — TWO (!) (Unbelievably impressive, eh? ;-) But beside the point.)) and claiming ad hominem attacks (In my book, those are about attacking the person rather than their arguments.. But who am I — a mere ignorant Norwegian linguist — to know what ad hominem means...) as an excuse for avoiding the legitimate questions — see dag#Norwegian.
While you are at it, feel free to explain what makes Nynorsk/Bokmål less of a legitimate distinction than Bokmål/Danish, Czech/Slovakian, Macedonian/Bulgarian — or Nynorsk/Swedish at that. That those of us who write both Nynorsk and Bokmål (like myself) or only Nynorsk (like quite a few others). Isn’t having a separate literary tradition for a century and a half enough for a language to be treated as one? That Cantonese and some other languages within the Chinese languages have not had a full literary tradition until recently is not a reason to treat Nynorsk differently from Bokmål, Swedish, Danish, Faroese, Czech, Slovakian, Macedonian, Bulgarian, Catalonian, etc.
-- Olve Utne 11:00, 23 March 2008 (UTC)
Actually, since you hint at knowing Norwegian (?): How do you want to solve, e.g., the following problems: sei, elv, bli, bok, rot, sau, kjerring? -- Olve Utne 11:27, 23 March 2008 (UTC)


"(not all the Norwegian contributors to the committee, just the Nynorsk proponents...)" Semantics. How there came to be two different ISO-codes is irrelevant. The fact that there indeed is two different ISO-codes is what you should note.
"Look, we have to have Norwegian as Norwegian. (Else everyone will be saying forever "I can't find Norwegian")." Another argument you used earlier which is also pure semantics. The fact that there IS two different official versions of Norwegian, and that both are equally Norwegian, you cannot alter. By your logic, for those who do not know that there are different official languages in the three Scandinavian countries, we should lump the languages together under a Scandinavian header and use Swedish as the norm of Scandinavian languages since it is the one with the most speakers/users, and have the other languages as mere sub-sections. --Harald Khan Ճ 19:01, 24 March 2008 (UTC)[reply]
I agree, we should delete it. - [The]DaveRoss 19:58, 24 March 2008 (UTC)[reply]
  • I am shocked to find out how large a number of SIL's macrolanguages we regularly use here as normal L2 headers. Technically, on the basis of large lexis/morphology variations between individual languages with separate -3 codes, anyone could ask for their separation. I've heard that the differences between some of those Arabic/Chinese dialects can be quite enormous. Sometimes the orthography used for lemmatization (non-phonetic logograms, or consonant-based such as for Arabic/Aramaic..) can be a binding factor, in addition to, of course, shared cultural/religious heritage that strongly emphasizes common treatment. In other cases, when the separation itself is a preferred option for various reasons (usually just the opposite of those used for common treatment - I've read that dozens of almost identical Aboriginal languages are treated separately because no tribe wants to have its language named after a neighbouring tribe's name..), forced unification on the basis of intelligibility arguments would necessarily be enforcing a particular POV, that would encounter sharp criticism all along the way of Wiktionary's lifetime, assuming that the separation meme is typically not peculiar to some small and loud group. For once, almost all Norwegian-language contributors that have voiced their opinion here support the separation of Bokmål and Nynorsk. If it's true what has been said that, notwithstanding the common lemma form, there is a lot of disparity (approx. how much?) in inflected forms, especially those that display next to the headword line (like plurals, participles, definite/indefinite forms) that would be hard to solve correctly, the separation would probably be the best choice. --Ivan Štambuk 21:36, 24 March 2008 (UTC)
  • Huh?? We don't use Chinese as a L2 header, except accidentally or as holdover from early additions. See Wiktionary:About Chinese. We do treat Arabic as a single language, but that's partly driven by the fact that there's an Arabic Wiktionary and an Arabic Wikipedia. We also don't get many contributors who add terms in specific regional variants. We did have someone adding Egyptian Arabic for a time, but he has long since disappeared. Without more contributors for a macrolanguage, there just isn't much we can do. --EncycloPetey 06:03, 25 March 2008 (UTC)[reply]
I was under the impression that here L2 Arabic = Standard Arabic (we use macrolanguage -1 ar=ara as synonymous with -3 individual language arb, and Category:Egyptian Arabic language has only 2 entries, so I guess the dude you're referring to must have been formatting his entries with ==Arabic==, and all of the regional variants in translation tables I saw point to entries with normal ==Arabic== L2); exactly the same situation occurs with Aramaic, where 334a is adding regional-agnostic spellings (-3 arc = Imperial Aramaic = just Aramaic here). Now had you done your homework and actually studied that list, you'd find out that almost all of the other families on that list besides Chinese and Serbo-Croatian are used here as normal L2 headers; in alphabetical order per -3 code; besides the already mentioned Aramaic and Arabic:
Edit history of those categories shows that you edited most of those, so you should have known better before generalizing everything to Chinese family (which represents 0.01% of world's languages anyway). --Ivan Štambuk 10:22, 25 March 2008 (UTC)[reply]
I would like to point out the following:
1) Nynorsk and Bokmål both have their own separate literature.
2) As for the similarities with the other Scandinavian languages, Bokmål is probably closer to Danish than to Nynorsk. --Sigmundg 05:23, 25 March 2008 (UTC)

Hm, I've been absent for a while, and haven't been able to follow this discussion. Are there any conclusions yet? Regardless of what people think is a language or not, I would like an answer to the following:

  1. What to do when a word is the same in both Nynorsk and Bokmål, but has different grammar.
  2. How to categorize
  3. When a word is in both of them, but with different meanings

I also think we need to create some Bokmål and Nynorsk templates. It's about time we can get back to work ... The Norwegian part of this wiktionary is poor :S --EivindJ 09:45, 25 March 2008 (UTC)

My gut feeling (and my brain feeling too) is that deprecating “Norwegian” as an unqualified term is the best solution. Norwegian is a conglomerate of Scandinavian dialects that happen to be spoken in Norway, but it is not a written language, and not even one written language continuum. Rather, there are two written languages — currently known as Bokmål and Nynorsk. Each of these has, like English, its own continuum of conventions and standards, and some of these again have their own names. Thus, we have — from Høgnorsk and archaic Nynorsk through archaic Riksmål to Danish:

Høgnorsk - conservative Nynorsk - NYNORSK - radical Nynorsk || Samnorsk | radical Bokmål - BOKMÅL - conservative Bokmål - Riksmål - Dano-Norwegian - DANISH

(The above is a bit simplified, but should give a reasonably good impression of the situation.)

The continuum from Høgnorsk through radical Nynorsk consists mainly of predictable phonological variations over one common morphological system.

The continuum from radical Bokmål through Riksmål is a bit more complicated, but is also a continuum which has reasonable predictability based on a common morphological system.

Setting up grammar tags to cover the whole Nynorsk spectrum is uncomplicated. Setting up grammar tags to cover the whole Bokmål spectrum from radical Bokmål through Riksmål is also quite easy.

But setting up grammar tags that cover both the Bokmål and the Nynorsk spectrum in one is a daunting task which takes the invention of new grammatical tags that will make the entries much more opaque for the average editor, since one would have to disregard the traditionally numbered classes of nouns, verbs, etc. and in effect invent a new system.

The current unqualified “Norwegian” entries are, as far as I have checked, all actually Bokmål, and by far the easiest solution, practically speaking, would be to have a bot rename them all to Norwegian Bokmål. That being done, one would already have achieved a system which is practical, intuitive and — last, but not least: not original research...!

In addition to the differences in vocabulary and morphology, please note that the syntax is also quite different — to the degree that a literal, word-by-word translation from Bokmål to Nynorsk will, actually, sound very awkward in most cases.

My proposal is therefore:

  • Have a bot move all current “Norwegian” entries (which are de facto Bokmål already) to “Norwegian Bokmål”.
  • Keep all “Norwegian Nynorsk” entries under “Norwegian Nynorsk”.

This reflects the fact that Bokmål and Nynorsk are not one written language continuum, but two. The names reflect the officially designated terms by the official Norwegian Language Council, as well as, except for not having parentheses, the already established Mediawiki interwiki names. -- Olve Utne 15:06, 27 March 2008 (UTC)

WT:MILE

I propose that we re-format Wiktionary:Milestones with User:Nadando/milestone, or something similar. The current page is all over the place formatting-wise, and the new page is sortable. Nadando 03:44, 11 March 2008 (UTC)

Be bold, kick Milestone's ass. - [The]DaveRoss 03:48, 11 March 2008 (UTC)[reply]
I can't, it's protected :) Nadando 03:49, 11 March 2008 (UTC)[reply]
Thanks. Nadando 03:50, 11 March 2008 (UTC)[reply]

ő and ű in .ogg audio filenames

I am seeking BP's help in a problem I'm having with two characters (ő and ű) in .ogg filenames. I am trying to record audio for Hungarian words. I've used both Audacity and Shtooka. The other special characters work fine (á, é, í, ó, ö, ú, ü), but ő and ű are changed to o and u. I'd prefer Shtooka since it is so easy to use. In Hungarian, the accents mean a different letter, not simply the stress in the word. So bor = wine, and bőr = skin, but Shtooka will not create bőr.ogg, only bor.ogg, and if I change the filename to bőr.ogg RealPlayer will not play it. I don't have this problem with mp3 files and I can display these special characters on my PC. I use the hu-%STR mask in Shtooka because this will give the preferred filename format. I copy/paste the words from Wiktionary to Shtooka before recording. If the pasted word contains ő and ű, these two characters are displayed as vertical bars. I was told that other languages with special characters work fine, but I don't know how to fix this. Thanks. --Panda10 22:02, 12 March 2008 (UTC)[reply]

As a work-around is there any standard transliteration scheme for ő and ű to ascii characters (like ä and ö can be written ae and oe)? Thryduulf 22:21, 12 March 2008 (UTC)[reply]
Unfortunately, no or at least I've never seen it. --Panda10 22:23, 12 March 2008 (UTC)[reply]
I recall seeing ooe and uue.—msh210 22:40, 12 March 2008 (UTC)[reply]
I've just tested ô and û, both worked fine. It seems that the Latin-Extended characters will not work, but Basic Latin and Latin-1 will. If nothing else works, I will use these characters. They are not correct, but look close enough to the original. Thanks. --Panda10 23:41, 12 March 2008 (UTC)[reply]
Is this a problem when naming the files on your own computer, or when you upload them to Commons? Did you know that you can upload the file under a different name than the one saved on your computer? Have you tried doing that? Have you tried using QuickTime instead of RealPlayer? This same problem can potentially affect many other languages, so I'd rather we didn't try to "work around" it by giving such files a different name. The audio file name should always match the entry name. If there is a deficiency in RealPlayer, then we should tell them and let them decide whether they want to fix the shortcomings in their software on their own. --EncycloPetey 03:23, 13 March 2008 (UTC)[reply]
This is not a problem with naming files on my computer. I have several mp3 files with ő and ű in the filename, and they can be played fine with Windows Media Player. Uploading to Commons - I uploaded only words that did not contain ő and ű, so I can't answer this part of your question. For now, I am trying to play the files with ő and ű in the .ogg filename on my computer, with no success. I tried QuickTime; it displays an error message for any .ogg file. What is the recommended player for .ogg files? How will Wiktionary users play these files? --Panda10 19:25, 13 March 2008 (UTC)
I use QuickTime to play all .ogg files, but then I am using a Mac. I would try uploading some files with the problem characters to Commons, changing the name at upload to the Hungarian spelling, and see if you can then play them from within a page. --EncycloPetey 19:43, 13 March 2008 (UTC)
Thanks for the idea. I uploaded the audio for ősz (autumn) and played it back successfully from Wiktionary using the same RealPlayer that is not willing to play it on my computer... Does this make sense to you? --Panda10 20:55, 13 March 2008 (UTC)
It makes sense in that I understand what you are saying, and am not surprised by the vagaries of internet interaction. Do I comprehend the reason it works when done this way? No. But I do know that I've successfully played .ogg files for odd script words before this way. --EncycloPetey 23:29, 13 March 2008 (UTC)
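The observation earlier in this thread, that Basic Latin and Latin-1 characters survive while ő and ű do not, matches where those letters sit in Unicode: ő is U+0151 and ű is U+0171 (Latin Extended-A), whereas á, é, í, ó, ö, ú, ü, ô and û are all at or below U+00FF. A minimal sketch for spotting the affected characters in a filename follows; the helper name `charsOutsideLatin1` is hypothetical, and the check says nothing about why Shtooka or RealPlayer behave as they do.

```javascript
// Return the characters of `name` that fall outside Latin-1 (above U+00FF).
// Iterating with the spread operator walks the string by code point.
function charsOutsideLatin1(name) {
  return [...name].filter((ch) => ch.codePointAt(0) > 0xff);
}

console.log(charsOutsideLatin1('bőr.ogg'));
console.log(charsOutsideLatin1('bôr.ogg'));
```

The check assumes precomposed characters; a name stored as a base letter plus a combining accent would be flagged for the accent even when the precomposed letter is in Latin-1.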

Even sysops are ignoring my questions

Which box(es) should be utilized on failure, zero gravity, vampire, and martial art? I am being told of a non-existent policy which is in practice (WTF?) albeit it is not written. Ergo, I was temporarily banned for making what I thought were bold edits, and some of my thoughts continued to go unanswered. Can anyone lend a voice? This is probably the last place I'll be repeating myself. Sesshomaru 05:42, 13 March 2008 (UTC)[reply]

Relevant discussion at User talk:Sesshomaru. -Atelaes λάλει ἐμοί 05:52, 13 March 2008 (UTC)[reply]
This seems like a strange use of blocking, although I haven't looked into it thoroughly, and don't plan to.
Personally I think both boxes should be deprecated in favor of {{pedialite}} in a ===See also=== section. As an added bonus, that template is unobtrusive enough that it can be repeated as needed. If we must choose, then at least 99% of the time we should be linking to the disambiguation page. Dab pages seek to provide the full range of encyclopedic meanings, just as we seek to provide the full range of lexical meanings, and are thus the most appropriate next stop for someone who didn't find what they were looking for in our entry. -- Visviva 05:59, 13 March 2008 (UTC)[reply]
Can you clarify? All I understood was "don't use 'em" and "use dabs 99% of the time". Sesshomaru 06:05, 13 March 2008 (UTC)[reply]
Visviva suggests you use {{pedialite}} as an inline, unobtrusive link in a ===See also=== section within the part of speech section. (Notice the level of the header, this is important.) - Amgine/talk 06:11, 13 March 2008 (UTC)
Yes, try at most one box, and preferably only {{pedialite}}, although we really need to update the WT:FAQ to encourage the inline template. If they are as such, more than one link to Wikipedia is NOT a problem, provided they are relevant. A disambiguation page is fine, and I would say a link corresponding to any definition where the Wikipedia article is directly related. 71.129.48.8 06:17, 13 March 2008 (UTC)[reply]
Okay. I'm starting to understand. Can someone provide sample(s) of links which share this tag? I'm more curious in seeing the layout than anything else. Sesshomaru 06:28, 13 March 2008 (UTC)[reply]
Have a look at Special:Whatlinkshere/Template:PL:pedia. One example is at pigment#See also. Mike Dillon 06:42, 13 March 2008 (UTC)[reply]
I'm sorry zero gravity was reverted time and again instead of indicating a better way to accomplish the goal you had in mind. I agree with Visviva that a box makes sense for disambiguation, but the community is splintered on the whole issue. Personally I don't think many people have put the necessary thought into it, assuming one page on Wiktionary equates to one page on Wikipedia, but I digress. To address your question, no one would object to the way I've set up zero gravity now, using {{pedialite|...}} and {{pedialite|dab=...}}. 71.129.48.8 06:57, 13 March 2008 (UTC)[reply]
Although I have never seen a policy about it I have always used links to Wikipedia either when they have a page at the same title, or when I am using terms in the definition that are not explained on Wiktionary (for example expanding abbreviations to places or entity names). As a dictionary I feel that Wiktionary should definitely not be aiming to provide background information about related topics, though linking to related words on Wiktionary is useful for broadening vocabulary, and that is why we have lots of synonyms, antonyms, etc.etc. sections. It makes clear sense to me to link to the disambiguation page on Wikipedia, unless we want to have an interwiki for each sense of the word, which is redundant and ugly, as there is no way we can tell which article people will be interested in. On a related note, if Wikipedia's article at the identical title is actually a redirect, it is still preferable to link directly to it than to disambiguate manually, this is so that if the redirect is converted into an article or pointed to a different place then the link will still make sense.
For example, with zero gravity, the link to w:Weightlessness is irrelevant; if people want to know what weightlessness is they can click on the link to weightlessness in the definition (where I assume they will find a link to 'pedia should one be necessary). It is worth noting that we have a JavaScript extension that provides interwiki links to Wikipedia wherever {{pedialite}} and related templates are used, and that at the moment it is assuming an interwiki between Wiktionary's "zero gravity" and Wikipedia's "weightlessness", which should not be there. Conrad.Irwin 10:55, 13 March 2008 (UTC)
I would strongly suggest that in the case of redirects, "Zero gravity" should be preferred over "Weightlessness", basically, to avoid confusion. And you never know if the redirect could very well become its own article (this is what happened to w:Kristin Wells; for a long time the link targeted w:Superwoman and now it has its own page). Per this discussion, I made these changes, but was I correct in doing this edit (martial art or martial arts)? And back to the fundamental inquiry: what about an instance such as Batman vs. batman, Superman vs. superman, etc.? Sesshomaru 22:10, 13 March 2008 (UTC)[reply]
In the case of zero gravity, WP's disambiguation page is not relevant as there is only one definition, so I think it should link to w:Zero gravity - though that is a personal preference. This discussion looks in favour of having only one Wikipedia link per entry, yet both of the "these changes" links contain two links, so I am not sure why you are referring people to here to justify them. The martial arts edit was fine. For Batman I think we should link to w:Batman (disambiguation) as we have more than one meaning in common. For batman I feel that we can link to w:Batman (military) as that is the meaning given - though I wouldn't object to linking to the disambiguation page there too. The links on superman and Superman feel right to me, though I can't see the link on superman being much use, and wouldn't have added it myself. I don't think that we should be using Wikipedia as a place for people to find new meanings of words, but instead to enhance their knowledge and understanding of the words we define. Conrad.Irwin 14:58, 14 March 2008 (UTC)
I don't understand. Is this a reversal of what you said above? Should we link to w:Zero gravity or to w:Zero gravity (disambiguation)? First you said that w:Weightlessness, which the first redirects to, is irrelevant, and now you say that the disambiguation page is not relevant. I would imagine either could be depending on what the user is looking for, so why not have both? If there were more definitions, as with trunk, there would have to be more links anyways. DAVilla 11:03, 19 March 2008 (UTC)[reply]
The thing is that "weightlessness" is not the same as "zero gravity"; the fact that Wikipedia treats them the same at the moment is irrelevant. Whether to link to a disambiguation page or a specific article is a choice that needs making on a per-entry basis, but I can never see the use of linking to a different word on Wikipedia. Our aim is not to provide people with information about topics, it is to provide them with a better understanding of words. Conrad.Irwin 01:12, 20 March 2008 (UTC)
There's a policy draft at Wiktionary:Links; it definitely needs more work before it's representative of community standards, but it's a start. —RuakhTALK 12:24, 13 March 2008 (UTC)[reply]

Can you see this: 𐎧𐏁𐎹𐎠𐎼𐏁𐎠?

I am interested in knowing what percentage of people can see the Old Persian script without installing extra fonts. If the majority of people can see only '??????' then it might be good to add a template or some information on adding the font and making it display properly. There is such a template on Wikipedia for Chinese [5] (although I think most people can see Chinese by default). Pistachio 00:25, 15 March 2008 (UTC)[reply]

There are numerous fonts (including this one) that are not installed in my Windows setup. Mostly I don't miss them. DCDuring TALK 00:33, 15 March 2008 (UTC)[reply]
Something like 0.01%. Lots of those ancient languages' scripts require special fonts; in this case Aegean.otf or Xerxes (those two are supported by Xpeo). I don't think it would be good to clutter pages with "this needs special fonts" messages; how about instead providing image display in the headword by default for all of these? Like in e.g. Phoenician 𐤀𐤍𐤊𐤉 (ʾnky) or Gothic 𐌷𐌰𐌹𐍂𐍄𐍉 (hairtō) ? --Ivan Štambuk 00:44, 15 March 2008 (UTC)
I can see it, incidentally, but I doubtless have extra fonts installed for a lot of things. --Wytukaze 00:48, 15 March 2008 (UTC)[reply]
I can't believe a Wiktionary admin can't see Old Persian cuneiform. How embarrassing. I move to desysop DCDuring for their lack of interest in esoteric languages. :-) -Atelaes λάλει ἐμοί 00:51, 15 March 2008 (UTC)[reply]
Now that I've outed myself, I should admit that even if it were visible, I couldn't pronounce or read this script or, er umm, several others, including, er umm, a couple that do display on the screen (though I couldn't say what they display). I blame my failing eyesight. DCDuring TALK 00:58, 15 March 2008 (UTC)[reply]
I am able to view it (on Debian Linux) without doing anything special, though there are several scripts which I can't decipher. Conrad.Irwin 01:01, 15 March 2008 (UTC)[reply]
I can see it in Linux (Kubuntu), the only non-default font I've installed on this machine is Gothic. I can't see in Windows XP on a machine that is completely unadulterated font-wise. Thryduulf 01:03, 15 March 2008 (UTC)[reply]
I can't read it (or several other scripts, including Gothic) though. Thryduulf 01:09, 15 March 2008 (UTC)[reply]

Thinking about the suggestion of image display, is there any way to automate the generation of these images in the same way that complicated maths formulae are? 01:12, 15 March 2008 (UTC)

I was thinking of a javascript solution which allowed people to define which fonts should be replaced in this way, and get it to insert the images in place of the text as and when it was required. But I then got distracted and ran out of ideas, maybe I'll come back to it some other time. Conrad.Irwin 01:26, 15 March 2008 (UTC)[reply]
Do we have one or more tables of languages, scripts, and fontnames (or link to same) that one could refer to find what one needed to install in each major operating system/browser pending more user-friendly solutions? DCDuring TALK 01:34, 15 March 2008 (UTC)[reply]
Not that I'm aware of. I always use [6]. It's an excellent site. You may want to simply begin with Code2000. While it's not always the prettiest of fonts, it covers a fairly broad swathe. -Atelaes λάλει ἐμοί 01:36, 15 March 2008 (UTC)[reply]
Perhaps there could be a section in the 'help' with this information. Also, the idea of having images to display seems lovely (it goes way over my head though). Pistachio 01:40, 15 March 2008 (UTC)[reply]
A help page with a table showing samples from each script along with a link to get a copy of that font if you can't see it, and a link to a how to install fonts on various operating systems seems like a very useful thing to have. Thryduulf 02:05, 15 March 2008 (UTC)[reply]

I wonder if it would make sense to change some of the script templates for less common scripts to link to a help page or appendix giving information about installing fonts or an option to have those scripts turned into images via JavaScript (for logged-in users). Alternatively, JavaScript code could use the CSS classes used by these scripts only when they are the head-word to avoid cluttering pages with tons of help links. Mike Dillon 02:50, 15 March 2008 (UTC)
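As a starting point for the JavaScript idea floated in this thread, a script would first need to recognise which characters belong to the script in question. The sketch below does that for Old Persian, whose Unicode block runs from U+103A0 to U+103DF. It does not detect whether the reader has a suitable font installed, which is a separate problem, and the helper names are hypothetical.

```javascript
// Unicode "Old Persian" block: U+103A0 through U+103DF.
const OLD_PERSIAN_START = 0x103a0;
const OLD_PERSIAN_END = 0x103df;

// True if the single character `ch` lies in the Old Persian block.
function isOldPersian(ch) {
  const cp = ch.codePointAt(0);
  return cp >= OLD_PERSIAN_START && cp <= OLD_PERSIAN_END;
}

// Hypothetical helper: list the Old Persian code points found in `text`,
// e.g. as keys for looking up replacement images.
function oldPersianCodePoints(text) {
  return [...text]
    .filter(isOldPersian)
    .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase());
}

console.log(oldPersianCodePoints('𐎧𐏁𐎹𐎠𐎼𐏁𐎠'));
```

Spreading the string iterates by code point rather than by UTF-16 unit, which matters here because every Old Persian sign lies outside the Basic Multilingual Plane.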

There are no OPC signs on Commons, so I made some ad-hoc images (that really ought to be turned into SVGs by a knowledgeable wizard). Nothing impressive, but it's better than nothing. I like the idea of some sort of superscript over a headword with a message like "Problem with fonts?" linking to Appendix: with detailed instructions. --Ivan Štambuk 03:23, 15 March 2008 (UTC)[reply]

Cross-dictionary bookmarklet

I have just made a bookmarklet that you can use when on one online dictionary to add links to other online dictionaries for the same word. So far only Merriam-Webster, Microsoft Encarta, and the English Wiktionary are supported.

It's only tested on Firefox so far and for some reason the links are not clickable on Wiktionary. Improvements welcome. Copy and paste this code into a bookmark:

javascript:if(location.host=='www.merriam-webster.com')w=decodeURIComponent(location.pathname.substr(12));else if(location.host=='encarta.msn.com'){t=document.getElementsByTagName('title')[0].firstChild.nodeValue;w=t.substr(0,t.length-38).replace('’',"'");}else if(location.host=='en.wiktionary.org')w=decodeURIComponent(location.pathname.substr(6));else w=null;di=document.createElement('div');di.innerHTML='<a href="http://en.wiktionary.org/wiki/'+w+'">Wiktionary</a> <a href="http://encarta.msn.com/dictionary_/'+w+'.html">Encarta</a> <a href="http://www.m-w.com/dictionary/'+w+'">Merriam-Webster</a>';di.align='center';bod=document.getElementsByTagName('body')[0];if(w)bod.insertBefore(di,bod.firstChild);void(1)

hippietrail 11:18, 15 March 2008 (UTC)
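For anyone wanting to improve the bookmarklet, its per-site logic is easier to follow when unpacked into functions. The following is a readable sketch of the same steps, not a drop-in replacement: the function names `extractWord` and `buildLinks` are hypothetical, the page details are passed in as arguments so the logic can be tried outside a browser, and the word is URL-encoded before being placed in the links, which the one-liner above does not do.

```javascript
// Hypothetical helper mirroring the bookmarklet's per-site logic.
// Returns the headword for a supported dictionary page, or null otherwise.
function extractWord(host, pathname, title) {
  if (host === 'www.merriam-webster.com') {
    // Path has the form /dictionary/<word>.
    return decodeURIComponent(pathname.slice('/dictionary/'.length));
  }
  if (host === 'encarta.msn.com') {
    // The bookmarklet drops a fixed 38-character suffix from the page title
    // and replaces a typographic apostrophe with a plain one.
    const end = Math.max(0, title.length - 38);
    return title.slice(0, end).replace('’', "'");
  }
  if (host === 'en.wiktionary.org') {
    // Path has the form /wiki/<word>.
    return decodeURIComponent(pathname.slice('/wiki/'.length));
  }
  return null;
}

// Hypothetical helper: build the HTML for the three cross-links.
function buildLinks(word) {
  const w = encodeURIComponent(word);
  return [
    `<a href="http://en.wiktionary.org/wiki/${w}">Wiktionary</a>`,
    `<a href="http://encarta.msn.com/dictionary_/${w}.html">Encarta</a>`,
    `<a href="http://www.m-w.com/dictionary/${w}">Merriam-Webster</a>`,
  ].join(' ');
}

console.log(extractWord('en.wiktionary.org', '/wiki/talkative', ''));
console.log(buildLinks('talkative'));
```

In a browser, the arguments would come from `location.host`, `location.pathname` and `document.title`.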

Can somebody explain to me why {{defective}} includes the POS while no other such templates ({{ambitransitive}}, {{ergative}}, {{ditransitive}}, {{impersonal}} and even {{auxiliary}}) do? Circeus 15:58, 15 March 2008 (UTC)[reply]

Well, it's a different kind of description; {{defective}} has to do with the forms a verb takes (and doesn't take), while the others have to do with the grammatical frames it's used in. That said, I don't think we need {{defective}} at all; it's not really a context label, and is best covered by the inflection line, the conjugation table (if any), and/or usage notes. I mean, marking a sense "defective" is really useless anyway, since it doesn't tell you what forms exist and what forms don't. —RuakhTALK 19:23, 15 March 2008 (UTC)[reply]
Where is this template intended for use? The only place this information is meaningful is where the full set of inflected forms is presented. It might appear on an inflection line, to alert a user that the verb does not follow the full normal pattern, or it might appear in an Inflection / Conjugation section for a similar reason. There is no other place I can imagine it being useful. I therefore don't see any use for this template. When we redesigned the {{la-verb}} template, we set it up so that "pattern=defective" could be used. This way, it displays in the inflection line and it provides a link explaining what it means. So, we don't need the template under discussion for Latin entries. --EncycloPetey 19:28, 15 March 2008 (UTC)[reply]
I don't see why every single bit of information should be included on an inflection line. This template is useful for signaling verbs which might need special treatment, both for users and for those who look after the formatting on various languages. Physchim62 14:43, 19 March 2008 (UTC)[reply]

talkative - category?

I'd like to add talkative and its synonyms to a category. Would you recommend the existing Category:Behaviour? --Panda10 22:01, 15 March 2008 (UTC)[reply]

If you can come up with enough words, it might be worth starting a Category:Talking as a subcategory of Behavior (and of Category:Sound). I imagine it would be fairly easy to come up with enough terms for such a category. (e.g. loquacious, chatty, talk, speak, speech, say, blab, gossip, chat) --EncycloPetey 23:28, 15 March 2008 (UTC)[reply]
If it is created, Category:Talking should probably be a child of Category:Language as well. Another possible name would be Category:Oral communication, but that could include things that aren't "language" like grunting or humming. Mike Dillon 23:43, 15 March 2008 (UTC)[reply]
I created the category and added the synonyms I've found so far. Thanks. --Panda10 01:56, 16 March 2008 (UTC)[reply]
How should the new Category:Talking relate to Category:Communication? Several of the above mentioned words are already there. --Panda10 15:29, 16 March 2008 (UTC)[reply]
It would be a subcategory of that as well. We have some categories that are listed in three or more locations because of their breadth and importance. --EncycloPetey 15:30, 16 March 2008 (UTC)[reply]
Mike Dillon changed the code of Category:Talking to {{topic cat|lang=en|current=Talking}}. I used nav before, but I don't know how to update topic cat. Can I just add Category:Communication in the second line? --Panda10 15:37, 16 March 2008 (UTC)[reply]
To answer you more directly: Yes you could just add Category:Communication. You could also add "parent=Communication" to the call to {{topic cat}} and it would get merged with the parents defined at Template:topic_cat_parents/Talking. I'm planning to set up a process to watch for these sorts of changes to allow people who want to be familiar with the internals to add them into Template:topic_cat_parents/Talking and make it easier for others to just make their changes and go about their work. Mike Dillon 05:04, 17 March 2008 (UTC)[reply]
You merely have to add the extra listing to Template:topic_cat_parents/Talking. --EncycloPetey 15:42, 16 March 2008 (UTC)[reply]
I guess this is the tradeoff we have with {{topic cat}} if we decide to adopt it instead of {{nav}}. Adding a parent category is not as obvious, but once it is done it is done for the category in all languages and description changes can be managed from one place as well. I'm planning to set up a read-only bot to watch for changes in the topic category tree and parent/description configuration, so it would actually be OK to just add [[Category:Communication]] directly to Category:Talking and the fact that it is missing from the parent configuration would be noticed and reported. The code is partially written, but I'll try to get a report running soon. Mike Dillon 04:38, 17 March 2008 (UTC)[reply]
Thanks for the information, Mike. I do like the new system because of its obvious advantages. It would help to add more usage information to the template talk page. For example, what to do if an existing category has to be added to another existing category as a subcategory. I did read the template talk page before but it was not as clear as now after reading your explanation and seeing how EncycloPetey modified it. --Panda10 18:24, 17 March 2008 (UTC)[reply]
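The check Mike describes for the read-only bot amounts to comparing two lists: the parent categories written directly on a category page, and the parents configured under Template:topic_cat_parents. A minimal sketch of that comparison follows. The function name `missingFromConfig` and the sample data are invented for illustration, and how the bot would actually fetch either list is not covered.

```javascript
// Hypothetical report step: return the parents that appear on the category
// page itself but are absent from the shared parent configuration.
function missingFromConfig(pageParents, configuredParents) {
  var configured = new Set(configuredParents);
  return pageParents.filter(function (parent) {
    return !configured.has(parent);
  });
}

// Illustrative data only: Category:Communication was added by hand to
// Category:Talking but is not yet in the parent configuration.
var onPage = ['Behaviour', 'Sound', 'Language', 'Communication'];
var inConfig = ['Behaviour', 'Sound', 'Language'];
var toReport = missingFromConfig(onPage, inConfig);
```

Anything left in `toReport` would be what someone familiar with the internals then adds to the configuration.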

Persian, Urdu, Hebrew, Arabic, Korean entry keyboards

Following on from the above discussion about less-commonly installed fonts, I want to raise the point that some users will have difficulty entering search terms in languages with non-Latin scripts such as Persian, Urdu, Hebrew, Arabic and Korean. They may lack administrator rights to install the languages themselves (students, people in the office), they could be using someone else's computer whilst travelling or they may not know how to install input for extra languages. Also, for many Persian-speaking people, a fault with their computer means inputting Persian produces Arabic letters instead of the modified Persian versions, for example ي instead of ی. A search for a Persian word in Wiktionary using those Arabic letters will produce no results unless there is a redirect in place: searching for "ايراني" instead of "ایرانی" produces no results. Therefore perhaps creating online keyboards to facilitate input and searching in some languages would be really helpful for some people. Does anyone think this is a good idea? Pistachio 02:30, 17 March 2008 (UTC)[reply]

Yes. It can be a page (Wiktionary:Search using various character sets or some such) which uses JavaScript to paste characters to a search box, and then uses the usual Wiktionary search as the search mechanism. Cf. [7].—msh210 16:10, 17 March 2008 (UTC)[reply]
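Besides an on-screen keyboard, the Persian case described above could also be handled by normalising the search term before it is submitted. The sketch below maps the Arabic yeh (U+064A) to the Persian yeh (U+06CC). The kaf to keheh mapping (U+0643 to U+06A9) is an extra assumption that was not raised in this discussion, and the function name `normalizePersian` is made up. How such a function would be wired into the search box is left open.

```javascript
// Arabic code points that are often typed in place of their Persian forms.
var ARABIC_TO_PERSIAN = {
  '\u064A': '\u06CC', // ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
  '\u0643': '\u06A9'  // ARABIC LETTER KAF -> ARABIC LETTER KEHEH (assumption)
};

// Hypothetical helper: replace the Arabic letters listed above with
// their Persian counterparts and leave everything else untouched.
function normalizePersian(term) {
  return term.replace(/[\u064A\u0643]/g, function (ch) {
    return ARABIC_TO_PERSIAN[ch];
  });
}
```

This should only be applied when the user has said they are searching for Persian, since the Arabic letters are of course correct in Arabic entries.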

Images

How do people feel about something like this, adding an image for each sense for which an image is available? In general, do galleries have a place in Wiktionary entries, and if so where should they be placed within the entry? -- Visviva 07:28, 17 March 2008 (UTC)[reply]

Certainly for nouns and proper nouns, images are often good at aiding understanding of the word being defined. Some verbs can also be imaged, but animations would be better for some. Adjectives, etc. are far more difficult to illustrate (e.g. what image would you add to eloquent?).
Regarding the placement of the images, generally I don't like the use of galleries outside Commons, as in most cases inline images work better (imo). Compare ring with router. This can cause problems when the picture(s) extend beyond the definition lines, but this can be overcome - see bassoon (compare with this old revision). (There is probably a better way of doing this with js or css than the table format I used there, but I don't know any js or css.) Thryduulf 14:45, 17 March 2008 (UTC)[reply]
I agree that galleries aren't ideal (at least in their current default format). In general, I'd like the images to be as close as possible to the definitions. But it seems to me that the bassoon solution breaks down rather quickly when there are 4 or more illustrable senses (and most senses of concrete nouns are illustrable, although the store of images on Commons still leaves much to be desired in this regard). -- Visviva 14:57, 17 March 2008 (UTC)[reply]
How about a Gallery: namespace, for use in situations where there need to be lots of images for lots of senses? If there is only one image needed, it can go on the page as we do now, but in situations where many are needed, we could have one on the page, then a link to the Gallery namespace for additional illustrations. --EncycloPetey 15:37, 17 March 2008 (UTC)[reply]
I think a gallery namespace is a bad idea. Finding uses of a word throughout history is one of the basal functions of a dictionary, showing picture examples is not. While there's nothing wrong with adding pictures, I don't think our emphasis falls on that area enough to justify a new namespace. -Atelaes λάλει ἐμοί 18:44, 17 March 2008 (UTC)[reply]
I really prefer the use of images off to the right side of the page. Also I think it is important to use the (#) notation to associate the image with its definition. The gallery might be able to work, but we don't use the width of the page as much as we could, which is a shame. We often have great big areas off to the right and pages that go on forever vertically, with most of the content well out of sight. Off to the right the images balance the text density of the left side of the page and keep the right side from being mostly blank and boring. We really don't need images for every sense, I think it might be best to only use them when they clarify or clearly illustrate the definition, or when they just look really good :). For some words that will mean 10 images are called for, for others one or none. - [The]DaveRoss 19:59, 17 March 2008 (UTC)[reply]
In general, I don't think numbered notation works for images any more than for translations (although the consequences of confusion are less severe). Numbers change, and the people making the changes don't always notice the by-number references elsewhere in the entry. That's why I usually try to put some sort of short gloss in the caption (as in ring). -- Visviva 06:28, 20 March 2008 (UTC)[reply]
I like it. It's certainly better than messing things up with lots of miscellaneous floaty stuff. Conrad.Irwin 19:27, 17 March 2008 (UTC)[reply]
Without intending to stir up this issue any further, it does occur to me that the problems with associating images to senses would disappear if we did begin using the sense, rather than the POS, as our primary unit of organization. -- Visviva 06:28, 20 March 2008 (UTC)[reply]

Category:Computer Science vs Category:Computer science

Is there a reason why the Category:Computer Science has "science" with the first capital letter? If not, may I move it to Category:Computer science? --Daniel Polansky 10:57, 9 February 2008 (UTC)

No, yes, go ahead. H. (talk) 09:14, 17 March 2008 (UTC)[reply]
I've wondered the same thing about Category:Food and Drink, though moving all those categories by hand seemed like more trouble than it could possibly be worth. -- Visviva 14:58, 17 March 2008 (UTC)[reply]
I've made those sort of moves by hand before. If someone will poke me (after Thursday) about getting this done, I'll do the dirty work. --EncycloPetey 15:33, 17 March 2008 (UTC)[reply]
I have moved the computer science category manually. It seems to have worked nicely, also because of the heavy use of the {{computer science}} in the entries. --Daniel Polansky 19:14, 17 March 2008 (UTC)[reply]

Basque etymological dictionary

For those interested in Basque: http://linguistlist.org/issues/19/19-863.html#1 H. (talk) 09:13, 17 March 2008 (UTC)[reply]

Category:Food and Drink

I have manually moved Category:Food and Drink to Category:Food and drink. What I have not moved are the numerous non-English subcategories of that category, listed in the new category. It would be nice if other people could help. Otherwise, I am planning to slowly work on the non-English categories for food and drink too.

Another category worth fixing is Category:Spices and Herbs, fortunately having fewer non-English subcategories. --Daniel Polansky 09:04, 18 March 2008 (UTC)[reply]

I'm not sure that I'd say it's completely ready yet, but the {{topic cat}} stuff could help with future moves like this. I've configured Template:topic cat parents/Food and drink, Template:topic cat parents/Foods, Template:topic cat parents/Breads, and Template:topic cat parents/Desserts, so any of those categories can be configured by replacing their contents with {{topic cat|lang=XX|current=CATEGORY}}. If we had mw:Extension:StringFunctions, it would just be {{topic cat}} since we could have the template take care of splitting the language code from the category name. Mike Dillon 15:13, 18 March 2008 (UTC)[reply]
I've populated all of the category parent entries for the Category:Food and drink tree under Template:topic cat parents. I'm sure some of them still need descriptions under Template:topic cat description, but many will be fine with the standard description. Mike Dillon 15:54, 18 March 2008 (UTC)[reply]
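The split Mike mentions, separating the language code from the category name, is simple string handling. Here it is sketched in JavaScript rather than in template syntax, purely as an illustration. The function name `splitTopicCategory` is invented, and treating an unprefixed name as English is an assumption based on Category:Talking being set up with lang=en.

```javascript
// Hypothetical helper: split "de:Spices and herbs" into the two values
// that {{topic cat|lang=XX|current=CATEGORY}} currently takes by hand.
function splitTopicCategory(name) {
  // A language prefix is taken to be 2-3 lowercase letters, optionally
  // followed by hyphenated subtags (e.g. "zh-tw"), then a colon.
  var match = /^([a-z]{2,3}(?:-[a-z]+)*):(.+)$/.exec(name);
  if (match) {
    return { lang: match[1], current: match[2] };
  }
  // Assumption: categories without a prefix are the English ones.
  return { lang: 'en', current: name };
}
```

For example, `splitTopicCategory('zh-tw:Adverbs')` gives lang `zh-tw` and current `Adverbs`.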

Should Category:Automotive be moved to become a sub-category of Category:Road transport? Thryduulf 12:23, 18 March 2008 (UTC)[reply]

I think so, yes. -- Visviva 14:31, 18 March 2008 (UTC)[reply]

Is there a reason category:zh:Adverbs is a sub-category of category:Adverbs rather than category:Adverbs by language? Thryduulf 17:37, 18 March 2008 (UTC)[reply]

Yes. Physchim62 14:44, 19 March 2008 (UTC)[reply]
The whole situation with "zh" seems very weird to me. I can see that we have Category:Mandarin adverbs, Category:zh:Adverbs, Category:zh-cn:Adverbs, and Category:zh-tw:Adverbs. We don't have Category:Cantonese adverbs, but we do have Category:Cantonese nouns. The "zh" categories seem entirely inappropriate to me for an "adverbs" category. The Chinese languages are the only languages that are handled like this. This seems to be a lowest-common denominator type thing where the words made up of simplified Chinese characters end up in the "zh-cn" categories and the ones made up of traditional Chinese characters end up in the "zh-tw" categories instead of identifying them in each of the actual languages that the words belong to. No doubt someone who understands these matters better will show me the error of my ways, but it doesn't seem that there is a good reason that Chinese languages should be using language code-prefixed categories for parts of speech. Mike Dillon 14:56, 19 March 2008 (UTC)[reply]

Consideration on the order of context tags

Just to stir up some discussions and thought, this is the ordering I've been using (and switching articles to):

  1. Grammar information
  2. Topical labels
  3. Regional labels
  4. Formality labels (in practice, mostly "informal" and "slang", but also stuff like "literary" and "baby talk/childish")
  5. Politeness labels ("euphemism", "derogatory", "pejorative", "vulgar", "jocular" etc.)
  6. Temporal/frequency labels ("rare" goes here)

This has felt to me the most natural ordering, though I'm not clear myself why. Typically, I'm more comfortable with the Grammar labels not being in the same parenthesis as the other ones, though I haven't been too consistent on that. I want to point out that in quite a few cases, I've been removing templates that felt redundant (e.g. "informal" alongside "slang", "vulgar" or "jocular"). Circeus 14:49, 19 March 2008 (UTC)[reply]

The order seems agreeable. This might be worthy of becoming at least a guideline. I'd favor facilitating the placement of all of these inside a single set of brackets, which usually works with our existing tags, but perhaps not every one. Are there any limits on the number of these in {{context}}? Six seems like it would be insufficient since there can be multiple topics and regions (and possibly others). Also editors use up slots with qualifiers like "mostly". DCDuring TALK 15:33, 19 March 2008 (UTC)[reply]
After some experimenting, it seems you can have up to 10 labels, subsequent ones are ignored. However, "usually" and "mostly" (and presumably other similar ones) use 2 slots. When using "usually" or "mostly" means there are too many arguments there is a red link to "Template:context 10" at the end. In contrast to what I expected, "and" and "or" each only take 1 slot. |_| of course also uses 1 slot. Thryduulf 18:13, 19 March 2008 (UTC)[reply]
It is unusual for more than 4 or 5 labels to appear (including non-label stuff such as Thryduulf just mentioned). At that point a usage note typically becomes a good idea. Circeus 19:07, 19 March 2008 (UTC)[reply]
Let me stick a technical note in here: the number of "slots" is not built into {{context}}, it is just a result of the number of {context n}'s that have been set up. Each of {{context 1}} through {{context 9}} is a redirect to Template:context. This "tells" the parser that the resulting recursion is intentional. Create {{context 10}} as the same redirect, and you'll have 11 "slots", and so forth. At some point the tags will be longer than desirable, but this isn't an implementation limit. (So there isn't any reason to worry about "using up" slots.) Robert Ullmann 09:13, 23 March 2008 (UTC)[reply]
As the discussion suggests, 10 seems to exceed what we feel would be useful to show in a sense line. How hard would it be to have a trustworthy bot that put context tags in our desired order (assuming that we get consensus on it)? I am having trouble seeing how that could lead to trouble (ignoring tech glitches for the moment). DCDuring TALK 12:06, 23 March 2008 (UTC)[reply]
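The reordering step of such a bot could be quite small. The sketch below sorts labels by the six classes Circeus listed at the top of this section. The table assigning labels to classes is only a handful of examples, the "US" and "UK" entries and the function name `orderLabels` are invented for illustration, and putting unknown labels last is an arbitrary choice. It relies on `Array.prototype.sort` being stable, which modern JavaScript engines guarantee, so labels of the same class keep their original relative order.

```javascript
// Class numbers follow the proposed ordering: 1 grammar, 2 topical,
// 3 regional, 4 formality, 5 politeness, 6 temporal/frequency.
var LABEL_CLASS = {
  'transitive': 1, 'intransitive': 1, 'ergative': 1,
  'computer science': 2,
  'US': 3, 'UK': 3,                      // hypothetical regional labels
  'informal': 4, 'slang': 4, 'literary': 4,
  'euphemism': 5, 'derogatory': 5, 'pejorative': 5, 'vulgar': 5, 'jocular': 5,
  'rare': 6
};

// Hypothetical helper: return a reordered copy of the labels.
function orderLabels(labels) {
  var UNKNOWN = 99; // labels missing from the table go to the end
  return labels.slice().sort(function (a, b) {
    var classA = LABEL_CLASS.hasOwnProperty(a) ? LABEL_CLASS[a] : UNKNOWN;
    var classB = LABEL_CLASS.hasOwnProperty(b) ? LABEL_CLASS[b] : UNKNOWN;
    return classA - classB;
  });
}
```

Qualifiers such as "mostly" or "usually" attach to the label that follows them, so a real bot would have to move them together with that label. This sketch does not attempt that.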
The context slot contents get processed to determine the need for new categories, I believe, so it is useful for things that could be categories to be there. That's one reason I don't like using up slots for "empty" qualifiers, although I suppose that if there were enough demand we could have some special means for handling them. I like to use Usage notes for anything complicated or nuanced - or just not likely to become a category in my assessment. I suppose there would be nothing that would prevent successive context tags, if it came to that. Extra brackets might be useful to separate a list of regional or topical contexts from other types. DCDuring TALK 19:39, 19 March 2008 (UTC)[reply]
I think the context notes are getting overused, if there are any more than 2-3 the qualifications should be in the usage notes section, perhaps with a pointer to that section as the sole contents of {{context}}. I have seen context fields with 6-7 notes, taking up more than half of the definition line. This is just plain confusing, because inevitably there are regional nuances and things going on which can't be fully explained by a list of context tags. I say limit the contents of context to 3, force all complex contexts to a more clear usage notes paragraph. - [The]DaveRoss 19:47, 19 March 2008 (UTC)[reply]
Once more, it is the most polysemic words and PoSs that create the best test cases for any options. Relying on usage notes for items that might have been placed on the sense line means that the usage note may not appear on the same screen as the sense line. This puts a big cognitive load on the user at best and may mean they never even know that there is a usage note. The solution of having rel templates with glosses has serious maintainability drawbacks when definitions are edited. This doesn't have so much bearing on the narrowest construction of the subject of this heading, but provides another example of how the structure of our complex entries limits us. If we treated a given sense as a collapsable mini-entry with its own context, def, semantic relations, translations, usage examples, and usage notes, we would have almost guaranteed that almost all the information a user could want would be on a single screen. DCDuring TALK 20:06, 19 March 2008 (UTC)[reply]
I think in the end this is going to come down to "we need to rethink our entry layout, because it doesn't work." There is a lot of data which is associated with a particular sense which is kept far away from that sense, often with a lot of other unrelated junk in between. Translations, citations, usage notes, examples, images, and to a lesser extent (or more general) pronunciations, etymologies and conjugation/declension data, should all be clearly associated with particular senses, and the current method of doing so is by using a gloss to indicate which sense the subsequent information should be associated with. What we should probably be doing is using collapsible fields directly beneath every definition containing all associated data, so once the reader finds the sense they want they simply click once and get all kinds of additional information pertaining only to that sense. No hunting around across the page, no question about which sense the information is associated with. It isn't that big a technological problem, it is a decent-sized organizational problem and a GIGANTIC effort in manually reorganizing the entries problem, but it is something we will end up having to do anyway, and there is a good chance that while we are cleaning things up we will be able to format them in a very standard way, hopefully a way which is easy to convert to xml or other associatively-structured output so that _all_ of the information on Wiktionary is friendly to machine reading rather than just the definitions. How best to effect this is a matter of discussion of course, but I think everyone knows, deep down, that our entry layout is far more complicated and far less useful than it could and ought to be. - [The]DaveRoss 20:18, 19 March 2008 (UTC)[reply]
And in the meantime, the best we can do is standardize everything that can be effectively standardized (without losing information) to maximize the chances that some of the restructuring can be automated. Well, it won't be long until we'll have gotten rid of all the English transitive and intransitive verb headings, occasionally yielding more than 10 sense lines and 10 usage examples (more than a screenful) for a single ety's verb PoS. DCDuring TALK 20:36, 19 March 2008 (UTC)[reply]
Just to add, despite how deep the above discussion seems to go, the vast, vast majority of cases use no more than 2 or 3 non-grammatical labels at the same time, and appropriately used. As I said, having more than three labels is already very rare. Circeus 21:54, 19 March 2008 (UTC)[reply]
The initial suggestion seemed like a good idea, but the opportunity to learn something new and to put some mileage on a hobby-horse was irresistible (to me, anyway). DCDuring TALK 22:02, 19 March 2008 (UTC)[reply]

While I agree with TheDaveRoss that we should be nesting information under definitions (which sucks for the JavaScript-less, but I don't think the current approach is much better), it's obvious that we don't currently have consensus to do that. So in the meantime, I think it would be helpful to create a family of "see-usage-notes" templates that provide a standard appearance and functionality for links to and from usage notes. (Actually, in general it would be nice if we had a convenient way for glossed onym sections and so on to link to the senses they belong to, until such time as we simply attach them to those senses.) —RuakhTALK 00:26, 20 March 2008 (UTC)[reply]

BTW, the idea was mentioned above of putting grammatical information in a separate set of parentheses; I don't think that's a good idea, as it will look strange, and the distinction we're making will not be obvious to the casual reader. —RuakhTALK 00:32, 20 March 2008 (UTC)[reply]
From the amount of stuff I've looked at, even with only topical templates, you can believe me that we are not exactly consistent. Circeus 00:52, 21 March 2008 (UTC)[reply]

I would like to move Category:Spices and Herbs to Category:Spices and herbs, but it seems much more work than with the previous categories that I have moved. What could be done robotically is that all the word entries in the various languages are moved to the new categories, such as Category:de:Spices and herbs; a simple regular expression replace should achieve this, such as r/\[\[Category:\(.*?\):Spices and Herbs\]\]/\[\[Category:\1:Spices and herbs\]\]. Afterwards, the category pages could be moved manually. Anyone volunteer on having his robot do the work? Or is it more complex than I imagine it to be? --Daniel Polansky 06:14, 20 March 2008 (UTC)[reply]
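In JavaScript, which is what a browser-side tool would use, the replacement Daniel describes could look roughly like the sketch below. Two things differ from the expression as posted and are assumptions here. The language prefix is matched with `[^:\]|]+` instead of `.*?`, so that a match cannot run across an unprefixed English category link into a later one. An optional sort key after a pipe is also carried over unchanged. The function name `fixSpicesCategory` is made up.

```javascript
// Hypothetical helper: rewrite language-prefixed category links in an
// entry's wikitext from "Spices and Herbs" to "Spices and herbs".
function fixSpicesCategory(wikitext) {
  // $1 is the language code, $2 the optional "|sort key" part
  // (an unmatched group is substituted as an empty string).
  return wikitext.replace(
    /\[\[Category:([^:\]|]+):Spices and Herbs(\|[^\]]*)?\]\]/g,
    '[[Category:$1:Spices and herbs$2]]'
  );
}
```

The unprefixed English link `[[Category:Spices and Herbs]]` is deliberately left alone by this pattern and would need its own replacement.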

You could try doing it with autoedit, which will make it quicker though not as easy as getting a friendly bot. Conrad.Irwin 11:38, 20 March 2008 (UTC)[reply]
Thanks for the hint. --Daniel Polansky 12:24, 20 March 2008 (UTC)[reply]

entries with illegal titles

A feedback message prompted my creation of Appendix:Entries with illegal titles Appendix:Unsupported titles, which perhaps still needs a better name, and definitely needs expansion (and to be linked to).—msh210 19:41, 20 March 2008 (UTC)[reply]

Seems like a great idea. I suppose we could create some Javascript that redirects from those pages to that Appendix. Conrad.Irwin 17:55, 20 March 2008 (UTC)[reply]
Sounds good to me.—msh210 19:41, 20 March 2008 (UTC)[reply]
Better idea, that appendix is now transcluded into MediaWiki:Badtitletext. For example, see http://en.wiktionary.org/wiki/%7C . Conrad.Irwin 20:17, 20 March 2008 (UTC)[reply]
I imagine this is going to grow fairly long. Can we make it a list of links, to e.g. Appendix:.7C etc? Also, where do we name symbols? See e.g. ^ which is not actually defined as exponentiation, conjunction, or however else it's used, but is named unlike the symbols on this new page. DAVilla 20:56, 20 March 2008 (UTC)[reply]
I would imagine that this page would become an index, and the individual entries would be moved out: it would make sense to use that kind of encoding for them. Appendix:UT/.5B or summat. Conrad.Irwin 22:04, 20 March 2008 (UTC)[reply]
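The ".7C" and ".5B" in the suggestions above are just the characters' hexadecimal codes with a leading dot ("|" is 0x7C, "[" is 0x5B). A small sketch of generating such names follows. The function name `dotEncode` is invented, it only handles single ASCII characters, and the "Appendix:UT/" prefix is merely the tentative name floated above, not an agreed location.

```javascript
// Hypothetical helper: turn one ASCII character into ".XX", where XX is
// its two-digit uppercase hexadecimal code.
function dotEncode(ch) {
  var code = ch.charCodeAt(0);
  if (code > 0x7F) {
    throw new RangeError('this sketch only handles ASCII characters');
  }
  var hex = code.toString(16).toUpperCase();
  return '.' + (hex.length < 2 ? '0' + hex : hex);
}

// Tentative subpage name for the entry on "[".
var bracketPage = 'Appendix:UT/' + dotEncode('[');
```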

Language hierarchies

What is the preferred language heading, ==Mandarin== or ==Mandarin Chinese==, or something else? ==Cantonese== or ==Cantonese Chinese==?

Which of these is preferred in the translations section?

  • Chinese
    • Cantonese
    • Mandarin

with or without the extra bullets, or

  • Cantonese (Chinese)
  • Mandarin (Chinese)

with or without "Chinese"? My opinion is based on simplicity in the layout, which rules out the indentation that other users like, and on normality, which rules out "Cantonese Chinese" but allows "Mandarin Chinese" for clarification. DAVilla 20:14, 20 March 2008 (UTC)[reply]

I like the first one because of the fact that it would also work for regional variations:
But that's just me. — [ ric ] opiaterein12:52, 21 March 2008 (UTC)[reply]
I really like that too (and we can discuss bullets and italicization of region name). Note that there may be Portuguese translations that are not regional which would go on the first line. But the question posed is a little different. I think that the indentation should consistently match region, and the language/dialect listed should consistently match what we use as level-two headers. Otherwise you would end up with
  • Chinese
    • Mandarin: ...
      Taiwan: ...
which is a bit crazy. On the other hand, the problem with not doing it this way is that Mandarin translations wind up under "M" instead of "C", which in my opinion is mitigated by the use of "Mandarin Chinese". DAVilla 22:26, 22 March 2008 (UTC)[reply]
The preferred L2 header is ==Mandarin== or ==Cantonese== or the like. The way Translations are listed is a current topic on Wiktionary talk:About Chinese. --EncycloPetey 22:05, 21 March 2008 (UTC)[reply]
Great, thanks for the link! If the translation section were to list
  • Cantonese
  • Mandarin Chinese
Then would it be objectionable to change all the language headers to ==Mandarin Chinese== to match? DAVilla 22:26, 22 March 2008 (UTC)[reply]
I would like to say that there are occasions when it would be good to point out both region and language/dialect in the translations section. In such cases, I don't think it would be crazy to have (for computer):
The above example is not necessarily the norm, but there are times when you may want to point out this kind of information. However, I'm not arguing for a specific format, just the ability to include such information when necessary. -- A-cai 08:16, 29 March 2008 (UTC)[reply]

I would like to propose that:

  1. We use the same names for languages in translations as those which are used for L2 language headers
  2. We use Mandarin Chinese instead of just "Mandarin", but otherwise the name of the dialect such as Cantonese without "Chinese"... unless we decide otherwise for a specific dialect
  3. Within the translations section, we list all languages at the same level, and indent only regions which would not be accepted as L2 headers

The major objection to this of course is that Mandarin translations would not be listed under "C". Does the simplicity have enough benefit to overweigh that objection? DAVilla 04:50, 31 March 2008 (UTC)[reply]

That sounds good to me, though I'm not sure regions should generally be indented. I think something like Mandarin Chinese: (PRC only) 計算機计算机 (jìsuànjī), (PRC and Taiwan) 電腦电脑 (diànnǎo) would be better in most cases. —RuakhTALK 05:17, 31 March 2008 (UTC)[reply]
Sounds reasonable to me as well. Thinking ahead (but not discussing the issue yet), we should consider how this would impact grouping and alphabetization for languages like Ancient Greek, Old French, and Northern Saami. If the Translations labels and L2 headers always match (which I think is good), can we live with the impact it would have in situations where languages have a qualifier of time or location? --EncycloPetey 05:35, 31 March 2008 (UTC)[reply]
Same as the reasoning for listing language sections alphabetically, this is already an unavoidable issue where the newer and older languages do not even share a name. For not remembering a good example of that I'd make a terrible linguist. DAVilla 05:00, 7 April 2008 (UTC)[reply]

Separate new page templates for simple past and past participle

The new page template for "Past" creates "Simple past tense and past participle". It would be nice to have separate templates for "simple past" and "past participle" for words where they're different. - dougher 00:48, 21 March 2008 (UTC)[reply]

Maybe I'm just dumb, but can you give an example of a word where they are different? Nadando 02:11, 21 March 2008 (UTC)[reply]
sing - simple past sang (I sang in the choir), past participle sung (I have sung in the choir). Thryduulf 02:19, 21 March 2008 (UTC)[reply]
For more examples, see Category:English irregular verbs. Thryduulf 02:22, 21 March 2008 (UTC)[reply]
This would be very useful in corner cases such as proved vs. proven where the regular inflection is both the simple past and past participle, but where use of an irregular past participle needs distinction of context such as region. DAVilla 07:37, 21 March 2008 (UTC)[reply]
What new template are you talking about? We've always had the option of listing these separately. --EncycloPetey 22:03, 21 March 2008 (UTC)[reply]
{{past of}}:{{simple past of}}:{{past participle of}}::{{new en verb past}}:{{_____}}:{{_____}}, where the blanks denote the templates I believe he's proposing. —RuakhTALK 22:23, 21 March 2008 (UTC)[reply]
No, he hasn't proposed a new template. He said 'The new page template for "Past" creates...' which implies that the template he's talking about already exists. I'm asking what "new" template that might be. --EncycloPetey 01:17, 22 March 2008 (UTC)[reply]
When you search for a word that isn't in Wiktionary, you are shown a table with the title "You can create a new entry with one of the following preloaded entry templates:" [8].
The options given on this table are "Basic", "Noun", "Plural", "Adjective", "Adverb", "Comparative", "Superlative", "Verb", "3rd Person", "Participle" and "Past".
If the word you want to create is a past participle then obviously the option you choose is "past".
However, this preloads a page including the template {{past of}}, which outputs "Simple past tense and past participle of", and categorises your word into category:English simple past forms and category:English past participles. This is correct for English regular verbs.
If your word is a past participle but not a simple past form (or vice versa) though then this is incorrect. The original questioner is asking for separate preloaded templates for past tense words that are not both the past participle and simple past form (i.e. they are one but not the other). Thryduulf 01:35, 22 March 2008 (UTC)[reply]
Ah, thanks. I understand the question now. --EncycloPetey 19:03, 22 March 2008 (UTC)[reply]
FWIW, at least two other editors (myself and RJFJR) also originally parsed "new page template" as "new {page template}" rather than as "{new page} template". I don't know why it's so confusing, but you're not alone. —RuakhTALK 21:04, 22 March 2008 (UTC)[reply]
Right, so how about if we split {{new en verb past}} into two definition lines, {{simple past of}} and {{past participle of}}, and then when appropriate the user could simply delete one of the lines? DAVilla 22:10, 22 March 2008 (UTC)[reply]

Projects being neglected

Wiktionary:Translations of the week and Wiktionary:Collaboration of the week have not been updated regularly, Translations for a few weeks and Collaboration for much longer. Is anyone interested in keeping these up to date or should we take them out of prominence rather than have repeats showing up? - [The]DaveRoss 21:29, 21 March 2008 (UTC)[reply]

If nobody is working on them, then I suggest you reduce their prominence and mark them as inactive with a note to start a discussion here if anyone wants to revitalise them in future. My plate is currently full working on category:Requests for pronunciation and special:UncategorizedPages. Thryduulf 21:42, 21 March 2008 (UTC)[reply]
Connel, DAVilla, and I (and others) have all tried at times to get these projects going. Enthusiasm and participation seldom continue beyond a couple of weeks, and then the community forgets about them again. I stopped bothering because it just wasn't worth my time and effort to try to get people involved. --EncycloPetey 22:02, 21 March 2008 (UTC)[reply]
Projects only work properly if people work together as a team. But we are all individuals doing our own thing. They should probably all be scrapped. SemperBlotto 22:21, 21 March 2008 (UTC)[reply]
Both have been marked {{inactive}}. Conrad.Irwin 10:24, 22 March 2008 (UTC)[reply]
Well, we include them elsewhere, should we drop the inclusion on places like Tea Room? - [The]DaveRoss 01:52, 23 March 2008 (UTC)[reply]
That would seem sensible, though we should probably leave a few links hanging around in the hope that someone enthusiastic finds them one day. Conrad.Irwin 16:10, 23 March 2008 (UTC)[reply]
I went ahead and filled up ToW for the next ~25 weeks; it took about 20 minutes. Apparently this isn't a time-consuming chore; it is just one that has slipped by for whatever reason. If anyone wants to take a crack at CoW I am sure it would be similarly quick and painless and these rather useful Things to Do(tm) wouldn't go by the wayside. - [The]DaveRoss 16:42, 29 March 2008 (UTC)[reply]

Interwikis and ta.wikt (Tamil)

moved from Grease Pit, as this is no longer a technical issue ...

I've disabled the adding of iwikis for ta.wikt for now (except when the bot is making an edit anyway, so you'll see "iwiki +fr, ta" but not "iwiki +ta"). Reason is that the bot was spending 60-70% of its time on these.

The problem is that the Tamil wiktionary has gone from <10K entries to over 100K in about two weeks. (Urp!) There is a bot loading lots and lots of "English" words. Some of the entries are clearly crap. Some are dubious even without knowing any Tamil. The rest? Well, I tried looking up a few dozen words that appear in what seems to be the definition line, and most of them gave me the magic zero google. Of course that might be a lot of technical words that no-one has yet written in Tamil on the net, but it makes me wonder.

Also, all of the entries reference tamilvu.org, (Tamil Virtual University) and everything there is copyright. It could be that the source is elsewhere, and that is just a "convenient" link, but I dunno.

Someone ought to look at this ... anyone know any Tamil? Robert Ullmann 08:14, 21 March 2008 (UTC)[reply]

You could ask ravishankar. --EncycloPetey 20:04, 21 March 2008 (UTC)[reply]

uh oh ... from my talk page:

Hello Robert Ullmann. I am User:Sundar in English and Tamil Wikipedias and the Tamil Wiktionary. I saw your message to Ravi concerning the bot-created articles in Tamil Wiktionary. As I wrote up SundarBot that uploaded articles, let me answer your questions:

Firstly, while there could be some unforeseen bugs in transcoding to Unicode, there's no junk uploaded by the bot. Secondly, we got the glossary from Tamil Virtual University which developed that dictionary from numerous public domain sources, volunteer effort, and fully funded by the Government of Tamil Nadu. Also, we believed that words of a language can't be copyrighted and are naturally in the public domain. The bot took the meanings from www.tamilvu.org, transliterated them to Unicode (from TAB encoding), categorised them, formatted per wiktionary conventions, added pronunciation where one exists in the commons, and uploaded it to Tamil Wiktionary citing TVU and providing a link to their page. Errors from the original source have since been corrected by users too. Being words of a language (actively encouraged by the creator for wide public use) compiled using public funds copied with proper citation, processed and value-added in Wiktionary is fair-use according to Tamil Wiktionary editors. Also, let me state that we didn't use any style or artistic product of TVU. -- 122.167.242.183 14:25, 22 March 2008 (UTC)[reply]

sigh

As far as I can tell, Tamil Virtual University claims copyright on the dictionary (as part of a blanket claim on the website). And his definition of "fair use" is very very wrong, and "fair use" doesn't apply anyway if you are copying entries from another dictionary.

Where do we take this? Robert Ullmann 14:48, 22 March 2008 (UTC)[reply]

This is a very serious problem, though arguably one to which we could simply shut our eyes, since in the end their content is not ours. Assuming that Sundar's explanation above is accurate, TA is in direct violation of foundation:Resolution:Licensing_policy. Stewards refuse as a matter of principle to be involved in these matters, so I guess the best that could be done would be to contact foundation staff directly (or semi-directly via a post to foundation-L). -- Visviva 15:38, 22 March 2008 (UTC)[reply]
Foundation-L? Is there not enough noise there? I sent email to WMF counsel, Mike Godwin. (who is the eponymous Godwin! very cool :-) Robert Ullmann
You're right that if fair use is being claimed here, it's likely a bad claim, and unless ta.wikt has an EDP (unlikely) even a good fair use claim is in violation of policy. However, I would note that while it makes me uncomfortable, Mike Godwin has said in the past that things like dictionary definitions and etymologies are "facts" that can't really be subject to copyright, even though the companies that publish dictionaries will claim that it is. To me, those seem like the most creative parts of the dictionary, too, and Mike seems to think that there isn't much to a dictionary that is protected. Aside from that being startlingly permissive, it would also be a direct commercial threat if we, say, copied OED wholesale, so they'd likely challenge it in some way eventually. It would be good to hear from his own mouth what he has to say, but I remember being surprised the last time he was asked about Wiktionary and copyrights. Dmcdevit·t 09:52, 23 March 2008 (UTC)[reply]
If anyone is interested in the finer details on copyrights on facts, and collections of facts, in US law, the seminal case is 499 US 340 (Feist v Rural), in which the Court ruled on the (non-)copyright in a telephone directory (the content being facts, with no originality in selection or arrangement: all the numbers in the area, in alphabetical order by subscriber name). However, dictionaries contain significant creativity in selection, composition of definitions, etc. So Sundar has part of it right, but the reference to "fair use" implies that he has knowingly used copyright material, rather than simply using the "fact collection". IMHO, Mike is correct that there isn't much to a dictionary that can be copyright; but the entire work almost certainly is. And Sundar copied/is copying the entire work from TVU. It will be interesting to see what he (Mike) says. Robert Ullmann 10:19, 23 March 2008 (UTC)[reply]
<IP lawyer hat on>If an action were brought against anyone in this matter, the action would likely be brought in India and the applicable law would be the copyright law of India, where the work was produced. Under the prevailing law, the INDIAN COPYRIGHT ACT, 1957, the Indian government does own copyright in its works, and works produced by a "public undertaking" (a cooperative venture between a government and private actor) are owned by that "public undertaking". Feist v. Rural is not applicable law to an Indian copyright. The courts of India have apparently rejected Feist, at least with regard to the "sweat of the brow" doctrine, in Burlington's Home Shopping Ltd. v. Chibber (1995), in which the Delhi High Court applied copyright protection to a computer database of contact info from mail order customers.
Even if Feist were applicable, dictionaries are generally covered by copyright to the extent that, for example, creativity goes into the writing of definitions. Of course, if a copyrighted dictionary purports to define all words in a language, its owner can not prevent another person from similarly attempting to define all words in the same language, and from referring to the copyrighted work to determine if that list is complete. However, if there is anything more than that - say, one dictionary copying another's select (but not "complete") list of words, then we run into copyright problems. Furthermore, although India has some broad fair-use exemptions, I don't see how this use falls within them - there is no criticism or artistic statement made about the Indian government's work, nor the reporting of a news event. In short, I wouldn't be plucking definitions from a copyrighted work in India. </IP lawyer hat off> bd2412 T 01:18, 24 March 2008 (UTC)[reply]
Yes, thank you, I should have been clearer than "in US law", pointing out that the copyright material was in India, so Feist is illuminating, but not controlling. Thanks, Robert Ullmann 13:54, 24 March 2008 (UTC)[reply]

We currently have two different templates for use on words used only in the plural, they both categorise articles into the appropriate language subcategory of category:Pluralia tantum but display different text.

                {{plurale tantum}}    {{pluralonly}}
Display text    (plurale tantum)      (plural only; not used in singular form)
Inclusions      245*                  159

* Including inclusions of {{pluralia tantum}}, which redirects to {{plurale tantum}}

Having a quick look at the entries they are used on, there is a very slight preference for {{pluralonly}} to be used on more basic words, but this is not going to be at all statistically significant. The history seems to show that they have developed independently, with the creation of {{plurale tantum}} postdating that of {{plural only}} by about 5 months.

I don't see the benefit of maintaining two separate templates which are doing the same job, however as the wording is different it isn't just a case of redirecting one to the other. While "plurale tantum" is the correct name for the class of words, it isn't a term with which most non-specialists are familiar. The "plural only; not used in singular form" is a very good definition of the meaning of "plurale tantum" but it doesn't educate people about the technical name. As combining the descriptions will make the text too long, perhaps "(plurale tantum)", including a link to the entry would be best? Thryduulf 16:39, 22 March 2008 (UTC)[reply]

I feel that we should use the correct term but provide a quick definition on hover, maybe something like (plurale tantum). I also quite like the idea of the link, but it is quite slow compared to a hover text. Conrad.Irwin 16:55, 22 March 2008 (UTC)[reply]
I think something like (plurale tantum) might be the best of both worlds. —RuakhTALK 17:08, 22 March 2008 (UTC)[reply]
Is there any English-language dictionary that uses "plurale tantum" instead of "plural construction only" or a similar English wording? The term is not even defined in MW3. What is our justification for ignoring the needs of our users or, actually, our potential users? That botanists describe newly discovered plants in Latin does not mean that gardening books are written in Latin. DCDuring TALK 17:26, 22 March 2008 (UTC)[reply]
The one hardcopy dictionary I have to hand, the 1998 edition of The Chambers Dictionary, doesn't define "plurale tantum" and uses the abbreviation "n pl" for "noun plural" to mark such entries. This dictionary bills itself as "the most comprehensive single-volume dictionary" so I would not expect to find it in concise dictionaries. Thryduulf 17:38, 22 March 2008 (UTC)[reply]
Both the OED and MW3 use pl.. The AHD is inconsistent, sometimes using the text "Often used in the plural" (cf. pant) and other times putting the plural form in bold at the head of the numbered sense (cf. color). Random House doesn't bother to mark these at all. --EncycloPetey 19:02, 22 March 2008 (UTC)[reply]
No clue. An English wording would be better, if there's a brief one. How about (in plural)? That would also reduce the suggestion that a word is simply never used in the singular: after all, most pluralia tantum, despite the name, are sometimes used in the singular (especially in jocular, nonce, or nonstandard usages, but even certain standard forms of the language tend to singular-ize them in attributive use, and some words, like paparazzi, are freely used both ways by different people). —RuakhTALK 21:13, 22 March 2008 (UTC)[reply]
That sounds like a good reason to use plurale tantum. If we use an English wording, it will be positively misleading. A technical term like plurale tantum will have to be looked up by people less familiar with the grammar, and can be linked to a definition or Appendix that points out that the plural is not strictly absolute, as you say. The problem with "in plural" is that it isn't strong enough. The problem with "only plural" is that it's too strong. Any English wording likely to be of the appropriate strength is also likely to be too wordy. --EncycloPetey 16:28, 23 March 2008 (UTC)[reply]
The wordings "rarely" "sometimes", "usually", and "always", with "plural" would seem to encompass the range of possibilities, with "usually" being an appropriately cautionary and flexible default. Having a special link from plural to a WT Appendix article on plurals would give us the chance to clarify exactly what we mean, if any ambiguity remained. "Tantum" only means "only".
No matter how we do this, some users will be misled. The Latin, I fear, will put off a noticeable percentage (5%?) of viewers of the article, whether or not they care about plurals. Only some (50%?) of viewers will really care all that much about the plural question. I would expect that not all users (20-50%?) will click through to any link on a grammar point, even if that's what they were looking for. I would expect that a Latin term would lead to fewer (50%?) click-throughs than an English one. Few users (20%?) would get the right idea from the Latin term itself. More (3X) would get it from an English term. If one does the arithmetic one would conclude that fewer users are helped by Latin than English and that some folks would be put off by the very presence of Latin.
Finally, I don't think that the plurale tantum and singulare tantum entries really explain things well enough. DCDuring TALK 17:04, 23 March 2008 (UTC)[reply]

Agree with using {{plurale tantum}} as a specific term to describe what is a phenomenon rather than a rule. Those users who would ever care, or in any way be affected by the phenomenon, would click the link and thus learn a fancy new term. -- Thisis0 17:27, 29 March 2008 (UTC)[reply]

So, if I understand the argument correctly, we don't care what our (potential) users understand, but prefer non-english terms in the english wiktionary because they're more precise? and if they're curious enough, they'll click through. Just asking for clarity on this issue. - Amgine/talk 20:13, 29 March 2008 (UTC)[reply]
We care very much what our users understand and how they are best able to consume our content. We also care how to increase their understanding. In this particular case, I agree with using the specific Latin term because what we are describing is a linguistic phenomenon with its own peculiar attributes. Using actual short English phrases to try and describe it becomes misleading and inaccurate. Saying "plural only; not used in singular form" is both inaccurate and proscriptive -- two things Wiktionary aims not to be. Other attempts in English to briefly capture the existence, function, and use of terms like trousers, glasses, scissors, and clothes are certain to be imprecise and misleading. If you can coin a template-worthy phrase to describe the way these terms really work, go for it. Until then, I advocate using the detached, erudite phrase (with click-through to thorough explanation) to tag the peculiar phenomenon. -- Thisis0 00:44, 1 April 2008 (UTC)[reply]
That is saying that we only care about those who are willing to put in the time, not those who are too busy, i.e., who have a life. That's like only teaching the students who are going to go to grad school. "Plurale tantum" and "singulare tantum" aren't clearly explained once someone clicks through and have a deterrent effect on users. Or are you saying that the value of plurale tantum is precisely that it isn't clear that it means "plural only"? DCDuring TALK 01:16, 1 April 2008 (UTC)[reply]
Partly, yes, indeed I am saying that is its value. Actually saying "plural only" would be quite wrong, and would teach the wrong thing to the most nonchalant user, which is worse than teaching him nothing. And the fact that the click-through entries aren't yet thoroughly explained isn't a reason not to proceed with the correct course. Make them thoroughly explained. Also, I'm not gonna give any weight in my consideration to anyone "taking offense" or being deterred by the mere presence of things that actually have Latin names. What is a dictionary if not instructive and illuminating on the subject of words? We will be just that. -- Thisis0 03:33, 1 April 2008 (UTC)[reply]
If we decide to go with "Plurale tantum" (which I'm less keen on now than when I started this thread), then I think Ruakh's hover and link would be the most useful for users with and without a life. What might be better though would be being able to express degrees of commonality - "not used in singular form", "rarely used in singular form", "usually not used in singular form", "normally not used in singular form", "not used formally in singular form", etc. Although the last would probably work better expanded to a usage note. Thryduulf 01:34, 1 April 2008 (UTC)[reply]
If any of those phrases happened to be accurate for a specific entry, they are Usage-Notey and belong in that section. The actual real name for the thing is plurale tantum, which is the right substance for a tag. Besides, it's not as if the Latin name is some obtuse thing. It starts with the word "plural", and most who glance at it can sense the phenomenon to which it refers. The fact that the tag exists is the only reason I know the term today. I learned it here, and have since researched, explored and categorized the phenomenon in my brain, i.e. I learned a new thing because of it. Don't rob the next guy of the chance to learn. -- Thisis0 03:33, 1 April 2008 (UTC)[reply]
I think we would do well to express degrees of prevalence in the sense line (or in the inflection line if appropriate). It would be nice to direct users to the Usage notes if there is more, but some generic cases could be handled with some kind of link to an appendix, an article, or even a "Plural notes" subheading under Usage notes. I'm not at all certain that this can be handled separately from plurals in general. Generic cases could include "pairs of" words (scissors and other tools, various kinds of pants, spectacles/eyeglasses) and other cases which I am too tired to reliably characterize at this time. If we do this, I don't see how we could do it in Latin and expect to be understood or indeed expect all editors and admins apply it properly. DCDuring TALK 03:23, 1 April 2008 (UTC)[reply]
Keep it simple, friend. -- Thisis0 03:33, 1 April 2008 (UTC)[reply]

The dictionary should be as clear as possible to all readers. We diminish it by imposing our idea of what they should learn. Plurale tantum is rare jargon – does any recent dictionary define or employ it?

The CanOD and NOAD don't define it (nor does Dictionary.com). They simply use the description "plural noun". It's self-explanatory, unambiguous, generally useful, and can be made more specific with labels. For example, in CanOD individual senses are sometimes labelled like "jean noun 1 a heavy twilled cotton fabric... 2 (usu. in pl.) hard-wearing pants..."

Let's use plain English instead of obscure Latin. —Michael Z. 05:40, 1 April 2008 (UTC)[reply]

Teeth, men, and horses are also all "plural nouns", however they hardly relate to the linguistic category we are discussing. That phrase is neither unambiguous nor particularly useful in separating these terms out from other plurals. In line with the goal of being "as clear as possible to all readers", the phrase 'plural noun' is lacking. Also, the fact that other dictionaries deal with this phenomenon in divergent, non-uniform, and sometimes inaccurate ways, does not mean we should follow suit. Other dictionaries also have been highly confused and divergent on the issue of "Noun Adjunct/sort-of Adjectives", which Wiktionary is also digesting right now and attempting to handle properly. So. A) Plurale tantum has a real name. We didn't make it up. B) It's the most accurate way of labeling a specific phenomenon that is more complex than a few English words can describe. C) It avoids the misleading nature of those inaccurate phrases, and giving potentially false impressions is worse than teaching nothing. D) We're really only talking about a very small number of entries to which this term even applies. E) Being a project that values knowledge, I without question err on the side of imparting more of it rather than catering to those uninterested in knowledge. F) The presence of two italicised Latin words in an entry does not prevent in any way the most casual user from learning the definition of trousers. Incidentally, the best way to teach this user the same information is with example sentences. We have those too. -- Thisis0 18:08, 1 April 2008 (UTC)[reply]
Hm, those are the plural forms of regular nouns, nouns in the plural. E.g. "men pl / Plural form of man," as opposed to "scissors plural noun / A type of tool..." That's just an example; I haven't claimed to have worked out the best method, but I do see plenty of precedents indicating that we can express this in English.
Many items in Category:English pluralia tantum simply aren't, and many more will require notes explaining the nuance. I think something like "usually in the plural" is better than mixing languages with "usually plurale tantum." Does that even make sense, or is plurale tantumness an absolute quality, requiring that such cases be labelled in English anyway?
As to the superior accuracy of the Latin, I do see the benefits of using a technical term in its appropriate context, and also the disadvantages of using it elsewhere. To most readers, p.t. will remain jargon (again: does any recent dictionary use or even define the term?). Indeed the concept is "more complex than a few English words can describe," but most of our readers won't benefit from a familiarity with lexicographical literature debating its meaning,[9][10] but will have to settle for an (adequate?) 11-word English definition anyway. The same goes for our editors, so the category will continue to be full of nouns which are conventionally, often, usually, or mostly plural, but may not in fact be pluralia tantum. —Michael Z. 20:59, 1 April 2008 (UTC)[reply]
I don't understand the point of men, teeth and horses. As to your helpfully labelled points:
A. I do have another term that we didn't make up: "plural in construction". This has the advantage (from your point of view) of not being 100% self-explanatory, but also the advantage (from my PoV) of being in English.
B. As for "plurale tantum" accurately describing things in a few words: it does not describe things for non-Latinists, it merely labels them; and it does not well address the problem of usages that are "usually", "often", or "sometimes" "plural in construction" without the use of macaronic, oxymoronic constructions that would give most of us a chance to test our gag reflex, such as "sometimes plurale tantum".
C. I do not see the inaccuracy of "plural in construction".
D. To the 400+ templates should be added the entries that link to plurale tantum and singulare tantum, those using {{singulare tantum}} and {{singular only}}, those that might be using some unlinked variations, and those that should have an appropriate label.
E. I favor imparting knowledge especially to those who are our most marginal visitors, because there are so many of them. (Evolution must favor them because there are so many of them.)
F. The impact of Latinisms on our users is not one that we have any facts about that I am aware of. Anecdotally and analogically, I would draw your attention to the replacement in the paper and printing industry of such terms for page sizes as sexto and sextodecimo with sixmo and sixteenmo. This is suggestive of a certain concern within that industry with whether it is worth more to respect old practice or make things intelligible for newbies.
I strongly favor usage examples. I also favor usage notes for nuances. These last two possibilities do not differentially favor either English or Latin labels on the sense lines. DCDuring TALK 19:10, 1 April 2008 (UTC)[reply]

Forcing fonts in script templates

The issue of forcing fonts in script templates has come up a couple of times lately, so I thought I'd bring up the issue here. Here are a couple recent discussions:

There are a couple of questions raised by changes that Conrad.Irwin and I have made lately (some of them unimportant technical details), but I think the important issue is whether or not we should be forcing fonts for particular scripts to use the ones that our local language experts have deemed to be the "best" fonts for that script.

There are a couple of cases where we have to force fonts to accommodate Internet Explorer 6, but that is already done in a way that only affects that browser. There are also cases like {{Cyrl}} where the default fonts used by most of our readers don't render the text correctly (combining accents in the case of Cyrillic). Since I think that forcing fonts in those cases is broadly considered a necessary evil, I'd like to focus on the cases where forcing fonts is done solely to make things look "optimal", not to work around broken browsers or incorrect default font rendering in common browser setups.

My personal feeling is that we shouldn't force fonts unless it is to correct a widespread problem that actually results in incorrect display, not just suboptimal display. If we want to provide a way for logged-in users to easily choose to have the "best" fonts as determined by people who are familiar with the script in question, it could be done with WT:PREFS. Mike Dillon 00:11, 23 March 2008 (UTC)[reply]

P.S. There are currently big timing problems related to certain types of changes to the handling of script-specific fonts due to the caching strategy used by the Wikimedia Foundation's settings for running MediaWiki. If you're interested in giving us back some control of the timing of these things, please vote for bugzilla:8433 or add your comments there. Mike Dillon 00:13, 23 March 2008 (UTC)[reply]
I agree. Accessibility is paramount, so we should take the necessary steps to ameliorate known breakage. For example, the known MSIE 6 bug renders international text unreadable, and even a technical user would have no way to fix without our help. (Anyone know if this still affects MSIE 7 and 8?)
But aside from that, users' preferences should be respected, also to follow accessibility principles. There are more web browsers, versions, and configurations out there than we can ever test, and countless different sets of user preferences and personal style sheets. Many of them have been specifically chosen by users for their own preferred fonts, to display their own preferred or required language scripts. (Even a tiny fraction of our audience is very many in absolute terms.) Any unjustified fiddling with fonts and styles is bound to degrade or break the display for someone out there, so there should be as little intervention as possible.
Personally, I dislike that some templates currently override my font choice and eliminate the default bolding for Cyrillic in all browsers, but I can live with it. I'm looking forward to some changes in the translation templates which have to wait for the proxy caching to refresh. —Michael Z. 08:30, 26 March 2008 (UTC)[reply]
WT:PREFS is already too complex. I don’t understand a lot of it, and I seem to have to spend half an hour a week reticking the Expand translations box. So far, the only result I’ve seen from removing forced fonts is bad display, sometimes to the point of being unreadable. Perhaps you could add code to all of the font-call templates that will allow users to tick a box in WT:PREFS if they want the template to be ignored. I have no understanding of the timing problem you mention at bugzilla:8433, so it would be silly for me to vote either way. —Stephen 13:04, 26 March 2008 (UTC)[reply]
What I am suggesting is forcing fonts only for badly broken browsers (i.e. for MSIE's inability to deal with mixed language scripts), but not in other browsers (i.e. pretty much the rest). This is what's done in Wikipedia, and mostly here, except IPA and Cyrillic fonts are currently forced for all browsers. I agree that there's already way too many preferences.
Twiddling the display without necessity will have unpredictable effects in minority browsers, and could break the display for someone who has an otherwise working browser, or who has already taken steps to make the text he's interested in work right by default.
Is it MSIE that you are experiencing problems with, or am I being too presumptuous? Which writing systems suffer in Wiktionary? —Michael Z. 21:24, 1 April 2008 (UTC)[reply]
The fonts are actually forced for quite a number of scripts, they're just forced with inline styles instead of in the site-wide CSS file. Pretty much all of the font templates in Category:Font templates are used in an active script template. Mike Dillon 02:14, 2 April 2008 (UTC)[reply]

Customizing appearance of FL links in translation tables

for those that don't usually read the Grease pit, you might be interested in WT:GP#Customizing t, t-, t+ templates, please keep the discussion there, this is just a pointer Robert Ullmann 14:07, 24 March 2008 (UTC)[reply]

This entry has two verb headers with corresponding translations. I'd like to create one verb header with two senses and one translation header with two trans tables. Would this be a good approach? --Panda10 16:01, 24 March 2008 (UTC)[reply]

Sounds good to me.—msh210 16:07, 24 March 2008 (UTC)[reply]
Putting both senses under the same header is an excellent idea. However, whether the translation tables should stay at all is a question which the larger community has not yet decided. Some of us (especially those who work with highly inflected languages) would argue that a "form of" entry like that should not have translations at all. If I were to reformat the entry I would put both senses under the same verb header and then nix the translation tables altogether (although I would take a look at cling and see if some of the information is lacking there). However, if.....say...Robert Ullmann or Connel MacKenzie were to reformat the entry, they would keep the translation tables and format them as you say (see this vote for more info). In any case, formatting the entry as you proposed is certainly acceptable. -Atelaes λάλει ἐμοί 16:23, 24 March 2008 (UTC)[reply]
Thanks. I made the change. --Panda10 21:10, 24 March 2008 (UTC)[reply]

-in' forms

Two RFD nominations of -in'-form verbs, bein' and frontin', led to a somewhat more general discussion of whether we should generally have entries for such words. Before debating that, I think it would be instructive to have people's views on what the state of the CFI is currently. Do you read the current CFI as allowing:

  • all -in' words (minus perhaps some exceptions) whose corresponding -ing words are attested;
  • individually attested -in' words; or
  • none of the -in' words (minus perhaps some exceptions) even if the corresponding -ing word is attested?

Once we have people's views on the current CFI, we can, perhaps, talk about whether they ought to be amended.—msh210 16:05, 24 March 2008 (UTC)[reply]

Attested. msh210 16:05, 24 March 2008 (UTC)[reply]
Attested. RuakhTALK 23:06, 24 March 2008 (UTC)[reply]
Actually, that's not true. I definitely think the current CFI allow -in' forms, but I'm not sure they currently require those forms to be cited separately from other forms of their respective words. (This is actually a question that applies to normal wordforms as well: my opinion for that is that we have to cite words, not wordforms, but I'm not sure that would apply to something like this. I guess I find it hard to separate how I think it should be from how I think it is, heh.) —RuakhTALK 01:29, 25 March 2008 (UTC)[reply]
Well, I await the outcome of this discussion, since it can impact how we do Latin, where individual forms are often not attested, but are assumed from the pattern of the verb conjugation. --EncycloPetey 01:36, 25 March 2008 (UTC)[reply]
For what it's worth, my plan is to only include forms which are attested in Ancient Greek. However, this doesn't necessarily mean that such a route is best for Latin, as grc inflection is a bit more varied and regionally dependent than Latin's. -Atelaes λάλει ἐμοί 05:49, 25 March 2008 (UTC)[reply]
Attested, but uncertain as to whether each "-ing" PoS (verb, noun, and adjective) needs to be separately attested in the "-in'" form. Pretty sure not each sense. DCDuring TALK 00:56, 25 March 2008 (UTC)[reply]
Attested. In my opinion, CFI as it currently stands is fairly silent on this particular issue. However, I see no reason to exempt forms from the normal criteria (both -in' and otherwise). In the future, if we manage to find a more automated fashion of finding cites, such as TheDaveRoss's citebot (the possibility of which is freakin' awesome for a number of reasons), we may want to trim out some of those which are not attested. However, for the time being, I think that we should stick with our current practice of allowing all normally inflected forms of a cited lemma. Irregular forms, such as -in', should be handled and cited on a case-by-case basis. -Atelaes λάλει ἐμοί 05:49, 25 March 2008 (UTC)[reply]

Random Page

A few dozen clicks of "Random page" indicate that more than 80% of en.wiktionary.org now consists of words in languages other than English. (Is the proportion published anywhere?)
I think this has been suggested before, but does anyone have the skill to program a "Random English page"? (I don't have the programming skill, unfortunately.)
Also, many of the non-English word entries contain just the part of speech. It would be very useful to have a translation too, without having to follow through the grammar trail (which is OK for grammarians, but frustrating for those who just want a quick meaning in English). I realise there are difficulties with this because the meaning often depends on context, but at least some attempt to translate into English would be useful, wouldn't it? Dbfirs 17:42, 24 March 2008 (UTC)[reply]

  • This has been mentioned many times before. When we achieve our aim of ALL words in ALL languages (not this year!) then English will probably account for less than 1% of the dictionary. The part of speech entries for languages such as French, Italian and Spanish are mostly generated by bots. The work involved in adding translations, or simple examples of use would be horrendous - but anyone with a large amount of time could have a go. SemperBlotto 17:49, 24 March 2008 (UTC)[reply]
Then again, some have argued that adding translations to these is a bad idea. How do you translate an Ancient Greek perfect optative middle/passive second dual? The answer is, you don't; not in isolation anyway. The fact is that, for most languages, having a basic understanding of the language and its grammar is essential to being able to translate it. Adding translations to these will help no one, and mislead many. Also, Connel Mackenzie has a random English word program on his userpage (5th line down, rnd-En). -Atelaes λάλει ἐμοί 17:57, 24 March 2008 (UTC)[reply]
Connel has an external tool which will send you to a random page in a given language, based on dumps I think. In order to have a link to a random page in English either the wiki would have to parse all the pages to decide if it included anywhere ==English== or not, or it would have to give a random page from a category "English Words". Currently I don't think 100% of our English words (or 100% of any language's words) are in a category "languagex Words", it wouldn't be difficult to categorize them as such with a bot but I seem to recall a discussion about how silly it was to have categories as broad as "languagex words"...imagine that someone found a good use for them :). So, a couple steps would need to take place to have it local, or we could use Connel's tool and call it a day.
Also, WT:GP might be a better place for this particular suggestion. - [The]DaveRoss 20:05, 24 March 2008 (UTC)[reply]
Whoops, I'm a bit late - but see Wiktionary:Random page for more info on Connel's solution. Conrad.Irwin 17:28, 25 March 2008 (UTC)[reply]
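The first option TheDaveRoss mentions above, checking each page for an ==English== header, can be sketched roughly as follows. This is only an illustration, not how Connel's tool works: the `pages` mapping (title to wikitext, as might be built from a dump) and both function names are assumptions made for the example.

```python
import random
import re

# Matches a level-2 "==English==" header on a line of its own.
# A level-3 "===English===" header does not match.
ENGLISH_HEADER = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)


def english_titles(pages):
    """Return the sorted titles whose wikitext contains an ==English== section.

    `pages` is a hypothetical dict of page title -> wikitext.
    """
    return sorted(title for title, text in pages.items()
                  if ENGLISH_HEADER.search(text))


def random_english_page(pages, rng=random):
    """Pick one title at random among pages that have an English section."""
    candidates = english_titles(pages)
    if not candidates:
        raise ValueError("no pages with an ==English== section")
    return rng.choice(candidates)


if __name__ == "__main__":
    sample = {
        "house": "==English==\n===Noun===\n# A building.",
        "maison": "==French==\n===Noun===\n# house",
        "chat": "==English==\n===Verb===\n\n==French==\n===Noun===",
    }
    print(random_english_page(sample))
```

Scanning every page on each request would be slow on a live wiki, which is presumably why a dump-based tool or a category was suggested instead.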

Admin Notice: Special:MergeAccount

Finally, a reward, of sorts, for all your hard work! Administrators on any Wikimedia site can access the Special:MergeAccount page, which allows them to unify their logins across the whole of the Wikimedia Foundation. This is not yet available to non-administrator users; however, I am sure it won't be too much longer. Thank you to all the people who have been involved in the implementation of this great, long-anticipated feature. Conrad.Irwin 11:07, 25 March 2008 (UTC)[reply]

Cheers for the note Conrad, I'm now the only Thryduulf you'll see on any Wikimedia project - and it only took three tries to remember the old password I still had set on cy.wikipedia!
It's worth noting that it works even if not all of your accounts have admin status, but whether it requires it on your "home wiki" or just on one or more I don't know. Thryduulf 13:49, 25 March 2008 (UTC)[reply]
I did not even remember that I had an account at Wikispecies! Circeus 13:26, 26 March 2008 (UTC)[reply]
Likewise I don't remember registering at sv.wikipedia! Thryduulf 14:20, 26 March 2008 (UTC)[reply]
It requires (for now) that you initiate the merge from "my preferences" on a wiki where you have admin status. Having done that, everything is symmetrical again. Robert Ullmann 14:47, 26 March 2008 (UTC)[reply]
I've done this using my admin status on simple.wikt across a variety of sites, but the user:Brett account is taken here, though it's completely unused. If a bureaucrat here could rename that, I could unify my logins. I wonder if somebody would have a look over here.--BrettR 19:26, 26 March 2008 (UTC)[reply]

Does this work if your password is not the same for all your accounts? That is, do I need to sync my passwords first? --EncycloPetey 19:33, 26 March 2008 (UTC)[reply]

If the password and/or (confirmed?) email address are the same as the account from which you initiate the merge, then the merge will happen automatically. If neither of these is true, then you are given the option to log into those accounts. Once the correct password is supplied, the accounts are merged. For example:
  • Most of my accounts had a password starting with X and/or had the same email address as my Wiktionary account - these were automatically merged.
  • All but one of the rest had a password starting with k, once I provided this password all the accounts with it were merged.
  • The remaining 1 account (cy.wikipedia) was not included in any of the above. I tried at least two different passwords without success until I remembered an old password that worked and it was then added to the merge.
If I hadn't remembered the password, I could have left it unmerged to merge it in at a later date after I'd reset the password. I presume that had I not been able to remember or reset the password that I could have gone through a forced merge procedure or something. Thryduulf 19:51, 26 March 2008 (UTC)[reply]
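The behaviour described above can be summarised in a small sketch. It is not MediaWiki's code: the `Account` class, the `plan_merge` function and the plaintext password fields are all assumptions made purely for illustration (a real system would compare password hashes, and the "confirmed email" check is guessed from the description).

```python
from dataclasses import dataclass


@dataclass
class Account:
    wiki: str       # e.g. "en.wiktionary"
    password: str   # plaintext only for the sake of the illustration
    email: str      # empty string if no confirmed address


def plan_merge(home, others, extra_passwords=()):
    """Split `others` into accounts that merge and accounts left pending.

    An account merges if its password matches the home account's password
    (or any password supplied later in `extra_passwords`), or if it shares
    the home account's email address.
    """
    known = {home.password, *extra_passwords}
    merged, pending = [], []
    for acct in others:
        same_email = bool(acct.email) and acct.email == home.email
        if acct.password in known or same_email:
            merged.append(acct.wiki)
        else:
            pending.append(acct.wiki)
    return merged, pending


if __name__ == "__main__":
    home = Account("en.wiktionary", "Xsecret", "t@example.org")
    others = [
        Account("en.wikipedia", "Xsecret", ""),
        Account("commons", "kother", "t@example.org"),
        Account("sv.wikipedia", "kother", ""),
        Account("cy.wikipedia", "ancient", ""),
    ]
    print(plan_merge(home, others))
    print(plan_merge(home, others, extra_passwords=["kother", "ancient"]))
```

Supplying further passwords corresponds to the later steps in the example above, where each newly remembered password pulls in every remaining account that uses it.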
Seemed to go very smoothly for me. The one thing the doco didn't say is that it does set the unified password to the one you used in the merge procedure. Instead of having multiple passwords, there is now a single password for all merged accounts. --EncycloPetey 20:16, 26 March 2008 (UTC)[reply]
That's because there is only one account after the merge. The only thing on the local wikis after the merge are user preferences (along with contributions, action logs, user page(s), and user talk page). Mike Dillon 03:50, 28 March 2008 (UTC)[reply]

Bot flag request for User:Computer

Hi, I would like to request a bot flag. I already have a bot flag on many wikis m:User:White Cat#Bots and I am from commons:User:White Cat at which I am a sysop. My language skills: en-n, tr-4, az-2, ja-1.

I hope to help with the following tasks:

  • Interwikilinking using interwiki.py
  • Double redirect fixing using redirect.py
  • Commons delinking using delinker.py

I can also help with tasks like recategorization or any other bulk find & replace tasks.

-- Cat chi? 00:18, 27 March 2008 (UTC)[reply]

We do not need another interwiki bot (if you look in the archives the last few requests to run one have been denied) as User:Interwicket is much more efficient than interwiki.py. We also have User:CommonsDelinker - though I don't know if it needs a hand or not. Double redirects aren't often a problem (as we generally shoot redirects on sight and so don't want them fixing). In terms of bulk find and replace tasks I would prefer these to be in the hands of a user with more Wiktionary experience, though they are always reversible so the damage is minimal. It is probably better to run with the flag off until you begin to annoy the RC patrollers. Conrad.Irwin 00:39, 27 March 2008 (UTC)[reply]
I have a few reservations about some of that. Let me list them by number.
  1. I intend to run my interwiki bot on all wiktionaries. Not running it here would create additional load for the existing bots. Having multiple interwiki bots does not disrupt the operation of any of them. Wiktionary is a colossal project. I am looking at http://www.wiktionary.org/ Just adding the largest four wikis reveals: 769 000 + 753 000 + 225 000 + 187 000 = 2 121 000. Processing all of that regularly would require lots of interwiki bots. I am talking about all article scans.
  2. There are still a few double redirects (there can be valid reasons to have redirects even if most of them are shot on sight). This wiki had 6 such redirects as of this post, of which four you fixed manually (example). A bot could have done that for you. Any unnecessary redirect can be processed from Special:ListRedirects. Looking there I see plenty of redirects, well over 5000... Broken redirects are a navigation hazard.
  3. The bot acts as a backup to CommonsDelinker. If for any reason commons delinker fails to operate (such as toolserver going down), my bot would fill in for it.
  4. Find & replace tasks require no real experience. It's merely case sensitive here.
  5. Operating the bot without a bot flag decreases its efficiency. Wikimedia servers limit how many edits people can make to prevent spam bots. Because I operate my bot on many wikis the bot flag is particularly helpful.
-- Cat chi? 15:43, 27 March 2008 (UTC)[reply]
  1. I still don't see the need for more interwiki bots - I'm not aware that the current ones are struggling at all?
  2. Fair enough if that happens - CommonsDelinker doesn't have much work to do here (1 edit this month, the 50th most recent was on 12 November), given there are not many images. If the toolserver does go down (I was under the impression it was far more stable these days?) I'm not certain there is a need for a backup bot - if there is an extended outage, then we can discuss a need then.
  3. The most recent find and replace operation I'm aware of was moving a category, which you are right doesn't need much experience, but more likely will be ones that you need to understand the formatting and many templates we use here - I'd rather the bot be operated by someone who understands how to fix any accidental damage it causes.
  4. To my mind you still haven't explained why we need the bots in the first place. Thryduulf 22:21, 27 March 2008 (UTC)[reply]
  1. You cannot deal with well over two million pages with just 2 bots. How much time do you think the bot spends on each page? 2 000 000 / (24*60*60) = 23.148... meaning the bot would need to review 23 pages per second, assuming all editions of wiktionary have 2 million pages, and assuming the current bot can handle such a thing. Dividing that workload by two is only logical.
  2. No. En.wiktionary is not at the center of the universe. Commonsdelinker operates on over 800 wikis. I will not have the time to discuss this on so many wikis. I am a human being and I will need to be sleeping, eating, working when the next outage happens, which could be the next hour when lightning strikes. You are right there aren't a whole lot of images here. If there isn't a backup bot you will have a red link. Commons administrators are neither required nor expected to manually delink (or relink, images can be renamed) images.
  3. It would take me a few minutes to figure out the fine details. I am not a 5 year old. I can do the same kind of fixes with a trivial amount of attention.
  4. I am trying to offer a service. I want to help out. Having two or more bots help out with a demanding task is something productive. You share the workload, you cooperate. I can help with the bulky issues like interwiki linking and double redirect fixing. Other bot operators would then be freer to focus on tasks that require more fine tuning and experience in wiktionary.
-- Cat chi? 00:45, 28 March 2008 (UTC)[reply]
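The rate quoted in point 1 can be checked with a few lines of arithmetic. The sketch below assumes, as the 24*60*60 divisor implies, that every page is scanned once per day; the function name and the `bots` and `days` parameters are illustrative only. Note that the four per-wiki figures quoted earlier in the thread add up to 1 934 000, a little under the round 2 000 000 used here.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400


def pages_per_second(total_pages, bots=1, days=1):
    """Pages each bot must process per second to cover `total_pages`
    once every `days` days, with the work split evenly across `bots`."""
    return total_pages / (bots * days * SECONDS_PER_DAY)


if __name__ == "__main__":
    quoted = [769_000, 753_000, 225_000, 187_000]  # fr, en, vi, tr
    print(sum(quoted))                                    # 1934000
    print(round(pages_per_second(2_000_000), 3))          # 23.148
    print(round(pages_per_second(2_000_000, bots=2), 3))  # 11.574
```

Relaxing the assumption to one full scan per week rather than per day divides the required rate by seven, so the conclusion depends heavily on how often a full scan is actually needed.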
While it's true there are 2 million pages (edit, this is not true for a single Wiktionary the largest is fr with 169,000 pages, we have 153,000. The 10 largest wikis total 2.6 million), not all of them need monitoring constantly - all the bots need to do is to watch for page creations and additions of other interwiki links. The latter is less important for Wiktionaries than for Wikipedias (and perhaps other projects) as the only interwiki links we want are between identically spelled entries, i.e. house should link to fr:house, de:house, pl:house, etc. whereas w:en:House links to w:fr:Maison, w:de:Haus, w:pl:Dom. The latter needs intelligence to know that w:pl:Dom should link to w:de:Haus (house) not w:de:Dom (cathedral), whereas the Wiktionary links need only a dumb bot. Also, we have at least two interwiki bots currently - you have not shown they cannot cope and the owners of those bots have not said they cannot cope. Additionally the last 100 recent changes covers 1 hour and 20 minutes of edits here at one of the busiest Wiktionaries (I don't know if it's the busiest or not). At Wikipedia there were more than 100 changes in the past 1 minute, which shows the different scale of the projects - smaller wiktionaries are even more stark - 100 changes on pt.wiktionary took 18 hours, on cy.wiktionary it took 6 days (and many of them were dealing with a spambot). You appear to be thinking in Wikipedian terms. Thryduulf 01:26, 28 March 2008 (UTC)[reply]
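The "dumb bot" rule described here, linking only identically spelled entries, amounts to a simple set lookup. The following is a sketch under assumptions: `titles_by_wiki` is a hypothetical mapping from language code to the set of page titles on that Wiktionary, and neither Interwicket nor interwiki.py is claimed to work this way.

```python
def interwiki_links(title, titles_by_wiki, home):
    """Return interwiki links for `title` on the `home` wiki.

    A link is produced for every other wiki that has a page with exactly
    the same title; no translation or disambiguation is attempted.
    """
    return sorted(
        f"[[{code}:{title}]]"
        for code, titles in titles_by_wiki.items()
        if code != home and title in titles
    )


if __name__ == "__main__":
    titles_by_wiki = {
        "en": {"house", "dom"},
        "fr": {"house", "maison"},
        "de": {"house", "Haus", "Dom"},
        "pl": {"dom"},
    }
    print(interwiki_links("house", titles_by_wiki, home="en"))
    print(interwiki_links("dom", titles_by_wiki, home="en"))
```

Because titles are compared exactly, "dom" on en links to pl:dom but not to de:Dom, which is the behaviour a Wiktionary wants and the opposite of what a Wikipedia interwiki bot has to work out.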
Please see http://www.wiktionary.org/: fr.wikt alone has 769 000, en.wikt is at 753 000, followed by vi.wikt at 225 000 and tr.wikt at 187 000 (the big 4). I do not know where you get your numbers.
Sorry, my numbers are from the same source as yours, but for some reason I typed an initial 1 instead of an initial 7. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
All wikis should be scanned to check that they contain the correct interwiki links. That includes links for ts.wikt. Remember we are not writing this project for technically advanced people like you and me but for the casual reader who barely knows how to use a mouse.
The latter indeed needs greater intelligence. But that's half of the work. If you, for example, want to add Polish to the chain, you would have to edit every wiki that is in the chain. Interwiki.py spreads it to every wiki for you. All you need to do is add a single interwiki link on one wiki, and the bot can spread it for you. That's what I mean by constant scanning.
But this is exactly what is already happening with VolkovBot and Interwicket here - look at their contribs. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
It is of course manageable if you restrict the bot's sensors to recent changes and hope all interwiki links are properly placed. How does a bot operating on the en.wiki RC feed know about the addition of a page on the Polish wiki? I scan individual articles; I pay no attention to the RC feed. Regularly scanning every page on every edition of wiktionary is the task I want to fulfill. Which of the two bots is doing this?
-- Cat chi? 12:29, 28 March 2008 (UTC)[reply]
See User talk:Interwicket#VolkovBot - VolkovBot does what you are proposing (I think), Interwicket reads the recent changes. So you see the task you are proposing is already being done on the English Wiktionary. If you want to run your bot on other Wiktionaries then you will have to ask there; we can't give or refuse you permission. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
Agree with Thryduulf. None of your bots seems to fill any need here on Wiktionary. I feel far more comfortable having one of our own fulfill these tasks, as they know Wiktionary and its needs far better. Ullmann's bots are like magic, and not just because he's a skilled programmer, but also because he has been here for a long time and has a thorough understanding of what needs to happen. While we appreciate your offer to help, it is not required here right now. -Atelaes λάλει ἐμοί 22:28, 27 March 2008 (UTC)[reply]
In other words, if you are new you are unwelcome. :D I know this wasn't intended to be confrontational but I don't like it. :P -- Cat chi? 00:46, 28 March 2008 (UTC)[reply]
Not at all - you are very welcome, we would just like you to gain experience as a human editor before you run bots we aren't sure are needed. Thryduulf 01:26, 28 March 2008 (UTC)[reply]
It's much more manageable for each wiki community to run its own bots locally. The interwiki bot is an example of that; the one in use here is much more efficient than the standard bot based on the pywikipedia framework, and we can tailor it to our specific needs. I find it odd that you are so hostile to the idea that your bot may be duplicating work we do already. Instead of taking offense, why not join the local community and see where a new bot might be useful? -- ArielGlenn 01:29, 28 March 2008 (UTC)[reply]
I am not the one being hostile. My concern is all editions of wiktionary, not just en.wiktionary. We seem to be talking in different scales. I am interested in the macro-scale, not the micro. --12:29, 28 March 2008 (UTC)
Our concern is for the English Wiktionary primarily, and with the bots we already have here, our interwiki links are kept up-to-date already. As I said above we cannot give you permission to run your bot on any wiki other than the English Wiktionary - you will have to ask on them. But a look at a random selection of wiktionaries (cy, pt, id, ts, vo and fr) suggests that VolkovBot is keeping their interwikis up-to-date as well. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
  • Random question for the techies: Are non-mainspace interwikis (for project pages, templates, etc.) currently being handled adequately? Of course there are project-unique concerns here as well (particularly wrt templates and categories), but the pywikipedia approach would seem to be more applicable to these. -- Visviva 03:13, 28 March 2008 (UTC)[reply]
I'm not sure why, but you are considered as not having registered.
To solve this issue once and for all, why not publish Interwicket in pywikipedia, so that it is clear that there are two standard interwiki bots, one for wiktionaries, and one for other projects?
If you are willing to use Interwicket on wiktionaries, I'm sure you would be very welcome on all wiktionaries (including here, providing there is some coordination between Interwicket users). Lmaltier 07:33, 29 March 2008 (UTC)[reply]
I can use separate code, yes. But the two different codes do the same thing. Interwicket is simply more efficient but does the same thing as interwiki.py. I will always use the more efficient code. I value my CPU time after all. Also Interwicket cannot handle non-mainspace tasks while interwiki.py can. I think a good use of both will be the best course of action. -- Cat chi? 19:03, 16 April 2008 (UTC)[reply]

Wiktionary:Easter Competition 2008

Announcing Wiktionary:Easter Competition 2008. Discussion can be at that page (or its talk page).—msh210 17:08, 27 March 2008 (UTC)[reply]

Citations pages: let's be specific.

I was under the impression that Citations pages were to hold evidence for whether or not a term meets the CFI, and in this case I am defining 'term' as a specific set of characters unique with regards to order and capitalization. This means that MySpace != myspace, and Kind != kind when it comes to Citations (altho first word in the sentence caps are acceptable). Am I the only one thinking this way? MySpace's citations should be on Citation:MySpace and myspace on Citation:myspace? Inflected forms and declined forms and conjugated forms and all other forms should be on separate pages, to verify their own existence. - [The]DaveRoss 20:32, 28 March 2008 (UTC)[reply]

I was under that impression, yes, except that inflected forms may appear on the "lemma" page in addition to the page for the specific form. In other words, the citations for "let" as an entry do not have to be the infinitive form, but can be for other forms as well. We do the same for example sentences; such sentences do not have to be artificially worded to use a particular form of the word, but may use any form. --EncycloPetey 20:47, 28 March 2008 (UTC)[reply]
Agreed, however there is also merit to citing specific forms. I am considering dividing up the citations pages for Ancient Greek lemma entries in two, with one half for any form of the lemma, and one half for the specific form (this is especially relevant because some words may not have their lemma form attested, a situation which isn't really the case in living languages). However, the point TheDaveRoss raises about different spellings, such as capitalized and non-capitalized remains a valid one. This probably ties in to the discussion we were having about -in' forms. -Atelaes λάλει ἐμοί 20:59, 28 March 2008 (UTC)[reply]
Capitalization does create a bit of an extra problem for Latin, since capitalization rules differ between Classical Latin on the one hand and Medieval and Later Latin on the other. We'll probably have to set special guidelines for languages that went through such a shift. --EncycloPetey 21:16, 28 March 2008 (UTC)[reply]
Such as English. :-) -- Visviva 23:50, 30 March 2008 (UTC)[reply]
When a word can be cited only at the beginning of a sentence, this is not a sufficient reason to capitalize it in the entry. The same should apply to normal inflected forms (if applicable rules are clear): it can be helpful to provide examples and citations for each form, including the lemma form, but they should not be a requirement before creating the entries (once again, when applicable rules are clear). Lmaltier 07:44, 29 March 2008 (UTC)[reply]
Even citation in the middle of a sentence isn't always reason to capitalize the entry. English allows capitalization for the purpose of emphasis, and this used to be quite common in written English. If you read some of Shakespeare's plays or some of Locke's essays with the original capitalization, you will see many, many common nouns capitalized in the middle of sentences. --EncycloPetey 00:06, 31 March 2008 (UTC)[reply]
You are right. And I think this is a reason not to create Lion or Beluga as English words. After all, we don't create When, which may be capitalized or not, but follows general rules. Lmaltier 16:46, 31 March 2008 (UTC)[reply]

FL name as template in trans section

I've seen this in prologue, in the translation section the FL name is not Finnish but {fi}. Is this standard? --Panda10 11:44, 29 March 2008 (UTC)[reply]

No, they should always be subst'd. They occasionally show up because of edits by people more familiar with other wikts, which sometimes use them as standard. The primary reason we don't is that we have a lot of languages in translation tables that are either not coded, or have codes not known to the editors. It also makes it very hard to alphabetize when changing the wikitext. Often the FL.wikts that use them just put the table in code order; but with the number of languages we have code order looks random after a while. (Tocharian A is xto, Tocharian B is txb, etc, etc).
If you just leave them, AF will fix them as it rechecks the entry after your edit. Robert Ullmann 13:22, 29 March 2008 (UTC)[reply]
Thank you for the explanation. --Panda10 21:52, 29 March 2008 (UTC)[reply]
Could we make it alphabetize automatically using class="sortable" or something? --Ptcamn 22:55, 30 March 2008 (UTC)[reply]
If AutoFormat always sorts them then there is no need. Conrad.Irwin 10:51, 31 March 2008 (UTC)[reply]
AF doesn't sort them yet, but it has been on the "to do" list for a long time ... perhaps in a few hours from now? ;-) Do note that in some cases people have intentionally used a different order (Like listing "Ancient Greek" after "Greek" without using any nested syntax) and this will "fix" those. I had brought up the issue of a well-defined but not strict alpha order previously, and the sort-of conclusion was just to stick to strict alpha. The "nesting" (see something on hierarchy supra) is another issue, as it is used for several different things. AF will just preserve it for now. (Of course!). Robert Ullmann 11:56, 2 April 2008 (UTC)[reply]
AF will now sort and rebalance translations blocks that use {{trans-top}}. See price for example. Robert Ullmann 15:36, 2 April 2008 (UTC)[reply]
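As a rough illustration of what sorting and rebalancing a translation block involves, here is a minimal sketch. It is not AutoFormat's code: the function names, the "* Language: translation" line shape, and the reading of "rebalance" as splitting the sorted lines into two columns of near-equal length are all assumptions.

```python
def sort_translations(lines):
    """Sort translation lines strictly alphabetically by language name.

    Each line is assumed to look like "* Finnish: prologi".
    """
    def language_name(line):
        return line.lstrip("* ").split(":", 1)[0].strip().casefold()

    return sorted(lines, key=language_name)


def balance(lines):
    """Split lines into two columns; the first gets the extra line if odd."""
    mid = (len(lines) + 1) // 2
    return lines[:mid], lines[mid:]


if __name__ == "__main__":
    block = [
        "* Tocharian B: ...",
        "* Finnish: prologi",
        "* Tocharian A: ...",
        "* French: prologue",
        "* German: Prolog",
    ]
    left, right = balance(sort_translations(block))
    print("\n".join(left))
    print("----")
    print("\n".join(right))
```

Sorting by language name rather than by code keeps Tocharian A and Tocharian B adjacent, whereas their codes (xto and txb) would place them far apart.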

How should Wiktionary distinguish between two classes of non-English words?

How should Wiktionary distinguish between two classes of non-English words: the superclass of "all foreign words which will eventually be added to Wiktionary" and the much smaller subclass of "foreign words that are somewhat regularly used in some English literature (technical or otherwise)"? For example, dirigist is used in the English-language social science literature whereas many words are not (such as milieuverontreiniging, which is not, I suspect, used in the English-language environmental literature). Yet both are listed in Related or Derived terms lists of other foreign words (dirigisme and milieu, respectively). My question is how does an English-language only Wiktionary user know which words might make sense to use in English literature and which ones are merely foreign words being added to the big giant Wiktionary project and would therefore likely NOT be useful/normal in any standard English literature? This seems like a problem that will become more and more pronounced as a larger number of foreign words are added. N2e 23:45, 30 March 2008 (UTC)[reply]

Personally, I think that if a foreign word is used regularly in English, then in some sense it's an English word, and it should have an English language section; the etymology and usage notes sections can discuss its foreignness, and some sort of context template can be devised if that's considered necessary. However, at least one editor has in the past objected rather strenuously to giving such words English-language sections; I don't know if he still would. —RuakhTALK 03:02, 31 March 2008 (UTC)[reply]
I agree with Ruakh; any foreign word used in English literature (or used in English sentences- online, etc) should count as an English word, as a general rule of thumb. sewnmouthsecret 03:51, 31 March 2008 (UTC)[reply]
Basically agreed; IMO the question that actually confronts us is how best to differentiate between two classes of English words, viz. English words in the strict sense and foreignisms. But let's note that this inclusive approach also entails a large number of FL sections for the many English words that are used as deliberate anglicisms in French, German, Italian, et al. We do need some way of handling these cases without absurdity, and I'm not sure if any suitable model has yet been presented (maybe a Translingual section for widely-used anglicisms?). And on the other hand there is still a line that must be drawn between foreignisms and code-switching (as when for example someone quotes a German movie title in English -- the words of the title are not being used as English words in any sense). -- Visviva 04:17, 31 March 2008 (UTC)[reply]
If a word has been partly or fully naturalized in English in a certain field of study or practice, then it should have appropriate context labels or notes explaining this (slang, jargon, technical, the field, etc). Sounds to me like a case where overlapping restrictions make it less common than something purely technical or purely a loanword, but still no less a part of English.
In such cases, I'd like to see attention paid to including references attesting to how established they are in their field, as well as good citations demonstrating their use. Of course, it may be difficult finding such information, but glossaries specific to the field may be useful. —Michael Z. 04:18, 31 March 2008 (UTC)[reply]
That makes sense. But how would you approach the case where the only field of study or practice where the word is partially naturalized is directly related to the source language? For example, mashallah appears chiefly in narratives set in the Arabic- and Farsi-speaking world; devotchka (Burgess aside) chiefly in literary works featuring Russians. Both are overwhelmingly italicized, and may occur alongside other words (like malenkaya) which no sane person would consider English. -- Visviva 10:28, 31 March 2008 (UTC)[reply]
Well, many Russian words used in fiction about Russia, in works translated from Russian, or in the field of Slavistics are simply foreign terms, which are being used for the benefit of some chelovicks who may razoomy them. I think these don't belong under the "English" heading. —Michael Z. 21:40, 1 April 2008 (UTC)[reply]
Re: Ruakh's comment. Thanks, that helps me a lot. I looked back at the two words I originally used as examples (dirigist and milieuverontreiniging) and found them actually coded correctly to Ruakh's suggested standard: dirigist as English and milieuverontreiniging as Dutch. So far, so good. That shows that my comment was about a "user error" -- I didn't perceive the difference even though it was articulated in the entry for each word, plain as day. That helps me; I will try to be more careful in the future.
HOWEVER, it does suggest another question for all of the serious wordsmiths and Wiktionarians who frequent the Beer Parlor: How to "design" a good (better?) user interface for Wiktionary that makes this English/Foreign word distinction more apparent to the casual Wiktionary user? N2e 14:46, 31 March 2008 (UTC)[reply]
My CanOD italicizes a headword "if the word is originally a foreign word and not naturalized in English." It's a great way to demonstrate its nature by simulating its natural environment. —Michael Z. 21:40, 1 April 2008 (UTC)[reply]

Bot flag request for User:Computer

Hi, I would like to request a bot flag. I already have a bot flag on many wikis (see m:User:White Cat#Bots), and I come from Commons (commons:User:White Cat), where I am a sysop. My language skills: en-n, tr-4, az-2, ja-1.

I hope to help with the following tasks:

  • Interwikilinking using interwiki.py
  • Double redirect fixing using redirect.py
  • Commons delinking using delinker.py

I can also help with tasks like recategorization or any other bulk find & replace tasks.

-- Cat chi? 00:18, 27 March 2008 (UTC)[reply]

We do not need another interwiki bot (if you look in the archives the last few requests to run one have been denied) as User:Interwicket is much more efficient than interwiki.py. We also have User:CommonsDelinker - though I don't know if it needs a hand or not. Double redirects aren't often a problem (as we generally shoot redirects on sight and so don't want them fixing). In terms of bulk find and replace tasks I would prefer these to be in the hands of a user with more Wiktionary experience, though they are always reversible so the damage is minimal. It is probably better to run with the flag off until you begin to annoy the RC patrollers. Conrad.Irwin 00:39, 27 March 2008 (UTC)[reply]
I have a few reservations about some of that. Let me list them by number.
  1. I intend to run my interwiki bot on all wiktionaries. Not running it here would create additional load for the existing bots. Multiple interwiki bots do not disrupt each other's operation. Wiktionary is a colossal project. I am looking at http://www.wiktionary.org/ Just adding the largest four wikis reveals: 769 000 + 753 000 + 225 000 + 187 000 = 1 934 000. Processing all of that regularly would require lots of interwiki bots. I am talking about full scans of every article.
  2. Even if most redirects are shot on sight, there can be valid reasons to have them, and double redirects do occur. This wiki had 6 such redirects as of this post, of which you fixed four manually (example). A bot could have done that for you. Any unnecessary redirect can be processed from Special:ListRedirects. Looking there, I see plenty of redirects, well over 5000... Broken redirects are a navigation hazard.
  3. The bot acts as a backup to CommonsDelinker. If for any reason CommonsDelinker fails to operate (such as the toolserver going down), my bot would fill in for it.
  4. Find & replace tasks require no real experience. It's merely case sensitive here.
  5. Operating the bot without a bot flag decreases its efficiency. Wikimedia servers limit how many edits people can make to prevent spam bots. Because I operate my bot on many wikis, the bot flag is particularly helpful.
-- Cat chi? 15:43, 27 March 2008 (UTC)[reply]
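As a quick check on the arithmetic in point 1, the four entry counts quoted there can be summed directly. This is only a sketch: the figures are copied from the post rather than re-read from www.wiktionary.org, and the name quoted_counts is made up for illustration.

```python
# Entry counts for the four largest Wiktionaries, as quoted in the post
# (taken on trust from the discussion, not re-checked against the portal).
quoted_counts = {"fr": 769_000, "en": 753_000, "vi": 225_000, "tr": 187_000}

# Total number of entries a full scan of these four editions would cover.
total = sum(quoted_counts.values())
print(f"{total:,}")
```

With the figures as quoted, the four editions together come to 1 934 000 entries.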
  1. I still don't see the need for more interwiki bots - I'm not aware that the current ones are struggling at all?
  2. Fair enough if that happens - CommonsDelinker doesn't have much work to do here (1 edit this month, the 50th most recent was on 12 November), given there are not many images. If the toolserver does go down (I was under the impression it was far more stable these days?) I'm not certain there is a need for a backup bot - if there is an extended outage, then we can discuss a need then.
  3. The most recent find and replace operation I'm aware of was moving a category, which you are right doesn't need much experience, but more likely will be ones that you need to understand the formatting and many templates we use here - I'd rather the bot be operated by someone who understands how to fix any accidental damage it causes.
  4. To my mind you still haven't explained why we need the bots in the first place. Thryduulf 22:21, 27 March 2008 (UTC)[reply]
  1. You cannot deal with well over two million pages with just 2 bots. How much time do you think the bot spends on each page? 2 000 000 / (24*60*60) = 23.148... meaning a single bot would need to review 23 pages per second to cover everything in one day, assuming all editions of wiktionary together have 2 million pages, and assuming the current bot can handle such a thing. Dividing that workload by two is only logical.
  2. No. En.wiktionary is not at the center of the universe. CommonsDelinker operates on over 800 wikis. I will not have the time to discuss this on so many wikis. I am a human being and I will need to be sleeping, eating, working when the next outage happens, which could be the next hour when lightning strikes. You are right there aren't a whole lot of images here. If there isn't a backup bot you will have a red link. Commons administrators are neither required nor expected to manually delink (or relink, images can be renamed) images.
  3. It would take me a few minutes to figure out the fine details. I am not a 5 year old. I can do the same kind of fixes with a trivial amount of attention.
  4. I am trying to offer a service. I want to help out. Having two or more bots help out with a demanding task is something productive. You share the workload, you cooperate. I can help with the bulky issues like interwiki linking and double redirect fixing. Other bot operators would then be freer to focus on tasks that require more fine tuning and experience in wiktionary.
-- Cat chi? 00:45, 28 March 2008 (UTC)[reply]
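The per-second figure in point 1 follows from dividing the page count by the number of seconds in a day. A minimal sketch, assuming every page is visited once per scan period and the work splits evenly between bots; the function name pages_per_second and its parameters are illustrative and are not taken from any bot's code.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400


def pages_per_second(total_pages, bots=1, days=1):
    """Rate each bot must sustain to visit every page once in `days` days.

    Assumes the pages are shared evenly between `bots` (a simplification).
    """
    return total_pages / (bots * days * SECONDS_PER_DAY)


# One bot, two million pages, one day: the figure used in the post.
single_bot_rate = pages_per_second(2_000_000)
# The same workload shared between two bots.
two_bot_rate = pages_per_second(2_000_000, bots=2)
print(round(single_bot_rate, 3), round(two_bot_rate, 3))
```

The rate scales inversely with both the number of bots and the length of the scan period, so a weekly rather than daily full scan would lower the requirement by a factor of seven.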
While it's true there are 2 million pages (edit: this is not true for a single Wiktionary; the largest is fr with 169,000 pages, we have 153,000. The 10 largest wikis total 2.6 million), not all of them need monitoring constantly - all the bots need to do is to watch for page creations and additions of other interwiki links. The latter is less important for Wiktionaries than for Wikipedias (and perhaps other projects) as the only interwiki links we want are between identically spelled entries, i.e. house should link to fr:house, de:house, pl:house, etc. whereas w:en:House links to w:fr:Maison, w:de:Haus, w:pl:Dom. The latter needs intelligence to know that w:pl:Dom should link to w:de:Haus (house) not w:de:Dom (cathedral), whereas the Wiktionary links need only a dumb bot. Also, we have at least two interwiki bots currently - you have not shown they cannot cope and the owners of those bots have not said they cannot cope. Additionally, the last 100 recent changes cover 1 hour and 20 minutes of edits here at one of the busiest Wiktionaries (I don't know if it's the busiest or not). At Wikipedia there were more than 100 changes in the past 1 minute, which shows the different scale of the projects - smaller wiktionaries are even more stark - 100 changes on pt.wiktionary took 18 hours, on cy.wiktionary it took 6 days (and many of them were dealing with a spambot). You appear to be thinking in Wikipedian terms. Thryduulf 01:26, 28 March 2008 (UTC)[reply]
Please see http://www.wiktionary.org/ where fr.wikt alone has 769 000, en.wikt is at 753 000, followed by vi.wikt at 225 000 and tr.wikt at 187 000 (the big 4). I do not know where you get your numbers.
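The same-spelling rule described above is what makes Wiktionary interwiki linking mechanical. A minimal sketch of that rule, assuming the set of editions that already have the entry is known in advance; the function interwiki_links is hypothetical and is not how Interwicket or interwiki.py is actually implemented.

```python
def interwiki_links(title, home, editions_with_entry):
    """Build wikitext interwiki links for a Wiktionary entry.

    Every edition that has a page with the identical title gets a link;
    the home edition itself is skipped. No translation lookup is needed,
    unlike Wikipedia, where House must be matched to Maison, Haus or Dom.
    """
    return [
        f"[[{code}:{title}]]"
        for code in sorted(editions_with_entry)
        if code != home
    ]


# Links to place on the English entry "house".
links = interwiki_links("house", "en", {"en", "fr", "de", "pl"})
print("\n".join(links))
```

The only input that changes over time is the set of editions holding the entry, which is why watching for page creations is enough to keep such links current.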
Sorry, my numbers are from the same source as yours, but for some reason I typed an initial 1 instead of an initial 7. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
All wikis should be scanned to check that they contain the correct interwiki links. That includes links for ts.wikt. Remember we are not writing this project for technically advanced people like you and me but for the casual reader who barely knows how to use a mouse.
The latter indeed needs greater intelligence. But that's half of the work. If you, for example, want to add Polish to the chain, you would have to edit every wiki that is in the chain. Interwiki.py spreads it to every wiki for you. All you need to do is add a single interwiki link on one wiki, and the bot can spread it for you. That's what I mean by constant scanning.
But this is exactly what is already happening with VolkovBot and Interwicket here - look at their contribs. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
It is of course manageable if you restrict the bot's sensors to recent changes and hope all interwiki links are properly placed. How does a bot operating on the en.wiki RC feed know about the addition of a page on the Polish wiki? I scan individual articles; I pay no attention to the RC feed. Regularly scanning every page on every edition of wiktionary is the task I want to fulfill. Which of the two bots is doing this?
-- Cat chi? 12:29, 28 March 2008 (UTC)[reply]
See User talk:Interwicket#VolkovBot - VolkovBot does what you are proposing (I think), Interwicket reads the recent changes. So you see the task you are proposing is already being done on the English Wiktionary. If you want to run your bot on other Wiktionaries then you will have to ask there; we can't give or refuse you permission. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
Agree with Thryduulf. None of your bots seems to fill any need here on Wiktionary. I feel far more comfortable having one of our own fulfill these tasks, as they know Wiktionary and its needs far better. Ullmann's bots are like magic, and not just because he's a skilled programmer, but also because he has been here for a long time and has a thorough understanding of what needs to happen. While we appreciate your offer to help, it is not required here right now. -Atelaes λάλει ἐμοί 22:28, 27 March 2008 (UTC)[reply]
In other words if you are new you are unwelcome. :D I know this wasn't intended to be confrontational but I don't like it. :P -- Cat chi? 00:46, 28 March 2008 (UTC)[reply]
Not at all - you are very welcome, we would just like you to gain experience as a human editor before you run bots we aren't sure are needed. Thryduulf 01:26, 28 March 2008 (UTC)[reply]
It's much more manageable for each wiki community to run its own bots locally. The interwiki bot is an example of that; the one in use here is much more efficient than the standard bot based on the pywikipedia framework, and we can tailor it to our specific needs. I find it odd that you are so hostile to the idea that your bot may be duplicating work we do already. Instead of taking offense, why not join the local community and see where a new bot might be useful? -- ArielGlenn 01:29, 28 March 2008 (UTC)[reply]
I am not the one being hostile. My concern is all editions of wiktionary, not just en.wiktionary. We seem to be talking at different scales. I am interested in the macro-scale, not the micro. --12:29, 28 March 2008 (UTC)
Our concern is for the English Wiktionary primarily, and with the bots we already have here, our interwiki links are kept up-to-date already. As I said above we cannot give you permission to run your bot on any wiki other than the English Wiktionary - you will have to ask on them. But a look at a random selection of wiktionaries (cy, pt, id, ts, vo and fr) suggests that VolkovBot is keeping their interwikis up-to-date as well. Thryduulf 13:13, 28 March 2008 (UTC)[reply]
  • Random question for the techies: Are non-mainspace interwikis (for project pages, templates, etc.) currently being handled adequately? Of course there are project-unique concerns here as well (particularly wrt templates and categories), but the pywikipedia approach would seem to be more applicable to these. -- Visviva 03:13, 28 March 2008 (UTC)[reply]
I'm not sure why, but you are considered as not having registered.
To solve this issue once and for all, why not publish Interwicket in pywikipedia, so that it is clear that there are two standard interwiki bots, one for wiktionaries, and one for other projects?
If you are willing to use Interwicket on wiktionaries, I'm sure you would be very welcome on all wiktionaries (including here, providing there is some coordination between Interwicket users). Lmaltier 07:33, 29 March 2008 (UTC)[reply]
I can use separate code, yes. But the two different codes do the same thing. Interwicket is simply more efficient but does the same thing as interwiki.py. I will always use the more efficient code. I value my CPU time after all. Also, Interwicket cannot handle non-mainspace tasks while interwiki.py can. I think a good use of both will be the best course of action. -- Cat chi? 19:03, 16 April 2008 (UTC)[reply]

<section end="archive_march">

April 2008

<section begin="archive_april">

Wholesale conversion to "Determiner"

A user Contribution of BrettR_aka_Mr._Determiner has changed the entries for a dozen frequently used words to eliminate all PoS headings except "Determiner". Does such a drastic step have community approval? It seems simply undiscussed. It seems to me to be going in the direction against intelligibility for ordinary users and favoring some current fashion in linguistics and language education. DCDuring TALK 20:11, 1 April 2008 (UTC)[reply]

Wiktionary:Beer_parlour_archive/2007/April#Determiner_vs_Determinative is the last major discussion and seems hardly conclusive. Has there been a policy vote? Or is this an April Fools thing? DCDuring TALK 20:36, 1 April 2008 (UTC)[reply]

I don't think that's O.K. It might be O.K. to unify ===Adjective=== and ===Pronoun===, and maybe even ===Noun===, under ===Determiner===; but eliminating ===Adverb=== sections? —RuakhTALK 22:28, 1 April 2008 (UTC)[reply]
Is "determiner" really widely accepted in books for those outside of the language and linguistics community? If it is leading (bleeding?) edge, then perhaps we could find a way of transitioning to its use that allowed for folks like me to get behind it. I really don't see why this header ought to be in use at all until it has been well discussed. Because we can't do any real user research, we have to pay close attention to the practice of other publishers. Their practice does not support this. If we think that we can achieve a competitive advantage by being cutting edge, free of some unhelpful traditional categories, then we should go ahead and implement the change. I haven't heard the rationale for the superiority of the unknown-to-users category "Determiner" to the widely known old-fashioned categories. Maybe it doesn't really matter much because hardly anyone will use a dictionary for these words anyway and we don't have all that big a user base so we can just do what pleases us. DCDuring TALK 23:39, 1 April 2008 (UTC)[reply]
I'm not certain whether we should use "Determiner" or not, but I am certain we should not use it until we've discussed it and come to a conclusion to use it. The previous discussions are neither recent enough nor conclusive enough imho. Thryduulf 00:01, 2 April 2008 (UTC)[reply]
That wish is not stopping the process which continues as we wring our hands. Judging from the lack of action I would say that this is something most don't care about or support. DCDuring TALK 01:26, 2 April 2008 (UTC)[reply]
We just don't particularly have time or whatever. What "BrettR" doesn't realize is that he is likely to cause a vote on using "determiner" for English (see below), and if it is barred, we will calmly revert all of his edits and remove the Determiner heading from the entries it was already used in. (I recall reading something on the 'pedia about someone (ahem) determined to show the wikt people who don't even know what a determiner is; I suppose I should have seen this coming? ;-). It is not a classic English POS, and there is little reason to allow it. "BrettR" is likely just wasting his time. But note: not much of ours, as it is easy to rip out. OTOH, maybe it is just an April Fool's joke, albeit not a very amusing one. Robert Ullmann 01:39, 2 April 2008 (UTC)[reply]

No, not April fools. There are a small number of English determiners and half of them were already listed as determiner. The others have been categorized that way for a long time. I just thought I'd be bold. I'm certainly willing to discuss. I've stopped the process for now.--BrettR 01:45, 2 April 2008 (UTC)[reply]

By the way, the vast majority of my edits last night were simply adding or fixing up the {{en-det}} template to sections with existing L3/POS Determiner headers.--BrettR 13:51, 2 April 2008 (UTC)[reply]

  • I really don't understand all of the hullabaloo here. "Determiner" (or "demonstrative determiner") has been in use as an English POS header here since at least May 2004, without causing much of a fuss, and it seems to me that all BrettR has done is introduce a bit of much-needed consistency. -- Visviva 06:42, 3 April 2008 (UTC)[reply]
I'm asking what it's for. If it's good for users, then it deserves to be documented as policy. If it's not, then it deserves to be eliminated. The other dictionary operations seem divided on its goodness. Should we stay with the set of categories most widely known to users and contributors or should we push them in a progressive direction (if it really is progressive)?
As wiktionary matures some of the issues that have been swept under the rug ought to be addressed to allow progress toward consistent practice to help our users get more out of us than they would get out of a list of definitions. Our entries are far from uniform in quality and extremely inconsistent in the use of fundamental categories. The appearance of uniformity forced by the software and the heading structure belies the depth of the inconsistency. The level three headers are of fundamental importance in structuring entries. Questionable areas include the treatment of abbreviations (written vs. non-written, actual PoS for abbrevs. other than nouns); phrases, idioms, and proverbs; interjections; numbers; and other symbols. Do we need adjective headings for attributive use of nouns? When do participles become adjectives or nouns? When are related etymologies worth splitting? Codifying much of this now might be premature, but it would help Wiktionary improve to convert our experience and beliefs concerning what works from a user perspective to policies and guidelines. DCDuring TALK 10:14, 3 April 2008 (UTC)[reply]

I believe that all the English determiners are now identified with an L3 Determiner heading and {{en-det}}.--Brett 12:09, 26 May 2008 (UTC)[reply]

Proposal to use "Determiner" as an L3/POS header.

Per the discussion immediately preceding this one, I'd like to propose that we begin using "Determiner" as an L3/POS header for the unambiguous English determiners (CGEL determinatives). It's a small enough group of words — larger than you might expect, but still pretty small — that it shouldn't be too bad to undo if we later decide it was a bad idea for whatever reason. Conversely, if we decide that it was a good move, we can open the door to determiners in other languages, and loosen the "unambiguous" criterion.

Advantages to doing this:

  • It's more accurate than trying to force determiners into the traditional lexical categories (parts of speech).
  • It's more concise than giving determiners several different POS sections with the same definitions over and over again.
  • It's in better keeping with our sister project Wikipedia's (accurate, or trying-to-be-accurate) descriptions of the various parts of speech.

Disadvantages:

  • The term "Determiner" will be less familiar to many readers than more traditional terms. (Of course, most of our readers probably have a fairly unclear notion of the traditional terms' meanings, anyway.)
  • From what I understand, there's not total consensus in the linguistic community about exactly which words are determiners. (Of course, "we can't do this perfectly" is a far cry from "we shouldn't bother trying" — and it's not like there's total consensus in the traditional-grammarian community about exactly which words belong to which traditional parts of speech, either.)
  • It might be a slippery slope from this to less obviously positive adoptions of modern linguistic theory — will we next adopt the notion of intransitive prepositions? Will we create an entry at Ø with screenful after screenful of definitions of null determiners and whatnot in every language known to man? (Of course, knowingly accepting inaccuracies is also a slippery slope. As a wiki, we really have no choice but to trust our future selves and future members of the community.)

Thoughts? (I'd like to bring this to a vote within a week or two, if there seems to be agreement.)

RuakhTALK 01:23, 2 April 2008 (UTC)[reply]

Why should we buy what the linguists are selling? Why should we buy it before OED, MW, Collins, AHD, Longmans, Random House, et al.? What are the actual advantages of this to our anonymous users? What evidence supports our beliefs about these possible advantages? What are the words that would be affected? It would be handy if we had one or more categories for them. If this vote is to be meaningful, would it not be useful to rein in Mr. Determiner and get him involved in the discussion? What else has to be done for this change to actually have a good effect on the experience of our supposed user base? DCDuring TALK 01:41, 2 April 2008 (UTC)[reply]
I'm in for a vote. Obviously I think determiner is a very useful category. Why should we buy what the linguists are selling? Can you imagine editors over at wikipedia (I know this isn't wikipedia) asking "Why should we buy what the physicists are selling? Newton was good enough for me." We should buy it because they're the experts. Why before OED? Because OED only comes out every few decades. Why before Longman? Longman has used determiner for years. For a list, see here Category:English_determiners (that doesn't include cardinal numbers which are all determiners as well as being nouns). I hope that helps.--BrettR 01:57, 2 April 2008 (UTC)[reply]
Longman's competitive strategy seems to be to try to take market share among those who are dissatisfied with existing dictionaries. They seem to try novel approaches in many ways: typesetting, layout, font selection. So it is hardly a shock that they would be an early adopter. I wonder what percentage of the things that they have been early adopters of have been taken up by the others, that is, been successful. And what percentage they have abandoned, that is, failed. OED and MW also have online dictionaries which could readily adopt these innovations if they thought there were an advantage. — This unsigned comment was added by DCDuring (talkcontribs) at 02:52, 2 April 2008 (UTC).[reply]
Here on Earth there is very little reason for an engineer to care about theories of everything, general relativity, gauge theory, string theory, etc. Truth is truth for a purpose. Newtonian physics is very practical for many purposes. How will it help an anonymous user to use an unfamiliar category such as "Determiner"? How will it help me? It is not compelling and more than a little troubling that the rationale is: "Trust us, we're the experts." I certainly have no objection to having entries for words like "determiner" or using it for categorizing. It is for more fundamental purposes that I am concerned. You are imposing a change in the structure of what we are doing. It reminds me more of a Websterian or Shavian spelling reform proposal than something productive. DCDuring TALK 02:21, 2 April 2008 (UTC)[reply]
How will it help an anonymous user to use an inaccurate category such as "Adjective" for a determiner like both? Determiners are some of the most common words in the language; if someone's looking one up, I don't think we can assume that they'll be terribly familiar with "Adjective", either — and it certainly won't help them understand how to use the word. Incidentally, I really don't understand your comparison of this to Webster's and Shaw's proposals: we most certainly are not trying to change how people use these words. We're not trying to turn these words into determiners, only to accurately identify the words that already are determiners. —RuakhTALK 02:39, 2 April 2008 (UTC)[reply]
Accuracy-shmaccuracy. They are no more Determiners than they are Adjectives. These are just categories, tools of thought. Planets don't have elliptical orbits any more than they have circular ones. Elliptical orbits are more useful tools for thinking about their orbits, not ultimate truth. What are the constructive benefits of using this category? Will users be happier? Will they leave the site knowing more? Will we gain users? Will we gain funding? Will linguists cheer us? DCDuring TALK 02:52, 2 April 2008 (UTC)[reply]
I don't think it would be constructive to get rid of the category -- there are English determiners (or words which act like determiners), just as there are determiners (or words which act like them) in many other languages, and it makes sense to have a category for such words. The real question is whether we should use this as a header for English, as we have already been doing for many years. If we reject it for English entries, we have the odd phenomenon of a category, accepted for English entries, which is (and must be) a POS header for other languages such as Korean, but which is not accepted as a POS header for English entries. That would be inelegant at least, and elegance does have a certain value of its own. -- Visviva 04:41, 2 April 2008 (UTC)[reply]
I notice that above "determiner" is called "leading (bleeding?) edge". That's an odd way to refer to a concept that is at least 75 years old. Leonard Bloomfield wrote in 1933 (likely echoing earlier writings) that "The determiners are defined by the fact that certain types of noun expressions (such as house or big house) are always accompanied by a determiner (as, this house, a big house)." Since the 1960s, the concept of English determiners has been positively mainstream within linguistics, admittedly with some disagreement around the edges (do words like my, his, etc. belong?), but with broad agreement that there exists this class of words that are not properly described as adjectives or pronouns.
"How will it help an anonymous user to use an unfamiliar category such as 'Determiner'?" How will it help them to find things called "demonstrative adjectives"? And maybe they'll click on the word and learn something.
Nobody is asking anyone to take this on faith. The topic is clearly laid out in many books and articles with data and argument to back it up. Anyone who hasn't read much about the topic could do worse than look at the wikipedia entry. From there I would suggest moving on to A Student's Introduction to English Grammar or, for the more ambitious, The Cambridge Grammar of the English Language. You could also check out John Payne's 1993 chapter in Heads in Grammatical Theory by Corbett et al (eds.). Hope that's useful.--BrettR 13:17, 2 April 2008 (UTC)[reply]
If it has been around then it has had ample opportunity to have proven its practical utility to OED, MW, et al. The article of faith is whether it will prove useful to the practical understanding of users. This appears to be a fashion of interest to linguists with no positive practical consequences that anyone seems to be aware of. I was hoping that there would turn out to be some positive consequences that could emerge from the discussion. I remain hopeful that someone will be able to articulate an advantage to the population at large from making this change. If it has value in making linguists want to participate in Wiktionary, that might count for something. DCDuring TALK 14:11, 2 April 2008 (UTC)[reply]
If we can agree that there is a practical utility to having POS at all (and there seems to be consensus that there is), then presumably there is utility in making them perform consistently. The OED defines adjective as "a word standing for the name of an attribute, which being added to the name of a thing describes the thing more fully or definitely, as a black coat, a body politic." It further defines an attribute as "A quality or character ascribed to any person or thing." The determiners don't look at the qualities or characters of things. Instead, they have a pointing function that tells us which things are being discussed. Unlike adjectives, they cannot typically be graded or modified by too, so, very, or other adverbs that typically modify adjectives. They cannot typically appear predicatively as most adjectives can (which also relates back to the fact that they are not qualities). That is, you can't say "The people are some." They are usually mandatory where adjectives are (always?) optional. You can't typically use them together (where you can string adjectives together to your heart's content). Adjectives are independent of number where determiners typically must match the number of the noun. In fact, about the only similarity is that they both appear before nouns. So one practical utility of calling these determiners is that you convey all this information in a single word.
What, may I ask, is the benefit of calling these adjectives apart from tradition?--BrettR 14:37, 2 April 2008 (UTC)[reply]
The principal advantages are that:
  1. users believe they know the implications of something being labelled an adjective and
  2. there is some validity to that belief.
I don't see any reason to cause them to investigate the meaning of determiner if doing so does not offer some real benefits to those not on a career path in language. That the concept has appeal and value to those in the field I do not doubt. I am a little disappointed that there seems to be so little of practical value in the concept. When writers about language are writing a book to sell they don't seem to find the word "determiner" of great value in explaining things. It doesn't appear in the index or glossary of many books on language (Chomsky's "Aspects", Pinker's "The Language Instinct", 3 Safire works, 1 Crystal, 1 Fischer), though it does appear in the index to Pinker's "Words and Rules" and Crystal's "Encyclopedia of Language".
In short, "determiner" has been around, but has not swept the field among the producers of on-line and print dictionaries or language authors. Its merits do not speak for themselves. The stated benefits seem limited to "elegance". DCDuring TALK 15:38, 2 April 2008 (UTC)[reply]

You seem to be implicitly assuming

  1. said implications apply to determiners, such that this belief is a good thing here.

which is not an assumption I'm ready to make with you. That said, if we had some sort of (determiner) context tag, I'd feel more comfortable than I currently do with labeling these "adjectives". (That might also help streamline the entry for זֶה (ze, this), which can serve either as an adjective or as a determiner, the latter being more formal/poetic/archaic, but the meaning being the same either way.)

RuakhTALK 18:10, 2 April 2008 (UTC)[reply]

I await an itemization of the erroneous conclusions that users draw from the old-style PoS headers that will be corrected by the use of Determiner for the words that really are Determiners as opposed to the pretenders that have been nominated Determiners by some linguists for their own nefarious ends. I don't see why the Hebrew need for the determiner category has any particular implication for how we treat English. Or is there a procrustean imperative in Wiktionary's constitution that I missed? I don't see how we can have "proof" of benefits of one kind of categorization over another, but I would like to see some presentation of the possible benefits that offset the "cost" of introducing a term that Longman's identifies as "technical" in the two of its dictionaries I have looked at. This seems of a piece with "plurale tantum". If Wiktionary is "by linguists, for linguists", then we ought to reconsider the logo design to make that clear. DCDuring TALK 19:47, 2 April 2008 (UTC)[reply]
I thought I had done that. I guess we're thinking about different kinds of conclusions. Could you give me an example of the type of thing you mean, say with 'verb' in the role of determiner (nefarious interloper/deprecated POS header)?--BrettR 20:47, 2 April 2008 (UTC)[reply]
I have no idea how to put 'verb' in the role of determiner or how that connects to what I think I've said or was trying to say. I'll look at this again tomorrow with some sleep. My underlying goal is to make sure that we have our poor occasional users in mind with any change because they are the source of the growth of Wiktionary's impact on the world. It seems moderately useful to me personally, but I don't believe myself to be representative, nor do I believe the participants in this forum to be representative, of our target users. DCDuring TALK 01:01, 3 April 2008 (UTC)[reply]
Erm, sorry, but I think you misread my comment? In no way was I suggesting that "the Hebrew need for the determiner category ha[d] any particular implication for how we treat English." What I was suggesting is almost the opposite, actually: some of our Hebrew entries currently make use of ===Determiner=== even though that category doesn't apply as perfectly in Hebrew as it does in English, and I was suggesting that if we figure out a decent way to handle English determiners without using ===Determiner===, we can apply that same method to Hebrew. (Obviously ignoring the distinction altogether is not a decent way in either language.) —RuakhTALK 23:16, 2 April 2008 (UTC)[reply]
OK DCDuring TALK 01:01, 3 April 2008 (UTC)[reply]
"I have no idea how to put 'verb' in the role of determiner" Sorry, what I meant was, if the 'verb' heading didn't exist, and all these verb things were being called 'nouns', what arguments would you use to argue that we should split them out into a new category? Anyhow, have a good sleep.--BrettR 01:40, 3 April 2008 (UTC)[reply]
As a general comment: the additional heading is a bad idea. Adding the categories seems fine. Don't those categories already exist anyhow? The heading might simplify typing an entry in once, instead of listing the POS headings that apply. But a heading like 'determiner' does nothing to clarify the word for typical readers. --Connel MacKenzie 06:28, 3 April 2008 (UTC)[reply]
Yes, the categories already exist. The discussion is about the heading.
I have trouble imagining what kind of typical readers are being imagined here. What typical reader looks up words like many, each, this, etc. in a dictionary? Presumably we have someone curious who has noticed something about the word and wants to understand it better. Such a person is much more likely to meet their goal (i.e., learn something) if they find the heading Determiner. If they don't know what it means, the answer is a click away. And if they've already gone to the trouble of looking up one of these words, then it seems a good bet that they'd go that extra step.--BrettR 11:41, 3 April 2008 (UTC)[reply]
I agree with this. If a person is looking up a word like this, then they should be told about what it is. To call this and a word like nice by the same label will probably lead to an erroneous conclusion that they have the same syntactic behavior. The reason for using two labels is because they occur in different positions in grammatical constructions and because they have different functions. And that's why they are used in detailed descriptions of English grammar. It is probably just resistance to change (prescriptivists don't usually like change) that other dictionaries have not followed Longman. I suggest you add the new label. Ishwar 19:54, 15 April 2008 (UTC)[reply]

Demonstrative adjective

I seem to have started this. Demonstrative adjective is an old established term for this/that, but certainly not for both. I again remind you all that only this/that, these/those inflect for number, unlike any other English adjectives. A fun bit is contracting "is" while putting these adjectives into the possessive: "this's this's", "that's that's". — This unsigned comment was added by Allamakee Democrat (talkcontribs) at 16:27, 2 April 2008 (UTC).[reply]

Note: it is an error in English to contract this' with is. Regarding "demonstrative adjective", that term is possibly established for linguists, but not used in any recognized general use dictionary, to describe them. Making up headings certainly is not useful. A category with two members is a fine bit of overkill. A simple usage note on those two entries, would be better. --Connel MacKenzie 06:24, 3 April 2008 (UTC)[reply]
Your pronouncement is a bit misleading. Webster's New 20th Century Dictionary includes demonstrative with the following as a definition: "in grammar, pointing out; as that is a demonstrative pronoun." It does not use demonstrative to identify parts of speech of entries, but the concept is in the dictionary as it is in most dictionaries.
The problem with this approach can be seen (for example) in that same dictionary's entries for this. They give both a listing marked as pronoun followed by half a dozen definitions, then a listing marked as adjective followed by the same half a dozen definitions with the wording altered only slightly for the use as an adjective.
You see, demonstrative adjective/pronoun words are one class of Determiners. One of the great advantages of the Determiner header in English is that we can combine and simplify many repeated senses. This has much potential for reducing confusion on the part of the casual user, since they won't have to look through two sets of definitions that look almost identical. The translations will also be greatly simplified. In most languages I've studied, the translation for adjectival use of this and pronoun use of this are the same or nearly so. Thus, separate Adjective and Pronoun sections (in addition to other uses) doubles the necessary length of all words that are Determiners. Consolidating under a single header will simplify and clean up these entries enormously. --EncycloPetey 18:06, 3 April 2008 (UTC)[reply]
Demonstratives this, that, etc. can function as determiners and as pronouns. But, not all determiners have the same syntactic behavior as this, that. For example, no, every, the cannot function as pronouns. So, you should use both labels for this reason. It, of course, doesn't necessarily follow that you need to repeat the definition multiple times. Ishwar 19:35, 15 April 2008 (UTC)[reply]
I'm not quite sure I understand why the matter needs to be debated at all. Determiner is not an accepted part of speech among legitimate grammarians; if you look in one of good old Webster's dictionaries, you'll never find a word classified as a "determiner." If it is not accepted by the experts, why should it be on Wiktionary? Are we striving for incorrectness? Elfred 03:51, 20 April 2008 (UTC)[reply]
That's a funny thing to say. Picking up the first three relevant books that come to hand on my bookshelf -- Longman's Dictionary of Contemporary English, Leech & Svartvik's Communicative Grammar of English, and van Ek & Robat's Student's Grammar of English -- I find that all three use "determiner" quite routinely. The last two have a fairly canonical status in the field of English language teaching.
There are at least two things that distinguish these works from many others: they are written for an international audience, and they are based on methods and theories that are reasonably up to date. I would hope that we would try to emulate them in both regards. Dictionaries written for monoglot audiences, based on obsolete theories and methods, are not a particularly good role model. -- Visviva 04:56, 20 April 2008 (UTC)[reply]
It would be much easier to make a decision about this if we actually knew something about who our anon users are and who they should be. Longman's DCE (definitely a learner's dictionary) is a good model for a dictionary, but it seems to be the only dictionary that uses "determiner" as a PoS. If someone could articulate the benefits of the "determiner" concept .... DCDuring TALK 09:47, 20 April 2008 (UTC)[reply]

As of the March, 2008 revisions, the OED uses Determiner. See, for example, the entry for many.--Brett 12:11, 26 May 2008 (UTC)[reply]

Internet slang

We currently have {{Internet}}, {{slang}}, and {{Internet slang}}, all of which are context tags categorizing eponymously. This means that some Internet slang is categorized under Internet and slang, whereas other Internet slang is categorized under category:Internet slang. This is far from ideal. I can think of two (mutually exclusive) solutions:

  • Get rid of category:Internet slang and force template:Internet slang to categorize into Internet and Slang. This seems reasonable to me.
  • Have AutoFormat change every context tag that includes both Internet and slang to include Internet slang instead (and remove spaces, or whatever, of course). This seems like a bad idea to me, as there could be an entry that is both Internet (not necessarily slang) and slang (outside of the Internet), so would need both tags.

Thoughts?—msh210 21:18, 2 April 2008 (UTC)[reply]

Seems to me that the vast majority of these should be in Category:Internet slang. You are probably right that the conversion cannot be automated, but perhaps we could get a cleanup list? -- Visviva 06:33, 3 April 2008 (UTC)[reply]
A rough cleanup list. —msh210 16:18, 3 April 2008 (UTC)[reply]
It would be nice to redesign the context labels to carry information such as formality and region. Yes, {{internet slang}} could expand, as you say, but why shouldn't {{context|internet|slang}} classify it under all three categories? DAVilla 06:53, 7 April 2008 (UTC)[reply]
Is that doable with template:context?—msh210 19:00, 9 April 2008 (UTC)[reply]
See my response here. DAVilla 17:05, 1 June 2008 (UTC)[reply]
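For anyone trying to picture the two categorization behaviours discussed in this thread, here is a minimal Python sketch. It is not how template:context actually works; the label names, the `EXPANSIONS` table, and the `categorize` function are all hypothetical and exist only to illustrate the idea that the combined label and the two separate labels could end up in the same set of categories.

```python
# Hypothetical illustration only: not the real template:context logic.
# The combined label is treated as shorthand for its two component labels.
EXPANSIONS = {"internet slang": ("internet", "slang")}


def categorize(labels):
    """Return the sorted category names implied by a list of context labels."""
    seen = set()
    for label in labels:
        key = label.strip().lower()
        # Expand the combined label; every other label stands for itself.
        seen.update(EXPANSIONS.get(key, (key,)))
    categories = {name.capitalize() for name in seen}
    # When both parts are present, also file the entry under the combined
    # category (the "all three categories" behaviour suggested above).
    if {"internet", "slang"} <= seen:
        categories.add("Internet slang")
    return sorted(categories)


print(categorize(["Internet slang"]))
print(categorize(["internet", "slang"]))
print(categorize(["slang"]))
```

Under this scheme both spellings of the tag land in the same three categories. It does not address the objection raised above: an entry that is Internet (but not slang) in one respect and slang (outside the Internet) in another would still be filed under Internet slang, so a real solution would need some way to opt out.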

full-width characters

I think we should reprogram Wiktionary so that when people look up something with full-width characters, it treats it as a lookup using half-width characters. Full-width characters are non-ASCII variations of half-width characters, made wider to match the width of other languages like Japanese. So words in them would show up now and then in foreign text, for example ＣＤ in Japanese. If someone looks up ＣＤ on Wiktionary, we will either...

1) Tell them the word doesn't exist,

2) Give them a definition like "full-width version of CD",

or 3) Just automatically redirect them to CD (hardcoded, not a manually-added redirect, because it'd be too much work to manually add redirects for all English words).

I suggest option 3. Right now we do option 1. Language Lover 22:23, 2 April 2008 (UTC)[reply]

How does one come to look up something with full-width characters? By copying text from another website, and pasting it in Google or Wiktionary's search field?
If full-width characters are non-ASCII, do they carry other meaning? Or are they just a presentation form which always represents the same as their Latin equivalents?
Sounds like it might be a problem to be solved by text encoding standards bodies, and the makers of system software and web browsers, rather than by the maintainers of each website. —Michael Z. 23:13, 2 April 2008 (UTC)[reply]
Copy-and-paste, yes. Google is smart enough to fix them; MediaWiki (the software that Wiktionary runs on) is not — which is just as well, because we probably do want entries for the individual full-width characters, if only to answer the questions in your second paragraph. However, ＣＤ should probably JavaScript-redirect to CD. Shall we discuss this at Wiktionary:Grease pit? —RuakhTALK 01:09, 3 April 2008 (UTC)[reply]
Okay, perhaps it is a feature we could use, but it should be pretty well thought out before implementing.
But this brings up another question, about which symbols are appropriate Wiktionary entries. WT:CFI doesn't really cover this adequately. The technical implementation is different, but whether in the full-width or ASCII code range, "CD" means "CD". As far as I know, Wiktionary entries are about words, and maybe some symbols or the concepts they represent, but they are not about code points. The full-width code point Ｃ is a symbol representing the exact same concept as the ASCII code point C, and I don't think it should have a separate dictionary entry. —Michael Z. 21:30, 3 April 2008 (UTC)[reply]
Yes, hallelujah! And combine A with Cyrillic А and Greek Α etc. Differentiating script is a great feature of Unicode if you're interested in automated text manipulation, but when it comes to defining symbols, these are indistinguishable glyphs. In fact, I would say that if anything deserves another page it's the italic text, the script text, etc. even if it may be the same code point. DAVilla 06:48, 7 April 2008 (UTC)[reply]
I believe this is yet another issue that Hippietrail's Extension:DidYouMean would deal with for us, let's hope we can get it tested and implemented soon. Conrad.Irwin 11:36, 3 April 2008 (UTC)[reply]
If you have the right keyboard setup, you would come across it just by typing. And don't think for a second that just because some Asian language Wiktionary exists that a single translation of ko:CD is going to keep a non-native English speaker from coming here. Even with as little knowledge of other languages as I have, I know that to get the best explanation of a word you have to look on the foreign language dictionary. A survey of foreign language terms on this English Wiktionary and the all-too-often need for a {{gloss}} drive in the point. So yes, there is every reason to have this incorporated into DidYouMean or some other solution. DAVilla 06:48, 7 April 2008 (UTC)[reply]
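To make option 3 from this thread concrete, here is a rough Python sketch of the folding step. It assumes only the well-known layout of the Unicode Halfwidth and Fullwidth Forms block, where the full-width ASCII variants U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts. The function names and the `existing_titles` set are hypothetical; this is not MediaWiki or DidYouMean code.

```python
FULLWIDTH_START = 0xFF01   # FULLWIDTH EXCLAMATION MARK
FULLWIDTH_END = 0xFF5E     # FULLWIDTH TILDE
OFFSET = 0xFEE0            # distance from a full-width form to its ASCII twin
IDEOGRAPHIC_SPACE = 0x3000


def to_halfwidth(title):
    """Replace full-width ASCII variants with ordinary ASCII characters."""
    folded = []
    for ch in title:
        code = ord(ch)
        if FULLWIDTH_START <= code <= FULLWIDTH_END:
            folded.append(chr(code - OFFSET))
        elif code == IDEOGRAPHIC_SPACE:
            folded.append(" ")
        else:
            # Kana, kanji, Cyrillic, and so on pass through unchanged.
            folded.append(ch)
    return "".join(folded)


def resolve(title, existing_titles):
    """Prefer an exact entry; otherwise fall back to the folded title."""
    if title in existing_titles:
        return title
    folded = to_halfwidth(title)
    return folded if folded in existing_titles else None


existing_titles = {"CD", "cat"}          # hypothetical set of entry titles
print(resolve("\uff23\uff24", existing_titles))
```

Checking for an exact match first means that a dedicated entry for a full-width character, if one is ever wanted, would still win over the automatic fallback.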

Colloquial and slang: a sensible combination?

I'm in a disagreement with User:Amgine over at doggie as to whether having both {{colloquial}} and {{slang}} simultaneously is appropriate. I'm firmly in the camp that only "slang" is needed and if the fact nobody commented on that above is any indication, other people seem to agree. Circeus 01:05, 3 April 2008 (UTC)[reply]

I agree with you. As Amgine says, "[t]he terms are not synonymous", but as you say, "slang is by its very definition 'colloquial'". "Colloquial" does not imply "slang", but "slang" does imply "colloquial", so there's never any need to list both. (That's going by what I currently know. If someone can give a decent rationale for ever including both, I'm open to the possibility.) —RuakhTALK 01:25, 3 April 2008 (UTC)[reply]
Colloquial, by our glossary, indicates a term in common, often informal, parlance, as opposed to jargon usage. Slang, in contrast, is characterized by its limited use, unconventionally or as informal jargon. ie. "anon" on en.wiktionary is slang, but "bike" is colloquial. (Incidentally, I thought informal was defined in the glossary at one point as indicating a term for which there is a more-formal synonym term used in formal circumstances, such as the bike/bicycle dichotomy. Currently there is no definition in the glossary for informal, so should we be using it at all?) - Amgine/talk 14:06, 3 April 2008 (UTC)[reply]
Given that the distinction you refer to was added a few days ago (after your revert, I think—would need to check; I've started a discussion about it below), I definitely don't consider it to have any bearing on the discussion. Circeus 17:23, 3 April 2008 (UTC)[reply]
Even given your definitions of informal (the last of which I dispute, btw) the primary definitions are to contrast with formal uses. And, by implication, must clearly not be considered to have any bearing on the discussion.</irony> I was unaware of the edits to the glossary, assuming your good faith in the matter. - Amgine/talk 20:13, 3 April 2008 (UTC)[reply]
Re: the edits, I was just pointing it out before someone looked there and noticed it bore on the discussion.
As for "my definitions" (which, despite pointed discussions on the subject, have not been altered since), they are directly in line with those of, e.g. Merriam-Webster. In fact, one could argue that the replacement definition is entirely superfluous since meaning 1 arguably covers it (and the entire entry could use a better swipe: it's considered bad form to define a word by saying what it doesn't mean).
I've made the point in the past, with several references to scholarly sources, that no modern dictionary makes a meaning distinction between the "colloquial" and "informal" labels (which is why they always use only one). Furthermore, "informal" is clearly never used in the very restricted meaning of "spoken" (which is what "colloquial" was originally explicitly coined to cover). Circeus 21:24, 3 April 2008 (UTC)[reply]
Now that you mention it, I don't recall any dictionary using the informal label except the US ones. However, I have not often examined dictionaries specifically for this use. I should examine those near to hand to me before I comment further:
  • Oxford Dictionary of Current English: informal
  • M-W Pocket, Online: neither
  • Various nautical, rare: colloq and formal (presumably because all else is informal?) But these are not valid as they are, by definition, non-standard.
I believe this does not indicate a consensus amongst modern publishers, but it may raise question as to the value of *either* label. - Amgine/talk 03:55, 4 April 2008 (UTC)[reply]
There is disagreement over the philosophical application of tags between lexicographers, and although some general language dictionaries do use colloquial, they use it instead of informal, not as a separate category. When M-W published (in the 50s) a dictionary that dropped the colloquial and (IIRC) slang tags, amongst others, they got an incredible amount of flak, but that dictionary is considered one of the most progressive of its time.
To give another example of peculiar tag use, my harraps-chambers bilingual is mildly idiosyncratic itself: it uses neither formal or informal, but has "ironic" and "humorous" (arguably bilingual dictionaries have different needs and must carry more connotative information than the average monolingual). It uses "familiar" and adds "colloquial" next to the explanation of the abbreviation (Fam). It does use "formal" though. This is likely because "informal" is not used of language in French, and they strived to use a single tag for both languages whenever possible. Circeus 06:02, 4 April 2008 (UTC)[reply]
Longman's Dictionary of Contemporary English 3rd ed (1987) uses: formal, informal, literary, pompous, poetical, slang, dialect, technical, old use, old-fashioned, appreciative, derogatory, taboo, trademark, slang, humorous, and appreciative, but not "colloquial". I included more than have a direct bearing to illustrate that they seem to go out of their way to select straight-forward terms whose ordinary meaning is very close to their intended meaning instead of relying on jargon. Their definition of colloquial is that it indicates usage "suitable for ordinary, informal, or familiar conversation, not formal or special to literature." They are rather progressive in their handling of such matters. (These are the fellows who use "determiner" as a PoS.)
MW3 (principal copyright 1961) must be the M-W dictionary referred to above. The 1993 ed. retains "slang", but not "colloquial", "formal", or "informal". It does have "standard", "substandard", and "nonstandard". DCDuring TALK 12:03, 4 April 2008 (UTC)[reply]
From this I conclude that "colloquial" is less precise. I wonder if it wouldn't be desirable to allow all of these tags and indicate which ones are more and which ones less precise. That way, even less precise knowledge that users might have would be included. Categories would make the less precise items reviewable if someone actually believed that they had knowledge or belief that they should be more precise. We can take advantage of our non-print wiki nature to be more dynamic and evolutionary than print dictionaries. DCDuring TALK 12:03, 4 April 2008 (UTC)[reply]

I find humorous the claim that all slang is colloquial just below another topic called "Internet slang". Certainly Internet slang or other types of esoteric jargon did not necessarily originate in speech. The categories slang, colloquial, and informal all overlap greatly, but there are clearly some terms that will be only one or the other. This is another big reason to maintain the distinctions. -- Thisis0 23:50, 5 April 2008 (UTC)[reply]

Well, obviously the reason I do not normally accept the combination is properly that I consider that "colloquial" and "informal" are one and the same for all practical purposes, so obviously "informal" and "slang" make no sense to use together. Circeus 00:26, 6 April 2008 (UTC)[reply]
Likewise, I assume all informal falls within the purview of colloquial, but not always the reverse; that is, I feel colloquial is more inclusive as some colloquial speech is also formal. However, these discussions suggest the labels are ill-defined and, possibly, driven by editors' opinions more than by objective measure. As such it seems likely they are inappropriate labels to be used at this time. - Amgine/talk 01:58, 6 April 2008 (UTC)[reply]

I see a misspelled entry

Just passing by, but I noticed that paranomasia and paranomasias are misspelled: the correct spelling is paronomasia, with an o as the second vowel. --124.178.50.148 01:05, 3 April 2008 (UTC)[reply]

Actually, both spellings seem to be fairly common; the a spellings should probably be marked as alternative spellings of the o spellings. —RuakhTALK 01:13, 3 April 2008 (UTC)[reply]

Recent edits to Appendix:Glossary need community discussion

I think these (especially those to "Colloquial" and "Informal") really ought to be discussed. While I appreciate the addition of a {{familiar}} tag and entry, I'm really dubious of the need to add entries for informal and formal, unless we want to throw meanings on them they don't have (not to mention that the information given is sometimes contradictory with that found in Wiktionary: Glossary). The links in {{informal}} and {{slang}} are at best superfluous, at worst an insult to our readers. Also, the attempt to specifically distinguish "slang" and "colloquial" (although it does formally establish that the tags are inappropriate in combination, see related discussion above), is fairly ridiculous, as slang is colloquial (if we take the "originating in speech" definition as a basis) by its very definition.

I'll again request a thorough discussion about the basis of distinguishing informal from a purportedly distinct and easy to attribute (because if we have to argue over the tagging of every single word, the tag is pointless) "colloquial" category, as it is indirectly related to the other discussion above. Circeus 01:21, 3 April 2008 (UTC)[reply]

I think of "informal" as being used mostly to create the contrast with "formal", rather than being used in the same context as "colloquial" and "slang". There are certainly words that are mostly used in "formal" contexts, Madame Chairman. "Formal" is not a synonym for standard. It refers to words used on ceremonial and official occasions and contexts, but not limited to any one of them (like courts of law). DCDuring TALK 11:24, 3 April 2008 (UTC)[reply]
I don't really dispute the usefulness of the template, but I do strongly doubt the necessity of defining it into the user glossary. Circeus 21:27, 3 April 2008 (UTC)[reply]
I don't wish to cause trouble, but if we have a term we use in these tags, I would argue that we need to have a link from the tag to either an entry or, if we are using the word in an even slightly idiosyncratic way, to the glossary. DCDuring TALK 21:49, 3 April 2008 (UTC)[reply]
Given the number of various terms we use like this, I agree. If we're going to make distinctions between "informal, colloquial, slang, vulgar, etc" then it seems reasonable to explain the distinctions to our users by means of a link to some kind of explanation. --EncycloPetey 22:21, 3 April 2008 (UTC)[reply]
Of course, it would be a good idea if we could actually agree about the meanings first. I think the previous discussion firmly established there is at best disagreement as to what should be done with colloquial. Circeus 23:58, 3 April 2008 (UTC)[reply]
The tracks are parallel. The only technical decision is whether to use the normal entry definition, a category page, or the glossary. I would argue that the best technical solution would be to use the Wiktionary Glossary definition unless there wasn't one, in which case the normal entry would suffice. Then we can hash this out without further tech involvement. Or perhaps we could simply insist that the Glossary be the sole source for such tag definitions. The category pages would be inconvenient for maintaining the sets of related terms, I suppose. DCDuring TALK 00:40, 4 April 2008 (UTC)[reply]

Citations_talk: is deprecated

There has been scattered discussion in various places about the [[Citations_talk:]] namespace with the general feeling being that, given the phenomenal amount of use that our Talk: pages get, we may as well have one talk page to discuss the word - rather than two talk pages to discuss two intimately related pages. The Citations_talk: namespace should now be empty, and the site javascript conspires heavily against users finding themselves there. Conrad.Irwin 11:33, 3 April 2008 (UTC)[reply]

That's nice. Now, given that one of the primary purposes of the Citations namespace is to collect citations for words that we don't yet have in NS:0, and may or may not have in the future, where do you suggest putting any resulting discussion? An NS:1 Talk page will look orphaned, and such are routinely deleted. Eh? Robert Ullmann 11:58, 3 April 2008 (UTC)[reply]
They should still be in the Talk: page, as - should the entry ever be created - previous discussion about the citations is likely to have an impact on it. It should be trivial to ask whatever (or whoever) deletes orphaned talk pages to check for Citations pages of the same name. (off topic) I feel that the obsession with deleting orphaned talk pages is unnecessary and indeed harmful - it is possible for the community to talk about entries that will never exist or have been deleted. It makes sense to store all this talk in a central place where it can be found instantly should the need arise - the talk page is the ideal page for this. Take for example the {{rfvfailed}} template - which archives deleted information easily accessible on the talk page, yet if a whole entry gets deleted the archive is somewhere in a page history accessible from some index that I can't remember the name of at the moment. Conrad.Irwin 12:36, 3 April 2008 (UTC)[reply]
Perhaps a standard link (tab?) from Citations to the associated talk page would facilitate finding it. And if it were red, it would convey non-existence. I don't see any reason to delete our previous work, either citations or discussion. Page history is not readily searchable, AFAIK. DCDuring TALK 16:53, 3 April 2008 (UTC)[reply]
Re: Ullmann's comment: why are any discussion pages with valuable content being deleted at all? Orphaned talk pages should only be deleted if they contain vandalism, discussions about terms which may not presently exist are valid if only to show that someone has discussed them and a decision was made that the word didn't merit inclusion. Orphaned talk pages should not be deleted by rote.
Re: the general discussion: I am opposed to splitting discussions between the cites and talk page, since the cites page is simply another type of "discussion" about the NS:0 term. It isn't as if the talk pages are swamped, the average defined term has no extant talk page and the average talk page is tiny, even compared to the entry. Best to keep all meta information in one place. - [The]DaveRoss 20:27, 3 April 2008 (UTC)[reply]
So apparently there is this fancy thing in MW now called "namespace alias" where we can make all "Citations_talk" pages point directly at "Talk" if we wish (also "WT" at "Wiktionary" if we want that). - [The]DaveRoss 20:54, 3 April 2008 (UTC)[reply]
Oooh. That sounds a lot like what I would want. Any drawbacks? Presumably it would need a vote to be implemented. DCDuring TALK 17:36, 4 April 2008 (UTC)[reply]
Yeah, on-wiki consensus would need to be displayed, and the namespace which was made into an alias would never-ever be editable, at all. - [The]DaveRoss 20:30, 4 April 2008 (UTC)[reply]
I think it would be more useful to wait and see how Citations talk might be used than to declare this early that it could not be used. Primarily I am concerned that there is not the 1-to-1 correspondence between entries and Citations pages that people wish there to be. I could be proven wrong, but I would want time itself to do that.
However, I am certainly interested by this ability to create namespaces that are not the usual duals. I never did understand, for instance, how WT talk: could ever be of use. WT: isn't really a "full" namespace anyway. At the same time, it would be incredibly useful to have the Template: space for documentation and another space that actually holds the code, sharing a single talk page. But you've heard me say that before. DAVilla 06:12, 7 April 2008 (UTC)[reply]
Well <noinclude> allows documentation on the Template page, it is just more common to see it on the talk page, no reason really to do it that way. I agree that WT_talk is useless, I personally think we ought to make it an NS alias, but I wasn't greeted with full support when I voiced that opinion... - [The]DaveRoss 20:55, 9 April 2008 (UTC)[reply]

The definition is embedded in {1} code and references contain {2} but they don't seem to do anything. Is this something I have to watch for or can I delete it when I see it? --Panda10 20:34, 3 April 2008 (UTC)[reply]

You can delete that on sight, it is meant to be used when a definition is subst:ed in from a template, but it ends up as a relic when people who don't know how to use the template do. - [The]DaveRoss 20:56, 3 April 2008 (UTC)[reply]

Formatting of Idioms, Proverbs in non-English entries

I haven't found any precise information on how to format Idioms or Proverbs in non-English entries. The main problem we have is where to put the literal translation. Here is a summary of the different elements to put in a standard entry.

  1. Wiki links on the words or groups of words of the Czech proverb: yes, in lemma form
  2. Translation with the equivalent proverb in English: with a # after the entry, if not just omit
  3. Literal translation of the Czech proverb: on the same line as the entry, at the end, in brackets, using the tr= parameter of the template {{infl}} (but that should only be used for transliteration), or at the end of the definition in brackets, or in the Etymology section?
  4. Explanation of the idiomatic meaning of the Czech proverb: after the translation with #:, in italics?

See About Czech for more information and also Translation of idioms. --Thomas was here  12:13, 4 April 2008 (UTC)[reply]

I usually put the literal translation as part of the etymology section. There may be times where explaining the literal translation further can help, and sometimes that can go under "Usage notes". I usually wouldn't put the literal translation as a "definition" unless there was no English expression that came close to the meaning. --EncycloPetey 18:00, 4 April 2008 (UTC)[reply]
I also use the etymology section for this. I can't think of a specific case where a usage note would be helpful, but I can abstractly imagine that sometimes an idiom's literal meaning could have implications a user should know about. I agree that the literal translation shouldn't be given a sense line unless the idiom is sometimes used literally, and even then, caution is warranted. —RuakhTALK 22:39, 4 April 2008 (UTC)[reply]
Me too. -- Visviva 02:51, 5 April 2008 (UTC)[reply]
Fine, so there seems to be a consensus on putting the literal translation in the Etymology section. Thanks for your answers. Another and I hope last question is: should we copy the idiomatic meaning from the English entry in the non-English entries? I can also add the final version of this formatting to a Wiktionary:Proverbs page and put this page into the Category:Proverbs. It is also maybe time to add a section Non-English entries in the Entry layout explained. --Thomas was here  16:32, 6 April 2008 (UTC)[reply]
We already have Wiktionary:Language considerations for that; the page is just severely underused. --EncycloPetey 21:35, 6 April 2008 (UTC)[reply]

Illustrations

I suggest a new guideline for illustrations: they must be helpful when trying to understand the meaning of the word (if not, they belong to Wikipedia). As an example, I think that the picture in maritime pine does not belong here, but a picture of the tree would help. Lmaltier 16:47, 4 April 2008 (UTC)[reply]

Not a bad idea. But a little visual interest and color is better than an absence of any illustrations. At least the picture is (purportedly) actually of a maritime pine, albeit a rather young one. DCDuring TALK 17:30, 4 April 2008 (UTC)[reply]
Perhaps whoever took that picture is planning on returning in 30 years to get us a followup...assume good faith :) - [The]DaveRoss 20:29, 4 April 2008 (UTC)[reply]
Changed that pic, hope it's more satisfactory (others at Commons:Pinus pinaster). In general, I agree; we should generally have no more than one image per sense, and the image should be chosen to illustrate that sense as clearly as possible -- which in the case of an organism, means preferably a picture of the whole, mature organism. We should link to Commons or Wikipedia for additional visuals. However, if the previous picture had been the only one available on Commons, I think it would have been better than nothing. -- Visviva 02:50, 5 April 2008 (UTC)[reply]
It's a fine pic and probably better than the other, but from a distance many pines look alike. If I wanted to know how a maritime was different from the pines I know, probably neither picture would help. I wonder whether we should make all the wikicommons pictures on the subject just one click away. DCDuring TALK 04:00, 5 April 2008 (UTC)[reply]
All 3 sister project links in lite form seem useful in this case. Lite form makes it all appear on a single screen. I suppose there might be other cases where the bigger links would be OK. And yes that probably is the best single picture, whatever its shortcomings might be for specific purposes. And with tabbed browsing it is so easy to compare images of two or more kinds of pines. DCDuring TALK 04:10, 5 April 2008 (UTC)[reply]
My point was that, sometimes, no picture is better: everything that blurs the limit with Wikipedia should be avoided, including pictures when their only interest is encyclopedic. Pictures should illustrate definitions, they should be visual definitions. A good example, for a country name, is a map showing where the country is. This is not original, I think that this principle is applied for picture selection in most dictionaries (except encyclopedic dictionaries). Lmaltier 21:01, 5 April 2008 (UTC)[reply]

Category for rulers: king, tzar, pharaoh

I'd like to find a category to put these words in. I think it should be a subcategory of People, but what would be a good name? Thanks. --Panda10 22:04, 4 April 2008 (UTC)[reply]

Category:Titles is related, but not quite the same. I had been thinking a while ago that it would be good to have a category for ranks of nobility like duke and baronet. I think whatever is used for "rulers" should probably encompass "nobility" as well. Mike Dillon 02:37, 5 April 2008 (UTC)[reply]
hereditary heads of state seems reasonable. --Allamakee Democrat 03:32, 5 April 2008 (UTC)[reply]
Wouldn't that exclude many founders of dynasties? DCDuring TALK 18:34, 5 April 2008 (UTC)[reply]
I'm not sure we need or want categories that are that granular. We already have too many categories that will never have more than a dozen entries; it's not like we're going to be adding individual heads of state a la Wikipedia. Mike Dillon 03:59, 5 April 2008 (UTC)[reply]
What is the minimum number of entries to create a category? How about Leaders for category name with this list as a starter: autocrat, crown prince, czar, czarina, despot, dictator, dynast, emperor, empress, generalissimo, governor, head of state, imperator, kaiser, khan, king, leader, magistrate, magnate, maharajah, maharani, mikado, mogul, monarch, Negus, pharaoh, potentate, prince, princess, queen, rajah, rani, regent, ruler, shah, shogun, sovereign, sultan, tsar, tsarina, tycoon, tyrant, viceroy
Or I could add these words as a See also to king without creating a category. Panda10 13:04, 6 April 2008 (UTC)[reply]
I'd go with Category:Rulers personally; "leaders" is awfully vague and doesn't describe a very clearly-delineated semantic field; it could just as easily include things like pastor and principal. But a category should (IMO) definitely exist which includes most if not all of the terms you have listed above. -- Visviva 14:08, 6 April 2008 (UTC)[reply]
I'd avoid Category:Rulers personally. The category name is ambiguous. Does it refer to people who rule, or to instruments used to measure distance? Such ambiguous names should be avoided. --EncycloPetey 16:24, 6 April 2008 (UTC)[reply]
How about Category:Monarchs? See Wikipedia w:Monarch listing a lot more words. This categorization would exclude words for modern leaders, though. Or better: Category:Positions of authority? This is a wider category and could include a lot more. --Panda10 17:53, 6 April 2008 (UTC)[reply]
"Monarchs" is too narrow a description. I fear that "positions of authority" may be too broad, unless we want to include abbot, schoolmaster, manager, etc. I haven't thought of a good name, but I have come up with many bad ones. --EncycloPetey 21:33, 6 April 2008 (UTC)[reply]
"Heads of state" would probably work. "Nobility" should be a separate category, although obviously some entries would be in both. Thryduulf 21:57, 6 April 2008 (UTC)[reply]
"Heads of state" doesn't really work for princess, queen, or prince (at least not in general). There seem to be a few cross-cutting things going on here, mainly heads of state and royalty. Mike Dillon 22:13, 6 April 2008 (UTC)[reply]
I am not certain that there is a need for one category to contain all these entries, categories "Heads of state", "Royalty" and "Nobility" linked by see alsos would seem to be the best solution to me. "Marquess", "King" and "Prime Minister" don't seem to belong to a single class of positions to me. Thryduulf 22:31, 6 April 2008 (UTC)[reply]
I concur. That looks like a pretty good breakdown. Now the question is where to put them. "Heads of state" probably makes sense under Category:Government and/or Category:Titles. Royalty and Nobility could make sense under Category:Society and/or Category:Titles. Mike Dillon 01:31, 7 April 2008 (UTC)[reply]

I created Category:Heads of state and set up the parents and a customized description for using {{topic cat}}. We can adjust if necessary. Mike Dillon 01:44, 7 April 2008 (UTC)[reply]

P.S. We already have Category:Monarchy too under Category:Forms of government. Mike Dillon 01:46, 7 April 2008 (UTC)[reply]

Dating

Would anyone object to categorizing words by the year/decade/century of their introduction to the language/earliest attestation, where such information is known? --Ptcamn 08:02, 5 April 2008 (UTC)[reply]

Not really, but it seems problematic. What really counts as "knowing" that information? To prove that a word was introduced in the year/decade/century X involves proving that was not in use at any time before X -- and proving a negative is a difficult task under the best of circumstances. Third-party sources aren't necessarily a reliable fallback, either -- I think there have been a few recent cases where we've found citations for terms which substantially predate the dates of introduction given in reputable sources. -- Visviva 16:32, 5 April 2008 (UTC)[reply]
You want to introduce this concept for all languages? or limit it to English? or to Modern English? Even limiting to modern English, you'd be talking about adding hundreds of categories. Personally, I'm not sure they would be very useful. Why would anyone want to look up "English words first attested in 1712"? --EncycloPetey 18:12, 5 April 2008 (UTC)[reply]
Folks might be more interested in words first attested after a certain date so categories as we now know them might not be the right approach. We already have, in principle, the concept of definitions that are "dated", which implies that we know when they died or, at least, retired from active service. A bit of unstructured user Feedback suggested the notion. Perhaps that will be something we will further develop in the second decade of Wiktionary. DCDuring TALK 18:29, 5 April 2008 (UTC)[reply]
I can't imagine a category more useless than Category:English nouns, so if you want to create these, I'd have no problem with them. But, as Visviva notes, caution is warranted. —RuakhTALK 19:40, 5 April 2008 (UTC)[reply]
Not the topic, but I find the Wiktionnaire category for French words useful, despite its 350 000 entries. I already used it several times. Lmaltier 21:14, 5 April 2008 (UTC)[reply]
Categorize after earliest and newest added quotation? Nah, I don't know... Still, it would be a hassle, if not outright confusing, for words of several meanings. \Mike 12:26, 7 April 2008 (UTC)[reply]

Keenebot2 to branch out?

This thread is to inquire about User:Keenebot2's generation of conjugated forms of verbs, nouns, adjectives and other parts of speech in foreign languages other than French. Hopefully I've proved myself capable of running a bot - around 50000 entries created in the last couple of months for French stuff. I'd like to branch out into other languages now, only ones which have logical conjugations. Examples of Keenebot2's non-French FL entries are for inflections of dificultar (Catalan), precisar (Catalan), soma (Romanian) and lyste (Norwegian). Essentially, I'm asking to be allowed to use the bot for whatever I feel might be useful, as long as it's within the "get data from Wiktionary, process offline and spout out tonnes of declension/conjugation forms in WT format" form. I'd hate to have to go through an uber-bureaucratic 2-week vote for every language I use, and also am not keen on abusing the bot status to pick things willy-nilly and add unfamiliar languages so nobody notices, so can I get "free rein" for my bot? Keene2 13:50, 5 April 2008 (UTC)[reply]

The bot authorization is only for French, so you'll need a vote. A very important consideration is the involvement of a native speaker in the process; otherwise serious errors will occur. A native speaker need only look at the entries being made and say wait a sec, that's wrong!. The vote process is very simple if not controversial, and if you are too impatient for the time you have to wait, you probably should wait a bit anyway ;-). For now: got a native speaker of Romanian? Robert Ullmann 14:41, 5 April 2008 (UTC)[reply]
"Native speaker" is pushing it, but yes, there needs to be someone who knows the language well enough to vouch for the resultant entries. —RuakhTALK 16:09, 5 April 2008 (UTC)[reply]
2 week vote started. Any questions are best posed there. Keene2 14:48, 5 April 2008 (UTC)[reply]
I will point out here that prior to running TheDaveBot for Spanish verbs I looked over it, two native speakers looked over it, the pages sat for several months and THEN two errors were discovered (two per verb, not two total), I think Keene will act in the best interests of Wiktionary and I don't expect perfection, even though I am certain Keene will aim for it. - [The]DaveRoss 16:25, 5 April 2008 (UTC)[reply]

Unicode 5.1

Is finally officially out:

There is a lot of cool new stuff inside, so check it out. Unfortunately still no Avestan and Egyptian hieroglyphs :/ --Ivan Štambuk 21:15, 5 April 2008 (UTC)[reply]

Early Cyrillic

Unicode 5.1 includes some revisions and very significant additions for the range of Cyrillic characters used for Old Church Slavonic (cu, chu), Old East Slavic (orv) – used in etymologies, e.g. горілка#Etymology – and modern Church Slavonic languages (and probably many others).[11] A small selection is already available in the Dilyana font,[12] and undoubtedly there is more font support to come.

It's likely that only obscure Slavistics fonts will support this range, at least at first. Will we have to fork the current Cyrillic style (.RU) into a second version with its own list of fonts for this purpose? When the fonts do become available, should the preference be to imitate traditional typography, which uses old-fashioned manuscript-style typefaces for these languages? —Michael Z. 01:30, 6 April 2008 (UTC)[reply]

Such significant additions merit their own brand new ISO 15924 code, so they created Cyrs - Cyrillic (Old Church Slavonic variant). So far OCS entries (2158 of them according to WT:STATS) are using exclusively {{Cyrl}}, which should be changed to {{Cyrs}} once the particular font issues are inspected and settled. Dilyana is so far used for Glagolitic ({{Glag}}). --Ivan Štambuk 02:06, 6 April 2008 (UTC)[reply]
I've created a draft {{Cyrs}}, based on the existing pattern, specifying Dilyana font followed by other Cyrillic ones, and applying class="Cyrs".
Is there any reason not to add a class="Glag" to {{Glag}}, for future use and user customization? —Michael Z. 04:46, 6 April 2008 (UTC)[reply]
We should have Glagolitic. It was in use in the Balkans long after it disappeared elsewhere, so there are documented forms for relatively modern words in Glagolitic spellings. --EncycloPetey 04:57, 6 April 2008 (UTC)[reply]
The template is already there, and seems to be in use on 226 pages. I'll just go ahead and add the class, so it will be there if anyone needs it. It seems clear that there will be no conflict. —Michael Z. 05:06, 6 April 2008 (UTC)[reply]

Hm, it seems that applying "font-family:Dilyana;" breaks Glagolitic text in my browser (Safari/Mac, but it doesn't affect the display in Firefox 2/Mac). Evidence for "don't mess with it if it's not broken". —Michael Z. 05:19, 6 April 2008 (UTC) NM; restarting the browser fixed it. Michael Z. 05:26, 6 April 2008 (UTC)[reply]

Still lacking credibility as a decent dictionary

I wrote the following in the Requests for Cleanup, but thought it worth copying here.--Richardb 00:24, 6 April 2008 (UTC)[reply]

It is still too easy to find basic words, such as head, which have far fewer meanings listed in Wiktionary than in many a concise dictionary. I pointed this out about head a couple of years ago. Yet it is still missing some simple definitions:-

  • head of steam, head of pressure.
  • head of a door frame
  • it cost him his head (it cost him his life, but his head may still be in place!)
  • $10 per head
  • side of a coin
  • part of a tape or disc player, printer etc
  • promontory
  • events come to a head; a climax
  • the top of a pimple, spot, or boil
  • out of one's head; off one's head

etc etc.


some parts are confused:-

  • (countable) The topmost, foremost, leading or principal operative part of anything.

What does it say on the head of the page?
Principal operative part of a machine has nothing in common with head of the page

I previously tried to get some sort of Quality Control Project going on the top 1000 words, but was defeated by apathy (mine and everyone else's). It has to be a team effort, but team efforts never seem to succeed here. Everyone seems to want to do their own thing. So Wiktionary still seriously lacks credibility in its most basic function - as an English Dictionary.

I'm no longer interested in trying to take this on. But unless quite a decent group takes it on, the dictionary is still going to be lacking credibility, despite all the other wondrous stuff which people spend time adding.--Richardb 00:24, 6 April 2008 (UTC)[reply]

Nonetheless I think the basic point is valid -- we are unlikely ever to be taken seriously as a dictionary unless we have exemplary coverage of core English vocabulary. Part of the problem here is that writing a reasonably comprehensive entry for a common word like head is easily a day's work; personally, on the rare occasions when I have that kind of time available, I find it difficult to justify spending the day improving one entry rather than creating 50 or 100 entries for words that we don't have yet. But I do agree that this is our single most significant failing at present. The next time I actually have an 8-hour block available, I'm bringin' it to the GSL. -- Visviva 13:01, 6 April 2008 (UTC)[reply]

moo goo gai pan: What if you don't speak the language?

Copied from: Talk:蘑菇鸡片

start ...omitted... The Cantonese definition seems to have been removed, although this is fairly clearly a dish of Cantonese origin. 24.29.228.33 02:59, 6 April 2008 (UTC)[reply]

I agree. The problem that I'm running into is that I can't find any corroborating material. I don't think citing the Wikipedia article is appropriate at this point, since it also lacks proper citations. I don't speak Cantonese myself, so I don't know if I can trust the accuracy of the scant materials that I've found online. Already, I've found one discrepancy. Wikipedia says that 鸡片 is "gai1 pin3" but Cantodict says that it's gai1 pin3*2.[13] I think I'll post this to WT:BP, and find out what others want to do. I'd honestly rather have nothing than include potentially inaccurate information. My reason is that I've seen how errors and inaccuracies are quite easy to perpetuate online, once they are out there. -- A-cai 05:44, 6 April 2008 (UTC)[reply]

end

First of all, are there any Cantonese speakers that could help out with these two entries (蘑菇雞片 and 蘑菇鸡片)? If not, what do we want to do about such entries (words in languages which are difficult to verify, especially because we lack the appropriate native speakers at Wiktionary)? Opinions? -- A-cai 05:44, 6 April 2008 (UTC)[reply]

Wiktionary:Transliteration

The guidelines at Wiktionary:Transliteration and the contents of Category:Wiktionary:Transliteration need some attention. There are a few independent issues I'd like to address, so I'll place them under separate subheadings here. —Michael Z. 16:45, 6 April 2008 (UTC)[reply]

Forked guideline

I'd like to merge Wiktionary:Transliteration with Wiktionary:Transliteration and romanization. There doesn't seem to be any reason for two essentially redundant guidelines. Any objections? —Michael Z. 16:45, 6 April 2008 (UTC)[reply]

Sounds good to me. :-) —RuakhTALK 19:36, 6 April 2008 (UTC)[reply]
Since there's no objection, I'll go ahead and merge these shortly. I'll add merge notices to the pages immediately, in case someone missed this discussion. —Michael Z. 18:33, 10 April 2008 (UTC)[reply]
Done merging, and then made some additions and reorganization on the page. Please look over Wiktionary:Transliteration and romanization. —Michael Z. 21:51, 10 April 2008 (UTC)[reply]

Nomenclature

Romanization is the more general category, with transliteration being more limited in scope. In one case (Wiktionary:About Japanese/Transliteration, dealing with w:Hepburn romanization) a guideline seems to be incorrectly named. Since we're dealing with less than a dozen guidelines so far, I've proposed moving the guideline to Wiktionary:Romanization and re-categorizing it under Category:Wiktionary:Romanization. Comments or objections? —Michael Z. 16:45, 6 April 2008 (UTC)[reply]

That makes sense. It's not like we'll ever be transliterating into any non-roman script. —RuakhTALK 19:36, 6 April 2008 (UTC)[reply]
Incorrect. Transliteration has the broader scope, since any script may theoretically be translated into any other script. Romanization is a specific subset of transliteration in which the result is written with Roman letters. There are some languages included here (like Serbian or Crimean Tatar) which in fact are written in multiple scripts. --EncycloPetey 21:29, 6 April 2008 (UTC)[reply]
Eh, that's iffy. I'd argue that neither has broader scope, since "transliteration" typically implies a character-by-character mapping scheme (so, you can transliterate Greek writing, but not Hanzi writing, into Latin script), which "romanization" does not; but "romanization" necessarily implies mapping into Latin script, which "transliteration" does not. But for Wiktionary purposes, where we only ever map text into Latin script, "romanization" has the broader scope. (Your comment about languages in multiple scripts strikes me as a red herring, since in that case we're not transliterating Serbian+Latin into Serbian+Cyrillic, but rather including the already-existent Serbian+Cyrillic alongside the already-existent Serbian+Latin. If there's a Serbian+Cyrillic word that's spelled funkily for whatever reason, we'd use that funky spelling, rather than providing a straightforward Cyrillic transliteration of its Serbian+Latin counterpart.) —RuakhTALK 21:55, 6 April 2008 (UTC)[reply]
You may be right about Serbian, but I maintain that Transliteration has the broader scope. Strictly speaking, Romanization implies the use of only Roman letters, but Pinyin "Romanization" includes Arabic numerals to transcribe tone. On reflection, there are aspects to Romanization that are not covered under the term Transliteration, just as there are aspects of Transliteration not covered under the term Romanization. So, I retract what I said about one being the subset of the other; they are two items which have significant overlap on Wiktionary, but neither is wholly included within the other. --EncycloPetey 22:05, 6 April 2008 (UTC)[reply]
It's true that I only considered transliteration into Latin when I suggested that romanization is the more encompassing concept, but that is what we're concerned with. Romanization from another alphabet is also transliteration, while romanization from a logographic system is not (the addition of numbers to pinyin is just a detail of a romanization system). In en.Wiktionary, "romanization" covers the whole topic more precisely than "transliteration" does.
The broad standards bodies have run into these constraints too, so while Slavicists usually say "transliteration", the BGN/PCGN refers to all of their standards as "romanization" systems. —Michael Z. 22:30, 6 April 2008 (UTC)[reply]

I'll leave this alone for now, since there doesn't seem to be active support for changing the names. —18:37, 10 April 2008 (UTC)

Standards

For romanization and transliteration we are using a mix of established standards, slightly modified standards, and systems created specifically for Wiktionary. Some of our romanization guides emphatically state that romanization is distinct from pronunciation, while at least one novel phonetic system is under development with the explicit assumption that the need for transliteration "is now suddenly past."

I think we need to develop some basic guidance for the use of romanization and transliteration in Wiktionary.

  1. Briefly, what is the purpose of romanization in Wiktionary? Is it distinct from pronunciation, and if not then should it be merged with the latter or deleted altogether?
  2. What circumstances justify developing our own novel system instead of adopting an established standard, created and used by professionals?

Michael Z. 16:45, 6 April 2008 (UTC)[reply]

  1. Briefly, the purpose is to enable the casual reader to look at a string of characters they don't know, ignore that string, and look at the string right next to it of characters that they do know, so they'll have some idea of the word in question, will easily be able to tell if the same word is mentioned in more than one place in an entry (assuming they can distinguish between the various scripts it might be written in that could all produce the same romanization), and so on. It is definitely distinct from pronunciation, because many languages are like English in that a single word can have vastly different pronunciations over space and time, but we don't want to have to provide all those pronunciations every single time we mention a word in any entry. Also, because we typically aim to provide pronunciations in a fairly technical form (IPA, SAMPA, etc.) that are hard to guess at if you're not familiar with them; romanizations, by contrast, should be easily (if sometimes ambiguously) intelligible to the casual reader.
  2. I think for most languages, there exist various co-existing de facto standards, and I'd almost say that in most cases it's better to form our own balance than to try to impose some de jure standard that's not representative of our needs and those of our readership. (This is also complicated by the fact that the de jure standards are often tied to specific organizations and specific kinds of goals, and therefore are potentially POV; and even when this isn't the case, they're frequently way too technical for our purposes.)
RuakhTALK 19:36, 6 April 2008 (UTC)[reply]
I don't think it's a good idea to just dump standard transliteration schemes used by thousands of publications (including most of the real-world dictionaries) in favour of some ad-hoc designed ones that are Wiktionary-specific and which should somehow approximate phonetic value of a word to a clueless reader who just happens to randomly open some FL entry. If someone is supposed to actually learn a FL word using Wiktionary (assuming that that is the primary purpose of FL entries), he is expected to be familiar with some basic properties of it, like phonology and transliteration system. --Ivan Štambuk 20:28, 6 April 2008 (UTC)[reply]
In the case of “standard transliteration schemes used by […] most […] real-world dictionaries”, I agree with you; but I think that for most languages, the so-called "standard" transliteration schemes are really not the most widely used. (Perhaps I'm simply mistaken; perhaps Hebrew, the only non-Latin-alphabet-using language I know well enough to really form my own opinion about, is simply an exception in this regard.) Also, transliterations are not just for foreign-language entries, but also for English-language etymology sections and so on. And even foreign-language entries are not just for people actually learning the foreign language, but also for people who encounter a foreign-language word in some context that makes them want to know more about it. (And I don't think that for most languages there's any transliteration scheme that could be considered a "basic property" that a language learner is expected to know.) —RuakhTALK 21:09, 6 April 2008 (UTC)[reply]
Transliteration also helps non-readers of foreign scripts discern and compare the structure and possibly the phonemics (not the phonetics) of words of various languages.
I would suggest that for some of these reasons it would be best to use a system in use in dictionaries or in linguistics. I think it is generally better to use an established system than to invent our own—even if it is a rare one, then it would be used in at least two places, not just one. Some systems may have variations or not be well defined, in which case we may choose to nail down the fuzzy details.
I also believe that the wiki principle of reliance on documented knowledge strongly discourages us from presuming to have the expertise to develop or modify a better romanization method than those that have been developed or used by professionals or academics.
But in the case of some languages, the choice of a best system may be debatable, or there may be no good candidate (e.g., Wiktionary:About Thai#Transliteration). —Michael Z. 22:55, 6 April 2008 (UTC)[reply]
The primary purpose of FL entries on Wiktionary is to help English-speaking users (not necessarily native English speakers; English being de facto the world's only lingua franca, and the defining vocabulary much easier to acquire than any specific terminology, cross-language learning opportunities are much bigger here than in e.g. Wikipedia) learn what FL lexemes mean, with as much additional data as could enhance the learning experience. The experiences of others (those who happen to randomly open a FL entry or navigate to it via ===Etymology===) that have absolutely no interest in the FL entry itself, nor wish to spend a reasonably small amount of time acquainting themselves with the transcription/transliteration system usually used for it, should be of little or no concern. Transliterations can sometimes convey much important data - stress/pitch/tone via diacritics that could sometimes be phonemic but not marked in usual orthography, or hyphenation for separating clitics and compounds (which are due to various peculiarities sometimes very difficult for beginners to distinguish).
Most "important" scripts have some sort of standard transliteration system (in lots of cases an ISO standard), or usually half a dozen of them (The great thing about standards is that there are so many to choose from) that are widely used, and Wiktionary should follow the most common practice employed by real-world FL-English and English-FL dictionaries. Significant deviations should be thoroughly discussed and voted on (like when the community decided to dump /r/ and use /ɹ/ despite the fact most (>90%) English-English, FL-English and English-FL dictionaries use /r/ and that almost no one except trained linguists and knowledgeable enthusiasts knows wtf "alveolar trill" means ^_^).
It might make some sense to account for those who want to see "ch" instead of "č", "sh" instead of "š", "ś", /ts/ instead of /c/ etc. - but not at the expense of all the others who could use the Wiktionary to learn the language, and would expect it to follow the scheme used by most other FL-English dictionaries. Maybe in simple.wiktionary.org, or some "dumb-mode" WT:PREFS option ^_^ --Ivan Štambuk 19:51, 7 April 2008 (UTC)[reply]

In a while, I will try to incorporate some of these thoughts into the transliteration guidelines. When/if I formulate some concrete wording, I'll introduce it here before changing the guideline. —Michael Z. 18:43, 10 April 2008 (UTC)[reply]

Organization

Transliteration guides are spread out under different namespaces and categories, and inconsistently titled. Some merely refer to standards outlined in Wikipedia articles. [please add any omissions]

Where does all of this belong?

  • It seems to me that any "wiki-romanization" originated by this project belongs in the Wiktionary: namespace, and not in an Appendix:.
  • Is it better to point to Wikipedia, or to duplicate that material here, in cases where only standardized systems are used?
  • Should we present alternative standards, or only include Wiktionary's selected or created romanization systems?

Michael Z. 16:45, 6 April 2008 (UTC)[reply]

I think for some languages, such as Han-using languages, it does make sense for the romanizations to be described in appendices, since a reader might find it useful to learn the details of our system. But for other languages, such as Greek or Hebrew, an interested reader would find it much more useful to simply learn the script for his or herself, and the romanizations are probably only needed in the Wiktionary namespace. (Even when we do have an appendix, it might be best to have both an appendix and a project page, aimed at different audiences. Keeping them in sync would be a bit annoying, but when you consider that we also have to keep all main-namespace romanizations in sync, it's really nothing by comparison. :-P) —RuakhTALK 19:36, 6 April 2008 (UTC)[reply]
These pages do NOT necessarily all belong in the Wiktionary namespace. If a page is about standards of transliteration used specifically for Wiktionary, then it belongs in the Wiktionary namespace, either within an "About Language" page, or as a page or subpage of its own linked from that "About" page. On the other hand, if the page is about a variety of transliteration schemes, for the benefit of users who may have a work with an unusual transliteration scheme, then it should be an Appendix. The Wiktionary namespace is set aside for information about practice on Wiktionary, and should include only the standard selected for Wiktionary. The Appendix namespace covers supplementary material not specific to Wiktionary, and should include any major system likely to be encountered. --EncycloPetey 21:22, 6 April 2008 (UTC)[reply]
Sensible, but it results in the guides for Wiktionary's romanization/transliteration being split between two different namespaces, or having some Appendix information repeated in the Wiktionary: namespace. I guess this could be ameliorated using categories, and by adding a definitive list to the main romanization/transliteration guide. Which is the tidiest solution? —Michael Z. 21:04, 7 April 2008 (UTC)[reply]
That will depend on what information currently exists on Wiktionary for a given language. I would think that having an "About:Language" page would be an important first step, since there is the possibility of listing and linking such key pages and sections from the bottom of the page. --EncycloPetey 22:52, 7 April 2008 (UTC)[reply]
There shouldn't be any significant duplication. The project page should describe what is required/recommended for entries, and the appendix should explain how a given system works. So if it is the consensus that for language A, romanization X should be used, the "Wiktionary:About A" page should say "Entries in language A should use romanization system X," and link to "Appendix:X Romanization." Beyond this, considerations that affect how a romanization system is used in entries (layout, templates, etc.) go in project space; but the description of the system (insofar as it is not unique to Wiktionary) goes in appendix space. -- Visviva 09:17, 9 April 2008 (UTC)[reply]
That's a good summary, Visviva. I will review the relevant guidelines and appendices, and perhaps shuffle things around a bit to fit this picture. —Michael Z. 18:45, 10 April 2008 (UTC)[reply]

Have a look at the pages in Category:Transliteration appendices. Most of them are simply labelled "Wiktionary standard translation", with no explanation or citation. I'll move these from the Appendix namespace into Wiktionary, and post a note on each requesting a reference. —Michael Z. 16:59, 11 April 2008 (UTC)[reply]

Does anyone still object to AutoFormat just fixing these? I don't think I've ever seen a case where its explanation of what it would do was wrong; granted, in plenty of cases it was incomplete, but I think that's because it only adds one {{rfc-*}} tag at a time, so if it actually just fixed things, I think it would have done a complete job. It's annoying that we have to do these manually, and frankly, I'm not convinced that manual intervention is any less error-prone than AutoFormat would be. —RuakhTALK 19:20, 6 April 2008 (UTC)[reply]

Agreed. I can't think of an instance where AF would have done something I disagreed with. However, it might be nice to have an official proposal of new things we're giving AF license to do, so we can specifically agree to them (if Robert's willing to throw together such a list). -Atelaes λάλει ἐμοί 21:27, 6 April 2008 (UTC)[reply]
Likewise. I can imagine that some of the more complicated pages could present a problem (multiple POS sections with a single Translation section at the bottom), but these are very rare and are problematic anyway. --EncycloPetey 22:07, 6 April 2008 (UTC)[reply]
I've asked the same question here myself a while back, and since then I think I've seen an error, but just one, which I can't believe I didn't report. Suggestions aren't something people jump on, but if the bot does something wrong then we can yell at Robert to tweak it. DAVilla 05:02, 7 April 2008 (UTC)[reply]
Yes. Go for it. SemperBlotto 07:26, 7 April 2008 (UTC)[reply]

Appendix:Old Cyrillic alphabet

I've created a new Appendix:Old Cyrillic alphabet, including transliteration. Please review and correct any mistakes. —Michael Z. 02:58, 7 April 2008 (UTC)[reply]

Placement of terms consisting of multiple words

I am by convention placing terms consisting of multiple words such as complex analysis under the Derived terms header of the article, in this case analysis, as it is my understanding of WT:ELE that they belong there. Is my understanding shared by the community?

My placement of these under the Derived terms header in the article analysis has been reversed. Before launching an edit war, I'd like to be sure I am on the right side. --Daniel Polansky 09:38, 7 April 2008 (UTC)[reply]

The edit summary that moved these to "See also" claimed that compounds should get different treatment. Was there a discussion to that effect before the creation of {{rel-top}}? Without the template, any justification to push some of this perhaps lower-value material lower on the page is understandable. But with collapsible tables I can't see any justification at all for separating them. Sometimes I wonder about the point of having big tables of such derived and related terms at all, collapsed or not. DCDuring TALK 11:47, 7 April 2008 (UTC)[reply]
Not to my knowledge; I think the editor was just confused. -- Visviva 12:12, 7 April 2008 (UTC)[reply]
Okay, thank you all. As an aside, I am very fond of these big tables, although not yet sure why. --Daniel Polansky 12:26, 7 April 2008 (UTC)[reply]
Even if your fondness were very neurotic (;-}), it would almost certainly be shared by some meaningful fraction of our users. I'd be interested in why you like them or how you might use them. DCDuring TALK 13:07, 7 April 2008 (UTC)[reply]
So (a) one reason I have discovered is that when looking for a compound term, I like only typing one word of the several and then navigating to the term with the mouse. That is on days on which I type a lot and am glad to get a relief from typing. Another one (b) is that some substantives get extended by adjectives (e.g. philosophy, analytical philosophy, continental philosophy, pain, physical pain, emotional pain), and when these multi-term extensions are listed, the page of the substantive kind of documents its subclasses, or attributes. I admit that the latter could be partially served by the Hyponyms header. --Daniel Polansky 17:52, 7 April 2008 (UTC)[reply]
And (c) is the reason (or use case?) given by Mike below: I know or assume that the phrase contains a specific word, but am uncertain about the exact reading of the phrase. --Daniel Polansky 18:05, 7 April 2008 (UTC)[reply]
Adding after the discussion: (d) specifically for adjectives, derived multi-word terms tell me on what classes the adjective is defined as a value of an attribute, so to speak. Phrased differently and modeled differently, it tells me what types the predicate of the adjective is ready to accept as its parameter. --Daniel Polansky 13:45, 9 April 2008 (UTC)[reply]
Likewise, I like the tables of related and derived terms. I always try to add them to Latin entries because I find it helps enormously with learning the vocabulary. Being able to see a host of related terms, and click on each to get the specifics, really is enlightening in terms of understanding Latin word relationships. The commonalities among the various words allow insight into the scope of the root word, and provide a survey of what ending created words in other parts of speech from that root. They're also really handy in the case of verbs for finding (and learning) all the compounded verbs that come from a particular root, and which differ in the addition of a prepositional prefix. --EncycloPetey 01:15, 8 April 2008 (UTC)[reply]
Thanks for the explanations. Though this seems like something only a veteran would use, Daniel has articulated how the tables might help an ordinary user who had come to Wiktionary to look up a complex concept. It's similar to having a lot of usage examples and citations in principal name-space, enabling certain (correctly spelled) searches to find useful entries. That kind of use would not put any limits on how phrases or compound words appeared, so that esthetics and the interests of etymologic/morphologic/ally oriented user needs could legitimately govern. Would subject matter grouping help in the case of long lists? I would have thought that time-zone names would have been a helpful categorization in the useful extreme case of time. DCDuring TALK 02:46, 8 April 2008 (UTC)[reply]

Wow! So everyone here is happy with the fact that timely is buried deep within time? DAVilla 16:04, 7 April 2008 (UTC)[reply]

I kinda would prefer to split it into several tables, say one for "Derived terms" (which would include e.g. timely), one for "compounds" and one for "phrases", though I understand that such division is not popular here, and a distinction between "compound" and "phrase" is perhaps more difficult to keep up in English than in other languages (like Swedish). But for the information those lists contain: yes, I like them as they allow me to scan the list to find an expression I know contains a given word, but am uncertain how it would be written in the "lemma form"; or I may see which options there are to add a particle or a preposition to get to the appropriate expression, even if I don't remember which one should be used. (Trying to keep track of English prepositions in general, and prepositions used in various fixed expressions in particular, is nothing but Sisyphus work... ;) \Mike 16:36, 7 April 2008 (UTC)[reply]
Perhaps other grammatical forms, or transformations of a word deserve a special status. The plural times appears right next to time, so maybe the adjective timed, adverb (and adj.) timely, etc, belong closer to the top than, say time-honoured or Australian Eastern Daylight Time.
Is it possible to describe a logical, but fairly limited list of such forms? —Michael Z. 16:46, 7 April 2008 (UTC)[reply]
I would then say that anything which is not a compound/phrase, that is, anything which is not possible to split into more than one independent 'proper' word, would qualify. Thence timely would qualify, but not time-honored (= time + honored). Would there be any ambiguity in such a split? \Mike 17:42, 7 April 2008 (UTC)[reply]
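The splitting rule proposed above can be tried out with a rough sketch. Everything here is an assumption made for illustration: the tiny `LEXICON`, the function names, and the choice to test only two-way splits of closed compounds. It suggests one place where ambiguity might arise: a suffix such as -less is spelled like an independent word, so a purely mechanical test would file timeless with the compounds.

```python
import re

# Hypothetical mini-lexicon of independent words (illustration only).
LEXICON = {"time", "honored", "keeper", "less", "in"}

def split_into_words(term, lexicon):
    """Return the independent words that make up term, or None if it cannot be split."""
    # Anything written with a space or hyphen counts as multi-word.
    parts = [p for p in re.split(r"[ -]", term) if p]
    if len(parts) > 1:
        return parts
    # Closed compounds: try every two-way split into known words.
    for i in range(1, len(term)):
        left, right = term[:i], term[i:]
        if left in lexicon and right in lexicon:
            return [left, right]
    return None

def classify(term, lexicon):
    """Label term as a compound/phrase if it splits into words, else as a derived word."""
    return "compound or phrase" if split_into_words(term, lexicon) else "derived word"

for term in ["timely", "time-honored", "timekeeper", "in time", "timeless"]:
    print(term, "->", classify(term, LEXICON))
# timely -> derived word
# time-honored -> compound or phrase
# timekeeper -> compound or phrase
# in time -> compound or phrase
# timeless -> compound or phrase   (mechanically "time" + "less", though -less is a suffix here)
```

A real implementation would presumably need a list of affixes that are homographs of words, or a human decision per entry.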
I'm happy with timely's being s.v. time's "Derived terms" section if it's a derived term. If it's in fact descended from an older version of timely then I'm not sure. In the case of complex analysis, I suspect strongly that it is derived from analysis and so belongs in its "Derived terms" list.—msh210 16:55, 7 April 2008 (UTC)[reply]
If it is descended from an older version of timely then I believe it should be in a related terms section. Thryduulf 17:04, 7 April 2008 (UTC)[reply]
Ah, yes, agreed.—msh210 18:39, 7 April 2008 (UTC)[reply]
Interesting ... the discussion at #Ambiguous etymologies (above) seemed to reach the opposite conclusion. It remains my opinion that both forms of etymology need to be presented on Wiktionary; "timely" is formed from time + -ly in contemporary English, but it is also a linear descendant of OE tīmlīce. We would be doing an unforgivable disservice to our readers if we discounted either of these facts... and in this case it seems like "Derived terms" is the more transparent choice. -- Visviva 09:25, 9 April 2008 (UTC)[reply]
Hm, I'm thinking in terms of grammatical morphology rather than etymology. Timely is an adv/adj sense of the lemma time, regardless of whether it sprang from time or has always been used alongside it. Maybe I'm being too ambitious, as this may require a separate section, or something like a declension or conjugation block, rather than being sorted at the top of "derived terms". —Michael Z. 17:08, 7 April 2008 (UTC)[reply]
  1. Do ordinary passive users actually use derived terms and related terms? How do we use it? I use it as a kind of memory exercise when working on an entry sometimes, but rarely follow the links just to get information.
  2. Are "Derived terms" and "Related terms" ever split by sense? I assume that we wouldn't want them to be.
It doesn't seem silly to divide the contents of these into single words, compounds, and phrases/idioms/proverbs if the single block is "too big" as time's certainly is. DCDuring TALK 17:21, 7 April 2008 (UTC)[reply]
Separating (in lion) the lioness from the lion cub would be cruel. It is painful to have to look in different sections depending on the precise spelling of words (space inside or not?) when there is no other reason to separate them. But I agree that proverbs should be put in a different section. Lmaltier 17:48, 7 April 2008 (UTC)[reply]
2. Splitting by sense seems like a good idea in many cases. For example, why would I want pressure head or Korboggen head to be in the same table as head lettuce at head#Derived terms? This goes several times over for Chinese characters in the East Asian languages. But I suspect there are many other cases where splitting by sense would cause all hell to break loose.-- Visviva 15:15, 8 April 2008 (UTC)[reply]
The time page is indeed an extreme example, showing the downsides of what I so often like. As regards my preference, the main point is that the compounds are listed somewhere, not that they are listed under Derived terms heading. For me, it would be perfectly okay to have Compound terms heading, or whatever is considered appropriate. --Daniel Polansky 17:58, 7 April 2008 (UTC)[reply]

Inflections

We conventionally list certain inflections by the headword: plurals of nouns and pronouns, comparatives and superlatives of adjectives and adverbs, other cases of pronouns (he > him, himself, his), key inflections of verbs.

My paper dictionary (Canadian Oxford) also lists such inflections when they are irregular or "may cause difficulty". But it goes beyond Wiktionary by adding the simple past tense, present and past participles, adjectives in -able formed from transitive verbs, e.g., achieve (achievable), exchange (exchangeable). It may include versions restricted to U.S., British, or Canadian English, etc, e.g. "car·olled, car·ol·ling; US car·oled, car·ol·ing".

Regardless of their shared or separate etymologies, timed, timely, and timeful are inflections of time. It makes sense that we would make this intimate relationship clear somehow. Perhaps we should consider expanding the inflection templates like {{en-noun}}, or adding an "Inflections" section before "Derived terms" —Michael Z. 19:48, 7 April 2008 (UTC)[reply]

We already have an "Inflections" section before "Derived terms".—msh210 20:29, 7 April 2008 (UTC)[reply]
By my reading of WT:ELE, in English words only some inflections belong next to the headwords, but an inflection heading is only to be included in non-English words.
Perhaps the latter restriction should be relaxed. —Michael Z. 21:11, 7 April 2008 (UTC)[reply]
You have read ELE correctly; we do not use the Inflections section in English entries, and we do not need to. Adjectives formed in "-able" are separate words with separate entries and etymologies. They are listed in the "Derived terms" section. English does not treat timely as an inflection of time, nor do I know of any European language where the adverbs are considered inflections of nouns or verbs. Adverbs are typically regarded as a separate part of speech, though they are Derived from nouns, verbs, or adjectives. --EncycloPetey 22:47, 7 April 2008 (UTC)[reply]
Agreed 100%. —RuakhTALK 22:59, 7 April 2008 (UTC)[reply]
Okay, I see that the verb forms can be handled as in carol#Verb.
So inflection isn't the correct term, but this still means that some of a word's closely-related cognates can get lost in a sea of compound words, idiomatic phrases, and relative neologisms. In this way, Wiktionary's presentation suffers in a few cases, compared to a paper dictionary. —Michael Z. 00:45, 8 April 2008 (UTC)[reply]
I agree that it would be nice to have separate sections for (on the one hand) words derived by the addition of affixes and (on the other) phrases “derived” by the addition of words. —RuakhTALK 01:00, 8 April 2008 (UTC)[reply]
Yes, they can get lost, but this only happens on a very small number of pages. I suspect there are fewer than 50 such pages on all of Wiktionary. --EncycloPetey 01:11, 8 April 2008 (UTC)[reply]
But our hope is for them to get lost on many words. We might as well formulate some thoughts about how to separate them now, when it's still quite rare. Would anyone object to my splitting the section at time#Noun into two tables, one glossed as "words derived from the noun time", one as "idioms and set phrases using the noun time", just to see if we like the result? —RuakhTALK 01:38, 8 April 2008 (UTC)[reply]
I'd like to see how it looks. I can't see any serious objections to trying it, as long as none of the information is removed. Idioms, especially, seem to be a different thing from derived terms.
My paper dictionary groups these as "idioms and phrasal verbs", and groups them after the main definitions. It also differentiates "derivatives" (formed with suffixes and are appended to an entry unless further definition is required) from compound words (which are always main entries, whether they are formed as one word or not, e.g., bathroom, serial number, and mega-musical). —Michael Z. 02:49, 8 April 2008 (UTC)[reply]
With collapsible tables, we don't even need an additional section. We could have multiple tables just as we do for Translations, as long as each table is appropriately labelled. --EncycloPetey 02:58, 8 April 2008 (UTC)[reply]

{{lookfrom}}

Something useful I've recently found is {{lookfrom}}, which directs the user to a Special:Prefix index page. It isn't perfect, but could be used to make the expansion of the ====Derived terms==== section redundant. Keene 14:22, 8 April 2008 (UTC)[reply]

I don't think that would do much to help things, as it only can find pages starting with the selected word. Things like "in time" or "on time" would still need a manually made list in time. On a related note: would it be possible to tweak the search function so that it only looks for pages which include the search string in the title of the page, and doesn't care if it occurs in the body? That would IMO make more sense as a replacement for ====Derived terms==== (except of course such derived terms which are based on some kind of mutation or stem change...) \Mike 14:42, 8 April 2008 (UTC)[reply]
Additionally, that template function doesn't distinguish languages. Nor does it restrict the listing to words etymologically derived from the start term; it simply lists entries that start with the specified set of characters. For example, boggy is derived from bog, but boggle is not (even though it shares the same start letters). It also is case-sensitive, so it wouldn't list New York if you were looking at "n". We don't have anything that would make Derived terms redundant. --EncycloPetey 21:43, 8 April 2008 (UTC)[reply]
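The limitations described in the two replies above can be made concrete with a small sketch. It does not call MediaWiki; `prefix_index` and the sample `TITLES` are hypothetical stand-ins that merely model a case-sensitive "starts with" listing as the replies describe it.

```python
# Hypothetical sample of page titles (illustration only).
TITLES = ["bog", "boggy", "boggle", "New York", "night",
          "time", "timely", "in time", "on time"]

def prefix_index(titles, prefix):
    """Return the titles that start with prefix, compared case-sensitively."""
    return sorted(t for t in titles if t.startswith(prefix))

print(prefix_index(TITLES, "bog"))   # ['bog', 'boggle', 'boggy']
print(prefix_index(TITLES, "n"))     # ['night']
print(prefix_index(TITLES, "time"))  # ['time', 'timely']
```

Under these assumptions the listing includes boggle although it is not derived from bog, misses New York when the prefix is a lowercase "n", and never reaches "in time" or "on time", which matches the objections raised.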

Treatment of certain types of compound terms

I asked a question about formatting Spanish entries that applies to many other languages, so I brought it here. Compound words can often be formed by affixing pronouns (or participles particles) to verbs. In Spanish, for instance, practically any combination of one or two reflexive/direct/indirect pronouns can be attached to infinitive verbs, present participles, or affirmative commands. So I have a couple questions. 1) How should these compound words be treated? Should they be listed as "Conjugations" or under "Derived Terms"? Should one place (for instance the infinitive entry) list all the permutations or should they be scattered on the pages of each stem? See redactar for an example of putting them all under "Derived Terms". 2) The list was made by blindly following the rules of grammar, so many of the permutations are rare, possibly unattestable. Should we include links for as yet unattested terms (does the CFI apply to links)? I'd love ideas. --Bequw¢τ 19:34, 8 April 2008 (UTC)[reply]

Note: that should say "pronouns (or particles)". Some languages have particles (small words) that can be attached to the verb. I know Hungarian does this, and IIRC German and Dutch do as well. Indonesian will have problems with this that I don't fully understand.
To elaborate for those who aren't familiar with Spanish, many verbs can have a pronoun (indirect object or direct object) affixed to the end of a verb. So, "kiss me" would be bésame (besa + me). "Give it to me" would be dámelo (using da + me + lo). How should these affixed forms be treated, and where/how should they be linked or listed? The issue is compounded in Spanish by the fact that the meaning of some verbs changes depending on the presence or absence of these pronouns. --EncycloPetey 21:34, 8 April 2008 (UTC)[reply]
Hebrew does this also, although it is far more common in older texts than it is in current speech. I say have all these attested forms as entries, and list all of them that are definitely the correct form, even if unattested. I'm not sure where to list them, though: probably under Conjugation.—msh210 22:07, 8 April 2008 (UTC)[reply]
I think Hebrew's a bit different from Spanish in this regard: in Hebrew I think it's actually part of the verb's morphology — for example, in a form like עשני (asáni, made me), I couldn't say where the verb ended and the direct object began. So in Hebrew, I think these forms definitely warrant their own entries, and in fact some of them (such as קדשנו (kidshánu, sanctified us)) already have them. By contrast, in Spanish I think there's a separate verb and enclitic pronoun, and the spacelessness and accent are strictly written phenomena. (The -monos thing also affects pronunciation, but still I think falls into the same general category.) They may or may not warrant their own entries — my sense is not, but I see it both ways — but I don't think that bears on handling of Hebrew. —RuakhTALK 04:32, 9 April 2008 (UTC)[reply]
Even if Ruakh's correct that Hebrew is different from the Romance languages in this regard, I still maintain that both should have such entries and lists per my comment just above.—msh210 16:46, 9 April 2008 (UTC)[reply]
As I've always seen "Sum of Parts" reasoning used to remove phrases, and not applied at the sub-word level, I'd suppose that all Romance language compound terms that are attestable would merit their own entry. This is especially the case in Spanish because some of the affixed pronouns could be ambiguous (whether direct or indirect objects) and meanings can change slightly. --Bequw¢τ 15:50, 9 April 2008 (UTC)[reply]
I'm not suggesting we apply it at the sub-word level, only that we adopt a more useful definition of word. If me hablaste is two words, then I think so is háblame. That said, it would be nice to have an entry for hábla- defined as “Form of habla used with following clitic pronouns; see hablar.”, and perhaps háblame should have an entry that says simply “habla (see hablar) + me.” I don't know what POS to use, though; in many cases it's a verb phrase (which we'd call a “verb”), but in other cases it's an odd verb-phrase fragment, like in either dalo al profesor (give it to the teacher) or dame el libro (give me the book). (Perhaps either V+DO or V+IO can be considered a constituent, I don't know, but certainly it doesn't flip back and forth whenever we change which object is a clitic.) And IMHO it is in no case a good idea for dar to link to dame, etc., though it should perhaps give the relevant imperative as “da/da-/dá-” (and likewise for other affected forms). —RuakhTALK 17:10, 10 April 2008 (UTC)[reply]

Treatment of other types of compound terms

Hebrew has a number of terms that translate into English as prepositions (from, to, others) and conjunctions (and, that), but which are attached to the fronts of words in the Hebrew. These are ב-,‎ ו-,‎ כ-,‎ ל-,‎ מ-, and ש-. Words formed of these, like בארץ (b'eretz, in a land) (equals ב- (b, in) plus ארץ (eretz, land)), are written without space in the middle, and are, I think, recognized as one word by, for example, schoolchildren. Linguists consider them two words each, with the prefix counting separately from the rest. (Ruakh informs me it's actually a clitic rather than a prefix.) Certainly anyone who knows Hebrew can figure out the meaning if he can figure out where the prefix ends: if it's two words, then it's a sum of its parts. On the other hand, someone who doesn't know where the prefix ends will likely look up the whole thing. Ruakh says these are not entry-worthy; I say they are. I decided to take this issue here to the BP because it may well be relevant to other languages (Finnish, Hungarian, others) as well. What do you all think?—msh210 16:41, 9 April 2008 (UTC)[reply]

If they're written without a space in the middle, and words in Hebrew are otherwise spaced, then I say we might as well have them. What harm does it do after all? (A bigger problem is with languages like Thai or (what I've been learning recently) Lao, which are not written with spaces between words at all. It is often impossible to judge what is a compound noun and what is just sum-of-parts.) Widsith 16:50, 9 April 2008 (UTC)[reply]
The problem is that there's no limit; you can (and frequently do) put more than one of them together, as in [v'][she][k'][she][mi][b'][tokh], "and that when from within" — which contains the clitic [she] twice, four other clitics once each, and the preposition [tokh]. (Note: the brackets here are just for ease of reading; this isn't IPA or anything.) In this example, I think [b'][tokh] "inside, within" warrants inclusion as a fixed expression, as does [k'][she] "when"; but certainly the whole thing together doesn't. —RuakhTALK 17:06, 9 April 2008 (UTC)[reply]
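To give a feel for the "no limit" point, here is a deliberately naive count. It assumes the six clitics listed at the top of this section, allows repetition (as in the example, where [she] occurs twice), and ignores grammar entirely, so it is only an upper bound on orderings and not a claim about how many combinations are actually valid Hebrew. The function name is made up for the illustration.

```python
def max_clitic_chains(n_clitics, max_length):
    """Count all sequences of 1..max_length clitics, with repetition, ignoring grammar."""
    return sum(n_clitics ** k for k in range(1, max_length + 1))

# Six clitics, chains of up to six (the length of the example above).
print(max_clitic_chains(6, 2))  # 42
print(max_clitic_chains(6, 6))  # 55986
```

Even if only a small fraction of these chains are grammatical, each one could in principle be attached to any following word, which is why per-combination entries scale so poorly.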
I'm not sure what the problem is with adding many such entries. Not all will be attested, and those probably are the only ones we should add, but among those that are attested, I'll grant that there will still be quite a few. But this is a wiki: we've got time.—msh210 17:28, 9 April 2008 (UTC)[reply]
I think the same rule as in English should be followed, if possible: if the compound term refers to a specific term, then it should be included (like "motorcar", "railroad car", "carwash"), but not if it is a simple sum of parts ("red car", "Japanese car", "dad's car"). Of course, it is often not clear if the term is specific or not. In that case I don't see any problem including the word.--Jyril 17:41, 9 April 2008 (UTC)[reply]
I agree. —RuakhTALK 17:55, 9 April 2008 (UTC)[reply]
Sorry, but I don't think that makes sense. The CFI are designed to use the words found in permanently accessible works (books, journal articles, etc.) as a proxy for the words that a typical reader might encounter and want to look up. That works fine — not perfectly, but fine — for individual words; but I don't think it'll work at all for a stray series of prepositions and conjunctions that all happen to appear together at the start of a phrase, plus whatever word happens to follow them. There are virtually unlimited real combinations, and the CFI's standards of attestation won't reflect which ones are worth including and which aren't. (Partly because in some sense none of them are worth including, partly because in some sense they all would be if that were even remotely possible, just as it would be great to include every possible Lao sentence.) —RuakhTALK 17:55, 9 April 2008 (UTC)[reply]

I think a dictionary is intended to be (among other things) an aid for learning a language, not a substitute, or the only source for understanding. Therefore it is not necessary to include every possible form of every word in every language. It would probably be impossible as well. Just as an example, every Finnish verb has five infinitives and six participles, some of which can be inflected in fourteen cases in singular and plural, and combined with six possessive suffixes and a host of clitics. This adds up to dozens, possibly hundreds of forms derived from each verb. The numbers are smaller with nouns, but they are still large. Check one composition at: järjestelmällistyttämättömyydellänsäkään, which I added for fun, and because I have learned that it is the longest "word" in Finnish that is not a compound term. Nykysuomen Sanakirja (the Dictionary of Modern Finnish) has some 200,000 entries. In order to list all forms that may exist in Finnish alone, one would probably need tens of millions of entries. As a matter of fact I think we have too many forms already. As a simple example, most of the English plurals are plain old SoP's and completely unnecessary for anyone who knows even the basics of the language. Finnish plurals are a bit more complicated because the stem often changes, but for the most part they are as useless (or to be exact, the stem does not change, but the nominative form and stem are very often not the same thing). Only irregular plurals and those which have an independent meaning, or are "pluralia tantum" would suffice, IMHO. Having said this I do not know where to draw the line. Hekaheka 18:54, 9 April 2008 (UTC)[reply]

Why did you add järjestelmällistyttämättömyydellänsäkään when the word usually used is epäjärjestelmällistyttämättömyydellänsäkään? ;) In the case of words with enclitic suffixes or other "obvious" cases, I would accept those which are very common in that language or otherwise special. That is not the case of "proper" inflections, which should be included. --Jyril 19:18, 9 April 2008 (UTC)[reply]
This is out of the point of the discussion, but epäjärjestelmällistyttämättömyys includes a double negation and is therefore not a meaningful word - rather a collection of clitics. Hekaheka 21:50, 9 April 2008 (UTC)[reply]

A dictionary is not used only by people learning a language. You might want to try to understand a message you received by e-mail in a language you don't know, and cut and paste every word. Another use of Wiktionary might be by a browser allowing a simple search by double-clicking on any word of any website. Such uses require that as many forms as possible are included. Lmaltier 21:05, 9 April 2008 (UTC)[reply]

Translation robots do that job better. Besides, how much does it help to know that presupuestares is "the second-person singular of presupuestar in the future subjunctive", if you don't know what a subjunctive is? It takes quite a bit of language-specific knowledge to understand the glosses. Hekaheka 21:50, 9 April 2008 (UTC)[reply]
There are no translation robots for some languages. Of course, you cannot translate a text well just by searching each word. But you can know what the lemma form is, and what it means, and this is important, even if you don't know what subjunctive means. Paper dictionaries don't allow that (I already searched for a word in a paper dictionary, and concluded it was absent; actually, it was present, but the lemma form was not obvious). Lmaltier 06:08, 10 April 2008 (UTC)[reply]
Well I guess there is some precedent for excluding single words that are Sum of Parts, as in English we now exclude the possessive case. Per language policies could be written to exclude Sum of Parts words according to the rules of that language, but I don't think it should happen until our wiki technical abilities mature. For example, finally when dealing with the case of the first letter of a word, our technical ability (auto-redirecting Omphaloskeptic -> omphaloskeptic) matches our policy (not allowing multiple entries for different cases of the same word/lexeme). (That was such a great enhancement, by the way.) It'd be great if, when a user's search came up empty, we could ask them for the language and check whether the word is decomposable by that language's wiki-programmed rules. But until then, let's leave it open. --Bequw¢τ 21:36, 9 April 2008 (UTC)[reply]
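As a rough illustration of the two lookup fallbacks mentioned above, the sketch below first retries a query with the case of its first letter swapped, then tries a rule-based decomposition using the Spanish clitic examples from earlier in this discussion (bésame, dámelo). All names (`resolve_case`, `decompose`, `CLITICS`, the sample data) are hypothetical, the clitic list is a simplification, and removing every acute accent is an assumption that will not hold for verb forms that carry an accent of their own.

```python
import unicodedata

# Simplified, assumed list of Spanish enclitic pronouns.
CLITICS = ("me", "te", "se", "lo", "la", "le", "nos", "los", "las", "les")

def resolve_case(query, titles):
    """Return the matching title, retrying with the first letter's case swapped."""
    if query in titles:
        return query
    swapped = query[:1].swapcase() + query[1:]
    return swapped if swapped in titles else None

def strip_acute(word):
    """Remove acute accents (á -> a) while keeping other marks such as the tilde of ñ."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))

def decompose(word, verb_forms, max_clitics=2):
    """Split word into a known verb form plus 1..max_clitics pronouns, or return None."""
    def helper(rest, found):
        base = strip_acute(rest)
        if found and base in verb_forms:
            return [base] + found
        if len(found) == max_clitics:
            return None
        for clitic in CLITICS:
            if rest.endswith(clitic) and len(rest) > len(clitic):
                result = helper(rest[:-len(clitic)], [clitic] + found)
                if result:
                    return result
        return None
    return helper(word, [])

VERB_FORMS = {"besa", "da", "habla"}  # hypothetical sample of existing entries

print(resolve_case("Omphaloskeptic", {"omphaloskeptic"}))  # omphaloskeptic
print(decompose("bésame", VERB_FORMS))   # ['besa', 'me']
print(decompose("dámelo", VERB_FORMS))   # ['da', 'me', 'lo']
print(decompose("mesa", VERB_FORMS))     # None
```

A failed search could then show the decomposition to the reader and link to the entries for the parts instead of requiring a separate page for every combination.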

Wandering [edit] links

These should now appear on the correct line; also trans and rel tables play nicely with images. See WT:GP for more info. If you see anything odd, tell me or us there. Some float boxes still need some extra code removed for IE. Robert Ullmann 18:10, 9 April 2008 (UTC)[reply]

Demonyms

A demonym is the name for a person from a place: European, Basotho, Iowan, Winnipegger, Haligonian, Smithereen. It's a specific, and sometimes interesting, kind of word.

I'd like to create a context template {{demonym}} applying a new category:Demonyms, which would in turn fall into category:People and either category:Geography or category:Place names.

(See also category:Exonyms, category:Endonyms, category:Xenonyms.)

Is this a sensible idea? —Michael Z. 07:01, 10 April 2008 (UTC)[reply]

It sounds like a sensible idea to me. Thryduulf 09:09, 10 April 2008 (UTC)[reply]
The category sounds good, but why would we need the context label? The sense would not be used in the context of demonyms: it'd be used in a general context.—msh210 16:38, 10 April 2008 (UTC)[reply]
I see, demonym doesn't describe where the word is used, rather it's a sub-category of nouns (and come to think of it, they also belong in category:English nouns). I guess it's just as easy to add [[category:demonyms]] as it is to use a template. —Michael Z. 17:37, 10 April 2008 (UTC)[reply]
OTOH, demonym is specific to a sense, not an entry, not a Language, not an Etymology, not a PoS. (Not that that problem is in any way limited to demonyms.) DCDuring TALK 17:48, 10 April 2008 (UTC)[reply]
So a Berliner is both a native and a pastry.
General question: is it a good idea to place the category tag in its context in an entry (e.g., in the same line as definition no. 1), or should they all remain at the bottom of the page? —Michael Z. 18:03, 10 April 2008 (UTC)[reply]
End of the language section; see Wiktionary:Votes/2007-05/Categories at end of language section. —RuakhTALK 20:07, 10 April 2008 (UTC)[reply]
This would be a topical category, and should be a subcategory of Category:People, and might also be listed as a subcategory of Category:Etymology for each language. I wouldn't use a context template, because "Demonym" is not a context; it is a class of words. That is, (astronomy) and (sports) say something about the context in which a sense is used, but demonym describes the kind of word. --EncycloPetey 23:25, 10 April 2008 (UTC)[reply]
  • Er. Is this really what demonym means? The only place I've seen it used that way is on Wikipedia, and I always assumed someone there just invented it cos it sounds important. Obviously I get the formation, but I am just a bit cautious about our adopting something if it is really only a protologism. A look at books.google shows few hits, and a lot of those seem to be using it with the sense of "name used by the people", i.e. "colloquial pseudonym". Widsith 18:49, 11 April 2008 (UTC)[reply]
    2007 dictionary where it is used as defined, 1870 dictionary where it is defined differently, 2003 book where the word is used and mentioned as defined, an 1895 dictionary where it apparently means 'name based on what one does' (I think, not too great with Greek), 2005 textbook used as defined. I think it is used some if not widely, but its present definition is very recent, late 1990s or early 2000s. The old definition seems to have fallen off at the onset of the 20th century. - [The]DaveRoss 20:35, 11 April 2008 (UTC)[reply]

Thanks for the input, and thanks for checking the attributions, TRD. I've created Category:Demonyms, and a couple of language subcategories, under People, Geography, Etymology, and Names. Please review. Michael Z. 2008-06-26 21:36 z

Guidelines to correct structures with multiple ety and pron

I've been going through Category:Entries with level or structure problems and found several entries where I did not know how to correct the structure. These are mostly the ones with multiple etymology and pronunciation in different variations. One example is Sofia. Is there a guideline I could look at? Thanks. --Panda10 21:52, 10 April 2008 (UTC)[reply]

==Italian==

===Pronunciation 1===
{{IPA|/soˈfia/}}

====Proper noun====
{{rfc-level|Proper noun at L4+ not in L3 Ety section}}
'''Sofia''' ''f''
# {{given name|female||it:}}, cognate to [[Sophia]].
WT:ELE is your best bet, normally it is broken down by etymologies, then parts of speech. I am sure that somewhere there are a pair of lexemes derived from the same etymology with different pronunciations...I think that the best idea in this case would be to list both pronunciations in the pronunciation section and then note the pronunciation differences, rather than break the page into yet more sections. - [The]DaveRoss 22:21, 10 April 2008 (UTC)[reply]
I've also got a "Model Pages" project. For a word with a single pronunciation and multiple etymologies, refer to round. For a word with 2 etymologies, each with its own pronunciation, see hinder. For a word with a single etymology and multiple pronunciations, see predicate (though you'll have to look at my last edit, since Widsith disagrees and thinks this is two separate etymologies). --EncycloPetey 23:15, 10 April 2008 (UTC)[reply]
Thanks for the model entries. What are the headers that can be numbered? Only etymology? It seems that AutoFormat will add rfc-level to entries with numbered pronunciations as above. So even if I remove rfc-level because I think the structure is correct, it will be added back next time. --Panda10 23:56, 10 April 2008 (UTC)[reply]
There is debate about which headers may be numbered. Personally, I believe that etymology and pronunciation headers should be numbered iff they are parallel headers under the same over-header. So, I would only number pronunciation sections if (1) there were more than one pronunciation under a single etymology and the pronunciations were tied to the particular POS sections underneath them, or (2) there were multiple pronunciations tied to particular POS sections and the etymologies had not yet been put in. But, in the latter case, the addition of etymologies might eliminate the need for numbering the pronunciations, if they were located under different etymology headers.
There has been discussion off and on about numbering Verb or Noun sections in certain languages. However, there are several approaches to how this gets handled and there has never been a focussed discussion or conclusive decision. Some regulars are strictly opposed to the idea, while others think it is useful. But, it is almost never needed in English, so it isn't usually a concern to the community. It's more a problem in languages where the gender, inflection, or other aspects of a word are tied to specific senses, so that there must be separate inflection lines and separate inflection sections for the different definitions. --EncycloPetey 00:11, 11 April 2008 (UTC)[reply]

Sorry, I am getting a little confused. I've just got a message from Hikui87 that I edited some of the Japanese entries incorrectly. They were marked with rfc-level and I moved the Alternate forms section from below the POS above it and probably renamed them Alternative spellings because I thought all languages followed the same basic layout. It seems that Japanese entries follow different layout rules. So are we discussing only English entries here? --Panda10 11:47, 11 April 2008 (UTC)[reply]

All languages do follow the same basic layout, but Alternative forms can appear under the POS when the forms are specific to a particular POS. This sometimes happens in English entries, but not so often as in some other languages. In the case of Japanese, they've chosen to make that placement of the section all the time because it's a more common problem than in other languages. The kanji used to write Japanese have more than one reading, and a particular romaji may come from more than one set of characters too. So, there are so many cases where the alternative form depends on POS, or even on sense, that Alternative forms is placed under the POS every time in Japanese entries. This is one reason I don't try to clean up Japanese (or Chinese, Korean) entries myself. There are a number of special considerations. --EncycloPetey 12:29, 11 April 2008 (UTC)[reply]
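To make the "Proper noun at L4+ not in L3 Ety section" message from the Sofia example concrete, here is a rough Python sketch of that kind of check. It is modelled only on the wording of that message, not on AutoFormat's real code; the function name, the `POS_HEADERS` set and the header regex are all assumptions.

```python
import re

# Matches wikitext headers such as "===Pronunciation 1===".
HEADER = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")

# A small, illustrative set of part-of-speech headers (not the full list).
POS_HEADERS = {"Noun", "Proper noun", "Verb", "Adjective", "Adverb"}

def find_level_problems(wikitext):
    """Return (pos_header, parent_l3_header) pairs for POS headers that sit
    at level 4 or deeper without an Etymology header as their level-3 parent."""
    problems = []
    current_l3 = None
    for line in wikitext.splitlines():
        match = HEADER.match(line)
        if not match or len(match.group(1)) != len(match.group(3)):
            continue  # not a well-formed header line
        level, title = len(match.group(1)), match.group(2)
        if level == 2:
            current_l3 = None          # new language section
        elif level == 3:
            current_l3 = title
        elif title in POS_HEADERS:     # level 4 or deeper
            if current_l3 is None or not current_l3.startswith("Etymology"):
                problems.append((title, current_l3))
    return problems
```

Run on the Italian section quoted above, this flags the Proper noun header because its level-3 parent is "Pronunciation 1". It deliberately ignores the Japanese placement of Alternative forms and any other per-language exceptions.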

Glossaries on Wiktionary

Are any glossaries planned on Wiktionary other than Wiktionary:Glossary? What is planned to happen with Transwiki:Glossary of library and information science? Does anyone know whether Wikipedia plans to keep its glossaries? I am asking because I find the glossaries useful, simplifying the work of extracting all the definitions of a given domain from Wiktionary, which is something that can in principle be tediously done using categories. Thanks for any hints. --Daniel Polansky 17:22, 11 April 2008 (UTC)[reply]

Yes, there are others. They are mostly in the Appendix: or Transwiki namespace because they are not Wiktionary-specific. You can find them by sifting through Category:Appendices. --EncycloPetey 17:45, 11 April 2008 (UTC)[reply]
So is it that the glossaries in Transwiki: namespace are planned to be moved to Appendix: namespace? I mean, I thought that Transwiki means that these things are yet to be processed. Is there any policy, even if in the making, on how to deal with glossaries, like how to format the entries? --Daniel Polansky 18:20, 11 April 2008 (UTC)[reply]
As far as I know, there's not really any policy or guideline on transwikis. In my eyes, the transwiki namespace is generally full of the unformatted crap that Wikipedia didn't want, in essence in limbo between the 2 projects. Keene 21:44, 11 April 2008 (UTC)[reply]
Items in the Transwiki: namespace have been moved here, and may be cleaned up and moved to the appropriate location. However, some of these items are duplicates of what we have, or are non-Wiktionary items, and will be deleted. --EncycloPetey 21:52, 11 April 2008 (UTC)[reply]

Inflection line for nouns used only in the plural.

Encyclopetey and I have come to a disagreement at template talk:en-noun#For a plural regarding how we should note pluralia tantum on the inflection line.

My position is that we should use the format:

'''noun''' {{pluralonly}}
(or '''noun''' {{plurale tantum}} depending on the outcome of the discussion further up this page).

This categorises words into category:English pluralia tantum, a sub-category of category:English plurals, which is a sub-category of category:English nouns

EncycloPetey is advocating the alternative:

{{en-noun|''[[plurale tantum]]''}}

This categorises words into category:English nouns

Before this discussion descends (further?) into acrimony I feel that more opinions are needed. I suggest that the discussion take place here rather than there. Thryduulf 18:43, 11 April 2008 (UTC)[reply]

It seems like a good idea that it be regularized. I would expect that it would need a vote if it is actually to become mandatory. I certainly hope it isn't going to be backdoored. Taking the point of view of an ordinary user would suggest that it should be intelligible. Learning from what other dictionaries (with vastly more resources and a pecuniary interest in trying to make their product useful) do seems useful even if we reject their choices. MW Collegiate shows:
  1. no plural and no notation when the singular and plural are the same, but also for regular plurals;
  2. "pl" when the noun is plural in form and is usually used in a plural sense;
  3. "pl but sing in constr" when the noun looks like a plural but takes a verb in a singular inflection;
  4. "pl but sing or pl in constr" when the noun looks like a plural but may take a verb either singular or plural.
MW3 (unabridged) does the same, but also always shows a plural form if there is one, reducing some ambiguity in "singular only" cases, and adds the qualifier "usu" (usually) for more common plural forms and sing vs pl 'construction'.
MW Online seems to be the same as MW Collegiate.
Longmans DCE, an ESL/learner's dictionary, dispenses with the idea of "construction" and has just "P" for plural and "U" for uncountable, slightly restricting the acceptable choices to simplify matters for their users.
I'd be interested in what OED, AHD, Collins, Random House, and Chambers do.
I think we've already established that nobody with normal users uses Latin. DCDuring TALK 19:52, 11 April 2008 (UTC)[reply]
Both the OED and MW3 use pl. for plurale tantum nouns. The AHD is inconsistent, sometimes using the text "Often used in the plural" (cf. pant) and other times putting the plural form in bold at the head of the numbered sense (cf. color). Random House doesn't bother to mark these at all. --EncycloPetey 21:58, 11 April 2008 (UTC)[reply]
Hm, that may not be inconsistent. "Pant leg" sometimes appears in the singular, but as far as I know, the "regimental colours" never does. —Michael Z. 08:22, 12 April 2008 (UTC)[reply]

Sorry, I missed the point of the discussion which relates to the structure of categories and the use of template. Does this affect non-editing users? Not initially, as I understand it.

When using the category intersection tools, what would happen? I hope that such tools will become available to regular users. If a p.t. noun is treated no differently than a normal noun most of the time, then everything should be fine. If not, then we will have a problem.
Making something templated means its appearance can be changed by changing the template only. The EP approach would seem to give more scope to change things because there would be a template for the noun itself instead of the noun appearing in a "hard-coded" way. My 2 cents. DCDuring TALK 20:09, 11 April 2008 (UTC)[reply]
This came up because I couldn't figure out how to do it with {{en-noun}}, then went searching for the right template, and finally asked for help.
Doing this with a template would be advantageous, because it's already expected by semi-newbs like myself. All the better if it's an option on en-noun rather than a separate template. It would also guarantee consistent formatting and categorization, and allow us to decide on the exact categories and wording independently (the template can always be updated).
I think the wording is a separate discussion, but FYI, the Canadian Oxford saves space by saying plural noun or (pl. same or -es) in many cases, and letting you figure out the details—it's obviously aimed primarily at native English speakers. —Michael Z. 20:33, 11 April 2008 (UTC)[reply]
We could adjust the {{en-noun}} template to accept something along the lines of "plural=only" / "plural=tantum" that would add the expected formatting and category. However, I'm not sure how best this could be done. --EncycloPetey 21:55, 11 April 2008 (UTC)[reply]
If we are adjusting the {{en-noun}} template (which I assume would be no more difficult than the existing way we get {{en-noun|-}} to categorise into category:English uncountable nouns), then I'd hope we'd stick with the "pl=" format already used. I don't think "pl=tantum" would be possible as the plural of "plurale tantum" is apparently "pluralia tantum", so logically the/a plural of "tantum" is "tantum". "pl=pluralonly" or "pl=plurale tantum" would be possible, both displaying whichever form of words we agree on (which I agree with Michael is a separate issue). I think however the best form might be {{en-noun|sg=-}}, which to me implies there is no singular form in the same way {{en-noun|-}} signifies there is no plural form, and has the benefit (I presume it's a benefit anyway) of using the existing "sg" parameter. This might fall down though for two-word pluralia tantum, e.g. glad rags and checks and balances where we use the sg parameter to link to the individual words. Perhaps then use either the "sg" or "pl" parameters with some other symbol to denote this status, ! perhaps?
If we do this, then I think the {{pluralonly}} and {{plurale tantum}} templates should be deprecated in the inflection line, but remain for use in the sense line.
Whichever solution we have, I think it is useful to retain categorisation of the pluralia tantum, either there solely or there and in category:English nouns or category:English plurals. The existing templates could of course very easily be modified to do this as well. Thryduulf 23:34, 11 April 2008 (UTC)[reply]
[I adjusted some text above which didn't display correctly —Michael Z. 00:28, 12 April 2008 (UTC)][reply]
My preference would be for dual categorization in those cases. --EncycloPetey 00:17, 12 April 2008 (UTC)[reply]
Dual categorisation in which two categories? Thryduulf 00:59, 12 April 2008 (UTC)[reply]
Sorry, Category:English nouns and Category:English pluralia tantum. --EncycloPetey 02:30, 12 April 2008 (UTC)[reply]
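As a rough illustration of the dual categorisation just described, here is a small Python model of what a {{en-noun|sg=-}} option might do. This is only a sketch of the proposal, not how the template is written; the function `noun_categories` and its `sg` argument are hypothetical stand-ins for template logic.

```python
def noun_categories(sg=None):
    """Return the categories a hypothetical {{en-noun}} call would add.

    sg="-" stands for the proposed "no singular form" marker.
    """
    categories = ["English nouns"]
    if sg == "-":
        # Proposed dual categorisation for pluralia tantum.
        categories.append("English pluralia tantum")
    return categories
```

Because the category names live in one place, the wording and the exact category tree could be changed later without touching any entries, which is the advantage of the template approach noted above. Two-word cases such as glad rags, where sg is already used for linking, would need a different marker and are not handled here.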

Glossary - formatting

Is there any consensus on how to format glossaries? I have put up for myself a provisional policy at User:Daniel Polansky#Glossary, still wondering whether I should use (a) bullets, boldface, and "-" separator or (b) definition lists with ";" and ":". Today, I have formatted two glossaries, using the option (a). The option (a) is used in Wiktionary:Glossary and is more compact than definition lists. Still, definition lists are a standard HTML means of entering terms and their definitions. --Daniel Polansky 10:32, 12 April 2008 (UTC)[reply]

I am definitely in favour of (b), see the utterly hated Appendix:List of Harry Potter terms, it adds proper structure and (as a result) looks neater. Conrad.Irwin 18:18, 12 April 2008 (UTC)[reply]
That is a woefully incomplete list...if you are going to do it at least do it right :p - [The]DaveRoss 20:07, 12 April 2008 (UTC)[reply]
Having a look through the various glossaries, I agree that a guideline would be helpful.
{{compactTOS}} doesn't need rules to be added, because it already stands out on the page. If they are desirable, then we can add them as CSS border-top and border-bottom in the template, instead of tossing in more wikitext.
I would suggest that bulleted glossaries don't need bold formatting, especially if the terms are linked. A colon may be a less obtrusive and more natural separator. The terms don't get lost in Appendix:Bagpipe terms, and I think it is more readable than many of the others.
The guideline should mention consistent copywriting, too. Does each term begin a sentence, or is it followed by one? Does the definition begin with a capital letter? I think the answers are different for each glossary, but the definitions in a glossary should be consistently written.
I am very much in favour of using structural HTML, but unfortunately Wikipedia's semicolon-colon wiki lists are styled for discussion. Definition lists are well structured, but not particularly attractive. The lists also have the advantage that one term can be associated with several definitions (but unfortunately there's no way to put more than one paragraph or another list into a single definition). —Michael Z. 03:08, 13 April 2008 (UTC)[reply]
IMHO the defined term should better appear in boldface. It does so in Wiktionary entries of terms, it is so formatted when the HTML definition list is used, and it is a Wikipedia convention to have newly defined terms in boldface. --Daniel Polansky 06:17, 13 April 2008 (UTC)[reply]
It's good typographic practice to use the formatting appropriate for the context.
Headwords in entries are the most important thing on the page, and have to visually compete with the headings. Each entry has one or only a few of them. Boldface is appropriate here, and it is a convention inherited from many paper dictionaries. But dictionary terms also appear in etymologies, where they are italicized, and in lists of related terms, etc, where they are in roman font and linked. We wouldn't boldface any of these instances.
In contrast to Wiktionary entries, glossaries have dozens or even hundreds of terms, and are made of many blocks of running text, rather than a collection of headings and bulleted lists. They are more similar to the lists of terms appearing in entries than to the headwords. Glossary entries don't have to compete with other bold elements, they just have to be found by the reader, and then not distract her from reading the definitions. A term here is already flagged by coming at the beginning of the line, by being marked with a bullet, by being linked, and is set off with prominent punctuation after. Boldfacing every one is just adding icing to the gravy. —Michael Z. 17:40, 13 April 2008 (UTC)[reply]
Okay. I will format poorly formatted glossaries using the option (a), as I prefer it and I can see no clear consensus against it, but I will refrain from turning well-formatted glossaries formatted using the option (b) into the formatting (a). Please, let me know if you think it a poor personal policy. --Daniel Polansky 06:47, 13 April 2008 (UTC)[reply]
A and B are both much better than any of the other formats which appear in some of those glossaries. —Michael Z. 17:25, 13 April 2008 (UTC)[reply]
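For comparison, here is a short Python sketch that renders the same term list in both styles discussed in this thread: option (a) with bullets, boldface and a "-" separator, and option (b) as a ";" / ":" definition list. The function names and the sample term are made up for illustration; nothing here is an agreed guideline.

```python
def format_bulleted(terms):
    """Option (a): one bulleted line per term, term in bold, '-' separator."""
    return "\n".join(
        f"* '''{term}''' - {definition}" for term, definition in terms
    )

def format_definition_list(terms):
    """Option (b): wiki definition list, ';' for the term and ':' for the definition."""
    lines = []
    for term, definition in terms:
        lines.append(f"; {term}")
        lines.append(f": {definition}")
    return "\n".join(lines)

# Sample data only; the definition text is a placeholder.
sample = [("drone", "A pipe that sounds one continuous note.")]
print(format_bulleted(sample))
print(format_definition_list(sample))
```

Keeping the terms as data and the formatting as a function would also make it cheap to switch a glossary from one style to the other if a consensus ever forms.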

a bot to capture Wikipedia on Wiktionary

I was just now editing 狐獴 (the Mandarin entry for meerkat), and thought of something (maybe someone else has already thought of it, but anyway). Wikipedia now has thousands of articles with versions in multiple languages. The titles of most of these articles are either nouns or proper nouns. Perhaps a bot could be written to create Wiktionary formatted entries (similar to Tbot) for these words. For example, if the bot noticed that the English Wikipedia article for meerkat had a Mandarin equivalent article called 狐獴, the bot would create a formatted entry on Wiktionary that would look something like what you now see at 狐獴. A category could be slapped onto such entries (similar to the way Tbot tags things), so that a human editor could verify the contents, and add extra things (ex. Pinyin romanization for Mandarin entries etc.). Thoughts? -- A-cai 13:31, 12 April 2008 (UTC)[reply]

  • This would be fine if the interwiki links between the various Wikipedias always used the same conventions. Take your example of w:meerkat - the Italian interwiki link points to "Suricata suricatta", which is the translingual (or modern Latin) name, not the Italian word suricato. So your bot would generate incorrect entries. SemperBlotto 14:28, 12 April 2008 (UTC)[reply]
That's a bad idea. Many of the organism articles on most Wikipedias are based on the scientific name, and most of the plant articles (over 15,000 and growing) use the Latin binomial for the name. Some other Wikipedias do the same. When an organism belongs to a higher taxon, and is the only organism in that taxon, the two articles may be the same, but not all Wikipedias divide them the same way. As a result, the article on the Ginkgo genus on one Wikipedia may be titled for the Ginkgoaceae family on another Wikipedia.
There are also many cases where the bots propagate incorrect links. I have had an ongoing tussle with bot operators over w:Monoicous because they don't understand that w:es:Monoica and w:fr:Monoécie are not about the same topic (they should link to an article about monoecious, not monoicous). The plant editors keep removing the incorrect links; the bots keep adding back the links; and the bot operators believe they are absolved of any fault. There are also many cases where the article titles don't even remotely mean the same thing, even when the topic is the same. English wikipedia has an article on Plant sexuality (which we would delete as sum of parts), but it covers the same subject as w:es:Monoica does. In short, we're seeing links between articles that do not have titles of the same meaning.
There are also many, many articles with titles that do not merit an entry because the entry would not meet our CFI. I can't see any reasonable way to get a bot to distinguish between cases or use appropriate selectivity in choosing which articles to create entries for. --EncycloPetey 14:29, 12 April 2008 (UTC)[reply]
I agree with EncycloPetey. It's a great thought, but I think there are too many problems with it. In addition to the ones he mentions, there are also cases like wikipedia:Fixed-wing aircraft where Wikipedia's noble quest for NPOV has led it linguistically astray (the normal words being airplane and aeroplane, depending on dialect). (There are also issues with figuring out whether a Wikipedia article corresponds to a lowercase Wiktionary entry or an uppercase one, but an intelligent bot might be able to handle those by searching the article for non-sentence-initial uses.) —RuakhTALK 18:09, 12 April 2008 (UTC)[reply]
Do note that while "airplane" and "aeroplane" may be the "normal" layman's terms, engineers (aircraft engineers) generally use the technical term aircraft. When Boeing talks to the public and to the Street, they use "airplane" (Boeing Commercial Airplanes Division), when their engineers and pilots talk, it is "aircraft". "Fixed-wing aircraft" is correct. (Besides evading the Pondian Problem ;-) Robert Ullmann 12:10, 20 April 2008 (UTC)[reply]
I agree that aircraft is the more formal/technical term, and that fixed-wing aircraft is correct and precise, but even a Boeing engineer or pilot would presumably choose airplane over fixed-wing aircraft in a context where aircraft alone didn't suffice, right? (And remember that Wikipedia's main naming convention is that articles should be named using the most common term for their referents.) —RuakhTALK 16:09, 20 April 2008 (UTC)[reply]
I think this would work. Ullmann has already put significant effort into Tbot's checking mechanism and the quality of the Wikipedia data would seem not too much lower than the quality of some of our translations. I think that (if this is possible) a modified version of Tbot that accepted input from Wikipedia interwikis instead of Wiktionary's translation tables would be very good. (And as an added bonus it could add the word to the translation tables at the same time ;). For more information on exactly what checks Tbot does you'd have to ask Ullmann, but I believe they require a foreign Wikt entry to exist and contain a translation in common with the English Wiktionary entry. Certainly this bot shouldn't create translations of articles that don't have entries in Wiktionary. I don't know how many of the Wikipedia interwikis would pass Tbot's checker, but I would think a significant enough number to want to give this a go. Conrad.Irwin 20:21, 12 April 2008 (UTC)[reply]
I think it is a bad idea to try and get data from information which isn't designed to be a direct translation. Even if one article corresponds to another that does not mean that the titles are translations of one another. - [The]DaveRoss 20:43, 12 April 2008 (UTC)[reply]
You make some good points, but the thing is, when Tbot creates a bad entry based on one of our translations tables, that's still useful: it calls attention to a problem in that translations table. When Tbot creates a bad entry based on Wikipedia interwikis, that's just annoying. (And, would it re-create the entry every time we deleted it?) —RuakhTALK 20:45, 12 April 2008 (UTC)[reply]
Based on recent discussion here, I think it would be trivially easy to stop a bot creating an entry when a previous page with that title had been deleted. When a section on a page was deleted, but the entry still exists (e.g. the Dutch section was deleted but the English section remains) I think it would be harder (disclaimer: I am not a programmer). Thryduulf 22:08, 12 April 2008 (UTC)[reply]
Besides the points raised above, I think that Wiktionary should try to create its own content, and not rely on other projects and any mistakes they might make. Nadando 07:29, 13 April 2008 (UTC)[reply]

I should have pointed out one of my justifications for such a bot. A lot of contributors are already introducing such entries ... by hand! Many such entries are poorly formatted, and rarely tagged with any kind of "blindly copied from Wikipedia" tag. The entry for 狐獴 was one such example. It originally looked like this. I'm proposing to standardize the process with a bot. Such a bot (if done correctly) would give me the means to efficiently verify such entries, without a lot of additional formatting work. -- A-cai 07:46, 13 April 2008 (UTC)[reply]

One option that we might consider is writing a bot to add the interwikis as ttbc's to the English entry. The beauty of this option is that we get a whole bunch of data, it's pretagged to be looked at by human editors, and can then be fed into Tbot in the normal fashion. Additionally, our readers will be forewarned that it's questionable data and so, hopefully, won't be led astray. This does, however, have the downside that it would flood ttbc categories, which would, admittedly, be irritating. -Atelaes λάλει ἐμοί 07:39, 13 April 2008 (UTC)[reply]

A starting point would be to try to extract some entries automatically, and create a report or list of what the automation thinks it would do. As noted above, Tbot's primary check uses the translation table in the FL.wikt entry; this won't work that way. A serious technical issue is that Tbot works by digesting the entire en.wikt XML, and then looking for specific FL entries; what method would be used to extract the 'pedia data? The en.wp XML by itself is not manageable. (one could get "langlinks.sql.gz" for a given set of wps, and then do some analysis) I myself have not tried to parse anything out of wp entries; they look superficially consistent in many ways, but that may not make them tractable. Robert Ullmann 12:10, 20 April 2008 (UTC)[reply]
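A dry-run report along these lines could look roughly like the Python sketch below. It assumes the interwiki data has already been extracted into (English title, language code, foreign title) rows, for instance from a langlinks dump; that extraction step, the function name `candidate_report` and the three input collections are all assumptions, and none of this reflects how Tbot actually works.

```python
def candidate_report(langlinks, existing_entries, deleted_titles):
    """List what a hypothetical bot would create or skip, without editing anything.

    langlinks:        iterable of (english_title, lang_code, foreign_title)
    existing_entries: set of titles that already have a Wiktionary entry
    deleted_titles:   set of titles that were previously deleted here
    """
    report = {"create": [], "skip": []}
    for en_title, lang, foreign_title in langlinks:
        if foreign_title in deleted_titles:
            reason = "previously deleted"
        elif foreign_title in existing_entries:
            reason = "entry already exists"
        elif en_title not in existing_entries:
            # Per the suggestion above: no English entry, no translation entry.
            reason = "no English entry"
        else:
            report["create"].append((foreign_title, lang, en_title))
            continue
        report["skip"].append((foreign_title, lang, reason))
    return report
```

Note that these checks only cover the mechanical objections raised in this thread (re-creating deleted pages, articles with no Wiktionary entry). They would not catch a link such as meerkat to "Suricata suricatta", so a human would still have to review the "create" list.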

Transliteration appendices

The various transliteration systems in category:Transliteration appendices need references attesting to their wider usage or indicating their source. If they are systems modified or created specifically for Wiktionary, then they ought to be moved to the Wiktionary: namespace, per earlier discussion (Wiktionary:Beer parlour#Organization).

I added Wikipedia links to all of these appendices, and here is a link to Thomas T. Pedersen's reference to many transliteration systems.

Please add a reference to any of these appendices you are familiar with. —Michael Z. 20:07, 12 April 2008 (UTC)[reply]

It also looks like Wiktionary:About Greek/Transliteration may be a candidate to become an appendix. —Michael Z. 20:53, 12 April 2008 (UTC)[reply]

Most of those were created specifically for Wiktionary, and for some (New Persian) editors seem to be using multiple transliteration systems simultaneously. If you have specific objections to any of those, I'm sure people will be happy to discuss on respective talk pages. --Ivan Štambuk 21:39, 12 April 2008 (UTC)[reply]
The transliteration Appendices should have references, yes, but when the Transliteration system is part of a Wiktionary: namespace page, then it's an internal transcription system that may or may not be used elsewhere. --EncycloPetey 03:19, 13 April 2008 (UTC)[reply]
Right, but I'd like to just figure out which is which, and put each one in the right place.
A reader should be able to look at each table, and:
  • Know if they can expect to see the system used in other publications, and if so then in which field (linguistics, publishing, other dictionaries, etc).
  • Know if they shouldn't use it in e.g. an academic paper and expect their peers to be familiar with it.
  • Have confidence that what they are reading observes the Wiki principle of verifiable accuracy, and clearly identifies any original research.
From browsing through them and having a look at the relevant Wikipedia articles, it does look like a significant proportion of them are systems used in academia, or possibly dangerously close to such systems. —Michael Z. 03:30, 13 April 2008 (UTC)[reply]

Proposal: a template for linking prominently to foreign-language Wiktionaries.

When I add a foreign-language word that has a that-language Wikipedia article, I typically add a prominent link to that article, using {{projectlink|pedia|lang=fr}} or whathaveyou. However, if the word has a that-language Wiktionary entry, the link to it only shows up in the sidebar, which is fairly useless unless the reader knows to look there. I think it makes sense for the that-language Wiktionary entry to have a more prominent link, if only because the that-language entry is generally more complete (if only because it will have translations to languages other than English). So, I've created {{PL:wt}}, such that this:

* {{projectlink|wt|español|lang=es}}

will produce this:

It's pretty much like all the other {{PL:*}} templates, except that it doesn't create a sidebar link (since the normal bot-managed interwiki link serves that purpose).

Before I start using it widely, mentioning it at Wiktionary:Links, etc.: does anyone object to this? (And, does everyone agree that this should only be used for linking to the that-language Wiktionary entry, never to other foreign-language Wiktionary entries?)

RuakhTALK 00:36, 13 April 2008 (UTC)[reply]

The icon is pretty pointless. It's just a vague smudge, in my browser. Better to use nothing at all.
Or perhaps the favourite icon, which was designed to display at 16px size. But then it should be the bullet, created using the list-style-image CSS property, not an icon placed next to a bullet. —Michael Z. 02:30, 13 April 2008 (UTC)[reply]

But that icon is also used by Wikipedia. How about expanding the function of {{infl}} (and similar templates) to include a link to the appropriate wiktionary, if it exists? How feasible is this? Would this look OK in the inflection line? --EncycloPetey 03:16, 13 April 2008 (UTC)[reply]
Yeah, Wikt: and W: are the only two projects that share an icon, but that's what we have (see the selection at commons:Wikimedia#Favicon). But we do use the globe for Wikipedia, so links with the W would still be distinctive.
(There's also an SVG version that looks pretty sharp at various sizes, and the puzzle piece.)
But that's also why I think it may be better to use nothing. The full logo at 16px size is not attractive or even identifiable. It doesn't even serve as an eye-catching bullet, especially if it is right next to a standard bullet. A graphical element that serves no function simply makes things worse than they would be without it. —Michael Z. 03:46, 13 April 2008 (UTC)[reply]
Some other-language Wiktionaries use the scrabble tiles for their logo. The W tile clipped out of this image might make a usable 16-pixel bullet (and favicon). —Michael Z. 04:01, 13 April 2008 (UTC)[reply]
(edit conflict) O.K., I've removed the logo. I don't like the favicon idea, for the same reason that EP gives; and even if I did like that idea, I wouldn't like the idea of not including a bullet from just one element of a bulleted list. (I mean, technically it would be multiple unordered lists in the HTML, but to our readers it would look like one bulleted list with just one non-bulleted element, so, same effect.) And anyway, on reflection it doesn't make much sense for us to use some form of our own logo as a means of identifying a foreign-language Wiktionary. However, it might be nice to use a bit of markup, something like (es), as the “logo”. —RuakhTALK 04:02, 13 April 2008 (UTC)[reply]
The puzzle piece is typically used for stubs, not for projects. I wasn't clear what I meant; I was thinking of adding functionality to the {{infl}} template along the lines of what we do for {{t}}. So, an inflection line might look like: hablar (es) so that the interwiki link appears in the inflection line. The {{infl}} template already includes the language code, which simplifies the process a bit, should we decide to do this. --EncycloPetey 04:03, 13 April 2008 (UTC)[reply]
I'm not opposed to {{infl}} having such a link, but I don't think it's enough: I don't think most readers will understand it. To be honest, I'm not sure most readers recognize the significance of those translation-table links, either, but at least there we provide only two links, so most readers who are interested in a translation will probably try them out once to see what they are. That's not true of an interwiki link representing the entire language section. —RuakhTALK 04:02, 13 April 2008 (UTC)[reply]
Here's an idea: take the actual wikitext one would use to create an interwiki link, and realize it on the page. Its meaning actually is vaguely self-evident, even for non-wiki editors. All the better if there was a way to add a tooltip reading ‘hablar’ in Spanish Wiktionary. —Michael Z. 04:27, 13 April 2008 (UTC)[reply]
Why shouldn't a self-referential (inter-wiki) link make use of the vernacular?
Here's what it could look like, with the es: link after the definitions: User:Mzajac/hablar. It also looks fine just below the headword, but interrupts the flow. I think it's a bit cluttered if placed at the end of the headword line. —Michael Z. 05:00, 13 April 2008 (UTC)[reply]

Another option altogether would be to leave the link in the left sidebar, but make it more prominent (bold font?). Wikipedia uses a javascript trick to make other-language featured articles have a star for a bullet (w:Template:Link FA), so it would be possible to manipulate other attributes of the link. —Michael Z. 05:08, 13 April 2008 (UTC)[reply]

This idea has been suggested before. I think that javascript is the best solution, as the interwiki link would already be there and we can then just add a more prominent link at a suitable place in the entry. Obviously the formatting and position of these links have yet to be determined, but I feel that they should be part of, or near to, the language heading. Here are a few of my ideas for layout:

Spanish (es)


Spanish
The Spanish Wiktionary contains hablar


Spanish
español:hablar

As above I think that adding these with Javascript is better than adding these to all the entries, though I suppose it could be added to Interwicket. Conrad.Irwin 11:24, 13 April 2008 (UTC)[reply]

Well, I'd still like to be able to list them in "see also" sections like with other sister projects, but if people want your approach: I like option #2 (though "has an entry for" might sound better than "contains"). Option #1 is the most prominent, but again, I'm not sure it's obvious to most people what (es) means: if I saw it, I think I'd interpret it as "Hey, we know that the name Spanish is slightly controversial, so we're also including this unambiguous language code for clarity's sake." I'm not sure if I'd bother to click the link, since I'd assume that the link was to inform readers of what the language code meant (since not everyone is familiar with language codes). Option #3 is kind of cool, but less prominent, and it's not instantly obvious what it means. There are fairly few cases where we just include a link that would be absolutely meaningless in a print edition (aside from the edit-links, sidebar links, etc.), and I don't think this link needs to be an exception. But option #2 is great; I'd be very happy with it, especially if it were in concert with "see also"-s rather than instead of them. In anticipation of it, and of other scripts that might need such a thing, I've created MediaWiki:langcode2name.js, which offers functions for handling the language code, English name, and FL name of each Wiktionary language. —RuakhTALK 14:12, 13 April 2008 (UTC)[reply]
Inspired by your quick response, and finding myself at a loose end I have implemented #2. It can be trialled at the bottom of WT:PREFS "Trial the javascript prominent interwiki links." (try a hard refresh if the option doesn't appear straight away). Any thoughts would be appreciated. Conrad.Irwin 16:58, 13 April 2008 (UTC)[reply]
No. 2 is my favourite too. No. 1 destroys the graphic effect of the title, so I wouldn't want to see it implemented. —Michael Z. 17:01, 13 April 2008 (UTC)[reply]
It looks pretty good. I'm going to try to get used to it a bit.
Would it be acceptable to remove the "the" and the period at the end? It might look cleaner without the accoutrements, and if the beginning mirrored the language heading. I don't think the boldface is necessary to draw the eye to the link, and normal-weight text will probably be more readable at such small size, especially when it appears in certain foreign-language scripts.
Spanish
Spanish Wiktionary has an entry for hablar
Since the note is already written out in full, the tooltip may be an opportunity to include the destination language, but this would require compiling many translations of "in Wiktionary".
Spanish
hablar in the Spanish Wiktionary
Starting with the term helps emphasize the link, and reduces the verbiage. —Michael Z. 17:53, 13 April 2008 (UTC)[reply]
Hmm, thinking about it - do we need "Wiktionary" in there, seeing as we are all one big project? Could we get away with "Spanish entry for hablar"? I like the idea of having the foreign language in the title, or even in the heading, as this makes it clear what to expect from the link. I think it would make it hard to include "In Wiktionary" in the foreign language, as this would need different layout for each language. Perhaps the title could just be "español: hablar" or something that requires little effort ;). Incidentally please feel free to bugfix/experiment with the javascript so long as you bear in mind that it is being used by an unknown number of other people. I'll give your second idea a go now. Conrad.Irwin 18:06, 13 April 2008 (UTC)[reply]
I think there needs to be a reference to the other project. We are already looking at a Spanish word's entry in English Wiktionary, and "Spanish entry" doesn't make it perfectly clear how the context will change when we click.
"Español: hablar" looks fine to me, but does this construction work in every language? Probably, but some of them may have to be bundled with whatever passes for a colon. —Michael Z. 18:15, 13 April 2008 (UTC)[reply]
I have no idea, I think we can leave it as a colon unless anyone else has an opinion. Conrad.Irwin 20:10, 13 April 2008 (UTC)[reply]
Spanish
hablar in the Spanish Wiktionary
I prefer having just the term linked, which becomes the self-evident subject of the note. When the whole note is linked, there is no differentiation, and it looks like a subtitle for the heading. On the other hand, this may be a problem for short words like i. —Michael Z. 19:55, 13 April 2008 (UTC)[reply]
Yes, I was looking at one, and it didn't seem to be prominent enough, though I agree that it is better with only the word linked for anything longer. Conrad.Irwin 20:10, 13 April 2008 (UTC)[reply]
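For illustration only, the wording discussed above ("hablar in the Spanish Wiktionary", with just the term linked) could be assembled roughly as follows. This is a Python sketch, not the javascript being trialled in WT:PREFS. The function name and the small language-name table are assumptions, standing in for a real lookup such as the one MediaWiki:langcode2name.js is meant to provide.

```python
from html import escape
from urllib.parse import quote

# Assumed sample of language code -> English name; a real script would
# use a complete table rather than this hypothetical stand-in.
LANGUAGE_NAMES = {"es": "Spanish", "fr": "French", "sv": "Swedish"}


def interwiki_note(lang_code, term):
    """Return (url, note_html) for the term's entry in that language's Wiktionary.

    Only the term itself is linked, as suggested in the discussion.
    """
    name = LANGUAGE_NAMES[lang_code]
    # Page titles use underscores for spaces; non-ASCII characters are percent-encoded.
    url = "https://{}.wiktionary.org/wiki/{}".format(
        lang_code, quote(term.replace(" ", "_"))
    )
    note_html = '<a href="{}">{}</a> in the {} Wiktionary'.format(
        url, escape(term), name
    )
    return url, note_html


if __name__ == "__main__":
    print(interwiki_note("es", "hablar")[1])
```

Linking only the term keeps the note from reading like a subtitle, though as noted above it makes the link a small target for very short words such as i.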

Weird: the link appears, but is gone if I refresh a page in my browser (Safari 3.1/Mac). —Michael Z. 19:48, 13 April 2008 (UTC)[reply]

Yes, this is partly caused by bugzilla:12773, and partly because it is including an external dependency that may or may not have downloaded before the script runs. I'll have a think about the best way to fix this. Conrad.Irwin 20:10, 13 April 2008 (UTC)[reply]

The action word for a link is a good idea, but the verb is look up, not lookup.

Not sure if I like the indentation breaking up the left margin, though. —Michael Z. 16:02, 14 April 2008 (UTC)[reply]

Compounds and grammar

1. The article forgo describes this English verb's grammar as {{en-verb|forgoes|forgoing|forwent|forgone}}. Very similar patterns appear in the articles go and forego, and likely a few dozen other compounds ending in -go. Isn't it a waste of energy to repeat the pattern go/goes/going/went/gone in so many places? Shouldn't a reference to go be enough for the grammar?

2. In the many years of Wiktionary, many people must already have asked this question and so I would have thought that there should be a page about this question and its answer somewhere in the Wiktionary: or Help: namespaces. Is there? I can't seem to find one.

3. It can be argued that this is a minor problem for the English language, but in German and Swedish where compounds are so much more common and inflexion patterns are more complicated, the question takes on a completely different dimension. Still, new methods are seldom introduced in sv.wikt or de.wikt unless they already exist in en.wikt. It appears sv:föregå and de:vergehen use the same methods as forego, the full grammar pattern is repeated in every article for every compound. --LA2 15:46, 13 April 2008 (UTC)[reply]

Well if it's the wikitext duplication that bothers you, we could create a new special template for this group, allowing go to use {{en-verb-go}}, undergo to use {{en-verb-go|under}}, etc.; but I don't see why we wouldn't want the displayed version to show all the forms. —RuakhTALK 15:54, 13 April 2008 (UTC)[reply]
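To make the idea of a shared template for this group concrete, here is a minimal sketch in Python (not wikitext) of what something like {{en-verb-go|under}} would have to compute: the five forms of go with a prefix attached. The function name and dictionary keys are hypothetical, and the sketch assumes the compound inflects exactly like go.

```python
# Principal parts of "go", shared by compounds such as "forgo" and "undergo".
GO_FORMS = ("go", "goes", "going", "went", "gone")

# Hypothetical labels for the five forms, in the same order as GO_FORMS.
LABELS = (
    "infinitive",
    "third_person_singular",
    "present_participle",
    "simple_past",
    "past_participle",
)


def go_compound_forms(prefix=""):
    """Return the forms of prefix + "go", assuming it inflects like "go"."""
    return {label: prefix + form for label, form in zip(LABELS, GO_FORMS)}


if __name__ == "__main__":
    for label, form in go_compound_forms("under").items():
        print(label, form)
```

The displayed entry would still list every form; only the duplicated wikitext is replaced by a single prefix argument.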
Personally, I think that we can be of most use to our audience if we show the inflected forms for each compound. Unlike print dictionaries we are not limited by space. Thryduulf 15:56, 13 April 2008 (UTC)[reply]
My primary concern is with question 2: Where has this been discussed before? A special template for "go" could be one solution, but for German and Swedish it would mean hundreds of thousands of templates. Saving space is not one of our needs, of course, but adding new compounds is a problem if you have to repeat the grammar pattern each time, without the ability to use a template. I don't expect a solution and consensus to appear in ten minutes, but I was expecting this question to have been discussed before. --LA2 16:23, 13 April 2008 (UTC)[reply]
For compounds with regular inflection, then perhaps a compound noun template(s) would be possible, depending on the grammar of the language in question of course. I can't see a way of automating irregular conjugation unfortunately. If the person defining the compound terms doesn't want to enter the inflections then they can use perhaps the {{infl}} template and categorise it somewhere where others can find the entry to add them later. Thryduulf 16:40, 13 April 2008 (UTC)[reply]
A reference back to a main entry isn't always satisfactory. Often, compounds follow the inflection of the parent verb, but sometimes they do not. For example, Latin faciō has an irregular passive voice conjugation. Some of the compounds from faciō have this same irregularity in the passive (e.g. patefaciō), but others do not (e.g. cōnficiō). It's therefore better if each entry contains full information of its own. --EncycloPetey 16:52, 13 April 2008 (UTC)[reply]
Well, the inflection line doesn't normally show all the inflected forms of a word, just the principal parts (whatever those are considered to be for a given language). In highly inflected languages, the other noteworthy forms are placed in a separate section (Conjugation/Declension/Inflection). But the idea of shunting this information off to a single "core" entry for each group of words is a non-starter, for reasons that are pretty basic to the philosophy of Wiktionary. Entries should stand on their own as comprehensive treatments of the word or form in question; the user should never be required to go to another page in order to get basic information on inflectional (or any other) properties of a word. Even for languages with quite regular inflectional patterns, like Latin, the principal parts for each verb are given in the inflection line (e.g. video#Latin). Users could be directed to an appendix to figure out this information for themselves, but that just isn't the way we do things here. I'm not sure if your specific proposal has been discussed before, but I don't think it would ever fly. -- Visviva 16:41, 13 April 2008 (UTC)[reply]
When I search the Wiktionary: and Help: namespaces for "basic to the philosophy of Wiktionary. Entries should stand on their own", I get lots of hits in the Beer parlour, but no obvious policy page. Where should I look? --LA2 16:48, 13 April 2008 (UTC)[reply]

edit conflict:

FWIW, Longman's DCE shows the past as "rare".
As to question 1, in the case of compounds not separated by a space or hyphen, I would think that we would want to show the inflection in each entry because it is not necessarily obvious that one could click on part of the blue headword in the inflection line to determine inflection. I use the infl template to suppress inflections that would result from using {{en-verb}} for compound words separated by a space or hyphen, phrasal verbs, idioms, and other phrases that I put under the verb PoS header. I inflect compounds that are not separated. I justify the different treatment by saying that it ought to be obvious to even a casual user that, for such entries, if the inflection is needed, one would click on the word involved. "Obviousness" is in the eye of the beholder, of course, so this is not entirely satisfactory. HTH.
As to question 2, unfortunately you have to use Google-type search skills in the various spaces (talk, wiktionary, WT, Appendix) to try to exhume old discussions of this and sometimes even a description of current practice. We don't seem to have that many policy systematizers active here and many practices have not gotten beyond disagreements. For example, there is a school of thought that would object to my suppression of inflection of phrases and others who would disagree with the very idea of trying to assign "real" parts of speech to idioms. DCDuring TALK 16:59, 13 April 2008 (UTC)[reply]

I think there is some confusion here. The inflection line contains a few key forms of a verb. For very regular forms we do have templates (like the infamous {{en-verb}}, which can handle far more than just vanilla regular English verbs), but for the less regular forms the effort of creating, cataloging and looking up these templates takes much longer than typing 30 characters. For the full conjugation of inflected languages we do have templates (see the list of French conjugation tables that use the standard pattern) because the effort saved is very large and overcomes the maintenance costs. I don't think anyone has asked about this before because what we do does actually make sense. As for point three... the whole point of having separate Wiktionaries is that different solutions work better for different languages' readers and editors. I think it would be much easier for a newbie (and experienced editors) to find and use (or read and understand) {{el-verb|egkataleípo|εγκατέλειψα|egkatéleipsa}} than, for example, {{el-verb-λειπω|εγκατ|α|έ|egkat|a|é}} even though it is shorter (see εγκαταλείπω) [I admit this is somewhat contrived, but the same principle holds for smaller idiosyncrasies in many places]. Conrad.Irwin 21:49, 13 April 2008 (UTC)[reply]

Scientific names and Taxonomy headers

Are these standard headers? If not, how should I correct them? They usually contain Latin taxonomy specifications for the entry. Example: bar-winged rail. --Panda10 17:15, 13 April 2008 (UTC)[reply]

  • I prefer to incorporate the taxonomic name in the definition. That is what I have done with the example (as well as giving it a proper definition, and linking to the correct Wikipedia article). SemperBlotto 17:23, 13 April 2008 (UTC)[reply]
I would do as SB does and:
  1. add a link to wikispecies ({{wikispecies}} or {{specieslite}}). Though they have 130K+ articles, they might not have the one you want.
  2. look for picture in wikicommons. If there are multiple ones, I usually put in a commonslite link as well as the most helpful or interesting image.
  3. make these links external
  4. make in-line links to the individual words of the two-part species name, on the theory that we should be happy if we have the numerous component words for these names in Latin or Translingual and not try to keep up with the complexities and changes of taxonomic classifications and offer at best terse uninformative entries.
I think it should be clear that we do not have the ability to provide comprehensive coverage of compound names in taxonomy (2- or 3-part) and chemistry (n-part). If we could provide a reasonable mapping from vernacular names to taxonomic names as well as definitions of the parts of taxonomic names, we would be doing things that WP and Wikispecies do not and are not likely to unless we fail. DCDuring TALK 17:49, 13 April 2008 (UTC)[reply]
People have been using L4 headers Scientific names and Taxonomic names. While it may be useful to incorporate one or two into the definition, this pattern doesn't hold up well when there are several, or dozens. AF has been treating "Scientific names" as recognized (understand this is not an application of policy, of which there is none!), and "Taxonomic names" as unknown; I think we should treat "Taxonomic names" as a recognized L4 header, and convert "Scientific names" (which is sort of ambiguous, could be any kind of "scientific" naming ;-) to "Taxonomic names" Robert Ullmann 22:30, 14 April 2008 (UTC)[reply]
It would be very desirable for us to have maps from vernacular names (in, say, English) to the one or more species or genera that may be appropriate. These names are the key to a vast amount of good information that is not necessarily accessible from the vernacular name. Providing this key is something useful to users and not done well by Wikispecies or even Wikipedia. It would pay to make it as complete as we can. But if the list of species would become overwhelming then we can limit ourselves to the genus names.
In the case of dozens of scientific names you must be referring to cases where there are numerous derived or related species names appearing under a genus name that ought to be Translingual; that is an illustration of the problem. These lists are often not complete, use obsolete or disputed names, or are wrong in other ways. I doubt that we will succeed in recruiting many taxonomists to maintain these listings. We also lack the specific structure that Wikispecies has for this kind of information and the breadth of info that WP offers.
I would think that we would want to discourage the creation of new entries with those headings and with the extensive derived and related terms lists and exploit Wikispecies' and WP's work. Linguistically, the language of the species names includes vast numbers that cannot be said to have been adopted into English, but are essentially Translingual, with components that are Latin (or latinized Ancient Greek). We can provide a useful service to WikiSpecies and to WP by handling the linguistic aspects of these names (etymology, morphology, inflection) as well as association with vernacular names. DCDuring TALK 23:30, 14 April 2008 (UTC)[reply]

(after edit conflict)

I agree that "Taxonomic names" is better than "Scientific names", but I'm not certain about either of them. Are these not Translingual proper nouns? e.g currently we have

Categories also seem all over the place, with entries in at least: category:Taxonomic names, Category:Taxonomy, category:Zoology, Category:Botany, category:Entomology and Category:Biology. Thryduulf 23:44, 14 April 2008 (UTC)[reply]

Lynx ought not be Translingual and should be lower case. I believe that the taxonomic names that appear as Translingual should all have at least the first letter of the first word be capitalized and should be proper nouns. The second part of species names should be entered as Latin, usually/always(?) an adjective, usually/always(?) uncapitalized. Some Latin-derived species and genus names have become part of English and often follow English rather than Latin pluralization. I rely on Stearns' Botanical Latin as a basic source, but haven't finished reading it yet. DCDuring TALK 00:02, 15 April 2008 (UTC)[reply]
Note that lynx exists as an English common noun for the wild cats, while Lynx is a translingual proper noun entry for the taxonomic genus.
I don't understand why the second part of a taxonomic name should be a different language to the first part? Just because it has a Latin etymology, and has a Latin homograph doesn't mean it isn't also translingual - particularly if it doesn't follow the Latin pluralisation rules. Thryduulf 00:10, 15 April 2008 (UTC)[reply]
I was deferring to my understanding of EP thoughts on the subject. The theory might be that it doesn't become Translingual until it is an officially recognized name. Until that time it is New Latin, a variety of Latin.
You are so right about Lynx. Sorry. "cycad" seems right as English, derived from Cycas, a genus. Each item would have to be checked for correctness.
If it follows English pluralization, then it would certainly warrant an English entry. If it appears in non-technical English documents, it might warrant an English entry, but the Translingual really should be sufficient.
As to categories: "Taxonomic names" serves to distinguish these Translingual from others; the discipline contexts/categories seem to be a shortcut for animal/plant/bacteria/mold/fungus/virus distinctions. cat:Taxonomy might be useful for New Latin terms used in taxonomy. That's my take, but based only on limited ill-remembered anecdotal experience and not systematic analysis. DCDuring TALK 00:43, 15 April 2008 (UTC)[reply]

My reading of the convenience samples of entries:

OK, I think
Abutilon is a redirect to abutilon, rather than being a Translingual proper noun, reducing the ratio of L2/L3-correct entries to 8 / 15. DCDuring TALK 12:36, 15 April 2008 (UTC)[reply]
Not OK
  • Ponginae: English proper noun => Translingual proper noun ("TPN")
  • chironomidae: English noun => u.c. TPN
  • Insecta: English proper noun => TPN
  • platanifolia: Translingual proper noun => Latin adj
  • accipitres: English noun => Latin obsolete taxonomic name
  • mongolica: Translingual noun => Latin adjective

All of them could stand a link to Wikispecies. Some don't even have WP links. Etymology would be fairly straightforward using a Classical or New Latin suffix and a Latin or latinized Greek head. DCDuring TALK 01:16, 15 April 2008 (UTC)[reply]

Just noting that the conversation so far looks all good to me (speaking as a trained botanist with a specialty in systematics). However, we might want to consider subdividing the Category:Taxonomic names (or whatever we call it). We probably ought to subdivide out (1) the binomials used for species, (2) names of genera, and (3) higher-level taxa. Putting the whole shebang into a single category seems, well... unhelpful. Particularly so since the lexical use and structure will differ. Species names are binomials, including a functional noun and descriptor. Genera are singular nouns. Higher-level taxa are often constructed as plural nouns, descriptions, or substantive adjectives derived from the names of genera or from characteristics of the group. --EncycloPetey 02:16, 15 April 2008 (UTC)[reply]

Regarding the categories, my initial thoughts are

The last suggestion might give us trouble in the cases where we do not have a category for the class of life or near-life (viruses?) under discussion or where the person making the entry does not have specific-enough knowledge. The less precise tag would help users in the meantime by providing a clue as to where to look for more information.
I think that there is a strong case for a specialized rfc tag ({{rfc-taxon}} ?) for this kind of entry. The text field can convey information about the issues that the previous editor had not yet resolved, with more detail always possible in the Talk page.
I still feel that we are wasting our time insofar as we are duplicating work being done by WikiSpecies. The entries already are misusing the Related terms heading. We should have the linguistic relationships (etymological, morphological, "Derived terms", "Descendants"). Maintaining the hierarchy is for Wikispecies. Perhaps we need templates that read from Wikispecies and provide their best information on next higher element and next lower elements in the taxonomic tree.
To me the vernacular name to taxonomic name mapping is a matter of great importance and value both to normal users and students of biological fields and one only partially addressed even by, say, the USDA plant database.
The use of the "Scientific name" heading is particularly troubling to me because it is so ambiguous. Is it supposed to be a synonym? A hypernym? A hyponym? A translation into Translingual ? DCDuring TALK 12:36, 15 April 2008 (UTC)[reply]
I disagree with point 3 above. We should not add all the scientific names of insects to Category:Entomology because they will overwhelm the other terms in the list. Besides, scientific names of insects are not used solely in an entomological context; they may be used when discussing evolution, ecology, botany, agriculture, strict taxonomy, etc. A Category:Taxonomic names of insects is possible, but that opens up the possibility of thousands of other similar categories that I certainly wouldn't want to have to maintain.
I agree somewhat with DCDuring. We don't want to be duplicating work that is covered on Wikispecies. However, Wikispecies does not cover the etymological origin of names and name components. That information falls under our mandate, and it is useful to be able to look up such things. Also, I would rather see the scientific name included in a definition, when it is added to a common name entry, rather than under a separate section header. --EncycloPetey 18:03, 15 April 2008 (UTC)[reply]
I absolutely agree that we need to do what Wikispecies does not do: coverage of the language that they use, Etymology, etc. I had tried to say that somewhere above. The only taxonomic entries that I don't think are worthwhile for us are the two-part (or 3-part) species names except in cases where the usage is common (Homo sapiens being the most common of these).
I also think we should try to figure out how to get the latest information on taxonomic-tree navigation (one level up or down) from Wikispecies at run-time. It might be that we can take advantage of the work Wikispecies has done to obviate the need for any kind of category structure for taxonomic names at all. Perhaps we could read from them what kingdom (?) a given taxon is in. To me the question is whether our integration with Wikispecies is at run-time (i.e., on demand) or periodic. If run-time integration can give adequate performance and transparency from a user perspective and is not too difficult technically, it would be highly desirable. But periodic (weekly, monthly, quarterly) updates (from Wikispecies (and to ???)) could be good enough. DCDuring TALK 19:30, 15 April 2008 (UTC)[reply]
Run-time integration would just substitute one problem for another. Our definitions include all major uses of a term, and not just the single most current one. So, while Wikispecies uses the current APGII circumscription of Liliaceae, our entry should have at least three definitions, each based on one of the common major senses meant in scientific literature. We also experience the problem of more than one taxon sharing a name. Some of Wikispecies' pages have a parenthetical inclusion to disambiguate names assigned to zoological and botanical groups. So, a plant and animal, or an animal and alga, may share the same name. Each will have its own kingdom, included taxa, etc. We also include obsolete terms among our entries that have no corresponding page on Wikispecies because the name isn't used anymore. We also allow for any taxonomic name at any rank, whereas Wikispecies sometimes skips levels that aren't used often (like infraorder). Wikispecies also has huge holes in its coverage. Run-time integration is not a pipe dream, but at present we have nowhere near the readiness to implement something like that. --EncycloPetey 21:47, 16 April 2008 (UTC)[reply]

I use {{British}}, especially when a reference says so. This renders as (UK), and adds category:UK, but the United Kingdom isn't Britain. It would seem more natural to refer to the language as used by people from a place, rather than within the borders of a polity. —Michael Z. 18:38, 14 April 2008 (UTC)[reply]

One major use of the UK tag is to discriminate UK-only from all-English or US usage. And linguistic place in this case is roughly equal to polity. As I understand it, UK = Great Britain = Northern Ireland + Britain; Britain = England + Wales + Scotland. I'm not too sure about Cornwall and Channel Islands. The UK tag is intended to include those places covered by the school system and English-language media located there and relates to contemporary usage that more or less covers the whole place. The languages and dialects that exist there are supposed to be covered by other tags, some of them somewhat controversial. I'm not sure how this set of tags can be improved. I'm also not sure how the English spoken in Ireland, insofar as it differs from UK English, fits into the tag system. DCDuring TALK 20:35, 14 April 2008 (UTC)[reply]
Actually, Great Britain refers to England, Scotland and Wales as a unit, and corresponds to the island of Britain (Cornwall is part of England, and I'm not sure about all of the little islands either). The UK is the United Kingdom of Great Britain and Northern Ireland, comprising Britain plus one sixth of Ireland. —Michael Z. 22:01, 14 April 2008 (UTC)[reply]
It is overly specific. The designation "UK" means that Northern Ireland is included, but the Republic of Ireland is not, and implies that the English of the former is closer to the language of the rest of the UK than to the English of the latter. But if an editor really means to be this specific, he will never use this tag, but a more specific one, e.g. {{Ireland}}, {{Northern Ireland}}, {{Ulster}}, etc.
It doesn't appear to correspond to any variety of the language, specific or general. We have British English and Wikipedia has an article on w:British English. My paper dictionary uses Brit. for "chiefly in British English..." It appears that dictionary.com, the online etymology dictionary, AHD and M-W also use "British". Is there any precedent in published dictionaries or in linguistics for a "United Kingdom English" or "UK English?"
As far as I can tell, the phrase "UK English" only appears when the country is juxtaposed with English, for example English lessons for students visiting the UK.
In the UK could be useful as a context label (not a language label) for e.g., institutions in the United Kingdom. For example, SAS means Special Air Service (in the UK) and Scandinavian Airlines System (in Scandinavia), but the abbreviation is used for both these institutions in British, American, and every other variety of English.
I think the tag text should probably be changed to Britain or British. —Michael Z. 21:24, 14 April 2008 (UTC)[reply]

And Commonwealth English (most of the world) goes where? Robert Ullmann 22:32, 14 April 2008 (UTC)[reply]

Why should it go somewhere?
I hadn't given it any thought, but there isn't really a language variety such as "Commonwealth English", especially from the point of view of labelling individual words. Wikipedia's w:English in the Commonwealth of Nations is a list of local varieties, and see also w:Regional accents of English.
Much of the language and usage is the same as its source, British English. Individual regions have developed their own features, but most of these will fall under one or two of Template:Australia, Template:New Zealand, Template:Hong Kong, Template:India etc.
Template:Canada stands out, because it is considered a category of "American English" or General American, has inherited many "Britishisms", and generally accepts much language from either. {{American English}} renders as (US), so this just means that maybe 4/5 of category:US will have to be moved to Template:North America.
Template:South Africa and Template:Philippines may be special cases of their own. —Michael Z. 23:53, 14 April 2008 (UTC)[reply]
The idea of "Commonwealth English" only applies to spellings. When we mark pronunciations, we must identify particular accents or dialects: AusE, RP, etc. --EncycloPetey 02:04, 15 April 2008 (UTC)[reply]
I hadn't even thought of pronunciations, only spelling and vocabulary. I suspect that (Commonwealth) is a synonym for (British, Canadian), but is in danger of being used to mark up terms which are really British and not Canadian.
[A review shows that Commonwealth has some issues. I'll start a new topic at #Commonwealth, below.]
Anyway, what do you think of changing the text from UK to British? —Michael Z. 02:27, 15 April 2008 (UTC)[reply]
For spelling, I think it's fine, but for pronunciations it would be inappropriate. --EncycloPetey 03:35, 15 April 2008 (UTC)[reply]
Oh, man, right. I never do listen to the pronunciations. Is it a problem because there are transcribed or recorded pronunciations with Irish accents marked up as UK? Michael Z. 03:42, 15 April 2008 (UTC)[reply]
Yes, that counts as an error. Any pronunciation specifically transcribed or recorded for an Irish English accent should be labeled with (Ireland), or something more specific like (Ulster), where appropriate. We only use UK when (1) the IPA/enPR is general to most areas of the UK, or (2) the editor isn't sure which UK accent is represented. I tend to stick with (Received Pronunciation) because I know what it sounds like (from years of watching BBC programs), but other UK regional accents are possible. --EncycloPetey 17:56, 15 April 2008 (UTC)[reply]
Then it sounds like labelling the UK recordings as British would actually be more accurate, since Irish has its own label. —Michael Z. 18:09, 15 April 2008 (UTC)[reply]
No, because "British" is not a current geographic location, unless you are assuming that the people of northern Ireland sound more like Londoners than like the rest of English-speakers in Ireland. We prefer geographic labels when a specific accent is not used, and the highest geographic label I've seen anyone use is national. --EncycloPetey 21:37, 16 April 2008 (UTC)[reply]
I'm confused. Am I getting something wrong?
UK includes Northern Ireland, so it includes English, Scottish, and Irish accents.
Britain or Great Britain are both current geographic locations which correspond to the big island, so either one includes English and Scottish. Since Britain includes many diverse accents, we prefer RP or mainstream English, and use other labels for regional speech including Scottish, Cockney, Cornish, etc.
The latter also corresponds to other dictionaries' use of the term to label senses, spellings, and transcribed pronunciation. Michael Z. 2008-04-16 21:47 Z
Sorry, I should have said "Britain is not currently a nation". We tend to prefer country names as the broadest item in {{a}}, so UK would be preferred if a specific regional accent cannot be given. "British" doesn't really improve specificity, since the island of Britain includes myriad English dialects and accents. The only way that "British" could ever be truly useful is if it is known that the pronunciation is identical throughout England, Scotland, and Wales, but is different in northern Ireland. There is little likelihood of that, and less that it would be known for certain. Even then, the question would be open in the mind of the reader about what was intended. --EncycloPetey 02:58, 17 April 2008 (UTC)[reply]

Am I the only one...

...who hates the [show]/[hide] buttons being on the left side of inflection tables? I miss the old ways! — [ ric ] opiaterein 21:38, 14 April 2008 (UTC)[reply]

No, I also prefer them on the right. (Thus starts a quick opinion poll) Conrad.Irwin 21:41, 14 April 2008 (UTC)[reply]
Right, please. "show ▼" should not look like a subheading. —Michael Z. 22:05, 14 April 2008 (UTC)[reply]
Both, with the ability to choose whichever you like via prefs. (Left by default, as I have heard several complaints about people not even knowing they could expand them; it would be nice if the entire div were clickable for expansion...) - [The]DaveRoss 22:07, 14 April 2008 (UTC)[reply]
I too prefer it on the right, as my mouse is normally on the right hand side of the page for edit links / scroll bars (when using a trackpad it is even more annoying than a proper mouse). I do like the "▼" symbol though. Based on TDR's comment above, I think a WT:PREFS setting for left or right would be ideal. Thryduulf 23:24, 14 April 2008 (UTC)[reply]
Right. In principle I prefer the left, but "show" and "hide" come out to slightly different widths, which means that the header moves a bit in a way I don't like. (Yes, I'm picky. What else is new?) —RuakhTALK 23:29, 14 April 2008 (UTC)[reply]
Right as well. (Functionality to right, layout to left) - Amgine/talk 23:33, 14 April 2008 (UTC)[reply]
Could not a "left" option place the link just to the right of the subheading, instead of next to the right margin? —Michael Z. 23:36, 14 April 2008 (UTC)[reply]
I would expect that non-editing users would be better off with the show/hide more visible (almost certainly on the left) and editing users might well prefer them on the right, where their cursor often is. I don't see why the preferences of those of us here (almost by definition editing users) should determine what non-editing users see and use. The solution of having editing users be able to select the look in WT:PREFS or, better, "my preferences" would seem to be the best of both worlds. DCDuring TALK 23:46, 14 April 2008 (UTC)[reply]
The preferences are pretty much unanimous, so wondering what hypothetical non-voters might find useful should not be an issue. A lack of data should not be used to support the alternative. I too prefer them on the right. That is where the scroll bar usually is for a browser window, so that is where I want the collapsing/expanding arrows. Putting the arrows all the way on the left side means more unnecessary hand and mouse movement. That will be true regardless of whether the person is editing or simply reading. --EncycloPetey 03:47, 15 April 2008 (UTC)[reply]

Reveal arrows are a common interface element in operating systems. On the Mac, they are a grey triangle facing right, just to the left of the heading/text, which rotates to point down when opened. They don't require brackets or "show/hide" text. Doesn't Windows use something like an elbow-down arrow the same way? I have no idea about Linuxes.

Why not just copy what readers already expect to see, instead of inventing a new interface? It may even be possible to use MSIE conditionals to give a separate presentation for many Windows users. —Michael Z. 00:15, 15 April 2008 (UTC)[reply]

w:Disclosure widget, and Mac example (PNG) Michael Z. 00:30, 15 April 2008 (UTC)[reply]
That is a good point, I think it was mentioned in the previous discussion about these - but not as forcefully. Windows, iirc, uses a [+] symbol. This code is of course copied from Wikipedia, and so it might be better to match what they do rather than try and get back to operating system level (which would not be how people expect websites to act). Maybe the sideways arrow would be better if the link is on the left, though it doesn't make sense if the link is on the right. Conrad.Irwin 01:02, 15 April 2008 (UTC)[reply]
I have implemented the idea for making the whole NavHead (grey bit) clickable, and so this is yet another option to be considered. I am fairly ambivalent as to whether it is used: on the one side it is a bigger target to hit, on the other it is slightly less obvious what is going on. It does of course solve the problem of experienced contributors having the mouse on the wrong side of the screen ;). (Note that this method removes the functionality from the [show] button, making it just a status indicator.) Conrad.Irwin 01:02, 15 April 2008 (UTC)[reply]
Forgot to mention... The idea of having the link after the text was trialled and dismissed - though that was before the whole NavHead could be clicked. <techy note>Also, there is a yucky hack repeated all over the place - "    " should NOT appear at the start of <div class="NavHead">, the necessary space is now added by CSS. See this edit for what needs fixing</techy note>
Additional notes: hard refresh if it isn't working for you yet. Also, you can put your own "show/hide" anywhere you want by setting .NavToggle { float: [left|right]; } in your monobook. You can hide it altogether with display:none;. - [The]DaveRoss 01:20, 15 April 2008 (UTC)[reply]
Having the whole bar clickable is slightly weird: we are not used to things popping open unless we click on a widget. (The exception is labels for form widgets, but they only change cursor focus.) Unexpected behaviour is not normally desirable, especially if it can change the user's context, say by unexpectedly opening a large translations section, and pushing the text they were reading down out of the window.
Frankly, we already have a very prominent and self-explanatory click target ([show▼]), so I don't think it improves things to make an invisible hot-zone about 15 times bigger.
A quibble: it also activates if I click and drag to select text in the title, and behaves a bit strangely if I double-click or double-click and drag. —Michael Z. 03:37, 15 April 2008 (UTC)[reply]

All changes have been reverted due to lack of support here. I am happy to play around with other ideas, but the current situation seems to have been annoying a lot of people ;) Conrad.Irwin 11:17, 15 April 2008 (UTC)[reply]

Thank you for your edits Conrad Irwin. I'll continue on Wiktionary:Beer parlour#"Show" tags. Best regards Rhanyeia 14:21, 15 April 2008 (UTC)[reply]
Please make it a PREF for those of us who aren't dumb like it that way. - [The]DaveRoss 20:13, 15 April 2008 (UTC)[reply]

To Wiktionarians who frequent WT:RFD and WT:RFV

First of all, thank you to the people who put in a lot of time consuming and not always interesting leg work to question, list, verify and clean up entries.

Second, I have read something like 200 RFV and RFD conversations over the past few days, and many more than that in the past, and I have a wishlist or set of tips that I think will make following discussions and cleaning up after discussions a lot easier. This will hopefully cut down on the time it takes to archive, and make less tedious the archiving tasks causing more people to be willing to do them.

  1. Please make your "vote" clear. Right at the top of whatever statement you are making is best, bold "keep", "delete", "merge", "cited" statements make it much easier to see quickly, without rereading a 30 statement discussion what the gist of the content was. Make any qualifying statements, blanket statements, policy statements or general statements after the vote, so those archiving later can skip the information not directly relating to the status of the page.
  2. Please close discussions which are completed. The best way to do this involves <s>striking through</s> the title, noting the changes which were made to the page/sense/entry at the bottom of the discussion, and removing the tags from the target page. If this process is followed it is much easier to archive the page, it doesn't even have to be done by hand.
  3. Please format your comments so they are easily delineated from others' comments. If everyone's comments are at the same level it is much harder to figure out who said what, meaning we have to reread the entire discussion. If comments are on different levels it is much easier to weigh the different opinions.

Some of these requests may seem like laziness, not wanting to have to read the entire discussion, but I think if we all try to be more clear in our intentions on these discussion pages it will be less painful for people to archive old discussions and they will be archived more often. The second benefit for clarity in discussions is that they are much more useful down the road, we archive most substantial discussions so that people can read them in the future to understand why decisions were made, and the clearer our statements the more readily future readers can utilize them. Thanks. - [The]DaveRoss 23:19, 14 April 2008 (UTC)[reply]

+1. —RuakhTALK 23:29, 14 April 2008 (UTC)[reply]
+1 and thanks to TDR for the colossal effort to close and archive. DCDuring TALK 23:46, 14 April 2008 (UTC)[reply]
I appreciate these thoughts not only to aid helpful archivers like DaveRoss, but to promote more closure on the RFD and RFV pages. Many of the discussions there are left ambiguous and unanswered, leaving a difficult and potentially controversial task in the hands of the archiver. The burden of accomplishing consensus and actually altering/deleting articles in the RFV/RFD process should fall to those active participants in the discussion and not be left to interpretation by someone archiving 100 entries. -- Thisis0 23:40, 14 April 2008 (UTC)[reply]
Suggestion: Can we create a brightly-colored icon that can be placed on discussions which ought to have been closed, but for which the outcome is still unclear? Such a bright icon, added to discussions that have languished for a month (or more) might help draw attention to users who can help. --EncycloPetey 01:59, 15 April 2008 (UTC)[reply]
something like:

{{look}}

(perhaps with less obnoxious colors) - [The]DaveRoss 02:16, 15 April 2008 (UTC)[reply]
I think the obnoxious colors will achieve the desired effect. In essence it says, "You must attend to this issue to make the ugly banner go away." --EncycloPetey 03:34, 15 April 2008 (UTC)[reply]
I think that's a good idea. How about setting it up as template:more input needed or template:unfinished discussion? Thryduulf 11:12, 15 April 2008 (UTC)[reply]
I've gone for the slightly snappier, imperative {{look}}, which adds pages to Category:Input needed (though I suspect this feature is fairly useless). Conrad.Irwin 11:37, 15 April 2008 (UTC)[reply]
I was under the impression that these would go in the discussions on RFV/D, which would list RFV/D in that category but not much else :). - [The]DaveRoss 20:09, 15 April 2008 (UTC)[reply]
That was also my understanding. We wanted to draw attention to neglected discussion within the RFD/RFV pages with an attention-grabbing template that lets folks know the discussion needs resolution. --EncycloPetey 21:33, 16 April 2008 (UTC)[reply]
As a trial I am going to start plunking this into old discussions without resolution, we will see how it goes. - [The]DaveRoss 13:59, 19 April 2008 (UTC)[reply]

IPA

Isn't the IPA for Wiktionary wrong? (the last letter) — This unsigned comment was added by Fshfsh (talkcontribs) at 22:50, 14 April 2008.

No, not for some pronunciations. --EncycloPetey 03:56, 15 April 2008 (UTC)[reply]
The IPA is godly. I might start praying to it.
But if you're talking about the thing in the upper-left corner of every page, yes. It is. lol — [ ric ] opiaterein 12:05, 15 April 2008 (UTC)[reply]

Commonwealth

{{Commonwealth}} includes Canada, but much of Canadian English spelling, vocabulary, and pronunciation differs.

Specifically, the following entries have senses or spellings categorized as Template:Commonwealth, but are little-used in Canada: alphabetise, alphabetised, appal, archæological, archæologist, brew, discretise, editorialising, fanny, first floor, heads of agreement, homoeomorphic, homoeomorphism, hospitalisation, phonograph, point, pretence, seagulling, superfund, tea cosy.

I notice some are marked up (Commonwealth, except Canada), but they still end up incorrectly included in category:Commonwealth English. Is it okay to mark these as British (UK) instead? (Does category:UK represent UK dialect of English, or UK regional context, or both?) Is there a better way to mark them up? —Michael Z. 04:07, 15 April 2008 (UTC)[reply]

If Canada is the only exception, then I think it makes sense for these to remain in the Commonwealth category. Otherwise they would have to be tagged {{UK|Australia|New Zealand|South Africa|India}} at the very least (and that would still leave out a bunch). It might be interesting to have a separate template and subcategory for Commonwealth spellings/words which are not used in Canada. -- Visviva 05:09, 20 April 2008 (UTC)[reply]
Perhaps Canada is the most common exception, since its language is in many ways a part of American English, but a regionalism from any of the Commonwealth countries is an exception.
I think paper dictionaries all refer to those as British spellings, and don't use the term Commonwealth for this reason. Michael Z. 2008-04-23 01:15 Z

(Roman) numerals: which ones to include?

Recently 70.55.85.225 (talk) has been adding content to a lot of non-standard roman numerals such as mmmmm and LLM. Some of these edits are not good, since inconsistent and wrong, but some of it is usable as well, and he gave a reference on my talk page which defends them, although nothing about double subtractions can be found on w:Roman numerals.

However, I am thinking that most if not all of these entries are sum of parts and therefore do not deserve a lemma. Of course we want C, M, and the basic ones, just like we have entries for digits, but not for numbers (or are supposed to). But as I look at it, these are a mess as well. 0–9 exist (though with very different levels of detail), but 10–19 as well, whereas 20 and on are redirects to the corresponding spelt-out pages. Someone really needs to do some cleanup here! See Category:Arabic numerals. (Note that entries such as 180 and 1337 have other reasons of existence, though there also, I’d propose to remove the ‘Translingual’ section.)

What do others think? H. (talk) 14:05, 15 April 2008 (UTC)[reply]

I don't mind including SoP entries - particularly in cases like this where it isn't necessarily obvious what parts it is made up of, however I wouldn't ask anyone to go round and create these. It seems to me that it is better to return something rather than nothing for cases like this, and if these words are deemed not necessary for inclusion I would like to replace them with {{only in|{{pedia|Roman Numerals}}}} or something useful (see XVII) perhaps linking to an appendix. As I said though, I would prefer them to have full entries if someone wants to create them. Conrad.Irwin 15:29, 15 April 2008 (UTC)[reply]

"Show" tags

This conversation has started on Wiktionary:Beer parlour#Translation bars and continued on Wiktionary:Beer parlour#Am I the only one.... The "show" tags of translation bars are now far on the right, and they were tested on the left. This led to a lot of comments and I'll continue here by commenting some of them:

"'show ▼' should not look like a subheading"   This may be true, and what I'd maybe be interested in trying next would be the "show" under the title text, but something else too as I'll write in the end of this message.

"Functionality to right, layout to left"    I think this is part of the "layout" since it includes important mainspace content. Those who don't edit are not likely to pay attention to a small "show" on the right and they don't even know they are supposed to be looking for something.

"...right. That is where the scroll bar usually is for a browser window, so that is where I want the collapsing/expanding arrows. Putting the arrows all the way on the left side means more unnecessary hand and mouse movement. That will be true regardless of whether the person is editing or simply reading."    I hadn't even thought about the scroll bar. Most software I use has most things on the left side (except the scroll bar), and because English (and many other languages') text naturally starts from the left, that's where people are looking. And there are things here on the left side too, like "save page" or "search". How about a clickable area all the way under the title text, with ▼ both on the left and on the right, and "show" on the left? Best regards Rhanyeia 14:18, 15 April 2008 (UTC)[reply]

The fundamental issue is that we (those participating in community forums) are not a good model for "normal" non-editing users. As long as this is a construction site organized for the convenience of the craftsmen rather than as a building to be used by normals while being renovated and maintained, it will not be possible to build any enthusiasm for efforts focused on the needs of normals. Even the limited amount of free-text Feedback seems to have been considered an annoyance. The lack of interest in our page-view statistics and in steps to improve our Google visibility is troubling to me. DCDuring TALK 14:57, 15 April 2008 (UTC)[reply]
Well, any wiki which tries to orient itself to the needs of non-editing users at the expense of the needs of editors is not going to get far. That's not because editors are somehow better or more important, but because ultimately any work that gets done is going to be done by editors, simply because they feel like doing it. People who want to work on usability and feedback response are welcome to, and I for one applaud the efforts that have been made so far; but I've never had much interest in presentation (on-wiki or off) and personally prefer to spend my limited time working on content. Anyway, in the present case, IMO the more critical concern is that a poor interface may cost us editors (and therefore content) in the long run, when people who want to add translations (etc.) can't figure out where they are or how to do so properly. -- Visviva 01:46, 16 April 2008 (UTC)[reply]
I like this last proposal (a bar beneath the gloss with down-arrows at either end); AFAICS it would solve the opacity issue without creating any new problems. Can we get a demo? -- Visviva 01:46, 16 April 2008 (UTC)[reply]
I wouldn't mind seeing a demo either, but if one control is wanting, then I suppose duplicating it would be twice so.
Tiger Dictionary screen
Another model is the OS X dictionary app's display of multiple references. It uses a reveal arrow at the left-hand side of a divider, but the whole divider line and label act as a control. The best thing about it is the simplicity. One widget, one rule, and a simple label, all in the same low-key colour. The content relies solely on typography to reveal its structure, with no bullets, double rules, background colours, or unnecessary labels or punctuation.
I'm not suggesting copying it verbatim, but keep removing elements until there's nothing left to remove. —Michael Z. 05:53, 16 April 2008 (UTC)[reply]

Misspelled words

I was told that it would not be appropriate to use a redirect to send people who spell a word incorrectly to the correct dictionary page. Could anyone tell me whether Wiktionary allows us to help people searching for a definition get to the right page? I'm not really looking for Template:misspelling of but just a general spellchecker to aid people looking for the definitions of words whose spelling they do not know. If a redirect is not how to do it, then how? -- penubag  (talk) 09:19, 16 April 2008 (UTC)[reply]

I have done some experimenting with using aspell to spell-check my searches (<tech>this currently works by using javascript to open an iframe to a python script on localhost which in turn uses the api to colour links correctly - so it is very inefficient</tech>). This seems to be very effective, though there are issues - particularly in that you need to guess what language was being searched for (<tech>currently guesses english + the ACCEPT_LANGUAGE header of the browser - but should be possible to get the guess to change for other scripts</tech>). There are plans for us to get mediawiki:Extension:DidYouMean which deals with diacritics etc. and so it may be possible to integrate aspell into an extension like it at a similar time. I think that this is very important, as a large proportion of our feedback has said "I can't find what I'm looking for". Conrad.Irwin 10:35, 16 April 2008 (UTC)[reply]
You are doing what used to be called God's work, Conrad. DCDuring TALK 11:23, 16 April 2008 (UTC)[reply]
I just looked at usage of "God's work". Let me make clear I meant that in a good, non-ironic way. DCDuring TALK 13:48, 16 April 2008 (UTC)[reply]
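
The "did you mean" idea in this thread can be illustrated with a minimal sketch. It does not use aspell or the MediaWiki API as described above; it swaps in Python's standard-library difflib for the fuzzy matching, and the list of entry titles and the function name suggest_entries are hypothetical stand-ins for a real lookup of existing pages.

```python
import difflib

# Hypothetical list of existing entry titles; a real tool would query the wiki.
ENTRY_TITLES = ["definition", "dictionary", "misspelling", "pronunciation", "receive"]


def suggest_entries(query, titles=ENTRY_TITLES, limit=3, cutoff=0.7):
    """Return up to `limit` entry titles that closely match `query`.

    An exact match is returned on its own; otherwise difflib ranks the
    titles by similarity and drops anything scoring below `cutoff`.
    """
    if query in titles:
        return [query]
    return difflib.get_close_matches(query, titles, n=limit, cutoff=cutoff)


if __name__ == "__main__":
    # A misspelled search term is mapped to the closest existing titles.
    print(suggest_entries("recieve"))
```

This approach leaves the language-guessing problem mentioned above untouched: the sketch assumes a single list of titles for one language.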

context...sense...qualifier...italbrac...a...

Are all of these really necessary? Do we really need more than 2, maybe 3 of these? — [ ric ] opiaterein 12:19, 16 April 2008 (UTC)[reply]

Italbrac does less than the others, I think. I think they are intended to have mnemonic names suggestive of their application, even if they don't do things very differently. But, as I understand it, context, for example, puts things in categories and also creates a list of candidate context categories. To me italbrac is the one that has been superseded. But I suspect it would not be wise to automatically replace it with one of the newer more specific tags. It is tedious to hand-check each one, so it doesn't seem to be a high priority to replace it so that the apparently redundant template can be deleted.
Maybe we need to have template categories, like:

Encouraged, Standard, Permitted, Deprecated, To Be Removed, Experimental. Encouraged might be rendered ultra-accessible. Deprecated would be removed opportunistically and earn the scorn of an editor's peers if used. "To Be Removed" could have lists and a project page to encourage removal. Maybe we already have or have had such a system. Perhaps it has been found wanting. DCDuring TALK 13:46, 16 April 2008 (UTC)[reply]

This seems plausible, though it might lead to some unnecessary political drama if imposed too rigorously. But let's not forget that a) this is an open wiki, where overt structure is frequently harmful; b) in the absence of real policies, templates often fill this role de facto. Nobody seems to mind when I and others use Template:quote-book in entries, but I imagine EP (and perhaps others) would have a fit if I tried to mark it "encouraged." Likewise I would have a fit if someone marked it "deprecated" (without a better replacement) or "experimental."
I think we do have a deprecated template category somewhere, and possibly one for "subst-only" (which is a very important category, since templates of this type often to be orphaned). If not, those should certainly be created; "experimental" too. For "encouraged" one would want to see some sort of non-bureaucratic approval process; a flash poll on the BP/GP, maybe? -- Visviva 02:53, 17 April 2008 (UTC)[reply]

{{context}} is used to provide labels before the definition, and if the appropriate template exists it will categorise etc. {{qualifier}} is used to qualify items in lists; it is often used next to links to alternative spellings, in translation tables etc.

{{sense}} is used under the Synonyms and Antonyms sections to refer to the sense/definition to which the listed terms apply. {{italbrac}} as mentioned is mainly redundant. Hope this helps--Williamsayers79 18:15, 16 April 2008 (UTC)[reply]

The talk pages of these templates are usually a good place to start if you want to know what they do. Some of them, however, may need their documentation updated.--Williamsayers79 18:17, 16 April 2008 (UTC)[reply]

I don't use {{qualifier}} myself, but all of the others are definitely necessary (though {{italbrac}} may not be used much anymore). Although the output will look very similar for most of these, the function and location for their use are very different. Each exists so that, in the event we decide to format a particular section differently, we need only adjust the appropriate template. They also exist so that users can customize display for certain sections. Using the separate templates for the different functions keeps format and customization of different sections separate, rather than forcing a single format for all of those locations. --EncycloPetey 21:28, 16 April 2008 (UTC)[reply]

Checking my own understanding: {{italbrac}} is designed solely to give users who are "in the know" the option of viewing given text one way or another based on personal preference, instead of having the text "hard-coded". Normal users would see it in the default setting. The same capability is a by-product of the other templates (but I hope the primary justification is what EP suggests: flexibility in altering presentation for the benefit of actual end users). There are analogous templates for controlling appearance and functionality in etymologies {{etyl}} and {{term}}. {{term}} is actually supposed to be used when a term is referred to within text like usage notes, excluding definitions. It is particularly useful for non-English terms, especially non-Roman scripts. I would interpret all this as meaning that the more specific templates are to be preferred over both hard-coding and {{italbrac}} and "italbrac" is preferred over hard-coding. It would seem to suggest that we will have "italbrac" with us for a while. DCDuring TALK 22:02, 16 April 2008 (UTC)[reply]

That seems about right. Some background reading: 2007: discussion leading to qualifier, 2007: qualifier is born, 2006: italbrac discussion. Frankly I'm still not sure why we are still using {{italbrac}}, except for inertia; all the standard cases where italics and parentheses would normally be required are already covered by specialized templates. The problem being, as mentioned above, that each case needs to be hand-checked. -- Visviva 02:53, 17 April 2008 (UTC)[reply]

List of descendents

What if we added a parameter to the Term template like |desc=true|en to populate a list of descendants of a word? For example, on sandal the word σανδάλιον would make a list (somewhere) and include English: sandal. I have no idea if this is even possible or how it would be done, just putting it out there as something that could be pretty interesting. Nadando 23:38, 16 April 2008 (UTC)[reply]

We already have a Descendants section. I'm not sure how such a parameter would work, since it would have to specify a language, yes? --EncycloPetey 02:50, 17 April 2008 (UTC)[reply]
Yeah, that's why I put |desc=true|en or something like that. Nadando 02:52, 17 April 2008 (UTC)[reply]
That sounds difficult or impossible to work. There would have to be a way that the descendant words were all marked (and many entries currently have no etymology at all!). Nothing we currently have does that. I think this would more easily be coded into the kinds of tables we currently have. --EncycloPetey 03:06, 17 April 2008 (UTC)[reply]
To be done natively within MediaWiki this would require some sort of fancy-shmancy (and unapproved) extension. But I suppose some sort of automated script (that would sift Special:Whatlinkshere for links from Etymology sections preceded by an appropriate etymology template) could do a good deal of this work. Visviva 13:22, 17 April 2008 (UTC)[reply]
That would certainly be worth a shot. I'd suggest, though, that instead of simply generating Descendants, it should also tackle Derived terms. After all, Derived terms are simply Descendants, but in the same language (at least the way we have the sections defined). So, the bot would need to know both the source and target language, and compare them. --EncycloPetey 21:40, 17 April 2008 (UTC)[reply]
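A rough sketch of the kind of script discussed above, in Python. It assumes etymologies mark their source words with {{term|word|lang=xx}}, and that the entry list (title, language code, wikitext) has already been gathered from a dump or from Special:Whatlinkshere; the function names and the input format are hypothetical, and real entries would need far more careful parsing. A source word in the same language as the entry is filed under Derived terms, otherwise under Descendants.

```python
import re
from collections import defaultdict

# Matches a simple, non-nested {{term|...}} call (assumed template syntax).
TEMPLATE_RE = re.compile(r"\{\{term\|([^{}]*)\}\}")
# Captures the text of the first Etymology section up to the next heading.
ETYMOLOGY_RE = re.compile(r"===\s*Etymology[^=]*===(.*?)(?=\n==|\Z)", re.S)


def etymology_terms(wikitext):
    """Yield (source_word, source_lang) for each {{term}} call in the Etymology section."""
    match = ETYMOLOGY_RE.search(wikitext)
    if not match:
        return
    for call in TEMPLATE_RE.finditer(match.group(1)):
        parts = call.group(1).split("|")
        positional = [p for p in parts if "=" not in p]
        named = dict(p.split("=", 1) for p in parts if "=" in p)
        # Entries without an explicit lang= cannot be classified, so skip them.
        if positional and "lang" in named:
            yield positional[0].strip(), named["lang"].strip()


def build_index(entries):
    """Map (source_word, source_lang) to the entries that derive from it.

    entries: iterable of (title, lang_code, wikitext) tuples.
    """
    index = defaultdict(lambda: {"Derived terms": [], "Descendants": []})
    for title, lang, text in entries:
        for word, src_lang in etymology_terms(text):
            # Same language: a derived term; different language: a descendant.
            section = "Derived terms" if src_lang == lang else "Descendants"
            index[(word, src_lang)][section].append((lang, title))
    return index


sample = [
    ("sandal", "en",
     "==English==\n===Etymology===\nFrom {{term|σανδάλιον|lang=grc}}.\n\n===Noun===\n# A shoe."),
    ("frogman", "en",
     "==English==\n===Etymology===\n{{term|frog|lang=en}} + {{term|man|lang=en}}\n\n===Noun===\n# A diver."),
]
index = build_index(sample)
```

Entries with no Etymology section, or with untemplated etymologies, are simply invisible to this approach, which is the limitation raised earlier in the thread.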

en-adj (not comparable) link

Can the wikilink from the (not comparable) option in the en-adj template link to Appendix:Glossary#comparable instead of the more general wiktionary entry? -- Thisis0 07:01, 17 April 2008 (UTC)[reply]

I've been bold and changed it per your suggestion. I've also changed the target of the "comparative" and "superlative" links to the same glossary entry. Thryduulf 11:56, 17 April 2008 (UTC)[reply]
I've made the same edits to the {{en-adv}} template, the {{en-noun}} template already links to the glossary. Are there any others that would benefit from this? Thryduulf 14:24, 17 April 2008 (UTC)[reply]
Good. Now all we need to do is work on the Appendix language about "the controversy", that is, vestigial prescriptivism. DCDuring TALK 14:38, 17 April 2008 (UTC)[reply]
What do you mean? what do you want addressed? -- Thisis0 18:58, 17 April 2008 (UTC)[reply]
The {{comparable}}, the sense-line tag, important for long entries, would benefit from the same link. I believe that {{not comparable}}, {{countable}}, {{uncountable}} (Should that be displayed as "not countable"?) have links. I forgot to check whether all the links are to the appendix glossary. DCDuring TALK 14:50, 17 April 2008 (UTC)[reply]

Other project sidebar links

Where the target page in another project has the same title as the Wiktionary page the sidebar displays just the name of the other project (e.g. at frog). Where the target page has a different name, the box-style templates (e.g. {{wikipedia}}) display just the project name, the lite templates (e.g. {{pedialite}}) display the project name and the title.

Where more than one page is linked to on another project this can be very confusing, for example:

pantograph

go

Lynx

I can see two ways around this - the first is to display the target page name, even if it is the same as the Wiktionary page title. The second is to allow a parameter that contains custom text to display as the name. Thryduulf 15:24, 17 April 2008 (UTC)[reply]

Are you talking about "in-text" (aka "in-line") or "in-list" links? Have you looked at WT:LINKS? DCDuring TALK 16:02, 17 April 2008 (UTC)[reply]
I'm talking about the "in other projects" links to Wikipedia, Commons, Wikispecies, etc in the sidebar. The lists above are copies of what appears in these boxes in the pantograph, go and Lynx entries. What I'm saying is that we need to change how these links are displayed so they are less confusing. Thryduulf 22:34, 17 April 2008 (UTC)[reply]
The problem is that there is not really enough space for any more information, see Template_talk:wikipedia2 where I have included some longer iwiki links. I'm really not sure how to make this less confusing. Conrad.Irwin 00:14, 18 April 2008 (UTC)[reply]
Look at Afar. For this page, the main link is the disambiguation; other links have specific names. Is that what you mean? If so, I think it best to link to the disambiguation in such cases, and only link to more specific pages via pedialite if there are specific pages with direct connection to specific definition senses. --EncycloPetey 00:49, 18 April 2008 (UTC)[reply]
I agree. —RuakhTALK 02:03, 18 April 2008 (UTC)[reply]
That sounds good, but for entries like pantograph there are two Wikipedia articles, one at w:Pantograph the other at w:Pantograph (rail). What I'm saying is that the link to w:Pantograph currently appears in the sidebar as "Wikipedia" and the link to w:Pantograph (rail) appears as "Wikipedia: Pantograph (rail)", but that they should appear as "Wikipedia: Pantograph" and "Wikipedia: Pantograph (rail)".
At Lynx the links appear as "Wikipedia", "Wikipedia", "Wikipedia: Lynx (disambiguation)" and "Wikispecies". They would be much better as "Wikipedia: Lynx (cat)", "Wikipedia: Lynx (constellation)", "Wikipedia: Lynx (disambiguation)" and "Wikispecies: Lynx". Thryduulf 10:02, 18 April 2008 (UTC)[reply]
I'm not so sure. I think it would be better to have only the sidebar links for wikipedia:Pantograph ("Wikipedia"), wikipedia:Lynx (disambiguation) ("Wikipedia"), and wikispecies:Lynx ("Wikispecies"). I think the sidebar links should just point to the project with more information, and that project can help its reader navigate around. (I'm not sure if it's best to try to implement this in the wiki-code for the templates, or in the JavaScript that produces the sidebar links.) —RuakhTALK 12:24, 18 April 2008 (UTC)[reply]
I'm inclined to agree with both of you. That is, it would be nice to be able to suppress sidebar placement, but when present, sidebar links should clearly identify their destination (at least when != w:PAGENAME). As seen in the Lynx example, {{PL:pedia}} seems to behave in the desired way while {{wikipedia}} does not; this is governed by the stuff in the "interProject" span at the end of the template. I can't see any reason why the two templates should behave differently in this regard. -- Visviva 13:46, 19 April 2008 (UTC)[reply]
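The two labelling rules discussed in this thread can be sketched in Python as below. This is only an illustration of the logic, not the actual template or JavaScript code; the function name and the always_show_title switch are hypothetical, and ignoring first-letter case when comparing titles is an assumption made so that w:Pantograph counts as the same title as the entry pantograph.

```python
def same_title(target, pagename):
    """Compare titles, ignoring the case of the first letter only (an assumption)."""
    def normalise(title):
        return title[:1].lower() + title[1:]
    return normalise(target) == normalise(pagename)


def sidebar_label(project, target, pagename, always_show_title=False):
    """Return the text shown for a sidebar link to another project.

    With always_show_title=False the title is shown only when it differs
    from the entry name; with True it is always shown.
    """
    if always_show_title or not same_title(target, pagename):
        return f"{project}: {target}"
    return project


# Labels for the pantograph entry under each rule.
current = [sidebar_label("Wikipedia", t, "pantograph")
           for t in ("Pantograph", "Pantograph (rail)")]
proposed = [sidebar_label("Wikipedia", t, "pantograph", always_show_title=True)
            for t in ("Pantograph", "Pantograph (rail)")]
```

Under the first rule a link whose target matches the entry name still appears as a bare "Wikipedia"; the second rule always names the destination, at the cost of the extra sidebar width mentioned above.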

What happened to the main page?

What happened to the main page? RJFJR 15:51, 17 April 2008 (UTC)[reply]

Has been restored. The renegade admin has been de-sysopped by a steward (thanks Spacebirdy!) Robert Ullmann 16:27, 17 April 2008 (UTC)[reply]

I suspect that the loss of images may be related. DCDuring TALK 13:19, 18 April 2008 (UTC)[reply]
His deletion log doesn't show any recent image deletions that were unreasonable. What surprises me is that we are not using local protected copies of the Commons images on the main page. It doesn't look like they're protected on Commons either... Mike Dillon 15:07, 18 April 2008 (UTC)[reply]
It might have been a temporary problem at Commons or even more specific to me. It would only have taken a rename/move at Commons, which could be under another name. An incident always makes one a little skittish. But it is a vulnerability, not that a wiki won't always have plenty of them. I don't have the mindset for security. I appreciate whatever protection can be afforded our efforts, as long as there isn't excessive inconsistency with the fundamental philosophy. DCDuring TALK 16:12, 18 April 2008 (UTC)[reply]

FL links in translations

I've updated the CSS and the {t} templates for these. The appearance should be improved on most browsers (but I can only test a few); the links will have less effect (if any) on line spacing, and on IE's irritating habit of vertically centering a line with a superscript on the bullet. I may have made them just a little too small, tell me? (It has to do with how many pixels, so on my screen 70% and 75% are one pixel different, others may see 0 pixels difference.)

The result is that you can now do a whole series of customizations. You can change colours, font size and font, adjust the degree of super-scripting (including none), leave off the parentheses, or suppress the links entirely.

On all current browsers except Internet Explorer, you can also replace the parentheses with (e.g.) brackets, add symbols, or replace the language code with symbol(s).

See Customization at template {{t}}. Robert Ullmann 15:51, 18 April 2008 (UTC)[reply]

Inactive Sysops

I just did a quick, informal audit of Wiktionary sysops. I was looking at the total number we had (75) and it seemed high. The reason that it seemed high is that 11 (~15%) of our sysops are relatively (or completely) inactive. We have, in the past, removed sysops who were no longer active on the project, and I was wondering if the time had come to do so again. If not removing the sysop flag, perhaps removing them from the list on WT:A, which gives the impression that we have a lot more help than we actually do. Here are the sysops which I found to be "inactive" by my own impromptu standards:

  1. User:Ortonmc - diff declaring he no longer wanted to be an admin.
  2. User:Jun-Dai - 4 edits since 2006, not sure about interest in the project any longer.
  3. User:Tawker - 1 edit in past year, not sure about interest in the project any longer.
  4. User:Kipmaster - Not much use of the tools, we can ask easily enough about interest.
  5. User:Psy guy - no edits since 2006
  6. User:Aulis Eskola - ~12 edits in the past year, not sure about interest in the project any longer.
  7. User:Andrew massyn - inactive since May 5 2007.
  8. User:Pathoschild - not terribly active here anymore, easy to ask about interest.
  9. User:Alhen - not very active here, still active on es.wikt, easy to ask about interest.
  10. User:Tohru - 5 months inactive, not sure about interest.
  11. User:Enginear - ~1 year inactive, not sure about interest.

Now, lest anyone say so, I have nothing against any of these people, I quite like all of them who I have interacted with. My primary concern is that we actually do need more sysops, and right now it looks like we have a lot more than we really do. I consider the sysop flag something which indicates a participation and commitment to the project (I know this isn't shared universally) and would like to see more active contributors flagged to help out, but when folks are done I think it is also a good idea to remove the flag. This is certainly not an all or nothing thing, there are varying degrees of inactivity amongst the people I have listed, but I am interested to hear thoughts on what is in the best interest of Wiktionary. I think the ideal here would be to come out with a clear idea of what we consider inactivity, what we consider a standard practice when a sysop is inactive, and then apply it now and in the future. - [The]DaveRoss 22:03, 18 April 2008 (UTC)[reply]

Perhaps, instead of removing the sysop flag - which I see no reason to do if we want more sysops ;), we could just remove them from the main list at WT:A and use that as our count instead - there is no need to rely on the software's counter. For me, I would say that inactivity sets in after 6 months of no edits. When people declare that they no longer wish to be sysops they should have the flag removed. Conrad.Irwin 22:17, 18 April 2008 (UTC)[reply]
Agree with Conrad.Irwin. I see little harm in allowing inactive sysops to retain their flags, at least until we see evidence that this is dangerous. However, it is nice to have an accurate list of active sysops. What might also be nice is if one of our technical folks could write a dynamic list of sysops active within the previous five minutes, so that folks looking for an active admin at the moment (to block a rampaging vandal, ask a question and get an immediate response, etc.) could find one. -Atelaes λάλει ἐμοί 22:29, 18 April 2008 (UTC)[reply]
I am curious as to why inactive sysops ought to retain the flag? It is my understanding that the sysop flag is not a merit badge, it is a set of tools entrusted to certain editors in order to help the project proceed. People who don't edit Wiktionary anymore don't need the tools. - [The]DaveRoss 22:51, 18 April 2008 (UTC)[reply]
Simply because removing them seems like a waste of effort - and, should they pop back one day, then we'd suddenly have more sysops without having to wait for whatever procedure to give them the tools back. The tools should be given to anyone we can trust to use them properly, not just to those who can demonstrate that they are using them. As a period of inactivity does not change a user's trustworthiness (we've been through the compromised account arguments before, it is exceedingly unlikely) it should have no impact on the tools. Conrad.Irwin 22:58, 18 April 2008 (UTC)[reply]
The "trustworthiness" metric is an interesting one...some of the older sysops were simply appointed, others got 3-4 votes...this is neither here nor there, but it isn't as if there were sweeping mandates, just a confirmation that folks knew and trusted them. As it is, people who have been inactive for more than a year can hardly be expected to step in and know all the current policy (especially since we don't really write it down). Moreover, I think "sysop for life" adds to the notion that being a sysop is some kind of honor, it should be a set of tools that people have when they need them and don't when they don't. Several retirees have voluntarily dropped the flag when their work was done, others seem to just have disappeared. - [The]DaveRoss 23:12, 18 April 2008 (UTC)[reply]
I agree with both points of view. I think it logically makes sense to remove an unused sysop flag after a while, especially since (pace Conrad) there is a bit of a risk of compromise now with the advent of unified accounts; but it just doesn't seem worth the effort to develop formal criteria (or vote on each individual admin) and petition stewards accordingly. —RuakhTALK 04:31, 19 April 2008 (UTC)[reply]
Actually avoiding a bunch of votes is what I had in mind, I figured if we can just say "one year without using any sysop tools indicates inactivity" or some similar basic criteria then we wouldn't have to vote individually. That was basically the criteria we have used in the past. It is also of note that none of the people who have left for that length of time have ever come back, Kevin Rector was pretty close but never actually had his tools removed. - [The]DaveRoss 13:55, 19 April 2008 (UTC)[reply]
<detab> At the heart this is a suggestion for a change of community trust metric. Some projects have chosen to address this with success by either an inactivity limit (if you are inactive x amount of time, the bits are removed) or your position as admin is scheduled for a community reconfirm vote x amount of time after your successful RfA. There may be other solutions as well. - Amgine/talk 14:15, 24 April 2008 (UTC)[reply]
I think what would be good would be something like what Commons has introduced recently, whereby if a user doesn't use any admin tools for X amount of time (6 months I think there), then they are given a note on their talk page that their adminship is under review. If after a further month they still haven't become active again (or given a good reason why they still need the tools), then they are de-sysopped. If they become active again, they can reapply for adminship with a much lower threshold - I think the intention is that they get it back if nobody objects within a few days. If someone does object then the application reverts to a standard period and threshold RFA. Thryduulf 15:37, 24 April 2008 (UTC)[reply]
I have divided the admin list by activity, feel free to modify as needed: [14]. Dmcdevit·t 23:02, 18 April 2008 (UTC)[reply]

Display of attributive use of nouns

We often have nouns that are used attributively, but which do not seem to warrant an entry as an adjective because they do not have enough attributes of an adjective. amazon is an illustration with an unresolved RfV. We need, IMHO, some way of indicating that a noun can be used as an adjective. I would have in mind its use not necessarily for all nouns, but at least for those that have gone through an RfV process that has determined that the adjectival use of the noun does not warrant an adjective PoS. This might discourage the reentry of the adjective PoS and provide helpful information to users.

MW3 indicates such usage by : "often attrib".

Some options I can imagine are:

  1. No Adjective PoS header
    1. inflection line label along the lines "(often|sometimes|rarely) used as attributive adjective"
    2. a "Level 4" heading under a Noun heading: "Adjective use", with text as above and usage examples
    3. a link to citations page with a standard heading on the citations page for attributive use of a noun.
    4. a label on the definition line (per Visviva)
    5. a templated usage note (per Visviva)
  2. Adjective PoS
    1. a standard bit of explanatory text on attributive use of nouns and a link to an Appendix or WP article with more.

Has this been addressed before? Was it resolved in the negative or left open? DCDuring TALK 17:58, 19 April 2008 (UTC)[reply]

And also possibly for those that are often used attributively (or better - for all nouns), when there is no corresponding English adjective (in -ic, -an...) meaning "of or pertaining to <noun sense>", to have an optional adjectival translation in noun's ====Translations==== section. Quite a lot of FL adjectives don't link properly to base English forms because of that, and have "of or pertaining to" stubs. It would be much easier to use [[noun]] (''attributively'') instead (or standardise this typical usage via some template). --Ivan Štambuk 18:12, 19 April 2008 (UTC)[reply]
At the moment I would tend to favor option 1.4, a label on the definition line. A label on the inflection line is problematic, since one sense may have a much stronger attributive tendency than another. Actually I prefer option 1.5, a templated usage note in the noun section, but this has met with opposition, apparently on the grounds that too many electrons would be consumed. ;-) -- Visviva 05:04, 20 April 2008 (UTC)[reply]
I have added the suggestions to the list above. DCDuring TALK 09:53, 20 April 2008 (UTC)[reply]
I like the idea of a templated usage note, particularly since attributive use may apply to more than one sense of the noun. The templated note could include a link to an Appendix:English nouns, specifically to a section on attributive use. I would also include a link to a special section of the Citations namespace page associated with the entry, demonstrating attributive use. --EncycloPetey 15:25, 20 April 2008 (UTC)[reply]
I like 1.4 and 1.5, and see no need to choose between them. The more, the merrier. :-) —RuakhTALK 16:13, 20 April 2008 (UTC)[reply]
To clarify and develop 1.5 a little further: Under the heading "Usage notes", under the "Noun" PoS, we would have a template available (not mandatory) for insertion containing a link to the Appendix section referred to by EP, with text that said "Used attributively as an adjective", with attributively being the link word to the Appendix section.
Please feel free to suggest modifications, radical or minor.
One of the advantages of keeping the adjective PoS header is that a user who was looking for a usage that seemed adjectival for a word that had many Noun definitions would be able to click on Adjective in the ToC and go right to an abbreviated section that referred the user to the noun section with the templated explanatory text. Putting something on the Noun inflection line is better than forcing the user to page down for a usage note that the user didn't know s/he needed, but doesn't appear on the first screen. DCDuring TALK 16:46, 20 April 2008 (UTC)[reply]

I'm glad to see this brought up again. Previous attempts at a solution here, here, and here. It's clear to me that these nouns are not in any way adjectives, and referring to them as such doesn't cut the mustard. To me the test is if they are exchangeable with other adjectives. Her "voluptuous, attractive, amazon physique" could not be rendered "her attractive, amazon, voluptuous physique." If it could, and make sense, it's made the crossover to adjective. You read "amazon physique" as a unified noun phrase. That's what it is. Not an adjective. The Germans usually make one word out of it -- that's another test. Other obvious tests that go along with this are the predicative: ("Her physique was amazon.") and comparative forms ("more amazon") that aren't figurative/humorous/a colloquial slip of the tongue for "amazonian". As for a solution, see what I did at satellite, senses 5 & 6. This thing has a name, Noun adjunct. I once thought a templated usage note was the best idea, but the Noun Adjunct tag (preferably blue-linked to its own appendix with thorough explanation) is the most appropriate to our format while getting to the most truth. -- Thisis0 16:53, 20 April 2008 (UTC)[reply]

I don't think of a PoS header as an assertion of what something "is" (although that might depend on.... Oh, never mind.). An Adjective PoS header that was immediately followed by an explicit assertion that the word should not be considered a true adjective would probably serve to prevent users from going the wrong way with it.
The term "noun adjunct" does not have the advantage of being widely understood without clicking on a link (which link is not yet present at satellite).
It really depends on whether we are trying to create a dictionary that is primarily intended to be a map of linguists' current understanding of language or something helpful to learners and non-linguists, antiquated concepts and all. I think we have to build more bridges to the benighted minds of the mythical anon users, about whom we know so little, but who are the source of future registered users and contributors and the ability to win funds from users and grant-givers. DCDuring TALK 17:24, 20 April 2008 (UTC)[reply]
An erroneous adjective section only reinforces the idea that these are, or might be, adjectives. There is clearly confusion surrounding this issue. It's our job to properly categorize and define language so it can be properly understood. "Dumbing it down" for the mythical anon is not at all productive, accurate, or desirable. We aim to have clear, usable definitions that inform the most casual user, and also more informative categories and tools for those who care to make one click. The limits of this dictionary won't be set by the least concerned user. If you fear offending him, why are the esoteric multi-Latin Etymologies at the top of every entry? Certainly that is more oblique at first glance than a Noun Adjunct tag or, on another topic, plurale tantum. -- Thisis0 17:43, 20 April 2008 (UTC)[reply]
I would not object to hiding the Etymology and Pronunciation sections under show/hide bars or moving them out of precious first-screen space. It is hardly a question of "dumbing it down" to treat the mind of our archetypical user as something other than a tabula rasa. I don't think of users who haven't spent much of their life on grammar as dumb even when they don't speak or write to my taste. We need to accept the realities of their prior education and other experience.
The vast majority of users think of "Adjective" (when forced to think of it at all) as meaning modifier or describer of a noun. They do not differentiate attributive vs. predicative usage, comparability/gradability, let alone other more subtle attributes. I don't see what beneficial goals we achieve by adding the additional conditions if by so doing we limit a user's access to the most basic information that might be sought. It should be clear from Feedback that users don't find all that we do helpful. DCDuring TALK 18:29, 20 April 2008 (UTC)[reply]
Can I tell you what doesn't make any sense in what you just said? First you say, (unafraid to use an esoteric term, I should point out) that we would do best to approach our users as a tabula rasa ("blank slate"), but then you are wanting to operate on a premise that they do have preconceived notions that 'noun modifiers must be adjectives', etc. Which is it? I agree with 'blank slate', believing we should impart complete and correct information. Second, you twisted my intent for "dumbing it down", assuming I somehow said non-grammar-philes are "dumb." No way. On the contrary, my point is that average users are intelligent enough to digest an accurate grammar tag, and we should not ever assume they are "too dumb to get it." This seems to be your assumption.
Problems with your solution (Adjective POS with note saying "it's not really an adjective"): 1) That's dumb. 2) If a casual user has any defining characteristic, it's a tendency to glance or skim; a misleading adjective POS for a non-adjective is wrong. 3) It takes up a lot more room. 4) It separates definable noun senses from the Noun POS. 5) It doesn't make any more sense to a non-interested user than an appropriate, succinct tag. Yes, there is no Appendix:Noun Adjuncts currently, but there will be. This approach (as in satellite senses 5 & 6) is accurate, succinct, non-intrusive to the uninterested user, and educational to the interested user. Please, please, can the conversation be about this, and not about how we should make this place cater to the least user. That's what happened discussing plurale tantum, and I do not want this one to fizzle out 'cause it just turns into you and me bantering about how you want to appease the least user. I really want to hear what others think about the proposed solutions. -- Thisis0 19:37, 20 April 2008 (UTC)[reply]
I believe that it is our job to cater to "ignorant", impatient users first and other users (or the same users when they have more time) later. Esoteric terms seem fine for this forum, not for the target or our basic entries' first screens. If we are any good at language, we should be able to figure out how to "dumb it down" without doing violence to a deeper and more subtle understanding.
I think our users' pre-existing understandings are the facts of life that we must accommodate to serve a non-elitist version of the mission of WMF. The first skim of the ToC is the first place that we can lose users of our longer entries. If they are looking for something that behaves a lot like an Adjective in the most central way (modifying nouns) and don't find Adjective, they will most likely go to another dictionary and be annoyed at us. Neither outcome will increase the chances that they will click on Wiktionary again. Like you, I had thought we ought to be able to count on our users to know enough about the language that we could completely dispense with an Adjective PoS for Nouns where the only adjectival use was as an attributive. But, 1., seeing that contributors often insert Adjective PoS sections after Noun sections and, 2., examining dictionary definitions of adjective have led me to question my own beliefs and preferences. My thought about using an Adjective PoS was that we could direct users from an Adjective PoS heading to both the Noun (for definitions) and to a helpful explanation of attributive use of nouns. I simply don't see how that inherently constitutes a problem. It might not be the best solution, of course. DCDuring TALK 20:15, 20 April 2008 (UTC)[reply]
I think a separate 'Adjective' PoS would be unnecessary. As long as we mention somewhere (definition line, or usage notes) that it can act similarly to an adjective our entry should jibe with what the user was maybe expecting. As for preempting users from creating an 'Adjective' PoS, well as long as we use standard template(s) we should be able to flag entries that have an attributive noun sense, and a separate Adjective PoS, and someone can cleanup afterwards. That being said, I like the combo of a definition-line 'context'-like tag/template and a templated usage note. To make sure users understand the entry, we can worry about the exact wording later. --Bequw¢τ 20:10, 20 April 2008 (UTC)[reply]
I don't think that contributor creation of an Adjective PoS is a problem that has to be controlled and corrected as much as it is a concrete demonstration of how non-expert users look at PoS. It seems that if they know a word to be used in an adjectival way to modify a noun, then they think that a dictionary ought to show it as an adjective. If someone has facts that say, for example, that non-expert users have a category called two-word nouns and do not expect that the first of the two words is likely to be in a dictionary under Adjective, then I could put my concern to rest. I would even settle for a good sample of what ESL and grammar books would say about attributive use of nouns.
My Longman's DCE (for learners) doesn't note attributive use in individual entries at all. My MW3 (unabridged, US) has a generous number of nouns marked often attrib immediately after n as well as having separate entries for near-SoP phrases like "beer hall". DCDuring TALK 00:26, 21 April 2008 (UTC)[reply]
I agree that the creation of Adjective POS headers is a sign of a problem with our current approach (as are some of the messages received on WT:FEED), and will be a useful metric for any solution. But note that the current approach is to have no special notice of attributive use. If we start using usage notes and/or labels, particularly ones that contain the word "adjective" somewhere, I expect that user confusion (and the ensuing creation of spurious Adjective sections) will drop substantially. The proof of the pudding will be in the eating. -- Visviva 06:06, 21 April 2008 (UTC)[reply]
So, a usage note in the noun PoS is one thing that we might be able to agree on. It doesn't seem to require a vote, AFAICT. To generate a real test, we would probably need to find numerous entries of the following classes:
  1. Noun PoSs that have Adjective PoSs under the same heading (to prevent senses being added in addition to the existing presumably appropriate Adjective senses).
  2. Noun PoSs that have had Adjective PoSs added under the same etymology which Adjective PoS has been removed.
Is the version of 1.5 laid out above after Ruakh's comment the best we can do? I wish I felt that we had a real metric: an actual share of additions of new Adjective PoS sections to English Nouns as a percent of total new PoS creations as well as a listing of the entries involved so we could make sure there wasn't too much large-scale irrelevancy. Can we flag entries that are having new Adjective PoSs added to existing noun PoSs (same English Etymology)? DCDuring TALK 10:36, 21 April 2008 (UTC)[reply]

You are all aware that the translations of these "noun adjunct" senses of nouns will actually be adjectives in most languages, even in Old English (the ones that have usually retained genders, have distinctive adjectival inflection etc.) ? --Ivan Štambuk 07:20, 21 April 2008 (UTC)[reply]

Actually, look at the current translations at satellite. Other languages have different inflections for the compound-forming nouns, but they don't usually become adjectives. That's actually part of the reason I favor calling them what they are. Other languages know they are nouns, and actually have an inflection case for compound-forming nouns. Yes, some languages will have these as adjectives, but words that are nouns in English should be called that, and you'll find many other languages agree. -- Thisis0 07:30, 21 April 2008 (UTC)[reply]
Well, I can tell you that every single English "noun adjunct" translated in Slavic languages (usually with -ni/-ski suffix) would be a classifier-type adjective, that Czech translation included. Lexical content of a first noun is used as a qualifier for the second noun, and every language that 1) has the abovementioned properties 2) favours adjective-noun vs noun-noun constructs (that is, not like modern German) would pretty much always use adjectival translation. Tbot-generation of entries from translation tables would have to be disabled for these "noun adjunct" senses. --Ivan Štambuk 07:49, 21 April 2008 (UTC)[reply]
Why? Because the part of speech doesn't match between languages? If we did that, then we wouldn't be translating the names of languages. Translations are about translating, and understanding that the grammar in the target language may very well be different. Besides, in most cases there will not be a separate "noun adjunct" sense. A separate sense for "noun adjuncts" is only useful when the sense (when used as an attributive) is more specific or limited than the noun in general. --EncycloPetey 12:32, 21 April 2008 (UTC)[reply]
yes, and also when it is used frequently in noun compounds (dairy, chicken), and when there would be any potential confusion or desire for an Adjective section. -- Thisis0 17:16, 21 April 2008 (UTC)[reply]
Yes, and Tbot can't differentiate between those. I remember correcting about a dozen Croatian nouns into adjectives generated by Tbot that were incorrectly placed in the translation tables of English language names (whose noun senses are mostly fossilized adjectives anyway ^_^). Mismatching between the basic PoS categories such as nouns/adjectives that almost all the (relevant) languages of the world have is not a good choice IMHO. --Ivan Štambuk 14:42, 21 April 2008 (UTC)[reply]
On the other hand, Spanish (and probably other romance languages) most often translate "<noun1> <noun2>" into "<noun2> de <noun1>". Spanish does have separate adjectives sometimes, but the rearrangement with de (of) is more common. --Bequw¢τ 21:31, 21 April 2008 (UTC)[reply]

If you label such things adjective, learners of English, who have studied many grammar rules but don't really know the language, will assume that you can do adjectivy things to them: modify them by very, too, so, use them attributively and predicatively, grade them, etc. The other thing is, it's pretty hard to think of a noun that CAN'T be used attributively. I mean, maybe there are some words that only appear in certain constructions, such as dint (as in by dint of) or sake, but apart from that... You can even do it with proper nouns. Why would this need to be mentioned at all? (I'm speaking of English nouns here).--Brett 01:41, 24 April 2008 (UTC)[reply]

You're right. You can do it with all nouns. It's a regular property of nouns in modern English. We're just talking about those that are used most commonly in an attributive sense (dairy, satellite, chicken, etc.), those that have an attributive sense with a slightly different meaning (amazon), or anywhere there might be confusion or a desire for an Adjective header. (Unless of course they've made the full crossover to adjective, then that's what they are.) -- Thisis0 02:58, 24 April 2008 (UTC)[reply]
Because our contributors regularly attempt to add adjective PoSs to nouns because they feel that the adjective sense is missing. The proposal at hand is to come up with some way of preventing that and to also direct users to the noun PoS definitions to find the meaning of the adjectival use they might be interested in and to a helpful note explaining attributive use of nouns (and what shouldn't normally be done to such nouns) so they don't waste time looking in the wrong place in the future. I believe there are many users who do not remember this kind of thing or were never taught it. I was one of them, though blessed with the tendency to use nouns attributively without support from any rule. I expect speakers and writers of English to "adjectify" almost any noun they can in all the ways that you seem negatively disposed toward. Sometimes I think that this censoriousness must be much more UK than US (;-)). Wasn't that last just so George W. Bush of me? DCDuring TALK 02:34, 24 April 2008 (UTC)[reply]
I don't think we need to prevent people from creating such adjective sections, nor am I advocating some such mechanism that will prevent or flag such contributions. No. All we are doing is making them more correct and imparting a little educational info. Like Brett said, as long as people think these are adjectives, they "will assume that you can do adjectivy things to them". They don't behave like adjectives because they aren't. Let's just start fixing the most common ones in a simple straightforward manner (Noun sense with tag), and get a good Appendix:Noun Adjuncts or somesuch going. We don't need a software flag or anything. -- Thisis0 02:58, 24 April 2008 (UTC)[reply]
Thanks, I think I understand the situation better now. And, yes, we've run into the same issue at the Simple English wiktionary. Currently, it seems to be under control, but we have only a very small number of editors.
By the way, I wasn't stating a preference but rather a fact about English. It is ungrammatical to say this is a very faculty office or ask how soccer is your ball? Yes, you can playfully force nouns to be adjectives, but this anthimeria is at a rather different level from what we were discussing.--Brett 12:01, 24 April 2008 (UTC)[reply]

The necessity of a new etymology header

Should the verb form entry be under a new etymology header like I have done with mast or is that unnecessary? __meco 08:21, 20 April 2008 (UTC)[reply]

Yes, I think it is necessary. If the verb form were placed in parallel with the noun, this would imply that they share the same etymology, which would be misleading. The extra header is somewhat annoying, but I don't see any way out of it while maintaining a sound ontology. -- Visviva 09:37, 20 April 2008 (UTC)[reply]
I agree that a second etymology header is appropriate in cases like this, and is necessary to avoid misleading users. --EncycloPetey 23:22, 20 April 2008 (UTC)[reply]
I think there is a page somewhere that actively recommends it.Circeus 23:08, 21 April 2008 (UTC)[reply]
To push the matter closer to a conclusion, the entry might have "See [[mase#Norwegian|mase]]" under the etymology. Another possibility is to use the template {{term|mase||lang=no|insert gloss here}}. I also noted that mast's Norwegian section headings were not all at the right level after the insertion of the etymology. If "mase" does not actually have an etymology shown, then the Etymology heading at "mase" should have {{rfe|lang=no}}. Finally, I noted that mase, the lemma entry for the verb, as I understand it, did not have a Norwegian section. Following such trails can lead to valuable new entries when you have the energy and knowledge or reference materials needed. DCDuring TALK 01:31, 22 April 2008 (UTC)[reply]
Pages like this I've reorganized before so that the definitions that don't have etymologies go above all the ones that do. That avoids the problem of an empty etymology section, but it often puts much less important definitions first, so honestly I don't think it would be an improvement over what you have. What we definitely don't want to do is write "Unknown" or the like as the etymology, unless the origin of the word had been thoroughly researched with no conclusion reached. DAVilla 18:59, 23 April 2008 (UTC)[reply]
The priority would be to get the lemma form of the Norwegian verb entered, I would think. DCDuring TALK 19:47, 23 April 2008 (UTC)[reply]

Dutch gender

At long last we have a policy on this at the Dutch wikti, or at least I have proposed one and nobody objected. With as few as we are that is pretty much law. I have tried to explain the situation and its most reasonable remedy at Wiktionary:About Dutch#Gender and had a bit of a discussion with Visviva. I encourage the anglophone community (including its Dutch speakers, mothertongue or no) to support us in the chosen solution. It is admittedly a compromise, not of my making but that of the Taalunie. I must say though that the latter body has done a pretty good job imho. Jcwf 21:39, 20 April 2008 (UTC) nl:Gebruiker:Jcwf[reply]

Thanks Jcwf for taking this on. I know absolutely nothing about the background issues here, but deferring to the Taalunie seems like the most sensible option. (This would, I guess, mean barring "common" from inflection lines.) Perhaps {{nl-noun}} could also link to an appendix where these issues are discussed? Word-specific details could be discussed in Usage notes (or Etymology), as and if appropriate. -- Visviva 23:38, 20 April 2008 (UTC)[reply]
I'm a Flemish speaker and I didn't know the northern Dutch situation well. In the entries I made, I looked to the gender Van Dale uses, and if there wasn't any I used {c} or {m|f}, but I support this proposal and I'll now use {f|m}. SPQRobin 15:20, 21 April 2008 (UTC)[reply]
This seems like an excellent solution, giving information rather than dictating how an individual speaker should speak their own language. I think we also need an appendix where this is explained, as many (if not most) Dutch courses for English-speakers use northern Dutch and the terminology of "common gender". Physchim62 17:10, 21 April 2008 (UTC) (non-native speaker, level nl-2 on a very good day!)[reply]

Category for agent nouns?

Would anyone mind terribly if I created a Category:English agent nouns, and categorized accordingly? bd2412 T 05:12, 21 April 2008 (UTC)[reply]

Not I. Seems a meritorious act. -- Visviva 05:53, 21 April 2008 (UTC)[reply]
So, the category is for Bond, M, and Q? ;) --EncycloPetey 12:28, 21 April 2008 (UTC)[reply]
Do we have an entry on too cute by half? bd2412 T 14:16, 21 April 2008 (UTC)[reply]

American

Folks who also play on Wikipedia may be interested in commenting on w:Wikipedia_talk:Manual_of_Style#American, regarding use of the term American to mean "United States". --EncycloPetey 12:27, 21 April 2008 (UTC)[reply]

Gaps in entry titles.

Do we have a good way of representing gaps in entry titles? Like, too … by half probably warrants an entry, which too and half should link to; but what should its title be? —RuakhTALK 17:18, 21 April 2008 (UTC)[reply]

I think we just hope like crazy that we can always split it into connected parts. "[too clever] [ by half ]" works for me, but I appreciate this ignores the main issue. Conrad.Irwin 17:21, 21 April 2008 (UTC)[reply]
Well, it was just a few months ago that we finished deleting all of the "X the Y"-type entries, so I'm guessing that wouldn't be the preferred approach this time (though it does seem logical). As I recall, a primary justification for deleting those was that no one would ever look them up -- something which I'm afraid would apply to pretty much any other way of representing these. This is part of our larger difficulty in handling collocational information, I'm afraid. An interim step would perhaps be to have an Appendix: page detailing the behavior of the given frame (Appendix:Too X by half?), housing various and sundry usage examples. - Visviva 09:06, 22 April 2008 (UTC)[reply]
Formulas for constructions could be permitted in any space outside of principal namespace where mostly more experienced users roamed. It would help if we had some agreement on which space had which kind of content. Appendix space would seem like a good place, but perhaps a more entry-like space that allowed constructions that used a Wiktionary-standard notation would be useful. Perhaps there is a suitable commonly used notation that we could appropriate. Such "entries" might be useful link targets from principal namespace. DCDuring TALK 10:58, 22 April 2008 (UTC)[reply]
I doubt not looking them up that way is a good enough reason to delete scare the X out of, which would be found in searches or as a derived term. Sure, scare/frighten/knock the living daylights out of/the wits out of/... could be reduced to living daylights out of, wits out of, etc., which is a better way to handle those. And sure, no one would ever look up X like Y. However, there are already tens of hundreds of entries with "one" or "someone" as placeholders that would be found the same way as scare the X out of. No one would ever look up one and one's either.
The question here I think is what to use as a general placeholder for an adjective. I would propose "thus" or "such" as options, but I'm not sure even "do" is used as a placeholder, and it seems like there's a bit of stigma against creativity, which would be very unfortunate. Nonetheless, we already have placeholders, exactly as do traditional dictionaries, and there's a lingering broader question of how to demarcate them. It's not apparent in the title, but I've generally made an indication in the entry itself by italicizing the marker in the heading instead of bolding it. For instance, compare take someone's point with someone else, up one's alley with one's self. But I seem to be in the minority, since some people like to not only bold the word, but also uselessly link it.
On the other hand, Hippietrail pointed out that italicization is ambiguous in contexts like nth where the italicization is normal for part of the term. So... maybe there is a better solution for the heading, or maybe the solution requires rethinking the titles. While mind one's p's and q's would imply that "his" or "her" could be substituted, mind one's p's and q's is actually used that way very commonly, so to some extent, both entries are needed. Probably a usage note, or even just a couple of examples, one with "mind one's p's and q's" and one with "mind your p's and q's", are enough. Otherwise, how would you know?
We don't generally use "..." in titles even for unclosed fragments, I think primarily at the insistence of Connel, who has argued against even such punctuation as (s)he and s/he. While I strongly disagree with the principle that including punctuation is always incorrect, and in fact find the wiki software too limiting, I'm fine with eliminating it when it's superfluous. However, I'm not sure that "..." is always superfluous. It isn't polite to say, by itself, "I'd like to know." It's only polite if it precedes something else. And anyways, what happens when we take an expression like that and translate it into another language where the "..." goes in the middle?
"Too by half" needs "..." or something else in there. Never having heard the expression before today, I definitely think it warrants an entry somewhere that can be searched from "by half". However, I don't think that saying by half is good enough is good enough in the general case. This is obviously a much more general problem. DAVilla 21:45, 23 April 2008 (UTC)[reply]
Appendices and similar places outside namespace 0 are useful for information not suitable for entries for various reasons, but are not very useful for inexperienced normal users using our search box. Notation algebra isn't going to work for them either. The best we can do for them with current software without silly proliferation of phrase entries is to have good usage examples and default-searchable citations that contain the search words they are looking for in such a way as to bring the best entry to the top of the search. Without better search, this kind of entry won't be found very often. I wonder how many actually look up this kind of article -- and how many find it. DCDuring TALK 08:43, 24 April 2008 (UTC)[reply]
We could take 4 or 5 examples representing typical problems and try to analyse our way to a/some solution(s). I was thinking about don't come the X with me. bgc gives the first 5 options for X as "acid", "raw prawn", "cowboy", "tin soldier", and "orator". It will be impossible to give a typical X for this phrase, as it takes almost any noun phrase you can think of. What would a schoolchild enter if s/he came across don't come the tin soldier with me and wanted to understand it? If we can analyse that to a solution, and do the same with too clever by half, and some other specific examples, we will be well on the way to finding an answer. Regarding the too X by half, I must admit that I lean towards an entry at by half. It seems to be the logical first search, and should come up in the search list for approximate entries. By the same reasoning, perhaps an entry at don't come would also work? I find it useful to analyse to find the smallest "chunk" of meaning. -- Algrif 13:02, 24 April 2008 (UTC)[reply]
If the schoolchild knew anything about the internet, s/he would enter the phrase in Google rather than Wiktionary's search box. (and if the schoolchild didn't know anything about the internet, s/he probably wouldn't know about Wiktionary either). If we had a Concordance:Don't come the N with me or similar, including a "tin soldier" use among others, that would presumably appear somewhere in the results (though not prominently, at least not until our content improves to the point where people actually start linking to us). On reflection I think Concordance: makes more sense than Appendix: for phrasal template entries, at least in most cases; of course that will require allowing a bit more content in concordance pages than we have done heretofore. -- Visviva 13:14, 24 April 2008 (UTC)[reply]
I like both of the above.
The chunking approach is immediately feasible and may help some users right after implementation.
Concordance space would be good for this if it were part of our default full-text search or of a fall-back if the namespace-0 didn't have results.
Usage examples, usage notes, and namespace-0 citations give more searchable material both for Google (???) and our own full-text search.
I wonder if these would also increase the number of hits we would get from Google. The more entries only we cover, the more often we are on the top of their search results, the more click-throughs we get, the better we do in their algorithm (a virtuous cycle). Google drops very common stopwords unless linked by hyphen to non-stopwords. Phrases/chunks/formula(e/s) that have only stopwords are not going to be found via Google. If we could figure out a way to get people to come to Wiktionary for constructions involving mostly stop words, we would be offering something that might win us a certain type of user who would, of course, become a loyal fan because of our superior content. DCDuring TALK 17:35, 24 April 2008 (UTC)[reply]
  • I note that some phrasal templates -- or at least snowclones of a sort -- have been put on Wikiquote by our friend BD2412, using the X-Y notation. The man can cite! See for example wikiquote:An X among Ys, a Y among Xs. These do seem to get good Googles, for what it's worth; the Wikiquote page for X me no Xs was #4 in a search for "but me no buts," just above the first actual scholarly treatment. Personally I would prefer, assuming we are going to have these on Wiktionary in some form, that we use a more linguistically-aware notation such as NP/VP or simply N/V/etc., as "N me no Ns." But first, perhaps we should consider what value we can provide that Wikiquote cannot. -- Visviva 11:00, 27 April 2008 (UTC)[reply]
  • Neither the X/Y type nor NP/VP type notations are going to serve inexperienced users well. How someone might learn of the possibility of such a search is not at all clear to me. Everything we put outside main namespace has second-class citizenship and will not be found by those who are not adepts. I suppose that there is value in having such rewards for becoming an adept.
    The value we create would simply be that we helped meet an expectation that someone had about what should be in a dictionary. I believe that dictionary users want help in understanding odd constructions. Certainly most dictionaries have some kind of grammar and usage content. Neither WP nor Wikiquote would be my go-to Wikis for grammar and usage information. DCDuring TALK 11:45, 27 April 2008 (UTC)[reply]
Well, insofar as we are an online reference work (and that has to be how at least 99.9999% of our current users use us), most people are going to come to any given page through a search engine, portal, or direct link, not by going to the main page and typing in the search box. So in the case of a snowclone, they will probably find the page by searching the web for information on a particular instance of that snowclone, in the same way that I found BD's pages on Wikiquote. It would never have occurred to me that Wikiquote might have such a page, but in the glorious world of Web 2.0, that was irrelevant; Google did my thinking for me. Appendix pages are indexed by Google et al., so that shouldn't be a concern for us. Likewise people will be able to find the content regardless of the page title. -- Visviva 12:04, 27 April 2008 (UTC)[reply]
It would be nice to know how users actually get here. How many are just from sister project links? I don't think we can rely solely on individual-page attractiveness. We deliver branded information to a certain extent, so that users may select us from a search result page because of the good things that have happened to them on our site in the past. I would think that we would want to offer some kinds of search that Google doesn't (and can't) offer. Their orthography limitations are an opportunity. And so too might be some kind of grammar-restricted searches with "variables". "NP1 NP2 no NP1s"? Could we use categories (visible or invisible) to go in this direction for idioms and constructions? DCDuring TALK 12:26, 27 April 2008 (UTC)[reply]

Have we come to a decision about how we treat these sorts of entries? When trying to find if we have any sort of entry for the "s/x/y/" type of self-correction notation used frequently by those familiar with regular expressions (if we do, I haven't been able to find it.), I stumbled upon X one's Y off. Thryduulf 01:13, 2 May 2008 (UTC)[reply]

Certainly not yet. I would favor Visviva's NP, VP, N, V formulation for anything that didn't fit the one('s)/someone('s)/something('s) approach. That won't cover animate/inanimate and other more semantic categories without modification, but linguists must have suitable vocabularies for such distinctions that we could try out. DCDuring TALK 01:33, 2 May 2008 (UTC)[reply]
If the way of filling the blanks is kept simple, then users will be able to form them easily once they see one example, which they will eventually see somewhere in Wiktionary (synonyms, see alsos, translations, search results, maybe redirects for common searches for particular terms). They may also see it on the information desk. "User: How is (something with gaps) used? Wiktionarian: Look at (link). User: Oh, they are formatted like that, neat." I doubt that NP, VP, etc. are simple enough. -- Coffee2theorems 12:20, 3 May 2008 (UTC)[reply]
I agree. —RuakhTALK 13:05, 3 May 2008 (UTC)[reply]

The simplest possible thing would be to pick a sequence of characters that is used for every gap, i.e. instead of "X one's Y off" you'd have e.g. "... one's ... off", "* one's * off", "? one's ? off", or some such. Using non-letters would be best (if the software allows it..?), because the choice is more clearly unique. With letters there are choices such as upper/lower case and choice of letter ("X one's X off" and "X one's Y off" would seem equally plausible ways of generalizing "too X by half" to me), whereas e.g. "..." doesn't suggest any alternatives. It would also be a plus if the characters can be easily typed ("… " can't, "..." can). -- Coffee2theorems 12:23, 4 May 2008 (UTC)[reply]
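
A minimal sketch of the single-placeholder idea above, assuming "..." is the chosen gap marker and that each gap stands for one or more words. The function names (title_to_regex, match_gaps) are hypothetical and only illustrate how a gapped title could be matched against a concrete phrase; nothing here reflects how the wiki software actually treats titles.

```python
import re

GAP = "..."  # assumed placeholder; any fixed character sequence would do


def title_to_regex(title: str) -> re.Pattern:
    """Turn a gapped title such as 'too ... by half' into a regex."""
    pieces = []
    for token in title.split():
        if token == GAP:
            # A gap captures one or more whitespace-separated words (lazily).
            pieces.append(r"(\S+(?:\s+\S+)*?)")
        else:
            # Literal words are matched as-is, ignoring case.
            pieces.append(re.escape(token))
    return re.compile(r"\s+".join(pieces), re.IGNORECASE)


def match_gaps(title: str, phrase: str):
    """Return the words filling each gap, or None if the phrase doesn't fit."""
    found = title_to_regex(title).fullmatch(phrase.strip())
    return found.groups() if found else None


print(match_gaps("too ... by half", "too clever by half"))
print(match_gaps("... one's ... off", "work one's socks off"))
print(match_gaps("too ... by half", "too clever"))
```

Because every gap uses the same marker, the pattern cannot express that two gaps must hold the same word (as in "X me no Xs") or that a gap is restricted to a part of speech; those cases would need a richer notation.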

I agree, though I think … (horizontal ellipsis) is fine provided ... (three periods) is a redirect. —RuakhTALK 15:37, 4 May 2008 (UTC)[reply]
What expressions are there that have the same variable in two positions? I can think of a few in English, but there must be more. "X me no Xs" is almost an unfair example, having no restrictions on X other than it being mainly a noun. The switch in PoS defeats even Visviva's approach. "X after X" (periods of time > hour; distance; repetitive task; or object of repetitive task; indeed, anything repetitive) and "X in, X out" (day, week, month, year) are the two cases that first came to mind. "X upon X" and "X by X" are similar.
I personally would prefer an approach that could handle these cases and that made clear PoS restrictions, which are common, but not always obvious from the ellipsis approach. That is a significant advantage of what Visviva had offered. We may need a more flexible framework to reflect the particular restriction on the variables such as (animate, human or near-human, mass noun, countable noun, etc.) DCDuring TALK 16:29, 4 May 2008 (UTC)[reply]

Pending an eventual resolution to this, I've started a list of entries that we should add when we decide how. The list is at User:Thryduulf/phrasal entries with variables, feel free to add any others you think of. Thryduulf 10:27, 12 May 2008 (UTC)[reply]

Search enhancement

Typing in the search box now shows you the words we have that match your typing. I like it! However, I notice that newly added words don't show up - is it running off of a preproduced list? SemperBlotto 08:27, 22 April 2008 (UTC)[reply]

To answer my own question - no, there is just a short time delay - excellent. SemperBlotto 08:37, 22 April 2008 (UTC)[reply]

Marvelous. I'd always wondered if that sort of functionality would ever come to us. The responsible dev(s) deserve a signed thank-you note. -- Visviva 09:08, 22 April 2008 (UTC)[reply]
That's awesome! (Though sadly, it gives our "misspelling of" entries greater potential to be detrimental. :-/ ) —RuakhTALK 14:31, 22 April 2008 (UTC)[reply]
Excellent. It's a very good step, keeping us competitive with the well-funded sites. Something like "soundex" search or a list of aliases would be a wonderful next step for the many instances where user doesn't enter a spelling we have. It might be more important for us than for others in the WMF ambit. DCDuring TALK 14:41, 22 April 2008 (UTC)[reply]
Yeah, pretty neat. I'm not sure how soundex works, but definitely it would be good to allow people like Hippietrail to tamper with this, letting it search the "did you mean" results instead of just the page titles.
One problem though, it's rather a pain to do a search when the list drops because it covers the search button. It only applies if what you've typed is a prefix to something else, but that condition is pretty easy to meet. Maybe it could "drop" up instead of down? DAVilla 21:55, 23 April 2008 (UTC)[reply]
Is this CSS-adjustable? -- Visviva 13:26, 24 April 2008 (UTC)[reply]
It's miraculous enough that this much has happened; I wouldn't put too much hope in the possibility of future improvements of the same kind. (For example, I wouldn't assume that the DidYouMean extension will be approved this year or this decade.) But in any case I don't think that w:soundex would work for the default search box, since the soundex algorithm is limited to English. On a hypothetical future version of Special:Search, with language selection etc., it would be a great addition. But if we want something like that, which involves front-end functionality rather than anything at the content end, it probably makes the most sense to set up a demo mirror of our own. Got cash? -- Visviva 13:26, 24 April 2008 (UTC)[reply]
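
For readers unsure how soundex works, here is a rough sketch of the standard American Soundex coding in Python. It is only an illustration of the algorithm mentioned above, not anything deployed on the site, and it assumes plain ASCII input, which is exactly the English-only limitation raised in the comment.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code, or '' for empty input."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
        if len(result) == 4:
            break
    return result.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))
print(soundex("hundred"), soundex("hunderd"))
```

A misspelling such as "hunderd" gets the same code as "hundred", which is what would make this kind of lookup useful for search; accented or non-Latin letters are simply dropped here, so the codes are of little use outside English-like spelling.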
If we could get our usage up, perhaps we would be more useful to WMF for fund-raising and more "deserving" of technical attention. WMF did get $500K from the Sloan Foundation recently. I wonder what we could do that would help in that regard in terms of identifying funders whose interests coincide with where we might want Wiktionary to go. UK Prime Minister on recent visit to US spoke about the English language as a tool of joint national interest with US. Maybe govt. money has too many rules for WMF and is seen as tainted and insufficiently international, but there should be suitable funders somewhere. DCDuring TALK 17:16, 24 April 2008 (UTC)[reply]

Well, it seems to me silly to re-invent wheels (however fun it is ;) so I have wrapped aspell in some python and added a callback to WT:PREFS. If you want to test this feature then go to WT:PREFS and enable "aspell on http://devtionary.org (WARNING...". This is not a feasible long term solution, and if aspell turns out to do the right thing then I will implement it as a proper extension for MediaWiki. The javascript code is User:Conrad.Irwin/aspell.js. Known problems with the current implementation: It is very slow (this is because devtionary has to query the wiktionary API to provide colourful links, aspell is plenty fast enough ;), It only supports English (this is a limitation in the current installation, not aspell or the python script in general). I would appreciate comments on how well aspell performs, and any other ideas people have for doing this kind of thing.Conrad.Irwin 10:53, 25 April 2008 (UTC)[reply]

To test what this does, visit a misspelled page (i.e. http://en.wiktionary.org/wiki/alhpabet ) or do a full text search for a word (i.e. Special:Search/hunderd ). Conrad.Irwin 12:26, 25 April 2008 (UTC)[reply]
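
The general shape of such a "did you mean" lookup can be sketched as follows. This is not the aspell wrapper described above: it swaps in Python's standard difflib module for aspell, and the title list and the suggest function are hypothetical stand-ins for a real index of entry titles.

```python
import difflib

# Hypothetical stand-in for the set of existing entry titles.
KNOWN_TITLES = ["alphabet", "alphabetic", "hundred", "hundredth", "hunter"]


def suggest(term, titles=KNOWN_TITLES, limit=3):
    """Return up to `limit` existing titles that closely resemble `term`."""
    # cutoff is a similarity ratio between 0 and 1; 0.75 is an arbitrary choice.
    return difflib.get_close_matches(term.lower(), titles, n=limit, cutoff=0.75)


print(suggest("alhpabet"))
print(suggest("hunderd"))
```

difflib compares raw character sequences, so unlike aspell it knows nothing about pronunciation or common typing errors; it is used here only because it ships with Python and keeps the sketch self-contained.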

"misspelling of" template

The {misspelling of|} template does not allow wikilinks within the template. Entries with this template are usually simple and do not contain other wikilinks, therefore they are not included in the page count. Is this by design? --Panda10 00:00, 23 April 2008 (UTC)[reply]

Yes, this is by design. A number of editors here feel that misspellings aren't really words anyway, so they ought not to count towards our total number of entries. --EncycloPetey 00:02, 23 April 2008 (UTC)[reply]

alternative spellings of only some sense

nonpartisan had an alternative spellings section for non-partisan and an adjective section. Then I added a noun section. But the alternative spelling only goes with the adjective (I think). How do we indicate an alternative spelling for only one POS? RJFJR 16:35, 24 April 2008 (UTC)[reply]

I'm pretty sure that there was recent discussion about this. BP, TR? DCDuring TALK 18:26, 24 April 2008 (UTC)[reply]
IIRC, when we voted on the order of L4 headers, this case was considered. The Alternative spellings header may occur at L4 when it is specific to only one part of speech. --EncycloPetey 21:40, 24 April 2008 (UTC)[reply]


Election Notice

The 2008 Board election committee announces the 2008 election process. Wikimedians will have the opportunity to elect one candidate from the Wikimedia community to serve as a representative on the Board of Trustees. The successful candidate will serve a one-year term, ending in July 2009.

Candidates may nominate themselves for election between May 8 and May 22, and the voting will occur between 1 June and 21 June. For more information on the voting and candidate requirements, see <http://meta.wikimedia.org/wiki/Board_elections/2008>.

The voting system to be used in this election has not yet been confirmed, however voting will be by secret ballot, and confidentiality will be strictly maintained.

Votes will again be cast and counted on a server owned by an independent, neutral third party, Software in the Public Interest (SPI). SPI will hold cryptographic keys and be responsible for tallying the votes and providing final vote counts to the Election Committee. SPI provided excellent help during the 2007 elections.

Further information can be found at <http://meta.wikimedia.org/wiki/Board_elections/2008/en>. Questions may be directed to the Election Committee at <http://meta.wikimedia.org/wiki/Talk:Board_elections/2008/en>. If you are interested in translating official election pages into your own language, please see <http://meta.wikimedia.org/wiki/Board_elections/2008/Translation>.

For the election committee,
Philippe Beaudette

trans gloss in morna

This word is listed in Category:Translation table header lacks gloss even though there is a gloss. The structure looks fine to me. Can you take a look? What is it that I don't see? Thanks. --Panda10 11:00, 27 April 2008 (UTC)[reply]

Fixed; just a top where a middle should have been. -- Visviva 11:02, 27 April 2008 (UTC)[reply]
Thanks! --Panda10 11:32, 27 April 2008 (UTC)[reply]

{{compound}}, {{suffix}}, {{prefix}}

I recently made {{compound}}, modeled after {{suffix}}. I think we ought to promote these templates, since they offer a possibility to keep etymology sections consistent and uniform. However, maybe a little more information than just a ‘+’ would be good. Therefore, here is a call for better wordings for those templates, maybe in the style of belofteploeg, where I didn’t replace the etymology by the template (yet).

Hoping for your input (but feel free to implement it yourself, I’m not frequenting this page anymore)! H. (talk) 17:17, 27 April 2008 (UTC)[reply]

Comparable to {{blend}}, which has a more specialized role. Might allow more automatization of derived terms and, with lang parameters, enable identification of macaronics. DCDuring TALK 16:43, 4 May 2008 (UTC)[reply]

Petition on Meta

Hello,

I would like to notify you of a petition against the recent decision by the board to reduce community representation. Please find it here. I am sending this message to most English Wikimedia projects as I think it is important the community is informed. If you have any questions please ask me at my Wikinews talk page.

Thanks,

Anon101 (on Wikinews) 20:23, 28 April 2008 (UTC)[reply]

(Note- I did not create the petition)

That petition gives half of one point of view and no place to voice opposition... Where do the people go who are pleased that the board is looking to add professional voices to the discourse in order to make the most of the contributions that the community generates? I don't like one-sided politics. - [The]DaveRoss 20:30, 28 April 2008 (UTC)[reply]
I suppose that periodically or as the occasion warrants, we might remind people that Wikimedia Foundation ("WMF") provides the umbrella for us and all our sister projects.
For those interested in the governance of the WMF and the issues that it deals with, here is the contact information for the mailing list:
  • foundation-l mailing list
<mailto:foundation-l-request@lists.wikimedia.org?subject=subscribe>
Unless there is an issue that is specific to en.Wiktionary, or wiktionaries in general or all of WMF's en sites, the discussion is best carried out there. DCDuring TALK 20:55, 28 April 2008 (UTC)[reply]

Rohingya (cit) split

News flash! SIL has split cit into rhg (Rohingya) and ctg (Chittagonian) [15]. We'll need to update the appropriate templates and Category:Rohingya language. --EncycloPetey 01:10, 29 April 2008 (UTC)[reply]

Now that is unusual, (a split, usually it is additions) apparently because "cit" was an error to begin with? You changed {{cit}} to "Chittagonian", which puts things in a non-existent cat; all of the existing entries label themselves as "Rohingya" (although most are User:Drago ...). I've fixed it to redirect to {{rhg}}, on the way to orphaning it. (This will show up in my language templates table as something to be fixed.) Robert Ullmann 12:06, 1 May 2008 (UTC)[reply]

Sorting clicks

yet another trivial issue on which much heat can be expended ...

I've noticed, in working out the implementation of sorting translations tables in AutoFormat, that humans have put !Xũ under X, rather than at the top of the table, where the simple code order would place it. This seems reasonable. Do note that the "!" is a click, not punctuation. This would also apply to ǂHõã, ǀXam, etc., which would otherwise sort at the end (IPA characters, "ǀXam" starts with an IPA dental click, not a vbar/pipe). The same sort would apply to language headers in an entry. Robert Ullmann 09:55, 30 April 2008 (UTC)[reply]

Makes sense. Dictionary sorting ought to consider letters only, ignoring case, punctuation, and spaces. I realize these clicks are more significant than most punctuation in their native language, but to English-language readers they are not letters. Michael Z. 2008-04-30 18:51 Z
On second thought, we are alphabetizing text in many languages and foreign scripts. Is there a native sort order for these symbols? Is there any reason not to use the default Unicode collation algorithm for all places where we have mixed languages? Michael Z. 2008-04-30 18:55 Z
That's not exactly true. There are places where we alphabetize text across language and script (e.g. at category pages), but the language names in the translation tables are supposed to be English names in the Latin script. I think we should probably use the Ethnologue name (Kung-Ekoka), but if there's a good reason to use !Xu as our name for it, then we should do so, and IMHO we should ignore the ! in collating, just as we do spaces and punctuation and whatnot. (Likewise if there's a good reason to use !Xũ, but that strikes me as unlikely.) —RuakhTALK 20:09, 30 April 2008 (UTC)[reply]
Quite right. (I have since had my morning cup, and see that collation depends on the context)
I didn't realize we were talking about !Kung, the name I have heard before, and what Wikipedia calls it. Michael Z. 2008-04-30 21:55 Z
Yes, among others, and we use "!Kung" in the language template. There are others besides clicks, such as 'Auhelawa. I've added a line to the collation order lambda in AF. Robert Ullmann 11:56, 1 May 2008 (UTC)[reply]
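As a rough illustration of the collation discussed in this thread, here is a minimal Python sketch of a sort key that skips leading click letters and apostrophes, so that such names file under their first ordinary letter. The function name, the set of ignored characters, and the sample list are assumptions made for illustration; this is not AutoFormat's actual code.

```python
# Characters to skip at the start of a language name when sorting.
# Assumed set: ASCII "!" and "'", plus the four click letters
# U+01C0 (dental), U+01C1 (lateral), U+01C2 (palatal), U+01C3 (retroflex).
IGNORED_PREFIX = "!'\u01c0\u01c1\u01c2\u01c3"


def collation_key(language_name):
    """Return a sort key that ignores leading clicks and apostrophes.

    Case is ignored as well, so the key sorts by letters only.
    """
    return language_name.lstrip(IGNORED_PREFIX).casefold()


def sort_language_names(names):
    """Sort language names so that click-initial names file under their first letter."""
    return sorted(names, key=collation_key)


if __name__ == "__main__":
    sample = [
        "Zulu",
        "!X\u0169",             # !Xũ
        "\u01c2H\u00f5\u00e3",  # ǂHõã
        "\u01c0Xam",            # ǀXam
        "'Auhelawa",
        "German",
        "Arabic",
    ]
    for name in sort_language_names(sample):
        print(name)
```

With a plain code-point sort, "!Xũ" and "'Auhelawa" would land at the top of the list and the names beginning with click letters at the bottom; with this key they sort under X, A, and H respectively.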

People might also be interested in the description of what AF has been taught to recognize in tables at Category:Entries with translation table format problems, specifically the handling of grouped languages and subsidiary lines with qualifiers, e.g. doing things like:

(at butterfly) where either * or ** can be used with a language name, this makes it easy for applications parsing wikitext as they can treat * and ** identically, and expect the full language name. (And we don't have to have "Greek, Modern" ;-) Subsidiary notes that are not languages use *: as with

(although the Serbian things really ought to just be on one line ;-) The stuff described at the cat page is not policy, just what I've found that seems reasonably structured and useful. AF has been tagging things for a few days to see what will be found. Robert Ullmann 11:56, 1 May 2008 (UTC)[reply]
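To make the parsing point concrete, here is a small Python sketch of how an application might classify translation-table lines under the convention described above: * and ** are treated identically and followed by a full language name, while *: introduces a subsidiary note. The function name and the sample lines are hypothetical, and this is only one possible reading of the convention, not AF's implementation.

```python
import re

# One or two asterisks (but not three), an optional colon, then the rest of the line.
LINE_PATTERN = re.compile(r"^(\*{1,2})(?!\*)(:?)\s*(.*)$")


def classify(line):
    """Classify one wikitext line from a translation table.

    Returns ("language", name, translations) for * and ** lines,
    ("note", text) for *: lines, and ("other", line) for anything else.
    """
    match = LINE_PATTERN.match(line)
    if match is None:
        return ("other", line)
    _stars, colon, rest = match.groups()
    if colon:
        # "*:" marks a subsidiary note that is not a language line.
        return ("note", rest)
    # "*" and "**" are handled the same way: full language name, then a colon.
    name, _sep, translations = rest.partition(":")
    return ("language", name.strip(), translations.strip())


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    for sample in ["* Greek:", "** Ancient Greek: example", "*: Roman: leptir", "plain text"]:
        print(classify(sample))
```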

On the last example, I agree that it should be one line. It's not a transliteration, so it shouldn't be parenthesized, but it could more easily be listed as we do with simple and traditional Chinese:
Is it really necessary to inform the world that the first are Cyrillic characters and the latter Roman? DAVilla 20:53, 19 May 2008 (UTC)[reply]

<section end="archive_april">

May 2008

Special:ListUsers, Does this bug anybody besides me?

I think the first page of the master user list is rather unfortunate. Yes, the user's account was deleted--so the link's red now, if it used to be blue. Can't any more be done? Is there an überdelete? If not, how about resurrecting the user, moving the account to a plain vanilla name, and then deleting again? Snakesteuben 07:11, 1 May 2008 (UTC)[reply]

Block user --> Prevent account creation won't keep it from being recreated? If the page and user functions are indeed separate, then the blocked user name shouldn't change when the page name is changed. But I admit I haven't tried it, yet... Snakesteuben 08:45, 1 May 2008 (UTC)[reply]
Edit: Nah, that doesn't work, not exactly like that, anyway... Snakesteuben 08:47, 1 May 2008 (UTC)[reply]
Well, a bureaucrat can actually rename a user; see WT:MV. (This will automatically move the user and user discussion pages as well, but as you note, that's conceptually a separate step.) So we could rename the user to something innocuous, and then add the original username to the accounts-blocked-from-creation list. Actually, we could more generally add ^! to that list and consider the problem solved. —RuakhTALK 12:18, 1 May 2008 (UTC)[reply]
It is easier to use the list at Special:Allpages/User_talk: - though that only lists users who have been talked to. I believe there are plans to kill the thousands of dead accounts as Unified Login progresses - but 'til then we'll have to just put up with it. Conrad.Irwin 09:12, 1 May 2008 (UTC)[reply]
Ruakh: There ya go!
Hi, Conrad. That's not my concern; I haven't much use for the page myself. What about the public, or the casual contributor who isn't part of this community--and hasn't ferreted out the easier ways to do things? I think one might reasonably look at the registered contributor list to figure out who's behind all this, and then gauge the credibility of the source. If I'm right, then that page doesn't help us, and might lend credence to the anarchy-ergo-unreliability theory. 'Course if I had my way, for that very reason, the default display would sort users by UserPage.Qualifications.Impressive + Contributions * Signal_to_Noise_ratio or some such, rather than alphanumerically. ;-)
But seriously folks, yes, you and I can be expected to put up with a fair bit. But just like paper books, wikts must provide value to more than just their authors and developers. While the user interface we present to the public might not be as important as a book jacket, it's still part of the package, not irrelevant. I think anyway. Snakesteuben 15:23, 1 May 2008 (UTC)[reply]
Afaik the official MediaWiki answer is that there is no user delete function and the database should not be touched because it could be dangerous for db integrity. But on my personal MediaWiki I just did a "delete from user.." on the MySQL console anyway, and MediaWiki didn't explode; it hasn't seemed to cause any trouble so far. Mutante 21:40, 4 May 2008 (UTC)[reply]
The question isn't whether or not it can be done, it is whether or not it will be done. Yes users can be deleted, anything can be deleted given the proper amount of legwork, but we don't generally do it without very good reason. I think that it is far easier just to rename the one offensive username from the first page of the list and move on, we can blacklist any names which are recreated if it comes to that. It isn't worth bugging a dev about it, unless someone wants to create a username_delete extension... - [The]DaveRoss 21:49, 4 May 2008 (UTC)[reply]
Someone, somehow, made it go away. I am content. Snakesteuben 03:32, 9 May 2008 (UTC)[reply]

But no one knows the POS!

I just created the entry for θεπτάνων (theptánōn). It's an incredibly obscure Ancient Greek word, which is only attested in a fifth century Ancient Greek dictionary of obscure and archaic Ancient Greek words, written by w:Hesychius of Alexandria. So here's the fun part: No one knows what the POS was for sure. It could be an adjective (on fire), a noun (something which is scorched, on fire), or as I expect, a participle (≈ being on fire). Now, these are close enough together that we can reasonably give it a definition, but I don't feel confident giving it a POS (hence the POS is listed as Unknown). One of the really cool things about Wiktionary is that we can include incredibly obscure and esoteric words like these. However, we may want to discuss how we want to do things when dealing with words which have incomplete information. To give an even more interesting example, take the Phaistos Disc. No one knows what any of that means. However, for ancient Aegean linguists, it's very important stuff. I want to, eventually, include all of these words on Wiktionary (Unicode is waffling over whether to encode them). If they do, Wiktionary is the perfect place to have them. We can discuss various theories as to their meaning, look at similar characters in other scripts, etc. It's a classicist's wet dream. But how do they get formatted? Now, this is not an urgent thing, as there are plenty of less esoteric words which we still lack (like the verb ἅπτω (háptō), the participle of which is given as the definition of θεπτάνων (theptánōn)). However, I thought I'd throw it out there just to get the ball rolling. -Atelaes λάλει ἐμοί 23:51, 1 May 2008 (UTC)[reply]

I think you should at least put Unknown Part of Speech instead of just Unknown, without Part of Speech a casual reader would not know what Unknown refers to. Nadando 23:54, 1 May 2008 (UTC)[reply]
I would think you would take a best guess at PoS. In the leading case you mention, you have a fairly good idea, so to say unknown seems to mislead. The underlying question is which of the specialized needs of researchers can productively co-exist with the needs of the more ordinary users. In the case of the Phaistos Disc, it would seem to belong in something more specialized or perhaps the Ancient Greek Wiktionary (if it comes into being), where it would attract all of those most able to decipher the material. Frankly, if there isn't unicode, then it would seem to be more of a WikiCommons thing for the images. DCDuring TALK 01:07, 2 May 2008 (UTC)[reply]
According to Wikipedia, Unicode codes now exist for these symbols. Lmaltier 21:42, 2 May 2008 (UTC)[reply]
For really exceptional cases like this, IMO it makes sense to put scholarly interpretations in place of documented use. That is, if there is a school of thought that this is actually a genitive-plural noun, have a "Noun" heading, with appropriate qualifiers in the sense & usage lines. If some possible POS's have only been mentioned as possible interpretations (and never seriously championed), those should be relegated to the notes.
In the case of the Phaistos Disc (likewise, other undeciphered writings), I would expect us to use ===Symbol===, though what the L2 header might be I haven't a clue -- perhaps there is a better use for "Unknown". -- Visviva 14:19, 2 May 2008 (UTC)[reply]
I’d also suggest you pick one POS, which seems to be the most probable or less contested, and add a Usage notes section explaining the issue.
By all means, do include those Phaistos symbols. Although maybe an Appendix would be more appropriate. H. (talk) 08:17, 9 May 2008 (UTC)[reply]

Proposed change in ToC display

In this discussion, Conrad.Irwin has proposed some CSS code that would cause ToCs to float to the right of the entry text, instead of sitting on the left and pushing everything down. This probably would not have worked smoothly in the past, but seems to work fine now, thanks to Robert Ullmann's great work in sorting out the float properties of various floaty things (here endeth my understanding of that matter).

The specific code would be:

#toc {
  float: right;
  clear: both;
  margin-left: 5px;
  margin-bottom: 5px;
}

Since this would be a very significant change to entry display sitewide, I am posting this here to the Beer Parlour rather than the Grease Pit. Please voice any concerns or objections here. For my part I support this change, which IMO makes entry navigation significantly more straightforward. -- Visviva 15:53, 2 May 2008 (UTC)[reply]

Perhaps the margins could reflect the existing document grid. The bottom could match an image thumb (6px), or use the line-height of text (1.5em, computed as 19px in my browser). The left could use the same margin as the navigation boxes in the left column (7px), or the main column of text (12px). Michael Z. 2008-05-02 18:21 Z
I'm pro–, but I think there should probably be a corresponding #toc-float-none #toc rule-set that undoes it, and perhaps a #toc-float-left #toc rule-set that floats left instead (with appropriate margin changes), so we can have {{tocnonfloat}} (and perhaps {{tocleft}}) for cases where they might be useful. (Those are probably bad names, but you get the idea.) —RuakhTALK 21:51, 2 May 2008 (UTC)[reply]
I think this is an excellent notion. One area where I'm not thrilled with this is in big community pages like this one, where the standard-issue TOC is actually better; since there aren't many such pages, it would be easy to apply a template where appropriate. -- Visviva 01:38, 3 May 2008 (UTC)[reply]
This would be a problem for all the right-floating items we have, such as WP link boxes, {{was wotd}}, and images. If the TOC floats right, then these items either (1) are shoved left into the entry text, (2) hidden by the TOC, or (3) shoved down into the collapsible tables. Is there a proposal to deal with these problems? --EncycloPetey 22:16, 2 May 2008 (UTC)[reply]
The TOC on the right seems to work on WP without problems where it is used. However, like EP, I'm uncertain if it would work with our page structure - take a look at pages like head, bassoon and router. How would these work with a TOC on the right? Until I've seen mockups that show how these entries (or ones like them) would be with the TOC on the right I would oppose any changes to the status quo in this regard. Thryduulf 00:02, 3 May 2008 (UTC)[reply]
For pages like head, see User:Visviva/head. added: note that that matches the CSS behavior in FF, but not in IE, where it behaves weirdly.
For bassoon, the right sidebar currently renders like this: pediabox, TOC, image. Not ideal, but I think we might reasonably ask whether a TOC is needed on that page at all.
The images in router get pushed down, but not too far: on FF for me, the first image is level with sense 1, and the others stack neatly down to the carpentry sense. (hey, those images should really be in a gallery anyway.) ;-) -- Visviva 01:38, 3 May 2008 (UTC)[reply]
Here's how it renders for me in FF: any right-floating thing (image, pediabox) above the first language header displays above the TOC; any right-floating thing below the first language header displays below the TOC. Using the __TOC__ magic word, it is possible to tweak this if it's not quite the desired behavior -- i.e., forcing the TOC to appear above or below a certain point. (Though honestly, I'd been thinking more in terms of preventing pediaboxes from grabbing this prime real estate.) :-)
AFAICT, this doesn't affect {{was wotd}} at all. -- Visviva 01:38, 3 May 2008 (UTC)[reply]
Having difficulty getting this to work in IE6; perhaps someone CSS-knowledgeable can explain why? -- Visviva 01:38, 3 May 2008 (UTC)[reply]
I don't have IE6 to test with, but I understand that it has a bunch of problems with margins on floats, and that some of those problems go away if you set display:inline on the floating element. (This is discussed in various places online, e.g. at <http://www.positioniseverything.net/explorer/floatIndent.html>.) If the difficulty you're experiencing has to do with the margins, it might be worth trying. —RuakhTALK 04:28, 3 May 2008 (UTC)[reply]
Thanks. Actually the difficulty I'm experiencing is that absolutely nothing changes, even after a full cache dump. Tried on another computer, since this one is having issues -- still no change, and likewise when I add "display:inline". Odd; I suppose it wouldn't be a deal-breaker (since IE6 users would just get the same layout as before), but it doesn't seem right. -- Visviva 06:00, 3 May 2008 (UTC)[reply]
BTW, wouldn't clear:right make more sense for this than clear:both? —RuakhTALK 04:22, 3 May 2008 (UTC)[reply]
Update to code to reflect ideas given - this works for me in IE6. Conrad.Irwin 20:45, 3 May 2008 (UTC)[reply]
.ns-0 #toc {
  float: right;
  clear: right;
  margin-left: 7px;
  margin-bottom: 6px;
  display: inline; 
}
As this seems to be popular I suggest we give it a trial in the next few days. Conrad.Irwin 20:45, 3 May 2008 (UTC)[reply]
Popular? I see two people who've voiced support and two who've raised objections. What definition of popular does that fall under? --EncycloPetey 03:46, 4 May 2008 (UTC)[reply]
My definition 'cos I like it - I would count it 4/2 :). Anyway, there is now a new item in WT:PREFS to allow this to be previewed more easily. It should appear at the bottom of the list under the search spellchecker - if not then you will need to hard refresh. Conrad.Irwin 11:04, 4 May 2008 (UTC)[reply]
It would be helpful if you can clarify what concerns you feel haven't been addressed. To review your previous concerns again: due to the recent CSS revisions, if the TOC pushes down into the collapsible tables, the collapsible tables simply shift left (no collision). Images and boxes above the first language header render just as they do now; images and boxes below the first language header render below the TOC. In entries that already have a cluttered right margin (multiple boxes), this can get a little messy, but I would submit that those entries are in need of cleaning anyway (we have {{pedialite}} et al. for a reason). In any event, AFAIHS the TOC does not collide with anything, nothing is pushed into the entry text, and {{was wotd}} is not affected.
Anyway, I've been running this for a few days and I'm sure not going back to the old layout. I just think it would be nice if we can offer this improvement to the general user; it seems unfair to keep usability improvements to ourselves.  ;-) -- Visviva 07:03, 5 May 2008 (UTC)[reply]
  • One issue I have seen in FF: when the browser window is reduced, a {{wikipedia}} box rendering below the TOC actually blocks out some of the definition text. This seems to be a problem in {{wikipedia}}, just made more apparent by this change; however, I'm not sure which of the various style declarations therein might be responsible. -- Visviva 07:03, 5 May 2008 (UTC)[reply]

I've started this language article. Michael Z. 2008-05-03 05:02 z

A modest proposal, re: bad translations on the internet

I'm sure everyone who has dabbled in ttbc has noticed a lot of misspellings, and even totally mythical translations on purported "dictionary" and "translation" sites around the internet. It seems any time one site starts a rumour, the others pick it up. And before long, they drop in a citation to "various references," which I assume means each other. Unfortunately, en wikt is frequently a member of this group. (I recently deleted one such "translation" for the second time--and it was in the main section, not ttbc unfortunately.)

Is there some policy governing how to deal with/prevent this kind of thing?

If not, what do you guys think of maybe creating either:

  1. a list or category of these things, with links to/from the affected words, or perhaps even better
  2. an entry akin to "common misspellings" for the myth word. (If we do this, we should call it something else--these aren't common misspellings, they occur nowhere except on the braindead sites, and sometimes in posted messages by obvious English speakers who were duped into using them.)

Snakesteuben 02:42, 4 May 2008 (UTC)[reply]

I like option 2, at least for the serious problem cases. If we don't actively address these misconceptions, they will keep getting added when no one is looking. There could be a standard "Common Foovian mistranslations" (or something) category generated by the template. The exact format of the template bears some thought -- should it include the preferred translation, the putative English equivalent, or both? And what should these be called? "Mistranslations" is a bit too broad. -- Visviva 09:15, 4 May 2008 (UTC)[reply]
Good ideas, Visviva. In the mean time, I think I'll start noting such things as hidden text comments next to the relevant entry in the translations sections. (I'm guessing a quick consensus is unlikely, and I'm not senior enough here to take semi-unilateral action ... though you probably are.) Winter (Username:Snakesteuben 02:44, 12 May 2008 (UTC))[reply]
Maybe something like the way I dealt with the "phobias" would work here, namely {{only in}}. See Appendix:Invented phobias and aurophobia or Category:Wiktionary pages that don't exist. Though I'm not sure how much support this has either, it strikes me that the situations are similar. Conrad.Irwin 09:33, 12 May 2008 (UTC)[reply]
{{only in}} (or an enhanced descendant) could be of great use for keeping persistent bad full entries out of principal namespace and for getting some better use out of the Appendices. Addressing more of the full range of user and contributor "errors" might be valuable for reducing vandalism and speeding users toward the entries that they really need. In contrast to redirects, but like {{misspelling of}}, it compels users to note that they have made an error. This seems like yet another good use. DCDuring TALK 12:26, 12 May 2008 (UTC)[reply]

User Richardb

User:Richardb has been confronted with repeat copyright violation (see User_talk:Richardb#Citations pages) which was dumped into the Citations namespace. His responses were "I'll leave them there and let another editor format them" [16]; "since you like being the policeman so much I'm sure you'll get far more enjoyment out of doing the deletions" [17]; and "Aw sod off the lot of you. Get a life. I'm busy putting decent stuff into the Wiktionary. Can't be bothered with you boring lot. Won't be replying to this crap any more." [18].

Copyright violation is too serious for such a flippant attitude, particularly for a Wiktionary administrator. I'm now of the opinion Richardb should be desysopped (at the least) and possibly banned if this continues to be his attitude towards violating copyright law. --EncycloPetey 06:49, 4 May 2008 (UTC)[reply]

Agree. I'm sorry to say that this user's words and deeds have pretty thoroughly ruled out good faith, and appear to indicate that he poses an unacceptable risk to the project. Further, this is not the first occasion that Richardb has indicated he does not consider himself bound by community norms. Lapses of judgment or temper are one thing, but that is not really an acceptable attitude for an admin.
I don't think an outright ban is necessary, provided that Richardb lives up to his (apparent) commitment to stop engaging in copyvio. He has made valuable contributions here, and hopefully will do so again in the future. -- Visviva 07:10, 4 May 2008 (UTC)[reply]
However good a contributor is otherwise, this attitude towards copyright violations is completely incompatible with Wiktionary, and doubly so of administrators. Unless he changes his tune very quickly I don't see an option other than formally requesting he be desysopped. Thryduulf 08:11, 4 May 2008 (UTC)[reply]
Well, although I don't know if it counts for a change of tune, his remark of 22:04, 3 May 2008 (UTC) on User talk:Richardb#Citations pages seems to indicate that he does not intend to continue, although he also does not intend to clean up after himself. AFAICT cleanup is now complete in any case, so there does not appear to be an imminent threat to the project. Nonetheless, IMO the risk of having an admin with such open disregard (as it appears) for the most fundamental principles of Wiktionary is still great enough to justify desysopping. Not sure what the procedure for that is... I believe stewards look for clear local consensus, but I'm not sure if that would require a formal Vote or not. -- Visviva 09:10, 4 May 2008 (UTC)[reply]
De-sysopping would not prevent a recurrence of the specific copyvio problem and risks causing worse problems. The copyvio issue is easy to fall afoul of, speaking from experience. I'm more concerned with seemingly petulant responses to reasonably polite and even very polite feedback. We have seen some fairly disruptive behavior by some of those who feel unhappy with and alienated from the Wiktionary culture. The disruption wastes our time when it occurs, even though it is remediable. In this case "you boring lot" is a possible sign of that kind of unhappiness and alienation. It would be better to have Richardb on board and contributing than hostile and non-contributing. Realistically, we are better off to let slide troubling incidents separated by months. But the reservoir of AGF good will does decline with every incident. Ordinary contributions alone do not restore it, IMHO. DCDuring TALK 09:44, 4 May 2008 (UTC)[reply]
Hear, hear. Widsith 12:07, 4 May 2008 (UTC)[reply]
The only reason for removing the sysop flag that I can see is that were anyone to sue WMF about copyright issues the fact that he is an "administrator" doesn't look good. There hasn't been an abuse of the tools. I do agree that it would be best if Richardb at least stopped behaving the way he has been, and perhaps is willing to clean up the stuff that is questionable. I don't think that it is necessary for us to de-sysop him, if he doesn't want to play along anymore maybe he wants to volunteer to step down. - [The]DaveRoss 16:54, 4 May 2008 (UTC)[reply]
I don't know. He seems to be fairly iffy now on the general topic of following community rules; for example, when I mentioned AGF to him, his response was basically a flat-out refusal to abide by it. So far, none of his willful rules violations has involved admin tools (granted, he deleted RFDO once, but that seems to have been mostly accidental), but do we think he draws a clear line there — "I'll break the rules that anyone can break, but not the ones that only admins can break"? If not, I don't see the point of waiting until he's actually abused the admin tools. Adminship is a matter of active trust, not a passive "benefit of the doubt"–type trust. Personally, I'd have preferred that we talk to him about this; but as soon as he stated openly on his talk-page that he refused to engage in further discussion of his misbehavior, I think EP did rightly in bringing this here and raising the possibility of de-sysopping. (That said, DCDuring makes an eloquent appeal for not de-sysopping him, and assuming that he does indeed stop with the blatant copyright violations, I'm quite happy to hold off until the next "troubling incident", whenever and whatever that might be.) —RuakhTALK 21:17, 4 May 2008 (UTC)[reply]
If people feel strongly about it it is worth a vote. You are right about the trust issue, and I think that there have been a few questionable behavioral issues in the recent past with Richardb. A "no confidence" type vote might give people the opportunity to voice their concerns and comments. I don't know that it would succeed, but it would bring out some until now silent voices of defense. - [The]DaveRoss 21:32, 4 May 2008 (UTC)[reply]
Some people make it really easy to flip from being a good sysop to a bad one. Not that we don't have bad ones to begin with, they're just bad in different ways. Certain sysops have made me a lot less willing to take part in community discussions. Kinda sucks that people who can make wiktionary so unpleasant can still be considered worthy of their sysop powers while other transgressions are held to be more...diabolical. mwahahaha (Kinda not on the exact point, I just wanted to say this) — [ ric ] opiaterein13:15, 5 May 2008 (UTC)[reply]

Pinyin without tone markings

We are suffering from an epidemic of these lately. The entries added seem to fall into three categories:

  1. Entries that are tone-marking-free versions of otherwise valid Pinyin words: jinu, tiao
  2. Words of type 1 that may actually be used in English and other diacritic-averse languages: Hanyu Pinyin, Guomindang (?)
  3. Alleged Pinyin misspellings, particularly involving the letter "v" (is there some sort of variant system at work here?): lvxing, jinv

I'm assuming that types 1 and 3 should be deleted with prejudice, while type 2 should be converted to English. Is that correct? It would be nice if some guidance on these points was added to WT:AZH. -- Visviva 03:40, 5 May 2008 (UTC)[reply]
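For readers unfamiliar with how the three types relate, here is a minimal Python sketch that derives a toneless form (type 1) from tone-marked Pinyin, and then a "v" spelling (type 3) on the assumption that "v" is standing in for "ü", as it commonly does on keyboards that lack that letter. That reading of type 3 is an assumption, not something established in this thread, and the function names are hypothetical.

```python
import unicodedata

# Combining marks used for the four Mandarin tones in Pinyin:
# macron (tone 1), acute (tone 2), caron (tone 3), grave (tone 4).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Remove tone marks but keep the diaeresis on u-umlaut (e.g. in the syllables lü, nü)."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def v_variant(pinyin):
    """Toneless form with u-umlaut replaced by v (assumed keyboard convention)."""
    toneless = strip_tones(pinyin)
    return toneless.replace("\u00fc", "v").replace("\u00dc", "V")


if __name__ == "__main__":
    # l + U+01DA (u with diaeresis and caron) + x + U+00ED (i acute) + ng
    word = "l\u01dax\u00edng"
    print(strip_tones(word))
    print(v_variant(word))
```

Under these assumptions, tone-marked tiáo reduces to tiao, and the tone-marked spelling behind lvxing reduces first to lüxing and then to lvxing.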

There's nothing wrong with entries without the tones, as long as the tones are specified. They're useful because you can see which words have almost the same pronunciations except for the tones. In theory, we could keep ONLY these while specifying the tones in the headword, instead of keeping entries for every different pronunciation with tone marks in the page title. (Latin doesn't specify which characters have macrons in the page title, why should Chinese specify the tones?) Note also that we don't really have a system for marking tones for Cantonese, without the numbers. So when we manage to find a good Cantonese contributor, what then? Entries like wong4fan1? Whatever decisions we make about this can't be so hasty. — [ ric ] opiaterein13:09, 5 May 2008 (UTC)[reply]
Well, personally I'd like to see evidence that pinyin is ever used to write Chinese by actual Chinese speakers communicating with other Chinese speakers. Here and elsewhere, I've seen claims that it is used in children's books (but nobody seems to have a specific book title or ISBN handy), and that it is or has been used for internet communication due to the complexities of encoding (but the only uses of Pinyin on Usenet seem to be by/for learners). But that's more of a general issue... At any rate, if Pinyin is really used for Mandarin, but not really used for Cantonese (etc.), then it seems obvious to me that only Mandarin Pinyin entries should be permitted here, and only in the form in which they are actually used.
Regarding Latin, I'm given to understand that the reason is that Latin has seldom/never actually been written with the macrons; they are purely a lexicographer's convention. If that's also the case for tone markings in pinyin, then by all means we should eschew tone markings. But in any event, we shouldn't get into the trap of having entries for "words" that are never used for communication in any language. We are the dictionary of all words in all languages, not the dictionary of all words in all languages transliterated into all possible writing systems. -- Visviva 13:27, 5 May 2008 (UTC)[reply]
Let's not forget that pinyin is the official transliteration system even in China. Also, why discriminate against "learners" in favor of native speakers? What good is the English wiktionary going to do for native speakers of Chinese? Unless they're learners :) — [ ric ] opiaterein 14:33, 5 May 2008 (UTC)
The point is that all words have to pass WT:CFI, which means basically that they have to be verifiably used to convey meaning in the given language. People trying to learn a language online are not a valid source of information here (they are a big part of our target user base, but that's another thing entirely). Treating interlanguage as a language in its own right makes sense in studies of second language acquisition, but it is not a very useful approach for lexicographers to take. Also, I may be wrong, but I'm fairly sure the Pinyin which is official in the PRC uses tone marks.
Anyway, sorry to have driven this off-course ... while I remain dubious of Pinyin entries in general, what I really want to know about is what the community thinks of these nonstandard, ad-hoc Pinyin entries. Is there some unique rationale for keeping these entries that would not apply to any ad-hoc romanization of any language written in non-Roman script? -- Visviva 14:52, 5 May 2008 (UTC)[reply]
I used to try to create separate pinyin entries, but no longer do so. If someone else creates a pinyin entry, I make an attempt to correctly format it (time permitting). The reason I no longer create pinyin entries is that if you create a proper Mandarin entry using simplified or traditional characters, you should be able to type pinyin into the search box, and find what you're looking for.
As for the "lvxing" spelling, it is not "legitimate" pinyin. It should be lüxing, or more correctly, lǚxíng (旅行). If I were to use a Pinyin-based IME to type 旅行, I would have to type "lvxing" in order to get what I want. Most English keyboards don't come with a "ü," so many IME's substitute a "v" for purposes of typing. -- A-cai 13:51, 5 May 2008 (UTC)[reply]
Aha! Thanks for that info. Is there a good way to note this in the 旅行 entry (and others, as appropriate)? That could help to resolve the anon's concern at Talk:lvxing. -- Visviva 13:57, 5 May 2008 (UTC)[reply]
As a Chinese learner, I often want to verify a word I've learned. For example, looking up wanshang (without tone markings, which are hard to type in the search box). So I don't think such entries should be deleted; they can redirect (except in the case when there is more than one word for a single romanized spelling). 24.29.228.33 16:21, 5 May 2008 (UTC)[reply]
Whenever you edit an entry, at the very bottom there is a drop-down list with a Pinyin section containing characters with tone marks, which you can insert by clicking. Search results should include entries with transliterations with tone marks, even when you don't type them explicitly. --Ivan Štambuk 19:13, 5 May 2008 (UTC)
No, they can't redirect, because there is every likelihood that wanshang (inter alia) is an actual word in another of the thousands of languages we seek to cover.
If the non-marked Pinyin is included in the relevant entries for real words (real Pinyin and Hanzi), those pages will appear in both internal and external search results. Would that be sufficient? -- Visviva 09:10, 6 May 2008 (UTC)[reply]
  • Tone-marked Pinyin is very hard to input, especially for Chinese beginners. However, non-tone-marked Pinyin is convenient for processing. Actually, the tones are marked within the content of the entry (for example: tongyi).
Completely agree with Visviva. While I've never been a huge fan of having transliteration entries at all, people keep saying that Mandarin's a special exception, and I'm willing to take them at their word on that. However, there absolutely needs to be a standard. That standard should be whatever it is that people are actually using to communicate, be it with accents, without, whatever. And if both are used, then we need to pick one of them, because having two sets of transliteration entries is simply too much. -Atelaes λάλει ἐμοί 17:02, 6 May 2008 (UTC)[reply]
It is not an exception. The pinyin entries are there not because they are transliterations, but because pinyin is very often used to write Mandarin. Chats, IRC, SMS messages, email, etc frequently use pinyin (usually sans tone markings). We don't want any "transliteration" entries. We have entries for the single syllable words with and without tone markings (this is a finite set, about 1700 IIRC), we should have entries for common words often written in pinyin; this is what (e.g.) A-Cai has been doing. Robert Ullmann 14:48, 9 May 2008 (UTC)[reply]
Just to reinforce Robert's point, here is a link to a picture of a book cover (note the Pinyin without tone marks). I occasionally cite this dictionary in my entries. Another use of Pinyin, which Robert did not mention (but seems worthy of notice), is in URLs. For example: http://www.kexue.com.cn (kexue means science). -- A-cai 11:39, 10 May 2008 (UTC)[reply]
Well, people do lots of funny things on book covers. Does the dictionary also use pinyin without tone markings in the entries? That would be interesting. I have a hanja dictionary that includes pinyin (along with kana), but it uses tone markings.
The URL argument would surely apply to all romanizations, including those of Korean, Arabic, etc. Probably not a road we want to go down. :-) -- Visviva 12:15, 10 May 2008 (UTC)[reply]
I'm not disagreeing -- I honestly don't have enough information -- but I'm troubled that in the several times this issue has been raised, not one verifiable case of Pinyin being used for authentic communication among native or native-like speakers of Mandarin has been provided. Obviously chats and IRC aren't normally archived in a durable (or even non-durable) fashion. But we don't normally accept words under these conditions. -- Visviva 12:15, 10 May 2008 (UTC)[reply]
I'm only half-heartedly defending Pinyin entries. In truth, if we have them at all, I would be more in favor of them being created by a bot (i.e. converted from a simplified or traditional entry). The fact of the matter is that a number of contributors have added Pinyin entries (with and without tones). The real question is whether we want to encourage or discourage them from contributing in this way. Personally, I've always felt that multiple entries for a single word is one of Wiktionary's drawbacks. It creates too much busy work for contributors (particularly in Chinese), and often results in multiple inconsistent entries for the same word (no matter how hard I try to sync them up). However, given Wiktionary's current technical limitations, I'm not sure that we have another good option to multiple entries. -- A-cai 12:55, 10 May 2008 (UTC)[reply]
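
As an illustration of the spellings discussed in this thread, the following minimal Python sketch derives the toneless form (as in wanshang) and the IME keystroke form (as in lvxing) from tone-marked Pinyin. The function names strip_tones and ime_spelling are hypothetical, chosen only for this example. The sketch assumes the four tone marks are the macron, acute, caron, and grave accents, and it keeps the diaeresis of ü. It does not describe how Wiktionary's search actually works.

```python
import unicodedata

# Combining marks assumed to be the four Pinyin tone marks:
# macron (1st tone), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Remove tone marks but keep the diaeresis, e.g. the u-umlaut in lüxing."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC recombines u + diaeresis back into a single character.
    return unicodedata.normalize("NFC", kept)


def ime_spelling(pinyin):
    """Toneless spelling with "v" standing in for u-umlaut, as typed in many IMEs."""
    return strip_tones(pinyin).replace("\u00fc", "v").replace("\u00dc", "V")
```

Under these assumptions, ime_spelling applied to lǚxíng yields lvxing, and strip_tones applied to wǎnshàng yields wanshang. Because only the four tone marks are removed, ü survives in the toneless form rather than collapsing to a plain u.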

Following the RFDO discussion that resulted in terms relating to field hockey now being categorised in category:Field hockey rather than directly in category:Hockey, this template now labels entries as (field hockey). Thus I propose it should be renamed to template:field hockey. I would just do this, but I'm unsure if this would cause any issues for the articles that transclude it. Additionally, I'm not certain that we would want to keep the resultant redirect. Thryduulf 12:00, 5 May 2008 (UTC)[reply]

Done We definitely should keep the resultant redirect, I would (as I said on RFD) never refer to Field hockey as anything other than "hockey". Conrad.Irwin 16:24, 5 May 2008 (UTC)[reply]
As a Canadian, I take exception. Hockey always means ice hockey, and this view is supported by the Canadian Oxford Dictionary. The primary sense is our unofficial national sport, while other sense is a mere Britishism. :-)
I suggest creating a neutral template:hockey which places entries in category:Hockey, where they can be easily found and assigned to the correct subcategory(ies). Michael Z. 2008-05-05 17:03 z
Would that category be for words that are used in the contexts of all forms of hockey, or for no words at all? Either way, the category text should state the category's purpose clearly, so that people familiar with only one type of hockey or the other don't assume it's talking about their type.—msh210 17:33, 5 May 2008 (UTC)[reply]
Good question. Do we prefer to see the general (hockey), or the wordier (ice hockey, field hockey) in a sense? Personally I think it is best to reduce the number of unique terms used, and remain unambiguous, so I think the latter might be preferable. If we see (hockey), we may not know whether a Canadian or British editor meant only the type they are familiar with, or hockey in general.
I don't know enough about field hockey to compare, but the terminology looks to have a lot of differences. Terms which overlap are goalie, hockey stick, wing, hook. Michael Z. 2008-05-05 18:09 z
I don't know huge amounts about either sport, but I know a little more about field hockey. Basically they are different sports that have evolved from a common premise (i.e. a team sport the object of which is to score goals by using a stick to hit a small object into a net) - they are different enough that even for something as basic as hockey stick, we need separate definitions. I wouldn't object to keeping template:hockey as a way to categorise words temporarily until they can be sorted into the correct sport. "Field hockey" is not a term that I have ever seen or heard in the UK, so it wouldn't be intuitive to British editors to categorise their word such. I guess the same may be true of "ice hockey" in North America? Thryduulf 18:29, 5 May 2008 (UTC)
Ice hockey is heard in Canada, but rarely used, except when it is specifically needed to differentiate from the variations floor hockey (e.g. in gym class, with a light plastic puck), field hockey, street hockey, etc. The CanOD's definition of ice hockey is "= hockey 1". I believe that British-style field hockey is played, but it is unfamiliar, and most Canadians would assume that field hockey is just ice hockey played outdoors in summer = ball hockey.
I don't know if this holds true in the central and southern USA, where winter ice rinks aren't ubiquitous. Michael Z. 2008-05-05 18:42 z


By the way, it sounds like the main definition should be moved from field hockey to hockey (2). Michael Z. 2008-05-05 19:19 z

Yes it should. 19:31, 5 May 2008 (UTC)
Done, please review. Michael Z. 2008-05-05 20:27 z


One more: please review street hockey, to which I added the Canadian form. These could be reasonably combined into a single definition based on hockey, but that would be treating the two different senses as one, and worse, presenting two distinct games of street hockey as one. Michael Z. 2008-05-05 20:49 z

They all look good to me. Thryduulf 21:23, 5 May 2008 (UTC)[reply]

While I have your attention, please check the descriptions at hockey stick ("primary implement" just didn't seem that useful, and there was no indication that they were different). Michael Z. 2008-05-05 22:37 z

Interwiki links to redirects

There is some debate about whether we should use interwiki links to link to redirects on foreign Wiktionaries. In particular the issue is centered around User:RobotGMwikt, which is currently set to remove interwiki links that point to redirects, though I hasten to add that the underlying issue is far more important than the bot issue (for this discussion). As this has been raging on IRC for the last 48 hours, I hope that posting it here will help to resolve the situation.

For those who don't know, the interwiki links are used on Wiktionary to link pages with exactly the same title on each Wiktionary. i.e. our entry hello links to the French hello. An issue presents itself when the other Wiktionary has a redirect at the page title, should we link to it (on the grounds that there is definitely some information there) or not link to it (on the grounds that it is not the kind of information that people are expecting from the interwiki links). There are no doubt stronger arguments both ways, and GerardM has written a summary of his thoughts at http://ultimategerardm.blogspot.com/2008/05/robotgmwikt.html which are worth reading before entering the discussion.

I would prefer if redirects on other wiktionaries were not interwiki-linked to. If they were real words, why are they then redirects? And if they aren't real words, why imply that you can get somewhere by clicking the interwiki link? I think this will increase as wiktionaries grow. ~ Dodde 22:56, 5 May 2008 (UTC)[reply]
Likewise, I would not want links to redirects. I can imagine hypothetical cases where iw links to redirects might be desirable, but to date I have not encountered such cases except as mistakes. --EncycloPetey 23:19, 5 May 2008 (UTC)[reply]
I do want redirect-links. If another Wiktionary sees fit to make use of a redirect, then I see fit for us to respect that use, link to that redirect, etc. This is especially true because in many cases, it's fairly arbitrary which entry is the redirect and which is the main one; for example, our don’t is a redirect to don't, but another Wiktionary might well do the reverse. Would y'all suggest that our entries shouldn't link to each other? —RuakhTALK 00:31, 6 May 2008 (UTC)[reply]
Personally I think soft redirects are better suited for intended redirects, because you are able to explain why you are redirecting and let the user stop guessing, and as such they will qualify for interwiki links anyway. What a hard redirect means differs so much between projects, and also from case to case, that I see no reason to include these pages in the web of interwiki links. Some interwiki links could probably be argued to be justified, but I am afraid a lot more would not, and this would imho end with more confusion than clarity. You always have to be aware of what a redirect means on that particular wiktionary, if any system is present at all. ~ Dodde 00:59, 6 May 2008 (UTC)
I agree with Ruakh. The whole point of linking to redirects is precisely because other wikis use redirects differently, and no one Wiktionary should dictate their use. If another wiki wants to redirect all alternative spellings, or plurals, or whatever, to a single article, we shouldn't then remove all links to those redirects, as if that wiki doesn't have the content. Likewise, in the rare cases where redirects are used on en.wikt, aside from the conversion script's little droppings, they are done to consciously take someone searching for one thing to the page where the content actually is. Dmcdevit·t 01:18, 6 May 2008 (UTC)[reply]
A good way to look at this is to view it as if you are deciding what to do on another wikt about links to the en.wikt. If you have a local entry for an idiom (one of the cases where we use redirects), but not in the same canonical (or "citation form") as the en.wikt, would you want to link to the redirect? If your entry is apple of one's eye do you want to link to our redirect? Of course, you aren't as likely to have apple of somebody's eye. Likewise if you have Arabic or Hebrew forms that we have redirected to the forms w/o vowel markings, and so on. In the same way, when the FL wikt redirects forms or variants, we want to link to them, respecting whatever policy they have. If the FL wikt changes something, we just link to whatever they are doing (and see next section). Robert Ullmann 11:08, 6 May 2008 (UTC)
I agree with the above points to include links to redirects, since there is no way to know whether the redirect is useful or not. Of course, an option would be to immediately link to the page which is redirected to. Would that be a problem?
Note that at the same time I am a fan of soft redirects as well; it’s just that for some cases, they don’t make sense, as Robert pointed out. H. (talk) 07:54, 9 May 2008 (UTC)
I believe that each language community should be able to decide how to structure their data: if they want to use soft redirects or hard ones, what to do with alternate spellings or clitic forms, whether to include romanized entries or not, and all of these things mean that we should allow iwikis to redirects. Sure, there are going to be some mistakes. Over time, those will be fewer and fewer (we hope). Right now we have iwikis from a page which has a word in one language to a page on another wiki with no entry for the same language. That's an iwiki that's not so helpful; but no one suggests doing away with them. We just figure that over time we'll get it right. -- ArielGlenn 20:46, 14 May 2008 (UTC)
What you may or may not do in the future is a hand-waving exercise. We are talking about the current state of play. Currently there are four solid reasons why we should not refer to redirect pages and your argument does not diminish any of them. GerardM 07:38, 15 May 2008 (UTC)
Forgive me, but we have two solid reasons why a Wiktionary shouldn't (in general) use them (our multilingual nature and the issue of homonymy), one solid reason why a lot of Wiktionaries have them anyway (case conversion), and one supposed reason why we shouldn't link to them (a claim that they don't really have the target entry anyway). Ariel's argument diminishes none of them because none of them needs diminishing: they range from petty to irrelevant. —RuakhTALK 22:35, 16 May 2008 (UTC)[reply]

There is no way in which you can distinguish between intended and unintended redirects. This is why the argument is moot. GerardM 10:34, 13 May 2008 (UTC)

I'm not following this. Bots cannot make the distinction (presumably), but humans can; so these should probably not be added by bot, but there is no reason they cannot be added by humans. -- Visviva 11:00, 13 May 2008 (UTC)[reply]
If you suggest that all interwiki links are to be created by humans I think you are completely right. In that case we do not need to argue about the algorithm used by bots. GerardM 13:51, 13 May 2008 (UTC)[reply]
So is a middle path here to allow people to add iw-links to redirects manually in specific cases, while bots should neither add nor remove iw-links to redirects? It seems the argument for including iw-links to redirects is that we might miss a useful link here and there, but the negative effect of adding a lot of "false" iw-links seems at the same time to be completely overlooked. Allowing these to be added manually will have the positive effect that iw-links are added only where there is a good reason, and where there is not we are spared a lot of "crap" iw-links. (Regarding the bot, is this kind of distinction possible/easy to implement?) ~ Dodde 14:32, 16 May 2008 (UTC)
If in many Wiktionaries, "bad" redirects continue to exist erroneously after ConversionScript, iwiki linking to them will draw attention to that and in the long run improve quality. There's a reason red-links are red; it draws attention to ways to possibly improve Wiktionary. Interwiki links to unwanted redirects are not desirable, but the problem is the bad redirect, not that we link to it. Don't hide the error, let each Wiktionary determine how to use redirects, have the wiki bots link to them, and when we iwiki to a "bad" one, let someone clean it up. --Bequw¢τ 15:50, 17 May 2008 (UTC)
I have given this some thought, and it is possible that my mind has been affected by narrowness to some extent. I have taken in some of the arguments for allowing redirects to be iw-linked to, and all in all I think I agree more than disagree now, that redirects should be linked to, given the variety in how different language editions of Wiktionary choose to use their redirects within the project. It's not just a matter of using the character ' or ´, but the way of presenting some words in determined form or not (like some languages names: Canarias - or - Las Canarias etc.) - probably there are quite a few examples of similar differences between language editions of Wiktionary. I understand it was quite some time since this was discussed, but since I was one of those arguing against iw-linking to redirects I felt compelled to acknowledge my change of mind. ~ Dodde 03:03, 14 July 2008 (UTC)
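
The "middle path" raised in this thread (bots neither add nor remove links to redirects, leaving those to human editors) can be sketched as a small decision function. This is only an illustration in Python: the name bot_action and its three boolean parameters are hypothetical, and nothing here reflects the actual code of RobotGMwikt or Interwicket.

```python
def bot_action(link_present, target_exists, target_is_redirect):
    """Decide what an interwiki bot does with one link under the middle-path rule.

    Returns "add", "remove", or "none".
    """
    if not target_exists:
        # A link to a page that does not exist is removed in any case.
        return "remove" if link_present else "none"
    if target_is_redirect:
        # Middle path: the bot leaves links to redirects entirely to humans.
        return "none"
    # Ordinary content page: make sure the link is there.
    return "none" if link_present else "add"
```

For example, a link that a human added to a foreign redirect gives bot_action(True, True, True), which returns "none", so the bot leaves it in place. Whether a bot can cheaply tell that a target is a redirect is a separate question that this sketch takes as given.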

Interwicket and Arabic

Why is Interwicket removing so many interwiki links to ar? I've checked, and the links do not exist, so the bot seems to be functioning properly. Did a mass deletion happen at ar.wiktionary.org? Anyone know what's happened? --02:06, 6 May 2008 (UTC)

ar:User:Lord Anubis did remove a large set of entries, all capitalized forms (Destiny, Comoros, etc) so the bot is functioning properly. Why they were removed is not known (the edit summary is "Bot: deleting a list of files", very helpful); I've dropped the user a note expressing curiosity. They were not uc→lc redirects: the would-be lc targets don't exist in at least some cases I've looked at. (e.g. ar:destiny doesn't exist) In any case, not our problem? (;-) Robert Ullmann 10:46, 6 May 2008 (UTC)
The entries I'm noticing are for proper names of stars (e.g. Algol, Deneb, etc.) and constellations (e.g. Cancer). And earlier today the link to Deutsch disappeared. Do you suppose they've eliminated capitalization altogether? --EncycloPetey 13:22, 6 May 2008 (UTC)
There are lots of words, all capitalized, see log there. But not redirects, since lc form isn't there (deleted Crossover, but no crossover). So it was some sort of content page? Perhaps a bunch of stuff imported a long time ago that they decided to just trash? No way to tell. Robert Ullmann 13:32, 6 May 2008 (UTC)[reply]
We were removing content that was imported from GPL-licensed lists, because GPL is not compatible with GFDL.
We are trying to get these lists licensed under a dual GFDL/GPL license, but till this is done, we cannot use the content on Arabic Wiktionary.
It's all under control :).
Oh and btw, the edit summary was initially "Deleting GFDL-incompatible files", but my bot goes crazy sometimes. :).

--Lord Anubis 15:43, 7 May 2008 (UTC)[reply]

Thank you for the explanation (:-) Robert Ullmann 15:44, 7 May 2008 (UTC)[reply]
Oh yes, and next time we add them, we'll ensure that they are not unnecessarily capitalised.--Lord Anubis 15:45, 7 May 2008 (UTC)

proposed vote on inclusion of WMF jargon in the main namespace

Per a discussion on RFV, I have proposed a vote on the inclusion of WMF jargon into the main namespace. Please change its wording as needed and comment (there, not here).—msh210 19:02, 10 April 2008 (UTC)[reply]

I've now modified it; please have a look.—msh210 18:26, 1 May 2008 (UTC)[reply]
And now it's live.—msh210 16:04, 8 May 2008 (UTC)[reply]

This is useful, but should it be moved to Appendix:List of Latin phrases in English? Which is what it seems to be. Widsith 11:42, 9 May 2008 (UTC)[reply]

Only if these phrases are not used in other languages. I can't say because I'm uninformed on the possible use of these phrases in French, German, Polish, etc. However, on a quick look, they seem to be phrases that would likely have been used either in conversational Classical Latin or written Latin of the medieval and later periods. --EncycloPetey 13:39, 9 May 2008 (UTC)
To call it something other than its current name seems premature. What we have is a list of Latin phrases some of which may sometimes be embedded in English text, with English translations and English commentary. I consider it a useful document for adding new entries and for facilitating certain kinds of searches. It is likely to have other uses. Verification that the majority of the phrases appear in English, let alone other languages, is not available.
If the entries for the listed items are done under the Latin L2 heading, should we indicate that the headword was commonly used in English? Does that fact need to be attested? DCDuring TALK 14:08, 9 May 2008 (UTC)[reply]
  • A list of phrases which simply existed in Latin would already seem to be covered by Category:Latin phrases. Whether or not they exist in French/Polish etc does not seem to be addressed by the Appendix, which translates them into English and explains them in English. So it seems to me that the Appendix is designed to list all the Latin phrases which are used by English writers, and its current pagename is confusing (to me at least) because of its apparent crossover with Category:Latin phrases. Widsith 15:43, 9 May 2008 (UTC)[reply]
    How did you conclude that the Appendix is designed for that? Solely because it explains them in English? This is the English Wiktionary, so everything is explained in English. I don't see any evidence on the Appendix page that indicates the list was developed specifically for terms that appear in otherwise English texts. --EncycloPetey 17:58, 9 May 2008 (UTC)[reply]
OK. So does that mean its content will be the same as that of Category:Latin phrases? Widsith 18:03, 9 May 2008 (UTC)[reply]
No. The main namespace requires entries to have 3 citations (or 1 in some cases), and forbids entries that are mere sum of parts. An Appendix is often freer in what it permits, and may include items that would not be included in the main namespace. --EncycloPetey 21:47, 9 May 2008 (UTC)[reply]

Yeah, I don't know if there is a limited source or sources, but the list appears to be a compilation of phrases used in English as well as specific mottos and quotations (e.g., "cave canem" from a Pompeiian doormat). I suspect it's too late to apply a specified scope to this large list, but it could be split off into other more specific lists, if someone wants to take it on as a project. Best just to let it continue to grow, and continue to apply the normal attestation requirements on Wiktionary entries for both Latin and "English" Latin terms. Michael Z. 2008-05-09 22:59 z

  • The reason I asked is because you can get those dictionaries of Latin terms in English, and I thought this was maybe our own useful version of such a thing. But apparently not. Widsith 07:38, 10 May 2008 (UTC)[reply]

Fixing wikisyntax typos

I've created a punch list of mismatched ( ), [ ], and { } in entries; they are often very hard to notice even when looking right at them. If anyone would like to help fix them, see User:Robert Ullmann/Mismatched wikisyntax; comments and suggestions are of course wanted. There may also be other things it can look for. Robert Ullmann 14:36, 9 May 2008 (UTC)
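
For readers curious how such a list might be generated, here is a minimal Python sketch of a stack-based check for unmatched ( ), [ ], and { }. It is an assumption about one possible approach, not the method actually used to build the punch list; the name find_mismatches is hypothetical.

```python
# Map each closing bracket to the opening bracket it must pair with.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_mismatches(text):
    """Return sorted (position, character) pairs for brackets left unmatched."""
    stack = []      # open brackets still waiting for a partner
    problems = []   # closing brackets that matched nothing
    for pos, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((pos, ch))
        elif ch in PAIRS:
            if stack and stack[-1][1] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append((pos, ch))
    # Anything still on the stack was opened but never closed.
    problems.extend(stack)
    return sorted(problems)
```

On the text {{en-noun} this reports the first brace, at position 0, as unmatched. A check this simple would also flag legitimate lone brackets, such as emoticons or numbered lists written as 1), so its output would still need human review.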

I have made an initial entry under the English heading. But our colleague EP suggests there could be a case for a Translingual header. What is the general opinion? Is this term widely enough used in most languages? -- Algrif 14:06, 10 May 2008 (UTC)

Google finds a few instances of the phrase in German, Dutch, and Italian Wikipedia, so it's possible that it is more widely used (although the vast majority are on en: and la:).
Is there a guideline with our definition of translingual: how many languages do we need attestations in to call it translingual, rather than simply a Latin borrowing into several languages? Or do we reserve the designation for things which are more inherently universal, like chemical symbols and proper names of species? (see children of Category:Translingual.) Michael Z. 2008-05-10 19:05 z
Well, it can't be a Latin borrowing if it's not a set phrase in Latin. In Latin, this entry is merely sum of parts, and so would not merit an entry. However, if it occurs in the middle of texts of various languages in this set form, then we have a case for a Translingual entry. There are a large number of chemical symbols and scientific names of taxa that are translingual, yes, but also some abstract symbols, numbers, and some abbreviations and codes. There are also a few phrases or abbreviations of Latin origin, like sp., spp., etc. that have been adopted into many languages. --EncycloPetey 19:15, 10 May 2008 (UTC)
Perhaps it was a set phrase in the sciences, when European scientists still spoke and wrote in Latin. Michael Z. 2008-05-10 21:48 z
WT:ELE says "this heading includes terms that remain the same in all languages. The symbols for the chemical elements and the abbreviations for international units of measurement are but two examples of translingual terms" (my emph.). We should find attestations in a diverse selection of widely-used languages, say, Chinese, Spanish, Arabic, Hindi, and Russian, before we can conclude that it is truly translingual.
I guessed that in Cyrillic it might be ин вакуо, but only found a single Russian citation on the web, which appears to be quoting some Latin text. Of course, it might be Cyrillicized differently. Michael Z. 2008-05-10 19:20 z
Hm, things like chemical symbols, math, taxonomic names, metric units, internet top-level domain names are truly translingual, used everywhere. Perhaps etcetera is too, but I'm skeptical about things like sp., spp., for which other languages have their own names (e.g., Ukrainian вид, Turkish tür). Michael Z. 2008-05-10 19:28 z
I think we have interpreted Translingual to include terms in "scientific Latin" that have achieved some acceptance in the international scientific community. This doesn't seem clearly consistent with the phrase "in all languages" in WT:ELE.
This works pretty well for the taxonomic names and possibly for the language used to describe species and specimens. The extension to a term like "in vacuo" is a more modest stretch from the descriptive language used in botany. OTOH, EP has instructed me that the adjectives used in species names (eg, multiflora, latifolia, carolinensis) are Latin, albeit New Latin, not Translingual. DCDuring TALK 20:41, 10 May 2008 (UTC)[reply]
Oops, now that I think of it, Russian documents would probably write in vacuo in Latin characters, since they are not as foreign as Cyrillic characters are to English readers. I don't really read the language, but there appear to be a few cases of this in the first couple of pages of Russian-language search results. Michael Z. 2008-05-10 21:33 z


  • Surely it will have different pronunciations in different languages? Widsith 20:49, 10 May 2008 (UTC)[reply]
    How is that different from the chemical symbols for the elements? In English Hg is pronounced [eɪtʃ dʒiː], but this is not how it would be pronounced in French or Spanish. The scientific name for the Asteraceae (sunflower family) is pronounced differently in different countries as well. The "Translingual" label indicates only that the written form is common to many languages, and does not speak to the pronunciation. I know of no Translingual entries that would have the same pronunciation in multiple languages. --EncycloPetey 21:08, 10 May 2008 (UTC)
We have:
  1. unpronounceable or unpronounced (g2g) entries
  2. symbols (eg letters, digits) that do not have their own pronunciation, instead taking it from the associated word
  3. multiple pronunciations for the same word in the same language.
I don't think that pronunciation has enough muscle to determine this. DCDuring TALK 21:16, 10 May 2008 (UTC)[reply]
Also note that a symbol is different from a word. Hg can be spelled /eɪtʃ dʒiː/ or simply read /mṛkjuri/ in English, corresponding to two different pronunciations, /ha ge/ or /rtutʲ/ in Ukrainian. The Latin (translingual?) term in vacuo may be pronounced something like /ɪn vækjuo:/ in English, and practically identically /in vakjuo/, if read from a Ukrainian text. Michael Z. 2008-05-10 21:33 z

Perhaps the yardstick for translinguality is when something becomes a symbol, and is released from the restrictions of pronunciation in its original language. Asteraceae is still a Latin word: Canadian /æstəɹ'eɪjsiə/ or Eastern European /asterat͡s'eja/ are still examples of people reading Latin with their own accent. But $, mm, Hg, 42, °, =, .de would be spoken in the local language, and are going to need a very large "pronunciation" section. (.com may be an exception, because it is an acronym "dot com", not "dot see o em"). Michael Z. 2008-05-10 22:06 z

  • The fact that several languages may or may not use a Latin term does not make the term Translingual. It's totally different from a Chemical symbol like Hg. Someone writing in Taino or Xhosa can use only the symbol Hg if they are composing a professional chemical document. There is no analogous situation for a phrase like in vacuo, which I simply do not believe can be valid usage in every language in the world. Widsith 22:47, 10 May 2008 (UTC)[reply]
First, I hope that we are not challenging the labelling of taxonomic names as Translingual.
Is in vacuo like carolinensis (which is Latin, per EP)?
  • If so, then in vacuo is Latin. If it is Latin, we then need to determine whether it meets WT:CFI. I would propose that, on the one hand, it is Latin, but, on the other hand, it is not SoP because it is a set phrase in its use embedded in other languages, where it is used by those who might not be able to decipher it by its Latin components and know it mostly as a phrase and, accidentally, by its similarities to words in their language (for English speakers: in and vacuum).
  • To me it seems easier if they were both deemed Translingual based on their use in scientific literature and a separate determination were made of their qualification as Latin (use in epigrams (?), religious documents, and other modern Latin usage for such New Latin words). DCDuring TALK 01:33, 11 May 2008 (UTC)[reply]
A comparison between carolinensis and in vacuo is not appropriate. The term carolinensis is used regularly in Latin contexts. This happens in the Latin circumscriptions of newly described species, which are a requirement for legitimate publication of a botanical species. It also occurs in botanical texts of Linnaeus and others, who were still publishing in Latin. The adjective carolinensis also declines as a Latin adjective in these publications. By contrast, in vacuo is a prepositional phrase, so it will exhibit no inflection. If it is occurring as a set phrase in multiple non-Latin languages, then it is doing something that carolinensis is not. The adjective carolinensis does not migrate into non-Latin languages except as a component of a proper noun naming a species; it never crosses over as a word in isolation. We are saying that in vacuo does cross over as a unit. So, if we want to draw comparisons, we need to find other non-inflecting phrases for comparison, such as caveat emptor, ad hominem, or sub nomine. --EncycloPetey 01:51, 11 May 2008 (UTC)[reply]
  • Translingual is not panlingual -- it was not so long ago that we reviewed the fact that taxonomic names are not panlingual (East Asian and other languages use homegrown terms of equivalent specificity). It has been proposed somewhere that "translingual" denote a term used in three reasonably disparate languages. In vacuo would seem to meet that standard, although I'm not sure if it is used in any non-Indo-European languages (maybe Hungarian or Finnish?). -- Visviva 03:39, 11 May 2008 (UTC)[reply]
    How about Japanese: [19]. --EncycloPetey 03:46, 11 May 2008 (UTC)[reply]
  • Personally I think that misses the point. Translingual terms to me are terms which are used "by the international community", i.e. practically speaking within fields such as the sciences which have internationally-recognised terminology – IPA symbols, binomial classification, chemical symbols etc. Latin phrases seem to me to be a different kind of thing altogether. I mean the English word bar is used with identical meaning by dozens of languages around the world. Is it Translingual? Widsith 07:40, 11 May 2008 (UTC)[reply]
    No, because bar inflects differently in different languages, so it isn't the same across languages. By contrast, see the example above of in vacuo used in a scientific article that is primarily in Japanese. This indicates acceptance and use "by the international community". --EncycloPetey 13:14, 11 May 2008 (UTC)[reply]
  • Thanks all. Having gone through the pros and cons, I have changed it to Translingual, particularly noting that this heading does not mean Panlingual. I would appreciate any Latin input to improve the entry. -- Algrif 19:38, 24 May 2008 (UTC)[reply]

Which contains Citations: namespace pages.

I've got a new minion collecting words used in the 'pedia which we don't have. The citations probably don't meet CFI for strict 3-independent-source criteria (e.g. if RfV'd), but do provide one usage example and context. It isn't automatic, anything I'm suspicious of gets checked by me in Google, or just skipped. (I have code to tell it to correct spelling in wikipedia when it finds an error and I tell it the correct spelling; but they won't let Python edit ... so I just leave them ...)

Did you know we didn't have ethernet? (Citations:ethernet)

We have only 180K or so English entries, there are 500K missing to get to the Random House Unabridged (which is what I had under my bed growing up, I would lie there and read, and reach under and drag it out when I met a word I didn't know. I learned almost every word in it in the process :-). Lots more to find. Robert Ullmann 00:52, 11 May 2008 (UTC)[reply]

What is the mechanism for removing items from the list as entries are made? I would suggest a two-step process. First we need to check whether we have the lemma form of the word. If we do not, then there is more work - and more value.
The first non-lemma I tried modded led to an incorrect inflection of mod (one "d"), which was useful to correct. The second, crossbands led to a missing lemma.
This seems like a good way to generate some new entries. It would be interesting to find potential entries that were on multiple lists of wanted entries, especially lemmas. DCDuring TALK 01:52, 11 May 2008 (UTC)[reply]
When the entry exists, the Citations: page will no longer appear in this category. (modulo some job-queue updating that doesn't seem to find updates; they changed #ifexist to put the links in the page table, breaking Special:Wantedpages, but then didn't make it do what was intended! I had to purge Citations:scute to make it disappear from the cat). You are quite right, it leads to finding "lemmas" that are wrong (mod, you missed moding by CheatBot ;-) or incomplete (restructuring should have a noun sense).
Just to point it out clearly: these citations do not meet the strict CFI-3-independent-use requirement for an RfV; they are simply helpful and/or illustrative.
The method of finding words in the WP might shock you ;-) Robert Ullmann 12:21, 11 May 2008 (UTC)[reply]
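The category behaviour described in this thread (a Citations: page drops out of the list once its main entry exists) can be sketched roughly as below. This is only an illustration under stated assumptions, not the bot's or the template's actual code: the function name and the sample page titles are hypothetical, and the real mechanism relies on the wiki's own existence check rather than a Python set.

```python
def citations_needing_entries(citations_pages, existing_entries):
    """Return the Citations: pages whose main-namespace entry is still missing."""
    prefix = "Citations:"
    wanted = []
    for page in citations_pages:
        # Strip the namespace prefix to recover the entry title.
        title = page[len(prefix):] if page.startswith(prefix) else page
        if title not in existing_entries:
            wanted.append(page)
    return sorted(wanted)


# Hypothetical sample data, for illustration only.
pages = ["Citations:ethernet", "Citations:scute", "Citations:crossbands"]
entries = {"scute"}
print(citations_needing_entries(pages, entries))
```

A second pass of the kind DCDuring suggests, checking whether the lemma form exists, would need a lemma lookup that this sketch does not attempt.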

aphetic form of

Please see scoriating. I have not found an "aphetic of" template. What categorization should be used? --Panda10 14:35, 11 May 2008 (UTC)[reply]

The general template {{form of}} allows you to do this:
# {{form of|[[aphetic|Aphetic]] form|[[excoriating]]}}
--EncycloPetey 14:50, 11 May 2008 (UTC)[reply]
Thanks. --Panda10 14:55, 11 May 2008 (UTC)[reply]

Category for moving along on foot

I'd like to create a category for entries that mean moving along on foot. I have about 80 words/expressions on my list. Could you help me with the category name and its place in the category tree? Thanks. --Panda10 15:06, 11 May 2008 (UTC)[reply]

Somewhere under Category:Movement, I suppose. Not sure what to call it -- "Foot transport"? -- Visviva 16:09, 11 May 2008 (UTC)[reply]
Category:Human gaits looks appropriate. Mike Dillon 16:14, 11 May 2008 (UTC)[reply]
Thanks. I will use that. --Panda10 16:49, 11 May 2008 (UTC)[reply]

Category:Movement

This topical category is too ambiguous. I propose we do what Wikipedia has done and use Locomotion instead < parentage Biomechanics < parentage Physiology and Mechanics. __meco 06:13, 12 May 2008 (UTC)[reply]

I think our needs are rather different from Wikipedia's, particularly when it comes to these sorts of "everyday" concepts. The approach taken by WordNet is probably more suitable, at least for nouns (locomotion < movement < change(s) < action(s)). There may be other examples we would want to consider -- isn't there a map of the Roget's categories around here somewhere?
Anyway, I've said this before, I'll say it again -- Words Don't Have Topics. Our categories can map onto any number of lexical properties -- semantic relations (as here), discourse field, usage, etymology, etc., but they cannot map onto topics, because there is no meaningful association between (most) words and specific topics. Thorough category reform is needed, but presents serious challenges. -- Visviva 07:46, 12 May 2008 (UTC)[reply]
I see any move toward being more like Wikipedia as a step backwards :p — [ ric ] opiaterein 17:38, 13 May 2008 (UTC)[reply]

Foreign terms

[See also #How should Wiktionary distinguish between two classes of non-English words?, above.]

How to indicate foreign words which are normally written in italics?

My paper dictionary (CanOD) italicizes a headword "if the word is originally a foreign word and not naturalized in English." The various inflection templates could have a parameter like loan=yes, to italicize them. This could be accomplished with an HTML class, so the display of such loanwords could be customized by wiktionarians. For example, from comme il faut:

Adjective

comme il faut (comparative more comme il faut, superlative most comme il faut)

  1. Proper...

Or does this need to be applied to specific senses? (I presume that each specific etymology is likely to either be normally italicized, or normally not.)

How do we define foreign terms? I suggest that attesting italicized use may be good enough. Do we need to distinguish several classes of them (beyond what is accomplished by adding context labels)? Michael Z. 2008-05-14 01:15 z
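A minimal sketch of the idea of marking such headwords with an HTML class, so that the styling stays customizable. This is an assumption-laden illustration rather than how the inflection templates actually work: the function, the `loan` argument (mirroring the suggested loan=yes parameter), and the class names `headword` and `foreign-term` are all hypothetical.

```python
from html import escape


def render_headword(term, loan=False):
    """Wrap a headword in a span; un-naturalized loans get an extra CSS class."""
    classes = ["headword"]
    if loan:
        # Hypothetical class name; a stylesheet could italicize it by default.
        classes.append("foreign-term")
    return '<span class="{}">{}</span>'.format(" ".join(classes), escape(term))


print(render_headword("comme il faut", loan=True))
```

With markup of this kind, a default stylesheet rule could set the class to italic, and a reader who dislikes the convention could override that rule in a personal stylesheet.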

Our formatting is inconsistent enough that italicizing headwords, while a good idea, is probably not by itself sufficient to clarify anything, especially since (unlike in most print dictionaries) our headwords are not all close together in a way that makes variant formatting stand out. Similarly with example sentences and even quotations: it's not instantly obvious to a reader that our italics are meant faithfully. So, I'd support italicize=yes or something, but a stock usage note seems necessary as well. —RuakhTALK 01:44, 14 May 2008 (UTC)[reply]
Well, an advantage of italics is that it doesn't hit you in the face, but it is obvious when you look for it.
But where would you put a usage note? This seems to be something that belongs to the headword line, not to individual senses. Michael Z. 2008-05-14 02:06 z
Sorry, but I don't understand your question. Usage notes don't belong to individual senses; we sometimes use {{sense}} to indicate what sense they apply to (and usually clarify it in the text of the note as well), but that doesn't affect the placement of the note. And we have no shortage of usage notes that apply to all senses of a term. —RuakhTALK 02:42, 14 May 2008 (UTC)[reply]
Oops, I misunderstood. That makes sense. Michael Z. 2008-05-14 07:09 z
I once drafted a stock usage note at {{en-usage-foreignism}}. Thoughts & improvements thereon would be most welcome. -- Visviva 09:06, 15 May 2008 (UTC)[reply]

I'll work on implementing italicize=yes in the inflection templates. Anyone have comments or suggestions? Michael Z. 2008-05-14 19:22 z

If this is to be done, it should be done for form-of templates, too, no?—msh210 19:44, 14 May 2008 (UTC)[reply]
I think the inflection template is enough, since there is normally one present above the form-of template. But I'll think about where else we should italicize words. Michael Z. 2008-05-14 19:51 z
But we'd want consistency, no? Italicize all words that [whatever the criterion is]. That's whether they appear in inflection lines, in definition lines, or elsewhere. Or am I missing something? (Of course another consideration is that, I think, some people have the preference to view all form-of parameters in italics.)—msh210 2008-05-14 (9 Iyar 5768) 19:59:23 UTC
That might be nice, but might be impractical. It already collides with general use/mention italicization of English terms using {{term}}. Michael Z. 2008-05-14 20:04 z
Such words could be included in Category:English borrowed words. I'll think about adding a category hierarchy parallel to the etymological derivations for un-naturalized borrowings from one language to another. Michael Z. 2008-05-14 19:49 z
I think this is a bad idea. Firstly the parameter should (if this is to happen) not be called "italicise=yes" but something less format specific like "borrowed=yes" so that users don't think "I prefer italics, so I'll make my entries italic". Secondly, visiting readers will not know that italics isn't normal, if they even notice it is italic at all, and so we'd have to add the ====Usage Note==== anyway - making it redundant. Thirdly it makes everything just a little less consistent, and slightly more complicated - two areas which need to be moving in the opposite directions. Conrad.Irwin 20:34, 14 May 2008 (UTC)[reply]
Agreed. (My comments above on how to implement this all meant "if we use this".)—msh210 2008-05-14 (9 Iyar 5768) 20:46 UTC
Fair enough. borrowed=yes sounds good, because it refers to the linguistic concept and not its presentation. I'd also like to consider whether there is any sense in incorporating different types of loanwords (=borrowing, =calque, =reborrowing). But I think that information probably belongs in = Etymology = and this should incorporate the use of italics in English only.
I agree that it may be hard to notice—and I'm convinced that it is a great advantage to convey meaning in a way which is easy to understand, without adding any clutter or visual distraction (exactly what italics are meant for). It's used to good effect in some dictionaries (e.g. my Canadian Oxford), as are various typographic conventions in various dictionaries. This in no way mandates the addition of a usage note; on the contrary, it obviates that cluttering method which is not in use anyway (only 1 of 75 terms in Category:English borrowed words has such a note, and the particular one is contraindicated by my paper dictionary).
Whether it complicates the use of inflection templates is a question. Two possible problems: it remains absent from many foreign terms, or editors add it to naturalized English terms (we know how some hate "foreign" diacritics). We should specify some sort of attestation test for its application. Michael Z. 2008-05-14 21:15 z

I've created an initial draft proposal, which addresses some issues mentioned, but still leaves some questions open. Please keep general discussion here, and refer to User:Mzajac/Foreign terms. Michael Z. 2008-05-14 22:40 z

Regional language vs regional topics

Regional context tags like Template:US are used to indicate different things:

  1. Regional spellings, as in labor:
    • labour (UK, Australia, New Zealand, Canada)
  2. Regional senses of words, as in station:
    1. (Newfoundland) A harbour or cove with a foreshore suitable for a facility to support nearby fishing.
  3. Regionalisms only used in a place, as in jambuster (labels the sole sense, but really refers to the entire entry):
    1. (Canada, Manitoba and north-western Ontario) A doughnut filled with jam.

The problem is that regional language and a regional context are two different things. SAS means one thing in a British context, and another in a Scandinavian one—but these are not examples of British English and "Scandinavian English". Both senses are used world-wide to refer to the particular things. Another example is горелка (gorelka), which is used in South Russia to refer to vodka. But it is also used in general Russian in a Ukrainian context, or to indicate a certain mood associated with Ukraine, because this word comes from the Ukrainian горілка (horilka). These would be indicated with (South Russia) and (Ukraine), respectively, but when you combine (South Russia, Ukraine) you can see how inadequately they convey two different messages.

Another unfortunate result is that categories like Category:Canadian English get full of words like Canada Day, Canuck, Montrealer, Robertson screwdriver, which are not restricted to Canadian English, but belong in Category:Canada, instead.

My paper dictionary (CanOD) uses geographical labels on a headword or a sense to indicate where a word is used, formatted the same way as other context labels.

  • bunny hug noun Cdn (Sask.) ...
  • 2 Cdn ...

But it uses a different style of comment to indicate regional context only:

  • FCC abbreviation 1 (in Canada) ... 2 (in the US) ...

Shouldn't we accommodate this distinction, and have one set of labels and categories for regional usage (US, British, Canadian, Newfoundland), and another for regional topics (in the US, in the UK, in Canada, in Newfoundland)? Michael Z. 2008-05-14 02:14 z

Perhaps they can be made more distinctive by some typographic treatment, e.g.:
British
(in the UK)
[Canadian]
 Michael Z. 2008-05-14 02:23 z
Perhaps we can have the regional topics included into the text of the definition (FCC: The United States Federal Communications Commission), with context tags used only for regional use. This would allow us to keep our current category scheme (i.e., that the regional {{context}} tags categorize as "Canadian English"). Then we'd need to have new categories for regional topics, as desired, and add them manually using [[Category:. These should be called, perhaps, Category:United States, et al.—msh210 17:49, 14 May 2008 (UTC)[reply]
Sounds like that works in principle, with a little tweaking of the category tree. I hope editors wouldn't continue to use the regional templates for everything, but good documentation on templates and category listings should help. I'd like to make a proposal with some specifics about the changes, and present it here (may be a week or two).
This is a significant change, so I'd like to see broad consensus for this. Anyone opposed? Michael Z. 2008-05-14 19:28 z
No opposition from me, I think it's a good idea. Thryduulf 20:14, 14 May 2008 (UTC)[reply]
Ditto. —RuakhTALK 21:03, 14 May 2008 (UTC)[reply]
This problem has also been bothering me. If we can get a clean explanation of both the technical and editing aspects, I'd certainly support this. --EncycloPetey 21:49, 14 May 2008 (UTC)[reply]

I've made some notes at User:Mzajac/Regional language vs regional topics. Still needs more detail, I think, and a few sets of eyes to find what's still lacking. Michael Z. 2008-05-15 04:30 z

Can someone else please be the bad guy

So KYPark and I have been butting heads a few times as of late. The most notable discussion can be seen at Talk:못하다, and there's also a bit at User talk:Dmcdevit#Deleted Category:Euro-Korean words. Additionally, I blocked them last year for the continued insertion of Korean-Germanic cognates into Korean entries. KYPark has a history of making edits which, in my opinion, use guerilla style tactics to push their point of view (a point of view which they are generally alone in), notably with Korean etymology and transliteration. However, they are also, again in my opinion, an excellent and highly skilled editor, which is what makes them so problematic, because it would be so much easier if they were worthless and I could simply put a long block on. So, I just noticed Citations:witch. The citation seems reasonable enough, but the etymology bit seems completely outside the bounds of what we want to have......anywhere on Wiktionary. My first instinct was to simply remove the content, but every time I've done something similar I've been accused of being a rogue bully admin. If the community agrees with my opinions on the matter, can someone else remove the content (and can we please remove KYPark from the whitelist so someone can keep tabs on their additions). If I am being a bully, please tell me so, and I will desist. -Atelaes λάλει ἐμοί 06:37, 14 May 2008 (UTC)[reply]

You are in fact and effect inviting the innocent to an evil age-old witch-hunt party. Just stop it right now, I say. I expected this proceeding, and edited the very Citations:witch in advance, just to suggest that you are unforgivably wrong and evil. Behave yourself, I warn you. Should you be told to do so by Wikt, it should be prepared ... --KYPark 11:11, 14 May 2008 (UTC)[reply]
This user should definitely never have been whitelisted. The best I can say is that many of his recent contributions have been simply value-neutral, not requiring immediate cleanup. -- Visviva 11:22, 14 May 2008 (UTC)[reply]
Non ety, non-citation content deleted. DCDuring TALK 11:43, 14 May 2008 (UTC)[reply]
Atelaes, could you talk a little more about your reasoning for blocking/deleting vs. posting something at WT:RFV or WT:RFD? Also, I think we should be using the {{fact}} template more often. For example, if you found his claim about the Indo-European thingy (I didn't read the whole debate in detail) to be dubious, you would insert the template so that everyone would know that it is an unverified claim (until a proper reference is given).
KYPark, you strike me as a non-native speaker of English. Just an impression, please correct me if I'm wrong. My initial feeling is that some of your posts come off as a bit defensive (clearly, you were unhappy about your stuff being deleted. An understandable reaction) or impolitic. I attribute some of this to a lack of ability with some of the finer points of English discourse. If your skills in English are not the problem, then it could be that you feel that your work is being attacked by non-experts. You must understand that we have no way of verifying anyone's level of knowledge. This is why it is important to cite credible references when entering potentially contentious information.
At this point, I'm trying to remain neutral. I'm doing my best to give both of you the benefit of the doubt. How will you respond to my post, I wonder? Will your response be diplomatic, sarcastic, funny, mean? I have no idea. What I do know is that a lot of people form opinions about a contributor after reading posts at places like Beer Parlour. -- A-cai 12:33, 14 May 2008 (UTC)[reply]
This is not to respond to A-cai at all. Please, please, don't be too excited by the fact that some Korean words sound like Western words.
What is the way you like best?
The note material at Citations:witch was not etymology and was not citations. It simply does not belong there. I believe that Mr. Park's judgment may possibly be impaired by the anger generated by the exchange leading him to revert my removal of it. I have rolled it back. I do not wish to get into a revert war. Could someone else take a look at the material and determine whether there might be another place where it can do Wiktionary some good. DCDuring TALK 14:56, 14 May 2008 (UTC)[reply]
I'm not sure if this is the correct line to put a response to A-cai, but hopefully they'll see it anyway. The problem with using {{fact}} for the Korean/Indo-European "cognates" is that it was simply too distributed, for one thing. This was not an assertion made on a Wikipedia article about the history of the Korean language, but rather was contained within the Etymology and Related terms sections of a number of entries (I'm not really sure about the exact number, you may have to ask Stephen about that, as I believe he did most of the cleanup). Now, I'll come right out and say that I know very little about the Korean language, however the initial claim of a relationship between Korean and European languages struck me as, well, surprising. I did a bit of research, talked with some other editors, and the conclusion I came to was that this was not a claim taken seriously at all within historical linguistics. When I initially talked to KYPark about this (which can be read at User talk:KYPark#Block), they gave me the impression that their only evidence was that words sounded alike. It is my opinion that this is not an acceptable method for deducing genetic relationships on Wiktionary. I hope that at least begins to answer your questions. Please feel free to restate any that have not been answered. -Atelaes λάλει ἐμοί 19:20, 14 May 2008 (UTC)[reply]
Also to respond to your query concerning posting something on rfv/d versus simply removing it: That is something which I decide on a case by case basis, and I do not think I could give you a reliable rubric for it. However, I can note on my reasoning for the content removals specific to this case. Concerning the Korean-PIE cognates, I think I covered that fairly well in the preceding paragraph. As for the bit on 못하다, I felt that such a discussion about and critique of Wiktionary policy was clearly outside of the bounds of what we have in the mainspace entries. Thus, discussion about the merits of the specific content were unnecessary, as to include such content would require a complete revamp of what we have in our entries. As for Citations:witch, the etymology was completely unscholarly, and, again, not the type of content we have in our entries (additionally, etymologies, of any quality, do not go in the citations namespace, but rather the mainspace; although since this was a quoted pseudo-etymology, perhaps that is a grey area). -Atelaes λάλει ἐμοί 06:52, 15 May 2008 (UTC)[reply]
I don't know enough about word histories to even judge the arguments here, about whether these things belong in Wiktionary, but I am confident that they do not belong on the citations page. As little as possible there should be composed, so if you find yourself trying to phrase something just so then it probably doesn't belong. What you could cite are other references that make your argument for you... although I doubt quoting them so extensively would be fair use. Even so that's apart from the question of how much should be mentioned, if any of it. DAVilla 18:44, 15 May 2008 (UTC)[reply]

If your skills in English are not the problem, then it could be that you feel that your work is being attacked by non-experts. You must understand that we have no way of verifying anyone's level of knowledge. This is why it is important to cite credible references when entering potentially contentious information.

by the way who are you at all, mr. dcduring? do you know Korean at all? how much you know? would you dare to compete with me? you choose the best way you like. come on baby. --KYPark 15:49, 14 May 2008 (UTC)[reply]

I know an angry person when I am in contact with one. I know material that doesn't belong in Citations when I see it. The material about Korean etymology looked tendentious to me, but I do not hold myself in any position to act on that impression alone in a matter of Etymology - and did not do so, as best I can recall. I am aware that there have been disputes in the past about areas of conjectural etymology. As a Wiki we need to limit ourselves to theories that are fairly widely accepted among lexicographers. DCDuring TALK 16:08, 14 May 2008 (UTC)[reply]
dear DCDuring, who are you talking to? me? oh no it's not me. you must be talking to someone else. go ahead. but if you'd answer me, read my word carefully enough. then the answer should come out of itself, not necessarily by you. understood? --KYPark 16:25, 14 May 2008 (UTC)[reply]
No. I do not understand. DCDuring TALK 16:29, 14 May 2008 (UTC)[reply]
AEL 1
  • Atelaes did the right thing. Lots of KY's contributions have been useful, but the promotion of supposed Germanic-Korean cognates is so far out-there as to be extremely misleading to users. Widsith 19:32, 14 May 2008 (UTC)[reply]
Atelaes, I agree with you that it is not good evidence if words sound alike. I would like to note though that they have found evidence of so-called Caucasian humans in the middle of China. That could be a link between Korean and Indo-European, but I'm not sure. Mallerd 21:23, 14 May 2008 (UTC)[reply]
This one and more, I believe. It could be something, it could be nothing. Mallerd 21:29, 14 May 2008 (UTC)[reply]
Those fair-haired blue-eyed Caucasians would be Tocharians, Indo-European ethnolinguistic group responsible for some well-known IE loanwords into Old Chinese and other neighbouring languages. They certainly do not represent evidence in favour of "Uralo-Altaic" hypothesis, or of common development between IE and Altaic. --Ivan Štambuk 01:36, 15 May 2008 (UTC)[reply]
Talk:witch#Etymological notes deleted

This call for the "bad guy" started from the Etymological notes, which is necessary for this talk but deleted by User:DCDuring. So I copied and pasted it on the above page. From the above talk, I realize there appears to be a very delicate misunderstanding against me and the resulting injustice done to me. So I have to defend myself positively while showing how others offend me intelligently. Please come and read, though you may need much patience. I'm so sorry to respond individually. Thanks. --KYPark 13:14, 15 May 2008 (UTC)[reply]

I have already tried to reason with KYPark in years past, in regard both to his unsupportable folk etymologies and his refusal to stick to the Revised Romanization that we use here for Korean, and he absolutely refuses to listen to reason on either score. Now whenever I encounter his edits, I simply remove everything concerning etymologies and I fix the transliterations. It is true that there are a small number of Korean words that were borrowed from Sanskrit in ancient times, but I don’t think that KYPark knows about those. If he would stick to definitions and grammatical work, he would be a very valuable contributor, but what he does here makes a laughing stock of our Korean entries. Because he’s never going to give an inch, I believe the only options we have are (1) to slowly and tediously correct all of his work, or (2) automatically revert everything he does that hints of a Korean-Indo-European nexus, or (3) just block him for a period every time he adds an etymology. —Stephen 14:06, 15 May 2008 (UTC)[reply]
마니다 (manida)
# to handle, cf. French manier  

This illustrates "what he does here makes a laughing stock of our Korean entries" or Stephen's intention to "automatically revert everything he does that hints of a Korean-Indo-European nexus," that is, to remove "cf. French manier."

Is this "etymology," "Korean-IE nexus," and "a laughing stock" indeed? You are supposed to be the opinion leader in this regard. Yet you look hypersensitive or extremely allergic to the possible Korean-IE nexus to your great dismay. I don't understand why you are so harsh. Do you know the fact that the exact 1:1 transliteration that you have opposed so harshly is now being given as additionally as was done by me as a bonus as well as "cf. French manier" above? To me, your allergy looks like a real laughing stock. --KYPark 16:28, 15 May 2008 (UTC)[reply]

By the sheer number of human languages (around 7,000) and the number of terms in each language, it is relatively easy to find in 2 non-related languages a few words that are similar in meaning and pronunciation. This information may be interesting, but does not make the word pairs cognates or make the information fit for the Etymology sections of entries. Keep the information in Appendices or the User area, unless it has been accepted by area experts. Thanks to those that cleaned up the entries. --Bequw¢τ 20:06, 15 May 2008 (UTC)[reply]
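The chance-resemblance argument above can be made concrete with a back-of-the-envelope calculation. The figures below are purely illustrative assumptions, not measured values: suppose 2,000 meanings are compared between two unrelated languages, and each pair has a 0.5% chance of sounding alike by accident, independently of the others.

```python
def expected_chance_matches(n_meanings, p_match):
    """Expected number of accidental sound-alike pairs among n compared meanings."""
    return n_meanings * p_match


def prob_at_least_one(n_meanings, p_match):
    """Probability that at least one accidental match turns up (independent pairs)."""
    return 1 - (1 - p_match) ** n_meanings


# Illustrative assumptions only: 2,000 meanings, 0.5% chance per pair.
n, p = 2000, 0.005
print(expected_chance_matches(n, p))
print(prob_at_least_one(n, p))
```

Under these assumed numbers roughly ten look-alike pairs are expected from chance alone, which is why a handful of resemblances is not in itself evidence of a genetic relationship.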
Right, Bequw. The majority of the linguistic community remains unconvinced that an early Altaic version of the Korean language was strongly influenced by IE, so we must rely on published linguistic works to claim that, for example, 마니다 (manida) derived from manier. (I speculate that it more likely derived from the native Korean root in 만하다 (man-hada) or from that in 만들다 (mandeulda, to make), but I won't make such claims in our main namespace.) Similar to the English Wikipedia, we avoid original research in such controversial matters and must fall back on authoritative publications. Rod (A. Smith) 20:36, 15 May 2008 (UTC)[reply]
Excuse me Rod A. Smith, but I thought that experts didn't see Korean and/or Japanese as Altaic languages as well. Have I missed something? Thanks Mallerd 20:51, 15 May 2008 (UTC)[reply]
Correct, Mallerd, and no need to excuse yourself.  :-) KYPark seems to be in the minority group who considers it an Altaic language, and he further distances himself from the mainstream by claiming that an early version of it was strongly influenced by one or more IE languages. I didn't mean to lend any support to that notion, nor to support any claims that Korean or Japanese are Altaic languages. (I've modified my above post to clarify that.) Rod (A. Smith) 21:30, 15 May 2008 (UTC)[reply]
There's nothing wrong with providing readers with possible mnemonics for learning Korean words, but there is no place for such information in mainspace entries. Specifically,
  • such information doesn't belong in the definition line, since it has nothing to do with the definition of the term.
  • It doesn't belong under "Derived terms" or "Related terms," since there is no etymological relationship.
  • And it certainly doesn't belong under "See also," since a reader following the link will learn nothing about the Korean word.
Such information could be placed in an Appendix:Mnemonic aids for English speakers learning Korean words, or similar. Of course, a French sound-alike term would be useful only for French speakers, so that would need to go in a separate Appendix. And frankly, I don't think any such correlations are useful for the majority of language learners; effective mnemonic strategies are something which individual learners have to work out based on the peculiarities of their own brains. For instance, I first learned "매다" as "to weed," by drawing a picture in my vocabulary book of a hawk (매) weeding a garden. Would that be useful enough to put in an appendix? I rather doubt it. -- Visviva 03:28, 16 May 2008 (UTC)[reply]
Having read through the entire exchange to this point, I think I understand the nature of the problem more clearly. The original question posed by Atelaes was whether anyone else could be the "bad guy." In looking at the qualifications of the various contributors, it would seem that Visviva and Stephen are the most qualified. Of the available contributors, these two seem to be the most knowledgeable about Korean. If anyone has a chance of getting KYPark to be reasonable, it would be another Korean expert who could provide credible evidence to counter any dubious claims. I know nothing about Korean, so I obviously wouldn't even attempt to debate KYPark about Korean. However, I do think it is legitimate for non-Korean speakers to ask a Korean speaker to refrain from original research and to cite his or her sources etc. -- A-cai 13:52, 16 May 2008 (UTC)[reply]
AEL examples
Excuse me, but I have to move to the leftmost as follows:
매다 (mae-da)
# to weed, cf. mow

Uses

* 호미매다 to weed the weed with the hoe.
* 으로 베다 to mow the grass with the sickle. 

This should be absolutely all right. This has nothing to do with original research. I wish to make this more interesting, surprising, or motivating as follows:

매다 (mae-da)
# to weed, cf. mow, Dutch maaien, German mähen, 
  Old English māwan, Old High German māen. 

Uses

* 호미매다 to weed the weed with the hoe.
* 으로 베다 to mow the grass with the sickle. 

Neutral Korean students would be surprised, while some of Korean scholars, anti-Eurasiaticists, anti-Euro-Koreanists, etc. probably more or less upset. But do you insist that this is an original research, especially claiming the "Korean-IE nexus"? Why should this be so different in effect from the first example?

Currently, Wikt misses this well-known etymology in the mow page. I don't know if it has done so all the while. What if that etymology were deleted? Then I cannot help but guess that it may have been deleted by those who were badly afraid of the relation to Korean 매다. Such could probably exist on earth. And they must hate me as if a witch hated by Western Christians. Then, they would make me a prey of witch-hunt. To me, such is a war, so-called w:science war!

Refrain from saying too easily here. Please try to be more cool, smart, and neutral. Evaluation is up to everyone, hence mostly subjective rather than objective. Note that I look like being witch-hunted. Please do any justice to the likely prey to the wicked, twisted or evil, like me. Your sense of justice is on the testbed. (Let me drop below another example for your reference. I am not sure but may further respond to Visviva.)

wick (plural: wicks)
# a bundle of twisted fibers in a candle or lamp.  

See also
* witch 
* wise
* wit
* white
* bitch
* Korean 빛 (bit, bich) light

--KYPark 03:30, 17 May 2008 (UTC)[reply]

KYPark, I'm not sure I understand what you are trying to say. Do you mean that the English word mow is somehow related to the Korean word mae-da? If that is the case, can you list a reliable dictionary, book or website where you found this information? If you cannot provide a source of the information (other than yourself), then it would qualify as original research. As you know, original research is not allowed. -- A-cai 04:50, 17 May 2008 (UTC)[reply]
A-cai, no, I don't. I said above: "But do you insist that this is an original research, especially claiming the "Korean-IE nexus"?" This is to deny your and others' doubt. The above comparative data may be regarded as such, if I were listing them under Etymology, Derived terms, or Related terms. So far I've denied again and again and again. But they would not understand and believe my word as such. So I cannot help but doubt their mindset, orientation, or motivation. In a nutshell, I've edited Wikt mainly to help Korean students learn foreign languages, esp. English, more effectively. (I wish them to know that Korean is not such an island as the mainstream linguists believe.) They are most famous, if not notorious, for spending an enormous amount of money in learning them. Unfortunately, however, their achievement is very doubtful, not to mention their hardship and loneliness. All the gentlemen above, including you, need not bother them, nor what Korean education should look like. This is a matter of Korean strategy, in a way at least, which I thus warn others to be very careful not to interfere with. Thanks. --KYPark 06:58, 17 May 2008 (UTC)[reply]
Reducing the language anxiety of Korean students is a worthy goal. However, it is extraordinarily unlikely that Korean EFL students would be looking up common Korean words on the English Wiktionary. I say this as someone who has been working with such students for the past 6 years. It is far more likely that they would be looking up English words on the Korean Wiktionary. If such content belongs anywhere, then, instead of putting notations like "cf. mow" at 매다, it would make much more sense to place a note at ko:mow, something like "<매다>의 뜻과 비교됨." (I'm not sure whether such content belongs on KO either, but that is a KO issue.)
Likewise for the respective Old High German (etc.) material. This is completely useless for someone learning either Korean or English, but might conceivably be useful for L1-Korean students of Old High German. In any case, it doesn't belong here. -- Visviva 07:54, 17 May 2008 (UTC)[reply]
Traditionally, the Korean-English dictionary or 한영사전 in the book form and now online has offered the kind of information given under == Korean == of the "English Wiktionary" (en.wiktionary.org) you mentioned. No doubt such dictionary has been a must to put Korean into English. Now there are a number of similar online dicts. For example, just take a look at this page for 매다. And compare this with my example given above, and evaluate the difference. Explicitly and implicitly, there is everything I answer you.
The naver page ends with a blind alley or 막다른 골목, while my wikt edit is widely open toward boundless information resources. My edit is not just for young Korean students, but for any Korean who bothers foreign languages, e.g., to know Dutch maaien or German mähen, if not Old English māwan, Old High German māen. Suppose her common exploration routine such as 베다 or 매다 > mow > Translations > Dutch maaien or German mähen. On my page, she may be glad to go direct to her destination, rather than through Translations.
What is the comparative superiority of English Wiktionary (en.wiktionary.org) over all the other online dictionaries? Outstandingly through the hub called Translations of == English ==, all words of all languages are interconnected within one framework. In principle, anyone can begin with any word in any language and end with any other word in any other language. This is just great to anyone! Visviva's real intention is not to advise Koreans not to bother using "English Wiktionary" (en.wiktionary.org). --KYPark 13:26, 17 May 2008 (UTC)[reply]
But on the Wiktionaries, there is not one Korean-English dictionary, but two: that found on the Korean (KO) Wiktionary, ko.wiktionary.org, and that found on the English Wiktionary, right here. The difference is that the KO Wiktionary aims (in part) to provide English translations for speakers of Korean, while this Wiktionary aims to provide English glosses and usage information on Korean words for speakers of English.
These may seem similar, but in fact there is an enormous difference between the two. If you have ever compared a K-E dictionary made for Korean speakers with one made for English speakers, you will understand this.
  • If you want to provide translations of Korean words in multiple languages, you can only do so on the KO Wiktionary. English words are the hub of the English Wiktionary; Korean words are the hub of the Korean Wiktionary. Cf. ko:묶다.
  • If you want to assist Korean-speaking students of foreign languages, you will only reach your target audience through the KO Wiktionary.
  • On the other hand, if you want to provide information for English-speaking students of the Korean language, you should contribute here on the EN Wiktionary.
The KO Wiktionary is currently quite neglected. Nonetheless it is, in principle, the equal of this project; eventually it should contain as much information as any other Wiktionary. However, it cannot do so without the help of native Korean speakers like yourself. -- Visviva 14:14, 17 May 2008 (UTC)[reply]
KYPark, you have now just admitted to everyone that you have no way of verifying your claim. If you cannot verify information, then it doesn't belong on Wiktionary. At Wiktionary, we cannot simply post whatever we want, and assume that others will not challenge it. You should be prepared to defend any edit with solid evidence. Some of my entries have been challenged by non-experts as well. The best way to handle such a situation is to provide proof which is independently verifiable. What proof (book, dictionary, website etc.) can you offer that your above example is not an example of a false cognate? -- A-cai 07:38, 17 May 2008 (UTC)[reply]
Oh dear A-cai, again you misunderstand my English. By "A-cai, no, I don't." I meant "I don't mean that the English word mow is somehow related to the Korean word mae-da." This was to respond to your primary question: "Do you mean that the English word mow is somehow related to the Korean word mae-da?" By my answer, I need not answer the next. Watch out your English understanding. Cheers. --KYPark 13:26, 17 May 2008 (UTC)[reply]
KYPark, I think this is part of the problem. Your English is difficult to understand. You seem to have an adequate grasp of English vocabulary. However, your English sentence structure needs work. Ok, so if that was not what you meant, then what did you mean? If explaining your argument is too difficult in English, perhaps you could post your explanation in Korean to Visviva, and he can translate it into idiomatic English for the rest of us. I do not wish to embarrass you by suggesting this. I only wish to help you communicate with us. After all, isn't that why we're all here? -- A-cai 13:44, 17 May 2008 (UTC)[reply]


== Korean ==

=== Alternative spellings ===

* 띄엿 (ttuiyeos, ttuiyeot) (obsolete) 

=== Noun ===

 띠앗 (ttias, ttiat)

# brotherhood, fraternity, fellowship 
  Cf. Dutch, deutsch, w:Theod, þeod 

  • Is this sort of thing just a laughing stock or witchcraft?
  • Isn't it a great fun and run that Koreans may enjoy?
  • Is it doing any harm to what or whom, as if a fraud?

As I said, Visviva, nothing but English Wiktionary is just great to anyone. It is not in this case that "anything goes." There is a royal road in learning, say, English in English! This is partly why young Koreans spend so much money in English-speaking countries. They should better or more use the English-English dictionary, say, English Wiktionary or Merriam-Webster, than the English-Korean except in the beginning. This is partly why I would not accept your advice for me to go to KO, though it may not stand for Knock Out. Definitely no thanks anyway. But let me argue this way instead.

The international language or lingua franca shifts from language to language. English is the currency of which native English speakers take great advantage. Yet it is not their community's monopoly. Simply their national and international languages are the same. Everybody's language is nobody's language. English as such, e.g., en.wiktionary, should remain a universal melting pot which should include the Korean nativity per se together with the others. An objective, accidental, factual, neutral, unvocal, uncommitted, uncrowned, unaffected, undeniable, unalterable, unassailable simple comparison of Korean mae-da with Dutch maai-en should not be excluded from en.wiktionary. I do hate such nationalists as create a myth to brainwash their people as if they had been specially created by their God. I would never do such ridiculous evil.

Linguistics could become a science as far as subscribing to scientific methods, empirical and rational, inductive and deductive. Theories or hypotheses are rather rational and deductive by definition. So are Indo-European, Eurasiatic, Uralic, Altaic, Ural-Altaic, and so on. Neither is either historically or archaeologically proved. Either is no more than hypotheses, each aiming for better explanation than other. The Altaic hypothesis takes Korean as Altaic, while some others take it as an island not without reason. Suppose Korean shares far more cognates with European than with the mainstream Altaic. Then the Eurasiatic would best explain this fact, however minor. Every hypothesis has its own use. Even the Ural-Altaic should not be wholly denied. "Anything goes," according to w:Paul Feyerabend (1975). To treat it as rubbish is to degenerate linguistics into a lesser science. The more claim for community opinion, the more degeneration into the lesser science.

The "normal science" performed within a community called "paradigm," as noted by the science sociologist w:Thomas Kuhn (1962), is not quite scientific but quite socio-political. I would rather call it scientific pathology. In effect, if not on purpose, he stirred up scientists to the wrong, unscientific, political direction. His notion of normal science is abnormal science in an ideal sense. Scientists have the reason to prefer pragmatic interests, personal and communal, or pursuit of happiness to pursuit of truth. (In Korean parlance, 염불보다 잿밥 (yeombul-boda jaeppap) literally means that the mass service of Buddhism is valued less than the mess served to Buddha.) They used to wage w:science wars such as w:Creationism vs. w:Darwinism. The former fights for the Christian community, while the latter for the Darwinist.

In a sense, science and the w:enlightenment movement emerged and evolved in reaction to the clerical community called church. (Note that the church is no more than a community, whatever absolute claims it may make. Any church would make such claims.) At least, the movement, if not science too, became highly political, culminating in the w:French Revolution. The English industrial, the French political, and the German religious revolutions share the same thing, that is, rebellion against the Christian church.

To know Korean truly is to know its greatness underlying. Unfortunately, Koreans seem to know little about that, I fear. It has undergone lesser changes. Its syntax and vocabulary is well organized. In contrast, English has undergone greater changes, hence a highly corrupted or eccentric version of the Germanic and European. It almost does without the European inflection. The /-en/ ending of Germanic verbs gave way to the root and to-infinitive. OE "mawan" now sounds "mow." Everything has been simplified, if not corrupted, but for the vocabulary of the greatest mixture. It may have changed so as to be used by a great mixture of ethnicity such as w:Huns, especially of Altaic origin, perhaps from the Far East!

The mysterious name Hun may have been simply derived from the Chinese hun (混) meaning (1) mixture (2) western barbarians or "西戎混夷". Anglo-Saxons may have risen from Scythia or w:Khazaria afterwards around the Caspian Sea, aka Bahr-e-Qazvin meaning "Khazar Sea" in ancient Arabic. Then it would be a great "laughing stock" for the British who may be more Hunnish to make fun of Germans as Huns or Sauerkrauts, (that is, most similar to one of the best Korean trademarks, kimchi 김치, together with the millennia-old caviar to Korean cet in Yale).

Such English eccentricity could never be explained by the helpless I-E hypothesis but hopefully by the Eurasiatic. Strangely, the Far Western English language has been easternized in effect, if not in fact. I can hear all sounds spoken in King's English, say, by Queen Elizabeth, as clearly as Korean.

Islands on the surface rarely float like an iceberg but mostly connect with the land mass below the surface. Such may be Korean. If it dubiously or hardly connects with the surrounding Altaic on the surface, it may do more closely with the other mass below the surface. Strangely it rarely shares cognates with the Altaic neighbors, in spite of syntactic similarity. Thus linguists regard Korean as an island. But the SOV syntax does not surely warrant the linguistic neighborhood. The older Latin also used SOV, which may have been more prevalent two millennia ago.

Korean is a great mystery as well as a great heritage. Such is the case with Korean Scythian-like clothing, fermented food in variety, floor heating, unbeatable archery and hand work, Amazon-like tough women (millennia-old iron headgears for women were excavated), half of the world's dolmens, and so on. --KYPark 06:28, 19 May 2008 (UTC)[reply]

KYPark, you do well to emphasize the point that linguistics is (or at least should be) a science, not a democracy. And it is quite true that science should be open to any theory, tested on evidence and not whether people like it or not. However, I think what you may be confusing is whether we're doing science here. The fact is, we're not. We are not coming up with, testing, and debating theories here. That is not the purpose of Wiktionary. What we are doing is simply reflecting academic consensus. Thus, we are not at liberty to come up with interesting thoughts and propose them to our readership. We only copy what the experts have already figured out. If you would like to try and argue for a Korean-PIE relationship, you are certainly welcome to do so. But do it in linguistics journals, not here. This is not an academic forum. It is an academic reflection. -Atelaes λάλει ἐμοί 06:57, 19 May 2008 (UTC)[reply]
Atelaes, I was arguing that things like academic consensus are very very rare, but ever-diverging points of view or community opinions in disguise of consensus. For example, there can be no consensus between creationism and evolutionism in parallel for ever, between Judaism, Christianity, and Islamism within Abrahamism, between unaccountable sects of Protestantism. Such is science carried on by ever-diverging competing conflicting paradigms or academic communities, as anything goes! or as if Thomas Kuhn had taught scientists to behave divergingly like religion rather than convergingly! Religion and science are a firm belief system. These are too dirty for me, hence none of my business. I have no intention whatsoever to promote or prove the Eurasiatic hypothesis, Euro-Korean hypothesis, or the like. But I would like to share and communicate the objective facts I know about Korean. These are mere data perhaps to be evaluated by scientists from theory to theory. But I don't bother them but the general readers. To help simply compare Korean mani-da with French mani-er is not science at all, but to insist both are cognate surely is. The question of fuzzy boundary is very crucial in science, so easily leading to category mistakes. It is a foolish category mistake to insist my help is science. A tour guide to show us around is not a scientist at all! A Shakespeare cannot become a Newton at will. (my parody in reverse order) But he helps us open our eyes wide to see what the world looks like. Please try to get to my point. You may ignore all my argument mainly aiming to draw attention to Korean, but evaluate the example in the boxes on top of it and answer the three questions below the boxes. Thanks. --KYPark 13:26, 19 May 2008 (UTC)[reply]
I've already told you: compare Korean mani-da to the progeny of Latin manus on your blog or personal web pages, but not on the definition lines of either here on Wiktionary, for that would be masquerading a supposed IE-Korean genetic relationship based on nothing but vague sound similarities. The mnemonics argument is also not applicable to mainspace (its usefulness is debatable even in a separate appendix), as you've been told.
About your "anything goes" and "science as a belief system" claims - you're barking up the wrong tree. We don't want proofs or new theories or invalidating the old ones (which you've hardly done in your lengthy rant), but cites supporting mainstream theories established by professionals. To what extent they are wrong is not our problem. --Ivan Štambuk 16:31, 19 May 2008 (UTC)[reply]
  • I expected Visviva to respond to my long argument mainly intended with him. But he did not.
  • Instead, Atelaes responded, mainly arguing for "academic consensus" as the sole source of reflection on Wiktionary. I disagreed and advised that things like "academic consensus" are very, very rare. And I asked him to ignore all my argument but answer my three questions on top. But he did not.
  • Instead, Ivan responded, mainly repeating his claim elsewhere. In a sense, he answered my last question on top "Is it doing any harm to what or whom, as if a fraud?" ignoring the previous two. His word "masquerade" would be equivalent to my word "fraud" that may do "any harm to what or whom."
  • In effect, he insists I do harm by "fraud" or "masquerading." What is this? Is this what Wiktionary says to me? Please advise me how I can be assured that this is what Wiktionary means. I advise Wiktionary and Wikipedia to answer my question when it is well aware who I am.
Nobody wants to respond to your long "argument". I'm personally just waiting for this discussion to end and for you to accept that this sort of speculative original research does not belong on our definition pages. I think it's blindingly obvious that consensus is against what you're trying to do. Mike Dillon 15:41, 20 May 2008 (UTC)[reply]
KYPark, I will attempt to give you a short answer. Wiktionary and Wikipedia have a policy of no original research. Your argument is that you should be allowed to enter original research into Wiktionary, because there is no academic consensus on the subject. So far you have given us your personal opinion. You have not cited a single source that supports your claim. I find your argument about creationism vs. evolution to be disingenuous, because there is academic consensus among scientists on that subject. You ask us why it is harmful to enter unverified information into Wiktionary. The reason that it is harmful is that it affects our credibility as a dictionary. If Wiktionary cannot point to a reliable source for its information, then nobody will take Wiktionary seriously. We want people to take Wiktionary seriously, because we want Wiktionary to survive and thrive. -- A-cai 12:42, 21 May 2008 (UTC)[reply]
It is a shame that one often defeats oneself, especially without knowing the fact. It is unclear if to let her know that is to do her harm. So some just suck their cheeks or stick their tongue in cheek. But I would not make fun of her behind her back, but tell her the truth, which in itself is neutral, but could sound cruel to liars and obscurantists.
Suppose w:Darwinism or w:evolutionism is a thesis of academic consensus. Then should its antithesis such as w:creationism or w:intelligent design be deleted from Wikipedia and Wiktionary? Should w:Lamarckism be deleted? Should w:Ural-Altaic as rubbish be deleted? In practice, few things die and most things do. Even w:Flat Earthism survives! Wikipedia is very proud of the greatest number of entries. What is the implication of this greatness? Are there that many theses of academic consensus indeed? Oh, no, never, ever!
The idea of "academic consensus" is a huge stumbling block and self-contradiction. Academia in essence is a place for partisanship Kuhn called paradigm, rather than for consensus beyond paradigm. Academic circles are like political parties. It is now well accepted that science is not value-neutral. The title The Collapse of the Fact/Value Dichotomy authored by w:Hilary Putnam (2002) is striking. Korean linguistic facts, for example, should be of more value to Korean general public above all than scholars, education than science.
Christians see Muslims evil, and vice versa, namely, proto-religious war endangering the peaceful excluded middle. A practical solution would be peaceful co-existence of black and white, say, Muslims and Christians. Wikipedia where anyone does and anything goes in general is the last place for black-and-white judgment and choice, but the ever-lasting place or melting pot for black-and-white confusion. All it could and should do is to inform readers of both black and white for their own judgment and choice, namely, w:reader response. --KYPark 14:37, 22 May 2008 (UTC)[reply]
You are missing the point, Wikipedia articles and Wiktionary entries do not deal in original research. The articles and entries relating to Darwinism and those relating to evolutionism, as well as all the others you mention, reference respected reliable third-party sources. Your etymological additions are not backed up by any third-party sources and so are not accepted here. There are plenty of sites on the internet that operate with an "anything goes" philosophy, but Wiktionary is not one of them. The governing philosophy here is reliability and verifiability, which means that everything must be sourced and referenced with reliable sources. If it cannot be reliably sourced then it must be deleted, regardless of the importance or otherwise of the word/topic/goal/etc. Thryduulf 15:23, 22 May 2008 (UTC)[reply]
The admin community is supposed to be brainwashing me and the third party by repeating again and again as if I had imposed my original research on Euro-Korean etymology on Wiktionary. That is, it seems to be unjustifiably harassing me. I take this likely offence very seriously. My contribution must have included a negligible amount of original research if any. I cannot show up the whole state of affairs, as what I had done was mostly destroyed by the community one-sidedly. Instead, I recently brought a few new examples to attention, and asked how problematic such would be. The ever-changing answerers have rather avoided answering my questions directly while repeating their one-sidedly assumed claims in other words to the brainwashing effect. From those examples and another new page 고인돌 (as originally edited by me), you should discuss very persuasively which parts are definitely an original research and why. Otherwise, you are in effect harassing me without enough evidence beyond the reasonable doubt. You would know perhaps better than me what could be the possible consequence of such repeated false charge and evil harassment. --KYPark 08:45, 23 May 2008 (UTC)[reply]
Either you are trying to present serious etymological relationships, in which case you need serious etymological sources, or you are simply adding accidental similarities between words in a handful of the 7,000+ languages we try to cover, because you think readers will find them interesting or motivating. (You have appeared to make both claims in this discussion.) In the first case, the claim must be verified; in the second case, this is simply indiscriminate trivia, which we do not welcome here. -- Visviva 09:50, 23 May 2008 (UTC)[reply]
You show me two choices. But you are well aware that I deny the first is my choice. So you actually allow me just one (second) choice and dictate it based on your subjective evaluation, without discussing "very persuasively." OK, anyway. But, as you are not supposed to be the wiki law-giver, please convince me that all my above examples plus all my edit on the page 고인돌 are useless and undebatable enough to be entirely ignorable and deletable (even without prior discussion with the original editor), and that your remark is the final, non-negotiable wiki policy. Thanks. --KYPark 15:17, 23 May 2008 (UTC)[reply]
What do you mean by "share the same Roman syllable"? What, if any, usefulness is there in comparing Korean dol and English *dol < Breton teol (which means table, not stone), appearing exclusively inside the adopted compound term dolmen? Looks to me that you're again trying to masquerade a genetic relationship based on vague phonetic correspondence. --Ivan Štambuk 11:51, 23 May 2008 (UTC)[reply]
``Korean goindol and English dolmen share the same Roman syllable /dol/ by accident, meaning "stone" and "table" respectively.`` Ivan, you are very irresponsible to answer me without properly understanding the above single self-evident sentence. So are most others. So are most witch-hunters in Western history. So I called this talk the twisted or wicked witch-hunt party loud and clear, so convincingly from the beginning. So I blame you all for blaming me unjustifiably, without enough understanding and evidence beyond the reasonable doubt. At least on this occasion, you mistook my word and harassed me. I wonder if you are brave enough to apologize for this, and again to look for my weakest link you have to attack. Cheers. --KYPark 15:17, 23 May 2008 (UTC)[reply]

Response to KYPark

KYPark, you invited us to explain which parts of 고인돌 contain original research. I don't speak Korean, but I will attempt to give you an answer. First, let us look at your definition:
===Noun===
{{ko-noun|rv=goindol}}
  1. A dolmen, a prehistoric megalith having a capstone supported by two or more upright stones.
The above does not constitute original research, and is easily verifiable. However, you should include a references section to show where the information comes from. Here is how I would do it in this case:
===References===
*{{pedialite|고인돌|lang=ko}}
*{{pedialite|Dolmen}}
The {{pedialite}} template will give you the following text:
The above is convincing evidence that the Korean word 고인돌 equates to the English word dolmen.
Now for the second part, your etymology says:
===Etymology===
From 고인 (goin, “supported”), adnominal form of 고이다 (goida, “to support”) + 돌 (dol, “stone”). Korean goindol and English dolmen share the same Roman syllable /dol/ by accident, meaning "stone" and "table" respectively.
Where did you find the information in the etymology section? Did it come from a dictionary? Did it come from a book? Did it come from a website? Did it come from an academic journal? We don't know where you got the information from, because you don't state that in a references section. If you cannot point to a dictionary, book, website, academic journal or other reliable document as the source of your information, then we are free to assume that it is your own personal opinion. If it is your own personal opinion (even if your opinion turns out to be correct), it is considered to be original research, and is not allowed on Wiktionary.
Does the above answer your question? -- A-cai 12:01, 23 May 2008 (UTC)[reply]
KYPark, one more thing. The purpose of the etymology section is to explain the origin of the word 고인돌. With respect to the second sentence:
Korean goindol and English dolmen share the same Roman syllable /dol/ by accident, meaning "stone" and "table" respectively.
The second sentence does not explain the origin of the word 고인돌. Therefore, it should not be included in the etymology section (whether it is original research or not). -- A-cai 12:17, 23 May 2008 (UTC)[reply]
KYPark, take a look at 刻舟求劍. Notice how I provide a source for each piece of information. -- A-cai 12:24, 23 May 2008 (UTC)[reply]
Again and again and again, you mistook my word and harass me! But I will help you understand me properly. First you need to go to the history file of 고인돌 I created today. There were great edit wars today, presumably without your knowledge. The most important admins visited and edited against my edit. Have you done any? Oh no forget it, but examine carefully the historical processes, and sort out what is my real contribution. Really I did not want this sort of confusion, and asked my original edit to remain as such for a week. Nonetheless, my edit was immediately destroyed perhaps to your dismay. But forget all these, but just remember that you have to answer me at all after you have mastered the whole history of this god-damned page! Understood? Many Thanks. --KYPark 15:36, 23 May 2008 (UTC)[reply]
Hurriedly, just one more thing. Read the talk page, too. Thanks again. Sincerely yours, --KYPark 15:43, 23 May 2008 (UTC)[reply]
KYPark, I now see that the following part was deleted:
  • "Dolmen" originates from the expression taol maen, which means "stone table" in Breton. (Beside this Wikipedia article: Note that this Bretonic word was allegedly incorrectly fabricated so that taol stood for "table" and maen for "stone." Also note an assumed Sino-Korean word consisting of (dol, "stone") and (Japanese men, Korean myeon, "roof").)
  • The etymology of the German Hünenbett or Hünengrab and Dutch Hunebed (lit. Huns' bed) all evoke the image of giants building the structures. Of other Celtic languages, "cromlech" derives from Welsh and "quoit" is commonly used in Cornwall. Anta is the term used in Portugal, and dös in Sweden.
KYPark, the above was deleted because it is not directly related to the origin, definition or usage of the word 고인돌. Wiktionary guidelines are fairly clear about what kinds of information can be included in an entry (see: Wiktionary:Entry layout explained).
Finally, please read the following Wiktionary policy pages: Wiktionary:No personal attacks and Wiktionary:Assume good faith. -- A-cai 20:23, 23 May 2008 (UTC)[reply]

I dare to declare I won

Do you want permission to add etymologically unrelated words to the etymology section of Korean entries? If so, then a simple "no, that's not the purpose of the etymology section, but feel free to use the talk pages for such trivia" seems sufficient. If you want something else, please be specific. Rod (A. Smith) 18:01, 23 May 2008 (UTC)[reply]

I still sit up at 5:20 local time. --KYPark 20:23, 23 May 2008 (UTC)[reply]
Yeah, I read you, A-cai. Thanks always. --KYPark 20:35, 23 May 2008 (UTC)[reply]
May I go to bed at 6:04 local time? Good night everybody... --KYPark 21:00, 23 May 2008 (UTC)[reply]

This page needs a complete overhaul for several reasons.

  • When viewing a page that has been deleted, you are shown the message "Note that administrator comments older than one year may be inaccurate, as explained in Deletions" (which redirects to Wiktionary:sysop-Deleted). However when you get there, you find there is no explanation of this.
    The history of the page that generates this message, MediaWiki:Recreate-deleted-warn shows that this was added on the 14th of December 2007. It gives no indication of whether it is older than 1 year from (approximately) that date, or a rolling 1 year period.
  • The content of the page is written in a very harsh manner, seemingly designed to scare people away. We don't want vandals, true, but they are not the only people who will see this.
  • Number 4, for example, basically tells everyone who wants to add a term at a protected title to bugger off to urban dictionary. It mentions nothing about what to do if you know (or even think) you're not entering the same term - it is entirely plausible that in the past some vandal repeatedly recreated something completely bogus, e.g. perhaps at somewhere like "ogof", such that the entry title was protected. Years later a new contributor comes along and wants to add some words in a language that we don't have many entries for, they happen to start with "ogof" (which means "cave" in Welsh iirc) but are told to go away. There is no instruction to ask anywhere whether their word is allowed or not, not even to look in the list of protologisms. The result is they go away and we lose a valuable resource.
  • Number 3 says "before resubmitting nonsense entries" and then goes on to explain about requiring three citations and not to use secondary sources. We don't want nonsense entries resubmitted, and we want people who don't know about the citations requirements to find out about them easily, neither of which this entry does.
  • There is no explanation for what is meant by:
    • "attack page" - number 1 explains it (sort of) but never using that term.
    • "bad redirect" - there is something about redirects but it starts off by saying we're not Wikipedia, which isn't going to make it easy for anyone looking for what is meant by "bad redirect"
    • "bad entry title" - this is tangentially covered in the "Entire classes of terms are deemed not Wiktionary-worthy." info at number 7, and even less well at number 4 about page titles being protected
    • "copyright violation" - it should be obvious what this means, but there is nothing anywhere on the page (not even a link) about why we can't take copyrighted material
    • "Creative invention or protologism, (use WT:LOP)" - there is no explanation of what we mean by either "creative invention" or "protologism" and why they are deleted. You might follow the link to "WT:LOP", but if you don't it's meaningless jargon ("what is WT:LOP and how do I use it?"). Buried towards the end of number 3 (which waffles about explaining what deletion is for those who don't understand) are the criteria for citations of use. This is also briefly mentioned at the end of the section about classes of words not being allowed, using phobias as an example (and not a brilliant one either).
    • "Failed RFD do not enter" - what is RFD? How do I find the reason RFD deleted this word (or a word with this spelling)? What do I do if it isn't the same word?
    • "Failed RFV do not enter without Valid citations" - what is RFV? The link is to WT:CFI, which does not use the word "citations" in any header so you can't find it in the TOC. In the body text the word "citations" is used only in the sections about "Fictional universes" and "Brand names", which doesn't help the person unfamiliar with Wiktionary who is trying to enter a word that is not a brand name and not related to a fictional universe (e.g. ogof again).
    • "Fatuous entry". Possibly covered in the "user tests" section (2) but the word is not used anywhere on the page.
    • "misspelling of" - of what? I've seen other pages that say they are a misspelling of something, why was mine deleted?
    • "Name of a person" - Why is this not allowed? You have entries for first names and surnames?
    • "Previously deleted/failed RFD or RFV" - What is RFD? What is RFV? Why was it previously deleted?
    • "Random formatting" - Why was it wrong? Where do I find how to get the formatting right? I thought this was a wiki and people were meant to cleanup where others got things wrong?
    • "Promotional material" - Why can't I put it here? Where can I put it? Where do I appeal if it isn't promotional material?
  • The page doesn't give links to other useful pages - there are no links to (or explanations of) RFD, the CFI, the Tearoom, Beer Parlour, Grease pit, information desk, etc. There is one link to RFV (in fact this is the only link on the page), but it gives no expansion of the acronym or explanation of the term.
  • Entry 8 is "other minor reasons", which they are, but there is no explanation of what the terms mean.

In short it needs a complete rewrite. Thryduulf 13:20, 14 May 2008 (UTC)[reply]

The page communicates more to us ("I had a legitimate rationale rather than mere pique for this deletion.") than to those to whom it is purportedly directed. The negative thrust of the message is the problem, IMHO. Giving the user a positive path to follow might lead to less bad feeling and get us some good content at modest cost in terms of additional entry-review time. Vandalism and silly entries are an inevitable part of Wikidom, I think. We need to cope. It might be a good idea to encourage folks to use requested entries, information desk, feedback, or the talk page for the entry or to direct them to a special-purpose place (proposed-change-space?) where they could enter what they wanted to enter without it trashing principal namespace. DCDuring TALK 15:13, 14 May 2008 (UTC)[reply]
I suppose there is a legitimate point to not facilitating the access to a possible vandal or a frustrated would-be contributor to a place where they can be disruptive. A positive path to a place where they can either contribute or vent spleen might allow negative emotion (anger) to dissipate. DCDuring TALK 15:23, 14 May 2008 (UTC)[reply]
We have the sandbox though it is very under-advertised. I dislike the appearance of this page in the deletion comment, though I do agree that ensuring we don't actually save personal information or other garbage in the deletion log is a good idea (which is why the PREF replaces the custom deletion comment with this). If someone does want to rewrite this page, then feel free - though maybe at a better title. Conrad.Irwin 19:53, 14 May 2008 (UTC)[reply]
I'll take a shot at rewriting this, but it will probably take a few days. As for a better title, how about Wiktionary:Explanation of common deletion summaries? Thryduulf 21:07, 14 May 2008 (UTC)[reply]
Wiktionary:Deletion currently redirects to sysop-Deleted, maybe we could use that (as it's slightly shorter ;)? Conrad.Irwin 21:25, 14 May 2008 (UTC)[reply]
I've made a start at Wiktionary:Explanation of common deletion summaries, feel free to move it there if you want. I've only done the introduction and one section so far, both of which need checking, etc. Thryduulf 22:35, 14 May 2008 (UTC)[reply]

what is all this fuss at all?

Just total nonsense. You should know. Why? — This unsigned comment was added by KYPark (talkcontribs) at 10:59, 14 May 2008.

If this is to do with Atelaes' thread above, please comment there. Otherwise I have no idea what you are talking about. Conrad.Irwin 19:56, 14 May 2008 (UTC)[reply]

Wiktionary:English pronunciation key

I've made a number of changes to the pronun. key, which you may feel free to disagree with:

  1. removed erroneous information
  2. removed distinction between monophthongs and diphthongs, listing all vowels under one table. Why? (a) because we don't need to teach articulation to anyone, we just need to represent sounds, (b) this is usually the tack used in dictionaries
  3. reordered sounds to approximate English alpha order (instead of the previous articulatory ordering). Why? (a) because this makes it easier to find the symbol for the naïve reader (who knows nothing of articulation), (b) this is standard practice in dictionaries
  4. I removed subphonemic distinctions (like the flap and velarized L). Why? (a) these pronunciations are entirely predictable by regular phonological rule, (b) dictionaries generally do not indicate nonphonemic information
  5. I added Vowel + /r/ distinctions that dialectically naïve wiktionary editors may not be aware of. These are usually implicit in dictionaries, but I suggest that we explicitly mark their differences due to nonprofessional editors:
    1. The difference between Mary, merry, marry
    2. The difference between serious and Sirius
    3. The difference between hoarse and horse. This one is perhaps debatable since this distinction is being lost in many standard dialects due to language change. However, these two sounds are usually distinguished in American dictionaries and in the 2nd edition of OED (however, the new online edition has changed to treat them as the same, ignoring the old folks and minority RP speakers). (If wiktionary wants to ignore this distinction, then we need to revert my edit to hoarse.)

Discuss? Ishwar 13:40, 15 May 2008 (UTC)[reply]

On a quick look through the changes, I didn't see any problems, although a note at ʍ would be a good idea, since the distinction exists only in a portion of the range where English is spoken. And we may want to keep the distinction between the flap and velarized L. I'll wait to see what others think before commenting any further. --EncycloPetey 17:52, 15 May 2008 (UTC)[reply]
ʍ belongs next to w, since the English is spelled the same and the distinction is easier to find and understand when they are next to each other. Notes should be in the same column as examples, rather than the footnote in /ɹ/. Why are there separate entries for the identical vowels for /æ/ær/ and /ɪ/ɪr/ (wouldn't the /r/ be /ɹ/ in GA anyway)? Other symbols may want to have examples if they don't get cluttered (because it's a bit hard to judge the vertical position of the symbols in isolation), which may also help replace the note describing the use of stress mark.
Looks like an overall improvement. Thanks. Michael Z. 2008-05-15 19:27 z
I didn't change the footnoting.
Yes, ɹ should > r if you're using ɹ.
The separate entries for ăr and ĭr are to explicitly indicate to American editors that ăr ≠ âr ≠ ĕr and that ĭr ≠ îr. These sequences have merged in many standard American dialects. This is in fact somewhat redundant and usually kept implicit in dictionary pronunciation guides (although you can find this out in the body of the dictionary by comparing Mary (mâr), merry (mĕr), marry (măr) and myriad (mĭr), Sumerian (mîr)). Plus, the guide already had the redundant är which is the sequence ä + r. (On a related note, you could eliminate îr altogether by symbolizing it as ēr, which is the way Random House Unabridged does it: myriad (mĭr), Sumerian (mēr).)
Suggestion for "Other Symbols" is so noted. Ishwar 20:07, 15 May 2008 (UTC)[reply]
  • Basically good I would say. Except I think we should ditch RP and just call it UK, as per the OED and Wiktionary talk:Pronunciation. Widsith 19:38, 15 May 2008 (UTC)[reply]
    • I agree that we shouldn't treat RP as "proper". But should we separate RP and standard UK, or just expunge RP from Wiktionary? It may be that pronunciations for many terms were transcribed from RP so it would be wrong to relabel them en masse. Perhaps it would be valuable to keep both RP and modern UK transcriptions for historical/research purposes. This looks like a broader discussion. Michael Z. 2008-05-15 19:47 z
      It is a difficult issue. I add a lot of pronunciations, and always want to provide some clue about UK pronunciations. However, my experience is largely with the RP and very little with other UK dialects. This is primarily a result of the "posh" accents promoted by the BBC, which has been my primary source of information about British pronunciation over the years. I have neither a good ear for nor sense of the sounds used in other UK accents. --EncycloPetey 23:27, 15 May 2008 (UTC)[reply]
      The problem is, what you're hearing is probably not really RP. For example, even the poshest BBC newsreader does not say [æ] anymore, but [a]. Phonemes like [æ] and [r] have essentially disappeared from "standard UK" speech. [æ] is particularly interesting one, since it is a major part of US English, and is in fact one of the primary differences in accent, which is why a word like man sounds very different in London and New York – yet RP transcription makes them look the same. Widsith 14:18, 16 May 2008 (UTC)[reply]
      Yes, what I hear now on the BBC is not RP, and I can tell. I'm referring to programs recorded in the 1960s and 1970s that I grew up on, and which even now get occasional airtime. Mostly comedies, scifi, nature, and adventure programs. I also have a DVD collection of those Shakespeare productions done in the 1970s. --EncycloPetey 14:32, 18 May 2008 (UTC)[reply]
      Ah, yes – that'll do it. Widsith 14:52, 18 May 2008 (UTC)[reply]
      Maybe I was overreacting. I'm okay with ditching RP in the chart, as long as we have a reasonably clear definition of what the UK accent is, and take into consideration other dictionaries' practices. Perhaps no one was suggesting ditching or renaming RP pronunciation in Wiktionary entries. Michael Z. 2008-05-15 23:30 z

pronunciations

I've extracted all the pronunciation info from American Heritage online for my own personal fun. I can convert them to the wiktionary pronunciation guide. Anyone interested in using this info? If so, tell me where to upload it. Ishwar 13:43, 15 May 2008 (UTC)[reply]

What copyright restrictions are there? It would be amazing if we are allowed to. Conrad.Irwin 16:33, 15 May 2008 (UTC)[reply]
Please don't upload information copied directly from copyrighted sources. Perhaps the audible pronunciation of a word is an uncopyrightable fact. But its phonetic transcription is the creative product of a skilled expert (it isn't deterministic, incorporating a lot of nuance, just like dictionary definitions). Michael Z. 2008-05-15 19:10 z
I don't know the copyright issues.
"copied directly" is defined as what?
So, where does the pronunciation info in wiktionary come from? No one looks in reference books for pronunciation information? If pronunciation information is largely the same across dictionaries, can the origin of the information be ascertained and it be copyrighted? Will any wiktionary entry ever have an analysis resulting in a transcription that differs in substance from a transcription in a published dictionary? Ishwar 20:15, 15 May 2008 (UTC)[reply]
Dumping an online database and converting the data to Wiktionary is copying. It infringes on the creator's rights under worldwide copyright laws.
Pronunciations in Wiktionary are composed by editors. They may be created with reference to, with interpretation of, or transcribed from various sources (which often differ). Referring to a source is not the same as duplicating its database.
Read Wiktionary:Copyrights#Contributors' rights and obligations. It clearly identifies what you have the rights to licence to Wiktionary by entering it here. Michael Z. 2008-05-15 20:31 z
When I enter pronunciation information, I basically transcribe my pronunciation of the word in question, with reference to the pronunciation key and the examples therein. Occasionally I will look at online references to see how a particular part of a word has been transcribed or where syllable breaks have been put. Having moved around Britain quite a bit (principally Tyneside, North Yorkshire, Somerset, South Wales and now London), I do not have a particularly strong accent but I don't speak RP, hence I label it "UK". Thryduulf 22:43, 15 May 2008 (UTC)[reply]

Meta logo letter discussion link

There's a discussion on meta about changing the Japanese letter in the tile logo image. Best regards Rhanyeia 08:09, 16 May 2008 (UTC)[reply]

Topline see alsos for common misspellings

One of the most frequent spelling mistakes I make is whether a word has a single or double consonant in the middle. When looking the word up on Wiktionary, if I guess wrongly and the word I have entered is not an entry then the search results often (but not always) will link to the spelling I intended.

However, there is no such indication if the spelling I entered is a different word. For example, I might be intending to look up the English word barrack but misspell it as barack, which is a Hungarian noun.

Should the topline see also therefore also be used to link these words together, in addition to any words which differ solely in capitalisation and/or presence or absence of diacritics? Thryduulf 15:44, 17 May 2008 (UTC)[reply]
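
As a rough illustration of the proposal above, here is a minimal Python sketch of a grouping key under which titles that differ only in capitalisation, diacritics or a doubled letter fall together. The function names `see_also_key` and `group_titles` are hypothetical and nothing like this is known to exist in the Wiktionary software; note also that this sketch collapses any repeated letter, vowels included, which is an assumption that goes beyond the single/double consonant case described.

```python
import unicodedata
from collections import defaultdict


def see_also_key(title: str) -> str:
    """Hypothetical key: ignore case, diacritics and doubled letters."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = stripped.casefold()
    # Collapse runs of the same character ("rr" -> "r").
    collapsed = []
    for ch in folded:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def group_titles(titles):
    """Return only the keys shared by more than one title."""
    groups = defaultdict(list)
    for title in titles:
        groups[see_also_key(title)].append(title)
    return {key: names for key, names in groups.items() if len(names) > 1}


print(group_titles(["barrack", "barack", "ogof"]))
# → {'barack': ['barrack', 'barack']}
```

A key like this is language-independent, so it would also pair perro with pero; whether that is wanted is exactly the open question in this thread.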

I think that is a good idea. Conrad.Irwin 15:46, 17 May 2008 (UTC)[reply]
It can't hurt, but wouldn't it be better to have an actual English header and a "common misspelling" sense? DCDuring TALK 15:50, 17 May 2008 (UTC)[reply]
I'm not certain (I've not checked) that all of these would qualify for a "common misspelling" entry. Perhaps it would have been better to use a description like "simple orthographic variations" but then I doubt that anyone would know what I meant. In English words with sounds like /æɹə/ in them can be spelled either "..ara.." or "..arra...", so it makes sense to link to the alternate. However I don't know how to define this finitely, e.g. how many variations for words with the sound [ʃ] in them do we include? What about the orthography of other languages? Thryduulf 23:30, 17 May 2008 (UTC)[reply]
It is first of all entirely possible but only if there is a regular rule that does not depend on the language. See also colour for the page color is not acceptable on the top line unless you would want to see the top line to extend for several lines. See also theatre for the page theater is questionable and probably not a good idea in my opinion. See also perro for the page pero has been suggested before, and it is an addition that I have made myself on a number of pages, even between different languages. More often though I have removed such suggestions as "zero" on the page 0, which I find to be rather annoying. As far as I'm concerned see also may as well be something that's completely automatic, since if a computer can't make the decision then it more than likely doesn't belong. The exception is glyphs.
The question about misspellings more broadly could probably be handled a different way though. I don't see that as exactly being the primary motivation for such an idea. The problem, if one could choose the words, is confusion. A native speaker is not likely to misspell pero as perro, but a learner could easily confuse them. I personally feel that's a pretty good reason to have doubles (yes of vowels and everything else) but I'm not entirely confident in that because I haven't seen how far it would extend. DAVilla 19:49, 19 May 2008 (UTC)[reply]
I have a hard time seeing how that would work, since (as noted in the discussion) the "misspelling" may not be common and may be a word in a completely different language. Would this then be extended to cover cases of single/double vowels? other similar spellings? We already have some members of the community concerned that our "see also" includes too much (although I think the current coverage is just about right). I tend to favor the status quo on this particular issue. --EncycloPetey 23:58, 17 May 2008 (UTC)[reply]
Perhaps what we should have is a "Not the word you were looking for? See a list of similar words" With that being a list broken down by language and containing words with similar spellings in addition to the different capitalisations, diacritics, leading/trailing -, homophones, etc. If a word with a similar spelling exists in more than one language then it would get an entry in each language section. I don't know how workable this is though. Thryduulf 09:38, 18 May 2008 (UTC)[reply]
Search needs improvement. The "see" template, misspellings, and orthographic variant entries incompletely fill the need for a search that handles a fuller range of typos and other user errors. Also restricting search to a user-specific set of languages would be a help to users. DCDuring TALK 13:36, 18 May 2008 (UTC)[reply]
We could use a combination of aspell and DidYouMean for this, though I don't currently have the time to play around with them. (For those who haven't tried the WT:PREF "(Experimental) Use the aspell checker on User:Amgine's http:..." that will give you a limited idea of what it is capable of). Essentially DidYouMean will normalise all letters by removing diacritics etc. while aspell uses Soundex and typo checking to work like a normal spellchecker-suggester (in multiple languages). I've had only limited success in getting DidYouMean working on http://devtionary.info/wiki/, though more success with aspell - but it really needs a couple of days of PHP coding to get both of them going again properly. There are limitations though, aspell can only do one language at a time (because each language has different Soundex rules) and running every word through the ~80 dictionaries that it has would take too long, so we'd have to guess the languages to try in advance. If anyone has ideas on how to do this I'd be very interested. If/when we get these two installed there are a couple of other things I'd like to look into, such as de-romanisation of search input (so that searching for luw would also match λύω) and something similar for pinyin. (Though these are long term ideas while I would like to get DidYouMean/aspell going reasonably soon). Conrad.Irwin 13:50, 18 May 2008 (UTC)[reply]
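
For readers unfamiliar with Soundex, the following is a minimal Python sketch of the classic American Soundex code, which maps similar-sounding English spellings to the same four-character key. It is only an illustration of the general technique mentioned above: it does not reproduce aspell's or DidYouMean's actual internals, the function name `soundex` is chosen here for the example, and the rules are English-specific, which is the one-language-at-a-time limitation the comment describes.

```python
import unicodedata

# Consonant classes used by classic American Soundex.
_CODES = {}
for _digit, _letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code, or "" if no ASCII letters remain."""
    # Decompose accents so that e.g. "é" contributes a plain "e".
    decomposed = unicodedata.normalize("NFD", word).lower()
    letters = [ch for ch in decomposed if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code to ""
    return (result + "000")[:4]


print(soundex("barrack"), soundex("barack"))
# → B620 B620
```

Because the key is computed per word, a lookup table from code to entry titles could be built once per language, which may be one way around running every search through every dictionary.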
I can't wait. :-)   Wiktionary:Feedback suggests that one of our biggest problems is that users don't know how to spell anything. (It's not our absolute biggest problem, which is that users don't know the difference between Wiktionary and Wikipedia, but still, it's a biggie.) —RuakhTALK 15:30, 18 May 2008 (UTC)[reply]
Regarding the 80+ dictionaries, there is a useful tool at http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser that identifies which language a phrase is in (in my experience with about 90% accuracy). Obviously this will be harder with single words, and I've also no idea how the tool works or what its license is. However perhaps the best way to work with soundex is to use a tool like this to suggest the likely languages and give a probability based order to feed soundex dictionaries. Thryduulf 15:56, 18 May 2008 (UTC)[reply]
*tries it out* It's very impressive (and very cool!), but it doesn't seem to do decently even with correctly-spelled single English words, and it seems to do (very slightly) worse when they're misspelled. (I tried misspellings that I know from experience, and from Google, to be common.) —RuakhTALK 17:24, 18 May 2008 (UTC)[reply]
yes, interesting, but relies on having several words in some way: "wadudu" is Breton, and "wadudu wabili" is Malay? Seems to me that is two bugs. (well of course is ... ;-) "mdudu" is Turkish. as is "mdudu engine" (another bug ;-). Okay, I'm having too much fun. It isn't looking for the individual words. Robert Ullmann 17:47, 18 May 2008 (UTC)[reply]

Link prefix for pedia-links: w: vs. wikipedia:

I'd like to suggest that in reader-facing pages (entries, appendices, etc.), we use the link prefix wikipedia: rather than its short form w:. This is because the HTML title attribute (which defines the tooltip text — the text in the little yellow box that you get when you hover over something in out-of-the-box Windows, and the corresponding text in other systems) doesn't do the mapping. That is:

It's not a huge deal, but I think it would be nice if our tooltip text actually said "wikipedia" in full.

RuakhTALK 18:46, 17 May 2008 (UTC)[reply]

So, does silence mean that people agree? Disagree? Don't give a darn? Don't leave me hanging here, folks! :-P —RuakhTALK 16:02, 18 May 2008 (UTC)[reply]
We use w: everywhere, and routinely correct wikipedia: (and Wikipedia:) to w:. I for one don't care about "tooltips" (which this isn't, it is just the link title), I look at the URL target if I want to see where I would be going. (This applies to every site, not just here) What some site designer wants me to read is less important. If you think WM software should be putting something more informative in the title attribute, that should go to bugzilla. (might break who knows what ...) Robert Ullmann 16:36, 18 May 2008 (UTC)[reply]
I agree with Robert. We routinely use w:, and Wikipedia routinely uses wikt: (Wikipedia even has a policy on that). If there is a problem, it's at the Wikimedia level, not at the individual project level. --EncycloPetey 16:51, 18 May 2008 (UTC)[reply]
O.K. Thanks, both of you. :-)   —RuakhTALK 17:11, 18 May 2008 (UTC)[reply]
On the other hand, it would be nice to have the tool tips corrected. Even if we use w:foo the tip should display as "Wikipedia: foo", no? DAVilla 21:51, 19 May 2008 (UTC)[reply]
That's what http://bugzilla.wikimedia.org/ is for. Though they tend to be very slow in answering enhancement requests. Conrad.Irwin 12:11, 20 May 2008 (UTC)[reply]

Offensive usernames and how to handle them

Following discussions on User talk:Connel MacKenzie regarding the usernames "Teh Rote" [20] and "Lou Crazy" [21] I think it would benefit the community to have an agreed procedure for dealing with usernames that are or may be offensive. To that end I propose the following:

If you see a username you find offensive or is inappropriate in another way, you must follow the following procedure:

  1. Initiate a discussion at Wiktionary:Beer parlour about the username, explicitly stating why it is inappropriate.
    • If a previous discussion has occurred, you must either abide by that outcome or explicitly state what has changed.
  2. Inform the user concerned on their Wiktionary talk page (and optionally via any other communication methods) that the discussion is taking place.
  3. Wait for consensus to arise about whether the username is acceptable or not, and abide by that consensus.
  4. If there is consensus that the username is unacceptable, the user should be encouraged to seek a renaming to a username that is acceptable. Only if after a reasonable period they have not requested a renaming may they be blocked and/or a forced renaming sought. Any period between having requested a name change in the appropriate place and this change being carried out does not count against the user.

In all cases, the user must not be blocked prior to there being consensus to do so, except where the block is unrelated to their username. Where the username is very obviously inappropriate such consensus will be arrived at quickly, so there is no need to preempt it. Where consensus does not arrive quickly, the username in question is not "obviously inappropriate".

In deciding whether a username is acceptable or not, it is appropriate to take into consideration the contributions and behavior of the user on the English Wiktionary and/or any other Wikimedia wikis the user has been active on.

Thryduulf 15:38, 18 May 2008 (UTC)[reply]

No good; way too much process. The 100% case is usernames that should be blocked instantly, out of hand. The problem here is simply that "Lou Crazy" and "Teh Rote" are simply not offensive, and should never have been blocked, or even considered to be blocked. And that is just an issue to take up with CM, as it has been. (And previously, when he blocked User:Dustsucker, and I unblocked, it is a literal translation of Staubsauger ;-) In the cases where the above might apply, the username should simply not be blocked; the user could be asked to change the name to something else. Robert Ullmann 15:48, 18 May 2008 (UTC)[reply]
+1 —RuakhTALK 15:57, 18 May 2008 (UTC)[reply]
Totally over the top. I shall continue to shoot first and ask questions later. (any usernames implying continual drunkenness will get the harshest treatment). SemperBlotto 15:51, 18 May 2008 (UTC)[reply]
Typical of naysayers. I can't help but chuckle at where this is coming from. Any username with leet in it, indeed, still should be indef'd. This is a web resource dedicated to compiling information about the English language, which leet still is not part of. Those who wish to advocate it are positively pushing a POV that honestly has no bearing on the English language itself. But all that aside, when one sysop suggests that the block reason itself isn't good enough and unblocks another sysop's block (note that there invariably are additional factors) they probably should not be sysops. --Connel MacKenzie 16:02, 19 May 2008 (UTC)[reply]
Yes, this is a resource for English. English only? No, every word in every language. (Okay, some constructed languages are not included because they aren't actually used to communicate, but) if people use leet, then it should be included. And for a few words, it is. It doesn't slip through the cracks because you think it isn't English. If it's attestable and it isnt' English, then pray tell, what language is it? What language are ain't and alright and amscray?
Yes, we all know it's silliness, but what is offensive about silliness? To some individuals such a username may not give a preferable impression, sure. But do you count it as a warning to check a person's contributions, or do you count it as a first black mark? Would you consider me, for instance, to be a silly person or a serious person? And does it change who I am because my username is DAVilla or ∂ανίΠα? Believe what you may, I submit that it does not any more than a person can hide his or her true personality. The truth is, somehow that always comes across.
Connel, if you think Thryduulf shouldn't be sysop, you might make a better case by giving an example of an unblock that wasn't justifiable. I have to agree though that in general such actions should be taken very cautiously. I'm assuming, of course, that you would be willing to unblock the person yourself if the issue came to your attention. This is a good lesson though. It's not good to have admins taking unilateral actions, on either side, when there is question. DAVilla 19:22, 19 May 2008 (UTC)[reply]
I was the unblocker, so I assume Connel refers to me. I decided to unblock because I felt that the block was incorrect and harmful to Wiktionary, in that it risked us losing yet another contributor. Had the block been given when the user initially signed up I would not have revoked it, as that is unlikely to cause particular insult, however I feel that such a block should not be given after the user has become entrenched. (Call me flighty if you want, this is just what I think). Before unblocking I checked with a few others on IRC at the time that this was acceptable, and it seemed to be so, however I am not placing the blame on these people as I would have unblocked anyway had no-one been around. Conrad.Irwin 19:51, 19 May 2008 (UTC)[reply]
In a more general sense, I disagree with the "Wheel War" rule. It is true that admin actions rarely require undoing, as they tend to know what they are doing, they are our most experienced contributors after all. However in some cases they do need to be undone, and because adminship should not be a "special" thing (I reckon that the majority of Wiktionary regulars are admins - though I notice that WT:VOTE lies still recently), there should be nothing special about the admins or their actions. Conrad.Irwin 19:51, 19 May 2008 (UTC)[reply]
Although I think "Lou Crazy" and "Teh Rote" are acceptable usernames, I think that requiring an admin to submit his proposed block for community approval is too much red tape; also, if he was given the sysop flag, he is presumably trustworthy to block appropriately; also, the block can always be removed if needed. I do think that the text that a blocked user sees should inform him how long the block is in place, and that other admins might remove it if they think it was placed inappropriately. (This might prevent unfairly blocked users from leaving Wiktionary forever, when they might have good contributions to make.)—msh210 17:57, 19 May 2008 (UTC)[reply]
Red tape: bad; blocking user names indefinitely: bad; leet: not so bad.
Real problem: Wiktionary needs more users and contributors. Wiktionary is not nearly as popular as MW online or Answers.com. If I were trying to ensure that Wiktionary be no long-term competition to the other on-line dictionaries, what would I do? I would do what I could to discourage an active, broad user community such as Wikipedia has, by aggressively and opaquely policing new users in any way that I could. Delete their entries (preferably without explanation), block them, harass them, provide little welcome or help. Discourage entries of popular terms, especially those that appealed to the young. Try to make it more like the OED than Wikipedia. This would be vastly more successful than having no Wiktionary, because it would actually prevent the "Wiki" brand from being used to establish a real competitor. DCDuring TALK 18:20, 19 May 2008 (UTC)[reply]
You paint a rather rosy picture of Wikipedia. Keep in mind that Wikipedia does many silly and irritating things as well. Discussion happens on every individual article on the site. Voting that affects hundreds of pages and several wikiprojects happens without notification in obscure locations, to be concluded rapidly before more than a tiny fraction of uninformed voting rats chew over the issue. Lots of sockpuppetry. Wikilawyering. And, my personal peeve, templates plastered mindlessly onto every single page that lacks a reference. At least here, the obvious or easily checked items are mostly left alone. On Wikipedia, even a one sentence stub will get a massive template demanding references and dwarfing the poor little stub. Yes, Wiktionary has problems, but only different problems, not more of them. --EncycloPetey 22:03, 19 May 2008 (UTC)[reply]
Well, yes, but they have 60+ times the user visits. I think most people used a print dictionary more than they used a print encyclopedia. Why don't we have as much usage as they have? We're not growing much faster than they are either, apparently. IOW, they are getting something for all of the abovesaid misery. It is not obvious exactly what bold new direction would really help, but it seems that we ought to be willing to rethink some of the long-standing policies. Is there a wiktionary Google Tool? Wiktionary look-up add-ons for Writer and Word? More user stats? Do we need fact-based user personas to help us focus on actual users instead of using our own preferences (now the sole basis for entry design)? I think we can have even more fun by getting more folks contributing and having more visits and more disagreements. DCDuring TALK 23:56, 19 May 2008 (UTC)[reply]
Your argument assumes that Wiktionary and Wikipedia have comparable content. We do not. There will be 60 times as many people looking up Britney Spears as are looking up the meaning of "incontrovertible". The fact that we are growing at a comparable rate, despite having fewer contributors and despite being less "sexy" speaks to a real strength here, not a weakness. --EncycloPetey 02:13, 20 May 2008 (UTC)[reply]
I think we should be well pleased that we have fought our way to being the 4th most visited dictionary site on the web, at least according to these figures. DCDuring TALK 04:52, 20 May 2008 (UTC)[reply]

Move to Wiktionary:Manual of style, which is in keeping with basically every other wikimedia project. Nwspel 20:20, 19 May 2008 (UTC)[reply]

No. Only Wikipedia has a Manual of Style, it is a page consisting of general guidelines about the finer points of writing in a way appropriate for an encyclopedia. WT:ELE is not a style guideline, it is a structural policy. I think there is plenty of room for us to have a Manual of Style too, it's just that no-one has got around to writing it yet. Certainly some instruction into the best way to write definitions, what makes suitable example sentences and how to make Etymologies clear is beyond the scope of WT:ELE, but should be included in the manual of style. Conrad.Irwin 20:33, 19 May 2008 (UTC)[reply]
The ELE deals with issues of layouts of a page; something which the MOS does. I see no difference enough to keep them separate. Nwspel 20:35, 19 May 2008 (UTC)[reply]
We need to separate policy from general advice. The Wikipedia manual of style is general advice, and there is (as above) plenty of scope for one of these on Wiktionary. We should not make WT:ELE any larger by including the irrelevant points. A second reason to avoid having it at the same title is that Wikipedians already assume that Wiktionary is just a Wikipedia with a different name, trying to hammer the differences into them is hard enough without confusing them by calling different things the same. Conrad.Irwin 20:39, 19 May 2008 (UTC)[reply]
I think we are more like wikispecies (See WikiSpecies:Help:General Wikispecies) than we are like wikipedia. The structure and interrelationship of our entries is relatively more important than at WP. In our case, we are trying to have consistency of entries so that a user can quickly find what they want. A dictionary is used more often and briefly than an encyclopedia. DCDuring TALK 20:36, 19 May 2008 (UTC)[reply]
The fact that Wikipedians already get so confused by the system here is surely reason enough to try to clarify things. Under no circumstances should there be two separate pages. I think the issue is whether to call the page the ELE or the MOS. Nwspel 20:42, 19 May 2008 (UTC)[reply]
Every project is an island. We do things here in the way we find best for us to accomplish our mission, and Wikipedia does the same. Forcing one project to do something just because another project does it will just hinder the many projects which are better served doing something a different way. - [The]DaveRoss 20:44, 19 May 2008 (UTC)[reply]
Agree with TDR. We should be allowed to do things our own way, completely regardless of how the 'pedia happens to do them. -Atelaes λάλει ἐμοί 20:46, 19 May 2008 (UTC)[reply]
There are some reasons to tilt toward WP's structure, especially for our most basic pages:
  1. they have 60 times the visits that we do and may be doing something right.
  2. they are a prime source of recruits since we don't seem to do all that well in outreach to the general population.
This principle should apply to all the pages that wikipedians expect. Perhaps all of the WP shortcuts should redirect to the closest corresponding WT page. DCDuring TALK 20:59, 19 May 2008 (UTC)[reply]
Disagree with DCDuring (seems like I've been doing that a lot lately, nothing personal :-)). The 'pedia has different aims and different needs than we do. Should we change WT:ALA to redirect to something about Alabama as the 'pedia does? No. Why? Because Alabama is not a major part of what we do here, but Latin is. Our shortcuts should reflect our priorities and needs, not the 'pedia's. -Atelaes λάλει ἐμοί 21:07, 19 May 2008 (UTC)[reply]
That's a slightly different example to moving ELE to MOS, since the MOS seems to be pretty uniform pan-wiki. Nwspel 21:11, 19 May 2008 (UTC)[reply]
That's not what I'm seeing. What I'm seeing is that every project has their own name for their major formatting page, with WX:MOS linking to it. I have no problem with WT:MOS linking to ELE, but we should not change the name of the page. -Atelaes λάλει ἐμοί 21:19, 19 May 2008 (UTC)[reply]
WT:WS should not be redirecting to Wiktionary:Shortcut, since there is a more logical redirect for that (WT:SC), and since WT:WS is much more needed for Wiktionary:Wikisaurus. Nwspel 21:56, 19 May 2008 (UTC)[reply]
Based on what reasoning? As Conrad has already wisely said, the shortcuts are for editors, not readers. Do you have some evidence that this assertion is the case? -Atelaes λάλει ἐμοί 22:17, 19 May 2008 (UTC)[reply]
Based on logic, not tradition; something I hope this wiktionary would adhere to. The fact that some users may be used to WT:WS directing to Wiktionary:Shortcut is irrelevant. There is now a better shortcut for that, and that shortcut can be used on the more needed case of the Wikisaurus. Nwspel 22:38, 19 May 2008 (UTC)[reply]
But the logic is not that simple. Tradition is useful in redirects because it's what our editors are used to and will think to use (and making things easier for the editors is what these things are here for). If there's a huge demand to be linking to Wikisaurus pages, then fine. However, it's my impression that wikisaurus is a dead project. Those pages have all turned into piles of nonsense, and I think people simply prefer to put synonyms into the main entries, so that they receive the same scrutiny that everything else does. Now, I wouldn't expect you to know that, as you're a new user, but that's the point we've been trying to convey, namely that new users should not come in and, with little experience actually working on wikt, start revamping our background shortcuts. -Atelaes λάλει ἐμοί 22:49, 19 May 2008 (UTC)[reply]
If it's dead, should you not delete it? You also made an interesting point; you say that new users do not understand about the project properly - but that shouldn't be the case, since the wiki should be aiming to make itself usable by anyone, not just someone who has spent months in the system. Nwspel 22:56, 19 May 2008 (UTC)[reply]
Delete wikisaurus, hmmm......that's not a bad idea. And I didn't say that we shouldn't make the project easier for newbies to get into. I said that newbies shouldn't presume to make major policy and structural changes. -Atelaes λάλει ἐμοί 23:02, 19 May 2008 (UTC)[reply]
I was only trying to help :/ Nwspel 23:10, 19 May 2008 (UTC)[reply]
Newbies get to vote just like anyone else and participate in the discussion. Our braver newbies also stir the pot. Veterans may discount some newbie opinions. I would argue that, despite the risks and the rehashing of old matters, we need more committed newbies stirring the pot. Despite all that's been accomplished, Wiktionary does not yet have laurels worth resting on. DCDuring TALK 00:04, 20 May 2008 (UTC)[reply]
Even if you keep "ELE" instead of "MOS", I still have a bone to pick with the name. We don't call Wiktionary:Administrator, "Wiktionary:Administration explained", or Help:Tips and tricks, "Help:Tips and tricks explained", so I see no consistent reason in keeping the "explained" bit on the end of the title. Many of you will sit there reading this saying to yourselves "What's he on about, it sounds completely fine" - but that's because you're used to it; if you looked at it from a non-wiktionary-native view, it would seem very illogical and inconsistent. Wiktionary:Entry layout would be much more appropriate. Nwspel 09:25, 21 May 2008 (UTC)[reply]
If we change the name as you suggest, then the shortcut would be WT:EL, but el is the ISO code for Modern Greek. The shortcut would become confusing to regular users (who are the most likely to use it). In any case, it is the page where entry layout is explained. Entry layout is one item here that really is so complex that it cannot be simply presented, but must be explained. The current title is logical. --EncycloPetey 13:41, 21 May 2008 (UTC)[reply]
"Tips and tricks explained" isn't an analogue of "Entry layout explained". One page is a page of tips and tricks (tips and tricks are its contents), while the other explains entry layout. I take your point, though, that most Wiktionary: pages don't have "explained" or the like at the end of their respective names. So what? They don't have to match each other. And it's not like someone is more likely to look at Wiktionary:Entry layout than at Wiktionary:Entry layout explained: neither is intuitive. (Nor is anything else. That's why we have links to policy pages: so people don't need to guess their names.) So there's really no need to change the name of the page; and it will serve to confuse those who know where it is already. That said, I see no reason not to make Wiktionary:Entry layout a redirect to ELE. As a side point, part of the reason I left WP and now edit (almost exclusively) here is that there was too much focus on administrativia (stub sorting, anyone? continual arguments over VFD procedures? etc.) and correspondingly not enough on making an encyclopedia. (Contrast [22] with [23].)—msh210 16:39, 21 May 2008 (UTC)[reply]
What do you mean about the comparisons with my edits? And ID is the ISO code for Indonesian, but it doesn't stop us having WT:ID, so that is not a valid point. If we don't have "Wiktionary:Administrators explained", then why should we add the explained onto this? Nwspel 18:14, 21 May 2008 (UTC)[reply]
Your logic is still flawed for the reasons I stated above. The Administrators page does not "explain" administrators; it merely lists them and allows for voting. By contrast, ELE does explain (in detail) aspects of Entry layout. Do you really believe this one small point is critical to the development of Wiktionary? I won't be bothering to discuss it anymore, because I do not think it deserves this much attention. --EncycloPetey 13:49, 22 May 2008 (UTC)[reply]

WikiSaurus

Proposal

Deletion of WikiSaurus, or the merge of it into mainspace articles. Nwspel 15:35, 20 May 2008 (UTC)[reply]

Poll

merge/delete keep

Discussion

WikiSaurus is a useful place for people to play about adding stupid words - it's not part of the REAL dictionary (IMHO). SemperBlotto 16:27, 20 May 2008 (UTC)[reply]
Despite all the bad stuff in it, it does have some good stuff. Shame to be rid of it. (Incidentally, I agree with everyone else who said not to rename ELE. No objection to having WT:MOS redirect to it, though I see no need at all.)—msh210 16:18, 20 May 2008 (UTC)[reply]
I thought the whole point of wiktionary was to contain everything in one article, since the whole policy thing about "wiktionary/wikipedia" not being confined by the normal "paper" issues, etc, meaning that the synonyms should simply be included in the main wiktionary articles themselves. The general impression I get from several people here is that WikiSaurus is a dead project, and the little that is on there, is rubbish. Nwspel 17:06, 20 May 2008 (UTC)[reply]

We voted on the issue of Wikisaurus, IIRC. Changing that would require another vote. And no, we don't put everything on one page. For example, we have a separate namespace for citations. Wikisaurus allows us a space to list synonym sets that would otherwise be repeated on countless pages with slightly different meanings, and require the format and lists of those synonyms to be kept in synch. Wikisaurus was started to solve that problem by allowing a freer format for presenting and aligning synonyms, near synonyms, antonyms, and such. The project has not gotten far because the one person who helped set up the pages has stopped work, and few people have since shown interest in writing new pages (except for slang terms of body parts and sex). We don't delete content just because it isn't being actively edited (or we'd delete huge numbers of entries). A better solution is to locate interested individuals to add more pages to the project. --EncycloPetey 18:07, 20 May 2008 (UTC)[reply]

However, considering no good work has been put into the project in a great while, and a lot of nonsense has accumulated, it might be worth reevaluating. When Nwspel (perhaps only jokingly) proposed that it be deleted, there were a number of "ay"s on IRC. I must admit, right now the whole thing feels like an embarrassment to the project. I would put up no opposition to simply deleting the whole thing. I am curious as to what others think of this. -Atelaes λάλει ἐμοί 18:19, 20 May 2008 (UTC)[reply]
Or merge the content of the pages there into subsections of the mainspace pages. Nwspel 18:14, 20 May 2008 (UTC)[reply]
I wasn't joking about deleting it Atelaes; in fact, you were one of the several people that helped me come to the conclusion that something should be done about it, from when you were discussing about the shortcut system with me. Nwspel 18:23, 20 May 2008 (UTC)[reply]
Merge useful content (if any) back into mainspace, leave redirects for a while. Then remove namespace. Isn't this one of those basic no-nos of database management (having separate fields that have to sync'd manually)? Anyway, it's not working. Better to keep this content in plain view, where everyone can keep an eye on it. -- Visviva 03:09, 21 May 2008 (UTC)[reply]
I have similar misgivings about Citations, but note that it wasn't supposed to be used as the default holder for all citations, only as a sort of incubator/scratchpad, particularly for words and senses that didn't yet meet CFI. -- Visviva 03:09, 21 May 2008 (UTC)[reply]
Um, if we're considering this as a matter of policy, it needs a formal WT:VOTE. If we're just considering general opinions about the current state of Wikisaurus, and how specific existing pages might best be handled, formal tabulation of positions isn't really the best course. Either way, I don't think the table above is very helpful. -- Visviva 09:07, 21 May 2008 (UTC)[reply]
Of course this is no substitute for a formal vote, but as long as that's realized, I don't see how it's harmful (barring the possibility that people's positions might be misrepresented). -Atelaes λάλει ἐμοί 09:10, 21 May 2008 (UTC)[reply]
If the results from the table above show that there is some movement for action, then it shall go to a formal vote; the voting above is simply so we can get an idea of whether or not there is enough support to put it forward. Nwspel 09:19, 21 May 2008 (UTC)[reply]
Before a formal vote can happen we need to know what the options entail. How do we handle pages like Wikisaurus:penis/more, for example? Also, is it possible that the Wikisaurus space could be put to a better use, for example by providing transclusion for sets of synonyms on mainspace entries (like the templates work, so {{Wikisaurus:anger}} on an entry could provide some of the synonyms for anger, rage etc.)?
You make interesting points, but my opinion is that most of those words listed there are rubbish; and would not even warrant a chance on an RFV. We should gradually be working through the words, checking to see if they can be verified; if they cannot, delete them; if they can, make the page, and link it in the see also section or whatever, to the other word. Nwspel 11:21, 21 May 2008 (UTC)[reply]
1. By deleting them, like the irredeemable cesspools they are. (Or filtering them gradually for any valid content that may somehow have found its way in.) 2. This is a great idea and I'd like to see a pilot, but it runs into the same basic problem as the current Wikisaurus: the basic unit of synonymy is not the word, but the sense, and our arrangement and glossing of senses is in constant and necessary flux. That probably isn't a big problem for "anger" but can be an enormous problem for many words. Also, as somebody mentioned somewhere recently, synonymy is not reliably transitive; a one-size-fits-all template may not suit all members of the set equally. -- Visviva 12:01, 21 May 2008 (UTC)[reply]
Wikisaurus, like some of our Appendices, is a useful location for certain potential entries that might not warrant inclusion in principal namespace. As such, it might again serve to involve in Wiktionary some who might become principal-namespace contributors. I don't know whether there is any software or structure that could be developed that would allow Wikisaurus to benefit from the synonym work in principal namespace and also allow it to have lower quality material as well without risking corrupting principal namespace. DCDuring TALK 11:33, 21 May 2008 (UTC)[reply]
I don't think Wikisaurus lists should be including anything that doesn't meet CFI; our tacit allowance for this is a big part of why the 'saurus has become such a steaming pile. -- Visviva 12:01, 21 May 2008 (UTC)[reply]
Or to put it another way, to allow unverifiable content on WS as a matter of course would do an active disservice to our users, by providing them with information that is very likely to be wrong. IMO this is actually worse than providing no information at all. -- Visviva 12:28, 21 May 2008 (UTC)[reply]
There are several hundreds of words in the WS that have not been verified, are probably mostly wrong, etc, etc, etc, so I find myself agreeing with Visviva. Wiktionary itself has sections for synonyms in its mainspace articles; there is no reason to repeat the same thing twice; once on mainspace, once on WS; except, in the latter, several hundreds of other "words" seem to have been added that are not verifiable in any way, and as we merge the "project" into the mainspace, will have to be verified. Nwspel 17:55, 21 May 2008 (UTC)[reply]
About synonyms - for those who learn English as a second language, listing synonyms is just one half of the equation. Another very important aspect would be describing the usage of each and explaining what situation they are used compared to each other (formal, informal, written, spoken, positive, negative, etc.) When looking at a group of synonyms, one (maybe the most common) could be used to hold these descriptions, while the other could just point to it. --Panda10 18:29, 21 May 2008 (UTC)[reply]
But WS does not do that anyway. Nwspel 19:18, 21 May 2008 (UTC)[reply]
(Not that you're doing it, but...) I hate it when I hear anyone say that something is done the "correct" way or not about Wikisaurus. For instance, I've been yelled at for using titles with parentheses, whose purpose was to "disambiguate" meanings. Thankfully that's been resolved by saying that the page can be devoted to the primary meaning of a word, but I still feel that the way I had done it would have been pretty cool. Wikisaurus is not established enough for creativity to be stifled, and Panda10 has a great idea. In fact I like the American Heritage Dictionary for precisely that reason, the way that it explains synonyms. DAVilla 01:34, 26 May 2008 (UTC)[reply]
Hear, hear. The /more pages can be deleted or at least ignored, and I'm all for shooting any red links on sight. On the other hand, there are a number of terms in thesauruses that are not necessarily idiomatic, so this isn't the perfect solution. Requiring blue links is probably a good enough starting point, though. It's been brought up before, of course, and I think the main objector had been RichardB. In my opinion he was right about most of the terms he wanted to include, but RFV has progressed substantially since the days that some of the earlier bickering had taken place.
I would rather the Wikisaurus space sit unused than have it removed. It needs to happen, and one of these days someone is going to come along and make something out of it. The best proposal I've heard is to import a public domain thesaurus, much like Webster's was used to jumpstart the dictionary. It would also be great to have bots running around and adding things to the Wikisaurus that are synonyms in the main dictionary space. Of course bots usually require some structure first though, so this would probably need some human guidance if it were to be done now. DAVilla 01:34, 26 May 2008 (UTC)[reply]
There's a lot of importable synonym content in Webster's, such as at [24]; that might be a good place to start. -- Visviva 04:22, 26 May 2008 (UTC)[reply]
  • I've taken a stab at Conrad's (? unsigned) suggestion above; see Wikisaurus:anger and anger#Synonyms. I created {{comma list}} for the occasion; I hope it will also find other applications. This approach looks promising, and has the advantage that it would actually reduce fragmentation of content... BTW, I'm not much with layout or coding, so if anyone wants to fix these up, please feel free. :-) -- Visviva 04:22, 26 May 2008 (UTC)[reply]
See also the very old idea at User:Conrad.Irwin/anger, I couldn't get it to work nicely enough for most cases, though it works well in a few - your idea looks a lot neater though. Would it be possible to change the heading from ====Synonyms==== to ====Thesaurus====, and then transclude the synonyms, antonyms headings so that the normal edit links work by editing the Wikisaurus page? Conrad.Irwin 12:22, 27 May 2008 (UTC)[reply]
This is a nifty notion, but I'm afraid it would be problematic for most words, since monosemy is unusual for most core (or even semi-core) vocabulary, and even if we have a "Thesaurus" heading we can't very well have multiple such headings under a single POS. An advantage of the comma list is that it can be spliced into existing entries without breaking up the existing ontology. -- Visviva 15:45, 27 May 2008 (UTC)[reply]
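As a rough illustration of the comma-list idea discussed above: this is only a minimal sketch in Python, not the actual {{comma list}} template or MediaWiki transclusion. The synonym data and the names SYNONYM_SETS, comma_list and synonyms_line are hypothetical. It assumes a synonym set is stored once (as a Wikisaurus page would store it) and rendered as a single comma-separated line that can be spliced under an existing sense without adding a new heading.

```python
# Hypothetical shared synonym sets, stored once per set name
# (standing in for a page such as Wikisaurus:anger).
SYNONYM_SETS = {
    "anger": ["anger", "rage", "fury", "wrath", "ire"],
}


def comma_list(set_name, exclude=None):
    """Join the members of one synonym set with commas,
    leaving out the word whose entry the list is shown on."""
    members = SYNONYM_SETS.get(set_name, [])
    return ", ".join(word for word in members if word != exclude)


def synonyms_line(set_name, headword, gloss):
    """Build one line for an entry's Synonyms section, tied to a single sense."""
    listed = comma_list(set_name, exclude=headword)
    return f"* ({gloss}): {listed}" if listed else ""


if __name__ == "__main__":
    print(synonyms_line("anger", "anger", "strong feeling of displeasure"))
    print(synonyms_line("anger", "rage", "violent anger"))
```

Because each line is tied to a gloss supplied by the entry, the same set can appear under different senses on different pages. The sketch does not address the point raised above that synonymy is not reliably transitive, so a shared set may still fit some members better than others.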
Thesaurus Flunky below holds my comment. Amina (sack36) 08:09, 27 May 2008 (UTC)[reply]

At Talk:d'oh, a user mentioned that "annoyed grunt" is how that entry term is indicated in the closed caption. This would appear to be a excellent possible source for some other non-verbal entries. How should such information be shown in an entry? Does anyone know of sources of information about how the meanings of meaningful non-verbal sounds are communicated to the hearing-impaired? DCDuring TALK 11:02, 24 May 2008 (UTC)[reply]

Subtitles in DVDs are also an excellent "permanent" publication record for this sort of thing. However, one has to be aware that both closed captioning and subtitles are prone to errors. I once watched an episode of "Seinfeld" while out with friends. (Note: this was one of only two episodes I ever watched at all because I can't stand the program). The television on the wall was muted for sound and the closed captioning was turned on. As the program went on, the captioning became more and more garbled, until it was a string of nonsense characters and symbols. As far as I can tell, some closed captioning is recorded ahead of broadcast, but other times is typed "live" as someone hears the program and types what they think they heard, with all the potential for errors that implies. I have seen problems in subtitles as well, such as typing the original script rather than the recorded performance, or the editing out of content. There is a principle among those who write subtitles for translated films that the subtitles should not exceed a certain length, in order to permit them to be read by the audience. This principle is sometimes applied to transcribed subtitles as well. --EncycloPetey 13:51, 24 May 2008 (UTC)[reply]
Live closed captioning is usually done with automatic speech recognition software, which feeds directly into the captioning encoder i.e. with no way for a human to edit. Some TV productions take the time to take the script, revise it to reflect the actual recorded program, and then feed the edited version into the caption encoder. But quite a few just shove the tape into the automatic software, and use the unedited results. Clearly they don't care; they are just meeting the requirement via the broadcasters that material be closed captioned. This no doubt explains the Seinfeld episode; there was some problem with the audio, but they only checked the beginning, or didn't bother checking it at all. Robert Ullmann 14:03, 24 May 2008 (UTC)[reply]
I hadn't heard of automatic closed captioning, but the manual work varies widely in quality, apparently a great deal of it being at the low end of the scale (there are also subtitles and audio description which may be useful references). More info at Joe Clark and Captioning Sucks. Michael Z. 2008-05-24 14:51 z
Glad I asked. Thanks for the assessments, cautions, and links. "d'oh" may be a bit of a special case because it is the signature catch-phrase of the show. DCDuring TALK 17:52, 24 May 2008 (UTC)[reply]
Given the volume of Simpsons-related merchandise, I'd be surprized if several printed sources weren't readily available in support. --EncycloPetey 19:17, 24 May 2008 (UTC)[reply]
The written "D'oh" appeared in the show itself, in the episode about how Lisa got her saxophone; Homer accidentally says "d'oh" while reciting the inscription for the sax, and it ends up in there. If anyone can track that down, it's citeable in-canon. —RuakhTALK 02:08, 25 May 2008 (UTC)[reply]
I was interested in transcripts generally as a possible source of the meaning of non-verbal expressions, such as "d'oh". If the transcript or script says that is supposed to be "annoyed grunt", that is meaningful, but not a definitive statement of how viewers interpret and use the expression. Non-verbal expressions are different from words, in that the words carry most of the full load of meaning, but these non-verbal expressions do not. Some, but not actors, would say that words "speak for themselves". I don't think that is as true of such expressions. That is why I was hoping that "annoyed grunt" and other clarifying statements of the script-writer's intent would be available for a range of non-verbal expressions. Unfortunately, that does not seem likely to be the case. DCDuring TALK 04:02, 25 May 2008 (UTC)[reply]
It's normally put as (ANNOYED SOUND) or (ANNOYED GRUNT) in relay calls, if that helps anyone. --Neskaya talk 00:18, 26 May 2008 (UTC)[reply]
Is that done to avoid special-character issues with d'oh or for other reasons? DCDuring TALK 01:24, 26 May 2008 (UTC)[reply]
Partially to avoid special-character issues, I would assume. However, d'oh could be portrayed as d oh which is usually how things like "I'm" are portrayed (i m). I never actually thought about the particular reasoning issues there. Do closed-captioning services have apostrophes, though? I have never seen apostrophes used in the captions for films at school or anything. --Neskaya talk 00:11, 27 May 2008 (UTC)[reply]

Thesaurus flunky

I'm looking for a word or words to work on and an area where that should be done. I finally got my User page up so you can check that if you need to.

I didn't know if the wiktionary method of proceeding was the same as the wikisaurus method was the same as the wikipedia method. I've begun the process of reading my welcome but dry stuff takes me a while. I will get through it, though. Amina (sack36) 07:36, 27 May 2008 (UTC)[reply]

I wrote this before I had read the relevant section here. I have now done that. The one thing that Wikisaurus would have that a synonym spot in wiktionary doesn't have is the ability to go beyond the mind set that other people have. It drives me crazy not having a Roget setup in a thesaurus. What if all you can think of is slingshot and what you really want is the medieval siege weapon, mangonel? With a Roget's you can get to it eventually. With the dictionary type of thesaurus you have slim chance to none.
So. I'm willing to volunteer the time to clean up as much as possible and work on valid words. I don't make this offer lightly. I know there will be a load of hard work and long hours. I still want to give it a shot. If I haven't made a difference in a month, I'll cry crocodile tears while you delete or merge.
Go for it! We've all kind of given up with it, as it's been a mess for too long. Be as bold as you like, and I look forward to seeing the results. Conrad.Irwin 12:16, 27 May 2008 (UTC)[reply]
Agreed. While I have grown so used to the low quality current state of Wikisaurus that I remain skeptical of it becoming a useful project, I have to admit that there is always a chance that someone can turn it around into something beautiful. The best of luck to you. -Atelaes λάλει ἐμοί 18:09, 27 May 2008 (UTC)[reply]
Cool. Go for it. There could be real value in having a thesaurus-style presentation of the data already in Wiktionary, judging from the number of requests for synonyms and from Feedback. A thesaurus could also help us identify missing senses of words that are in Wiktionary. DCDuring TALK 18:14, 27 May 2008 (UTC)[reply]
Under ws|improvements Preview, I have put together a plan of action for cleaning up and making manageable the wikisaurus project. Could you read through and give me your impressions? Amina (sack36) 02:26, 29 May 2008 (UTC)[reply]
I assume you mean: Wiktionary:Wikisaurus/improvements? --EncycloPetey 03:53, 29 May 2008 (UTC)[reply]
Um... yeah. Sorry about that. Amina (sack36) 04:19, 29 May 2008 (UTC)[reply]
Could I propose a change to the header of Wikisaurus? Its purpose is to concatenate the introduction and allow the beginning available words to show on the page without scrolling. You can see the proposal at Template_talk:saurus-head Amina (sack36) 00:27, 30 May 2008 (UTC)[reply]

What brought me back

I just stumbled onto a TED talk that calls for more amateur lexicography (word-hunting, with context). It inspired me to spend more time at this site. I thought others might be interested in the video. --Polyparadigm 01:33, 29 May 2008 (UTC)[reply]

It is a cool presentation. It was fun to listen to it again. It would be interesting to figure out how we can be less traffic-cop-like and not be over-run with vandalism. Erin's message is very positive and inspiring. DCDuring TALK 03:49, 29 May 2008 (UTC)[reply]

Scholarly hypersensitivity or sophistication

"False cognate" is certainly NOT an accurate description of these terms. False cognate is not the same as "not cognate". A false cognate implies that there is or was a significant number of people who believed the words were cognates. Examples of false cognates are English dog and Mbabaram dog; English mama and Quechua mama. I don’t believe anybody thinks 매다 is related to English mow, or that -다 is related to English do. They are not false cognates, they simply are not cognate. To label them false cognates means that there is a group of people who think, or once thought, that they were cognate. —Stephen 18:45, 29 May 2008 (UTC)

-- Quoted from User talk:KYPark#Korean "false cognates"

``I don't believe anybody thinks 매다 is related to English mow, or that -다 is related to English do.``

I don't believe anybody thinks the mistake involved in the above quote very seriously, but hypersensitive scholars may do. In what ways?

Korean verbs and adjectives end with -다 (-da, “-da”), which is thus similar, analogous, equivalent, or related (just functionally, hence NOT in Stephen's genetic sense) to the French ending /-er/, the Germanic /-en/, AND the English eccentric preposition /to/ but NOT "do" as Stephen mentioned. So granted, the French and Germanic postpositional endings are more equivalent to the Korean counterpart than the English prepositional /to/. Is such comparison entirely meaningless from the point of view of w:general linguistics, w:comparative linguistics, w:universal grammar, or just from the popular point of view of curiosity? I don't believe anybody thinks it is so.

Dutch doen, English do, and German tun are equivalent to Korean 하다 (hada, “ha-da”). Meanwhile, they are cognate with Greek θέτω (théto, to put, to set, to place), Lithuanian deti (to put), Czech diti (to hide), Polish dziać (to happen), Russian деть (detʹ, to put, to place), etc. In this archaic sense, they are equivalent to Korean 두다 (duda, “du-da”). Hence a striking equivalence, both semantic and phonetic, with English. Whoever reads this, including Stephen and Ivan, would never ever forget Korean 두다 forever hereafter. What a mnemonics!

On the basis of the above discussion, sophisticated or hypersensitive scholars could, would, or should set up a w:straw man or more to stand for, or in the image of, me, namely, cognate, false cognate, not cognate, etymology, IE-Korean nexus, IE-Korean genetic, Ural-Altaic, or whatever sophistication.

Wikimedia in general aims to serve for the average readers or public in general rather than sophisticated scholars. Then, the dichotomy of "false cognate" and "not cognate" would be too sophisticated for them, but that of "true cognate" or "false cognate" would do. Yeah, it is surely possible to define "false cognate" as such as Stephen, in contrast with "true cognate" AND "not cognate," especially when needed to set up a straw man.

In corollary, all scholarly arguments are more or less of such dirty arguments. Frankly, I am supposed to practice some in dirty discussions, while almost unconsciously attempting to reduce them to a minimum, which I call unconscious morality in action. Consciously and unconsciously, we ask or just wish others not to wield such dirty tactics at will, at random. Throughout the BP talk I have had now and then, I've been getting more and more skeptical how they behave themselves in the context of Wikipedia. To me, however, it is NOT a great surprise, as I understand such is the usual scholarly behavior. At this point, I fear, Ivan would like to set up a straw man. No, he has already done and attacked it near the end of User talk:KYPark#Korean "false_cognates". --KYPark 03:26, 30 May 2008 (UTC)[reply]

See also: #Can someone else please be the bad guy
To begin with, I must simply say that I am not, for a single second, buying the mnemonics argument. KYPark has a very long history of pushing a genetic relationship between Korean and IE languages. That he has switched to calling them mnemonics after the community made it abundantly clear that they were not interested in such etymologies is beyond suspicious. What bothers me most about all of this is that so much time of valuable editors such as Stephen, Visviva, and Ivan (who could be writing real etymologies) has been wasted trying to reason with, clean up after, and debate minor semantics with KYPark. So let me address KYPark very simply. The community has made it abundantly clear that we have no wish to see any comparisons between Korean and IE languages, not as etymologies, not as mnemonics, not at all. I am tired of debating this issue. If you write anymore about comparisons between Korean and IE words you will be blocked for successively longer periods of time. I very ardently plead with you to not do so, because I genuinely have no wish to lose you, as, aside from your Korean/IE hypothesis, you are an excellent editor, and you make very good Korean entries. However, I am unwilling to devote further time to this debate. -Atelaes λάλει ἐμοί 04:37, 30 May 2008 (UTC)[reply]
Seconded. -- Visviva 04:40, 30 May 2008 (UTC)[reply]
To begin with, I say that Atelaes set up still another w:straw man for me, called mnemonics, so as to cut it down, as I've expected as his or their usual tactics. No wonder at all.
The best of such tactics makes best use of w:contextomy or w:quoting out of context, say, simply "mnemonics" out of "a very long history of pushing a genetic relationship between Korean and IE languages."
I once said to Visviva, who seconded Atelaes a while ago, that the variety of uses of my, hot if you like, Korean-English comparison is rather undefinable, including associationist memory, for which Visviva specifically reminded me of the shorter "mnemonics" that he often attempts himself.
He happened to take 매다 "to weed," for example, with which he associates "hawk" brightly. In contrast or response, I associated (or compared) English mow, Dutch maaien, etc., instead, at least for far better mnemonics. There was no sense or hint of "genetic relation" here, though I myself was quite surprised by the semantic as well as phonetic similarity. It was such an accident indeed.
It was Atelaes that invited me to the BP again and again. But I refused, because I know it is not the place for me to argue as if I were a scholar I hate. Read and edit is all I like doing here.
At last he initiated #Can someone else please be the bad guy. It is very clear from the title that he attempted a wicked personal attack on me! In advance, such a symptom was surely realized from a few trivial encounters.
For the first time, I edited the Citations page of witch, while feeling like being made a witch, so most likely reflecting my miserable feeling on the note. If I am not on the right track, the only thing the admin like Atelaes should do is just to advise me that my way is not the right way! I presume.
ANYWAY DEFINITELY, I was not attempting any Korean-European genetic claim. Had I been so attempting, I should have mentioned Korean (bit, “bich, bit”) as a cognate with witch, as I do guess! Then, Atelaes would have had every reason for attacking me personally according to their agenda.
I feel like Atelaes being trapped by me. Or, in effect, he trapped himself, or he was trapped anyway, and he tries to escape from it desperately. I see it, as anyone sees it. If NOT, he should not make such a biiig ultimatum, say, block me. Go ahead and block me if you like. I am not afraid of it at all! But never forget that that might be the end of Wiktionary! Don't guess as if I were joking! You can do anything, however harmful it may be to Wiktionary.
I could, would and should not say everything here. I'm just sorry. Remember this is not the end. --KYPark 10:32, 30 May 2008 (UTC)[reply]
Everybody take a deep breath :) KYPark, Wiktionary is based on consensus. If you believe that your approach is correct, you may submit a proposal at Wiktionary:Votes. If you do not receive enough supporting votes, then you should abide by the wishes of the Wiktionary community. Before you consider such an option, be aware that you will most likely be outvoted (based on reading the above). Please carefully write your proposal in clear and concise English. -- A-cai 12:18, 30 May 2008 (UTC)[reply]
I always thank you for everything indeed. But things may not be so simple as you may think. I need to know the way they think, not yours. --KYPark 12:49, 30 May 2008 (UTC)[reply]

Suggestion from OTRS mail

>  I would just like to make a tiny suggestion that would help make your
> website better. I would like to point out that your quick reference
> dictionary does not have any syllable breaks that can help in the
> pronunciation of some words. If you could put syllable breaks on newer
> articles, later posted on your website, it would really help. Thank you!!
I said I'd relay the suggestion to the Wiktionary team, so... here ya go. :) ~Kylu (u|t) 03:56, 30 May 2008 (UTC)[reply]
I second the suggestion positively. For Wiktionary should remain evolutionary. --KYPark 04:17, 30 May 2008 (UTC)[reply]
The suggestion is not specific enough to act upon. If the suggestion is about pronunciations, then in which languages? For some languages, there are no syllable breaks in the pronunciation, and for many languages they are very difficult to place with certainty. If the suggestion is about hyphenation, then again which languages? For English, hyphenation syllable breaks differ between the US and UK for some words, and I do not know how to reliably present these differences, because I don't know the UK algorithm for determining where to hyphenate. --EncycloPetey 04:22, 30 May 2008 (UTC)[reply]
They use math for that? Now I'm scared. --Neskaya talk 07:03, 30 May 2008 (UTC) (Argh I didn't sign in.)[reply]
(Actually, we use maths.) Widsith 10:36, 30 May 2008 (UTC)[reply]
Now I'm more scared. As if one wasn't enough of that. --Neskaya talk 06:19, 31 May 2008 (UTC)[reply]

Dude, I gotta jump in on this one. It's a good suggestion that I have mentioned in individual conversations before. I'm taking the user's request to mean a desire for hyphenation. This is actually the utility I require most from a dictionary in my profession as a sheet music arranger, and for all my affinity for wiktionary, I have to take my business elsewhere for that information. True, it's only useful for very specific applications (one of which happens to be my daily work), and self-hyphenating word processors have removed the necessity of this information from all would-be amateur publishers, but I think it's a worthy addition to our format. Especially since there are US/UK differences, it's then all the more important that this info is ours alone to share. Other sources certainly don't make that valuable effort. I once thought it would be cool to break apart the headword with little dots showing hyphenation, but maybe this goes better in the Pronunciation header. On a slightly larger topic, I would love a standardized pronunciation section, maybe beginning with Hyphenation: Head•word, followed by Simple Pronunciation: hed-wurd, then IPA, SAMPA, Rhymes... Whaddaboutit? -- Thisis0 19:27, 30 May 2008 (UTC)[reply]

How about a hyphenations header when needed? The hyphenation doesn't really impact the pronunciation, or the spelling, it's a separate kind of property. RJFJR 19:31, 30 May 2008 (UTC)[reply]
The pronunciation section, if fully tricked out with content in table form and a Homophone L4 header, takes up 30-40% of the highly valuable "above-the-fold" space on the first page without offering intelligible or usable context to most non-expert users. Adding another space-burning header strikes me as squandering prime real estate.
OTOH, I would think that syllabification (with stress shown) would be more valuable to most users than what we now provide under the pronunciation header. It could actually appear routinely on the inflection line if we chose, thus offering more value to most users than the existing pronunciation section at no or little cost in above-the-fold real estate. DCDuring TALK 19:46, 30 May 2008 (UTC)[reply]
Syllabification could be useful, we already have {{hyphenation}} which is used in the pronunciation section. I think we should have a {{syllables}} or {{syllabification}} template which does pretty much the same thing, but obviously is for syllables not hyphenation - although similar it must always be stressed that they are not the same. There's the slight issue with the UK-US variations, but this is not a big problem - we seem to have managed alright with the rest of the pronunciation section that must be split thusly. There is a Hyphenation algorithm which is fairly good, but it is always possible to find exceptions so any automated task to add them would have to be carefully supervised. I'm not sure whether anyone has attempted to write a syllabification algorithm, but I suspect it'd suffer from the same problem. On a similar, yet very different theme - {{collate as}} now exists to help with the shiny new Project - Text Processing Information, the discussion that took place on the Grease Pit could probably do with some input from the Beer Parlour. Conrad.Irwin 21:40, 30 May 2008 (UTC)[reply]
Conrad, the pronunciation is already set up to include syllable breaks; we do not need another new template for that. And in any case it CANNOT be used with the English spelling of the word, because the pronounced syllable breaks sometimes come in the middle of letters. Consider exactly, which hyphenates as Hyphenation: ex‧act‧ly, but for which the syllable breaks occur as /ɛk.sækt.li/. That's one very important reason why IPA or some other transcription is used for pronunciations; you can't put the syllable breaks in using the usual English orthography. And they are only sometimes similar. Even for words that are hyphenated the same in the US and UK, the spoken syllable breaks may be in different locations and cannot be predicted from the spelling alone. --EncycloPetey 21:55, 30 May 2008 (UTC)[reply]
Ok, as a complete ignoramus in the pronunciation section I was jumping to conclusions from what was said above. Where is the syllabification information stored? It doesn't seem to be in the pronunciation section at pronunciation or hyphenation - though maybe that's just because I can't read IPA well enough. Interestingly enough Wiktionary:Pronunciation says we do include it, but I can't decipher where or how. I'd disagree with cannot be done algorithmically, but am happy to admit that the Google search isn't that promising. Conrad.Irwin 22:13, 30 May 2008 (UTC)[reply]
The little period-looking things, the stress mark, and secondary stress all indicate syllable breaks. You can see them in the examples of pronunciation of exactly that I gave above. These are not consistently marked in all languages, though, because some words and some languages (like Czech, IIRC, or possibly French) do not have any discernable syllable breaks in the spoken forms. For other languages, especially in East Asia, this information is often trivial because the characters are themselves syllables. --EncycloPetey 22:38, 30 May 2008 (UTC)[reply]
In the pronunciation string, syllable breaks are shown as periods.
Except that it's not the same character as the period; it's a symbol in the IPA character set. --EncycloPetey 22:43, 30 May 2008 (UTC)[reply]
IPA syllable breaks are often left out, but perhaps we should encourage them to always be explicitly entered. (I believe the IPA symbol is a regular period U+002E, no?) Details at IPA#Suprasegmentals.
But explicitly entering them is also a problem. Many words do not have an explicit syllable break, and placing the syllable break in the pronunciation is therefore misleading in those cases. In English, the placement of syllable breaks also varies widely with dialect. I have seen thus far only one dictionary that recognizes this fact (the Cambridge English Pronouncing Dictionary), but even this volume doesn't capture the whole range of variation. It presents only the variation present in the Received Pronunciation and a well-enunciated General American accent. --EncycloPetey 23:59, 30 May 2008 (UTC)[reply]
Wow. But this is still soluble by showing multiple dialects, and multiple pronunciations per dialect. We already do that. The trick is knowing when to leave out a syllable break, yes? Michael Z. 2008-05-31 02:01 z
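To make the convention discussed above concrete (periods, primary stress marks, and secondary stress marks each opening a new syllable in an IPA transcription), here is a minimal sketch. The function name `ipa_syllables` is made up for illustration, and the code assumes the transcription has the breaks marked explicitly, which, as noted above, is often not the case.

```python
import re

def ipa_syllables(ipa):
    """Split an IPA transcription at explicitly marked syllable boundaries.

    Boundaries are assumed to be the syllable-break period (.), the
    primary stress mark (ˈ) and the secondary stress mark (ˌ).
    Unmarked breaks cannot be recovered this way.
    """
    core = ipa.strip("/[]")            # drop the enclosing slashes or brackets
    parts = re.split(r"[.ˈˌ]", core)   # each marker starts a new syllable
    return [p for p in parts if p]     # discard empty pieces (e.g. a leading stress mark)

print(ipa_syllables("/ɛk.sækt.li/"))   # ['ɛk', 'sækt', 'li']
```

Note that the spoken pieces (ɛk, sækt, li) do not line up with the written hyphenation ex‧act‧ly, which is the point made earlier in the thread.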
I really wonder how linguists go about syllabification. English speakers tend to break words apart the way that they think of them, but not the way that they sound. The example given is a perfect illustration. Mentally "exact" is the root and we break the word there, but in the pronunciation /ɛɡzæktli/ it makes a lot more sense orally to tie the t with the l as part of the same syllable. Scream the word at the top of your lungs and you'll see how awkward it is to break them apart. It also doesn't really matter all that much how syllables divide a single word because the consonant clusters at the end of a word can attach themselves to the next word. Consider sand eel = sænd + iːl where the syllables span words since sandy = sæn.di. Discernable syllable breaks are probably far fewer in number and far less important than most of us suspect. 75.54.80.198 07:28, 4 June 2008 (UTC)[reply]

Hyphenation isn't only British/American, it is also a matter of style. It appears that some dictionaries provide hyphenation mechanically at every possible break, relying on the editor to use discretion (or not), while others recommend better hyphenation breaks.

For example, the NOAD indicates every possible location, while the CanOD is more conservative, and explains some of the reasons for its recommendations in the frontmatter—for example "plane-tary" would not be smooth reading. (Note that the place of hyphenation sometimes varies between the two, and I have no idea if the CanOD's corresponds to British usage, or is a Canadian style, or is based on the corpus.)

Hopefully any automated algorithm would at least avoid breaks like "throw in some fresh shit-ake mushrooms".

NOAD plan·e·tar·y fore·fin·ger fin·ger·nail gov·ern·ment self-gov·ern·ment
CanOD plan·et·ary fore·finger finger·nail gov·ern·ment self-govern·ment

We need a convention for indicating the source or style of hyphenation beyond just British/American. Do we cite a source? Michael Z. 2008-05-30 23:30 z

Yikes! CanOD: an·alogy, an·aly·sand, an·aly·sis. NOAD: a·nal·o·gy, a·nal·y·sand, a·nal·y·sis! NOAD also allows ex-acting, coin-cidence, read-just, leg-ends. Shouldn't a quality hyphenation guide omit such breaks which are more than awkward? Or should we report everything recommended by some reference? Michael Z. 2008-05-30 23:54 z
All that the latest portion of this thread shows is that NOAD sucks balls at hyphenation and should never be mentioned in a discussion about it. -- Thisis0 00:21, 31 May 2008 (UTC)[reply]
It seems possible to me that NOAD haven't checked their automated hyphenations; the four mistakes Mzajac listed above (co-in-ci-dence, ex-act-ing, leg-ends, read-just) are replicated by the Knuth-Liang algorithm (though it does a better job than EP's example and the top of Anarchy of Pedantry). From reading about hyphenation it seems that we can say whatever we like about it, we are just providing a guide - there is no strict right or wrong (beyond common sense and aesthetic judgment). Obviously that poses problems in the "No, I'm right" style of wiki-dispute, however given the low importance of this I'm hoping people won't get too worked up about them. Conrad.Irwin 00:38, 31 May 2008 (UTC)[reply]
Related reading: On Hyphenation - Anarchy of Pedantry. Michael Z. 2008-05-30 23:56 z
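For readers unfamiliar with how the Knuth-Liang approach mentioned above works, here is a rough sketch of the pattern-matching step. Each pattern is a run of letters with digits between them; every pattern that occurs in the word contributes its digits, the highest digit at each position wins, and an odd result permits a break. The pattern list below is a small illustrative set for one word only, not a real hyphenation dictionary, and the names `parse_pattern`, `hyphenate`, `left_min` and `right_min` are assumptions of this sketch.

```python
import re

def parse_pattern(pat):
    """Turn a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pat)
    scores = [0] * (len(letters) + 1)   # one score per gap, including both ends
    gap = 0
    for ch in pat:
        if ch.isdigit():
            scores[gap] = int(ch)       # digit applies to the gap before the next letter
        else:
            gap += 1
    return letters, scores

def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word` between the break points the patterns allow."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."   # '.' marks the word edges, as in Liang's scheme
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            pat_scores = table.get(padded[start:end])
            if pat_scores:
                for k, s in enumerate(pat_scores):
                    scores[start + k] = max(scores[start + k], s)
    pieces, last = [], 0
    # The gap before word[j] is scores[j + 1] because of the leading '.'.
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:      # odd score: a break is allowed here
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces

# Illustrative patterns only; a real pattern file has thousands of entries.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", PATTERNS)))   # hy-phen-ation
```

Because the method only knows letter patterns, it has no notion of meaning, which is consistent with the kind of awkward breaks (leg-ends, read-just) noted above.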
Ultimately, hyphenation is about making printed text look better. Many dictionaries simply present all the possible places one could conceivably hyphenate, even to including hyphenation breaks that no sane publisher would ever use. Consider this humorous example:
American families in the 1950s empathized with Lucy and Ricky Ricard-
o. Each week, millions turned in to watch the antics of Lucy and friend E-
thel Mertz.
No sane publisher would hyphenate this way, but if one follows the kind of hyphenation advice given in some dictionaries, then this could be the result. ---EncycloPetey 00:11, 31 May 2008 (UTC)[reply]
All my sources would unanimously say Eth-yl and Ri-car-do. -- Thisis0 00:47, 31 May 2008 (UTC)[reply]
Admittedly I fabricated a quick example, but the implication stands. How would your sources hyphenate mighty? --EncycloPetey 20:43, 31 May 2008 (UTC)[reply]

I'm convinced that my daily use of hyphenation is much more esoteric than I had assumed. I require (and publish) hyphenation for ev-'ry sung syl-la-ble in vo-cal mu-sic. I suppose in light of this discussion, this is somewhat a combination of syllabification and hyphenation. However, I have found most reputable dictionaries provide standardized hyphenation of every syllable, seemingly for my industry alone. (Anyone know any other widespread applications? Aiding pronunciation?) Standardized hyphenation by syllable becomes more difficult as I publish Italian and Spanish songs (which I have hyphenated as best as I can, usually with each syllable beginning with its consonant). English does have standardized hyphenation -- usually to avoid awkward or misleading word fragments, especially those that would lead one to pronounce a different vowel sound before getting to the rest of the word. Properly hyphenated "mu-sic" does not keep the letters of its component "muse" together; also it avoids the desire to begin to pronounce "muh" if one saw only "mu-" at the end of a line. "Wom-an" avoids a momentary "woh" pronunciation, and seems to deemphasize the subordinate relationship to "man". "Vo-cal" avoids a preemptive "vawk" sound when reading. Anyway, I still think this is worth the effort (to start using the hyphenation template, I guess -- which I didn't know existed) especially if we can delineate some of the US/UK specifics as well as discover standardized hyphenation for other languages. -- Thisis0 00:41, 31 May 2008 (UTC)[reply]

Does the hyphenation template have any provision for indicating stressed syllables? Could it be made to do so? I realize that this is a simplistic approach to pronunciation, but, then again, so many of us ordinary users are simple people. Does it pay to show multiple kinds of hyphenation (and stress)? DCDuring TALK 01:18, 31 May 2008 (UTC)[reply]
Inspection suggests that nothing would prevent stresses from being added with a character like "'", presumably at the beginning of the stressed pseudo-syllable (pace EP). DCDuring TALK 01:26, 31 May 2008 (UTC)[reply]
Excuse me, but hyphenation has nothing to do with stress or pronunciation. Stress should never be marked in the hyphenation template. --EncycloPetey 20:39, 31 May 2008 (UTC)[reply]
I get it now. Obviously, music publishing has different needs than prose, and this is where guides like NOAD's make sense. Even for prose writers or typesetters, it may be useful to know every place where a word can be hyphenated, for extreme cases or art. But it would be beneficial to also have a practical guide for non-expert writers.
So how do we annotate all of this? Michael Z. 2008-05-31 01:57 z

Hyphenation example

Let's see what an extreme example of hyphenation can look like. Can anyone add more permutations from other dictionaries, or other English styles? Or add a different word if you know of a more diverse example. Michael Z. 2008-05-31 02:14 z

  • Hyphenation (US, NOAD): a‧nal‧y‧sand
  • Hyphenation (US, MW3): anal‧y‧sand
  • Hyphenation (Canada, CanOD): an‧aly‧sand

Or a functional description:

  • Hyphenation (US, every break): a‧nal‧y‧sand
  • Hyphenation (Canada, for prose): an‧aly‧sand
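As a rough illustration of the annotated lines listed above, the following sketch builds one such line from a region label, a source or style label, and a list of word pieces. The function name `format_hyphenation` and its parameters are hypothetical and do not describe how the existing hyphenation template works; the separator is assumed to be U+2027 (hyphenation point), as in the examples.

```python
HYPHENATION_POINT = "\u2027"   # the dot shown between pieces in the examples above

def format_hyphenation(region, label, pieces):
    """Build a line like 'Hyphenation (US, NOAD): a‧nal‧y‧sand'.

    `label` may name a source (NOAD, CanOD) or describe a style
    ("every break", "for prose"); both usages appear in the lists above.
    """
    return f"Hyphenation ({region}, {label}): {HYPHENATION_POINT.join(pieces)}"

print(format_hyphenation("US", "NOAD", ["a", "nal", "y", "sand"]))
print(format_hyphenation("Canada", "for prose", ["an", "aly", "sand"]))
```

Keeping the region and the source or style as separate fields would let the same word carry several competing hyphenations side by side.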

With stress shown:

  • Hyphenation (US, MW3): a:nal‧y‧:sand
    ( " : " is an approximation of MW3 notation for could get primary stress in some pronunciations or unstressed in pronunciations stressing the other so marked. " ' " is an approximation for their stress marker)
Stress should never be included in hyphenation, because it is unrelated to the function of hyphenation. Stress is peculiar to spoken language; hyphenation is peculiar to written language. --EncycloPetey 20:41, 31 May 2008 (UTC)[reply]
If there were a way to finesse it, I would have been willing to argue with you, EP, but the messiness of the very first case suggested that it would not be simple. I will continue to look for ways to get naive new users to get pronunciation benefit from Wiktionary. DCDuring TALK 20:48, 31 May 2008 (UTC)[reply]

Is there a policy or practice about whether such entries should be included? This would apply to uncountable noun senses for many, many entries. In the case of paper one would need scrap of paper, sheet of paper, ream of paper, roll of paper, stack of paper, etc. If the phrases were in the entry for paper they would be findable by search. Though the entry for paper could stand to have a list of such ways of achieving countability for quantities of paper, I see much less value in the individual entries. They are certainly related terms, but might warrant a separate rel table. DCDuring TALK 20:45, 30 May 2008 (UTC)[reply]

Piece of paper might merit an entry, as it is slightly idiomatic (when someone asks for a piece of paper, they're not asking for a piece, but a sheet). The rest should not be included anywhere, unless they have similar idiomaticity (if anyone knows a real word to replace idiomaticity in this sentence, please feel free to do so). -Atelaes λάλει ἐμοί 21:00, 30 May 2008 (UTC)[reply]
They might be asking for a "scrap" of paper or even a memorandum or an index card or a post-it note. It seems to have much more to do with the situation than the words. I didn't really think that we could handle much of that kind of context dependence. DCDuring TALK 21:11, 30 May 2008 (UTC)[reply]
Hmmmm...that's a good point. -Atelaes λάλει ἐμοί 21:17, 30 May 2008 (UTC)[reply]
Is there a policy or practice in this regard? DCDuring TALK 21:07, 30 May 2008 (UTC)[reply]
Well, my understanding about the related terms bit (if that's what you're asking about) is that terms only merit entry in the related terms section if they merit their own entry. This doesn't mean that they actually have their own entry at the time, but it is conceived that they would eventually. -Atelaes λάλει ἐμοί 21:17, 30 May 2008 (UTC)[reply]
I'm with Atelaes, piece of paper is idiomatic, the rest are SoP. However as they are all reasonably common I wouldn't complain if the entries existed. It might be better to have these as some sort of extended usage note at paper, as opposed to a "related terms" section, if we aren't treating them as terms and giving them entries. Conrad.Irwin 21:28, 30 May 2008 (UTC)[reply]
Maybe I ought to take a look at a few of the uncountable words and determine whether there is anything of interest for any subset of them. Many achieve countability by generic means: "item of", "instance of", "case (situation) of", "[container] of", "[measure] of". Some may be like paper, having more idiosyncratic units. I am not sure that I see the idiomaticity of piece of paper and sheet of paper, as opposed to paper having its own particular units. DCDuring TALK 21:44, 30 May 2008 (UTC)[reply]
Isn't "piece of" one of the most generic ways of making countable an uncountable? DCDuring TALK 21:47, 30 May 2008 (UTC)[reply]
Right. And if you doubt it's idiomatic, you're welcome to prune the list. (Just be sure to comment it.) And if you disagree with that judgement, create the entry rather than putting comments in the list like <!--this is idiomatic--> which no one will ever see anyways. 75.54.80.198 06:47, 4 June 2008 (UTC)[reply]
It might be worth looking at how this is handled in our Japanese entries, since the phenomenon is more widespread in Japanese. In English, it's rare (piece/sheet of paper, head of cattle, pair of jeans) but it's the norm (as I understand it) in Japanese. --EncycloPetey 21:49, 30 May 2008 (UTC)[reply]
Well then I guess "piece of" anything would be out in English, per DCDuring's comment, and we would somewhere have a table of measure words for uncountable nouns like paper. How long would that list be, in scrolls and/or yards? And would it explain any of the terms? I rather like the current solution since it's not clear that "piece" means a sheet and "scrap" means a piece. Of course the problem is that we then have to get into these tedious debates in order to have something like bar of soap or sheet of paper deleted as nothing more than sum of parts. 75.54.80.198 06:47, 4 June 2008 (UTC)[reply]

Language specific help templates

This topic involves both general and technical issues, but I hope the tech junkies who frequent the GP will still read it. I recently created {{attention}}, which is intended to mirror the functionality of templates such as {{la-attention}}, {{zh-attention}}, but work for all languages. Basically, the template is inserted into an entry (with the language code as the first parameter) and places it in a category for people with specific knowledge of that language to look over and, if necessary, clean up. In addition to putting, say, Latin entries in the Latin cleanup category, this can also be used in entries of any language which need attention from an editor with capabilities in a different language. So, for example, if an English word comes from Ancient Greek, but the person writing the etymology has limited Ancient Greek skills, they can put {{attention|grc}} in the entry, and I'll see it and look over the entry. In addition to simply advertising the existence of this template, I am hereby proposing that {{rfscript}} be deprecated in favour of this template. The problem with {{rfscript}} is that some scripts, such as Cyrillic and Arabic are used in so many languages that it is unlikely for any editor to know enough about every language which uses that script to be able to adequately respond. Just because a person knows Hindi, does not mean that they will know Sanskrit or Marathi. Just because a person knows Hebrew, does not mean they know Aramaic or Yiddish. Granted, we are fortunate enough to have Stephen working for us, who seems to know every language ever used, and so the rfscript method has worked reasonably well up to this point. However, I managed to stump even him with a Pashto request at πάρδαλις (párdalis) (although, I do believe that's the first time I've ever managed to do that :-)). 
The template currently puts the entry in the category [[Category:XXX words needing attention]], but I think it might be prudent to switch it to [[Category:Words needing XXX attention]], which makes more sense when put into foreign language entries. Thoughts? -Atelaes λάλει ἐμοί 22:44, 30 May 2008 (UTC)[reply]

Actually, I like the current setup. I find that it's easy to spot an out-of-language entry in the category and understand that there's a translation or etymology in need of care (such as when cutify turned up in Category:Latin words needing attention). I only see a potential problem for categories that are not being maintained and so grow out of control (like Category:Japanese words needing attention). In those cases, further subdivisions or something might be useful. Personally, I prefer seeing the language identified up front in the category name. --EncycloPetey 22:51, 30 May 2008 (UTC)[reply]
I also like the subdivision. I think more specificity is good in this case- the 'Requests for scripts' can be subcategories of the 'words needing attention'. If there are many words needing general attention then it becomes hard to find specific problems (like scripts). Just because someone knows the script doesn't mean they know the language. Nadando 22:54, 30 May 2008 (UTC)[reply]
I agree that something needs to be done. Although I know a lot of scripts, I try not to mess with languages that I don’t know much about. I do a lot of Arabic and can do some Persian when forced to, but Urdu really takes too much effort, and Pashto is worse. I check the request for Arabic script page regularly, but I don’t think anyone who works in a language other than Arabic does, so the Urdu goes unattended. It’s the same story with Cyrillic. I do a lot of Russian and some Bulgarian, but there are many languages that use Cyrillic and many more transliteration systems, so it can be very difficult to retransliterate words in unfamiliar languages. I think it would be nice to list requests for Arabic script work in a central location as it is done now, but also on a page for the specific language, and that page should include a link from the "Category:XXX language" page so that the relevant contributors can be made aware of it and can find it. —Stephen 08:06, 31 May 2008 (UTC)[reply]
I like the current setup; personally, I've never minded having to subvert it a bit when necessary (e.g. adding {{la-attention}} to cutify so a Latin-speaker — EncycloPetey, as it turns out — could fix the etymology, or adding {{rfscript|Cyrl}} to various entries that had the Cyrillic but needed someone — usually Stephen — to add a transliteration). However, I'm also quite fine with a setup that comes pre-subverted. :-) —RuakhTALK 15:08, 31 May 2008 (UTC)[reply]
Should we modify {{rfscript}} to accept two arguments? One could identify the script and categorize it accordingly, but the other could identify the specific language (when known) and add it to the appropriate language attention category. That way, Urdu entries show up in an Urdu-specific category as well. --EncycloPetey 20:35, 31 May 2008 (UTC)[reply]
I think that's a great idea. :-) —RuakhTALK 13:00, 1 June 2008 (UTC)[reply]
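A minimal sketch of the two-argument idea, written in Python rather than template syntax so the logic is easy to follow. The function name and the script category pattern are assumptions made for illustration; only the "XXX words needing attention" pattern comes from the discussion above.

```python
def rfscript_categories(script, lang=None):
    """Return the category names a two-argument script request might add.

    The "Requests for ... script" pattern is hypothetical; the
    "... words needing attention" pattern follows the naming quoted
    earlier in this thread.
    """
    # The script category is always added, as with a one-argument request.
    categories = [f"Requests for {script} script"]
    # The language category is added only when the language is known.
    if lang:
        categories.append(f"{lang} words needing attention")
    return categories
```

With both arguments, an Urdu request would land in the Arabic script category and in an Urdu-specific one; with only the script, it would behave as a one-argument request does.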

June 2008

Auto-categorization based on suffix

I would like to use a template to add entries to categories based on suffix. The suffix is mentioned in the etymology (e.g. in kertész -> kert + -ész). Would this be something other languages would use? For Hungarian words, it would be useful to build categories for words ending in the same suffix. --Panda10 12:25, 1 June 2008 (UTC)[reply]

We've tended to frown on those in the past, for several reasons. Some languages are so highly inflected, that it becomes difficult to know when there is a suffix and when it is just an inflectional ending. It has also been troublesome to settle on appropriate category names, since a "-" must necessarily appear in the middle of the category name. What we've tended to do instead is to create lists as appendices/indices instead. --EncycloPetey 16:17, 1 June 2008 (UTC)[reply]
An appendix would work for this purpose, and the categories I created so far are not that large, I can rework the entries. Thanks. --Panda10 16:51, 1 June 2008 (UTC)[reply]
I don't see why we can't have categories; after all, we have Category:English nouns ending in "-ism" and Category:German words ending in -nf. Nadando 17:43, 1 June 2008 (UTC)[reply]
Neither of those are suffix-based categories. The first is a list of religious / philosophical nouns that happen to end in "-ism", but this isn't always the result of an added suffix. The name went through considerable discussion, and this was the best compromise we came up with. --EncycloPetey 01:35, 3 June 2008 (UTC)[reply]
I created both an appendix and a template: Appendix:Hungarian words ending in -ász -ész and {{hu-suffix-ász}}. This template puts the word in the right category and allows easy standardization/changes in the future if needed. --Panda10 21:54, 1 June 2008 (UTC)[reply]
Excellent, the template-based approach is by far the best solution, though it still faces some resistance. The category thus constituted basically maintains itself (as long as the entries themselves are maintained). In contrast, walk away from the appendix for a year and see if it is still up to date. -- Visviva 08:56, 4 June 2008 (UTC)[reply]
Actually, I've just discovered that there is already a {{suffix}} template which could be updated to place the entry into a category. An additional (optional) parameter would be required to indicate the categorization. The category name could be standard: Category:<Language> words suffixed with -xxx (no double quotes). This was already brought up in the suffix template discussion page. I created a new {{hu-suffix}} template to do just that, but updating the {{suffix}} template would be a much better solution. --Panda10 10:45, 4 June 2008 (UTC)[reply]
Seems useful to me. DCDuring TALK 11:47, 4 June 2008 (UTC)[reply]
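A rough sketch of the optional categorization proposed for {{suffix}}, modelled in Python. The function names and the idea of passing the language as the optional parameter are assumptions; the category name pattern is the one suggested above.

```python
def suffix_category(language, suffix):
    """Build 'Category:<Language> words suffixed with -xxx' (no double quotes)."""
    suffix = suffix.lstrip("-")  # accept both "ész" and "-ész"
    return f"Category:{language} words suffixed with -{suffix}"


def render_suffix(base, suffix, language=None):
    """Return (etymology text, categories).

    The category is added only when the optional language argument is
    supplied, so calls without it are unaffected.
    """
    text = f"{base} + -{suffix.lstrip('-')}"
    categories = [suffix_category(language, suffix)] if language else []
    return text, categories
```

For kertész, this would display "kert + -ész" and, when the language is given, file the entry under the Hungarian category for that suffix.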

Audio in word lists

I've been thinking about this idea and would like your feedback. Is there a way to add audio to the word lists we have in Wiktionary (indexes, categories)? Visually, this would be a speaker icon in front of the words without any additional text that we normally have in the audio template. Only the icon and the word. If the audio does not exist, the icon would be red, otherwise blue. This would also make missing audio visible - which can be good or bad depending on how you look at it. --Panda10 12:48, 1 June 2008 (UTC)[reply]

I don't think we can do that in categories. The sound (1) may or may not exist, (2) is specific to one language, (3) may have more than one audio file dependent upon part of speech, stress, etc. This would require a lot of work. Now, adding sound files to an index is potentially easier, but still has the problems of making sure the correct audio file(s) end up in the index, and making sure that each word is properly represented. Consider, for example, that some English words are pronounced differently as a noun or verb. Consider that some English words are pronounced differently in different countries. --EncycloPetey 00:45, 3 June 2008 (UTC)[reply]
The index entry could contain the same number of audio icons as the actual entry. When you hover above the speaker icon with the cursor, an alternate text appears clearly showing the file name, telling which country's pronunciation it is and which part of speech it belongs to. So I don't see this as a problem. If the audio file does not exist in the entry, then there is a choice of either displaying a red speaker icon next to the index entry, or not displaying at all. Anyway, this was just an idea. I think Wiktionary has many interesting functionalities that the other online dictionaries do not have and making those aspects stronger could be one of the ways to stand out. --Panda10 01:44, 3 June 2008 (UTC)[reply]

Citations/Quotations

Urmm, I'm confused! What's the difference (on Wiktionary) between a citation and a quotation? Should we be using {{cite-book}} or {{quote-book}} in the Citations namespace? I'm fairly sure I understand the difference between a reference and a citation/quotation. Am I suffering from (yet another) foolish misunderstanding or is there a wider confusion as to the difference? I am aware that lots of the visiting 'pedians seem to equate citations with references, not quotations. Conrad.Irwin 21:54, 1 June 2008 (UTC)[reply]

Oh, I also forgot to mention {{cite book}} which seems to have been transwiki'd for references. Conrad.Irwin 21:57, 1 June 2008 (UTC)[reply]
"Citation" and "quotation" are synonymous for our purposes; the latter name is perhaps less misleading, but the former name became the namespace name, so *shrug*. {{cite-book}} and {{quote-book}} are both fine in the Citations: namespace; we don't yet have a template that handles this stuff perfectly, so use whichever you prefer or fits a given quote better, or use neither if you'd rather format by hand or if you have a quote that requires handling that neither of them offers. {{cite book}} is for books used as references (i.e. it's used to cite a book in the ordinary, non-Wiktionary sense); you can use it inside <ref> elements, or inside bullet points in ===References=== sections, or you can pretend it doesn't exist and format such references by hand. —RuakhTALK 22:10, 1 June 2008 (UTC)[reply]
See also Category:Citation templates for templates for other special cases such as video, Usenet, US patents, journal articles, and newspapers. -- Visviva 11:02, 2 June 2008 (UTC)[reply]
Also see Special:PrefixIndex/Template:cite - it looks like they are both redundant to each other. Conrad.Irwin 16:04, 2 June 2008 (UTC)[reply]
AFAICT, {{cite-book}} is equivalent to {{quote-book|i1=*}} for the basic case, but without support for book URLs, editors, original authors, etc.; similarly {{quote-book}} lacks support at this writing for page URLs and genres. I'm not sure if cite-book was created because of problems with {{quote-book}}, or just because that template has not been sufficiently advertised. Either way, hopefully we can get a consensus treatment going forward.
In terms of the name, I opted not to use "cite" in the names of the quote-X templates to avoid confusion with the ex-Wikipedia templates mentioned by Ruakh, which serve a very different function. Perhaps a future consensus series of templates could be at Template:citation-foo? -- Visviva 11:02, 2 June 2008 (UTC)[reply]
As far as I can tell, {{cite-book}} was created for use in the Citations namespace and {{quote-book}} for use on the main entry. The reason there is a difference is that quotations on the main entry must begin with a "#" to put them under the appropriate definition sense. In the Citations namespace, this isn't the case. However, neither template seems to work very well, and the way they package content differs, which makes it frustrating to move an item from the entry page to the citations namespace or vice versa. --EncycloPetey 14:25, 2 June 2008 (UTC)[reply]
(Yup. That difference is also present between the definition lines and the Quotations section, if the latter is still relevant.) It would clearly be nicer to have only one template for each need, and I'm sure the differences between cite-book and quote-book can be hammered out. For any of these templates, it would be better, IMO, not to include the quotation as a template parameter, which would resolve the problem of indentation. But that could be "solved" as well with even more parameters like {{{indent}}} or other patchwork. 75.54.80.198 06:18, 4 June 2008 (UTC)[reply]
If the quotation is not associated with the template, then there is no prospect of CSS-based autohiding, which is surely something we will want to consider as entries become better-cited. Having the quotation be optional was something I tried with {{quote-book}}, but was borked by the changes to the preprocessor and in any case was probably a dumb idea.
It's fairly easy to match {{cite-book}}'s behavior using {{quote-book}}; the former template could, I think, be replaced with the following code:
{{quote-book
|i1=*
|year={{{year|{{{1|}}}}}}
|author={{{author|{{{2|}}}}}}
|title={{{title|{{{3|}}}}}}
|isbn={{{ISBN|{{{isbn|}}}}}}
|url={{{book-link|{{{url|}}}}}}
|pageurl={{{link|{{{pageurl|}}}}}}
|page={{{page|}}}
|passage={{{text|{{{passage|}}}}}}
}}
... or code to that effect. The only difference would be in the URL display (but we don't really have a standard for that AFAIK). The reverse operation (replacing quote-book with an invocation of cite-book) is not currently possible.
I think it's good to have separate low-hassle templates for the * and #* cases, but they should use the same basic architecture, whatever that may be. -- Visviva 11:35, 4 June 2008 (UTC)[reply]
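The fallback chains in the wrapper above (for example year={{{year|{{{1|}}}}}}) can be read as "use the first parameter that was passed, else empty". A small Python model of that mapping follows; the function names are hypothetical and the sample values in the usage note are only illustrative.

```python
def resolve(params, *names):
    """Mimic {{{a|{{{b|}}}}}}: the first name that was passed wins, else ''.

    As in template syntax, a parameter passed with an empty value still
    counts as passed, so it is returned rather than skipped.
    """
    for name in names:
        if name in params:
            return params[name]
    return ""


def cite_book_to_quote_book(params):
    """Translate cite-book style arguments into quote-book arguments."""
    return {
        "i1": "*",  # bullet indentation used in the Citations namespace
        "year": resolve(params, "year", "1"),
        "author": resolve(params, "author", "2"),
        "title": resolve(params, "title", "3"),
        "isbn": resolve(params, "ISBN", "isbn"),
        "url": resolve(params, "book-link", "url"),
        "pageurl": resolve(params, "link", "pageurl"),
        "page": resolve(params, "page"),
        "passage": resolve(params, "text", "passage"),
    }
```

Calling cite_book_to_quote_book({"1": "1851", "author": "Herman Melville", "3": "Moby-Dick"}) fills year and title from the positional arguments and leaves the unsupplied fields empty, which is the behaviour the wrapper relies on.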

Flag edits for specific people

This edit got me thinking: I don't have any knowledge of Ancient Greek or how it is pronounced, but what if I could flag the edit and have it show up on some kind of watch list (or just the normal watch list) for people who had knowledge of the language? I.e. this edit would show up on Atelaes's watchlist. It could be tied in with the babel boxes somehow. Maybe there isn't enough need for this but it would be cool. Nadando 06:59, 2 June 2008 (UTC)[reply]

No, I wholeheartedly agree with you. Every time I see an edit to an Italian, Mandarin, Russian, etc. page, I'm never quite sure what to do. It looks reasonable, should I mark it? It would be absolutely fantastic if we could sort RC by language or something. I have no idea if such a thing is possible. And yes, those edits were bunk, as far as I can tell (admittedly, my sources aren't fantastic for esoteric placenames). -Atelaes λάλει ἐμοί 07:20, 2 June 2008 (UTC)[reply]
Well, we've got {{attention}} for stuff that needs a check, or WT:BABEL to find someone to ask to check it, but for small things like that I tend to assume they know what they're on about. Conrad.Irwin 16:02, 2 June 2008 (UTC)[reply]
I noted that we have a lang= parameter for rfc. Would it make sense to have a context= parameter for domain specialties? I would think it would be a good way to get some context-specific entries cleaned up efficiently. DCDuring TALK 16:28, 2 June 2008 (UTC)[reply]
That sort of functionality is not likely to come to MediaWiki anytime soon. The closest thing would be if someone could write a bot that would periodically (on daily/weekly basis) dump the list of unpatrolled edits sorted by language section name in which the edit occur, so that the interested editors could check them, but I don't know if that's feasible at all. --Ivan Štambuk 18:47, 3 June 2008 (UTC)[reply]
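The grouping step of such a bot could look roughly like the Python sketch below. It assumes the bot has already obtained a list of (entry title, language section) pairs for unpatrolled edits; that input format and the function name are hypothetical, and fetching the edits is left out entirely.

```python
from collections import defaultdict


def group_by_language(edits):
    """Group (title, language_section) pairs by language for a report.

    Languages and the titles under each language are sorted
    alphabetically so the periodic dump is stable and easy to scan.
    """
    grouped = defaultdict(list)
    for title, language in edits:
        grouped[language].append(title)
    return {lang: sorted(titles) for lang, titles in sorted(grouped.items())}
```

An editor interested in one language would then only need to read that language's part of the dump.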

Plurals of proper nouns

So what is the policy now? Are plural forms of proper nouns to be included in WT or not? Specifically, should the article Alexander specify that the plural to be used is Alexanders when referring to more than one person bearing that name?
Background 1: RFD discussion on Jesuses, which seemed to converge to the opinion that plurals of proper nouns may be acceptable for Wiktionary purposes in exceptional cases when the use has been verified by a suitable number of citations.
Background 2: Dispute on the inclusion of plural forms in the declension tables for Polish male given names between User:Maro and me (see my discussion page and his edits on several articles I created). The problem is exactly the same: the situation to refer to more than one Marek is in Polish no more and no less common than the situation in English to refer to more than one Alexander; so the fact that this is LOTE does not justify a different policy.
Please discuss. -- Gauss 01:05, 3 June 2008 (UTC)[reply]

For inflected languages, the situation is perhaps more easily resolved. What has been happening in Latin and Ancient Greek is that the declension tables for proper nouns do not include plural forms (except in cases where the name is inherently plural). However, on the inflection line of all Latin proper nouns is a link to the appropriate declension pattern appendix, where the examples include plural form patterns. So, someone wanting to explore the plural inflections has the option to examine a full declension table. But attestation is also a bit more cut-and-dried in Classical languages; either the plural is known in the corpus of surviving literature or it isn't. --EncycloPetey 01:30, 3 June 2008 (UTC)[reply]
I disagree with your (Gauss's) assessment of Wiktionary:Requests for deletion#Jesuses; I don't believe it converged to any opinion. It looks to me like a slight majority of editors prefer to keep what you're calling "plural forms of proper nouns", exceptional case or no. I think the topic should be brought to a vote. —RuakhTALK 01:34, 3 June 2008 (UTC)[reply]
I've now created Wiktionary:Votes/pl-2008-06/Plurals of proper nouns; I don't like the name, since proper nouns don't have plurals, but couldn't think of a better one. :-P   I've decided to go with a simple up-down vote, but if people prefer, I'd be happy to split it into sub-votes (e.g. first voting on whether to include such plurals, then voting on details for how to present them). Please feel free to edit. :-) —RuakhTALK 01:54, 3 June 2008 (UTC)[reply]
Well, you could call it "Plurals from proper nouns", so that the nature of the plurals is left nicely ambiguous... --EncycloPetey 02:08, 3 June 2008 (UTC)[reply]
Good idea, thanks, done. :-) —RuakhTALK 02:13, 3 June 2008 (UTC)[reply]
Comment below probably mooted by EP's deft finesse. DCDuring TALK 02:44, 3 June 2008 (UTC)[reply]

Edit conflict:

Would a vote lead to any conclusion, given the slim majority? Is there any issue on which there might be a resolution? I'm still struck by the fact that there are numerous occurrences of plurals like "Henrys" or "Smiths", but that we wouldn't have a place for them. It wouldn't have to be a prominent place or a separate entry, just something findable from the search box and accessible to someone looking at the singular form. It doesn't have to be called a Proper noun, though it seems as if we could address the issue by linking the PoS header to an explanation of "true" proper nouns. Making all of the proper nouns automagically sprout Noun PoS sections for the normally occurring common noun uses of the words spelled the same as the proper noun seems like overkill, though that would be the only course that addresses the problem posed by the commonly accepted technical definitions of Proper noun.
The MW3 definition is an example of what I see as defective definitions: "a noun that designates a particular being or thing, does not take a limiting modifier, and is usually capitalized in English". By "limiting modifier" they seem to mean "determiner", citing "this" as an example. "This Barack Obama that everyone is talking about" is a phrase that does not include a proper noun. Nor does "the eight King Henrys". "Proper noun" seems to indicate a very particular kind of use that doesn't exactly fit our entries which are for the atoms that might make up true proper names or that might be proper names in particular circumstances.
In other words "proper noun" doesn't seem to be a linguistic category that is stable enough to be useful for a dictionary that specifically excludes proper nouns that actually refer to particular individuals. DCDuring TALK 02:44, 3 June 2008 (UTC)[reply]
MW3 seems to be using a narrower definition akin to that of the CGEL. We use a broader definition of "proper noun" that includes the concept of proper name. The instability is not inherent in the category, but in the scope of its definition as compared between sources that lump together proper nouns and names versus those that carefully distinguish between them. Also, keep in mind that I've started an appendix that should help (I hope) to clarify our stance and definition here. There will still be some borderline cases, but having the appendix should make those cases explicit and clearly explain why they are problematic. I should have time next week to resume work on said Appendix. --EncycloPetey 02:54, 3 June 2008 (UTC)[reply]
MW3 shows "proper noun" as a synonym for "proper name", though not vice versa. I don't know where to look to find our definition of proper noun. Are you referring to the one in the Glossary? DCDuring TALK 05:06, 3 June 2008 (UTC)[reply]
It will probably influence how people vote. We clearly have to have a broader definition, presumably in our Glossary, so that we can have proper noun entries in Wiktionary at all. But, given that our entries are often of proper noun components, not proper nouns themselves, why do we have to follow rules that specifically apply (however imperfectly) to the kind of proper nouns that we mostly exclude? DCDuring TALK 03:04, 3 June 2008 (UTC)[reply]
Which proper nouns and which rules do you mean? Your comment is too vague for me to understand what you mean. Could you clarify? --EncycloPetey 03:28, 3 June 2008 (UTC)[reply]
The proper nouns I am referring to are the ones that are the subject of the vote. The rules include "Proper nouns do not have plurals". I would argue that "the Smiths", referring to a particular nuclear family living at a particular address, is a proper noun in the most important regard, designating a specific unitary entity. It is as much a proper noun as "Smith". The vote proposes that the common noun singular PoS of the words we have display under the proper noun PoS be officially discouraged and denies proper noun status to the plural proper nouns.
I am really looking forward to the clarification of "proper noun" as used by 1., linguists, 2., our naive users, 3., us as we now use it as a PoS header, and 4., as we propose to use it. I find it hard to understand how we can have a vote on this in the absence of a great deal more precision than currently characterizes our Glossary definition of proper noun or the definition in principal namespace. DCDuring TALK 05:06, 3 June 2008 (UTC)[reply]
Ah, well "the Smiths" belongs to that odd form that linguists seem to waffle over. I would argue that these are collective proper nouns that merely look plural. Consider that United States is normally considered to be a singular proper noun, as is the Virgin Islands. But you are correct that the issue needs clarification. This vote may not clarify that particular issue, however, since we still have very vague policy about the inclusion of personal name elements. --EncycloPetey 13:50, 3 June 2008 (UTC)[reply]
Perhaps the vote is premature until we have a little greater clarity on the issues.
Musing about this: Given that we have a strong, fundamental stricture against entries that are for individual entities, ie, that are the purest case of proper nouns, I wonder whether "proper noun" is the most useful PoS label for words whose principal use is the construction of and shortening of the type of proper nouns that we generally prohibit. Taxonomic names are a great prototype for proper names. There is a clear structure with definite rules and relatively little colloquial usage that we feel compelled to include. In principle, the various taxons each refer to exactly one group of individuals at a given point in time. The species and subspecies modifiers are not in themselves proper nouns, as you pointed out. That is in some ways parallel to "given names".
We could make a case that given names should not be called proper nouns. "Proper name" carries slightly less linguistic baggage and is intelligible to all users. Perhaps the same reasoning could be extended to surnames as well. I certainly haven't thought through what all the implications would be. I airily dismiss the difficulty of conversion by saying a bot could do many of them (assuming that they had the given name and surname templates). Perhaps the existence of the appendices on names would aid the conversion as well. Are there deal-killing drawbacks to such a PoS header? DCDuring TALK 15:52, 3 June 2008 (UTC)[reply]
But saying that given names are not proper nouns runs contrary to all the literature on the subject that I've ever seen. I'd want some solid basis, preferably in academic publications, before making that kind of colossal change. And "proper name" does have some real problems as well; most especially in that it is a kind of nominal phrase and not strictly the name of a part of speech. In linguistic terms, the category of proper name includes: "the Crimea", "the Earl of Sandwich", "Mother", "my Jennifer", and "the Mary that you met yesterday". If you have access to a copy of the Cambridge Grammar of the English Language, I recommend reading the pages on proper nouns and proper names. --EncycloPetey 23:58, 3 June 2008 (UTC)[reply]
We already have PoS-level headings that are not parts of speech, to wit, the three abbreviation headers, Symbol, Phrase, Proverb, among those not deprecated, as best I recollect. I think that "my Jennifer" and "Jennifer" are very much equivalent. "Jennifer" does not actually uniquely specify a single person except in some kind of context. "My Jennifer" seems to be a case where the "my" provides some of the context, the balance being provided by the context that says who "my" refers to. Fortunately for preventing endless proliferation of entries, "my Jennifer" would seem to be SoP.
I would actually find all of the given name and surname entries to be in good company with any of these that are not SoP. Any term of address would seem appropriate.
It does not seem hard to come up with rules to exclude any types of entries that would be substantially duplicative.
CGEL is not conveniently available, but I will get to the library that has it at some point. Is the CGEL's use of terms consistent with the other modern English grammars? DCDuring TALK 02:33, 4 June 2008 (UTC)[reply]
Sometimes yes, but other times, no. Their terminology is sometimes current, but sometimes idiosyncratic. It is usually possible to tell the idiosyncratic uses from the excessive explanation and over-precise splitting from other terms. I recommend it primarily because the authors raise many issues and cases often glossed over in other general works. It therefore makes a good source for opening the mind to what the language is actually doing.
I disagree with you about "Jennifer" and other such given names. They almost always refer to a specific unique individual. Yes, additional context is often needed to tell which of several possible referents is meant, but the same is true of Congo, Guyana, and Macedonia. The fact that an identical label is applied to many different individuals does not make it a common noun. A proper noun distinguishes an individual, whereas a common noun groups it. An apple is named to place it in a category of items which possess shared qualities common to all items bearing the same label of "apple", and which would be expected by someone hearing the word apple. But Jennifer predicts no properties of the item bearing the label (except gender in this particular case, but not all given names do so), and so there is no description possible of what a "Jennifer" is. Any such description would have to be made of a specific individual bearing the name, rather than a description of shared characteristics. This is why we have no true definition of Jennifer, but instead identify it as a label used as a given name. Thus, Jennifer is a proper noun and not a common noun. --EncycloPetey 03:59, 4 June 2008 (UTC)[reply]
What you are saying seems true of some elements of our current naming system, but wouldn't have applied to occupational surnames like Sawyer; patronymics, Leif Ericson; names reflecting personal characteristics, Karl der Grosse, Preacher Roe, Wild Bill Hickok; or place name related names: Vito Corleone, Friedrich August von Hayek. Titles (arguably part of Proper Names and Nouns) also convey information. All given names reflect information about the attitudes or aspirations of parents toward their children (patron saints, names like "Theodora", favorite relatives, sports heroes, celebrities, etc.).
I take a less Platonic view of these matters. If the concepts don't communicate to users and require a few dozen more doctoral dissertations to get the kinks worked out, then they don't yet belong in a dictionary. It would be much easier for us if we could model what other dictionaries do with regards to proper nouns and modify it to suit using the evolutionary incrementalism that characterizes public wikis. But, for the most part, general dictionaries exclude personal names, leaving the field to more encyclopedic works or to specialized glossaries. DCDuring TALK 06:21, 4 June 2008 (UTC)[reply]
Jennifer can be a name (uncountable, not existing in determined form), but also a specific person who bears the name "Jennifer". In "one Jennifer", Jennifer can be any person who bears the name "Jennifer". In this meaning, two persons who bear the name would be called Jennifers. "My Jennifer" is also a specific person, but extracting "Jennifer" from that phrase, "Jennifer" becomes any person who bears that name, thus it would be possible to say "My Jennifers" (like if you have two daughters both named "Jennifer"). In Swedish I think we don't even have terms to distinguish these different meanings, so it's interesting to read this discussion (and the one about Jesuses). However, even if a very small number of persons can differentiate these meanings and the proper PoS name for each, the vast majority probably cannot. Having each proper noun page have different headings would probably lead to a lot more confusion than clarity, and it would take lots of work to accomplish this change; the gain would be minimal, if any. Proper nouns (like Jennifer etc.) are also something that normal dictionaries don't even include, so they seem a bit peripheral to the goal of Wiktionary. Why put so much effort into this instead of making the main pages of normal words better? Plurals (Jennifers) of the "person bearing the name" meaning could of course be created by a bot, but if these pages direct to the main page (Jennifer), the main page needs to include this meaning too, and then all proper noun pages need manual attention as well. Who's supposed to do that, and also maintain this for all pages created in the future, and for all languages? It's not that it's illogical to do this, but to me it feels way too peripheral to put a lot of effort into.
What is reasonable though is to add some kind of inflection table, explaining both the possessiv spelling and also this information about the other "person bearing the name"-meaning, along with the plural form of this meaning, and maybe also a link to a page explaining more about Proper Nouns in the English language that I saw you worked on, EncycloPetey. This could be added with much less effort, but also make the pages complete with information that would satisfy most users the best. Exactly how to formulate I don't know enough of English grammar to, but I am confident you are knowledgable to figure out something good. ~ Dodde 07:53, 7 June 2008 (UTC)[reply]
Thanks for the fresh, thoughtful, reasonable perspective. Treatment of proper nouns is a hard area because of the absence of suitable models in other general dictionaries, but it also is a way to attract users to Wiktionary. I don't believe that proper noun plurals deserve a separate entry (but most common noun plurals don't either, IMHO). I am mostly interested in simply, 1., displaying the plural at the proper noun entry and, 2., allowing search for plurals of "proper" nouns. In the most obvious implementations, this seems to clash with the linguist's definition and criteria as to whether something is a proper noun. I am saying so much the worse for the definition, but others reasonably disagree. Language consists of both words and rules. The definition of a dictionary focuses on words, not rules. We risk wasting users' time, attention, memory, and patience when we have full entries for things that are better accommodated by rules. Almost all of the uses of proper nouns in common noun ways would be fully covered by rules (as would many standard derived meanings of nouns, like "instance or type of" for abstract nouns like -isms; morphological alterations of stems, like those using -ly, -ness; and regular inflections, like plurals and verb forms). Perhaps it could all be accommodated by clickable black (or, better, unobtrusively colored) links to PoS appendices (which could be for all headers and glossary terms). DCDuring TALK 12:45, 7 June 2008 (UTC)[reply]
We should not forget that proper noun is interpreted slightly differently in different languages (the basic meaning being the same), as this may have some impact. In French, it's usual to consider that proper nouns have no plural, but plurals are used in some cases, nonetheless. And I really think that adding the meaning plural of France to the page Frances could help some readers. Lmaltier 15:42, 7 June 2008 (UTC)[reply]
I must say I am completely astonished by the first supporting votes on Wiktionary:Votes/pl-2008-06/Plurals of proper nouns. Don't people realize how much work supporting this vote means in reality, if it's meant to be done consistently on all proper noun pages, and how it will completely downgrade the affected pages' readability? I really hope people become aware of this before it's too late. ~ Dodde 14:06, 13 June 2008 (UTC)[reply]

Combining forms

I seem to remember we had a discussion about combining forms at one point. Does the entry include the hyphen or not?

For example, I've just created -eyed and moved derived terms from eye to that page. But should the entry properly be "eyed"? I don't think so, as this is the past tense and past participle of "to eye", and the hyphen is an essential part of the combination (except when the compound is a single word, as is common in US English). So how should the entry be titled? — Paul G 10:11, 3 June 2008 (UTC)[reply]

-head is an example of a more restrained differentiation of cases that I think is appropriate for combining forms. One etymology is a pure suffix not connected at all to the ordinary meanings of "head". The other etymology shows a particular combining form application whose meaning, though clearly connected, does not exactly correspond to any one meaning of "head". There is no sense line at -head that corresponds to the use of -head in forehead, ahead, flathead, gearhead, and behead, or even shithead, mophead or pinhead. I think that is as it should be, though it may be that the use of -head in invective and milder forms of name-calling might warrant a sense. DCDuring TALK 16:20, 3 June 2008 (UTC)[reply]
For the sake of consistency, since suffixes are marked with a hyphen and this suffix always occurs with a real hyphen, this should technically be at --eyed. But as this example shows, consistency is no virtue. -- Visviva 08:51, 4 June 2008 (UTC)[reply]

Are (present/past) participles really verbs or adjectives?

While working with Dutch inflections I noticed that all the pages and templates list past participles as verbs. This strikes me as odd, because although they are formed from verbs, they are not verbs themselves but grammatically behave as adjectives, including adjectival inflection and also forming of comparatives and superlatives in some cases. And this applies not just to Dutch but to many other languages including English, German and French, and probably a lot of other Indo-European languages too (including Latin I believe). So why are past participles of these languages listed as being verbs rather than adjectives? Shouldn't this be changed? --CodeCat 22:20, 4 June 2008 (UTC)[reply]

English and French past participles are verb forms (consider "have said", "ont dit"), though it's very common to form adjectives from them, and they're sometimes described (a bit misleadingly) as "verbal adjectives". Certainly they're verbs and have to be listed as such; the question is, should they be listed as adjectives as well? For English and French we've decided that, except in cases that have really produced all-out adjectives (such as "surprised", "relieved", "agacé", "ennuyé", etc.), there's no point in doing so, since it's a regular feature of the language that past participles can be used much like adjectives in some cases, and it's more misleading to list them as adjectives than not to do so. Dutch, German, and Latin may be different, however; I wouldn't know. —RuakhTALK 23:01, 4 June 2008 (UTC)[reply]
CodeCat, the answer to this question varies between languages. In Latin and Ancient Greek, participles are typically treated as their own part of speech. Part of the reason for this is that, although they have gender and decline like adjectives, they also have tense like verbs. In English, where adjectives do not have gender, this is no longer an issue. In Spanish, the participle only has gender when it functions in the place of an adjective, but when it combines to form a compound verb, it does not. So, there will not be one answer to this question that will apply equally to all Indo-European languages. What we're doing for Classical languages is recognizing the Participle as its own part of speech, the way Classical grammars do. The etymology shows its origin from a verb, but the inflection table shows how it declines like an adjective. The participle itself gets its own definition, since this will not always match the meaning of the verb and will not always have a simple English equivalent. --EncycloPetey 02:28, 5 June 2008 (UTC)[reply]
No, participles themselves don't inflect for tense, verbal roots they're formed from do ^_^ Once the participle is formed, it represents a particular mood-tense-voice modification of verbal root sense, which can be translated to English using auxiliary/modal verbs, and some helpful semantic modifiers (because lots of those moods/tenses are not directly translatable in English)
AFAICS, what you are doing with entries like e.g. fractus is giving purely adjectival English translations with adjectival Latin declension, but formatting them as ===Participle===, and moving the "xxx participle of" stub to etymology. I like that approach. I don't like the approach used for e.g. Ancient Greek κατακεχυμένος or Lithuanian bučiuotinas, where the stub-like "xxx participle of" suddenly becomes a "lemma" by itself, with other stubs linking to it. Such an approach could yield literally hundreds of stubs-linking-to-stubs entries for highly inflected languages like Sanskrit and Ancient Greek, where there are dozen+ participles for every verb each able to decline in 5-8 cases 3 genders and 3 numbers, with several orthography variants, and that would be no good. --Ivan Štambuk 13:50, 6 June 2008 (UTC)[reply]
I dislike EP's approach regarding definitions. First, because the definitions are adjectival. I don't know exactly how Latin treats participles, but that wouldn't work for Ancient Greek, as participles can be adjectives, nouns, or verbs (with verbs probably being the most common). A participle can take on any of the meanings of its parent verb, and so to list them all over again would be silly. I am considering masculine singular participles to be sub-lemma pages. Thus, their forms will link to them, but also to the parent verb (Feminine genitive dual of xxxx, present active participle of xxx), so that the user can go straight to definitions. Rest assured that all entries will link directly to a true lemma page, even if they also link to a few sub-lemma pages as well. -Atelaes λάλει ἐμοί 18:11, 6 June 2008 (UTC)[reply]
Latin has complications that don't exist in Ancient Greek or Sanskrit. Firstly, Latin continued as a language of international communication into the modern era, so it has a longer history of active use beyond the Classical period. With Greek, the modern and medieval languages have a different language code, so historical breaks can be made. A Latin entry, by contrast, must cover all three periods if it is to be thorough. Secondly, Latin has given rise to several major modern languages in which the descendants of those Latin participles are considered strictly adjectives. So, we have Participle as a verb/adjective hybrid POS in the Classical period, but as Latin gradually became French, Spanish, Italian, etc., the verb aspects of the participle dwindled in favor of the adjective function. We therefore have many, many non-Latin words whose etymologies will point to a Latin participle root. Combine this with its description as a separate part of speech, and an independent, thorough definition seems more helpful than a grammatical description. I'm not entirely happy with this approach, but it seems to me that it is more useful to our users, at least for Latin entries. --EncycloPetey 21:27, 10 June 2008 (UTC)[reply]
Ruakh, can you point to the discussion about that criteria? AFAICS, both surprised and relieved are derivatives of a corresponding verbal sense.
I think that it would be most regrettable not to allow adjectival use of English past participles and nominal use of present participles (the act of doing <verbal sense>) to have their own ===Adjective=== and ===Noun=== sections, just because they're derived by regular, predictable morphology. We don't have the same constraints as paper dictionaries, and should not be thinking in terms of "what do we gain in savings" but "what do we lose". And in this case, we lose the real non-stub parts of speech and, in lots of cases, non-obvious translations to foreign languages.
My mother tongue's grammars never use the terms "participle" but use "verbal adjective" and "verbal adverb" instead, and I do plan to format them as ===Adjective=== and ===Adverb=== one day. This would for it, and for all the other languages that choose to do the same, incur the unnecessary inconsistency of translating one FL POS with another English-one (which wouldn't even be a full-blown entry by itself, but a "xxx participle of" stub). So the "misleadings" saved by omitting the adjectival sense are not eliminated, just projected onto another domain. --Ivan Štambuk 13:34, 6 June 2008 (UTC)[reply]
If regular formation of a participle from its corresponding verb is an argument for listing it as a verb, then why are English adverbs in -ly not considered adjectives, and why are adjectives in -able not considered verbs? It makes sense to me to consider formation of a participle to be simply derivational morphology, since you are deriving one word from another by means of morphological changes. The fact that the original word was a verb doesn't matter, because the end result behaves grammatically as an adjective.
In any case they are not verb forms, because they can never take the place of another verb form except for another participle. However a present participle in English can take the place of any adjective, both predicatively (the man is walking) and attributively (the walking man). English past participles can also be used in this way provided that the verb was transitive (the painted fence and the fence is painted). The only thing that distinguishes an English past participle from an adjective is the use of the auxiliary 'to have' (I have walked).
The situation is the same in Dutch and many other Germanic languages (and French too). In Dutch, present participles can be used predicatively (considered archaic) and attributively. Past participles of transitive verbs can be used like this as well. Furthermore, in both French and Dutch, participles are inflected as adjectives based on gender and number. E.g. Dutch ik heb gekookt (I have cooked), but de gekookte ham (the cooked ham). Or French il a mangé (he has eaten), le pain mangé (the eaten bread), but elle a mangée (she has eaten), la pomme mangée (the eaten apple). --CodeCat 18:02, 6 June 2008 (UTC)[reply]
About French: in elle a mangé, mangé is a verb form, not an adjective at all. In la pomme mangée, mangée is also usually considered as a verb form (it's a short form of la pomme qui a été mangée). Actually, I think that it would be very difficult to find a sentence where mangé is used as an adjective. In other words, mangé is not an adjective. But many participles can become adjectives, nonetheless. They are considered adjectives when the meaning of the sentence does not include the meaning of the verb. In Je voudrais un café sucré, sucré is not a verb form, it's an adjective. But, in le café qu'il a sucré, sucré is a verb form, not an adjective. In some sentences, a word may be understood either as a verb form or as an adjective, with slightly different meanings. There has been a long discussion about this point in the Beer parlour, not so long ago. Note that, when you state that they can never take the place of another verb form except for another participle, it's also true of other verb forms (e.g. an imperative can take the place of another imperative, not of a past form). Lmaltier 19:38, 6 June 2008 (UTC)[reply]

Also note that, in French, the noun participe always refers to a verb form (when used as an adjective or as a noun, it is called an adjective or a noun, not a participle). Lmaltier 12:36, 7 June 2008 (UTC)[reply]

I think you need to look at their syntax. Do they, for example, license objects? Do they appear attributively or predicatively? Do they allow modification by adverbs that typically modify adjectives (e.g., very in English)? Etc.--Brett 17:05, 7 June 2008 (UTC)[reply]
You are right. In French très (very) works well with adjectives and adverbs, and beaucoup (much) works well with verbs (including participles). I think this trick works as well in English and in many other languages. Lmaltier 20:02, 10 June 2008 (UTC)[reply]
Dutch, from verwennen: Hij is erg verwend. (He is very spoiled.) Hij is veel verwend. (He has been spoiled much.) Hij heeft veel verwend. (He has spoiled much.) --CodeCat 11:37, 11 June 2008 (UTC)[reply]

Full entries or soft redirects for Swiss Standard German spellings

Swiss Standard German has a consistent orthographic difference from Standard German, converting Standard German ß into Swiss Standard German ss. Should we include full entries or soft redirects for Swiss Standard German spellings? For example, should we include a full entry at the Swiss Standard German spelling geniessen (which currently has a hard redirect) or just make it a soft redirect to the Standard German spelling genießen? My instinct is to make it a soft redirect, but I ask here because my affinity for soft redirects is not always in sync with everyone else’s preferences. Rod (A. Smith) 22:42, 4 June 2008 (UTC)[reply]

A soft redirect with a template modelled on {{alternative spelling of}} sounds like a good solution to me, as long as the main entry also gets an Alternative spellings section listing the Swiss form. --EncycloPetey 02:30, 5 June 2008 (UTC)[reply]
I agree with EP, assuming Swiss Standard German is a dialect of German, not a ==Language==.—msh210 16:21, 5 June 2008 (UTC)[reply]
I think we're referring to actual alternative spellings here, sort of like US/UK, whereas Swiss German proper is closer to the Standard French/Quebec Joual part of the Language/Dialect spectrum (abstracting social factors). Circeus 17:03, 5 June 2008 (UTC)[reply]
Note that Swiss German (ISO gsw) is different from Swiss Standard German (ISO de-CH). The former is an almost exclusively spoken (as opposed to written) language, and is nearly impossible for a speaker of only Standard German to understand. The latter is much closer to Standard German, easier for Germans to understand, with entirely predictable changes in pronunciation and spelling, and few differences in vocabulary. Rod (A. Smith) 18:46, 5 June 2008 (UTC)[reply]
I know that ;) Circeus 19:27, 5 June 2008 (UTC)[reply]
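
The ß to ss correspondence described in this thread is regular enough that candidate soft-redirect pages could in principle be listed mechanically. The following is only a minimal sketch under stated assumptions: the function names to_swiss_spelling and soft_redirects are hypothetical, the input is assumed to be a plain list of Standard German headwords, and vocabulary differences between the two standards are ignored entirely. Because two Standard German spellings can collapse to the same Swiss spelling (for example Maße and Masse), each Swiss form maps to a list of targets rather than a single one.

```python
from collections import defaultdict


def to_swiss_spelling(word: str) -> str:
    """Return the Swiss Standard German spelling: every ß becomes ss."""
    return word.replace("ß", "ss")


def soft_redirects(standard_words):
    """Map each Swiss spelling that differs from the standard spelling
    to the list of Standard German entries it should point to."""
    redirects = defaultdict(list)
    for word in standard_words:
        swiss = to_swiss_spelling(word)
        if swiss != word:  # words without ß need no redirect page
            redirects[swiss].append(word)
    return dict(redirects)


if __name__ == "__main__":
    for swiss, targets in soft_redirects(["genießen", "Haus", "Straße"]).items():
        print(swiss, "->", ", ".join(targets))
```

Keeping the targets as a list leaves the decision about ambiguous forms to a human editor instead of silently picking one entry.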

Phrasal verb SoP tests

I would like to take some time to hammer out some guidelines to apply to phrasal verbs with a view to creating a reasonably streamlined decision making process for those recurring borderline cases where it can be difficult to decide if the entry really is a phrasal verb, or is SoP.
My interest stems from the fact that the number of classified phrasal verbs has grown from 22 to 880 in about 1½ years. And there are well over 3,000 in common use, meaning that there are still a large number of possibly long-winded discussions on the horizon which would benefit from a set of basic guidelines, in the same way we often decide set phrases by using one of the standard tests. For those who don't know, the list can be found at Category:English phrasal verbs

To start the ball rolling, I think we need to agree on some basic principles:-

  1. A phrasal verb is composed of a word that is normally used as a verb, plus a particle which in any other construction would either be a preposition or an adverb. There can be one or two particles, but never three.
    The plane takes off at 15.00. - Could you look after my baby for half an hour, please? - John ran off with Jane last week.
  2. It is often, but not always, idiomatic. It can have both an idiomatic and a literal meaning.
    Look out for the broken glass. -- Look out the window and tell me what you can see.
  3. It can be either transitive or intransitive, or both.
    The plane takes off at 15.00. - Take off your shoes before entering, please.
  4. A phrasal verb often has more than one meaning.
  5. The meaning of the verb in a phrasal verb often, but not always, changes
  6. The particle has a strong meaning content.
    In throw away, run away, and most (but not all) other away phrasals, away strongly indicates separation or disappearance.
  7. (Please add more to this list)
  • Typical particles found in phrasal verbs:-

aback, about, across, against, along, apart, around, aside, at, away, back, by, down, for, forth, forward, in, into, off, on, onto, out, over, past, round, through, to, together, up, upon, with, without, and any others you might think of, please add.

  • Areas of uncertainty and conflicting opinion:-
  1. Possible phrasal verbs that appear to have only a literal meaning, but occur in grammatical constructions where one would expect to see a phrasal verb.
  2. Common collocations.
  3. (Please add more to this list)

Thanks in advance for your input. -- Algrif 17:51, 5 June 2008 (UTC)[reply]

Is it possible that there is no good intrinsic test of whether a phrasal verb merits an entry in Wiktionary? It seems that phrasal verbs are a rather amorphous category. Is it possible that the merit of having an entry for one depends on the number of possible combinations of definitions of the verb and the particle? I note that both the verbs and the particles often have a very large number of definitions. Although it is, in principle, possible for a user to test each possible combination of verb meaning and particle meaning to find the meaning of a phrasal verb, it would seem very difficult to do so in practice. Thus an extrinsic test would be warranted, based on the product of the number of verb definitions and the number of particle definitions. The number would not have to be dramatically large in order to warrant including the phrasal verb as an entry. Perhaps the verb and the particle should both be required to have at least two (three?, more?) definitions in general use (excluding special contexts). DCDuring TALK 18:39, 5 June 2008 (UTC)[reply]
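
The extrinsic test proposed above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name merits_entry is hypothetical, and the thresholds (at least 2 general-use definitions for each word, and a product of at least 6) are placeholders, since the proposal deliberately leaves the actual numbers open.

```python
def merits_entry(verb_senses: int, particle_senses: int,
                 min_each: int = 2, min_product: int = 6) -> bool:
    """Apply the proposed extrinsic test to a verb-particle combination.

    verb_senses and particle_senses are counts of general-use definitions
    (special contexts excluded). Both thresholds are placeholder values.
    """
    # Each component must itself be polysemous.
    if verb_senses < min_each or particle_senses < min_each:
        return False
    # The number of verb-sense/particle-sense combinations a reader
    # would have to try must be large enough to justify an entry.
    return verb_senses * particle_senses >= min_product


if __name__ == "__main__":
    print(merits_entry(10, 8))  # highly polysemous verb and particle
    print(merits_entry(1, 20))  # verb with a single general-use sense
```

The product stands in for the number of sense combinations a reader would otherwise have to test by hand, which is the burden the proposal aims to measure.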
It is possible, I suppose. However, preparing a description of the characteristics of phrasal verbs will allow us to (1) write an appendix for users who want to know more, and (2) allow us to weed out obvious cases without repeating the same discussions over and over.
As far as #2 in the list above, I would say that the first look out is a phrasal verb, but the second example (the "literal" one) is not phrasal at all. It is an example of the verb look with a prepositional phrase "out the window" that answers the question "where". So one characteristic of transitive phrasal verbs is that the object is an object of the verbal phrase, and not the object of the preposition. Example: In the sentence "Take off your shoes," what is being taken off? The shoes. Whereas in the sentence "Take the laces off your shoes," it is laces that are being taken, and the prepositional phrase "off your shoes" answers an adverbial question of where they are being taken. So, a phrasal verb parses with a noun or pronoun object answering what in relation to the verb, but if the verb is not phrasal then it parses with the particle as part of a larger prepositional phrase answering where or how. Intransitive situations are harder to judge, because they don't have an object and so do not exhibit this behavior. --EncycloPetey 19:50, 5 June 2008 (UTC)[reply]
Likewise "with Jane".
For intransitive wouldn't it more or less be a question of whether it's idiomatic? DAVilla 06:07, 6 June 2008 (UTC)[reply]
The CGEL argues that there's no such thing as a phrasal verb, though I don't recall the arguments used.--Brett 17:17, 7 June 2008 (UTC)[reply]
Well "phrasal verb" is definable and worth publishing books about. There are dictionaries of phrasal verbs. All grammatical categories are mere creations of students of language anyway. To me the main issue is whether the entries are useful and, secondarily, whether the category is useful. The entries make it much easier to find the meaning of a particular verb-particle construction, especially because the verbs and particles involved are both among the most polysemous of English words. The categories are useful to maintain and review the entries. I am unclear as to what value the categories have for our users, except to suggest that looking for verb-particle combinations might be fruitful in subsequent searches. DCDuring TALK 18:07, 7 June 2008 (UTC)[reply]
So, are you suggesting that we not include them? Phrasal verbs may not be syntactic units, but we are a dictionary, not a grammar-book. We should strive for accuracy in describing the syntax of our headwords (e.g. labeling determiners as such, attributive nouns as such, etc.), but that shouldn't bleed over into omitting headwords that are syntax-decomposable but not semantics-decomposable. (Or are you simply saying that we can't rely on syntactic arguments in deciding which to include?) —RuakhTALK 18:56, 7 June 2008 (UTC)[reply]
By all means, let's add phrases that have idiomatic senses. I'm saying that I suspect that we won't succeed in finding consistent tests for "phrasal verbs". I'll go back and read what Huddleston & Pullum have to say about it when I get a chance though. I think they generally argue that they're simply verbs with prepositional complements.
With the example of run off with, for example, we have go off with, wander off with, stroll off with, etc. You could also front the with if you were so inclined (i.e., with whom did she run off?)
There are, however, certain oddities, such as pronouns being disallowed in certain constructions where other nouns are fine (e.g., pick up the paper vs. *pick up it.)--Brett 00:37, 8 June 2008 (UTC)[reply]
I for one have even suggested that the heading Phrasal verb should be allowed as a PoS, but I understand from previous discussions, and some comments here, that this would be difficult, or even impossible to pursue. However, I think the category is very useful, particularly for English L2 who struggle to learn the phrasal verbs, and find this list to be a great aid. (I have even had emails thanking me for pushing it!) So I agree with EP that "preparing a description of the characteristics of phrasal verbs will allow us to (1) write an appendix for users who want to know more, and (2) allow us to weed out obvious cases without repeating the same discussions over and over.".
There are many respectable phrasal verb dictionaries, and I believe that Wikt can be better than most simply by attempting to formulate a few "tests" for inclusion (a la "fried egg" and "Egyptian pyramid"). One example that I see in this discussion is run off with. Why is this different from wander off with? Or maybe they are not so different after all? Certainly run off with 1. has more than one meaning, 2. has the idiomatic meaning of steal, (not forgetting the to get married example above of John and Jane), 3. changes the basic meaning of run, and 4. demonstrates what I call inseparability of meaning, that is to say, remove either particle, and the sense changes. Wander off with, if it can be demonstrated to also mean steal, which is a possibility, would also qualify as phrasal verb for the same test.
Grammatical construction tests I think would be good, as per pick it up vs pick up it. EP makes a similar point in his parsing analysis of take off. If examples can be shown to parse as an inseparable unit, eg. Take off your shoes before you enter. then there is a case for there being a phrasal verb entry, even though take + off could appear to be SoP in other examples, eg. Take the laces off the shoes.
I also think that a definitive list of possible particles would help in the weeding out process. -- Algrif 16:19, 8 June 2008 (UTC)[reply]
It appears that I had somewhat misremembered. The CGEL argues against phrasal verbs mainly at a terminological level where, "The view taken here, however, is that the underlined expressions in the [a] example in [3], despite their idiomatic interpretations, do not form syntactic constituents, any more than the underlined word sequences in the [b] examples form constituents." The forms they contrast are:
  Kim referred to your book        vs.    He flew to the capital
  He put in his application        vs.    He carried in the chairs
  I look forward to seeing you     vs.    I ran forward to the desk
  He paid tribute to his parents   vs.    He sent money to his parents
The CGEL does, however, recognize "prepositional verbs". These are those verbs that select a prepositional phrase complement containing a specified preposition along with its own complement. Notice that under this definition, referred itself is a prepositional verb, but referred to is not. Those verbs that select a specified PP complement are of two kinds: those with mobile PPs and those with fixed PPs. Verbs taking non-specified PP complements and those with specified but mobile PPs can be distinguished from those with fixed PPs, according to the CGEL, in that the fixed ones do not allow:
  • fronting of the preposition in relative, open interrogative, and it-cleft constructions.
  • repetition of the preposition in coordinated complements (e.g., *I came across some pictures and across some letters.)
  • insertion of an adjunct between the verb and the preposition.
Another test in which the prepositional verbs can generally be distinguished from others is
  • Passives are usually allowed in the case of prepositional verbs, but not always. In contrast, they are usually unacceptable in non-prepositional verbs.
The CGEL notes 6 possible constructions
  verb [prep + O]
  verb O [prep + O]
  verb [prep + O] [prep + O]
  verb [prep + C]
  verb O [prep + C]
  verb [prep + O] [prep + C]

--Brett 14:41, 10 June 2008 (UTC)[reply]

On top of the category of prepositional verbs, the CGEL also recognizes a verb-particle-object construction (e.g., take down the poster, let go her hand, make clear the intent), verbal idioms containing intransitive prepositions (e.g., He gave in), verbal idioms containing NP + transitive prepositions (e.g., we lost sight of our goal), and other types of verbal idiom (e.g., make sure, given to understand, make do with, have in mind to change his hair).--Brett 15:04, 10 June 2008 (UTC)[reply]

Hidden categories

I don't know if this MediaWiki feature has been enabled here, but if so, polyethylene shows one good example for where it would be useful. __meco 13:13, 6 June 2008 (UTC)[reply]

Why? I'm more likely to spot a problem because of the category list than by seeing it in one of the various little categories. So, for me, the visible categories make it more likely I'll do something to improve the article. I expect this is true for others as well. If the categories were hidden, then that benefit would go away. --EncycloPetey 13:28, 6 June 2008 (UTC)[reply]
I, being a Norwegian native speaker, have a different perspective on this. I check the Norwegian category and then clean it out on a not too regular basis. __meco 13:54, 6 June 2008 (UTC)[reply]
Registered users have the option of having the hidden categories appear. To me the question is the effect of hiding a category on unregistered users and occasional users who are not familiar with the choices afforded them by WT:PREFS and even my preferences.
I can see the problem with polyethylene. The non-English language-specific maintenance categories would appear to be prime candidates for hiding. DCDuring TALK 14:24, 6 June 2008 (UTC)[reply]
It seems to me that having these categories on display is useful, if people want to remove them from the entries they should fix the cause of the problem instead of just hiding the categories. Conrad.Irwin 14:30, 6 June 2008 (UTC)[reply]
That doesn't quite work. No matter how much I (or Meco) want to get rid of them, we can't; we don't have the information. I certainly don't know any of the genders in question (;-). It isn't like a syntax error or a bad header or some such. Robert Ullmann 14:39, 6 June 2008 (UTC) And as noted, editors can show them, users won't see them (and would probably only be confused by them) Robert Ullmann 14:47, 6 June 2008 (UTC)[reply]
Hiding them (maintenance cats that tend to flock together like these) is a very good idea: we should much more often think about the presentation to users, not (just) editors. That is the idea isn't it? That most people looking at a page are using the project as a reference, not involved in edits? I put the magic word for the ttbc categories in the boiler-plate template {{ttbccatboiler}}, so that we could show them again with one edit if desired, it seems to be satisfactory. In this case, a {{gendercatboiler}} (with a couple of parameters for the genders in the language as well) could do the same, and generate the text in most cases. I do think we should consider carefully for each class of cats whether they ought to be hidden, and not go overboard. This class seems to me to be something to be hidden by default. Robert Ullmann 14:36, 6 June 2008 (UTC)[reply]
It would be almost ideal if registered users could display only hidden categories that were for languages in their Babel listings (above a certain level!!!) or specify which language or other class of category to display. DCDuring TALK 15:15, 6 June 2008 (UTC)[reply]
But how much of our obscure language cleanup/improvement is done by anon editors? How many people here first found out about these cleanup categories because they saw them on a page and decided to do the cleanup? Why is it that we want to show lots of redlinks, but we want to hide all the categories? This seems like we're shooting ourselves in the foot. --EncycloPetey 17:53, 6 June 2008 (UTC)[reply]
Actually, I think you shot yourself in the foot with that initial question EP. In my experience, the answer is basically none. Anons don't do the tedious cleanup like categorizing, formatting, etc. They do add translations, but that's about the extent of it. I am also in support of hidden categories, as I think the masses of clumped cats would make our site look ugly, and those who want to see them can. -Atelaes λάλει ἐμοί 18:01, 6 June 2008 (UTC)[reply]
If we hide the missing gender categories, there should be some alternate display to alert users/editors that the translations are missing gender information. Perhaps do this by adjusting the {{g}} template so that it displays an asterisk, with a corresponding note added to the bottom of the table explaining that those translations marked with asterisks are missing gender information. Perhaps add it to {{trans-bottom}}, and have its display (and categorisation in a supercat of the language-specific ones) triggered by the presence of a {{g}} template (I don't know if this can be done; if it can't, then editors/AutoFormat could add a {{translations missing gender}} template above {{trans-bottom}} where required).
As a way of cleaning up some of these, would there be a way for a bot to read any gender information from our entry for that word and/or the foreign language Wiktionary? If so, this would seem a natural fit with what tbot does.
Tbot and/or AutoFormat could add {{g}} templates to translations that have no gender template. This would obviously need to work with either a list of languages that have gender or a list of those that don't. Thryduulf 20:59, 6 June 2008 (UTC)[reply]
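As a rough illustration of the bot idea above, here is a minimal Python sketch of adding {{g}} to translation lines that carry no gender template. It is only a sketch under stated assumptions: the line format ("* Language: [[word]]"), the set of gender template names, and the function name add_missing_gender are all hypothetical, and this is not how Tbot or AutoFormat actually work.

```python
import re

# Assumed gender templates; the real set on the wiki may differ.
GENDER_TEMPLATE = re.compile(r"\{\{(?:m|f|n|c|g)\}\}")
# Assumed translation line shape: "* Language: [[word]] ..."
TRANSLATION_LINE = re.compile(r"^\*\s*([^:]+):\s*\S")


def add_missing_gender(line, gendered_languages):
    """Append {{g}} to a translation line that lacks a gender template.

    gendered_languages is the list of languages that have grammatical
    gender, which the proposal says the bot would need to be given.
    """
    match = TRANSLATION_LINE.match(line)
    if not match:
        return line  # not a translation line; leave untouched
    language = match.group(1).strip()
    if language not in gendered_languages:
        return line  # language has no gender; nothing to request
    if GENDER_TEMPLATE.search(line):
        return line  # gender (or a {{g}} request) is already present
    return line.rstrip() + " {{g}}"
```

For example, with gendered_languages = {"French"}, a line such as "* French: [[chat]]" would gain a trailing {{g}}, while a Finnish line or a line already tagged {{m}} would be left alone.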
Is there any need to have Category:Requests for autoformat show? Can we hide it?—msh210 18:48, 11 June 2008 (UTC)[reply]

Proposed change of wording to {{PL:pedia}}

Copied from Template_talk:PL:pedia to allow for a wider audience.

As this template is for External links, or See also sections, the wording is out of place. The fact that Wikipedia has an article is not the same as saying that more information can be found by clicking on the link.

I would like to change this template to one of the following:

  • (The French) Wikipedia's cat article.
  • Cat on (the Spanish) Wikipedia.

instead of

So that the flow of meaning is more consistent. I have intentionally removed the link from Wikipedia, I don't think that is necessary anymore. I have also intentionally removed the quote marks, as the emboldening suffices to remove the literal meaning of the word. In both of the examples the words in brackets are intended to be removed for links to the English Wikipedia. Conrad.Irwin 14:57, 23 May 2008 (UTC)[reply]

That makes sense to me; and, likewise with all the other {{PL:*}} templates. —RuakhTALK 18:35, 23 May 2008 (UTC)[reply]
Oh, but if we're not going to be linkifying the project names, maybe we should include their tag-lines, like:
  • Cat on (the Spanish) Wikipedia, the free (Spanish) encyclopedia.
? —RuakhTALK 18:42, 23 May 2008 (UTC)[reply]
To be honest I'm not a big fan of taglines they just (as I suppose they are intended to) sound like spam. Conrad.Irwin 00:01, 7 June 2008 (UTC)[reply]
I sometimes accidentally click on the blue-linked "Wikipedia" instead of the subject, so I like the recommended change. It would seem to be the kind of mistake many would make. How about a small-target link for those who want information about the Wiki, not the subject? Or perhaps we could keep the full-size blue-link for an "initial" period. Species, for example, might warrant it. DCDuring TALK 16:56, 7 June 2008 (UTC)[reply]
Yes, for Wikispecies some explanation would be useful, though whether it is better to link to the Wikipedia article or to the project's main page I'm not sure. I set up {{PL:pedia2}}, so if people want to experiment then they can do so, as this seems unopposed, I'll make the change to {{PL:pedia}} in the next few days. Conrad.Irwin 10:29, 8 June 2008 (UTC)[reply]

Variant forms of Chinese Characters?

How should we deal with variant forms of Chinese characters? (I’ll archive this discussion at Wiktionary:About Chinese characters afterwards.)

Currently they are listed without much indication of which form they are, and variant forms are only listed under the vague “see also” hatnote.

Distinguish two issues:

  • character itself (the topic here)
  • use in particular languages (I’ll address this in a second thread)

For the character itself, there are two issues:

  • what form is this character?
  • what are the variants?

Concretely:

  • I looked at these characters 1: 2: and became confused.
  • I suggest adding new fields to {{Han char}}:
    • one called f= for which form, and
    • separate fields to list all variants.

As far as I can tell, all characters fall into one of the following 9 categories; I list suggested abbreviations below.

In outline, there are:

  • Traditional forms (the core)
  • Simplified: Simplified Chinese, shinjitai (Japanese simplifications)
  • Country-specific: Japanese, Korean, Vietnamese
  • …and a few other Japanese ones (Japanese-specific traditional forms, idiosyncratic simplifications, and errors)

In detail:

(Note that Jōyō kanji (常用漢字) and Jinmeiyō kanji (人名用漢字) are written in shinjitai, while Hyōgaiji (表外字) are written in kyūjitai.)

Characters should be classed as Traditional or Simplified if possible; for instance, only class a character as kyūjitai if that form differs from Traditional (but list both the traditional form and the kyūjitai form as variants on the simplified character page). Non-Traditional/Simplified should be classed into the relevant “(Country)-only CJKV Characters” category.

Returning to my original problem, 1: is the shinjitai form, used in Japanese, and 2: is the Traditional form, used in Chinese, and kyūjitai form (this is clear in my browser setup because I have different fonts for Japanese and Chinese coverage): they should be marked as such and refer directly to each other as variant forms.

How does this sound?

Nbarth (email) (talk) 00:52, 8 June 2008 (UTC)[reply]

The template {{zh-forms}} is similar to what I’m talking about, but is specific to Chinese, and deals more with phrases/multiple characters than with individual characters.
Nbarth (email) (talk) 01:27, 8 June 2008 (UTC)[reply]
I also created a {{ja-forms}} template, which might suit your needs. See 図書館 for example. -- A-cai 06:38, 8 June 2008 (UTC)[reply]
Hi A-cai,
Thanks, yes, {{zh-forms}} and {{ja-forms}} address a form of my second question “show the variants”, though I think variant forms should be listed in the {{Han char}} entry itself, for instance, so they can easily be displayed in-line or extracted programmatically; displaying the {{zh-forms}}/{{ja-forms}} box would ideally be selected by user preferences.
Thinking on this more, I think what I’m suggesting is:
  • Extend {{Han char}} to also reflect the form of the character, as in {{cmn-noun}} and outlined above.
  • In fact, rather than a f= named parameter, have an (optional for now, mandatory once forms have been fixed for existing entries) positional parameter, as in {{cmn-noun}}, which states what form a character is. This should have t, s, ts, tsh, tssh for various combinations of traditional, simplified, shinjitai.
Nbarth (email) (talk) 15:09, 8 June 2008 (UTC)[reply]
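To make the proposed form codes concrete, here is a minimal Python sketch of how a code such as t, s, ts, tsh or tssh could be split into the forms it names. This is an illustration only: it assumes "sh" stands for shinjitai and is read greedily (so tsh means traditional plus shinjitai), and the names FORM_NAMES and parse_form_code are hypothetical, not part of {{Han char}}.

```python
# Assumed meaning of each token in a form code.
FORM_NAMES = {"t": "traditional", "s": "simplified", "sh": "shinjitai"}


def parse_form_code(code):
    """Split a form code like 'tssh' into a list of form names."""
    forms = []
    i = 0
    while i < len(code):
        # Prefer the two-letter token "sh" over a lone "s".
        if code.startswith("sh", i):
            token = "sh"
        elif code[i] in ("t", "s"):
            token = code[i]
        else:
            raise ValueError(f"unknown form code: {code!r}")
        name = FORM_NAMES[token]
        if name in forms:
            raise ValueError(f"repeated form in code: {code!r}")
        forms.append(name)
        i += len(token)
    return forms
```

Under these assumptions, parse_form_code("tssh") yields traditional, simplified and shinjitai, and parse_form_code("tsh") yields traditional and shinjitai.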
As this seems to not have raised objections, I’ve put a form of the above classification and suggestions (without modifying {{Han char}}) at Wiktionary:About Chinese characters#Categories of characters – if it’s not ok, please change.
Nbarth (email) (talk) 23:15, 12 June 2008 (UTC)[reply]

Use of Chinese Character Forms in specific languages

This addresses the second point in the above thread

I suggest:

For the main (Translingual) character section:

  • if a meaning (but not the form) is country-specific, it should be at the end, and flagged as such

The only example I know of is kokkun (国訓), a Japanese-specific meaning, for which we could add a template: {{ja-kokkun}}

For the language-specific sections:

  • if a character form is the standard form, continue as present
  • if a character form is not used in a given language, such as a Simplified Chinese character that does not agree with the shinjitai form, like , then that language should not be listed on the page
  • if a character form is non-standard but used, like kyūjitai that are tolerated in names, it should be flagged as such (for this example, {{ja-kyu-name}}, unless someone knows the official name for such characters), and only permitted uses included. Notably, kyūjitai forms of Japanese words are ok to include (for historical reference), but should be clearly flagged as such.

Nbarth (email) (talk) 00:58, 8 June 2008 (UTC)[reply]

On “forms not used in a language”: examining ja:歩 shows that while it does list the other languages, it is only to say “use the other form”. (Or whatever “参照” means.)
Nbarth (email) (talk) 00:27, 9 June 2008 (UTC)[reply]

Permissions: Add Template:yue-hanzi to correct category

Could someone so-permissioned add:

<noinclude>[[Category:Chinese templates]]</noinclude>

to Template:yue-hanzi, as it is an (important) Chinese template? (Following Template:cmn-hanzi.)

Thanks!

Nbarth (email) (talk) 16:28, 8 June 2008 (UTC)[reply]

Specific Universal Changes in Wikisaurus

You said a broad hand, but let's just say I don't want to overstay my welcome. I see several changes that I think would be beneficial to the wikisaurus project.

  1. The header is kinda clunky and could stand a cleanup. I've created a proposed header here. My reasons for this change are to remove some of the extraneous white-space within the page in order to bring the user's main interest in immediate view, and to give a better aesthetic to the page.
  2. Many of the categories suggested in the template page are actually duplication of effort from the wiktionary pages. I feel we should remove the duplication of effort and therefore simplify the project. See altered page.
  3. Much of the page will be duplicated in the actual wikisaurus process. Couldn't we just let the process be our workhorse? Amina (sack36) 13:24, 10 June 2008 (UTC)[reply]
The header/logo seems to take up too much precious vertical space on the first screen. If it could appear in the upper right, it could be the same size (or even larger). If you feel it needs to be on the left, it should be narrower. (It could be wider.) DCDuring TALK 14:11, 10 June 2008 (UTC)[reply]
The header/logo actually took up a great deal more space on the original pass. This is quite a bit shortened. Amina (sack36) 17:14, 10 June 2008 (UTC)[reply]
Space above the fold is precious. Every extra keystroke it takes for someone to get what they want is a problem, whether it's paging down or clicking on a link. The existing wikisaurus pages seem extravagant in their use of space, both horizontally and vertically. Whitespace certainly has its uses, but not for squeezing important content off the first screen. Some of the efforts to improve Wiktionary have focused on getting the table of contents for an entry onto the otherwise underutilized space on the right side of the screen; hiding long lists of translations and related and derived terms; horizontalizing lists such as of synonyms; and even pushing sister project links toward the bottom of the entry. Long etymology and pronunciation sections also squander space above the fold. In-line citations can also, but usually farther down the page. DCDuring TALK 18:44, 10 June 2008 (UTC)[reply]
One thing, on Wikisaurus:obese, all the links are prefixed with ws: (which doesn't work, but looks like it should go to wikisource), shouldn't they just link straight to the dictionary entries? Conrad.Irwin 18:48, 10 June 2008 (UTC)[reply]
What is the purpose of having all that extra stuff on the wikisaurus pages? Etymology and pronunciation belong in wiktionary, as do translations. As for related and derived terms, that's the whole point of a thesaurus. They should act as synonyms, with the closest match listed first and the least likely, the derived and the archaic at the end. The pages should be simple, with the logo and the selected word at the top, the synonyms next and the antonyms last, each with their own header, of course. That's it. No extra stuff that's already being taken care of in wiktionary. Amina (sack36) 20:58, 11 June 2008 (UTC)[reply]

REDIRECTION IN WIKISAURUS -- I noticed when doing the wikisaurus that the word "fat" had been redirected to "obese". I see two problems with that.

  1. The word "fat" has several meanings. By redirecting to obese, you don't allow the "fat" meaning "lard" or the computer term (although that may be spelled "phat" these days).
  2. Redirection end runs the ease of the simple model. It makes the entire creation of wikisaurus orders of magnitude harder. Now we have to make the determination through programming which is meant. Leaving it open and putting all the different meanings of a word on the same wikisaurus page, we don't have to make that determination, the user can. Amina (sack36) 09:21, 12 June 2008 (UTC)[reply]
The redirection is from a specific sense of fat: "in the sense of obese" I would have thought that is precisely what we want. I would also think that we would want to substitute the wikisaurus link for part of the list of synonyms. It would be particularly nice if the total synonyms line for a sense would only be one line long on a full screen device. Can the wikisaurus link be on the same line as a short list of synonyms? DCDuring TALK 11:22, 12 June 2008 (UTC)[reply]
I'm not sure I understand what you're saying, DC. Are you saying we want the people to be redirected? Wouldn't that defeat the purpose of simplicity? Also, why would we want to limit the synonyms of a given word? Isn't that counter to the establishment of the wiki in the first place? Amina (sack36) 16:05, 12 June 2008 (UTC)[reply]
He's saying that Wikisaurus:fat (obese) is a redirect to Wikisaurus:obese, but that Wikisaurus:fat is no such thing. —RuakhTALK 19:21, 12 June 2008 (UTC)[reply]
I must have misunderstood which application of a Wikisaurus link we were talking about. I'm not sure how folks first come to Wikisaurus and how people use Wikisaurus once they are aware of its existence. I focus on the links from Wiktionary entries, which, I assume, are always under the synonyms header. The synonyms headers are likely to be "redirecting" and, in a proper entry for a multi-definition word, will be doing so from a specific sense. In my analysis, the user would already know for what sense of a word he was seeking synonyms.
I am simply unfamiliar with other means of using wikisaurus, which seem to be what Sack36 was talking about.
I justify my focus on the links from Wiktionary because that seems to be the most likely way Wikisaurus will capture users. Once 'Saurus handles that class of usage well for a large number of high-volume synonym classes, other interfaces might be developed or improved.
Is there any good information about what types of entries get visits and contributions? I assume that sex, invective, oaths, and insults are high on the list. Those classes of entries would at least provide sufficient traffic to generate tests of entry designs. DCDuring TALK 19:47, 12 June 2008 (UTC)[reply]
Wow, DC, you are so ahead of me! By the way, the redirect on fat was not labeled fat(obese). It was just "fat". Now, about the 'saurus. I'm going to summarize where we think we are at this time.
      • The overall look to date is still too complicated
        • Concatenate the heading to allow visibility to the meat of the page
        • Remove all but synonyms and antonyms at this time
          • once a significant body of work is in place, more complexity can be added
      • The portal into wikisaurus will be through the definition of each word in Wiktionary
        • once in wikisaurus, the pattern of finding things will change to a multi-dimensional link system.
        • wiktionary will maintain the bulk of the information, leaving 'saurus to do word linkage.

Does that outline seem right? Any additions/subtractions/alterations?

Well, actually I have one or two already. By using body parts as our introductory foray into the wiktionary we are inviting a great deal more of the perv group to join us in creation of this project. What say we hold off on the body parts until quite a bit later? In wiktionary there is no need for connections between pages. Each definition is autonomous. Wikisaurus is just the opposite. We'll be using links for almost everything. It's like a huge database where wiktionary is the main base while wikisaurus is the key. It uses one-to-many and many-to-many connections. Amina (sack36) 23:52, 13 June 2008 (UTC)[reply]

I'd have to spend more time looking at 'Saurus to be of any help. I've only been here during a period when it's been mostly neglected. Any illustrations of a good entry? Any illustrations of interesting features, bad features, common user problems (pervs, etc)?
With the multi-definition words I assume that the links are from the synonyms section which may have the multiple senses, more than one (but not all) of which will have corresponding 'Saurus entries.
When you say "Remove", I assume you mean "comment out" the content beyond synonyms and antonyms for now.
Can you say more about the "multi-dimensional link system" ? DCDuring TALK 00:12, 14 June 2008 (UTC)[reply]

Finalization of format - really!

Well it was an interesting week trolling the back roads of Wiktionary and Wikisaurus to find the different styles of wikisaurus pages and throwing away the duplicates and insane. I think I may have some bug bites left over from the body part section. If people have done this before me I don't want to know. Humor me. The different types seemed to break down into three noticeably different schemes. I've added a fourth that is of course perfect! I have the salient points of each and a URL so you can actually see them. I have indeed removed parts from a couple of them because they were being addressed in wiktionary where they belonged. Etymology is not a heading for Wikisaurus. If I disagreed with something being part of wikisaurus but I couldn't find it in wiktionary, it stayed in the pages it was found. We all get a blame in this is what I say. So here goes:

  1. Style 1
    1. the header (header is the same on all) is inline with the Table of Contents.
    2. The differentiation of language is included on the line with part of speech.
    3. All the links go to Wiktionary.
    4. Pseudo-synonyms are included in a separate part of synonyms. (what's a pseudo-synonym?)
    5. each part of speech is represented on a different page
  2. Style 2
    1. Table of Contents in line with header and text
    2. Each meaning of the headword is labeled in the Table of Contents
    3. Pseudo-Synonyms, Idioms, and Slang are included as separate sections
    4. the synonyms etc. are presented vertically in a table
    5. the table provides a synonym with a link to wiktionary, the same one with a link to its own page on wikisaurus, and a definition of that word.
    6. each group of synonyms includes a definition as well.
  3. Style 3
    1. Table of Contents in line with header and text
    2. the Headword provides a link to wiktionary
    3. each synonym is represented in the same grid as described above
    4. Language is the first breakout rather than meaning or part of speech
    5. near synonyms, Colloquialisms and Slang are broken out to be on their own
    6. near synonyms, Colloquialisms and Slang are presented offset from synonyms
    7. wikisaurus links is listed, though I have no idea what function it performs.
    8. Roget's Thesaurus classification for this headword is given.
  4. Style 4
    1. Table of Contents is along right hand border
    2. the Headword provides the only link to wiktionary
    3. a synonym is used in lieu of a definition to differentiate separate overall meanings
    4. all parts of speech are kept on the same page
    5. synonyms and antonyms are presented horizontally
    6. only synonyms and antonyms are differentiated since Idioms, Colloquialisms and Slang will be defined as such in wiktionary.
    7. all words will be headwords
    8. table of contents shows the division of word meanings but synonym and antonym are assumed

Obviously the last one listed is mine. I'm hoping by seeing the page in action y'all will get a better understanding of how we can use the nature of the internet to do our work. The one change I'd like to make across the board is to change the size of the head word to be larger. It would really help people to know which word is the headword. Any comments? Questions? Complaints? Ice Cream?! Amina (sack36) 01:35, 15 June 2008 (UTC)[reply]

The outcome of the highly debated VOTE seems to be in favour of not including jargon in the main Wiktionary namespace. This leaves us with the question of what exactly to do with it. I would like to propose the following.

For terms that are only WMF jargon (!vote)
Replace the page with {{only in|{{in glossary}}}}
For terms that are also used elsewhere (RFC)
Add {{xsee}} or perhaps a more flexibly worded {{also in}} to the top of the page.

Anyone else have any thoughts? Conrad.Irwin 18:32, 10 June 2008 (UTC)[reply]

I'm not sure that I understand the implications of what you suggest. But the question of making Wiktionary jargon usable to new users quickly is vital. Terms used elsewhere in Wikiworld are of secondary importance.

I recollect that six months ago I was constantly coming across terms that I could not make sense of. I naturally hoped that I could find their meaning with one click from the screen on which they arose. Embedded blue-linked terms are great in that regard. I was, of course, unaware of the option of double-clicking on a word to open up the associated entry. I would try Help in the navigation pane. Then I would put the term in the search window, often not finding it, except after fooling with advanced search, making guesses as to where such terms might be. The things I would be looking for were: 1. pages that people referred to; 2. terms that had the look (to me) of insider Wiktionary jargon; and 3. linguistic terms, especially those used in a somewhat different way in Wiktionary than in most of the rest of the world. All of these need to be addressed. They do not all fit into the category that was voted on. I would argue that we need a single readily accessible main Glossary that encompasses page shortcuts, Wiktionary jargon, and linguistic jargon that Wiktionary uses in a way different from prevailing meanings.

Each of the three approaches I tried needs to be covered as well as any others that a new user might think of. Blue links are likely to work only for fixed text, not for discussions. Double clicking is the same as search with regard to the destination, but requires no typing, a big plus. The navigation screen could have a link to the Glossary. The Help page could have a link to the Glossary. The devices Conrad is suggesting only address the use of the search window. If I understand the proposal and how search works, they seem to be close to the best we can do without significant changes in how search works. Putting the Glossary in the navigation pane and making it more inclusive might help also. DCDuring TALK 19:20, 10 June 2008 (UTC)[reply]

Well we have Wiktionary:Glossary, this was mainly intended as a discussion of how to point people there, but we might need to improve it to incorporate more information. Conrad.Irwin 19:35, 10 June 2008 (UTC)[reply]
We have two glossaries, one for wiktionary jargon, one for terms used in entries. There is some overlap. Moreover, for new users, the distinction between the two is not likely to be obvious. What you propose eliminates the problem by directing the user to the appropriate glossary (presumably the jargony one). I am interested in helping new users find the terms now in both. User expectations formed in the process of finding one set of non-entry terms is likely to govern search for other types of non-entry terms. DCDuring TALK 20:25, 10 June 2008 (UTC)[reply]

New functionality of Template:rfscript

Per the apparent consensus of #Language specific help templates, I have added an additional function to {{rfscript}}. The template now takes a {{{lang}}} parameter, with the input being the ISO code of the language, similar to many existing templates. If a language parameter is entered, the template categorizes the word into [[:Category:{{{lang}}} articles which need {{{1}}} script]] instead of [[:Category:Articles which need {{{1}}} script]]. It seems to me that these categories should be double-categorized under their general script request category and the general language attention category, so I have placed Category:Sanskrit articles which need Devanagari script into both Category:Articles which need Devanagari script and Category:Sanskrit words needing attention. Just wanted to let people know and open up the floor to any conflicting opinions on the matter. -Atelaes λάλει ἐμοί 07:36, 11 June 2008 (UTC)[reply]

PS, I think that in cases where a script is only used by one language (i.e. Avestan), I think it redundant to use the lang parameter. However, I think this the exception, not the rule. -Atelaes λάλει ἐμοί 07:40, 11 June 2008 (UTC)[reply]
I tried it with Punjabi but it did not seem to work: {{rfscript|pa}}. —Stephen 18:23, 11 June 2008 (UTC)[reply]
It still requires a script name, so: {{rfscript|Shahmukhi|lang=pa}} (or would it be {{rfscript|Arabic|lang=pa}}?). Then again, if the former, you shouldn't need to clarify the language, as Punjabi seems to be the only language using Shahmukhi. -Atelaes λάλει ἐμοί 18:35, 11 June 2008 (UTC)[reply]
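The categorization rule described in this thread can be sketched in a few lines of Python. This is a minimal sketch, not the template's actual code: it assumes the ISO code is mapped to a language name before the category is built (as the Sanskrit example suggests), and the LANGUAGE_NAMES table and both function names are hypothetical.

```python
# Hypothetical lookup from ISO code to language name (tiny subset).
LANGUAGE_NAMES = {"sa": "Sanskrit", "pa": "Punjabi"}


def rfscript_category(script, lang=None):
    """Return the request category for a script, optionally per language."""
    if not script:
        # As noted in the thread, a script name is always required.
        raise ValueError("a script name is required")
    if lang is None:
        return f"Category:Articles which need {script} script"
    name = LANGUAGE_NAMES[lang]
    return f"Category:{name} articles which need {script} script"


def parent_categories(script, lang):
    """Return the two parents proposed for a language-specific category."""
    name = LANGUAGE_NAMES[lang]
    return [
        f"Category:Articles which need {script} script",
        f"Category:{name} words needing attention",
    ]
```

With these assumptions, rfscript_category("Devanagari", "sa") gives the Sanskrit category named in the announcement, and parent_categories("Devanagari", "sa") gives the general script request category plus the Sanskrit attention category.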

ditransitive

This seems like a wonderfully obscurantist name for a common linguistic phenomenon. See {{ditransitive}}. If we are to keep it, it would seem to need to have a link to the principal namespace entry or to one of the two Glossaries, presumably the one for terms used in entries. Learners' dictionaries like Longman's DCE do not depend on this term, having a particular abbreviated notation for constructions taking "double objects" without prepositions. I suggest that it has no place among the terms that we use in making entries for our users. It of course deserves to be an entry. The phenomenon it describes deserves to be intelligibly noted in all of the relevant verb entries. It might be an appropriate name for the template, but "ditransitive" is yet another term we let discourage new users. DCDuring TALK 18:32, 12 June 2008 (UTC)[reply]

I tend to disagree. Wikipedia is not censored for nudity and Wiktionary is not censored for stupidity. Five minutes ago I had no idea what the word meant, but I found myself on an online dictionary and now do. While we shouldn't use overly technical terms just to look smart, if there is a term which describes a phenomenon, and it's the best term for the job, we should use it. I have no problem linking the term, like we do for {{archaic}}, and if there's a simpler and easier to understand word which means the same thing, then let's use it. Otherwise, let's stick with the correct term, and if people have the motivation to find out what it means, perhaps they can find a dictionary as quickly as I did. -Atelaes λάλει ἐμοί 18:50, 12 June 2008 (UTC)[reply]
Wiktionary is not just for us. We have some obligation to attempt to communicate to the larger population. The technical terms serve our needs in doing the work, but they fail in converting most new visitors to repeat users. We have 1.5% the usage of WP, 20% the usage of MW online, and 7% the usage of Answers.com. That doesn't seem like success to me. I would argue that we are aiming at an audience that consists of US. We don't even seem to be doing all that well at attracting more contributors, who would be people most like us. Will such an approach ever get donors interested? Will we ever be able to get technical resources for improvements without donor support?
There is nothing especially "correct" about technical terms. There are more convenient labels for phenomena for those who use them a lot. Web usability research suggests that a clickable link is not something that we can depend on for communication because users simply often don't click through, especially if they don't believe that they will get intelligible, useful information. Take a look at any successful site and you will see that clickable terms are almost always simple ones, especially on pages that a user encounters at the beginning of a visit to a site. Subsequent pages begin to reveal complexities and, yes, technical vocabulary. DCDuring TALK 19:25, 12 June 2008 (UTC)[reply]
We do not make lexical decisions based on economics or uncited research. I also disagree with your comment about the clickable terms on "successful sites". One of the most successful sites on the Internet during its first decade is UCMP, the University of California Museum of Paleontology. The clickable terms on their site during that time were almost entirely all technical terms. Thus, your premise is not a legitimate one, and valid conclusions cannot follow from your argument. --EncycloPetey 06:01, 13 June 2008 (UTC)[reply]
I have long had the feeling that this might be an ivory tower. Decisions appear to be made on unstated preferences, values, and standards as if this was some kind of academic institution. I would welcome a Vote to make explicit all such preferences, values, and standards.
What uncited research are you referring to? If it is the Site Analytics data about our low usage, I had previously offered the link for discussion [25]. If it is web usability research, I would welcome some discussion of the issue by my betters. As it is now we are making decisions based on utter ignorance of user behavior reinforced by apparent indifference to user needs.
I would be interested to determine what number of visits the UCMP site had or what constituted its success.
I am not at all opposed to including technical terms as entries, quite to the contrary. I am simply opposed to adding unnecessary technical terms that constitute a barrier to wider usage. DCDuring TALK 16:33, 13 June 2008 (UTC)[reply]
Like I said earlier, if there is a simpler route which conveys the same information, I am open to using it. Also, I wonder how much the use of ditransitive detracts from the experience of the average user. Most probably see it, perceive technical jargon, and gloss over (in much the same way as I imagine most average users gloss over "transitive"). It doesn't really take up that much space. However, those who are interested in grammatical understanding will either know what it means or take the time to look it up. The understanding of "ditransitive" is not necessary to learn the meaning of the word. When our software becomes a bit more sophisticated, perhaps we could have a setting to allow users to hide grammatical info if they're not interested in it. Until then, I am unwilling to let this information simply be prohibited. If we can have practical, useful information coexisting with esoteric word nerd information, why not do it? -Atelaes λάλει ἐμοί 16:47, 13 June 2008 (UTC)[reply]
{{ditransitive}} is used in five entries, four of them English. It doesn't seem to have taken our community by storm. There is a vast amount of usage information (especially about verbs) that we don't convey (use with prepositions [consistency and completeness are the issues with phrasal verbs], use with gerunds and infinitives, as well as double-object constructions), but it needs to be conveyed without technical vocabulary.
I favor hiding our use of technical terms by default. Templates referring to grammatical terms could be made to behave differently for anons and registered users. Anonymous users should not be confronted with an unfamiliar term like "ditransitive", but with something that might communicate the basics without requiring a click, but offering a click-through for further explanation. If registered users were to have the option of switching from the anon version to one using technical grammatical vocabulary, that would be great.
I am concerned with building up our usage from its amazingly modest levels. I would think this would be an objective worth working toward. If technical terms, obsolete definitions, complicated definition wording, and complex layout scare users away, that hurts us. I certainly don't care nearly as much how we refer to the underlying phenomenon anywhere outside principal namespace. "Transitive" and "intransitive" are terms that some non-negligible percentage of the population have been exposed to, as with the traditional names of parts of speech. DCDuring TALK 21:05, 13 June 2008 (UTC)[reply]
Our model for grammar tags is to use the simplest tags that clearly express the grammar of the term, and to let readers click through to the glossary for the less familiar terms. Just as "transitive", "intransitive", and "reflexive" are useful to many interested readers, some readers may find the now clickable "ditransitive" tags useful. Of course, if you can think of a layman replacement for "ditransitive", please suggest it. Rod (A. Smith) 21:48, 13 June 2008 (UTC)[reply]
By the way, I created {{label}} back in October to allow us to create labels that display differently for linguists and laymen. Robert Ullmann disapproved of it, though, on the grounds that it would confuse editors and break external automated consumers of our data. Perhaps somebody can suggest a better approach. Rod (A. Smith) 21:57, 13 June 2008 (UTC)[reply]
Thanks for the clickability. I believe our choice of vocabulary and conventions should be subject to usability evaluation, even testing. I am going to study Longman's DCE and other learner's dictionaries that have more usage information than conventional print dictionaries to find models. I doubt if I could successfully address the problem one term at a time, however. It would probably require some conventional notation or something.
What have the automated consumers of our data done for us lately? A facetious-sounding, but also serious question.
We have already successfully deterred many casual contributors by varied means. The survivors would seem capable of handling complications like differences between what is displayed for different classes of users. If we are going to open up more to new contributors, that would be different. DCDuring TALK 23:28, 13 June 2008 (UTC)[reply]

WT:ELE inconsistency

Just a quick question: I noticed that in Wiktionary: Entry layout explained the list of headers given right after #Order_of_headings has a different order from that of the sections (one put "translations" at the end, the other between "-nyms" and other terms). Which one is supposed to be current (as I was just looking to check that)? Circeus 01:36, 13 June 2008 (UTC)[reply]

I'm not sure what you're asking, but it sounds as though you are concerned that the headers are not discussed in the same sequence that they are to appear in an entry. Is this correct? If so, then this is partly historical, partly because the Translations section is more important than the Related or Derived terms sections, partly because the Related terms and Derived terms do not always appear nested as a L4 header, and possibly as well for other reasons I'm not aware of. In any case, the order given in the Order of heading section is correct as of the vote we took on the matter. In any event, it is not necessary that the sections be discussed in their L4 sequence because, as noted above, some of these headers do not always appear under a POS, and so it is convenient to discuss those sections together following the ones that do always appear under a POS header. --EncycloPetey 05:53, 13 June 2008 (UTC)[reply]
So as far as my actual question is concerned, that appears to boil down to "yes, when they are at the same level, translations are supposed to be last". Thanks. Circeus 18:53, 13 June 2008 (UTC)[reply]
That isn't what you asked. And, no, Translations are not always last; there are a few sections that follow them as noted in WT:ELE. --EncycloPetey 20:12, 17 June 2008 (UTC)[reply]

Non-gloss definitions

Most of the dictionaries I have distinguish between (a) gloss definitions, wherein the definition has the same part of speech and meaning as the defined term; and (b) non-gloss definitions, used for the relatively few terms for which it is difficult or impossible to provide a gloss. Such a distinction seems important to me, so I created {{non-gloss definition}} and applied it to definitions for of and hear, hear.

I don't know what the default style should be, but the Category:Form of templates are used to create non-gloss definitions, so I reused the 'use-with-mention' class from there. Questions, comments, observations, and points of refutation are welcome. Rod (A. Smith) 19:18, 13 June 2008 (UTC)[reply]

I like the idea, and while I'm not too happy with the way of implementing it (in particular, there's no way I can remember that template's name) I can't think of anything better. Conrad.Bot 13:55, 15 June 2008 (UTC)[reply]
I like it! :-)   The next step is to decide how to word such definitions; for example, at of you're using a subjectless finite verb phrase of which the headword is the implicit subject, but at hear, hear you're using a determinerless noun phrase indicating what the headword is. Elsewhere, I've sometimes used determiner phrases (such as "The definite article") and sometimes adjective-y/non-finite modifier clauses ("Used to […]"). Each of these approaches makes sense, but it's probably best to aim for consistency. —RuakhTALK 15:35, 15 June 2008 (UTC)[reply]
Yeah, consistency would be good. I'm not certain what wording style to use, and as you say, all three styles have their advantages. Regardless, applying {{non-gloss definition}} (or whatever better-named template anyone might suggest) to such definitions will give us a convenient list of entries to review if we make up or later change our collective mind about the wording style. Rod (A. Smith) 20:57, 16 June 2008 (UTC)[reply]
True. :-)   —part of our collective mindTHINK 22:55, 16 June 2008 (UTC)[reply]
I'm not convinced that this is a useful approach. In particular, this applies primarily to parts of speech that cannot be defined in a way that limits the part of speech. For example, no prepositions can be defined with definitions that are themselves prepositions. This applies also to pronouns, interjections, conjunctions, and the like. It seems rather silly to say that these parts of speech will receive definitions formatted in one way, but those parts of speech will use a special template to format all their definitions in a different way. The gloss / non-gloss distinction is relevant in many cases, but I think the way this template has been planned for use is a mistake that will lead to confusion rather than clarification. --EncycloPetey 20:08, 17 June 2008 (UTC)[reply]
I hadn't planned to limit {{non-gloss definition}} to any particular part of speech. Many senses of many prepositions can be defined using words that function in the same grammatical role as preposition (of (belonging to), over (on top of), with (in the company of), etc.). So, yes, it would be rather silly to say that certain parts of speech will receive definitions formatted in one way, but other parts of speech will use a special template. Fortunately, nobody is saying that. Rather, this follows the sensible convention of nearly all of the respectable dictionaries I've seen. That is, definitions that are worded as glosses get one style. The few definitions that cannot be expressed well with a gloss get a different style. Do you know of a good dictionary that doesn't make that distinction? Rod (A. Smith) 20:34, 17 June 2008 (UTC)[reply]
You mean like MW3 ( of — "used as a function word to..." gets no special formatting); AHD ("of" — "used to indicate an appositive" gets no special formatting), Oxford Advanced Learner's Dictionary ("of" — "used to show the position of something" gets no special formatting), and likewise for the Compact OED. In fact the only dictionary I own that makes a formatting distinction is the Random House Dictionary, 2nd ed. In which dictionaries have you seen this distinction made?
I realize you hadn't planned to limit this template to certain parts of speech. My point is that, intentionally or not, the explanation of how this template is to be used means that it will apply almost universally to certain parts of speech while almost never appearing in others. As you have noted, it will also provide a push for people to define prepositions in certain convoluted or unenlightening ways to avoid using the template, which I feel is another undesirable outcome of the template's use. --EncycloPetey 21:08, 17 June 2008 (UTC)[reply]
But MW3 does use special formatting. It begins each non-gloss definition with an m-dash, whereas gloss definitions get no such treatment. In addition to MW3, Britannica/Funk & Wagnalls Standard Dictionary of the English Language (a 1960 edition, which begins special definitions with a clear mention of the headword within a sentence that describes it) and Webster's Encyclopedic Unabridged (which encloses special definitions in parentheses) all distinguish gloss from non-gloss. If you're mainly concerned with overuse, would it assuage your concerns if we add some warning to the template documentation that we prefer glosses when practical? Even if editors begin to overuse the template, though, the existence of such a template gives us a convenient list of entries from which to cull unnecessarily convoluted definitions. That's a good thing, right? Rod (A. Smith) 22:33, 17 June 2008 (UTC)[reply]
OK, now that you point out the m-dash I notice it, but it isn't especially obvious, and when definitions include a gloss and non-gloss, the intervening m-dash does not register visually at all.
Overuse? It's not overuse that bothers me. Please go back and read what I wrote; I never complained about it being used too much. I'm concerned that we're setting up a double standard where we use one kind of definition format for most parts of speech, but a different format primarily for the other "lesser" parts of speech. --EncycloPetey 22:58, 17 June 2008 (UTC)[reply]
I've read and re-read your original reply above, but if you're not concerned with overuse, I fear I'm no closer to understanding you. Rather than frustrate you with further questions, I'll wait for somebody else to shed some light on your concerns. Rod (A. Smith) 23:17, 17 June 2008 (UTC)[reply]
Longman's DCE puts such definitions in parentheses, FWIW. DCDuring TALK 21:29, 17 June 2008 (UTC)[reply]
That's also what Random House does. --EncycloPetey 22:03, 17 June 2008 (UTC)[reply]

Could someone please help me out with an addition to this template? Currently it displays par1 and par2 in italics and puts the PAGENAME into a default category. I'd like to add two optional parameters. One would be called "link" and if link=xx is provided, it would put it between parameter 1 and 2 (par1 + link + par2), indicating a linking vowel between the lemma and the suffix. If link=xx is not provided, it would just display par1 + par2 as before. The other would be called "cat" and if cat=yy is provided, it would put the PAGENAME into the category given in the parameter. If cat=yy is not provided, it would use the current default category. Thanks. --Panda10 11:59, 14 June 2008 (UTC)[reply]

That work for you? (In future requests like this should probably be at WT:GP.) Conrad.Bot 13:53, 15 June 2008 (UTC)[reply]
It works great. Thanks for your help. --Panda10 16:09, 15 June 2008 (UTC)[reply]
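For anyone reading this later, the behaviour requested above can be sketched outside template syntax. This is only an illustrative Python model, not the actual template code: the function name, the " + " joining, and the default category string are all assumptions, since the thread names neither the template nor its default category.

```python
from typing import Optional, Tuple

# Hypothetical placeholder: the real default category is not named in the thread.
DEFAULT_CATEGORY = "Default suffix category"


def render_suffix(par1: str, par2: str,
                  link: Optional[str] = None,
                  cat: Optional[str] = None) -> Tuple[str, str]:
    """Return (displayed wikitext, category) for the described parameters."""
    # With link=xx the linking vowel goes between par1 and par2;
    # without it, the output is par1 + par2 as before.
    parts = [par1, link, par2] if link else [par1, par2]
    # Each part is shown in italics (wikitext '' markup).
    display = " + ".join(f"''{part}''" for part in parts)
    # With cat=yy the page goes into that category; otherwise the default.
    category = cat if cat else DEFAULT_CATEGORY
    return display, category
```

Calling `render_suffix("stem", "suffix")` keeps the old behaviour, while `render_suffix("stem", "suffix", link="e", cat="Some category")` exercises both optional parameters.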

It's been protected since Oct 2007 with the reason: "vandal target - leave for a few days then delete with "misspelling of" comment". Seems like time to reevaluate its protection... 75.212.217.187 (really, w:en:User:JesseW/not logged in) 75.212.217.187 06:45, 16 June 2008 (UTC)[reply]

We only include common misspellings. Since supercalifragilisticexpialidocious isn't so common itself, it would be a very difficult case to argue. DAVilla 04:19, 17 June 2008 (UTC)[reply]
What makes us sure of the right spelling? Amina (sack36) 05:49, 18 June 2008 (UTC)[reply]
I would imagine most people who wanted to know the spelling would go to the source, so you could ask Buena Vista Pictures that. Of course this isn't to say that there couldn't be a more popular spelling, but you'd have to make that case. DAVilla 20:40, 18 June 2008 (UTC)[reply]

Proto-Indo-European (PIE)

Is there possibly a place on Wiktionary that might be dedicated to Proto-Indo-European roots (for example, *albho-) with accounts of all the words from various languages that arise from those roots? Or is that an inappropriate role for Wiktionary? I know that as a language student, studying everything from Spanish to Sanskrit, I often have difficulty looking for cognates across languages. I feel that there should be a forum for etymologies and cognate words somewhere within the various Wiki programs; however, I don't know whether Wiktionary is the proper place for this, or how such an operation would be handled. Thank you for your consideration (and feedback).

Have you tried Appendix:List of Proto-Indo-European roots? We try to include etymologies on words, but it's fairly specialist knowledge, so we need as much help as we can get. Conrad.Irwin 18:04, 16 June 2008 (UTC)[reply]
Yes that list, and everything inside Category:Proto-Indo-European language where there are more complete cognates list for individual reconstructions, without that much space restrictions. (everything from Hittite and Old Persian to modern languages - all written in the original orthography ^_^). Note however that the list mentioned by Conrad was originally compiled by folks on WP and transwikied here, and was mainly based on Pokorny's dictionary which, by today's standards, contains some "cognates" that are either false or too far-fetched to be considered reasonably tenable. --Ivan Štambuk 02:17, 17 June 2008 (UTC)[reply]

Googleability.

[[xenization]] is one of the first several hits for google:xenization; but it's not a hit at all for google:xenization definition, nor for google:xenization dictionary. And even though we're one of the first several hits for google:xenization, the title is simply “xenization - Wiktionary”, which is only meaningful to those who either know what we are or can guess without any context.

Surely this is something we need to remedy?

RuakhTALK 12:59, 17 June 2008 (UTC)[reply]

Yes, but we're the best; we don't have to be popular. As long as the right people (like the ones looking for "MILF", not necessarily in a dictionary) come here, we're fine. If we make it too easy for us to be found, we'll just have more newbie contributors and we'd have to block them - or train them. DCDuring TALK 13:29, 17 June 2008 (UTC)[reply]
It doesn't matter if we appear low down, the problem is that we don't appear at all if the word "definition" is included in the search string. As people are very likely to use a word like definition to find definitions, we should help them find us by letting the search engine know what our pages contain. Conrad.Irwin 13:44, 17 June 2008 (UTC)[reply]
Obviously you're being either facetious or sarcastic, but I can't tell which. If the latter, then I apologize for whatever I said that upset you. :-/ —RuakhTALK 16:39, 17 June 2008 (UTC)[reply]
Sorry, I wasn't intending to be. Looking back I was just not reading what you were saying. Conrad.Irwin 08:57, 18 June 2008 (UTC)[reply]
msh210 had a go at fixing it (after some IRC discussion) by using Mediawiki:Tagline. This does seem to work for "dictionary", google:ablute and google:ablute dictionary, however not for google:ablute definition. As this change was not very long ago, it is possible that google hasn't recached the xenization page to get the tagline in. As the Tagline is hidden to most users we could change it to read "A definition from Wiktionary, a free dictionary" which would get definition (singular). The other solution would be to install User:Conrad.Irwin/MetaKeywords.php which was designed so that we can include these words in our meta description tag so that the search engines know that we have definitions. Conrad.Irwin 13:44, 17 June 2008 (UTC)[reply]
IMO we ought to get more hits than WP (not 1.5% of WP) eventually because our brief help is needed more often than an encyclopedia article. Google and dictionary look-up tools in text-editors seem like the drivers. It would be great if we appeared higher on Google searches so that more folks could see what we are doing. What are the currently relevant barriers to testing these google-placement-improving steps, evaluating results, and implementing what seems to be working ? If we can't evaluate, then why not implement what does no harm and might improve things ? DCDuring TALK 15:51, 17 June 2008 (UTC)[reply]
I'm not sure that we can make that kind of comparison meaningfully. What do people go online to search for more often? The meaning of an obscure word or encyclopedic information about TV, celebrities, science topics, historical events, etc.? I think the reason WP gets far more hits is the kind of content they supply and the greater likelihood that people will go looking for that information. If more people go to the movies each day than to the library, does that mean the library is doing something wrong? No. The cinema and the library are providing different resources. In much the same way, and given the huge difference between the kind of content WP and WT supply, a hit percentage comparison is not statistically meaningful. --EncycloPetey 20:02, 17 June 2008 (UTC)[reply]
We are also getting about 21% of the hits that MW Online gets and 40% of what Dictionary.com gets. I exclude answers.com (6-7%), which has favorable placement at Google and broader coverage. On the bright side we are ahead of Bartleby (130-40%). DCDuring TALK 21:24, 17 June 2008 (UTC)[reply]
OK, those are comparisons we can work with. Do we know how users end up at those sites? For example, is there anything packaged with Windows or set as a default in Explorer that would favor those sites over us? How much of the value is from returning users (versus new ones)? We can't assume that the number of hits all result from random internet searches or from search engines, unless we have data to support that notion. --EncycloPetey 22:02, 17 June 2008 (UTC)[reply]
I do know that our traffic numbers are inflated by the links from one of the most popular sites on the Web. Our numbers seem to include all the Wiktionaries under wiktionary.org. I don't think that the MWOnline, Bartleby and Dictionary.com have non-English dictionaries. I also know that MW Online had a huge increase (300%) in its volume over the last year; whereas Wikt has gotten about a 43% increase in the same period. After these teaser facts, everything else would cost money or require research I haven't done yet and may not be able to get the facts for. Has anyone else been paying attention to what our competitors have been doing? Whatever it is the MW has done it seems to have borne fruit starting in January '08 and has been a big setback to bartleby and dictionary.com. DCDuring TALK 02:40, 18 June 2008 (UTC)[reply]
The site dictionary.com probably is the main reason- it packages many dictionary sites together on one page. Maybe we could get that site to include our definitions too? That is probably where most people get dictionary information, as opposed to individual sites. Nadando 03:31, 18 June 2008 (UTC)[reply]
Dictionary.com has one-third the volume of MWOnline and has lost 15% during the period that MWOnline has gained 300%. That doesn't seem likely to account for MWOnline's sudden surge since January 1, 2008. DCDuring TALK 11:35, 18 June 2008 (UTC)[reply]
The surge in M-WOnline.com is largely due to shifting a great deal of the volume from M-W.com to MWOnline.com. DCDuring TALK 18:51, 20 June 2008 (UTC)[reply]
Oh, cool. :-)   I'm not sure that we should accept a suboptimal tagline just because Monobook doesn't display it; if we're going to go that approach, I think it would be better to wrap whatever text we want in an explicit <span style="display:none">[…]</span>. (By the way, does the tagline have access to the current page name, using {{PAGENAME}} and whatnot? If so, then we might want it to mention the entry title, and perhaps to differ slightly between entries/appendices/etc. and templates/project-pages/etc.) —RuakhTALK 16:39, 17 June 2008 (UTC)[reply]
On the topic of web visibility of Wiktionary: Isn't the use of keywords in a meta tag in the header of HTML the standard way of informing search engines about the topics covered in a web page? A meta tag, unlike a tagline, does not get printed, so it can contain a list of relevant keywords as one sees fit, without forcing the words to form an artificial sentence or phrase, including such words as "definition", "dictionary", and "define".
Also, did you know of the keyword "define:" by Google, exemplified by define:apple? --Daniel Polansky 08:53, 18 June 2008 (UTC)[reply]
Daniel, yes. See User:Conrad.Irwin/MetaKeywords.php which was written the last time this discussion arose (which would allow us to define relevant meta-tags only in the main namespace). It would be nice if they included Wiktionary in the define: statements, but sadly they don't (I filed a feedback about it a long while back, but it got ignored ;). If we want to do this (which it seems like we do) shall I start a WT:VOTE on it so that we can demonstrate consensus to the developers?
Ruakh, we can get the page name into the tag line, however we can't differ between namespaces with it (though we could with the MetaKeywords extension). The Tagline is already wrapped in a display:none through CSS, and I'm not sure how clever google are, but I'm fairly sure they try and ignore things that are explicitly not shown (as the potential for abuse is large). Conrad.Irwin 09:17, 18 June 2008 (UTC)[reply]
The tagline is display:none in Monobook, but not in, say, Classic. Does Google respect meta-tags? (And did I totally miss the previous discussion, or do I have the memory span of, like, a flea or something?) —RuakhTALK 12:39, 18 June 2008 (UTC)[reply]
This is one of my pet-hates with our beloved MediaWiki software, why the hell can't they use the same class names for the same bits of the skin! The previous discussion was on the grease pit (Wiktionary:Grease_pit_archive/2008/January#Aiming_for_Google_keyword_define) Conrad.Irwin 19:01, 18 June 2008 (UTC)[reply]
I had a hunch this must have been discussed before. Thanks. I for one am supportive of the idea of keywords in meta tags, and see no drawbacks, except from the increase of the attention that the project could get from a broad user base. But then the issue at stake would be not what technical solution is preferable but rather whether attention is wanted. --Daniel Polansky 10:44, 18 June 2008 (UTC)[reply]
We have no idea whether returning a result (likely to fall below m-w and other online dictionaries) on google will increase the number of editors, or, if it does, by how much. The only way to find out is to try this out, it can always be reverted if it is found we are unable to cope with the levels of vandalism. Conrad.Irwin 10:58, 18 June 2008 (UTC)[reply]

Google's Listing Tactics

It turns out Google doesn't take its cue from Meta tags alone. Their approach is more complex than that.

  1. The information included as the header is their first go-to, but not their primary judgement call. They couple this with more info.
  2. Number of times a word is used on the page in question. In other words, if the word is "tree" and in our definition page we have the word "tree" written five times and dictionary.com only has it listed three times, we win the upper position--all other things being equal. This is a fairly sophisticated calculation that isn't easily fooled by tactics created specifically to fool the spider.
  3. Meta tags are taken into account, but they are not the only thing taken into account.
  4. Number of times a link is made to that page. Here's where they generally place their heaviest consideration. It's just really hard to see why anyone would link back to wiktionary. I see it all the time with Wikipedia when people want to cite a reference. Is there a way we can improve this statistic? Amina (sack36) 12:56, 20 June 2008 (UTC)[reply]
Do you know whether links from WP count? Many is the article there that has terms used where a Wiktionary link would be much more of a help than a WP link. The useful links would be in-line links, not boxes. DCDuring TALK 14:33, 20 June 2008 (UTC)[reply]
Links from WP do count for Wiktionary (as we are a sister project MediaWiki doesn't add the nofollow attribute to links to us).
Re: Sack36, You are right on all counts, but we are not (as far as I know) aiming for a high google position, just any place at all if certain words are searched for (compare google:ablute google:ablute definition site:en.wiktionary.org, it is interesting to note that thanks to the recent tagline change, it does now find "google:ablution defintion" on page 6 of the results for me). Link backs will happen gradually as people find Wiktionary useful, so we can leave it to take care of itself, getting the repetition exactly right in a wiki is (I think) too hard, and as it isn't greatly important, we may as well leave it. Conrad.Irwin 17:46, 20 June 2008 (UTC)[reply]
If we're not aiming for first page, we may as well not "aim" at all. People don't look past the first page. Let's face it, the only reason we're looking to google is to get people to our site. It has to be first page or nothing. Amina (sack36) 07:06, 23 June 2008 (UTC)[reply]

Are you sure that Google looks at meta tags at all? It was my impression that meta (keyword and description) tags are so often abused that most search engines completely disregard them. As far as I know, only directories like Yahoo and directory.google which have human editors confirm their content look at the meta tags. And for this purpose, it would be useful to add something like <meta name="description" content="English-language definition of ‘exposé’, a word in English and French." />

Search engines also disregard text hidden using CSS or other methods. I believe hidden text may also lower the perceived trustworthiness of a page. Why not just show “Definition from Wiktionary, a free dictionary”? The title tag, which shows up in your browser's window title, is also significant. Instead of just “ablute - Wiktionary,” it should be something like “ablute – Wiktionary, an open-content dictionary.”

See Google's Webmaster Help Center. Michael Z. 2008-06-20 20:47 z
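
The description tag and window title suggested above could be generated per entry along these lines. This is a rough Python sketch for discussion only, not MediaWiki code; the function names are made up, and the simple "and" join of language names is an assumption that would need refining for entries in three or more languages.

```python
from html import escape


def meta_description(word, languages):
    """Build a description meta tag in the wording proposed above."""
    # Assumption: language names are simply joined with "and".
    langs = " and ".join(languages)
    content = f"English-language definition of ‘{word}’, a word in {langs}."
    # Escape &, <, > and quotes so the attribute value stays well-formed.
    return f'<meta name="description" content="{escape(content, quote=True)}" />'


def page_title(word):
    """Build the longer window title proposed above."""
    return f"{word} – Wiktionary, an open-content dictionary"
```

For example, `meta_description("exposé", ["English", "French"])` reproduces the tag quoted above, and `page_title("ablute")` gives the suggested title.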

Important: don't try to game the search engines—they are designed to respond badly to that. The easier it is for human readers to read and understand the content of a page, the better it will be understood by Google, etc.
The number one thing we can do to increase Google rank is improve the quality of the dictionary, causing more sites to link to it.
We already have a good head start in links from Wikipedia articles and 404 pages, and it may be helpful to start a systematic campaign to add a Wiktionary link to every single eligible Wikipedia article. It would also be nice to explicitly note the presence of a Wiktionary definition on a 404 page like w:Ablute, rather than just rely on the default sister link. Michael Z. 2008-06-20 20:56 z
And don't forget Wikiquote! bd2412 T 21:02, 20 June 2008 (UTC)[reply]
What people don't seem to understand is that we are not trying to game the search engines. We are trying to make the site describe itself correctly, which will aid the search engines. Yes, there is no substitute if we want "higher google ranking" to improving our definitions and letting eventuality take hold. However this is not about getting higher google rankings. This is about ensuring that people that specify they are looking for dictionary definitions find dictionary definitions. I don't care how far down the listing Wiktionary is, but it really ought to be there somewhere. Although most search engines no longer ascribe high importance to meta-tags they do read them - though they (Google at least) are very harsh on sites they catch spamming irrelevant keywords. Conrad.Irwin 22:12, 20 June 2008 (UTC)[reply]
We're a "dictionary", we're "free", we "define" "word"s, we offer "definition"s, "translation"s, "synonym"s, "etymology", "pronunciation", "usage". It would hardly be gaming the system if we made that much clear. Arguably we also offer "answers" and various other things, but I'd be happy if those terms coupled with a word came up wiktionary some of the time. Could we have an easy-to-use "cite us" link in the toolbox? DCDuring TALK 22:53, 20 June 2008 (UTC)[reply]
But not every entry has all those items. Hmm... Is there a way to set things up so that meta tags appear according to the section headers that exist on the page? That feature could be useful and accomplish some of what's being discussed. That is, if an entry has a synonyms section, then "synonym" will appear in a meta tag for the page. How difficult would that be to do? --EncycloPetey 17:58, 21 June 2008 (UTC)[reply]
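To make the question above concrete, here is a minimal Python sketch of deriving keywords from the section headers actually present in an entry. It is not how MediaWiki or the MetaKeywords extension works; the header-to-keyword mapping, the two base keywords, and the function names are assumptions chosen for illustration.

```python
import re

# Hypothetical mapping from section header text to a meta keyword.
HEADER_KEYWORDS = {
    "Etymology": "etymology",
    "Pronunciation": "pronunciation",
    "Synonyms": "synonym",
    "Translations": "translation",
    "Usage notes": "usage",
}

# Matches wikitext headers such as ==English== or ====Synonyms====.
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def keywords_for(wikitext):
    """Return keywords for the sections that exist in this entry."""
    keywords = ["dictionary", "definition"]  # assumed to apply to every entry
    for _, title in HEADER_RE.findall(wikitext):
        # Treat numbered headers like "Etymology 1" as "Etymology".
        title = re.sub(r"\s+\d+$", "", title)
        keyword = HEADER_KEYWORDS.get(title)
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords


def meta_keywords_tag(wikitext):
    """Render the keywords as a meta tag."""
    return '<meta name="keywords" content="%s" />' % ", ".join(keywords_for(wikitext))
```

An entry with only an etymology and a synonyms section would then get "etymology" and "synonym" but not "translation" or "pronunciation".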
There's a question of fact here. Does Google ignore our section headers? DCDuring TALK 18:06, 21 June 2008 (UTC)[reply]
Ignore? probably not. But, even if the section headers aren't ignored, does Google give them any meaningful weight for searches, or would meta inclusion benefit users? --EncycloPetey 18:17, 21 June 2008 (UTC)[reply]
Meta keywords count for less than the page headers. Anyway, there would be no point in duplicating the text of the page in them: the page structure already contains more information than repeating those heading names would add. Needlessly repeating information that's already there is trying to game the search engines, and only waters down the meaningful content. Michael Z. 2008-06-21 21:03 z
MZ: Do you know how Google treats our section headers? If they are included, then the only issue is whether the words that accurately characterize us and do not appear in an entry naturally are worth including in some way that Google would take seriously. I don't know what to say if Google excludes our section headers, because most of those terms are important descriptors of our content. It could be that they have made a business decision that we aren't indispensable enough to be treated on a par with commercial sites that are in a position to do business with them. WP is still indispensable to users, but Google is trying to create a commercial competitor anyway. It may be that we have to work harder at linking to and from sister projects to enhance our value. But our non-commercial nature limits how we can deal with Google. DCDuring TALK 00:05, 22 June 2008 (UTC)[reply]
I don't have any behind-the-scenes insight, but it is my understanding that Google and other search engines look primarily at the visible content of the page (because text hidden from readers is usually trying to game search engines), and do pay at least some attention to page structure (so the window title, headings, and subheadings are significant, and even parts of the URL). Also very important is having high-quality links to a page and to a site.
I doubt that Google treats us differently from commercial sites, or that they mistreat competitors—stuff like that would ruin their good reputation. I think the simple fact is that everyone in the world knows about Wikipedia and links to it. As we become more useful, we will gain high-quality links and a better reputation, and so show up higher up in the search results. Keep improving the dictionary, and have patience. Michael Z. 2008-06-22 04:34 z
Everyone on that all important first page is doing everything we'll do with the pages we have. The only thing that will set us off from the others is the amount of back links we can get. If we could get some kind of widget that could be put on other people's sites that would allow them to do lookup of meanings at their site, we'd have the back links we need. Of course that would require that we have the kind of definitions they're looking for. Medical sites require medical terms, scientific sites require scientific terms, etc. Amina (sack36) 07:06, 23 June 2008 (UTC)[reply]

I think search engine users who seek a definition of a word often include "definition" in their queries. Unfortunately for the English Wiktionary, its entry pages don't typically contain the word "definition", so search engines exclude our entries from the search results. We should consider adding the word "definition" to the visible text of our entries. Rod (A. Smith) 19:34, 24 June 2008 (UTC)[reply]

Extracted from Merriam-Webster.com:

"Definition of tipple from the Merriam-Webster Online Dictionary with audio pronunciations, thesaurus, Word of the Day, and word games."
"Keywords" content="tipple, definition, define, meaning, dictionary, glossary, free, online, english, language, word, words, webster, websters, merriam-webster"
tipple - Definition from the Merriam-Webster Online Dictionary

Merriam Webster is the leading English dictionary site without a special deal with Google. They feel compelled to include some of the very meta keywords we are discussing. They seem to get better placement than we do. There are several possible reasons why. The easiest of those to address is the absence of meta keywords. Why wouldn't we add the keywords? We won't stop doing things like improving quality, layout, breadth, depth, special features. DCDuring TALK 01:05, 25 June 2008 (UTC)[reply]
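
As a rough illustration of what per-entry meta keywords like those in the Merriam-Webster sample might look like if generated automatically, here is a minimal Python sketch. The function name `build_meta_keywords` and the fixed keyword list are assumptions made for this example; it does not describe how the MetaKeywords extension actually works.

```python
from html import escape

# Hypothetical site-wide keywords, modeled on the Merriam-Webster sample quoted above.
SITE_KEYWORDS = ["definition", "define", "meaning", "dictionary", "free", "online"]


def build_meta_keywords(headword, extra=()):
    """Return a <meta name="keywords"> tag for one entry page."""
    seen = []
    # Headword first, then entry-specific words, then the site-wide list,
    # dropping duplicates while keeping the original order.
    for word in [headword, *extra, *SITE_KEYWORDS]:
        if word not in seen:
            seen.append(word)
    content = escape(", ".join(seen), quote=True)
    return f'<meta name="keywords" content="{content}">'


print(build_meta_keywords("tipple"))
```

The headword comes first so that the entry-specific term leads the list, as in the quoted sample.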

I've initiated a vote at Wiktionary:Votes/2008-06/Install_MetaKeywords_Extension. Conrad.Irwin 12:13, 25 June 2008 (UTC)[reply]
Cool. I've commented at the talk page there. DCDuring TALK 14:12, 25 June 2008 (UTC)[reply]
The Vote, which we need to show the developers before they will install anything, is now live at Wiktionary:Votes/2008-06/Install_MetaKeywords_Extension. Yours Conrad.Irwin 16:20, 1 July 2008 (UTC)[reply]
"This template adds Category:Vietnamese words needing attention to bring the entry to the attention of our Japanese experts. It does not change the appearance of the page."

Why Vietnamese words need attention of Japanese folks? I think there are Vietnamese members here, for example, me. --Cumeo89 14:52, 18 June 2008 (UTC)[reply]

That was probably a cut-and-paste error. Robert has now corrected the template. Thanks for pointing this out. --EncycloPetey 16:39, 18 June 2008 (UTC)[reply]

User behavior

Because we have no direct information on the particular behavior of our own users, we have been relying on ourselves as models of their behavior and do not even have quantitative information about ourselves. I came across some interesting statements in 2006, Prioritizing Web Usability, Nielsen and Loranger. Their results are based on a study of 69 users (no teens or seniors) presumably conducted c. 2004.

  • As a baseline web users successfully achieve their objectives 66% of the time, compared to 40% in the 1990s.
  • Users spend less than 2 minutes (1:50) on a site before abandoning it if they are not achieving their objectives.
  • Users spend only 31 seconds on a home page on their first visit and less and less thereafter. They scroll little on the first visit (23%) and less thereafter. Users read much more on interior pages than on home pages.

c. 45% of links users click on are from the interior content area of a page, 10% from the footer, 15% each from left, right, and top.

For search results, 53% of users only look at what appears above the fold (first screen), a further 40% make it below that but not past the first page, 7% make it to the second page, less than 1% beyond that.

On a search page #1 gets 51% of clicks; #2, 16%; #3, 6%; #4, 6%; #5, 5%; #6, 4%; #7, 2%; #8, 1%, #9, 1%; #10, 2%; #11+ (2nd page+), 5%

Users scrolled below the fold only 42% of the time on content pages that had "below-the-fold" material.

The above are a large percentage of the user behavior facts in the book. The conclusions that the authors reach about site design are based more on their rational economic behavioral model of search behavior and their own clinical experience. I doubt if one could say they were tested, though they are probably more accurate than our anecdotal impressions. DCDuring TALK 18:37, 18 June 2008 (UTC)[reply]
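
To make the search-page click figures above easier to reason about, the following small Python sketch adds them up cumulatively. The percentages are exactly those quoted above; the dictionary layout and the function name `cumulative_share` are just illustrative choices. Note that the quoted figures sum to 99 rather than 100, presumably because of rounding in the source.

```python
# Click share (%) by search-result position, as quoted above.
# Key 11 stands for "#11+" (second page and beyond).
CLICK_SHARE = {1: 51, 2: 16, 3: 6, 4: 6, 5: 5, 6: 4, 7: 2, 8: 1, 9: 1, 10: 2, 11: 5}


def cumulative_share(top_n):
    """Percentage of clicks that go to the first top_n result positions."""
    return sum(share for position, share in CLICK_SHARE.items() if position <= top_n)


for n in (1, 3, 10):
    print(f"top {n}: {cumulative_share(n)}% of clicks")
```

On these numbers, the first three results collect roughly three quarters of all clicks.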

These are great statistics, and I don't mean to argue against them in any way, but I do think they're skewed toward the average site. We are reference sites (Wikipedia, Wiktionary, Wikisaurus) and the data may be different for us. If a person is looking for a given piece of information, I imagine the time spent would be directly proportional to how important the information is to the viewer; how specialized the information is; and how predisposed toward our sites the individual is. Wikipedia may have the most far reaching reputation, but it's not always considered valid info. Wiktionary is more favorable in the eyes of those who know of it, but it's not as well known. Wikisaurus has a bad rep or no rep depending. Amina (sack36) 06:51, 20 June 2008 (UTC)[reply]
Of course, we're unique, just like everybody else. I eagerly await the data that shows that. In the meantime, what would be your guess as to how different we are? Wiktionary is not a monopoly. How important is a dictionary entry on Wiktionary as opposed to:
  1. rereading the troublesome passage, using a different word.
  2. doing without
  3. using Answer.com (via Google)
  4. using MWOnline
  5. asking someone nearby or reachable
  6. using a print dictionary?

I am not at all certain that one piece of reference information is more valuable to a user than, say, finding exactly the right model of water shoe at a good price while avoiding trips to the two nearest shopping malls.

I eagerly await the data on user perception of the relative reliability of reference sites and on the recognition of "Wiktionary". It may be, erm, some time in coming. We don't even seem to know where users are coming from when they come to Wiktionary. We do know that "MILF" has been the most common search term to find us, more common than "Wiktionary".

Also if a user doesn't find what's wanted at a site quickly the first time, does it make the user more or less likely to click on the site the next time information is needed? What matters most to us in terms of attracting new users is ease of getting what they are looking for the first time they hit the site, the first time they come back after an initial disappointment, etc. Also we are competing with other WMF sites for the time of contributors. I would expect that well laid-out pages can't hurt in recruiting them, especially casual contributors and those with special-context knowledge, as opposed to linguists. DCDuring TALK 11:41, 20 June 2008 (UTC)[reply]

If I may, I'd like to ask a question unrelated to the discussion. Several times y'all have used abbreviations that I don't understand. Specifically: MWOnline, MILF, WM. Could you define them really quickly? Amina (sack36) 13:07, 20 June 2008 (UTC)[reply]
Sorry. Of course you may. Some of these may or should be in either Wiktionary:Glossary for "insider" terms used in Discussion rooms like Grease Pit or Beer Parlor, or pages like RfVerification, or RfDeletion, etc. or Appendix:Glossary for terms used in entries. MILF is a notorious entry. MWOnline is Merriam-Webster Online, a mostly free dictionary site. WMF is Wikimedia Foundation, the umbrella for Wikipedia (WP), Wikt, WikiSpecies, WikiCommons, WikiSource, Wikiversity and other projects. I have argued that the Glossaries should be links right under Help in the navigation box on the upper left. DCDuring TALK 14:16, 20 June 2008 (UTC)[reply]
Shortcuts are WT:GL for the "insider" jargon; App:GL for what every user is supposed to know to understand our entries. DCDuring TALK 14:27, 20 June 2008 (UTC)[reply]
Thank you. I understand now. BTW, why isn't Dictionary.com in that list? It's gotta be the easiest to access. It is severely limited but most people don't want any more than that. Those that do, will head for the OED pages or beat their brains out with the mouse.
I think the thing that would make people access us above the rest is a two tiered approach like (and don't boo and hiss) Apple uses. A person getting a Macintosh can shove it on a desk, plug it in, turn it on and as soon as the welcome routine gets finished can pretty much do all the basics. No fuss, no extra work, no wading through incomprehensible jargon and unwanted explanation. That's tier one. However, if you are uber-geek they also supply a Unix window where you can really screw up the machine; a help function that can bore you to death and several other high end, arcane things. But all that is transparent to the mom and pop terrified-of-computer types!
We need to be that kind of flexible. The stuff at the top should be 6th grade understanding. Nothing about parts of speech or declensions or widgets. Then, as you scroll down the page it should get increasingly more complex until you reach the PHD in (name the language). If you look at Wikipedia, the better pages do just that. Amina (sack36) 00:01, 21 June 2008 (UTC)[reply]
It seems obvious, doesn't it? Most casual users (non-contributing, often anon) probably just want definitions (and spelling). Even if they want more, they need those things to make sure that the rest is relevant. A first screen (above the fold) that doesn't have definitions had better make it clear that the definitions and other wanted content are just one click away. We can use up to an inch for Etymology and two inches for Pronunciation when we only have six inches in total. I'm not sure whether anons still have the left-side ToC to contend with, but that could force all content off the first screen.
I don't think that you can manage presenting words without mentioning the part of speech. Even MWOnline can offer a daunting list of parts of speech (even multiple links for the same pos but different etymology) for a complex word like set before offering any definitions.
Many entries are short enough to fit on the first screen in their entirety. Some entries with multiple etymologies and parts of speech and many definitions for some of the parts of speech would be unavoidably difficult to present. Some entries derive their apparent complexity from the existence of many language headers, which in turn follows from the "all words in all languages" part of the Wiktionary creed. Each of the classes of causes can be dealt with to improve the effectiveness of the first screen for anonymous first users. But the goal would first have to be accepted, which it is not, certainly not with any enthusiasm.
What registered users might want to see on their first screen is somewhat customizable anyway, but some of the customization never worked for me with Internet Explorer, though it is fine with Firefox. DCDuring TALK 00:53, 21 June 2008 (UTC)[reply]

brainstorming

Can we come up with a brainstorming technique suitable for the Wiktionary team to help our discussions? Not just this, but any. Many times I find that good ideas are buried in long conversations and after a few months when the subject comes up again, editors may remember that this was already discussed but newcomers will not know about it and it's hard to search archived talk pages to find the pieces. In classic brainstorming, the ideas are listed and judgment is suspended, then the ideas are analyzed, combined, improved, etc. to come up with the best solution. If you don't see this as feasible, please recommend other methods to keep comments/ideas/issues related to one subject on one summary page, so people can refer to it and return to it.
Perhaps brainstorming needs to be conducted off the main community pages, but with a link from these pages as long as the discussion is active. Trying the classic brainstorming method first seems like a good idea. We could then consider revising our method based on results. Would the right topic be "first entry screen for anons" or something else? DCDuring TALK 13:47, 21 June 2008 (UTC)[reply]
"First entry screen" is the high-level topic. But we might want to break it down to subtopics, a list of things that could be looked at for improving the first entry screen. --Panda10 14:13, 21 June 2008 (UTC)[reply]
OK, I'm confused. I thought this was off the main community pages. Do you remember how hard this is to locate? Also, I really like the idea of the brainstorming. These pages are bloody hard to keep up with! Amina (sack36) 07:25, 23 June 2008 (UTC)[reply]

TOC issues

Regarding long TOCs that occupy precious space: Is there any way we can implement a horizontal TOC where only the two-character language codes are displayed (en - de - fi - ru) as a link pointing to the FL section of that page? If you say the code is not intuitive for users, an alternate text could display the English and FL name of the language (for de it would be German - Deutsch) when the cursor is above the link. Only those codes would be listed that are on the page. If a new FL section is added, AutoFormat could add the new link when the page is saved. I know we have many more items in the current TOC but maybe we can come up with a new way to incorporate them into a horizontal TOC. --Panda10 12:44, 21 June 2008 (UTC)[reply]
To clarify, are you suggesting this as the default for anons? DCDuring TALK 13:47, 21 June 2008 (UTC)[reply]
Sorry, I don't understand your question. Do anons see a different layout than registered users? --Panda10 14:13, 21 June 2008 (UTC)[reply]
Registered users can set their own preferences at "my preferences". Those really in the know can use WT:PREFS for further customization. One of the preferences allows the table of contents to be on the right hand side with the entry beginning at top left. The top-right placement of ToC already accomplishes some of what we seek for users who know how to set it. The option to do so was considered a test at the time I opted for it, I think. I don't know what its status is now. DCDuring TALK 17:34, 21 June 2008 (UTC)[reply]
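
The horizontal TOC proposed at the top of this thread could be sketched roughly as follows in Python. This is only an illustration of the idea: the `LANGUAGE_NAMES` table, the function name `horizontal_toc`, and the choice of plain HTML links with a `title` tooltip are all assumptions, not an existing MediaWiki or AutoFormat feature.

```python
from html import escape

# Hypothetical lookup table: code -> (English name, native name).
# A real table would need to cover every language code in use.
LANGUAGE_NAMES = {
    "en": ("English", "English"),
    "de": ("German", "Deutsch"),
    "fi": ("Finnish", "suomi"),
    "ru": ("Russian", "русский"),
}


def horizontal_toc(codes):
    """Build a one-line TOC such as 'en - de - fi' with tooltips."""
    links = []
    for code in codes:
        english, native = LANGUAGE_NAMES[code]
        # Show "German - Deutsch" on hover; avoid "English - English".
        tooltip = english if english == native else f"{english} - {native}"
        # Language sections are assumed to be anchored by their English name.
        links.append(
            f'<a href="#{escape(english, quote=True)}" '
            f'title="{escape(tooltip, quote=True)}">{code}</a>'
        )
    return " - ".join(links)


print(horizontal_toc(["en", "de", "fi", "ru"]))
```

Only the codes passed in are listed, which matches the suggestion that the TOC show just the language sections actually present on the page.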

I've been wondering about this whole thing with left hand TOC. What reason do we have for wasting that space with TOC? Why isn't it on the right? Amina (sack36) 07:17, 23 June 2008 (UTC)[reply]

(1) It's the default for the MediaWiki projects to have it on the left. (2) There are often images, templates, and other items displayed at the top right of an entry, which would interfere with a TOC there. (3) Some people want it there, and don't see that as a "waste". --EncycloPetey 02:15, 24 June 2008 (UTC)[reply]
  1. We already depart from some MediaWiki defaults.
  2. There certainly are issues with how the right-side ToC interacts with images and sister-project link boxes. We can push the link boxes down to "See also", but there would be a lot of entry modification required. The images might require more complex work. An entry with multiple images (See screw, which has just 2) can be a mess and may need a gallery (See head). Gallery is not even an approved header, so that the images may not be noticed by users interested in the top 15 senses, unless they use the non-standard intra-entry links that have been developed on an experimental basis or we develop another approach to their display.
  3. It is hard to see the value of the vast amounts of white space above the fold on the right side. I'd love to see Etymology and Pronunciation under the show/hide bars and shorter fonts for the headers. Also ToC length control by suppressing everything below PoS by default.
Registered users could be given configuration choices to tailor how they see things. DCDuring TALK 02:54, 24 June 2008 (UTC)[reply]
I've been using the WT:PREF to put the TOC on the right, and I've not found any page on which it looks unacceptably messy. I still strongly support enabling that WT:PREF by default, with the option to turn it off in WT:PREFS. Yes, it may move a few images around on the few pages we have images, but it makes nearly all pages easier to read which is an insurmountable benefit. Conrad.Irwin 10:59, 26 June 2008 (UTC)[reply]
I've started the page Wiktionary:Layout woes where future discussion on entry improval can take place without disturbing the beer parlour too much. Conrad.Irwin 11:24, 26 June 2008 (UTC)[reply]

botflag for CarsracBot

I request a bot flag for my bot, because in a small test on the first 60 articles in the main namespace, 7 needed adjustment. For more information see my userpage. CarsracBot 19:08, 18 June 2008 (UTC)[reply]

Please read the policies concerning bots. You will not be given permission until you have allowed experienced users to check your bot code and have explained exactly what the bot is supposed to do. You have not fulfilled either of these requirements. --EncycloPetey 19:24, 18 June 2008 (UTC)[reply]
My user page explains that I would only do interwiki work (adding, removing and changing interwiktionary links), and that I use the standard pywikipedia software, which is constantly checked and updated by experienced users. Your home interwiki bot only adds interwiki links. Carsrac 12:45, 23 June 2008 (UTC)[reply]
We have a much more efficient interwiki bot (User:Interwicket) which can do all of the interwikis for the whole en.wikt in a couple of days after every XML dump, and the last several requests to run an interwiki.py based bot have been denied (so I doubt you'll be granted permission). There has been an interesting bot request on WT:GP, I don't know whether you're interested enough to try that? Conrad.Irwin 17:38, 19 June 2008 (UTC)[reply]
If it is something that can be done with the standard set of pywikipedia scripts it would not be a problem. Please give a good link to the interesting request.
To come back to my request: as I indicated at the start, your home interwiki bot is overworked. It has at the moment a turnaround time of 10 days, give or take an hour. So please don't come with "a couple of days". I'm not a skilled software engineer, but I work with public scripts and work together with other users. Questions about the home bot have not been answered by the owner for half a year.
BTW I have no question marks on the skills of the programmer and owner of the homeinterwikibot. But I think that a man with his knowledge should be one that improves pywikipedia scripts and upload those improvements. Carsrac 12:45, 23 June 2008 (UTC)[reply]

Vote proposal to modify WT:ELE (help) page

The WT:ELE help page does not include a reference to context labels, yet there are hundreds in use and they are a valuable guide to definitions in paper dictionaries as well as Wiktionary.

I would like to propose that a section be built at the WT:ELE#The_part_of_speech_or_other_descriptor section of the ELE page to introduce people to context labels, found at Category:Context_labels.

This suggestion is first-stage only to see if others agree with the placement and Context_labels page as the appropriate reference for labels. If so, an actual explanation must be drafted. Wakablogger 23:31, 19 June 2008 (UTC)Wakablogger[reply]

There is some ongoing work on context labels, but whatever the outcome, ELE would need to accommodate it. ELE is long. The discussion of context labels is potentially long. Perhaps it would be better to have a heading, two or three lines, and a link in ELE, the link being to the body of the discussion of context labels in a subpage. DCDuring TALK 00:05, 20 June 2008 (UTC)[reply]
See Template talk:context. DCDuring TALK 00:26, 20 June 2008 (UTC)[reply]
It might be best to start this as a whole new page Wiktionary:Context that explains the labels and their use in detail, then create a summary for inclusion on ELE. That way, we don't have to have a vote until we want to add the summary. (New pages don't need votes to start or edit.) --EncycloPetey 03:01, 20 June 2008 (UTC)[reply]

Absolutely, yes.

But these should be addressed from a content point of view rather than a technical one. “Context labels” serve several diverse functions—indicating grammatical aspect, qualifying register, dating, or usage, indicating geographic or topical context, or specifying regional language. That they happen to be indicated using similar “templates” is incidental. Each of these should be discussed in the appropriate place to reduce conflating them.

See Category:Context labels for a breakdown. Michael Z. 2008-06-20 03:33 z

As long as they're all easily accessible from one location AND easily accessible from the basic help pages, that's fine. On the few items I've done, I spend more time on labels than anything else (more recently, I gave up on labels). Wakablogger 04:58, 20 June 2008 (UTC)Wakablogger[reply]

Portuguese spelling

There are a huge number of alternative spellings between European Portuguese (which is also used in Africa) and Brazilian Portuguese, since the rules for diacritics, digraphs and others are different from scratch. (ato and acto, amamos and amámos, gol and golo, sistema operacional and sistema operativo...) Only a few entries in Wiktionary include this distinction, and their counterparts simply have not yet been written. So I created two templates (Template:pt-Brazilian spelling and Template:pt-European spelling) to be used on every such page. However, I have not found any similar templates for English words (such as center and centre). Instead, we are using Template:qualifier for them, to provide a link to the specified country followed by the alternative spellings, or by a simple description. In this case, the templates Template:pt-Brazilian spelling and Template:pt-European spelling would be a better choice, to provide a complete explanation where it is necessary. Daniel. 18:09, 20 June 2008 (UTC)[reply]

Wouldn't it be more usual, for a word used in Brazil only, to use {{context|Brazilian}} and, sub ===Alternative forms===, * [[foo]] {{qualifier|Portugal}}?—msh210 17:40, 23 June 2008 (UTC)[reply]

Bot flag request for User:EivindBot

I, EivindJ, hereby request a bot flag for my bot, EivindBot. It is a bot run on the python/pywikipedia framework and it'll do interwikis based on no.wikt (a growing wiktionary). Thanks in advance, and please tell me when I can do test edits. --EivindJ 18:13, 20 June 2008 (UTC)[reply]

See the previous bot request on this page. We have a very efficient Interwiki bot (whose code User:Robert Ullmann would (I think) be happy to share, if you want to spread this good news to the other wikts) User:Interwicket, that can update the entire site in a day or two after every xml dump. All recent interwiki.py bot requests have been denied, so it's unlikely you'll get approval to use it at all. Conrad.Irwin 22:16, 20 June 2008 (UTC)[reply]
I have seen the code and it is a modified version of pywikipedia. He is more than welcome to share his code with the rest of the pywikipedia project. But at the moment it is not public code, and it can't be reviewed by an experienced editor. Carsrac 11:57, 23 June 2008 (UTC)[reply]
For everyone else's edification: it is not "modified" from interwiki.py; it is purpose written for the wikts, which can and should use an entirely different algorithm from the pedias; the source is public and published at User:Interwicket/code and can be reviewed by anyone. (I do in fact run it on a modified version of the framework, but it will run on the standard one; I run it on a modified version because the standard one is extremely fragile when faced with network problems; it tends to crash if the net has the slightest glitch. On the net here, a glitch can be anything from transients occurring many times an hour to 24 hour outages, and the process must recover, if it kept restarting it would take forever ...). Robert Ullmann 14:19, 23 June 2008 (UTC)[reply]
I see, that's ok – and I would be more than happy to get hold of that code :) --EivindJ 22:31, 20 June 2008 (UTC)[reply]
See above; but do note that even though it is partly set up to be run on any wikt (e.g. variable "home" is set to "en"), it also is "non-portable" in several ways: it assumes sort order is set up in a putfirst list; apparently the current framework doesn't provide that if the order is just code-alphabetic, etc. To make it really usable elsewhere, I should do a bit of work and testing. Also note the current published version is not necessarily the version running on en.wikt at any given moment; I may need to be prodded to update it. (which I will do now ;-) Robert Ullmann 14:19, 23 June 2008 (UTC)[reply]
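
For readers wondering what "the process must recover" from network glitches might look like in practice, here is a minimal, generic Python sketch of retrying with exponential backoff. It is not taken from User:Interwicket/code or from the pywikipedia framework; the function name `with_retries` and its parameters are assumptions made for illustration only.

```python
import time


def with_retries(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action(); on a network-type error, wait and try again.

    The delay doubles after each failure (1s, 2s, 4s, ...). The last
    failure is re-raised so the caller can decide what to do.
    """
    for attempt in range(attempts):
        try:
            return action()
        except OSError:  # ConnectionError and socket timeouts are OSError subclasses
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage with a hypothetical page-saving function:
# with_retries(lambda: save_page("foo", new_text))
```

Retrying in place, rather than restarting the whole run, is one way a long job can survive transient outages without losing its progress.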

Formatting conjugation, inflection and declension

Is there a particular reason to write some conjugations, inflections and declensions in italics when almost every other doesn't follow that rule? Should the first letter be uppercase for all languages? Do we need emphasis on the most common form (infinitive of verbs, singular of countables, etc.)? And what about ending with a dot? Here are some examples...

  • ama#Spanish (first letter uppercase, a bold link and ending with a dot)
  • botones#Spanish (first letter lowercase, a bold link and no dot)
  • consigo#Spanish (first character is a numeric abbreviation, there is an italic link and two italic translations)
  • creamos#Spanish (first letter lowercase, a normal link and no dot)
  • discere#Latin (first letter lowercase, a bold link and ending with a dot)
  • éramos#Spanish (first letter uppercase, a bold link and ending with a dot)
  • ler#Norwegian (first letter lowercase, conjugation in italics followed by a non-italic infinitive)
  • séria#Portuguese (first letter lowercase, a bold link and no dot)
  • servos#English (first letter uppercase, a bold link and ending with a dot)
  • servos#Latin (first letter lowercase, a bold link and no dot)
  • somos#Portuguese (first letter lowercase, in italics and in parenthesis, followed by a translation)
  • sou#Portuguese (first letter lowercase, in italics and no dot)

As with other definitions, I think that all those forms should start with uppercase, and ending with a dot. The most common form should appear, emphasized by bold formatting. And examples (or even quotations!) should be done in this way, not this way, that is, not inside the definition. No abbreviations should be done ("first person" would be used instead of "1st person") and no translations are necessary inside the definition. Every definition should be separated, and italics are not necessary. Daniel. 19:40, 20 June 2008 (UTC)[reply]

There is actually a standard formatting for Spanish conjugated forms- éramos#Spanish is the only one in your list that uses it, I believe. Most of the conjugated Spanish forms were added with a bot. It should be possible to do the same thing with Portuguese forms- take a look at the subcategories of Category:Spanish_verb_forms (the verb forms in that category all need formatting, by the way.) User:TheDaveRoss originally ran the bot to add verb forms, and User:DCDuring ran the more recent one. Nadando 19:53, 20 June 2008 (UTC)[reply]
I did not knowingly run any bots. I hope it's hard to run them unknowingly. DCDuring TALK 22:38, 20 June 2008 (UTC)[reply]
My bad, that would be User:Dmcdevit :) Nadando 13:56, 21 June 2008 (UTC)[reply]
Actually, italics are standard with non-gloss definitions here. That is, any time a word is "defined" as "first-person singular past tense" (etc.), we use a particular formatting style that by default renders as italics. We voted on this. Now, if you don't want to see the text in italics, you may set your personal preferences so that this does not happen. However, the style tags should still be used because there are others who do want such "definitions" displayed this way. The issue of capitalization and periods is not settled for situations like this, and is left to personal choice in most cases. For Latin, I use lower case and no period because it makes coding the templates for conjugated verbs much easier to do. I would agree, though, that any given definition should either (a) be capitalized and end in a period, or (b) not be capitalized and have no period. That is, no definition should start with a capital letter and not have a period, or start with a lower case letter and end with a period. --EncycloPetey 17:54, 21 June 2008 (UTC)[reply]
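
The consistency rule in the last comment (either capitalized with a period, or lowercase with no period) is simple enough to check mechanically. A minimal Python sketch, assuming the definition line has already been stripped of wiki markup; the function name `is_consistent` is hypothetical:

```python
def is_consistent(definition):
    """True if the definition is (a) capitalized and ends with a period,
    or (b) not capitalized and has no final period."""
    text = definition.strip()
    if not text:
        return False
    starts_upper = text[0].isupper()
    ends_with_period = text.endswith(".")
    # Both true is style (a); both false is style (b); a mix is inconsistent.
    return starts_upper == ends_with_period


for sample in ("First-person plural of ser.",
               "first-person plural of ser",
               "First-person plural of ser"):
    print(repr(sample), is_consistent(sample))
```

A definition that starts with a digit, such as "1st person", counts as not capitalized here, so it passes only without a final period.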

Introducing…{{autological}}

I’ve just created a category Category:Autological words and written a template {{autological}} to ease categorization.

Presumably this is ok – enjoy!

Nbarth (email) (talk) 17:28, 22 June 2008 (UTC)[reply]

Seems fine, the words that are in Category:Autological words should (I think) be in Category:English autological words. Conrad.Irwin 17:32, 22 June 2008 (UTC)[reply]
Yup, thanks – fixed this (took some purging) – I hadn’t realized it at first.
Nbarth (email) (talk) 18:08, 22 June 2008 (UTC)[reply]
For loanword to be autological, it would have to be loanword itself, which it isn't AFAICS. --Ivan Štambuk 02:39, 25 June 2008 (UTC)[reply]
As I share understanding of "loanword" and "autological" with both of you, I have removed the tag "autological" from loanword.--Daniel Polansky 20:11, 25 June 2008 (UTC)[reply]
Agreed – “loanword” is a loan translation of a German word, not a loanword itself (loanwords are borrowings w/o translation). I’ve made a note at Talk:loanword.
Nbarth (email) (talk) 22:28, 27 June 2008 (UTC)[reply]
What particular sense of the verb to verb is the verb autological to? It's not autologism, unless senses upon which autologisms are defined are allowed to operate on cross-lexeme boundary, on all lexemes that appear to just share the spelling, regardless of PoS, etymology etc. (which strikes me as a very dirty trick ^_^). --Ivan Štambuk 17:55, 26 June 2008 (UTC)[reply]
I think it would be an autologism- as the word "verb" was "verbed". XD Teh Rote 17:59, 26 June 2008 (UTC)[reply]
Certainly one can verb the noun verb, but can one verb the verb verb ? --Ivan Štambuk 06:21, 27 June 2008 (UTC)[reply]
This is subtle: the word “verb” can be used as a verb, hence the verb “verb” is a “verb” (in the noun sense). Thus as a word, it can apply to itself, but only by changing parts of speech. Also, for fun: the verb “verb” is verbed from the noun “verb”.
Nbarth (email) (talk) 22:28, 27 June 2008 (UTC)[reply]

Question: Why use this template? Why not simply add the category in the usual way? The template merely adds complexity for the bots who format pages and sort category links to the bottom of the appropriate language section (where they belong). --EncycloPetey 21:14, 25 June 2008 (UTC)[reply]

Yes. If it is supposed to be a visible context template (probably not), then it should use {context}, but as used it is a formatting oddity, the cat itself would be better. I have a question too: is "autological" an autological word? (Seems to me it is if it is, and isn't if it isn't, and either way is self-consistent ;-) Robert Ullmann 15:32, 26 June 2008 (UTC)[reply]
I agree with EncycloPetey and Robert. When I went to add the template to some autological words, I assumed it was a context tag, but then saw that it has no visible output and that other entries use it as a simple category. WT:RFDO#Template:autological. (By the way, Robert, that very question was posed by Douglas R. Hofstadter. I think it was in w:Gödel, Escher, Bach.) Rod (A. Smith) 16:27, 26 June 2008 (UTC)[reply]
Robert, “autological” can logically be chosen to be autological or heterological.
This is mentioned at Appendix:Autological words, and explained at Grelling–Nelson paradox.
Just as “heterological” cannot logically be chosen to be autological or heterological (it’s a contradiction), “autological” can be chosen to be either (it’s a tautology) – see details as linked.
Nbarth (email) (talk) 22:28, 27 June 2008 (UTC)[reply]
I made a template because I copied it from {{back-form}}, and did not know any other elegant way of dealing with “one category per language”. (I have been working with {{term}} and {{t}} too much, and not with categories.)
Would the usual/proper way be to literally include the code:
[[Category:English autological words]]
…or perhaps:
[[Category:{{en}} autological words]]
?
Also, is this documented anywhere?
Thanks!
Nbarth (email) (talk) 22:28, 27 June 2008 (UTC)[reply]
(code edited at: 00:14, 28 June 2008 (UTC) b/c I was confused re: ISO 639 expansion)

Renaming glossaries

I am about to rename (technically, move) some glossaries from Category:Glossaries so that they start with "Glossary of", optionally end in "terms", and never end in "terminology", to create uniform naming. Most of the glossaries already follow that naming scheme. In line with this scheme, the name "List of" should be reserved for lists without definitions.

Please, tell me if this action is unwanted. --Daniel Polansky 07:58, 26 June 2008 (UTC)[reply]

I tend to think they should all be "Glossary of", since lists often drift into becoming glossaries, and should have the words linked anyway. --EncycloPetey 17:27, 26 June 2008 (UTC)[reply]
Also, I would like to remove the Category:Appendices from the glossaries, so that they are only listed in Category:Glossaries. --Daniel Polansky 08:05, 26 June 2008 (UTC)[reply]
Since all glossaries are appendices, that would be fine provided that each glossary is categorized in Category:Glossaries, which is in turn listed in Category:Appendices. --EncycloPetey 17:27, 26 June 2008 (UTC)[reply]
And what about me creating Category:Word lists as a subcategory of Category:Appendices and moving the appendices that are lists of words without definitions, either plain or hierarchically organized, there? Is "Lists of words" a better name for the category? --Daniel Polansky 08:13, 26 June 2008 (UTC)[reply]
I don't like that idea unless we can devise a suitable name. "Word lists" could easily be confused with content in the Concordance: namespace. --EncycloPetey 17:27, 26 June 2008 (UTC)[reply]
All sound like good ideas to me. "Lists of words" is fine. Michael Z. 2008-06-26 20:54 z

Thanks for the feedback. I will rename the glossaries proper, and remove them from the Category:Appendices. I will refrain from doing anything with word lists. --Daniel Polansky 08:25, 27 June 2008 (UTC)[reply]

pronunciation guides?

A while back I was trying to figure out how to pronounce a place name. My first thought was to look in Wiktionary, but we don't have place names. I looked in WP and fortunately they included the pronunciation (which they don't always do). Should we add, to our extremely long list of things to do, the creation of pronunciation guides for words we do not include? This could also be used for people's names (such as how you pronounce Feynman or Dalai). RJFJR 21:03, 26 June 2008 (UTC)[reply]

Do you mean in an Appendix: list?—msh210 21:05, 26 June 2008 (UTC)[reply]
Something like that. Just a long list of words and pronunciations. (Probably broken up by starting letter or something) RJFJR 21:22, 26 June 2008 (UTC)[reply]
Yes, please. This would be very useful when generating rhymes, as very many placenames do not have pronunciations in Wikipedia or in online gazetteers.
However, last time I looked (policy might have changed since then) I thought we were planning on including all place names in Wiktionary anyway, which would therefore cover including their pronunciations too? — Paul G 13:01, 18 July 2008 (UTC)[reply]
If you can find that decision, we probably should include it in CFI. I was left with yet another vague impression that someone thought all placenames should be included only in Appendices, but that no one had proposed a method for actually setting this up. --EncycloPetey 19:51, 18 July 2008 (UTC)[reply]

WikiLook, Firefox Wiktionary add-on

Hey guys. I made WikiLook, a Firefox browser add-on that looks up any word and shows its definition in a small and sexy frame :) For example, it lets you check whether articles exist, word by word, without clicking, while you browse the web. Or look up a translation of a foreign word. Or look up an idiom. Etc. It can be downloaded from the direct download link or from Mozilla (registration required). Check the Mozilla link or this page for how to use it (very simple). I'm looking for any feedback here, on my Wikipedia user talk page, on my talk page here, or by email (you can find it on the Mozilla add-on page). And I really need some published reviews on the Mozilla page (all you need is to be registered with Mozilla); they will help it win the Mozilla "public nomination" faster. I really hope you guys love it! TestPilottalk to me! 03:37, 27 June 2008 (UTC)[reply]

The only major objection I have is that it gives the definition lines of the first language section only. It would be great if a preferences (or the order of preference) could be set for a specific language (or languages), defaulting to the first one available. --Ivan Štambuk 06:10, 27 June 2008 (UTC)[reply]
That one is on the to-do list already! Furthermore, the current idea is to extend the project so that it can parse other language editions of Wiktionary. That would bring some nice opportunities, such as easily comparing, on the fly, the articles for the same word and then going to improve one edition. Or looking for a word's definition in your mother tongue first and, if no article is found, defining it using your second or third language of choice. As for the immediate future, tonight I'll try to implement language name parsing and show the name in the frame. That would remove any doubt about which language was used to define a particular word. The next step would be "smart" language search, so that it correctly goes for the definitions of words like Wind, or, to take an awful example, After.
The whole thing take time and lots of efforts, but I'm on it:) TestPilottalk to me! 22:32, 27 June 2008 (UTC)[reply]
Version 1.2.3 is out. Smart language lookup: if there is an English definition of the word, it will find it. If you want it, Ivan, it will take me about 5 minutes to make a custom version that goes for another language. Soon I will add user-selectable options; I just need to read some docs. This version is also the first one that should be able to auto-update itself, in theory. Plus, there is the first WikiLook for a non-English Wiktionary. It works like a charm, but not many words are defined in that Wiktionary, so it is not very usable yet; it is more of a proof of concept. TestPilottalk to me! 09:35, 30 June 2008 (UTC)[reply]
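The "smart language lookup" described above can be sketched roughly as follows. This is not WikiLook's actual code (which is not shown in this discussion); it is a minimal Python illustration that assumes an entry's wikitext uses level-2 headers such as ==English== for language sections and lines starting with a single # for definitions. The function name, the `preferred` parameter, and the sample text are all hypothetical.

```python
import re

# Hypothetical sample wikitext; the language names and glosses are placeholders.
SAMPLE = """==Danish==
===Noun===
# placeholder gloss for the first section

==English==
===Noun===
# first placeholder English gloss
#: ''an example sentence, which is not a definition line''
# second placeholder English gloss
"""

def first_definitions(wikitext, preferred=("English",)):
    """Return (language, definitions) for the first preferred language
    that has a section, falling back to the first section on the page."""
    sections = {}   # language name -> list of definition strings
    order = []      # language names in page order
    current = None
    for line in wikitext.splitlines():
        # Level-2 header only: "==English==" matches, "===Noun===" does not.
        header = re.fullmatch(r"==\s*([^=].*?)\s*==", line.strip())
        if header:
            current = header.group(1)
            sections[current] = []
            order.append(current)
        # Definition lines start with "#" but not "#:", "#*" or "##".
        elif current is not None and re.match(r"#(?![:*#])", line):
            sections[current].append(line[1:].strip())
    for lang in preferred:
        if lang in sections:
            return lang, sections[lang]
    if order:
        return order[0], sections[order[0]]
    return None, []
```

Passing a different `preferred` tuple gives the per-user language order of preference requested earlier in this thread; when none of the preferred languages has a section, the sketch falls back to the first section on the page.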

Request for Bot approval

I have made a bot using Notepad++. The bot is named SinBot and will be used against vandalism. I used JavaScript and Python to create the bot; I haven't uploaded the source code yet because I am waiting for approval. The bot's script notifies me when a page has had any content removed and shows me what was removed. It then asks whether I wish to undo the edit that constituted vandalism. The bot will also let me know when a page is blanked and ask whether I would like to undo the vandalism. I have checked and rechecked the scripts, and everything is fine. The bot was made using JavaScript only because I have no other option (except Python, which I used a little). It has been run on my computer and works fine. I only ask now, on bended knee, that you accept this request and allow me to help fight vandalism with SinBot. Thank you for your time, The7DeadlySins 04:32, 29 June 2008 (UTC)[reply]

Having looked at the source code, it does not look like JavaScript; it looks like a Bash script with mistakes. The code also strongly resembles ScsBot's: are you User:Scs under a new name, or did you copy it from User:Scsbot/wikised? Additionally, the code for the bot does not seem to do what you claim, though I didn't really look closely. Conrad.Irwin 19:36, 29 June 2008 (UTC)[reply]


Yes. I did copy the code from wikised, but I completely revamped it after spending a whole night researching Script, Batch script, and JavaScript. I reordered it to do these tasks:
  1. Notify me when there is vandalism (i.e., a page has been deleted or content removed)
  2. Revert the vandalism if I say so.

Also, I set it to log on whenever it wants. However, I am having trouble getting the bot to start up; how do I activate it? Furthermore, no, you didn't look closely. I made the bot to do EXACTLY those functions, and there are no loopholes. Trust me, I'm a computer programmer at heart. I triple-checked. Also, I am working on a more advanced version in Notepad++ in case this one fails. Cheers, and please let me know how to activate the bot, The7DeadlySins 19:46, 29 June 2008 (UTC)[reply]

It is not a batch script, it is a w:Bash script; it does not work under Windows by default, and you have to install bash (probably by using Cygwin). I have blocked you for one day so you can research all of this, and also because above you say "It has been run on my computer and works fine" and now you admit " i am having trouble getting the bot to start up". You clearly have no idea about running a bot, and this request is a waste of my time and everyone else's. Additionally, copying the code from User:Scsbot is illegal: he released the code under the GFDL, which means that you "must" give him attribution in your code (i.e. say where you got the code from) or you are breaking the law. Conrad.Irwin 19:59, 29 June 2008 (UTC)[reply]
We have no use for a bot with this function. It would mean that sysops who patrol recent changes would have to mark two separate edits as patrolled instead of doing a simple rollback. I would never trust this user to run a bot - I can't remember another new user with such a ludicrously low signal-to-noise ratio (lots of "look at me" pages, duplicate emails, and less than a handful of useful edits, most of which would have been done by some of our existing bots). SemperBlotto 07:24, 30 June 2008 (UTC)[reply]

Would anyone object to moving User:Ptcamn/lulz to lulz (currently protected)? --Ptcamn 04:44, 30 June 2008 (UTC)[reply]

Are quotes from livejournal and other free blog accounts considered valid attestations? I don't believe they fit the description “permanently recorded media”. Michael Z. 2008-06-30 05:13 z
Done, rfv'd. It's now up to the courts to decide. -Atelaes λάλει ἐμοί 05:23, 30 June 2008 (UTC)[reply]
From what I understand such quotes don't count for RFV, but are fine to illustrate usage. Conrad.Irwin 13:31, 30 June 2008 (UTC)[reply]

Unicode

Mutante and I were playing around with some lists of Unicode characters, and we decided that it would be both possible and useful to create entries for those Unicode characters about which we have no information. Such entries would be created from a list such as this one and would contain the Unicode character name, which is a human-readable description of the character (e.g. "Latin Capital Letter Y with macron"), thus making it possible for people to obtain basic information about the symbol. They would (as a bonus) include the Unicode character block ("Latin Extended-B"), the Unicode code point ("U+0232"), and an external link to the relevant Unicode Consortium page. An example entry can be seen at Template:new unicode char. Before we embark on creating the more relevant of these entries (probably only the Extended Latin character blocks), I would like to check whether people have any suggestions for making these entries more useful. This is the kind of thing that can be done by a bot or "enhanced" human editing once we've worked out how to do it. Conrad.Irwin 14:11, 30 June 2008 (UTC)[reply]
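As a rough illustration of the data such entries would carry, the following Python sketch looks up the code point, character name, and general category with the standard unicodedata module. It is not the bot or the template proposed here. Python's standard library has no block lookup, so the block is determined from a single hard-coded range for Latin Extended-B (U+0180 to U+024F); the function name `describe` and the dictionary keys are illustrative assumptions.

```python
import unicodedata

# Assumption: only this one block is hard-coded, because unicodedata
# does not expose block names. Latin Extended-B spans U+0180..U+024F.
LATIN_EXTENDED_B = range(0x0180, 0x0250)

def describe(char):
    """Collect the basic Unicode properties of a single character."""
    code_point = ord(char)
    return {
        "code_point": f"U+{code_point:04X}",
        "name": unicodedata.name(char),
        "category": unicodedata.category(char),
        "block": "Latin Extended-B" if code_point in LATIN_EXTENDED_B else None,
    }

# The example character from the proposal above.
info = describe("\u0232")
print(info["code_point"], info["name"], info["category"], info["block"])
```

The general category is included because it is what separates letters (such as Lu and Ll) from symbols, which bears on the choice of part-of-speech header for these entries.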

Two concerns.
1. I think the focus should be on semantic symbols or letters, rather than Unicode code points. Although Unicode tries to divide up writing into units of meaning, and define a best way to express something, there is often more than one acceptable expression (e.g. the dumb apostrophe “ ' ” can represent an apostrophe, a single quotation mark, a prime, etc.), and there are code points which overlap in meaning.
2. Is this really a dictionary entry? This information is not a definition nor an etymology, and isn't in the scope of WT:ELE. It is an attribute of the Unicode code position rather than of the represented letter. If we provide a reference to a letter's Unicode attributes, why not also its ISO-8859 code point, etc? Are we also going to describe its Unicode spacing, combining, collation, case characteristics, etc? This is threatening to become encyclopedic. Cf. w:Y with stroke. Michael Z. 2008-06-30 15:40 z
To point 1 I fully agree; to point 2 I'd suggest just describing the character itself, i.e. "the letter y with a stroke through its upper stems", or ⏢ "a white, usually symmetric trapezium". Of course this can't be done via bot and has to be done manually. -- Prince Kassad 16:08, 30 June 2008 (UTC)[reply]
We could mention that a letter is defined as a y with a stroke, but we don't normally describe the visual appearance of a letter or other symbol, just as we don't describe a, Y, 5, =, ж, or μ. For one thing, the symbol is right there on the page to see. Michael Z. 2008-06-30 16:52 z
When there are upper-case and lower-case variations, it's good to show them on the "headword/inflection" line, as is done in a and similar entries. For characters that are only used in one or two languages, it would probably be better to show the actual language as the H2 header rather than "Translingual". For readers without the necessary fonts, it would be helpful to include an image of the character. It may also be interesting to include a brief "etymology" so readers can understand why a stroke was added for example. Finally, some example expressions that include the character would be nice. Of course, most of the preceding will have to be entered by a human rather than a bot, but they will make the entries more useful and more similar to entries for actual words. Rod (A. Smith) 17:25, 30 June 2008 (UTC)[reply]
Sounds like a good approach. Michael Z. 2008-06-30 17:52 z
+1 to this initiative and to all of Rod's suggestions. Also, this is a minor thing, but since LATIN SMALL LETTER Y WITH STROKE is in the category Ll (lowercase letters), shouldn't its POS header be ===Letter=== rather than ===Symbol===? —RuakhTALK 01:44, 1 July 2008 (UTC)[reply]
It is only a letter in the Lubuagan Kalinga language of the Philippines. If we call it a "letter", then it is only used in that language and should have the appropriate L2 header. If we are treating the Unicode symbol, then it is Translingual. --EncycloPetey 01:53, 1 July 2008 (UTC)[reply]
Unicode classifies this character as a letter, not as a symbol; AFAIK it's not a symbol for anything; and if it is a symbol for something, our entry doesn't say that it is one, much less indicate what it might be a symbol for. If this letter is indeed only used in a single language's writing system, then you are right that the appropriate L2 header is preferable to ==Translingual== (though I dunno where a bot would get that information); but I don't see how the use of ==Translingual== is an argument for the use of ===Symbol===. —RuakhTALK 03:03, 1 July 2008 (UTC)[reply]

Etymology trees

It would be nice to have a system in place to show the development of word stems to their present forms in modern language, and a section to show language branching. Perhaps with a legend for the differentiation of meanings across languages. Just a language tree would be nice to have. Thoughts?

For reconstructible PIE nouns there are some fairly complete entries in the appendix (*ph₂tḗr, *méh₂tēr, *bʰréh₂tēr, *swésōr, *gʷḗn, *wĺ̥kʷos, *ḱḗr, *h₁nḗh₃mn̥ etc.). Semantic shifts and not so evident meanings are usually indicated with glosses when they occur. For the PIE roots, unfortunately, the entries are not as complete, other than for very basic ones such as *bʰer-, *steh₂-, *h₁es-, *deh₃-, whose reflexes have not changed much in either meaning or shape even to this day. You are welcome to contribute to any language family of your preference. --Ivan Štambuk 05:28, 1 July 2008 (UTC)[reply]

Scope of related terms

I am wondering about the proper scope of the section Related terms. I have added philosophy of science to the section Related terms of epistemology, as the exact distinction between the two terms is unclear to me; for instance, I am at a loss as to whether Popper's falsification theory is a contribution to the former, to the latter, or to both. WikiPedant has removed philosophy of science from epistemology per WT:ELE's definition of what counts as a related term: only etymologically related terms count as related. Per WT:ELE his removal is right AFAICS.

However, I have been treating Related terms more broadly, and, as I now see, not in line with WT:ELE. For instance, I have added "computer science" and "information theory" as related terms of information science, based on the felt risk of confusion of the terms, not based on their etymological relationship.

So I wonder: where should I put terms that I want to have contradistinguished from the term of the entry? Or should they be not there at all?

Also, I have been moving terms from See also to Related terms, which too seems to be wrong.

--Daniel Polansky 07:39, 1 July 2008 (UTC)[reply]

If the additional terms are of real value for understanding the entry, and cannot be included either as part of the definition or as etymologically related terms, then a ===See also=== section may be used. However, it should be used only sparingly, and some community members dislike seeing that section at all. --EncycloPetey 07:42, 1 July 2008 (UTC)[reply]
Thanks. I have now noted that See also is mentioned at WT:ELE and that it could be used for the purpose. I have now also noticed the existence of Wiktionary:Semantic relations, which however is not a policy and does not seem to be endorsed. It seems to me that the relationships between terms that I wanted to enter do not help understanding the entry alone; they help to understand a group of terms within a domain of discourse, and in other cases, they help to remind the reader that two or more terms have vastly different meaning, in spite of their syntactic similarity, such as information theory and information science. In any case, they are not about single terms; they are always about two or more terms. --Daniel Polansky 08:06, 1 July 2008 (UTC)[reply]
There is a limit to what a general dictionary can do. The semantic relations headers other than Synonym can be difficult to apply and can be difficult to understand (even Antonym in many cases). I sometimes amuse myself with such things, but I don't understand how they could be very useful to users. Often the best "See also" is just one or more {{pedialite}}. DCDuring TALK 00:00, 2 July 2008 (UTC)[reply]
I find synonyms, antonyms, derived terms, and related terms very useful, especially synonyms and antonyms. My perspective is one of a non-native speaker. I have spoken in favor of derived terms in a recent discussion in Beer parlour.
Specifically on antonyms: Often, I want to express an idea but only know its antonym, in whose entry I find candidate terms for my idea under the head of Antonyms. Also, there are pairs of words that syntactically appear as antonyms, but semantically are not; I've got no example right now. Antonym forming in English is not altogether regular, featuring the prefixes "non-", "non", "un", "im" and the like, unlike in my mother tongue where it is rather regular, so knowing which prefix applies to which particular word is useful. Also, when different antonyms are given to more senses, my confidence in the meaning often increases, again without examples.--Daniel Polansky 08:05, 2 July 2008 (UTC)[reply]
To clarify, "derived terms" and "related terms" aren't semantic relations per se. —RuakhTALK 18:04, 2 July 2008 (UTC)[reply]
Correct, thanks. I've mixed them up in my reply. --Daniel Polansky 09:36, 3 July 2008 (UTC)[reply]
Let me see if I understand:
  1. Related terms = defined list of Semantic relations + words morphologically connected to head word that are not derived terms;
  2. Derived terms are those derived from the word (compounds and other morphological derivatives using the unaltered headword or inflected forms)
  3. See also = Semantic relations not on the defined list. DCDuring TALK 10:58, 3 July 2008 (UTC)[reply]
Yes regarding "See also" and "Derived terms", but no regarding "Related terms". All items in the "Related terms" sections should be etymologically related to the head word. If a term is semantically related to the head word but shares no history with it, it should not be included in the "Related terms" section, but perhaps in one of the sections devoted to semantic relations. I also use the "Related terms" as a repository for terms that clearly share some history with the head word, even if I don't know whether they were derived directly from it, hoping that another editor who knows more of the history of the words can sort it out. Rod (A. Smith) 15:59, 3 July 2008 (UTC)[reply]
If that is the case, then the answer to the original question is that "surjection", "injection", and "bijection" are mutually "Related terms". My association of Wiktionary:Semantic relations with "Related terms" was wrong, although based on the actual practice of using "Related terms" as a holding pen for terms related either semantically (same PoS, usually no direct etymological relationship) or etymologically (usually different PoS).
I would expect that "Related terms" should not include terms that share only minor morphemes like prefix and suffix, that at least one stem should be involved. DCDuring TALK 16:34, 3 July 2008 (UTC)[reply]

Related: Do injection and bijection qualify as related terms of surjection, per sharing a grammatical root, even if not a prefix? It seems to me that the definition of "related term" for the purpose of the Wiktionary sections deserves a more detailed elaboration than the one currently found at WT:ELE, including some borderline counterexamples, or even obvious counterexamples. --Daniel Polansky 15:30, 1 July 2008 (UTC)[reply]

Using Wiktionary:Semantic relations (which hasn't been rejected) and if I get these terms properly:
IF "jection" were a valid name OR if "mapping" were suitably defined, THEN
they would all three be "Coordinate terms" and could appear as "Related terms" in each other's entries ELSE
"bijection" is hyponym of both "surjection" and "injection", which are then, by definition, hypernyms of "bijection", but, not having one of the named semantic relationships, "surjection" and "injection" can only appear in each other's "See also" sections. DCDuring TALK 00:00, 2 July 2008 (UTC)[reply]
With my question, I intended to head in a slightly different direction: does "having a shared grammatical stem and differing only in prefix", such as "to project" and "to inject", qualify as "having strong etymological connections"? If not, I think "to project" and "to inject" should better be mentioned as counterexamples in WT:ELE#Related terms. --Daniel Polansky 08:05, 2 July 2008 (UTC)[reply]
I think that kind of relationship should fit into Etymology or Derived terms. To facilitate pursuing those relationships Etymology can and should include morphology for constructed words so that links are available to go to the entries that have the constituents of the word. Those constituents should have Derived terms that show other words derived from them. I don't think there is a good home for all aspects of the mixed semantic-morphological relationship that exists in the case of the mathematical "-jections". The semantic relationships, at least, can be presented at the level of an individual definition. Related terms and Etymology do not penetrate to the definition level. In the case of the "-jections", there is almost certainly a case to be made for creating an entry for at least one of "jection" or "-jection", which would then allow separate etymology for the various "jections" and allow the use of the existing etymological and semantic headings to display the relationships more clearly. DCDuring TALK 10:52, 2 July 2008 (UTC)[reply]
I support creating an entry for -jection. (But I must say that I'm surprised to see you advocating it, as you requested that our entry for -scribe be deleted, and I don't see how that case is different?) —RuakhTALK 18:04, 2 July 2008 (UTC)[reply]
I came up short of advocating it. I was just trying to advance the discussion. I am interested in how far the existing structures go. I don't like -jection (because it is like -scribe), but I haven't found any evidence that jection exists. In English, we have no provision for "Stems" as a part of speech heading, though we do have "Prefix" and "Suffix". It would be interesting to have a morpheme namespace to and from which we could have various mappings from and to principal namespace. This would give a home for "infixes" and other troublesome entries, possibly Symbols. DCDuring TALK 18:28, 2 July 2008 (UTC)[reply]

Yet Another Interminable Discussion about Wikisaurus

Hello everyone, your favorite mosquito here. I've got yet another question about wikisaurus. Now I know you have gotten to the point of ignoring me and letting me do my thing, but I have a question about the Wiktionary:Wikisaurus page. On it there is, of course, the list of words created to date. So far so good. Then we have this:

   * Help:Creating a Wikisaurus entry
   * Wiktionary:Thesaurus considerations - Original discussion about the project.
   * /purpose - Centre for discussion of the purpose of Wikisaurus
   * /improvements - Discussion about the direction and overall project.
   * /criteria - Discussion about the content and criteria for inclusion within the Wikisaurus project.
   * /format - Discussion about the formatting and general contents of a Wikisaurus entry.
   * /requested entries
   * /templates - short list

My question is: just how much of this is still:

    1. valid
    2. needed
    3. confusing

I plan on doing the job for quite some time yet, but it wouldn't be impossible that someone would come along wanting to help. Is there a point when we can start doing cleanup of the resource pages so that people entering don't get hopelessly lost in the clutter? Amina (sack36) 21:46, 1 July 2008 (UTC)[reply]

And while we're on the subject of wikisaurus clutter, I'd like to point out a tiny glitch in something mentioned on the policy of Wikisaurus. It says to put in a section for linking non-synonymous non-antonymous related words. I have updated one record to follow that strategy. This one was not picked on purpose, guys, it was just the one that was mentioned as an example! May I point your attention to this page: http://en.wiktionary.org/wiki/User:Sack36/sandbox and then look at the other linkages I found that should be added to that page. I found them here: http://en.wiktionary.org/wiki/Wikisaurus:beer I haven't looked anywhere else and yet It still starts looking like a three book tome. What would you suggest? Amina (sack36) 00:17, 2 July 2008 (UTC)[reply]

It looks like Wikisaurus is gaining a little momentum, maybe enough that some of these issues can be discussed, instead of brought up and unanswered. In my opinion none of it is completely valid, especially since any of it is subject to change.
Format: Wikisaurus should be more than just a list of synonyms. Ideally it would actually explain when you would use what word under what circumstances, to the extent that we can generalize it. Somehow I don't think we can jump in with the experience of American Heritage though. Maybe a good starting point would be collocations as illustration.
Criteria: Exact same criteria as for dictionary entries. Why duplicate the process? One of the advantages of linking to the main space is that it's easy to check. Having criteria is important because without it the crud builds up very quickly.
Templates: The ones I've made express the ideas that I've had about titling pages. Wikisaurus has an entry on vomit in the sense of "regurgitate". I don't think any single word can pin the concept down precisely. Both vomit and regurgitate have other meanings, but the intersection is a clearer, single concept. DAVilla 07:35, 16 July 2008 (UTC)[reply]

Wikibooks

The lead featured book over on Wikibooks is the introduction to Spanish. We've been talking about increasing our internet profile, and so I was dismayed to find that not only does this book not link to Wiktionary, but there isn't even a template for linking to Wiktionary over there! --EncycloPetey 19:10, 2 July 2008 (UTC)[reply]

I added a few wikt links to that book back in November ([26]), but it would certainly be nice to have an easier way to make those links. Rod (A. Smith) 19:35, 2 July 2008 (UTC)[reply]
I reckon it would help if some people just went through it and placed wikt links on all important words. I've started already :) --EivindJ 23:03, 2 July 2008 (UTC)[reply]
There really ought to be a prominent link from the first page too, don't you think? Something to let readers know that there is an available companion resource. --EncycloPetey 06:28, 3 July 2008 (UTC)[reply]
[27]msh210 17:56, 3 July 2008 (UTC)[reply]
Also started adding a few links: [28] and [29]. Mutante 09:45, 3 July 2008 (UTC)[reply]

Redirect from "article + noun"

I reckon it might be OK to make redirects from "article + noun" to the noun. In English it might not be necessary to redirect "an apple" to apple, but articles clearly play a greater role in other languages, e.g. Norwegian or Spanish. Should we, when we stumble upon them, redirect la lunatica to lunatica and et tre to tre, or should we simply mark them for speedy deletion if people create them? I think they are plausible entries, and I reckon many people enter the article when they search. --EivindJ 16:11, 3 July 2008 (UTC)[reply]

Articles are prefixes or enclitics in a number of languages, including the Semitic languages, the Scandinavian languages, Romanian, and Bulgarian. Certainly these should be kept either as redirects or as entries similar to "form of". In some cases, the form with the article is different from the base noun (e.g., Den Haag, 's-Gravenhage, La Haya, A Haia) and deserves a full entry. —Stephen 16:35, 3 July 2008 (UTC)[reply]
An old BP discussion decided (against my better judgement) that we should not have (most) Hebrew words that include clitics, not even as form-of entries. (Hard redirection was not discussed.) For some reason, that discussion was about conjunction and preposition clitics, but not the definite-article clitic.—msh210 17:48, 3 July 2008 (UTC)[reply]
Here I was basically thinking about articles that indicate the gender of a word, as in Spanish and the Scandinavian languages. I cannot see that they deserve a full entry or a "form of" entry, but I reckon they should at least be redirected directly to the noun. --EivindJ 18:15, 3 July 2008 (UTC)[reply]
The Hebrew definite article is different from the examples discussed there in that it's less clearly a clitic: it mostly attaches to individual nouns and adjectives, not to full nominals. For example, as I'm sure you know, "the two children" is not *"ha-sh'nei y'ladim" but rather "sh'nei ha-y'ladim", and "the tall children" is not "ha-y'ladim g'vohim" (which means "the children are tall") but rather "ha-y'ladim ha-g'vohim". Traditional Hebrew analysis has viewed nouns as having three "states" — indefinite (status absolutus: bayit = house), definite (status emphaticus/determinatus: ha-bayit = the house), and construct (status constructus: beit = house-of) — and likewise for adjectives but without a construct state. I don't advocate following traditional analysis on all points, but this strikes me as a sound way to look at it, until such time as we find that modern linguists with Hebrew expertise have adopted a better analysis (and even then we have to ask what forms of Hebrew are covered by said better analysis). There are some cases in colloquial Modern Israeli Hebrew where it looks more clitic-like (e.g. "ha-beit sefer", which gets 122,000 hits — nothing compared to "beit ha-sefer"'s 2,470,000, but still nothing to sneeze at), but even for those, I'm not sure what the best analysis is: is it ha-{beit sefer} because ha- is a clitic for those speakers, or because {beit sefer} is a single word? ("Beit-sefer-im" gets essentially no hits compared to "batei-sefer", which seems to rule out the latter interpretation, but it could be somewhere in between.) If we do follow the traditional analysis, I believe definite nouns and adjectives would have normal form-of entries ("singular/plural definite form of ____"). —RuakhTALK 22:06, 3 July 2008 (UTC)[reply]
If there's no language ambiguity then this sounds fine. We already do hard redirects on phrases (with placeholders etc.) because a collision is so unlikely. I'd want to be really sure about the no ambiguity part though. DAVilla 09:27, 15 July 2008 (UTC)[reply]

trans-top

Does the sense given in template:trans-top make it seem like the translations given are of that term? For example, slide rule gives as its definition "an analog calculator consisting of three interlocking strips marked with...", so has {{trans-top|analog calculator}}, including, inter alia, Czech: logaritmické pravítko. Would the reader, not acquainted with Wiktionary conventions, read that as meaning that logaritmické pravítko means "analog calculator" (rather than as meaning that logaritmické pravítko means "slide rule")? I fear so. Perhaps template:trans-top, instead of <div class="NavHead" align="left">{{{1|Translations}}}</div>, should have something like <div class="NavHead" align="left">{{#if:{{{1|}}}|''In the sense of:'' {{{1}}}|Translations}}</div>.—msh210 17:42, 3 July 2008 (UTC)[reply]

That seems like a possible concern for some newer users and even newer translator-contributors. But the gloss is supposed to be no longer than the definition and the extra 1.5 inches of repetitive text on a series of collapsed translation bars might look silly. Does or could the template easily provide for optionally suppressing the "in the sense of 'headword'" for those cases where a knowledgeable editor had prepared a good shorter gloss that didn't need the phrase? Perhaps the translation header is where the repetition should be for the benefit of users before expansion, with the headword only appearing in the gloss after expansion, if indeed it were still needed. All of this might be annoying for veteran users so that having the option of suppressing if would be nice. I fear that all of this conditionality could make the design of the template complicated, so just take it as idle wishful thinking. DCDuring TALK 18:25, 3 July 2008 (UTC)[reply]
I'm sorry for not having been clear. The {{{1}}} referred to in trans-top is not the headword but a rewrite of the individual sense; so, in the example I gave, {{{1}}} is analog calculator. So I was suggesting that the header of the drop-down box read "In the sense of: analog calculator" instead of merely "analog calculator". To your points, though: I don't see why we couldn't suppress "In the sense of" in cases where {{{1}}} is a good gloss of the headword (so that the translations given are translations of {{{1}}}); we could use something along the lines of {{#if:{{{1|}}}|{{#if:{{{goodgloss|}}}||''In the sense of:'' }}{{{1}}}|Translations}}.—msh210 18:47, 3 July 2008 (UTC)[reply]
I still can't read template code, though I'm working on it. With the tech details, this discussion seems more GPish than BPish. In any event, the option for suppressing the "in the sense of: " would leave us with a visible marker for indicating glosses that haven't been looked at, assuming nobody runs a bot to automatically insert what turns that text off. DCDuring TALK 19:27, 3 July 2008 (UTC)[reply]
That haven't been looked at, yes, or that are not good glosses of the headword. They needn't be, need they? I mean, "analog calculator" is fine for the translation table (as long as, as noted above, indication is given that that's not what's being translated) even though it's not a good gloss. No?—msh210 19:50, 3 July 2008 (UTC)[reply]
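
The header logic proposed above can be sketched outside of template syntax. The following Python function is only an illustration of the nested {{#if:}} test; the function name is made up for this sketch, and the italic markup around "In the sense of:" is left out. Like {{#if:}}, it treats a blank or whitespace-only value as empty.

```python
def trans_top_header(gloss="", goodgloss=""):
    """Mimic the proposed header text for a translation table.

    gloss     -- stands in for {{{1}}}, the rewrite of the individual sense
    goodgloss -- stands in for the suggested {{{goodgloss}}} switch
    """
    gloss = gloss.strip()
    if not gloss:
        # No gloss supplied: fall back to the generic header.
        return "Translations"
    # Any non-blank goodgloss value suppresses the "In the sense of:" prefix.
    prefix = "" if goodgloss.strip() else "In the sense of: "
    return prefix + gloss


print(trans_top_header("analog calculator"))
print(trans_top_header("analog calculator", goodgloss="1"))
print(trans_top_header())
```

With this shape, a table header that still shows the prefix doubles as a marker for glosses nobody has reviewed yet, which is the side effect noted in the discussion.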

The purpose of the gloss is to disambiguate individual senses. I don't see the point of repeating part of a definition when there is only one sense. Better to repeat the headword “slide rule,” to reinforce exactly what is being translated.

When there are multiple senses, it would be clearer to use the headword, and add the disambiguating gloss to reinforce its supplementary function, as “slide rule (analog calculator).” Michael Z. 2008-07-03 19:46 z

I always add the gloss, even if there is only one sense. There is no guarantee that a new sense will never be added. --Panda10 22:10, 3 July 2008 (UTC)[reply]
When another sense is added, a gloss should be added to each as a matter of course. But there is no purpose to embellishing the table header with an unnecessary gloss—useless elements of an interface necessarily make it worse. Michael Z. 2008-07-04 00:06 z
Michael, I agree. When another sense is added, a gloss should be added. But not every editor will follow this rule. So I am trying to prevent a lot of additional work. Not to mention, that I am monitoring Category:Translation table header lacks gloss and if a trans table is created without gloss, this category will get a new member and it will grow quickly. --Panda10 02:39, 4 July 2008 (UTC)[reply]

This seems like a very good idea and very easy to implement. How much harm can that little extra sentence do compared to what gain it might give us. The way I see it, be bold and do it ... --EivindJ 22:44, 3 July 2008 (UTC)[reply]

For entries with a single sense, why not do both? That is, for slide rule, the trans-top gloss would read "slide rule - an analog calculator", or some such. That way, we don't confuse new users by having only the definition, but we also don't limit ourselves to forever having just one sense for the entry. --EncycloPetey 00:16, 4 July 2008 (UTC)[reply]
Can't this be done in the template, adding {HEADWORD}: (colon)? edit: Wait... I think I meant PAGENAME. As must be obvious, I really know my template talk... %-) -- Thisis0 19:20, 4 July 2008 (UTC)[reply]

Lack of consistency

After some contribution to this project I've reached the conclusion that this is a very nice project with many nice contributors and a very nice final product, but I've also understood that some basic (and sometimes more than basic) understanding of wiki syntax is needed. The extensive use of templates probably makes this project one of the hardest places for newbies to start and contribute, but then again it might be that Wiktionary relies on already experienced Wikipedians from other projects. Anyways, I was thinking about a more specific problem, and that is the lack of consistency when it comes to the names of the parameters on many of the templates. When you have an entry with more than one word it is often preferable to link to the different words inside the three '''. For this, {{infl}} has the parameter "head=", {{en-noun}} has "sg=" and {{en-adj}} has "pos" ... Would it be an idea to make these kinds of parameters (having the same function on more or less all the templates) have the same name on all templates? I reckon "head=" would be a nice parameter name, which I would prefer to have on all templates of that kind. Any reasons why we shouldn't? --EivindJ 13:55, 4 July 2008 (UTC)[reply]

+1 —RuakhTALK 14:22, 4 July 2008 (UTC)[reply]
I'd also strongly support an effort to make template parameters consistent. Rod (A. Smith) 16:26, 4 July 2008 (UTC)[reply]
The only reason not to do it is that the current crew is used to this way. But we need more contributors and any reasonable standardization would help in getting them up to speed.
There is some logic to the existing system, because the parameters have somewhat different functions. In {{en-verb}} "inf" automatically adds "to", which might otherwise be forgotten. "pos" can be used for all(?) of the en- templates (at least) other than {{en-verb}}, I think. {{infl}} can be used for any language, which puts much more burden on the person entering.
Maybe a place to start is to document the named parameters used in classes of templates. They may be more standardized than appears at first glance. It might also be useful to replace some of the less-used templates with the best modern forms in entries so that the obsolete templates can be eliminated. DCDuring TALK 16:31, 4 July 2008 (UTC)[reply]
The "pos" parameter in {{en-adj}} stands for "positive", as opposed to "comparative" and "superlative". To me, "head" seems a better name for the general parameter because it cannot be confused with "part of speech", "positive", or {{pos}} (the ISO 639-3 code template for "Sayula Popoluca").  :-) Rod (A. Smith) 19:38, 4 July 2008 (UTC)[reply]
I confess to having assumed it referred to "Part of Speech", which didn't keep me from using it correctly, I think. Your reasons seem fine, but, in my limited understanding, "head" in infl works differently than "pos" in the en- templates. "pos" also appears in other en- templates where I didn't read it as meaning "positive", {{en-pron}}, {{en-intj}}, {{en-prep}}, {{en-conj}}, {{en-det}}, (not the most commonly used); as well as {{comparative of}}, {{superlative of}}, where positive makes more sense. It would be tough if we let ISO 639-3 deprive us of many desirable three-letter abbreviations for our templates and for parameter names. DCDuring TALK 20:01, 4 July 2008 (UTC)[reply]
There are some good reasons for the inconsistencies. Now, for {{en-noun}}, it might be fine, and the template might be set up to accept either head or sg for the parameter name; I think it already has a couple of built-in options. It would also have the benefit of allowing the template to more freely be used on plurale tantum nouns, where "sg=" makes no sense. For {{en-verb}}, I think the suggestion would also work well. But while uniformly using "head=" might seem like a good idea, there are some situations where it creates problems. For adjectives and adverbs, I'm skeptical. If someone could generate by bot a short list of entries that use either of those templates with an explicit pos=, it would either allay my fears as unfounded, or else reveal a problem that needs to be addressed first.
For foreign language entries, the use of "head=" isn't always the best choice. For some of us, the "usual" parameter is "alt=", so there will be some inconsistency even if English can be made uniform. Additionally, some templates work entirely on the basis of roots or stems, such as the Esperanto templates, so head= would be meaningless in those cases. --EncycloPetey 04:12, 7 July 2008 (UTC)[reply]
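
As a rough illustration of a template accepting several parameter names for the same thing, here is a small Python sketch. It is not how any existing template is written; the function name and the order of the aliases are assumptions made for the example. The first non-blank alias wins, and the page name is the fallback.

```python
def resolve_headword(params, page_name, aliases=("head", "sg", "pos")):
    """Return the display form of the headword.

    params    -- dict of named template parameters as the editor typed them
    page_name -- title of the entry, used when no alias is supplied
    aliases   -- parameter names treated as synonyms, in order of preference
    """
    for name in aliases:
        value = params.get(name, "").strip()
        if value:
            return value
    return page_name


# An older entry using sg= and a newer one using head= resolve the same way.
print(resolve_headword({"sg": "[[slide]] [[rule]]"}, "slide rule"))
print(resolve_headword({"head": "[[slide]] [[rule]]"}, "slide rule"))
print(resolve_headword({}, "slide rule"))
```

Keeping the old names as fallbacks means existing entries keep working while the preferred name is phased in.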

Not to change the subject, but another inconsistency that bugs me is with language parameters. {{attention}} and {{infl}} use {{{1}}} and the ISO code; {{rfc}} uses {{{lang}}} and the ISO code; {{etyl}} uses {{{2}}} and the ISO code; {{abbreviation}} and {{acronym}} use {{{1}}} and the spelled-out name of the language. In my opinion, which you can take or leave, all templates that call a language parameter should allow users to enter the language code or the spelled-out language; and all should allow users to use {{{lang}}} (in addition to whatever numbered parameter they use now, if any).—msh210 17:53, 4 July 2008 (UTC)[reply]

That problem has been noted and raised by many of us. Some folks have made some real progress towards solving the issue, and so some templates already have this problem corrected. However, there are some cases where the user must be forced to use one or the other, especially in category names where some require ISO and others require the full name, or we will get incorrectly named categories requiring mass cleanup. Now, {{rfc}} requires use of a named parameter because that parameter is optional, and {{{1}}} is the reason/note regarding cleanup. But {{infl}} must have a language, so it's silly to require the user to always have "lang=". Now, you're mistaken slightly about {{etyl}}, because it uses both {{{1}}} and {{{2}}}, and interprets one as the language name and the other as the ISO code. As a result, it can't use "lang=", because it may require two such values in a single use. You are correct about {{acronym}} and {{abbreviation}}; both of those ought to accept the ISO code with lang=, because the code should always be interpreted the same way. In those cases, the feature has simply not yet been added, because the templates predate {{lang}} and other ISO interpreting code. --EncycloPetey 04:12, 7 July 2008 (UTC)[reply]
I agree with nearly all the comments made here so far (with the exception of msh210's critique of the deftly written {{etyl}}, which EP seems to have rebutted nicely). I think that a greater deal of uniformity is an excellent idea, and we have a great deal of room to improve on this. However, as EP notes, there are a few places where an editor will simply have to know the quirks of a specific template, with etyl being an excellent example of a template which can't take a lang parameter. A good start would be to make many of the templates take multiple parameters to mean the same thing (e.g. let {{en-adj}} take pos or head, and have them both work). I bet if we asked nicely, Robert Ullmann would be more than capable of both writing these in and/or siccing AutoFormat on the parameters we would like to deprecate, so that they can leisurely be exterminated, and the parameter redundancy be lifted. -Atelaes λάλει ἐμοί 08:10, 7 July 2008 (UTC)[reply]
Well, {{etyl}} can have {{{lang}}} as a synonym of {{{2}}}, much as other templates have {{{lang}}} to indicate the language of the word being defined on that page. {{etyl}}'s having {{{lang}}} as a synonym of {{{1}}} would not be in line with other templates.—msh210 16:47, 15 July 2008 (UTC)[reply]
While it can have that, it would be terribly confusing for editors to have a template that accepts "lang=" for only one of its two ISO slots. --EncycloPetey 17:45, 15 July 2008 (UTC)[reply]
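
The suggestion that a template accept either the language code or the spelled-out name, through either lang= or a numbered parameter, might be sketched as follows. This is a Python illustration only: the three-entry table and both function names are invented for the example, and a real template would draw on the full set of codes.

```python
# Tiny illustrative table; a real lookup would cover every supported language.
LANGUAGES = {"en": "English", "es": "Spanish", "la": "Latin"}
NAME_TO_CODE = {name.lower(): code for code, name in LANGUAGES.items()}


def normalize_language(value):
    """Accept a code ("la") or a spelled-out name ("Latin"); return (code, name)."""
    key = value.strip().lower()
    if key in LANGUAGES:
        return key, LANGUAGES[key]
    code = NAME_TO_CODE.get(key)
    if code is None:
        raise ValueError("unknown language: %r" % value)
    return code, LANGUAGES[code]


def language_from_params(params):
    """Prefer lang=, then fall back to the first numbered parameter."""
    raw = params.get("lang") or params.get("1")
    if not raw:
        raise ValueError("no language given")
    return normalize_language(raw)


print(language_from_params({"lang": "la"}))
print(language_from_params({"1": "Spanish"}))
```

Because both spellings resolve to the same (code, name) pair, category names can always be built from whichever form the category scheme requires. A template that needs two languages in one call, as {{etyl}} does, would still need two distinct parameter slots.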

Quotations from newspapers / magazines

Hi, I was wondering how to format quotations from newspapers and / or magazines. Please see Wiktionary talk:Quotations#Quoting of newspapers/magazines H. (talk) 15:06, 4 July 2008 (UTC)[reply]

Community censure of Robert Ullmann

I have spoken to Robert, and a couple of others have chimed in about his poor behavior, all founded on assumptions of the reasons for a single pair of reversions [30] [31] of an anon contributor. The anon had moved a quote off an entry's main page to another page; I moved it back. The anon has since returned and decided to again remove the quotation [32], because he/she feels that (contrary to prior community discussion) the forms do not match exactly and so should not appear on the page [33]. That is the background.

Here is Robert's initial response, in which he charges me with vandalism for making the revert: [34] The ensuing conversation is at [User_talk:Robert_Ullmann#Vandalism charge], in which I chastise Robert for making such ugly accusations, and in which Robert is then chided by two other community members. Robert holds firmly and vocally in the discussion to his charge of vandalism, and proceeds to ascribe many motives to my actions which are clearly assumed. He has refused to retract his serious allegations that my reverts were motivated and carried out as deliberate vandalism.

If Robert had a charge to level against me, he should have come to me (as I went first to him). As it stands, Robert has yet to ask me for my side of things. (It was a simple pair of reverts.) Instead, he has woven an extensive fabrication of untruths and half-truths to claim that I have acted in vandalism.

I request that the community make it clear that such serious charges are unwarranted, and that the community make a plain statement that such attacks should never be posted on third-party pages. Robert's actions are unwarranted and are a blot on the reputation of Wiktionary. --EncycloPetey 16:35, 6 July 2008 (UTC)[reply]

I removed the comment from the IP anon talk page. I do stand by my primary complaint that removing content from non-lemma pages is very wrong, but this has been discussed before and will be again. I do want to apologise to EP. My explanation, while not an excuse, is that I have been having a desperately bad day; normally the wikt is a relief, requiring a lot of intellectual focus and thus distraction. This has not been working today. (If you want the gory details of the bad day, send me email, I'm certainly not going to splat it here! ;-) May I ask you to please stop escalating this; it was only on my talk page (after I took it off the anon, who has since replied there.) Robert Ullmann 16:45, 6 July 2008 (UTC)[reply]
First of all, yes, charging another admin with vandalism is a bad idea, and I believe Robert was in the wrong to do so. However, he has apologized (in bold no less), and retracted his statement. In my opinion that closes the matter. We all have our off days and make imprudent actions from time to time. -Atelaes λάλει ἐμοί 17:42, 6 July 2008 (UTC)[reply]
(after edit conflict) I object to the use of the word vandalism, as to me that word implies bad faith, but I think it's important that administrators be honest with new users, rather than pretending that administrators all always do the right thing and always agree with each other. Our policies are poorly documented, and we frequently disagree on whether a given thing has been decided or not; so, it's not like we can just hide from newbies the fact that we lack agreement about even very basic things (e.g. what kind of entry is optimal for non-lemma forms). R.U. having now apologized (or having now expressed an affirmative desire to apologize, which is pretty much the same thing), I see no need to censure him. Everyone makes mistakes, everyone feels strongly about random things that others find unimportant, everyone sees mastodons occasionally, everyone learns to forgive and forget. —RuakhTALK 17:53, 6 July 2008 (UTC)[reply]
Mistakes happen. Conflict happens. Please let it go. DCDuring TALK 18:04, 6 July 2008 (UTC)[reply]
As Robert has been willing to apologize, I accept his apology. Please note, Robert, that this was restoration of the quote to where it started only, and nothing more. If you believe that the quote should be duplicated on the page for the matching inflected/plural form, I won't disagree with that. I merely wanted to return the quote to the "primary" page to ensure we had a citation there. That (and that alone) was my motivation in making the reversions. The comment I posted to the anon reflects this, as it only mentions removal of the quote from the lemma page, and does not discuss the issue of adding it to the plural page. --EncycloPetey 20:23, 6 July 2008 (UTC)[reply]

I'm not going to comment on any of the charges as they appear to have been resolved. I would like to say, though, that I completely understand Robert's point about duplicated content, which is not something that others have backed him on. The frustration I think comes as a result of fully reverted changes that admins label flatly as wrong without asking if any part of the change might be productive. No, I wouldn't call it by the V-word at all since in this case the removed content still exists in an appropriate place on Wiktionary, but there's nothing inherently wrong with having it in both places. I say this despite disagreement on precisely this point about the titles under which Citations should be combined or not, but my position is not inconsistent. I see Citations as a space for potentially holding all quotations, not every one of which could be so illustrative as to be included in a dictionary entry somewhere. Thus, anything listed in the main space as illustration would be, potentially, a duplicate anyway. If it's helpful in more than one location, then why force a choice between them? This is a fairly minor point of course, relative to the escalation. DAVilla 09:11, 15 July 2008 (UTC)[reply]

Wikipedia/Wiktionary

Quick question please -- can a word be defined both here and on Wikipedia? Smith Jones 00:49, 7 July 2008 (UTC)[reply]

Wikipedia avoids definitions, but a Wikipedia article might have the same name as an entry here; for example, our horse defines the word horse, while Wikipedia's Horse discusses horses. —RuakhTALK 00:59, 7 July 2008 (UTC)[reply]
I was referring more to the phrase detente balam, which is more or less defined here and goes into very little more detail on Wikipedia. It seems redundant to me, so I wanted to make sure that it was kosher. Thanks for replying. Smith Jones 01:02, 7 July 2008 (UTC)[reply]
Ah, I see. We decide whether to include an entry based on Wiktionary:Criteria for inclusion, and Wikipedia decides whether to include an article based on w:Wikipedia:Notability and other policies. Since a dictionary and an encyclopedia tend to include different things, an entry here will generally differ significantly from a corresponding article there; but in some cases they'll be quite similar, and there's nothing wrong with that in itself. —RuakhTALK 02:33, 7 July 2008 (UTC)[reply]
For what it's worth, a common reason given for deletion on Wikipedia is (or at least used to be) "dicdef": that the article is merely a "dictionary definition" and unlikely to be more than that. Likewise, a common reason here is "encyclopedic" or "not dictionary material": that the entry is not a word but subject for an encyclopedia article (like Columbia-Presbyterian; cf. w:Columbia-Presbyterian).—msh210 16:43, 9 July 2008 (UTC)[reply]

Merriam-Webster copying us?

Is it just me, or has Merriam-Webster flat-out copied our first two examples for mondegreen? bd2412 T 09:10, 7 July 2008 (UTC)[reply]

Is there a problem with this? Are we really worried about copyright infringement on our GNU license? According to that, "...this means that the entries will remain free forever and can be used by anybody." Amina (sack36) 16:49, 7 July 2008 (UTC)[reply]
It's good to know and to be able to cite chapter and verse when discussing the relative quality of on-line dictionaries. In this case it's a little hard to tell because there are only about 10-20 commonly cited mondegreens. DCDuring TALK 17:36, 7 July 2008 (UTC)[reply]
It's a bit flattering actually, although even the GNU technically requires that we be credited. I suppose they could argue that it's a coincidence, but if it is, it is certainly quite the coincidence. bd2412 T 21:52, 7 July 2008 (UTC)[reply]
Perhaps they did, but we didn't invent those examples ourselves, so to me it sounds a little strange to say we have a copyright on them. Consider, for example, how we've copied quotations directly out of the OED. And numbering two, it's hardly a collection. DAVilla 08:33, 15 July 2008 (UTC)[reply]

Placing of Etymology

Is there a reason why etymology precedes definition? I have found words (though at the moment I can't remember them) that have the same spelling but different etymologies, and the second most frequent use of a dictionary is for definition of a word, not etymology. (For some wrong-headed reason spelling is number one. Go figure.) Amina (sack36) 16:45, 7 July 2008 (UTC)[reply]

Etymologies are placed first because a word's origin is the only reasonable way to sort it. While this is less important for the average word, for words that have four completely different words sharing the same spelling, it becomes highly important to be able to sort out what is actually related to each other, and what is related simply by happenstance. 'Tis a lesson learned from biology. -Atelaes λάλει ἐμοί 17:08, 7 July 2008 (UTC)[reply]
We already have two entry structures: one where Etymology comes first and at the same level as POS sections, and one where POS sections are grouped under Etymology. Even if we don't want to change the second structure, that doesn't prevent us from changing the first one and putting Etymology last and at the same level as POS sections. —RuakhTALK 18:44, 7 July 2008 (UTC)[reply]
Another take: for a given entry, there can be multiple parts of speech and multiple etymologies (as many as 10!: kaeru#Japanese). For a given etymology-part of speech combination, there can be multiple definitions. Grouping the parts of speech that share the same etymology can help someone understand the range of meanings the word has more easily. It is even more of a help to someone trying to prepare an entry. In a wiki the latter consideration counts for more than it would in an on-line dictionary with a paid professional staff. If we could efficiently present differently to "normal users" (say, all unregistered users and registered users who opted to be treated as unregistered users with respect to presentation), then we could present one way to those with more sophisticated tastes in language and another way to "normal users". Under the new WMF budget there is supposed to be much more spending for technology, so it may become possible to contemplate more developer-, server-, and bandwidth-intensive possibilities. In the meantime, portions of excessively long etymology and pronunciation sections could be placed under a show/hide bar using {{rel-top}} and long "Alternative forms" or "Alternative spellings" lists can be laid out as horizontal lists instead of vertical ones. DCDuring TALK 17:32, 7 July 2008 (UTC)[reply]
Further, consider placing the etymology if it did not precede the definitions. Because there can be multiple etymologies, we would have to mark each and every definition line as to its etymology if we did that. To avoid needless repetition and screen clutter, it is better to simply use the Etymology as a grouping mechanism. (Note to Atelaes: Although it is not incorrect to say we learned this lesson from biology, the biologists got the idea from Hennig, who got the idea from the historical linguists. And so we come full circle.)--EncycloPetey 18:04, 7 July 2008 (UTC)[reply]
The Merriam-Webster on-line solution is to present a single etymology-PoS combination at a time, which easily allows the etymology to be presented after the definitions. Their solution is not perfect by any means: for words that have multiple etymologies they may present two entries for the same part of speech on a disambiguation page, without even giving a gloss to guide the user's choice.
There are design tradeoffs if one only has one way of presenting to the user. For us, registration provides a convenient point for differentiating types of users. Users who register, learn about preferences, and then log in, can have custom profiles which are limited by the capabilities of CSS etc., developer time and capability, and the willingness of WMF to tolerate the resource load. DCDuring TALK 19:03, 7 July 2008 (UTC)[reply]
M-W is also presenting words in only a single language, and so has far less of the overarching data structure we must contend with. --EncycloPetey 23:27, 7 July 2008 (UTC)[reply]
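
The trade-off discussed in this thread, tagging every definition with its etymology versus using the etymology as a grouping level, can be pictured with a small Python sketch. The section labels and sense texts below are placeholders invented for illustration, not taken from any real entry.

```python
def group_senses(flat_senses):
    """Turn (etymology, part of speech, definition) rows into nested groups."""
    grouped = {}
    for etymology, pos, definition in flat_senses:
        grouped.setdefault(etymology, {}).setdefault(pos, []).append(definition)
    return grouped


# Flat layout: every definition line has to repeat its etymology label.
flat = [
    ("Etymology 1", "Noun", "first noun sense"),
    ("Etymology 1", "Noun", "second noun sense"),
    ("Etymology 1", "Verb", "verb sense"),
    ("Etymology 2", "Noun", "unrelated noun sense"),
]

# Grouped layout: each etymology is stated once, above the senses it covers.
for etymology, by_pos in group_senses(flat).items():
    print(etymology)
    for pos, definitions in by_pos.items():
        print("  " + pos)
        for definition in definitions:
            print("    " + definition)
```

The grouped form states each etymology once, which is the repetition-saving argument made above; the flat form is closer to what a reader gets when one etymology and part of speech combination is shown at a time.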

Category:Wikisaurus

Should there be a [[:Category:Wikisaurus]]? I ask because there’s recently been some traffic on my talk page on exactly this point.

See:

Summary:

“I removed “Category:Wikisaurus” from the few WS pages that had it, since it seemed to be deprecated, both:
  • Because the category itself says that it is deprecated:
    This category has been superceded, since the introduction of the Wikisaurus namespace, by Special:Allpages/Wikisaurus:
  • ..and ostensibly, since few WS pages had it.”

EncycloPetey wrote:

Actually, there is a need for the category. All wiki pages are expected to be categorized, or they clog up the list of uncategorized pages. It may seem superfluous, since there is a separate namespace, but it is technically required.
Re-adding the category would be best. We categorize items in all namespaces. All Citations: pages are categorized in Category:Citations; etc.

…and thus I dutifully added [[Category:Wikisaurus|{{PAGENAME}}]] to {{saurus-head}}, and refreshed all the Wikisaurus pages.

Then Robert Ullmann wrote:

Having a category which replicates Special:AllPages/Wikisaurus: may be useful, or may not.

In more detail, he wrote:

Note that we categorize things in some even-numbered namespaces (not Mediawiki, and I'm not sure what with Citations), the odd-numbered namespaces (talk pages) are not. But it isn't about Special:UncategorizedPages, which is only NS:0; there are others for other namespaces (Special:UncategorizedTemplates, Special:UncategorizedCategories, Special:UncategorizedImages). And there is no "Special:UncategorizedWikisaurus" (which would require writing or modifying a code module for the MW software).

…so I figured I’d bring it to the beer parlour – what do people think, and, concretely, should there be a catch-all Wikisaurus category?

Nbarth (email) (talk) 21:18, 7 July 2008 (UTC)[reply]

I'd just as soon have the category. As my father would say, it can't hurt and it might help. bd2412 T 21:50, 7 July 2008 (UTC)[reply]
See also WT:RFDO#Category:Citations.—msh210 21:56, 7 July 2008 (UTC)[reply]
I'd just as soon remove the category. As BD2412's father might say on a more pessimistic day, it can't help and it might hurt. —RuakhTALK 21:57, 7 July 2008 (UTC)[reply]
Hurt how, exactly? I can't conceive of any possible harm. bd2412 T 09:04, 8 July 2008 (UTC)[reply]
I think an incomplete [[:Category:Wikisaurus]] could be actively detrimental. We can do various things to mitigate that risk, such as task a bot with populating the category, but I don't see the point. If we want to invest effort in it, I think we'd be better off using JavaScript to link from Wikisaurus entries to Special:PrefixIndex/Wikisaurus:. —RuakhTALK 23:00, 8 July 2008 (UTC)[reply]
I'd like to see us keep and expand it. If Wikisaurus is expected to (finally) grow, then a category structure would enable us to group listings by themes, the same way the Roget's hierarchy does. --EncycloPetey 23:25, 7 July 2008 (UTC)[reply]
With or without Wikisaurus, I've been thinking, we can have Category:Roget's 123 (etc.) on entries.—msh210 23:30, 7 July 2008 (UTC)[reply]
Possibly, but only if that information is not proprietary. --EncycloPetey 01:51, 8 July 2008 (UTC)[reply]
The original Roget's is from the 1850s and from the U.S., if I recall correctly. If that's right, then there's no copyright on it (IANAL). I don't know about patents, though. (And note that the numbering system has changed since the first edition.)—msh210 16:36, 9 July 2008 (UTC)[reply]

Idea for a new bot

Greetings. I run bots on Wikipedia, Wikisource, and Commons, and I have an idea for a bot that might be useful here. I've read Wiktionary's Bot policy, and so I wanted to bring the idea here to see if it would be worth writing. Here is my idea.

Nutshell: This bot would help users easily add citations on the citation pages for words, when the source of the citation is Wikisource.

Justification: The "citation" pages are a great idea, but they are underutilized. It seems like a lot of work to add them, involving cross-referencing with the definition page, as well as pages for the found quote. A bot could make this easier.

Detailed description: This would be a tool running on the toolserver. It would provide a user with a textbox for a word, a textbox for a quote, and a textbox for a URL to Wikisource. For instance, I might type in "delicacy", "Between me and the other world there is ever an unasked question: unasked by some through feelings of delicacy; by others through the difficulty of rightly framing it." and "http://en.wikisource.org/wiki/The_Souls_of_Black_Folk/I". If the URL is valid and the word is found in the quote, the bot will look up the word in Wiktionary and return a list of definitions for the user to choose between. In this case, the choices would be:

  • Noun: The quality of being delicate.
  • Noun: Something appealing, usually a pleasing food, especially a choice dish of a certain culture suggesting rarity and refinement -a Chinese delicacy
  • Noun: Fineness or elegance of construction or appearance.
  • Noun: Frailty of health or fitness.
  • Noun: Refinement in taste or discrimination.
  • Noun: Tact and propriety; the need for such tact.

Let's say the user chooses #6 -- a sensible choice. The bot would look up the URL in Wikisource to see if it can get the author, the name of the work, and the year. In this case it can. By going up a level in the URL, it can get the author and title from the "header" template, and the year from the category. Having all this information, the bot could create or edit Citations:delicacy with the following:

===Noun: ''Tact and propriety; the need for such tact.''===
{{timeline|
1900s=1903}}
*'''1903''' - [[w:W. E. B. Du Bois|W. E. B. Du Bois]], ''[[s:The Souls of Black Folk/I|The Souls of Black Folk]]''
*:Between me and the other world there is ever an unasked question: unasked by some through feelings of '''delicacy'''; by others through the difficulty of rightly framing it.

Built-in limitations:

  1. This bot will not make any decisions about whether a quote is appropriate -- that's up to the user, just as it would be if the user didn't use this tool. The tool only makes the job easier.
  2. If the Wiktionary definition is not formatted correctly, the bot will not be able to correctly list the definitions for the user to choose from.
  3. If the Wiktionary citations page is not formatted correctly, the bot may insert the new quote in an unexpected part of the page.
  4. If the Wikisource article is missing information, then the bot can't get it. For instance, s:The Two Gentlemen of Verona does not give the year.

This is just the planning phase, and I haven't even started on the bot yet. I wanted to bring it up here and get some feedback to see whether this function would be useful and encouraged or not. All the best, Quadell 19:24, 9 July 2008 (UTC)[reply]
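
The formatting step described above can be sketched roughly as follows. This is only an illustration, not Quadell's code: the function names and parameters are hypothetical, and the timeline key is assumed to be the decade of the year (the post shows only "1900s=1903", which would also be consistent with a century key). Looking up definitions on Wiktionary and metadata on Wikisource is left out entirely.

```python
import re


def bold_word(word, quote):
    """Wrap each whole-word occurrence of `word` in wikitext bold markers."""
    pattern = re.compile(r"\b(" + re.escape(word) + r")\b", re.IGNORECASE)
    return pattern.sub(r"'''\1'''", quote)


def format_citation(word, quote, year, author, work_page, work_title, pos, gloss):
    """Build the wikitext for one citation, in the shape shown in the post above."""
    bolded = bold_word(word, quote)
    if bolded == quote:
        # The tool is described as rejecting quotes that do not contain the word.
        raise ValueError(f"{word!r} does not occur in the quote")
    decade = f"{year // 10 * 10}s"  # assumption: the timeline key is the decade
    return "\n".join([
        f"==={pos}: ''{gloss}''===",
        "{{timeline|",
        f"{decade}={year}}}}}",
        f"*'''{year}''' - [[w:{author}|{author}]], ''[[s:{work_page}|{work_title}]]''",
        f"*:{bolded}",
    ])
```

Called with the "delicacy" example (year 1903, author "W. E. B. Du Bois", work page "The Souls of Black Folk/I"), this should reproduce the five lines of wikitext quoted in the post. Where the new section goes on an existing Citations page is a separate problem, as limitation 3 notes.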

Intriguing. Note that the year is often missing from the text pages themselves, but is (more) often included in the list of the author's works on the author page. If the bot could locate the line on the author page that links back to the text examined (a little tricky, yes), then it could also get the date that way. --EncycloPetey 22:50, 9 July 2008 (UTC)[reply]
Additional thought: It would be nice if the bot could be customized to search a particular language of Wikisource. The Latin Wikisource has a huge corpus of Classical literature (in part because of fewer copyright problems, I suppose). --EncycloPetey 00:06, 10 July 2008 (UTC)[reply]
You may wish to talk to TheDaveRoss who was also writing a similar bot (though I think he was using google books). I don't think that this would be that useful without a human glance at each quote, as although lots of quotes is adequate, a few good quotes is better. Conrad.Irwin 23:02, 9 July 2008 (UTC)[reply]
I like this idea a lot. I think that if Wikisource doesn't have a date, or other bibliographic info, the script can insert some template like {{Quadell-bot-missing-info|date}}. Note that this is not really a bot in the usual sense; every edit would be made by a human contributor, as I understand it (right?). An approach that would yield more results than having the user type in a word, a quotation, and a URL would be for the user to type in only the word and have the script search Wikisource; that's similar to what TDR is doing (as noted above by Conrad).—msh210 23:29, 9 July 2008 (UTC)[reply]
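
The two suggestions above (taking the year from the author page, and falling back to a placeholder template when nothing is found) might be combined along these lines. This is a rough sketch under assumptions: the layout of the author page (one work per line, with the year somewhere on the same line) is hypothetical, the function names are invented for illustration, and a title that itself contains a four-digit number would confuse the simple year search.

```python
import re


def year_from_author_page(author_wikitext, work_title):
    """Return a four-digit year from the first line that links to `work_title`, or None."""
    # Match [[Title]], [[Title|label]] or [[Title/Subpage]].
    link = re.compile(r"\[\[\s*" + re.escape(work_title) + r"\s*(\||\]\]|/)")
    for line in author_wikitext.splitlines():
        if link.search(line):
            match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", line)
            if match:
                return int(match.group(1))
    return None


def year_or_placeholder(author_wikitext, work_title):
    """Return the year as text, or the placeholder template msh210 suggests."""
    year = year_from_author_page(author_wikitext, work_title)
    if year is not None:
        return str(year)
    return "{{Quadell-bot-missing-info|date}}"
```

For a hypothetical author page containing the line "* [[The Souls of Black Folk]] (1903)", the first function would return 1903; for a work listed without a year, the second would return the placeholder so that a human can fill the date in later.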

Good feedback. I hadn't thought of using Wikiquote, although that might (might) be possible as well, if the necessary data is easily harvestable. I like the idea of entering a word and searching Wikisource (or Wikiquote), but that would be tougher. I would have to either integrate with Wikisource's search, or Google's API, and I don't have any experience with either. Still, it certainly sounds like a useful idea. (And yes, to confirm, every addition would be initiated and confirmed by a human.) I'll give TheDaveRoss a ring to see what his thoughts are. Quadell 01:31, 10 July 2008 (UTC)[reply]

I was/am part way through a bot with the same intent; I was using texts from the Gutenberg Project. My project is on hold for the summer while I spend time doing other things. I was going at it a bit differently: I was simply going to have the bot add the citations directly to the page based on some rules for what qualifies as a good cite. Your bot seems like it would be a great, but different, tool. - [The]DaveRoss 19:53, 11 July 2008 (UTC)[reply]

Examples

One of the reasonably common "illegal" L4 headings is "Examples". These are not usage examples, but rather illustrations of the concept. I have been working on the entry for prolepsis, which included in-line examples of the rhetorical and grammatical device involved. Arguably these do not belong in a dictionary entry because they do not illustrate the usage of the word. Presumably, we are referring users to articles in WP that actually illustrate the phenomenon.

Is it ever appropriate for us to illustrate the concept in words (by an example), rather than define the word and show its usage? (Obviously we "illustrate" by the use of images, BTW). If so, how ought it be presented ? The parallel case of pictorial illustration suggests that a sidebar "box" of some kind might be appropriate to mark the pattern-breaking nature of such examples. DCDuring TALK 20:02, 9 July 2008 (UTC)[reply]

It's sometimes appropriate to illustrate the use of a grammar term in an example sentence with the following technique from the entry for synonym:
“Happy” is a synonym of “glad”.
That is, a fitting example sentence can usually be made by mentioning some examples of the concept. Does that help? Rod (A. Smith) 21:18, 9 July 2008 (UTC)[reply]
Judicious and, dare I say, witty selection of usage examples has been fun and has enabled me to finesse the general problem on many occasions. But it doesn't always work. (As in the more complex definitions of prolepsis.) It can be hard to find a passage that briefly illustrates the definition. In addition, our use of graphics is usually to illustrate concepts rather than decorate entries. It is the parallel to those graphics that struck me. Also the abundant use of boxed mini-cases and sidebars in textbooks and some other non-fiction struck me as a useful format for texty material that could illustrate, say, rhetorical devices, grammatical constructions, etc. Just a thought. DCDuring TALK 22:10, 9 July 2008 (UTC)[reply]
Yeah, that's a good idea. Make it into a "picture", and then it "illustrates" the concept. DAVilla 07:17, 16 July 2008 (UTC)[reply]

Bot Request for QualiaBot

I ask for a bot flag for QualiaBot. The half-automated bot will work on interwiki links (the main aim is to integrate the Upper Sorbian Wiktionary into the interwiki network and to add interwiki links in the hsb Wiktionary). To use resources effectively, it makes sense for the bot to also write its findings in the pt dictionary. The bot uses pywikipedia. Its Wikipedia sister has a bot flag in 26 Wikipedias (e.g. en, pt, es, de, cs, pl, sk, ru, uk, bs, sr, hr, eo, el, it, no, da, ro, sl, lt, lb, tr, nds, nds-nl, be-x-old, cv). Thank you! Qualia 17:12, 10 July 2008 (UTC)[reply]

See above, Wiktionary:Beer parlour#botflag_for_CarsracBot and Wiktionary:Beer parlour#Bot_flag_request_for_User:EivindBot. In what sense is your bot "half[-]automated"?—msh210 19:24, 10 July 2008 (UTC)[reply]
Okay, I see there would only be a small time benefit if QualiaBot wrote here. If you don't like this benefit, not my problem. By the way, I would think it would be better, with respect to effectiveness, if Interwicket also wrote its findings in other Wiktionaries, because not all Wiktionaries have such a powerful bot and it would be more effective for the Wiktionary projects overall. Qualia 12:28, 11 July 2008 (UTC)[reply]

Wikisaurus at cross purposes

We seem to be working at cross purposes with wikisaurus and it's wasting everybody's time because we are overlaying each other's work. We need to stop the entry work there and clearly define how the page is going to work.

I have been listing all the synonyms I can come up with for a given word within that word's headpage. Thus, noun, adjective, adverb, and verb forms of the same word all have their place on that page. Each synonym is linked to its own headpage where wiktionary can find it. I was told to take a page to the sandbox to alter it and when I went to put it back, the page had been altered beyond recognition. It won't let me archive and overlay any more.

The altered page has broken out all of the list of words into different groupings (I guess) and sent them to their own page. A great deal more information has been added to the page (most of it I have no problem with) that wasn't there before and all of the words link back to wiktionary instead of wikisaurus.

I've been putting in a lot of hours on this and I want to know that my hard work won't be wiped away just because of our lack of organization. The casual editor isn't the problem here, it's the bunch of us who are on this thing all the time and now that I've breathed new life into the project, people are no longer eager to have it demolished. That's good, but it is also a plan for disaster if the heavy hitters all pounce on the wagon at once without getting up to speed on where things sit.

I appreciate the Category:Wikisaurus. It was a great improvement, but I wasn't told it was happening and found it by accident after all was accomplished. I was reading someplace that courtesy dictates we notify the people who seem to be most involved in a certain area. Everyone here knows that I am most involved in the Wikisaurus project. Where was I notified? Amina (sack36) 20:59, 10 July 2008 (UTC)[reply]

Who did what to whom? Have the changes been rolled back? Were you working in a public sandbox or on your own user page? On a wiki, if it is public, it is open for editing by all. We have civility rules and we also have things that are hard to find (one's own user pages, if cunningly named). DCDuring TALK 21:40, 10 July 2008 (UTC)[reply]
I'd rather not spend time pointing fingers. The page that I prepared is at sack36/sandbox. I was asked earlier to move things to either the public sandbox or one at my page. I chose my page. The other item is at Wikisaurus:drunk.
The point is that the pages look HUGELY different. Words that I had painstakingly transferred to point to wikisaurus are now all pointing to wiktionary. Lord only knows why. They can type the word in search and be there as fast, or they can click the wikisaurus pointer and go to where it is a headword, click again and you're there. Not every word is a clear equal to the headword. You have to have synonyms set up for each word because of that. Amina (sack36) 22:34, 10 July 2008 (UTC)[reply]
The Wikisaurus pages must link to the dictionary pages, because that's where the meaning of words is stored. I can see why you would want to redirect from the other Wikisaurus pages to the main one, but can't work out why you would want to link from Wikisaurus to Wikisaurus on every item in the list. Is there a page like Wiktionary:Wikisaurus layout explained that I should read/someone should write? Conrad.Irwin 22:42, 10 July 2008 (UTC)[reply]

Hiyas, There are a number of issues being discussed here, and some evident frustration, at least some of which relate to me, so I thought I’d add my two cents.

  • WikiSaurus format
There seems to be a lack of documentation (and perhaps consensus) on format for WS entries. I find Wiktionary:Wikisaurus/format and Help:Creating a Wikisaurus entry#Formatting (which I’ve linked to each other).
Regarding linking, the Help page says:
“Please be sure to wikify all words and idioms listed in Wikisaurus, and in applicable places link to Wikisaurus pages.”
Other than “applicable places”, I can’t find a mention of where to link to on WS pages – perhaps this could be clarified?
  • Category:Wikisaurus

This is discussed above at: #Category:Wikisaurus – briefly, I found few pages with the apparently deprecated category, removed it, then at EncycloPetey’s suggestion, added it – I didn’t know that anyone wanted to be apprised, and it seemed uncontroversial. Sorry if I stepped on anyone’s (Amina’s?) toes.

  • Altered page?

I don’t follow Amina’s question – Amina? The page currently at User:Sack36/sandbox (current revision) is on “alcoholic”, and Wikisaurus:alcoholic has not been edited in over 9 months (history), so I’m not sure what the question is.

In case it adds useful context, I was the person who suggested using a sandbox when one’s edits break a page, at: User talk:Sack36#Wikisaurus:gigantic.

Suggestions:

  • WikiSaurus layout

As Conrad suggested, having WS layout laid out somewhere would at least give us somewhere to point, be it at a new Wiktionary:Wikisaurus layout explained (WS:ELE?) or an existing Wiktionary:Wikisaurus/format or Help:Creating a Wikisaurus entry#Formatting.

  • WikiProject pages

If people want to broadcast that they’re quite involved with an area (for instance, Amina is very involved in Wikisaurus, User:A-cai is clearly very active in Chinese (see: recent contributions), lately I’ve been doing some work in etymology, Japanese, and Chinese characters, etc.), may I suggest WikiProjects?

For instance, Wiktionary:WikiProject WikiSaurus, etc.?

Among other things, this means that someone interested in WikiSaurus (say) can find who’s currently involved?

Amina says “Everyone here knows that I am most involved in the Wikisaurus project.”

I didn’t. I now do. Were there a WikiProject stating: “These people are very involved in WS”, future people would learn much faster.

Nils (Nbarth) (email) (talk) 00:03, 11 July 2008 (UTC)[reply]

Added given name at 00:22, 11 July 2008 (UTC)
I'm sorry Nbarth, if you hadn't heard. I joined the Wikisaurus project a few months ago. There was at that time an informal vote going on as to whether the project should get folded into wiktionary and archived. In other words, since nobody had worked on it in "quite some time" everyone was saying it was a dead project. I volunteered to work on it so it wouldn't be a dead project. I was told that since no one else was doing anything I could have at it and don't worry about painting with broad strokes.
So I had at it. I explained everything that I was doing as I did it, dutifully recording it here in the Beer parlour. I asked for feedback numerous times with DC being the biggest contributor to the discussions. I had assumed (I usually know better) that with the amount of chatter I was generating, that people would notice things were happening with wikisaurus and who was making it happen. I know I have gotten a fairly good idea of people's interests based on what they comment on in the discussion rooms.

Now about the linking of non-headword words to wiktionary. Why is it necessary for the wikisaurus synonyms to be directly linked to find out their meaning? If you click on them when linked to wikisaurus, it takes you to their headword document where the abbreviated meaning is the second thing listed after part of speech. If they need to have something more detailed after that, clicking on the headword takes you to their wiktionary page. Why clutter the page with links that are unnecessary? There is no main page. This was explained earlier in the Beer parlour.

Nils, I think the idea of setting up a project similar to the Wikipedia projects is a wonderful idea. In that, one person designates the project and invites anyone and everyone who wants to join in the discussion, right? Then different segments of the project are farmed out to different people to do the preliminary work. I've been working on wikipedia and its children for around 4 years. For a short time I was project leader on one of the Wikipedia projects. That was until I located Wikisaurus and got so excited about it. I was really impressed with the designation of projects and I kinda wondered why it didn't happen here.

Finally, I'd like everyone to take a look at my user page User:Sack36, please? I talk at length about how I feel the wikisaurus should be laid out. Does this clear up all the questions etc. that people wanted? Amina (sack36) 08:38, 11 July 2008 (UTC)[reply]

Ok, I finally see what you are driving at, what a brilliant idea! Can I suggest that you keep your posts slightly shorter, then more people might read them :). Your user page describes something very interesting, but I think we need to constrain it slightly to fit within the bounds of wiki-ness, (though maybe you've just worked out Web 3.0!)
Wikisaurus 2: Instead of thinking of Wikisaurus as a place to dump synonyms when main space entries get too long, Wikisaurus is a big map describing the network of words. There are lots of connections between words, in the grammatical sense the simplest being synonyms and antonyms, the more complex ones including hyponyms and hypernyms. Thus each word should have a Wikisaurus page that describes each of its connections. The interesting bit is that each of the connections is not a simple link, it can (and should) be described too, probably using tags like {context}.
Issues: it will be necessary to give a Wikisaurus entry to each definition of a word, which can get messy for words with lots of definitions. Laying out a multi-dimensional web on a two dimensional page. Conrad.Irwin 23:46, 12 July 2008 (UTC)[reply]
My idea: see Wikisaurus:lucky for a simple, reasonably clean format that can be extended to include whatever relationships we want. I consciously chose to imitate your layout == Part of speech == / === gloss of definition === as that makes it very easy to navigate, I also chose to link to both wiktionary and wikisaurus, though it is easy to change the format of each line by editing {{wsword}}. Not sure whether using ;synonyms is better than ==== synonyms ====, but we can always customise appearance with CSS. Conrad.Irwin 00:47, 13 July 2008 (UTC)[reply]

Wikisaurus is fundamentally flawed

Interlinking Wikisaurus with itself is great when it's appropriate, but there's no reason to give every word an entry in Wikisaurus as it has in Wiktionary. If that were the case, then why not combine every PAGENAME with its corresponding Wikisaurus:PAGENAME via the synonyms and antonyms sections?

The fundamental flaw is one that has existed since the beginning. A thesaurus entry is not supposed to list every meaning of a string of characters. A thesaurus entry doesn't even have to correspond with a single word. It's best if it corresponds with a concept, which is why titles like fast (quickly) and fast (speedy) work better in my opinion. Most of those have been deleted because of strong objection from a contributor to titles with parentheses, although in my view nothing about Wikisaurus can be said to be all that rigid even today.

That's the reason why I didn't revert any of the changes that I've seen being made, although in earnest I don't always agree with them. I'd rather give some wiggle room. But since it's now an issue, I'm laying my opinion out there. DAVilla 08:16, 15 July 2008 (UTC)[reply]

no word of the day

there is no word of the day today. in its place was a template to let the beer parlour know about this fact. 70.231.238.8 19:12, 11 July 2008 (UTC)[reply]

OK, I don't get it. What am I supposed to be seeing here? All I get with your or my id's is no contributions. Is that right?! Amina (sack36) 08:05, 12 July 2008 (UTC)[reply]
The Word of the Day today is laissez faire. You can see this on the Main Page. --EncycloPetey 22:10, 13 July 2008 (UTC)[reply]

While I agree that a phrasebook can be useful, it's more Wikibooks material. Would it be possible to transwiki all these phrases, then delete the entries here on Wiktionary?--♠TBC♠ 05:12, 12 July 2008 (UTC)[reply]

There's also the issue of determining how common a phrase has to be to merit an inclusion into the phrasebook. But that's an issue the Wikibooks editors have to resolve (if these phrases do get transwiki'd).--♠TBC♠ 05:16, 12 July 2008 (UTC)[reply]

No, we want to keep them for the sake of the translations. Wikibooks is not about maintaining the Translations like we do, and usually resents becoming a dumping ground for other WM projects. --EncycloPetey 22:08, 13 July 2008 (UTC)[reply]

The one problem I see with having a phrasebook, is that it sets a bad precedent for other non-dictionary entries. For example, having Is it going to rain? might merit the creation of Is it going to snow?, Is it going to be a cloudy day?, and so on.--TBC 04:12, 14 July 2008 (UTC)[reply]
Fortunately, that particular issue has never been a problem here. --EncycloPetey 17:58, 14 July 2008 (UTC)[reply]

Length

The length of the RfD page is way too long, over 400 KB, making it very cumbersome to scroll through. I propose we split it by months, like Wiktionary talk:Requests for deletion/Log/2008 June and Wiktionary talk:Requests for deletion/Log/2008 July. Any objections?--♠TBC♠ 04:04, 12 July 2008 (UTC)[reply]

I don't like that idea now (though I did support it last time it was proposed I think), if we split up that page it will never get processed - it just needs someone to go through it every month or so and close the old debates, if it is split into months then the necessary work won't be apparent. I would suppose splitting discussion pages like this into topic sections, and having the main page only as a place-holder (as we do for WT:VOTE) as that would make archiving them much easier. Conrad.Irwin 12:04, 12 July 2008 (UTC)[reply]
That sounds okay. A problem is that, unless we date subpage titles as we do for Votes, they won't be unique. (If we ever decide to do similarly for WT:TR, by the way, we can instead of subpage transclude ns:1 pages into the TR, and make Rfdresult (or whatever that template is called) noincluded.)—msh210 23:16, 14 July 2008 (UTC)[reply]
I, too, object, for the record.—msh210 22:32, 14 July 2008 (UTC)[reply]
Well, what we could do is start doing all the big discussion pages like we do WT:VOTE, with threads being contained in subpages which are transcluded. This would confer a number of benefits: First, archiving is a snap, and retains a faithful edit history. A word passes rfv? Its discussion is simply removed from the rfv page and transcluded into the entry's talk page, taking the whole edit history along with it. This would make the BP much more efficient as well. People can watchlist topics of interest to them, and keep track of what people are saying. Conrad.Irwin says that it would be rather easy to write a bot which would archive discussion topics which have had no comments for a month. That way, short discussions are moved out of the way, and long discussions can stay as long as they need to. Even better, an archived discussion could be moved back onto the BP if it starts up again. Anyone remember an occasion where they were trying to find an old, unresolved discussion to bring context into a new discussion? Just revive it! Obviously, the archiving policies would have to be different for the rf pages, as they shouldn't be archived until resolved, but I imagine this would be a simple matter. Thoughts? -Atelaes λάλει ἐμοί 23:28, 14 July 2008 (UTC)[reply]
I like this idea. It certainly would make things a lot easier and more convenient.--TBC 23:53, 14 July 2008 (UTC)[reply]
I've no objections either way. We have discussed this idea before, so I know there are some proponents out there. --EncycloPetey 03:39, 15 July 2008 (UTC)[reply]
Starting a vote. Any other additions before this goes up?--TBC 10:52, 17 July 2008 (UTC)[reply]
I like this idea, but am not sure exactly how it would work when creating a new section. Would we expect new users to be able to create appropriately-named discussion pages from scratch, and then transclude them successfully into RFD? Also, it's currently a bit of a pain to watchlist votes; I don't mind it there, because there aren't very many, but if we're going to do this for RFD, I think we need a better approach, likely using JavaScript. —RuakhTALK 17:23, 17 July 2008 (UTC)[reply]
Concerning creation, I imagine the process would be similar to that of WT:VOTE. To tell the truth, I'm not completely sure of how that works, but I have to imagine it could be duplicated. If I'm not mistaken, the rfv/rfd templates currently do everything for you simply by pressing the little + button, and I would hope we can duplicate that with the new system. I must admit, I hadn't thought about how watchlisting would work for rfv/d. Yes, that would be a pain in the ass. I wonder if there's some magic that can automatically watch included pages on a page or something. That would have to be worked out. -Atelaes λάλει ἐμοί 20:15, 18 July 2008 (UTC)[reply]
We can definitely duplicate the process of starting a vote, but to start a vote, one must first create the vote subpage, and then manually transclude it into Wiktionary:Votes. I realize this is just a two-step process, but I think that that two-step process is enough to discourage many newcomers (maybe even non-newcomers). —RuakhTALK 21:22, 18 July 2008 (UTC)[reply]
My idea would be to have User:Conrad.Bot (or another) sit on recent changes and control the beer parlour, it would automatically add new discussions, and could (optionally) remove topics once they had become stale (after a month or so of no reply). Conrad.Irwin 23:49, 20 July 2008 (UTC)[reply]
Eep. I have to say, I find a bit frightening the idea of being completely dependent on a bot's catching every single relevant edit as it goes through recent changes. Before Interwicket, one of the problems with some of the interwiki-link-adding bots was that they only tracked recent changes, so when they missed an edit (as they often did), that was that. Maybe, in addition to what you describe, we could have a cronbot go through (e.g.) Special:PrefixIndex/Wiktionary:Beer parlour/2008-07/ once a day? —RuakhTALK 01:29, 21 July 2008 (UTC)[reply]
I had taken that as a given :). Yes, of course. Conrad.Irwin 09:53, 21 July 2008 (UTC)[reply]
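
The "no reply for a month" test discussed in this thread could be sketched roughly as below. This is an illustration only, not Conrad.Bot's code: the function names are hypothetical, "a month" is assumed to mean 30 days, and the timestamp pattern is assumed to be the signature format seen on this page ("23:49, 20 July 2008 (UTC)"). Fetching subpages, editing the Beer parlour, and the different rules needed for the rf pages are not covered.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Signature timestamps such as "23:49, 20 July 2008 (UTC)".
SIG_TIME = re.compile(r"(\d{2}):(\d{2}), (\d{1,2}) ([A-Z][a-z]+) (\d{4}) \(UTC\)")


def comment_times(discussion_text):
    """Return the UTC timestamps of all signed comments found in the text."""
    times = []
    for hour, minute, day, month, year in SIG_TIME.findall(discussion_text):
        if month in MONTHS:
            times.append(datetime(int(year), MONTHS[month], int(day),
                                  int(hour), int(minute)))
    return times


def is_stale(discussion_text, now, max_age=timedelta(days=30)):
    """True if the newest signed comment is older than `max_age` at time `now`."""
    times = comment_times(discussion_text)
    if not times:
        return False  # unsigned text: leave it for a human to look at
    return now - max(times) > max_age
```

Because the check depends only on the text of a discussion, the same function could serve both the recent-changes listener and the once-a-day sweep that Ruakh suggests as a safety net.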

Modernism

Many prominent modernist writers, like T. S. Eliot and F. Scott Fitzgerald, use nouns as adjectives, ones that are typically not considered as such. Fitzgerald, for example, uses the word "cocktail" in the Great Gatsby to describe a festive song. So then, should we count these words as adjectives?--♠TBC♠ 19:41, 12 July 2008 (UTC)[reply]

The decision has been no if the noun is only used "attributively". For example, it is good English in anyone's book to say "cocktail party" or "cocktail dress". One does not normally say "That party (or dress) is cocktail", which would be a predicative use. One also cannot normally grade such a word when it is used adjectivally: "That dress is very cocktail". Or compare: "That dress is less cocktail than the other." The use could be analyzed as a compound noun "cocktaildress", but not spelled solid. OTOH, note the weasel word "normally". I am sure that thousands of people have said: "That dress is so party." Some nouns originally only used attributively do become more truly adjective-like over time.
Omitting the Adjective Part of Speech section, however, invites contributors to add such a section. That is why I have advocated having some kind of filler note under an Adjective header for such a noun saying that the noun is often used attributively as an adjective (or to form compound nouns) and providing links to an appendix or WP article providing a fuller explanation. The response here has not been enthusiastic because this is deemed to violate the metaphysically non-adjectival nature of such words and to mislead users. DCDuring TALK 00:03, 13 July 2008 (UTC)[reply]
Perhaps instead we add a note under the Noun section, indicating that the noun can be used as an attributive adjective with a different meaning than simply "of or relating to the noun". For example, "cocktail" in "cocktail dress" means "informal".--♠TBC♠ 04:23, 13 July 2008 (UTC)[reply]
I think that information belongs in the definition of cocktail dress, because in many circumstances it does mean “of or relating to:” cocktail shaker, cocktail party, cocktail lounge, cocktail wiener, cocktail hour, etc.
This seems to be a case where we're over-specifying to the point of being prescriptive. Other dictionaries mention attributive nouns, or assume that the reader can figure it out. Our convention of mentioning every possible usage leaves no room for the grey area: if we leave out the adjective it implies that this is not “normally” possible, whatever that means. But although “that dress is cocktail” is non-standard, it is not uncommon nor “incorrect” in colloquial speech.
Perhaps an adjective heading is a good idea, or an “attributive” label for the noun's inflection line. Michael Z. 2008-07-13 05:55 z
We'd then be adding "empty" information to just about every noun in the English language. I'd hazard a guess that more than 90% of all English nouns can be used attributively, including many proper nouns (e.g. "Paris skyline", "United States government"). Creating an "adjective" definition for all of these nouns would be a waste of our time and take up "valuable real estate"™ on the entry pages. (Note use of both adjective and entry as attributive nouns in that last sentence.) --EncycloPetey 10:03, 13 July 2008 (UTC)[reply]
I wouldn't dispute the repetitiveness of the information, only that it might serve to prevent the addition of adjective senses by contributors while remaining at least as much an open Wiki as we are. As to "valuable", I would have assumed that any attributive-use Adjective header and line of linked text would follow the Noun and therefore fall into less valuable below-the-fold real estate, except for very short entries (one screen), where no real estate is especially valuable. DCDuring TALK 16:03, 13 July 2008 (UTC)[reply]
Let me clarify, this is not a request to add in all attributive adjectives, just the nouns that don't simply mean "of or relating to said noun" when used as such. For instance, the word "coffee", which can be used as an attributive adjective to mean "pale brown".--♠TBC♠ 10:27, 13 July 2008 (UTC)[reply]
Coffee and most other entries for colors have adjective PoS sections even though there is a universal rule that "allows" such words to be used as adjectives. DCDuring TALK 20:50, 13 July 2008 (UTC)[reply]
Actually, the rule runs the other way. Most color words are (originally) adjectives, but they are permitted to be used as nouns. For the historical reason, then, new color names acquire a sense of being an adjective. Also note that some color words also function as a verb meaning "to turn that color" (The page yellowed; The lawn began to green), although these are more often formed by addition of a suffix to the color name. --EncycloPetey 22:06, 13 July 2008 (UTC)[reply]
Some print dictionaries contain a notation on some noun entries that indicates frequent attributive use. DCDuring TALK 11:53, 13 July 2008 (UTC)[reply]
What nouns are not used as adjectives? Would it make sense to mark these? Michael Z. 2008-07-13 18:28 z
That would be worth noting, if we could think to ask the question when looking at the noun entry. DCDuring TALK 20:50, 13 July 2008 (UTC)[reply]
Another example, "spot" is an attributive-only adjective that can be defined as to be "made, paid, delivered, etc., at once" (as in a spot sale, which is obviously not a sale of spots).--♠TBC♠ 20:02, 13 July 2008 (UTC)[reply]
This is a fairly practical question. What combination of characteristics of word-as-noun, word-used-attributively, and Wiktionary-entry-for-word might warrant some kind of Adjective PoS or other indication? The practical benefit for us is to prevent needless entry of adjective PoS sections that we believe would be misleading. It also gives us the chance to explain a bit about attributive use of nouns, thereby fulfilling a modest educational function. The cost is about 3/4" of usually below-the-fold vertical screen space.
One possible limited application of the stub Adjective PoS section would be those words for which a contributor entered an Adjective PoS section that we subsequently determined to be merely attributive use. If there were no indication that contributors found the absence of an Adjective PoS confusing, then there would be little reason for us to add it.
Another application would be cases where the noun is attestable (and etymologically prior ?), but most of the usage is actually attributive. The justification would be that such instances are highly likely to lead to a contributor seeking to enter the Adjective PoS section. DCDuring TALK 20:50, 13 July 2008 (UTC)[reply]
Is this the same as in spot check? Would this be a derivative of on the spot?
Possibly, but that's more of an etymology concern.--TBC 14:44, 14 July 2008 (UTC)[reply]

Idioms vs Chengyu

Are the entries in "Category:Mandarin idioms" idioms in the English sense of the word, or in the sense of the Chinese equivalent of an idiom, the chengyu (using "chengyu" since I'm too lazy to type "four character idiom")? Chengyus are a lot more specific than idioms, as they require four characters and they usually come with a background story. For example, this issue is evident with 黑馬, which means dark horse, or an unexpected success. English speakers would immediately identify this as an idiom, since it can't be understood simply by knowing the meanings of each individual character. Chinese speakers, on the other hand, would not consider this to be a chengyu; in fact, they would likely just consider it to be a normal compound and not idiomatic at all. So then, any thoughts?--♠TBC♠ 11:20, 13 July 2008 (UTC)[reply]

One possible solution to differentiate the two is to create a subcategory in Category:Mandarin idioms for chengyus. But then, what should we call the category? Category:Mandarin four-character idioms?--♠TBC♠ 11:23, 13 July 2008 (UTC)[reply]
For purposes of Wiktionary, Category:Mandarin idioms refers to idioms in the English sense of the word. As you mentioned, most people associate the term chéngyǔ with the numerous four-character expressions found in Chinese. However, Category:Mandarin idioms is not limited to just chéngyǔ. You could perhaps create a subcategory for chéngyǔ. However, I'm not sure how useful such a category would be. -- A-cai 12:26, 13 July 2008 (UTC)[reply]
I think it's important that we distinguish chéngyǔ from idioms, so that readers (especially those starting out in Chinese) can understand the cultural significance of four character idioms in comparison to normal idioms. A subcategory certainly won't do any harm.--TBC 14:11, 14 July 2008 (UTC)[reply]
Category created.--TBC 14:15, 14 July 2008 (UTC)[reply]

(I believe this has been brought up before, but I'd like to add in my two cents)

I'm fine with keeping entries like Nintendo or Tupperware, companies so ubiquitous that they've become synonymous with their products. But I draw the line at fictional characters. I know there's been some discussion of this before (a year ago I believe, with Care Bears and Teletubbies), but keeping these entries sets up a god awful precedent for keeping any pop cultural icon associated with a certain personality, nuance, or quirk. This opens up a Pandora's Box for entries like Bart Simpson (associated with adolescent rebelliousness), Eric Cartman (associated with excessive profanity and manipulation), Cheech and Chong (associated with "stoned slackers" and buddy flicks), and so on. I'm tempted to take this to RfD, but it's more of a policy issue. --TBC 13:05, 14 July 2008 (UTC)[reply]

Shouldn't the test be whether the phrase is used sufficiently to generate citations conveying meaning independent of the work with which the character is primarily associated? bd2412 T 13:09, 14 July 2008 (UTC)[reply]
But the problem is, virtually any pop cultural icon can be applied in such a way. Bart Simpson, Indiana Jones, the list is endless.--TBC 13:14, 14 July 2008 (UTC)[reply]
Concerning the Oscar the Grouch entry, I'm particularly concerned about citations like "I'd have looked a lot like Oscar the Grouch", "a pop-up “Oscar the Grouch” toy", "Oscar the Grouch flip-flops", where the quote is clearly referring to "Oscar the Grouch" as in the character from Sesame Street. For the other quotes, my comment above on pop cultural icons applies.--TBC 13:24, 14 July 2008 (UTC)[reply]
Also, regarding pop cultural icons, this applies to real people (and entries such as The Beatles and Abbott and Costello) as well: "He's a Jimi Hendrix on the guitar", "He's the da Vinci of our time", "He's an Orson Welles when it comes to directing", etc. See how this can be applied to nearly anyone in pop culture with an identifiable style, motif, or personality?--TBC 13:32, 14 July 2008 (UTC)[reply]
Adding on, the "out-of-context" rationale used to defend entries like these is extremely weak. Anything that's relatively famous can (and probably has) been used out-of-context. More examples (this time with cites): "the original TLJ was the Lord of the Rings of the gaming world" [35], "the Jewish people need a Ronald Reagan", [36], "the Bill Gates of Artificial Life" [37], "don't have a Jesse Jackson to speak for us" [38]--TBC 13:45, 14 July 2008 (UTC)[reply]
First of all, the pop-up toy and flip-flops do not necessarily support an out-of-context rationale. They are there to support an attributive-use rationale that worked its way into CFI and that I'd just as soon see work its way out. Out-of-context is superior in my view, and completely defensible. If you ask anyone on the street who the following are:
  • Bart Simpson
  • Bill Gates
  • Cheech and Chong
  • da Vinci
  • Eric Cartman
  • Jesse Jackson
  • Jimi Hendrix
  • Ronald Reagan
  • The Beatles
then I'd bet the majority would know every one of them, if they don't nail, right on the head, the significance of every single one of them. What you're saying is that you think a word, the metaphor of an idea, that every English speaker knows, that we can use to convey a concept in conversation, that as you demonstrated we do in fact use without any additional references to its meaning or origin, should not because of some technicality be part of a corpus of knowledge that defines terms. A younger person today might not understand "Abbott and Costello routine" any more than they understand "dance routine". Why should we define one and not the other? Fifty years from now Bart Simpson might not be known either, nor I'm guessing are Cheech and Chong by many non-native speakers. It's not that the Orson Welles of directing is difficult to presume correctly, rather "the Orson Welles of the hand-held camera" or "the Orson Welles of software" or even "the Orson Welles of mice". If someone is likely to run across it and want to know what it means, then we should provide them with a brief explanation. DAVilla 21:01, 14 July 2008 (UTC)[reply]
The problem here is that this logic would merit the inclusion of virtually any famous figure or pop cultural icon. I wouldn't oppose a soft redirect to Wikipedia, though.--TBC 21:32, 14 July 2008 (UTC)[reply]
I don't see how that's a problem. If someone is likely to run across the figurative use of any famous figure or pop cultural icon, and needed to know what is meant, then include them all, I say. This line is not exactly how CFI reads, by the way, which is part of the reason there are still red links. But we don't do soft links to pedia, maybe because we're just too proud.
I don't blame you for bringing it up as there will always be more debate. It's not so easy to reach consensus on some terms, and harder to generalize what the community opinion is more generally, and close to impossible to pin a rationale. DAVilla 07:49, 15 July 2008 (UTC)[reply]
That sounds reasonable. I personally don't believe that such entries belong in a dictionary, but then again, Wiktionary isn't really your traditional paper dictionary.--TBC 08:23, 15 July 2008 (UTC)[reply]
Regarding the "attributive-use" rationale: couldn't any major brand or series name be used attributively? As in "a Hot Wheels car", "a G.I. Joe figurine", "a Hitchhiker's Guide to the Galaxy t-shirt".--TBC 21:48, 14 July 2008 (UTC)[reply]
Yes, and that's why we went through a lengthy discussion and vote to set policy for brand names. Brand names now have their own amended requirements under WT:CFI. --EncycloPetey 03:32, 15 July 2008 (UTC)[reply]
That was a wording mistake, I meant brands as in franchises and pop cultural characters (like Indiana Jones), not products and trademarks like Nintendo and Band-aid, which CFI does cover.--TBC 08:23, 15 July 2008 (UTC)[reply]
You could have saved yourself a lot of work by reading the Talk page for Oscar the Grouch. That entry has already passed RfD, and was kept. Yes, pop culture icons can end up surviving CFI and RFD. Some cultural icons have penetrated everyday speech to an extent that they should be included in a reasonable dictionary, so that they can be explained to persons learning the language. Having an entry for Oscar the Grouch is no stranger than having entries for Hercules, Cinderella, or Romeo. These are characters in stories that mean something to the culture that shares them. Note that this is not the same as the full name of a real person, as that is a separate issue. --EncycloPetey 17:56, 14 July 2008 (UTC)[reply]
I've read through the discussion, and I found the rationale for keeping the entry extremely unsatisfactory. I realize that this has already survived RfD, but how does that bar us from discussing it? Consensus is never set in stone (I know this is from Wikipedia, but the concept still applies) and decisions can change; after all, this is a wiki. Again, allowing these entries to be kept would set a precedent meriting the inclusion of virtually any famous person, pop cultural icon, character, book, and so on.--TBC 21:32, 14 July 2008 (UTC)[reply]
Please do not cite Wikipedia policies or guidelines to support actions here; they do not apply. We don't want to keep hashing over the same RfD discussions over and over, which is why we archive them on the talk pages of the entries. Once a decision has been reached, an argument of "I don't agree" isn't sufficient to reopen the discussion. Yes, we have set a precedent with that, but not the one you think; please re-read the end of the discussion. The item must still have citations suitable to pass the requirements of WT:CFI. --EncycloPetey 22:08, 14 July 2008 (UTC)[reply]
I understand, but the concept that decisions are never binding and that consensus can always change on a wiki still applies to Wiktionary. And regarding citations, my point is that such citations can (and likely will) be found for any famous person, book, pop cultural icon, etc.--TBC 22:13, 14 July 2008 (UTC)[reply]
But I do agree with you on one point: We need a comprehensive policy for proper nouns.--TBC 22:20, 14 July 2008 (UTC)[reply]
You mean additional, beyond what WT:CFI already says? It already says that Thomas Jefferson will not have an entry because it is a specific person's name that is not used attributively. Yes, citations can be found for the things you mentioned, but in most cases those citations will not be attributive use. Please review CFI carefully before you continue this discussion. You're not saying anything that hasn't already been said here many times before. --EncycloPetey 03:37, 15 July 2008 (UTC)[reply]
Thomas Jefferson's full name is not used attributively because Jefferson is a distinctive last name. But for famous figures with a common last name (like Jackson), full names are used attributively. Using Jackson as an example, there's: "he's a Jesse Jackson Democrat" (New York Times), "wearing a Michael Jackson hat" (New York Times), "fixed him with the Samuel Jackson stare" [39]. Either way, although I think attributive use is a good litmus test for determining an entry's inclusion, I just don't believe that it should be the only requirement for inclusion. Also, WT:CFI covers trademarks and products, but not fictional characters like Oscar the Grouch and Cookie Monster (which was the original point of this discussion).--TBC 08:23, 15 July 2008 (UTC)[reply]
Yes, it does. It has an entire section on "Names of specific entities" that does not limit itself to "real" versus "fictional". The few examples may not be fictional, but that doesn't mean that it doesn't apply to all names of specific entities. The Oscar the Grouch previous discussion specifically tied it to this same section. Again, I say please review CFI carefully before you continue this discussion. You're not saying anything that hasn't already been said here many times before. --EncycloPetey 19:49, 15 July 2008 (UTC)[reply]
Not trying to fight for the last word, but I did read the section carefully, and the section does not specifically talk about fictional characters. Although fictional characters might fall under specific entities, Oscar the Grouch is a lot different than New York. It doesn't matter either way; I can see the point to having such entries. --TBC 13:25, 16 July 2008 (UTC)[reply]
I like the examples you give. I wonder if, contrary to what CFI claims, Thomas Jefferson is used attributatively, somewheres out theres. DAVilla 06:55, 16 July 2008 (UTC)[reply]
Not sure about Jefferson (since he's usually referred to attributively by his last name), but George Bush has certainly been used attributively and out-of-context. Attributive examples: "He should take a George Bush approach" New York Times, "a George Bush compromise" [40], "Toevs calls Kolbe a George Bush Republican" [41] Out-of-context examples: "Has Campbell pulled a George Bush?" [42], "hes too old, a flip flopper, and a George Bush lite" Washington Post--TBC 13:25, 16 July 2008 (UTC)[reply]
I think the attributive-use rule is being interpreted too loosely. “... with a widely understood meaning” should be better defined.
“Oscar the Grouch toy,” “Oscar the Grouch suit,” and “Oscar the Grouch flip-flops” are not using his name attributively to invoke some widely-understood characteristic of said Grouch. These are merely the names of products. Likewise, I think “a Jesse Jackson democrat” is referring to someone's specific political affiliation with Mr Jackson, and not invoking him as an adjective with an understood meaning. “A Samuel L. Jackson stare” relies on context and the word stare for its meaning, and makes perfect sense to someone unfamiliar with this Mr. J. All of these uses refer to specific associations with the people, and none of them relies on a generalized understanding of who they are or some quality of theirs. (What the heck is a “Michael Jackson hat?”)
“A George Bush approach” refers to Bush's specific actions in office, not generally to some understood Bush quality—it has no meaning out of its context. The same goes for every other of these Bush citations.
A more illustrative example: 2-term Winnipeg Mayor Sam Katz: “a Sam Katz party,” “a Sam Katz Party-dominated Council,” “a Sam Katz quote,” “a Sam Katz business investment,” “a Sam Katz campaign sign,” “a Sam Katz re-election campaign promise.” I would argue that despite these citations, Katz doesn't belong in Wiktionary.
Supporting citations should have meanings which are widely understood without reference to a specific context or circumstance.  Michael Z. 2008-07-16 15:34 z
I agree. Part of the problem is that "attributive" has two senses: a technical sense that covers "George Bush" in "George Bush ally" but not in "ally of George Bush", and a non-technical sense that covers "George Bush ally" but not "George Bush" in "he's a George Bush ally". CFI aren't clear about which sense is meant, and (as you might expect) editors' ad-hoc interpretations vary significantly. —RuakhTALK 20:51, 16 July 2008 (UTC)[reply]
Can anyone suggest a revision of WT:CFI#Names of specific entities?
I also think that Empire State Building is a poor example, because it is so familiar and its special quality is understood worldwide. Here are some examples of its use attributively, or metaphorically (allegorically?):
  1. “His ignorance was an Empire State Building of ignorance. You had to admire it for its size.” —Dorothy Parker (1893–1967)
  2. “A lecherous and ruthless real estate developer, who built an Empire State Building-like Tower of Babel, corrupts an innocent girl in this film teeming ...”[43]
  3. “... Cheerleaders but also the deadly Ass-teroids and King Dong, who has a unique way of fending off attackers from his Empire State Building-like perch.”[44]
  4. “Canada, for instance, doesn't have an Empire State Building, and so makes do with the not-even-real-sounding CN Tower.”[45]
  5. “Unlike the Empire State-like Ukraine, where the rest of our group were staying, or the new mammoth Rossia, the National is a pre-Revolutionary hotel, and, compared with them, built on modest . . .”[46]
  6. “... a musical experience any more than use of matter and windows will produce an Empire State Building unless one actually builds an Empire State Building.”[47]
  7. “But more than anything else, Allie had pointed up, like an Empire State Building in a row of Neissen huts, that even Lepke—even the cool, calculating king—could make one mistake.”[48]
  8. “Most people, in their drive to get rich, are trying to build an Empire State Building on a 6-inch slab.”[49]
  9. “But, shit, when they kill the spirit of your better half, then you are left to do the dishes yourself and rekonstruct your Life like an Empire State Building out of toothpicks.”[50]
  10. “He contended that establishing claims to knowledge was akin to building an Empire State building out of toothpicks, ‘most of which we haven't got and cannot be given.’”[51]
 Michael Z. 2008-07-17 21:42 z
Past decisions don't have any authority over present consensus. They can set precedent, affect or obviate new discussions, but anything can be reopened, especially if there is some point or information that hasn't been covered. We all signed the GFDL, so we shouldn't get too attached to our contributions. Michael Z. 2008-07-14 23:24 z

It's insensitive to refer to God, Jesus, Muhammad, Buddha etc. as “characters.” Any objections to changing the title and references on the page to “Mythological figures?”. Michael Z. 2008-07-14 19:58 z

"Mythological figures?"? I object. How about, instead, "Mythological figures"?  :-) msh210 20:38, 14 July 2008 (UTC)[reply]
Perhaps a separate appendix for Appendix:Deities? That way, God and the others won't directly be under mythological characters.--TBC 22:18, 14 July 2008 (UTC)[reply]
Careful now, there are still people who worship Thor - and consider him to be as real as any Christian, Muslim, or Buddhist counts their deity. bd2412 T 22:28, 14 July 2008 (UTC)[reply]
Another possibility is "Beings in X," thereby eliminating the word mythology. Wakablogger 23:09, 14 July 2008 (UTC)Wakablogger[reply]
There are atheists who faithfully consider any religion a fantasy, and there are probably adherents of just about any marginalized or even fictional religion. The dictionary shouldn't make judgments, and trying to categorize would be a can of worms. The simplest solution is to treat the subject with basic respect.
At first I also thought mythology sounded wrong, but it doesn't necessarily disparage or discount in the way that myth or legend might, and is not uncommon. Cf. w:Christian mythology, w:Islamic mythology, w:Jewish mythology, w:Slavic mythology, w:Religion and mythology. Michael Z. 2008-07-14 23:18 z

The Appendix in question is nothing more than a list of linked words, so why have this appendix at all? We already have Category:Mythology (and subcats). Why not just use categories? --EncycloPetey 03:29, 15 July 2008 (UTC)[reply]

Agreed, delete. This appendix is a stone's throw from worthless, and we already have the cats. These sorts of things are just itching to push people's buttons. I'm an atheist, and even I think it a bit insensitive to put this all under mythology. -Atelaes λάλει ἐμοί 05:41, 15 July 2008 (UTC)[reply]
Good point, EP, Atelaes: delete. (Well, bring to RFDO, then delete.)—msh210 17:58, 15 July 2008 (UTC)[reply]
I disagree with that - the list includes (and should include) redlinks which point us towards entries that we are missing. A category wouldn't. bd2412 T 19:21, 15 July 2008 (UTC)[reply]
A number of categories include, on the category page itself, a list of words, including redlinks, that someone ought to (write, if necessary, and) categorize in that same category. See, e.g., Category:English three letter words. Same can easily be done here, and imho it'd be more appropriate.—msh210 19:25, 15 July 2008 (UTC)[reply]
That has got to be the worst category. It has like 17 entries in the category, and the list of words not in the category is simultaneously about twelve times as long as the words that are in it, and yet woefully short of the total count of three letter words (and some of those appear to be acronyms, rather than words anyway). Besides, are we going to have subcategories for the particular branch of mythology that each character comes from? bd2412 T 00:55, 16 July 2008 (UTC)[reply]
I agree the three-letter-word category needs help; I was just using it as an example. As to your final question, whether we need subcats for various mythologies, I think not. Frankly, I'm not that fond of topic categories; alas, they are, apparently, here to stay. But we definitely don't need very fine ones. (Note incidentally that my dislike of topic categories does not apply to lexicons, including lexicons of jargon used only in certain fields, as marked by context tags.)—msh210 17:00, 16 July 2008 (UTC)[reply]
I agree that we ought not have categories specifying the mythology of a certain figure - but if we don't have them, is that a point in favor of an appendix making that sort of separation? Especially in a dictionary, where lexical origin of the set of words might be important? bd2412 T 20:32, 16 July 2008 (UTC)[reply]
Yes, but not a strong one, in my opinion.—msh210 20:13, 18 July 2008 (UTC)[reply]

Would the same apply to Appendix:Fictional characters? Do we have a guideline setting out what kinds of lists are desirable, or how to obviate them with categories? Michael Z. 2008-07-16 00:50 z

The ultimate goal of any category is to have all the red links blue (or removed because they fail CFI). At that point the list of items can usually be removed from the category because the items exist. One has to be careful though, because a blue link does not necessarily mean the entry exists in the desired language. --EncycloPetey 01:02, 16 July 2008 (UTC)[reply]

Alt spellings

An idea: what if Autoformat used the information on one of the pages (say color) and duplicated it on the other page where translations, etc. might be listed but might be inconsistent between the two pages? This wouldn't be for all alternative spellings, just the ones where there are 2 complete entries. And the pages would have to be marked by hand beforehand to make sure that they really are identical. Nadando 04:54, 16 July 2008 (UTC)[reply]

But not all alternative spellings always apply to all senses (see Makemake). AF would have to know which senses paired with each other, and any edit made to clarify definitions would make the two different and stymie AF. The caveat of having to mark the pages beforehand is a big caveat. And there are many things that will have to be different between the pages, such as quotations and example sentences. --EncycloPetey 17:39, 17 July 2008 (UTC)[reply]

Fictional brand names in citations?

Hello. This edit and edit summary bring up a very interesting question. If an author uses a word as part of a fake brand name in a fictional work, does it count as a cite of the word? Language Lover 05:09, 16 July 2008 (UTC)[reply]

If it conveys the meaning that the definition posits, I don't see why not. But how often is that going to happen? bd2412 T 05:13, 16 July 2008 (UTC)[reply]
I've seen stuff like this now and again. It may not contribute directly to any of the definitions, or marginally so, but that doesn't mean it isn't worth keeping. DAVilla 06:47, 16 July 2008 (UTC)[reply]
I don't think a citation like the one removed in the edit linked to above is useful as an example or should count as attestation: there's no meaning conveyed by the word there, so not, in particular, the meaning given in our definition.—msh210 16:40, 16 July 2008 (UTC)[reply]
Sure there is - have you heard the legal maxim, inclusio unius est exclusio alterius ("inclusion of one is exclusion of another")? When, in describing a weapon, you see the phrase "MegaHurt InstaKill Thunderbomb", you know right away that "InstaKill" is the kind of word that falls into the same category of meaning as "MegaHurt" and "Thunderbomb", i.e. it has something to do with inflicting extremely large amounts of damage on something else. bd2412 T 20:29, 16 July 2008 (UTC)[reply]

Wiktionary:Internal links was written with the new or newish user in mind; please add to it (or blast it, as appropriate).—msh210 17:06, 17 July 2008 (UTC)[reply]

It is a useful page indeed, particularly the links to the obscurish pages. Would this be better placed at Help:Internal links, seeing as it is written to aid the newish user and is not policy, but an amalgamation thereof? See #Spam Policy below for the same debate. Conrad.Irwin 22:27, 19 July 2008 (UTC)[reply]

Broken superlative template?

When I went to use the superlative template I got this in the edit box:

"#REDIRECT Template:new en superl bot"

dougher 02:16, 18 July 2008 (UTC)[reply]

It seems as if someone wants it to work the same way that the new comparative template works, showing "most X" in the sense line. It would be desirable to have said appearance, but have the option of switching it off. DCDuring TALK 02:26, 18 July 2008 (UTC)[reply]
Well, except that that isn't the case in all languages. In Latin the superlative can mean "most X", but it can also mean "extremely X" or "very much X" without comparison to other X's. If this template does it, it should be something that must be turned on, not the default. --EncycloPetey 06:24, 18 July 2008 (UTC)[reply]
Then why not {{en-superlative-of}}? DCDuring TALK 10:43, 18 July 2008 (UTC)[reply]

Phrasebook Criteria

We need a separate section on CFI detailing the criteria for inclusion of phrasebook entries. WT:CFI mentions phrasebook entries very briefly under the idioms section, but the wording is extremely vague (entries must be "very common" and "useful to non-native speakers"). Namely, we need to determine what qualifies as a common phrase, what qualifies as a useful phrase, and how phrasebook entries should be worded (grammatically, that is) and formatted. There should also be a guideline for writing phrasebook entries. At the very least, something more extensive than "See Wiktionary talk:Phrasebook for now".--TBC 09:39, 18 July 2008 (UTC)[reply]

Well this hasn't been a problem thus far, but I guess we could try to flesh it out a little bit more. It's not time to create completely objective criteria, however. I would want to work backwards from the consensus on multiple entries in RFD before trying a major overhaul. So far I don't know that any entries have failed.
To me, useful phrases are correlated to common or frequently occurring scenarios. For instance, being in a restaurant and needing to use the bathroom is a common scenario. Counting goldfish in a bowl is not. It's certainly conceivable that someone would count goldfish in a bowl, and an il-y-a construction might be "useful to non-native speakers" learning the structure of a language, but it's not something that happens frequently enough to warrant mention. It's not the type of entry you'd normally think of being in a phrasebook.
The "very common" criterion isn't so important to me. What's more important is to say that this is the way that native speakers would tend to phrase a concept. There may be other ways of saying it that are both grammatically correct and attestable, but they're not always the most immediate or obvious ways of addressing the choice of wording. In some cases there may be more than one way of expressing an idea, and that's fine too. But they'd all have to be very common, first of all, and it's worth pointing out that the more informal terms are not something a non-native speaker would need to be introduced to right away. "Where's the little boy's room?" is a very common phrase, but it's not what you would teach someone who has trouble communicating. DAVilla 06:25, 19 July 2008 (UTC)[reply]

Spam Policy

Instead of using an interwiki link to the Wikipedia policy on spam (which might not completely apply with Wiktionary), I think we should have our own comprehensive policy on spam. Currently working on a draft.--TBC 10:20, 18 July 2008 (UTC)[reply]

Started up Wiktionary:Spam. Any other Wiktionary-specific additions?--TBC 10:50, 18 July 2008 (UTC)[reply]
I think this is a bad idea, the Wikipedia policy covers everything and our version will just go stale, a bit like that Vandalism page and all the other policies that are merely copies of Wikipedia. (see also Wiktionary:Three-revert rule Wiktionary:Be bold). I agree that Wiktionary could do with some explanatory pages, but copying from Wikipedia does not seem to work. Conrad.Irwin 20:51, 19 July 2008 (UTC)[reply]
I had a second thought! These pages should not be policy, I think that is what annoys me. Instead we should have lots of pages like Help:Spam which help newbies understand what the community thinks of spam. What does anyone else think? Conrad.Irwin 22:14, 19 July 2008 (UTC)[reply]
You mean like essentially a guideline?--TBC 21:16, 20 July 2008 (UTC)[reply]
"Guideline" to me implies a set of instructions, this isn't a set of instructions it's just a definition of what spam is, or have I misunderstood? I don't really see what the purpose of such a page can be except to help people understand what spam is, and so it seems logical (to me) that this should be in the Help: namespace. Conrad.Irwin 23:33, 20 July 2008 (UTC)[reply]
Could help. Can't hurt much, even if completely neglected. Doesn't require a vote. Why not? The approach has lots of applications. It's a good way of seeing what we really do think as a group and might lead to some policy decisions. Almost anything that would explain us to newbies would be very good. It is actually a good task for a senior newbie or not so veteran contributor, like me. DCDuring TALK 23:15, 19 July 2008 (UTC)[reply]
Well ignoring WP:BEANS, no it doesn't hurt at all. However I can't see how this page functions as a policy. Conrad.Irwin 23:33, 20 July 2008 (UTC)[reply]
The "help" given could be wrong when written (because help is a low-priority/low-fun task) or become obsolete and therefore somewhat contradict whatever the policy might be. Definitely not policy itself, but could lead to policy if discussion is active and productive. DCDuring TALK 23:55, 20 July 2008 (UTC)[reply]

Right-hand ToC interactions with other templates

I really like having the ToC on the right so that I can both see much content and jump to content by clicking on the ToC. From time to time there are issues with other templates, which force the content (or the ToC ?) off the first page, defeating the purpose of having it. An example is {{rfe}}. In contrast {{etystub}} does not. Is it worth fixing {{rfe}} or could/should it be deprecated? How hard is it to avoid bad interactions with the right-hand ToC? Can the right-hand ToC code or CSS be tweaked to force proper display? DCDuring TALK 21:10, 18 July 2008 (UTC)[reply]

Hear, here. DAVilla 06:01, 19 July 2008 (UTC)[reply]

I take it that the source of the problem is the width of the pseudo-graphic. Templates with poor interaction with rh ToC: {{rfe}}, {{rfp}} {{rfap}} DCDuring TALK 10:37, 19 July 2008 (UTC)[reply]

I have reduced the width of the graphics to 50% in all 3 of these templates. They might look better not centered, but with a small left-hand margin. DCDuring TALK 10:53, 19 July 2008 (UTC)[reply]

I've now fixed them using the same CSS that is used on {{rfv}} etc. I have created the meta template {{request box}} for them, which mimics {{maintenance box}}. I've set the width to match maintenance box, but that can easily be changed by editing Template:request_box. Conrad.Irwin 20:45, 19 July 2008 (UTC)[reply]
Your way is definitely better and would be my model, but the new meta template seems to obviate the need. DCDuring TALK 21:12, 19 July 2008 (UTC)[reply]
Are there other remaining barriers to making the right-hand ToC the default? Should we give more time for identifying problems? Should any that are discovered be reported here? or elsewhere? DCDuring TALK 21:15, 19 July 2008 (UTC)
There are a few issues with some of the CJK entries I think, but as I don't spend much time there I'm not sure what they are or what causes them. I definitely think we should activate this for a trial period of a few days, and then if things look good we might need to VOTE before enabling it permanently. Conrad.Irwin 22:18, 19 July 2008 (UTC)
RU seems to be aware of CJK issues. What are his thoughts? DCDuring TALK 23:18, 19 July 2008 (UTC)

Norwegian – again

Well, after some discussion on my talk page and the creation of Wiktionary:About Norwegian, I am raising this question again. How should we handle Norwegian entries? Today we handle all words that are the same in Bokmål and Nynorsk as Norwegian, while we distinguish between them if a word exists in only one of the languages. But, as Kåre-Olav says, almost every word (nouns, adjectives and verbs) is inflected (slightly) differently. We will renew the discussion on no.wikt, and see if we maybe should work differently there, but for now we do it the same way as here ... but I'm not sure if it's the best way. Any thoughts? Can we go for totally differentiating between the two languages (which will give quite a lot of double entries, as is already the case with Danish, Swedish and Norwegian in general)? --Eivind (t) 16:12, 20 July 2008 (UTC)

Previous discussion here. I'm going to stay out of this conversation, as I did so much waffling on the last one that I think I would be of little help. I would like to see Meco's input, as they are the only one who has stayed and consistently worked on Norwegian here. Not only are they going to be most affected by any decision, but they are in a better position to assert what will work. -Atelaes λάλει ἐμοί 19:29, 20 July 2008 (UTC)
I think that this is a complex issue that is not yet ripe for a comprehensive solution. During my two-year tenure here I have mostly stayed away from the problem by sticking exclusively to Bokmål entries, basically ignoring the mere existence of Nynorsk, not out of disregard for Nynorsk (disdain for Nynorsk is prevalent among Norwegians), but because I have felt uneasy and unsure about how to deal with the Norwegian problem of two separate, however quite similar, languages. Only very recently have I begun fiddling with dual entries (partly inspired by the entrance of User:Kåre-Olav).
There are inconsistencies in the Norwegian conjugation and declension templates that are tentatively dealt with in a piecemeal way, and I propose that the strategy to be adopted for solving this problem is that of continual reconstruction and reconfiguration until we perceive that we are on a converging track. Having a separate Norwegian language forum is a good initiative, I think. I don't sense, though, that we are yet at the point where a comprehensive discussion is likely to provide clarity and solutions with regard to coming up with guidelines for how to deal with Norwegian entries. I think we should keep tabs on one another, taking pointers and giving feedback to fellow Norwegian entry contributors, and having a dedicated forum would be a useful adjunct in this respect.
Finally, I will mention two things. Firstly, the problems of the Chinese languages, which lead to a similar problem complex of not being able to pin down comprehensive, integral guidelines for languages that are only partly distinguishable. Secondly, that I don't think we should hope to find a solution by gleaning from the Norwegian Bokmål Wiktionary, as it's still too much in its infancy. We should keep lines of communication with it open though, as the problem is a common one, and a sustainable solution found in either place would quite possibly be adoptable in the other. __meco 09:44, 21 July 2008 (UTC)

Googleability

Good news: Google now includes Wiktionary in its search for definitions using the "define:" keyword. (E.g. [52]) --Dan Polansky 10:58, 21 July 2008 (UTC)

Callooh! Callay! DCDuring TALK 11:57, 21 July 2008 (UTC)