Wiktionary:Beer parlour: difference between revisions

From Wiktionary, the free dictionary
Latest comment: 15 years ago by Jackofclubs in topic Unorthodox request.
Robert Ullmann (talk | contribs)
Jackofclubs (talk | contribs)
m Unorthodox request.: new section


Is what is written at {{term|dens#Etymology|dēns}} meant to avoid the inclusion in the entry of the long tree of terms at [[Appendix:Proto-Indo-European *h₃dónts#Descendants]]? If so, I think I get your point… <tt> :-S </tt> Cognates are really useful, but I’d shy away from ''that'' many lest they swamp the entry. <b><font style="color:darkred">†</font></b>&nbsp;[[﴾]]<sup>(<i>u</i>):</sup>[[User:Doremítzwr|Raifʻhār]] <sup>(<i>t</i>):</sup>[[User talk:Doremítzwr|Doremítzwr]][[﴿]] 14:48, 23 June 2009 (UTC)

== Unorthodox request. ==

Hello Wiktionary community. I have a confession to make. You see, [[Wiktionary:Requests for checkuser|as suspected, but not proved]] I am in fact Wonderfool. And Wonderfools have never been [[User:Keene|keen]] on serious long-term admin work. So this year, instead of being [[User:Dangherous|dangerous]] and going on a spree, I'll be amical and request desysopping [http://meta.wikimedia.org/w/index.php?diff=1527949&oldid=1527264 the polite way] (I'll delete the main page, of course, but that's all). And I think it would be [[User:Wonderfool|wonderful]] if I could remain a Wiktionarian, and be open about my WFness (i.e. don't send me underground so I'm forced to clandestine editing and hopping from IP address to IP address and town to town and continent to continent). This way, I can edit hardcore French stuff, which I haven't done properly for about a year, without worrying about being blocked. And it'd be nice to run [[User:Keenebot2]] again - there's tens of thousands of pages waiting to be rapidly added to this project in my off-wiki files. Anyway, I propose a mini-poll to allow WF to edit here, but without boring adminship duties. If not, then I'll probably see you again in 2010 under a [[User:Newnoise|new name]]. Regards --[[User:Jackofclubs|Jackofclubs]] 07:00, 24 June 2009 (UTC)

'''Support'''

'''Oppose'''

'''Abstain'''

'''Discussion'''

Revision as of 07:00, 24 June 2009







April 2009

Translations of taxonomic names

I wanted to enter somewhere that the Norwegian name for Primulaceae is nøkleblomfamilien. How is this possible with no English entry, only translingual? Do I create a translations section inside the translingual entry? I checked a dozen or so other members of Category:Taxonomic names and none had translations sections. __meco 08:17, 1 April 2009 (UTC)

We've had this conversation before, though I'm not sure where or when. At least some people contended, and I think this may have been the resolution, that the language-specific name is the translation of the English word representing that taxon, such as (in this case) primrose (one sense), and not of the translingual taxon name. --msh210 15:59, 1 April 2009 (UTC)
There is also a mechanism in place at Wikispecies to enter translations for taxon names. --EncycloPetey 03:21, 2 April 2009 (UTC)Reply

Wiktionary:Writing templates

Hi everyone. A document like this has long been needed, and my meagre start merely scratches the scab off the itch. Could people please modify it where I am wrong, and improve it? Feel free to be slightly prescriptive. It'd be better to have "ideals" than "current practice". Thanks, Conrad.Irwin 22:56, 1 April 2009 (UTC)

Hi, on the topic of floating, you write: "Conjugation templates: If it's small, float right and don't hide." Recently, I have made a move in the other direction, with German conjugation templates, turning some floating templates into non-floating ones. Is it really preferred that conjugation templates float in the right? From what I have seen in English Wiktionary, most conjugation templates do not float in the right, and I for one prefer to avoid floating conjugation templates. What do other people think? --Dan Polansky 09:45, 2 April 2009 (UTC)Reply
I agree on the non-floating. There's also a timely example similar to this at WT:RFDO#Template:romance_cognates. --Bequw¢τ 05:36, 3 April 2009 (UTC)
Non-floating templates are better. They can be placed into a section without extending into other sections. --EncycloPetey 13:43, 3 April 2009 (UTC)Reply

Move Category:Chinglish > Category:Chinese English

According to w:Chinglish, “The term "Chinglish" is mostly used in popular contexts and may have pejorative or derogatory connotations. The terms "Chinese English" and "China English" are also used, mostly in the academic community, to refer to Chinese varieties of English”

Wikipedia's rule for titling articles is “most common name”, but Wiktionary's a different animal. Any objection to moving the category? Michael Z. 2009-04-02 17:13 z

Do we even need the category at present? It currently has no entries in it, and the examples of Chinglish I've seen are a result of mistranslation or grammar problems rather than deliberate intent to develop a Chinese variety of English. — Carolina wren discussió 21:46, 2 April 2009 (UTC)Reply
I have no idea. My uninformed impression is that Chinglish is considered a “sub-standard” speech, but we are a documentary dictionary, so there's no reason to exclude it. Someone foresaw a need by creating the category, and there are 300–500 million ESL speakers in China, so sooner or later they will start speaking to each other and someone will document it. I won't object to leaving it around until it's needed, nor to deleting it until it's needed.
I'm not even positive that it should be moved; I'm just going by the WP article's intro, and OED labels it “colloq. (freq. depreciative)”. It seems that educators and translators are interested in the subject.[1]Michael Z. 2009-04-03 00:04 z
A concurrent discussion about dialect labels makes me think it's better to define a dialect template and category before they are needed than to have editors start applying them ad hoc. After writing the last sentence, I found that the category is actually already used in the entry bo jook, but there doesn't happen to be a template. So I'll create the template now, and let's keep the category. Michael Z. 2009-04-03 00:31 z

Words in the News (defunct?)

I have stopped adding words to Wiktionary:Words in the News. I don't think many people look at it, and it is getting difficult to find anything interesting to add. Unless anyone else wants to take it over, I propose that we archive the current content, remove what links to it there are, and just forget about it. If any really interesting word turns up in the news, we can always slot it into Word of the Day. SemperBlotto 15:10, 3 April 2009 (UTC)Reply

We should add some links from there to Visivia's News wordlists, and any of the other such things; but yes, Words in the News does seem dead right now. JesseW 19:41, 3 April 2009 (UTC)Reply
Archived. Delinked. SemperBlotto 14:53, 6 April 2009 (UTC)Reply

Taking a fortnight off.

I'm taking a break for a few weeks to attend to family and academic obligations. I would really appreciate if the rest of you would buckle down and finish the Wiktionary by the time I get back. Cheers! bd2412 T 23:33, 3 April 2009 (UTC)Reply

Sure, we'll leave the last word for you to do in ceremony on your return! Conrad.Irwin 00:05, 4 April 2009 (UTC)Reply
I'm distressed to have returned and found that the collection of "all words in all languages" is not finished. Quit lollygagging people, and let's wrap this up already! bd2412 T 21:56, 5 June 2009 (UTC)Reply
We had almost finished, but the New York Times keeps making more words. Stop them! Equinox 23:53, 5 June 2009 (UTC)Reply
Always with the excuses... bd2412 T 00:45, 6 June 2009 (UTC)Reply

Uploading the remaining language templates.

I have started my bot slowly uploading the remaining ISO 639-3 templates as it seems a bit silly not to just have them all. The script will also tell me which templates differ from the standard's names (but won't 'correct' them). In the first few it has only come across {{aab}} which reads Arum-Tesu not Alumu-Tesu. Conrad.Irwin 22:17, 5 April 2009 (UTC)Reply

We have been doing this by hand as needed for a reason. A number of reasons. Are you only getting the I (individual languages)? Are you checking that the ISO/SIL name is the name of the language, without things like a qualifier in parens (you are not), and checking that it isn't a -1 language we use the 2-letter code for? Checking that it isn't an artificial language we prohibit? All in all, a recipe for a real mess. Since someone is going to have to check (and that means looking at the language itself, in WP or whatever) every single one, are you going to do it? I don't think this is really a good idea. Robert Ullmann 10:09, 6 April 2009 (UTC)
From what I've seen, it indeed only creates individual language codes (no collective codes), so no problem there. And I think we should have three-letter codes even if a two-letter code exists (it's already done on Wiktionary). As for artificial languages, the only actually forbidden artificial languages with ISO codes are Quenya and Klingon, I believe. All others just don't have any consensus. -- Prince Kassad 10:55, 6 April 2009 (UTC)Reply
I have also replied on my talk page, including a list of discrepancies found so far. Conrad.Irwin 11:21, 6 April 2009 (UTC)Reply
Duplicate language codes are a problem for categories (we don't want both Category:es:Cardinal numbers and Category:spa:Cardinal numbers) and technically "collective codes" are only in ISO 639-2 (639-3 has "macro languages"). I'm sure Conrad is able to filter out the ones that duplicate 639-1 codes, have parens or commas in the title, or are artificial (there's only maybe 8 or so). I don't think the concern with macro languages should stop Conrad. Best to get it out in the open and deal with it early. We should set up Wiktionary:Language codes with, among other things, a section on macro language codes that lists which undergo fusion and fission. We can delete and protect the codes that we don't use here. I say go for it, as it may be easier for everyone if we can deal with the majority of these issues in batch-style work (I know there are duplicate codes around still). --Bequw¢τ 02:31, 7 April 2009 (UTC)
I think it should be noted that we do not, as of yet, have a standard format for how to deal with languages with parenthetical qualifications, and there are quite a few of them. Robert proposed replacing the parentheses with a simple dash, which I think is reasonable, but we never really got consensus on that. Also, and I don't know how feasible this is for a bot to do, but it'd be nice to have a list of macros that we're treating as individual languages (such as Albanian). I don't think we should instantly put the kabash on them just yet (as for most of them, we have no good method/people for transitioning to true languages), but it'd be nice to have a list somewhere so we at least know they need fixing. Anyway, while Robert mentions some valid concerns, I think this is an excellent idea, and my thanks go out to Conrad for it. -Atelaes λάλει ἐμοί 05:01, 7 April 2009 (UTC)Reply
What does kabash mean? We do not have an entry for this word. The uſer hight Bogorm converſation 06:00, 7 April 2009 (UTC)Reply
See kibosh. Nadando 06:09, 7 April 2009 (UTC)Reply
This subject is mature enough for a dedicated page, and one that sets the standard if we agree to it. I don't think it should just be a Wiktionary: page though. It should be in Appendix: so that everyone can see what languages are included in the dictionary. DAVilla 02:29, 8 April 2009 (UTC)Reply
A first attempt at Wiktionary:Language codes is up. It includes a table that lists the ISO 639-3 macrolanguages so that we can note how we currently treat them (if we split them up or keep them whole). It has info on the language code decisions that I'm aware of. Please add to it:) (I wasn't sure about which prefix to use, Wiktionary for now)--Bequw¢τ 07:58, 14 April 2009 (UTC)Reply
That looks useful, I like the macrolanguages table a lot, though I'd question whether it should be "policy", it's more of an informational page. If I get some time later, I'll filter the list of language codes and begin uploading again. Conrad.Irwin 08:53, 14 April 2009 (UTC)Reply
It's now uploading again, but ignoring macro-languages, constructed languages, ISO-1 coded languages and languages with brackets. For the full list of exclusions see User:Conrad.Bot/bad_iso. Conrad.Irwin 09:44, 14 April 2009 (UTC)Reply
Now complete, we have all the templates except for the ones on User:Conrad.Bot/bad_iso that did not exist before the filtered run. Conrad.Irwin 16:27, 16 April 2009 (UTC)Reply
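The exclusion rules discussed in this thread (skip macrolanguages, constructed languages, languages that already have a two-letter code, and names carrying a bracketed or comma-separated qualifier) can be sketched as a small filter. This is only an illustration, not the code Conrad.Bot ran: the `LanguageRecord` type, its field names, the one-letter scope and type values, and the `qaa` sample row are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageRecord:
    """One row of a language-code table (hypothetical layout)."""
    code: str        # three-letter identifier, e.g. "aab"
    name: str        # reference name of the language
    scope: str       # assumed encoding: "I" individual, "M" macrolanguage
    kind: str        # assumed encoding: "L" living, "C" constructed
    part1: str = ""  # two-letter code if one exists, else empty


def exclusion_reason(rec):
    """Return why a record should be skipped, or None if it can be uploaded."""
    if rec.scope != "I":
        return "not an individual language"
    if rec.kind == "C":
        return "constructed language"
    if rec.part1:
        return "already has a two-letter code"
    if any(ch in rec.name for ch in "(),"):
        return "name carries a qualifier"
    return None


def split_records(records):
    """Partition records into codes to upload and codes skipped (with reason)."""
    upload, skipped = [], {}
    for rec in records:
        reason = exclusion_reason(rec)
        if reason is None:
            upload.append(rec.code)
        else:
            skipped[rec.code] = reason
    return upload, skipped


SAMPLE = [
    LanguageRecord("aab", "Alumu-Tesu", "I", "L"),
    LanguageRecord("spa", "Spanish", "I", "L", "es"),
    LanguageRecord("sqi", "Albanian", "M", "L", "sq"),
    LanguageRecord("tlh", "Klingon", "I", "C"),
    # Made-up row to exercise the qualifier rule; not a real language entry.
    LanguageRecord("qaa", "Example (Qualified)", "I", "L"),
]

upload, skipped = split_records(SAMPLE)
```

Keeping the reason alongside each skipped code gives a list of the same shape as the exclusions page mentioned above, so the skipped entries can still be reviewed by hand later.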

Why does the pronunciation not play automatically?

Why can't the pronunciation play automatically on Wiktionary, as on www.answers.com, when we click the sound (blowhorn) icon? Why is it in Ogg format, and why does it ask us to save the file first (saving is OK but should be optional)? Am I asking this at the right place? 69.14.222.205 02:23, 6 April 2009 (UTC)

This isn't a Wiktionary issue; it depends on your browser configuration and how it deals with OGG files. Look at your browser help file. Equinox 02:26, 6 April 2009 (UTC)Reply
MediaWiki uses Ogg files because they're widely used and the format is free to use, not owned by someone who might demand money or recognition or such. See w:Ogg. --EncycloPetey 04:48, 9 April 2009 (UTC)

The prepositions/adpositions must be given with each verb or noun

I want to find the adposition used with a given word, e.g. "attentive to their needs", "separately from its neighbors", and, with nouns, "a thirst for revenge", "an amendment to the constitution". But many verb definitions and noun definitions do not include this information. This is very important for people with English as a foreign/second language. Is there a page on wiki projects which has this information? 69.14.222.205 02:23, 6 April 2009 (UTC)

Where the combination of a verb or noun with a preposition means more than a basic sum of parts, it should have an entry of its own, linked from the verb or noun's entry in the Derived terms section. Indicating a preference for a particular preposition to be used would otherwise go in the Usage notes section. We don't have as many projects here as they do on Wikipedia. The closest analogue we have is generally the various subcategories of Category:Requests by language. (As a side note, in both your examples with to, I'd find of to be equally valid.) -- Carolina wren discussió 03:41, 6 April 2009 (UTC)
Actually, giving the relevant prepositions is a key issue which we should definitely do more of, and we have discussed it in the past without ever really working out the best format. See for example die, Verb sense 1, where I used subsenses for the job. Ƿidsiþ 12:49, 6 April 2009 (UTC)Reply
of could be equally valid, but the point is: how is a foreigner supposed to know what to use with these nouns and verbs? Just giving the definition of a word does not enable the reader to use it. A dictionary is the perfect place to give this smaller word (adposition). What I mean is that if the word is worthy and the definition is "Having merit, or value; useful or valuable", then a person will not know how to use it in "he is worthy __ the prize". The dictionary might as well say that the adposition is of (and also specify what type of adposition it is: preposition, postposition, or circumposition). Otherwise the user might use worthy for, worthy with etc., which are incorrect. 207.148.219.146 12:45, 6 April 2009 (UTC)
207.148.219.146, I concur with you in general, but while worthy is a piece of cake, because the apposition is the same in the other two greatest West Europæan languages (digne de, etw. (Gen.) wert, Gen.->of), a more tricky example would be angry with (furieux contre, wütend auf). Howbeit, the addition of appositions for all major (Indo-Germanic? I am not sure about the non-Indo-Germanic languages, in Japanese this is determined by particles) languages is no doubt exigent. The uſer hight Bogorm converſation 16:36, 6 April 2009 (UTC)Reply
What about verbs? Do we document that I write in ink (but presumably not for The Times, on paper, with pencil as they are used the same way throughout)? Conrad.Irwin 16:50, 6 April 2009 (UTC)Reply
Common ways of writing could be offered as example quotations in write, and common phrases might qualify as entries in themselves. The meaning of in ink, for example, is not sum-of-parts. But we have to draw the line at offering language tutorials or lessons. Michael Z. 2009-04-06 18:12 z
My approach for a similar situation in Hungarian was a template that displays the case endings/postpositions and puts the entry in a category where words requiring the same case endings/postpositions are collected, see an example at aggódik (worry). --Panda10 23:16, 6 April 2009 (UTC)Reply
Definitely to be encouraged! Probably the most widely known example is different than (US) vs. different to (UK) and the recommended usage different from. Unfortunately the links are and should be red because this is done through examples and usage notes at different. If you have a suggestion for a more standard way to present this information, I'm all eyes and ears. DAVilla 02:01, 8 April 2009 (UTC)Reply
The best way to do this, in my opinion, is to collect citations that demonstrate use of the verb with various prepositions. --EncycloPetey 04:46, 9 April 2009 (UTC)Reply

Is there a robot for this?

Is there a robot which can collect noun definitions where the plural of a word is not mentioned. I hate entries of nouns where the plural is not given.69.14.222.205 02:23, 6 April 2009 (UTC)Reply

What do you mean, "collect noun definitions"? Can you give examples of entries that you "hate"? Equinox 02:27, 6 April 2009 (UTC)Reply
I mean that the robot can tell a person (admin, editors etc) that these words are nouns/can be used as nouns but the plural form of the word is not mentioned in its definition. I am not able to locate any such word now, but in the past I have seen some words defined incompletely (maybe they were newly entered and the robot did not reach them yet). The question is, 'Is there such a robot?' 207.148.219.146 12:50, 6 April 2009 (UTC)Reply
No. Though it'd be fairly easy to scan the XML dump if you are interested. Conrad.Irwin 12:59, 6 April 2009 (UTC)Reply
Hm. How is the robot supposed to distinguish countable from uncountable entries? It would be reasonable just to detect the absence of any tag for countability and to tag it. The uſer hight Bogorm converſation 16:38, 6 April 2009 (UTC)Reply
I've got a program that scanned the XML for nouns that don't use the {{en-noun}} template, results are at User:RJFJR/nounscan. If it used the template it should either list the plural or be marked as uncountable with a '-' so fixing the entries to use the template should help. RJFJR 16:46, 6 April 2009 (UTC)Reply
Another good list would be those that are marked {{en-noun|?}} so that they could be fixed up as well. --Bequw¢τ 02:08, 7 April 2009 (UTC)Reply
My previous comment struck because it applied to {{en-noun|!}}
In either case an automatic category would be nice. DAVilla 21:25, 7 April 2009 (UTC)Reply
Those marked {{en-noun|?}} are listed in Category:English nouns with unknown or uncertain plurals, and those marked with {{en-noun|!}} are categorized in Category:English nouns with unattested plurals. Perhaps the first could be changed to something like "English noun entries with no plural given", and then used as a general cleanup cat? -- Visviva 02:58, 8 April 2009 (UTC)Reply
It would be pretty simple to generate a list of English nouns that use either simple boldfacing or {{infl|en|noun}} in the inflection line, if there's any interest. Of course there are probably some cases where {{infl}} is still the best option. -- Visviva 03:02, 8 April 2009 (UTC)Reply
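As a rough illustration of the kind of scan described in this thread, the sketch below flags pages whose English noun section has no {{en-noun}} call, which covers both plain boldfacing and {{infl|en|noun}}. It is not the program behind User:RJFJR/nounscan: the function names and sample pages are invented, and it assumes the (title, wikitext) pairs have already been pulled out of the XML dump.

```python
import re

ENGLISH_HEADING = re.compile(r"^==\s*English\s*==\s*$", re.MULTILINE)
LEVEL2_HEADING = re.compile(r"^==[^=\n]+==\s*$", re.MULTILINE)
NOUN_HEADING = re.compile(r"^={3,}\s*Noun\s*={3,}\s*$", re.MULTILINE)
ANY_HEADING = re.compile(r"^=+[^=\n]+=+\s*$", re.MULTILINE)


def english_section(wikitext):
    """Return the text of the ==English== section, or '' if there is none."""
    start = ENGLISH_HEADING.search(wikitext)
    if start is None:
        return ""
    following = LEVEL2_HEADING.search(wikitext, start.end())
    end = following.start() if following else len(wikitext)
    return wikitext[start.end():end]


def noun_blocks(section):
    """Yield the text under each Noun heading, up to the next heading."""
    for heading in NOUN_HEADING.finditer(section):
        following = ANY_HEADING.search(section, heading.end())
        end = following.start() if following else len(section)
        yield section[heading.end():end]


def nouns_missing_template(pages):
    """List titles with an English noun block that lacks {{en-noun}}."""
    missing = []
    for title, wikitext in pages:
        blocks = noun_blocks(english_section(wikitext))
        if any("{{en-noun" not in block for block in blocks):
            missing.append(title)
    return missing


# Invented sample pages, used only to exercise the scan.
PAGES = [
    ("gizmo", "==English==\n\n===Noun===\n{{en-noun}}\n\n# A thing.\n"),
    ("widget", "==English==\n\n===Noun===\n'''widget'''\n\n# A thing.\n"),
    ("gadget", "==English==\n\n===Noun===\n{{infl|en|noun}}\n\n# A thing.\n"),
    ("machin", "==French==\n\n===Noun===\n{{fr-noun|m}}\n\n# A thing.\n"),
]

flagged = nouns_missing_template(PAGES)
```

Entries marked {{en-noun|?}} or {{en-noun|!}} count as using the template here, since those are already collected by the categories mentioned above.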

French categories

Pharamp and I were having a discussion about the {{fr-noun}} template and the topic of categories came up. The issue was that Category:French_masculine_nouns and Category:French_feminine_nouns either have misleading category names or are not being used properly. They seem to be used mainly for singular nouns but {{fr-noun|m/f|singular word|type=plural}} places plurals in anyway. Pharamp and I were looking for other Wiktionarians' input. Should we recommend use of {{fr-noun}} over {{infl|fr|plural}}? Should we change {{fr-noun}} to make it not include these categories? Should we make different categories? I would favour the first option. —Internoob (Talk|Cont.) 22:44, 6 April 2009 (UTC)Reply

The front page needs a revamp

I've come to realise that our front page is, really, not very good (it seems to be a definite issue with many Wiktionary projects, actually) at doing its job of actually drawing people into the site. So I'm throwing a buttload of thoughts out there for mulling.

The first issue is waste of screen space, and the single worst offender is undeniably the wall of text on the left side. Although it admittedly helps focus attention on the left side where the WOTD is (a longstanding rule of web design 101 is that the most important content, besides navigation and that jazz, goes on the top left), it is ultimately mere blather and generates only tl;dr in the viewer who came to use the site, not to be told the site's grand scheme. I had actually never done more than glance at it until examining the page in detail for this analysis.

Second more global issue is that the page does not actually help draw the readers into the content itself. The only unambiguous link to an entry is the WOTD. ALL other links are either "hidden" (i.e. in the wall o' text) or ultimately useless for that purpose ("Wiktionary, the free dictionary"). All other links are to further index pages, some of which are not even that useful, for example, the index.

An inconvenient truth: the MediaWiki indexes are of little use for actual browsing, and unlikely to be used. Why? People who are looking for a specific word are not very likely to actually use the index (whatever its form); they'll go for one of the search boxes! And if you actually wanted to use the MediaWiki index, you'd end up swamped in the "form of"s, a phenomenon that is bound to get worse as more and more forms from other languages are added (I'm thinking particularly of cases and conjugations, but a mere noun with a feminine represents at least 4 forms in most languages). This is a separate problem, but it would probably be a good thing if we could avoid visitors getting turned off by it.

The top bar, for all its catchiness, not only does not accomplish much (even the search box is arguably very dispensable), but is actively taking a LOT of precious screen real estate, well over 1½ vertical inches that are pretty much wasted. At least the Wikipedia bar is discreet and stable for all usual screen sizes!

Now this is what we are doing wrong. What are we not doing that we could? A few ideas:

  • Feature some foreign-language material
    I know "this is the English Wiktionary" and all that jazz, but actually showing that we do include original material that goes beyond just a bilingual dictionary would be a good idea (e.g. by showing words that do not have an equivalent, or quirkier stuff than in the WOTD). Another avenue to explore is the nice little right-hand bar on ru:, which lists category links for a number of languages.
  • Feature some non-definitory material
    There is almost constant off-hand commentary that the Wikisaurus, appendixes, rhymebooks and such are in a sad state. It's not exactly surprising, seeing that there is little, if any, incentive to actually improve them! Throw links to actual pages on the front page and people will be more interested in improving them.
  • Use the Word in the News feature
    This has been managed by editors mostly as an amusing aside and otherwise has no real impact, but broadening it a bit beyond Wikinews reports could make it of definite interest for drawing people in.

Circeus 23:47, 7 April 2009 (UTC)Reply

Measuring elements of a Web page in inches seems like a mistake. Equinox 00:18, 8 April 2009 (UTC)Reply
That's beside the point. I could give the measures as 4.0299315154 E-17 light years and the fact would still be that it's a BIG waste of space for something that does not do much to draw people into the site and pushes the useful content down. Circeus 00:32, 8 April 2009 (UTC)
You're right; I just took non-screen units as a bad sign from a would-be Web designer. How about making a prototype for an alternate front page in your user space? There are definitely improvements that could be made, but the best way to show them might be to come up with a new idea and put it together yourself for discussion as a starting-point. Equinox 00:49, 8 April 2009 (UTC)Reply
I'll see what initial offering I can throw up. I'm familiar with the general principles of web design, but am not by any means a professional (or even experienced in designing for large-scale websites). Circeus 00:58, 8 April 2009 (UTC)

The letter index links are indeed useless, because MediaWiki can't sort its way out of a wet paper bag (COO, cooperate, co-operate and coöperate never appear anywhere near each other). Let's strike all links to Special:PrefixIndex or Special:AllPages. Instead, link to Index:English and its allies. I'd be glad to rebuild the letter-link indexes, if everyone thinks this is a good idea. Michael Z. 2009-04-08 01:48 z
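To make the sorting complaint concrete, here is a minimal sketch of a dictionary-style sort key that ignores case, diacritics and hyphens, so that the four spellings named above land next to each other. It only illustrates the idea and does not describe how MediaWiki or any index bot actually sorts; `index_key` and the extra sample titles are made up for the example.

```python
import unicodedata


def index_key(title):
    """Sort key that folds case, strips diacritics and drops punctuation."""
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case and keep only letters and digits (drops hyphens, spaces).
    folded = "".join(ch for ch in base.casefold() if ch.isalnum())
    # The original title breaks ties so the order is deterministic.
    return (folded, title)


TITLES = [
    "cop", "cooperate", "Cork", "co\u00f6perate",  # \u00f6 is o with diaeresis
    "COO", "cob", "co-operate", "coop",
]

by_codepoint = sorted(TITLES)
by_index_key = sorted(TITLES, key=index_key)
```

With plain code-point order, the capitalised and hyphenated forms sort ahead of "cob" and the accented form falls after "cop"; with the folded key, the three spellings of "cooperate" are adjacent and "COO" sits just before "coop".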

As for the indices, theoretically they won't have any 'form-of' entries in them (I know the ones that are generated with a bot are filtered first). Nadando 02:35, 8 April 2009 (UTC)Reply

I'm referring to allpages/prefixindex, which is what the Main Page links to. Even if they were links to the hand-made indexes, the point about people being unlikely to use them would still stand. Circeus 04:38, 8 April 2009 (UTC)Reply
The manual index for English actually feels a bit more like flipping through the pages of a dictionary, and less like undergoing bum surgery (although the very long pages drag my browser down a bit). It could also be useful. Linking to better stuff would improve the home page. Michael Z. 2009-04-08 05:35 z
The last time the main page was revamped was the end of 2007. If you want to make small tweaks, feel free to do it on the front page; if we are going to be making a larger change, let's create Wiktionary:Main Page/2009 redesign. I think linking to the real indexes is a good idea, now that some of them are being updated by User:Conrad.Bot (I linked to the AllPages because the real indexes were far worse at the time). Ideas for what to put on the front page are needed, perhaps a history of WOTD? There's really not much that "enticing" that we have. Conrad.Irwin 13:43, 8 April 2009 (UTC)
I like the idea of exemplifying our content on the main page. If we're going to link directly to entries, then Wikipedia is a great example of how to do that. EncycloPetey should definitely be commended for his consistent efforts on Word of the Day. A foreign language WOTD has been suggested several times in the past. The problem with this and other dynamic content is that it takes a lot of work to keep it up. Wikipedia for instance has an army to evaluate articles. French Wiktionary has featured articles, but not on a daily basis. Considering how much effort we put into citing neologisms, that would probably be a good source of material. I'm just not sure that we want to tout ourselves as being so "urban". Another good source is Words in the News. We might actually include the quotation, for instance when the media made a hubbub about Barack Obama's down payment speech. Ultimately we would want to highlight all areas: thesaurus, rhyme guide, phrase book, appendices, etc. DAVilla 17:26, 8 April 2009 (UTC)
The version of the demonstration design as of typing has a variable "Interesting stuff" box modelled roughly on WOTD that would be able to display pretty much whatever type of content we want it to with a bit of ingenuity applied to the template. Hit the "refresh" to see one of eight different examples from foreign word to phrasebook to appendix. Circeus 06:08, 10 April 2009 (UTC)Reply

Re-coding started

I've started work on a version at Wiktionary:Main Page/2009 redesign, as Conrad.Irwin suggested. For the time being, It's mostly fiddling with and simplifying the code (the image overlap is implemented in a very complex fashion). I intend to leave the bottom tier (i.e. scripts and other wikts.) and the basic design elements (blue boxes+icons) mostly intact. Ideas (on the talk page plz?) are welcome and even demanded :P Circeus 19:51, 8 April 2009 (UTC)Reply

From memory, much of the image code was to make it not look too atrocious on IE6, but I've never really been a CSS guru so there's probably a better way. I really like the "Newly discovered" box, but think there should not be only one style of container on that page (it begins to hearken back to Wiktionary:Main Page/Old 2007). Conrad.Irwin 09:48, 9 April 2009 (UTC)Reply
The two boxes under WOTD were added by DAVilla. I like the look, and wanted to keep it globally consistent (the only reason it didn't look like the previous page is that the current version has fewer boxes), though I wanted each box to have its own icon. I just checked and my system (it uses position:relative to allow the images to be absolutely positioned in regard to their container box rather than "sliding" the containers under the pictures) does work in IE. What's more, it allows for a constant margin-top between the boxes, which would have been much harder to achieve in the previous system. I shouldn't take too much credit for it, though. I came across the solution entirely by accident while looking for something else. Circeus 15:45, 9 April 2009 (UTC)

Out-of-process removal of a verification requæst for a sense of cruz gamada

Regrettably, I bring this issue to this forum, as it has not yet been resolved.

Note this RfV-sense discussion for the “swastika” sense of the Portuguese term cruz gamada, which I started. Stephen G. Brown removed the {{rfv-sense}} without providing the necessary reference and quotations to support that sense (as required by our CFI). The entry’s revision history, alongside our discussion on his talk page, shows the developments since then. The entry is now edit-protected, with the {{rfv-sense}} tag removed. As I state on his talk page, I consider Stephen’s edit-protecting of the entry to be “an abuse of [his] admin. privileges”. Since we are æqually intransigent in our positions, I ask that others from the editing community intervene in this matter to resolve the issue. Whilst it is unfortunate that it could not have been resolved before now, I wish to commend Stephen for his civility in our discussion thus far.  (u):Raifʻhār (t):Doremítzwr﴿ 02:06, 8 April 2009 (UTC)

Challenged senses of FL terms are subject to the same burden of proof as English terms. A quotation where the meaning of the head-term can be determined by knowing the meanings of the other (in this case Portuguese) words is required (technically three). It might be a bit more difficult to verify if the sense in question is "standard" or not, but still feasible. The RFV process should not be stopped early as personal knowledge is not attestation. --Bequw¢τ 03:59, 8 April 2009 (UTC)Reply
Indeed; pretty much my points.  (u):Raifʻhār (t):Doremítzwr﴿ 04:05, 14 April 2009 (UTC)Reply
Yes, that's quite a silly dispute. Of course, as long as it's being discussed at RfV, the tag should remain, and it does no harm anyway. I agree that there is no way that reverting a non-admin in a good-faith dispute and then protecting your version is an appropriate use of that tool. I have removed the protection at least, but that doesn't mean that you should go revert him again. It would have been better from the start if you two could have just let it sit and let others act. A little tag is not the end of the world. Dominic·t 13:09, 8 April 2009 (UTC)Reply
I have not reverted him since the entry’s de-protection, since the RfV process still applies without it. Another editor may re-add it if it is felt necessary.  (u):Raifʻhār (t):Doremítzwr﴿ 04:05, 14 April 2009 (UTC)Reply
There is good reason not to apply CFI fully in foreign language cases. For one, we are the only Wiktionary so far as I know to have an RFV process and a Citations namespace. If the major Wiktionaries are going to include every term in every language, then it is not reasonable to expect each of them to cite terms like confuzzle. Likewise, citing English terms is more than enough work for us. It is unfortunate that coordination and cooperation has not been established at any level. For now it is precisely Stephen to whom we have deferred many of these decisions.
If you like we can revise or put through (all or parts of) this vote on attestation criteria which I had left on the back burner. The section on attestation in other languages would allow us to defer the question of cruz gamada to the Portuguese Wiktionary. This isn't an easy or short-term solution, seeing as Portuguese Wiktionary is way behind, lacking even a page for the same term. The entry is still going to have to be cited somewhere. However, in the long term initiating verification on other Wiktionaries is going to be a much more sound approach than having all work initiated here. DAVilla 16:45, 8 April 2009 (UTC)Reply
“[I]nitiating verification on other Wiktionaries is going to be a much more sound approach than having all work initiated here.” — Well, indeed, but why can’t we copy their (GFDL-licensed) quotations to this Wiktionary if they have some, and vice versa? I personally think that such duplication would be valuable. This also means that the CFI need not necessarily be altered.  (u):Raifʻhār (t):Doremítzwr﴿ 04:05, 14 April 2009 (UTC)Reply
While DAVilla's scenario sounds nice, and would be the prudent approach, all else being equal, I wonder if it will actually pan out. Whatever the article count might imply, we're leaps and bounds beyond every other Wiktionary. I think it very likely that we will often be the ones whom other Wiktionaries take stuff from, even in their own languages. Because of this, I don't think that we should modify our CFI for foreign languages; they should be subject to the same restrictions as English. I think that we will get to the point where we have teams of people working on every language, as we currently have on English, and I think we'll get much better at efficiently (probably automatedly) acquiring cites. That being said, in the meantime, I think we have to exercise a bit of caution, and treat those rules as something to be tempered by good sense. Stephen Brown is easily one of our best contributors, and he has proved his expansive knowledge of languages in a multitude of ways during his stay here. If Doremítzwr were a native Portuguese speaker, and the word sounded odd to him, then he might be justified in his actions. As it stands, wasting so much of Stephen's time over this is nothing short of ridiculous. In my opinion, the rfv should be wiped, completely out of process, and completely against the official rules, because common sense dictates that it's the best approach to the situation. -Atelaes λάλει ἐμοί 04:26, 14 April 2009 (UTC)Reply
Forgive my curtness, but it would’ve wasted far less of his and my time for him to have learnt how to quote and cite and to have done so. Very few of his contributions get challenged, but when they are, I see no reason why he should have the prerogative of closing RfVs out-of-process. Bear in mind, as well, that he could’ve left the {{rfv-sense}} tag where it was for someone else to address the requaest as, it turns out, someone already has.  (u):Raifʻhār (t):Doremítzwr﴿ 04:44, 14 April 2009 (UTC)Reply
I agree that Stephen's reaction was not appropriate. However, let's not act as though he's solely at fault here. To begin with, you admitted from the get-go that, "I don’t doubt that this term is used thus." Your only qualm is the "correctness" of such usage. Vagahn produces a dictionary which backs up the sense, which you don't dispute, and yet you persist in your demand of the three cites. This is taking the letter of the law beyond its spirit. This is taking something which we have already verified within our current resources and utilizing a rule the wrong way. Yes, I will admit that diplomacy is not Stephen's strong suit. That does not nullify the fact that you are wasting our time with this. Let it go. -Atelaes λάλει ἐμοί 06:30, 14 April 2009 (UTC)Reply
Very well; the dictionary does address my original concerns. I shall not object to this RfV being closed without satisfying the letter of the law.  (u):Raifʻhār (t):Doremítzwr﴿ 14:29, 14 April 2009 (UTC)Reply
It's Vahagn :( -Vahagn Petrosyan 14:43, 14 April 2009 (UTC)Reply
Gah....sorry. -Atelaes λάλει ἐμοί 18:44, 14 April 2009 (UTC)Reply

Do we include non-native language?

A discussion has arisen in Wiktionary:Requests for deletion#vacuüm, and I believe it has far-reaching enough consequences that I thought it worth bringing up here. Do we include non-native language? vacuüm was requested for deletion by Hamaryns on the grounds that all the quotes were from non-native speakers (Dutch, specifically). Much to my surprise, no one batted an eye at this, opting simply to note a Netherlands specific contag. Of course the "All words in all languages" was quoted. I think that this is a very, very bad idea. Should I start entering how I pronounce French words? It's not very similar to how French speakers do (any of them). Additionally, how do we define when a non-native speaker is interspersing their native language into their English? One could well argue that the Dutch speakers in the aforementioned example are simply using the Dutch spelling of vacuum. Now, this is not an attempt to try and promote some "higher standard," I think that the English of a poor farmer in Georgia is every bit as valid as that of the queen of England. However, I think that we should limit our descriptive approach to native speakers. Now, it is certainly possible to describe how second-language speakers speak every language (or rather....how they write it), but I think this is a huge can of worms that we really don't want to open. Finally, I will personally murder anyone who attempts to add my pronunciation of Ancient Greek into the Ancient Greek entries. -Atelaes λάλει ἐμοί 08:47, 8 April 2009 (UTC)Reply

I agree with the above. I don't think we serve any good purpose by including such variants except to provide a living embodiment of a Borgesian (w:Library of Babel) dictionary of all possible attestable utterances and orthographies. DCDuring TALK 10:10, 8 April 2009 (UTC)Reply
I would certainly draw the line at nonnative pronunciations, not least because they would almost certainly be unverifiable. But, as the attestations show, "vacuüm" is a word that appears in published writing in English, and therefore potentially a word that a reader might encounter and want to look up. We do list misspellings at Wiktionary, and I don't see any reason to exclude misspellings that are encountered in published work simply because they are only ever committed by nonnative speakers. Angr 11:46, 8 April 2009 (UTC)
Yes, we include misspellings, but they are under different standards than other words (exactly what those standards are is fairly vague, as far as I can tell). They are required to meet rather more demands than simple CFI. -Atelaes λάλει ἐμοί 18:17, 8 April 2009 (UTC)Reply
I look forward to the day when downloadable audio will be deemed usable (mandatory?) to cite pronunciations. I wouldn't count on current technological/economic limitations to keep us from being overwhelmed by new "opportunities" and never achieving excellence at our current mission. DCDuring TALK 14:39, 8 April 2009 (UTC)Reply
No, don't add your pronunciation under a French language header, but it would be right to add pronunciations for regions where French is relied upon to communicate, for instance many countries in Africa where the R is rolled (as in Spanish), even if it is not the native language there. The French terms to which you can add English pronunciations are borrowed terms like parlez vous.
In my opinion we should include all language transfers that can be documented. The reason is that the lines between proper and improper use of language, and in many cases even between one language and another, are artificial. I promise you that the Parisian French francophiles hold in such high esteem is not spoken near the border of Spain, but their speech is still comprehensible to a Frenchman and incomprehensible to a Spaniard. At one time in history there would have been an entire range between the languages. The reason languages are so uniform today is political, primarily the result of education and the choices at state level in how to achieve it.
The question of which language is more or less judged by the surrounding text. If the sentence minus the term is in a single language, then the entire sentence is in that language, tag-switching aside. The good thing about switching is that it rarely occurs in isolation, so it's easy to spot scenarios where this is the most likely explanation. That is why personally I don't think the immediately surrounding text is enough for citation purposes. After all, a person could be trying their hand at another language and failing miserably. However, in nearly every case, a work will be presented in one almost entirely predominant language. For instance, I'm sure you've seen Latin words sprinkled in English books, but there's no question that the book is written in English. That's why so many Latin terms are said to be borrowed.
If the works cited in this case are written in English, then in my opinion even if they are written by foreigners, the quotations I've seen count toward attestation. Given how easily misspellings and nonstandard use are accepted, note how weak a claim this is. DAVilla 16:11, 8 April 2009 (UTC)Reply
First of all, let's please set aside any talk of "proper"and "improper", as that only confuses the issue. The distinction I'm speaking of is native and non-native. Now, this is not always an easy distinction, as there are children who grow up in bilingual households, but it's a much more possible distinction than proper vs improper. Your point about the different varieties of French is moot, as I have already been quite clear that any native speaker has equally valid speech, whether they grew up learning Parisian French or the French of some outlying French area, as well as Quebec French or whatever other type of French there is. The statement about if everything but one word is from a language, then the whole thing is in that language is not true. When I try to communicate with Spanish speakers, I often drop English words in, simply because my Spanish is not very strong. That doesn't make it Spanish. -Atelaes λάλει ἐμοί 18:17, 8 April 2009 (UTC)Reply
Right, and my roommate with his Dutch, and even if your Spanish or his Dutch were strong, you may still do it. It's called tag-switching, as noted above. If it happened often enough it would develop into a creole, let's call it Nederlish. When that happens I have a hard time with the notion that several hundred words suddenly pop into existence without ever having been documented before. We try to avoid handling the higher-level structure directly, but you do have to keep in mind that not only languages but their boundaries are in flux. Tens of thousands of descendants like vacuüm would already be part of the pre-existing languages, but what about the new Nederlish terms that bridge the gap? What about the purely English words with their English meanings and their Dutch misspellings (maybe bruüm instead of broom), or the Dutch words with their Dutch spellings and their English misinterpretations (maybe Nederlands for the country instead of the language)? You would refuse to document these because e.g. bruüm does not appear in a Dutch context, only an English context by non-native speakers, at least until Nederlish is sanctioned as a language. How does that political decision suddenly make all our citations valid? What I'm saying is, to me "all words in all languages" does not mean all words provided it's sanctioned as being some language. Rather, if it's clear that it's natural language, then it's a word, and the problem becomes identification of the language. How sure do you have to be that it's not Dutch? Saying native vs. non-native is just as prescriptive as proper vs. improper. Their English vocabulary may not have developed fully in elementary school, only later in high school or university. I guarantee yours did not either. Yes, I would want to be certain that the citations were entirely in the target language, which is why I already did address tag-switching, the problem you mentioned.
A foreigner not fluent in English would be obvious by dropping in larger Dutch words, the translations of which he does not know. It would happen more than once. I said above and repeat that I would not include such citations. But if the writer or speaker is confident enough to communicate entirely in English and makes a legitimate mistake, and if he's not the only one to make that mistake, then it's attested in the most nonstandard, half-bred, incorrect and hideous form, but attested nonetheless. DAVilla 03:41, 9 April 2009 (UTC)Reply
You mean code-switching. tag switching is a method for routing data packets (I invented it, and Tony Li of cisco tried to patent it, somehow forgetting that he had gotten it from my IETF plenary presentation on next-generation IP ;-). You should hear Swahili and English here, where people fluent in both insert English words where there is no Swahili, and people more comfortable with Swahili insert words when speaking English. The TV stations have news in Swahili at 7pm, and English at 9pm, but unless you understand both, you are likely to miss a lot, interviews are neither subtitled nor translated. On topic: I've been careful to only tag things as "East African English" when they have been clearly borrowed into English: murram, godown. Robert Ullmann 12:15, 9 April 2009 (UTC)Reply
I note, as an aside, that one implication of limiting all entries to native speakers is that we won't have any Neo-Latin terms. --EncycloPetey 04:43, 9 April 2009 (UTC)Reply

No one is proposing adding Atelaes's pronunciation of ancient Greek to the dictionary. Let's curb the hyperbole.

So do we remove and ban Category:East African English, Category:Hong Kong English, and Category:Indian English? Delete the corresponding templates? Do we put Category:Quebec English on probation until we find solid proof that more English-first Quebeckers speak English than French-first Quebeckers? Should we scan the Wiktionary for quotations from w:Joseph Conrad, because he was not a native English speaker? Strike Salman Rushdie quotes, in case English wasn't his first language. I guess quotations of translations from Plato, Cicero, Marx, Derrida, and Abba should be stricken too. Let's start with an audit of the personal histories of cited authors, and compile a blacklist of the ones who didn't learn English first.

The proposed restriction is not followed by any other dictionary. It's arbitrary and completely impractical. Michael Z. 2009-04-09 05:17 z

Wow, that was such a well-reasoned retort I've almost forgotten why I proposed this in the first place. Point conceded. -Atelaes λάλει ἐμοί 05:39, 9 April 2009 (UTC)Reply
Well, I think Netherlands English vacuüm is weird, too – it's way out of the ballpark for any paper dictionary. We don't have the same extreme constraints they do, so we may end up with niche regional categories from every country in the world, especially for the lingua anglia. It may require much more work to find the boundaries of regional usage than merely to confirm attestation. Michael Z. 2009-04-09 14:39 z

New categories

We have some international decisions to make on categories in the French parlour:

User | Category:Synonyms (and antonyms) | Category:Homonyms | Category:Homophones | Category:Heteronyms | Category:Paronyms
JackPotte | for | for | for | for | against

Category:Hyponyms | Category:Hyperonyms | Category:Meronyms | Category:Holonyms | Category:Transitive verbs (and intransitive) | Category:Ergative verbs (and inergative)
against | against | against | against | for | for

At the risk of repeating myself, here goes. If you have an automatic category for synonyms, what does that do? What's happened on the French Wiktionary is that every article which has the {{-syn-}} template on it now adds that word to the "French synonyms" category. But it doesn't say within the category what the synonyms are, just that the said word has at least one synonym. What percentage of words have at least one synonym? More than 90% surely. A category like "English synonyms" could easily have a million articles in it by the end of the year. Do you see what I'm saying?

PS Jack, nothing against you personally, it's just an admin issue, that's all. Mglovesfun 18:25, 8 April 2009 (UTC)Reply

Even if such a list would be huge, it would be interesting to have it in order to be sure to find some words with synonyms when needed, and to know its number of entries: which language do you think offers the most synonyms? JackPotte 18:42, 8 April 2009 (UTC)Reply
Yeah but when you want a synonym of a word, it's a specific one. An alphabetical list of every word that has a synonym (more than a million) is of little use. Mglovesfun 18:58, 8 April 2009 (UTC)Reply
Well, I'll leave how the French Wiktionary is run well enough alone, but I would prefer not to have most of those categories here. I like having "language" "POS" categories, etymological categories, topical categories, and a few others, but the rest I'm suspicious of. I just don't see any utility from most of what you're proposing. -Atelaes λάλει ἐμοί 19:03, 8 April 2009 (UTC)Reply

The only one of these *-nyms that looks even remotely worthwhile in a non-trivial way for categorization is heteronym and perhaps one for homographs. — Carolina wren discussió 21:03, 8 April 2009 (UTC)Reply

I guess you must not think some of our existing categories to be even remotely worthwhile. We already have Category:Ergative verbs by language. I agree that synonyms isn't such a useful category. Still, I don't think all of these should be brushed off, though my interpretation is a little different. If there's a definition line that includes synecdoche or the like then that could be automatically categorized. It could easily be done with a context label. DAVilla 02:35, 9 April 2009 (UTC)Reply
The various verb ones I do consider useful, but they aren't what I referring to by *-nyms. Synecdoche I'm ambivalent about, but since there probably ought to be a usage label for it, anyway, wouldn't have a problem with an associated category. — Carolina wren discussió 03:34, 9 April 2009 (UTC)Reply


The 5 other languages versions of this debate are being updated with the previous ideas. Clearly I'm "for" because we need to find all synonym words and compute them, even without any synonym example in mind. Synchronise this dynamic alphabetic ordered list with external ones. JackPotte 12:07, 9 April 2009 (UTC)Reply

A word is a synonym of another word; there is no point in classifying a word as a synonym without reference to another particular word. Thus, a category for synonyms is dubious, just like categories for other semantic relations including antonyms, meronyms, holonyms and coordinated terms. In general, categories are poorly suited for capturing role types in binary relationships, which synonymy, antonymy, meronymy and holonymy are. To clarify a further bit, you can classify people into friends and non-friends with reference to a particular person, but if the person whose friends are sought is unknown or unspecified, classifying people as friends and non-friends becomes pointless. --Dan Polansky 09:52, 10 April 2009 (UTC)Reply
1) For instance, one of the recognized longest English words, "pseudopseudohypoparathyroidism" (30 letters), seems to have no synonym (only some relatively long paraphrases), thus the synonym category might still be useful for the reasons I described yesterday. Moreover, somebody who needs to check the "replaceability" of each word would be interested in this synonym list.
Apart from that another proposition has been suggested today : Category:Words by number of letters (you can see here an example of result with the French acronyms).
2) I'm not academician, however we're so at the top of the linguistic researches that we could propose a new computer scientific definition of paronyms (some words of maximum 2 different letters or sounds between them), hence false-friends would be some international paronyms and would be classified in the same for all. JackPotte 20:59, 10 April 2009 (UTC)Reply
It might make more sense to have a category for words without synonyms, although that's a dubious claim. False friends might be interesting although we don't even list those in entries. I wonder if a category could contain English terms likely to be misunderstood in Spanish, Italian terms likely to be misunderstood in French, etc. DAVilla 23:15, 10 April 2009 (UTC)Reply
I asked about false friends at some point not so long ago. I think the consensus was that false friends should be discussed only with regard to English, and notes in the "Usage notes" were the best way to go about it. Circeus 23:58, 10 April 2009 (UTC)Reply
Maybe some people would build it by being inspired by the French and Italian ones. JackPotte 00:25, 11 April 2009 (UTC)Reply
Categories decisions :
French Version
German version
Spanish Version
Italian Version
Portuguese Version
A category for words having non-etymologically-related homographs (excluding inflected forms) and a category for words having non-etymologically-related homophones (excluding inflected forms) might be useful, provided that they are short enough, but a category for words having synonyms is not useful at all (it's very obvious). Lmaltier 06:20, 11 April 2009 (UTC)Reply

Update CFI

I started a discussion several weeks ago about updating Wiktionary:Criteria for inclusion, specifically changing or removing the Proverbs section because current practice is inconsistent with the policy. Since nobody replied, I made the change, but it has been reverted because there had been no vote regarding this change. I've been asked to make the request here, hence this message. Should the change I had proposed be implemented, ignored, or should a different result be applied? Mindmatrix 22:57, 8 April 2009 (UTC)Reply

I guess you'd have to wait until the outcome of this vote before making changes without going through the formal process. I would implement the change myself except there's a vote out on exactly this point. DAVilla 02:15, 9 April 2009 (UTC)Reply

Wikimania 2009: Scholarships

Wikimania 2009, this year's global event devoted to Wikimedia projects around the globe, is now accepting applications for scholarships to the conference. This year's conference will be held from August 26-28 in Buenos Aires, Argentina. The scholarship can be used to help offset the costs of travel and registration. For more information, check the official information page. Please remember that the Call for Participation is still open, please submit your papers! Without submissions, Wikimania would not be nearly as fun! - Rjd0060 02:10, 9 April 2009 (UTC)Reply

Editing without Wikitext? Introducing User:Conrad.Irwin/editor.js

I've made a start on trying to improve the editing interface of Wiktionary. At the moment it only supports adding translations, but I thought I'd see if people are interested in this kind of thing before writing lots of features.

So, if you'd like to be able to edit a page without looking at Wikisyntax, try enabling the WT:PREFS option "Add input boxes to pages to assist with adding translations.", or if you're not a WT:PREFS person, add the following line to your personal javascript.

importScript('User:Conrad.Irwin/editor.js');

More information will be found at User talk:Conrad.Irwin/editor.js, bug reports and small feature requests are welcome - here or there. Conrad.Irwin 19:03, 9 April 2009 (UTC)Reply

That's awesome. I'd really like to see this expanded and become part of the standard javascript (as the users likely to use WT:PREFS are basically the only people that don't need this. :-)). This could really make the process more inviting for a lot of users. -Atelaes λάλει ἐμοί 19:15, 9 April 2009 (UTC)Reply
I certainly think it's worth enabling it for a trial period to see whether it encourages people to add translations or nonsense. Some kind of function to create a minimal foreign language entry from a redlink in a translation table would also be cool, but I think that should come under acceleration (which is another thing we might want to enable by default - maybe only for logged in editors though?). What do people feel about enabling editor.js for everyone for a couple of days after Easter? Conrad.Irwin 16:00, 10 April 2009 (UTC)Reply

Okay, what am I looking for, and where? I've tried this in Safari 4 beta and Firefox 3.0.8 on the Mac. The only difference I see is that Translations sections are expanded in Safari (only). Michael Z. 2009-04-10 16:41 z

You should see an input form like at http://jelzo.com/Screenshot.png . I'm using Firefox, and have tested in Safari 4. It sounds like you are getting a Javascript error, could you tell me what it is? (Ctrl+Shift+J in firefox may give you a list of recent errors) Otherwise, which WT:PREFS do you have enabled, there may be a conflict? Conrad.Irwin 16:48, 10 April 2009 (UTC)Reply
Ach, sorry. It relies on a function (newNode) that I thought I had added to the sitewide Javascript but hadn't (it's used by the feedback stuff and some of my other stuff); as I had feedback.js enabled, there was no problem. Should now be fixed. Conrad.Irwin 17:02, 10 April 2009 (UTC)Reply
Works now. Very cool way to enter structured info.
I almost gave up trying to figure out how to save, before I spotted the buttons at the top-left corner. I think it's more natural to look for action buttons at the bottom-right of the display, so maybe a 1-line strip across the bottom of the window would be more natural.
It also needs a more definite confirmation – a clear “done.” Have you tried reloading the page after entry?
And while I'm at it, I think I would be making the same language transliterations repeatedly. It would be nice if it would remember its expanded state, and auto-enter the language code and script.
But no complaints at all. This makes it much easier to just go in and enter translations, and I can see myself entering more with the streamlined process. Michael Z. 2009-04-10 22:06 z
Glad it's working, and thanks for the feedback. Yes, remembering code is a good idea; I'm not too fussed about where it appears on the screen, I might have an experiment. I thought about reloading the page, but with caching being what it is, it seems that you don't always see your changes (unless you do an ?action=purge which won't work nicely for anonymous users and is quite slow anyway.) On save, it should definitely remove the green highlighting, maybe that would help? Also, would it be worth trying to guess the script from the language code? There is a mapping at {{lang2sc}} I could steal - though I'd still provide an override. Conrad.Irwin 22:35, 10 April 2009 (UTC)Reply
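The idea of guessing the script from the language code while still allowing an override could be sketched roughly as follows. This is only an illustration: the table `LANG_TO_SCRIPT` and the function `chooseScript` are hypothetical names, and the handful of entries shown are ordinary ISO 15924 codes chosen as examples, not a copy of what {{lang2sc}} actually contains.

```javascript
// Hypothetical, deliberately tiny language-code -> script-code table,
// in the spirit of {{lang2sc}} (the real mapping is not reproduced here).
const LANG_TO_SCRIPT = new Map([
  ['ru', 'Cyrl'],
  ['uk', 'Cyrl'],
  ['el', 'Grek'],
  ['he', 'Hebr'],
  ['ar', 'Arab'],
]);

// Decide which script code to use for a translation.
// Whatever the user typed always wins; otherwise fall back to the guess.
// When nothing is typed and the language is unknown, script is null.
function chooseScript(langCode, userScript) {
  const typed = (userScript || '').trim();
  if (typed !== '') {
    return { script: typed, guessed: false };
  }
  const guess = LANG_TO_SCRIPT.get(langCode);
  return { script: guess === undefined ? null : guess, guessed: true };
}
```

Leaving the box blank gives the guess (`chooseScript('ru', '')` yields `Cyrl`), while any typed value is used as-is, which keeps the override that the comment above asks for.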
Flash a background colour once for the changed spans, and remove the green outlines – sort of a burning-in effect. That would be slick, and better than a reload. Michael Z. 2009-04-10 23:04 z
I've made these changes, with the exception that I prefer the save button to be top-left. Clear your cache (ctrl+shift+F5). Conrad.Irwin 22:54, 11 April 2009 (UTC)Reply

Works great. However, if the trans table does not have a gloss, I get an error: "Could not find translation table, please improve glosses.". Is this how it is supposed to work? Tested in Firefox v3. --Panda10 19:32, 11 April 2009 (UTC)Reply

Yes. In order that it won't make a mistake when editing the wikitext (it could guess a little more, but it's very hard to spot if it makes a mistake, so it plays safe.) It will also fail if the trans-mid template is absent, and in some cases where the glosses are too similar. Conrad.Irwin 20:25, 11 April 2009 (UTC)Reply
I was scratching my head over that message. How about “Error: this Transliteration section doesn't indicate which sense it applies to. Please add a gloss to {trans-top|...}.”? But I've already chosen a translation table when I started entering text, so why refuse to accept my judgment? Michael Z. 2009-04-13 17:43 z
The issue is that counting the number of {{trans-top}} in the source code does not work. (That was what I first attempted, but there are some crazy cases where you find <!-- {{trans-top}} or similar; the code tries to ignore these but I found it still couldn't count accurately enough.) Thus the only way to match the trans table you select in the HTML with the trans table it finds in the wikitext is to match glosses. I could try adding some more complicated logic to match them all up in order and interpolate for those it cannot match - but I'd prefer to do that as part of a separate "and translation gloss" module. Maybe a temporary improvement would be to try and guess which HTML tables it will not be able to find, and not provide the editing form there. Conrad.Irwin 18:02, 13 April 2009 (UTC)Reply
How about adding an “add a title” button to untitled translation blocks, instead of the form for adding a translation? We should have a work bee to empty out Category:Translation table header lacks gloss. Michael Z. 2009-04-13 18:49 z
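Matching by gloss rather than by position might look something like the sketch below. It is not the actual editor.js code: `stripComments`, `findGlosses` and `locateTable` are made-up names, the comment handling only covers well-formed `<!-- ... -->` comments, and glosses are compared by exact match only, which is stricter than the "too similar" check described above.

```javascript
// Drop well-formed HTML comments so a commented-out {{trans-top}} is ignored.
// An unclosed "<!--" is left alone, which is one reason counting is fragile.
function stripComments(wikitext) {
  return wikitext.replace(/<!--[\s\S]*?-->/g, '');
}

// Collect the gloss of every {{trans-top}} or {{trans-top|gloss}}.
// A table without a gloss contributes an empty string.
function findGlosses(wikitext) {
  const glosses = [];
  const pattern = /\{\{trans-top(?:\|([^}]*))?\}\}/g;
  const text = stripComments(wikitext);
  let match;
  while ((match = pattern.exec(text)) !== null) {
    glosses.push((match[1] || '').trim());
  }
  return glosses;
}

// Return the index of the one table whose gloss equals the requested gloss.
// Returns -1 when the gloss is empty, absent, or shared by several tables,
// i.e. whenever the safe choice is to refuse the edit.
function locateTable(wikitext, gloss) {
  const wanted = gloss.trim();
  if (wanted === '') {
    return -1;
  }
  const glosses = findGlosses(wikitext);
  const first = glosses.indexOf(wanted);
  if (first === -1 || first !== glosses.lastIndexOf(wanted)) {
    return -1;
  }
  return first;
}
```

Refusing (returning -1) on a missing or duplicated gloss mirrors the "plays safe" behaviour: an error message is easier to notice than a translation silently written into the wrong table.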
With time... Conrad.Irwin 20:22, 13 April 2009 (UTC)Reply

I love this.

The “guessing Xxx” is nice, too. Maybe make the label “Script code”, and link it to w:List of ISO 15924 codes. How about “guessing Xxxx (<Script name>)”? Why not enter the code into the field and make the text beside it “<Script name>”, which updates if you enter a script code. If you're worried that the guess is wrong, then add a checkbox to activate the field with a click instead of having to retype the guess. Alternatively, make “Xxxx” a link or button which enters the value in a click.

Actually it will use the guess if you leave the box blank - maybe I should change "guessing" to "using"? Conrad.Irwin 18:02, 13 April 2009 (UTC)Reply
In that case, just enter the code as the field's default text, and then “using” is self-evident. The fact that the script was guessed from the language can go into the docs page, but needn't clutter the interface. Michael Z. 2009-04-13 18:49 z
I could do this given some time, but it's a bit fiddly to work out when I can just replace the value in the text box without irritating the user. I hope the recent interface improvements make it clearer. Conrad.Irwin 20:22, 13 April 2009 (UTC)Reply

Gender could have (m f n c p) links with tooltips, buttons, or a pop-up menu, to enter with a single operation. Or just a pop-up menu if these are absolutely the only choices. Should that say “Gender or number”, since plural is not a gender? Michael Z. 2009-04-13 17:43 z

I was under the impression (for some language) that you can have words that are masculine and feminine, I was considering putting these into a standard drop-down, though that could get long but maybe a set of tick boxes would be better - but that takes up a lot of space and is slower to use. Conrad.Irwin 18:02, 13 April 2009 (UTC)Reply
Now changed to checkboxes. Conrad.Irwin 20:22, 13 April 2009 (UTC)Reply

This keeps getting better. Please don't be annoyed by all the piecemeal suggestions.

Instead of having the action controls dissociated in the top-left corner, can you make each translation block's form independent, having its own "Save" button at the bottom? This would be a good place to put the help button: (?) (Save). Instead of an “undo” stack, each translation span could have an (x) button for deletion. Michael Z. 2009-04-13 21:25 z

In short, yes this would be nice, but no it doesn't work. In long:
I agree this would look nicer, and it was how the early prototypes worked, however I don't think it is practical. If each form has a save button and I add a translation into two different boxes, and click save on one box, it would only save the changes in that one box. I then have to wait until that has finished saving before allowing the user to click save on the other form, in order that there is no possibility of the edit being rejected as an edit-conflict by the software - this would be slow and, if there are lots of translation tables, that's a lot of save buttons to wait for. If the save button in each form saved the changes for all forms, it would be very unintuitive and probably lead to people saving edits they didn't intend to save. Adding a (x) to each translation is tempting and perfect from a UI point of view at the moment; however from a coding point of view it is a nightmare. I'm not prepared to hand-analyze all the dependencies that could be made by any combination of edits. What do I do if someone adds a translation, moves it to the other column, and then clicks (x)? For example, if I add 'a' and 'b' as Spanish translations, the first edit adds "Spanish: a", and the second adds ", b". To delete 'a' as a translation I have to delete "a," which is not simply undoing an edit made so far. It would be possible to (internally) treat this as four edits and have the "Spanish:" and the "," as separate edits, but then I need to keep track (somehow) of under which circumstances to remove what. I know this seems silly, because it's pretty obvious to a human what to do, and indeed if the editor.js actually worked by parsing the page into an internal format, and then modifying that, and then writing it back again, it would be trivial - but the effort of writing an accurate parser and symmetric rewriter for Wiktionary is enormous. Conrad.Irwin 01:09, 14 April 2009 (UTC)Reply
Understood. I guess you have to consider what this will look like if it's developed more fully.
I wouldn't mind if saving made the other boxes' forms go greyed out until saving was finished (the server will get faster someday, as all computer things do). This could scale to an object-oriented model of the entry, where you change a component and the change is made immediately with a short edit summary: ten changes = ten edit summaries, fewer edit conflicts. This could be integrated with the existing edit history's rollback and undo facilities for fine-grained restores.
The alternative means making a dozen different edits, hitting save once, and writing a long edit summary. This scenario gives the editor a new interface layer, emulating a window-based desktop GUI (Edit > Undo, File > Save). Close the window without saving and lose your work, but unlike MS Word, most browsers don't warn you. There is one undo stack on the page, and a separate one in the edit history. After saving, the history can undo sessions, not individual operations. Michael Z. 2009-04-14 19:01 z
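
To make the 'a' and 'b' example from this thread concrete, here is a small illustrative sketch. It is not how editor.js works; the names naiveDelete, renderLine and deleteWord are hypothetical. It contrasts removing the text an edit appended with re-rendering the line from a parsed model, which is the approach described above as trivial in principle but expensive to build.

```javascript
// Hypothetical sketch. Each edit is stored as the text it appended,
// following the example in the discussion.
const appendedText = ["Spanish: a", ", b"];

// Naive deletion: drop whatever text one edit added and join the rest.
function naiveDelete(edits, index) {
  return edits.filter((_, i) => i !== index).join("");
}

// Model-based approach: keep the language and its words, then re-render.
function renderLine(language, words) {
  return words.length > 0 ? `${language}: ${words.join(", ")}` : "";
}

function deleteWord(words, word) {
  return words.filter((w) => w !== word);
}
```

Here naiveDelete(appendedText, 0) leaves only ", b", which has lost the "Spanish:" label, while renderLine("Spanish", deleteWord(["a", "b"], "a")) gives "Spanish: b". The second form presupposes a parser that can turn the wikitext into such a model in the first place.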

Trial Run?

Would there be any objection to turning this on for everyone for a few days? It would be worth seeing whether it encourages people to add good translations, bad translations or just rubbish. Conrad.Irwin 21:22, 12 April 2009 (UTC)Reply

Okay by me. DAVilla 16:33, 13 April 2009 (UTC)Reply
I love it, so yeah. As long as I get to keep it if anons add bullshit :D — [ ric ] opiaterein 17:56, 13 April 2009 (UTC)Reply
Ok, as I'm going to be around tomorrow, I'll try to enable it some time late morning (UK time). I'll disable it as soon as I notice problems, please feel free to disable it for me. Conrad.Irwin 01:09, 14 April 2009 (UTC)Reply

Foreign WOTD

Circeus has started a redesign of the main page which I think has some definite merit. One of the additions to the page is a "Foreign Word of the Day" section, which I find very promising. Regardless of whether the redesign is accepted or not, I'd like to propose that we include such a section. However, such a thing really needs some infrastructure supporting it. Most importantly, it needs a point of contact, someone who is in charge of the thing. Now, with WOTD, we've basically got a dictatorship going, with EncycloPetey having absolute and unrestrained authority over it. Anyone is free to disagree with me on this, but I think that it's worked out fantastically so far, and I recommend that we adopt a similar policy, voting in a FWOTD dictator, who will act responsibly with such unlimited power. My first choice would be Stephen, who seems to be fluent in every language known to man. However, I have my doubts as to whether he'd take on the job. Anyway, whomever we pick, we'd want to set it up so that it requires minimal work from the person in charge. So....along the lines of the current WOTD selection process, we could simply have a nomination page, where people show off the best their language has to offer. The person in charge would probably give preference to words which are more impressive, so it would offer incentive to offer up choices which are already in good shape. Thoughts? -Atelaes λάλει ἐμοί 21:30, 9 April 2009 (UTC)Reply

FWIW I had not decided what exactly was going to go in that space (though a FWOTD was one of several suggestions I made in the discussion re: the revamp). It was added by DAVilla. I was going to leave it to a separate discussion. Circeus 21:45, 9 April 2009 (UTC)Reply
Hey I just slapped it up there as a brainstorming idea. Atelaes is right, there's absolutely no infrastructure behind it at all. But then, we're not committed to this. It depends on if people want to have it or not. There is no question that if it goes up at all that it will look much different than it does now, with a different icon and at least showing the language of the definition! Another possibility is that if there are many languages then those could be listed, not showing the definitions. Those details we can hash out together, but I like the idea of a czar running the whole thing. Likewise for newly discovered terms, which interests me more than anything else. There are a couple of other things I haven't thrown up there yet, just as ideas. Hopefully one of these will grab someone's interest. 63.95.64.254 01:39, 10 April 2009 (UTC)Reply
I was just outlining that I wasn't endorsing these specific features at the moment. I didn't intend to put down the proposal, quite the contrary. Circeus 02:08, 10 April 2009 (UTC)Reply
Understood. That was really a response to both of you, and of course for anyone who's watching. Apologies for hijacking your redesign page. It may be called for to splinter off these changes. DAVilla 02:53, 10 April 2009 (UTC)Reply
S'alright. As I mention above, my concern is really with the design itself. I'll leave open the question whether there should even be new content there. After all, the idea of not sliding a second box under WOTD and publicizing non-word content in a page-wide box under WOTD and discussion room links is tempting. Circeus 04:13, 10 April 2009 (UTC)Reply
Some questions about FWOTD that ought to be resolved:
  1. How often are each of the various languages to be represented? That is, how often will we feature a French word, a Dutch word, a Hungarian word, an Ewe word, etc.? Will representation be according to the percent of total entries on Wiktionary? If so, would it be according to number of lemmata or total entries? If not, then how will features be equitably apportioned to various languages?
  2. Will non-Latin languages be featured? For many users, some languages will show up on the Main Page as a series of little empty boxes or as identical indecipherable squiggles. By the latter, I mean that some languages are interpreted by some browsers/systems with a language-specific default character that is repeated for each different character. On my Mac, Malayalam looks like a series of identical little icons, as does Lao, but the little icons differ for Malayalam and Lao. How will unenlightened users react to this apparent computer garbage on our Main Page? Transcriptions may help a little, but won't eliminate the possible "garbage" symbols.
  3. How will words be selected? For the WOTD we currently have, words are neither too common nor too bizarre, and are chosen more often because an average person could conceivably use them or come across them during a typical day, but they are a bit out of the average person's experience. If we feature words from many different languages, then how are these standards to be applied? Most people communicate in just one or two languages, and have little opportunity or reason to spout a word in another language. I have almost no potential for using Albanian, Maori, or Norwegian in my everyday activities (though my job has permitted me to drop in Chinese, Japanese, Latin, Spanish, and a few other languages from time to time). So what qualifies an entry for featuring in FWOTD?
There are other concerns I could name (e.g. audio files), but I think the items listed above are the big three. If a satisfactory plan can be made to cover these three points, then I imagine any other issues could be dealt with as well. I have raised these questions before, but no usable replies have been made to them. --EncycloPetey 17:24, 11 April 2009 (UTC)Reply
Some thoughts in response:
  1. This is one reason to have a czar, someone who we trust to be fair. In my opinion it would be best to highlight good entries, more like a featured article, but if quality does not correspond with the quantity then yes, the czar could make some adjustments.
  2. Good point. I agree, but I couldn't imagine leaving out other scripts. I guess just the Romanization is a possibility, though in some cases that would have to be stripped down further, leaving out the stranger diacritics. Better in my opinion would be to use an image linked to the page (or if necessary linking to transliteration to the scripted word), which hopefully isn't too much extra work.
  3. The point is not that any of us would use these words. (The phrasebook is a lot better for that.) It just highlights the fact that this isn't your grandmother's dictionary. I think the coolest terms are those that are difficult to translate into English, for instance if the primary definition includes what are to us different concepts. I also like to see words with several definition lines, very different concepts like English bat (mammal/baseball bat). In a foreign language these can be very strange to us sometimes.
I'll state the obvious on audio files: there may not always be one. Should there be? At the very least it should be encouraged. DAVilla 19:18, 11 April 2009 (UTC)Reply
Re: Audio files: The draft for the new Main Page incorporates FWOTD into the same template as WOTD, which I think is a mistake for many reasons. One of these reasons is that the WOTD template requires an audio file and provides a red-link if the audio file does not exist. If you intend to use a WOTD template, then the FWOTD selection must have an audio file or we will have perpetual red-links on the Main Page. This invites trouble. --EncycloPetey 19:28, 11 April 2009 (UTC)Reply
Good point, we don't want to invite trouble. One idea is to leave the audio icon and link out entirely if a file doesn't already exist. Otherwise this might be considered a restriction on which words can be used, namely those that have already been recorded only. That I've seen there are many foreign words with recordings on the respective Wiktionary, many of those missing here, so it won't be too much to ask of nominators. However it would severely restrict the language options. Originally I had put the FL WOTD as a separate box. I think it looks nicer now, but we have to design around the content. DAVilla 20:02, 11 April 2009 (UTC)Reply
Re: Separate box: I also think the one box looks better than two, but cosmetic improvement isn't the only concern. I see more logistical problems with the combined box, such as coordinating nominations and archives; coordinating when WOTD gets updated daily but perhaps the FWOTD doesn't; and the possibility of user confusion in having two headwords inside the single box. Having a single box with two words but only one audio link would confuse/perturb some users, I'm sure. --EncycloPetey 20:23, 11 April 2009 (UTC)Reply
It should be possible to run the FL WOTD completely separately even though they share the same space. That did occur to me while trying out the look. In my opinion it would only be worthwhile if there were a new foreign word every day, so I hadn't considered unsynchronized updates, nor do I think we should. If it's not possible to update every day, then leave it for Interesting Stuff. Otherwise, wouldn't it be possible to use a single template and have the duties divided between yourself for WOTD and another person for the FL part?
To avoid the confusion you mention and issues about red audio links etc. I think it makes a lot of sense to simply require that there always be audio for the word. This will restrict us in some ways, but it's equally a bonus to have narrowed the field of selection, (edited:) so as to avoid accusation of unfair selection by the czar. It just might encourage the addition of such information, in many cases just adding a link to content that already exists elsewhere. Also it's a strong selling point to say that we have audio recordings of foreign words. DAVilla 23:53, 11 April 2009 (UTC)Reply
If we want to consider restricting FWOTD entries to those that have audio, then we ought to do some reconnaissance to determine how many entries in various languages have audio available. A ballpark estimate would be sufficient, but we ought to consider that common and uninteresting words are more likely to have the audio recorded than "interesting" words. I almost always have to record audio for WOTD selections before they go up, although I do find a couple each month that Dvortygirl has already recorded.
An idea: we could consider starting FWOTD as a part of "Interesting Stuff" (at least initially) and see where it takes us. --EncycloPetey 23:42, 11 April 2009 (UTC)Reply
Trial run is a good idea, absolutely. I am going ahead to ask for nominations at this time. We will need someone to take this up though. Atelaes's original point is unanswered. If it looks like it's possible, I say do it! Interesting stuff is plan B. And by the way, we probably ought to critique the interesting stuff as thoroughly. DAVilla 00:05, 12 April 2009 (UTC)Reply

Now accepting nominations! Wiktionary:Word du jour/Nominations DAVilla 02:20, 12 April 2009 (UTC)Reply

Why Wiktionary:Word du jour/Nominations in lieu of Wiktionary:Mot du jour/Nominations? The uſer hight Bogorm converſation 20:49, 17 April 2009 (UTC)Reply
Because (1) mot isn't understood by English speakers and (2) this isn't the French word of the day. DAVilla 05:52, 18 April 2009 (UTC)Reply

Other Boxes in Redesign

I've realised that only a minimal amount of alteration to the WOTD template (basically add a few <code>{{#switch}}</code> to account for the differences and make the title change according to the content, and possibly drop off the audio) was necessary to allow a feature where the content is not composed solely of English word entries. I've made 8 sample entries that can be roughly cycled through by purging/refreshing the page (the code allowing this random switching is borrowed from w:Portal:Featured content). Circeus 05:49, 10 April 2009 (UTC)Reply

I was wondering about featuring other pages like this but wasn't quite sure how to go about it. We don't have enough Wikisaurus content to show off, for instance. Using a round robin sort of approach is keen.
Some other ideas would be possible with a dedicated contributor to support it. The news quote didn't go on the page because it looks like a lot of work to me, but I'm hoping to inspire someone. On the other hand, I honestly think we have enough content to list a newly discovered word every day, and that's something I would be excited about personally. Can we bring that back? DAVilla 07:35, 10 April 2009 (UTC)Reply
Probably not. SemperBlotto was running the thing singlehandedly for ages, and has grown tired of it. No one else ever stepped up to help out for very long (and I say that as someone who used to dabble from time to time). If it's not being maintained actively, it shouldn't be on the Main Page. --EncycloPetey 19:33, 11 April 2009 (UTC)Reply
What SB was running was words in the news, which was one proposal but different from the attested neologisms I called "Newly Discovered". This was incorporated into an Interesting Stuff box, but I'd like to put a dedicated section back into the revision. DAVilla 20:02, 11 April 2009 (UTC)Reply
For all these things I think it would be better to not specify the time period on the page, and then update it as often as we have the ability/enthusiasm. It may end up being "of the day", but if we don't specify that then it won't look so bad when it doesn't change for a week because someone is on holiday. Conrad.Irwin 08:22, 10 April 2009 (UTC)Reply
Alternatively, it should be possible to start with a 31 advance log that turns on itself in a fashion similar to WOTD until we get enough for a yearly cycle. Circeus 16:28, 10 April 2009 (UTC)Reply
Now see Wiktionary:Newly discovered. DAVilla 06:42, 11 April 2009 (UTC)Reply
I listed vulgar language but I'm pretty sure no one would want to see e.g. pussy pounding on the front page. Can I assume the same about jill (to masturbate)? assward? What about a quotation like "I bet you’d just lurve to have baby oil smoothed all over your little nappy bits"? If that's objectionable, it could easily be swapped for another. DAVilla 18:20, 11 April 2009 (UTC)Reply
Vulgar language on the Main Page would limit access to our entire website. Most schools use a filter that prevents access to pages containing certain words. We don't want access to the Main Page restricted from a significant fraction of users. --EncycloPetey 19:31, 11 April 2009 (UTC)Reply
Thanks, that's what I thought. I'll strike all of those out. Only had them listed because I was going through rather methodically. Words like lurve aren't so vile with a decent quotation. I still have questions about where to draw the line though. What about other potentially offensive words like shotacon that don't rely on vulgar terms for definition but could be problematic nonetheless? What about quotations like "Bi people tend to develop polyamorous identities and poly people tend to develop bisexual identities." Better to leave them out too, I guess? DAVilla 02:44, 12 April 2009 (UTC)Reply
Okay, sorry for being so dense. I answered my own question. I was just on a “Wiktionary doesn't censor” spacewalk, and it took a little while for reality to sink in. DAVilla 20:57, 12 April 2009 (UTC)Reply

I've got IPA, and here are my demands

Peter Isotalo recently started a BP thread on changing the link in {{IPA}}. Follow the link to see the rousing conversation he found. He has (reasonably, in my opinion) become a bit impatient about the whole issue, and has asked me to make the changes. I've set up {{grc-test}} as I intend to make {{IPA}}. Basically, if there is a lang parameter entered, the template links to "Wiktionary:language pronunciation" if it exists and "w:language phonology" if it doesn't. Otherwise, it simply links to our old standby w:IPA chart for English dialects. You can see the results at User:Atelaes/Sandbox. Unless someone gives me a damned good reason not to (or pays me a large sum in unmarked bills), I intend to institute these changes tomorrow. -Atelaes λάλει ἐμοί 22:25, 11 April 2009 (UTC)Reply

Sounds reasonable. --EncycloPetey 22:31, 11 April 2009 (UTC)Reply
Should we maybe use the "Appendix" namespace — e.g., [[Appendix:IPA chart for language]] — if we want it to be part of our content? My understanding is that the project ("Wiktionary") namespace is for Wiktionary policies, guidelines, discussions, and so on. —RuakhTALK 02:39, 12 April 2009 (UTC)Reply
Yes, I think you're right. It should be "Appendix:language pronunciation". -Atelaes λάλει ἐμοί 03:03, 12 April 2009 (UTC)Reply
Do eeeet — [ ric ] opiaterein 17:23, 12 April 2009 (UTC)Reply
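
The link-selection rule agreed in this thread can be summarised in a few lines. This is only an illustrative sketch in JavaScript, not the template code in {{IPA}} or {{grc-test}}; ipaLinkTarget and the pageExists callback are hypothetical names, and the Appendix namespace is used as settled in the replies above.

```javascript
// Hypothetical sketch of the rule, not the actual template logic.
// pageExists is a caller-supplied function that reports whether a page exists.
function ipaLinkTarget(languageName, pageExists) {
  // No lang parameter: fall back to the old standby.
  if (!languageName) {
    return "w:IPA chart for English dialects";
  }
  // Prefer the local appendix page when it exists.
  const local = `Appendix:${languageName} pronunciation`;
  if (pageExists(local)) {
    return local;
  }
  // Otherwise link to the Wikipedia phonology article.
  return `w:${languageName} phonology`;
}
```

For example, with only "Appendix:Ancient Greek pronunciation" existing locally, Ancient Greek links to the appendix, any other named language links to its Wikipedia phonology article, and a missing lang parameter links to the English chart.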

Roman numerals

For a Catalan usage note template for Catalan ordinal numbers, I was planning on writing some code to generate Roman numerals given a numeric argument. Any interest in my making it a template of its own, and any preference as to the name? (I was thinking {{Roman numeral}}, and if there already is one by some other name, I couldn't find it.) I expect to make it with a range of 1 to 3999 (I to MMMCMXCIX), as more than that runs into problems: using CSS for the overline for thousands means that I'd be using CSS to convey content, which is bad practice, and the combining overline (U+0305) has very spotty font support. Besides, for my purposes, 1000 is sufficient. — Carolina wren discussió 02:41, 13 April 2009 (UTC)Reply

Could you use the template in the way you've intended, so that we can see what it's to be used for? --EncycloPetey 18:21, 15 April 2009 (UTC)Reply
It's been incorporated into {{ca-num-ord-note}} (as {{Roman-num}} as I was worried about it being mistaken for a context template), and you can see it in use in primer. Since I was already passing the number into {{ca-num-ord-note}} for other purposes, it's not requiring changes to every entry to make use of it. — Carolina wren discussió 01:39, 17 April 2009 (UTC)Reply
Do 160 entries really need the identical 100-word note about Catalan numbering? Perhaps it should be in a sidebar or call-out box with a title like “Catalan ordinals,” so that its nature is clear to readers. Or does it belong in an appendix or Wikipedia article?
Entry content (like the specific notes for 1, 2, 3, 4, 10, 100, 144, 1000, etc) should be placed in the respective entries where it can be edited, rather than buried away in a template's #switch statement. Michael Z. 2009-04-17 02:37 z
I'm trying to provide uniformity to the usage notes for a finite, but large class of entries. The lemma entries for the ordinals all exist, and I would be extremely surprised if any others would merit inclusion. Indeed, I added 144 solely because of the existence of grossa. The specific notes given by the switch exist as alternatives to the default note. If the switch is what is mainly bothering you, I suppose I could rework it so that instead of a switch, the entire alternative to those parts would be passed as a parameter by those entries that need it to assuage your concern. — Carolina wren discussió 03:41, 17 April 2009 (UTC)Reply
Yes, I think a parameter would be preferable for text which is unique to an entry. Can't this be rewritten so that the entry text could just be text in the entry?
I'm concerned that although they do cite the entry term as an example, the second and third paragraphs are general discourse on Catalan ordinals, and not a proper part of the entry quart, for example. Michael Z. 2009-04-19 01:53 z
What I was asking for was a sample entry showing what the final product of such a template might look like or be used for. --EncycloPetey 04:17, 19 April 2009 (UTC)Reply
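
For reference, the 1 to 3999 conversion proposed at the top of this thread is a short greedy algorithm. The sketch below is in JavaScript and is not the wikitext of {{Roman-num}}; the function name toRoman is an assumption made for illustration.

```javascript
// Illustrative sketch only; the real template is written in wikitext.
// Converts an integer from 1 to 3999 into a Roman numeral (I to MMMCMXCIX).
function toRoman(n) {
  if (!Number.isInteger(n) || n < 1 || n > 3999) {
    throw new RangeError("toRoman expects an integer from 1 to 3999");
  }
  // Values in descending order, including the subtractive pairs.
  const table = [
    [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
    [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
    [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
  ];
  let result = "";
  for (const [value, numeral] of table) {
    // Take the largest value that still fits, as often as it fits.
    while (n >= value) {
      result += numeral;
      n -= value;
    }
  }
  return result;
}
```

With this sketch, toRoman(144) gives "CXLIV" and toRoman(3999) gives "MMMCMXCIX"; values of 4000 and above are rejected rather than rendered with an overline, in line with the range limit described in the proposal.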

Assisted editing a success?

It's still early days, but so far the following seems to be the case:

  • More translations are being added (not sure exactly how many more, but as I type 21/80 recent changes are made with this, compared to a meagre 3/80 accelerated).
  • Anons are using it to add translations, they seem to be mainly correct - but are sometimes slightly substandard, missing a gender - or formatted slightly wrong because they've tried to put too much detail into the box.
  • I've so far noticed two blatantly incorrect uses of it; both times it was used to "unbalance" the translations using the ← and → buttons (though both times, only slightly).
  • Unwhitelisted and whitelisted registered users seem to be using it well.

The decision thus seems to be whether to keep it enabled for all anonymous users, or to limit it to logged in users. Conrad.Irwin 14:08, 14 April 2009 (UTC)Reply

Users might be unbalancing the translations because of the edit box itself, which makes a balanced table look unbalanced. Maybe you could use a strip across the bottom instead of just the right column? DAVilla 01:19, 17 April 2009 (UTC)Reply
Not knowing the effect on the pattern of anonymous contribution, I can only speak about my editing experience: this is a nice tool, thanks! --Dan Polansky 14:46, 14 April 2009 (UTC)Reply
On design: What about making the gender checkboxes a part of the "less" variant instead of the "more" variant? Many foreign languages feature gender in their translations, while few feature transliteration, display form, and override script, AFAICT anyway. --Dan Polansky 15:04, 14 April 2009 (UTC)Reply
I had given that some thought before, so it is now done if you clear your cache (ctrl+shift+F5). Conrad.Irwin 15:26, 14 April 2009 (UTC)Reply
I can't get this to work. Whatever I type in the first box (e.g. cy), I get the message "Please use a language code. (en, fr, aaa)". What am I doing wrong? 82.18.22.160 21:54, 14 April 2009 (UTC)Reply
I haven't yet worked it out. You are using IE6? Conrad.Irwin 22:18, 14 April 2009 (UTC)Reply
The same happens for me in IE7. It works fine using Google Chrome. Using Firefox I get "Loading . . ." but nothing happens. SemperBlotto 12:29, 15 April 2009 (UTC)Reply
The same in IE7 by me. I get the error message: "Please use a language code. (en, fr, aaa)". In Firefox, which I normally use, everything works fine. --Dan Polansky 17:56, 15 April 2009 (UTC)Reply
Would be okay for anons just as edit is okay, but take it offline if it's causing errors. Per below, you may also consider an "off" button that would store as a cookie and could be reset in prefs or by clearing cookies. DAVilla
Yes, now done using the module that already remembers whether you have the box open or shut and which language you last used. It's at the top of User_talk:Conrad.Irwin/creation.js and can be put anywhere. Conrad.Irwin 01:52, 16 April 2009 (UTC)Reply
I also wonder why we don't have more standard A to I and J to Z assignments for columns. If I'm looking for Japanese I'd like to find it in about the same place every time. Nearly balanced is good enough, and perfection is impossible given that the lines may or may not run over depending on fonts and page width. DAVilla 20:16, 15 April 2009 (UTC)Reply
This is something that would need to be taken up more widely if it were to be changed. The other thing that concerns me is the grouping of languages in translation sections as that makes things that much harder. Conrad.Irwin 01:52, 16 April 2009 (UTC)Reply
You mean like when people put everything that is or ever was spoken anywhere within China or the territories it claims under "Chinese"? You should consider those as incorrect. I'm going through a few entries and adding *Chinese: See Mandarin and hoping to get some feedback. DAVilla 01:19, 17 April 2009 (UTC)Reply
IE6 works okay as far as language code goes, but the box to save changes is hidden behind the logo and so the changes can't be saved.
On a similar note I was wondering if it would be possible to position that box relative to the window, rather than at the top of the page. DAVilla 01:19, 17 April 2009 (UTC)Reply
Now fixed (both problems) in IE6, it should already have been right in other browsers. Conrad.Irwin 11:12, 17 April 2009 (UTC)Reply

I'd say keep it enabled! A very nice tool, so thanks a lot! --Eivind (t) 16:32, 15 April 2009 (UTC)Reply

Could there be a way to turn it off? I'm not likely to use it myself, and unless one is intending to edit a translation box with it, it's ugly intrusion. By the way, a way to edit an existing translation to add gender or other things to existing entries would be useful. — Carolina wren discussió 17:51, 15 April 2009 (UTC)Reply
You could put the following into Special:MyPage/monobook.js, though I appreciate that's not an ideal solution.
window.editorLoaded = true;

Editing existing entries is more tricky as it has to do more detailed analysis of the page, but it's certainly somewhere on the todo list. Conrad.Irwin 17:55, 15 April 2009 (UTC)Reply

You can now disable it at User talk:Conrad.Irwin/editor.js after you hard refresh. Conrad.Irwin 01:52, 16 April 2009 (UTC)Reply
The feature is almost useless to me. Although it seems to work in Safari on a Mac, I need special characters for Latin that can't be typed from my keyboard easily. I do note one confusing point, and that is the complete lack of visible prompting of what goes into each of the little text windows. The ISO-code window needs at least something to prompt more visibly for an ISO code. Is it possible for the boxes to have "dummy" text visible in them when they first appear, and which disappears once someone begins typing information into the window? --EncycloPetey 18:15, 15 April 2009 (UTC)Reply
It's possible, and I'll give it a go at some point. Conrad.Irwin 01:52, 16 April 2009 (UTC)Reply
One other point: The gender checkboxes say "male" and "female", which are not grammatical genders. They need to say "masculine" and "feminine", or else use an abbreviated form of those words. The grammatical gender does not always match the actual gender of the referent. Consider that the German Mädchen is a female word, but is grammatically neuter. The grammatical gender of animal names in many languages has similar problems. --EncycloPetey 18:32, 15 April 2009 (UTC)Reply
I've fixed this. Conrad.Irwin 01:52, 16 April 2009 (UTC)Reply

I find this tool absolutely indispensable. Can't imagine going back to manually adding translations. Please, keep it developing. Particularly, I'm interested in being able to modify already existing stuff, like adding sc= parameter or a transliteration to an existing translation. --Vahagn Petrosyan 18:44, 16 April 2009 (UTC)Reply

It's a great tool and a time-saver, Irwin! Sometimes it gives an error on the gloss format being incorrect (just added another and then the next translation fails), even if there is no problem with the gloss. It doesn't happen too often, though. Unfortunately, I can't use it for Chinese translations. First of all, it doesn't allow both traditional and simplified entries. Also, I don't agree to use "Mandarin". It's more common to use "Chinese" in translations here and adding dialects on the next line. Just repeating here what I said in your discussion page. Please let me know if you have questions. I would like to add many more Chinese translations. (Please don't explain to me the Chinese language family situation, I am well aware of it.) Anatoli 01:49, 20 April 2009 (UTC)Reply

I made some fixes about the gloss and the formatting last night, so hopefully they'll be better - feel free to list pages that you think are right and that it raises an error on, if I get time I'll fix it. The reason it says Mandarin is because it just uses '{{zh|subst=1}}'. Conrad.Irwin 09:49, 20 April 2009 (UTC)Reply
Conrad.Irwin, it should show "Mandarin" for cmn or zh-cmn, not for zh (zh stands for Zhōngwén (中文) - Chinese); '{{cmn|subst=1}}' produces Mandarin as well, '{{yue|subst=1}}' produces Cantonese, etc. Anyway, I'll get to the bottom of the Chinese templates usage. More importantly, IMHO, the traditional/simplified separation should be accommodated, e.g. with {{zh-tsp}}; otherwise, I can't use the assisted tool for Chinese (Mandarin) at all. In cases where traditional and simplified are identical, it should provide just one script.
I get errors occasionally on other languages. However, thanks for your efforts, in any case. Anatoli 12:14, 24 April 2009 (UTC)Reply
The remembering of the "less"/"more" state does not work any more for me. That is, when I choose to see less on one page, and open another page, I see all the fields in the newly opened page. Browser: Firefox 3.0.9; OS: Windows Vista. --Dan Polansky 11:14, 22 April 2009 (UTC)Reply
It won't remember anything until you hit "Preview", I should maybe change that. Conrad.Irwin 08:54, 24 April 2009 (UTC)Reply
So that is the trick that does it. Works for me. --Dan Polansky 11:41, 24 April 2009 (UTC)Reply

Nested translations etc.

Is there a list of which languages should get nested under which headings? Are there other templates, apart from the Chinese ones that people want to be able to (ab)use in place of {{t}}? Are these deviations from a standard a good idea? Conrad.Irwin 09:49, 20 April 2009 (UTC)Reply

"Norwegian Bokmål" (nb) and "Norwegian Nynorsk" (nn) should be nested under "Norwegian" (no). Most often we put translations that are correct in both languages under "Norwegian", and specific translations under nb and nn. --Eivind (t) 10:03, 20 April 2009 (UTC)

For Bokmål and Nynorsk, I believe both are subsumed under one Norwegian Wiktionary, i.e., {{no}}. Upper and Lower Sorbian should nest under Sorbian (often ignored, unfortunately). Jicarilla, Chiricahua and Western Apache should nest under Apache. Under Chinese go Mandarin, Yue, Xiang, Min Nan, Min Dong, Gan, Wu, and Hakka. Modern and Ancient Greek under Greek. Eastern and Western Mari under Mari. The Arabic dialects under Arabic. Brazilian and European Portuguese under Portuguese. I think there are others that are differentiated only by Northern and Southern, or Upper and Lower, that should nest, but I don’t recall off the top of my head. —Stephen 13:31, 22 April 2009 (UTC)Reply
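The nesting described above can be sketched as a small lookup table plus a sort key. This is only an illustration, not how editor.js works: the names `NEST_UNDER`, `nested_sort_key` and `render` are hypothetical, the table covers just a few of the groupings listed in this thread, and the `*` / `**` output layout is an assumption.

```python
# Hypothetical table: nested language -> heading it is listed under.
# Only a few of the groupings mentioned in this thread are included.
NEST_UNDER = {
    "Norwegian Bokmål": "Norwegian",
    "Norwegian Nynorsk": "Norwegian",
    "Upper Sorbian": "Sorbian",
    "Lower Sorbian": "Sorbian",
    "Ancient Greek": "Greek",
    "Mandarin": "Chinese",
    "Min Nan": "Chinese",
}


def nested_sort_key(language):
    """Sort a nested language directly after the heading it belongs to."""
    parent = NEST_UNDER.get(language)
    if parent is None:
        return (language, 0, "")
    return (parent, 1, language)


def render(translations):
    """Turn {language: translation} into bullet lines with nesting."""
    lines = []
    headings = set()
    for language in sorted(translations, key=nested_sort_key):
        parent = NEST_UNDER.get(language)
        if parent is None:
            lines.append(f"* {language}: {translations[language]}")
            headings.add(language)
            continue
        if parent not in headings:
            # No translation for the parent itself: emit an empty heading.
            lines.append(f"* {parent}:")
            headings.add(parent)
        lines.append(f"** {language}: {translations[language]}")
    return lines
```

With such a key, Ancient Greek is listed under Greek instead of being alphabetized between "Am" and "Ao". Old English is deliberately left out of the table, since the thread treats it as an exception.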
What about various Old and Middle languages? Old Irish under Irish? Middle Welsh under Welsh? And if so, where do we put Old English and Modern English, since translation boxes don't have a line for English? Is this nesting something AutoFormatBot could do? Because when I've added Ancient Greek using the handy new quick-entry form, it gets automatically alphabetized between "Am" and "Ao" rather than nested under Greek, and if I have to go in and fix that manually it defeats the purpose of the convenient quick-entry form. Angr 13:46, 22 April 2009 (UTC)Reply
Yes, Old and Middle go with the Modern language, except in the case of Old English, since Modern English does not get a line in the translation section. —Stephen 14:02, 22 April 2009 (UTC)Reply
So Old English and Middle English get alphabetized under "O" and "M" respectively? That will separate them from each other as well as confuse people who are accustomed to looking for "Old/Middle Foobar" under "Foobar". Maybe there should be an empty "English" line with "Old English" and "Middle English" indented under it, the way "Serbian" is usually empty, with "Cyrillic" and "Latin" indented under it. Angr 14:44, 22 April 2009 (UTC)Reply
I fear that that will encourage well-meaners to add (modern) English synonyms or summat.—msh210 17:56, 22 April 2009 (UTC)Reply
We could call it "English (earlier stages):" or summat of that. Angr 10:44, 24 April 2009 (UTC)Reply
No, please, keep Old English and Middle English away from Modern English, they have an entirely different vocabulary (no Gallicisms, Latin loanwords) and grammar(ge- for forming past participle like in German). Old High German and Middle High German do not have the same problem, but the spelling is too different (lack of noun capitalisation unlike German). The current format is the best one - English, Old English, Middle English; French, Old French; Latin, Late Latin and so on. I support nesting Bokmål and Nynorsk under Norwegian, it seems reasonable. The uſer hight Bogorm converſation 18:09, 24 April 2009 (UTC)Reply
I thought we didn't separate Middle English from English. There is ==Old English== but no ==Middle English==, only # {{obsolete}}. Likewise translations for * Old English but not * Middle English, only * English: (obsolete). DAVilla 15:49, 30 April 2009 (UTC)Reply
Apparently I'm mistaken. See Category:Middle English language. DAVilla 20:32, 3 May 2009 (UTC)Reply
We've been through this before. Yes, alphabetizing the languages separates them, but there is no way around this. Not every language descends from a similarly named language. The choices are (1) alphabetical order or (2) complete language family tree. There is no middle. DAVilla 15:49, 30 April 2009 (UTC)Reply
I've been doing entries for the Valencian standard of català as indents. (See eight for an example.) It's a peculiar situation. Valencian and Catalan share the same ISO 639 code, but both are recognized in ISO 639-3 as names for the language and both have bodies to prescribe their standard. Fortunately the differences between the two are largely a matter of pronunciation or preference between equally acceptable forms (and thus wouldn't concern the translations section) but as seen with huit/vuit the difference is sometimes orthographic. It's rare enough that manually adding the Valencian form is a viable option so long as editor.js doesn't disturb existing entries.— Carolina wren discussió 04:03, 24 April 2009 (UTC)Reply
Irwin, thanks for fixing the '{{zh|subst=1}}'. I can now add assisted Chinese translations, if jiantizi/fantizi match. Is traditional/simplified on your to-do list, at least? It would be great if you could add this. Even if you could add two entries separated by commas, this could benefit some other translations where you can have variants in spelling. Here are two examples:

What do you say? If you had an additional (optional) textbox that would also work.

Anatoli 01:02, 30 April 2009 (UTC)Reply
We don't do translations into "Chinese". Nothing was fixed by substituting that for Mandarin, which is the correct language header. DAVilla 01:15, 30 April 2009 (UTC)Reply
DAVilla, there is an existing '{{cmn|subst=1}}' template for Mandarin (Chinese Mandarin). {{zh|subst=1}}' is for Chinese (Zhōngwén - Chinese (language)) and zh links to the Chinese Wiktionary, which happens to be in Mandarin but I see no difference, since standard written Chinese is normally in Mandarin. As I described before, the translations could be nested further by having Mandarin, Cantonese, etc, then Chinese needs to be the header for all Chinese languages/dialects. It would be a waste of space and typing time, in my opinion. Anatoli 03:14, 30 April 2009 (UTC)Reply
I'm not an expert, but I believe that "Traditional"/"Simplified" can be toggled by setting the appropriate script template. (Click on "more", and use Hans or Hant in the "Script" box - you can then use the "Qualifier" box to put {{italbrac}} labels in front if you're feeling cunning). I did not fix Template:zh, but it was my suggestion that it be fixed - there are a lot of users who seem to want to add translations under Chinese. If this isn't desirable then the "fix" needs reverting and it needs to be explained somewhere why this is the case. As for supporting nesting, it will hopefully happen, eventually. Conrad.Irwin 09:35, 30 April 2009 (UTC)
(For what it's worth, we currently have about 8000 Chinese translations). Conrad.Irwin 09:41, 30 April 2009 (UTC)Reply
And they're mostly wrong. Per Wiktionary:About Chinese, particularly A-cai's comments on the talk page, cases where only * Chinese is listed need to be changed to * Chinese ** Mandarin or, my preference, just * Mandarin. The idea of using * Chinese ''See Mandarin'' is very new and potentially controversial, but the languages listed in the translations section have always been meant to match the L2 headers. DAVilla 15:39, 30 April 2009 (UTC)Reply
As stated on that template talk page, "Wiktionary breaks down Chinese dialects with special codes such as cmn for Mandarin". Your point about Mandarin-language Wiktionary makes my point entirely. Regardless of which code they use, the words that concern them are in Mandarin, not from the entire Chinese language. The reason that {{zh}} says "Mandarin" is for use in linking to the Mandarin-language Wiktionary, which should say "Mandarin" not "Chinese". In fact I believe that is its only use here. DAVilla 15:39, 30 April 2009 (UTC)Reply

Sorry, DAVilla, but I don't see your point, especially as I don't see you being active in creating Chinese translations. Mandarin = Standard Chinese, they are synonyms; besides, it's the only official dialect in China and Taiwan. We have the template cmn, which can be used for "Mandarin". Please leave zh for Chinese - zh stands for Chinese, not for Mandarin. They can be used when there is a difference between dialects. The original edit of the template wasn't mine. Your demands to remove the word "Chinese" from translations are not helpful; I have to do more manual editing. If the word Mandarin were used more often in translations, then I would stick to it, but Chinese is more common than Mandarin when referring to the language of China, and most translations use Chinese, not Mandarin, and I prefer to continue. The linked translations may have Mandarin, Cantonese, etc. with the appropriate pronunciation.

Bear in mind that adding a translation under Chinese, will add a link to [[page#Chinese]]. As we only have 15 pages with this heading, compared with 21075 using [[page#Mandarin]], DAVilla may well be talking sense. Presumably the problem would be somewhat lessened if I added support for "nesting" all the languages on Template_talk:zh under a "Chinese" heading. (Something I hope to be able to do at some point soonish). Conrad.Irwin 23:55, 3 May 2009 (UTC)Reply
No, they don't. The link just searches for any occasion of the word, even if it's in Japanese because there is no [[page#Chinese]] but simply [[page]]. Nesting is OK but I see some problems with your tool, plus there is an extra unused line with just Chinese: on it, etc. Just in case you don't know, there is only one written Chinese standard (with 2 scripts), even in Hong Kong, where standard documents are written in "Mandarin", although they may read them out loud in Cantonese. Mandarin refers to the standard spoken language. That's why I don't see any issue about the Chinese wiktionary being in Mandarin - it's the normal way to write in Chinese. Anatoli 00:50, 4 May 2009 (UTC)Reply

Irwin, your suggestion would require separate entries for simplified/traditional. They are regarded as the same word. 經驗 and 经验 (jīngyàn) are the same word (experience), only written in 2 forms. I prefer to see them together, followed by the pronunciation, like this: 經驗, 经验 (jīngyàn) Anatoli 22:39, 3 May 2009 (UTC)

You can achieve this by first adding 經驗, then adding 经验 with the transliteration (jīngyàn). [If you can't see the transliteration box, it is under More]. Which gives [3] (I should maybe have used cmn not zh which would get rid of the (zh), but it was just an example). Conrad.Irwin 23:55, 3 May 2009 (UTC)Reply
This sounds like a good workaround; perhaps the first should then be sc=Hant, and the 2nd sc=Hans. Only some may assume they are separate words. The template: {{zh-tsp|經驗|经验|jīngyàn}} or simply: {{zh-ts|經驗|经验}} (jīngyàn) makes it clearer but it could be overkill in terms of how much info it provides. Anatoli 00:50, 4 May 2009 (UTC)
You can of course also edit the wikitext if you want to do something funky. While adding support for other templates is not impossible, it requires someone to design the interface for them and test that the wikitext generated is always correct (I think the actual edits can be done with the editing functions already used for {{t}}). If you wanted to give designing the interface a go then I'm sure that your changes could be added to editor.js (while it does require a knowledge of javascript, the fix should be manageable if time-consuming). Conrad.Irwin 22:38, 4 May 2009 (UTC)Reply
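The workaround discussed above (one {{t}} per script, with the transliteration on the second) can be sketched as follows. This is a minimal illustration, not the code in editor.js: `build_t` is a hypothetical helper, and it assumes only the `tr=` and `sc=` parameters of {{t}} that are mentioned in this discussion.

```python
def build_t(lang, word, tr=None, sc=None):
    """Build the wikitext of one {{t}} call (hypothetical helper)."""
    parts = ["t", lang, word]
    if tr:
        parts.append(f"tr={tr}")
    if sc:
        parts.append(f"sc={sc}")
    return "{{" + "|".join(parts) + "}}"


def build_traditional_simplified(lang, traditional, simplified, tr):
    """One entry per script; a single entry when both forms are identical."""
    if traditional == simplified:
        return build_t(lang, traditional, tr=tr)
    return ", ".join([
        build_t(lang, traditional, sc="Hant"),
        build_t(lang, simplified, tr=tr, sc="Hans"),
    ])
```

For 經驗 / 经验 this yields two comma-separated {{t}} calls with the transliteration attached to the second, which is what the manual two-step entry produces.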

Word of the Day

The Word of the Day appears on the home page to be stuck on the 12 April entry. When "refresh" is clicked in the Word of the Day box, one is taken to a page containing only today's proper word (not a home page with the proper word). Just FYI. — This unsigned comment was added by 76.202.234.114 (talk) at 15:40, 14 April 2009 (UTC).Reply

Thanks for the heads-up.  (u):Raifʻhār (t):Doremítzwr﴿ 15:46, 14 April 2009 (UTC)Reply
That's a result of the interaction with your browser, although it may be caused partly by MW as well. I know that the problem did not exist at this time last year, but has been particularly bad of late. I am having refresh problems on more than one Wiktionary page (including the Main Page and Recent Changes), and the problem is not limited to one browser or OS. --EncycloPetey 18:09, 15 April 2009 (UTC)Reply
Hmm... It could also be partly caused by the fact that the Main Page dynamically calls content from the current WOTD according to the date. Is there any way to force a refresh on that? --EncycloPetey 19:06, 15 April 2009 (UTC)Reply
Not MW at the moment: it displays as April 15 for me. Does it still look wrong to you? Besides local caching issues, there may be internet caching issues, potentially. DAVilla 20:07, 15 April 2009 (UTC)Reply
It looks fine to me on my Mac right now, but I have problems when I use my Back Button (Mac), or when I first visit Wiktionary on any given day (Mac or PC). The latter applies even if I didn't log in. --EncycloPetey 20:21, 15 April 2009 (UTC)Reply
Word of the day: word n, 1. Please leave a note in the Beer parlour to tell us that there is no word of the day. 78.49.0.96 09:34, 25 May 2009 (UTC)Reply
Seems to be "chagrin". Conrad.Irwin 09:49, 25 May 2009 (UTC)Reply

Would it not be nice if one could specify a parameter so sister project templates, especially {{commonslite}}, would link directly to a search results page in the sister project? By default for the article title, optionally the display, or perhaps a third parameter. The default result now at best takes another click and may discourage clickthrough altogether. Implementation is a GP matter, but does anyone else see the advantage? Are there drawbacks? I see more benefit on Commons than on other projects. DCDuring TALK 16:59, 16 April 2009 (UTC)Reply

Commons is the only site that I would even consider that for, and all in all it's probably still a bad idea. We don't want to link to searches on e.g. Wikipedia because it's expensive. We already have enough shit ugly Wikipedia boxes plastered onto every damn page as if someone isn't going to have enough sense how to go to Wikipedia and type in that exact term, when what we really need are Wikipedia links to relevant articles and maybe an occasional box linking to the disambiguation page. DAVilla 02:15, 17 April 2009 (UTC)
Yea, to hell with them. DCDuring TALK 10:32, 17 April 2009 (UTC)Reply

Russian noun stuff

I got tired of using {{infl}} for Russian nouns, given that it's such a widely spoken language and all those parameters, including sc=, just got to be too much. So, I moved the old {{ru-noun}} to {{ru-decl-noun}} (all entries linking to the old name have been changed over, don't worry) so that we can now use {{ru-noun}} for inflection lines. The usage isn't too complicated; have a look at the talk page for more on how to use it. — [ ric ] opiaterein 17:01, 16 April 2009 (UTC)

I wonder whether there is any need for the animate/inanimate option. Animate nouns are people or animals, living, moving beings. Inanimate nouns are plants, rocks, elements, dust, ideas, feelings, dimensions, things that are not living. It’s pretty cut-and-dried. Also, since we show the plural in the declension table, the plural parameter in the heading seems like overkill. —Stephen 20:25, 17 April 2009 (UTC)Reply
It's certainly useful to learners of the language, who don't always remember that there is a distinction between the two. I have always found the "in/animate" notes in my Polish dictionaries to be very helpful. And I don't know about Russian, but it also has an impact in Polish on the construction of place names from masculine nouns. --EncycloPetey 20:56, 17 April 2009 (UTC)Reply
In Russian, animate/inanimate does not impact the grammar outside of the declension of the word itself. Inanimate nouns have accusative like the nominative, while animate nouns have it like the genitive. But we are giving the appropriate accusative for each word, so it is not important to know about animacy. Only in a few words, such as prick, which can be an inanimate bodily organ or an animate irritating person, the accusative can have both forms depending on animacy. Polish is a more difficult case, because there are different pronouns and such. In Russian, it’s only the accusative case, which we show for every noun. —Stephen 22:26, 17 April 2009 (UTC)
I too think info about animate/inanimateness is not interesting. What I'd like to see is an ability to add the wording indeclinable in the inflection line, when necessary. Also, does anyone else think a feature for showing feminine counterparts to words like гражданин, армянин is desirable? I do. --Vahagn Petrosyan 19:44, 18 April 2009 (UTC)Reply
I think showing feminine counterparts is useful. {{he-noun}} does it, e.g.—msh210 15:42, 21 April 2009 (UTC)Reply
Yes, indeclinable is a useful parameter, since it’s a significant noun class in Russian, and there should be a way to show feminine counterparts on gender-specific Russian nouns such as American, brother, and Mr. —Stephen 13:42, 22 April 2009 (UTC)Reply
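The accusative rule described in this thread can be written down as a one-line selection. This is a sketch under stated assumptions: `accusative` is a hypothetical helper, and the rule as written only holds for the noun classes where the accusative has no form of its own (for example masculine singulars and plurals); feminine singular nouns in -а have a distinct accusative and are not covered.

```python
def accusative(nominative, genitive, animate):
    """Pick the accusative form for nouns without a distinct accusative.

    Hypothetical helper: animate nouns reuse the genitive form,
    inanimate nouns reuse the nominative form.
    """
    return genitive if animate else nominative
```

For example, inanimate стол (genitive стола) keeps стол in the accusative, while animate брат (genitive брата) takes брата.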

Arabic Romanization

Hi, Arabic romanization guidelines have been published on Wiktionary:About_Arabic, after having been discussed on the corresponding talk page. They are based on the qalam system, which has been chosen because it is very easy to type on any Latin keyboard and uses transliterations that are already well known by most. Thanks for any comments and suggestions. --Beru7 20:25, 16 April 2009 (UTC)

I'm not crazy about the mixed use of lowercase and capital letters. I don't think ease of typing on any Latin-alphabet keyboard is really the best way to choose a transliteration system, but I don't do a lot of editing for Arabic. — [ ric ] opiaterein15:14, 17 April 2009 (UTC)Reply
I do not like the idea at all, especially since it includes the use of numerals, which is highly confusing. We're already using a transliteration system for Arabic which appears to be widely accepted. -- Prince Kassad 15:37, 17 April 2009 (UTC)Reply
Currently there is no system in use. There are several systems, mixed together and used inconsistently throughout the wiki. I don't think anything could be worse.
Now, many people do not like the numeral 3 for ع (the only numeral used), which is understandable. Remember that it is already widely used on the internet, however, and that the most commonly used transliteration for ع, the backquote `, is not good at all on computer screens, as it is not very distinguishable from ', which is used to transliterate hamzas. Also, in Arabic, ع is a real consonant with no specific rule. It should have a symbol that has the same size as other consonants, not just a quote. ʔ would be about the only alternative.
Concerning the use of lower-case and upper-case: it is a common practice for transliterating Arabic. Karin C. Ryding's "Reference Grammar of Standard Arabic" (Cambridge University Press) uses such a system, for example, as do many other recent grammar books written in English. Very serious books. If they use it, so can we. --Beru7 16:30, 17 April 2009 (UTC)
Personally I'm more concerned about the use of t-h and s-h, which is too easily misinterpreted IMHO (also, do other consonant+h sequences occur?). I'd be more in favor of adopting a set of existing diacritics for sh/th and the emphatic letters, but I don't edit the area, so I'll defer to those who do. Circeus 17:56, 17 April 2009 (UTC)Reply
The ALA-LC transliteration system uses ' where we use -; only t, s and k are concerned. But using th, kh and sh makes the transliterations much more readable for English speakers. Cases where - will have to be used are rare.
By the way, there is already a consensus among those of us who are actively editing the Arabic area. --Beru7 18:34, 17 April 2009 (UTC)
Mixed-case is a common way to transliterate Arabic and I believe it’s familiar to anyone who studies Arabic. In this new system, there is only one numeral being used, and it is likewise a common way to represent that letter. I don’t like the hyphenated s-h usage either, but it makes clear to anyone that the letters are not a digraph, and if we don’t use diacritics or other IPA symbols, I don’t think of any better solution. It’s true that I’ve used our earlier system consistently, but some recent users have been insisting on doing it their own way, so that it’s quickly become a hodge-podge. We need to publish a strict standard, so now is the time to decide. Another thing about Beru7’s new system is that it is easy to type without EditTools, and now that we are using User:Conrad.Irwin’s js editor, the EditTools are not available. —Stephen 20:20, 17 April 2009 (UTC)Reply
However, most readers of Wiktionary do not study Arabic, but are just casual users of Arabic wanting to look up some word. These will not understand the numeral (or the hyphenated letters), and will fare off better with the current pseudo-scientific system. -- Prince Kassad 20:59, 17 April 2009 (UTC)Reply
I'm not familiar with these, but it seems to me that Arabic is not English, and has non-English sounds, so an anglophone will need some basic familiarization with any system. Making it “easy to read” is just making it easy to ignore the differences. The important factors are compatibility and standardization, particularly with other lexicographical and linguistic references.
Do the five modifications come from the usage in Ryding and the other serious books, or are they another example of Wiktionary refusing to work with the real world because we are so much smarter? Michael Z. 2009-04-17 21:18 z
Kassad, Michael is right: Arabic has 28 consonants, and nothing can be done about that. So any transliteration will have to use unusual signs or usage of letters.
Concerning the second part of your post, Michael, Ryding does not address the "sh" problem to my knowledge. In fact, almost every book I own that presents transliterated text uses a different system. It's not because each author is trying to outsmart the other, it is because each has his own requirements. Just like we have ours. --Beru7 22:01, 17 April 2009 (UTC)
That's fair, but can't we be compatible with one source, rather than having to defend our own innovations? Michael Z. 2009-04-18 23:22 z
I would have liked to, believe me. First thing I tried is to look at all the existing systems. Most were designed for print, and the resulting confusion between hamza and 3ayn was unacceptable... Others are very systematic but do not take into account that they should be as easy as possible to read for most (IPA or the buckwalter which uses * for dhel, v for th). That's how I ended up with the modified qalam, which also happens to be compatible with many online transliteration tools (yamli, yoolki, eiktub). I have written my own as well for those who are interested. --Beru7 12:14, 19 April 2009 (UTC)Reply
Support. To me the hardest thing to accept was 3 for `ayn (ﻉ). I suggested using ` (backquote). However, 3 is used by online Arabic editors (Yamli, Google) and is the "standard" method for the letter input in chat, so it's well-known to Arabs. This is the only number Beru7 had in his proposal. Capital letters are used in Qalam, which is one of the standard Arabic transliteration methods. It also uses ` for ﻉ. I think this proposal is worth considering. Anatoli 07:58, 18 April 2009 (UTC)
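The digraph problem raised in this thread (sh for one letter versus s followed by h) can be illustrated with a short sketch. It is not the published guideline: the table `LETTERS` is a hypothetical excerpt containing only consonants mentioned above plus ه (h), and `romanize` implements only the hyphen rule for t, s and k.

```python
# Hypothetical excerpt of a consonant table; not the full proposal.
LETTERS = {
    "ش": "sh",
    "ث": "th",
    "خ": "kh",
    "س": "s",
    "ت": "t",
    "ك": "k",
    "ه": "h",
    "ع": "3",
}


def romanize(consonants):
    """Romanize a sequence of consonants, hyphenating ambiguous pairs.

    A hyphen is inserted when h follows t, s or k, so that the pair
    cannot be misread as the digraphs th, sh or kh.
    """
    out = []
    for letter in consonants:
        latin = LETTERS[letter]
        if latin == "h" and out and out[-1] in ("t", "s", "k"):
            out.append("-")
        out.append(latin)
    return "".join(out)
```

So ش alone comes out as sh, while س followed by ه comes out as s-h, which is the distinction the hyphen is meant to preserve.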

Would someone please have a look at Anatoli's recent edits and comment on the talk page? We can't seem to agree on the basic concepts. Michael Z. 2009-04-18 23:37 z

Plural of multi-word nouns

Currently, en-noun generates a brand new entry in multi-word nouns (e.g. book award - book awards). I've been wondering if the components should be separated in the plural form, e.g. book awards or at least a new option added to en-noun to display separate words if it makes sense? We are adding a lot of new plural entries when the components would be sufficient. Panda10 13:37, 19 April 2009 (UTC)Reply

Using Special:PrefixIndex to automatically list prefix derived terms

I just made an edit to self- which got me thinking about whether it would be a good idea to implement that as a standard: we could just transclude Special:PrefixIndex into the list of derived terms for prefixes, instead of building that list manually. Additional red-link entries could be added by hand (as I left self-belief). In fact we could create a template such as {{prefix derived terms}} (or whatever name you prefer) with {{Special:PrefixIndex/{{PAGENAME}}}} and then add it to pages. This would make those lists more complete and even self-updating! What do you think? --Waldir 15:42, 20 April 2009 (UTC)Reply

Oh, you're right. I wonder if subst:Special:PrefixIndex will work there... otherwise I'll copy and paste the list. --Waldir 15:57, 20 April 2009 (UTC)
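What a prefix listing would contribute to a derived-terms section can be sketched as a plain prefix filter over page titles. This only illustrates the idea and does not describe how Special:PrefixIndex is implemented; `prefix_index` and the sample titles are hypothetical.

```python
def prefix_index(titles, prefix):
    """Return the titles that start with the prefix, excluding the prefix page."""
    return sorted(t for t in titles if t.startswith(prefix) and t != prefix)
```

Note that such a list can only contain existing pages, which is why red-link entries such as self-belief would still have to be added by hand.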

on using the Wikisaurus

I've been working on compiling a list of names for birds (i.e. cuckoo, duck, wren, nightjar etc.), and decided that most likely Wikisaurus:bird was the best place to foist this on instead of making a duplicate Appendix:Names of birds.

However, I am noticing that there are MANY MANY more names than I had first thought. Using w:List of birds to avoid forgetting anything, I've just finished cuculiformes and am already at ~120 names, a number that is bound to rise fast (and that's not counting 50+ names moved to Wikisaurus:hummingbird). I'm not sure yet whether to move out more stuff (i.e. Wikisaurus:fowl and Wikisaurus:bird of prey, almost certainly Wikisaurus:songbird) or reduce the list to the well-known/most generic names and move the rest (e.g. brush-turkey, go-away-bird, tragopan) to an appendix.

It's worth noting here that there is some duplication between Wikisaurus and the entries themselves in detailing these semantic networks: hyponyms, hypernyms and meronyms have a tendency to be listed in the "related terms" section (i.e. marteau, shark). Circeus 19:47, 20 April 2009 (UTC)

To me, keeping some bird names in more specific entries such as Wikisaurus:hummingbird instead of in Wikisaurus:bird seems to be a useful way to prevent having a large, difficult to overview list of exotic bird names in Wikisaurus:bird.
A partial duplication of the mainspace content in Wikisaurus looks okay to me, presenting no problem. --Dan Polansky 10:11, 21 April 2009 (UTC)Reply
What you're describing here sounds more like categories to me. Do hyponyms make sense in Wikisaurus? I wouldn't go to a thesaurus to find different birds of prey. Equivalently, I wouldn't expect to see eagle and vulture in the same thesaurus entry. If some thesauruses do that, I'm not sure it's a strategy that can be completely unfurled. Wouldn't it create a gigantic mess, or am I being too pessimistic? Maybe it could be done, but it would mix a sort of picture dictionary hierarchy into something for which it wasn't intended. That can be changed of course, the intention bit, as long as you realize that you're deliberately mixing the two. DAVilla 07:35, 28 April 2009 (UTC)Reply
Well, look at Wikisaurus:abode, to which I ultimately added very little. Circeus 12:46, 28 April 2009 (UTC)Reply
Actually, this warrants a proper discussion. I'll keep the Wikisaurus:abode link to note that although I did add a few, it was not, in its original form, very different from the original "bird" entry. Compare also Wikisaurus:building or wikisaurus:creature.
Ultimately, my editing is basically going down to the logical conclusion with regard to inclusion of hyponyms: it is ultimately not fitting to list only a few (though I would not be entirely opposed to keeping only the broadest ones and referring to an appendix for the rest).
When you say "What you're describing here sounds more like categories to me.", you're harkening back to the classic list v. category debate. Many arguments from that page are appropriate here; two that immediately come to mind are that categories are much harder to establish and maintain (you have to edit dozens of articles), and they cannot contain redlinks. In our case, Wikisaurus allows better subdivisions (i.e. a sub-splitting for hummingbirds, passerine, birds of prey and fowl) that would probably be considered inappropriate in category space, whereas the categories would still have the entire, unstructured list of names (i.e. treating waterfowl and ostrich equally). In reverse, as is obvious here, I am kinda running into a problem with having too much information on the page, although the splitting away of some groups I believe ultimately makes for a better thesaurus. 23:36, 28 April 2009 (UTC)
When this happened with derived terms, we decided not to list derived terms of derived terms, e.g. unenthusiastic and enthusiastically but not unenthusiastically. Ultimately this did very little: all of the longest pages on Wiktionary have a laundry list of derived terms, and there's probably no way around that. In your case, however, it's the ideal solution. Don't list hyponyms of hyponyms. Is this what you ultimately settled upon anyway?
Yeah, I did mean categories in the Wiktionary sense, but you could also take it to be the regular English meaning, as in hierarchies of lists or even an outright picture dictionary. DAVilla 15:20, 30 April 2009 (UTC)Reply
On whether hyponyms belong to a thesaurus: I understand one of the purposes of a thesaurus to be to help people find words they can't recall. If one recalls a hypernym of the word one was looking for, a thesaurus that has hyponyms helps him while one that has only synonyms does not. Like, "How only was that rodent called?" In a similar way, if one recalls a coordinate term, more likely scenario, a thesaurus with hyponyms helps: coordinate term of, say, "apple" is a hyponym of one of the hypernyms of "apple". Like, "How only was that thing called, similar to apple but not apple? I see, pear, found in the hypernym of apple--Wikisaurus:fruit" or even "..Wikisaurus:edible fruit." Admittedly, categories do that job. However, there is no clearly defined relationship between a category and its members, unlike "hyponymy" in Wikisaurus; while Category:Fish contains fish, Category:Physics does not contain items of the class "physic". Also, there are further semantic relations such as meronymy that don't fit well into categories. As Circeus mentioned, contrasted to categories, Wikisaurus enables much finer hyponymy documentation, one that, in the category namespace, would probably be considered an overcategorization.
The guideline "don't include hyponyms of hyponyms" is a useful heuristic, but should IMHO be taken as that: a heuristic. Like, I have included hypernyms of hypernyms in Wikisaurus:fish by including "vertebrate" and "animal", as it did no harm. The point is to build helpful pages that help people recall words and browse the network of words and concepts by their semantic relations, without violating the semantic relations, but with fluid rules for the depth of unfolding of the relations, by which I mean which degree of A of A of A of ... we include under the relationship heads, where A stands for hyponym, meronymy, hypernym and holonym. --Dan Polansky 16:31, 30 April 2009 (UTC)Reply
An afterthought: hyponymy is included in many entries of Roget's 1911, thought not fully. Consider:
Animal” in Roget's Thesaurus, T. Y. Crowell Co., 1911.
which has "[major divisions of animals] mammal, bird, reptile, amphibian, fish, crustacean, shellfish, mollusk, worm, insect, arthropod, microbe" and further selected names of species. Of course, Roget did not have to set up an inclusion policy for other people to follow: he formed a possibly unarticulated policy in his mind while compiling his thesaurus. --Dan Polansky 16:54, 30 April 2009 (UTC)Reply

language names we don't use

I've started Wiktionary:Language names as a list of language names we don't use. Such a list is useful in my opinion, but: (a) I'm not sure whether it will be overly long; if not, then (b) I'm not sure it needs its own page (rather than being part of Language considerations or something); and if it is to be its own page, then (c) that's not a great title. Edits and opinions are hereby requested.—msh210 21:55, 20 April 2009 (UTC)

What we need is a master list of level 2 "language" names that are included in Wiktionary, which someone had suggested not long ago, and also its completion with a list of recognized dialect names for each language. The latter could probably not be well maintained in a centralized location. Rather, it should be information included on each About: page for a language. In the case of widely used languages like English and Spanish, the list of dialect names would probably be so long as to constitute a subpage.
The list you created would ultimately include a compilation of all that, far too long to be of any use. Removing all the dialectal information leaves something much more interesting: synonyms that refer to the same language, as well as broader or ambiguous names that languages are sometimes called. Terms like "New Latin", "Modern Hebrew", and "American English" are better left out, along with basically anything that's a subset of a level 2. They would simply grow the list to an unreasonable size. DAVilla 07:16, 28 April 2009 (UTC)
Good idea.—msh210 16:11, 5 May 2009 (UTC)Reply
If someone wants the current list of level 2 headers, it is at User:Conrad.Irwin/languages. AutoFormat also has a list that includes language codes, at User:AutoFormat/Languages. Conrad.Irwin 16:22, 5 May 2009 (UTC)

Implications of WT:RFV#remis

Hi all. Please note this RfV-sense discussion; it is for the sense “[t]he generally accepted (mis)spelling of the term ‘penis’ when input on a mobile device with T9 text prediction” of the letter-combination remis. Ya gotta love those deluded, bowdlerising T9 lexicographers, since remis is indeed what I got when I checked to see what I would get when I tried to key penis into my phone. This is a peculiar class of misspelling, and it seems to me beyond pointless to give them any kind of recognition herein. Apart from the absurdity that there can be such a thing as a “generally accepted misspelling”, if we accept remis, then we must also accept shiv, dial, dual, collock, captap, aunt, yank, and so on. Of course, they’re just the insults and vulgarisms; think of the proliferation of “text-os” we’d get from trying to key into our phones the various words herein marked {{rare}}, {{archaic}}, {{obsolete}}, and the like; my recent contributions would give us aimsiz, whichrneter, syno, synt, photo, fyi, and diabol. Such a list of accidental contranyms, anagrams, truncations, and garbled forms is surely not desirable. Shall we delete these with extreme præjudice?  (u):Raifʻhār (t):Doremítzwr﴿ 19:40, 21 April 2009 (UTC)

I see value in keeping them iff, for example, such a word has become {{Internet slang}} for "penis", in use on Usenet or even in print, with etymology from T9 but with use even where T9 is not employed. Aside from that, RFV should serve to weed them out, though I certainly wouldn't complain if someone speedily deleted such a word after checking for use.—msh210 20:18, 21 April 2009 (UTC)Reply
Hmm. I suppose. The first page of hits yielded by Google Groups Search throws up uses in the chess sense, as a sort of taxi, and as a misspelling of remix. I’d expect that remis would be far more common as a misspelling of remix, remiss, &c. than as a text-o for penis.  (u):Raifʻhār (t):Doremítzwr﴿ 20:33, 21 April 2009 (UTC)
I don't know if this is labeled properly, but I think we can cross that bridge when we get there. At the moment I have absolutely no expectation that the term will be cited in durably archived media. If anyone thinks this is worth having even when it cannot be cited, which might be a bit controversial, then I would suggest a heading similar to anagrams. DAVilla 06:32, 28 April 2009 (UTC)Reply
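For anyone who wants to check which words collide on a T9 keypad, here is a minimal sketch. It assumes the standard letter layout of a phone keypad (2 = abc, 3 = def, and so on); the function names `t9_digits` and `t9_collisions` are made up for this illustration, and real T9 software also ranks candidates by frequency, which this does not attempt.

```python
from collections import defaultdict

# Standard phone keypad layout (assumed; not taken from any T9 source code).
T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in T9_KEYS.items() for letter in letters
}


def t9_digits(word):
    """Return the key sequence for a word made only of the letters a-z."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def t9_collisions(words):
    """Group words by key sequence and keep only sequences shared by several words."""
    groups = defaultdict(list)
    for word in words:
        groups[t9_digits(word)].append(word)
    return {digits: group for digits, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    # Words mentioned in this thread; only the first two share a key sequence.
    print(t9_collisions(["penis", "remis", "remix", "remiss"]))
```

Any word list could be passed to `t9_collisions`, so the size of this class of "text-os" could in principle be estimated before deciding how to treat them.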

Usage notes and verbs

I know this sounds weird. I'm not a native English speaker and I often look up words to fully understand their meanings. The "Usage notes" sections are really helpful. I know Wiktionary is a dictionary rather than an English language textbook, but most people consult it not only to understand a word, but also to learn how to use the word in a practical context.

This is why usage notes, like those in who or which, are important. You can actually learn how to use the word in the right way. What I find confusing is the use of prepositions after a verb. This is not something made clear on Wiktionary. I don't know if it was your (as the community) intention to do so, but it would be really nice to have some explanation of that. We know that "talk with" and "talk to" are different. The difference might not be widely understood, or maybe there is a common form that is used instead of the grammatically correct form (such as "if I was" vs "if I were").

Other examples are the verbs to speak, which can be used with the prepositions "with", "of" or "to" (speak for is considered a phrasal verb, as I've just read), to connect ("with" or "to"), to think ("of" or "about"), etc.

Since it's a bit confusing and the examples don't really help, can we readers have a little more attention on this topic? Could the prepositions used with a verb be made a bit clearer? It would be really nice. I want to discuss this topic and find a solution with you, rather than complain or tell you what to focus on. I don't really have any ideas about it, and I think a section just for prepositional usage is too much. Thanks for your time! Exe --125.24.188.33 20:27, 22 April 2009 (UTC)

I agree that the usage notes section of verbs should indicate what prepositions the verb takes (or that it doesn't take any, like eat). It should also indicate, if there are two objects, what each one is (e.g., what "I gave my son a cat" means).—msh210 22:13, 22 April 2009 (UTC)Reply
Unfortunately, there are few generalizations about verb/preposition combinations that hold. For example, you said that "eat" takes no preposition, but there are instances where it does, with the KJV translation of Genesis being an oft-quoted example ("eat of the tree"; "thou shalt not eat of it"). Admittedly, such constructions are rare in modern English, and sound archaic, but they are still used. And there are adverbial prepositional phrases that can be used with eat, such as "eat in the kitchen", "eat on the deck", "eat off the floor", etc. --EncycloPetey 21:05, 26 April 2009 (UTC)
This is also in part what examples/quotes are for. Circeus 02:47, 23 April 2009 (UTC)Reply
I wonder if examples/quotes can feasibly do a thorough job of describing the various nuances of words. In my opinion, this need is rendered very difficult by one of the primary failings of our current format, the inability to add lots of information without bogging down the entry. What we really, really need is a robust, powerful sense template, where, in addition to a definition, all sorts of other sense specific information can be added, such as detailed usage notes, as well as lots of stuff we currently have unintuitively detached, like synonyms and translations. Ideally, all the sense specific stuff would be hidden from view initially, with a few floating tabs for opening it up, if the user wanted to dig further. This would allow all information for a specific sense to be clustered in one spot. It would also allow the initial page to be far more intelligible to the average user, while allowing us a lot more space for adding information for the in-depth user. Anyone with some JS skills feel like giving this a shot? -Atelaes λάλει ἐμοί 03:16, 23 April 2009 (UTC)Reply
Old news :p User:Conrad.Irwin/parser.js (also on WT:PREFS) has been doing this (albeit very crudely and buggily) since 2007. I can't promise to be able to improve it at any time in the near future. Conrad.Irwin 09:15, 23 April 2009 (UTC)
Ah, yes. Come to think of it, I think I've tried this before, and forgot about it. The layout needs some work, but that's exactly what we need to be doing. This is the future of Wiktionary. -Atelaes λάλει ἐμοί 09:26, 23 April 2009 (UTC)Reply
This information could, and should, be put under "=== Usage notes===". Either in prose, or if someone comes up with a clear format, using a template. Conrad.Irwin 09:15, 23 April 2009 (UTC)Reply
speak
of to with
It's true, it should be under "Usage notes", but that means you will have that section in several pages about verbs. A little template (like the example) and more attention to the examples given should be fine. Exe --125.24.217.199 11:38, 23 April 2009 (UTC)
Prepositions are difficult to pin down. I am always thinking about how best we could deal with them, but there is no one clear solution that I can see so far. Taking your example. Speak + prep. The prepositions in these examples have their own meanings, and they can transfer those meanings to other verbs equally well eg with "speak", "talk", "throw", "give", "walk" etc. + "to", "to" has the meaning of "directed towards". "with" means "together", and so is unlikely to collocate with "throw" and "give", but it does with "speak", "talk" and "walk". If we were to fill the prepositional possibilities of "speak", it would be a huge entry, with most of it being nothing more than a repetition of the meaning of each preposition. (to, with, against, at, over, about, around, into, etc.) I think prepositional collocations are only really useful when a verb consistently uses just one or two particular prepositions in the majority of written examples, and yet is not a phrasal verb. "Return" for example collocates with "to" and "from" in nearly all cases, but does not form a phrasal verb with either. -- ALGRIF talk 18:11, 23 April 2009 (UTC)Reply
  • I would like to present an example to see what would be the best way to deal with this problem. I'm looking at smash verb. "smash into" and "smash through" are not considered to be phrasal verbs, but they are very common collocates. So what do people think would be a clear and consistent way to put the ideas The car smashed into the wall. The police smashed into the room. and The builders smashed through the wall. into the entry at "smash"? -- ALGRIF talk 15:22, 24 April 2009 (UTC)Reply

I had a look at speak directly followed by a preposition. I found 83 possibilities in the Corpus of Contemporary American English (I didn't look at the British National Corpus). Of those, I looked carefully at the first 20 or so.

Preposition  Count   PMW    Object of the preposition
to           17218   44.72  listener, topic
of            9433   24.50  topic, nothing to speak of
with          7814   20.30  listener
for           4484   11.65  source
in            4431   11.51  tongues, terms of
about         3185    8.27  topic
on            1996    5.18  topic
up            1878    4.88  (intransitive)
at            1794    4.66  listener
through        985    2.56  (intransitive)
from           933    2.42  Not complement
by             483    1.25  Not complement
into           435    1.13  Not complement
like           297    0.77  Not complement
as             278    0.72  to + topic
before         243    0.63  Not complement
against        209    0.54  topic, source
without        190    0.49  Not complement
out            120    0.31  (intransitive)
during         101    0.26  Not complement
over            93    0.24  noise
after           90    0.23  topic
back            46    0.12  (intransitive)

This may have omissions. I distinguished between complements and adjuncts by trying to front them. So, for example, *Of the problems I'm having I spoke is not acceptable, so [of + topic] is a complement and therefore worth noting. But From the stage, I spoke of the problems is fine, so I take [from + object] to be an adjunct and not worth noting in the dictionary.
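For readers unfamiliar with the PMW column: it is the raw count scaled to occurrences per million words of corpus text. The sketch below shows the arithmetic. The corpus size of 385 million words is an assumption, inferred only from the ratio of counts to PMW values in the table (17218 / 44.72 is roughly 385); it is not a figure stated anywhere in this discussion, and `per_million` is a name made up for this illustration.

```python
# Assumed corpus size in words, inferred from the table's count/PMW ratio.
ASSUMED_CORPUS_SIZE = 385_000_000


def per_million(count, corpus_size=ASSUMED_CORPUS_SIZE):
    """Scale a raw hit count to occurrences per million words, rounded to 2 places."""
    return round(count / corpus_size * 1_000_000, 2)


if __name__ == "__main__":
    # A few raw counts copied from the table above.
    counts = {"to": 17218, "of": 9433, "with": 7814, "back": 46}
    for preposition, count in counts.items():
        print(f"speak {preposition}: {count} hits, {per_million(count)} per million words")
```

Normalising this way is what would let counts from corpora of different sizes (say, the British National Corpus) be compared with these.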

This kind of analysis is very important, but quite difficult.--Brett 14:59, 28 April 2009 (UTC)Reply

Brett: Superficially (all I'm capable of), it doesn't seem so much difficult as time-consuming, at least to get the "easy" 80%. What made it difficult? I wonder how our users could see the interest and value in it.
Algrif: Multiple usage examples and/or citations? We already have people complaining about our long entries, but also about insufficient usage examples. More or wordier definitions seems definitely the wrong way to go. Perhaps WT:CFI needs to be amended to explicitly allow for the inclusion of some verb-preposition collocations that would be too debatable under current criteria. Perhaps such entries would appear under their own "rel" bar at the verb. Our longer verb entries often seem rather hard to use and hiving off some material to separate entries might help keep the situation from getting worse. DCDuring TALK 16:33, 28 April 2009 (UTC)Reply
It's both difficult and time-consuming. It wouldn't take much to train people to come up with the lists of prepositions, but looking through the results and recognizing the various possibilities, and then thinking through carefully whether they are adjuncts or complements, takes a level of language awareness that, frankly speaking, most of the population simply doesn't have. But if we're serious about this, I see no alternative. I'm not aware of any available source for the data. But then there is the prohibition on original research.--Brett 00:09, 29 April 2009 (UTC)
This seems like a kind of attestation/WT:CFI "research" that could lead to vastly better entries for common verbs, some of which seem stuck in 1913. I would expect big improvements of all the senses, whether or not used with prepositions. If our methods are documented and objective and confined to validation/attestation, where is the problem? DCDuring TALK 02:13, 29 April 2009 (UTC)Reply
Brett, Wiktionary is not Wikipedia and does not have a NOR policy, only the Criteria for inclusion. Writing a wholly new, multilingual dictionary without original research would be impossible, as it would be almost impossible to write definitions of new terms (cf. splog or link spam). Circeus 04:57, 29 April 2009 (UTC)
I'm glad to learn that there's no NOR policy here. I'd had it thrown at me and not taken the time to check.
Of course DCDuring is right. Complements go beyond prepositions and objects. Just for English verbs, the possibilities include: to-infinitive (e.g., want to go), bare infinitive (e.g., make him go), various kinds of content clause (e.g., wondered what the problem was, said (that) it was difficult, it worries me that they're late, understand what a great chance it is), present participles (e.g., keep making progress), and locative complements (e.g., put it here).--Brett 13:05, 29 April 2009 (UTC)Reply
Longman's DCE is exemplary in this regard. I was just thinking that any focused attention by our better editors on those core entries will likely lead to lots of improvement for all aspects of these entries, especially the rarely improved definitions. This is the first programmatic effort I'm aware of that would get at these definitions. I don't know how often users use Wikt for these words, though. DCDuring TALK 14:10, 29 April 2009 (UTC)
My first thought is to suggest we try this for the 20 or 50 most common verbs in English, but then we run across the problem of coordinating prepositional usage with definitional senses. Brett's analysis is a good start, but it doesn't take into consideration (yet) which prepositions are used with which senses. That issue adds another very difficult, but firmly relevant and important, level to be done in an analysis like this. --EncycloPetey 20:48, 3 May 2009 (UTC)Reply

The Centre for Corpus Research at the University of Birmingham has made available online the book Grammar Patterns 1: Verbs, originally published in 1996 by Collins Cobuild and now out of print. It can be found here--Brett 14:07, 15 May 2009 (UTC)Reply

Sign language entry links

Currently, most links to our sign language entries show just the name of the target page:

* [[American Sign Language]]: [[OpenB@Chin-PalmBack-OpenB@CenterChesthigh-PalmUp OpenB@Palm-PalmUp-OpenB@CenterChesthigh-PalmUp]]

I just discovered that our Mediawiki instance allows image links. Using such syntax, we can link to sign language entries from images.

* [[American Sign Language]]: [[Image:ASL OpenB@Palm-PalmUp-OpenB@CenterChesthigh-PalmUp.jpg|35px|link=OpenB@Chin-PalmBack-OpenB@CenterChesthigh-PalmUp OpenB@Palm-PalmUp-OpenB@CenterChesthigh-PalmUp]]

I think that's a more reader-friendly format, but I'm not sure whether images in translation tables will make for a layout that's too jagged or otherwise jarring. Comments? —Rod (A. Smith) 16:59, 23 April 2009 (UTC)Reply

I'm for including pictures. (On a computer that won't load the BP, I can't see the layout of the format Rod proposed. But I like the format at two#Translations.)—msh210 17:09, 23 April 2009 (UTC)Reply
{{t-image}} allows images for languages that aren't yet in unicode already, so there'd be no problem with including images. It would rely on the images already being present - though I suppose you could always fall back to the previous system if necessary. Conrad.Irwin 17:16, 23 April 2009 (UTC)Reply
Using {{t-image}} results in the following:
* [[American Sign Language]]: {{t-image|ase|ASL OpenB@Palm-PalmUp-OpenB@CenterChesthigh-PalmUp.jpg|35px|OpenB@Chin-PalmBack-OpenB@CenterChesthigh-PalmUp OpenB@Palm-PalmUp-OpenB@CenterChesthigh-PalmUp}}
The transliteration appears after the image. Is that desirable? If so, I think the transliteration should also link to the target entry. If the community agrees, I'll edit {{t-image}} to do so. Also, I'm not sure how big to make the image. 35px (as above) seems about the smallest readable size. Is the large vertical space surrounding the image OK? —Rod (A. Smith) 17:53, 23 April 2009 (UTC)Reply

capitalization of proverbs' pagetitles

The CFI say, when discussing pagetitles:

===Proverbs===
Proverbs that are whole sentences should begin with a capital letter. For example: You can't judge a book by its cover.

The problem with this is that not only that particular proverb but actually all English proverbs are lowercase-initial-letter. Clearly, current practice does not match the CFI. Two people recently tried to edit the CFI because of this inconsistency (diff, diff) and were reverted for different reasons (diff, diff). But something should be done. I propose just that the above text be removed from the CFI. Any opposition?—msh210 17:49, 23 April 2009 (UTC)Reply

I never liked that rule myself, so I support your proposal. We already have plenty of sentence-like utterances that aren't proverbs and thus aren't capitalised (e.g. what's cooking); I don't see the point of a distinction. Equinox 22:03, 23 April 2009 (UTC)Reply
Can the guideline suggest that they have a small initial then? No point in having random initial caps in these entries, or near-duplicates differing by sentence case. Michael Z. 2009-04-23 22:37 z
No objection from me.—msh210 23:33, 23 April 2009 (UTC)Reply
I support this; it makes sense to me to align the guideline with the current practice. --Dan Polansky 07:24, 24 April 2009 (UTC)Reply
I agree too. Even when proverbs are sentences by themselves, they may be used inside sentences (e.g. after because). Lmaltier 11:14, 24 April 2009 (UTC)
You have my agreement, too. I never liked, nor understood the rationale of this CFI. It can lead to double entries, as you say. -- ALGRIF talk 12:40, 24 April 2009 (UTC)Reply
Support for changing CFI to say "lowercase letter" instead of "capital letter". No point in contradicting ourselves. --Jackofclubs 06:01, 25 April 2009 (UTC)Reply
Support as well, as these aren't going to be followed by a period, and can come as fragments of a larger sentence (e.g. "Even though you can't judge a book by its cover, the gaudy cover of this book made me apprehensive.") bd2412 T 06:07, 25 April 2009 (UTC)Reply
Support, although I always feel better if we hash out exactly what the change will say before making the change. This is certainly one case where we haven't explicitly articulated our norms properly. --EncycloPetey 20:59, 26 April 2009 (UTC)
How about Proverbs should begin with a lowercase letter. For example: you can't judge a book by its cover.? --Jackofclubs 08:48, 27 April 2009 (UTC)
What about God helps those who help themselves? I would prefer : Usual Wiktionary capitalization rules also apply to proverbs (proverbs can come as fragments of a larger sentence). For example: you can't judge a book by its cover. Lmaltier 15:30, 27 April 2009 (UTC)Reply

Okay, lemme restart this conversation, then: The proposal is changing

===Proverbs===
Proverbs that are whole sentences should begin with a capital letter. For example: [[You can't judge a book by its cover]].

in the CFI to

===Proverbs===
Even proverbs that are whole sentences should begin with a lowercase letter. For example: [[you can't judge a book by its cover]]. (Exception: A proverb like [[God helps those who help themselves]] starts with a proper noun, so is capitalized.)

. Any objection?—msh210 15:49, 27 April 2009 (UTC)Reply

How about “Don't capitalize proverbs as sentences”? There's no need to mention every reason that you would capitalize a word as exceptions. Michael Z. 2009-04-27 18:34 z

What about:

===Proverbs===
Proverb entries should begin with a lowercase letter, regardless of whether they are whole sentences. An example: you can't judge a book by its cover.

A benefit over the "don't" proposal is that it heeds Strunk's "specify positively". The proposal only states the rule, not the reasoning behind it. It also explicitly discards one item that someone could see as speaking against the rule, namely that proverbs usually are whole sentences, making it clear that the rule was created with awareness of that item. The other proposals are okay for me, though. The semantics is clear from all the proposals, even if style varies. --Dan Polansky 19:08, 27 April 2009 (UTC)

I agree that these all work, but I don't mind shaking out the best wording.
Would Strunk apply positiveness this way to a prohibition? The problem is that not all proverbs will begin with lowercase letters (e.g. “God...,” above). We're not prohibiting editors from capitalizing the first word of a proverb – we're only advising them not to capitalize it as the start of a sentence. Michael Z. 2009-04-27 20:09 z
In general, I'd think Strunk would go for a positive expression even with a prohibition, by choosing "avoid" in preference to "don't". But to the specific point: I don't know about Strunk, but, to me, "write in lowercase" seems to do just as well as "don't write in uppercase". However, I see the point that has led you to choose "Don't capitalize proverbs as sentences": your statement is more accurate, as it caters for such cases as "God...". I can fix my proposal by adding to the first sentence ", unless the first word of the proverb is capitalized on its own", or something to that effect. That makes my proposed statement much less charming, though. In any case, an example should be added of Rome wasn't built in a day or another proverb that should start with a capital letter. Hence my second take:
===Proverbs===
Proverb entries should begin with a lowercase letter, regardless of whether they are whole sentences, unless the first word of the proverb is capitalized on its own. Examples: you can't judge a book by its cover, Rome wasn't built in a day.
--Dan Polansky 21:02, 27 April 2009 (UTC)Reply
Strunkist: “avoid capitalizing proverbs as sentences.” Michael Z. 2009-04-27 21:29 z
===Proverbs===
Proverb entries should begin with a lowercase letter, regardless of whether they are whole sentences, unless the first word of the proverb is capitalized on its own. Examples: you can't judge a book by its cover, Rome wasn't built in a day.

(Sigh.) I was hoping this could be uncontroversial, to avoid voting. Can we agree on

===Proverbs===
Proverb entries should begin with a lowercase letter, regardless of whether they are whole sentences, unless the first word of the proverb is capitalized on its own. Examples: [[you can't judge a book by its cover]], [[Rome wasn't built in a day]].

then?—msh210 20:31, 28 April 2009 (UTC)Reply

My copyedit:

A proverb entry's title begins with a lowercase letter, whether it is a full sentence or not. The first word may still be capitalized on its own:

 Michael Z. 2009-04-29 16:45 z

Msh210, I support this proposal, of course ;). --Dan Polansky 10:06, 30 April 2009 (UTC)Reply
This looks good to me. Equinox 21:42, 1 May 2009 (UTC)Reply
Six of one, half a dozen of the other. DAVilla 20:12, 3 May 2009 (UTC)Reply

Thank you, folks. Done.—msh210 16:10, 5 May 2009 (UTC)Reply

Logged-in users editing their own user pages

Since this is always permissible, could we automate the removal of the "red exclamation mark" designating unpatrolledness? Equinox 21:55, 25 April 2009 (UTC)Reply

I'd say no, 'cause as an admin (primarily at no.wiki) I've often seen users abusing their user pages, adding offensive content or blatant vandalism. We have a policy at Wiktionary:Usernames and user pages, and we should make sure it is not violated. --Eivind (t) 20:48, 26 April 2009 (UTC)Reply
We already have this. Just enable the WT:PREF "Patrol in enhanced mode" or w/e. If there's someone with that pref logged in, it will be patrolled by javascript. Conrad.Irwin 21:07, 26 April 2009 (UTC)
Then we should disable that. We should not give carte blanche to all users to make patrolled edits to their own user pages, for the reasons EivindJ has given. The edits should be patrolled if the user in question isn't "whitelisted". --EncycloPetey 21:12, 26 April 2009 (UTC)
Concur with EJ and EP. Given the fact that user pages will be rarely examined otherwise if the user makes no other edits, it is imperative that user pages not be given a pass. Indeed, it might be worthwhile to disable autopatrol of sub pages in userspace even for whitelisted users if that be technically doable. — Carolina wren discussió 04:34, 27 April 2009 (UTC)Reply
Given that we already have a massive (a couple of hundred every day) backlog of unpatrolled edits, I'd rather we actually did some more guessing of edits that are "likely" to be constructive and, if not auto-patrol them too, give them a different colour exclamation mark so that we can patrol more easily the edits that need looking at. Yes people might be making a mess on their userpage, but (in my experience anyway) the most common thing is a link to Wikipedia, followed perhaps by a random spiel about oneself, then you get the people who create adverts (which SemperBlotto then goes through and deletes in batches). Yes we might miss a few bad edits, but we already miss a few anyway. I did recently, as an experiment, try to keep a day with no unpatrolled edits. I was able to do this (with others still patrolling as much as usual, I assume), but it took maybe two or three hours of extra time throughout the day. Conrad.Irwin 09:24, 27 April 2009 (UTC)Reply
As Conrad notes, SB has a script that marks for deletion userpages of users whose only edit is their userpage (or something like that, anyway). So I see no harm in patrolling these. Moreover, doing so lessens the number of unpatrolled edits that patrollers need to wade through. Is there anyone who actually patrols (by hand, not just by JavaScript) who thinks these should not be marked patrolled?—msh210 15:44, 27 April 2009 (UTC)Reply
Yes, I do. --EncycloPetey 03:05, 28 April 2009 (UTC)Reply

Spanish

Two issues I have off the top of my head. First one is simple:

Why? I've come up with some pretty good reasons in the past, but I've forgotten a lot of them. The main reason is that Spanish nouns have, at the very most, 3 non-lemma forms - and that's only because some nouns that describe humans have masculine and feminine forms. To compare that to languages with heavy inflection is silly. Look at Armenian Template:Armn, Hungarian vizsla, Lithuanian brolis, Russian, Finnish... These heavily-inflected languages obviously need the broad 'x noun forms' categories, while Spanish, which really only has "nominative plural" forms, definitely doesn't.

Spanish adjective forms makes more sense, because Spanish adjectives pretty consistently have specific m/f forms, which the majority of Spanish nouns do not. Even 'Category:Spanish plurals' was better for Spanish nouns than this noun form business. — [ R I C ] opiaterein15:38, 27 April 2009 (UTC)Reply

What's a femmie? --Jackofclubs 16:09, 27 April 2009 (UTC)Reply
Opposite of a butchie. — [ R I C ] opiaterein16:28, 27 April 2009 (UTC)Reply

Is the category for Spanish terms of Spain or of Europe? If the adjective is too awkward, then use the attributive noun: Category:Spain Spanish, but don't say something else altogether. Michael Z. 2009-04-27 18:31 z

Yeah, let's name all of our multi-country language categories like that :) French French/France French, England English/English English, Dutch Dutch. There's a certain level of bias that can be read into that, the most common being "real" French, or "real" English, or whatever. The category is for Spanish terms used in Spain, which is European Spanish. You could call it Castilian Spanish, which not everyone would recognize. That'd be like the Portuguese category being called Lusitanian Portuguese. — [ R I C ] opiaterein 21:52, 27 April 2009 (UTC)

A plea to our sysops

EVERY time that I log on to Wiktionary, I go to Recent Changes and patrol for vandalism and stupidity back from the time that I last logged out. If I am away for an extended period (two days - busted cable modem) this can take a very long time. It would help me greatly if other sysops did the same thing (rather than just patrolling while you are logged on). Cheers. SemperBlotto 16:00, 27 April 2009 (UTC)Reply

Even when I go through RC, I mark fewer edits patrolled than some others do, as I am wary of marking an edit patrolled when I have absolutely no idea whether it was made in good faith. The prime example, and a very common one, is an edit that adds a translation into a language I do not know at all, or that adds several such translations into various languages for a single word. Another example — less common, but still common enough — is an edit that adds a ==language== section (or a new page) in a language that I do not know. If our policy is that such edits can be patrolled, I will be glad to do so. Is it?—msh210 17:16, 27 April 2009 (UTC)Reply
I've tended to mark as patrolled things that are formatted perfectly (though this is now harder for translations with creation.js) on the assumption that if they have taken the time to learn the format, they must believe what they are adding is right (even if it isn't - but hey, even the autopatrollers will make mistakes, they get corrected eventually). [Assuming of course that there are no highly devious anti-Wiktionary agents who are stealthily creating a huge practical joke]. For a lot of the Romance language translations, I often find myself "guessing" whether they are right or not - I tend to lend higher credence to words that look similar (though I'm often slightly wary that someone might be "guessing" as well). I'd also be more inclined to patrol a group of similar, good looking contributions by the same user (again on the assumption that if they are spending some time on Wiktionary, they must be under the impression they are doing some good). If in strong doubt, I occasionally ask one of our native speakers to verify, or will just not patrol it. 131.111.220.6 23:07, 27 April 2009 (UTC)Reply
In the past, there was an eight-hour window of time I tried to patrol through for every day. If I didn't log in one day, then I'd do that eight-hour period for all previous days the next time I did log on. Unfortunately, my current job takes away more of my Wiktionary time than my previous job did, so I can no longer do this most days. However, the idea might work if a few sysops selected particular time slots to patrol through (not necessarily at the same time you habitually log in), or thoroughly patrolled one hour's worth of edits each day (though not necessarily the same UTC hour each time). --EncycloPetey 03:50, 28 April 2009 (UTC)Reply
To be brutally honest, I don't usually feel I have time for this, especially when there are always so many new words to be added and when so many unpatrolled edits are the non-English ones. (Even with e.g. Spanish, unless I immediately recognise a cognate to French, I can't think about approving it.) Particularly egregious vandalisms are usually picked off at once because it's rare for no admin to be logged on. I do, though, appreciate your efforts, SB, and I will continue to zap as much vandalism as I can while I happen to spot it. Equinox 21:44, 1 May 2009 (UTC)Reply
Patrolling is not approving of. You can mark entries as patrolled even if you aren't sure they're correct, as long as there's a good chance they could be correct. For new translations, having the right script is usually enough for me. DAVilla 20:08, 3 May 2009 (UTC)Reply
Patrolling is about checking that the edit (1) isn't vandalism, (2) isn't spam or propaganda, and (3) is formatted correctly. Patrolling isn't about the veracity of the information, although many patrollers do check that simultaneously when they notice a problem. --EncycloPetey 20:43, 3 May 2009 (UTC)
Ah, but in the cases I describe above (that is, where the edits are in a language I don't know), I have no way of recognizing vandalism. I've brought up this question twice before IIRC, and each time someone says "you can patrol those" and someone else says "I don't patrol those" and someone else says "patrolling means checking that it's not vandalism" (which doesn't help me really).—msh210 19:15, 4 May 2009 (UTC)Reply
Equinox (and others), I was told when I was nominated for sysop that a sysop has no required jobs: the only requirement is that he keep active (at least one edit a year). Obviously, patrolling is necessary (in the sense that someone's got to do it) and therefore a Good Thing To Do™, but.—msh210 19:15, 4 May 2009 (UTC)Reply
Sadly, I can't really use my browser well enough at this point in time to go back and patrol when I'm not on, but I often do leave VandalFighter running through days at a time and I will patrol through that (also, I tend to catch a lot more vandalism than I do to just patrol other edits). I'm also often gone for weeks at a time, which can make the idea of going through RC a bit daunting. But I'll try.  :) --Neskaya kanetsv 22:48, 24 May 2009 (UTC)Reply

Category:Filmology, Template:filmology

filmology is something else. If there's no objection, I'll rename these to Category:Film or Category:Cinema, and Template:film. Michael Z. 2009-04-27 18:00 z

Or Category:Cinematography (used in OED), or Category:Filmmaking. Michael Z. 2009-04-27 22:40 z
And Random House would use Category:Movies. But I'll keep it simple, and choose film. Last chance to complain. Michael Z. 2009-04-28 02:51 z

Huh? What happened to all the previous discussion on this issue? --EncycloPetey 03:45, 28 April 2009 (UTC)Reply

I've never seen it. The category has a tag placed August 2008 pointing to RfC, but there's nothing on RfC. Michael Z. 2009-04-28 14:12 z
Found it: Wiktionary:Tea_room/Archive_2007/November#Category:Filmology. Inconclusive. Points to add:
  • cinematography is wider in meaning than mentioned there. It is the artistic and technical activity of making and reproducing films, per OED, M–W,[4] AHD, and RH,[5] a close synonym for filmmaking. OED uses the subject label Cinematogr. in entries like (film) trailer. (It appears that our cinematography definition is not quite right, because it is derived verbatim from the related but subtly distinct meaning of cinematographer)
  • filmology means something altogether different.
  • our “Category:Filmology” could include film distribution (arguably within the sphere of filmmaking), criticism (arguably not), etc.
I still think the general field film might be best, cinema, movies, cinematography, or filmmaking would be fine. Michael Z. 2009-04-28 16:39 z
I placed the RFC tag on Category:Filmology in August 2008 but, by mistake, did not also add an RFC entry to Wiktionary:Requests for cleanup. I hope to know better now. --Dan Polansky 07:47, 2 May 2009 (UTC)

Found some more at Wiktionary:Beer parlour archive/2009/January#Category:Filmology, also inconclusive. Perhaps cinema is preferable. Anyway, if there is any more hiding out there, or anyone has some current input, please speak now.

See also Wiktionary:Requests_for_deletion/Others#Template:Filmology Michael Z. 2009-04-30 17:42 z

  • I definitely prefer ‘film’ to ‘cinema’. I work in television, most of these terms are ones I use every day, and while we often talk about ‘film’ or ‘filmmaking’, ‘cinema’ is obviously inappropriate. Ƿidsiþ 18:30, 30 April 2009 (UTC)Reply
  • Since many of the "filming" terms apply to videography and television as well as to cinema, should we split out a sub-category for videography/cinematography? --EncycloPetey 20:41, 3 May 2009 (UTC)
    There is Category:Television. I believe film and video production have more in common than ever, technically. Can cinematography or film be used broadly to include video production? Or we could have a combined Category:Film and video. Michael Z. 2009-05-04 03:56 z

Organization of Index:American Sign Language

User:Positivesigner is working on reorganizing Index:American Sign Language. For anyone interested, please give feedback at Index talk:American Sign Language#Organization. —Rod (A. Smith) 21:25, 27 April 2009 (UTC)Reply

Word of the Day - help needed

It looks as though Word of the Day needs volunteer help. Because of personal problems with stress on Commons, and the inaction of that community, I will no longer be uploading audio or other media to Commons. This means that Word of the Day needs someone willing and able to record / upload .ogg file recordings for WOtD. Otherwise, we have two options: (1) no longer include an audio link, (2) allow red links on the Main Page for the audio. --EncycloPetey 03:03, 28 April 2009 (UTC)Reply

If nobody volunteers, then (3) upload locally would also be a (temporary) option. -- Prince Kassad 20:23, 28 April 2009 (UTC)Reply
I've recorded the five missing files for this month but Commons capitalizes the first letter, e.g. File:En-us-resile.ogg, and I'm not sure how to get around that. Someone please revise Help:Audio pronunciations. DAVilla 19:01, 3 May 2009 (UTC)Reply
The capitalization of the first letter of Commons file names has no effect on linking from here. Like Wikipedia, they are case-insensitive for the initial letter. --EncycloPetey 20:38, 3 May 2009 (UTC)Reply
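The rule described above can be illustrated with a minimal sketch. It assumes only what the comment states, namely that the initial letter of a file title is case-insensitive while the rest of the title is not. The function names are hypothetical and are not part of MediaWiki.

```python
def normalize_file_title(title: str) -> str:
    """Uppercase only the first character; the rest of the title is left as-is."""
    return title[:1].upper() + title[1:]


def same_file(a: str, b: str) -> bool:
    """Treat two titles as the same file if they differ at most in the
    case of the first letter (assumption taken from the comment above)."""
    return normalize_file_title(a) == normalize_file_title(b)


# Linking with a lowercase initial still reaches the capitalized title.
print(same_file("en-us-resile.ogg", "En-us-resile.ogg"))
# A difference later in the name is still a different title.
print(same_file("en-us-resile.ogg", "En-US-resile.ogg"))
```

Under this assumption a link written as en-us-resile.ogg would resolve to the uploaded En-us-resile.ogg without any renaming.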

Protected titles

(this page has gotten way too big to load again; I'm adding this section by using the "section=new" URL, so I won't be able to reply here or edit this section again ...)

I re-wrote Wiktionary:Protected titles to reflect the "new" (year old ;-) built in mechanism in the MW s/w. It is underused, and I'd suggest we use it much more.

Please look, and please use the talk page there, rather than here ... Robert Ullmann 15:25, 28 April 2009 (UTC)Reply

Well, I just archived February and March, bringing the Parlor under 200 kb... there's not that much from the current month that's more than 2 weeks old... Hope it helps! (BTW, it'd be nice if some bot herder took it upon themselves to update WT:BP/headings page.) 75.214.50.157 (really, User:JesseW/not logged in) 08:07, 30 April 2009 (UTC)Reply
Connel, who used to do it, has been afk for a while now. Would it not be better to move to using subpages for topics, and transclusion to allow everyone to concentrate on topics they are interested in without concerns about the length of the main page? (Much like WT:ES save that the pages transcluded would be at Wiktionary:Beer Parlour/) Conrad.Irwin 09:28, 30 April 2009 (UTC)
Many of those titles are not set to expire, and some are set to expire in the distant future, including one ten-letter word in 2018! I'm very much glad to see something said about that. DAVilla 14:47, 30 April 2009 (UTC)Reply

Request for extra input on the Main Page redesign

Discussion for the proposed new design has stalled out. I'm relatively happy with the layout as proposed (though there are tweaks and kinks to work out before it is formally submitted for replacing the current one), so if you want to propose things, come and say so! Circeus 23:50, 28 April 2009 (UTC)Reply

Help:Interacting with humans

Per Visviva's suggestion above, I've had a go at writing a document to discuss how people on Wiktionary are likely to behave. I don't think the current pages at Wiktionary:Assume good faith and Wiktionary:No personal attacks are particularly useful in describing how things actually work. Conrad.Irwin 00:24, 29 April 2009 (UTC)Reply

Looks pretty good. Michael Z. 2009-04-29 00:56 z
Looks good. How would a user get to it? DCDuring TALK 02:01, 29 April 2009 (UTC)Reply
I've gone on a spamming spree, Special:Whatlinkshere/Help:Interacting with humans. I wasn't sure if it should be in {{welcome}} or not - but the last thing that needs is yet another link. Conrad.Irwin 09:45, 29 April 2009 (UTC)Reply
Those all seem good. What screens do people see immediately at the moment they are most likely to be upset (deletion of entry, being blocked, etc)? Perhaps some would read something there. DCDuring TALK 10:36, 29 April 2009 (UTC)Reply
MediaWiki:Blockiptext? —RuakhTALK 20:53, 30 April 2009 (UTC)Reply
I've noticed a number of grammar and punctuation errors. I'm signing off now, but will correct them later if someone else doesn't catch them first. There also ought to be something explaining to newcomers about where most discussion on Wiktionary takes place. Wikipedia users often believe that user pages and entry talk pages are the appropriate place for most discussions, as that is where they tend to happen on Wikipedia. On Wiktionary, most discussion happens in one of the five discussion fora. --EncycloPetey 12:10, 30 April 2009 (UTC)
Couldn't this be Help:Interacting with others or Help:Interacting with other/fellow editors? DAVilla 15:03, 30 April 2009 (UTC)Reply
Wherever, this was the title that was suggested - I do quite like the subtle reminder :D. Conrad.Irwin 19:15, 30 April 2009 (UTC)Reply
Maybe you could have some pointed humor in a subtitle. The page title and/or pipe should not drive people away!!! DCDuring TALK 19:48, 30 April 2009 (UTC)Reply

Prepositions and verbs (redux)

What would be a good way to add "to" to the entry for tell, used in Irish English to mean tell? In AmE, it is ungrammatical (in the dialects I'm familiar with.) Wakablogger 00:16, 30 April 2009 (UTC)Reply

If they're using to in place of tell, wouldn't that be under the entry for to then? Or maybe I'm not understanding. DAVilla 14:59, 30 April 2009 (UTC)Reply
Yeah can you explain a bit more please. Mglovesfun 15:09, 30 April 2009 (UTC)Reply
I think Wakablogger's talking about a verbal construction tell to. Circeus 05:17, 7 May 2009 (UTC)Reply
Yes, the song "Seven Drunken Nights," for example, has the repeated phrase "Will you kindly tell to me." [[6]] What is the best way to include this along with all the other prepositions "tell" takes? Wakablogger 20:50, 9 May 2009 (UTC)Reply

Languages without written forms

This message is inspired by a debate about French sign language on fr.wikt. The sections just linked to an external site with no other formatting, so we deleted them. My question is: does it say anywhere in the Wiktionary policies that we only accept languages with a written form? If not, it either should, or we need to come up with something that allows sign languages to be represented here. Which I'm against just because it's a massive challenge and I can't see how it would work. Mglovesfun 15:09, 30 April 2009 (UTC)

I've only just seen the message above. I should say that I'm not against sign languages, it's just a question of how you 'write down' a sign language. What you've come up with looks interesting. For instance on the French Wikt, they had stuff like fr:bleu as a French sign language word - surely if you write it down, it's written French and therefore not 'sign' language. Anyone care to disagree? Mglovesfun 16:42, 30 April 2009 (UTC)Reply
I believe the system we use to transcribe sign language (i.e. write it down in the manner it is signed, rather than use the English equivalent) is home-grown (as existing systems were not (yet?) usable for entry titles), and that the people who are organising the system on en.wikt are uploading photos to make it easier to learn. I assume that the system they have built could quite easily be modified for other sign-languages, though it does have a possible problem in that it uses fragments of English to describe the signs, that might need to be changed if it were to be adopted by another Wiktionary, I would certainly encourage the inclusion of such entries. Some translations for the word water are in scripts that Unicode does not yet support, the way we solve this problem is to use an image for the translation, and link it to a page with the transliteration as the title. To exclude languages just because no-one writes them seems a bit cruel, and a system of transliteration can presumably always be found/created. Conrad.Irwin 19:12, 30 April 2009 (UTC)Reply
I'm not really against the principle, it just needs to be either done well, or not at all. I'm more in favour of putting external links or "YouTube" style clips: for example, instead of an audio file ("Vancouver (Canada): listen to water") you could have "American sign language - water". Reactions? Mglovesfun 20:45, 30 April 2009 (UTC)
Sure, if we can find volunteers to record them, it would be really neat. Conrad.Irwin 20:53, 30 April 2009 (UTC)Reply

Our current system is the result of extensive consideration of all possibilities that anyone could think of. For some (not all) of the discussion involved, see the archive page Wiktionary talk:About sign languages/Archive 1. For the current system, see WT:ASL. To see it in action, see the entries listed in the subcategories of Category:American Sign Language. And to propose specific emendations to it, you can do so here, but a better place would probably be Wiktionary talk:About sign languages.—msh210 21:37, 30 April 2009 (UTC)Reply

May 2009

Bot for adding audio

Can someone review the latest edits of my bot adding pronunciation files before I run it normally on more entries? --Derbeth talk 15:06, 1 May 2009 (UTC)

I've had a look. The edits seem fine (I didn't listen to any of the files, just have to assume they are correct). I made a couple of changes: Absolute and here where an {{also}} was missing, but I'm sure that doesn't upset the bot. I'd say run it again. --Jackofclubs 12:32, 3 May 2009 (UTC)
I didn't spot any problems. --EncycloPetey 14:02, 3 May 2009 (UTC)Reply
Actually, there might be a couple of things to review: homographs and US/UK files. I compiled a mini list of recent bot additions to homographs at User:Jackofclubs/nothing, which should be moved to a relevant section. I don't suggest stopping the bot for it, just for a human editor to put them in the right place. I could do it when I get time, but my sound card isn't working. --Jackofclubs 15:08, 4 May 2009 (UTC)

The bot's work is over. Page User:DerbethBot/May 2009 provides statistics on how many files were added plus a list of files that could not be added. They can be inserted manually. --Derbeth talk 22:37, 5 May 2009 (UTC)

Category names

There's a bit of a problem with the names of categories on this Wikt. For example, for Category:Meats, should I add [[fr:Category:Meats in English]] or [[fr:Category:Meats]] (translated into English here, clearly)? The same for the reciprocal links: should [[fr:Category:Meats in English]] just link to Category:Meats? If possible, getting a bot to change the links to Category:en:Meats would solve this problem, or is it just too much effort for such a trivial problem? Mglovesfun 09:40, 2 May 2009 (UTC)

We deliberately have not used :en: for topical categories because this is the English Wiktionary. The word "English" only appears in lexical categories (about the nature of the word), never the names of topical ones (which treat the meaning of the word). This is not a problem; it is a conscious choice we made. --EncycloPetey 14:04, 3 May 2009 (UTC)Reply
Or rather, it's a choice that was made which we consciously didn't unmake. DAVilla 18:58, 3 May 2009 (UTC)Reply

Sanskrit

Can someone who knows Sanskrit check out the contributions of user:71.138.140.129, mostly reverted by user:67.116.243.171 (and then restored by me, because that reverting user didn't respond to DAVilla's query on his talk page and I took him/her to be a vandal)? Equinox 00:32, 4 May 2009 (UTC)Reply

As well, it could be that the first user didn't know what he was doing. Or that it's the same person, who is realizing his mistake. That's why I asked. DAVilla 01:11, 4 May 2009 (UTC)Reply

Ramifications of assisted editing

Assisted editing avoids a lot of the need for specific knowledge, whether of wikitext or of our fairly rigid formatting. This is definitely the direction of future growth. What I'm thinking is that with this functionality expanded to other areas, it might be possible to restrict contributions from anonymous editors through the tools provided. Link an audio file at the click of a button. Adding a derived term would only be allowed if the term already existed in the target language. The javascript would automatically alphabetize and balance it, and we wouldn't have any of the additional crud that sometimes goes along with those. (New users tend to try to define the term there.) Clicking on a red link in translations would automatically fill in the language header and a starter definition for those logged in, and would query for more information on a form for those who are not. Basically, we want to make it easy for anons to make positive contributions, especially translators who don't often bother to log in on every project, but we have the right to expect any direct changes to the wikitext to be made by knowledgeable contributors who have at least taken the time to create a username, never mind the time to read through all of our guidelines.

It has always been the case here that the volume of edits requires hasty decisions in patrolling, and the result is that a number of potential repeat contributors are turned away from a first bad experience, for being blocked in the creation of words like outgreen which may not appear in other dictionaries but are real nonetheless. The problem is that we get so many bogus entries that these positive contributions are misinterpreted. It would be extremely convenient to channel our energy at directing those who have taken the trouble of registering, while at the same time not requiring such a high barrier on minuscule edits that provide a long tail of content. Seeing the objections that have arisen from this first assisted test case, it is evident that these sorts of tools can only be applied where the formatting is very rigidly defined, so I wouldn't expect for instance Wikisaurus pages to be collectively protected in a very long time. I have believed for some time that our format differing from Wikipedia will require an independent solution in the long run, one that seamlessly links definitions with their synonyms and translations for instance. I have some hope now that this will actually come about because with experimentation outside of the wikimedia framework it can happen in steps and only be fully incorporated when mature. 72.177.113.91 03:00, 4 May 2009 (UTC)

"Adding a derived term would only be allowed if the term already existed in the target language." Do you mean translations? What's the point of having too many, if there isn't just one? I would allow to add the very first translation to a missing language to anyone. The regular members may not have the knowledge of that language or be bothered about it. Anatoli 03:06, 4 May 2009 (UTC)Reply
I don't mean translations, I mean derived terms. We could apply what we've done with translations to derived terms with a convenient input box. And likewise with other sections. Now edited above to be clearer.
As to fleshing out translations, I agree. In contrast to Wikipedia, we do not protect against new page creation because we need those entries very badly. 72.177.113.91 03:18, 4 May 2009 (UTC)Reply
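The "alphabetize and balance" step proposed earlier in this thread for derived terms could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name, the two-column default, and the sample terms are hypothetical, and it is written in Python rather than the site JavaScript the proposal has in mind.

```python
def alphabetize_and_balance(terms, columns=2):
    """Sort terms alphabetically and split them into columns of near-equal length."""
    # Drop exact duplicates, then sort case-insensitively
    # (the raw string breaks ties so the order is deterministic).
    unique = sorted(set(terms), key=lambda t: (t.casefold(), t))
    # Every column gets `base` items; the first `extra` columns get one more.
    base, extra = divmod(len(unique), columns)
    result, start = [], 0
    for i in range(columns):
        size = base + (1 if i < extra else 0)
        result.append(unique[start:start + size])
        start += size
    return result


# Hypothetical derived terms, including one duplicate submission.
for column in alphabetize_and_balance(
    ["outgreen", "green", "Greenland", "greenish", "green"]
):
    print(column)
```

Because the sorting and splitting happen in the tool rather than in hand-edited wikitext, a new term always lands in the right place no matter who adds it.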

Re-ordering ELE sections

"Re-ordering the descriptive paragraphs is just fine; if you stick to that there is no issue." -- RU, commenting on the previous VOTE which included this proposal

The proposal: Re-ordering the ELE sections Derived terms, Related terms, and Descendants to match the Order of Headings section.

Why don't we try this again. Since I now believe that merely re-ordering three sections is too trivial to require a full VOTE, maybe if I mention it at the BP, someone will be willing to be bold and simply do it. Or, maybe some kind soul will go through the rigamarole of a VOTE, then (a month later) make the minor change. In any case, hopefully this will provide further evidence of discussion, or at least the attempt to provoke such. JesseW 17:09, 4 May 2009 (UTC)Reply

Perhaps we need to aggregate proposed "minor" changes. We should exclude any that have any significant controversy or are substantive. As I understand it, this is roughly how it is done in most legislative bodies. If we do not limit ourselves to non-controversial items, we may not get any changes through. If we slip in substantive changes, they may not get the benefits of serious attention. DCDuring TALK 16:43, 11 May 2009 (UTC)Reply
Well, I've now tried making the edit again -- we'll see if someone jumps on it. JesseW 20:05, 12 May 2009 (UTC)Reply

Recurring problem with Chinese vs. Mandarin

Why is Mandarin more correct?

Because we distinguish the Chinese languages, so "Chinese" is ambiguous. This is not a debate between whether Mandarin is a dialect or Chinese a language family. Linguistically the distinction between dialect and language is arbitrary. The point is that we do not group all Chinese words together, so our nomenclature should reflect the way these are grouped. That's why ==Mandarin== is the approved and de facto language header. The problem is that many of our Mandarin Chinese translations do not say "Mandarin" at all. DAVilla 17:25, 4 May 2009 (UTC)Reply

There's no problem in Mandarin being more correct, we just shouldn't use "zh" to reference it, "zh" is Chinese. Conrad.Irwin 22:59, 4 May 2009 (UTC)Reply
I need to change the template (zh) to "Chinese" to help me with the assisted translations. Asking you not to revert, please. ("cmn" still exists and can be used). In any case, I don't know how we can reconcile our differences. If the majority decides on Mandarin, renaming all * Chinese to * Mandarin translations may require some bot program, if I add extra translations, it won't make much difference. zh still stands for Chinese (中文 (Zhōngwén)), not for Mandarin. (普通話 / 普通话 (Pǔtōnghuà), and other words meaning standard Mandarin or northern Chinese dialects). Anatoli 01:52, 5 May 2009 (UTC)Reply
We don't use zh to reference it, we use cmn. The reason that {{zh}} says Mandarin is that Wikimedia uses zh to refer to Mandarin. The Wiktionary at zh.wiktionary.org does not include other dialects of Chinese. Aside from that there is no purpose for zh at all. DAVilla 06:49, 5 May 2009 (UTC)Reply
But don't take it from me. See Wiktionary:Beer parlour archive/2009/March#lang=zh. DAVilla 09:34, 5 May 2009 (UTC)Reply
"zh" should really not be "Mandarin" in the template. No one uses "Mandarin" as the term for translations in Wiktionary; everyone uses "Chinese". So it's quite inconvenient for us to change it to Mandarin. "cmn" gives you "Mandarin" anyway. (And this is not a problem; after all, Chinese means Mandarin, effectively. For Chinese people, there is only one standard Chinese language, and some of us Westerners call it Mandarin.) --Aghniyya 10:20, 4 May 2009 (UTC)Reply
The problem is that "Chinese" is ambiguous. Yes, when people say "Chinese", they mean Mandarin, and if they don't mean Mandarin then they have to be clear about that. The fact that "Chinese" is used in translations more than "Mandarin" is a problem here because the approved and de facto language header is the latter. We should never use "Chinese" by itself so what we need to discuss is how to eliminate it. DAVilla 16:53, 4 May 2009 (UTC)Reply
Well, this is the common practice in Wiktionary at this point, so we would have to go through and change possibly over a thousand entries, which is unlikely to happen. We should focus on getting the translations up.
Not only is this a distraction, in any case, I still disagree with your linguistic approach here. I'm in a graduate program, and I couldn't see my professors supporting you here. Languages are, remember, socio-cultural constructions, so it's best to follow the universally accepted practice, both in Chinese and western cultures: Chinese means Mandarin (and Mandarin itself is a silly, archaic, Orientalist term for "putonghua"). The PRC constitution says, "The standard spoken and written Chinese language means Putonghua (a common speech with pronunciation based on the Beijing dialect) and the standardized Chinese characters." From there on, the Chinese constitution only refers to Chinese. Likewise, no one buys a "Mandarin-English" dictionary - it's Chinese-English.
Lastly, when people are searching for a translation from a long list, they will not search for Mandarin; they will search for Chinese. Anything else will confuse them (e.g., think of a native Chinese speaker who will not think of the term "Mandarin"). So let's please put this behind us! --Aghniyya 06:20, 5 May 2009 (UTC)Reply
You've done a great job of twisting linguistics, a social science that observes human behavior, into an excuse for politics, a cultural tool that directs human behavior. The PRC is a political entity and not a linguistic authority apart from what they can force their schools to teach, the Standard Mandarin that your quotation refers to. It is after all Standard Mandarin Chinese that the quotation refers to, regardless of what it is called by them or by us or anyone else.
In one sense, Chinese = Mandarin, and in another sense, Chinese is a family of languages. You can argue whether the label should be Chinese or Mandarin or something else, but please do not cross the line into thinking that Mandarin, or what the PRC calls Chinese, is a family of languages, as if to say they're all the same. DAVilla 07:42, 5 May 2009 (UTC)Reply
I agree with Aghniyya's point. As a native Mandarin speaker (with Min Nan & Cantonese background) I would most likely look for the translation under Chinese instead of Mandarin. I think it is more intuitive to nest Mandarin & other dialects under Chinese instead of having top-level entries (not grouped under Chinese). When I think of the translation, 翻譯成中文 (translate into Chinese) makes more sense than 翻譯成普通話 (translate into Mandarin). --Ccsheng125 01:38, 7 May 2009 (UTC)Reply
I should also say that this question of languages and dialects comes up in numerous cases, and linguists always decide to let socio-cultural definition lead the way. Otherwise, it's chaos. Shall we list German under "Hochdeutsch"? Or Arabic under "FusHa" or "Modern Standard Arabic"? After all, Arabic and German do NOT refer to Austrian German, Plattdeutsch, or any spoken Arabic form at all (Standard aka classical Arabic has no native speakers). But there is a standard, and everyone knows it. --Aghniyya 06:29, 5 May 2009 (UTC)Reply
I've already written too much, but I'll also say that even in scholarly journals, Chinese is used to mean standard spoken Chinese, ie, putonghua/Mandarin. --Aghniyya 06:46, 5 May 2009 (UTC)Reply
I don't argue that. My argument is that it also means something else. Chinese can be an ambiguous term. If here we decide that it means Mandarin specifically, then I'm fine with that, as long as we are deliberate and consistent. Calling Mandarin "Chinese" would mean that the other dialects are not Chinese (i.e. not Mandarin). DAVilla 07:42, 5 May 2009 (UTC)Reply
Aghniyya has given very good arguments here about the common usage. In any case, at least, in mainland China, the separation of Chinese dialects is not encouraged and not supported by the Chinese themselves. Norwegian and Danish may understand each other but they don't think they belong to the same language but Chinese think of themselves as Chinese and that they speak Chinese, regardless of their dialect. I am keen to add more translations into Chinese but this discussion is not helping. Whatever we change, if we change, it will become inconsistent and would require a lot of rework. Anatoli 12:04, 5 May 2009 (UTC)Reply

Do we have to eliminate Chinese?

No. Current practice sometimes solves this by using * Chinese: ** Mandarin: in translations. However, this is not consistent with the way we handle other names of languages. Languages are sorted alphabetically, not grouped by language family. For instance, the Scandinavian languages (edited) are not only closely related, some are mutually intelligible! But we do not put Danish and Swedish next to each other just because a speaker of one can understand the other.

There are better solutions. As appropriate, we could use * Chinese: ''See Mandarin'' or * Chinese: ''See Mandarin, etc.'' in translations (and probably also * Farsi: ''See Persian''), or we could use * Mandarin Chinese: and adopt ==Mandarin Chinese== as the language header name. I don't really see "Cantonese Chinese" as being necessary, just "Cantonese" should suffice, so this would be a conscious exception to the rule. DAVilla 17:26, 4 May 2009 (UTC)Reply
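The first option above, a flat alphabetical list with a cross-reference line, can be sketched as follows. This is a minimal illustration, not an existing tool: the function name and its parameters are hypothetical, and the output only mimics the translation lines quoted in this discussion.

```python
def flat_translation_lines(translations, see_also=None):
    """Render one '* Language: text' line per language, sorted alphabetically.

    translations: dict mapping a language name to its translation text.
    see_also: dict mapping an ambiguous name (e.g. "Chinese") to the
              name(s) readers should look under instead.
    """
    entries = dict(translations)
    for alias, target in (see_also or {}).items():
        # Cross-reference line in the italic wikitext style quoted above.
        entries[alias] = f"''See {target}''"
    return [f"* {lang}: {entries[lang]}" for lang in sorted(entries)]


lines = flat_translation_lines(
    {"Mandarin": "水 (shuǐ)", "Cantonese": "水 (seoi2)", "Danish": "vand"},
    see_also={"Chinese": "Mandarin, etc."},
)
print("\n".join(lines))
```

In this layout every language keeps its own place in the alphabetical list, and the cross-reference line catches readers who look under "Chinese" first.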

I don't like the idea of See X as it just adds clutter, and, assuming every page is formatted the same (which it will be eventually), people only have to find the language they are interested in once. Nesting is not ideological, it just exists to make the page easier for people to follow - so it should aim to do what they expect (I personally would expect to find Mandarin under Chinese if it wasn't under Mandarin, same with Nynorsk and Norwegian). Conrad.Irwin 22:59, 4 May 2009 (UTC)Reply
That's fair. Personally I would prefer to see * Mandarin Chinese but I wouldn't even care if we simply called it * Chinese providing we got rid of the second indentation, listing the other dialects like * Cantonese in the full alphabetical list. DAVilla 06:43, 5 May 2009 (UTC)Reply
Aren't the written forms of entries in the different Chinese languages often identical? If so, then why scatter them all over the list of translations? Group them together in a prominent block, and no one looking for a Chinese translation will ever have any trouble finding it. Michael Z. 2009-05-04 23:49 z
Nynorsk and Bokmål are of interest only if they are different, perhaps making Bokmål the default makes sense, add Nynorsk if different. I don't see the need for the split if they are identical, same with Chinese. Anatoli 00:07, 5 May 2009 (UTC)Reply
As (a long way) above, translations that are the same get added as {{no}}; those that differ get added nested under "Norwegian" as {{nb}} and {{nn}}. Conrad.Irwin 08:31, 5 May 2009 (UTC)
First of all, no. See A-cai's comments below. Second, it doesn't matter whether they're similar or not. As I pointed out, there are very similar languages that are listed under completely dissimilar names. We do not group any other languages in these sorts of prominent blocks. Why does Chinese have to be an exception? If you argue that alphabetical order isn't ideal for these dialects, then you'd have to be willing to extend that argument to cases where it is much more applicable. DAVilla 06:40, 5 May 2009 (UTC)Reply
(@DAVilla) "grouped by language family" - this means maintaining the theory which considers the dialects of the Chinese language to be separate languages... Well, how can two dialects be declared separate languages, when 95% of the words have a common spelling? (not talking about the pronunciations here). "For instance, the Scandinavian language are not only closely related" - did you really mean to write "the Scandinavian language"? Even I, a staunch sceptic when it comes to fabricating new languages out of hitherto existing dialects, do not think that Danish and Swedish can possibly be one language, as I have more difficulties when reading Swedish texts and when hearing Norwegian speakers (with my knowledge of Danish) than listening to Swedish users or reading Norwegian bokmål. In case this was an inadvertent misspelling of yours, then the apt example is not Scandinavian languages, and Anatoli already pointed out the more applicable Norwegian Bokmål-Norwegian Nynorsk or Flemish dialect of Dutch-standard Dutch. As they are listed under one header and, when there is no spelling difference, not even mentioned to be two variants, the same approach should apply to the Chinese language: * Chinese: xxx (pinyin: xxx, /*other pronunciations*/) . It is not Chinese which should be eliminated, but the term Mandarin, transforming the whole issue merely to a pronunciation issue. Well, for those 5%, where there is any difference in spelling, perhaps we should use the terms Mandarin, Cantonese and so forth, but under the header of the Chinese language. The uſer hight Bogorm converſation 09:54, 5 May 2009 (UTC)
I have to correct a common misunderstanding here. It is simply not true that words between all Chinese dialects have a common spelling 95% of the time. Here are some figures (quoting from: w:Min_Nan#Mutual_intelligibility), Mandarin and Amoy Min Nan are 62% phonetically similar and 15.1% lexically similar. The reason for the misunderstanding is that many of the people who speak a dialect other than Standard Mandarin often use Standard Mandarin as a "lingua franca" written language. This is not unlike the relationship between Modern Standard Arabic and other varieties of Arabic. -- A-cai 10:48, 5 May 2009 (UTC)Reply
A-cai, could you please elaborate the meaning of "15.1% lexically similar", it seems really too small. I saw this in the Wikipedia article. Not sure what it means here and what measurement of similarity was used. Many phrases in Min Nan, when written in Chinese characters are comprehensible even for my Chinese, even if the word order and word choice may differ from the standard Chinese. Some most common words, although their number is very small, like in Cantonese, are different from Mandarin. BTW, the Arabic translations have * Arabic first, followed by **dialects, without specifying * Modern Standard Arabic or * Classical Arabic. * Mandarin seems to be denied the status of being standard Chinese by some of you guys. Anatoli 11:34, 5 May 2009 (UTC)Reply
Could be a typo (from the original cited article). I wonder if it's supposed to be "51.1% lexically similar"? That would be a closer match to my Swadesh list comparison (see below). -- A-cai 02:30, 6 May 2009 (UTC)
Dungan is written in Cyrillic. Thus your point is moot. -- Prince Kassad 12:19, 5 May 2009 (UTC)
Dungans don't call themselves Chinese, they call themselves Dungans or Tungani, even if their language is comprehensible to Mandarin speakers and especially Huizu. Dungans didn't have a chance to learn proper Chinese. Mandarin and other dialects can be written in different scripts but they are normally not. Min Nan is sometimes written in a romanised script to show the pronunciation difference. Not sure if ** Dungan should appear under * Chinese, perhaps it could and should. Dungan can be written in hanzi for native Chinese words to show the variant spelling as with Serbian Roman/Cyrillic - хуэйзў йүян / 回族语言 / Huízú yǔyán. Anatoli 12:37, 5 May 2009 (UTC)
Indeed the sorting is not clear. I currently sort Dungan under Chinese, but if people prefer it to be separate, that's possible too. -- Prince Kassad 12:45, 5 May 2009 (UTC)
(By the way, pinyin is a transliteration or romanization. Calling these pronunciations could confuse people when adding translations in other languages.)
This isn't just a pronunciation issue. Saying that the Chinese dialects are basically equivalent except for grammar is like saying that German and English are essentially the same except for word order. If you're thinking more along the lines of British vs. Australian English, pronunciation differences exist between the Beijing, Qingdao, and Xuzhou dialects of Mandarin. You would think that within a major branch of Chinese like Wu there would be less variation. However, many Wu dialects, apart from Taihu, are not mutually intelligible. Hui, which can be Wu or Gan depending on who you ask, has a high degree of unintelligibility even from county to county. These are where the pronunciation differences lie. It makes little sense to say that these branches, which have variation even within themselves, differ only in pronunciation from other branches. Between branches like Mandarin and Min Nan there are even false friends such as run/walk.
I think this objection to splitting languages that might in some cases use the same characters is very odd. Many words like animal, taxi, mango, and international have the same spelling and meaning across very different languages. The more similar the languages, the more crossover, as with the Romance languages. 72.177.113.91 17:26, 5 May 2009 (UTC)

This whole argument about what constitutes a language and a dialect is utterly boring and moot for linguists. Everyone agrees that even linguistically separate languages, like Egyptian Arabic and Standard Arabic, can be considered dialects of the same language if the speakers define it that way. Likewise, linguistically unitary languages like Danish and Norwegian can be separate if the speakers so choose. "A language is a dialect with an army and a navy," as Max Weinreich said. Let's use the socially recognized designations for languages and dialects. Let Swiss German be a dialect of German, Wu a dialect of Chinese, and Moroccan and Egyptian dialects of Arabic, and Mandarin Chinese is standard Chinese. Now, the question is, how should these things be listed? Dialects could be listed as alphabetically separate from the languages themselves. This is more convenient for writing entries than having to manually indent the entries. However, I would still argue that dialects like Wu and Egyptian should be indented next to their standard languages. Why? Because it's more convenient for users. I myself am a very serious learner/speaker of Arabic and Arabic dialects (which are linguistically speaking separate languages, but not socio-culturally). If I am looking at how to say a word, I tend to assume that the dialects won't be present on the list. If they are indented next to the standard, it's right there for me to see. --Aghniyya 18:12, 5 May 2009 (UTC)

  • Interesting that no-one argues that Arabic entries should appear under * Arabic, followed by possible dialects but with * Chinese, we have this argument. The entries followed * Arabic are of FuSHa, are not called * Modern Standard Arabic or * Classical Arabic, the language not usually used in common speech but Mandarin - the official and standard Chinese language, needs to be disputed here.
  • Anyway, one point that A-cai mentioned that Min Nan is "15.1% lexically similar" to Mandarin. Japanese, Korean and Vietnamese are said to have between 40% to 60% of common vocabulary to Chinese dialects, of course pronounced differently but there is a pattern, how can a Chinese dialect be more remote from Mandarin than a foreign language? There is something wrong in that Wikipedia article.
  • In Wikipedia, they use multiple templates providing jiantizi/fantizi + different pronunciations - pinyin, Yale and pe̍h-ōe-jī to represent Mandarin, Cantonese and Min Nan. A template with different optional parameters would do the job, in case when the Chinese spelling is the same. See Bogorm's comments. Anatoli 01:45, 6 May 2009 (UTC)Reply


There is a common misunderstanding that Chinese words are mostly written with the same Chinese characters, regardless of dialect, but pronounced differently. Intuitively, I know this not to be the case, and have attempted in several earlier posts to cite online research to back my claim. I decided to take a different approach for this post. Let's assume that we were to label everything as "Chinese" in the translation sections, except in cases where there is a divergence (as was suggested by another contributor). What would happen?
I will use Mandarin and Min Nan, only because those are the two dialects that I speak. I'm not about to compare the entire lexicon of both languages, for obvious reasons. However, I can compare the Swadesh lists for Mandarin and Min Nan, which should provide sufficient insight for the purposes of a Beer Parlour discussion. Although there are 207 words in the Swadesh list, it actually requires a total of 295 individual Mandarin words to account for all of the senses of the 207 English words. It requires 307 Min Nan words to account for all of the senses of the 207 English words. For example, the English word "not" is expressed with five different words in Min Nan and three different words in Mandarin, depending on the sense of the word "not" that you want to convey.
In the above hypothetical, I would only label as "Chinese" those words that are written with identical Chinese characters and are used in the exact same way in both languages. For example, the word for "mountain" in both Mandarin and Min Nan is 山. Furthermore, the sense of 山 is identical in both. The only thing that is different is the pronunciation. As such, 山 would qualify for the "Chinese" label in the above hypothetical. In the translation section, you might see something like:
large mass of earth and rock
  • Chinese: 山
    Mandarin: shān
    Min Nan: soaⁿ
However, 怎樣 meaning "how" would not qualify, despite meaning "how" both in Mandarin and Min Nan (12 out of 149 "matches" fall into this category, and are thus "disqualified" from my calculations; the final number of "matches" is therefore 137). The reason is that while 怎樣 is the informal word for "how" in Mandarin, it is regarded as a rather formal term in Min Nan. The equivalent word in Min Nan to 怎樣 is 按怎. As such, this would be a divergent case, and would appear in the translation section as:
in what way (informal)
  • Chinese:
    Mandarin: 怎樣 (zěnyàng)
    Min Nan: 按怎 (án-chóaⁿ)
OK, so taking all of that into account, what did I find? It turns out that 137 words (not counting the 12 "false" positives) were a match between Mandarin and Min Nan. That works out to 44.62% (137/307, 307 being the total number of Min Nan words needed to represent the 207 English words in the Swadesh list). In other words, if one assumes that the Swadesh list is a rough representation of the language as a whole, you would expect to see a common "Chinese" label 44.62% of the time. The other 55.38% of the time, you would require separate "Mandarin" and "Min Nan" labels.
In case you want to try it yourself, I used the Appendix:Amoy Min Nan Swadesh list and the Appendix:Mandarin Swadesh list for my stats. -- A-cai 02:15, 6 May 2009 (UTC)Reply
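For anyone who wants to re-check the arithmetic rather than recount the lists, here is a minimal Python sketch. It only uses the figures quoted in the comment above (149 candidate matches, 12 disqualified, 307 Min Nan words); the variable and function names are made up for illustration. Note that 149 minus 12 gives 137 matches, and 137/307 comes out at roughly 44.6%, in line with the quoted share.

```python
# Figures as stated in the comment above; nothing here is measured independently.
CANDIDATE_MATCHES = 149  # words written with the same characters in both lists
DISQUALIFIED = 12        # same characters but different usage (e.g. 怎樣)
MIN_NAN_WORDS = 307      # Min Nan words needed to cover the 207 Swadesh senses


def shared_label_share(candidates: int, disqualified: int, total: int) -> float:
    """Fraction of words that could carry a common "Chinese" label."""
    return (candidates - disqualified) / total


share = shared_label_share(CANDIDATE_MATCHES, DISQUALIFIED, MIN_NAN_WORDS)
print(f"common 'Chinese' label: {share:.1%}")
print(f"separate labels needed: {1 - share:.1%}")
```

Swapping in counts from a larger word list would show how sensitive the share is to the choice of sample, which is the point Anatoli raises below.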
I've read your reply in full but will only give a quick reply with questions, sorry, will get back later if I can. Your calculation (44.62%) is based on the Swadesh list? This list consists of the very basic and the most common words in a language - pronouns, question words, quantifiers. These are the words that mostly differ between Mandarin and dialects. I have almost no knowledge of Min Nan but I can judge by my exposure to Cantonese. Wouldn't your 55.38% (of Swadesh list) only convert to a couple of hundred words out of many thousands of Chinese words? Besides, 怎樣 is not foreign to Cantonese or Min Nan speakers, am I right? Although you'd prefer to write 按怎 when using Min Nan? Let me explain a bit, in Cantonese, the word for "come" is 嚟 (lei4) but common Chinese 來 is also used as a cognate. Isn't it the same in Min Nan, do you at times write 怎樣 but say án-chóaⁿ? Would you use a different pronunciation in formal Min Nan, more similar to zěnyàng? Anatoli 05:43, 6 May 2009 (UTC)
  1. Yes, the 44.62% is based on the Swadesh list. I agree that if you were to do a much larger sample, the number might increase. However, I don't think it would increase as dramatically as you might think.
  2. In the case of 怎樣, I understand your question, but that is a different phenomenon. 怎樣 (chóaⁿ-iūⁿ) is a legitimate Chinese word in Min Nan, but is not used in the same way as 怎樣 (zěnyàng) in Mandarin. What you're talking about is spot translating a Mandarin word into Min Nan. For example, "how much" in Min Nan is 偌儕 (jōa-chē), but is commonly written with the Mandarin characters 多少. If I were to pronounce 多少 in Min Nan, it would be to-siáu, but would be met with strange stares, if I tried to use it in Min Nan. -- A-cai 11:02, 6 May 2009 (UTC)
There seems to be little written material in Min Nan with Chinese characters but if it is written in Chinese characters, it will be very comprehensible (especially serious topics) to Mandarin, Wu or Cantonese readers. The small number of incomprehensible but frequent words may impede the understanding + some false friends. I would be interested to see a mutual intelligibility analysis of larger texts, not of selected, specifically dialectal words. Modern written Cantonese, Wu and Mandarin are mutually very intelligible. Anatoli 06:12, 6 May 2009 (UTC)Reply
One of the most lucid articles that I have found online on this subject can be found at: http://www.glossika.com/en/dict/faq.php#18. -- A-cai 11:02, 6 May 2009 (UTC)
The last item, no. 19, should confirm some of the statements here. It's supplemented by a first-hand account as well, while the rest of the information, though interesting, is more analytical than narrative-based. Chinese speakers can have a multitude of their own first-hand experience by listening to the recordings at the bottom of this page... at least in theory. They don't download for me. DAVilla 18:15, 6 May 2009 (UTC)

How can we make editors aware?

By agreeing to a format so that an example can be listed explicitly in the entry layout and other help pages.

By splitting Wiktionary:About Chinese into several pages so as to reinforce the idea that these dialects are treated separately.

By using the correct term in assisted edits and otherwise running bots to clear up the current mess, since many contributors just copy or do as they see. DAVilla 17:27, 4 May 2009 (UTC)Reply

Bots to clean up mess is fun, on the condition that everyone can agree exactly which changes are to be made. Conrad.Irwin 22:59, 4 May 2009 (UTC)
I support retaining Chinese as the main header for translations, which should contain the standard Chinese spelling. The definition of what is Chinese language stems from the Chinese themselves. There is very little language separatism in China, why should we promote it? The formal or standard writing is almost identical for all Chinese dialects. The benefit of having just Chinese is that dialects can be added ** Cantonese, **Min Nan as nested, if somebody bothers to do but it's important to have the Mandarin entry. If we add ** Mandarin to each * Chinese
  • There are a lot of entries to change. It seems using the word "Chinese" is popular with many editors.
  • We are not using space efficiently - there will always be a blank line
  • Mandarin IS the standard Chinese. The big difference (especially in separate words) is mainly in the pronunciation.
  • Dialects are often added for pronunciation purpose only. They use the same character, e.g. Indonesia can be written as 印尼 in Chinese, pronounced Yìnní in Mandarin and Ìn-nî in Min Nan. Min Nan is not a written language, like many Chinese dialects are, they write in standard Chinese (Mandarin) but may pronounce words the Min Nan way. Hong Kong TV anchors have their speeches written in Chinese Mandarin, they read it out loud in Cantonese.
  • Chinese dialects can be grouped together, if entries are added and they can be cross-referenced. My preference is to have * Chinese (** dialect 1, ** dialect 2), omitting ** Mandarin altogether.
Even the colloquial, informal Cantonese only differs by about 5% from Mandarin, and many words from Cantonese do penetrate standard Mandarin if they are used often enough in writing. The separation between traditional/simplified applies to dialects as well, although some are under the assumption that Cantonese is always written in traditional characters. Cantonese speakers in Guangdong province use simplified characters to write in their dialect. Anatoli 23:01, 4 May 2009 (UTC)
Please see Appendix:Sino-Tibetan Swadesh lists for a side by side comparison of basic words in some of the more well known Chinese dialects. -- A-cai 00:49, 5 May 2009 (UTC)
Thanks, A-cai. This is very useful and interesting. I do read about dialects, although I am not studying them now. So you support the idea of separate translations for each Chinese dialect? Even in your list all dialectal entries are under the same Chinese characters, even if some of them are only used in modern dialects (e.g. also has the Mandarin reading and the meaning is known but not currently used). As I suggested before, the rare cases where they are different and don't overlap, like and can be listed together (see he. In any case, standard Cantonese will use in formal writing, so I would prefer to write: * Chinese (tā), ** Cantonese (formal writing), (colloquial) with pronunciations. In this case, is common for any Chinese, the dialectal form may not exist, even if does, the Mandarin form may still be used and is known. Anatoli 01:33, 5 May 2009 (UTC)Reply
I can't see any other way to do it. I tried to have everything labeled as "Chinese" when I first started two years ago. It quickly became apparent that it just wasn't going to be a sustainable model, if we wanted to include anything besides Standard Mandarin. I was initially in favor of treating each mutually unintelligible Chinese dialect as a separate language, and listing them in the translations accordingly. However, this proved to be unpopular with many of our users. It seems from some of the above posts that there is still resistance to the idea. The *Chinese **Mandarin **Min Nan etc model was a compromise solution. I'm not sure that we will ever be able to come up with a solution that will please everybody. However, the compromise solution mentioned above has more or less held for the last two years. -- A-cai 01:53, 5 May 2009 (UTC)
P.S. the varieties of Chinese can be more varied than you might initially think. For more information, see: Varieties of Chinese. -- A-cai 01:57, 5 May 2009 (UTC)
The situation has somewhat changed with the introduction of assisted translations, which Conrad.Irwin has kindly developed. If the nesting can be fixed then it's fine, otherwise, all translations can be done quickly, except all Chinese translations will have to be done manually, which is sad.
My other point is, how many Chinese dialect editors do we have? The grammatical differences are irrelevant here. The differences in the written form are low. I took part in editing that article and others in Wikipedia. Anatoli 02:12, 5 May 2009 (UTC)Reply
As a general rule, please don't let what software makes "easy" decide what the right thing to do is. I hope to have support for nesting this week. It requires writing four further types of 'edits' (adding a new nested section with heading, adding a new nested section (using *:) to a heading that has translations, adding a new *: and adding a new ** translation to nested lists that already exist), of which the first two are done. I'm now struggling with how to sort these nested languages, as presumably "Old" should come before "Middle", but otherwise I think alphabetical works well enough. For Chinese is this the case, or do we want to always put Mandarin first, or something yucky? Conrad.Irwin 08:36, 5 May 2009 (UTC)
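To make the sorting question concrete, here is a rough Python sketch of one possible sort key for nested language names. It is not the assisted-translation code: the names `STAGE_ORDER`, `nested_sort_key` and `sort_nested` are hypothetical, and placing the historical stages ahead of everything else is an assumption of this sketch. Only "Old before Middle, otherwise alphabetical" comes from the comment above.

```python
# Hypothetical ranking of historical-stage prefixes: "Old" before "Middle".
STAGE_ORDER = {"Old": 0, "Middle": 1}


def nested_sort_key(name: str) -> tuple:
    """Sort key: historical stages first (chronologically), then alphabetical."""
    first, _, rest = name.partition(" ")
    if first in STAGE_ORDER and rest:
        # e.g. "Old Chinese" -> (0, 0, "chinese")
        return (0, STAGE_ORDER[first], rest.casefold())
    # Every other name falls back to plain case-insensitive alphabetical order.
    return (1, 0, name.casefold())


def sort_nested(names):
    """Return the nested language names in display order."""
    return sorted(names, key=nested_sort_key)


print(sort_nested(["Min Nan", "Middle Chinese", "Cantonese", "Old Chinese", "Mandarin"]))
```

Whether Mandarin should be pinned to the top is left open here, as it is in the discussion; a pinned name would need its own rank in the key.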
In this model that you propose, is "Chinese" supposed to mean "Mandarin", or is "Chinese" supposed to mean a family of languages? What this hierarchy would seem to imply is that Mandarin is the only real Chinese language, and that the other dialects are offshoots of it. In fact major branches like Hakka bear more resemblance to Middle Chinese, and Min not even to that!
As to your TV anchors, this is the result of schooling. Cantonese can be written, but because this is not easily understood by speakers of Mandarin etc., in formal writing Cantonese speakers use a standard written Chinese. As a result, it is much easier for Cantonese speakers to learn Mandarin than for speakers of Mandarin to learn Cantonese.
Your arguments entirely gloss over the very real grammatical differences. It's like saying German and English are essentially the same except for word order. DAVilla 09:18, 5 May 2009 (UTC)Reply
A good compromise could be to accept Chinese (zh) (after all, it's a code used by Wikipedia and in interwiki links) and Mandarin, Cantonese, Wu, etc (they all have their own ISO codes). In the same translation table, or in the same page, the same word could be present in both Chinese and Mandarin, for example. But it could also be present only as Chinese, or only as Mandarin, depending on the options taken by contributors. Languages would be sorted normally, according to their names. I'm aware that this would allow duplication and that this is not a satisfactory solution, but this might be the simplest solution, information provided would not be wrong, and it would be easy for all readers to find what they look for. Lmaltier 10:15, 5 May 2009 (UTC)
That's what I have been suggesting but DAVilla disagrees. If Mandarin can't be made the default and be regarded as simply * Chinese (if dialects are missing), then let them be nested, but I don't like extra work and am not looking forward to manually adding * Chinese \n\t ** Mandarin to each translation I make. This argument must have happened a few times here and if the majority of the existing translations use simply * Chinese, it indicates what their preference was. The word Chinese is simply a more common English name for the standard Chinese language than Mandarin. Anatoli 11:48, 5 May 2009 (UTC)
FYI: it doesn't indicate "what their preference was"; it just happened by default because (as A-Cai notes) we were not properly distinguishing the languages in the Chinese group until 2 1/2 years ago. Robert Ullmann 12:20, 5 May 2009 (UTC)

Please people. This is not complicated, and there is only one problem that needs to be resolved (as David identified at the top): we have a substantial number of entries that have only "Chinese:" in the translations sections that need to be corrected to "Mandarin:" (or separated into "Mandarin", "Cantonese", etc). Otherwise people will copy them (and/or confuse themselves with "needing" to change the zh template or something). This is the only thing that "needs to be fixed".

The zh code should not be used anywhere with the wiktionary; language "Mandarin" is cmn (and so forth). The fact that WMF uses "zh" in the domain name(s) for Mandarin (and zh-min-nan for Min Nan, zh-yue for Cantonese) is not something users or editors need to or should see. The {{t}} template converts (cmn->zh etc) internally; Tbot and Interwicket know other details. (We probably should get rid of template {{zh}} entirely.)

As to grouping under Chinese: that is a separate issue, and whenever we decide something, it will just get added to AF's sort algorithm, so you don't need to "fix" it. For now use:

* Chinese:
** Mandarin:
** Wu:

or the individual language lines as you want. (I.e. if you are adding Mandarin with the acceleration (or not), add code "cmn", and don't worry about the "grouping".) If you do group them, note the ** which says the sub-line is a full language name, not some other qualifier, which always uses *: Robert Ullmann 12:16, 5 May 2009 (UTC)Reply
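As an illustration of the code conversion mentioned above, here is a small Python sketch of a lookup from language codes to WMF subdomain codes. It is not how {{t}} is actually implemented. The three subdomain codes are the ones named in this comment; `nan` and `yue` are the ISO 639-3 codes for Min Nan and Cantonese and are an addition of this sketch, as are the names `ISO_TO_WMF` and `interwiki_code`.

```python
# Language code used in entries -> WMF subdomain code used for interwiki links.
ISO_TO_WMF = {
    "cmn": "zh",          # Mandarin
    "nan": "zh-min-nan",  # Min Nan (ISO code assumed, not given in the thread)
    "yue": "zh-yue",      # Cantonese (ISO code assumed, not given in the thread)
}


def interwiki_code(lang_code: str) -> str:
    """Return the WMF subdomain code, falling back to the code itself."""
    return ISO_TO_WMF.get(lang_code, lang_code)


for code in ("cmn", "nan", "yue", "fr"):
    print(code, "->", interwiki_code(code))
```

The fallback reflects the common case where the language code and the subdomain already agree, so editors only ever need to type the language code.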

In order to come up with an idea and not only appear to be cavilling at the Mandarin separatistic practice, my suggestion for the entry layout is:
1) When there is no spelling difference (vast majority of cases)
* Chinese: [[xxx]] (transliteration in Pinyin for Mandarin, transliteration in [[:w:Jyutping|Jyutping]] for Cantonese, ...)
2) spelling difference:
* Chinese:
** Mandarin:[[xxxM]]
** Cantonese:[[xxxC]]
I do not see any reason for the cmn templates, because I see no reason for rejecting Chinese as a header followed by clarifications concerning the regional pronunciations/transliteration systems in brackets and italicised. Thus, my opinion is that a bot converting all Mandarin entries to Chinese with a (Mandarin) note would be more appropriate instead of vice versa. Such a note would be needful only in the pronunciation sections and before the romanisation, with meaning and spelling intact. In the exiguous minority of cases where spellings differ, it may also be applicable before the meaning (of the regional dialects of course, not before the meaning in standard Chinese). The uſer hight Bogorm converſation 21:35, 5 May 2009 (UTC)
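The two-case layout proposed above can be sketched in a few lines of Python. This is only an illustration of the proposal, not an existing bot or template: the function name `render_chinese` and the exact punctuation of the output are assumptions, and the sample words are the ones A-cai uses earlier in the thread.

```python
def render_chinese(translations: dict[str, tuple[str, str]]) -> str:
    """Render wikitext; translations maps dialect -> (spelling, transliteration)."""
    spellings = {spelling for spelling, _ in translations.values()}
    if len(spellings) == 1:
        # Case 1: one shared spelling, so a single line with per-dialect readings.
        word = next(iter(spellings))
        notes = ", ".join(
            f"{name}: {translit}" for name, (_, translit) in translations.items()
        )
        return f"* Chinese: [[{word}]] ({notes})"
    # Case 2: spellings differ, so each dialect gets its own nested line.
    lines = ["* Chinese:"]
    for name, (spelling, translit) in translations.items():
        lines.append(f"** {name}: [[{spelling}]] ({translit})")
    return "\n".join(lines)


print(render_chinese({"Mandarin": ("山", "shān"), "Min Nan": ("山", "soaⁿ")}))
print(render_chinese({"Mandarin": ("怎樣", "zěnyàng"), "Min Nan": ("按怎", "án-chóaⁿ")}))
```

The sketch compares spellings only. It would therefore put 怎樣 on a single shared line if both dialects listed it, even though its register differs between them, which is the objection A-cai raises with the Swadesh comparison.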
A-cai responds to this point when he mentions the Swadesh lists in the section above. 72.177.113.91 05:11, 6 May 2009 (UTC)
Mandarin is more specific than Chinese, but due to the widespread use of "Chinese" to de facto mean "Mandarin", I consider that both forms should be allowed in translations. For sectional entries, perhaps ==Chinese== should have a redirect to ==Mandarin==. Or should we consider ==Chinese (Mandarin)== as a compromise? I have not seen dictionaries being "Mandarin-English" or vice versa.--Jusjih 03:19, 7 May 2009 (UTC)

Reframing the question

Putting aside the distinction between the labels that describe the Chinese languages, can we at least agree to use the same labels in the level two language headers as we do for bulleted translations? It doesn't make sense to click on a translation for * Chinese, as at present most are simply labeled, and not find a section called Chinese on the next page. Considering that most of the terms are defined under a ==Mandarin== heading as policy dictates, this is a big discrepancy. I don't care if we have to change one or the other, it's going to be a big undertaking, and we may as well take it. Better sooner than later. I've stated my preference above, but ==Chinese== would at least be consistent. And heck, if we have to change every single one to "Mandarin Chinese" or the like, all the better as long as this is settled. DAVilla 05:42, 6 May 2009 (UTC)Reply

I agree but, if both Chinese (zh) and Mandarin (cmn) are accepted (in translations as well as in language headers), this would also work. Isn't compatibility with WM language codes at least as important as compatibility with ISO? Shouldn't all WM codes (e.g. zh) be accepted, even when they conflict with ISO (e.g. als)? Note that the only issue I'm raising is priority between conflicting compatibilities (I know that the best and most complete list is ISO, but I think that compatibility with Wikimedia is very important). Lmaltier 06:43, 6 May 2009 (UTC)Reply
I'm not sure I understand. How can both "Chinese" and "Mandarin" be accepted? Wouldn't we have to pick one or the other? The words that are listed under * Chinese are de facto Mandarin Chinese. Now, I don't mind calling them just "Chinese" or whatever else we may decide, as long as it's consistent on this project. WM codes really have very little to do with this except that people seem to like using zh when they should be using cmn. DAVilla 07:14, 6 May 2009 (UTC)Reply
I don't mind calling them just "Chinese" - that is great. I agree with calling them consistently Chinese, but not with calling them consistently Mandarin. People like that which is the established and usual practice and have their reason for that. The uſer hight Bogorm converſation 07:33, 6 May 2009 (UTC)Reply
Again, step back for a minute from which label is actually used. Do you agree that, all else being equal, we should be deciding just one question, of which label to use, rather than two questions, of which label to use in translations and which label to use in language headers? If it's the latter then there are inconsistencies. (I know your first preference is to have both be "Chinese". For your second preference:) Would you rather have different terms used depending on where the label is placed, so that for instance a user clicks on a "Chinese" translation but has to scroll down to a "Mandarin" definition as is current practice, or would you rather have both the translation and the language heading correspond, labeled as decided by the community, which might mean seeing "Mandarin Chinese" in both places? DAVilla 17:17, 6 May 2009 (UTC)Reply
Of course there is no use in labelling the translation and the entry differently. Having seen the translation labelled as Chinese, the reader expects to discover an entry labelled Chinese as well and that is also what I expect. As the comparison with varieties of Arabic or (the more familiar for me) Norwegian bokmål and nynorsk shows, one should use either Mandarin Chinese in the caption (when spelling differences exist) or rather a header Chinese and a (Mandarin) / (Cantonese) / ... note and not these subordinated designations (dialects) as headers. Ivan's approach with Ekavian and Ijekavian Serbo-Croatian is exactly what I mean (see hteti/htjeti). The uſer hight Bogorm converſation 21:16, 6 May 2009 (UTC)
Okay, in that case the translations are labeled as Serbian or Croatian, but the language header is Serbo-Croatian. I guess if someone followed a translation for "Chinese" then they would be able to find Mandarin Chinese on the next page. That doesn't sound unreasonable. DAVilla 02:18, 7 May 2009 (UTC)
My proposal is to allow both, provided that information provided is not wrong. This would lead to duplication (and the Chinese version might probably include more information in some cases), but this would be compatible both to ISO and to WM codes, and everybody would be happy. Lmaltier 11:59, 6 May 2009 (UTC)Reply
The real answer is that this will most likely not be resolved one way or the other until Wiktionary attracts more Chinese speakers to the project. In the last 2+ years, I have been the only consistent contributor of Chinese words. As I tried to explain above (in great detail), "Chinese" is simply not workable as a label, unless, that is, we equate "Chinese" in every instance to Standard Mandarin. Trust me, I've tried. Again, if a word is used in one dialect but not another (as is the case as much as 50% of the time between some dialects of Chinese), how would you deal with it, if everything is labeled as Chinese? I don't mind "Chinese Mandarin," but again, that has been voted down in the past. Just so everyone is aware, this is not the first time that we've had this kind of lengthy discussion on Beer Parlour about this subject. Over the last two years, I have participated in at least three or four similarly themed discussions (go back and check, it's all archived :). -- A-cai 22:54, 6 May 2009 (UTC)
P.S. Case in point: Wiktionary:Beer_parlour_archive/July_06#Min_Nan, Wiktionary:Beer_parlour_archive/2007/April#Amoy, Wiktionary:Beer_parlour_archive/2007/April#Headings for 漢語, 閩南話, 粵語 etc. -- A-cai 23:40, 6 May 2009 (UTC)Reply

A compromise proposal

The label CHINESE would be taken to mean Mandarin Chinese. As a statement of general principle, any translation that is different from that would be labeled as such and alphabetized as with any other language. Cantonese comes before Catalan and Cebuano, Hakka after Haitian Creole. None of these would be called Chinese under any heading. Aside from Old Chinese and Middle Chinese, the only translations that could be labeled Chinese are the Mandarin translations, which need not say Mandarin at all, depending on how the community wishes to treat that language. In other words, each language heading corresponds with a row in the translation table. A possible exception to this are the Min languages, the bifurcation of which is an issue that should be decided separately. Likewise any proposals to isolate variants of Mandarin such as Dungan and Jin would be addressed separately. This is a compromise because the branches of Chinese would retain their own names, allowing "Chinese" itself to remain in use for Mandarin translations without ambiguity. DAVilla 03:21, 7 May 2009 (UTC)Reply

Doesn't look like a compromise to me. In my opinion, * Chinese ** Mandarin ** Cantonese ** Min Nan, etc. (nested) is far better, only awkward for the moment. You are separating the dialects, which was your original idea, where is the compromise? Didn't you already mention that you were agreeing with having Chinese as the default (meaning Mandarin) with others nested as they get added, eg. * Chinese (meaning Mandarin) ** Cantonese ** Min Nan, etc. (nested). Also, Chinese Mandarin is better than Mandarin Chinese, users start by searching for Chinese translations, not Mandarin. This option is not ideal either. I prefer the status quo, currently we have nested or just * Chinese if dialects are missing. Anatoli 03:45, 7 May 2009 (UTC)Reply
The dialects have always been separated. They have had their own language headers. They have their own bulleted rows in translations, though oddly grouped together. When only "Chinese" is listed it is invariably the Mandarin translation alone. I am not separating the Chinese languages, I am ungrouping them just the way we ungroup every other language family: Serbian and Croatian, the Scandinavian languages, the Arabic languages, etc. What does "Mandarin Chinese" versus "Chinese Mandarin" have to do with anything? This is a distraction I thought I'd separated out for the moment. It's a good idea, yes, but not on point. How should the translations be structured?
I did not mention that I agreed to have Chinese as default. I mentioned that it would be my second preference, as better than status quo. Conditional to that agreement was a distinction between "Chinese" meaning Mandarin and "Chinese" meaning the language family. My first preference is to simply call it Mandarin or Mandarin Chinese. You on the other hand seem to like status quo, inconsistent as it is with the choice of Mandarin as a language header, probably the only standard we have agreed upon to date. Where is the compromise in your proposal? In case it is not clear to you, status quo is contrary to policy. {{zh}} was not a mistake, it was a deliberation.
Let me tell you what you are effectively doing by confusing the two meanings of "Chinese". You are playing into the PRC's politically motivated game to stamp out any sort of Chinese that is not Mandarin. This is contrary to Wiktionary's "all words in all languages" vision. Essentially you say that the other languages of China are imperfect because they are not the true Chinese, equating any so-called dialect with Chinese, and Chinese with Mandarin, and therefore making any dialect a variant of Mandarin. They are not. They are better understood as independent variants descended from Middle Chinese, and Min not even that. Mandarin is not representative in any linguistic way.
Pick one meaning of "Chinese" and stop playing these word games. If you think "Chinese" should mean Mandarin, as was argued above, then you must conclude that the other branches are not to be grouped under a Chinese heading. If you think that "Chinese" should mean the language family, then you must conclude that the correct label for the translations you're adding should include the word "Mandarin" or the like so as to distinguish from the other branches. Now that doesn't say they must be ungrouped, but it does say that * Chinese: by itself would be invalid. If you want to use simply that in translations, as you've said above, then that would equate Chinese with Mandarin, which is why the other languages could not use the "Chinese" label, per this proposal.
Status quo is inconsistent, only serving to confuse the issue. It adds an unnecessary level of complexity to the layout, one we have lived with to date. However, even that has not been followed by Mandarin translators who add solely under the "Chinese" banner. The only consistent way to possibly have all the Chinese languages under the same name is if they all had the same level 2 language header, with dialectical tags in the definition lines. This is probably the view that the PRC would take, but in practice it is a preposterous proposition because the languages are simply too different. The policy we have for structuring translations must extend from the policy we have for listing entries in the first place. This is why in the end yours is a losing argument. DAVilla 09:23, 7 May 2009 (UTC)Reply
I'm really being too negative. To address your point of "Chinese Mandarin" vs. "Mandarin Chinese" etc., my position is to yield that to you. It's not part of the proposal explicitly, but it should be. Pick your favorite one. That's the compromise. You get to pick what you call the dominant Chinese language, we get to pick what to call the others.
Personally it's not the duplication of "Chinese Cantonese" which I dislike in the * Chinese ** Cantonese nesting, it's the nesting itself. Ungrouping gives you the advantage of calling the language whatever you like. The alternative to this proposal is to group (my compromise), but then you would have to call it Mandarin (your compromise). That was what A-cai et al. had settled upon earlier, but it simply hasn't been followed. The result? Chinese translations and Mandarin definitions. And that's why we're here. DAVilla 10:04, 7 May 2009 (UTC)
I'll agree to nesting, even if it's awkward - * Chinese, followed by all dialects (if the translation exists), including ** Mandarin and including cases where different hanzi are used or hanzi are not known/provided. A-cai and others said it worked as a compromise before, suitable for many others. I have no energy right now to respond to your comments about the political PRC games but I'll just say I don't want to neglect any dialects but feel they need to be shown together. If this is OK, perhaps we should make a final vote and see if there are still strong objections. Anatoli 11:23, 7 May 2009 (UTC)

A modest request

I have no strong opinions in the above discussion about labelling Chinese. Whatever decision is reached will be fine for me. However, I do request that the decision be clearly expressed and illustrated in WT:AZH (Wiktionary:About Chinese), so that those of us who patrol will know what format is considered standard, and what formats need modification. --EncycloPetey 02:57, 9 May 2009 (UTC)

Let's take a straw poll. Please indicate your preference(s) for Translations.
  1. Group all Chinese languages under * Chinese. The compromise position may be to use ** Mandarin for translations into that dialect, even when no other dialects are present. However, this can be decided later.
  2. Sort Chinese languages into the full alphabetical list. The compromise position may be to use * Chinese for translations into Mandarin and a ==Chinese== instead of ==Mandarin== language header. However, this can be decided later.
DAVilla 05:23, 10 May 2009 (UTC)Reply
Before we go any further, please everyone take a look at Sinitic languages, if you have not already done so. It actually does a pretty good job of mapping out the languages that are mutually incomprehensible. A while back, I picked the word child in order to illustrate this in a way that the average reader could comprehend. If one were to base the breakdown on mutual incomprehensibility, child would look something like (in part):
The reason it does not look like the above in Wiktionary (besides not having enough people to add the words) is that ISO-639 does not have language codes for all of these (according to my understanding, this will be fixed in a future release of ISO-639). I am open to the idea of equating Chinese in every case to Standard Mandarin, but that would still leave us with something like:
In other words, the "prestige" dialect of each subdialect family would be equated to a top level tag. -- A-cai 01:07, 11 May 2009 (UTC)Reply
Although there are many dialects and subdialects of Chinese, can we limit the number for the sake of the translations being user-friendly? Otherwise, the translations will have a big article attached on all the Chinese dialects. A-cai, how likely is it that we will need translations into Tianjinhua vs translations into Chinese Mandarin? If we deal with Cantonese specifically, can we limit it to standard (prestige) Cantonese and leave Hong Kong/Guangzhou differences in quotes? There are too many issues with Chinese translations - traditional/simplified, PRC vs Taiwan standard, major and minor dialects/subdialects. Can the appendixes deal with the varieties, so that we limit Chinese translations to the highest level and most prestigious dialects? Perhaps, using this map as a guide? Map of Chinese dialects, even if we add some disputed varieties or use different preferred names. The differences between subdialects can be great but do we need to show all possible variants? Anatoli 04:27, 11 May 2009 (UTC)
The decision about what to include, and how to include it, is up to all of us. However, if English Wiktionary's goal is to document the English translations of every word in every language, then we must be clear about what we define as a language. My criterion for calling something a distinct language is intelligibility. British and American English are variants of a single mutually intelligible language. German and English are two distinct languages. I think words from the same language should be located in the same place. For the purposes of this discussion, let's define, for a moment, the "same place" as the same line in a translation table. For example, if English were hypothetically one of the foreign languages here, you would see something like:
A rubber or plastic device imitating nipple that goes into a baby’s mouth, used to calm and quiet the baby.
  • English: (United States) pacifier, (Britain, Australia, New Zealand) dummy, (Canada, Ireland) soother
  • German: Schnuller
Here are a few more Chinese examples that distinguish between variations within a language, and entirely separate languages. Again, since Min Nan and Mandarin are the two languages that I speak, I will use those two:
trash
Here is another:
egg
  • Chinese:
    Mandarin: Template:zh-ts (jīdàn)
    Min Nan:
    Amoy: Template:zh-ts: (Quanzhou, Tainan) ke-nn̄g, (Xiamen, Taipei) koe-nn̄g, (Zhangzhou, Yilan) ke-nūi
And finally:
tomato
I hope this clarifies the situation. I just want everyone to be absolutely clear about exactly what we're voting on. -- A-cai 11:22, 11 May 2009 (UTC)Reply
Whether it is a language, dialect, or subdialect, if it represents how a particular group of people would say it, then it belongs as a translation somewhere in the table. Our ultimate goal is to fully populate those translation tables. This is where the words belong, not in an appendix. DAVilla 05:56, 13 May 2009 (UTC)Reply

Votes

2 because I don't want to see e.g. sub-dialects of Mandarin preceded by ***. DAVilla 05:23, 10 May 2009 (UTC)
1 for me, as Chinese is what i would look for in the meaning of a term. However, i have no strong objections either way. i speak a number of dialects, including Hokkien, Teochew and Cantonese, and feel that they come naturally under Chinese, rather than Mandarin. This is my humble opinion (i could be very wrong), as i can't follow some of the arguments laid out here. Psoup 15:27, 11 May 2009 (UTC)Reply
Just to clarify, under either method Mandarin entries would be found under C for Chinese. This may be C as Chinese > Mandarin if not simply Chinese, but never under M for Mandarin, as there seems to be quite a bit of objection to that.
The question is mainly how the other dialects would be listed. Is Hokkien under H and Teochew under T (or maybe M for Min Nan), or would these be under C as well, for Chinese > Hokkien and Chinese > Min Nan > Teochew? DAVilla 05:33, 13 May 2009 (UTC)
i feel that the Chinese dialects (such as Hokkien) should fall under Chinese. This will help structure the dialects and sub-dialects. For example, there are sub-dialects of Cantonese, such as SayYup (i was born into a SayYup-speaking family), and this should fall under Chinese > Cantonese > SayYup. In other words, SayYup should not be under S. To look up a word in SayYup, i would logically look up the written form in Chinese, and then the phonetic form in SayYup. The other way around does not seem logical to me. (Incidentally, the phonetic differences between SayYup and Cantonese are probably greater than those between Cantonese and Putonghua, and a native speaker of Cantonese in Hong Kong will probably not understand SayYup. However, such a person will be able to find the written word in Chinese, and then drill down to the word in SayYup, if it has been created in Wiktionary.) Psoup 03:36, 14 May 2009 (UTC)
I am neutral on the subject, with two caveats:
  1. I am opposed to lumping every Chinese dialect together under one language header called "Chinese."
  2. If we can't reach broad consensus, I'm in favor of maintaining the status quo. -- A-cai 11:55, 10 May 2009 (UTC)Reply
1. What is the status quo, A-cai? What structure are you suggesting if, for example, you add a new translation? We have translations under simply * Chinese, under * Chinese ** Mandarin and sometimes traditional and simplified on separate indented lines.
There shouldn't be any *** under dialects of Mandarin or Cantonese, otherwise, it will be a mess and we won't help anyone wanting simply to find a word translation into Chinese. The non-standard Mandarin words could be flagged as such regional, dialectal, etc, e.g. I - (ǎn) (regional), how - (zǎ) (regional), etc. I don't see the need for them in translations but in separate entries. Anatoli 13:11, 10 May 2009 (UTC)Reply
1 - this is the ſtraightforward approach, I already expreſt mine opinion, vide ſupra. The uſer hight Bogorm converſation 13:19, 10 May 2009 (UTC)Reply
By "status quo," I'm referring to the format that can be found in the translation section for carnation and child. -- A-cai 15:06, 10 May 2009 (UTC)Reply
That's 1 then. The child has 3 levels. Is it really necessary? Can we keep to 2 levels? For example, ** Min Nan: 囡仔 (gín-á), 孥囝 (nou5gian2) (Teochew)? Anatoli 22:39, 10 May 2009 (UTC)Reply
1 - like carnation.--Ccsheng125 04:51, 13 May 2009 (UTC)Reply
2, or 1. On the actual stated difference between 1 and 2, I think that 2 is better, but that 1 is still O.K. The issue I consider more important is the one labeled "However, this can be decided later": whether to give Mandarin preferential claim over the name "Chinese", or whether to say that all these languages are equally "Chinese" and equally not. If we indent them all under "Chinese", then that should go for Mandarin as well; and if we list them all out separately, then Mandarin should be labeled "Mandarin", not "Chinese". (That said, if we don't group the languages, I would accept something like "Mandarin Chinese" to help people who are searching for "Chinese" to find the Mandarin translation, since that's probably what they want.) —RuakhTALK 16:30, 10 May 2009 (UTC)Reply
1 for me, because that is how we think of this group and how we look it up. —Stephen 15:49, 11 May 2009 (UTC)Reply
If the decision is to split Chinese languages/dialects (I am not in favour of this), "Chinese Mandarin" is better (or even Chinese (Mandarin)) than "Mandarin Chinese" because people look for Chinese dictionaries, not Mandarin. For example, Google translates into Chinese, not Mandarin.
This is not to neglect Chinese dialects and I am not playing any political games. It is for users wanting to know how to write/say something in Chinese, in 99% of cases they want to know the standard or Mandarin translations, when they want the Chinese translations. I know that terms Chinese and Mandarin are not identical terms but they are for many users in practice.

Anatoli 22:39, 10 May 2009 (UTC)Reply

1. Ƿidsiþ 10:56, 13 May 2009 (UTC)Reply
Whoa this is getting waaaaaaaay too long. Partially a case of on my part. Anyway in my opinion the nesting is best so put me down for that. 50 Xylophone Players talk 17:02, 19 May 2009 (UTC)
1. I think that most people would find it most natural to look for Chinese when they want to know what an English word is called in Chinese. Notice that both Google's translation tool and Babel Fish use the word Chinese, not Mandarin. Kinamand 11:25, 2 June 2009 (UTC)

meta:Wiktionary/logo/refresh/proposals

For those who have an interest in the logo vote, now is your chance to decide how nominations should proceed. Conrad.Irwin 19:46, 5 May 2009 (UTC)Reply

Submit it on the proposals page, not here... Conrad.Irwin 15:29, 12 June 2009 (UTC)Reply

Needed: Wiktionary:About Croatian and Wiktionary:About Lithuanian

I would like to ask the main contributors of Croatian and Lithuanian info to the English Wiktionary to make some "about" pages like we have for several other languages.

In particular I have noticed that some editors have been adding pronunciation diacritics in the "alt" version of words in these two languages and I would like it documented somewhere exactly which diacritics are used for each language and what they indicate. For an idea of what else to include in such pages please see the existing ones in this category: Category:Wiktionary:Language considerations. hippietrail 05:50, 6 May 2009 (UTC)

There is no need for Wiktionary:About Croatian - Ivan, who is a Croatian user, already created Wiktionary:About Serbo-Croatian, all you need is there. See the discussion for that at Wiktionary:Beer_parlour_archive/2009/March#Serbo-Croatian (already archived) or on the talk page there where the approach is justified. The uſer hight Bogorm converſation 06:27, 6 May 2009 (UTC)Reply
Then I shall make a redirect. Thanks. — hippietrail 11:42, 6 May 2009 (UTC)Reply
Call me out by name, why doncha! =p I'll see what I can do about WT:ALT. [ R·I·C ] opiaterein 22:54, 7 May 2009 (UTC)

The new look of the translations tables

I haven't been very active since the new look of {{trans-top}} has been effected. I'm not going to assert it's because of that change, but the new look certainly puts me off, seriously. I think it's awful and it gives me associations of being a software programmer or something way too technical for my comfort. I would expect other users to be alienated by it similarly. I have looked around (including the template talk page) for others who had brought this grievance to the fore, but I haven't been able to spot any discussion threads on this topic. Are there some that I haven't noticed? Or am I merely a lone discontented voice out-of-synch with the public opinion on this? __meco 08:44, 6 May 2009 (UTC)Reply

Ya it's pretty clumsy and confusing looking. Off-putting. -- Thisis0 15:27, 6 May 2009 (UTC)Reply
I don't remember the old look. What about the new one puts you off? What associates it with programming? Michael Z. 2009-05-06 15:50 z
I much prefer the new look. The old boxes were impossible to get rid of if you didn't care to see them. Now that they're collapsible, it's much easier to work around them. The only thing I can think of that I would change is the background color. That pale yellow is a little bit icky. — [ R·I·C ] opiaterein23:01, 7 May 2009 (UTC)Reply
Ah, that is not the new change, that's the "old change" – which I'm all for. It's the added options and ability to move entries between columns which I find so confusing and annoying. __meco 01:14, 8 May 2009 (UTC)Reply
If I understand correctly what you are referring to, it's called Assisted Editing, and was coded by Conrad Irwin. See the discussions above entitled "Editing without Wikitext? Introducing User:Conrad.Irwin/editor.js" and "Assisted editing a success?" to read the discussion about it. Hope this helps! 75.215.191.18 (really, User:JesseW/not logged in) 07:22, 8 May 2009 (UTC)Reply
You can turn off the new buttons at WT:EDIT, though if you have an idea for making that interface look more friendly I'd be very glad to hear it. Conrad.Irwin 08:05, 8 May 2009 (UTC)Reply
I wonder if it could be hidden until the user clicks on "Add". (This would allow adding to every translation table on the page?) Also there was the suggestion to spread the interface across the bottom using both columns so that every table doesn't look unbalanced. DAVilla 04:34, 10 May 2009 (UTC)Reply

Special Characters in edit box like Wikipedia

Hey, I would assume this has been asked before, but I can't find the thread. Why-come we don't have a special characters box present on the 'edit' page, to insert special characters? You have no idea how many times I click over and find a random wikipedia page and click 'edit' just to nab some curly quotes or a dash. -- Thisis0 15:26, 6 May 2009 (UTC)Reply

We do. It's just under the Save page button. (If you click on the drop-down menu, you will see even more possible sets of characters/templates than even Wikipedia uses). Conrad.Irwin 15:28, 6 May 2009 (UTC)

Script userboxes see below @ WT:BP#User script templates

If no one objects, I'd like to start making things like w:Template:User cyrl-2 and w:Template:User ipa-3 soon, to show knowledge of various scripts, to go with our knowledge-of-various-language userboxes. I've brought up something similar to this before, but I kinda forgot about it. Anyway, I'd like to start doing this by the end of next week at the latest — [ R·I·C ] opiaterein22:59, 7 May 2009 (UTC)Reply

Template:en-adj provides false information by default

Default use of {{en-adj}} (with no parameters) displays "more"/"most" as comparative and superlative forms of an adjective, which is incorrect for most common English adjectives. If someone writes an entry on an English adjective and is not aware of what {{en-adj}} actually does (one may think it just adds an entry to a proper category) and does not look carefully at the preview, they may save false information.

Most recent example: edit in "phoney" by a native English speaker. I think that as not all editors of English Wiktionary and definitely not all readers of it are native users of the English language, many of them may not spot the mistake in entries.

I think that no template, and especially this, should generate any automatical, default inflection forms. --Derbeth talk 10:53, 9 May 2009 (UTC)Reply

Well, actually, most of these more + [adj.] and most + [adj.] constructions do occur, and in the case of the more common adjectives, virtually all of them can be attested in such comparative and superlative constructions. Take, for example, your example: “more phoney” is pretty common, whilst “most phoney” is also pretty clearly attestable; OTOH, phonier and phoniest are more common (whereas phoneyer and phoneyest are much rarer). Whether such constructions are standard or not is very much up for debate; however, since we ostensibly wish to include “all words in all languages”, it is appropriate for {{en-adj}} to display these comparative and superlative constructions automatically since, in the vast majority of cases, it will be reflecting the facts by doing so. (u):Raifʻhār (t):Doremítzwr﴿ 11:29, 9 May 2009 (UTC)
It's a little hard to check this on the corpus of Current American English because [most + Adj] will find both relevant constructions like the most important thing we can do, where most modified important, and irrelevant constructions like most young people enjoy pop music, where most determines people. Still, with that proviso in mind, I poked around and it looks like there are thousands of legitimate most + Adj combinations, where there are fewer than 1,000 adjectives that take the morphological ending -est.--Brett 15:20, 9 May 2009 (UTC)Reply
It’s also worth noting that almost all monosyllabic words can form their comparative and superlative forms by the suffixation of -er and -est, respectively, as can very many disyllabic words; however, very few tri-or-more-syllabic words can do this, their forms being constructed phrasally as more + [adj.] and most + [adj.]. (u):Raifʻhār (t):Doremítzwr﴿ 15:26, 9 May 2009 (UTC)
I rather doubt that the community would ever accept your viewpoint that "no template [] should generate any automatical, default inflection forms" since we use just that feature to generate English plurals, as well as most forms in most declension tables throughout Wiktionary. It would be extremely time-consuming to have to enter the 100+ inflectional forms of Latin verbs by hand, rather than using the current templates (which require the user to enter at most 6 parameters). I fail to see how asking users to enter those 100+ forms by hand would reduce the number of errors generated. I expect quite the opposite would happen; we would have more errors and they would be harder to spot. --EncycloPetey 21:19, 9 May 2009 (UTC)

I think it's better to have no information in 100,000 entries than have false information in say, 1,000 entries. Like on Commons: better to remove all files with missing copyright information than risk that one per 1,000 or 10,000 of these files would cause a legal action against the service. Providing false information causes service reputation to be seriously damaged, it's not easy to rebuild it later.

If Wiktionary is to follow the rule of "no original research" and verifiability, it should not take Google as a reliable source, because there are lots of people who don't know how to speak or write in their own language. --Derbeth talk 18:09, 9 May 2009 (UTC)

I beg to differ with utmost emphasis. If they "didn't know how to speak or write", then they wouldn't write to begin with, and we couldn't understand the language. "I just don't like it" is valid on Wiktionary too. More people are writing, and more of that writing is accessible now than ever before. It's patent to anybody with a modicum of logic that there will also be more documented language variation (compare the similar variations when Old English began to be written, and in the 18th-19th centuries when more people began writing; these were times of massive language creativity). Circeus 19:30, 9 May 2009 (UTC)
Wiktionary cannot, and does not, use "NOR"; it does use WT:CFI, which does prevent use of anything that is not "durably archived" (i.e. most of the internet). If you think that words don't meet CFI, then they should be deleted or RFV'd; if you think CFI is wrong, then that is another matter - but I doubt it will be changed much. Many of our inflection templates automatically cater for the most common types of word - this is desirable, as it saves effort on behalf of contributors. As with everything, mistakes will be made (more so by newcomers), but these are (from experience) no more common than the mistakes that are routinely made throughout entries. Conrad.Irwin 19:47, 9 May 2009 (UTC)
A good bulk of the entries that use -er and -est have already been created because these are, generally speaking, the shorter words. For the rare exceptions, a contributor will be very likely to catch the mistake even if unfamiliar with Wiktionary when he or she reviews the page. Plus, virtually all new entries by anons and new users are looked over by admins (SB in particular) to check for vandalism. There are bigger worries for correctness of content. DAVilla 04:25, 10 May 2009 (UTC)Reply
It is unthinkable to expect less than 1% error rate in a wiki. Also, errors of fact on one hand and copyright violations on the other hand are two groups of issues with varying risk and seriousness levels. And, Wiktionary is a descriptive dictionary, documenting above all how people actually use the language in durably archived media, not how someone thinks people should be using the language. Google is not being used as a source; Google is used to find sources that use the terms to be documented, such sources as printed books available in Google books. --Dan Polansky 08:15, 10 May 2009 (UTC)Reply

I think that making a template provide false information in more than marginal number of cases by default is completely crazy. It's like introducing mistakes in random places all over the project. Yet another recent example I found: uncountable noun shown as countable. If person creating the entry decided (perhaps just because they forgot it) not to fill the word inflection, the entry should have no inflection. Not some "default" one.

You cannot treat all adjectives as using "more"/"most" for building comparative and superlative and you cannot treat all nouns as being countable. The other option is perhaps as popular as the previous. Lots of words describing abstract ideas are uncountable.

I cannot imagine how anybody could trust a dictionary providing random information. This is as if Wikipedia provided a default value for city population in its infoboxes. Fortunately it does not. --Derbeth talk 09:51, 10 May 2009 (UTC)

For the template to generate an incorrect result, it would have to be used by someone who knew of the existence of the template, but not of the documentation (which would imply a newcomer), and not have had their edit checked, and most newbie edits are patrolled (particularly page creation). [Alternatively it could just be a mistake by anyone, but mistakes happen everywhere all the time]. While I will happily admit that there are problems with the templates that we use, doing the right thing by default is not one of them. Conrad.Irwin 10:17, 10 May 2009 (UTC)Reply
This seems to belong to the broad topic of defaults in templates. Given you are a software developer, think of a method in C++ that provides a default parameter. It is up to the caller of the method (or template) to make sure the default behavior fits the case to which the method is applied; if it does not fit, the parameter should be explicitly provided by the caller. While Java does not support defaults directly in the way in which C++ does, it can do a similar thing by having several methods of the same name but varying number of parameters. This is routinely done in Swing. Errors in method calling are by no means constrained to the misuse of default parameters.
It is not the dictionary or the template that provides "random information"; it is the person who entered the template without explicit parameters who entered an error. By your reasoning, there should be no defaults for templates at all (defaults that make a difference in facts rather than formatting), because there is always a chance that the editor forgets to enter a parameter when entering the template. But then, there is also the chance that the editor enters a wrong template. And there is also the chance, one for which wikis are criticized, that editors who do not know what they are doing enter false information. As I see it, the burden for using templates correctly is with the person who enters the template. There is the broad wiki principle that wikis are almost by definition revisionist, Popperian, so to speak, not only getting cumulatively extended, but also getting corrected. Entering wrong information without templates is an order-of-magnitude more ample source of wrong information than a template that is not fool-proof.
On the benefits side, defaults in templates make it clear that there is a regularity to which there are exceptions, even if numerous exceptions.
To take another specific example of a template, there is the, AFAICT, very useful automatic declension template {{cs-decl-noun-auto}}, which requires the user of the template to verify that the results are correct. Still, that makes it possible to see clearly whether the declension of the word is perfectly regular or whether there is an exception. The template saves a considerable amount of work, but produces wrong results when entered mindlessly.
Not all parameters of templates (and methods) are amenable to defaults. It is when a parameter gets the same value for a large number of calling cases that a default is in order. There is no repetition of sizes of populace of cities, so there is no supported default. Because the majority of English adjectives do get graded using "more ..." and "most ...", a default makes sense. On the other hand, if there were no exception to the rule, there would be no parameter at all, as there would be nothing to distinguish. --Dan Polansky 11:25, 10 May 2009 (UTC)
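The default-parameter analogy above can be sketched briefly. Python is used here only for brevity (the comment itself refers to C++ and Java), and the function name and its parameters are hypothetical: this is an illustration of the argument, not the code of any Wiktionary template.

```python
from typing import Optional, Tuple


def adjective_forms(word: str,
                    comparative: Optional[str] = None,
                    superlative: Optional[str] = None) -> Tuple[str, str]:
    """Return (comparative, superlative) for an English adjective.

    Hypothetical helper: when the caller gives no explicit forms,
    it falls back to the "more"/"most" default.
    """
    if comparative is None:
        comparative = f"more {word}"
    if superlative is None:
        superlative = f"most {word}"
    return comparative, superlative


# The default fits, so the caller supplies nothing extra.
print(adjective_forms("beautiful"))
# The default does not fit, so the caller overrides it explicitly.
print(adjective_forms("good", "better", "best"))
# The caller forgot to override: the output is wrong, and the error
# was introduced at the call site rather than inside the helper.
print(adjective_forms("big"))
```

The third call is the case under dispute in this thread: the helper behaves as documented, yet the displayed result is wrong because the default was accepted without checking.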
Derbeth, is this a purely hypothetical situation, or did you find a significant number of erroneous entries? Which ones? Did you correct them? --EncycloPetey 19:59, 10 May 2009 (UTC)Reply

User script templates

So I got right on the ball with this one for once, and now you can use these templates in your Babels the same way you use the language templates. Hope this catches on well :) — [ R·I·C ] opiaterein 23:22, 9 May 2009 (UTC)

Forgot to mention, in case anyone feels like knowing before jumping into the category: we currently have 13 scripts fully covered from level 0 (no knowledge) to N (native user): Arabic (code Arab), Armenian (Armn), Bengali (Beng), Cyrillic (Cyrl), Devanagari (Deva), Georgian (Geor), Greek (Grek), Hanzi (Hani, Hant and Hans), Hebrew (Hebr), Latin (Latn) and Thai (Thai). — [ R·I·C ] opiaterein 23:26, 9 May 2009 (UTC)

Attestability for transcriptions

Should not words such as hoeng1gong2 jyu5jin4hok6 hok6wui6 jyut6jyu5 ping3jam1 fong1on3 be attested (use, not mention), or is there a special rule for transcriptions? I would think that a single attestation might be sufficient in such cases, but that at least one should be required. I cannot find the answer in policy pages. See Talk:hoeng1gong2 jyu5jin4hok6 hok6wui6 jyut6jyu5 ping3jam1 fong1on3. Lmaltier 17:06, 10 May 2009 (UTC)Reply

I don’t see why they should be attested, but I also don’t believe we should have entries for most transcriptions. Mandarin written in Pinyin is one of the few exceptions, since some publications actually write Mandarin this way. Entries for Hepburn transcriptions of Japanese are okay, since they are so standard and we often see Japanese written that way. But delete hoeng1gong2 jyu5jin4hok6 hok6wui6 jyut6jyu5 ping3jam1 fong1on3 and confine the transcription to 香港語言學學會粵語拼音方案. —Stephen 15:56, 11 May 2009 (UTC)Reply
This was my concern. I agree. Lmaltier 19:27, 11 May 2009 (UTC)Reply

xx-inflection of

I want to create a couple of language-specific inflection "offspring" of {{inflection of}}, starting with {{is-inflection of}}, which should support inclusion of {{strong}}, {{weak}} and {{posi}}. Can someone do this for me so that I might be able to manage the creation of any others that may need to be created? 50 Xylophone Players talk 19:13, 10 May 2009 (UTC) P.S. Explain everything in plain English, as I know next to nothing about programming languages, etc.

Note: The template already supports those, it just doesn't do so with abbreviated forms. The template was set up to allow abbreviated forms for a few of the more common grammatical items, but when something is included that isn't part of the abbreviated set, it displays whatever was entered.
It should be a simple matter to set up {{is-inflection of}}. I can do that, if you like. But, as I said, the current template will already accept and display strong, weak, etc. if you include those words as unabbreviated parameter values. --EncycloPetey 19:57, 10 May 2009 (UTC)Reply
Oh, really? Well, I don't know enough about this stuff to have been able to realise that. I think I'll make {{stro}} as a redirect (I'm just thinking for the sake of speedy creation) I seem to remember being unable to type indef and get the desired result with inflection of before. Can you explain to me why this occurred? 50 Xylophone Players talk 20:12, 10 May 2009 (UTC)Reply
If you want to add additional shortcuts, then you should have a separate template. Because of the template coding, only the most common and widely used options were included. Adding more options increases strain on the server. There is also little reason to abbreviate "strong" to four letters. If you use "indefinite", it will support that. If you see a need for a separate template with its own shortcut "indef", then that is also a possibility. However, the current setup does not support the abbreviation "indef". Each additional abbreviation increases the server strain from the template for all entries using the template. --EncycloPetey 20:14, 10 May 2009 (UTC)
So how does {{hu-inflection of}} accept ine, tran, efor, cfin, etc? 50 Xylophone Players talk 20:21, 10 May 2009 (UTC)Reply
It calls another template {{hu-grammar tag}} that has been given a list of acceptable abbreviations for use with the template. Open the source for that secondary template to see the list. Even with no programming experience, the list in the coding should make sense.
The way the two templates function is: The primary template accepts the data and formats the text for the definition line. The primary template asks the secondary template for help with the grammar abbreviations (any parameters in position 2 or 3, in this Hungarian instance). The secondary template then contains a list of acceptable abbreviations, checks against that list, and expands it to the full form when one is used. --EncycloPetey 23:05, 10 May 2009 (UTC)Reply
Ahh, so what I need is {{is-grammar tag}}? 50 Xylophone Players talk 23:15, 10 May 2009 (UTC)Reply
Yes. Note that the method used for the Hungarian templates differs from the way the generic {{inflection of}} template works. It is able to do so by (1) limiting the number of calls to the secondary template to just two, and (2) having a language-specific list of abbreviations. By contrast, the generic template allows for a variable number of grammatical parameters, but is limited to a small set of abbreviated forms. Anything it doesn't recognize as an abbreviation is presented as it was entered. --EncycloPetey 23:05, 10 May 2009 (UTC)
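The division of labour between the primary and secondary templates described above can be sketched outside wiki markup. The following Python is only an analogy under stated assumptions: the abbreviation table, the function names and the output wording are all hypothetical and are not copied from {{hu-grammar tag}} or any other real template. For simplicity it accepts any number of tags, whereas the Hungarian templates are described above as limiting themselves to two.

```python
# Hypothetical abbreviation table for a language-specific "grammar tag" helper.
# The entries are illustrative only.
IS_GRAMMAR_TAGS = {
    "stro": "strong",
    "posi": "positive degree",
    "indef": "indefinite",
}


def grammar_tag(value: str) -> str:
    """Secondary step: expand a known abbreviation, else return the value as entered."""
    return IS_GRAMMAR_TAGS.get(value, value)


def inflection_of(lemma: str, *tags: str) -> str:
    """Primary step: format the definition line, delegating tag expansion."""
    expanded = " ".join(grammar_tag(tag) for tag in tags)
    return f"{expanded} of {lemma}"


# "stro" and "posi" are expanded; "nominative" is not in the table
# and is therefore shown exactly as it was entered.
print(inflection_of("góður", "stro", "nominative", "posi"))
```

Adding a new shortcut then means adding one entry to the language-specific table, without touching the formatting step.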

Webster's quoteless quotes

Many word senses imported from Webster's 1913 dictionary end with an author's surname, but without a proper quote (e.g. upflow). Obviously a real quote would be preferable, but is there a better way we can format this? Single word surnames should obviously be expanded to full names so that when "Speed" is mentioned after sense #2 of abider one doesn't confuse w:John Speed and speed. Anything more than this? --Bequw¢τ 09:32, 12 May 2009 (UTC)Reply

[This version of abider] is the one that illustrates Bequw's point. DCDuring TALK 11:49, 12 May 2009 (UTC)Reply

I wonder whether some adept could devise or suggest one or more tools to speed the location and insertion of the proper quote. The idea would be to automagically insert the author's name and the quote (or headword) into a google search template and then speed the manually selected quote into {{quote-book}}. DCDuring TALK 10:58, 12 May 2009 (UTC)Reply
It is possible to speed the quote-finding process by wrapping {{b.g.c.}} around the quote, previewing, and getting the material from the search results. Often a WP link to the author is needed for the original year= date. I don't know whether further speeding is possible, but I haven't been inserting urls because of the extra keystrokes required. DCDuring TALK 11:49, 12 May 2009 (UTC)Reply

Reverts at Wiktionary:Criteria for inclusion.

See Wiktionary:Criteria for inclusion?action=history.

The banner at the top of the page clearly states that it "should not be modified without a VOTE." Further, we recently had a VOTE to change that to "should not be modified without discussion and consensus. Any substantial or contested changes require a VOTE", and said VOTE failed. However, editors continue to make modifications — without even discussion, as far as I can see.

So the de facto policy seems to be merely that "Any substantial or contested changes require a VOTE"?

RuakhTALK 13:36, 12 May 2009 (UTC)Reply

I think it's more like "Any policy changes require a VOTE", which leaves out typos and the like (imagine if we did a VOTE to correct every typo) -- Prince Kassad 13:41, 12 May 2009 (UTC)Reply
Re: "imagine if we did a VOTE to correct every typo": Yes, that's why most people voted not to require a VOTE for unsubstantial and uncontested changes — but that failed, so we do. (In the case of a typo, I think a note at the talk-page, and waiting a day or two for comments (to make sure that it is in fact a typo), should suffice for the "discussion and consensus" clause.) —RuakhTALK 13:57, 12 May 2009 (UTC)Reply

FYI, the referenced vote is Wiktionary:Votes/pl-2009-03/Removing vote requirements for policy changes. 16 people said it should be OK to make minor changes like this one, but 7 people opposed the idea, so the voting requirement stands.  :-( —Rod (A. Smith) 15:22, 12 May 2009 (UTC)Reply

We would probably need to have a dreaded Vote to amend WT:CFI. For example, we need to:
  1. incorporate no-change-without-a-vote language,
  2. correct those portions that are drafted so as to allow internal contradictions to develop
  3. remove elements that might be better on a more flexible policy page (not as tightly protected as this) or a guidelines page.
You could think of WT:CFI as a part of a wiktionary "constitution", intentionally inflexible. But perhaps we haven't finished drafting it yet. DCDuring TALK 15:44, 12 May 2009 (UTC)Reply
The irony of it is that no one ever approved the CFI; as far as I can tell, it magically became immutable when a certain editor replaced the old version of {{Policy-SO}} ("This Policy has Semi-Official status, having some degree of support in the community. Please see the discussions on the attached Talk: page if you want to contribute to the further development or adoption of this policy, to bring it to the level of Wiktionary Official.") with a redirect to {{policy}} (which he had just changed to read "Wiktionary Policies, Guidelines and common practices page. It should not be modified without a VOTE."). Before 28 January 2007, we did have a fully-functional VOTE process, but for some reason the aforementioned editor bypassed that process when he instituted CFI (and all other policy think-tanks and semi-official policies) as policy that could only be mutated by that process. (Note: I'm not naming the editor in question, because I don't know whether he was acting unilaterally, and it doesn't seem that anyone really objected. But if you want to ask him about it, look in the history of any of the policy templates.) —RuakhTALK 16:03, 12 May 2009 (UTC)
Right. CFI became official with this change. The only related comment I can now find is the one currently at the top of Template talk:policy, but I think it was part of a policy struggle between that editor and Gerard (who subsequently left Wiktionary to start the project now known as OmegaWiki). —Rod (A. Smith) 16:50, 12 May 2009 (UTC)Reply
DAVilla, Ruakh and Rod are right. If more people than those who actually voted wish to change the metapolicy that currently reads "It should not be modified without a VOTE", let them state it here and the vote can be restarted.
On another note, there is a thing about the metapolicy that I, a non-native, find slightly confusing. It says "should not", not "must not". In Czech, "should not" would be read as "in most cases better avoided, but people will do it anyway", but I vaguely remember that, in English, "should not" is used as a polite way of saying "shall not" or "must not". Then, is "should not" actually meant as "shall not" and "must not", that is, "under no conditions and circumstances is it allowed under the pain of penalty of blocking" or something of the sort? (Excuse the non-native confusion, please.) Do we have a guide here on Wiktionary on these deontic (duty-like or obligation-like) modalities? --Dan Polansky 16:04, 12 May 2009 (UTC)
Your understanding is right. "Should not" is very strange here, since it doesn't imply that it's absolutely forbidden (as, say, "must not" would), but does imply that there are no exceptions (unlike, say, "generally shouldn't"). "Should not" makes sense in circumstances where there's no enforcement (e.g., "You shouldn't expect him to come", where it's entirely up to you whether you actually do expect him to come), but it doesn't make much sense here. (Re: "I vaguely remember that, in English, 'should not' is used as a polite way of saying 'shall not' or 'must not'": that's true, but it's a very bureaucratic thing, used when an institution has a great deal of control. In normal circumstances, the polite way to say it would be something like, "Please do not modify this page, except to implement the results of a VOTE.") —RuakhTALK 17:01, 12 May 2009 (UTC)Reply

Does anyone else find this perversely hilarious? Michael Z. 2009-05-12 16:16 z

Hell, yes.  (u):Raifʻhār (t):Doremítzwr﴿ 16:42, 12 May 2009 (UTC)Reply
It seems like an organic, common-law-style, wiki-like development of a constitution, more visible if not more risible than the typical law- or sausage-making process. DCDuring TALK 16:52, 12 May 2009 (UTC)Reply
Notwithstanding its origins. DCDuring TALK 17:05, 12 May 2009 (UTC)Reply
I would, if I didn't find it so frustrating. The CFI are seriously broken, and there's no consensus for them — but somehow they're policy, and we can't fix them without consensus? Even if the latest VOTE had passed, I'd find this frustrating; but its failure is beyond. —RuakhTALK 17:01, 12 May 2009 (UTC)Reply
The recently failed vote and the nature of the trivial factual edit which prompted this emphasize that both parties are in the right. Our guideline has not only utterly failed, but thwarts resolution. Please, let's rerun the other vote immediately instead of wasting time voting to make the change to CFI. Michael Z. 2009-05-12 18:25 z
Reverting improvements is petty.
Any improvement made is an improvement. Reverting it because of some ridiculous notion of "incorrect process" is beyond petty - it's a harmful waste of everybody's time.
Out of date policy pages are useless.
Our policy pages don't match our policy/"universal accepted norms". This needs fixing, by whom and using what "process" I really don't think it matters.
The VOTE about policy pages was flawed.
The issue is that the supporters were in favour of changing "policy pages", and the opposers were against changing "policy". These are not the same.

Conrad.Irwin 18:38, 12 May 2009 (UTC)Reply

The issue is not "incorrect process", but rather "violation of policy". And an improvement is not an improvement if it establishes the precedent that policy is not policy. Without some sort of consensus (1) that the edit was O.K. and (2) on exactly why it was O.K., it simply creates problems. And if we have that consensus, then we should have no difficulty passing a VOTE to fix {{Policy-SO}} accordingly.
If you believe the VOTE was flawed, then please, for Pete's sake, start a non-flawed one!
I wonder what would happen if we had a VOTE to accept CFI as policy?
RuakhTALK 20:06, 12 May 2009 (UTC)Reply
In no way were the improvements that have been made thus far a violation of policy, taking a purely legalistic view. Firstly, the text "It should not be modified without a VOTE" was added with no VOTE, and has thus no "policy" status. Secondly the vote that failed was "Removing vote requirements for policy changes" NOT "Removing vote requirements for policy page changes". People have always tweaked these pages to improve spelling and other similar mistakes without the need for a VOTE, the fact that a vote for the negation of something similar failed does not mean that a VOTE for "No one may edit a policy page without a VOTE" would succeed, and given that it failed with the majority supporting it, I find it incredibly hard to believe that the opposing point of view has demonstrable support either.
I have started a further discussion below, that may lead to a VOTE.
Voting on CFI is something we should do, but it will need a few hours of work before the page reflects the policy accurately enough to be voted upon. Conrad.Irwin 00:54, 13 May 2009 (UTC)Reply

I wonder: why are WT:CFI and WT:ELE not protected so that only admins can edit them, just like {{policy}}? That would reduce unwanted reverting interchanges to those among the wheels. This would make it less critical to protect the pages by soft "should-not-be-modified" meta-policies. Edits like this from 2 March 2009 would be impossible. Afterwards, relaxing the "should-not-edit" metapolicy should be less critical. --Dan Polansky 21:16, 12 May 2009 (UTC)

Doing so would bring false benefit, as admins are just as capable of making a mess of them as anyone else. Conrad.Irwin 00:54, 13 May 2009 (UTC)Reply
Although that is inconsistent, you are finding flaw with the wrong policy, in my view. Admins are trusted parties, but we are not the gatekeepers of what does and does not pass here, apart from what the community as a whole has decided, which is always open to interpretation in the first place. For instance, redirects have always been strongly opposed, but there are exceptions made even by those who oppose them. My views on how Chinese should be listed in translations are no more correct than those of a new contributor. Wiktionary documents all words in all languages, as is said time and again, but it turns out constructed languages needed further deliberation. Setting policy in stone is entirely anti-wiki. DAVilla 05:20, 13 May 2009 (UTC)Reply
I think there is a series of increasing levels of trustworthiness, from (a) anons, (b) registered users, and (c) admins. None of the groups is infallible, but there are clear differences. While many registered users are inexperienced and many take part in vandalism before they get blocked, this cannot be said of an admin, who first needs to gain a great majority - 75% support of the community in a formal vote to become one.
Only time would show what kind of further disputes admins would create about WT:CFI if WT:CFI is locked and the soft rules for its modification are relaxed.
Locking WT:CFI, as has now been done, only makes the soft metapolicy technically effective: given WT:CFI should not be modified without a vote, there is no point in the page being editable by any registered user. If WT:CFI is unlocked again, this will mean that Wiktionary:Votes/pl-2009-03/Removing vote requirements for policy changes is not taken all that seriously as failed. Now I am not saying that the vote should be considered failed; I do not know, and I think the only thing worse than an unfair domination of a minority by a majority is an unfair domination of a majority by a minority, hence the imperfect but practical majority (50%) rule used in democracies. --Dan Polansky 09:09, 13 May 2009 (UTC)Reply
I'm not so sure we should consider that vote as having failed, but even if we have to put it through again it will eventually pass in some form, at which point the basis for a lock will have to be reconsidered. DAVilla 17:44, 13 May 2009 (UTC)Reply

Update. I've un-failed the aforementioned vote (q.v.). —RuakhTALK 19:53, 13 May 2009 (UTC)

Moving on

I feel that we must be very careful that we never need to introduce an Ignore all rules "policy" which, I assume, became necessary on Wikipedia because the policy lawyers were making bad rules. As I see it, at the moment our two main policy pages (and many of the others) are flawed because they try and interweave too much of what is "policy" with explanations and descriptions of "common practice". I propose changing this so that WT:ELE and WT:CFI contain documentation on the "current practice", leaving the pages Wiktionary:Layout policy, and Wiktionary:Inclusion policy containing short paragraphs codifying actual explicitly worded and voted on "policy". Yes, common practice will never contradict policy, and new policy can (and should) be derived from common practice. Then we could usefully have a rule that "policy cannot be modified without a VOTE", while leaving WT:ELE and WT:CFI under a looser restriction such as "An incorrect change to this page may result in a block." While this might seem an extreme solution, it is clear to me that we need to do something to prevent there being any "policy" that can ever be violated for a justifiable reason. Conrad.Irwin 00:54, 13 May 2009 (UTC)Reply

Strong support. RuakhTALK 01:02, 13 May 2009 (UTC)
That's the cleverest idea. Do it, and please unprotect the page. Admins are users who may still skim but no longer scrub the policy pages, and cannot be solely trusted to polish them. DAVilla
Sounds good. However, "Yes, common practice will never contradict policy, and new policy can (and should) be derived from common practice" looks like a statement of a deadlock. A change needs to happen somewhere first: either in the practice or in the policy. If a policy is meant to always fall behind practice (as it is derived from it), then a policy cannot be considered unconditionally obligatory. The very term "policy" seems odd to apply to a document that only documents current practice after the fact. --Dan Polansky 08:42, 13 May 2009 (UTC)Reply

Just a minor comment. But having a vote to change the numbers in CFI to reflect how many ISO codes there are is completely ridiculous. You want to vote on keeping Wiktionary up to date? You want to vote on such a stupidly minor issue? I can't believe this made it to BP and how silly you all can be sometimes :P Shit like this is the reason I don't edit Wikipedia anymore. But then, I doubt even they would treat this issue as it's being treated here and now. — [ R·I·C ] opiaterein 01:56, 13 May 2009 (UTC)

Re: "You want to vote on such a stupidly minor issue?": Don't worry, I don't think anyone wants that. At least, I really hope no one does; and I'll be amazed (in a bad way) if this actually ends up coming to a vote on that. The question is, how do we avoid such stupidly minor votes? —RuakhTALK 02:17, 13 May 2009 (UTC)Reply
Support. What Conrad said seems great. DCDuring TALK 03:22, 13 May 2009 (UTC)Reply
Yes. The challenge will be figuring out what needs to be moved over to policy before switching ELE and CFI to descriptions of common practice. Hopefully RU and msh and the others who supported hard-to-change policies will speak up about what parts of ELE and CFI they consider critical. JesseW 05:42, 13 May 2009 (UTC)Reply
I've no time at the moment, I'm afraid. But surely those who voted for easier-to-change policies also have views on that question.—msh210 17:06, 13 May 2009 (UTC)Reply
I support CI's proposal.—msh210 17:06, 13 May 2009 (UTC)Reply

An analogy with constitution has been invoked, to justify the requirement of 75% majority. Isn't it that a constitution first needs to be accepted by such a great majority in order to become effective? Under this line of thought, the statement "the policy should not be modified without a vote" has never acquired the status of constitutional statement. Put differently, in order for a change of a statement to require a great majority (75%), the statement first must have been enacted by a great majority (75%). Thus, the statement "the policy should not be modified without a vote", which looks like constitutional for its being a meta statement about policies, is not constitutional for its never having been voted on, and thus, formally speaking, is invalid, not formally enacted. --Dan Polansky 09:09, 13 May 2009 (UTC)Reply

It need not be done exactly like that. The Magna Carta was never voted on by the "people", nor was it signed, but was considered to enact many of the rights of English commoners. It was itself largely based on the w:Charter of Liberties, an earlier statement of limits on the powers of the Crown voluntarily decreed by Henry I.
We would look to WMF for hard and fast procedure. Otherwise, it is our own desires and expectations that shape things. My only thoughts were that our rules about voting itself were worthy of some kind of strong "constitutional" protection. Otherwise, there just needs to be some core of policy which requires votes to change. Whatever we invoke to justify summary deletion of contributions and blocking of users (as we now invoke WT:CFI) should be policy. (Not to say that enforcement won't require interpretation and discretion.)
There should be plenty of opportunity for guidelines and draft and proposed policies and guidelines that codify our best practice and thought for accessibility. I would argue that no uncodified practice should have any force whatsoever, so as to compel more codification so would-be contributors have something to learn from. Lists of leading examples of good entries would be a perfectly good alternative approach. DCDuring TALK 18:16, 13 May 2009 (UTC)Reply
Support with modification. I agree very much that we need to separate the policy from the "how-to" in our two primary documents. However, I prefer keeping ELE and CFI as the policy pages, since that is what they have been in the past (and what they are in past discussion threads). Rather than convert them to practices pages, with policy split out into new pages, I advocate distilling down the policies in those pages while relocating practices descriptions to new locations. --EncycloPetey 01:42, 21 May 2009 (UTC)Reply

are we biting the newcomers?

I hate to criticize Wiktionary's fine administrators, but there is something that has been bothering me for some time. It seems many users are blocked for making just a single test edit or two. For example, 216.120.137.66 (talk) made a test edit to compensate, which I had reverted. There was no further vandalism from 216.120.137.66 after I warned them, but they were still blocked. Granted, I am not an administrator and have no access to Special:DeletedContributions, so I cannot tell if any of those users were blocked for creating inappropriate pages. However, I feel that these blocks go against Wiktionary:Assume good faith.

In comparison, Wikipedia users who make disruptive edits are often warned several times before they are blocked. On Wiktionary, many blocked users were not even warned once. I know Wiktionary isn't Wikipedia, but I feel that many of the blocks are a bit on the harsh side. Wiktionary's blocking policy clearly states that "simple 'Dave is a dork!' edits probably don't merit a block unless they are persistent."

Any thoughts? --Ixfd64 11:30, 13 May 2009 (UTC)Reply

FYI, there are no deleted user contributions. DAVilla 17:57, 13 May 2009 (UTC)Reply
Yes, many. Look in the archives of this page for numerous similar discussions. Essentially problems occur, either:
  1. Deliberately and unconstructively (as in this case, in which case the editor is blocked or ignored).
  2. Through lack of understanding of Wiktionary (in which case the editor is welcomed, and their mistakes are explained).
As, in the first case, the person was not trying to help, there is (from experience) little point in trying to communicate with them. There are occasional mistakes in classifying editors between the two, but these are rare. Conrad.Irwin 12:21, 13 May 2009 (UTC)Reply
A short block serves two purposes. 1) It stops further vandalism. 2) It serves as a record that a person (or ip address) has vandalised before - so any subsequent vandalism can be dealt with more severely. SemperBlotto 12:26, 13 May 2009 (UTC)Reply
Question: knowing that mistakes do happen, where an individual is blocked for an edit that is interpreted as vandalism but was not malintentioned (I remember a case where a user had actually reverted vandalism and then was blocked himself, having been mistaken in identity with the vandal in a very sloppy case of detectivework), how can the record you speak of be amended to reflect their amended status as non-vandal, so that a patrolling admin might not so readily assume otherwise? DAVilla 17:57, 13 May 2009 (UTC)Reply
Block him again, for a second, with a note indicating that in your opinion the previous block was mistaken. This would be best coming from the original blocking admin, natch.—msh210 20:41, 13 May 2009 (UTC)
My experience is that, when reverted and otherwise ignored (not blocked nor warned), a vandal stops very quickly (in most cases). This might be the most effective way to deal with them. If they insist too much, they should be warned, but they should also be strongly encouraged to contribute (in a constructive way). When nothing else works, they should be blocked (but blocking does not work with vandals contributing with many different IP addresses...).
I feel that a new helpful principle could be Vandals want to play. Don't play with them!. Lmaltier 20:50, 13 May 2009 (UTC)Reply

Random page per language

Connel's random page per language has been dead for a while and I've been using my Toolserver account a fair bit lately so today I whipped up my own implementation:

http://toolserver.org/~hippietrail/randompage.fcgi?langname=English or http://toolserver.org/~hippietrail/randompage.fcgi?langcode=en

Just replace "English" with any language name or "en" with any language code used in the English Wiktionary.

It only knows about page titles from the last official dump released: 20090509

It does not yet work with language codes.

It does not yet automatically update when new dumps are released.

It has an incomplete "special page":

http://toolserver.org/~hippietrail/randompage.fcgi?langs

Have fun with it! — hippietrail 08:36, 15 May 2009 (UTC)Reply

Awesome, thanks! FYI, thanks to Robert Ullmann's coding and Amgine's (I think) hosting, there are daily dumps available at http://70.79.96.121/w/dump/xmlu/.
BTW, is it intentional that it generates URIs of the form http://en.wiktionary.org/wiki/matere#Middle%20English?rndlangcached=yes instead of, say, http://en.wiktionary.org/wiki/matere?rndlangcached=yes#Middle_English?
RuakhTALK 14:45, 15 May 2009 (UTC)Reply
  • Yes I know about the daily dumps thanks but I still need to set up tools to download them and index them automatically with various error checking on the Toolserver.
  • No the messed up fragment vs query was a last minute hack to see whether keeping all the words files in memory on the toolserver might use up too much resources and get it killed. Anyway I've fixed it now and thanks for pointing it out to me! — hippietrail 15:35, 15 May 2009 (UTC)Reply

Non-Latin text seems to be broken. lang=uk takes me to pages like служниця?rndlangcached=yes#Ukrainian. Michael Z. 2009-05-15 16:43 z

The redirector isn't doing URL-encoding, so I guess it depends how your browser's HTTP implementation (and possibly those of any intervening proxies) handle UTF-8 data in the Location header. It works fine for me in FF3 and IE7 on WinXPPro. —RuakhTALK 17:25, 15 May 2009 (UTC)Reply
Fails in Safari 4/Mac. There is no charset specified in the server response. Doesn't HTTP use ISO-8859-1, unless another charset is specified? Michael Z. 2009-05-16 02:20 z
Should be fixed now. Let me know if not. — hippietrail 04:17, 16 May 2009 (UTC)Reply
Looks good. Thank you. Michael Z. 2009-05-16 15:05 z
Re: ISO-8859-1: Technically yes, but it should be moot, in that there's no need for the browser to do any sort of conversion (except perhaps its own URL-encoding, since the redirecting server obviously failed in that regard). There's no reason Safari should re-UTF-8-ify as it URL-encodes the byte-string. (But, I'm not really blaming it: this is a GIGO kind of situation.) —RuakhTALK 15:23, 16 May 2009 (UTC)Reply
You can select a specific language with http://toolserver.org/~hippietrail/randompage.fcgi?langname=Russian. —Stephen 15:58, 16 May 2009 (UTC)Reply
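For illustration, the two fixes discussed in this thread (query string before the fragment, and percent-encoding of non-Latin titles in the redirect target) could look roughly like the following Python sketch. It is not the tool's actual code: the function name `build_redirect` and its parameters are hypothetical, and the real script may encode fragments differently.

```python
from urllib.parse import quote, urlencode

# Hypothetical helper, not taken from the randompage tool itself.
def build_redirect(title: str, language: str,
                   base: str = "http://en.wiktionary.org/wiki/") -> str:
    """Build a redirect target that is safe to put in a Location header."""
    # Percent-encode the UTF-8 bytes of the title so the header is pure ASCII.
    path = quote(title.replace(" ", "_"), safe="")
    # The query string must come before the fragment, not after it.
    query = urlencode({"rndlangcached": "yes"})
    # Assumption: section anchors use underscores in place of spaces.
    fragment = quote(language.replace(" ", "_"), safe="")
    return f"{base}{path}?{query}#{fragment}"


if __name__ == "__main__":
    print(build_redirect("matere", "Middle English"))
    print(build_redirect("служниця", "Ukrainian"))
```

Because the result contains only ASCII characters, it no longer matters which charset a browser or proxy assumes for the header.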

Category:Filmology

Previous talk:

  1. Wiktionary:Tea_room/Archive_2007/November#Category:Filmology
  2. Wiktionary:Beer parlour archive/2009/January#Category:Filmology
  3. #Category:Filmology, Template:filmology

Individual editors have expressed preference for Category:Cinema and Category:Film, but no one has come to any agreements. The current category name is plain wrong, so it must change. A new category name could be changed again anyway if we dislike it.

I'm going with the experts. OED labels this subject Cinematogr., so I'll now move this to Category:Cinematography. Michael Z. 2009-05-15 17:34 z

Done. Michael Z. 2009-05-15 18:30 z
I haven't followed the previous discussions regarding this matter, but I'm quite sure that cinematography has a significantly narrower range of meaning than, say, film, or filmmaking, or cinema. "Cinematography" ordinarily refers explicitly to the photographic artistry and photographic technical processes employed in filmmaking. Consider that the academy award for Best Cinematography is only one of a great many academy awards in the diverse achievement areas (sound, costuming, editing, writing, etc.) which come together in the making of a motion picture. The "Filmology" category contained far more terms than simply terms relating precisely to the photographic aspect of filmmaking. I think Michael has jumped the gun here and that this change should be reconsidered. -- WikiPedant 22:48, 15 May 2009 (UTC)Reply
Or maybe you were asleep at the wheel when this was discussed over a week and then sat idle in the Beep for another 10 days. I've now acted on consensus, or rather mostly disinterest, and the discussion here at WT:RFDO#Category:Filmology is merely whether to delete an empty incorrect category page.
If cinematography is not your favourite synonym for filmmaking, then please go ahead and propose another change. I'll even help move items to the new name if consensus favours your suggestion over what the OED uses. Michael Z. 2009-05-15 23:36 z
By the way, the name of an awards category of the American Academy of Motion Picture Arts and Sciences doesn't define cinematography. Michael Z. 2009-05-15 23:39 z
Sorry I don't follow the project page discussions with sufficient regularity to meet your standards, but the fact remains that swapping "Cinematography" for "Filmology" was just plain mistaken. "Cinematography" has a much narrower range of meaning. As for the OED, I just spot-checked the OED entries for "pan", "out-take", and "montage" and found them all contextualized as Film. "Film", "Filmmaking", "Motion pictures", or "Cinema" would all be vastly more accurate substitutes for the defunct "Filmology" label than "Cinematography". PS -- I note that you have just rewritten the defn at cinematography, construing the term as a synonym for "filmmaking". Your defn is self-serving and inaccurate; it does not conform to the defns for "cinematography" in the OED ("the use of the cinematograph; the art of taking and reproducing films"), the Random House Dictionary ("the art or technique of motion-picture photography") or Wikipedia ("the making of lighting and camera choices when recording photographic images for the cinema"). -- WikiPedant 04:36, 19 May 2009 (UTC)Reply
1989 OED entries like zoom n. have Cinematogr. Looks like OED has changed the label to Film in 2002/2008 entries. Sorry I missed those. There's no basis for saying that cinematography is narrower than filmmaking; obviously OED thinks they are close equivalents, and film is currently an improvement.
Your “self-serving” accusation is pretty rank. I improved an incorrect and naïve definition. Please don't try to write lexicographical definitions by copying from Wikipedia.
The fact is that for two years the category name was wrong. (Have you even read what filmology actually is?) After 18 months of discussion, this community was unable to agree on anything that was right. I tried yet again to get some interest and find consensus, but that's not going anywhere. So I did some significant work to correct it. Maybe I didn't make the best call, but at least now it's not wrong and we all look a bit less stupid. So because I actually did something to correct a dumb mistake, the trolls come out of the woodwork to tell me what a dick I am.
The discussion is still open. Go ahead and get consensus for your favourite version of cinematography/cinema (that's short for cinematograph)/film/filmmaking/movies/visual media/whatever. Let me know when you do, and I'll even help move the contents of the category. In the meantime, do something useful instead of criticizing me. Michael Z. 2009-05-19 05:06 z
I apologize for my language, and I don't have the energy to tone it down. I've had a very long weekend that didn't go as planned, but I stand by what I said otherwise. Michael Z. 2009-05-19 05:13 z
I'd say let's go for "Film". Does anyone think "filmology" is better than "film" or that "cinematography" is better than "film"? As regards "cinematography", WikiPedant and EncycloPetey expressed their misgivings about the narrowness of the term. While "film" also denotes a thin layer, I don't think anyone is about to create a category for thin layers. Let's avoid trying to find the best option, and see if we can agree on a proposal that, while perhaps imperfect, is good enough: "Film". Agreed? --Dan Polansky 18:47, 19 May 2009 (UTC)
Yes, "Film". -- WikiPedant 23:14, 19 May 2009 (UTC)Reply
I agree. Ƿidsiþ 08:18, 20 May 2009 (UTC)Reply
I'll be happy with film too, by the way. Michael Z. 2009-05-24 18:56 z

Missing transliteration in the translations or entries

What's the method of requesting the transliteration in the existing translations? Or how do you add a translation where transliteration is unknown/not certain for Arabic, Korean, etc.? As an example, I don't have my paper dictionaries handy. Could we have something like this?

Anatoli 04:37, 19 May 2009 (UTC)Reply

Without the ' ' for preference, maybe even just leave it blank so as to not confuse the unfortunate people who see □□□□□ even more (maybe not for arabic, but there are some unsupported languages). On a second note, I was intending to write an extension (either for editor.js or for MediaWiki) to allow for automatic transliteration, is that possible for Arabic? Conrad.Irwin 13:44, 19 May 2009 (UTC)Reply
I put '???' to avoid confusion with left to right problems, looked messy with Arabic. Instead there could be something like Please add a transliteration, if you can.
No, automatic and accurate Arabic transliteration is impossible. Even with a smart program, which would look up words in a dictionary (there is no reliable one available), there are many homographs. The issue (briefly) is with inserting correct unwritten short vowels or identifying absence thereof and geminating (doubling) consonants. The long vowels may be read as consonants or diphthongs, also depending on the preceding/following unwritten short vowels. Some text-to-speech engines do a good job with Arabic but they must be analysing some grammar as well, not just individual words. They are not perfect, anyway, at this moment. Anatoli 19:52, 19 May 2009 (UTC)Reply
I wonder if it could be automated at least for verb translations, where (assuming the bot knows some grammar) there are fewer possible vowel combinations? But maybe it's not worth the risk of errors. —RuakhTALK 20:03, 19 May 2009 (UTC)Reply
It could be automated if arabic words were written with all vowels - as they should be in the wiktionary. There is a category for entries lacking vowels Category:Entries which need Arabic vowels, by the way. Beru7 20:41, 19 May 2009 (UTC)Reply
Trouble is, if you pick up the word from Google translate or many other online dictionaries or simply a text, there are no vowels written. My preference is not to write all Arabic short vowels here but provide the romanisation, that way the Arabic words look the way they look in the real world but there is no problem with the pronunciation, as there is the phonetic guide. The romanisation, as opposed to Arabic vowels, will also unambiguously show marginal sounds, which only appear in loanwords or dialects - /o, e, g, p, v, tʃ, ʒ/, which Arabic vowels cannot do, also the issue with ﺝ and ﻍ where they can be used to represent /g/ instead of the expected /ʤ/ and /ɣ/ in loanwords or Egyptian Arabic (ﺝ). Anatoli 22:39, 19 May 2009 (UTC)Reply
What you haven't noticed, maybe, is that page titles are written with no diacritics for this reason exactly. That means you can cut and paste into the search box and land on the word page. There are already thousands of entries in the arabic section that have been written that way, so I don't think it is time to change that except if there are very, very good reasons to do so. And let's leave dialects out of this discussion: they have their own sections. Beru7 13:28, 20 May 2009 (UTC)Reply
Oh, please don't do that. It just looks messy, and could be mistaken to mean that you're not sure if that's the right translation. An admin would move it to TTBC, a newbie would mimic that with European terms they think might be right. DAVilla 18:37, 20 May 2009 (UTC)Reply
I don't think there's a specific way to do this (though maybe we should create one). I used to append {{rfscript|Arabic}}, which adds the entry to Category:Entries which need Arabic script, and add a comment that the issue is actually the opposite; but now that we have {{attention|ar}}, which adds the entry to Category:Arabic words needing attention, I usually use that instead — it's less precise, but also less inaccurate. But neither approach adds any sort of "transliteration needed" message to the entry. —RuakhTALK 20:00, 19 May 2009 (UTC)Reply
We have the category Category:Arabic words lacking transliteration. That can be used as is, or a template could be written to evoke it. —Stephen 18:56, 20 May 2009 (UTC)Reply
Perhaps these should be used more often. As for automatic transliteration, it is possible for a number of other scripts with some exceptions, including Cyrillic-based languages. Anatoli 22:39, 19 May 2009 (UTC)Reply
Still, automatic transliteration can be very useful for Arabic too. It can transliterate consonants, and leave the vowels to add by hand. E.g., it can give škr for Template:Arab, which can be easily tweaked to šukr by hand. That way you won't have to use the unscientific Arabic transliteration system you employ now (as far as I can remember, the easiness of typing, i.e. sh for š, was the main argument in favor of it). --Vahagn Petrosyan 10:32, 20 May 2009 (UTC)
There is nothing "unscientific" about using sh instead of š. It is a commonly used transliteration for shin, used in many scientific works, and by the library of congress. Beru7 13:28, 20 May 2009 (UTC)Reply
I consider that system to be "unscientific" for two reasons:
1) It transliterates many Arabic letters with two Roman ones. Theoretically, you have no way of knowing whether sh is IPA: /ʃ/ or IPA: /sh/, or gh is /ɣ/ or /gh/. It's ambiguous.
2) It's an anglicization, not a romanization. shukr will have to be changed to chukr in French Wiktionary, and we're supposed to be the papa-wiktionary from where everybody else can plunder everything they wish, without caring to make modifications. --Vahagn Petrosyan 13:50, 20 May 2009 (UTC)Reply
"Unscientific" may not be the best choice of words. But the system is imprecise and ambiguous. Dictionaries which use digraphs to transliterate Arabic utilize other conventions to disambiguate, such as italics or underscoring, neither of which are copy&paste compatible online. Better to have a one-to-one Arabic-Latin transliteration. kwami 18:09, 20 May 2009 (UTC)Reply
To be fair, I just noticed WT:AA proposes to use (-) to distinguish between sh and s-h. Still, this is not the best solution and there is the problem of transliterating ع with 3 which is unscientific. Anywho, we are not here to discuss the problems of Arabic transliteration systems, which can be easily solved if and when Conrad develops his automatic transliteration tool.
(To Anatoli) You can manually add the English entry into Category:Arabic words lacking transliteration or ask Conrad to tweak the Editor to do that automatically whenever transliteration bar is left empty. --Vahagn Petrosyan 19:12, 20 May 2009 (UTC)Reply
I used the transliteration because this was the agreement between at least three users. Why didn't you guys take part in the discussion (Wiktionary_talk:About_Arabic)? If the decision is to change, I will follow. I don't think automatic transliteration is a good idea for Arabic and Hebrew because we will mislead readers with the incomplete pronunciation. Perhaps it could be used to assist typing the transliteration, but the editor should know the pronunciation; otherwise, they may leave it as škr. The automatic transliteration could be used for Korean, though (I know the formula to decompose the Hangul characters into Jamo components), and for Devanagari and Cyrillic scripts. As for my original question, those templates can be used for Arabic entries, not for the translation. I will be away for two weeks and may not reply to any answers soon. Anatoli 22:22, 22 May 2009 (UTC)
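The Hangul formula mentioned above is presumably the standard arithmetic decomposition defined by Unicode; a minimal Python sketch under that assumption follows. The function names are illustrative only, and the output is conjoining Jamo, not a romanisation (a transliteration tool would still need a Jamo-to-Latin table on top).

```python
# Constants from the Unicode Hangul syllable composition scheme.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT           # 11172 precomposed syllables in total


def decompose_syllable(char: str) -> str:
    """Split one precomposed Hangul syllable into its Jamo components."""
    index = ord(char) - S_BASE
    if not 0 <= index < S_COUNT:
        return char              # not a Hangul syllable: leave untouched
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    tail_index = index % T_COUNT
    tail = chr(T_BASE + tail_index) if tail_index else ""
    return lead + vowel + tail


def decompose_text(text: str) -> str:
    """Decompose every Hangul syllable in a string."""
    return "".join(decompose_syllable(ch) for ch in text)
```

The same result can be obtained from `unicodedata.normalize("NFD", text)`; the explicit version only shows where the numbers come from.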

Slow-loading, really big entries

I today found my first ordinary article that gave the "too big for some browsers" warning. It had taken ten seconds or so to load on my broadband connection. This miner's canary of an entry was water. The translation section is huge. It might not be too soon to consider some way of accommodating this. For anyone without broadband, this would probably be inaccessible. I doubt that we can assume that global broadband availability will make this moot. Would we need a separate translation space? That would seem only a temporary palliative. DCDuring TALK 15:10, 20 May 2009 (UTC)

A custom translation page, analogous to the citation pages, might be a good idea. We could give a Translation subheading with a link and a warning that it's a large page. kwami 18:04, 20 May 2009 (UTC)Reply
There are many solutions to this; my favourite would be to load them on demand when the user clicks "show". Given that only 0.00046% of our pages are above 20kb (which is about 30 times smaller than WT:BP), I don't think this is a pressing problem. I don't think that there are any solutions with no negative side-effects, so it should probably be ignored for the moment. Conrad.Irwin 20:31, 20 May 2009 (UTC)
I think a water/Translations page would be a fine solution. They will be rare. bd2412 T 00:35, 21 May 2009 (UTC)Reply
(...with not one but two ugly boxes linking to Wikipedia! and a number of definitions that say "in plural".) DAVilla 18:31, 20 May 2009 (UTC)Reply
It would be nice if one of our tech people could design something that would allow the user to specify, perhaps in their css file, which languages to show (or not show) in translation boxes to avoid the nightmare of water and similar situations... — [ R·I·C ] opiaterein 17:45, 3 June 2009 (UTC)
Sadly, CSS cannot be used to prevent parts of the HTML from being downloaded. I think the ideal would be Conrad's suggestion, if we can make it work without too many problems. My thoughts on problem-minimization are as follows:
  1. For Ajaxy readers, it should look and behave just as a normal nav-bar, except for load-time between when the user clicks "show" and when the translations actually appear. (This will require some forethought, but is not particularly difficult.) For non-JS or non-Ajaxy readers, the "show" link should be an actual link to the separate wiki page hosting the translations for the section. (This one also should not be too difficult, but note that the "show" link is currently added by JS, so this will require a bit of work. Incidentally, note that this requirement means that the name of the separate wiki page has to be provided manually to the {{external-trans-top}} template or whatever; it can't just be inferred intelligently by JS code. Which, to be honest, is probably a good thing; there will be enough magic anyway.)
  2. In the raw wiki code (what you see when you edit), contents of a given translation table will appear almost as normal, but preceded by something like
    <includeonly><section begin="translations_foo_bar"/>
    and followed by something like
    <section end="translations_foo_bar"/></includeonly>
    This will hopefully enable editors to edit the translations section without worrying too much about how it's implemented, and bots and external tools to ignore it entirely. (Drawback: it's likely that editors will sometimes copy this boilerplate code without understanding it, and without understanding why their translations don't show up. One option is for the Ajaxy code to display some sort of error message that tries to explain it, but this is basically a lost cause. Users don't read error messages, ever. There's an urban legend about a user having once read an error message, but it was debunked by Snopes.com.)
  3. A separate wiki page, probably in the Appendix: namespace, will consist entirely of
    {{#lst:baz|translations_foo_bar}}
    The aforementioned Ajax can load this using /w/api.php?action=parse&page=Appendix:Translations/baz/foo_bar or whathaveyou.
But I haven't used Ajax all that much, nor really delved into the innards of labeled section transclusion; I welcome the opinions of anyone greasier. (And for that matter, the opinions of anyone else with opinions. I'm not picky. :-)
RuakhTALK 18:54, 3 June 2009 (UTC)Reply
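As a rough illustration of steps 2 and 3 above, the section markers and the API request could be generated as in the Python sketch below. The helper names and the `Appendix:Translations/<entry>/<section>` page scheme are assumptions taken from the example in the proposal, not an existing convention; `action=parse` and `format=json` are ordinary MediaWiki API parameters.

```python
from urllib.parse import urlencode


def wrap_translation_section(name: str, body: str) -> str:
    """Wrap a translation table in the labelled-section markers from step 2."""
    return (
        f'<includeonly><section begin="{name}"/>\n'
        f"{body}\n"
        f'<section end="{name}"/></includeonly>'
    )


def translations_api_url(entry: str, section: str,
                         host: str = "en.wiktionary.org") -> str:
    """Build the api.php request that the show link would fetch (step 3)."""
    # Hypothetical page-naming scheme, following the example in the proposal.
    page = f"Appendix:Translations/{entry}/{section}"
    query = urlencode({"action": "parse", "page": page, "format": "json"})
    return f"http://{host}/w/api.php?{query}"


if __name__ == "__main__":
    print(wrap_translation_section("translations_foo_bar", "* French: eau"))
    print(translations_api_url("water", "foo_bar"))
```

The actual client-side code would of course be JavaScript; the sketch only pins down what text and which URL are involved.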
This may raise hackles, but why not just have translations in, say, the top ten languages by number of speakers on the entry page, and relegate other translations to a subpage? bd2412 T 04:33, 6 June 2009 (UTC)Reply

meta:Wiktionary/logo/refresh/proposals

<BLINK><RED><BIG>

The Wiktionary Logo renewal plan has reached the stage where it is open for nominations. Rather than kicking up a fuss about the logo that Wiktionary gets given by other people, get involved now. Conrad.Irwin 12:47, 21 May 2009 (UTC)

</BIG></RED></BLINK>

MICRA's version of Roget's thesaurus

I have uploaded MICRA's version of Roget's thesaurus here:

A downside: the six subpages are very long. Yet, here it is, the complete MICRA and Roget's thesaurus wikilinked to Wiktionary.

If you like the thesaurus, it can be moved to appendix space. Enjoy. --Dan Polansky 16:15, 22 May 2009 (UTC)Reply

Mathematical definitions

I was wondering if it were possible (even theoretically, if not practically) to define mathematical terms using English language sentences rather than mathematical symbolism. For example, I would like to be able to define hypoelliptic and all I have to go on is the Wikipedia article w:Hypoelliptic operator (which is fairly typical of the genre). SemperBlotto 07:24, 25 May 2009 (UTC)

I'm afraid this is an example where defining is just not possible for someone without knowledge of fairly advanced maths. Possibly, the term would be based on a calculus-related definition of elliptic (cf. elliptic operator). Circeus 11:45, 25 May 2009 (UTC)
We certainly can't convey what something like this is in a dictionary definition. Is it possible to express its relationship to other mathematical or other entries: giving the various nyms and the branches of theory and application where it arises? I certainly didn't get much out of the WP article in that regard, so I don't know how you could get the information. BTW, I couldn't read the linked PlanetMath article, but it might have more. DCDuring TALK 12:05, 25 May 2009 (UTC)Reply
Yes, it's possible to word such a thing using English, and one's ability to do so is often a pretty good marker of his understanding of the subject (assuming he knows English: otherwise it's not, of course). Wikipedia's explanation of w:Frobenius number is:
Given n natural numbers with greatest common divisor 1, find the largest natural number that can not be expressed as a non-negative integer combination of these n numbers. For a given set this largest number is referred to as the Frobenius number.
Our definition of Frobenius number doesn't use symbols:
The greatest integer that cannot be formed as a sum of specific coprime positive integers.
It is less precise (refers to a "sum" of the numbers without specifying that a number can appear more than once in that sum, which is frankly incorrect and should be fixed) but is fairly decent for a dictionary entry.—msh210 01:40, 26 May 2009 (UTC) ...Now fixed. —AugPi 03:11, 26 May 2009 (UTC)Reply
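To make the point about repeated summands concrete, here is a small brute-force Python sketch of the definition quoted above. It is only an illustration (the function name is made up), and it relies on one observation: once as many consecutive integers as the smallest summand are all representable, every larger integer is representable too.

```python
from functools import reduce
from math import gcd


def frobenius_number(values: list[int]) -> int:
    """Largest integer that is not a non-negative integer combination of values.

    Each value may be used any number of times. Returns -1 when every
    non-negative integer is representable (i.e. when 1 is among the values).
    """
    if any(v <= 0 for v in values) or reduce(gcd, values) != 1:
        raise ValueError("values must be positive with greatest common divisor 1")
    smallest = min(values)
    representable = [True]   # 0 is the empty sum
    run = 1                  # length of the current streak of representable numbers
    largest_gap = -1
    n = 0
    while run < smallest:
        n += 1
        ok = any(v <= n and representable[n - v] for v in values)
        representable.append(ok)
        if ok:
            run += 1
        else:
            run = 0
            largest_gap = n
    return largest_gap
```

For 3 and 5 the unrepresentable numbers are 1, 2, 4 and 7; note that 6 = 3 + 3 is representable only because a summand may be repeated.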
How does this fit in with our authority-defying reliance on how words are actually used? We seem willing to defy many scientific and international authorities in technical fields. Should we be altering WT:CFI for some or all of the fields where technical correctness or voted-on standards exist? DCDuring TALK 02:12, 26 May 2009 (UTC)Reply
I think that if the usually given definition of Frobenius number is as stated above, and someone uses it without any context indicating that that's likely (or surely) not the definition he's using, then it counts as a good cite. (Where two conflicting definitions are common, which happens in math, the cites, I think, should be clearly of one sense or the other.) Moreover, I think that the current CFI allow for this. Does all that seem reasonable?—msh210 02:37, 26 May 2009 (UTC)Reply
Mathematics terms are in an even more restricted realm than most specialized terms. I hadn't noticed many mathematical terms being challenged on RfD or RfV, so I wonder about how it will work when it happens. Perhaps the topic=mathematics parameter will gain some use. We lack many of the most basic terms in statistics, let alone stochastic integrals. I hope we have good mathematical definitions for terms like "obvious", "trivial", and "an exercise for the reader". DCDuring TALK 03:05, 26 May 2009 (UTC)Reply
The mathematical definition for those would be, respectively: "There's probably an error here," "You can't see this, but I'm waving my arms vigorously", and "I was too lazy to write out a proof." :P --EncycloPetey 03:32, 28 May 2009 (UTC)Reply
Not an error, necessarily, but more than likely an unnamed assumption, in all honesty. The other two are dead on, so why the tongue? DAVilla 02:30, 31 May 2009 (UTC)Reply
I've extended a request for assistance to WP:WPMATH in writing layman's definitions of technical terms (giving hypoelliptic for the example) and locating interesting quotations for terms like sphenic number below. Circeus 02:45, 3 June 2009 (UTC)

How to handle "see also"

What is the recommended way to do "see also", using an also template at the top of the page or a see also section in the body? I prefer the section in the body, but I see the other way done here and there as well. -- dougher 01:03, 27 May 2009 (UTC)Reply

Generally {{also}} is for entries that have very similar titles — basically a "you may have been looking for this instead" — whereas a ===See also=== section is for entries on other words that are connected in some way — basically a "you may be interested in this as well". For example, [[a]] might use {{also}} to link to [[à]], but ===See also=== to link to [[the]]. The two are not mutually exclusive; an entry can easily have both, and many do. In fact, I can imagine an entry linking to the same other entry in both places; for example, right now [[there]] doesn't link to [[there-]], but I think both {{also}} and ===See also=== would be appropriate ways for it to do so. —RuakhTALK 01:18, 27 May 2009 (UTC)Reply
To add to that, keep in mind that {{also}} appears above the first language section, so it cannot be language specific. Variations like capitalization, spacing, and hyphenation are common. In contrast, ===See also=== is specific to the language, and ====See also==== to the part of speech. DAVilla 07:57, 27 May 2009 (UTC)Reply
I agree. The fr.wiktionary policy is to mention only words with the same letters in the same order (but there may be differences in capitalization, spacing, diacritics or special characters (such as , - or /)). Other similarities are not considered. This rule is not subjective, and a bot could exploit dumps to do the job. Lmaltier 18:17, 28 May 2009 (UTC)Reply
We also allow double letters like nam and naam, or at least it's come up before without objection. But that isn't subjective either. DAVilla 04:15, 29 May 2009 (UTC)Reply
No, but only if rules are clearly stated (very similar is subjective). Lmaltier 08:50, 1 June 2009 (UTC)Reply
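A bot applying the "same letters in the same order" rule to a dump of page titles might work along the lines of this Python sketch. It is an assumption-laden illustration, not fr.wiktionary's actual bot: the names are hypothetical, and letters that have no Unicode decomposition (such as ø) are not folded onto their base letter.

```python
import unicodedata
from collections import defaultdict


def also_key(title: str) -> str:
    """Reduce a title to its bare letters and digits, ignoring case and diacritics."""
    decomposed = unicodedata.normalize("NFKD", title)
    # Combining marks, spaces, hyphens and other punctuation are not alphanumeric.
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()


def group_similar(titles: list[str]) -> list[list[str]]:
    """Group titles that should cross-link to each other via {{also}}."""
    groups = defaultdict(list)
    for title in titles:
        groups[also_key(title)].append(title)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(group_similar(["a", "à", "A", "the", "there", "there-", "nam", "naam"]))
```

Under this rule nam and naam do not match, so the doubled-letter allowance mentioned above would need an extra, explicitly stated step.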

Latin orthography

I'm trying to find some sort of consensus (or indeed consensvs) on how to deal with Latin spellings on the Wiktionary. It's difficult with dead languages, especially one that existed for at least two thousand years and over half the globe. I suppose I'm talking here about things like jocus and iocus, which if I'm right should be IOCVS in traditional Latin. A couple of sources to get us started: w:Latin spelling and pronunciation and Wiktionary:About Latin. Mglovesfun 09:54, 27 May 2009 (UTC)

Having just read those two, what should we do with things like jocus, alternative spelling, or just a redirection? Mglovesfun 09:57, 27 May 2009 (UTC)

They should not be redirects, see WT:REDIR#Redirecting_between_different_spellings_of_words, so they should be alternative spellings pages. Conrad.Irwin 10:24, 27 May 2009 (UTC)Reply
I'd say jocus should be listed separately as an alternative spelling of iocus. Redirects are dangerous here because while jus as a Latin word is an alternative spelling of ius, jus is also a French word that can't be spelled *ius. Angr 15:13, 27 May 2009 (UTC)Reply
Are you thinking about which form should be the main one? For the benefit of users one would want to minimize the number of server-hitting clicks to get useful content. Latin terms could be reached from a search or from links, especially from etymologies. I would bet that the etymologies are a high percentage (without much fear of being contradicted by facts of which I have seen none). Whatever they now link to should remain the entry with the main content. The expectation of readers to see lower case letters is pretty strong. Putting a word in all caps is considered visual SHOUTING. DCDuring TALK 15:28, 27 May 2009 (UTC)Reply
We should decide — or have — on one standard (e.g., always using i, never j), and always soft-redirect (i.e., {{alternative spelling of}}) from the other if attested.—msh210 16:12, 27 May 2009 (UTC)Reply
This has already been decided at Wiktionary:About Latin#Prefer spellings with I; do not use J, which, combined with WT:REDIR's policy on not using redirects for alternative spellings, leaves no alternatives to be considered. Conrad.Irwin 16:16, 27 May 2009 (UTC)
The policy on I/J seems to contradict the policy on V/U. This should probably be looked into again. And while it is interesting to consider which should be the main spelling, the policy excludes other attestable spellings, which feels very wrong to me. There would always be at least two, the uppercase spelling that never uses J or U, and the lowercase spelling that substitutes these with the modern consonant/vowel expectations. When there is a substitution to make, there is also the lowercase variant where the spelling is unchanged, and potentially even a fourth common spelling where vowel V changes to U but consonant I does not change to J, if a word were to have both. Current practice prefers the latter when it exists, which could be very confusing indeed. It also picks what is probably the least likely attestable form of the word, a particular modern spelling without macrons. I don't believe macrons would actually double the number of representations. Certainly the uppercase forms need not have them, nor I'm guessing their lowercase counterparts that do not use J/U substitutions. But if it's printed in any of these ways, I would expect to see a soft redirect. DAVilla 17:00, 27 May 2009 (UTC)Reply
Re: "The policy on I/J seems to contradict the policy on V/U": I don't think so. My understanding is that modern printings of Latin works typically use English-style capitalization and distinguish U from V, but typically do not use macrons and do not distinguish I from J. So, we seem to be in line with modern standard practice. (Which is not to say that all-capitalized spellings, or spellings with Js or macrons or vocalic Vs, don't warrant some sort of redirection.) —RuakhTALK 17:27, 27 May 2009 (UTC)Reply
That's good to know. After reading EncycloPetey's response below, I take back most of what I said. DAVilla 03:18, 28 May 2009 (UTC)Reply
[e/c] The About Latin page says use only i, never j. That is (as I understand it), not even to soft-redirect. I recall a certain eminent Latin contributor's saying we should not have such soft redirects, and seem to recall his deleting them. I think that that's not the way to go, and if About Latin is policy I think policy should change (slightly). People will look for judico.—msh210 16:55, 27 May 2009 (UTC)Reply
Wiktionary:About Latin#Prefer spellings with I; do not use J, I assume there are quite a few editors of this paragraph as it seems to contradict itself a lot. It says 'do not use J' in the title, and then changes its mind later on. Plus 'prefer I' - well if you don't use J, surely I is all that's left anyway? So it's either redundant or confusing, in fact probably both. Mglovesfun 22:37, 27 May 2009 (UTC)

I’m glad to see this issue raised, as it’s one that I also think is not best dealt with at present. I take the position that the inclusion of Latin words should be according to the (qualified) criterion of attestability. Take, for example, iuvenis, which is attestable as IVVENIS, ivvenis, JUVENIS, juvenis, IUVENIS, iuvenis, iuuenis (but not *IUUENIS), juuenis (but not *JUUENIS), JVVENIS, and jvvenis (all the hits from sources which are purported to write the term with ‘J’/‘j’ only are scannos AFAICT) — that’s ten different forms (and that’s without counting forms written with diacritics like breves and acute accents, such as juvénis &c.).

Counting just character substitutions, we have (at least) capitals–lower case, all I–I/J, all U–U/V–all V, AE–Æ–Ę–E & OE–Œ–Ę–E (<Ę> = e caudata), macra–bare characters–breves–both, acute–grave–zero-stress-accented, ſ/s–s only, &c. (2 × 2 × 3 × 4 × 4 × 3 × 2 × n… = >1,152 hypothetical variations); whilst it is certain that no Latin term will have over a thousand attestable alternative forms, the above does show the extraordinary potential and significant actual variation that at present we simply deny.

Many of those variables don’t really matter — viz. capitals–lower case, e caudata, and vowel-length marking (users are unlikely to search in all-caps, the e caudata is extremely rare, and macra / breves seldom ever occur outside textbooks) — but others do — I–J & U–V and the digraph–ligature–monograph trichotomy especially, whereas the use of the long ess and stress marking also cannot be simply dismissed as vanishingly rare or as unsearchable.

Rather than just assume that our users will know to edit automatically words they encounter to conform with our schema, we should provide them with information that reflects the complexity of the true picture of changing Latin usage over the millennia; such an approach would be far more in line with the ambition to include “all words in all languages”. We can still have our entries where they are at the moment, but we need to allow soft redirects to them from their various variants (in fact, it should be seen as A Good Thing™ that we have a pretty clear and uncontroversial policy governing whereat to house a Latin term’s “main entry”).  (u):Raifʻhār (t):Doremítzwr﴿ 23:43, 27 May 2009 (UTC)Reply
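As a rough illustration of how a soft redirect could find its target, the sketch below maps a J-spelling to the I-spelling under which the main entry is housed. The function name is hypothetical and this is not how Wiktionary itself resolves redirects; it handles only the I/J substitution, so U/V, ligatures and diacritics would need rules of their own.

```python
# Hypothetical helper: translate J/j to I/i and leave everything else alone.
_J_TO_I = str.maketrans({"J": "I", "j": "i"})


def main_entry_spelling(variant: str) -> str:
    """Return the I-spelling that a J-spelling would softly redirect to."""
    return variant.translate(_J_TO_I)


for variant in ["judico", "juvenis", "JUVENIS"]:
    print(variant, "->", main_entry_spelling(variant))
```

A spelling that already uses only I passes through unchanged, so the same lookup could be applied to any search term.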

The Latin letters I and J are very, very late orthographic variants of the same letter, and not alternative spellings. Even in English, "long-I" (J) is used in some antiquated texts. In Latin it does not appear until very, very late in the language. The difference between iuvenis and juvenis in Latin is akin to the difference between cat and cɑt in English. If we're going to have entries for alternative orthographies of the same letter for words in the same spelling, then we should do that with all languages. Modern Latin dictionaries, textbooks, and the like do not use "j" precisely because it was not even thought of as a letter until after the Renaissance. Modern ideas about whether the letters are separate notwithstanding, we should judge the language by its internal rules, just as we do for other issues such as "sum of parts" discussions. Shoehorning this typography into every Latin entry would be like requiring every English entry with an "s" to have a long-S form; every entry that was capitalized in Victorian literature to have a capitalized counterpart; and every English word with an "a" or "g" to have variant entries with a single-loop "ɑ" and a double-loop "g". When we start doing that with English, French, etc., then it will be appropriate to do such things in Latin as well. Personally, I think such an effort is counterproductive and would make Wiktionary look foolish. By contrast, the Latin letters U and V began to diverge comparatively early. By the early Medieval period, their typography was distinct. Modern Latin dictionaries, textbooks, and the like routinely distinguish between the two.
Let's look at an analogous (sort of) example in English. I have, next to me, a facsimile of Shakespeare's Romeo and Juliet as published in the First Folio edition. The title at the top of the play proclaims "THE TRAGEDIE OF ROMEO AND IVLIET," so do we need an entry for Ivliet? Across the tops of the next few pages, the title is given as "The Tragedie of Romeo and Iuliet," so do we need an entry for Iuliet? Mixed in among these page headings later are ones that give the title as "The Tragedie of Romeo and Juliet," but the spelling Iuliet is the most common one in the text of the play. In the same play, I find Iohn (for John), subiects (for subjects), Vncle (for uncle), seruingmen (for servingmen), giue (for give), mou'd (for moved), vs (for us), moue (for move), ſerue (for serve), and all of this (and more) in the first page of the play. --EncycloPetey 01:16, 28 May 2009 (UTC)Reply
If we're going to judge it by internal rules, then why do we choose i as the class representative for I, J, i, and j? You've chosen the middle/late Medieval spellings with regard to the U/V distinction. The I/J distinction did not come about until the Renaissance, and we couldn't have converted to writing Latin in minuscules earlier than that, could we? In English, full majuscules are an alternative capitalization for titles, but it was the standard orthography for Latin.
What alternatives are there for variations in capitalization and orthography? I know if the page exists we use a See also, but what if it doesn't exist? How smart is Did you mean? DAVilla 03:12, 28 May 2009 (UTC)Reply
Latin has been written with minuscules since minuscules have existed (usually credited to scribes under the reign of Charlemagne). If we're going to worry about that, then we should run all ancient Greek words together since they typically didn't put spaces between words; nor did they capitalize according to modern usages. Shall we keep piling on questions without answering any? One possibility is to have an unlinked list of orthographic variants, perhaps in a template. These would be found using a site search. --EncycloPetey 03:24, 28 May 2009 (UTC)Reply
I guess I just don't understand ancient languages. To me something is lost, not gained, by adding spacing, changing capitalization, or otherwise interpreting a text differently than the way it's written.
You noted that orthographic variants exist in English etc. I'm not sure a template would be the best idea there, but these choices demanded by Latin may necessitate it for Latin if a more encompassing solution can't be obtained. DAVilla 17:35, 28 May 2009 (UTC)Reply
Re: "interpreting a text differently than the way it's written". Unfortunately, this is routinely done for any text whose spelling, punctuation, etc. is not considered "standard". Shakespeare's plays are almost never encountered with their original spellings and capitalizations (even on Wikisource!) unless you deliberately go hunting for a facsimile text. I spent several months one summer looking for an unedited copy of Milton's Paradise Lost. The problem is even greater with Latin, where most available texts have been edited to norms of spelling chosen by the editor according to standards of the day in which the text was published. I don't have any copies of Classical texts that I would trust to have preserved a Classical orthography with any fidelity. I do have copies of a number of medieval and Renaissance texts in Latin where the editor went to great pains to preserve oddities of the hand in which the documents were written.
Neither is the problem limited to Latin. Czech publications are notorious for silently standardizing and normalizing spelling of medieval reproductions. Furthermore, if you ever look at old handwritten manuscripts, then you begin to appreciate all the editing done to make the content accessible. Reading medieval hands is an art unto itself, and there are many marks and abbreviations that cannot be reproduced faithfully in Wiktionary because we don't have the character encodings to do so. There are Dutch maps I have with tildes over the final vowel (to represent a nasalization). There are Low German church records with the umlaut sound indicated by a small "e" as the diacritical instead of the usual two dots. I have Hungarian and Polish documents from the medieval and Renaissance periods that use spellings and diacriticals that would make native speakers cringe because the notation isn't used anymore. If you look at Arabic, the problem gets even worse because you have calligraphers who wrote words inside of other words to make the script look pretty. We have to disentangle the individual words to represent them in a computer.
In sum, there are just some changes that have to be made in representing words from older texts. We may not be paper, but Wiktionary does have some severe limitations when it comes to ancient languages and archaic typographies. --EncycloPetey 06:08, 30 May 2009 (UTC)Reply
If that's true then Wikisource is total rubbish. What are people a hundred years from now going to say about what Wikimedia contributed to the world? A thick coating of whitewash? You say you don't trust the classical texts for doing the very thing you're proposing yourself. I understand, handwriting requires interpretation just as does speech, but not using the closest approximation possible alters the text unnecessarily. Modifying the corpus is not the way to preserve a language. While I can see standardizing our entries for valid orthographical reasons such as the lacking distinction at the time, it would be entirely inconsistent to ignore distinctions such as those on your Dutch map, and an outright lie to modify quotations when they don't conform. Our standard must be to preserve the texts as closely as we can. Someone has to make the effort now, or it will be impossible to do in a hundred years, just as such information from earlier works is already forever lost. DAVilla 02:25, 31 May 2009 (UTC)Reply
That's why I'm giving serious thought to preparing a First Folio Romeo and Juliet for Wikisource. I've raised the issue about their Shakespeare plays in the past, but nothing has ever happened. Their currently listed Shakespeare texts don't even alert the reader to the fact that they are modernized and edited! I just have to decide whether to pursue doing Romeo and Juliet or work instead on some of the other many projects waiting for attention. --EncycloPetey 02:36, 31 May 2009 (UTC)Reply

Template:UK, Template:US

Presumably I've missed some kind of protracted discussion, again, but how come {{UK}} redirects to {{British}}, whereas {{US}} doesn't redirect anywhere? It just sets my teeth on edge a bit because the word ‘British’, in reference to language, sounds very, well, American. I don't see what was wrong with ‘UK’, which after all is more inclusive (and a bit shorter and neater). Ƿidsiþ 19:13, 27 May 2009 (UTC)Reply

I tend to agree, British has more meanings than UK does. I'd say 'British English' is as misleading as 'American English'. They do speak English in South America, albeit not at a native level. Mglovesfun 22:39, 27 May 2009 (UTC)Reply
On the contrary, in Guyana it is at a native level. I wonder whether Guyana's English is more similar to the UK or US standard... The uſer hight Bogorm converſation 10:05, 29 May 2009 (UTC)Reply
(North) American English and British English are the names of the respective dialects used by linguists and lexicographers. Dictionaries label them Amer(ican) or US, and Brit(ish) (evidence at User:Mzajac/Dialect_labels).
These were both discussed:
The name British English has a history: until relatively recently it was the standard for “proper” writing taught in most of the British Empire and later British Commonwealth, while regionalisms were considered sub-standard. It is still the historical basis for most speech and writing, vocabulary, spelling, and usage in most Commonwealth countries (Canada being the major exception). There is no such dialect as United Kingdom English, or UK English. Today dictionaries essentially define British English as meaning “not American English.” Michael Z. 2009-05-29 13:31 z
I get all that, my question is not about what we call the dialect but how we label it on a definition line. If we are going to use British we should also use American. Personally I think UK and US is better. But currently we have British and US, which is just weird. Ƿidsiþ 13:41, 29 May 2009 (UTC)Reply
Dictionaries seem to use adjectives for countries, and attributive nouns where there is no adjective, or it would sound funny, or it's not well known, or represents a small region. Brevity is also valued, so while both American and US are used, perhaps the latter is more popular. But I would prefer Amer(ican), because it refers to American English, which is sometimes taken to mean North American. Some dictionaries use US and Canadian instead of North American. UK is not used, and is not the same as British. (We have long strings of labels on some definitions, and I think abbreviating their names with links to documentation would be an improvement, but that is another topic.) See User:Mzajac/Dialect_labels#The_form_of_dialect_labels for a list of labels actually used. What does your dictionary or dictionaries use for American and British English?
And certainly Category:US looks like a mistake among the members of Category:Regional English. Michael Z. 2009-05-30 02:04 z
If you say so; personally I don't think that putting UK means that we are suggesting there is a dialect called ‘UK English’ (not that that sounds particularly wrong). We still use US but I would still always think of ‘American English’ as being the dialect name. In short, UK and US are fairly obvious shorthand ways of referring to ‘British English’ and ‘American English’ which have the advantage of brevity and also of being, in my opinion, less patronising. Ƿidsiþ 06:39, 30 May 2009 (UTC)Reply
But “UK English” is practically a made-up name for British English, almost unattested in language writing. Its invocation of political boundaries is both over and under-specific, implying that there is some commonality in the English of England, Scotland, and Northern Ireland (but not in that of the Republic of Ireland). British English is associated historically and linguistically, by its source and scope of usage, with England, Britain, and the British Empire and British Commonwealth. UK English is a neologism lacking these connotations. Michael Z. 2009-05-30 16:49 z
British English has the appropriate vagueness which lets it refer to both the spoken English of London and the standardized English of the British Isles, India, and New Zealand, for example. UK English cannot be these things. Michael Z. 2009-05-30 16:55 z
I finally understand your position, now. To be perfectly honest, I never took “British English” to include the English of Australia and New Zealand — UK & Commonwealth is the label for that referent. Also, even if we were to take {{British}} to mean the English(es) spoken in most of the quarter of the globe that used to belong to Britain, how would we then label usages that are restricted to the UK only?  (u):Raifʻhār (t):Doremítzwr﴿ 17:35, 30 May 2009 (UTC)Reply
[It took me a while to get the picture of why this term appeared to be used with such variety too]
Well, that's part of the thing, there isn't a dialect or language variety of the UK. English in the UK and British Isles is represented by several national varieties—English and Welsh, Scottish, and Irish—and their regional subdivisions. There are terms for things which are specifically in the UK, for example SAS (1). But the SAS is called SAS in Canada and the USA too, so it's wrong to mark that sense as belonging to British English—“in the UK” in this case is a national topic, not a regional dialect (the only dictionaries to consistently label this phenomenon are the COD and its descendants, like the CanOD). Another example is Category:England and Wales law, which stems from the scope of a court system, not a language variety. Michael Z. 2009-05-30 21:19 z
Firstly, I've already said I have no position on what to call the dialect and this is just about labelling a definition line. We currently use US to tag definitions but most people still call the dialect ‘American English’. Secondly, ‘UK’ is not exclusive of Northern Ireland, quite the contrary, it is ‘Great Britain’ which is strictly just the mainland. Thirdly, as per Raif‘har above, no one is likely to interpret ‘British’ as referring also to Commonwealth English. Ƿidsiþ 06:15, 1 June 2009 (UTC)Reply
It seems to me that you're suggesting we call the dialect “UK” for short. But no one calls it that. The standard abbreviation is Brit.
(I don't see what bearing the rest has on this. Firstly, US is used to label American English terms in dictionaries, UK is not used for British English. Secondly, I certainly wrote above that the UK includes Northern Ireland but not the Republic of Ireland, rather than following any dialectal boundaries. Thirdly, why would anyone better interpret a label that no dictionary uses, rather than the label that practically all dictionaries use?—is “UK & Commonwealth” found in any dictionary at all?) Michael Z. 2009-06-02 01:11 z
Yup: the OED uses “US” and “Brit” in its entries; nevertheless, that is really bad nomenclature — it doesn’t matter if that’s the lexicographical convention, because it’s so unintuitive and misleading, our users will assume that “Brit(ish)” just means the English of Britain and it will not occur to them that they might have misinterpreted the scope of the label. Let’s use {{UK|and|Commonwealth}}, if anything, please.  (u):Raifʻhār (t):Doremítzwr﴿ 20:46, 3 June 2009 (UTC)Reply
(Incidentally, according to Norri 1996,[9] the OED only uses labels to indicate “the variety of English, when the word is not current in the standard English of Great Britain”—so British is its default. Almost all other British and American dictionaries use British in the conventional sense.)
So you propose to mark other dictionaries' British with UK|and|Commonwealth, and presumably to replace our UK/British with the same. So then exactly what is signified by just UK or just Commonwealth? Since these would be new distinctions, used by no other source, what references would we base our assessment on? If an editor finds a term marked British in several other dictionaries would he be wrong to mark it just UK or just Commonwealth? Michael Z. 2009-06-04 04:21 z
Nope—that would just be wrong. Whatever definitions you see, British English is an international standard language—its name relates to Britain and the British Empire, and its nature is as much register as it is regional. British English may include RP and BBC English, and historically, and perhaps still, is the basis for much of the formal written and spoken language of Scotland, Ireland, India, Australia, etc. One might consider calling it Standard English, if there weren't already another important standard of American English.
British English is the name for it. Just because you don't like the sound of that, or you don't think it neatly corresponds to the borders of countries, gives you no right to reject universal lexicographical practice. Turning it into three (!) arbitrary dialectal subdivisions UK, Commonwealth, and UK and Commonwealth for the sake of more pleasing naming doesn't even deserve to be dignified as original research. Michael Z. 2009-06-05 18:35 z

Patroller's flag

Hi. Could you add at the patroller's flag the rights autopatrol & rollback? Thanks --Ivocamp96 13:27, 28 May 2009 (UTC)Reply

Hi, you already have the autopatrol right (it's allotted by the sysops at WT:WL when a user is editing well). We don't give out rollback or patrol as you'll get them when you become a sysop. (The last thread that discussed this seemed of the general conclusion that if you were good enough to be allowed to patrol, you were good enough to be an admin anyway). Conrad.Irwin 14:46, 28 May 2009 (UTC)Reply


Word of the Day

Hi. There is no Word of the Day for today, May 28.

It should come up as unsung. Take a look at the WOTD archive and try refreshing your screen, and if it's still not there let us know what you do see. DAVilla 17:40, 28 May 2009 (UTC)Reply

L2 header Serbo-Croatian instead of Croatian, Bosnian, etc.

note that the heading for this section was improperly altered from the original

(I tripped over this yesterday when starting Tbot on a new run; the first thing it did was add a new section for Croatian to the entry at rječnik; this surprised me as I would have thought it surely was there already! (rječnik is "dictionary" ;-) on investigation I found this very disturbing edit.)

At least one person (perhaps more, I have only started to look into this) has apparently taken some recent discussion of the "Serbo-Croatian" language group as licence to remove the valid, recognized languages (Croatian and Bosnian, and in some cases Serbian itself) from entries, replacing them with the invalid "language" "Serbo-Croatian". Note that that is not a language recognized by SIL/ISO, and therefore does not meet CFI; the code ("sh") was properly deleted by SIL/ISO. (Serbian itself is "sr", alive and well ;-)

Given who has been doing this (that I have seen so far), I will respectfully ask that you undo the severe damage caused by the removal of the CFI-valid languages, in all entries thus far damaged. (If it was anyone else, it would have of course been instantly reverted and the user(s) permanently blocked/banned.) Robert Ullmann 09:16, 30 May 2009 (UTC)Reply

I don't know, and cannot formulate any plausible theory of, what might have happened after this discussion (among others) and before this one (among others). However, given that said thing(s) happened, I think any further discussion should join the existing discussion at Wiktionary talk:About Serbo-Croatian. —RuakhTALK 13:10, 30 May 2009 (UTC)Reply
L2 Bosnian, Croatian and Serbian are to be obsoleted and replaced by L2 Serbo-Croatian, and the details are still being worked out at the abovementioned policy draft page that is to be voted on some time in the near future (when technical details such as this one get sorted out, and when my Latin->Cyrillic transformation tool for SC entries becomes 100% operative). SIL doesn't really decide what is a "language" or not (we already have Wiktionary:Languages without ISO codes, remember..), and the macrolanguage terminology they use for SC (and for many of the other L2s that Wiktionary already uses, see the full listing [10]) is non-existent in general linguistics. It is much easier to escape the unnecessary redundancy of having 2-3 separate but identical sections for 99% of SC words (words that are exclusive to any one of the triad are really rare), and have only one instead. Also, the sh code is safe to use as ISO doesn't assign two-letter codes anymore AFAIK. I've disabled the generation of Tbot entries for Croatian for now. --Ivan Štambuk 13:50, 30 May 2009 (UTC)Reply
How have you "disabled ... Tbot entries"? I don't know of any such mechanism? Robert Ullmann 15:46, 23 June 2009 (UTC)Reply
I've set the limit= parameter to 0 [11]. --Ivan Štambuk 16:20, 23 June 2009 (UTC)Reply
That hasn't been used since December 2007. Robert Ullmann 06:59, 24 June 2009 (UTC)Reply
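The Latin-to-Cyrillic transformation mentioned earlier in this thread can be sketched as a plain letter-for-letter mapping in which the digraphs lj, nj and dž are matched first. This is only an illustrative sketch, not the tool referred to above: the function and table names are made up, input is lowercased, and words in which n+j or d+ž are two separate letters (e.g. injekcija) would be converted wrongly without an exception list.

```python
# Digraphs must be tried before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и",
    "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у",
    "v": "в", "z": "з", "ž": "ж",
}


def to_cyrillic(word: str) -> str:
    """Naively transliterate a Latin-script word into Cyrillic."""
    text = word.lower()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters outside the table are passed through unchanged.
            out.append(SINGLE.get(text[i], text[i]))
            i += 1
    return "".join(out)


for latin in ["rječnik", "doktorica", "ljubav"]:
    print(latin, "->", to_cyrillic(latin))
```

The exceptions are what keep a converter of this kind short of fully automatic: they cannot be detected from spelling alone and have to be listed by hand.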

This is totally unacceptable. ISO/SIL very properly deleted the code "sh" because it is both bogus and offensive. The languages are Bosnian, Croatian, and Serbian, all properly defined and coded.

Several people have begun severely damaging entries by "combining" the languages; it must stop and be un-done. "Serbo-Croatian" is not a language, by international standards. Full stop.

If it continues we will have to call for community censure of the people pushing this offensive Serbian-nationalist POV. Robert Ullmann 15:43, 23 June 2009 (UTC)Reply

As I said, ISO/SIL don't define what a "language" is. It wasn't "bogus and offensive" 20 years ago and now suddenly is? Hardly, except in the minds of some insane nationalists. It's the most practical term to use unless we want to stray into needless political correctness with terms such as "BCS". The introduction of new 3/4 codes by ISO merely reflects the separate standardization bodies used for B/C/S(/M) nowadays. There are lots of languages which still don't have ISO code and rightfully deserve one, and there are even some languages that have ISO code but hardly deserve one (e.g. the so-called "Knaanic language", imaginary Slavic language no one's ever heard of).
I don't see how combining 3 language sections that are in most cases identical into one is damaging in any way. It reduces needless redundancy and makes things easy for both the users (learners) and maintainers (editors). Most (up to 15 years ago - all) English-language textbooks and courses treat it as one language, with one combining dictionary, grammar...accompanied with commentaries on regional differences in lexis (the grammar is 99% the same). That's the best way to proceed on Wiktionary IMHO. --Ivan Štambuk 16:20, 23 June 2009 (UTC)Reply
For the record, I support this unification here on the grounds of usefulness and scholarly opinion, which have been explained amply by Ivan. I cannot see how this can be harmful; this is really only a technical matter. Also, I must say that it's very prejudiced of Robert to assume that "offensive Serbian-nationalist POV" is being pushed here; nationalism (on the level of Bosnian/Croat/Serb/Montenegrin) is the very barrier a unified header breaks, and, as Robert may perhaps not realize, although there was a time when attempts were made to impose elements common to speech in Serbia (such as ekavianism), the modern literary Serbo-Croatian (or whatever you'd like to call it) as it has been used in the old Yugoslavia and in later times, although always colored (as in any other language) by the writer's particular dialect, owes a huge part of its development to historical Croat writers as well as Serbs (Croatian vernacular literature being older). Also, Ivan could hardly be considered a "Serbian nationalist", as he is a Croat. – Krun 17:12, 23 June 2009 (UTC)Reply
I also want to note that the main dialectal differences are in practice not just drawn on ethnic (Serb/Croat) lines, as some people would like to think. Bosnian Serbs (i.e. ethnic Serbs living in Bosnia and Herzegovina), for example, predominantly speak an ijekavian vernacular, but still want to relate more to Serbian Serbians in language naming. In reality, this is one dialect continuum whose language standards, despite nominally affirming their separateness, are almost identical. Even this yat-reflex issue is just minor and silly to reflect in spelling; it would be easy to instead just use ѣ, or ě in Latin script, that would then be pronounced slightly differently by different speakers; that would probably get rid of most of the existing spelling variations and simplify things greatly for everyone. Of course, we can't go as far as to set a new spelling standard, but as the spelling is, in Ivan's system it's very easy and straightforward to create pages for the yat-reflex variants, and they are shown for what they are (i.e. not necessarily specifically Serbian, Croatian, etc., but used in various regions in Serbia, Croatia and Bosnia and Herzegovina). – Krun 19:40, 23 June 2009 (UTC)Reply
That's all correct. Originally Karadžić's Serbian was the ijekavian of his native Hercegovina, which was spoken by few Croats, but once his reform was officially accepted in 1868 they switched to the ekavian variety which was prevalent in mainland Serbia, and by that time ijekavian pronunciation had become very widespread in Croatia (chiefly due to the efforts of Croatian Vukovians like Maretić, Broz etc.), and by a twist of history the positions changed. Anyhow, it is indeed very unfortunate that the spelling with jat reflexes is not indicated with <ě> - in the 19th century during the Illyrian movement one philological school advocated exactly that, so that all speakers would write <ě> and pronounce /i/, /e/, /ie/, /æ/ or whatever their regional pronunciation was. You can see that spelling used in some 19th century magazines like Danica ilirska. Unfortunately, strict phonological spelling eventually prevailed and so we're stuck with 2 scripts × 2-3 jat varieties of basically one and the same word.
What is important to understand here is that Wiktionary is using a scheme whose primary purpose is to enhance the learning process, i.e. that a person using Wiktionary to learn SC words doesn't end up wasting time chasing 3 or 4 language sections to discover variant spellings, depending on which ethnicity uses which jat reflex. Similarly, variants in lexis and morphology are all being reflected in ==Usage notes==. It is much easier to note in one entry that e.g. Croatian uses doktorica whilst literary Bosnian and Serbian prefer doktorka, and mutually link between those two, than create 3 entries on 2 different pages and let the poor reader figure it out on his own. This is not some kind of political agenda to "unify" languages - they're all 3 different standards based on 1 organic idiom (stylised Neoštokavian dialect). This is just a convention to ease the efforts of both editors and users. We already use the same thing for Hindi and Urdu (Hindustani, written in Devanagari and Arabic script respectively) and for Romanian and Moldovan (using the Latin and Cyrillic script respectively). It's just that with SC we have the luxury of using the same L2 section name, that has (had) its own ISO code. --Ivan Štambuk 20:27, 23 June 2009 (UTC)Reply
To Robert Ullmann: Since you, on my user talk page, purport to know the criteria for inclusion so well, you should have at least paid attention to what they actually say. They do not, as you seem to believe, stipulate that only languages with a current ISO 639-3 (or other ISO) code may be included. In fact, ISO 639-3 seems only to be mentioned for its inclusion of several constructed languages, which have been considered, and most of them accepted, here. That could be taken, at most, to indicate that we should consider including the languages included in the standard, but further down reasons (agreed upon by consensus) are stated for not including a few of the languages which are in the standard. It says nothing whatsoever about not including other languages; on the contrary, it states that "All natural languages are acceptable." It does not, however, state specifically what we consider a language, nor does it refer the matter to SIL's judgement. Here is a list of the numerous languages not in the standard that we are already including (and have created code extensions for many of them): Wiktionary:Languages without ISO codes. So, you see, Serbo-Croatian isn't the only one. There should of course be discussion on each case (as is now going on), but you don't have to come out of the blue and attack with prejudice and ignorance honest attempts at making a better Wiktionary. In particular, the so-called "damage" you mention is nonexistent. It would be very simple to have a bot split the sh entries (in effect, simply tripling the vast majority of them), and that is precisely the point. We could simply vote on this (preferably when a few more users have contributed to the conversation, of course); to be honest, I am quite confident that this will pass through. It would be quite nice to see this matter finally settled. – Krun 23:11, 23 June 2009 (UTC)Reply
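To give an idea of what such a split might look like, here is a minimal sketch that copies the body of a ==Serbo-Croatian== section under three separate L2 headers. It is not an existing bot: the function name, the default target list and the sample wikitext are assumptions, and it ignores separators between language sections as well as the regional notes that a real split would have to sort out.

```python
import re

# Matches a Serbo-Croatian L2 section up to the next L2 header or end of text.
_SH_SECTION = re.compile(
    r"^==Serbo-Croatian==\n(.*?)(?=^==[^=]|\Z)",
    re.MULTILINE | re.DOTALL,
)


def split_sh_section(wikitext, targets=("Bosnian", "Croatian", "Serbian")):
    """Replace the Serbo-Croatian section with one copy per target language."""
    def repl(match):
        body = match.group(1)
        if not body.endswith("\n"):
            body += "\n"
        return "".join(f"=={name}==\n{body}" for name in targets)

    return _SH_SECTION.sub(repl, wikitext)


sample = (
    "==English==\nfoo\n"
    "==Serbo-Croatian==\n===Noun===\nrječnik\n"
    "==Swedish==\nbar\n"
)
result = split_sh_section(sample)
print(result)
```

Sections in other languages are left where they were; only the one L2 section is expanded.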

Oh dear, another Balkan war. Will those people ever learn anything? Jcwf 00:35, 24 June 2009 (UTC)Reply
The lesson from the 1990s war was: never trust Dutch "peace-keeping" soldiers who shit their pants and hand over 8000 unarmed civilians that are about to be slaughtered in the biggest genocide in Europe since WW2, and later come and preach of "war crimes" and "justice". --Ivan Štambuk 01:24, 24 June 2009 (UTC)Reply

Just thought I’d censor some personal attacks. Cut it out, guys. I couldn’t give a flying fuck about whichever brand of Balkan nationalism one party or another is trying to POV-push. With that in mind, please note that, FWIW, I tend to find that the stronger case has been made thus far by those who seek to unify under a single language header our presentation of the Serbo-Croatian language (insert or omit as appropriate: continuum).  (u):Raifʻhār (t):Doremítzwr﴿ 02:23, 24 June 2009 (UTC)Reply

Let's vote on this and decide once and for all. The vote should not suggest whether these are separate languages or dialects, which cannot be decided by vote of course, but to propose a standard of treating Serbian/Croatian/Bosnian/Montenegrin on Wiktionary. For the record, I would vote for unifying. --Vahagn Petrosyan 06:41, 24 June 2009 (UTC)Reply

This is not susceptible to a "vote". Given the history of the genocide committed by Serbians attempting to maintain a "greater Serbia" by massacring Bosnians and Croatians, (and trying to exterminate the Albanian Kosovars), an attempt to remove Bosnian, Croatian (and Montenegrin) by "combining" them into a greater Serbian language is utterly beyond obscene. (With trials ongoing; Damir Sireta was sentenced by the Tribunal just yesterday.)

It would not be more offensive if someone put a swastika on their user page and went around deleting Hebrew from entries.

Suppressing Bosnian and Croatian cannot be accepted. Robert Ullmann 06:55, 24 June 2009 (UTC)

Embassy

An international embassy has been built for any trans- or inter-wiktionary exchange. JackPotte 06:45, 31 May 2009 (UTC)Reply

But you have written Welcome to the French Wiktionary... Furthermore, I consider Russian indispensable, since Russian is the language spoken by the greatest number of people on the Europæan continent. I shall propose my translation on the talk page and urge a native speaker to verify it. The uſer hight Bogorm converſation 14:38, 31 May 2009 (UTC)
Umm, what is this for? Conrad.Irwin 18:16, 31 May 2009 (UTC)Reply
For people with scarce command of English, but who are able to converse in other international languages, e. g. French, Chinese, Esperanto and so forth. The uſer hight Bogorm converſation 18:38, 31 May 2009 (UTC)Reply
Which raises the following quæstion: many Wiktionaries have in their welcoming templates a line like You do not speak language X? Go to our embassy., mainly in English. But Template:Welcome does not have this line. I suggest inserting four lines in small script in German, French, Russian and Chinese with the same content. What do you think? We can not expect from every novice here to speak English flawlessly, especially if he uses Wiktionary for expanding his vocabulary. The uſer hight Bogorm converſation 18:42, 31 May 2009 (UTC)Reply
If said user plans to use Wiktionary he will need at least a basic fluency in English: everything is labelled in English, defined in English and translated to English (you can't even change the interface to be not in English unless you already know how to use MediaWiki). Given the very low proportion of people who will try to use Wiktionary without speaking a bit of English, I don't think there is a need for a separate forum, though perhaps we should allow multilingual discussion on WT:ID. Conrad.Irwin 19:07, 31 May 2009 (UTC)
Conrad, English is still the fourth most spoken language after Chinese, Spanish and Hindi (here, the second column) and if Chinese, Spanish and Hindi Wiktionaries allow embassies for foreigners with insignificant knowledge of their language, it would be impolite if they are deprived of the due reciprocity here. The uſer hight Bogorm converſation 19:29, 31 May 2009 (UTC)Reply
People who post on WT:ID are almost guaranteed a reply, as a large number of people watch that page, or at least visit it from time to time. Posts to the Embassy are unlikely to get a reply except for from the people who have read this thread. It's not that I don't want to talk to them, it's just that I don't think segregation is a good solution. Conrad.Irwin 21:38, 31 May 2009 (UTC)Reply
That's a good point. Since (some) other Wiktionaries have embassies, we should probably keep Wiktionary:Embassy as a target for interwiki links, but I'd be quite happy with it just having a brief paragraph in each language inviting visitors to comment in their own language at BP or ID. —RuakhTALK 22:48, 31 May 2009 (UTC)Reply
That would make sense, and also a brief description of how to use WT:BABEL would not go amiss. Conrad.Irwin 23:09, 31 May 2009 (UTC)
Sounds good (Ruakh's, Conrad's ideas).—msh210 16:41, 1 June 2009 (UTC)Reply
We can't go by reciprocity, because that would obligate us to include notices in 100 languages, sooner or later. Perhaps we should provide notes in the languages of the Wiktionaries with most editors, since this is a place to communicate with ambassadors: foreign-language Wiktionary editors—not necessarily, and not only foreign-language writers. Michael Z. 2009-05-31 22:16 z
And if so, then Italian should get replaced with Portuguese before we add any more notices. Chinese, Polish, Turkish, Italian, then Japanese would follow. Michael Z. 2009-05-31 22:19 z
Huh? I don't understand what this Embassy is supposed to be good for. I don't see any analogy between the page created and real-world embassies. He who speaks no English should not edit English Wiktionary.
Admittedly, pages have been created in other Wiktionaries that carry in their title a foreign-language translation of "embassy". But from having looked there, I failed to notice any worthy use of these pages. I mean the following pages:
I see no point in replicating in English Wiktionary that which does not work in the original site. --Dan Polansky 15:32, 1 June 2009 (UTC)Reply
It doesn't necessarily have to be people with English as a second language; it can just be a way of coordinating erm... multilingual coordination. Perhaps rewriting it is better than deleting it, or making a redirection out of it. Mglovesfun 15:41, 1 June 2009 (UTC)Reply
Having said that, this page could do that as well, couldn't it? Mglovesfun 15:42, 1 June 2009 (UTC)Reply
If Beer parlour would not suffice for the topics under the head "international coordination", a page called "Wiktionary:International coordination" could be created. However, I don't see what the subject matter of international coordination is supposed to be. One topic belonging to the subject is the logo, which has been handled in Beer parlour. Another one is the proposal for unification of category structures across all the Wiktionaries, which I hope will be refused, and could as well be posted to Beer parlour. I am short of further ideas.
I have noticed meta:Wikimedia Embassy, which links to Wikipedia "embassies". From having looked at several ones in the languages that I understand, Wikipedia embassies mainly list, per language, people who understand the language and are willing to respond to inquiries. An example: W:Wikipedia:Local Embassy. Also German Wiktionary's de:Wiktionary:Botschaft mainly lists contacts per language. I doubt that this is very useful, though. Wikipedia embassies host no discussions. --Dan Polansky 16:34, 1 June 2009 (UTC)Reply
For me the 2 interests of the Wiktionary:Embassy system are :
  1. To compare and synchronize all wikies (eg: did you know that the fr.wikt proposes a specific filters search engine (eg: phonetic, rhymes and anagrams) since the last week ? We can also propose into it to export the translations "assisted editing" for that any ambassador would bring it into his known wiki).
  2. All wikies Beer parlours are too long to follow if we are searching something (or everything) to import into another wiki.
JackPotte 21:17, 1 June 2009 (UTC)Reply
A multilingual-coordination page should exist on meta (it probably does already, but no-one uses it) so that every Wiktionary can talk in the same place instead of having every conversation in 200 disparate embassies (which no-one will use, for exactly the same reason). Discussions that affect a local wiki should take place on that wiki's discussion forum, it is not up to a "central committee" to dictate what each Wiktionary does. Conrad.Irwin 23:36, 1 June 2009 (UTC)Reply
To my mind the name of http://meta.wikimedia.org/wiki/Wikimedia_Forum isn't explicit enough for these functions. JackPotte 07:32, 3 June 2009 (UTC)
We (at sv:wikt) have seen that people have used other wiktionaries (chiefly en:wikt) as a source for translations. Obviously we are not alone, as Irish speakers since moved around sending messages to - as I understand - a large number of wiktionaries that the correspondingly created entries of names of countries failed to show some important information concerning use of definite article and, if memory serves, lenition(?). This would have been facilitated by a single "alert" page for all wiktionaries.
Though that may or may not be what Jack refers to... \Mike 23:02, 5 June 2009 (UTC)Reply

June 2009

Term for “pertaining to”?

Does anyone know what the grammatical term, if any, is for “a word pertaining to a subject”, such as “verbal” for “pertaining to words”? (In English these are often formed by using a Latin term plus -al.)

This came up due to a quandary, like all too familiar to the lexicographically inclined: I couldn’t think of a word. Specifically, “pertaining to dance”. I was tongue-tied. Mortified. It was awful. (The word, of course, is terpsichorean.)

I ask not idly – think of the category we could make! And an appendix for the irregular forms beyond counting!

Your assistance, as ever, kindly appreciated in this matter, dear sirs and madams.

—Nils von Barth (nbarth) (talk) 22:57, 1 June 2009 (UTC)Reply
I think the word you're looking for is adjective or adjectival. We already have a category for those. --EncycloPetey 13:27, 2 June 2009 (UTC)Reply

I don't think so. I think (s)he's looking for a term for these sorts of (semantic) relationships:

noun          adjective
word(s)       verbal
dance         terpsichorean
sheep         ovine
nickname(s)   hypocoristic
star(s)       stellar

which would be a good word to have. If there isn't one already, maybe we should create one!

The relationship is kind of subtle; in the general case, the adjective can mean "of (a) ___", "pertaining to (a) ___", "being (a) ___", "resembling (a) ___", and maybe other things as well, but it seems like each specific noun–adjective pair has duked it out separately and come up with different rules for what the adjective is allowed to mean. :-P

RuakhTALK 15:04, 2 June 2009 (UTC)Reply

"Derived adjectives" would give us 'rainy' from 'rain', 'childish' from 'child', etc. Or are we concerned with cases were the adjective uses a different root than the corresponding noun? Something like "suppletive adjectives" maybe? kwami 18:36, 2 June 2009 (UTC)Reply
Yes, that seems to work. (Note though that "suppletive adjective" is often shorthand for "suppletive adjectival paradigm", such as good ~ better.)
David Beck (2006) Aspects of the theory of morphology, in the section "The typology of suppletion", p 421 example (9), has
Noun ~ corresponding denominal adjective (= relational adjective)
with examples like father ~ paternal, earth ~ terrestrial, sun ~ solar. Footnote 12 (p 461) then says,
This is a well known phenomenon: borrowed (here, Latinate) adjectives standing in suppletion relation to the native (Germanic) nouns. Compare these to the non-suppletive adjectives fatherly, earthly, sunny, etc. with more concrete meanings.
Though he never specifically calls them "suppletive adjectives", the fact he contrasts them to "non-suppletive adjectives" makes it pretty clear that's accidental. kwami 18:57, 2 June 2009 (UTC)Reply
Ah, so "denominal adjective" or "relational adjective" must be the term we want. Thanks! :-)   —RuakhTALK 19:09, 2 June 2009 (UTC)Reply
Depends on how narrow a category you want. A denominal adjective is any adjective formed from a noun root. "Fatherly" is a derived denominal, "paternal" is a suppletive denominal. "Relational" is an adjective used to classify a noun, not the word itself. So in "musical instrument", the adj. is relational, in "she's quite musical" it is not. kwami 20:51, 2 June 2009 (UTC)Reply
Paul Georg Meyer (1997) Coming to know: studies in the lexical semantics and pragmatics of academic English also uses "suppletive adjective(s)" on p 130.
Found a snippet from an abstract posted online: "Although many linguists have referred to [collateral adjectives] (paternal, vernal) as 'suppletive' adjectives with respect to their base nouns (father, spring), the nature of ..."
As for the term "collateral adjective", I found the following: Tetsuya Koshiishi, "Collateral adjectives, Latinate vocabulary, and English morphology", Studia Anglica Posnaniensia, Jan 2002.[12] The intro explains the term:
The purpose of this paper is to study the nature of collateral adjectives and the Latinate vocabulary in English together with some morphological problems relating to them. English abounds in pairs like the following, in which adjective counterparts are difficult to relate in terms of their shape to the base nouns:
(1) spring — vernal fall (American) — autumnal dog — canine wolf — lupine arm — brachial iron — ferric father — paternal day — diurnal summer — aestival winter — hibernal cat — feline horse — equine heart — cardiac ice — gelid, glacial mother — maternal church — ecclesiastical, ecclesial (2)
Pyles — Algeo (1970: 129) call the above adjectives collateral adjectives (CAs). According to their definition, CAs are "[adjectives which] are closely related in meaning but quite different in form from their corresponding nouns, like equine and horse". It seems that this terminology is strictly theirs, and I have not seen any other literature on word formation referring to it.
In the tradition of lexicography, on the other hand, the term CA was once used in the dictionaries published by Funk and Wagnalls in the 1950's. (3) Those dictionaries are peculiar in that they describe CAs in the entry of the base nouns. However, all of the dictionaries of that company are now out of print and hence we seldom see this term used in the lexicographic field as well.
He continues,
Although in its strict sense, vernal is not etymologically collateral to spring (i.e. no "same stock" can be assumed between them etymologically), I still refer to it as a CA because it is an adjective of Latinate origin used dissociatively.
That is, this author defines "collateral adjectives" as cognate, like father ~ paternal, but not pairs like sheep ~ ovine. But if you read the paper, that appears to be an assumption on the part of the author, due to the etymology of "collateral". Wikipedia at w:Collateral adjective would include sheep ~ ovine. If Funk and Wagnalls did as well, then we would have no reason not to ourselves.
Okay, it looks like F&W covered all such pairs, not just cognate pairs: "Another category in which the F&W prides itself are 'collateral adjectives', which are 'adjectival forms of the noun so remote in spelling that they may not be brought to mind' [p 7a] [arm .. n. 1. Anat. ... <> Collateral adjective: brachial.]."[13] Now, "brachial" is not cognate with "arm". Nor are other pairs I can see in Google snippet view (I don't have an F&W dictionary): dawn ~ auroral, heat ~ thermal, flood ~ diluvial. F&W was published at least up to 1984.
I think "suppletive adjective" and "collateral adjective" are probably equally good for sheep ~ ovine. "Denominal adjective" would be the term if you also want to cover "sheepish". kwami 21:38, 2 June 2009 (UTC)Reply

Thank you all so kindly, particularly kwami!

The tasks now before us are these, I perceive:

  • A general category – say, Suppletives or Suppletion or Suppletive forms, in Category:Grammar.
  • A specific category therein called Suppletive denominal adjectives or Collateral adjectives, for such words as ovine.
  • Properly, a containing category, Suppletive adjectives, for such words as better, best, worse, worst (compared to good and bad, but not denominal).
  • For completeness, a category Derived denominal adjectives for such words as sheepish, and a category Denominal adjectives to contain this category and that of suppletive denominal adjectives.
  • Similarly, Suppletive verbs for such verbs as be, and Suppletive nouns for such nouns as people.
  • Move Wikipedia’s List of irregular English adjectives to Wiktionary, under Appendix:Irregular English adjectives or Appendix:English suppletive adjectives or the like.

So I ask you, O Beer Parlourians all:

  • Do these seem the right items (categories, tasks)?
  • How find you these nomenclatural propositions?
—Nils von Barth (nbarth) (talk) 00:22, 3 June 2009 (UTC)Reply
I'd say that denominal adjectives are part of the lexicon, rather than part of the grammar.
Suppletive nouns may only make sense as suppletive plurals. That might be a clearer cat. name.
For clarity, we might want a more specific term for good ~ better than just "suppletive adjectives", since that term would cover father ~ paternal as well. Irregular adj. or suppletive adj. forms / paradigms would work. kwami 01:03, 3 June 2009 (UTC)Reply
Ok, I’ve made a category at Category:Suppletion, and various subcategories, together with Category:Irregular inflections to contain it – hope it looks good!
Once List of irregular English adjectives gets transwikied, I’ll fix it up (and categorize the words therein) and we should be sorted – thanks!
—Nils von Barth (nbarth) (talk) 00:30, 5 June 2009 (UTC)Reply

template:blend and category:Portmanteaus

Moved discussion to WT:RFDO.msh210 04:56, 8 June 2009 (UTC)Reply

POS prominence

There are frequently comments at WT:FEED to the effect that pages are "too cluttered" or that users "can't find the definition". While the nesting hierarchy of headers is pretty clear from the table of contents, it's not so clear just looking at the page itself, especially if there is a long etymology, several etymologies, or a lot of alternative spellings. Structuring entries so that the POS is the top-level header is, in my view, extremely undesirable. Therefore I wonder if we should consider ways in which the POS line(s) could be made to stand out more. Two options that suggest themselves are 1) they could have a differently-coloured or -shaded background, or 2) they could be accompanied by some sort of picture-style icon (in the manner of other Wiktionaries, only ours would ONLY use it for POS headers). Any thoughts? (Through PREFS, I have inflection templates display as coloured boxes where available, which virtually eliminates this problem; but that obviously isn't the default.) Ƿidsiþ 14:24, 3 June 2009 (UTC)

This seems like a good approach to an important goal. I've wondered whether we couldn't make all of the headers other than the language and PoS headers smaller by default, while retaining their structural role. Icons would be an additional attention-directing tool for new and occasional visitors.
Icons might be a good way to draw attention to important content concealed under show/hide boxes, such as long etymology and pronunciation sections. It would be handy for registered users to be able to suppress the icons to quieten down the visual impact. Horizontalizing alternative spellings and pronunciation would also help. DCDuring TALK 15:48, 3 June 2009 (UTC)Reply
Just one comment: whatever we do, we'd better make sure the section headers remain editable. The stupid uneditable sections at fr:wikt are the single major reason I don't touch that project if I can avoid it. Circeus 16:31, 3 June 2009 (UTC)
I like this idea, and believe that User:Hippietrail has some javascript that will divide the pages up into sections (allowing our current layout to be colourised). Making headings smaller may not help that much, as then what little structure there is becomes harder to follow, but it may be worth experimenting with. As Widsith says, structuring entries by POS is extremely undesirable, and dividing them by Etymology and Pronunciation is, in my opinion, even worse. I would strongly support a large one-time-effort to restructure our entries to make them useful and usable, for editors, readers and robots. Conrad.Irwin 16:36, 3 June 2009 (UTC) (Section moved after some auto-conflict resolution misplaced my comment)Reply
On a semi-related note, I am particularly keen on an approach taken by http://www.wolframalpha.com/input?i=alphabet whereby they assign a short "word" to identify each definition. Even where the word isn't particularly appropriate, it gives some initial context which aids in understanding the definition, and it could be used to link to individual definitions easily. Conrad.Irwin 16:36, 3 June 2009 (UTC)Reply

Someone pretty recently wrote (at WT:FEED) "the definition is sometimes not clearly marked and you need to look all over the page" and I'll copy my response here:

We get that complaint a lot. Well, more precisely, there seem to me to be a lot of people who write, here at WT:FEED, that they went to a page and couldn't find the definition. Two options that I can think of off the top of my head, and which have surely been considered already, are:
  • Make the definitions slightly larger (in font size) by means of CSS. This would require a single edit to whatever css file: body.ns-0 #bodyContent ol li{...}.
  • Instead of ==Language==, ===POS===, infl, #defn, ====Other stuff====, use ==Language==, ===POS===, infl, ====Definitions====, #defn, ====Other stuff====. This would, of course, require lots and lots of edits, though most of them would be bot-doable.
As I said, I have little doubt these have been suggested before, but I don't know where or when.

End-quote. I also like DCDuring's idea of making headers' font sizes smaller (except POS and language headers'), though.—msh210 16:30, 3 June 2009 (UTC)Reply
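
As a rough illustration of why the second option is described as mostly bot-doable, here is a minimal sketch of the kind of text transformation a bot might apply. The function name addDefinitionsHeadings is hypothetical, and the rule it uses (insert a heading one level deeper than the nearest preceding heading, just before the first "#" line) is an assumption made for this example, not an agreed layout. Real entries would need more care, for instance with "#" lists that are not definitions.

```javascript
// Hypothetical helper: inserts a "Definitions" heading before the first
// "#" definition line that follows each heading in a chunk of wikitext.
function addDefinitionsHeadings(wikitext) {
  const out = [];
  let lastLevel = 0;        // number of "=" signs in the most recent heading
  let needsHeading = false; // true until this section's definitions are seen

  for (const line of wikitext.split("\n")) {
    // Matches headings such as "===Noun===" and captures level and title.
    const m = /^(=+)\s*(.+?)\s*\1\s*$/.exec(line);
    if (m) {
      lastLevel = m[1].length;
      // Do not add a second heading if one is already present.
      needsHeading = m[2] !== "Definitions";
    } else if (line.startsWith("#") && needsHeading && lastLevel > 0) {
      const marks = "=".repeat(lastLevel + 1);
      out.push(marks + "Definitions" + marks);
      needsHeading = false;
    }
    out.push(line);
  }
  return out.join("\n");
}
```

Because an existing "Definitions" heading is detected, running the sketch twice over the same text should leave it unchanged, which matters for a bot that may revisit pages.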

Wiktionary:Layout woes shows some ideas, though many of them have not been discussed in detail. Conrad.Irwin 16:50, 3 June 2009 (UTC)Reply
I hope that whatever we do does not require changing the structure or, worse, debating about changing the structure. Nor should it make headings uneditable. Do we have test suite of a few different types of entries (fully compliant and not-fully compliant; with/without images; short/long) so that the consequences of different style-sheet decisions could be easily tested and seen?
The WolframAlpha thing depends on the existence of an appropriate synonym or context for each sense, which would itself be a good exercise. It would be interesting to test on something like "head". MWOnline uses bold synonyms in some of their definitions, but with a different layout, more like ours. Could we lead off a definition with a bold synonym? Perhaps either context or the bolded synonym could appear in ToC instead of some of the lower level headings. DCDuring TALK 17:06, 3 June 2009 (UTC)Reply
  • Indeed I have a JavaScript extension User:Hippietrail/addstructure.js which I made at least a couple of years ago which wraps all sections in correctly nested divs with classNames generated from the headings. CSS styles can then be applied.
    It still works unchanged now. I think it may not work with cirwin's "paper view" though and it doesn't do the addloadhook startup stuff that became standard some time after I wrote it. Any of the technically inclined may like to play with it. — hippietrail 18:24, 3 June 2009 (UTC)Reply
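
The idea of wrapping sections in correctly nested divs named after their headings can be sketched as follows. This is not the code of User:Hippietrail/addstructure.js, which is not quoted in this discussion; the function names, the "sec-" class prefix, and the {level, title} input shape are all assumptions made for illustration. The sketch works on a plain list of headings rather than on the live page, so it can be tried outside a browser.

```javascript
// Turn a heading title into a class name, e.g. "Proper noun" -> "sec-proper-noun".
// Assumption: only ASCII letters and digits are kept; other characters become "-".
function headingClass(title) {
  const slug = title.trim().toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return "sec-" + slug;
}

// Build a tree of sections from a flat list of {level, title} headings,
// where level is the number of "=" signs (2 for ==English==, 3 for ===Noun===).
function nestSections(headings) {
  const root = { level: 0, className: "sec-root", children: [] };
  const stack = [root]; // chain of currently open sections
  for (const h of headings) {
    // Close every open section that is at the same depth or deeper.
    while (stack[stack.length - 1].level >= h.level) stack.pop();
    const node = { level: h.level, className: headingClass(h.title), children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

// Render the tree as nested <div> markup (section content omitted).
function renderDivs(node) {
  const inner = node.children.map(renderDivs).join("");
  return `<div class="${node.className}">${inner}</div>`;
}
```

With markup like this in place, a style sheet could target, say, a noun section inside an English section without any change to the wikitext itself.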

Animations policy

At Commons, looking for an image suitable for barber pole, I found a gif animation, which is now inserted there. (There are perfectly fine still photos of barber poles at Commons.) Should we have a policy against animations? Arguably they detract from our overall serious (?, boring?) image. An animation that initiated only at the request of the user might be different. DCDuring TALK 16:14, 3 June 2009 (UTC)Reply

No. Though now you've brought it up there are going to be endless arguments about exactly when animation should be used :p. In my opinion, almost any animation is fine, and most animation is good; the situations I would not want to see are where there are several animations clustered together on a page, or where the animation is not directly relevant to the topic (i.e. a video of someone running is not relevant to "leg") but as both of these also apply to static imagery I see no reason to differentiate. I strongly dislike the idea of policy, but I would have no problem with people creating Help:Illustration (or similar) that can give advice as to what looks tasteful. We don't currently have a forum for debating difference in opinion for page appearance, I would suggest that WT:RFC can perform this role, if and when such a disagreement arises. Conrad.Irwin 16:23, 3 June 2009 (UTC)Reply
I agree with Conrad.—msh210 16:33, 3 June 2009 (UTC)Reply
Me, too. BTW, if anyone wants to go nuts with this, entries like Geneva wheel and escapement — all the clock mechanisms, really — would be a good place to start. —RuakhTALK 17:21, 3 June 2009 (UTC)Reply
Yeah, let's forbid animations, because god knows they're completely useless. — [ R·I·C ] opiaterein 17:25, 3 June 2009 (UTC)
Stroke order hadn't occurred to me. Still, shouldn't it be at user initiative? At barber pole, the animation makes a minimal contribution to the entry and is somewhat distracting to old codgers (sample of 1). Is it a good illustration for guideline purposes of minimum suggested value? DCDuring TALK 17:40, 3 June 2009 (UTC)Reply
I for one find animations distracting. An animation that starts only after I click some button is okay with me. But I am not proposing anything like a ban on animations, cemented by a 75%-majority rule needed for its removal. I just think that in each case the pros of using an animation should be weighed against the con of the distraction effect. --Dan Polansky 18:16, 3 June 2009 (UTC)Reply
Limited use of animation can be a very good thing. Besides stroke order, animation can be more enlightening than static images for (1) encoding Sign Languages, (2) illustrating actions (e.g. parkour), and (3) representing 3-D shapes (e.g. dodecahedron). --EncycloPetey 02:47, 4 June 2009 (UTC)Reply
I'm nuts about animation. Look at drip. You couldn't do justice to the verb with a static image. bd2412 T 06:30, 4 June 2009 (UTC)Reply
I do appreciate that animations can be useful. Still, I have difficulties concentrating on reading the definition when I see something moving in one screen corner, which may be untypical of an average reader though. In parkour, I would ideally like to have a button to pause the animation, calmly read the definition, and then run the animation again, which I currently cannot do. Having this option, we can combine the benefit of having an animation with the benefit of having a calm screen.
An alternative to animation for "parkour"
A still image can contain an animation of sorts, as shown in the image for "parkour":
I still think that, per individual case, if the job of the animation can be achieved using a still image, an animation should better be avoided.
Let us see what Wikipedia has on animations: W:Wikipedia:Image_use_policy#Animated_images:
Inline animations should be used sparingly; a static image with a link to the animation is preferred unless the animation has a very small file size.
My concern is not about size though; it is about distraction, and about lacking control for the user to disable this distraction. --Dan Polansky 08:33, 4 June 2009 (UTC)Reply
I had up to now suppressed my itch to start ranting about the evolved visual system of humans, which gives increased priority to items that move, but I have now come across a guideline from Jakob Nielsen:
Let me quote:
"These days, tiger-avoidance is less of an issue, but anything that moves in your peripheral vision still dominates your awareness: it is very hard to, say, concentrate on reading text in the middle of the a page if there is a spinning logo up in the corner."
--Dan Polansky 09:05, 4 June 2009 (UTC)Reply
I've asked the person who provided the Commons barber-pole animation whether user-initiated animation is feasible. I find that particular animation makes me ill. In the meantime I'm going to insert a photo. DCDuring TALK 12:13, 4 June 2009 (UTC)
“Sometimes moving” isn't even a defining characteristic of the term barber pole—there is no reason to substitute a movie for a good diagram in this entry. Michael Z. 2009-06-05 18:23 z
Dancing kitties would illustrate this dodecahedron even better.

Goodness, many of the hand-picked examples aren't even very useful, are they? What on Earth does jiggling a dodecahedron contribute to the definition of dodecahedron? And traditional illustrations of calligraphy are more effective than a letter redrawing itself.[14][15][16][17][18] Please keep in mind that a dictionary only needs to illustrate a thing with characteristics difficult to define simply (like fraktur type), not explain its technical (encyclopedic) details.

Why wait around to see the strokes?

But the worst thing is that an unstoppable moving image on the page makes it almost impossible for me to comprehend even a sentence. Drives me freaking nuts.

Gifs are the worst kind of movie file. Any animation's motion should be user-controlled, as required by accessibility standards,[19][20] or clicked through to, via a link. Michael Z. 2009-06-05 04:41 z

I don't feel that a rotating dodecahedron or barber's pole adds anything; it seems gratuitous and irritating (even though most modern browsers are capable of disabling animations per site). But the calligraphic stroke orders are good; these would have to be very large otherwise and full of directional arrows. Equinox 10:44, 5 June 2009 (UTC)Reply
While we are at it: let's forbid anything three dimensional like dodecahedrons entirely, Equinox. After all if it does not conform to a proper 2D computer screen, it cannot possibly be real, can it? And let's reinstate the old Utah law that pi equals three. So much easier! Jcwf 19:11, 5 June 2009 (UTC)Reply
Yeah, clearly my calmly stated opinion on animations means I want to exterminate the Jews. Thanks for clarifying. Equinox 19:40, 5 June 2009 (UTC)Reply
P.S. Sorry to sound irritated, but then try to teach three-dimensional objects like molecules or unit cells to freshman chemists who are of the computer-screen flatland generation, like I do. They are hopeless... And yes, they get annoyed at rotating tetrahedra and such, too. That is why I insist on showing them. Contrary to their strongly held belief, they do live in a 3D world. So do you. Jcwf 19:11, 5 June 2009 (UTC)
I've taught geometry to high school students, and they don't interpret static "3-D" drawings as three-dimensional. Classes like woodshop, drafting, etc are gone from US high schools, and the result is that the only experience students have with "3-D" is in computer animation. If it isn't moving or physically present, then it isn't interpreted as three-dimensional. --EncycloPetey 22:15, 5 June 2009 (UTC)Reply

After edit conflict: In light of the comment immediately above, the discussion may not have concluded.

  • I take the sense of this discussion (so far) to be:
    1. No desire for explicit policy or even guidelines.
    2. Acceptance of animation that contributes notably to the content of an important sense in the entry.
    3. A strong preference for user-initiated animation over other animation, where feasible.
    4. General user-oriented Web good practice provides useful guidance.
    5. A preference to have links to off-page (usually off-wikt) animations that might be low value, distracting, or not user-initiated.
    6. WT:RFC or WT:TR are the appropriate forums for reviewing animation content at this time.
    Is that about right? DCDuring TALK 19:21, 5 June 2009 (UTC)Reply
Thank you. Michael Z. 2009-06-06 15:39 z
Have a look at Mills Mess: it's just about impossible to explain a juggling motion with words and still images; but I'll happily abide by a consensus decision. Mglovesfun 19:37, 5 June 2009 (UTC)
What about the potential for converting it to something user-initiated? It seems more valuable than the barber-pole animation. BTW, shouldn't the English entry be at Mills' mess, with "Mills Mess" being the German translation? DCDuring TALK 19:47, 5 June 2009 (UTC)Reply
I would like to point out that .gifs are not binary; they have three states. A .gif image can be static, it can loop continuously, or it can play once and then remain static. An image which plays through once and then stays static on a relevant frame is often very useful, and it can be replayed by reloading the page or by reloading the image. - [The]DaveRoss 19:50, 5 June 2009 (UTC)
Although the play-once solution isn't perfect, there are some cases where it would be good. Is that something that we can do without editing the gifs on Commons? What would be the required edits to the gifs or to our Image link to Commons? DCDuring TALK 19:58, 5 June 2009 (UTC)Reply
Not only that, but if I understand correctly (I've never done this myself), you can actually set it to play n times, where n is any non-zero integer in the unsigned two-byte range (i.e., any integer 1–65535). And for JS-enabled users, I think we can create a "replay" link that basically removes and re-adds the image. (But I haven't actually tried this. And the n times thing would need to be specified within the gif itself on Commons, which means it's not easily tweakable.) —RuakhTALK 20:23, 5 June 2009 (UTC)Reply
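
For the technically inclined, the two-byte repeat count mentioned above lives in what is commonly called the NETSCAPE2.0 application extension of the GIF file. The sketch below reads that value from the raw bytes of a file. The function name gifLoopCount is hypothetical, and the scan is deliberately naive: it looks for the extension's byte signature anywhere in the data, which a careful parser would not do. A stored value of 0 conventionally means "loop forever", and viewers differ somewhat in how they count a finite value, so treat the result as a hint rather than a guarantee.

```javascript
// Byte signature of the looping extension: extension introducer (0x21),
// application extension label (0xFF), block size 11, then "NETSCAPE2.0".
const MARKER = [0x21, 0xff, 0x0b,
  ...Array.from("NETSCAPE2.0", c => c.charCodeAt(0))];

// Returns the stored loop count (0 = loop forever, otherwise 1-65535),
// or null if no looping extension is found.
// `bytes` can be a plain array of numbers or a Uint8Array.
function gifLoopCount(bytes) {
  for (let i = 0; i + MARKER.length + 4 <= bytes.length; i++) {
    if (MARKER.every((b, j) => bytes[i + j] === b)) {
      const p = i + MARKER.length; // start of the data sub-block
      // Sub-block layout: size 3, sub-block id 1, then the count, low byte first.
      if (bytes[p] === 0x03 && bytes[p + 1] === 0x01) {
        return bytes[p + 2] | (bytes[p + 3] << 8);
      }
    }
  }
  return null;
}
```

Changing the count means rewriting those two bytes in the file itself, which is why it would have to be done to the copy on Commons rather than in our image link.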
That would seem to be an approach that might work according to what little I've gleaned from WikiCommons Village pump. DCDuring TALK 10:36, 6 June 2009 (UTC)Reply
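The "play n times" setting discussed above lives inside the gif file itself, in what is commonly called the Netscape looping application extension. As a rough illustration only (the function name is hypothetical, and nothing here is a Commons or MediaWiki API), the block could be built like this. By convention a stored count of 0 means "loop forever", and viewers may differ on whether n counts total plays or repeats after the first, so treat the exact behaviour as an assumption to be checked.

```javascript
// Sketch: build the 19-byte looping extension block for an animated gif.
// `netscapeLoopExtension` is a hypothetical helper name, not an existing API.
function netscapeLoopExtension(loopCount) {
  if (!Number.isInteger(loopCount) || loopCount < 0 || loopCount > 0xFFFF) {
    throw new RangeError('loopCount must be an integer from 0 to 65535');
  }
  // 11-byte application identifier plus authentication code.
  const appId = Array.from('NETSCAPE2.0', (ch) => ch.charCodeAt(0));
  return Uint8Array.from([
    0x21, 0xFF,        // extension introducer, application extension label
    0x0B,              // block size: 11 bytes of identifier follow
    ...appId,
    0x03,              // sub-block size: 3 bytes follow
    0x01,              // sub-block id for the loop count
    loopCount & 0xFF,  // loop count, low byte (little-endian)
    loopCount >> 8,    // loop count, high byte
    0x00,              // block terminator
  ]);
}

// Example: the bytes one would write for a gif meant to play three times.
const playThreeTimes = netscapeLoopExtension(3);
```

Because the count is stored in two bytes, the 65535 ceiling mentioned above falls out of the format directly; changing it means re-saving the file on Commons rather than adjusting the image link here.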
Let's just try to replace gifs with real movie files—javascript workarounds will always be hacks. For the same quality, they can probably be as small as gifs or smaller, if the video format incorporates frame-differencing or whatever. A movie has the following properties that gif + javascript doesn't:
  1. Can meet web accessibility standards.
  2. Players are ready to embed today, rather than inventing a gif player.
  3. A movie can start in an initial state, without motion.
  4. A movie controller is a visual signal to the reader that they can start and control the movie.
  5. A movie can be started, stopped, looped an arbitrary number of times, scanned forwards and backwards, at any desired rate. This lets the reader actually analyze motion at their own pace, rather than relying on us guessing what won't drive them mental.
  6. A movie can support appropriate compression for animation or video imagery, at various frame rates and resolutions, incorporating sound, titles or captions, etc.
The question is how? Free and open .ogg, which lacks decent browser plugins, proprietary WMV or QuickTime, or Flash, which is ubiquitous but lacks free authoring tools? Michael Z. 2009-06-06 15:37 z
How about this: a default condition of "animation runs when user clicks a button to make it run", and stops running when the user clicks said button again; but with an option to set preferences to automatically run them, still having a button that lets the user stop it from running. bd2412 T 22:00, 5 June 2009 (UTC)Reply
FYI, in most versions of FF and IE you can stop all gif animations by just pressing Esc on the page. Don't know about Opera or Webkit browsers.
JS code to switch between the static and animated versions would be great. One could even imagine a bot creating static (or single time animated) versions of existing animated gifs. --Bequw¢τ 18:33, 6 June 2009 (UTC)Reply
So, if I understand, there might be a kludge using JS that generated an escape on loading the gif to stop it, and on mouseover or click-on reloaded the image (without another escape) to start it, generating another escape on another click or mouse-out to stop it. Thus mouse-over or click would activate the gif animation. To be useful this would have to work for anyone with Java enabled and we would have to hope that most users would have java enabled. I hope that the facts conform to our hopes. DCDuring TALK 18:58, 6 June 2009 (UTC)Reply
JavaScript, not Java. At least it would be an improvement, but not ideal. Michael Z. 2009-06-08 03:21 z
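For what it's worth, the click-to-play idea above could be sketched roughly as follows, swapping between a static frame and the animated gif rather than simulating an Esc keypress. This is only an illustration: the function name and the two file names are hypothetical, and it assumes a static version of each animation exists (for instance one created by a bot, as suggested above).

```javascript
// Sketch: toggle an image between a still frame and its animated version.
// `img` is any object with a writable `src` property, e.g. an <img> element.
// `makeAnimationToggle` and the file names below are hypothetical.
function makeAnimationToggle(img, staticSrc, animatedSrc) {
  let playing = false;
  img.src = staticSrc; // start in the motionless state
  return function toggle() {
    playing = !playing;
    img.src = playing ? animatedSrc : staticSrc;
    return playing; // true when the animation is now showing
  };
}

// Example with a stand-in object; in a browser one would pass the real
// element and call toggle from a click handler.
const demoImage = { src: '' };
const toggleDemo = makeAnimationToggle(demoImage, 'Mills_Mess_still.png', 'Mills_Mess.gif');
toggleDemo(); // demoImage.src is now 'Mills_Mess.gif'
```

Readers without JavaScript would simply see the static frame, which is arguably the safer default.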

How should we handle statutory/regulatory legal definitions? There are many terms (i.e. assault, gross income, solicitation, unfair competition) which are defined in statutes, which, in some cases may vary from jurisdiction to jurisdiction, and others (i.e. ground beef, cream cheese) which have national "standards of identity" established by U.S. federal agencies. Should we include such definitions? bd2412 T 23:39, 5 June 2009 (UTC)Reply

Significant issue. I think it may be a question of how rather than whether to include or refer to such statutory or regulatory definitions. Those "official definitions" may in some cases be the sole reason for keeping a compound entry (perhaps semisweet chocolate). I would like to avoid needlessly cluttering an entry with many definitions differing only in truly minor details. Indeed, I would like to exclude minor details from any definition, which might be a means to reduce the number of senses.
It would be a service if we could refer people to a source of appropriate jurisdiction-specific detailed definitions. But external links would themselves be a clutter. Would it make sense to have an "Appendix:External Links to Sources of Regulatory Definitions" with headings for the major categories (eg, Food, Air transportation, Pharmaceuticals) and have a single link to the appropriate section of the appendix? DCDuring TALK 00:47, 6 June 2009 (UTC)Reply
We could appendicize them internally as well, which would actually be quite useful. I can think of a number of reasons why a legal practitioner might want to know the definition of battery or conspiracy according to the laws of each of the 50 states (especially when figuring out where to bring an action or whether case law of other states is useful). As for the statutory food-identity definitions, that's federal law so there is only one for the country, which should be included in our entries (or perhaps summarized with a link if it's one that's filled with bizarre and lengthy exceptions). bd2412 T 02:27, 6 June 2009 (UTC)Reply
I don't see why we would want to have the full text. Our standard is to have one- or two-line glosses. Folks have repeatedly complained about longer "encyclopedic" definitions. If a full legal definition is longer, then it is not really dictionary material.
We are also not set up to maintain mirrors of text on other sites. I would think we would be happy to have live links.
The US is not the only English-speaking country whose regulatory and legal pronouncements are relevant. The UK, Canada, Oz, NZ, EU, India, Ireland, and several others may do so. In some cases provincial, state and even municipal laws govern.
In any event, collecting the links seems like a feasible first step, whatever subsequent steps we may be able to take with the vast increase in volunteers, technical resources, bandwidth, server capacity etc that are just around the corner. DCDuring TALK 03:55, 6 June 2009 (UTC)Reply
The thing is, the legal definition can be exacting. For example, the U.S. Food & Drug Administration standard of identity for "ground beef" states: "(a) "Chopped Beef" or "Ground Beef" shall consist of chopped fresh and/or frozen beef with or without seasoning and without the addition of beef fat as such, shall not contain more than 30 percent fat, and shall not contain added water, phosphates, binders, or extenders. When beef cheek meat (trimmed beef cheeks) is used in the preparation of chopped or ground beef, the amount of such cheek meat shall be limited to 25 percent..." So, if any of the conditions of this definition are not met (say, the cheek meat is 26%, or it contains added phosphates), then the meat in question is legally not "ground beef". bd2412 T 04:49, 6 June 2009 (UTC)Reply
I think that we could give the general definition, with a note mentioning that legal definitions also exist, and that readers interested should use the link to Wikipedia for more information. Legal definitions should not contradict the general definition, only be more precise. The existence of a legal definition seems to be a good reason to include a phrase, and should be added to CFI. But the length of the definition is not the issue: some definitions have to be long (e.g. you cannot define topological space in a few words, simplifying the mathematical definition would make it useless). Lmaltier 05:58, 6 June 2009 (UTC)Reply
But our definition of topological space is completely wrong: a topological space is not a set for which a topology exists (that would make it a rather useless concept: a topology exists for every set), but rather a set-plus-topology. And I don't think that's just an incidental fact about one entry; in our zeal to cover the technical criteria for a topological space (which is arguably encyclopedic), we've failed in our more basic obligation to explain what sort of thing it is (which is more clearly dictionaric). (It's not impossible to do both — in this case, we can replace our current definition of topological space with a brief and correct definition in terms of topology, and move the formal criteria to [[topology]] — but I think it's difficult and risky, so it's worth contemplating whether we really want to do both.) —RuakhTALK 14:43, 6 June 2009 (UTC)Reply
(Actually, I shouldn't say it's completely wrong: mathematicians will frequently speak of "a topological space X", where X is actually the set, and the specific topology of interest is either obvious, irrelevant, or mentioned elsewhere in the surrounding context. But our definition doesn't even cover that use accurately, since the mere existence of a possible topology is not enough to permit this sort of locution.) —RuakhTALK 14:50, 6 June 2009 (UTC)Reply
I agree with you, but I feel that the definition should have been only slightly changed. As it is now, it is not very useful, much less useful than the previous one (there are several definitions in topology, which one is intended is not clear at all). Lmaltier 14:12, 7 June 2009 (UTC)Reply
The legal definition is not necessarily merely "more precise" - sometimes statutes provide a definition that is counterintuitive to the common definition. For example, burglary for the longest time could, by definition (at common law) only be committed at night (if it was during the day, it simply was not burglary). In fact, the common law definition of burglary was "the breaking and entering the house of another in the night time, with intent to commit a felony therein, whether the felony be actually committed or not", which required all kinds of case law to strain out the niceties of what constituted a "dwelling" and what constituted "breaking". Every U.S. state has since modified this definition by statute, said statutes modifying the common law by specifying structures other than a "dwelling", or times other than "at night", or not requiring a "breaking" but merely an illegal entry. At the very least, we should probably have the common law definition for common law crimes/torts, and the definitions used in model codes and uniform acts if they have been adopted by multiple states. With respect to the FDA "standards of identity" for food items, I think there are only about 500 of them, and I think they should all be included. There are some other situations where items in commerce are defined by statute, particularly in international treaties, where the definitions would be useful for us to have as well. bd2412 T 04:01, 7 June 2009 (UTC)Reply
What can we do now to see what would prove useful and feasible? What should we do with semisweet chocolate and ground beef? They seem to be interesting cases. The US national regulatory definitions are good places to start. Should we put the entire US FDA regulatory definitions on the respective Citations pages? I would be wary of putting it on the main page, where it might drive some users away from Wiktionary. What else should we do with these two? Is anyone inclined to search out other Anglophone regulatory definitions? Should any corresponding Francophone regulations for "viande hachée" get parallel treatment, even though the defining regulations are in French? That would conflict with the idea of all definitions here being in English. It seems to me to argue for treating the full regulatory definition as a citation rather than a definition. DCDuring TALK 16:19, 7 June 2009 (UTC)Reply
I think we should have the definitions, for different jurisdictions, of words that are defined by statute or regulation (though not, I think, those defined by common law: that's too difficult, I think). Yes, [[assault]] may have a lot of definitions, but, hey, that's what subsenses are for  :-) . This would be a great boon for anyone who needs to know the legal definition. The big problem imo is how to include a sense which the law defines using three paragraphs of the Uniform Commercial Code along with several annotations. Or, to use bd2412's example above, how to include a U.S.-legal definition of ground beef. The ideal would be to summarize the law briefly; for ground beef, perhaps "chopped beef without fillers, without added fat, with fat content at most 30% of the whole, and with cheek-meat content at most 25%". Where this is not possible because the law is too detailed, perhaps leave off some detail — in fact, I did that in my proposed definition of ground beef by not specifying what's meant by fillers, but it can be done to a greater extent, so that the U.S.-legal definition of bockwurst can be reduced to perhaps "uncured, comminuted food comprising at least 70% meat, some of which is pork, and containing also eggs, liquid, onions (or the like), and sometimes other ingredients, as specified in 9 CFR 319.281". (To reply to a question, above, of DCDuring, yes, we should also include France's regulatory definitions of French terms, but in English translation, of course, like all our definitions.) And any such legal-type definition must specify that it is such, as by a context tag.—msh210 04:45, 8 June 2009 (UTC)Reply
I think as long as the definition indicates that the food must be "with specified quantities" of certain ingredients, we don't have to say exactly what those quantities are; we can leave a link to the FDA page which hosts their definition. Regarding concepts which will have multiple jurisdictional definitions, it won't actually be that much. Many states either adopt a uniform act or simply copy one another. As for other countries, as raised by DCDuring above, I suppose we'll have to have the French limitations translated for the page on "viande hachée", but I'd be surprised if the French government itself has not released official translations for importers (and if other countries have not also generally done so). bd2412 T 05:46, 8 June 2009 (UTC)Reply
By the way, the entire FDA catalog of standards of identity for foods is available here, from section 131 forward. bd2412 T 06:54, 8 June 2009 (UTC)Reply
So how do we decide which specific bits of law, from which countries, are worthy of entries? Perhaps there is New Zealand legislation whereby a "sheepfold" only counts if it has more than 30 sheep in it, and no NZer has heard of it, except lawyers. There are thousands of acts and contracts and case histories, and almost all of them define terms. Equinox 00:38, 9 June 2009 (UTC)Reply
Perhaps there's a usage restricted to the local slang. I don't see how this would be much different (you'd be surprised how many legal definitions are just codifications of what is essentially slang as used by lawyers). As for judicial opinions, judges interpret the law, and to the extent that they define legal terms, they are either doing so pursuant to the statute, or pursuant to a long line of cases which reflect variations of the same basic definition. bd2412 T 22:33, 10 June 2009 (UTC)Reply
I can see that having all possible legal definitions will never work in theory, but it might well work in practice. Most of the possible definitions will come from a small number of jurisdictions, mostly Anglophone. Few will display the energy to provide properly formatted material and veteran contributors may selectively withhold help. The result will be a small number of entries, perhaps with a high degree of topical relevance to some news story.
"Veteran contributors may selectively withhold help"! This is the kind of pragmatic say-what-you-meanism that would spark a hysterical committee investigation on Wikipedia. And you're probably right. Equinox 23:57, 10 June 2009 (UTC)Reply
At Wikipedia can they force volunteers to do what the volunteers think is not good for Wikipedia? I wasn't talking about a conspiracy to suppress anything. If some veteran decided to help someone to do something I thought was bad, I might get cranky and I might engage in angry forum discussions and e-mail exchanges. Period. Seriously, I just don't think that we are likely to be overwhelmed with poor quality legal content. Even some really good ideas die because of lack of support. Bad ideas mostly have less of a chance. This is why I don't think we should make a point of eliminating the Shorthand header. It is a useful reminder. DCDuring TALK 00:34, 11 June 2009 (UTC)Reply
No one is talking about forcing anyone to do anything here. But if editors (such as myself) plan to add statutory or regulatory definitions, I'd like to know that that's considered appropriate. bd2412 T 01:16, 11 June 2009 (UTC)Reply
That was just at Wikipedia. I'm sure some way of doing fuller legal/regulatory entries would be good in most people's opinion. Let's try to do a couple of them and see what people like and how hard it is to do them well. Whatever it is it will be better than arguing in the abstract. "Semisweet chocolate" and "ground beef" seem like good places to start, but anything of sufficient interest to someone to get them to do the work would be fine too, don't you think? For example, I might take a run at deepening "Value at Risk", "VaR", the bank regulatory concept with its official definition, if I can. Or the prudent man rule. Or cap and trade. We could go back and improve skim milk, non-fat milk, 1% milk, 2% milk, light cream, heavy cream, half-and-half. We could define a frankfurter in a regulatory sense, if anyone has the stomach for it. DCDuring TALK 02:04, 11 June 2009 (UTC)Reply

Here is what I would propose for milk, by way of example:

  1. {{context|US|legal}} Under the standard of identity established by U.S. Food and Drug Administration,[21] the lacteal secretion, practically free from colostrum, obtained by the complete milking of one or more healthy cows, and including the addition of limited amounts of vitamin A, vitamin D, and other carriers or flavoring ingredients identified as safe and suitable.

We could create a category for FDA definitions, as well, which would inherently indicate that the context is U.S. and legal. bd2412 T 19:23, 11 June 2009 (UTC)Reply

So, then, you propose listing this as a separate definitional sense, with its own Translations table and Synonyms? I see some rather odd consequences of this sort of format. --EncycloPetey 19:30, 11 June 2009 (UTC)Reply
There wouldn't be a point to having a translation table or synonyms, as no other language will have a word that uniquely corresponds to the U.S. legal definition of milk, and no synonyms at all since each legal definition is unique (ok, there are some exceptions, but they are minor). However, it is key to remember that if you sell cow juice in the U.S. and you fortify it with vitamin C instead of vitamins A or D, you can not legally call it milk. That's the point of having a legal definition at all. bd2412 T 19:57, 11 June 2009 (UTC)Reply
I wonder if any countries effectively use the FDA's standards, but provide translations into a language of their own. In principle, that could lead to an exact correspondence between regulatory definitions. Does it make sense for Spanish leche to repeat the exact words and contexts as this sense of "milk"? It would then, of course, be a translation. Other languages spoken in the US would seem to warrant the same treatment. — This unsigned comment was added by DCDuring (talk • contribs).
If another country, say Chile, has the same definition for leche as we do for milk then I can see the translation listed with a Chile qualifier. I wonder how often that would happen, though.msh210 21:14, 11 June 2009 (UTC)Reply
Is Spanish-language labeling and advertising of milk sold in the US exempt from FDA labeling and advertising requirements? I think not. So leche in a US TV advert or newspaper circular should have the same restricted meaning as bd2412's definition of milk. The same might apply to lait sold in the far northern reaches of New England where the population is partially Francophone and may get milk with Canadian bilingual labeling. Hawaii and some US possessions might also have issues of this sort in other languages.
Of course, we could decide that this has nothing to do with consumers, that it is only a jargon of lawyers, regulators and regulatees, so only the official language of the courts and the regulators matters. DCDuring TALK 23:13, 11 June 2009 (UTC)Reply
According to the FDA Compliance Manual, all food sold in the U.S. must be labeled in English, with the exception of Puerto Rico (where it can be labeled in Spanish or both languages). I don't think there is any place in the U.S. where you can legally sell food that does not bear an English identification of the food being sold (although the law may be laxly enforced in some places). bd2412 T 01:01, 12 June 2009 (UTC)Reply
As far as the FDA is concerned, all the regs specify is that if you use a foreign language, you have to include the same information, but they don't bother with specifying what the translations into Spanish or any other language would be. — Carolina wren discussió 01:27, 12 June 2009 (UTC)Reply
Still, I think you could get in trouble selling something that does not qualify under FDA standards for milk, and calling it leche. bd2412 T 01:46, 12 June 2009 (UTC)Reply
On one side it says "Milk"; on another it says "Leche". The issue is that the FDA standards undoubtedly are intended to protect consumers who do not read or speak English as well as those who do. So the word leche when used in an ad or on a package for milk sold in the US must mean the same thing as "milk". Incidentally, I regularly go to stores that have merchandise that bears labels in English and another language. That Spanish alone can appear on food labels in Puerto Rico is notable. DCDuring TALK 02:01, 12 June 2009 (UTC)Reply
Now we're talkin'. Including some form of FDA and standard of identity among the contexts (after the first) should lead to the creation of one or more categories once there are enough entries to populate them. Putting them in the context would also shorten the definition, which, at three lines, is long. I think our format would say the link should appear under a Notes section which means it needs to appear between <ref> tags. DCDuring TALK 21:09, 11 June 2009 (UTC)Reply
I agree: now we're talking; and I agree: the context should be in a context tag, not in the definition proper. That said, I don't see why "FDA" needs to be in the tag: the fact that it's a legal standard of identity in the U.S. is what's contextual; which agency defined it is, if anything, etymological. And I agree: the link should not be in the definition: perhaps in References or Etymology.msh210 21:14, 11 June 2009 (UTC)Reply
I merely thought "FDA" would be helpful to the user by providing some brief context that they are more likely to be familiar with than "standards of identity". DCDuring TALK 02:01, 12 June 2009 (UTC)Reply

Here would be the legal definition of artificially sweetened canned figs:

  1. {{context|US|legal}} Under the standard of identity established by U.S. Food and Drug Administration,[22] mature figs of the light or dark varieties packaged in water artificially sweetened with saccharin, sodium saccharin, or a combination of both, having a specified density, to which lemon juice, concentrated lemon juice or organic acids are added as necessary to reduce the pH of the finished product to pH 4.9 or below, and optionally containing any combination of natural and artificial flavoring, spice, vinegar, unpeeled segments of citrus fruits, and salt.

I realize that this sounds like a "sum of parts" case (figs, which are canned, and which are sweetened, and the sweetener is artificial) but if you try canning your own figs with a different artificial sweetener, or other ingredients that deviate from the FDA definition, or your pH is above the limit, then you will be admonished for selling a "misbranded" product (i.e., the FDA will say that your use of the phrase "artificially sweetened canned figs" does not correctly identify your product, and is in violation of the law, and your cans of figs will be confiscated and destroyed). bd2412 T 20:29, 11 June 2009 (UTC)Reply

The reason that the FDA doesn't specify a generic artificially sweetened for canned fruit in general seems to be that only certain canned fruits have been requested to be labeled artificially sweetened. The actual regulations support using a SoP construction. Save for the base product and the referred to base regulation, the exact same verbiage is used in defining artificially sweetened for apricots, cherries, figs, fruit cocktail, peaches, pears, and pineapple, but no standards for artificially sweetened applesauce, berries, plums, or prunes are defined. — Carolina wren discussió 01:34, 12 June 2009 (UTC)Reply
If we can be reasonably certain that the FDA uses "artificially sweetened" the same way every time (e.g. "sweetened with saccharin, sodium saccharin, or a combination of both"), then we could simply have that definition of the phrase, and spare the multitude of more SOP-like possibilities (but see nonstandardized breaded composite shrimp units and fried clams made from minced clams - "The common or usual name of the food product that resembles and is of the same composition as fried clams, except that it is composed of comminuted clams, shall be fried clams made from minced clams"). bd2412 T 07:22, 12 June 2009 (UTC)Reply

Okay, pursuant to the above I've created Template:standard of identity and Category:Standards of identity, and added corresponding definitions to milk and ground beef. How does it look? bd2412 T 04:12, 13 June 2009 (UTC)Reply

The template needs help from someone who knows how to make a context template (i.e., not me). I like the done entries, but the context tag should specify the country. And why is milk rfd'ed?msh210 22:17, 14 June 2009 (UTC)Reply
How does this work for non-food terms? For example, some states' codes, following an old version of the UCC (section 2-319 et seq.), define the terms "F.O.B.", "F.A.S.", "C.I.F.", "C.&F.", "ex-ship", etc.msh210 22:17, 14 June 2009 (UTC)Reply
And what about wine appellations?msh210 19:54, 16 June 2009 (UTC)Reply

Links to next and previous entry per language

I'm releasing a new JavaScript extension I've been working on for testing and comments.

Please add this to your monobook.js

importScript('User:Hippietrail/nearbypages.js');

Then clear your cache: hold the control key and click reload on most browsers.

Links are currently added in two places.

  1. At the left between the "navigation" box and the "toolbox" will be added one box for each language in the current page. Each is named after the relevant language or namespace. Or "browse" when it can't find any language headings in the page such as in edit mode or on a nonexisting page.
    ◄ links to the previous page and ► to the next page. Between them is a link to the current page in bold or in red if it doesn't exist.
  2. Below each language heading in the page. These have only the ◄ and ► buttons.

When in a namespace other than the entry/definition namespace only the navigation column links appear.

The links come from the latest dump file, not from the database. This means for new changes the links may be wrong. Currently there are new dump files released every 4 or 5 days.

Correct alphabetical order is used for all languages for which there is a locale on the Toolserver. For all others the fallback is current "en_US.utf8".

There is no specific support for languages with more than one script yet.

In all cases a better sorting algorithm is used than elsewhere on Wiktionary such as Category pages. Basically punctuation, capitalization, and spaces between words in a term are treated as secondary with primary attention paid just to the letters themselves.
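
To make the two-level idea concrete, here is a minimal sketch of such a comparison. It is not the extension's actual code: the function names are hypothetical, and it uses plain code-unit order where the real tool uses Toolserver locales. The primary key keeps only letters and digits, lowercased; punctuation, capitalization, and spaces only break ties.

```javascript
// Sketch of a two-level sort: letters first, everything else as a tie-breaker.
// `primaryKey` and `compareTerms` are hypothetical names for illustration.
function primaryKey(term) {
  // Lowercase, then drop anything that is not a letter or a digit.
  return term.toLowerCase().replace(/[^\p{L}\p{N}]/gu, '');
}

function compareCodeUnits(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

function compareTerms(a, b) {
  // Primary comparison on letters only; fall back to the raw strings
  // so that case, spaces, and punctuation decide between equal keys.
  return compareCodeUnits(primaryKey(a), primaryKey(b)) || compareCodeUnits(a, b);
}

const sorted = ['ice-cream', 'Ice', 'ice cream', 'iceberg', 'ice'].sort(compareTerms);
// sorted is ['Ice', 'ice', 'iceberg', 'ice cream', 'ice-cream']
```

Under this scheme "ice cream" and "ice-cream" land next to each other after "iceberg", instead of being pulled apart by the space and hyphen as a naive character-by-character sort would do.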

I'd like to hear who prefers just the links in the left column, just the links below each language heading, or both.

Note that namespaces and redirects are treated much as languages too but always with the default sorting sequence. American English for now.

Note that some namespaces are not in the normal dump files and as such will not get the links. This includes talk pages and user pages.

Cirwin has suggested it should be put in common.js - if you have rights to make such a change and think it's a good idea then please be bold.

All feedback appreciated. — hippietrail 03:52, 6 June 2009 (UTC)Reply

Thank you for doing this, the functionality is great and very helpful. I tested it in Firefox v3.0.10:
  • Occasionally, after a few navigation clicks, the previous/next words will no longer appear, I have to refresh the cache again.
  • Sometimes it loads slowly, it takes 2-3 seconds after the rest of the page already loaded.
  • I am not sure if the left navigator solution is needed.
  • Formatting under the language header: Can it be one line? Conrad had a good example for formatting.
  • The ◄ and ► images are used for audio link in the index. Could they be replaced by ← and → ?
  • With what other browser versions and types will this work? --Panda10 12:10, 6 June 2009 (UTC)Reply
  • Can you add linking to the specific language on a multi-language page? For example, a is a multi-language page. When I go to it from a Hungarian entry, can it jump to the Hungarian section? --Panda10 12:14, 6 June 2009 (UTC)Reply
I like it (more so than some other linking projects like the ranking boxes). And I think it would be very helpful when you want to look for similar words, or just for making wikt seem more like a bound dictionary. A couple things,
  • I have the interwiki links showing below the language header and sometimes the vertical spacing is too little and your links run over the other ones a bit (rendering on Google Chrome on MS Vista).
  • I prefer just the in-entry links. The navigation links are a bit long (it makes the presentation more obvious to show the current entry title between the previous and next ones, but it makes them longer). They get even longer still on long page names (see pneumoultramicroscopicossilicovulcanoconiótico). The left-pane navigation turns into 5 lines because the arrow & word (for both the prev & next entries) don't fit on the same line. Worse, the whole words don't even show anyways! Smart abbreviation might help, but I'm not sure about the consistency of left-pane widths.
  • I'd be much more tempted to use it consistently if the in-entry links were consolidated onto one line, though you might have some technical limitations. Maybe they could even be on the same line as the interwiki link, though that might be confusing.
  • Could you add a link to the Index pages when they exist (now that Conrad has them more updated) to the in-entry links? Maybe something like (prev - Index - next)?
  • On a, I was startled to see previous links starting with 'z'. While wrap-around might be interesting (especially to get to the 'last' word in the index) it might be unintuitive for others.
Thanks for the great work. --Bequw¢τ 18:24, 6 June 2009 (UTC)Reply

First a few quick responses. I might respond at more length soon:

  • On links not appearing or being slow to appear, that's most likely to be when the Toolserver is under heavy load. The Toolserver often slows down due to the many and varied tasks it does. It shouldn't be anything to do with the cache. It does seem much less reliable on MSIE for me though.
  • The left navigator solution may have more appeal soon when I increase the number of previous and next links. It is closest to how other online dictionaries such as Encarta do it. But yes both are still experimental for now.
  • I am thinking of ways to format on one line. You may see it soon.
  • I could use ← and → but on the computers I use they don't stand out well. I haven't seen the other places where ◄ and ► are used.
  • I have tested the feature on Firefox 3.0/3.1/3.5 Google Chrome 2 Opera 9/10 and Safari Windows. I have only tested on Windows, not on Mac or *nix. I did some work today to make sure it works on Internet Explorer but it seemed sketchy. I'd like more feedback on this.
  • I have now added linking to the correct language section of the page.
  • I personally really dislike the Gutenberg ranking sections. I think that data is very interesting but definitely doesn't deserve to take up so much space right at the top of articles.
  • I haven't yet tried the interwiki links under the language headings. That's another JavaScript extension by cirin I think. I'll look into it soon.
  • I'll look into adding a link to the Index pages too though it might be tricky to link to the specific page. I think this shows that there are several concepts which are language specific which would benefit from being done in a consistent manner somewhere near the language headings: random, next/previous, indexes, interwikis.
  • The problem you saw on a where some entries wrapped around to "z" is now fixed. There were several minor problems with both the back end and front end that I knew were there but which I ironed out today.
  • Thanks for your comments and please refresh your caches to check on the problems that are now fixed. — hippietrail 09:59, 7 June 2009 (UTC)
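The wrap-around behaviour discussed above (previous links on "a" pointing at words starting with "z") can be illustrated with a small sketch. This is not the actual nearbypages code, which is not shown in this thread; the function name `nearby_terms`, the sample word list, and the plain code-point sort order are all assumptions made for illustration.

```python
from bisect import bisect_left

def nearby_terms(index, term, count=1):
    """Return (previous, following) lists of up to `count` neighbours of `term`.

    `index` must be a sorted list of entry titles. Slices are clamped at the
    ends of the list, so the first entry gets no previous terms and the last
    entry gets no following terms (no wrap-around).
    """
    pos = bisect_left(index, term)
    present = pos < len(index) and index[pos] == term
    previous = index[max(0, pos - count):pos]
    start = pos + 1 if present else pos
    following = index[start:start + count]
    return previous, following

# Hypothetical sample index, sorted by plain code-point order.
index = sorted(["a", "aardvark", "abacus", "zebra", "zoo"])
print(nearby_terms(index, "a", 2))
print(nearby_terms(index, "zoo", 2))
```

Clamping the slice at position 0 is what prevents the first entry from reaching back to the end of the list; a wrap-around variant would instead take the positions modulo the list length.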

OK I've got the extension working with MSIE now and I've also listened to your opinions and changed my beautiful arrow symbols to the ugly ones you all prefer (-: — hippietrail 00:27, 8 June 2009 (UTC)

Just a couple of comments on the current version:
  • I like it.
  • I like the in-entry version much better than the sidebar one. One result of the sidebar version is that stuff in the sidebar will be at variable height (depending, roughly, on how many languages we have a word defined in), which makes me have to search more for the link I want in the sidebar: a bad thing.
  • There was some talk of including User:Conrad.Irwin/iwiki.js into the default .js. (It's currently a PREF.) In fact, I seem to recall — though perhaps incorrectly — that that was the plan. Although not strictly relevant to this discussion, I'd like to take this opportunity to urge that.
  • I think that for those with iwiki.js and nearbypages.js, they should be integrated so as to appear on one line. We currently have a <div> for the preceding word, a div for the succeeding, and a div for iw; perhaps these can be spans instead, or divs with style set so they float next to each other?
  • Even if the iwiki.js and nearbypages.js links can't (or won't) be on one line, at least the two nearbypages.js links should be.
  • In the in-entry version, the font size can (and should I think) be smaller, à la iwiki.js.
msh210 04:13, 8 June 2009 (UTC)
I haven't much to say beyond what is mentioned above. Integration of iwiki.js and a link to the index would be cool where the index is well-constructed (I can create a list of which words are where for the indices I generate, American Sign Language might be a problem). And you should exclude form-of entries from the index (I can give you the code I use for that for the indices if I haven't already). My idea for formatting would be something similar to what follows, with perhaps more efficient wording for the central section - this would allow the adding of a few more links if we suddenly decided we needed them, though I can't imagine what for (perhaps for Wikipedia links - would be nicer than the floating boxes by a long shot). Conrad.Irwin 15:34, 8 June 2009 (UTC)
Hungarian


  • I have finally gotten around to finishing the next version for testing. There is now more than one previous and next term. Comments and suggestions are again appreciated. — hippietrail 15:50, 11 June 2009 (UTC)
I like it. The new arrow looks good, too. Thanks. --Panda10 22:44, 11 June 2009 (UTC)
Improvement, a much better look! A couple things, (1) I lost the sub-lang-heading iwiki link (was this by design?) [found them], and (2) on entries created recently, the entry name in the list on the left is red (hinting the entry doesn't exist, when it does). Keep up the good work. Are you going to put it in WT:PREFS? --Bequw¢τ 00:59, 12 June 2009 (UTC)
  • I haven't done anything to interact with cirwin's iwiki script. I still see it on my machine and it still doesn't match of course. Getting two javascript extensions to interact is a bit tricky so I won't attempt it until it's clear that nearbypages won't change much more. I also made my own version of the iwiki thing as an experiment which checks all the relevant wikis live but it seems that the interwicket bot doesn't miss a beat so there's no gain.
  • Yes, the red link on the left is on purpose; though it looks like an edit link, it always displays the page rather than editing it. This is because the links are generated from the latest XML dump, so they can be up to five days stale, and a red link on the left is more often a too-new entry than a nonexistent entry.
  • I will add it to WT:PREFS when I think I've had enough positive comments. Then I'll work with cirwin to nicely interact with his iwiki script too. I will also be able to add some options and variables such as which of the two display areas you want and how much of each of prev and next context you want.
  • Thanks for the feedback! — hippietrail 02:03, 12 June 2009 (UTC)

One Million Words in English

According to http://www.languagemonitor.com/ we're about to hit 1,000,000 words (where they seem to define a "word" as something that has been used 25,000 times), a press release is available. Conrad.Irwin 10:39, 6 June 2009 (UTC)Reply

Nah, that's bogus. They've discussed it a few times on Language Log; see e.g. http://languagelog.ldc.upenn.edu/nll/?p=972. —RuakhTALK 14:55, 6 June 2009 (UTC)Reply
Yeah, basically the millionth word's been rescheduled like 5 or 6 times over a period of ca. a year to match with the release date of the guy's book. Circeus 13:52, 12 June 2009 (UTC)Reply

Category:Languages of the Caucasus

This category is currently defined as including "all languages spoken in Armenia, Azerbaijan and Georgia". Not only does this give the wrong impression that the Caucasus is limited to these three countries, but it also excludes the more famous Caucasian languages of Russia (Chechen for example). Because some languages spoken only in Russia are already in that category anyway, the definition should be extended to include some of the Russian administrative regions. -- Prince Kassad 18:20, 6 June 2009 (UTC)Reply

Fixed, thanks. —RuakhTALK 19:08, 6 June 2009 (UTC)

Adding references

There's a new user who's adding references to German pages, but adding the source in the edit summary - I think it would be better to add it to the etymology section, like in Lob - is there a better way to reference other books in our entries? I'd like to inform the user (User:MaEr) how we should do this too, as it seems like useful information to have. --Jackofclubs 08:25, 7 June 2009 (UTC)Reply

In the wikipedias I'm used to adding sources, with <ref>...</ref> and ==References== <references/>.
I tried it here but it looked quite strange, with an additional "Notes" header, so I guess this cannot be the right way.
And I could not find out where to place the "References" header(s): only one for the whole article or one for every language. This is not documented, as far as I can see. --MaEr 10:21, 7 June 2009 (UTC)
Hmm, maybe a page Wiktionary:References can be written? --Jackofclubs 10:40, 7 June 2009 (UTC)
WT:ELE#References -- Prince Kassad 10:53, 7 June 2009 (UTC)
Unfortunately, this was one of the places that could not help me.
But when searching in the page later, I found some other interesting statements: WT:ELE#The_essentials seems to suggest that you are free to put the references anywhere. WT:ELE#A_very_simple_example suggests a per-language-section "References" but does not use the <references/> tag. --MaEr 11:08, 7 June 2009 (UTC)Reply
I created a new reference template for the Etymologisches Wörterbuch der deutschen Sprache - at Template:R:EWddS. Do you think you could use it? --Jackofclubs 12:45, 7 June 2009 (UTC)
The template looks fine! May I suggest a parameter for the lemma? Maybe an optional one?
Did you notice that I do not use the most recent edition of the dictionary? The hard-core etymologist (or library inhabitant) probably does not use the 22nd but the 24th edition. See also w:de:Etymologisches Wörterbuch der deutschen Sprache. So maybe we need another parameter which controls the edition (and the ISBN). --MaEr 13:07, 7 June 2009 (UTC)
I've created Wiktionary:References as a redirect to WT:ELE#References. If needed, Wiktionary:References can become a guideline of its own. --Dan Polansky 12:59, 7 June 2009 (UTC)

I added (as a suggestion) two parameters to Template:R:EWddS: ed for the edition of dictionary and hw for the headword of the article within the dictionary. Feel free to correct or comment details — I never did any template programming before and English is not my native language. It still is easy to change things because I used the new parameters in Spiegel only. --MaEr 19:20, 8 June 2009 (UTC)Reply

'Variables' in page names

Following debates such as X one's Y off and I'll see your X and raise you Y, are these sort of titles acceptable, if not how do we deal with pages that need some sort of variable in the title, and what others are allowable? A few examples:

  1. amuse oneself
  2. work one's butt off
  3. milk it -- currently up for deletion

Or in other languages

  1. s'appeler (French)
  2. llamarse (Spanish)

Of course in Spanish the particle is quite often attached to the verb, so stuff like llámeme and llamarte are one word, but still sum of parts. Indeed in German you can go on ad infinitum creating one-word terms that are still SoP. Mglovesfun 19:57, 7 June 2009 (UTC)

Although the headwords with "one", "someone", etc. are not very searchable for normal users, we seem to have accepted them. I believe that previous discussions have not yielded any consensus on how to handle formulas other than these. There is some thought, not opposed, that we should have redirects from all major pronoun forms (e.g., "amuse myself", "amuse themselves", "amuse herself", etc.). That would seem to be a job for a bot, though it seems a good idea to leave a redirect behind when moving one of these to a "oneself" form or similar. DCDuring TALK 20:38, 7 June 2009 (UTC)
I'm gonna start an article for s'appeler tomorrow as I don't think that's SoP. But I wouldn't then create all the conjugated forms - surely putting see appeler covers this. Actually, WT:CFI avoids (or doesn't yet refer to) quite a lot of these issues, such as what is a term; indeed the phrase sum of parts doesn't seem to be in there even once. Mglovesfun 22:52, 7 June 2009 (UTC)
Wiktionary:About French says to cover s'appeler at appeler, as we currently do; but in Spanish we seem to do it the opposite way, with separate entries for e.g. lavar and lavarse (and the former not even linking prominently to the latter). It would probably be nice to do them both the same way, but as long as we present things such that readers expecting one approach will notice we're taking the opposite approach, I don't think it's a big deal. —RuakhTALK 00:04, 8 June 2009 (UTC)
I too have pondered over the issue of reflexive verbs. So far, most Swedish reflexive verbs are defined in the entry for the actual verb (lata, not lata sig) but I'm not sure that's ideal for a number of particle verbs where the word order tends to mess things up (ta till sig versus ta sig till). On the other hand, to put everything in the full-word entry would make a couple of reflexive verbs harder to find, e.g. in the case of lata sig, where there doesn't exist any non-reflexive use, and a verb entry at lata would have to be made merely to supply the user with some kind of "see also". \Mike 10:07, 8 June 2009 (UTC)
I'd tend to think that the SoP 'rule' could easily be applied to reflexive verbs; so s'appeler in the sense of to 'call each other' (by telephone) is SoP, but in the sense of 'to have the name of', this meaning is not SoP (IMO) so I'm gonna start the article and if someone puts it up for deletion, I have no objection to that. As for WT:CFI it does say 'all words in all languages', so stuff like llámame which is really SoP ought to be fine on the grounds it is undeniably a word. That takes us to the definition of 'word' - is don't one word or two? How about to-morrow, is that one or two? Hence the reason that the CFI could do with a bit of work on it. Mglovesfun 11:47, 8 June 2009 (UTC)
Well, according to our current approach, I suppose "s'appeler" is SOP, in that "s'-" is used in its sense of "indicating a reflexive verb" and "appeler" is used in its sense of "(reflexive) to be called". But there's no law that says we have to do it that way. —RuakhTALK 18:59, 8 June 2009 (UTC)
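The bot job suggested earlier in this thread (redirects from every major pronoun form to the "oneself" or "one's" headword) could be sketched roughly as follows. This is only an illustration under stated assumptions: `redirect_titles`, `redirect_wikitext`, and the two pronoun lists are hypothetical names, and nothing here talks to the wiki. The only part taken from MediaWiki itself is the `#REDIRECT [[...]]` page syntax.

```python
# Hypothetical pronoun lists; a real bot might need a longer or shorter set.
REFLEXIVE_FORMS = ["myself", "yourself", "himself", "herself",
                   "itself", "ourselves", "yourselves", "themselves"]
POSSESSIVE_FORMS = ["my", "your", "his", "her", "its", "our", "their"]

def redirect_titles(headword, placeholder="oneself", forms=REFLEXIVE_FORMS):
    """List the page titles obtained by swapping `placeholder` for each form.

    Returns an empty list when the headword does not contain the placeholder
    as a separate word (for example "milk it").
    """
    words = headword.split()
    if placeholder not in words:
        return []
    return [" ".join(form if word == placeholder else word for word in words)
            for form in forms]

def redirect_wikitext(target):
    """Wikitext body of a page that redirects to `target`."""
    return f"#REDIRECT [[{target}]]"

for title in redirect_titles("amuse oneself"):
    print(title, "->", redirect_wikitext("amuse oneself"))
print(redirect_titles("work one's butt off", "one's", POSSESSIVE_FORMS))
```

The sketch only handles a single placeholder word per headword. Titles with free variables such as "X one's Y off" would still need a human decision, as the discussion above notes.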

Favicon

Alright. Am I the only one for whom the favicon is suddenly the little scrabble tile thing? And when did this happen? And is there a way that we can not have this happen until a definitive outcome of whatever the heck logo vote is going on? Call me a grumpy old fart (given that I'm turning eighteen in an hour) but I find this annoying and would rather not see sudden changes when I sign on with the intent of going through newpages, going through RC, and adding some entries. It's irksome. And perhaps meddlesome on someone's part -- or at the very least this sort of change should have been announced somewhere? --Neskaya kanetsv 05:50, 8 June 2009 (UTC)Reply

To answer your first question, no: I, too, first noticed it this time that I signed on. I, too, was surprised to find a change made while meta's discussion is ongoing. Unlike you, though, I wasn't irked. But the main reason I'm commenting here is to wish you a happy birthday.  :-) --msh210 05:58, 8 June 2009 (UTC)Reply
Call me grumbly and persnickety and what say you, I just generally don't like sudden changes, and I'd really, really rather they not happen during meta discussion. Also, thanks. --Neskaya kanetsv 06:06, 8 June 2009 (UTC)Reply
No change for me yet, unfortunately (and the new favicon is not a scrabble tile!). This request (different favicons for Wiktionary and Wikipedia) has been delayed for a long time, and there is no reason why a new discussion about the logo should have caused an additional delay. Happy birthday. Lmaltier 06:59, 8 June 2009 (UTC)Reply
The request was invalid, and, at any rate, no action should have been taken until after the new logo proceedings. The argument that "Wikipedia" and "Wiktionary" shouldn't be the same cuts no ice with me, why don't Wikipedia use something that looks like their logo? Conrad.Irwin 11:11, 8 June 2009 (UTC)
I too see a new favicon, the one that has reminded me of a scrabble tile from the first time I saw it. I am speaking of the effect of the favicon on me, of which I have direct unquestionable evidence, not of the effect on other people. I was slightly annoyed when I noticed it, but not annoyed enough to do anything about it. --Dan Polansky 07:49, 8 June 2009 (UTC)
Likewise, I see the icky Scrabble icon. On the one hand, we do need to be using something different from Wikipedia (with whom we seem to be perpetually confused), but on the other hand, I really dislike the Scrabble-style logo. --EncycloPetey 15:35, 8 June 2009 (UTC)
I like it! --Jackofclubs 18:21, 8 June 2009 (UTC)
I don't like it! --Vahagn Petrosyan 18:43, 8 June 2009 (UTC)
Er, I see the new icon on Firefox on MS Windows, but not on Firefox on Red Hat Linux. I don't know whether this is a caching issue, or some markup that different browsers interpret differently.--msh210 19:57, 8 June 2009 (UTC)
Now, I see it for fr.wiktionary, not for en.wiktionary... Lmaltier 20:00, 8 June 2009 (UTC)
Almost certainly caching. Try visiting <http://en.wiktionary.org/favicon.ico> and hitting your browser's refresh button. --RuakhTALK 20:02, 8 June 2009 (UTC)
For those interested, the relevant request to the server folk is at https://bugzilla.wikimedia.org/show_bug.cgi?id=16315 Conrad.Irwin 20:33, 8 June 2009 (UTC)
Also, for those who care, I'll be talking to RobH at some point today and pointing out that the action taken was perhaps a great deal premature, and asking him nicely to undo it. --Neskaya kanetsv 20:54, 8 June 2009 (UTC)
Reverted, and I thus withdraw from consensus discussion ;p [23] --RobH 21:20, 8 June 2009 (UTC)
Thank you! —Neskaya kanetsv 22:32, 8 June 2009 (UTC)
No, please leave the change. I don't like it either, but having the same icon as Wikipedia is worse than having none! Its function is to differentiate the site in my history menu, so a non-differentiating icon is an absolute failure, when I jump between the two Wikis.
When there's a consensus it can be updated again, but we waited until hell froze over the last time, didn't we? Michael Z. 2009-06-09 00:16 z
I want the favicon to be animated. Can we make it take up the whole screen? Equinox 00:20, 9 June 2009 (UTC)
I agree with Mzajac. How many years should we wait? Changing it does not mean that it cannot be changed again if a decision on a new logo is taken, some day. Anyway, a consensus is impossible on such a subject. Therefore, the only possible way to take a decision is a vote (after a discussion). I'm not aware of any vote against the new favicon (or against the new logo). Lmaltier 05:50, 9 June 2009 (UTC)
The change was reverted partly because the new favicon was very badly drawn. See http://bug-attachment.wikimedia.org/attachment.cgi?id=6207 . Had it been good, I suspect that it would not have been changed back. Conrad.Irwin 09:58, 9 June 2009 (UTC)
Are you referring to the aliased icon mask, which gives it a jaggy outline on a coloured background? I believe I can fix that if I can get a copy of the icon file. Michael Z. 2009-06-12 20:50 z

(back to unindented thanks to tiny netbook!) The new favicon didn't display correctly in at least fifty percent of browsers, and was definitely unpleasant to look at for most people. We may need an updated one but we do not need one that makes us look like an icky Scrabble game. --Neskaya kanetsv 19:52, 9 June 2009 (UTC)

  • I finally had an idea for a logo for a dictionary that won't look like either a block of text, any other book, or any other multilanguage project. I've done a quick mockup (with the Japanese version of Microsoft Paint!) and put it in use as the favicon on my Toolserver homepage. For some reason the favicon doesn't work for me on Firefox 3 or Explorer 6 so in case you can't see it, here is a direct link. If anyone with more artistic talent or actual Gimp or Photoshop skillz would like to make a better realization of the concept please go ahead. It should scale up easily and be adaptable to other languages. Ideally it should look more like other WikiMedia project logos. Feedback appreciated. — hippietrail 03:39, 12 June 2009 (UTC)Reply
  • I like it. It didn't work for me on FF either. My eyes are not great so I had to scale it up times two+ to realize what it was. But that might be just a "realization" issue. DCDuring TALK 04:23, 12 June 2009 (UTC)Reply

Entry Layout for Abbreviations etc.?

A can of worms! Why not open it‽

What is the proper entry layout for Abbreviations, Acronyms, and Initialisms?

This has limited policy guidance, and inconsistent use – perhaps we might hash it out, if it’s not already been addressed, in some lost corner of the archives, beneath discarded beer bottles?

Policy:

…which cites:

…where I just wrote:

…summarizing previous discussion, AFAICT.

Previous discussion:

Motivation: we were having a discussion at RFC:LOL regarding how to format LOL.

What appears to be agreed so far is:

  • Ab/Ac/In are legit L3 headers (despite not being Parts of Speech), as per vote
  • Short forms should just expand, if the long form is used – e.g., NATO should simply expand to “North Atlantic Treaty Organization”. (as per WT:ELE).

Some questions – “Parts of Speech” is the main question.

Parts of Speech
Should Parts of Speech be used in addition to Ab/Ac/In? Always? Never? If an Ab/Ac/In is used as more than one part of speech?
  • What if it is used as only one part of speech?
  • What if only one expansion is used for these other parts of speech (as in SMS)?
Etymology – break up
Having multiple expansions in a single L3 header breaks the “Divide by etymology” principle. Breaking up each expansion into separate Etymology sections is rather long (bloated?), however.
Etymology – where?
Should etymology be given in the Ab/Ac/In section (as in NATO) or in a separate Etymology section (as in SNAFU)?

Examples:

  • SMS
    Cited in discussion – has an Initialism, with 4 expansions, a Noun (“a text message”), and a Verb (“to send a text message”), the noun and verb only associated with one expansion/etymology.
  • NATO
    Unambiguous expansion – can be used as a Noun (the organization) or an adjective (“NATO forces”). Do we want separate POS for this? Consider “blue”.
  • LOL
    Pronounced both as an Initialism (letter-by-letter) and as an Acronym (as a word) – currently listed as an Abbreviation, splitting the difference.
    Further, mostly used as an Interjection, but also used as a verb (as in “I LOLed/LOL’ed”).

Oh Beer Parlourians, what sayeth (sayst?) you?

Second-person plural (as in this case): “what say you?”; second-person singular: “what sayest thou?” or “what sayst thou?”; “sayeth” is third-person singular. ﴾(u):Raifʻhār (t):Doremítzwr﴿ 12:49, 9 June 2009 (UTC)
—Nils von Barth (nbarth) (talk) 22:56, 8 June 2009 (UTC)
[e/c] Just the way I think of these, not necessarily anyone else's opinion, and may possibly even be counter to what has already been definitively established as policy for all I know (as Nbarth implies, though it's news to me): Ab, Ac, and In are second-rate headers, used when nothing else works, such as for phrases (where Phrase also works, but is not better). Where Noun or one of the other standard POS headers works, use it instead. The set of pages with Ab/In/Ac headers that could be a N/V/Adj/Adv/Prep/Cardinal number/... is a cleanup category. Not that Ab, Ac, and In is illegal (anti-policy), just that they're poor substitutes. So SMS, currently defined as
    ===Initialism===
    # Short Message Service
    # Sega Master System
    # special mint set
    # short man syndrome
    ===Noun===
    pl SMSes
    # A text message sent on a cell phone
    ===Verb===
    conj SMSes SMSing SMSed
    # To send a text message on a cell phone
should instead have just a noun and a verb section (perhaps proper noun for the video-game system, if attested). These can all be listed under the same etymology (all have "initialism" as their etymology) or split: I see advantages to either and have yet to be convinced which is better. Again, this is all just my own view, natch.—msh210 01:02, 9 June 2009 (UTC)
  1. The one issue that might be easy is the question of adjective use of the noun-type abbreviations. It would seem that such use is just like attributive use of the corresponding noun which we do not present as a separate part of speech unless it is attestably gradable or comparative. OTOH, I can imagine someone being said to be "more NATO than EU". We could simply ignore this kind of fairly trivial case.
  2. An abbreviation used as verb seems to need to be a separate PoS. Its etymology is almost always just the noun form of the same abbreviation.
  3. For all of the abbreviations that would be nouns, I would think no change would be required. The etymology and rudiments of pronunciation are already built in to the entry as we have it. Plurals are trivially formed by adding "s" in almost all cases. Exceptions could be noted by hand-made additions to inflection or sense lines.
  4. The "texting"/"internet" abbreviations like "LOL" raise other issues because they function differently than their expanded forms, due to the medium.
  5. I don't know whether we really need to do anything with the abbreviations that are adverbs or other parts of speech that are not texting/internet.
  6. There is an issue I've noticed with headwords that are both normal English words and English abbreviations. The Abbreviation heading only makes sense to me if it is at the same level as the Etymology header for the English word. It also seems inappropriate to put a word below the abbreviations in these cases, though our standard is alphabetical order by part of speech, which always puts abbreviation and acronym at the top. DCDuring TALK 00:57, 9 June 2009 (UTC)Reply
I agree with Msh210. I think that Initialism, abbreviation, etc. should belong to the etymology section, and that tele is a noun (currently, no POS is mentioned), and UNO is a proper noun. And I think that the OK page is OK (except for the Oklahoma sense: Oklahoma is a proper noun, but are such codes really words?). Lmaltier 05:59, 9 June 2009 (UTC)
In a small proportion of the cases what MSH proposes makes sense. But imposing the same structure on abbreviations as on real words will lead to extravagantly long entries in many cases. We have a number of instances of more than ten different organization names sharing a given abbreviation. WP disambiguation pages often show many that we don't even have. To show the entire set with etymologies that merely repeat the sense line would be a serious waste of space. Allowing etymology and pronunciation sections for each noun abbreviation may lead to those sections being 10 times longer (in vertical screen space consumed) with minimal additional information value. DCDuring TALK 11:43, 9 June 2009 (UTC)Reply
In many cases, it might make sense to consider "Initialism." as a single etymology, and to detail meanings on each definition line, in order to make pages more readable and to save space (this technique does not work well when there are several pronunciations, or several genders, etc.). But my point was about the POS... Lmaltier 12:35, 9 June 2009 (UTC)Reply

Abbreviations facts and guesses

We are missing some important facts that have to do with the relative importance of the considerations. What we know: as of March 22, 1334 L3 headers for abbreviations, 139 acronyms, 314 initialisms. Counting from categories we would get more than 3 times as many, many more initialisms. We also know that many abbreviation headers should be one of the others to faithfully carry out existing "best" practice.
My a priori assessment based on the English entries I've worked on (no file extensions) is that:
  1. most (60+% of entries, 80%+ of senses) abbreviations are just nouns, most (80+%) of them proper (or at least capitalized) nouns.
  2. there are few (<1%) abbreviation senses that are verbs
  3. an important 1% might be the internet/texting type abbreviations of various PoSs, but with important and troublesome common characteristics.
  4. the balance are adverbs, true adjectives, phrases, and items of uncertain classification (eg, file extensions and URL components).
  5. there are a number of abbreviations (< 100?) that are formatted as real parts of speech, some of which are not categorized as abbreviations.
We could expect there to be a number of entries that have very many senses. Our level of coverage is not high. The significance of many of the abbreviations is low. Given that we don't subject these to much selection pressure, we can expect there to be many more, especially of "low significance".
Are there sources of additional information? Does my a priori assessment clash with others'? Are there important considerations for languages other than English? for non-Latin scripts?
Do we need more facts? What can be readily collected by dump analysis? Do we need some kind of sample of entries for other characteristics? DCDuring TALK 13:27, 9 June 2009 (UTC)
To illustrate the low level of completeness and therefore the likelihood of long entries, compare ABA (2 senses) with w:ABA (~30, almost all with article links). DCDuring TALK 13:37, 9 June 2009 (UTC)
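On the question of what dump analysis could readily collect: header counts like the March 22 figures quoted above (1334 + 139 + 314 = 1787 L3 headers in total) are the kind of thing a short script can tally. The sketch below is only a guess at how that might be done. It assumes the page wikitext has already been extracted from the XML dump into a dictionary keyed by title; the names `L3_HEADER`, `TRACKED`, `count_l3_headers` and the sample pages are hypothetical.

```python
import re
from collections import Counter

# Matches a level-3 header such as "===Initialism===" or "=== Acronym ===",
# but not level-2 ("==English==") or level-4 ("====Noun====") headers.
L3_HEADER = re.compile(r"^===[ \t]*([^=\n]+?)[ \t]*===[ \t]*$", re.MULTILINE)

TRACKED = {"Abbreviation", "Acronym", "Initialism"}

def count_l3_headers(wikitext_by_title, tracked=TRACKED):
    """Count how often each tracked L3 header occurs across all pages."""
    counts = Counter()
    for text in wikitext_by_title.values():
        for header in L3_HEADER.findall(text):
            if header in tracked:
                counts[header] += 1
    return counts

# Tiny made-up sample standing in for pages read from a dump.
pages = {
    "SMS": "==English==\n===Initialism===\n# Short Message Service\n"
           "===Noun===\n# A text message\n===Verb===\n# To send a text message",
    "NATO": "==English==\n=== Acronym ===\n# North Atlantic Treaty Organization",
    "LOL": "==English==\n===Abbreviation===\n====Interjection====\n# laughing out loud",
}
print(count_l3_headers(pages))
```

Passing a different `tracked` set (for example the ordinary part-of-speech headers) would give a rough count of abbreviations that are already formatted as real parts of speech, though matching them to abbreviation categories would need category data as well.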

Following on DC’s thoughtful analysis, there seem to be 3 main cases:

simple
Most a/a/i are a single term, used as a single POS, usually Noun – often Proper Noun.
many senses
Some a/a/i have many expansions – this is a concern for suggestions of a separate etymology for each expansion.
complicated
A few entries (1%), such as SMS, are more complicated – what works for them may be overkill for others.

A significant concern is at WT:CFI#Names of specific entities – from my reading, most expansions of ABA are not in our scope: they are just names, and not used attributively.

Some thoughts:

simple
What’s the best way to say: “This is an a/a/i used as a noun/adjective?”.
An existing example is OEM, which lists Initialism and then uses {en-noun} for the POS line, indicating plural, but otherwise does not indicate that this is a noun.
Another option is Etymology+Noun or Etymology+Adjective, as suggested above (msh).
many senses
A list is easiest; policing CFI for these may be tricky, and less-used senses are often just names, hence should only be at Wikipedia.
complex
Not sure if we can think of or address all cases.
One rule of thumb may be in complex cases to consider the a/a/i L3 header as an “Etymology” section of sorts, and have POS headers subordinate to it (L4), if there are multiple senses.
E.g., SMS would be split into 2 Initialism sections, with Noun & Verb being subordinate to the first.

Note that giving the expansion in “Etymology”, the pronunciation (Acronym/Initialism) in “Pronunciation”, and the use in POS headers (as indicated by msh and Lmaltier) is consistent with other entries, though it takes up more space and has been rejected in the past – WT:VOTE.

Perhaps Etyl/Pron/POS would work for simple entries?

That is, the key questions one likely has for a simple a/a/i is:

  • what does it mean?
  • how is it pronounced (Acronym/Initialism, or some combo: JPEG, LOL)?
  • what part of speech is it?

…which are addressed by separate sections.

OTOH, this is bulky, especially for many senses.

—Nils von Barth (nbarth) (talk) 22:07, 9 June 2009 (UTC)
IMHO, the simple vs. many senses distinction seems hard to maintain. Simple entries tend to become many-sense entries. Unfortunately, I fear we have no choice but to design for the many-sense and complex cases, but make sure that a simple entry is not overburdened.
We have not been excluding abbreviations because the referent would not meet WT:CFI. I have occasionally put an abbreviation in RfV when the referent was not even in WP AFAICT. I am usually concerned with abbreviation entries only to keep them from cluttering up User:Robert Ullmann/Missing for which putting them inside a WP link is sufficient.
To me, the default case of a noun entry is adequately handled by existing practice. It is only a few unusual noun cases and the non-noun PoSes that might benefit from well-designed change. We could posit that all non-noun abbreviations should be treated as normal word entries and be done with it. We could then finesse the remaining exceptional noun cases on a case-by-case basis. DCDuring TALK 00:30, 10 June 2009 (UTC)
Why not keep the current practice for nouns (where possible), with a single difference: Noun or Proper noun as POS, instead of Initialism, etc.? Remember the KISS principle. Consistency is important, it makes things simpler. Lmaltier 07:04, 10 June 2009 (UTC)
Thanks for clarification on inclusion practice DC!
The main concerns with current practice, AFAICT, are:
  • No POS given in simple cases.
  • No guidance for complicated cases.
How’s this for a proposal?:
For a single POS
use === Acronym === etc. as an L3 header, immediately followed by ==== Noun ==== etc. as an L4 header – i.e., the a/a/i L3 header is a brief Etymology/Pronunciation gloss, but POS is indicated.
Likewise for a list
list multiple senses under a single Noun heading, or rather under 2 if some uses are countable and some are uncountable.
For complex cases
use multiple L3 Acronym/Initialism headers if senses are used/inflected differently (SMS), with L4 POS headers as above.
…and for terms like LOL that are pronounced both as Acronym and Initialism, use === Acronym/Initialism === as the L3 header.
I think this addresses concerns raised, scaling cleanly between simple and complicated cases, with the only change to existing practice in most cases being “adding a POS subheader, in addition to a/a/i header”.
—Nils von Barth (nbarth) (talk) 10:23, 10 June 2009 (UTC)Reply
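The header layout proposed above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `abbreviation_headers` is hypothetical, and the part of speech passed for LOL is a placeholder, since the discussion does not specify it. The sketch simply joins the a/a/i kinds into one L3 header and emits one L4 header per part of speech.

```python
def abbreviation_headers(kinds, parts_of_speech):
    """Build the proposed header lines for an abbreviation entry.

    kinds: a/a/i labels such as ["Acronym"] or ["Acronym", "Initialism"].
    parts_of_speech: POS labels such as ["Noun"], each emitted as an L4 header.
    (Hypothetical helper, written only to illustrate the proposal.)
    """
    l3_header = "=== " + "/".join(kinds) + " ==="
    l4_headers = ["==== " + pos + " ====" for pos in parts_of_speech]
    return "\n".join([l3_header] + l4_headers)


# SNAFU is described in this discussion as an acronym and a noun.
print(abbreviation_headers(["Acronym"], ["Noun"]))

# LOL is pronounced both ways; "Interjection" here is an assumed placeholder POS.
print(abbreviation_headers(["Acronym", "Initialism"], ["Interjection"]))
```

In the simple case this adds exactly one line to an existing entry, which is the "only change in most cases" that the proposal describes.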
Why add a level? It's not needed. tele is a noun, UNO is a proper noun. KISS! Lmaltier 10:44, 10 June 2009 (UTC)
Economy of headers has a major value in reducing the visual complexity of our entries and in enhancing the usability of the Table of Contents. It also adds to economy in the use of vertical screen space increasing the amount of information on the first screen that a user sees. We need to give new users what they want to avoid falling farther behind the other online dictionaries in the number of users (I have facts on this.).
The biggest merit of the status quo is its economical use of headers. The single ab/ac/in header:
  1. distinguished these from normal PoSes (providing a reason why an Etymology was not required)
  2. gave the most important pronunciation information (but only for Acronyms and Initialisms)
  3. and took but one header-worth of vertical screen space.
From a consistency perspective the status quo is deficient, but the category of abbreviations is large enough to merit its own treatment rather than being forced onto a procrustean bed. That there are specialized dictionaries for abbreviations and specialized sections for abbreviations suggests that abbreviations do not fit all that well into the data structure for other lexical units.
Abbreviations do not normally need Etymology or Pronunciation headers (or PoS headers if users can be assumed to infer the PoS from the header). The etymology is the gloss. Some ac- and ab-type abbreviations are not served well in terms of pronunciation, but the same is true for all parts of speech from the point of view of the vast portion of users who don't know IPA and haven't figured out how to do audio in our format. Putting an audio icon and IPA at the end of the gloss would serve the cases that need pronunciation.
An alternative approach would be to cede to WP disambiguation-type pages all proper noun abbreviations or initialisms (which have no pronunciation issues). In my experience their coverage seemingly far exceeds ours, certainly for notable entities. The only-in template would speed users to those pages from here. DCDuring TALK 12:00, 10 June 2009 (UTC)Reply
DC – any thoughts on indicating POS?
My major issue with the status quo is that POS is nowhere indicated – e.g., FUBAR is an adjective, while SNAFU is a noun, but this is not clear from the entries.
Separate POS headers are bulky, as noted; the shortest possible unambiguous solution is {{pos n}} (which displays an n, for “noun”), as suggested by Connel at WT:ELE/POS talk.
That is, I see three options:
  1. Status quo – POS nowhere indicated, shortest
  2. POS headers – either in addition to or instead of a/a/i header – clearest, most consistent, bulkiest: many variants:
    1. instead of a/a/i, (as msh and Lmaltier suggest, though failed in a previous vote)
    2. in addition to a/a/i as L3 (as Ullmann suggests),
    3. in addition to a/a/i as terse L4 (my suggestion above)
  3. Definition line – minimal change to status quo to indicate POS, as Connel suggests.
Any other suggestions or possibilities? What do y’all think?
—Nils von Barth (nbarth) (talk) 14:45, 10 June 2009 (UTC)Reply
The status quo can bear some improvement. I am completely open on approaches to indicate PoS for non-nouns. Even treating as a normal PoS would be OK. The "pos-X" templates (under review for deletion) seem to do a very economical job of handling this, but don't address the problem of those few items that need pronunciation or have etymologies not obvious from the gloss. Would end-of-gloss pronunciation be acceptable? (BTW, don't you think that snafu, though etymologically an acronym, has entered the lexicon as an ordinary noun?) My own favored option would be a hybrid:
4 (= 1 + 2 + 3)
  1. Status quo for all abbreviations that are not exceptions in terms of actual content of entry.
  2. Exceptions are items meriting full PoS treatment by virtue of
    1. having entered the lexicon,
    2. having etymologies distinct from their gloss,
    3. needing an inflection line,
    4. having a need for long or multi-line pronunciations, or
    5. for such other reason as we might find acceptable.
  3. No pronunciations for initialisms.
  4. Pronunciations, if any, to be shown at the end of the gloss, if they can be, for non-exceptions.
  5. Appropriate category to be added to entries that are exceptions.
  6. In-gloss PoS indicators for non-nouns not otherwise exceptional.

Among a list of merits too long to repeat (;-}), this has the merit of fitting into an incremental process of altering entries. Almost any of our existing entries would conform. Only valuable content forces change. Many of the abbreviations that already are treated as true parts of speech already conform. In other words, it is very close to our actual best current practice. DCDuring TALK 16:03, 10 June 2009 (UTC)

Sounds good in the main – I’ll see about drafting something to summarize.
A question or two:
  1. Should the term be displayed via an inflection line ({en-noun}, {en-noun|-}, {en-proper noun}, etc.), or manually as word?
    I.e., this shows plural or (uncountable), if relevant, and categorizes as Noun/etc.
  2. Is the reasoning behind not including in-gloss PoS for nouns b/c they’ll likely be parsed as nouns anyway or it’s distracting? It seems to add clarity even for nouns.
—Nils von Barth (nbarth) (talk) 22:49, 10 June 2009 (UTC)Reply

Would adding links to wikipedia be a bot job the way inter-language links are added? (Is it possible without a current WP dump?) RJFJR 14:09, 10 June 2009 (UTC)Reply

I could see it being a bot job for ex. {{langcatboiler}}, but not really for the main namespace. -- Prince Kassad 14:18, 10 June 2009 (UTC)Reply
I don't know how easy it is to do without error. Perhaps a process could be tested on a closely related but more manageable problem: linking entries in the taxonomic name category to Wikispecies under See also for the appropriate PoS. These entries usually would have WP links too. At the genus level, they might benefit from Wikicommons too. Perhaps also there are categories of nouns that would benefit from links to Wikicommons. A clean-up list of those that don't have good targets in the sister project would be useful because it is usually not hard to make a manual adjustment to find a good target article to link to.
I am among those here who have an aversion to the big sister-project link boxes as opposed to the more discreet things that fit into "See also". Some have stronger aversion than I. DCDuring TALK 15:02, 10 June 2009 (UTC)
I agree linking to WP by bot would be error-prone, but to Species per DCDuring might be a good idea. I think that if Commons has a page or category related to one of our entries, then we should find a good picture, include it in the entry, and not link to Commons. Links to Commons are saying "here are some pictures!" and imply that we're too lazy (or understaffed) to pick one of those pictures as a representative. (Incidentally, I'm with DCDuring: the discreet {{pedia}} beats the discrete {{wikipedia}} every time.) msh210 16:01, 10 June 2009 (UTC)
We've had bots try to do that and fail miserably. For countries, a bot often picks out a flag, so Italia (as an example) is illustrated with a picture of the flag, rather than the country (although I'm not sure this specific case was done by bot, I've come across it often). Selecting a good picture really is a manual job, although it could perhaps be bot-assisted. --EncycloPetey 16:06, 10 June 2009 (UTC)
I must not have been clear: I did not intend for bots to add pictures! I merely meant that we should add pictures (viz, manually) rather than link to Commons (as DCDuring had tentatively suggested). msh210 16:17, 10 June 2009 (UTC)
Linking at the genus level is fraught with problems, since Wikipedia is case-insensitive and sometimes disambiguates by moving the genus to a common name rather than its scientific name. There are situations, for example, where an animal and plant genus have the same name, so only one of them (or neither) is at the basic page name. We would want both linked, and there is no standard on WP for how the disambiguation genus page is named. There are also cases where the genus name also happens to be a fairly common word, so the genus page is not at the basic name but has instead a parenthetical component. There are also situations where a genus is monotypic, so the information is only at the species page and the genus is a redirect. I don't think a bot is sophisticated enough to handle all that. --EncycloPetey 16:11, 10 June 2009 (UTC)
Perhaps we could just have a labor-reducing, rather than a labor-eliminating approach. I usually add pictures by first adding a generic commonslite link to the headword and follow searches and links until I get a good image for import and a page or category, if there is one. I do a similar process for Species and WP to confirm a valid link. If the labor was reduced by inserting the sister project link templates and the entry were assigned to a clean-up list, we could accelerate the process significantly. Both vernacular and taxonomic names would benefit from this process because many vernacular name entries include what some contributor thought was the right taxonomic name.
Doing WP, Species, and Commons all at once gives benefit because WP and Species don't always agree and Commons is not perfectly consistent in using the latest taxonomic name, an obsolete one, a synonym, or a vernacular name. WP often and sometimes Species far surpass us in vernacular names, so we might generate many new entries or, at least, productive red-links. DCDuring TALK 16:39, 10 June 2009 (UTC)
You (EncycloPetey) seem to feel that we only want Wikipedia links when those are relevant to what we mention in our entry; but personally, I'd be quite happy with {{projectlink}}s to whatever Wikipedia might happen to have at its like-named article (as long as said article exists, or redirects to one that does). There are basically two uses for Wikipedia-links: "Here's more information about what you just read a definition for!" and "Here're other things you may have been looking for instead!". The former is not readily bottable, but the latter is, and IMHO is worthwhile. —RuakhTALK 01:30, 12 June 2009 (UTC) Clarification added 13:27, 20 June 2009 (UTC). I'm careful with my indentation, but often forget that not everyone else is, so the meaning of "you" isn't always clear, even when it theoretically should be.Reply
No, I would often put multiple links in a single article, most commonly one to a dab page and one to something actually on the subject of one or another gloss, or to multiple Species and Pedia articles if more than one is referred to by a single vernacular name.
This is about entry quality improvement. I am just looking to make the process of adding pictures, species links, and content-specific WP links easier by having a bot:
  1. find the entries,
  2. enter some useful headers and templates, and
  3. put the so-templated entries on a list for the manual content enrichment.
It would be a real help for a fairly large group of entries equal to the union of entries with taxonomic names; entries in categories of animals, plants, fish, insects, fungi, etc; entries using the "spelink" template less those already with the multiple sister project links under a "See also" header. Taking a minute off the link-up process for each entry is good, but even better is having it all on a list.
Being able to do all these quickly could also help us add a large number of English vernacular names for species and genera as synonyms and thereby as red-links for new entries with starter content already available for a bot (potentially) or a human to generate new entries. DCDuring TALK 02:24, 12 June 2009 (UTC)
A bot would be an excellent idea. It could go through and strip out all those damn Wikipedia boxes we have littered all over the place, especially the ones in section zero above any language header, and substitute a simple link. But that's not the purpose you had in mind, is it? I don't like the idea of automatically adding links to anything other than a disambiguation page since thoroughly linking articles can take some thought to determine which ones are most related. There are tons of Wikipedia articles that we should link to but do not for the simple reason that Wikipedia includes parenthesized annotations in titles. Also there are many cases where topics are coalesced and the most relevant article is under a different name. I suppose if you could find a way to determine which ones were related then that would be fine. No one who adds Wikipedia links now really puts any thought into it either, so a dumb machine could easily do a better job. DAVilla 12:01, 17 June 2009 (UTC)
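The three bot steps listed above (find the entries, add headers and templates, put them on a list) might look roughly like the following. This is a sketch on plain strings, not a working bot: it makes no network calls, the `{{spelink|...}}` argument syntax and the "====See also====" header level are assumptions, and the function names are hypothetical. The template names `{{pedia}}` and commonslite are taken from this discussion, but their exact usage here is assumed.

```python
SEE_ALSO = "====See also===="                    # assumed header level
LINK_TEMPLATES = ["{{pedia}}", "{{commonslite}}"]  # stub links for manual checking


def find_candidates(entries):
    """Step 1: entries using spelink that lack the sister-project link stubs."""
    return [
        title
        for title, text in entries.items()
        if "{{spelink" in text and not any(t in text for t in LINK_TEMPLATES)
    ]


def add_link_stubs(text):
    """Step 2: put the link templates under a See also header, adding it if absent."""
    stub_lines = "\n".join("* " + t for t in LINK_TEMPLATES)
    if SEE_ALSO in text:
        head, tail = text.split(SEE_ALSO, 1)
        return head + SEE_ALSO + "\n" + stub_lines + tail
    return text.rstrip("\n") + "\n\n" + SEE_ALSO + "\n" + stub_lines + "\n"


def run(entries):
    """Step 3: return the edited entries plus a clean-up list for manual review."""
    candidates = find_candidates(entries)
    updated = {title: add_link_stubs(entries[title]) for title in candidates}
    return updated, sorted(candidates)


# Made-up sample wikitext, for illustration only.
sample = {
    "robin": "==English==\n# {{spelink|Turdus migratorius}}\n",
    "lamp": "==English==\n# A device\n",
    "oak": "==English==\n# {{spelink|Quercus}}\n====See also====\n* {{pedia}}\n",
}
updated, cleanup_list = run(sample)
print(cleanup_list)
```

The point of the design is that the bot never chooses a link target itself; it only inserts stubs and records the entry, leaving the choice of a correct target to a human working through the list.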

After a bit of feedback and a new version I've added "nearbypages", my extension to provide links to previous and subsequent pages in alphabetic order of each language where possible. It's in the experimental section. So far I haven't added suboptions as I'm not sure WT:PREFS currently supports such a concept. But I may add options to turn on/off the navbar links separately from the language heading links, and to specify how many such links to add. Enjoy and please leave impressions here. — hippietrail 16:06, 12 June 2009 (UTC)

Where's this discussed? I love it that the sorting works.
I'd like to propose using single instead of double angle quotation marks for dividers ( ‹ › ); this works just as well, but reduces busy type-clutter. Also, the dividers should be preceded by a non-breaking space, so they won't randomly show up at the start or end of a line, depending on the wrap width.
Could also try a smaller font, so that it looks more like supplementary information and not part of the entry. --Michael Z. 2009-06-13 16:23 z
Contrarily, as I sometimes am, I'm really much more pleased with « and » than I would be with the single angle quotation marks, and I find the single angle quotation marks (‹›) difficult to distinguish. Maybe this could be made into a suboption for it, for instance having one or the other as default and the one that isn't default as a suboption checkbox, "Use «» instead of ‹›". That's, I think, the best compromise we're gonna find on this. However, I really like this. It's making a few tasks I have to do that are language-wide for Hiligaynon (template replacement and displaying an inflected form) a lot easier for me. —Neskaya kanetsv 20:20, 13 June 2009 (UTC)
Also, slightly weird that the sort order goes co-op, coop, Co-op. Perhaps punctuation characters should have more significance than capitalization, so the order would be coop, co-op, Co-op. --Michael Z. 2009-06-13 16:33 z
  • So far it's only been discussed here and not a lot yet. But now that it's released I'll take cirwin's lead and make its official talk page ... its talk page (-: User talk:Hippietrail/nearbypages.js
  • The separator to use is open so I'll go with whatever is most popular. You could just set it yourself with CSS if I didn't have to support stupid IE6. The original version used chunky solid triangles that looked like the play button icon but people complained. Then it used the skinny arrows but those didn't work so well with more than one link either side. I've made it random for now which will no doubt annoy people but you'll get to see them all. I'll set it permanently when it's clear which is most popular.
  • I will give it a CSS class so its font size and other characteristics can be set.
  • The nonbreaking space idea is a very good one. I've tried to implement it now.
  • The sort order is the one specified on the Toolserver for the best matching locale I could find. I assume it's a localized version of the standard Unicode sort order.

hippietrail 01:38, 14 June 2009 (UTC)Reply
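The two orderings discussed above can be compared with a toy multi-level sort key. This is a simplified sketch, not the Toolserver's actual collation: the key functions are hypothetical and only mimic the idea of ranking letters first and then deciding whether case or punctuation breaks ties next.

```python
def letters_key(word):
    """Primary level: letters only, ignoring case and punctuation."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def case_key(word):
    """Lower case sorts before upper case, letter by letter."""
    return tuple(1 if ch.isupper() else 0 for ch in word if ch.isalnum())


def punctuation_first(word):
    """Tie-break in which a punctuation mark sorts before a letter."""
    return tuple(1 if ch.isalnum() else 0 for ch in word)


def punctuation_last(word):
    """Tie-break in which an unpunctuated form sorts before a punctuated one."""
    return tuple(0 if ch.isalnum() else 1 for ch in word)


def reported_order_key(word):
    # Case is more significant than punctuation (the order reported above).
    return (letters_key(word), case_key(word), punctuation_first(word))


def proposed_order_key(word):
    # Punctuation is more significant than case (the suggested alternative).
    return (letters_key(word), punctuation_last(word), case_key(word))


words = ["Co-op", "coop", "co-op"]
reported = sorted(words, key=reported_order_key)
proposed = sorted(words, key=proposed_order_key)
print(reported)   # ['co-op', 'coop', 'Co-op']
print(proposed)   # ['coop', 'co-op', 'Co-op']
```

Swapping the second and third elements of the key tuple is the whole difference between the two orders, which is why the choice is a matter of preference rather than of correctness.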

Greenlandic

Apparently, the name of this language is not consistent in Wiktionary. {{kl}} says "Greenlandic", while {{kal}} says "Kalaallisut". This results in inconsistent naming across Wiktionary (for example, Category:Greenlandic language but Category:Kalaallisut derivations) and the templates should be changed to be consistent. The only question is what name to choose. -- Prince Kassad 14:59, 13 June 2009 (UTC)Reply

Greenlandic is the English name, which is used by Wikipedia, Wikimedia, and one of two names used by Ethnologue. --EncycloPetey 15:02, 13 June 2009 (UTC)Reply
Greenlandic is probably the best name for us to use. It is clear and doesn't conflict with the dialect names. It appears to be used by Greenlanders.[24]
Ethnologue calls this Greenlandic Inuktitut[25] (or “Inuktitut, Greenlandic”), with alternate names Greenlandic and Kalaallisut. Apparently Kalaallisut is the most common and “standard” dialect, also called Western Greenlandic or West Greenlandic. The other two dialects are Eastern Greenlandic, East Greenlandic, or Tunumiit oraasiat, and “Polar Eskimo”, Northern Greenlandic, North Greenlandic, Thule Inuit, Inuktun, or Avanersuarmiutut. Michael Z. 2009-06-13 16:12 z
I say Greenlandic too, with polar directions for dialects if needed. Circeus 17:15, 13 June 2009 (UTC)Reply

Pictures!

Alright, so bear with me while I hijack a small section of the Beer Parlour for something that will not only help the project but give me something to do.

I'm pretty soon going to have actual full time access to a dSLR -- actually, I'm buying one within two weeks from now. Unfortunately I really have no idea as to what I want to go out and take pictures of, so I decided to create a project for myself to do with it. While I'm aware that commons has billions of images (or I might be slightly exaggerating) I'd like to go out and work on taking pictures for entries here. Specifically for entries here. So the project right now is that I'd like a list from anyone here of entries that they want pictures of. Nouns preferably. Any language works, as long as I can reasonably find the item somewhere in the greater Los Angeles metro area. Suggestions can either be left here (I will be watching) or at User:Neskaya/pictures.

Thanks all, and I'm hoping that we actually get a couple hundred good pictures out of this. --Neskaya kanetsv 21:25, 13 June 2009 (UTC)Reply

See Category:Requests for photographs for about 20. There are some long-standing requests there, including ones for a photo to replace the one at the top of WT:GP. Some may have already been satisfied. Take a look at lie detector, which doesn't really have a suitable image (one with a person hooked up would be good, but hard to get.). The template {{rfphoto}} puts things there, though without explanation. DCDuring TALK 23:27, 13 June 2009 (UTC)Reply
Well yes, but that category doesn't seem widely used, and there are a metric ton of entries without photographs at all. So this was a hope of getting people to think and figure out what other entries should reasonably have photographs. Household objects comes to mind too. --Neskaya kanetsv 00:25, 14 June 2009 (UTC)Reply
I was just hoping to give the existing system a free ride on your enthusiasm and promotional efforts. Any way we get better visuals is fine with me. See also {{rfdrawing}}, for which the camera won't help. DCDuring TALK 01:16, 14 June 2009 (UTC)Reply
Most of those things that I can possibly photograph will get photographed. Thank you. I still want more entries and want to force people to actually think about what could use a photo. :D --Neskaya kanetsv 01:41, 14 June 2009 (UTC)Reply
Your approach is already working. I just added gravity boots and inversion table to the category (likely to be found in Lala Land, I think) based on a query at WT:FEEDBACK. I couldn't find anything at Commons for them. DCDuring TALK 01:58, 14 June 2009 (UTC)Reply
I don't suppose you could be bothered to add anything that doesn't have a picture at all that you come across that can possibly have a photo? (Foreign language terms should all have photos too.) --Neskaya kanetsv 19:37, 14 June 2009 (UTC)Reply
I'll be happy to, but nonkilling, carry someone's water, carry water for, patriotism, Caló didn't seem suited. zoot suit already has a vintage picture. From w:Zoot suit, carlango and tramas. Actually each element of a zoot suit would be good: the hat, with feather (tapa, tarda); the extra-long watch-chain, the pointy shoes (calcos) would be great. All conveniently located (I hope) in some retro shop window in East LA. DCDuring TALK 20:11, 14 June 2009 (UTC)
Speaking of variously Hispanic items, though not ones in East LA (as East LA is going to take a bit more coordination -- not exactly somewhere someone without a car should be going on their own by public transit, and definitely not somewhere someone should be going with an expensive camera), I'm going to be going to Olvera St. at some point. Anything that you can think of there? It has got a bunch of little marketplace stalls with ethnic/cultural stuff and lots of touristy people there already so I shouldn't have too much trouble taking pictures of stuff -- if I know what I'm looking for. --Neskaya kanetsv 18:21, 15 June 2009 (UTC)
Looks like fun. w:List of Chicano Caló words and expressions didn't have any specific nouns that gave me any ideas. Produce, toys, and clothing come to mind, especially if you can get Calo or other Spanish dialect or Spanglish terms from the merchants. DCDuring TALK 18:58, 15 June 2009 (UTC)Reply

Gutenberg word frequency rankings

While I'm collecting feedback on my new previous/next pages extension (nearbypages) I've boldly taken the liberty to change the format of the Gutenberg word frequency rankings to match it as best I could. If you love the rankings as they were and hate my changes I'll be happy to change it back in a little while. I'll even change the format of nearbypages to match the old rankings format if enough people prefer it.

Please comment here or on User talk:Hippietrail/nearbypages.js. hippietrail 23:13, 14 June 2009 (UTC)

A colorful little side-project of mine

A while ago (well, last summer actually), I began running the Wiktionary WOTD through a tool called the Transmonster. This generates five-color palettes that I subsequently uploaded to the ColourLovers (link to gallery) website. I'm up to some 200 of them (because I had a fallout between January and May >__>;;). It's not much of a serious thing, but I just thought I might as well let you all know. Circeus 03:26, 15 June 2009 (UTC)

Dog breed names - capitalization

I've noticed that some believe that all dog breed names should be capitalized. The practice seems to be widespread enough that it seems to be more than just a handful of grammatically ignorant people. Any thoughts / information / consensus on this? And should it be mentioned somewhere? -Oreo Priest talk 16:27, 15 June 2009 (UTC)Reply

The same question applies to the common (i.e., English, not binomial scientific) name of bird and some other species, fwiw. msh210 16:50, 15 June 2009 (UTC)
There seems to be wider usage of capitalization for the pet breed names than for the vernacular names of animals. In pet names perhaps it's influenced by the fact that many breeds would be at least partially capitalized as a result of having a place name/ethnonym as part of the name (Rottweiler, English bulldog). So to be orthographically "fair" other breed names are capitalized (Poodle) and then the AKC names Standard Poodle.
Animal and plant vernacular names have an analogous "problem". Some are commonly referred to by their 2-part taxonomic names, of which the genus part is properly capitalized in technical writing. It seems only "fair" to capitalize a vernacular name. But capitalization seems less prevalent.
If we were to declare one form (for each class of names) to be the default standard, allowed alternative forms, and allowed evidence-based departure from the defaults, we would have a reasonable solution. Or they could be "only in Wikipedia" entries by default, perhaps with a translation section. DCDuring TALK 17:52, 15 June 2009 (UTC)Reply
Ideally, each should be capitalized according to the most frequent usage. But determining this may be a lot of work, so we'd benefit from a default guideline (preferably not a rule or policy).
But I think that the breeders' association standards are prescriptive, and probably don't represent actual usage. Wouldn't one write “he had a poodle,” possibly adding an adjective indicating the dog's size (“he had a toy, miniature, or standard poodle”), just like “he had a dog?” Both “he had a Dog” and “he had a Poodle” look weird.
Or does capitalization differ between naming a breed and referring to individual animals? “The Poodle is a noble breed” vs. “poodles are noble animals?” Whatever we choose, I'd rather it be based on real usage than some organization's style guide. Michael Z. 2009-06-15 22:24 z
See [26] and the subarticles, for example. And I did a quick define:mastiff on Google and got predominantly (but not exclusively) sites that used the capital in all cases. -Oreo Priest talk 23:57, 15 June 2009 (UTC)Reply
None of those is a good usage example. Look for mastiff where it appears in running text, not in a title or heading.[27] Also search americancorpus.org.[28] Michael Z. 2009-06-16 00:58 z


Depending on the AKC is like asking Kimberly-Clark whether we should capitalize "kleenex". I'd bet that most actual usage does not capitalize dog breed names much. But our attestation depends on print sources, which seem to capitalize them a lot. If we would like to formalize this a bit (i.e., guidelines), then we should sample a few types of breed names (and vernacular plant and animal names, to get as much out of this as we can) and see which way it actually goes. Using COCA and BNC at BYU we can fairly conveniently do this.
A preliminary look would suggest that overwhelmingly the lower case wins for some well-known dog breeds. Even in the case of Rottweiler, lower case rottweiler represented 40% of usage (n=96, 4 indeterminate). Only "standard poodle" appeared (n=14); not Standard Poodle. Lower-case "dachshund" 94 vs 1 Dachshund (n=95, 1 indeterminate). German shepherd predominated with 88 vs 11 for German Shepherd (n=99, 1 indeterminate).
Also in Category:Dogs there are more breed names that are lower case or mixed upper and lower case. The capitalized parts are often ethnonyms or toponyms. Hardly any breeds with compound names had the full complement of capitalization combinations.
Nevertheless, I believe that some of the folks motivated to enter the breed names will prefer to capitalize them and we would have no trouble finding a sufficient number of citations for all "recognized" (by AKC et al) breeds and other breeds as well. I would be happy if all of the capitalized names were "only in Wikipedia", but I don't think many others like that approach in any of the other proposed applications. DCDuring TALK 00:59, 16 June 2009 (UTC)Reply
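The corpus counts quoted above can be turned into lower-case shares with a few lines of arithmetic. This is just a worked illustration using the two breeds for which both raw counts are given (indeterminate hits excluded); the function name `lowercase_share` is made up for the example.

```python
def lowercase_share(lower_count, upper_count):
    """Fraction of determinate corpus hits that use the lower-case form."""
    return lower_count / (lower_count + upper_count)


# Counts as quoted in the comment above: (lower-case hits, capitalized hits).
counts = {
    "dachshund": (94, 1),
    "German shepherd": (88, 11),
}

for breed, (lower, upper) in counts.items():
    share = lowercase_share(lower, upper)
    print(f"{breed}: {share:.1%} lower case (n={lower + upper})")
```

On these figures the lower-case form accounts for roughly 99% of dachshund hits and roughly 89% of German shepherd hits, consistent with the observation that lower case predominates for well-known breeds.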
Interesting citation: “Note: For this book, breed names will be capitalized; breed types (and general groupings) will not.”[29]
After looking at about five entries, I'm guessing that our definitions are mostly inadequate, lacking both the senses of “a dog breed” and “a dog of the breed”. When I say “a bichon piddled on the rug,” I don't mean that “a class of toy dogs” piddled, I mean “a dog belonging to the bichon breed” did.
And I shall speculate freely that the one is capitalized much more often than the other. Michael Z. 2009-06-16 01:17 z
I didn't look carefully enough to be sure, but there was not all that much difference. The sample of capitalized instances turned out to be quite small. The Rottweiler case would be the best one to review on COCA.
All of our taxonomic name entries are like that, ignoring the logical need for two senses on the grounds, I think, that it is more or less a fixed rule of grammar that assures us that both senses are possible. We have similar situations in Proper nouns of other types and even in nouns where countability and uncountability can occur in the usage of almost every noun according to predictable rules. I don't know whether we can make that kind of assumption for dog breeds and plant and animal vernacular names. DCDuring TALK 01:34, 16 June 2009 (UTC)
I can see how that might be standard practice in print dictionaries for native speakers, but perhaps not so perfect in a learner's dictionary. There may be exceptions (I can think of some for ethnonyms, but not dog breeds). In the long run, we may as well be precise about this. Michael Z. 2009-06-16 01:47 z
I think that most of the population of users of taxonomic names can be expected to know (or, at least, learn) the rules, whereas the users of vernacular names and the breed names may include more of the folks who should not be expected to know or learn them. If so, we can focus our efforts to add more precise and refined senses where they will do the most good, without prejudicing our ability to eventually improve even the taxon entries. In any case, EP prefers that we not expend effort on two- and three-part taxonomic names, leaving them for WikiSpecies and Wikipedia. DCDuring TALK 02:20, 16 June 2009 (UTC)Reply
This duality of meaning (specific member and general class) is a common feature of all English nouns. I can say "Lamps light rooms." In which case I am speaking of lamps in general. Or, I can say "My lamp needs a new bulb." In which case I am speaking of a particular lamp. We do not need to add this duality of meaning to every noun entry in every language on Wiktionary. --EncycloPetey 03:04, 16 June 2009 (UTC)Reply
Perhaps it's as open and shut as you say, but there seems to me to be a small difference in how different types of nouns are defined and used. A taxon is never defined as an individual; it is defined as a class: a subfamily, genus, etc. It may be used to identify an individual. A countable common noun is almost always (normatively: always) defined as an individual. Lamp is defined as "A device [] ". It may be used in the singular as a class name, but rarely. One could say "The lamp was a great invention", but not so naturally "The lamp is used to light things." when one means "Lamps are used to light things."
Dog breeds are in between these two in usage, I think, which is why we are banging the keyboards about it. DCDuring TALK 03:33, 16 June 2009 (UTC)Reply
Did I give the impression that I thought that was open and shut? I certainly didn't mean to imply that, and was suggesting quite the opposite. I agree with the points you've just made. Note that a taxon is always defined as a Proper noun denoting a class of members. This is directly connected with the way the nomenclature codes are written. We went through a big discussion on the issue of capitalization in plant names in the Plants Group on Wikipedia some time back. Similar discussions have happened from time to time in the other taxon-specific groups. --EncycloPetey 04:14, 16 June 2009 (UTC)
I think the need for a separate sense might be strictly governed by usage. How often do we speak of “the lamp”? Maybe some specific treatises on “Thomas Edison's light bulb” generalize this noun. But can't “the poodle” refer to the breed as easily and as often as it refers to a particular beast, warranting two separate senses? The ultimate test is the specific usage for each word, but until someone takes the time to estimate or measure usage for each one, I'd like to encourage defining these separately.
This is also a good application for subsenses, or at least compound definitions like “or an individual of the breed”. Michael Z. 2009-06-16 04:50 z
Please keep in mind that dogs aren't the only things that have breeds or cultivars. Your proposal potentially affects half a million current or future Wiktionary entries, and possibly more. This isn't a decision to be made lightly, and applies to most common names of living organisms. "The cheetah is the fastest animal on land." "The monarch butterfly spends the winter in Mexico." "The geranium is popular as a house plant." etc. --EncycloPetey 05:18, 16 June 2009 (UTC)Reply

I think that capitalizing an animal name emphasizes the fact that this name is used with a generic sense (animal, or "generic" animal, belonging to this category of animals, as opposed to other categories), but does not really change the meaning. This is one of the cases where capitalization can express something special without changing the meaning. Other cases are personified nouns (e.g. Truth), beginning of sentences, book titles, shouting in Internet forums, etc. In such cases, I think that a single entry should be created (e.g. mastiff). There is no reason to create two pages for each plant name and each animal name. We could add many millions of such pages (don't forget that this issue also exists in other languages), but this would not help readers at all; it would only confuse them. When both forms are used, determining which is the most frequent is not relevant: even if You were used more often than you, that should not lead us to create You instead of you, and this is a similar case. Lmaltier 06:46, 16 June 2009 (UTC)

What I explain is the policy adopted by other dictionaries, and it's a good policy. However, Webster's policy is much too extremist (e.g. they write new jersey pine). When the capitalized form is the normal form (e.g. Newfoundland, Thunnus albacares, New Jersey pine...), this form should be privileged, and the uncapitalized form may be created in addition if it's also in use. Lmaltier 07:14, 16 June 2009 (UTC)Reply

Context label cleanup

The following context labels can mean different things, and are used differently by some dictionaries. We need to define what we mean by them. Once we make up our minds, I'll draft up some documentation for the template or category pages, or for WT:GLOSS.

(By the way, some of the hide links and 250 links on “what links here” pages seem to be completely broken today.) Michael Z. 2009-06-16 04:32 z

{{slang}}

Over 500 inclusions. The concept of slang is defined in different ways. The Oxford Guide to Practical Lexicography says this label “indicates that the item is non-standard language used by the named group” [emphasis sic], but “in some dictionaries, ‘slang’ is considered a register label, meaning ‘even more informal than very informal’” (p. 228). Unfortunately, we also have {{cant}} (20 inclusions) and {{jargon}} (12), which mean pretty much the same thing, and WT:GLOSS is no help. I suggest that we merge these three, and define them similarly to Oxford. Michael Z. 2009-06-16 04:32 z

  • To me "slang" is informal language, but "jargon" can be formal. If you work in a formal environment, you wouldn't likely speak to your boss using slang words, but your conversation could well be full of jargon. Here I am using "jargon" to mean "A technical terminology unique to a particular subject." (ety 1 def 1) and "slang" to mean "Vernacular language outside of conventional usage." (1st synonym + def 1). I do not support merging these two. Thryduulf 22:51, 16 June 2009 (UTC)Reply
Well, of the three English terms which carry {jargon}, I don't think any falls under your definition of jargon (although I admit I'm not clear on what the label represents in the two which aren't prison slang).
Do we currently need this usage label at all? To represent technical terminology particular to a subject, I would simply apply the subject label, like medicine. This is clearer and supplies more information than just jargon. Michael Z. 2009-06-16 23:42 z
"Jargon", as used here, seemed to be an expression of a negative attitude by the tagger toward the entry. "Buzzword" (now gone) was used the same way. "Cant" has some particularly linguistic meaning, but I don't think it has any use in the portions of a general dictionary that are supposed to be for normal users. I think it could often be replaced in its use here by "obsolete|_|slang". Do we have {{argot}} too? In all of this group "slang" seems like the keeper. DCDuring TALK 00:53, 17 June 2009 (UTC)Reply

{{vulgar slang}}

20 inclusions. {{vulgar}} and {{slang}} represent two different things, and this template looks like “sum of parts”. I'd like to replace it with {{vulgar|slang}}. Michael Z. 2009-06-16 04:32 z

Seems very sensible to me. DCDuring TALK 00:43, 17 June 2009 (UTC)Reply

{{offensive}}

Over 150 transclusions. Does this mean the term expresses an attitude towards the referent, like {{endearing}}, {{pejorative}}, {{ethnic slur}}, or simply risks offending a reader or listener, like {{vulgar}}? Let's pick one, or let's decide to retire this vague wording, and I'll get to work on the resolution. I think every instance can be safely replaced with {{pejorative}}, {{vulgar}}, or both. Michael Z. 2009-06-16 04:32 z

"Offensive" is a hypernym of "ethnic slur" and a hyponym of "pejorative". To me the term "ethnic slur" is the one of no clear value. (I don't get "endearing" either.) I did not think that it was intended as a synonym or a hypernym for "vulgar". There are terms that the user does not view as pejorative, that no one views as vulgar, but are nevertheless taken by auditors or readers as offensive. I'd be happy to provide examples. DCDuring TALK 00:41, 17 June 2009 (UTC)Reply
Now I'm looking at the Oxford book I cited above, which seems to use offensive the other way. These terms fall into two categories (this book considers these both subclasses of register, but other books don't):
  1. attitude or approval of the speaker to the subject: affectionate, endearing, appreciative, approving, disapproving, derogatory, pejorative, insult, strong insult, slur
  2. taboo or vulgar language: rude, offensive, vulgar, taboo
My main problem with offensive is that its nature is not clear—it could be taken as offending, i.e. “insulting”. But we could continue using it as long as we agree on the meaning of the label.
ethnic slur is part definition and part usage—I think “ethnic” ought to be evident from the definition, and a label like pejorative or insult sufficient to describe the usage. racist and sexist are similar labels also used in some dictionaries. Michael Z. 2009-06-17 01:42 z
I'm not wedded to which terms are used and would prefer that we be consistent with user expectations. Other dictionaries have done some research on users' expectations and have helped shape them. Their views should be accorded some weight, especially in the absence of any usability research budget here. I still think there is value in distinguishing personal insults from other pejorative terms. If offensive and vulgar were combined and labeled vulgar or offensive, that would be fine. But I believe that some of the terms labeled as offensive are in fact more properly considered "insulting". So, three terms seem important to me: offensive/vulgar, insults directed at people, non-personal pejoratives ("rust-bucket", "jingoism"). Many of these need to be refined via usage notes. If additional tags would significantly reduce the need for usage notes, they should be considered.
I think we should take a little time to make sure that all the items tagged with labels that are to be removed are properly tagged as "vulgar/offensive" or "insulting" if they are. "Pejoratives" are less important. DCDuring TALK 02:23, 17 June 2009 (UTC)

{{dialectal}}

Over 400 transclusions. In most dictionaries this is a regional label, indicating that the scope is too complicated to express within the constraints of print. Example at User:Mzajac/Dialect labels#Dialectal. Let's define it as such, and resolve to substitute detailed regional labels whenever the information is available. Michael Z. 2009-06-16 04:32 z

{{regional}}

Over 125 transclusions. Looks like it's the same as dialectal. Merge? Michael Z. 2009-06-16 04:32 z

  • I would think of a word marked "regional" as a word that was used in two or more dialects in a similar geographical location, for example a word used in the dialects of Cornwall, Devon, Somerset and Bristol. I agree though that wherever possible we should have more detail than just either of these labels, so I support this merger. Thryduulf 22:51, 16 June 2009 (UTC)
    I start looking at this from the practical point of view of merging, and see that the meanings overlap a great deal, but regional and dialectal are not identical. E.g. bunny hug is a regionalism from Saskatchewan, where people speak the same basic dialect of Canadian English found from Vancouver through Toronto. I'll hold off on changing this one. Michael Z. 2009-06-23 12:32 z

Invitation to Kosovo for Wiktionary

Hi guys, I would like to invite you to Kosovo for our software conference. It includes topics on Wikimedia and Wiktionary. I have been recruiting people. Please come; speakers might get sponsored, so get your talks submitted.

mike http://www.kosovasoftwarefreedom.org/

Refactored WT:PREFS

After years of procrastinating I've finally begun to refactor WT:PREFS. Please check that nothing has changed. Refresh your caches (control+F5 etc.).

You shouldn't see any difference at all. The code has been rearranged to make it more modular, easier to add to, easier to maintain and improve.

If anything doesn't behave as before please let me know. If something is drastically broken feel free to revert.

In MediaWiki:Common.js I have made this change:

before: importScript('User:Connel_MacKenzie/custom.js');
now: importScript('User:Hippietrail/custom.js');

If you'd like to look at the code: User:Hippietrail/custom.js

hippietrail 12:08, 17 June 2009 (UTC)Reply

  • I've now added partial support for disabled (greyed out) items.
    Some features are disabled due to certain problems such as broken servers; one feature of mine only works with JavaScript 1.7 or better.
    Next step is to disable/enable all options when the master switch is toggled. — hippietrail 00:51, 18 June 2009 (UTC)Reply
  • All controls are now greyed out or enabled as the master switch is toggled. Let me know if there are any problems. I've tested it with all major Windows browsers. — hippietrail 04:09, 19 June 2009 (UTC)Reply
    Awesome. It improves the control 100%, because I always assumed that you'd have to save settings to change its state. Now the greying out directly reflects whether it is in effect.
    But there is still a weird disconnect in having a non-modal control to change the entire activate state, but clicking a link to refresh the page to change individual behaviours. Michael Z. 2009-06-19 04:15 z
    Yes that's next. I've always hated it not having "OK" and "Cancel" buttons but Connel, who originally implemented it, insisted it was impossible. That was what I really wanted to fix but the code was pretty crufty so first I wanted to refactor it making sure I didn't break any features.
    It's somewhat complicated by the fact that it works with two ways of storing and retrieving preference settings from the browser cookies. It seems the following options have all been broken for some time but I don't know if anyone has complained: WiktionaryPreferencesTimeUTC, WiktionaryPreferencesTickClock, WiktionaryPreferencesShowNav, WiktionaryDisableAutoRedirect
    If I can be sure nobody uses those options I would be glad to remove them and decomplexify the code.
    With any luck I'll have OK/Cancel code tonight Sydney time. — hippietrail 05:26, 19 June 2009 (UTC)Reply
No, those preferences work for me using the current WT:PREFS, ShowNav is a particularly "used" one (maybe I gave you bad information last night, sorry). Conrad.Irwin 09:30, 19 June 2009 (UTC)Reply
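For anyone checking whether a given preference is still being picked up, here is a minimal sketch of reading one option out of a cookie string. It is not taken from custom.js: the function names are hypothetical, and the assumption that an enabled option is stored as the literal value 'true' is mine, since the thread does not show the actual storage format (which reportedly exists in two variants).

```javascript
// Hypothetical helper: parse a document.cookie-style string into an object.
function parseCookies(cookieString) {
  const prefs = {};
  cookieString.split(';').forEach(function (pair) {
    const index = pair.indexOf('=');
    if (index === -1) {
      return; // skip fragments that are not name=value pairs
    }
    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    prefs[name] = decodeURIComponent(value);
  });
  return prefs;
}

// Assumption: a preference counts as enabled when its cookie value is 'true'.
function isPrefEnabled(cookieString, name) {
  return parseCookies(cookieString)[name] === 'true';
}
```

In a browser this would be called as isPrefEnabled(document.cookie, 'WiktionaryPreferencesShowNav'); passing the string in explicitly keeps the helper testable outside a page.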

Conjugated verb phrases

Are there guidelines about conjugated verb phrases (e.g. wastes time)? If they are allowed, why not create pages such as appelé un chat un chat, appelant un chat un chat, appelât un chat un chat, etc. (from appeler un chat un chat)? I cannot imagine that they might be added. It would be ridiculous, and 100% useless. Lmaltier 14:04, 17 June 2009 (UTC)

I would think that the default should be to not inflect idioms. (Is waste time really an idiom?) If there were something unusual about the inflection, perhaps, but I can't think of an example. DCDuring TALK 15:06, 17 June 2009 (UTC)Reply
How about possessives like eat one's hat? There certainly are examples of these in French: bête comme ses deux pieds. Personally I've often added a note about agreement (cf. attacher sa tuque). Circeus 11:24, 19 June 2009 (UTC)

Pronominal verbs

(separated from the above topic)

There could be something to it... things like m'appele and t'appeles and s'appele can be confusing. But I do think there's more productive things to do...there's some more French one-word verbs to conjugate, if you like. --Jackofclubs 15:40, 17 June 2009 (UTC)Reply
It would be m'appelle, t'appelles, s'appelle, etc. Creating them might be considered, it's not ridiculous. But this is another issue (s'appeler is not a verb phrase, it's a pronominal verb). Lmaltier 16:33, 17 June 2009 (UTC)Reply
… though [[Wiktionary:About French]] currently says that not even [[s'appeler]] should exist. —RuakhTALK 16:41, 17 June 2009 (UTC)Reply
I propose we change that, admittedly I did start s'appeler on the simple enough premise that it's not sum of parts (s'appeler could literally mean call each other (by phone, I mean)). Mglovesfun 17:36, 17 June 2009 (UTC)Reply
Yeah, I think I agree. Most dictionaries do not have separate entries for idioms, but rather list them under the most salient word; such dictionaries, unsurprisingly, cover "s'appeler" at their entry for "appeler". We, however, put idioms on their own page (which doesn't work quite so well, but probably won't change any time soon), so yeah, the consistent thing for us to do would be to put "s'appeler" on its own page as well. But right now we cover it both at [[appeler]] and at [[s'appeler]], which is not so good. —RuakhTALK 21:42, 17 June 2009 (UTC)Reply
The basic argument we came up with on fr.wikt (with virtually no opposition) was that se + infinitive entries are acceptable if not sum of parts. So se laver is sum of parts, because it's just to wash oneself, but se passer isn't because it means to happen (in fact I've been meaning to add se passer and se produire to fr.wikt for a while now). Mglovesfun 21:53, 17 June 2009 (UTC)
FWIW I could have sworn the absolutive meaning of the reflexive (which IMHO covers se produire) was considered a grammatical feature and did not usually warrant special definition? Circeus 03:13, 18 June 2009 (UTC)Reply
Source? Mglovesfun 04:45, 18 June 2009 (UTC)Reply
All dictionaries I have tried have special definitions for se produire, these definitions are needed. And it's the case for most pronominal verbs in French. Lmaltier 17:39, 18 June 2009 (UTC)Reply
I'm not disagreeing as to whether definitions are needed. There are clearly cases where they should be given. What I disagree about is whether a separate se produire page is warranted. What I've taken to doing is creating one for idioms in the reflexive, but not for the verbs themselves. Though I wouldn't be against redirects from such entries. As a side note, I'm really not too keen on "sub entries" of the type found at fr:. IMHO either these are definitions with labels like reflexive and pronominal, or they are different entries. Of course, I am opposed to conjugated reflexive entries just as I am for idioms, if only because it feels silly to have an entry for "m'appelle" in the first person singular, but not the third... Circeus 11:35, 19 June 2009 (UTC)
When something warrants a definition, this "something" also warrants a page. Don't you agree? But providing info about the pronominal verb in the page of the simple verb is also needed (at least a soft redirect, or more...) I agree with you about conjugated entries such as m'appelle, it's similar to mother's or l'eau. Lmaltier 16:56, 19 June 2009 (UTC)Reply
(Let's start from the left again) Where's the appropriate place to have a vote on this? I can think of some 'good' changes I can make, but I don't want to do them now just to have someone revert them all. Mglovesfun 04:48, 18 June 2009 (UTC)
I propose that we continue this discussion here. Mglovesfun 13:29, 19 June 2009 (UTC)Reply

Another idiom

Please help the Wikipedia editors with opinions as to whether this would be an idiom that we take. Uncle G 19:54, 17 June 2009 (UTC)Reply

Done. DCDuring TALK 21:02, 17 June 2009 (UTC)Reply

Typical collocations

The German Wiktionary has a header “Charakteristische Wortkombinationen”, “typical word combinations”. Do we have something similar? I think we should. For example, in , it says this is typically used with the verb versetzen: in ~ versetzen. H. (talk) 08:46, 18 June 2009 (UTC)

We have "Derived terms" but only for combinations that meet CFI. For combinations that don't, we include example sentences or information under "Usage notes". --EncycloPetey 14:59, 18 June 2009 (UTC)Reply
Unfortunately any combination that involves the headword properly bolded in a usage example will not be found by the search engine. Your suggestion would help overcome the deficiency.
We also sometimes have restrictions on collocations in context tags or in the sense line. DCDuring TALK 14:20, 19 June 2009 (UTC)Reply
This comes up all the time, and our guidelines' failure to address this seems contrary to our mandate and to the instincts of many editors. Important collocations are often listed at RfD with the (warranted) justification that they are merely sum-of-parts, but many editors claim that they belong in the dictionary as “set phrases”—of course this is wrong, as being a set phrase is not a CFI.
Our ELE also doesn't give us any reasonable way of including unlinked collocations in entries. These are not simply derived terms so their significance is lost if they are piled in there, and they are likely to be removed if they don't have an entry. The best we can do is to persistently shoehorn them into Usage notes and see if some conventional format arises.
This is a feature which is very important in dictionaries for language learners. We need to resolve to address such needs, and we need a hard-working editor to introduce this and other such dictionary features, and we need to support her or him. Michael Z. 2009-06-20 14:31 z

Broken web bug on Wiktionary??

I've been having intermittent problems with Wiktionary for the past few days.

  • Often page loading completes but only a totally blank page is shown.
  • Sometimes an alert appears saying "This doesn't look like a Wiktionary page. No can do."
  • Both problems have occurred on various browsers and various machines.

My hunch is that they are a side effect of a badly programmed hit counter or tracker web bug of some kind that only activates randomly once out of every so many hits. Does anyone have any idea what it is, how to fix or remove it, or if the problems even share the same cause? — hippietrail 09:04, 18 June 2009 (UTC)Reply

I haven't had any such problems myself. My bot has had difficulties, but that was from a server that had gone rogue and should be fixed now. --EncycloPetey 14:57, 18 June 2009 (UTC)Reply
Is this while logged in, while logged out, or both? —RuakhTALK 15:00, 18 June 2009 (UTC)Reply
I'm getting it right now while logged in using Google Chrome on a work computer and submitting an edit for the page miga. The Chrome debugger tells me there were two Google JavaScript files included. I'm pretty sure the devs don't like that kinda thing. But neither script included the error message.
So I think the Google web bug thingies are causing a conflict in some other piece of JavaScript. It might even be one of my own old scripts but a search in Wiktionary and a search in Google both find no hint of the error message... — hippietrail 02:49, 20 June 2009 (UTC)Reply
  • Found it. One of Conrad's older js extensions was clashing with one of my newer extensions. The Google bugs were a red herring I think. The error message itself came from Conrad's parser.js aka "paper view". But where are the Google bugs coming from? — hippietrail 02:37, 21 June 2009 (UTC)Reply

New toy to play with

In my copious free time this winter I have crafted a new toy for you all to play with.

In WT:PREFS, turn on "For each language section add interwiki and random links." and tell me if you like it. — hippietrail 11:26, 19 June 2009 (UTC)

I haven’t been able to find it. I searched for "for each language" but it was nowhere to be found in the PREFS. What does it do, anyway? —Stephen 11:50, 19 June 2009 (UTC)Reply
Oh sorry you will have to refresh your browser cache, control+F5 on most browsers. It will be the last item under the heading "Experiments – these are likely to be buggy and may not work in very common browsers." — hippietrail 00:10, 20 June 2009 (UTC)Reply
Cleared my cache in Safari (cmd-opt-E) and forced reload (shift-Reload) several times on WT:PREFS, but I don't see this. I don't usually have caching problems at all. Is it still installed? Michael Z. 2009-06-20 14:40 z
Ditto. What's more, I don't see anything like it in User:Connel MacKenzie/custom.js or User:Hippietrail/custom.js. —RuakhTALK 14:54, 20 June 2009 (UTC)Reply
Apologies again. I neglected to copy my development version onto the public version after testing it on all browsers. It should work if you refresh your caches this time. — hippietrail 02:38, 21 June 2009 (UTC)Reply

Main Page redesign saga: Part 3

Wiktionary:Main_Page/2009_redesign

Okay, discussion had slowed down and I sort of got sidetracked for a while. I implemented as many suggestions as I could in the proposed redesign and would like final input before I start nagging the people over at commons for icon retracings and stuff. Circeus 11:28, 19 June 2009 (UTC)

Apparently this isn't an official vote yet, but a discussion to get us started. Mglovesfun 13:31, 19 June 2009 (UTC)Reply

Wiktionary:Solstice Competition, June 2009

Announcing this June's Solstice Competition. Its open and close dates are not yet set, so as to allow editors to amend the rules. msh210 22:39, 19 June 2009 (UTC)

The competition has begun. msh210 20:51, 22 June 2009 (UTC)

{{compound}}

This template has a lang parameter but it is not used. Would you support creating a category Category:English compounds and the appropriate FL counterparts by using this template? Compound words using this template would then be automatically added to the category. --Panda10 22:10, 20 June 2009 (UTC)

The word "compound" is ambiguous. The relevant category already exists as Category:English compound words. --EncycloPetey 22:15, 20 June 2009 (UTC)Reply
Ok, great, then Category:English compound words and the appropriate FL categories. The question remains: would it make sense to add this new functionality to the template? --Panda10 22:26, 20 June 2009 (UTC)Reply
Hard to see a downside to this. If we are going to refine the category or even maintain it, much of the activity would be by language. I noticed at least one instance of what looked to me like a single character having the template, so the interpretation of this will definitely be by script and language.
It seems as if the template is used in nearly a thousand entries, many (most?) of which are not English, but at least a couple of hundred are English. There are at present 12 in Category:English compound words. I wonder how many other languages have "compound word categories", whether the naming is consistent, and how many entries are so categorized. One preparatory step would be to get the compound template onto the items in those categories, assuming that they are properly categorized. I assume that most English uses of the template have not inserted the lang= parameter. Is that also true for the other languages? Can we automate or accelerate the insertion of the lang=? Autoformat? DCDuring TALK 23:33, 20 June 2009 (UTC)
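On the question of automating the lang= insertion, the following is only a rough sketch of what a bot or script step might look like. The function name and the choice to append the parameter at the end of the call are assumptions of mine, not an existing tool, and calls that contain nested templates are deliberately left untouched so a human can review them.

```javascript
// Hypothetical helper: append lang=<code> to {{compound|...}} calls that
// do not already carry a lang= parameter.
function addLangToCompound(wikitext, langCode) {
  // [^{}]* keeps the match inside a single template call with no nesting.
  return wikitext.replace(/\{\{compound\|([^{}]*)\}\}/g, function (whole, params) {
    const hasLang = params.split('|').some(function (param) {
      return /^\s*lang\s*=/.test(param);
    });
    return hasLang ? whole : '{{compound|' + params + '|lang=' + langCode + '}}';
  });
}
```

The language code would still have to come from the L2 section the template sits in, which is the part that needs per-entry care.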

ijs and ijs

I just discovered these two pages: ijs and ijs, the first one with ij, the second one with ij. The contents treat the same word. What is the correct form? Or do we need both? --MaEr 18:20, 21 June 2009 (UTC)Reply

Given this is not (AFAIK) even an alternative spelling (purely a typographical issue, like the use of æ/ae in English or œ/oe in French), I say it's better to pick only one of the versions (is it a letter, a digraph or a ligature anyway??) and redirect the others (either directly or as an {{alternative form of}}). Circeus
Custom in the past has been to avoid using "virtual" ligature combinations in any page name. They're neither easy to type nor easy to recognize for what they are. This applies also to the dž digraph in South Slavic languages. --EncycloPetey 19:23, 21 June 2009 (UTC)
There are a couple of "forcibly ligatured" words finding their way into the English section as well, fisherwomen and firſt. It seems to me we should delete the version with ligatures; good fonts will add the tasteful ligatures back anyway (though mine seems to only put one between "r" and "ſ") and it just leads to confusion when firſt and firſt and firſt and firſt are all distinct. (Not to mention first and first of course). I'd settle for making leaving redirects behind if people want to create these things, but I see no reason to create them or keep them. Conrad.Irwin 20:02, 21 June 2009 (UTC)Reply
Dutch ij is considered a digraph. In alphabetizing words with ij are in general found under i not y. Capitalized it does become IJ not Ij. Otherwise I do not think that it needs a special symbol, although there is a bit of a problem if stress marks are added. This is optional in Dutch, but the spelling is regulated. The proper spelling has an acute on both i and j which I do not know how to do. Even the above digraph symbol produces ij́, which is not correctJcwf 22:11, 21 June 2009 (UTC)Reply
Obviously, then ij needs heavy refactoring because it refers only to a specific/incidental typesetting of ij, the digraph, which we currently have no entry for, and would take most of the content. ij should have a translingual section too, while ij should not. Circeus 23:06, 23 June 2009 (UTC)Reply
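To make the ligature point concrete, here is a small sketch of how page titles containing single-character ligatures or digraphs could be folded to their plain letter sequences, for instance to find or generate the redirect targets discussed above. The function and table names are hypothetical; the code points are standard Unicode ones (U+0132/U+0133 for the Dutch IJ/ij, U+01C4 to U+01C6 for DŽ/Dž/dž, U+FB00 to U+FB06 for the Latin presentation ligatures). An explicit table is used rather than blanket compatibility normalization so that a plain long s (ſ) is left alone.

```javascript
// Hypothetical lookup table: single-character ligature/digraph -> plain letters.
const LIGATURES = {
  '\u0132': 'IJ', '\u0133': 'ij',
  '\u01C4': 'D\u017D', '\u01C5': 'D\u017E', '\u01C6': 'd\u017E',
  '\uFB00': 'ff', '\uFB01': 'fi', '\uFB02': 'fl',
  '\uFB03': 'ffi', '\uFB04': 'ffl',
  '\uFB05': '\u017Ft', // long s + t ligature keeps the long s
  '\uFB06': 'st'
};

// Replace every listed ligature character with its letter sequence.
function expandLigatures(title) {
  return title.replace(/[\u0132\u0133\u01C4-\u01C6\uFB00-\uFB06]/g, function (ch) {
    return LIGATURES[ch];
  });
}
```

Whether the folded or the ligatured title should be the main entry is a policy question this does not answer; the sketch only shows that the mapping itself is mechanical.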

Citations

The current draft proposal at Wiktionary:Citations apparently deals only with English, and some of the ill-designed templates it suggests to use (like {{citation}}) are based on that assumption. Given the ever-growing application of Citations: namespace format (>2k pages) as laid out in that proposal, I think it's high time its deficiencies be discussed before more damage is done by its usage.

First of all, the template {{citation}} is broken and should be terminated. It creates an L2 section, and that behavior is to be avoided by unwritten common template practice (it unnecessarily complicates the already complex parsing of wiki-code). Its purpose is to list variant spellings whose usage the Citations page should illustrate, but these are already listed at the corresponding mainspace page(s) in the ==Alternative spellings/forms== section, which is one click away.

The suggested practice is to put ====Quotations==== as an L4 section, which doesn't make sense if all that it contains is a soft redirect to the corresponding Citations page by means of the {{seeCites}} template. This makes it needlessly duplicated for every PoS of every etymology, as one can see at the exemplary entry hinder.

Now, the obvious thing to do would be to follow the same formatting scheme in the citationspace as in the mainspace, i.e. L2 section names separated by ----, each explicitly categorizing in [[Category:Xxx citations]]. (Category:Citations was apparently deleted recently; I've created Category:Citations by language, which seems more appropriate.) That way individual languages can be linked to by means of the lang= parameter of {{seeCites}}. Within the L2s, senses should all (regardless of etymology/PoS) be listed as L3s in the sequence in which they appear in the corresponding mainspace entry. As a first L3, perhaps a duplicate of ===Alternative spellings/forms=== should be made, to let the readers know which spellings are being grouped.

Thoughts? --Ivan Štambuk 22:24, 21 June 2009 (UTC)Reply

If {{seeCites}} is used to categorize in cat:French citations, then the entry, not the citations page, will be categorized, which is not, I think, the desired effect. I'm not sure why citations need to be categorized by language at all, but, if they are to be, then doing so in {{citation}}, not seeCites, would seem to be the way to go. (What would be the purpose of such categorization?) msh210 23:06, 21 June 2009 (UTC)
We need the citations page to have a link to each form that it cites, if at all possible, so that those checking whatlinkshere for a page prior to deleting it (as all admins do, of course) will know whether the word has been cited. That is currently accomplished by {{citation}}, but poorly: the template displays only up to some low number (four IIRC) of terms, and should be fixed if its use for this is to continue. msh210 23:06, 21 June 2009 (UTC)
I, for one, have no opinion as to the order of language, POS, or etymology sections in the citations namespace, or of what level the headers should be or whether they should be template-generated: I don't care. msh210 23:06, 21 June 2009 (UTC)
Well, I wasn't suggesting that {{seeCites}} be used for categorization in the first place. Categories would be added manually in the Citations: namespace L2 section, e.g. [[Category:French citations|entry]]. The purpose of such a categorization scheme would obviously be to categorize all the citations on a per-language basis, so that interested editors could maintain them. So far there is no way to list all the citations pages for a particular language. Also, the sort key must be mandatory, or else all the citations would be sorted under "C".
For doubtful entries (variant spellings or such) that need citations other than in the corresponding Citations:{{PAGENAME}} page, creators should create the ===Quotations=== section with {{seeCites}} linking to the appropriate Citations page (by using the first unnamed parameter). It's the burden of the entry creator to provide the evidence that the doubtful word or variant spelling exists. Also, as I said, methinks that that kind of behavior, if it needs to be implemented at all, should be accomplished by means of an L3 section ===Alternative spellings/forms=== which should wikilink to all the entries for which citations are being provided, not by using the awkward {{citation}} template. --Ivan Štambuk 08:28, 22 June 2009 (UTC)
One use I have made of Citations namespace is for the removal of quotations that are arguably not valid for attestation or for which it is not clear what sense they might support. I have sometimes been using headers to create bins for sorting such quotes. It seems clear that I should use context tags for the quotations that have problems so as not to interfere with desirable permanent structuring of these pages. Please bring such pages to my attention if you notice them. I will undertake to make some proposals about context tags to mark attestation issues in the near future. Does anyone remember prior discussion of this?
One aspect of quotation sorting is the sorting by alternative forms and spellings. This is usually just a temporary thing. Arguably, at the close of an RfV, the quotations not involving the headword and its inflected forms should be moved to the page of citation space that exactly corresponds to the spelling in the quote. If the alternative form or spelling is being used to support attestation, I suppose it ought to be linked in the citation space.
This makes me wonder what functionality we really need from the citations template.
I regularly observe problems in connecting individual citations (especially the uplifting or humorous literary ones) with individual senses of the headword, sometimes even the part of speech. (See Citations:lagging for today's example.) Any structure for citations pages needs to preserve our ability to accommodate such ambiguous citations. DCDuring TALK 17:02, 23 June 2009 (UTC)
Initially my idea for Citations pages was to have all inflections and certain other forms of a word on the same citations page. I've argued this at length. For instance, it makes no sense to separate capitalizations, since it would be impossible to determine for a lowercase word that begins a sentence, nor hyphenation, if a word is split on two lines. But groupthink and the introduction of the citations tab steamrollered right over this. DAVilla 06:48, 24 June 2009 (UTC)

Suggested enhancement to search logic for terms containing possessive pronouns

Suppose the search logic for terms containing possessive pronouns is enhanced, so that whenever a user enters such a term (e.g., off his rocker or off your rocker), if the initial search fails, the code substitutes one's for the possessive pronoun and repeats the search? If the second search gets a hit, the user is auto-redirected to the version of the term containing one's (off one's rocker). Is this a good idea? If so, is it doable? -- WikiPedant 05:38, 22 June 2009 (UTC)
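
For illustration only, the fallback proposed above might look like the following minimal sketch. It is written in Python for brevity; an on-wiki version would presumably be JavaScript. The names fallback_title and resolve, the pronoun list, and the page_exists callback are all assumptions made for this example and do not correspond to any existing site code.

```python
import re

# Possessive forms to swap for the placeholder "one's".
# Caveat: "her" and "his" are not always possessive determiners, so this can
# produce wrong candidates; the existence check in resolve() filters those out.
POSSESSIVES = ("my", "your", "his", "her", "its", "our", "their")
_PATTERN = re.compile(r"\b(?:%s)\b" % "|".join(POSSESSIVES), re.IGNORECASE)


def fallback_title(query):
    """Return the query with each listed possessive replaced by "one's"."""
    return _PATTERN.sub("one's", query)


def resolve(query, page_exists):
    """Try the query as typed; on a miss, retry with "one's" substituted.

    page_exists is a hypothetical callable (title -> bool) standing in for
    whatever lookup the site actually provides.
    """
    if page_exists(query):
        return query
    candidate = fallback_title(query)
    if candidate != query and page_exists(candidate):
        return candidate
    return None  # neither form exists; show the normal "no results" page
```

With a set of existing titles containing only "off one's rocker", calling resolve("off his rocker", titles.__contains__) would return the one's form, and an unrelated miss would return None.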

Currently the only supported auto-redirects are ones where server-side MediaWiki markup generates the bluelinks and JavaScript handles the redirection. Thus, it's basically limited to what combinations of uc:, lc:, ucfirst:, and lcfirst: can accomplish. (Though for entries with capital letters in the middle, we sometimes augment this with manual redirects from the all-lowercase form.) If we want to support other kinds of redirects (and that would be really nice), we need either (1) to write JavaScript code that generates a list of permutations to try and then queries the API to see which of them are bluelinks, or (2) to create an external page (possibly on the toolserver) that contains a bag of bluelinked page-titles, implements this logic, and generates appropriately-redirecting JavaScript. The latter is more flexible algorithmically (it wouldn't need to generate all possible relevant bluelinks; for example, without getting too technical, it could normalize the page-titles, e.g. storing [[burst somebody's bubble]] under "burst one's bubble", so when someone looks up "burst his bubble", it would check its index for "burst one's bubble" and find [[burst somebody's bubble]]), but has major downsides (it would probably be editable only by whoever's hosting it; we'd have to trust that person not to steal our passwords, or to use our passwords for good and not for evil :-P ; and its index wouldn't be up-to-the-second). The former approach is much more limited (how many permutations can we try? how much language-specific code can we build in?), but is clearly safer, and would still be much more comprehensive than what we've currently got. If nothing else, I'd love it if it could remove Hebrew diacritics (vowels, chanting notations, etc.), replace fullwidth English characters with normal ones, change the I-J ligature to a normal "ij", and so on. —RuakhTALK 00:21, 24 June 2009 (UTC)
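
As a rough illustration of the character-level cleanup mentioned at the end of the comment above (Hebrew diacritics, fullwidth characters, the I-J ligature), here is a minimal Python sketch. The function name normalize_title is an assumption made for this example; a real on-wiki implementation would be JavaScript and would still need to check the resulting title against the API. Note also that NFKC folds many other compatibility characters besides the ones named, which may or may not be wanted.

```python
import unicodedata


def normalize_title(title):
    """Return a simplified lookup form of a page title (illustrative only)."""
    # NFKC folds compatibility characters: fullwidth Latin letters become
    # plain ASCII letters, and the ligature U+0133 becomes the two letters "ij".
    folded = unicodedata.normalize("NFKC", title)
    # Hebrew vowel points and cantillation marks are nonspacing combining
    # marks (category Mn) in the range U+0591..U+05C7. Punctuation in the
    # same range, such as maqaf (U+05BE), has a different category and is kept.
    return "".join(
        ch
        for ch in folded
        if not (unicodedata.category(ch) == "Mn" and "\u0591" <= ch <= "\u05c7")
    )
```

A title that changes under normalize_title would be one more candidate to test, alongside the case variants the current redirect logic already tries.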

Excessive cognates revisited

Straw poll: (1) Do you think this is too many cognates to list for an Old English entry? (2) Should modern languages be included in a list of such cognates? Note that these questions are specific to Old English entries here. --EncycloPetey 02:34, 23 June 2009 (UTC)

Yes. For a given word, there should never be more than one or two useful cognates, ideally from contemporary languages. Such material is interesting, but best used in appendices. Circeus 03:05, 23 June 2009 (UTC)
I don’t see any problem with that entry. It’s such a stub that even such an exhaustive list of cognates cannot possibly be regarded as detracting from more useful information or whatever. One thing though: Why is an Old English word listed as a cognate in an Old English entry? Shouldn’t it be listed in a Related terms section, rather than in the Etymology section? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 03:16, 23 June 2009 (UTC)
This IP has been doing that a lot, including adding an etymology section to an Old English word that has nothing more in it than {{etyl|ang}}, which is doubly wrong because it categorizes the word as if it were a modern English word derived from Old English. I've seen improvement in the anon's edits, but have not gotten a response to any comments or edits made. --EncycloPetey 03:22, 23 June 2009 (UTC)
Yes, that seems like too many. I have always thought that cognates should mostly be found by going to the list of descendants of ancestors. In the case in point one could not do that because the Proto-Germanic conjectural ancestor is not a permitted entry and there is no alternative home for the cognate terms. Without a suitable home, I suppose we could just have a show/hide for reducing really long lists to a single line. Or we could have a WikiCognate analogous to WikiSaurus. DCDuring TALK 03:28, 23 June 2009 (UTC)
We have a mechanism for including PIE roots, so I don't see why the same mechanism couldn't be used for proto-Germanic. See dēns for an example of this. --EncycloPetey 03:31, 23 June 2009 (UTC)
I haven't had much reason to go back that far so this was the first I had seen of it. That would be lovely for the Germanic reconstructed languages. Is that where reconstructed terms from, say, Vulgar Latin, would go, too? Could this contributor be introduced to the attractions of this approach? DCDuring TALK 04:05, 23 June 2009 (UTC)
I wouldn't put reconstructed terms from documented languages into such a format. PIE and proto-Germanic are reconstructed languages. For recorded languages, this format probably shouldn't be used. --EncycloPetey 14:55, 23 June 2009 (UTC)
Where do/should they go? DCDuring TALK 16:10, 23 June 2009 (UTC)
On the page. They look just fine. DAVilla 06:09, 24 June 2009 (UTC)

(1) I see no problem with adding a bunch of cognates to such short articles, especially to Old English ones, which, I think, are often accessed through the etymology sections of Modern English entries and thus are interesting etymology-wise. (2) Modern language cognates are enclosed in parentheses after their parent Old language, so I don't see any problem with this either. As for making Proto-Appendices like Appendix:Proto-Germanic *dagaz, and moving cognates there, well, that's a reasonable alternative, but practice shows people are reluctant to create such appendices. --Vahagn Petrosyan 05:34, 23 June 2009 (UTC)

I agree completely (with Vahagn Petrosyan). Even for readers who are actually looking up Old English words directly, I'd bet most would be interested in other Germanic cognates. It's not a language like Modern French, that people learn for practical reasons. —RuakhTALK 00:29, 24 June 2009 (UTC)

Is what is written at dēns meant to avoid the inclusion in the entry of the long tree of terms at Appendix:Proto-Indo-European *h₃dónts#Descendants? If so, I think I get your point… :-S Cognates are really useful, but I’d shy away from that many lest they swamp the entry. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 14:48, 23 June 2009 (UTC)

Unorthodox request.

Hello Wiktionary community. I have a confession to make. You see, as suspected, but not proved, I am in fact Wonderfool. And Wonderfools have never been keen on serious long-term admin work. So this year, instead of being dangerous and going on a spree, I'll be amical and request desysopping the polite way (I'll delete the main page, of course, but that's all). And I think it would be wonderful if I could remain a Wiktionarian, and be open about my WFness (i.e. don't send me underground so I'm forced into clandestine editing and hopping from IP address to IP address and town to town and continent to continent). This way, I can edit hardcore French stuff, which I haven't done properly for about a year, without worrying about being blocked. And it'd be nice to run User:Keenebot2 again - there are tens of thousands of pages waiting to be rapidly added to this project in my off-wiki files. Anyway, I propose a mini-poll to allow WF to edit here, but without boring adminship duties. If not, then I'll probably see you again in 2010 under a new name. Regards --Jackofclubs 07:00, 24 June 2009 (UTC)

Support

Oppose

Abstain

Discussion