Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to the relevant policy page, or a brand new one may be created. See Category:Policies - Wiktionary Top Level for identified policy pages. Some of these may be inactive. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!


October 2009

Volunteers still needed

Hi all,
Although we removed the centralnotice that was up, the Wikimedia Foundation is still looking for volunteers to serve as subject area experts or to sit on task forces that will study particular areas and make recommendations to the Foundation about its strategic plan. You may apply to serve on a task force or register your name as an expert in a specific area at http://volunteer.wikimedia.org.

The Foundation's strategy project is a year-long collaborative process which is hosted on the strategy wiki, at http://strategy.wikimedia.org. Your input is welcome (and greatly desired) there. When the task forces begin to meet, they will do their work transparently and on that wiki, and any member of the community may join fully in their work. This process is specifically designed to involve as many community members as possible.

Any questions can be addressed to me either on my talk page here or on the strategy wiki or by email to philippe at wikimedia.org.

I hope you'll consider joining us!

Philippe 01:53, 1 October 2009 (UTC)

Mandarin reaches 10,000 nouns!

Thanks for everyone's help in getting Mandarin to the 10,000 nouns mark. The 10,000th noun was... 尼龍 (nylon)! Celebrations all around! (see Category:Mandarin_nouns) Tooironic 04:55, 1 October 2009 (UTC)

Congratulations! Well done! I marked the English word with a few new translations. Red links can't wait to become blue. :) Anatoli 05:49, 1 October 2009 (UTC)

Strong numbers

I was thinking, it might be useful to add Strong numbers to all the relevant Greek/Hebrew entries, and also redirects from the Strong numbers themselves to those entries. What would people think of this addition? --SJK 09:49, 1 October 2009 (UTC)

CFI Clarification required

(copy of entry in Wiktionary_talk:Criteria_for_inclusion#Clarification_Required) The CFI (Criteria for Inclusion) need clarification on one point:-

Attestation.

“Attested” means verified through
  • Clearly widespread use,
  • Usage in a well-known work,
  • Appearance in a refereed academic journal, or
  • Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.

Are those 4 attestation criteria joined by OR, or by AND?

My personal view is that they should be joined by an OR, so that a term needs to meet ANY of the criteria and does not need to meet ALL of them.

I would suggest a change of the paragraph to

“Attested” means verified through meeting ANY of the following conditions

  • Clearly widespread use,
  • Usage in a well-known work,
  • Appearance in a refereed academic journal, or
  • Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.

I cannot be bothered to mount a campaign or vote on my own. Anyone agree enough to take it on ? --Richardb 14:28, 1 October 2009 (UTC)
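
The two readings of the list can be written out as a minimal Python sketch. The field names below are hypothetical labels for the four criteria (they do not come from the CFI text), and the sketch only illustrates the OR/AND distinction, not how attestation is actually judged.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    # Hypothetical flags, one per attestation criterion quoted above.
    widespread_use: bool = False
    well_known_work: bool = False
    refereed_journal: bool = False
    three_independent_uses_over_a_year: bool = False

    def flags(self):
        return [
            self.widespread_use,
            self.well_known_work,
            self.refereed_journal,
            self.three_independent_uses_over_a_year,
        ]


def is_attested_or(e: Evidence) -> bool:
    """OR reading: meeting ANY one criterion is enough."""
    return any(e.flags())


def is_attested_and(e: Evidence) -> bool:
    """AND reading: ALL four criteria would be required."""
    return all(e.flags())


only_in_a_well_known_work = Evidence(well_known_work=True)
print(is_attested_or(only_in_a_well_known_work))   # True
print(is_attested_and(only_in_a_well_known_work))  # False
```

Under the OR reading a term used in a single well-known work passes; under the AND reading almost nothing would.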

They are joined by an "or". It's right there in the text. I agree that this is not sufficiently clear on a first reading; in fact just a little while ago I was modifying Wiktionary:Editable CFI in a way similar to your suggestion. Please feel free to modify that page further -- it is there to be edited. -- Visviva 14:34, 1 October 2009 (UTC)
Whilst I can see the little "or" buried in there, there are clearly those who cannot. And as to being bold ("it is there to be edited"), there is CLEARLY at the top of the page "This is a Wiktionary policy, guideline or common practices page. It should not be modified without a VOTE." I've been around long enough, and been battered enough, not to try messing with CFI without a VOTE.--Richardb 15:44, 1 October 2009 (UTC)
Wiktionary:Editable CFI is a separate page. Conrad.Irwin 17:29, 1 October 2009 (UTC)
Yeh ? I was still beaten up for editing that page too!--Richardb 07:39, 30 November 2009 (UTC)

Blunder in CFI General Rule needs to be corrected.

(Copy of entry at Wiktionary_talk:Criteria_for_inclusion#Blunder_needs_to_be_corrected_in_CFI_definition)

Someone, at some time, has made a blunder, that has apparently been subsequently accepted by a vote.

Under ==General rule== we find the line-

A term should be included if it's likely that someone would run across it and want to know what it means. This in turn 
leads to the somewhat more formal guideline of including a term if it is attested and idiomatic.

I hate to point out the absurdity, but, if obeyed, this would mean we would have ONLY idioms in Wiktionary !

I propose that the General Rule should be changed to:-

A word should be included if it meets any of the following criteria
*Clearly in widespread use, 
*Used in a well-known work, 
*Appears in a refereed academic journal, or 
*Used in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year. 
(See below under Attestation for clarification of these criteria)

A term other than a single word needs to meet the above criteria, and additionally be idiomatic. (See below for Criteria for Idiomaticity)

This change would also remove the disparity between the very loose, almost colloquial general rule (if it's likely that someone would run across it and want to know what it means) and the more formal attestation requirements.

Again, it needs to be changed, but I personally can't make the effort to mount a vote and a campaign. Anyone want to take it on ?--Richardb 15:02, 1 October 2009 (UTC)

When dealing with many types of languages, I take idiomatic to mean, "not a sum of parts" where parts can be whitespace-delimited words or just morphemes/letters. I think most single, whitespace-delimited words can be included with such an "open" interpretation (though maybe I'm alone on this view:). It's a bit fuzzier with highly agglutinative languages, and other issues noted at WT:IDIOM. --Bequw¢τ 18:12, 2 October 2009 (UTC)
This is not a blunder; the wording is correct. You have misinterpreted the logical term if as only if. They are not the same. The term if implies action only in cases where the initial condition is met. When the initial condition is not met, it makes no statement either way. --EncycloPetey 00:18, 3 October 2009 (UTC)
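
The difference between "if" and "only if" described above can be checked mechanically. This is a small Python sketch of the two readings; the function and variable names are made up for illustration.

```python
def include_if(attested_and_idiomatic: bool, included: bool) -> bool:
    """'Include a term IF it is attested and idiomatic' (condition implies inclusion)."""
    return (not attested_and_idiomatic) or included


def include_only_if(attested_and_idiomatic: bool, included: bool) -> bool:
    """'Include a term ONLY IF it is attested and idiomatic' (inclusion implies condition)."""
    return (not included) or attested_and_idiomatic


# A single word that is included although it is not idiomatic:
single_word = dict(attested_and_idiomatic=False, included=True)
print(include_if(**single_word))       # True: the "if" rule is not violated
print(include_only_if(**single_word))  # False: an "only if" rule would be violated
```

So the "if" wording does not by itself exclude non-idiomatic terms; it simply says nothing about them.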
And there's me, 30 odd years in computing, engineering, policy writing, and I thought I knew and/OR/IF logic fairly well. But no way do I understand that completely twisted logic. Does anyone else ?--Richardb 07:52, 30 November 2009 (UTC)
But I think Richardb's follow up question would then be (if I may put words into his mouth), "if single-word terms aren't admitted under the 'General rule', under what other criteria in the CFI are they admitted?" This I had a hard time finding.
Essentially this is a problem: when I delete some nonsense, every now and again someone replies on my talk page citing the sentence above. I mean, why aren't Tiger Woods and Davis Love III terms that someone might run into and want to know the meaning of? Basically, what follows in WT:CFI contradicts this anyway, so I think it should be changed as a cleanup issue, not a 'policy change'. Mglovesfun (talk)
Can Davis Love III be called a term? I don't think so. But this sentence in the CFI is much too restrictive for another reason: users may consult a dictionary for a definition, but also for an etymology, a pronunciation, anagrams, etc. They may also look for a term (a phrase) they do not know or cannot remember, but with a precise meaning (with the search tool, they could find it, but only if it's present...) Lmaltier 13:23, 3 October 2009 (UTC)

Including unicode encoding information

Should unusual characters (i.e. ) have unicode encoding information systematically included in some fashion? If so, what is the best way to do it?

For what it's worth, I'm inching toward yes at least on encoding, and also including the actual Unicode character name somewhere. Circeus 17:12, 1 October 2009 (UTC)

Could you treat the name as a synonym and include them (name and encoding) in a template that provided useful links for explaining what they are and how used? I often do that with abbreviations of headwords. Many contributors seem to believe that abbreviations deserve to be on the inflection line. Presumably they would like these there as well. DCDuring TALK 17:27, 1 October 2009 (UTC)
I was leaning towards usage notes for this, the last time it came around: e.g. . But a nice little floating infobox might do the job better. I do think we should include all of this information (for all characters defined in Unicode). However, IMX efforts to do so tend to arouse a tedious amount of carping. -- Visviva 18:38, 1 October 2009 (UTC)
I've come to the conclusion that, for completely non-linguistic symbols, the best treatment is often to start with a typographical definition that explicitly mentions Unicode, which is immediately followed by the encoding. See the aforementioned , and . A good solution for actual letters (say Â) still needs to be found, though: I'm struggling just to get a good way to deal with Δ. Letters in general are a downright mess, and I believe good stylistic guidelines for defining symbols need to be found: "delta" under an English section for the aforementioned Greek letter is neither an accurate nor a good definition for several reasons I won't get into here. Circeus 18:43, 1 October 2009 (UTC)
These approaches are great for the applicable symbols. May they be applied consistently and widely and be a model for whatever classes they do not themselves cover. DCDuring TALK 19:54, 1 October 2009 (UTC)
Encoding details are different than definitions, so including them as such is confusing. Having that information included as Usage Notes or on the inflection line (e.g. ) are both fine. A separate infobox also works. --Bequw¢τ 03:03, 2 October 2009 (UTC)
Actually the more I think about it, the more I think the encoding info should not just be separated from definition lines, but from language sections altogether. If there was some clean way to include it above the language sections, like {{see}}, I'd be in favor of that. --Bequw¢τ 21:40, 5 October 2009 (UTC)
Thanks to Visviva we now have the general {{character info}} and several script-specific templates (such as {{Vai character info}}). --Bequw¢τ 15:24, 15 October 2009 (UTC)

Wiktionary lookup tool

One of the side projects I've been pushing is increasing traffic/visibility of the Wiktionary project in general. A tool was requested for fr.wn, to allow readers to quickly look up terms without leaving the article they're browsing. Cirwin pointed out Bequw's quickLookup.js, and modified it for the proof-of-concept, as well as continuing to advise. n:User:Bawolff has continued to develop the idea, and has a working model which uses MediaWiki:extractFirst.xsl to extract the first definition.

My goal for this tool is to increase the reach of Wiktionary, and hopefully bring in new contributors. Of course, that's not the goal for others working on the tool - they just want a quick way to look up terms.

This tool is currently installed as a gadget on en.wn and en.ws, and probably on other wikisource languages. It is also going to be packaged for inclusion in blogs and other popular website scripts. The tool has been shown to several large Mediawiki sites outside the WMF, and may be tested on some of them soon.

- Amgine/talk 18:31, 1 October 2009 (UTC)

IPA - standard tag etc...

I'm working on a TTS processor / speech processor /text reader at the moment (currently targeted at Gutenberg for a friend of mine with failing eyesight but this could easily be refactored to work with wiki pages or pretty much anything else), and one of the ideas which occurred to me was that it should be possible to build a local implementation of Wiktionary words against their phonetic equivalents expressed as IPA. A few of my public utterances on my thinking on the subject may be found here [1].

Now at the moment I have a little Java bot which merrily reads a word document from Wiktionary and looks in there for a regex string which looks like \<span class=\"IPA\"\>.*\<\/span\> - this may need to be extended in the case of Wiktionary edition languages with extended character sets etc - and parses out the .* IPA rendition of the word. Unfortunately this is not complete, my quick, dirty and probably unrepresentative sample indicates that there are about 20 - 30% of words without IPA renditions, the real number is likely to be much higher. The generator for this tag is the IPA template e.g. for the IPA of help it looks like: {{IPA|/hɛlp/}}. Moreover there can be >1 implementations of this tag e.g. [2]. This is not an insuperable problem since I have a nuance indicator which knows what the preferred pronunciation should be by recourse to the tag \<span class=\"qualifier-content\"\>.*\<\/span\> where the .* indicates the preferred implementation. One fine day I will get my bot to run over a few Gutenberg volumes and build a database of IPA renditions from Wiktionary, and I will try and find an unobtrusive time/date to do this. My real questions initially are these: is this tag class likely to change at any point? And is there currently a mechanism for indicating entries which do not have an IPA representation with a view to getting them fixed? Sjc 12:27, 2 October 2009 (UTC)
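
As an illustration of the scraping approach described above, here is a minimal Python sketch (the original bot is in Java; Python is used here only for brevity). It assumes the rendered page wraps each transcription in a plain `<span class="IPA">` element with no nested spans. The function name and sample HTML are made up. Note that a greedy `.*`, as in the pattern quoted above, would run from the first opening tag to the last closing tag when a line carries more than one transcription, so the sketch uses a non-greedy match.

```python
import re
from html import unescape
from typing import List

# Non-greedy (.*?) so that each span is matched separately.
IPA_SPAN = re.compile(r'<span class="IPA"[^>]*>(.*?)</span>', re.DOTALL)


def extract_ipa(html: str) -> List[str]:
    """Return every IPA transcription found in a rendered page fragment."""
    return [unescape(match).strip() for match in IPA_SPAN.findall(html)]


sample = '<span class="IPA">/hɛlp/</span>, <span class="IPA">/hɛp/</span>'
print(extract_ipa(sample))  # ['/hɛlp/', '/hɛp/']
```

A regular expression is fragile against markup changes; an HTML parser would be the sturdier choice if the class name or nesting ever changes.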

You might find it easier to work with the XML dump (http://download.wikimedia.org/enwiktionary/): that way you can find the "{{IPA}}" directly and don't have to spider the whole site. The chances are that we won't change the classname, but it's entirely possible someone will forget. With the XML dump, generating a list of missing IPA is trivial. Conrad.Irwin 14:17, 2 October 2009 (UTC)
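
A rough Python sketch of the wikitext route follows. It assumes the page text has already been pulled out of the dump into a dictionary mapping titles to wikitext (reading the XML itself is left out), and that transcriptions are the unnamed arguments of `{{IPA|...}}`. The function names and sample pages are hypothetical.

```python
import re
from typing import Dict, List

# Matches {{IPA|...}} calls that contain no nested templates.
IPA_TEMPLATE = re.compile(r'\{\{IPA\|([^{}]*)\}\}')


def ipa_from_wikitext(wikitext: str) -> List[str]:
    """Collect the unnamed arguments of every {{IPA|...}} call."""
    found = []
    for body in IPA_TEMPLATE.findall(wikitext):
        for arg in body.split('|'):
            arg = arg.strip()
            # Assumption: arguments containing '=' are named parameters, not IPA.
            if arg and '=' not in arg:
                found.append(arg)
    return found


def pages_missing_ipa(pages: Dict[str, str]) -> List[str]:
    """Titles whose wikitext contains no IPA transcription at all."""
    return sorted(title for title, text in pages.items()
                  if not ipa_from_wikitext(text))


pages = {
    "help": "===Pronunciation===\n* {{IPA|/hɛlp/}}\n",
    "foo": "==English==\n===Noun===\n# A placeholder.\n",
}
print(ipa_from_wikitext(pages["help"]))  # ['/hɛlp/']
print(pages_missing_ipa(pages))          # ['foo']
```
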
Thanks Conrad, I'll have a play with the download, and obviously if I don't have to spider it would make a lot of sense not to. Sjc 08:37, 3 October 2009 (UTC)
Depending on your situation, it might also make sense to use the API. You can feed it a list of titles separated by pipes to get the wikitext for multiple pages at once (e.g. all words in a sentence). But the database dumps are the optimal solution. -- Visviva 06:29, 4 October 2009 (UTC)
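
For the API route, a request for several pages at once can be assembled as in the Python sketch below. The endpoint URL and parameter set are assumptions based on the standard MediaWiki `action=query` interface and are not taken from this discussion; the sketch only builds the URL and makes no network request.

```python
from typing import List
from urllib.parse import urlencode

# Assumed endpoint of the English Wiktionary's MediaWiki API.
API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def build_wikitext_query(titles: List[str]) -> str:
    """Build one query URL asking for the wikitext of several pages."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": "|".join(titles),  # titles are separated by pipes
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(build_wikitext_query(["the", "quick", "brown", "fox"]))
```

The API limits how many titles one request may carry, so a long text would need to be split into batches.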

Wiktionary:About English

It seems odd to me that there's no Wiktionary:About English (WT:AEN) like there are for other languages. I think it would be very useful, both in organizing our existing English-specific resources and as a place to note consensus that has been found on a variety of issues. Some issues that could be dealt with on the page are:

Thoughts? --Bequw¢τ 17:43, 2 October 2009 (UTC)

I had just been thinking the same thing in regards to the little flap we just had over whether to use ==Postposition== as a POS header for English. That is a classic "language considerations" issue -- sorting out what makes sense for a proper treatment of a specific language (nobody would question that English has postpositions, or that these are an important grammatical category in some languages). I definitely think having this page would be a good idea. However, I also would prefer that it be aggressively limited to language considerations, lest it start to blur into areas that should really be addressed through broader policy. So I would be of mixed feelings about including translation formatting there, even though it is an issue that affects English entries only. But definitely create the page if you have a notion; we can sort out the details as we go along. -- Visviva 19:02, 2 October 2009 (UTC)
Seems worth a try. DCDuring TALK 19:25, 2 October 2009 (UTC)
Just had time enough to put up some starter links. We'll see where this goes. --Bequw¢τ 20:14, 2 October 2009 (UTC)
About translations: in most cases, translations are limited to English entries, but they might be included in other entries too, as English terms don't exist for all possible meanings, in their most precise senses. As an example, fièvre de cheval could be given as a translation in febbre da cavallo, and conversely, rather than adding the translation in the etymology section (which may be appropriate in some cases, but not always). Lmaltier 13:30, 3 October 2009 (UTC)
This would require a vote to change WT:ELE#Translations. While it has been mentioned before, I doubt this will happen in the foreseeable future. --Bequw¢τ 19:55, 16 November 2009 (UTC)

Some macrons in English

I need to focus your attention on the fact that we should now add macrons in the English alphabet (ā in a#Derived_terms...). Actually, w:en:English_words_with_diacritics talks about some importations of exotic characters into the English language, and I'd like to thank Hippietrail for currently demonstrating their importance in Category:English_words_spelled_with_macrons. Apart from that, we're making the same observation for French on fr.wikt in parallel. JackPotte 15:21, 3 October 2009 (UTC)

Note on some unusable sources.

There are a few publishing houses out there, notably Alphascript Publishing and Global Vision Publishing House (that I've seen, at least) which are putting out "books" which are nothing more than collections of Wikipedia articles plucked and thrown together. Obviously, such works should not be counted towards the CFI requirement for demonstrating usage of words within them any more than the copied Wikipedia articles themselves. Cheers! bd2412 T 17:12, 3 October 2009 (UTC)

Well, obviously they would only count once ... but if they are in fact printed paper books, I'm not sure why we wouldn't accept them as citations. We would accept a printed quote from a Wikipedia article, wouldn't we? -- Visviva 04:14, 4 October 2009 (UTC)
A printed quote in a peer-reviewed source, I'm sure we could use. However, if someone is just grabbing a handful of Wikipedia titles and calling the collection a book (for $60-70, no less) they are no more reliable for our purposes than an openly editable Wikipedia article. From what I'm told, I should add, these printed volumes come complete with whatever spelling/syntax errors are to be found in the articles grabbed, and misprints of certain characters. See User:PrimeHunter/Alphascript Publishing sells free articles as expensive books for details, but my particular concern is that these books do come up in word searches on Google Books. Cheers! bd2412 T 23:04, 4 October 2009 (UTC)
We are only looking for durably archived examples of usage. WP can't rely on a printed copy of itself. We could rely on a printed copy of articles from WP because it overcomes the durable archiving issue. I wonder under what circumstances we could rely on printed copies of our own usage examples for attestation! Both sources run the risk that a savvy person could game attestation, but we already face that problem with Usenet. A savvy PR person could plant terms in the press and in testimony to game the system as well. DCDuring TALK 01:16, 5 October 2009 (UTC)
Our own usage examples are mentions, not uses. But our discussions (as here in the BP) are uses.—msh210 17:24, 7 October 2009 (UTC)
With the advent of self-publishing mechanisms like Lulu, the divide between "books as properly reviewed and edited texts" and "books as anything anyone can write" (à la blog) is becoming more and more blurry. I foresee this becoming a problem in the near future. Equinox 21:45, 5 October 2009 (UTC)
Not sure why somebody reverted that edit. To clarify: I don't mean that the man on the street should never, under any circumstances, be allowed to write a book, but that books without skilled editing are liable to be a mess, and that our WT:CFI currently relies on a tacit assumption that books don't have the failings of blogs and Web pages. Equinox 22:27, 5 October 2009 (UTC)

Appendix:Unicode/Egyptian Hieroglyphs

Most of these map neatly to WikiHiero syntax, AFAICT, but there are some that don't. In particular, there are two sets, "NL" and "NU", that don't seem to be supported by WikiHiero at all. Does anybody have a notion of what these are, or of where more information on them can be found? Are they in the Gardiner list, or do they come from another source? -- Visviva 06:36, 4 October 2009 (UTC)

  • Now solved, these are symbols for the nomes of upper and lower Egypt respectively. See e.g. [3] Glyphs for all can of course be viewed in the block description, as well as in the Aegyptus font that uses the Supplementary Private Use Area. -- Visviva 07:01, 9 October 2009 (UTC)

Style guide for prefixes and suffixes

Do we have a style guide for prefixes and suffixes? If we don't, should we have? What would be the recommended definition line, for example:

  • Creates nouns from verbs.
  • Forms nouns from verbs.
  • Used to form nouns from verbs.

--Panda10 13:41, 4 October 2009 (UTC)

Not as far as I know, but we should. I favor approach 3, preferably wrapped in {{non-gloss definition}}. Where a gloss is possible, this can go at the front of the definition, like so: "Become. Used to form nouns from verbs." On the other hand, 1 & 2 are shorter and -- because of the polysemy of used to -- clearer. So I dunno, actually. If a consensus exists, it could perhaps be documented at Wiktionary:Style guide, which is still a bit of a mess at present. -- Visviva 15:00, 4 October 2009 (UTC)
I'm not using any of those approaches in Latin, because it would be repetitive. See -icus, where there are three meanings, but all are used for the same pattern of word formation. I prefer the definition line to be a definition line, and to have the POS constructive information in a Usage notes section. --EncycloPetey 15:43, 4 October 2009 (UTC)
That is definitely the best approach I've seen. There may be some cases that call for a different approach, but overall that seems like an excellent template. -- Visviva 16:27, 4 October 2009 (UTC)
I like it too, especially this:
  • Instead of saying "Creates nouns from verbs", it probably sounds clearer to say "Added to a verb to form a noun"
  • Wikilinking verb and noun will highlight the parts of speech.
Other thoughts:
  • I still prefer providing examples just below the definition. This is what we do for other entries.
  • In the examples, it feels more natural to me to provide the base word first, then the derived word:
turista (tourist) → turistaként (as a tourist)
  • I can't always translate the suffix with a single English word and I have to describe it with a sentence.
Maybe we could provide several recommended models in the style guide. Thanks. --Panda10 16:45, 4 October 2009 (UTC)
The examples on -icus were chosen to show formation from three different parts of speech, rather than to illustrate the meanings. I agree that having examples with the definitions could be good, but that's where we normally put quotations or example sentences showing the term itself in action. So, I'm not overly fond of the idea of putting something else in that location. --EncycloPetey 19:59, 4 October 2009 (UTC)

Extended Wiktionary queries now available

I've been working on a tool which makes it possible to make many new kinds of queries on the English Wiktionary. It doesn't yet have a public front-end but for now you can submit queries to me and I will try to fulfil them.

For an example of what is possible see User talk:Vahagn Petrosyan/Armenian nouns lacking declension sections.

Basically anything involving page titles, languages, section headings, and section heading levels. It's also possible to compare and count.

I would like to add categories and information other than section headings too and if anybody would like to help me improve the tool that would also be great. — hippietrail 08:03, 6 October 2009 (UTC)

i'dlike 2find it[+ipa pl--史凡>voice-MSN/skypeme!RSI>typin=hard! 10:17, 6 October 2009 (UTC)
I'd like to find italian entries containing IPA please
L☺g☺maniac chat? 13:30, 6 October 2009 (UTC) +--史凡>voice-MSN/skypeme!RSI>typin=hard! 14:40, 6 October 2009 (UTC)
Sorry but IPA doesn't involve a section heading. I could find all Italian entries with a Pronunciation section if you like. — hippietrail 17:01, 6 October 2009 (UTC)

k+same4cantones pl--史凡>voice-MSN/skypeme!RSI>typin=hard! 01:32, 7 October 2009 (UTC)

I like it, this will make many cleanup lists easier to devise and keep updated. Can't wait for a front-end. --Bequw¢τ 16:38, 7 October 2009 (UTC)
A front-end is difficult because SQL is difficult and allowing users to enter arbitrary SQL would be dangerous. That means I have to wrap it in some generalized code. But to do that I need to know what kinds of queries people will want. So please ask for some and the front-end will get closer.
I have categories working now too so for another example I was able to find all the differences between pages with Armenian Noun headings and Armenian noun categories. — hippietrail 00:51, 8 October 2009 (UTC)
Using just headword, section headers and categories: Sample queries: English entries with a "homophones" header or in homophones category with no Pronunciation section; English PoS headers not in corresponding English PoS category; English entries in category English Prepositional phrases with only Adverb headers in English section, with Phrase header, etc. English phrase headers not in any of a set of categories. Single-word English entries with Proverb, Phrase, or Idiom headers.
Many of these would need a "count" run, followed by a "sample" run of 10 or 20 (not necessarily a random sample, but not always beginning with "a"). HTH. DCDuring TALK 01:47, 8 October 2009 (UTC)
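
To show what one of the sample queries above might look like, here is a sketch in Python with an in-memory SQLite database. The schema (a `headings` table and a `categories` table) and the sample rows are invented for illustration. The actual tables behind the tool are not described in this discussion. Wrapping a fixed query in a function with bound parameters is one way to offer queries without accepting arbitrary SQL.

```python
import sqlite3

# Hypothetical metadata tables, filled with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE headings (title TEXT, language TEXT, heading TEXT, level INTEGER);
CREATE TABLE categories (title TEXT, category TEXT);
""")
conn.executemany("INSERT INTO headings VALUES (?, ?, ?, ?)", [
    ("dog", "English", "Noun", 3),
    ("cat", "English", "Noun", 3),
    ("run", "English", "Verb", 3),
])
conn.executemany("INSERT INTO categories VALUES (?, ?)", [
    ("dog", "English nouns"),
    ("run", "English verbs"),
])

# Entries with a given PoS header that are not in the matching category.
QUERY = """
SELECT DISTINCT h.title
FROM headings AS h
WHERE h.language = ? AND h.heading = ?
  AND NOT EXISTS (
      SELECT 1 FROM categories AS c
      WHERE c.title = h.title AND c.category = ?
  )
ORDER BY h.title
"""


def entries_missing_category(language, heading, category):
    """Titles with the given header but outside the given category."""
    rows = conn.execute(QUERY, (language, heading, category)).fetchall()
    return [row[0] for row in rows]


print(entries_missing_category("English", "Noun", "English nouns"))  # ['cat']
```

A "count" run would swap the `SELECT DISTINCT h.title` for `SELECT COUNT(DISTINCT h.title)`, and a "sample" run would add a `LIMIT` clause.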
What an interesting effort. Can I have a list of words in Category:1000 English basic words that lack the etymology header? If yes, you can post it to my user space or anywhere else fit.
Does the tool have a home page here in wiki? --Dan Polansky 07:44, 8 October 2009 (UTC)
  • User:HippieBot/English basic words without etymologies
    No home page yet. It is not really a single tool: I have several tools which create metadata database tables from a dump file, and for now I'm figuring out SQL queries for those tables. When I get the hang of it I will attempt to craft it into a tool with a web front end. — hippietrail 14:08, 8 October 2009 (UTC)
    Thanks! Nice that you have also posted the SQL statement to the result page. --Dan Polansky 14:49, 8 October 2009 (UTC)
    Thanks very much for generating these extremely useful cleanup lists. One good use would be to make our PoS categories match our PoS headers so the categories could be relied upon as being nearly complete matches with the headers.
Being able to match templates and context tags with headers would be nice too, when as and if you get the chance. In the meantime, the lists that you can now generate make it easier to systematically do lots of cleanup. DCDuring TALK 15:45, 8 October 2009 (UTC)
A great mega-family of cleanup lists would be, for each language, entries with headers of a given PoS, but not in the corresponding categories. This is close to what an Ullman bot generates I think, but only in pursuit of changed items. DCDuring TALK 20:56, 8 October 2009 (UTC)
I have completed cleanup on the two lists you provided. I would like to work through some of the smaller English PoS categories: prepositions, determiners, conjunctions, pronouns, interjections, idioms, phrases, proverbs. The cleanup would be those with the headers indicated, but not in the categories indicated. DCDuring TALK 17:23, 10 October 2009 (UTC)

Here's another one I thought of: User:HippieBot/English terms in unknown etymology category but without etymology section

template:mammal

What’s happened to the mammal template? It hasn’t been working lately. I’ve tried to use it in кабарга, but it has vanished. —Stephen 02:14, 7 October 2009 (UTC)

[4] -- Prince Kassad 12:58, 7 October 2009 (UTC)
Oh, now I recall someone removing all the mammal categories from the Taos language. So we just don’t have mammals anymore. What’s the difference between mammals and politics or religion? What categories are going to be kept, and how does one determine them, other than trial and error to see what connects to a working template? —Stephen 14:29, 7 October 2009 (UTC)
Category:Mammals hasn't gone anywhere, just the context tag. The argument, I believe, is that "mammal" is not a context, which is true enough but IMO just shows that we need an additional kind of label. -- Visviva 06:54, 9 October 2009 (UTC)
I don’t understand what you mean, but if we are keeping the category, then it would seem that the gripe was with the way the template displayed. The proper thing to have done was to change the template output to whatever "not a context" means, while retaining the category. It was a huge loss when the mammal category was completely stripped from the Taos lexicon. There is no one here who is working on Taos anymore, so this is permanent damage. Whoever did it should go back and insert the category in all of those words from which he removed it.
Ideally, we should restore the template and change the display to please whomever it was that took offence. That would make adding these tags so much easier. The way it is now, we need to have a dedicated context expert who understands what this is about to add all the contexts for all the languages where called for, and to add categories when contexts are not wanted (that sounds like gibberish to me). —Stephen 23:57, 9 October 2009 (UTC)
I wholly agree that before a template for a pseudo-context label is removed, all its entries should be placed into the corresponding category using [[Category:...]]; the categorizing effect of a deleted pseudo-context template should not be lost.
Better yet, a pseudo-context template such as {{fish}} can be kept; the end of distinguishing topical context templates from topical categorizing templates can be achieved by removing the display from the latter, so that the latter do not show, say, "(fish)". It is nice to see in the wiki code to which of the senses the given topic category applies. So for instance, for the not-yet-deleted template {{fish}}, the template {{context}} in its guts could be passed a new parameter "pseudo-context=1"; the {{context}} would show no label when pseudo-context is 1. Or {{fish}} could have {{pseudo-context}} in its guts to make it more explicit that it is a mere pseudo-context template. Or whatever else we see fit. In the end, whether a categorizing pseudo-context template should show a label or not can be made user-customizable using CSS classes. I for one would be happy to see "(fish)" on the definition line of salmon: "(fish) One of several species of fish of the Salmonidae family". --Dan Polansky 07:20, 10 October 2009 (UTC)
Well, I reckon there are two or three editors who can edit {{context}} without breaking anything; if you can get one of them on the case, I personally would have no objections. But the fact remains that context and meaning are quite different things, and it does no service to our project or our users to lump them together. Consider: if these continue to display, then when we do have a fish name that is used only in technical contexts, we would then have the odd lead-in "(ichthyology, fish)". Or worse, "(fishing, fish)".  :-)
In view of this, I would rather that we used a separate template entirely, maybe placed at the end of the sense line. This could be much simpler than {{context}}, because it wouldn't need to display in any fancy ways (or at all). It could perhaps generate an HTML anchor that would allow direct linking to the sense. It could take a string of identifiers, maybe combining into a short gloss: {{label|fish|of the family |Fishofishea}} -- thus categorizing into Category:Fish and Category:Fishofishea, if present. Or something like that. -- Visviva 02:49, 11 October 2009 (UTC)
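One possible reading of the {{label|fish|of the family |Fishofishea}} suggestion, sketched in Python. The function name, the capitalization rule and the set of existing categories are all assumptions for illustration; the thread only says that the identifiers combine into a short gloss and categorize "if present".

```python
def label(parts, existing_categories):
    """Combine identifiers into a short gloss and collect categories.

    A part is turned into a category only if a category of that
    (capitalized) name already exists; other parts are gloss text only.
    """
    cleaned = [p.strip() for p in parts if p.strip()]
    gloss = " ".join(cleaned)
    categories = []
    for part in cleaned:
        name = part[:1].upper() + part[1:]
        if name in existing_categories:
            categories.append(f"Category:{name}")
    return gloss, categories


gloss, cats = label(["fish", "of the family ", "Fishofishea"],
                    {"Fish", "Fishofishea"})
print(gloss)
print(cats)
```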
(Unindent, re Visviva) Sounds good to me; I admit that leaving {{context}} uninvolved and going with {{label}} or another separate template seems better.
I'd prefer to have the label at the beginning of the sense line, though. Again, whether the label should be displayed can be made customizable. Also, the categorizing labels can have a slightly different formatting, and there can be a tooltip indicating that it is a categorization label rather than a restricted context label. In any case, I think the pseudo-context templates such as {{fish}} should better be left undeleted until this is resolved. --Dan Polansky 08:24, 11 October 2009 (UTC)
I think {{mammal}} should show (zoology), but categorize into Category:Mammals, the way {{rivers}} shows (geography), and categorizes into Category:Rivers. --Vahagn Petrosyan 12:40, 10 October 2009 (UTC)
But this was the problem in the first place; most mammal names are not technical zoological terms. "Geography" is at least ambiguous; nobody is quite sure whether it refers to the everyday field of knowledge, or to the specialized work of geographers. "Zoology" (and likewise "ornithology" etc.) has no such ambiguity; it refers to a specific technical field, so putting such a label on dog or elephant (or their language-X equivalents) is actively misleading. -- Visviva 02:49, 11 October 2009 (UTC)

Translingualness of letters

I thought we'd be able to come up with some guidelines for which letters are Translingual. Looking at Appendix:Alphabets, here is the current state of affairs (there's obviously some scripts missing here but these are the main ones):

  • Translingual: Latin (mostly), Cyrillic (mostly), Braille, Greek, Armenian, Gothic
  • Individual language: Arabic, Carian, Georgian, Hebrew (except א), Lycian, Lydian, Phoenician (except 𐤔), Thai

I propose that letters be considered Translingual if they are used in multiple languages but excluding direct ancestors. I'm not sure if that wording is clear, but I mean to exclude cases like Greek where mostly the same alphabet was used for the temporal divisions of Greek (divided on en.wikt as Modern Greek, Ancient Greek, and Mycenaean Greek). Usage, as opposed to a mention, of a letter would be interpreted as usage in constructing native language words. An English textbook describing the pronunciation of would not count as usage in English.

What do people think of this criterion, is it too restrictive or not restrictive enough? I think it meshes well with the decision to have Translingual entries for Han characters. If implemented it would cause Greek, Armenian, and Gothic letter entries to be converted to individual language entries. There may still be some Translingual entries (possibly for Translingual-senses of Greek letters) but the simple letter sense would be language specific.

The Unicode encoding details of characters is information that I think is separate from language concerns (such as definitions, inflections, or usage). I think we should be able to come up with a format to display this information without resorting to Translingual entries as we have in some cases. Possibly a template like {{see}} that sits above L2 headers.--Bequw¢τ 16:33, 7 October 2009 (UTC)

I like the criterion, but I would prefer to reverse it: a character/symbol/letter is Translingual unless it is used only in one language, or only in different historical phases of one language. Otherwise I don't know what we would do with things like 𐇐. Also if we are non-Translingual by default, it will be difficult to find a rationale for the translinguality of some symbols that are simply non-linguistic (e.g. arcane mathematical symbols), but that may only happen to be attested in English writings. -- Visviva 05:32, 8 October 2009 (UTC)
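The two formulations of the criterion can be compared in a small Python sketch. The ancestor map below is illustrative only, built from the Greek example in the proposal, and the function names are hypothetical. As far as this sketch models them, the two versions differ only when no language of use is known (for example, non-linguistic symbols).

```python
# Hypothetical map from a language to its direct ancestor.
PARENT = {
    "Greek": "Ancient Greek",
    "Ancient Greek": "Mycenaean Greek",
}


def lineage_root(language, parent):
    """Follow direct-ancestor links up to the oldest known stage."""
    seen = set()
    while language in parent and language not in seen:
        seen.add(language)
        language = parent[language]
    return language


def is_translingual(languages, parent, default_translingual=True):
    """Translingual if the letter is used in more than one lineage.

    default_translingual=True models the reversed wording (Translingual
    unless used in a single language or its historical phases);
    False models the original wording for the no-known-language case.
    """
    roots = {lineage_root(lang, parent) for lang in languages}
    if not roots:
        return default_translingual
    return len(roots) > 1


print(is_translingual({"Greek", "Ancient Greek"}, PARENT))
print(is_translingual({"English", "French"}, PARENT))
print(is_translingual(set(), PARENT))
```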
I have been fiddling with something at Template:character info. It could go above the first L2 (though it would still float to the right of the first section). It's designed to be customized for specific cases (e.g. {{Vai character info}}). I'm thinking that the specific data we might want to include would vary from one code block to another. For all characters we would probably want to provide
a) codepoint linking to further technical information, either on Unicode.org (which is problematically information-poor) or FileFormat.Info (which is problematic, since it's just some guy with a website AFAICT). I assume we don't want a Wiktionary entry to include things like how to represent the character as an HTML entity or in UTF-8 et al.; but we do want to show the reader where to find this information.
b) a link to a root node for the set. In most/all cases this would be a subpage of Appendix:Unicode. This can provide general information about fonts, etc., and a link to the official description of the block on Unicode.org.
c) links to the previous and next characters in the block. Clickiness is always good. Individual language sections might also have their own navbars, if the traditional order varies from the Unicode order.
d) links to other characters with which this character might be confused (this can simply continue to use {{see}}, I think)
e) a canonical image of the character, if available
For some characters we would want to provide additional information, such as presence/designation in other standards, decomposition, keyboard entry, compatibility forms, etc. For very special cases like the Egyptian hieroglyphs, it may be necessary to break this information out for more extensive treatment a la the Han characters. But in most cases I think an infobox would be sufficient. -- Visviva 05:32, 8 October 2009 (UTC)
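A rough Python sketch of the data such an infobox might gather for one character. The function name and dictionary keys are hypothetical and do not describe the actual parameters of Template:character info. The block bounds are passed in by the caller, and the previous/next entries are simply left out at the edges of the block.

```python
import unicodedata


def character_info(ch, block_start, block_end):
    """Collect basic infobox fields for a single character.

    block_start and block_end are the first and last codepoints
    of the character's Unicode block (inclusive).
    """
    cp = ord(ch)
    info = {
        "codepoint": f"U+{cp:04X}",
        # Empty string if this Python build has no name for the character.
        "name": unicodedata.name(ch, ""),
    }
    if cp > block_start:
        info["previous"] = chr(cp - 1)
    if cp < block_end:
        info["next"] = chr(cp + 1)
    return info


# First character of the Phaistos Disc block (U+101D0 to U+101FF).
info = character_info("\U000101D0", 0x101D0, 0x101FF)
print(info["codepoint"], sorted(info))
```

Images, confusable characters and links to external codepoint references would need data that is not available from the standard library, so they are left out here.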
I like it. We might want to make the "Next"/"Previous" links optional. In some areas of Unicode there's no real ordering and so this info isn't really useful there. In other places, the page might be very full, and a Translingual section might elaborate on where the entry fits in some order (like an alphabet) already. In this case we might reduce duplication, and save on RHS space. Looks nice though. It might be hard, but very useful, finding images for many of the symbols. --Bequw¢τ 02:16, 9 October 2009 (UTC)
I have reformatted it a bit, so the next/previous links will appear only if supplied. I was thinking of them mostly as a browsing feature (so that someone could flip through all the characters in a block, if they were so inclined). Please feel free to make any other modifications that seem appropriate. As with anything I create, I live in the hope that someday, someone who understands layout and design will fix everything up. :-)
Commons only has a smattering of character images at present (CJK excepted), but it looks to me like these fonts should all be Wikimedia-compatible. (And fantastically, that appears to include glyphs for most/all of the Egyptian Hieroglyphs block). Plus SIL has free-as-in-freedom fonts for Vai and some other scripts. So image creation should mostly be a mechanical (albeit time-consuming) task. -- Visviva 06:42, 9 October 2009 (UTC)
Two full sets now using {{character info}}: Vai (not illustrated, yet) and Phaistos Disc (illustrated). Feedback welcome, before I go crazy and start creating 50,000 of these. -- Visviva 02:53, 11 October 2009 (UTC)
It's important to enforce one of the script templates in the appendix. For example, symbols in Phaistos Disc show up as squares in Opera and IE, even though I have the Aegean font. {{Linb}} would work for Phaistos. --Vahagn Petrosyan 10:56, 11 October 2009 (UTC)
This should be sorted now, at least for these two. For the others, I've made some rough guesses as to which script template is appropriate for (most of) each Unicode block and created corresponding subtemplates of {{unicode scripts}}. However, some of the Appendix:Unicode pages won't be affected by this until their format is updated. -- Visviva 06:23, 13 October 2009 (UTC)

Categorization of Abbreviations, Acronyms and Initialism

Shouldn't the categorizing templates ({{abbreviation}}, {{acronym}}, {{initialism}}) assume "English" unless otherwise specified, just like the context templates do? There's a whole mess of entries in Category:Abbreviations, acronyms and initialisms that are English and therefore should be in Category:English abbreviations, acronyms and initialisms but are miscategorized because people don't pass in lang=English/en. --Bequw¢τ 19:13, 8 October 2009 (UTC)

Makes sense. L☺g☺maniac chat? 19:18, 8 October 2009 (UTC)
Yes. English gets to be the default, but has the responsibility for cleaning up. Shouldn't there be a pseudolanguage code for "language to be assigned"? Is there one already? DCDuring TALK 20:13, 8 October 2009 (UTC)
{{und}} Undetermined is the relevant ISO 639-3 code you are thinking of. — Carolina wren discussió 20:32, 8 October 2009 (UTC)
Thanks, CW. DCDuring TALK 20:42, 8 October 2009 (UTC)
There are 6270 or so in that category. All the ones beginning with "." (>200) are translingual. Many of the others are translingual. Perhaps Hippietrail should make a gift of a cleanup list linked to the About pages for each language of all entries with an abbreviation header in a language section and not an entry in the appropriate category. If some items now have "en" as a parameter, I would guess it would be more efficient if the master list of misclassified items remained where it is. DCDuring TALK 20:42, 8 October 2009 (UTC)
Implemented. Now time for cleanup. --Bequw¢τ 00:25, 9 October 2009 (UTC)
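The defaulting behaviour proposed at the top of this thread can be sketched in Python. This is not the template code that was actually implemented; the function name and the small code-to-name table are assumptions for illustration.

```python
# Hypothetical lookup from language codes to language names.
LANGUAGE_NAMES = {"en": "English", "fr": "French"}


def abbreviation_category(lang="en"):
    """Return the category for an abbreviation entry.

    English is assumed when no language is given; a full language
    name (e.g. "English") is accepted as well as a code (e.g. "en").
    """
    name = LANGUAGE_NAMES.get(lang, lang)
    return f"Category:{name} abbreviations, acronyms and initialisms"


print(abbreviation_category())
print(abbreviation_category("fr"))
```

With this default, translingual entries would still need an explicit language to avoid landing in the English category, which matches the cleanup concern raised above.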

Phaistos Disc symbols: Translingual or Undetermined?

Just wondering what the best language header would be for the characters in Appendix:Unicode/Phaistos Disc. Which do we prefer, 𐇐 ("Translingual") or 𐇑 ("Undetermined")? -- Visviva 08:02, 9 October 2009 (UTC)

FWIW, I'm leaning strongly towards Undetermined. If there are no objections, I will probably go ahead with that, and see how it works out. (The category name "Undetermined symbols" is a bit troubling, but these should probably have a more specific inflection template anyway.) -- Visviva 15:00, 9 October 2009 (UTC)

Seem to be Undetermined to me, any real use in another language would be a quote off the disc (though no doubt they'll become popular as nifty characters in their own right). Conrad.Irwin 00:03, 10 October 2009 (UTC)
OK, I have started creating these as ==Undetermined==. We'll see if anybody screams. AFAIK this is the first use of this L2 header on Wiktionary; the existing calls to {{und}} were all from etymologies. Nor can I think of any other cases offhand that would call for this -- I think Herodotus and Strabo have a few words from languages that don't map to any known language, but I'm not sure if (or how) we'd want to include those. -- Visviva 09:25, 10 October 2009 (UTC)

Reconstructed terms in attested languages?

How should we handle reconstructed terms in attested languages?

These arise frequently in detailed etymologies – for example, firkin comes from conjectured Middle Dutch *vierdekijn, diminutive of vierde (fourth).

These differ from reconstructed terms in reconstructed languages in that only the term is reconstructed – the language itself is attested; thus *vierdekijn is a conjectured unattested term in the language Middle Dutch, not in “Proto-Dutch” or “Proto-Middle Dutch”.

Specifically, how should we file and link to pages for such terms?

The existing policy page, Wiktionary:Reconstructed terms (WT:RT), largely deals with reconstructed languages, the category for reconstructions, Category:Reconstructions, contains only reconstructed languages, and the {{reconstructed}} template is used only for terms in reconstructed languages – not one single reconstructed term in an attested language is flagged or categorized as such, hence there is no existing practice on which to base policy.

Question 1 – where do they go?

Do reconstructed terms (in existing languages) go in the main namespace or in Appendix:? Reconstructed terms fail WT:CFI as they are not attested, but WT:RT: Entries for terms states:

A Latin reconstruction should be clearly marked as a reconstruction, but goes in the main namespace as any other Latin term would, following normal rules for inclusion.

This appears to be a mistake – following “normal rules for inclusion”, a reconstructed term does not meet CFI, because it is unattested.

We should either:

  • Amend WT:CFI to allow reconstructed terms in attested languages;
    Dubious – reconstructed terms are subject to revision as theories change: “attested” is part of CFI for good reason.
  • Clarify at WT:RT and WT:CFI that reconstructed terms do not go in the main namespace, whether the language is or is not reconstructed.

Based on existing practice, I’d assume that *vierdekijn would be filed in:

What should the parent category be called?

(There’d be a “nouns” in between, natch.)

Question 2 – how to link?

Existing policy at WT:RT: References from Etymologies only refers to reconstructed languages, and prescribes the use of {{proto}}.

How to link to *vierdekijn (and other reconstructed terms)?

  • Using {term} in some way – say, link to vierdekijn but display as *vierdekijn,
  • Using {proto} in some way – namely, (optionally) removing the required “Proto-”.
  • Using a new template, say {{conjectured}}, which functions like {term} but links to Appendix, optionally adds the *, and some wording like “conjectured”.

My thoughts:

  • Overloading {term} any further seems a mistake – it’s very basic.
  • Adding a noproto argument to {proto} would be the easiest answer, and fine if we don’t want to distinguish reconstructed terms from reconstructed languages.
  • A new {conjectured} template would allow finest control; also, one could use existing ISO codes for languages.

Simple summary of my conclusion:

  • reconstructed terms in Appendix:, even if language attested;
  • add noproto to {proto}, or make new template {conjectured}.

What do people think?

—Nils von Barth (nbarth) (talk) 21:09, 10 October 2009 (UTC)
BTW, previous discussions at:
—Nils von Barth (nbarth) (talk) 21:15, 10 October 2009 (UTC)
As I said on some of the abovelinked discussions, I see no difference between protolanguages, and what others call "reconstructed terms in existing languages". Proto-languages are collections of proto-terms, and these are the etymons that are hypothesized to have existed on the basis of comparative evidence, in order to yield existing, actually attested forms, and *vierdekijn is exactly such "proto-term". The certainty by which *vierdekijn is reconstructed is no less than some of the reconstructions in ancestral languages that are not attested at all, or attested much more scarcely than Middle Dutch.
Also: I don't see much gained by putting such reconstructed middle-forms yielding an attested form in only one language (or only one form in one language) in the appendix, and IMHO the appendix namespace should be used only for major proto-languages which have many descendants (in this case Proto-Germanic where MD reconstruction would be listed at the appropriate clade in the hierarchy). Vulgar Latin (Proto-Romance) forms that were not attested could be added as the descendants in the most-close Latin etymon (and it won't be too wrong in this particular case to list D firkin as if descending from MD vierde).
In etymologies I've been so far formatting these reconstructed non-protolanguage terms as *{{term||<form>|meaning}}. I wouldn't have anything against using some newly-written {conjectured} template, tho enforcing it may be overkill for this whole issue, as {term} seems to be up to the task for it. --Ivan Štambuk 09:53, 12 October 2009 (UTC)
I think Ivan has pretty well summarized my opinions as well. --EncycloPetey 02:28, 13 October 2009 (UTC)

Ivan, thanks for the thoughtful write-up (and EP for concurrence). If I may summarize, both to verify that I’ve understood and to make a concrete proposal:

  • Question 1 – where do they go?
Reconstructed terms in attested languages – especially intermediate forms in etymologies – should not have an entry (a page), neither in the main namespace (because they do not meet CFI), nor in the Appendix (as that is reserved for Proto-languages).
Rather, they should be listed in the entry for the descendants used to reconstruct it (in the “Etymology” section), using *{{term||<form>|meaning}} (which yields *<form> (meaning) – formatted, but no link), and in the entry for the closest older term (in the “Descendants” section) (e.g., Vulgar Latin in the Latin entry).
  • Question 2 – how to link?
(Don’t link; {term} formats correctly.)

This sounds like an excellent solution – it lists the form for etymology (which is their function), but skips having an entry whose only function is to fill in a chain of etymology.

—Nils von Barth (nbarth) (talk) 01:59, 14 October 2009 (UTC)

Stock symbols

Do we even want these?
We currently have Category:Stock symbols for companies with feeder template {{stock symbol}}, but I strongly doubt we want either the category or the template, while the entries themselves should all be sent to RFV or simply deleted, with the possible exception of T (AT&T). Not only are the ticker symbols duplicated on various exchanges worldwide, they change over time. For example, C was once Chrysler, and now is Citigroup. — Carolina wren discussió 17:54, 11 October 2009 (UTC)
What is the reason? That they have an average life of less than 30 years? There might be books (even literature) that refer to then-current ticker symbols, especially the ones with cutesy names or that were for popular stocks, like "T". We seem to find it easy to justify including all kinds of abbreviations from realms that are more familiar (or ideologically congenial?), like ISO-639 codes, E numbers, etc. I would be interested to hear the arguments distinguishing this case from those others. DCDuring TALK 16:12, 12 October 2009 (UTC)
It seems to me that the case is basically identical with ISO codes, except that there is more than one registering authority. Which is to say, I wouldn't really lose any sleep over it if they all got shuffled off to Appendix-land, but I do think we are a more useful reference work for having them (and for having them in mainspace). If we got rid of these, I would want to see a global purge of all the other initialisms that don't meet normal-word criteria. But I don't really think that would be a good use of anyone's time. -- Visviva 16:35, 12 October 2009 (UTC)
I say, apply the CFI for brand names. Iff they meet it, they stay in. If not, appendicize. bd2412 T 17:40, 12 October 2009 (UTC)
But they're not brand names; why would we treat them like they are? Also, that bit about not being written "about the type of product in general" is rather problematic. Surely no one would expect to find these anywhere but in publications that are at least generally related to the stock market. -- Visviva 11:08, 13 October 2009 (UTC)
What is a stock ticker symbol? It's a stand-in for the company name, a proper noun. I have no objection whatsoever to including them in an appendix, but I'm not sure what purpose is served by reporting them as words. Someone who comes across a stock ticker symbol will almost certainly be looking at a stock ticker or the like, and will immediately know what quality of symbol they are looking at. bd2412 T 19:38, 13 October 2009 (UTC)
Unlike ISO-639 or E numbers, we've got the issues of multiple issuing authorities and no ban on reassignment of previously used symbols. Indeed, Citigroup took over C the same year Chrysler merged with Daimler. At a minimum, if generally kept in mainspace, we need to revamp the category and template, so that it could handle the multiple issuing authority concern. — Carolina wren discussió 20:14, 12 October 2009 (UTC)
But that actually suggests to me that this would be more useful than other similar classes of entries. A zeptogram will never be anything but zepto- + -gram, and en will never be the ISO 639-1 code for anything but English; but a stock symbol may have wildly different associations over time and space, associations that may not be satisfactorily documented anywhere else. Someone reading an older financial text that refers to "C" might come away with a very flawed interpretation if they assume it is referring to Citigroup rather than Chrysler. -- Visviva 11:08, 13 October 2009 (UTC)
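The point about multiple issuing authorities and reassignment over time can be made concrete with a small Python sketch. The record layout and function name are hypothetical, and the exchange ("NYSE") and changeover year (1998) are assumptions that are not stated in this thread; only the Chrysler-then-Citigroup sequence for "C" comes from the discussion.

```python
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    exchange: str
    company: str
    start: Optional[int]  # first year of use, None if unknown
    end: Optional[int]    # year the symbol was given up, None if current


# Assumed data: exchange and year are illustrative, not from the thread.
ASSIGNMENTS = {
    "C": [
        Assignment("NYSE", "Chrysler", None, 1998),
        Assignment("NYSE", "Citigroup", 1998, None),
    ],
}


def lookup(symbol, year, exchange):
    """Return the companies a symbol referred to on an exchange in a year."""
    return [
        a.company
        for a in ASSIGNMENTS.get(symbol, [])
        if a.exchange == exchange
        and (a.start is None or a.start <= year)
        and (a.end is None or year < a.end)
    ]


print(lookup("C", 1990, "NYSE"))
print(lookup("C", 2009, "NYSE"))
```

Keying each sense by exchange and period is what a revamped category and template would have to carry if these entries stayed in mainspace.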
I'll agree that a source that collects historical ticker symbols would have value, but are they used in non-tabular text, unaccompanied by the name of the company whose stock they represent? Generally, the answer to that is no, with occasional exceptions such as T which could then pass the normal CFI. E numbers are a marginal case, but they will show up commonly enough with such words as peas, carrots, and potatoes in ingredient lists. — Carolina wren discussió 17:42, 13 October 2009 (UTC)
They are in widespread use in the financial press in articles that discuss stock prices. I'm not really sure that the use of language codes occurs much more in running text than stock symbols do. We haven't been very demanding on any of the 1900 headwords that have an abbreviation-type heading. Items from the world of commerce often attract hostile attention that analogous words from the worlds of IM, computer gaming, computing generally, and linguistics do not. Consequently they can serve as a kind of miners' canary, providing useful information about entry classes for which our standards have otherwise been overly lax. DCDuring TALK 18:00, 13 October 2009 (UTC)
In the financial press I read, I've generally only seen them used in conjunction with the name of a company, never independently, as a way to enable easy lookup of data by interested readers, since financial data often is set up based on the assumption that the ticker symbol will be used to access it. — Carolina wren discussió 19:49, 13 October 2009 (UTC)
... or areas where we have an unreasonable prejudice against business. (-: -- Visviva 18:21, 13 October 2009 (UTC)
If the entries only said "stock ticker symbol", I agree that would be fairly useless. But most seem to have at least the name of the company and a link to the pertinent Wikipedia article. Information about the exchange and period of use would also be helpful. Of course, all of this is rather encyclopedic... but it's also somewhat dictionaric, and even if it were included somewhere in Wikipedia it wouldn't be likely to be easily found. So we once again face the choice of whether we would rather be useful or pure. I'm less than convinced of the community's ability to make the right choice on such matters, based on past experience; maybe I should just go out and create a few thousand of these myself, to help create some momentum in the "right" direction. ;-) -- Visviva 18:21, 13 October 2009 (UTC)
The meaningful symbols each have issuing authorities appropriate for the scope. There is some effort currently to differentiate via prefixes or suffixes securities from markets on other exchanges or similar entities. In some ways, the polysemy and duplication is reminiscent of what occurs in other realms, such as ordinary words in ordinary languages. DCDuring TALK 14:05, 13 October 2009 (UTC)

I just thought of another reason to not include most of them. In the case of ISO 639, en is a symbol for English. In the case of E numbers, E175 is a symbol for gold. In the case of stock ticker symbols, KFT is a symbol for Kraft Foods, which unlike English or gold, likely does not meet CFI. How can a symbol for something which has not met CFI, meet CFI itself? — Carolina wren discussió 19:49, 13 October 2009 (UTC)

We don’t have for your information, but we do have fyi. There certainly are cases where something might not merit inclusion, but an abbreviation or symbol for it probably would. —Stephen 20:07, 13 October 2009 (UTC)
Agreed. Furthermore, I'd dispute the claim that "en", "E175", and "KFT" are symbols for "English", "gold", and "Kraft Foods"; rather, I think they're symbols for English (the language), gold (the substance), and Kraft Foods (the company). That is, these symbols have the same referents as the corresponding plain English, and two of them are even derived from said plain English, but they are not actually symbols for the plain English. —RuakhTALK 20:23, 13 October 2009 (UTC)
Ticker symbols are for various securities. Most well known are the ticker symbols for the common equity of operating businesses, but they also exist for other securities of those companies, and for options, closed-end funds and exchange-traded funds (ETFs). I'm sure I'm missing other categories. In the financial press, the common stock ticker symbol may be used to refer to the underlying company, but the divergence between the company and its equity becomes clear whenever bankruptcy is an issue.
Just as with ISO 639 codes, the need to avoid duplication sometimes forces a choice of ticker symbol whose connection with the referent is obscure or even arbitrary.
Many of our abbreviations refer to organization names that do not meet CFI or to SoP phrases. (Though Pawley would suggest that the existence of an abbreviation was evidence of idiomaticity.) DCDuring TALK 22:53, 13 October 2009 (UTC)

Can y'all reconcile this please?

Hey. I know there's been some tension between 史凡 (talkcontribs) and various members of the community (I won't mention names). 史凡 has been blocked for a few days, and while communicating with him by email, he asked whether or not any of the others had apologized. I was obliged to reply that no one had. I then suggested to him that when he comes back, he go to the people who hurt him (some of those comments really stung him, you know) and apologize for anything that he said that may have been hurtful, and try to make up with all of you. Now I'll suggest that to the rest of you too. Can you please forgive each other and get back to building the dictionary??? I want this quarrel to end ASAP. Please. L☺g☺maniac chat? 14:11, 12 October 2009 (UTC)

You can see below, that he did not take your advice. --EncycloPetey 02:09, 13 October 2009 (UTC)
I ... see. Since none of the parties I've talked to seem to want to listen to common sense, I'm going to pull out of this issue. Unfortunately. :( L☺g☺maniac chat? 12:01, 13 October 2009 (UTC)

ididnt ask4anapology[justwords i/worstcase],but4"is anythin'said onwt datshowsthey realiz they cant cal adisabldpersn namz pl?"--but inflamm.burocratpetey,weldun,oil onfire always gudon wt.--史凡>voice-MSN/skypeme!RSI>typin=hard! 02:16, 13 October 2009 (UTC)

I didn't ask for an apology (just words in the worst case) but for "is anything said on Wiktionary that shows they realize they can't call a disabled person names please?" -- but inflammatory bureaucrat petey, well done, oil on fire always good on wiktionary
He has now resorted to whining and insults with his first edits after the block ended. I am now getting insulting e-mails from him. I move for permanently blocking this user. --EncycloPetey 04:27, 13 October 2009 (UTC)
Not having received any such emails myself, I can't judge for sure. But it does sound like a ban is in order. -- Visviva 11:18, 13 October 2009 (UTC)
Unfortunately, if he doesn't listen to common sense from any of us, that may be necessary. L☺g☺maniac chat? 12:01, 13 October 2009 (UTC)
He has since posted two more ugly e-mails to me, including a mild threat. I am now firmly in favor of a ban. Between altering other people's comments, the constant undirected and misdirected complaints, and now rude e-mails, his presence is more disruptive of Wiktionary than beneficial. --EncycloPetey 03:08, 14 October 2009 (UTC)
*siiiiiiiiigh* :( L☺g☺maniac chat? 20:49, 15 October 2009 (UTC)
Well, for the record, I am sorry that I responded in a way that escalated the situation. I don't really think that reconciliation is necessary, though of course it's great if it happens. Most of us have problems of some sort with multiple other editors, but as long as we stick to the work at hand, everything mostly works out OK. If we want to get back to building the dictionary, the only thing to do is ... get back to building the dictionary.  :-) -- Visviva 11:18, 13 October 2009 (UTC)

The user has continued to send me hate mail, so I've blocked him permanently. If he continues to harass me, I'll have to contact his ISP. --EncycloPetey 20:38, 15 October 2009 (UTC)

Given that the blocked party in question here has made NO positive efforts and is inflammatory by nature, and that this sort of disruptive editing and commenting is why he was blocked on Wikipedia ... There's no reason we should have even tolerated his behaviour here as long as we have. --Neskaya contribs talk? 03:13, 19 October 2009 (UTC)

Category:Old Provençal language

According to Wikipedia, Old Provençal is the "former" name for Old Occitan. Given how little this template ({{pro}}) is used, it would be pretty simple to rename it and all the pages associated. What are our criteria in such cases? Mglovesfun (talk) 22:33, 12 October 2009 (UTC)

It's usually on a case-by-case basis. For this case, we prefer the name Occitan over Provençal, so I would say the same applies for the "Old" version of the language as well. --EncycloPetey 02:07, 13 October 2009 (UTC)
Well Occitan and Provençal have different ISO 639 codes (oc and prv). However, both Google and Google Books say that Old Provençal is the more common name, although not by a wide margin. I'm kinda on the fence now. Ethnologue might help. Mglovesfun (talk) 12:14, 13 October 2009 (UTC)
Okay yeah, sil.org says both, so Wikipedia (according to sil) is wrong. Mglovesfun (talk) 12:20, 13 October 2009 (UTC)

Restoring Bosnian, Croatian, Serbian sections

In a few days I will be running code to restore the standard language sections deleted with no consensus. There will be no changes made to other sections, and no changes made to the restored sections other than to tag some of them for further attention.

There will be several tests (8-10 entries) run in the meantime for inspection (;-) Robert Ullmann 01:45, 13 October 2009 (UTC)

Robert, many of those entries were created originally by Ivan Štambuk himself or other contributors who support the change to Serbo-Croatian, and then changed from Bosnian, Croatian, Serbian to Serbo-Croatian. If you are not contributing in these language(s), I don't think it's right. Anatoli 02:19, 13 October 2009 (UTC)
Once an entry (or any content) is added to the wiki, it is not the property of any contributor; deleting it without consensus is not permissible, and it must be restored. (I'll quote Štambuk himself: "... ti ne posjeduješ ovaj projekt. On nije tvoje vlasništvo, ni jednog jedinog bajta." That is, "you do not possess/own this project, not one single byte"; from hr.wikt Kafić ;-) The alternative is that the various contributors, including myself, will have to correct 6000+ entries manually, which we are not looking forward to. Robert Ullmann 02:40, 13 October 2009 (UTC)
Uhm, I don't remember writing anything like that in the Kafić. Please stop spreading your dirty lies.
As for the "deleting" - absolutely nothing of content was deleted. I've explained that to you many, many times. Can your brain understand that? Pretty much all of the merged entries were heavily expanded and rectified in the process.
"deleting it without consensus" - there was a consensus for 4 months while the merger was ongoing, which you happily ignored until you imagined it was some kind of "linguistic genocide" (and sadly, too many ignorant people have succumbed to your FUD).
You can manually add individual B/C/S/M sections from the merged ones, and add the new ones from the new SC entries; I have absolutely no problem with that. What I have a problem with is this unilateral bot-running of yours, for languages you are completely ignorant of. It would generate faulty entries, and it would generate 15-20 000 more entries, all of which would have to be manually checked. It would generate absolutely no useful content at all, as all of the information in the "new" entries is already contained in the merged ones (since they're, you know, the same language). --Ivan Štambuk 09:34, 13 October 2009 (UTC)
...ti ne posjeduješ ovaj projekt. On nije tvoje vlasništvo, ni jednog jedinog bajta... was written by IP, but in style of Ivan Štambuk (which is unique on hr wiki), so anybody could make an error and assume that was Ivan, when in reality somebody could easily pose as him just by using his violent style. SpeedyGonsales 13:27, 14 October 2009 (UTC)
It is, in fact, his (fixed assignment) IP address at the University of Zagreb. Robert Ullmann 05:35, 15 October 2009 (UTC)
Uhm, FYI I have dynamic IP address thru several providers, and when I edit Wiktionary (and other Wikimedia projects) from any external connection (from college or public computer) I always do it tunneled through my home computer (which is always on) for safety reasons. I've never been to Slavonia or Požega (where the above IP traces itself to), and even though I do use CARNET as one of my providers, it is also used by at least 100 000 other people in the Academia.
What is really interesting to me is that you, Robert, can't understand an iota of any Slavic language (Serbo-Croatian included), and yet you somehow claim out of thin air that some randomly quoted IP, which posted on the Croatian Wikipedia discussion board several weeks ago, is me. How on earth did you reach that conclusion? SpeedyGonsales (desysoped and banned bureaucrat from Croatian Wikipedia) claims that it apparently bears resemblance to my style of writing, which I find silly, since anyone who has read any prosaic writings of mine on Croatian Wikipedia knows very well that my preferred style of writing (which I tend to rotate depending on the occasion, mood, and the addressee) consists of long, baroquesque sentences with many archaic and unusual words, metaphors and idioms (I abhor the so-called "standard language", and strive to "break" it wherever I can ^_^). You couldn't have possibly made out the stylemes of my writing by means of Google Translate, or similar automatic-translation tools, which brings me to the conclusion that it was suggested to you by somebody else in off-wiki communication. Gee, I never thought I'd someday be so important that people would translate even the suspected writings of mine to some external parties by e-mail!
And as for this whole "issue", which needlessly diverged into whether that IP address is me or not: It was addressed already in June when DCDuring here in Beer Parlour expressed concern on whether the "merged" entries would belong to the contributors who opposed the "unified treatment". Back then basically all (99%) of all the individual B/C/S entries were in fact created by contributors who were pro unified treatment, and I also added an aside that that issue is irrelevant at any case, because people do not "own" the entries they created, but that we might as well take it into consideration, as a token of good-intention towards the potential (back-then, non-existent) SC contributors who might oppose the common treatment, to only merge the entries created/edited by those supporting the proposal (which in practice meant pretty much all of the entries). And that has been in practice ever since.
However, reducing the legitimacy of creating these entries from page histories to whether we could add them or not is simply clouding the issue altogether: a great deal of these entries have some problems (see below), absolutely none of them adds anything useful to the already-present ==Serbo-Croatian== entries (which even Elephantus (talk • contribs) figured out soon, when he started re-creating ==Croatian== entries not from page history, as he did initially, but by copy-pasting ==Serbo-Croatian== entries to ==Croatian==, the only change being the switch of sh [the Serbo-Croatian ISO code] to hr [Croatian ISO code]), and the whole effort seems to be an evil-minded exercise in "how vicious can we be to Štambuk", with you unilaterally announcing it here in the Beer Parlour, not notifying the relevant contributors on the relevant talk-pages for feedback, ignoring the requests to ask for a vote sanction first, which is required not only per bot policy but also given the controversy of all this, and esp. doing it all in an accusatory tone (quoting an alleged IP of mine, which was supposed to rebut counter-arguments), in the company of a couple of people who, to put it mildly, don't really suggest good intentions behind your actions. --Ivan Štambuk 12:53, 15 October 2009 (UTC)
So long as the Serbo-Croatian sections aren't deleted when the Bosnian, Croatian, and Serbian sections are restored I don't see a major problem. That same lack of consensus over how to treat Southwest Slavic works both ways. I do see a minor problem depending on how you intend to manage the restoration. If the tool will be looking through page histories for Bosnian, Croatian, and Serbian sections deleted when a Serbo-Croatian section was added in its place, and adding them exactly as were prior to deletion, that's one thing, but I can't see trusting a bot, even one that is being human monitored, to recreate Bosnian, Croatian, and Serbian sections from a Serbo-Croatian section. After all the whole raison d'etre of having them as separate sections is that there may be differences. — Carolina wren discussió 03:31, 13 October 2009 (UTC)
Tell me Carolina Wren, will you be helping cleaning up the dreck generated by Ullmann's bot, or you're simply seeing no problem with it because it wouldn't affect your work here at all? --Ivan Štambuk 09:36, 13 October 2009 (UTC)
If anybody doesn't know why I write above that Ivan Štambuk is violent, here is proof (just one of many): cleaning up the dreck generated by Ullmann's bot, clear violation of "assume good faith", because Ivan assumes that bot will make errors, even before agreement is reached will some action be done by bot. I know it is hard when admin is breaking the rules of project, but some measure of civility should be held. SpeedyGonsales 13:27, 14 October 2009 (UTC)
I don't simply "assume" that the bot will make errors, I know it will, and I already listed below (in reply to Visviva) some cases where it would introduce errors requiring manual cleanup. I gave an example of one of the test edits, where the bot restored the wrong etymology, which has been fixed in the merged entry. It would also restore some of the obsolete templates and sections, not to mention ambiguous and sometimes downright wrong definition lines (which were fixed and expanded in the merged entries), not only introducing ethnic imbalance (by not generating sections for all four modern Serbo-Croatian standards), but also misleading the poor reader of Wiktionary into thinking that there actually are some differences in meaning among the standards, where there are none (cf. rog#Bosnian and rog#Croatian). The question of whether the bot will make errors is completely orthogonal to the question of whether agreement is reached on the bot being run - as long as all the bot does is unintelligently restore old sections from page history, there will be errors, no doubt about that.
"I know it is hard when admin is breaking the rules of project, but some measure of civility should be held." - Uhm, what rules of the project? :) The AGF principle applies predominantly to the very first edits of newly-arrived contributors (before they gain experience of how things are handled here, not only because lots of common practice is not written in the help pages, but because making quality edits requires a bit more technical knowledge than on other Wikimedia projects), and to what appear to be disruptive edits by some of the regulars. What you are trying to do here is to "prove" that I am somehow "violent", and should be "sanctioned" :) My reply to Carolina Wren was along the lines of the old saying lako je tuđim kurcem gloginje mlatiti (roughly: it is easy to take risks at someone else's expense) :) The point is that those whose edits here are not directly affected by the consequences of running this bot, esp. those not even knowing the language in question and thus being unable to assess the cleanup effort it would induce, don't really have the moral ground to bless its running. Have you checked the validity of those test edits by UllmannBot (talk • contribs)? If you haven't, I don't really care what you think :) --Ivan Štambuk 13:18, 15 October 2009 (UTC)
Yes, don't delete Serbo-Croatian sections. Some questions:
  1. Will this bot deal with Cyrillic entries? Mr. Štambuk created equivalent Cyrillic entries of the Latin ones. And the bot should try to keep the Cyrillic and Latin entries in sync, otherwise confusion will reign.
  2. Will this bot deal with Template:sr-noun and sh-noun since they aren't the same? Instrumental & locative are swapped around.
  3. What will it do with accent markers?
  4. Does your bot know what to do with yat reflexes?
Hopefully, one day, Croats will stop using Serbian vukopis. Serbs don't use Kajkavian & Chakavian, and Croats should embrace these dialects more. I feel bad for Mr. Štambuk, since this is mostly his work, but at least this bot could save him time if it keeps Cyrillic and Latin entries in sync.--Pepsi Lite 10:00, 13 October 2009 (UTC)
What do you mean by "Serbian vukopis"? Does that alphabet have a Serbian ethnical marker attached to it? Latin script used for Serbo-Croatian today is mostly a result of the work of Ljudevit Gaj and his associates.
I generate Cyrillic-script Serbo-Croatian entries from Latin ones automatically by means of a program I wrote - first I write the Latin-script entry manually, copy it to the clipboard, click on the Cyrillic-script redlink linked to in the inflection line, and do CTRL+V (i.e. "paste") - it comes out as Cyrillic. That process can be 99.99% automated. It handles accent marks, sc=Cyrl, various templates ({term}, {l}, ...) and context markers. All edits to either script are kept in sync manually that way.
Kajkavian and Čakavian are dying out for the last 5 centuries, and will have no native speakers at all by the end of this century. There is a diglossia with Štokavian wherever Kajkavian and Čakavian are spoken, and that's only in the rural areas, and considering the ever-increasing urbanization, and esp. the fact that they're non-literary, with negligible actual written literary output, they're interesting today only as historical devices of communication to linguists (and a badge of "Croatdom" for some Croatian nationalists). --Ivan Štambuk 10:40, 13 October 2009 (UTC)
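The Latin-to-Cyrillic conversion described above could be approximated roughly as follows. This is only a sketch and not the program mentioned in the thread: the function name latin_to_cyrillic and the mapping tables are assumptions made for illustration, templates and precomposed accented vowels are not handled, and words in which nj or dž span a morpheme boundary (e.g. injekcija) would need a separate exception list.

```python
import unicodedata


def _cyr(name):
    # Look letters up by Unicode name so Latin/Cyrillic look-alikes
    # (Latin "a" vs Cyrillic "а") cannot be mixed up in the source.
    return unicodedata.lookup("CYRILLIC SMALL LETTER " + name)


# Hypothetical mapping tables for Gaj's Latin alphabet (illustration only).
DIGRAPHS = {"lj": _cyr("LJE"), "nj": _cyr("NJE"), "dž": _cyr("DZHE")}
SINGLE = {
    lat: _cyr(name)
    for lat, name in {
        "a": "A", "b": "BE", "c": "TSE", "č": "CHE", "ć": "TSHE",
        "d": "DE", "đ": "DJE", "e": "IE", "f": "EF", "g": "GHE",
        "h": "HA", "i": "I", "j": "JE", "k": "KA", "l": "EL",
        "m": "EM", "n": "EN", "o": "O", "p": "PE", "r": "ER",
        "s": "ES", "š": "SHA", "t": "TE", "u": "U", "v": "VE",
        "z": "ZE", "ž": "ZHE",
    }.items()
}


def latin_to_cyrillic(text):
    """Transliterate Latin-script text to Cyrillic, digraphs first."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # "Lj" and "LJ" both become a single capital letter.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            out.append(ch)  # punctuation, digits, unknown characters
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)
```

Checking digraphs before single letters is the main design point here; anything the tables do not cover is passed through unchanged, so wiki markup would survive but would still need its own handling.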
Kajkavian and Čakavian are dying out for the last 5 centuries - a statement as false as "water is running uphill". After the work of Ljudevit Gaj (19th century), the Croatian people chose the Štokavian dialect as the base of a unified language (in short). Does Ivan Štambuk think we are in the 24th century? Every language which is not used dies, that is obvious. But still today both the Kajkavian and Čakavian dialects are very much alive, maybe not so much in written form, but you just need to travel a bit, and you will hear as colorful a richness of Croatian dialects as you can imagine. People like Isaac Asimov assumed in their books that some day we will all speak some mixture of English and Russian, but those are SF books. Nobody really knows the future; we can make an educated guess about it, but it is still nothing more than an educated guess. Wiktionary (same as Wikipedia) should describe reality, not be used as a tool to enforce somebody's vision of how it should be. And in reality there are 3 different languages (Bosnian, Croatian, Serbian) and one forming (Montenegrin). And although they are similar to some extent, every language has the full right to be described until it ceases to exist, either by being joined to another language or by having no living speakers. Latin is a nice example here: nobody will deny that it is a dead language, but as it is still used in some professions, it also has its Wikipedia. Štambuk use word nationalist every now and then to antagonize his opponents (which should be discouraged on Wikimedia projects), but we should only stick with reality, as it is the only thing that matters. SpeedyGonsales 13:59, 14 October 2009 (UTC)
"Današnje područje čakavštine znatno je manje negoli je bilo prije migracija izazvanih osmanskim osvajanjima velikoga dijela hrvatskoga jezičnog prostora, a čakavsko područje se i danas smanjuje pod pritiskom književnog standarda. Tako je prije jednog stoljeća, uoči 1. svj. rata čakavski govorilo blizu 1/4 Hrvata ili oko 23%, a danas dvostruko manje ili tek 12%, pa je to najugroženije narječje hrvatskog jezika, najbliže izumiranju."
Translation from the article on the distribution of Čakavian from Croatian Wikipedia:
"Today's spread of the Čakavian dialect is significantly lower than it used to be, prior to the migrations caused by the Ottoman conquest of a great deal of Croatian linguistic area, and Čakavian-speaking area is diminishing even today, under the influence of literary standard. Hence, a century ago, at the eve of WW1, Čakavian was spoken by approximately 1/4 or 23% of Croats, and today barely 12% of Croats speak it, which makes it the most endangered of all the Croatian dialects, the closest one to extinction."
This was the spread of Čakavian in the Middle Ages (an image from the book by Dalibor Brozović, renowned Croatian Slavist), and this is today. As one can see, it has been gradually decreasing in territorial distribution, from the hinterland towards the islands. Formerly exclusive Čakavian urban centres such as Split and Rijeka are today completely Štokavianised. The same thing can be said for the Kajkavian dialect area - the foremost example is the city of Zagreb, Croatia's capital, which used to be Kajkavian a few centuries ago, but is today 100% Štokavian (the only traces of Kajkavian left in the speech of its dwellers are the interrogative pronoun kaj, stress-based accentuation, and a few lexical items in the speech of common people, especially older folks). To my knowledge, in the course of centuries nowhere has the reverse trend been observed - Štokavian speeches always ousted Čakavian and Kajkavian speeches. By exercising pure common sense, taking into consideration the immense increase of mass media, urbanization, education of all classes of people as well as language standardization in the last half-century, and especially the fact that today Čakavian and Kajkavian are virtually dead as literary languages, it's safe to assume that in the following century they'll be reduced to the brink of extinction. That process is irreversible and unstoppable. It's just a matter of time before inertia and ignorance finish what the Turks initiated 600 years ago :)
Nicely put, but that only shows that the Kajkavian and Čakavian dialects are slowly dying, not that they have been doing so for 5 centuries (a slower process would take even longer, a faster one much less). And as for the irreversible and unstoppable process, I would only mention the example of the Norwegian languages, which are nice examples of what can be done if people want to change their language. Once again, Wiktionary should describe today's reality, not something somebody thinks will be the case in 1, 5 or 10 centuries. SpeedyGonsales 03:23, 16 October 2009 (UTC)
And in reality there are 3 different languages (Bosnian, Croatian, Serbian) and one forming (Montenegrin). - In reality there is one linguistic entity, in dialectology usually called Neoshtokavian, more commonly known as Serbo-Croatian, actualized in 4 national standards whose mutual differences in grammar could fit on some 2 pages of text :) Now, whether you call these national standards "languages" or not is a matter of somebody's perception. Most foreign linguists would rather treat them as one pluricentric language with regional variants, such as we already have for English, Spanish, German, Portuguese... In the Balkans the word language bears much more value of identity than it does in the rest of the world, so insisting that one can talk of "different languages", strictly on the basis of what are - let's be honest - trivial differences in spelling and pronunciation, becomes a confirmation of self-identity. You speak of "sticking with reality" - that's exactly what we're doing! :) Linguistic reality is that there are 4 dialectal systems (really 4 languages) - Čakavian, Kajkavian, Štokavian and Torlakian. The third one on the list, or a particular innovative speech of it to be more precise (Neo-Štokavian), is used as the basis of the standard, codified language of all the 4 nations of that 4-part dialect cluster. If we were to truly describe linguistic reality, we'd have L2 ==Shtokavian==, ==Kajkavian==, ==Chakavian== and ==Torlakian== (as Millosh once suggested), but that would be pointless as we're not building a dialectological dictionary but a dictionary that people would use to learn the language of communication, i.e. the standard language.
Štambuk use word nationalist every now and then to antagonize his opponents - I use it because bulk of the objections against the SC proposal come/came on nationalist grounds, as I explained above. Voting "no" on the Unified Serbo-Croatian vote page was perceived as a step to re-affirming one's "Croatdom" (hrvatstvo) by many of the voters. Some even requested that, can you imagine that, they'd be apologised to, due to "being insulted" :) Most of the Balkans still lives in the 19th century state-language-nation fairy tales, a state of mind commonly described by the word nationalist. Sorry, but to ignore the nationalist dimension in all this would be to simply play dumb. FFS, people deliberately lie when discussing the history of "Croatian language", as if the choice of Neoštokavian dialect as literary has absolutely nothing to do with Karadžić and Serbs, which is a ridiculous fabrication of history. Vienna Literary Agreement? Never happened! :)--Ivan Štambuk 15:12, 15 October 2009 (UTC)
Sadly, you explained nothing. To ignore the national dimension when we are talking about language is more than dumb, but to ignore the rules of Wikipedia & Wikimedia, as you tend to, is rude. You don't have the right to call users of a Wikimedia project nationalists! Say what is true or false in the words of others, but stop giving others tags of nationalists or any other -ists; it's rude, FFS. SpeedyGonsales 03:23, 16 October 2009 (UTC)

Ullmann, you need to pass a vote to do any kind of such large-scale modifications in the main namespace, esp. for languages you have absolutely no bloody clue about. You can start a proposal, and then I'll explain to you why your brain-damaged bot wouldn't work (as I've already partially done, but you seem to simply ignore any kind of discussion). --Ivan Štambuk 09:23, 13 October 2009 (UTC)

I agree that this would require a vote. (If he wants to go through and do it manually, that's another issue — there's no consensus to forbid it — but bots are only for implementing consensus, not for exploiting the lack of it.) —RuakhTALK 16:15, 15 October 2009 (UTC)
I would like a vote: is such language as "brain-damaged bot" allowed on this project? Or can I start calling every bot I dislike brain-damaged? Ruakh, maybe you have a bot. Is it damaged? Or? SpeedyGonsales 03:23, 16 October 2009 (UTC)
(don't worry, the bot doesn't care if it is called "brain-damaged". And since he applies that word to everyone who disagrees with him, it is mostly meaningless; "brain-damaged bigot", etc ;-) Robert Ullmann 06:12, 16 October 2009 (UTC)
"you need to pass a vote to do any kind of such large-scale modifications in the main namespace" Quite so. I entirely concur. When you set out to make "large-scale modifications" by deleting 3 languages from the Wiktionary, and forcibly merging them into "Serbo-Croatian", you discovered that it was contentious, and disputed, and yet continued. You then set up a vote, and continued making the modifications while the vote was running. When the vote failed, and your modifications were rejected, you continued as if the vote had not occurred.
Now you have the absolute temerity to demand that your entirely unauthorized "large-scale modifications" not be undone?
The 5,427 standard language content sections improperly deleted must be restored. The brain-damaged bot can be used to restore them to the status quo ante. If Mr Štambuk wants to insist that they must instead be restored manually, then I believe we must insist that Mr Štambuk personally do all of those manual restorations to his desired standard of quality before being permitted to engage in any other activity on the Wiktionary. Robert Ullmann 06:12, 16 October 2009 (UTC)
Restoring Bosnian, Croatian, Serbian sections — AEL

Entries recently edited by UllmannBot (talk • contribs), with sections extracted from page history:

Apparently the whole list of entries waiting to be "processed" this way is held here: User:Robert Ullmann/SC recovery/report.

As one can see, the way Ullmann's bot does its work is by looking up the page history and restoring stubbish entries that were later expanded as ==Serbo-Croatian==. It doesn't add any kind of new content at all - it just mindlessly restores the previously merged entries, under the sections that were mercilessly wiped during the SC merger.

And during that process, it doesn't differentiate between standards at all, as it pretends to. For example, of the above-listed entries, it only added ==Bosnian== and ==Croatian== sections at [[cijena]], even though it is also a perfectly valid Ijekavian Serbian word (with exactly the same meaning, inflection, etymology...). This introduces ethnic imbalance in the treatment by falsely insinuating that some word is e.g. less "Serbian" (this especially pertains to Ijekavian Serbian entries, which were scarcely generated, but are handled transparently in the "common" SC treatment).

At the entry [[cigla]] you see all the 3 entries restored, but without the information that was added in the merged ==Serbo-Croatian== entry (the declension table). It also restored the wrong etymology which I fixed in the merged entry (the Latin word does not originate from Ancient Greek word - they're cognates of the same Proto-Indo-European root *(s)teg- "to cover").

It would be much easier, less wrong and more comprehensive to simply generate separate B/C/S/M entries from the SC entries themselves. But that is an entirely different issue by itself (take a look at [[govor]] to see how ridiculous it would look). The thing I hate the most about this bot is that, by generating entries in different states of treatment, it somehow hints to the unknowledgeable observer that there is some kind of functional difference between these "languages", which there isn't. --Ivan Štambuk 10:22, 13 October 2009 (UTC)
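For readers unfamiliar with the mechanics under dispute, the approach described above (taking language sections from an older revision and re-adding those missing from the current one) can be sketched roughly as below. This is an illustrative assumption only, not UllmannBot's actual code: the function names, the regular expression and the default language list are hypothetical, and fetching revisions from the wiki is left out entirely.

```python
import re

# A level-2 heading such as "==Croatian==" (but not "===Noun===").
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def split_l2_sections(wikitext):
    """Return {language: section text} for each level-2 heading."""
    sections = {}
    matches = list(L2_HEADER.finditer(wikitext))
    for idx, m in enumerate(matches):
        # A section runs until the next level-2 heading or end of page.
        if idx + 1 < len(matches):
            end = matches[idx + 1].start()
        else:
            end = len(wikitext)
        sections[m.group(1).strip()] = wikitext[m.start():end].rstrip() + "\n"
    return sections


def sections_to_restore(old_text, new_text,
                        wanted=("Bosnian", "Croatian", "Serbian")):
    """Sections present in the old revision but absent from the new one."""
    old = split_l2_sections(old_text)
    new = split_l2_sections(new_text)
    return {lang: old[lang] for lang in wanted
            if lang in old and lang not in new}
```

Note that a sketch like this copies the old section text verbatim, which is exactly the behaviour criticised in this thread: corrections made later in the merged section are not carried over.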

  • Questions from the peanut gallery:
    • Were these old entries merely incomplete, or did they have substantial quality issues? I would certainly hope the bot wouldn't be re-creating any RFC issues that had already been dealt with under Serbo-Croatian. But incompleteness is the normal state of a wiki page.
    • If we have decided, as apparently we have, to have entries in all four of these language(s) alongside each other, then wouldn't we expect such entries to spread and grow organically, anyway? Would the bot-restored entries differ from the stub sections that might normally be created by a passing anon?
    • If these entries are re-created, would it subsequently be possible for a bot wrangler who is knowledgeable in the languages to automatically spread the spreadable aspects of the Serbo-Croatian entry to the other language sections? If this were done, would it resolve the concerns about restoration?
    • Is there any chance that we could all discuss this in an atmosphere of mutual respect? I know the whole issue is quite a mess, and has quite a history at this point; but we are all working towards the same goal here. I think. :-) -- Visviva 11:48, 13 October 2009 (UTC)
  • I'd say that at least 90% of them were in the state of "allowable incomplete", and 5-10% of them had various issues such as incomplete, imprecise or ambiguous definitions (which is characteristic of the basic-vocabulary lexemes in a stub form), and formatting mistakes (e.g. the obsolete ====Cyrillic spelling==== and ====Latin spelling==== sections which are now handled inside the inflection line of {{sh-noun}}, previously by {{sr-noun}}, ====Related terms==== not containing really etymologically-related terms, some not so precise synonyms, meanings not split by etymology..). Nothing really "wrong" with all that, and a great many other languages here already have similar issues (esp. the rarely-maintained ones), but I'd really hate to see all that already fixed sh*** resurrected again.
  • We haven't really decided anything. There is a consensus among the SC contributors (apart from a group of nationalists that contributed here only during the vote, and has fled ever since it ended), as well as pretty much all of the Slavic-language contributors, that the common treatment in one language section is something beneficial to the end-users as well as contributors, and there are others who for some unknown reason oppose the merger, mostly due to the unexplainable concern that somebody's feelings might "get hurt". With the second group, Robert Ullmann and his minions (Lmaltier, DCDuring, Elephantus..), you cannot really argue at all, because they either ignore any kind of discussion, continue to spread FUD and dirty lies, or imagine various kinds of "concern scenarios" over imaginary users. Now, since the first group (the people who actually contribute Serbo-Croatian entries) is responsible for some 99.9% of all the entries, these non-merged entries would not, as you say, "spread and grow organically". This is not Wikipedia: the contributions by "passing anons" are infinitesimal in number, and usually of very little value. 99.9% of all quality content on Wiktionary is added by long-time, well-known and dedicated users. If you ask me, I'd lock this project completely for IPs, as they only generate vandalism and low-quality entries that need to be fixed and expanded. The net result of everyone contributing sections they prefer would be: Serbo-Croatian entries growing to tens of thousands, and individual B/C/S/M entries serving primarily as a hole to channel various passing-by nationalist contributors, who'd give up once they realized how time-consuming adding quality content here is, despite the apparent "easiness" of contributing (as opposed to Wikipedia).
  • Not completely, but to a large extent, yes. But why? It's pointless to have both the merged and separated entries on the same page. It still doesn't solve the maintenance-hell problem. So for each SC entry you'd have 1+2+2+2=7 additional entries (B, S and M can each be written in 2 scripts), 9 if you include the alternative forms like those with jat reflexes. It's a pointless and fruitless endeavor. Our Serbo-Croatian learners would be horrified upon encountering such a mess. But I'd rather not discuss would-be scenarios, but what we have here and now with this bot of Ullmann's.
  • Ever since his blogpost at DailyKos in which Ullmann described me as some kind of "genocidal Serb nationalist", he has done absolutely nothing to deal with this issue in a civilized, respectful way. Moreover, absolutely everything he did was meant to obstruct the Serbo-Croatian proposal (of which he was notified, as was the entire community, 4 months earlier - which raises the question: what has changed in his mind in the meantime?): from the "intractable technical difficulties" with some silly HTML language codes that upon inspection turned out to be completely irrelevant for some 99.99% of websurfers, to this brain-damaged bot which he writes on his home computer and then "announces" on BP that he'll run, as something completely legitimate and "normal", barely worth discussing. If he were a decent person, he'd announce it on the talkpage of WT:ASH and ask for feedback, but that wouldn't be evil enough of him. The only thing Robert Ullmann is interested in is getting on my nerves, under the disguise of some "concern" for the "deleted content" of "standard languages" (of which he doesn't know a word). It's pointless to discuss with a person like him, since he'll never admit that he's wrong. --Ivan Štambuk 15:22, 13 October 2009 (UTC)
I absolutely agree with you, Ivan. RU has done extreme damage to the Serbian/Croatian project and seems intent on killing it off entirely. It’s the same kind of misguided and shortsighted behavior that has stifled our Swahili entries for years and paralyzed the Kinyarwanda Wiktionary. If he would put his considerable programming expertise to good use in support of our linguists instead of thwarting us at every turn, he would be a great resource. Instead he smothers everything he touches. —Stephen 19:30, 13 October 2009 (UTC)
(wow!) Stephen, what has happened to you? You are much better than this! Personal abuse from you? I would never have expected it. and your complaints are nonsense. I haven't "stifled our Swahili entries"; I in fact organized a project of 5-6 native speakers of Swahili and English and several tribal languages, and we have added thousands of entries to sw.wikt and translations here. "paralysed Kinyarwanda" is even more inexplicable, I haven't done anything there in a long time. Robert Ullmann 05:35, 15 October 2009 (UTC)
I strongly disagree with Pepsi Lite. The bot should not try to sync anything, only to restore sections that have been renamed, without deleting anything nor adding anything (otherwise, it would introduce errors: a bot cannot guess anything about words). Restored sections would be for users looking for e.g. Croatian sections (users looking for Serbo-Croatian sections would find them, and not bother about other sections). After that, it will be up to editors willing to do it to add new entries (if they want to), or to improve restored entries. Only humans can do this job (but nobody has to do it). Lmaltier 16:19, 13 October 2009 (UTC)

Given the controversy of this, it would seem irresponsible to proceed without voting on the issue (per WT:BOT, "a new bot must be approved by the community"). However, as has been previously shown, votes on this issue are nothing more than power struggles, so it seems unlikely that this bot will ever be approved unless it is beneficial enough to everyone. Conrad.Irwin 20:42, 13 October 2009 (UTC)

This absence of consensus makes the bot necessary. Is it normal that lots of sections are removed again and again, without doing anything? This bot would do nothing except restoring valid sections. On the French wiktionary, somebody used to systematically remove translations to Ido (and many translations to Latin, when the Latin word was already present in the etymology section, giving the reason redundancy). In this kind of case, when changes are too numerous, a bot is a good solution. Refusing the bot would be accepting this removal of valid sections. Lmaltier 06:09, 14 October 2009 (UTC)
This absence of consensus makes the bot necessary. - Uhm, what kind of ill-logic is this? There is already a consensus to merge the entries among the relevant, knowledgeable contributors. Pretty much all the opposes are either complete ignorants, or do so on some bizarre ideological basis.
This bot is worthless. It restores worthless, buggy, stubbish content that is already contained in the thoroughly fixed and expanded entries. It is solely and ultimately the product of Robert Ullmann's evil-mindedness and hatred.
Your comparison of Serbo-Croatian standards to Latin and Ido is preposterous. For once, you should familiarize yourself with Serbo-Croatian to see why. The overlap would be in some 99% of words, and the entire grammar, which is at least an order of magnitude greater redundancy than you'll get between Ido and Latin. Words are not just spelled the same - they have the same inflection, pronunciation, etymology etc.
Refusing the bot would be accepting this removal of valid sections. - You keep being obsessed with the notion of "valid sections". Above you also make a nonsensical claim: Restored sections would be for users looking for e.g. Croatian sections (users looking for Serbo-Croatian sections would find them, and not bother about other sections). Did you miss the part where I and others thoroughly elaborated the following points:
  1. It's impossible to learn "Croatian" and not also "Serbian", "Bosnian", and "Montenegrin" at the same time.
  2. Modern Serbo-Croatian standards are taught together, as one language, (as "Serbo-Croatian", "BCS", or other similar name) in 99% of FL world's unis. The only place where they are not is because of funding by Croatian diaspora (i.e. due to politics messing in the world of academia).
Hence, there is really an infinitesimal number of users and contributors that would be looking "only for Croatian entries". Just because these merged sections are technically "valid" as kept separately, is no reason at all to have them, or to encourage their creation - they're worthless content-wise, they'd introduce a number of errors, require hundreds of hours of cleanup and expanding, in order to make all the pages containing them look as ridiculous as the entries on [[govor]]. --Ivan Štambuk 07:48, 14 October 2009 (UTC)
Even an infinitesimal number of users would have to be considered (after all, even languages with only a few speakers are accepted here), but publishers proposing Croatian-English, Croatian-French, Croatian-German... dictionaries prove you are wrong: there is no doubt that there is a market, that there are many dictionary users looking for Croatian words. But there is no need to discuss this again, only to apply the result of the vote you organized, which shows that this issue is very sensitive, and that there is no consensus for the removal of these sections. Lmaltier 20:52, 14 October 2009 (UTC)
Indeed. To confirm, the bot code only restores sections for which there was no approval to remove, and therefore were improperly removed, and therefore must be restored. No changes are made to the restored sections, except to add attention tags in some cases, and no changes are made to "Serbo-Croatian" sections. (Comments by Mr Štambuk supra about creating entries do not apply; I did have/do have code that can do things like that; that is not the issue here.) Editors are then free, as always, to work on the standard languages, without fear that their valid contributions will be deleted. If in fact, Mr Štambuk had simply continued his excellent work on his native language, Croatian, without deleting the standard languages, we would have much better entries today. Robert Ullmann 05:35, 15 October 2009 (UTC)
bot code only restores sections for which there was no approval to remove - Uhm, you personally, as well as the entire community in the Beer Parlour, were notified for 4 months that the merger was ongoing, before you turned on the sick "linguistic genocide" mode, spreading lies, defamation and FUD all over the place. The merger was approved by the consensus of all the active SC contributors, as well as by the silent approval of the rest of the community. Now, the fact that you personally imagine that there was no consensus simply proves that your only intention here is to be evil by playing dumb.
No changes are made to the restored sections, except to add attention tags in some cases - which will result in thousands of created entries requiring manual cleanup, which you, I suppose, have no intention of giving a hand with, since you have no clue about the language...
Editors are then free, as always, to work on the standard languages, without fear that their valid contributions will be deleted. Editors are now free to work on their standard and non-standard languages and "languages" without fear that their valid contributions would be deleted. I haven't touched the new creations done by your nationalist friends, except fixing errors in them. OTOH, they've quite benefited from my edits (many such new ==Croatian== entries are blatant copy/pastes of the neighboring ==Serbo-Croatian== entry, so much for the "different languages"). The only entries that are merged are those created by the contributors supporting the unification effort. And that would be most of them, and I suspect that is the thing which bothers you most.
If in fact, Mr Štambuk had simply continued his excellent work on his native language, Croatian, without deleting the standard languages, we would have much better entries today. - Excuse me, my native language, mother tongue, is "Croatian", in the same sense as yours is "American English". In a more general sense they're called Serbo-Croatian and English, respectively. We already have much, much better entries today because of the unification effort, since Dijan and I don't need to waste time and space anymore doing exactly the same thing in seven different places; we do it in only one or two, thus doing it much faster. --Ivan Štambuk 15:42, 15 October 2009 (UTC)
Even an infinitesimal number of users would have to be considered - An infinitesimal number of users is of infinitesimal concern to us. In other words, it would be a waste of time and space. This has nothing to do with the number of speakers (we even add extinct languages spoken by no one), but with various other practical concerns which you somehow manage to ignore every single time.
but publishers proposing Croatian-English, Croatian-French, Croatian-German... dictionaries prove you are wrong: there is no doubt that there is a market - there are separate dictionaries and grammars of Austrian and Swiss German, Portuguese in Portugal and Brazil, not to mention Spanish, English, Arabic... Should we separate these too? :) Lmaltier, the point is not whether we could do it, but whether it makes sense. I see no sense in making 200 000 entries looking as ridiculous as those on [[govor]].
But we're not talking about whether these entries are allowed or not (and they are allowed): we're dealing here with Ullmann's bot, which he threatens to unleash, generating thousands of problematic entries, because his brothers-in-arms think they're "fighting" for the Croatian language that way. They'll be gone for good after the bot finishes execution, and so will Ullmann, attending to his usual business. I am the one who'd be left cleaning up that **** :) --Ivan Štambuk 15:27, 15 October 2009 (UTC)

This is getting boring. The only solution I see is a duel between Štambuk and Ullmann. I can pick the weapon (I'm thinking about muskets). --Vahagn Petrosyan 17:04, 15 October 2009 (UTC)

I have a better suggestion. Can they have a virtual beer session? If they were closer geographically, the real one would be better. --Anatoli 00:38, 22 October 2009 (UTC)

@ Ivan Štambuk: Calm down, I'll help you to clean up this ****. Afterwards, we can make a bot to generate an enormous amount of separate American and British entries... Maria Sieglinda von Nudeldorf 18:55, 15 October 2009 (UTC)

The bot won't generate any problematic entry, nor generate any entry at all, only restore entries (entries removed before, during or after the vote about this issue). No cleaning at all should be necessary but, of course, people willing to improve them (and only people willing to improve them) would be welcome. Lmaltier 22:23, 15 October 2009 (UTC)
The bot already generated problematic entries during the test edits. See my posts above, and pay attention to what is being written, Lmaltier; I'm beginning to suspect that you possess selective visual sensorium. --Ivan Štambuk 22:52, 15 October 2009 (UTC)
The bot should not be used yet if it's not completely tested, i.e. if it does not restore correctly the last version of removed entries to be restored. I have not checked that. But, once it does its job well, it is necessary to use it. Lmaltier 06:42, 16 October 2009 (UTC)
You're obscuring (either deliberately or inadvertently) the actual issue I object to with two seemingly identical notions of "correct":
  • correctly restoring the previously merged entries from the page history
  • the correctness of the restored entries.
No, the correctly restored entries are not necessarily correct. They would reintroduce, as the test edits already committed by the bot attest: 1) factually wrong data that was rectified in the merged entry (e.g. the wrong etymology I exemplified above, which was fixed in the merged entry); 2) obsolete sections and templates (e.g. ==Cyrillic/Roman spelling== headers, which are now handled in the inflection line); 3) inconsistency (e.g. rog#Bosnian has a different set of meanings than rog#Croatian, so the poor Wiktionary user would be misled into assuming "gee, this must be one of those cases where there actually is some kind of difference among the standards", when in fact there is none and all 3 senses are valid in all 3 (4) Serbo-Croatian standards). In other words, if the point of separate treatment is to make up for the cases where there is some kind of difference in meanings/spellings, this restoration would in fact do more damage than good.
Therefore, all of those entries restored from the page history must be checked by a human before committing, which kind of renders the process of completely automated bot-restoration pointless. The reason why I principally oppose such restoration of the merged entries is because 1) they're worthless, as everything of any worth in them is already contained in the merged entry, and properly rectified/expanded in the process; 2) thousands of these restored entries would need manual cleanup (actually, every single one of them would have to be checked, because the bot doesn't have a clue whether there was something wrong or not in the unmerged entries). If Ullmann and co. are ah-so-concerned with the status of ==Croatian==, ==Serbian== and ==Bosnian== entries they should perhaps focus instead on extracting (OK, let's be honest, copy/pasting) the information from the ==Serbo-Croatian== sections, as that would be a much, much easier thing to do (90% less LoC), and much less error-prone, in some kind of semi-automated way, but supervised by humans. I personally don't care at all for B/C/S/M entries now, other than as a means to enhance ==Serbo-Croatian== entries. --Ivan Štambuk 20:16, 23 October 2009 (UTC)

Duplicating, or in this case even triplicating, articles for essentially the same language is a bad thing for Wiktionary. People who improved one article would have to merge their improvements into the others, and most outsiders or casual editors will never bother to do it. I see a VERY GOOD reason for the merge that has been done, but I see no advantage whatsoever in creating near-identical copies, even assuming that such triplication were done intelligently by a human. Doing it by means of a bot, which will ignore most, if not all, of the improvements that have already been made, makes even less sense. Sure, we could create something like, let's say, == Canadian English ==. And we might one day, for example to give a Wiktionary reader the information he is most interested in. But for the moment, copying entries would be a bad idea. It doesn't matter whether we are talking about copying English --> Canadian English and Australian English or Serbo-Croatian --> Serbian and Croatian. TestPilot talk to me! 23:37, 31 October 2009 (UTC)

There has been a great deal of - discussion - about this. Similar arguments can be made about various languages that are quite similar and may be considered dialects of the same language. The saying is: "A language is a dialect with an army." Wikiworld is actually part of the real world in which non-ivory-tower considerations may have applicability. For some time we (relatively) peacefully followed ISO 639, treating any purported language with a code granted by ISO as a language for our purposes. There has been no consensus to change that policy. Serbian, Bosnian, Croatian, and Serbo-Croatian have had ISO codes. The linguistic arguments favoring treating them as a single language seem as good as those that are made about languages such as English (UK, US, Australian, at least), French (France, Quebec), Spanish (Spain, Latin American countries), etc. DCDuring TALK 00:42, 1 November 2009 (UTC)

Ummm....I think we're missing the point here. The SC vs individual headers bit is damned controversial. I think we can all agree on that. The question we need to be discussing is NOT whether we should have one SC header or headers for Serbian, Croatian, etc. The issue is whether this bot should make these edits. It seems to me that having consensus on all bot activities is absolutely necessary. Inasmuch as Robert has done immeasurable work for this project with his skilled programming, he does not have the right to shape it as he sees fit, which is ultimately what allowing this bot-work to proceed would accomplish. Yes, the vote for SC unification failed, but separate headers have no more consensus than unified headers (less, actually). Additionally, it should be noted that when Ivan did the mergings, he DID have apparent consensus. Now that we no longer have consensus either way, editors are free to edit as they see fit, but bots should stay the hell out of it. -Atelaes λάλει ἐμοί 05:55, 1 November 2009 (UTC)

Indeed. We have a consensus, let's act on it. The bot can restore the sections with a request for AutoFormat, and human editors can do the rest. Mglovesfun (talk) 06:07, 1 November 2009 (UTC)
What? I must have missed something (I, admittedly, haven't been that active lately). My impression was that we had anything but a consensus. My impression was that we were utterly away from coming to any sort of rational decision on the matter. Looking at the discussion, it would appear that, of the names that I recognize, only DCDuring and Lmaltier are supportive of this (to those whose names I don't recognize, I apologize for snubbing you. This issue has brought in so many people on both sides who have no intention of doing anything for the project that I haven't bothered to get to know them). Specifically, I see Conrad, Ruakh, and Stephen all opposing this. I also see a whole lot of two editors who used to have a lot of my respect whining like little children, as they have been for some time. In any case, this is not a consensus. -Atelaes λάλει ἐμοί 08:51, 1 November 2009 (UTC)
I oppose bot edits under these highly controversial headers as well, as controversy and confusion would only increase thereby. The uſer hight Bogorm converſation 20:03, 1 November 2009 (UTC)
In case more "names" need to be listed, I disagree with the bot edits. Meta-wise, I see no consensus on the issue (and therefore a bot should not be run). --Bequw¢τ 21:37, 1 November 2009 (UTC)
No we don't have a consensus for running the bot. All such edits must be done manually and checked in the process against the merged entries for corrections and missing stuff. --Ivan Štambuk 06:41, 1 November 2009 (UTC)
I agree we do not have consensus for running the bot. Nor do we have consensus for removing language headers. Yet Monsieur Štambuk continues to do so. Despite a lack of consensus, Štambuk is disregarding the Wiktionary community's decision in pursuing his crusade. But that has been xyr style. - Amgine/talk 04:04, 2 November 2009 (UTC)
I agree with you that Ivan should have stopped converting e.g. “Croatian” to “Serbo-Croatian” once the community started to debate the issue. That would have been courteous. However, there is a huge difference between performing an edit without consensus, and writing a bot to perform edits without consensus. Wiktionary could not function without the former, and cannot function with the latter. (And I don't understand why you link to his block log. Are you trying to suggest that your unjustified block was somehow "the Wiktionary community's decision"?) —RuakhTALK 04:18, 2 November 2009 (UTC)
Wow, look who's talking, a person who ran Ullmann's script from his own username account (so much for the respect of "community consensus"), who thinks that the same word written in 2 scripts belongs to "different languages", who claims that no peer-reviewed evidence was provided to him that supports the notion of one language... And I find it extremely funny that your rhetoric has changed from "removing languages" to "removing language headers" ! Are we finally accepting the fact that these are all triplicates of essentially one and the same content?!
I created those entries, I'm reformatting them. If you have trouble with that, learn Serbo-Croatian and add separate B/C/S/M sections, if you care so much. --Ivan Štambuk 04:34, 2 November 2009 (UTC)
Goddamn it, Ivan, shut up. Your complete lack of restraint is making everyone supporting your side of the argument look bad. If you have nothing productive to add to this conversation, please bite your tongue. Agree with Ruakh that, in the face of no consensus, everyone is allowed to edit as they see fit, including Robert and Ivan, but automated edits are out of the question. -Atelaes λάλει ἐμοί 11:35, 2 November 2009 (UTC)
Atelaes, M. Štambuk has used technically-assisted edits in his campaign. I believe Cirwin can confirm this. In every way but the semantics, a bot. - Amgine/talk 14:30, 2 November 2009 (UTC)
When you suggested this, I responded "I doubt it", and gave you reasons for this response. Please do not use my privately-expressed opinions to support your arguments: even if you were correct in remembering what I said, it would be polite to ask permission first. Conrad.Irwin 15:43, 2 November 2009 (UTC)
LOL - nonsense. I do all of my edits manually. I have a computer program (that I wrote) that helps me transliterate Cyrillic/Latin script on the fly, and that's it. I haven't run a Mediawiki bot in my life. Stop spreading lies. --Ivan Štambuk 14:34, 2 November 2009 (UTC)
Cirwin: what you stated was you believed the method Ivan used was similar to your own to open 50 or more windows, then saving and closing in quick succession (which would not have given the sustained edit rate M. Štambuk's log occasionally displays, but that's another topic.) You've also said you've used scripts to automate this process. And you said this in a public channel. My apologies if I made logical leaps which were inaccurate. - Amgine/talk 19:09, 2 November 2009 (UTC)
I merely pointed out that that was another method of making quick edits (thus the supposition that a bot was being used was premature); I did not imply that it was the only way, simply that it was a way I have used in the past. The scripts, which I also mentioned, are written in Python and are another way allowing me to do manual edits in a proper text editor instead of tinkering around in a tiddly box in a web-browser. They were unrelated, but mentioned in succession. This part of the discussion took place in a private message session between the two of us. I do not think it merits further discussion. Conrad.Irwin 23:32, 2 November 2009 (UTC)
I do all of my edits manually. I have a program that converts a Serbo-Croatian entry in Latin script to the equivalent entry in Cyrillic script, because Cyrillic characters with the combining diacritics that Slavists use to denote accents are impossible to type. Every single SC entry I've edited was saved manually, by ALT-SHIFT-S. On very rare occasions (when I'm internetless) I pre-edit the entries in a text file, and then copy/paste them in a serialized manner. When editing at a sustained rate, I add entries one by one, never in parallel in multiple windows waiting to be saved. I've never used scripts or bots so far as I have no need for such technology. --Ivan Štambuk 19:30, 2 November 2009 (UTC)

calin adisabldpersn namz has2stop here!

&if its dalasthing i'complish'ere! ps was kwami'api onhis return?butkeep sowin'lazy adminshp.

Excuse me, but you do not get special privileges. You have been calling other people ugly names, but now complain that others are doing the same to you. You do not get to call names, and then complain about the same behavior from other people. You do not get to pass the blame for your improper behavior or its consequences on to the administrators. You need to stop your improper behavior now, and own up to your own mistakes. It is not just calling disabled people names that is a problem, but calling any people names that is a problem. --EncycloPetey 02:05, 13 October 2009 (UTC)
Yeah people with "disabilities" should be treated as equals, which means not worse or better than everyone else. As far as I can see, nobody's talking about your disability apart from you. The rest of us are just worried about your actions. FWIW (personal opinion only) I imagine everyone has some sort of "disability" if you define the term widely enough. Mglovesfun (talk) 12:11, 13 October 2009 (UTC)
User:史凡, you really need to stop griping and try to refocus on doing some useful work. Nobody here is a paid employee, we all do our work pro bono. It is a team effort and each member works at his own pace and does what he feels competent to do. There are no paid employees who are obligated to kowtow to anybody’s needs or whims. If you don’t get along with someone, you don’t have to talk to him. If some of us cannot understand your shorthand, that’s something you just have to learn to live with. —Stephen 19:44, 13 October 2009 (UTC)
I've emailed 史凡 (talkcontribs) and asked what he considers unresolved problems. Unless he responds with a very compelling answer, I see no reason for anybody to keep complaining for special treatment. L☺g☺maniac chat? 20:48, 13 October 2009 (UTC)
(later) 史凡 replied: "undrstndbl-dadeythink deycando wodeva deywnt w/som1who signls arlvntdisablty,instedof doinwot=nrml i/society,nl acomodat dadsabllty[orwotsdawrd~providin ramps4wh ch usrs etc],treat w/respct&giv equaloportunity." That is to say: Understandable - that they think they can do whatever they want with someone who signals a relevant disability, instead of doing what is normal in society, accommodate the disability (or what's the word, providing ramps for wheelchair users et cetera), treat with respect and give equal opportunity.
Apparently he thinks that we're not accommodating him properly. True, a record button has not been set up, but I'm not exactly sure what more he wants. L☺g☺maniac chat? 22:51, 13 October 2009 (UTC)
I think he mistakes us the contributors for people who have influence on those who fund technical improvements and moreover he seems to mistake en.wikt for a "normal" organization. I personally have not the ability, the connections, the financial resources, or the interest in solving this problem. Perhaps it really is a WMF problem. It would have been nice if we could have resolved matters, but the expectations expressed seem to make it clear that we cannot. DCDuring TALK 23:08, 13 October 2009 (UTC)

(unindent) Good analysis. Moreover, it might just be a general problem with the Internet or computers in general. I see no particular reason to single out Wikimedia, other than the fact that that's what this site is, a Wikimedia project. Mglovesfun (talk) 10:21, 14 October 2009 (UTC)

Missing Citations page should link to Wiktionary:Citations

To a casual user who just encountered a new citation, which might never be recorded in Wiktionary if not recorded now, the Citations page seems to be the place to put it, something like sending a letter to the OED in the old days.

However, it seems that putting such a note on the citations page is likely to get it deleted without a trace, causing the citation to be lost forever. See for example the recently deleted Citations:paronomasiac which cited the NYT Magazine On Language column.

BTW I am not going to add it back. IMHO that is SemperBlotto's job.

So there should be more instructions indicating how to get a citation recorded permanently by someone using a mobile device which just can't accomplish a properly formatted entry.

—This unsigned comment was added by Archimerged (talkcontribs) 23:50, 13 October 2009 (UTC).

Well, seeing as you didn't include the actual quotation, misspelled the referent's name (he's William Safire, not william saffire), and seem to have gotten the headline wrong (at least, the one you included does not match that on the NYT Web-site), your contribution was not so much a citation as a vague pointer to where we might find one — fodder less for the citations page than for the discussion page. And given that the word gets well over a hundred hits on Google Book Search, I can't say that such a pointer is really all that necessary, anyway; so, I don't blame SemperBlotto for having deleted it. Recent-changes patrolling always involves this sort of trade-off; you don't get anywhere if you try to fix every entry, so you always have to make a judgment call about whether what's there is better than a redlink. In this case, a reader clicking the bluelinked "citations" tab would have been sorely disappointed; the citation in the entry was far more useful. (However, in this case, I've restored the page, and tracked down the citation and fixed the page, so you can see what sort of thing we have in mind.) —RuakhTALK 01:50, 14 October 2009 (UTC)


Thanks for fixing the entry. It was written using a Kindle. You can't make capital letters (the shift key is ignored by the experimental web browser). Any quotations have to be retyped by hand on a tiny keyboard from memory, and there is no spell check. While reading the paper on the Kindle and away from a computer, I looked up "paronomasiac" wondering if it was a new word, and because the entry doesn't link to "paronomasia" I thought it might be a word coined in 2004 by Spider Robinson, so seeing it used in the NYT would be an important piece of data. I had a choice: try to find out elsewhere if the citation is important (but then I wouldn't have time to record it), or enter it on the citations page. There is no indication that the citations page isn't supposed to be something like the talk page anyway.
Anyway, I think the important thing is that the page displayed from a red "Citations" link ought to say a little about what citations pages are for and link to the official policy page (which is quite difficult to find). Also it ought to link to a place where pointers to citations can be recorded (or say they should be recorded on the talk page), for people using mobile devices who encounter possibly important citations. After all, a dictionary is not just for people who want to look up words. It is for recording the slow alteration of the language. — Archimerged 01:52, 15 October 2009 (UTC)
This is a good point; we should have something similar to the current warning message for templates. Does anyone know where that text that appears on a template edit page comes from? I can't find it anywhere in MediaWiki or template-space. -- Visviva 02:58, 16 October 2009 (UTC)

Accepting input from mobile device users

http://www.nytimes.com/2009/10/07/technology/companies/07amazon.html

Mr. Bezos declined to offer specific information about Kindle sales. But he said Kindle titles were now 48 percent of total book sales in instances where Amazon sold both a digital and physical copy of a book. That was up from 35 percent last May, an increase Mr. Bezos called “astonishing.”

There will be lots of readers with Kindles and other mobile devices, and when they encounter words not in the included dictionary, many will use the clunky and experimental web browser to look in Wiktionary, as I did. The words they are looking for are by definition rare, as they wouldn't be looking in Wiktionary unless the words were not in the built-in dictionary. Is Wiktionary going to accept their input? There should be a fast and easy way to add a short note about where they encountered such words. Archimerged 03:05, 15 October 2009 (UTC)

We have Wiktionary:Requested entries:English (shortcut WT:RE:en), where users can request an English entry that does not yet exist. Those requests which include a quotation or other information are usually dealt with much faster than bare requests consisting of just the word with no contextual information provided. --EncycloPetey 03:09, 15 October 2009 (UTC)
Wiktionary looks awful in my Nokia 6700 browser; I have trouble viewing, let alone adding, any input. --Anatoli 03:11, 15 October 2009 (UTC)

Useless link at Wiktionary:Discussion rooms

Wiktionary:Discussion rooms lists Bug reports under "Other places to congregate" which was eliminated four months ago and now redirects to the Grease pit. Anyone have any ideas on the best way to get rid of it without making the entire table look terrible? --Yair rand 13:50, 15 October 2009 (UTC)

Better?​—msh210 16:42, 15 October 2009 (UTC)
Great. --Yair rand 18:19, 15 October 2009 (UTC)

Scripts

As I said in a previous discussion, I'm working on a project related to letters. I've cleaned up every translingual Latin letter definition that I found and added some dozens of new ones. Entries on Armenian, hiragana, basic katakana and some Braille are done as well.

Each entry has at least a generic definition and a display box containing related information (such as character variations, Unicode and romanization; though the Unicode was removed from display boxes at Braille entries by Bequw, who favors the usage of {{Braille character info}} instead). Examples: ĉ, , , ա and .

My next plans on this project include: cleaning up Braille variants, IPA letters and Roman numerals; creating every Latin, Braille and katakana entry; then starting Cyrillic script and Greek script. --Daniel. 03:28, 16 October 2009 (UTC)

See Appendix:Cyrillic script for an exhaustive list of modern Cyrillic letters. —Stephen 03:39, 16 October 2009 (UTC)
Good stuff! I've got to say, though, that the top placement of {{mul-script}} is not growing on me. It really clogs the top of the entry. Could it possibly go under ===See also=== as a navigation footer, maybe reformatted accordingly? -- Visviva 06:50, 17 October 2009 (UTC)
I was inspired by (among other things) the function "Add links to previous and next pages." from WT:PREFS; a list of related entries at a top placement looks very good in my opinion. But yes, the {{mul-script}} can be moved to the See also section. --Daniel. 13:16, 19 October 2009 (UTC)
I agree they should be in the See also section as they are in the main content space (as opposed to floating on the RHS) and show the entire script. I also believe we should rethink the format. Most of these templates just show lists of letters, so why are we using centered-tables with color? On pages with legitimate uses of tables, RHS elements, and images, I think these script templates look inconsistent, overly large, overlap other elements, and catch too much attention. I'd rather see a plain See also list of letters on ordinary lines than a centered, colored See also table.
--Bequw¢τ 14:42, 19 October 2009 (UTC)
I strongly agree with Bequw. Conrad.Irwin 14:57, 19 October 2009 (UTC)

In my opinion, the suggestion from Bequw is excellent. In other words, I agree on the format of the character list at the See also section as lines. Though all the issues pointed at his message could also be fixed through collapsible tables, like the Galician, Portuguese and Spanish conjugation tables at the entry comer and translation boxes of English entries; this may be a better solution in case of higher quantities of characters, such as at Ǽ, which could contain this:

See also

instead of such links allocated in five distinct lines. (Additionally, hiragana and katakana are mainly always organized as tables, but converting the current {{mul-script/Hira}} and {{mul-script/Kana}} to one or more lines wouldn't be troublesome.) --Daniel. 09:33, 20 October 2009 (UTC)

That does make it tidier, but then the alphabet isn't showing by default, which I think we would all agree it should be. Maybe hide the less-common variants when there are many (most don't have as many as æ), but the main script alphabets should be visible and still consistent with the rest of our UI. --Bequw¢τ 14:52, 20 October 2009 (UTC)
It might be worth rethinking how many of these are actually relevant on every page - I'd be perfectly happy for people who want to look up á to have to look at a first, same with IJ and I or J. That way you only need two rows of the most pertinent links as opposed to a splatter of mainly irrelevant ones, removing the need for yet-another hidey-box-thing. Conrad.Irwin 21:49, 20 October 2009 (UTC)
Bequw, could you explain why you think that, if collapsible tables are acceptable for some or most of the discussed contents, the fifty-two uppercase and lowercase basic Latin letters should always be visible? I'd like to know, especially because the text "The Latin script [show ▼]" is self-explanatory. --Daniel. 03:51, 21 October 2009 (UTC)
First, the collapsible box is not a cure for the inconsistent UI (using centered, colored tables). Using a consistent UI will reduce the space enough so the issue of collapsing boxes is probably moot (I said "maybe" because in general when a section runs too long, one can consider boxing away the less useful parts). Second, for letters in an alphabet, I think the other letters in that alphabet should always be visible (Latin a should show b and Hungarian ű should show ü). This is both because users are very likely to click on these links and because dictionaries often define alphabetic letters in reference to the rest of the sequence. Third, in regards to the obviousness of "Latin script", I think many users of Wiktionary might not know that by "Latin" we mean letters used also in English (definition 2 of our 6) and by "script" we mean a writing system (definition #7 of our 8) but that does not include numbers and punctuation (like Unicode's "Latin" block). It's fine to show the phrase and the letters at the same time, but don't assume users will know correctly what to find by clicking on the "show" link. When we can do w/o this obfuscation we should. --Bequw¢τ 18:30, 21 October 2009 (UTC)

Time M-zine article

Some of you might be interested in reading Time Magazine’s September article on Wikipedia. It is available here: Is Wikipedia a Victim of Its Own Success?. —Stephen 19:59, 16 October 2009 (UTC)

Interesting. Thanks for posting the link. :) L☺g☺maniac chat? 21:54, 16 October 2009 (UTC)
Related talk from Wikimania 2009: Interpreting Wikipedia’s Demographic Decline: Implications for an Emergent Community (video, 79.4MB, 1:16:00). --Ivan Štambuk 00:28, 17 October 2009 (UTC)

Protologism-by-template

I came across the word zwavelzuurtjes. The Dutch word zwavelzuur means sulfuric acid, and this is the diminutive plural of a word that does not really have a plural or a diminutive. Moreover the word "zuurtjes" does exist. It denotes a form of candy, a bit like lifesavers. That means that "zwavelzuurtjes" sounds like a morbid joke: sweets filled with sulfuric acid.

The problem here is that a template like nl-noun generates protologisms like this by default. I think it should be the other way around imho: the default should suppress the diminutive and one should have to specify the diminutive if there is one, because it is certainly not so that all Dutch words have a diminutive even if one can grammatically be formed.

As the default is now it invites people with very limited knowledge of this language (or none at all) to generate large amounts of nonsense. This is unfair to the native speakers and unfair to our language. Besides, real protologisms are suppressed here, even when they do have genuine semantics.

The problem is not limited to nl-noun. I also see superlatives and comparatives of adjectives that really do not have those, or forms like "farmacologischst" that are simply wrong. (Words on -isch typically take meest (most) as in English if they have superlatives).

Imho trustworthy dictionaries should provide factual information on words and forms of words as they are really used (or not), not gobs of bogus forms that were computer generated to boost the article count. Jcwf 21:50, 16 October 2009 (UTC)

Don't the templates involved have the option to suppress the additional forms? {{en-noun}} is the best one in English for handling such matters, specifically allowing "-" and "?" as parameters, "-" displaying "uncountable" and "?" displaying nothing but adding the item to a hidden category for unknown or uncertain plurals.
What portion of nl nouns generate spurious diminutives? Do nl templates offer options for suppressing some forms? DCDuring TALK 23:21, 16 October 2009 (UTC)
You can use {{nl-noun|plural|-}} to suppress the diminutive. But as with {{en-noun}}, many users won't realize this. The few times that I've dealt with {{nl-noun}} doing cleanup, I've just kind of prayed that the template knew what it was doing. :-) -- Visviva 06:04, 17 October 2009 (UTC)
Diminutives are sth that shouldn't be appearing in the inflection line in the first place, but under ====Derived forms====. Do diminutives play such a prominent role in Dutch that they ought to be conspicuously indicated that way? Do Dutch (mono- and bilingual) dictionaries list the noun diminutives that way normally, or list them under separate headwords?
Comparatives and superlatives should obviously be made optional parameters, defaulting to the behaviour that is characteristic of the bulk of adjectives in the language (I assume that most of the Dutch adjectives/adverbs can be graduated into comparatives and superlatives..) --Ivan Štambuk 00:13, 17 October 2009 (UTC)
It always seemed odd to me that we had the diminutives in the inflection line, but I figured it must be a Dutch thing. Hopefully more Dutch-speaking editors will weigh in. -- Visviva 06:04, 17 October 2009 (UTC)
Support changing the default template behavior to avoid creating misleading redlinks. "First, do no harm," and so forth. I assume, though, that this would then require editing the many Dutch nouns for which legitimate diminutives are being template-generated. -- Visviva 06:04, 17 October 2009 (UTC)
support for the same reason: it's better not to provide some useful information than to provide wrong information. Lmaltier 06:20, 17 October 2009 (UTC)
I support doing whatever people decide on ANL. Not an issue the community at large needs to get involved in IMO. —msh210 18:10, 19 October 2009 (UTC)
I oppose changing default template behaviours and support morons not adding shit they don't know. — [ R·I·C ] opiaterein — 19:32, 22 October 2009 (UTC)
I don't know Dutch (and am ipso facto a moron), but I feel as though templates shouldn't offer something by default unless it's going to be relevant more than 50% of the time. Is this the case here? (Example: our {{en-adj}} doesn't automatically put -er and -est forms on every adjective, because loads of them don't take it or are irregular.) Equinox 21:08, 22 October 2009 (UTC)
I'm not sure the original complainant is a native Dutch speaker, but he does say he spends his time on nl.wp. So, even after we solve the moron problem, we have the problem of bad default behavior of the template. The solution lies with the active contributors to such entries.
About morons: If we only allow non-morons to contribute, then the non-morons had better get on the stick: If a language has 200,000 lemmas and many more non-lemmas and there are two serious contributors doing 20 all-new entries per day (no translations) with etymology and pronunciation and full inflection, it would seem that it would take 5,000 days. Then either one has to get more non-moronic contributors or reduce the number of languages (!!!). OR one could make it easier for morons to make some kind of contribution. And, who knows, perhaps their moronity is not a genetic condition, but one that can be overcome by this new biotech thing I've heard about called learning. DCDuring TALK 21:43, 22 October 2009 (UTC)
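The arithmetic in the estimate above can be checked with a minimal sketch. The function name is hypothetical; the figures (200,000 lemmas, two contributors, 20 new entries each per day) are the ones given in the comment, and non-lemma forms are ignored just as they are there.

```python
def days_to_cover(lemmas: int, contributors: int, entries_per_day: int) -> float:
    """Days a fixed group needs to write one entry per lemma.

    Assumes every contributor writes `entries_per_day` new entries
    each day and that nobody duplicates anyone else's work.
    """
    return lemmas / (contributors * entries_per_day)


# Figures taken from the comment above.
days = days_to_cover(lemmas=200_000, contributors=2, entries_per_day=20)
years = days / 365
print(f"{days:.0f} days, roughly {years:.1f} years")
# prints "5000 days, roughly 13.7 years"
```

Doubling the number of contributors halves the figure, which is the point being made about widening the pool of contributors.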
I strongly support moron contributions. I hate contributing to entries for words that I actually know something about. So boring... -- Visviva 12:39, 23 October 2009 (UTC)
Support for changing templates so that they do not create gibberish by default. Oppose making assumptions like "people will know how to suppress incorrect information". Strong oppose to classifying those who don't as morons. When I first came here I barely understood what a template was, and certainly didn't know enough to find the Template: page that described its use. I thought the learning curve was pretty steep then. I can't imagine what it must be like now. DAVilla 06:19, 9 November 2009 (UTC)

Logo vote

As everyone here probably already knows, there has been a discussion on changing the Wiktionary logo going on on Meta for quite a while. There were a bunch of logo proposals, a few arguments, no one having a clue what anyone's talking about, the discussion page getting insanely long, yada yada yada. Recently, there were doubts on whether there should be a logo vote in the first place as people started to notice that two thirds of the votes to start a logo vote were in fact not from Wiktionarians, but from Wikipedians and Meta-people. So, to finally set the record straight on whether people are in favor of a vote I am asking here to find out who does, and who doesn't, want a new logo.

So, do the people at the English Wiktionary want a new logo?

--Yair rand 17:21, 18 October 2009 (UTC)

What logo do you consider as the current logo? The logo used here or the official tile logo? I think there should first be a vote here about adopting this tile logo (AFAIK, no vote has been organized here about this tile logo yet). Lmaltier 17:32, 18 October 2009 (UTC)

I used to not like our current text-logo, then it sort of grew on me. Then I saw this proposal, and it stole my heart. Now, I want a vote to adopt it. --Vahagn Petrosyan 17:40, 18 October 2009 (UTC)

I still hate our current text logo. I would do a vote. --Internoob (TalkCont.) 18:19, 18 October 2009 (UTC)
I don't care, but might vote to block a worse alternative. Equinox 18:42, 18 October 2009 (UTC)

I agree with Vahagn about the current logo having grown on me, but find the book-and-globe one he's pointed to unobjectionable (which is more than I can say for the icky tile logo). --EncycloPetey 18:27, 18 October 2009 (UTC)

Except that it is what users are accustomed to, the current logo-in-use doesn't have much to recommend it IMO. But that is a big "except". Perhaps it is too English/Roman alphabet-oriented, too busy/doesn't scale well to small size. I'm not sure how much it matters as most new users don't select Wiktionary because of a logo, but come to us from WP links or search engines and portals. Logos become invisible over time, too (and we can turn the corner one off). A logo can help a bit if the colors are a bit different from what the other English-language projects use, biased toward whatever colors are in the logo(s) other language Wiktionaries choose. The color distinctiveness might help users find our logo in the various boxes where it appears with the other projects'. I think it's a good thing if we get people other than ourselves involved (WPs and WMFs), as they are a little more representative of a large segment of our actual users than we are likely to be. DCDuring TALK 18:47, 18 October 2009 (UTC)
I don't particularly like either the current text logo or the tile logo, and actually if we voted, the one Vahagn pointed to is the one I was planning to vote for. And yes I think the vote should be started. L☺g☺maniac chat? 19:48, 18 October 2009 (UTC)

I would vote to block a worse alternative. Frankly, I don't mind the text logo. I could care less about it, except that the tiles are disgusting, unprofessional, and look horrible, and some of the alternatives are worse. The other thing that strikes me about the vote is that most of the people that called for it in the first place and the people who are adamant about the current en.logo being horrible are people that never edit Wiktionary. And just … I can't coherently get my thoughts together any better than that. --Neskaya contribs talk? 00:55, 19 October 2009 (UTC)

  • I just don't care. At all. Logo, shmogo. I'm not here to build anyone's effing brand. But how about this? We could just make the Wiktionary logo an image selected at random from the proposed logos, rotated daily. Variety is the spice of life. More realistically, I would support a binding vote here to accept whatever comes out of Meta (if anything). -- Visviva 02:51, 19 October 2009 (UTC)
  • I hate all the voting at the drop of a hat. Is the logo a problem?
    • No - let's build a dictionary.
    • Yes - let's talk about what is the problem, figure out how real a problem it is, and the solution should become obvious.
    - Amgine/talk 22:29, 19 October 2009 (UTC)
  • I know you're being a bit frivolous, but rotating the logo is the worst, most confusing thing we could do. There is a need for a consistent "brand", even if it's only our current little bit of boilerplate text. I'm just not that picky personally as to whether it's a Scrabble® tile or a book or some variant of the Wikimedia logo or whatever. Equinox 22:32, 19 October 2009 (UTC)
No, please, don't rotate the logo! L☺g☺maniac chat? 22:44, 19 October 2009 (UTC)
But... but... people could tune in every day to see if we had the Goatse Wiktionary or not. No? Man, I never get to have any fun. :-(
Well, I am being frivolous. The whole issue is frivolous. Visual branding isn't that important because it applies only to our web interface, which is of marginal significance to the mission of building a universal, free-as-in-freedom dictionary.
The people on NL have a very good point, IMO; we did this already, so let's just accept the outcome of that process, however flawed it may have been. If we don't do that, then we should at least vote on whether to accept the outcome of the new process before we make all the other Wiktionaries waste their time. Again. -- Visviva 03:16, 20 October 2009 (UTC)
I was actually hoping that votes whether to accept the outcome would be held after the votes on which logo is best with the understanding that no one takes it unless a clear majority of the Wiktionaries do, as that seems to be the only way to make sure that we don't split the logos even more. Really, the whole process to start a vote doesn't seem too difficult to go through. Since there has been a decision that the tile logo is too horribly ugly to use, I think that holding a vote is worth it just to stop those annoying "isn't that pronunciation wrong?" comments, even without the annoyances that the favicon is identical to WPs and that our entries look nothing like that and that it's ugly and the fact the current logo isn't even being used by about half of the Wiktionaries. Really, would holding a vote really take so much time away from building the dictionary as to make it undoable? --Yair rand 04:15, 20 October 2009 (UTC)
No particular objections to a vote, now or later. But FWIW, I think having a long and laborious process, and then a vote on whether to accept the outcome, is a recipe for failure and aggravation. People who didn't bother to engage in the initial process at all will suddenly come out of the woodwork with tendentious objections to the result. Isn't that more or less what happened the last time around? So why don't we make it clear that we will (or won't) accept the results of the Meta process, and then allow that process to take its natural course? Then the outcome of the process will be accepted by default, and anyone wanting to raise objections after the fact will have to show that there is a consensus against the result of the process (which is unlikely, unless there are very serious problems indeed). -- Visviva 07:10, 20 October 2009 (UTC)
That would really only work well if all or most of the Wiktionaries had votes like that, and the vote would have to state that in the event that the majority of the wikts don't vote positively to whether to accept the outcome of the vote, then the vote will not happen. If it's clear that all or most Wiktionaries will accept the outcome, then the tile-users will likely see it as a chance to unify the logos (which in itself may or may not be a good thing). In that case, if we hold a vote like that first and it's clear we'll accept the outcome, the NL Wiktionarians might participate. --Yair rand 13:25, 20 October 2009 (UTC)
Actually, forget it. I support having a vote whether to accept the outcome of the Meta vote in advance, whether it states the extra stuff I said or not. --Yair rand 14:19, 20 October 2009 (UTC)
Sorry for changing my mind again, but I actually don't have the slightest clue which is better, voting whether to accept before or after the vote. Voting after is probably more likely to get positive votes from the Wiktionaries, but voting before would mean a lot less wasted time if people vote negatively. I dunno. --Yair rand 01:06, 21 October 2009 (UTC)
How about this: horse-cart or cart-horse? Should a logo be proposed by Wiktionary to the WMF, or should the WMF propose a logo to Wiktionary? How you answer that may suggest how you feel about Wiktionary and WMF. - Amgine/talk 20:28, 21 October 2009 (UTC)
(unindent) Personally, I think it'd be better if we proposed the logo to WMF. L☺g☺maniac chat? 22:28, 21 October 2009 (UTC)
I don't know what you mean by WMF proposing a logo to Wiktionary. There was a 10-week nominations period when anyone could suggest a logo and most of the proposals came from Wiktionarians (Stephane8888 from the French Wiktionary, V85 from the Norwegian Wiktionary, Moilleadóir from ga wikt, Diego from pt wikt, Vildricianus and DAVilla from here). Amgine, are you saying that the current voting setup (in which most of the work is completed) isn't good enough because it didn't originate on en.wikt? Does it really matter that much? --Yair rand 23:17, 21 October 2009 (UTC)
Yes, it really does matter that much. - Amgine/talk 03:13, 22 October 2009 (UTC)
I think we would all have preferred it if the vote originated on Wiktionary, but this logo discussion has been going on since March, a large number of Wiktionarians were in favor of it, most of the necessary work has been done and Amgine seems to be the only one who thinks that it is so much of a problem that the entire vote should be abandoned. Is there anyone else who is in favor of abandoning the whole vote because the voting discussion did not originate on Wiktionary? --Yair rand 05:17, 22 October 2009 (UTC)
Just out of curiosity, what good do you (anyone) think it would do if we abandoned the vote? L☺g☺maniac chat? 14:42, 22 October 2009 (UTC)
IMO the visual branding question is mostly a matter of concern to WMF and not much to us. (But how important is it even to them?) Ideally we could veto a really bad logo but accept almost any reasonable choice without further delay. If WMF or the active participants really care and want, we could commit ourselves to accepting anything they propose, unless we have a vote that rejects the proposal by, say, a 2/3 (or 60% or 75%) majority. Between a final veto and an opportunity to suggest criteria and make specific proposals we would have had enough opportunity to participate. DCDuring TALK 15:08, 22 October 2009 (UTC)
How could it possibly come from Wiktionary? There is no one Wiktionary; we on EN may like to imagine ourselves as the hub around which the other projects revolve -- we may even fill that role in certain limited respects -- but ultimately we are just one among well over 150 coequal Wiktionaries. If we object because the initiative didn't come from us, why wouldn't NL, FR, VI, or NV object to any initiative from us on the same basis? Fortunately, there is a project dedicated specifically to cross-project coordination: Meta. And fortunately, that's exactly where the discussion is taking place. -- Visviva 15:22, 22 October 2009 (UTC)


  • Personally, I would greatly prefer if someone from the WMF just stepped in and imposed a logo. It won't happen, of course, mostly because they're a bunch of gutless wonders who will happily countenance flagrant bullying (ru.wikibooks) and mass copyvio (ta.wiktionary) as long as "the community supports it." And so we will all go on wasting our time on these non-issues ad infinitum. Maybe I should write a proposal for that StrategyWiki: "Grow A Spine." :-D / *sigh*. -- Visviva 15:22, 22 October 2009 (UTC)
OK, I see what you mean by that. I guess I really don't care who starts the vote, but someone please do ... L☺g☺maniac chat? 15:27, 22 October 2009 (UTC)
The work to start the vote is going on as we speak. The voting page still needs translations into de, fr, ja, ru, tr, lt, and vi. --Yair rand 16:03, 22 October 2009 (UTC)
Well, what I would prefer to see is if wiktionary languages approached each other about normalizing the logo if it were going to happen at all. It's clearly not an issue for the project, however, so it has never happened. But it seems every so often someone outside the project brings it up as something that is life or death and then we end up with months of drama culminating in a divisive vote dominated by people outside the project who attempt to impose a logo which has little or no meaning to the project.
It'd be great to see all the effort currently being put into this crusade instead being put into trans-Wiktionary discussion about whether this is important to the project. And an option to vote against the logo vote in the logo vote. - Amgine/talk 18:39, 25 October 2009 (UTC)
How in the world could there be a trans-Wiktionary discussion? There are 172 independent Wiktionaries none of whom speak the same language. No one is going to impose a logo on anyone. There are only going to be votes from Wiktionarians and no one is going to take the winning logo without agreement by more than 60% of the Wiktionaries. I don't think anyone but you thinks it would make any sense to abandon the vote so the vote will probably proceed as planned. --Yair rand 19:01, 25 October 2009 (UTC)
I would guess a trans-wiki discussion would happen in much the same way I currently have daily discussions with wiktionarians on multiple projects - through a range of online media including on-site, e-mail, microblogging[fr][nl][4], blogs[fr][es], and even in IRC[fr][de] &c. Where we don't talk is on meta. - Amgine/talk 19:58, 25 October 2009 (UTC)
But what is wrong with talking on meta? That's the whole point of meta, to have discussions that affect multiple projects. Meta works perfectly, and the vote has no real problems with it, so there is no real reason to stop it. How would off-meta discussions be any better? I see no reason to stop the vote just so we can start from scratch setting up the voting process off of the project that was built for that purpose. Perhaps I'm simply not understanding you correctly. --Yair rand 20:39, 25 October 2009 (UTC)
There's nothing wrong with meta for trans-project discussions. But this is within a single project, and the vast majority of Wiktionarians don't go to meta. It's not how things are done, which isn't a judgement on the benefits or drawbacks of any system or process; only an observation. I've been saying this since, what? March? April? and yet the Meta process has bowled along ignoring this. If you make decisions where the community is not, the community will not feel they've made the decision. That's not at all difficult to understand. And, to be blunt, it doesn't matter if you stop the vote or not. - Amgine/talk 22:04, 25 October 2009 (UTC)
If I'm understanding you correctly, you are saying that because the discussion was held on meta instead of somehow between Wiktionary projects, the community will feel that the vote was constructed without them, and because of that the end vote will have less than 60% of Wiktionaries approving of the winning logo and the whole thing will have been a waste of time. I find this to be an extremely unlikely possibility, and I don't think that many people agree with you on this. As there is no consensus to stop the vote, the only thing we can do is wait and see. I think that the vote will most likely be a success. --Yair rand 22:38, 25 October 2009 (UTC)
No. I'm saying the die is cast; the acceptance or lack of acceptance of any result from meta is unlikely to be changed at this point because it was begun outside the community, by someone not a wiktionarian, and continues to be dominated by non-wiktionarians. Most of us, I expect, don't care too strongly what the result is; we're tired of having this argument over and over again, we're tired of non-community members pushing their idea of what the perfect logo is, we're tired of having so much of our very limited energy tied up in arguments, and we'll likely go along with whatever strident voice is screaming at the end of this.
And we'll hate it.
And people like you, who've been around not even as long as this argument has been going on (yes, I know, you were on Wikipedia for several years, but you started here under this nick in July) will have won the day. And what will you have won? Why was it so important to you?
Never mind, you're plainly incapable of thinking beyond tomorrow. Go have your little poll and your victory march. - Amgine/talk 02:29, 26 October 2009 (UTC)
(unindent) The logo discussion is still going on not because a meta person decided to bring it up, not because of the bits of help from non-Wiktionarians, but because the community wants it as has been said again and again, throughout the discussion and poll on meta. You have made a claim that the vote will be a disaster, that nobody wants it and that the community will hate it, and I see no evidence whatsoever that supports that claim.
It is clear that this discussion is getting us nowhere. This apparently pointless debate is now over. --Yair rand 15:02, 26 October 2009 (UTC)

Names of specific entries (again)

I thought I'd copy the URL to save time (here). The several problems I can find (for me, anyway) are:

  1. What is a specific entry? Proper nouns aren't always specific entries, like Stephen as you can have several Stephens. So it refers to things that there are only one of, right? Still, ambiguous.
  2. Used attributively. Okay I'll spare you use#Verb but attributively and attributive aren't very detailed and give almost no help. If it's just the grammatical sense of attributively, then almost any place name will meet CFI because place names (in English) don't have adjectival forms. So I'm from Leeds. A Leeds taxi, a Leeds restaurant. All of these attributive, right? I'd think as long as the place name is not extremely, the attributive form won't be either.
  3. Widely understood meaning. A debate broke out over Daffy Duck that even though it can be cited attributively, it doesn't have a widely understood meaning. What does that even mean? I mean Daffy Duck means a cartoon duck, right? In the same way that Leeds means a city in West Yorkshire. Is there another meaning of Leeds? Or Confucius means a 5th Century BC philosopher. AFAICT "widely understood meaning" doesn't put any limits on what the meaning is, it just has to have one.

Admittedly, and I can barely stress this enough I don't really have a better idea but maybe someone else does? Mglovesfun (talk) 10:10, 19 October 2009 (UTC)

I have boldly edited Wiktionary:Editable CFI#Names of specific entities to reflect my understanding of this annoyingly oracular passage. I hope that those who disagree will boldly revert or revise my edit, and perhaps we can eventually work out something that will have a working consensus behind it. Here's the problem as I see it: the lack of clarity of the current wording has given everyone cover to read whatever they want into it (or to ignore it entirely). Any efforts to clarify it thus mean that someone's ox will get gored, and so are voted down for the most ridiculously disingenuous reasons. It's worth reiterating that there was never any consensus for CFI to be set in stone; the current state of {{policy}} is purely the product of some erratic editing by Richardb and Connel in 2006. -- Visviva 10:57, 19 October 2009 (UTC)
That's already about 50 times better. If we could have a few more people edit it and then have a vote on it, I would be pretty happy. But at the very least, it has to be worded in a way that is a lot less ambiguous, I tend to think that everything is at least a little bit ambiguous. But hey, c'est la vie. Mglovesfun (talk) 11:16, 19 October 2009 (UTC)
For (1) see Appendix:English proper nouns#Proper nouns as common nouns. Visviva got my understanding of (2+3) right (a definition independent of the referent). --Bequw¢τ 14:17, 19 October 2009 (UTC)

Until we can agree on sets of proper nouns that should be included in their own right, I don't see a successful vote to make the change. The current proposal makes one criterion clearer, but rewords it in such a way as to exclude most names of countries, etc., which is a set of proper nouns most community editors have supported for inclusion in the past. I have made one small edit to the last sentence, since the way that it was worded would require that we include Thomas Jefferson (or any other full name) if there were more than one individual with that name, which is probably not what was intended. --EncycloPetey 01:52, 20 October 2009 (UTC)

Request for bot status

I am formally requesting bot status for my new bot, User:Di gama bot. It is not intended for any long-term 24×7 operations, but rather for short-term "projects" involving many edits. At the moment, it is being geared up for a mass page move of sign language entries following a change in the entry name style, in accordance with the proposal at Wiktionary talk:About sign languages. As of this message, the matter is not settled, but a change of some sort is likely and this bot is intended to do the heavy lifting.

As noted above, the job requires the ability to move pages and I don't think bots innately have that privilege, so it would need to be autoconfirmed as well. The program is not hashed out yet, but it will use a stock script of Pywikipedia in conjunction with a hand-gathered list of moves to be done. As I said earlier, it is a short-term project, and will be monitored by me for the duration. Any future projects using this bot will be requested similarly. —Di gama (t • c • w) 00:54, 30 September 2009 (UTC)
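As a rough illustration of the "hand-gathered list of moves" mentioned above, here is a minimal sketch of checking such a list before any bot run. Everything in it is an assumption made for illustration: the one-move-per-line `old -> new` format, the function name `parse_move_list`, and the placeholder titles. It is not the bot's actual program and it makes no Pywikipedia calls; the moves themselves would still be carried out by the stock script.

```python
def parse_move_list(text: str) -> list[tuple[str, str]]:
    """Parse a hypothetical move list with one 'old -> new' pair per line.

    Blank lines and lines starting with '#' are skipped. A source title
    listed twice is rejected, and a line whose old and new titles are
    identical is dropped because there is nothing to move.
    """
    moves = []
    seen_sources = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "->" not in line:
            raise ValueError(f"line {lineno}: expected 'old -> new'")
        old, new = (part.strip() for part in line.split("->", 1))
        if not old or not new:
            raise ValueError(f"line {lineno}: empty title")
        if old in seen_sources:
            raise ValueError(f"line {lineno}: '{old}' is listed twice")
        seen_sources.add(old)
        if old != new:
            moves.append((old, new))
    return moves


# Placeholder titles only; real entry names would come from the agreed style.
sample = """\
# hand-gathered moves (hypothetical format)
Old entry name A -> New entry name A
Old entry name B -> New entry name B
Unchanged entry -> Unchanged entry
"""

for old, new in parse_move_list(sample):
    print(f"would move: {old!r} to {new!r}")
```

Checking the list offline like this keeps mistakes in the hand-gathered data from turning into page moves that have to be reverted by hand.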

I don't imagine there would be any objections, but I believe this is technically supposed to go through a formal WT:VOTE after being mentioned here. -- Visviva 12:14, 1 October 2009 (UTC)
I've made a new vote page at Wiktionary:Votes/bt-2009-10/User:Di gama bot for bot status, but I notice a problem with the instructions at WT:VOTE, which state:
  1. Replace “Title of vote” with what you’d like to start a vote on, or add the relevant user in one of the boxes below.
  2. Ctrl-A, Ctrl-C (select all, copy) of the text in that input box (it is the new link).
  3. Edit in a new tab that page and add the one line of magic text below the editbox. Again, replace “Title of vote” with your exact vote topic name. Remember to add {{ and }} around your pasted text.
  4. For nominations that should be listed in multiple places (in essence, WT:A, WT:B or WT:C) open that page in another new tab and repeat the pasting of the transclusion line.
  5. Click the button below the relevant box, fill out the form displayed and save it.
#'s 2 and 3 require editing WT:VOTE, a protected page (as of '06, apparently). —Di gama (t • c • w) 06:48, 2 October 2009 (UTC)
You should be able to edit WT:VOTE; it's only semiprotected, and you've had an account since early last month. Are you only getting the "view source" option? -- Visviva 07:20, 2 October 2009 (UTC)
Ha, I'm clever. I had an "edit" button, but I turned away when I saw the header which said the page was protected (it actually says protected, not semiprotected), not bothering to try actually saving the page. Just goes to show what glossing over notices can get you. Thanks! —Di gama (t • c • w) 08:26, 2 October 2009 (UTC)

The vote needs more input.​—msh210 18:25, 19 October 2009 (UTC)

Daily dump RSS feed

For anyone interested, I have created an RSS feed for our daily dumps. You can subscribe at http://www.devtionary.org/cgi-bin/feed.pl - please report any bugs or feature requests. I've tried to make it as similar as possible to the RSS feeds for the official dumps from Wikimedia. — hippietrail 00:09, 20 October 2009 (UTC)

Translingual translations

Corvidae. On the surface this seems rather contradictory. If there are translations, then how can it be called 'Translingual'? Nadando 21:12, 20 October 2009 (UTC)

These terms are translingual by virtue of being understood by biologists the world over. HOWEVER, they are, in practice, New Latin, and as such pretty much all the taxonomic levels higher than genus have mostly standardised forms in vernacular languages. Usually this is a standard form for the suffix combined with agreed rules of transliteration, although many languages function differently in assigning vernacular terms. Circeus 01:33, 21 October 2009 (UTC)
I supported these once, but looking at Corvidae#Translations, it's obvious how badly this can go wrong. At best, these seem like translations of "crow family"; some, however, bear a strong resemblance to wikispecies:Corvidae#Vernacular names, which suggests that they just mean "corvid". So either we come up with a standard header template for these that makes very clear that only non-Roman exact equivalents should be entered, or we just need to remove them entirely. The linked entries can get by on their own, I think; appendices can take care of the cross-referencing. -- Visviva 02:26, 21 October 2009 (UTC)
Come to think of it, isn't crow family idiomatic enough for inclusion? Why don't we just move all of the "translations" thither? -- Visviva 02:32, 21 October 2009 (UTC)
Crow family is hardly idiomatic in English. We already have corvid#Noun.
But, in principle, couldn't there even be multiple transcriptions of these taxonomic names into the various scripts? And since we have translations into languages, not scripts, we could have multiple ones for each language using, say, the Cyrillic script. Is it more standardized than that? DCDuring TALK 02:48, 21 October 2009 (UTC)
How is it not idiomatic? It's a proper noun, even. (Or if it isn't, we have a whole lot of mislabeled entries.) How would someone guess the specific meaning from crow + family? Or how would a non-English speaker know that it was the "crow family" and not, say, the "jackdaw family"? On the other hand, it's difficult to say that Corvidae has a distinct meaning apart from "the corvids", so perhaps there is no real need for a translation handle. On the third hand, 까마귓과 means "Corvidae" and definitely not corvid. So there is a risk of orphan translations (not that that's a huge problem in itself).
I don't really know how these issues are handled outside of the CJK scripts, but w:Corvidae has (unsurprisingly) very different interwikis for Russian and Kazakh. Neither is a transliteration. -- Visviva 03:33, 21 October 2009 (UTC)

Bot: MerlLinkBot

  • Operator: Merlissimo
  • Function overview: changes external links which are outdated and can be successfully replaced by a new one.
  • Function Details:

The bot replaces URLs that have to be changed. This can be just a domain change or a more complex change to a website's page structure. Links are detected with the help of the API (not with regex) and are only replaced if the web server of the new URL returns a 200 status response for the new resource. “Link text” is not changed. (Own framework written in Java, used by all of my bots.)

  • Operation: controlled
  • Software: Java (own framework)
  • Has bot flags on: 30+ Wikipedias and some other projects (e.g. dewiki (home), enwiki, simplewiki, commons, enwikinews) (see all flags)
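
A minimal sketch of the replacement check described in this request, written in Python rather than the operator's own Java framework: a rule-based rewrite followed by a status test, so that a link is only swapped when the new URL answers with 200. The rule table, the function names and the example mapping from europa.eu.int to europa.eu are illustrative assumptions, not the bot's actual code.

```python
from urllib.parse import urlsplit, urlunsplit
import urllib.error
import urllib.request

# Hypothetical rule table (old host -> new host); a real bot would carry
# richer rules for page-structure changes, not just domain swaps.
DOMAIN_RULES = {"europa.eu.int": "europa.eu"}


def rewrite_url(url, rules=DOMAIN_RULES):
    """Return the rewritten URL, or None if no rule matches the host."""
    parts = urlsplit(url)
    new_host = rules.get(parts.hostname or "")
    if new_host is None:
        return None
    # Only the host is swapped; any port or userinfo in the old URL is dropped.
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if the server is unreachable.

    urlopen follows redirects, so this is the status of the final response.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


def replacement_for(url, status_of=http_status, rules=DOMAIN_RULES):
    """Return the new URL only if a rule applies and the new URL returns 200."""
    candidate = rewrite_url(url, rules)
    if candidate is not None and status_of(candidate) == 200:
        return candidate
    return None
```

Passing the status checker in as a parameter keeps the decision logic testable without network access; the similarity check against an archived copy mentioned later in the thread is not covered here.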

Mglovesfun asked on my bot's talk page that I request permission/a bot flag on this page. The bot is running globally and, as you can see there, there typically aren't many weblinks on Wiktionary projects. Merlissimo 16:15, 21 October 2009 (UTC)

How does it find the replacement URLs? (Via w:HTTP 301 headers??)​—msh210 16:21, 21 October 2009 (UTC)
If there is only a 301 header, the browser automatically redirects you to the new URL, so there is no need to rewrite the URLs.
How do I find the new locations? That can sometimes be very complicated. The easiest case is when webmasters give hints (they ask on a project for a rewrite, or I write them a mail). Sometimes there is a hint at the old location (like europa.eu.int). And mostly Wikipedians (or Wiktionarians? ;-) ) know how the rewrite could be done. The bot can read headers and content or do a rule-based rewrite. I test a great deal before the bot really starts running. The bot always tests the response code of the new location and can do a similarity check (e.g. with an archived version), so it is guaranteed that the content is the same. Merlissimo 17:03, 21 October 2009 (UTC)
It would seem most sensible to run the bot without a flag so that we can see what it is doing. If it starts doing lots of edits (more than a few a day) then adding a flag may be helpful. As long as it is just updating broken links, I cannot see any problem. I assume it cannot handle links generated partially in templates (i.e. {{{googleid}}} on {{cite-usenet}}), can it produce a list of those that are broken? Conrad.Irwin 19:09, 21 October 2009 (UTC)
Not more than fifty edits, at any speed, or ten a day, in any quantity, sounds reasonable to me.​—msh210 19:38, 21 October 2009 (UTC)
@Conrad Of course I can check some links and generate a filtered list on a specified page. On enwikinews and some de-projects I am also reporting broken links on the discussion pages.
Simply tell me which links to check, how to filter (e.g. broken only) and where I should put the list. I can also check whether a URL is available at different archive services. My framework is very modular, so that would be very easy. Merlissimo 19:53, 21 October 2009 (UTC)
If I knew which links to check, I'd have checked them already :p - I assumed that's what you were proposing to do, no matter either way. Conrad.Irwin 20:42, 21 October 2009 (UTC)
Ah, ok. My bot can also change links in the template namespace. More complicated templates are repaired manually by me. External links produced by templates (e.g. googleid) in the article namespace are normally not rewritable; they are simply broken. Templates like cite-usenet, which contain the complete URL as an argument, can be handled by my bot. Merlissimo 17:42, 22 October 2009 (UTC)
I added a note on the bot user page that it runs without flag: [5] Merlissimo 17:48, 22 October 2009 (UTC)

Morse code

How should we treat Morse code (and related codes such as Russian Morse code)? In its basic form, Morse code is just a character encoding scheme. Some encoding schemes don't get entries for their individual characters, such as signal flags or semaphores, while others do, such as Braille (because they have their own Unicode codepoints) and ASL manual alphabet (because we have a consistent transcription scheme and because it's part of real language like ASL). Aside from single letter encodings, Morse code can represent control characters (with Prosigns), and special letter combinations have specific meanings (special abbreviations, Q codes, and Z codes). As I see it, there are several things that should be decided.

  1. Currently, our Latin letter pages (and "Variations of .." pages) show Morse code but link to the audio representation. Should we extend this to multi-letter terms? If so, would we extend the other encoding systems (e.g. semaphores) to multi-letter terms as well?
  2. Do we have entries for individual Morse code encodings ("..." = 'S')? One problem is that there is no standard textual transcription of Morse code (see Wikipedia discussion). For "dots" people use periods '.' (which is problematic for linking), bullets '•', and middots '·'. For "dashes" some use the hyphen-minus '-' and some the em-dash '—'.
  3. Do we list special letter combinations, and if so which ones? Some, like SOS and CQ are common outside of Morse code, but many are not. If we do include them, do we represent them in Latin characters or as Morse code sequences? ... --- ... ("SOS") used to exist but was deleted for being "encyclopedic".

I'd say for #1 that we should only do single letters. I'm not sure about #2. For #3 I'd like the terms listed in Latin letters but not in Morse code. Anyone else have thoughts? --Bequw¢τ 16:50, 22 October 2009 (UTC)

[e/c] I think we should not include Morse-code pagenames (#2), in part because (per Bequw) there's no unicodification of the Morse code. Things like QSL and SOS (#3) we should have; attestation can be easy, by means of transcribed telegrams quoted in durably archived sources. For #1, Bequw seems to be asking whether an entry like [[foo]] should include the Morse-code representation of foo as... an =Alternative spelling=? a =Trivia= fact? Either way, I think not. That's an easy one-to-one correspondence anyone can figure out by checking a table of codes, so there's no reason to include it; and a downside is the wasted screen space.​—msh210 17:14, 22 October 2009 (UTC)
Can't hurt to have entries for the code for individual letters. I'd support having the entry for the code for something like SOS if citations could be found that indicate that a reader might come across it 'in the wild' without a referent to the letters for which it was a code. bd2412 T 17:03, 22 October 2009 (UTC)
Wiktionary is not Unicode's bitch. All for inclusion of letters by whichever transcription is preferred. Terms would need to be cited where they are intended to be discernible. DAVilla 06:31, 9 November 2009 (UTC)
I agree (with DAVilla). —RuakhTALK 11:32, 9 November 2009 (UTC)
I seem to remember adding some of the individual letters some time ago - but they all got deleted, "Morse code" not being a "language". So I just added the table to our entry for Morse code instead. I wasn't able to get the dots and dashes to align properly though. SemperBlotto 11:39, 9 November 2009 (UTC)
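
The one-to-one letter correspondence mentioned in this thread can be sketched as a simple lookup table. This is a minimal illustration in Python covering only the 26 Latin letters of International Morse code; the function names are hypothetical, and the default period and hyphen-minus glyphs are just one of the transcriptions listed above, not a standard. Other glyphs can be passed in through the `dot` and `dash` parameters.

```python
# International Morse code for the 26 Latin letters, transcribed with
# '.' for dots and '-' for dashes (an arbitrary choice of glyphs).
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

# Reverse table for decoding; valid because every code is distinct.
LETTERS = {code: letter for letter, code in MORSE.items()}


def encode(text, dot=".", dash="-"):
    """Encode Latin letters as Morse, one space between letter codes."""
    glyphs = str.maketrans({".": dot, "-": dash})
    codes = []
    for ch in text.upper():
        if ch not in MORSE:
            raise ValueError(f"no Morse code in this table for {ch!r}")
        codes.append(MORSE[ch].translate(glyphs))
    return " ".join(codes)


def decode(morse):
    """Decode space-separated '.'/'-' letter codes back to Latin letters."""
    return "".join(LETTERS[code] for code in morse.split())


print(encode("SOS"))          # ... --- ...
print(decode("-.-. --.-"))    # CQ
```

Because the glyphs are parameters, the same table serves whichever transcription is preferred, for example `encode("S", dot="·")` for middots.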

Štambuk v Štambuk

Let's listen to what Mr Štambuk himself has to say on Bosnian, Croatian, Serbian, Montenegrin and "Serbo-Croatian", shall we?

In Talk:Macedonian:

[...] I'm aware that Lunt published the first grammar of Macedonian just about ~50 years ago, but look - Bosnian language was codified less than 15 years ago; does this fact invalidate rights to Bosniaks to their own standard language/literary language, which they were unable to do due to inadequate political climate in ex-Yugoslavia(s) surrounding the question of the existence of their ethnos? I don't think so. [...] --Ivan Štambuk 12:08, 31 January 2008 (UTC)

In WT:RFD/O (see this version, the section is no longer in the current version):

Serbo-Croatian macrolanguage officially died with communist Yugoslavia and sh ISO code is now deprecated. With bs/hr/sr babelboxen available, I see no reason for keeping this other than for insultive/propaganda purposes. --Ivan Štambuk 18:21, 4 June 2008 (UTC)
[...]
Bogorm, so-called "Serbo-Croatian" is nothing but a political label that in practice meant nothing. Most modern Croat historians would argue that "Yugoslavism" that brought us the "common language" was nothing but the most elaborate and refined phase of Greater Serbianism. I can guarantee you that no Serbian speaker can write literate Croatian. Do you find it funny that today people who claim to speak "natively Serbo-Croatian" here and on WP have but -3 and -4 proficiency in Croatian listed in their Babel boxes? This "separatism" you speak of is bizarre, Croat writers were calling their mother tongue Croatian centuries before communists brought us the SC thingie, and Serbs continued to have in the constitution "SC" listed as an official language up until 1997 in unsuccessful Yugoslavia #5 even after Bosnia and Croatia declared independence. In practice people who claim to be speakers of "SC" are in 99% cases Serbs or Serbophiles who use it as a political manifest with a very precise meaning, especially when they couple it with hr/sr babelboxen (why choose them at all when hr=sr=sh?). Believe it or not, the notion of "SC" is offensive nowadays to most Bosniaks and Croats, and it would be to you too if you were more educated in language policy, propaganda machinery and fabricating history that was essential ingredient of Yugoslav politics (always at the expense of someone else's heritage..). Also, a common myth you mention - there is no "linguistics justification" for SC either; as a standard language it's dead (it never really existed because two "varieties" of Eastern and Western that were recognized roughly correspond to modern notion of standard Croatian and Serbian, post-90s changes aside), and the notion of "SC" as a "collection of dialects" is completely arbitrary as the whole South Slavic area forms a dialect continuum (i.e. Čakavian+Kajkavian+Štokavian+Torlakian do not constitute a "genetic node", and neither does the South Slavic branch either as it's just a geographical designation, just like West and East Slavic too). The whole thing is a bit more complicated than b/w picture you boil it down to. Once again, saying that one speaks "SC" is nothing but a political manifest, by modern conceptions anachronistic, obsolete and potentially offending to some. Why insist on it when alternatives are available? --Ivan Štambuk 22:45, 15 August 2008 (UTC)
[... I'm eliding the other half of the discussion to focus on Štambuk's comments and views, see the original page]
I can't relate much the relevance of the content of your comments to this discussion. I'll list my points again so they don't get obscured by your irrelevancies:
  • "Serbo-Croatian" as a standard language existed only in a Communist State called SFRJ, describing two abstract, literary forms of "Eastern" and "Western" variant that roughly correspond to Serbian and Croatian today. So even then - it was not a "unified" language.
  • "Serbo-Croatian" in a sense of "collection of dialects spoken by Croats and Serbs" is a completely arbitrary classification of dialects, and not a valid genetic node in Slavic language branch. No Slavic historical linguistics handbook I know of claims this, and I do know of several works written by experts that prove otherwise. Moreover, there are no isoglosses that connect all Čakavian or Kajkavian speeches (two major dialectal groupings of the area), hence no "Proto-Čakavian" or "Proto-Kajkavian" - their last common ancestor that is reconstructable by comparative method of historical linguistics is Proto-West-South-Slavic or Proto-Slavic proper. Since everybody speaks their own local idiom, and not literary language (95% of speakers don't even know the proper accents, rules for which can be extremely complex), there is no reason to put onto them a term which in practice denotes something completely different (standard macrolanguage with 2 variants of Communist SFRJ)
  • "Serbo-Croatian" never existed in the past as a single literary tradition, or a state, before it was artificially made to become so (unsuccessfully though). Croats and Serbs have had their own, separate literary traditions, spanning centuries before SC ideology appeared, written on various dialects and scripts. For example, the "father of Croatian literature" - Marko Marulić, wrote in Croatian Čakavian dialect not spoken by any Serb. Serbs have up until the acceptance of Karadžić's reforms in 1868 used Serboslavic - obscure mixture of Church Slavonic and Russian unintelligible to common folks - as their literary language. No Croatian literary historian has ever had pretensions to claim this Serboslavic writings.
  • "Serbo-Croatian" ultimately became political dogma, upon which Communists enforced Serbian words onto Croats, and wiped words with centuries of literary tradition and those that were even very much spoken from the dictionaries just because they were not used by the Serbs (i.e. they were in Croatian-only dialects). Everything Croat-only was systematically demonised by the Communists as "Ustasha", punishment for which was political persecution and humiliation. For example, the entire editio princeps of current official Croatian orthography book Babić-Finka-Moguš was burned by the communists in 1971 just because it had "Croatian" in the title, and not "Serbo-Croatian".
  • The term "Serbo-Croatian" is still very much cherished by Serb-side, because the myth of it legitimizes the territorial-cultural pretensions. E.g., while Serbs were bombing Dubrovnik in 1991-92, they published propaganda books such as Srpstvo Dubrovnika ("Serbdom of Dubrovnik") in which history was fabricated to legitimize annexation of Croatian territory to Greater Serbia. (Velika Srbija, to znači ujedinjena Srbija! - "Greater Serbia means unified Serbia!") - published by Vojna štamparija "Military publishing". To this very day the Faculty of Philosophy in Belgrade teaches Ivan Gundulić and other Old Croatian writers that explicitly identified themselves with Croatdom and called their language Croatian as a part of some "Serbian Renaissance and Baroque". Too bad the rest of the world doesn't share the same views..
  • Ultimately the "SC" epoch was highly detrimental to Croatian literary language tradition, as in the crucial period an abrupt cut was made with history. When in the 1990s lots of banished words are those "dormant" ones were revived to standard language, lots of left-wing pro-Communist media ridiculed it as "trying to separate Croatian from Serbian". You won't believe how many times I've heard that words like zrakoplov and brzojav were invented during NDH (they're in fact at least half a century older). When the common dictionary of "language of literature" (književni jezik) was compiled by Matica hrvatska and Matica srpska it was not only criticised for being amateurish and in many definitions wrong, but it completely ignored a large corpus of Croatian literature tradition. Some 10-20k terms for some very basic terms which were used by 15th-19th century writers nowadays don't appear in Croatian standard language as a result of it.
Bogorm I understand your predilection for some myths of the past as they presumably reflect a state of affairs much "brighter" then after the 1990s, a land of "milk and honey" and "brotherhood and unity", but at the same tome there are millions of people that abhor the Communism demon for some very good reason. 1990s wars is a direct result of Communist not being active in educating common people to come to terms with what their ancestors did in the past (the official politics was "collective amnesia"; imagine what would Germany look like nowadays had de-nazification not occurred actively by academics and the media?). The term "Serbo-Croatian" epitomizes lots of bad things that happened in Communist SFRJ in the minds of lots of people, not strictly language-policy (which itself was bad reason enough), and insisting on it just because somebody living in a balloon imagines that the next Yugoslavia (#6, there were 5 unsuccessful previous attempts believe it or not) will be the one, bringing economic prosperity and Pan-Slavic commonness to the impoverished people would be utilizing this project for someone's personal political fantasies. The term is in all possible sense (linguistic, regional, political) obsolete, anachronistic, misleading and politically incorrect, and there is no reason to insist on it, esp. when the only justifications of it presented are the same myths that coined and enforced it. --Ivan Štambuk 15:31, 18 November 2008 (UTC)

Very well said. Štambuk is very intelligent, very knowledgeable, and has explained here the problems with both the term "Serbo-Croatian" and the nonsense of pretending they are one language, or of trying to force convergence.

So what happened?

It turns out to be uncomplicated. You are (all too) familiar with his pattern of extreme personal abuse of anyone who disagrees with him on anything; what you may not fully understand is that this pattern is not new. He has had problems on the Croatian projects for years, having at one point been blocked for a year on the hr.wp (and now permanently banned there.) He seems to see himself as the victim of some sort, failing to see his own constant abusive behaviour.

On February 26 of this year, someone had once again had enough of his abuse, and blocked him on the hr.wikt.

On February 28, he moved Wiktionary:About Serbian to Wiktionary:About Serbo-Croatian and set about editing it to change the language(s) to Serbo-Croatian. (Interestingly, he created redirects for about-Bosnian, and about-Montenegrin, but not Croatian.)

He then set out, fairly sporadically at first, to delete Croatian (and the others when present) from the wikt, replacing it with SC. It had nothing to do with "reducing duplication" or any other arguments that would come later. It was all about—and still is all about—removing Croatian from the en.wikt. (Hence his vociferous objections to having a brain-damaged bot restore all of the deleted sections, that would defeat his entire purpose.)

It isn't that he changed his mind about "nationalism", or about the linguistics. He still holds the positions stated eloquently above, he knows that "Serbo-Croatian" is offensive, insulting, and linguistic nonsense. That is precisely why he is pushing it. His objective is to take revenge on the people in the Croatian WM projects, by doing severe damage here.

He is trolling.

That, btw, explains (if you have noticed through all the noise), why his "arguments" for SC reduce to screaming that the languages are "99.9% identical", interwoven with constant personal abuse. What else can he do? And the noise level itself adds to the obfuscation, leading to some people thinking it is some sort of he-said-she-said argument, when it is not. (See Vaughn's somewhat amusing suggestion of a duel.)

I don't think he intended to go this far, but it has gotten way out of hand, and he doesn't know how to "walk back the cat". All of this is on the permanent record, and having "notorious wikipedia vandal" on your resumé is not a good career move.

From his statement above, for emphasis:

The term "Serbo-Croatian" epitomizes lots of bad things that happened in Communist SFRJ in the minds of lots of people, not strictly language-policy (which itself was bad reason enough), and insisting on it just because somebody living in a balloon imagines that the next Yugoslavia (#6, there were 5 unsuccessful previous attempts believe it or not) will be the one, bringing economic prosperity and Pan-Slavic commonness to the impoverished people would be utilizing this project for someone's personal political fantasies. The term is in all possible sense (linguistic, regional, political) obsolete, anachronistic, misleading and politically incorrect, and there is no reason to insist on it, esp. when the only justifications of it presented are the same myths that coined and enforced it.

Quite so. Robert Ullmann 11:26, 23 October 2009 (UTC)

I nearly reverted this actually. Who cares. Why don't we all work on the f--king dictionary. Mglovesfun (talk) 11:29, 23 October 2009 (UTC)
I don't get it either. Robert, what's your point? That Stambuk contradicts himself? Maybe that would be relevant in a post about Serbo-Croatian, but this is not trying to make any constructive argument. It's attacking Stambuk directly, and we don't tolerate behavior like that. It could all be true, and everything that Stambuk said in reply irrelevant, but that wouldn't change the fact that you're the aggressor. In his response I mainly see Stambuk defending himself, and I think it's shameful that we would allow the question to be framed as such and then talk about Serbo-Croatian like there wasn't the elephant of an ad-hominem in the room. If you have an argument to make about Serbo-Croatian then make the argument. Next time you have a complaint about someone who disagrees with you, you should think to only complain about something actionable like behavior. Please let us know if there's something pertinent to dictionary entries that concerns you, but it doesn't look like you have anything legitimate to complain about with regard to Stambuk's recent edits, at least not anything since the last vote. As long as he abides by consensus with entires, and as long as he sticks to the ground rules, he can make all the fallacious and contradictory comments he likes. We can ignore him or rebuke all his points as long as we stick to the rules as well, which means arguing the points. I don't think there's a better example of breaking that than by starting a new section on the reasons some contributor's beliefs shouldn't be trusted. Coming from anyone else, these comments may have been reverted immediately. Mglovesfun would be correct to have done so. Even Wonderfool would have been correct to revert you, it's so out of line. Really, do you have a legitimate point? Then please argue that instead. DAVilla 05:42, 9 November 2009 (UTC)
I care. My support for "Serbo-Croatian" is due in large part to Ivan's claim that his earlier opposition was grounded in Croatian nationalism; and I'm sure I'm not alone. Ivan is by far our most knowledgeable editor on the subject. If Robert is right that Ivan is trolling, and/or using us for some sort of anti-hr.wikt revenge, then that's a BFD. —RuakhTALK 19:34, 23 October 2009 (UTC)
The Serbo-Croatian unification proposal has absolutely nothing to do with the Croatian Wiki-Community, or my alleged "revenge" against it. That is simply one of those ad-hominems Ullmann is inventing against me. Ullmann kind of really "hates" me now, and given that he has no knowledge to confront the irrefutable logical soundness of the Serbo-Croatian unification proposal in linguistic terms, he must resort to trolling and ad-hominems. According to his theory, I in fact still believe that these are "different languages!". If it is really the case that I want to "take revenge", why would I be bringing the SC unification proposal after 2 years of creating independent B/C/S entries, and why would I be ardently defending the oneness of SC on some other unrelated projects (namely the English Wikipedia), where there is not a single Croatian Wikipedia contributor (I only run into Kubura from time to time)? Ullmann's accusation simply doesn't make sense at all, and is more of a deliberate argumentative diversion from the real dispute (i.e. the big list of reasons why the proposal is actually beneficial, regardless of the alleged motivations of one of its supporters). --Ivan Štambuk 19:53, 23 October 2009 (UTC)
And this suggestion of a duel is not such a bad one. to the victor belong the spoils! --Volants 11:43, 23 October 2009 (UTC)
Ullmann, you're the king of the trolls. It's absolutely f* amazing what you wouldn't try to get me blocked on this project. "Constant abusive behavior" - Sweet Jesus. The only person I see feeling "abused" is yourself, which is absurd after all the disgusting lying defamation you've done with the DailyKos post and Wikimedia Board e-mails.
I got blocked on Croatian WP because it was at that time literally owned by two maniacal *****s that up until some month ago were bureaucrats (imagine that!), but are now desysoped and banned from the project without anyone rasing so much an eyebrow, after the community finally came to realize how insane their off-wiki machinations were (I wouldn't rather mention all the details in public, the were some really ugly stuff going on. Ask your fellow Connel, he knows some of the details ^_^). During the year I spent at Croatian Wikipedia I witnessed about a dozen superb contributors being either blocked or permanently driven away by their delicate abuse of admin privilages. Basically everyone who wasn't Croatian nationalist, or who dared to touch the subjects that were somehow perceived as some kind of a "national treason" were driven away. There are reports on meta on all this shit. I have no desire to explicate the billion details that accumulated over the years to you now, and it doesn't matter anyway.
The reason why I got indefblocked recently is not because I've broken any rule or something, but after the nationalist clique out of thin air (initiated by one of the banned ex-burecurats Roberta F., and that everything was well-organized can be seen by the fact that votes were colellected in a matter of minutes) decided to "vote" to indef block me after I tried to reason to them in Kafić (I was 100% polite and civilized in that discussion, as everyone who can read serbo-Croatian can see for themselves, and received plenty of unsanctioned abuse BTW) on how SC unified treatment on Wiktionary is not something reflective of "Communist oppression", or "next step towards the obliteration of bs/hr/sr wikiprojects", or something similar that they paranoidly imagined. But, after that vote I kept my mouth shut because I had a feeling that there was a big change coming with regard to the ArbCom vs. 2 burecurats issue (plus it was no damage at all because I don't contribute at all to Croatian Wikipedia, only on English-language wikiprojects for 2.5 years now, with the exception of Croatian Wikisource where I put material I need for English Wiktionary as citations). In other words, my recent indefblock was not because I was "abusive" or something (as I said, I decked all my logically incontrovertible verbiage in the related discussion in Kafić with flowers of civility :P), but because they simply couldn't stand my presence. I cited them the most competent living Croatian Slavist Mate Kapović, who wrote in a nationalist magazine Kolo published by Matica hrvatska (the central Croatian cultural institution) that "dialectally, Croatian and Serbian are of course the same language" - which of course created an "outrage", and the "defense" that languages are not merely "linguistically defined" yada yada.....
On February 28, he moved Wiktionary:About Serbian to Wiktionary:About Serbo-Croatian and set about editing it to change the language(s) to Serbo-Croatian. (Interestingly, he created redirects for about-Bosnian, and about-Montenegrin, but not Croatian.) - I assure you that if I failed to create the redirect for Croatian, as you mention, the only reason was that I simply forgot to do it.
He then set out, fairly sporadically at first, to delete Croatian (and the others when present) from the wikt, replacing it with SC. It had nothing to do with "reducing duplication" or any other arguments that would come later. It was all about—and still is all about—removing Croatian from the en.wikt. (Hence his vociferous objections to having a brain-damaged bot restore all of the deleted sections, that would defeat his entire purpose.) - Nonsense, it was with regard to the modifications of the WT:ASH policy and the unified treatment contained in it, which was agreed to by all of the Serbo-Croatian contributors at the time. Your perception that this was all about "deleting Croatian" is tragic. Absolutely none of the information was lost. Moreover, all the merged entries were expanded, rectified and checked during the process. You, as well as the entire community, were notified of this ongoing activity.
It isn't that he changed his mind about "nationalism", or about the linguistics. He still holds the positions stated eloquently above, he knows that "Serbo-Croatian" is offensive, insulting, and linguistic nonsense. - No Freud, I actually changed my mind, believe it or not. You know, smart people always have the right to change their opinion, as opposed to stubborn fools who push the same "truths" forever, despite being proven wrong. FFS, what would I possibly gain by pushing Serbo-Croatian while privately holding that it's not one language, and that the name is insulting? I'd have to be clinically insane to do that. Sorry, your logic is fallacious and ill.
"His objective is to take revenge on the people in the Croatian WM projects, by doing severe damage here." - LOL. You need help, dude. And finally, stop using the abusive word damage - no "damage" is done, as no content is lost, and English-speaking Wiktionary users would find the merged treatment much easier to use. I personally don't give a flying **** about the state of mind of folks on Croatian Wikipedia :) My sole concern here is the quality of both content and presentation of Serbo-Croatian (i.e. its modern-day national standards) with respect to Wiktionary users, i.e. people like Bogorm. As a person who has some 30 000 quality edits (you know, actual content edits), I'm quite concerned that that goal not be obstructed by ill-intentioned individuals, as a form of trivial remedy for their personal mental issues (nationalist pride, ego..)
"That, btw, explains (if you have noticed through all the noise), why his "arguments" for SC reduce to screaming that the languages are "99.9% identical", interwoven with constant personal abuse. What else can he do? And the noise level itself adds to the obfuscation, leading to some people thinking it is some sort of he-said-she-said argument, when it is not." - And I'm the one who's trolling? ^_^ Here are my arguments, which I've stated repeatedly over and over and over again, in various places, and which are irrefutable, undeniable facts of nature that can be verified in the books:
  1. Modern standard Bosnian/Croatian/Serbian(/Montenegrin) are all standardized on the same, identical speech/subdialect (Neoštokavian) of the same dialect (Štokavian), the only (!!!) speech that is spoken by all 4 nations. This was deliberately chosen in the 19th century to bind the dialectally diversified literatures and their respective languages of our neighboring brotherly nations, to suppress Ottoman and Austro-Hungarian yoke (see Vienna Literary Agreement).
  2. As a result these have the same phonology (the same phonemic inventory), the same pronunciation (2-way accentual system), the same inflexion of nouns/adjectives/verbs (in 7 cases 2 genders 3 persons 4 tenses). That would make their grammars 99% identical. In fact, all the differences in grammar among the standards could all fit on some 2 pages of text. (And that's on standards alone, if you take into account what people actually write/speak, there are countless "Serbian" words/constructs used by Croats and vice versa.)
  3. These 4 were prior to 1990s treated as different regional varieties of one and the same language Serbo-Croatian/Croato-Serbian by all of the world's Slavists (including venerable Croatian linguists like Tomislav Maretić, Vatroslav Jagić etc.). This is the stance of pretty much all of the Western Slavists today (including the top-ones like Browne, Kortlandt, Dybo...who actively publish on Serbo-Croatian and still use that very term). If you look in the early 20th century and earlier, Serbo-Croatian/Serbokroatisch is in fact the only term you'd find in the books.
  4. From the perspective of modern linguistic science, B/C/S/M are four independent standard languages, 4 national standards of one underlying linguistic entity (call it Neoštokavian, Serbo-Croatian or whatever). This makes Serbo-Croatian a polycentric standard language, as Croatian linguist Snježana Kordić meticulously elaborates in her recent paper here.
If you find the above "abusive", and choose to ignore and dismiss it as "arguments", you're effectively demonstrating a solid amount of trollish behaviour. We all know that you don't know an iota of Serbo-Croatian, or of any other related Slavic language, and it's astonishing that you even have the courage to emit such immense amounts of BS directed against an expert on the language (me), failing to mention a single reputable source refuting any of my claims.
Ullmann, if you genuinely imagine that I'm "abusive" and "trolling" (which I don't think you are, you're simply trying to maliciously denigrate my standing in the community but I don't think that people are that stupid around here), start a confidence vote for sysops only on me, otherwise finally cease this disruptive pattern of transparent trolling because it's getting really boring retorting you and you're starting to look really pošandrcao (dunno the English term for that, ask your "new friends" to explain :P). --Ivan Štambuk 19:40, 23 October 2009 (UTC)

Please, let's restore removed entries, assume everybody's good faith, forget all this, and get back to normal work. Lmaltier 20:51, 23 October 2009 (UTC)

The real reason ye Croats are doing this is because Serbian wikipedia is moving along quite nicely and Croatian/Bosnian wikipedias are falling behind, and you guys want to take Serbian articles without doing the hard work of "translating" them.--Pepsi Lite 00:01, 24 October 2009 (UTC)
No, this is a completely untenable præsumption, since die-hard nationalists on hr and bs wiki would be incensed at words typical for the Eastern variety of SC such as poreklo or hiljada, not to speak about the Cyrillic script. Therefore it is virtually impossible for a wikipedia which would claim to be true-born Croatian to contain similar words and their expurgation would go far beyond the copy-paste process and render their alleged intention fairly intricate to realise. Furthermore Ivan is at odds with the hr wiki governance and their (nationalist) approach is not his. The most virtuous example is sh wiki where both poreklo and podrijetlo, hiljada and tisuća are accepted and a foreign reader is capable of becoming familiar with those peculiarities æquivalent to American and British English differences. It is by far the most tolerant one. The uſer hight Bogorm converſation 06:43, 24 October 2009 (UTC)
Now that is out of line and completely irrelevant to anything on Wiktionary. Knock it off, please. -- Visviva 04:33, 24 October 2009 (UTC)

Less ad-hominem section header

  • I think we all have complex motives for what we do here, including some motives that we probably aren't terribly proud of. But motive is neither here nor there, really. Ruakh is correct that revenge-trolling would be a BFD, but no plausible case has been brought against the linguistic soundness of Ivan's edits AFAICT. Even if the arguments behind them are ultimately rejected, sound edits backed up by sound arguments don't amount to trolling, regardless of the motivation.
    I'd like to go back to something that Robert said some ways above, that the absence of consensus makes the bot necessary. This seems somewhat contrary to the way we normally do things:
    1. It is a fact of life on a wiki that there are a great many issues for which no consensus exists, either for or against. We do not require a prior consensus for human edits. If we did, no progress would ever be made.
    2. We do require a prior consensus for bot runs. If we didn't, pure chaos would ensue.
    3. No argument has been made that Ivan's edits were automated. They appear to be complex, human edits done over a considerable period of time, with concurrent discussion on various policy pages.
    4. There may have been no consensus for these edits, but there was also no consensus against them, as far as I can tell.
    5. Accordingly, there is no obvious basis for automated reversion/restoration.
  • On the other hand, it would be perfectly acceptable for individual editors to restore sections in a manual or semi-automated fashion. It could be done like this:
    • A script could prepare a list of the removed sections, formatted in such a way that it would take a human editor only 2-3 clicks to add them back (after which AF could sort them into place).
    • If there is then a human editor who wants to make the hundreds/thousands of clicks involved, it's done.
    • If not, it stays not done until somebody decides to care.
  • Anyway, that's as much sense as I can make of things at this juncture. I am aware that my proposed solution will not make anyone happy; I'm not sure that's a bad thing. I have absolutely no opinion on the merits of the case(s). -- Visviva 04:33, 24 October 2009 (UTC)
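The list-preparation step proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `l2_sections` and `removed_sections` and the sample wikitext are hypothetical, the old and current revision texts are assumed to have been fetched already as plain strings, and nothing here edits a page. It merely reports which level-2 language sections existed in an earlier revision but are absent from the current one, so that a human editor can decide whether to add them back.

```python
import re

# Matches a level-2 header such as "==Croatian==" on its own line,
# but not a level-3 header such as "===Noun===".
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)

def l2_sections(wikitext):
    """Return a dict mapping each level-2 header name to its section text."""
    matches = list(L2_HEADER.finditer(wikitext))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next level-2 header or the end of the page.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[m.group(1).strip()] = wikitext[m.start():end].rstrip()
    return sections

def removed_sections(old_text, new_text,
                     wanted=("Bosnian", "Croatian", "Serbian")):
    """Return the wanted sections present in old_text but missing in new_text."""
    old, new = l2_sections(old_text), l2_sections(new_text)
    return {name: old[name] for name in wanted
            if name in old and name not in new}

# Hypothetical before/after revisions of a single entry.
old = "==Croatian==\n===Noun===\ngovor\n\n==Serbian==\n===Noun===\ngovor\n"
new = "==Serbo-Croatian==\n===Noun===\ngovor\n"
for name, text in removed_sections(old, new).items():
    print(name, "->", len(text), "characters to review")
```

The output is a review list only; whether and how any section is re-added stays a human decision, in line with the 2-3 clicks per entry suggested above.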
    As I've stated above: I have nothing against manual adding of B/C/S(/M) sections, either on the basis of the previously merged entries, or on the basis of newly created SC entries. The most important thing is that they're checked by a knowledgeable human, since a great deal of cleanup occurred during the merger, and the blindly-restoring bot would generate thousands of entries needing human attention (such as wrong data and obsolete headers, as I've explained in the above discussion). Elephantus (talkcontribs) attempted doing this manually - restoring previously merged ==Croatian== sections from history - but after a few edits he realized how inane an activity it was, and how ridiculous it looks to have a stubbish ==Croatian== entry next to a moderately complete ==Serbo-Croatian== entry with the same content, and thus he eventually started simply copy/pasting whole ==Serbo-Croatian== entries to ==Croatian==, the only changes being modifying the ISO code from sh to hr, and removing non-Croatian ==Alternative forms== when they occurred (Ekavian Serbo-Croatian). He eventually gave up on that too.
    The only proper way to handle this "issue" (the non-existence of redundant data for non-existing languages) is not by restoring rubbish from page history, but by cloning the existing ==Serbo-Croatian== entries, so that all of those thousands of entries look as silly as [[govor]]. This can be trivially semi-automated in various ways. We can also add a note to the corresponding language policy pages, for all the future WTF complaints by Serbo-Croatian learners who end up on Wiktionary wondering why one language is handled in 5 mostly identical sections, to seek an "explanation" from Robert Ullmann, because someone has to take responsibility for this nonsense. --Ivan Štambuk 13:26, 24 October 2009 (UTC)
Ullmann has become a gigantic troll. He’s trying to do for Serbian/Croatian what he has done as the administrator for rw:Main Page. In October of 2006 that Wiktionary had about 300 entries, and today, three years later, it still has about 300 entries. The main contributor for Kinyarwanda quit to get away from RU back in 2006, and RU is doing his damnedest to make our Serbo-Croatian editors quit now. We should turn over our Serbian/Croatian project to qualified, dedicated, competent editors. I have an idea...let’s get Ivan Štambuk and Dijon to handle it. --Stephen 15:51, 24 October 2009 (UTC)
We seem to be on the brink of a reasonable solution, which Ivan seems to have accepted. It may well play out exactly as he says, which may be the most wiki-like way to achieve the desirable final outcome. I fully expect Serbo-Croatian to be superior in its coverage and entry quality to any of the other languages that had been at issue. Superior content supported by active contributors will win. It can win without preventing folks who have other reasons to want to have other language headers from contributing under those headers. Some of what they contribute might even be useful for the coverage of Serbo-Croatian. DCDuring TALK 16:24, 24 October 2009 (UTC)
We do not need to snatch destructive conflict from the jaws of reasonable compromise by stirring the pot with extraneous score-settling. DCDuring TALK 16:24, 24 October 2009 (UTC)
Ullmann wrote the preceding section, #Štambuk v Štambuk, only yesterday. That is not "reasonable compromise". Nobody is snatching destructive conflict and stirring the pot except Ullmann himself. —Stephen 16:55, 24 October 2009 (UTC)
The thing is, DCDuring, that this whole "issue" surfaced as an exercise in planned trolling, where Ullmann hoped to cast me as an "abusive POV-pusher" in order to get me blocked, with the assistance of his new "friends" from Croatian and Serbian Wikipedia (a bunch of nationalist bigots, one of whom ironically even openly confessed pro-Greater-Serbian viewpoints). No one prevented anyone from (re)creating B/C/S(/M) sections from the start. The thing is that these folks are simply too lazy to do it themselves, and they'd rather see a bot do it, only to see their beloved nationalist designations on Wiktionary, regardless of the quality or the necessity of such entries. And that last thing is what I have problems with. You want every single Serbo-Croatian entry to look as ridiculous as [[govor]]? Fine by me. Perhaps one day when we have 100 000 of those there'll be a vote to eliminate all that redundant garbage, or to abstract it away in JavaScript at the presentation level. --Ivan Štambuk 16:40, 24 October 2009 (UTC)
We may accept some ugliness in support of "wiki-ness". I doubt that there will end up being truly massive duplication (and hope there won't). If Ivan and like-minded contributors continue with high-value SC entries and allow the others to go along doing what they will, the worst that will happen is duplication, perhaps accompanied by excessive bickering. As Ivan pointed out, the silliness of that may become apparent to the related-language contributors and to the community at large.
Visviva had a reasonable accommodation. I was hoping that it would be the focus of discussion. DCDuring TALK 17:35, 24 October 2009 (UTC)
Could there be a solution while contributors still remove Bosnian, Croatian, Serbian sections? Imagine that a group of contributors begin to remove all Ido sections because of their linguistic opinions (which is not improbable). Would you forbid the use of a bot to restore them because they have been removed manually? Removing sections because of one's political or linguistic opinions is not acceptable, and removed sections must be restored. It's the only way to quieten things. Lmaltier 20:57, 24 October 2009 (UTC)
We haven't determined, as a matter of policy, that it is wrong to do so, as Visviva has pointed out. There seems to be a majority, but not a consensus, that believes that it is OK to eliminate such headers if the resident experts think it appropriate. There was insufficient consensus to make it a policy or to have such a process (or its reverse) be implemented by bot. I think it would be wise not to eliminate material under any header for a living language systematically, whether by policy, by bot, or manually in the absence of consensus, because that seems to me to be how a well-functioning wiki would do it. DCDuring TALK 22:30, 24 October 2009 (UTC)
If there's no consensus that we should have Ido sections, then yes I'd forbid the use of a bot to add them, even if the bot is simply reverting another user's manual removal of them. Allowing bots to operate without consensus is bad in so many ways. You seem to see a bright-line distinction between "remove" and "restore", but if there's no consensus for either one, then I don't think that distinction is valid. (No such distinction is valid unless there's consensus for it, else you're basically saying that your own personal opinions — in this case inclusionism — supersede community consensus.) And it's especially tendentious as applied in this case, because the sections weren't "removed" so much as "merged, expanded, and improved". (I'm not saying the term "removed" is completely inapplicable, but it's stretching the truth to pretend that that is the term for what happened, and then to base all your analogies on that term.) —RuakhTALK 05:22, 25 October 2009 (UTC)
You seem not to understand my main point: it's not acceptable to do changes because of one's political or linguistic opinions (except by adding correct information). There will never be any consensus on political or linguistic opinions, of course (this is what opinion means), you should not expect one. I'll take another example. If some people believe that religions should not be addressed by Wikipedia (as they are allowed to believe), and delete all pages related to religions (even with good arguments), these pages should be restored (by bot if necessary), because their changes would be based only on opinions, and would violate the NPOV principle. This also applies to linguistic issues, some linguistic issues are very polemical (cf., in French, fr:au temps pour moi and fr:autant pour moi). Yes, the NPOV principle is one of the founding Wikimedia principles, and would supersede community consensus (but anyway, in this case, there is no community consensus). Lmaltier 09:44, 25 October 2009 (UTC)
Re: "There will never be any consensus on political or linguistic opinions, of course (this is what opinion means), you should not expect one": we don't need one. We may never have a consensus on whether BCSM is one language or four (or some other possibility), but we certainly can have a consensus on whether we want to treat it under one language section or four, or five (or some other possibility). You prefer five, and I don't begrudge you that preference, but there is nothing in Wiktionary policy that automatically elevates your preference to "imposable by bot without consensus". Your comparison to Wikipedia articles on religion is a bad one, because Wikipedia has a consensus to include such pages. If there were no such consensus, then a bot to add them there would be highly inappropriate. And your appeal to NPOV is unconvincing, because NPOV doesn't support your belief that the only appropriate edit is the addition of correct information. For example, if someone adds a usage note to fr:au temps pour moi stating that it's wrong and that everyone who uses it is an idiot, then other editors can (and should) restore NPOV by removing that note. And while you obviously feel that NPOV requires us to have sections for every putative language, it's obvious that many editors feel differently; even if NPOV supersedes community consensus, how do we decide, if not by community consensus, what is NPOV? —RuakhTALK 13:20, 25 October 2009 (UTC)
  • [long undent back to Visviva's proposal] There are a couple of flaws, imo, with your findings of fact:
    2. In fact, dozens of bots are operating on Wiktionary regularly with neither community approval nor bot flag; in some cases there is collusion/tacit approval of admins.
    5. There is basis for automated reversion/restoration: the removal violates Wiktionary's inclusive policies. Key policies.2 states 'Wiktionary is multi-lingual in that it has entries for words from any language. It aims to cover Every Word from Every Language.' This policy would of course include every word from any dialect, based on the policy's broad language. The fundamental description as criteria for inclusion is 'As an international dictionary, Wiktionary is intended to include “all words in all languages”.'
  • Given the above, I cannot see a reasonable objection to any form of automated restoration, but I also cannot see support for any automated reversion as that would likewise be removing valid word entries. But then, iirc, I believe M Ullmann did suggest exactly this - a restoration of removed sections and retention of the new sections as is. - Amgine/talk 19:12, 25 October 2009 (UTC)
  • Re #2: It's true that bots have been run without approval and without a bot flag, but not when it's so obvious that they don't have consensus; and there are very high expectations that the bot be stopped if anyone raises any objections.
  • Re #5: Nonsense. The "removal" does not violate the "all words in all languages" doctrine, because we still include all the words that we did before.
  • RuakhTALK 22:39, 25 October 2009 (UTC)
I respectfully disagree with your assertion of "Nonsense." Here are a couple of examples in which we are now missing words from some languages:
term        Bosnian  Croatian  Serbian
jedinica    X        X         X
ансамбл     -        -         X
badem       X        X         X
Минхен      -        -         X
žito        -        X         -
jagoda      X        X         X
kompozitor  X        X         X
Whether you agree that consolidating several dialects under one language header is better for Wiktionary or not, you cannot deny that each of these are defined as languages by authorities respected by the project, and that Wiktionary's stated policy is to include all words in all languages.
This is true whether or not you are stalking my postings because you don't like me, as well. - Amgine/talk 04:45, 26 October 2009 (UTC)
I don't deny those things, but I deny that they're relevant. We still have the words, we just describe them differently. I stand by what I said above to Lmaltier: "I'm not saying the term 'removed' is completely inapplicable, but it's stretching the truth to pretend that that is the term for what happened, and then to base all your analogies on that term." (And I'm really not stalking your postings. You just happened to have commented a few times recently on pages I have watchlisted.) —RuakhTALK 12:06, 26 October 2009 (UTC)
Amgine, modern Bosnian can be written in Cyrillic as well (is allowed to in orthography books, just not much used), and Croatian writers have for several centuries used bosančica and attestations for Cyrillic spelling of all of these could be found. As for žito - it's a native Slavic word (Proto-Slavic *žito, basically unchanged) and can be found in usage by Bosniak and Serbian writers as well as dictionaries and is no freakin' way "ethnically Croatian".
Also, given how loosely you use the word dialect, it's clear that you apparently don't know much about South Slavic dialectology and its application to the Serbo-Croat area. These are not different dialects - they're all the same dialect, which in dialectology books is called Neoshtokavian. Serbian and Croatian writers decided in the 19th century to deliberately standardize their literary languages on the same dialect, Ijekavian Neoshtokavian (which was at that period spoken by a minority of Croats, but has ever since managed to largely obliterate all the other subliterary dialects, namely Čakavian and Kajkavian). There is no "language" of which standard Bosnian/Croatian/Serbian/Montenegrin are "dialects". It doesn't exist. They're not only the same dialect, but the same subdialect (Neoštokavian; there is also an "Old Štokavian" dialect with a different accentual system).
The usage of different scripts that trivially map 1:1, or of regionally confined terms, does not justify the notion of a "different language". Languages are not defined in terms of lexis but in terms of grammar. And in the case of the Serbo-Croatian national varieties, the grammars coincide 99%. In phonology alone they're more similar (i.e. identical) than e.g. British and American English.
There are no "authorities respected by the project" that you speak of, and there is no institution in the world that "defines as languages". Furthermore, there are no firm criteria in linguistics at all to strictly define languages. We follow our own criteria for language-inclusion (not lexeme-inclusion, which is covered by WT:CFI) on the basis of our needs, i.e. what we gain by doing a separation/merger with respect to the target audience (Wiktionary users) and contributors. We merge Biblical and modern Hebrew because it makes sense. We treat all Ancient Greek dialects and also Middle Greek as ==Ancient Greek== because it makes sense (even tho the differences between the spellings and inflection of Ancient Greek dialects are much greater than among modern standard B/C/S/M). OTOH we treat 2 varieties of standard Norwegian under different language headers because they apparently differ enough in various points of inflection that it doesn't make sense to treat them commonly. We also treat the Lithuanian Žemaitian dialect under a different L2 because it differs radically from the literary Aukštaitian dialect, which we use as normal ==Lithuanian==. For B/C/S/M there is no such justification - as the merged entries have proved, it's very easy to treat them all together, and differences among standards and the distribution of regionally-confined lexemes can easily be handled by means of context labels and the ===Alternative forms=== header, similar to what we already use for varieties of English, German, Spanish etc.
B/C/S/M are 4 different national standards of what is linguistically doubtless one entity, call it Serbo-Croatian or whatever. That fact cannot be ignored. The fact that they're maintained by different national bodies does not invalidate their inherent linguistic oneness. You cannot argue that just because they've been assigned different ISO codes under the pressure of nationalist governments in the 1990s that it justifies the notion of "separate languages". Especially because there was only one code sh for a very long time before that, and that nobody had any problems with. It's silly to try to "prove" anything on the basis of some cherry-picked lexical comparisons and the fact that Croatian standard does not use the Cyrillic script (where Bosnian, Serbian and Montenegrin allow it). --Ivan Štambuk 12:33, 26 October 2009 (UTC)
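The script correspondence referred to above (scripts that "trivially map 1:1") can be illustrated with a minimal sketch. The names `CYR_TO_LAT` and `to_latin` are hypothetical and this is not a Wiktionary tool; it only covers the Cyrillic-to-Latin direction for the 30-letter alphabet. The reverse direction presumably needs extra care, since a Latin digraph such as nj can occasionally stand for two separate letters.

```python
# Serbian Cyrillic to Gaj's Latin alphabet, one entry per letter.
# Three Cyrillic letters correspond to Latin digraphs (lj, nj, dž).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def to_latin(text):
    """Transliterate Cyrillic text letter by letter; other characters pass through."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)                # not a Cyrillic letter: keep as-is
        elif ch.isupper():
            out.append(lat.capitalize())  # e.g. Љ becomes Lj
        else:
            out.append(lat)
    return "".join(out)

# Two of the Cyrillic terms from the table above.
print(to_latin("Минхен"), to_latin("ансамбл"))
```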
You write much and say little.
International linguistic standards respected by en.Wiktionary recognize these as viable languages. Full stop. Your actions in consolidating multiple entries under a single language header reduced the number of entries in those languages whose headers were removed. Full stop. Your actions in replacing a single language header with another reduced the number of entries in the language whose header was replaced. Full stop.
There are no value judgements in the above statements. I happen to agree with your campaign whole-heartedly emotionally, but my opinions are irrelevant. They cannot change the fact that your actions have materially reduced the number of entries in some languages which are valid languages by our standards. - Amgine/talk 17:02, 26 October 2009 (UTC)
I write much because you apparently don't know much on the topic. It is necessary to educate you on the basic concepts, lack of understanding of which blurs your perception of the subject and leads you to fallacious conclusions such as "words spelled in different script belong to different languages" and "lexical dissimilarity necessitates the notion of a 'different dialect'". It is important not just for you, but also for other readers, not to be misled by such fallacious reasoning. Hence, I strive to keep my posts as extensive and educational as possible.
There is no such thing as an "international linguistic standard". Do you have any clue at all what SIL International does? How many professional linguists do you think give a **** about that nongovernmental non-profit Christian organization? I'll tell you: none. SIL and 3-letter codes have their own particular purpose (read e.g. the article on it in the Elsevier Encyclopedia of Language and Linguistics on what exactly they try to accomplish with it, which is not to prescribe the notion of a "language" at all). Some nationalists might be deceived into thinking that it internationally legitimizes their "separate language", but a quick glance at recently published authoritative FL sources (e.g. Britannica, papers and works by top linguists in the field..) should prove them otherwise. Please stop trolling with the abuse of such seemingly "authoritative" wording as "international standard". Really. Languages are not computer protocols.
Your actions in consolidating multiple entries under a single language header reduced the number of entries in those languages whose headers were removed. - Nonsense.
Your actions in replacing a single language header with another reduced the number of entries in the language whose header was replaced. - Nonsense
There are no value judgments in the above statements. - Yes there are. These are not "different languages", for starters. Once you realize that (which you can't, as you don't know the language at all, and apparently don't know much about linguistics either) it all falls to pieces. Additionally, once you realize that we are not here to mindlessly chase 3-letter codes but to describe the linguistic reality as it is/was with our readers/contributors in mind, you reach the conclusion that the common Serbo-Croatian treatment is the Right Thing to do. We can add those national varieties of Serbo-Croatian too, but that would be simply a waste of time and bytes. I can see on my watchlist that our proud admirer of Vojislav Šešelj Pepsi Lite (talkcontribs) has been more than industrious in copy/pasting ==Serbo-Croatian== entries to ==Serbian== lately, why don't you give him a hand? :)
Once again, to summarize all this: Restoration of these merged entries must not be done automatically. They contain a large amount of factually wrong data, obsolete formatting, as well as ambiguous content that would introduce unnecessary confusion, misleadingly suggesting that there is some kind of additional difference between those "languages" when in fact there is none. There is absolutely no loss in not having them, because everything of use is already contained in the merged ==Serbo-Croatian== sections. Users interested in adding sections in national varieties of SC alone (those that pretend to only speak/write "Croatian" or "Serbian", but can nevertheless smell a "Serbianism" in Croatian or vice versa from a mile away) can do so. They might be much more interested in automated extraction of interesting content from the merged or newly-created SC entries, an approach which should reduce their efforts by an order of magnitude at least. In fact, had they done so for these entries, they could've checked the list in a few hours and it all could've been added a long time ago. But it appears to me that that wasn't the point at all of this silly anti-Štambuk crusade. --Ivan Štambuk 15:56, 28 October 2009 (UTC)
Ivan, how did you infer Pepsi Lite's admiration for Dr. Vojislav Šešelj? The uſer hight Bogorm converſation 20:31, 28 October 2009 (UTC)
I respect Vuk Stefanović Karadžić. Mr. Štambuk admires Josip Broz Tito along with at least 25% of his fellow countrymen labeling him the 'Greatest Croat in history'. Croats admire Tito because on November 26, 1942 he promised us Serbs: democracy, inviolability of private property, freedom of individual economic initiative; and like a typical Croat he later changed his mind. Tito very successfully deceived Serbs and Winston Churchill, who gave him support because of his false promises. The communist leadership headed by Croat Josip Broz Tito sent Serbs and Montenegrins (like Milovan Đilas) who protested the lack of democracy to the Croatian island of Goli Otok.
Mr. Štambuk still holds the nationalist ideals to his heart but is not trolling the Croatian wikipedia like Mr. Ullmann is indicating. Mr. Štambuk is doing to us like the rest of his admired forefathers have done in the 20th Century: screw us over and throw us away like a used condom. Croats overwhelmingly voted on 19th of May 1991 in favor of separation (94.17%), and now this is not enough for them, apparently. Mr. Štambuk, your countrymen wanted and voted for this so there is no reason to tell other people lies that you are going along with the wishes of 90% of your countrymen.
Serbian and Croatian are not the same language. Croatian is composed of Kajkavian and Chakavian, which Serbian isn't. Wiktionary is all words in all languages. Some Croatian villager somewhere today (or in the past) is speaking either Kajkavian or Chakavian and those words should be included in Wiktionary. Kajkavian or Chakavian cannot be included as a 'Serbo-Croatian' entry, since those dialects are very different. Kajkavian and Chakavian should be included under Croatian. Likewise Torlakian under Serbian. The only way Serbian and Croatian words can be included together is if there is a L2 header called Neo-shtokavian. A L2 heading called Neo-shtokavian would stop my opposition to this grouping and would even stop Mr. Ullmann's trolling which is awesome and spectacular!--Pepsi Lite 08:09, 29 October 2009 (UTC)

Latin letter cleanup

After some reformatting of "A" (by several kind souls), what do people think of today's version (specifically the Translingual section compared to last week's version)? Several changes were made: use of {{Basic Latin character info}}, the reference image only shows one style (no italics) and a single case, new layouts of {{Letter}} and {{Latn-script}}/{{mul-script}} were used, images of different styles of the letter were put together, and etymologies of abbreviations were merged. Should these types of reformatting be done to the other basic Latin letters? Anything that people would disagree with, or more suggestions? --Bequw¢τ 16:50, 23 October 2009 (UTC)

Vastly improved. A long-needed effort. We would be much better off if all of them looked that good. But, can we make further improvements?
  1. The image seems large for its modest informational (or eye-candy) value with my default thumb settings, which are good for other images. Are there other images that might convey more? Is there a way of setting the image size to be smaller, even fixed size?
  2. The gallery doesn't play well with the rhs ToC, giving too much whitespace. Is that an intrinsic problem with gallery, which usually only appears below the bottom of the ToC? Is the gallery a Wikimedia thing? Is it alterable?
  3. Should even more of the material appear only in Appendices, which themselves might merit more of a preview/advertisement on the page. I am thinking of the related terms and the images.
-- Great job so far. DCDuring TALK 18:22, 23 October 2009 (UTC)
Stab taken. MediaWiki gallery syntax is more flexible than I recalled; setting "perrow=3" has solved the TOC issue for me (though obviously there are some screen resolutions where it would still cause problems). For the image in the character box, I'm thinking 50px should do just as well as 100px for most cases; people who want more detail can always click through to Commons. I do think it has to be set explicitly; using thumbnails would result in a much-too-large image (for most people) with lots of extraneous formatting.
I don't think appendices are a good idea. The entry is about the letter A; we may as well use it to showcase the information we have about that letter. Having an A entry that pointed to a separate Appendix:Latin capital A (or whatever) would not be very productive IMO. I've boxed related terms, though, since it's got plenty of room to grow. -- Visviva 05:12, 24 October 2009 (UTC)
Excellent on all counts. The ToC problem is solved for me as well. I hope it works for others. The two signal-flag images seem out of place. Can they go in the/a gallery? The images in the gallery also seem large for their information content. I wouldn't be so picky except this could be a template for at least the Latin characters and apply approaches and tricks that would work for other classes of single-character headwords. It might as well be the best we know how to do. If it involves features that are not documented in our normal places or are just out of the ordinary, our more technically adept contributors are the best ones to find appropriate solutions. I'm hoping that no character is more demanding than A. DCDuring TALK 10:36, 24 October 2009 (UTC)
Something like User talk:Visviva/Letter, maybe? MediaWiki galleries are kind of fiddly (for instance, you can't use variables or parser functions in them), but most of their behavior can be emulated. This may work poorly with funny-shaped images; I haven't really tested it beyond the one example there. -- Visviva 16:59, 24 October 2009 (UTC)
Beautiful. Yes, like that. Is that the basis for something more flexible (and improvable) than gallery?
Is there a way to do images that are sized relative to the user-preference for size of thumb? I am mostly thinking of a smaller size for low-information content images, but it could conceivably be useful for high-density images, though a larger size is just one click away, as you had pointed out. DCDuring TALK 17:17, 24 October 2009 (UTC)
I like it too. --Bequw¢τ 21:08, 25 October 2009 (UTC)
It includes all of the old ones but the NATO word. Would you just want it formatted so that that NATO word was a bullet point above or below the gallery? --Bequw¢τ 03:54, 27 October 2009 (UTC)
We currently have images of the letter A written in Fraktur, Uncial, and Roman serif fonts. Is this sufficient, or would we want to include other major types of typefaces (Roman script and Antiqua) or even other variants (such as a cursive or Textualis blackletter)? If we do, does anyone know about creating images of these types (I don't think there are free fonts for all these)? Would this information be too encyclopedic? --Bequw¢τ 19:01, 24 October 2009 (UTC)
It's a bit encyclopedic IMO, but it has closer-than-average connection with a dictionary. To me it seems like excellent Appendix material if WP doesn't cover it or we have something to say that they don't. DCDuring TALK 20:15, 24 October 2009 (UTC)
I think they should go somewhere, maybe on the main page, as it could help someone decipher which symbols on a document are which letters. --Bequw¢τ 21:05, 25 October 2009 (UTC)

Replace Category:Old French plurals with Category:Old French noun forms

Basically it's the simplest way of dealing with cases like none, which is the nominative singular of nonain. I've had a go at updating the relevant templates ({{fro-noun}}); if I've screwed anything up, do fix it or tell me. User:Widsith is the only other user (that I know of) that knows anything about Old French. So, anyone object? Mglovesfun (talk) 20:40, 24 October 2009 (UTC)

I'm strongly in favor of anything that gets us more Middle French, Norman French, or Old French entries because of their importance to English etymology. For that purpose, any simplifications that lead to more lemma entries would be great. DCDuring TALK 22:35, 24 October 2009 (UTC)
Do we have anything even close to policy describing when it is appropriate to ditch plurals and go with noun forms instead? I'd been thinking of nominating Category:Catalan plurals for deletion and simply leaving all the non-lemma forms in Category:Catalan noun forms (and Category:Catalan adjective forms, since at present the plural adjective forms are ending up in there as well). — Carolina wren discussió 23:14, 24 October 2009 (UTC)
Well Old French has a case system, (nominative and oblique). I sometimes wonder what to do about stuff like joueuse which is just classed as a noun right now, which is okay, but it could be considered a noun form. It's easy to see with Russian, Latin, Greek and whatnot, that when there's a case system, the only other sensible option would be Category:Latin genitive singular forms (and about 11 others) which is why we have Category:Latin noun forms. Other input? Mglovesfun (talk) 13:22, 25 October 2009 (UTC)
Support. I think that [[Category:langname plurals]] only makes sense for a language like English, where only one part of speech has a plural form, and said plural form cannot be marked for anything else (such as definiteness or case). Something like [[Category:langname plural nouns]] or [[Category:langname noun plurals]] could work for a language like Catalan or Modern French (especially if we treat pairs like cousin ~ cousine as being two related lemmata; I know that some people consider it to be one noun with an inflection to indicate its referent's natural gender), but the more general solution of [[Category:langname noun forms]] seems best. (Personally, I wouldn't oppose an all-out split — stuff like [[Category:Old French nominative plural nouns]] — but when we've discussed this in the past for verbs, the general attitude has seemed to be in favor of one big catch-all category for non-lemma forms of a given POS.) —RuakhTALK 13:03, 26 October 2009 (UTC)
Such categories were created on the basis of the English category model for parts of speech. There are these for many more languages that have additionally marked plural forms, and which also ought to be deleted. Some of these grew quite large (e.g. Category:Hungarian plurals). Deletion of these should best be discussed on an individual language basis, but the creation of new such categories for languages which inflect nouns for more than 1 plural number should IMHO be strongly discouraged, especially if there is any kind of syncretism (e.g. nominative plural and accusative plural forms are the same). --Ivan Štambuk 15:25, 28 October 2009 (UTC)
  • I don't object. I have always read "[lang] plurals" as "[lang] nominative plurals" anyway, but given that there are only two cases in OF, it does seem a bit weird, I agree. Ƿidsiþ 13:59, 26 October 2009 (UTC)
    • If nobody objects, I may as well get on with it. Mglovesfun (talk) 08:51, 28 October 2009 (UTC)

More advanced Wiktionary queries

I've been hard at work finding more stuff to index. I now have indexing working for which scripts are used in page titles, and I'm working on Unicode Collation Algorithm sort keys. Here's the first new result:

Let me know if you have some more ideas. — hippietrail 03:28, 26 October 2009 (UTC)

Have you gotten to templates and content-level items yet or are we still operating on headwords, categories, and headings? I'm still plenty busy on the product of your last runs. DCDuring TALK 09:33, 26 October 2009 (UTC)
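As a rough illustration of the "which scripts are used in page titles" index mentioned above, here is a minimal Python sketch. It is not hippietrail's actual tooling, which is not shown in this thread. Python's standard library does not expose the Unicode Script property, so the hypothetical helper `scripts_in_title` approximates a script by the first word of each letter's Unicode character name; that is an assumption that works for common cases but is not a substitute for real script data.

```python
import unicodedata


def scripts_in_title(title):
    """Return an approximate set of scripts used by the letters in a title.

    Assumption: the first word of a character's Unicode name (LATIN, GREEK,
    CYRILLIC, CJK, ...) is used as a stand-in for its script.
    """
    scripts = set()
    for ch in title:
        if not ch.isalpha():
            continue  # ignore spaces, digits and punctuation
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


if __name__ == "__main__":
    for title in ["coal mine", "слово", "我必須走了", "A我"]:
        print(title, sorted(scripts_in_title(title)))
```

Titles that yield more than one label (for example a Latin letter mixed with a Han character) are the kind of entries such an index could flag for review.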

Policy proposal

Per roof tile and coal mine passing rfd as the more common spellings of rooftile and coalmine, I think it would be good policy to accept de jure these sorts of entries that, while sum of parts as two (or more) words, are the more common form of the same thing written without the space. Otherwise you get something like this.

  1. (rare) Alternative spelling of coal mine.

Or if you split it like this

  1. (rare) Alternative spelling of coal mine.

It's confusing, because you might click on it not realising that it is two separate links, hence end up with no definition. Or, like this.

  1. (rare) Alternative spelling of coal + mine.

Looks a bit silly.

In fairness, I don't think stuff like coal mine should get deleted merely because it's got a space in it. I don't see how coalmine is less sum of parts than coal mine. Mglovesfun (talk) 14:42, 26 October 2009 (UTC)

Having default CSS underscore links will allow the "coal mine" method.​—msh210 14:59, 26 October 2009 (UTC)
I would strongly support underscores to differentiate a single multi-word link from multiple single-word links, especially if they were faint or dashed or dotted. But I would rather have conspicuous ones than not have them at all. But that has benefits in many circumstances, including many unrelated to the proposal at hand. I'm still without firm opinion of the proposal. DCDuring TALK 16:24, 26 October 2009 (UTC)
 :P I agree with Mg. IMO keep both or else keep neither; the better choice of the two is obvious, isn't it? 50 Xylophone Players talk 15:20, 26 October 2009 (UTC)
Me too. That is silly looking ... L☺g☺maniac chat? 15:33, 26 October 2009 (UTC)
If you treat coalmine and coal mine as typographical variants rather than spelling ones, then coal mine is not sum of parts at all. Mglovesfun (talk) 11:34, 27 October 2009 (UTC)

Verifying rare languages

See WT:RFV#tingo. Languages that are little used or not often written are, by nature, hard to cite with three durably archived cites. Apparently our only other Rapa Nui word is hehe, which I imagine is equally difficult to cite. There is no www.google.rap by the way. Is there any reasonable way to combat this? I can't think of one. Mglovesfun (talk) 20:52, 27 October 2009 (UTC)

I don't think the three cites rule fits smaller languages particularly well, since they're not only rarely printed, but the prints that exist are hardly available on the Internet. I think we should accept definitions in dictionaries or scholarly works for those languages. -- Prince Kassad 21:05, 27 October 2009 (UTC)
Well if it's written and has any kind of literature, I don't think we should stray away much from the usual CFI (we could e.g. lower it to only one attestation, because in those cases when a language is spoken by a tiny community there is little chance that the recorded word is not actually used). If a language has no written literature at all, and is only described in scholarly works in some form of transcription (or worse, in multiple incompatible transcriptions, depending on how the speech was analyzed by the linguists who investigated it), then it should IMHO belong only to the appendix namespace. A quick Web search yields several websites containing written Rapanui, so there's no excuse to remove the RfV label. --Ivan Štambuk 21:37, 27 October 2009 (UTC)
That would exclude all languages without a written tradition from appearing in Wiktionary at all. Given that there are about 5,000 languages without a written tradition, this is very major. -- Prince Kassad 13:30, 28 October 2009 (UTC)
Regardless, we shouldn't be adding them at all in the main namespace if there is no standardized orthography or transcription. The loss is minimal, as all of these languages will be extinct by the end of the century, and they interest barely anyone outside the academia. --Ivan Štambuk 15:11, 28 October 2009 (UTC)
That might be a fairer way to define what we currently include under "appeared in a well-known work" (or did when I last read this stuff) - if it appears as a mention in a scholarly article (or two, or three), then we can take Wikipedia's approach and define it in terms of what is given there (given that there are not enough cites to define it for ourselves) - though we might need more rigorous criteria to define when such mentions are acceptable, yadayada. Conrad.Irwin 21:41, 27 October 2009 (UTC)
In the case of Rapa Nui, there is http://www.rongorongo.org/index.html, but this dictionary does not have tingo. To me, the definition looks fishy. The book by Adam Jacot de Boinod seems to have been poorly researched and inaccurate. Looked at some of his German inclusions...they are either just plain wrong or, in some cases, do not even exist. I say, delete tingo posthaste. —Stephen 22:30, 27 October 2009 (UTC)
Google Books suggests a possible single reference outside of the controversial The Meaning of Tingo reference on "page 11" of Pacific Studies, Vol. 3-4, a publication by Brigham Young University--Hawaii Campus's Institute for Polynesian Studies, possibly being earlier than the books were published. (OCLC link) This particular reference, while perhaps rescuing the use, is still not well supported at best, and there's no way for me right now to easily confirm that this reference predates The Meaning of Tingo in any case. It does appear to discuss it in context with a similar meaning ("[a reciprocity system] abused by the unscrupulous who might make excessive demands"), however, which may be promising if it does appear to be independent from the word collection. Finding an actual Rapa Nui source may be difficult, however. --Pipian 07:09, 21 November 2009 (UTC)

Tbot mess with Chinese translations - this must stop

I left a message on Robert Ullmann's talk page but have had no reply yet.

This edit [6] created such a mess with this Chinese translation! Can this be stopped please? I don't know how many entries are affected but whatever it's doing, it's wrong! --Anatoli 22:42, 27 October 2009 (UTC)

It was probably caused by things like [[wo|wǒ]]. I think all of those are wrong and, if they are linked at all, should simply be [[wǒ]]. The tone marks should not be ignored. —Stephen 23:26, 27 October 2009 (UTC)
Yes, Stephen, that looked like an attempt to wikify pinyin linking to pages without tone marks but it all went wrong. The transliteration should follow "|tr=", so the result was just a piece of ugly looking code. Anyway, I don't see much benefit to linking transliteration to pinyin syllables with or without tone marks, besides, transliteration should be left alone. Anatoli 23:45, 27 October 2009 (UTC)
I have (just now ;-) replied, and yes, the problem is a bad attempt to wikilink to the forms w/o tones. Either they should be linked properly to the forms with tones, or, as is usual with transliterations, not linked at all, as transliterations are usually not also written forms. As Stephen says, lose the piped links and link to the forms with tones. Robert Ullmann 23:58, 27 October 2009 (UTC)
Please don't link at all. Anatoli 08:16, 28 October 2009 (UTC)
I agree, the Pinyin transliterations should be unlinked. Delete the links when you see them. —Stephen 05:43, 29 October 2009 (UTC)
Robert, as you said on your talk page (I got lost in that discussion, sorry), it seems Tbot is trying to link the first part of Japanese transliteration, assuming it is "|tr=(Hiragana), Rōmaji|". I don't think it's a good idea either. The transliteration is a free form, could be more than one reading, mixing Hiragana and Rōmaji or only Rōmaji. The Hiragana entry doesn't have to exist, if the word is seldom written in Hiragana. Even if there is a value in it, I'd leave it for humans to add a link. --Anatoli 22:04, 29 October 2009 (UTC)
  • I don't feel qualified to make a comment about the scripting issues, but I gotta say, those translations for "I must go" (我必須去 and 我應該去) are pretty awkward. This is because they use literal translations of the word "go", when in fact the implicature is not "go" but "leave". 我必須走了 would be much more natural. Tooironic 08:21, 2 November 2009 (UTC)
    • Although if I may just say one thing about the scripting thing, I recently added both "literal" and "natural" (though less accurate) translations for antipasto, please let me know if I've done the formatting correctly. Cheers. Tooironic 09:54, 2 November 2009 (UTC)
Thanks, I agree with your translation and I have changed it. True, I haven't added the explanation that "I must leave" is implied. Anatoli 22:36, 3 November 2009 (UTC)
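The outcome agreed above for Pinyin transliterations (drop piped links such as [[wo|wǒ]] and leave the transliteration as plain text) can be sketched in a few lines of Python. This is only an illustration, not Tbot's code; the helper name `unlink` and the idea of applying it to the transliteration string alone are assumptions.

```python
import re

# Matches [[target|label]] and [[target]]; group 1 is the visible text.
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def unlink(transliteration):
    """Replace every wikilink in a transliteration with its visible text."""
    return WIKILINK.sub(r"\1", transliteration)


if __name__ == "__main__":
    print(unlink("[[wo|wǒ]] [[bixu|bìxū]] [[zou|zǒu]] le"))
```

Because only the visible label is kept, the tone marks survive and nothing points at the toneless page titles any more.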

Portals

Have language portals ever been suggested here? I'm guessing yes but consensus can change. Essentially as per Wikipedia, a portal links the main categories and appendices using subframes and nice colors. Mglovesfun (talk) 20:59, 28 October 2009 (UTC)

The search function gets just one relevant result. Mglovesfun (talk) 21:10, 28 October 2009 (UTC)
Most languages have Wikipedia pages, many languages have pretty good (internal) About: pages, slowly more have Index: pages, and several have extensive grammatical appendices - maybe some kind of overview page could be created, not sure what location would be the best for it (I certainly think grammatical appendices are our biggest lack in the per-language department) - might be worth using the Index:<language> page if it's just to link to other places. Conrad.Irwin 21:17, 28 October 2009 (UTC)
Isn't that what the About pages are about? -- Prince Kassad 21:38, 28 October 2009 (UTC)
Those are for editors. I suspect Mg meant something for visitors.​—msh210 22:35, 28 October 2009 (UTC)
MG: Where would the portals appear? Surely not on an entry page? How then would a user find the portal page? Would it be a link from the L2 header, or via some banner at the bottom of each language section? DCDuring TALK 00:19, 29 October 2009 (UTC)
The French Wiktionary uses main page links to its portals, which I think is a good idea if we want to follow it. —Internoob (Disc.Cont.) 18:38, 31 October 2009 (UTC)
I don't think this is a good idea, Wiktionary seems rather short of people who care about nice colors, etc. (as evidenced by the surprising amount of indifference to the logo vote and the main page redesign). It seems to me like this would end up with some short-lived attempts to make a bunch of mostly unused pages and a lot of "to-do" things which would never get done. --Yair rand 01:21, 29 October 2009 (UTC)
Yes (per above) I mean for visitors rather than editors. I certainly think Index pages could handle this, and as Yair points out, we may end up with too much duplication from indexes/about/portals and they may end up being up for deletion. Mglovesfun (talk) 10:44, 29 October 2009 (UTC)
If this was done, I like the idea of linking all the level 2 headers to them (for all languages). The current linking to a dictionary definition seems a bit pointless. Conrad.Irwin 20:37, 29 October 2009 (UTC)
I had thought there was some reason why we were not having links in headers. Is it that we only want well-structured consistent ones, as via templates?
Presumably because it is easier to ensure they are the same, yes - and using a template or link to a specific place is harder for a new editor to learn. Conrad.Irwin 13:32, 30 October 2009 (UTC)
I thought I had seen at least one "About" page that was somewhat portal-like, but I can't locate it now. Could such a page serve or do we want a separate portal-type page? In the absence of a suitable page would the WP article be an acceptable link, or should we have/show no link until a portal page exists? DCDuring TALK 22:51, 29 October 2009 (UTC)
I think the Wikipedia article would make a good substitute, but don't remember seeing anything like what I imagine yet. Conrad.Irwin 13:32, 30 October 2009 (UTC)
IMO this is such a good idea that we ought to implement the heading idea using the WP articles ASAP, perhaps for "smaller" languages for which we are unlikely to develop our own portal in the near term. We can work through the corresponding article to make sure that there are wikilinks back to us to supplement the "back" button. Do any of our bots need to be amended to not flag the L2 heading link? Should we wait for templates? A small-scale experiment/demonstration/implementation would be useful before a vote if a vote is required or before large-scale implementation. DCDuring TALK 15:08, 30 October 2009 (UTC)
I just remembered the possible problem: Links in headers make for bad section links. We could get round this by defining a new link inside a template within the heading - but when people click on the ToC they will then be linked to a bad link - it's not a huge problem, but certainly one worth considering. AutoFormat would likely need instruction, as would the index processing stuff that I do, as would all of the inflectobots. Conrad.Irwin 15:26, 30 October 2009 (UTC)
What happens when they link? Do they go to right place but get a confusing message? Do they go nowhere? If it gives users a navigation/confusion problem, the net advantage goes negative IMO. DCDuring TALK 15:49, 30 October 2009 (UTC)
AFAIK, they do not work at all. -- Prince Kassad 18:44, 31 October 2009 (UTC)
I remember something mentioned in the Grease Pit a while back about how section links often have the same name. "#Etymology_2" for instance could refer to the first instance of ===Etymology 2=== or to the second instance of ===Etymology===. Perhaps with a single implementation we could kill two birds with one stone? Just a thought. —Internoob (Disc.Cont.) 18:38, 31 October 2009 (UTC)
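The "#Etymology_2" ambiguity described above can be reproduced with a toy anchor generator. The sketch below is hypothetical and is not MediaWiki's actual implementation: `naive_anchors` numbers repeated headings independently of what is already on the page, while `unique_anchors` keeps bumping the suffix until the anchor is unused.

```python
def naive_anchors(headings):
    """Append _N to the Nth repeat of a heading, ignoring other anchors."""
    seen = {}
    anchors = []
    for heading in headings:
        base = heading.replace(" ", "_")
        seen[base] = seen.get(base, 0) + 1
        count = seen[base]
        anchors.append(base if count == 1 else f"{base}_{count}")
    return anchors


def unique_anchors(headings):
    """Like naive_anchors, but never hand out an anchor that is in use."""
    used = set()
    anchors = []
    for heading in headings:
        base = heading.replace(" ", "_")
        candidate, count = base, 1
        while candidate in used:
            count += 1
            candidate = f"{base}_{count}"
        used.add(candidate)
        anchors.append(candidate)
    return anchors


if __name__ == "__main__":
    page = ["Etymology", "Etymology 2", "Etymology"]
    print(naive_anchors(page))   # two headings end up sharing one anchor
    print(unique_anchors(page))  # every heading gets its own anchor
```

Any scheme for linked L2 headers would need the second property, so that a ToC entry or section link always resolves to exactly one heading.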
Unindent back to Yair rand's comment It could be done. How many contributors would we need to make and maintain a portal? If MG and I expend the same effort on a portal that we spend on our user pages, a Portal:French would not look too shabby. —Internoob (Disc.Cont.) 18:38, 31 October 2009 (UTC)
Another thing is, that once a portal is "done" (that is, well formatted) it doesn't need much updating. It might only need a dozen edits in a year; things like new appendices. And yes L2 headers is the most logical way to do it. Mglovesfun (talk) 06:15, 1 November 2009 (UTC)
I just noticed that in the proposed Main page redesign there is a section called "Languages of the world" which currently links to language indexes. If portals are implemented, and the redesign ever gets done, it would probably make sense to have that section link to portals. --Yair rand 20:56, 4 November 2009 (UTC)

What exactly are the proposed Portals designed to do? Specifically, what would they do that isn't already being done in the language categories, indexes, and About pages? --EncycloPetey 00:14, 8 November 2009 (UTC)

I'm assuming from the above discussion and from what I've seen of the French Wiktionary portals, that they would be designed specifically for readers, with nice colors and designs, some information on the language, links to categories, and maybe some links to WM projects in that language. I think this could be useful if there actually is momentum to do this, but not if we're going to end up six months from now with only three or so portals. --Yair rand 16:40, 8 November 2009 (UTC)

November 2009

Bot idea

I'd like to know if there would be consensus for me (or someone else) to write and run a bot to move sister-project links (eg {{wikipedia}}) and images that are above the first language header to directly underneath that header. Information above the language sections is really language-independent. The two main examples of appropriate above-the-first-L2 elements are {{also}} (which links to entries that could be in any language) and {{character info}} (which shows encoding related info). Images and sister-projects links are inherently language specific. Images are representations of language-specific definitions (I'll steer clear of images in templates such as {{stroke order}} or {{character info}} which actually represent the character and not its definition) and the sister-project boxes link to language-specific projects (eg en.wikipedia.org). These elements therefore should be moved into the language section they correspond to, which is, as far as I've seen, always the top language section. This will maintain semantic order for the page and follow existing guidelines (eg Template_talk:wikipedia#Placement). Additionally, these offending elements mess with the right-hand side Table of Contents. Are there any notable deviations that I'm currently not aware of? If there is consensus I would code up a bot and submit it for a vote. --Bequw¢τ 22:51, 2 November 2009 (UTC)

I like the idea with one class of exceptions: links to WP disambiguation pages (and only dab pages) that actually exist. Such links should be in a more compact format than our big WP boxes. Those pages are more like our {{also}} links. They provide another opportunity for users to locate, for example, Proper nouns and Abbreviations which we do not cover. I am not sure whether this is desirable for other PoS entries. Perhaps something similar should be done for entries whose top L2 is not English, linking to a transliteration at WP, if there is one. I don't think we should ever direct users from a landing page to a non-English pedia or sister project unless it is from the corresponding language section. I certainly doubt that we would want multiple sister project links above the first L2 header. DCDuring TALK 23:32, 2 November 2009 (UTC)
I'd support that. And personally, I don't see a need for WP disambig boxes to be above the first L2 header. I don't know for sure, but I'd guess that banner blindness probably prevents users from seeing boxes up there, anyway. —RuakhTALK 01:19, 3 November 2009 (UTC)
That would be a reason to get rid of everything above the first L2. We've got browser frame, site banner, site tabs, and the {{also}} row. It is the relatively empty middle of the "also" row that would be my target. I would favor preventing the also row from ever pushing the rhs ToC down, by forcing it to remain within a frame that did not interfere with ToC, if that were possible. DCDuring TALK 03:06, 3 November 2009 (UTC)
Sorry, but if you have an image above the first L2, so, as you note, it's not tied in any bot-readable way to any language, how will the bot know where to put it?​—msh210 15:24, 3 November 2009 (UTC)
Everyone I've seen has always pertained to the top L2 (usually English). It's pretty obvious if it doesn't pertain to the top one, and I think editors spot and fix these quickly. --Bequw¢τ 18:57, 3 November 2009 (UTC)
I support the idea. I support that all English WP boxes that are now above L2 heading come under the L2 heading, including disambiguating ones, as they pertain only to English regardless of their being disambiguating.
Images: The bot should make the assumption that any image that is above the first heading in an entry that has an English section is an image belonging to the English section, as my experience wholly confirms this assumption; the exceptions will have to be manually corrected.
An alternative to {{wikipedia}} below L2 heading would be {{pedialite}} in See also section. But I propose to leave this aside: whatever the preferences regarding the two templates, robotic moving {{wikipedia}} from above L2-heading below it does not increase the number of {{wikipedia}} instances. --Dan Polansky 16:34, 3 November 2009 (UTC)
The wikipedia links still pertain to only English so I think the proper way to make those links prominent on pages where wikipedia has definitions we don't is through a layout change in the English section or adding them to also somehow. The boxes shouldn't be left outside any L2. These possible solutions could be handled afterwards with no problem, assuming there's a consensus for that. For number's sake, out of about 12k {{wikipedia}} invocations that are above the first L2, about 1k are explicitly disambiguation pages. --Bequw¢τ 18:54, 3 November 2009 (UTC)
"Explicitly" meaning that that's how {wikipedia} refers to them? — or meaning that the WP page is categorized in a disambiguation category or has a disambiguation-page template?—msh210 18:57, 3 November 2009 (UTC)
I meant "explicit" in the template call ({{wikipedia|dab=}}), so not perfect, just a rough estimate.--Bequw¢τ 14:41, 4 November 2009 (UTC)
Is it possible to specify a location that is just below the first L2 and just to the left of the ToC? That would be fine. Another location is alternative forms, though the header would be a misnomer. I am hoping to find a solution that doesn't take up vertical screen space either for the main flow of content (left justified) and the rhs ToC. Does or can the ToC have a maximum width with excess text wrapping to the line below at a reasonable resource burden? DCDuring TALK 21:32, 3 November 2009 (UTC)
Yeah, that could be done with some modifications. See this Sandbox version which works with the latest versions of FF, IE, and Chrome. Maybe an option could be passed into {{slim-wikipedia}} which would make it float next to the ToC and set the width to be variable rather than fixed. Do you like that? I do, it's prominent and still inside the L2. As for the ToC, its div tag can have a fixed width, but on overflow, the div will just present scrollbars rather than having the content wrap. But I don't think this is a huge problem because our Headers are generally just one or two words. --Bequw¢τ 14:41, 4 November 2009 (UTC)
That is beautiful. Exactly what I had in mind. Thanks.
Does anyone object to that placement for WP links to dab pages? It seems adequate for users who might be searching for proper noun senses or encyclopedic senses of a word and consistent with the logic of our heading structure. Should it be limited to use with dab pages in that location? What should it be called as a template? DCDuring TALK 15:30, 4 November 2009 (UTC)
Would something like {{temp|also|wp=}} work? generating "see also bing, bong, and Wikipedia articles" (for the disambiguation pages at least). Conrad.Irwin 21:47, 3 November 2009 (UTC)
{{also}} is for language-independent or cross-language linking, which is why I think that links to English (language-dependent) Wikipedia should not be on the same line with it, but rather should be somewhere below the "==English==" L2 heading: either in a box or in a pedialite link. An example of an entry with two boxes is "word"; an edit that the bot would make would be like this one made to "word", resulting in two Wikipedia boxes below "==English==" heading. --Dan Polansky 08:41, 4 November 2009 (UTC)
Do you think that the two wikipedia boxes stacked on top of each other is a problem? They are almost identical (the disambig one says "articles") which is not ideal. Anything to be done about these? --Bequw¢τ 14:44, 4 November 2009 (UTC)
I see no problem in having two boxes, but other people may differ. If the robot does not change the number of boxes, fine with me. If the robot keeps only one box in case there are more than one boxes, and it keeps the disambiguating one, fine with me. --Dan Polansky 18:02, 4 November 2009 (UTC)
Building off Dan's comment: It should be possible to link to two different articles with a single box, rather than stacking two boxes. Alternatively, the {{pedialite}} link can be used instead of the box, when the box template becomes cumbersome. --EncycloPetey 00:11, 8 November 2009 (UTC)

Translations of the attributive form of nouns

Many English nouns may be used attributively as adjectives. In many cases the foreign-language translation is different to that of the noun sense. How do we show that? As an example, the translation table of folklore has folclore as an Italian translation - but nothing to point to folcloristico. SemperBlotto 22:46, 4 November 2009 (UTC)

Italian: {{t|it|folclore}} {{pos n}}, {{t|it|folcloristico}} {{pos a}} {{qualifier|in attributive use}}?​—msh210 22:58, 4 November 2009 (UTC)
Probably on folkloric xD However, in general this is a very delicate issue, because there are also other adjectival senses that are not (or cannot be) expressed in English but can be relatively regularly derived in a FL, and that are being missed in the translation tables simply because they all translate as English attributive noun usage. E.g. "mother's", "of or pertaining to a mother", "like a mother" etc. I think that we should either:
  1. include separate translation tables for attributive usages of English nouns, which note that these translate as adjectives in many FL
  2. not add adjectives to the translation tables of English nouns at all, because of the cluttering and because these would prob. wrongly get generated as nouns on many FL wiktionaries that bot-generate mainspace entries from the English Wiktionary translation tables (I know this is not our problem, but still..). Also, in most of the cases, such adjectives are regular morphological derivations from the base noun, and could be listed in the FL noun entry in the ====Derived terms==== section. (In this case, folcloristico at folclore) --Ivan Štambuk 23:18, 4 November 2009 (UTC)
OK - I've added a second translation table to folklore - does that seem reasonable? SemperBlotto 08:00, 5 November 2009 (UTC)
Yes, indeed. "Attributive use" might not be clear enough as a table header line; perhaps "adjectives (translating the attributive use of the noun)"? (Well, that's worded badly, but something else explicit.)​—msh210 15:40, 5 November 2009 (UTC)

AWB please?

Hello. Can I please be approved for AWB use here please? Thanks, Razorflame 23:10, 5 November 2009 (UTC)

Done: Wiktionary:AutoWikiBrowser/CheckPage#Approved users. —RuakhTALK 23:21, 5 November 2009 (UTC)
Same here please, although I don't know how to use it, I will need to know how. Mglovesfun (talk) 11:54, 6 November 2009 (UTC)
You should have access automatically as you are an administrator - if not you can authorize yourself. There is documentation somewhere on Wikipedia, I think. Conrad.Irwin 13:36, 6 November 2009 (UTC)

Categories with "words"

Some categories use "words" to actually designate some "terms"/"vocables"/"locutions", for instance: Category:Autological_words, Category:Compound_words, Category:English_words_spelled_with_macrons.

I'm just suggesting to list, vote and rename all of them "properly" (as we've already done on fr.wikt). JackPotte 23:25, 5 November 2009 (UTC)

I agree that words isn't the best choice, because plenty of entries are not single words. I think vocable might be a bit obscure. Term sounds pretty good to me. Equinox 23:27, 5 November 2009 (UTC)
What is the benefit to users or anyone else of going through a voting process on this matter? DCDuring TALK 01:16, 6 November 2009 (UTC)
Having categories with more accurate and consistent names. Equinox 01:23, 6 November 2009 (UTC)
OK for "Terms in...", let's call a cat a cat, as on the other wikis. JackPotte 11:15, 6 November 2009 (UTC)
"Compound words" is currently accurate, as only closed compounds, meaning space-free, are within the scope of the category. So "black hole" is not included, and all the terms are indeed words in the sense "space-free terms". --Dan Polansky 09:14, 8 November 2009 (UTC)
Sorry but I don't agree, many examples like loan_translation include a space. Thus I'll rename these categories if nobody argues in 1 week. JackPotte 15:20, 16 November 2009 (UTC)
Loan translation is not a compound as we use the term here. We use the term specifically to refer to combinations of entire words that do not have a space, as Dan Polansky pointed out. If the term "compound" is used to include other kinds of terms in fr.wikt you might consider the advantages of correcting it. I strongly object to the proposed renaming, especially of the "compound words" set of categories.
Further, it will not make people receptive to any of your proposals if they are accompanied by ultimatums. DCDuring TALK 22:16, 19 November 2009 (UTC)
JackPotte, I am unclear what you disagree about. The term "loan translation" is not a closed compound, as it contains a space. Yes, there is also what some call open compound, which includes terms with spaces such as "black hole". But open compounds are currently excluded from Category:English compound words, as you can see from having a glance at the content of the category. Until there is a clear consensus to include open compounds within the category--which I oppose as I think it would make the category much less useful--the category should keep its name "English compound words", which accurately describes its content. --Dan Polansky 08:51, 20 November 2009 (UTC)
I've never seen any English textbook refer to "compound terms." In English, they are called "compound words." Saying that you've made the same change on fr.wikt doesn't just seem contrived, it seems inappropriate. Do the proposed moves make more sense in English? --Connel MacKenzie 21:22, 19 November 2009 (UTC)
Re: "I've never seen any English textbook refer to 'compound terms.' In English, they are called 'compound words.'": Yes, I agree. —RuakhTALK 21:58, 19 November 2009 (UTC)
Is there any objection to renaming the other two categories (Category:Autological words, Category:English words spelled with macrons) to use "terms" rather than "words"? I support that. --Bequw¢τ 14:42, 20 November 2009 (UTC)
+1 with Bequw: it's actually the fr.wikt state, which doesn't publish any "compound term" either. Sorry, I didn't mean to rush anybody; let's just define a vote closure date, if no other arguments are to be taken into account. JackPotte 21:42, 20 November 2009 (UTC)

Valencian -iste

First the background so you all can understand where I'm coming from. In the IEC's standard for Catalan, the suffix that corresponds to -ist is -ista, which is both masculine and feminine. That's not too surprising: while nouns ending in 'a' are usually feminine in Catalan, the Latin suffix -ista this descends from is masculine, and similar cases where a masculine Latin noun stem ends in 'a' are usually masculine nouns in Catalan. However, in the Valencian (AVL) standard, they allow either -ista as the masculine singular or, as a backformation, -iste. There doesn't seem to be a similar case of a to e backformation for other words ending in 'a' that are both masculine and feminine (or if there is, it is rare). Therefore I'm planning on writing some specialized templates for this situation. I don't need help with the coding, but a potential policy type issue relating to the coding came to mind as I was writing one of them.

The template in question is the one I plan to use to generate the definition line for the alternate masculine singular form. Since that definition line will always have the {{Valencia}} regional context label, and never any other context label, my initial inclination was to have the new template include {{Valencia}}. The minor advantages of doing so are that in the unlikely event we change how we do regional context labelling, the change could be done via one edit to the template, or in the equally unlikely event of the acceptance of -iste by the IEC as an alternate form for the masculine singular, the label could be removed with one edit. The only meaningful advantage is not having to type or paste {{Valencia}} into the wikitext.

As to the potential disadvantages, first off there is a mild side effect that having the new template include {{Valencia}} would cause it to be a member of Category:Regional context labels. However, the concern that brought me here to bug you with this is do we want context labels buried inside non-context templates or should context templates always be present in the wikitext itself? I don't think it would be a concern, but before I write the template and make use of it, I thought I'd get input from others. — Carolina wren discussió 01:53, 6 November 2009 (UTC)

Edits to etymology that remove "from" from them.

Hello there. I've had several people tell me that I should request the community's support in order to not have my edits rolled back. I went and made a bunch of edits that removed the "From" from some of the etymologies of some entries for words that began with re- dis- over- anti- non- un- and in- to make them all the same. The majority of them were made without the "From" in the Etymology section, so I believe that it would make it consistent to have them all one or the other. Since more than 90% of them don't have the "From" in the etymology section, I believe that removing the "From" in the etymology section would be the way to go. What do you think? Razorflame 07:56, 6 November 2009 (UTC)

The logic behind keeping "from" is the idea that many people don't know what the word etymology means. "From" hints them that this section is about the origin of the word. Besides, I think the ety line looks more friendly with "from" in it. Keep it. --Vahagn Petrosyan 10:33, 6 November 2009 (UTC)
I prefer to have from, From break + -able.. Mglovesfun (talk) 11:44, 6 November 2009 (UTC)
Does that mean that you don't like the use of the {{suffix}} template? —RuakhTALK 11:47, 6 November 2009 (UTC)
I agree 100% with Vahagn. —RuakhTALK 11:47, 6 November 2009 (UTC)
No, but if I used suffix here, it categorizes this page (hence {{term}}). What I meant to say was "I agree 100% with Vahagn". Mglovesfun (talk) 12:06, 6 November 2009 (UTC)
Ah, O.K. :-)   For future reference, most of our templates — including {{suffix}} — only categorize when used in entries. So <u>From {{suffix|break|able}}.</u> produces From break +‎ -able., no category. :-)   —RuakhTALK 12:17, 6 November 2009 (UTC)
BTW, see previous discussion at [[Wiktionary:Beer parlour archive/2008/November#Etymology sections are very concise]]. (Thanks go to Dan Polansky for tracking down the link!) —RuakhTALK 12:24, 6 November 2009 (UTC)
I also prefer including the "From", though occasionally I'm not sure it's actually true; just because it is made up of the two parts, doesn't mean that that's where it originated. While, in this case, I don't mind particularly either way, in future you should ask people for reasons that the situation is as it is before deciding to fix it on a whim. Conrad.Irwin 13:33, 6 November 2009 (UTC)

(unindenting) So everyone prefers to add the from in front of the {{prefix}} instead of not including it, because if I were re-granted the AWB rights, I would fix my "mistake" and make them all From first, unless people want them to change back to the way that they were. Secondly, CI, I've already learnt from my mistake, but I did not think it was a big enough of a deal to warrant its own community discussion. Cheers, Razorflame 13:38, 6 November 2009 (UTC)

I think they're better left alone, there are more important things to do - including (but not limited to) User:Hippiebot's, User:Visviva's and User:Brian0918's vast lists of entries we are missing. Many entries are missing basic information, like quotes, pronunciation, and synonyms. Formatting issues like this can just be ignored for now. Conrad.Irwin 13:48, 6 November 2009 (UTC)
Agreeing with Conrad, I believe that many (but not a large portion) of the terms for which we show only morphological information in the etymology actually had formed in Middle English, Anglo-Norman, or some vintage of French, Latin, or even Greek (-ize verbs often are of Greek formation). Once we track the formation, we would be entitled to validly insert "from". There is good, albeit undocumented, reason to not have complete uniformity. The presence of a "From" could be taken as a positive indication that the word was formed in Modern English and not another language. I suppose a comment in the entry confirming the verification of that fact or conclusion would be useful to prevent the "from" from being deleted, at least manually. DCDuring TALK 14:27, 6 November 2009 (UTC)
On the other hand, if the majority of entries can be given a consistent format now, future editors are more likely to follow the convention. While I agree there are many important tasks here, nipping a formatting problem in the bud is still worthwhile. --Bequw¢τ 17:01, 6 November 2009 (UTC)
Razorflame, perhaps you could take care of Category:Translation table header lacks gloss? Missing glosses make me not wanna add translations at all. --Vahagn Petrosyan 14:20, 6 November 2009 (UTC)
Is that something done with AWB? How would I go about doing it? Thanks, Razorflame 14:23, 6 November 2009 (UTC)
The types of edits required are this and this. I don't know if AWB can help here. --Vahagn Petrosyan 14:38, 6 November 2009 (UTC)
Nope, that can't be done via AWB. Razorflame 14:59, 6 November 2009 (UTC)
  • What is this gloss thing you guys speak of? Tooironic 23:47, 7 November 2009 (UTC) Oh wait, do you mean the translation explanation in the {{trans-top}} tag? I've stopped adding these for words with singular senses as, IMO, it's a waste of time. Tooironic 23:50, 7 November 2009 (UTC)
If someone is translating using ordinary edit, the gloss may be the only reminder of the sense they are translating. It is one of the many concessions that we make to the cognitive limitations of us imbeciles. DCDuring TALK 00:03, 8 November 2009 (UTC)
It means that, when someone adds another sense, without adding a translation section, that all the translations don't have to be put under a {{ttbc}} when it is noticed that we have one table for two sense and no-one knows who added what for what (the history feature is only useful if this is caught within an edit or two). While it brings no immediate gain, it provides a much more stable foundation on which to build the rest of the entry. Conrad.Irwin 00:07, 8 November 2009 (UTC)
Oooooh, I see. That makes sense. I guess I better stop being lazy and add them from now on. Cheers. Tooironic 01:44, 8 November 2009 (UTC)
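For anyone who wants to find glossless tables in a batch of wikitext before fixing them by hand, here is a minimal Python sketch. It rests on an assumption: that the gloss is passed as the first unnamed parameter, as in {{trans-top|a tale}}. The function name and the regular expression are illustrative only and are not part of AutoFormat or AWB. It only reports line numbers; writing the gloss remains a human job.

```python
import re

# Assumption: the gloss is the first unnamed parameter, e.g. {{trans-top|a tale}}.
# A bare {{trans-top}} or an empty/whitespace-only parameter counts as "no gloss".
TRANS_TOP = re.compile(r"\{\{trans-top(?:\|([^}]*))?\}\}")


def tables_lacking_gloss(wikitext: str) -> list[int]:
    """Return 1-based line numbers of {{trans-top}} calls that have no gloss."""
    missing = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        for match in TRANS_TOP.finditer(line):
            gloss = (match.group(1) or "").strip()
            if not gloss:
                missing.append(number)
    return missing
```

The list it returns is meant as a to-do list for manual editing, not as input to an automatic fix.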

Archiving this page

Ok, I just have something to say about this page; it is too damn big. 744 KB is too big of a page for many people to load, let alone certain editors. I am certain that if you keep the size of this page down that you could possibly get more people engaged in actively editing in this section of Wiktionary. Just a thought, Razorflame 14:59, 6 November 2009 (UTC)

Feel free. :p Conrad.Irwin 15:50, 6 November 2009 (UTC)
Much better, but still a tad too big. I would rather allow other more experienced editors or administrators to archive this page rather than me because I do not feel as though I am experienced enough to archive it, as I will probably not know which discussions are still active or not. Razorflame 16:25, 6 November 2009 (UTC)
We should really have automatic archiving of the "big" discussion pages (RFV and RFD spring to mind). Of course there'd have to be some new mechanism for linking, so that a semi-recent discussion wouldn't become unlinkable just because it had been archived. Perhaps somebody else has bright ideas on this...? Equinox 12:21, 10 November 2009 (UTC)
We should use sub-pages, as what we do with WT:VOTE - that way, if someone links to WT:BP#topic-name, the archive can be found by changing the # to a /. The problem is that it's (slightly) more effort to create a new conversation, but a BOT with the sole function of adding Wiktionary:Beer parlour/subpage to the end of WT:BP every-time a sub-page is created would almost eliminate this (if it were watching recent-changes in real-time). If we were to do this, the bot would likely need to develop a few extra features over time, firstly splitting off new sections added the old way into new sub-pages; removing deleted pages; adding an archive template to manually untranscluded pages; and maybe automatically archiving sections a month after the last reply. If people want to do this, I am very happy to evolve (or help to evolve) such a bot. Conrad.Irwin 12:43, 10 November 2009 (UTC)
But are they going to install mw:Extension:LiquidThreads soon? --Bequw¢τ 14:19, 10 November 2009 (UTC)
Possibly, it does mostly the same, except implemented as an extension - I'm not sure how it handles archiving now, initially it was just going to limit to X conversations per-page; so hopefully that's been made more flexible. Conrad.Irwin 14:23, 10 November 2009 (UTC)
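The link mapping in the sub-page proposal above (changing the # to a /) can be sketched in a few lines of Python. This is only an illustration of the idea, not code from any existing bot; the function name is hypothetical, and it assumes the sub-page title is exactly the section name.

```python
def archive_title(link: str) -> str:
    """Map a section link such as 'WT:BP#topic-name' to its sub-page title.

    Assumption: the sub-page is named exactly after the section, so only the
    first '#' needs to become '/'.
    """
    page, sep, section = link.partition("#")
    if not sep or not section:
        return link  # no section anchor, so there is nothing to map
    return f"{page}/{section}"
```

For example, archive_title("WT:BP#topic-name") gives "WT:BP/topic-name", while a link with no section is returned unchanged.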

w:template:talkback

Would there be any objection to importing this template from Wikipedia? --Yair rand 04:45, 10 November 2009 (UTC)

I wouldn't object to it, although I cannot say how much it will be used, which might make not importing it the better choice. Razorflame 06:03, 10 November 2009 (UTC)
It might be nicer to not use a box as then it stands out more than any other part of a normal talk page, despite having less use than any other part of a talk page. If a box must be used it should inherit from {{request box}} or {{maintenance box}} (see Template:rfe or Template:rfc) so that the look is consistent, and it is reasonably certain to interact correctly with other layout features (particularly right-hand elements interact badly with naive boxes). I personally do not like to be pinged with such messages, I check my watch-list frequently enough that I will know if there's something I want to reply to. Conrad.Irwin 12:37, 10 November 2009 (UTC)
I suspect most people check usertalkpages they post to for some time, but, for example, I recently responded on my talkpage to an old comment that I doubted was being followed any longer by the OP, so posted a note on his talkpage that was roughly ==[subject matter]== I've responded at my talk.~~~~. I could have linked to the section easily enough. Does that require templatification? Not imo, but I don't see the harm in it. I say go for it, but either with Conrad's modifications, or, better yet, as non-boxed text.​—msh210 18:23, 10 November 2009 (UTC)
Okay, how about this? --Yair rand 19:36, 10 November 2009 (UTC)
Also see {{Mytalk}} --Yair rand 19:49, 10 November 2009 (UTC)

Bot request

Hello. I would like to propose a bot to be used here. I currently use it for interwikis, but I wouldn't mind reworking it to be used for AWB tasks. Here is what I would like to propose:

  • My username: Razorflame
  • My bot's username: Darkicebot
  • Software used: AWB
  • Proposed task: Both Prince-Kassad and hippietrail said that {{top}}, {{mid}}, and {{bottom}} templates on entries needed to be changed to {{trans-top}}, {{trans-mid}}, and {{trans-bottom}}. This task would easily be able to be done using Darkicebot and AWB, as it needs to be done, according to the two users above. If you would rather a human do it, I would also be happy to do it, but I would require AWB access in order to do it, a point of contention amongst me and Ruakh.

Thanks, Razorflame 06:00, 10 November 2009 (UTC)

Doesn't AutoFormat already do that? --Yair rand 06:06, 10 November 2009 (UTC)
Not that I know of. There seems to be a long list of entries on the What links here for the templates that need to be done, so....Razorflame 06:10, 10 November 2009 (UTC)
AutoFormat only does it when the gloss is present. It doesn't do it when a gloss is not present. Razorflame 06:11, 10 November 2009 (UTC)
No. Any mass conversion to {{trans-top}} needs a human to generate glosses and determine whether the translations need to be split to conform with senses. A bot would only create more work in Category:Translation table header lacks gloss. Nadando 06:13, 10 November 2009 (UTC)
1. AutoFormat does not add a gloss either and nobody seems to complain against that, 2. however AutoFormat only reads the recent changes, so we need a dedicated bot for {{top}} replacement. -- Prince Kassad 16:51, 10 November 2009 (UTC)
As Nadando said above, {{top}} et al can't be converted wholesale to {{trans-top}} et al because they aren't 1:1. Top is used for more than just translation sections and the translation sections using top can't just have the templates swapped, as trans-top has a different and more involved usage. This is work best done by humans, assisted by AWB probably but not done blindly. - TheDaveRoss 17:05, 10 November 2009 (UTC)
It does say AWB right at the beginning. -- Prince Kassad 18:11, 10 November 2009 (UTC)
AWB is not a bot, it is a browser. This is a bot request. You don't need -- nor should you have -- bot status to run AWB in manual mode. - TheDaveRoss 20:42, 11 November 2009 (UTC)
Then can I turn this into an AWB request to use AWB myself on this account to do this? Cheers, Razorflame 16:16, 12 November 2009 (UTC)
You don't need permission to use AWB, and you don't need a special account. As long as you know what you are doing and what you are doing is in accordance with en.wikt policies etc. you can just do it. If you are going to be doing a lot of edits quickly you ought to mark them as minor, but other than that no special consideration is needed. - TheDaveRoss 22:00, 12 November 2009 (UTC)
Ruakh gave me AWB access a few days ago, but took it back when s/he said that I needed to talk with the community before any big changes...Razorflame 22:01, 12 November 2009 (UTC)

Huh?

What in the world is the Wikipedia logo doing at the top of every single Wiktionary page with a link to a page titled "Wikipedia Forever"? --Yair rand 05:19, 11 November 2009 (UTC)

It's yet another (badly implemented) advertising banner. It's slowed my page loads down to a crawl on both Wiktionary and Wikipedia. --EncycloPetey 05:58, 11 November 2009 (UTC)
You can adblock this banner, if it bothers you. -- Prince Kassad 06:01, 11 November 2009 (UTC)
I have my preferences already set to not show them, but it pops up for a fraction of a second anyway, and the loading delay prior to that is still a problem. --EncycloPetey 06:10, 11 November 2009 (UTC)

If others are willing, I would suggest adding the following to MediaWiki:Common.css so that it looks less like we have been hacked (it is about the same, but a third of the size, and with internal link colour instead of external link colour). I have no idea who was responsible for this abomination, but I hope they're reading the comments from all-over the wiki-sphere. Conrad.Irwin 13:43, 11 November 2009 (UTC)

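/* Shrink the site notice banner. */
div.siteNoticeBig {
    height: 30px;
    margin-top: 3px;
}
div.siteNoticeBig .toggle-box {
    font-size: 1em;
    padding-right: 5px;
}
/* Scale the image down to fit the shorter banner. */
div.siteNoticeBig img {
    height: 20px;
    padding: 0px 10px;
    margin-top: 5px;
}
/* Use the internal link colour instead of the external link colour. */
div.siteNoticeBig #forever {
    margin-top: -50px;
    font-size: 12px;
    font-weight: bold;
    color: #002BB8;
}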
The banner has been removed due to IE6/7 breakage. Apparently there has been a huge amount of opposition to the banner: w:Wikipedia:PROPS#Abolish the silly headers, m:Fundraising_2009/Launch_Feedback, n:Wikinews:Water_cooler/technical#Ugly_Ass_central_site_notice. There is now a discussion going on to make alternative banners. --Yair rand 17:54, 11 November 2009 (UTC)
The need for project-localized banners was raised earlier, by me and by others. I expect this is just a sign of being in a hurry; there is interest in making sure this happens. Please add your voice to the need for suitable Wiktionary banners on the launch feedback page. +sj + 11:56, 12 November 2009 (UTC)

Request for clarification of policy / convention in re inflected forms of English multi-word idioms

What is the present policy / convention governing inflected forms of English multi-word idioms such as damn by association and fudge the issue? It was my understanding that we aren’t meant to give such terms full entries (or even soft-redirect entries), but that they should rather hard-redirect to their lemmata. This question stems from uncertainty in a discussion between msh210 and me. How are they meant to be treated?  (u):Raifʻhār (t):Doremítzwr﴿ 18:17, 11 November 2009 (UTC)

NVRM; Ruakh has resolved this confusion: Such forms are meant to hard-redirect, per Wiktionary:Redirections#Redirecting between different forms of idioms.  (u):Raifʻhār (t):Doremítzwr﴿ 18:22, 11 November 2009 (UTC)
An aside: the title of this thread should IMHO have been "Inflected forms of English multi-word idioms" or something of the sort, to be usefully short. --Dan Polansky 20:01, 11 November 2009 (UTC)
If we have the technology to put a man on the Moon (or [your favorite technological milestone here]), we should have the technology to do word wraps in the floating right-hand ToC. That would get rid of one bad consequence of overly long section heads. DCDuring TALK 23:07, 11 November 2009 (UTC)

Could all entries have a non-bolded repeat of the word?

Hello,

I am a user. What I am finding is a need for your word entries to have a non-bolded copy of the word, so I can simply copy and paste it into my document. I don't look up words I already know how to spell or that the spell checker in MS Word already knows. So the words I am looking up need to be copied for speedy processing. But once I copy/paste, I always have to go back and un-bold them. This seems a waste of time to me. Giving me such a copy of the word would enhance the usefulness of the site.

Thanks for your consideration. —This comment was unsigned.

  • Time and effort involved - masses.
  • Would help - one person.
  • Would piss off - many people.
  • We probably won't do it. SemperBlotto 14:54, 12 November 2009 (UTC)
As far as I know, only certain browsers like Firefox copy the style of a word/links/whatever else. I'm not really sure what the big issue with unbolding a word is...lazy person. — [ R·I·C ] opiaterein — 15:22, 12 November 2009 (UTC)
Use Opera. It does not copy formatting. --Vahagn Petrosyan 16:10, 12 November 2009 (UTC)
The title is non-bolded, though it might copy it out title-sized. Conrad.Irwin 16:28, 12 November 2009 (UTC)
They could use Paste Special, Unformatted, or even write a little macro so that it is one combo-keystroke. Richardb

Darkicebot inflection testing

Hi there all. I am going to be testing my inflection bot Darkicebot for a while. It won't make too many edits. I just need to test it while I get it working properly. I am only going to be making an edit or two, stopping it, then repeating the process until it works. I plan on fixing all mistakes it makes in the meantime. Thanks, Razorflame 17:13, 12 November 2009 (UTC)

Testing has been completed and a vote is now active. Thanks Razorflame 20:52, 12 November 2009 (UTC)
I notice you have several templates lined up for use with the bot, but the VOTE only mentions Esperanto verbs - which languages and parts of speech will Darkicebot service? Conrad.Irwin 23:40, 12 November 2009 (UTC)
I've gone over the vote and added in the information which you are looking for. I've stated on the vote page that I will be creating Esperanto verb form-of entries, batch creating Esperanto noun form-of entries from a text file, and batch creating Ido noun plurals from a text file. Those are my goals. I will probably add in Esperanto adjectives, too, but I think that I will wait for that for a while longer. Cheers, Razorflame 23:43, 12 November 2009 (UTC)

My userbox

Hello there. After reading over two archives about userboxes, I noticed that you said that controversial userboxes are not allowed here on the English Wiktionary. I made User:Razorflame/Count so that I could keep track of how many edits that I have made here, and I don't believe that it was offensive towards anyone, much less controversial. Ruakh removed it from my userpage (made it invisible), and I went ahead and fully deleted it. I just wanted to ask here if I could be allowed to use this userbox as it isn't doing any harm to anyone. Thanks, Razorflame 20:52, 12 November 2009 (UTC)

Note for the unaware: the relevant policy is (for some inconceivable reason) Wiktionary:Neutral point of view, which states that "All other [non–language-proficiency] userboxes are currently forbidden (though specific exceptions may be made, after discussion)." I believe Razorflame is asking for such a specific exception; if so, I'd be O.K. with that. The only POV it seems to reflect is that it's good to contribute to Wiktionary, which I think is a POV we can all accept. (Arguably it implies that quantity is more important than quality, but personally, I'd much rather a number-of-contributions userbox than a my-contributions-are-better-than-yours userbox.) —RuakhTALK 22:49, 12 November 2009 (UTC)
I don't see it as a good thing to boast about, the number of contributions is almost completely irrelevant. The idea of achieving a high "edit count" can encourage people to make many edits of meagre substance - and while there are plenty of that kind of edit to do, I feel it is more useful to encourage people to make more beneficial edits. As some edits are considerably harder, and may take thousands of times as long as quick-fixes or javascript edits, the "amount of contribution" of two users with a similar edit count could be well over 100% different, it would seem misleading if they wear the same badge. The policy got co-opted into WT:USER, which while not voted on, is I think reasonably accepted - though we seem quite lax about signatures in some cases. Conrad.Irwin 23:36, 12 November 2009 (UTC)
I suppose Conrad's right that it'd be inappropriately promoting quantity over quality, but I don't see that as a big deal. What I see as a bigger deal is the slippery slope toward infection by the bacterium Userboxophilis uikipedii.​—msh210 16:19, 13 November 2009 (UTC)
It isn't a question of being proud of your edit count or displaying a statistic. Both Stephen and I link to this site which keeps track of such things. The concern among the community has always been the problem of what's happened to UserBoxes on Wikipedia. There, the boxes have become silly, numerous, and (therefore) often uninformative. On a multilingual dictionary project like this, the most relevant information to have in a standardized format is language proficiency, with script proficiency just a bit behind that. Your politics, ideologies, diet, and other idiosyncrasies have no bearing on the project, and we chose to therefore limit the use of Userboxes. You are still free to describe yourself in text on your user page, just not to create userboxes or their accompanying categories. --EncycloPetey 02:00, 16 November 2009 (UTC)
Ruakh is correct. Now, as it seems relevant to the project, I would allow. I don't think a discussion is in order on the merits of counting edits. If Razorflame is proud of it, and again as it's relevant to his contribution, he can ask for an exception. That would apply to a specific template, I think, unless he wants to be more general in his proposal. I doubt this has to be very formal either. 70.112.24.181 06:21, 18 November 2009 (UTC)
I don't particularly like the idea of a contributions userbox. However, if Razorflame would like to display the userbox, it is his userspace after all which we shouldn't have too much say over. Ditto Conrad.Irwin and msh210's comments; let's not get into promoting quantity over quality please....... L☺g☺maniac chat? 15:23, 18 November 2009 (UTC)
How could it possibly be wrong for a user to have a labor-saving way of displaying their edit count? Any way individual users can motivate themselves to do something productive seems fine. Is there any metric that would provide the correct incentives in all regards? No. But all metrics have some value. I am sure I am not alone in using the zero target for the number of English-language or any-language entries in various cleanup categories, both community/official and user-created. (BTW, It would be nice to have some metrics for entry quality by language.)
OTOH, competitive edit counting seems to go in the wrong direction. DCDuring TALK 15:40, 18 November 2009 (UTC)
The implication of this last point is clearly that we should forbid access to this site or have it destroyed. Just knowing that it exists is making my adrenalin flow.
The concern about keeping main user pages "plain" seems to befit this community's monkish approach to the project. The question of appearance is wholly distinct from the edit-count question, is it not? DCDuring TALK 15:54, 18 November 2009 (UTC)
I think it's pointless posturing and would personally oppose any loosening of the current userbox rules. Equinox 22:40, 18 November 2009 (UTC)
I 100% agree with Equinox. Unfortunately, too often people here do mindless botwork in order to pump their editcount, when they should be focusing on missing content itself. Allowing editcount userboxes would amount to openly promoting that kind of degenerate metric. --Ivan Štambuk 04:29, 19 November 2009 (UTC)
If the mindless botwork were actually being done by mindless bots then we wouldn't have to worry about recruiting and motivating people. There are plenty of cases of one kind of flaw appearing in 100 entries. Getting a bot to do that is often not worth it. In fact the content gaps and the format blunders and the poor quality all need work. One good way for people to learn is to do a lot of less complicated work so that they get to see a large number of entries with the inevitable large number of good and bad features. Then they can proceed to more complicated work and eventually to noting new problems and making major improvements. It certainly would be nice if everyone was brilliant from their arrival here, but in fact we are stuck with defective human contributors, some deficient in people skills, some deficient in energy, some deficient in perspective, some deficient in humor, some deficient in language skills, some deficient in tolerance, some deficient in patience. DCDuring TALK 04:52, 19 November 2009 (UTC)
And you think that editcount userboxes would help foster a culture of newbies gradually turning eager to make quality edits after making 10^x trivial ones? :) I'd say quite the opposite. Lexicography is one of the most boring and arduous tasks imaginable, and the real motivation comes from "within".
God, I'd hate for us to turn into Wikipedia with everyone posturing with their fancy "awards" and brag-articles. --Ivan Štambuk 05:18, 19 November 2009 (UTC)
Ugh yes. I don't care if anyone (newbie or not) wants to do the mindless botwork; it needs to be done, and it won't harm anybody. However, they shouldn't be showing off about it. This page has a lot more useful information on it than just user contributions, and I don't see it as a bad thing that should be forbidden linking to or deleted. I saw it before I started to contribute here and it was encouraging to see that so many people were dedicated enough to this project to contribute that much. (However, I don't like the column on that table that shows position change i.e. this user has moved up one spot in the last month. That gives the air that this is a competition :P) L☺g☺maniac chat? 15:19, 19 November 2009 (UTC)

Wiktionary on the WP Signpost

The Signpost apparently has a series in which each of Wikipedia's sister projects is featured with an interview of one of the locals. As a relative old-timer on both projects, I volunteered to write the one on Wiktionary. It is due to be published in the next issue, a few days from now. I would like to encourage anyone who is interested to read over what I have written and perhaps even make any changes you feel are appropriate. The current draft is here. Dominic·t 07:33, 13 November 2009 (UTC)

Wiktionary:Requests for deletion/Others#Category:US slang

I put a link here as it looks like a policy decision more than just a deletion one. Mglovesfun (talk) 17:21, 14 November 2009 (UTC)

Wikisaurus - inclusion criteria

Richardb thinks that Wikisaurus should have less strict inclusion criteria than the Wiktionary mainspace, and, without starting a discussion, has been entering his proposal directly to Wiktionary:Editable CFI and Wiktionary:Wikisaurus. I have reverted his edits, but I have no interest in keeping an edit war with him.

Please provide your input on what you think the inclusion criteria for Wikisaurus should be. I think they should be basically the same as those for the mainspace, with possible exceptions that are left unarticulated. --Dan Polansky 16:36, 15 November 2009 (UTC)

No, there should not be exceptions. Both Wikisaurus and the mainspace should allow all words, as the first sentence of CFI (before the table of contents) states it. Lmaltier 17:19, 15 November 2009 (UTC)
So you think the same CFI should apply to both and separately disagree with the restrictiveness of the current CFI? Or do you think that since the current CFI is too restrictive that it shouldn't apply to Wikisaurus? --Bequw¢τ 23:35, 15 November 2009 (UTC)
I think the same CFI should apply to both and separately disagree with the restrictiveness of the current CFI. Lmaltier 19:13, 18 November 2009 (UTC)
Same CFI for both. They both strive to be general lexical reference works. --Bequw¢τ 23:35, 15 November 2009 (UTC)
If Wikisaurus were to include references to appendices that have terms that do not meet CFI, we might be able to have our cake and eat it too. At present the recommendation to put such terms in appendices is transparently (to all but the most naive newbies) consigning them to oblivion. Wikisaurus could be a window onto such appendices, which might include common SoP terms as well as protologisms and unattestable colloquialisms that don't rise to the "widespread-use" exception to CFI. DCDuring TALK 23:57, 15 November 2009 (UTC)
This argument seems to fall into a broad category of arguments of the form, "the CFI are, or WT:CFI is, broken, but it's too hard to change them/it the right way, so let's 'fix' them/it in a way that doesn't solve the underlying problems". Such arguments most often come up with place-names; also frequently with personal names (especially, for whatever reason, in Chinese); and occasionally with languages that aren't well attested. I don't dispute that the CFI are broken — not that I have much of a gripe with any particular CFI, but collectively they do not have, and AFAICT at no point have they ever had, consensus, and they're riddled with ambiguities that we can't pass a vote to resolve one way or the other, and enforcement is very inconsistent, etc., etc. etc. — but I don't think that proposals like "let's let Richardb use Wikisaurus as a dumping-ground for garbage we won't allow in mainspace, because no one actually cares about Wikisaurus" are really very productive. Especially since, as it happens, one editor does care about Wikisaurus. —RuakhTALK 15:42, 17 November 2009 (UTC)

Argument for altering CFI to meet current practice, for Wikisaurus

--Richardb 13:15, 16 November 2009 (UTC)

There is "Added Value" in capturing as many words as possible. But there can be a "Cost" in having a Wiktionary cluttered with entries of questionable value, neologisms etc. But in Wikisaurus, especially in the /more pages, there is good added value in just simply having a word (not a page or an entry), be it a neologism or a piece of technical jargon, associated with other related words, with very little "cost". Especially once the more dubious words in the more contentious pages are hived off to the /more pages. Indeed, the /more pages are, of themselves, a kind of appendix, are they not?
Further, I make a point that the policy and current practice, in regard to Wikisaurus "words" (not headwords/pages/entries), do not match. Either the policy is wrong or the practice is wrong. So, either we make an alteration to CFI to allow Wikisaurus the same sort of latitude as given to appendices, or we wipe out a considerable chunk of value in Wikisaurus by deleting all the single word content that doesn't strictly meet CFI as it is now. I know which I feel Adds Value and which Costs.
Additionally, allowing some unattested words in Wikisaurus does serve a number of useful purposes.
  1. It gives a space for protologisms, neologisms etc to go, without cluttering the main space, and yet they are still searchable. To some extent we lose credibility if a person sees a new word somewhere, comes to Wiktionary, and either can't find it at all, or finds it only in a list of "neologisms". Whereas, if he finds the word in a Wikisaurus page, he is going to get a fair idea of what the word means through its associations.
  2. Equally, we make it pretty daunting for newbies to add a new word, having to meet CFI, formatting, template use etc. It's quite a bit easier to be able to put the new word they have discovered into Wikisaurus. And now, all of a sudden, we don't just have a newbie reader, we have a new contributor, who actually has invested something into Wiktionary, and has some, very small, feeling of ownership, of sharing in the experience of building Wiktionary. Some of these new contributors may go on to be very significant contributors. Otherwise, the first response they get when trying to add a word somewhere is to be told: No, it doesn't meet CFI, it isn't formatted properly etc. I know in the past that we have lost a number of new, interested contributors, because of the "Wiktionary Police" approach of one or two over-pedantic admins.
Allowing words in lists within Wikisaurus that don't necessarily meet full CFI(present) adds a lot of value, at little cost. So why stop it? Why have a policy that doesn't meet current practice?
PS. I have noted in the past, and I think it is still the case, that there are many archaic and obscure words in Wiktionary that only on the slimmest of measures meet CFI. I myself know I have added one or two words from Jane Austen works that I could not find in use (as opposed to in dictionaries) in any other book. Why should words like that be admitted, while we have huge battles over words like "Chillaxin" or "Bouncebackability", clearly in widespread use?
Finally, I have also proposed another notionally very minor clarification change to CFI, which, if accepted, would quite possibly avoid this whole argument anyway, at least to a large extent. That is to clearly spell out that words need only meet ANY one of the CFI, not ALL of them. Too often in the past that too subtle little "or" in the criteria has been ignored. "No, it's not good enough that "chillaxin" has hundreds of thousands of hits on Google, it must also appear in print somewhere". If that clarifying modification, the very clear reference to meeting ANY of the criteria, is accepted, then perhaps we don't have much to worry over after all. Except, of course, for the very top policy of all - all words in all languages!

--Richardb 13:15, 16 November 2009 (UTC)

Richard, can you give an instance of a Wikisaurus entry at which current practice does not match CFI?
Well, since virtually all words meet CFI IMHO, I'm guessing you are looking for an example where there is a clearly valid entry in Wikisaurus that is not in the Main namespace. How about off one's trolley in wikisaurus:insane. Google hits - "Results 1 - 10 of about 96,400 for "off one's trolley" ". Clearly a phrase understood by most people in Britain. But no entry in Main namespace. Not even a mention in trolley. So, with very little effort, certainly no effort to ensure it meets CFI, a user has contributed a term to Wikisaurus that adds real value. --Richardb 00:11, 30 November 2009 (UTC)
Really, it's about the way CFI is interpreted. If a word is a bit new, or a bit obscene, the bowdlerisers delete it on the grounds it does not meet strict CFI, clearly applying a much harder test than for many other words. Actually, as pointed out in the discussions on CFI, I feel things would be improved just by the clear expression that the CFI Criteria are to be applied with an OR, not an AND condition. Change that, and get the exclusionists to stick by that change, and pretty much the argument about having lesser CFI for Wiktionary would be irrelevant. Though the /more pages would still be useful. Richardb 07:25, 30 November 2009 (UTC)
I agree that the subpages of Wikisaurus entries ending in "/more" such as "Wikisaurus:breasts/more" can be left unregulated by criteria for inclusion, as is currently the case. I have added a sentence to that effect into Wiktionary:Wikisaurus.
Other than that, Wikisaurus should not become an inbox for malformatted and unattested content. If editors of Wiktionary wish to create an inbox for quickly added malformatted and unattested entries, we can create one.
Wikisaurus is a thesaurus and a namespace for semantic network, not a bin for malformatted and unverified content. Wikisaurus strives to be no less accurate and no less well-formatted than the mainspace. --Dan Polansky 14:06, 16 November 2009 (UTC)
Sorry - "Wikisaurus is a thesaurus and a namespace for semantic network" ???? When was that decided? Funny thing is, I thought it was for users, people. Please expand on your idea. Perhaps outside this immediate discussion though. --Richardb 00:11, 30 November 2009 (UTC)
Let me be specific. In this edit to Wikisaurus:die, I have removed "become brown bread", "get off one's twig", etc., as these terms have almost no Google hits. Do you wish that these unattested invented terms are included in a Wikisaurus entry?
I have to question your objectivity. Even if I use the precise phrase "become brown bread", hardly the commonest form of usage, I get "Results 1 - 10 of about 18,800 for "become brown bread" from Google. So those 18,800 hits, in your humble opinion, are not worth anything in support of the inclusion of the phrase. OK, in reality, there may be only a few usages there, but, nevertheless, if a non-English speaking person sees the phrase and looks it up on Wiktionary, what will they find? Nothing, or something helpful? --Richardb 23:51, 29 November 2009 (UTC)
My mistake. I admit that "become brown bread" has gained 18,000 hits, which it had not when I had removed it. I retract "these terms have almost no Google hits", as its use of present tense is incorrect; the terms had almost no Google hits when I had removed them. Nevertheless, the term "become brown bread" can be readded when it becomes clear that it meets CFI. --Dan Polansky 09:11, 30 November 2009 (UTC)
For a context, there was a vote on the "/more" pages at Wikisaurus: Wiktionary:Votes/2006-09/Wikisaurus semi-protection. --Dan Polansky 14:29, 16 November 2009 (UTC)
It feels like this is at least the third time this discussion has come up. Historically it has been decided that we do care about the Wikisaurus namespace being more than an Urban Dictionary mirror. It will continue to be my vote that verification and attestation as rigorous as the main namespace be applied to all material meant to be used as reference material included in this project. The giant list of utterly dubious "synonyms" listed in the current incarnation are barely worth the bits they are stored in. - TheDaveRoss 20:30, 19 November 2009 (UTC)
Even if they are "barely worth the bits they are stored in", they are still worth more than it costs. But, what you really mean is "in my opinion barely worth the bits they are stored in". It seems that you, as with many other exclusionists, take the judgement that if you think it is not worth anything, then sod those who do think it is worth something. I believe there is more value in inclusiveness, and have made the argument. I cannot recall seeing a cogent argument for making this an exclusive, academic work.--Richardb 23:36, 29 November 2009 (UTC)

Richardb, I can't see how you call this a good-faith proposal, when you ignore things I've generated for you like User:Connel MacKenzie/thesaurus. The truth here, is that you aren't interested in building a thesaurus, you are interested in dumping lists of obscene terms in the wikisaurus namespace. Since I've spent at least ten times the amount of time on wikisaurus than you have, I'm shocked to see you portrayed as "caring" about it. But I'm not surprised to see the same proposal now, that was rejected numerous times in numerous forms, in the past. --Connel MacKenzie 23:24, 19 November 2009 (UTC)

As far as I can see, the only person who's used the word "caring" in this discussion is me, and I can assure that you I did not mean Richardb. Since Amina left, the only editor putting much effort into Wikisaurus has been Dan Polansky. —RuakhTALK 01:51, 20 November 2009 (UTC)
How is having User:Connel MacKenzie/thesaurus any use to someone searching the Wikisaurus namespace? And it is a total and slanderous misrepresentation to say "you are interested in dumping lists of obscene terms in the wikisaurus namespace". I am interested in protecting the value that is in all of Wikisaurus, contributed by a great number of people. It just so happens a lot of common usage words and phrases are obscene. (And sometimes I put a few back after the bowdlerisers have been active.) This is not an argument about "obscene" or otherwise. It's about inclusiveness, or exclusion. I believe there is more value in inclusiveness, and have made the argument. I cannot recall seeing a cogent argument for making this an exclusive, academic work.--Richardb 23:36, 29 November 2009 (UTC)

Responding to some of the postings above as regards whether the current practice of Wikisaurus matches CFI: The example given by Richardb, off one's trolley in wikisaurus:insane, has meanwhile been created in the mainspace, as it meets CFI. To demonstrate that the current practice of Wikisaurus does not match the principle of including in Wikisaurus only CFI-meeting terms, it would be necessary to list a considerable number of terms that are currently included in Wikisaurus and yet do not meet CFI; that a term is not yet in the mainspace does not prove that it does not meet CFI. To see whether a term meets CFI, the first thing to do is to search for it in Google books, and check further criteria. Whether a term is vulgar or obscene is beside the point; CFI allows inclusion of vulgar or obscene terms. The demonstration is unlikely to be delivered, as there is a large number of entries that I have either created or cleaned up. That is, Richardb's claim posted in this thread that "...the policy and current practice, in regard to Wikisaurus "words" (not headwords/pages/entries), do not match" is untrue. I am unsure whether this subject is still relevant, though, since several people in this thread expressed their support for the principle of keeping in Wikisaurus only terms meeting CFI, while only one person disagreed. Unless some more people turn out to disagree, this thread documents that the principle has community support. If it turns out that some Wikisaurus entries contain CFI-non-meeting terms, this can be easily amended by removing the offending terms from Wikisaurus. Let me remind you that this discussion does not pertain to the "/more" subpages in Wikisaurus, whose content is, for now, allowed not to meet CFI. --Dan Polansky 08:56, 30 November 2009 (UTC)

Collaboration

The Collaboration of the week has been inactive for quite a while, probably because there isn't enough interest in it to have a new one every week. I think there were really some flaws in the concept in the first place, mainly that a new one was started whether the last was good enough or not. However, collaboration is very useful in building entries.

Therefore, I propose that a new project be started, that would work on one word, and not start a new one until the current one is about as good as it can possibly get. That means great and complete definitions, as many pronunciation guides as possible, audio files, synonyms and antonyms (and possibly a Wikisaurus entry as well), a good etymology section, example sentences, large translation tables with as many de-redlinked and well-formatted translations as possible, references and citations for everything. Basically, work on this word would not stop until the entry is as much of an example of a "perfect entry" as possible. This will probably take longer than one week per word. I don't know how we could decide whether an entry is "perfect" or not (Polls? Comments? Specific criteria?) nor do I have an idea for a name, but I really think that trying to get as many near-perfect entries as possible is rather important and that we often make quantity of entries too much of a priority over quality.

Any thoughts? --Yair rand 20:29, 15 November 2009 (UTC)

Sounds like a good idea in general, though I would imagine a time-limit is necessary (we can always re-start the clock if there's still enthusiasm for a particular entry). The only really successful "explicit" collaboration we've had was User:Mutante's emptying of Special:UncategorizedPages, which went from tens of thousands to tens in a few weeks - I think it was successful because everyone could take part in their own way (even the people who contributed only to one language); there were definite success criteria, and a strong driving force. If some definite targets could be set, and you were willing to whip people a little bit, then I can imagine this working very well. Conrad.Irwin 21:45, 15 November 2009 (UTC)
One of the reasons I suggested this is so we could have a goal, rather than a deadline. I like the idea of working on one word until it's done, rather than working on it until a certain amount of time has passed. I think the simplest way would be some form of poll, maybe stay with the current word until a majority including at least a certain number of people (3,4,5?) think we should switch to a new one. --Yair rand 00:58, 16 November 2009 (UTC)
I started a page for it in my userspace (User:Yair rand/Current Collaboration). Edits to it are welcome. I hope to move it to the Wiktionary namespace after a name is decided and there is sufficient interest in having a collaboration project. Personally, I'd love Wiktionary to have some entries that would just make readers go "wow". I think this is the best way to accomplish that. --Yair rand 05:45, 17 November 2009 (UTC)
You mean like we now have for listen, parrot, etc.? That was the idea behind the Model Pages, except that the models were chosen to have a limited number of parts of speech and possible senses, to avoid confusion when using them as models. My most recent attempt at a "complete" page was biceps. I selected that word in part because (1) there is a common sense often missing from other dictionaries (the informal sense #3), and (2) the word exists in several languages, so we now have a new Model Page that models additional languages beyond English. If we picked a few such multi-lingual entries for collaboration, that might spur more interest. That is, pick pagenames that ought to contain entries in multiple languages, even if the meanings of the words in those languages are unrelated. --EncycloPetey 03:46, 21 November 2009 (UTC)
Yes, those are the kind of entries I was talking about. The idea was really just to have some pages where everything immediately around it was just how it should be. I guess that really should include words in other languages that have the same name as the English word. --Yair rand 06:16, 24 November 2009 (UTC)

WT:APR

I decided to go ahead and make WT:APR active. Any feedback is appreciated. --The New Mikemoral ♪♫ 01:40, 16 November 2009 (UTC)

Note the existence of {{rfap}}, which I think is the more conventional way of requesting an audio file for an entry. While I don't know, I'm guessing that the ease of use of that template and its automatic categorization are why WT:APR was {{inactive}} (as redundant).​—msh210 17:20, 16 November 2009 (UTC)
This is more for getting urgently wanted audio recorded right away. I am currently working on clearing the backlog at the category. --The New Mikemoral ♪♫ 06:09, 17 November 2009 (UTC)

Renaming of *Topics categories

Per the recent BP and RFDO conversations I was going to edit {{topic cat}} et al. to rename the "*Topics" categories to "All topics". If you have strong feelings that this isn't correct (e.g. if you think it should just be "Topics"), please discuss now before lots of categories are renamed. --Bequw¢τ 20:04, 16 November 2009 (UTC)

WT:EDIT allows editing of translation table glosses

In order to help try to clear the backlog at Category:Translation table header lacks gloss, I've implemented another feature in WT:EDIT which allows you to modify the gloss without bothering with that horrible edit-page bit [7] and [8]. More detail, transcluded from the talk page is below. Conrad.Irwin 03:07, 17 November 2009 (UTC)

I should perhaps have mentioned that this feature won't work unless you click the "Enable" button that should appear after you clear your cache (ctrl+shift+F5). I wasn't planning to make this available "by default", as I'm not sure non-ediits will understand exactly what to do (even after we've fixed all the bugs). Conrad.Irwin 03:09, 17 November 2009 (UTC)


Table Labelling

If you enable this feature, each translation table has a ± in the corner.

  • Clicking on this brings up an edit box for the table's header.
  • Type what you want in, normally the first few words in the definition (without linking or formatting)
  • Click "Preview" - if you are not happy with the result, click ± again - otherwise click "Save" in the top corner

WARNING: This feature has not been well tested in all browsers yet. Please notify me if you have any problems.



Excellent. Very convenient. Now bug number 1: I can't edit Marxist, says "Could not find translation table". Also, I have a question: do you plan to develop this tool to cover cases like this, i.e. add new tables, and ttbcify the translations of the first one? It's unbelievably hard to do by hand. --Vahagn Petrosyan 12:00, 17 November 2009 (UTC)
I thought I had fixed the Marxist bug a while back, I'll look again. I was planning to allow adding new tables (at some point) - I suppose I could try ttbc-ifying old ones, it doesn't look too hard for a computer, but it wasn't an immediate goal. Conrad.Irwin 12:03, 17 November 2009 (UTC)
Maybe once the backlog of Category:Translation table header lacks gloss is cleared out a bit we can start automatically converting {{top}}{{top2}}/{{trans-top}} and {{mid}}{{mid2}}/{{trans-mid}} so those 3-letter templates can be used fully as language codes. Also a quick scan shows that the majority of the entries in this category have a single definition line meaning there's lots of low-hanging fruit. --Bequw¢τ 15:07, 17 November 2009 (UTC)
Yes, that would be a good idea - I just caught most of the low-hangers from the "b" section of Category:Translation table header lacks gloss. I was able to do 88/120. So, this is only going to cut down the category from ~3000 -> ~1000. The rest needs much more attention, by far the most common remaining problem is that there is one translation table, but multiple definitions - perhaps we could automagically ttbc-ify all tables in that situation (or maybe only those with more than one translation, and ask people who speak the language of that one translation to re-name the table)? Conrad.Irwin 17:26, 17 November 2009 (UTC)
I think we'd lose some valid translations that could be kept if the conversion was done manually. On a page with 2 senses and only one translation box, a human can look through the page history to find when the second sense was added. S/he can then keep all translations up to that point in history as valid for the original sense. Translations added after there were two senses would have to be ttbc'ed of course, but at least we're not dumping them all in there. Maybe this could be automated, but it seems a bit trickier. --Bequw¢τ 20:21, 17 November 2009 (UTC)
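The history-based split described above could be sketched roughly as follows. This is only an illustration, not an existing tool: the `Translation` record, its `added` timestamp and the function name `split_by_second_sense` are all hypothetical, and a real script would first have to recover those timestamps from the page history.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Translation:
    lang: str        # language code, e.g. "fr"
    term: str
    added: datetime  # when it was added to the page (hypothetical field)


def split_by_second_sense(translations, second_sense_added):
    """Return (kept, to_be_checked).

    Translations added before the second sense existed can only refer to the
    original sense, so they are kept; later ones are ambiguous and would be
    moved to a "to be checked" (ttbc) section.
    """
    kept = [t for t in translations if t.added < second_sense_added]
    to_be_checked = [t for t in translations if t.added >= second_sense_added]
    return kept, to_be_checked
```

The tricky part is not this split but finding the revision in which the second sense was added, which is probably why a human reading the history is still more reliable.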

Category:Translation table header lacks gloss

It'd be good to catch all the easy-to-fix problems in this category: (well, and all the others too!). To catch the easy ones, you just open up lots of tabs with pages from Category:Translation table header lacks gloss and add a short summary of the definition to the translation table using the ± button enablable above. Conrad.Irwin 17:26, 17 November 2009 (UTC)

  •  !->Z: easy done Vahagn Petrosyan
  • a: easy done
  • b: done the easy ones. Conrad.Irwin
  • c: easy done
  • d: easy done Bequw
  • e: easy done
  • f: done the easy ones​—msh210
  • g: easy done.
  • h: done the easy ones​—msh210
  • i: easy done.
  • j: easy done.
  • k: easy done Conrad.Irwin
  • l: easy done.
  • m: easy done
  • n: easy done
  • o: easy done
  • p: easy done pa-pd
  • q: easy done Conrad.Irwin
  • r: easy done
  • s: easy done
  • t: easy done.
  • u: all done
  • v: easy done
  • w: easy done
  • x: done by big V
  • y: done by big V
  • z: done by big V

Any advice about how to write a trans table gloss? Simply the definition given, or copy it? Help:Translation gloss?? --Volants 13:50, 19 November 2009 (UTC)

IMO, glosses should be short enough to fit in one line for most screen resolutions, and be particular enough to clearly correspond to a single definition/sense in the parent PoS. It is not required to define the term all over again and when the definition is medium to long in length it is appropriate to simplify it for the gloss. When there's only one sense for a certain PoS, the gloss can be trivial to produce, but as more senses are added one should take care to ensure that the glosses still identify unique senses. If someone else concurs maybe they can add something to Wiktionary:Translations or maybe WT:STYLE. --Bequw¢τ 14:20, 19 November 2009 (UTC)
I've written my thoughts at Help:Glosses, I think we mainly agree though you emphasise quoting the definition more than I - please feel free to update the help page. Conrad.Irwin 16:53, 19 November 2009 (UTC)
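For anyone who wants to check a batch of glosses against the two rules of thumb above (short enough for one line, and unique within a part of speech), a minimal sketch might look like this. The function name `check_glosses` and the 60-character cut-off are assumptions made for the example; nobody in this discussion has proposed a specific number.

```python
MAX_GLOSS_LENGTH = 60  # assumed cut-off; no such number appears in the discussion


def check_glosses(glosses, max_length=MAX_GLOSS_LENGTH):
    """Return a list of (gloss, problem) pairs for one part-of-speech section."""
    problems = []
    seen = set()
    for gloss in glosses:
        normalized = gloss.strip().lower()
        if not normalized:
            problems.append((gloss, "empty"))
            continue
        if len(gloss.strip()) > max_length:
            problems.append((gloss, "too long"))
        if normalized in seen:
            # Two tables with the same gloss cannot identify distinct senses.
            problems.append((gloss, "duplicate"))
        seen.add(normalized)
    return problems
```

A check like this only flags candidates; deciding what a good gloss for a sense is remains a job for an editor.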

The category is now down to 1,600ish from 2,900ish. Thanks to everyone, and I hope we can get the remaining gaps of "easy done" plugged in the next day or two (it only takes a few tens of seconds per-page, and is most satisfying)! Conrad.Irwin 16:53, 19 November 2009 (UTC)

What do we think about HTML "comment" glosses? For example, pond uses <!‐‐DefAtlOcean‐‐> (on the definition line) and acceleration has <!‐‐DefB‐‐>. I think if we are going to be having people add translation glosses w/o seeing the underlying wikitext (and associated HTML comments) then these "comment" glosses should be converted to normal glosses. --Bequw¢τ 18:35, 19 November 2009 (UTC)
Yes, I agree. The comment-style glosses are only useful for people editing the entire page (and/or robots) not for readers nor sectional editors. Conrad.Irwin 22:48, 19 November 2009 (UTC)
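To get a list of entries that still use comment-style glosses, something along these lines might do. It is a sketch under assumptions: it expects the raw wikitext with ordinary ASCII hyphens in the comment markers, it treats any line starting with "#" (other than "#:" and "#*") as a definition line, and the name `find_comment_glosses` is made up for the example.

```python
import re

# Matches an HTML comment such as <!--DefAtlOcean--> and captures its label.
COMMENT_GLOSS = re.compile(r"<!--\s*(\w+)\s*-->")


def find_comment_glosses(wikitext):
    """Yield (line_number, label) for each HTML comment on a definition line."""
    for number, line in enumerate(wikitext.splitlines(), start=1):
        # Skip example lines ("#:") and quotation lines ("#*").
        if line.startswith("#") and not line.startswith(("#:", "#*")):
            for match in COMMENT_GLOSS.finditer(line):
                yield number, match.group(1)
```

This only locates the comments; turning a label like DefB into a readable gloss would still need a person.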

What's the deal with the logo?

I was surprised to see that you guys are still using the old crappy Wiktionary logo since most of the other Wiktionaries have upgraded. (Especially since the vote was 3 years ago.) When I tried to find out why, I learned that "While there was a consensus to approve the logo on Meta, there is a substantial opposition to changing the Wiktionary logo entirely from, primarily, the English Wiktionary project, and therefore, the Wiktionary logo has not been changed at all." Now there is total chaos on Meta about what to do regarding the Wiktionary logos. (It looks like there are about 20 different proposals with little to no organization.) What's the deal guys? Why don't you just upgrade your logo instead of monkey-wrenching the process for everyone? It would definitely be an improvement over what you're currently using (which looks more like an HTML rendering mistake). Think it over. Kaldari 21:46, 17 November 2009 (UTC)

There was consensus on the English Wiktionary not to use the tile logo. There is not "little or no organization", nor is there any chaos on Meta. The mess on the logo page is completely due to User:Richardb's decision two days ago to wipe half the page and replace it with his own comments. That vote will begin as soon as the rest of the translations are complete. After that, and the vote deciding whether to use the winning logo, hopefully we will have a new logo, which will most likely be a lot better than the "upgrade" of the scrabble imitation logo. --Yair rand 22:10, 17 November 2009 (UTC)
That is bloody slanderous. I did not delete anything. I am NOT, NOT, NOT a deletionist!!!. All I did was
  1. moved all the info about Voting into one section, under voting
  2. researched the history of the matter, and put in a section with a short summary of that history.
  3. put my opinion, that, based on reviewing the history, there was a clear misrepresentation about the outcomes of previous voting. The traditional logo (as used by English wiktionary) had failed to gain much support at all.


I'd appreciate a withdrawal of the slander from Yair Rand. But I won't hold my breath!--Richardb 08:37, 30 November 2009 (UTC)
PS: Far from organising a vote, you couldn't organise a piss-up in a brewery. You have absolutely confused the issue by the way you have handled it. No clarity at all.--Richardb 08:37, 30 November 2009 (UTC)

Which consensus? AFAIK, no vote has been organized here. Lmaltier 19:11, 18 November 2009 (UTC)

From what I can see in the Beer Parlour archives, it was clear long before it could have come to a vote that nobody wanted the tile logo. More recently, a poll was held on Meta, and there were 71 votes to start from scratch rather than use the text logo or the tile logo. The point of whether the tile logo is better than the text logo is now irrelevant, as the logo vote will be starting soon, as soon as German, Japanese, Turkish, Lithuanian and Vietnamese translations of the voting page are added, the Russian translation is completed, and the French and Finnish translations are proofread. --Yair rand 20:59, 18 November 2009 (UTC)
Well, when fr.wikt has organized a vote, its results were very surprising. A few negative comments don't mean that most users are negative. Lmaltier 22:14, 18 November 2009 (UTC)

Wow, what an embarrassment. After 3 years of debate, Wiktionary still hasn't decided whether or not it wants Goatse.cx as its logo. Why hasn't The Register written an article about this yet? Kaldari 19:11, 20 November 2009 (UTC)

Also, what's the point of having a new vote on meta? Won't the results just be ignored again? It kind of reminds me of elections in Burma. Every once in a while they have an election, but since the military dictator never wins the election, they just keep ignoring the results. Kaldari 19:20, 20 November 2009 (UTC)
If you had fully read the voting page you would have noticed the section where it says that "Following [the vote], each language Wiktionary will hold their own vote on whether to accept the winning logo. In the event that less than 60% of the Wiktionaries approve of the logo, none of the Wiktionaries will use the logo." This way, we have a chance at unifying the logos and the result will not be ignored. --Yair rand 19:28, 20 November 2009 (UTC)
Your stupid ad-hominem approach to pushing the process forward doesn't endear you to me. Equinox 00:48, 21 November 2009 (UTC)
Bit strong there, Equinox. Although I do agree that Kaldari's points are not particularly helpful. The logo hasn't changed because there wasn't a consensus right? That's how the system works. At the end of the day, an attractive logo does not change anything about the project. People use wiktionary because they find it useful, not to marvel at its graphic design. If someone comes up with a logo that everyone really can agree on then all the power to them. But that day hasn't come yet, has it? Tooironic 02:30, 22 November 2009 (UTC)
Right! By looking at those hundreds of kilobytes long "discussions" on meta on whether it's better to use logo with a line over here or a tile over there, one begins to wonder whether those people have anything better to do in their lives. WMF should hire a professional designer and get this "problem" over with, so that everyone can waste their energies more constructively. Building a "consensus" over sth that isn't the problem in the first place (picking a favorite logo is really the same as picking a favorite flavor of ice cream) will inevitably leave most of the voters dissatisfied, esp. when choice needs to be made from 30+ proposals. With these pointless votes, methinks meta is slowly turning into a giant self-purposing bureaucratic machinery, losing touch with the real world. FWIW, I'm perfectly satisfied with our "embarrassing crappy old logo" and more concerned with nailing down the 10k lemmata missing on Wiktionary that I have on my TODO list. --Ivan Štambuk 02:52, 22 November 2009 (UTC)
"More recently, a poll was held on Meta, and there were 71 votes to start from scratch rather than use the text logo or the tile logo." In my research of the history of this matter, I did not come across this. I did however come across a reference to something like a vote on English Wiktionary Beer Parlour, a vote of 71 votes to retain the traditional logo. Can you please give a link to your sources, so we can add that in to the history of the "logo project".--Richardb 08:37, 30 November 2009 (UTC)

Missing categories

I can't help noticing that the necessary templates like {{en-noun}} or at worst {{infl|en|noun}} (usually for less common languages) are missing from a lot of words. acceleration due to gravity (currently at rfd) is in the physics category, but not English nouns (or English anything). Entries with context tags like this, or rfc tags don't get listed in Special:UncategorizedPages (corrected link Mglovesfun (talk) 12:16, 21 November 2009 (UTC)) which means they are harder to find. Mglovesfun (talk) 22:23, 19 November 2009 (UTC)

For Spanish at least there is Category:Spanish entries lacking inflection, but I don't know if any other languages have this. Could entries without {{infl}} or a standard inflection template be tagged by Autoformat? Nadando 22:33, 19 November 2009 (UTC)
I think what you want is Special:WantedCategories. Besides Special:SpecialPages (on the left frame) Ullman, Hippiebot, and perhaps Conrad may have subpages that contain useful problem-entry lists. CM's haven't been run recently AFAICT and may not be runnable or may have been replaced. There may be more. DCDuring TALK 00:19, 20 November 2009 (UTC)
Wanted categories are for red linked categories. I'm talking about English nouns that aren't in the category English nouns (as an example). Mglovesfun (talk) 12:16, 21 November 2009 (UTC)
User:Conrad.Irwin/English_nouns_without_categories is a list of all 7613 entries with ===Noun=== under ==English== that aren't in Category:English nouns, Category:English plurals, Category:English alternative spellings, Category:English misspellings, Category:Misspellings, Category:Alternative spellings. It paints quite a depressing picture, containing things like tastier and antecedently which aren't nouns at all. :(. Conrad.Irwin 14:36, 21 November 2009 (UTC)
Indeed, and that's just the nouns in English. I find verbs and adjectives as well, of course. Mglovesfun (talk) 11:36, 23 November 2009 (UTC)
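The check discussed above (an English noun entry that lacks a headword template such as {{en-noun}} or {{infl|en|noun}}) could be sketched roughly as follows. This is only an illustration, assuming plain wikitext as input. The function name and the exact patterns are made up for the example, and it looks at templates rather than at category membership, so it is not how the list at User:Conrad.Irwin/English_nouns_without_categories was actually produced.

```python
import re

# Headword templates mentioned in the discussion; any other template that
# also categorises nouns would need to be added here (assumption).
NOUN_TEMPLATES = re.compile(r"\{\{\s*(en-noun|infl\|en\|noun)\b")

# The English section: from ==English== up to the next level-2 header or end of page.
ENGLISH_SECTION = re.compile(
    r"^==English==[ \t]*\n(.*?)(?=^==[^=]|\Z)",
    re.DOTALL | re.MULTILINE,
)

# A Noun header at level 3 or deeper, e.g. ===Noun=== or ====Noun====.
NOUN_HEADER = re.compile(r"^===+\s*Noun\s*===+[ \t]*$", re.MULTILINE)


def english_noun_missing_template(wikitext: str) -> bool:
    """Return True if the English section has a Noun header but no noun headword template."""
    match = ENGLISH_SECTION.search(wikitext)
    if match is None:
        return False  # no English section at all
    section = match.group(1)
    has_noun_header = NOUN_HEADER.search(section) is not None
    has_template = NOUN_TEMPLATES.search(section) is not None
    return has_noun_header and not has_template
```

Run over a dump, a function like this would flag entries such as acceleration due to gravity, which has a context tag but no noun template. It would not catch entries like tastier that carry a Noun header by mistake but do have some template.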

Wiktionary:Requests for cleanup#Category:United States of America

A debate over on RFC concerning whether to use Category:United States of America or Category:United States. Result should affect other topical categories using one or the other of the two ways of referring to the country by name. — Carolina wren discussió 22:40, 20 November 2009 (UTC)

What's in a name?

Currently Category:Names is a topical category that exists betwixt and between our topical categories and our part of speech categories in terms of naming. There's the subcategories Category:Given names and Category:Surnames that use the part of speech category naming system (i.e., French given names and French surnames) and Category:Demonyms and Category:Place names that use the topical category naming system (i.e., fr:Demonyms and fr:Place names). It also has the topical category Category:Onomastics as its sole parent. There are also a few anomalous categories:

So, what to do? First off, while related, a good case can be made that Demonyms aren't Names per se. Unlike Given names, Surnames, or Place names, they aren't proper nouns and in English at least, generally do double duty as adjective and noun. So I recommend changing the parentage of Demonyms by replacing Names with Onomastics. That's easy enough to do (or undo) and if it were the only recommendation I had, I'd likely go ahead and do it without bothering the parlor. However, that leaves us with what to do with place names.

Changing Category:Place names and its subcategories over to the part of speech category naming system would be a lot of effort, and we'd end up with some extremely verbose category names such as English names of states of the United States for some of the subcategories. Plus, unlike the given names and surnames which combine to form a compound name that is used without commas (at least in English), place names generally aren't used to form compound names, so it can be argued that structurally they act differently. So here's what I'd like to propose:

  1. Change Category:Demonyms from having Category:Names as a parent to having Category:Onomastics.
  2. Change Category:Place names from having Category:Names as a parent to having Category:Onomastics.
  3. Create a new category that would follow the part of speech category naming convention: Category:Personal names.
  4. Change Category:Given names from having Category:Names as a parent to having Category:Personal names.
  5. Change Category:Surnames from having Category:Names as a parent to having Category:Personal names.
  6. Delete Category:Names and include in the descriptions of Category:Onomastics and Category:Personal names the relationship between the two categories.

The reason I want to delete Category:Names is that I feel that the category name is ambiguous, since apple is the name of a type of fruit, etc., plus once Demonyms and Place names are moved to Onomastics, elements used to form personal names would be all that are left in the category.

It might be worth renaming Category:Place names to Category:Toponyms to reduce potential confusion if Category:Names is retained, but it is not essential to the proposal, and renaming can be considered as part of the Gazetteer proposal that has been floating around these discussion pages. — Carolina wren discussió 00:46, 21 November 2009 (UTC)

This is a good proposal. Category:Names, and all its topic and POS forms in various languages (like Category:fr:Names and Category:French names) are confusing and do more harm than good. That is why I gave the "Xxxn given names/surnames" categories "Xxxn language" as the parent when I created them a year ago. Only those categories with the new Template:namecatboiler have "Xxxn names" as a parent. Why not change it into "Category:Xxxn personal names"? This template does not include Category:Given names/Surnames by language either, though they would seem essential to me.
The Australian, German and Irish categories are accidental left-overs from the topic category days, and should be deleted. I would keep Jewish surnames (but not Jewish names) and Indian names, in order to direct all the anon Indian contributors into correctly formatted entries.
If I were you I wouldn't hurry changing "Place names" into "Toponyms". We'll probably have a hundred more discussions about the CFI for place names, so anything could happen.--Makaokalani 15:24, 23 November 2009 (UTC)

Unprotect WT:CFI

My argument would be:

We don't actually use the criteria listed there, because we can't! They're too vague. I can pick apart almost every sentence and show how vague it is. The "names of specific entries" bit has been cited ad nauseam, so how about "idiomaticity"?

An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.
For example, this is a door is not idiomatic, but shut up and red herring are.
Compounds are generally idiomatic, even when the meaning can be clearly expressed in terms of the parts. The reason is that the parts often have several possible senses, but the compound is often restricted to only some combinations of them.

It does actually say "expression", which we interpret as "more than one word". I think words with hyphens and apostrophes are ambiguous. For example rod-shaped, one word or two? don't, one word or two? What about l'ai or l'appelle in French? If those are one word, there's no doubt they can be attested. llámame in Spanish, is that one word or two?

I'm not convinced that expression does mean "more than one word", that's just the Wiktionary norm. I'd quite like a 24 hour period to edit WT:CFI, just because I don't think anyone actually knows what it means, or uses it seriously! Mglovesfun (talk) 12:32, 21 November 2009 (UTC)

Wiktionary:Editable CFI was set up to deal with these issues, one presumes that the proponents thereof will eventually instigate proceedings to have the new version become authoritative, as and when the bugs have been removed. I personally would prefer a much less prosaic set of inclusion and exclusion rules against which words can be argued to match, with the rules being updatable by discussion at WT:RFD/WT:RFV, but I'm not sure how well this would work in practice. Conrad.Irwin 13:04, 21 November 2009 (UTC)
I'm tempted to put {{rfc}} at the top of it, but I suspect I'd get a block for that. Whatever happened to Visvisa proposing some stuff to vote on? Or at least, that was the idea I had in my head. Mglovesfun (talk) 12:05, 22 November 2009 (UTC)
If you have a specific change you'd like to see, then by all means propose the specific change, have the discussion, and possibly put it for a vote. A 24 hour free-for-all of editing one of our core policy pages is a bad idea. The page is protected to keep more people from getting themselves in trouble for editing a page that tells them in a big banner they don't see at the top not to do so. --EncycloPetey 15:52, 22 November 2009 (UTC)

Let's talk about sex, baby!

I don't think that Category:Sex and Category:Sexuality are redundant to each other, but they most definitely need some clean up. I also stumbled on Category:Sexual deviance which seems a bit POV. It might need renaming, but then again, to what? Mglovesfun (talk) 12:53, 21 November 2009 (UTC)

There are things that belong in cat:sexuality (like sexual orientations) that don't belong in cat:sex. Category:Sexual deviance should definitely go, though. I'm gonna rfdo it. — [ R·I·C ] opiaterein — 17:15, 21 November 2009 (UTC)
What about stuff like Mile High Club, is that sex or sexuality? Certainly anything biological should go in sex. AFAICT stuff like penis and vagina should be in there, I'll have a quick look (so to speak). Mglovesfun (talk) 21:56, 21 November 2009 (UTC)
I would say the Mile High Club should go in sex, but I can see how it could go into both. I think it best to reserve sex for sexual activities, aids, toys, etc. while sexuality should be for sexual preferences, gender identities, things like that. — [ R·I·C ] opiaterein — 22:17, 21 November 2009 (UTC)
Suits me. Mglovesfun (talk) 22:24, 21 November 2009 (UTC)

Logo Vote

The Wiktionary logo vote is set to start 2009-12-07 00:01. The first round will continue until 2009-12-31 23:59 at which point the second round will last until 2010-01-31 23:59. Anyone who understands a foreign language that the voting page has not been translated into, please consider contributing a translation. Other Wiktionaries still need to be notified about the vote, so please help post messages into other Wiktionary Beer parlours (or equivalent). Thank you. --Yair rand 01:48, 22 November 2009 (UTC)

There are at least four discussions of the logo on this beer parlour alone. But you make no effort to connect them together, to put this announcement into the other discussions. And at the top of every Wiktionary page there is a link [Discuss new logo proposals for Wiktionary.], but you don't even mention this new voting schedule prominently on that page!!!! OK, way down the page I could eventually find a minute reference - I have added the schedule to the voting pages. --Yair rand 23:49, 21 November 2009 (UTC). Not even a sodding link to the voting pages! You need to do serious publicity if you want this process to have any credibility.


Again I ask, by what authority, and for what reason, are you running this vote (or rather, attempting to run it) ? It seems clear to me that you are pushing this all on your own, making lots of unilateral decisions, without ever once declaring why ? Please explain yourself.


Before your sudden pronouncement about the latest voting schedule, your last comment on the matter that I can find on Beer Parlour was "It is clear that this discussion is getting us nowhere. This apparently pointless debate is now over. --Yair rand 15:02, 26 October 2009 (UTC) " On that we can agree!--Richardb 09:36, 30 November 2009 (UTC)
The schedule was suggested by Conrad.Irwin on the meta discussion page, it's right above the comment I added about adding the schedule to the voting page. The debate on the beer parlour that I managed to end was about whether it matters that the discussion was started on Meta, rather than here. An extra link to the voting page on the meta discussion page was not needed as they are sprinkled throughout the page. There's a link not only at the top of the page, but right above the current discussion under the words "Current Status". Further publicity on Meta isn't needed; the Wiktionaries are being notified through their Beer parlour equivalents. If you would like, for some reason, to post the schedule and a link to the voting page once again on meta, feel free. And no, I am not "running" this vote, even though it may seem like that as I might be the most active contributor to the vote; this is a wiki system. And what do you mean by "authority"? The reason for the vote is that 71 who voted for having a vote want it. --Yair rand 16:52, 30 November 2009 (UTC)

Subidioms in the inflection line

I believe that this:

"to give what for"

shows, in concept, how multi-word entries that include idiomatic components should appear in the inflection line. One desirable alteration might be a fainter-appearing underline or perhaps simply an underline between "what" and "for". A faint underline could appear under all elements of multiword terms or even all terms if that were simpler or more resource-thrifty to implement.

I expect that HTML in the inflection line is really bad, if not explicitly forbidden. So I think this needs some technical support if we agree that this is a good idea.

This particular idiom illustrates the need fairly well. Linking to each individual word gives users no clue about the construction of this and may lead them to look to insert a question mark or to add an object to the preposition. DCDuring TALK 13:11, 22 November 2009 (UTC)

Help:Writing definitions

I suggest a help page like Help:Writing definitions/Help:Defining terms/Help:Definitions, to give advice on good ways to go about writing good definitions. AFAICT, we don't have such a page, and there are a handful of very good definition writers here and a couple of very good help-page writers too. --Rising Sun 17:39, 22 November 2009 (UTC)

I sure could use a page like that. DCDuring TALK 18:18, 22 November 2009 (UTC)
I've started a page with three central principles (which can eventually be expounded upon with examples both right and wrong). I've also hinted at two areas I think ought to be included in the page, but which will require quite a bit of work to assemble. --EncycloPetey 05:32, 23 November 2009 (UTC)
It occurred to me as I've been writing, that the results look more like a page for the Wiktionary: namespace than for the Help: namespace. The Help: namespace is usually for technical issues, and I can't think of very many that apply to definition writing (other than the initial hash, no blank lines between defs, context tags). Should we move/redirect the page to Wiktionary:Writing definitions? What do other people think? --EncycloPetey 06:36, 23 November 2009 (UTC)
The Wiktionary tutorial is badly in need of a rewrite. I suggest that this be made into a section of the tutorial. Right now all we have in it is basically a copy of the Wikipedia tutorial which is mostly irrelevant to Wiktionary. This would be a good starting point for redoing the tutorial. --Yair rand 06:09, 23 November 2009 (UTC)
Yes, and no. The tutorial is supposed to teach the basic mechanics of a wiki, and not teach style or dictionary-specific skills. --EncycloPetey 06:36, 23 November 2009 (UTC)
Oh. Never mind, then. --Yair rand 06:44, 23 November 2009 (UTC)

Category:Lojban language

It would be rather rare to nominate every word in a language for cleanup, but Lojban needs it. The definitions are written in a Lojban style that I can't understand at all. This needs some sort of communal decision. Btw is Lojban actually used as a language? I suspect most constructed languages appear more often in dictionaries than in texts. Mglovesfun (talk) 17:47, 22 November 2009 (UTC)

Lojban makes for a very sticky situation here. We need to have the entries written in English and they need to be as comprehensible as possible, but unfortunately this is virtually impossible to do. With the languages that have the same parts of speech as English, it's very simple to explain because English words exist for those parts of speech and definitions. What are we supposed to do with a language with completely foreign concepts that English words aren't really well-suited for? How do we define a word for which we can't simply say "to do x", "a x", "having x characteristics", "in a x manner" or use English parts of speech? I can't think of any solution better than use what we currently have and assume anyone reading this has some understanding of the language. (And I don't have a clue whether Lojban is actually used as a language but I suspect the folks over at the Lojban Wikipedia and Wiktionary use it to some extent.) --Yair rand 05:31, 23 November 2009 (UTC)

ttbc (Translations to be checked) tags on translations

Please refrain from flagging translations with "ttbc" lightly, if you make any changes to the English entries, especially adding/modifying senses. Please respect the translators' work! They may not be available any more. I personally find it annoying and discouraging. It's a lot easier to change the English entry than to find and fix the translations into other languages! Anatoli 01:31, 23 November 2009 (UTC)

I don't know when this entry: that is going to be checked. I think there are nicer ways of handling the situation with new senses in the English entry. --Anatoli 01:38, 23 November 2009 (UTC)
What would you recommend? DCDuring TALK 03:52, 23 November 2009 (UTC)
The original translations (the first that appeared in the entry) were the translations of the most common or default sense of the word, in this case "connecting noun clause" (that). If you can't contact the translators to verify, be brave and leave in the original sense. My point is, the translators have already taken the effort to "check" those translations and added their translations, and do not necessarily "watch" this entry or have the time or desire to look at it again. Additional senses in translations may be left blank or with trreq tags (translation requests). Careful examination of the original entry may prevent the redoing of collective work. --Anatoli 04:01, 23 November 2009 (UTC)
Those who move these words lightly to ttbc sections should be mindful that some languages have very few or no active contributors and may never be reviewed. Taos, for example, is a finished project, and it is unlikely that anyone will ever add any more. What we have now is all we will ever have for that language, since the Taos speakers do not want it to be written down or published and will not contribute. If Taos words are moved to ttbc, they will remain there forever. (Likewise, if anyone messes with Taos contexts, categories, or templates, they should be very careful, because nobody else is ever going to clean up after them.)
You should be careful about using ttbc, especially for minority languages. I think anyone who moves words to ttbc should also be heavily involved in the checking and restoring of at least one of the languages. That is the only way that you can have a sense of the job that you are creating. If you don’t care to check ttbc translations yourself, don’t expect others to do it for you. —Stephen 08:17, 23 November 2009 (UTC)
Unfortunately, English is not a finished project either. In one common situation English senses that are best separated are initially combined. When they are separated what should be done?
And, of course, there are many other kinds of errors in the supposed main definitions. The entries for the non-English words are often not much help because they are usually one-word glosses rather than full definitions (by policy as I understand it). They are sometimes glossed with obsolete terms or with highly polysemic English words. It could well be that translation should not be commenced for an entry until it meets some minimal quality standard in terms of having senses that are distinct, with usage examples that correspond. Without some kind of process to note entries and senses that are "ready" for translation (which noting may be wrong or simply changed because this is a wiki), there will be many cases where English entry changes make the glosses in the translation tables no longer match the senses. Should a new "trans" be set up that does correspond? How should the no-longer-corresponding transtable be marked? DCDuring TALK 11:19, 23 November 2009 (UTC)
The ttbc tables' main purpose now seems to be "we don't know which table this translation should be in" rather than "we don't know how accurate the translation is", perhaps it would be better to reserve a translation gloss (something like "Other translations") which could be used for this situation - with a short hat-note much as ttbc already has to explain to anyone who does know how to fix the problem. Conrad.Irwin 13:27, 23 November 2009 (UTC)
Is the problem really with the use of {{ttbc}}, or is it with the moving down to the "checktrans" box? Personally, when I radically alter a sense, I'll frequently tag the citations with {{ttbc}}, but leave them in place; if they're still fine, then it seems like very little work for a translator to remove the tag. Is that still problematic? —RuakhTALK 15:07, 27 November 2009 (UTC)
It is not the use of {{ttbc}} per se, but the moving down to the "checktrans" box. When the words are contributed, they usually have the correct meaning for the existing definition. When you move words to checktrans, it strips them of all sense and syntax and they cannot be restored or used in any way unless a knowledgeable speaker restores them. For languages that have no or almost no contributors, it is tantamount to deleting the words in question. —Stephen 15:57, 27 November 2009 (UTC)
Understood. Robert made a change to Tbot (or Autoformat?) a while back that makes it support the use of {{ttbc}} within the regular translations tables, so people should probably just take advantage of that when at all possible. —RuakhTALK 21:55, 27 November 2009 (UTC)
BTW, the glossless translation table I just found at surely reminded me that almost all of the {{checktrans}} insertions that I have done have been cases where there was no gloss. Translations sometimes predated additional senses, but in other cases translators blithely added translations to multi-sense words that did not have translation tables for each individual sense. I seem to recall that it has taken several senior contributors some time to make a serious dent in the number of glossless trans tables that remained until recently. If we add to that the translation tables using some variant of {{top}}, which seem to have glosses less than half the time, it will be quite some time before this problem is behind us. I also wonder why folks bother to translate any entries that have the Webster 1913 warning or otherwise show serious signs of obsolete wording (eg, only literary usage examples in the Webster's format). They do not meet the most basic standards of being satisfactory for translations that will not have to be subsequently reviewed and the work seems likely to be harder and less fruitful.
The best hope we have is the liberal use of {{trans-see}}. DCDuring TALK 01:30, 28 November 2009 (UTC)

Plurals vs. nouns forms

As pointed out, one of the stickier issues on the Wiktionary. For example I've just now discovered Category:French noun forms which AFAICT contains only plurals, with about 10 exceptions. I think that the Catalan and Spanish noun form categories are also up for deletion, right? As pointed out by Carolina wren, we don't have anything close to policy on this? FWIW I think it's pure bureaucracy, because I don't think anyone uses these categories to look for words. What do other people think? Mglovesfun (talk) 11:34, 23 November 2009 (UTC)

Well, in French there are feminine noun forms of words like gardienne and tueuse, although I don't think they are classed as "noun forms" by any authority. I (along with, I assume, other French contributors) would be happy to get rid of this category, sticking with the Category:French plurals category. --Rising Sun 20:30, 23 November 2009 (UTC)
Yeah but paper dictionaries don't classify these at all, and most online dictionaries just redirect to the masculine singular. So we have something of a "carte blanche" to do what we want. Opi, Rising Sun and I are happy to delete Category:French noun forms and sort into the two categories above. Who says no? Mglovesfun (talk) 11:55, 24 November 2009 (UTC)
I might have said no at one time, but I no longer see the merit in this category. Delete. —Internoob (Disc.Cont.) 23:09, 24 November 2009 (UTC)
Yes, these categories for Western Romance languages are ridiculous... nouns that have corresponding masculine and feminine forms like gardien and gardienne are in the minority. To compare languages that have a majority of nouns with two forms, one being the lemma, to languages like Russian (12 forms), Lithuanian (14 forms) and Hungarian (which has well over 30 forms) is just silly. I actually put Category:Spanish noun forms and Category:Catalan noun forms on WT:RFDO a while back, but that never went anywhere. — [ R·I·C ] opiaterein — 00:29, 25 November 2009 (UTC)
Okay, see Wiktionary:Requests for deletion/Others#Category:French noun forms. Mglovesfun (talk) 11:41, 25 November 2009 (UTC)
@R·I·C - It's more silly having a category named Fooian plurals that applies only to noun plurals and not adjective plurals in the Romance languages. However, I have no objection to having a properly named category for noun plurals, and a category for the feminine singular noun forms of those nouns that have two distinct gender forms, with the noun form category itself either deleted or used solely as a holding category that should be empty save for the two subcategories. — Carolina wren discussió 02:53, 27 November 2009 (UTC)
What about a category like Category:Catalan nouns with both masculine and feminine forms? --EncycloPetey 03:01, 27 November 2009 (UTC)
What I'm interested in is a simple category for words like anglesa. The masculine form, anglès, has two common noun senses. In addition, anglesa is also the adjective form of the male lemma, so to me it makes sense to mark it as a feminine form. That's to avoid a third repetition of the shared meanings (The Valencian masculine singular is anglés), so that it doesn't go into the index which an independent noun entry would, and because of the parallelism with the adjective form. Also, the name you proposed sounds more like something for nouns like bèstia which take either gender (and hence either masculine or feminine adjectives, articles, or pronouns) but have no change of form. Dual gendered nouns, either with (anglès / anglesa), or without (bèstia) separate forms are typical for animate nouns in Catalan. The relative rarity of nouns with dual gender is due mainly to the fact that most nouns are not animate. There needs to be a category for noun forms like anglesa, with my own preference being something like Category:Catalan feminine singular noun forms or even Category:Catalan singular noun forms if we were to treat word pairs such as cabra / cabrot as having a feminine lemma and a masculine noun form. (I don't favor treating cabrot that way because I think that those few nouns that don't use the -∅/-a paradigm of the adjective masculine/feminine forms shouldn't be classified as having forms.)

requesting a bot for manual of style enforcement.

Can I suggest we create a bot to remove the "----"? First of all, the correct way to separate things in either Wikipedia or Wiktionary is to use == title ==; these markers are shown in the Table of Contents.

The division lines don't make the contents any easier to read. Users just use them to stylize whichever dictionary terms they are interested in, and the abuse has become so bad that it degrades Wiktionary as a formal dictionary, never mind launching projects such as Visual Thesaurus, etc. A Google search returns 1.97 million results, and each page has about 7~8 abusive uses, which is way over 60%.

Some of the horrible usage can be seen below.

--75.154.186.99 18:10, 23 November 2009 (UTC)

See WT:ELE and User:AutoFormat: the first is our layout policy, the second enforces it. Conrad.Irwin 18:13, 23 November 2009 (UTC)
The reasoning behind the lines is simple: if people want to extract a language section from the dump, they merely have to look for ==English==.*(----|$). Without these, they have to scan the whole thing linearly to find the next language, which is much slower and much more complicated. It is a matter of personal taste whether the entries look better or worse without them; as we've been using them on every page for many years, I think a more convincing reason than "I don't like them" is called for. Conrad.Irwin 18:34, 23 November 2009 (UTC)
I can't see anything horrible about this. Does this appear in WT:ELE? I think not, so keep them. Mglovesfun (talk) 11:57, 24 November 2009 (UTC)

They're, um, lines. Call me crazy, but I think we have better things to worry about as Wiktionarians than header dividers. Tooironic 09:26, 26 November 2009 (UTC)

Uh, I don't think we even have a "manual of style". --Yair rand 06:32, 30 November 2009 (UTC)
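The extraction argument made earlier in this thread (look for ==English== and read up to the next ---- or the end of the page) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name extract_language_section and the sample entry are made up here, and the pattern is a stricter multi-line variant of the informal ==English==.*(----|$) expression quoted above, not the code any actual dump tool uses.

```python
import re


def extract_language_section(wikitext, language):
    """Return the body of one level-2 language section, or None if absent.

    Hypothetical helper: assumes language sections start with a line like
    ==English== and are separated by a line consisting of ----.
    """
    pattern = re.compile(
        # Level-2 heading for the requested language, on its own line.
        r"^==\s*" + re.escape(language) + r"\s*==[ \t]*\n"
        # Lazily capture everything up to the next divider line or end of text.
        r"(.*?)(?=^----[ \t]*$|\Z)",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(wikitext)
    return match.group(1).strip() if match else None


# Made-up sample entry with two language sections.
sample = (
    "==English==\n===Noun===\n# a thing\n\n"
    "----\n\n"
    "==French==\n===Noun===\n# une chose\n"
)
english = extract_language_section(sample, "English")
```

Without the divider, a reader of the dump would instead have to look for the next level-2 heading while taking care not to match ===Noun=== and similar deeper headings, which is the extra complication the comment alludes to.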

Cleanup project

Is there a single page for discussing specific cleanup projects: orphaning templates and categories that have failed RFD, correcting categories, templates, etc.? Does this exist? If not, surely it should. But under what name? Mglovesfun (talk) 14:06, 24 November 2009 (UTC)

No. When someone starts such a project, they create their own page for it and announce it in the WT:BP, but there is no other central place used for that. Such projects come along so rarely that it hasn't been worthwhile to have another page for them. There is also a Category:Wiktionary Projects, but it doesn't seem to be much used. --EncycloPetey 15:48, 24 November 2009 (UTC)
I went for Wiktionary:Cleanup and deletion process/Requests. Please anyone, add stuff from your "to do" pages so everyone can see. Mglovesfun (talk) 17:27, 24 November 2009 (UTC)

misspellings

I thought we used to have a policy not to include misspellings. Did that change, or was it never the case? —scs 17:14, 26 November 2009 (UTC)

I think about this quite a lot. We include "common misspellings" but we have no criteria to define what that is, meaning that almost anything can be a misspelling of something. Right now, anything that goes on WT:RFD is just a pure vote. Mglovesfun (talk) 17:32, 26 November 2009 (UTC)

Annoyances: "This page has been deleted" for capitalised words

I use OneLook as my dictionary portal. It gives a potted definition and a list of urls to the word in a collection of actual dictionaries. When the potted definition isn't sufficient then I usually use the Wiktionary link because the other dictionaries suffer from bloat, ads or other deficiencies.

Unfortunately there's a problem with Wiktionary that makes it very annoying to access. This is the fact that OneLook uses capitalisation for the word that they reference and Wiktionary shows a "This page has been deleted" entry.

Why is this a problem? Because I don't care two hoots that a capitalised word's page has been deleted. I don't want to stop there and read it, the same text (bar the word in question) for every reference from OneLook. What I do want is to go straight to the definition page. I don't mind if the page has a "capitalised: Redirected from Capitalised" line, I just don't want to have to be pointlessly told that a page has been deleted.

Should OneLook fix this? No, because there will be cases when Wiktionary does have an actual entry for the capitalised word, something that OneLook cannot be expected to know.

I would be very pleased if you would delete "This page has been deleted" from the lookup process. Hopefully, if you consider the amount of time that these pages get in the way versus how often they serve a useful purpose (ie. one that someone would thank you for) then you'll agree to the need.

Thank you, on behalf of all OneLook-referred users.

That isn't technically possible, since it's part of the Mediawiki software for all projects including Wikipedia. Yes, there will be capitalized entries, but these will often be the German noun. If OneLook relies on matching spelling, but does not deal with capitalization, then users will have to stop and read a German entry when looking for English. This really is a OneLook issue, not a Wiktionary issue. --EncycloPetey 19:00, 26 November 2009 (UTC)
I'm not sure that you understand what I am asking. Going to, say, a German word that is capitalised is not the problem; not once has this occurred. What happens is that the page that Wiktionary presents is one saying that the prior capitalised entry has been deleted and giving a link to the uncapitalised word. This repeatedly hinders access to the target definition yet I have yet to observe any useful purpose in being informed that a page has been deleted. My request is that this unnecessary obstruction be removed. 78.151.155.41 13:28, 29 November 2009 (UTC)
I understood your question, but you don't seem to have understood my response. As I explained at the outset of my previous response, that isn't technically possible. Wiktionary has no control over that aspect of our software. Mediawiki controls that; we don't. The problem must be dealt with at either the level of Mediawiki (who develop the software for Wiktionary, Wikipedia, Wikisource, etc.) or it must be dealt with by OneLook. We can't make the kind of change you are asking. The simplest solution is for OneLook to recognize that there is a difference between lower-case and capital letters. --EncycloPetey 15:04, 29 November 2009 (UTC)
Why can't we just put REDIRECTs in, instead of saying that the prior capitalised entry has been deleted and giving a link to the uncapitalised word ? --Richardb 23:19, 29 November 2009 (UTC)
Because it's against the redirection policy at Wiktionary:Redirections, which should explain why it is unwise to use them that way. Equinox 23:27, 29 November 2009 (UTC)
It may be against the redirection policy that you have in your memory, but it's not against the policy as recorded at Wiktionary:Redirections. That states:-
  • .... Work could redirect to work, although this is unnecessary.
  • .... leaving a redirection for external links (such as those from other language Wiktionaries or Wikipedia, or mirror sites.)
So clearly we can have a redirection, and clearly it allows for redirections for links from external sites.--Richardb 06:26, 30 November 2009 (UTC)
But, surely, the question is: since "If one enters the uppercase word in the search box, the software automatically redirects to the lowercase article (unless the uppercase exists)", why do we have the page there with a deletion message? Just get rid of the page fully, and then it will automatically redirect. The deletion message is of no use, and is a definite hindrance.--Richardb 06:26, 30 November 2009 (UTC)
I will email the OneLook guys again, and see if they can fix it on their end. The reason they capitalize everything like that is because Wikipedia does, the list Connel generates for them is appropriately capitalized. They just need to be told that we don't do it the same way. - TheDaveRoss 00:20, 30 November 2009 (UTC)
unindent, for what it's worth, for those with javascript visiting http://en.wiktionary.org/wiki/Work will redirect to http://en.wiktionary.org/wiki/work?rdfrom=Work which contains a link to http://en.wiktionary.org/wiki/Work?redirect=no to turn off this behaviour. The best solution would be for everyone who wants to link to us to link: http://en.wiktionary.org/wiki/Special:Search/Work which will work no matter what case Work has. Conrad.Irwin 12:55, 30 November 2009 (UTC)
Are there good reasons why this is not part of the default? Obviously, if it is a OneLook only problem and OneLook can and will fix it, then any resource cost or implementation risk is probably not worth it. But is this an indication of a more widespread problem? DCDuring TALK 15:59, 30 November 2009 (UTC)
Most wikis are not case sensitive (for a very good reason :p), so it is only a concern on wiktionaries. This javascript behaviour is the default on en.wiktionary for all non-existent pages where an entry exists at {{lc: {{PAGENAME}} }}, {{uc: {{PAGENAME}} }}, or {{ucfirst: {{PAGENAME}} }}, as we can't easily query other case combinations. This exists only for those who follow broken links to the site (and note that most sites, when they re-arrange content, simply leave 404 messages behind; at least ours tells you of the existence of the right page, and visits it automatically if you have javascript). The expected user behaviour is to use the search box and not the url bar, which doesn't have this problem at all. Conrad.Irwin 16:06, 30 November 2009 (UTC)
I was asking a question which I now realize you had already answered. The answer to the question I was trying to ask is: The default js already effectively directs users to an existing entry with different capitalization if the user's/portal site's capitalization does not yield an entry. Then can we infer that the user with the complaint is one of those without Javascript? Do we have information about what portion of relevant Web usage is via browsers without JS? DCDuring TALK 16:50, 30 November 2009 (UTC)
The chance of people using a browser with javascript turned off is very small; I'd extrapolate that around 99% have it turned on, from [9] and [10]. Given that many large companies don't have the patience/time/money for supporting javascript-less users (Google, Flickr, MySpace, though Facebook recently added support), I have no concerns that we are being overly discriminating in making them click a whole extra time (particularly given that they are following a broken url from another site). Conrad.Irwin 23:36, 30 November 2009 (UTC)
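For anyone linking in from outside, the two mechanisms described in this thread can be sketched like this. It is a rough illustration, not the site's actual javascript: the function names search_url and case_variants are invented for the example, the base URL is the one quoted above, and Python's lower() and upper() only approximate what the lc, uc and ucfirst parser functions do.

```python
from urllib.parse import quote

# Base URL as quoted in the thread above.
WIKI_BASE = "http://en.wiktionary.org/wiki/"


def search_url(term):
    """Build a Special:Search link, which the thread says works in any case."""
    # Spaces become underscores in page titles; everything else is percent-encoded.
    return WIKI_BASE + "Special:Search/" + quote(term.replace(" ", "_"), safe="")


def case_variants(title):
    """List the other-case titles to check, mirroring lc, uc and ucfirst.

    Hypothetical helper: only these three forms are tried, matching the
    remark that other case combinations cannot easily be queried.
    """
    candidates = [title.lower(), title.upper(), title[:1].upper() + title[1:]]
    variants = []
    for candidate in candidates:
        # Skip the title itself and any duplicate forms.
        if candidate != title and candidate not in variants:
            variants.append(candidate)
    return variants


link = search_url("Work")
others = case_variants("Work")
```

A portal that cannot know the right capitalisation in advance would use search_url; case_variants only shows which alternative titles a missing page would be checked against.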

Template:citation

Was just thinking - should this categorize, or does it already? Maybe in [[Category:English citations]], or citations of English words? Then by adding lang=fr for example, that changes it to French. It would make our citations pages a lot more findable IMO. Mglovesfun (talk) 10:11, 27 November 2009 (UTC)

This used to be the case, but Special:AllPages/Citations: is a more readable list than categories provide, see the deletion comment on Category:Citations. Conrad.Irwin 13:51, 27 November 2009 (UTC)
That index does not resolve at all the need for language-specific categorization of citations. I created Category:Citations by language a while back. As for the {{citation}} - that template is broken and the automatic generation of L2 section needs to be removed and replaced by normal language L2 (actually, there is little need for that template at all, and variant spellings for which the common set of citations is provided should prob. be grouped under some kind of subsections). --Ivan Štambuk 14:42, 27 November 2009 (UTC)
Special:AllPages/Citations: is not particularly readable, and it doesn't allow for language sorting. I'm strongly with Ivan on this one, what's the downside to this? Nobody's forced to click on links at the bottom of the page. Mglovesfun (talk) 17:56, 27 November 2009 (UTC)
These changes are a huge step forward (and are long overdue). --EncycloPetey 06:48, 28 November 2009 (UTC)

I also added support for lang= to {{seeCites}} a while back to link to L2 section names for a particular language on the citations page (it seemed the most reasonable thing to do, to simply follow the layout of the main page entries). I'm also interested in whether there could be an easy way to add support to transclude (via labeled section transclusions) citations for a particular sense (or the entire word or a set of associated senses, whatever makes sense in a particular case), directly in the mainspace definition line by means of some button or link or something that would expand to a drop-down list with show/hide functionality similar to the ones we have in the translation tables (but that would not be that conspicuous). Could such a list be "shrunk" into some kind of Citations superscripts or something? This would eliminate lots of problems with citations (discoverability, dissociation from the definition lines, waste of time with additional clicking..). Ideally this would be done on-demand so that large citation pages don't affect the main entry size. --Ivan Štambuk 06:00, 28 November 2009 (UTC)

That's definitely possible. I tried it on machete, obviously the wikimarkup would need to be adjusted and it could probably be put into a collapsible box. Nadando 06:22, 28 November 2009 (UTC)
I wonder whether this kind of transclusion is advisable. Having one or two quotes transcluded into the entry is a good thing, when they're well chosen. A potentially huge list popping in doesn't seem such a good idea, although interconnectivity between specific senses might be, if we can figure out how to do it simply. --EncycloPetey 06:48, 28 November 2009 (UTC)
I was thinking about a template for this. It could wrap around quotations in the Citations: namespace, giving each one a number. Then the numbers would be called in the entry, selecting from the citations as appropriate (no need for huge lists). This could eliminate the need to have sense duplication in the Citations space. Nadando 02:48, 30 November 2009 (UTC)
I like the idea, but with glosses not numbers. Conrad.Irwin 12:50, 30 November 2009 (UTC)
Agreeing with Conrad, I think we need glosses to keep things from falling apart if the senses get re-ordered, as they will. The glosses should at least provide a clue even when the definitions are revised. As long as entries can be edited by anyone who isn't a certified expert in our largely undocumented ways, we can expect edits to mess up the coordination mechanisms available to us. As we apparently can't make our solutions bullet-proof, we need to try to keep them easily reparable. DCDuring TALK 15:51, 30 November 2009 (UTC)

A dictionary is not for punctuation

The West Frisian word ús has two articles. The other one is for Us. This second article is unnecessary; it is not a separate word, and has no different meaning. It seems to exist only to show people that, in West Frisian, diacritics are usually not written on capital letters. Going by that, each West Frisian word that begins with an accented vowel would need a separate entry for its capitalized form, which is wholly unnecessary. Explaining orthographical rules, as this article's creator seemingly means to do, is not a dictionary's purpose.

Eal

Yes, I agree. Deleted. RuakhTALK 17:05, 28 November 2009 (UTC)
I also found Citations:; earlier, which I found really odd. Mglovesfun (talk) 17:50, 28 November 2009 (UTC)
I believe that in Greek accents are also optional (usually left out) on capital letters... but at any rate, I don't think diacritics are punctuation. — [ R·I·C ] opiaterein — 18:23, 28 November 2009 (UTC)

Christmas Competition 2009

This year's Christmas Competition is announced and is open to all contributors!
--EncycloPetey 07:47, 29 November 2009 (UTC)

Genitive and Swedish

The question may seem ridiculous, especially coming from a native speaker. But: does Swedish have a genitive case, or at least, should we claim so? Why don't I know this? Well, when I started to add entries, I followed what layman knowledge I had, namely that possession is indicated by the genitive case. This is what is taught in school (at least when I went to school), this is what (most, afaik) other dictionaries state, this is what encyclopedias state (e.g. NE [11]), this is what textbooks claim even today, based on what I find on the net. But it has also been claimed that a more thorough analysis reveals that it isn't a case at all any longer, and should be considered a possessive form, just as in English (see e.g. this discussion on sv:wp, and this change here on wikt). One motivation is that it is so vastly more common to write the phrase The Queen of England's men as Drottningen av Englands män than Drottningens av England män, even though the latter is still in some very limited use. (The recommendation is actually to use the former, which would correspond better to a "possessive" analysis, even from those who aren't ashamed to call the form genitive.)

So: should we follow what is the academically correct description (i.e. contrasting a "base form" to a "possessive form"), or the description which is vastly more common (i.e. "nominative" versus "genitive")? \Mike 10:54, 29 November 2009 (UTC)

Wiktionary:Phrasebook

I've made a start on this, but it's tough going. We probably need some discussion about CFI for the phrasebook versus "languages", as it were. Mglovesfun (talk) 15:55, 29 November 2009 (UTC)

Categories for intergovernmental organizations

This is spurred by a RFDO entry that has resulted in a deletion and recreation of Category:ru:CIS. As I pointed out in the original deletion discussion (which was 2-0 in favor of deletion at the time of deletion and 2-1 at the time of recreation) I do not believe that as a dictionary we should have topical categories for intergovernmental organizations such as the CIS. The same reasoning also applies to the G20, the European Union, Mercosur, ASEAN, etc. At best, membership information belongs in the entry for the pertinent organization, and likely should be left to the relevant Wikipedia article which as an encyclopedia can deal with information such as changing membership in such organizations (such as Georgia's withdrawal earlier this year from the CIS) with more detail than Wiktionary can (and likely in a more timely fashion as well).

To that end, I'm seeking a consensus that Wiktionary will not have topical categories for intergovernmental organizations. — Carolina wren discussió 20:46, 29 November 2009 (UTC)

Wiktionary:CFI - has something gone missing ?

I thought this page used to have some sort of banner on it directing people who wanted to contribute ideas for change to Wiktionary:Editable CFI. There is no banner now. Am I just imagining this, or does anyone else remember there being such a "banner" ? --Richardb 07:56, 30 November 2009 (UTC)

Well, according to the history page, CFI has not been edited since a month before the creation of Wiktionary:Editable CFI, so apparently there never was such a banner. --Yair rand 08:09, 30 November 2009 (UTC)
Ah, but you don't know how devious some administrators can be to cover their tracks. Of course I checked the history, and the visible deletes. Doh!--Richardb 08:55, 30 November 2009 (UTC)
If you want to make changes to CFI, bring it up here. Duh. — [ R·I·C ] opiaterein — 17:02, 30 November 2009 (UTC)

Template help

Please see the discussion here. If you're good at wikisyntax, please help. --The New Mikemoral ♪♫ 01:22, 1 December 2009 (UTC)