Wiktionary:Beer parlour/2009/October

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

October 2009

Volunteers still needed

Hi all,
Although we removed the centralnotice that was up, the Wikimedia Foundation is still looking for volunteers to serve as subject area experts or to sit on task forces that will study particular areas and make recommendations to the Foundation about its strategic plan. You may apply to serve on a task force or register your name as an expert in a specific area at http://volunteer.wikimedia.org.

The Foundation's strategy project is a year-long collaborative process which is hosted on the strategy wiki, at http://strategy.wikimedia.org. Your input is welcome (and greatly desired) there. When the task forces begin to meet, they will do their work transparently and on that wiki, and any member of the community may join fully in their work. This process is specifically designed to involve as many community members as possible.

Any questions can be addressed to me either on my talk page here or on the strategy wiki or by email to philippe at wikimedia.org.

I hope you'll consider joining us!

Philippe 01:53, 1 October 2009 (UTC)[reply]

Mandarin reaches 10,000 nouns!

Thanks for everyone's help in getting Mandarin to the 10,000 nouns mark. The 10,000th noun was... 尼龍 (nylon)! Celebrations all around! (see Category:Mandarin_nouns) Tooironic 04:55, 1 October 2009 (UTC)[reply]

Congratulations! Well done! I marked the English word with a few new translations. Red links can't wait to become blue. :) Anatoli 05:49, 1 October 2009 (UTC)[reply]

Strong numbers

I was thinking, it might be useful to add Strong numbers to all the relevant Greek/Hebrew entries, and also redirects from the Strong numbers themselves to those entries. What would people think of this addition? --SJK 09:49, 1 October 2009 (UTC)[reply]

Blunder in CFI General Rule needs to be corrected.

(Copy of entry at Wiktionary_talk:Criteria_for_inclusion#Blunder_needs_to_be_corrected_in_CFI_definition)

Someone, at some time, has made a blunder, that has apparently been subsequently accepted by a vote.

Under ==General rule== we find the line-

A term should be included if it's likely that someone would run across it and want to know what it means. This in turn 
leads to the somewhat more formal guideline of including a term if it is attested and idiomatic.

I hate to point out the absurdity, but, if obeyed, this would mean we would have ONLY idioms in Wiktionary !

I propose that the General Rule should be changed to:-

A word should be included if it meets any of the following criteria
*Clearly in widespread use, 
*Used in a well-known work, 
*Appears in a refereed academic journal, or 
*Used in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year. 
(See below under Attestation for clarification of these criteria)

A term other than a single word needs to meet the above criteria, and additionally be idiomatic. (See below for Criteria for Idiomaticity)

This change would also remove the disparity between the very loose, almost colloquial general rule (if it's likely that someone would run across it and want to know what it means) and the more formal attestation requirements.

Again, it needs to be changed, but I personally can't make the effort to mount a vote and a campaign. Anyone want to take it on ?--Richardb 15:02, 1 October 2009 (UTC)[reply]

When dealing with many types of languages, I take idiomatic to mean, "not a sum of parts" where parts can be whitespace-delimited words or just morphemes/letters. I think most single, whitespace-delimited words can be included with such an "open" interpretation (though maybe I'm alone on this view:). It's a bit fuzzier with highly agglutinative languages, and other issues noted at WT:IDIOM. --Bequw¢τ 18:12, 2 October 2009 (UTC)[reply]
This is not a blunder; the wording is correct. You have misinterpreted the logical term if as only if. They are not the same. The term if implies action only in cases where the initial condition is met. When the initial condition is not met, it makes no statement either way. --EncycloPetey 00:18, 3 October 2009 (UTC)[reply]
And there's me, 30 odd years in computing, engineering, policy writing, and I thought I knew and/OR/IF logic fairly well. But no way do I understand that completely twisted logic. Does anyone else ?--Richardb 07:52, 30 November 2009 (UTC)[reply]
But I think Richardb's follow up question would then be (if I may put words into his mouth), "if single-word terms aren't admitted under the 'General rule', under what other criteria in the CFI are they admitted?" This I had a hard time finding.
Essentially this is a problem, when I delete some nonsense every now and again someone replies on my talk page citing the sentence above. I mean why aren't Tiger Woods and Davis Love III terms that someone might run into and want to know what it means? Basically what follows in WT:CFI contradicts this anyway, so I think it should be changed as a cleanup issue, not a 'policy change'. Mglovesfun (talk)
Can Davis Love III be called a term? I don't think so. But this sentence in the CFI is much too restrictive for another reason: users may consult a dictionary for a definition, but also for an etymology, a pronunciation, anagrams, etc. They may also look for a term (a phrase) they do not know or cannot remember, but with a precise meaning (with the search tool, they could find it, but only if it's present...) Lmaltier 13:23, 3 October 2009 (UTC)[reply]
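
For readers following the if / only if point raised earlier in this thread, a small truth-table sketch may help. It is only an illustration: the names `implies`, `p` and `q` are assumptions made for this example and are not taken from WT:CFI.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p holds and q does not."""
    return (not p) or q


# p: the term is attested and idiomatic; q: the term is included.
rows = list(product([False, True], repeat=2))

# Situations consistent with "include a term IF it is attested and idiomatic" (p -> q).
allowed_if = [(p, q) for p, q in rows if implies(p, q)]

# Situations consistent with "include a term ONLY IF it is attested and idiomatic" (q -> p).
allowed_only_if = [(p, q) for p, q in rows if implies(q, p)]

for p, q in rows:
    print(f"p={p!s:5} q={q!s:5} if: {(p, q) in allowed_if!s:5} only-if: {(p, q) in allowed_only_if}")
```

Under the "if" reading, the case where a term is not idiomatic but is still included (p false, q true) is permitted; under the "only if" reading it is ruled out. That is the whole difference between the two interpretations argued over above.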

Including unicode encoding information

Should unusual characters (i.e. ) have unicode encoding information systematically included in some fashion? If so, what is the best way to do it?

For what it's worth, I'm inching toward yes at least on encoding, and also including the actual Unicode character name somewhere. Circeus 17:12, 1 October 2009 (UTC)[reply]

Could you treat the name as a synonym and include them (name and encoding) in a template that provided useful links for explaining what they are and how used? I often do that with abbreviations of headwords. Many contributors seem to believe that abbreviations deserve to be on the inflection line. Presumably they would like these there as well. DCDuring TALK 17:27, 1 October 2009 (UTC)[reply]
I was leaning towards usage notes for this, the last time it came around: e.g. . But a nice little floating infobox might do the job better. I do think we should include all of this information (for all characters defined in Unicode). However, IMX efforts to do so tend to arouse a tedious amount of carping. -- Visviva 18:38, 1 October 2009 (UTC)[reply]
I've come to the conclusion that, for completely non-linguistic symbols, the best treatment is often to start with a typographical definition that explicitly mentions Unicode, which is immediately followed by the encoding. See the aforementioned , and . A good solution for actual letters (say Â) still needs to be found, though: I'm struggling just to get a good way to deal with Δ. Letters in general are a downright mess, and I believe good stylistic guidelines for defining symbols need to be found: "delta" under an English section for the aforementioned Greek letter is neither an accurate nor a good definition for several reasons I won't get into here. Circeus 18:43, 1 October 2009 (UTC)[reply]
These approaches are great for the applicable symbols. May they be applied consistently and widely and be a model for whatever classes they do not themselves cover. DCDuring TALK 19:54, 1 October 2009 (UTC)[reply]
Encoding details are different than definitions, so including them as such is confusing. Having that information included as Usage Notes or on the inflection line (e.g. ) are both fine. A separate infobox also works. --Bequw¢τ 03:03, 2 October 2009 (UTC)[reply]
Actually the more I think about it, the more I think the encoding info should not just be separated from definition lines, but from language sections altogether. If there was some clean way to include it above the language sections, like {{see}}, I'd be in favor of that. --Bequw¢τ 21:40, 5 October 2009 (UTC)[reply]
Thanks to Visviva we now have the general {{character info}} and several script-specific templates (such as {{Vai character info}}). --Bequw¢τ 15:24, 15 October 2009 (UTC)[reply]

Wiktionary lookup tool

One of the side projects I've been pushing is increasing traffic/visibility of the Wiktionary project in general. A tool was requested for fr.wn, to allow readers to quickly look up terms without leaving the article they're browsing. Cirwin pointed out Bequw's quickLookup.js, and modified it for the proof-of-concept, as well as continuing to advise. n:User:Bawolff has continued to develop the idea, and has a working model which uses MediaWiki:extractFirst.xsl to extract the first definition.

My goal for this tool is to increase the reach of Wiktionary, and hopefully bring in new contributors. Of course, that's not the goal for others working on the tool - they just want a quick way to look up terms.

This tool is currently installed as a gadget on en.wn and en.ws, and probably on other wikisource languages. It is also going to be packaged for inclusion in blogs and other popular website scripts. The tool has been shown to several large Mediawiki sites outside the WMF, and may be tested on some of them soon.

- Amgine/talk 18:31, 1 October 2009 (UTC)[reply]

Note on some unusable sources.

There are a few publishing houses out there, notably Alphascript Publishing and Global Vision Publishing House (that I've seen, at least) which are putting out "books" which are nothing more than collections of Wikipedia articles plucked and thrown together. Obviously, such works should not be counted towards the CFI requirement for demonstrating usage of words within them any more than the copied Wikipedia articles themselves. Cheers! bd2412 T 17:12, 3 October 2009 (UTC)[reply]

Well, obviously they would only count once ... but if they are in fact printed paper books, I'm not sure why we wouldn't accept them as citations. We would accept a printed quote from a Wikipedia article, wouldn't we? -- Visviva 04:14, 4 October 2009 (UTC)[reply]
A printed quote in a peer-reviewed source, I'm sure we could use. However, if someone is just grabbing a handful of Wikipedia titles and calling the collection a book (for $60-70, no less) they are no more reliable for our purposes than an openly editable Wikipedia article. From what I'm told, I should add, these printed volumes come complete with whatever spelling/syntax errors are to be found in the articles grabbed, and misprints of certain characters. See User:PrimeHunter/Alphascript Publishing sells free articles as expensive books for details, but my particular concern is that these books do come up in word searches on Google Books. Cheers! bd2412 T 23:04, 4 October 2009 (UTC)[reply]
We are only looking for durably archived examples of usage. WP can't rely on a printed copy of itself. We could rely on a printed copy of articles from WP because it overcomes the durable archiving issue. I wonder under what circumstances we could rely on printed copies of our own usage examples for attestation! Both sources run the risk that a savvy person could game attestation, but we already face that problem with Usenet. A savvy PR person could plant terms in the press and in testimony to game the system as well. DCDuring TALK 01:16, 5 October 2009 (UTC)[reply]
Our own usage examples are mentions, not uses. But our discussions (as here in the BP) are uses. —msh210 17:24, 7 October 2009 (UTC)[reply]
With the advent of self-publishing mechanisms like Lulu, the divide between "books as properly reviewed and edited texts" and "books as anything anyone can write" (à la blog) is becoming more and more blurry. I foresee this becoming a problem in the near future. Equinox 21:45, 5 October 2009 (UTC)[reply]
Not sure why somebody reverted that edit. To clarify: I don't mean that the man on the street should never, under any circumstances, be allowed to write a book, but that books without skilled editing are liable to be a mess, and that our WT:CFI currently relies on a tacit assumption that books don't have the failings of blogs and Web pages. Equinox 22:27, 5 October 2009 (UTC)[reply]


Extended Wiktionary queries now available

I've been working on a tool which makes it possible to make many new kinds of queries on the English Wiktionary. It doesn't yet have a public front-end but for now you can submit queries to me and I will try to fulfil them.

For an example of what is possible see User talk:Vahagn Petrosyan/Armenian nouns lacking declension sections.

Basically anything involving page titles, languages, section headings, and section heading levels. It's also possible to compare and count.

I would like to add categories and information other than section headings too and if anybody would like to help me improve the tool that would also be great. — hippietrail 08:03, 6 October 2009 (UTC)[reply]

i'dlike 2find it[+ipa pl--史凡>voice-MSN/skypeme!RSI>typin=hard! 10:17, 6 October 2009 (UTC)[reply]
I'd like to find italian entries containing IPA please
L☺g☺maniac chat? 13:30, 6 October 2009 (UTC) +--史凡>voice-MSN/skypeme!RSI>typin=hard! 14:40, 6 October 2009 (UTC)[reply]
Sorry but IPA doesn't involve a section heading. I could find all Italian entries with a Pronunciation section if you like. — hippietrail 17:01, 6 October 2009 (UTC)[reply]

k+same4cantones pl--史凡>voice-MSN/skypeme!RSI>typin=hard! 01:32, 7 October 2009 (UTC)[reply]

I like it, this will make many cleanup lists easier to devise and keep updated. Can't wait for a front-end. --Bequw¢τ 16:38, 7 October 2009 (UTC)[reply]
A front-end is difficult because SQL is difficult and allowing users to enter arbitrary SQL would be dangerous. That means I have to wrap it in some generalized code. But to do that I need to know what kinds of queries people will want. So please ask for some and the front-end will get closer.
I have categories working now too so for another example I was able to find all the differences between pages with Armenian Noun headings and Armenian noun categories. — hippietrail 00:51, 8 October 2009 (UTC)[reply]
Sample queries using just headword, section headers and categories:
  • English entries with a "homophones" header or in homophones category with no Pronunciation section;
  • English PoS headers not in corresponding English PoS category;
  • English entries in category English Prepositional phrases with only Adverb headers in English section, with Phrase header, etc.;
  • English phrase headers not in any of a set of categories;
  • Single-word English entries with Proverb, Phrase, or Idiom headers.
Many of these would need a "count" run, followed by a "sample" run of 10 or 20 (not necessarily a random sample, but not always beginning with "a"). HTH. DCDuring TALK 01:47, 8 October 2009 (UTC)[reply]
What an interesting effort. Can I have a list of words in Category:1000 English basic words that lack the etymology header? If yes, you can post it to my user space or anywhere else fit.
Does the tool have a home page here in wiki? --Dan Polansky 07:44, 8 October 2009 (UTC)[reply]
  • User:HippieBot/English basic words without etymologies
    No home page yet. Rather than being a tool I have several tools which create metadata database tables from a dump file and for now I'm figuring out SQL queries for those tables. When I get the hang of it I will attempt to craft it into a tool with a web front end. — hippietrail 14:08, 8 October 2009 (UTC)[reply]
    Thanks! Nice that you have also posted the SQL statement to the result page. --Dan Polansky 14:49, 8 October 2009 (UTC)[reply]
    Thanks very much for generating these extremely useful cleanup lists. One good use would be to make our PoS categories match our PoS headers so the categories could be relied upon as being nearly complete matches with the headers.
Being able to match templates and context tags with headers would be nice too, when as and if you get the chance. In the meantime, the lists that you can now generate make it easier to systematically do lots of cleanup. DCDuring TALK 15:45, 8 October 2009 (UTC)[reply]
A great mega-family of cleanup lists would be, for each language, entries with headers of a given PoS, but not in the corresponding categories. This is close to what an Ullman bot generates I think, but only in pursuit of changed items. DCDuring TALK 20:56, 8 October 2009 (UTC)[reply]
I have completed cleanup on the two lists you provided. I would like to work through some of the smaller English PoS categories: prepositions, determiners, conjunctions, pronouns, interjections, idioms, phrases, proverbs. The cleanup would be those with the headers indicated, but not in the categories indicated. DCDuring TALK 17:23, 10 October 2009 (UTC)[reply]

Here's another one I thought of: User:HippieBot/English terms in unknown etymology category but without etymology section
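
The kind of query discussed in this thread (entries with a given PoS header that are missing from the corresponding category) can be sketched as follows. This is a minimal illustration under stated assumptions, not hippietrail's actual tool: the table names `headings` and `categories`, their columns, the sample rows, and the helper `headers_missing_category` are all hypothetical. The fixed, parameterized query is one way to wrap SQL in generalized code so that users never enter arbitrary SQL themselves.

```python
import sqlite3

# Hypothetical schema: one row per (page, language, section heading),
# and one row per (page, category). Not the real tool's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE headings (title TEXT, lang TEXT, heading TEXT);
CREATE TABLE categories (title TEXT, category TEXT);
""")

# Made-up sample data for illustration only.
conn.executemany("INSERT INTO headings VALUES (?, ?, ?)", [
    ("dog", "English", "Noun"),
    ("cat", "English", "Noun"),
    ("run", "English", "Verb"),
])
conn.executemany("INSERT INTO categories VALUES (?, ?)", [
    ("dog", "English nouns"),
    ("run", "English verbs"),
])


def headers_missing_category(conn, lang, heading, category):
    """Titles that have the given heading in the given language section
    but are not members of the given category."""
    query = """
        SELECT h.title
        FROM headings AS h
        WHERE h.lang = ? AND h.heading = ?
          AND NOT EXISTS (
              SELECT 1 FROM categories AS c
              WHERE c.title = h.title AND c.category = ?
          )
        ORDER BY h.title
    """
    # Values are bound as parameters, never spliced into the SQL text.
    return [row[0] for row in conn.execute(query, (lang, heading, category))]


missing = headers_missing_category(conn, "English", "Noun", "English nouns")
print(missing)
```

With the sample rows above, only "cat" has an English Noun heading without being in "English nouns". A "count" run followed by a "sample" run, as requested earlier in the thread, would be the same query wrapped in `COUNT(*)` or given a `LIMIT`.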

Categorization of Abbreviations, Acronyms and Initialism

Shouldn't the categorizing templates ({{abbreviation}}, {{acronym}}, {{initialism}}) assume "English" unless otherwise specified, just like the context templates do? There's a whole mess of entries in Category:Abbreviations, acronyms and initialisms that are English and therefore should be in Category:English abbreviations, acronyms and initialisms but are miscategorized because people don't pass in lang=English/en. --Bequw¢τ 19:13, 8 October 2009 (UTC)[reply]

Makes sense. L☺g☺maniac chat? 19:18, 8 October 2009 (UTC)[reply]
Yes. English gets to be the default, but has the responsibility for cleaning up. Shouldn't there be a pseudolanguage code for "language to be assigned"? Is there one already? DCDuring TALK 20:13, 8 October 2009 (UTC)[reply]
{{und}} Undetermined is the relevant ISO 639-3 code you are thinking of. — Carolina wren discussió 20:32, 8 October 2009 (UTC)[reply]
Thanks, CW. DCDuring TALK 20:42, 8 October 2009 (UTC)[reply]
There are 6270 or so in that category. All the ones beginning with "." (>200) are translingual. Many of the others are translingual. Perhaps Hippietrail should make a gift of a cleanup list linked to the About pages for each language of all entries with an abbreviation header in a language section and not an entry in the appropriate category. If some items now have "en" as a parameter, I would guess it would be more efficient if the master list of misclassified items remained where it is. DCDuring TALK 20:42, 8 October 2009 (UTC)[reply]
Implemented. Now time for cleanup. --Bequw¢τ 00:25, 9 October 2009 (UTC)[reply]
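
The default-to-English behaviour proposed in this thread can be sketched in a few lines. This is only an illustration of the logic, written in Python rather than template syntax; the function `category_name` is hypothetical and the real templates may well be implemented differently.

```python
# Shared tail of the category names quoted in this thread.
BASE = "abbreviations, acronyms and initialisms"


def category_name(lang=None):
    """Return the category an entry would land in.

    Hypothetical helper: when no language is passed, assume English,
    as the context templates are said to do.
    """
    language = lang if lang else "English"
    return f"Category:{language} {BASE}"


print(category_name())          # no lang passed: falls back to English
print(category_name("French"))  # explicit language wins
```

The trade-off noted above still applies: with English as the default, translingual or non-English entries that omit the parameter end up in the English category and have to be cleaned up by hand.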

Reconstructed terms in attested languages?

How should we handle reconstructed terms in attested languages?

These arise frequently in detailed etymologies – for example, firkin comes from conjectured Middle Dutch *vierdekijn, diminutive of vierde (fourth).

These differ from reconstructed terms in reconstructed languages in that only the term is reconstructed – the language itself is attested; thus *vierdekijn is a conjectured unattested term in the language Middle Dutch, not in “Proto-Dutch” or “Proto-Middle Dutch”.

Specifically, how should we file and link to pages for such terms?

The existing policy page, Wiktionary:Reconstructed terms (WT:RT), largely deals with reconstructed languages; the category for reconstructions, Category:Reconstructions, contains only reconstructed languages; and the {{reconstructed}} template is used only for terms in reconstructed languages. Not one single reconstructed term in an attested language is flagged or categorized as such, hence there is no existing practice on which to base policy.

Question 1 – where do they go?

Do reconstructed terms (in existing languages) go in the main namespace or in Appendix:? Reconstructed terms fail WT:CFI as they are not attested, but WT:RT: Entries for terms states:

A Latin reconstruction should be clearly marked as a reconstruction, but goes in the main namespace as any other Latin term would, following normal rules for inclusion.

This appears to be a mistake – following “normal rules for inclusion”, a reconstructed term does not meet CFI, because it is unattested.

We should either:

  • Amend WT:CFI to allow reconstructed terms in attested languages;
    Dubious – reconstructed terms are subject to revision as theories change: “attested” is part of CFI for good reason.
  • Clarify at WT:RT and WT:CFI that reconstructed terms do not go in the main namespace, whether the language is or is not reconstructed.

Based on existing practice, I’d assume that *vierdekijn would be filed in:

What should the parent category be called?

(There’d be a “nouns” in between, natch.)

Question 2 – how to link?

Existing policy at WT:RT: References from Etymologies only refers to reconstructed languages, and prescribes the use of {{proto}}.

How to link to *vierdekijn (and other reconstructed terms)?

  • Using {term} in some way – say, link to vierdekijn but display as *vierdekijn,
  • Using {proto} in some way – namely, (optionally) removing the required “Proto-”.
  • Using a new template, say {{conjectured}}, which functions like {term} but links to Appendix, optionally adds the *, and some wording like “conjectured”.

My thoughts:

  • Overloading {term} any further seems a mistake – it’s very basic.
  • Adding a noproto argument to {proto} would be the easiest answer, and fine if we don’t want to distinguish reconstructed terms from reconstructed languages.
  • A new {conjectured} template would allow finest control; also, one could use existing ISO codes for languages.

Simple summary of my conclusion:

  • reconstructed terms in Appendix:, even if language attested;
  • add noproto to {proto}, or make new template {conjectured}.

What do people think?

—Nils von Barth (nbarth) (talk) 21:09, 10 October 2009 (UTC)[reply]
BTW, previous discussions at:
—Nils von Barth (nbarth) (talk) 21:15, 10 October 2009 (UTC)[reply]
As I said on some of the abovelinked discussions, I see no difference between protolanguages, and what others call "reconstructed terms in existing languages". Proto-languages are collections of proto-terms, and these are the etymons that are hypothesized to have existed on the basis of comparative evidence, in order to yield existing, actually attested forms, and *vierdekijn is exactly such "proto-term". The certainty by which *vierdekijn is reconstructed is no less than some of the reconstructions in ancestral languages that are not attested at all, or attested much more scarcely than Middle Dutch.
Also: I don't see much gained by putting such reconstructed middle-forms yielding an attested form in only one language (or only one form in one language) in the appendix, and IMHO the appendix namespace should be used only for major proto-languages which have many descendants (in this case Proto-Germanic where MD reconstruction would be listed at the appropriate clade in the hierarchy). Vulgar Latin (Proto-Romance) forms that were not attested could be added as the descendants in the closest Latin etymon (and it won't be too wrong in this particular case to list D firkin as if descending from MD vierde).
In etymologies I've been so far formatting these reconstructed non-protolanguage terms as *{{term||<form>|meaning}}. I would have nothing against using some newly-written {conjectured} template, tho enforcing it may be an overkill for this whole issue, as {term} seems to be up to the task for it. --Ivan Štambuk 09:53, 12 October 2009 (UTC)[reply]
I think Ivan has pretty well summarized my opinions as well. --EncycloPetey 02:28, 13 October 2009 (UTC)[reply]

Ivan, thanks for the thoughtful write-up (and EP for concurrence). If I may summarize, both to verify that I’ve understood and to make a concrete proposal:

  • Question 1 – where do they go?
Reconstructed terms in attested languages – especially intermediate forms in etymologies – should not have an entry (a page), either in the main namespace (because they do not meet CFI) or in the Appendix (as that is reserved for Proto-languages).
Rather, they should be listed in the entry for the descendants used to reconstruct them (in the “Etymology” section), using *{{term||<form>|meaning}} (which yields *<form> – formatted, but no link), and in the entry for the closest older term (in the “Descendants” section; e.g., Vulgar Latin in the Latin entry).
  • Question 2 – how to link?
(Don’t link; {term} formats correctly.)

This sounds like an excellent solution – it lists the form for etymology (which is their function), but skips having an entry whose only function is to fill in a chain of etymology.

—Nils von Barth (nbarth) (talk) 01:59, 14 October 2009 (UTC)[reply]


Stock symbols

Do we even want these?
We currently have Category:Stock symbols for companies with feeder template {{stock symbol}}, but I strongly doubt we want either the category or the template, while the entries themselves should all be sent to RFV or simply deleted, with the possible exception of T. Not only are the ticker symbols duplicated on various exchanges worldwide, they change over time. For example, C was once Chrysler, and now is Citigroup. — Carolina wren discussió 17:54, 11 October 2009 (UTC)[reply]
What is the reason? That they have an average life of less than 30 years? There might be books (even literature) that refer to then-current ticker symbols, especially the ones with cutesy names or that were for popular stocks, like "T". We seem to find it easy to justify including all kinds of abbreviations from realms that are more familiar (or ideologically congenial?), like ISO-639 codes, E numbers, etc. I would be interested to hear the arguments distinguishing this case from those others. DCDuring TALK 16:12, 12 October 2009 (UTC)[reply]
It seems to me that the case is basically identical with ISO codes, except that there is more than one registering authority. Which is to say, I wouldn't really lose any sleep over it if they all got shuffled off to Appendix-land, but I do think we are a more useful reference work for having them (and for having them in mainspace). If we got rid of these, I would want to see a global purge of all the other initialisms that don't meet normal-word criteria. But I don't really think that would be a good use of anyone's time. -- Visviva 16:35, 12 October 2009 (UTC)[reply]
I say, apply the CFI for brand names. Iff they meet it, they stay in. If not, appendicize. bd2412 T 17:40, 12 October 2009 (UTC)[reply]
But they're not brand names; why would we treat them like they are? Also, that bit about not being written "about the type of product in general" is rather problematic. Surely no one would expect to find these anywhere but in publications that are at least generally related to the stock market. -- Visviva 11:08, 13 October 2009 (UTC)[reply]
What is a stock ticker symbol? It's a stand-in for the company name, a proper noun. I have no objection whatsoever to including them in an appendix, but I'm not sure what purpose is served by reporting them as words. Someone who comes across a stock ticker symbol will almost certainly be looking at a stock ticker or the like, and will immediately know what quality of symbol they are looking at. bd2412 T 19:38, 13 October 2009 (UTC)[reply]
Unlike ISO-639 or E numbers, we've got the issues of multiple issuing authorities and no ban on reassignment of previously used symbols. Indeed, Citigroup took over C the same year Chrysler merged with Daimler. At a minimum, if generally kept in mainspace, we need to revamp the category and template, so that it could handle the multiple issuing authority concern. — Carolina wren discussió 20:14, 12 October 2009 (UTC)[reply]
But that actually suggests to me that this would be more useful than other similar classes of entries. A zeptogram will never be anything but zepto- + -gram, and en will never be the ISO 639-1 code for anything but English; but a stock symbol may have wildly different associations over time and space, associations that may not be satisfactorily documented anywhere else. Someone reading an older financial text that refers to "C" might come away with a very flawed interpretation if they assume it is referring to Citigroup rather than Chrysler. -- Visviva 11:08, 13 October 2009 (UTC)[reply]
I'll agree that a source that collects historical ticker symbols would have value, but are they used in non-tabular text, unaccompanied by the name of the company whose stock they represent? Generally, the answer to that is no, with occasional exceptions such as T which could then pass the normal CFI. E numbers are a marginal case, but they will show up commonly enough with such words as peas, carrots, and potatoes in ingredient lists. — Carolina wren discussió 17:42, 13 October 2009 (UTC)[reply]
They are in widespread use in the financial press in articles that discuss stock prices. I'm not really sure that the use of language codes occurs much more in running text than stock symbols do. We haven't been very demanding on any of the 1900 headwords that have an abbreviation-type heading. Items from the world of commerce often attract hostile attention that analogous words from the worlds of IM, computer gaming, computing generally, and linguistics do not. Consequently they can serve as a kind of miners' canary, providing useful information about entry classes for which our standards have otherwise been overly lax. DCDuring TALK 18:00, 13 October 2009 (UTC)[reply]
In the financial press I read, I've generally only seen them used in conjunction with the name of a company, never independently, as a way to enable easy lookup of data by interested readers, since financial data often is set up based on the assumption that the ticker symbol will be used to access it. — Carolina wren discussió 19:49, 13 October 2009 (UTC)[reply]
... or areas where we have an unreasonable prejudice against business. (-: -- Visviva 18:21, 13 October 2009 (UTC)[reply]
If the entries only said "stock ticker symbol", I agree that would be fairly useless. But most seem to have at least the name of the company and a link to the pertinent Wikipedia article. Information about the exchange and period of use would also be helpful. Of course, all of this is rather encyclopedic... but it's also somewhat dictionaric, and even if it were included somewhere in Wikipedia it wouldn't be likely to be easily found. So we once again face the choice of whether we would rather be useful or pure. I'm less than convinced of the community's ability to make the right choice on such matters, based on past experience; maybe I should just go out and create a few thousand of these myself, to help create some momentum in the "right" direction. ;-) -- Visviva 18:21, 13 October 2009 (UTC)[reply]
The meaningful symbols each have issuing authorities appropriate for the scope. There is some effort currently to differentiate via prefixes or suffixes securities from markets on other exchanges or similar entities. In some ways, the polysemy and duplication is reminiscent of what occurs in other realms, such as ordinary words in ordinary languages. DCDuring TALK 14:05, 13 October 2009 (UTC)[reply]

I just thought of another reason to not include most of them. In the case of ISO 639, en is a symbol for English. In the case of E numbers, E175 is a symbol for gold. In the case of stock ticker symbols, KFT is a symbol for Kraft Foods, which unlike English or gold, likely does not meet CFI. How can a symbol for something which has not met CFI, meet CFI itself? — Carolina wren discussió 19:49, 13 October 2009 (UTC)[reply]

We don’t have for your information, but we do have fyi. There certainly are cases where something might not merit inclusion, but an abbreviation or symbol for it probably would. —Stephen 20:07, 13 October 2009 (UTC)[reply]
Agreed. Furthermore, I'd dispute the claim that "en", "E175", and "KFT" are symbols for "English", "gold", and "Kraft Foods"; rather, I think they're symbols for English (the language), gold (the substance), and Kraft Foods (the company). That is, these symbols have the same referents as the corresponding plain English, and two of them are even derived from said plain English, but they are not actually symbols for the plain English. —RuakhTALK 20:23, 13 October 2009 (UTC)[reply]
Ticker symbols are for various securities. Most well known are the ticker symbols for the common equity of operating businesses, but they also exist for other securities of those companies, and for options, closed-end funds and exchange-traded funds (ETFs). I'm sure I'm missing other categories. In the financial press, the common stock ticker symbol may be used to refer to the underlying company, but the divergence between the company and its equity becomes clear whenever bankruptcy is an issue.
Just as with ISO 639 codes, the need to avoid duplication sometimes forces a choice of ticker symbol whose connection with the referent is obscure or even arbitrary.
Many of our abbreviations refer to organization names that do not meet CFI or to SoP phrases. (Though Pawley would suggest that the existence of an abbreviation was evidence of idiomaticity.) DCDuring TALK 22:53, 13 October 2009 (UTC)[reply]


Restoring Bosnian, Croatian, Serbian sections

In a few days I will be running code to restore the standard language sections deleted with no consensus. There will be no changes made to other sections, and no changes made to the restored sections other than to tag some of them for further attention.

There will be several tests (8-10 entries) run in the meantime for inspection (;-) Robert Ullmann 01:45, 13 October 2009 (UTC)[reply]

Robert, many of those entries were created originally by Ivan Štambuk himself or other contributors who support the change to Serbo-Croatian, and then changed from Bosnian, Croatian, Serbian to Serbo-Croatian. If you are not contributing in these language(s), I don't think it's right. Anatoli 02:19, 13 October 2009 (UTC)[reply]
Once an entry (or any content) is added to the wiki, it is not the property of any contributor; deleting it without consensus is not permissible, and deleted content must be restored. (I'll quote Štambuk himself: "... ti ne posjeduješ ovaj projekt. On nije tvoje vlasništvo, ni jednog jedinog bajta." That is, "you do not possess/own this project, not one single byte", from the hr.wikt Kafić ;-) The alternative is that the various contributors, including myself, will have to correct 6000+ entries manually, which we are not looking forward to. Robert Ullmann 02:40, 13 October 2009 (UTC)[reply]
Uhm, I don't remember writing anything like that in the Kafić. Please stop spreading your dirty lies.
As for the "deleting" - absolutely nothing of content was deleted. I've explained to you that many, many times. Can your brain understand that? Pretty much all of the merged entries were heavily expanded and rectified in the process.
"deleting it without consensus" - there was a consensus for 4 months while the merger was ongoing, which you happily ignored, until you imagined it was some kind of "linguistic genocide" (and sadly, too many ignorants have succumbed to your FUD)
You can manually add individual B/C/S/M sections from the merged ones, and add the new ones from the new SC entries, I have absolutely no problem with that. What I have a problem with is this unilateral bot-running of yours, for languages you are completely ignorant of. It would generate faulty entries, and it would generate 15-20 000 more entries, all of which would have to be manually checked. It would generate absolutely no useful content at all, as all of the information of the "new" entries is already contained in the merged ones (since they're, you know, the same language). --Ivan Štambuk 09:34, 13 October 2009 (UTC)[reply]
...ti ne posjeduješ ovaj projekt. On nije tvoje vlasništvo, ni jednog jedinog bajta... was written by IP, but in style of Ivan Štambuk (which is unique on hr wiki), so anybody could make an error and assume that was Ivan, when in reality somebody could easily pose as him just by using his violent style. SpeedyGonsales 13:27, 14 October 2009 (UTC)[reply]
It is, in fact, his (fixed assignment) IP address at the University of Zagreb. Robert Ullmann 05:35, 15 October 2009 (UTC)[reply]
Uhm, FYI I have dynamic IP address thru several providers, and when I edit Wiktionary (and other Wikimedia projects) from any external connection (from college or public computer) I always do it tunneled through my home computer (which is always on) for safety reasons. I've never been to Slavonia or Požega (where the above IP traces itself to), and even though I do use CARNET as one of my providers, it is also used by at least 100 000 other people in the Academia.
What is really interesting to me is that you, Robert, can't understand an iota of any Slavic language (Serbo-Croatian included), and yet you somehow claim out of thin air that some randomly quoted IP, that posted on Croatian Wikipedia discussion board several weeks ago, is me. How on earth did you reach that conclusion? SpeedyGonsales (desysoped and banned bureaucrat from Croatian Wikipedia) claims that it apparently bears resemblance to my style of writing, which I find silly since anyone who has read any prosaic writings of mine on Croatian Wikipedia knows very well that my preferred style of writing (which I tend to rotate depending on the occasion, mood, and the addressee) is long, baroquesque sentences with many archaic and unusual words, metaphors and idioms (I abhor the so-called "standard language", and strive to "break" it wherever I can ^_^). You couldn't have possibly made out the stylemes of my writing by means of Google Translate, or similar automatic-translation tools, which brings me to the conclusion that it was suggested to you by somebody else in off-wiki communication. Gee, I never thought I'd someday be so important that people would translate even the suspected writings of mine to some external parties by e-mail!
And as for this whole "issue", which needlessly diverged into whether that IP address is me or not: It was addressed already in June when DCDuring here in Beer Parlour expressed concern on whether the "merged" entries would belong to the contributors who opposed the "unified treatment". Back then basically all (99%) of all the individual B/C/S entries were in fact created by contributors who were pro unified treatment, and I also added an aside that that issue is irrelevant at any case, because people do not "own" the entries they created, but that we might as well take it into consideration, as a token of good-intention towards the potential (back-then, non-existent) SC contributors who might oppose the common treatment, to only merge the entries created/edited by those supporting the proposal (which in practice meant pretty much all of the entries). And that has been in practice ever since.
However, reducing the legitimacy of a creation of these entries from page histories to whether we could add them or not is simply clouding the issue altogether: A great deal of these entries have some problems (see below), absolutely none of them adds anything useful to the already-present ==Serbo-Croatian== entries (which even Elephantus (talkcontribs) figured out soon, when he started re-creating ==Croatian== entries not from page history, as he did initially, but by copy-pasting ==Serbo-Croatian== entries to ==Croatian==, the only change being the switch of sh [the Serbo-Croatian ISO code] to hr [Croatian ISO code]), and the whole effort seems to be an evil-minded exercise in "how vicious can we be to Štambuk", with you unilaterally announcing it here in Beer-Parlour, not notifying the relevant contributors on the relevant talk-pages for feedback, ignoring the requests to ask for a vote sanction first which is required not only per bot policy, but also given the controversy of all this, and esp. doing all so in an accusatory tone (quoting an alleged IP of mine, that was supposed to retort on counter-arguments), in a company of a couple who, to put it mildly, don't really suggest good intentions of your actions. --Ivan Štambuk 12:53, 15 October 2009 (UTC)[reply]
So long as the Serbo-Croatian sections aren't deleted when the Bosnian, Croatian, and Serbian sections are restored I don't see a major problem. That same lack of consensus over how to treat Southwest Slavic works both ways. I do see a minor problem depending on how you intend to manage the restoration. If the tool will be looking through page histories for Bosnian, Croatian, and Serbian sections deleted when a Serbo-Croatian section was added in its place, and adding them exactly as were prior to deletion, that's one thing, but I can't see trusting a bot, even one that is being human monitored, to recreate Bosnian, Croatian, and Serbian sections from a Serbo-Croatian section. After all the whole raison d'etre of having them as separate sections is that there may be differences. — Carolina wren discussió 03:31, 13 October 2009 (UTC)[reply]
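The first approach described above (finding language sections that were present in an older revision and are absent from the current one) could be sketched roughly as follows. This is only an illustration, not the code of any actual bot: the function names `split_l2_sections` and `removed_sections` are hypothetical, and the sketch assumes that language sections are introduced by level-2 headings of the form `==Language==` and that both revisions are already available as plain wikitext strings.

```python
import re

# Level-2 heading such as "==Croatian=="; deeper headings ("===Noun===") do not match.
L2_HEADING = re.compile(r"^==[ \t]*([^=\n][^\n]*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_l2_sections(wikitext):
    """Return {language: section_text} for each level-2 section (hypothetical helper)."""
    sections = {}
    matches = list(L2_HEADING.finditer(wikitext))
    for idx, m in enumerate(matches):
        # A section runs from its heading to the next level-2 heading (or end of page).
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(wikitext)
        sections[m.group(1)] = wikitext[m.start():end].rstrip() + "\n"
    return sections


def removed_sections(old_text, new_text,
                     languages=("Bosnian", "Croatian", "Serbian")):
    """Sections for the given languages that exist in old_text but not in new_text."""
    old = split_l2_sections(old_text)
    new = split_l2_sections(new_text)
    return {lang: old[lang] for lang in languages
            if lang in old and lang not in new}
```

Note that this only recovers the old text exactly as it was; it does nothing about corrections that were made later in the merged section, which is the concern raised elsewhere in this thread.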
Tell me Carolina Wren, will you be helping cleaning up the dreck generated by Ullmann's bot, or you're simply seeing no problem with it because it wouldn't affect your work here at all? --Ivan Štambuk 09:36, 13 October 2009 (UTC)[reply]
If anybody doesn't know why I write above that Ivan Štambuk is violent, here is proof (just one of many): cleaning up the dreck generated by Ullmann's bot, clear violation of "assume good faith", because Ivan assumes that bot will make errors, even before agreement is reached will some action be done by bot. I know it is hard when admin is breaking the rules of project, but some measure of civility should be held. SpeedyGonsales 13:27, 14 October 2009 (UTC)[reply]
I don't simply "assume" that the bot will make errors, I know it will, and I already listed below (in reply to Visviva) some cases where it would introduce errors requiring manual cleanup. I gave an example of one of the test edits, where the bot restored the wrong etymology, which has been fixed in the merged entry. It would also restore some of the obsolete templates and sections, not to mention ambiguous and sometimes downright wrong definition lines (which were fixed and expanded in the merged entries), introducing not only ethnical imbalance (by not generating sections for all the four modern Serbo-Croatian standards), but also misleading the poor reader of Wiktionary that there actually are some differences in the meanings among standards, where there are none (cf. rog#Bosnian and rog#Croatian). The question of whether the bot will make errors or not is completely orthogonal to the question of whether agreement is reached on the bot being run or not - as long as all the bot does is unintelligently restoring old sections from page history, there would be errors, no doubt about that.
"I know it is hard when admin is breaking the rules of project, but some measure of civility should be held." - Uhm, what rules of the project? :) AGF principle applies prevalently to the very first edits of the newly-arrived contributors (before they gain experience of how things are handled here, not only because lots of common practice is not written in the help pages, but because making quality edits requires a bit more technical knowledge than on other Wikimedia projects), and to what appears to be disruptive edits by some of the regulars. What you are trying to do here is to "prove" that I am somehow "violent", and should be "sanctioned" :) My reply to Carolina Wren was along the lines of the old saying lako je tuđim kurcem gloginje mlatiti :) The point is, that those whose edits here are not being directly affected by the consequences of running this bot, esp. those not even knowing the language in question thus being unable to assess the cleanup effort it would induce, don't really have the moral ground to bless its running. Have you checked the validity of those test edits by UllmannBot (talkcontribs)? If you haven't, I don't really care what you think :) --Ivan Štambuk 13:18, 15 October 2009 (UTC)[reply]
Yes, don't delete Serbo-Croatian sections. Some questions:
  1. Will this bot deal with cyrillic entries? Mr. Štambuk created equivalent cyrillic entries of the latin ones. And the bot should try to keep sync of cyrillic & latin entries, otherwise confusion will reign.
  2. Will this bot deal with Template:sr-noun and sh-noun since they aren't the same? Instrumental & locative are swapped around.
  3. What will it do with accent markers?
  4. Does your bot know what to do with yat reflexes?
Hopefully, one day, Croats will stop using serbian vukopis. Serbs don't use Kajkavian & Chakavian and Croats should embrace these dialects more. I feel bad for Mr. Štambuk, since this is mostly his work, but at least this bot could save him time if it will keep cyrillic and latin entries in sync.--Pepsi Lite 10:00, 13 October 2009 (UTC)[reply]
What do you mean by "Serbian vukopis"? Does that alphabet have a Serbian ethnical marker attached to it? Latin script used for Serbo-Croatian today is mostly a result of the work of Ljudevit Gaj and his associates.
I generate Cyrillic-script Serbo-Croatian entries from Latin ones automatically by means of the program I wrote - first I write the Latin-script entry manually, copy it to clipboard, click on the Cyrillic-script redlink linked to in the inflection line, and do CTRL+V (i.e. "paste") - it comes out as Cyrillic. That process can be 99.99% automated. It handles accent marks, sc=Cyrl, various templates ({term}, {l}..) and context markers.. All the edits to either of the scripts are kept in sync manually that way.
Kajkavian and Čakavian are dying out for the last 5 centuries, and will have no native speakers at all by the end of this century. There is a diglossia with Štokavian wherever Kajkavian and Čakavian are spoken, and that's only in the rural areas, and considering the ever-increasing urbanization, and esp. the fact that they're non-literary, with negligible actual written literary output, they're interesting today only as historical devices of communication to linguists (and a badge of "Croatdom" for some Croatian nationalists). --Ivan Štambuk 10:40, 13 October 2009 (UTC)[reply]
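For readers curious what the core of such a Latin-to-Cyrillic conversion looks like, here is a minimal sketch. It is not the program mentioned above: the name `to_cyrillic` is hypothetical, and the sketch assumes plain Gaj's Latin input without accent marks, templates or context markers. It also treats every "lj", "nj" and "dž" as a digraph, which is wrong for the small number of words where the two letters belong to different morphemes (e.g. injekcija), so such cases would still need manual checking.

```python
# Digraphs must be tried before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def to_cyrillic(text):
    """Transliterate unaccented Serbo-Croatian Latin text to Cyrillic (sketch)."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # Capitalise when the first Latin letter of the digraph is capital.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            out.append(ch)  # punctuation, digits, markup: left unchanged
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)
```

For example, `to_cyrillic("konj")` gives `коњ` under these assumptions.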
"Kajkavian and Čakavian are dying out for the last 5 centuries" - an equally false statement as "water is running uphill". After the work of Ljudevit Gaj (19th century) the Croatian people chose the Štokavian dialect as the base of a unified language (in short). Ivan Štambuk thinks we are in the 24th century? Every language which is not used dies, that is obvious. But still today both the Kajkavian and Čakavian dialects are very much alive, maybe not so much in written form, but you just need to travel a bit, and you will hear as colorful a richness of Croatian dialects as you can imagine. People like Isaac Asimov in their books assumed that some day we will all speak some mixture of English and Russian, but those are SF books. Nobody really knows the future, we can make an educated guess about it, but it is still nothing more than an educated guess. Wiktionary (same as Wikipedia) should describe reality, not be used as a tool to enforce somebody's vision of how it should be. And in reality there are 3 different languages (Bosnian, Croatian, Serbian) and one forming (Montenegrin). And although they are similar to some extent, every language has the full right to be described until it ceases to exist, either by being joined to another language, or if there are no living speakers of it. And Latin is a nice example here: nobody will deny that it is a dead language, but as it is still used in some professions, it also has its Wikipedia. Štambuk uses the word nationalist every now and then to antagonize his opponents (which should be discouraged on Wikimedia projects), but we should only stick with reality, as it is the only thing that matters. SpeedyGonsales 13:59, 14 October 2009 (UTC)[reply]

Današnje područje čakavštine znatno je manje negoli je bilo prije migracija izazvanih osmanskim osvajanjima velikoga dijela hrvatskoga jezičnog prostora, a čakavsko područje se i danas smanjuje pod pritiskom književnog standarda. Tako je prije jednog stoljeća, uoči 1. svj. rata čakavski govorilo blizu 1/4 Hrvata ili oko 23%, a danas dvostruko manje ili tek 12%, pa je to najugroženije narječje hrvatskog jezika, najbliže izumiranju.

Translation from the article on the distribution of Čakavian from Croatian Wikipedia:

Today's spread of the Čakavian dialect is significantly lower than it used to be, prior to the migrations caused by the Ottoman conquest of a great deal of Croatian linguistic area, and Čakavian-speaking area is diminishing even today, under the influence of literary standard. Hence, a century ago, at the eve of WW1, Čakavian was spoken by approximately 1/4 or 23% of Croats, and today barely 12% of Croats speak it, which makes it the most endangered of all the Croatian dialects, the closest one to extinction.

This was the spread of Čakavian in the Middle Ages (an image from the book by Dalibor Brozović, renowned Croatian Slavist), and this is today. As one can see, it has been gradually decreasing in the territorial distribution, from hinterland towards the islands. Former exclusive Čakavian urban centres such as Split and Rijeka are today completely Štokavianised. The same thing can be said for the Kajkavian dialect area - the foremost example is the city of Zagreb, Croatia's capital, which used to be Kajkavian a few centuries ago, but is today 100% Štokavian (the only traces of Kajkavian left in the speech of its dwellers are the interrogative pronoun kaj, stress-based accentuation, and a few lexical items in the speech of common people, especially older folks). To my knowledge, in the course of centuries exactly nowhere has the reverse trend been observed - Štokavian speeches always ousted Čakavian and Kajkavian speeches. By exercising the pure commonsense mental logic, taking into consideration the immense increase of mass media, urbanization, education of all classes of people as well as language standardization in the last half of century, and especially the fact that today Čakavian and Kajkavian are virtually dead as literary languages, it's safe to assume that in the following century they'd be reduced to the brink of extinction. That process is irreversible and unstoppable. It's just a matter of time before inertia and ignorance finishes what Turks initiated 600 years ago :)
Nicely put, but that is only showing that Kajkavian and Čakavian dialects are slowly dying, but not for 5 centuries (slower even longer, faster much shorter). As for the irreversible and unstoppable process, I would only mention the example of the Norwegian languages, which are nice examples of what can be done if people want to change their language. Once again, Wiktionary should describe today's reality, not something somebody thinks will be in 1, 5 or 10 centuries. SpeedyGonsales 03:23, 16 October 2009 (UTC)[reply]
"And in reality there are 3 different languages (Bosnian, Croatian, Serbian) and one forming (Montenegrin)." - In reality there is one linguistic entity, in dialectology usually called Neoshtokavian, more commonly known as Serbo-Croatian, actualized in 4 national standards whose mutual differences in grammar could fit on some 2 pages of text :) Now, whether you call these national standards "languages" or not is a matter of somebody's perception. Most foreign linguists would rather treat them as one pluricentric language with regional variants, such as we already have for English, Spanish, German, Portuguese... In the Balkans the word language bears much more value of identity than it does in the rest of the world, so insisting that one can talk of "different languages", strictly on the basis of what are - let's be honest - trivial differences in spelling and pronunciation, becomes a confirmation of self-identity. You speak of "sticking with reality" - that's exactly what we're doing! :) Linguistic reality is that there are 4 dialectal systems (really 4 languages) - Čakavian, Kajkavian, Štokavian and Torlakian. The third one on the list, a particular innovative speech of it to be more precise (Neo-Štokavian), is used as the basis of the standard, codified language of all the 4 nations of that 4-part dialect cluster. If we were to truly describe linguistic reality, we'd have L2 ==Shtokavian==, ==Kajkavian==, ==Chakavian== and ==Torlakian== (as Millosh once suggested), but that would be pointless as we're not building a dialectological dictionary but a dictionary that people would use to learn the language of communication, i.e. the standard language.
"Štambuk use word nationalist every now and then to antagonize his opponents" - I use it because the bulk of the objections against the SC proposal come/came on nationalist grounds, as I explained above. Voting "no" on the Unified Serbo-Croatian vote page was perceived as a step to re-affirming one's "Croatdom" (hrvatstvo) by many of the voters. Some even requested that, can you imagine that, they'd be apologised to, due to "being insulted" :) Most of the Balkans still lives in the 19th century state-language-nation fairy tales, a state of mind commonly described by the word nationalist. Sorry, but to ignore the nationalist dimension in all this would be to simply play dumb. FFS, people deliberately lie when discussing the history of "Croatian language", as if the choice of Neoštokavian dialect as literary has absolutely nothing to do with Karadžić and Serbs, which is a ridiculous fabrication of history. Vienna Literary Agreement? Never happened! :)--Ivan Štambuk 15:12, 15 October 2009 (UTC)[reply]
Sadly, you explained nothing. To ignore the national dimension when we are talking about language is more than dumb, but to ignore the rules of Wikipedia & Wikimedia as you tend to, is rude. You don't have the right to call users of a Wikimedia project nationalists! Say what is true or false in the words of others, but stop giving others tags of nationalists or any other ists, it's rude, FFS. SpeedyGonsales 03:23, 16 October 2009 (UTC)[reply]

Ullmann, you need to pass a vote to do any kind of such large-scale modifications in the main namespace, esp. for languages you have absolutely no bloody clue about. You can start a proposal, and then I'll explain to you why your brain-damaged bot wouldn't work (as I've already partially done, but you seem to simply ignore any kind of discussion). --Ivan Štambuk 09:23, 13 October 2009 (UTC)[reply]

I agree that this would require a vote. (If he wants to go through and do it manually, that's another issue — there's no consensus to forbid it — but bots are only for implementing consensus, not for exploiting the lack of it.) —RuakhTALK 16:15, 15 October 2009 (UTC)[reply]
I would like a vote: is such language as "brain-damaged bot" allowed on this project? Or can I start calling every bot I dislike brain-damaged? Ruakh, maybe you have a bot. Is it damaged? Or? SpeedyGonsales 03:23, 16 October 2009 (UTC)[reply]
(don't worry, the bot doesn't care if it is called "brain-damaged". And since he applies that word to everyone who disagrees with him, it is mostly meaningless; "brain-damaged bigot", etc ;-) Robert Ullmann 06:12, 16 October 2009 (UTC)[reply]
"you need to pass a vote to do any kind of such large-scale modifications in the main namespace" Quite so. I entirely concur. When you set out to make "large-scale modifications" by deleting 3 languages from the Wiktionary, and forcibly merging them into "Serbo-Croatian", you discovered that it was contentious, and disputed, and yet continued. You then set up a vote, and continued making the modifications while the vote was running. When the vote failed, and your modifications were rejected, you continued as if the vote had not occurred.
Now you have the absolute temerity to demand that your entirely unauthorized "large-scale modifications" not be undone?
The 5,427 standard language content sections improperly deleted must be restored. The brain-damaged bot can be used to restore them to the status quo ante. If Mr Štambuk wants to insist that they must instead be restored manually, then I believe we must insist that Mr Štambuk personally do all of those manual restorations to his desired standard of quality before being permitted to engage in any other activity on the Wiktionary. Robert Ullmann 06:12, 16 October 2009 (UTC)[reply]

Missing Citations page should link to Wiktionary:Citations

To a casual user who just encountered a new citation, which might never be recorded in wiktionary if not recorded now, the Citations page seems to be the place to put it, something like sending a letter to the OED in the old days.

However, it seems that putting such a note on the citations page is likely to get it deleted without a trace, causing the citation to be lost forever. See for example the recently deleted Citations:paronomasiac which cited the NYT Magazine On Language column.

BTW I am not going to add it back. IMHO that is SemperBlotto's job.

So there should be more instructions indicating how to get a citation recorded permanently by someone using a mobile device which just can't accomplish a properly formatted entry.

— This unsigned comment was added by Archimerged (talkcontribs) at 23:50, 13 October 2009 (UTC).[reply]

Well, seeing as you didn't include the actual quotation, misspelled the referent's name (he's William Safire, not william saffire), and seem to have gotten the headline wrong (at least, the one you included does not match that on the NYT Web-site), your contribution was not so much a citation as a vague pointer to where we might find one — fodder less for the citations page than for the discussion page. And given that the word gets well over a hundred hits on Google Book Search, I can't say that such a pointer is really all that necessary, anyway; so, I don't blame SemperBlotto for having deleted it. Recent-changes patrolling always involves this sort of trade-off; you don't get anywhere if you try to fix every entry, so you always have to make a judgment call about whether what's there is better than a redlink. In this case, a reader clicking the bluelinked "citations" tab would have been sorely disappointed; the citation in the entry was far more useful. (However, in this case, I've restored the page, and tracked down the citation and fixed the page, so you can see what sort of thing we have in mind.) —RuakhTALK 01:50, 14 October 2009 (UTC)[reply]


Thanks for fixing the entry. It was written using a kindle. You can't make capital letters (the shift key is ignored by the experimental web browser). Any quotations have to be retyped by hand on a tiny keyboard from memory, and there is no spell check. While reading the paper on the kindle and away from a computer, I looked up "paronomasiac" wondering if it was a new word, and because the entry doesn't link to "paronomasia" I thought it might be a word coined in 2004 by Spider Robinson, so seeing it used in the NYT would be an important piece of data. I had a choice: try to find out elsewhere if the citation is important (but then I wouldn't have time to record it), or enter it on the citations page. There is no indication that the citations page isn't supposed to be something like the talk page anyway.
Anyway, I think the important thing is that the page displayed from a red "Citations" link ought to say a little about what citations pages are for and link to the official policy page (which is quite difficult to find). Also it ought to link to a place where pointers to citations can be recorded (or say they should be recorded on the talk page), for people using mobile devices who encounter possibly important citations. After all, a dictionary is not just for people who want to look up words. It is for recording the slow alteration of the language. — Archimerged 01:52, 15 October 2009 (UTC)[reply]
This is a good point; we should have something similar to the current warning message for templates. Does anyone know where that text that appears on a template edit page comes from? I can't find it anywhere in MediaWiki or template-space. -- Visviva 02:58, 16 October 2009 (UTC)[reply]

Wiktionary:Discussion rooms lists Bug reports under "Other places to congregate" which was eliminated four months ago and now redirects to the Grease pit. Anyone have any ideas on the best way to get rid of it without making the entire table look terrible? --Yair rand 13:50, 15 October 2009 (UTC)[reply]

Better?​—msh210 16:42, 15 October 2009 (UTC)[reply]
Great. --Yair rand 18:19, 15 October 2009 (UTC)[reply]

Names of specific entries (again)

I thought I'd copy the URL to save time (here). The several problems I can find (for me, anyway) are:

  1. What is a specific entry? Proper nouns aren't always specific entries, like Stephen as you can have several Stephens. So it refers to things that there are only one of, right? Still, ambiguous.
  2. Used attributively. Okay I'll spare you use#Verb but attributively and attributive aren't very detailed and give almost no help. If it's just the grammatical sense of attributively, then almost any place name will meet CFI because place names (in English) don't have adjectival forms. So I'm from Leeds. A Leeds taxi, a Leeds restaurant. All of these attributive, right? I'd think as long as the place name is not extremely, the attributive form won't be either.
  3. Widely understood meaning. A debate broke out over Daffy Duck that even though it can be cited attributively, it doesn't have a widely understood meaning. What does that even mean? I mean Daffy Duck means a cartoon duck, right? In the same way that Leeds means a city in West Yorkshire. Is there another meaning of Leeds? Or Confucius means a 5th Century BC philosopher. AFAICT "widely understood meaning" doesn't put any limits on what the meaning is, it just has to have one.

Admittedly, and I can barely stress this enough I don't really have a better idea but maybe someone else does? Mglovesfun (talk) 10:10, 19 October 2009 (UTC)[reply]

I have boldly edited Wiktionary:Editable CFI#Names of specific entities to reflect my understanding of this annoyingly oracular passage. I hope that those who disagree will boldly revert or revise my edit, and perhaps we can eventually work out something that will have a working consensus behind it. Here's the problem as I see it: the lack of clarity of the current wording has given everyone cover to read whatever they want into it (or to ignore it entirely). Any efforts to clarify it thus mean that someone's ox will get gored, and so are voted down for the most ridiculously disingenuous reasons. It's worth reiterating that there was never any consensus for CFI to be set in stone; the current state of {{policy}} is purely the product of some erratic editing by Richardb and Connel in 2006. -- Visviva 10:57, 19 October 2009 (UTC)[reply]
That's already about 50 times better. If we could have a few more people edit it and then have a vote on it, I would be pretty happy. But at the very least, it has to be worded in a way that is a lot less ambiguous, I tend to think that everything is at least a little bit ambiguous. But hey, c'est la vie. Mglovesfun (talk) 11:16, 19 October 2009 (UTC)[reply]
For (1) see Appendix:English proper nouns#Proper nouns as common nouns. Visviva got my understanding of (2+3) right (a definition independent of the referent). --Bequw¢τ 14:17, 19 October 2009 (UTC)[reply]

Until we can agree on sets of proper nouns that should be included in their own right, I don't see a successful vote to make the change. The current proposal makes one criterion clearer, but rewords it in such a way as to exclude most names of countries, etc., which is a set of proper nouns most community editors have supported for inclusion in the past. I have made one small edit to the last sentence, since the way that it was worded would require that we include Thomas Jefferson (or any other full name) if there were more than one individual with that name, which is probably not what was intended. --EncycloPetey 01:52, 20 October 2009 (UTC)[reply]

Daily dump RSS feed

For anyone interested, I have created an RSS feed for our daily dumps. You can subscribe at http://www.devtionary.org/cgi-bin/feed.pl - please report any bugs or feature requests. I've tried to make it as similar as possible to the RSS feeds for the official dumps from Wikimedia. — hippietrail 00:09, 20 October 2009 (UTC)[reply]

Can y'all reconcile this please?

Hey. I know there's been some tension between 史凡 (talkcontribs) and various members of the community (I won't mention names). 史凡 has been blocked for a few days, and while communicating with him by email, he asked whether or not any of the others had apologized. I was obliged to reply that no one had. I then suggested to him that when he comes back, he go to the people who hurt him (some of those comments really stung him, you know) and apologize for anything that he said that may have been hurtful, and try to make up with all of you. Now I'll suggest that to the rest of you too. Can you please forgive each other and get back to building the dictionary??? I want this quarrel to end ASAP. Please. L☺g☺maniac chat? 14:11, 12 October 2009 (UTC)[reply]

You can see below, that he did not take your advice. --EncycloPetey 02:09, 13 October 2009 (UTC)[reply]
I ... see. Since none of the parties I've talked to seem to want to listen to common sense, I'm going to pull out of this issue. Unfortunately. :( L☺g☺maniac chat? 12:01, 13 October 2009 (UTC)[reply]

ididnt ask4anapology[justwords i/worstcase],but4"is anythin'said onwt datshowsthey realiz they cant cal adisabldpersn namz pl?"--but inflamm.burocratpetey,weldun,oil onfire always gudon wt.--史凡>voice-MSN/skypeme!RSI>typin=hard! 02:16, 13 October 2009 (UTC)[reply]

I didn't ask for an apology (just words in the worst case) but for "is anything said on Wiktionary that shows they realize they can't call a disabled person names please?" -- but inflammatory bureaucrat petey, well done, oil on fire always good on wiktionary
He has now resorted to whining and insults with his first edits after the block ended. I am now getting insulting e-mails from him. I move for permanently blocking this user. --EncycloPetey 04:27, 13 October 2009 (UTC)[reply]
Not having received any such emails myself, I can't judge for sure. But it does sound like a ban is in order. -- Visviva 11:18, 13 October 2009 (UTC)[reply]
Unfortunately, if he doesn't listen to common sense from any of us, that may be necessary. L☺g☺maniac chat? 12:01, 13 October 2009 (UTC)[reply]
He has since posted two more ugly e-mails to me, including a mild threat. I am now firmly in favor of a ban. Between altering other people's comments, the constant undirected and misdirected complaints, and now rude e-mails, his presence is more disruptive of Wiktionary than beneficial. --EncycloPetey 03:08, 14 October 2009 (UTC)[reply]
*siiiiiiiiigh* :( L☺g☺maniac chat? 20:49, 15 October 2009 (UTC) [reply]
Well, for the record, I am sorry that I responded in a way that escalated the situation. I don't really think that reconciliation is necessary, though of course it's great if it happens. Most of us have problems of some sort with multiple other editors, but as long as we stick to the work at hand, everything mostly works out OK. If we want to get back to building the dictionary, the only thing to do is ... get back to building the dictionary.  :-) -- Visviva 11:18, 13 October 2009 (UTC)[reply]

The user has continued to send me hate mail, so I've blocked him permanently. If he continues to harass me, I'll have to contact his ISP. --EncycloPetey 20:38, 15 October 2009 (UTC)[reply]

Given that the blocked party in question here has made NO positive efforts and is inflammatory by nature, and that these sort of disruptive editing and comments are why he was blocked on Wikipedia ... There's no reason we should have even tolerated his behaviour here as long as we have. --Neskaya contribstalk? 03:13, 19 October 2009 (UTC)[reply]

Even though I became aware of this incident and the ensuing blocking of 史凡 too late, I wanted to share my positive impressions of this user - he has been helpful in creating well-formatted entries and has written comprehensibly in the two interactions I had with him - on Talk:slordig and on User_talk:Bogorm/archive3#slordig. Even though he has mentioned that he changed his voice recognition software in May, I assure you that there are much more nuisible and illegible writing styles than his. In Bulgarian, in chats or on SMS not only does one abandon the Cyrillic script, but one introduces usages of 4, 6 and even q to repræsent the sound normally rendered by я (ya) and there is not much to explain that those q and ya have nothing in common. I personally find it much easier to read literary text in the old orthography before 1945 with the nasal letters than those appalling SMS- and chat distortions of my language which I loathe (even though I am not an old retired person in pension reminiscing the time before the era of SMSes, but in my 20s). With regard to 史凡's writing style, said example seems much more incomprehensible, believe me.

Whilst I condemn this sort of ugly e-mails EncycloPetey mentioned, the dedication of a disabled person to this project is to be appraised as laudatory. Furthermore, a disabled person cannot always show sociability like a completely healthy person, but may become morose due to his condition, which ought to be respected as well. Since he has shown dedication and created numerous entries and been helpful before the beginning of October 2009 and since he has expiated his incivility with being banned for one and a half months as of now, I plead for setting an end date for his block, such as 3 months, 6 months (considering that it was set for one week initially)... In mine opinion indefinite is too austere. Together with Logomaniac I consider reconciliation to be essential and necessary so that enmity amid the Wiktionary community is fended off. The uſer hight Bogorm converſation 10:58, 9 December 2009 (UTC)[reply]

I, and at least two others, have received a large variety of personally insulting and threatening emails from this user over a period of months. Such behaviour is utterly unacceptable, regardless of anyone's contributive or communicative abilities. He has demonstrated a total lack of ability to accept criticism, control his anger, or communicate civilly; such a user cannot be welcome here. Conrad.Irwin 15:46, 9 December 2009 (UTC)[reply]

According to Wikipedia, Old Provençal is the "former" name for Old Occitan. Given how little this template ({{pro}}) is used, it would be pretty simple to rename it and all the pages associated. What are our criteria in such cases? Mglovesfun (talk) 22:33, 12 October 2009 (UTC)[reply]

It's usually on a case-by-case basis. For this case, we prefer the name Occitan over Provençal, so I would say the same applies for the "Old" version of the language as well. --EncycloPetey 02:07, 13 October 2009 (UTC)[reply]
Well Occitan and Provençal have different ISO 639 codes (oc and prv). However, both Google and Google Books say that Old Provençal is the more common name, although not by a wide margin. I'm kinda on the fence now. Ethnologue might help. Mglovesfun (talk) 12:14, 13 October 2009 (UTC)[reply]
Okay yeah, sil.org says both, so Wikipedia (according to sil) is wrong. Mglovesfun (talk) 12:20, 13 October 2009 (UTC)[reply]

calin adisabldpersn namz has2stop here!

&if its dalasthing i'complish'ere! ps was kwami'api onhis return?butkeep sowin'lazy adminshp.

Excuse me, but you do not get special privileges. You have been calling other people ugly names, but now complain that others are doing the same to you. You do not get to call names, and then complain about the same behavior from other people. You do not get to pass the blame for your improper behavior or its consequences on to the administrators. You need to stop your improper behavior now, and own up to your own mistakes. It is not just calling disabled people names that is a problem, but calling any people names that is a problem. --EncycloPetey 02:05, 13 October 2009 (UTC)[reply]
Yeah people with "disabilities" should be treated as equals, which means not worse or better than everyone else. As far as I can see, nobody's talking about your disability apart from you. The rest of us are just worried about your actions. FWIW (personal opinion only) I imagine everyone has some sort of "disability" if you define the term widely enough. Mglovesfun (talk) 12:11, 13 October 2009 (UTC)[reply]
User:史凡, you really need to stop griping and try to refocus on doing some useful work. Nobody here is a paid employee, we all do our work pro bono. It is a team effort and each member works at his own pace and does what he feels competent to do. There are no paid employees who are obligated to kowtow to anybody’s needs or whims. If you don’t get along with someone, you don’t have to talk to him. If some of us cannot understand your shorthand, that’s something you just have to learn to live with. —Stephen 19:44, 13 October 2009 (UTC)[reply]
I've emailed 史凡 (talkcontribs) and asked what he considers unresolved problems. Unless he responds with a very compelling answer, I see no reason for anybody to keep complaining for special treatment. L☺g☺maniac chat? 20:48, 13 October 2009 (UTC)[reply]
(later) 史凡 replied: "undrstndbl-dadeythink deycando wodeva deywnt w/som1who signls arlvntdisablty,instedof doinwot=nrml i/society,nl acomodat dadsabllty[orwotsdawrd~providin ramps4wh ch usrs etc],treat w/respct&giv equaloportunity." That is to say: "Understandable - that they think they can do whatever they want with someone who signals a relevant disability, instead of doing what is normal in society, accommodate the disability (or what's the word, providing ramps for wheelchair users et cetera), treat with respect and give equal opportunity."
Apparently he thinks that we're not accommodating him properly. True, a record button has not been set up, but I'm not exactly sure what more he wants. L☺g☺maniac chat? 22:51, 13 October 2009 (UTC)[reply]
I think he mistakes us the contributors for people who have influence on those who fund technical improvements and moreover he seems to mistake en.wikt for a "normal" organization. I personally have not the ability, the connections, the financial resources, or the interest in solving this problem. Perhaps it really is a WMF problem. It would have been nice if we could have resolved matters, but the expectations expressed seem to make it clear that we cannot. DCDuring TALK 23:08, 13 October 2009 (UTC)[reply]

(unindent) Good analysis. Moreover, it might just be a general problem with the Internet or computers in general. I see no particular reason to single out WikiMedia, other than the fact that that's what this site is, a WikiMedia project. Mglovesfun (talk) 10:21, 14 October 2009 (UTC)[reply]


Restoring Bosnian, Croatian, Serbian sections — AEL

Entries recently edited by UllmannBot (talkcontribs), with sections extracted from page history:

Apparently the whole list of entries waiting to be "processed" this way is held here: User:Robert Ullmann/SC recovery/report.

As one can see, the way the Ullmann bot does its work is by looking up the page history and restoring stubbish entries that were later expanded as ==Serbo-Croatian==. It doesn't add any kind of new content at all - it just mindlessly restores the previously merged entries, under the sections that were mercilessly wiped during the SC merger.

And during that process, it doesn't differentiate between standards at all, as it pretends to. For example, of the abovelisted entries, it only added ==Bosnian== and ==Croatian== sections at [[cijena]], even though it is also a perfectly valid Ijekavian Serbian word (with exactly the same meaning, inflection, etymology...). This introduces ethnic imbalance in the treatment by falsely insinuating that some word is e.g. less "Serbian" (this especially pertains to Ijekavian Serbian entries which were scarcely generated, but are handled transparently in the "common" SC treatment.)

At the entry [[cigla]] you see all the 3 entries restored, but without the information that was added in the merged ==Serbo-Croatian== entry (the declension table). It also restored the wrong etymology which I fixed in the merged entry (the Latin word does not originate from Ancient Greek word - they're cognates of the same Proto-Indo-European root *(s)teg- "to cover").

It would be much easier, less wrong and more comprehensive to simply generate separate B/C/S/M entries from the SC entries themselves. But that is an entirely different issue by itself (take a look at [[govor]] to see how ridiculous it would look). The thing I hate the most about this bot is that, by generating entries in different states of treatment, it somehow hints to the unknowledgeable observer that there is some kind of functional difference between these "languages", which there isn't. --Ivan Štambuk 10:22, 13 October 2009 (UTC)[reply]

  • Questions from the peanut gallery:
    • Were these old entries merely incomplete, or did they have substantial quality issues? I would certainly hope the bot wouldn't be re-creating any RFC issues that had already been dealt with under Serbo-Croatian. But incompleteness is the normal state of a wiki page.
    • If we have decided, as apparently we have, to have entries in all four of these language(s) alongside each other, then wouldn't we expect such entries to spread and grow organically, anyway? Would the bot-restored entries differ from the stub sections that might normally be created by a passing anon?
    • If these entries are re-created, would it subsequently be possible for a bot wrangler who is knowledgeable in the languages to automatically spread the spreadable aspects of the Serbo-Croatian entry to the other language sections? If this were done, would it resolve the concerns about restoration?
    • Is there any chance that we could all discuss this in an atmosphere of mutual respect? I know the whole issue is quite a mess, and has quite a history at this point; but we are all working towards the same goal here. I think. :-) -- Visviva 11:48, 13 October 2009 (UTC)[reply]
  • I'd say that at least 90% of them were in the state of "allowable incomplete", and the 5-10% of them had various issues such as incomplete, imprecise or ambiguous definitions (which is characteristic of the basic-vocabulary lexemes in a stub form), and formatting mistakes (e.g. the obsolete ====Cyrillic spelling==== and ====Latin spelling==== sections which are now handled inside the inflection line of {{sh-noun}}, previously by {{sr-noun}}, ====Related terms==== not containing really etymologically-related terms, some not so precise synonyms, meanings not split by etymology..). Nothing really "wrong" with all that, and a great many other languages here already have similar issues (esp. the rarely-maintained ones), but I'd really hate to see all that already fixed sh*** resurrected again.
  • We haven't really decided anything. There is a consensus among the SC contributors (apart from a group of nationalists that contributed here only during the vote, and has fled ever since it ended), as well as pretty much all of the Slavic-language contributors, that the common treatment in one language section is something beneficial to the end-users as well as contributors, and there are others who for some unknown reason oppose the merger, mostly due to the unexplainable concern that somebody's feelings don't "get hurt". With the second group, Robert Ullmann and his minions (Lmaltier, DCDuring, Elephantus..), you cannot really argue at all, because they either ignore any kind of discussion, continue to spread FUD and dirty lies, or imagine various kinds of "concern scenarios" over imaginary users. Now, since the first group (the people who actually contribute Serbo-Croatian entries) is responsible for some 99.9% of all the entries, these non-merged entries would not, as you say, "spread and grow organically". This is not Wikipedia: the contributions by "passing anons" are infinitesimal in number, and usually of very little value. 99.9% of all quality content on Wiktionary is added by long-time, well-known and dedicated users. If you ask me, I'd lock this project completely for IPs, as they only generate vandalism and low-quality entries that need to be fixed and expanded. The net result of everyone contributing sections they prefer would be: Serbo-Croatian entries growing to tens of thousands, and individual B/C/S/M entries serving primarily as a hole to channel various passing-by nationalist contributors, who'd give up once they realized how time-consuming adding quality content here is, despite the apparent "easiness" of contributing (as opposed to Wikipedia).
  • Not completely, but to a large extent it is possible. But why? It's pointless to have both the merged and separated entries on the same page. It still doesn't solve the maintenance-hell problem. So for each SC entry you'd have 1+2+2+2=7 (B, S and M can be written in 2 scripts) additional entries, 9 if you include the alternative forms like those with jat reflexes. It's a pointless and fruitless endeavor. Our Serbo-Croatian learners would be horrified upon encountering such a mess. But I would rather not discuss would-be scenarios, but rather what we have here and now with this Ullmann's bot.
  • Ever since his blogpost at DailyKos in which Ullmann described me as some kind of "genocidal Serb nationalist", he has done absolutely nothing to deal with this issue in a civilized, respectful way. Moreover, abs. everything he did was to obstruct the Serbo-Croatian proposal (of which he was notified, as did the entire community, 4 months earlier - which begs for question: what has changed in his mind in the meantime?): from the "intractable technical difficulties" with some silly HTML languages codes that upon inspection came to be completely irrelevant for some 99.99% of websurfers, to this brain-damaged bot which he writes on his home computer and then "announces" that he'll run in on BP as something completely legitimate and "normal", barely worth of discussing. If he were a decent person, he'd announce it on the talkpage of WT:ASH, and ask for feedback, but that wouldn't be evil enough of him. The only thing Robert Ullmann is interested in is sucking on my nerves, under the disguise of some "concern" for the "deleted content" of "standard languages" (whom he doesn't know a word of). It's pointless to discuss with person like him, since he'll never admit that he's wrong. --Ivan Štambuk 15:22, 13 October 2009 (UTC)[reply]
I absolutely agree with you, Ivan. RU has done extreme damage to the Serbian/Croatian project and seems intent on killing it off entirely. It’s the same kind of misguided and shortsighted behavior that has stifled our Swahili entries for years and paralyzed the Kinyarwanda Wiktionary. If he would put his considerable programming expertise to good use in support of our linguists instead of thwarting us at every turn, he would be a great resource. Instead he smothers everything he touches. —Stephen 19:30, 13 October 2009 (UTC)[reply]
(wow!) Stephen, what has happened to you? You are much better than this! Personal abuse from you? I would never have expected it. and your complaints are nonsense. I haven't "stifled our Swahili entries"; I in fact organized a project of 5-6 native speakers of Swahili and English and several tribal languages, and we have added thousands of entries to sw.wikt and translations here. "paralysed Kinyarwanda" is even more inexplicable, I haven't done anything there in a long time. Robert Ullmann 05:35, 15 October 2009 (UTC)[reply]
I strongly disagree with Pepsi Lite. The bot should not try to sync anything, only to restore sections that have been renamed, without deleting anything nor adding anything (otherwise, it would introduce errors: a bot cannot guess anything about words). Restored sections would be for users looking for e.g. Croatian sections (users looking for Serbo-Croatian sections would find them, and not bother about other sections). After that, it will be up to editors willing to do it to add new entries (if they want to), or to improve restored entries. Only humans can do this job (but nobody has to do it). Lmaltier 16:19, 13 October 2009 (UTC)[reply]

Given the controversy of this, it would seem irresponsible to proceed without voting on the issue (per WT:BOT, "a new bot must be approved by the community"). However, as it has been previously shown, votes on this issue are nothing more than power struggles, it seems unlikely that this bot will ever be approved without it being beneficial enough to everyone. Conrad.Irwin 20:42, 13 October 2009 (UTC)[reply]

This absence of consensus makes the bot necessary. Is it normal that lots of sections are removed again and again, without doing anything? This bot would do nothing except restoring valid sections. On the French wiktionary, somebody used to systematically remove translations to Ido (and many translations to Latin, when the Latin word was already present in the etymology section, giving the reason redundancy). In this kind of case, when changes are too numerous, a bot is a good solution. Refusing the bot would be accepting this removal of valid sections. Lmaltier 06:09, 14 October 2009 (UTC)[reply]
This absence of consensus makes the bot necessary. - Uhm, what kind of ill-logic is this? There is already a consensus to merge the entries among the relevant, knowledgeable contributors. Pretty much all the opposes are either complete ignorants, or do so on some bizarre ideological basis.
This bot is worthless. It restores worthless, buggy, stubbish content that is already contained in the thoroughly fixed and expanded entries. It is solely and ultimately product of Robert Ullmann's evil-mindedness and hatred.
Your comparison of Serbo-Croatian standards to Latin and Ido is preposterous. For once, you should familiarize yourself with Serbo-Croatian to see why. The overlap would be in some 99% of words, and the entire grammar, which is at least an order of magnitude greater redundancy than you'll get between Ido and Latin. Words are not just spelled the same - they have the same inflection, pronunciation, etymology etc.
Refusing the bot would be accepting this removal of valid sections. - You keep being obsessed with the notion of "valid sections". Above you also make a nonsensical claim: Restored sections would be for users looking for e.g. Croatian sections (users looking for Serbo-Croatian sections would find them, and not bother about other sections). Did you miss the part where others and I thoroughly elaborated the following points:
  1. It's impossible to learn "Croatian" and not also "Serbian", "Bosnian", and "Montenegrin" at the same time.
  2. Modern Serbo-Croatian standards are taught together, as one language, (as "Serbo-Croatian", "BCS", or other similar name) in 99% of FL world's unis. The only places where they are not is because of funding by the Croatian diaspora (i.e. due to politics messing in the world of academia).
Hence, there is really an infinitesimal number of users and contributors that would be looking "only for Croatian entries". Just because these merged sections are technically "valid" as kept separately, is no reason at all to have them, or to encourage their creation - they're worthless content-wise, they'd introduce a number of errors, require hundreds of hours of cleanup and expanding, in order to make all the pages containing them look as ridiculous as the entries on [[govor]]. --Ivan Štambuk 07:48, 14 October 2009 (UTC)[reply]
Even an infinitesimal number of users would have to be considered (after all, even languages with only a few speakers are accepted here), but publishers proposing Croatian-English, Croatian-French, Croatian-German... dictionaries prove you are wrong: there is no doubt that there is a market, that there are many dictionary users looking for Croatian words. But there is no need to discuss this again, only to apply the result of the vote you organized, which shows that this issue is very sensitive, and that there is no consensus for the removal of these sections. Lmaltier 20:52, 14 October 2009 (UTC)[reply]
Indeed. To confirm, the bot code only restores sections for which there was no approval to remove, and therefore were improperly removed, and therefore must be restored. No changes are made to the restored sections, except to add attention tags in some cases, and no changes are made to "Serbo-Croatian" sections. (Comments by Mr Štambuk supra about creating entries do not apply; I did have/do have code that can do things like that; that is not the issue here.) Editors are then free, as always, to work on the standard languages, without fear that their valid contributions will be deleted. If in fact, Mr Štambuk had simply continued his excellent work on his native language, Croatian, without deleting the standard languages, we would have much better entries today. Robert Ullmann 05:35, 15 October 2009 (UTC)[reply]
bot code only restores sections for which there was no approval to remove - Uhm, you were personally notified, as was the entire community in Beer Parlour, during the 4 months the merger was ongoing, before you turned on the sick "linguistic genocide" mode, spreading lies, defamation and FUD all over the place. The merger was approved by the consensus of all the active SC contributors, as well as by the silent approval of the rest of the community. Now, the fact that you personally imagine that there was no consensus simply proves that the only intention of yours here is to be evil by playing dumb.
No changes are made to the restored sections, except to add attention tags in some cases - which will result in thousands of created entries requiring manual cleanup, which you, I suppose, have no intention giving a hand with, since you have no clue about the language...
Editors are then free, as always, to work on the standard languages, without fear that their valid contributions will be deleted. Editors are now free to work on their standard and non-standard languages and "languages" without fear that their valid contributions would be deleted. I haven't touched the new creations done by your nationalist friends, except fixing errors in them. OTOH, they've quite benefited from my edits (many such new ==Croatian== entries are blatant copy/pastes of the neighboring ==Serbo-Croatian== entry, so much for the "different languages"). The only entries that are merged are those created by the contributors supporting the unification effort. And that would be most of them, and I suspect that is the thing which bothers you most.
If in fact, Mr Štambuk had simply continued his excellent work on his native language, Croatian, without deleting the standard languages, we would have much better entries today. - Excuse me, my native language, mother tongue, is "Croatian", in the same sense as yours is "American English". In a more general sense they're called Serbo-Croatian and English, respectively. We already have much, much better entries today because of the unification effort, since Dijan and I don't need to waste time and space anymore doing exactly the same thing in seven different places; we do it in only one or two, thus doing it much faster. --Ivan Štambuk 15:42, 15 October 2009 (UTC)[reply]
Even an infinitesimal number of users would have to be considered - An infinitesimal number of users is of infinitesimal concern to us. In other words, it would be a waste of time and space. This has nothing to do with the number of speakers (we even add extinct languages spoken by no one), but with various other practical concerns which you somehow manage to ignore every single time.
but publishers proposing Croatian-English, Croatian-French, Croatian-German... dictionaries prove you are wrong: there is no doubt that there is a market - there are separate dictionaries and grammars of Austrian and Swiss German, Portuguese in Portugal and Brazil, not to mention Spanish, English, Arabic... Should we separate these too? :) Lmaltier, the point is not whether we could do it, but does it make sense. I see no sense in making 200 000 entries looking as ridiculous as those on [[govor]].
But we're not talking about whether these entries are allowed or not (and they are allowed): we're dealing here with Ullmann's bot which he threatens to unleash, generating thousands of problematic entries, because his brothers-in-arms think they're "fighting" for the Croatian language that way. They'll be gone for good after the bot finishes execution, and so will Ullmann, attending to his usual business. I am the one who'd be left cleaning up that **** :) --Ivan Štambuk 15:27, 15 October 2009 (UTC)[reply]

This is getting boring. The only solution I see is a duel between Štambuk and Ullmann. I can pick the weapon (I'm thinking about muskets). --Vahagn Petrosyan 17:04, 15 October 2009 (UTC)[reply]

I have a better suggestion. Can they have a virtual beer session? If they were closer geographically, the real one would be better. --Anatoli 00:38, 22 October 2009 (UTC)[reply]

@ Ivan Štambuk: Calm down, I'll help you to clean up this ****. Afterwards, we can make a bot to generate an enormous amount of separate American and British entries... Maria Sieglinda von Nudeldorf 18:55, 15 October 2009 (UTC)[reply]

The bot won't generate any problematic entry, nor generate any entry at all, only restore entries (entries removed before, during or after the vote about this issue). No cleaning at all should be necessary but, of course, people willing to improve them (and only people willing to improve them) would be welcome. Lmaltier 22:23, 15 October 2009 (UTC)[reply]
The bot already generated problematic entries during the test edits. See my posts above, and pay attention to what is being written Lmaltier, I'm beginning to suspect that you possess selective visual sensorium. --Ivan Štambuk 22:52, 15 October 2009 (UTC)[reply]
The bot should not be used yet if it's not completely tested, i.e. if it does not restore correctly the last version of removed entries to be restored. I have not checked that. But, once it does its job well, it is necessary to use it. Lmaltier 06:42, 16 October 2009 (UTC)[reply]
You're obscuring (either deliberately or inadvertently) the actual issue I object to with two seemingly identical notions of "correct":
  • correctly restoring the previously merged entries from the page history
  • the correctness of the restored entries.
No, the correctly restored entries are not necessarily correct. They'd be introducing, as can be attested in the test edits committed by the bot already: 1) factually wrong data rectified in the merged entry (e.g. the wrong etymology I exemplified above and which was fixed in the merged entry) 2) obsoleted sections and templates (e.g. ==Cyrillic/Roman spelling== headers which are now treated in the inflection line) 3) propagated inconsistency (e.g. rog#Bosnian has a different set of meanings than rog#Croatian, so the poor Wiktionary user would be misled into the assumption "gee, this must be one of those cases where there actually is some kind of difference among the standards", when in fact there is none and all 3 senses are valid in all 3 (4) Serbo-Croatian standards. In other words, if the point of separate treatment is to make up for the cases when there is some kind of difference in meanings/spellings, this restoration would in fact do more damage than good).
Therefore, all of those entries restored from the page history must be checked by a human before committing, which kind of renders the process of completely automated bot-restoration pointless. The reason why I principally oppose such restoration of the merged entries is because 1) they're worthless as everything of any worth in them is already contained in the merged entry, and properly rectified/expanded in the process 2) Thousands of these restored entries would need manual cleanup (actually, every single one of them would have to be checked, because the bot doesn't have a clue whether there was sth wrong or not in the unmerged entries). If Ullmann and co. are ah-so-concerned with the status of ==Croatian==, ==Serbian== and ==Bosnian== entries they should perhaps focus instead on extracting (OK, let's be honest, copy/pasting) the information from the ==Serbo-Croatian== sections, as that would be a much, much easier thing to do (90% less LoC), and much less error-prone, in some kind of semi-automated way, but supervised by humans. I personally don't care at all for B/C/S/M entries now, other than as a means to enhance ==Serbo-Croatian== entries. --Ivan Štambuk 20:16, 23 October 2009 (UTC)[reply]

Duplicating, or in this case even triplicating, articles for what is essentially the same language is a bad thing for Wiktionary. People who improve one article would have to merge their improvements into the others, and most outsiders or casual editors will never bother to do it. I see a VERY GOOD reason for the merge that has been done, but I see no advantage whatsoever in creating near-identical copies, even assuming that such triplication were done intelligently by a human. Doing it by means of a bot, which will ignore most, if not all, of the improvements that have already been made, makes even less sense. Sure, we could create something like, let's say, == Canadian English ==. And we might one day, for example to give a Wiktionary reader the information he is most interested in. But for the moment copying entries would be a bad idea. It doesn't matter whether we are talking about copying English --> Canadian English and Australian English or Serbo-Croatian --> Serbian and Croatian. TestPilottalk to me! 23:37, 31 October 2009 (UTC)[reply]

There has been a great deal of - discussion - about this. Similar arguments can be made about various languages that are quite similar and may be considered dialects of the same language. The saying is: "A language is a dialect with an army." Wikiworld is actually part of the real world in which non-ivory-tower considerations may have applicability. For some time we (relatively) peacefully followed ISO 639, treating any purported language with a code granted by ISO as a language for our purposes. There has been no consensus to change that policy. Serbian, Bosnian, Croatian, and Serbo-Croatian have had ISO codes. The linguistic arguments favoring treating them as a single language seem as good as those that are made about languages such as English (UK, US, Australian, at least), French (France, Quebec), Spanish (Spain, Latin American countries), etc. DCDuring TALK 00:42, 1 November 2009 (UTC)[reply]

Ummm....I think we're missing the point here. The SC vs individual headers bit is damned controversial. I think we can all agree on that. The question we need to be discussing is NOT whether we should have one SC header or headers for Serbian, Croatian, etc. The issue is whether this bot should make these edits. It seems to me that having consensus on all bot activities is absolutely necessary. Inasmuch as Robert has done immeasurable work for this project with his skilled programming, he does not have the right to shape it as he sees fit, which is ultimately what allowing this bot-work to proceed would accomplish. Yes, the vote for SC unification failed, but separate headers have no more consensus than unified headers (less, actually). Additionally, it should be noted that when Ivan did the mergings, he DID have apparent consensus. Now that we no longer have consensus either way, editors are free to edit as they see fit, but bots should stay the hell out of it. -Atelaes λάλει ἐμοί 05:55, 1 November 2009 (UTC)[reply]

Indeed. We have a consensus, let's act on it. The bot can restore the sections with a request for AutoFormat, and human editors can do the rest. Mglovesfun (talk) 06:07, 1 November 2009 (UTC)[reply]
What? I must have missed something (I, admittedly, haven't been that active lately). My impression was that we had anything but a consensus. My impression was that we were utterly away from coming to any sort of rational decision on the matter. Looking at the discussion, it would appear that, of the names that I recognize, only DCDuring and Lmaltier are supportive of this (to those whose names I don't recognize, I apologize for snubbing you. This issue has brought in so many people on both sides who have no intention of doing anything for the project that I haven't bothered to get to know them). Specifically, I see Conrad, Ruakh, and Stephen all opposing this. I also see a whole lot of two editors who used to have a lot of my respect whining like little children, as they have been for some time. In any case, this is not a consensus. -Atelaes λάλει ἐμοί 08:51, 1 November 2009 (UTC)[reply]
I oppose bot edits under these highly controversial headers as well, as controversy and confusion would only increase thereby. The uſer hight Bogorm converſation 20:03, 1 November 2009 (UTC)[reply]
In case more "names" need to be listed, I disagree with the bot edits. Meta-wise, I see no consensus on the issue (and therefore a bot should not be run). --Bequw¢τ 21:37, 1 November 2009 (UTC)[reply]
No, we don't have a consensus for running the bot. All such edits must be done manually and checked in the process against the merged entries for corrections and missing stuff. --Ivan Štambuk 06:41, 1 November 2009 (UTC)[reply]
I agree we do not have consensus for running the bot. Nor do we have consensus for removing language headers. Yet Monsieur Štambuk continues to do so. Despite a lack of consensus, Štambuk is disregarding the Wiktionary community's decision in pursuing his crusade. But that has been xyr style. - Amgine/talk 04:04, 2 November 2009 (UTC)[reply]
I agree with you that Ivan should have stopped converting e.g. “Croatian” to “Serbo-Croatian” once the community started to debate the issue. That would have been courteous. However, there is a huge difference between performing an edit without consensus, and writing a bot to perform edits without consensus. Wiktionary could not function without the former, and cannot function with the latter. (And I don't understand why you link to his block log. Are you trying to suggest that your unjustified block was somehow "the Wiktionary community's decision"?) —RuakhTALK 04:18, 2 November 2009 (UTC)[reply]
Wow, look who's talking, a person who ran Ullmann's script from his own username account (so much for the respect of "community consensus"), who thinks that the same word written in 2 scripts belongs to "different languages", who claims that no peer-reviewed evidence was provided to him that supports the notion of one language... And I find it extremely funny that your rhetoric has changed from "removing languages" to "removing language headers" ! Are we finally accepting the fact that these are all triplicates of essentially one and the same content?!
I created those entries, I'm reformatting them. If you have trouble with that, learn Serbo-Croatian and add separate B/C/S/M sections, if you care so much. --Ivan Štambuk 04:34, 2 November 2009 (UTC)[reply]
Goddamn it, Ivan, shut up. Your complete lack of restraint is making everyone supporting your side of the argument look bad. If you have nothing productive to add to this conversation, please bite your tongue. Agree with Ruakh that, in the face of no consensus, everyone is allowed to edit as they see fit, including Robert and Ivan, but automated edits are out of the question. -Atelaes λάλει ἐμοί 11:35, 2 November 2009 (UTC)[reply]
Atelaes, M. Štambuk has used technically-assisted edits in his campaign. I believe Cirwin can confirm this. In every way but the semantics, a bot. - Amgine/talk 14:30, 2 November 2009 (UTC)[reply]
When you suggested this, I responded "I doubt it", and gave you reasons for this response. Please do not use my privately-expressed opinions to support your arguments: even if you were correct in remembering what I said, it would be polite to ask permission first. Conrad.Irwin 15:43, 2 November 2009 (UTC)[reply]
LOL - nonsense. I do all of my edits manually. I have a computer program (that I wrote) that helps me transliterate Cyrillic/Latin script on the fly, and that's it. I haven't run a Mediawiki bot in my life. Stop spreading lies. --Ivan Štambuk 14:34, 2 November 2009 (UTC)[reply]
Cirwin: what you stated was you believed the method Ivan used was similar to your own to open 50 or more windows, then saving and closing in quick succession (which would not have given the sustained edit rate M. Štambuk's log occasionally displays, but that's another topic.) You've also said you've used scripts to automate this process. And you said this in a public channel. My apologies if I made logical leaps which were inaccurate. - Amgine/talk 19:09, 2 November 2009 (UTC)[reply]
I merely pointed out that that was another method of making quick edits (thus the supposition that a bot was being used was premature); I did not imply that it was the only way, simply that it was a way I have used in the past. The scripts, which I also mentioned, are written in Python and are another way of allowing me to do manual edits in a proper text editor instead of tinkering around in a tiddly box in a web browser. They were unrelated, but mentioned in succession. This part of the discussion took place in a private message session between the two of us. I do not think it merits further discussion. Conrad.Irwin 23:32, 2 November 2009 (UTC)[reply]
I do all of my edits manually. I have a program that converts a Serbo-Croatian entry in Latin script to the equivalent entry in Cyrillic script, because Cyrillic characters with the combining diacritics that Slavists use to denote accents are impossible to type. Every single SC entry I've edited was saved manually, by ALT-SHIFT-S. On very rare occasions (when I'm internetless) I pre-edit the entries in a text file, and then copy/paste them in a serialized manner. When editing at a sustained rate, I add entries one by one, never in parallel in multiple windows waiting to be saved. I've never used scripts or bots so far as I have no need for such technology. --Ivan Štambuk 19:30, 2 November 2009 (UTC)[reply]
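For readers wondering what a Latin-to-Cyrillic converter of the kind described above might involve, here is a minimal Python sketch. It is not the program mentioned in the thread; the function name `to_cyrillic` and the helper are hypothetical, and the approach (decompose the text so accent marks become separate combining characters, map the base letters, pass the marks through) is only one plausible way to do it. It also naively treats every lj, nj and dž as a digraph, which is wrong for the rare words where the two letters are pronounced separately.

```python
import unicodedata

# Hypothetical illustration only; not the tool discussed in the thread.
_nfc = lambda s: unicodedata.normalize("NFC", s)

# Latin digraphs that correspond to a single Cyrillic letter.
DIGRAPHS = {_nfc("lj"): "љ", _nfc("nj"): "њ", _nfc("dž"): "џ"}

# One-to-one letter correspondences (27 letters on each side).
LETTERS = dict(zip(_nfc("abcčćdđefghijklmnoprsštuvzž"),
                   "абцчћдђефгхијклмнопрсштувзж"))


def _recompose_consonants(text):
    """Put č, ć, š, ž back together after NFD, leaving accent marks separate."""
    for base, mark in (("c", "\u030c"), ("c", "\u0301"),
                       ("s", "\u030c"), ("z", "\u030c")):
        for b in (base, base.upper()):
            text = text.replace(b + mark, _nfc(b + mark))
    return text


def to_cyrillic(text):
    """Transliterate Serbo-Croatian Latin text, keeping combining accent marks."""
    text = _recompose_consonants(unicodedata.normalize("NFD", text))
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].lower()
        if pair in DIGRAPHS:
            cyr, step = DIGRAPHS[pair], 2
        else:
            # Unknown characters (accent marks, punctuation) pass through.
            cyr, step = LETTERS.get(text[i].lower(), text[i]), 1
        out.append(cyr.upper() if text[i].isupper() else cyr)
        i += step
    return "".join(out)
```

Under these assumptions, an accented Latin headword would come out as the Cyrillic base letters with the same combining marks attached, which is the part that is hard to type by hand.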

Accepting input from mobile device users

http://www.nytimes.com/2009/10/07/technology/companies/07amazon.html

Mr. Bezos declined to offer specific information about Kindle sales. But he said Kindle titles were now 48 percent of total book sales in instances where Amazon sold both a digital and physical copy of a book. That was up from 35 percent last May, an increase Mr. Bezos called “astonishing.”

There will be lots of readers with Kindles and other mobile devices, and when they encounter words not in the included dictionary, many will use the clunky and experimental web browser to look in Wiktionary, as I did. The words they are looking for are by definition rare, as they wouldn't be looking in Wiktionary unless the words were not in the built-in dictionary. Is Wiktionary going to accept their input? There should be a fast and easy way to add a short note about where they encountered such words. Archimerged 03:05, 15 October 2009 (UTC)[reply]

We have Wiktionary:Requested entries:English (shortcut WT:RE:en), where users can request an English entry that does not yet exist. Those requests which include a quotation or other information are usually dealt with much faster than bare requests consisting of just the word with no contextual information provided. --EncycloPetey 03:09, 15 October 2009 (UTC)[reply]
Wiktionary looks awful in my Nokia 6700 browser; I have trouble viewing it, let alone adding any input. --Anatoli 03:11, 15 October 2009 (UTC)[reply]

Scripts

As I said in a previous discussion, I'm working on a project related to letters. I've cleaned up every translingual Latin letter definition that I found and added some dozens of new ones. Entries on Armenian, hiragana, basic katakana and some Braille are done as well.

Each entry has at least a generic definition and a display box containing related information (such as character variations, Unicode and romanization; though the Unicode was removed from display boxes at Braille entries by Bequw, who favors the usage of {{Braille character info}} instead). Examples: ĉ, , , ա and .

My next plans on this project include: cleaning up Braille variants, IPA letters and Roman numerals; creating every Latin, Braille and katakana entry; then starting Cyrillic script and Greek script. --Daniel. 03:28, 16 October 2009 (UTC)[reply]

See Appendix:Cyrillic script for an exhaustive list of modern Cyrillic letters. —Stephen 03:39, 16 October 2009 (UTC)[reply]
Good stuff! I've got to say, though, that the top placement of {{mul-script}} is not growing on me. It really clogs the top of the entry. Could it possibly go under ===See also=== as a navigation footer, maybe reformatted accordingly? -- Visviva 06:50, 17 October 2009 (UTC)[reply]
I was inspired by (among other things) the function "Add links to previous and next pages." from WT:PREFS; a list of related entries placed at the top looks very good in my opinion. But yes, the {{mul-script}} can be moved to the See also section. --Daniel. 13:16, 19 October 2009 (UTC)[reply]
I agree they should be in the See also section as they are in the main content space (as opposed to floating on the RHS) and show the entire script. I also believe we should rethink the format. Most of these templates just show lists of letters, so why are we using centered-tables with color? On pages with legitimate uses of tables, RHS elements, and images, I think these script templates look inconsistent, overly large, overlap other elements, and catch too much attention. I'd rather see
See also
than
See also
--Bequw¢τ 14:42, 19 October 2009 (UTC)[reply]
I strongly agree with Bequw. Conrad.Irwin 14:57, 19 October 2009 (UTC)[reply]

In my opinion, the suggestion from Bequw is excellent. In other words, I agree on the format of the character list at the See also section as lines. Though all the issues pointed at his message could also be fixed through collapsible tables, like the Galician, Portuguese and Spanish conjugation tables at the entry comer and translation boxes of English entries; this may be a better solution in case of higher quantities of characters, such as at Ǽ, which could contain this:

See also

instead of such links allocated in five distinct lines. (Additionally, hiragana and katakana are mainly always organized as tables, but converting the current {{mul-script/Hira}} and {{mul-script/Kana}} to one or more lines wouldn't be troublesome.) --Daniel. 09:33, 20 October 2009 (UTC)[reply]

That does make it tidier, but then the alphabet isn't showing by default, which I think we would all agree it should be. Maybe hide the less-common variants when there are many (most don't have as many as æ), but the main script alphabets should be visible and still consistent with the rest of our UI. --Bequw¢τ 14:52, 20 October 2009 (UTC)[reply]
It might be worth rethinking how many of these are actually relevant on every page - I'd be perfectly happy for people who want to look up á to have to look at a first, same with IJ and I or J. That way you only need two rows of the most pertinent links as opposed to a splatter of mainly irrelevant ones, removing the need for yet-another hidey-box-thing. Conrad.Irwin 21:49, 20 October 2009 (UTC)[reply]
Bequw, could you explain why you think that, if collapsible tables are acceptable for some or most of the discussed contents, the fifty-two uppercase and lowercase basic Latin letters should always be visible? I'd like to know, especially because the text "The Latin script [show ▼]" is self-explanatory. --Daniel. 03:51, 21 October 2009 (UTC)[reply]
First, the collapsible box is not a cure for the inconsistent UI (using centered, colored tables). Using a consistent UI will reduce the space enough so the issue of collapsing boxes is probably moot (I said "maybe" because in general when a section runs too long, one can consider boxing away the less useful parts). Second, for letters in an alphabet, I think the other letters in that alphabet should always be visible (Latin a should show b and Hungarian ű should show ü). This is both because users are very likely to click on these links and because dictionaries often define alphabetic letters in reference to the rest of the sequence. Third, in regards to the obviousness of "Latin script", I think many users of Wiktionary might not know that by "Latin" we mean letters used also in English (definition 2 of our 6) and by "script" we mean a writing system (definition #7 of our 8) but that does not include numbers and punctuation (like Unicode's "Latin" block). It's fine to show the phrase and the letters at the same time, but don't assume users will know correctly what to find by clicking on the "show" link. When we can do w/o this obfuscation we should. --Bequw¢τ 18:30, 21 October 2009 (UTC)[reply]

Time M-zine article

Some of you might be interested in reading Time Magazine’s September article on Wikipedia. It is available here: Is Wikipedia a Victim of Its Own Success?. —Stephen 19:59, 16 October 2009 (UTC)[reply]

Interesting. Thanks for posting the link. :) L☺g☺maniac chat? 21:54, 16 October 2009 (UTC)[reply]
Related talk from Wikimania 2009: Interpreting Wikipedia’s Demographic Decline: Implications for an Emergent Community (video, 79.4MB, 1:16:00). --Ivan Štambuk 00:28, 17 October 2009 (UTC)[reply]


Protologism-by-template

I came across the word zwavelzuurtjes. The Dutch word zwavelzuur means sulfuric acid, and this is the diminutive plural of a word that does not really have a plural or a diminutive. Moreover the word "zuurtjes" does exist. It denotes a form of candy, a bit like lifesavers. That means that "zwavelzuurtjes" sounds like a morbid joke: sweets filled with sulfuric acid.

The problem here is that a template like nl-noun generates protologisms like this by default. I think it should be the other way around imho: the default should suppress the diminutive and one should have to specify the diminutive if there is one, because it is certainly not the case that all Dutch words have a diminutive even if one can grammatically be formed.

As the default is now, it invites people with very limited knowledge of this language (or none at all) to generate large amounts of nonsense. This is unfair to the native speakers and unfair to our language. Besides, real protologisms are suppressed here, even when they do have genuine semantics.

The problem is not limited to nl-noun. I also see superlatives and comparatives of adjectives that really do not have those, or forms like "farmacologischst" that are simply wrong. (Words on -isch typically take meest (most) as in English if they have superlatives).

Imho trustworthy dictionaries should provide factual information on words and forms of words as they are really used (or not), not gobs of bogus forms that were computer generated to boost the article count. Jcwf 21:50, 16 October 2009 (UTC)[reply]

Don't the templates involved have the option to suppress the additional forms? {{en-noun}} is the best one in English for handling such matters, specifically allowing "-" and "?" as parameters, "-" displaying "uncountable" and "?" displaying nothing but adding the item to a hidden category for unknown or uncertain plurals.
What portion of nl nouns generate spurious diminutives? Do nl templates offer options for suppressing some forms? DCDuring TALK 23:21, 16 October 2009 (UTC)[reply]
You can use {{nl-noun|plural|-}} to suppress the diminutive. But as with {{en-noun}}, many users won't realize this. The few times that I've dealt with {{nl-noun}} doing cleanup, I've just kind of prayed that the template knew what it was doing. :-) -- Visviva 06:04, 17 October 2009 (UTC)[reply]
Diminutives are something that shouldn't be appearing in the inflection line in the first place, but under ====Derived forms====. Do diminutives play such a prominent role in Dutch that they ought to be conspicuously indicated that way? Do Dutch (mono- and bilingual) dictionaries list the noun diminutives that way normally, or list them under separate headwords?
Comparatives and superlatives should obviously be made optional parameters, defaulting to the behaviour that is characteristic of the bulk of adjectives in the language (I assume that most of the Dutch adjectives/adverbs can be graduated into comparatives and superlatives..) --Ivan Štambuk 00:13, 17 October 2009 (UTC)[reply]
It always seemed odd to me that we had the diminutives in the inflection line, but I figured it must be a Dutch thing. Hopefully more Dutch-speaking editors will weigh in. -- Visviva 06:04, 17 October 2009 (UTC)[reply]
Support changing the default template behavior to avoid creating misleading redlinks. "First, do no harm," and so forth. I assume, though, that this would then require editing the many Dutch nouns for which legitimate diminutives are being template-generated. -- Visviva 06:04, 17 October 2009 (UTC)[reply]
support for the same reason: it's better not to provide some useful information than to provide wrong information. Lmaltier 06:20, 17 October 2009 (UTC)[reply]
I support doing whatever people decide on ANL. Not an issue the community at large needs to get involved in IMO. —msh210 18:10, 19 October 2009 (UTC)[reply]
I oppose changing default template behaviours and support morons not adding shit they don't know. — [ R·I·C ] opiaterein 19:32, 22 October 2009 (UTC)[reply]
I don't know Dutch (and am ipso facto a moron), but I feel as though templates shouldn't offer something by default unless it's going to be relevant more than 50% of the time. Is this the case here? (Example: our {{en-adj}} doesn't automatically put -er and -est forms on every adjective, because loads of them don't take it or are irregular.) Equinox 21:08, 22 October 2009 (UTC)[reply]
I'm not sure the original complainant is a native Dutch speaker, but he does say he spends his time on nl.wp. So, even after we solve the moron problem, we have the problem of bad default behavior of the template. The solution lies with the active contributors to such entries.
About morons: If we only allow non-morons to contribute, then the non-morons had better get on the stick: If a language has 200,000 lemmas and many more non-lemmas and there are two serious contributors doing 20 all-new entries per day (no translations) with etymology and pronunciation and full inflection, it would seem that it would take 5,000 days. Then either one has to get more non-moronic contributors or reduce the number of languages (!!!). OR one could make it easier for morons to make some kind of contribution. And, who knows, perhaps their moronity is not a genetic condition, but one that can be overcome by this new biotech thing I've heard about called learning. DCDuring TALK 21:43, 22 October 2009 (UTC)[reply]
I strongly support moron contributions. I hate contributing to entries for words that I actually know something about. So boring... -- Visviva 12:39, 23 October 2009 (UTC)[reply]
Support for changing templates so that they do not create gibberish by default. Oppose making assumptions like "people will know how to suppress incorrect information". Strong oppose to classifying those who don't as morons. When I first came here I barely understood what a template was, and certainly didn't know enough to find the Template: page that described its use. I thought the learning curve was pretty steep then. I can't imagine what it must be like now. DAVilla 06:19, 9 November 2009 (UTC)[reply]
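The back-of-the-envelope estimate earlier in this thread (200,000 lemmas, two contributors, 20 new entries per day, 5,000 days) only works out if the 20 entries per day are per contributor. A minimal Python sketch of that arithmetic, with a hypothetical function name and the figures taken from the comment above:

```python
def days_to_cover(total_entries, contributors, entries_per_day_each):
    """Days needed if every contributor adds a fixed number of entries daily."""
    return total_entries / (contributors * entries_per_day_each)


# Figures from the comment: 200,000 lemmas, 2 contributors, 20 entries each per day.
days = days_to_cover(200_000, 2, 20)
years = days / 365  # rough conversion, ignoring leap years

print(f"{days:.0f} days, about {years:.1f} years")
```

On those assumptions the lemmas alone would take well over a decade, before counting any non-lemma forms.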

Logo vote

As everyone here probably already knows, there has been a discussion on changing the Wiktionary logo going on on Meta for quite a while. There were a bunch of logo proposals, a few arguments, no one having a clue what anyone's talking about, the discussion page getting insanely long, yada yada yada. Recently, there were doubts on whether there should be a logo vote in the first place as people started to notice that two thirds of the votes to start a logo vote were in fact not from Wiktionarians, but from Wikipedians and Meta-people. So, to finally set the record straight on whether people are in favor of a vote I am asking here to find out who does, and who doesn't, want a new logo.

So, do the people at the English Wiktionary want a new logo?

--Yair rand 17:21, 18 October 2009 (UTC)[reply]

What logo do you consider as the current logo? The logo used here or the official tile logo? I think there should first be a vote here about adopting this tile logo (AFAIK, no vote has been organized here about this tile logo yet). Lmaltier 17:32, 18 October 2009 (UTC)[reply]

I used to not like our current text-logo, then it sort of grew on me. Then I saw this proposal, and it stole my heart. Now, I want a vote to adopt it. --Vahagn Petrosyan 17:40, 18 October 2009 (UTC)[reply]

I still hate our current text logo. I would do a vote. --Internoob (TalkCont.) 18:19, 18 October 2009 (UTC)[reply]
I don't care, but might vote to block a worse alternative. Equinox 18:42, 18 October 2009 (UTC)[reply]

I agree with Vahagn about the current logo having grown on me, but find the book-and-globe one he's pointed to unobjectionable (which is more than I can say for the icky tile logo). --EncycloPetey 18:27, 18 October 2009 (UTC)[reply]

Except that it is what users are accustomed to, the current logo-in-use doesn't have much to recommend it IMO. But that is a big "except". Perhaps it is too English/Roman alphabet-oriented, too busy/doesn't scale well to small size. I'm not sure how much it matters as most new users don't select Wiktionary because of a logo, but come to us from WP links or search engines and portals. Logos become invisible over time, too (and we can turn the corner one off). A logo can help a bit if the colors are a bit different from what the other English-language projects use, biased toward whatever colors are in the logo(s) other language Wiktionaries choose. The color distinctiveness might help users find our logo in the various boxes where it appears with the other projects'. I think it's a good thing if we get people other than ourselves involved (WPs and WMFs), as they are a little more representative of a large segment of our actual users than we are likely to be. DCDuring TALK 18:47, 18 October 2009 (UTC)[reply]
I don't particularly like either the current text logo or the tile logo, and actually if we voted, the one Vahagn pointed to is the one I was planning to vote for. And yes I think the vote should be started. L☺g☺maniac chat? 19:48, 18 October 2009 (UTC)[reply]

I would vote to block a worse alternative. Frankly, I don't mind the text logo. I could care less about it, except that the tiles are disgusting, unprofessional, and look horrible, and some of the alternatives are worse. The other thing that strikes me about the vote is that most of the people that called for it in the first place and the people who are adamant about the current en.logo being horrible are people that never edit Wiktionary. And just … I can't coherently get my thoughts together any better than that. --Neskaya contribstalk? 00:55, 19 October 2009 (UTC)[reply]

  • I just don't care. At all. Logo, shmogo. I'm not here to build anyone's effing brand. But how about this? We could just make the Wiktionary logo an image selected at random from the proposed logos, rotated daily. Variety is the spice of life. More realistically, I would support a binding vote here to accept whatever comes out of Meta (if anything). -- Visviva 02:51, 19 October 2009 (UTC)[reply]
  • I hate all the voting at the drop of a hat. Is the logo a problem?
    • No - let's build a dictionary.
    • Yes - let's talk about what is the problem, figure out how real a problem it is, and the solution should become obvious.
    - Amgine/talk 22:29, 19 October 2009 (UTC)[reply]
  • I know you're being a bit frivolous, but rotating the logo is the worst, most confusing thing we could do. There is a need for a consistent "brand", even if it's only our current little bit of boilerplate text. I'm just not that picky personally as to whether it's a Scrabble® tile or a book or some variant of the Wikimedia logo or whatever. Equinox 22:32, 19 October 2009 (UTC)[reply]
No, please, don't rotate the logo! L☺g☺maniac chat? 22:44, 19 October 2009 (UTC)[reply]
But... but... people could tune in every day to see if we had the Goatse Wiktionary or not. No? Man, I never get to have any fun. :-(
Well, I am being frivolous. The whole issue is frivolous. Visual branding isn't that important because it applies only to our web interface, which is of marginal significance to the mission of building a universal, free-as-in-freedom dictionary.
The people on NL have a very good point, IMO; we did this already, so let's just accept the outcome of that process, however flawed it may have been. If we don't do that, then we should at least vote on whether to accept the outcome of the new process before we make all the other Wiktionaries waste their time. Again. -- Visviva 03:16, 20 October 2009 (UTC)[reply]
I was actually hoping that votes whether to accept the outcome would be held after the votes on which logo is best with the understanding that no one takes it unless a clear majority of the Wiktionaries do, as that seems to be the only way to make sure that we don't split the logos even more. Really, the whole process to start a vote doesn't seem too difficult to go through. Since there has been a decision that the tile logo is too horribly ugly to use, I think that holding a vote is worth it just to stop those annoying "isn't that pronunciation wrong?" comments, even without the annoyances that the favicon is identical to WPs and that our entries look nothing like that and that it's ugly and the fact the current logo isn't even being used by about half of the Wiktionaries. Really, would holding a vote really take so much time away from building the dictionary as to make it undoable? --Yair rand 04:15, 20 October 2009 (UTC)[reply]
No particular objections to a vote, now or later. But FWIW, I think having a long and laborious process, and then a vote on whether to accept the outcome, is a recipe for failure and aggravation. People who didn't bother to engage in the initial process at all will suddenly come out of the woodwork with tendentious objections to the result. Isn't that more or less what happened the last time around? So why don't we make it clear that we will (or won't) accept the results of the Meta process, and then allow that process to take its natural course? Then the outcome of the process will be accepted by default, and anyone wanting to raise objections after the fact will have to show that there is a consensus against the result of the process (which is unlikely, unless there are very serious problems indeed). -- Visviva 07:10, 20 October 2009 (UTC)[reply]
That would really only work well if all or most of the Wiktionaries had votes like that, and the vote would have to state that in the event that the majority of the wikts don't vote positively to whether to accept the outcome of the vote, then the vote will not happen. If it's clear that all or most Wiktionaries will accept the outcome, then the tile-users will likely see it as a chance to unify the logos (which in itself may or may not be a good thing). In that case, if we hold a vote like that first and it's clear we'll accept the outcome, the NL Wiktionarians might participate. --Yair rand 13:25, 20 October 2009 (UTC)[reply]
Actually, forget it. I support having a vote whether to accept the outcome of the Meta vote in advance, whether it states the extra stuff I said or not. --Yair rand 14:19, 20 October 2009 (UTC)[reply]
Sorry for changing my mind again, but I actually don't have the slightest clue which is better, voting whether to accept before or after the vote. Voting after is probably more likely to get positive votes from the Wiktionaries, but voting before would mean a lot less wasted time if people vote negatively. I dunno. --Yair rand 01:06, 21 October 2009 (UTC)[reply]
How about this: horse-cart or cart-horse? Should a logo be proposed by Wiktionary to the WMF, or should the WMF propose a logo to Wiktionary? How you answer that may suggest how you feel about Wiktionary and WMF. - Amgine/talk 20:28, 21 October 2009 (UTC)[reply]
(unindent) Personally, I think it'd be better if we proposed the logo to WMF. L☺g☺maniac chat? 22:28, 21 October 2009 (UTC)[reply]
I don't know what you mean by WMF proposing a logo to Wiktionary. There was a 10-week nominations period when anyone could suggest a logo and most of the proposals came from Wiktionarians (Stephane8888 from the French Wiktionary, V85 from the Norwegian Wiktionary, Moilleadóir from ga wikt, Diego from pt wikt, Vildricianus and DAVilla from here). Amgine, are you saying that the current voting setup (in which most of the work is completed) isn't good enough because it didn't originate on en.wikt? Does it really matter that much? --Yair rand 23:17, 21 October 2009 (UTC)[reply]
Yes, it really does matter that much. - Amgine/talk 03:13, 22 October 2009 (UTC)[reply]
I think we would all have preferred it if the vote originated on Wiktionary, but this logo discussion has been going on since March, a large number of Wiktionarians were in favor of it, most of the necessary work has been done and Amgine seems to be the only one who thinks that it is so much of a problem that the entire vote should be abandoned. Is there anyone else who is in favor of abandoning the whole vote because the voting discussion did not originate on Wiktionary? --Yair rand 05:17, 22 October 2009 (UTC)[reply]
Just out of curiosity, what good do you (anyone) think it would do if we abandoned the vote? L☺g☺maniac chat? 14:42, 22 October 2009 (UTC)[reply]
IMO the visual branding question is mostly a matter of concern to WMF and not much to us. (But how important is it even to them?) Ideally we could veto a really bad logo but accept almost any reasonable choice without further delay. If WMF or the active participants really care and want, we could commit ourselves to accepting anything they propose, unless we have a vote that rejects the proposal by, say, a 2/3 (or 60% or 75%) majority. Between a final veto and an opportunity to suggest criteria and make specific proposals we would have had enough opportunity to participate. DCDuring TALK 15:08, 22 October 2009 (UTC)[reply]
How could it possibly come from Wiktionary? There is no one Wiktionary; we on EN may like to imagine ourselves as the hub around which the other projects revolve -- we may even fill that role in certain limited respects -- but ultimately we are just one among well over 150 coequal Wiktionaries. If we object because the initiative didn't come from us, why wouldn't NL, FR, VI, or NV object to any initiative from us on the same basis? Fortunately, there is a project dedicated specifically to cross-project coordination: Meta. And fortunately, that's exactly where the discussion is taking place. -- Visviva 15:22, 22 October 2009 (UTC)[reply]


  • Personally, I would greatly prefer if someone from the WMF just stepped in and imposed a logo. It won't happen, of course, mostly because they're a bunch of gutless wonders who will happily countenance flagrant bullying (ru.wikibooks) and mass copyvio (ta.wiktionary) as long as "the community supports it." And so we will all go on wasting our time on these non-issues ad infinitum. Maybe I should write a proposal for that StrategyWiki: "Grow A Spine." :-D / *sigh*. -- Visviva 15:22, 22 October 2009 (UTC)[reply]
OK, I see what you mean by that. I guess I really don't care who starts the vote, but someone please do ... L☺g☺maniac chat? 15:27, 22 October 2009 (UTC)[reply]
The work to start the vote is going on as we speak. The voting page still needs translations into de, fr, ja, ru, tr, lt, and vi. --Yair rand 16:03, 22 October 2009 (UTC)[reply]
Well, what I would prefer to see is if wiktionary languages approached each other about normalizing the logo if it were going to happen at all. It's clearly not an issue for the project, however, so it has never happened. But it seems every so often someone outside the project brings it up as something that is life or death and then we end up with months of drama culminating in a divisive vote dominated by people outside the project who attempt to impose a logo which has little or no meaning to the project.
It'd be great to see all the effort currently being put into this crusade instead being put into trans-Wiktionary discussion about whether this is important to the project. And an option to vote against the logo vote in the logo vote. - Amgine/talk 18:39, 25 October 2009 (UTC)[reply]
How in the world could there be a trans-Wiktionary discussion? There are 172 independent Wiktionaries, none of which speak the same language. No one is going to impose a logo on anyone. There are only going to be votes from Wiktionarians, and no one is going to take the winning logo without agreement by more than 60% of the Wiktionaries. I don't think anyone but you thinks it would make any sense to abandon the vote, so the vote will probably proceed as planned. --Yair rand 19:01, 25 October 2009 (UTC)[reply]
I would guess a trans-wiki discussion would happen in much the same way I currently have daily discussions with wiktionarians on multiple projects - through a range of online media including on-site, e-mail, microblogging[fr][nl][4], blogs[fr][es], and even in IRC[fr][de] &c. Where we don't talk is on meta. - Amgine/talk 19:58, 25 October 2009 (UTC)[reply]
But what is wrong with talking on meta? That's the whole point of meta, to have discussions that affect multiple projects. Meta works perfectly, and the vote has no real problems with it, so there is no real reason to stop it. How would off-meta discussions be any better? I see no reason to stop the vote just so we can start from scratch setting up the voting process off of the project that was built for that purpose. Perhaps I'm simply not understanding you correctly. --Yair rand 20:39, 25 October 2009 (UTC)[reply]
There's nothing wrong with meta for trans-project discussions. But this is within a single project, and the vast majority of Wiktionarians don't go to meta. It's not how things are done, which isn't a judgement on the benefits or drawbacks of any system or process; only an observation. I've been saying this since, what? March? April? and yet the Meta process has bowled along ignoring this. If you make decisions where the community is not, the community will not feel they've made the decision. That's not at all difficult to understand. And, to be blunt, it doesn't matter if you stop the vote or not. - Amgine/talk 22:04, 25 October 2009 (UTC)[reply]
If I'm understanding you correctly, you are saying that because the discussion was held on meta instead of somehow between Wiktionary projects, the community will feel that the vote was constructed without them, and because of that the end vote will have less than 60% of Wiktionaries approving of the winning logo and the whole thing will have been a waste of time. I find this to be an extremely unlikely possibility, and I don't think that many people agree with you on this. As there is no consensus to stop the vote, the only thing we can do is wait and see. I think that the vote will most likely be a success. --Yair rand 22:38, 25 October 2009 (UTC)[reply]
No. I'm saying the die is cast; the acceptance or lack of acceptance of any result from meta is unlikely to be changed at this point because it was begun outside the community, by someone not a wiktionarian, and continues to be dominated by non-wiktionarians. Most of us, I expect, don't care too strongly what the result is; we're tired of having this argument over and over again, we're tired of non-community members pushing their idea of what the perfect logo is, we're tired of having so much of our very limited energy tied up in arguments, and we'll likely go along with whatever strident voice is screaming at the end of this.
And we'll hate it.
And people like you, who've been around not even as long as this argument has been going on (yes, I know, you were on Wikipedia for several years, but you started here under this nick in July) will have won the day. And what will you have won? Why was it so important to you?
Never mind, you're plainly incapable of thinking beyond tomorrow. Go have your little poll and your victory march. - Amgine/talk 02:29, 26 October 2009 (UTC)[reply]
(unindent) The logo discussion is still going on not because a meta person decided to bring it up, not because of the bits of help from non-Wiktionarians, but because the community wants it as has been said again and again, throughout the discussion and poll on meta. You have made a claim that the vote will be a disaster, that nobody wants it and that the community will hate it, and I see no evidence whatsoever that supports that claim.
It is clear that this discussion is getting us nowhere. This apparently pointless debate is now over. --Yair rand 15:02, 26 October 2009 (UTC)[reply]

IPA - standard tag etc...

I'm working on a TTS processor / speech processor /text reader at the moment (currently targeted at Gutenberg for a friend of mine with failing eyesight but this could easily be refactored to work with wiki pages or pretty much anything else), and one of the ideas which occurred to me was that it should be possible to build a local implementation of Wiktionary words against their phonetic equivalents expressed as IPA. A few of my public utterances on my thinking on the subject may be found here [1].

Now at the moment I have a little Java bot which merrily reads a word document from Wiktionary and looks in there for a regex string which looks like \<span class=\"IPA\"\>.*\<\/span\> - this may need to be extended in the case of Wiktionary edition languages with extended character sets etc - and parses out the .* IPA rendition of the word. Unfortunately this is not complete: my quick, dirty and probably unrepresentative sample indicates that about 20 - 30% of words lack IPA renditions, and the real number is likely to be much higher. The generator for this tag is the IPA template, e.g. for the IPA of help it looks like: {{IPA|/hɛlp/}}. Moreover there can be more than one instance of this tag, e.g. [2]. This is not an insuperable problem since I have a nuance indicator which knows what the preferred pronunciation should be by recourse to the tag \<span class=\"qualifier-content\"\>.*\<\/span\> where the .* indicates the preferred implementation. One fine day I will get my bot to run over a few Gutenberg volumes and build a database of IPA renditions from Wiktionary, and I will try to find an unobtrusive time/date to do this. My real questions initially are these: is this tag class likely to change at any point? And is there currently a mechanism for indicating entries which do not have an IPA representation with a view to getting them fixed? Sjc 12:27, 2 October 2009 (UTC)[reply]
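
For illustration, the span-matching step described above might look roughly like the following. This is only a sketch in Python rather than the poster's Java, the sample HTML fragment is made up, and the pattern is a non-greedy variant (an assumption, not the bot's actual regex) so that two spans on one line are not merged into a single match. It will not cope with nested spans.

```python
import re

# Non-greedy variant of the pattern described above; a greedy ".*" would
# swallow everything between the first opening tag and the last closing tag.
IPA_SPAN = re.compile(r'<span class="IPA"[^>]*>(.*?)</span>')


def extract_ipa(html):
    """Return every IPA transcription found in the rendered HTML, in order."""
    return IPA_SPAN.findall(html)


# Hypothetical fragment of a rendered entry, for illustration only.
sample = (
    '<li>IPA: <span class="IPA">/hɛlp/</span></li>'
    '<li>IPA: <span class="IPA">/hɛɫp/</span></li>'
)
print(extract_ipa(sample))
```

Choosing among several matches (the preferred-pronunciation step) is left out here, since it depends on how the qualifier spans are laid out on the page.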

You might find it easier to work with the XML dump http://download.wikimedia.org/enwiktionary/ this way you can find the "{{IPA}}" directly - that way you don't have to spider the whole site. The chances are that we won't change the classname, but it's entirely possible someone will forget. With the XML dump, generating a list of missing IPA is trivial. Conrad.Irwin 14:17, 2 October 2009 (UTC)[reply]
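
A rough sketch of the dump-based approach, assuming the usual page/title/text layout of a MediaWiki XML export. The tiny inline sample stands in for the real dump, and the check is a plain substring test for "{{IPA|", so it ignores namespaces, redirects, and which language section the template sits in.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit('}', 1)[-1]


def pages_missing_ipa(xml_stream):
    """Stream through a dump and list titles whose wikitext lacks {{IPA|...}}."""
    missing = []
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if "{{IPA|" not in text:
                missing.append(title)
            title, text = None, ""
            elem.clear()  # free memory; real dumps are large
    return missing


# Made-up two-page sample standing in for the real dump file.
SAMPLE = """<mediawiki>
  <page><title>help</title><revision><text>==English==
{{IPA|/hɛlp/}}</text></revision></page>
  <page><title>rooftile</title><revision><text>==English==
===Noun===</text></revision></page>
</mediawiki>"""

print(pages_missing_ipa(io.BytesIO(SAMPLE.encode("utf-8"))))
```

With a real dump, the BytesIO object would be replaced by the opened (and decompressed) dump file.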
Thanks Conrad, I'll have a play with the download, and obviously if I don't have to spider it would make a lot of sense not to. Sjc 08:37, 3 October 2009 (UTC)[reply]
Depending on your situation, it might also make sense to use the API. You can feed it a list of titles separated by pipes to get the wikitext for multiple pages at once (e.g. all words in a sentence). But the database dumps are the optimal solution. -- Visviva 06:29, 4 October 2009 (UTC)[reply]
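
As a sketch of the pipe-separated request mentioned above: the snippet below only builds the query URL and does not contact the server. The parameter set shown (action=query with prop=revisions) is an assumption about what a wikitext request looks like, and the API documentation should be checked for the current requirements and per-request title limits.

```python
from urllib.parse import urlencode

# English Wiktionary API endpoint.
API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def build_query_url(titles):
    """Build one request URL asking for the wikitext of several pages at once."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": "|".join(titles),  # multiple titles are separated by pipes
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# e.g. all the words of a short phrase in a single request
print(build_query_url(["the", "quick", "brown", "fox"]))
```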

Wiktionary:About English

It seems odd to me that there's no Wiktionary:About English (WT:AEN) like there are for other languages. I think it would be very useful, both in organizing our existing English-specific resources and as a place to note consensus that has been found on a variety of issues. Some issues that could be dealt with on the page are:

Thoughts? --Bequw¢τ 17:43, 2 October 2009 (UTC)[reply]

I had just been thinking the same thing in regards to the little flap we just had over whether to use ==Postposition== as a POS header for English. That is a classic "language considerations" issue -- sorting out what makes sense for a proper treatment of a specific language (nobody would question that English has postpositions, or that these are an important grammatical category in some languages). I definitely think having this page would be a good idea. However, I also would prefer that it be aggressively limited to language considerations, lest it start to blur into areas that should really be addressed through broader policy. So I would be of mixed feelings about including translation formatting there, even though it is an issue that affects English entries only. But definitely create the page if you have a notion; we can sort out the details as we go along. -- Visviva 19:02, 2 October 2009 (UTC)[reply]
Seems worth a try. DCDuring TALK 19:25, 2 October 2009 (UTC)[reply]
Just had time enough to put up some starter links. We'll see where this goes. --Bequw¢τ 20:14, 2 October 2009 (UTC)[reply]
About translations: in most cases, translations are limited to English entries, but they might be included in other entries too, as English terms don't exist for all possible meanings, in their most precise senses. As an example, fièvre de cheval could be given as a translation in febbre da cavallo, and conversely, rather than adding the translation in the etymology section (which may be appropriate in some cases, but not always). Lmaltier 13:30, 3 October 2009 (UTC)[reply]
This would require a vote to change WT:ELE#Translations. While it has been mentioned before, I doubt this will happen in the foreseeable future. --Bequw¢τ 19:55, 16 November 2009 (UTC)[reply]


  • Operator: Merlissimo
  • Function overview: changes external links which are outdated and can be successfully replaced by a new one.
  • Function Details:

The bot replaces URLs that have to be changed. This can be only a domain change or a more complex page structure change on a website. Links are detected with the help of the API (and not with regex) and are only replaced if the webserver of the new URL returns a 200 status response for that new resource. “Link text” is not changed. (own framework written in Java - used by all of my bots)

  • Operation: controlled
  • Software: Java (own framework)
  • Has bot flags on: 30+ Wikipedias and some other projects (e.g dewiki(home), enwiki, simplewiki, commons, enwikinews) (see all flags)

Mglovesfun asked on the talk page of my bot that I request permission/a bot flag on this page. The bot is running globally and, as you can see there, there typically aren't many weblinks on Wiktionary projects. Merlissimo 16:15, 21 October 2009 (UTC)[reply]

How does it find the replacement URLs? (Via w:HTTP 301 headers??)​—msh210 16:21, 21 October 2009 (UTC)[reply]
If there is only a 301 header, the browser automatically redirects you to the new URL, so there is no need to rewrite the URLs.
How do I find the new locations? That can sometimes be very complicated. The easiest case is when webmasters give hints (they ask on a project for a rewrite, or I write them an email). Sometimes there is a hint at the old location (like europa.eu.int). And mostly Wikipedians (or Wiktionarians? ;-) ) know how the rewrite could be done. The bot can read headers and content or do a rule-based rewrite. I test a great deal before the bot really starts running. The bot always tests the response code of the new location and can do a similarity check (e.g. with an archived version), so it is guaranteed that the content is the same. Merlissimo 17:03, 21 October 2009 (UTC)[reply]
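
The rule-based rewrite with a status check described above could be sketched roughly as follows. This is not the bot's actual code (which is a Java framework); the rule table, the function names and the stand-in status checker are all hypothetical, and the checker is passed in as a parameter so that the example runs without any network access.

```python
def rewrite_url(url, rules, status_of):
    """Apply the first matching prefix rule, but only if the new URL answers 200.

    rules     -- list of (old_prefix, new_prefix) pairs (hypothetical rule table)
    status_of -- callable returning the HTTP status code for a URL
    """
    for old_prefix, new_prefix in rules:
        if url.startswith(old_prefix):
            candidate = new_prefix + url[len(old_prefix):]
            if status_of(candidate) == 200:
                return candidate
            return url  # new location not confirmed: leave the link alone
    return url


# Illustrative rule based on the europa.eu.int example mentioned above.
RULES = [("http://europa.eu.int/", "http://europa.eu/")]


def fake_status(url):
    """Stand-in for a real HTTP request, so the sketch needs no network."""
    return 200 if url.startswith("http://europa.eu/") else 404


print(rewrite_url("http://europa.eu.int/index_en.htm", RULES, fake_status))
```

A real run would replace fake_status with an actual HTTP request and could add the similarity check against an archived copy before accepting the candidate.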
It would seem most sensible to run the bot without a flag so that we can see what it is doing. If it starts doing lots of edits (more than a few a day) then adding a flag may be helpful. As long as it is just updating broken links, I cannot see any problem. I assume it cannot handle links generated partially in templates (i.e. {{{googleid}}} on {{cite-usenet}}), can it produce a list of those that are broken? Conrad.Irwin 19:09, 21 October 2009 (UTC)[reply]
Not more than fifty edits, at any speed, or ten a day, in any quantity, sounds reasonable to me.​—msh210 19:38, 21 October 2009 (UTC)[reply]
@Conrad Of course I can check some links and generate a filtered list on a specified page. On enwikinews and some de-projects I also report broken links on the discussion pages.
Simply tell me which links to check, how to filter (e.g. broken only) and where I should put the list. I can also check whether a URL is available at different archive services. My framework is very modular, so that would be very easy. Merlissimo 19:53, 21 October 2009 (UTC)[reply]
If I knew which links to check, I'd have checked them already :p - I assumed that's what you were proposing to do, no matter either way. Conrad.Irwin 20:42, 21 October 2009 (UTC)[reply]
Ah, ok. My bot can also change links in the template namespace. More complicated templates are repaired manually by myself. External links produced by templates (e.g. googleid) in the article namespace are normally not rewritable - they are simply broken. Templates like cite-usenet, which contain the complete URL as an argument, can be handled by my bot. Merlissimo 17:42, 22 October 2009 (UTC)[reply]
I added a note on the bot user page that it runs without a flag: [3] Merlissimo 17:48, 22 October 2009 (UTC)[reply]

Latin letter cleanup

After some reformatting of "A" (by several kind souls), what do people think of today's version (specifically the Translingual section compared to last week's version)? Several changes were made: use of {{Basic Latin character info}}, the reference image only shows one style (no italics) and a single case, new layouts of {{Letter}} and {{Latn-script}}/{{mul-script}} were used, images of different styles of the letter were put together, and etymologies of abbreviations were merged. Should these types of reformattings be done to the other basic latin letters? Anything that people would disagree with, or more suggestions? --Bequw¢τ 16:50, 23 October 2009 (UTC)[reply]

Vastly improved. A long-needed effort. We would be much better off if all of them looked that good. But, can we make further improvements?
  1. The image seems large for its modest informational (or eye-candy) value with my default thumb settings, which are good for other images. Are there other images that might convey more? Is there a way of setting the image size to be smaller, even fixed size?
  2. The gallery doesn't play well with the rhs ToC, giving too much whitespace. Is that an intrinsic problem with gallery, which usually only appears below the bottom of the ToC? Is the gallery a Wikimedia thing? Is it alterable?
  3. Should even more of the material appear only in Appendices, which themselves might merit more of a preview/advertisement on the page. I am thinking of the related terms and the images.
-- Great job so far. DCDuring TALK 18:22, 23 October 2009 (UTC)[reply]
Stab taken. MediaWiki gallery syntax is more flexible than I recalled; setting "perrow=3" has solved the TOC issue for me (though obviously there are some screen resolutions where it would still cause problems). For the image in the character box, I'm thinking 50px should do just as well as 100px for most cases; people who want more detail can always click through to Commons. I do think it has to be set explicitly; using thumbnails would result in a much-too-large image (for most people) with lots of extraneous formatting.
I don't think appendices are a good idea. The entry is about the letter A; we may as well use it to showcase the information we have about that letter. Having an A entry that pointed to a separate Appendix:Latin capital A (or whatever) would not be very productive IMO. I've boxed related terms, though, since it's got plenty of room to grow. -- Visviva 05:12, 24 October 2009 (UTC)[reply]
Excellent on all counts. The ToC problem is solved for me as well. I hope it works for others. The two signal-flag images seem out of place. Can they go in the/a gallery? The images in the gallery also seem large for their information content. I wouldn't be so picky except this could be a template for at least the Latin characters and apply approaches and tricks that would work for other classes of single-character headwords. It might as well be the best we know how to do. If it involves features that are not documented in our normal places or are just out of the ordinary, our more technically adept contributors are the best ones to find appropriate solutions. I'm hoping that no character is more demanding than A. DCDuring TALK 10:36, 24 October 2009 (UTC)[reply]
Something like User talk:Visviva/Letter, maybe? MediaWiki galleries are kind of fiddly (for instance, you can't use variables or parser functions in them), but most of their behavior can be emulated. This may work poorly with funny-shaped images; I haven't really tested it beyond the one example there. -- Visviva 16:59, 24 October 2009 (UTC)[reply]
Beautiful. Yes, like that. Is that the basis for something more flexible (and improvable) than gallery?
Is there a way to do images that are sized relative to the user-preference for size of thumb? I am mostly thinking of a smaller size for low-information content images, but it could conceivably be useful for high-density images, though a larger size is just one click away, as you had pointed out. DCDuring TALK 17:17, 24 October 2009 (UTC)[reply]
I like it too. --Bequw¢τ 21:08, 25 October 2009 (UTC)[reply]
It includes all of the old ones but the NATO word. Would you just want it formatted so that that NATO word was a bullet point above or below the gallery? --Bequw¢τ 03:54, 27 October 2009 (UTC)[reply]
We currently have images of the letter A written in Fraktur, Uncial, and Roman serif fonts. Is this sufficient, or would we want to include other major types of typefaces (Roman script and Antiqua) or even other variants (such as a cursive or Textualis blackletter)? If we do, does anyone know about creating images of these types? (I don't think there are free fonts for all of these.) Would this information be too encyclopedic? --Bequw¢τ 19:01, 24 October 2009 (UTC)[reply]
It's a bit encyclopedic IMO, but it has closer-than-average connection with a dictionary. To me it seems like excellent Appendix material if WP doesn't cover it or we have something to say that they don't. DCDuring TALK 20:15, 24 October 2009 (UTC)[reply]
I think they should go somewhere, maybe on the main page, as it could help someone decipher which symbols on a document are which letters. --Bequw¢τ 21:05, 25 October 2009 (UTC)[reply]


Policy proposal

Per roof tile and coal mine passing rfd as the more common spellings of rooftile and coalmine, I think it would be good policy to accept de jure these sort of entries that while sum of parts as two (or more) words, are the more common form of the same thing but without the space. Otherwise you get something like this.

  1. (rare) Alternative spelling of [[coal mine]].

Or if you split it like this

  1. (rare) Alternative spelling of [[coal]] [[mine]].

It's confusing, because you might click on it not realising that it is two separate links, hence end up with no definition. Or, like this.

  1. (rare) Alternative spelling of coal + mine.

Looks a bit silly.

In fairness, I don't think stuff like coal mine should get deleted merely because it's got a space in it. I don't see how coalmine is less sum of parts than coal mine. Mglovesfun (talk) 14:42, 26 October 2009 (UTC)[reply]

Having default CSS underscore links will allow the "coal mine" method.​—msh210 14:59, 26 October 2009 (UTC)[reply]
I would strongly support underscores to differentiate a single multi-word link from multiple single-word links, especially if they were faint or dashed or dotted. But I would rather have conspicuous ones than not have them at all. But that has benefits in many circumstances, including many unrelated to the proposal at hand. I'm still without firm opinion of the proposal. DCDuring TALK 16:24, 26 October 2009 (UTC)[reply]
:P I agree with Mg. IMO keep both or else keep neither; the better choice of the two is obvious, isn't it? 50 Xylophone Players talk 15:20, 26 October 2009 (UTC)[reply]
Me too. That is silly looking ... L☺g☺maniac chat? 15:33, 26 October 2009 (UTC)[reply]
If you treat coalmine and coal mine as typographical variants rather than spelling ones, then coal mine is not sum of parts at all. Mglovesfun (talk) 11:34, 27 October 2009 (UTC)[reply]

Move to WT:AEN DCDuring TALK 17:26, 4 December 2009 (UTC)[reply]

(Note: Despite that command request, it never did move there.​—msh210 18:10, 9 December 2009 (UTC) 20:09, 9 December 2009 (UTC))[reply]
I took it as a mild poke. At the time, I thought we might actually do something, not that the problem seemed especially serious. DCDuring TALK 20:26, 9 December 2009 (UTC)[reply]

Note the existence of Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word.​—msh210 18:10, 9 December 2009 (UTC)[reply]

That vote as currently worded will allow "[u]nidiomatic terms made up of multiple words to officially meet WT:CFI when significantly more common than a single word spelling that already meets CFI". Since our only technical criterion for rare spellings and even misspellings and even typos(!) is "three hits and you're in", this will allow all sorts of phrases in, such as climb a (as there are three hits for climba: [4], [5], [6]). So I think that this vote should not be held — or, if it is, I'll likely vote against it — unless and until we have a set policy about which spellings and forms are common enough to include, different from the current one. (Technically the climba example is not a good one, as the three hits I found are in two different senses. But there are surely other similarly obviously non-admittable phrases where similarly three misspellings can be found, but in the same sense.)​—msh210 18:10, 9 December 2009 (UTC) 20:09, 9 December 2009 (UTC)[reply]

I've found one: citations:hisown.​—msh210 20:41, 22 December 2009 (UTC)[reply]

Portals

Have language portals ever been suggested here? I'm guessing yes but consensus can change. Essentially as per Wikipedia, a portal links the main categories and appendices using subframes and nice colors. Mglovesfun (talk) 20:59, 28 October 2009 (UTC)[reply]

The search function gets just one relevant result. Mglovesfun (talk) 21:10, 28 October 2009 (UTC)[reply]
Most languages have Wikipedia pages, many languages have pretty good (internal) About: pages, slowly more have Index: pages, and several have extensive grammatical appendices - maybe some kind of overview page could be created, not sure what location would be the best for it (I certainly think grammatical appendices are our biggest lack in the per-language department) - might be worth using the Index:<language> page if it's just to link to other places. Conrad.Irwin 21:17, 28 October 2009 (UTC)[reply]
Isn't that what the About pages are about? -- Prince Kassad 21:38, 28 October 2009 (UTC)[reply]
Those are for editors. I suspect Mg meant something for visitors.​—msh210 22:35, 28 October 2009 (UTC)[reply]
MG: Where would the portals appear? Surely not on an entry page? How then would a user find the portal page? Would it be a link from the L2 header, or via some banner at the bottom of each language section? DCDuring TALK 00:19, 29 October 2009 (UTC)[reply]
The French Wiktionary uses main page links to its portals, which I think is a good idea if we want to follow it. —Internoob (Disc.Cont.) 18:38, 31 October 2009 (UTC)[reply]
I don't think this is a good idea, Wiktionary seems rather short of people who care about nice colors, etc. (as evidenced by the surprising amount of indifference to the logo vote and the main page redesign). It seems to me like this would end up with some short-lived attempts to make a bunch of mostly unused pages and a lot of "to-do" things which would never get done. --Yair rand 01:21, 29 October 2009 (UTC)[reply]
Yes (per above) I mean for visitors rather than editors. I certainly think Index pages could handle this, and as Yair points out, we may end up with too much duplication from indexes/about/portals and they may end up being up for deletion. Mglovesfun (talk) 10:44, 29 October 2009 (UTC)[reply]
If this was done, I like the idea of linking all the level 2 headers to them (for all languages). The current linking to a dictionary definition seems a bit pointless. Conrad.Irwin 20:37, 29 October 2009 (UTC)[reply]
I had thought there was some reason why we were not having links in headers. Is it that we only want well-structured consistent ones, as via templates?
Presumably because it is easier to ensure they are the same, yes - and using a template or link to a specific place is harder for a new editor to learn. Conrad.Irwin 13:32, 30 October 2009 (UTC)[reply]
I thought I had seen at least one "About" page that was somewhat portal-like, but I can't locate it now. Could such a page serve or do we want a separate portal-type page? In the absence of a suitable page would the WP article be an acceptable link, or should we have/show no link until a portal page exists? DCDuring TALK 22:51, 29 October 2009 (UTC)[reply]
I think the Wikipedia article would make a good substitute, but don't remember seeing anything like what I imagine yet. Conrad.Irwin 13:32, 30 October 2009 (UTC)[reply]
IMO this is such a good idea that we ought to implement the heading idea using the WP articles ASAP, perhaps for "smaller" languages for which we are unlikely to develop our own portal in the near term. We can work through the corresponding article to make sure that there are wikilinks back to us to supplement the "back" button. Do any of our bots need to be amended to not flag the L2 heading link? Should we wait for templates? A small-scale experiment/demonstration/implementation would be useful before a vote, if a vote is required, or before large-scale implementation. DCDuring TALK 15:08, 30 October 2009 (UTC)[reply]
I just remembered the possible problem: Links in headers make for bad section links. We could get round this by defining a new link inside a template within the heading - but when people click on the ToC they will then be linked to a bad link - it's not a huge problem, but certainly one worth considering. AutoFormat would likely need instruction, as would the index processing stuff that I do, as would all of the inflectobots. Conrad.Irwin 15:26, 30 October 2009 (UTC)[reply]
What happens when they link? Do they go to right place but get a confusing message? Do they go nowhere? If it gives users a navigation/confusion problem, the net advantage goes negative IMO. DCDuring TALK 15:49, 30 October 2009 (UTC)[reply]
AFAIK, they do not work at all. -- Prince Kassad 18:44, 31 October 2009 (UTC)[reply]
I remember something mentioned in the Grease Pit a while back about how section links often have the same name. "#Etymology_2" for instance could refer to the first instance of ===Etymology 2=== or to the second instance of ===Etymology===. Perhaps with a single implementation we could kill two birds with one stone? Just a thought. —Internoob (Disc.Cont.) 18:38, 31 October 2009 (UTC)[reply]
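
The collision described above can be shown with a deliberately simplified model of anchor generation. This is an assumption for illustration, not MediaWiki's actual code: spaces become underscores, and a repeated heading gets a numeric suffix.

```python
def naive_anchors(headings):
    """Simplified anchor scheme: spaces -> underscores, repeats get _2, _3, ..."""
    seen = {}
    anchors = []
    for heading in headings:
        base = heading.strip().replace(" ", "_")
        seen[base] = seen.get(base, 0) + 1
        if seen[base] == 1:
            anchors.append(base)
        else:
            # second "Etymology" becomes "Etymology_2", which may already exist
            anchors.append(f"{base}_{seen[base]}")
    return anchors


print(naive_anchors(["Etymology", "Etymology 2", "Etymology"]))
```

Under this model the heading "Etymology 2" and the second "Etymology" both end up as "Etymology_2", so a section link to that anchor is ambiguous.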
(Unindent back to Yair rand's comment) It could be done. How many contributors would we need to make and maintain a portal? If MG and I expend the same effort on a portal that we spend on our user pages, a Portal:French would not look too shabby. —Internoob (Disc.Cont.) 18:38, 31 October 2009 (UTC)[reply]
Another thing is, that once a portal is "done" (that is, well formatted) it doesn't need much updating. It might only need a dozen edits in a year; things like new appendices. And yes L2 headers is the most logical way to do it. Mglovesfun (talk) 06:15, 1 November 2009 (UTC)[reply]
I just noticed that in the proposed Main page redesign there is a section called "Languages of the world" which currently links to language indexes. If portals are implemented, and the redesign ever gets done, it would probably make sense to have that section link to portals. --Yair rand 20:56, 4 November 2009 (UTC)[reply]

What exactly are the proposed Portals designed to do? Specifically, what would they do that isn't already being done in the language categories, indexes, and About pages? --EncycloPetey 00:14, 8 November 2009 (UTC)[reply]

I'm assuming from the above discussion and from what I've seen of the French Wiktionary portals, that they would be designed specifically for readers, with nice colors and designs, some information on the language, links to categories, and maybe some links to WM projects in that language. I think this could be useful if there actually is momentum to do this, but not if we're going to end up six months from now with only three or so portals. --Yair rand 16:40, 8 November 2009 (UTC)[reply]

I'd like to draw your attention to the fact that we should now add macrons to the English alphabet (ā in a#Derived_terms...). In fact, w:en:English_words_with_diacritics discusses some exotic characters imported into English, and I'd like to thank Hippietrail for currently demonstrating their importance in Category:English_words_spelled_with_macrons. Apart from that, we are making the same observation for French on fr.wikt in parallel. JackPotte 15:21, 3 October 2009 (UTC) Move to WT:AEN DCDuring TALK 17:21, 4 December 2009 (UTC)[reply]

Most of these map neatly to WikiHiero syntax, AFAICT, but there are some that don't. In particular, there are two sets, "NL" and "NU", that don't seem to be supported by WikiHiero at all. Does anybody have a notion of what these are, or of where more information on them can be found? Are they in the Gardiner list, or do they come from another source? -- Visviva 06:36, 4 October 2009 (UTC)[reply]

Style guide for prefixes and suffixes

Do we have a style guide for prefixes and suffixes? If we don't, should we have? What would be the recommended definition line, for example:

  • Creates nouns from verbs.
  • Forms nouns from verbs.
  • Used to form nouns from verbs.

--Panda10 13:41, 4 October 2009 (UTC)[reply]

Not as far as I know, but we should. I favor approach 3, preferably wrapped in {{non-gloss definition}}. Where a gloss is possible, this can go at the front of the definition, like so: "Become. Used to form nouns from verbs." On the other hand, 1 & 2 are shorter and -- because of the polysemy of used to -- clearer. So I dunno, actually. If a consensus exists, it could perhaps be documented at Wiktionary:Style guide, which is still a bit of a mess at present. -- Visviva 15:00, 4 October 2009 (UTC)[reply]
I'm not using any of those approaches in Latin, because it would be repetitive. See -icus, where there are three meanings, but all are used for the same pattern of word formation. I prefer the definition line to be a definition line, and to have the POS constructive information in a Usage notes section. --EncycloPetey 15:43, 4 October 2009 (UTC)[reply]
That is definitely the best approach I've seen. There may be some cases that call for a different approach, but overall that seems like an excellent template. -- Visviva 16:27, 4 October 2009 (UTC)[reply]
I like it too, especially this:
  • Instead of saying "Creates nouns from verbs" it probably sounds clearer "Added to a verb to form a noun"
  • Wikilinking verb and noun will highlight the parts of speech.
Other thoughts:
  • I still prefer providing examples just below the definition. This is what we do for other entries.
  • In the examples, it feels more natural to me to provide the base word first, then the derived word:
turista turistaként
  • I can't always translate the suffix with a single English word and I have to describe it with a sentence.
Maybe we could provide several recommended models in the style guide. Thanks. --Panda10 16:45, 4 October 2009 (UTC)[reply]
The examples on -icus were chosen to show formation from three different parts of speech, rather than to illustrate the meanings. I agree that having examples with the definitions could be good, but that's where we normally put quotations or example sentences showing the term itself in action. So, I'm not overly fond of the idea of putting something else in that location. --EncycloPetey 19:59, 4 October 2009 (UTC)[reply]


What’s happened to the mammal template? It hasn’t been working lately. I’ve tried to use it in кабарга, but it has vanished. —Stephen 02:14, 7 October 2009 (UTC)[reply]

[8] -- Prince Kassad 12:58, 7 October 2009 (UTC)[reply]
Oh, now I recall someone removing all the mammal categories from the Taos language. So we just don’t have mammals anymore. What’s the difference between mammals and politics or religion? What categories are going to be kept, and how does one determine them, other than trial and error to see what connects to a working template? —Stephen 14:29, 7 October 2009 (UTC)[reply]
Category:Mammals hasn't gone anywhere, just the context tag. The argument, I believe, is that "mammal" is not a context, which is true enough but IMO just shows that we need an additional kind of label. -- Visviva 06:54, 9 October 2009 (UTC)[reply]
I don’t understand what you mean, but if we are keeping the category, then it would seem that the gripe was with the way the template displayed. The proper thing to have done was to change the template output to whatever "not a context" means, while retaining the category. It was a huge loss when the mammal category was completely stripped from the Taos lexicon. There is no one here who is working on Taos anymore, so this is permanent damage. Whoever did it should go back and insert the category in all of those words from which he removed it.
Ideally, we should restore the template and change the display to please whomever it was that took offence. That would make adding these tags so much easier. The way it is now, we need to have a dedicated context expert who understands what this is about to add all the contexts for all the languages where called for, and to add categories when contexts are not wanted (that sounds like gibberish to me). —Stephen 23:57, 9 October 2009 (UTC)[reply]
I wholly agree that before a template for a pseudo-context label is removed, all its entries should be placed into the corresponding category using [[Category:...]]; the categorizing effect of a deleted pseudo-context template should not be lost.
Better yet, a pseudo-context template such as {{fish}} can be kept; the goal of distinguishing topical context templates from topical categorizing templates can be achieved by removing the display from the latter, so that the latter do not show, say, "(fish)". It is nice to see in the wiki code to which of the senses the given topic category applies. So for instance, for the not-yet-deleted template {{fish}}, the template {{context}} in its guts could be passed a new parameter "pseudo-context=1"; {{context}} would show no label when pseudo-context is 1. Or {{fish}} could have {{pseudo-context}} in its guts to make it more explicit that it is a mere pseudo-context template. Or whatever else we see fit. In the end, whether a categorizing pseudo-context template should show a label or not can be made user-customizable using CSS classes. I for one would be happy to see "(fish)" on the definition line of salmon: "(fish) One of several species of fish of the Salmonidae family". --Dan Polansky 07:20, 10 October 2009 (UTC)[reply]
Well, I reckon there are two or three editors who can edit {{context}} without breaking anything; if you can get one of them on the case, I personally would have no objections. But the fact remains that context and meaning are quite different things, and it does no service to our project or our users to lump them together. Consider: if these continue to display, then when we do have a fish name that is used only in technical contexts, we would then have the odd lead-in "(ichthyology, fish)". Or worse, "(fishing, fish)".  :-)
In view of this, I would rather that we used a separate template entirely, maybe placed at the end of the sense line. This could be much simpler than {{context}}, because it wouldn't need to display in any fancy ways (or at all). It could perhaps generate an HTML anchor that would allow direct linking to the sense. It could take a string of identifiers, maybe combining into a short gloss: {{label|fish|of the family |Fishofishea}} -- thus categorizing into Category:Fish and Category:Fishofishea, if present. Or something like that. -- Visviva 02:49, 11 October 2009 (UTC)[reply]
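The behaviour proposed for a separate {{label}} template can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the rule that an argument categorizes only when a matching category already exists, and the optional display flag are all hypothetical, and real template code would be written in wiki markup rather than Python.

```python
def expand_label(args, existing_categories, display=False):
    """Sketch of a hypothetical {{label|...}} expansion.

    args: the template arguments, e.g. ["fish", "of the family ", "Fishofishea"]
    existing_categories: set of category names that exist on the wiki
    display: whether the combined gloss should be shown on the sense line
    """
    parts = [a.strip() for a in args if a.strip()]
    categories = []
    for part in parts:
        # Assumption: category names are the argument with its first letter capitalized.
        candidate = "Category:" + part[:1].upper() + part[1:]
        if candidate in existing_categories:
            categories.append(candidate)
    # The label can be hidden entirely; only the categorization remains.
    text = "(" + " ".join(parts) + ")" if display and parts else ""
    return text, categories


known = {"Category:Fish", "Category:Fishofishea"}
print(expand_label(["fish", "of the family ", "Fishofishea"], known))
print(expand_label(["fish", "of the family ", "Fishofishea"], known, display=True))
```

Connecting words such as "of the family" simply fail the category lookup, so they contribute to the gloss without categorizing anything.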
(Unindent, re Visviva) Sounds good to me; I admit that leaving {{context}} uninvolved and going with {{label}} or another separate template seems better.
I'd prefer to have the label at the beginning of the sense line, though. Again, whether the label should be displayed can be made customizable. Also, the categorizing labels can have a slightly different formatting, and there can be a tooltip indicating that it is a categorization label rather than a restricted context label. In any case, I think the pseudo-context templates such as {{fish}} should better be left undeleted until this is resolved. --Dan Polansky 08:24, 11 October 2009 (UTC)[reply]
I think {{mammal}} should show (zoology), but categorize into Category:Mammals, the way {{rivers}} shows (geography), and categorizes into Category:Rivers. --Vahagn Petrosyan 12:40, 10 October 2009 (UTC)[reply]
But this was the problem in the first place; most mammal names are not technical zoological terms. "Geography" is at least ambiguous; nobody is quite sure whether it refers to the everyday field of knowledge, or to the specialized work of geographers. "Zoology" (and likewise "ornithology" etc.) has no such ambiguity; it refers to a specific technical field, so putting such a label on dog or elephant (or their language-X equivalents) is actively misleading. -- Visviva 02:49, 11 October 2009 (UTC)[reply]

Translingualness of letters

I thought we'd be able to come up with some guidelines for which letters are Translingual. Looking at Appendix:Alphabets, here is the current state of affairs (there are obviously some scripts missing here, but these are the main ones):

  • Translingual: Latin (mostly), Cyrillic (mostly), Braille, Greek, Armenian, Gothic
  • Individual language: Arabic, Carian, Georgian, Hebrew (except א), Lycian, Lydian, Phoenician (except 𐤔), Thai

I propose that letters be considered Translingual if they are used in multiple languages but excluding direct ancestors. I'm not sure if that wording is clear, but I mean to exclude cases like Greek where mostly the same alphabet was used for the temporal divisions of Greek (divided on en.wikt as Modern Greek, Ancient Greek, and Mycenaean Greek). Usage, as opposed to a mention, of a letter would be interpreted as usage in constructing native language words. An English textbook describing the pronunciation of would not count as usage in English.

What do people think of this criterion, is it too restrictive or not restrictive enough? I think it meshes well with the decision to have Translingual entries for Han characters. If implemented it would cause Greek, Armenian, and Gothic letter entries to be converted to individual language entries. There may still be some Translingual entries (possibly for Translingual-senses of Greek letters) but the simple letter sense would be language specific.

The Unicode encoding details of characters is information that I think is separate from language concerns (such as definitions, inflections, or usage). I think we should be able to come up with a format to display this information without resorting to Translingual entries as we have in some cases. Possibly a template like {{see}} that sits above L2 headers.--Bequw¢τ 16:33, 7 October 2009 (UTC)[reply]

I like the criterion, but I would prefer to reverse it: a character/symbol/letter is Translingual unless it is used only in one language, or only in different historical phases of one language. Otherwise I don't know what we would do with things like 𐇐. Also if we are non-Translingual by default, it will be difficult to find a rationale for the translinguality of some symbols that are simply non-linguistic (e.g. arcane mathematical symbols), but that may only happen to be attested in English writings. -- Visviva 05:32, 8 October 2009 (UTC)[reply]
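The reversed criterion can be written down as a small check. This is a sketch only: the lineage table is a hypothetical stand-in for whatever data would record that several language headers are historical phases of one language, and the function names are invented for illustration.

```python
# Hypothetical table mapping a language header to the lineage it belongs to.
# Languages not listed are treated as their own lineage.
LINEAGE_ROOT = {
    "Modern Greek": "Greek",
    "Ancient Greek": "Greek",
    "Mycenaean Greek": "Greek",
}


def lineage(language):
    """Return the lineage a language belongs to (itself if not in the table)."""
    return LINEAGE_ROOT.get(language, language)


def is_translingual(languages_using_character):
    """Translingual by default: a character is language-specific only when
    every attested use falls within a single language or a single lineage."""
    lineages = {lineage(lang) for lang in languages_using_character}
    return len(lineages) != 1


print(is_translingual(["Ancient Greek", "Modern Greek"]))  # one lineage only
print(is_translingual(["English", "French"]))              # several languages
print(is_translingual([]))                                 # non-linguistic symbol
```

Making Translingual the default means a symbol with no attested language at all, such as a purely mathematical sign, does not need a special rule.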
I have been fiddling with something at Template:character info. It could go above the first L2 (though it would still float to the right of the first section). It's designed to be customized for specific cases (e.g. {{Vai character info}}). I'm thinking that the specific data we might want to include would vary from one code block to another. For all characters we would probably want to provide
a) codepoint linking to further technical information, either on Unicode.org (which is problematically information-poor) or FileFormat.Info (which is problematic, since it's just some guy with a website AFAICT). I assume we don't want a Wiktionary entry to include things like how to represent the character as an HTML entity or in UTF-8 et al.; but we do want to show the reader where to find this information.
b) a link to a root node for the set. In most/all cases this would be a subpage of Appendix:Unicode. This can provide general information about fonts, etc., and a link to the official description of the block on Unicode.org.
c) links to the previous and next characters in the block. Clickiness is always good. Individual language sections might also have their own navbars, if the traditional order varies from the Unicode order.
d) links to other characters with which this character might be confused (this can simply continue to use {{see}}, I think)
e) a canonical image of the character, if available
For some characters we would want to provide additional information, such as presence/designation in other standards, decomposition, keyboard entry, compatibility forms, etc. For very special cases like the Egyptian hieroglyphs, it may be necessary to break this information out for more extensive treatment a la the Han characters. But in most cases I think an infobox would be sufficient. -- Visviva 05:32, 8 October 2009 (UTC)[reply]
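As a rough sketch of the data items a) and c) above would need, the standard Python `unicodedata` module can supply the code point and character name; the block boundaries are assumed to be supplied by the caller, and the function name and returned field names are hypothetical rather than anything {{character info}} actually uses.

```python
import unicodedata


def character_info(ch, block_start, block_end):
    """Collect infobox-style data for one character.

    block_start and block_end are the first and last code points of the
    character's Unicode block, passed in by the caller (an assumption of
    this sketch; they are not looked up automatically).
    """
    cp = ord(ch)
    if not block_start <= cp <= block_end:
        raise ValueError("character is outside the given block")
    return {
        "codepoint": f"U+{cp:04X}",
        # Fall back to a placeholder if this Python build has no name for it.
        "name": unicodedata.name(ch, "<unnamed>"),
        # Previous/next are optional: None at the edges of the block.
        "previous": chr(cp - 1) if cp > block_start else None,
        "next": chr(cp + 1) if cp < block_end else None,
    }


# Basic Latin runs from U+0000 to U+007F.
print(character_info("A", 0x0000, 0x007F))
```

Returning `None` at the block edges matches the suggestion that previous/next links be shown only when they are available.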
I like it. We might want to make the "Next"/"Previous" links optional. In some areas of Unicode there's no real ordering and so this info isn't really useful there. In other places, the page might be very full, and a Translingual section might elaborate on where the entry fits in some order (like an alphabet) already. In this case we might reduce duplication, and save on RHS space. Looks nice though. It might be hard, but very useful, finding images for many of the symbols. --Bequw¢τ 02:16, 9 October 2009 (UTC)[reply]
I have reformatted it a bit, so the next/previous links will appear only if supplied. I was thinking of them mostly as a browsing feature (so that someone could flip through all the characters in a block, if they were so inclined). Please feel free to make any other modifications that seem appropriate. As with anything I create, I live in the hope that someday, someone who understands layout and design will fix everything up. :-)
Commons only has a smattering of character images at present (CJK excepted), but it looks to me like these fonts should all be Wikimedia-compatible. (And fantastically, that appears to include glyphs for most/all of the Egyptian Hieroglyphs block). Plus SIL has free-as-in-freedom fonts for Vai and some other scripts. So image creation should mostly be a mechanical (albeit time-consuming) task. -- Visviva 06:42, 9 October 2009 (UTC)[reply]
Two full sets now using {{character info}}: Vai (not illustrated, yet) and Phaistos Disc (illustrated). Feedback welcome, before I go crazy and start creating 50,000 of these. -- Visviva 02:53, 11 October 2009 (UTC)[reply]
It's important to enforce one of the script templates in the appendix. For example, symbols in Phaistos Disc show up as squares in Opera and IE, even though I have the Aegean font. {{Linb}} would work for Phaistos. --Vahagn Petrosyan 10:56, 11 October 2009 (UTC)[reply]
This should be sorted now, at least for these two. For the others, I've made some rough guesses as to which script template is appropriate for (most of) each Unicode block and created corresponding subtemplates of {{unicode scripts}}. However, some of the Appendix:Unicode pages won't be affected by this until their format is updated. -- Visviva 06:23, 13 October 2009 (UTC)[reply]

Phaistos Disc symbols: Translingual or Undetermined?

Just wondering what the best language header would be for the characters in Appendix:Unicode/Phaistos Disc. Which do we prefer, 𐇐 ("Translingual") or 𐇑 ("Undetermined")? -- Visviva 08:02, 9 October 2009 (UTC)[reply]

FWIW, I'm leaning strongly towards Undetermined. If there are no objections, I will probably go ahead with that, and see how it works out. (The category name "Undetermined symbols" is a bit troubling, but these should probably have a more specific inflection template anyway.) -- Visviva 15:00, 9 October 2009 (UTC)[reply]

Seem to be Undetermined to me, any real use in another language would be a quote off the disc (though no doubt they'll become popular as nifty characters in their own right). Conrad.Irwin 00:03, 10 October 2009 (UTC)[reply]
OK, I have started creating these as ==Undetermined==. We'll see if anybody screams. AFAIK this is the first use of this L2 header on Wiktionary; the existing calls to {{und}} were all from etymologies. Nor can I think of any other cases offhand that would call for this -- I think Herodotus and Strabo have a few words from languages that don't map to any known language, but I'm not sure if (or how) we'd want to include those. -- Visviva 09:25, 10 October 2009 (UTC)[reply]

Štambuk v Štambuk

Let's listen to what Mr Štambuk himself has to say on Bosnian, Croatian, Serbian, Montenegrin and "Serbo-Croatian", shall we?

In Talk:Macedonian:

[...] I'm aware that Lunt published the first grammar of Macedonian just about ~50 years ago, but look - Bosnian language was codified less than 15 years ago; does this fact invalidate rights to Bosniaks to their own standard language/literary language, which they were unable to do due to inadequate political climate in ex-Yugoslavia(s) surrounding the question of the existence of their ethnos? I don't think so. [...] --Ivan Štambuk 12:08, 31 January 2008 (UTC)[reply]

In WT:RFD/O (see this version, the section is no longer in the current version):

Serbo-Croatian macrolanguage officially died with communist Yugoslavia and sh ISO code is now deprecated. With bs/hr/sr babelboxen available, I see no reason for keeping this other than for insultive/propaganda purposes. --Ivan Štambuk 18:21, 4 June 2008 (UTC)[reply]
[...]
Bogorm, so-called "Serbo-Croatian" is nothing but a political label that in practice meant nothing. Most modern Croat historians would argue that "Yugoslavism" that brought us the "common language" was nothing but the most elaborate and refined phase of Greater Serbianism. I can guarantee you that no Serbian speaker can write literate Croatian. Do you find it funny that today people who claim to speak "natively Serbo-Croatian" here and on WP have but -3 and -4 proficiency in Croatian listed in their Babel boxes? This "separatism" you speak of is bizarre, Croat writers were calling their mother tongue Croatian centuries before communist brought us the SC thingie, and Serbs continued to have in the constitution "SC" listed as an official language up until 1997 in unsuccessful Yugoslavia #5 even after Bosnia and Croatia declared independence. In practice people who claim to be speakers of "SC" are in 99% cases Serbs or Serbophiles who use it as a political manifest with a very precise meaning, especially when they couple it with hr/sr babelboxen (why choose them at all when hr=sr=sh?). Believe it or not, the notion of "SC" is offensive nowadays to most Bosniaks and Croats, and it would be to you too if you were more educated in language policy, propaganda machinery and fabricating history that was essential ingredient of Yugoslav politics (always at the expense of someone else's heritage..). Also, a common myth you mention - there is no "linguistics justification" for SC either; as a standard language it's dead (it never really existed because two "varieties" of Eastern and Western that were recognized roughly correspond to modern notion of standard Croatian and Serbian, post-90s changes aside), and the notion of "SC" as a "collection of dialects" is completely arbitrary as the whole South Slavic area forms a dialect continuum (i.e. Čakavian+Kajkavian+Štokavian+Torlakian do not constitute a "genetic node", and neither does the South Slavic branch either as it's just a geographical designation, just like West and East Slavic too). The whole thing is a bit more complicated than b/w picture you boil it down to. Once again, saying that one speaks "SC" is nothing but a political manifest, by modern conceptions anachronistic, obsolete and potentially offending to some. Why insist on it when alternatives are available? --Ivan Štambuk 22:45, 15 August 2008 (UTC)[reply]
[... I'm eliding the other half of the discussion to focus on Štambuk's comments and views, see the original page]
I can't relate much the relevance of the content of your comments to this discussion. I'll list my points again so they don't get obscured by your irrelevancies:
  • "Serbo-Croatian" as a standard language existed only in a Communist State called SFRJ, describing two abstract, literary forms of "Eastern" and "Western" variant that roughly correspond to Serbian and Croatian today. So even then - it was not a "unified" language.
  • "Serbo-Croatian" in a sense of "collection of dialects spoken by Croats and Serbs" is a completely arbitrary classification of dialects, and not a valid genetic node in Slavic language branch. No Slavic historical linguistics handbook I know of claims this, and I do know of several works written by experts that prove otherwise. Moreover, there are no isoglosses that connect all Čakavian or Kajkavian speeches (two major dialectal groupings of the area), hence no "Proto-Čakavian" or "Proto-Kajkavian" - their last common ancestor that is reconstructable by comparative method of historical linguistics is Proto-West-South-Slavic or Proto-Slavic proper. Since everybody speaks their own local idiom, and not literary language (95% of speakers don't even know the proper accents, rules for which can be extremely complex), there is no reason to put onto them a term which in practice denotes something completely different (standard macrolanguage with 2 variants of Communist SFRJ)
  • "Serbo-Croatian" never existed in the past as a single literary tradition, or a state, before it was artificially made to become so (unsuccessfully though). Croats and Serbs have had their own, separate literary traditions, spanning centuries before SC ideology appeared, written on various dialects and scripts. For example, the "father of Croatian literature" - Marko Marulić, wrote in Croatian Čakavian dialect not spoken by any Serb. Serbs have up until the acceptance of Karadžić's reforms in 1868 used Serboslavic - obscure mixture of Church Slavonic and Russian unintelligible to common folks - as their literary language. No Croatian literary historian has ever had pretensions to claim this Serboslavic writings.
  • "Serbo-Croatian" ultimately became political dogma, upon which Communists enforced Serbian words onto Croats, and wiped words with centuries of literary tradition and those that were even very much spoken from the dictionaries just because they were not used by the Serbs (i.e. they were in Croatian-only dialects). Everything Croat-only was systematically demonised by the Communists as "Ustasha", punishment for which was political persecution and humiliation. For example, the entire editio princeps of current official Croatian orthography book Babić-Finka-Moguš was burned by the communists in 1971 just because it had "Croatian" in the title, and not "Serbo-Croatian".
  • The term "Serbo-Croatian" is still very much cherished by Serb-side, because the myth of it legitimizes the territorial-cultural pretensions. E.g., while Serbs were bombing Dubrovnik in 1991-92, they published propaganda books such as Srpstvo Dubrovnika ("Serbdom of Dubrovnik") in which history was fabricated to legitimize annexation of Croatian territory to Greater Serbia. (Velika Srbija, to znači ujedinjena Srbija! - "Greater Serbia means unified Serbia!") - published by Vojna štamparija "Military publishing". To this very day the Faculty of Philosophy in Belgrade teaches Ivan Gundulić and other Old Croatian writers that explicitly identified themselves with Croatdom and called their language Croatian as a part of some "Serbian Renaissance and Baroque". Too bad the rest of the world doesn't share the same views..
  • Ultimately the "SC" epoch was highly detrimental to Croatian literary language tradition, as in the crucial period an abrupt cut was made with history. When in the 1990s lots of banished words or those "dormant" ones were revived to standard language, lots of left-wing pro-Communist media ridiculed it as "trying to separate Croatian from Serbian". You won't believe how many times I've heard that words like zrakoplov and brzojav were invented during NDH (they're in fact at least half a century older). When the common dictionary of "language of literature" (književni jezik) was compiled by Matica hrvatska and Matica srpska it was not only criticised for being amateurish and in many definitions wrong, but it completely ignored a large corpus of Croatian literature tradition. Some 10-20k terms for some very basic terms which were used by 15th-19th century writers nowadays don't appear in Croatian standard language as a result of it.
Bogorm, I understand your predilection for some myths of the past as they presumably reflect a state of affairs much "brighter" than after the 1990s, a land of "milk and honey" and "brotherhood and unity", but at the same time there are millions of people that abhor the Communism demon for some very good reason. 1990s wars is a direct result of Communist not being active in educating common people to come to terms with what their ancestors did in the past (the official politics was "collective amnesia"; imagine what would Germany look like nowadays had de-nazification not occurred actively by academics and the media?). The term "Serbo-Croatian" epitomizes lots of bad things that happened in Communist SFRJ in the minds of lots of people, not strictly language-policy (which itself was bad reason enough), and insisting on it just because somebody living in a balloon imagines that the next Yugoslavia (#6, there were 5 unsuccessful previous attempts believe it or not) will be the one, bringing economic prosperity and Pan-Slavic commonness to the impoverished people would be utilizing this project for someone's personal political fantasies. The term is in all possible sense (linguistic, regional, political) obsolete, anachronistic, misleading and politically incorrect, and there is no reason to insist on it, esp. when the only justifications of it presented are the same myths that coined and enforced it. --Ivan Štambuk 15:31, 18 November 2008 (UTC)[reply]

Very well said. Štambuk is very intelligent, very knowledgeable, and has explained here the problems with both the term "Serbo-Croatian" and the nonsense of pretending they are one language, or of trying to force convergence.

So what happened?

It turns out to be uncomplicated. You are (all too) familiar with his pattern of extreme personal abuse of anyone who disagrees with him on anything; what you may not fully understand is that this pattern is not new. He has had problems on the Croatian projects for years, having at one point been blocked for a year on the hr.wp (and now permanently banned there.) He seems to see himself as the victim of some sort, failing to see his own constant abusive behaviour.

On February 26 of this year, someone had once again had enough of his abuse, and blocked him on the hr.wikt.

On February 28, he moved Wiktionary:About Serbian to Wiktionary:About Serbo-Croatian and set about editing it to change the language(s) to Serbo-Croatian. (Interestingly, he created redirects for about-Bosnian, and about-Montenegrin, but not Croatian.)

He then set out, fairly sporadically at first, to delete Croatian (and the others when present) from the wikt, replacing it with SC. It had nothing to do with "reducing duplication" or any other arguments that would come later. It was all about—and still is all about—removing Croatian from the en.wikt. (Hence his vociferous objections to having a brain-damaged bot restore all of the deleted sections, that would defeat his entire purpose.)

It isn't that he changed his mind about "nationalism", or about the linguistics. He still holds the positions stated eloquently above, he knows that "Serbo-Croatian" is offensive, insulting, and linguistic nonsense. That is precisely why he is pushing it. His objective is to take revenge on the people in the Croatian WM projects, by doing severe damage here.

He is trolling.

That, btw, explains (if you have noticed through all the noise), why his "arguments" for SC reduce to screaming that the languages are "99.9% identical", interwoven with constant personal abuse. What else can he do? And the noise level itself adds to the obfuscation, leading to some people thinking it is some sort of he-said-she-said argument, when it is not. (See Vaughn's somewhat amusing suggestion of a duel.)

I don't think he intended to go this far, but it has gotten way out of hand, and he doesn't know how to "walk back the cat". All of this is on the permanent record, and having "notorious wikipedia vandal" on your resumé is not a good career move.

From his statement above, for emphasis:

The term "Serbo-Croatian" epitomizes lots of bad things that happened in Communist SFRJ in the minds of lots of people, not strictly language-policy (which itself was bad reason enough), and insisting on it just because somebody living in a balloon imagines that the next Yugoslavia (#6, there were 5 unsuccessful previous attempts believe it or not) will be the one, bringing economic prosperity and Pan-Slavic commonness to the impoverished people would be utilizing this project for someone's personal political fantasies. The term is in all possible sense (linguistic, regional, political) obsolete, anachronistic, misleading and politically incorrect, and there is no reason to insist on it, esp. when the only justifications of it presented are the same myths that coined and enforced it.

Quite so. Robert Ullmann 11:26, 23 October 2009 (UTC)[reply]

I nearly reverted this actually. Who cares. Why don't we all work on the f--king dictionary. Mglovesfun (talk) 11:29, 23 October 2009 (UTC)[reply]
I don't get it either. Robert, what's your point? That Stambuk contradicts himself? Maybe that would be relevant in a post about Serbo-Croatian, but this is not trying to make any constructive argument. It's attacking Stambuk directly, and we don't tolerate behavior like that. It could all be true, and everything that Stambuk said in reply irrelevant, but that wouldn't change the fact that you're the aggressor. In his response I mainly see Stambuk defending himself, and I think it's shameful that we would allow the question to be framed as such and then talk about Serbo-Croatian like there wasn't the elephant of an ad-hominem in the room. If you have an argument to make about Serbo-Croatian then make the argument. Next time you have a complaint about someone who disagrees with you, you should think to only complain about something actionable like behavior. Please let us know if there's something pertinent to dictionary entries that concerns you, but it doesn't look like you have anything legitimate to complain about with regard to Stambuk's recent edits, at least not anything since the last vote. As long as he abides by consensus with entries, and as long as he sticks to the ground rules, he can make all the fallacious and contradictory comments he likes. We can ignore him or rebuke all his points as long as we stick to the rules as well, which means arguing the points. I don't think there's a better example of breaking that than by starting a new section on the reasons some contributor's beliefs shouldn't be trusted. Coming from anyone else, these comments may have been reverted immediately. Mglovesfun would be correct to have done so. Even Wonderfool would have been correct to revert you, it's so out of line. Really, do you have a legitimate point? Then please argue that instead. DAVilla 05:42, 9 November 2009 (UTC)[reply]
I care. My support for "Serbo-Croatian" is due in large part to Ivan's claim that his earlier opposition was grounded in Croatian nationalism; and I'm sure I'm not alone. Ivan is by far our most knowledgeable editor on the subject. If Robert is right that Ivan is trolling, and/or using us for some sort of anti-hr.wikt revenge, then that's a BFD. —RuakhTALK 19:34, 23 October 2009 (UTC)[reply]
The Serbo-Croatian unification proposal has absolutely nothing to do with Croatian Wiki-Community, or my alleged "revenge" against it. That is simply one of those ad-hominems Ullmann is inventing against me. Ullmann kind of really "hates" me now, and given that he has no knowledge to confront the irrefutable logical soundness of the Serbo-Croatian unification proposal in linguistic terms, he must resort to trolling and ad-hominems. According to his theory, I in fact still believe that these are "different languages!". If it is really the case that I want to "take revenge", why would I be bringing the SC unification proposal after 2 years of creating independent B/C/S entries, and why would I be ardently defending the oneness of SC on some other unrelated projects (namely the English Wikipedia), where there is not a single Croatian Wikipedia contributor (I only run into Kubura from time to time)? Ullmann's accusation simply doesn't make sense at all, and is more of a deliberate argumentative diversion from the real dispute (i.e. the big list of reasons why the proposal is actually beneficial, regardless of the alleged motivations of one of its supporters). --Ivan Štambuk 19:53, 23 October 2009 (UTC)[reply]
And this suggestion of a duel is not such a bad one. to the victor belong the spoils! --Volants 11:43, 23 October 2009 (UTC)[reply]
Ullmann, you're the king of the trolls. It's absolutely f* amazing what you wouldn't try to get me blocked on this project. "Constant abusive behavior" - Sweet Jesus. The only person I see feeling "abused" is yourself, which is absurd after all the disgusting lying defamation you've done with the DailyKos post and Wikimedia Board e-mails.
I got blocked on Croatian WP because it was at that time literally owned by two maniacal *****s that up until some months ago were bureaucrats (imagine that!), but are now desysoped and banned from the project without anyone raising so much as an eyebrow, after the community finally came to realize how insane their off-wiki machinations were (I'd rather not mention all the details in public; there was some really ugly stuff going on. Ask your fellow Connel, he knows some of the details ^_^). During the year I spent at Croatian Wikipedia I witnessed about a dozen superb contributors being either blocked or permanently driven away by their delicate abuse of admin privileges. Basically everyone who wasn't a Croatian nationalist, or who dared to touch the subjects that were somehow perceived as some kind of "national treason", was driven away. There are reports on meta on all this shit. I have no desire to explicate the billion details that accumulated over the years to you now, and it doesn't matter anyway.
The reason why I got indefblocked recently is not that I'd broken any rule or something, but that the nationalist clique, out of thin air (initiated by one of the banned ex-bureaucrats, Roberta F.; that everything was well-organized can be seen from the fact that votes were collected in a matter of minutes), decided to "vote" to indef block me after I tried to reason with them in Kafić (I was 100% polite and civilized in that discussion, as everyone who can read Serbo-Croatian can see for themselves, and received plenty of unsanctioned abuse BTW) on how SC unified treatment on Wiktionary is not something reflective of "Communist oppression", or the "next step towards the obliteration of bs/hr/sr wikiprojects", or something similar that they paranoidly imagined. But, after that vote I kept my mouth shut because I had a feeling that there was a big change coming with regard to the ArbCom vs. 2 bureaucrats issue (plus it did no damage at all because I don't contribute at all to Croatian Wikipedia, only to English-language wikiprojects for 2.5 years now, with the exception of Croatian Wikisource where I put material I need for English Wiktionary as citations). In other words, my recent indefblock was not because I was "abusive" or something (as I said, I decked all my logically incontrovertible verbiage in the related discussion in Kafić with flowers of civility :P), but because they simply couldn't stand my presence. I cited them the most competent living Croatian Slavist Mate Kapović, who wrote in a nationalist magazine Kolo published by Matica hrvatska (the central Croatian cultural institution) that "dialectally, Croatian and Serbian are of course the same language" - which of course created an "outrage", and the "defense" that languages are not merely "linguistically defined" yada yada.....
On February 28, he moved Wiktionary:About Serbian to Wiktionary:About Serbo-Croatian and set about editing it to change the language(s) to Serbo-Croatian. (Interestingly, he created redirects for about-Bosnian, and about-Montenegrin, but not Croatian.) - I assure you that if I failed to create the redirect for Croatian, as you mention, it was simply because I somehow forgot to do it.
He then set out, fairly sporadically at first, to delete Croatian (and the others when present) from the wikt, replacing it with SC. It had nothing to do with "reducing duplication" or any other arguments that would come later. It was all about—and still is all about—removing Croatian from the en.wikt. (Hence his vociferous objections to having a brain-damaged bot restore all of the deleted sections, that would defeat his entire purpose.) - Nonsense, it was with regard to the modifications of the WT:ASH policy and the unified treatment contained in it, which was agreed to by all of the Serbo-Croatian contributors at the time. Your perception that this was all about "deleting Croatian" is tragical. Absolutely nothing of the information was lost. Moreover, all the merged entries were expanded, rectified and checked during the process. You, as well as the entire community, were notified of this ongoing activity.
It isn't that he changed his mind about "nationalism", or about the linguistics. He still holds the positions stated eloquently above, he knows that "Serbo-Croatian" is offensive, insulting, and linguistic nonsense. - No Freud, I actually changed my mind, believe it or not. You know, smart people always have the right to change their opinion, as opposed to stubborn fools who push the same "truths" forever, despite being proven wrong. FFS, what would I possibly gain by pushing Serbo-Croatian, and privately holding that it's not one language, and that the name is insulting? I'd have to be clinically insane to do that. Sorry, your logic is fallacious and ill.
"His objective is to take revenge on the people in the Croatian WM projects, by doing severe damage here." - LOL. You need help, dude. And stop finally using the abusive word damage - no "damage" is done, as no content is lost, and English-speaking Wiktionary users would find the merged treatment much more optimal to use. I personally don't give a flying **** on the state of mind of folks on Croatian Wikipedia :) My sole concern here is the quality of both content and presentation of Serbo-Croatian (i.e. its modern-day national standards) with respect to Wiktionary users, i.e. people like Bogorm. As a person who has some 30 000 quality edits (you know, actual content edits), I'm quite concerned that that goal not be obstructed by maliciously-intentioned individuals, as a form of trivial remedy for their personal mental issues (nationalist pride, ego..)
"That, btw, explains (if you have noticed through all the noise), why his "arguments" for SC reduce to screaming that the languages are "99.9% identical", interwoven with constant personal abuse. What else can he do? And the noise level itself adds to the obfuscation, leading to some people thinking it is some sort of he-said-she-said argument, when it is not." - And I'm the one who's trolling? ^_^ Here are my arguments, which I've stated repeatedly over and over and over again, in various places, and which are irrefutable, undeniable facts of nature which can be verified in the books:
  1. Modern standard Bosnian/Croatian/Serbian(/Montenegrin) are all standardized on the same, identical speech/subdialect (Neoštokavian) of the same dialect (Štokavian), the only (!!!) speech that is spoken by all 4 nations. This was deliberately chosen in the 19th century to bind the dialectally diversified literatures and their respective languages of our neighboring brotherly nations, to suppress Ottoman and Austro-Hungarian yoke (see Vienna Literary Agreement).
  2. As a result these have the same phonology (the same phonemic inventory), the same pronunciation (2-way accentual system), the same inflexion of nouns/adjectives/verbs (in 7 cases 2 genders 3 persons 4 tenses). That would make their grammars 99% identical. In fact, all the differences in grammar among the standards could all fit on some 2 pages of text. (And that's on standards alone, if you take into account what people actually write/speak, there are countless "Serbian" words/constructs used by Croats and vice versa.)
  3. These 4 were prior to 1990s treated as different regional varieties of one and the same language Serbo-Croatian/Croato-Serbian by all of the world's Slavists (including venerable Croatian linguists like Tomislav Maretić, Vatroslav Jagić etc.). This is the stance of pretty much all of the Western Slavists today (including the top-ones like Browne, Kortlandt, Dybo...who actively publish on Serbo-Croatian and still use that very term). If you look in the early 20th century and earlier, Serbo-Croatian/Serbokroatisch is in fact the only term you'd find in the books.
  4. From the perspective of modern linguistic science, B/C/S/M are four independent standard languages, 4 national standards of one underlying linguistic entity (call it Neoštokavian, Serbo-Croatian or whatever). This makes Serbo-Croatian a polycentric standard language, as Croatian linguist Snježana Kordić meticulously elaborates in her recent paper here.
If you find this above "abusive", and choose to ignore and dismiss it as "arguments", you're effectively demonstrating a solid amount of trollish behaviour. We all know that you don't know an iota of Serbo-Croatian, or of any other related Slavic language, and it's astonishing that you even have the courage to emit such immense amounts of BS directed against an expert on the language (me), failing to mention a single reputable source refuting any of my claims.
Ullmann, if you genuinely imagine that I'm "abusive" and "trolling" (which I don't think you are, you're simply trying to maliciously denigrate my standing in the community but I don't think that people are that stupid around here), start a confidence vote for sysops only on me, otherwise finally cease this disruptive pattern of transparent trolling because it's getting really boring retorting you and you're starting to look really pošandrcao (dunno the English term for that, ask your "new friends" to explain :P). --Ivan Štambuk 19:40, 23 October 2009 (UTC)[reply]

Please, let's restore removed entries, assume everybody's good faith, forget all this, and get back to normal work. Lmaltier 20:51, 23 October 2009 (UTC)[reply]

The real reason ye Croats are doing this is because Serbian wikipedia is moving along quite nicely and Croatian/Bosnian wikipedias are falling behind, and you guys want to take Serbian articles without doing the hard work of "translating" them.--Pepsi Lite 00:01, 24 October 2009 (UTC)[reply]
No, this is a completely untenable præsumption, since die-hard nationalists on hr and bs wiki would be incensed at words typical for the Eastern variety of SC such as poreklo or hiljada, not to speak about the Cyrillic script. Therefore it is virtually impossible for a wikipedia which would claim to be true-born Croatian to contain similar words and their expurgation would go far beyond the copy-paste process and render their alleged intention fairly intricate to realise. Furthermore Ivan is at odds with the hr wiki governance and their (nationalist) approach is not his. The most virtuous example is sh wiki where both poreklo and podrijetlo, hiljada and tisuća are accepted and a foreign reader is capable of becoming familiar with those peculiarities æquivalent to American and British English differences. It is by far the most tolerant one. The uſer hight Bogorm converſation 06:43, 24 October 2009 (UTC)[reply]
Now that is out of line and completely irrelevant to anything on Wiktionary. Knock it off, please. -- Visviva 04:33, 24 October 2009 (UTC)[reply]

Less ad-hominem section header

  • I think we all have complex motives for what we do here, including some motives that we probably aren't terribly proud of. But motive is neither here nor there, really. Ruakh is correct that revenge-trolling would be a BFD, but no plausible case has been brought against the linguistic soundness of Ivan's edits AFAICT. Even if the arguments behind them are ultimately rejected, sound edits backed up by sound arguments don't amount to trolling, regardless of the motivation.
    I'd like to go back to something that Robert said some ways above, that the absence of consensus makes the bot necessary. This seems somewhat contrary to the way we normally do things:
    1. It is a fact of life on a wiki that there are a great many issues for which no consensus exists, either for or against. We do not require a prior consensus for human edits. If we did, no progress would ever be made.
    2. We do require a prior consensus for bot runs. If we didn't, pure chaos would ensue.
    3. No argument has been made that Ivan's edits were automated. They appear to be complex, human edits done over a considerable period of time, with concurrent discussion on various policy pages.
    4. There may have been no consensus for these edits, but there was also no consensus against them, as far as I can tell.
    5. Accordingly, there is no obvious basis for automated reversion/restoration.
  • On the other hand, it would be perfectly acceptable for individual editors to restore sections in a manual or semi-automated fashion. It could be done like this:
    • A script could prepare a list of the removed sections, formatted in such a way that it would take a human editor only 2-3 clicks to add them back (after which AF could sort them into place).
    • If there is then a human editor who wants to make the hundreds/thousands of clicks involved, it's done.
    • If not, it stays not done until somebody decides to care.
  • Anyway, that's as much sense as I can make of things at this juncture. I am aware that my proposed solution will not make anyone happy; I'm not sure that's a bad thing. I have absolutely no opinion on the merits of the case(s). -- Visviva 04:33, 24 October 2009 (UTC)[reply]
    As I've stated above: I have nothing against manual adding of B/C/S(/M) sections, either on the basis of the previously merged entries, or on the basis of newly created SC entries. The most important thing is that they're checked by a knowledgeable human, since during the merger a great deal of cleanup occurred, and the blindly-restoring bot would generate thousands of entries needing human attention (such as wrong data and obsolete headers, as I've explained in the above discussion). Elephantus (talkcontribs) attempted doing this manually - restoring from history previously merged ==Croatian== sections, but after a few edits he realized how inane an activity it was, how ridiculous it looks to have a stubbish ==Croatian== entry next to a moderately complete ==Serbo-Croatian== entry with the same content, and thus he eventually started simply copy/pasting whole ==Serbo-Croatian== entries to ==Croatian==, the only changes being modifying the ISO code from sh to hr, and removing non-Croatian ==Alternative forms== when they occurred (Ekavian Serbo-Croatian). He eventually gave up on that too.
    The only proper way to handle this "issue" (the non-existence of redundant data for non-existing languages) is not by restoring rubbish from page history, but by cloning the existing ==Serbo-Croatian== entries, so that all of those thousands of entries look as silly as [[govor]]. This can be trivially semi-automated in various ways. We can also add a note to the corresponding language policy pages, for all the future WTF complaints by Serbo-Croatian learners that end up on Wiktionary, why is one language handled in 5 mostly identical sections, to seek "explanation" by Robert Ullmann, because someone has to take responsibility for this nonsense. --Ivan Štambuk 13:26, 24 October 2009 (UTC)[reply]
Ullmann has become a gigantic troll. He’s trying to do for Serbian/Croatian what he has done as the administrator for rw:Main Page. In October of 2006 that Wiktionary had about 300 entries, and today, three years later, it still has about 300 entries. The main contributor for Kinyarwanda quit to get away from RU back in 2006, and RU is doing his damnedest to make our Serbo-Croatian editors quit now. We should turn over our Serbian/Croatian project to qualified, dedicated, competent editors. I have an idea...let’s get Ivan Štambuk and Dijon to handle it. —Stephen 15:51, 24 October 2009 (UTC)[reply]
We seem to be on the brink of a reasonable solution, which Ivan seems to have accepted. It may well play out exactly as he says, which may be the most wiki-like way to achieve the desirable final outcome. I fully expect Serbo-Croatian to be superior in its coverage and entry quality to any of the other languages that had been at issue. Superior content supported by active contributors will win. It can win without preventing folks who have other reasons to want to have other language headers from contributing under those headers. Some of what they contribute might even be useful for the coverage of Serbo-Croatian. DCDuring TALK 16:24, 24 October 2009 (UTC)[reply]
We do not need to snatch destructive conflict from the jaws of reasonable compromise by stirring the pot with extraneous score-settling. DCDuring TALK 16:24, 24 October 2009 (UTC)[reply]
Ullmann wrote the preceding section, #Štambuk v Štambuk, only yesterday. That is not "reasonable compromise". Nobody is snatching destructive conflict and stirring the pot except Ullmann himself. —Stephen 16:55, 24 October 2009 (UTC)[reply]
The thing is, DCDuring, that this whole "issue" surfaced as an exercise in planned trolling where Ullmann hoped to have cast me as an "abusive POV-pusher" in order to get me blocked, with the assistance of his new "friends" from Croatian and Serbian Wikipedia (a bunch of nationalist bigots, one of which ironically even openly confessed pro-Greater-Serbian viewpoints). No one prevented anyone from (re)creating B/C/S(/M) sections from the start. The thing is that these folks are simply too lazy to do it themselves, and they'd rather see a bot do it, only to see their beloved nationalist designations on Wiktionary, regardless of the quality or the necessity of such entries. And that last thing is what I have problems with. You want every single Serbo-Croatian entry to look as ridiculous as [[govor]]? Fine by me. Perhaps one day when we have 100 000 of those there'll be a vote to eliminate all that redundant garbage, or to abstract it away in JavaScript at the presentation-level. --Ivan Štambuk 16:40, 24 October 2009 (UTC)[reply]
We may accept some ugliness in support of "wiki-ness". I doubt that there will end up being truly massive duplication (and hope there won't). If Ivan and like-minded contributors continue with high-value SC entries and allow the others to go along doing what they will, the worst that will happen is duplication, perhaps accompanied by excessive bickering. As Ivan pointed out, the silliness of that may become apparent to the related-language contributors and to the community at large.
Visviva had a reasonable accommodation. I was hoping that it would be the focus of discussion. DCDuring TALK 17:35, 24 October 2009 (UTC)[reply]
Could there be a solution while contributors still remove Bosnian, Croatian, Serbian sections? Imagine that a group of contributors begin to remove all Ido sections because of their linguistic opinions (which is not improbable). Would you forbid the use of a bot to restore them because they have been removed manually? Removing sections because of one's political or linguistic opinions is not acceptable, and removed sections must be restored. It's the only way to quieten things. Lmaltier 20:57, 24 October 2009 (UTC)[reply]
We haven't determined, as a matter of policy, that it is wrong to do so, as Visviva has pointed out. There seems to be a majority, but not a consensus, that believes that it is OK to eliminate such headers if the resident experts think it appropriate. There was insufficient consensus to make it a policy or to have such a process (or its reverse) be implemented by bot. I think it would be wise not to eliminate material under any header for a living language systematically, whether by policy, by bot, or manually in the absence of consensus, because that seems to me to be how a well-functioning wiki would do it. DCDuring TALK 22:30, 24 October 2009 (UTC)[reply]
If there's no consensus that we should have Ido sections, then yes I'd forbid the use of a bot to add them, even if the bot is simply reverting another user's manual removal of them. Allowing bots to operate without consensus is bad in so many ways. You seem to see a bright-line distinction between "remove" and "restore", but if there's no consensus for either one, then I don't think that distinction is valid. (No such distinction is valid unless there's consensus for it, else you're basically saying that your own personal opinions — in this case inclusionism — supersede community consensus.) And it's especially tendentious as applied in this case, because the sections weren't "removed" so much as "merged, expanded, and improved". (I'm not saying the term "removed" is completely inapplicable, but it's stretching the truth to pretend that that is the term for what happened, and then to base all your analogies on that term.) —RuakhTALK 05:22, 25 October 2009 (UTC)[reply]
You seem not to understand my main point: it's not acceptable to do changes because of one's political or linguistic opinions (except by adding correct information). There will never be any consensus on political or linguistic opinions, of course (this is what opinion means), you should not expect one. I'll take another example. If some people believe that religions should not be addressed by Wikipedia (as they are allowed to believe), and delete all pages related to religions (even with good arguments), these pages should be restored (by bot if necessary), because their changes would be based only on opinions, and would violate the NPOV principle. This also applies to linguistic issues, some linguistic issues are very polemical (cf., in French, fr:au temps pour moi and fr:autant pour moi). Yes, the NPOV principle is one of the founding Wikimedia principles, and would supersede community consensus (but anyway, in this case, there is no community consensus). Lmaltier 09:44, 25 October 2009 (UTC)[reply]
Re: "There will never be any consensus on political or linguistic opinions, of course (this is what opinion means), you should not expect one": we don't need one. We may never have a consensus on whether BCSM is one language or four (or some other possibility), but we certainly can have a consensus on whether we want to treat it under one language section or four, or five (or some other possibility). You prefer five, and I don't begrudge you that preference, but there is nothing in Wiktionary policy that automatically elevates your preference to "imposable by bot without consensus". Your comparison to Wikipedia articles on religion is a bad one, because Wikipedia has a consensus to include such pages. If there were no such consensus, then a bot to add them there would be highly inappropriate. And your appeal to NPOV is unconvincing, because NPOV doesn't support your belief that the only appropriate edit is the addition of correct information. For example, if someone adds a usage note to fr:au temps pour moi stating that it's wrong and that everyone who uses it is an idiot, then other editors can (and should) restore NPOV by removing that note. And while you obviously feel that NPOV requires us to have sections for every putative language, it's obvious that many editors feel differently; even if NPOV supersedes community consensus, how do we decide, if not by community consensus, what is NPOV? —RuakhTALK 13:20, 25 October 2009 (UTC)[reply]
  • [long undent back to Visviva's proposal] There are a couple flaws, imo, with your findings of fact:
    2. In fact, dozens of bots are operating on Wiktionary regularly with neither community approval nor bot flag; in some cases there is collusion/tacit approval of admins.
    5. There is basis for automated reversion/restoration: the removal violates Wiktionary's inclusive policies. Key policies.2 states 'Wiktionary is multi-lingual in that it has entries for words from any language. It aims to cover Every Word from Every Language.' This policy would of course include every word from any dialect, based on the policy's broad language. The fundamental description as criteria for inclusion is 'As an international dictionary, Wiktionary is intended to include “all words in all languages”.'
  • Given the above, I cannot see a reasonable objection to any form of automated restoration, but I also cannot see support for any automated reversion as that would likewise be removing valid word entries. But then, iirc, I believe M Ullmann did suggest exactly this - a restoration of removed sections and retention of the new sections as is. - Amgine/talk 19:12, 25 October 2009 (UTC)[reply]
  • Re #2: It's true that bots have been run without approval and without a bot flag, but not when it's so obvious that they don't have consensus; and there are very high expectations that the bot be stopped if anyone raises any objections.
  • Re #5: Nonsense. The "removal" does not violate the "all words in all languages" doctrine, because we still include all the words that we did before.
  • RuakhTALK 22:39, 25 October 2009 (UTC)[reply]
I respectfully disagree with your assertion of "Nonsense." Here are a couple of examples in which we are now missing words from some languages:
term        Bosnian  Croatian  Serbian
jedinica    X        X         X
ансамбл                        X
badem       X        X         X
Минхен                         X
žito                 X
jagoda      X        X         X
kompozitor  X        X         X
Whether you agree that consolidating several dialects under one language header is better for Wiktionary or not, you cannot deny that each of these is defined as a language by authorities respected by the project, and that Wiktionary's stated policy is to include all words in all languages.
This is true whether or not you are stalking my postings because you don't like me, as well. - Amgine/talk 04:45, 26 October 2009 (UTC)[reply]
I don't deny those things, but I deny that they're relevant. We still have the words, we just describe them differently. I stand by what I said above to Lmaltier: "I'm not saying the term 'removed' is completely inapplicable, but it's stretching the truth to pretend that that is the term for what happened, and then to base all your analogies on that term." (And I'm really not stalking your postings. You just happened to have commented a few times recently on pages I have watchlisted.) —RuakhTALK 12:06, 26 October 2009 (UTC)[reply]
Amgine, modern Bosnian can be written in Cyrillic as well (it is allowed in orthography books, just not much used), and Croatian writers have for several centuries used bosančica, and attestations for Cyrillic spelling of all of these could be found. As for žito - it's a native Slavic word (Proto-Slavic *žito, basically unchanged) and can be found in usage by Bosniak and Serbian writers as well as dictionaries and is no freakin' way "ethnically Croatian".
Also, given how loosely you use the word dialect, it's clear that you apparently don't know much of South Slavic dialectology, and its application to the Serbo-Croat area. These are not different dialects - they're all the same dialect, which is in dialectology books called Neoshtokavian. Serbian and Croatian writers decided in the 19th century to deliberately standardize their literary languages on the same dialect, Ijekavian Neoshtokavian (which was at that period spoken by a minority of Croats, but has ever since managed to largely obliterate all the other subliterary dialects, namely Čakavian and Kajkavian). There is no "language" of which standard Bosnian/Croatian/Serbian/Montenegrin are "dialects". It doesn't exist. They're not only the same dialect, but the same subdialect (Neoštokavian; there is also an "Old Štokavian" dialect with a different accentual system).
The usage of different scripts that trivially map 1:1, or of regionally confined terms, does not justify the notion of "different language". Languages are not defined in terms of lexis but in terms of grammar. And in the case of Serbo-Croatian national varieties, the grammars coincide 99%. In phonology alone they're more similar (i.e. identical) than e.g. British and American English.
There are no "authorities respected by the project" that you speak of, and there is no institution in the world that "defines as languages". Furthermore, there are no firm criteria in linguistics at all to strictly define languages. We follow our own criteria for language-inclusion (not lexeme-inclusion which is covered by WT:CFI) on the basis of our needs, i.e. what do we gain by doing separation/merger with respect to the target audience (Wiktionary users) and contributors. We merge Biblical and modern Hebrew because it makes sense. We treat all Ancient Greek dialects and also Middle Greek as ==Ancient Greek== because it makes sense (even tho the differences between the spellings and inflection of Ancient Greek dialects are much greater than among modern standard B/C/S/M). OTOH we treat 2 varieties of standard Norwegian under different language headers because they apparently differ enough in various points of inflection so that it doesn't make sense to treat them commonly. We also treat the Lithuanian Žemaitian dialect under a different L2 because it radically differs from the literary Aukštaitian dialect which we use as normal ==Lithuanian==. For B/C/S/M there is no such justification - as the merged entries have proved, it's very easy to treat them all together, and differences among standards and the distribution of regionally-confined lexemes can easily be handled by means of context labels and ===Alternative forms=== header, similar to what we already use for varieties of English, German, Spanish etc.
B/C/S/M are 4 different national standards of what is linguistically doubtless one entity, call it Serbo-Croatian or whatever. That fact cannot be ignored. The fact that they're maintained by different national bodies does not invalidate their inherent linguistic oneness. You cannot argue that just because they've been assigned different ISO codes under the pressure of nationalist governments in the 1990s that it justifies the notion of "separate languages". Especially because there was only one code sh for a very long time before that, and that nobody had any problems with. It's silly to try to "prove" anything on the basis of some cherry-picked lexical comparisons and the fact that Croatian standard does not use the Cyrillic script (where Bosnian, Serbian and Montenegrin allow it). --Ivan Štambuk 12:33, 26 October 2009 (UTC)[reply]
You write much and say little.
International linguistic standards respected by en.Wiktionary recognize these as viable languages. Full stop. Your actions in consolidating multiple entries under a single language header reduced the number of entries in those languages whose headers were removed. Full stop. Your actions in replacing a single language header with another reduced the number of entries in the language whose header was replaced. Full stop.
There are no value judgements in the above statements. I happen to agree with your campaign whole-heartedly emotionally, but my opinions are irrelevant. They cannot change the fact that your actions have materially reduced the number of entries in some languages which are valid languages by our standards. - Amgine/talk 17:02, 26 October 2009 (UTC)[reply]
I write much because you apparently don't know much on the topic. It is necessary to educate you on the basic concepts, lack of understanding of which blurs your perception of the subject and leads you to fallacious conclusions such as "words spelled in different script belong to different languages" and "lexical dissimilarity necessitates the notion of a 'different dialect'". It is important not just for you, but also for other readers not to be misled by such fallacious reasoning. Hence, I strive to keep my post as extensive and educational as possible.
There is no such thing as "international linguistic standard". Have you absolutely any clue what SIL International does? How many professional linguists do you think give a **** about that nongovernmental non-profit Christian organization? I'll tell you: none. SIL and 3-letter codes have their own particular purpose (read e.g. the article on it in the Elsevier Encyclopedia of Language and Linguistics on what exactly they try to accomplish with it, which is not to prescribe the notion of a "language" at all). Some nationalist might be deceived that it internationally legitimizes their "separate language", but a quick glance at recently published authoritative FL sources (e.g. Britannica, papers and works by top linguists in the field..) should prove them otherwise. Please stop trolling with the abuse of such seemingly "authoritative" wording such as "international standard". Really. Languages are not computer protocols.
Your actions in consolidating multiple entries under a single language header reduced the number of entries in those languages whose headers were removed. - Nonsense.
Your actions in replacing a single language header with another reduced the number of entries in the language whose header was replaced. - Nonsense
There are no value judgments in the above statements. - Yes there are. These are not "different languages", for starters. Once you realize that (which you can't, as you don't know the language at all, and apparently don't know much about linguistics either) it all falls to pieces. Additionally, once you realize that we are not here to mindlessly chase 3-letter codes but to describe the linguistic reality as it is/was with our readers/contributors in mind, you reach the conclusion that the common Serbo-Croatian treatment is the Right Thing to do. We can add those national varieties of Serbo-Croatian too, but that would be simply a waste of time and bytes. I can see on my watchlist that our proud admirer of Vojislav Šešelj Pepsi Lite (talkcontribs) has been more than industrious in copy/pasting ==Serbo-Croatian== entries to ==Serbian== lately, why don't you give him a hand? :)
Once again, to summarize all this: Restoration of these merged entries must not be done automatically. They contain a large number of factually wrong data, obsoleted formatting as well as ambiguous content that would introduce unnecessary confusion, misleading that there is some kind of additional difference between those "languages" when in fact there is none. There is absolutely no loss in not having them because everything of use is already contained in the merged ==Serbo-Croatian== sections. Users interested in adding sections in national varieties of SC alone (those that pretend to only speak/write "Croatian" or "Serbian", but can nevertheless smell a "Serbianism" in Croatian or vice versa from a mile away) can do so. They might be much more interested in automated extracting of interesting content from the merged or newly-created SC entries, an approach which should reduce their efforts by an order of magnitude at least. In fact, had they done so for these entries, they could've checked the list in a few hours and it all could've been added a long time ago. But it appears to me that that wasn't the point at all of this silly anti-Štambuk crusade. --Ivan Štambuk 15:56, 28 October 2009 (UTC)[reply]
Ivan, how did you infer Pepsi Lite's admiration for Dr. Vojislav Šešelj? The uſer hight Bogorm converſation 20:31, 28 October 2009 (UTC)[reply]
I respect Vuk Stefanović Karadžić. Mr. Štambuk admires Josip Broz Tito along with at least 25% of his fellow countrymen labeling him the 'Greatest Croat in history'. Croats admire Tito because on November 26, 1942 he promised us Serbs: democracy, inviolability of private property, freedom of individual economic initiative; and like a typical Croat he later changed his mind. Tito very successfully deceived Serbs and Winston Churchill who gave him support because of his false promises. The communist leadership headed by Croat Josip Broz Tito sent Serbs and Montenegrins (like Milovan Đilas) who protested the lack of democracy to the Croatian island of Goli Otok.
Mr. Štambuk still holds the nationalist ideals to his heart but is not trolling the Croatian wikipedia like Mr. Ullmann is indicating. Mr. Štambuk is doing to us like the rest of his admired forefathers have done in the 20th Century: screw us over and throw us away like a used condom. Croats overwhelmingly voted on 19th of May 1991 in favor of separation (94.17%), and now this is not enough for them, apparently. Mr. Štambuk, your countrymen wanted and voted for this so there is no reason to tell other people lies that you are going along with the wishes of 90% of your countrymen.
Serbian and Croatian are not the same language. Croatian is composed of Kajkavian and Chakavian, which Serbian isn't. Wiktionary is all words in all languages. Some Croatian villager somewhere today (or in the past) is speaking either Kajkavian or Chakavian and those words should be included in Wiktionary. Kajkavian or Chakavian cannot be included as a 'Serbo-Croatian' entry, since those dialects are very different. Kajkavian and Chakavian should be included under Croatian. Likewise Torlakian under Serbian. The only way Serbian and Croatian words can be included together is if there is a L2 header called Neo-shtokavian. A L2 heading called Neo-shtokavian would stop my opposition to this grouping and would even stop Mr. Ullmann's trolling which is awesome and spectacular!--Pepsi Lite 08:09, 29 October 2009 (UTC)[reply]

CFI Clarification required

(copy of entry in Wiktionary_talk:Criteria_for_inclusion#Clarification_Required) The CFI (Criteria for Inclusion) need clarification on one point:-

Attestation.

“Attested” means verified through
  • Clearly widespread use,
  • Usage in a well-known work,
  • Appearance in a refereed academic journal, or
  • Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.

Are those 4 attestation criteria joined by OR, or by AND?

My personal view is that they should be joined by an OR, so that a term needs to meet ANY one of the criteria and does not need to meet ALL of them.

I would suggest a change of the paragraph to

“Attested” means verified through meeting ANY of the following conditions

  • Clearly widespread use,
  • Usage in a well-known work,
  • Appearance in a refereed academic journal, or
  • Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.

I cannot be bothered to mount a campaign or vote on my own. Anyone agree enough to take it on ? --Richardb 14:28, 1 October 2009 (UTC)[reply]

They are joined by an "or". It's right there in the text. I agree that this is not sufficiently clear on a first reading; in fact just a little while ago I was modifying Wiktionary:Editable CFI in a way similar to your suggestion. Please feel free to modify that page further -- it is there to be edited. -- Visviva 14:34, 1 October 2009 (UTC)[reply]
Whilst I can see the little "or" buried in there, there are clearly those who cannot. And as to being bold and -- it is there to be edited--- there is CLEARLY at the top of the page "This is a Wiktionary policy, guideline or common practices page. It should not be modified without a VOTE." I've been around long enough, and been battered enough, not to try messing with CFI without a VOTE.--Richardb 15:44, 1 October 2009 (UTC)[reply]
Wiktionary:Editable CFI is a separate page. Conrad.Irwin 17:29, 1 October 2009 (UTC)[reply]
Yeh ? I was still beaten up for editing that page too!--Richardb 07:39, 30 November 2009 (UTC)[reply]
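
As an illustration of the "or" reading confirmed in this thread, the four attestation criteria can be sketched as a single any-of check. This is only a sketch under stated assumptions: the `Evidence` record, its field names, and the reading of "spanning at least a year" as 365 days between the earliest and latest citation are hypothetical and not part of the CFI text.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Evidence:
    """Hypothetical record of what is known about a term (names are illustrative)."""
    widespread_use: bool = False
    in_well_known_work: bool = False
    in_refereed_journal: bool = False
    # Dates of independent uses in permanently recorded media that convey meaning.
    citation_dates: list = field(default_factory=list)


def citations_suffice(dates, minimum=3, min_span_days=365):
    """Fourth criterion: at least three independent instances spanning at least a year.

    Assumption: "a year" is taken as 365 days between the first and last citation.
    """
    if len(dates) < minimum:
        return False
    return (max(dates) - min(dates)).days >= min_span_days


def is_attested(ev):
    """The criteria are joined by OR: meeting any single one is enough."""
    return any([
        ev.widespread_use,
        ev.in_well_known_work,
        ev.in_refereed_journal,
        citations_suffice(ev.citation_dates),
    ])
```

Under the "and" reading the `any` would become `all`, which would make almost no term attestable; that is the ambiguity the proposed rewording is meant to remove.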

Translingual translations

Corvidae. On the surface this seems rather contradictory. If there are translations, then how can it be called 'Translingual'? Nadando 21:12, 20 October 2009 (UTC)[reply]

These terms are translingual by their virtue of being understood by biologists the world over. HOWEVER, they are, in practice, New Latin, and as such pretty much all the taxonomic levels higher than genus have mostly standardised forms in vernacular languages. Usually this is a standard form for the suffix combined with agreed rules of transliteration, although many languages function differently in assigning vernacular terms. Circeus 01:33, 21 October 2009 (UTC)[reply]
I supported these once, but looking at Corvidae#Translations, it's obvious how badly this can go wrong. At best, these seem like translations of "crow family"; some, however, bear a strong resemblance to wikispecies:Corvidae#Vernacular names, which suggests that they just mean "corvid". So either we come up with a standard header template for these that makes very clear that only non-Roman exact equivalents should be entered, or we just need to remove them entirely. The linked entries can get by on their own, I think; appendices can take care of the cross-referencing. -- Visviva 02:26, 21 October 2009 (UTC)[reply]
Come to think of it, isn't crow family idiomatic enough for inclusion? Why don't we just move all of the "translations" thither? -- Visviva 02:32, 21 October 2009 (UTC)[reply]
Crow family is hardly idiomatic in English. We already have corvid#Noun.
But, in principle, couldn't there even be multiple transcriptions of these taxonomic names into the various scripts? And since we have translations into languages, not scripts, we could have multiple ones for each language using, say, the Cyrillic script. Is it more standardized than that? DCDuring TALK 02:48, 21 October 2009 (UTC)[reply]
How is it not idiomatic? It's a proper noun, even. (Or if it isn't, we have a whole lot of mislabeled entries.) How would someone guess the specific meaning from crow + family? Or how would a non-English speaker know that it was the "crow family" and not, say, the "jackdaw family"? On the other hand, it's difficult to say that Corvidae has a distinct meaning apart from "the corvids", so perhaps there is no real need for a translation handle. On the third hand, 까마귓과 means "Corvidae" and definitely not corvid. So there is a risk of orphan translations (not that that's a huge problem in itself).
I don't really know how these issues are handled outside of the CJK scripts, but w:Corvidae has (unsurprisingly) very different interwikis for Russian and Kazakh. Neither is a transliteration. -- Visviva 03:33, 21 October 2009 (UTC)[reply]

Surely the correct solution is that foreign words meaning corvid belong at corvid#Translations. The only translations that should be at Corvidae#Translations are scientific names in non-Latin languages, such as Врановые and カラス科. (Note that these are not always transliterations. Russian Врановые, for example — Vránovye — is not a transliteration of Corvidae, but instead appears to be natively Slavic.)

Of course, this raises a related issue: the question of how to link from Corvidae to corvid. Users certainly should be able to navigate directly from Corvidae to corvid, but where should the link go? The answer is not as obvious as one might suppose, and a few possibilities occur:

  1. Within the definition itself. Thus the definition for Corvidae would be something like one of the following:
  2. As a descendant. Etymologically, corvid derives directly from Corvidae; it is common for vernacular names to derive directly from scientific names in this way, when no vernacular name already exists. Of course, this is not always true; animal does not derive from Animalia. Also, users looking for a vernacular name might not expect to find it under "Descendants".
  3. As a related term. Seems sensible, except that WT:ELE#Related terms indicates that the "Related terms" section is exclusively for terms that are in the same language.

Caesura(t) 13:12, 1 December 2009 (UTC)[reply]

Morse code

How should we treat Morse code (and related codes such as Russian Morse code)? In its basic form, Morse code is just a character encoding scheme. Some encoding schemes don't get entries for their individual characters, such as signal flags or semaphores, while others do, such as Braille (because they have their own Unicode codepoints) and ASL manual alphabet (because we have a consistent transcription scheme and because it's part of real language like ASL). Aside from single letter encodings, Morse code can represent control characters (with Prosigns), and special letter combinations have specific meanings (special abbreviations, Q codes, and Z codes). As I see it, there are several things that should be decided.

  1. Currently, our Latin letter pages (and "Variations of .." pages) show Morse code but link to the audio representation. Should we extend this to multi-letter terms? If so, would we extend the other encoding systems (e.g. semaphores) to multi-letter terms as well?
  2. Do we have entries for individual Morse code encodings ("..." = 'S')? One problem is that there is no standard textual transcription of Morse code (see Wikipedia discussion). For "dots" people use periods '.' (which is problematic for linking), bullets '•', and middots '·'. For "dashes" some use the hyphen-minus '-' and some the em-dash '—'.
  3. Do we list special letter combinations, and if so which ones? Some, like SOS and CQ are common outside of Morse code, but many are not. If we do include them, do we represent them in Latin characters or as Morse code sequences? ... --- ... ("SOS") used to exist but was deleted for being "encyclopedic".

I'd say for #1 that we should only do single letters. I'm not sure about #2. For #3 I'd like the terms listed in Latin letters but not in Morse code. Anyone else have thoughts? --Bequw¢τ 16:50, 22 October 2009 (UTC)[reply]

[e/c] I think we should not include Morse-code pagenames (#2), in part because (per Bequw) there's no unicodification of the Morse code. Things like QSL and SOS (#3) we should have; attestation can be easy, by means of transcribed telegrams quoted in durably archived sources. For #1, Bequw seems to be asking whether an entry like [[foo]] should include the Morse-code representation of foo as... an =Alternative spelling=? a =Trivia= fact? Either way, I think not. That's an easy one-to-one correspondence anyone can figure out by checking a table of codes, so there's no reason to include it; and a downside is the wasted screen space.​—msh210 17:14, 22 October 2009 (UTC)[reply]
Can't hurt to have entries for the code for individual letters. I'd support having the entry for the code for something like SOS if citations could be found that indicate that a reader might come across it 'in the wild' without a referent to the letters for which it was a code. bd2412 T 17:03, 22 October 2009 (UTC)[reply]
Wiktionary is not Unicode's bitch. All for inclusion of letters by whichever transcription is preferred. Terms would need to be cited where they are intended to be discernible. DAVilla 06:31, 9 November 2009 (UTC)[reply]
I agree (with DAVilla). —RuakhTALK 11:32, 9 November 2009 (UTC)[reply]
I seem to remember adding some of the individual letters some time ago - but they all got deleted, "Morse code" not being a "language". So I just added the table to our entry for Morse code instead. I wasn't able to get the dots and dashes to align properly though. SemperBlotto 11:39, 9 November 2009 (UTC)[reply]
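
To make the "one-to-one correspondence" and the transcription problem raised in this thread concrete, here is a minimal sketch of a letter table plus a normalizer that folds the dot and dash variants mentioned above (period, bullet, middot; hyphen-minus, em-dash) into one ASCII form. The function names and the choice of '.' and '-' as the canonical transcription are assumptions made for illustration, not an agreed Wiktionary convention.

```python
# International Morse code for the 26 Latin letters, written with ASCII '.' and '-'.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
DECODE = {code: letter for letter, code in MORSE.items()}

# Fold the competing transcriptions into the ASCII form used by the table:
# U+2022 bullet and U+00B7 middle dot become '.', U+2014 em dash becomes '-'.
_NORMALIZE = str.maketrans({"\u2022": ".", "\u00b7": ".", "\u2014": "-"})


def encode(text):
    """Encode Latin letters as space-separated Morse; other characters are skipped."""
    return " ".join(MORSE[ch] for ch in text.upper() if ch in MORSE)


def decode(morse):
    """Decode space-separated Morse written in any of the transcriptions above."""
    normalized = morse.translate(_NORMALIZE)
    return "".join(DECODE[token] for token in normalized.split())
```

For example, `encode("SOS")` gives `... --- ...`, and the same sequence typed with middots and em-dashes decodes back to `SOS`. This also shows why a page name is hard to pick: several visually different strings stand for the same code.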

Basically it's the simplest way of dealing with cases like none, which is the nominative singular of nonain. I've had a go at updating the relevant templates ({{fro-noun}}); if I've screwed anything up, do fix it or tell me. User:Widsith is the only other user (that I know of) that knows anything about Old French. So, anyone object? Mglovesfun (talk) 20:40, 24 October 2009 (UTC)[reply]

I'm strongly in favor of anything that gets us more Middle French, Norman French, or Old French entries because of their importance to English etymology. For that purpose, any simplifications that lead to more lemma entries would be great. DCDuring TALK 22:35, 24 October 2009 (UTC)[reply]
Do we have anything even close to policy describing when it is appropriate to ditch plurals and go with noun forms instead? I'd been thinking of nominating Category:Catalan plurals for deletion and simply leaving all the non-lemma forms in Category:Catalan noun forms (and Category:Catalan adjective forms, since at present the plural adjective forms are ending up in there as well). — Carolina wren discussió 23:14, 24 October 2009 (UTC)[reply]
Well Old French has a case system, (nominative and oblique). I sometimes wonder what to do about stuff like joueuse which is just classed as a noun right now, which is okay, but it could be considered a noun form. It's easy to see with Russian, Latin, Greek and whatnot, that when there's a case system, the only other sensible option would be Category:Latin genitive singular forms (and about 11 others) which is why we have Category:Latin noun forms. Other input? Mglovesfun (talk) 13:22, 25 October 2009 (UTC)[reply]
Support. I think that [[Category:langname plurals]] only makes sense for a language like English, where only one part of speech has a plural form, and said plural form cannot be marked for anything else (such as definiteness or case). Something like [[Category:langname plural nouns]] or [[Category:langname noun plurals]] could work for a language like Catalan or Modern French (especially if we treat pairs like cousin ~ cousine as being two related lemmata; I know that some people consider it to be one noun with an inflection to indicate its referent's natural gender), but the more general solution of [[Category:langname noun forms]] seems best. (Personally, I wouldn't oppose an all-out split — stuff like [[Category:Old French nominative plural nouns]] — but when we've discussed this in the past for verbs, the general attitude has seemed to be in favor of one big catch-all category for non-lemma forms of a given POS.) —RuakhTALK 13:03, 26 October 2009 (UTC)[reply]
Such categories were created on the basis of the English category model for parts of speech. There are these for many more languages that have additionally marked plural forms, and which also ought to be deleted. Some of these grew quite large (e.g. Category:Hungarian plurals). Deletion of these should best be discussed on an individual language basis, but the creation of new such categories for languages which inflect nouns for more than 1 plural number should IMHO be strongly discouraged, especially if there is any kind of syncretism (e.g. nominative plural and accusative plural forms are the same). --Ivan Štambuk 15:25, 28 October 2009 (UTC)[reply]

More advanced Wiktionary queries

I've been hard at work finding more stuff to index. I now have indexing of which scripts are used in page titles working, and I'm working on Unicode Collation Algorithm sort keys. Here's the first new result:

Let me know if you have some more ideas. — hippietrail 03:28, 26 October 2009 (UTC)[reply]

Have you gotten to templates and content-level items yet or are we still operating on headwords, categories, and headings? I'm still plenty busy on the product of your last runs. DCDuring TALK 09:33, 26 October 2009 (UTC)[reply]
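
For readers wondering what "which scripts are used in page titles" could look like in practice, here is a rough sketch. It is not hippietrail's indexer: the function name is hypothetical, and because Python's standard library does not expose the Unicode Script property, the first word of each character's Unicode name is used as a crude stand-in (an assumption that works for common scripts but is not a substitute for the real property).

```python
import unicodedata


def scripts_in_title(title):
    """Return a set of rough script labels for the letters in a page title.

    Heuristic only: the label is the first word of the character's Unicode
    name (e.g. 'LATIN', 'CYRILLIC', 'CJK'), not the official Script property.
    """
    scripts = set()
    for ch in title:
        if not ch.isalpha():
            continue  # ignore digits, spaces and punctuation
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts
```

With this heuristic a title such as Corvidae is labelled Latin only, while カラス科 is labelled both Katakana and CJK, which is the kind of mixed-script title such an index would surface.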

Verifying rare languages

See WT:RFV#tingo. Languages that are little used or not often written are, by nature, hard to cite with three durably archived cites. Apparently our only other Rapa Nui word is hehe, which I imagine is equally difficult to cite. There is no www.google.rap by the way. Is there any reasonable way to combat this? I can't think of one. Mglovesfun (talk) 20:52, 27 October 2009 (UTC)[reply]

I don't think the three cites rule fits smaller languages particularly well, since they're not only rarely printed, but the prints that exist are hardly available on the Internet. I think we should accept definitions in dictionaries or scholarly works for those languages. -- Prince Kassad 21:05, 27 October 2009 (UTC)[reply]
Well if it's written and has any kind of literature, I don't think we should stray away much from the usual CFI (we could e.g. lower it to only one attestation, because in those cases when a language is spoken by a tiny community there is little chance that the recorded word is not actually used). If a language has no written literature at all, and is only described in scholarly works in some form of transcription (or worse, in multiple incompatible transcriptions, depending on how the speech was analyzed by the linguists who investigated it), then it should IMHO belong only to the appendix namespace. A quick Web search yields several websites containing written Rapanui, so there's no excuse to remove the RfV label. --Ivan Štambuk 21:37, 27 October 2009 (UTC)[reply]
That would exclude all languages without written tradition to appear in Wiktionary at all. Given that there are about 5,000 languages without written tradition, this is very major. -- Prince Kassad 13:30, 28 October 2009 (UTC)[reply]
Regardless, we shouldn't be adding them at all in the main namespace if there is no standardized orthography or transcription. The loss is minimal, as all of these languages will be extinct by the end of the century, and they interest barely anyone outside the academia. --Ivan Štambuk 15:11, 28 October 2009 (UTC)[reply]
That might be a fairer way to define what we currently include under "appeared in a well-known work" (or did when I last read this stuff) - if it appears as a mention in a scholarly article (or two, or three), then we can take Wikipedia's approach and define it in terms of what is given there (given that there are not enough cites to define it for ourselves) - though we might need more rigorous criteria to define when such mentions are acceptable, yadayada. Conrad.Irwin 21:41, 27 October 2009 (UTC)[reply]
In the case of Rapa Nui, there is http://www.rongorongo.org/index.html, but this dictionary does not have tingo. To me, the definition looks fishy. The book by Adam Jacot de Boinod seems to have been poorly researched and inaccurate. Looked at some of his German inclusions...they are either just plain wrong or, in some cases, do not even exist. I say, delete tingo posthaste. —Stephen 22:30, 27 October 2009 (UTC)[reply]
Google Books suggests a possible single reference outside of the controversial The Meaning of Tingo reference on "page 11" of Pacific Studies, Vol. 3-4, a publication by Brigham Young University--Hawaii Campus's Institute for Polynesian Studies, possibly being earlier than the books were published. (OCLC link) This particular reference, while perhaps rescuing the use, is still not well supported at best, and there's no way for me right now to easily confirm that this reference predates The Meaning of Tingo in any case. It does appear to discuss it in context with a similar meaning ("[a reciprocity system] abused by the unscrupulous who might make excessive demands"), however, which may be promising if it does appear to be independent from the word collection. Finding an actual Rapa Nui source may be difficult, however. --Pipian 07:09, 21 November 2009 (UTC)[reply]

Tbot mess with Chinese translations - this must stop

I left a message on Robert Ullmann's talk page but have had no reply yet.

This edit [9] created such a mess with this Chinese translation! Can this be stopped please? I don't know how many entries are affected but whatever it's doing, it's wrong! --Anatoli 22:42, 27 October 2009 (UTC)[reply]

It was probably caused by things like [[wo|wǒ]]. I think all of those are wrong and, if they are linked at all, should simply be [[wǒ]]. The tone marks should not be ignored. —Stephen 23:26, 27 October 2009 (UTC)[reply]
Yes, Stephen, that looked like an attempt to wikify pinyin linking to pages without tone marks but it all went wrong. The transliteration should follow "|tr=", so the result was just a piece of ugly looking code. Anyway, I don't see much benefit to linking transliteration to pinyin syllables with or without tone marks, besides, transliteration should be left alone. Anatoli 23:45, 27 October 2009 (UTC)[reply]
I have (just now ;-) replied, and yes, the problem is a bad attempt to wikilink to the forms w/o tones. Either they should be linked properly to the forms with tones, or, as is usual with transliterations, not linked at all, as transliterations are usually not also written forms. As Stephen says, lose the piped links and link to the forms with tones. Robert Ullmann 23:58, 27 October 2009 (UTC)[reply]
Please don't link at all. Anatoli 08:16, 28 October 2009 (UTC)[reply]
I agree, the Pinyin transliterations should be unlinked. Delete the links when you see them. —Stephen 05:43, 29 October 2009 (UTC)[reply]
Robert, as you said on your talk page (I got lost in that discussion, sorry), it seems Tbot is trying to link the first part of Japanese transliteration, assuming it is "|tr=(Hiragana), Rōmaji|". I don't think it's a good idea either. The transliteration is a free form, could be more than one reading, mixing Hiragana and Rōmaji or only Rōmaji. The Hiragana entry doesn't have to exist, if the word is seldom written in Hiragana. Even if there is a value in it, I'd leave it for humans to add a link. --Anatoli 22:04, 29 October 2009 (UTC)[reply]
  • I don't feel qualified to make a comment about the scripting issues, but I gotta say, those translations for "I must go" (我必須去 and 我應該去) are pretty awkward. This is because they use literal translations of the word "go", when in fact the implicature is not "go" but "leave". 我必須走了 would be much more natural. Tooironic 08:21, 2 November 2009 (UTC)[reply]
    • Although if I may just say one thing about the scripting thing, I recently added both "literal" and "natural" (though less accurate) translations for antipasto, please let me know if I've done the formatting correctly. Cheers. Tooironic 09:54, 2 November 2009 (UTC)[reply]
Thanks, I agree with your translation and I have changed it. True, I haven't added the explanation that "I must leave" is implied. Anatoli 22:36, 3 November 2009 (UTC)[reply]
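
The cleanup agreed on in this thread (leave Pinyin transliterations unlinked, keeping the tone-marked text) can be sketched as a small text transformation. This is not Tbot's code: the function name and the regular expressions are hypothetical, and the sketch assumes it is handed only the transliteration string, not a whole translation line.

```python
import re

# Matches a piped wikilink such as [[wo|wǒ]] and captures the displayed label.
PIPED_LINK = re.compile(r"\[\[[^\[\]|]*\|([^\[\]|]*)\]\]")
# Matches a plain wikilink such as [[wǒ]] and captures its target.
PLAIN_LINK = re.compile(r"\[\[([^\[\]|]*)\]\]")


def unlink(transliteration):
    """Remove wikilinks from a transliteration, keeping the tone-marked text."""
    text = PIPED_LINK.sub(r"\1", transliteration)
    return PLAIN_LINK.sub(r"\1", text)
```

Applied to `[[wo|wǒ]] [[bi|bì]][[xu|xū]] [[qu|qù]]` this yields `wǒ bìxū qù`; text that contains no links is returned unchanged, so running it twice is harmless.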