User talk:Dan Polansky: difference between revisions

Liliana-60 (talk | contribs)


I am now blocked from editing my talk page by Liliana. Just wow! I am using an open proxy with the sole purpose of making this post, circumventing unjust and utterly unjustified prevention of my posting to my talk page by abusive admin Liliana. --Dan Polansky --[[Special:Contributions/184.173.253.2|184.173.253.2]] 22:24, 7 August 2013 (UTC)
:Mommy, Liliana blocked me! That is so mean! Boo hoo hoo hoo!<br>Seriously, what did you expect when you continued with the kind of behavior that you've been blocked for? Did you seriously think that everyone would just look away and not mind that you're flaming other people behind their backs? -- [[User:Liliana-60|Liliana]] [[User talk:Liliana-60|•]] 22:34, 7 August 2013 (UTC)

Revision as of 22:34, 7 August 2013



Usefulness of phrasebook

I find some of the phrasebook entries rather useful, as their translation is not grammatically straightforward.

  • I'm hungry
    • Czech: mám hlad, as if "I have hunger"
    • German: ich habe Hunger, as if "I have hunger"
  • I'm thirsty
    • Czech: mám žízeň, as if "I have thirst"
    • German: ich habe Durst, as if "I have thirst"
  • do you speak English
    • Czech: mluvíte anglicky?, as if "do you speak in English" or even *"do you speak Englishly"?
    • Polish: czy mówisz po angielsku?, which I do not know how to render in an English analogue, maybe "do you speak in an English manner or way"
  • I'm cold
    • Czech: je mi zima, rather than *"jsem studený"; in English, perhaps "it is cold to me"?
    • German: es ist mir kalt, rather than *"ich bin kalt"; in English, perhaps "it is cold to me"?
  • how are you?
    • Czech: jak se máš, as if "how are you having yourself?"
    • German: wie geht es dir?, as if "how does it go with you?" or the like
  • have a seat
    • Czech: posaďte se, as if sit down
  • how do you say...in English?
    • Czech: jak se řekne...anglicky?, as if "how does ... get said Englishly"?
    • German: wie sagt man...auf Englisch, as if "how does one say ... on English"?
  • how much does it cost?
    • German: was kostet das?, as if "what does it cost", which happens to be idiomatic English

Category:English phrasebook has 357 entries. --Dan Polansky (talk) 18:06, 5 January 2013 (UTC)[reply]

Some more:

  • I have a cold
    • Czech: jsem nachlazený, as if "I am got cold" or the like
    • German: ich bin erkältet, just like Czech
    • Russian: ja prostudílsja, as if "I have colded myself through" or the like
  • I'm twenty years old
    • Czech: je mi dvacet let, as if "it is twenty years to me"
    • Polish: mam dwadzieścia lat, as if "I have twenty years"

--Dan Polansky (talk) 11:52, 6 January 2013 (UTC)[reply]

some random words

Just a few random words from the beginning of Czech Wikipedia article on Hydrogen. (not a full, sorted list of red links as my software doesn't understand the funny accents over letters) Don't feel under any obligation to add them! SemperBlotto (talk) 16:11, 6 January 2013 (UTC)[reply]

Vodík chemická latinsky nejjednodušší tvořící převážnou hmoty vesmíru široké praktické redukční činidlo chemické syntéze metalurgii meteorologických pouťových balonů vzducholodí Obsah Základní fyzikálně-chemické vlastnosti Historický Výskyt přírodě Tvorba průmyslová Využití Sloučeniny Anorganické sloučeniny Hydridy Další Organické sloučeniny Izotopy vodíku Odkazy Související články Literatura Základní fyzikálně-chemické vlastnosti Molekula chuti zápachu hoří namodralým plamenem nepodporuje - some capitalisation will be wrong

Thanks. We have lemma forms of base forms of many of these, although not all of them: vodík, chemický, latinský (adjective rather than the redlinked adverb), jednoduchý, tvořit, převážný, hmota, vesmír, mít, široký, praktický, redukční, činidlo, chemický, syntéza, metalurgie, etc.
For your method to work for me, I would need to enter inflected forms of Czech words into Wiktionary, which I don't feel like doing. I actually have a list of Czech words to add, working offline on their verification, from time to time. --Dan Polansky (talk) 19:37, 7 January 2013 (UTC)[reply]

ttbc

Hi,

When adding ttbc's please check for qualifier, they actually explain the sense sometimes as in trio#Translations. --Anatoli (обсудить/вклад) 00:47, 9 January 2013 (UTC)[reply]

Harry Potter

I don't really understand the difference between Category:Harry Potter and Category:Harry Potter derivations. Where do metloboj or bezjak or smrtožder belong? Zabadu (talk)

Also, I am very worried about anti-Serb bias here. For example there is no Serbia category but there is Croatia category. Why is that?

Can you please help me add flag of Serbia to Category:Serbia?

Zabadu (talk)

No comment. --Dan Polansky (talk) 13:37, 12 January 2013 (UTC)[reply]

chargemaster

Please see talk:chargemaster. Please can we discuss this more before removing this material, as it is integral to the definition. Thank you, -- Cirt (talk) 22:06, 9 March 2013 (UTC)[reply]

Thanks very much for your polite response on the talk page, I really appreciate it! :) I've responded there, -- Cirt (talk) 04:47, 10 March 2013 (UTC)[reply]

Request about "chargemaster"

Request: Please, Dan Polansky (talkcontribs), I ask of you to read this article:

I think that will give you some clarity about the term chargemaster. Thank you for your time, -- Cirt (talk) 18:18, 10 March 2013 (UTC)[reply]

I responded at WT:RFD. --Dan Polansky (talk) 19:30, 10 March 2013 (UTC)[reply]

DONE: Trimmed the definition to that suggested by Dan Polansky (talkcontribs), above, please see DIFF. Hopefully this is now satisfactory to Dan Polansky (talkcontribs). Thank you, -- Cirt (talk) 23:40, 10 March 2013 (UTC)[reply]

Please read this article

I strongly recommend you read this article, as a good faith gesture, it would help inform our discussion. Can you please read it? It is most informative. Thank you, -- Cirt (talk) 23:43, 10 March 2013 (UTC)[reply]

Let me note that, in the discussion about the definition of "chargemaster", I am acting in the capacity of a dictionary maker, trying to figure out what is and what is not a part of the definition of "chargemaster". I am not defending whatever despicable practices exist in relation to chargemasters. --Dan Polansky (talk) 19:18, 11 March 2013 (UTC)[reply]
Sure, sure, I agree with you and I don't doubt your good faith intentions. :) I'm just respectfully asking you to read this article, please? -- Cirt (talk) 20:29, 11 March 2013 (UTC)[reply]

Re: KYPark

Well said. I owe you one. —Μετάknowledgediscuss/deeds 02:17, 14 March 2013 (UTC)[reply]

I second. --Anatoli (обсудить/вклад) 02:43, 14 March 2013 (UTC)[reply]

drug

A note to myself and whoever cares to read: I am dissatisfied with the "drug" entry, currently having four senses. Recent related events:

As a consequence, I have done this:

  • Sent the 1st sense to WT:RFD, with the intention of making it more narrow by removing part of the definition.
  • Sent the 2nd sense to WT:RFV with the intention of getting it removed.

--Dan Polansky (talk) 15:32, 27 April 2013 (UTC)[reply]

Key definition edits to "drug" entry:

  • diff, March 2003: 1st def entered of "Substance used to treat an illness, relieve a symptom or modify a chemical process in the body for a specific purpose."
  • diff, May 2003: A 2nd def entered: "Addictive substance used to alter the level of consciousness"
  • diff, August 2004: 2nd def tweak: "A substance, often addictive, used to alter the level of consciousness"
  • diff, July 2005: 2nd def tweak: "A substance, often addictive, which affects the central nervous system"
  • diff, March 2006: 3rd def added: "A chemical or substance, not necessarily for medical purposes, that alters the way the mind or body works", with the summary "Added definition(noun) that encomasses non-medicinal drugs)", by an anon
  • diff, December 2006: 4th def added: "An illegal drug", by an anon
  • diff, March 2007: 4th def tweaked: "A drug, especially illegal, taken for recreational use"
  • diff, July 2008: 4th def tweaked: "A substance, especially one which is illegal, ingested for recreational use."
  • diff, May 2013: 4th def tweaked: "A psychoactive substance, especially one which is illegal and addictive, ingested for recreational use, such as cocaine"

--Dan Polansky (talk) 12:20, 4 May 2013 (UTC)[reply]

Two senses removed by me in diff, failing WT:RFV.

I have reverted this revision by an anon, one of interest:

  1. (pharmacology) A substance intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease in man or other animals; a medicine.
  2. (pharmacology) A substance (other than food) intended to affect the structure or any function of the body of man or other animals.
  3. A narcotic substance.
  4. (figuratively) Anything that has the effect of a narcotic substance.

The 1st definition is subject to objections raised in a recent RFD: contraceptives do not meet the definition. This may be dealt with by placing contraceptives under the 2nd definition, but it is unobvious why the 2nd definition should be separate from the 1st one.

The "narcotic substance" definition is unhelpful, IMHO; it relies on narcotic entry featuring three definitions, failing to select the intended definition from "narcotic". Furthermore, the "narcotic substance" definition may be wrong, depending on what the definitions at "narcotic" are intended to cover; the 1st one seems to entail "induces sleep" as a condition necessary, so it does not cover all illicit drugs; the 2nd one entails "numbing", so ditto; the 3rd one "certain illegal drugs" is massively unspecific, failing to tell us which illegal drugs it selects, but if it selects some and not all, then it cannot be covering all illicit drugs.

When checking "drug" in OneLook Dictionary Search, no dictionary equates illicit drugs with narcotics. Collins, for instance, has "chemical substance, esp a narcotic, ...", which makes it clear "drug" and "narcotic" are not synonymous.

The 4th figurative sense is one that we are possibly missing, and was mentioned by msh210 on his talk page. However, it should be added only together with citations supporting it, IMHO. --Dan Polansky (talk) 09:59, 30 June 2013 (UTC)[reply]

Hi there. In the UK a "tax office" is a place where you can go (or phone) to discuss your tax affairs. The organization used to be called the "inland revenue" and is now called "HM Revenue and Customs". See [1] as an example of use. SemperBlotto (talk) 10:10, 4 May 2013 (UTC)[reply]

Oops! So it is not like "post office", which can refer both to a particular place and to the organization itself. The tax-collecting organizations have various specific names across the world, as per W:Revenue_service: "HM Revenue and Customs" (U.K.), "Internal Revenue Service (IRS)" (U.S.), "Australian Taxation Office" and "Canada Revenue Agency". Would "revenue service" be the generic term for tax-collecting agency or organization I am looking for? What about tax agency, or tax authority? --Dan Polansky (talk) 10:19, 4 May 2013 (UTC)[reply]
Yes, I think that "revenue/tax service/agency/authority" combinations are used in the UK and elsewhere as a generic term for the organization. In the UK, there are several other "offices" that are organizations rather than places (normally capitalised) - Office for National Statistics is one that springs to mind. SemperBlotto (talk) 10:26, 4 May 2013 (UTC)[reply]
I have fixed the entry. Feel free to edit it further. --Dan Polansky (talk) 10:33, 4 May 2013 (UTC)[reply]

Wiktionary popularity among online dictionaries per Alexa rank

I can't even believe the following statistics:

Rank of dictionary web sites per number of visitors per Alexa.com, ordered by global rank:

Web | Alexa Global Rank | Alexa U.S. Rank | Alexa U.K. Rank | Note
wikipedia.org | 6 | 8 | 10 | Listed despite not being a dictionary, as a super successful MediaWiki project
reference.com | 207 | 77 | 144 | dictionary.reference.com - 54% of visitors of the domain go here
thefreedictionary.com | 265 | 223 | 205 | Multi-lingual; has a definition dictionary for several languages; by Farlex
wordreference.com | 306 | 1,024 | 325 |
wiktionary.org | 641 | 1,313 | 867 | This is for all Wiktionaries, not just the English one. en.wiktionary.org - 40% of visitors of the domain go to this subdomain
urbandictionary.com | 836 | 378 | 429 | Note the U.S. and U.K. ranks
merriam-webster.com | 867 | 315 | 1,817 | Note the U.S. rank
yourdictionary.com | 3,440 | 1,775 | 3,294 |
cambridge.org | 3,509 | 5,781 | 982 |
oxforddictionaries.com | 5,635 | 7,898 | 1,540 |
infoplease.com | 7,936 | 2,742 | 7,682 |
uchicago.edu | 7,983 | 3,084 | 7,711 | Hosts Webster 1913 and Roget 1911, but naturally also many other things. machaut.uchicago.edu - 1.2% of domain visitors go here; this is the subdomain that hosts the dictionaries.
macmillandictionary.com | 8,232 | 6,731 | 4,798 |
rhymezone.com | 13,207 | 3,548 | 6,301 |
collinsdictionary.com | 19,187 | 19,487 | 5,175 |
wordnik.com | 19,976 | 11,546 | 13,955 |
onelook.com | 20,022 | 8,228 | 17,626 |
vocabulary.com | 20,588 | 11,113 | 15,779 |
dicts.info | 124,942 | 144,782 | 81,619 |
wordsmyth.net | 147,063 | 64,960 | N/A |
allwords.com | 166,231 | 77,270 | 172,247 |
freedictionary.org | 576,552 | N/A | N/A |
freedict.org | 8,006,370 | N/A | N/A | It may be that most downloaders download the complete dictionary files; I don't know.

It follows that there are five dictionary web sites significantly competing with Wiktionary in terms of number of visitors: reference.com, wordreference.com, thefreedictionary.com, urbandictionary.com, and merriam-webster.com. All the other dictionaries perform worse than Wiktionary even in access from the U.S. and the U.K., no matter how good their definitions are. --Dan Polansky (talk) 10:58, 8 May 2013 (UTC) Updated. --Dan Polansky (talk) 15:16, 8 May 2013 (UTC)[reply]


Alexa rank for dictionaries selected with the focus on the Czech Republic aka Czechia:

Web Site | Alexa Rank for CR | Note
seznam.cz | 1 | slovnik.seznam.cz - 6% go to this subdomain, so Seznam dictionary is really popular; features data from Lingea and Macmillan Dictionary
centrum.cz | 10 | slovniky.centrum.cz - 0.56% go to this subdomain
abz.cz | 240 | slovnik-cizich-slov.abz.cz - 68% go to this subdomain
slovnik.cz | 423 | Features LangSoft vocabulary + GNU/FDL dictionary
online-slovnik.cz | 783 | En<-->cs + synonym dictionary; unclear owner and licensing terms
wiktionary.org | 827 | cs.wiktionary.org - 0.4% go to this subdomain, so chances are the visitors from Czechia actually go somewhere else, like to en.wikt, fr.wikt or de.wikt.
zcu.cz | 838 | slovnik.zcu.cz - the subdomain is not listed; why?
slovnik-synonym.cz | 1,072 | Seems to belong to abz.cz
lingea.cz | 8,016 | slovniky.lingea.cz - 36% go here

--Dan Polansky (talk) 13:14, 8 May 2013 (UTC)[reply]

See also http://www.alexa.com/topsites/category/Top/Reference/Dictionaries, a list of top Alexa sites in the Dictionaries category. There, Wiktionary is 3rd, probably based on the global Alexa rank. --Dan Polansky (talk) 15:16, 8 May 2013 (UTC)[reply]

The popularity of Wiktionary can be corroborated from other sources, rather than relying on Alexa only.

According to Google Ad Planner at http://www.google.com/adplanner/static/top1000, Wiktionary had rank of 574 by the number of unique visitors in July 2011; it had 8,200,000 unique visitors and 26,000,000 page views. As for some other dictionaries, thefreedictionary.com had rank 167 and 21,000,000 unique visitors, while merriam-webster.com had rank 700 and 7,400,000 unique visitors. Again, these are 2011 data. To find other dictionaries there, search for "dictionaries", as there is "Dictionaries & Encyclopedias" category shown in their table.

Website quantcast.com is another source. By going to http://www.quantcast.com/wiktionary.org, and using "Compare Site" button, you can compare Wiktionary popularity to other dictionaries, including "merriam-webster.com". The comparison is shown as a time-dependent graph. For March 31 through April 29 2013, the graph shows around 1.8 million "people" in United States per month for Wiktionary while around 14 million "people" for merriam-webster.com; for cambridge.org, it shows around 0.2 million "people". Presumably, "people" refers to unique visitors. --Dan Polansky (talk) 18:26, 17 May 2013 (UTC)[reply]


Page views of Wiktionary and some other stats per Wikimedia statistics[2]:

Language | Page Views in March 2013 | Very Active Editors in March 2013 | Speakers
English | 95,761,090 | 80 | 1,500,000,000
French | 36,264,524 | 32 | 200,000,000
Russian | 21,857,366 | 12 | 278,000,000
German | 14,115,657 | 17 | 185,000,000
Portuguese | 8,935,120 | 3 | 290,000,000
Polish | 8,399,996 | 11 | 43,000,000
Greek | 5,636,774 | 5 | 15,000,000
Chinese | 5,506,369 | 1 | 1,300,000,000
Spanish | 5,385,766 | 7 | 500,000,000
Italian | 5,181,025 | 4 | 70,000,000
Dutch | 4,243,709 | 6 | 27,000,000
Japanese | 3,502,241 | 3 | 132,000,000
Swedish | 3,118,274 | 3 | 10,000,000
Korean | 3,022,935 | 1 | 78,000,000
Vietnamese | 2,663,255 | 0 | 80,000,000
Turkish | 2,480,022 | 2 | 70,000,000
Finnish | 2,127,353 | 7 | 6,000,000
Malagasy | 1,606,076 | 0 | 20,000,000
Lithuanian | 1,503,229 | 0 | 3,500,000
Czech | 1,491,805 | 8 | 12,000,000

--Dan Polansky (talk) 17:39, 19 May 2013 (UTC)[reply]

Czech rhymes

I always thought Czech had stress on the first syllable in most words. Am I mistaken, or do words rhyme differently in Czech? —CodeCat 19:03, 14 May 2013 (UTC)[reply]

I don't think consciously of stress in Czech; I just speak it. Whatever the case, is there any impact on the pages that I am creating? Is there anything I have entered that you think in fact does not rhyme? --Dan Polansky (talk) 19:05, 14 May 2013 (UTC)[reply]
If stress is word-initial, I'd expect hrana and obrana to not rhyme. You are probably better at judging what rhymes and what doesn't, but if those words do rhyme, I am curious why that is. —CodeCat 19:11, 14 May 2013 (UTC)[reply]
I can't think of a rhyme with "hrana" and "obrana", but consider this: "Mariana byla panna¶ než vrazila do klokana". There, "panna" has two syllables while "klokana" has three syllables. It can be that the stress shifts to the preposition do before klokana; I do not really know. What I do know is that the words I am entering generally can be paired to create rhymes. --Dan Polansky (talk) 19:18, 14 May 2013 (UTC)[reply]
I think other Slavic languages have similar stress shifts with prepositions. It has something to do with the original Proto-Slavic pitch-based accent I think. —CodeCat 19:24, 14 May 2013 (UTC)[reply]

Deprecated Czech templates

There are some Czech entries listed in Category:Pages using deprecated templates. Could you have a look and fix them if possible? —CodeCat 19:23, 19 May 2013 (UTC)[reply]

I have removed the deprecation from {{cs-conj-it}}. After the server catches up, Category:Pages using deprecated templates should get emptied. The template was marked as deprecated in diff, on 3 March 2009. The template seems to produce correct results. I am not really much into Czech inflection templates, so I am unenthusiastic about implementing replacement proposals invented but not executed on by other editors. --Dan Polansky (talk) 20:11, 19 May 2013 (UTC)[reply]

Phonosemantic interpretations

Thank you for calling my attention to the new Beer Parlour thread, Dan. I await the community's decision, and will of course be adding no entries for the time being. Lawrence J. Howell (talk) 22:48, 9 June 2013 (UTC)[reply]

Your View?

‎Hello, Dan. My watchlist tells me that user 75.71.64.241 reverted data I uploaded for the character 身, writing Very little evidence to support those claims. As I'm abiding by the community's request to refrain from doing anything until the matter under debate has been settled, I believe it's only fair that the hands-off policy cut both ways. What's your take? Lawrence J. Howell (talk) 08:23, 13 June 2013 (UTC)[reply]

I don't really know. I don't think Wiktionary can keep "Phonosemantic interpretations" backed by a single source. The anon should better wait for the discussion to proceed, though. However, many view such waiting as too bureaucratic and proceed via a fast track. As per fast track, etymological content that is sourced from a single source, having no obvious other sources, and for which no sources are in the process of being added can be removed.
Links: Wiktionary:Beer_parlour/2013/June#Phonosemantic_interpretation, 75.71.64.241 (talk). --Dan Polansky (talk) 15:42, 13 June 2013 (UTC)[reply]

What is a misspelling

What is a misspelling may be a hard question but let us have a look, in a hasty sketch.

A misspelling can be understood as a transmission error, in terms of sending messages over a noisy communication channel. A message--a sequence of letters--sent over a noisy communication channel is subject to random changes to the letters. The intended received message is the one that was sent; the criterion of correctness is identity: the received message has correct spellings if they are identical to the spellings used in the sent message. As a consequence, misspellings resulting from noise in a low-noise channel tend to be of much lower frequency in the corpus of received messages than "correct" spellings.
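The noisy-channel picture above can be tried out in a small simulation. This is only a sketch under stated assumptions: the channel model (each letter independently replaced by a random other letter), the 1% error rate, the sample size and the function names `transmit` and `received_counts` are all made up for illustration and are not taken from any real corpus.

```python
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def transmit(word, error_rate, rng):
    """Send a word over a hypothetical noisy channel.

    Each letter is independently replaced by a random different letter
    with probability error_rate (an assumed, simplistic noise model).
    """
    out = []
    for ch in word:
        if rng.random() < error_rate:
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def received_counts(word, error_rate, n_messages, seed=0):
    """Count the spellings that arrive when the same word is sent n_messages times."""
    rng = random.Random(seed)
    return Counter(transmit(word, error_rate, rng) for _ in range(n_messages))


counts = received_counts("conceive", error_rate=0.01, n_messages=10_000)
correct = counts["conceive"]
# The single most frequent corrupted spelling and how often it arrived.
top_error, top_error_count = max(
    ((w, c) for w, c in counts.items() if w != "conceive"),
    key=lambda item: item[1],
)
print("correct spelling received:", correct)
print("most frequent misspelling:", top_error, top_error_count)
```

With a low error rate, the sent spelling should dominate the received corpus and every individual corrupted form should be rare, which is the frequency asymmetry the paragraph describes.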

What is the noise in the case of man-made misspellings? For one thing, each person makes misspellings in individual written utterances; these tend to have lower frequency in all writings of the person than the "correct" spellings. For another thing, a person can store an uncommon spelling as the standard one in the mind and consistently reproduce the spelling that has low frequency in the corpus of the language community but high frequency in the writing of that single person.

There may be an authority declaring what is and what is not a misspelling, such as a dictionary published by a successful commercial publisher or a dictionary published by a regulatory government-funded organization established in one of the countries in which the language prevails. The decision made by the dictionary may be arbitrary, disregarding current frequency. The point of making an arbitrary decision about "correct" spelling and sticking to it is enabling uniformity of spelling in the corpus, coupled with compactness of spelling patterns if the spelling decision is made according to implied spelling patterns and regularities rather than by individual frequencies.

As a practical frequency criterion, misspellings tend to have vanishingly lower frequency than their "correct" alternatives, whereas alternative spellings have much more favorable frequency ratio to the "correct" or mainstream alternatives. In English, it is worthwhile to have a look at frequency ratios of U.S. vs. British spellings, such as "color" vs "colour". From what I can see in Google Ngram Viewer, their frequency ratio tends to be 2 to 4, meaning the U.S. spelling is twice to four times more common in the whole corpus than the British spelling. By contrast, looking at "conceive" vs. "concieve", the frequency ratio is 1000.

As per frequency criterion, a misspelling can never have a higher frequency than a "correct" spelling. Nonetheless, there are probably etymology aficionados claiming about one mainstream spelling or another that it is "incorrect". If these are allowed to run authoritative dictionaries, their preferences can end up being codified as "correct". --Dan Polansky (talk)

Policies and would-be policies:

Discussions:

Categories:

--Dan Polansky (talk) 10:05, 5 July 2013 (UTC)[reply]

  • Yes, I'll go along with most of that. I had always assumed that spelling mistakes were honest errors (-ie- instead of -ei- etc.), the results of typing too fast (that's where most of mine come from) and simple ignorance (I can never remember how to spell manoeuvre). But when is a spelling mistake "common" (as the ones we include)? Maybe when the "frequency ratio" is greater than hundreds but less than thousands? SemperBlotto (talk) 10:25, 5 July 2013 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Re: 'Maybe when the "frequency ratio" is greater than hundreds but less than thousands?' Sounds okay to me as a criterion for "common misspelling"; what has lower frequency ratio is "alternative spelling". However, the lower bound could be even lower, like 20 or 50. In RFV, I have posted a table that gives an impression:

Short Term | Long Term | Ngram | Frequency Ratio in Year 2000
referencable | referenceable | Ngram | 8
experiencable | experienceable | Ngram | 10
influencable | influenceable | Ngram | 16.5
sequencable | sequenceable | Ngram | 6
servicable | serviceable | Ngram | 156
enforcable | enforceable | Ngram | 860
replacable | replaceable | Ngram | 190
colour | color | Ngram | 3.4
behaviour | behavior | Ngram | 2.8
rigour | rigor | Ngram | 2
concievable | conceivable | Ngram | 3867
idiosyncracy | idiosyncrasy | Ngram | 6
supercede | supersede | Ngram | 15

--Dan Polansky (talk) 10:42, 5 July 2013 (UTC)[reply]
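The criterion discussed in this exchange can be sketched as a small classifier over the ratios in the table above. The two cut-offs are placeholders only: `lower_bound=50` picks one of the values floated here ("like 20 or 50"), and `upper_bound=2000` is an arbitrary stand-in for "less than thousands"; neither is an agreed Wiktionary threshold, and the name `classify` is hypothetical.

```python
# Frequency ratios (second column to first column) for the year 2000,
# copied from the table above.
RATIOS = {
    "referencable": 8,
    "experiencable": 10,
    "influencable": 16.5,
    "sequencable": 6,
    "servicable": 156,
    "enforcable": 860,
    "replacable": 190,
    "colour": 3.4,
    "behaviour": 2.8,
    "rigour": 2,
    "concievable": 3867,
    "idiosyncracy": 6,
    "supercede": 15,
}


def classify(ratio, lower_bound=50, upper_bound=2000):
    """Label a spelling by its frequency ratio to the mainstream spelling.

    Both bounds are assumed placeholders, not settled policy.
    """
    if ratio < lower_bound:
        return "alternative spelling"
    if ratio <= upper_bound:
        return "common misspelling"
    return "rare misspelling"


for term, ratio in RATIOS.items():
    print(f"{term:15} {ratio:>7} {classify(ratio)}")
```

Moving `lower_bound` down to 20 or 10 changes the labels of borderline items such as "supercede", which shows how much hangs on the choice of bound.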

I was not paying attention. You asked when is a spelling mistake common enough to be includable. For this, not only frequency ratio can be considered but also absolute frequency. Let me think some more and have a look. --Dan Polansky (talk) 10:45, 5 July 2013 (UTC)[reply]

Currently, Wiktionary is not overflooded with misspellings, having 1477 English misspellings. To decide what misspellings to exclude based on frequency ratio, we would need to choose a fairly arbitrary threshold. I would choose such threshold that prevents overflooding of Wiktionary with misspellings while allowing a fair amount of them. As I cannot determine the number of acceptable misspellings per various frequency ratio thresholds, I have not much of an opinion on that threshold. From the table that follows, I would guess the threshold should be higher than 2000. With the use of the data that Google has published for download at Google Ngram Viewer, the number of misspellings per threshold could be determined, but that would require fairly heavy number crunching, it seems.

One could object that frequency ratio should not be used alone. I don't have much of an opinion on that other than that using it alone seems okay, not too bad.

Term 1 | Term 2 | Ngram | Ratio in Year 2000
beleive | believe | Ngram | 3349
beleiver | believer | Ngram | 22913
aquitted | acquitted | Ngram | 433
aquire | acquire | Ngram | 1075
arithmatically | arithmetically | Ngram | 441
concieve | conceive | Ngram | 1494
recieve | receive | Ngram | 1874
bibiliography | bibliography | Ngram | 2920
assidious | assiduous | Ngram | 1084
bizzare | bizarre | Ngram | 396
athiest | atheist | Ngram | 561
condensor | condenser | Ngram | 99
concensus | consensus | Ngram | 341
accross | across | Ngram | 5097

--Dan Polansky (talk) 12:06, 5 July 2013 (UTC)[reply]
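To get a feel for how a threshold would behave, one can count how many of the fourteen misspellings in the table above would still be admitted under a few candidate thresholds. This is a toy sketch on this one small table, not the heavy number crunching over the full Google Ngram data mentioned above; the candidate thresholds and the name `count_admitted` are chosen purely for illustration.

```python
# Ngram frequency ratios in the year 2000, copied from the table above.
NGRAM_RATIOS = {
    "beleive": 3349,
    "beleiver": 22913,
    "aquitted": 433,
    "aquire": 1075,
    "arithmatically": 441,
    "concieve": 1494,
    "recieve": 1874,
    "bibiliography": 2920,
    "assidious": 1084,
    "bizzare": 396,
    "athiest": 561,
    "condensor": 99,
    "concensus": 341,
    "accross": 5097,
}


def count_admitted(ratios, threshold):
    """Number of misspellings whose frequency ratio does not exceed the threshold."""
    return sum(1 for ratio in ratios.values() if ratio <= threshold)


# Candidate thresholds are arbitrary illustration values.
for threshold in (500, 1000, 2000, 5000, 10000):
    admitted = count_admitted(NGRAM_RATIOS, threshold)
    print(f"threshold {threshold:>6}: {admitted} of {len(NGRAM_RATIOS)} admitted")
```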

To get an idea of how selective the predicate "common misspelling" is as opposed to mere "misspelling", I had a little look at imaginable misspellings of "conceive", and their frequency ratio as per Google Ngram Viewer:

Spelling | Corpus Frequency in Y2000 in % | Freq Ratio to Base Spelling | Ngram
conceive | 0.0006574282 | 1 | Ngram
concieve | 0.0000004472 | 1470 | Ngram
coceive | Not found | N/A | Ngram
cocneive | Not found | N/A | Ngram
cnceive | Not found | N/A | Ngram
concive | 0.0000000197 | 33372 | Ngram
conceie | Not found | N/A | Ngram
conceibe | Not found | N/A | Ngram
conceice | Not found | N/A | Ngram

Notice that, using Google Ngram Viewer, we are looking at Google books, which is a corpus of copyedited works, as contrasted to world wide web. --Dan Polansky (talk) 09:02, 6 July 2013 (UTC)[reply]
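The ratio column of the table above follows directly from the corpus frequencies: the frequency of the base spelling divided by the frequency of the variant. A minimal sketch of that arithmetic, with the frequencies copied from the table and `None` standing in for "Not found" (the names `FREQ_PERCENT` and `ratio_to_base` are illustrative only):

```python
BASE = "conceive"

# Corpus frequency in the year 2000, in percent, copied from the table above.
# None marks spellings that were not found in the corpus.
FREQ_PERCENT = {
    "conceive": 0.0006574282,
    "concieve": 0.0000004472,
    "coceive": None,
    "cocneive": None,
    "cnceive": None,
    "concive": 0.0000000197,
    "conceie": None,
    "conceibe": None,
    "conceice": None,
}


def ratio_to_base(freqs, base):
    """Return base frequency divided by each spelling's frequency (None if not found)."""
    base_freq = freqs[base]
    return {
        spelling: (None if freq is None else base_freq / freq)
        for spelling, freq in freqs.items()
    }


ratios = ratio_to_base(FREQ_PERCENT, BASE)
for spelling, ratio in ratios.items():
    shown = "N/A" if ratio is None else str(round(ratio))
    print(f"{spelling:10} {shown}")
```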

To broaden the impression, here comes a comparison of a couple of -ize/-ise forms:

Term 1 | Term 2 | Ngram | Frequency Ratio in Year 2000
analyse | analyze | Ngram | 2.6
crystalise | crystalize | Ngram | 6.8
revitalise | revitalize | Ngram | 6.5
popularise | popularize | Ngram | 3.7
formalise | formalize | Ngram | 4.4
pluralise | pluralize | Ngram | 7.5
criticise | criticize | Ngram | 5.1
realise | realize | Ngram | 6.7
organise | organize | Ngram | 5.9
equalise | equalize | Ngram | 7.8
neutralise | neutralize | Ngram | 6.8
socialise | socialize | Ngram | 9.8

--Dan Polansky (talk) 18:29, 9 July 2013 (UTC)[reply]

Hypothesis: Copyediting massively impacts frequency ratio. Verification:

Term 1 | Term 2 | Ngram | Ngram Freq Ratio in Year 2000 | Freq Ratio in English Web | Ratio of Ratios | Hits 1 | Hits 2
beleive | believe | Ngram | 3349 | 127 | 26 | 22900000 | 2900000000
beleiver | believer | Ngram | 22913 | 417 | 55 | 220000 | 91700000
aquitted | acquitted | Ngram | 433 | 243 | 2 | 188000 | 45600000
aquire | acquire | Ngram | 1075 | 72 | 15 | 5080000 | 366000000
arithmatically | arithmetically | Ngram | 441 | 50 | 9 | 9640 | 484000
concieve | conceive | Ngram | 1494 | 2 | 612 | 25400000 | 62000000
recieve | receive | Ngram | 1874 | 40 | 46 | 56000000 | 2260000000
bibiliography | bibliography | Ngram | 2920 | 2118 | 1 | 68000 | 144000000
assidious | assiduous | Ngram | 1084 | 93 | 12 | 25600 | 2390000
bizzare | bizarre | Ngram | 396 | 27 | 15 | 16300000 | 444000000
athiest | atheist | Ngram | 561 | 67 | 8 | 1710000 | 115000000
condensor | condenser | Ngram | 99 | 9 | 11 | 4130000 | 37200000
concensus | consensus | Ngram | 341 | 91 | 4 | 1990000 | 181000000
accross | across | Ngram | 5097 | 187 | 27 | 16800000 | 3140000000

Anomalies or outliers: acquitted, conceive, bibliography.

--Dan Polansky (talk) 17:19, 12 July 2013 (UTC)[reply]
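The derived columns of the verification table can be recomputed from the raw numbers: the web ratio is Hits 2 divided by Hits 1, and the ratio of ratios is the Ngram ratio divided by the unrounded web ratio. The sketch below does that and then flags outliers; the flagging band of 3 to 100 is an assumption picked after the fact so that it singles out the three anomalies named above, not a criterion stated in the discussion.

```python
# (misspelling, ngram_ratio, hits_misspelling, hits_correct), copied from the table above.
ROWS = [
    ("beleive", 3349, 22_900_000, 2_900_000_000),
    ("beleiver", 22913, 220_000, 91_700_000),
    ("aquitted", 433, 188_000, 45_600_000),
    ("aquire", 1075, 5_080_000, 366_000_000),
    ("arithmatically", 441, 9_640, 484_000),
    ("concieve", 1494, 25_400_000, 62_000_000),
    ("recieve", 1874, 56_000_000, 2_260_000_000),
    ("bibiliography", 2920, 68_000, 144_000_000),
    ("assidious", 1084, 25_600, 2_390_000),
    ("bizzare", 396, 16_300_000, 444_000_000),
    ("athiest", 561, 1_710_000, 115_000_000),
    ("condensor", 99, 4_130_000, 37_200_000),
    ("concensus", 341, 1_990_000, 181_000_000),
    ("accross", 5097, 16_800_000, 3_140_000_000),
]


def ratio_of_ratios(ngram_ratio, hits_misspelling, hits_correct):
    """Return (web_ratio, ngram_ratio / web_ratio) without intermediate rounding."""
    web_ratio = hits_correct / hits_misspelling
    return web_ratio, ngram_ratio / web_ratio


# Assumed band for "ordinary" rows; anything outside it is flagged as an outlier.
LOW, HIGH = 3, 100

outliers = []
for term, ngram_ratio, hits_bad, hits_good in ROWS:
    web_ratio, ror = ratio_of_ratios(ngram_ratio, hits_bad, hits_good)
    flag = "" if LOW <= ror <= HIGH else "  <- outlier"
    if flag:
        outliers.append(term)
    print(f"{term:15} web ratio {web_ratio:8.1f}  ratio of ratios {ror:7.1f}{flag}")
```

Most rows come out with a ratio of ratios well above 1, in line with the hypothesis that the copyedited book corpus suppresses misspellings far more strongly than the web does.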

This currently has a chemistry definition. But given that it has a Proto-Slavic origin, it's almost certainly missing senses. Can you help? —CodeCat 16:14, 19 July 2013 (UTC)[reply]

Also, are -ný and -ní the same suffix or is there a difference? —CodeCat 16:25, 19 July 2013 (UTC)[reply]

I have added a def to -ný. -ný does not seem to be the same suffix as -ní. --Dan Polansky (talk) 16:47, 22 July 2013 (UTC)[reply]

Personal attack

Why did you have to personally attack me on my own user talk page? If anyone is being shoddy, you are by attacking me personally on my own talk page. Don't do it again. Razorflame 19:34, 28 July 2013 (UTC)[reply]

Evidence to the claims I have made on your talk page is in the archives of your talk page, in your editing history and in your block log. If you find any inaccuracy in what I write, let me know. --Dan Polansky (talk) 18:26, 29 July 2013 (UTC)[reply]
It is a personal attack. Don't add it back to my talk page. Razorflame 20:18, 29 July 2013 (UTC)[reply]
@Razorflame: You're bandying about "personal attack" with abandon. Don't. What he wrote is not what I, or I believe most editors, would consider a personal attack.
@Dan: That said, I quote the following from WT:BLOCK: "[A reasonable cause for blocking is causing] ... our editors distress by directly insulting them or by being continually impolite towards them." I'm not going to block you, but it is true that you are arguably being "continually impolite". Please be civil. —Μετάknowledgediscuss/deeds 21:08, 29 July 2013 (UTC)[reply]
You are misrepresenting WT:BLOCK. The complete WT:BLOCK policy is this: "The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary. It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed." I am calling Razorflame to responsibility for what he does. If you find a particular sentence that I posted incivil, be specific about it. For the record, given your history of misrepresentations and poor understanding, I am nowhere all too enthusiastic seeing you on my talk page or in the talk between me and Razorflame. --Dan Polansky (talk) 21:13, 29 July 2013 (UTC)[reply]
Yes, I know you dislike me as well. I am here because you two are at each other's throats, and I am (apparently ineffectively) trying to make sure that neither does something actually blockworthy. —Μετάknowledgediscuss/deeds 21:20, 29 July 2013 (UTC)
Be specific. --Dan Polansky (talk) 21:20, 29 July 2013 (UTC)
About what? If you mean for me to be specific about "something actually blockworthy", I essentially mean harassment. Whether or not harassment has occurred could easily be argued; I think not, but Razorflame certainly feels harassed, judging by his defensive reaction. —Μετάknowledgediscuss/deeds 21:41, 29 July 2013 (UTC)
Which sentence that I have posted is uncivil, blockworthy or borderline blockworthy? --Dan Polansky (talk) 21:42, 29 July 2013 (UTC)
Nothing except reposting material that Razorflame read, removed, and (relatively civilly) asked you not to repost. You have a right to notify him of his errors on his talkpage, but reposting reverted material like that is basically edit warring. I don't think you could be reasonably blocked for it, but if you continue, perhaps someone would block you (as I said, I myself would not). —Μετάknowledgediscuss/deeds 21:51, 29 July 2013 (UTC)
Do you believe users have the right to remove posts to their talk pages that are critical of their editing? I am not notifying Razorflame about his errors; I am notifying other editors of Razorflame's dubious editing by providing direct evidence in the form of diff hyperlinks from which editors can figure things out for themselves, without taking my word for it. My posts on Razorflame's talk page are not for Razorflame, and he knows it very well. This is why he is removing my posts. I would have blocked him for those removals, but I am not an admin. --Dan Polansky (talk) 21:55, 29 July 2013 (UTC)
Yes, I do believe he has that right. If you truly wished to post them for the community, you should do so in the BP. —Μετάknowledgediscuss/deeds 22:02, 29 July 2013 (UTC)
A user's talk page is the most natural location for finding out about him. If I posted to Beer parlour, and months later people came to Razorflame's talk page, they would not find much there. But because most of the discussions about Razorflame took place on his talk page, any editor with a sincere wish to know about Razorflame can conveniently find out. Furthermore, there is no need to bring his editing to broad community attention when the user talk page suffices. Thus, I see not a single benefit of posting to Beer parlour, while I see two benefits of posting to his talk page. --Dan Polansky (talk) 22:07, 29 July 2013 (UTC)

trreq

I want {{trreq}} deleted or constrained as much as possible. Other editors disagree.

Discussions:

I am equally okay with dumping {{rfe}}. --Dan Polansky (talk) 07:42, 3 August 2013 (UTC)

Sort order in Czech

How are Czech words normally sorted? Are all diacritics ignored or are there some special rules? —CodeCat 19:20, 7 August 2013 (UTC)

Czech sorting is sketched here: Index_talk:Czech#Sorting_or_ordering. Thus, a and á are equivalent in sorting order, while r and ř are not. Another tricky thing is "ch", which is treated as a single letter rather than "c" followed by "h". If you want to sort Czech properly in a programming language, there should be a library that takes care of locale and collation. --Dan Polansky (talk) 19:26, 7 August 2013 (UTC)
I'm asking because I want to make our own software approximate Czech sorting reasonably well. If I understand it correctly:
  • Vowels with an acute accent should be equivalent to the basic vowel, and ů is also equivalent to u.
  • ch should be equivalent to h
  • Letters with háček are distinct from the basic letter. (Unfortunately we can't make these appear in the correct order, so they'll go at the end)
What happens to w? It's not a native letter of Czech, but when it does occur, is it considered equivalent to v or distinct? And I suppose that the rules for Slovak are similar, but Slovak also has ä and ô, are those considered equivalent to a and o? —CodeCat 19:33, 7 August 2013 (UTC)
Your 1st and 3rd bullets are right, but the 2nd is wrong: ch comes after h rather than being equivalent. w comes after v; nothing special going on there. Both points should follow from Index_talk:Czech#Sorting_or_ordering. I don't know about Slovak. If you install gsort from GnuWin32, you should be able to figure these things out empirically, by playing with the locale. gsort or some GNU library might have documentation or a specification that you might want to see. --Dan Polansky (talk) 19:40, 7 August 2013 (UTC)
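The rules confirmed above can be sketched as a sort key in Python. This is only an illustration, not the code of any existing Wiktionary module: the names `CZECH_ALPHABET`, `RANK`, `FOLD` and `czech_sort_key` are made up for the example. It folds acute-accented vowels and ů onto the base vowel, treats "ch" as one letter placed after "h", and puts w after v. It treats only č, ř, š, ž as separate letters and folds ě, ď, ň, ť onto their base letters; that last choice is an assumption of the sketch and should be checked against Index_talk:Czech#Sorting_or_ordering.

```python
import unicodedata

# Primary letters in order; "ch" is a single letter placed after "h".
# Assumption of this sketch: only č, ř, š, ž count as separate letters.
CZECH_ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]
RANK = {letter: index for index, letter in enumerate(CZECH_ALPHABET)}

# Letters treated as equivalent to a base letter (acute vowels, ů, and,
# as an assumption, ě, ď, ň, ť).
FOLD = str.maketrans("áéíóúýůěďňť", "aeiouyuednt")


def czech_sort_key(word):
    """Return a list of letter ranks usable as a key for sorted()."""
    text = unicodedata.normalize("NFC", word.lower()).translate(FOLD)
    key = []
    i = 0
    while i < len(text):
        if text.startswith("ch", i):
            # The digraph counts as one letter.
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the alphabet sort after it, by code point.
            key.append(RANK.get(text[i], len(CZECH_ALPHABET) + ord(text[i])))
            i += 1
    return key


words = ["cihla", "chata", "hrad", "řeka", "ruka", "čaj", "car", "ábel", "abeceda"]
print(sorted(words, key=czech_sort_key))
```

Words that differ only in a folded diacritic get identical keys here, so their relative order is whatever the input order was; a fuller implementation would add a tie-breaking level.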
We can't make ch sort after h on Wiktionary. We can't change the order of letters, only make certain letters equivalent to others. So either ch would be sorted under c+h, or considered equivalent to h. We could make ch sort at the end of h, so that the order would be hy, hz, cha, chb... but they would still appear under the H section. —CodeCat 19:43, 7 August 2013 (UTC)
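The equivalence-only approximation described above could look roughly like the following Python sketch. It assumes, hypothetically, that the software compares sort keys as plain strings by code point; the names `ACUTE_FOLD`, `CH_MARKER` and `approx_sort_key` are invented for the example and do not refer to any real Wiktionary code. Acute vowels and ů are folded onto the base vowel, "ch" is rewritten so that it lands at the end of the h words, and letters with háček are left untouched, so they fall after z.

```python
# Fold acute-accented vowels and ů onto the base vowel.
ACUTE_FOLD = str.maketrans("áéíóúýů", "aeiouyu")

# Hypothetical trick: rewrite "ch" as "h" plus a very high code point,
# so that ch words sort after every other word starting with "h".
CH_MARKER = "h\uffff"


def approx_sort_key(word):
    """Return a string key meant to be compared by plain code point order."""
    text = word.lower().translate(ACUTE_FOLD)
    return text.replace("ch", CH_MARKER)


words = ["hymna", "chata", "hzz", "ibis", "čaj", "zebra", "car"]
print(sorted(words, key=approx_sort_key))
```

In this sketch "chata" ends up after "hzz" but still among the h words, and "čaj" ends up after "zebra", which are the two limitations mentioned in the thread.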
A Czech sorting that places ě, š, č, ř, ž after all the diacritic-free letters seems so fundamentally broken that I would not bother fixing the rest. --Dan Polansky (talk) 19:49, 7 August 2013 (UTC)

Block

For reference, here is a block summary: 7 August 2013 -sche (Talk | contribs) blocked Dan Polansky (Talk | contribs) with an expiry time of 1 week (account creation disabled) (Intimidating behavior/harassment: Violating WT:AGF+WT:BLOCK. Hounding+attacking editor despite being warned by MK such behavior was unacceptable+blockable. Refusing to remove attack or acknowledge such behaviour was unacceptable despite being w...).

I cry foul. --Dan Polansky (talk) 22:10, 7 August 2013 (UTC)

I am now blocked from editing my talk page by Liliana. Just wow! I am using an open proxy with the sole purpose of making this post, circumventing unjust and utterly unjustified prevention of my posting to my talk page by abusive admin Liliana. --Dan Polansky --184.173.253.2 22:24, 7 August 2013 (UTC)

Mommy, Liliana blocked me! That is so mean! Boo hoo hoo hoo!
Seriously, what did you expect when you continued with the kind of behavior that you've been blocked for? Did you seriously think that everyone would just look away and not mind that you're flaming other people behind their backs? -- Liliana 22:34, 7 August 2013 (UTC)