User talk:Dan Polansky: difference between revisions

Revision as of 22:34, 7 August 2013

Usefulness of phrasebook

I find some of the phrasebook entries rather useful, as their translation is not grammatically straightforward.

I'm hungry
- Czech: mám hlad, as if "I have hunger"
- German: ich habe Hunger, as if "I have hunger"
I'm thirsty
- Czech: mám žízeň, as if "I have thirst"
- German: ich habe Durst, as if "I have thirst"
do you speak English
- Czech: mluvíte anglicky?, as if "do you speak in English" or even *"do you speak Englishly"?
- Polish: czy mówisz po angielsku?, which I do not know how to render in an English analogue, maybe "do you speak in an English manner or way"
I'm cold
- Czech: je mi zima, rather than *"jsem studený"; in English, perhaps "it is cold to me"?
- German: es ist mir kalt, rather than *"ich bin kalt"; in English, perhaps "it is cold to me"?
how are you?
- Czech: jak se máš, as if "how are you having yourself?"
- German: wie geht es dir?, as if "how does it go with you?" or the like
have a seat
- Czech: posaďte se, as if "sit down"
how do you say...in English?
- Czech: jak se řekne...anglicky?, as if "how does ... get said Englishly"?
- German: wie sagt man...auf Englisch, as if "how does one say ... on English"?
how much does it cost?
- German: was kostet das?, as if "what does it cost", which happens to be idiomatic English

Category:English phrasebook has 357 entries. --Dan Polansky (talk) 18:06, 5 January 2013 (UTC)

Exactly. If only we could get rid of all the rubbish, the phrasebook would be a useful part of the project. SemperBlotto (talk) 18:09, 5 January 2013 (UTC)

Some more:

I have a cold
- Czech: jsem nachlazený, as if "I am got cold" or the like
- German: ich bin erkältet, just like Czech
- Russian: ja prostudílsja, as if "I have colded myself through" or the like
I'm twenty years old
- Czech: je mi dvacet let, as if "it is twenty years to me"
- Polish: mam dwadzieścia lat, as if "I have twenty years"

--Dan Polansky (talk) 11:52, 6 January 2013 (UTC)

some random words

Just a few random words from the beginning of the Czech Wikipedia article on hydrogen (not a full, sorted list of red links, as my software doesn't understand the funny accents over letters). Don't feel under any obligation to add them! SemperBlotto (talk) 16:11, 6 January 2013 (UTC)

Vodík chemická latinsky nejjednodušší tvořící převážnou hmoty vesmíru Má široké praktické redukční činidlo chemické syntéze metalurgii meteorologických pouťových balonů vzducholodí Obsah Základní fyzikálně-chemické vlastnosti Historický Výskyt přírodě Tvorba průmyslová Využití Sloučeniny Anorganické sloučeniny Hydridy Další Organické sloučeniny Izotopy vodíku Odkazy Související články Literatura Základní fyzikálně-chemické vlastnosti Molekula chuti zápachu hoří namodralým plamenem nepodporuje - some capitalisation will be wrong

Thanks. We have lemma forms, or base forms, of many of these, although not all of them: vodík, chemický, latinský (adjective rather than the redlinked adverb), jednoduchý, tvořit, převážný, hmota, vesmír, mít, široký, praktický, redukční, činidlo, chemický, syntéza, metalurgie, etc.

For your method to work for me, I would need to enter inflected forms of Czech words into Wiktionary, which I don't feel like doing. I actually have a list of Czech words to add, working offline on their verification, from time to time. --Dan Polansky (talk) 19:37, 7 January 2013 (UTC)

ttbc

Hi,

When adding ttbc's, please check for qualifiers; they sometimes explain the sense, as in trio#Translations. --Anatoli (обсудить/вклад) 00:47, 9 January 2013 (UTC)

Harry Potter

I don't really understand the difference between Category:Harry Potter and Category:Harry Potter derivations. Where do metloboj or bezjak or smrtožder belong? Zabadu (talk)

Also, I am very worried about anti-Serb bias here. For example there is no Serbia category but there is Croatia category. Why is that?

Can you please help me add flag of Serbia to Category:Serbia?

Zabadu (talk)

No comment. --Dan Polansky (talk) 13:37, 12 January 2013 (UTC)

chargemaster

Please see talk:chargemaster. Please can we discuss this more before removing this material, as it is integral to the definition. Thank you, -- Cirt (talk) 22:06, 9 March 2013 (UTC)

Thanks very much for your polite response on the talk page, I really appreciate it! :) I've responded there, -- Cirt (talk) 04:47, 10 March 2013 (UTC)

Request about "chargemaster"

Request: Please, Dan Polansky (talk • contribs), I ask of you to read this article:

Template:quote-news

I think that will give you some clarity about the term chargemaster. Thank you for your time, -- Cirt (talk) 18:18, 10 March 2013 (UTC)

I responded at WT:RFD. --Dan Polansky (talk) 19:30, 10 March 2013 (UTC)

DONE: Trimmed the definition to that suggested by Dan Polansky (talk • contribs), above, please see DIFF. Hopefully this is now satisfactory to Dan Polansky (talk • contribs). Thank you, -- Cirt (talk) 23:40, 10 March 2013 (UTC)

Please read this article

Template:quote-news

I strongly recommend you read this article, as a good faith gesture, it would help inform our discussion. Can you please read it? It is most informative. Thank you, -- Cirt (talk) 23:43, 10 March 2013 (UTC)

Let me note that, in the discussion about the definition of "chargemaster", I am acting in the capacity of a dictionary maker, trying to figure out what is and what is not a part of the definition of "chargemaster". I am not defending whatever despicable practices exist in relation to chargemasters. --Dan Polansky (talk) 19:18, 11 March 2013 (UTC)

Sure, sure, I agree with you and I don't doubt your good faith intentions. :) I'm just respectfully asking you to read this article, please? -- Cirt (talk) 20:29, 11 March 2013 (UTC)

Re: KYPark

Well said. I owe you one. —Μετάknowledge (discuss/deeds) 02:17, 14 March 2013 (UTC)

I second. --Anatoli (обсудить/вклад) 02:43, 14 March 2013 (UTC)

drug

A note to myself and whoever cares to read: I am dissatisfied with the "drug" entry, currently having four senses. Recent related events:

Wiktionary:RFC#drug, August 2012, originally at RFV
A conversation at User_talk:Msh210#drug, 17 March 2013

As a consequence, I have done this:

Sent the 1st sense to WT:RFD, with the intention of making it more narrow by removing part of the definition.
Sent the 2nd sense to WT:RFV with the intention of getting it removed.

--Dan Polansky (talk) 15:32, 27 April 2013 (UTC)

Key definition edits to "drug" entry:

diff, March 2003: 1st def entered of "Substance used to treat an illness, relieve a symptom or modify a chemical process in the body for a specific purpose."
diff, May 2003: A 2nd def entered: "Addictive substance used to alter the level of consciousness"
diff, August 2004: 2nd def tweak: "A substance, often addictive, used to alter the level of consciousness"
diff, July 2005: 2nd def tweak: "A substance, often addictive, which affects the central nervous system"
diff, March 2006: 3rd def added: "A chemical or substance, not necessarily for medical purposes, that alters the way the mind or body works", with the summary "Added definition(noun) that encomasses non-medicinal drugs)", by an anon
diff, December 2006: 4th def added: "An illegal drug", by an anon
diff, March 2007: 4th def tweaked: "A drug, especially illegal, taken for recreational use"
diff, July 2008: 4th def tweaked: "A substance, especially one which is illegal, ingested for recreational use."
diff, May 2013: 4th def tweaked: "A psychoactive substance, especially one which is illegal and addictive, ingested for recreational use, such as cocaine"

--Dan Polansky (talk) 12:20, 4 May 2013 (UTC)

Two senses removed by me in diff, failing WT:RFV.

I have reverted this revision by an anon, one of interest:

(pharmacology) A substance intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease in man or other animals; a medicine.
(pharmacology) A substance (other than food) intended to affect the structure or any function of the body of man or other animals.
A narcotic substance.
(figuratively) Anything that has the effect of a narcotic substance.

The 1st definition is subject to objections raised in a recent RFD: contraceptives do not meet the definition. This may be dealt with by placing contraceptives under the 2nd definition, but it is unobvious why the 2nd definition should be separate from the 1st one.

The "narcotic substance" definition is unhelpful, IMHO; it relies on narcotic entry featuring three definitions, failing to select the intended definition from "narcotic". Furthermore, the "narcotic substance" definition may be wrong, depending on what the definitions at "narcotic" are intended to cover; the 1st one seems to entail "induces sleep" as a condition necessary, so it does not cover all illicit drugs; the 2nd one entails "numbing", so ditto; the 3rd one "certain illegal drugs" is massively unspecific, failing to tell us which illegal drugs it selects, but if it selects some and not all, then it cannot be covering all illicit drugs.

When checking "drug" in OneLook Dictionary Search, I find that no dictionary equates illicit drugs with narcotics. Collins, for instance, has "chemical substance, esp a narcotic, ...", which makes it clear that "drug" and "narcotic" are not synonymous.

The 4th figurative sense is one that we are possibly missing, and was mentioned by msh210 on his talk page. However, it should be added only together with citations supporting it, IMHO. --Dan Polansky (talk) 09:59, 30 June 2013 (UTC)

tax office

Hi there. In the UK a "tax office" is a place where you can go (or phone) to discuss your tax affairs. The organization used to be called the "inland revenue" and is now called "HM Revenue and Customs". See [1] as an example of use. SemperBlotto (talk) 10:10, 4 May 2013 (UTC)

Oops! So it is not like "post office", which can refer both to a particular place and to the organization itself. The tax-collecting organizations have various specific names across the world, as per W:Revenue_service: "HM Revenue and Customs" (U.K.), "Internal Revenue Service (IRS)" (U.S.), "Australian Taxation Office" and "Canada Revenue Agency". Would "revenue service" be the generic term for tax-collecting agency or organization I am looking for? What about tax agency, or tax authority? --Dan Polansky (talk) 10:19, 4 May 2013 (UTC)

Yes, I think that "revenue/tax service/agency/authority" combinations are used in the UK and elsewhere as a generic term for the organization. In the UK, there are several other "offices" that are organizations rather than places (normally capitalised) - Office for National Statistics is one that springs to mind. SemperBlotto (talk) 10:26, 4 May 2013 (UTC)

I have fixed the entry. Feel free to edit it further. --Dan Polansky (talk) 10:33, 4 May 2013 (UTC)

Wiktionary popularity among online dictionaries per Alexa rank

I can't even believe the following statistics:

Rank of dictionary web sites per number of visitors per Alexa.com, ordered by global rank:

Web	Alexa Global Rank	Alexa U.S. Rank	Alexa U.K. Rank	Note
wikipedia.org	6	8	10	Listed despite not being a dictionary, as a super successful Mediawiki project
reference.com	207	77	144	dictionary.reference.com - 54% visitors of the domain go here
thefreedictionary.com	265	223	205	Multi-lingual; has a definition dictionary for several languages; by Farlex
wordreference.com	306	1,024	325
wiktionary.org	641	1,313	867	This is for all Wiktionaries, not just the English one; en.wiktionary.org - 40% of visitors of the domain go to this subdomain
urbandictionary.com	836	378	429	Note the U.S. and U.K. ranks
merriam-webster.com	867	315	1,817	Note the U.S. rank
yourdictionary.com	3,440	1,775	3,294
cambridge.org	3,509	5,781	982
oxforddictionaries.com	5,635	7,898	1,540
infoplease.com	7,936	2,742	7,682
uchicago.edu	7,983	3,084	7,711	Hosts Webster 1913 and Roget 1911, but naturally also many other things; machaut.uchicago.edu - 1.2% of domain visitors go here; this is the subdomain that hosts the dictionaries.
macmillandictionary.com	8,232	6,731	4,798
rhymezone.com	13,207	3,548	6,301
collinsdictionary.com	19,187	19,487	5,175
wordnik.com	19,976	11,546	13,955
onelook.com	20,022	8,228	17,626
vocabulary.com	20,588	11,113	15,779
dicts.info	124,942	144,782	81,619
wordsmyth.net	147,063	64,960	N/A
allwords.com	166,231	77,270	172,247
freedictionary.org	576,552	N/A	N/A
freedict.org	8,006,370	N/A	N/A	It may be that most downloaders download the complete dictionary files; I don't know.

It follows that there are five dictionary web sites significantly competing with Wiktionary in terms of number of visitors: reference.com, wordreference.com, thefreedictionary.com, urbandictionary.com, and merriam-webster.com. All the other dictionaries perform worse than Wiktionary even in access from the U.S. and the U.K., no matter how good the definitions they offer. --Dan Polansky (talk) 10:58, 8 May 2013 (UTC) Updated. --Dan Polansky (talk) 15:16, 8 May 2013 (UTC)

Alexa rank for dictionaries selected with the focus on the Czech Republic aka Czechia:

Web Site	Alexa Rank for CR	Note
seznam.cz	1	slovnik.seznam.cz - 6% go to this subdomain, so Seznam dictionary is really popular; features data from Lingea and Macmillan Dictionary
centrum.cz	10	slovniky.centrum.cz - 0.56% go to this subdomain
abz.cz	240	slovnik-cizich-slov.abz.cz - 68% go to this subdomain
slovnik.cz	423	Features LangSoft vocabulary + GNU/FDL dictionary
online-slovnik.cz	783	En<-->cs + synonym dictionary; unclear owner and licensing terms
wiktionary.org	827	cs.wiktionary.org - 0.4% go to this subdomain, so chances are the visitors from Czechia actually go somewhere else, like to en.wikt, fr.wikt or de.wikt.
zcu.cz	838	slovnik.zcu.cz - the subdomain is not listed; why?
slovnik-synonym.cz	1,072	Seems to belong to abz.cz
lingea.cz	8,016	slovniky.lingea.cz - 36% go here

--Dan Polansky (talk) 13:14, 8 May 2013 (UTC)

See also http://www.alexa.com/topsites/category/Top/Reference/Dictionaries, a list of top Alexa sites in the Dictionaries category. There, Wiktionary is 3rd, probably based on the global Alexa rank. --Dan Polansky (talk) 15:16, 8 May 2013 (UTC)

The popularity of Wiktionary can be corroborated from other sources, rather than relying on Alexa only.

According to Google Ad Planner at http://www.google.com/adplanner/static/top1000, Wiktionary had a rank of 574 by the number of unique visitors in July 2011; it had 8,200,000 unique visitors and 26,000,000 page views. As for some other dictionaries, thefreedictionary.com had rank 167 and 21,000,000 unique visitors, while merriam-webster.com had rank 700 and 7,400,000 unique visitors. Again, these are 2011 data. To find other dictionaries there, search for "dictionaries", as there is a "Dictionaries & Encyclopedias" category shown in their table.

Website quantcast.com is another source. By going to http://www.quantcast.com/wiktionary.org and using the "Compare Site" button, you can compare Wiktionary's popularity to that of other dictionaries, including "merriam-webster.com". The comparison is shown as a time-dependent graph. For March 31 through April 29, 2013, the graph shows around 1.8 million "people" in the United States per month for Wiktionary and around 14 million "people" for merriam-webster.com; for cambridge.org, it shows around 0.2 million "people". Presumably, "people" refers to unique visitors. --Dan Polansky (talk) 18:26, 17 May 2013 (UTC)

Page views of Wiktionary and some other stats per Wikimedia statistics[2]:

Language	Page Views in March 2013	Very Active Editors in March 2013	Speakers
English	95,761,090	80	1,500,000,000
French	36,264,524	32	200,000,000
Russian	21,857,366	12	278,000,000
German	14,115,657	17	185,000,000
Portuguese	8,935,120	3	290,000,000
Polish	8,399,996	11	43,000,000
Greek	5,636,774	5	15,000,000
Chinese	5,506,369	1	1,300,000,000
Spanish	5,385,766	7	500,000,000
Italian	5,181,025	4	70,000,000
Dutch	4,243,709	6	27,000,000
Japanese	3,502,241	3	132,000,000
Swedish	3,118,274	3	10,000,000
Korean	3,022,935	1	78,000,000
Vietnamese	2,663,255	0	80,000,000
Turkish	2,480,022	2	70,000,000
Finnish	2,127,353	7	6,000,000
Malagasy	1,606,076	0	20,000,000
Lithuanian	1,503,229	0	3,500,000
Czech	1,491,805	8	12,000,000

--Dan Polansky (talk) 17:39, 19 May 2013 (UTC)

Czech rhymes

I always thought Czech had stress on the first syllable in most words. Am I mistaken, or do words rhyme differently in Czech? —CodeCat 19:03, 14 May 2013 (UTC)

I don't think consciously of stress in Czech; I just speak it. Whatever the case, is there any impact on the pages that I am creating? Is there anything I have entered that you think in fact does not rhyme? --Dan Polansky (talk) 19:05, 14 May 2013 (UTC)

If stress is word-initial, I'd expect hrana and obrana to not rhyme. You are probably better at judging what rhymes and what doesn't, but if those words do rhyme, I am curious why that is. —CodeCat 19:11, 14 May 2013 (UTC)

I can't think of a rhyme with "hrana" and "obrana", but consider this: "Mariana byla panna / než vrazila do klokana". There, "panna" has two syllables while "klokana" has three syllables. It can be that the stress shifts to the preposition do before klokana; I do not really know. What I do know is that the words I am entering generally can be paired to create rhymes. --Dan Polansky (talk) 19:18, 14 May 2013 (UTC)

I think other Slavic languages have similar stress shifts with prepositions. It has something to do with the original Proto-Slavic pitch-based accent I think. —CodeCat 19:24, 14 May 2013 (UTC)

Deprecated Czech templates

There are some Czech entries listed in Category:Pages using deprecated templates. Could you have a look and fix them if possible? —CodeCat 19:23, 19 May 2013 (UTC)

I have removed the deprecation from {{cs-conj-it}}. After the server catches up, Category:Pages using deprecated templates should get emptied. The template was marked as deprecated in diff, on 3 March 2009. The template seems to produce correct results. I am not really much into Czech inflection templates, so I am unenthusiastic about implementing replacement proposals invented but not executed on by other editors. --Dan Polansky (talk) 20:11, 19 May 2013 (UTC)

Phonosemantic interpretations

Thank you for calling my attention to the new Beer Parlour thread, Dan. I await the community's decision, and will of course be adding no entries for the time being. Lawrence J. Howell (talk) 22:48, 9 June 2013 (UTC)

Your View?

Hello, Dan. My watchlist tells me that user 75.71.64.241 reverted data I uploaded for the character 身, writing "Very little evidence to support those claims." As I'm abiding by the community's request to refrain from doing anything until the matter under debate has been settled, I believe it's only fair that the hands-off policy cut both ways. What's your take? Lawrence J. Howell (talk) 08:23, 13 June 2013 (UTC)

I don't really know. I don't think Wiktionary can keep "Phonosemantic interpretations" backed by a single source. The anon should better wait for the discussion to proceed, though. However, many view such waiting as too bureaucratic and proceed via a fast track. As per fast track, etymological content that is sourced from a single source, having no obvious other sources, and for which no sources are in the process of being added can be removed.

Links: Wiktionary:Beer_parlour/2013/June#Phonosemantic_interpretation, 75.71.64.241 (talk). --Dan Polansky (talk) 15:42, 13 June 2013 (UTC)

What is a misspelling

What is a misspelling may be a hard question but let us have a look, in a hasty sketch.

A misspelling can be understood as a transmission error, in terms of sending messages over a noisy communication channel. A message--a sequence of letters--sent over a noisy communication channel is subject to random changes to the letters. The intended received message is the one that was sent; the criterion of correctness is identity: the received message has correct spellings if they are identical to the spellings used in the sent message. As a consequence, misspellings resulting from the noise of a low-noise channel tend to be of much lower frequency in the corpus of received messages than "correct" spellings.

What is the noise in the case of man-made misspellings? For one thing, each person makes misspellings in individual written utterances; these tend to have lower frequency in all writings of the person than the "correct" spellings. For another thing, a person can store an uncommon spelling as the standard one in the mind and consistently reproduce the spelling that has low frequency in the corpus of the language community but high frequency in the writing of that single person.

There may be an authority declaring what is and what is not a misspelling, such as a dictionary published by a successful commercial publisher or a dictionary published by a regulatory government-funded organization established in one of the countries in which the language prevails. The decision made by the dictionary may be arbitrary, disregarding current frequency. The point of making an arbitrary decision about "correct" spelling and sticking to it is enabling uniformity of spelling in the corpus, coupled with compactness of spelling patterns if the spelling decision is made according to implied spelling patterns and regularities rather than by individual frequencies.

As a practical frequency criterion, misspellings tend to have a vanishingly low frequency compared with their "correct" alternatives, whereas alternative spellings have a much more favorable frequency ratio to the "correct" or mainstream alternatives. In English, it is worthwhile to have a look at frequency ratios of U.S. vs. British spellings, such as "color" vs "colour". From what I can see in Google Ngram Viewer, their frequency ratio tends to be 2 to 4, meaning the U.S. spelling is twice to four times more common in the whole corpus than the British spelling. By contrast, looking at "conceive" vs. "concieve", the frequency ratio is roughly 1000.

As per the frequency criterion, a misspelling can never have a higher frequency than a "correct" spelling. Nonetheless, there are probably etymology aficionados claiming about one mainstream spelling or another that it is "incorrect". If these are allowed to run authoritative dictionaries, their preferences can end up being codified as "correct". --Dan Polansky (talk)

Policies and would-be policies:

Discussions:

WT:RFV#referencable, later at Talk:referencable
Talk:idiosyncracy
Talk:supercede
Talk:legionaire

Categories:

Category:English misspellings - currently 1,477 entries; only for common misspellings

--Dan Polansky (talk) 10:05, 5 July 2013 (UTC)

Yes, I'll go along with most of that. I had always assumed that spelling mistakes were honest errors (-ie- instead of -ei- etc.), the results of typing too fast (that's where most of mine come from) and simple ignorance (I can never remember how to spell manoeuvre). But when is a spelling mistake "common" (as the ones we include)? Maybe when the "frequency ratio" is greater than hundreds but less than thousands? SemperBlotto (talk) 10:25, 5 July 2013 (UTC)

Re: 'Maybe when the "frequency ratio" is greater than hundreds but less than thousands?' Sounds okay to me as a criterion for "common misspelling"; whatever has a lower frequency ratio is an "alternative spelling". However, the lower bound could be even lower, like 20 or 50. In RFV, I have posted a table that gives an impression:

Term 1	Term 2	Ngram	Frequency Ratio in Year 2000
referencable	referenceable	Ngram	8
experiencable	experienceable	Ngram	10
influencable	influenceable	Ngram	16.5
sequencable	sequenceable	Ngram	6
servicable	serviceable	Ngram	156
enforcable	enforceable	Ngram	860
replacable	replaceable	Ngram	190
colour	color	Ngram	3.4
behaviour	behavior	Ngram	2.8
rigour	rigor	Ngram	2
concievable	conceivable	Ngram	3867
idiosyncracy	idiosyncrasy	Ngram	6
supercede	supersede	Ngram	15

--Dan Polansky (talk) 10:42, 5 July 2013 (UTC)
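The split between "alternative spelling" and "misspelling" discussed above can be sketched in a few lines of Python. This is only an illustration, not a proposed policy: the ratios are copied from the table above, the lower bound of 20 is one of the values floated in the discussion, and the names RATIOS, LOWER_BOUND and classify are hypothetical.

```python
# Frequency ratios (term 2 / term 1) for year 2000, copied from the table above.
RATIOS = {
    "referencable": 8,
    "experiencable": 10,
    "influencable": 16.5,
    "sequencable": 6,
    "servicable": 156,
    "enforcable": 860,
    "replacable": 190,
    "colour": 3.4,
    "behaviour": 2.8,
    "rigour": 2,
    "concievable": 3867,
    "idiosyncracy": 6,
    "supercede": 15,
}

# Assumption: 20 is one of the lower bounds suggested above ("like 20 or 50").
LOWER_BOUND = 20


def classify(ratio, lower_bound=LOWER_BOUND):
    """Label a spelling by its frequency ratio to the mainstream form."""
    return "misspelling" if ratio >= lower_bound else "alternative spelling"


labels = {term: classify(ratio) for term, ratio in RATIOS.items()}
misspellings = sorted(term for term, label in labels.items() if label == "misspelling")
print(misspellings)
```

With a bound of 20, only the four high-ratio forms are labelled misspellings; raising the bound to 50 gives the same split for this particular table.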

I was not paying attention. You asked when is a spelling mistake common enough to be includable. For this, not only frequency ratio can be considered but also absolute frequency. Let me think some more and have a look. --Dan Polansky (talk) 10:45, 5 July 2013 (UTC)

Currently, Wiktionary is not overflooded with misspellings, having 1477 English misspellings. To decide what misspellings to exclude based on frequency ratio, we would need to choose a fairly arbitrary threshold. I would choose such threshold that prevents overflooding of Wiktionary with misspellings while allowing a fair amount of them. As I cannot determine the number of acceptable misspellings per various frequency ratio thresholds, I have not much of an opinion on that threshold. From the table that follows, I would guess the threshold should be higher than 2000. With the use of the data that Google has published for download at Google Ngram Viewer, the number of misspellings per threshold could be determined, but that would require fairly heavy number crunching, it seems.

One could object that frequency ratio should not be used alone. I don't have much of an opinion on that other than that using it alone seems okay, not too bad.

Term 1	Term 2	Ngram	Ratio in Year 2000
beleive	believe	Ngram	3349
beleiver	believer	Ngram	22913
aquitted	acquitted	Ngram	433
aquire	acquire	Ngram	1075
arithmatically	arithmetically	Ngram	441
concieve	conceive	Ngram	1494
recieve	receive	Ngram	1874
bibiliography	bibliography	Ngram	2920
assidious	assiduous	Ngram	1084
bizzare	bizarre	Ngram	396
athiest	atheist	Ngram	561
condensor	condenser	Ngram	99
concensus	consensus	Ngram	341
accross	across	Ngram	5097

--Dan Polansky (talk) 12:06, 5 July 2013 (UTC)
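To give a feel for how the choice of threshold changes the number of includable misspellings, here is a small Python sketch restricted to the fourteen pairs in the table above. It assumes that a misspelling is kept when its frequency ratio does not exceed the threshold; the names RATIOS and included are hypothetical, and the real exercise over the downloadable Ngram data is not attempted here.

```python
# Frequency ratios (correct spelling / misspelling) for year 2000,
# copied from the table above.
RATIOS = {
    "beleive": 3349,
    "beleiver": 22913,
    "aquitted": 433,
    "aquire": 1075,
    "arithmatically": 441,
    "concieve": 1494,
    "recieve": 1874,
    "bibiliography": 2920,
    "assidious": 1084,
    "bizzare": 396,
    "athiest": 561,
    "condensor": 99,
    "concensus": 341,
    "accross": 5097,
}


def included(ratios, threshold):
    """Misspellings whose ratio does not exceed the threshold.

    Assumption: a low ratio means the misspelling is relatively common,
    so it is kept; anything above the threshold is treated as too rare.
    """
    return sorted(term for term, ratio in ratios.items() if ratio <= threshold)


for threshold in (500, 1000, 2000, 5000):
    print(threshold, len(included(RATIOS, threshold)))
```

On this small sample, a threshold of 2000 keeps 10 of the 14 misspellings and drops "beleive", "beleiver", "bibiliography" and "accross".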

To get an idea of how selective the predicate "common misspelling" is as opposed to mere "misspelling", I had a little look at imaginable misspellings of "conceive", and their frequency ratio as per Google Ngram Viewer:

Spelling	Corpus Frequency in Y2000 in %	Freq Ratio to Base Spelling	Ngram
conceive	0.0006574282	1	Ngram
concieve	0.0000004472	1470	Ngram
coceive	Not found	N/A	Ngram
cocneive	Not found	N/A	Ngram
cnceive	Not found	N/A	Ngram
concive	0.0000000197	33372	Ngram
conceie	Not found	N/A	Ngram
conceibe	Not found	N/A	Ngram
conceice	Not found	N/A	Ngram

Notice that, using Google Ngram Viewer, we are looking at Google Books, which is a corpus of copyedited works, as contrasted with the World Wide Web. --Dan Polansky (talk) 09:02, 6 July 2013 (UTC)
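The ratio column in the table above follows directly from the corpus frequencies. A minimal Python sketch, assuming the percentages are entered by hand as in the table and that "Not found" is represented as None; the names FREQ_PERCENT and ratio_to_base are hypothetical.

```python
# Corpus frequency in year 2000, in percent, copied from the table above.
# None stands for "Not found".
FREQ_PERCENT = {
    "conceive": 0.0006574282,
    "concieve": 0.0000004472,
    "coceive": None,
    "cocneive": None,
    "cnceive": None,
    "concive": 0.0000000197,
    "conceie": None,
    "conceibe": None,
    "conceice": None,
}


def ratio_to_base(freqs, base):
    """Frequency of the base spelling divided by each variant's frequency."""
    base_freq = freqs[base]
    return {
        spelling: None if freq is None else round(base_freq / freq)
        for spelling, freq in freqs.items()
    }


ratios = ratio_to_base(FREQ_PERCENT, "conceive")
for spelling, ratio in ratios.items():
    print(spelling, "N/A" if ratio is None else ratio)
```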

To broaden the impression, here comes a comparison of a couple of -ize/-ise forms:

Term 1	Term 2	Ngram	Frequency Ratio in Year 2000
analyse	analyze	Ngram	2.6
crystalise	crystalize	Ngram	6.8
revitalise	revitalize	Ngram	6.5
popularise	popularize	Ngram	3.7
formalise	formalize	Ngram	4.4
pluralise	pluralize	Ngram	7.5
criticise	criticize	Ngram	5.1
realise	realize	Ngram	6.7
organise	organize	Ngram	5.9
equalise	equalize	Ngram	7.8
neutralise	neutralize	Ngram	6.8
socialise	socialize	Ngram	9.8

--Dan Polansky (talk) 18:29, 9 July 2013 (UTC)

Hypothesis: Copyediting massively impacts frequency ratio. Verification:

Term 1	Term 2	Ngram	Ngram Freq Ratio in Year 2000	Freq Ratio in English Web	Ratio of Ratios	Hits 1	Hits 2
beleive	believe	Ngram	3349	127	26	22900000	2900000000
beleiver	believer	Ngram	22913	417	55	220000	91700000
aquitted	acquitted	Ngram	433	243	2	188000	45600000
aquire	acquire	Ngram	1075	72	15	5080000	366000000
arithmatically	arithmetically	Ngram	441	50	9	9640	484000
concieve	conceive	Ngram	1494	2	612	25400000	62000000
recieve	receive	Ngram	1874	40	46	56000000	2260000000
bibiliography	bibliography	Ngram	2920	2118	1	68000	144000000
assidious	assiduous	Ngram	1084	93	12	25600	2390000
bizzare	bizarre	Ngram	396	27	15	16300000	444000000
athiest	atheist	Ngram	561	67	8	1710000	115000000
condensor	condenser	Ngram	99	9	11	4130000	37200000
concensus	consensus	Ngram	341	91	4	1990000	181000000
accross	across	Ngram	5097	187	27	16800000	3140000000

Anomalies or outliers: acquitted, conceive, bibliography.

--Dan Polansky (talk) 17:19, 12 July 2013 (UTC)
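The "Ratio of Ratios" column above can be recomputed from the other columns. A minimal Python sketch, assuming the web frequency ratio is simply Hits 2 divided by Hits 1 and that the ratio of ratios is the Ngram ratio divided by the unrounded web ratio; the names ROWS and ratio_of_ratios are hypothetical.

```python
# (term 1, term 2, Ngram ratio in 2000, hits 1, hits 2), copied from the table above.
ROWS = [
    ("beleive", "believe", 3349, 22_900_000, 2_900_000_000),
    ("beleiver", "believer", 22913, 220_000, 91_700_000),
    ("aquitted", "acquitted", 433, 188_000, 45_600_000),
    ("aquire", "acquire", 1075, 5_080_000, 366_000_000),
    ("arithmatically", "arithmetically", 441, 9_640, 484_000),
    ("concieve", "conceive", 1494, 25_400_000, 62_000_000),
    ("recieve", "receive", 1874, 56_000_000, 2_260_000_000),
    ("bibiliography", "bibliography", 2920, 68_000, 144_000_000),
    ("assidious", "assiduous", 1084, 25_600, 2_390_000),
    ("bizzare", "bizarre", 396, 16_300_000, 444_000_000),
    ("athiest", "atheist", 561, 1_710_000, 115_000_000),
    ("condensor", "condenser", 99, 4_130_000, 37_200_000),
    ("concensus", "consensus", 341, 1_990_000, 181_000_000),
    ("accross", "across", 5097, 16_800_000, 3_140_000_000),
]


def ratio_of_ratios(ngram_ratio, hits_1, hits_2):
    """Ngram frequency ratio divided by the web frequency ratio (hits 2 / hits 1)."""
    web_ratio = hits_2 / hits_1
    return ngram_ratio / web_ratio


results = {
    term_1: round(ratio_of_ratios(ngram_ratio, hits_1, hits_2))
    for term_1, _term_2, ngram_ratio, hits_1, hits_2 in ROWS
}
for term_1, value in results.items():
    print(term_1, value)
```

A value well above 1 means the misspelling is much rarer in the copyedited book corpus than on the web, which is what the hypothesis predicts.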

-ný

This currently has a chemistry definition. But given that it has a Proto-Slavic origin, it's almost certainly missing senses. Can you help? —CodeCat 16:14, 19 July 2013 (UTC)

Also, are -ný and -ní the same suffix or is there a difference? —CodeCat 16:25, 19 July 2013 (UTC)

I have added a def to -ný. -ný does not seem to be the same suffix as -ní. --Dan Polansky (talk) 16:47, 22 July 2013 (UTC)

Personal attack

Why did you have to personally attack me on my own user talk page? If anyone is being shoddy, you are by attacking me personally on my own talk page. Don't do it again. Razorflame 19:34, 28 July 2013 (UTC)

Evidence to the claims I have made on your talk page is in the archives of your talk page, in your editing history and in your block log. If you find any inaccuracy in what I write, let me know. --Dan Polansky (talk) 18:26, 29 July 2013 (UTC)

It is a personal attack. Don't add it back to my talk page. Razorflame 20:18, 29 July 2013 (UTC)

@Razorflame: You're bandying about "personal attack" with abandon. Don't. What he wrote is not what I, or I believe most editors, would consider a personal attack.

@Dan: That said, I quote the following from WT:BLOCK: "[A reasonable cause for blocking is causing] ... our editors distress by directly insulting them or by being continually impolite towards them." I'm not going to block you, but it is true that you are arguably being "continually impolite". Please be civil. —Μετάknowledge^{discuss/deeds} 21:08, 29 July 2013 (UTC)Reply

You are misrepresenting WT:BLOCK. The complete WT:BLOCK policy is this: "The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary. It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed." I am calling Razorflame to responsibility for what he does. If you find a particular sentence that I posted incivil, be specific about it. For the record, given your history of misrepresentations and poor understanding, I am nowhere all too enthusiastic seeing you on my talk page or in the talk between me and Razorflame. --Dan Polansky (talk) 21:13, 29 July 2013 (UTC)Reply

Yes, I know you dislike me as well. I am here because you two are at each other's throats, and I am (apparently ineffectively) trying to make sure that neither does something actually blockworthy. —Μετάknowledge^{discuss/deeds} 21:20, 29 July 2013 (UTC)Reply

Be specific. --Dan Polansky (talk) 21:20, 29 July 2013 (UTC)Reply

About what? If you mean for me to be specific about "something actually blockworthy", I essentially mean harassment. Whether or not harassment has occurred could easily be argued; I think not, but Razorflame certainly feels harassed, judging by his defensive reaction. —Μετάknowledge^{discuss/deeds} 21:41, 29 July 2013 (UTC)Reply

Which sentence that I have posted is uncivil, blockworthy or borderline blockworthy? --Dan Polansky (talk) 21:42, 29 July 2013 (UTC)Reply

Nothing except reposting material that Razorflame read, removed, and (relatively civilly) asked you not to repost. You have a right to notify him of his errors on his talkpage, but reposting reverted material like that is basically edit warring. I don't think you could be reasonably blocked for it, but if you continue, perhaps someone would block you (as I said, I myself would not). —Μετάknowledge^{discuss/deeds} 21:51, 29 July 2013 (UTC)Reply

Do you believe users have the right to remove posts to their talk pages that are critical of their editing? I am not notifying Razorflame about his errors; I am notifying other editors of Razorflame's dubious editing by providing direct evidence in the form of diff hyperlinks from which editors can figure things out for themselves, without taking my word for it. My posts on Razorflame's talk page are not for Razorflame, and he knows it very well. This is why he is removing my posts. I would have blocked him for those removals, but I am not an admin. --Dan Polansky (talk) 21:55, 29 July 2013 (UTC)Reply

Yes, I do believe he has that right. If you truly wished to post them for the community, you should do so in the BP. —Μετάknowledge^{discuss/deeds} 22:02, 29 July 2013 (UTC)Reply

A user's talk page is the most natural location for finding out about him. If I posted to Beer parlour, and months later people came to Razorflame's talk page, they would not find much there. But because most of the discussions about Razorflame took place on his talk page, any editor with a sincere wish to know about Razorflame can conveniently find out. Furthermore, there is no need to bring his editing to broad community attention when the user talk page suffices. Thus, I see not a single benefit of posting to Beer parlour, while I see two benefits of posting to his talk page. --Dan Polansky (talk) 22:07, 29 July 2013 (UTC)Reply

trreq


I want {{trreq}} deleted or constrained as much as possible. Other editors disagree.

Discussions:

I am equally okay with dumping {{rfe}}. --Dan Polansky (talk) 07:42, 3 August 2013 (UTC)Reply

Sort order in Czech


How are Czech words normally sorted? Are all diacritics ignored or are there some special rules? —CodeCat 19:20, 7 August 2013 (UTC)Reply

Czech sorting is sketched here: Index_talk:Czech#Sorting_or_ordering. Thus, a and á are equivalent for sorting purposes, while r and ř are not. Another tricky thing is "ch", which is treated as a single letter rather than "c" followed by "h". If you want to sort Czech properly in a programming language, there should be a library that takes care of locale and collation. --Dan Polansky (talk) 19:26, 7 August 2013 (UTC)Reply

I'm asking because I want to make our own software approximate Czech sorting reasonably well. If I understand it correctly:

- Vowels with an acute accent should be equivalent to the basic vowel, and ů is also equivalent to u.
- ch should be equivalent to h
- Letters with haček are distinct from the basic letter. (Unfortunately we can't make these appear in the correct order, so they'll go at the end)

What happens to w? It's not a native letter of Czech, but when it does occur, is it considered equivalent to v or distinct? And I suppose that the rules for Slovak are similar, but Slovak also has ä and ô, are those considered equivalent to a and o? —CodeCat 19:33, 7 August 2013 (UTC)Reply

Your 1st and 3rd bullets are right, but the 2nd is wrong: ch comes after h rather than being equivalent. w comes after v; nothing special going on there. Both points should follow from Index_talk:Czech#Sorting_or_ordering. I don't know about Slovak. If you install gsort from GnuWin32, you should be able to figure these things out empirically, by playing with locale. gsort or some GNU library might have documentation or a specification that you might want to see. --Dan Polansky (talk) 19:40, 7 August 2013 (UTC)Reply
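As an illustration of the rules stated so far, here is a minimal Python sketch of a Czech sort key. It is not taken from any library and not from Wiktionary's software: the names `CZECH_ALPHABET`, `EQUIVALENT` and `czech_sort_key` are hypothetical, and the alphabet list is an assumption built only from this thread (acute vowels and ů equal to the base vowel, every haček letter distinct and placed right after its base letter, ch as one letter after h, w after v). A proper locale-aware collation library may treat some letters, such as ě, ď, ň and ť, differently.

```python
# Assumed letter order, following only the rules discussed in this thread.
CZECH_ALPHABET = [
    "a", "b", "c", "č", "d", "ď", "e", "ě", "f", "g", "h", "ch", "i", "j",
    "k", "l", "m", "n", "ň", "o", "p", "q", "r", "ř", "s", "š", "t", "ť",
    "u", "v", "w", "x", "y", "z", "ž",
]
RANK = {letter: index for index, letter in enumerate(CZECH_ALPHABET)}

# Letters treated as equivalent to their base vowel (assumption from the thread).
EQUIVALENT = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ů": "u", "ý": "y"}


def czech_sort_key(word):
    """Return a key that orders words by the assumed Czech letter order."""
    text = word.lower()
    ranks = []
    i = 0
    while i < len(text):
        # "ch" counts as a single letter that sorts after "h".
        if text.startswith("ch", i):
            ranks.append(RANK["ch"])
            i += 2
            continue
        letter = EQUIVALENT.get(text[i], text[i])
        # Characters outside the alphabet sort after it, by code point.
        ranks.append(RANK.get(letter, len(CZECH_ALPHABET) + ord(letter)))
        i += 1
    # The original word breaks ties between words with identical ranks.
    return (tuple(ranks), word)


words = ["chata", "cesta", "hrad", "had", "čaj", "ihned",
         "řeka", "ruka", "váha", "wolfram", "vůz"]
print(sorted(words, key=czech_sort_key))
```

With this key, "chata" lands between "hrad" and "ihned", and "řeka" follows "ruka", which matches the ordering described above.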

We can't make ch sort after h on Wiktionary. We can't change the order of letters, only make certain letters equivalent to others. So either ch would be sorted under c+h, or considered equivalent to h. We could make ch sort at the end of h, so that the order would be hy, hz, cha, chb... but they would still appear under the H section. —CodeCat 19:43, 7 August 2013 (UTC)Reply
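The workaround described here (ch at the end of h, still under H) can be sketched as a plain string-replacement sort key compared by code point. This is only an assumption about how such a key could work, not a description of Wiktionary's actual software; `REPLACEMENTS`, `approx_sort_key` and the "~" marker are hypothetical choices made for this example.

```python
# Hypothetical replacement table: "ch" becomes "h" plus a marker that sorts
# after "z", and acute vowels and ů collapse onto their base vowels.
REPLACEMENTS = [
    ("ch", "h~"),  # "~" (U+007E) comes after "z" (U+007A) in code point order
    ("á", "a"), ("é", "e"), ("í", "i"), ("ó", "o"),
    ("ú", "u"), ("ů", "u"), ("ý", "y"),
]


def approx_sort_key(word):
    """Build a sort key using substitutions only; letters are never reordered."""
    key = word.lower()
    for old, new in REPLACEMENTS:
        key = key.replace(old, new)
    return key


words = ["ihned", "chléb", "hymna", "chata", "had"]
print(sorted(words, key=approx_sort_key))
```

Because haček letters are left untouched, they still sort by code point after z, which is the limitation mentioned in the list above.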

A Czech sorting that places ě, š, č, ř, ž after all the diacritic-free letters seems so fundamentally broken that I would not bother fixing the rest. --Dan Polansky (talk) 19:49, 7 August 2013 (UTC)Reply

Block


For reference, here is a block summary: 7 August 2013 -sche (Talk | contribs) blocked Dan Polansky (Talk | contribs) with an expiry time of 1 week (account creation disabled) (Intimidating behavior/harassment: Violating WT:AGF+WT:BLOCK. Hounding+attacking editor despite being warned by MK such behavior was unacceptable+blockable. Refusing to remove attack or acknowledge such behaviour was unacceptable despite being w...).

I cry foul. --Dan Polansky (talk) 22:10, 7 August 2013 (UTC)Reply

I am now blocked from editing my talk page by Liliana. Just wow! I am using an open proxy with the sole purpose of making this post, circumventing unjust and utterly unjustified prevention of my posting to my talk page by abusive admin Liliana. --Dan Polansky --184.173.253.2 22:24, 7 August 2013 (UTC)Reply

Mommy, Liliana blocked me! That is so mean! Boo hoo hoo hoo!
Seriously, what did you expect when you continued with the kind of behavior that you've been blocked for? Did you seriously think that everyone would just look away and not mind that you're flaming other people behind their backs? -- Liliana • 22:34, 7 August 2013 (UTC)Reply


