User talk:-sche/exceptional



Worst gloss: trampolo; worst plural: sneakers. — Ungoliant (Falai) 00:54, 20 January 2013 (UTC)

lol @ trampolo! I've cleaned it up, though; I worked out which non-bird sense was meant by checking Google images. And I switched sneakers to use the same format everyone at RFD seemed to like on trainers. I think that saves both entries, and Wiktionary's reputation! :) - -sche (discuss) 01:47, 20 January 2013 (UTC)

I think that hoeng1gong2 jyu5jin4hok6 hok6wui6 jyut6jyu5 ping3jam1 fong1on3 deserves a place on the list but I can't think of a good reason why. —Μετάknowledgediscuss/deeds 01:04, 4 February 2013 (UTC)

It could go in the anteroom of silliness as "surprisingly not the result of keyboard mashing". Or "Entries which look like keyboard mashing" could be a category? lol - -sche (discuss) 02:24, 4 February 2013 (UTC)
It faces stiff competition from FlatO@InsideChesthigh-PalmDown-FlatO@InsideChesthigh-PalmDown Nod FlatO@InsideChesthigh-PalmForward-FlatO@Inside-PalmForward OpenB@SideChesthigh-OpenB@SideChesthigh OpenB@SideTrunkhigh-OpenB@SideTrunkhigh. —Μετάknowledgediscuss/deeds 05:14, 4 February 2013 (UTC)
  • Longest living incorrect pronunciation: /ˈʌnu/ at Portuguese ano, added in 2004, removed in 2013. — Ungoliant (Falai) 22:05, 4 February 2013 (UTC)
  • Most prons: pwn. Also, note the number of past participles. — Ungoliant (Falai) 13:22, 16 May 2013 (UTC)

  1. the longest entry we have? (which, btw, is this);
    and longest non-humorous word
  2. let's have subpages for specific languages
  3. and make this official? --Dixtosa (talk) 13:46, 10 February 2015 (UTC)


papadom is a strong candidate for the "most alternative forms" title. --Rowboater (talk) 22:14, 4 February 2013 (UTC)

So it is! Thanks! - -sche (discuss) 23:19, 4 February 2013 (UTC)
But are all the forms citeable? I specifically checked each form at Hanukkah. —Μετάknowledgediscuss/deeds 02:17, 5 February 2013 (UTC)
OK, I've checked all the forms; about half turned out to be bogus. - -sche (discuss) 03:07, 10 March 2013 (UTC)


How about something for words that have come from an excessive number of other languages? Currently I only see stuff like dragoman and Toki Pona, but I'm sure better examples exist. —Μετάknowledgediscuss/deeds 17:47, 9 March 2013 (UTC)

Hm, that's a good idea, but I too am having a hard time thinking of examples. - -sche (discuss) 03:09, 10 March 2013 (UTC)
Aha, there's "oka", which is English, from Italian, from French, from Turkish, where the Turkish is sometimes thought to be from Arabic, from Classical Syriac, from Greek, from Latin (so, it's seven steps removed from its etymon, without even going into the non-borrowing descent sequence: Latin from Old Latin from Proto-Italic from Proto-Indo-European). And "cukier" is Polish, from German, from Italian, from Arabic, from Persian, from Sanskrit (five steps, if that chain is correct). - -sche (discuss) 07:05, 10 March 2013 (UTC)

Ungoliant (Falai) 13:23, 10 March 2013 (UTC)

  • trump < possibly Italian < Latin < Etruscan < Ancient Greek < possibly Phrygian or Illyrian. — Ungoliant (Falai) 13:25, 10 March 2013 (UTC)
    I love the one at shaman#Etymology. —Μετάknowledgediscuss/deeds 00:11, 30 September 2013 (UTC)
    It's very impressive, if it's right. Apricot seems to have gone from Italy to Turkey and then back across the Mediterranean to Spain (and then up towards England), but shaman is even more well-travelled. It could use some inline refs, though... I'll see what I can do. - -sche (discuss) 01:48, 1 October 2013 (UTC)
I've added a section. (Perhaps we could award honorable mentions to unusual etymology chains, too, like words loaned from Tuvaluan into Xhosa and thence into Russian.) - -sche (discuss) 23:52, 17 January 2015 (UTC)

Most alt. forms

Knowledge. — Ungoliant (Falai) 01:57, 11 April 2013 (UTC)

Weeding out the ones which are only attested in Middle English, or are not attested at all, knocks it from 30 down to 16, but that's still impressive. - -sche (discuss) 18:06, 11 April 2013 (UTC)

Note to self, investigate barghest and hajduk. - -sche (discuss) 22:07, 28 June 2013 (UTC)

  • Thanks for all the help with Gaddafi! Think you could perform the same services for Muammar? I searched pretty extensively, but of course I may have missed some. Anyway, I doubt we'll ever find any term that can outdo that in alt forms, but if we do, I'm betting on another Arabic name (have you tried looking for more for Muhammad?). —This unsigned comment was added by Metaknowledge (talkcontribs).
    Now that you mention it, I can find some more alt forms of Muhammad... it hadn't occurred to me to look for terminal -id, -ut and other schwas. - -sche (discuss) 04:26, 28 January 2014 (UTC)
    A sneaky method is to steal from other languages. Turkish, Persian, and French caused the proliferation at Husayn; I think that spellings like Maxamed (from Somali) should be fair game. Plus, you can try to find scholarly transliterations, which I've been ignoring, but which could have diacritics etc and still be citeable. —Μετάknowledgediscuss/deeds 04:38, 28 January 2014 (UTC)
    Try forms with -ph- and -f- (and maybe -ff-). — Ungoliant (falai) 13:30, 1 February 2014 (UTC)
    I did manage to find one attested -ph- variant of Gaddafi. Or were you talking about Muhammad? (I can find MOffAMMED as a scanno of MOHAMMED, lol.) - -sche (discuss) 22:27, 1 February 2014 (UTC)
    Muhammad. Baphomet derived from Muhammad so I expect there might be some mediaeval forms with -f- or -ph-. Probably Hispanicisms as /f/ is the regular adaptation of /ħ/ in Old Spanish loanwords from Andalusian Arabic. — Ungoliant (falai) 01:39, 2 February 2014 (UTC)

Most parts of speech

Portuguese a: letter, noun, article, pronoun, preposition, verb, contraction. — Ungoliant (Falai) 02:31, 8 June 2013 (UTC)

Worst formatting

[1] (Volapük section): a word related to half the language’s other words. — Ungoliant (Falai) 02:11, 16 June 2013 (UTC)

I dunno. The user who did that did that to a lot of pages, and the information is not exactly incorrect (or even badly formatted)... it's just very copious. - -sche (discuss) 02:45, 16 June 2013 (UTC)
I’d consider lists that long without {{top2}} ~ 5 or {{rel-top}} badly formatted. Most of the content is unnecessary anyway. Why add SoP terms to Derived terms? Why add terms only remotely related to dinosaur to See also? — Ungoliant (Falai) 02:53, 16 June 2013 (UTC)

Descendants section completeness

Appendix:Latin/metipsimus. — Ungoliant (Falai) 05:00, 23 June 2013 (UTC)

Impressive! - -sche (discuss) 14:07, 23 June 2013 (UTC)
I imagine we can find (or make) some really comprehensively 'descended from' PIE roots, too (especially using Yair's etymtree, modelled here, to show all the descendants on one page without actually duplicating the text itself across pages). - -sche (discuss) 14:10, 23 June 2013 (UTC)
I imagine if someone fleshed out Appendix:Proto-Sino-Tibetan/s-la or , it would overtake even शर्करा. - -sche (discuss) 02:12, 10 February 2015 (UTC)
Honestly, the bulk of it would come from outside of Europe, which is where Wiktionary gets weak. I could do it by spending a day at the library, I suppose, but it seems pointless. (Same situation with the descendants of ἐκκλησία ‎(ekklēsía).) —Μετάknowledgediscuss/deeds 20:04, 20 February 2016 (UTC)
Don't worry about getting to a library; just the translations we already have in tea will push this to the top, once I sort out who borrowed what from where. - -sche (discuss) 08:52, 21 February 2016 (UTC)
It's not so easy! For example, did the smaller languages of East Africa (for most of which I doubt we have translations anywhere on this project except at water) borrow it directly from the source, or via Swahili? I suspect the latter, but I doubt it's even knowable. For that matter, our entry says that Swahili got it from Hindi, which I suppose is ultimately true, but I bet it went via Persian. —Μετάknowledgediscuss/deeds 19:12, 21 February 2016 (UTC)
Do you mean via Arabic? Hindi चाय seems to derive from Persian (rather than to have led to a Persian word), because I've read that the y in Persian čây (and hence in Hindi chai) is a grammatical suffix the Persians added to the ča form they borrowed. For languages where it's not clear whether or not there was an intermediary, I've put it under the known ancestor with a note 'possibly via X'. (I wonder if all the loanwords are of interest to anyone besides Wiktionarians, or if we should collapse them or put them in a separate table, where they wouldn't swamp the few inherited [Chinese] words.) - -sche (discuss) 02:36, 22 February 2016 (UTC)

Most borrowings of the same word by the same language

The Narragansett term mishcùp (plural mishcùppaûog) was borrowed by English four times, as mishcup, scup, paugie, and scuppaug. (Even more impressively, the borrowings seem to have occurred over a relatively short timescale of no more than 200 years and quite probably less.) Are there other cases where one language has repeatedly borrowed the same word from another? If so, we could add a category.
See also #etymologies, where we're looking for examples of long/convoluted etymology chains.
- -sche (discuss) 22:21, 17 January 2015 (UTC)

If inherited terms and indirect loanwords count, Portuguese has mancha, mágoa, mácula, malha, mangra and maquis from macula. — Ungoliant (falai) 18:31, 28 January 2015 (UTC)
Impressive! It even has two distinct malhas, I see. Can you create entries for mangra and maquis?
And now that I see that macchia is from macula, I suppose English itself has macula, mail, macchia and maquis from macula.
(Which is not quite as impressive/inexplicable as borrowing the same term directly four times, but still impressive.)
- -sche (discuss) 19:07, 28 January 2015 (UTC)
Done. — Ungoliant (falai) 22:02, 28 January 2015 (UTC)
Found a few more: English macle, mascle, mackle, macule, and Portuguese macla. — Ungoliant (falai) 22:26, 28 January 2015 (UTC)
Re your edit summary: well, English does tend to steal words for any things it doesn't have words for yet... and also for things it does already have words for (e.g. royal, regal on top of kingly)... it just eats everything in sight, really. :b However, I think mackle and macule — words with the same very specific printing-jargon meaning, same Middle French source, and same basic /mVkVl/ form — are probably just forms of one another, rather than distinct words, whereas malha ‎(mail, from French) and malha ‎(stain in fur, from Latin) seem like distinct words, so Portuguese and English are actually tied. Has Portuguese ever used the Italianate macchia? Does it have its own macula-derived native designation of scrubland? Then it could retake the lead... :b - -sche (discuss) 06:19, 29 January 2015 (UTC)
That would be maquis. I was unable to find evidence of a form that is derived directly from Corsican or Italian. — Ungoliant (falai) 06:28, 29 January 2015 (UTC)
Oh, look at mail#Etymology 4, it looks like it’s also from macula. — Ungoliant (falai) 06:37, 29 January 2015 (UTC)
Century merges our (former) mail#Etymology 4 and mail#Etymology 2, and if both are (as they say) from the same Old French root, that seems reasonable.
I meant does Portuguese have a native (non-borrowed) designation for scrubland? Maquis looks like a borrowing from French.
In trying to track down the precise chain of descent for mascle, I've discovered that it, and possibly some of the other terms, may actually derive from mascula, from a Germanic source related to mesh. - -sche (discuss) 09:25, 29 January 2015 (UTC)
Vietnamese has cuốn, cuộn, cuợn, quận, quấn, quyển, quyền, quyến, and possibly uốn, all from Chinese (juàn, “to roll; a roll”). Wyang (talk) 08:25, 29 January 2015 (UTC)
Interesting. That looks like another case (like English scup, etc) of repeat direct borrowing of the same word. I wonder if in such cases, it's that different communities of speakers borrowed the word in different ways / for different purposes, and then all of the borrowings became part of the language, or what. - -sche (discuss) 09:25, 29 January 2015 (UTC)
Every time Vietnam was conquered by China, the Chinese officials brought with them the Chinese pronunciation of the character at that time. Plus all the pronunciations brought there by the individual Chinese immigrants speaking all the different tongues... A similar thing exists in Chinese too - every time a remote area (say, Min) is reclaimed by the central regime or is populated by refugees from Central China due to famine or warfare, the officials and migrants bring their pronunciations with them, resulting in the Literary and colloquial readings of Chinese characters. Min is probably the hardest hit, having as many as 5-10 different layers of pronunciations of the same word, used in different circumstances. For example, Min Nan (chheⁿ/chhiⁿ/seⁿ/siⁿ/seng), (chn̂g/chn̄g/sn̂g/soân/soan). Mandarin is no exception, e.g. (luò/là/lào/luō), although there was only one pronunciation in Middle Chinese. Wyang (talk) 23:52, 29 January 2015 (UTC)
Fascinating! I suppose that's one of the neat effects of the writing system not being phonetic. - -sche (discuss) 01:23, 3 February 2015 (UTC)
Mancha, mágoa, malha and mangra are all naturally inherited descendants. This is why I like this etymological chain so much: the word at the same time underwent and avoided four sound changes. This chain alone disproves the Neogrammarian hypothesis. — Ungoliant (falai) 02:30, 3 February 2015 (UTC)
From PG *frankô: Frank, frank, franc, franco, Franco, firangi. Containing other elements: French, lingua franca. — Ungoliant (falai) 19:19, 10 February 2015 (UTC)
Missing entries: Franken ‎(= Swiss franc), franga ‎(former Albanian currency), frange ‎(currency of Korçë) (might not be citable). — Ungoliant (falai) 19:34, 10 February 2015 (UTC)
Plus french, France, farang, François. —Μετάknowledgediscuss/deeds 06:30, 1 June 2015 (UTC)
Here is another interesting one: PG *hringaz into Portuguese:
Ungoliant (falai) 18:49, 8 September 2015 (UTC)
Borrowing from different descendants of a proto-language intuitively feels different from borrowing (via whatever routes) from an attested language, although I suppose there's no real distinction (the word is making its way from the source language via various routes into the target language either way). There are probably a lot of examples of multiple borrowing from proto-languages; English has borrowed Appendix:Proto-Sino-Tibetan/s-la at least four times (probably more). - -sche (discuss) 18:17, 20 February 2016 (UTC)

Anteroom of Silliness

I nominate this definition — Ungoliant (falai) 00:35, 3 February 2015 (UTC)

Sadly, I've seen a lot of entries like that. There used to be some German ones, though I just googled "A German prefix" and it looks like they've all been taken care of. Czech still has do-, nade-, od-, roze-, and se-. - -sche (discuss) 01:19, 3 February 2015 (UTC)
I think this entry, especially the last def, deserves a place. Unless, of course, you have any clue whatsoever about what's going on there. —Μετάknowledgediscuss/deeds 07:04, 2 August 2015 (UTC)
Sadly, that entry is far from unusual; it's an example of what DCDuring justly calls "the near-incoherent terseness of our copyings of a 110-year old Sanskrit dictionary". Many (most?) of the things en.Wikt copied from Monier-Williams are similarly incomprehensibly curt or else not even English at all (महाभारत is a tame example; I can't offhand find any of the more elaborately unintelligible examples I've seen). - -sche (discuss) 04:18, 5 August 2015 (UTC)

Most spellings

疙瘩 has 18, including the lemma and simplified forms of alternatives. —suzukaze (tc) 03:23, 5 August 2015 (UTC)

Most pronunciations

A recent edit combined with the power of Template:zh-pron has given "龍眼#Chinese" 11 Min Nan IPA pronunciations (22 if "Taipei" and "Zhangzhou" are considered separate), along with IPA for Mandarin, Cantonese, and Min Dong —suzukaze (tc) 02:04, 24 August 2015 (UTC)

Impressive! Re Taipei/Zhangzhou: I conceived the section as a count of how many ways a term could be pronounced, so if it's pronounced identically in Taipei and Zhangzhou, I would think of that as one (or in this case eleven) pronunciation which is used in two places. (Traditionally, the formatting would also convey that, by labelling the pronunciation {{a|accent 1|accent 2}} rather than having separate lines.) But the Min Dong, Cantonese and Mandarin pronunciations bring the number up to 14. - -sche (discuss) 06:51, 24 August 2015 (UTC)
The tiny tone numbers differ between Taipei and Zhangzhou. ("liɪŋ23-11" vs. "liɪŋ13⁻22"). —suzukaze (tc) 07:01, 24 August 2015 (UTC)

Most syllables in a single glyph

This is language-specific and so not a good Hall of Fame category, but , , and all have unusually many morae packed into a single glyph. Does Chinese have any single glyphs that stand for 5+ syllables? What words in languages that you know pack the greatest number of syllables into the fewest letters? - -sche (discuss) 22:58, 19 February 2016 (UTC)

In Chinese, it's very rare for a single glyph to represent more than one syllable. This Chinese character (not in Unicode) represents 4 syllables, but I don't know of any characters that represent more than that in Chinese (though I'm not an expert by any means). The Arabic ligatures and , which are in Unicode, look like they represent about 10 syllables. And some of the Ancient Greek ligatures here may also be of interest. —Mr. Granger (talkcontribs) 23:11, 19 February 2016 (UTC)
can be read as túshūguǎn (Mandarin)/toshokan (Japanese) but it's contrived. —suzukaze (tc) 23:20, 19 February 2016 (UTC)
I wouldn't consider a single Unicode codepoint for a sequence of numerous Arabic glyphs making up several distinct, spaced words to be a single glyph. The polysyllabic Chinese signs are interesting; it's also neat that signs are still being created. It occurs to me that several languages have long letter-names (epsilon, double-u, etc, and even more if you add diacritics like ŵ), but self-referential things like that ("ŵ is a word meaning the sign ŵ") are not very interesting IMO. - -sche (discuss) 15:35, 20 February 2016 (UTC)