Category talk:English words spelled with diacritics

Definition from Wiktionary, the free dictionary
This is the talk page for a deleted page. It is being kept for historical interest.


<Jun-Dai 01:46, 30 Apr 2005 (UTC)> I believe that zoölogy and coöperate are different. Those umlauts are part of an occasional tradition in the English language that you can see in certain editions of some books (such as my copy of The Souls of Black Folk) and, famously, in The New Yorker magazine: the tradition mandates the use of an umlaut over the second of two identical vowels when they are to be pronounced separately. I don't believe it has anything to do with borrowings. A good many examples of this come from words formed by prefixing re- to a base word that begins with e: reëxamine, reëlect, reënter, etc. To put it another way, I've never seen zoölogy or coöperate in a dictionary. Also, latte should never have an accent over it. </Jun-Dai>

There are various reasons for diacritics in English, some of which are even spurious I suppose.
In the case of two dots over a letter to show that it is pronounced distinctly from its predecessor, this is never called an umlaut, always a dieresis / diaeresis. An umlaut represents a vowel quality change in strong verbs or irregular plurals in Germanic languages. "Umlaut" can also be used interchangeably with "dieresis" as a generic term for the two dots when there is no context, but many still consider such usage wrong.
On "latté", I don't know whether it is correct or not, but it gets 40,000 Google hits even with the language restricted to English only. Maybe it's based on a misconception like the one I've been reading about here regarding "resumé" with just one accent. It's not hard to find examples - here it is in a British online newspaper:
I doubt that most people use a glottal stop in words such as reëlect. It is quite possible to differentiate two identical vowels without an intervening stop. A true glottal stop occurs in "uh-oh". In most pronunciations I've heard there's something much more like a "y" between two "e"/"i" sounds and something like a "w" between two "o"/"u" sounds. Perhaps some people use an actual glottal stop though - so I've left that in but qualified it.
Hippietrail 02:29, 30 Apr 2005 (UTC)
<Jun-Dai 02:44, 30 Apr 2005 (UTC)> Umlaut -> diaeresis. Thanks for the correction (something felt wrong when I was typing that, but I wasn't sure what). As for latté, we should have a category for words like that--where the accent is added to indicate that the final e is pronounced, or to indicate that the word is a foreign borrowing, even though the accent doesn't exist in the foreign word from which it is borrowed (much like the bizarre pronunciation of lingerie, which bears little resemblance to French). In any case, latté doesn't exist in any dictionary I can find, and latte gets more than 20 times as many Google hits (in English-restricted results), so while it wouldn't be wrong to have an entry, I wouldn't exactly say that we are "missing" one, or that we need to have one. Seperately gets 573k Google hits, yet I wouldn't say that we are "missing" an entry on it (let's not even think about teh). More importantly, if we do decide to have an entry, we should definitely mention that use of latté would be considered a misspelling or typographical error by most editors/professors.</Jun-Dai>
I don't have any sources to check the latte / latté thing. I can see that it's not spelled that way in Italian and the word doesn't exist in Spanish ("con leche") or French ("au lait"). It does seem to be part of some trend in English to "Frenchify" or "Europeanise" or "exoticise" certain kinds of words by retaining and even sometimes inventing accents. I would consider "café" an example of an accent being retained for exotic appeal, and "latté" could well be an example of an accent being invented for the same reason. This is all without research of course.
I wouldn't really compare it to the spello seperately or the typo teh. I'm pretty sure that in the case of "latté" the people really believe they are spelling it correctly - even that they are spelling it "very correctly". It's all very interesting though (-: — Hippietrail 01:35, 1 May 2005 (UTC)
<Jun-Dai 01:47, 1 May 2005 (UTC)> I agree that it is not the same as seperately or teh, but I think it's in the same murky territory, and a comparison can be made. Teh is obviously a typographical mistake that anyone would spot; seperately is an error that many would believe (sometimes tentatively, sometimes not) to be the correct spelling, but they would almost certainly stand corrected if someone pulled out a dictionary. Latté is, I believe, a word that most people would stand corrected on (i.e., most people, having written it, would agree that the accent is erroneous - though many would require a dictionary before being convinced), but a handful would defend. It is by most standards a spelling error, whereas seperately is a spelling error by pretty much all standards. That's my opinion on the matter, anyhow ;-) </Jun-Dai>
I just did a quick test and MS Word's spellchecker gives the thumbs up to both latte and latté for Australian, UK, and US English. I haven't checked better dictionaries yet but thought you might be interested to know. — Hippietrail 07:23, 1 May 2005 (UTC)

Latte, latté, caffè latte

Allow me to begin a thread to concentrate on this word alone.

Today I went into several bookshops. In the largest one I looked in the largest edition of each publisher's newest English dictionary. None had any entry including "latté". Of those with multiple entries, some had the two-word entry point to the single-word entry, others had the reverse.

More interesting, for those with a two-word spelling either as an entry or in the etymology, there was an unexpected array of spellings of the first word! Unfortunately I don't like to take notes in new bookshops and it was all too much to remember. The correct Italian spelling seems to be "caffè latte" but I also saw "café latte" using the French spelling, and "caffé latte", which I would think is just plain wrong. No-accent versions also occurred as I expected.

As for the "latté" spelling, an internet search turned up several places where people believe the accent should be there, or their spellcheckers told them to put it there. On this computer, MS Word allows both "latte" and "latté". Here are some pages: (accent on both words), (spellchecker forced accent), (caffè correct, latté incorrect)

One site suggests that Starbucks is responsible for the "latté" spelling, but this is not used on Starbucks' website.

Here are some numbers from Google on various spellings:

Spelling        Web         Groups    News
latte           2,200,000   232,000   895
latté           94,800
lattè           54,500
cafe latte      96,600      7,530     10
caffe latte     82,500      2,410     13
café latte      43,100
caffè latte     23,800
café latté      11,000
caffé latte     7,780
caffé latté     1,290
cafe latté      880
cafè latte      835
caffè lattè     154
caffe latté     147
caffe lattè     43
cafè lattè      17
caffè latté     11
cafe lattè      4
cafè latté      3
café lattè      3
caffé lattè     0

(Google Groups and Google News don't differentiate accents. I tried and failed to use <small> on the table so fix it if you know how.)


Is this spelling used in English? I can find it in English contexts on Google but that's not definitive. I'll check my big-arse dictionaries soon. — Hippietrail 10:24, 8 May 2005 (UTC)

This is not in my SOED. — Hippietrail 02:30, 17 May 2005 (UTC)

archæism, Judæo-

The word archæism and the prefix Judæo- are used in a series of related articles on Wikipedia, but I have so far been unable to find them in a dictionary. The latter does seem to be used elsewhere on the Internet at least. — Hippietrail 17:20, 16 May 2005 (UTC)

The latter is in my SOED. — Hippietrail 02:30, 17 May 2005 (UTC)

Order of Entries

Hippietrail, do you want to continue assigning each word in this list to one of the 26 letters A to Z, or is this a relict of the pre-proper-capitalization era? Ncik 13:53, 16 August 2005 (UTC)

Hi Ncik, I'm not sure what you mean. If you mean do I intend to keep using the category sorting mechanism [[Category:café|cafe]], then yes, since this is needed to keep the words in the order they appear in English dictionaries. But this isn't related to capitalisation, so I'm not sure what you mean. Capitalisation and alphabetisation are separate issues anyway; English dictionaries sort "foo" and "Foo" and "FOO" identically. There are two bug reports in Mediawiki's Bugzilla which could be related: one has to do with category sorting and locales, the other with category sorting currently being case-sensitive, which is wrong for both encyclopedias and dictionaries. — Hippietrail 23:09, 16 August 2005 (UTC)
My question was incoherent. I was, indeed, not only concerned about capitalization, but also about non-Latin letters. But by saying that you want the words ordered as in English dictionaries you answer my question. Ncik 12:04, 17 August 2005 (UTC)
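The sort-key convention described above ([[Category:café|cafe]], with "foo", "Foo" and "FOO" filed together) can be approximated mechanically. Below is a minimal Python sketch, assuming that ignoring case, dropping combining accents, and expanding æ and œ to ae and oe is close enough to English dictionary order. The names sort_key and LIGATURES are hypothetical, and this is not how MediaWiki itself collates categories.

```python
import unicodedata

# Hypothetical mapping: ligatures that NFD does not decompose, expanded to
# the letter pairs English dictionaries usually file them under.
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def sort_key(term: str) -> str:
    """Return a key ignoring case and diacritics (a rough approximation of
    English dictionary order, not MediaWiki's own collation)."""
    # Expand ligatures first, since NFD leaves them as single code points.
    expanded = "".join(LIGATURES.get(ch, ch) for ch in term)
    # NFD splits e.g. "é" into "e" + U+0301; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes "FOO", "Foo" and "foo" compare equal.
    return stripped.casefold()


words = ["café", "Cafe", "archæism", "zoölogy", "cafeteria"]
print(sorted(words, key=sort_key))
# ['archæism', 'café', 'Cafe', 'cafeteria', 'zoölogy']
```

Python's sort is stable, so entries with identical keys (café, Cafe) keep their input order; a real collation would need an explicit tie-breaker.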

qué será, será

It is nigh impossible to type in "qué será, será" on my standard US English keyboard. I just gave up and constructed it from a bunch of other entries via cut and paste. This is a common borrowed phrase, so there should probably be a practical way to look it up. It seems that any word or phrase commonly used in English that contains diacritical marks should at least have a redirect from the base-character spelling. In this case there should be a redirect from que sera, sera to qué será, será. Most words have their own entries in English with a "See Also" section pointing to the "correct" spelling (crepe, epee, ). Shouldn't that practice be policy? -- Speed8ump from 18:02, 15 February 2007 (UTC)

There is a Wiktionary extension to MediaWiki in the works called DidYouMean which addresses this issue without the need to manually create and maintain tens of thousands of links. Not all such words have an accentless spelling. M-W's dictionary includes accentless spellings for many words which other dictionaries do not but even they have a few words listed only with a spelling incorporating accents. — Hippietrail 01:34, 16 February 2007 (UTC)
Still no DidYouMean support? I see someone has created the redirect in question (not due to this conversation though). What's the policy on common misspellings? Wouldn't romanized spellings be considered common misspellings of borrowed phrases? -- Speed8ump from 02:29, 20 September 2007 (UTC)

Request to split.

Diacritics and ligatures are just two different things. I think each should have its own category. Thoughts? bd2412 T 11:41, 5 March 2007 (UTC)

I agree, I think the category should be split into one for ligatures and one for diacritics. The two concepts are almost opposites (ligatures join letters, whereas some diacritical marks indicate that letters are separate). -- Beobach972 20:12, 10 March 2007 (UTC)
We ought to keep this category (well, not this exact category — we'd have to rename it to :Category:English words spelled with diacritics and ligatures — but the concept), though, to contain both of those subcategories. It could also contain any words that happen to be spelt with both diacritics and ligatures — I can't think of any offhand, but I'm sure a few must exist... -- Beobach972 20:12, 10 March 2007 (UTC)
Ok, so how about something like Category:English words spelled with non-standard characters for the super-category, with subcats for Category:English words spelled with diacritics, Category:English words spelled with ligatures and Category:English words spelled with diacritics and ligatures (although I don't know that we should bother creating the latter cat until we've found something that goes in it)? bd2412 T 21:18, 10 March 2007 (UTC)
Sounds like a plan. Shall we begin recategorising the entries, or do you have a robot in mind? -- Beobach972 01:48, 12 March 2007 (UTC)
Will Category:English words spelled with non-standard characters also be the default category for words (such as !Xóõ) that contain non-standard characters which are neither diacritics nor ligatures? -- Beobach972 01:55, 12 March 2007 (UTC)
There's no other place for them, so they may as well go here (but they're few and far between, so I'm not worried for them; they don't require another subcat). I think this is a job that could be done by a bot, as long as it can tell an á from an æ! bd2412 T 02:38, 12 March 2007 (UTC)
"Nonstandard character" is an arbitrary invented term which is not correct. These characters are standard in these spellings. Even the M-W online dictionary, which gives ASCII-7-only spellings for most of these words, still has a couple for which the only standard spelling is one with such an exotic character. Hmm, why not "exotic character" then? — Hippietrail 03:03, 12 March 2007 (UTC)
How is "exotic" any less arbitrary and invented than "non-standard"? I think the meaning is clear, as the vast majority of words in the English language contain only the 26 unadorned characters. Is there not some technical term that captures all letter forms outside the initial 26? bd2412 T 03:13, 12 March 2007 (UTC)
Because "non-standard" makes a judgment on correctness which even conservative dictionaries do not agree with. "Exotic" only says that such characters might not seem usual to some people. English has always had adorned characters from the days of Old English and collected more in the Middle English period. Hell even G and J and Q and W are adorned characters. Hippietrail 14:51, 13 March 2007 (UTC)
Well, we could call them 'Non-ASCII' or 'adorned' characters... but I don't really suggest that. -- Beobach972 03:21, 12 March 2007 (UTC)
Non-ASCII while entirely accurate implies that ASCII was in some way a precursor to the English language rather than an old technology which limited the way English and other languages were represented for purely technical reasons which were valid at the time. Hippietrail 14:51, 13 March 2007 (UTC)
To me, "exotic" implies letters originating in a tropical paradise with pineapples and coconuts and a local volcano that's worshipped as a god. bd2412 T 15:10, 13 March 2007 (UTC)
I found one, by the way (a word with both a diacritic and a ligature): à contrecœur. -- Beobach972 03:21, 12 March 2007 (UTC)

Ok, Category:English words spelled with ligatures is filled up - will get the other one tomorrow if no one else gets it first. bd2412 T 03:23, 12 March 2007 (UTC)
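The recategorisation bot floated in this thread mainly has to "tell an á from an æ". One way to do that is sketched below in Python, assuming that a diacritic is anything Unicode decomposes (NFD) into a base letter plus a combining mark, and that the ligatures of interest are just æ and œ. The function classify and its label strings are hypothetical, chosen only to mirror the proposed subcategory names.

```python
import unicodedata

# Assumed ligature set for this sketch; these letters have no Unicode
# decomposition, so NFD cannot distinguish them by itself.
LIGATURES = set("æÆœŒ")


def classify(term: str) -> set[str]:
    """Return which hypothetical subcategories a term belongs in:
    'diacritics', 'ligatures', both, or neither (empty set)."""
    found = set()
    for ch in term:
        if ch in LIGATURES:
            found.add("ligatures")
        # A letter such as "á" decomposes to "a" + U+0301 (a combining mark).
        elif any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch)):
            found.add("diacritics")
    return found


for term in ["café", "archæism", "à contrecœur", "cafe"]:
    print(term, sorted(classify(term)))
```

Letters that are neither, such as an unmarked click letter, fall through with no label, which fits the suggestion of leaving such words in the parent category.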


Just came across this word and was thinking it should be included here because of the ö. FWIW, it's an Icelandic word that is now used in English. Lotsofmagnets 15:21, 21 September 2008 (UTC)

Deletion debate


The following information has failed Wiktionary's deletion process.

It should not be re-entered without careful consideration.

Tagged by JackPotte. --Bequw¢τ 17:01, 17 December 2009 (UTC)

Keep DCDuring TALK 23:00, 18 December 2009 (UTC)
Delete This category contains many terms with whitespace (eg à gogo). Use a category name with "terms" rather than "words" so as not to inaccurately describe its contents. Move entries to Category:English terms spelled with diacritics. --Bequw¢τ 18:36, 23 December 2009 (UTC)
Delete. For the same reason as #Category:Metasyntactic_words. JackPotte 20:10, 30 December 2009 (UTC)
Comment. Words can contain whitespace, so your term "inaccurate" is inaccurate. "Ambiguous" would be better reasoning. — hippietrail 20:33, 2 January 2010 (UTC)
I think the general feeling is that most people consider à gogo to be two words because of the space, so we should move it to the "less ambiguous" name. Mglovesfun (talk) 20:47, 2 January 2010 (UTC)
Yes, to describe coups d'état I would say that coup is the lemma of the first word of the term. JackPotte 22:54, 3 January 2010 (UTC)
Delete and merge into English spellings by character. --Daniel. 23:15, 3 January 2010 (UTC)

Suggestion: Merge into "English terms spelled with diacritics and ligatures"
Fails. Mglovesfun (talk) 22:36, 16 February 2010 (UTC)