Wiktionary:Beer parlour/2004/July-September

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Table of phonetical transcriptions

I think we need some tables somewhere (linked from the main page) of standardised IPA, SAMPA and AHD (or whatever) symbols to be used in transcribing pronunciations in Wiktionary. There seems to be a fair amount of variation in phonetical transcription due to different versions of IPA (eg, E versus e), regional variations or, perhaps, dare I say, ignorance. A standardised table, perhaps with a note that these are the transcriptions to be used in pronunciations, would help make this clearer. — Paul G 08:50, 1 Jul 2004 (UTC)

I'm a bit worried about arguing over all that stuff again like when I first arrived here. We could instead of forcing a standard just explain the variations. But then I think I've got my "Non-IPA Dictionary Style" system pretty well honed for "international" English. The problems are with "American" English especially rhotacisation and cot-caught merger. Most British/Australian/European dictionaries use minor variations of the same set of IPA. The latest Oxford is more radical. No US dictionary uses IPA and the "GenAm" set used by linguists differs substantially from all the "International English IPA" schemes, and from all the non-IPA schemes.
English is not alone. I've seen people on Wikipedia get very "outraged" about the usual set of symbols used for Spanish for example.
I wonder if we can boil this down to prescription vs description like we can with the definitions? (-: — Hippietrail 14:44, 1 Jul 2004 (UTC)
Fair enough, but I want to proscibe phonetic symbols, not pronunciation. For example, I saw o: used somewhere the other day to transcribe the "or" sound in "paw" into IPA instead of ɔː - the symbols used were incorrect rather than the intended pronunciation. What I would like to see is a table showing that, for example, this sound in British English is transcribed as /ɔː/ in IPA (with corresponding symbols for other Englishes). The page at [1] is not sufficient for this purpose and is of no use to those unfamiliar with systems of representing phonemes. Paul G
Paul, are you sure you don't mean prescribe instead of proscribe? They are almost opposites. :-)
I have no particular objection to these tables to explain the options in an [[Appendix:]] pseudonamespace. As in most issues I support a flexible approach to the subject of pronunciation. Nevertheless, I am aware of the dialect problems associated with pronunciation. So for the most part I don't bother with them unless there is a necessary point to be made, and then I do prefer IPA. Eclecticology 16:49, 2 Jul 2004 (UTC)
Yep, prescribe, that's what I should have written - not "proscibe", nor even "proscribe" :) — Paul G 20:43, 16 Jul 2004 (UTC)
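One way to picture the kind of table proposed in this thread is as a lookup from one notation to another that rejects any symbol outside the agreed set. The following is a minimal Python sketch under stated assumptions, not an agreed Wiktionary standard: the handful of IPA-to-SAMPA pairs is only an illustrative subset, and the names `IPA_TO_SAMPA` and `ipa_to_sampa` are hypothetical.

```python
# Illustrative subset only (an assumption): a real table would cover every
# phoneme and would be agreed on by the community.
IPA_TO_SAMPA = {
    "ɔː": "O:",  # the vowel of "paw" in British English
    "iː": "i:",
    "æ": "{",
    "ə": "@",
    "ʃ": "S",
    "p": "p",
    "k": "k",
    "t": "t",
}


def ipa_to_sampa(ipa: str) -> str:
    """Convert an IPA string to SAMPA, rejecting symbols not in the table."""
    # Try longer symbols first so that "ɔː" is read as one unit.
    symbols = sorted(IPA_TO_SAMPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(ipa):
        for sym in symbols:
            if ipa.startswith(sym, i):
                out.append(IPA_TO_SAMPA[sym])
                i += len(sym)
                break
        else:
            raise ValueError(f"non-standard symbol {ipa[i]!r} in {ipa!r}")
    return "".join(out)
```

With this table, "pɔː" converts to "pO:", while "po:" is rejected because "o" is not among the listed symbols, which is the kind of mistake the thread describes.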

French spelling

I'm just wondering whether we should consider French words with "œ" and "oe" as spelling variants or orthographic variants, or maybe it's only in the world of computers that they are variants because of the frequent lack on keyboards, fonts, etc, of the ligature.
In English I would definitely consider the "œ" spellings as incorrect, archaic, pedantic, or foreign. But French dictionaries only ever list a single form, preferring the ligature in print. I have read that two words are never distinguished by whether or not they use the ligature and that any adjacent o and e can always be ligated no matter the context.
If they are a lesser variant than being full spelling variants, I would recommend redirecting or see-also-ing the version with disjoined o and e, to point to the version with the ligature.
Hopefully some know better than I. — Hippietrail 05:17, 3 Jul 2004 (UTC)

This can be controversial. My view is that the ligature is obsolete, useless and unnecessary. (The English "ae" ligature "æ" can always be replaced by the two letters, and often only the "e" as with "encyclopædia".) I believe that you are right in saying that there are no minimal pairs where the separate pronunciations are the distinguishing factor. It would be wrong to say that the "o" and "e" can always be ligated. Most often, however, the places where they absolutely cannot be ligated are distinguished by some appropriate diacritic on the "e". BTW I would interpret "spelling" and "orthographic" variants as meaning the same thing; I would consider the variation in question as typographic. Eclecticology 15:30, 3 Jul 2004 (UTC)
I would agree to the ligatures being obsolete in English. I'm asking about French where books I own published in the past few years still use the ligature throughout. Can you provide any example French words which contain plain "o" followed by plain "e" which cannot be ligated?
By spelling I mean simply which letters exist in a word, in which order, as you would speak it. By orthography, I mean spelling plus all the other written rules such as apostrophes, hyphens, capitals, ligatures, some uses of diacritics, plus topics which don't concern us here such as punctuation. If you have a better pair of words for this distinction please feel free to introduce it. — Hippietrail 15:56, 3 Jul 2004 (UTC)
  1. moelle
  2. According to OED: spelling = the process or activity of writing or naming the letters of a word; orthography = the conventional spelling system of a language; the study of spelling and how letters combine to represent sounds and form words. Your distinction may very well be there, but it's not obvious. I consider accents to be a part of the spelling of a word. That's why I would consider the distinction between oeuvre and œuvre to be typographic. Eclecticology 18:33, 3 Jul 2004 (UTC)
For what it's worth, a quick survey shows the French Wikipedia appears to use them interchangeably in articles (e.g. w:fr:œil), but all "oe" in titles redirect to "œ" variants, and there are people who go around replacing oe with œ where applicable. In the article w:fr:Ligature_(typographie) there is a section "Œ n'est pas OE" which discusses this, and which may or may not be informative. —Muke Tever 23:13, 3 Jul 2004 (UTC)
I found that article quite pedantic, but I realize that supporters of the ligature can get very obstinate about their interpretation. For me it's not an argument that I'll push to the bitter end, but I will continue to write the relevant words as two letters. If the advocates for the ligature want to change these entries it's up to them to do it, but they should at least create redirects for the benefit of those who may not be able to write the ligature. Eclecticology 00:34, 4 Jul 2004 (UTC)

Synonyms

Suppose the word X has synonyms Y and Z. Then there are two approaches to providing these synonyms. One is to list Y and Z under X, X and Z under Y, and X and Y under Z. The other is to have a cross-reference from the less common of the two words to the most common, where all the synonyms are listed.

The first approach is used in some thesauruses (eg, Collins Thesaurus) and makes synonyms quick to find (simply look up the word you want synonyms for, and there they are). However it makes for inconsistencies and incompleteness. For example, if I look up "bay" (in the geographical sense), I find "bight, cove, gulf, inlet, natural harbour, sound". If I look up "cove", I find "anchorage, bay, bayou, creek, firth/frith, inlet, sound", a very different list. If I look up "bight", there is no entry because the book's policy is not to list words considered too rare for someone to want to look them up.

The second approach is the traditional one used in thesauruses such as Roget's. The reader looks up the word in an index and is cross-referred to one or more lists of synonyms. This gives a much smaller book but longer searches. The advantage is that there is one list per group of synonyms and so the results are consistent. The disadvantage is that shades of meaning are not necessarily so clear. Looking up "bay" in Roget's Thesaurus (1966 edition), I am referred to "gulf" where the synonyms listed are "bay, bight, cove, creek, lagoon; inlet, arm of the sea, fiord; mouth, estuary; firth, frith, kyle; sound, strait, belt, gut, euripus, channel" - a longer list, but requiring division into semicolon-terminated sublists for slightly different meanings of the word.

I suggest we go with the second approach for consistency's sake. For example, the word "nonsense" currently has a fairly comprehensive list of 19 synonyms, many of which have already been defined. The first of these, "balderdash", has just five of these 19. "Poppycock" has just one, while "bull" has none listed. I haven't checked all of the synonyms, but there might well be some that list synonyms that are not in the 19 under "nonsense".
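The inconsistency described for the first approach can be checked mechanically. Below is a minimal Python sketch, assuming synonym lists are held as a plain dictionary; the function name `missing_backlinks` is hypothetical, and the sample data is the "bay" and "cove" lists quoted above.

```python
def missing_backlinks(entries):
    """Return (entry, word) pairs where `entry` is listed as a synonym of
    `word` but the entry for `entry` does not list `word` in return."""
    gaps = []
    for word, synonyms in entries.items():
        for synonym in synonyms:
            # A word with no entry at all counts as listing nothing.
            if word not in entries.get(synonym, []):
                gaps.append((synonym, word))
    return sorted(gaps)


# The two lists quoted from the thesaurus in the discussion above.
entries = {
    "bay": ["bight", "cove", "gulf", "inlet", "natural harbour", "sound"],
    "cove": ["anchorage", "bay", "bayou", "creek", "firth/frith", "inlet", "sound"],
}

for entry, word in missing_backlinks(entries):
    print(f"{entry!r} does not list {word!r}")
```

Every pair reported is a place where one list would have to be edited by hand to stay in step with another, which is the maintenance cost the second approach avoids.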

Cross-references might look like this:

====Synonyms====
''See'' [[xxx]]

maybe with the addition of a comment to ensure that synonyms are added at the linked word and not at the current word.

Cross-referring in this way will save a lot of needless repetition and will help make Wiktionary more consistent.

Paul G 07:50, 4 Jul 2004 (UTC)
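The cross-reference proposal above can be sketched as a small generator. This is only an illustration in Python, assuming each synonym group has one designated head word; the function name `synonym_section` and the wording of the editor comment are hypothetical, while the heading and "See" line follow the format suggested in the post.

```python
def synonym_section(word, head, synonyms):
    """Build the wikitext Synonyms section for `word`.

    The head word carries the full list; every other word in the group
    only points at the head word.
    """
    if word == head:
        links = ", ".join(f"[[{s}]]" for s in sorted(synonyms))
        return f"====Synonyms====\n*{links}"
    return (
        "====Synonyms====\n"
        f"''See'' [[{head}]]\n"
        # HTML comment, visible only when editing the page.
        f"<!-- Add new synonyms at [[{head}]], not here. -->"
    )


group = ["balderdash", "poppycock", "bull"]
print(synonym_section("nonsense", "nonsense", group))
print(synonym_section("poppycock", "nonsense", group))
```

Because the full list lives in one place, adding a synonym means editing only the head word's entry.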

I am in favour of having a proper thesaurus either as part of Wiktionary or some Wikithaurus. I know the synonyms sections are a bit flaky right now but I'm also in favour of keeping them. — Hippietrail 09:06, 4 Jul 2004 (UTC)

I don't have strong feelings either way. Perhaps Paul's suggestion could lead to putting an explanation of how these synonyms differ. Some of the 1913 Webster pages already have a start on something of the sort. Eclecticology 06:42, 5 Jul 2004 (UTC)

Note that there are really very few words with the same meaning in a language, that is there are few which are strictly "synonyms". If we are to do this, I suggest we do it in a way that explains the relationship between the two similar words. Here are two quotes from Webster 1913 that illustrate the reasons for this, and below them is an example of what I think we should do:

All languages tend to clear themselves of synonyms as intellectual culture advances, the superfluous words being taken up and appropriated by new shades and combinations of thought evolved in the progress of society. - De Quincey
In popular literary acceptation, and as employed in special dictionaries of such words, synonyms are words sufficiently alike in general signification to be liable to be confounded, but yet so different in special definition as to require to be distinguished. - G. P. Marsh

From Webster's New World College Dictionary, Fourth Edition (1999), here are two examples of what I mean by explaining the relationship between words. I think doing this would be excellent, and more correct than a plain list of so-called "synonyms". In that dictionary, the ordinary definitions of the different synonyms are often the same. The synonymy entries, however, illustrate the variation in meaning; each is situated in a grey-colored area below the ordinary definition of the most general word, is referenced by the other synonyms, and references the antonyms where available. Situated below/within the definition of stubborn:

SYN.—stubborn implies an innate fixedness of purpose, course, condition, etc. that is strongly resistant to change, manipulation, etc. [a stubborn child, belief, etc.]; obstinate applies to one who adheres persistently, and often unreasonably, to a purpose, course, etc. against argument or persuasion [a panel hung by an obstinate juror]; dogged implies thoroughgoing determination or, sometimes, sullen obstinacy [the dogged pursuit of a goal]; pertinacious implies a strong tenacity of purpose that is regarded unfavorably by others [a pertinacious critic] —ANT. compliant, tractable

Situated below/within the definition of suave:

SYN.—suave suggests the smoothly gracious social manner of one who deals with people easily and tactfully [a suave sophisticate]; urbane suggests the social poise of one who is highly cultivated and has had much worldly experience [an urbane cosmopolite]; diplomatic implies adroitness and tactfulness in dealing with people and handling delicate situations, sometimes in such a way as to gain one's own ends [a diplomatic answer]; politic also expresses this idea, often stressing the expediency or opportunism of a particular policy pursued [a politic move]; bland is the least complex of these terms, simply implying a gentle or ingratiating pleasantness [a bland disposition]

- Centrx 20:39, 13 Jul 2004 (UTC)

Also, if there be links to the pages of synonyms, the sense must be specified. - Centrx 21:09, 13 Jul 2004 (UTC)

"Sou"

For some reason, the database won't let me create a page called "sou" which is to carry the following (unwikified) content. It reports an error that says something about broken links to and from "sou". Can anyone explain why this is, and fix it? — Paul G 16:25, 6 Jul 2004 (UTC)

==English==
===Homophones===
*Su, Sue
*sue

===Etymology===
French

===Noun===
'''sou''' (''plural'' '''sous''')

#An old French coin worth five centimes.

----
==French==
===Homophones===
*sous

===Noun===
'''sou''' ''m'' (''plural'' '''sous''')

#sou
#five-centime coin, five-centime piece

I checked and had something similar happen. I also checked the "What links here" pages for "sou". Four of the five seem OK but Wiktionary:Romanica index s2 came up with a whole long page of errors, so the problem could be related to that. Eclecticology 17:35, 6 Jul 2004 (UTC)

Princeton WordNet

Since nobody seems to be watching the page anymore, I thought I'd point out that progress has been made on clearing the Princeton WordNet database for inclusion into the Wiktionary. You can read all about it at Wiktionary_talk:Princeton_wordnet. --Wclark 22:12, 12 Jul 2004 (UTC)

Ick, I hope that doesn't happen. It gives different words identical definitions, even those which would not normally be considered rough synonyms. It seems this might be the nature of it, that certain words simply have as definitions pointers to the definition in a "main" definition. Other than this, it's not a good dictionary, with inaccurate and incomplete definitions. Its presence will mostly only be to serve as a placeholder and not a basis, because its definitions will ultimately be wholly replaced by accurate ones. - Centrx 20:46, 13 Jul 2004 (UTC)

There seems to be a real effort to establish that our usage would be within what is allowed by copyright law. At the same time I see no arguments supporting why we need this material or how it is proposed that we use it. Eclecticology 09:33, 14 Jul 2004 (UTC)

The synonym information might be useful, and some of these definitions could certainly be used as placeholders. I'm setting up a local copy of the database on my server, and will see what (if anything) I can do with it. (Is there a bot request page for the Wiktionary? .. I'll go look) --Wclark 18:25, 19 Jul 2004 (UTC)

Many of the definitions would solely be placeholders and not definitive information. In other words, incorrect information would be holding the place, and would need to be replaced. - Centrx 19:34, 19 Jul 2004 (UTC)

Pseudocognates?

I've argued that including homophones in other languages is not a good idea for Wiktionary. I do however think that including cognates (toward the bottom of the article) is a good idea. False cognates will already be on the same page when the spellings are identical. But what about words which look or sound alike and have similar meanings in different languages? I'm not sure what they are called. They are not so common as to clutter Wiktionary and if positioned toward the bottom of the article won't distract people looking for the basics. But they are very interesting to people who are interested in languages generally. For instance, I've just discovered that the Kanji 綿 has the reading "wata" and meaning "cotton wad". "Wata" is very similar to words in European languages including our own wad, Russian вата, and Swedish vadd (cotton wool).

It would be a shame not to include such tid-bits - but under what heading? — Hippietrail 01:32, 14 Jul 2004 (UTC)

Etymology (with a "compare", it here being a "Cf.") seems appropriate; the relationship is in the history of the word. For instance, the OED etymology for wad (bundle) is:
[Of obscure origin; the identity of the word in all the senses is not quite certain. With sense 3 cf. mod.Sw. vadd, G., Du. watte, Fr. ouate (whence It. ovatta), wadding; the etymology and mutual relation of these words are unknown.]
A better example, of a cognate that is basic to all(?) the Indo-European languages (mother), is:
Cognate with Old Frisian mōder (West Frisian moer), Middle Dutch moeder, mōder (Dutch moeder), Old Saxon mōdar, muoder (Middle Low German mōder, moeder), Old High German muoter, muotir (Middle High German muoter, German Mutter), Old Icelandic móðir, Old Swedish moþir (Swedish moder), Danish moder, and further with Sanskrit mātṛ, mātar-, Avestan mātar-, ancient Greek (Doric) -, , (Attic and Ionic) -, , classical Latin māter (> Old French madre, medre, Old French, Middle French mere, French mère, Old Occitan, Occitan maire, Catalan mare, Italian madre, Spanish madre, Portuguese mãe), Gaulish mātīr, Old Irish máthir, Tocharian A mācar, Tocharian B mācer, Old Church Slavonic mati (genitive matere), Russian mat´, Latvian māte, Albanian motër (in sense ‘sister’), prob. orig. a derivative (with suffixation) of a nursery word of the ma type (see MAMA n.1).
A less basic one from the Romance languages:
[a. F. liberté (14th c. in Littré) = Pr. libertat, It. libertà, Sp. libertad, Pg. liberdade, ad. L. lībertāt-em, f. līber free.]
Here is a good example for ordinary cases, at the end just putting a "Compare" and listing the others:
< Anglo-Norman nombre, noumbre, numbre, nounbre, nunbre, numere and Old French, Middle French nombre sum, total (early 12th cent. as numbre), grammatical number (13th cent.), a (large, small) quantity (14th cent.), conformity in verse to a regular measure (1549) <[...]. Cf. Middle Dutch nomber, nommer, nommere (Dutch nummer, (archaic) nommer), German Nummer, Norwegian nummer, Swedish nummer, Danish nummer. Cf. NUMERO n.1, NUMÉRO n.2 and also NO. n.2

- Centrx 02:09, 14 Jul 2004 (UTC)

This is all good stuff for actual cognates but I doubt that the Japanese is cognate. It's just a coincidence. I don't think coincidences should be in the Etymology section surely. — Hippietrail 02:30, 14 Jul 2004 (UTC)
Yes, I don't think we should have coincidences in the dictionary at all. Should we also include mentions of, say, that a particular word happens to be the name of some nonce thing in some random film? If it's not related, no relation should be indicated. - Centrx 03:10, 14 Jul 2004 (UTC)
And why not? It's interesting. And unlike some other ideas around here, isn't likely to run rampant. I don't get your point about random films at all by the way. — Hippietrail 03:17, 14 Jul 2004 (UTC)
If you mean to add only words which have similar spelling or pronunciation and similar meaning, then it wouldn't be rampant, but it is problematic in that it doesn't really have any place to go. And "wata", if that is the pronunciation it would seem to be, doesn't sound that much like "wad". It is dissimilar enough that it's likely there are numerous such similarities, even a rampant amount. My point about the film was that we can't start having sections that are just "Similar things", "Coincidences", or "Unrelated Trivia". About Eclecticology's comment below, I don't see how we can confirm that it is totally unrelated, and I truly don't see how someone is going to get confused between "wad" and something that sounds like "wata" (looks more like "water" in a Boston accent) and has a totally different typography. How is it so obvious that it is unrelated yet so unclear that it should be mentioned? - Centrx 20:51, 14 Jul 2004 (UTC)
Spelling is less interesting unless it's exact since orthography from language to language varies greatly in so far as which digraphs stand for which sounds especially. Others may feel differently.
As for sound, "wata" is absolutely logically possible to be related soundwise. Japanese has no "v" sound. Many languages including Japanese equate a "v" and a "w" sound when borrowing. Japanese also has a strictly "CV" syllable structure except for final "n". Neither "wat" nor "vat" are possible in Japanese. If, for instance, Japanese did borrow this word before the Meiji era from an Indo-European source and give it an ateji, this character could well be the result. This is what makes it interesting.
You seem to be arguing both sides, I can't quite make it out. But if we don't know the etymology for the Japanese word, perhaps it's possible they really are related. All the more reason to include it. But if we don't know for sure it probably doesn't belong in the Etymology section proper - that section shouldn't be for hypotheses.
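The syllable-structure argument above (every syllable ends in a vowel, apart from a final "n") can be tried out on romanised words. The Python sketch below uses a deliberately simplified model, which is an assumption of this example: it ignores long vowels, doubled consonants, digraphs such as "sh" and "ts", and syllable-final "n" inside a word. The name `fits_open_syllables` is hypothetical.

```python
import re

# Simplified model (an assumption): each syllable is an optional single
# consonant followed by one vowel; the word may end in a final "n".
SYLLABLE = r"[kgsztdnhbpmyrw]?[aiueo]"
WORD = re.compile(rf"(?:{SYLLABLE})+n?")


def fits_open_syllables(word: str) -> bool:
    """Report whether a romanised word fits the simplified pattern."""
    return WORD.fullmatch(word.lower()) is not None


for candidate in ["wata", "wat", "vat", "men"]:
    print(candidate, fits_open_syllables(candidate))
```

Under this model "wata" and "men" are accepted while "wat" and "vat" are not, which matches the reasoning that a borrowed "wat" or "vat" would need a vowel added.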
I'm inclined to agree that this stuff should be a part of the etymology in a manner similar to what Centrx suggested. Coincidences need to be noted at some point, even if only to keep people from falling into a trap of believing that a relation exists. Eclecticology 09:21, 14 Jul 2004 (UTC)
Whether something is totally unrelated also needs to be borne out by the facts. The inability to establish a relationship does not automatically imply the lack of a relationship. There is nothing in what I said that "confirms" anything either way about the Japanese word. Just because the similarity between "wad" and "wata" is so obvious does not allow us to jump to any conclusions. The fact that the word was written with Kanji rather than Katakana suggests a non-relationship more than anything else. In other circumstances the character 綿 can also be pronounced men. Eclecticology 01:33, 15 Jul 2004 (UTC)
All I meant about the obviousness was: if the basis of saying that it's unrelated is just because of it 'being obvious' that it is, then we should not at the same time think that we need to explicate the difference. I also think it will get pretty cluttered if we just have lists of 'words that are different', which is all we can do: make lists of it. - Centrx 02:27, 15 Jul 2004 (UTC)
I don't understand what you mean by lists of "words which are different". — Hippietrail 03:06, 15 Jul 2004 (UTC)
That's all we could do if we were to present the Japanese word, is put it in a list as a linked item. For, all the information about the word belongs in its own page, and any other similar words would go in the same list. - Centrx 18:44, 18 Jul 2004 (UTC)

Purpose of the Beer Parlour

Is it now Wiktionary policy to stop using this page for discussion because of its size? Are new discussions relevant to many entries to be continued in the talk page of a single affected word? One contributor has just moved a discussion back out to Talk:Mongolian. Should we start moving out the other topics now or should we discuss it here? — Hippietrail 09:44, 14 Jul 2004 (UTC)

I'm glad that you've made this a discussion point. I don't, however, think of this in terms of "policy", but as a point where we need to develop an understanding. Reversing what a regular contributor has done at least means, "We have to talk about this." Before I moved the material back to Talk:Mongolian I saw that the size of this page was up to 135k. Most of it is old discussions that few of us ever look at; occasionally the top part is lopped off into an archive that we are even less likely to look at. We really need to consider the difficult art of factoring, and, yes, we should start moving out other topics when the discussion has calmed. Some of the topics should even be deleted completely.
I know that discussions around a single word can become generalized well beyond the relevance to that word. A more appropriate approach when this happens would be to put a short summary of the broader issue here as a new topic with a link back to the place where it arose. Eclecticology 17:21, 14 Jul 2004 (UTC)
How about just putting the names of the new topics here, linking to an independent page per topic? Or letting the topic stay here until it gets to, say, one "screenful" and then moving it to its own page. Perhaps "Wiktionary:Beer parlour:XXX" — Hippietrail 01:11, 15 Jul 2004 (UTC)
Yum! Mexican beer!! :-) It's good to have a variety of strategies available. If a discussion has begun on an article talk page, and it has a potential for wider application, writing a summary of the issue here and creating a link from here to that article should be enough. I don't know if a screenful is an appropriate measure. What can happen when a new topic is raised is that we have a flurry of activity that can quickly generate many screenfuls of information, but the discussion can just as quickly die down. Creating new pages for each of these may be a worse solution, like getting a bigger carpet because the old one was no longer big enough to handle all the piles of dirt that you wanted to sweep under it. A lot of these completed discussions can be moved to existing talk pages; others can simply be deleted. There are frequent newbie questions which any experienced Wiktionarian can answer very easily. How long should we keep those? Eclecticology 08:20, 15 Jul 2004 (UTC)
Actually I'm a Negra Modelo man myself but I digress. I'm a bit worried that conversations on talk pages might lead to interesting results only to have the issue come up again later when the original participants are no longer around or the article has been forgotten. Not so long ago I brought up a similar issue under you or your to what is now in Mongolian, and have been thinking about a related issue that might go under aunt. It's a shame when new discussions are not aware of older ones.
We also need to sort out a decent way to maintain an FAQ. When such questions come up and are resolved we can put it there. I haven't looked at our current FAQ since my earliest days here. — Hippietrail 09:52, 15 Jul 2004 (UTC)
I've really become partial to German Weiss beers. It's only recently that a microbrewery here in the Vancouver area has finally introduced one. There's a risk in trying to find a single solution that fits all. I see some radically different problems in dealing with nationalities, pronouns and kinship terms. You are indeed right in saying that we are condemned to repeat these discussions. Still no newcomer is going to be inclined to wade through Stale Beer parlour archives to see if the particular problem has been discussed before. Perhaps the time to write up an interesting result is after a consensus has been achieved; it would summarize what might otherwise have been a very long discussion, and explain the rationale behind the decision. Maybe it should go on the relevant page in the "Help:" namespace. The only thing required for maintaining the FAQ is people to do it. My experience with FAQs generally is that they never answer the questions that I have, and, like you, I end up not bothering to look at them.

Proposed solution to the problem of translations getting out of synch with definitions

In "Inserting New Definitions Without Renumbering?" above, I proposed a solution to the problem of translations getting out of synch with definitions when definitions are added, removed or reordered.

I believe I have come up with a workable form of my proposal. I have set up an example at mace. Note that the columns contain summarised definitions, which solves the problem of the table becoming too wide.

What do people think? Should we adopt this until a software solution becomes available?

Paul G 15:55, 15 Jul 2004 (UTC)

Having tried out a few pages, I think this works very well. A similar approach of summarising the definitions can be used for the synonyms to make sure these stay in synch too. See bike for an example. There is not really any need for a table for synonyms as there is only one language to deal with.
I was also able to add two new senses to bike without any concerns about messing up the translations that were already there. — Paul G 16:31, 15 Jul 2004 (UTC)
Brilliant! Why didn't I think of that before? I've always felt that the use of numbers was a big problem. Traditionally that old way always left a lot of white space to the right of the translation list, so this solution is making use of waste space rather than making the article longer. The problem of extra-wide tables should only arise with words that have more than 6 or 7 meanings. At that point dividing into two separate tables could be a possibility.
If we're going to enter a cleanup campaign about this there are probably a few other things we can fix at the same time. Mostly cosmetic issues that would not merit a separate campaign by themselves. Some time ago we agreed not to link on the language names; this would be a good time to clean that up. I think that the table would look better, however, if the language names were bold; a simple "th" instead of "td" should fix that. Eclecticology 17:02, 15 Jul 2004 (UTC)
Thank you. Where can this be publicised so that new entries use the format and a clean-up campaign can get underway? — Paul G 18:07, 15 Jul 2004 (UTC)
I suppose Wiktionary:Announcements is the place. Thanks for suggesting that the language names should be bold - I've been through and changed all the tables I've added so far. — Paul G 18:14, 15 Jul 2004 (UTC)
I don't think using tables is necessarily a very good idea. It makes adding new meanings terribly difficult and also adds manual work for instances where many of the meanings are translated with the same word in some language, i.e. repeatedly typing the same word instead of just listing the numbers of the cases where one translation is applicable to many meanings. The tables are messy in the source, and not closing tags makes for invalid XHTML. There is another, simpler way to do tables, which is described in the MediaWiki user's guide. --Juxho 19:02, 15 Jul 2004 (UTC)
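For comparison, the same kind of translation table can be written in MediaWiki's own pipe syntax instead of HTML. The Python sketch below generates that markup; the function name `wiki_table`, the sense summaries "bicycle" and "motorcycle", and the sample translations are illustrative assumptions rather than the content of any actual entry.

```python
def wiki_table(senses, translations):
    """Build a MediaWiki pipe-syntax table: one column per summarised
    sense, one row per language, language names as header cells."""
    lines = ["{|", "! Language !! " + " !! ".join(senses)]
    for language in sorted(translations):
        by_sense = translations[language]
        lines.append("|-")
        lines.append(f"! {language}")
        # Missing translations become empty cells.
        lines.append("| " + " || ".join(by_sense.get(s, "") for s in senses))
    lines.append("|}")
    return "\n".join(lines)


# Illustrative data (an assumption, not copied from a Wiktionary entry).
senses = ["bicycle", "motorcycle"]
translations = {
    "French": {"bicycle": "vélo", "motorcycle": "moto"},
    "German": {"bicycle": "Fahrrad", "motorcycle": "Motorrad"},
}
print(wiki_table(senses, translations))
```

There are no closing tags to forget in this syntax, which addresses the XHTML concern, though rows still need editing whenever a sense is added.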
You are really raising two distinct issues here. I have no problem with the option of using the MediaWiki format for tables. I have no complaints when people set up their tables in that way. It's the first point that's more relevant to the present situation. I don't see anything in your comments that would provide an alternative to the clearly deficient technique of referring to the different meanings by number. For sure, repeatedly typing the same thing is tedious; perhaps a single symbol that means same as in the box to the left could do the trick. Eclecticology 21:11, 15 Jul 2004 (UTC)
It's interesting to see differing views. I have no problem with suggestions for refinements to this approach. — Paul G 08:32, 16 Jul 2004 (UTC)

This method has one big drawback: it is complex and difficult to set up; a casual user will not do it this way or will not contribute. Using numbers is really a major pain; they are often wrong. With really long lists, it will even be difficult to remember which value belongs in which column. You could have a look at nl:Engels where a long list of translations is presented in two columns where a next meaning would be in another colour block. One problem in the current nl: way is that the numbers do not increment properly. It may be better to do without the numbers. GerardM 08:55, 16 Jul 2004 (UTC)

In reply to the points raised:
1. Where translations are the same for more than one meaning, use cut and paste.
2. Users are free to create tables using either method.
3. If someone adds a meaning, they have to add a new column, granted, but this can be done quickly in a text editor if the <td>'s are kept aligned vertically in the HTML.
4. It does take a little work to set these up, but:
a) existing translations can be converted quickly into tables using a text editor
b) a user starting a page can add a table with translations for one language, or a blank table, or no table at all, leaving this for someone else to do
c) newbies need not add tables at all if they don't want to - they can provide translations using one of the earlier methods and leave someone else to format them (as already happens with other aspects of the format when newbies create pages). There is no obligation to knock out a perfectly formatted page first time.
5. The only criticism I would have of the "colours" idea used in the Dutch wiktionary would be that it takes up a lot of space vertically and repeats the language names for each set of translations. GerardM, can you give an example of a page where more than one sense is translated? Other than this, it combines the strengths of the tabular method without the disadvantage of the difficulty in adding another sense (just add another block in the appropriate place among the existing ones).
Bear in mind that this is the first workable solution presented to this problem. It is bound to be imperfect and have teething troubles. Any further suggestions for improvement are welcome.
Paul G 14:49, 16 Jul 2004 (UTC)
I've experimented with a variation on the Dutch format at obtain. The pastel colours seem redundant - you just need a block for each definition, with a summary of the definition at the top of each block to ensure they match up.
These were a little fiddly to set up too, but probably no more or less so than my tables of translations. Another disadvantage is needing to know where (or when) to break the table into two columns. Again, the tabular set-up is something that will put newbies off. On the other hand, it is much easier to add both new definitions and new languages to the tables in this form. — Paul G 15:53, 16 Jul 2004 (UTC)
Perhaps we should weigh up the various methods and vote on which we prefer before proposing that it is adopted as a standard for the English wiktionary.
Paul G 15:53, 16 Jul 2004 (UTC)
Ah, this is *much* easier to edit than my tables. Just paste in a template, then paste in the translations. New translation - add it to the list. New sense - paste in a new table of a different colour (if colours are necessary). — Paul G 17:27, 16 Jul 2004 (UTC)
The colours probably aren't needed unless you want to emphasize "shades" of meaning. :-)
What you have done is still a table, but using the alternate format. This is working because the translations are into only two other languages. Eclecticology 17:42, 16 Jul 2004 (UTC)
The colours are there to group meanings together; the background can be the same for all of them. What it does is signify where one meaning continues into the next. Multiple colours might be useful when you standardize on one colour for translations, another for synonyms, etymology etc.
One other reason why I dislike the table scheme, when you use programs to get translations from one wiktionary to another, it makes it a lot more difficult to code for. GerardM 20:36, 16 Jul 2004 (UTC)
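To illustrate the point about programs that carry translations from one wiktionary to another, here is a minimal Python sketch. It assumes a hypothetical list layout of one `*Language: [[word]]` line per language, which is an assumption for illustration and not an agreed Wiktionary format; the name `parse_translations` and the sample words are likewise invented for the example.

```python
import re

# Assumed (hypothetical) layout: one bullet per language,
# e.g. "*French: [[obtenir]], [[acquérir]]"
LINE_RE = re.compile(r"^\*\s*([^:]+):\s*(.*)$")
# Matches [[word]] and [[word|label]], capturing only the link target.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def parse_translations(wikitext):
    """Return {language: [linked words]} for bullet-style translation lines."""
    result = {}
    for line in wikitext.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            language, rest = match.groups()
            result[language.strip()] = LINK_RE.findall(rest)
    return result

sample = "*Dutch: [[verkrijgen]]\n*French: [[obtenir]], [[acquérir]]"
print(parse_translations(sample))
```

A list like this needs one short pattern per line, whereas a table spread over several rows and columns would need the program to track which cell belongs to which sense.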
I have added a meaning to nl:Chinees (Chinese) by your request and cleaned the Romanian words by adding a few ";". I added the 1 and 2 to see how that works out at the beginning of a block. I do not vouch for the translations of the second meaning. It certainly looks a mess (I did copy it from en: ). It is worth noting that I did not invent this scheme; it is however the best I have seen so far. Credit is due to the person who also added the links to the word in the other wiktionary. GerardM 21:12, 16 Jul 2004 (UTC)

Your five points present a good summary of your solution, though I would differ on number 3 by not making it mandatory for them to add the new column. Perhaps for the same reason that we would not expect everybody to start a table. Those who are interested in the strictly English side of the Wiktionary tend to ignore all translations, and that was probably the primary reason why these things would get out of sync.

I prefer to avoid voting on things; reaching a consensus is much better. Votes only manage to divide the community and create confusion. The recent vote on first letter capitalization is a good example on how things can get screwed up. It began with having the issue badly phrased by conflating three questions into one. The result was that nothing happened even if there might have been some agreement on particular elements of the issue. Wikipedia is known for some of its disastrous votes, which is why some of us are happier spending our time on the smaller sister projects. Sorry if this seems like a rant, but I have strong feelings about what voting can do to a community. Eclecticology 17:32, 16 Jul 2004 (UTC)

I propose that we use the same type of section headers for different major meanings (etymologies). For instance, the third definition for mace is of a different etymology--it is a different word. So, the two sections would be "Weapon" and "Spice" or "Noun: weapon" and "Noun: spice" or the level 2 headers would be "Weapon" and "Spice" and there would be level 3 headers "Noun" which would accommodate multiple parts of speech, as there are many words where the Noun and the Adjective, for instance, are of the same etymology and are more the same word than two different Nouns. This is a far better solution than having section headers like "Noun 1" or "Etymology 2". - Centrx 18:53, 18 Jul 2004 (UTC)

Having used this format on many pages it certainly makes things easier and neater. I think we can safely dispense with the colours, or maybe have a single colour (perhaps blue (colour #BFBFFF), which is sufficiently unobtrusive). Having multiple colours in a fixed order makes it a little hard to add new tables of translations when new definitions are added between existing ones. I think I will go with this in future. — Paul G 12:50, 19 Jul 2004 (UTC)
Make that yellow. This contrasts better with the blue and red of existent and non-existent links. — Paul G 12:56, 19 Jul 2004 (UTC)

How about having three templates: Template:Tnew Template:Tmid Template:Tend, each having the corresponding code. Anyone can easily create the tables, and you can then change the colour later. Or maybe just CSS that. --- Never mind, it doesn't work. -- Blade Hirato 05:51, 10 Aug 2004 (UTC)


Sasxsek yet another artificial language..

Yet another artificial language.. What is the point of having these in a wiktionary? When do we want them in a wiktionary? Or is anyone free to add whatever artificial language ?? This one is not even defined in wiktionary or wikipedia for crying out loud! GerardM 13:28, 16 Jul 2004 (UTC)

Don't knock conlangs. Considering the nature of wiktionary, there's nothing stopping us from having just about every language in existence. Sasxsek is intended as an international auxiliary language, like Esperanto, and just might be spoken by somebody. --Vladisdead 14:12, 16 Jul 2004 (UTC)
See the front page: "Welcome to Wiktionary, a collaborative project to produce a free multilingual dictionary in every language..." - every language, without restrictions, other than the tacit restrictions that these are natural languages (as opposed to computer languages) and are used by humans (as opposed to any pointless inventions such as Martian, Venusian - Klingon is, in a sense, used by humans as there are many Trekkers who speak it). So Sasxsek is as legitimate as Esperanto or even English. — Paul G 15:01, 16 Jul 2004 (UTC)
When it comes to conlangs it needs to be something more than someone's or some small group's flight of fantasy. We have no way of telling whether the contributor is making it up as he goes along. As things stand Wikipedia doesn't even have an article about it showing its history, or its linguistic basis. The onus should be on the contributor to establish the legitimacy of the language. The speculation that it "just might be spoken by somebody" is not enough.
I regret that I did not look closely into Romanica when it was added. It does have a Wikipedia article, but that article does not satisfactorily explain why it is distinct from Interlingua, a more broadly recognized conlang. We are not bound by Wikipedia's decision to create a Wikipedia for a particular language, but I doubt that the developers will have any enthusiasm for creating a separate Wiktionary for a language that does not have a Wikipedia. We would do well to consider the debates that raged over establishing Wikipedias in Klingon and Toki Pona. Eclecticology 17:07, 16 Jul 2004 (UTC)

I do not mind Klingon, Esperanto, Interlingua, Interlingue but this one appears without an introduction, without a mention in either wiktionary or wikipedia. That is the least that can be expected. Or should I start Smurfish ?? You know the language of these little blue men with white trousers where every noun is smurf ?? GerardM 21:16, 16 Jul 2004 (UTC)

See also http://pikachize.eye-of-newt.com/ Maybe we can get smurfs and pikachus to communicate together? Eclecticology 00:59, 17 Jul 2004 (UTC)
I tried to bring this up at Wiktionary talk:Criteria for inclusion awhile back but got no responses... Any case, I would be disinclined to include any constructed language that was not either a serious and well-used auxlang (such as Esperanto, Interlingua, or Solresol, but probably not Romanica—criteria for "well-use" might involve inclusion in other standards such as ISO 639 etc.), or a reasonably complete language published in a well-known work (such as Klingon or Quenya, but probably not something small like Watership Down's Lapine). Myself I invent languages, but I don't expect them to belong here (I started a separate wiki for my conlangs and anybody else's that cares to). —Muke Tever 23:01, 16 Jul 2004 (UTC)
As for me, I'm not overexcited about the conlangs but I don't see any value in removing entries that people have gone to the trouble of adding. If we had a bunch of people creating a bunch of new languages each and then losing interest again then it would be an issue. While it's on this small scale it doesn't bother me at all. — Hippietrail 02:18, 17 Jul 2004 (UTC)

In all likelihood nobody noticed that you had posted at Wiktionary talk:Criteria for inclusion. Your post on May 29 was the first one there since March 7, 2003. That talk page also discussed what to do about newly invented words. Now we're talking about whole languages. As far as I can see there are only two words entered so far for Sasxsek; that's easy to handle. Romanica has already got some big index files in Wiktionary, and I don't think that the contributor has done anything with them in the last month. I would be inclined to merge it with Interlingua. What is the difference between them? Eclecticology 05:01, 17 Jul 2004 (UTC)

I would hope that we don't go around merging languages together without even knowing first whether they're different or not! — Hippietrail 08:54, 17 Jul 2004 (UTC)

Tajiki-Persian

A new user without a login is adding lots of Persian words but seems to be making some subtle political-looking changes, including changing all occurrences of "Tajiki" to "Tajiki-Persian" as though trying to say the former word doesn't and never has existed in English. Also I have no idea whatsoever of the merit of his or her arguments. It doesn't look like the kind of thing we do here however. — Hippietrail 08:41, 17 Jul 2004 (UTC)

I'm inclined to agree. The SIL site at http://www.ethnologue.com/show_language.asp?code=PET treats the name "Tajiki Persian" as an alternate name for Tajiki. They use "Persian" alone to refer to a family of 10 different languages. It's difficult to have any discussion with anons, so I would be inclined to revert his changes. Eclecticology 20:09, 17 Jul 2004 (UTC)

The neologisms of Sir T.B.

Where in this project can I contribute an article on the many, many neologisms I've collected over the years by Sir Thomas Browne? Thumbing through the 17 vols. of the OED reveals that he is a frequent, if not the most frequent, originator of neologisms in the 17th c., including medical, electricity, pathology, incubation, hallucination, pubescent, precarious, typographer, locomotion, individuality, etc., etc., as well as the weird and wonderful retromingent, uniterable and chylifactive. The Norwikian

Hmm! We meet again. An original general article on Browne's ample contributions to language sounds more like it belongs on Wikibooks (where I personally do not spend much time:-)). The words themselves belong in Wiktionary. When that is done it would be of benefit to everyone to include a brief quote from Browne illustrating his use of the word. I've taken the liberty to change your list of words from all capitals to wiki links. Eclecticology 18:59, 18 Jul 2004 (UTC)
Maybe you need to be more specific about what you mean, but based on your question, you should put the relevant information in the page/definition for the word. So, for the word uniterable for instance, you should make a definition on it of the form of the other pages on this site, or look at the template, and you might also include the etymology (un- + iterable) and quotations (like "To play away an uniterable life." as I see in the OED). However, and I don't know the consensus of this on the Wiktionary, there might need to be some restrictions on how novel and how widely used the word must be. For instance, the OED indicates that they have only one recorded use of the word "uniterable" (as indicated by the Obs.-1) whereas, for instance, they have more than two recorded uses for chylifactive (as indicated by absence of an Obs.-1 or Obs.-2). If you have other information about these words beyond that which is accommodated in the template, bring it up here in the Beer parlour and, if it's not a simple thing, we'll all decide on consensus what to do about it. If you want to write about Sir T. Browne more thoroughly or about his words in a more encyclopedic manner, you might want to add that information to the Wikipedia ( http://www.wikipedia.org ). - Centrx 19:06, 18 Jul 2004 (UTC)
We aren't talking about modern neologisms. The OED is ample evidence of a word's existence. If a discussion needs to take place about a word it should be on the talk page for that word, NOT here. It's hard to see from your response, just where consensus building is required; surely it can't be on every unusual 17th century word. Your last bit of condescension suggests that you have not yet read Norwikian's contribution at w:Thomas Browne. Eclecticology 22:23, 18 Jul 2004 (UTC)
You have misunderstood my statements, apparently with some prejudice. I have no grudge against the poster--indeed I have never communicated with him or her before--and your presumption that I thought him inferior is insulting.
If a discussion needs to take place about a word it should be on the talk page for that word, NOT here. It's hard to see from your response, just where consensus building is required; surely it can't be on every unusual 17th century word.
I was asking or pondering, without any definitive recommendation, about whether there should be some discussion on whether we should include such neologisms--words (old words, at that) of which the vast OED has only a single recorded usage. Such a discussion is appropriate in the Beer parlour because it applies to a great many words, and is not a discussion on the validity of one particular word. Such a discussion obviates discussion of "every unusual 17th century word". If you have confused my call for consensus here with my recommendation "If you have other information about these words beyond that which is accommodated in the template, bring it up here in the Beer parlour and, if it's not a simple thing, we'll all decide on consensus what to do about it", I hereby clarify that this recommendation, this second mention of consensus, referred to the matter of new definitive information of a different quality than that which is currently accommodated in the template and in current practice. That is, information which would not fit under "Etymology", the definition, "Derived words", etc.
Your last bit of condescension suggests that you have not yet read Norwikian's contribution at w:Thomas Browne
Nothing of what I wrote was intended to be condescending in any way. It was entirely a response to a question in the context of Wiktionary practices. As to the Norwikian's contributions, due to the unusual signature, I mistakenly thought it was an unsigned, anonymous post--indeed the Page History lists an unregistered user. I had no knowledge of the poster's contributions to the Wikipedia article on the subject. - Centrx 05:29, 19 Jul 2004 (UTC)

Searching for words in non-Latin-alphabet languages

The Main Page states that the purpose of the English wiktionary is to define words from all languages in English (maybe I'm simplifying a bit). I see a problem for people trying to search for a word in, say, Russian, Greek, Chinese, or Hindi. The words have been entered in the alphabet, syllabary, etc., of the target language, and they are not searchable with Latin letters. I know some Greek and Hebrew, but I don't really know how to set my keyboard to enter these characters, and I'm completely lost in Russian, Chinese, and other languages. I suspect the average user would have much more difficulty than I.

Do we have a method to allow people to search for foreign words using the Latin alphabet? If so, what is it? If not, are we severely limiting the Wiktionary? Is there any simple method to link the Latin version of a word to the native version, to make the Wiktionary more searchable and useful? -- RSvK 00:05, 26 Jul 2004 (UTC)

Your observations raise some very interesting questions and problems. To some extent they overlap with the questions at Wiktionary talk:Categories where a partial solution may be found, but some more profound questions also arise. How do we ensure exact 1 to 1 correspondence between the original script and Latin script? In Greek both κσ and ξ may be represented by ks. How can we ensure that the process can be perfectly reversed? How do we cope with tonality in Chinese? There have already been objections to having tone marks appear in article titles. A quick look at "ma" in my Chinese dictionary shows that it can be represented by 16 different characters across the four available tones. To answer your questions: No, we don't have a functioning system to do what you describe, but it would be nice to have it. Yes, we are severely limiting Wiktionary, and ... No, there is no simple solution. Eclecticology 03:22, 26 Jul 2004 (UTC)
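The reversibility problem with κσ and ξ can be made concrete with a short Python sketch. The three-letter mapping below is a deliberately tiny assumption for illustration, not an agreed romanisation standard, and the function name `romanise` is invented for the example.

```python
# Tiny, deliberately incomplete mapping (an assumption for illustration only).
GREEK_TO_LATIN = {"κ": "k", "σ": "s", "ξ": "ks"}

def romanise(text):
    """Romanise letter by letter; letters not in the mapping pass through unchanged."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in text)

# Both Greek spellings collapse to the same Latin string, so the Latin
# form alone cannot tell us which Greek spelling was meant.
print(romanise("ξ") == romanise("κσ"))
```

Because two different inputs give one output, a lookup from the Latin form has to return a list of candidate entries rather than a single reversed spelling.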

Some entries today raise the question again. A one-to-one correspondence and perfect reversal are not necessary, just a way to get from a Latin representation of a foreign word to the definition. For example, if someone sees the word "kalos" somewhere in Latin letters and wants to know what it means, it doesn't do much good to have the definition available only from the Greek letters. RSvK 04:02, 7 Aug 2004 (UTC)
I agree that perfect reversal is not needed. What is needed is a standard that everybody agrees upon. Chinese has Pinyin and Japanese has Romaji and they are standardized so that the way a word is spelled in Latin letters is most often the only common way to spell the word in Latin letters, or at least people know when an outdated system is used. For other languages this is not the case. Granted that people have some need to find the Libyan leader's name using Latin characters. This does not mean that we should create an article for every way it can be spelled in Latin characters. Likewise, if we choose just one way, people using the other ways won't find it. While Greek may not be as bad as Arabic or Hebrew, along with Russian and the other Cyrillic-using languages, Greek just doesn't have an agreed standard of romanisation. You can only take a guess at a spelling. With Chinese and Japanese you can know the spelling. — Hippietrail 04:55, 7 Aug 2004 (UTC)

That's kind of what I was afraid of. There really is no one-to-one correspondence between Latin letters and other systems, particularly if we want to make the system usable--such as Latin 'e' equating both to Greek 'eta' and 'epsilon'.

Would it work for now to just enter the Latin version with a redirect to the other language version of a given word? -- RSvK 05:31, 26 Jul 2004 (UTC)

You can always use ē for eta, and ō for omega. The enormous scope of the problem can be gleaned at http://lcweb.loc.gov/catdir/cpso/roman.html Your suggestion probably will work as long as there isn't a word already with that spelling in another language. Eclecticology 08:51, 26 Jul 2004 (UTC)

Yes, this is a problem of enormous scope. The problem is whether the foreign non-Latin language dictionaries are going to be of any use to people who don't know that non-Latin method of writing. The web site above seems to indicate that the problem is solvable--the romanization table for Greek is quite straightforward. If I see a word in a newspaper, in Latin letters, from Russian, Greek, Hindustani, Chinese, etc., there ought to be a way to look it up. Yes, there are some multiple spellings. If they are common in Latin letter sources, we should probably add them.

I propose that we add some kind of stub entries for these words. If nothing else, just indicate the language and include a link to the entry in the language's own script. Otherwise the Wiktionary's usefulness becomes greatly diminished.

Yes, this means a lot of entries, but I don't see any way around it.

It also means a lot of mess when there are several ways to romanise a word with some people adding one or another on some entries, and some adding some other on other entries. Find us some romanisation standards for the languages you care about and we can then do it properly. — Hippietrail 02:04, 8 Aug 2004 (UTC)
Eclecticology lists a romanization standard for Greek above here, which looks reasonably straightforward. I suggest we need it not only for Modern Greek but for Classical and Koine Greek, because of the large number of words in English derived from Greek. RSvK 20:39, 14 Aug 2004 (UTC)
I would prefer some person or persons with knowledge of various ages of Greek to look at several systems before just assuming the first one is going to be best. We should strive for the best at Wiktionary. I am positive that there are more than a couple of "standards" in use for various jobs and by various groups. I'll try to dig some up so they can be compared. — Hippietrail 00:31, 15 Aug 2004 (UTC)

It might be useful also in the foreign script to include a Latin-letter transliteration.

We already do this. It goes after the headword, in parentheses. For languages like Chinese which have more than one recognized system, we also add the transliteration in each system with a label, when we know. — Hippietrail 02:04, 8 Aug 2004 (UTC)

For words with the same spelling in other languages, just have the usual separate language sections. RSvK 14:38, 7 Aug 2004 (UTC)

One (ambitious) way around it would be to build in intelligence to the search interface that matches inquiries that are not found in the dictionary to possible alternative romanisations. This would work something like the Google search engine which sometimes gives you a prompt at the top of the results page asking 'Did you mean ~~~~?' We would have to think carefully about possible perceptions of favouring particular romanisations. Oska 01:56, 8 Aug 2004 (UTC)
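A rough Python sketch of the 'Did you mean' idea follows, using the standard library's `difflib.get_close_matches`. The list of titles, the function name `did_you_mean` and the 0.6 cutoff are all assumptions for illustration; plain string similarity is swapped in here for real knowledge of alternative romanisations, which a serious implementation would need.

```python
import difflib

# Hypothetical list of Latin-letter entry titles that already exist.
KNOWN_TITLES = ["kalos", "kairos", "logos", "kosmos"]

def did_you_mean(query, titles=KNOWN_TITLES, limit=3):
    """Suggest up to `limit` existing titles that closely resemble the query."""
    if query in titles:
        return []  # exact hit, nothing to suggest
    # Results are ordered from most to least similar.
    return difflib.get_close_matches(query, titles, n=limit, cutoff=0.6)

print(did_you_mean("kallos"))
```

Ranking by similarity alone avoids favouring any one romanisation system, though it also means unrelated near-spellings can appear among the suggestions.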
The discouraging part of it is the "Hard work" :-)

We're in amazingly good shape for Chinese. See Wiktionary:Chinese Pinyin index and the more general Wiktionary:Chinese index. Pinyin romanization (not the same as transliteration, which only applies to alphabetic or syllabic scripts) works from a limited number of acceptable morphemes. Should we be simply transferring, for example, everything for the Pinyin "can" (pronounced /tsan/ not /kan/) to the page for can? Those Pinyin index pages could be much improved, but they provide a tremendous resource to work from.

Each alphabetic script brings its own problems. Some have several ways of being transliterated. For the Russian letter Я we can use "ia", "ja", "ya" or "ia͡". The Chicago Manual of Style recommends using the system used by the U. S. Board on Geographical Names. It is the one that uses "ya", whereas the Library of Congress is the one that prefers the ligatured version. The average contributor will be glad to not have to figure out how to type that! It comes down to agreeing that we are going to use a certain way of transliterating and sticking to it.
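To show how quickly the alternatives multiply, here is a small Python sketch that lists every Latin spelling a word could receive when one letter has several accepted values. The three plain values for Я come from the comment above (the ligatured form is left out because it is hard to type); the entries for the other letters, the sample word and the function name are assumptions for illustration.

```python
from itertools import product

# Per-letter alternatives. Only Я has more than one value here; the other
# letters are an illustrative subset, not a full transliteration table.
VARIANTS = {"я": ["ia", "ja", "ya"], "м": ["m"], "а": ["a"]}

def candidate_romanisations(word):
    """List every Latin spelling obtainable from the per-letter alternatives."""
    options = [VARIANTS.get(ch, [ch]) for ch in word.lower()]
    return ["".join(parts) for parts in product(*options)]

print(candidate_romanisations("яма"))
```

With one ambiguous letter there are three candidates; each further ambiguous letter multiplies the count, which is why settling on a single system keeps the number of entries manageable.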

If we confine ourselves to three languages (Russian, Chinese, and modern Greek) until the thing is debugged, the others should come a lot easier. Eclecticology 04:51, 8 Aug 2004 (UTC)

I disagree with the whole premise of this discussion.

  1. "they are not searchable with Latin letters" — they're not written with Latin letters either. Why should they be searchable with Latin letters?
  2. "link the Latin version of a word with the native version" — it's not our place to invent new spellings, Latin or otherwise.
  3. "if someone sees the word kalos somewhere in Latin letters" — that's what =Alternative Spellings= is for, but again, it's not our job to make such things. (I don't deny that it happens a lot. Sometimes there are legitimate examples, such as Kat' exochen.)

The best solution is to make it easier for people to enter text in foreign languages. For example if you go to the Esperanto Wikipedia you'll notice that above the search box is the label "Serĉu ĉ ĝ ĥ ĵ ŝ ŭ" (search), giving the user easy entry to all the Esperanto accents by copy and paste. If you edit a page on the Italian Wikipedia you are greeted with the text "Clicca uno di questi caratteri speciali per inserirlo nel testo: È à é è ì ó ò ù – «» “” ‘’ [[]] {{}}" (click one of these special characters to insert it into the text), with the relevant characters being clickable for automatic insertion. Either of these options could be doable, all we need is to make a page to help enable international character input. We don't need to dumb down Wiktionary for the technically impaired when we have the option to technically empower people instead. —Muke Tever 21:30, 15 Aug 2004 (UTC)

Nobody's suggesting that we invent new spellings. If we treat the Latin transliteration as an alternative spelling it would link or redirect to the orthography of that language. We are not talking about Esperanto or Italian where there are only a handful of specially accented characters. The proposed approach gives the user another series of options. So does your proposal. Once you have developed your proposal to the point where the "technically impaired" among us can easily look up a Bulgarian word your efforts will be very much appreciated. In the meantime, we hope to accommodate those whose knowledge of the language is limited to a rudimentary decoding of their alphabet. Eclecticology 22:30, 15 Aug 2004 (UTC)
True, in Esperanto or Italian there are only a handful of specially-accented characters, but while in those languages they may appear in many words of a search, in Wiktionary the problem is offset by the fact that a person is generally only going to be searching for one word at a time. My "proposal" only requires a page somewhere like Wiktionary:Cyrillic with a table like
Bulgarian capitals А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ь Ю Я
Bulgarian smalls а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ь ю я
Transliteration a b v g d e ž z i j k l m n o p r s t u f x c č š št ŭ ʼ ju ja
I already use similar beasts myself, at User:Muke/grc (greek capitals including archaic characters) and User:Muke (a less-developed version, Gothic).
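The Bulgarian table above can be turned into a lookup with a few lines of Python. This is a minimal sketch: the two rows are copied from the table, while the function name `transliterate` and the choice to lower-case the input first are assumptions made for illustration.

```python
# Rows copied from the table above (small letters and their transliterations).
SMALLS = "а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ь ю я".split()
LATIN = "a b v g d e ž z i j k l m n o p r s t u f x c č š št ŭ ʼ ju ja".split()
TABLE = dict(zip(SMALLS, LATIN))

def transliterate(word):
    """Transliterate lower-cased Bulgarian text; other characters pass through."""
    return "".join(TABLE.get(ch, ch) for ch in word.lower())

print(transliterate("България"))
```

The mapping only runs from Cyrillic to Latin; going the other way would need extra rules, since a sequence such as "št" could come from either щ or шт.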
I don't like the pages at Romanized titles because it is actually harder for the user: someone running across a (say) Greek word in situ may want to know what it means and it is easier for them to enter it directly than to have to figure out what each individual character means first.
The biggest reason I don't like this Romanized-title talk is because I see pages like erān coming into existence which 1) contain diacritics not in the original orthography, 2) represent effort not put into the page with the proper spelling, ἐρᾶν (which at the moment doesn't even exist, so people who actually know the word can't find it), 3) don't have any proof — Google:erān knows of none either — that Latin spelling of this word is even used (which is indeed tantamount to inventing new spellings). Indeed, erān doesn't even mention that the word isn't in its original script (though that is a fixable problem).
What I really don't want to see is a policy that endorses misspellings like this in a dictionary. If a transliteration setup like what is being proposed is brought into effect, it should 1) clearly notate that the transliteration is not a real word and 2) be in ASCII, as the primary call for it is people who can't type extended characters, which applies equally well to ā as it does for λ; this does indeed offer grounds for ambiguity on the Latin side, but as the existence of the Latin page itself should be comparable to disambiguation ("Eran is also a Greek word; see ἐρᾶν") this shouldn't be a problem. (As already noted, proper transliterations with fun diacritics appear on the native script page itself anyway.) —Muke Tever 23:13, 15 Aug 2004 (UTC)
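The suggestion that transliterated titles be plain ASCII can be sketched in Python with the standard `unicodedata` module. The function name `strip_diacritics` is invented for the example, and the approach is an assumption about how such folding might be done: it only removes combining marks, so letters with no decomposition (or non-Latin letters) are left as they are.

```python
import unicodedata

def strip_diacritics(title):
    """Remove combining marks, so that e.g. 'erān' becomes 'eran'."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("erān"))
```

A page reached this way would still need the note that the title is a transliteration and a link to the entry in the native script.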
OK. I can go along with the essentials of your alternative as satisfying the purpose of this exercise. It is directed at people who don't know the language at all, or in the case of Cyrillic, Devanagari or Arabic scripted languages may not even know what language they are looking at. I can accept that the transliterated link be written with a simplified and unaccented script. On the page itself, and under the appropriately placed language, the indication would be "Transliteration of ..." and list the various possibilities with only the briefest indication of what each means.
For Cyrillic, the transliteration needs to be uniform without regard to which language is being used. It will not do to have "Щ" sometimes appear as "šč" or "št", but it should always be "shch". Eclecticology 23:56, 16 Aug 2004 (UTC)
Well, you said Bulgarian, so I went for Bulgarian :) where "Щ" is indeed always "št" and not "šč" (ignorable pedantic side note: the form of the letter is originally a ш-т ligature, the descender being originally central and Cyrillic ligatures being written top-to-bottom instead of side-to-side as in Latin, so št is actually a more-original value than the Russian).
Actually, for an audience who doesn't know the language, it might be better not to confuse them with (possibly contradictory) transliterations at all—when entering Cyrillic text to find a page, it doesn't matter what the value of "Щ" is; the page oughtta should tell them exactly what "Щ" is supposed to stand for.
Transliterations could be useful to people who have the phonetic form of a word but don't know what the spelling should look like. But in this case the user has the same problem as if he didn't know how to spell "bureaucrat": we don't, as we may when browsing a print dictionary, have the capacity to search "nearby" words to find which is our word properly spelt. (And languages like Russian, while perhaps better than English, are still not phonetically spelled.)
The "Transliteration of..." section btw is a Very Good Idea. —Muke Tever 02:36, 17 Aug 2004 (UTC)
Blushingly about Bulgarian I should have stuck with Russian. I do tend to think in terms of the written language. Spoken language opens up a lot of new problems like mishearings or dialectical differences. That might need to wait until we have voice recognition software installed. ;-) Searching nearby words gets a lot more tricky when the language itself is a variable.
For this purpose a very rough one-fits-all transliteration is what we need. The instruction page would make it clear that the series of letters that they are typing may not accurately reflect the pronunciation of the language. The key question that we are trying to answer is "How do I look up a word written in a language and script that I do not know or even recognize?" Eclecticology 00:49, 18 Aug 2004 (UTC)

Downloading Entire Dictionary

Will it be possible to download a complete or partial copy of all dictionary entries?

Yes. Eclecticology 06:00, 1 Aug 2004 (UTC)

Encyclopedic entries

I am concerned at the steady stream of encyclopedic entries that have been posted lately, in particular, over the past weekend. These almost inevitably come from anonymous users, who are virtually uncontactable. I have migrated some of the more fully-formed entries to Wikipedia for users to edit as appropriate.

Is there a way to prevent this or at least reduce their number? Is it time perhaps to insist that all people wishing to post on Wiktionary set up a user name? — Paul G 08:44, 9 Aug 2004 (UTC)

I can sympathize with the problem, but I think that being too insistent about signing in would completely change the nature of the wiki. It would promote the development of a "professional" cadre, and give fuel to the periodic accusations that the projects are run by a cabal. You have certainly been diligent about your work, but there are limits to what you can do. If you find some aspects of the work overwhelming you might evaluate your approach for the sake of your own sanity. I really don't think it's necessary to go the extra mile for the anons; they may never show up again to appreciate it. I think that a lot of messages left on their talk pages are never read because they simply don't know that there's a message there. I've looked carefully at the things that you have put on RfD, including many that you have rightly called encyclopedic. If one of these falls into that category, it should be enough to leave the messages in this project. You may move the material to Wikipedia if you feel like it, but there's no obligation for you to do the other guy's work for him. I've been making a point of giving ample opportunity for response at RfD, but it seems that you and I are the only ones with anything to say there.
I think that the atmosphere here is still considerably more relaxed than at Wikipedia, and I'm very happy with that. Unfortunately, I happen to believe that perfection is unattainable. I can understand the urge to see things done correctly. An article in Scientific American a few months ago addressed the issue of people being more satisfied with their work when it is not done out of obligation. We sometimes worry too much about getting things just right, and if we start imposing restrictions on editing rights we are more likely to scare away the good people. Eclecticology 02:51, 10 Aug 2004 (UTC)
Thanks, Eclecticology, for looking at the broader picture. I agree with your points. When I do move articles to Wikipedia, I dump them there as-is for someone else to sort out.
Of course perfection will be impossible to attain on here - who decides what "perfection" is in any case? Even if everything could be set out a certain way, it would probably be marred by the next person to come along.
I think you're right to say that being too strict about editing rights is likely to scare away people who make valuable (and numerous) contributions. Viewed that way, handling the odd encyclopedic entry is not such a big deal. — Paul G 08:47, 10 Aug 2004 (UTC)

(no title)

Here are some websites where we can teach ourselves about the issues in Greek romanization before we rush in and make too many useless articles:

Please add more as they are found. — Hippietrail 03:34, 15 Aug 2004 (UTC)

There's an important one at http://lcweb.loc.gov/catdir/cpso/romanization/greek.pdf What we first need to do is set up a page with the transliteration chart that we are going to use. Most letters will not be controversial, but we want to be able to develop a consensus around the others. Eclecticology 22:30, 15 Aug 2004 (UTC)

Pronunciation of 'Oneiro'?

Anyone know how to pronounce 'Oneiro', the Greek word for 'dream' or 'a dream'?

Ancient Greek or Modern Greek? Does the "o" represent omicron or omega? — Hippietrail 01:54, 18 Aug 2004 (UTC)
Modern Greek is όνειρο and should be pronounced /ˈoniro/. Ancient Greek is ὄνειρος or ὄνειρον (both genders are found) and pronounced classically /ónɛːros/, /ónɛːron/. In Modern English loanwords like oneiromancy it is pronounced /oˈnaɪro-/. —Muke Tever 14:13, 18 Aug 2004 (UTC)

DB Wiktionary

Wiktionary really needs to become a single database, divided into single languages. Each language should have a table with the fields required for words of that language. Users should only be allowed to add new words for a language from its own subdomain and in its own language. Basically what I mean is: separate language dictionaries. This will: rid a language of having all the foreign-language words just link back to its own words as definitions (e.g. mundo); allow the allpages list to be broken down by language; rid the sites of 83% repetition; and, as it's a database, make formatting easy.


To still have foreign words with foreign-language definitions, each language's table would have translations of the definitions in it. Then if someone were to access, say, "lobo" on the EN subdomain, it would search all the languages that have it (Spanish,...), and return it with the English translation of the definitions. If the word exists in other languages but its definition isn't translated, it could print a stub link forwarding an editor to the foreign language's domain to be able to translate it.


This probably seems confusing and useless. --66.177.192.98 04:11, 31 Aug 2004 (UTC) (Blade Hirato)
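The lookup described above can be sketched in a few lines of Python. Everything here is hypothetical: the `TABLES` layout, the placeholder definitions and the stub text are invented for illustration, and nothing reflects how the MediaWiki database is actually organised.

```python
# Hypothetical per-language tables: word -> {definition language: definition}.
# The entries are placeholders made up for this sketch.
TABLES = {
    "es": {
        "lobo": {"es": "(definition in Spanish)", "en": "wolf"},
        "mundo": {"es": "(definition in Spanish)", "en": "world"},
    },
    "pt": {
        "lobo": {"pt": "(definition in Portuguese)"},
    },
}


def lookup(word, site_lang):
    """Search every language table for `word`.

    Returns (language, definition) pairs written in `site_lang`; where no
    translated definition exists yet, a stub pointing at that language's
    subdomain is returned instead.
    """
    results = []
    for lang, table in sorted(TABLES.items()):
        entry = table.get(word)
        if entry is None:
            continue  # this language does not have the word
        if site_lang in entry:
            results.append((lang, entry[site_lang]))
        else:
            results.append((lang, f"stub: translate at {lang} subdomain"))
    return results
```

With this toy data, `lookup("lobo", "en")` returns the English definition from the Spanish table and a stub for the Portuguese one.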

This can be done. What is required is to indicate for all words what the language of the word is. This is done on nl:wiktionary. All the metawords like translation, synonym etc. need to be templates, as is done on nl:wiktionary. The translations that are present are shown with the local-language word. The definition of the local word can be used to translate that particular meaning of the word. Have a look at nl:wiktionary and see what you think. GerardM 08:48, 31 Aug 2004 (UTC)

References to the Oxford English Dictionary

On the FAQ page it says the following:

"links to wikipedia or references to the Oxford English Dictionary are better ways to ensure that the definitions are complete."

How does one reference the OED?

Borofkin 03:25, 17 Aug 2004 (UTC)



Visual thesaurus

I was just at http://visualthesaurus.com . It has an interesting way of presenting words. Makes me wonder whether developing something similar would be feasible. Eclecticology 12:38, 24 Aug 2004 (UTC)


Automatic Wikipedia links

How about automatically adding something like

  * Wikipedia article on bar.

in any word's page if the relevant page exists on Wikipedia? There are hundreds of thousands of articles there; it will be a long time before these links are added by hand.

I'm not totally opposed but there are a few issues:
  • Sometimes the Wikipedia entry will be under a different article name.
  • Sometimes there are multiple Wikipedia entries with a disambiguation page - we can choose which one(s) we link to now.
  • Sometimes there isn't a relevant or existent page on Wikipedia, and links to nonexistent articles would annoy the user more than having no link at all.
Hippietrail 00:49, 25 Aug 2004 (UTC)
Automatic links could become very annoying. There will be many instances when a link to another Wiki project may be more appropriate. Doing it by hand helps for protecting relevance. Eclecticology 05:20, 25 Aug 2004 (UTC)
I would like to see a lot more automatic links. Currently, to add one translation from English to French it is necessary to add it to the English page, to the French page, to the English index, to the French index, link to Wikipedia and so on and so on. It seems like so much work which is NOT necessary (a lot of it could be generated if the database were a little bit more intelligent). Currently it is a lot of unstructured text which needs serious text mining.

Wikipedia links in general?

The (current?) style for W: links seems to be to manually add the text 'See Wikipedia article on'. Could this be automated in some way, so you just enter for example w:wiktionary and you automatically see 'See Wikipedia article on Wiktionary'? trunkie 12:38, 1 Sep 2004 (UTC)

Why not just create a template {{w}} with content
See Wikipedia article on [[w:{{PAGENAME}}|{{PAGENAME}}]].
Of course it won't be a major reduction of work needed to be done, but it might help a bit. Maybe it would even be possible for someone who knows more about these templates to include a parameter such that one could choose which page to link to. (Useful in case the target is a disambig page).
Of course, this is not an opinion whether it should be done, only some wild ideas of what can be done... :) \Mike 14:47, 1 Sep 2004 (UTC)


Use of templates

Hello, I am a user from (beginning) de.wiktionary.org and have a basic question. Is it advisable to use templates ({{1}} instead of <small>[1]</small>) in every Wiktionary-entry? Is this a problem for the Wiki-Server? I want to use those {{1}}, {{en}}, ... behind every translation for a better clarity! Bad idea?--217.227.188.90 18:11, 29 Aug 2004 (UTC)

Although there are places where templates are appropriate, they can make editing very difficult for those who are not familiar with the coding. In your example I would have no idea what the "1" is about. Eclecticology 22:37, 29 Aug 2004 (UTC)
It is simply the short form of <small>[1]</small> (for meaning [1], meaning [2] ---> meaning [1]). But is it a problem for the Wiki-server? --217.227.188.90 23:07, 29 Aug 2004 (UTC)
It's not a server problem, but the use of numbers in that way is not advisable since the meanings are soft numbered. If a new meaning is added the numbers could change. Eclecticology 23:26, 29 Aug 2004 (UTC)
On nl:wiktionary, we do indicate for what language a translation is; this is also used on en:Wiktionary, e.g. in Livonian. Here the ISO 639 codes are used. The idea of having templates for the number of the meaning is indeed a bad idea. I use coloured blocks that can start with a number when there is more than one. This scheme is also really useful for adding words to other wiktionaries; the only thing required is that the templates exist on the other wiktionaries. For as long as you know what it should be, you can create the content of the templates like I am doing on the Vietnamese and Italian wiktionaries. Currently the templates and content do exist on en:. 08:57, 31 Aug 2004 (UTC)
Could you show me an example for these coloured blocks which start counting when there is more than one meaning?--217.93.167.4 22:25, 31 Aug 2004 (UTC)


Multilingual links

User 62.90.59.130 is adding a bunch of "multilingual links" in Russian and English in various articles. I cannot figure them out. Here is an example that was added to the article blue just now:

Is this useful to Wiktionary? What is it? — Hippietrail 07:48, 1 Sep 2004 (UTC)

I was just going to comment on the same thing. From what I can determine, they look like translations into Russian of compound nouns. They ought to be moved into entries for the relevant noun. I'll post a comment on the user's talk page. — Paul G 11:54, 1 Sep 2004 (UTC)
I've posted the following on the user's talk page:
"Hello, welcome to Wiktionary.
"I see you've been adding a lot of "English-Russian multilingual links". Can you explain what these mean, as some of us here don't understand them? Your contributions are useful but it seems that a "multilingual links" section isn't the appropriate place to add them. If you want to translate Russian words, you need to create a page with the Russian word as its title and put the English translation there. Contact me or another sysop if you need any help. — Paul G (Wiktionary sysop)" — Paul G 12:05, 1 Sep 2004 (UTC)
In fact I think he is putting in links of words which sound similar but have different meanings and are not otherwise related. At least that's what it seemed like from one of his entries which had Russian and Hebrew together. I don't think it's useful. — Hippietrail 02:07, 2 Sep 2004 (UTC)

Please see the proposal Associative Wiktionary, which is now placed at the bottom of this page.

Please check some Korean stuff

An anonymous user has changed things on nl:wiktionary relating to Korean. I am not sure that what was done is correct. What had been done re Farsi is wrong (according to a Farsi speaker).

nl:Koreaans and nl:한국말

Thanks, GerardM 08:09, 1 Sep 2004 (UTC)

Remove Plural Links

I think these should be removed, as it is the same as including possessives; like dragon's and dragons'. The only thing these things should do is forward. --66.177.192.98 09:38, 1 Sep 2004 (UTC) (Hirato)

The same with non-en: words? Swedish plurals are irregular enough to benefit from own articles, but the question is whether one should link to them - I've done that so far. \Mike
Do you mean links (ie someone putting in a link like links as opposed to doing links (you may need to go to edit or hover over to see the difference)) or entries defining plurals (eg) plurals? trunkie 14:17, 1 Sep 2004 (UTC)

I'd prefer to leave in links to all forms of a word, so that people can look up words easily no matter what form of a word they enter. If a plural has no special meaning other than just being plural, then by all means just forward them to the singular. RSvK 17:36, 5 Sep 2004 (UTC)

I think the links should remain also: for a word such as links that merits an entry on its own (e.g. the German and Dutch senses) inflected forms such as English's should probably have a reduced format: under a section like ==Inflected form== a set of lines like:
  • English: plural of link
  • English: third person singular of link
  • Frankwegian: /nɛ̃x/ plural of link
—that is, a disambiguation-style list instead of trying to impose a skeletal whole-entry form on it (which might, heaven forfend, begin to attract translations). The invented "frankwegian" line is an example of how to incorporate special information (here, that the pronunciation is different from the singular in an unexpected way, as for example the plural of French boeuf).
For a word that doesn't merit an entry on its own though (such as, say, thoroughfares) articles should probably not be written, though the link could remain—who knows what may happen in the future?
As for possessives, that's a different case, as possession is served by a clitic 's appended to a noun phrase, and not an inflection on a noun.—Muke Tever 16:06, 8 Sep 2004 (UTC)

Flexibility remains important in these things. In the general case it should not be necessary to have a separate page for either the plural or possessive inflections. The exceptions would come whenever there are irregularities. "Links" when used in the sense of a golf course does not normally appear in the singular even if one can surmise what that singular might mean. Words ending in a sibilant may have irregular possessives. (Muke's theory about 's as a clitic is debatable, but should not affect the outcome of this discussion.) Eclecticology 19:05, 8 Sep 2004 (UTC)

For once it's not my theory :) See the list at w:Clitic, or for a less faceless authority, some comments by Dirk Elzinga, a linguist out of BYU. —Muke Tever 02:45, 10 Sep 2004 (UTC)

Appendix: XXX alphabet

Some anonymous user(s) is/are setting up appendices of letters in various alphabets. These are legitimate for languages where alphabets differ from the Roman alphabet, but, for example, is it necessary to have a page for the so-called "French alphabet"? Reproduced below are my objections to this appendix in particular:

'This is not the "normal" French alphabet at all. The French alphabet has the same 26 letters as the "English" alphabet (actually the Roman alphabet). The accented letters and ligatures are not considered to be independent letters, unlike, say, the n-tilde in Spanish. Is there any reason to have pages that just replicate the Roman alphabet for all languages that use [it]?'

The titles "This is the normal XXX alphabet" seem to be redundant too - are there any other alphabets for XXX other than the "normal" one?

Thoughts? — Paul G 10:09, 2 Sep 2004 (UTC)

As long as there is only the chart, as is now the case for French, I very much agree with you. The one for Spanish seems far more useful and informative. Eclecticology 17:06, 2 Sep 2004 (UTC)
I came across Wiktionary_talk:Spanish_index prior to its creation; seeing that in certain areas it is taught differently. I made the note so future authors would not modify it to what they learned and leave it listed as what is there, the Real Academia Española's one. Also, we probably need to rename these "writing systems", and maybe have the structure of the language too. --Blade Hirato 10:17, 4 Sep 2004 (UTC)

The "Edit Conflict" screen

If you are like me and you really really hate the "Edit Conflict" screen, please take a look at meta:Edit conflict handling suggestion. Thank you.

Indexing Japanese entries

The way Japanese entries are indexed at present seems rather unsatisfactory for the following reasons:

  • links to words consisting of a single kanji point to the entry for the kanji, not the word itself
  • hiragana entries for words customarily written in kanji do not redirect to the main entries for those words (e.g. あんしん doesn't redirect you to 安心)

A solution would be to have three entries per word:

  1. All actual content in an entry whose title consists of the word as it is usually written followed by the reading in parentheses (e.g. 安心(あんしん)).
  2. A hiragana entry that disambiguates between all words which have that reading, or redirects to the main entry if there is only one word with that reading (e.g. ふへん would disambiguate between 普遍(ふへん), 不偏(ふへん), 不変(ふへん) etc., while ふへんぶんぽう would redirect straight to 普遍文法(ふへんぶんぽう)).
  3. An entry whose title is the word as it is usually written. This entry redirects to the main entry or disambiguates between different words with the same kanji (e.g. 安心 would redirect to 安心(あんしん), while 目下 would disambiguate between 目下(もっか) and 目下(めした)).

This system would have, among others, the following advantages:

  • It would rid us of hiragana-only titles for words that are typically not written in hiragana.
  • Any Japanese word in any context would have its reading displayed right next to it.
  • Searching for a word in hiragana would send you directly to the main entry for that word when possible.
  • Links to words written with a single kanji would point to the word, not the kanji, while links to the kanji would still point to the kanji and not the word (e.g. a link to 山 (the kanji) would be entitled 山, while a link to 山 (the word) would be entitled 山(やま)).

--Yajuu 09:23, 4 Sep 2004 (UTC)
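The three-entry scheme proposed above can be sketched as a small Python function that, given (written form, reading) pairs, works out which titles are main entries, which are redirects and which are disambiguation pages. The function name, the tuple layout and the `WORDS` sample are assumptions made for illustration; the titles follow the 安心(あんしん) pattern from the proposal.

```python
from collections import defaultdict

# Sample data taken from the examples in the proposal.
WORDS = [
    ("安心", "あんしん"),
    ("普遍", "ふへん"), ("不偏", "ふへん"), ("不変", "ふへん"),
    ("目下", "もっか"), ("目下", "めした"),
]


def build_index(words):
    """Map every page title to (kind, targets).

    kind is "entry" for a main entry, "redirect" when a title leads to
    exactly one main entry, and "disambiguation" when it leads to several.
    """
    pages = {}
    targets = defaultdict(list)
    for written, reading in words:
        main = f"{written}({reading})"   # main entry: word plus reading
        pages[main] = ("entry", [main])
        targets[reading].append(main)    # hiragana title
        if written != reading:           # avoid listing a kana-only word twice
            targets[written].append(main)
    for title, mains in targets.items():
        kind = "redirect" if len(mains) == 1 else "disambiguation"
        pages[title] = (kind, mains)
    return pages
```

For the sample data, 安心 and あんしん both come out as redirects to 安心(あんしん), while ふへん and 目下 become disambiguation pages.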

We look forward to your actual contributions in this matter. Please note that kanji entries should also include their Chinese or Korean renderings. Where a kanji has more than one hiragana (or katakana) rendering it seems that it would be more useful to include them all on the same kanji page. Remember that this is the English Wiktionary, and that people who are not native Japanese speakers may not be aware of the alternate readings.
Hiragana-only titles should continue, but when applicable should say directly that this is not a typical usage followed by a link to the proper page. Your actual contributions will be very helpful in developing a policy that those of us less familiar with Japanese would not be competent to do alone. Eclecticology 12:36, 4 Sep 2004 (UTC)

Accent in Japanese words

Most (all?) entries for Japanese words seem to lack accent information. I think we should agree on a notation and start adding. Probably the simplest notation is a single number; "n" means that the nth syllable is accented, while n=0 denotes a "flat" (平板式) word.

Examples of the proposed notation:

  • 雨(あめ) 1
  • 飴(あめ) 0
  • 言葉(ことば) 3

--Yajuu 10:02, 4 Sep 2004 (UTC)
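To make the numeric notation concrete, here is a hedged Python sketch that expands an accent number into a high/low pattern. It assumes the usual textbook description of Tokyo pitch accent and counts morae rather than syllables; both are assumptions of this sketch, not something settled in this discussion. The extra slot after the hyphen stands for a following particle, which is what distinguishes 言葉 (3) from a flat word of the same length.

```python
def pitch_pattern(morae: int, accent: int) -> str:
    """Expand the numeric accent notation into H/L marks.

    morae  -- number of morae in the word (assumed unit of counting)
    accent -- 0 for a flat word, otherwise the position of the accented mora
    Returns one mark per mora, a hyphen, then the mark for a following particle.
    """
    if morae < 1 or not 0 <= accent <= morae:
        raise ValueError("accent must lie between 0 and the number of morae")
    marks = []
    for i in range(1, morae + 2):       # the last slot is the particle
        if accent == 0:
            high = i > 1                # low start, then high throughout
        elif accent == 1:
            high = i == 1               # high start, then low
        else:
            high = 1 < i <= accent      # rises after mora 1, falls after the accent
        marks.append("H" if high else "L")
    return "".join(marks[:-1]) + "-" + marks[-1]
```

Under these assumptions the three examples above expand to HL-L for 雨(あめ) 1, LH-H for 飴(あめ) 0 and LHH-L for 言葉(ことば) 3.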

I disagree. I'd rather see an acute accent on the vowel in the romanization. It can be hard to see what a number standing on its own represents. --Vladisdead 10:08, 4 Sep 2004 (UTC)
Many users will prefer not to look at the romanization at all. But we could include both formats, or we could make the numeric notation so explicit that everyone will understand it. For example, the entry 雨(あめ), if it existed, could contain the following:
Accent: 1 (頭高型) (See accent notation in Japanese entries if you are not familiar with the notation)
--Yajuu 10:53, 4 Sep 2004 (UTC)
You are certainly free to add this information to the pronunciation section for Japanese words. Eclecticology 12:16, 4 Sep 2004 (UTC)

Some time ago I posted a question on a relevant talk page on Wikipedia asking how Japanese dictionaries represent the accent. It seems the vast majority don't show it at all. The only ones I've found so far have been a very few Japanese<->English dictionaries. The one I have shows just an acute accent on one syllable, only in the Romaji. The other system I have seen but do not have available to check uses an acute to show where a word rises in pitch and a grave to show where it is lowered again - again only in the Romaji. I don't know much about Japanese pitch accent but it seems these two systems are not even compatible. And I never heard any more information about how it is shown, if ever, in dictionaries which don't use Romaji. Theoretically it's possible to add acutes and graves to Hiragana but fonts and rendering systems probably don't expect that and the results may be ugly. I will ask this question again now on the Qalam mailing list. — Hippietrail 01:38, 5 Sep 2004 (UTC)

Unfortunately, many 国語辞典 (kokugojiten, Japanese-Japanese dictionaries) don't show the accent. That's why there are separate accent dictionaries for non-Tokyoites (and foreigners). There is a relatively standard notation, however: the word is written in kana and a line is drawn over the portion of the word that has high pitch. My accent dictionary uses a variant of that notation (with a small downward notch indicating where high pitch turns to low pitch). Of course, this is not something worth trying in Wiktionary. The number notation which I proposed above is used in the 大辞林 (daijirin) dictionary (http://www.nifty.com/dictionary/).
Realistically, I can't see any alternative to the number notation. We don't want to go down the roomaji road, because many if not most students of Japanese absolutely detest roomaji. Kana with lines or acutes over doesn't seem a very good solution either. And the number notation is brilliant. It is easy to understand and can readily be extended to phrases longer than one word (e.g. the accent of 臨機応変 (rinki ouhen) can be denoted 1-0).
--Yajuu 13:03, 5 Sep 2004 (UTC)
I've received a very good reply on this topic from Berthold Frommann on the Qalam list. Apparently there are several systems, and even two variations on the system which uses numbers. It seems to me that in Romaji the prevalent system uses an "inverted L shaped symbol" but some have a left-pointing system and some a right-pointing system. I would be very interested to find out if Unicode has such a symbol. It seems it should.
From the X-SAMPA standard: "The Japanese pitch accent, phonetically a pitch downstep with contrastive function, is usually symbolized by Japanese scholars as ¬ (corner, ASCII 170, ANSI 172)." This looks like the sign the Martin dictionary uses; the Kenkyusha seems to be using the Japanese opening bracket "「". Since it is a downstep though, the ¬ sign looks more appropriate. —Muke Tever 15:12, 5 Sep 2004 (UTC)
Those characters have shapes similar to what the print dictionaries use but have very different size and positioning characteristics. The Qalam list is frequented by some very good Unicode experts so I'm waiting to see what responses I get there. Thanks for finding these though! — Hippietrail 15:22, 5 Sep 2004 (UTC)
From Qalam this discussion has now propagated even to the Unicode mailing list, where these characters were mentioned:
02F9 ˹ MODIFIER LETTER BEGIN HIGH TONE
02FA ˺ MODIFIER LETTER END HIGH TONE
These seem to be the right ones and I would like to encourage their use even if fonts and OSes are not supporting them properly yet. There is no reason not to have multiple systems as some Chinese and Korean entries, and the pronunciation sections already do. So full articles could have the acute/grave, the begin/end high tone, both numeric systems - especially if we can find labels for each system. For brevity, when Japanese words appear in the "Translations" section of English entries they should use no accent marks, acute/grave, or begin/end high tone - but these "optional" characters must not appear in links. I don't think the number systems will work well in the "Translations" sections. — Hippietrail 00:39, 6 Sep 2004 (UTC)
Ah, that's good... weird, that those characters display here, though I can't make out what font they're displaying in and Character Map doesn't appear to acknowledge their existence, even in fonts that apparently have them (such as Everson Mono Unicode, and Doulos SIL). I shall have to find a newer utility... —Muke Tever 15:48, 8 Sep 2004 (UTC)
Since Wiktionary aims to be a "real dictionary" I think we need to do what the best print dictionaries do. I would love to see a scanned page from an "accent dictionary" - what are they called in Japanese so I can ask about one in my local Japanese bookshop? Now, assuming that the line above a hiragana won't currently work and also assuming that the l-shaped symbols won't currently work; I think we should go with the acute accent since it looks the most professional. I think it would also be fine to have a redundant system which doesn't depend as much on advanced typographical features. One of the numeric systems would be good.
Annoyingly, I thought I had already made entries using the acute accent at either はし or はな but now I can't find them. I'm also going to post Yajuu's reply on the Qalam mailing list with some more questions of my own. — Hippietrail 14:07, 5 Sep 2004 (UTC)
"Accent dictionary" is アクセント辞典 in Japanese. A sample page of the one I use can be seen on the publisher's web site.
I didn't know there was an alternative number system -- counting syllables from the back. It seems by no means a bad idea, but I still prefer counting from the front because it is more intuitive: "nth", in just about any context, tends to mean "nth counting from the front". The argument Mr Frommann presents about long words is not of much gravity because in the vast majority of cases you only need to know if the word is 平板式, containing no accented syllables, or 起伏式, containing an accented syllable. There are no subcategories to 平板式, and for most (though not all) 起伏式 words it is obvious where the accented syllable must be. --Yajuu 15:24, 5 Sep 2004 (UTC)
Isn't this entire debate disregarding dialectal differences? Like the endless debate of A-me versus a-ME (candy/rain)? There are countless others I could quote from a 関西 dialect alone. Lockeownzj00 18:47, 3 Oct 2004 (UTC)
I have asked on a couple of forums whether any of these systems will work for other Japanese dialects, or whether accent dictionaries exist for them, but I haven't had much response. I would dearly love to have other Japanese dialect words and pronunciations here. Do you know if accent dictionaries exist and if so, how they show the accent or can you tell us if the systems we've discussed here will work? What other Japanese dialects are you familiar with? This is an area I know very little about. — Hippietrail 01:05, 4 Oct 2004 (UTC)


Translation tables

I think the new translation tables are beautiful. I think it will make automated parsing of Wiktionary articles harder, but it's probably worth it.

What I don't like is the artificial "A to I" and "J to Z" partitioning scheme. It's ugly as sin! I've changed quite a few to balance the number of entries in the left and right column - but there would be a lot less of this work to do if these instructions were left out or replaced with "left column" and "right column". Most users would try to keep them balanced; those that don't will probably make them only slightly unbalanced, which will be less ugly. And fixing them will be less work.

What do the other contributors think?

Even with the ugly imbalances, it's an improvement over the numeric references. Eclecticology 04:29, 5 Sep 2004 (UTC)
Yes I totally agree. But they don't have to be unbalanced. — Hippietrail 05:11, 5 Sep 2004 (UTC)

The translation tables are attractive, but the need to manually edit the double columns makes them too time-consuming and difficult to work with. I suggest we keep the nice, colorful formatting, and just reduce the translations to one column. I'll try a sample one that way when I get a chance. RSvK 05:23, 5 Sep 2004 (UTC)

I don't find that using cut-and-paste to move material between columns is such a difficult problem. The two column approach makes things more compact, and makes it more likely that you will see more of the translations together for comparison without the need for scrolling. Eclecticology 07:26, 5 Sep 2004 (UTC)

Is there any way to automate the production of double columns? I'd rather concentrate on the content rather than the formatting. RSvK 17:31, 5 Sep 2004 (UTC)

Thanks for everyone's feedback. I'm pleased to hear that people like the tables. Here are a few comments in reply.
Double columns: These are there to make better use of the space on the page. The dimensions at which a user typically has his or her browser and the length of the entry for each language in a translation table mean that the table is more likely to fill the page widthways if it is in two columns, as Eclecticology points out. It also prevents the page from becoming very long. If I remember correctly, I copied this format from the Dutch Wiktionary, where they use a similar system.
Time-consuming formatting/automating double columns: I have a small text file containing a blank translation table that I just copy and paste whenever I need one. I'm not sure I see what additional editing is necessary.
A-I and J-Z: This division might appear arbitrary and can lead to unbalanced tables when, say, the tables contain only the main European languages French, German, Italian and Spanish (of which three go on the left and only one on the right). However, when the table contains many languages (say 50 or 60) this division is about right. I came to this conclusion by looking at some of the more developed lists of translations (as, for example, at butterfly and iron). (I've just looked at these again, and the division is closer to A-L/M-Z or A-M/N-Z. Maybe a straightforward 50-50 split, that is, A-M/N-Z, would be more appropriate.)
Balancing the columns: One alternative to dividing the languages between the two columns by letter is to edit the table so that, at any one time, half the languages are on one side and half on the other, as has been suggested above. This gives a balanced table but it quickly becomes unbalanced if translations are added for additional languages and contributors do not readjust the balance. I don't agree that "most users would try to keep them balanced" - in my experience of using this system, I found that they simply added languages and did not attempt to tidy up the balance. Keeping the table balanced is in fact a lot more work, as it becomes necessary to count the number of languages, divide by two and then cut and paste to even the table up. This is particularly the case when several languages are added rather than just one or two. That is why I came up with the A-I, J-Z division instead, which gives a balanced table once sufficient languages are added without the need for any extra effort on the part of the user.
One column versus two: I have no major objection to losing the two-column layout if it is considered unwieldy or difficult to use, although I don't believe it is. The minor objection that I have is that the pages will become much longer and contain a lot of white space on the right. I would prefer to keep the two columns rather than one for the reasons I have given above.
Paul G 09:24, 8 Sep 2004 (UTC)
PS - Another reason for adding the A-I/J-Z comments is so that users unfamiliar with this style of HTML table know where to put the first translations in an empty table. — Paul G 09:28, 8 Sep 2004 (UTC)
In the 75 languages now in our Index of language indexes the median value is "Japanese"; in the 413 languages of our list of languages the median value is "Lapp". We are in the right range for splitting up the list, but tiny aesthetic adjustments whenever a list is changed seem like an incredible waste of time.

After using the two-column tables for several weeks now, I'd vote to keep the nice colors but get rid of the double columns. Too much time wasted on formatting. Unless we can produce the two columns automatically. RSvK 02:19, 21 Sep 2004 (UTC)

If you keep the formatting template at a convenient place, like a user sub-page, it's easy just to copy and paste. I even experimented with having it in 3 columns at least, and it looks alright. Eclecticology 06:49, 21 Sep 2004 (UTC)
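
The automatic layout RSvK asks for could be sketched roughly as follows. This is only an illustration in Python, not something the wiki software is known to provide: the function names `split_fixed` and `split_balanced` are made up for the example, and the sample list is the four-language case mentioned earlier in the thread. It contrasts the fixed A-I/J-Z rule with an even split computed from the sorted list.

```python
def split_fixed(languages, last_left_initial="I"):
    """Fixed rule: names starting A..last_left_initial go left, the rest right."""
    ordered = sorted(languages)
    left = [name for name in ordered if name[:1].upper() <= last_left_initial]
    right = [name for name in ordered if name[:1].upper() > last_left_initial]
    return left, right


def split_balanced(languages, columns=2):
    """Even split: sort, then cut into chunks whose sizes differ by at most one."""
    ordered = sorted(languages)
    base, extra = divmod(len(ordered), columns)
    result, start = [], 0
    for index in range(columns):
        # The first `extra` columns take one additional entry each.
        size = base + (1 if index < extra else 0)
        result.append(ordered[start:start + size])
        start += size
    return result


if __name__ == "__main__":
    sample = ["French", "German", "Italian", "Spanish"]
    print(split_fixed(sample))      # three names on the left, one on the right
    print(split_balanced(sample))   # two names in each column
```

With the even split nobody has to count entries and cut and paste by hand, and the `columns` argument would cover the three-column variant as well.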

Specific language HOWTO and discussion pages

One thought I've had due to the current discussion on how Japanese articles are done ( Talk:明るい(あかるい), Talk:明るい ) is having standardised pages for these discussions, and standards we arrive at due to the discussions.

Let me propose Wiktionary:Japanese article standard for setting out how to write Japanese entries in the English Wiktionary, and how to add Japanese translation entries for English words. The discussion about the standard could then go to Wiktionary talk:Japanese article standard. I think we also have enough material around to kick-start Wiktionary talk:Greek article standard.

A few things I'd like to request opinions on:

  1. Should these pages go under Wiktionary: or have no namespace?
  2. Should these pages fit into a larger system of similar pages on all Wiktionary standardisation topics? Romanisation and capitalisation are two other prominent and recently-discussed topics which spring to mind.
  3. Should these pages instead be part of a "tutorial" or an FAQ, or should they complement those? (whether or not they themselves currently exist is moot for now)

Hippietrail 16:07, 5 Sep 2004 (UTC)

I kicked off Wiktionary talk:Japanese article standard with a question about one-kanji words with no okurigana. Let the debate commence! --Yajuu 09:34, 7 Sep 2004 (UTC)

The name Wiktionary

Two issues with our name. First, the phonetic pronunciation in the logo has "Wiktionary" as three syllables, with the "a" completely missing, but even Brits don't completely swallow the "a" in "dictionary", and Americans would definitely pronounce it as four.

Second, unlike Wikipedia, the name Wiktionary sounds horrible when pronounced in German, and probably a number of other languages. We need to reconsider having the same name for each project in all languages. Some foreign Wikipedias sometimes tweak the name (e.g. 'Wikipedy' in Frisian, 'Uiquipedia' in several Iberian languages) which is an improvement. Of course this is more an issue for the compilers of the foreign Wiktionaries, but we need to be aware of it. --Erauch 00:55, 6 Sep 2004 (UTC)

The pronunciation of the logo has been discussed before, at Wiktionary talk:Wiktionary Logo.
As for the second point, the non-English Wiktionaries do have their own names: according to their front pages, the Polish is Wikisłownik, the Spanish Wikcionario, the Latin Victionarium, and the German, actually, is Wikiwörterbuch. (See also the translation list at Wiktionary.) However a lot of the Wiktionaries have not translated all the messages or namespaces yet, and unless there's been a change in the status quo only the Latin Victionarium has a translated logo, so if it looks like all the wiktionaries are called "Wiktionary" it is only superficial. (Incidentally, in translating the Latin logo I also used an idiosyncratic pronunciation.) —Muke Tever 20:11, 7 Sep 2004 (UTC)

Remove pronunciation systems

Should IPA and AHD be removed, as IPA requires Unicode and AHD is for English? Shouldn't SAMPA be replaced with X-SAMPA? And seeing as X-SAMPA is an alphabet that displays the equivalent for IPA in ASCII, shouldn't we just replace it with a more readable UTF-8 version of IPA?

Summary: There should only be one pronunciation; it should have a link to the key; it should be displayable under the UTF-8 encoding and be less mangled than X-SAMPA; and it should be an IPA-equivalent. I suggest a UTF-8 or image-version of IPA. --Blade Hirato 07:36, 8 Sep 2004 (UTC)

I'll agree there should only be one pronunciation, and would suggest UTF-8 IPA. However, thought should be given to which symbols we choose to represent the phonemes. English <r> is generally represented by /r/, but that refers to a trill. Normally this is acceptable, but since this is a multilingual dictionary, we should use the most appropriate symbols for each language. Do we use /ɹ/, or /ɻ/, or even /ʋ̴/? Do we show aspiration, which in English is a more important feature than voicing? This information would be useful to non-native speakers especially. --Vladisdead 08:50, 8 Sep 2004 (UTC)
No... a pronunciation key represents w:phonemes. The phoneme is not symbolized /ɻ/, even though it is phonetically [ɻ] in America; it is just /r/. It is general practice in producing phonemic representations of a language to use the simplest characters necessary. Specific information about the phonetic realization of the phonemes for non-native speakers demands an article by itself (such as Wiktionary:Pronunciation key). —Muke Tever 15:37, 8 Sep 2004 (UTC)
I agree that IPA would be the system to use, as it is designed to show nuances of pronunciation that other systems have difficulty representing without showing bias towards one accent or another. — Paul G 08:56, 8 Sep 2004 (UTC)
Umm... UTF-8 is a representation of Unicode. IPA is of course the base system to use; the reason both IPA and (X-)SAMPA are used is that (X-)SAMPA is the asciification of it for people without access to good Unicode fonts—not everyone is at their own computer and has access to install such things. (BTW, I don't see that "more readable utf-8 version of IPA" means anything at all? In any case, the server sends and stores all pages as UTF-8; we can't use anything else.)
SAMPA is preferable to X-SAMPA (where it differs) in that SAMPA is a set of standards for representing the phonemes of individual languages—and offers a set of simple plain-text standards for each. For a pronunciation key X-SAMPA is only useful for languages that don't have a SAMPA standard available. —Muke Tever 15:37, 8 Sep 2004 (UTC)
Although I prefer IPA as a standard, I am prepared to be tolerant of any pronunciation representations that may be used. I see no harm in having a word represented in several pronunciation systems, or in showing that a word may have more than one valid pronunciation. Eclecticology 19:28, 8 Sep 2004 (UTC)
I meant to use HTML entities; as IE will select fonts that don't have the character; fonts such as Lucida Sans Unicode. --Blade Hirato 01:38, 11 Sep 2004 (UTC)
As Wiktionarians are urging for a DB Wiktionary, I'm sure only IPA will be needed, for all other systems can be derived from it. --Blade Hirato 01:38, 11 Sep 2004 (UTC)
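
The relationship described in this thread (X-SAMPA as an ASCII spelling of IPA, and UTF-8 as one byte encoding of Unicode) can be illustrated with a small Python sketch. The table covers only a handful of X-SAMPA symbols, and the function name `xsampa_to_ipa` is made up for this example; it is not a complete or official converter.

```python
# A deliberately tiny subset of the X-SAMPA-to-IPA correspondences.
XSAMPA_TO_IPA = {
    "r\\": "\u0279",  # alveolar approximant (turned r)
    "O": "\u0254",    # open-mid back rounded vowel
    "@": "\u0259",    # schwa
    "S": "\u0283",    # esh
    "Z": "\u0292",    # ezh
    "T": "\u03b8",    # theta
    "D": "\u00f0",    # eth
    "N": "\u014b",    # eng
    ":": "\u02d0",    # length mark
}


def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA, trying the longest symbols first."""
    symbols = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for symbol in symbols:
            if text.startswith(symbol, i):
                out.append(XSAMPA_TO_IPA[symbol])
                i += len(symbol)
                break
        else:
            out.append(text[i])  # characters outside the table pass through
            i += 1
    return "".join(out)


if __name__ == "__main__":
    ipa = xsampa_to_ipa("pO:")   # e.g. British English "paw"
    print(ipa)                   # pɔː
    print(ipa.encode("utf-8"))   # the same IPA string as UTF-8 bytes
    print(ipa.encode("utf-8").decode("utf-8") == ipa)  # round trip is lossless
```

Because the mapping is mechanical in this direction, an ASCII transcription could in principle be generated from stored IPA (by inverting the table) rather than typed separately, which is the point made above about deriving the other systems.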

Discussions on Meta and on the mailing list

On Meta and on the wiktionary-l mailing list, there are discussions going on regarding the need to import and export data to and from the wiktionary. At this moment in time, it is very hard to copy valid content from the en:wiktionary and re-use it in another wiktionary.

The discussion focuses at the moment on this need to share wiktionary content. We can get access to a lot of missing valid data, all we need is a mechanism to share this info.

When it is decided that sharing is strategic, we will preferably use an existing XML format and when we find that an existing XML format does not cover all requirements, we will extend the format. From this we will define how we can enhance our database to make it easier to edit the wiktionary content.

An example for functionality down the road; translations are entered and they are automagically sorted and put into a one / two / three column format. (a user preference maybe)

Question: Please have a look at the discussions on Meta and on the mailing list and feel free to contribute.

Thanks, GerardM 10:42, 8 Sep 2004 (UTC)

Perhaps you will have an easier time with this after you have written your XML code and put it on the test wiki so that we can experiment with it and become familiar with the way it works. At that point we can be more knowledgeable in our criticisms. Eclecticology 19:35, 8 Sep 2004 (UTC)
There is little to test with XML. It basically reflects what is in a Wiktionary and it allows for the export in a format that is easy to understand by other electronic dictionaries. The crux is that by deciding to go for open sharable content, the structure of the information of wiktionary cannot be the free format that we are used to. There will be a mandatory format that will at the same time aid in filling in the information in a correct way. This will mean that my current use of a xx and -xx- message to indicate a language and a usage, will be hidden in a user interface and probably be replaced by something more intelligible.
It is really important to follow the discussion, to participate in the discussion. When it is decided that XML is the way forward and we are near that point, changes in wiktionary will follow. This does not mean that we will lose data or information, nor that it would not be extensible. But it will surely mean that indicating a translation with numbers will become a thing of the past for wiktionary. Currently I am proposing a proof of concept implementation with glossary data; this is less complicated than full dictionary content. (see the discussion on Meta). GerardM 07:31, 14 Sep 2004 (UTC)

Ordinal use in single sense definitions

I am having a hard time understanding what seems to be sanctioned practice across most Wiktionary entries. During my many years of education, I was taught that one does not make use of ordinals unless there are multiple subordinates to a whole. Yet, I find that the vast majority of the Wiktionary entries begin even single sense definitions with a numerical offset (e.g. the use of a "#" rather than a ":").

My notice of this practice began even before I started actively contributing. The suggested template for entries (referenced in the Style Guide) provides the fanciful entry "hrunk" as a guide. While I agree with the format offered for the Verb section (there are two senses given), I think that the Noun section is misleading and erroneous. Since there is only one sense of the noun, I feel that the ordinal usage is incorrect. The definition should simply be indented, not numbered.

  • Am I incorrect in my interpretation of proper formatting?
  • Is there some rationale behind the apparently sanctioned practice?

It is my opinion that a standard be agreed upon and that the standard best reflect global practice in this respect. To my knowledge, I have not come across a dictionary wherein a single sense is given a numerical offset. I think singular senses are best displayed as indented and without ordinal. It is easy enough to convert to ordinals in a subsequent edit if one adds another sense.

Comments?
Velociped 19:29, 10 Sep 2004 (UTC)

One thing is, we can't expect people to change anything except what they're interested in. For example we have headings like =Synonyms= and =Antonyms= even when there's only one. As well, people (especially non-heavy users) tend to merely continue the format they come across: if all the language names in a translation list are bracketed, invariably any new translations will be likewise. If we use ":" or "*" for a single definition, it's rather likely the same will be used for new definitions. (I'd prefer "*", as indentation is used for example quotes.)
I also think that the single number standing alone can itself stand as an invitation to add more: there's no question that a one-item numbered list is incomplete. But those are my only arguments, and I wouldn't be averse to changing it, if there were consensus. —Muke Tever 20:30, 10 Sep 2004 (UTC)
Of course, we always want to invite people to add more. With material taken from the 1913 Webster we really look forward to having more modern usage. I wouldn't say that an enumerated list of one is wrong, even if the numbering seems trivial. Eclecticology 23:30, 10 Sep 2004 (UTC)

Synch requests

On several pages where there is a variation between US and British spelling, I have added what I call "synch requests" asking users to make changes to the pages for both (or all) spellings so that the pages remain in synch (as Wiktionary policy is not to favour one variety of English spelling over another, but to give full treatment to both [as there are usually two] forms). I put the synch requests at the top and the bottom of the page so that they are noticeable.

I have created a template for this that people might like to use. To use the template, type {{synch}} followed by the alternative spelling of the word.

For example, typing {{synch}} favor gives:

If you edit this entry, please also edit favor to ensure these two entries remain synchronized.

Should anyone want to edit the text of the template, it is at Template:Synch. — Paul G 09:04, 13 Sep 2004 (UTC)

Hi! How about using ordinary transclusion? For example on the page Colour instead of retyping everything added to Color (or vice versa, whichever the case may be), let the text of Colour be {{:Color}} which will include the full text of Color. Then, instead of =alternative spellings= listing just the other word, list both with an explanation (so that it will be applicable to both pages, e.g. "colour chiefly Commonwealth English, color chiefly American").
...in such a case it may behoove to have the original article at something like Colour (English) and transcluded onto both pages, to allow for 'color' and 'colour' being distinct words in other languages (color: Spanish, Latin; colour: Middle English etc.) —Muke Tever 17:53, 13 Sep 2004 (UTC)
Good idea. The drawback is, as you say, the need to take into consideration that "colour" and "color" are also non-English words. This means making sure that links to "colour" or "color" (of which there are quite a lot) are changed to "colour (English)" and ensuring that any newly created links go there too. Otherwise this would work nicely. — Paul G 08:35, 14 Sep 2004 (UTC)
Actually the links could stay the same (they are both going to be pointing to pages that have the same English content) — the one link that would need to be added is an "edit" link that points to "colour (English)". —Muke Tever 18:05, 14 Sep 2004 (UTC)
As commendable as Paul's initial suggestion may be, its major drawback is that it depends on human diligence. The format {{color}} links to [[Template:Color]] so that to make it technically workable we would need to move some of the material to a template namespace. Eclecticology 20:38, 14 Sep 2004 (UTC)
Since Mediawiki 1.3 any page can be transcluded. The format is {{Namespace:Articlename}} (where "Template:" is superfluous, and the blank namespace is just ":"). See Project:Sandbox for an example of a transcluded article. —Muke Tever 00:23, 15 Sep 2004 (UTC)
Here's an alternative idea that doesn't require much effort, and can be done on old pages as well as new ones. Set up a page called "color, colour" or "color/colour" (with the variant spellings in alphabetical order to avoid favo(u)ritism). Copy (and merge, if necessary) the contents of "color" and "colour" into this page. Remove the "alternat(iv)e spellings" section and put this in the respellings under each part of speech, with notes on the regionality of each spelling: "color (US), colour (British)". Then change the content of both the "color" page and the "colour" page to #REDIRECT [[color, colour]]. All links to "color" and "colour" are then redirected to this one page, there is no need for synch requests, and neither spelling is favoured over the other. Is this easier than transclusion? I don't know enough about it to be able to judge. — Paul G 15:12, 15 Sep 2004 (UTC)
How would you handle the fact that "color" is also used in Spanish and a few other languages? This is not the case with "colour". Eclecticology 16:16, 15 Sep 2004 (UTC)
"Colour", IIRC, was used in Old or Middle French (though I don't think it was the normal spelling, it was the source of the English spelling). Even rejecting this, we can't rely on one spelling always being unoccupied by another language. —Muke Tever 22:46, 15 Sep 2004 (UTC)
I think we could get around this by careful use of disambiguation pages (does Wiktionary support them?) So if you ask for "color", you get your choice of, say "color/colour(English)" or "color(non-English)". This would also work even in the rare case of alternate spellings in other languages. Similarly, "colour" would give you the choice of "color/colour(English)" and "colour(non-English)". The exact names of the disambiguated pages are much less important -- you won't be searching for them directly.
From an information design point of view, the basic problem is that the title of a page is overloaded. It serves both as a unique id and as a search key. The disambiguation page separates these concerns. It holds the search key, while the pages it points to just have unique ids (albeit meaningful ones, we would prefer).
I realize there are cases like "check/cheque" where usage differs between the two spellings (British English uses both check and cheque where US uses check everywhere). It's a matter of judgment whether to have two separate pages or one consolidated page. In this particular case, I'd say there's still enough overlap that there should probably be only one page, with notes pointing out the differences, just as there are notes pointing out regional differences when the spelling doesn't vary. But the whole point is that we could do it either way: separate pages for check and cheque with cross-references, or a single check/cheque page with usage notes, redirected from check and cheque.
So the rules would be
  • Are there alternate spellings with the same or nearly the same usage? If so, make a consolidated page and redirect the alternatives to it.
    • If there are minor differences in usage between the spellings note them.
    • If there are major differences, revert to separate entries.
  • Do two separate articles have some parts in common? If so, consider breaking out the common part and transcluding.
  • Can a page be plausibly redirected more than one way? If so, use disambiguation to provide a choice.
I'm very nervous about duplicated material in general. Wikimedia seems to have evolved several mechanisms for dealing with it, and I hope we can make use of these to eliminate the need for synch requests entirely. -dmh 14:45, 23 Sep 2004 (UTC)
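
The transclusion naming rule mentioned earlier in this thread ({{color}} pulls in Template:Color, {{:Color}} pulls in the main-namespace page, {{Project:Sandbox}} pulls in a page from a named namespace) can be sketched in Python as below. This is an illustration of the rule as described here, not the MediaWiki parser: the function name `transclusion_target` and the short namespace list are assumptions made for the example, and first-letter capitalisation simply follows the {{color}} example above.

```python
# Hypothetical namespace list for illustration; a real wiki defines its own.
KNOWN_NAMESPACES = {"Template", "Project", "Wiktionary", "User", "Talk"}


def _capitalise_first(title):
    """Upper-case only the first character, leaving the rest untouched."""
    return title[:1].upper() + title[1:]


def transclusion_target(call):
    """Return the page title that a call such as '{{:Color}}' would pull in."""
    inner = call.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a transclusion call: " + call)
    name = inner[2:-2].split("|", 1)[0].strip()  # ignore any parameters
    if name.startswith(":"):
        return _capitalise_first(name[1:])        # blank (main) namespace
    prefix, sep, rest = name.partition(":")
    if sep and _capitalise_first(prefix) in KNOWN_NAMESPACES:
        return _capitalise_first(prefix) + ":" + _capitalise_first(rest)
    return "Template:" + _capitalise_first(name)  # default namespace


if __name__ == "__main__":
    for call in ("{{color}}", "{{:Color}}", "{{Project:Sandbox}}"):
        print(call, "->", transclusion_target(call))
```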

Old English and Middle English letters

Is Ƿƿ or Ww more appropriate for Old English entries? Should W be used for entry titles but wynn elsewhere? Or what? --Vladisdead 10:19, 13 Sep 2004 (UTC)

My knowledge is limited but I know Old English spelling was quite variable. For words which can be attested in various spellings, the main entry should be with the regular 26 letters, spellings using the rare letters can certainly be given as spelling alternatives. If there are words which were only ever used with the rare letters then those letters should be in the article title and the headword. Words should not appear anywhere with the rare letters if they have no attestation with them. — Hippietrail 11:29, 13 Sep 2004 (UTC)
I prefer wynn (and have been using/fixing it when I run across it and remember to). W was never used in Old English—it was a much later invention. Conversely, wynn was not a "rare letter" to Old English.
In modern editions, though, w appears to be universal (with the unique but relevant exception, if I remember correctly, of the Oxford English Dictionary—I know it uses creatures like yogh also). As a dictionary, we should list both, though I'd prefer wynn as standard (used e.g. in page names) with w treated as a transliteration (many people don't even recognize it). (We do seem to have an unwritten policy of spelling words as native.) Muke Tever 17:37, 13 Sep 2004 (UTC)
Does this go for long s, also? --Vladisdead 23:25, 13 Sep 2004 (UTC)
Long s isn't a different letter. It's basically a swash variant of short s. If you want to use "ſ" in the quotations to reflect original orthography, fine, but it shouldn't appear in page titles. —Muke Tever 17:50, 14 Sep 2004 (UTC)
I've got some similar issues. In the book from Francis Grose I'm currently working on, there are several peculiar typographical issues. I'd be interested in hearing opinions on these as well.
  • The moft notable is the ufe of a lowercafe "f" wherever a lowercafe "s" would normally appear (except at the end of a word). I've replaced thefe with the lowercafe "s" in my entries fo far, believing that this is not really a fpelling iffue, but rather an early printing behavior intended to more clofely refemble the tall handwritten "S" of the period? Is this correct?
It shouldn't be an f; it should be an ſ. In most fonts they will look similar except the bar on the long-s ſ (if there is one) will only be on the left side of the stem, and not pass through as it does in f. And yes, it is purely a decorative variation (which at times I think is even used inconsistently). —Muke Tever
  • There are lots of ff, ft, and fh ligatures that I presume are just typesetting issues. There's also a ct ligature I've not seen very much elsewhere, but which occurs frequently in this book. I'm basically ignoring all of these.
In a different book I'm about to start working on, there's some more serious issues.
  • the thorn (þ) is used extensively (usually for þe, sometimes for oþer, þat), and I think it should be transcribed as such.
  • the ezh (ʒ) is used rarely for words like (ʒere) (year), and I think it should be transcribed.
Ezh is not an English letter. Are you sure that isn't their uppercase Yogh (Ȝ), or maybe a gaeliform ("insular") g ? —Muke Tever
Like long s, ezh is a variant of z, in the context of Old English. Also, German eszett (ß) is a ligature of ſʒ, and the cedilla (ç, originally cz) is a small form of the ezh's hook (originally zedilla, "little zed"). IIRC.:)--Vladisdead 09:49, 16 Sep 2004 (UTC)
From what I've seen, a curly-tailed swash z (like, but not an ezh) was used, which helped the confusion of yogh with z later on. This character is still used in modern cursive handwriting [2] and is still not an ezh of any kind. BTW as for eszett, its origin is disputed (see w:Eszett); it could likely be (and in modern styles certainly is) from 'ſs' as from 'ſ with curly z'. —Muke Tever 17:40, 16 Sep 2004 (UTC)
(Both characters can be ezh-shaped, but ezh is purely a recent phonetic character with the value of /Z/. See http://www.evertype.com/standards/wynnyogh/ezhyogh.html )
  • the yogh (ȝ) is used rarely for words like (tymeȝ, nayleȝ, lyȝte), and I think it should be transcribed. It also occurs several times for (ȝer, or ȝere), which is confusing.
Why would yogh be used in "tymeȝ" ?
  • I've found both "Sʒ" and "Sȝ" for Saint, so I'm a bit confused by that.
Me too. Can I see a scan of these things? —Muke Tever
I'll get you a scan sometime tomorrow probably. No Wiki tonight. :-( -- CoryCohen 23:21, 14 Sep 2004 (UTC)
  • the Latin small letter H with stroke (ħ), occurs a fair amount, most often when the H is the last letter of the word. I have no idea what the significance is, and I wonder if it matters at all whether it's transcribed or not.
  • the Latin small letter B with stroke (ƀ), mostly in (oƀ. and łƀ). The "oƀ." is an abbreviation for obol or half-pence. The "łƀ" is an abbreviation for pound (weight, not currency). It's actually an lb ligature with a single stroke in my text, unfortunately, the unicode L with a stroke is slanted and lower. As with the stroked H, I've got no clue what the significance is.
I'm guessing that it's the common use of an overline to indicate an abbreviation (which here interferes with the letters' ascenders)?
The stroked lb is in the "Letterlike symbols" block, at U+2114: ℔ —Muke Tever 22:08, 18 Sep 2004 (UTC)
  • a double lowercase L ligature with a large tilde across it, that looks kind of like (ɫɫ). That's just a doubled Latin small letter L with middle tilde. It's not really right. I have no idea what the significance is. It does not appear to be typographic, since some double L's are crossed and some are not. But when it does occur, it's always at the end of words like (Michaeɫɫ, graveɫɫ, byɫɫ, waɫɫ, tyeɫɫ, &c.). I don't know what to do with it.
  • The use of "ɫi" for pound (currency, £) occurs occasionally, being just "li" the rest of the time.
  • I have a number of superscripted letters as well. The most common is probably "ye", with a superscripted E. But there's also a lot of superscripted c, m, o, x, and l in counts of things involving roman numerals. These are important, and probably need to be superscripted, but I haven't figured out how to do that yet in Wiki.
HTML is to put it in <sup>...</sup> tags. There are also individual superscript letters in Unicode, but these are supposed to be for phonetic (not plain language) use, and they came in the latest versions of Unicode so don't yet have much font support (ex: yᵉ for þe).
Actually I checked the Unicode charts and these characters do exist, in the combining diacritical marks block in a subsection for "medieval superscript letter diacritics". There is e (yͤ) as well as a i o u c d h m r t v x, the range is from U+0363 to U+036F. —Muke Tever 00:18, 15 Sep 2004 (UTC)
  • The yogh only appears in my "Microsoft Sans Serif" font. Does anyone have suggestions on what system configuration works best? I'm using Firebird, Windows XP, Character Map, and the International keyboard stuff.
  • Do we think gross misspellings (even in common at the time) and words containing these exceptional characters should be marked in a language like "Middle English"?
The best spelling should be the most common (however gross or irregular they may be). Middle English dictionaries do exist, and these could be consulted as well.
  • If it matters, the printing of the 2nd book was in 1904 by the Early English Text Society, and the source material is from London, 1420-1552.
As for the original question, I've not seen the wynn, but I would use it if I saw it. As for the long s, I presume you mean the very tall typographical S that predates the use of lowercase f. I don't think it matters, and I've not been able to find it in the unicode character set anyway.
-- CoryCohen 00:07, 14 Sep 2004 (UTC)
Wow these are great questions - which I am unable to answer. I think the concept of "correct spelling" is quite new and it probably went through a period where spellings shifted like fashions before settling down. The OED and the KJV were likely the major influences in setting English spelling. For early works I think any spelling found in print is warranted in a dictionary. The more prickly issue is which spelling gets to be the "main" one on Wiktionary, with the more exotic or archaic ones pointing to it. Should it be the modern word in the modern spelling? But what about words which have disappeared and have no modern form? What about words which have shifted in meaning too far between the Old/Middle English and the modern English?
If there are no objections I would like to re-post these questions on a more expert forum such as the Qalam mailing list. — Hippietrail 00:27, 14 Sep 2004 (UTC)
Hey that would be great! Although I'm going to try to resist getting sucked into reading that as well... -- CoryCohen 01:08, 14 Sep 2004 (UTC)

For CoryCohen, and anyone else who wants to know, long s is ſ, 017F. There's also an insular g at ᵹ 1D79, but that's not yet finalized by unicode and is only in one font that I'm aware of. --Vladisdead 01:17, 14 Sep 2004 (UTC)
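
For anyone wanting to check the code points quoted in this thread, here is a minimal Python sketch using the standard `unicodedata` module. The list of code points is taken from the discussion above (long s, wynn, yogh, ezh, thorn, the lb bar symbol, insular g, and the combining superscript letters U+0363 to U+036F); the helper name `describe` is made up for the example, and whether a glyph actually displays still depends on the fonts installed.

```python
import unicodedata

# Characters mentioned in this thread, by code point.
CODE_POINTS = [0x017F, 0x01BF, 0x021D, 0x0292, 0x00FE, 0x2114, 0x1D79]


def describe(code_point):
    """Format a code point as 'U+XXXX <char> <Unicode name>'."""
    char = chr(code_point)
    name = unicodedata.name(char, "<unnamed>")
    return "U+%04X %s %s" % (code_point, char, name)


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))

    # The medieval superscript letter diacritics, each shown on a base "y".
    for cp in range(0x0363, 0x0370):
        print(describe(cp), "y" + chr(cp))

    # Case mapping for yogh, thorn and wynn.
    print("\u021d".upper(), "\u00fe".upper(), "\u01bf".upper())
```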

So I took a shot at ʒere, ȝelew, and þe. Please let me know what you think. They're not very fancy, and I'm not at all sure that it's the "right" thing to do, but it's what works best for my current goal, which is to collect evidence of the use. The capitalization thing is pretty annoying, but at least unicode seems to understand how to capitalize the letters correctly. I'm not in a big hurry to start doing these, so I'll go back to armor & weapons for a while, and give people some time to think about it.
HippieTrail: Probably the KJV more than the OED. Johnson's 1755 dictionary maybe. The OED wasn't started until 1879 and it took them 5 years to get to "ant". As for which entry is the "main" one, we've got a bit of a pinch there. Except for maybe the translations, I'm not sure there is a "main" entry. The quotations should definitely be separate (in my mind), and the etymologies might be different (think bastardized foreign word and native English word mean the same thing).
For a while I started to define corslet = corselet, corcelet = corselet, corselet = armor for the body, blah, blah. But then I realized that corselet might pick up a new meaning down the road that was not appropriate for the other two spellings, and so I defined it in each entry. A solution would be to list alternate spellings under each sense, so that a reader can see which sense was meant. An alternate spellings above the definitions (equal with part-of-speech) would imply that all alternate spellings were valid for all senses of the word, for simplicity and backwards compatibility. Finally, regarding the "modern" spelling, I'm developing some personal experience with some of the words to be able to add notes indicating which words are preferred in modern usage. e.g. chain_mail, and target. But it's tough when the words are rare to begin with, and I've not found a good way to communicate it yet. -- CoryCohen 03:24, 14 Sep 2004 (UTC)
Sense-based alternative spellings have been run into before; see e.g. thorn. —Muke Tever 17:50, 14 Sep 2004 (UTC)
I think that having main headings under the modern spellings is a practical (if not perfect) solution. Minimal as our own skills may be as lexicographers, they are still substantially more than what is had by the general reader. On Wikipedia the complaint arises occasionally about the use of any accented characters at all, because people (from one unnamed country in particular) don't know how to type them. If this is going to be a problem with modern characters in current use it's going to be much worse with obsolete medieval glyphs. If people can't enter a glyph into a search box how can they look it up? It's conceivable that we could expand the Quick index on the Main page to allow for strange characters.
Backwards compatibility in the variant spellings section should not be a major concern. If something like "(14th century)" can be put after a relevant variant that should do the trick.
I do agree that the long ſ is essentially typographical, and was used until about 1800 in a manner similar to the continuing use of the two lower case sigmas in Greek. I think that we can safely ignore it in the general case, though there may be exceptional occasions when it should be treated differently.
Some of the Latin examples open up the whole new world of Latin paleography. I don't know how much we want to go there. :-) Before the invention of the printing press an extensive system of "abbreviations" was developed which helped to conserve parchment and monks' time. My copy of Maurice Prou's 1924 work Manuel de paléographie lists well over 5,000 such glyphs for Latin. (There is a similar corpus for Greek.) Many of these usages continued into the early period of book publishing, and were only phased out over the course of the 1500s. I don't know how much interest there will be in this. Eclecticology 17:10, 15 Sep 2004 (UTC)

By the way, for those who don't know what the OED does: As a dictionary of all ages of English, it lists words by their modern spellings (or at least as modern as the word gets; some die long before the modern era) and then lists all alternative spellings grouped by century. It exhibits no timidity around the unusual Latin characters (though IIRC foreign scripts besides Greek and maybe Hebrew are all transliterated, but it's been awhile since I've seen to check). It's entirely possible for us to merge all "Old English" and "Middle English" under "English" like that (as well with e.g. the periods of French, German, Greek, etc.) but for our purposes as a dictionary of all languages it may conflict; we would have to find ways to include time-relevant things like Old French declensions in a way that is not confusing to users who don't speak the language they're looking up. (I don't approve of the idea, but some suggestions seem to be slipping down this slope.) —Muke Tever 22:34, 15 Sep 2004 (UTC)

Wow, I still can't learn to read these inline wiki style edits. It's driving me nuts. I've started a whole 'nother page at User_talk:CoryCohen/Old_English_Letters, since I figured including my scanned images in the beer parlour would just be too brutal. Muke, please let me know what you think. I now think it's a typographical issue. I've got some more questions there, but I'll respond to a few specific things here:

Eclecticology: 5000 glyphs? Ouch. I've seen some printed transcriptions of abbreviated Latin that looked pretty complex, but my Latin isn't good enough to have a clue what I was really looking at. No worries about me going to that extreme. :-)

On searching for entries containing special characters, I just steal them from someplace else with cut-and-paste. A list of all the glyphs on the main page divided into categories would be a huge help I think. This raises a big outstanding question that I still have though: How do I enter arbitrary Unicode characters in Windows without charmap, if the character has no Alt code? It seems like there's probably some little trick I'm missing, and I just don't know what it is. :-(
If you know the Unicode hex, type &#x0000;, replacing "0000" with the hex number. You can let it stand like that (it'll display properly, and I believe links will automatically use the proper form) or you can use the preview, and then copy and paste it back in. —Muke Tever 17:40, 16 Sep 2004 (UTC)
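For anyone who would rather generate these references than type them by hand, here is a minimal sketch in Python of the `&#x....;` form described above. The function names `to_char_ref` and `from_char_ref` are hypothetical, made up for illustration; only the standard `html` module is assumed, and nothing here reflects how the wiki software itself handles the conversion.

```python
import html


def to_char_ref(ch: str) -> str:
    """Build the hexadecimal numeric character reference for one character."""
    # ord() gives the Unicode code point; format it as at least four hex digits.
    return f"&#x{ord(ch):04X};"


def from_char_ref(ref: str) -> str:
    """Decode a numeric character reference back into the character."""
    # html.unescape() understands both &#x....; (hex) and &#....; (decimal).
    return html.unescape(ref)


# Hypothetical usage with two characters from this discussion:
thorn_ref = to_char_ref("þ")          # thorn, U+00FE
long_s = from_char_ref("&#x017F;")    # long s, U+017F
print(thorn_ref, long_s)
```

Pasting the resulting reference into the edit box should then behave as described in the reply above, though I have not checked that against every browser.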
As for dating variant spellings, I'd love to do that, but I'll need more data, or help from someone more knowledgeable. That's the sort of thing I was trying to do in a vacuum (or relying on the OED) before coming to the Wiktionary. I've already used (Corrupt) in barb. bard seems to be the overwhelmingly more popular spelling in modern usage.

Muke: I'm not sure how thorn demonstrates multiple senses. Do you mean with the alternate spelling of: "þ (2)", indicating that it only applies to the second sense? If so, I find that connection weak at best, and likely to be broken in the same way translations have been historically. It'll also get complicated in cases of multiple senses and multiple spellings.

Yeah, that's what I meant by alternative spellings based by sense. It is a marginal case but does illustrate what I meant. For likeliness to break there is always the chance that someone will come along with a similar solution later. But then, as they say, if it ain't broke yet.... Any case, if alternate spellings multiply, a one-off spelling difference like "þ" could be added to the definition line itself, e.g. "(also spelled þ) A letter of the Latin alphabet..." —Muke Tever 17:40, 16 Sep 2004 (UTC)
This sort of ties into your comments on how the OED does it. I agree completely with your description, but I see one very big difference. The OED essentially has one "article" or "page" per sense -- not per primary spelling. They're perfectly comfortable with bill(1), bill(2), bill(3), and bill(4) as root level entities. And they seem to have some database capability so that I can search for any spelling and find the same document. The wiki technology seems to be pushing us down a path of having only one bill page. The page is then divided into languages (seems correct to me). But as I perceive that the OED does, I'm inclined to believe that the "sense" should be the next level, not the other various attributes like spellings, translations, etc. I mentioned this on another page at [3], but I think it got lost, since no one's responded.
It's not so much an article or page per sense, but an article or page per word (though different words can have the same spelling). Sometimes they split a noun that is spelled the same as its verb (or an adjective that is spelled the same as a participle), and sometimes they don't. I do agree that the separate words should be on their own level, and it is similar to how I am doing things on la: (see, e.g.,). —Muke Tever 17:40, 16 Sep 2004 (UTC)
As for Old English, Middle English, and just plain English, I take it you're in favor of the single language approach? I'm inclined to put everything under English. It is one language after all, and that will eliminate another big problem with deciding what's old versus middle versus modern, and what words are "assumed" to be in which languages. In all, I think trying to split them up is a horrible slippery slope. So why do I ask? Because I think that the addition of old and obscure words is not necessarily going to be a "Good Thing" (tm) for the translation oriented folks. I thought that maybe some distinction would help ease their pain.

HippieTrail: I went ahead and signed up for the Qalam list after all. It looks like I can learn a lot by reading through some of that discussion.

-- CoryCohen 04:33, 16 Sep 2004 (UTC)

The simplest advice is to keep it simple. :-) The separate senses do appear on a page as a part of a hierarchy of features of varying importance. Etymology comes higher because each distinct etymology carries a series of senses. The translations come later, and are separately listed under each sense. Variant spellings tend to come before because they can trace the spellings historically, but all leading to the word as we know it today. The different meanings are soft-numbered, meaning that the numbers could change as additional meanings are added. Although there is a recommended order for ordering the features about a word, it is important to maintain some flexibility. There will be many times when some variation of the order is encouraged. Odd UTF-8 characters can be entered by using their numerical code. Eclecticology 07:13, 16 Sep 2004 (UTC)