Wiktionary talk:Language considerations


This page is intended as a first port of call for questions about how languages are represented on Wiktionary and how internationalisation issues are resolved.

Am I right in assuming that one reason for having language namespaces is to create a translation dictionary (i.e. French-English, English-French, etc.)? If so, we need something more flexible than the language links on Wikipedia. As it is, they sometimes totter on the edge of uselessness simply because words vary in meaning. For example, my little pocket English-Spanish dictionary gives 44 different translations for set, to say nothing of "set about", "set to", etc. There is no single word you could link set to on es:Wikcionario that could possibly be of any value. You would need to be able to put [[es:colocar]], [[es:desarrollarse]], etc., as links within the text, as a bare minimum.

Perhaps we can use # for different meanings of the same word?

Also, a major part of a translating dictionary is idioms; I'm just worried that any attempt to feasibly set up a translation dictionary will just spin out of control. - Montrealais

Translation dictionary = translationary.

My original idea for translations was this:

Links to foreign-language pages.

en.wiktionary.org/wiki/Dog - is my definition page

then languages could be directed to:

en.wiktionary.org/wiki/Dog-(language code), e.g.

en.wiktionary.org/wiki/Dog-es. That page could explain whether it's a 100% definition or a near translation.

- fonzy

OK, I think if we just use English explanations of foreign terms, it's reasonable and desirable to have it all on the same site, i.e. this one. If we had a German explanation of an English or French term, we would want to have it on a separate, German site. So I think we should move quickly to get

  • en.wiktionary.com (Jimbo has the domain)
  • de.wiktionary.com

set up, possibly others. Then if you wanted to link to the German explanation of a word, you would use [[de:Hund]] etc.

I think it would be desirable to have only one language per page and to create disambiguation pages otherwise (I hate this manual mechanism; we need something better in the long term). It would also be useful to have some kind of flag (not a namespace, which would require changing link patterns) to specify which language a specific word is in. That way, we could create filters to show only changes to pages in language X, export only articles in language X, etc. If we just use (foo) in the articles to specify the language, this filtering gets harder, and it becomes possible to mix languages and easier to make mistakes. --Eloquence

So wait, we want the English site to contain every word in every language, defined in English?
I'd say we want the site to contain every word in every language, defined in its own language, plus links to equivalents in other languages with translation notes. --Brion VIBBER
I love this idea User:Mac, go ahead!

Brion, this URL is only temporary; it is hoped that it will be moved to (language code).wiktionary.org soon. - fonzy

So an entry for cat will contain something like:
Cat: a small carnivorous mammal, Felis catus, often kept as a pet. [[fr:Chat]] [[es:Gato]] [[it:Gatto]] [[de:Katze]] [[eo:Kato]] [[jp:Neko]] [[la:Felis]] etc., ad nauseam...?
To say nothing of words like set - my Canadian Oxford gives no fewer than 77 definitions of "set", divided among three different entries, not including idioms like "set about", "set X against Y", and so forth.
Furthermore, how do we deal with multi-word verbals in English, relative to other languages? Does the Spanish article pavonearse ("show off") link to "show" or "off"?
Phrasal verbs are effectively compound words and should have their own entries; I'd say you'd link show off. --Brion VIBBER
Or perhaps show#off
I'm coming to think that we may need separate projects for the translation dictionary and the language dictionary. - Montrealais
Montreal, I think the interwiki links to de, fr etc. are of limited interest, because the fr:Chat page would contain the definition of Chat, in French. Therefore we do not need as sophisticated disambiguation for these links. For the translations, we will instead do something like
(de) Katze, Kater ..
(fr) Chat ..
as links below the English definition. --Eloquence 15:40 Dec 12, 2002 (UTC)

For the moment I think we should just concentrate on the main English Wiktionary and not bother about translations yet. Yes, it can still be discussed, but I think the proposed templates need to be in order; otherwise we may end up in a mess, with lots of different styles of definition page. - fonzy

A Spanish Wiktionary to help the Spanish Wikipedia is necessary too.

By starting this project from the ground up, there are a lot of interesting possibilities. Hopefully people will choose a way that lets Wiktionary be as great as it can be. I do have a few observations at this point, and so early on they can't be much more than that.

  1. The language of the Wiktionary. As with Wikipedia, I think that a Wiktionary in every language is an ideal to be reached for. Wikipedia has not developed evenly in every language, and it would be hard to imagine Wiktionary developing in any other way.
  2. Interlanguage linking. This is where the greatest opportunity lies. From the perspective of the English Wiktionary the principal part of the dictionary would be the treatment of the term in English. All definitions and descriptions in the English dictionary would naturally be in English. We could also incorporate both the links to other language Wiktionaries, and what that word as a foreign word means in English.
Another very interesting possibility: users could move (redirect) off-topic Wikipedia definitions to on-topic Wiktionary definitions, and vice versa. A great Wikipedia article could begin as a very simple Wiktionary definition.
  3. Language codes. The ISO 639-1 codes for languages will likely continue to be appropriate for a considerable time yet. I really don't see any practical demand for Wiktionaries or Wikipedias in the more obscure languages of this world, stated goals notwithstanding. Dealing with the meaning of words and expressions from other languages is quite another matter. We could start with the three letter ISO 639-2 codes or even SIL codes as abbreviations or disambiguators, but I can't be positive if even these will be adequate.

"See en:Wikipedia:Complete_list_of_language_wikis_available or ISO 639 Language Codes to see what languages the abbreviations stand for." A message like this and/or an excerpt of the list should be in a more prominent place. Patrick 11:35 Dec 13, 2002 (UTC)

  4. A Translating Wiktionary. Translations based on reading dictionaries are very obvious to native speakers of the receiving language, and are often hilarious. I just had an experience of this sort to-day when I presented my wife with a new teapot and cups for our anniversary. As required by Canadian law an inscription on the bottom of the cup had to be in both English and French. In English it read, "Fine porcelain, dishwasher and microwave safe." In French it read, "Porcelaine fine, sauf en lave-vaisselle et micro-ondes." Although "sauf" can mean "safe" in French, the way it was used it meant "except". Caution is urged.
PS Was it intentional that we should use the British pronunciation for "Wiktionary"? That's what the IPA in the logo suggests. :-) Eclecticology 07:45 Dec 13, 2002 (UTC)
I didn't think both would fit in the logo, and the British style amuses me more. ;) That, and I'm not 100% confident of my IPA-ish rendering of the American pronunciation; the only dictionaries I have at hand that use IPA for English are bilingual dictionaries which give only RP pronunciations (sometimes plus rhotic). --Brion VIBBER 09:21 Dec 13, 2002 (UTC)

About the name Wiktionary

The name Wikipedia can be used more or less internationally. Maybe people will ask you to say it again when you talk about it, but it is not too difficult a word. Wiktionary is a difficult word. Maybe not for English speakers, but it is not very good for international use, I think. If people do not understand the name, they will not find the website. 12:27 Dec 13, 2002 (UTC)

Can you think of a better name for Wiktionary, then? - fonzy

Not really. But that does not change the fact that Wiktionary is not a good name. Wikidictionary would be better, but still not good. The name should be more or less language-neutral and easy to understand. Like "votaro". 13:03 Dec 13, 2002 (UTC)
"votaro" isn't easy to understand; I can't understand it at all. Also, "dictionary" (and by association Wiktionary) translates OK to Spanish and French: "diccionario", "dictionnaire" / "wikcionario", "wiktionnaire"... What exactly is the problem? -- Merphant
Calquing the portmanteau only works in languages where the word for "dictionary" is very similar to "dictionary". For Esperanto and German, "Vikivortaro" or "Wikiwörterbuch" may not be half bad as straight compounds; though less delightfully whimsical, you at least get some fun alliteration going. --Brion VIBBER
I suggest WIKIONARY from en:wiki and dictionary
I agree with that, because wikionary conserves wiki and is easier to spell (as the remaining part is onary). 07:07, 18 March 2007 (UTC)
  • I suggest wikidict as an international name. It doesn't sound as good as wiktionary, but it's definitely more international.
  • But couldn't there simply be more names? For example something like Wikiwoordenboek for Dutch, or Wikislovar for Slavic languages. Then, if wanted, wikiwoordenboek.org could be bought and be an alias to nl.wiktionary.org, and en.wikiwoordenboek.org could be an alias to wiktionary.org. Guaka 23:45, 29 Aug 2003 (UTC)

On including non-English words: are we only including non-English words which are used in the English language, or all non-English words? --Imran 21:19 Dec 16, 2002 (UTC)

Any known word in the whole world (in any language) can be added; it's just the definition, usage and information about it that are in English. - fonzy

If a word is not used by English speakers, it's unlikely that an English speaker would want to look it up. Perhaps we should concentrate on words that are used by English speakers or that appear in well-known phrases and statements (e.g. "Ich bin ein Berliner"). --Imran 22:25 Dec 16, 2002 (UTC)

Say I was looking around and saw a Japanese word in Japanese characters; then I could cut and paste it into Wiktionary and **poof**, I can see what it means. -fonzy

Also, I think it would defeat the point of Wiktionary; someone who is learning Chinese may find it helpful to have translations and a description of the usage, etc.

Perhaps, this is a wikipedia task.

Has anyone thought of creating verb tables? French Verb Table, Spanish Verb Table, etc.

Is there a move planned to fr.wiktionary.org for the French version ? I have a lot of work for my school, but I'd like to help a bit once a week (Saturday). Thanks Brion for the move to wiktionary.org :-)-- Youssefsan

I'm hoping to use Wiktionary as a testbed for the style I proposed months ago at m:Thoughts on language integration; so English stuff would be moved to http://wiktionary.org/en/Foo and French to http://wiktionary.org/fr/Foo, etc. (And a single login for Wiktionary in all languages.) Unless there are any objections? I'll try to get this set up in the next few days. --Brion

I have no real objections, but I would prefer Wiktionary to have the same format as Wikipedia, as I like things to be consistent. - fonzy

Great! Since my evil plan is to convert Wikipedia to that format once the software's set up to support it, you shall have your wish. ;) --Brion 21:40 Dec 26, 2002 (UTC)

IMO language links should work the same here as they do in Wikipedia. In Wiktionary the subjects of articles are words. These words can be any word in any language. Therefore all language Wiktionaries will eventually have an article on the English word dog. This will be spelt the same way and look the same in all language Wiktionaries, since the subject is the English word dog. Therefore interlanguage links should be of the magical variety, but they will have very predictable forms on the dog page such as [[es:dog]], [[fr:dog]], [[de:dog]], [[zh:dog]] (unless there are ambiguity issues with words in other languages that are also spelt "dog"). Links outside of Wiktionary need to be a bit more complicated and also be inline with the text. For example, the link to the article named dog on the English Wikipedia should be something like [[en.wikipedia:dog]] (and the reverse should be [[en.wiktionary:dog]]). We needn't make interlanguage linking here more difficult just because Wikipedia also has interlanguage linking. --Maveric149

I agree. Not sure about the particular syntax for cross-wiki language linking (I'd tend to think of [[wikipedia:en:dog]]), but that's the gist. --Brion
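To make the syntax question concrete, here is a minimal Python sketch of how a prefixed link in the form Brion mentions could be split into project, language and word. The function name, the defaults and the rule that a two-part link means "same project, other language" are assumptions made for illustration; none of this is how the wiki software actually parses links.

```python
import re

# Matches a simple [[...]] link with no nested brackets and no pipe.
LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")

def parse_link(link, default_project="wiktionary", default_lang="en"):
    """Split a link into (project, language, word).

    Assumed conventions (hypothetical, for illustration only):
      [[dog]]              -> same project, same language
      [[es:dog]]           -> same project, language "es"
      [[wikipedia:en:dog]] -> project "wikipedia", language "en"
    """
    match = LINK.fullmatch(link)
    if match is None:
        raise ValueError(f"not a simple wiki link: {link!r}")
    parts = match.group(1).split(":")
    if len(parts) == 1:
        return (default_project, default_lang, parts[0])
    if len(parts) == 2:
        return (default_project, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"too many prefixes in link: {link!r}")

print(parse_link("[[es:dog]]"))
print(parse_link("[[wikipedia:en:dog]]"))
```

A headword that itself contains a colon would break this naive split, which is one reason the exact syntax needs more thought.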
Have there been any moves towards other language wiktionaries by now? I would really like to see them or at least know in what stage they are. --Imperator 11:47 Feb 1, 2003 (UTC)

Although the differences between AME and BRE spelling are being discussed, the differences between AME and BRE pronunciation of the same word have only been mentioned, and differences between the AME and BRE meanings of the same word have not been discussed. For example, in pronunciation, BRE vowels are not pronounced the same as AME vowels in the same word - cf. BRE "lot" and AME "lot." And, as an example in meaning, BRE "lift" means "elevator," but AME "lift" does not. RoseParks

"British" and "American" are themselves very large pots in which there's a great variety of regional variation. Significant variations should, of course, be noted where they exist. --Brion

What to do about interlanguage disambiguation?

Copied from Talk:Lente

This article raises some interesting questions about how we define our borders between the Wiktionaries in different languages.

"Lente" is not used in contemporary English. (I believe it was used in older forms of English, but I don't think it would be helpful to go there at this time.)

We do need to maintain some perspective on the fact that this is the English Wiktionary. If, for example, we start including the Spanish translation of a Dutch word, the articles could quickly get out of hand when you contemplate the number of possible language pairs. If you admit L languages, then the number of language pairs will be L*(L-1)/2; a mere 10 languages implies 45 possible language pairs, or 90 if you consider that translations go in two directions.
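The arithmetic above can be checked with a few lines of Python. This is only a sketch of the L*(L-1)/2 formula quoted in the comment; the function name is made up for illustration.

```python
def language_pairs(n_languages):
    """Return (unordered pairs, translation directions) for n_languages languages."""
    unordered = n_languages * (n_languages - 1) // 2  # L*(L-1)/2
    directed = n_languages * (n_languages - 1)        # each pair in both directions
    return unordered, directed

for n in (2, 10, 50):
    unordered, directed = language_pairs(n)
    print(f"{n} languages: {unordered} pairs, {directed} translation directions")
```

For 10 languages this gives the 45 pairs and 90 directions mentioned above; the count grows roughly as the square of the number of languages.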

I would propose then that the translation section be limited to foreign translations of English words. We already have the English translation of foreign words in an earlier part of the article. The Spanish/Dutch pairs would properly appear on the Spanish and Dutch Wiktionaries. Eclecticology 21:15 Dec 19, 2002 (UTC)

Shouldn't this be a disambiguation page? The subjects of Wiktionary articles are all words in all languages so just because words in different languages share the same spelling doesn't mean that they are the same. They are different words and therefore need their own pages. IMO the syntax should be Lente (es) with Lente having a list of all the words. --Maveric149

Eventually this could be a possibility. In this still young project, however, it seems more important to preserve the big picture. Wiktionarians have an ambitious project here, but a lot of the details still need to be worked out. We are still only discovering the problems. Eclecticology 02:03 Dec 29, 2002 (UTC)
I guess using hr rule lines will work just as well so long as the other language Wiktionaries do the same. Otherwise magic interlanguage linking will be a total mess. --mav
The lines do seem to be doing a nice job. The interlanguage links may need to be more tightly controlled than in Wikipedia for the very reason that you mention. Eclecticology
I can foresee that some articles may become very long, since the ==Translations== list for each word will eventually get really long. We may need to have intra-article jump-to links with a TOC at the start of the article. Another good idea might be either to reduce the amount of vertical space the translation lists take up or to devise a neat way to have "translations of..." companion articles. --mav
I've been tolerant of these translation lists because the project is still young. Fundamentally, I feel they don't belong here. Remember that this is the English Wiktionary: a translation list in the English Wiktionary should only exist for English words. The English reader rarely wants to know what a word in a language that he doesn't understand means in another language that he doesn't understand. Similarly, the Dutch Wiktionary should only include a translation list for Dutch words, and so on. When you drop the translation lists, the article becomes manageable again.
The only interesting thing is to know the translation of an English word into other languages (Spanish, French and so on), or the translation of a Spanish word into English, French and so on.
Intra-article jumps can be useful in any long article. Reducing vertical space may get us into some kind of tables, and that's a whole other can of worms. Companion articles are a possibility if things get too long. Would that re-open the "sub-page" debate that there was on Wikipedia some time back? I look forward to seeing Wiktionaries in a few more languages, and the opportunity to work out some of these inter-Wiktionary links. Eclecticology 09:45 Dec 31, 2002 (UTC)

I think there should be separate namespaces for each translating dictionary, or maybe sub-namespaces. So en:dog would be an entry in English about the English usage, and es-en:perro (or maybe es:en:perro or en:es:perro) would be an entry in English about the Spanish usage. For an article in the en: namespace, en:dog would of course be equivalent to dog. -- Merphant

This would be a wikitranslationary ;D

About the language considerations, I've got an idea. I'm sure it is NOT hard to code, but it would take the source far away from Wikipedia's, and I hope that you're trying to keep one source for both applications.

The idea is this: store (in the database) the pages for the same word used in several languages (e.g. Chat in English and Chat in French) as different entries. As an example I will use Brion's notation, which I like too, but the current Wikipedia notation is also fine...

http://wiktionary.org/en/Chat will contain both definitions, the one for the English "chat", and the one for the French "chat".

http://wiktionary.org/en/fr/Chat will only give the definition of the French word in English, while http://wiktionary.org/fr/en/Chat will give you the definition of the English word in French.

If you consider that there is only one database for all languages, you can also imagine browsing http://wiktionary.org/wiki/fr/Chat to get all the definitions of the French word in all languages. http://wiktionary.org/wiki/Chat will let you read all the definitions of all the words in all languages. Of course, if "wiki" can't be used (for reasons of compatibility with the current Wiktionary), you can use "all" or "_" or anything else; the details aren't important, and I care more about the idea than the details.

The main advantage of having different pages (at least in the database, even if you browse all of them at once) for the same word in different languages is that you'll be able to extract ONE dictionary of all English words in English, then one dictionary of all French words in English, etc.

If you start mixing up the different pages in wiktionary.org/en/Chat, you won't be able to do that. -- AGiss from french Wikipedia
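A minimal Python sketch of this storage idea follows. It assumes one record per (language of the definition, language of the word, headword); the sample entries, the dictionary layout and the function names are invented for illustration and are not part of the proposal itself.

```python
# Hypothetical storage: one record per
# (language of the definition, language of the word, headword).
ENTRIES = {
    ("en", "en", "chat"): "An informal conversation.",
    ("en", "fr", "chat"): "French noun: a cat.",
    ("fr", "en", "chat"): "Nom anglais : une conversation informelle.",
    ("fr", "fr", "chat"): "Petit mammifère carnivore domestique.",
}

def lookup(definition_lang=None, word_lang=None, word=None):
    """Return all entries matching the given parts; None acts as a wildcard."""
    wanted = (definition_lang, word_lang, word)
    return {
        key: text
        for key, text in ENTRIES.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }

def extract_dictionary(definition_lang, word_lang):
    """Extract one dictionary: words of word_lang defined in definition_lang."""
    return {
        word: text
        for (d_lang, w_lang, word), text in ENTRIES.items()
        if d_lang == definition_lang and w_lang == word_lang
    }

# /en/Chat      -> both words spelt "chat", defined in English
print(lookup("en", None, "chat"))
# /en/fr/Chat   -> only the French word, defined in English
print(lookup("en", "fr", "chat"))
# /wiki/fr/Chat -> the French word, defined in every language
print(lookup(None, "fr", "chat"))
# One dictionary of all French words explained in English
print(extract_dictionary("en", "fr"))
```

Because each record carries both language codes, the combined pages are just filtered views, and the per-language dictionaries can still be extracted separately.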

I am not sure I understand why one would want the definition of an English word in Spanish or French (or the definition of a French or Spanish word in English). Either the definition in English is enough, or the translation into French or Spanish is enough. A French/French (or Spanish/Spanish) dictionary contains the definition of the word in French (or Spanish), if the end-user wants to have a definition in his/her language.

Also, why not consider namespaces for pure translation dictionaries? For instance, Spanish/English and English/Spanish. I am sure this has been proposed already, but I would like to know why there is not a clear distinction between definition dictionaries and translation dictionaries.

Moreover, a translation dictionary is more than a one-to-one relationship between words. For each entry, we have usually several translations, and the example of 'en:cat' translated into 'fr:chat' and 'es:gato' simplifies the problem too much. How would you translate 'en:set'?

By having separate functionality dictionaries (definition and translation), we can link, if we want, the definition of 'en:cat' (in the English Definition Dictionary) to its translation into french '->fr' (English/French Translation Dictionary) or spanish '->es' (English/Spanish Translation Dictionary). This sounds more dynamic to me than the current state of this page, where Wikipedians may duplicate information.

I am new to Wikipedia, and did not find dictionary-related threads in the Intlwiki-l mailing list archives. Sorry if these ideas have already been proposed and discussed; if so, please point me to the appropriate archives. Nicolas Delahaye 09:03 Feb 5, 2003 (UTC)

Thanks for your interest. The first thing to keep in mind is that Wiktionary has only been happening since December, and a lot of the matters which you raise still need to be thought through and developed. Some of these matters cannot be adequately treated until some of the other language Wiktionaries are functioning.
I don't support the idea of separate namespaces for translation dictionaries. To me that would seem to fragment the concepts far too much, and the number of translating dictionaries would vary as the square of the number of languages involved. Most translation dictionaries are very weak, and I seriously doubt that they can ever resolve those difficulties. Often they present a series of possible translations without considering the subtleties that separate these possibilities, with the result that a native speaker can sense from the weird results when something has been translated by using a dictionary. Machine translations look even weirder.
My own vision of the Wiktionaries is that each of them is based in issues of importance to speakers of one language. This includes how a word could be interpreted in other languages, and how foreign words might be interpreted in the base language. On the other hand translation of words between two foreign languages is of no general interest to speakers of the base language. In the rare instance where it is, that reader should be familiar enough with one of those two languages to be able to use that Wiktionary.
Notwithstanding what I have said, and at least in the short term while only the English Wiktionary exists, I find it more constructive to approach these issues with a spirit of tolerance. There's plenty that can be done without getting into arguments over these issues. For example I've been aiming for a consistent treatment of article formats; this often involves re-organizing articles in a way that respects the contributed content.
Finally, I find nothing strange about the lack of mention of Wiktionary in the mailing list. This may reflect nothing more than the youth of the project, and the relatively small number of people involved. It is still small enough that most discussions can still be had on the various talk pages. Eclecticology 19:30 Feb 5, 2003 (UTC)
Thanks Eclecticology for your reply. I said namespaces, but meant namespaces or subdomains, sorry for the confusion. My point was that I would prefer seeing Wiktionary to be fragmented in subdomains, two for each language pair - for translation dictionaries - and one for each language - for definition dictionaries. The example of the word 'gear', translated in French and Spanish, and also with its definition in English, will explain better what I think (and I am not the only one, some wikipedians in this forum have already expressed similar ideas).
With a similar structure for Wiktionary, the current dictionary of English words explained in English would be located at /en/en, the translation dictionary English/French at /en/fr, the translation dictionary French/English at /fr/en, and so on. We would need some software changes to generate the interlanguage links at the top of the page (-> /en/en/gear for English, -> /en/es/gear for Spanish, etc...), and come up with a prefix link for all links of the other direction of the pair dictionaries (like [en:fr:'my_word']).
What do you all think about this structure? The current one, with the word explained in English and its translation in an undetermined number of languages, seems to me to have two main disadvantages:
  1. restriction, because it only gives one translation for each word
  2. redundancy, because the list of the links to other languages will be almost identical for the entry of 'apple' and 'pomme', when people will enter the definition in French of the word 'pomme' (translation of 'apple').
This is my vision of Wiktionary, and I know we all have different ones, and don't want to offend anyone here, just submit an idea :-). I duplicated all the pages from my 'test' directory to an 'edit' directory. Feel free to edit the 'edit' directory, and please do not change the 'test' one. Thanks.
Nicolas 10:01 Mar 3, 2003 (UTC)
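As a rough illustration of the /source/target layout described in this comment, the following Python sketch shows how software could generate the bar of links for a word. The list of languages and the function names are assumptions for the example; the real software changes were not specified.

```python
# Assumed set of languages, for illustration only.
LANGUAGES = ["en", "fr", "es"]

def page_path(source_lang, target_lang, word):
    """Path of a page: /en/en/gear is a definition page,
    /en/fr/gear an English -> French translation page."""
    return f"/{source_lang}/{target_lang}/{word}"

def language_bar(source_lang, word):
    """Links from any page about `word` to each target-language version."""
    return {target: page_path(source_lang, target, word) for target in LANGUAGES}

for target, path in language_bar("en", "gear").items():
    kind = "definition" if target == "en" else "translation"
    print(f"{target}: {path} ({kind})")
```

Since the paths follow directly from the language codes and the headword, contributors would not have to maintain these links by hand.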

The structure you have made is a bit too complicated and untidy. -fonzy

I too have been thinking about the structure of Wiktionary and of the problems concerning definitions-translations.

  1. Like Nicolas, I'm wondering about the value of having non-English words defined in English in the English Wiktionary. At this point I'm not for or against it, I'm just asking myself if the subtleties Eclecticology mentioned can really be made clear in this way.
  2. Having translations in the same article as definitions seems, to me at least, impractical. Where would they go?
    • Grouped per article doesn't solve the problem in many cases because a given word can act as different parts of speech and can have different definitions. Each part of speech/definition can have a different translation or not, depending on the language it's translated in.
    • Grouped per part of speech doesn't do it either, because there can still be differences in the translations for different definitions.
    • Grouped per definition would be the best solution if it wasn't for the long lists that would be created.
      In short, I can't think of a solution to solve that problem.
  3. If the value of definitions mentioned in (1) is minimal, and given the difficulties mentioned in (2), I tend to prefer a separation between definition dictionaries and translation dictionaries.
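To show what "grouped per definition" would mean as a data structure, here is a small Python sketch. The class, the two sample senses of "gear" and their French translations are illustrative assumptions, not taken from any existing article.

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    """One definition of a headword, with its own translations per language."""
    part_of_speech: str
    definition: str
    translations: dict[str, list[str]] = field(default_factory=dict)

# Illustrative data only: two noun senses of "gear".
gear = [
    Sense("noun", "a toothed wheel that meshes with another",
          {"fr": ["roue dentée", "pignon"]}),
    Sense("noun", "equipment for a particular activity",
          {"fr": ["équipement"]}),
]

def translations_for(senses, lang):
    """List (definition, translations) pairs for one target language."""
    return [(s.definition, s.translations.get(lang, [])) for s in senses]

for definition, words in translations_for(gear, "fr"):
    print(f"{definition}: {', '.join(words) or '(none yet)'}")
```

The structure keeps each translation attached to the sense it belongs to, but every sense then carries one list per language, which is exactly the length problem raised above.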

I have had a look at Nicolas' test pages and I find them very interesting. Yes, they are a little bit more complicated than what we're doing now, but I don't agree with Fonzy's remark that they are untidy (I wonder why because no explanations are given). I think the structure is quite orderly and I feel comfortable using it. It might even be taken a step further: a connection between the different language Wiktionaries. I'll try to explain it with an example.

On User:Nicolas Delahaye/test/en/en/gear the definitions of gear are given. The page has links to the translating pages for that word, e.g. User:Nicolas Delahaye/test/en/fr/gear (English -> French). When a translation is clicked (e.g. appareil) the page which gives the translations of that word in English is shown. Apart from the links to the two previous pages it could also have a link to the page where the French definitions of appareil are given, i.e. to the French language Wiktionary.

These are my views for the time being (no doubt I will have to adapt them :-) D.D. 21:29 Mar 3, 2003 (UTC)

By untidy I don't mean untidy-looking; I mean untidy in an organizational sense. Do you understand :-s ? Also, knowing newbies, they won't link everything up properly, which just makes more work. -fonzy

Oh, just a point in case you don't know: Brion's idea of one database for Wikipedia and Wiktionary would mean that the URLs would look like: http://wikipedia.org/{language code}/(I don't know if "wiki" will be put here)/article. -fonzy

I can't say that I completely understand Nicolas's concept, but like fonzy I fear that it has the potential to become very complicated. That can scare people away. "Gear" is actually a very good example for illustrating these complications. In addition to the suggested translations which Nicolas has given us there is also pignon or roue dentée. The French pignon, however, can have three separate etymological origins, and can also be translated back into English as the gable of a house or as a pinecone.
Where I'm coming from is from a basic mistrust of translation dictionaries in general. My attitude toward thesauri and lists of synonyms is not much different. These can be great tools, but in the wrong hands they can make one's writing look completely idiotic.
My first principle is that each Wiktionary should be solidly based in its own language. The translating dictionary aspects can only provide skeletal suggestions about what to do in the other language. Those suggestions should in turn have links to the other Wiktionary. A person who is looking for the French translation of "gear" can then view the options in considerably more depth.
The approach that I have followed in organizing individual articles in English has been based on a forking structure that is ultimately focused on the definitions. The concepts that divide to different definitions come before the definition, and those that derive from a definition come after.
What I would really like to see happening with other languages is for somebody to start the separate Wiktionaries for the other languages. Once that happens it will be far easier to develop the interlinking that may be required. Eclecticology 23:54 Mar 3, 2003 (UTC)
I agree with Eclecticology User:Mac
I agree very much with D.D. All I am suggesting is to separate the translation aspect from the definition aspect, because the way it is currently done is too restrictive for me. Either you want a translation, which could be quite complicated in some cases and not just one unique corresponding word, or you don't put any. I don't see the added value of putting a partial translation, but maybe you have other arguments.
On the 'untidy' aspect, if you read me correctly, the software would do the work for you in terms of links. The central 'bar' of links to other languages would be generated by the software, so users would not have to deal with that (if you are in the en->fr version of gear, then if you click 'spanish', you go to the en->es version of gear). As for linking back 'appareil' to its translation into English, this link, I imagine, could also be generated by the software (again, if we modify it). Thanks D.D. for bringing up the case of 'appareil': I updated the example to show you can have the fr->en version of 'appareil', which is the translation back into English, and which also gives the fr->fr version of 'appareil', which is the definition in French of the French word 'appareil'.
Now, the en->en version of any word is basically the current Wiktionary, i.e. the definition of english words in english: if you are in the en->fr version of 'gear' and click on 'english' you end up on the en->en version of 'gear', which is the definition in english of the word.
With this separation of translation and definition dictionaries, one does not have to worry about the translations of the word he/she is entering, only its definition. The central 'bar' will do the trick. That works similarly to Wikipedia, where you can click on any language to get the corresponding article in another language. If the link is red, or non-existent, this means a translation has not been added so far. And that way, we can all work cooperatively and concurrently. I am more interested in translation dictionaries, and some people here are more interested in definition dictionaries. Let's join our efforts. --Nicolas 07:34 Mar 5, 2003 (UTC)

Although I have no strong interest in Nicolas's proposal, and a lot of uncertainty about where it's going, I'm more than willing to give it a chance to succeed. There's already enough to do in our respective areas of interest to keep an army busy for a long time.

With the proliferation of these indexes though, we should find some consistent way of naming them, or many of them will just get lost. Something that begins with [Index:...] or [Wiktionary index:...] or anything else of the sort would be fine with me. The one thing that I oppose here is having them titled as if they were just another article about a word. Eclecticology 08:37 Mar 5, 2003 (UTC)

I do not support the proposal either but will allow Nicolas to create about 12 entries using his format just to see how it goes. - fonzy

Just a quick comment. Because I agree with Nicolas's proposed structure I'm offering to help out with a "mini example project". Nicolas, I'll let you decide which articles you want to create (starting with English language headwords seems like the logical thing to do, since it's the only existing Wiktionary at present). If you want, I can create the "Dutch extension" (Nederlands in Dutch) of the project. May I also propose a central page to develop the different articles from: Wiktionary Project:Interlanguage linking. Feel free to change it if you like.
I do have a few other comments on what's been said above, but I don't have the time to write them right now. That's something for tonight or tomorrow. D.D. 14:19 Mar 5, 2003 (UTC)
My comments:
Eclecticology writes:
Most translation dictionaries are very weak ... Often they present a series of possible translations without considering the subtleties that separate these possibilities with the result that a native speaker can sense from the weird results when something has been translated by using a dictionary.
Where I'm coming from is from a basic mistrust of translation dictionaries in general. My attitude toward thesauri and lists of synonyms is not much different. These can be great tools, but in the wrong hands they can make one's writing look completely idiotic.
A translation dictionary is a tool like any other dictionary. There are good ones, there are bad ones, and they have their purposes and limitations. Just like there are hammers of good quality and bad quality, and a given hammer cannot do the job of another kind of hammer just like that. It's not because there are hammers of bad quality that the tool "hammer" can't be trusted. A tool still needs a user, and that user needs judgment. I don't think the user of a translation dictionary can faultlessly translate a text into another language, if he hasn't been immersed in that language and the culture it is part of. But that's not the fault of the dictionary. What I'm trying to say is that a dictionary presents information. The user should use his judgment on how he is going to use that information. His view must be "broader" than just the dictionary. And even if most translation dictionaries are weak, who says we can't make one that is not?
Eclecticology writes:
I can't say that I completely understand Nicholas's concept, but like fonzy I fear that it has the potential to become very complicated. That can scare people away.
It can and it will become complicated and scare people away if every link has to be made manually. But that is not what Nicolas (and I) would like to see.
Nicolas writes:
We would need some software changes to generate the interlanguage links at the top of the page (-> /en/en/gear for English, -> /en/es/gear for Spanish, etc...), and come up with a prefix link for all links of the other direction of the pair dictionaries (like [en:fr:'my_word']).
If these software changes cannot be made for whatever reason, then we'll have to devise another way of dealing with the problem.
I am eager to start helping with this proposal to see where it will lead us. D.D. 22:00 Mar 5, 2003 (UTC)
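As a rough illustration of the link generation discussed above, here is a minimal Python sketch. It assumes the path layout from Nicolas's examples (/en/en/gear, /en/es/gear); the function names language_bar and reverse_link are hypothetical and are not part of the wiki software.

```python
def language_bar(source_lang, word, target_langs):
    """Build the (target language, path) pairs for the bar of one headword.

    Assumes the /<source>/<target>/<word> layout used in the examples above.
    """
    return [(target, f"/{source_lang}/{target}/{word}") for target in target_langs]


def reverse_link(source_lang, target_lang, translated_word):
    """Path of the entry in the opposite direction of a dictionary pair,
    e.g. the fr->en page for a French translation found on an en->fr page."""
    return f"/{target_lang}/{source_lang}/{translated_word}"


# Bar for the English headword 'gear' with three target languages.
bar = language_bar("en", "gear", ["en", "fr", "es"])

# Link back from the French translation 'appareil' to its fr->en page.
back = reverse_link("en", "fr", "appareil")
```

Because both kinds of link are computed from the language pair and the word alone, contributors would never have to type them by hand, which is the point of the proposal.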

Having links to all possible translations in all possible languages creates a mess. Users should be able to define what they are looking for, and get only that. If I am looking for a Polish translation of an English word, I am not interested in Quechua or French translations.

Another thing is the free-form nature of the entries. It is OK for an encyclopaedia, but not for a dictionary. People will enter a lot of very useful data here: grammatical information, conjugation tables, usage examples, possible translations, comments on usage etc. If you do not provide a way to programmatically parse this data for later export, it is going to be lost work and a lost opportunity for this project. A dictionary is a database. Which do you think is better: a list of free-form unparseable notes on words, or a well-structured database that can be easily exported to any other format or database?

I would personally use XML as a way to structure the entries and export this dictionary. Yesterday I edited this article and described how I would personally try to do this. I deleted it, because it would require rewriting the Wiki software for the dictionary, and after I saw how far along you already are I feel this idea will not be accepted.


Can't you do that with RE? Although Wiktionary uses a free format, it is not that much of a mess. Most of the articles created obey some common rules. We don't use a strict format because Wiktionary should be easy to edit; otherwise many newcomers will find it uninteresting to contribute. Another reason is that we have not yet grasped all the possibilities of all the languages.
We are in the trial-and-error stage now. Adding new articles or editing existing articles is a required step to find these possibilities. Petruk 16:11 Aug 12, 2003 (UTC)

I do not know what RE is - I learned about Wiki yesterday. I will have to learn :-)

Although Wiktionary uses a free format, it is not that much of a mess.

Well, it is. The entry is in free format. It cannot be processed by a program. It cannot be exported in an importable form. The data cannot be used elsewhere, which means almost all the work your users put into creating the dictionary is lost. You can only browse that dictionary on the net, nothing more.

Most of the articles created obey some common rules.

So they look similar. Only that. You can not, for example, write a program that will create a list of irregular verbs from the dictionary data, can you?
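To make the irregular-verb question concrete, here is a minimal Python sketch of what such a program could look like if entries were stored as structured data. Everything about the data is an assumption made for illustration: the Dictionary and Entry elements and the word, pos and past attributes are hypothetical, and the "regular" rule is deliberately naive.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured data; none of these names come from Wiktionary.
DATA = """<Dictionary>
  <Entry word="go" pos="verb" past="went"/>
  <Entry word="walk" pos="verb" past="walked"/>
  <Entry word="set" pos="verb" past="set"/>
  <Entry word="dog" pos="noun"/>
</Dictionary>"""


def irregular_verbs(xml_text):
    """List verbs whose recorded past tense differs from the naive -ed form.

    The rule ignores consonant doubling and y -> ied, so it is only a sketch.
    """
    root = ET.fromstring(xml_text)
    found = []
    for entry in root.iter("Entry"):
        if entry.get("pos") != "verb":
            continue
        word, past = entry.get("word"), entry.get("past")
        regular = word + ("d" if word.endswith("e") else "ed")
        if past != regular:
            found.append(word)
    return found


verbs = irregular_verbs(DATA)
```

The same query against free-form wiki text would first have to guess where the part of speech and the past tense are written in each article.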

We don't use a strict format because Wiktionary should be easy to edit; otherwise many newcomers will find it uninteresting to contribute.

I believe many more people are familiar with XML rules (they are extremely easy) than with Wiki formatting. If they do not know the rules, they will be more interested in learning a little XML than your common rules. Please, believe me, using XML will not make it much more difficult. You can use forms to accept the data. It can be made much easier than it is now. And let me say that I do love the idea of this project. I would like to contribute entries here, write scripts to easily check words from this dictionary etc. But I prefer to use my own local database, because in my opinion submitting to such a free form is work in vain. You cannot reuse this data in any way. It is better to have less data of good quality than a lot of mess. If people see that you do good work, they will learn what they need in order to contribute. Especially since you can in fact provide forms (or even special, dedicated programs) so they do not have to learn anything.

Another reason is that we have not yet grasped all the possibilities of all the languages.

This Wiki is written in PHP, isn't it? Working with XML in PHP is easy. You need one evening to learn everything you need to make it work here: you need to know how to check input against a DTD and how to generate HTML from XML. I think parsing XML would be a lot easier than parsing Wiki markup.

We are in the trial-and-error stage now. Adding new articles or editing existing articles is a required step to find these possibilities.

The problem is that once you have a significant amount of unparseable data, introducing a good change would require throwing all this data away and starting from scratch. Well-structured, parseable data can be converted to a new format. Your current format can't. In my opinion you are going in the wrong direction. :-(


I do not know what RE is

Did you mean regular expressions? If you did mean regular expressions and not some magic trick, my answer is: no, you can not do it using regular expressions.

XML would be MUCH, MUCH, MUCH, MUCH better: easier to parse, rejecting ill-formed input, never making mistakes during the parsing, easy to transform. You cannot have all this with regular expressions.


Hi David,

I don't know what RE is either, short of regular expressions, which are not my forte. I do think that our wiki format should be convertible to XML though (with some manual intervention, but still a whole lot less work than starting all over). You raise some interesting points. I am also thinking that it would be more interesting to work with XML for the Wiktionary articles, and I am convinced that if Wiktionary doesn't adopt it, another project will do it, eventually.
Did you have a look at Reta vortaro? They do use XML. I would like to chat with you a bit about XML and its possibilities. I'm trying to have it rendered in Mozilla using XSL. Would you mind if I ask you a few questions? For two years I have also been designing a database schema to store words, their definitions and their translations. I can send you the schema for peer review if you want. My email address: rainbow at linuxfocus(no SPAM).org Polyglot 19:47 Aug 12, 2003 (UTC)

Hi Polyglot,

thank you for considering what I said.

I am convinced that if Wiktionary doesn't adopt it, another project will do it

I found your project just because I was looking for some Wiki to start my own project exactly like yours. But because you already started it, I will not do it.

I believe XML would be best for the markup of entries. You could allow some subset of xhtml in places where it would be appropriate.

Did you have a look at Reta vortaro?

No, I hear the name for the first time. I will try to find this project and have a look :-)

I would like to chat with you a bit about XML and its possibilities. I'm trying to have it rendered in Mozilla using XSL. Would you mind if I ask you a few questions?

It will be my pleasure. I will send you an e-mail.

But to make it clear, in my vision even though the Wiktionary would use a combination of XML and database tables to store its data, it would not send raw XML to the browser and force it to display it. The data MUST be converted to XHTML and only then sent to the client. Only if the XML data is requested (probably because it is not a web browser that is requesting it) should raw XML conforming to the published DTD be sent.
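A minimal Python sketch of that idea follows, only to show the shape of the decision. The render function, the wants_raw_xml flag and the Entry, Word and Definition tag names are all placeholders chosen for this illustration; the real software is written in PHP and no DTD has been published.

```python
import xml.etree.ElementTree as ET
from html import escape


def render(entry_xml, wants_raw_xml):
    """Return (content type, body) for one stored entry.

    Raw XML is passed through untouched for programs that ask for it;
    browsers get markup generated from the same stored data.
    """
    if wants_raw_xml:
        return "application/xml", entry_xml
    root = ET.fromstring(entry_xml)
    word = escape(root.findtext("Word", default=""))
    items = "".join(
        f"<li>{escape(d.text or '')}</li>" for d in root.iter("Definition")
    )
    return "application/xhtml+xml", f"<h1>{word}</h1><ol>{items}</ol>"


# Placeholder entry using hypothetical tag names.
stored = (
    "<Entry><Word>bathroom</Word>"
    "<Definition>Room where one can bathe.</Definition></Entry>"
)

for_browser = render(stored, wants_raw_xml=False)
for_program = render(stored, wants_raw_xml=True)
```

Both responses come from one stored record, so the display can change freely without touching the data.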

For two years I have also been designing a database schema to store words

I think such a schema should separate out the part that is common to all languages and allow custom definitions below some level, just because languages differ. I would certainly like to have a look at your work; thank you for your offer.

As to my vision of an all-the-languages-in-the-world dictionary... I created a solution for my own needs: a local database that lets me enter a dictionary for any language. It uses the combination that I proposed for Wiktionary. I use it to check words in Internet Explorer when I browse the net. It works like this: I highlight the word I want to check and right-click it. I choose Translate from the context menu and it displays what I have in my database (I only have a Chinese database). The data that I imported is not editable; my own tables are editable. If there is no entry, I just click Add to enter it into the database. I think the easiest way to illustrate my point would be to let you have a look at it... I can mail it to you, if your mailbox will accept an e-mail with a big attachment (I have to check how big, maybe about 3 MB). The disadvantage is that I only work on Windows: I did not bother to make this solution work on Mozilla or Linux. The database is an MS Access file, and the scripts will work in IE only... So... are you interested?

I will try to post a few screenshots here, I only have to check how to do it... one moment, please...


I am posting these screenshots here because this solution can easily be made to work with Wiktionary.

And makes adding new entries quick and easy, doesn't it? :-)

We are reading a Chinese web page. We see a word we do not know. We highlight it:

[Screenshot: Highlighting a word to look up]

We see what is currently in the dictionary:

[Screenshot: Displaying dictionary entry]

We want to add new translation. We click Add...:

[Screenshot: Adding dictionary entry]

And we see the results:

[Screenshot: Displaying new dictionary entry]

I repeat: this can work with Wiktionary. This can work with any language. This can work from within Internet Explorer and MS Word. A better programmer could make a similar solution work with anything that is on the screen...


PS: Polyglot, should I send this to you to let you have a look at the database, scripts etc.?

Another reason is that we have not yet grasped all the possibilities of all the languages.

I guess I misunderstood you when reading it for the first time. I believe that the best way is to have the structure that is common to all the languages expressed in database structure. Everything that changes and should be flexible, can be encoded in XML. Access to this data will be slower, but will be possible from a program or script.

You will NEVER foresee all the needs people might have when encoding dictionary data in languages you have never heard of, because some needs are specific to each language (and some are common to all of them). Because you will never know everybody's needs, you have to leave yourself enough flexibility to adapt without having to change or discard anything you already have. This can be done more easily with XML than with other solutions.

Should I try to explain how I would do it, if I were starting this project from scratch? I mean I admire what you are trying to do, and I do not want to offend anybody, I just see the big weakness of your approach: an entry here is a free-form note, not a database record.


Did you have a look at Reta vortaro?

Well, just one look a second ago. This is how it should be done. I did not analyze their DTD... but it seems there is nothing one might dislike there.

I would certainly prefer to submit my Esperanto entries there, not here. If I were a professor, I would prefer my students to help their project, not the Esperanto Wiktionary as it looks at the moment.

Why? Because they generate good, clean, usable data. Their format has more potential. It would be a matter of minutes to convert their dictionary to clean, top-quality Wiktionary entries. On the other hand, converting Wiktionary entries to their format would need manual work - slow, troublesome, expensive, prone to errors.

( Haqqax + at + yahoo + dot + com )

Hi David,
You are right in it being a free form note. We try to get some structure in it by using headings and parentheses, but anybody who comes by and makes an addition or an edit is free to do this any way he/she wants. I think with the structure we have, it would be possible to parse it with a program, but it would not be as easy as with xml. I have been trying to come up with an xml representation, but I don't know how to write a DTD or an XML schema for it. It's all very new to me. I would prefer to use XML schema though, since it is XML and can be parsed just as easily.
This is what I came up with:

<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="vortaro.xsl" ?>

<Entry>
  <Language name="English">
    <PartOfSpeech name="Noun">
      <Syllables>bath-room</Syllables>
      <Plural>bathrooms</Plural>

      <Definition>Room where one can <link>bathe</link>.</Definition>
      <Translations>
        <Translation language="Dutch" gender="m">badkamer</Translation>
        <Translation language="Esperanto">banĉambro</Translation>
        <Translation language="French" gender="f">salle de bains</Translation>
        <Translation language="German" gender="">Badezimmer</Translation>
        <Translation language="Spanish" gender="m">cuarto de baño</Translation>
        <Translation language="Turkish" gender="n">banyo</Translation>
      </Translations>

      <Definition>Room where the <link>toilet</link> is.</Definition>
      <Translations>
        <Translation language="Dutch" gender="n">toilet</Translation>
        <Translation language="Esperanto">necesejo</Translation>
        <Translation language="French" gender="f">toilette</Translation>
        <Translation language="German" gender="n">Klosett</Translation>
        <Translation language="German" gender="n">Klo</Translation>
        <Translation language="Spanish" gender="m">cuarto de baño</Translation>
        <Translation language="Spanish" gender="m">baño</Translation>
        <Translation language="Turkish" gender="n" irregularAccusative="tuvaleti">tuvalet</Translation>
      </Translations>
    </PartOfSpeech>
  </Language>
</Entry>
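To show what "parseable" buys in practice, here is a minimal Python sketch that reads a shortened copy of the entry above and lists the translations for one language, sense by sense. It assumes that each Definition is followed by its own Translations block, as in the example; the function name translations_by_sense is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bathroom entry, keeping the same element names.
ENTRY = """
<Entry>
  <Language name="English">
    <PartOfSpeech name="Noun">
      <Definition>Room where one can <link>bathe</link>.</Definition>
      <Translations>
        <Translation language="Dutch" gender="m">badkamer</Translation>
        <Translation language="French" gender="f">salle de bains</Translation>
      </Translations>
      <Definition>Room where the <link>toilet</link> is.</Definition>
      <Translations>
        <Translation language="Dutch" gender="n">toilet</Translation>
        <Translation language="German" gender="n">Klosett</Translation>
        <Translation language="German" gender="n">Klo</Translation>
      </Translations>
    </PartOfSpeech>
  </Language>
</Entry>
"""


def translations_by_sense(xml_text, language):
    """Return [(definition text, [translations])] for one target language.

    Pairs the n-th Definition with the n-th Translations block, which is an
    assumption about the layout rather than something a schema enforces.
    """
    root = ET.fromstring(xml_text)
    result = []
    for pos in root.iter("PartOfSpeech"):
        senses = pos.findall("Definition")
        groups = pos.findall("Translations")
        for sense, group in zip(senses, groups):
            gloss = "".join(sense.itertext())  # flattens the <link> children
            words = [
                t.text
                for t in group.findall("Translation")
                if t.get("language") == language
            ]
            result.append((gloss, words))
    return result


german = translations_by_sense(ENTRY, "German")
dutch = translations_by_sense(ENTRY, "Dutch")
```

The pairing by position is fragile; nesting each Translations block inside a per-sense element would make the relationship explicit, which is one possible modification to the structure.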

If this is what you would let people edit, then nothing needs to be changed to the software of Wiktionary. (We cannot really change a lot about it. Even (apparently) simple things like not having the first letter of the title of an entry being capitalized don't get done. The software was developed for Wikipedia.) The developers don't have the time and we don't have the expertise to actually modify the software (and the database structure). If you feel up to the challenge, we could make a proposal, of course... But it would come down to you having to do the lion's share of the programming work. Most people around here aren't programmers.
The problem with something like the above xml, is that it doesn't look very friendly to edit. I realize you can have it look nice for display, but how about editing an entry? That's the main reason the wiki markup was chosen. Do you have a way to make this work? (cross platform, I'm in the process of converting to Linux)
I'm interested in the Access file, but the address I gave you cannot receive it. I'll send you another address that has a larger mailbox behind it. Don't be afraid to offend anybody. First off, we are not easily offended. Second, this project can only get better when we get more input from more people. It is still a young project and I think a lot can still be done to improve it and its usefulness.
Glad you took a look at Reta Vortaro and liked it. The reason I have never contributed there in the 2.5 years I know about it, is that the threshold is a lot higher. One cannot just add or edit entries on a whim. (Of course the fact that everything is in Esperanto may increase the threshold for most other people...). I still agree with you that we have to strive for a better format here too. Polyglot 23:26 Aug 12, 2003 (UTC)


it would be possible to parse it with a program, but it would not be as easy as with xml.

In my opinion it would merely be possible to try to parse the entry - and the difference in meaning is important.

I don't know how to write a DTD or an XML schema for it.

You know how to plan good structure for the data. It is enough for the moment, because you are doing real work just now. Once the structure is defined, expressing it in DTD or schema is easy. You will learn how to do it when you need it or someone will help you.

I would prefer to use XML schema though, since it is XML and can be parsed just as easily.

Schema would be better; it is more powerful. But I am not sure whether it is possible to use a schema to validate XML in PHP... And I would have to check how easy or difficult it would be to achieve some useful modularity that I know how to achieve in DTD, but do not know how to achieve in schema... I personally have not used schema so far.

But I agree with you - if we can do it with schema, it would be better to use schema, not DTD.

This is what I came up with:

<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="vortaro.xsl" ?> 

You want to have more than one "Entry", don't you? If you do, it can not be a root element, because there can be only one root element. It should be something like:

<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="vortaro.xsl" ?>
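The single-root rule can be checked with any XML parser. The Python sketch below is only an illustration: the wrapper name Dictionary is an assumption made here, since the thread does not say what the enclosing element should be called.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two Entry elements side by side: there is no single root element.
two_entries = "<Entry>bathroom</Entry><Entry>kitchen</Entry>"

# The same entries inside one enclosing element (hypothetical name).
wrapped = f"<Dictionary>{two_entries}</Dictionary>"

results = (parses(two_entries), parses(wrapped))
```

The parser rejects the first string and accepts the second, whatever name is eventually chosen for the enclosing element.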

I would like to propose some modifications to your structure, but right now I do not have time for this. I will try to write more a little later.

If this is what you would let people edit, then nothing needs to be changed to the software of Wiktionary.

I do not think it is a good idea to throw too much at the user at the same time. I would prefer to let them edit only a part of the entry at a time. For example, if they click to edit the Turkish translation, they are editing only this:

<Translation language="Turkish" gender="n">banyo</Translation>

And we can and probably should provide a form for this. They can choose the gender from a drop-down list and write the translation.
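What such a form handler might do with the submitted values can be sketched in a few lines of Python. The update_translation function and the ALLOWED_GENDERS set are hypothetical; the set simply mirrors the gender values that appear in the bathroom example, including the empty one.

```python
import xml.etree.ElementTree as ET

# Values a drop-down list could offer; an assumption based on the example entry.
ALLOWED_GENDERS = {"m", "f", "n", ""}


def update_translation(fragment, text, gender):
    """Apply form input to one <Translation> element and return the new XML.

    The contributor never types a tag: the gender comes from a fixed list
    and the translation text is stored as element content.
    """
    if gender not in ALLOWED_GENDERS:
        raise ValueError(f"unknown gender: {gender!r}")
    element = ET.fromstring(fragment)
    element.text = text.strip()
    element.set("gender", gender)
    return ET.tostring(element, encoding="unicode")


original = '<Translation language="Turkish" gender="n">banyo</Translation>'
updated = update_translation(original, "  banyo odası ", "")
```

Because the software writes the markup, ill-formed input of the kind a free-text edit box allows cannot reach the stored entry.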

We cannot really change a lot about it.

This project has quite different needs than Wikipedia. The software should be modified.

If you feel up to the challenge

I believe this project has the potential to become much more important than Wikipedia itself. People just use dictionaries more often than encyclopaedias. Because of this it is a great honour to make it better in any way. I would definitely love to do whatever I can to make it succeed.

But you are talking here about a big responsibility. Any mistake made while defining the database can have a great impact on the whole project. I have never written programs for the Internet. I do not use PHP. I believe I can write the scripts to do what I am talking about, but I lack experience. The other problem is that I do not know when I could do it. I have trouble finding time to get enough sleep. I am not the best person for the task.

But I will try to read the source of this Wiki. I will try to find some computer to install Linux, Apache, MySQL and PHP to run a few tests. Maybe then I will have something interesting to say :-)

The problem with something like the above xml, is that it doesn't look very friendly to edit.

It can be made friendly. We will provide forms with drop-down lists, edit boxes etc. If we want to allow free-form entries ( <comment> </comment> for example) we can allow a carefully chosen subset of XHTML (paragraphs, tables, basic formatting etc.) with some extensions (for example marking up proper names to enable generation of indexes). Many people know HTML, so learning this markup would be easy for them.

The reason I have never contributed there in the 2.5 years I know about it, is that the threshold is a lot higher. One cannot just add or edit entries on a whim.

Wiktionary can be made to create the same quality data. Easiness of editing just means it is much easier to enter 'noise' - this project will need a community of proofreaders, scholars, users dedicated to protect the quality of the database.

In the evening I will write a little more about the XML you propose, Polyglot. We can take it as the beginning of the definition of Wiktionary data format.


I have to admit that Wiktionary is indeed a mess. This could somewhat discourage potential contributors. If we limit user editing to buttons, text areas, checkboxes, combo boxes, etc., that could help contributors concentrate on the content without having to think about which sections they should or should not fill in. XML tags should be prepared by the software, not by contributors. Only if contributors think they need a new format should they add or edit the XML tags. Is that possible? Petruk 16:16 Aug 13, 2003 (UTC)

There should be symmetrical links instead of duplicate input

Currently a lot of duplicate input is needed. When I want to add a new translation from English to French I have to add new pages for both words, then add the words to indexes and so on.

There should be a database which contains links between different Wiktionary and Wikipedia entries (also in different languages). These links have to be "symmetrical" so that when I say A is a translation of B, then it is shown on both pages. (Users who don't want to see them can turn them off.)

When I add link that says that A is a synonym of B then it is shown on both pages, when I add a link that A is the opposite of B then it should be shown on both pages and so on.
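A minimal Python sketch of such a symmetrical link store follows. The LinkStore class, its method names and the (language, word) page keys are all hypothetical; the sketch only shows that one input can serve both pages.

```python
from collections import defaultdict


class LinkStore:
    """In-memory store where every link is recorded in both directions."""

    def __init__(self):
        # Maps (relation, page) to the set of pages linked by that relation.
        self._links = defaultdict(set)

    def add(self, relation, a, b):
        """Record that a and b are linked; entered once, visible on both pages."""
        self._links[(relation, a)].add(b)
        self._links[(relation, b)].add(a)

    def related(self, relation, page):
        """Pages linked to the given page by the given relation, sorted."""
        return sorted(self._links.get((relation, page), set()))


store = LinkStore()
store.add("translation", ("en", "dog"), ("fr", "chien"))
store.add("synonym", ("en", "big"), ("en", "large"))
store.add("antonym", ("en", "big"), ("en", "small"))

on_french_page = store.related("translation", ("fr", "chien"))
on_small_page = store.related("antonym", ("en", "small"))
```

Synonym, antonym and translation links are symmetrical by nature, so storing each pair once and reading it from either side removes the duplicate input described above.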

On the nl:wiktionary, when you add a word, it is known that the word exists for that language. When you add a translation, the existence of the translation is known. This is done by using two templates: {{ISO 639}} and {{-ISO 639-}} where ISO 639 is the code for a language. The first is used for translations, the second for indicating the language of the word. It works like a dream :)
In the Dutch wiktionary only the Dutch word has translations. Therefore the French and English word both refer to the Dutch word.
In my mind the only thing that is really language dependent in a wiktionary is the definition of a term. All the rest can be universal to any wiktionary. GerardM 21:39, 28 Aug 2004 (UTC)


Please add more headings to this page. This way it becomes easier to respond to a topic (since you can have [edit] links for each section). Guaka 23:48, 29 Aug 2003 (UTC)

XML Info

Remember, XML is a data markup language, not a screen formatting language, so highlighting words and specific list types (numbered vs. bulleted) are not part of XML. So PHP may still need to be part of the equation. XML would be better for portability to other media such as handhelds, print and braille. XML also lends itself better to a more structured entry of information as a base, but still allows for creativity and flexibility in data storage. Good parsers exist for XML today.

Question about Greek

I'm assuming that Greek translations are intended to be modern Greek and spelled in Greek letters. Many of the translations I see are either ancient Greek, transliterated into the Roman alphabet, or both. Here are my suggestions for setting up a standard:

  • All words to be translated into both modern and ancient Greek in the form:

"* Greek: modern_translation (modern); ancient_translations (ancient)" or something similar

  • All translations to be in the Greek alphabet

What do people think? Paul G 10:11, 17 Dec 2003 (UTC)

The probable reason why this happens is because most people can't type Greek characters. Feel free to transliterate. I would make a difference between Modern Greek and Ancient Greek. They are like two different languages anyway. My problem is that I know a little bit of Modern Greek, but not enough to tell whether something is Modern or Ancient Greek. Polyglot 10:26, 17 Dec 2003 (UTC)

Maybe it would be better if the words were spelt in Greek characters by someone who knows how they are written in Greek characters, and not by transliterating back from the transliteration.

I have two things:

  • I have an 8878-word botanical glossary of English words with Dutch explanations. Each word can have multiple meanings, and each meaning can refer to synonyms and antonyms. Many of the words do not yet exist in Wiktionary. If only for the referrals it might be useful to have. The copyright is with me and I already give it away for free.
  • With a translating dictionary, there are two issues:
the language of the word
the language of the explanation

IMHO both should be configurable by the user. The default can be EN EN. When a word is known for a language and there is an English synonym, you can present the English explanation if there is none in that language. When an English explanation is presented although another language was requested, a standard text can ask the user to do the WIKI thing. When a new word is added from within a translation, it is OBVIOUS that the word is in that language. E.g. the word "zijn" is a Dutch word and should probably not be in the wiktionary.. The info is good :). As a new word is either from a given language because of the user settings (default EN) or because it came from a translation of a word, the language would be known. PS I can make a nice database relations diagram if anyone is interested. PS2 If Wiktionary is interested in the glossary please contact me.
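The fallback rule for the explanation language can be sketched in Python as follows. The pick_explanation function and the dictionary of explanations keyed by language code are hypothetical; the Dutch and English texts are made-up sample data.

```python
def pick_explanation(explanations, wanted_lang, fallback_lang="en"):
    """Choose which explanation to show and whether to invite a contribution.

    Returns (text, ask_for_help). ask_for_help is True whenever the
    explanation in the requested language is missing, so the page can show
    a standard text asking the user to add it.
    """
    if wanted_lang in explanations:
        return explanations[wanted_lang], False
    if fallback_lang in explanations:
        return explanations[fallback_lang], True
    return None, True


# Made-up sample data: explanations of one word, keyed by language code.
sample = {"en": "to be", "nl": "bestaan"}

dutch_view = pick_explanation(sample, "nl")
french_view = pick_explanation(sample, "fr")
empty_view = pick_explanation({}, "fr")
```

With the default left at English, a reader who has configured nothing gets the EN EN behaviour described above.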

Romanian Wiktionary

Hi, this is Ronline from the Romanian Wikipedia and the Romanian Wikitravel. How would it be possible to start up a new version of Wiktionary in Romanian at ro.wiktionary.org, just like the French and Polish versions of the dictionary? I would really like to focus on the Wiktionary project in Romanian since it has huge potential! Ronline 14:31, 8 Apr 2004 (UTC)

What should link from here

I don't know whether you noticed when reverting my edits to Language considerations, but I moved the "Wiktionary:About LANGUAGE script" pages to "Wiktionary_Appendix:LANGUAGE script". This seems more appropriate since the only thing those pages do is give a list of script symbols (usually an alphabet) for various languages, and they were indeed linked to by "Wiktionary_Appendix:Writing systems and alphabets". As far as I remember, the "About" pages were intended for providing guidelines on how to deal with peculiarities of languages that make it impossible to stick to WS:ELE. I see no point in insisting on a script-language split for those pages before they even contain any content. It only makes it more difficult to initiate discussions on conventions for languages other than English. Such a division should only be considered if these pages get overly busy (which we are far, far away from). Ncik 14:23, 7 January 2006 (UTC)

I did notice those changes, but I wasn't ready to get into a big dispute over what the proper title for this series of articles should be. That could change again if "Appendix:" and "Index:" are ever recognized as full namespaces. IIRC when I moved these pages from the proposed Appendix: namespace it was with the idea of clearing out anything whose membership there was not yet clear, and putting it at least temporarily into the Wiktionary: namespace. I would expect that the "Wiktionary Appendix:" will eventually be deprecated, and everything moved from there. The scripts are indeed linked from Wiktionary Appendix:Writing systems and alphabets and I have considered deprecating that one too; I'm not sure whether it should be merged with Language considerations.
Your interpretation of what the "About" pages are for is certainly correct, but they can be more. I would also see that as applying to the script pages which will, I hope, someday become more than just symbol lists. If in the course of cleaning out the writing systems page we happen to lose the links to non-existent pages about sign language or languages from fiction I won't be bothered, since even Unicode is not ready to handle such scripts. Eclecticology 01:22, 8 January 2006 (UTC)