This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Compound nouns : one or two articles
I have just created an article about the translations of crude oil. I have put it under crude. I wonder whether it is better to create a separate article (crude oil) or to put everything under crude. -- Youssefsan 21:10, 14 Jan 2003 (UTC)
crude oil should be on a separate page. -fonzy 22:28, 14 Jan 2003 (UTC)
- I think it should be in a separate article --Imran 22:28, 14 Jan 2003 (UTC)
We need Namespaces, Clear Syntax and Automation, or do we?
I've seen the definition for Wiktionary: A Dictionary and a Thesaurus in every language. The goal is great, but the current infrastructure and free-form syntax mean that there will be an enormous amount of redundant manual editing that could be done automatically if we just had a clear syntax and some software. I'm not saying that it would be easy, I'm just saying it could be feasible.
E.g. to achieve the goal for one meaning of one word in n languages we have to make n(n-1) = n^2 - n entries, which will frustrate a lot of people, who will think something like: "If we just XML'd this and this and so forth..."
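A rough sketch of the arithmetic above, assuming that each language's entry must list the word's translation into every other language, so that each ordered (source, target) pair of languages is one manual edit. The function name and the sample language codes are illustrative only.

```python
from itertools import permutations

def translation_entries(languages):
    """Return every ordered (source, target) pair: one manual translation entry each."""
    return list(permutations(languages, 2))

langs = ["en", "fi", "de", "fr"]  # hypothetical sample, n = 4
pairs = translation_entries(langs)
n = len(langs)

# n(n-1) = n^2 - n ordered pairs; 12 for four languages.
assert len(pairs) == n * (n - 1) == n**2 - n
print(len(pairs))
```

With ten languages the same count is already 90 entries for a single meaning of a single word.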
At least in nouns there are a lot of unambiguous words in most languages.
I'm sure that there are lots of people who have thought of this kind of scheme of automation through clearly defined syntax, namespaces for languages and classes of words (noun, verb...) and evolving the underlying software. Please see my page for what I've managed to scribble down on this matter.
I'll iterate on the subject with your help. Cheers.
- Juho 13:26 Feb 22, 2003 (UTC)
- IT WON'T WORK! Language is not that well behaved, and I shudder at the thought of bot generated translations.
- Please consider the following POVs:
- Automatic entries could go to a special namespace and therefore have a different colour until a human has checked them to be sane and truthful
- The dependency data from the automatic translation would be very useful for detecting, stopping and reversing vandalism.
- Let me illustrate this point.
- In Wikipedia when I make a change to an article it takes me some time, lots of concentration, Googling, backtracking through my subscribed RSS feeds and consulting books, which makes it very likely that I will put it on my Watchlist to see if someone axes my edits or what further info people add on the subject. I believe most people go about this Watchlist matter in the same way, which results in numerous eyeballs ready to catch vandalism, minor puns, POVs and so on.
- In Wiktionary the contribution of adding a translation usually takes 10-30 seconds, and when you get into the flow you'll do these for half an hour straight. I have no interest in watching these articles (the most likely thing to catch would be someone adding a translation in a language I don't have a clue about). Therefore it is much simpler to vandalise Wiktionary, e.g. just change some translation to an obscenity and put something else in the summary.
- When utilising Wikipedia to get information, one can use common sense to filter out possibly unreliable information.
- When utilising Wiktionary to get a translation, I'm really vulnerable to practical jokes, puns and obscenities, whether human or bot created
- Comments and further thinking are very welcome.
- I'll write more on this subject in my own space. I'll post a link here when I've elaborated and argued my view more precisely
- - Juho 11:24 Feb 23, 2003 (UTC)
- Sure, there are some words that can easily be mapped on a one-to-one basis between languages, but these are the exception. This mapping works best with modern technical terms. The further one gets from these technical terms, the more connotational baggage a word picks up, and that baggage will not be the same in every language. Distinctions may be made in one language but not in another. Distinguishing between "ser" and "estar" is a problem for a new speaker of Spanish. The use of "the" is a problem for Slavs wanting to learn English. Do we treat each item of a Finnish declension as a separate word?
- Regrettably automation gives us the situation where a letter addressed to the Widget Company will begin with the salutation, "Dear Mr. Company".
<xml> <company name="Widget"></company> <person firstname="John" surname="Doe"></person> </xml>
- Sorry, I just had to put this here (I'm not trying to provoke a fork into "xmltionary") Juho
- I do believe that a fairly uniform format for articles would be an asset, but putting articles in that form will still need to be done mostly by humans, who will be in a position to exercise judgment when exceptions arise.
- In my view the vision of a multilingual Wiktionary involves separate Wiktionaries for each participating language. Each of these would be written with the speakers of that language in mind. Even the foreign words on each Wiktionary would be described in a way to benefit the speakers of the base language.
- -- Eclecticology 01:50 Feb 23, 2003 (UTC)
This is meant to be the ENGLISH Wiktionary, isn't it? Can we PLEASE have more English explanations added to all of these non-English entries that are springing up left, right and centre. It is very hard for a person who only speaks English to make head or tail of them... KJ
- I believe it's a multilingual dictionary. But more info on each term sounds helpful, thanks for pointing it out and I'll try to pull my share from time to time to add to the English definitions too. Since, Jay B.
- The more or less official decision is that this section of Wiktionary (the only one yet online) is English-language. That means that it covers words in all languages, but the entries are themselves written for an English-speaking audience. Words in other languages are to be translated primarily into English, explanations of etymology and usage are in English, etc. Other language sections will be set up, but so far I haven't had a chance to implement the new multilanguage features that I have planned. (The system currently on Wikipedia is very fragile and difficult to manage, with multiple separate wikis tenuously linked together. For Wiktionary, and later for conversion on Wikipedia, I'd prefer a combined one-database system which allows better, more seamless transitions between language sections; better handling of linking, etc. See m:Thoughts on language integration.) --Brion 04:58 Feb 25, 2003 (UTC)
These "Interlingua index" entries are bloody awful. Our contributor would have done better to wait until the Interlingua Wiktionary was online. In the English Wiktionary the Interlingua entries should be limited to showing what Interlingua words mean in English. I've particularly looked at the French on a sampling of these index pages (since it's a language that I understand well) only to find that a significant portion are grammatically wrong, and often not even French. If the words beginning with A to O are going to be like this, don't bother! Eclecticology 08:18 Feb 25, 2003 (UTC)
Should they be deleted? They obviously don't belong here. -fonzy
- Probably, but diplomatically. I like to avoid edit wars if possible. Eclecticology
- I think this is another reason why it might be nice to have different namespaces for languages. I don't think these indexes are such a bad start (for an Interlingua Wiktionary). They are just somewhat misplaced (beginning with the fact that no one knows that it is Interlingua) Henryk911 00:14 Feb 26, 2003 (UTC)
Different languages will have their own subdomains; that's where they are meant to go. -fonzy
- How hard would it be to give the interlingua articles their own namespace like the Webster 1913 stuff? I have no comment on their value as I have never heard of 'interlingua' before this week, but they certainly muddy the waters of the wiktionary at present as they have no identifying marks to tell you what the hell they're meant to be there for! KJ
- (to fonzy) AFAIK the subdomains will characterize the defining language (so far only English), while I would like to see namespaces for the source languages (e.g. German words explained in English, English grammar notes etc...) Henryk911 16:54 Feb 26, 2003 (UTC)
The Webster 1913 material namespace is provisional. It was begun because people quite understandably objected when this was botted into regular articles for the words without human intervention. Ideally, when people have had a chance to review the material and make it conform to the standards we are developing, that namespace will cease to exist. Eclecticology 22:05 Feb 26, 2003 (UTC)
Is there a template for a 'standard entry' giving the correct order and style so that entries can be uniform. ATM I'm trying to standardise entries semi-methodically one way, using Wiki coding for the layout, and User:ILVI has started trying to do it semi-methodically in a totally different way using html. We could easily spend our lives redoing each other's entries, but that doesn't serve any purpose. If there isn't a standard layout we need to get a discussion going to establish one - this sort of problem isn't going to get any smaller as the entry count grows! KJ 06:40 Feb 28, 2003 (UTC)
- A while ago I undertook (hopefully not too dictatorially) to do just that at User:Eclecticology/Vision. Since then I've worked to make articles fit that approach. I have gained experience from doing that, and my vision is in serious need of upgrading based on that experience. Among the ideas which my experience has tended to confirm are the use of "H2" headings to separate the languages in which the term is recognized as a word, and the general order in which the topics should appear. Still undetermined in my own mind, for those topics which come after the definition in that order, is how best to distinguish when a translation or other entry applies broadly to all the definitions, or just to some of the definitions.
- I don't agree with the extensive use of HTML when easily used Wiki code is available. That view has long prevailed in Wikipedia, and there seems to be no reason to deviate from that on Wiktionary.
- In converting the Websterbot material, the transfer has been more than a simple act of copying from one place to the other. With each article that I transfer, I fully revise the format based on the model that I have been using, expand abbreviations, convert to wiki code, and sometimes even try to identify the illustrative quotes with a little more detail. Unfortunately, in that last feature the very useful site at concordance.com appears to have ceased to function shortly after I found it. Identifying these quotes can add significant value to the project. Once I've made these changes, or checked somebody else's changes, I can feel confident enough to delete the Websterbot version. I hope this helps. Eclecticology 08:00 Feb 28, 2003 (UTC)
I wish ILVI would slow down; he seems just to be doing his own thing at the moment. I also wish he would stop using HTML. -fonzy
- I have the same feeling as Karen. Making entries uniform should rid us of all those different styles. I do agree with Eclecticology about the use of HTML. I find it very uninviting to edit (most of the time I just don't bother to add something when it's in HTML). On the other hand, I find Eclecticology's format much nicer and easy to edit. My feeling is that it can be used as a good base to further enhance the format. Although I haven't written much on the subject, I'm willing to join a discussion about a proper format. (You never know, I might even have an idea or two worth considering! :-)
- There is one other question I have. It concerns the use of capital letters. I don't know any traditional dictionary in which the first letter of a headword is capitalised as a rule. Headwords are only capitalised when needed (nouns in German, proper nouns in a number of languages, a number of abbreviations, etc). Here it's different: the first letter is always capitalised. I don't really like that and strictly speaking it's not correct. I'm aware that changing that means changing the software. But there might be other possibilities...
- I'll try to get my thoughts organised about these things. D.D. 08:53 Feb 28, 2003 (UTC)
- I support the need to have a Wiktionary-wide template ASAP and to stop capitalization of the first letters as well. Youandme 09:23 Feb 28, 2003 (UTC)
I'm gratified that my approach has some following; keeping it easy to edit has been a consideration all along. I look forward to the opinions of others, and hope that some detailed consensus will grow out of this.
I agree with Dhum Dhum about beginning article titles with capitals. In Wikipedia the number of items where the distinction is important is significantly smaller than it is here. Brion has expressed his disagreement with our view, and he's the one who has recently been the most active in our software development. I don't know enough about the software to be able to properly respond to his objections.
I have a question for anyone regarding what to use as a "language name" for those terms (mostly abbreviations of one sort or another) that transcend any individual language. These can be language and country codes, chemical element symbols, and others that must necessarily be the same for all languages. I started with "general usage" and have currently drifted to meta-usage, but there's nothing there that really feels satisfactory. Any ideas? Eclecticology 09:38 Feb 28, 2003 (UTC)
- My humble suggestion for language links: use Ethnologue language codes as names of articles (capitalized or not) with some non-letter characters, maybe for example like this: (ENG), (JPN) for English and Japanese respectively. Youandme 10:22 Feb 28, 2003 (UTC)
- I'm aware of the site and find that list more comprehensive than the 3-letter ISO 639-2 codes, but that wasn't the question. As Dhum Dhum correctly responded, it was more a matter of looking for what single term can be used instead of a language name when we are talking about things that are the same for all languages. Eclecticology 00:16 Mar 1, 2003 (UTC)
An idea I had for Wikipedia, which I have not stated yet, is to have a title-masking tag: <TITLEMASK>Page Name</TITLEMASK>. This would be used for pages like Alexander of Greece(King), as I don't like those (...). The tag would hide it and show Alexander of Greece, but the URL would still be http://www.wikipedia.org/wiki/Alexander of Greece(King). This could be used in Wiktionary for first letters; it would also help with abbreviations like [[etc|]]. -fonzy
- There was a recent discussion on the mailing lists about the "pipe trick" which would do what you suggest. Using the format [[Alexander of Greece (king)|]] would have the effect that you want. !!!
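A minimal sketch of what the "pipe trick" described above does to link text, assuming only the simplest case: a link with an empty label whose target ends in a parenthetical. The function name is hypothetical, and this is an approximation for illustration, not the wiki software's own code.

```python
import re

def pipe_trick(link):
    """Expand [[Target (qualifier)|]] to [[Target (qualifier)|Target]]."""
    match = re.fullmatch(r"\[\[([^\[\]|]+)\|\]\]", link)
    if match is None:
        return link  # not an empty-label link; leave it untouched
    target = match.group(1)
    # Drop a trailing "(...)" from the target to get the visible label;
    # fall back to the full target if nothing would be left.
    label = re.sub(r"\s*\([^()]*\)\s*$", "", target) or target
    return f"[[{target}|{label}]]"

print(pipe_trick("[[Alexander of Greece (king)|]]"))
# prints [[Alexander of Greece (king)|Alexander of Greece]]
```

The link still points at the full page name; only the displayed text loses the parenthetical.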
I wonder if we should think about using subpages in certain cases. I know that Wikipedia decided against it after a lot of discussion. But I'm getting more and more convinced that Wiktionary is something very different, with a content that needs much more structuring and hierarchy. And probably a number of changes in the software too.
Multilingual pages risk becoming much too wide for practical use. It could be useful to create different subpages with English as the base language and groups of related languages. Let's say a page with English - Afrikaans - Dutch - Frisian - German. Another with English - French - Italian - Romanian - Spanish. Etc... Possible candidates to have this structure would be the Swadesh list and Wiktionary Appendix:Elements. D.D. 20:30 Mar 11, 2003 (UTC)
I'm starting to think we need subprojects, 3 different ones: (language) dictionary, (language) thesaurus and (language) translator. So the URL could look like:
- http://wiktionary.en/ <--Main Page(English) directing to:
- http://wiktionary.en/Dictionary/word <-definitions of ALL words
- http://wiktionary.en/Thesaurus/word <-thesaurus for ALL words
- http://wiktionary.en/Translator/word <--gives translations of ALL words, and problems with translating etc
Now I think about it, I think it will be easier to run Wiktionary like this.
- Fonzy, when you say definitions and thesaurus for ALL words, do you mean all words in English or all words in every language (as it is now). I'm still wondering about the use of having a word in, say Icelandic, explained in English.
- And when you say translations of ALL words, I suppose that means all words in English translated into all other languages and vice versa? D.D. 20:39 Mar 11, 2003 (UTC)
All words in all languages, and the translation of all English words into others and vice versa. The French Wiktionary would look like:
(of course those subdomains would be translated.) -fonzy
The concern that I see in this has to do with the high risk of fragmentation. Some people involved with other language Wikipedias believe that Wikipedia and Wiktionary should be combined into a single project. I don't agree with them, but, on the other end of the spectrum I don't think we're ready for a lot of sub-pages either. Most of our articles are still too small to make sub-paging necessary, but I do accept that it will eventually be needed as more languages are merged into the project.
When it comes to dealing with these issues and indexes much of our work is still at an experimental stage; the test for these approaches is how easily the newbie adapts to the environment. In the dictionary/thesaurus/translator divisions we still need to come to an agreement on a philosophical level about the relationship between these elements. The interlanguage links will be a challenge, but these have to stay incomplete until someone starts Wiktionaries in other languages. To me the need for having an Icelandic word explained in English links with the question of unidentified texts. If you have a text in an unknown language, how do you determine its language? Ideally, by showing on a single page how a word appears in all languages that have it, you develop the means not only of interpreting the word, but of identifying the language.
The case for separate thesaurus pages is even weaker. We would do much better putting these on the main page for a word, and including explanations of how one possible synonym would differ from another.
The top items on my software change wish list are 1. being able to have articles begin with a lower case letter and 2. determining an acceptable sort algorithm for a wide range of Unicode letters. Eclecticology 23:08 Mar 11, 2003 (UTC)
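One possible starting point for the second wish-list item, sketched under stated assumptions: headwords are compared with case and diacritics ignored, and the original spelling only breaks ties. The function name and the sample words are made up for illustration. This is not the Unicode Collation Algorithm, and it ignores language-specific orderings (Swedish, for instance, sorts å after z).

```python
import unicodedata

def headword_sort_key(word):
    """Sort key that ignores case and diacritics, then falls back to the raw word."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

words = ["zebra", "Éclair", "apple", "eclair", "Zoo"]
print(sorted(words, key=headword_sort_key))
# prints ['apple', 'eclair', 'Éclair', 'zebra', 'Zoo']
```

A key of this kind also keeps a lower-case headword next to its capitalised twin, which matters if article titles are ever allowed to begin with a lower-case letter.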
- The way you presented it, explaining a non-English word in English does probably have its use. I suspect that it lies somewhere between defining and translating the word. For now I can't yet see its relation to those two elements, but it might develop into something interesting.
- Could you explain why you see fragmentation as a risk? I don't agree at all with those people who believe that Wikipedia and Wiktionary should be combined into one project. But even if Wiktionary is a separate project, I don't think we should put everything there is to say about a word in one place. In Dutch (my mother tongue) there are separate definition dictionaries, translation dictionaries, etymological dictionaries, dictionaries of synonyms, of proverbs, and of idioms. (for an example see  - click on "Nederlands" in the upper left corner - I'm sorry but the page is in Dutch). Of course the borders between them are not that strict (e.g. a definition dictionary does contain proverbs, but on a more limited scale than the actual dictionary of proverbs). The fact that I'm used to using them probably explains why I don't "fear" a certain amount of fragmentation.
- I'm not saying that Wiktionary should follow this example. The definition dictionary and thesaurus can probably be combined, because we're still talking about one language. But I don't really see how centrally combining those elements with a translation dictionary can work. Theoretically Wiktionary is about all words in every language. Take only a single word. That word can exist in a number of languages, and within each one of them it can have different meanings. How are you going to translate all those different meanings into every other language with very little fragmentation? Moreover, translating a word is not something on a one-to-one basis.
- I'm convinced that we should find a way of dealing with that. I agree that we don't encounter too many problems yet. But sooner or later we will. And we should try to prepare ourselves by exchanging views and ideas, by doing certain tests, etc.. My greatest fear is that otherwise it will get out of hand. D.D. 10:27 Mar 13, 2003 (UTC)
We probably have more disagreement than disagreement. My concern about fragmentation is quite similar to the fear in your last sentence. I can easily imagine people becoming caught-up in different interlanguage linking projects before there is even any contents to link. Our indexes to or from Esperanto and Volapük are all very nice, but as I see it these are little more than wish lists of words needing articles. It's a lot of skeletons without any meat on the bones. The different types of dictionaries that you mention could be a good way to go at some future time, but we're far from there now. If we get too far ahead of ourselves in deciding upon our structures, we may find that the structure is not compatible with empirical data. Of course a lot of empirical data without structure is also a problem, but it's not an immediate one. Any word is best explained in its own language. That's why I believe that it is crucial to encourage the beginning of Wiktionaries in other languages for speakers of that language. Eclecticology 08:04 Mar 14, 2003 (UTC)
- I hope you meant "We probably have more agreement than disagreement" :-)
- I'm also waiting for the set-up of Wiktionaries in other languages. I'm more than willing to participate in a Dutch language Wiktionary when it starts. Any idea when that will happen? D.D. 09:58 Mar 14, 2003 (UTC)
- Ooops, you're right! I think faster than I type. Where are neural interfaces when you need them?
- Regrettably, I'm not one of the technical gurus. Bring it to Brion's attention, and he may be able to do something, but from seeing the mailing list traffic about the slow-down of Wikipedia, he probably already has a long list of things to do. Eclecticology 17:31 Mar 14, 2003 (UTC)