This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
In order to make Wiktionary work faster, we need to get rid of all the unnecessary links. So, I suggest that we do NOT add links to languages, when there is no translation. If someone decides to translate something, then he will add a link to the language.
I'll give you an example. If you look at Multicolor, Multi-billion or Multicultural, you'll notice that there're 103 links, and only 7 translations. Very unpleasant, isn't it? :/ Webkid 08:55, 17 Jan 2004 (UTC)
- Hi Webkid, you're right. It isn't necessary to prepare languages for which one doesn't know the translations. I did it myself when I first got started, to make it easier for others to fill them in, but I realize now it doesn't make sense. They will remain empty for months, which looks a bit silly. Feel free to remove them, when you encounter them. Polyglot 09:14, 17 Jan 2004 (UTC)
- This may be a stupid question, but why do we link to the languages of the translations at all? What is the purpose of (potentially) linking from every single English word to every word that represents a language? - Sandman 10:15, 17 Jan 2004 (UTC)
- In fact it's a very good remark. We got into the habit of doing it that way, but lately I don't see the sense of it anymore either. At first it seemed like a good idea, but it is probably part of the 'common' vocabulary of the Wiktionary. Let's try to get some consensus and then, maybe, we can start taking those links out, as well. Polyglot 11:36, 17 Jan 2004 (UTC)
- I agree with Polyglot. It is generally considered bad Wiktionary policy to wikify the headings Noun, etc, so I think the same ought to apply to language names in translations. I have started removing them in pages that I edit. Unfortunately, most pages have them, and this leads translators to wikify language names for any translations that they add. My view is that the policy on wikification of words in entries should be restricted to words in definitions that the user might not be familiar with (based on the assumption that they do not know the word that is being defined), and links to other entries (in translations, synonyms, antonyms and the like). -- Paul G 13:14, 17 Jan 2004 (UTC)
- I'm in general agreement, I think, with everyone. It doesn't seem to serve any very useful purpose to link to all the languages, and I think it's mainly just habit. (My fingers are well trained in typing "*[[German]]:" by this time. :-) I'd also thought of the "What links here" trick that Webkid suggested, but really that's what the index pages are for. (Of course, the index pages probably aren't up to date, but that's another subject.) -- Ortonmc 05:13, 18 Jan 2004 (UTC)
- OK, if nobody shows disagreement with the unlinking of the languages within 7 days, I suggest that we start removing links then... Webkid 09:19, 18 Jan 2004 (UTC)
- I've already started unlinking just the common languages. I'm leaving linked those languages which are not national languages and which most typical users would not know of. Hippietrail 04:14, 21 Jan 2004 (UTC)
- For consistency I think it would be better to unlink all of them. All or nothing... Polyglot 08:57, 21 Jan 2004 (UTC)
- In this case I think "consistency" is a very subjective term. Sure, it would be graphically consistent, but is it consistent with the idea of a hyperlinked dictionary of every word in every language? I think having uncommon words easily investigated by clicking on them when they are seen is a huge benefit, and having some words in a list be a different colour from others is a minuscule deficit. Indeed, I've learned about a couple of languages I didn't already know precisely by clicking on the links I found right here! Hippietrail 20:05, 25 Jan 2004 (PST)
- The problem then becomes: which languages are interesting enough to be linked and which aren't? Very subjective and not a decision I want to make (over and over again, for each article). Polyglot 23:14, 25 Jan 2004 (PST)
- I think everybody should be free to link everything he (she) wants User:Sergio1956 26jan2004 13h
- Yes, in an ideal world. See the original poster's comment. We need to remove some links in order to reduce the load on the server. -- Paul G 05:14, 26 Jan 2004 (PST)
- What is the impact on the server load if we remove language links: 0.01% faster, 0.1%, 10%, 100%, 1000%? Is it right to assume that Wikipedia and Wiktionary use the same server? The exponential growth of Wikipedia and Wiktionary is fine, but maybe it is the real reason for the performance problems. In German it is called a Teufelskreis (fr: cercle vicieux): faster server -> more traffic -> more articles -> more users -> slower wiki -> more donations :-) -> faster server -> more traffic -> and so on... That's why it would be interesting to know the impact on the load of removing the links. User:Sergio1956 26jan2004 21h
- We chose early on to mark them with the full language names in English. We get a lot of languages here and it is not realistic to expect people to memorize more than 300 language codes. So let's stay with the full language names. Polyglot 23:17, 2 Mar 2004 (UTC)
- I'm somewhat surprised there should be a performance problem with multiple links. I'm particularly surprised that Wikipedia's growth hasn't forced a fix.
- Someone mentioned somewhere that each link on a page requires a separate database lookup to determine whether to color the link red or blue. This should be fixable by making the rendering/editing smarter:
- Maintain a table of (link source, link destination). This is also needed to support "what links here" (so I suspect it must already exist?). This table should contain all links, whether tentative or not (otherwise adding a new entry requires finding all pointers to it, without the benefit of a table).
- When rendering, do one lookup to retrieve the set of all extant pages the given page links to (by joining the link table against the table of extant pages). If the tables are properly tuned, this should be fast. Cache this set locally -- even if every word is wikified, there will only be one entry per unique word on the page.
- When rendering a particular link, check to see whether the destination is in the cached set.
- When checking in a page, update the link table accordingly. Again, this should be a single database operation.
- Unless I'm missing something major, this should eliminate the existing performance bottleneck. The single select with local lookups should be much faster than multiple selects with no local lookups. It also gives more atomic semantics -- as it stands, two links to the same page could conceivably get different colors if that page is checked in while the linking page is being rendered.
- No, I'm not volunteering to implement this. -dmh 06:29, 4 Apr 2004 (UTC)
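The scheme dmh outlines above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names `page` and `link`, their columns, and the helper functions are hypothetical and are not taken from the wiki software's actual schema. It shows one join per rendered page, a locally cached set for colouring links, and a single transaction on check-in.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with hypothetical `page` and `link` tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (title TEXT PRIMARY KEY);
        CREATE TABLE link (src TEXT NOT NULL, dst TEXT NOT NULL,
                           PRIMARY KEY (src, dst));
    """)
    db.executemany("INSERT INTO page VALUES (?)",
                   [("multicolor",), ("German",), ("French",)])
    # The link table holds every link, including links to pages that do not exist yet.
    db.executemany("INSERT INTO link VALUES (?, ?)",
                   [("multicolor", "German"), ("multicolor", "French"),
                    ("multicolor", "Breton")])
    return db

def existing_targets(db, src):
    """One query: every page that `src` links to and that actually exists."""
    rows = db.execute(
        "SELECT l.dst FROM link AS l JOIN page AS p ON p.title = l.dst "
        "WHERE l.src = ?", (src,))
    return {dst for (dst,) in rows}

def link_colors(db, src, targets):
    """Colour each link from the locally cached set, with no further queries."""
    cached = existing_targets(db, src)
    return {t: ("blue" if t in cached else "red") for t in targets}

def save_links(db, src, targets):
    """On check-in, replace the page's rows in the link table in one transaction."""
    with db:
        db.execute("DELETE FROM link WHERE src = ?", (src,))
        db.executemany("INSERT OR IGNORE INTO link VALUES (?, ?)",
                       [(src, t) for t in set(targets)])
```

Because all colours for a page come from the same cached set, two links to the same destination cannot end up with different colours within one rendering, which is the atomicity point raised above.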
A few questions
1) Look at New Caledonia, for example. You can see a See also at the bottom of the page. My question is: should I use ''See also'' or ===See also=== ?
- I would tend to change it to a heading. Polyglot 11:21, 23 Jan 2004 (UTC)
2) Look again at New Caledonia. The translations are: #:*[[Breton]]: [[Kaledonia Nevez]], #:*[[Danish]]: [[Ny Kaledonien]], #:*[[Dutch]]: [[Nieuw-Caledonië]] etc. Should I change it to *Danish, *Breton, *Dutch or should it stay as #:*Danish, #:*Dutch, #:*Breton?
- There seem to be two different styles of adding translations. The more common style seems to be to group all the noun definitions together, and then put the translations under them (with *Danish), with numbers to indicate which definitions the translations belong with. The other style is to add translations under each definition (with #*Danish). New Caledonia looks like an anomaly, and I'd be inclined to change it.
- The advantage of the second style is you don't have to worry about renumbering the translations if definitions get moved around in the list. The disadvantage is the pages get much longer, and there may be duplication of translations. -- Ortonmc 16:38, 23 Jan 2004 (UTC)
3) I can see that most of the people place the word/phrase on top. Why? Wiki automatically adds this information at the top. Webkid 08:36, 23 Jan 2004 (UTC)
- I have the habit of repeating the word under the Part of speech heading. (Not on top) This way it is immediately clear whether the word needs to be capitalized or not. For other languages, I also tend to put plural forms and diminutives behind that. For English only the irregular verbs and sometimes the adjectives get something added between parentheses. Polyglot 11:21, 23 Jan 2004 (UTC)
Merging consequent changes
It happens quite often that I edit a page and save it, but shortly afterwards realize that I made some minor mistake. Although I try to force myself to use the preview, it might in general be interesting to merge consecutive changes by the same user to one entry into a single change (or at least to show them that way in Special:Recentchanges, as MoinMoin does, for example). Henryk911 00:46, 29 Jan 2004 (UTC)
- This is a problem if the changes are conceptually different. For example, if I revert some vandalism, and at the same time notice a few other errors, I always do the revert on its own, and then fix the errors as a separate edit, so people can see what has happened (and so a revert is a revert, not an edit). Collapsing these into one edit would not be useful. --HappyDog 02:37, 15 Feb 2004 (UTC)
Is Wiktionary being plagiarised?
See http://dictionary.new-frontier.info/w/Appendices/Place_Names - the comment at the top of the page is the one I added to the equivalent Wikipedia page. Is this a legitimate mirror? If not, and given that the contents of Wiktionary are free, what is the position here?
- I found this at the bottom of the page:
- Some content falls under GNU FDL, and is provided in part by Wikipedia.
- I believe they can do that, but I'm not an expert. It should be made clear to them they can not forbid us to also use the modifications they add on their site. Polyglot 22:50, 9 Feb 2004 (UTC)
- I'm the owner of http://dictionary.new-frontier.info. I was recently informed of this Wiki page by Henryk911 through an e-mail enquiry regarding the license of the information presented on my Web site.
- If I may post the e-mail transaction between us to help resolve this confusion:
- Hi Zeeshan,
- thanks for your fast answer. Maybe you want to tell the community yourself,
- what your project is about.
- Maybe here: http://wiktionary.org/w/wiki.phtml?title=Wiktionary:Beer_parlour
- Under the topic "Is Wiktionary being plagiarised?"
- All in all it's an honorable effort to read-only mirror wiktionary. :-)
- >Hi Henryk,
- >Thank you for your feedback, I appreciate user input. Regarding your
- >first question, the intention of my Web sites is to repackage GNU FDL
- >and Creative Common licensed information and present it to end users
- >who are looking for it.
- >I do not wish to 'challenge' or 'split forces', as you worded, the
- >Wikimedia Foundation, but the GNU FDL allows for commercial usage of
- >information as long as sources are stated, I have clearly marked this
- >in all of the Web page's footers.
- >The site was developed as an alternative means of gaining information
- >when Wikimedia servers were severely lagged or offline, which they were
- >constantly prior to the donation requests and Slashdot community
- >Regarding your second informational question, the copyright notice has
- >been worded to be minimal in an effort to focus users' attention on the
- >actual page content. Of those who are interested in the license status,
- >such as yourself, of the information provided by
- >dictionary.new-frontier.info, a link is provided to the GNU FDL.
- >I have stated 'Some content falls under GNU FDL, and is provided in
- >part by Wikipedia' so that users are aware that I reserve copyright
- >ownership on some of the Web pages (bookmark help pages, link-to pages,
- >recommendation pages, etc.). The copyright also reflects my ownership
- >status of how the GNU FDL information has been repackaged and
- >presented, such as the underlying Perl code, outputted HTML, and CSS
- >I hope this has been of help to you, if you have any additional
- >questions, please contact me at SNIPPED.EMAIL.ADDR. Thank you.
- >Zeeshan M.
- >-----Original Message-----
- >Sent: Saturday, February 14, 2004 3:55 PM
- >Fullname: Henryk911
- >Email Address: SNIPPED.EMAIL.ADDR
- >Message: Hi Zeeshan,
- >1. I wonder, what's the intent of your dictionary-site? It seems like a
- >1-1 copy of wiktionary.org (except for the layout). But why split
- >2. I'm not sure if I like your "license". Because you state, that some
- >content falls under the GNU FDL, you imply that other does not. So
- >what's the license of the "other" stuff (if there is any). I think, you
- >should at least mark the wiktionary-pages as such, so it is clear they
- >fall under the FDL.
- > Henryk
- All in all, I mean no malicious intent, and do not block re-distribution of the edits I have applied to the Wiktionary pages, after all, this is why I stated the information inherits GNU FDL (in the footers of produced Web page, in the form of a link to GNU FDL).
- Note: I have trimmed e-mail addresses from the enquiry to prevent spam and other annoyances, if you wish to contact me, please use http://dictionary.new-frontier.info/contact/ - If you wish to contact Henryk911, please send me a message (via the above stated URL) and I will ask him if it is okay to inform you of his e-mail address so that you may contact him directly. Henryk, if you're reading this, you may remove this statement if you don't want me to do this or you wish to state your contact details here.
- - Zeeshan M.
Swadesh List 2
- Also most of the work on it was done by one person quite some time ago. It's likely that the material is duplicated on other pages. That should be checked, and if so it should simply be deleted as duplication. Eclecticology 07:27, 26 Mar 2004 (UTC)
Hello. I'm a French Wiktionary contributor. fr: and pl: have just been set up, and are still building basic things from the ground up :) (though some are taken from here, I guess).
The main question, which brion asked, and concerns every Wiktionary, is: what should inter-wiktionary links point to?
If, in a Wiktionary, someone writes [[fr:Français]], where should it point? Basically, there are two solutions:
- point to fr.wiktionary.org/wiki/Français
- point to fr.wikipedia.org/wiki/Français
From my understanding of the project, [[Français]] here is right away equivalent to [[Français]] on fr: and pl:. So inter-wiktionary links are pretty obvious, we won't have issues like on Wikipedia :) Consequently, I'd favor linking to Wikipedia, but making clear it's a Wikipedia article we're pointing to, not a Wiktionary one.
Note that we don't need to do the same thing on different Wiktionaries, but it'd be easier nonetheless :)
I also asked this question on fr:'s talk page, and I'll try to provide feedback from both sides.
Ryo 11:53, 27 Mar 2004 (UTC)
- I'm kind of contributing to the Wiktionnaire and to Wiktionary. I don't understand Polish, so I can't follow what they are doing. I try to tell them how we do things over here and why. I hope it's not considered too intrusive.
- For the links I would like that and point to entries in the Wiktionnaire, whereas en:Corsica continues to point to Wikipedia. That would mean things aren't consistent anymore though, and I think it is important that a link always means the same thing, no matter which language of Wikipedia or Wiktionary it appears in. (Otherwise we get into trouble when copy/pasting, etc...)
- So, I propose to prepend a w, i.e. (wfr:fleur and wfr:flower) point to the French Wiktionnaire, whereas and point to the French Wikipedia and this no matter where; in Wiktionary or Wikipedia; the link is written.
- Of course it might be that another letter than w would be more appropriate, but I wouldn't know which other than maybe d (for dictionary) would be a candidate then.
- I'm glad this problem is going to be resolved. And I'm glad our opinion is asked in the matter. Don't mind me if this proposal doesn't make sense. (I had been thinking about this problem before it was raised though. The only thing I cannot do is to think beyond Wikipedia and Wiktionary. The project has grown larger than those two now, so every solution that is implemented needs to be thought out well) Polyglot 17:38, 27 Mar 2004 (UTC)
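The prefix idea proposed above can be illustrated with a small resolver. This is a hedged sketch, not how the software handles interwiki links: the function `resolve` and the regular expression are hypothetical, and it assumes two- or three-letter lowercase language codes and the `lang.project.org/wiki/Title` address form quoted earlier in this thread.

```python
import re

# Optional leading "w" (dictionary marker), then a language code, then the title.
PREFIX = re.compile(r"^(w?)([a-z]{2,3}):(.+)$")

def resolve(link):
    """Return the address a prefixed link would point to, or None if unprefixed."""
    match = PREFIX.match(link)
    if match is None:
        return None
    dictionary_marker, lang, title = match.groups()
    project = "wiktionary" if dictionary_marker else "wikipedia"
    return f"{lang}.{project}.org/wiki/{title}"
```

With this pattern, `wfr:fleur` resolves to the French Wiktionary and `fr:fleur` to the French Wikipedia, wherever the link is written. One design consequence worth noting: any three-letter language code that itself begins with `w` would be read as the marker plus a two-letter code, which is one reason a different letter or a longer prefix might be preferable.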
Numeration of meanings
While trying to add Estonian counterparts to the article Paper I found a problem. The meanings of the entry word are listed by means of the symbol #, so their numbers change when someone inserts a meaning or changes their order. The meaning numbers then cease to identify the meanings correctly, as happened, for instance, to the Dutch translations. Of course, whoever inserts a meaning should modify the numbers in the translations too. But how can I know the numbers have been changed? I think this is a serious reliability risk. Andres 14:41, 27 Mar 2004 (UTC)
- There are two options: 1) watching a page (or its history) to see when/if senses are added and whether renumbering is done, or 2) making it standard to only add definitions to the end of the list. Unlike in many other dictionaries, sense numbering in Wiktionary doesn't appear to have significant value. —Muke Tever 17:10, 27 Mar 2004 (UTC)
- I did come across some guidelines as to the order, but if the community still doesn't think they are that important then I would opt for the second option, as this requires just a collective effort: while watching recent changes, don't let new meanings in anywhere other than at the end. Then it could also easily be checked whether the numbers in the translations are confused. Of course, the best way would be for anyone who inserts meanings to check the translations, but this is probably too utopian. Andres 22:26, 27 Mar 2004 (UTC)
- The # gives us soft numbers, while the numbering that some have used in the translations assumed that they are hard numbers. The people who work on the English part of the dictionary often pay no attention at all to translations. There was an inconclusive discussion very early on about the ordering of definitions. It would have resulted in meanings being grouped by related senses, or ordered chronologically to reflect when the use became a part of the language. Adding meanings at the end as and when they are found would result in a product that would be unhelpful to the user.
- A solution in the translations that avoids hard numbering would be preferable. I don't know if the software can be made to link to a particular definition instead of its position. What I tend to do when I encounter this problem is to put all the translations that apply to a particular meaning at an appropriate indent level under that meaning. The indent level can vary if other extensions are also related to the meaning. Eclecticology 23:11, 27 Mar 2004 (UTC)
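The soft-versus-hard numbering problem described above can be shown in a few lines. This is a minimal sketch with made-up data: the definitions, the two Dutch words, and both helper functions are illustrative assumptions and are not taken from the actual Paper entry.

```python
def resolve_by_number(definitions, translations):
    """Pair each translation with the definition at its hard-coded 1-based number."""
    return {word: definitions[n - 1] for word, n in translations.items()}

def resolve_by_gloss(definitions, translations):
    """Pair each translation with a definition via a short gloss, not a position."""
    return {word: next(d for d in definitions if gloss in d)
            for word, gloss in translations.items()}

definitions = ["sheet material made from wood pulp", "a newspaper"]
by_number = {"papier": 1, "krant": 2}
by_gloss = {"papier": "sheet material", "krant": "newspaper"}

# Someone inserts a new sense at the top of the list; "#" renumbers silently.
updated = ["a written scholarly article"] + definitions
```

After the insertion, `resolve_by_number(updated, by_number)` attaches "krant" to the sheet-material sense, while `resolve_by_gloss(updated, by_gloss)` still finds the newspaper sense. Grouping translations under each definition, as suggested above, has the same effect as keying by gloss: the link no longer depends on position.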
The indentation was getting a bit thick, so I dropped it. This is another problem that I noticed in my first few days. I hope there's not huge discussion about this elsewhere, I saw a link a couple of days ago about numbering translations that I skipped. :-) Solutions with pros and cons as I see them:
Only add definitions at the end of the list
- Pro: It's simple, and it requires few changes.
- Con: The definition order is supposed to have some relation to the frequency of use.
- Con: New users may not know to do this, and destroy the translations as a result.
- I find this solution unacceptable personally. I think it will lead to completely incorrect data as a result of the new contributor problem.
Group translations under the definition (as suggested by Eclecticology)
- Pro: I think it has a better chance of working.
- Pro: Isn't this what was decided for quotations?
- Con: It's going to accelerate the need for being able to conditionally display parts of the entry, so that readers aren't overwhelmed by lots of repetitive translation data.
Perhaps thinking about the problem this way will help... We have an N-dimensional matrix of data. The N's are language, sense, pronunciation, etymology, translation into language X, etc. The problem will quickly become intractable if N is large and it's a full matrix. So the refined problem appears to be: which hierarchy is the most intuitive and will result in the least duplication of data?
Language is a good candidate for the top of the list (as we appear to have done already). I propose that "sense" or "definition" should be the very next level because it's the level that resolves the most ambiguities. This would cause a lot of translation data to be duplicated, but I think that might be correct anyway. Another advantage is that sense has a lot of things that can go immediately under it, without conflicting with other things at the same level. e.g. quotations, pronunciations, translations, alternative spellings etc. are not likely to "conflict" in my mind.
I'm fairly sure we'll be able to find examples that contradict this ordering, but my impression so far is that the "sense" has not received a sufficiently high level in the hierarchy so far. In short -- I like Eclecticology's solution. I hope that's not too rough on the translators out there.
-- CoryCohen 04:19, 9 Sep 2004 (UTC)
I've since found some more of the translation discussion and seen the tables grouped by sense. This is a fine solution in my mind, especially if it buys something for the translation-oriented folks. But I still can't help shake the feeling that the following structure would produce the most reliable data (and also a good bit of duplication).
- Word (implicit spelling)
- Sense (Definition)
- Part of Speech
- Alternate spellings
- See also
- Derived words
- Sense (Definition)
If sections were collapsable, this would be quite attractive, intuitive, and suitable for a lot of different uses. Without collapsable sections, it'll probably be a bit bulky. :-( As for the duplication of data, it'll be cut and paste mostly, and I think there's a chance we'll be able to "upgrade" it to some sort of normalized form by detecting identical content later on down the road. E.g. write a script that compares all translations across senses to see which group, or process all quotations looking for duplicates. Obviously, I'll go with the crowd on this issue, but have we really thought hard about this issue yet? The recent discussion about translations makes me suspect that we have not.
-- CoryCohen 04:46, 16 Sep 2004 (UTC)
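The kind of clean-up script mentioned above, one that compares translations across senses to see which ones group together, might look something like this. It is a sketch under assumptions: the nested-dictionary layout (sense, then language, then a list of words) and the function name are hypothetical, and real entries would first need to be parsed into such a structure.

```python
from collections import defaultdict

def group_identical_translations(entry):
    """Group senses whose translation tables are exactly the same."""
    groups = defaultdict(list)
    for sense, translations in entry.items():
        # Order-independent fingerprint of every (language, word) pair.
        key = frozenset((lang, word)
                        for lang, words in translations.items()
                        for word in words)
        groups[key].append(sense)
    # Only groups with more than one sense are candidates for merging.
    return [sorted(senses) for senses in groups.values() if len(senses) > 1]
```

Senses returned in the same group carry duplicated translation data and could later be normalized into a shared table; senses that differ in even one word stay separate.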
Capitalization in headings
Should headings be capitalized or not? I see both and haven't been modifying them but for new articles I capitalize them all. Hippietrail 04:14, 21 Jan 2004 (UTC)
- They should be capitalized. Please change them when you see uncapitalized ones. The reason they were not capitalized is that they were links and we should only capitalize words in links when it is necessary to do so in that language. (German nouns, months in English, etc.) So from looking at the link, one can know whether this word needs capitalization or not. Polyglot 08:57, 21 Jan 2004 (UTC)
- The wikipedia standard is to capitalize the first word, but not the rest of the heading. I've been following that practice in cases where the heading is more than one word. -- Ortonmc 15:28, 21 Jan 2004 (UTC)
- That's exactly our problem. Capitalization is something where a dictionary is very different from an encyclopedia. In a dictionary it matters whether a word is capitalized or not. It's a bit unfortunate that the software will always capitalize the first letter in links, but I guess we'll have to learn to live with it.
- So we should use ===Proper Noun===, not ===Proper noun===? Either way suits me. -- Ortonmc 16:34, 21 Jan 2004 (UTC)
- I tend to write Proper noun. I don't think it's necessary to capitalize both of them. Then again, I don't know what the usual way of doing this is, in English. Polyglot 11:21, 23 Jan 2004 (UTC)
(Biting bullet so as not to overdo the indenting - and because of long interval.) I'm with Polyglot (and so is the publisher I work for, and so is Wikipedia) - headings in "sentence case", because otherwise they look far too dominant on the page. Robin Patterson 11:43, 22 Jun 2005 (UTC)
Downloading Entire Dictionary
Is there any way to download the entire content, e.g. perhaps a zip file with all XML? I can't see any link to such a thing. I would like to work on a standalone application using the dictionary content, which is presumably okay as long as the program itself is GPL'd.
—AppDeving 13:51, 12 Mar 2004 (UTC)
I've just noticed that there is a geographical index (Appendix:Geographical_index) in addition to the place names index accessible from the front page. These contain a lot of duplication, with the former being much less developed. For the past few months I've been transferring place names from the place names appendix to appendices subdivided by country and region. I think these two indices might benefit from being merged or at least made to have the same content. (There could be a place for both, as the former is ordered alphabetically and the latter geographically.) -- Paul G 16:47, 26 Mar 2004 (UTC) (oops, forgot to sign this first time round)
- The link that you mention doesn't lead anywhere, but that does not diminish the very valid point that you have raised.
- Of all the Wikimedia family projects none is so heavily indexed as Wiktionary. Many of the 1500+ indexes are fragmentary, and there is no systematic way to keep these up-to-date. (except perhaps a lot of highly disciplined work. :-) To make matters worse some of these indexes are characterized as appendices. Furthermore, a lot of the indexes are now a part of the Wiktionary: namespace, and that can make it difficult to find a lot of more general information about how Wiktionary works.
- I've raised the idea of an Index: namespace with a couple of people, and at least they did not shoot down the preliminary idea. What would people think about such a move? Can we agree on something before anybody starts moving anything? In the same vein, what distinctions are being made between "index" and "appendix", or are we just using them as two different names for the same thing? Eclecticology 21:33, 26 Mar 2004 (UTC)
- The Index: namespace sounds good to me.
- Regarding the difference between index and appendix, my opinion is anything that's merely a list of words is an index; an appendix conveys additional information (e.g. the relationships between the words). Currently there doesn't seem to be such a clear distinction. The First names and Surnames, for instance, would be among the indices by this standard. Ortonmc 21:58, 26 Mar 2004 (UTC)
- I wrote this comment before I saw Ortonmc's - my view is much the same: in my experience the appendices of a dictionary usually contain encyclopaedic material, such as diagrams showing how the British court system works, maps, genealogical trees of Royal families, lists of presidents, lists of countries with their populations and capital cities, a guide to English grammar, a guide to punctuation, and so on. In virtually all cases, these should probably be in Wikipedia rather than Wiktionary. If our indexes consist of lists of words organized by topic, or beginning with a particular letter or belonging to a particular language, I think "indexes" is the best word. Amatlexico 22:00, 26 Mar 2004 (UTC)
- The Index: namespace sounds good to me too. -- Paul G 13:20, 5 May 2004 (UTC)