Wiktionary talk:Entry layout explained/archive 2004

"As most of the rest of the sections ..."

I defy anyone to explain what this sentence means.


I have been making some changes to reflect how I think we are doing things nowadays. (I hope I'm not living in some kind of bubble with my own reality). Please let me know if the changes I made are OK or not. If they are OK, then I will also go and change the articles they refer to. Although, I must say I never was very happy with the link to Portmanteau. Since I'm not a native English speaker, it took me a while to become convinced it wasn't a linguistic term or the name of some person who had studied linguistics... Polyglot 00:16, 16 Jan 2004 (UTC)


I noticed that you have started to dewikify language names in definitions. I thought the links were quite useful. There are some little-known languages and it is nice to be able to find what language it is by just clicking on it. Maybe we should link to Wikipedia about languages instead of Wiktionary. Poszwa 10:06, 22 Jan 2004 (UTC)

Is it really necessary to be able to click on them? If there is a language one wants to know more about, one can enter it in the search box. How often is it going to be necessary and is it worth it to have the same links all over again in each entry? In the entry for the language's name it makes sense to link on to Wikipedia. Polyglot 12:33, 22 Jan 2004 (UTC)
I pretty well agree with this, but I have been cautious about de-linking to avoid a storm of protest. I have, however, been delinking those section headings which recur regularly. Although I agree with delinking the languages in the translation lists, I would still be inclined to make them bold. Eclecticology 02:42, 23 Jan 2004 (UTC)

It would really help me get up to speed if the template, and cited leap sample, were fleshed out in a couple areas:

  1. US vs UK (and I assume usually the rest of the Commonwealth/former colonies) spelling and pronunciation. Currently there's no guidance on how to handle that. Of course, it gets dicier with -t vs -ed than -or vs -our: US is pretty much exclusively -or, but has partial adoption of -t: slept is fully adopted (I don't know of anyone that would use sleeped). But, say, dreamt and dreamed are closer to 50/50, and, further complicating things, are sometimes considered to be slightly different. For example, the same person might be more likely to say "Martin Luther King dreamt of equality", but would also be more likely to say "Last night I dreamed I was on a desert island".
  2. Other forms of the word: I know verbs and nouns and past and present tense, and kinda know adjectives and adverbs, but once you get into past participle, intransitive, etc. I am completely lost. I can write very well, I just don't know the terminology for what I am doing--I couldn't diagram a moderately complex sentence to save my life. E.g., where would "leaping" go? (mostly copied this from a page that people here aren't likely to read)
Also, I think the anticipate article is a good example of why antonyms and synonyms should also be subbed under the different defs, which is similar to the 'multiple defs' ref to pronunciation and etymology, but currently not explicitly stated by the template page. Niteowlneils 19:58, 7 Apr 2004 (UTC)

There is no one way to deal with US/UK variations except to describe the possibilities. Usages often overlap so it's never clear-cut. Canadian usage is often intermediate between the British and American. No template can possibly handle everything.

I don't disagree with that. I guess my comments are more directed at the Leap article (but solely because of its citation from the template). What I was trying to say, is that, for a page that is being referred to as a good role model, it would be helpful if it at least showed one of the ways to do things.

Including other forms is worthwhile, but apart from dialect variations, these tend to be very few in English. One of the strengths of the language is in the ability of a word to be used as a part of speech other than what is generally perceived, and still be understandable. "Leaping" usually goes to the bottom of the cliff. :-) The terminology that you mention is often independent of the form, which makes me wonder whether your questions really relate to specific words. They seem to focus on the meta-language of language. How much of the terminology do you need?

Thinking about it more, I guess there are probably enough exceptions that even if all forms were reflected in the Leap article, it wouldn't be wise for me to try and build other articles by simply following the model, without actually understanding the terminology. Like I said, I know noun and verb, and have a fuzzy handle on adjectives and adverbs (and about the same for article and preposition, I guess), but nothing else. Since I write (and read and speak) proper/correct (American) English, I really am not motivated to spend the time to learn the terminology to describe it. I guess I'll just leave that stuff to someone else, and focus on things I do know, like fixing spelling and grammar (to play off the old pornography quote, 'I can't describe/define bad grammar, but I know it when I see it').
That's a fair assessment. None of us should expect ourselves to do more than we can. There are still many, many, many words to be added that do not involve ambiguities or grammatical complications. The terminology about writing and grammar can be every bit as complicated and full of jargon as the terminology about any other specialized area of study. If you contribute the words from your own areas of study and interest, that contribution will be just as valuable, and perhaps more accurate. Old words (like "leap") based in Anglo-Saxon are among those which pose the greatest problems for lexicographers. Eclecticology 07:47, 10 Apr 2004 (UTC)

The anticipate article has a good distinction between the words "anticipate" and "expect". This reflects one of the strong points of the 1913 Webster, but I don't know how much help a template can be with this sort of thing. Eclecticology 00:24, 8 Apr 2004 (UTC)

Yes, because 4 of the 5 words are used in the definitions, their connection is clear. However, which meaning(s) is "obviate" a synonym to (rhetorical q)? On the other hand, it later occurred to me that it might be simpler to just put the def number parenthetically in the list, like for the anticipate article "To prevent (1);...". The only downside I can think of is that if defs are added anywhere but at the bottom of the list, the numbers would have to be manually updated.
You have effectively raised the biggest drawback of synonyms, and, by extension, of an entire thesaurus. In most cases there is no such thing as an identical synonym. The words in a list of synonyms give alternatives whose meaning (or mileage?) may vary. A good writer will seek suggestions from such a list, but will reflect on the implications of adapting such a word for his own purposes.
I absolutely must agree that leap is a terrible example for the template. I should have looked at it closely before I gave my previous answer. It was added a mere five days after the Wiktionary project started, and has not kept up with the development of the template since that time. Even my own amendments are more than a year old. Either leap should be seriously revised, or a better example found. A more straightforward word would make a better example for a beginner. Eclecticology 07:47, 10 Apr 2004 (UTC)

Derived vs Related?

I notice a lot of varying practice about what goes under "derived terms" and what goes under "related terms". Maybe this should be explained better? To me I figure "derived" and "related" to have their basic linguistic meanings—e.g. confusion is derived from confuse, bunkbed is derived from bunk (and bed), disk is related to dish and discus, and relations like spoon to fork I would probably put in "see also"—but perhaps I am on the wrong track here? —Muke Tever 14:51, 12 Apr 2004 (UTC)

I would be inclined to agree with you. Unfortunately the precision that some people employ in their use of the language may be a little short of our expectations. This is well reflected in their lexicographical skills. I take comfort from the knowledge that I am neither derived from nor related to these people. :-) If you feel that someone has said "related" when "derived" would be more correct, do feel free to make the change. Eclecticology 21:28, 12 Apr 2004 (UTC)

"Derived" is merely a subset of "related". Do you really want separate headings for each different type of relationship? And do you want to be the one to figure out ambiguous relationships and say which was derived from which? In my opinion uniformity is best. We should work out one set of headings to be used everywhere. If we choose to separate "derived" and a bunch of others from "related" that's great. But haphazardly changing a few of the easier pages in the meantime probably isn't a great help. — Hippietrail 02:24, 25 Apr 2004 (UTC)

The purpose of "derived" is to list compounds and simple derivations, while "related" is for more distant and less transparent etymological relations. This division is from the Wiktionary:Entry layout explained and debating its usefulness probably belongs in Wiktionary talk:Template. I should say though that I don't entirely understand your objection. —Muke Tever 12:39, 25 Apr 2004 (UTC)
It just seems a bit messy to have one specific heading and one generic heading - especially when it comes to back-formations and other non-obvious relationships. For instance, most people wouldn't realize that "burgle" is derived from "burglar". Also by "derived" do we mean that these words came into existence after the headword or just that these words consist of the headword plus derivational morphemes? Now if we do have both, since derived terms are a type of related terms, then "Derived" should be a subheading under "Related" — Hippietrail 12:51, 25 Apr 2004 (UTC)
If most people wouldn't realize that burgle is derived from burglar, it's Wiktionary's job to make it so (we are a dictionary, after all) and if someone makes a mistake in relating a term, someone who knows better will come along and fix it (it's the wiki way™).
Now perhaps I'm seeing the terminology slightly differently than you are, because by "derived" I expect what may be called "daughter terms", and by "related", "sister terms", which probably contributes to not being able to see them as the same thing, or one as a subset of the other. —Muke Tever 13:20, 25 Apr 2004 (UTC)
I mostly agree with Muke on this. Derived terms show what that word has led to; the ancestry of the term is considered under etymology. "Related" links parallel developments. These especially need to be remembered when we are dealing with counter-intuitive back-formations. Uniformity is a wonderful ideal, but if a dictionary is to be essentially descriptive that uniformity must give way to accurate description. The contrary might be true if we were following a philosophy of prescriptive dictionaries. A lot more work needs to be done in our own understanding of the language before we can start telling people how to use it.
It is worth remembering that there was a general concept that underlay the structure of the template for a word in any given language. The template reduces to three general parts: what leads up to the word (including etymology, pronunciation and spelling variants), the word itself (notably its definitions), and how the word relates to its environment (including derivatives, synonyms and translations). In one sense Hippietrail is right. The derived words are a subset of the related words, but the bigger set seems too broad to be useful. Eclecticology 18:06, 25 Apr 2004 (UTC)

Usage notes

We ought to fit these in, to clarify situations where words are sometimes misused, and give words that should be used instead. -- Smjg 11:52, 27 Apr 2004 (UTC)

Go ahead. That category was not a part of the original template, but its need became apparent as more of the Webster entries were added. Some adjustments may still be needed to existing articles. Eclecticology 16:42, 27 Apr 2004 (UTC)

Proposed international template

Aloha! I intend to use the template below on the Hungarian Wiktionary. It has these features:

  • handles words in more than one language (1st level header)
  • handles nouns/verbs/etc (2nd lvl)
  • handles more than one meaning in the same language (3rd lvl)
  • every meaning is followed by its info, namely:
    • its meaning and thesaurus entries, example sentence,
    • etymology (history of the word)
    • whatever you come up with related to the word
  • translations in every language.

Example (fictional, translated to English):

(word: Hold)

== Hungarian ==
=== Noun ===
  1. meaning explanation; synonyms; examples in italic
    • etymology: it originated from the word "template" (Greek)
    • base word: whatever
    • Translations:
  2. another meaning; its synonyms; its examples
    • fluff about this meaning
    • Translations:
=== Verb ===

...

== English ==
=== Verb ===
  1. to hold something, meaning, synonyms, examples

...etc, same kind of stuff.

en:Hold de:Hold ... same words in other language wiktionaries (interlang links on top or bottom of the page).


Basically it is not compulsory to use # and * to create numbers and hierarchical display, but I'm an HTML guy and love order. The same layout can be created manually by writing the numbers and bulleting or blockquoting the info associated with the words/meanings.

This system is logical. It shows every language and every meaning, contains links to translated words in the same and the native wiktionaries, and contains interlanguage links to the same words in the native wiktionaries.

Judge whether it's user friendly or not. Send me feedback. Thanks. --grin 11:12, 12 May 2004 (UTC)

Ultimately, the Hungarian Wiktionary is autonomous about its own rules. Still I wonder why you would move the etymology to a place after the definition. The definitions are, of course, the heart of any listing. My tendency has been to show any material that contributes or leads to an understanding of the word before the definition; the material that follows derives from the word and depends on its meaning.
The "#" in the definitions provides soft numbering so that these numbers can be changed automatically when a further meaning is included. Eclecticology 18:43, 13 May 2004 (UTC)
I noticed the active debate about the order, and it is not accidental that I did not comment on it: I do not feel qualified enough to decide. Both options suit me. I probably put etymology after the meaning because I suspect we (hu:) won't have much etymological material in the foreseeable future. Another reason could have been that (I thought that) separate meanings are the way to differentiate between separate origins of the word... I mean, if there are two words that look the same but mean completely different things, I thought they might have different etymologies. Can they? In such a case it seems logical to put etymology after the meaning.
Since dictionaries put etymology before the meaning I can accept that, too. I would change Hungarian 'dog' accordingly (that's our template article) but you see, I do not have the etymology of dog.... *sigh*
I take this as positive feedback about the template. :-) --grin 09:04, 14 May 2004 (UTC)
The problem of identical words with different etymologies is probably a much bigger issue in English than in most other languages. Fowler's English Usage gives "calf" as an example. Its meanings as a part of the leg and as a young bovine have completely different and unrelated origins. Each of these etymologies gives rise to its own range of words. However distinct Hungarian may be from everything around it, it is bound to have been influenced by its immediate Romance, Slavic and Germanic neighbours. The extent to which their words may have been adapted to the Hungarian language is a part of Hungarian history.
You can't be expected to know everything, and tracing a word's ancestry can be difficult. You do what you can. If you can't fill in a word's etymology, leave it blank for another person to do. Eclecticology 20:29, 14 May 2004 (UTC)
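
To illustrate the "soft numbering" point raised earlier in this thread, here is a minimal Python sketch. It is not MediaWiki's actual renderer; the function name `render_definitions` is hypothetical and the behaviour is simplified to a single list level. It only shows why definitions marked with "#" do not need renumbering by hand when a new meaning is inserted.

```python
def render_definitions(lines):
    """Number every line that starts with '#', in order of appearance.

    A toy stand-in for wiki soft numbering (assumption: one list level only).
    """
    rendered = []
    counter = 0
    for line in lines:
        if line.startswith("#"):
            counter += 1
            rendered.append(f"{counter}. {line[1:].strip()}")
        else:
            # Anything that is not a '#' item is passed through unchanged.
            rendered.append(line)
    return rendered


entry = ["# first meaning", "# second meaning"]
print(render_definitions(entry))

# Inserting a meaning in the middle renumbers the later ones automatically.
entry.insert(1, "# newly added meaning")
print(render_definitions(entry))
```

With hand-written numbers, the same insertion would require editing every later item.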

Recent talk

Proposal: Part of speech headers to include word

Rather than repeating a word under the part of speech section like so:

===Noun===
'''May'''

I propose that it follow the part of speech, like "===Noun: May===". This saves a line for all of these sections, looks much better, and makes the proper capitalization and spelling viewable from the table of contents. It is also much more accurate in format, because every one of these appears right below the section title, entwined with it as though it were a title itself. Also, if the decision is not to have case-sensitivity for page names, then this would show the proper capitalization in the table of contents and allow unique sections for words of different capitalization. What are the disadvantages of this? - Centrx 22:41, 15 Jul 2004 (UTC)

One disadvantage of this is that it makes it harder to link to sections from other pages. The link [[May#Noun|May]] will direct me to the "Noun" section of "May". However, this will not work if the section is actually headed "Noun: May". Of course, the link could be changed to [[May#Noun:May|May]] (all those "May"s) but you would have to visit the link first to see which system has been used.
Another disadvantage (as is the case for any change of this sort to the format) is that it requires changes to many hundreds (if not thousands) of pages and tends to be carried out very slowly as pages come to be edited, rather than wholesale. — Paul G 09:04, 7 Sep 2004 (UTC)
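
The linking concern above can be made concrete with a small Python sketch. The helper `section_link` is hypothetical, and it does not model how MediaWiki normalizes spaces or punctuation in section anchors; the "Noun:May" fragment is simply taken from the comment above. It shows that the link text an editor must write depends on which heading scheme the target page happens to use.

```python
def section_link(page, heading, label=None):
    """Build a wikilink of the form [[page#heading|label]]."""
    label = page if label is None else label
    return f"[[{page}#{heading}|{label}]]"


# Heading scheme in use at the time: ===Noun===
current = section_link("May", "Noun")

# Proposed scheme: ===Noun: May=== (fragment written as in the comment above)
proposed = section_link("May", "Noun:May")

print(current)   # [[May#Noun|May]]
print(proposed)  # [[May#Noun:May|May]]
```

While both schemes coexist across pages, an editor cannot tell which of the two links is valid without opening the target page.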

Pronunciation Format

The template page directs us to format pronunciation as a definition list, but all of the example articles use unordered lists. Which is correct? Looks like the template page should be updated. —Mzajac 17:08, 30 Aug 2004 (UTC)

There is no need to have the pronunciations numbered. The order of the systems should not matter. Eclecticology 00:56, 31 Aug 2004 (UTC)

The template page and the FAQ do not account for words with multiple pronunciation/definition pairs, such as content ("open content" vs. "a content person") and he ("he is a person" vs. "he is a Hebrew letter"). In adding the Hebrew letter definition to he, I used content as a model. -- Adam Katz 07:47, 17 Nov 2004 (UTC)

You were on the right track. I've restructured that part of he a little, especially to get rid of the double horizontal line from the middle of the Hebrew letter section. Let me know if this is not consistent with your intent. Eclecticology 09:45, 17 Nov 2004 (UTC)
Thanks for the fast response, that should be fine. ... (continued in next section) -- Adam Katz 10:19, 17 Nov 2004 (UTC)

Letter/Character Format

(continued from above pronunciation section) ... I was following the format used for the Greek letters (see alpha or pi), which stick the last/next notation in the pronunciation section. All 23 letters I've entered use that format ... yet I like yours better. This is only really problematic for the few where I bothered to add the pronunciation section. Suggestions? -- Adam Katz 10:19, 17 Nov 2004 (UTC)

I think I'm beginning to see the problem. He represents the only Hebrew letter that can be mixed up with so many things from other languages. There are corresponding English words for "nun" and "shin", but not nearly so problematic. So I looked at the simpler case of vav. In that entry, the part that now reads
  • Letter of the Hebrew alphabet: ו
    • Last: ה
    • Next: ז

should really be under the title ו itself, and the links too should then be to the letters themselves. This tends to explain why we ended up with apparently redundant information. We're mixing up the Hebrew letter with the English name of that letter. I have not looked much at the Greek, but it may have a similar problem. I hope I have not confused you too much; we obviously need your help. Eclecticology 18:08, 17 Nov 2004 (UTC)

Re-did all 22 English entries for Hebrew letters plus all 22 letter entries for Hebrew characters plus the Category:Hebrew letters that links them all together. I ripped out all last/next notation from each letter and put it in the character and linked them both together. See also my next note. -- Adam Katz 05:58, 18 Nov 2004 (UTC)

I have run into issues of inconsistent naming schemes for alphabets; see Category:Hebrew letters and Appendix:Roman script, not to mention Wiktionary:English index. Greek points at an empty Greek alphabet. User talk:219.173.119.31 just made the Arabic alphabet based on my initial Hebrew convention. -- Adam Katz 01:12, 18 Nov 2004 (UTC)

Format for idioms

What is the proper way to put idioms in a Wiktionary definition? Is there a standard, or has a precedent been set? I feel "Idioms" should be a separate section, distinct from "Usage" which should deal more with grammar issues. The question then is how to format the Idioms section. I think, at least, the idiom should be in a bold font, in order to mimic that of a dictionary entry, as it's essentially a separate definition. A format I tried can be found at Matter (September 14, 2004 03:12 version). The em dash is pleasing, but it is cumbersome to type &mdash;, so a simpler format would certainly be welcome. Suggestions? --Brim 03:39, 14 Sep 2004 (UTC)

Well it depends what you mean by "idiom". This term seems to be used in different ways. A single word can have idiomatic senses along with its straightforward senses. Some people just think of idiomatic phrases as idioms. Because of the fuzzy terminology I've just been putting everything under "Related terms", but then there is also "Derived terms" - and that is also fuzzy. This is obviously inadequate but it seems difficult to come up with a good way of categorizing the various relationships between one word and other words and phrases, including idioms. I'd love to hear more discussion on this topic please! — Hippietrail 01:09, 15 Sep 2004 (UTC)
I was referring more to standard idiomatic uses of common words. See the example under Matter, in which several phrases containing the word are used in ordinary everyday speech but have special meanings which can't be learned from reading the dictionary definition of the main word. For example: "as a matter of fact", "for that matter", and "no matter". These should almost have entries of their own, but since they're technically idioms, I placed them as such in their own category. I was hoping we could come up with some sort of standard so that idiomatic uses of other words could be formatted similarly to each other. —Brim 01:48, 15 Sep 2004 (UTC)
Yes all of those phrases should have their own entries. If "Idioms" is to be a separate heading, it should come beneath "Derived terms" and "Related terms". They should not be capitalised. They should not be defined on the page but merely linked to, and then defined on their own page. — Hippietrail 02:33, 15 Sep 2004 (UTC)

Comparative and superlative

Seems to me, we should recommend always giving the comparative and superlative forms of adjectives, even if they're only "more X" and "most X". See, for example, what I've done with Reprehensible. - dcljr 05:33, 30 Oct 2004 (UTC)

Part of Speech "Additional considerations:" - about what?

I don't understand what the "Additional considerations:" are additional to. Are they regarding how to write Parts of Speech section headings, or generally how to write definitions, or something else? This should be clarified. JesseW 20:09, 5 Nov 2004 (UTC)

Revised. I hope it helps. Eclecticology 21:24, 7 Nov 2004 (UTC)

What is "Spelling example"?

"Spelling example" is used multiple times on the page, but it's never explained. It appears to be the line of the form: "'''{the word}''' ({plurals, comparatives, etc.})", but I'm not sure. Also, if it is this line, it should definitely get its own section, because its format is not obvious - when do you use plurals, superlatives, etc., what about alternate spellings, etc. JesseW 20:20, 5 Nov 2004 (UTC)

I agree. Various people have contributed to this, and have attempted to say the same thing in several different ways. That's bound to lead to confusion. The page needs editing and pruning. Eclecticology 21:33, 7 Nov 2004 (UTC)
I would be happy to work on it, but I don't know what "spelling example" means. :-) Should I just go with my impression I gave above, or is that wrong? JesseW 05:14, 8 Nov 2004 (UTC)
I would abandon the term completely. Your guess about what it means seems right. Maybe "inflections" might be more to the point.
Alternate or variant spellings are something else which should go in an earlier section in the article. Eclecticology 14:56, 8 Nov 2004 (UTC)

"A very simple (copyable) template"

Sorry for nit-picking, but isn't it spelt copiable? I didn't edit it because a Google search shows copyable being used a LOT (more than copiable), and I thought maybe it was an American spelling or something... or are both acceptable? :S

Hyphenation?

What about hyphenation points? All real dictionaries have them! There's nothing in the documentation or the sample templates that I could find on how to indicate hyphenation (although a few actual entries seem to use an ad-hoc scheme). One fine point: I've heard that Australians hyphenate words differently than USers.

I've avoided hyphenation myself because I don't feel I'm well enough acquainted with the "correct rules". Usually we all know how many syllables there are but it's a bit fuzzier to decide which consonants in clusters belong to the preceding or following syllable. I'm sure there are rules for this but I'm not sure who decides them - perhaps there are several sets of rules in different countries or from different ages or created by different people. Since I don't know such details I've left this area alone.
I also feel that what we've got currently is ad-hoc and I doubt whether the people entering them have followed specific rules. It might even be a good idea to tag them if they've been properly researched so users can see which are "official" and which are "ad-hoc" — Hippietrail 23:47, 9 Dec 2004 (UTC)
I agree with Hippietrail's approach on this, and tend strongly to avoid them unless a hard hyphen is required. Hard and fast rules for hyphenation should be avoided. It works much better to have general rules that can be modified by common sense. BTW, Oxford spells the expression "ad hoc" without a hyphen. Eclecticology 19:14, 11 Dec 2004 (UTC)

distinct words under the same token

I've been trying to find something in the help section to assist with the problem of words that are distinct and unrelated (each with its own etymology, and sometimes its own pronunciation), but are necessarily in the same entry as they are the same combination of letters.

Consider the heteronym sake.

Here we have two distinct words. The first, a much older word in English, is entirely distinct from the second, which has more recently been taken from Japanese. To simply include both words under noun as separate senses doesn't seem like a great idea, because each is an entirely different word, with a different pronunciation, etymology, and set of translations.

Yet I'm not satisfied with the way I tried to solve the problem. It seems like there should be a way to distinguish the words at the base level, but that would require revisiting the very template that the wiktionary uses for its entries.

Consider how much more sensible this is:

English

sake1

Noun

  1. The benefit or survival of something.
  2. etc.

Etymology

blah blah

Translations

Some stuff.

sake2

also saké or saki

Noun

  1. Japanese booze.

Etymology

Japanese: 酒

Translations

etc.

English continued

Yes almost the same format has been used by others in other articles but without superscripting the etymology number.
However I don't think this is flexible enough. In this case the two words have the same spelling but different pronunciation and different etymology - the neatest case. In other cases two "words" have the same sound but different etymology, or the same etymology but a different sound. It is even within the realm of possibility that there exist words which share etymology and sound but have diverged into very distant meanings - though none spring to mind. Another more subtle but more common case for us is that the sound may or may not be different, but we just don't have etymologies yet. I have seen articles which boldly declare "etymology 1" and "etymology 2" without having actual etyms under those headings. The author has decided this division will "always work", but has not done the research to see that this is in fact true.
What I think we should do is to collect some words for each case above, probably in a category somewhere, then look at how various dictionaries format their entries. Does anybody know if there are terms for when two words share spelling but a) not pronunciation, b) not etymology, or combinations thereof? That will give us names for the categories. — Hippietrail 14:07, 11 Dec 2004 (UTC)

Jun-Dai 14:37, 11 Dec 2004 (UTC):

I think it should be based on etymology. That would make the cleanest case (and I think that's the way most dictionaries do it). For each etymological case, you can have a multitude of senses as well as different parts of speech.
Also, the answer for (a) is heteronym. Not sure about (b), though. Also, most heteronyms are different in their parts of speech. Compare conTENT (verb or adjective) and CONtent (noun) or afFECT (verb) and AFfect (noun). In most cases, I think the noun is stressed on the first syllable and the adjective/verb on the second. Many heteronyms share the same etymology (though not content).
In that case I propose Category:English heteronyms and Category:English words with same spelling but different etymology. Words which do not yet have articles can simply be listed in the category pages themselves for now.
We should try to do a bit better than assuming and thinking and guessing. Also we shouldn't base our decisions on "most" since it's the uncommon cases which need to be considered to find the best solutions in a long-term project like Wiktionary. — Hippietrail 15:00, 11 Dec 2004 (UTC)
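
As a rough illustration of the two categories proposed above, the following Python sketch assigns a pair of same-spelling words to categories based on whether their pronunciation and etymology match. The function name `proposed_categories` and its boolean parameters are hypothetical; only the category names come from the proposal itself.

```python
def proposed_categories(same_pronunciation, same_etymology):
    """Return the proposed categories for two words that share a spelling."""
    categories = []
    if not same_pronunciation:
        categories.append("Category:English heteronyms")
    if not same_etymology:
        categories.append(
            "Category:English words with same spelling but different etymology"
        )
    return categories


# sake, as described in this thread: pronunciation and etymology both differ,
# so it would land in both categories.
print(proposed_categories(same_pronunciation=False, same_etymology=False))

# A pair that differs only in pronunciation would be a heteronym only.
print(proposed_categories(same_pronunciation=False, same_etymology=True))
```

A pair that shares both pronunciation and etymology falls into neither category, which matches the case described above as possible but hard to exemplify.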
Jun-Dai 15:54, 11 Dec 2004 (UTC):
So for the long term, what should we do for the formatting of entries that contain multiple distinct words, like sake and bass? After all, the only thing that bass-as-in-fish and bass-as-in-low-pitched have in common is spelling. It seems logical to separate words of different etymologies but the same spelling at the highest possible level (i.e., split it directly under the language and before the part of speech). Obviously this needs more thought, but so far that seems like the sensible arrangement, with the current arrangement as I'm seeing it in the various entries being a bit muddy (where bass-the-fish is just listed as the 4th sense of a word, the first three being bass-as-in-low-in-pitch), because it will require a split in the etymologies and translations, and in this case, pronunciation as well. i.e., it's easier to go:
bass1
pronunciation(s)
etymology
definition
translations
bass2
pronunciation(s)
etymology
definition
translations
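
The outline above can be sketched as a data structure. This is only one possible reading of the proposal, written in Python; the class names (`Sense`, `EtymologySection`, `LanguageEntry`) are hypothetical, and the etymology strings are placeholders rather than researched facts. The split by etymology sits directly under the language, and parts of speech live inside each etymology section.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    part_of_speech: str
    definition: str


@dataclass
class EtymologySection:
    etymology: str
    pronunciations: list = field(default_factory=list)
    senses: list = field(default_factory=list)
    translations: dict = field(default_factory=dict)


@dataclass
class LanguageEntry:
    language: str
    spelling: str
    sections: list = field(default_factory=list)

    def headings(self):
        # One numbered heading per etymology, e.g. bass1, bass2.
        return [f"{self.spelling}{i}" for i, _ in enumerate(self.sections, start=1)]


bass = LanguageEntry(
    language="English",
    spelling="bass",
    sections=[
        EtymologySection(
            etymology="(placeholder, not yet researched)",
            senses=[Sense("Adjective", "low in pitch")],
        ),
        EtymologySection(
            etymology="(placeholder, not yet researched)",
            senses=[Sense("Noun", "a kind of fish")],
        ),
    ],
)

print(bass.headings())  # ['bass1', 'bass2']
```

Because pronunciations and translations are stored per etymology section, a pronunciation shared by two sections would simply be duplicated in each.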


There isn't that much grey area where words with different etymologies and the same spelling are confused. Words with different etymologies will almost always have quite different meanings and different translations (I can't think of an exception), and so they should really just be considered different words, unless there is a very clear case of converging etymologies forming a single definition (I can't think of a single case).
Something much like this format seems good, I agree. But we already divide at part-of-speech boundaries. What can happen is that two parts of speech share an etymology but differ in pronunciation, but then another etymology comes along and shares one of the pronunciations. I've seen something very much like this happen but can't recall the word.
Also, remember that we're dealing with all languages, many of which have far fewer syllable possibilities than English. Chinese and Polynesian languages spring to mind. We try to keep the format for entries in all languages as close as possible. So a solution to the varying etym/pron/part-of-speech-meaning combinations should be general and work for all languages, otherwise we're just going to have to solve the problem again later anyway.
I think the kind of thing to keep an eye out for with converging etymologies is re-analysis, where sound change makes unrelated words sound more similar, and then people start to believe those words are related. So it is surely very rare for total synonyms but not so much for words with close meanings. Also, exact spelling matches are more likely to occur in single syllable words. So while I can't think of one off the top of my head, I'd expect them to exist - probably those of us without a keen interest in etymology haven't spotted them anyway just because we do believe they are more closely related than they are! — Hippietrail 16:05, 11 Dec 2004 (UTC)
<Jun-Dai 16:33, 11 Dec 2004 (UTC)>True, but we can easily note where pronunciations differ based on the sense or part of speech, or where the pluralization differs, etc. (consider: "ten head of cattle"). These distinctions are not as important as an etymological distinction, where the meaning of the word (and translations of the word) are generally substantially different. If the pronunciation is the same as another etymology of the same token, it isn't that important, we can duplicate the pronunciation for the second entry/etymology, since it is more of a coincidence than anything.
The other point you make is interesting. While I can understand the value in keeping the formats similar/consistent between languages, it seems to me that it is also important to recognize that different languages can't always share the same format, as the basic principles of the language differ. This relates to the discussion we're having at かく. Many simple letter combinations in Japanese are really just the spelling out of a pronunciation of some kanji, kanji compound, kanji+okurigana, or kanji as used in some compound. This justifies the existence of a furigana section for the hiragana entries where kanji exist for those words, even though this is a concept that doesn't really exist in any other language that I know of. If we limit ourselves to formatting that applies to English words and phrases, then we are crippling our ability to be a useful English-language reference of Japanese words.</Jun-Dai>
It's true that we must accommodate different languages differently. But the way to achieve this is by developing a flexible framework which stays mostly the same for all languages and has ways of extending for those languages which require more, rather than just saying "this doesn't fit so let's do a whole new format". (I'm not saying that's what you're doing but people have taken that attitude before).
In one article, I forget which, I did actually reproduce the pronunciation section for exactly this reason. I'll try to use my parser to find all the articles with multiple pronunciation or etymology sections later.
Print dictionaries, at least small format ones, usually do not divide words by part of speech and they have a two-level distinction of sense (I forgot the term they use for the other level). So you will have separate headwords "foo1" and "foo2", several senses for each headword, verbs and nouns in together sometimes, merely with a "v" or "n" prefix. I looked at one OED a few months ago to see how they handled some of this and they basically just stuck a new pron in between the senses. Their concepts of sense, etymology, and pronunciation were not tied together. But my notes are elsewhere, I didn't know all the best examples to look up, different dictionaries solve the problems differently, and even different editions of the OED have different solutions. Not to mention that different languages can have even more needs (Arabic and Hebrew have their own challenges).
So yes, let's not limit ourselves, but let's also try to preserve as much standardization of formatting and terminology (etc.) as we can. Now it's really time for bed (-: — Hippietrail 16:51, 11 Dec 2004 (UTC)

I certainly support general attempts to standardize formatting for both English and foreign entries, but the standards still need to be adaptable to changing circumstances when that is appropriate. One of the reasons why the etymology heading should take precedence over the part-of-speech headings is that it gives a broader basis for subdivision. Pronunciation comes next, then the parts of speech. A new pronunciation will lead to its own new set of headings. "Sake" is an easy case because the separate origins and pronunciations align easily with each other.

In some cases we can put the pronunciation before the etymology, but that gives the distinct message that the word is always pronounced the same for all etymologies. A word like "calf" with a constant pronunciation but two very distinct etymologies would fall into that category. "Lead" has two distinct pronunciations and etymologies; "read" has two pronunciations arising from the same etymology and part of speech. Eclecticology 19:58, 11 Dec 2004 (UTC)

Thanks for the good analysis, Eclecticology. I think we should make this the official new structure. Each "headword" can have one or more etymology headings. Each etymology can have one or more pronunciation headings (but there can be multiple variant pronunciations per heading). Each pronunciation heading can have one or more part-of-speech headings. For headwords which do not fit this pattern, sections can appear twice if they have to.
This leads us to the next question: Since heading levels show the structure (as shown in the TOC, as interpreted by my wiktionary parser), we should really change the heading levels to reflect this new structure. Currently we have:
==Language
===Pronunciation
===Etymology
===Part of speech
====Synonyms/Antonyms/Derived/Related/Translations
What we should have is:
==Language
===Pronunciation
====Etymology
=====Part of speech
======Synonyms/Antonyms/Derived/Related/Translations
But does the wiki software allow level-6 headings?!
The other issue is how to arrange the headings before etymologies have been established. Maybe in those cases we should retain the current "unstructured" heading hierarchy. — Hippietrail 05:40, 12 Dec 2004 (UTC)
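
The link between heading levels and entry structure discussed in this comment can be sketched in code. What follows is only an illustration under stated assumptions, not the parser mentioned above: it assumes standard MediaWiki heading syntax (`==Title==` with matching runs of equals signs, levels 2 through 6), the function names `parse_headings` and `outline` are hypothetical, and the sample entry nests pronunciation under etymology as in the structure proposed at the start of the comment.

```python
import re

# Matches a MediaWiki-style heading line such as "===Etymology 1===".
# The backreference \1 requires the closing run of "=" to equal the opening run.
# Assumption: only levels 2 through 6 are considered.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def parse_headings(wikitext):
    """Return a list of (level, title) pairs, one per heading line."""
    headings = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def outline(headings):
    """Render headings as an indented outline, two spaces per level below 2."""
    return "\n".join("  " * (level - 2) + title for level, title in headings)


# Hypothetical sample entry, for illustration only.
SAMPLE = """\
==English==
===Etymology 1===
====Pronunciation====
=====Noun=====
======Translations======
===Etymology 2===
====Pronunciation====
=====Noun=====
"""

print(outline(parse_headings(SAMPLE)))
```

Read this way, the deepest heading in the sample (Translations) sits at level 6, which is why the question of how many levels the software supports matters for this layout.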

I think the system does allow for level 7, but you only need level 6 in your example where you completely forgot level 3. Generally heading levels need to be flexible; there should be no need to deepen a level unless doing so reflects some basis for division. Pronunciation is a remarkably unstable feature for differentiations. We can easily spot the obvious ones that matter, but beyond that it's too easy to fall into the murky pool of dialect differences.

Thanks - fixed! — Hippietrail 13:51, 12 Dec 2004 (UTC)

My proposal was really to put etymology ahead of pronunciation in the hierarchy for most cases, except for words with multiple etymologies which retain the same pronunciation in all cases. Although I don't advocate splitting articles, it is still conceivable that this could become necessary for some words in the distant future. In such circumstances etymology would be a natural break point.

If we don't know the etymology it can be left blank. Two separate etymology headings can still be used if we don't know what they are but at least know that there exist more than one. Eclecticology 13:10, 12 Dec 2004 (UTC)

The problem is it's very hard to guess with regard to etymology. We can see that a word has two different pronunciations and/or two different meanings, and still have no idea, or make false assumptions about etymologies based on this - it is very very easy to be wrong when guessing in the field of etymology. — Hippietrail 13:51, 12 Dec 2004 (UTC)
This may not be as much of a problem as you suspect. If we don't know anything about the etymology we can put the heading and simply leave its contents blank. If we suspect that there is more than one etymology we can still have "Etymology 1" and "Etymology 2" etc. to attest to that simple fact even if they both remain blank. The wonder of the wiki is that someone else can come along later to fill in the blanks.
As an aside, I've used numbered etymology headings from the start. It has been a practical approach, but if anyone has an alternate solution for how we phrase these headings that's less ugly, that would be great. These should be short and easily remembered by anyone. Eclecticology 18:54, 12 Dec 2004 (UTC)
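
As a rough illustration of the structure discussed in this thread (etymology first, then pronunciation, then part of speech, with numbered etymology headings that may stay blank), here is a minimal sketch. The class names, the `render` function, and the sample definitions are all hypothetical. Leaving a lone etymology heading unnumbered is an assumption of this sketch, not something settled in the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class PartOfSpeech:
    name: str  # e.g. "Noun"
    definitions: list = field(default_factory=list)


@dataclass
class Pronunciation:
    transcriptions: list = field(default_factory=list)  # variants share one heading
    parts_of_speech: list = field(default_factory=list)


@dataclass
class Etymology:
    text: str = ""  # may stay blank until someone fills it in
    pronunciations: list = field(default_factory=list)


def render(language, etymologies):
    """Render an entry as wikitext-style lines.

    Assumption: etymology headings are numbered only when there are several.
    """
    lines = [f"=={language}=="]
    numbered = len(etymologies) > 1
    for index, etym in enumerate(etymologies, start=1):
        title = f"Etymology {index}" if numbered else "Etymology"
        lines.append(f"==={title}===")
        if etym.text:
            lines.append(etym.text)
        for pron in etym.pronunciations:
            lines.append("====Pronunciation====")
            lines.extend(f"* {t}" for t in pron.transcriptions)
            for pos in pron.parts_of_speech:
                lines.append(f"====={pos.name}=====")
                lines.extend(f"# {d}" for d in pos.definitions)
    return "\n".join(lines)


# Illustrative data for "bass"; both etymology texts are deliberately left blank.
bass = render("English", [
    Etymology(pronunciations=[Pronunciation(
        transcriptions=["/beɪs/"],
        parts_of_speech=[PartOfSpeech("Noun", ["A low-pitched sound or voice."])],
    )]),
    Etymology(pronunciations=[Pronunciation(
        transcriptions=["/bæs/"],
        parts_of_speech=[PartOfSpeech("Noun", ["A kind of fish."])],
    )]),
])
print(bass)
```

An `Etymology` with empty `text` still produces its heading, which matches the idea that the heading can attest that a separate origin exists before anyone has written it up.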