Wiktionary:Etymology
This is a Wiktionary policy, guideline or common practices page. This is a draft proposal; although it is not official, it is likely to be reasonably widely accepted.
General
Etymology is the study of the origins of words. The vocabularies of modern languages come from a variety of different sources: some words have evolved from older words, others have been borrowed from foreign languages, and some have been named after people, developed from initialisms, or even been deliberately invented by a particular author. Etymology sections in entries of the English-language Wiktionary provide factual information about the way a word has entered the language and usually some sense of its semantic development.
Brief
Etymology sections should not be too verbose, particularly because they appear before the definitions; usually a simple list of previous forms is all that is required.
Some words may also benefit from further details, such as cognate words in related languages, or some illustrative comments.
There is currently no standard for longer discussions of etymology.
Lemma
Include the full etymology on the main entry (the lemma), even if historically it derived from another form, such as by back-formation.
Folk etymologies
Do not include debunkings of incorrect etymologies (folk etymologies and the like). These can be discussed on the entry’s “Talk” page, but should not go in the “Etymology” section: among other reasons, they are long and distracting, and they are unnecessary once a correct etymology is given.
Surface etymologies
Etymologies trace the historical development of words; they are not simply an analysis of their current (“surface”) forms. For example, astrology comes from Ancient Greek ἀστρολογία (astrologia), though its surface form can be analyzed as astro- (“stars”) + -logy (“study of”), as the components are valid English combining forms. Conversely, biology does not come from an Ancient Greek term, but is rather a classical compound, coined c. 1800. Analyses of surface forms are of value, but do not replace and should not be confused with an account of historical development.
Phrases, Compounds, Acronyms, and Abbreviations
For a term that is composed of base words separated by spaces or hyphens, do not add an etymology that just notes the base words. This is better shown by wikilinking the term in the inflection line. The term computer language, for instance, currently has no etymology, and its base words are linked using the inflection-line template {{en-noun|sg=[[computer]] [[language]]}}. Similarly, the etymology of acronyms or abbreviations is simply the definition, and no separate etymology is necessary.
Conversely, for compounds – a single word without spaces or hyphens, such as endgame – a brief etymology section using {{compound}} is useful, because wikilinking the components in the inflection line does not distinguish them (the word would just appear as a single clickable link).
However, if some etymology would prove useful, it should be included – for example, history of usage or coinage, such as SNAFU (coined during WWII for the chaos of war), or explanations of set phrases or idioms such as hair of the dog (a hangover cure, from a folk remedy for rabies), or the origin of a proverb or word that can be traced to a particular source or sources, such as fortis Fortuna adiuvat (“fortune favors the bold”).
Layout
Words originate in numerous ways, including inheritance, borrowing, and word-formation mechanisms, and are then subject to processes of lexical change, notably sound change and semantic change. These should be formatted in conventional ways, as detailed below.
Inherited words
The so-called ‘native’ or ‘inherited’ words form a significant category of words in a language; in some languages, but not all, they form the majority. These words have developed from an earlier form of the language, which may or may not have gone by the same name. Some of these ancestor languages were written down and are well attested, but others are not. For example, French, Spanish, Italian, Romanian and Portuguese all developed from Latin. The French word clef, for instance, and the Spanish word llave both evolved from the Latin word clāvis (“key”) (they are cognates). They were not borrowed from Latin; rather, Latin evolved naturally into different forms in different areas.
The ancestors of English are, in order: Middle English, Old English, Proto-Germanic, and Proto-Indo-European. Native words are those that came from these ancestors without at any stage being borrowed from a different language or re-borrowed from an ancestor at a later time.
One should show the complete sequence of ancestors, not just the immediately preceding form.
For native words, one can show the sequence of ancestors in the following way, as in father:
===Etymology===
From {{etyl|enm}} {{term|fader|lang=enm}} < {{etyl|ang}} {{term|fæder|lang=ang}}
< {{proto|Germanic|fader}} < {{proto|Indo-European|ph₂tḗr}}.
- From Middle English fader < Old English fæder < Proto-Germanic *fader < Proto-Indo-European *ph₂tḗr.
Note the use of {{etyl}} to specify the source language, and {{term}} for the etymon; see Templates, below, for details.
Some editors use the word “from” to separate ancestors, while others use the algebraic “<”, where the arrow points in the direction of language change; there is currently no consensus on a preferred form. The initial ancestor is prefaced by “From” (not “<”), assuming it is different from the current form.
Even if the current form is identical to an earlier form, the same format should be used, so that identically spelled words in the earlier language can be linked. For example, in when, although we don’t yet have an entry for the Middle English word, we may in the future, and thus a link should be created, giving the following format:
===Etymology===
From {{etyl|enm}} {{term|when|lang=enm}} from {{etyl|ang}} {{term|hwænne|lang=ang}}.
- From Middle English when from Old English hwænne.
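The chaining convention above is mechanical enough to sketch in code. The following Python is a hypothetical illustration only: the `Ancestor` type and the `render_ancestor` and `build_chain` functions are not part of Wiktionary or of any bot framework. It assumes an English entry, renders attested ancestors as {{etyl}} plus {{term}}, renders reconstructed ancestors with {{proto}}, and joins them with a configurable separator.

```python
from typing import List, NamedTuple


class Ancestor(NamedTuple):
    """One step in an etymology chain (hypothetical helper type)."""
    lang: str                    # language code such as "enm", or a proto-language name such as "Germanic"
    term: str                    # the etymon to link to
    reconstructed: bool = False  # True -> link with {{proto}} instead of {{etyl}} + {{term}}


def render_ancestor(a: Ancestor) -> str:
    """Render a single ancestor as wikitext, assuming an English entry."""
    if a.reconstructed:
        # {{proto}} takes the proto-language name and the reconstructed form.
        return "{{proto|%s|%s}}" % (a.lang, a.term)
    # {{etyl}} names the source language; {{term}} links the etymon in that language.
    return "{{etyl|%s}} {{term|%s|lang=%s}}" % (a.lang, a.term, a.lang)


def build_chain(ancestors: List[Ancestor], sep: str = " < ") -> str:
    """Join ancestors, most recent first, into a single "From ..." etymology line."""
    if not ancestors:
        raise ValueError("an etymology chain needs at least one ancestor")
    return "From " + sep.join(render_ancestor(a) for a in ancestors) + "."


father = [
    Ancestor("enm", "fader"),
    Ancestor("ang", "fæder"),
    Ancestor("Germanic", "fader", reconstructed=True),
    Ancestor("Indo-European", "ph₂tḗr", reconstructed=True),
]
print(build_chain(father))
print(build_chain([Ancestor("enm", "when"), Ancestor("ang", "hwænne")], sep=" from "))
```

The first call produces the father wikitext on a single line, and the second, with `sep=" from "`, produces the when example. The separator is a parameter because there is no consensus between “from” and “<”.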
Reconstructed terms
In some cases, the ancestor languages were not recorded. The common ancestor of English, German, Swedish and Dutch, which was spoken around the same time as Classical Latin, was not written down. We call it Germanic or Proto-Germanic, because it developed into the various Germanic languages, of which English is one. Many words from this language can be inferred with great confidence by comparing the surviving forms in daughter languages. Such words are conventionally written with an asterisk before them to indicate that they are hypothetical.
Reconstructed forms in attested languages are treated as normal, with a * in front of the term (and outside the {{term}}), as in:
Vulgar Latin *{{term|dente}}
which yields:
- Vulgar Latin *dente
If terms from a reconstructed language are linked, they must use the template {{proto}}.
For non-English entries, one must use the parameter lang=xx to categorize the derivation properly, where xx is the destination language (the language of the entry, not the etymon) – this corresponds to and replaces the second (optional) positional argument in {{etyl}} (destination), not the named parameter in {{term}} (source).
If a reconstructed term is mentioned as a cognate, but not as an ancestor, one must use a blank lang= argument, otherwise the entry will be classed as a derivation.
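The three rules for linking reconstructed terms can be summarized as a small decision function. This is a hypothetical Python sketch, not an existing Wiktionary tool, and `proto_link` is an illustrative name. It assumes, as in the English examples on this page, that English entries omit lang= entirely.

```python
def proto_link(proto_lang: str, term: str, entry_lang: str = "en", cognate: bool = False) -> str:
    """Return {{proto}} wikitext for a reconstructed term (hypothetical helper).

    proto_lang -- proto-language name as used by {{proto}}, e.g. "Germanic"
    term       -- the reconstructed form, without the asterisk
    entry_lang -- code of the entry's language (the destination), e.g. "nl"
    cognate    -- True if the term is cited as a cognate rather than as an ancestor
    """
    if cognate:
        # A blank lang= keeps the entry out of the derivation categories.
        return "{{proto|%s|%s|lang=}}" % (proto_lang, term)
    if entry_lang == "en":
        # Assumption: English entries need no lang= (as in the father and hound examples).
        return "{{proto|%s|%s}}" % (proto_lang, term)
    # Non-English entries: lang= names the destination language, not the etymon's language.
    return "{{proto|%s|%s|lang=%s}}" % (proto_lang, term, entry_lang)


print(proto_link("Germanic", "hundaz"))                   # ancestor in an English entry
print(proto_link("Germanic", "hundaz", entry_lang="nl"))  # ancestor in a Dutch entry
print(proto_link("Germanic", "hundaz", cognate=True))     # cited only as a cognate
```

Checking the cognate case first reflects the rule that a blank lang= applies whenever the reconstructed term is not an ancestor, whatever the entry's language.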
The inclusion of cognate words in related languages is particularly useful for inherited words, since they show how the same original form has developed in different daughter languages.
Example
In the entry hound:
===Etymology===
From {{etyl|ang}} {{term|hund|lang=ang}}, from {{proto|Germanic|hundaz}}.
Cognate with Dutch {{term|hond|lang=nl}}, German {{term|Hund|lang=de}},
Swedish {{term|hund|lang=sv}}.
- From Old English hund, from Proto-Germanic *hundaz. Cognate with Dutch hond, German Hund, Swedish hund.
Borrowings
Some words have been borrowed from other languages, either because of a historical occupation or co-existence, or simply through exposure to other languages. For example, the English word chasm is borrowed from Latin chasma, which itself was borrowed from Ancient Greek χάσμα (“a cleft, abyss”). Borrowings can be ancient or recent. When words are first borrowed into a language they may still ‘seem’ foreign; examples in English include Schadenfreude or ersatz. After a while they become more naturalised, like relatively recent French borrowings such as naïve or detour. Eventually they seem completely native, such as leg or table (borrowed from Old Norse and Latin respectively).
Be careful to distinguish Ancient Greek from Modern Greek: use the language code grc for Ancient Greek, as in {{etyl|grc}}, not the code el, which is for Modern Greek.
Key waves of borrowings into English are from:
- Old Norse (non);
- Anglo-Norman (xno);
- French (fr), though many of these came via Anglo-Norman (from Old French (fro)), with the main borrowing from French happening from the 14th century, from Middle French (frm);
- Latin (la; see Latin influence in English), though many of these came via French, and others are classical compounds, which instead of being borrowed are modern coinages based on nativized combining forms (e.g., biology is not borrowed from Ancient Greek, but is coined from bio- + -logy);
- Ancient Greek (grc), though many of these came via Latin (and often then via French), and others are classical compounds.
However, in modern times especially, English has borrowed from a great many languages.
Differentiate borrowings
If any step of a word’s history is a borrowing, this step should be flagged as such; in English, any word that did not come down through Middle English from Old English, Proto-Germanic, and Proto-Indo-European was borrowed at some stage.
Languages may borrow from an ancestor at a later date: for example, the two Spanish words palabra (“word”) and parábola (“parable”) both come from Latin parabola, but the former was a natural development (hence ‘native’), whereas the latter was borrowed back into Spanish much later (in the fifteenth century in this case).
Example
In the entry parábola:
===Etymology===
Borrowed from {{etyl|LL.|es}} {{term|parabola|lang=la}}, from {{etyl|grc|es}}
{{term|sc=polytonic|παραβολή|lang=grc}}. Compare {{term|palabra|lang=es}}.
- Borrowed from Late Latin parabola, from Ancient Greek παραβολή. Compare palabra.
Borrowed forms
A form of a word may be borrowed, in which case one should say “From Xus, form of X”, where Xus is the form, and X is the lemma.
Note that a form may be borrowed and other forms then created from it by regular formation or back-formation, while in other cases different forms may be borrowed independently, as in the example below: stimulate was borrowed into English from Latin stimulatus, which derives from Latin stimulus, and stimulus was itself also borrowed into English. Stimulus and stimulate are not formed from each other in English by a regular rule or by back-formation.
Example
In the entry stimulate:
===Etymology===
From {{etyl|la}} {{term|stimulatus|lang=la}}, past participle of
{{term|stimulo|stimulō|goad on|lang=la}}, from {{etyl|la}}
{{term|stimulus||goad|lang=la}}.
- From Latin stimulatus, past participle of stimulō (“goad on”), from Latin stimulus (“goad”).
Layout: Word formation
Regular formations
If a word is formed by a regular rule of derivation or inflection, such as adding an affix, it is not necessary to repeat the complete details of the word’s origin on the page for the derived or inflected form: simply show the rule, and leave the full etymology for the lemma.
The templates {{prefix}} and {{suffix}} (with the arguments gloss1 and gloss2 to gloss the components if their meanings are not clear) are useful for this, and place entries into the correct categories (“words prefixed/suffixed with …”), as in the following entry for abstractly:
===Etymology===
{{suffix|abstract|ly}}
Back-formations
Conversely, words that look like a regular formation can have the formation reversed (especially, removing apparent affixes), yielding a new word. This is called back-formation, and the template {{back-form}} helps here.
Back-formation is not to be confused with clipping, which merely shortens a word rather than undoing a formation, and does not change its meaning or part of speech.
Note that back-formations are generally the lemma entry, and should have the full etymology, rather than relegating the earlier etymology to the etymon.
Examples
In the entry greed:
===Etymology===
{{back-form|greedy}}
- Back-formation from greedy.
Compound
A compound word is a word composed of two words, but used as a single unit, like science fiction or school bus. For these, the etymology can simply be {{term|A}}+{{term|B}}.
Consider also using the template {{compound}}.
Blends (portmanteau words)
A blend or portmanteau word is a word which was originally formed by combining two other words. For example, brunch is a blend of breakfast and lunch.
Examples
In the entry brunch:
===Etymology===
{{blend|breakfast|lunch}}.
Coined expressions
In some historically recent cases where words have been deliberately created, we may be able to give details of where and by whom this was done. Where possible, the reasoning behind the coinage should be suggested; note, however, that such reasoning is necessarily conjectural unless it has been documented by the word’s original creator.
If the original coinage is attested, common practice is to include the relevant quotation in the etymology, and link to a source, if possible, as in serendipity or portmanteau word.
Examples
In the entry hobbit:
===Etymology===
Coined by [[w:J. R. R. Tolkien|J.R.R. Tolkien]] in 1937.
Ostensibly from {{etyl|ang}} {{term|holbytla||hole-builder|lang=ang}}.
- Coined by J.R.R. Tolkien in 1937. Ostensibly from Old English holbytla (“hole-builder”).
In the entry chortle:
===Etymology===
Coined by [[w:Lewis Carroll|Lewis Carroll]] in ''[[w:Jabberwocky|Jabberwocky]]'',
apparently as a {{blend|chuckle|snort|nocap=1}}.
- Coined by Lewis Carroll in Jabberwocky, apparently as a blend of chuckle and snort.
Calques
For calques or loan translations it is necessary to provide the source language from which the lexeme, compound or phrase has been calqued. Sometimes the exact source of a calque cannot be established because the calque has spread among several languages, in which case several notable examples should be listed. The language from which the word has been calqued is not to be treated as an ordinary etymological source: the {{etyl}} template should be used with the first parameter specifying the language from which the lexeme is calqued, but the second parameter must be set to '-'. At the end of the entry, the word should be explicitly categorized as [[Category:<Language-name> calques]].
Examples
In the entry antibody:
===Etymology===
{{prefix|anti|body}}, a calque of {{etyl|de|-}} {{term|Antikörper|lang=de}}.
- anti- + body, a calque of German Antikörper.
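The calque rule combines a special {{etyl}} call with an explicit category link. The following hypothetical Python sketch (`calque_etymology` is an illustrative name, not an existing tool) builds both pieces of wikitext; it assumes that <Language-name> in the category refers to the language of the entry rather than the source language.

```python
from typing import Tuple


def calque_etymology(formation: str, source_code: str, source_term: str,
                     entry_lang_name: str) -> Tuple[str, str]:
    """Build the etymology line and category link for a calque (hypothetical helper).

    formation       -- wikitext showing how the word is built, e.g. "{{prefix|anti|body}}"
    source_code     -- language code of the language the word was calqued from, e.g. "de"
    source_term     -- the model term in that language
    entry_lang_name -- name of the entry's language, used in the calques category (assumption)
    """
    # The second {{etyl}} parameter is "-" so that the model language is not
    # treated as an ordinary etymological source.
    line = "%s, a calque of {{etyl|%s|-}} {{term|%s|lang=%s}}." % (
        formation, source_code, source_term, source_code)
    category = "[[Category:%s calques]]" % entry_lang_name
    return line, category


line, category = calque_etymology("{{prefix|anti|body}}", "de", "Antikörper", "English")
print(line)      # the etymology line for antibody
print(category)  # goes at the end of the entry
```

Returning the category separately reflects the instruction that it is placed at the end of the entry, not inside the etymology section.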
Abbreviations
For acronyms, initialisms, and abbreviations that have a foreign origin, such as qv or cf, an etymology section should describe the foreign expansion. For native acronyms, initialisms, and abbreviations (including single letters used symbolically), the definitions are simply the expanded terms. A separate etymology section for each meaning is therefore unnecessary, and multiple senses should be grouped together under the same L3 header. A generic etymology section can be used to separate abbreviation definitions from non-abbreviated definitions. See A for an example.
Further details
In addition to etymology, one may provide cognates and glosses in the etymology section.
Cognates
Cognates are not strictly part of the etymology of a word, but can provide useful context, as well as serve as a mnemonic device.
The inclusion of cognate words is allowed only for inherited words, deriving from the same etymon in the ancestor languages, since they show how the same original form has developed in different daughter languages. This is especially useful for words whose ancestor form is not attested, and where regular sound correspondences can be observed.
They can be listed at the end of the etymology as: “Cognate to lang. term.”
Care should be taken, however:
- Not to overburden the etymology section with too many cognates. One example from a major branch of the immediate ancestor should suffice, with at most 4-5 cognates listed. Users can always look up more cognates descending from the attested or reconstructed form in the corresponding etymon's ====Descendants==== section, or the appendix page linked to by {{proto}}.
- Not to significantly mix cognates diachronically, by listing cognates of modern languages in the etymology section of ancient languages, or by listing cognates of ancient languages in the etymology section of modern languages. Ancient languages should thus prefer ancient cognates, and modern languages should prefer modern cognates. For the Indo-European language family, this means Ancient Greek, Latin, Old Church Slavonic, Sanskrit/Avestan/Old Persian, Lithuanian/Old Prussian, Gothic, Old Irish, Tocharian, Old Anatolian (Hittite, Luwian, Palaic) and Old Armenian. Exceptions are "single-language" families (e.g. Armenian, Albanian), and cases where only a modern cognate is attested and no ancient one, usually with languages that were attested relatively late (e.g. Lithuanian, Albanian); these are allowed to list ancient languages as cognates.
- Ancient languages of large sub-branches (Indo-Iranian, Germanic, Balto-Slavic) should list both their ancient language cognates of the same branch, and the ancient Indo-European cognates.
- In case of ancient languages, Old English (Anglo-Saxon) is an exception and should always be listed if it is a cognate, including the modern-English reflex in parentheses.
- Special care should be taken to always mutually link cognates of the "classical" ancient IE languages: Sanskrit (Vedic), Latin and Ancient Greek.
These are general guidelines and individual language/language-family policies take precedence over them.
Glosses
In some cases where the semantic development is not obvious, some explanatory comments may be useful. The more concise and efficient, the better.
Examples
In the entry trilby:
===Etymology===
From the stage adaptation of [[w:George du Maurier|George du Maurier]]’s novel ''[[w:Trilby (novel)|Trilby]]'', in which such hats were worn.
- From the stage adaptation of George du Maurier’s novel Trilby, in which such hats were worn.
In the entry sybarite:
===Etymology===
From {{etyl|la}} {{term|Sybarita|lang=la}} < {{etyl|grc}}
{{term|sc=polytonic|Συβαριτης|Συβαρίτης|tr=Subaritēs|inhabitant of Subaris|lang=grc}} <
{{term|sc=polytonic|Συβαρις|Σύβαρις|tr=Subaris|Sybaris|lang=grc}}, an ancient Greek city in southeastern
Italy noted for the luxurious, pleasure-seeking habits of many of its inhabitants.
- From Latin Sybarita < Ancient Greek Συβαρίτης (Subaritēs, “inhabitant of Subaris”) < Σύβαρις (Subaris, “Sybaris”), an ancient Greek city in southeastern Italy noted for the luxurious, pleasure-seeking habits of many of its inhabitants.
Descendants
Complementary to etymology (going backwards) is descent and derivation (going forwards): as per WT:ELE, please link back to descended terms in the “Descendants” L4 heading of the ancestor term, and likewise to derived terms in the “Derived terms” L4 heading. Descendants are terms either inherited or borrowed into another language, while derived terms are terms in the same language which derive from a given term.
Proposals for the format of the descendants section are discussed at Wiktionary talk:Descendants, and a specific format policy is at WT:Latin: Descendants, but as of this writing, there is no detailed general policy. You may use {{l}} to create a link to the correct language. Optionally, you may wish to distinguish inherited terms from borrowings by suffixing the latter with “(borrowed)”, and list descendants at the form from which they are descended (rather than the lemma), but this is at the editor’s discretion.
Templates
Etymology language templates
To specify the source language and add the entry to the correct derivation category, use the template {{etyl}} together with a language code. These are generally two-letter ISO 639-1 or three-letter ISO 639-3 codes, but Wiktionary:Language codes explains the full policy and Wiktionary-specific exceptions. See Category:Language templates for all language code templates and Wiktionary:Index to templates/languages for a table of codes and language names. Note that for non-English entries, one must specify both the source language (first, required positional parameter) and the destination language (second, optional positional parameter; defaults to English).
For the etymon, the templates {{term}} (particularly with the lang= parameter) and {{proto}} allow one to link to ancestor terms.
A comprehensive example for a native English word is father; note lang=enm for Middle English and lang=ang for Old English:
===Etymology===
From {{etyl|enm}} {{term|fader|lang=enm}} < {{etyl|ang}} {{term|fæder|lang=ang}}
< {{proto|Germanic|fader}} < {{proto|Indo-European|ph₂tḗr}}.
- From Middle English fader < Old English fæder < Proto-Germanic *fader < Proto-Indo-European *ph₂tḗr.
Other templates
Other useful templates are {{rfe}} and {{etystub}}, for flagging stubs or disputes. As many entries lack an etymology, flagging is most useful where there is a partial etymology; adding a flag to every entry lacking an etymology would be distracting.
Where a term originates in a foreign but undetermined language, use {{etyl|und}}. Where an etymology is reliably identified as unknown, {{unk.}} may be used (unlike {{etyl|und}}, it can also be used for native terms).
Structure of Etymology Categories
Generally, the contents of Category:Etymology are concerned mainly with the etymology of English words and terms. Following the Wiktionary convention for categories other than part-of-speech categories, each language has its own root etymology category prefixed by the language code.
- For example, the root etymology category for the Scots language is Category:sco:Etymology.
The template {{topic cat}} should be included in all root etymology categories, giving the name of the language as the only parameter.
Similarly, for each of the derivation categories (e.g. Category:Old English derivations), the corresponding category for the Scots language, for example, would be Category:sco:Old English derivations. The template {{topic cat}} should be included in all of these categories, and an appropriate subpage created (if necessary). (The previous template, {{dercatboiler}}, is deprecated.)
The etymology categories are inclusive: they include all terms that trace their roots to the source language, however indirectly. Finer distinctions could be made (such as derivations versus borrowings, or direct ancestors versus distant ones), but these have not been found useful, and currently only the general categories exist.