Wiktionary:Beer parlour/2015/August



Category:(langname) plurals and Category:(langname) noun plural forms

Continuing the discussion from Module talk:category tree/poscatboiler/data/non-lemma forms § Plurals and noun plural forms

Right now both Category:(langname) plurals and Category:(langname) noun plural forms exist. Their descriptions are the same: "(langname) nouns that are inflected to be quantified as more than one (more than two in some languages with dual number)." They are also used in mostly the same way. The main difference is that Category:(langname) noun plural forms has counterparts, such as Category:(langname) noun dual forms, which don't exist for Category:(langname) plurals. It also follows the naming scheme of (langname) adjective * forms.

I think we should either change plurals to be more general ((langname) terms that are... (vs (langname) nouns that are...)) and move it out of noun forms, or better yet just remove it. Note that (langname) singularia/dualia/pluralia tantum categories exist. Enosh (talk) 18:56, 2 August 2015 (UTC)

I proposed merging the plurals category into the noun plural forms category before, for consistency with other categories. I still support this. —CodeCat 19:02, 2 August 2015 (UTC)
I thought the goal was to merge any plural category into its appropriate forms category. For example, Category:Hungarian plurals was merged into Category:Hungarian noun forms a long time ago, so there is no separate category for plurals at this moment. Isn't the goal the same for all languages? Is this discussion related: [1]? --Panda10 (talk) 20:18, 2 August 2015 (UTC)
Yes, it's the same proposal. But I'm not sure what you're asking. —CodeCat 20:23, 2 August 2015 (UTC)
I support merging Category:English plurals into Category:English noun plural forms for consistency with other categories. See also: Category:Noun plural forms by language. --Daniel Carrero (talk) 21:38, 2 August 2015 (UTC)
I support doing this in general. We have both Category:Arabic plurals and Category:Arabic noun plural forms, which ought to have the same contents but don't for reasons I'm not quite sure of; it's a bit of a mess. Benwing (talk) 05:53, 4 August 2015 (UTC)
Finally done for English. That was a lot of work for sure. A lot of entries needed manual fixing too so it wasn't just a simple bot run. In many entries, the plural-of definition was mixed in with other "proper" lemma definitions even though these should be kept to separate noun sections. There were also many entries where the headword line specified a noun lemma, rather than a noun plural form. —CodeCat 22:24, 19 August 2015 (UTC)

Thai transliterations with tones

Discussion moved from Wiktionary:Grease pit/2015/August#Thai transliterations with tones.

Native speakers seem to dislike the dictionary and textbook transliterations designed for learners, which include tones, and replace them with the Royal Thai General System of Transcription (RTGS). I see toned transliterations in my older edits being replaced with RTGS.

I think it's a problem. The standard Thai transliteration system (RTGS) not only lacks tones but also displays short and long vowels the same way and merges some consonants. I think it can be used as one of the systems but not the main one. I mentioned this in this discussion.

I insist that transliterating Thai tones is very important, not just the nominal but irregular tones as well. We could include RTGS along with phonetic transliterations (another parameter in Thai headwords?).

For example, ฉัน (chǎn) is nominally "chăn" but normally pronounced "chán" (pronoun), also ไหม (mǎi) (sense 1) is pronounced "mái" (nominally "măi"). I suggest we should use toned transliterations, as dictionaries and textbooks do, not as prescribed by the Thai government. @Stephen G. Brown, Iudexvivorum, Iyouwetheyhesheit. --Anatoli T. (обсудить/вклад) 12:17, 3 August 2015 (UTC)

  1. I agree that we need a romanisation system that better reflects tones, short and long vowels, etc.
  2. What system should we use then?
  3. The system developed by Thai2english (T2E) might be okay. But the T2E machine transliterator should be used with caution, as it sometimes gives incorrect transliterations (see the table below).
  4. Some other systems that might work:
    1. The now-defunct 1939 version of the RTGS (English translation) contains a general system and a precise system (which records tones, short and long vowels, etc.).
    2. The ALA-LC system is generally used by libraries in English-speaking countries. But this system lacks tone marks. (Could we add tone marks ourselves?)
    3. ISO 11940 is used in academic context.
--iudexvivorum (talk) 14:22, 3 August 2015 (UTC)
terms | romanised by T2E transliterator | correctly romanised according to RTGS system | correctly romanised according to T2E system
ภิยโย | pí-yá-yoh | phin-yo | pin-yoh
อธิกมาส (à-tí-gà-mâat) | a-tík-mâat | a-thi-ka-mat; |
ทรูก (trûuk) | trôok | suk | sôok
ซอมซ่อ (sɔɔm-má-sɔ̂ɔ) | som-sôr | sommaso | som-má-sôr
รอมร่อ (rɔɔm-má-rɔ̂ɔ) | rom-rôr | rommaro | rom-má-rôr
เทพรัตนราชสุดา | tâyp-rát-dtà-ná-râat-chá-sù-daa | theppha rat rat suda | tâyp-pá-rát-râat-sù-daa
นิลรัตน์ | nin-rát | ninlarat | nin-lá-rát
อุตบล | u-dtà-bon | utbon | ùt-bon
I completely agree that transliterations need to reflect long vowels and tone marks. If I'm trying to learn Thai, it will do me no good to have important phonetic information like this left out. Native speakers should not be the ones determining transliteration; translit is not designed for them. However, I think that this T2E system looks just awful, and I don't think it will help. People expect foreign words to follow the usage where a e i o u stand for the sounds they have in Latin and Spanish, rather than using weird things like "ay" for /e/, "oo" for /u/, "or" for (presumably) /ɔ/ (this latter notation is especially unhelpful for American English speakers), etc. ISO 11940 won't work either because it's a translit system in the narrow sense, in that it reflects the writing rather than the pronunciation (properly speaking, Wiktionary misuses "transliteration" to mean "transcription", but that is a discussion for another day). Adding the tone marks to the ALA-LC system is not a bad idea; you could imagine taking the T2E tone marks and adding them to the ALA-LC system. You could also imagine rewriting long vowels as e.g. aa instead of ā, to avoid the stacking up of diacritics when long vowels are combined with tone marks. Benwing (talk) 06:15, 4 August 2015 (UTC)

Here's a comparison between some systems: --iudexvivorum (talk) 11:39, 4 August 2015 (UTC)

# | Thai | meaning | IPA | romanisation (RTGS) | romanisation (T2E) | romanisation (ALA-LC, without tone marks) | romanisation (ALA-LC, tone marks added, using numbers to indicate tones - see notes below)
1 | ไม้ใหม่ไหม้มั้ย | Was that new piece of wood burnt by the fire? | mäːj˦˥ mäj˩ mäj˥˩ mäj˦˥ | mai mai mai mai | máai mài mâi mái | māi mai mai mai | māi4 mai2 mai3 mai4
2 | กรุงเทพมหานคร อมรรัตนโกสินทร์ | The city as great as a celestial city, where the Emerald Buddha stays in perpetuity. | krũŋ˧ tʰeːp̚˥˩ mä˥.häː˩˦ nä˥.kʰɔ̃ːn˧ ʔä˩.mɔ̃ːn˧ rät̚˥.tä˩.nä˥ koː˧.sĩn˩˥ | krungthepmahanakhon amonrattanakosin | grung-tâyp-má-hăa-ná-kon a-mon-rát-dtà-ná-goh-sĭn | krungthēpmahānakhǭn ʻamǭnrattanakōsin | krung1-thēp2-ma4-hā5-na4-khǭn1 ʻa2-mǭn1-rat4-ta2-na4-kō1-sin5
3 | เสียงลือเสียงเล่าอ้าง อันใด พี่เอย | What tales, what rumours, you ask? | siːä̃ŋ˩˦ lɯː˧ siːä̃ŋ˩˦ läw˥˩ ʔä̃ːŋ˥˩ ʔä̃n˧ däj˧ pʰiː˥˩ ʔɤːj˧ | siang lue siang lao ang an dai phi oei | sĭang leu sĭang lâo âang an dai pêe oie | sīang lū’ sīang lao ʻāng ʻan dai phī ʻœi | sīang5 lū’1 sīang5 lao3 ʻāng3 ʻan1 dai1 phī3 ʻœi1
4 | อันมือไกวเปลไซร้แต่ไรมา คือหัตถาครองพิภพจบสากล | The hand that rocks the cradle is the hand that rules the world. | ʔä̃ːŋ˥˩ mɯː˧ kwäj˧ pleː˧ säj˦˥ tɛː˨˩ raj˧ mäː˧ kʰɯː˧ hät̚˩.tʰäː˩˥ kʰrɔ̃ːŋ˧ pʰi˥.pʰop̚˥ t͡ɕop̚˩ säː˩˥.kõn˧ | an mue kwai ple sai tae rai ma khue hattha khrong phiphop chop sakon | an meu gwai bplay sái dtàe rai maa keu hàt-tăa krong pí-póp jòp săa-gon | ʻan mū’ kwai plē sai tǣ rai mā khū’ hatthā khrǭng phiphop čhop sākon | ʻan1 mū’1 kwai1 plē1 sai4 tǣ2 rai1 mā1 khū’1 hat2-thā5 khrǭng1 phi4-phop4 čhop2 sā5-kon1
Tone representation:
"1" = สามัญ (mid; [aː˧])
"2" = เอก (low; [aː˨˩] / [aː˩])
"3" = โท (falling; [aː˥˩])
"4" = ตรี (high; [aː˦˥] / [aː˥])
"5" = จัตวา (rising; [aː˩˩˦] / [aː˩˦])
I got the idea of using numbers from the Wade–Giles system for romanising Chinese. But the numbers will be superscript under the WG system (e.g. "p'in1-yin1" for "拼音").
@Iudexvivorum Thanks. Good job! I was going to suggest the system used by Benjawan Poomsan Becker. In her dictionaries she uses special characters for vowels: "ʉ" for อึ, "ɛ" for แอะ, "ɔ" for เอาะ and "ə" for เออะ. Long vowels are simply duplicated, e.g. ตืน (dtʉʉn) is "dtʉʉn". Tone marks are used on the first vowels only, e.g. เบิก (bə̀ək) is "bə̀ək". Tone marks are (using "a"): "a" (1 - no tone mark), "à" (2), "â" (3), "á" (4) and "ǎ" (5). Like T2E she uses d-dt-t, b-bp-p.
Using that system the examples above become:
  • ไม้ใหม่ไหม้มั้ย: máai mài mâi mái
  • กรุงเทพมหานคร อมรรัตนโกสินทร์: grung-têep-má-haa-ná-kon a-mon-rát-dtà-ná-goh-sǐn
  • เสียงลือเสียงเล่าอ้าง อันใด พี่เอย: sǐang leu sǐang lâo âang an dai pêe oie
I agree that Thai2English may not correctly transliterate words that it doesn't have in its dictionary. (I wonder if อธิกมาส has various readings, though. Both T2E and http://www.thai-language.com transliterate it as "atíkmâat".) Are "a-tí-gà-mâat" and "a-tík-gà-mâat" irregular alternative readings? --Anatoli T. (обсудить/вклад) 12:32, 4 August 2015 (UTC)
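The tone-mark convention described above (one diacritic on the first vowel of a syllable, with tones numbered 1 to 5 as in the tone notes) can be sketched in Python. This is only an illustration, not an existing Wiktionary module: the function name `add_tone` and the vowel set are assumptions made for the example.

```python
import unicodedata

# Tone number -> combining diacritic, following the numbering in the tone notes:
# 1 = mid (no mark), 2 = low, 3 = falling, 4 = high, 5 = rising.
TONE_DIACRITICS = {
    1: "",
    2: "\u0300",  # combining grave accent, as in "à"
    3: "\u0302",  # combining circumflex, as in "â"
    4: "\u0301",  # combining acute accent, as in "á"
    5: "\u030c",  # combining caron, as in "ǎ"
}

# Assumed vowel inventory: plain Latin vowels plus the special vowel letters
# mentioned in the discussion.
VOWELS = set("aeiouʉɛɔə")


def add_tone(syllable: str, tone: int) -> str:
    """Place the tone diacritic on the first vowel of an unmarked syllable."""
    mark = TONE_DIACRITICS[tone]
    for index, char in enumerate(syllable):
        if char in VOWELS:
            marked = syllable[: index + 1] + mark + syllable[index + 1 :]
            # NFC composes base + diacritic into one code point where one exists.
            return unicodedata.normalize("NFC", marked)
    raise ValueError(f"no vowel found in {syllable!r}")


if __name__ == "__main__":
    for syllable, tone in [("maai", 4), ("mai", 2), ("mai", 3), ("sin", 5)]:
        print(syllable, tone, add_tone(syllable, tone))
```

Letters such as "ə" have no precomposed accented form, so the result there stays a base letter followed by a combining mark.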
  1. The term อธิกมาส is never pronounced "a-thik-mat" (a-tík-mâat). Grammatically, it is pronounced "a-thi-ka-mat" (a-tí-gà-mâat), as it is from Sanskrit अधिकमास adhikamāsa. But people also pronounce it as "a-thik-ka-mat" (a-tík-gà-mâat), and this pronunciation has become very popular. The Royal Institute Dictionary, the official dictionary of the Thai language, therefore accepts both pronunciations.
  2. There are many other similar cases. Some are shown in the table below.
  3. FYI: The Royal Institute of Thailand publishes a popular book called "อ่านอย่างไรและเขียนอย่างไร" ("How to Write? How to Read?"), containing common misspellings and mispronunciations, pronunciations of proper nouns, useful rules concerning writing and reading, etc. The book is regularly updated. The 2014 edition (22nd edition; ISBN 9786167073965) seems to be its latest edition. But it is in Thai only.
--iudexvivorum (talk) 14:29, 4 August 2015 (UTC)
term | acceptable pronunciation (grammatical) | acceptable pronunciation (popular) | notes
กรณี | RTGS: karani; T2E: gà-rá-nee; IPA: kä˩.rä˥.niː˧ | RTGS: korani; T2E: gor-rá-nee; IPA: kɔː˧.rä˥.niː˧ | from Sanskrit करणि karaṇi
ครหา | RTGS: kharaha; T2E: ká-rá-hăa; IPA: kʰä˥.rä˥.haː˩˩˦ | RTGS: khoraha; T2E: kor-rá-hăa; IPA: kʰɔː˧.rä˥.haː˩˩˦ | from Sanskrit गर्हा gar'hā
ปรัชญา | RTGS: prat-ya; T2E: bpràt-yaa; IPA: prät̚˩.jäː˧ | RTGS: pratchaya; T2E: bpràt-chá-yaa; IPA: prät̚˩.t͡ɕʰä˥.jäː˧ | from Sanskrit प्राज्य prājya
ปรมาจารย์ | RTGS: paramachan; T2E: bpà-rá-maa-jaan; IPA: pä˩.rä˥.mäː˧.t͡ɕä̃ːn˧ | RTGS: poramachan; T2E: bpor-rá-maa-jaan; IPA: pɔː˧.rä˥.mäː˧.t͡ɕä̃ːn˧ | from Sanskrit परम parama + आचार्य ācārya
มนุษยสัมพันธ์ | RTGS: manutsayasamphan; T2E: má-nút-sà-yá-săm-pan; IPA: mä̃˧.nut̚˥.sä˩.jä˧.sä̃m˩˥.pʰä̃n˧ | RTGS: manutsamphan; T2E: má-nút-săm-pan; IPA: mä̃˧.nut̚˥.sä̃m˩˥.pʰä̃n˧ | from Sanskrit मनुष्य manuṣya + सम्बन्ध sambandha
อธิบดี | RTGS: a-thi-bodi; T2E: a-tí-bor-dee; IPA: ʔä˩.tʰi˥.bɔː˧.diː˧ | RTGS: a-thipbodi; T2E: a-típ-bor-dee; IPA: ʔä˩.tʰip̚˥.bɔː˧.diː˧ | from Sanskrit अधिपति adhipati
อาชญา | RTGS: at-ya; T2E: àat-yaa; IPA: ʔäːt̚˨˩.jäː˧ | RTGS: atchaya; T2E: àat-chá-yaa; IPA: ʔäːt̚˨˩.t͡ɕʰä˥.jäː˧ | from Sanskrit आज्य ājya
If I were to design a Thai translit system, I'd want the following:
  1. Use diacritics for tones rather than numbers; numbers look ugly to me and take up extra room.
  2. Use double letters rather than macrons; this is necessary with diacritic tonal marks to avoid double diacritics.
  3. Don't separate syllables with hyphens; that looks ugly to me and takes up lots of extra room.
  4. Use t th d rather than d t dt.
However, if Benjawan Poomsan Becker's system satisfies 1-3 but not 4, then maybe we should go ahead and use it in the interest of using an existing system rather than rolling our own. Benwing (talk) 08:31, 5 August 2015 (UTC)
@Iudexvivorum Thanks for providing this info. Irregular pronunciation was a side question. We still want to transliterate Thai words with irregular pronunciations phonetically. BTW, you can use automatic transliterations for Sanskrit, e.g. करणि (karaṇi), गर्हा (garhā), प्राज्य (prājya), etc. Unfortunately, it seems that some online dictionaries, including thai2english and thai-language.com, don't always provide phonetic transliterations or respellings for irregular words. (The latter uses yet another transliteration system, which is great for learning but not good for dictionaries.) If I get some words wrong, I'd appreciate your corrections!
@Benwing I favour Benjawan Poomsan Becker's system but it also uses hyphens, like Thai2English. Hyphens can be either removed or added regardless of what system we choose. It's easier to read Thai correctly when syllables are split by hyphens. As in many East Asian languages, initials and finals are pronounced quite differently in Thai, and consonants change pronunciation when they are finals: s, ch, j, d, dt, t are all pronounced as a clipped "t" [t̚]; p, bp, b, f are all [p̚]; g, k are [k̚]; and n, l and r become [n]. It's important to separate clusters like "kla" from "-k-la", "tra" from "-t-ra", etc. User:Stephen G. Brown also favours using solid words, without hyphens. For languages like Thai there are pros and cons to both. Textbooks and dictionaries favour hyphens, sometimes spaces after each syllable.
Shall I make proposed full tables with Benjawan Poomsan Becker's system? --Anatoli T. (обсудить/вклад) 11:39, 6 August 2015 (UTC)
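The final-consonant behaviour listed in the comment above can be written down as a small lookup table. The following Python sketch only restates that list; the names `FINAL_SOUNDS` and `final_sound` are made up for the illustration, and finals not mentioned in the comment are returned unchanged as an assumption.

```python
# Romanised final consonant -> pronounced final, as listed in the comment above.
# "\u031a" is the combining "no audible release" mark used in [t̚], [p̚], [k̚].
_GROUPS = (
    (("s", "ch", "j", "d", "dt", "t"), "t\u031a"),
    (("p", "bp", "b", "f"), "p\u031a"),
    (("g", "k"), "k\u031a"),
    (("n", "l", "r"), "n"),
)

FINAL_SOUNDS = {
    final: sound for finals, sound in _GROUPS for final in finals
}


def final_sound(final: str) -> str:
    """Return the pronounced value of a romanised final consonant.

    Finals that the comment does not mention (e.g. "m") are passed
    through unchanged; that fallback is an assumption of this sketch.
    """
    return FINAL_SOUNDS.get(final, final)


if __name__ == "__main__":
    for final in ("ch", "bp", "g", "l", "m"):
        print(final, "->", final_sound(final))
```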
@Atitarev As for hyphens, I guess I'm used to Pinyin, written without them. But I also kind of would have expected final s, ch, j, etc. to be transcribed as t to follow the pronunciation. But I imagine whatever Becker does should work fine. If dictionaries tend to use hyphens, for example, then that's what we should do. Benwing (talk) 21:35, 6 August 2015 (UTC)
@Iudexvivorum, Benwing I've slowly started using Becker's transliteration, as in เรียก, including a usex, e.g.:
เรียกรถแท็กซี่แล้วยัง?
rîak rót tɛ́k-sîi lɛ́ɛo yang?
Did you call the taxi?
I've also started Category:Thai terms with irregular pronunciations, which I think could be useful. For irregular pronunciations as in ชาติ (châat) I've added a line "Phonetic respelling: ชาด". What do you think? Sorry, I haven't provided a full table for your consideration because I don't know your opinion on the change (see my post above - 12:32, 4 August 2015). --Anatoli T. (обсудить/вклад) 00:49, 11 August 2015 (UTC)
  1. What you've done above looks great! Anyway, "เรียกรถแท็กซี่หรือยัง?" sounds more natural than "เรียกรถแท็กซี่แล้วยัง?". I've edited the entry เรียก (rîiak). But I haven't provided transliterations (because I don't know how) and I haven't replaced "เรียกรถแท็กซี่แล้วยัง?" with "เรียกรถแท็กซี่หรือยัง?". I hope you will further improve the entry.
  2. I've been waiting for the full table; that's why I didn't give any opinion, lol! I'll also start using the system as soon as possible. And I think, for readers' sake, you should create a page on Wiktionary that contains the table (like the page Wiktionary:International Phonetic Alphabet) and the transliterations should be linked to that page (by means of template or any other means).
--iudexvivorum (talk) 02:12, 11 August 2015 (UTC)
@Iudexvivorum OK, great. I'll make a table and it will make it easy to look up and copy/paste if needed and I'll teach you some tricks to make adding transliterations easier (if you use Firefox, it's even easier). We don't normally link transliterations to templates (just using tr=) but if entries contain more than one transliteration, it could be done, I guess - I will ask for assistance to enhance Thai headword modules/templates. Wiktionary:Thai transliteration and Wiktionary:About Thai will need to be updated. I will try adding new transliterations to your usage examples. You can use the new transliteration "rʉ̌ʉ-yang" for หรือยัง, if you want to replace แล้วยัง with หรือยัง :). BTW, can หรือยัง be considered a single term? Does it need a space instead of a hyphen between the two syllables? I trust your judgement on what sounds more natural, of course, since my Thai is very basic, LOL! --Anatoli T. (обсудить/вклад) 02:50, 11 August 2015 (UTC)
  1. Thank you so much! I've replaced "แล้วยัง" with "หรือยัง".
  2. "หรือยัง", "แล้วหรือ", "แล้วหรือยัง" are generally interchangeable. For example:
    1. "จะไปหรือยัง", "จะไปแล้วหรือ", "จะไปแล้วหรือยัง" = "shouldn't we go yet?"
    2. "ไปได้หรือยัง", "ไปได้แล้วหรือ", "ไปได้แล้วหรือยัง" = "can't we go yet?"
    3. "ไปหรือยัง", "ไปแล้วหรือ", "ไปแล้วหรือยัง" = "hasn't he gone yet?" / "hasn't he left yet?"
  3. Using "แล้วยัง" in a question is rare in the Central Thai dialect, though it would mean the same as the above phrases. But it can be found in the Northern Thai and Northeastern Thai dialects. (In fact, in Northern Thai, "แล้วยัง" is even less common than "แล้วกา".)
  4. I don't think "แล้วยัง", "หรือยัง", "แล้วหรือ", "แล้วหรือยัง" can be considered single terms, just as "should not", "have not", "is not", "are not", etc., are not single terms. (That's why I removed the hyphen from "rʉ̌ʉ-yang".)
--iudexvivorum (talk) 04:03, 11 August 2015 (UTC)

Feedback on alternative layout for Template:de-decl-adj-table

I created an alternative layout for this template, see User:CodeCat/de-adj. The three sections for strong, mixed and weak are now merged into one piece, with the distinction instead shown through columns. Please comment; is it better, worse? Should we use it? —CodeCat 14:50, 3 August 2015 (UTC)

Your table is more compact. On the other hand, the current arrangement with all strong forms in one place, all weak forms in one place, and all mixed forms in one place seems better for what I expect is the main use of the tables: someone has "[definite article] _ [noun]" or "[indefinite article] _ [noun]" or "_ [noun]" (i.e. they know whether they're looking for a strong or weak or mixed form), and they want to know what ending to put on "rot", for the case and gender they're dealing with, when they plug it into that blank. Both online (de.Wikt, Canoo) and print references seem to favour the "all strong (etc) forms in one place" format. Notably, I would expect printed works to prefer a more space-saving compact format if they didn't think there was a compelling reason for the longer format. OTOH, if your table were rotated 90°, it might be compact enough to have the advantage of fitting all on one screen for mobile users (but as it is, I imagine it's still too wide). - -sche (discuss) 00:23, 4 August 2015 (UTC)
The main reason I made it was to show the similarities of forms between strong, weak and mixed declensions. This is something that I personally always struggled with, so I thought a different table layout might help. But I'll leave it then. —CodeCat 00:34, 4 August 2015 (UTC)
A slightly different issue -- surely the order "nom gen dat acc" is unhelpful for German? My German textbooks use "nom acc dat gen", which IMO is far better since nom and acc are so often the same. Benwing (talk) 06:37, 4 August 2015 (UTC)
I agree that this order is more helpful. The order used for old Germanic languages is generally nom acc gen dat, and this is still used for Icelandic. I never saw the point in having accusative fourth; it's "traditional", but traditions are superseded when we realise they're stupid. —CodeCat 20:11, 9 August 2015 (UTC)
Like Benwing, I'd prefer nom-acc-dat-gen. Nom-gen-dat-acc was traditionally the most common order, but I wouldn't mind improving upon tradition, and there certainly are references which have already done so, as Benwing notes; e.g. Günter Kempcke, Wörterbuch Deutsch als Fremdsprache (2000); Paul G. Graves, Henry Strutz, Master the Basics: German (1995, ISBN 0812090012); David Crowner, Klaus Lill, Impulse: Kommunikatives Deutsch Fur Die Mittelstufe (1998, ISBN 0395909341); Karsten Fink, Workbook Deutsch: Das Übungsbuch zu Eine wesentliche Grammatik (2014); and even Robert P. Ebert, Oskar Reichmann, Hans-Joachim Solms, Frühneuhochdeutsche Grammatik (1993), which all use Nom-Akk-Dat-Gen order. - -sche (discuss) 20:34, 9 August 2015 (UTC)
Time for a proposal then? I wouldn't mind one for Latin either to be honest, but Latin tends to be full of tradition freaks... x.x —CodeCat 20:57, 9 August 2015 (UTC)
My only objection is that I am so used to nom-gen-dat-acc that I get confused every time I see nom-acc-dat-gen. But I'll get over it if it's really a better order and we start using it more. Whichever order we choose though, we should try as much as possible to use it consistently not only within languages, but across all languages. --WikiTiki89 17:46, 10 August 2015 (UTC)
Heh, I have the reverse problem. (*looks at second row of inflection table* "what?! there's no way that's the accusative form..." *looks at legend* "oh, it really isn't.") I don't think all languages can necessarily be handled the same; perhaps for some (e.g. Latin) there really is a case for nom-gen order, while for others we already use nom-acc order (e.g. Proto-Germanic, Middle Dutch). I'd rather handle German first and worry about unrelated languages I don't speak later (e.g. Finnish, which uses nom-gen-part-acc, in contrast to Hungarian which uses nom-acc-dat). - -sche (discuss) 19:11, 10 August 2015 (UTC)
I agree with User:-sche that we should do one language at a time. Different languages may have different orders that make the most sense, and also there's the issue of tradition -- German textbooks often prefer nom-acc-dat-gen but Old English textbooks use nom-acc-gen-dat. Sanskrit has a traditional order nom-voc-acc, inst-dat-abl, gen-loc which makes total sense for Sanskrit (and for PIE, and it looks like we indeed use it for PIE) but for Latin the order that makes the most sense might be something like nom-voc-acc, gen-dat, abl-loc, which is similar but moves the genitive. Lithuanian seems to have its own order nom-gen-dat-acc-inst-loc-voc and people working on it might object to changing the order (although personally I think the first two should be nom-voc because they're the same in the plural). Benwing (talk) 01:32, 11 August 2015 (UTC)
For Slovene, the traditional order is nom-gen-dat-acc-loc-ins, but on Wiktionary that's changed into nom-acc-gen-dat-loc-ins. So here, too, genitive precedes dative. For IE languages with a vocative, the order should indeed be nom-voc-acc, like for Proto-Germanic. Balto-Slavic languages tend to put the vocative last; for Proto-Slavic and Proto-Balto-Slavic we currently use the order nom-acc-gen-loc-dat-ins-voc. —CodeCat 01:40, 11 August 2015 (UTC)
Russian seems to do nom-gen-dat-acc-ins-prep which reverses the order of the last two from Slovenian (since "prepositional" is really the locative case). But it would make a lot more sense to move the acc to come after nom, like we do for Slovenian, since the acc is usually the same as either nom or gen (presumably Slovenian is like this too). I guess the point is that the most appropriate order depends somewhat on the language ... for German, acc-dat-gen makes sense since dat and acc are often the same but gen is different, whereas for Russian, acc-gen-dat makes sense since gen and acc are often the same. Benwing (talk) 08:57, 12 August 2015 (UTC)
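The "one language at a time" approach discussed in this thread amounts to keeping a case order per language rather than one global order. A minimal Python sketch of that idea follows; it is not how any existing Wiktionary inflection module works, and the names `CASE_ORDERS` and `order_rows` are hypothetical. The orders are the ones mentioned in the comments above.

```python
# Per-language row order for declension tables, taken from the comments above.
CASE_ORDERS = {
    "de": ["nom", "acc", "dat", "gen"],                # order preferred in this thread for German
    "sl": ["nom", "acc", "gen", "dat", "loc", "ins"],  # Slovene order reported as used on Wiktionary
    "ru": ["nom", "gen", "dat", "acc", "ins", "prep"], # Russian order reported as currently used
}


def order_rows(lang: str, forms: dict) -> list:
    """Return (case, form) pairs in the order configured for the language.

    Cases missing from `forms` are skipped, so partial paradigms still work.
    """
    return [(case, forms[case]) for case in CASE_ORDERS[lang] if case in forms]


if __name__ == "__main__":
    # Strong masculine singular forms of German "rot".
    rot = {"nom": "roter", "gen": "roten", "dat": "rotem", "acc": "roten"}
    for case, form in order_rows("de", rot):
        print(case, form)
```

Keeping the order as data means a change such as moving the accusative up for one language does not touch any other language.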

Deletion of inflected forms

I see an editor deleting inflected form entries that use {{inflected form of}}, including kveldi, kljenuta, and κυκλῶν. Do we want this? I don't. --Dan Polansky (talk) 23:15, 3 August 2015 (UTC)

Most uses of the template are gone now, via Special:Contributions/MewBot and its e.g. "Rename inflected form of > lb-inflected form of for Luxembourgish entrie" or "Rename inflected form of > yi-inflected form of for Yiddish entries".

I ask that the bot be immediately blocked for a gross violation of WT:BOT and that it remain blocked until the changes are undone. (I might as well talk to a tree, I guess.) --Dan Polansky (talk) 23:24, 3 August 2015 (UTC)


shows that the bot made more than 5000 edits to remove {{inflected form of}}, at the rate of approximately 60 edits per second. --Dan Polansky (talk) 23:30, 3 August 2015 (UTC)

I think you mean minute. DTLHS (talk) 23:36, 3 August 2015 (UTC)
Yes, my mistake. --Dan Polansky (talk) 23:41, 3 August 2015 (UTC)
The change to kveldi looks correct; {{inflected form of}} should be avoided in favor of specifying the actual inflection, which is what was done here. But I totally disagree with simply deleting the pages that use this template, as in kljenuta and κυκλῶν. They should be left alone until someone manages to fix them up to specify which inflection is involved. As for templates like {{de-inflected form of}} instead of the generic one, I'm not sure the point of them, but I imagine CodeCat can explain, and at least there is no loss of information. Benwing (talk) 06:25, 4 August 2015 (UTC)
I agree that these deletions are not okay, and CodeCat should recreate all the entries she has bot-deleted for this reason. —Μετάknowledgediscuss/deeds 06:29, 4 August 2015 (UTC)
Just so we're on the same page, "all the entries she has bot-deleted" = zero entries, and she only deleted three by hand (kveldi, kljenuta, and κυκλῶν). The bot work consisted of switching German uses to {{de-inflected form of}} (which was proposed on the 22nd, met with agreement from a German speaker on the 23rd, and thereafter met with silence until after the changes had been made; only then did someone object) or relatedly switching Yiddish and Luxembourgish uses to corresponding templates. The fact that more languages than were initially thought use {{inflected form of}} may mean we want to go back to the general-purpose template and use langcodes, rather than using language-specific templates — if so, we can do that, since nothing was deleted, but rather only renamed. - -sche (discuss) 08:20, 4 August 2015 (UTC)
Someone else has restored kveldi and I've restored κυκλῶν and made it more precise than it was. I've left kljenuta deleted since if the declension table at kljenut is right, kljenuta isn't a form of it. —Aɴɢʀ (talk) 10:33, 4 August 2015 (UTC)
Thanks for the clarification, -sche. Dan Polansky's wording was evidently intentionally misleading, but my faulty assumptions derived therefrom aside, I still do not support those deletions without process. —Μετάknowledgediscuss/deeds 16:15, 4 August 2015 (UTC)
I apologize to anyone who was misled by my wording. I should have already been fast asleep at the time when I posted the initial post here; 23:30 means it was 1:30 CET, summer time. --Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
  • Manual creation of a subset of word's inflected forms should be banned, and such entries deleted. Making such entries only complicates botting the rest of the inflection in the future. Too much time is wasted cleaning up such entries. If you are creating inflected forms manually either create it entirely for a lemma, using one and only one template, or don't create it at all. --Ivan Štambuk (talk) 09:13, 4 August 2015 (UTC)
    No it shouldn't, and no they shouldn't. I don't know how to use a bot, and I don't always have the time to create entries for all the inflected forms. I often create entries only for those inflected forms that already exist as spellings in other languages. For example, if some random Irish or Old Irish verb form happens to share a spelling with an existing Spanish entry, I'll create the Irish form there, but I won't bother creating brand-new entries for all the other forms of the verb. In other words, I'll work to remove orange links from inflection tables, but not (always) black/red ones. —Aɴɢʀ (talk) 10:39, 4 August 2015 (UTC)
    Extinct languages like Old Irish which have irregular paradigms and limited attestation of inflection should of course be manually treated. But for living languages that don't have such issues you are just creating more cleanup in the future. Blueing orange links seems to me the only valid reason to do so (convenience over thoroughness). --Ivan Štambuk (talk) 10:23, 5 August 2015 (UTC)

Some relevant data:

  • There were 45419 uses of {{inflected form of}} on a definition line on 2014-07-28. I used the following Windows command line to ascertain that: find /c "# {{inflected form of" enwiktionary-20140728-pages-articles.xml
  • {{de-inflected form of}} was created on 3 August 2015‎ by CodeCat. --Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
  • AWB shows 25000 uses of {{de-inflected form of}} as of now, but there is probably a limit of 25000 built into AWB. I hazard a guess that almost all uses of {{inflected form of}} were replaced with {{de-inflected form of}}.

--Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
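The Windows `find /c` command quoted above counts the lines of the dump that contain the given string. For anyone without that tool, the same count can be approximated with a short Python sketch; the function names here are made up for the illustration, and the dump filename in the comment is simply the one from the command above.

```python
from typing import Iterable


def count_matching_lines(lines: Iterable[str], needle: str) -> int:
    """Count the lines that contain `needle` at least once."""
    return sum(1 for line in lines if needle in line)


def count_in_dump(path: str, needle: str) -> int:
    """Stream a dump file line by line so it never has to fit in memory."""
    with open(path, encoding="utf-8") as dump:
        return count_matching_lines(dump, needle)


if __name__ == "__main__":
    sample = [
        "# {{inflected form of|gehen|lang=de}}",
        "# {{plural of|Haus|lang=de}}",
        "# {{inflected form of|rot|lang=de}}",
    ]
    print(count_matching_lines(sample, "# {{inflected form of"))
    # For a real dump, something like:
    # count_in_dump("enwiktionary-20140728-pages-articles.xml", "# {{inflected form of")
```

Like the original command, this counts lines rather than occurrences, so a line with two uses is counted once.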

{{ux}} in Eastern Mari?

Recently, CodeCat (talk • contribs) changed the format of the examples in Eastern Mari лум, inserting {{ux}}. The result looked like this: [2]. I certainly understand the need to use standard templates, but the resulting format was much less compact and less practical: three lines per example (including transliteration). Since I thought one line per example would be nicer on the eye and easier for anyone actually interested in seeing how the word can be used, I reverted her change. But I wondered if it wouldn't be possible to change said templates (or create a new one) to have the one-line-per-example format, and keep using it. Would that be a problem for anyone? Is there a reason why the three-line-per-example format should be preferred to the one-line-per-example one? --Pereru (talk) 20:02, 4 August 2015 (UTC)

@Pereru: Just add the parameter |inline=1 to the {{ux}}/{{usex}} template. --WikiTiki89 20:13, 4 August 2015 (UTC)
OK. Now, can this be the standard format? Or is there any reason to prefer the three-line-per-example format? Or is this up to every Wiktionarian to decide? --Pereru (talk) 20:20, 4 August 2015 (UTC)
The reason is that most usage examples are much longer and wouldn't fit well on one line. It's the short ones that are the exception. --WikiTiki89 22:14, 4 August 2015 (UTC)
I think this should be automated in some way. Once the length exceeds a threshold, put it on multiple lines, otherwise keep it on one line. —CodeCat 20:25, 4 August 2015 (UTC)
It's hard to determine length other than by counting characters, which is not so accurate. I think it is better to leave it as is. Perhaps we can make it easier by having a template such as {{uxln}} or {{ux1}} which would effectively be a redirect to {{ux|inline=1}}. --WikiTiki89 22:14, 4 August 2015 (UTC)
That's probably better than having such a parameter. But there is an alternative to counting characters: CSS layout. I'm not sure if it's feasible, but at least the client-side stuff knows exactly how wide text is, and it can overflow when necessary. —CodeCat 22:17, 4 August 2015 (UTC)
But it would also need to hide the dashes when it overflows. How would you do that? Also semi-relatedly, |tr=- doesn't work to hide the transliteration in {{ux}}. --WikiTiki89 22:35, 4 August 2015 (UTC)
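The threshold idea raised in this thread (keep short examples on one line, break long ones across several) could look roughly like the Python sketch below. It is not part of {{ux}} or any existing module: `choose_layout` and the default of 80 characters are assumptions, and, as noted above, a character count is only a rough stand-in for rendered width.

```python
def choose_layout(
    example: str,
    translit: str = "",
    translation: str = "",
    max_chars: int = 80,  # hypothetical threshold, not a value from the discussion
) -> str:
    """Pick "inline" when the whole example is short, otherwise "multiline"."""
    total = len(example) + len(translit) + len(translation)
    return "inline" if total <= max_chars else "multiline"


if __name__ == "__main__":
    print(choose_layout("rîak rót", "", "call a car"))
    print(choose_layout("x" * 60, "y" * 60, "z" * 60))
```

A CSS-based approach, as suggested above, would avoid the guesswork because the browser knows the actual rendered width.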

Sourcing etymologies?

Recently, in Latvian ūdrs, I reverted a change that introduced a Proto-Baltic reconstruction in the Etymology section, without proper sourcing. Given the way the text was written, it would seem that the Proto-Baltic proposed reconstruction came from Karulis' Latviešu Etimoloģijas Vārdnīca, when in fact it came from an as yet unpublished article by R. Kim. I changed the format, to make it clearer where the Proto-Baltic form was taken from. Can't we perhaps agree on a general policy for Etymology sections whereby we try to explicitly source what is what -- so that, if two protoforms from different sources are cited, the reader can know which was proposed by which source? The format doesn't have to be the one I used in ūdrs, of course, but it would be nice to have something that would avoid this kind of confusion. A second, unrelated question is whether unpublished sources should be accepted in Wiktionary. I'd say no: let it be published before it can be cited here. But I don't know what the others here think. --Pereru (talk) 20:19, 4 August 2015 (UTC)

My general issue with your etymologies is that they're huge blocks of text. They need to be structured better in order to be readable. The long list of cognates is not necessary either, especially if we already have PIE pages and, more recently, categories to hold them. At the very least, they should be made collapsible or presented in a separate paragraph to make the rest easier to read.
This is of course not the problem I wanted to talk about here, but OK, there we go...
The 'huge blocks of text' are necessary when the etymology is not simple, or is disputed, or involves changes, semantic or otherwise, that are not obvious, as in liegt. When the etymology is simple -- just PIE to PB to the word, without semantic changes, as in acs, you have only one short sentence. I suppose your problem here is how much information should be given: should there be only a reference to the etymon, with no indication of how you got from that form and from that meaning to the current state? Or should more information be provided? I, for one, favor the latter, because this extra information is important to judge and accept the etymology, and is part of the history of the word, which is what the etymology section is about. It is also often interesting and brings new light to the understanding of the word, as several other people here told me when commenting favorably on the 'huge blocks of text' that you dislike. Call that 'humanistic etymology' if you will.
I don't have anything against presenting a lot of information. My problem is more the way it's presented. One giant paragraph doesn't invite the user to read it, and instead they'll just go tl;dr at it. If I want to know, at a glance, what the origin is, I don't want to have to read through a lot of blabber to get to the point. So what I would suggest is to write etymologies that focus first on the known and reconstructed history, and leave the details until later. That way, people who aren't interested in the extra details can skip them, rather than having to sift through. Make the information that users want more accessible by splitting it. —CodeCat 21:26, 4 August 2015 (UTC)
Most users don't want to look at etymologies, they just want to see what the word means; so they won't read the etymology (or at the alternative forms, or the pronunciation) at all. If they glance at the etymology section, they're as likely to go tl;dr at mysterious cabalistic symbols like *h₃ḗHḱ-ō as they are at longish texts. Only if they are interested will they read it. Interestingly, the information I present is already in the format you suggest: the very first sentence gives the PB and the PIE etymon, you don't have to read any further than that. Perhaps the only necessary change here is to add a carriage return after that first sentence, to put the rest of the information in a separate paragraph? --Pereru (talk) 22:23, 4 August 2015 (UTC)
The list of cognates is less relevant, I agree. The only problem is that different sources often quote different cognates, and this may be a problem. One solution is to bypass cognates altogether, but this only works for the (relatively few) 'famous' words or roots that already have reconstructed forms here at Wiktionary (where one can add cognates and refer to the specific sources that mention them). But over 90% of Latvian words for which Karulis' LEV gives etymologies are not in this category: rather, they are words with only a couple of cognates, mostly in Baltic (e.g. liegt) or maybe a couple of other non-Baltic languages. It will be a long time before those etyma have Wiktionary pages, so eliminating these cognates looks like a bad idea. I would agree, though, with the cases in which there is already a good Appendix page with the etymon (as long as different cognates proposed by different sources are clearly distinguished there). Do you have one such example, so we could discuss the format further?
If sources conflict, then Wiktionary has to find a compromise through the usual consensus process. Consensus may invalidate some sources or even all of them, or choose a particular one that seems most usable by the people discussing the matter. —CodeCat 21:28, 4 August 2015 (UTC)
Sure. Let it happen, then. The LEV, for instance, cites cognates that are not cited in some Wiktionary reconstructed entries; should I add them? Or should I start somewhere a discussion about whether or not to do this? Or whether or not the LEV is a good source? And, if so, where? --Pereru (talk) 22:23, 4 August 2015 (UTC)
Showing different takes on the issue by different people is good. I think the best way to present it would be through an unordered list. See for example *fanhaną. —CodeCat 20:24, 4 August 2015 (UTC)
Back to the problem at hand. Yes, that would be good, so separate paragraphs for your PBS etymologies (with correct sourcing) might be a good idea. You could start such paragraphs with 'According to a different source,...' and then add the information. Or you could mention the forms with a footnote to the source, as I did in ūdrs. Either way would be OK with me, as long as the wording is fluent and there is no confusion as to what comes from where. What I would disagree with is what you did before: just adding a form with no sourcing to a text that is itself attributed to a specific source, as if that form also came from the same source (i.e., your original PBS etymon at ūdrs looked like it came from Karulis' LEV, when in fact it came from Kim's unpublished paper).
Besides, note that English fang (from *fanhaną) -- where you find one of those 'huge blocks of text' you so much dislike -- does NOT mention the two proposed PIE etyma mentioned under *fanhaną: rather, it only mentions the first one, and without references to sources. So the information under fang is misleading at best. Shouldn't such things be changed in a more principled way, so that a reconstructed entry does not seem to be in contradiction with the information found in the etymology section of one of its reflexes? --Pereru (talk) 21:23, 4 August 2015 (UTC)
And I would add that I don't think it's a good idea, in principle, to cite unpublished sources. (But maybe Kim's paper has already been published? It was going to come out in a Handbook, as I recall; maybe it is already there?) --Pereru (talk) 21:12, 4 August 2015 (UTC)
This is the problem with the paragraph approach that you use. You source the whole paragraph, which makes it impossible for anyone to make adjustments to the text. Any edits make it no longer faithful to the source. Instead what should be sourced is individual facts. That way, people can add or change things without invalidating the references. Again, splitting etymologies into separate sections with paragraph breaks and lists should help with that. Again look at *fanhaną: each list item has its own separate sourcing.
Yet I did source other forms in ūdrs, for example, so that it is clear that the PBS form is not from the LEV; just put the footnote next to the material from the other source, not at the end of the paragraph. Why not make it standard practice? Another possibility is simply to start a new paragraph with a different source, perhaps starting with "A different source claims that..." or something similar. So this isn't a problem. --Pereru (talk) 22:23, 4 August 2015 (UTC)
I'm not sure what you mean by unpublished sources. If they are available, then they are public, right? —CodeCat 21:26, 4 August 2015 (UTC)
An unpublished article has not yet passed peer review. It may be complete nonsense, or more likely it may have a few minor errors that will be corrected before publishing. --WikiTiki89 22:20, 4 August 2015 (UTC)
Nowadays everything is on the internet: manuscripts, unpublished sources, papers at various levels of completion... because we always want to invite comments from other interested researchers, comments that may improve a paper even before it's completely finished (academia.edu is a great site for this, as are individual researchers' pages at their institution website). When a paper is published, however, it is officially released, be it on paper, be it in a publishing website. After that, it can no longer be edited or altered; and the year of its publication becomes fixed. Also, a published paper went through a refereeing process in which it was read and commented upon by two or three of the author's peers; an unpublished paper, of course, didn't. So the gist of it is that an unpublished paper is (supposed to be) less good and less final than its published version. Its author, for instance, wouldn't like you to cite an unpublished version if there is a published one already. (Kim's paper states quite clearly -- at the end, I think -- that it is an unpublished version, to appear in a Handbook of something or other). --Pereru (talk) 22:23, 4 August 2015 (UTC)
Different pages conflicting with each other is an unfortunate effect of how Wiktionary works. There's not much that can be done about it other than checking and updating things regularly. I would say that generally, the reconstruction pages are more reliable than the etymologies within entries, as they've been created and reviewed by more knowledgeable editors. Etymologies in entries often tend to be copied from just one source, often an outdated or nonspecialised one. They are then inserted into entries by editors who are relatively inexperienced with such matters, so that they are not able to spot and correct problems in their sources. And then, when new entries are created for cognate terms, then the etymologies are just copied over. This tends to propagate old/bad etymologies. And it's one of the reasons I prefer keeping etymologies to a bare minimum and letting the proto-language pages handle the rest. —CodeCat 21:33, 4 August 2015 (UTC)
This is, again, difficult for words that have a more complicated history, as I mentioned above. For such words, their etymology section is the only place where, say, discussing a strange semantic evolution or comparing two or three different etymologies is logical: after all, in the reconstructed entries, you are not interested in the details of the semantic evolution of one reflex in one sub-branch of the family (I haven't seen a single reconstructed proto-entry here that does that); rather, the focus is on the reconstructed protoform and how it fits in the proto-system. So I think you would lose more than gain by doing that. The only thing that I would indeed relegate to the reconstructed entries is the list of cognates -- assuming that we can source cognates that occur in only one source, for instance.
And here's a final thought: if inconsistencies are unavoidable at Wiktionary, if no policy can be devised to address them, then we're basically giving up on the idea that Wiktionary can become a quality work. No -- I'm sure something can be done. Wikipedia found solutions, so can Wiktionary. --Pereru (talk) 22:23, 4 August 2015 (UTC)
  • I support banning original research with reconstructions in etymologies, as well as inventive editorial corrections, such as how "ū́drā́-" (the form cited in the article by R.K.) became "ūdrāˀ" (which is what CodeCat inserted in the etymology). Additionally, for protolanguages, when there is no accepted general framework, which is the case with Proto-Baltic/Proto-Balto-Slavic, all of the competing theories should be presented on an equal footing. That means that there can be no single and "true" reconstruction, and that there could be multiple inflection tables for a word according to different sources. --Ivan Štambuk (talk) 10:19, 5 August 2015 (UTC)
Agreed. And since the number of reconstructed entries in Wiktionary is not so high, this is probably quite feasible, isn't it? Shouldn't for instance the page *ūdrāˀ be moved to *ū́drā́-, then? Or does CodeCat have another source that references the form she prefers? --Pereru (talk) 13:46, 5 August 2015 (UTC)
I oppose a ban on editorial corrections; to fail to harmonize notation schemes is misleading. In both Menominee (living language) and Proto-Algonquian (reconstructed language), for example, most people notate long vowels like a·, but some people write , a: or ā. To have individual words/forms in different systems based on who attested the particular word/form (e.g. fooba·r, plural foobārs) would confuse readers into thinking the vowels were of some different quality. - -sche (discuss) 17:24, 5 August 2015 (UTC)
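To make the harmonization point concrete, here is a minimal sketch that rewrites equivalent length notations into a single one. The choice of the macron as the target, the name `harmonize_long_vowels`, the restriction to the five plain vowels, and the inclusion of the IPA length mark are all assumptions made for illustration, not an existing Wiktionary convention or module.

```python
import unicodedata

# Length marks treated as equivalent (assumption): middle dot, IPA length mark, ASCII colon.
LONG_MARKS = ("\u00b7", "\u02d0", ":")
PLAIN_VOWELS = "aeiou"
COMBINING_MACRON = "\u0304"


def harmonize_long_vowels(form: str) -> str:
    """Rewrite vowel + length mark as the same vowel with a macron."""
    out = []
    for ch in unicodedata.normalize("NFC", form):
        if ch in LONG_MARKS and out and out[-1] in PLAIN_VOWELS:
            # Compose vowel + combining macron into a single precomposed letter.
            out[-1] = unicodedata.normalize("NFC", out[-1] + COMBINING_MACRON)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for form in ("fooba\u00b7r", "fooba:r", "foob\u0101rs"):
        print(form, "->", harmonize_long_vowels(form))
```

A length mark that does not follow a plain vowel is left alone, so ordinary punctuation is not touched. This only covers cases where the notations really are one-to-one equivalents.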
I agree with -sche here; notation schemes should be harmonized to the extent that this is a simple case of equivalent notations. As for "ū́drā́-" vs. "ūdrāˀ", it's not obvious to me what's going on here. Do the two acute accents indicate Balto-Slavic acute? If so, then it's fine to convert them to use the superscript glottal stop, which can be viewed as simply another way of indicating the BS acute register -- the fact that it expresses an opinion as to how that register was phonologically realized is irrelevant here. But then shouldn't it be "ūˀdrāˀ"? Benwing (talk) 07:16, 6 August 2015 (UTC)
Using acute accent marks to indicate the acute is actually very misleading, because Proto-Balto-Slavic also had a proper phonemic word accent like that of PIE. We should definitely use the same symbol, ´, to denote the accent in both of them. Anything else would just be unnecessarily confusing. That said, it does seem that there is somewhat of a linguistic consensus that the acute register involved some kind of glottal feature. The Latvian broken tone is a direct continuation of the acute, and is realised as glottalisation. So if there is any serious disagreement among linguists about the approximate nature of the acute, then I would like to hear about it. —CodeCat 00:42, 7 August 2015 (UTC)
@CodeCat OK, I think I agree with you here, but what I don't understand is why you didn't write "ūˀdrāˀ" rather than "ūdrāˀ". Isn't the acute register on both syllables? And where's the stress? Benwing (talk) 10:09, 7 August 2015 (UTC)
You're right, I moved the page. But I wonder why the masculine form *udras doesn't have an acute, at least according to the source Pereru gave. Did Winter's law skip that word or something? —CodeCat 12:06, 7 August 2015 (UTC)
I think it does have an acute, it's just mis-written. The Latvian descendant has a long broken-tone vowel, and AFAIK broken-tone is descended from an unstressed Balto-Slavic acute vowel (one of the other two tones reflects a stressed acute vowel, I think, but I forget which one). Benwing (talk) 12:17, 7 August 2015 (UTC)

Sourcing etymologies bis: a proposal

Well, here is a modest proposal for sourcing (and otherwise formatting) etymologies in etymology sections:

  • For "simple" etymologies (A < proto-B AA' < proto-C AAA),

(a) State in the first sentence what the path is from the current form to the oldest protoform you want to cite ('From proto-B AA, from proto-C AAA'). Make it a separate paragraph.
(b) Further information (semantic evolution, irregular transformations, etc.) can be described in the following paragraph, if need be, as succinctly as possible.

  • For "complicated" etymologies (there are several suggested paths or etyma):

(a) Start with "There are (two, three, several) proposed hypotheses:";
(b) State each hypothesis in a single sentence in a separate paragraph, starting with a letter -- (a), (b), (c), etc. -- to identify the hypothesis;
(c) If further information is necessary on a given hypothesis, add it in a separate paragraph after all the hypotheses, referring back to it by its letter.

  • Cognates would be listed, in full agreement with the source (i.e., no tampering with the data!) in a separate paragraph at the end. If one of the protoforms (preferably the oldest) already has a good, consensus-approved entry in the Appendix, then move all cognates to that entry, making sure that each cognate is duly and correctly sourced. (This is not the current state in most reconstructed entries here, and those interested in entering protoforms should add their sources.)

What do y'all think?

I'm impressed with the detail you put into ūdrs. My only suggestion would be to put the cognates into a separate paragraph to avoid the "wall of text" feeling. It sounds like you're in agreement with this. Benwing (talk) 08:37, 5 August 2015 (UTC)
Is this arrangement in ūdrs (a carriage return between the two paragraphs) what you had in mind? --Pereru (talk) 14:05, 5 August 2015 (UTC)
I don't like how you are duplicating the cognates in ūdrs. They are already listed in *udrós. The Latvian page is not the proper place to discuss the development of Latin lutra. --Vahag (talk) 16:38, 5 August 2015 (UTC)
In this case I actually agree. But before removing them, we need to solve inconsistencies. So there are cognates in my source that aren't mentioned in *udrós. Should I copy them and source them there? How about the fact that my source mentions a Proto-Baltic form, whereas *udrós lists only Proto-Balto-Slavic? I agree that basically cognates (at least for the 'richer' words with cognates in many branches) should be in the reconstructed entry page, but we need to know which forms should be there, from which sources... or else we simply don't know what kind of information we have there. In ūdrs, at least I know who made the claim and where.
I made a first attempt to change *udrós, introducing information from the Latvian source and footnoting it. I don't quite like the look of the result, but it's a first attempt. Any thoughts? --Pereru (talk) 08:43, 6 August 2015 (UTC)
Yes, you can add the cognates to the proto-entry and source them there. That way the information from LEV can be enjoyed by everyone, not just the viewers of the Latvian page. For the format of referencing individual descendants you can look at *tep-. As to which descendants should be there and from which sources, I think at first all descendants from all sources can be added. If people have objections, a centralized discussion will happen on the talk page of the proto-entry or in WT:ES. The bad cognates from outdated sources will be eventually weeded out. That will not happen if you keep the information on the Latvian page. --Vahag (talk) 09:24, 6 August 2015 (UTC)
Well, @Vahagn Petrosyan:, I did add LEV cognates to the list, but my changes at *udrós were reverted without explanation (diff). Unless this is better explained, so that I can know what is going on, what is the point of adding cognates there? It seems safer to leave them on the Latvian page...--Pereru (talk) 01:31, 7 August 2015 (UTC)
I gave an explanation, so did you just choose to not read it? —CodeCat 01:56, 7 August 2015 (UTC)
Reverting good faith edits is not cool, CodeCat. You did the same to me recently.
There is no accepted format for listing both Proto-Balto-Slavic and Proto-Baltic. Pereru, can't you list the cognates under Balto-Slavic only and still reference LEV? Sure, LEV says "from Proto-Baltic", but we understand that in essence what he is saying is that ūdrs is from PIE *udrós, whatever the intermediate details. When my dated Armenian equivalent of LEV says հոտ (hot) is from PIE *ōd-, I understand that I should list it under modern PIE *h₃ed- and still reference my old source. I have seen it done by academic scholars. Martirosyan 2010 can write that a source in 1920s derives a word from such-and-such PIE root and use modern reconstruction for that root. It seems to me that you are trying to give a literal translation of LEV in Wiktionary. That is a job for Wikisource, not Wiktionary. The best practice is to synthesize sources new and old under the light of modern knowledge. --Vahag (talk) 09:43, 7 August 2015 (UTC)
I agree with Vahag that User:CodeCat probably shouldn't have reverted that change, and should definitely have given a better explanation than "this just looks ugly". I can understand CodeCat's objection to the form ū́drā́ with acute accents indicating the acute register (i.e. it conflicts with the conventional use of accents to indicate stress, which is also phonemic in Balto-Slavic, and it's inconsistent with the way other Balto-Slavic entries have been formatted in Wiktionary [granted, it was CodeCat doing that formatting]), but in that case, she should have just undone that one change, with explanation, rather than the whole thing. I also agree with Vahag that we should feel free to modernize/canonicalize proto-forms and such. Benwing (talk) 10:05, 7 August 2015 (UTC)
This is not about canonicalization. Those glottal stops are phonemes on their own in the reconstruction of Proto-Balto-Slavic by the Leiden School, and according to it only after the parent language disintegration did individual branches develop their own acute/circumflex distinctions. The notation with acute accents by R. K. is an entirely different reconstruction, where acute accent marks indicate intonation/tone. Those two also have different originating points - the glottalic theory of PIE vs. the standard PIE Frankenstein's monster with laryngeals, genders and thematic inflection existing contemporaneously. You can't mix those two notations, because they refer to two different protolanguages, in two different chronological stages. There are also other differences that go beyond mere character substitutions. --Ivan Štambuk (talk) 12:19, 7 August 2015 (UTC)
I think my point still stands, though, that these can be viewed as equivalent notations, with acute register vs. non-acute register marked either by acute accent vs. tilde (or circumflex) accent or by presence or absence of superscript glottal stop, without necessarily committing to a phonological interpretation of the notation. As long as it's agreed that there was a two-way register distinction -- regardless of whether that is interpreted as tonal, as glottal, or whatever -- then the notations are equivalent in that you can convert from one to the other without loss of information, and we may as well be consistent. Benwing (talk) 12:53, 7 August 2015 (UTC)
What does the "two-way register distinction" actually mean? It's a meaningless notion, vague and abstract. Those symbols mean different things in different protolanguages. Leiden School theory also has short *o and *a, and different assumptions on Auslautgesetze and paradigms leading to different endings and forms in inflections. Also, some of the origins of the glottal stop or "acute" are disputed (Winter's law formulation, long/hyperlong vowels), which in particular renders the acute accent notation inapplicable, whereas with the glottal stop you can just use parentheses as is the customary notation for optional parts of reconstruction. Lastly, the superscript notation is biased as to the phonation character of what you call the Balto-Slavic "acute register" - there are different theories (rising/falling tone, glottalization/stod). It's best not to mix those two protolanguages, and use two different reconstructions. There are some Proto-Slavic appendices that already do it like that. --Ivan Štambuk (talk) 13:32, 7 August 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── "register distinction" is an abstract way of referring to a distinction with unknown phonetics, but it's certainly not meaningless, any more than the three laryngeals of PIE, which are equally abstract. No one would have any problem regularizing e.g. Ringe's laryngeal notation, where he writes something like ç x xʷ in his Tocharian book, into more standard h₁ h₂ h₃, even though they may have a completely different interpretation of what these symbols mean phonologically. Differences that cannot be treated as notational variants, e.g. differences in which register or vowel length is reconstructed in a particular word, or in numbers of vowels, obviously shouldn't be confounded, but when there's an equivalence to be made between notational variants I don't see the point of not making it. Benwing (talk) 17:00, 7 August 2015 (UTC)
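The laryngeal example can be sketched as a plain one-to-one symbol substitution, assuming the correspondence is exactly ç → h₁, x → h₂, xʷ → h₃ as loosely described above ("something like"). The function name and the sample forms are hypothetical, and the sketch naively assumes that every x or ç in the input is a laryngeal.

```python
import unicodedata

# Longest symbol first, so that "xʷ" is not consumed by the plain "x" rule.
LARYNGEAL_MAP = [
    ("x\u02b7", "h\u2083"),  # xʷ -> h₃
    ("x", "h\u2082"),        # x  -> h₂
    ("\u00e7", "h\u2081"),   # ç  -> h₁
]


def to_standard_laryngeals(form: str) -> str:
    """Convert the alternative laryngeal symbols to h₁ h₂ h₃."""
    # NFC makes sure a decomposed c + cedilla is matched as the single letter ç.
    form = unicodedata.normalize("NFC", form)
    for old, new in LARYNGEAL_MAP:
        form = form.replace(old, new)
    return form


if __name__ == "__main__":
    for form in ("*\u00e7es-", "*xent-", "*x\u02b7ed-"):
        print(form, "->", to_standard_laryngeals(form))
```

Because the mapping is one-to-one, it can be reversed without loss of information. That is the sense of "notational variant" here, and the sketch says nothing about cases where two reconstructions differ in substance.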

But the difference is that the glottal stop is not a mere "abstract register". It's a phoneme with a very specific phonetic value. Nobody disputes the phonemic status of PIE laryngeals. The differences between the protolanguage described by Ronald Kim (which has no glottal stop as a phoneme, and "acute" is a property of certain vowels) and the one of the Leiden School are irreconcilable and these two should not be mixed. This "canonicalization" is a thinly-veiled attempt at giving undue prominence to certain theories at the expense of others. It should be resisted and denounced. --Ivan Štambuk (talk) 17:09, 7 August 2015 (UTC)
The difference you mention is nothing more than relative chronology and allophony. Compare the sequence -Vnh- in Proto-Germanic. It eventually gave way to -Ṽ:h-, with a long nasal vowel. It doesn't matter in the slightest whether we write one or the other, because they represent the same phonological units. It's merely a matter of phonetic interpretation, but notation certainly does not have to indicate any particular phonetic reality. The same applies here with the acute. The interpretation in which there is an actual glottal stop, and the one in which there is merely glottalisation or some other change in the vowel, are different interpretations of the same phonological units. How you interpret it phonetically, or denote it in writing, is completely irrelevant to what it is. It's the acute, nothing more, nothing less. Whatever symbol we choose to show that it's there is equally valid, because it's just a symbol that says "acute is here". —CodeCat 19:00, 7 August 2015 (UTC)
I am not trying to give prominence to one theory over another. I'd be just as happy if you denote the acute with a superscript A, and the non-acute with a superscript B (or vice-versa). That makes it obvious that we're talking about what is ultimately an abstract register difference. It is entirely analogous to the situation in Old Chinese, where everyone agrees there was a distinction between "Type A" and "Type B" syllables but no one agrees what the relevant feature was. Some think type B syllables have an extra /j/ phoneme before the vowel, some think type A syllables have pharyngealization of the syllable-initial consonant, some think the difference is vowel length, some think it's a phonation difference (creakiness/breathiness/whatever), etc. But these theories are hardly irreconcilable just because of this. My concern is not to favor one theory over another but to avoid needless complication introduced by notational differences. Since you don't seem to ever believe in canonicalizing notations, we might end up just having to agree to disagree. Benwing (talk) 19:45, 7 August 2015 (UTC)
Two points that have been made in previous discussions: one, we have to recognize that many obvious derivations are not cited, e.g. it is unlikely that a dictionary has gone through Spanish's (or, even more likely, Rumantsch's) massive corpus of words inherited from Latin and noted, in each and every case "yep, this one too was inherited from its obvious Latin predecessor, rather than, like, borrowed from Welsh or something". The same sort of research we undertake to determine what words mean, and e.g. how they inflect (in contradistinction to what scholars and authorities think they mean and how scholars and authorities think they inflect), will sometimes be necessary when tracing etymologies. Two, it would be misleading and foolish not to allow for standardization of notation schemes, as I note above. - -sche (discuss) 17:36, 5 August 2015 (UTC)
@Pereru There's still a bit of a "wall of text" effect, since the paragraph break is barely visible. If it could be set off better, I think people would object less. An alternative is to just list a few cognates, the subjectively most "interesting" ones (e.g. Greek, Sanskrit) and put the rest on a reconstruction page; and if such a page doesn't exist, create it. I personally don't object to seeing all the cognates listed on the Latvian page, but I understand the objections of the others, and I also see how it's likely to lead to inconsistencies (e.g. you give an etymology for the unexpected l and t in lutra whereas the reconstruction page doesn't and says it's unknown). Benwing (talk) 07:23, 6 August 2015 (UTC)
Does indenting the cognate paragraph make it look better (see ūdrs)? As for cognates in general, I do understand the feelings, but there are too few reconstructed entries in Wiktionary for all cognates to be easily transferable (and given that there are discussions about "what the right form" is, I'm a bit afraid of creating hundreds of new reconstructed pages on the authority of my Latvian source, the LEV, just to see them moved to other titles, or incorporated into other pages, etc.; I'd rather wait till there are more solid criteria). I frankly think Wiktionary simply follows no real policy on dealing with reconstructed entries, etymological sources, etc. -- everybody pretty much does whatever s/he wants... For differences between sources, case in point: Latin l in lutra. I simply copied what my source had on this problem, while the reconstructed page made that claim apparently on the basis of an old etymological dictionary of Ossetian (though it is not clear whether the reference refers to the entire page or only to the reconstructed protoform -- again, we lack a good format for these things). Should we mention both? Only the most recent? The best source? Again, where's the policy?... --Pereru (talk) 08:43, 6 August 2015 (UTC)
Indenting is better, although not perfect. I also tried indenting with ':' (where you don't see the bullet point) and setting the paragraph off with two blank lines. All are possibilities.
As for there being no real policy on reconstructed entries, I think you're right. Mostly that's probably because few people are actually creating those entries -- mostly it seems to be CodeCat (talk · contribs), at least for IE languages. You might consider proposing a policy and getting people to vote on it (although that may be a bit like herding cats). Benwing (talk) 08:08, 7 August 2015 (UTC)
BTW you could also try just "being bold" and editing pages like Wiktionary:Etymology and Wiktionary:About Proto-Indo-European and Wiktionary:About Latvian and so on that purport to be policy pages; if anyone objects, they will change it. Benwing (talk) 08:11, 7 August 2015 (UTC)
I'd like to have OP's third point clarified a bit. First, "listing cognates in full agreement with the source": sources on languages that have been unwritten or scarcely written until recently will often utilize technical or otherwise non-standard orthography or transcription; but I would suggest that this does not mean we are obliged to provide a separate source for the actual native orthography. E.g. the Udmurt reflex of Proto-Uralic *käle is кыл (kyl), but all basic sources appear to list only the transliteration kyl, kïl, kɨl or ki̮l. (And, as mentioned above, I agree that transcription schemes should definitely be unified here as well.)
Second, would "making sure that each cognate is duly sourced" involve simply watching that people don't add new cognates out of the blue, or actually adding an inline citation for every single cognate? The latter would sound like overkill, whenever the majority of a cognate list is based on a reliable and comprehensive source, such as an authoritative major etymological dictionary (or dictionaries), and is not explicitly contradicted by other equally reliable sources. Not every language group necessarily has such a source available of course, and establishing what sources to consider "reliable by default" (and to what extent — often a source might be quite reliable for cognates but outdated for reconstructions or etymologies) should be determined by the consensus of editors involved with the language or language group in question.
For "unexpected" cognates that are added from somewhere else than from a standard source (say, if someone were to release a paper arguing that Mongolian хэл (hel) is a Uralic loanword), I'd be in favor of annotating the etymologies in more detail, but it should probably be sufficient doing this on the "main" etymology hubs — the entry's own page and its posited origin's entry (whether an attested form or a reconstructed proto-form) — rather than on every single page that refers to it. --Tropylium (talk) 11:52, 7 August 2015 (UTC)
Here are my personal opinions on the clarifications you ask about:
(a) "In full agreement with the source" is not supposed to mean that you can't regularize transcriptions, as long as this is described in a policy page (e.g., WK:Etymology or WK:About Proto-Uralic or something like that), so that the interested reader can always see what can be done to source transcriptions.
(b) "Making sure that every cognate has a source" is meant as making sure the reader can tell where cognates came from. So, if all cognates come from the same source (some authoritative etymological dictionary, for instance), you can refer to it only once at the end of the page. But then it becomes necessary to indicate deviations if they occur. If someone adds a new cognate to this page that happens not to come from the common source, then this cognate needs a footnote indicating its source, so the reader isn't fooled into thinking it is from the same source as the others. In short: don't necessarily add a footnote to every cognate, but always make it possible for the reader to know where the cognate comes from. If it's an original suggestion of a Wiktionarian (e.g., CodeCat, who is into original research), then also say so by adding an "original research" template. --Pereru (talk) 19:55, 8 August 2015 (UTC)

As I see it, a discussion on allowing "Notes" as a valid header should be considered.

Vahag has brought this up (Wiktionary:Grease_pit#.7B.7Breflist.7D.7D) and I'm running into a similar problem all the time. As ridiculously silly of an argument as it may be, I do, in fact, agree that numbered and bulleted references together look ugly AF. (I have even gone to such ridiculous steps as removing a reference that didn't add anything critical just because it was bulleted while the other ones were numbered because of how unappealing it looks.)

In more general terms, I kind of get the feeling that there seems to be consensus that references are in fact valuable and add value to the entry, perhaps the discussion should focus more on how to allow more elegant ways of faithfully citing content, particularly in "controversial" cases, e.g., obviously one bulleted reference is enough under, say, an assertion that et kala and liv kalā derive from the same source because, well, it's pretty obvious but then if there is a "weird" controversial cognate there isn't even a way of citing it inline (unless you want the awful looking mixing of numbered and bulleted refs.) Neitrāls vārds (talk) 11:09, 12 August 2015 (UTC)

  • Support having a ==Notes== section separate from ==References==, esp. when both exist. Benwing (talk) 11:21, 12 August 2015 (UTC)
Thinking of copying this to a separate header for separate discussion. Neitrāls vārds (talk) 06:45, 17 August 2015 (UTC)

Transliteration obligatory?

It seems that transliterating non-Latin scripts has become obligatory in all templates, but in certain cases -- "Latin-like" scripts like Cyrillic or Greek -- I think transliteration actually annoys more than it helps. Why is transliteration, especially of Cyrillic and Greek, obligatory in all cases, including inflection tables and examples? I would rather have it only next to the headword... Case in point: Eastern Mari лум, where having two Mari lines is somewhat disruptive. Can't we have a parameter tr=- (which is what I used in this case) to avoid the obligatory transliteration? As things are, the only option is not to use templates... which I would prefer not to do. --Pereru (talk) 04:47, 5 August 2015 (UTC)

Generally, we transliterate everything in Wiktionary -- we don't assume readers are able to handle foreign scripts. So I don't think it's a good idea to disable the translit just because it seems disruptive to you -- we're not limited in space or anything like that (and "disruptive" is in the eye of the beholder). Benwing (talk) 07:48, 5 August 2015 (UTC)
I would also say that translit is especially important for an obscure language like Eastern Mari -- even though it's "just" Cyrillic, it invariably has different conventions from more familiar languages like Russian. (Consider, for example, the Abkhaz language, which is written in Cyrillic but with all sorts of strange non-Russian characters.) Benwing (talk) 07:52, 5 August 2015 (UTC)
I have fixed the problem that |tr=- didn't work in {{ux}}/{{usex}}. However, it should not be used except in exceptional circumstances and this is not one of them. --WikiTiki89 13:30, 5 August 2015 (UTC)
So you guys don't think that it is confusing to have two "examples" separated by em-dashes -- the original spelling example and the transliterated example -- followed by a translation? My first reaction was that it looked like it had two translations, or at least that there were too many elements, enough to clutter the view. Wouldn't it be better to restrict transliterations to headwords, and leave them out of examples and inflection tables? --Pereru (talk) 13:53, 5 August 2015 (UTC)
Should people reading the examples and inflection tables be required to be able to read the script? I think that's too high/elitist a requirement. People might be wanting to read inflection tables for all kinds of reasons. For example, I might be interested in Armenian inflections even though I can't read the script at all. Why make that impossible for me to do? —CodeCat 14:20, 5 August 2015 (UTC)
I agree with the obligatory transliteration of usexes. The format may be tweaked though. Perhaps the first em-dash can be replaced with parentheses. --Vahag (talk) 16:41, 5 August 2015 (UTC)
What about this: лум лумеш, возеш (lum lumeš, vozeš) ― it (lit. snow) is snowing ? DTLHS (talk) 16:51, 5 August 2015 (UTC)
I would not italicize, nor use a small font. I would use a format as in the headword line or {{l}}. --Vahag (talk) 20:20, 5 August 2015 (UTC)
I agree with Wikitiki89 and CodeCat. For short usexes DTLHS's suggestion is good. (I suppose this comes back to the subject we were discussing elsewhere, of having the template 'know' when to make a multi- vs a single-line usex.) - -sche (discuss) 17:37, 5 August 2015 (UTC)
I also like DTLHS's idea with parentheses and a smaller font. My problem with having transliterations everywhere is simply that it affects compactness, which we also want to strive for. I'd be in favor of some solution that doesn't force inflection tables, sometimes already too big (especially in an agglutinative language like Eastern Mari), to become twice as big. Wouldn't it be possible, for instance, to have a second, alternative table with the transliterations? Perhaps with a clickable point to change one version of the table into the other? Or could we maybe have the transliteration become visible in a hovering bubble as you move your cursor over the table? --Pereru (talk) 08:52, 6 August 2015 (UTC)
  • The widespread assumption on Wiktionary is that users are idiots, so having redundant and often unjustifiable data cluttering the entry is generally seen as a good thing. Perhaps we need a two-tier Wiktionary: one for "common people" - without dead words and meanings, complicated etymologies, transliterations on every place under the sun and generally anything that could hurt their attention spans in search of that precious datum of information that landed them here, and one for "serious people", with all that extra stuff. --Ivan Štambuk (talk) 11:02, 7 August 2015 (UTC)
That's actually a good idea. Is it possible to do something like that, maybe by having different shells for "specialists" and "non-specialists"? Even 'normal' articles seem cluttered with all those translation tables and alternate forms and what not, especially for the casual user who just wants to know what a word means. Note that other online dictionaries often have this extra information in some clickable-access format, but not immediately displayed when one asks for a certain word. --Pereru (talk) 00:58, 8 August 2015 (UTC)
I prefer to see transliterations, when templates are used, including for scripts and languages I can read, e.g. Korean or Hindi, etc.
내가 어찌 알겠어?
Nae-ga eojji algesseo?
How should I know?
एक नई शुरुआत
ek naī śuruāt
a new beginning
It's much easier that way for most users. "Smart" users can bear with those who are dumb :) --Anatoli T. (обсудить/вклад) 12:16, 7 August 2015 (UTC)
But wouldn't it be just as good if the transliteration were 'clickable' or available on a hovering bubble? --Pereru (talk) 00:58, 8 August 2015 (UTC)
This feature is currently unavailable. There's no point mentioning something that doesn't exist. (I am not saying, it's not possible to implement.) --Anatoli T. (обсудить/вклад) 13:05, 8 August 2015 (UTC)
If you're "not saying it's not possible to implement", then what's wrong with proposing it? --WikiTiki89 17:51, 10 August 2015 (UTC)
  • I oppose obligatory transliteration of usage examples. I would even favor banning transliteration in usage examples, but there won't be consensus for this. Then at least, don't make it mandatory. These transliterations present inessential (disposable) visual noise. --Dan Polansky (talk) 11:17, 8 August 2015 (UTC)
I suspect that any non-Roman script would be visual noise for you. Unless you can read all scripts, of course. --Anatoli T. (обсудить/вклад) 13:05, 8 August 2015 (UTC)
I am not saying the non-Roman script is noise; I am saying that the romanization in the example sentence is noise. Thus, in лум, I see this:
  • мамык лум ― mamyk lum ― fluffy snow
But I'd like to see this:
  • мамык лум ― fluffy snow
Romanizations in headword lines are fine, IMHO. --Dan Polansky (talk) 13:37, 8 August 2015 (UTC)
At least for one-line usexes, I'd like to see:
  • мамык лум (mamyk lum) ― fluffy snow
but that may become unwieldy for usexes where the translation is on a different line from the usex. —Aɴɢʀ (talk) 14:20, 8 August 2015 (UTC)
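The three one-line layouts compared above (transliteration between dashes, transliteration in parentheses, no transliteration) can be put side by side in a short Python sketch. This is only an illustration: format_usex and its parameters are hypothetical names, not the code behind {{ux}}/{{usex}}, and treating tr="-" as "suppress" simply mirrors the |tr=- parameter mentioned earlier in the thread.

```python
# Hypothetical formatter for one-line usage examples; NOT the real {{ux}} code.
SEP = " \u2015 "  # horizontal bar with spaces, used here as the separator


def format_usex(text, translation, tr=None, style="dash"):
    """Render a one-line usage example.

    tr=None or tr="-" suppresses the transliteration (assumed to mirror |tr=-).
    style="dash"  -> text SEP transliteration SEP translation
    style="paren" -> text (transliteration) SEP translation
    """
    if tr is None or tr == "-":
        return text + SEP + translation
    if style == "paren":
        return f"{text} ({tr})" + SEP + translation
    if style == "dash":
        return text + SEP + tr + SEP + translation
    raise ValueError(f"unknown style: {style!r}")


if __name__ == "__main__":
    for kwargs in ({"tr": "mamyk lum"},
                   {"tr": "mamyk lum", "style": "paren"},
                   {"tr": "-"}):
        print(format_usex("мамык лум", "fluffy snow", **kwargs))
```

The layout is a display choice made in one place, so a reader preference or a per-entry parameter could switch between the three without touching the entries themselves.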
I agree with Dan Polansky above, of course. But I still ask: why not find some other way of handling transliterations, such as making them visible when one moves the cursor over them in hovering bubbles, or having a button that makes them visible or invisible depending on the taste of the viewer? What would be wrong with that? We make inflectional tables appear closed by default, and only open when you click on them; why not do the same with transliterations? --Pereru (talk) 19:39, 8 August 2015 (UTC)
@Pereru. Someone will probably create a technical solution for this but I don't understand your dislike for transliterations in usexes. You can ignore them if you don't need them but do you realise that other users may be interested? They may not know the script or willing to learn it, they could be interested in analysing the grammar, vocabulary or language comparison. Foreign scripts just put off some people who are only used to Roman letters. I know this for a fact - this includes people who are familiar with foreign scripts but not fluent in them and reading foreign characters takes some effort. Besides, I'm sure you're having Cyrillic in mind when wanting to get rid of transliterations but the change (if implemented) will affect all non-Roman scripts, some are very complicated and hard to read! How useful would a string of Thai characters like this: เรียกรถแท็กซี่แล้วยัง be to you, compared to
เรียกรถแท็กซี่แล้วยัง?
rîak rót tɛ́k-sîi lɛ́ɛo yang?
Did you call the taxi?
? You would probably even have some difficulty in finding the headword term (เรียก) at first? --Anatoli T. (обсудить/вклад) 01:19, 11 August 2015 (UTC)
@Atitarev, maybe I'm making this seem more important to me than it really is. It all boils down to an esthetical preference: examples plus translations tend to already be long enough, if you still add transliterations the result will often be longer than one line, and that offends my sense of proportion. I would prefer no transliterations even in languages whose script I don't read (I can read the Thai script, so that's not a big deal for me, but, for instance, I don't read Chinese characters; and yet, for me, lines with just the original Chinese example and a translation look nicer than those with the transliteration). The esthetics gets especially bad with inflection tables, which become at least twice as large as they need to be only to accommodate transliterations. Now, I understand and agree that others have a right to think differently, and I won't mind too terribly if things remain as they are. But if there's a chance of getting a nicer format... then I'm all for it! --Pereru (talk) 06:20, 11 August 2015 (UTC)
@Pereru Thanks for the reply. Yes, various enhancements are welcome but until they are implemented, I think it's good to keep transliterations as they are. Yes, foreign language example can look nice but sometimes meaningless or very hard to digest. It is very true when you look for them. For me, full FL examples with translations and with transliterations (or phonetic guide/help like Japanese furigana, Arabic vocalisations, word stresses, etc.) were always a blessing in learning the basics of new tongues in a relatively short period. You can focus on scripts, grammar, vocabulary, syntax - it's your choice what you do and when, when you have all three (audio recording is a fourth important component). --Anatoli T. (обсудить/вклад) 07:08, 11 August 2015 (UTC)
  • Readability and usability I think that adding transliteration to other scripts is extremely valuable and would like to see it implemented throughout the dictionary but I am also concerned about the perspective that encourages adding an extra step on clicking or focusing for the browser and mouse because this is difficult for users with certain disabilities and on some platforms. —Justin (koavf)TCM 06:16, 12 August 2015 (UTC)

Eastern Mari possessed forms

I'm thinking about how to do a template that will include possessed forms in Eastern Mari, but because every possessed form ('my house', 'your house', etc.) can also be inflected for ten cases, singular and plural ('my house', 'in my house', 'in my houses', 'to my house', 'to my houses', etc.), we end up having 6 persons x 20 cases x 2 numbers = 240 forms, most of which are predictably formed. This means creating tables that are rather big and unwieldy. I was wondering if someone working with similar cases (in other Finno-Ugric languages, or in Turkish, etc.) has found a better solution than just creating big tables? (Right now, I'm tempted to make each non-possessed declined form -- e.g., 'my house' -- an independent sublemma, with its own case inflection table under it, but I'm not sure this is the best solution.) --Pereru (talk) 04:56, 5 August 2015 (UTC)

I'm a bit fuzzy on the details, but I vaguely remember someone saying that you can nest collapsible boxes. That means you could have just one form showing, but a whole sub-paradigm that opens up when you click on it. Chuck Entz (talk) 06:58, 5 August 2015 (UTC)
Finnish declension tables ignore possessive forms. The possessive endings (which can also be added to verb forms) have separate entries like -ni, -si, -nsä etc with lots of usage examples. --Makaokalani (talk) 10:32, 5 August 2015 (UTC)
For Hungarian entries, each possessive form contains its own declension table. For example: ablak (‘window’) → ablakom (‘my window’), of which the latter have a separate table with forms such as ablakommal (‘with my window’), ablakomban (‘in my window’), etc. Einstein2 (talk) 19:09, 5 August 2015 (UTC)
Nagyon szépen! I like the Hungarian solution. But how do you get those green links? They speed up the making of form-of pages considerably. --Pereru (talk) 01:57, 6 August 2015 (UTC)
Here's a description about how to make a template use the script which generates the green links: User:Conrad.Irwin/creation.js/documentation. Einstein2 (talk) 11:22, 6 August 2015 (UTC)
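To make the size trade-off concrete, here is a small Python sketch. It compares one flat table with the Hungarian-style layout, where each possessive form gets its own declension table. It uses the figures from the multiplication in the opening post (6 persons, 20 for the case dimension, 2 numbers). All labels are placeholders rather than actual Mari forms, and the function names are made up for illustration.

```python
from itertools import product

# Dimensions as used in the opening post's product; labels are placeholders.
PERSONS = [f"person{i}" for i in range(1, 7)]   # 6 possessors
CASES = [f"case{i}" for i in range(1, 21)]      # 20, the figure used in the post
NUMBERS = ["singular", "plural"]


def flat_table():
    """One big table: every (possessor, case, number) combination is a cell."""
    return list(product(PERSONS, CASES, NUMBERS))


def per_possessor_tables():
    """Hungarian-style split: one smaller table per possessive form."""
    return {person: list(product(CASES, NUMBERS)) for person in PERSONS}


if __name__ == "__main__":
    print(len(flat_table()))                      # total cells in the flat table
    tables = per_possessor_tables()
    print(len(tables), len(tables["person1"]))    # number of tables, cells in each
```

The total number of forms is the same either way. The split only changes how many cells appear on any one page.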

Make Proto-Baltic an etymology-only language

Linguists don't all agree on the nature of the Baltic languages as a group. There are three main proposals, that I know of:

  1. Balto-Slavic splits into Baltic and Slavic. Baltic then split into East and West Baltic. (this is the traditional view)
  2. Balto-Slavic splits into East Baltic and Slavic-West Baltic. Slavic-West Baltic then split into Slavic and West Baltic.
  3. Balto-Slavic splits into East Baltic, West Baltic and Slavic.

Proto-Baltic only exists in the first of these proposals. Moreover, it has been noted that there aren't really any common linguistic changes that separate Proto-Baltic from Proto-Balto-Slavic. As reconstructed, the two are essentially identical.

In the past, we've deleted and merged different proto-languages when there is no definite agreement on their existence and definition, and when they are too similar to their parent language to make separate pages for them worthwhile. For example, Proto-Finno-Permic and Proto-Finno-Ugric were recently merged into Proto-Uralic. There was also a discussion on merging various Polynesian languages, although I'm not sure where that went. In any case, I don't see the value in having separate pages for Proto-Baltic reconstructions when they're all just going to be identical to Proto-Balto-Slavic reconstructions. So I think that Proto-Baltic should be changed into an etymology-only language, so that it can be mentioned with {{etyl}}, but there can be no entries or links to it. All existing links would be changed to Proto-Balto-Slavic. —CodeCat 12:14, 7 August 2015 (UTC)

  • Support. --WikiTiki89 14:59, 7 August 2015 (UTC)
  • Support also. Like you, I have also heard that Baltic = East + West Baltic is not a valid clade. Benwing (talk) 16:30, 7 August 2015 (UTC)
  • Disagree. PBS is still not consensus, and as far as I understand the assumption PB = PBS is not obviously true -- Slavic can alter PB reconstructions significantly if it is taken into account for PBS. So, since there is no consensus, I say keep the PB pages as long as they're sourced. After there is a PBS etymological dictionary then this issue can be dealt with here; before that, doing this would simply be premature. --Pereru (talk) 00:55, 8 August 2015 (UTC)
But would there be any difference? It would receive the same treatment as fiu-pro – valid for use in etymologies (in {{etyl}}) but not having its own appendices. Does bat-pro even have any appendices, I think majority are bsl-pro, is that correct? Hopefully this would be another step towards lessening confusion/misguided deletions like this: User_talk:Tropylium#Category:Proto-Finnic_terms_derived_from_Proto-Baltic (I'm sure it was done with good intentions but a user should be able to use such oft-cited (in published literature) genetic groupings in etymologies even if they are considered defunct by the most recent research and don't have their own appendices.) Neitrāls vārds (talk) 09:25, 9 August 2015 (UTC)
There was no deletion: that category simply hasn't been created yet. The category showed up in Special:WantedCategories, and I wanted to make sure it was a good idea to create it before doing so. I wouldn't have deleted it if someone else had created it, but I try to avoid creating categories that are only going to be deleted later (though it inevitably happens some of the time). I do weed out a lot of mistaken categories from bad edits, which I correct, like Category:Spanish adejctive forms, but I generally wouldn't do that with a knowledgeable editor who intended to do it that way. I didn't create the category, but I didn't "fix" the entry itself. Chuck Entz (talk) 14:50, 9 August 2015 (UTC)
  • Question: has Proto-East Baltic been worked out to any major degree? As far as I know, everyone accepts East Baltic, which means that effectively the Baltic vs. Balto-Slavic debate should only come up whenever there's Old Prussian or similar data involved. I would not be surprised if there were even sources defining "Proto-Baltic" as only the common ancestor of Latvian + Lithuanian anyway. (I tentatively support a merger between the appendices; bear in mind that we could still cover in prose differences between Baltic and Balto-Slavic if they were to come up. But I have no opinion on which of the two should remain.) --Tropylium (talk) 18:46, 8 August 2015 (UTC)
Nope. AFAIK no such thing has been worked out as of yet. Neitrāls vārds (talk) 09:25, 9 August 2015 (UTC)
  • Support. Like Benwing, my understanding of the scholarship is that Baltic is not a genetic group and there was no Proto-Baltic. Even Derksen, who writes of Proto-Baltic, says "I am not convinced that it is justified to reconstruct a Proto-Baltic stage; the term Proto-Baltic is used for convenience’s sake." Reconstructing Prehistorical Dialects: Initial Vowels in Slavic and Baltic says "Baltic scholars who have concerned themselves with this question conclude that one cannot reconstructed a Proto-Baltic." The situation seems comparable to Proto-Algonquian, which was initially reconstructed as Proto-Central-Algonquian (contrasted with Eastern and Plains), before scholars realized that only Eastern was a genetic group with a proto-language (PEA), and that what had been reconstructed as PCA was, with only a few minor changes here and there, simply Proto-Algonquian. - -sche (discuss) 19:14, 8 August 2015 (UTC)
    But there is the question of accuracy. Since PBS still hasn't really been reconstructed (no etymological dictionary), mentioning fleeting forms or original research should only be done explicitly, which is not (yet) done here as policy. What is available out there often does have PB, not PBS, forms -- only those few words that are important for an author's paper, such as the Derksen paper you cite. In the absence of a body of consensus reconstructions for Proto-Balto-Slavic, disregarding the Proto-Baltic ones or changing them automatically into Proto-Balto-Slavic is simply too hasty. The work hasn't been done yet to justify this. We're still at "Proto Central Algonquian" time; to assume that the work of demonstrating that all those forms are simply "Proto Algonquian" has already been done is at best temerary. --Pereru (talk) 19:36, 8 August 2015 (UTC)
I'm confused ... AFAIK no one questions that Balto-Slavic is a clade. Benwing (talk) 05:12, 9 August 2015 (UTC)
Some Lithuanian (and perhaps Latvian) nationalists deny it. I've seen the claim made that the Balto-Slavic theory was a Soviet plot to justify the annexation of the Baltic States into the Soviet Union. I don't know whether any reputable linguists free of ideological motivations deny it, but if so, they're in the minority. —Aɴɢʀ (talk) 15:24, 9 August 2015 (UTC)
One of them tried to hijack the Wikipedia pages on the subject not that long ago. As to Benwing's confusion: the issue isn't whether it's a clade, but whether the details have been worked out on the proto-language. Also, proto-languages are theoretical constructs that are only as good as the information on which they're based: including Slavic in a reconstruction provides extra material to work with, so a PB reconstruction may not be as complete a picture as a PBS one. I have no problem with documenting that a referenced reconstruction was for PB rather than PBS. My main issue has been with categorizing entries as derived from PB. Even experienced editors sometimes forget about the categories that are added by the templates. Chuck Entz (talk) 15:57, 9 August 2015 (UTC)
@Chuck Entz, well, using it in etyl would imply categorization as well, this is how it's done for fiu-pro as well, do you think the cat should redir?
@Angr, one way it can be valuable (if one reads between the lines) is that it often is used as a "code word" for Proto-East Baltic (the hypothetical parent of Latv. and Lith. that hasn't been worked out yet and judging by current theories wouldn't include Slavs if it is, in fact, worked out at some point) which gives geographic and chronological clues (this can be important in Uralic/Finnic etymologies for example, as there appear to be several layers – a pre-Slavic Balt(o-Slav)ic layer and for Finnics a "Proto-Baltic" (read "Proto-East Baltic") layer of borrowings.) Neitrāls vārds (talk) 16:31, 9 August 2015 (UTC)
@Chuck: It's possible that Slavic would include more information, but someone with enough knowledge of Slavic sound changes could easily evaluate if the Proto-Baltic reconstruction is also valid for Proto-Balto-Slavic. In most cases, it will be. This is not limited to Slavic either; information from outside Balto-Slavic can also contribute to a Balto-Slavic reconstruction. —CodeCat 20:19, 9 August 2015 (UTC)

Adding our own diacritics in quotations of prose works printed without them

I've had an ongoing debate in the past with User:Atitarev about whether we should add stress marks to quotations of Russian prose. He believes that this is helpful to readers, but I am against this for a number of reasons. Firstly, I believe that out of respect to the author and publisher, all of our quotations should reproduce as closely as possible the original work with the exception of the bolding we add to the word(s) that the quote is demonstrating. Secondly, this forces us in some instances to choose between two or more equally acceptable stress variants of some words, or worse in some cases between two or more homographs with different meanings. Note that this does not apply as much to poetry from which stress can be inferred by the meter, or to songs or movies in which the stress can be heard. This problem is significantly exacerbated in languages such as Hebrew and Arabic, where we would not only be inferring stress, but also vowels, leaving much more possibility for ambiguity.

The question is: Should we (Wiktionary) do this in general? Should we do this for languages like Russian, even if not for languages like Hebrew and Arabic? Should we do this even for languages like Hebrew and Arabic? Should we remove diacritics from quotations where we have already added them? --WikiTiki89 15:32, 7 August 2015 (UTC)
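On the last question: where stress marks were added to a Russian quotation as the combining acute accent (U+0301), taking them out again is mechanical. Below is a minimal Python sketch under that encoding assumption. The function name is hypothetical, and it deliberately leaves ё untouched, since it has no way of knowing how the source was printed.

```python
COMBINING_ACUTE = "\u0301"  # the combining accent assumed to mark added stress


def strip_stress_marks(text: str) -> str:
    """Remove combining acute accents from Russian text.

    Only U+0301 is removed; precomposed letters such as ё and й are kept,
    because their marks are part of the letter rather than added stress.
    """
    return text.replace(COMBINING_ACUTE, "")


if __name__ == "__main__":
    accented = "самолёт лети\u0301т на за\u0301пад"
    print(strip_stress_marks(accented))
```

This does not carry over to Hebrew or Arabic, where diacritics printed in the original would first have to be told apart from those added later.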

As far as I know, the practice is to leave quotes relatively unchanged. I don't think we add macrons to Latin or old Germanic quotes, for example. —CodeCat 15:37, 7 August 2015 (UTC)
Adding macrons to Latin is a completely different story, because these texts are often already printed with macrons. I'm not talking about always sticking with the most original version of the quote, but about sticking to existing publications. This question mostly applies to relatively modern quotations. --WikiTiki89 15:55, 7 August 2015 (UTC)
As for Arabic, stress of course doesn't really apply, but I think it would be a huge help to the reader to add the vowels to the extent that they can be inferred reasonably unambiguously. Reading Arabic is hard for non-fluent speakers due to the underspecified text, esp. with verbs. I think in the case of Russian, similar arguments could be made -- if you're concerned about ambiguous cases, just leave off the stress in those cases or (perhaps better) follow Anatoli's convention of putting a stress mark in each possible place of stress. I'd also like to see individual words inside quotes linked -- again it would be a great help for the language learner. Benwing (talk) 16:26, 7 August 2015 (UTC)
Not all the quotations we include need to be targeted toward beginners. We can have usage examples with the full diacritics, which would be helpful for beginners. But quotations are meant to show how the words are really used in reality; and in reality, Russian is not written with stress marks and Arabic is not written with vowels. --WikiTiki89 17:32, 7 August 2015 (UTC)
In reality we always (or should always) transliterate Arabic text, at the very least. (And who's to decide what's targeted towards beginners and what's not? The same arguments could be made for not transliterating at all.) Benwing (talk) 19:49, 7 August 2015 (UTC)
What I mean is that not everything needs to be targeted toward beginners who can't read without vowels. And even for people who are not so comfortable reading without vowels, it's not as hard when you already know what word you're looking at. With transliteration, we're not actually altering the original text; the original is still there and anyone who doesn't want or need the transliteration can ignore it. --WikiTiki89 20:19, 7 August 2015 (UTC)
Another thing is that adding vowels prevents us from being able to show how vowels actually are used in the text (such as the fatḥatān, šadda, and other sporadic disambiguators). This applies to all three of the languages I've mentioned. --WikiTiki89 21:10, 7 August 2015 (UTC)
When we're giving a direct quote, we should keep the original spelling of the whole quote, i.e. without Russian stress marks (unless we happen to be quoting some text that for whatever reason uses them). We should also keep е for ё if that's how it was spelled in the original. (I don't quite understand why we allow ё in page names in the first place.) We can include stress marks in the transliteration if need be, though that will mean writing the transliteration out manually instead of letting it happen automatically. —Aɴɢʀ (talk) 10:55, 8 August 2015 (UTC)
We allow ё in the page names because this is a dictionary convention, it's so also in the Russian Wiktionary. The Russian Wikipedia makes the letter mandatory throughout articles and many native speakers prefer to write it all the time. Letter ё isn't exactly banned in Russian! It's also considered a separate letter, not a е with two dots (две точки). Every Russian dictionary uses it in the alphabetical order. Knowing that ё is replaced with е by native speakers lets you figure out how to spell it in the real world. For the same reason, I don't see how adding stress marks, normalising texts with ё, adding Arabic or Hebrew diacritics, Japanese furigana is a problem in quotations. Many editors suggest photographic image of the original texts, even using the glyphs. Modern Russian books don't reprint texts in the pre-1918 reform spellings. China republished all old books in the simplified script. Japanese publishers partially follow the post-war reform.
Another point, some Russian books appear in accented forms with consistent usage of ё, designed for foreigners or children. Or Arabic texts can be with or without vocalisations. Japanese texts appear with furigana (ruby) to help with the pronunciation, especially when aiming at young readers.
My strong opinion is that a dictionary should be user-friendly and help master languages, it's about the language, not the facts. Showing how languages are written out there in the real world can be described in appendices. Learners learn this as the first thing. For me, a learner of Arabic, it is much more useful to have vocalised Arabic than telling me over and over again that diacritics are not used by Arabs. Imposed restrictions is the reason I dislike adding citations. --Anatoli T. (обсудить/вклад) 12:46, 8 August 2015 (UTC)
I support Benwing's idea of linking words in usage example. It has long been used by Chinese templates, which do it automatically. E.g.
中國的首都是北京 [MSC, trad.]
中国的首都是北京 [MSC, simp.]
Zhōngguó de shǒudū shì Běijīng. [Pinyin]
The capital of China is Beijing.
As you can see, it has a semi-automatic script conversion and transliteration, it can also be used for quotes, which will display both traditional and simplified forms, regardless of the original form. --Anatoli T. (обсудить/вклад) 12:53, 8 August 2015 (UTC)
All of this is fine for our own example sentences, but I do think we should follow the original orthography when we're giving a direct quote. We're showing how the word is used "in the wild", and I don't think we should pretty that up. But headword lines and translation listings and usage examples can be as learner-friendly as we want them to be. —Aɴɢʀ (talk) 05:54, 9 August 2015 (UTC)
What do we violate by providing "самолёт лети́т на за́пад" instead of "самолет летит на запад" with word stresses and normalising "е" as "ё"? The text is the same, it just has accents to make the reading easier. It's completely uncommon in Russia to use pre-1918 reform spelling when quoting old authors and Chinese don't have to use traditional script when quoting old authors, regardless of what script the original was in. Chechen texts often replace Cyrillic palochka with |, l, 1, etc. for technical reasons but the normalised spelling distinguishes capital and small Ӏ and ӏ , e.g. лугӏат (luġat) (the correct spelling) will appear in a printed text as луг|ат, лугlат, луг1ат or лугӀат. Should we also copy the fonts and word breaks in citations? --Anatoli T. (обсудить/вклад) 07:23, 9 August 2015 (UTC)
I feel like with direct quotes, we should present them as faithfully as Wikisource presents source texts: we don't copy over fonts and word breaks, and incorrect character shapes can be replaced with correct ones when the intent is clear (e.g. when the original author is clearly attempting to write a palochka but doesn't have the exact character available), but we do present misspellings, misprints, typos, etc., uncorrected (though they can be [sic]ed) and we don't add pedagogical diacritics. —Aɴɢʀ (talk) 08:17, 9 August 2015 (UTC)
I agree with Angr. - -sche (discuss) 06:03, 9 August 2015 (UTC)
I disagree, as mentioned above. Although in any case there shouldn't be problems linking individual words in quotes. Benwing (talk) 07:08, 9 August 2015 (UTC)
FWIW, I think the argument for adding diacritics to Arabic (it's often unintelligible without them) is much stronger than the argument for adding diacritics to Russian (it's perfectly intelligible without them), and I would sooner allow the former than the latter. At the risk of adding far too much visual noise to non-Latin script citations, perhaps we could have vocalized forms display on mouse-over or something? - -sche (discuss) 16:54, 9 August 2015 (UTC)

WikiTiki's and Anatoli's disagreement is very deep and philosophical. It stems from the disagreement over the purpose of Wiktionary. Anatoli and his camp see Wiktionary mainly as a learning tool for non-native speakers. Hence the reading aids in quotations, the note in Template:ru-adj1 and the unscientific, pronunciation-based transliteration system for Russian. The other camp, which includes me, sees Wiktionary as a scholarly resource, a kind of an encyclopaedia of language, useful for native speakers too. One side wants to write an OALD, the other an OED. Both projects are useful and have a right to exist, but we have to choose one. --Vahag (talk) 13:56, 9 August 2015 (UTC)

I'm not sure what you mean by "unscientific" here. Also, maybe I'm an optimist but I think it's possible to resolve this issue through compromise. As for OALD vs. OED, keep in mind this is the English Wiktionary, and hence designed for English speakers. That means that foreign-language entries are inevitably geared somewhat towards language learners, just like all cross-language dictionaries. I don't think there's much disagreement over this. This means the OALD isn't the right point of comparison. We're rather trying to create something like the OED for the English-language entries and the Hans Wehr dictionary for Arabic language entries (this is the best dictionary of Modern Standard Arabic I can think of), and similarly for other foreign-language entries. Benwing (talk) 21:25, 9 August 2015 (UTC)
Vahag, neither the OALD nor the OED covers topics in the detail we do here. Published Russian dictionaries lack transliterations, so there's nothing to compare with. Well-known dictionaries are unconcerned about Russian transliteration; they simply don't do it. When they do (in citations, etc.), you get both "narodnovo" (phonetic) and "narodnogo" (graphic) transliterations (genitive or animate accusative of наро́дный (naródnyj)). You made negative comments about word stresses and genders as well but most users and editors find them useful, AFAIK. Therefore, I have to use other languages again as examples, for the umpteenth time.
Examples of irregular pronunciations and transliterations, using very common words in various scripts:
  • Thai: ชาติ (châat) (written as "châa-dti") but the final "i" is silent. Can you find a (scientific) source, which claims that it should be transliterated as "châa-dti" or similar, with a transliterated "i"?
  • Korean: 십육 (simnyuk) (written as "sibyuk"). Can you find a (scientific) source, which claims that it should be transliterated as "sibyuk" or similar?
  • Japanese: 今日は (こんにちは, konnichi wa) (written as "konnichi ha"). Can you find a (scientific) source, which claims that it should be transliterated as "konnichi ha" or similar?
  • Arabic: شُوكُولَاتَة (šokolāta) (written as "šūkūlāta"). Can you find a (scientific) source, which claims that it should be transliterated as "šūkūlāta" or similar? Perhaps a better example is إِنْجْلِيزِيّ (ʾinglīziyy) written as "ʾinjlīziyy".
I can give more examples where phonetic transliteration (closer to pronunciation) is considered standard and scientific. Are there sources that claim that "что" should only be "čto" and never "što" and "кого" should only be transliterated as "kogo" and never "kovo"? --Anatoli T. (обсудить/вклад) 01:01, 10 August 2015 (UTC)
Anatoli, Benwing, Russian transliteration has been discussed a million times (see Wiktionary talk:Russian transliteration) without achieving consensus. Let's not start a new discussion here. I was merely pointing it out as an example of scientifically rigorous vs convenient. The issue at hand is the usage examples. When you are giving a quote from Pushkin's Eugene Onegin, I want it to be without stress marks and in pre-reform orthography, as it was published in 1833. If you normalize the text, it is less valuable to me and others who are interested in the diachronic, historical development of Russian. Language learners would prefer normalized quotes. Our needs are irreconcilable. --Vahag (talk) 10:50, 10 August 2015 (UTC)
If I were to provide citations for the pre-1918 reform spelling of пока́мест (pokámest) (modern) - пока́мѣстъ (pokáměst) (pre-1918 spelling reform), then the old spelling would be more appropriate:
Покамѣстъ, въ утреннемъ уборѣ
Надѣвъ широкій боливаръ,
Онѣгинъ ѣдетъ на бульваръ,
И такъ гуляетъ на просторѣ,
Пока недремлющій брегетъ
Не прозвонитъ ему обѣдъ
But why would I need to confuse users/readers if the entry is the modern spelling (покамест)? Pre-reform spellings are, of course, allowed but they should be clearly marked as old or obsolete. Pushkin's works are enjoyed today by most readers, who don't have to struggle to read the old orthography. Citing pre-revolution authors works just fine, and if anyone is interested in the old orthographies, they are free to pursue them, but that has little to do with a dictionary of (modern) Russian. --Anatoli T. (обсудить/вклад) 12:32, 10 August 2015 (UTC)
You can reference a more modern printing of the work, which would have already converted the orthography. I would have no problem with that. --WikiTiki89 17:59, 10 August 2015 (UTC)
That's great but why would a learner of Russian seek archaic spellings in the Russian sections of the English Wiktionary, even if the reference is for a term which hasn't changed with the reform? It's fine if you have already mastered modern standard Russian and wish to take the next step and familiarise yourself with historical spellings. Yes, we can add all historical spellings but they are not a priority for this project. --Anatoli T. (обсудить/вклад) 01:07, 11 August 2015 (UTC)
In case you misunderstood me, you can quote a more modern printing of the work that uses the modern orthography. I'm only concerned with us altering the text ourselves. --WikiTiki89 01:22, 11 August 2015 (UTC)
Perhaps, my comment was to get my point across in reply to Vahag's comment earlier, where he said that he would prefer the original orthography quotes. Eugene Onegin (or Yevgeny Onegin) is available in both pre-reform and modern spellings or in fact any old literature for that matter. I just don't see the need to quote pre-reform orthography for modern terms. Not at the expense of modern orthography, in any case. --Anatoli T. (обсудить/вклад) 01:48, 11 August 2015 (UTC)
Some people may be interested in them. If we already have a few quotations in modern orthography, who does it hurt to have one in the old orthography as well? --WikiTiki89 02:08, 11 August 2015 (UTC)
  • Adding stress marks to attesting quotations of Russian prose is a poor practice, IMHO. Adding these to headword lines is acceptable; adding these to lists of terms such as synonyms and derived terms is equally poor, IMHO. In most places, terms should be presented in the form in which they appear in print. I don't believe the learners of Russian should be reminded on every single occasion how to pronounce a word; the headword line itself should suffice. --Dan Polansky (talk) 11:40, 10 August 2015 (UTC)
    I agree with Dan in principle. Language learners are smart enough to click the link if they forgot the stress or pronunciation of a word. --WikiTiki89 17:59, 10 August 2015 (UTC)
As an (admittedly not very committed) learner of Russian, I would find more pervasive usage of stress marks very useful. Looking up the stress every time is very tedious and a drag on learning. Seeing the stress mark in quotes, examples, links and synonyms would make learning faster through repetition and reinforcement. --Tweenk (talk) 22:48, 26 August 2015 (UTC)

Tagging unsourced reconstructed entries

I've just made {{needsources}} to tag reconstructed entries (protoforms) that were created without explicit published sources. Since, after all, reconstructed forms are simply hypotheses, not attested words, they need sources (who proposed that reconstruction, in what publication, and based on which cognates) just as much as a "normal" word needs usage examples so we know it really exists. I therefore suggest that any reconstructed entries that have no sources in them be tagged, so that those interested in them can add the sources. (I started doing this, but my edits were reverted since the issue had not been discussed here first, so I am doing this now.) --Pereru (talk) 01:40, 8 August 2015 (UTC)

I don't understand why they must absolutely have sources. From its conception, Wiktionary has been a dictionary and therefore stands on par with other dictionaries. Other dictionaries do not source all their definitions to another linguistic work; they interpret and present their research independently. In the same way, Wiktionary and its editors have directly interpreted evidence in the form of attestations. Parroting other dictionaries has always been explicitly forbidden and independent research of lexicographic content has been a requirement, enshrined in WT:CFI and the process of WT:RFV. For lexicographical content, we have never once required corroboration by an outside source; we require evidence and make our own decisions based on that through consensus and peer review.
Because Wiktionary presents etymological information as well, it's also an etymological dictionary. That means that other etymological dictionaries stand on par with Wiktionary. Etymological dictionaries, too, present independent and sometimes novel interpretation of the evidence, and are not required to take all of their contents from other linguistic sources. Of course, when information is corroborated by another source, they can and do indicate this, to strengthen their own claims. But etymological works may equally question or refute what other sources say; they're not limited to parroting others.
Wikipedia is an encyclopedia, a compendium of existing knowledge. This makes sourcing vital to Wikipedia, and original research a problem. But as I have shown here, Wiktionary is of a very different nature, and through this nature it is bound by different rules. It's not a compendium of lexicographic or etymologic knowledge presented by others; it's an independent source of this knowledge. We are not subservient to other linguistic sources, we are their equivalents, or even competitors. Original research within Wiktionary is important, it's an integral part of how Wiktionary works and has always worked. Therefore, it's not appropriate to require sourcing to another linguistic work for information presented on Wiktionary. This goes directly against what Wiktionary is, and the principles and processes written down in our policies. —CodeCat 12:09, 8 August 2015 (UTC)
Contrary to the above, requiring references for etymologies is not against en wikt policies since we do not have any on the matter. WT:ATTEST, the important evidence-based criterion, says nothing about etymologies. Some people are even pushing a requirement that etymologies should be referenced into WT:ETY; my removal (diff) of an undiscussed addition of such a requirement was undone. I think the whole section References in WT:ETY should be removed as not traceable to a discussion or vote showing consensus, but I have better things to do at this point; maybe a couple of months later. Again, while for definitions we have WT:ATTEST and WT:CFI in general, for etymologies we have a policy vacuum. --Dan Polansky (talk) 13:10, 8 August 2015 (UTC)
Let's see if I can help CodeCat understand why sources for etymologies are a good thing:
(a) Etymologies are hypotheses, not the truth; the interested reader should be able to see why a certain etymology is given here rather than others, without having to trace some discussion of its correctness somewhere in the archives.
(b) Etymologies, being hypotheses, have authors: unlike words, they aren't simply "in usage" or "out of usage" or "dated" and whatnot, they were actually ideas, good or bad, proposed by someone. To omit this information is (b1) a disservice to the interested reader, since it hides available information, and (b2) unethical, since it amounts to not giving credit to an author for his/her idea, which is a kind of intellectual theft.
(c) To the non-specialist, more information is better than less information. I am sure that a specialist can probably quickly assess and evaluate the goodness of a specific etymology, but others would need more than that. Claiming you don't need this information because "expert Wiktionarians" can assess the correctness of an etymology anyway is like claiming that attestations are not necessary to qualify a word for inclusion because "expert Wiktionarians" can tell if a very rare or dialectal word actually exists...
(d) "Other dictionaries don't do that" is not a good argument ("Wiki is not paper", etc.). Some do: etymological dictionaries, where the sources are so important they are usually listed at the beginning of the book rather than at the end, because the author knows that the interested reader will want to form his/her opinion on the author's choice of sources. Non-etymological dictionaries indeed often don't, but they also often don't cite any etymologies at all, and they certainly don't have appendices with reconstructed protoforms -- if we want to follow them, then we should delete all reconstructed entries, shouldn't we?
(e) "Etymological dictionaries present independent and novel interpretations of the evidence" -- indeed, and they always label it as such! And they also always give sources for ideas that are not "independent and novel"! Why should Wiktionary be any different? Personally, I am not against independent research, as long as it is (e1) labeled as such, and (e2) argued for, preferably on the same page. Why are you not doing that? In other words: Etymological dictionaries do distinguish original ideas from other people's ideas, which they give sources for; why don't we -- why don't YOU -- do the same?
(f) Mentioning sources is not equivalent to parroting other dictionaries' definitions--quite the opposite! Mentioning sources means respecting other people's intellectual property rights, and also giving the reader the possibility of exploring the basis for a given etymology being used here.
Besides, both Wiktionary:Reconstructed_terms#References_and_verifiability and Wiktionary:Etymology#References mention the need for sources in etymologies. Why shouldn't we follow these guidelines?
@CodeCat:, you seem to believe that sources vs. lack of sources boils down to Wikipedia vs. Wiktionary. It doesn't. The reason for adding sources to etymologies is that it is a good idea (see above), not a simple imitation of other wiki projects. Please get off the soapbox!... Also, it's not a question -- at least not to me -- of "original research". As I said elsewhere, I have nothing in principle against original research; I just want it to be labeled as such. If the reconstructed protoforms you created entries for are all your own work, then they should be labeled as such, and your reasons for creating them with that form should be on their page (or on a page like WK:About_Proto-Indo-European, or WK:About_Proto-Balto-Slavic, etc.). I'm not "requiring sourcing to another linguistic work", I'm just "requiring sourcing"-- if it's your work, say so on the page! That's what etymological dictionaries do: they label their own work as such. It's also not about "criteria for inclusion or deletion": I'm not saying 'delete it if it's original research', I'm saying 'label and argue for it if it's your work' -- not just in obscure discussions two years ago in the Scriptorium, but right on the reconstructed entry page! WHY THE HELL NOT? Something I really don't understand is why you are hellbent on obscuring the reasons why a certain protoform is included here. In what way does hiding the reasons/sources for including a form help Wiktionary become better? Claiming that "expert Wiktionarians" can judge it so we don't need to argue for them on their page is like claiming that "expert Wiktionarians" can tell if, say, Arabic usage examples are correct or not, so we don't need to translate them into English on the page of the word they are an example for...--Pereru (talk) 19:05, 8 August 2015 (UTC)
(a) Sourcing doesn't actually tell the readers any reasoning. It just suggests that the reasoning might be found in another work instead, but even that's no guarantee as plenty of other works just give forms without any arguments. I am completely for reasoning and giving arguments for reconstructions, within reason. Some widely known and accepted sound changes like Grimm's law should not need to be pointed out in every etymology. So I'm not sure how this point is relevant. External sourcing doesn't change anything about it. If anything, I understand your argument to mean that we should provide argumentation for etymologies in addition to, and regardless of, sourcing.
(b) It can be assumed that all information on Wiktionary is the result of Wiktionary's own editorial process. All content on a wiki is already sourced through the page history, so that gives credit to everything users have ever added to pages. Adding references to Wiktionary users only complicates things. External sources are fine, but we should not be required to tag everything we add with our own usernames, that's just stupid.
(c) Again, a source provides no information, it merely says where information came from. We use many specialised linguistic works as sources on Wiktionary, and I don't think many Wiktionary readers will have access to them. So to the majority of readers, the source is nothing more than a name.
(d) I have nothing against providing a reference to a source when information is taken from them. I admit I have been rather sloppy about this, and still am to some degree. But I am trying to improve things, as you may have noticed from my recent edits to PIE root pages. Do as I say not as I do. Just because I'm not perfect doesn't mean I'm not right.
(e) Again, I have nothing against sourcing information that does come from an external source. What I disagree with is requiring that all information comes from an external source; this is what your new template's wording appears to imply. I also disagree with sourcing particular ideas to individual editors. Wiktionary is a wiki, and information can and should be edited and improved by other editors. This means it's not right to place certain parts of pages on "lockdown", not allowing anyone else to edit them. Etymological information originating from within Wiktionary should be sourced to Wiktionary editors as a whole, and to editorial consensus. But since all information not sourced to external sources can be assumed to have been provided by Wiktionary editors, this is entirely redundant.
(f) Copyright doesn't apply and never has applied to information alone. So intellectual property is not relevant here. Scientists give each other credit and require it from others, because of plagiarism, but that's not intellectual property as far as I know. And I have no idea what the laws and rules are on plagiarism anyway. Wiktionary doesn't have any rules for it.
Those two pages you mentioned were written long ago, long before there was really any significant number of reconstructed pages. I also doubt whether they actually reflect consensus and common practice, so they should be changed to reflect what we actually do. My objections to your proposals, now and before, are that we should not be required to have an external source for all etymological information on Wiktionary. This is where my comparison with Wikipedia comes in. Wikipedia has a simple rule: unsourced material that is challenged can and should be removed. I object to bringing this practice to Wiktionary, as we are a dictionary (lexicographical, etymological and other) and it is in the nature of this project to be able to interpret, research and peer review available evidence (attested words) on our own.
So, again, to recap: I have nothing against sourcing. If information comes from somewhere else, source it. That's a good thing. Explaining reasoning for particular reconstructions, in the entries themselves, is also a good thing. I have no problem against that either, but within reason. Very obvious things like Grimm's law probably don't need to be mentioned, but there is no objective standard for this and if we want to go this route, we should figure out among ourselves which information is obvious enough to leave out. —CodeCat 19:39, 8 August 2015 (UTC)
(a): Of course, only if you mention bad sources. Good sources do have the reasoning behind the proposals. It's up to you if you cite good sources or bad sources. Don't cite bad ones; cite good ones. If you see bad sources being cited, mention that to the author or start a discussion about that source. Don't just omit it -- as always, there is nothing to be gained by using a source -- including your own original research -- and not mentioning it. How many etymological dictionaries do you know that fail to mention their sources? And they are not Wikipedia... Now, it would indeed be better if you added the entire reasoning behind a suggestion rather than just reference the source, but the latter is easier and is the standard practice in etymological dictionaries. And most reconstructed entries here -- especially the ones you made -- still lack such an explanation, which is why they should be tagged with {{needsources}}.
(b): Sure. But as others have said there is no policy with respect to etymologies and their sources, so saying "it's the result of Wiktionary editorial process" still tells us nothing about what was done. What if I want to know the reasons? Where do I find this information -- an information that most etymological dictionaries give by means of, among other things, indicating their sources? And by giving detailed reasonings when it's their own idea?
(c): A good source does provide information. Are you familiar with good etymological dictionaries? They provide further sources, so you can trace it down to the original proposer, and they provide rationales for deviant forms. They also compare different hypotheses, and often provide further evidence for preferring one or the other. Plus they list correspondences and sound laws, especially the least known ones. They're full of argumentation, reasonings, rationales... What the heck are you talking about? What sources are you talking about?
(d): Good! Please continue doing that. If you add sources to your pages I have no problems with them. In fact that is my entire point: not having sources and reasons for including a particular protoform on the page itself is not a plus for Wiktionary, it's actually, as you put it, being sloppy. I'm glad you're fixing that, and you'll get my support for this. The goal, of course, should be to fix everything.
(e): And here we are apparently in full agreement: I am in favor of referencing external sources only when the information comes from an external source (duh!...). But now, "if" a given word is the result of your own original research, then this should be sourced, so that the reader knows that it is your original research. If you have your reasoning on the page, what the heck is bad about saying it is your idea? In what way is that bad for Wiktionary? And again, good etymological dictionaries do that (Karulis adds a big "K" to every paragraph in the LEV that contains his own ideas, for instance. That is what good etymological dictionaries do: they do not shy away from original research, but they label it as such and argue for it on the entry itself! Why is that so bad?)
(f): Intellectual property is not simply a question of law; it's a question of ethics. "Plagiarism", i.e. people taking advantage of other people's ideas without mentioning them, is exactly what the concept of intellectual property is supposed to prevent; why else do you think it exists? I think scientists don't own the legal copyright over their own ideas after they're published, but they certainly have the moral/ethical copyright. Do you think Dr Kim would be happy if you wrote him an e-mail telling him you've mentioned Proto-Balto-Slavic protoforms he proposed in a public forum like Wiktionary without mentioning his name? Would you, if you were in his place? Maybe he thinks Wiktionary is "just internet" or "not trustworthy" and thus not worth the trouble, but I'm sure he wouldn't think that not mentioning his name is the right thing to do -- in fact, I'll bet he would mention this as an argument against taking Wiktionary seriously. Which in fact it is.
(g): Maybe the pages should indeed be changed; two other Wiktionarians in the rfd discussion have already suggested that I myself "be bold" and edit and change them. I don't want to do that, though; but if you feel so strongly about it, why don't you? I do point out, though, that several others have said there is no official policy, so I'm not sure that there is a "what we do" yet: you seem to be placing the cart before the horses here. I think you still need to argue for "what YOU do" as being "what we should do". And frankly, I don't see how you can argue that not mentioning sources actually enhances Wiktionary. There is no self-respecting etymological dictionary that doesn't mention sources and doesn't label independent original research as such; why should Wiktionary?
In sum, if you don't have anything against sourcing, then remove the {{rfd}} from a template that merely asks for what you say you have nothing against. If you are in favor of explaining reasons for particular reconstructions in the entries themselves, then do so. In fact create a framework for doing that, with a special page in the Wiktionary namespace for listing all correspondences, all sound laws, etc. so you can easily refer to them in the shorter explanations in every reconstructed entry. By all means do so! The problem thus far is that this is not being done, and when I started requesting that it be done ("source" = "published source" OR "original research rationale") you reverted all my changes and asked for my template to be deleted. Be consistent! Do as you claim to believe! --Pereru (talk) 20:34, 8 August 2015 (UTC)
@CodeCat:, to summarize:
It seems we agree in most things. We both think it's good to have sources if the information comes from an external source. We both think original research is OK, and we're both in favor of writing down the reasons for a certain reconstruction in the entry itself. I'm further in favor of you also mentioning yourself as the author of a given idea if indeed that is the case, or at least of referencing/copying the discussion that led to a given form being accepted here. So why not do it? And what is the problem with tagging the entries where this wasn't done yet? I also add {{rfap}} to basically every new Latvian entry I make, because this puts them in a single category where Latvian native speakers like Neitrāls vārds can comfortably find the words they want to add pronunciation files to. Because, just as in the case of etymologies being sourced (and I don't mean only external sources), this actually adds value to the entry. Why not make this official Wiktionary policy?--Pereru (talk) 20:46, 8 August 2015 (UTC)
I just don't want my name to be placed in entries, and especially not my real name. I think that's my prerogative. —CodeCat 21:28, 8 August 2015 (UTC)
Not even CodeCat? Why not? I don't want my real name here either, but I wouldn't mind signing something here as "Pereru", the same way I sign a picture I upload to Commons as "Pereru"... If our names are in the histories of the pages we edit, and here as signatures in the comments we write, why not also in suggestions in pages? But well, it *is* your prerogative. Call it then "Wiktionary contribution", or tag it with a "W" or "WK" to show that the idea originated here, rather than in the outside world.--Pereru (talk) 04:18, 9 August 2015 (UTC)
I don't accept the notion that we need to cite sources to list descendants; that would hobble us. Regular inheritance by a language of a word from an earlier stage of that language (including from a proto-language) is usually so obvious and non-noteworthy that it is not mentioned except for common words in well-documented languages, or for proto-language terms that an author needs to grasp at less-documented languages to demonstrate; good luck finding a reference that confirms, for any sizeable number of words from e.g. Rumantsch, that they indeed derive from Latin/PIE foobar. Even borrowing may be obvious but unreferenced; no reference in supra confirms that the word derives from Georgian, but it's fairly obvious.
I do think the sheer existence of a word in a proto-language is something we need to provide a reference for, though if a reference attests that a certain word existed in a proto-language, I think we can and should certainly adapt that reference's potentially outdated notation; when I do this in Proto-Algonquian appendices I write source (has form) (sometimes visibly and sometimes in an HTML comment). If no previous scholarship attests the existence of a word, we could put a template at the bottom of the entry (a bit like {{LDL}} and {{Webster}}) saying something like "this reconstruction is the product of deduction by Wiktionary editors"; users could then (as with every other claim on every non-talk-page) look to the page history to see who added what. Such a template would provide a nice way of tracking and periodically revisiting such entries to see if references for them had become available, since it seems to be obvious to everyone except CodeCat that citing external authority wrt the existence of words in proto-languages is better than leaving it at "well, a random, vehemently anonymous person on the internet thinks so". - -sche (discuss) 22:15, 8 August 2015 (UTC)
@-sche: the problem here is simply when you have cognates proposed by different sources. Cognates can be sourced by default (they will mostly come from the same source anyway), without necessarily adding a footnote to each of them; but those that come from some other source will need to be footnoted, so that we are clear the source in question did not claim cognacy in this case. This applies even to words suggested as cognate by Wiktionarians: we could add a little superscript "W" to those, for example. This happens because "obvious" is not always true. French parler looks like a cognate of Portuguese falar, but it isn't. In fact, it's standard scientific practice: when you are presenting cognates, they must be either (a) sourced, or (b) your claim, or at least (c) attested in some very well-known source, so they can be presented as "known to everybody already".--Pereru (talk) 04:48, 9 August 2015 (UTC)
I'm still being misunderstood here it seems. I do think that citing an external authority improves etymologies and reconstructions further. However, I don't think that reconstructions are necessarily less reliable without them. Sometimes, the reconstruction is just so obvious that there's nothing else it could possibly be. A great example is Proto-Finnic *kala. It's exactly the same form as its ancestor and many of its descendants. If we can find sources that agree with our own ideas, then all the better, that just shows that we're not alone in thinking that. But the same applies to sources with respect to each other, too. If we have two sources that disagree with each other, then we can mention the idea from both of them. But we should also feel free to poke holes in these proposals. Maybe we (through WT:ES or a talk) could decide that one of them has more merit than the other, and we can mention our reasoning in the entry. As editors and researchers, we don't have to consider all sources equally valid. —CodeCat 22:28, 8 August 2015 (UTC)
I think the misunderstanding is actually yours, about how science works. Yes, *kala is maybe an obvious case, but it was not discovered by you or me. It has a proposer, and saying who it is is, I think, something an etymologist would be interested in. See, this is like saying we don't have to provide usage examples or definitions for words that "everybody knows". Yes, everybody knows what time and happy mean; yet Wiktionary provides them with definitions. Is this useless? No. Is it useless to provide a source for *kala? Again, no. Just ask any scientist: is it useless to provide sources for 'obvious' things? No, both for credit/historical reasons (the guy who said it first deserves the credit), and for scientific reasons ('obvious' ideas sometimes turn out to be wrong...). If the source is well-known ('everybody knows who proposed that'), then scientists will not mention the author (everybody knows the laws of gravitation were proposed by Sir Isaac Newton).
Looking for "the proposer of" obvious etymologies is not a good idea. Finnic is a dialect continuum, and it has always been known by the speakers that people in nearby areas use plenty of the same words. This would be sort of like asking "who was it that proposed that fish in British English and fish in American English are cognate?" (Or: "who was it to discover that the moon has phases?")
It's possible to do historiography on when an etymology like this starts turning up in scientific literature of course, but that's more constrained by the development of linguistic methodology and publication practices themselves. {{R:fi:SSA}} mentions appearances of kala spanning 350 years; the earliest inter-Finnic comparison found by them is Finnish ~ Estonian from 1786, followed by Karelian in 1799, Veps in 1830 (in the first linguistic report on Veps to be published), Votic in 1856 (in the first grammar of Votic to be written), etc. (There's no specific date on who was the first to claim that this is also a Proto-Finnic word; but if we grant modern theoretic understanding, this is already implied by the Finnish-Hungarian comparisons from the 17th century, so essentially the date would be as soon as someone came up with the concept of "Proto-Finnic" in the first place.)
I agree that this is information that someone might be interested in, but just referencing SSA itself should be enough so that people interested in the history of etymology would know where to look for more details. At Wiktionary we're only working on etymology itself, not its history. --Tropylium (talk) 13:47, 9 August 2015 (UTC)
Indeed, I agree, especially because a source like SSA would probably give you the beginning of the trail leading to the first proponent if need be. I'm not saying that you need to find out the very first historical source ever to make the claim; but that, unless the claim is yours, some source should be indicated (so the interested reader can follow the trail). And it seems that we agree on that, right? (The "fish" in AE and BE case is not really parallel: I don't think these words were popularly believed to be cognates, but rather they were believed to be the same word, much as when I use "fish" as opposed to when you use "fish": we are using the same words, even if we pronounce them differently. Now, English "fish" and German "Fisch" or Dutch "vis": that is not perceived as the 'same word', and cognacy enters the picture.) --Pereru (talk) 19:20, 10 August 2015 (UTC)

Transliterations in parentheses?[edit]

From the above discussion, it seems to me that most people want to keep the automatic transliteration of non-Latin-script examples. Would it be possible to implement DTLHS's suggestion of putting the transliteration in parentheses rather than after an em-dash, to distinguish it more clearly from the following translation? Could someone perhaps make the necessary changes in the appropriate module, assuming nobody has any objections? --Pereru (talk)

I oppose using brackets but perhaps a light-grey colour for transliterations would be more palatable? --Anatoli T. (обсудить/вклад) 12:59, 8 August 2015 (UTC)
What's wrong with brackets/parentheses? Transliterations on the headword line are in parentheses. Light grey text is hard for people with bad or limited eyesight (e.g. partial blindness) to read, although such people are probably only a tiny minority of our readers. I'd prefer parentheses to lighter text. I like the suggestion (made above) of putting transliteration on the same vs a different line according to the length of the line, but I guess it has no chance of actually corresponding to "fits on one line" vs "doesn't", given the variety of phone- and computer-screen sizes (unless we implement it as a CSS feature?). - -sche (discuss) 17:58, 8 August 2015 (UTC)
I agree. (Personally, I would even favor a smaller font, in addition to parentheses, but parentheses would already be enough to separate more clearly transliteration from transcription and from the original text).--Pereru (talk) 18:43, 8 August 2015 (UTC)
How about an option that allows transliterations to be shown and hidden at will? —CodeCat 19:49, 8 August 2015 (UTC)
Sounds OK to me. Is that easy to implement? --Pereru (talk) 04:16, 9 August 2015 (UTC)
As long as "at will" means something the end user does, not the editor. Benwing (talk) 05:19, 9 August 2015 (UTC)
Yes, it should work more or less like showing and hiding inflection tables. But there should probably be something that saves the user's preference too, so that transliterations stay hidden forever unless you show them again. —CodeCat 17:02, 9 August 2015 (UTC)
I can agree with that. I'll wait for implementation before using the templates in Eastern Mari, but after that it shouldn't be a problem. --Pereru (talk) 19:21, 10 August 2015 (UTC)
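To make the two layouts under discussion concrete, here is a minimal Python sketch that renders the same example both ways: transliteration after an em-dash (the current output) and transliteration in parentheses (DTLHS's suggestion). The function name `format_usex` and its parameters are hypothetical and exist only for illustration; the real change would have to be made in the relevant Lua module, which this does not attempt to reproduce.

```python
EM_DASH = "\u2014"  # the separator currently placed before the transliteration


def format_usex(text, translit="", translation="", style="parens"):
    """Lay out an example, its transliteration and its translation on one line.

    style="dash"   -> text, dash, transliteration, dash, translation
    style="parens" -> text, (transliteration), dash, translation
    Hypothetical helper: not the actual module code.
    """
    if style not in ("parens", "dash"):
        raise ValueError("style must be 'parens' or 'dash'")
    parts = [text]
    if translit:
        if style == "parens":
            parts.append(f"({translit})")
        else:
            parts.append(f"{EM_DASH} {translit}")
    if translation:
        parts.append(f"{EM_DASH} {translation}")
    return " ".join(parts)


if __name__ == "__main__":
    # Same example rendered in both styles for comparison.
    print(format_usex("рыба", "ryba", "fish", style="dash"))
    print(format_usex("рыба", "ryba", "fish", style="parens"))
```

With parentheses, only one dash remains on the line, so the translation is the only element introduced by it. A show/hide toggle with a saved user preference, as suggested above, would be a separate client-side feature and is not covered by this sketch.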

When adding RFC to entries[edit]

Would it be too much to ask that, when an RFC is added to an entry, the date be added as well (perhaps automatically), so that it can be traced back much more easily in the RFC records? Some RFCs remain in entries for years and get forgotten about, and are not easily traceable in the entry's history. Donnanz (talk) 16:00, 9 August 2015 (UTC)

Wikipedia has a bot that goes around adding dates to cleanup templates. Perhaps we could ask the folks who run it to run one here, too. You can always find the RFC discussion via the whatlinkshere (restrict it to searching the Wiktionary namespace and ctrl-f "cleanup"), unless the page was tagged but not listed. - -sche (discuss) 01:59, 11 August 2015 (UTC)
One way of adding the date is by adding your "four tildes" next to the RFC, but very few users would think of that, hence this thread. I try not to create too many RFCs! Donnanz (talk) 16:23, 11 August 2015 (UTC)
We already have the capability to deploy "oldest" and "newest" tables (such as the "oldest" table at the top of this page) for categories, which addresses one of your concerns.
The very existence of these suggests that the dates when an item was added to a category must already be accessible. Does anyone know how? DCDuring TALK 18:09, 11 August 2015 (UTC)
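As a rough illustration of what a date-stamping bot pass could do, the Python sketch below adds a date to any undated {{rfc}} tag in a page's wikitext. Everything here is an assumption made for the example: the `date=` parameter name is hypothetical (the template is not known to accept it), and a real bot would also need to fetch and save pages, which is left out.

```python
import re
from datetime import date

# Matches {{rfc}} and {{rfc|...}} but not other templates such as {{rfc-sense}}.
RFC_TAG = re.compile(r"\{\{rfc(\|[^{}]*)?\}\}")


def stamp_rfc(wikitext, today):
    """Append a (hypothetical) date= parameter to every undated {{rfc}} tag."""

    def add_date(match):
        params = match.group(1) or ""
        if re.search(r"\|\s*date\s*=", params):
            return match.group(0)  # already dated: leave the tag untouched
        return "{{rfc" + params + "|date=" + today.isoformat() + "}}"

    return RFC_TAG.sub(add_date, wikitext)


if __name__ == "__main__":
    page = "{{rfc|needs formatting}}\n==English=="
    print(stamp_rfc(page, date(2015, 8, 9)))
```

Because tags that already carry a date are skipped, the pass can be run repeatedly without overwriting the original tagging date.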

Templatizing usage examples[edit]

FYI, I created Wiktionary:Votes/pl-2015-08/Templatizing usage examples. Let us discuss the proposal, and postpone the start of the vote as much as the discussion requires. --Dan Polansky (talk) 09:38, 10 August 2015 (UTC)

I support this and I don't see why anyone wouldn't. It's analogous to why we templatize headwords and such. Templatized foreign-script languages, for example, allow for automatic translit. And likewise, the format can be changed, either by the end user through CSS or by editing the template -- e.g. if we figure out how to automatically use CSS to decide whether to put such an example on one line or multiple lines, which should definitely be doable since things like Bootstrap (a CSS library released by Twitter) can do it. Benwing (talk) 01:39, 11 August 2015 (UTC)
Is this something we even need to vote on? Is anyone against it? We've been templatizing usage examples for quite a while now and I don't remember anyone complaining. --WikiTiki89 01:45, 11 August 2015 (UTC)
I'm getting tired of all these pointless votes to be honest. —CodeCat 01:48, 11 August 2015 (UTC)
I too oppose votes on matters of formatting and template usage (and have stated as much in the past). Such votes could be seen as, at best, pointless, or as disruptive attempts to block the implementation of relatively minor changes by requiring the changes undergo more hurdles and meet a higher threshold (compare how US congresspeople use the filibuster to raise the threshold for passing legislation from 51% to 60%, blocking legislation which has enough votes to pass but not enough votes to come to the floor). Once before I started an "oppose having this vote" section on a vote, which garnered as much support as the vote itself; one could consider such an action if this vote is opened. (Side note, all the examples in the vote are English usexes, but I think it may be wise to consider English usexes — which don't need transliteration or translation — differently from foreign-language usexes.) - -sche (discuss) 02:14, 11 August 2015 (UTC)
Here's Wiktionary:Votes/2015-03/Templatizing topical categories in the mainspace; it has 50% support. Here's Wiktionary:Votes/2014-08/Migrating from Template:term to Template:m; it ended with 60% support. I find the above implication that editors at large should not have a consensus-based say in matters of template use in the mainspace and formatting in mainspace disconcerting. The wiki and template markup is the user interface and it matters a lot. The formatting instructions WT:ELE are a policy and cannot, in most circumstances, be edited without a vote. I oppose the use of ux and usex templates in English and Czech entries; it adds almost no value and makes the markup ugly to read. I never said so since I did not have the energy to do so; there are usually all too many things to discuss, in part since there are too many unnecessary changes being introduced by various editors without discussion. I have finally lost my patience, after seeing an editor chastise another editor for not using these templates. If I am a lone voice, the vote will easily pass. --Dan Polansky (talk) 08:03, 11 August 2015 (UTC)
As for "meet a higher threshold", can you clarify what the lower and the higher thresholds are in this particular Wiktionary situation? Do you consider 2/3 to be too high a threshold to pass? --Dan Polansky (talk) 08:09, 11 August 2015 (UTC)
You shouldn't create a vote before the issue has ever even been discussed. --WikiTiki89 10:33, 11 August 2015 (UTC)
The vote can be postponed as much as the discussion needs. Furthermore, overtemplatizing has been discussed, AFAIR. I remember one editor expressing his dislike of quotation templates and his preference for plain non-templated markup for attesting quotations; that's a case similar though not the same as example sentences. --Dan Polansky (talk) 11:49, 11 August 2015 (UTC)
  • Could someone remind me of what the benefit of this template is to new contributors, to passive users, or to others? If the benefit is a technical benefit that inures in a diffuse way to many, please explain. DCDuring TALK 12:30, 11 August 2015 (UTC)
See my comment up top about the benefit of the template, although there may be other reasons as well. Benwing (talk) 14:04, 11 August 2015 (UTC)
I asked not about the generalized benefits of templates, but of this one. I was hoping there were more.
So the total benefit is in the statement "the format can be changed, either by the end user through CSS or by editing the template -- e.g. if we figure out how to automatically use CSS to decide whether to put such an example on one line or multiple lines"
  1. What portion of our "end users" (admins? whitelisted editors? newbies?) will be trusted to make CSS changes of broad implementation? How would that work? Can you point to any examples or analogs in existing templates?
  2. Generally it seems that the features of templates quickly become Luacized, which dramatically reduces the ability of more casual contributors like me to make changes, especially since there is no group of responsive technical contributors willing to respond to requests, rather than implement their own cryptic agendas.
  3. All benefit depends on either:
    1. total implementation of a very capable (ergo, hard to develop successfully) template or
    2. allowing user-option non-use of the template when it fails to provide good output by the person using the template, i.e., someone who knew or was willing to learn the switches etc.
But we do not even have consistent use of our existing format, which is almost certainly needed for successful mass conversion to the template approach. What steps have we taken to discover inconsistencies in formatting, to learn from them, and to either correct them or amend WT:ELE?
Our failure to successfully continue deployment of Autoformat worries me. The existing format-maintenance system seems to be a regression requiring much more manual involvement.
It would be much easier for me to accept changes if they did not make it harder for newer content contributors, did not require more typing, did not make editing harder by uglifying the edit frame, led to specific benefits that were achievable with reasonable certainty, and were implemented by a responsive group of technical contributors. Continued overtemplatization in areas for which we need more contributors, ie, definitions, usage examples, citations, seems approximately opposite to the direction we should go. DCDuring TALK 14:50, 11 August 2015 (UTC)
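For reference, the kind of conversion this thread is about can be sketched in a few lines of Python: a plain italicized usage-example line is wrapped in the ux template, and any line that does not fit the expected pattern is left alone. The function name is hypothetical, and the parameter order shown for {{ux}} (language code first, text second) is an assumption made for illustration, not a statement of the template's actual signature.

```python
import re

# A plain usage-example line: "#:" prefix, then text wrapped in '' (italics).
RAW_USEX = re.compile(r"^(#+:\s*)''(.+)''\s*$")


def templatize_usex(line, lang):
    """Wrap a plain italic usage example in {{ux}}; return other lines unchanged.

    Lines containing "{{", "|" or "=" are skipped, since they are either
    templated already or would need escaping inside a template call.
    """
    match = RAW_USEX.match(line)
    if match is None or any(token in line for token in ("{{", "|", "=")):
        return line
    prefix, text = match.groups()
    # Assumed parameter order: language code, then the example text.
    return prefix + "{{ux|" + lang + "|" + text + "}}"


if __name__ == "__main__":
    for raw in ["# To move quickly.", "#: ''He '''ran''' home.''"]:
        print(templatize_usex(raw, "en"))
```

The lines such a pass skips are a cheap way to measure the formatting inconsistencies raised above: counting them before any mass conversion would show how far existing entries deviate from the expected format.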

Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3[edit]

Some people recently mentioned they missed Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3, despite the fact that the vote was open for 5 months. Some of the people who missed it could have been User:Cloudcuckoolander, User:Ungoliant MMDCCLXIV, and User:DCDuring. I would like to encourage such people to post late votes, properly indented so that they do not count (e.g. #: Late '''oppose'''). We can't keep votes open forever, but we can continue to collect best evidence of consensus or its lack. Having a rationale accompanied with a late vote would be very preferable, I think. --Dan Polansky (talk) 10:13, 10 August 2015 (UTC)

Thanks. Nevertheless, pinging for votes, even after-the-fact, looks like electioneering, a use of discretion that biases the process. It is what political parties do in elections: get out their vote. As I recall, there is some policy (probably unenforceable) against using e-mail to solicit votes. This has the merit of being more transparent, but still. Is but still an includable idiom or just elision? DCDuring TALK 14:15, 10 August 2015 (UTC)
I see your point. By pinging, I notified three people who explicitly said that they missed the vote, two of whom are likely to oppose the vote and one of whom would support based on his past comments. At the same time, I posted to Beer parlour so everyone who monitors Beer parlour is indirectly notified. I don't know what better I could have done other than stay silent. Late votes won't change the vote result anyway but are interesting, so I think they are a good idea. --Dan Polansky (talk) 17:42, 10 August 2015 (UTC)
The BP note alone would be more to my taste. But, as I said, pinging from a well-watched page is at least transparent, especially compared to alternatives. DCDuring TALK 18:14, 11 August 2015 (UTC)

Rare senses x rare forms[edit]

I have noticed that the parameter "rare" of the {{template:context}} categorizes entries into the Category:Terms with rare senses by language, while the parameter "uncommon" into the Category:Rare forms by language. What is the difference between these two categories? Originally I thought that "rare forms" contains only forms of some lemma, which are rare (e. g. common Czech word "pes (dog)" has a common plural "psi", but rarely "psové" can be found too), but the real content of the category does not look so. Jan Kameníček (talk) 00:19, 11 August 2015 (UTC)

The fact that various rare, historical, dated, archaic, and obsolete things are categorized differently is due to (1) a desire to categorize terms with only obsolete/rare senses (like heleth) differently from terms which are still current/common in some senses (like land), combined with (2) the fact that categorizing such entries differently requires a lot of work (edits to entries, templates, etc), most of which has not been done yet. I think the ideal/plan/hope is that one day terms like heleth will be in Category:English obsolete terms (I am not sure why Category:English obsolete forms exists with the name and content it has; as you note, it should properly be used only for e.g. low as a form of laugh), while land et al will be in Category:English terms with obsolete senses. (And likewise with rare things.) - -sche (discuss) 01:03, 11 August 2015 (UTC)

Retiring the codes of spurious languages[edit]

As of this year, the ISO has retired or has received requests to retire the following codes on the grounds that they are spurious and the languages they ostensibly refer to never existed. I suggest we also retire the codes.

  1. cbh Cagua, kox Coxima, cum Cumeral, ome Omejes, toe Tomedes, rna Runa. I quote from the change request forms (cbh, kox, cum, ome, toe): "Alan Wares, in correspondence with Barbara Grimes (5/28/1971), stated that [each one] should be deleted as 'non-existent.' Moreover, the Ethnologue has not added any information to the language entry in nearly 40 years. Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Landaburu 2000, Ortiz 1965) for the non-attestation of [it]." (rna's change request is similarly blunt about the total lack of evidence that it exists.)
  2. cbe Chipiajes and pod Ponares. These are surnames rather than language names. Quoth the change request form for cbe: "Alan Wares, in correspondence with Barbara Grimes (5/28/1971), stated that Chipiajes should be deleted as 'non-existent.' The only information that the Ethnologue has added for Chipiajes: 'A Sáliba surname. Many Guahibo also have that name.' Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Landaburu 2000, Ortiz 1965) for the non-attestation of Chipiajes." The comments on pod are similar.
  3. xbx Kabixí. See the change request form, where it is noted that the term Kabixí is a catch-all for any hostile tribe, and the linguist who studied it "concedes that there was no information on" it.
  4. iap Iapama. Quoth [3]: "There is no evidence that this language exists. No information has been added to the Ethnologue since the 1980s. Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Grenand & Grenand 1994, Gallois & Ricardo 1983) for the non-attestation of Iapama."
  5. svr Savara. Quoth [4]: "Hammarstrom (2014, in press) states that it has been checked quite carefully that no Dravidian language exists matching the name Savara or any of the other information in the entry (p.c. David Stampe 2011) , nor, for that matter, the Indo-Aryan Oriya variety labeled Sahara/Saora in Mahapatra (2002:183-184). Barb Waugh, in an email dated 10 November 2009, responded to queries about Savara stating that she did not believe that the language existed at all. She only knew Savara as an alternate name for Sora [srb], a Munda language. She pointed out that Ruhlen (1987) lists Savara as a Dravidian language. Around the same time, Kirk Miller (UCSB) wrote questioning the existence of this language."
  6. yds Yiddish Sign Language. See jewish-languages.org's entry for more. [has been removed from Wiktionary]
  7. btl Bhatola. See [5].
  8. myi Mina (India). See [6]. Neatly, retiring this will allow us to include hna Mina (Cameroon) without a disambiguator.
  9. pry Pray (aka "Pray 3", as Ethnologue called it because they just gave up on disambiguating it in any of their usual ways). This one is not strictly spurious; rather, it turns out to be no more than a duplicate of prt Prai (aka Pray). See the change request form with data from recent field research.
  10. yos "Yos" was retired and merged into zom "Zo", with the change request noting that Yos is simply the English plural of Yo which is no more than a variant form of Zo. [has been removed from Wiktionary]

If you have objections to any of these, speak up. (This list does not include codes retired by being split or merged [except pry and yos], or some other codes; I'll post about those later.) - -sche (discuss) 02:39, 12 August 2015 (UTC)

(Pinging because this is old.) Have we any translations into (or entries in) these? If so, I guess merge pry and yos, but what do we do with the others?​—msh210 (talk) 20:43, 8 September 2015 (UTC)
@Msh210 We have no words (entries or translations) which are claimed to be from these languages, AFAICT. (Which is good, given that the linguists cited above are saying the languages don't exist.)
These aren't the first codes we or the ISO have retired for having nonexistent referents, btw.
I'll wait until the 15th, when the ISO/SIL say they will post their notes on which of the retirement requests they themselves accepted, before acting on most of these. But yos and yds, which they retired years ago, I'll go ahead and merge now. - -sche (discuss) 02:18, 10 September 2015 (UTC)
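The check msh210 asks about (do any entries or translations still use these codes?) can be sketched as a small Python audit. The code sets below are taken from the list above: spurious codes have no successor, while pry and yos map to prt and zom. The function names and the sample usages are hypothetical; a real audit would read the codes from a database dump or from the language data modules.

```python
# Codes proposed for retirement with no successor (from the list above).
SPURIOUS = {
    "cbh", "kox", "cum", "ome", "toe", "rna", "cbe", "pod",
    "xbx", "iap", "svr", "yds", "btl", "myi",
}

# Codes that are duplicates or variants of a surviving code.
MERGED = {"pry": "prt", "yos": "zom"}


def resolve_code(code):
    """Return the code to use after retirement, or None if nothing replaces it."""
    if code in MERGED:
        return MERGED[code]
    if code in SPURIOUS:
        return None
    return code


def audit(usages):
    """Split (term, code) pairs into kept, remapped and orphaned lists."""
    kept, remapped, orphaned = [], [], []
    for term, code in usages:
        target = resolve_code(code)
        if target is None:
            orphaned.append((term, code))
        elif target != code:
            remapped.append((term, target))
        else:
            kept.append((term, code))
    return kept, remapped, orphaned


if __name__ == "__main__":
    # Placeholder terms, for illustration only.
    sample = [("term-a", "prt"), ("term-b", "pry"), ("term-c", "cbh")]
    print(audit(sample))
```

Anything that lands in the orphaned list needs a human decision, since no replacement code exists for it; per the reply above, that list is expected to be empty for these codes.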

About gloss parameter in term templates in all other sections, except etymology[edit]

I am not sure how helpful a gloss is in Derived and Related terms or even in Synonyms. A term, as all words, may have many different senses. These can include a figurative one, a literal one etc. The existence of a gloss definition in etymology section is useful since possibly (but not always) only one sense is the specific one that "caused" the people to use the word (or phoneme). I came to these conclusions after @Saltmarsh pointed out that I should add some gloss definitions to my additions. I tried it for start, but there were some "mind" troubles when I had to add terms that have more than one widely used sense or more than one gender. Someone might say that in such cases do not add a gloss. But even if only one sense is widely used we "provoke" the rejection of all other senses that user may find in the article. --Xoristzatziki (talk) 06:05, 12 August 2015 (UTC)

Synonyms should use {{sense}}. Derived and related terms should specify a gloss but it can be of the form |gloss=foo, bar, baz with multiple defns; doesn't have to be all possible senses but should be the principal ones. Benwing (talk) 06:45, 12 August 2015 (UTC)
[I've been away]   Reasoning: the gloss may be interesting/relevant and having it there saves the user linking through to find out. User Benwing shows the syntax.   Gender: I think I would do this for multiple gender forms of different meaning:
Idiomatic phrases: I would normally link each term (do we want all of these with a separate entry) and these certainly need a gloss.  — Saltmarshσυζήτηση-talk 05:57, 15 August 2015 (UTC)
  • Synonyms and Derived terms sections should not specify a gloss, by my lights. It is not only my preference but also a long-term overwhelming practice not to specify a gloss. Some people prefer glosses, obviously, but it is nowhere close to being a prescribed or recommended practice. Similarly, these sections should not provide gender, IMHO, but here the practice varies language by language. As for rationale, glosses make these sections too busy with information that is available elsewhere. Gender is okay as for being too busy or not, but is available in the lemma, and IMHO not so important that it should be available in term lists. --Dan Polansky (talk) 17:42, 16 August 2015 (UTC)

Allow Etymology as level 4 header[edit]

Me and some others (I don't know who, or where the discussion was) have expressed in the past a desire to have the etymology section nested under part-of-speech sections, rather than floating alongside them (both on level 3) or having the part of speech nested under the etymology. I think it makes more sense to put etymology underneath an individual word:

  1. Users generally look up terms for their definitions, etymology is of lesser importance overall. Therefore, it makes more sense to put it below the definitions.
  2. Etymology always applies to a single word and part of speech. If it happens to apply to multiple parts of speech, then the chances are that one of them was first, and the others were derived from that. That's something we can and should note in the etymologies of each individual part of speech.
  3. Having to increase the heading level whenever there are multiple etymologies is annoying. It also makes it look less consistent; sometimes POS is level 3, sometimes level 4? Level 5 headings are hard to distinguish visually from level 4, so I think level 4 should be the highest level we use.
  4. For non-lemma entries, we generally don't have or need etymologies, but we're forced to create etymology sections for them whenever there is another word in the entry. For example, rose (rise, past) needs an etymology header to separate it from the header for rose (flower), but the etymology section itself is left empty or doesn't have any useful information, because the etymology is at the lemma, rise.

So I'd like to ask/propose that etymologies be allowed to be nested underneath the POS header, as level 4. It would be added below the definitions, usage notes and inflection headers, but above synonyms, antonyms and derived/related terms. This is done in accordance with the general principle in our entry layout that information about the current term precedes information about relationships to other terms.

This proposal is intended as an indefinite trial, to let users who prefer this alternative format apply it to entries and evaluate its merits and problems. The original format will continue to be allowed as well, at least until there is a decision to phase it out. —CodeCat 18:50, 12 August 2015 (UTC)
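The subsection order described in the proposal can be written out as a short Python sketch that renders one part-of-speech section with Etymology nested at level 4. The header names come from the proposal text; the function name, the treatment of headers the proposal does not mention (appended at the end), and the placeholder content are assumptions for illustration only.

```python
# Order of level-4 subsections under a POS header, per the proposal:
# after definitions, usage notes and inflection; before synonyms,
# antonyms and derived/related terms.
PROPOSED_ORDER = [
    "Usage notes", "Inflection", "Etymology",
    "Synonyms", "Antonyms", "Derived terms", "Related terms",
]


def render_pos_section(pos, definitions, subsections):
    """Render one POS section with level-4 subsections in the proposed order."""
    lines = [f"==={pos}===", ""]
    lines += [f"# {definition}" for definition in definitions]
    ordered = [name for name in PROPOSED_ORDER if name in subsections]
    # Assumption: headers not covered by the proposal go last, in the given order.
    ordered += [name for name in subsections if name not in PROPOSED_ORDER]
    for name in ordered:
        lines += ["", f"===={name}====", subsections[name]]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pos_section(
        "Verb",
        ['Past tense of "rise".'],
        {"Synonyms": "* (placeholder)", "Etymology": 'Inflected form of "rise".'},
    ))
```

Each POS section carries its own etymology in this layout, so an entry with two unrelated words simply has two level-3 POS sections in a row, with no numbered Etymology headers wrapping them.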

"If it happens to apply to multiple parts of speech" -- isn't that overwhelmingly common? It would be tiresome to have ety sections repeated all over the place when they are basically the same word/sense. Do other dicts do that? Equinox 18:54, 12 August 2015 (UTC)
I've not found it particularly common in the languages I've worked with, it's quite rare. Maybe English is just an exception. But this is why I'm not proposing to get rid of the old format just yet; we can still keep using it in situations that we haven't found alternatives for yet. That said, I think it's pretty easy to handle this with nested etymologies, as I noted in point 2. Just put the etymology on the term that was first, and the rest get etymologies saying they were derived from that first term. For example, up (preposition) is derived from up (adverb), which our entry fails to note. —CodeCat 19:01, 12 August 2015 (UTC)
For many words, it might not be known which POS came first. Other dictionaries do not do this. They generally list all the parts of speech and give one etymology at the top or bottom of the entry (which sometimes mentions different derivations of specific senses of the word, but is still in one section). --WikiTiki89 19:12, 12 August 2015 (UTC)
"English is just an exception." And also merely, technically the host language of this wiki.
No matter how many times this is proposed, it still seems like a bad idea. The structuring advantage of having semantically related terms that are different PoSes is enormous for English. Though it is not particularly helpful for one not familiar with large dictionaries, it is quite helpful once one gets the hang of it. It is almost essential where there are homonyms both with, say, nouns as PoSes. Is it the proposal to combine all of the noun PoSes, no matter what the etymologies? We have spent a fair amount of effort trying to split etymologies where semantically warranted. To run the definitions through the blender as seems to be proposed seems like a regression. We may have come to accept them in technical areas as people abandon the project and their creations lapse, but I don't see why they should be allowed in content. DCDuring TALK 23:26, 12 August 2015 (UTC)
Where are you getting the idea that POS sections are going to be merged? They'll be split by etymology as they always have been. —CodeCat 23:40, 12 August 2015 (UTC)
  • If POS is level 3, and etym is level 4... how would the POS sections not be merged? I find this rather confusing. ‑‑ Eiríkr Útlendi │Tala við mig 00:40, 13 August 2015 (UTC)
And I'm confused that it confuses you, because it seems pretty simple to me. POS sections are not merged, they're kept separate as they are now. Nothing more to it. —CodeCat 00:54, 13 August 2015 (UTC)
Just for fun (and clarity), could you take one of the more complex entries and reformat it in your proposed style (perhaps within your userspace)? It'd be useful for reference. Equinox 01:11, 13 August 2015 (UTC)
Ok, can you give me one you had in mind? —CodeCat 01:25, 13 August 2015 (UTC)
I think what is being proposed is (for e.g. rose):
===Noun===
A flower. (blah blah blah, headword line template, synonyms, etc)
====Etymology====
From Oscan.
===Verb===
Past tense of "rise".
====Etymology====
Inflected form of "rise".
Which indeed obviously keeps the POS sections distinct. I'm not sold on such an arrangement, but one obvious benefit is that etymology could be pushed below the definitions, which some people have favoured. - -sche (discuss) 03:01, 13 August 2015 (UTC)
  • To clarify my concern about merging POS bits, I'm not talking about nouns and verbs being thrown together. Instead, I'm concerned about terms that have multiple senses of a single POS type, and where those separate senses have different etymologies. If the etymology header is made subordinate to POS, then things get confusing pretty quickly. Consider the Japanese entry at , for instance. This term has nine different noun senses, all with distinct etymologies (and eight distinct pronunciations even). The proposed structure of an ====Etymology==== header at level 4, under a ===Noun=== header at level 3, would make this entry a complete mess. ‑‑ Eiríkr Útlendi │Tala við mig 18:07, 14 August 2015 (UTC)
See diff for an example of how I think etymologies should be handled. Each POS has its own etymology, including (especially) forms of another lemma. Two different lemmas can't possibly have the same etymology, because after all, if they have the same development history, why are they still different?
As for Eirikr's entry above, I'm not really seeing the issue. The entry already has one etymology section for each POS, so all that would be left to do is to switch the headers around. —CodeCat 20:09, 18 August 2015 (UTC)
  • CodeCat, have another look. As visible in the page's TOC, some single etymologies cover multiple POSes -- noun and prefix, noun and suffix.
In addition, I still don't quite understand your proposed layout. Further up the thread, it sounds like all nouns would go together under a single ===Noun=== header -- which then leaves me wondering how the disparate etymologies would be accounted for. Even if your intention is to have as many ===Noun=== headers as there are etymologies, this produces a strange circumstance where we are organizing higher-level headers in a way that's dependent on lower-level headers. Just in terms of hierarchical organization, that seems backwards.
And that still doesn't account for the case where an entry has multiple etymologies, and some of those etymologies apply to multiple POSes. Numerous Japanese terms have a single spelling, with multiple POSes under a single etym and pronunciation. Fewer, but still numerous, entries have multiple separate etymologies, each etym with its own pronunciation and possibly multiple POSes.
Would you be willing to edit the entry into your proposed structure, as you did for dice? A more concrete example would illustrate things more clearly, I think. ‑‑ Eiríkr Útlendi │Tala við mig 21:07, 18 August 2015 (UTC)
See User:CodeCat/ja. Since I didn't know the etymologies of all the terms, I had to make something up. —CodeCat 21:44, 18 August 2015 (UTC)
An alternative. —msh210 (talk) 20:57, 8 September 2015 (UTC)

Why don't we have an Unattested namespace?

Putting unattested terms in the Appendix namespace gives no information about why they are there or how they differ from other appendices. Why not give them their own namespace? There is nothing particularly appendix-like about them. DTLHS (talk) 02:57, 13 August 2015 (UTC)

New namespace I'm confused: how would a new namespace help? —Justin (koavf)TCM 03:06, 13 August 2015 (UTC)
If you're talking about reconstructed proto-language terms (versus, say, this kind of unattested terms), I think giving them their own namespace (say, "Reconstructed:") would be a fine idea. We could even perhaps then write a Mediawiki: page or, at worst, some js/css, to automatically display the "this term is reconstructed" warning atop such pages, which people currently have to remember to add manually. - -sche (discuss) 03:08, 13 August 2015 (UTC)
Right, reconstructed, sorry. DTLHS (talk) 03:11, 13 August 2015 (UTC)
It would also make it easier to parse reconstructed pages, which should be treated like all other namespace pages, vs appendix pages which mostly should not be. DTLHS (talk) 03:30, 13 August 2015 (UTC)
I was a bit confused by the use of "Unattested" at first, (Appendix:English unattested phobias and Appendix:English dictionary-only terms come to mind) but for reconstructed terms I, too, support the idea of creating the separate namespace Reconstructed:. --Daniel Carrero (talk) 06:02, 13 August 2015 (UTC)
We should name the namespace so as to include constructed languages as well. --WikiTiki89 06:10, 13 August 2015 (UTC)
I definitely support a Reconstructed: namespace. I don't think we should include appendix-only constructed languages in it. What we should do with them, I don't know, but muddling the reconstructed namespace with them is a bad idea and would take away some of the technical benefits of such a namespace. My own personal preference is to just delete them altogether. —CodeCat 11:41, 13 August 2015 (UTC)
I support a Reconstructed: namespace too, without conlangs. They can have a namespace of their own, e.g. Conlang:. —Aɴɢʀ (talk) 12:19, 13 August 2015 (UTC)
I also support a Reconstructed: namespace. Not sure about conlangs; either they should go into Conlang: or into the main namespace. Arguably, conlangs that are well enough attested should go into the main namespace and others shouldn't be included at all. If we use a Conlang: namespace, where do we draw the line? Esperanto was originally a conlang, too, but we put it in the main namespace. Same with Lojban, for example. Benwing (talk) 12:53, 13 August 2015 (UTC)
This is why I prefer deleting them. It's a bit strange if we say "yeah, we don't actually allow these conlangs, but if you hide them away in an appendix then it's ok". —CodeCat 12:54, 13 August 2015 (UTC)
Question: What about reconstructed terms in, say, Vulgar Latin? What differentiates them from terms in Proto-Romance? Our entry claims VL and Proto-Romance are synonyms, but w:Vulgar Latin says that the two are "often confused". DCDuring TALK 13:33, 13 August 2015 (UTC)
Something to consider: should the pages in this new namespace be named with the language name as they are now? Or should we have entries named only with the headword, like in the main namespace? —CodeCat 16:33, 13 August 2015 (UTC)
There would be some pages with multiple proto-languages on them, e.g. the strings *me- and *ke- are so short that they're surely found in more languages than just proto-Algonquian. OTOH, we handle that just fine in the main namespace. - -sche (discuss) 18:23, 13 August 2015 (UTC)
I'm in favor of organizing them like the main namespace rather than like the current layout, e.g. /wiki/Reconstructed:bʰer- with a ==Proto-Indo-European== heading rather than /wiki/Reconstructed:Proto-Indo-European/bʰer-, where the ==Proto-Indo-European== heading would be redundant. Maybe we could pick a shorter name for the namespace though, like Proto:. —Aɴɢʀ (talk) 18:18, 14 August 2015 (UTC)
Proto: would not work for non-protolanguage reconstructions. —CodeCat 18:25, 14 August 2015 (UTC)
It would work, it just wouldn't be the optimal name. "Recons:", maybe? I just don't feel like typing out "Reconstructed:" all the time. —Aɴɢʀ (talk) 18:45, 14 August 2015 (UTC)
I would suggest "R:" were it not for the fact that that would conflict with how we name and transclude reference templates. - -sche (discuss) 18:47, 14 August 2015 (UTC)
The software allows for namespace shortcuts. WT: is a shortcut to Wiktionary:. —CodeCat 18:57, 14 August 2015 (UTC)
Yeah, I (and, on my talkpage, JohnC5) have thought about the utility of having more namespace shortcuts, e.g. AP: for appendices. The shortcut might still have to be RC:, though, since I suspect the existence of an R: namespace (even as a redirect) might cause {{R:OED}} to be interpreted as a transclusion of R:OED rather than Template:R:OED (certainly I would expect it to fail to reach Template:R:foo for any {{R:foo}} where R:foo was a page). Side note, @Angr, how often would you be typing out rather than copy-pasting the first part of the pagename (Reconstructed:) given that the second part would probably contain characters like ɸ or ʰ₂r̥ that you'd have to copy-paste or insert from the edittools? Perhaps we could add Reconstructed: to the things edittools can insert... - -sche (discuss) 19:11, 14 August 2015 (UTC)
Even then, all our linking templates already treat * as a shortcut to reconstructed pages. So you'd only need to type the namespace name in the very rare occasion that you're not using a linking template. —CodeCat 19:23, 14 August 2015 (UTC)
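As a rough illustration of the two page-naming schemes discussed above, and of how a leading * could select the reconstructed namespace, here is a minimal Python sketch. The function name `reconstructed_title` and its `per_language` flag are hypothetical, the "Reconstructed" namespace is only the one proposed in this thread, and nothing here is meant to describe how the actual linking modules are implemented.

```python
def reconstructed_title(term: str, language: str, per_language: bool = True) -> str:
    """Map a link target to a page title (illustrative only).

    A leading '*' marks a reconstructed term; anything else is assumed
    to live in the main namespace under its own spelling.
    """
    if not term.startswith("*"):
        return term  # attested term: main namespace, title is the spelling
    bare = term[1:]  # drop the asterisk
    if per_language:
        # Layout that keeps the language name in the title.
        return f"Reconstructed:{language}/{bare}"
    # Mainspace-style layout: one page per spelling, languages as headings.
    return f"Reconstructed:{bare}"


print(reconstructed_title("*bʰer-", "Proto-Indo-European"))
print(reconstructed_title("*bʰer-", "Proto-Indo-European", per_language=False))
```

Under the second scheme two proto-languages that share a spelling would share a page, which is the trade-off raised earlier in this thread.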
I feel like I waste hours of my time typing the words Appendix and Category. If the abbreviations AP and CT respectively existed, I would be very pleased. Also Temp or TP for Template would be great for that matter. I don't see why we don't have more of these. I also support the creation of the Reconstructed namespace. —JohnC5 19:28, 14 August 2015 (UTC)
@CodeCat: Separate to this discussion, could we look into adding those shortcuts to the search bar? —JohnC5 12:58, 19 August 2015 (UTC)
Ideally, we'll get a few more users to chime in here supporting such namespace-redirects. Then we can file a Phabricator ticket asking for (a) a 'Reconstructed' namespace, and (b) 'RC'→'Reconstructed', 'AP'→'Appendix' and 'CT' (or maybe 'CA', since 'CT' sounds like 'Category talk' although we almost never have discussions on Category talk pages) → 'Category' namespace-redirects. It shouldn't be hard / take long for the devs to grant such things to us. - -sche (discuss) 07:49, 20 August 2015 (UTC)
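The concern about an "R:" shortcut can be pictured with a toy model of prefix resolution. This is a sketch under stated assumptions, not MediaWiki's real title parser: the namespace list, the alias table, and the helper names `resolve` and `transclusion_target` are all invented for illustration, and the rule "a recognised prefix wins over the Template: default" is simply the behaviour suspected in the comment above.

```python
# Hypothetical namespace and alias tables (assumptions for illustration).
NAMESPACES = {"Wiktionary", "Appendix", "Category", "Template", "Reconstructed"}
ALIASES = {"WT": "Wiktionary", "RC": "Reconstructed", "AP": "Appendix", "CA": "Category"}


def resolve(title, aliases=ALIASES):
    """Split a title into (namespace, rest); '' means the main namespace."""
    prefix, sep, rest = title.partition(":")
    if sep:
        prefix = aliases.get(prefix, prefix)  # expand a shortcut if there is one
        if prefix in NAMESPACES:
            return prefix, rest
    return "", title  # no recognised prefix


def transclusion_target(name, aliases=ALIASES):
    """Page that {{name}} would pull in, under this toy model."""
    namespace, rest = resolve(name, aliases)
    if namespace:
        return f"{namespace}:{rest}"  # a recognised prefix wins
    return f"Template:{name}"  # otherwise default to the Template namespace


print(transclusion_target("R:OED"))                                     # no R alias
print(transclusion_target("R:OED", {**ALIASES, "R": "Reconstructed"}))  # with R alias
```

With "RC" as the shortcut, names beginning with "R:" keep falling through to the Template: default in this model, which is why the longer abbreviation looks safer.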
  • I agree that conlangs should be handled separately from reconstructed terms. In contrast to how we handle proto-languages, our current approach to conlangs actually is fairly well suited to the appendix namespace, in that we have one page (one appendix, total) on each conlang. However, most of them are constant copyvio magnets, since we can only allow short appendices, but the inclusion of any appendix at all tempts people to expand said appendix: see e.g. [7] (BP discussion of copyright issues). I wouldn't mind deleting most of them, perhaps moving a few (de minimis) words into our mainspace entries on the names of the conlangs, using {{examples-right}}, like this. - -sche (discuss) 18:23, 13 August 2015 (UTC)
  • Support a separate namespace for reconstructed languages (for one, it's by far the busiest part of the Appendix: namespace, and trying to find out whatever is going on with all the other appendices is a pain). — I do not think that a mainspace-type approach to lumping "homographic" roots from different protolangs on the same page is a good idea though. Notational systems for protolangs vary greatly, and this could imply a senseless amount of repetition of "Alternate spelling of…" sections in the future. The basic object of protolang pages is an etymological group, not the graphical representation of its proto-form, per se.
    In fact I could suggest that the new namespace be named simply Etymology:, and that it could include appendix pages tracing the descendants of attested words just as well (a la Appendix:Names derived from Marcus). --Tropylium (talk) 14:10, 22 August 2015 (UTC)


Eskayan

Last year, the ISO approved the code esy for Eskayan. Should we follow suit, and if so, should we allow it in the main namespace? It's technically a conlang from the early 1900s, but it comes with a mythology that claims it's much older and it functions as a medium for recording traditional stories (both in Roman script and in a native script which lacks an ISO 15924 code). It has no native speakers but a few hundred secondary speakers and a few schools to teach it. - -sche (discuss) 06:50, 14 August 2015 (UTC)

I've added it to Module:languages. It is spoken by a few hundred people, and schools teach it and literature is published in it and has been for almost a century, so I suppose it is allowed in the main namespace like Esperanto. Its creator intended it for widespread use (by his ethnic group) and attributed it to his tribe's mythical ancestor rather than to himself, and then he (the actual creator) died in 1949, so as far as copyright concerns go, it seems similar to e.g. Esperanto and different from e.g. Dothraki. Shall we update WT:CFI#Constructed_languages to note the existence and inclusion of Eskayan, or is that not necessary because the ISO doesn't categorize esy as a constructed language, and it does not itself admit that it is one (even though it is identifiable as such by linguists)? - -sche (discuss) 19:03, 16 August 2015 (UTC)
I don't see any reason to exclude it.--Prosfilaes (talk) 20:32, 17 August 2015 (UTC)
@-sche: Reading up on it, I see that it's pretty much relexified Boholano Cebuano. If that's the case, it resembles the avoidance registers of Australia or the pandanus languages of New Guinea, which we treat as part of whatever language's grammar they have. Perhaps, then, we ought not to be including Eskayan on those grounds instead. —Μετάknowledgediscuss/deeds 02:20, 18 August 2015 (UTC)
That seems to value internal consistency over ease of use and external consistency. If the world treats it as a separate language, it seems like people looking it up are going to be expecting it to be a separate language.
Also, people looking up a language that has multiple known registers are going to know about the registers. It's a lot easier for students and the like to get confused if we mix Eskayan words in with Boholano Cebuano words, no matter how they're labeled.--Prosfilaes (talk) 04:56, 18 August 2015 (UTC)
Right. Furthermore, I'm not sure treating Eskayan as Cebuano would even provide internal consistency: if (as is my understanding) the entire lexicon is different to the point that there is zero mutual intelligibility, on what basis would we consider them the same language, while considering languages with very similar grammars and lexicons (say, Danish and Swedish) to be distinct? It's my impression that even the largest avoidance registers contain only a fraction of the number of words the main language possesses. - -sche (discuss) 03:27, 26 August 2015 (UTC)
  • I'm big on recording what people are actually using to communicate. On the other hand, there's a lot of missionary-mangled versions of languages that aren't really worth bothering with, and this looks like it might be just another example. If someone wants to do it, I'm not going to object.--Prosfilaes (talk) 07:33, 29 August 2015 (UTC)
    • I agree. If islanders use this language to communicate amongst themselves, then it would seem to be comparable to (a much less widespread form of) Esperanto, or even to Michif with only the difference that the group of people who created it lived recently enough to be identifiable by name rather than lost to the mists of time. But if it never gained use outside of the missionaries' materials, then it would seem comparable to other failed attempts at language-blending conlangs. - -sche (discuss) 16:57, 29 August 2015 (UTC)

Neo and Talossan (the two ISO-coded conlangs CFI doesn't specifically address)

Quoth CFI as updated to reflect current ISO numbers, in addition to the 7 (self-identified-as- and identified-by-the-ISO-as-) constructed languages which are approved for inclusion in the mainspace, there are 14 more languages which are classified as constructed languages, of which 9 "have not yet been approved for inclusion in the English Wiktionary", and are included in appendices: these are languages like Láadan. "Another 3 of those fourteen languages are prohibited", namely Quenya, Sindarin and Klingon, which are also included in appendices.

  1. What is the difference between being 'not approved' and included only in appendices, and being 'prohibited' and included only in appendices?
  2. What should be done with the two languages which are left out of the above count (9+3=12≠14), Neo and Talossan? Should they be 'not approved' and limited to appendices, or 'prohibited' and limited to appendices, or something else?
  3. What should be done with WT:BP#Eskayan, discussed above, which the ISO does not classify as a constructed language but which is identifiable as one?

- -sche (discuss) 19:29, 16 August 2015 (UTC)

I think we need to overhaul that part of CFI a bit. Instead of listing languages and thus being both messy and incomplete, we should make it clear that those 7 languages are approved, and no other languages that the ISO considers to be constructed languages may have entries in mainspace. That would leave Eskayan just like any other language, which I think is fine. —Μετάknowledgediscuss/deeds 23:48, 17 August 2015 (UTC)

Two romanization headers in a row

In entries like de and lei, is it preferable to have two romanization headers in a row (one with the "form of X" templates and one with the "nonstandard form of Y" templates), or only one header, like so? - -sche (discuss) 03:52, 17 August 2015 (UTC)

I think it's preferable to have a single Romanization header in such cases. —Aɴɢʀ (talk) 18:44, 17 August 2015 (UTC)

Notes as a valid L3 (esp. along References)

Copied from a related discussion, for separate discussion. (Link removed from sig not to ping unintentionally.) Neitrāls vārds (talk) 06:50, 17 August 2015 (UTC)

As I see it, a discussion on allowing "Notes" as a valid header should be considered.

Vahag has brought this up (Wiktionary:Grease_pit#.7B.7Breflist.7D.7D) and I'm running into a similar problem all the time. As ridiculously silly of an argument as it may be, I do, in fact, agree that numbered and bulleted references together look ugly AF. (I have even gone to such ridiculous steps as removing a reference that didn't add anything critical just because it was bulleted while the other ones were numbered because of how unappealing it looks.)

In more general terms, I kind of get the feeling that there seems to be consensus that references are in fact valuable and add value to the entry, perhaps the discussion should focus more on how to allow more elegant ways of faithfully citing content, particularly in "controversial" cases, e.g., obviously one bulleted reference is enough under, say, an assertion that et kala and liv kalā derive from the same source because, well, it's pretty obvious but then if there is a "weird" controversial cognate there isn't even a way of citing it inline (unless you want the awful looking mixing of numbered and bulleted refs.) Neitrāls vārds (talk) 11:09, 12 August 2015 (UTC)

  • Support having a ==Notes== section separate from ==References==, esp. when both exist. Benwing (talk) 11:21, 12 August 2015 (UTC)
Where in the standard order of headers would this be placed? —CodeCat 18:48, 17 August 2015 (UTC)
Right before ==References==. --WikiTiki89 18:53, 17 August 2015 (UTC)
Am I understanding correctly, then, that the notes section would apply to all POS sections collectively rather than any specific one? —CodeCat 18:59, 17 August 2015 (UTC)
The way it is now (perhaps unofficially) is that the ==References== section may be found in an entry with one etymology as an L3 or L4 or in an entry with more than one etymology as an L3, L4, or L5 (my personal preference is never to have it as an L3 with more than one etymology, so I usually fix these cases). --WikiTiki89 19:04, 17 August 2015 (UTC)
My preference is the opposite, to have it always at L3. —CodeCat 20:00, 17 August 2015 (UTC)
If they are always tagged with <ref> tags, then your way may be better, but often the ==References== section is just used to list references that apply to an entire section, in which case you need to know which section that is. You can have a different set of reference links for each etymology section or even each POS section. --WikiTiki89 20:04, 17 August 2015 (UTC)
Your point is valid, but I don't like references that don't use ref tags to begin with. "Section-wide" references tell you nothing about what comes from where. All they do is say "these references were somehow involved in the creation of this entry", which is rather vague. —CodeCat 20:11, 17 August 2015 (UTC)
This. His point is invalidated since we shouldn't have references without ref tags. — LlywelynII 14:07, 18 August 2015 (UTC)
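To make the L3/L4/L5 shorthand in this sub-thread concrete, here is a small Python sketch that reads heading levels off wikitext by counting the equals signs. The sample entry and the helper name `heading_levels` are made up for illustration; this is not an existing Wiktionary tool and it handles only plain, well-formed headings.

```python
import re

# A heading is a line of the form ==Title==, ===Title===, and so on.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def heading_levels(wikitext):
    """Return (level, title) for every heading line, in page order."""
    found = []
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            found.append((len(match.group(1)), match.group(2)))
    return found


# Invented sample entry with two etymology sections.
SAMPLE = """==English==
===Etymology 1===
====Noun====
=====References=====
===Etymology 2===
====Verb====
===References===
"""

reference_levels = [level for level, title in heading_levels(SAMPLE) if title == "References"]
print(reference_levels)  # levels at which a References header occurs in SAMPLE
```

In the sample, the first References header sits under a single POS section and the second is an L3 alongside the etymology sections, which are the two placements being debated.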
Support. This will also help prevent misuse of the Usage notes section. I frequently run across entries whose “usage notes” have nothing to do with how the word is used (arachnogenic necrosis is the latest example). — Ungoliant (falai) 19:05, 17 August 2015 (UTC)
  • Oppose. The solution to the layout problem is to not use bulleted references. Usage notes already covers any notes relevant to the entry. Anything that would go into a Wikipedia entry's "Notes" section should either be addressed in the appropriate section directly (as with contested etymologies) or simply removed (as with Ungoliant's "a. necrosis" example). Giving people yet another section in which to include errata isn't an actual solution to the problems people are listing. — LlywelynII 14:07, 18 August 2015 (UTC)
    • IMO this makes little sense. You basically think people should never create references sections listing refs; that's an impossible standard to meet and often way too awkward. In my Arabic entries that I add, I routinely add a "References" section under each part-of-speech entry listing the books where I got the entry definitions from. There's no simpler way of doing it, since the reference really does refer to the POS section as a whole in most cases. And many languages do this. So we really do need Notes and References separate. Likewise if we're using Harvard-Style references, with short footnotes under "Notes" that are linked to a list of references under "References". Benwing (talk) 14:17, 18 August 2015 (UTC)
      • IMO you're confused as to what's being proposed which probably goes back to the original discussion's misunderstanding of Wikipedia's #Notes section. #Notes (as the name implies) are for actual notes; they are not for references of any sort. #References are for both generated inline references (what's being called numbered references here) and bibliographic lists. If you feel the layout requires it, you can create subsections for #Citations and #Bibliography or #Works cited or #Whathaveyou.

        There's no call whatsoever for a (second) #Notes section at Wiktionary and creating one will increase the level of errata our users will add to entries, which the editors above felt to be a problem. The w:1st rule of holes suggests not expanding the areas of the entry devoted to random information, beyond that included in the existing and needful areas.

        As for having a subsection of #References for linked #Citations and another for stand-alone #Works... I fall back on my position that you're just being lazy and should create appropriate references as you create entries. At the same time, there's no real problem with creating a subsection within #References to deal with the layout issues, if people really want harvard style references and a separate list of works. But that discussion has nothing to do with a #Notes section. — LlywelynII 14:25, 18 August 2015 (UTC)

        As an example of what I mean, I patched up բալախ, the original entry that prompted this discussion. Note that having a separate inline section means that the inline citations should not fully duplicate the information in the bulleted list. It should be kept terser, with the full information on the source given below. — LlywelynII 14:38, 18 August 2015 (UTC)

        Here's an edit after the #Citation section has been made terser and the bibliographic info has been moved down to the #Bibliography section. Obviously it could be made more helpful and nicer with some of Wikipedia's inline citation templates like sfnp, which create automatic links to the full citation info. — LlywelynII 14:55, 18 August 2015 (UTC)
        WT:NOT#Wiktionary is not Wikipedia. We can do things differently. --WikiTiki89 14:32, 18 August 2015 (UTC)
        We can, but having an infelicitously-named #Notes section is really not a good place to start. If it's intended for storing inline references, it still belongs in the #Reference section. — LlywelynII 14:34, 18 August 2015 (UTC)
        Well we could have the ==Notes== section actually be notes that reference the ==References== section, like Benwing mentioned. --WikiTiki89 14:39, 18 August 2015 (UTC)
        A #Note section giving notes on the #References section would be a section of commentary on the sources being used for the entry. That's completely different from what Benwing was discussing and doesn't seem particularly helpful itself, either. — LlywelynII 14:55, 18 August 2015 (UTC)
While we're on this subject, note that I cannot use inline references for two language sections simultaneously without resorting to ugly tricks. See գութ. --Vahag (talk) 14:48, 18 August 2015 (UTC)
Sure you can. You either duplicate the information in each section or you use a named reference, with a #Reference section below both. I do have to admit I'm confused, though. Your example գութ doesn't have any reference shared between its two sections. Was there one you wanted to share or was it just a bad example? — LlywelynII 14:59, 18 August 2015 (UTC)
Hmm, I wanted to do this and I could swear that format didn't work before. It does now, so I withdraw my comment. --Vahag (talk) 15:29, 18 August 2015 (UTC)
  • Oppose. I also oppose the notion (expressed by some above) that all references need to use ref tags. In particular, because Wiktionary has a longstanding practice, which I support, of not citing other dictionaries inline for definitions, but Wiktionary does allow other dictionaries ("mentions") to verify words in many languages, there will always be many entries which have references which apply to the whole entry, as Benwing notes. I personally don't find a mix of bulleted and numbered citations problematic, but if you do, a solution like the one deployed on բալախ is preferable to a new section which, I agree with Llywelyn, is unnecessary and also apparently misunderstanding what Wikipedia uses ==Notes== for (hint: not references, but actual clarificatory notes, which often don't cite references). Practically speaking, the continued use of "related terms" by new users to mean "semantically related" when it actually is for "etymologically related", and the only very slight distinction that is proposed to be made between ==References== and ==Notes==, convinces me that only a few veteran adepts would use ==Notes== correctly, and other people would either not use it "correctly" vis-a-vis ==References==, or fill it up with trivia. - -sche (discuss) 15:16, 18 August 2015 (UTC)
    • But those are completely different things. There's references ("see here for more") and sourcing ("we got this information from here"). Mixing them into the same references section is bad. I have no problem with listing external reference works, but treating them as sources or mixing them in with sources is very bad. External reference works should, surely, go in the "external links" section, the "references" section should be kept for sourcing only. —CodeCat 20:06, 18 August 2015 (UTC)


See Talk:𐤋𐤏𐤁. Seems we have a hundred-odd entries whose headwords are perfectly correct but whose article titles are written backwards for no apparent reason. (Nothing came up searching the beer parlor but there may have been a discussion about this elsewhere. If so, just kindly link to it.) — LlywelynII 23:33, 17 August 2015 (UTC)

Could this be a problem with the wiki editor (and/or the user's browser)? I mean, if you start typing Hebrew or Arabic, it will correctly switch to right-to-left mode. But it doesn't necessarily "know" about every language. Equinox 23:51, 17 August 2015 (UTC)
The problem is that even though Unicode designated Phoenician as right-to-left, most fonts seem to display Phoenician characters left-to-right. And because of this, the editors who created these entries entered the letters backwards in an attempt to get them to display correctly, so the article titles are actually wrong. --WikiTiki89 02:18, 18 August 2015 (UTC)
Ah. So it's a well-meaning problem all around: the original editors were trying to get it to display correctly; the programmers got around to formatting that language to process correctly; implementing the new coding has now made the existing pages display incorrect backwards names which are getting copied onto other people's work elsewhere on the internet. So, we just need to go fix this, right? Is this something easily automated or do we just slowly do it by hand?

And will the entries now alphabetize correctly? or do they need special treatment in their DEFAULTSORTs? — LlywelynII 13:48, 18 August 2015 (UTC)
This must be done manually by someone with enough familiarity with Semitic languages (such as myself). The entries are very inconsistent. Some are correct, and some are incorrect in different ways. And yes, they will alphabetize correctly after this. --WikiTiki89 14:02, 18 August 2015 (UTC)
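Although the fix itself has to be manual, a short Python sketch can show what "entered backwards" means at the code-point level and how candidate titles might be listed for review. The helper names `is_rtl_only` and `reversed_title` are hypothetical; the only library facts relied on are that `unicodedata.bidirectional` reports the Unicode bidi class of a character and that Phoenician letters have class "R". Because the existing entries are inconsistent, reversing a title blindly could turn a correct one into a wrong one, so the output is a suggestion to check, not a fix.

```python
import unicodedata


def is_rtl_only(title):
    """True if every character has a strong right-to-left bidi class."""
    return bool(title) and all(
        unicodedata.bidirectional(ch) in ("R", "AL") for ch in title
    )


def reversed_title(title):
    """Reverse the stored (logical) order of the code points."""
    return title[::-1]


# The title from the talk page above, written with escapes so the stored
# order is unambiguous: LAMD (U+1090B), AIN (U+1090F), BET (U+10901).
stored = "\U0001090B\U0001090F\U00010901"

if is_rtl_only(stored):
    candidate = reversed_title(stored)
    # Print code points so the order is readable regardless of font support.
    print([f"U+{ord(ch):05X}" for ch in candidate])
```

Printing code points rather than the glyphs sidesteps the display problem described above, where some fonts draw Phoenician left-to-right.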

Nouns mostly used in plural - redirection to singular

I see reduction of content going on in nouns often used in plural, via soft redirection to singular forms. That includes crocodile tears, savings, and scrambled eggs. This seems inferior to me and I would like to revert. We should IMHO host the definitions in the most common form, and if the most common form is the plural, we should host it in plural. What do you think? Does anyone have a link to a previous discussion? --Dan Polansky (talk) 19:10, 18 August 2015 (UTC)

One concern I expressed in this previous discussion (see also this one) was that most people are able to figure out when a word is plural even if they can't tell what it means, and will look up the base form (e.g. foobar, if what they see in the text is "the foobars are blah"), so unless there is some explicit and obvious notice that additional senses are to be found in the plural's entry, readers may never think to look there.
If all senses are most common in the plural, I agree that the plural should be the lemma, with the singular using Template:singular of or a similar template. If only some senses are most common in the plural, I think it's more helpful to the reader to have them all in one place with appropriate labels (like "chiefly plural"). I could live with splitting them, though, as long as there were explicit, obvious notices to readers that they need to look in the other entry for more senses. (I don't think bare Template:singular of as an additional definition-line after some substantive definitions makes it sufficiently obvious that there's more semantic information to be found in the plural, but Template:singular of with a gloss specified might work.) - -sche (discuss) 19:38, 18 August 2015 (UTC)
The way I see it, there is one and only one lemma entry (one with definitions, inflection, -nyms, etc) per lemma. A single lemma should not have more than one lemma entry. So either these should all be concentrated on a single lemma page, as our normal practice is with respect to lemmas and non-lemmas, or we should treat them as separate lemmas entirely and keep them completely separate. I have done this with some entries as well, such as dialectics and darts. Note that in the former case I made sure to split the etymology as well, as different lemmas always have different etymologies. —CodeCat 20:20, 18 August 2015 (UTC)
We also need to establish some limit for how much more common the plural is. According to bgc ngrams, shoes, eyes, and feet are all somewhat more common than their corresponding singulars, but I wouldn't want to treat the plurals as the lemmas. —Aɴɢʀ (talk) 06:28, 19 August 2015 (UTC)

How can we improve Wikimedia grants to support you better?


The Wikimedia Foundation would like your feedback about how we can reimagine Wikimedia Foundation grants, to better support people and ideas in your Wikimedia project. Ways to participate:

Feedback is welcome in any language.

With thanks,

I JethroBT (WMF), Community Resources, Wikimedia Foundation. 05:24, 19 August 2015 (UTC)

What to call plural noun lemmas?

We have the template {{en-plural noun}} to categorise nouns whose lemma is grammatically plural. But this template also categorises in Category:English pluralia tantum. Is every noun that is used primarily in the plural a plurale tantum? I'm thinking a better category name would be Category:English plural nouns or Category:English plural-only nouns. —CodeCat 14:28, 19 August 2015 (UTC)

Very many "plural only" nouns can be found to be attested in the singular, eg scissor. It would, IMO, be misleading to eliminate the category for this reason, but it means that we need a good explanation in the category header. If we have a good explanation, we don't need to worry as much about the category name. I think what users need to know is not that the lemma is plural in form, but whether it is more commonly ("correctly") used ("agrees") with a singular or plural verb. I think this is an empirical question for many such terms, rather than something that follows from the categorization. I wonder whether the category shouldn't be hidden and the "plural-only" display replaced with something that focused on the agreement issue. As a hidden category it would retain its usefulness in directing contributors to reviewing the entries to determine whether they adequately and correctly addressed the agreement issue. DCDuring TALK 14:42, 19 August 2015 (UTC)
scissors pl (normally plural, singular scissor). We can call the category Category:English plural nouns (and use it only for lemmas, not forms-of). --WikiTiki89 14:50, 19 August 2015 (UTC)
With such nouns that do have a singular, we have to ask what the singular actually means. For the derivation singular > plural it's easy, it is simply multiple of a thing. For plural > singular, if the plural form clearly does refer to multiple objects, then I'd reason that it should simply be a non-lemma and the singular is the lemma. But for plural nouns that are not clearly multiple instances of something, it's more difficult. "Scissors" is a single object, so a hypothetical singular form doesn't have a predictable meaning. What is a "scissor"? Saying it's the singular of "scissors" doesn't actually make it clear what it is. So I think that we should evaluate cases where the singular parameter of this template has been specified. —CodeCat 15:00, 19 August 2015 (UTC)
I agree (but that shouldn't prevent it from being on the headword line, just in case that's what you were implying). And the same is the case with plurals of proper nouns, such as Islams; just calling it the "plural of Islam" doesn't explain what it means. --WikiTiki89 15:04, 19 August 2015 (UTC)
"A scissor is for cutting"; "A scissors is for cutting"; "Scissors are for cutting" (could refer to one or multiple pairs of scissors). The pattern doesn't apply to spectacles/glasses.
What label and what category name should be applied to scissors and to glasses/spectacles? DCDuring TALK 16:17, 19 August 2015 (UTC)
Yes, but what is a scissor? Some would say it is one half of a pair of scissors. Others would say it is one pair of scissors. Others would say it is one instance of a scissoring motion. But none of that is clear from the definition of scissors. --WikiTiki89 16:21, 19 August 2015 (UTC)
I don't think it is used much to mean "one of the two parts of a pair of scissors", despite the apparent use of scissor in just that sense in pair of scissors. We are long past the time when there was a significant group of speakers who used scissor that way. DCDuring TALK 16:29, 19 August 2015 (UTC)
Challenge accepted. --WikiTiki89 16:34, 19 August 2015 (UTC)
I don't doubt that you can find current attestable usage of scissor in the sense you have dredged up from history and etymology. I think it is more likely the subject of humor (eg, George Carlin) than conversation that adheres to the Gricean maxims, in particular "Avoid obscurity of expression" and "Avoid ambiguity" (presumably in context). DCDuring TALK 16:46, 19 August 2015 (UTC)

Guidance requested on religious terminology

Quaker terms I would like to make entries or a listing for Quaker-related terminology, as some of it is very particular but I'm not sure if it belongs in the main body of the dictionary or an appendix or what-have-you. For instance, Quakers traditionally didn't refer to the days of the week by their common pagan-derived names but used "first day" for Sunday, "second day" for Monday, etc. I could easily imagine someone reading about a "Friend going to meeting-house on first day" and not realizing that this means a "Quaker going to church on Sunday". Should I create entries for all of these terms or simply something like Appendix:Quaker terminology? Thanks. —Justin (koavf)TCM 02:17, 20 August 2015 (UTC)

  • Be bold, and make a start. We'll let you know if you do anything wrong. SemperBlotto (talk) 05:20, 20 August 2015 (UTC)
    • Do we have a context label for Quakerism? If not, we should make one. —Aɴɢʀ (talk) 09:49, 20 August 2015 (UTC)
      • A strategy is to start with a simple, but formatted, list in an Appendix, with items such as * {{l|en|first day}}, yielding first day. That would enable you to see how many of the terms already existed in English (blue link), possibly with the right definition, how many required a new English section (orangish link), and how many needed new entries (red link). Each of these situations can be speeded up by having specific cut-and-paste. DCDuring TALK 18:06, 21 August 2015 (UTC)

@DCDuring:, @Angr:, @SemperBlotto: A lot of them are at Appendix:Quakerism. There are probably a few more but I'm tired now. Do you think that a context label and tracking category would be useful? Thanks. —Justin (koavf)TCM 03:16, 23 August 2015 (UTC)

I do. We already have them for other Christian denominations such as Category:en:Anglicanism, Category:en:Eastern Orthodoxy, Category:en:Coptic Church, Category:en:Mormonism, Category:en:Protestantism, Category:en:Roman Catholicism, so why not Quakerism? —Aɴɢʀ (talk) 06:25, 23 August 2015 (UTC)

French French, Spanish Spanish and the like

This came up tangentially in May, but I'd like to raise it in its own thread. Currently, most regional categories are named "[place-adjective] [language]", as in "French French", "Welsh English" and "Austrian German", while a minority are named "[place-noun] [language]", as in "Louisiana English" (not *"Louisianan English") and "Quebec French".
I and some others find "French French" (and also to some extent "Welsh English") awkward and confusing, because it's easy to interpret both instances of "French" (and "Welsh") as referring to a language rather than a place. The "[place-adjective] [language]" scheme is also impossible or undesirable for some languages: "Swiss German" was felt [by some people, not me] to be so similar to the name of the Swiss German language [which Wiktionary calls Alemannic] that its category was moved to "Switzerland German", and it's currently impossible to distinguish French terms specific to the DRC from those specific to the ROC, because both go in "Congolese French". OTOH, "Austrian German" and most other category names are fine.
I propose we move all the reduplicated categories (like "French French") to either the "France French" format some categories already use, or to a format like "French of France". (Should we move all categories, including "Austrian German", etc, to one of those formats? It'd be consistent, but unnecessary in most cases.) - -sche (discuss) 22:05, 20 August 2015 (UTC)

Using the "French in France" format has the nice advantage, from a technical standpoint, that it fits the same name format as all our other part-of-speech type categories. —CodeCat 22:30, 20 August 2015 (UTC)
  • Support Absolutely. I always support "X in Y" or "X of Y" constructions because of Congo/Congo and Dominican/Dominican (Dominica and the Dominican Republic). —Justin (koavf)TCM 03:58, 21 August 2015 (UTC)
We need to make sure we use linguistic borders rather than political borders. Anything with the word "Republic" in it is not likely to be a linguistic border. --WikiTiki89 05:25, 21 August 2015 (UTC)
I could support this in cases where it's ambiguous (like Congolese French) or highly misleading (like Swiss German was), but some of the reduplicated names (e.g. English English for the English of England) are actually well established and I wouldn't be happy to see them go. And I really wouldn't want to change the names of local varieties when the names are nonreduplicated, well established, and unambiguous, like Austrian German or Munster Irish. —Aɴɢʀ (talk) 12:43, 21 August 2015 (UTC)
Yeah, that's a concern I have, too — "Austrian German" and most categories have perfectly good names as-is, it's only a minority that are problematic. I certainly don't want to have three competing formats ("[place-adjective] [language]", "[place-noun] [language]", "[language] of [place]"), so if we're not prepared to switch in general to a "[language] of [place]" format, I suppose the status quo of occasionally deviating from "[place-adjective] [language]" to "[place-noun] [language]" is functional, if a bit unschön. "Dominica English" and "Dominican Republic English" work, and I guess so does "DRC French" (probably the least ugly option, compared to "DR Congo[lese] French" or the atrocious "Democratic Republic of the Congo French"). - -sche (discuss) 19:13, 21 August 2015 (UTC)
  • Where can I see uses of "English English"? google books:"English English" gives me high number of hits but from clicking the hits I find no quotations of use of "English English". --Dan Polansky (talk) 21:20, 21 August 2015 (UTC)
google books:"English English dialects" turns up a handful, which I've added to Citations:English English. Obviously, I don't dispute that the phrase is attested, only that it's the best/clearest name we could choose to use. - -sche (discuss) 03:15, 22 August 2015 (UTC)
We don't have names for linguistic divisions. It's English of England, not the more accurate English of England minus the northern half of Northumberland and the southwestern part of Wrexham, Wales and various enclaves in Paris, Dublin, New York City, Hollywood, etc., etc. (Yes, that was made up; I don't know the exact lines of English of England, and in fact the edges aren't that clean, the lines between Welsh English and Scottish English and the English of England are in fact slow changes.) By the difficulty of moving across national borders, and cultural identities tied to them, national borders tend to have some effect on language division, and where they don't, we probably can't say anything about it. So, no, "Republic" in the name doesn't mean anything.--Prosfilaes (talk) 20:56, 21 August 2015 (UTC)
Fr.Wikt lists ~70 words which are used in one Congo but not the other; I welcome suggestions on how to categorize them without using the names of the countries (which is what fr.Wikt does, if anyone wondered). :-) Fr.Wikt also lists a handful of words which are used in both Congos, which it might be tempting to conflate into one category, but I note that we don't conflate words used in Canada with words used in the US even when the words are used in both places — we dual categorize them as "Canadian English" and "American English". (In fact, we had a discussion which specifically deprecated the geographic label "North American" and made it so {{lb|en|North America}} displays and categorizes as "Canada, US".) - -sche (discuss) 03:23, 22 August 2015 (UTC)

Get rid of the parentheses around inflections in headword lines

Instead of putting parentheses there, I'm thinking it might look cleaner to separate the inflections with an m-dash or something similar. Something like this:

test — plural tests

An advantage is that it looks nicer when you put qualifiers or transliterations there. Those features aren't used much, but they are available.

What do you think? —CodeCat 21:15, 21 August 2015 (UTC)

@CodeCat: I think it could be visually appealing but mdashes with spaces is bad typography. Space ndashes or use mdashes immediately between the terms. —Justin (koavf)TCM 03:18, 22 August 2015 (UTC)
No, it's not. Languages other than English frequently use m-dashes with spaces. It's not "bad typography", just not typical in English text. --WikiTiki89 03:30, 22 August 2015 (UTC)
@Wikitiki89: If it's not typical typography, then we shouldn't use it. —Justin (koavf)TCM 03:43, 22 August 2015 (UTC)
It's not typical in English running text. That says nothing about specially formatted things like tables or dictionaries. --WikiTiki89 00:49, 23 August 2015 (UTC)
Here's what other dictionaries do (a slash denotes a line break):
online dictionaries:
  • Cambridge: thesis / noun (plural theses) / [definition]
  • Collins: thesis / noun / (plural) -ses
  • dictionary.com: thesis / noun, plural theses [...] / [definition]
  • Merriam-Webster: thesis / noun [...] / [definition] / plural theses
  • Oxforddictionaries: thesis / noun (plural theses) / [definition]
  • thefreedictionary.com: thesis / n. pl. theses / [definition]
paper dictionaries:
  • Concise Oxford English Dictionary: thesis n. (pl. theses) [definition]
  • Webster: thesis n., pl. -ses [definition]
The trend is to have as little horizontal space as possible between the singular and the plural, which is consistent with using parentheses or a comma, and inconsistent with a dash. - -sche (discuss) 05:13, 22 August 2015 (UTC)
But the trend for online dictionaries like us is to have a line break, so maybe {{head|en|noun|plural|tests}} and {{en-noun}} should generate:
and so on, e.g. {{de-noun|m|Tischs|gen2=Tisches|Tische|Tischlein|dim2=Tischchen}} gives:
I think that's easier to read than piling the forms up horizontally. —Aɴɢʀ (talk) 06:09, 22 August 2015 (UTC)
  • This would make sense on cell phones, but on laptops there's usually limited vertical space and lots of horizontal space.
  • In response to CodeCat, I've long wanted the parens gone, because with translits you end up with two layers of parens. Benwing2 (talk) 06:31, 22 August 2015 (UTC)
I think the line break some other dictionaries provide between the first mention of the lemma form and the mention of its plural is the same one we already provide between those two things: we just separate the first mention of the lemma form (up at the top of every page) from the rest of the headword line by so much other stuff like etymology that we repeat the lemma form a second time before we give the plural. I don't think we should add another line break on the PC version of the site, although as Benwing notes, it might actually make sense to do so on the mobile version. - -sche (discuss) 07:54, 22 August 2015 (UTC)
What about just a comma?
noun, plural nouns
ко (ko), plural кои (koi)
Arabic entries like حدث already employ commas rather than parentheses for this kind of thing. - -sche (discuss) 08:02, 22 August 2015 (UTC)
There are many formats that would get my vote, but any format that would take up any additional vertical screen space on a desktop, laptop, or good-sized tablet would not. I'd prefer an endash over an emdash too.
Wouldn't the space constraints of a cellphone be better addressed by the Wiktionary app than by our efforts? DCDuring TALK 11:34, 22 August 2015 (UTC)
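For comparison, the separator styles proposed in this thread (parentheses, comma, dash) can be put side by side with a minimal sketch. This is only an illustration: `format_headword` and its parameters are hypothetical names invented here, not the code behind {{head}} or {{en-noun}}, and the output is plain text rather than the real headword markup.

```python
def format_headword(lemma, inflections, style="parens", translit=None):
    """Render a plain-text headword line in one of the styles discussed above.

    lemma:       the headword itself
    inflections: list of (label, form, transliteration or None) tuples
    style:       "parens", "comma" or "dash" (hypothetical option names)
    translit:    optional transliteration of the lemma
    """
    head = f"{lemma} ({translit})" if translit else lemma

    parts = []
    for label, form, form_translit in inflections:
        text = f"{label} {form}"
        if form_translit:
            text += f" ({form_translit})"
        parts.append(text)

    if not parts:
        return head

    joined = ", ".join(parts)
    if style == "parens":
        return f"{head} ({joined})"
    if style == "comma":
        return f"{head}, {joined}"
    if style == "dash":
        # U+2014 is the m-dash, spaced on both sides as in the proposal.
        return f"{head} \u2014 {joined}"
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    for style in ("parens", "comma", "dash"):
        print(format_headword("test", [("plural", "tests", None)], style=style))
        print(format_headword("ко", [("plural", "кои", "koi")],
                              style=style, translit="ko"))
```

With a transliterated entry, the "parens" style produces the two layers of parentheses mentioned above, while the "comma" and "dash" styles keep parentheses for transliterations only.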

The dot before the first transliteration on some entries' headword lines

The thread above this prompted me to look closely at how headword-lines are formatted, and it strikes me that having a dot before only the first transliteration on only the headword line of only some entries creates an awkward and inconsistent amount of space. For example, in буква: why should "(búkva)" be further away from "бу́ква" than it is from "f inan"? and why should "(búkva)" be further away from "буква" than "(Latin spelling žagati)" is from "жагати" in жагати, or than "(romaji aizōban)" is from "あいぞうばん" in that entry? The dot is especially awkward in entries like حَدُثَ (ḥaduṯa), where the headword-line goes on to give another word and its translit, and the second translit is not separated by a dot. I propose we eliminate the dot.
I know that for the tiny number of languages which have WT:_ transliteration pages, the dot serves as an easter egg for the tiny number of people who notice that it contains a link. The link could either be moved to the transliteration itself, i.e. бу́ква (búkva), or just omitted because a nearly unnoticeable link that only exists for a few languages and points to a page that's frankly not very useful is, well, not so useful that it needs to stay... I mean entries don't even normally link to WT:About _ pages AFAIK, and those are frequently more useful. - -sche (discuss) 08:30, 22 August 2015 (UTC)

I was one of the proponents and spreaders of the "dot" format, but recently I changed my mind for the reasons you give. I now think it should be removed altogether. --Vahag (talk) 11:07, 22 August 2015 (UTC)

User:Pereru and sources again

This user has started adding all kinds of templates like {{needsources}} and {{needref}} to entries again. These templates don't serve a purpose as there is no strict need for sources. You can't ask for sources if there aren't any.

More annoyingly, the user is now also preventing me from editing and fixing up etymologies, reasoning that I may only write what agrees with the source. This is complete nonsense; if sources restrict what edits Wiktionarians may make, then the sources need to go. Or better yet, the users need to stop doing that and let editors do their work. If sources prevent me from improving Wiktionary, I'm going to start removing them. —CodeCat 18:39, 22 August 2015 (UTC)

The template {{needsources}} was kept, as in the decision above, and is believed to be useful. Adding it to an entry does not change any of its contents, it merely points out that there are no sources and that it would be an improvement to add them. If there are no sources, add a rationale -- the template says so. If you want, we can talk about how to do that. But adding information that is based on something -- even original research -- without mentioning its source or rationale -- that is in no way an improvement.
Ahn... Please don't misrepresent me. What I'm saying is that etymologies cannot float in the vacuum. If you don't have a source to add, add a rationale. You often do a quick'n'dirty one in the edit summary -- why not add a better one to the text itself?
I insist: I am not preventing you from fixing etymologies: I merely think that, by letting them float in the air, you're making them worse. Ground your etymologies, and I'll have no problem with what you do. Please, don't misrepresent what I say.--Pereru (talk) 18:45, 22 August 2015 (UTC)
Why is a rationale needed? Specify which parts of an etymology are in doubt. Or better yet, take it to WT:ES. Putting a template on the entry solves nothing at all. The template itself needs a rationale just as much for it to be useful.
Because the reader is not a Wiktionarian. He's not trying to discuss etymologies. He wants to know what the gist is of the reason why this form is here rather than some other form. He is not a critic: he just wants to know how Wiktionary decided that this was the right form. It's information, it's relevant, it should be on the page. Why is this even a problem? Are you trying to hide something?
I'm also not amused by your continued stance against Balto-Slavic. Balto-Slavic is accepted and has consensus among linguists, yet whenever I add it to an entry you put brackets around it and add "perhaps", while keeping your own Baltic-only etymologies displayed prominently. Wiktionary is not here to promote your fringe anti-Balto-Slavic views. We should show the current state of research. I think if you continue to exclude Balto-Slavic or play down its relevance or acceptance, then you should stop editing etymologies altogether. —CodeCat 19:07, 22 August 2015 (UTC)
Not the linguists I've talked to, no. But even if it were a consensus -- I don't have a problem with you adding Proto-BS to Wiktionary. I have a problem with you not arguing for the forms (same for Proto-IE, by the way). If you've invented them yourself, say so and state why. Why is this so difficult? If you're proposing a hypothesis, justify it on the page! If it's an argument that is generally valid for many words, write it up somewhere and link to it on the page! The "perhaps" there is meant to show that there is no reason for that form given here in Wiktionary -- if you add a reason, a justification for that form, then I'll be happy to delete any hedges.
Let me turn the argument against you: Wiktionary is also not here to support your anti-source, anti-rationale agenda. Being against sources and insisting on hiding the reasons why you choose one specific protoform when there are others in the literature and when there often is disagreement among Wiktionarians (see Štambuk and you on Kim vs. the Leiden school) does not make anything better here -- it arguably makes things worse. --Pereru (talk) 19:15, 22 August 2015 (UTC)
And yet, you refuse to explain anything at all about the problems you have with etymologies. WT:ES exists for a reason, why don't you use it? That's the place for discussing etymologies. Discuss what's wrong with them. If you find them implausible, then say why in the discussion. Just putting "perhaps" and a bunch of templates doesn't solve any of that. —CodeCat 19:22, 22 August 2015 (UTC)
That's because I don't have any specific problems with the forms in question; I just want to see what the reasons are for their having been chosen. And I keep not understanding why wanting to see this is strange, and why you're so determined to hide it. Again: it's not about discussing, it's about documenting. It's like adding sources to quotations. --Pereru (talk) 19:34, 22 August 2015 (UTC)
Yes, and which reasons are unclear? What needs explaining? Specify which aspects of the etymology are unclear and need explanation. And don't say "all of it" because that would make no sense; not a single etymological source explains everything about an etymology. The reader is always assumed, by every work, to have an understanding of the linguistics. What etymological sources do is they explain special parts of the etymology that may be surprising or unexpected, or aspects of a language's development that are unknown or not fully consensus. So you need to specify which parts of the etymology are unclear and need motivating.
I need to see the rationale in order to tell you if I think there is something wrong about it. Just as I need to see the definition of a word to see if I think it's wrong. A word without a definition is not useful in a dictionary. An etymology without a source or rationale is just floating in space, it is a speculation of its author. This should be obvious. Ask yourself: why is it that every good etymological dictionary known to man has both sources and justifications for the protoforms it lists? Are they really all wrong in doing that?
Also, aside from all of this, you do realise that all this applies to you as well? You'll have to give motivations in all your etymologies as well. Especially the ones that promote Baltic while dismissing Balto-Slavic. Fringe and unusual ideas should always be subject to higher scrutiny. So if you have a particular reason for going against the majority view that Balto-Slavic is a real thing, then you will have to explain this and why this view should be preferred in Wiktionary etymologies. Because I don't think there is a consensus for excluding or minimising Balto-Slavic. It has more support among linguists than Baltic does. —CodeCat 19:43, 22 August 2015 (UTC)
Of course I realize that. That's what I've been doing from the start. Every single etymology I have added has (a) a source and (b) a motivation/rationale. They are just not mine; they are Konstantīn Karuli's. You may disagree with them, and you are free to argue or counterargue (justifying and sourcing your arguments); and if indeed they proceed, then you win. What is the problem with that?
I'm all in favor of scrutiny! My entire point is that you provide no scrutiny. You just carry out your decisions without giving good reasons, and every time you're called on that, you just say something like "I don't need to justify my preferences". Well, you do. Please, do some scrutinizing. And write it down for others to see and scrutinize, too! Your distaste for justifications and/or sources is the fringiest idea I've seen: I don't know a single person interested in historical linguistics who supports that, including the other Wiktionarians here. It's only you, CC. You're the fringe one here, the one who needs scrutinizing. Please accept that. --Pereru (talk) 13:36, 24 August 2015 (UTC)

Sources, despite User:CodeCat

Frankly, here is my personal approach to this. If anyone (except CodeCat, who really isn't impartial about this issue) thinks I'm wrong, please let me know.

  • I think sources and/or rationales (CodeCat always forgets this part, for some reason) make etymologies more trustworthy, because they show that Wiktionary has done its homework and allow the more educated user to check whether or not s/he agrees with Wiktionary (this is especially important when an etymology is Wiktionary's own).
  • Sources and/or rationales (CodeCat always forgets this part, for some reason) are easy to add: if you're copying the info from somewhere, write down where from. If you're creating it yourself, write down why this is better.
  • If an entry doesn't have sources and/or rationales (CodeCat always forgets this part, for some reason), then it's OK to add a template that says so, so that those who are interested can take care of it. It's not different from templates like {{rfap}}, which I also use extensively to encourage Latvian speakers to add audio pronunciation files.

What is wrong about any of the above? And in what way does any of the above prevent anyone from working?

CodeCat and I have been reverting each other's edits for a few minutes. I will no longer do that -- it's more than a bit childish -- but I will leave here my request that something be done about it. This page is a discussion forum where such problems can hopefully be resolved. Let us talk about that, then, and come to some sort of conclusion, so that we can finally go on doing things without sudden tantrums from our esteemed colleagues. --Pereru (talk) 19:03, 22 August 2015 (UTC)

  • CodeCat, I think previous discussions have made abundantly clear that the only person here who thinks there is no need for reconstructions to have some kind of reference is you. Given that fact, it would be wise of you to stop edit-warring {{needsources}} (which was RFD-kept per consensus) out of entries. Let's start working on a template or format for presenting "Wiktionarian research" / "rationales" on entries which lack scholarly references. For reconstructions based on known sound correspondences, perhaps we could document the sound correspondences on an 'About' page (or similar page) and then have a template that says "Reconstructed by Wiktionary according to known sound correspondences" which could be placed in the references section or at the bottom of the entry {{Webster}}- and {{LDL}}-style. - -sche (discuss) 19:08, 22 August 2015 (UTC)
    • The issue I have is that these templates are telling me to add references and sources. There aren't any, so I remove the template. What point is there in asking for something that doesn't exist? —CodeCat 19:11, 22 August 2015 (UTC)
      • The template says sources and/or rationales. If there is no source, add a rationale. Are you claiming the rationales also don't exist? Supposedly you haven't been picking protoforms randomly... have you? --Pereru (talk) 19:20, 22 August 2015 (UTC)
      • (e/c) The template explicitly asks for either pre-existing scholarly sources, or what it currently calls "original research" (that wording and the format it prescribes need to be improved, but the meaning is clear). On Wikipedia and some other Wiktionaries, like de.Wikt, you would be blocked if you kept adding original etymological research. We're offering you a big concession, a big compromise — you get to keep adding your OR (whereas newer users even on this wiki have been threatened with blocks, as recently as last week, for adding OR etymologies), but you have to provide your rationales for it — for each reconstruction you invent. If you aren't willing to do that, previous discussions have made clear that there are quite a few people who would be happy to simply delete and ban all etymological OR. - -sche (discuss) 19:23, 22 August 2015 (UTC)
        • You're making it sound like there's this big change that has to be made to allow unsourced etymologies. But it's just the status quo. So I keep doing what has always been done, as there hasn't been a policy change. Don't make it sound like a concession because it isn't. If you want to require sources for all etymologies, make a policy and enforce it (which would mean removing somewhere around 90% of all etymology sections and reconstructions). That's all I ask for. Until then, you need to be clearer about what's wrong with the etymologies. Just asking for sources and rationales is going to get ignored. Pereru can patrol his fringe etymologies all he wants, that's fine with me. Latvian is not my responsibility. As long as I can make sure the rest of Wiktionary is up to par. —CodeCat 19:31, 22 August 2015 (UTC)
          • Previous discussions have made abundantly clear that you are the only person who subscribes to the view that reconstructions are not required to have any sources. Your long-standing but solitary refusal to accept the status quo does not change the status quo. - -sche (discuss) 19:37, 22 August 2015 (UTC)
            • It's not clear to me at all, the prior BP discussions gave a rather nuanced picture. Make a policy that has clear consensus, and then enforce it. Nothing else will do. —CodeCat 19:45, 22 August 2015 (UTC)
          • I think this is the main point of all these discussions -- to create a new policy. Are we all in agreement now? If anyone other than CodeCat disagrees that sourcing and justifications are good and people should add them to pages, then please say so, or else... do we have a new policy? --Pereru (talk) 19:51, 22 August 2015 (UTC)
            • A policy is a separate page, clearly and delicately worded, and approved by consensus through a vote. Something like WT:CFI. —CodeCat 20:01, 22 August 2015 (UTC)
                • In a previous discussion I did exactly that, on this very page. And since nobody disagreed, I suppose this means we have a policy? --Pereru (talk) 20:48, 22 August 2015 (UTC)
      • There can be no such thing as no source; if you made it up, then write Source: CodeCat's ass. If someone is asking for a source, it's useful information that you just made it up.--Prosfilaes (talk) 20:55, 22 August 2015 (UTC)

@Pereru And since nobody disagreed, I suppose this means we have a policy? From what I gather CC insists that a lack of voted-on, explicit policy negates the fundamental clause that "Wiktionary is a secondary source" (which means that wikt allows some elements of synthesis but the synthesized sources still need to be cited.)

Anyways, can this be a thing? In case I disappear I would like to document my support of a potential policy requiring sources, including for synthesis [e.g., "bebe could be considered derived from baba because source X says that ebe is derived from aba" and so forth, the keyword here being source X.] Neitrāls vārds (talk) 21:51, 22 August 2015 (UTC)

Is User:CodeCat's behavior a problem?

I have personally nothing against CodeCat's work, which is excellent in many areas of Wiktionary. But his/her behavior with respect to sourcing and providing support for his/her etymological choices is causing increasing concern. Despite the majority view that {{needsources}} was useful, and that sources and/or rationales (CodeCat always forgets this part, for some reason) improve an entry (just like audio pronunciations do, which is why there are templates like {{rfap}}), CodeCat is doing his/her damndest to make this particular part of the job -- selecting the entries that need this improvement, and then going about doing it -- irritatingly difficult. Again, I have nothing against all other contributions by CodeCat, who, as far as I know, is a good person. I'm not against the person, I'm against the behavior, which, as I think most people agree, is not justified.
In view of that, is there some administrative procedure here that can be undertaken to deal with such cases of irrational behavior? --Pereru (talk) 19:51, 22 August 2015 (UTC)

I have a problem with Pereru ignoring the consensus agreed upon in BP just this month, to make Proto-Baltic an etymology-only language. Pereru continues to create Proto-Baltic pages and categories, even going so far as to undo page moves. This needs to stop and I would like to know if there is some administrative procedure that can take care of this irrational behaviour. —CodeCat 21:02, 22 August 2015 (UTC)
I've blocked Pereru for one day for disruptive edits, which ignored consensus. —CodeCat 21:04, 22 August 2015 (UTC)
And I've unblocked him because, as I wrote, it was a "bad block by an admin who is actively involved in edit wars with this user, and is herself disruptively editing against community consensus (which is what she accuses Pereru of)". - -sche (discuss) 21:08, 22 August 2015 (UTC)
Of course, my behaviour makes his perfectly excusable. —CodeCat 21:09, 22 August 2015 (UTC)
Yes. You're being irrational, so your decisions don't make sense, whereas I wasn't, and mine do. What's the problem with that? --Pereru (talk) 13:38, 24 August 2015 (UTC)
  • But still, guys: CodeCat is imposing a policy that was never approved, that clearly goes against what the majority here wants, that goes against written recommendations like Wiktionary:Etymology#References; s/he also goes on a tantrum whenever anyone opposes that and takes unmeasured punishing actions such as his/her recent attempt to block me. And yet nobody does anything against it. What is the problem? Why does Wiktionary allow such destructive behavior? Isn't it the time for a disciplinary action? --Pereru (talk) 13:45, 24 August 2015 (UTC)
    • A disciplinary action was tried, but it failed. --Vahag (talk) 15:19, 24 August 2015 (UTC)

Czech possessive adjectives - etymology and related terms[edit]

First off, the term "Czech possessive adjective" does not find much use but I do not find a better one. Czech possessive adjectives would be the likes of orlův (eagle's) from orel (eagle). They are much like English possessive forms that we do not include for the reason that the apostrophe makes them effectively sum of parts; that is not the case with the Czech forms. In Czech, there is still a distinction between orlův and orlí; the latter would be used in the translation for "eagle's nest".

Now, how to treat them as for etymology and related terms? I want that entries for them do not repeat the etymology of the base term, and I want to see no "Related terms" section. I prefer that they be treated a bit like items in Category:Latin participles. In this, I seem to differ from User:Jan.Kamenicek.

A possessive adjective is created for great many animate nouns, most often referring to humans but also sometimes to animals. They include matčin (and forms matčina, matčino), otcův, sestřin, bratrův, synův, orlův, etc. They are not to be confused with koní, orlí, kočičí, psí, člověčí, etc.

I am asking for input from other people. I am looking forward to getting a view from other languages that have a similar feature, maybe Russian and other Slavic languages, but also other languages. --Dan Polansky (talk) 13:38, 23 August 2015 (UTC)

The term "Czech possessive adjective" does not find much use because there are not many English books dealing with them. The term "Czech hard adjectives" seems to find even less use, but they do exist. It is also not easy to filter them out, because not all books dealing with Czech possessive adjectives use the phrase "Czech possessive adjectives", they can talk simply about Czech language and use only the phrase "possessive adjectives" (such as here: [8]).
I believe that all the expressions like orel, orlice, orlí or orlův should be listed in the categories like Category:Czech terms derived from Proto-Slavic and therefore their etymology sections should include information that they "come from Proto-Slavic *orьlъ", which also puts it into the correct category.
As for the "eagle's nest": it can be translated in both ways (depending on context) as orlí hnízdo (talking about the kind of nest), or orlovo hnízdo (nominative neuter of orlův). The latter is used quite rarely, usually when referring to a nest belonging to a specific eagle, but examples when it is used as a synonym for "orlí" can be also found (usually in poetry or in old texts, one of them is in the quotation in the entry orlův). Jan Kameníček (talk) 14:16, 23 August 2015 (UTC)
My preferred format is like this нилеце (nilece), which seems to be what Dan Polansky is suggesting (the term "sub-lemma" comes to mind.) Just my "2 cents." Neitrāls vārds (talk) 14:25, 23 August 2015 (UTC)
I think that words categorized as lemmas should be treated as lemmas. Either it is a lemma, or it is not. I do not think that e. g. orlův can be considered a sublemma of orel. It is an adjective derived from orel by a suffix -ův, which is a derivational suffix, not an inflectional suffix. --Jan Kameníček (talk) 17:08, 23 August 2015 (UTC)
Maybe possessive adjectives should be ranked as non-lemmas, along with Latin participles and Czech comparatives (menší). It would be consistent with the practice of PSJC and SSJC. But I do not think it obvious that there should only be lemmas and non-lemmas, and that's it. For instance, many editors prefer to create some entries as alternative forms, and prefer to centralize etymology in the main entry and avoid it in the alternative form. The alternative form is still a lemma, but it is a secondary entry from the standpoint of information management. I have even seen some editors use the word "lemma" to mean "main entry" rather than "the word form representing all the inflected forms of the word".
The question is, like, do we want to repeat the etymology of huge in hugely, and do that for the whole class of -ly adverbs? --Dan Polansky (talk) 17:24, 23 August 2015 (UTC)
Was orlův separate from orel in Proto-Slavic, or was it only formed in Czech? If it was only formed in Czech, then I agree with Dan and Neitrāls: just say how it's derived from orel and put the history of orel in that entry. Just because something is its own lemma doesn't mean we have to duplicate (knowing it will come unsynced) information in multiple entries; rigidify is its own lemma independent of rigid, but doesn't repeat rigid’s etymology. - -sche (discuss) 19:12, 23 August 2015 (UTC)
Generally speaking, possessive adjectives appeared already in Proto-Slavic, see Appendix:Proto-Slavic/-ovъ. The possessives with the suffix *-ovъ changed in Proto-Czech (between 10th and 13th century) to -óv and later -uov, which changed into modern -ův.
Unlike huge x hugely, there are often more changes taking place when creating Czech possessives than adding the suffix, compare e. g. Radka x Radčin.
Besides this, I think that all words which have roots in proto-languages should be listed in the categories like Terms derived from Proto-... . I don't think that only one representative of a group of related words should be listed there. Using the {{template:etyl}} in the etymology section is a good way to do so. Or should the category be added manually? Jan Kameníček (talk) 21:01, 23 August 2015 (UTC)
The fact that going from "matka" to "matčin" does not look like plain suffixing does not matter; it is the property of Czech morphology (inflectional and derivational alike) that it often does not work like plain suffixing on the surface level. For instance, "bedna" --> "bednář" = "bedna" - "a" + "ář"; "samec" --> "samčí"; "vyrobit" --> "výrobce" = "vyrobit" - "it" + "ce" with "y" made acute or the like; "dům" --> "domeček" (ů went to o); "orel" --> "orlíček" (e dropped); "hrdlo" --> "hrdelní"; etc.
What matters is that we are dealing with a very productive derivation or inflection pattern, like in English for -ly, -ness, -hood, -ify, -ing, etc. And what matters is whether we want to have etymologies like the one currently in orlův, which says this:

"From orel +‎ -ův. Noun orel comes from Proto-Slavic *orьlъ, which is from Proto-Indo-European *h₃er- ‎(“big bird, eagle”).[1]"

As you can see, the etymology first indicates the suffixing, and then goes into detailing the etymology of the component "orel". That is really like "swimming" detailing the etymology of "swim", and "merrily" detailing the etymology of "merry".

Whether possessives in general originated early or not does not seem to matter. What matters is the particular etymology, and whether it is of the form "base + suffix. Base is from base-etymology" rather than what we see e.g. in windmill, which could conceivably be "wind + mill", but can in fact be traced to Old English *windmylen. I do not think that all compounds should provide the etymologies of the component terms on the pages of the compounds. Put in general terms, I do not think that all etymologies of all terms resulting from derivation (prefixing, suffixing, compounding, etc.) should repeat the etymologies of their base terms. --Dan Polansky (talk) 19:10, 24 August 2015 (UTC)

Question (re: sourcing)[edit]

So, there was this thing that I wanted to get a feel of the general attitude.

Do passages/statements attributed to an author or a book need to actually reflect what the author/book says? Can they be changed ("corrected") with something that author doesn't say while still attributing it to them?

My answer would probably be "are you effing kidding me?" (lol) Then again en.wikt can be a serious "land of the bent mirrors" [don't remember the correct idiom] and things that I see as common sense some others don't even consider.

This is referring to a discussion 3 headers up (that I actually missed) where (to sum it up somewhat snarkily) CodeCat says that that book is stupid and needs her corrections while still proudly displaying the reference [1] at the end despite (in some cases) all the core information being changed. For example in akmens the direct parent root was changed, then extrapolating from that the proto-group was changed and a different PIE root introduced (none of these things are to be found in the source cited.) I call this manufactured references/misattribution but maybe I'm dumb...?

Would like others' input.

And more generally this thing has been lingering on for years, the crux of the matter is that CC demands an explicit, voted-on policy, why not just do it, it could be something very simple, something to the effect:

  • Wiktionary by previous consensus is a secondary source, this explicitly applies to etymologies, sources need to be provided, in case of synthesis, the synthesized works need to be attributed.
    • Usage of templates to keep track of unsourced pages is to be encouraged.
    • Attribution of statements to an author that they didn't make is to be avoided.

What do you think about that? Perhaps User:Dan Polansky could help set it up? Neitrāls vārds (talk) 14:57, 23 August 2015 (UTC)

There is still Wiktionary:Votes/2013-10/Reconstructions need references that never started. How is the present wording of the vote from your standpoint? --Dan Polansky (talk) 15:09, 23 August 2015 (UTC)
As for policy page, the main thing is consensus and evidence of consensus, IMHO. A policy page by itself is a poor evidence of consensus; it merely makes things convenient for newcomers who then do not need to wade through previous votes to find what the decision was. Thus, a policy page is not strictly necessary, IMHO.
For interest, Wiktionary:Votes/pl-2006-12/Proto- languages in Appendicies is a related vote that does not seem to indicate inclusion criteria. --Dan Polansky (talk) 15:15, 23 August 2015 (UTC)
Looks good, pretty much exactly what I had in mind. The only problem – a bit narrow. In Latvian there is this problem that the entries look like doormats (to be a bit dramatic). Would be perfect if it could be extended to mainspace...? P.S. perhaps a clause about misattribution would be necessary – right now I can name two appendices that very dubiously cite template:R:lv:LEV (a connection is attributed to this book that cannot be found there.) Neitrāls vārds (talk) 15:31, 23 August 2015 (UTC)
I agree completely. I don't know what is on CC's mind, but s/he is clearly doing the wrong thing here. I don't really know what "policies" are supposed to imply (CC clearly acts without one), but I say there has to be some order in the usage of references and justifications. I also agree completely that reconstructions need justifications (sources, rationales), a practice that is used in every good etymological dictionary that I know. --Pereru (talk) 13:26, 24 August 2015 (UTC)
@Neitrāls vārds: I updated the vote a bit, to indicate sentence structure in a clearer way.
As for narrowness: I'd suggest to leave it narrow, and see whether it can get enough support as is. We can create another vote for etymologies later. There is still the question whether etymologies should be inline referenced etc.; dealing with these appendices separately seems to be a good initial step. --Dan Polansky (talk) 15:39, 23 August 2015 (UTC)
I added the vote to WT:VOTE and scheduled it to start in a week. Let us postpone the vote as much as a discussion requires. --Dan Polansky (talk) 15:44, 23 August 2015 (UTC)
Great, thanks! Neitrāls vārds (talk) 15:49, 23 August 2015 (UTC)

Native speakers' advice[edit]

Native speaker's advice needed, please look at Talk:houbelec#Translation. Thanks very much! Jan Kameníček (talk) 21:15, 23 August 2015 (UTC)



I don't see the use of keeping such empty entries that failed their RFD's. Could someone explain? Thanks 12:15, 24 August 2015 (UTC)

Partly as a place to store the evidence for the word (so that if we eventually find more, we can recreate it more easily – see for instance redamancy, which was a blank entry pointing to Appendix:English dictionary-only terms, until we managed to find enough citations to create a full entry), and partly to stop people trying to recreate the page (which often happens with "words" that correspond to rare phobias, sex acts, political insults etc, which are often mentioned in word lists and novelty dictionaries but never actually used – look how many times "wunch" got deleted, until I created a proper cited entry for it). Smurrayinchester (talk) 13:44, 24 August 2015 (UTC)
Your first reason does not apply, since the citations page would exist even if the soft redirect to it from the main entry did not exist. --WikiTiki89 13:48, 24 August 2015 (UTC)
But who checks whether the citations tab is a blue-link when creating an entry? Smurrayinchester (talk) 14:22, 24 August 2015 (UTC)
That's your second reason. I only said your first reason doesn't apply. --WikiTiki89 14:59, 24 August 2015 (UTC)

Recreating Proto-Baltic (and other "deprecated" languages) with a different status?[edit]

Proto-Baltic was recently discontinued as an accepted language in Wiktionary. I was against it, because it doesn't seem to me that the discussion is over (and because there is no real authoritative source for PBS etyma yet), so it seemed premature, but OK, I can live with that. The problem, it seems to me, is that this forces changing quotes from sources in ways that don't seem legitimate. If a source reconstructs a form as Proto-Baltic, renaming it as Proto-Balto-Slavic without any further changes (e.g., replacing it with a different source) seems to me illegitimate. So: how about having a different status for Proto-Baltic? Say, "older/deprecated/obsolete Proto-language" or something like that? In this manner, we could list deprecated protoforms here (with templates duly identifying them as such) in the same way we list "misspellings of" or "alternative forms of" or "obsolete forms of" words in the main namespace. Here are a couple of reasons:

  1. People will still come upon older reconstructions -- they are, after all, attested in papers, etymological dictionaries, and other similar sources --, and may want to know what they were and why they were abandoned; it would thus be useful to have pages with these forms (clearly tagged as "deprecated" or something like that, and linked to the most recent and most widely accepted form), just as in biological taxonomy it is useful to have lists of old, deprecated scientific names so that older articles can still be read and understood correctly
  2. To follow the history of a proposed protoform, knowing its predecessors is important -- often, a new protoform is proposed in explicit opposition to, or as an explicit correction of, an earlier proposal. Being able to track these would be useful in understanding the state-of-the-art.

What do y'all think?

You can still reference a source that reconstructs a Proto-Baltic term in a Proto-Balto-Slavic entry. Think of it this way: we are reconstructing a Proto-Balto-Slavic term based on someone else's reconstruction of a Proto-Baltic term. --WikiTiki89 15:01, 24 August 2015 (UTC)
Sure, but the Proto-Balto-Slavic reconstruction will ultimately look different, at least in that it refers to a different level. (Most PBS entries here look very much different from the PB forms on which they are based). Someone who sees a PB form somewhere and wants to know what it is won't find a page about it here. Shouldn't there be one -- in the same way that there are "alternative spelling of" and "obsolete form of" pages? In this way we don't misrepresent sources, and we allow users to find exactly the form they saw in some source and track its status (deprecated) and understand why it was replaced by the PBS form. --Pereru (talk) 15:16, 24 August 2015 (UTC)
Let me give an example. A proto-Baltic form, like e.g. Appendix:Proto-Baltic/*akemns, would have an initial template saying something like: "This protoform is deprecated. The current consensus form is Appendix:Proto-Balto-Slavic/akmo. Reasons for this change are indicated below. See also Appendix:Proto-Balto-Slavic for the current view on this branch of Indo-European." In the page itself, the sources for that form (say, Karulis' LEV) would be cited. In this way, the reader would know what this form is, where it came from, and what it was abandoned for. The end result would seem to me to be at least as useful as "alternative spelling of" or "obsolete form of" pages. (I imagine there would also be a heading in the current reconstruction -- something like ==Deprecated forms== or ==Older proposed forms== -- to link the currently accepted protoform to its previous incarnations.)--Pereru (talk) 15:22, 24 August 2015 (UTC)
You may be right about including them in some way, shape, or form, but this has nothing to do with misrepresenting sources. We do not misrepresent PB sources by altering the form of the reconstruction to make it PBS-like. --WikiTiki89 15:33, 24 August 2015 (UTC)
Thanks. But as for sources, if a source clearly reconstructs a form as X, and we list it here under page Y, then it seems to me we are misrepresenting it, aren't we? (But one possible solution would be to mention this on the page; i.e., have something akin to a ===Usage notes===, or a footnote, where we explain that what the source said isn't exactly what is on the page. Would that be OK with you?) --Pereru (talk) 17:47, 24 August 2015 (UTC)
If you quote the Pythagorean theorem as (= (+ (* x x) (* y y)) (* z z)) rather than as x² + y² = z², are you misrepresenting the Pythagorean theorem? --WikiTiki89 18:21, 24 August 2015 (UTC)
If you cite someone who quoted it as x² + y² = z², then yes, you are. If you have some standard way of referring to the theorem that supersedes whatever the author you're quoting is saying, then you should say so somewhere and link to it. It would be like, you know, changing US spelling to British spelling in a quote written by an American author -- not the right way to quote. --Pereru (talk) 18:39, 24 August 2015 (UTC)
But that's the thing, we're not quoting, we're paraphrasing. And when you paraphrase, it is totally OK to change British spelling to American. I can talk about the "color" of Winston Churchill's eyes and cite a British source that spells it as "colour", and I would not be misrepresenting the source. --WikiTiki89 18:44, 24 August 2015 (UTC)
But that most clearly should not be the case for reconstructions -- they are ideas and hypotheses, not paraphrases. Their spelling is often exactly what is being claimed -- a *X instead of a *X'. In other words, the sounds that compose the protoform are exactly the theoretical point that is being made; and, in this case, of course the spelling matters, in fact it is what matters most. There can of course be general problems that can be solved in a general way -- researcher 1 uses X for a certain sound (say little glottal stops), while researcher 2 uses Y (say accent marks) -- and you can adapt the spellings to reflect that (as long as you are consistent, and you write up somewhere why you chose to regularize this difference in the way you did -- and link it to the pages where it is relevant). But in most cases this is not so, and differences in spelling mean something much more serious -- and they should be better documented. --Pereru (talk) 20:11, 24 August 2015 (UTC)
The difference is between equality (faithfully representing a source in its original form), and equivalence (understanding its meaning and/or intention). Some sources for PIE write h₂ while others write H₂. These are different things in writing, but we know and understand that they mean the same thing; they are equivalent even if they are not equal. So we can exchange one for the other without problems. Likewise, in Wikitiki's example, "colour" and "color" are unequal representations of equivalent meanings (I might call them "equivalent words", but this hinges on the question of whether different spellings make different words). —CodeCat 18:57, 24 August 2015 (UTC)
And the solution for this is easy: you make a principled choice (I hope, after a discussion with others) for, say, h₂; and then you write somewhere (say, Appendix:Proto-Indo-European) that you did that, and why, and you link this page to those in which h₂ occurs -- so the reader, who may have seen a source that had H₂, doesn't think that you made a mistake. And since you do know why you preferred h₂ to H₂, explaining it in writing shouldn't be a problem. You would only need to do it once, in one page (where you could explain all the other similar choices you made), and then link it to new PIE entries. And again I ask: what is it about this suggestion that is so unreasonable or difficult to do? You spend more time writing comments here than it would take you to do this.--Pereru (talk) 20:11, 24 August 2015 (UTC)
There's nothing against this in principle, and it's even preferred I would imagine. But at the same time, a lot of Wiktionary's practices and conventions are unwritten; we follow them because we learn from existing examples that are already on Wiktionary. In the case of the choice to use lowercase h₂, the earliest that I can find is this. And there, too, it was simply set as a rule without discussion or motivation. To discuss and motivate it now would be a bit pointless, as there's already a consensus for it. —CodeCat 20:19, 24 August 2015 (UTC)
Good, let's do it like that in the future then. Write up your favorite spelling choices for PIE (h₂ instead of H₂), their reasons (in this case, I suppose because h₂ is more recent?), and voilà: no more reasons for complaints, and people can go back to arguing the merits. The point is not justifying it to other Wiktionarians (though that in itself is not bad: there are always new people coming who don't know where this decision came from, and I'm sure they'd appreciate the information), but to users. If someone checks an etymology here and sees something s/he finds strange, and there is no justification anywhere for it, then this doesn't make Wiktionary look more trustworthy. Again: I'm not suggesting a discussion (unless people think there should be one), I'm suggesting documenting choices, to show, at first sight, that they are choices, not mistakes. Besides, there are things that are much more important than h₂, like your current diatribe about how to spell proto-BS intonations. After this is over, don't you think it would be a service to others to write down somewhere why one variant was preferred? Again, so that it doesn't look like a mistake, but like a true principled choice? --Pereru (talk) 20:53, 24 August 2015 (UTC)
I'm not going to start documenting everything for PIE all over again. Not unless enough people feel there is a pressing need for it. So far, nobody has complained about our current standards. If you want motivations for PIE notation, you'll have to write them on your own. —CodeCat 21:05, 24 August 2015 (UTC)
I hope they do, because this would indeed lead you towards actually improving your PIE forms. Of course, nobody can force you to do the right thing; you're a free individual. I'll simply keep adding {{needsources}} and {{needref}} to your unjustified decisions (unless you'll help me by doing it yourself, of course), hoping that someone other than you will have the knowledge to do the right thing. As for complaints, I did complain against your standards, and I've seen several people (Štambuk, -sche) disagreeing with your standards in specific cases, so I think you're assuming a non-existing consensus here. You're counting more on people's inertia than on consensus, actually. But hey, it is a strong feature of humans too. For all I know, you may well get away with it. --Pereru (talk) 22:37, 24 August 2015 (UTC)
There's also the case where different sources disagree on certain sound laws. For example there's a subset of linguists that thinks the change o > a happened independently in all the Balto-Slavic branches rather than in Balto-Slavic itself. In this case, too, we have to pick one particular set of sound laws as the "main" one. Our existing pages treat the o > a change as Balto-Slavic. Likewise, some sources may neglect to indicate accent or acutes even when all descendants are in agreement. You can compare this to Pokorny's reconstructions for PIE: they don't reflect modern understandings so they have all kinds of weird schwas and long vowels while lacking laryngeals. So imagine that the only source we have on a particular form is Pokorny; should we allow ourselves to bring the form up to par? These are all questions that arise when we start giving too much weight to sourcing. —CodeCat 18:33, 24 August 2015 (UTC)
Disagreement between sources is exactly the reason why you need to argue for the forms you create pages for here -- I'm so glad you brought this up! Look: if different sources give different opinions and explanations, then you discuss them and explain why you favor one over the other. Things like "different sound laws" can be part of the discussion. All the problems you mention above can be summarized and written up in a page (e.g., Appendix:Proto-Indo-European sources) to which you can refer as part of your explanations for preferring one form over another. I've seen this done in etymological dictionaries, and I see no reason why you couldn't do this here. Sources are good -- even when they disagree... --Pereru (talk) 18:39, 24 August 2015 (UTC)
We don't have to motivate and discuss every single choice we make. For Dutch entries, we choose the spelling as prescribed by the Dutch language union as the norm for lemmas, even though not everyone uses it and some people advocate alternatives. This choice is not motivated or discussed; it's simply set as a rule and accepted by our Dutch editors. In the same way, it's not necessary to discuss why we picked one particular set of sound laws to base our reconstructions on. In many cases, the choice is arbitrary and we simply picked one because we had to make a choice. I think it's more important for us all to agree on a set representation and sound laws for Balto-Slavic reconstructions, than it is for us to discuss and motivate it all. Not that it's not welcome and valuable to give reasons for choosing one particular thing, but that's secondary to making the choice in the first place. What we choose is more important than why. —CodeCat 18:52, 24 August 2015 (UTC)
Things are different with reconstructions, especially when there are competing hypotheses. See, a reconstruction is not a word, but an idea; and, as for every idea, justifying it is important. The spelling of a Dutch word is not an idea that is being discussed by several people right now and with different, equally authoritative variants (for a still different, but more comparable case, see Nynorsk vs. Bokmål). That's what historical linguists do -- they justify their reconstructions -- and that's what you should do, too, if you care about reconstructions. When you say "we don't have to justify it", you're making a "petitio principii" without there actually being policy on this. Why don't you start a policy page on why we don't have to justify choosing one etymology over another, one set of sound laws over another (thus disputing Wiktionary:Etymology), and let people vote on it? You keep talking as if everybody agreed with you, when this is clearly not the case, much the opposite. I'd like to see you try to defend and get this "policy" of yours approved. --Pereru (talk) 19:08, 24 August 2015 (UTC)
Well of course, there must be a consensus. I'm aware that the current representation of Balto-Slavic doesn't have consensus, as both you and Ivan seem to disagree on it. But Ivan's solution was to simply create alternative (duplicate) entries or move mine, which of course is no way to come to an agreement. So the question becomes how do we come to an agreement on things, and if we don't, what should be done with existing and future Balto-Slavic pages? Right now, the majority of them has been created by me, so they mostly reflect the (unwritten) standards I follow. But if we insist that there must be consensus first, what do we do with them? Should they be deleted until there is an agreement on them? What about Balto-Slavic forms in etymologies? —CodeCat 19:16, 24 August 2015 (UTC)
It's not simply that there should be a consensus -- the consensus shouldn't be hidden, buried in some page that was archived three years ago. The reason for the consensus should be right there, on the page, or at least in Appendix:Proto-Indo-European, so that the reader knows what consensus decisions you made, and why. Since you're talking about theories, not words that exist in real languages, then your sources and/or arguments are the basic reason why the protoword is here -- i.e., they are precisely the most important piece of information. (I don't disagree with anyone's spelling of proto-BS, by the way, simply because I'm not sufficiently familiar with it to have a principled opinion.) But I see you disagree -- and given this fact, why create pages with one spelling when you can't even agree this is the right thing to do? Why not create a paragraph in, say, Appendix:Proto-Balto-Slavic, that summarizes this discussion -- after you're done with it -- and lists your conclusion and the reasons for it? Then you can follow it consistently, and nobody -- or at least not I -- will complain. What I keep not understanding is this need to hide your reasons: that makes no sense to me at all; and it's something that no etymological dictionary I know of has ever done. Why this innovation? --Pereru (talk) 20:16, 24 August 2015 (UTC)
Consensus doesn't necessarily have to be formed through discussion. Sometimes all that's needed is for one person to do something and for others to then follow that example. Consensus is often silent, and therefore undocumented, reflected only in practice. There's no documented consensus for most of the edits people make to Wiktionary entries; it's simply the fact that they're left unreverted that creates a sense of agreement for the new status quo. It's only when someone disputes something that a lack of consensus becomes obvious. In the case of Balto-Slavic, you two have voiced your opinions, so that's how I know. I continued creating entries because I figured, the source of the dispute is the naming, but we can still have good content and when we solve the dispute we can rename the entries. I haven't made further attempts to come to a consensus because the attempts I did make didn't work; Ivan's opinions were fundamentally different from mine on this matter, and nobody else seemed to care enough to provide a third voice, so the matter remained unresolved and both of us just kept doing our own thing. —CodeCat 20:28, 24 August 2015 (UTC)
True. And sometimes what is necessary to challenge it is for someone to come here and say "but this is not right, and here's why". And that's what I'm doing, quite legitimately so, since what I am asking for is no more, no less than what every good etymological dictionary known to man already does -- sources + justifications. So: me being here, and the reactions of several others, show that there is no consensus here. If I were you, I would stop adding any new words, and concentrate on justifying the ones you've added already. You know the reasons you had for creating them, so this shouldn't be a problem. What to do with the proto-BS (or IE, or FU...) words? Justify them. If in the future a given justification is abandoned, because a new one came up... then all those pages will need to be moved, and a new justification added. That's how things go with ideas that aren't attested words (and add the reasons why). --Pereru (talk) 20:44, 24 August 2015 (UTC)
Oh of course, challenge and counter-challenge. And then eventually there's either an agreement, or we all give up until the next time. I actually find it much easier to discuss things with many participants though, that way things are more nuanced and it's not just two opposites clashing and getting nowhere. Much less chance of a stalemate. I will see if I can write up a proposal for PBS reconstructions with those motivations you're after so much. No promises though. I will refrain from creating any more until I do this, but I ask you not to add your templates to the pages. You should also remove them from Germanic pages because the norms are already explained at WT:AGEM, have consensus, and therefore don't need further justification. —CodeCat 21:05, 24 August 2015 (UTC)
Yes, the Wiktionarian way, isn't it? So conducive to the right result!... I also like it when there are more participants. Please! And I will indeed be glad to see you write up your proposals, so that others can see what exactly are the tacit rules you're tacitly following with the tacit (dis)agreement of your peers. WT:AGEM is actually quite good -- proficiat! But it doesn't say why sources or justifications should not be added. (I keep saying: you're following a policy that is not used in any good etymological dictionary anywhere. "Consensus" indeed!...) I will refrain from adding the template to them for now, but unless someone explains why there shouldn't be sources/justifications in Proto-Germanic words I will eventually return to adding them. Why should it be less good to source/justify Proto-Germanic reconstructions than those of any other protolanguage? As in the other cases, they aren't words, their justification is a crucial element to their eligibility for having a page that states they are the "right" protoform, etc. --Pereru (talk) 22:49, 24 August 2015 (UTC)
You can list deprecated reconstructions under ===Alternative reconstructions===, tagging them with {{qual|obsolete}} or whatever. Note the reconstruction from Pokorny in Reconstruction:Proto-Indo-European/h₂eHs-. The entries for these alternative forms can be soft-redirected as in Reconstruction:Proto-Indo-European/pel-. --Vahag (talk) 15:29, 24 August 2015 (UTC)
That is a good idea! I didn't know this could be done. Now, would it be OK if I created Proto-Baltic forms as such, under Appendix:Proto-Baltic/xxxx and then redirected them to their Proto-Balto-Slavic equivalents at Appendix:Proto-Balto-Slavic/xxxx? --Pereru (talk) 17:47, 24 August 2015 (UTC)
Of course, the names of Balto-Slavic pages should agree in notation with what is already the current practice for BS entries. Acutes and accent should be indicated when known, and the distinction between ś/ź (former palatovelars) and š (from RUKI) should be maintained, while the letter ž is not used for Balto-Slavic. This means that such things should be corrected for in the redirect as well. If certain features are reconstructible but not indicated in the page name, this should be explained in the entry. For example, if Slavic and Latvian have s while Lithuanian has š, then the expected reconstruction is ś and any difference should be accounted for by the entry. Likewise, if the descendants all indicate an acute but the page name has none, this too needs explaining. —CodeCat 18:05, 24 August 2015 (UTC)
Hi CodeCat! Glad you're not whimsically blocking people today. Now, to keep a form that was reconstructed as PB under a PBS heading would be as wrong as keeping a Latvian word under a Lithuanian heading -- it simply disagrees with the source, i.e. it is factually wrong information. The various letters are just notational conventions, differing from author to author, and could probably be resolved with redirect pages. (You could of course also include this information about Proto-Baltic in the Proto-Balto-Slavic page itself, but I don't see how this would be any better -- care to elaborate?)
Let me give an example. I'm going to recreate the Appendix:Proto-Baltic/akmens page -- CodeCat, please refrain from deleting it until the discussion here is complete -- and make it look like what I'm thinking. Then you guys can give your opinions. --Pereru (talk) 18:34, 24 August 2015 (UTC)
Latvian vs Lithuanian is irrelevant here, that's a completely different case. They are real attested languages, and to label a Latvian word as Lithuanian would be a misrepresentation of the attested facts (not the sources; sources are irrelevant for attestation as we are a secondary source). But for etymologies, sources aren't facts, they're proposals. And as an independent etymological source, we're allowed to make different proposals. So if we think that no, your Baltic reconstruction doesn't make much sense, here's a Balto-Slavic one we agree with more, then we are allowed to do that. Being a secondary source means that we do our own interpretation of the facts. We can of course use the proposals of others as part of ours, which we do. And we should definitely source that. —CodeCat 18:44, 24 August 2015 (UTC)
And reconstructed protoforms are attested as claims at certain levels; and to misrepresent the claims as different from what they were is wrong. If you prefer, compare it to adding a quote to a certain word, but (a) misspelling words in it, or (b) attributing it to the wrong source. Not the right move, ahn?
Thank you so much for saying the sources are proposals -- I had said that to you so many times, I thought you never would agree with me. That's exactly why it's so important to ground them. See, when you create a protoform page here, you're not creating a word: you're creating a proposal. And what makes proposals good or bad are the arguments that support them -- as you yourself said, they are not attested facts. That is exactly why sourcing and arguing them is so important: proposals without the accompanying argumentation are not compelling.
Finally, I have no Baltic reconstructions -- Karulis does. Take it up with him if you want, not me. Just like you haven't invented any of the Dutch words you contributed to Wiktionary (right? you haven't, have you? I mean, maybe you think Dutch is like Proto-BS and you should be allowed to add even the ones you invented yourself without further justification...). I have absolutely no problem with you changing any Proto-Baltic etymologies as long as you document your reason for doing so, or your source, etc. -- so that the reader can see why this is supposed to be better. I repeat: it's not much work, it takes a couple of minutes, and you must have the information already since you're making judgments on the basis of it. There is absolutely no valid reason for you not to do that. Period. --Pereru (talk) 19:08, 24 August 2015 (UTC)
Adding a source to our proposals just says "we agree with this idea". But that doesn't make sourcing important necessarily. Maybe there aren't any proposals that we agree with, and in that case we have nothing to source. So what Karulis says may be nice, but they are your reconstructions as soon as you put them in etymologies. Again, the source simply says that Karulis agrees with you, but you put it in the etymology, so you are proposing it in the name of Wiktionary. And I'm not required to give motivation for changing the etymology if there isn't one to begin with. Take your favourite edit warring target suns for example; the form is not motivated at all, but simply stated as fact, with reference to Karulis. This seems like exactly the kind of thing you're advocating against. A proper etymology, as I understand your view to be, would provide a motivation for the reconstruction. This motivation may itself come from Karulis's work, or it may be your own supplement. Or, it could be documented centrally in an appendix so that we don't have to write it down everywhere. But it would have to exist, even for proposals that are sourced. —CodeCat 19:26, 24 August 2015 (UTC)
Also an added note: Karulis's reconstruction for suns is demonstrably wrong, because it shows the ō > uo diphthongisation for both East and West Baltic. This change only occurred in East Baltic, and is not found in West Baltic{{R:Fortson 2004}} so the form Karulis gives is Proto-East-Baltic. This is one of the reasons I am against over-reliance on sources; sometimes they are quite obviously wrong. —CodeCat 19:34, 24 August 2015 (UTC)
If we change suns from saying "From x.<ref>Karulis, Book</ref>" to saying "According to Karulis, from x.<ref>Book by Karulis</ref>", does that solve at least some of this dispute? That's what was done once before when there was dispute over the etymology of bensin — the entry was rephrased to attribute the etymological theory explicitly, rather than giving it in "Wiktionary's voice". - -sche (discuss) 19:45, 24 August 2015 (UTC)
I think it would help, because, to me, the main problem is making sure that everybody's opinion is clearly marked -- Karulis', CC's (or Wiktionary's), etc. It's all a question of knowing we are reporting the right thing.
@CodeCat, look: the point is not whether Karulis is right or wrong -- I have no beef with that. The point is making sure your reasons for agreeing or disagreeing with him are documented somewhere, so the reader can see them and decide if s/he agrees with you or not. So: if you want to copy the paragraph you wrote above and place it, say, somewhere (in suns, or in Appendix:Proto-Balto-Slavic and then link it to suns, adding a few words to the etymology discussion) -- I have no problem with that. My only problem is with you erasing or changing Karulis' opinion, and then contributing something that cannot be checked. Let me see if, when I put it in bold, you will finally react to this: I am not saying you have to believe your sources unconditionally; I am saying that you have to explain the choices you make. You're not explaining your choices; and it would be easy to do so: just create Appendix:Proto-Balto-Slavic and do it there, and link it to other pages. (After discussing the 'best solution' with your colleagues, i.e. after you and Štambuk and whoever else is interested finally agree on how to spell proto-BS words.) But if you simply take down Karulis' opinion without justifying it -- and obviously you can try to justify doing it, since you just did it in the preceding paragraph -- you are NOT improving Wiktionary; you're just making it look more whimsical. My entire point in a nutshell: why hide the reasons for making a choice, especially when this choice is the crucial thing -- the very name of the page you create depends on it? --Pereru (talk) 20:26, 24 August 2015 (UTC)
Are you saying I need an Appendix page in order to remove an etymology I judge to be bad? Lots of other editors before me have simply edited out bad content, nothing to it. I'm just doing what others have also been doing already. It's you that's now trying to change all this and making it much more complicated, and then complaining when someone doesn't simply do it your new way and they start to butt heads with you. —CodeCat 20:32, 24 August 2015 (UTC)
Yes, but it's actually very simple. The Appendix page you need is a general guide to why certain things are 'bad content' -- they don't follow accepted correspondences, or they misapply sound laws, or are based on some idea (say, Glottalic Theory) that has been disproved, etc. Only one such page would probably solve all your problems. Then, when you remove an etymology that you think is bad and replace it with one you think is good, you mention in a footnote that so-and-so proposed the bad etymology, but then there's reason 1 and 2 (say, correspondence nr. 35, and sound law number 4) why this was bad -- see Appendix:Proto-Indo-European reconstructions -- which is why it was removed here. For deeper differences you might have specific pages, but I don't think there would be many of those, no. And it would also be possible to link Wikipedia pages, in case you see one that you agree with and you think actually explains the issue. The actual argumentation for removing an etymology would probably be one sentence long, and be added as a footnote. You could also mark it as a "Wiktionary editorial decision" if you don't want your name there. --Pereru (talk) 21:01, 24 August 2015 (UTC) NOTE: but note also that you'd have to deal with those that disagree with your reason. I suggest that anyone who disagrees with an etymology should first mention it somewhere -- the talk page of the protoform in question, or maybe WT:ES -- before making the change. If you do make the change, then also be ready to discuss with whoever disagrees with it, and if his/her arguments are good, then incorporate them in your rationale for accepting and/or refusing his/her criticism in the original footnote that explains the change. --Pereru (talk) 21:04, 24 August 2015 (UTC)
Unless we make this a rule for the removal of any content, etymological or not, then I'm not on board with this proposal. It would have to be justified why the rules for removing bad etymologies are different from those for removing bad anything else. Wiktionarians have always had the prerogative to delete content they think is bad, and they've never had to refer to some kind of standards document to justify their removal. An edit summary has generally been enough, and often even that is not done. This has worked well enough so far that you're the first to propose a change. So I will be expecting a more general support for this idea as it seems like a solution without an obvious problem. —CodeCat 21:10, 24 August 2015 (UTC)
You might do this if you want, though you yourself have pointed out repeatedly that etymologies are not words, so it's up to you to argue why they should follow the same rules. Feel free to present your arguments. As for me, obviously, protoforms (to quote your post) are proposals, not words; and, in science, proposals exist only because of their arguments. Unless you've changed your mind and no longer think protoforms are proposals rather than words, you should agree, for the sake of consistency with your own stated opinion.
I dispute the idea that Wiktionarians have always been free to delete whatever they thought was bad content; if they don't justify their deletions, they are stopped and blocked after a while -- i.e., others have to agree with them, tacitly or not, or else they are not allowed to continue. Adding justifications to protoforms, especially when you're making choices, falls within this general area. I maintain that for protoforms (= proposals), justifications are more important, let's say as important as sources are for quotes. You don't seem to want to address that, so I'll assume you tacitly agree (as you assume those Wiktionarians who don't revert your edits tacitly agree with you -- "tacit consensus", right? :-) --Pereru (talk) 22:22, 24 August 2015 (UTC)

I kind of like this idea – referenced Proto-Baltic pages with a clear disclaimer that it's a defunct grouping in its classical sense according to the most recent sources. (Disclaimer: I have yet to see serious challenging of Slavic being a daughter of W-Balt, thus I do not believe that there can be W-Balt + E-Balt grouping excluding Slav).

We do not misrepresent PB sources by altering the form of the reconstruction to make it PBS-like. --WikiTiki89 I agree with this, "sadly" that is not exactly the case ("correcting" referenced (even if deprecated by mod. stand.) PB forms to Orig. Res. PBS forms is what led to edit-warring a while ago, in my reading of things.)

My (personal/pseudoscientific) reading between the lines of Pereru's proposal is that it would serve as another "safety valve" and, baby, we couldn't have enough of those, lol. Neitrāls vārds (talk) 22:00, 24 August 2015 (UTC)

From where I stand, PBS does look like a better grouping than PB (the evidence seems to be accumulating). But in the absence of a general work on the topic (say, a PBS etymological dictionary), I don't think it can be regarded as settled -- I'm just conservative on this point. But I have nothing against it as a theory, and as long as things are clearly marked and sources are not misrepresented, I have no problem with it. --Pereru (talk) 22:22, 24 August 2015 (UTC)
Also, I'm not in principle against altering the forms of reconstructions -- I just think this should be done in the open, with the rules clearly laid out and placed in some page where others can see them. What is the point of "adjusting" a form to a spelling that was not in the original source, and then doing nothing, not even adding a footnote, thus misrepresenting the original content? And it's so easy to do it right -- just add the footnote, or change the source to the one whose spelling you think is better. This implies adding only a few words, keeps things clean and organized, and doesn't prevent anyone from expressing his/her agreement or disagreement with this or that protoform. Why not do it? Or, worse yet in CodeCat's case, why fight against it? --Pereru (talk) 22:26, 24 August 2015 (UTC)
Yes, wouldn't it be so much easier if everyone just saw it your way? Why do people always have to make it so difficult by disagreeing with you? It's so inconvenient. —CodeCat 22:38, 24 August 2015 (UTC)
Indeed! You have much more experience than I do with being in this position, so I'm hoping you'll share your wisdom in this respect? And especially with respect to my old, old question: "all good etymological dictionaries do it this way, and CodeCat does the opposite. Now, who do you think is more likely to be wrong?..." --Pereru (talk) 22:55, 24 August 2015 (UTC)
So, to summarize: I'm OK with deprecated pages/redirects, as long as it is clear which form is which, and who proposed what and why. As far as I'm concerned, this settles the question. --Pereru (talk) 05:25, 25 August 2015 (UTC)
But why create Appendix:Proto-Baltic/akmens with an unusual "deprecated" infrastructure? Why a hard redirect wouldn't do? In case of proto-languages on the same level we should use soft redirects, because the page can contain homonymous roots. Why do that for Proto-Baltic? How will a user ever even get to the Proto-Baltic page? --Vahag (talk) 09:09, 25 August 2015 (UTC)
Why a hard redirect wouldn't do? Hard redirect to what though? You mean akmō which is somehow mysteriously unciteable (I was actually looking at it and wondering whether to ask Itsacatfish if it would be possible to come up with some refs (non-agnostic of PBS) but I wouldn't want to draw any "innocent" editors in this drama.) Neitrāls vārds (talk) 22:31, 25 August 2015 (UTC)
Uncitability is a different question and has nothing to do with the policy of redirecting. The PBS page will presumably have CodeCat's original-research justifications (I'm with Pereru on this one). --Vahag (talk) 08:26, 26 August 2015 (UTC)
The discussion here (including other headers above) seems to have some of the problems arising from overdoing lexicography:
  1. from trying to use sources to "attest" reconstructions,
  2. and from treating reconstructions as "headwords" — instead of kind of index words for etymologically connected word groups.
Creating redirects for alternate reconstructions, and discussion of competing (though not necessarily deprecated) approaches both sound like good ideas, but I do not see the benefit in creating separate pages altogether for reconstructions based on more or less the same data as another one.
If (and it appears to me that this is an if) the point of protolang pages is to illustrate the connections between attested languages, then cutting down on repetition is necessary. We do not create separate appendices for things like West Germanic or Anglo-Frisian, even though they are known to have existed; since this stuff can be adequately discussed already in the "Proto-Germanic" appendices.
I would hold that, strictly speaking, we have no such thing as an "accepted reconstructed language" on Wiktionary — that's why they go in the Appendix namespace to begin with. Which is not a namespace that means "just like mainspace, but for second-tier languages". As I see it, an appendix-only status means not only that protolanguages can be subject to new limitations like possibly requiring sources, but also that they don't need, and in some respects probably shouldn't, be treated as lexicographic subjects.
I also welcome explaining systematic details on how and why to present reconstructions on pages like Wiktionary:About Proto-Balto-Slavic. That said, if the dispute is about a current inability to establish a consensus reconstruction of PBS that we could use as the index forms, there are a couple of alternate possibilities that can be considered:
  • Picking an index language and listing forms under its reflex. In the 'stone' example above, we'd perhaps use Lithuanian akmuo. This seems a bit difficult to fit into the Appendix:Proto-Whatever/word notation, though (it might appear to imply that it is a Proto-Baltic or Proto-Balto-Slavic form rather than Lithuanian).
  • Using rough "non-reconstructions". A convention introduced by I think Roger Blench is the symbol "#" in place of "*" when we have a cognate word-family but no systematic reconstruction scheme has been worked out in detail; and adding this to some kind of a "majority representation" (in principle partly arbitrary) of the word root's shape. In this case this would probably bring us to #akmV (since the ending seems to be the main issue).
--Tropylium (talk) 08:21, 29 August 2015 (UTC)
Tropylium said (..) problems arising from overdoing lexicography (..) from trying to use sources to "attest" reconstructions – Dan brought up that in their opinion protoforms need (in wiki jargon) tertiary sourcing (as opposed to secondary sources.) I completely agree with this (I also think that the vote that's "in the pipeline" essentially implies this.) A way to paraphrase it would be to say that protoforms need to be sourced as "ideas" or "concepts" (which is exactly what they are, imo) as opposed to sourcing them like "real words." On one hand the sourcing requirements are more stringent, OTOH in that they are not "words," things such as "uniformifying" their spelling would be allowed (and prob. encouraged) unlike "real words" where one would need an "alternative/archaic/blah spelling of". There's a bit of disc. on that here.
It sounds like there might be some kind of inflation of "source grades" going on here. In lexicography, an attestation is a primary source, while a mention in a research paper would be a secondary source. However, in etymology, attested words are merely data, while a publication proposing an etymology or a reconstruction is a primary source. An etymological dictionary would be a secondary source; it'd take something like an etymology section in a general-purpose dictionary to reach a tertiary source. And as usual, there should be no reason to demand tertiary sources specifically.
It also follows that basing a reconstruction on "just look at these words here" is not merely synthesis of existing sources, it's unambiguously original research; and, similarly, demanding etymdict-type sources would be equal to positioning Wiktionary as a Wikipedia-type tertiary source. --Tropylium (talk) 20:32, 31 August 2015 (UTC)
In lexicography, an attestation is a primary source – I think on wikt. this is treated as secondary(?) – their head/brain was the "primary source," then they publish it (which renders it "archived") and then it becomes secondary (in my reading of wiki jargon anyhow.) Perhaps, indeed, there could be a "shift forward" by one "grade," so, if lexicog sourcing is primary in your terms then it would make the proposed protoform sourcing "secondary" (or tertiary if it's "shifted" by one.) Neitrāls vārds (talk) 20:53, 31 August 2015 (UTC)
Etymological dictionaries are sources of new information as well, not just research papers. I consider them secondary sources, because the primary source is the attestations themselves, and etymological dictionaries and research papers contain interpretations and conclusions based on these attestations. This is the same as what Wiktionary does: Wiktionary collects attestations in the form of citations from primary sources, and then makes interpretations and conclusions as to the meanings of the words and other aspects. Reconstructions are just another kind of interpretation and conclusion drawn from the data, except they're drawn from attestations of many words collectively. Hence, the question that still remains to be answered is whether Wiktionary is an etymological dictionary (secondary source with its own interpretations) or an encyclopedia/compendium of etymological research (tertiary source). Currently, Wiktionary is an etymological dictionary/secondary source as it contains its own interpretations of the data. —CodeCat 22:27, 31 August 2015 (UTC)
Imho the Baltic stuff doesn't even merit a discussion, make it an etyl-only lang and period. However, edit-warring between certain users, a knee-jerk block on a certain user and a lot of generally disruptive stuff (in the strictest sense of this word, aka, some users would probably be making useful contributions if not being swept up in this drama, hence disruptive) show that there is definitely lack of tools to solve (or if you ask me, not have it in the first place) this type of stuff. Neitrāls vārds (talk) 19:52, 31 August 2015 (UTC)

Just in case, here's the page I was referring to WP:TERTIARY.

I'm also confusing some things myself, it's not the act of sourcing that is to be any "-ary", it is what the project is supposed to be, e.g., wikip. is supposed to be a tertiary source, wikt. is supposed to be a secondary source (well, kind of... the lexicog part at least?), as opposed to secondarily/tertiarily sourced, this is where the "grade inflation" came from, sorry, my bad!

Also, I don't think there would necessarily be difference in the "grade" of a res. paper and an etyl dict. as the wikip. page seems to suggest that it's the manner of how it is being discussed that determines the "grade", e.g.,

  • John Doe wrote in his book "the water is so clear and splashy..." – someone on wikt. makes the judgement that the term "water" is used to mean H2O, J. Doe then is primary source and by using this citation to assert that "water" does indeed mean H2O wikt. is a secondary source.
  • But then Jane Doe mentioning a (hypothetical) protoform *wōdor (or whatever) would have to spell out that "wōdor is H2O in Proto-Whatever, because, I, Jane Doe said so." Then this becomes secondary and by quoting this wikt. is being "tertiary."

Neitrāls vārds (talk) 18:15, 1 September 2015 (UTC)

An etymological dictionary can be a primary source, in case its editorial team advances any analyses or etymologies that are entirely new. Most that I have used mainly compile etymologies established by earlier research, however.
This might be something dependent on the language family though… families like Indo-European or Uralic have a deep research history and there is quite a bit to be cited; but elsewhere, it might well happen that a wider etymological dictionary will be the first etymological source to treat a given language at all.
(The arbitrariness of the language/dialect division brings up a couple difficulties here, too; if variety X had earlier been considered merely a dialect of language A, then it'll be debatable if old sources on language A will count as sources on X as well.)
But back to reconstructions: when we're dealing with unattested protolanguages, the crucial point is the lack of lexicographical primary sources. (I'll skip edge cases like Latin or Sanskrit for now.) This means that something else has to be the primary source, and this is going to depend on what exactly we are sourcing.
  • If we want to establish a proto-Fooian word for let's say 'macaroni' having existed at all, then already someone pointing out that a set of words in Fooian languages for 'macaroni' are of common inheritance is a primary source (common inheritance necessarily requires that a PFooian word existed). Of course, this also presumes that a Fooian language family is already established. After all we presumably do not want to leave backdoors open for Proto-Worlders, Hungaro-Sumerists, everything-is-Tamil-ists, or even partisans for debated families like Hokan or Nilo-Saharan…
  • If we want to know some details about how the PFooian word should be reconstructed, someone's paper or monograph or course handout or whatever, on Fooian historical phonology or historical morphology or semantic change or whatever will be a primary source; and I suppose this holds even if they don't treat the particular word we're interested in. (After all, you don't need to list every single example out there in order to establish that e.g. French m- usually corresponds to Spanish m-.)
  • If later on Jane Doe comes along to put the pieces together to state "the Proto-Fooian word for 'macaroni' is *nuduly", this would indeed be a secondary source — provided that she's not proposing new details in the process.
--Tropylium (talk) 21:10, 1 September 2015 (UTC)

Links in examples of non-English words

I mean in the entry πειρατής#noun (meaning: pirate) the example

  • Πειρατές του Αιγαίου (meaning: pirates of the Aegean Sea)

(strictly speaking this example should be in the entry πειρατές, which is the plural nominative of πειρατής)
I believe the links of this kind are very useful for an English-speaking person who wants to learn that other language (Greek in the case above) because she/he can examine the word for word translation of the example (when a word for word translation can be provided for an example).
Another user reverted an edit of mine that added a link of this kind. Is there a Wiki-Decision on this issue? SoSivr (talk) 10:26, 25 August 2015 (UTC)

See WT:ELE#Example sentences: "Example sentences should... not contain wikilinks (the words should be easy enough to understand without additional lookup)". However, that policy may have been written with English example sentences in mind; perhaps it's time to reconsider it for other languages. —Aɴɢʀ (talk) 10:55, 25 August 2015 (UTC)
Yes this occurred to me some time ago. I'd like to split the rule for English and non-English entries, or just abolish it all together. Renard Migrant (talk) 17:04, 25 August 2015 (UTC)

Implementing some type of autolinking in usexes has been brought up (by Benwing, I think?) and I really like this idea. I have been doing this manually (as in Россия (Rossija)), it's a bit of a pita doing it manually though. Neitrāls vārds (talk) 23:52, 25 August 2015 (UTC)

"A bit of a pita"? I don't support autolinking in usage examples. It barely works in headword lines. --WikiTiki89 00:33, 26 August 2015 (UTC)
Oh... I just got what a pita is (only because someone else used it in all caps in a discussion below). --WikiTiki89 03:06, 26 August 2015 (UTC)
I support autolinking. It would be better if they were black links, because the wrong links would be hidden and it would be easier on the eye. — Ungoliant (falai) 00:37, 26 August 2015 (UTC)
@Ungoliant, with the so-called "orange" links (when a landing page doesn't have the header for that lang) built into the software they could be made pretty accurate (only capitals at start of sentences would be a problem). @WikiTiki, well, perhaps the person originally suggesting this could share their vision of how it could/couldn't be implemented, Idk. Neitrāls vārds (talk) 01:24, 26 August 2015 (UTC)
Full support. Autolinking has been a de-facto standard for Chinese lects. In fact, you need to add an @ sign to remove links in {{zh-usex}}. In any case, the choice should be available for difficult or rare words, especially in foreign languages. I consider this quite important for languages without spaces between words (existing usexes may need to be rewritten to allow autolinking as in เรียก). --Anatoli T. (обсудить/вклад) 01:27, 26 August 2015 (UTC)
I like the idea of autolinks, if it can be done right. Wikitiki, can you explain what doesn't work currently? Benwing2 (talk) 07:06, 26 August 2015 (UTC)

Adding a collocations tab or section[edit]

In the past, there has been support for listing common collocations somewhere (besides usexes, which only fit a few), such as in ====Collocations==== sections. At WT:RFD#sentimental_value, it was suggested that not only collocations but also translations be provided. IMO, it might consume too much visual and byte space to list translations of collocations within entries, so I propose that we [ask the developers to] create a 'Collocations' namespace with its own tab like 'Citations'. We could also link to it using a {{seeCites}}-type template in entries. In that namespace, we could list common collocations, perhaps as the glosses to translation tables to which translations could be added — I have mocked up an example at Talk:goods; note that SOP translations are linked to their component parts. What do you think; would you like a Collocations: tab, a ====Collocations==== section, or neither? Should the tab or section contain translations, like at goods? - -sche (discuss) 19:44, 25 August 2015 (UTC)

Seems like a reasonable solution to a perennial problem, at least if the default search includes the Collocations namespace. If it doesn't, we won't have helped users. I suppose I would support it anyway because we might be able to come up with some other way to facilitate user search access to it or technical possibilities and rules may change. DCDuring TALK 00:18, 26 August 2015 (UTC)
Adding another namespace is a PITA because there's no enforced correspondence between the entry and the other page (which is why the citations namespace should be deleted). If it's too much of a distraction it can go in a collapsed box. DTLHS (talk) 02:18, 26 August 2015 (UTC)
  • I don't know the technicalities of the issue. However, I would strongly support this idea, as it would make a natural repository for SoP expressions which actually have some linguistic value, such as what we often call "set phrases", or the answer to "what is the usual (unexpected) verb which collocates with this noun?" ("wage war", "run for president", "wax lyrical"), as well as being a useful tool for Eng L2 students. There are well known lemming dictionaries out there devoted to the theme of common collocations. -- ALGRIF talk 15:50, 29 August 2015 (UTC)
I support this too. Useful for everybody. — Ungoliant (falai) 16:26, 29 August 2015 (UTC)
As a follow-up to my proposal re translations tables (see my mock-up of what Collocations:goods might look like): when a collocation has a synonym which has a main-namespace entry, we can of course use {{trans-see}}, like so. - -sche (discuss) 03:02, 1 September 2015 (UTC)
  • I support a separate namespace for collocations to such an extent that I strongly oppose an additional Collocations section in principal namespace that took up as much screen space as -sche's example. Imagine what that would look like for words like set, take, have, head. I think users of English Wiktionary who want a usable monolingual English dictionary for definitions and diction guidance already have a lot of drek (from their POV) to contend with:
    1. German or Translingual entries where their WP habits lead them to expect English;
    2. {{also}} that often leads them to FL entries;
    3. alternative forms sections that go to form of entries that convey no additional information;
    4. lengthy etymologies, with PIE and cognates;
    5. pronunciation sections they can't use without IPA;
    6. translation tables;
    7. semantic relations headers using words that don't occur in normal speech [hypernyms, hyponyms, troponyms, meronyms].
Adding something else to this list seems like a good way to drive any English monolingual speakers away for good. DCDuring TALK 13:21, 1 September 2015 (UTC)
The version of Google that I end up using most often shows succinct definitions of English words above the search results, without having to visit any of the search results. When traveling, Google automatically switches to the local version of the site, and I haven't seen any real differences. I also saw a tidbit about Wikipedia shedding 250 million visits (in July, I think?); aside from a "summer slump," these Google "blurbs" were mentioned as a factor in this rather dramatic decrease.
Imo, if Wiktionary is to ever have "an edge" over competitors, it's by providing highly detailed information. Neitrāls vārds (talk) 23:52, 2 September 2015 (UTC)
  • "collocation" is itself a word that doesn't occur in normal speech. If this is going to be helpful to everyday users, I think it needs a clearer name. Nothing's coming to mind though. Also, which collocations would be common enough to list? Smurrayinchester (talk) 13:20, 3 September 2015 (UTC)
    I think we would have to have the courage and humility to have a name for the tab like "Words used [with] this word". DCDuring TALK 17:48, 3 September 2015 (UTC)
    How about just "phrases" or "related phrases"? Equinox 17:50, 3 September 2015 (UTC)
    Or Derived phrases, as I assume the headword or its inflected forms must be included in the collocations. That would at least be consistent with the use of derived in derived terms. DCDuring TALK 21:40, 3 September 2015 (UTC)
    I would prefer it that derived terms be split into different types more generally. Separating phrases from other derived things is useful, but it's also useful to separate, say, compounds from affixed words, or compounds with the term as a head from compounds with the term as a modifier. —CodeCat 21:42, 3 September 2015 (UTC)

I like this idea. While I don't have particular dislike for the citations tab (admittedly it's a bit of a mystery to me), I'm inclined to agree with DTLHS in that I prefer all the information in the relevant entry. Neitrāls vārds (talk) 23:52, 2 September 2015 (UTC)

Thanks for drawing my attention to the Google definitions. They seem better in quality than ours. They have copious synonyms. We cover more variation in senses, including obsolete, archaic, and obscure ones, though they have an excellent expanded display of additional definitions. They have good etymologies, though what I saw was presented in a confused graphical way that IMO misrepresented the facts they reported in text. They offer translations, too. IMO, all online dictionaries will have problems competing.
The bright [side] of the reduced number of WP visits is the lower load on the servers!!! DCDuring TALK 00:25, 3 September 2015 (UTC)
  • I oppose creating a separate namespace for collocations. Listing them in the mainspace is fine. --Dan Polansky (talk) 18:34, 4 September 2015 (UTC)
  • Re Derived Terms vs Collocations. A derived term would normally be a blue link term. E.g. mineral water is a derived term in the entry for mineral. A collocation is "use of English". Collocations are generally not going to be suitable as main entries, for SoP-iness. It's, for instance, knowing typical useful verb-noun collocations - such as "follow instructions", or "take aim", or "entertain a doubt". It's also about typical adjective-noun or adverb-verb collocations, etc. Words such as "wedge" would be enhanced if you could read that one "drives a wedge between" things, and so on. Word chunks that, if you know them, can make your English sound "good". However, I repeat, these are not derived terms. I am of the opinion that, by the time any Eng L2 learner is wondering about this stuff, they will already be familiar with the term Collocation, and so will be very happy to see the section, or tab, available as a resource. -- ALGRIF talk 15:32, 7 September 2015 (UTC)
I disagree that an English L2 learner would likely know the meaning of the word "collocation". For instance, my French is at the point at which such a tool on Wiktionnaire would be very useful to me, but if it was labelled "collocation", I would only know what it meant because it's the same word in English. I haven't ever seen the word in French before (the only reason I know it's the same is because I just looked it up), and I imagine it would be the situation for second language speakers of English. Something slightly wordier, but using more familiar English, might be "common phrases with this word" or "derived phrases" (as distinct from "derived terms"...which could be confusing, especially for new editors). Andrew Sheedy (talk) 21:43, 7 September 2015 (UTC)
While not detracting from your experience, I would also say in my defense that anyone taking First Certificate or similar would already know and use the term Collocation. Furthermore, whatever is chosen in the end has to fit neatly on a tab. -- ALGRIF talk 13:33, 8 September 2015 (UTC)
I bet more people would know collocation than hyponym, hypernym, coordinate and etymology. — Ungoliant (falai) 14:11, 8 September 2015 (UTC)
That's not saying much. Etymology, though, is different, because it's present as a label in lots and lots of dictionaries. My suggestion would be "combinations", as in "what terms do you usually find it combined with?". While combine isn't a basic word like you would find in a book for young children, it's not eye-glaze material like collocation. I read a lot about language and have an undergraduate degree in Linguistics, but I don't remember ever seeing collocation in use before I came here, nor did I remember what it meant when I first saw it, though I must have encountered it at least a few times over the years. Chuck Entz (talk) 01:28, 9 September 2015 (UTC)
How about "Expressions" or "Phrases"? --Panda10 (talk) 13:16, 9 September 2015 (UTC)
FWIW, despite having received a degree in linguistics, I had never heard collocations used in this sense until this discussion; whereas I have heard and used hyponym, hypernym, coordinate and etymology numerous times. It certainly appears to be the correct term for this discussion, and it seems like in the past we have not shied away from using the right terminology (like collateral form or deponent) despite the relative obscurity. I'd certainly be fine with a different name, but also wouldn't dismiss collocation wholesale based on obscurity. —JohnC5 13:34, 9 September 2015 (UTC)
  • I don't know if anyone has looked at the Pedia entry? It is very informative for those of you who are not sure about the use, usefulness, or correctness of collocations. -- ALGRIF talk 15:02, 24 September 2015 (UTC)
 collocation on Wikipedia
I have created Wiktionary:Votes/2015-09/Adding a collocations or phrases namespace or section so we can obtain a clear, enumerated consensus (or lack thereof) to show the devs, because we will need to ask them to add the namespace if the namespace is what we want. (It's not difficult to ask them and AFAIK it's not difficult for them to add a namespace; it's how we came to have a citations namespace. It's just a technical observation that we can't create a namespace ourselves, we have to ask them.) Please fix/point out any problems you see with the vote, suggest/make improvements, etc. As it gets closer to the scheduled start date, I will ping anyone who has participated in this discussion but doesn't seem to have noticed the vote. - -sche (discuss) 18:09, 24 September 2015 (UTC)
Wiktionary:Votes/2015-09/Adding a collocations or phrases namespace or section has opened. - -sche (discuss) 17:10, 8 October 2015 (UTC)

Gender markers in Polish adjective entries[edit]

{{pl-adj}} currently requires a gender parameter. However, gender in Polish adjectives is inflectional, not lexical, and the lemma form is almost always masculine nominative singular (with rare exceptions for "female-only" adjectives like ciężarna (pregnant) or szczenna (pregnant with puppies)). I think these markers should be removed and the gender parameter ignored and eventually removed through a bot, as the exceptional cases can be easily identified by looking at the adjective ending. Are there any objections? --Tweenk (talk) 22:30, 26 August 2015 (UTC)

If the gender can always be determined from the ending, then this sounds good to me. Even if in rare cases it can't, it might still be better to have the gender auto-detected and only present as an override. Benwing2 (talk) 08:30, 27 August 2015 (UTC)

Allowing matched-pair entries[edit]

I created Wiktionary:Votes/2015-08/Allowing matched-pair entries as a proposal to formalize entries such as ( ), based on the discussion Wiktionary:Beer parlour/2015/July#Merging ( and ) into a single entry. Thoughts? Can this vote be improved? What would be your vote and why? Feel free to edit it. --Daniel Carrero (talk) 14:27, 27 August 2015 (UTC)

Having an entry for () or some variant is one thing; I doubt we actually want to delete ( or ) as they're real. Of course, they can be used in smileys where pairing is not necessary. Renard Migrant (talk) 16:37, 9 September 2015 (UTC)
A more legitimate unpaired use outside of smileys is, for example, numbering: 1) like this. 2) like that. --WikiTiki89 16:41, 9 September 2015 (UTC)
Yes. Because :) is debatably more of an image than a word or a symbol. Renard Migrant (talk) 16:46, 9 September 2015 (UTC)
True. See also Citations:) for examples. --Daniel Carrero (talk) 18:30, 9 September 2015 (UTC)

Scientific symbols?[edit]

At present, there's no good place to put scientific symbols in entries (e.g. E for energy or electric field, t1/2 for half-life, etc.). What would people say to modifying {{en-noun}} or creating a new inflection line template to show these symbols (similar to what's currently done at speed of light, but neater)? So for instance:

speed of light (uncountable, symbol c)
velocity (countable and uncountable, plural velocities, symbol v or v)
magnetic flux (uncountable, symbol Φ or ΦB)
neutron (plural neutrons, symbol n)

There are some shortcomings (for instance, the need to sometimes use bold or italics) so I'd be happy to hear other suggestions. Smurrayinchester (talk) 15:49, 27 August 2015 (UTC)

I think it would be better if we agreed on a guideline on how to add them to definition lines rather than HWLs, because a symbol doesn’t always apply to all senses (i.e. velocity (rapidity of motion) and speed of light (figurative: extremely fast speed)). — Ungoliant (falai) 15:57, 27 August 2015 (UTC)
Why aren't they just displayed next to the appropriate {{sense}} under Synonyms, just like abbreviations sometimes are and always should be, IMO? I could understand making these symbols larger, having a different background or a border, etc. to make them more visible, as they could get lost in a series or block of synonyms. DCDuring TALK 16:26, 27 August 2015 (UTC)
Surely not worth changing en-noun for this. Use alternative forms or synonyms. If really necessary use {{head|en|noun}}. Renard Migrant (talk) 17:38, 27 August 2015 (UTC)
In an entry for an English word there is a section English, in an entry for a French word there is a section French, and so on. But in an entry for a number, e.g. 7, or for a symbol, e.g. c, there is a section Translingual. Similarly, one could therefore have an additional translation for, e.g., the English noun velocity in the sense of rapidity of motion:
  • French: vitesse
  • Spanish: velocidad
  • Symbol(or Translingual): v

SoSivr (talk) 21:39, 28 August 2015 (UTC)

It's not a translation of the word, though: it's a conventional abbreviation. Equinox 21:42, 28 August 2015 (UTC)
Such symbols are normally Translingual. Thus they might be a synonym in many languages. DCDuring TALK 21:56, 28 August 2015 (UTC)
I agree with DCDuring, list abbreviations in the Synonyms or Alternative forms section. This is also how we handle non-scientific abbreviations, in my experience, like United Kingdom → UK. - -sche (discuss) 22:15, 28 August 2015 (UTC)

Attributive use of nouns[edit]

How do we tell for certain that a noun that modifies another noun is or isn't an adjective? For instance, I'm pretty sure that the word donkey in "donkey sanctuary" is just a noun, as is beer in beer parlour. An example of true adjectival usage would be welcome. SemperBlotto (talk) 14:57, 29 August 2015 (UTC)

See Wiktionary:English adjectives, which is, of course, not policy. Wiktionary:About English contains no policy that I can see on what separates an adjective from a noun used attributively. I actually don't think it's that hard, and in ambiguous cases there should be three citations which are clearly adjectival, not either nominal or adjectival. For example, "this desk is wood" would not count as a clear adjectival cite, as it's just as easily (or more easily) identifiable as a noun than an adjective. Renard Migrant (talk) 15:11, 29 August 2015 (UTC)
It's very difficult to get a wording through a vote, though. Even people who agree that we need such a policy will oppose on the grounds of wording, so getting 70% ish approval is unlikely. Renard Migrant (talk) 15:12, 29 August 2015 (UTC)
Would you say that epidemic in "epidemic proportions" is an adjective? It seems so to me (but I can't explain why). SemperBlotto (talk) 15:55, 29 August 2015 (UTC)
Yes, you're right. [9]. Donnanz (talk) 17:02, 29 August 2015 (UTC)
Yes, "proportions" usually takes an adj; e.g. you'd say "canine proportions", not "dog proportions". Equinox 17:11, 29 August 2015 (UTC)
Apply tests of adjectivity, and Occam's razor. Donkey has not (yet) been shown to be used in contexts that are clearly adjectival, like this sanctuary is donkeyer than that one; it was very red and very donkey. In contexts where either a noun or an adjective could work (donkey sanctuary could be compared to noun sanctuary or improbable sanctuary), Occam's razor suggests it's more likely to still be a noun than to have acquired a second part of speech which is peculiarly limited to only those varied contexts where the first part of speech could also be used. On the other hand, epidemic is used in contexts where only an adjective could work, so it must be an adjective (some of the time). It's also used in contexts where only a noun could work, e.g. in the plural, hence it is also a noun. When a word that has been shown to be both an adjective and a noun is used in contexts where it could be either (like epidemic disease), I think we've tended to default to the interpretation that it's an adjective unless semantics make the other interpretation more likely: e.g. the adjective is the best semantic fit in epidemic fraud (widespread fraud), while the noun would be the best fit in *epidemic storage (section of a lab which stores samples of viruses that cause epidemics). But if a prime minister fakes an outbreak of disease in order to push through security measures, you could speak of "his epidemic fraud" with epidemic as a noun, just like you could mock postmodernism as "that postmodernism nonsense". - -sche (discuss) 16:45, 29 August 2015 (UTC)

General thoughts on this:

  1. The reason that it's difficult to get a policy through is that there's no bright line.
  2. This is a particularly confusing subject for non-English speakers. I speak English 1st and French 2nd. French isn't big into attributive nouns. In English, you can construct a sentence "A B", where A is an attributive noun and B is a common noun. In French you usually construct it "B de A", where A and B are nouns and "de" is the preposition "de".

Purplebackpack89 17:01, 29 August 2015 (UTC)

Some French adjectives feel a lot like attributive nouns to me, e.g. routier (not comparable and so forth). Equinox 17:11, 29 August 2015 (UTC)
True adjectives can be qualified by adverbs. Epidemic fraud → fraud was indeed epidemic; epidemic storage → *the storage was indeed epidemic (in the sense -sche mentioned); the table is wooden → the table is solidly wooden; the table is wood → the table is solid wood, *the table is solidly wood. — Ungoliant (falai) 17:15, 29 August 2015 (UTC)
I wouldn't really have any problem with "the table is solidly wood". —CodeCat 17:45, 29 August 2015 (UTC)
In "the table is solidly wood", solidly is modifying is, not wood. --WikiTiki89 14:35, 1 September 2015 (UTC)

restoring solitary wasp[edit]


Perhaps not the best place to post a request but I don't know another place to do it. This is a perfectly attestable expression, and my grammar, though not perfect (I'm not a native speaker) was certainly acceptable, and at least correctable if there were mistakes. Could someone bring back that entry please? I'm really fed up with the cavalier behavior of this admin, really (and not the only one). Thank you 20:50, 29 August 2015 (UTC)

Hi. Yes, it's a real phrase, but doesn't it just refer to any wasp that is solitary (i.e. not social or colony-dwelling)? Then it's obvious from the two words. Equinox 20:59, 29 August 2015 (UTC)
It seems that the terms solitary wasp, social wasp, and hunting wasp have been used as if they referred to well-defined groups, though most modern thinking would apparently have them as SoP. For example, Century 1911 has solitary wasp as a run-in at the entry for solitary. DCDuring TALK 21:33, 29 August 2015 (UTC)
This has been recreated, and I have rewritten it in English. However, I can't find it in any other dictionary and feel it is sum-of-parts. There are plenty of hits for the two words used together so I'm not sure that RfV would be useful. SemperBlotto (talk) 20:33, 31 August 2015 (UTC)
Take a look at the bottom of the entry for solitary in The Century Dictionary, The Century Co., New York, 1911 where it is a run-in entry. I take this to mean that "the solitary wasps" was considered at least an informal grouping at that time and that the most promising source of citations would be before 1910, though the term may have continued in use past that time. DCDuring TALK 22:23, 31 August 2015 (UTC)
I have edited the entry in line with the thoughts above, adding a dated definition with cites from the 19th century and adding {{&lit|social|wasp}} to replace the previous SoP definition. Note that two of the citations are of Social Wasps, the capitalization being suggestive of something other than SoPitude. The entry could be further improved or challenged, of course. DCDuring TALK 22:55, 31 August 2015 (UTC)

Which English entries need pronunciation?[edit]

Can someone generate a list of English entries that don't have {{IPA}}? But somehow sort them in order of importance? I'm not sure how we would go about that, but there are basic entries out there, like garbage, which really should have the IPA pronunciation. Ultimateria (talk) 04:41, 30 August 2015 (UTC)

I'd like this too, though "in order of importance" is probably an unattainable goal. I've added pronunciation info at garbage now. —Aɴɢʀ (talk) 06:52, 30 August 2015 (UTC)

Here's a list of top 100 English entries whose English section did not contain "{{IPA" on 28 July 2014, ordered by Wiktionary:Frequency lists/PG/2006/04/1-10000, not constrained to lemmas, based on 20140728 dump: said, no, de, hands, Gutenberg, english, 2, replied, united, john, looking, coming, making, sn, arms, followed, appeared, continued, ety, reached, suddenly, miles, taking, beyond, nearly, laws, comes, natural, laid, copyright, opened, an', 4, makes, tried, Dr, lived, certainly, unto, placed, letters, remained, blockquote, happened, minutes, loved, knows, donations, thoughts, including, filled, seeing, tears, places, raised, moved, giving, laughed, leaving, started, circumstances, c., lines, considered, observed, wished, Charles, formed, trying, allowed, girls, discovered, sitting, ways, officers, offered, happiness, produced, walls, declared, prepared, takes, soldiers, talking, steps, intended, matters, appears, closed, gives, required, ladies, fixed, troops, camp, copies, v., running, cases, names.

If you want to have the list constrained to lemmas, let me know. Basically, let me know:

  • a) How many items you want
  • b) Whether you want to constrain to lemmas
  • c) To what location do you want the list delivered, like someone's talk page, some subpage, or the like

The process is rather simple, based on a dump. The key part is identifying English sections that do not contain "{{IPA". This is done using the following script find-missing-English-IPA.py:

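import re
import sys

# Scan a Wiktionary XML dump line by line and print the title of every page
# whose English section contains neither {{IPA nor {{audio-IPA.
in_english = False   # currently inside an ==English== section
ipa_found = False    # an IPA template was seen in the current section
title = ""
with open(sys.argv[1], encoding="utf-8") as dump:
    for line in dump:
        line = line.rstrip()
        if "<title>" in line:
            title = re.sub(" *</?title> *", "", line)
        if in_english:
            if "{{IPA" in line or "{{audio-IPA" in line:
                ipa_found = True
            # "----" separates language sections; "</text>" ends the page.
            if "----" in line or "</text>" in line:
                in_english = False
                if not ipa_found:
                    print(title)
                ipa_found = False
        if "==English==" in line:
            in_english = True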

The rest is intersecting the result with the frequency list in such order that the result is sorted by frequency list. The process was as follows:

  • find-missing-English-IPA.py enwiktionary-20140728-pages-articles.xml >English-entries-with-no-IPA.txt
  • grep -Fx -f English-entries-with-no-IPA.txt frequency-list-English-PG-10000.txt >t.txt
    That's a set intersection, but the order of the files matters: the output follows the order of the frequency list.
  • head -100 t.txt
    Output the first 100 lines
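
For anyone without grep, the intersection step above can be sketched in Python. This is only an illustration, not part of the process that was actually run: the function name and the sample words are made up, and it assumes both inputs are lists of exact titles, one per line, as in the two text files named above.

```python
def intersect_in_frequency_order(frequency_words, missing_ipa_titles):
    """Return the words of the frequency list that also lack IPA.

    The result keeps the order (and any repeats) of the frequency list,
    which is what `grep -Fx -f missing.txt frequency.txt` does as well.
    """
    missing = set(missing_ipa_titles)  # set gives fast exact-match lookup
    return [word for word in frequency_words if word in missing]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two text files.
    frequency = ["the", "said", "no", "de", "hands"]
    missing_ipa = ["hands", "zebra", "de"]
    for word in intersect_in_frequency_order(frequency, missing_ipa)[:100]:
        print(word)
```

Slicing the result with `[:100]` plays the role of `head -100`.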

You need Python, grep and head. You probably do not really need head, since you can pick the top 100 in your favorite editor. grep is used to do set intersection; if you have another method, you don't need grep. --Dan Polansky (talk) 10:34, 30 August 2015 (UTC)

By the way, English-entries-with-no-IPA.txt has 519,273 items. --Dan Polansky (talk) 10:36, 30 August 2015 (UTC)
The first two in your list above, said and no use {{audio-IPA}}, so they do have IPA pronunciations given. I bet several of the others in the list do, too. —Aɴɢʀ (talk) 11:55, 30 August 2015 (UTC)
@Aɴɢʀ: I fixed the script above. Do you want to have the list constrained to lemmas? Do you want to have a longer list? --Dan Polansky (talk) 12:58, 30 August 2015 (UTC)
I don't know about Ultimateria, but I don't want it constrained to lemmas, and I'd like to have an exhaustive list unless that would take too long to generate. —Aɴɢʀ (talk) 13:29, 30 August 2015 (UTC)
The exhaustive list of English entries without IPA is approximately the same as the list of all English entries. It has 519,273 items, as stated above. The list of items in PG-10000 that lack IPA has about 4070 items. I am posting the first 500 items to Beer parlour; when you're done adding IPA to those, drop me a line on my talk page to get more:
List of words

de, hands, Gutenberg, english, 2, replied, united, john, looking, coming, making, sn, arms, followed, appeared, continued, ety, reached, suddenly, miles, taking, beyond, nearly, laws, comes, natural, laid, copyright, opened, an', 4, makes, tried, Dr, lived, certainly, unto, placed, letters, remained, blockquote, happened, minutes, loved, knows, donations, thoughts, including, filled, seeing, tears, places, raised, moved, giving, laughed, leaving, started, circumstances, c., lines, considered, observed, wished, Charles, formed, trying, allowed, girls, discovered, sitting, ways, officers, offered, happiness, produced, walls, declared, prepared, takes, soldiers, talking, steps, intended, matters, appears, closed, gives, required, ladies, fixed, troops, camp, copies, v., running, cases, names, Word, higher, et, affairs, wouldn't, repeated, forms, ones, questions, start, smiled, keeping, silver, mentioned, associated, greek, ordered, obliged, rule, members, official, request, heads, dollars, engaged, peter, mountains, greatly, forced, ideas, using, feelings, working, finished, extent, watched, sides, gentlemen, aside, concerning, powers, possessed, building, particularly, knowing, weeks, settled, lies, pieces, clearly, ships, conditions, removed, highest, honor, obtained, presented, fingers, remembered, agreed, fully, rights, servants, sons, shoulders, points, woods, nations, created, refused, quietly, streets, regarded, fashion, surprised, faces, succeeded, birds, failed, peculiar, animals, desired, touched, occupied, expressed, opening, spirits, growing, served, carriage, papers, practice, hast, permitted, enemies, expense, explained, companion, established, suffered, satisfied, numerous, famous, telling, powerful, waters, material, gathered, suggested, finding, remains, seized, equally, naturally, remarkable, gods, saved, crossed, pounds, immediate, willing, principles, characters, paul, remarked, worked, whispered, midst, noticed, aware, genius, spanish, reader, 
published, you'll, joined, kings, sd, posted, needed, increased, walking, appointed, ceased, numbers, demanded, wounded, listened, contact, distinguished, distributed, watching, wants, occurred, follows, interests, dressed, hopes, smiling, tells, minds, suffering, proceeded, flesh, carrying, legs, duties, admitted, countries, pocket, rules, inhabitants, owner, coat, relations, consideration, accompanied, moving, stands, teeth, treated, burning, completely, resolved, calling, title, understanding, mad, forces, included, nearer, slaves, larger, previous, proposed, stars, informed, moments, supper, fighting, fields, stones, fees, seated, knees, amongst, sending, parties, gained, possibly, receiving, don, hoped, printed, features, fond, capable, firm, spiritual, pressed, sooner, lands, doors, concerned, deeply, destroy, distributing, results, reasons, rooms, useful, addressed, needs, unable, victory, dozen, ended, shows, connected, degrees, committed, notes, gradually, souls, cities, commanded, partly, playing, safety, provisions, asleep, thinks, escaped, bringing, highly, stated, attached, kindness, citizens, clouds, figures, assured, comply, fellows, haven't, gently, directed, pulled, surrounded, wishes, yards, voices, weary, couple, details, awful, asking, showing, introduced, composed, plans, rendered, pictures, volunteers, singing, eager, paused, whenever, successful, plants, granted, you've, obs, trial, learning, approached, paying, fn, disappeared, interrupted, readers, recognized, destroyed, signs, temper, hurried, represented, mental, attitude, returning, causes, vessels, compelled, kissed, younger, companions, harm, views, ends, kinds, branches, inquired, delivered, Word, calls, earlier, visited, sufficiently, natives, contained, perceived, scattered, rushed, helped, treatment, dreams, patient, growth, latin, immense, affected, eternal, pages, sounds, swift, wings, stepped, services, remaining, containing, editions, attended, softly, performed, likewise, 
frightened, acquainted, unhappy, feared, prisoners, adopted, shalt, thousands, inclined, convinced, valuable, effects, readily, striking, creatures, shouted, related, setting, punishment, slightly, articles, extended, wondered, increasing, expenses, doctrine, mystery, changes, consciousness, trembling, formerly, mankind, habits, estate, reign, shining, reported, unfortunate, classes, banks, glanced, troubled, difficulties, picked, purposes, somewhere, pushed, lately, uttered, ages, murmured, bowed, liability, enjoyed, stretched, belonged, nodded, opinions, indicate, misery, guests, painted, attend, proceed, loves, plainly, risk, doubtless, properly, singular, methods, strongly, breaking, violence, displayed, gets, lights, patience, concluded, approaching, mounted, jane, providing, measures, towns, dared, occasionally, furnished, priests, flying, gazed, movements, eagerly, acted, urged, ascii, disposed, electronically, begged, invited, departed, files, replacement, humanity, quarters, rolled, celebrated, slavery, verse, probable, turns, stared, boats, senses, occasions, readable, inches, bones, materials, managed, preserved, reaching, wretched, hanging, pursued, attempted, centuries, eggs, hastily, generations, located, compared, handed, circumstance, gates, observation, stronger, recovered, belonging, loving, masters, writers, cf., permanent, millions, merry, shadows, sentiment, profits, finds, imagined, raising, lords, separated, tribes, conviction, secured, mixed, insisted, prayers, selected, daughters, warning, developed, impulse, slipped, ours, damages, resumed, yield, schools, confirmed, descended, rush, falls, calculated, somehow, acquired, sins, notion, constitution, hundreds, firmly, actions, remarks, elements, th', vol, fears, nights, limitation, tied, displaying, experienced, opposed, contents, poured, seeking, practically, reports, begins, founded, brings, collected, cheerful, costs, threatened, western, beings, sam, revealed, winds, riding, scenes, 
industry, claims, pp., thereof, supported, requires, fathers, obey, alexander, exceedingly, continually, rest., gifts, folly, shoes, o'er, grateful, nearest, copying, activity, wives, parted, martin, cottage, jews, leaning, referred, holder, involved, sunshine, dutch, princes, examination, strangers, noted, slightest, realized, attacked, maintained, restored, folks, concealed, heavens, examined, deeds, wordforms, oath, prevented, completed, touching, inner, fix, suspected, contains, sighed, establishment, muttered, oxford, cavalry, succeed, hated, landed, passions, interior, lightly, offering, confined, exhausted, poets, sounded, directions, negro, studied, buildings, commenced, deeper, holds, residence, treasure, throwing, runs, favourite, desires, heavily, assembled, existed, depends, hesitated, staring, roads, pains, performing, grounds, recently, tones, walter, shaking, possibility, marched, writes, issued, sailed, instructions, additions, vi, vanished, arts, supplied, safely, references, passes, presents, marks, obtaining, moreover, commerce, startled, outer, belongs, naked, conducted, rivers, concern, campaign, hunting, whisper, commonly, contributions, operations, caesar, wondering, leaders, altar, tenderness, sharply, distinctly, creating, gather, reflected, preceding, individuals, gazing, armies, limbs, plays, hastened, dragged, pointing, verses, pronounced, tendency, Word, churches, earnestly, considering, bears, signed, mingled, walks, training, relieved, passages, persuaded, sources, inspired, angels, wilt, troubles, Lee, wherever, advantages, fortunate, employment, misfortune, owns, stirred, resist, depths, crossing, independence, breeze, provinces, conceived, relative, solitary, wandering, thereby, locked, courts, regarding, preferred, wherein, condemned, gross, happens, Billy, cleared, fruits, testimony, existing, ranks, beating, judges, simplicity, legally, veil, doubtful, weapons, limits, feeble, examine, corrupt, payments, returns, laying, 
instances, Greeks, realize, demands, consists, studies, ID, forming, slender, criminal, knocked, masses, indifferent, keeps, regions, intervals, intellect, leads, Lucy, invitation, sentiments, Marie, flash, swiftly, summoned, induced, helpless, preparing, indicated, Germans, attracted, gracious, respects, ventured, Spaniards, wearing, indifference, conceal, pleasures, precisely, registered, gardens, non, greece, childhood, saddle, supplies, weeping, paragraphs, grows, external, agents, institutions, losing, attempts, instruction, roots, jumped, earliest, finest, motives, fastened, converted, fancied, offices, revolution, silently, fires, responded, neglected, engagement, rolling, platform, offers, physician, imposed, organized, covering, wars, he'll, gravely, charges, tragedy, commander, virgin, farewell, villages, hunger, trembled, criticism, Ruth, restrictions, outward, impressed, blows, flashed, owed, satisfactory, originally, Samuel, wages, claimed, glow, emotions, Adam, Jones, wandered, procession, betrayed, admired, elected, Pierre, sunk, ruins, reminded, deceived, tables, starting

--Dan Polansky (talk) 13:47, 30 August 2015 (UTC)
Angr is right; I don't want just lemmas. The entries from PG 1-10000 are a good start. Could you put them at User:Ultimateria/en-needing-ipa? Ultimateria (talk) 19:08, 30 August 2015 (UTC)
Done. --Dan Polansky (talk) 19:17, 30 August 2015 (UTC)
Thanks, Dan! But I'm wondering why "nearly" is in the list; it's had IPA since February. —Aɴɢʀ (talk) 19:19, 30 August 2015 (UTC)
The list is based on a 28 July 2014 dump, as per above. That should be good enough, I think. By the way, the addition of "audio-IPA" had very little effect. 20-80. --Dan Polansky (talk) 19:29, 30 August 2015 (UTC)
Thanks for the list, Dan! Sad to see that in the past 13 months hardly anyone has added IPA to these entries... Ultimateria (talk) 21:10, 30 August 2015 (UTC)


I think someone should check the Polish declension of "one". I think the recent change looks very, very odd. —Stephen (Talk) 13:57, 1 September 2015 (UTC)

Removed. It was absolute nonsense. --Tweenk (talk) 19:04, 4 September 2015 (UTC)