Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!



August 2015

Category:(langname) plurals and Category:(langname) noun plural forms

Continuing the discussion from Module talk:category tree/poscatboiler/data/non-lemma forms § Plurals and noun plural forms

Right now both Category:(langname) plurals and Category:(langname) noun plural forms exist. Their descriptions are the same: "(langname) nouns that are inflected to be quantified as more than one (more than two in some languages with dual number)." They are also used in mostly the same way. The main difference is that Category:(langname) noun plural forms has counterparts, such as Category:(langname) noun dual forms, which don't exist for Category:(langname) plurals. It also follows the naming scheme of (langname) adjective * forms.

I think we should either change plurals to be more general ((langname) terms that are... (vs (langname) nouns that are...)) and move it out of noun forms, or better yet just remove it. Note that (langname) singularia/dualia/pluralia tantum categories exist. Enosh (talk) 18:56, 2 August 2015 (UTC)

I proposed merging the plurals category into the noun plural forms category before, for consistency with other categories. I still support this. —CodeCat 19:02, 2 August 2015 (UTC)
I thought the goal was to merge any plural category into its appropriate forms category. For example, Category:Hungarian plurals was merged into Category:Hungarian noun forms a long time ago. So there is no separate category for plurals at this moment. Isn't the goal the same for all languages? Is this discussion related: [1]? --Panda10 (talk) 20:18, 2 August 2015 (UTC)
Yes, it's the same proposal. But I'm not sure what you're asking. —CodeCat 20:23, 2 August 2015 (UTC)
I support merging Category:English plurals into Category:English noun plural forms for consistency with other categories. See also: Category:Noun plural forms by language. --Daniel Carrero (talk) 21:38, 2 August 2015 (UTC)
I support doing this in general. We have both Category:Arabic plurals and Category:Arabic noun plural forms, which ought to have the same contents but don't for reasons I'm not quite sure of; it's a bit of a mess. Benwing (talk) 05:53, 4 August 2015 (UTC)
Finally done for English. That was a lot of work for sure. A lot of entries needed manual fixing too so it wasn't just a simple bot run. In many entries, the plural-of definition was mixed in with other "proper" lemma definitions even though these should be kept to separate noun sections. There were also many entries where the headword line specified a noun lemma, rather than a noun plural form. —CodeCat 22:24, 19 August 2015 (UTC)

Thai transliterations with tones

Discussion moved from Wiktionary:Grease pit/2015/August#Thai transliterations with tones.

Native speakers seem to dislike the dictionary and textbook transliterations designed for learners, which include tones, and replace them with the Royal Thai General System of Transcription (RTGS). I see toned transliterations in my older edits being replaced with RTGS.

I think it's a problem. The standard Thai transliteration system (RTGS) not only lacks tones but also displays short and long vowels the same way and merges some consonants. I think it can be used as one of the systems but not the main one. I mentioned this in this discussion.

I insist that transliterating Thai tones is very important, not just the nominal but irregular tones as well. We could include RTGS along with phonetic transliterations (another parameter in Thai headwords?).

For example, ฉัน is nominally "chăn" but normally pronounced "chán" (pronoun), also ไหม (sense 1) is pronounced "mái" (nominally "măi"). I suggest we should use toned transliterations, as dictionaries and textbooks do, not as prescribed by the Thai government. @Stephen G. Brown, Iudexvivorum, Iyouwetheyhesheit. --Anatoli T. (обсудить/вклад) 12:17, 3 August 2015 (UTC)

  1. I agree that we need a romanisation system that better reflects tones, short and long vowels, etc.
  2. What system should we use then?
  3. The system developed by Thai2english (T2E) might be okay. But the T2E machine transliterator should be used with caution, as it sometimes gives incorrect transliterations (see the table below).
  4. Some other systems that might work:
    1. The now-defunct 1939 version of the RTGS (English translation) contains a general system and a precise system (which records tones, short and long vowels, etc.).
    2. The ALA-LC system is generally used by libraries in English-speaking countries. But this system lacks tone marks. (Could we add tone marks ourselves?)
    3. ISO 11940 is used in academic context.
--iudexvivorum (talk) 14:22, 3 August 2015 (UTC)
terms | romanised by T2E transliterator | correctly romanised according to RTGS system | correctly romanised according to T2E system
ภิยโย | pí-yá-yoh | phin-yo | pin-yoh
อธิกมาส | a-tík-mâat | a-thi-ka-mat; |
ทรูก | trôok | suk | sôok
ซอมซ่อ | som-sôr | sommaso | som-má-sôr
รอมร่อ | rom-rôr | rommaro | rom-má-rôr
เทพรัตนราชสุดา | tâyp-rát-dtà-ná-râat-chá-sù-daa | theppha rat rat suda | tâyp-pá-rát-râat-sù-daa
นิลรัตน์ | nin-rát | ninlarat | nin-lá-rát
อุตบล | u-dtà-bon | utbon | ùt-bon
I completely agree that transliterations need to reflect long vowels and tone marks. If I'm trying to learn Thai, it will do me no good to have important phonetic information like this left out. Native speakers should not be the ones determining transliteration; translit is not designed for them. However, I think that this T2E system looks just awful, and I don't think it will help. People expect foreign words to follow the usage where a e i o u stand for the sounds they have in Latin and Spanish, rather than using weird things like "ay" for /e/, "oo" for /u/, "or" for (presumably) /ɔ/ (this latter notation is especially unhelpful for American English speakers), etc. ISO 11940 won't work either because it's a translit system in the narrow sense in that it reflects the writing rather than the pronunciation (properly speaking, Wiktionary misuses "transliteration" to mean "transcription" but that is a discussion for another day). Adding the tone marks to the ALA-LC system is not a bad idea; you could imagine taking the T2E tone marks and adding them to the ALA-LC system. You could also imagine rewriting long vowels as e.g. aa instead of ā, to avoid the stacking up of diacritics when long vowels are combined with tone marks. Benwing (talk) 06:15, 4 August 2015 (UTC)

Here's a comparison between some systems: --iudexvivorum (talk) 11:39, 4 August 2015 (UTC)

# Thai meaning IPA romanisation
(without tone marks)
tone marks added
(using numbers to indicate tones - see notes below)
1 ไม้ใหม่ไหม้มั้ย Was that new piece of wood burnt by the fire? mäːj˦˥ mäj˩ mäj˥˩ mäj˦˥ mai mai mai mai máai mài mâi mái māi mai mai mai māi4 mai2 mai3 mai4
2 กรุงเทพมหานคร อมรรัตนโกสินทร์ The city as great as a celestial city, where the Emerald Buddha stays in perpetuity. krũŋ˧ tʰeːp̚˥˩ mä˥.häː˩˦ nä˥.kʰɔ̃ːn˧ ʔä˩.mɔ̃ːn˧ rät̚˥.tä˩.nä˥ koː˧.sĩn˩˥ krungthepmahanakhon amonrattanakosin grung-tâyp-má-hăa-ná-kon a-mon-rát-dtà-ná-goh-sĭn krungthēpmahānakhǭn ʻamǭnrattanakōsin krung1-thēp2-ma4-hā5-na4-khǭn1 ʻa2-mǭn1-rat4-ta2-na4-kō1-sin5
3 เสียงลือเสียงเล่าอ้าง อันใด พี่เอย What tales, what rumours, you ask? siːä̃ŋ˩˦ lɯː˧ siːä̃ŋ˩˦ läw˥˩ ʔä̃ːŋ˥˩ ʔä̃n˧ däj˧ pʰiː˥˩ ʔɤːj˧ siang lue siang lao ang an dai phi oei sĭang leu sĭang lâo âang an dai pêe oie sīang lū’ sīang lao ʻāng ʻan dai phī ʻœi sīang5 lū’1 sīang5 lao3 ʻāng3 ʻan1 dai1 phī3 ʻœi1
4 อันมือไกวเปลไซร้แต่ไรมา คือหัตถาครองพิภพจบสากล The hand that rocks the cradle is the hand that rules the world. ʔä̃ːŋ˥˩ mɯː˧ kwäj˧ pleː˧ säj˦˥ tɛː˨˩ raj˧ mäː˧ kʰɯː˧ hät̚˩.tʰäː˩˥ kʰrɔ̃ːŋ˧ pʰi˥.pʰop̚˥ t͡ɕop̚˩ säː˩˥.kõn˧ an mue kwai ple sai tae rai ma khue hattha khrong phiphop chop sakon an meu gwai bplay sái dtàe rai maa keu hàt-tăa krong pí-póp jòp săa-gon ʻan mū’ kwai plē sai tǣ rai mā khū’ hatthā khrǭng phiphop čhop sākon ʻan1 mū’1 kwai1 plē1 sai4 tǣ2 rai1 mā1 khū’1 hat2-thā5 khrǭng1 phi4-phop4 čhop2 sā5-kon1
Tone representation:
"1" = สามัญ (mid; [aː˧])
"2" = เอก (low; [aː˨˩] / [aː˩])
"3" = โท (falling; [aː˥˩])
"4" = ตรี (high; [aː˦˥] / [aː˥])
"5" = จัตวา (rising; [aː˩˩˦] / [aː˩˦])
I got the idea of using numbers from the Wade–Giles system for romanising Chinese. But the numbers will be superscript under the WG system (e.g. "p'in1-yin1" for "拼音").
@Iudexvivorum Thanks. Good job! I was going to suggest the system used by Benjawan Poomsan Becker. In his dictionaries he uses special characters for vowels "ʉ" for อึ, "ɛ" for แอะ, "ɔ" for เอาะ and "ə" for เออะ. Long vowels are simply duplicated, e.g. ตืน is "dtʉʉn". Tone marks are used on the first vowels only, e.g. เบิก is "bə̀ək". Tone marks are (using "a"): "a" (1 - no tone mark), "à" (2), "â" (3), "á" (4) and "ǎ" (5). Like T2E he uses d-dt-t, b-bp-p.
Using that system the examples above become:
  • ไม้ใหม่ไหม้มั้ย: máai mài mâi mái
  • กรุงเทพมหานคร อมรรัตนโกสินทร์: grung-têep-má-haa-ná-kon a-mon-rát-dtà-ná-goh-sǐn
  • เสียงลือเสียงเล่าอ้าง อันใด พี่เอย: sǐang leu sǐang lâo âang an dai pêe oie
I agree that Thai2English may not correctly transliterate words that it doesn't have in its dictionary. (I wonder if อธิกมาส has various readings, though. Both T2E and http://www.thai-language.com transliterate it as "atíkmâat".) Are "a-tí-gà-mâat" and "a-tík-gà-mâat" irregular alternative readings? --Anatoli T. (обсудить/вклад) 12:32, 4 August 2015 (UTC)
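The tone numbers (1 to 5) and the tone diacritics listed above can be related mechanically. The following is only a minimal sketch in Python, not an existing Wiktionary module: the function names are hypothetical, the tone mark is placed on the first vowel of each syllable as described for Becker's system, and long vowels are assumed to be written as doubled letters rather than with macrons, so that diacritics do not stack.

```python
import re
import unicodedata

# Tone number -> combining diacritic, following the list in the comment above:
# 1 = no mark, 2 = grave, 3 = circumflex, 4 = acute, 5 = caron.
TONE_MARKS = {"1": "", "2": "\u0300", "3": "\u0302", "4": "\u0301", "5": "\u030c"}

# Plain vowels plus the special vowel letters mentioned for Becker's system.
# Assumption: macron vowels (as in ALA-LC) are not handled here.
VOWELS = "aeiou\u0289\u025b\u0254\u0259"


def number_to_diacritic(syllable):
    """Turn a syllable such as 'mai2' into 'mài' (hypothetical helper)."""
    match = re.fullmatch(r"(.+?)([1-5])", syllable)
    if not match:
        return syllable  # no trailing tone number: leave unchanged
    body, tone = match.groups()
    for i, ch in enumerate(body):
        if ch in VOWELS:
            # Put the mark on the first vowel only, then compose where possible.
            marked = body[: i + 1] + TONE_MARKS[tone] + body[i + 1 :]
            return unicodedata.normalize("NFC", marked)
    return body  # no vowel found: just drop the number


def convert_word(word):
    """Convert a hyphen-separated word, syllable by syllable."""
    return "-".join(number_to_diacritic(s) for s in word.split("-"))


print(convert_word("mai4"), convert_word("mai2"), convert_word("mai3"))
```

Keeping the conversion syllable-based means the hyphen question (keep or drop) can be decided separately, since only the final join would change.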
  1. The term อธิกมาส is never pronounced "a-thik-mat" (a-tík-mâat). Grammatically, it is pronounced "a-thi-ka-mat" (a-tí-gà-mâat), as it is from Sanskrit अधिकमास adhikamāsa. But people also pronounce it as "a-thik-ka-mat" (a-tík-gà-mâat), and this pronunciation has become so popular that the Royal Institute Dictionary, the official dictionary of the Thai language, accepts both pronunciations.
  2. There are many other similar cases. Some are shown in the table below.
  3. FYI: The Royal Institute of Thailand publishes a popular book called "อ่านอย่างไรและเขียนอย่างไร" ("How to Write? How to Read?"), containing common misspellings and mispronunciations, pronunciations of proper nouns, useful rules concerning writing and reading, etc. The book is regularly updated. The 2014 edition (22nd edition; ISBN 9786167073965) seems to be its latest edition. But it is in Thai only.
--iudexvivorum (talk) 14:29, 4 August 2015 (UTC)
term | acceptable pronunciations: grammatical | acceptable pronunciations: popular | notes
กรณี | RTGS: karani; T2E: gà-rá-nee; IPA: kä˩.rä˥.niː˧ | RTGS: korani; T2E: gor-rá-nee; IPA: kɔː˧.rä˥.niː˧ | from Sanskrit करणि karaṇi
ครหา | RTGS: kharaha; T2E: ká-rá-hăa; IPA: kʰä˥.rä˥.haː˩˩˦ | RTGS: khoraha; T2E: kor-rá-hăa; IPA: kʰɔː˧.rä˥.haː˩˩˦ | from Sanskrit गर्हा gar'hā
ปรัชญา | RTGS: prat-ya; T2E: bpràt-yaa; IPA: prät̚˩.jäː˧ | RTGS: pratchaya; T2E: bpràt-chá-yaa; IPA: prät̚˩.t͡ɕʰä˥.jäː˧ | from Sanskrit प्राज्य prājya
ปรมาจารย์ | RTGS: paramachan; T2E: bpà-rá-maa-jaan; IPA: pä˩.rä˥.mäː˧.t͡ɕä̃ːn˧ | RTGS: poramachan; T2E: bpor-rá-maa-jaan; IPA: pɔː˧.rä˥.mäː˧.t͡ɕä̃ːn˧ | from Sanskrit परम parama + आचार्य ācārya
มนุษยสัมพันธ์ | RTGS: manutsayasamphan; T2E: má-nút-sà-yá-săm-pan; IPA: mä̃˧.nut̚˥.sä˩.jä˧.sä̃m˩˥.pʰä̃n˧ | RTGS: manutsamphan; T2E: má-nút-săm-pan; IPA: mä̃˧.nut̚˥.sä̃m˩˥.pʰä̃n˧ | from Sanskrit मनुष्य manuṣya + सम्बन्ध sambandha
อธิบดี | RTGS: a-thi-bodi; T2E: a-tí-bor-dee; IPA: ʔä˩.tʰi˥.bɔː˧.diː˧ | RTGS: a-thipbodi; T2E: a-típ-bor-dee; IPA: ʔä˩.tʰip̚˥.bɔː˧.diː˧ | from Sanskrit अधिपति adhipati
อาชญา | RTGS: at-ya; T2E: àat-yaa; IPA: ʔäːt̚˨˩.jäː˧ | RTGS: atchaya; T2E: àat-chá-yaa; IPA: ʔäːt̚˨˩.t͡ɕʰä˥.jäː˧ | from Sanskrit आज्य ājya
If I were to design a Thai translit system, I'd want the following:
  1. Use diacritics for tones rather than numbers; numbers look ugly to me and take up extra room.
  2. Use double letters rather than macrons; this is necessary with diacritic tonal marks to avoid double diacritics.
  3. Don't separate syllables with hyphens; that looks ugly to me and takes up lots of extra room.
  4. Use t th d rather than d t dt.
However, if Benjawan Poomsan Becker's system satisfies 1-3 but not 4, then maybe we should go ahead and use it in the interest of using an existing system rather than rolling our own. Benwing (talk) 08:31, 5 August 2015 (UTC)
@Iudexvivorum Thanks for providing this info. Irregular pronunciation was a side question. We still want to transliterate Thai words with irregular pronunciations phonetically. BTW, you can use automatic transliterations for Sanskrit, e.g. करणि ‎(karaṇi), गर्हा ‎(garhā), प्राज्य ‎(prājya), etc. Unfortunately, it seems that some online dictionaries, including thai2english and thai-language.com don't always provide phonetic transliterations or respellings for irregular words. (The latter uses yet another transliteration system, which is great for learning but not good for dictionaries) If I get some words wrong, I'd appreciate your corrections!
@Benwing I favour Benjawan Poomsan Becker's system but it also uses hyphens, like Thai2English. Hyphens can be either removed or added regardless of what system we choose. It's easier to read Thai correctly when syllables are split by hyphens. As in many East Asian languages, initials and finals are pronounced quite differently in Thai, and consonants change pronunciation when they are finals: s, ch, j, d, dt, t are all pronounced as a clipped "t" [t̚]; p, bp, b, f are all [p̚]; g, k are [k̚]; and n, l and r become [n]. It's important to separate clusters like "kla" from "-k-la", "tra" from "-t-ra", etc. User:Stephen G. Brown also favours using solid words, without hyphens. For languages like Thai, there are pros and cons with both. Textbooks and dictionaries favour hyphens, sometimes spaces after each syllable.
Shall I make proposed full tables with Benjawan Poomsan Becker's system? --Anatoli T. (обсудить/вклад) 11:39, 6 August 2015 (UTC)
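The final-consonant behaviour described above can be written down as a small lookup. This is only an illustrative Python sketch using the romanised letters from the comment; the function name is hypothetical, and returning any unlisted final unchanged is an assumption, since the comment does not say what happens to other finals.

```python
# Romanised finals grouped by the sound they are said to share in final position.
FINAL_GROUPS = {
    "t\u031a": ["s", "ch", "j", "d", "dt", "t"],  # clipped t
    "p\u031a": ["p", "bp", "b", "f"],             # clipped p
    "k\u031a": ["g", "k"],                        # clipped k
    "n": ["n", "l", "r"],
}

# Invert the grouping into a direct lookup: romanised final -> pronounced final.
FINAL_SOUND = {
    letter: sound for sound, letters in FINAL_GROUPS.items() for letter in letters
}


def final_sound(final):
    """Return the pronounced final for a romanised final consonant.

    Assumption: finals not listed in the comment are returned unchanged.
    """
    return FINAL_SOUND.get(final, final)


for final in ["s", "bp", "g", "l", "m"]:
    print(final, "->", final_sound(final))
```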
@Atitarev As for hyphens, I guess I'm used to Pinyin, written without them. But I also kind of would have expected final s, ch, j, etc. to be transcribed as t to follow the pronunciation. But I imagine whatever Becker does should work fine. If dictionaries tend to use hyphens, for example, then that's what we should do. Benwing (talk) 21:35, 6 August 2015 (UTC)
@Iudexvivorum, Benwing I've slowly started using Becker's transliteration, as in เรียก, including a usex, e.g.:
เรียกรถแท็กซี่แล้วยัง? rîak rót tɛ́k-sîi lɛ́ɛo yang? ― Did you call the taxi?
I've also started Category:Thai terms with irregular pronunciations, which I think could be useful. For irregular pronunciations as in ชาติ ‎(châat) I've added a line "Phonetic respelling: ชาด". What do you think? Sorry, I haven't provided a full table for your consideration because I don't know your opinion on the change (see my post above - 12:32, 4 August 2015). --Anatoli T. (обсудить/вклад) 00:49, 11 August 2015 (UTC)
  1. What you've done above looks great! Anyway, "เรียกรถแท็กซี่หรือยัง?" sounds more natural than "เรียกรถแท็กซี่แล้วยัง?". I've edited the entry เรียก. But I haven't provided transliterations (because I don't know how) and I haven't replaced "เรียกรถแท็กซี่แล้วยัง?" with "เรียกรถแท็กซี่หรือยัง?". I hope you will further improve the entry.
  2. I've been waiting for the full table; that's why I didn't give any opinion, lol! I'll also start using the system as soon as possible. And I think, for readers' sake, you should create a page on Wiktionary that contains the table (like the page Wiktionary:International Phonetic Alphabet) and the transliterations should be linked to that page (by means of template or any other means).
--iudexvivorum (talk) 02:12, 11 August 2015 (UTC)
@Iudexvivorum OK, great. I'll make a table and it will make it easy to look up and copy/paste if needed and I'll teach you some tricks to make adding transliterations easier (if you use Firefox, it's even easier). We don't normally link transliterations to templates (just using tr=) but if entries contain more than one transliteration, it could be done, I guess - I will ask for assistance to enhance Thai headword modules/templates. Wiktionary:Thai transliteration and Wiktionary:About Thai will need to be updated. I will try adding new transliterations to your usage examples. You can use the new transliteration "rʉ̌ʉ-yang" for หรือยัง, if you want to replace แล้วยัง with หรือยัง :). BTW, can หรือยัง be considered a single term? Does it need a space instead of a hyphen between the two syllables? I trust your judgement on what sounds more natural, of course, since my Thai is very basic, LOL! --Anatoli T. (обсудить/вклад) 02:50, 11 August 2015 (UTC)
  1. Thank you so much! I've replaced "แล้วยัง" with "หรือยัง".
  2. "หรือยัง", "แล้วหรือ", "แล้วหรือยัง" are generally interchangeable. For example:
    1. "จะไปหรือยัง", "จะไปแล้วหรือ", "จะไปแล้วหรือยัง" = "shouldn't we go yet?"
    2. "ไปได้หรือยัง", "ไปได้แล้วหรือ", "ไปได้แล้วหรือยัง" = "can't we go yet?"
    3. "ไปหรือยัง", "ไปแล้วหรือ", "ไปแล้วหรือยัง" = "hasn't he gone yet?" / "hasn't he left yet?"
  3. Using "แล้วยัง" in a question is rare in the Central Thai dialect, though it would mean the same as the above phrases. But it can be found in the Northern Thai and Northeastern Thai dialects. (In fact, in Northern Thai, "แล้วยัง" is even less common than "แล้วกา".)
  4. I don't think "แล้วยัง", "หรือยัง", "แล้วหรือ", "แล้วหรือยัง" can be considered single terms, just as "should not", "have not", "is not", "are not", etc., are not single terms. (That's why I removed the hyphen from "rʉ̌ʉ-yang".)
--iudexvivorum (talk) 04:03, 11 August 2015 (UTC)

Feedback on alternative layout for Template:de-decl-adj-table

I created an alternative layout for this template, see User:CodeCat/de-adj. The three sections for strong, mixed and weak are now merged into one piece, with the distinction instead shown through columns. Please comment; is it better, worse? Should we use it? —CodeCat 14:50, 3 August 2015 (UTC)

Your table is more compact. On the other hand, the current arrangement with all strong forms in one place, all weak forms in one place, and all mixed forms in one place seems better for what I expect is the main use of the tables: someone has "[definite article] _ [noun]" or "[indefinite article] _ [noun]" or "_ [noun]" (i.e. they know whether they're looking for a strong or weak or mixed form), and they want to know what ending to put on "rot", for the case and gender they're dealing with, when they plug it into that blank. Both online (de.Wikt, Canoo) and print references seem to favour the "all strong (etc) forms in one place" format. Notably, I would expect printed works to prefer a more space-saving compact format if they didn't think there was a compelling reason for the longer format. OTOH, if your table were rotated 90°, it might be compact enough to have the advantage of fitting all on one screen for mobile users (but as it is, I imagine it's still too wide). - -sche (discuss) 00:23, 4 August 2015 (UTC)
The main reason I made it was to show the similarities of forms between strong, weak and mixed declensions. This is something that I personally always struggled with, so I thought a different table layout might help. But I'll leave it then. —CodeCat 00:34, 4 August 2015 (UTC)
A slightly different issue -- surely the order "nom gen dat acc" is unhelpful for German? My German textbooks use "nom acc dat gen", which IMO is far better since nom and acc are so often the same. Benwing (talk) 06:37, 4 August 2015 (UTC)
I agree that this order is more helpful. The order used for old Germanic languages is generally nom acc gen dat, and this is still used for Icelandic. I never saw the point in having accusative fourth; it's "traditional", but traditions are superseded when we realise they're stupid. —CodeCat 20:11, 9 August 2015 (UTC)
Like Benwing, I'd prefer nom-acc-dat-gen. Nom-gen-dat-acc was traditionally the most common order, but I wouldn't mind improving upon tradition, and there certainly are references which have already done so, as Benwing notes; e.g. Günter Kempcke, Wörterbuch Deutsch als Fremdsprache (2000); Paul G. Graves, Henry Strutz, Master the Basics: German (1995, ISBN 0812090012); David Crowner, Klaus Lill, Impulse: Kommunikatives Deutsch Fur Die Mittelstufe (1998, ISBN 0395909341); Karsten Fink, Workbook Deutsch: Das Übungsbuch zu Eine wesentliche Grammatik (2014); and even Robert P. Ebert, Oskar Reichmann, Hans-Joachim Solms, Frühneuhochdeutsche Grammatik (1993), which all use Nom-Akk-Dat-Gen order. - -sche (discuss) 20:34, 9 August 2015 (UTC)
Time for a proposal then? I wouldn't mind one for Latin either to be honest, but Latin tends to be full of tradition freaks... x.x —CodeCat 20:57, 9 August 2015 (UTC)
My only objection is that I am so used to nom-gen-dat-acc that I get confused every time I see nom-acc-dat-gen. But I'll get over it if it's really a better order and we start using it more. Whichever order we choose though, we should try as much as possible to use it consistently not only within languages, but across all languages. --WikiTiki89 17:46, 10 August 2015 (UTC)
Heh, I have the reverse problem. (*looks at second row of inflection table* "what?! there's no way that's the accusative form..." *looks at legend* "oh, it really isn't.") I don't think all languages can necessarily be handled the same; perhaps for some (e.g. Latin) there really is a case for nom-gen order, while for others we already use nom-acc order (e.g. Proto-Germanic, Middle Dutch). I'd rather handle German first and worry about unrelated languages I don't speak later (e.g. Finnish, which uses nom-gen-part-acc, in contrast to Hungarian which uses nom-acc-dat). - -sche (discuss) 19:11, 10 August 2015 (UTC)
I agree with User:-sche that we should do one language at a time. Different languages may have different orders that make the most sense, and also there's the issue of tradition -- German textbooks often prefer nom-acc-dat-gen but Old English textbooks use nom-acc-gen-dat. Sanskrit has a traditional order nom-voc-acc, inst-dat-abl, gen-loc which makes total sense for Sanskrit (and for PIE, and it looks like we indeed use it for PIE) but for Latin the order that makes the most sense might be something like nom-voc-acc, gen-dat, abl-loc, which is similar but moves the genitive. Lithuanian seems to have its own order nom-gen-dat-acc-inst-loc-voc and people working on it might object to changing the order (although personally I think the first two should be nom-voc because they're the same in the plural). Benwing (talk) 01:32, 11 August 2015 (UTC)
For Slovene, the traditional order is nom-gen-dat-acc-loc-ins, but on Wiktionary that's changed into nom-acc-gen-dat-loc-ins. So here, too, genitive precedes dative. For IE languages with a vocative, the order should indeed be nom-voc-acc, like for Proto-Germanic. Balto-Slavic languages tend to put the vocative last; for Proto-Slavic and Proto-Balto-Slavic we currently use the order nom-acc-gen-loc-dat-ins-voc. —CodeCat 01:40, 11 August 2015 (UTC)
Russian seems to do nom-gen-dat-acc-ins-prep which reverses the order of the last two from Slovenian (since "prepositional" is really the locative case). But it would make a lot more sense to move the acc to come after nom, like we do for Slovenian, since the acc is usually the same as either nom or gen (presumably Slovenian is like this too). I guess the point is that the most appropriate order depends somewhat on the language ... for German, acc-dat-gen makes sense since dat and acc are often the same but gen is different, whereas for Russian, acc-gen-dat makes sense since gen and acc are often the same. Benwing (talk) 08:57, 12 August 2015 (UTC)
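The argument running through this thread is that a case order works best when cases that often share a form sit next to each other. One rough way to compare orders is to count adjacent rows with identical forms. The sketch below is in Python and only illustrates that idea; the function names are hypothetical, and the sample data is the German neuter singular definite article (das, das, dem, des), chosen because its nominative and accusative coincide.

```python
# Two case orders discussed above for German.
TRADITIONAL = ["nom", "gen", "dat", "acc"]
PROPOSED = ["nom", "acc", "dat", "gen"]


def ordered_forms(forms, order):
    """Return (case, form) rows in the requested case order."""
    return [(case, forms[case]) for case in order]


def adjacent_matches(forms, order):
    """Count neighbouring rows whose forms are identical under this order."""
    rows = [forms[case] for case in order]
    return sum(1 for a, b in zip(rows, rows[1:]) if a == b)


# German neuter singular definite article.
DAS = {"nom": "das", "acc": "das", "dat": "dem", "gen": "des"}

print(ordered_forms(DAS, PROPOSED))
print("traditional:", adjacent_matches(DAS, TRADITIONAL))
print("proposed:", adjacent_matches(DAS, PROPOSED))
```

A single paradigm proves little on its own; the same count would have to be run over many paradigms of a language before drawing conclusions, which fits the suggestion to handle one language at a time.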

Deletion of inflected forms

I see an editor deleting inflected form entries that use {{inflected form of}}, including kveldi, kljenuta, and κυκλῶν. Do we want this? I don't. --Dan Polansky (talk) 23:15, 3 August 2015 (UTC)

Most uses of the template are gone now, via Special:Contributions/MewBot and its e.g. "Rename inflected form of > lb-inflected form of for Luxembourgish entrie" or "Rename inflected form of > yi-inflected form of for Yiddish entries".

I ask that the bot be immediately blocked for a gross violation of WT:BOT and that it remain blocked until the changes are undone. (I might as well talk to a tree, I guess.) --Dan Polansky (talk) 23:24, 3 August 2015 (UTC)


shows that the bot made more than 5000 edits to remove {{inflected form of}}, at the rate of approximately 60 edits per second. --Dan Polansky (talk) 23:30, 3 August 2015 (UTC)

I think you mean minute. DTLHS (talk) 23:36, 3 August 2015 (UTC)
Yes, my mistake. --Dan Polansky (talk) 23:41, 3 August 2015 (UTC)
The change to kveldi looks correct; {{inflected form of}} should be avoided in favor of specifying the actual inflection, which is what was done here. But I totally disagree with simply deleting the pages that use this template, as in kljenuta and κυκλῶν. They should be left alone until someone manages to fix them up to specify which inflection is involved. As for templates like {{de-inflected form of}} instead of the generic one, I'm not sure the point of them, but I imagine CodeCat can explain, and at least there is no loss of information. Benwing (talk) 06:25, 4 August 2015 (UTC)
I agree that these deletions are not okay, and CodeCat should recreate all the entries she has bot-deleted for this reason. —Μετάknowledgediscuss/deeds 06:29, 4 August 2015 (UTC)
Just so we're on the same page, "all the entries she has bot-deleted" = zero entries, and she only deleted three by hand (kveldi, kljenuta, and κυκλῶν). The bot work consisted of switching German uses to {{de-inflected form of}} (which was proposed on the 22nd, met with agreement from a German speaker on the 23rd, and thereafter met with silence until after the changes had been made; only then did someone object) or relatedly switching Yiddish and Luxembourgish uses to corresponding templates. The fact that more languages than were initially thought use {{inflected form of}} may mean we want to go back to the general-purpose template and use langcodes, rather than using language-specific templates — if so, we can do that, since nothing was deleted, but rather only renamed. - -sche (discuss) 08:20, 4 August 2015 (UTC)
Someone else has restored kveldi and I've restored κυκλῶν and made it more precise than it was. I've left kljenuta deleted since if the declension table at kljenut is right, kljenuta isn't a form of it. —Aɴɢʀ (talk) 10:33, 4 August 2015 (UTC)
Thanks for the clarification, -sche. Dan Polansky's wording was evidently intentionally misleading, but my faulty assumptions derived therefrom aside, I still do not support those deletions without process. —Μετάknowledgediscuss/deeds 16:15, 4 August 2015 (UTC)
I apologize to anyone who was misled by my wording. I should have already been fast asleep at the time when I posted the initial post here; 23:30 means it was 1:30 CET, summer time. --Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
  • Manual creation of a subset of a word's inflected forms should be banned, and such entries deleted. Making such entries only complicates botting the rest of the inflection in the future. Too much time is wasted cleaning up such entries. If you are creating inflected forms manually, either create them all for a lemma, using one and only one template, or don't create them at all. --Ivan Štambuk (talk) 09:13, 4 August 2015 (UTC)
    No it shouldn't, and no they shouldn't. I don't know how to use a bot, and I don't always have the time to create entries for all the inflected forms. I often create entries only for those inflected forms that already exist as spellings in other languages. For example, if some random Irish or Old Irish verb form happens to share a spelling with an existing Spanish entry, I'll create the Irish form there, but I won't bother creating brand-new entries for all the other forms of the verb. In other words, I'll work to remove orange links from inflection tables, but not (always) black/red ones. —Aɴɢʀ (talk) 10:39, 4 August 2015 (UTC)
    Extinct languages like Old Irish which have irregular paradigms and limited attestation of inflection should of course be manually treated. But for living languages that don't have such issues you are just creating more cleanup in the future. Blueing orange links seems to me the only valid reason to do so (convenience over thoroughness). --Ivan Štambuk (talk) 10:23, 5 August 2015 (UTC)

Some relevant data:

  • There were 45419 uses of {{inflected form of}} on a definition line on 2014-07-28. I used the following Windows command line to ascertain that: find /c "# {{inflected form of" enwiktionary-20140728-pages-articles.xml
  • {{de-inflected form of}} was created on 3 August 2015‎ by CodeCat. --Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
  • AWB shows 25000 uses of {{de-inflected form of}} as of now, but there is probably a limit of 25000 built into AWB. I hazard a guess that almost all uses of {{inflected form of}} were replaced with {{de-inflected form of}}.

--Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
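For anyone without a Windows command line, the count above can be approximated in Python. `find /c` reports the number of lines containing the search string, so the sketch below counts lines, not total occurrences. The function name is hypothetical, and the dump filename in the usage comment is the one quoted above.

```python
def count_matching_lines(path, needle):
    """Count the lines in a text file that contain `needle` (case-sensitive)."""
    count = 0
    # errors="replace" keeps the scan going if the dump has stray bytes.
    with open(path, encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if needle in line:
                count += 1
    return count


# Usage, assuming the dump file is in the current directory:
# count_matching_lines("enwiktionary-20140728-pages-articles.xml",
#                      "# {{inflected form of")
```

Reading line by line keeps memory use flat, which matters for a dump of this size.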

{{ux}} in Eastern Mari?

Recently, CodeCat (talkcontribs) changed the format of the examples in Eastern Mari лум, inserting {{ux}}. The result looked like this: [2]. I certainly understand the need to use standard templates, but the resulting format was much less compact and less practical: three lines per example (including transliteration). Since I thought one line per example would be nicer on the eye and easier for anyone actually interested in seeing how the word can be used, I reverted her change. But I wondered if it wouldn't be possible to change said templates (or create a new one) that has the one-line-per-example format, and keep using it. Would that be a problem to anyone? Is there a reason why the three-line-per-example format should be preferred to the one-line-per-example one? --Pereru (talk) 20:02, 4 August 2015 (UTC)

@Pereru: Just add the parameter |inline=1 to the {{ux}}/{{usex}} template. --WikiTiki89 20:13, 4 August 2015 (UTC)
OK. Now, can this be the standard format? Or is there any reason to prefer the three-line-per-example format? Or is this up to every Wiktionarian to decide? --Pereru (talk) 20:20, 4 August 2015 (UTC)
The reason is that most usage examples are much longer and wouldn't fit well on one line. It's the short ones that are the exception. --WikiTiki89 22:14, 4 August 2015 (UTC)
I think this should be automated in some way. Once the length exceeds a threshold, put it on multiple lines, otherwise keep it on one line. —CodeCat 20:25, 4 August 2015 (UTC)
It's hard to determine length other than by counting characters, which is not so accurate. I think it is better to leave it as is. Perhaps we can make it easier by having a template such as {{uxln}} or {{ux1}} which would effectively be a redirect to {{ux|inline=1}}. --WikiTiki89 22:14, 4 August 2015 (UTC)
That's probably better than having such a parameter. But there is an alternative to counting characters: CSS layout. I'm not sure if it's feasible, but at least the client-side stuff knows exactly how wide text is, and it can overflow when necessary. —CodeCat 22:17, 4 August 2015 (UTC)
But it would also need to hide the dashes when it overflows. How would you do that? Also semi-relatedly, |tr=- doesn't work to hide the transliteration in {{ux}}. --WikiTiki89 22:35, 4 August 2015 (UTC)
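The character-count threshold discussed in this thread could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name, the threshold of 40 characters and the " - " separator are invented, and it is not how {{ux}} actually works. As noted above, counting characters only approximates the rendered width.

```python
INLINE_THRESHOLD = 40  # hypothetical cut-off, in characters


def layout_usex(example, transliteration="", translation="",
                threshold=INLINE_THRESHOLD):
    """Join the parts on one line if they are short, otherwise one per line."""
    # Skip empty parts, e.g. when no transliteration is wanted.
    parts = [part for part in (example, transliteration, translation) if part]
    total_length = sum(len(part) for part in parts)
    if total_length <= threshold:
        return " - ".join(parts)
    return "\n".join(parts)


short_layout = layout_usex("abc", "def", "ghi")
long_layout = layout_usex("a" * 30, "b" * 30, "c" * 30)
```

A CSS-based approach, as suggested above, would avoid the guesswork, since only the client knows the real text width.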

Sourcing etymologies?

Recently, in Latvian ūdrs, I reverted a change that introduced a Proto-Baltic reconstruction in the Etymology section, without proper sourcing. Given the way the text was written, it would seem that the Proto-Baltic proposed reconstruction came from Karulis' Latviešu Etimoloģijas Vārdnīca, when in fact it came from an as yet unpublished article by R. Kim. I changed the format, to make it clearer where the Proto-Baltic form was taken from. Can't we perhaps agree on a general policy for Etymology sections whereby we try to explicitly source what is what -- so that, if two protoforms from different sources are cited, the reader can know which was proposed by which source? The format doesn't have to be the one I used in ūdrs, of course, but it would be nice to have something that would avoid this kind of confusion. A second, unrelated question is whether unpublished sources should be accepted in Wiktionary. I'd say no: let it be published before it can be cited here. But I don't know what the others here think. --Pereru (talk) 20:19, 4 August 2015 (UTC)

My general issue with your etymologies is that they're huge blocks of text. They need to be structured better in order to be readable. The long list of cognates is not necessary either, especially if we already have PIE pages and, more recently, categories to hold them. At the very least, they should be made collapsible or presented in a separate paragraph to make the rest easier to read.
This is of course not the problem I wanted to talk about here, but OK, there we go...
The 'huge blocks of text' are necessary when the etymology is not simple, or is disputed, or involves changes, semantic or otherwise, that are not obvious, as in liegt. When the etymology is simple -- just PIE to PB to the word, without semantic changes, as in acs, you have only one short sentence. I suppose your problem here is how much information should be given: should there be only a reference to the etymon, with no indication of how you got from that form and from that meaning to the current state? Or should more information be provided? I, for one, favor the latter, because this extra information is important to judge and accept the etymology, and is part of the history of the word, which is what the etymology section is about. It is also often interesting and brings new light to the understanding of the word, as several other people here told me when commenting favorably on the 'huge blocks of text' that you dislike. Call that 'humanistic etymology' if you will.
I don't have anything against presenting a lot of information. My problem is more the way it's presented. One giant paragraph doesn't invite the user to read it, and instead they'll just go tl;dr at it. If I want to know, at a glance, what the origin is, I don't want to have to read through a lot of blabber to get to the point. So what I would suggest is to write etymologies that focus first on the known and reconstructed history, and leave the details until later. That way, people who aren't interested in the extra details can skip them, rather than having to sift through. Make the information that users want more accessible by splitting it. —CodeCat 21:26, 4 August 2015 (UTC)
Most users don't want to look at etymologies, they just want to see what the word means; so they won't read the etymology (or at the alternative forms, or the pronunciation) at all. If they glance at the etymology section, they're as likely to go tl;dr at mysterious cabalistic symbols like *h₃ḗHḱ-ō as they are at longish texts. Only if they are interested will they read it. Interestingly, the information I present is already in the format you suggest: the very first sentence gives the PB and the PIE etymon, you don't have to read any further than that. Perhaps the only necessary change here is to add a carriage return after that first sentence, to put the rest of the information in a separate paragraph? --Pereru (talk) 22:23, 4 August 2015 (UTC)
The list of cognates is less relevant, I agree. The only problem is that different sources often quote different cognates, and this may be a problem. One solution is to bypass cognates altogether, but this only works for the (relatively few) 'famous' words or roots that already have reconstructed forms here at Wiktionary (where one can add cognates and refer to the specific sources that mention them). But over 90% of Latvian words for which Karulis' LEV gives etymologies are not in this category: rather, they are words with only a couple of cognates, mostly in Baltic (e.g. liegt) or maybe a couple of other non-Baltic languages. It will be a long time before those etyma have Wiktionary pages, so eliminating these cognates looks like a bad idea. I would agree, though, with the cases in which there is already a good Appendix page with the etymon (as long as different cognates proposed by different sources are clearly distinguished there). Do you have one such example, so we could discuss the format further?
If sources conflict, then Wiktionary has to find a compromise through the usual consensus process. Consensus may invalidate some sources or even all of them, or choose a particular one that seems most usable by the people discussing the matter. —CodeCat 21:28, 4 August 2015 (UTC)
Sure. Let it happen, then. The LEV, for instance, cites cognates that are not cited in some Wiktionary reconstructed entries; should I add them? Or should I start somewhere a discussion about whether or not to do this? Or whether or not the LEV is a good source? And, if so, where? --Pereru (talk) 22:23, 4 August 2015 (UTC)
Showing different takes on the issue by different people is good. I think the best way to present it would be through an unordered list. See for example *fanhaną. —CodeCat 20:24, 4 August 2015 (UTC)
Back to the problem at hand. Yes, that would be good, so separate paragraphs for your PBS etymologies (with correct sourcing) might be a good idea. You could start such paragraphs with 'According to a different source,...' and then add the information. Or you could mention the forms with a footnote to the source, as I did in ūdrs. Either way would be OK with me, as long as the wording is fluent and there is no confusion as to what comes from where. What I would disagree with is what you did before: just adding a form with no sourcing to a text that is itself attributed to a specific source, as if that form also came from the same source (i.e., your original PBS etymon at ūdrs looked like it came from Karulis' LEV, when in fact it came from Kim's unpublished paper).
Besides, note that English fang (from *fanhaną) -- where you find one of those 'huge blocks of text' you so much dislike -- does NOT mention the two proposed PIE etyma mentioned under *fanhaną: rather, it only mentions the first one, and without references to sources. So the information under fang is misleading at best. Shouldn't such things be changed in a more principled way, so that a reconstructed entry does not seem to be in contradiction with the information found in the etymology section of one of its reflexes? --Pereru (talk) 21:23, 4 August 2015 (UTC)
And I would add that I don't think it's a good idea, in principle, to cite unpublished sources. (But maybe Kim's paper has already been published? It was going to come out in a Handbook, as I recall; maybe it is already there?) --Pereru (talk) 21:12, 4 August 2015 (UTC)
This is the problem with the paragraph approach that you use. You source the whole paragraph, which makes it impossible for anyone to make adjustments to the text. Any edits make it no longer faithful to the source. Instead what should be sourced is individual facts. That way, people can add or change things without invalidating the references. Again, splitting etymologies into separate sections with paragraph breaks and lists should help with that. Again look at *fanhaną: each list item has its own separate sourcing.
Yet I did source other forms in ūdrs, for example, so that it is clear that the PBS form is not from the LEV; just put the footnote next to the material from the other source, not at the end of the paragraph. Why not make it standard practice? Another possibility is simply to start a new paragraph with a different source, perhaps starting with "A different source claims that..." or something similar. So this isn't a problem. --Pereru (talk) 22:23, 4 August 2015 (UTC)
I'm not sure what you mean by unpublished sources. If they are available, then they are public, right? —CodeCat 21:26, 4 August 2015 (UTC)
An unpublished article has not yet passed peer review. It may be complete nonsense, or more likely it may have a few minor errors that will be corrected before publishing. --WikiTiki89 22:20, 4 August 2015 (UTC)
Nowadays everything is on the internet: manuscripts, unpublished sources, papers at various levels of completion... because we always want to invite comments from other interested researchers, comments that may improve a paper even before it's completely finished (academia.edu is a great site for this, as are individual researchers' pages at their institution website). When a paper is published, however, it is officially released, be it on paper, be it in a publishing website. After that, it can no longer be edited or altered; and the year of its publication becomes fixed. Also, a published paper went through a refereeing process in which it was read and commented upon by two or three of the author's peers; an unpublished paper, of course, didn't. So the gist of it is that an unpublished paper is (supposed to be) less good and less final than its published version. Its author, for instance, wouldn't like you to cite an unpublished version if there is a published one already. (Kim's paper states quite clearly -- at the end, I think -- that it is an unpublished version, to appear in a Handbook of something or other). --Pereru (talk) 22:23, 4 August 2015 (UTC)
Different pages conflicting on each other is an unfortunate effect of how Wiktionary works. There's not much that can be done about it other than checking and updating things regularly. I would say that generally, the reconstruction pages are more reliable than the etymologies within entries, as they've been created and reviewed by more knowledgeable editors. Etymologies in entries often tend to be copied from just one source, often an outdated or nonspecialised one. They are then inserted into entries by editors who are relatively inexperienced with such matters, so that they are not able to spot and correct problems in their sources. And then, when new entries are created for cognate terms, then the etymologies are just copied over. This tends to propagate old/bad etymologies. And it's one of the reasons I prefer keeping etymologies to a bare minimum and letting the proto-language pages handle the rest. —CodeCat 21:33, 4 August 2015 (UTC)
This is, again, difficult for words that have a more complicated history, as I mentioned above. For such words, their etymology section is the only place where, say, discussing a strange semantic evolution or comparing two or three different etymologies is logical: after all, in the reconstructed entries, you are not interested in the details of the semantic evolution of one reflex in one sub-branch of the family (I haven't seen a single reconstructed proto-entry here that does that); rather, the focus is on the reconstructed protoform and how it fits in the proto-system. So I think you would lose more than gain by doing that. The only thing that I would indeed relegate to the reconstructed entries is the list of cognates -- assuming that we can source cognates that occur in only one source, for instance.
And here's a final thought: if inconsistencies are unavoidable at Wiktionary, if no policy can be devised to address them, then we're basically giving up on the idea that Wiktionary can become a quality work. No -- I'm sure something can be done. Wikipedia found solutions, so can Wiktionary. --Pereru (talk) 22:23, 4 August 2015 (UTC)
  • I support banning original research with reconstructions in etymologies, as well as inventive editorial corrections, such as how "ū́drā́-" (the form cited in the article by R.K.) became "ūdrāˀ" (which is what CodeCat inserted in the etymology). Additionally, for protolanguages, when there is no accepted general framework, which is the case with Proto-Baltic/Proto-Balto-Slavic, all of the competing theories should be presented on an equal footing. That means that there can be no single and "true" reconstruction, and that there could be multiple inflection tables for a word according to different sources. --Ivan Štambuk (talk) 10:19, 5 August 2015 (UTC)
Agreed. And since the number of reconstructed entries in Wiktionary is not so high, this is probably quite feasible, isn't it? Shouldn't for instance the page *ūdrāˀ be moved to *ū́drā́-, then? Or does CodeCat have another source that references the form she prefers? --Pereru (talk) 13:46, 5 August 2015 (UTC)
I oppose a ban on editorial corrections; to fail to harmonize notation schemes is misleading. In both Menominee (living language) and Proto-Algonquian (reconstructed language), for example, most people notate long vowels like , but some people write , a: or ā. To have individual words/forms in different systems based on who attested the particular word/form (e.g. fooba·r, plural foobārs) would confuse readers into thinking the vowels were of some different quality. - -sche (discuss) 17:24, 5 August 2015 (UTC)
I agree with -sche here; notation schemes should be harmonized to the extent that this is a simple case of equivalent notations. As for "ū́drā́-" vs. "ūdrāˀ", it's not obvious to me what's going on here. Do the two acute accents indicate Balto-Slavic acute? If so, then it's fine to convert them to use the superscript glottal stop, which can be viewed as simply another way of indicating the BS acute register -- the fact that it expresses an opinion as to how that register was phonologically realized is irrelevant here. But then shouldn't it be "ūˀdrāˀ"? Benwing (talk) 07:16, 6 August 2015 (UTC)
Using acute accent marks to indicate the acute is actually very misleading, because Proto-Balto-Slavic also had a proper phonemic word accent like that of PIE. We should definitely use the same symbol, ´, to denote the accent in both of them. Anything else would just be unnecessarily confusing. That said, it does seem that there is somewhat of a linguistic consensus that the acute register involved some kind of glottal feature. The Latvian broken tone is a direct continuation of the acute, and is realised as glottalisation. So if there is any serious disagreement among linguists about the approximate nature of the acute, then I would like to hear about it. —CodeCat 00:42, 7 August 2015 (UTC)
@CodeCat OK, I think I agree with you here, but what I don't understand is why you didn't write "ūˀdrāˀ" rather than "ūdrāˀ". Isn't the acute register on both syllables? And where's the stress? Benwing (talk) 10:09, 7 August 2015 (UTC)
You're right, I moved the page. But I wonder why the masculine form *udras doesn't have an acute, at least according to the source Pereru gave. Did Winter's law skip that word or something? —CodeCat 12:06, 7 August 2015 (UTC)
I think it does have an acute, it's just mis-written. The Latvian descendant has a long broken-tone vowel, and AFAIK broken-tone is descended from an unstressed Balto-Slavic acute vowel (one of the other two tones reflects a stressed acute vowel, I think, but I forget which one). Benwing (talk) 12:17, 7 August 2015 (UTC)
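The purely mechanical side of the conversion discussed above (acute accent marks rewritten as a superscript glottal stop, and back) can be sketched as follows. This is only an illustration under a strong assumption: that every acute accent in the input marks the acute register and never stress, which is exactly the ambiguity raised in this thread. The function names are invented, and the sketch takes no position on whether the two notations are theoretically equivalent.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # combining acute accent
GLOTTAL_MARK = "\u02c0"     # modifier letter glottal stop


def acute_to_glottal(form):
    """Replace every combining acute with a following glottal-stop mark."""
    # Decompose first so precomposed accented vowels are handled uniformly.
    decomposed = unicodedata.normalize("NFD", form)
    converted = decomposed.replace(COMBINING_ACUTE, GLOTTAL_MARK)
    return unicodedata.normalize("NFC", converted)


def glottal_to_acute(form):
    """Inverse substitution: glottal-stop mark back to a combining acute."""
    decomposed = unicodedata.normalize("NFD", form)
    converted = decomposed.replace(GLOTTAL_MARK, COMBINING_ACUTE)
    return unicodedata.normalize("NFC", converted)


# The form cited from R. K.'s article, spelled with explicit code points:
# u + macron + acute, d, r, a + macron + acute, hyphen.
acute_form = "u\u0304\u0301dra\u0304\u0301-"
glottal_form = acute_to_glottal(acute_form)
round_trip = glottal_to_acute(glottal_form)
```

The round trip returns the original form (in NFC), which is the sense in which the substitution loses no information. It says nothing about the other differences between the reconstructions.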

Sourcing etymologies bis: a proposal

Well, here is a modest proposal for sourcing (and otherwise formatting) etymologies in etymology sections:

  • For "simple" etymologies (A < proto-B AA' < proto-C AAA),

(a) State in the first sentence what the path is from the current form to the oldest protoform you want to cite ('From proto-B AA, from proto-C AAA'). Make it a separate paragraph.
(b) Further information (semantic evolution, irregular transformations, etc.) can be described in the following paragraph, if need be, as succinctly as possible.

  • For "complicated" etymologies (there are several suggested paths or etyma):

(a) Start with "There are (two, three, several) proposed hypotheses:";
(b) State each hypothesis in a single sentence in a separate paragraph, starting with a letter -- (a), (b), (c), etc. -- to identify the hypothesis;
(c) If further information is necessary on a given hypothesis, add it in a separate paragraph after all the hypotheses, referring back to it by its letter.

  • Cognates would be listed, in full agreement with the source (i.e., no tampering with the data!) in a separate paragraph at the end. If one of the protoforms (preferably the oldest) already has a good, consensus-approved entry in the Appendix, then move all cognates to that entry, making sure that each cognate is duly and correctly sourced. (This is not the current state in most reconstructed entries here, and those interested in entering protoforms should add their sources.)

What do y'all think?
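To make the proposed layout concrete, here is a minimal sketch that renders it from structured data, with each hypothesis carrying its own source. The class, the function, the bracketed source format and the placeholder sources are all invented for this illustration; nothing here is an existing Wiktionary template or module.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    statement: str  # one-sentence statement of the proposed path
    source: str     # where this particular claim comes from


def render_etymology(hypotheses):
    """Lay out one paragraph per hypothesis, each with its own source."""
    if len(hypotheses) == 1:
        only = hypotheses[0]
        return f"{only.statement} [{only.source}]"
    paragraphs = [f"There are {len(hypotheses)} proposed hypotheses:"]
    for index, hypothesis in enumerate(hypotheses):
        letter = chr(ord("a") + index)  # (a), (b), (c), ...
        paragraphs.append(f"({letter}) {hypothesis.statement} [{hypothesis.source}]")
    # A blank line between paragraphs keeps each hypothesis visually separate.
    return "\n\n".join(paragraphs)


simple = render_etymology([
    Hypothesis("From proto-B AA, from proto-C AAA.", "Source 1"),
])
complicated = render_etymology([
    Hypothesis("From proto-B AA, from proto-C AAA.", "Source 1"),
    Hypothesis("Borrowed from language D.", "Source 2"),
])
```

Because each item carries its own source, one hypothesis can be edited or removed without making the attribution of the others inaccurate.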

I'm impressed with the detail you put into ūdrs. My only suggestion would be to put the cognates into a separate paragraph to avoid the "wall of text" feeling. It sounds like you're in agreement with this. Benwing (talk) 08:37, 5 August 2015 (UTC)
Is this arrangement in ūdrs (a carriage return between the two paragraphs) what you had in mind? --Pereru (talk) 14:05, 5 August 2015 (UTC)
I don't like how you are duplicating the cognates in ūdrs. They are already listed in *udrós. The Latvian page is not the proper place to discuss the development of Latin lutra. --Vahag (talk) 16:38, 5 August 2015 (UTC)
In this case I actually agree. But before removing them, we need to solve inconsistencies. So there are cognates in my source that aren't mentioned in *udrós. Should I copy them and source them there? How about the fact that my source mentions a Proto-Baltic form, whereas *udrós lists only Proto-Balto-Slavic? I agree that basically cognates (at least for the 'richer' words with cognates in many branches) should be in the reconstructed entry page, but we need to know which forms should be there, from which sources... or else we simply don't know what kind of information we have there. In ūdrs, at least I know who made the claim and where.
I made a first attempt to change *udrós, introducing information from the Latvian source and footnoting it. I don't quite like the look of the result, but it's a first attempt. Any thoughts? --Pereru (talk) 08:43, 6 August 2015 (UTC)
Yes, you can add the cognates to the proto-entry and source them there. That way the information from LEV can be enjoyed by everyone, not just the viewers of the Latvian page. For the format of referencing individual descendants you can look at *tep-. As to which descendants should be there and from which sources, I think at first all descendants from all sources can be added. If people have objections, a centralized discussion will happen on the talk page of the proto-entry or in WT:ES. The bad cognates from outdated sources will be eventually weeded out. That will not happen if you keep the information on the Latvian page. --Vahag (talk) 09:24, 6 August 2015 (UTC)
Well, @Vahagn Petrosyan:, I did add LEV cognates to the list, but my changes at *udrós were reverted without explanation (diff). Unless this is better explained, so that I can know what is going on, what is the point of adding cognates there? It seems safer to leave them on the Latvian page...--Pereru (talk) 01:31, 7 August 2015 (UTC)
I gave an explanation, so did you just choose to not read it? —CodeCat 01:56, 7 August 2015 (UTC)
Reverting good faith edits is not cool, CodeCat. You did the same to me recently.
There is no accepted format for listing both Proto-Balto-Slavic and Proto-Baltic. Pereru, can't you list the cognates under Balto-Slavic only and still reference LEV? Sure, LEV says "from Proto-Baltic", but we understand that in essence what he is saying is that ūdrs is from PIE *udrós, whatever the intermediate details. When my dated Armenian equivalent of LEV says հոտ ‎(hot) is from PIE *ōd-, I understand that I should list it under modern PIE *h₃ed- and still reference my old source. I have seen it done by academic scholars. Martirosyan 2010 can write that a source in 1920s derives a word from such-and-such PIE root and use modern reconstruction for that root. It seems to me that you are trying to give a literal translation of LEV in Wiktionary. That is a job for Wikisource, not Wiktionary. The best practice is to synthesize sources new and old under the light of modern knowledge. --Vahag (talk) 09:43, 7 August 2015 (UTC)
I agree with Vahag that User:CodeCat probably shouldn't have reverted that change, and should definitely have given a better explanation than "this just looks ugly". I can understand CodeCat's objection to the form ū́drā́ with acute accents indicating the acute register (i.e. it conflicts with the conventional use of accents to indicate stress, which is also phonemic in Balto-Slavic, and it's inconsistent with the way other Balto-Slavic entries have been formatted in Wiktionary [granted, it was CodeCat doing that formatting]), but in that case, she should have just undone that one change, with explanation, rather than the whole thing. I also agree with Vahag that we should feel free to modernize/canonicalize proto-forms and such. Benwing (talk) 10:05, 7 August 2015 (UTC)
This is not about canonicalization. Those glottal stops are phonemes on their own in the reconstruction of Proto-Balto-Slavic by the Leiden School, and according to it only after the parent language disintegration did individual branches develop their own acute/circumflex distinctions. The notation with acute accents by R. K. is an entirely different reconstruction, where acute accent marks indicate intonation/tone. Those two also have different originating points - the glottalic theory of PIE vs. the standard PIE Frankenstein's monster with laryngeals, genders and thematic inflection existing contemporaneously. You can't mix those two notations, because they refer to two different protolanguages, in two different chronological stages. There are also other differences that go beyond mere character substitutions. --Ivan Štambuk (talk) 12:19, 7 August 2015 (UTC)
I think my point still stands, though, that these can be viewed as equivalent notations, with acute register vs. non-acute register marked either by acute accent vs. tilde (or circumflex) accent or by presence or absence of superscript glottal stop, without necessarily committing to a phonological interpretation of the notation. As long as it's agreed that there was a two-way register distinction -- regardless of whether that is interpreted as tonal, as glottal, or whatever -- then the notations are equivalent in that you can convert from one to the other without loss of information, and we may as well be consistent. Benwing (talk) 12:53, 7 August 2015 (UTC)
What does the "two-way register distinction" actually mean? It's a meaningless notion, vague and abstract. Those symbols mean different things in different protolanguages. Leiden School theory also has short *o and *a, and different assumptions on Auslautgesetze and paradigms leading to different endings and forms in inflections. Also, some of the origins of the glottal stop or "acute" are disputed (Winter's law formulation, long/hyperlong vowels), which in particular renders the acute accent notation inapplicable, whereas with the glottal stop you can just use parentheses as is the customary notation for optional parts of reconstruction. Lastly, the superscript notation is biased as to the phonation character of what you call the Balto-Slavic "acute register" - there are different theories (rising/falling tone, glottalization/stod). It's best not to mix those two protolanguages, and use two different reconstructions. There are some Proto-Slavic appendices that already do it like that. --Ivan Štambuk (talk) 13:32, 7 August 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── "register distinction" is an abstract way of referring to a distinction with unknown phonetics, but it's certainly not meaningless, any more than the three laryngeals of PIE, which are equally abstract. No one would have any problem regularizing e.g. Ringe's laryngeal notation, where he writes something like ç x xʷ in his Tocharian book, into more standard h₁ h₂ h₃, even though they may have a completely different interpretation of what these symbols mean phonologically. Differences that cannot be treated as notational variants, e.g. differences in which register or vowel length is reconstructed in a particular word, or in numbers of vowels, obviously shouldn't be confounded, but when there's an equivalence to be made between notational variants I don't see the point of not making it. Benwing (talk) 17:00, 7 August 2015 (UTC)

But the difference is that the glottal stop is not a mere "abstract register". It's a phoneme with a very specific phonetic value. Nobody disputes the phonemic status of PIE laryngeals. The differences between the protolanguage described by Ronald Kim (which has no glottal stop as a phoneme, and "acute" is a property of certain vowels) and the one of the Leiden School are irreconcilable and these two should not be mixed. This "canonicalization" is a thinly-veiled attempt at giving undue prominence to certain theories at the expense of others. It should be resisted and denounced. --Ivan Štambuk (talk) 17:09, 7 August 2015 (UTC)
The difference you mention is nothing more than relative chronology and allophony. Compare the sequence -Vnh- in Proto-Germanic. It eventually gave way to -Ṽ:h-, with a long nasal vowel. It doesn't matter in the slightest whether we write one or the other, because they represent the same phonological units. It's merely a matter of phonetic interpretation, but notation certainly does not have to indicate any particular phonetic reality. The same applies here with the acute. The interpretation in which there is an actual glottal stop, and the one in which there is merely glottalisation or some other change in the vowel, are different interpretations of the same phonological units. How you interpret it phonetically, or denote it in writing, is completely irrelevant to what it is. It's the acute, nothing more, nothing less. Whatever symbol we choose to show that it's there is equally valid, because it's just a symbol that says "acute is here". —CodeCat 19:00, 7 August 2015 (UTC)
I am not trying to give prominence to one theory over another. I'd be just as happy if you denote the acute with a superscript A, and the non-acute with a superscript B (or vice-versa). That makes it obvious that we're talking about what is ultimately an abstract register difference. It is entirely analogous to the situation in Old Chinese, where everyone agrees there was a distinction between "Type A" and "Type B" syllables but no one agrees what the relevant feature was. Some think type B syllables have an extra /j/ phoneme before the vowel, some think type A syllables have pharyngealization of the syllable-initial consonant, some think the difference is vowel length, some think it's a phonation difference (creakiness/breathiness/whatever), etc. But these theories are hardly irreconcilable just because of this. My concern is not to favor one theory over another but to avoid needless complication introduced by notational differences. Since you don't seem to ever believe in canonicalizing notations, we might end up just having to agree to disagree. Benwing (talk) 19:45, 7 August 2015 (UTC)
Two points that have been made in previous discussions: one, we have to recognize that many obvious derivations are not cited, e.g. it is unlikely that a dictionary has gone through Spanish's (or, even more likely, Rumantsch's) massive corpus of words inherited from Latin and noted, in each and every case "yep, this one too was inherited from its obvious Latin predecessor, rather than, like, borrowed from Welsh or something". The same sort of research we undertake to determine what words mean, and e.g. how they inflect (in contradistinction to what scholars and authorities think they mean and how scholars and authorities think they inflect), will sometimes be necessary when tracing etymologies. Two, it would be misleading and foolish not to allow for standardization of notation schemes, as I note above. - -sche (discuss) 17:36, 5 August 2015 (UTC)
@Pereru There's still a bit of a "wall of text" effect, since the paragraph break is barely visible. If it could be set off better, I think people would object less. An alternative is to just list a few cognates, the subjectively most "interesting" ones (e.g. Greek, Sanskrit) and put the rest on a reconstruction page; and if such a page doesn't exist, create it. I personally don't object to seeing all the cognates listed on the Latvian page, but I understand the objections of the others, and I also see how it's likely to lead to inconsistencies (e.g. you give an etymology for the unexpected l and t in lutra whereas the reconstruction page doesn't and says it's unknown). Benwing (talk) 07:23, 6 August 2015 (UTC)
Does indenting the cognate paragraph make it look better (see ūdrs)? As for cognates in general, I do understand the feelings, but there are too few reconstructed entries in Wiktionary for all cognates to be easily transferable (and given that there are discussions about "what the right form" is, I'm a bit afraid of creating hundreds of new reconstructed pages on the authority of my Latvian source, the LEV, just to see them moved to other titles, or incorporated into other pages, etc.; I'd rather wait till there are more solid criteria). I frankly think Wiktionary simply follows no real policy on dealing with reconstructed entries, etymological sources, etc. -- everybody pretty much does whatever s/he wants... For differences between sources, case in point: Latin l in lutra. I simply copied what my source had on this problem, while the reconstructed page made that claim apparently on the basis of an old etymological dictionary of Ossetian (though it is not clear whether the reference refers to the entire page or only to the reconstructed protoform -- again, we lack a good format for these things). Should we mention both? Only the most recent? The best source? Again, where's the policy?... --Pereru (talk) 08:43, 6 August 2015 (UTC)
Indenting is better, although not perfect. I also tried indenting with ':' (where you don't see the bullet point) and setting the paragraph off with two blank lines. All are possibilities.
As for there being no real policy on reconstructed entries, I think you're right. Mostly that's probably because few people are actually creating those entries -- mostly it seems to be CodeCat (talkcontribs), at least for IE languages. You might consider proposing a policy and getting people to vote on it (although that may be a bit like herding cats). Benwing (talk) 08:08, 7 August 2015 (UTC)
BTW you could also try just "being bold" and editing pages like Wiktionary:Etymology and Wiktionary:About Proto-Indo-European and Wiktionary:About Latvian and so on that purport to be policy pages; if anyone objects, they will change it. Benwing (talk) 08:11, 7 August 2015 (UTC)
I'd like to have OP's third point clarified a bit. First, "listing cognates in full agreement with the source": sources on languages that have been unwritten or scarcely written until recently will often utilize technical or otherwise non-standard orthography or transcription; but I would suggest that this does not mean we are obliged to provide a separate source for the actual native orthography. E.g. the Udmurt reflex of Proto-Uralic *käle is кыл ‎(kyl), but all basic sources appear to list only the transliteration kyl, kïl, kɨl or ki̮l. (And, as mentioned above, I agree that transcription schemes should definitely be unified here as well.)
Second, would "making sure that each cognate is duly sourced" involve simply watching that people don't add new cognates out of the blue, or actually adding an inline citation for every single cognate? The latter would sound like overkill, whenever the majority of a cognate list is based on a reliable and comprehensive source, such as an authoritative major etymological dictionary (or dictionaries), and is not explicitly contradicted by other equally reliable sources. Not every language group necessarily has such a source available of course, and establishing what sources to consider "reliable by default" (and to what extent — often a source might be quite reliable for cognates but outdated for reconstructions or etymologies) should be determined by the consensus of editors involved with the language or language group in question.
For "unexpected" cognates that are added from somewhere else than from a standard source (say, if someone were to release a paper arguing that Mongolian хэл ‎(hel) is a Uralic loanword), I'd be in favor of annotating the etymologies in more detail, but it should probably be sufficient doing this on the "main" etymology hubs — the entry's own page and its posited origin's entry (whether an attested form or a reconstructed proto-form) — rather than on every single page that refers to it. --Tropylium (talk) 11:52, 7 August 2015 (UTC)
Here are my personal opinions on the clarifications you ask about:
(a) "In full agreement with the source" is not supposed to mean that you can't regularize transcriptions, as long as this is described in a policy page (e.g., WK:Etymology or WK:About Proto-Uralic or something like that), so that the interested reader can always see what can be done to source transcriptions.
(b) "Making sure that every cognate has a source" is meant as making sure the reader can tell where cognates came from. So, if all cognates come from the same source (some authoritative etymological dictionary, for instance), you can refer to it only once at the end of the page. But then it becomes necessary to indicate deviations if they occur. If someone adds a new cognate to this page that happens not to come from the common source, then this cognate needs a footnote indicating its source, so the reader isn't fooled into thinking it is from the same source as the others. In short: don't necessarily add a footnote to every cognate, but always make it possible for the reader to know where the cognate comes from. If it's an original suggestion of a Wiktionarian (e.g., CodeCat, who is into original research), then also say so by adding an "original research" template. --Pereru (talk) 19:55, 8 August 2015 (UTC)

As I see it, a discussion on allowing "Notes" as a valid header should be considered.

Vahag has brought this up (Wiktionary:Grease_pit#.7B.7Breflist.7D.7D) and I'm running into a similar problem all the time. As ridiculously silly of an argument as it may be, I do, in fact, agree that numbered and bulleted references together look ugly AF. (I have even gone to such ridiculous lengths as removing a reference that didn't add anything critical just because it was bulleted while the other ones were numbered, because of how unappealing it looks.)

In more general terms, I kind of get the feeling that there seems to be consensus that references are in fact valuable and add value to the entry, perhaps the discussion should focus more on how to allow more elegant ways of faithfully citing content, particularly in "controversial" cases, e.g., obviously one bulleted reference is enough under, say, an assertion that et kala and liv kalā derive from the same source because, well, it's pretty obvious but then if there is a "weird" controversial cognate there isn't even a way of citing it inline (unless you want the awful looking mixing of numbered and bulleted refs.) Neitrāls vārds (talk) 11:09, 12 August 2015 (UTC)

  • Support having a ==Notes== section separate from ==References==, esp. when both exist. Benwing (talk) 11:21, 12 August 2015 (UTC)
Thinking of copying this to a separate header for separate discussion. Neitrāls vārds (talk) 06:45, 17 August 2015 (UTC)

Transliteration obligatory?

It seems that transliterating non-Latin scripts has become obligatory in all templates, but in certain cases -- "Latin-like" scripts like Cyrillic or Greek -- I think transliteration actually annoys more than it helps. Why is transliteration, especially of Cyrillic and Greek, obligatory in all cases, including inflection tables and examples? I would rather have it only next to the headword... Case in point: Eastern Mari лум, where having two Mari lines is somewhat disruptive. Can't we have a parameter tr=- (which is what I used in this case) to avoid the obligatory transliteration? As things are, the only option is not to use templates... which I would prefer not to do. --Pereru (talk) 04:47, 5 August 2015 (UTC)

Generally, we transliterate everything in Wiktionary -- we don't assume readers are able to handle foreign scripts. So I don't think it's a good idea to disable the translit just because it seems disruptive to you -- we're not limited in space or anything like that (and "disruptive" is in the eye of the beholder). Benwing (talk) 07:48, 5 August 2015 (UTC)
I would also say that translit is especially important for an obscure language like Eastern Mari -- even though it's "just" Cyrillic, it invariably has different conventions from more familiar languages like Russian. (Consider, for example, the Abkhaz language, which is written in Cyrillic but with all sorts of strange non-Russian characters.) Benwing (talk) 07:52, 5 August 2015 (UTC)
I have fixed the problem that |tr=- didn't work in {{ux}}/{{usex}}. However, it should not be used except in exceptional circumstances and this is not one of them. --WikiTiki89 13:30, 5 August 2015 (UTC)
So you guys don't think that it is confusing to have two "examples" separated by em-dashes -- the original spelling example and the transliterated example -- followed by a translation? My first reaction was that it looked like it had two translations, or at least that there were too many elements, enough to clutter the view. Wouldn't it be better to restrict transliterations to headwords, and leave them out of examples and inflection tables? --Pereru (talk) 13:53, 5 August 2015 (UTC)
Should people reading the examples and inflection tables be required to be able to read the script? I think that's too high/elitist a requirement. People might be wanting to read inflection tables for all kinds of reasons. For example, I might be interested in Armenian inflections even though I can't read the script at all. Why make that impossible for me to do? —CodeCat 14:20, 5 August 2015 (UTC)
I agree with the obligatory transliteration of usexes. The format may be tweaked though. Perhaps the first em-dash can be replaced with parentheses. --Vahag (talk) 16:41, 5 August 2015 (UTC)
What about this: лум лумеш, возеш (lum lumeš, vozeš) ― it (lit. snow) is snowing ? DTLHS (talk) 16:51, 5 August 2015 (UTC)
I would not italicize, nor use a small font. I would use a format as in the headword line or {{l}}. --Vahag (talk) 20:20, 5 August 2015 (UTC)
I agree with Wikitiki89 and CodeCat. For short usexes DTLHS's suggestion is good. (I suppose this comes back to the subject we were discussing elsewhere, of having the template 'know' when to make a multi- vs a single-line usex.) - -sche (discuss) 17:37, 5 August 2015 (UTC)
I also like DTLHS's idea with parentheses and a smaller font. My problem with having transliterations everywhere is simply that it affects compactness, which we also want to strive for. I'd be in favor of some solution that doesn't force inflection tables, sometimes already too big (especially in an agglutinative language like Eastern Mari), to become twice as big. Wouldn't it be possible, for instance, to have a second, alternative table with the transliterations? Perhaps with a clickable point to change one version of the table into the other? Or could we maybe have the transliteration become visible in a hovering bubble as you move your cursor over the table? --Pereru (talk) 08:52, 6 August 2015 (UTC)
  • The widespread assumption on Wiktionary is that users are idiots, so having redundant and often unjustifiable data cluttering the entry is generally seen as a good thing. Perhaps we need a two-tier Wiktionary: one for "common people" - without dead words and meanings, complicated etymologies, transliterations on every place under the sun and generally anything that could hurt their attention spans in search of that precious datum of information that landed them here, and one for "serious people", with all that extra stuff. --Ivan Štambuk (talk) 11:02, 7 August 2015 (UTC)
That's actually a good idea. Is it possible to do something like that, maybe by having different shells for "specialists" and "non-specialists"? Even 'normal' articles seem cluttered with all those translation tables and alternate forms and what not, especially for the casual user who just wants to know what a word means. Note that other online dictionaries often have this extra information in some clickable-access format, but not immediately displayed when one asks for a certain word. --Pereru (talk) 00:58, 8 August 2015 (UTC)
I prefer to see transliterations, when templates are used, including for scripts and languages I can read, e.g. Korean or Hindi, etc.
내가 어찌 알겠어?
Nae-ga eojji algesseo?
How should I know?
एक नई शुरुआत
ek naī śuruāt
a new beginning
It's much easier that way for most users. "Smart" users can bear with those who are dumb :) --Anatoli T. (обсудить/вклад) 12:16, 7 August 2015 (UTC)
But wouldn't it be just as good if the transliteration were 'clickable' or available on a hovering bubble? --Pereru (talk) 00:58, 8 August 2015 (UTC)
This feature is currently unavailable. There's no point mentioning something that doesn't exist. (I am not saying, it's not possible to implement.) --Anatoli T. (обсудить/вклад) 13:05, 8 August 2015 (UTC)
If you're "not saying it's not possible to implement", then what's wrong with proposing it? --WikiTiki89 17:51, 10 August 2015 (UTC)
  • I oppose obligatory transliteration of usage examples. I would even favor banning transliteration in usage examples, but there won't be consensus for this. Then at least, don't make it mandatory. These transliterations present inessential (disposable) visual noise. --Dan Polansky (talk) 11:17, 8 August 2015 (UTC)
I suspect that any non-Roman script would be visual noise for you. Unless you can read all scripts, of course. --Anatoli T. (обсудить/вклад) 13:05, 8 August 2015 (UTC)
I am not saying the non-Roman script is noise; I am saying that the romanization in the example sentence is noise. Thus, in лум, I see this:
  • мамык лум ― mamyk lum ― fluffy snow
But I'd like to see this:
  • мамык лум ― fluffy snow
Romanizations in headword lines are fine, IMHO. --Dan Polansky (talk) 13:37, 8 August 2015 (UTC)
At least for one-line usexes, I'd like to see:
  • мамык лум (mamyk lum) ― fluffy snow
but that may become unwieldy for usexes where the translation is on a different line from the usex. —Aɴɢʀ (talk) 14:20, 8 August 2015 (UTC)
I agree with Dan Polansky above, of course. But I still ask: why not find some other way of handling transliterations, such as making them visible in hovering bubbles when one moves the cursor over them, or having a button that makes them visible or invisible depending on the taste of the viewer? What would be wrong with that? We make inflectional tables appear closed by default, and only open when you click on them; why not do the same with transliterations? --Pereru (talk) 19:39, 8 August 2015 (UTC)
@Pereru. Someone will probably create a technical solution for this but I don't understand your dislike for transliterations in usexes. You can ignore them if you don't need them but do you realise that other users may be interested? They may not know the script or be willing to learn it; they could be interested in analysing the grammar, vocabulary or language comparison. Foreign scripts just put off some people who are only used to Roman letters. I know this for a fact - this includes people who are familiar with foreign scripts but not fluent in them, for whom reading foreign characters takes some effort. Besides, I'm sure you're having Cyrillic in mind when wanting to get rid of transliterations but the change (if implemented) will affect all non-Roman scripts, some of which are very complicated and hard to read! How useful would a string of Thai characters like this: เรียกรถแท็กซี่แล้วยัง be to you, compared to เรียกรถแท็กซี่แล้วยัง? ― rîak rót tɛ́k-sîi lɛ́ɛo yang? ― Did you call the taxi? You would probably even have some difficulty in finding the headword term (เรียก) at first? --Anatoli T. (обсудить/вклад) 01:19, 11 August 2015 (UTC)
@Atitarev, maybe I'm making this seem more important to me than it really is. It all boils down to an esthetic preference: examples plus translations tend to already be long enough; if you still add transliterations the result will often be longer than one line, and that offends my sense of proportion. I would prefer no transliterations even in languages whose script I don't read (I can read the Thai script, so that's not a big deal for me, but, for instance, I don't read Chinese characters; and yet, for me, lines with just the original Chinese example and a translation look nicer than those with the transliteration). The esthetics gets especially bad with inflection tables, which become at least twice as large as they need to be only to accommodate transliterations. Now, I understand and agree that others have a right to think differently, and I won't mind too terribly if things remain as they are. But if there's a chance of getting a nicer format... then I'm all for it! --Pereru (talk) 06:20, 11 August 2015 (UTC)
@Pereru Thanks for the reply. Yes, various enhancements are welcome but until they are implemented, I think it's good to keep transliterations as they are. Yes, a foreign language example can look nice but it is sometimes meaningless or very hard to digest. It is very true when you look for them. For me, full FL examples with translations and with transliterations (or phonetic guide/help like Japanese furigana, Arabic vocalisations, word stresses, etc.) were always a blessing in learning the basics of new tongues in a relatively short period. You can focus on scripts, grammar, vocabulary, syntax - it's your choice what you do and when, when you have all three (audio recording is a fourth important component). --Anatoli T. (обсудить/вклад) 07:08, 11 August 2015 (UTC)
  • Readability and usability I think that adding transliteration to other scripts is extremely valuable and would like to see it implemented throughout the dictionary but I am also concerned about the perspective that encourages adding an extra step on clicking or focusing for the browser and mouse because this is difficult for users with certain disabilities and on some platforms. —Justin (koavf)TCM 06:16, 12 August 2015 (UTC)

Eastern Mari possessed forms

I'm thinking about how to do a template that will include possessed forms in Eastern Mari, but because every possessed form ('my house', 'your house', etc.) can also be inflected for ten cases, singular and plural ('my house', 'in my house', 'in my houses', 'to my house', 'to my houses', etc.), we end up having 6 persons x 10 cases x 2 numbers = 120 forms, most of which are predictably formed. This means creating tables that are rather big and unwieldy. I was wondering if someone working with similar cases (in other Finno-Ugric languages, or in Turkish, etc.) has found a better solution than just creating big tables? (Right now, I'm tempted to make each undeclined possessed form -- e.g., 'my house' -- an independent sublemma, with its own case inflection table under it, but I'm not sure this is the best solution.) --Pereru (talk) 04:56, 5 August 2015 (UTC)
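As a rough illustration of the size of the problem, the paradigm can be enumerated as the product of its three dimensions. This is only a sketch: the person labels and the placeholder case names (`case1` to `case10`) are assumptions made for counting purposes, not actual Eastern Mari grammatical terms, and the grouping function merely mimics the "one sub-table per possessed form" layout considered above.

```python
from itertools import product

# Hypothetical labels, used only for counting; not real Eastern Mari terminology.
PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]
NUMBERS = ["singular", "plural"]
CASES = [f"case{i}" for i in range(1, 11)]  # ten placeholder case names


def paradigm_cells():
    """Return every (person, number, case) combination of the full table."""
    return list(product(PERSONS, NUMBERS, CASES))


def split_by_possessor():
    """Group the cells into one smaller sub-table per possessor person."""
    tables = {}
    for person, number, case in paradigm_cells():
        tables.setdefault(person, []).append((number, case))
    return tables


if __name__ == "__main__":
    print("cells in one big table:", len(paradigm_cells()))
    for person, cells in split_by_possessor().items():
        print(person, "sub-table cells:", len(cells))
```

Splitting by possessor does not reduce the total number of forms; it only trades one large table for six smaller ones that could each sit on their own sublemma page.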

I'm a bit fuzzy on the details, but I vaguely remember someone saying that you can nest collapsible boxes. That means you could have just one form showing, but a whole sub-paradigm that opens up when you click on it. Chuck Entz (talk) 06:58, 5 August 2015 (UTC)
Finnish declension tables ignore possessive forms. The possessive endings (which can also be added to verb forms) have separate entries like -ni, -si, -nsä etc with lots of usage examples. --Makaokalani (talk) 10:32, 5 August 2015 (UTC)
For Hungarian entries, each possessive form contains its own declension table. For example: ablak (‘window’) → ablakom (‘my window’), the latter of which has a separate table with forms such as ablakommal (‘with my window’), ablakomban (‘in my window’), etc. Einstein2 (talk) 19:09, 5 August 2015 (UTC)
Nagyon szépen! I like the Hungarian solution. But how do you get those green links? They speed up the making of form-of pages considerably. --Pereru (talk) 01:57, 6 August 2015 (UTC)
Here's a description about how to make a template use the script which generates the green links: User:Conrad.Irwin/creation.js/documentation. Einstein2 (talk) 11:22, 6 August 2015 (UTC)

Make Proto-Baltic an etymology-only language

Linguists don't all agree on the nature of the Baltic languages as a group. There are three main proposals, that I know of:

  1. Balto-Slavic splits into Baltic and Slavic. Baltic then split into East and West Baltic. (this is the traditional view)
  2. Balto-Slavic splits into East Baltic and Slavic-West Baltic. Slavic-West Baltic then split into Slavic and West Baltic.
  3. Balto-Slavic splits into East Baltic, West Baltic and Slavic.

Proto-Baltic only exists in the first of these proposals. Moreover, it has been noted that there aren't really any common linguistic changes that separate Proto-Baltic from Proto-Balto-Slavic. As reconstructed, the two are essentially identical.

In the past, we've deleted and merged different proto-languages when there is no definite agreement on their existence and definition, and when they are too similar to their parent language to make separate pages for them worthwhile. For example, Proto-Finno-Permic and Proto-Finno-Ugric were recently merged into Proto-Uralic. There was also a discussion on merging various Polynesian languages, although I'm not sure where that went. In any case, I don't see the value in having separate pages for Proto-Baltic reconstructions when they're all just going to be identical to Proto-Balto-Slavic reconstructions. So I think that Proto-Baltic should be changed into an etymology-only language, so that it can be mentioned with {{etyl}}, but there can be no entries or links to it. All existing links would be changed to Proto-Balto-Slavic. —CodeCat 12:14, 7 August 2015 (UTC)

  • Support. --WikiTiki89 14:59, 7 August 2015 (UTC)
  • Support also. Like you, I have also heard that Baltic = East + West Baltic is not a valid clade. Benwing (talk) 16:30, 7 August 2015 (UTC)
  • Disagree. PBS is still not consensus, and as far as I understand the assumption PB = PBS is not obviously true -- Slavic can alter PB reconstructions significantly if it is taken into account for PBS. So, since there is no consensus, I say keep the PB pages as long as they're sourced. After there is a PBS etymological dictionary then this issue can be dealt with here; before that, doing this would simply be premature. --Pereru (talk) 00:55, 8 August 2015 (UTC)
But would there be any difference? It would receive the same treatment as fiu-pro – valid for use in etymologies (in {{etyl}}) but not having its own appendices. Does bat-pro even have any appendices, I think majority are bsl-pro, is that correct? Hopefully this would be another step towards lessening confusion/misguided deletions like this: User_talk:Tropylium#Category:Proto-Finnic_terms_derived_from_Proto-Baltic (I'm sure it was done with good intentions but a user should be able to use such oft-cited (in published literature) genetic groupings in etymologies even if they are considered defunct by the most recent research and don't have their own appendices.) Neitrāls vārds (talk) 09:25, 9 August 2015 (UTC)
There was no deletion: that category simply hasn't been created yet. The category showed up in Special:WantedCategories, and I wanted to make sure it was a good idea to create it before doing so. I wouldn't have deleted it if someone else had created it, but I try to avoid creating categories that are only going to be deleted later (though it inevitably happens some of the time). I do weed out a lot of mistaken categories from bad edits, which I correct, like Category:Spanish adejctive forms, but I generally wouldn't do that with a knowledgeable editor who intended to do it that way. I didn't create the category, but I didn't "fix" the entry itself. Chuck Entz (talk) 14:50, 9 August 2015 (UTC)
  • Question: has Proto-East Baltic been worked out to any major degree? As far as I know, everyone accepts East Baltic, which means that effectively the Baltic vs. Balto-Slavic debate should only come up whenever there's Old Prussian or similar data involved. I would not be surprised if there were even sources defining "Proto-Baltic" as only the common ancestor of Latvian + Lithuanian anyway. (I tentatively support a merger between the appendices; bear in mind that we could still cover in prose differences between Baltic and Balto-Slavic if they were to come up. But I have no opinion on which of the two should remain.) --Tropylium (talk) 18:46, 8 August 2015 (UTC)
Nope. AFAIK no such thing has been worked out as of yet. Neitrāls vārds (talk) 09:25, 9 August 2015 (UTC)
  • Support. Like Benwing, my understanding of the scholarship is that Baltic is not a genetic group and there was no Proto-Baltic. Even Derksen, who writes of Proto-Baltic, says "I am not convinced that it is justified to reconstruct a Proto-Baltic stage; the term Proto-Baltic is used for convenience’s sake." Reconstructing Prehistorical Dialects: Initial Vowels in Slavic and Baltic says "Baltic scholars who have concerned themselves with this question conclude that one cannot reconstructed a Proto-Baltic." The situation seems comparable to Proto-Algonquian, which was initially reconstructed as Proto-Central-Algonquian (contrasted with Eastern and Plains), before scholars realized that only Eastern was a genetic group with a proto-language (PEA), and that what had been reconstructed as PCA was, with only a few minor changes here and there, simply Proto-Algonquian. - -sche (discuss) 19:14, 8 August 2015 (UTC)
    But there is the question of accuracy. Since PBS still hasn't really been reconstructed (no etymological dictionary), mentioning fleeting forms or original research should only be done explicitly, which is not (yet) done here as policy. What is available out there often does have PB, not PBS, forms -- only those few words that are important for an author's paper, such as the Derksen paper you cite. In the absence of a body of consensus reconstructions for Proto-Balto-Slavic, disregarding the Proto-Baltic ones or changing them automatically into Proto-Balto-Slavic is simply too hasty. The work hasn't been done yet to justify this. We're still at "Proto Central Algonquian" time; to assume that the work of demonstrating that all those forms are simply "Proto Algonquian" has already been done is at best temerary. --Pereru (talk) 19:36, 8 August 2015 (UTC)
I'm confused ... AFAIK no one questions that Balto-Slavic is a clade. Benwing (talk) 05:12, 9 August 2015 (UTC)
Some Lithuanian (and perhaps Latvian) nationalists deny it. I've seen the claim made that the Balto-Slavic theory was a Soviet plot to justify the annexation of the Baltic States into the Soviet Union. I don't know whether any reputable linguists free of ideological motivations deny it, but if so, they're in the minority. —Aɴɢʀ (talk) 15:24, 9 August 2015 (UTC)
One of them tried to hijack the Wikipedia pages on the subject not that long ago. As to Benwing's confusion: the issue isn't whether it's a clade, but whether the details have been worked out on the proto-language. Also, proto-languages are theoretical constructs that are only as good as the information on which they're based: including Slavic in a reconstruction provides extra material to work with, so a PB reconstruction may not be as complete a picture as a PBS one. I have no problem with documenting that a referenced reconstruction was for PB rather than PBS. My main issue has been with categorizing entries as derived from PB. Even experienced editors sometimes forget about the categories that are added by the templates. Chuck Entz (talk) 15:57, 9 August 2015 (UTC)
@Chuck Entz, well, using it in etyl would imply categorization as well, this is how it's done for fiu-pro as well, do you think the cat should redir?
@Angr, one way it can be valuable (if one reads between the lines) is that it often is used as a "code word" for Proto-East Baltic (the hypothetical parent of Latv. and Lith. that hasn't been worked out yet and judging by current theories wouldn't include Slavs if it is, in fact, worked out at some point) which gives geographic and chronological clues (this can be important in Uralic/Finnic etymologies for example, as there appear to be several layers – a pre-Slavic Balt(o-Slav)ic layer and for Finnics a "Proto-Baltic" (read "Proto-East Baltic") layer of borrowings.) Neitrāls vārds (talk) 16:31, 9 August 2015 (UTC)
@Chuck: It's possible that Slavic would include more information, but someone with enough knowledge of Slavic sound changes could easily evaluate if the Proto-Baltic reconstruction is also valid for Proto-Balto-Slavic. In most cases, it will be. This is not limited to Slavic either; information from outside Balto-Slavic can also contribute to a Balto-Slavic reconstruction. —CodeCat 20:19, 9 August 2015 (UTC)

Adding our own diacritics in quotations of prose works printed without them

I've had an ongoing debate in the past with User:Atitarev about whether we should add stress marks to quotations of Russian prose. He believes that this is helpful to readers, but I am against this for a number of reasons. Firstly, I believe that out of respect to the author and publisher, all of our quotations should reproduce as closely as possible the original work with the exception of the bolding we add to the word(s) that the quote is demonstrating. Secondly, this forces us in some instances to choose between two or more equally acceptable stress variants of some words, or worse in some cases between two or more homographs with different meanings. Note that this does not apply as much to poetry from which stress can be inferred by the meter, or to songs or movies in which the stress can be heard. This problem is significantly exacerbated in languages such as Hebrew and Arabic, where we would not only be inferring stress, but also vowels, leaving much more possibility for ambiguity.

The question is: Should we (Wiktionary) do this in general? Should we do this for languages like Russian, even if not for languages like Hebrew and Arabic? Should we do this even for languages like Hebrew and Arabic? Should we remove diacritics from quotations where we have already added them? --WikiTiki89 15:32, 7 August 2015 (UTC)

As far as I know, the practice is to leave quotes relatively unchanged. I don't think we add macrons to Latin or old Germanic quotes, for example. —CodeCat 15:37, 7 August 2015 (UTC)
Adding macrons to Latin is a completely different story, because these texts are often already printed with macrons. I'm not talking about always sticking with the most original version of the quote, but about sticking to existing publications. This question mostly applies to relatively modern quotations. --WikiTiki89 15:55, 7 August 2015 (UTC)
As for Arabic, stress of course doesn't really apply, but I think it would be a huge help to the reader to add the vowels to the extent that they can be inferred reasonably unambiguously. Reading Arabic is hard for non-fluent speakers due to the underspecified text, esp. with verbs. I think in the case of Russian, similar arguments could be made -- if you're concerned about ambiguous cases, just leave off the stress in those cases or (perhaps better) follow Anatoli's convention of putting a stress mark in each possible place of stress. I'd also like to see individual words inside quotes linked -- again it would be a great help for the language learner. Benwing (talk) 16:26, 7 August 2015 (UTC)
Not all the quotations we include need to be targeted toward beginners. We can have usage examples with the full diacritics, which would be helpful for beginners. But quotations are meant to show how the words are really used in reality; and in reality, Russian is not written with stress marks and Arabic is not written with vowels. --WikiTiki89 17:32, 7 August 2015 (UTC)
In reality we always (or should always) transliterate Arabic text, at the very least. (And who's to decide what's targeted towards beginners and what's not? The same arguments could be made for not transliterating at all.) Benwing (talk) 19:49, 7 August 2015 (UTC)
What I mean is that not everything needs to be targeted toward beginners who can't read without vowels. And even for people who are not so comfortable reading without vowels, it's not as hard when you already know what word you're looking at. With transliteration, we're not actually altering the original text; the original is still there and anyone who doesn't want or need the transliteration can ignore it. --WikiTiki89 20:19, 7 August 2015 (UTC)
Another thing is that adding vowels prevents us from being able to show how vowels actually are used in the text (such as the fatḥatān, šadda, and other sporadic disambiguators). This applies to all three of the languages I've mentioned. --WikiTiki89 21:10, 7 August 2015 (UTC)
When we're giving a direct quote, we should keep the original spelling of the whole quote, i.e. without Russian stress marks (unless we happen to be quoting some text that for whatever reason uses them). We should also keep е for ё if that's how it was spelled in the original. (I don't quite understand why we allow ё in page names in the first place.) We can include stress marks in the transliteration if need be, though that will mean writing the transliteration out manually instead of letting it happen automatically. —Aɴɢʀ (talk) 10:55, 8 August 2015 (UTC)
We allow ё in the page names because this is a dictionary convention, it's so also in the Russian Wiktionary. The Russian Wikipedia makes the letter mandatory throughout articles and many native speakers prefer to write it all the time. Letter ё isn't exactly banned in Russian! It's also considered a separate letter, not a е with two dots (две точки). Every Russian dictionary uses it in the alphabetical order. Knowing that ё is replaced with е by native speakers lets you figure out how to spell it in the real world. For the same reason, I don't see how adding stress marks, normalising texts with ё, adding Arabic or Hebrew diacritics, Japanese furigana is a problem in quotations. Many editors suggest photographic image of the original texts, even using the glyphs. Modern Russian books don't reprint texts in the pre-1918 reform spellings. China republished all old books in the simplified script. Japanese publishers partially follow the post-war reform.
Another point, some Russian books appear in accented forms with consistent usage of ё, designed for foreigners or children. Or Arabic texts can be with or without vocalisations. Japanese texts appear with furigana (ruby) to help with the pronunciation, especially when aiming at young readers.
My strong opinion is that a dictionary should be user-friendly and help people master languages; it's about the language, not the facts. How languages are written out there in the real world can be described in appendices. Learners learn this as the first thing. For me, a learner of Arabic, it is much more useful to have vocalised Arabic than to be told over and over again that diacritics are not used by Arabs. Imposed restrictions are the reason I dislike adding citations. --Anatoli T. (обсудить/вклад) 12:46, 8 August 2015 (UTC)
I support Benwing's idea of linking words in usage example. It has long been used by Chinese templates, which do it automatically. E.g.
中國首都北京 [MSC, trad.]
中国首都北京 [MSC, simp.]
Zhōngguó de shǒudū shì Běijīng. [Pinyin]
The capital of China is Beijing.
As you can see, it has a semi-automatic script conversion and transliteration, it can also be used for quotes, which will display both traditional and simplified forms, regardless of the original form. --Anatoli T. (обсудить/вклад) 12:53, 8 August 2015 (UTC)
All of this is fine for our own example sentences, but I do think we should follow the original orthography when we're giving a direct quote. We're showing how the word is used "in the wild", and I don't think we should pretty that up. But headword lines and translation listings and usage examples can be as learner-friendly as we want them to be. —Aɴɢʀ (talk) 05:54, 9 August 2015 (UTC)
What do we violate by providing "самолёт лети́т на за́пад" instead of "самолет летит на запад" with word stresses and normalising "е" as "ё"? The text is the same, it just has accents to make the reading easier. It's completely uncommon in Russia to use pre-1918 reform spelling when quoting old authors and Chinese don't have to use traditional script when quoting old authors, regardless of what script the original was in. Chechen texts often replace Cyrillic palochka with |, l, 1, etc. for technical reasons but the normalised spelling distinguishes capital and small Ӏ and ӏ , e.g. лугӏат ‎(luġat) (the correct spelling) will appear in a printed text as луг|ат, лугlат, луг1ат or лугӀат. Should we also copy the fonts and word breaks in citations? --Anatoli T. (обсудить/вклад) 07:23, 9 August 2015 (UTC)
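For illustration only, the mechanical difference between the two spellings in this comment can be sketched in Python. This is a minimal sketch and not something Wiktionary's templates or modules are known to do; the function names `strip_stress` and `yo_to_ye` are hypothetical, and the sketch assumes stress is written with the combining acute accent (U+0301).

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, assumed to be the stress mark

def strip_stress(text: str) -> str:
    """Remove stress marks while leaving ё and й untouched."""
    # Decompose so that any accented letter exposes its combining acute,
    # drop only the acute, then recompose (е + U+0308 becomes ё again).
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))

def yo_to_ye(text: str) -> str:
    """Replace ё with е, as many printed texts do."""
    composed = unicodedata.normalize("NFC", text)
    return composed.replace("\u0451", "\u0435").replace("\u0401", "\u0415")

learner_form = "самолёт лети\u0301т на за\u0301пад"
print(strip_stress(learner_form))            # stress marks removed, ё kept
print(yo_to_ye(strip_stress(learner_form)))  # ё also written as е
```

Both functions only discard information. Going the other way (placing stress or restoring ё) cannot be done by a character replacement like this, since it requires knowing each word.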
I feel like with direct quotes, we should present them as faithfully as Wikisource presents source texts: we don't copy over fonts and word breaks, and incorrect character shapes can be replaced with correct ones when the intent is clear (e.g. when the original author is clearly attempting to write a palochka but doesn't have the exact character available), but we do present misspellings, misprints, typos, etc., uncorrected (though they can be [sic]ed) and we don't add pedagogical diacritics. —Aɴɢʀ (talk) 08:17, 9 August 2015 (UTC)
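The palochka replacement mentioned in this exchange could be sketched roughly as follows. This is a heuristic under stated assumptions, not an existing Wiktionary tool: the name `normalise_palochka` is hypothetical, only the stand-ins listed above (|, l, 1 and the capital Ӏ) are handled, and a stand-in is replaced only when it directly follows a lowercase Cyrillic letter.

```python
import re

PALOCHKA_SMALL = "\u04CF"  # ӏ, Cyrillic small letter palochka

# Stand-ins named in the discussion: vertical bar, Latin l, digit 1,
# and the capital palochka U+04C0, each directly after a lowercase
# Cyrillic letter (а-я or ё).
_STAND_IN = re.compile(r"(?<=[\u0430-\u044F\u0451])[|l1\u04C0]")

def normalise_palochka(word: str) -> str:
    """Replace common palochka stand-ins with the small palochka."""
    return _STAND_IN.sub(PALOCHKA_SMALL, word)

for variant in ["луг|ат", "лугlат", "луг1ат", "луг\u04C0ат"]:
    print(variant, "->", normalise_palochka(variant))
```

A rule this simple would also rewrite a genuine digit 1 that happens to follow a Cyrillic letter, so in practice an editor would still need to check that the intent is clear.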
I agree with Angr. - -sche (discuss) 06:03, 9 August 2015 (UTC)
I disagree, as mentioned above. Although in any case there shouldn't be problems linking individual words in quotes. Benwing (talk) 07:08, 9 August 2015 (UTC)
FWIW, I think the argument for adding diacritics to Arabic (it's often unintelligible without them) is much stronger than the argument for adding diacritics to Russian (it's perfectly intelligible without them), and I would sooner allow the former than the latter. At the risk of adding far too much visual noise to non-Latin script citations, perhaps we could have vocalized forms display on mouse-over or something? - -sche (discuss) 16:54, 9 August 2015 (UTC)

WikiTiki's and Anatoli's disagreement is very deep and philosophical. It stems from the disagreement over the purpose of Wiktionary. Anatoli and his camp see Wiktionary mainly as a learning tool for non-native speakers. Hence the reading aids in quotations, the note in Template:ru-adj1 and the unscientific, pronunciation-based transliteration system for Russian. The other camp, which includes me, sees Wiktionary as a scholarly resource, a kind of an encyclopaedia of language, useful for native speakers too. One side wants to write an OALD, the other an OED. Both projects are useful and have a right to exist, but we have to choose one. --Vahag (talk) 13:56, 9 August 2015 (UTC)

I'm not sure what you mean by "unscientific" here. Also, maybe I'm an optimist but I think it's possible to resolve this issue through compromise. As for OALD vs. OED, keep in mind this is the English Wiktionary, and hence designed for English speakers. That means that foreign-language entries are inevitably geared somewhat towards language learners, just like all cross-language dictionaries. I don't think there's much disagreement over this. This means the OALD isn't the right point of comparison. We're rather trying to create something like the OED for the English-language entries and the Hans Wehr dictionary for Arabic language entries (this is the best dictionary of Modern Standard Arabic I can think of), and similarly for other foreign-language entries. Benwing (talk) 21:25, 9 August 2015 (UTC)
Vahag. Neither OALD nor OED covers topics in the detail we do here. Published Russian dictionaries lack transliterations, there's nothing to compare with. Well-known dictionaries are unconcerned about the Russian transliteration, they simply don't do it. When they do (in citations, etc.), you get both "narodnovo" (phonetic) and "narodnogo" (graphic) transliterations (genitive or animate accusative of наро́дный ‎(naródnyj)). You made negative comments about word stresses and genders as well but most users and editors find them useful, AFAIK. Therefore, I have to use other languages again as examples, for the umpteenth time.
Examples of irregular pronunciations and transliterations, using very common words in various scripts:
  • Thai: ชาติ ‎(châat) (written as "châa-dti") but the final "i" is silent. Can you find a (scientific) source, which claims that it should be transliterated as "châa-dti" or similar, with a transliterated "i"?
  • Korean: 십육 ‎(simnyuk) (written as "sibyuk"). Can you find a (scientific) source, which claims that it should be transliterated as "sibyuk" or similar?
  • Japanese: 今日は ‎(こんにちは, konnichi wa) (written as "konnichi ha"). Can you find a (scientific) source, which claims that it should be transliterated as "konnichi ha" or similar?
  • Arabic: شُوكُولَاتَة ‎(šokolāta) (written as "šūkūlāta"). Can you find a (scientific) source, which claims that it should be transliterated as "šūkūlāta" or similar? Perhaps a better example is إِنْجْلِيزِيّ ‎(ʾinglīziyy) written as "ʾinjlīziyy".
I can give more examples where phonetic transliteration (closer to pronunciation) is considered standard and scientific. Are there sources that claim that "что" should only be "čto" and never "što" and "кого" should only be transliterated as "kogo" and never "kovo"? --Anatoli T. (обсудить/вклад) 01:01, 10 August 2015 (UTC)
Anatoli, Benwing, Russian transliteration has been discussed a million times (see Wiktionary talk:Russian transliteration) without achieving consensus. Let's not start a new one here. I was merely pointing it out as an example of scientifically rigorous vs convenient. The issue at hand is the usage examples. When you are giving a quote from Pushkin's Eugene Onegin, I want it to be without stress marks and in pre-reform orthography, as it was published in 1833. If you normalize the text, it is less valuable to me and others who are interested in the diachronic, historical development of Russian. Language learners would prefer normalized quotes. Our needs are irreconcilable. --Vahag (talk) 10:50, 10 August 2015 (UTC)
If I were to provide citations for the pre-1918 reform spelling of пока́мест ‎(pokámest) (modern) - пока́мѣстъ ‎(pokáměst) (pre-1918 spelling reform), then the old spelling would be more appropriate:
Покамѣстъ, въ утреннемъ уборѣ
Надѣвъ широкій боливаръ,
Онѣгинъ ѣдетъ на бульваръ,
И такъ гуляетъ на просторѣ,
Пока недремлющій брегетъ
Не прозвонитъ ему обѣдъ
But why would I need to confuse users/readers if the entry is the modern spelling (покамест)? Pre-reform spellings are, of course, allowed but they should be clearly marked as old or obsolete. Pushkin's works are enjoyed today by most readers, who don't have to struggle to read the old orthography. Citing pre-revolution authors works just fine, and anyone interested in the old orthographies is free to pursue them, but that has little to do with a dictionary of (modern) Russian. --Anatoli T. (обсудить/вклад) 12:32, 10 August 2015 (UTC)
You can reference a more modern printing of the work, which would have already converted the orthography. I would have no problem with that. --WikiTiki89 17:59, 10 August 2015 (UTC)
That's great but why would a learner of Russian seek archaic spellings in the Russian sections of the English Wiktionary, even if the reference is for a term, which hasn't changed with the reform? It's fine if you already mastered modern standard Russian and wish to take the next step and familiarise yourself with historical spellings. Yes, we can add all historical spellings but they are not a priority for this project. --Anatoli T. (обсудить/вклад) 01:07, 11 August 2015 (UTC)
In case you misunderstood me, you can quote a more modern printing of the work that uses the modern orthography. I'm only concerned with us altering the text ourselves. --WikiTiki89 01:22, 11 August 2015 (UTC)
Perhaps, my comment was to get my point across in reply to Vahag's comment earlier, where he said that he would prefer the original orthography quotes. Eugene Onegin (or Yevgeny Onegin) is available in both pre-reform and modern spellings or in fact any old literature for that matter. I just don't see the need to quote pre-reform orthography for modern terms. Not at the expense of modern orthography, in any case. --Anatoli T. (обсудить/вклад) 01:48, 11 August 2015 (UTC)
Some people may be interested in them. If we already have a few quotations in modern orthography, who does it hurt to have one in the old orthography as well? --WikiTiki89 02:08, 11 August 2015 (UTC)
  • Adding stress marks to attesting quotations of Russian prose is a poor practice, IMHO. Adding these to headword lines is acceptable; adding these to lists of terms such as synonyms and derived terms is equally poor, IMHO. In most places, terms should be presented in the form in which they appear in print. I don't believe the learners of Russian should be reminded on every single occasion how to pronounce; the headword line itself should suffice. --Dan Polansky (talk) 11:40, 10 August 2015 (UTC)
    I agree with Dan in principle. Language learners are smart enough to click the link if they forgot the stress or pronunciation of a word. --WikiTiki89 17:59, 10 August 2015 (UTC)
As an (admittedly not very committed) learner of Russian, I would find more pervasive usage of stress marks very useful. Looking up the stress every time is very tedious and a drag on learning. Seeing the stress mark in quotes, examples, links and synonyms would make learning faster through repetition and reinforcement. --Tweenk (talk) 22:48, 26 August 2015 (UTC)

Tagging unsourced reconstructed entries[edit]

I've just made {{needsources}} to tag reconstructed entries (protoforms) that were created without explicit published sources. Since, after all, reconstructed forms are simply hypotheses, not attested words, they need sources (who proposed that reconstruction, in what publication, and based on which cognates) just as much as a "normal" word needs usage examples so we know it really exists. I therefore suggest that any reconstructed entries that have no sources in them be tagged, so that those interested in them can add the sources. (I started doing this, but my edits were reverted since the issue had not been discussed here first, so I am doing this now.) --Pereru (talk) 01:40, 8 August 2015 (UTC)

I don't understand why they must absolutely have sources. From its conception, Wiktionary has been a dictionary and therefore stands on par with other dictionaries. Other dictionaries do not source all their definitions to another linguistic work; they interpret and present their research independently. In the same way, Wiktionary and its editors have directly interpreted evidence in the form of attestations. Parroting other dictionaries has always been explicitly forbidden and independent research of lexicographic content has been a requirement, enshrined in WT:CFI and the process of WT:RFV. For lexicographical content, we have never once required corroboration by an outside source; we require evidence and make our own decisions based on that through consensus and peer review.
Because Wiktionary presents etymological information as well, it's also an etymological dictionary. That means that other etymological dictionaries stand on par with Wiktionary. Etymological dictionaries, too, present independent and sometimes novel interpretation of the evidence, and are not required to take all of their contents from other linguistic sources. Of course, when information is corroborated by another source, they can and do indicate this, to strengthen their own claims. But etymological works may equally question or refute what other sources say; they're not limited to parroting others.
Wikipedia is an encyclopedia, a compendium of existing knowledge. This makes sourcing vital to Wikipedia, and original research a problem. But as I have shown here, Wiktionary is of a very different nature, and through this nature it is bound by different rules. It's not a compendium of lexicographic or etymologic knowledge presented by others; it's an independent source of this knowledge. We are not subservient to other linguistic sources, we are their equivalents, or even competitors. Original research within Wiktionary is important, it's an integral part of how Wiktionary works and has always worked. Therefore, it's not appropriate to require sourcing to another linguistic work for information presented on Wiktionary. This goes directly against what Wiktionary is, and the principles and processes written down in our policies. —CodeCat 12:09, 8 August 2015 (UTC)
Contrary to the above, requiring references for etymologies is not against en wikt policies since we do not have any on the matter. WT:ATTEST, the important evidence-based criterion, says nothing about etymologies. Some people are even pushing a requirement that etymologies should be referenced into WT:ETY; my removal (diff) of an undiscussed addition of such a requirement was undone. I think the whole section References in WT:ETY should be removed as not traceable to a discussion or vote showing consensus, but I have better things to do at this point; maybe a couple of months later. Again, while for definitions we have WT:ATTEST and WT:CFI in general, for etymologies we have a policy vacuum. --Dan Polansky (talk) 13:10, 8 August 2015 (UTC)
Let's see if I can help CodeCat understand why sources for etymologies are a good thing:
(a) Etymologies are hypotheses, not the truth; the interested reader should be able to see why a certain etymology is given here rather than others, without having to trace some discussion of its correctness somewhere in the archives.
(b) Etymologies, being hypotheses, have authors: unlike words, they aren't simply "in usage" or "out of usage" or "dated" and whatnot, they were actually ideas, good or bad, proposed by someone. To omit this information is (b1) a disservice to the interested reader, since it hides available information, and (b2) unethical, since it amounts to not giving credit to an author for his/her idea, which is a kind of intellectual theft.
(c) To the non-specialist, more information is better than less information. I am sure that a specialist can probably quickly assess and evaluate the goodness of a specific etymology, but others would need more than that. Claiming you don't need this information because "expert Wiktionarians" can assess the correctness of an etymology anyway is like claiming that attestations are not necessary to qualify a word for inclusion because "expert Wiktionarians" can tell if a very rare or dialectal word actually exists...
(d) "Other dictionaries don't do that" is not a good argument ("Wiki is not paper", etc.). Some do: etymological dictionaries, where the sources are so important they are usually listed at the beginning of the book rather than at the end, because the author knows that the interested reader will want to form his/her opinion on the author's choice of sources. Non-etymological dictionaries indeed often don't, but they also often don't cite any etymologies at all, and they certainly don't have appendices with reconstructed protoforms -- if we want to follow them, then we should delete all reconstructed entries, shouldn't we?
(e) "Etymological dictionaries present independent and novel interpretations of the evidence" -- indeed, and they always label it as such! And they also always give sources for ideas that are not "independent and novel"! Why should Wiktionary be any different? Personally, I am not against independent research, as long as it is (e1) labeled as such, and (e2) argued for, preferably on the same page. Why are you not doing that? In other words: Etymological dictionaries do distinguish original ideas from other people's ideas, which they give sources for; why don't we -- why don't YOU -- do the same?
(f) Mentioning sources is not equivalent to parroting other dictionaries' definitions--quite the opposite! Mentioning sources means respecting other people's intellectual property rights, and also giving the reader the possibility of exploring the basis for a given etymology being used here.
Besides, both Wiktionary:Reconstructed_terms#References_and_verifiability and Wiktionary:Etymology#References mention the need for sources in etymologies. Why shouldn't we follow these guidelines?
@CodeCat:, you seem to believe that sources vs. lack of sources boils down to Wikipedia vs. Wiktionary. It doesn't. The reason for adding sources to etymologies is that it is a good idea (see above), not a simple imitation of other wiki projects. Please get off the soapbox!... Also, it's not a question -- at least not to me -- of "original research". As I said elsewhere, I have nothing in principle against original research; I just want it to be labeled as such. If the reconstructed protoforms you created entries for are all your own work, then they should be labeled as such, and your reasons for creating them with that form should be on their page (or on a page like WK:About_Proto-Indo-European, or WK:About_Proto-Balto-Slavic, etc.). I'm not "requiring sourcing to another linguistic work", I'm just "requiring sourcing" -- if it's your work, say so on the page! That's what etymological dictionaries do: they label their own work as such. It's also not about "criteria for inclusion or deletion": I'm not saying 'delete it if it's original research', I'm saying 'label and argue for it if it's your work' -- not just in obscure discussions two years ago in the Scriptorium, but right on the reconstructed entry page! WHY THE HELL NOT? Something I really don't understand is why you are hellbent on obscuring the reasons why a certain protoform is included here. In what way does hiding the reasons/sources for including a form help Wiktionary become better? Claiming that "expert Wiktionarians" can judge it so we don't need to argue for them on their page is like claiming that "expert Wiktionarians" can tell if, say, Arabic usage examples are correct or not, so we don't need to translate them into English on the page of the word they are an example for...--Pereru (talk) 19:05, 8 August 2015 (UTC)
(a) Sourcing doesn't actually tell the readers any reasoning. It just suggests that the reasoning might be found in another work instead, but even that's no guarantee as plenty of other works just give forms without any arguments. I am completely for reasoning and giving arguments for reconstructions, within reason. Some widely known and accepted sound changes like Grimm's law should not need to be pointed out in every etymology. So I'm not sure how this point is relevant. External sourcing doesn't change anything about it. If anything, I understand your argument to mean that we should provide argumentation for etymologies in addition to, and regardless of, sourcing.
(b) It can be assumed that all information on Wiktionary is the result of Wiktionary's own editorial process. All content on a wiki is already sourced through the page history, so that gives credit to everything users have ever added to pages. Adding references to Wiktionary users only complicates things. External sources are fine, but we should not be required to tag everything we add with our own usernames, that's just stupid.
(c) Again, a source provides no information, it merely says where information came from. We use many specialised linguistic works as sources on Wiktionary, and I don't think many Wiktionary readers will have access to them. So to the majority of readers, the source is nothing more than a name.
(d) I have nothing against providing a reference to a source when information is taken from them. I admit I have been rather sloppy about this, and still am to some degree. But I am trying to improve things, as you may have noticed from my recent edits to PIE root pages. Do as I say not as I do. Just because I'm not perfect doesn't mean I'm not right.
(e) Again, I have nothing against sourcing information that does come from an external source. What I disagree with is requiring that all information comes from an external source; this is what your new template's wording appears to imply. I also disagree with sourcing particular ideas to individual editors. Wiktionary is a wiki, and information can and should be edited and improved by other editors. This means it's not right to place certain parts of pages on "lockdown", not allowing anyone else to edit them. Etymological information originating from within Wiktionary should be sourced to Wiktionary editors as a whole, and to editorial consensus. But since all information not sourced to external sources can be assumed to have been provided by Wiktionary editors, this is entirely redundant.
(f) Copyright doesn't apply and never has applied to information alone. So intellectual property is not relevant here. Scientists give each other credit and require it from others, because of plagiarism, but that's not intellectual property as far as I know. And I have no idea what the laws and rules are on plagiarism anyway. Wiktionary doesn't have any rules for it.
Those two pages you mentioned were written long ago, long before there was really any significant number of reconstructed pages. I also doubt whether they actually reflect consensus and common practice, so they should be changed to reflect what we actually do. My objections to your proposals, now and before, are that we should not be required to have an external source for all etymological information on Wiktionary. This is where my comparison with Wikipedia comes in. Wikipedia has a simple rule: unsourced material that is challenged can and should be removed. I object to bringing this practice to Wiktionary, as we are a dictionary (lexicographical, etymological and other) and it is in the nature of this project to be able to interpret, research and peer review available evidence (attested words) on our own.
So, again, to recap: I have nothing against sourcing. If information comes from somewhere else, source it. That's a good thing. Explaining reasoning for particular reconstructions, in the entries themselves, is also a good thing. I have no problem against that either, but within reason. Very obvious things like Grimm's law probably don't need to be mentioned, but there is no objective standard for this and if we want to go this route, we should figure out among ourselves which information is obvious enough to leave out. —CodeCat 19:39, 8 August 2015 (UTC)
(a): Of course, only if you mention bad sources. Good sources do have the reasoning behind the proposals. It's up to you if you cite good sources or bad sources. Don't cite bad ones; cite good ones. If you see bad sources being cited, mention that to the author or start a discussion about that source. Don't just omit it -- as always, there is nothing to be gained by using a source -- including your own original research -- and not mentioning it. How many etymological dictionaries do you know that fail to mention their sources? And they are not Wikipedia... Now, it would indeed be better if you added the entire reasoning behind a suggestion rather than just reference the source, but the latter is easier and is the standard practice in etymological dictionaries. And most reconstructed entries here -- especially the ones you made -- still lack such an explanation, which is why they should be tagged with {{needsources}}.
(b): Sure. But as others have said there is no policy with respect to etymologies and their sources, so saying "it's the result of Wiktionary editorial process" still tells us nothing about what was done. What if I want to know the reasons? Where do I find this information -- an information that most etymological dictionaries give by means of, among other things, indicating their sources? And by giving detailed reasonings when it's their own idea?
(c): A good source does provide information. Are you familiar with good etymological dictionaries? They provide further sources, so you can trace it down to the original proposer, and they provide rationales for deviant forms. They also compare different hypotheses, and often provide further evidence for preferring one or the other. Plus they list correspondences and sound laws, especially the least known ones. They're full of argumentation, reasonings, rationales... What the heck are you talking about? What sources are you talking about?
(d): Good! Please continue doing that. If you add sources to your pages I have no problems with them. In fact that is my entire point: not having sources and reasons for including a particular protoform on the page itself is not a plus for Wiktionary, it's actually, as you put it, being sloppy. I'm glad you're fixing that, and you'll get my support for this. The goal, of course, should be to fix everything.
(e): And here we are apparently in full agreement: I am in favor of referencing external sources only when the information comes from an external source (duh!...). But now, "if" a given word is the result of your own original research, then this should be sourced, so that the reader knows that it is your original research. If you have your reasoning on the page, what the heck is bad about saying it is your idea? In what way is that bad for Wiktionary? And again, good etymological dictionaries do that (Karulis adds a big "K" to every paragraph in the LEV that contains his own ideas, for instance. That is what good etymological dictionaries do: they do not shy away from original research, but they label it as such and argue for it on the entry itself! Why is that so bad?)
(f): Intellectual property is not simply a question of law; it's a question of ethics. "Plagiarism", i.e. people taking advantage of other people's ideas without mentioning them, is exactly what the concept of intellectual property is supposed to prevent; why else do you think it exists? I think scientists don't own the legal copyright over their own ideas after they're published, but they certainly have the moral/ethical copyright. Do you think Dr Kim would be happy if you wrote him an e-mail telling him you've mentioned Proto-Balto-Slavic protoforms he proposed in a public forum like Wiktionary without mentioning his name? Would you, if you were in his place? Maybe he thinks Wiktionary is "just internet" or "not trustworthy" and thus not worth the trouble, but I'm sure he wouldn't think that not mentioning his name is the right thing to do -- in fact, I'll bet he would mention this as an argument against taking Wiktionary seriously. Which in fact it is.
(g): Maybe the pages should indeed be changed; two other Wiktionarians in the rfd discussion have already suggested that I myself "be bold" and edit and change them. I don't want to do that, though; but if you feel so strongly about it, why don't you? I do point out, though, that several others have said there is no official policy, so I'm not sure that there is a "what we do" yet: you seem to be placing the cart before the horses here. I think you still need to argue for "what YOU do" as being "what we should do". And frankly, I don't see how you can argue that not mentioning sources actually enhances Wiktionary. There is no self-respecting etymological dictionary that doesn't mention sources and doesn't label independent original research as such; why should Wiktionary?
In sum, if you don't have anything against sourcing, then remove the {{rfd}} from a template that merely asks for what you say you have nothing against. If you are in favor of explaining reasons for particular reconstructions in the entries themselves, then do so. In fact create a framework for doing that, with a special page in the Wiktionary namespace for listing all correspondences, all sound laws, etc. so you can easily refer to them in the shorter explanations in every reconstructed entry. By all means do so! The problem thus far is that this is not being done, and when I started requesting that it be done ("source" = "published source" OR "original research rationale") you reverted all my changes and asked for my template to be deleted. Be consistent! Do as you claim to believe! --Pereru (talk) 20:34, 8 August 2015 (UTC)
@CodeCat:, to summarize:
It seems we agree on most things. We both think it's good to have sources if the information comes from an external source. We both think original research is OK, and we're both in favor of writing down the reasons for a certain reconstruction in the entry itself. I'm further in favor of you also mentioning yourself as the author of a given idea if indeed that is the case, or at least of referencing/copying the discussion that led to a given form being accepted here. So why not do it? And what is the problem with tagging the entries where this wasn't done yet? I also add {{rfap}} to basically every new Latvian entry I make, because this puts them in a single category where Latvian native speakers like Neitrāls vārds can comfortably find the words they want to add pronunciation files to. Because, just as in the case of etymologies being sourced (and I don't mean only external sources), this actually adds value to the entry. Why not make this official Wiktionary policy?--Pereru (talk) 20:46, 8 August 2015 (UTC)
I just don't want my name to be placed in entries, and especially not my real name. I think that's my prerogative. —CodeCat 21:28, 8 August 2015 (UTC)
Not even CodeCat? Why not? I don't want my real name here either, but I wouldn't mind signing something here as "Pereru", the same way I sign a picture I upload to Commons as "Pereru"... If our names are in the histories of the pages we edit, and here as signatures in the comments we write, why not also in suggestions in pages? But well, it *is* your prerogative. Call it then "Wiktionary contribution", or tag it with a "W" or "WK" to show that the idea originated here, rather than in the outside world.--Pereru (talk) 04:18, 9 August 2015 (UTC)
I don't accept the notion that we need to cite sources to list descendants; that would hobble us. Regular inheritance by a language of a word from an earlier stage of that language (including from a proto-language) is usually so obvious and non-noteworthy that it is not mentioned except for common words in well-documented languages, or for proto-language terms that an author needs to grasp at less-documented languages to demonstrate; good luck finding a reference that confirms, for any sizeable number of words from e.g. Rumantsch, that they indeed derive from Latin/PIE foobar. Even borrowing may be obvious but unreferenced; no reference in supra confirms that the word derives from Georgian, but it's fairly obvious.
I do think the sheer existence of a word in a proto-language is something we need to provide a reference for, though if a reference attests that a certain word existed in a proto-language, I think we can and should certainly adapt that reference's potentially outdated notation; when I do this in Proto-Algonquian appendices I write source (has form) (sometimes visibly and sometimes in an HTML comment). If no previous scholarship attests the existence of a word, we could put a template at the bottom of the entry (a bit like {{LDL}} and {{Webster}}) saying something like "this reconstruction is the product of deduction by Wiktionary editors"; users could then (as with every other claim on every non-talk-page) look to the page history to see who added what. Such a template would provide a nice way of tracking and periodically revisiting such entries to see if references for them had become available, since it seems to be obvious to everyone except CodeCat that citing external authority wrt the existence of words in proto-languages is better than leaving it at "well, a random, vehemently anonymous person on the internet thinks so". - -sche (discuss) 22:15, 8 August 2015 (UTC)
@-sche: the problem here is simply when you have cognates proposed by different sources. Cognates can be sourced by default (they will mostly come from the same source anyway), without necessarily adding a footnote to each of them; but those that come from some other source will need to be footnoted, so that we are clear the source in question did not claim cognacy in this case. This applies even to words suggested as cognate by Wiktionarians: we could add a little superscript "W" to those, for example. This is needed because "obvious" is not always true. French parler looks like a cognate of Portuguese falar, but it isn't. In fact, it's standard scientific practice: when you are presenting cognates, they must be either (a) sourced, or (b) your claim, or at least (c) attested in some very well-known source, so they can be presented as "known to everybody already".--Pereru (talk) 04:48, 9 August 2015 (UTC)
I'm still being misunderstood here it seems. I do think that citing an external authority improves etymologies and reconstructions further. However, I don't think that reconstructions are necessarily less reliable without them. Sometimes, the reconstruction is just so obvious that there's nothing else it could possibly be. A great example is Proto-Finnic *kala. It's exactly the same form as its ancestor and many of its descendants. If we can find sources that agree with our own ideas, then all the better, that just shows that we're not alone in thinking that. But the same applies to sources with respect to each other, too. If we have two sources that disagree with each other, then we can mention the idea from both of them. But we should also feel free to poke holes in these proposals. Maybe we (through WT:ES or a talk) could decide that one of them has more merit than the other, and we can mention our reasoning in the entry. As editors and researchers, we don't have to consider all sources equally valid. —CodeCat 22:28, 8 August 2015 (UTC)
I think the misunderstanding is actually yours, about how science works. Yes, *kala is maybe an obvious case, but it was not discovered by you or me. It has a proposer, and saying who it is is, I think, something an etymologist would be interested in. See, this is like saying we don't have to provide usage examples or definitions for words that "everybody knows". Yes, everybody knows what time and happy mean; yet Wiktionary provides them with definitions. Is this useless? No. Is it useless to provide a source for *kala? Again, no. Just ask any scientist: is it useless to provide sources for 'obvious' things? No, both for credit/historical reasons (the guy who said it first deserves the credit), and for scientific reasons ('obvious' ideas sometimes turn out to be wrong...). If the source is well-known ('everybody knows who proposed that'), then scientists will not mention the author (everybody knows the laws of gravitation were proposed by Sir Isaac Newton).
Looking for "the proposer of" obvious etymologies is not a good idea. Finnic is a dialect continuum, and it has always been known by the speakers that people in nearby areas use plenty of the same words. This would be sort of like asking "who was it that proposed that fish in British English and fish in American English are cognate?" (Or: "who was it to discover that the moon has phases?")
It's possible to do historiography on when an etymology like this starts turning up in scientific literature of course, but that's more constrained by the development of linguistic methodology and publication practices themselves. {{R:fi:SSA}} mentions appearances of kala spanning 350 years; the earliest inter-Finnic comparison found by them is Finnish ~ Estonian from 1786, followed by Karelian in 1799, Veps in 1830 (in the first linguistic report on Veps to be published), Votic in 1856 (in the first grammar of Votic to be written), etc. (There's no specific date on who was the first to claim that this is also a Proto-Finnic word; but if we grant modern theoretical understanding, this is already implied by the Finnish-Hungarian comparisons from the 17th century, so essentially the date would be as soon as someone came up with the concept of "Proto-Finnic" in the first place.)
I agree that this is information that someone might be interested in, but just referencing SSA itself should be enough so that people interested in the history of etymology would know where to look for more details. At Wiktionary we're only working on etymology itself, not its history. --Tropylium (talk) 13:47, 9 August 2015 (UTC)
Indeed, I agree, especially because a source like SSA would probably give you the beginning of the trail leading to the first proponent if need be. I'm not saying that you need to find out the very first historical source ever to make the claim; but that, unless the claim is yours, some source should be indicated (so the interested reader can follow the trail). And it seems that we agree on that, right? (The "fish" in AE and BE case is not really parallel: I don't think these words were popularly believed to be cognates, but rather they were believed to be the same word, much as when I use "fish" as opposed to when you use "fish": we are using the same words, even if we pronounce them differently. Now, English "fish" and German "Fisch" or Dutch "vis": that is not perceived as the 'same word', and cognacy enters the picture.) --Pereru (talk) 19:20, 10 August 2015 (UTC)

Transliterations in parentheses?[edit]

From the above discussion, it seems to me that most people want to keep the automatic transliteration of non-Latin-script examples. Would it be possible to implement DTLHS's suggestion of putting the transliteration in parentheses rather than after an em-dash, to distinguish it more clearly from the following translation? Could someone perhaps make the necessary changes in the appropriate module, assuming nobody has any objections? --Pereru (talk)

I oppose using brackets but perhaps a light-grey colour for transliterations would be more palatable? --Anatoli T. (обсудить/вклад) 12:59, 8 August 2015 (UTC)
What's wrong with brackets/parentheses? Transliterations on the headword line are in parentheses. Light grey text is hard for people with bad or limited eyesight (e.g. partial blindness) to read, although such people are probably only a tiny minority of our readers. I'd prefer parentheses to lighter text. I like the suggestion (made above) of putting transliteration on the same vs a different line according to the length of the line, but I guess it has no chance of actually corresponding to "fits on one line" vs "doesn't", given the variety of phone- and computer-screen sizes (unless we implement it as a CSS feature?). - -sche (discuss) 17:58, 8 August 2015 (UTC)
I agree. (Personally, I would even favor a smaller font, in addition to parentheses, but parentheses would already be enough to separate more clearly transliteration from transcription and from the original text).--Pereru (talk) 18:43, 8 August 2015 (UTC)
How about an option that allows transliterations to be shown and hidden at will? —CodeCat 19:49, 8 August 2015 (UTC)
Sounds OK to me. Is that easy to implement? --Pereru (talk) 04:16, 9 August 2015 (UTC)
As long as "at will" means something the end user does, not the editor. Benwing (talk) 05:19, 9 August 2015 (UTC)
Yes, it should work more or less like showing and hiding inflection tables. But there should probably be something that saves the user's preference too, so that transliterations stay hidden forever unless you show them again. —CodeCat 17:02, 9 August 2015 (UTC)
I can agree with that. I'll wait for implementation before using the templates in Eastern Mari, but after that it shouldn't be a problem. --Pereru (talk) 19:21, 10 August 2015 (UTC)

When adding RFC to entries[edit]

Would it be too much to ask that, when an RFC is added to an entry, the date be added as well (perhaps automatically), so that it can be traced back much more easily in the RFC records? Some RFCs remain in entries for years and get forgotten about, and are not easily traceable in the entry's history. Donnanz (talk) 16:00, 9 August 2015 (UTC)

Wikipedia has a bot that goes around adding dates to cleanup templates. Perhaps we could ask the folks who run it to run one here, too. You can always find the RFC discussion via the whatlinkshere (restrict it to searching the Wiktionary namespace and ctrl-f "cleanup"), unless the page was tagged but not listed. - -sche (discuss) 01:59, 11 August 2015 (UTC)
One way of adding the date is by adding your "four tildes" next to the RFC, but very few users would think of that, hence this thread. I try not to create too many RFCs! Donnanz (talk) 16:23, 11 August 2015 (UTC)
We already have the capability to deploy "oldest" and "newest" tables (such as the "oldest" table at the top of this page) for categories, which addresses one of your concerns.
The very existence of these suggests that the dates when an item was added to a category must already be accessible. Does anyone know how? DCDuring TALK 18:09, 11 August 2015 (UTC)
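A minimal sketch of the date-stamping idea raised in this thread, not a description of any existing bot. It assumes the cleanup tag is written as {{rfc}} or {{rfc|...}} and that a parameter named "date" would be accepted; both the parameter name and the function name add_rfc_date are hypothetical.

```python
import re

# Assumption: the tag looks like {{rfc}} or {{rfc|param|...}} with no nested
# templates inside it. Variants such as {{rfc-sense}} are deliberately not matched.
RFC_TAG = re.compile(r"\{\{rfc((?:\|[^{}]*)?)\}\}")


def add_rfc_date(wikitext: str, date: str) -> str:
    """Append |date=... to every {{rfc}} tag that does not already carry one.

    The "date" parameter is hypothetical; the thread does not say what the
    template would actually call it.
    """
    def stamp(match):
        params = match.group(1)
        if re.search(r"\|\s*date\s*=", params):
            return match.group(0)  # already dated, leave untouched
        return "{{rfc" + params + "|date=" + date + "}}"

    return RFC_TAG.sub(stamp, wikitext)
```

A bot run would presumably pass the date of the edit that first added the tag, which would have to be looked up in the page history; this sketch leaves that lookup out.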

Templatizing usage examples[edit]

FYI, I created Wiktionary:Votes/pl-2015-08/Templatizing usage examples. Let us discuss the proposal, and postpone the start of the vote as much as the discussion requires. --Dan Polansky (talk) 09:38, 10 August 2015 (UTC)

I support this and I don't see why anyone wouldn't. It's analogous to why we templatize headwords and such. Templatized foreign-script languages, for example, allow for automatic translit. And likewise, the format can be changed, either by the end user through CSS or by editing the template -- e.g. if we figure out how to automatically use CSS to decide whether to put such an example on one line or multiple lines, which should definitely be doable since things like Bootstrap (a CSS library released by Twitter) can do it. Benwing (talk) 01:39, 11 August 2015 (UTC)
Is this something we even need to vote on? Is anyone against it? We've been templatizing usage examples for quite a while now and I don't remember anyone complaining. --WikiTiki89 01:45, 11 August 2015 (UTC)
I'm getting tired of all these pointless votes to be honest. —CodeCat 01:48, 11 August 2015 (UTC)
I too oppose votes on matters of formatting and template usage (and have stated as much in the past). Such votes could be seen as, at best, pointless, or as disruptive attempts to block the implementation of relatively minor changes by requiring the changes undergo more hurdles and meet a higher threshold (compare how US congresspeople use the filibuster to raise the threshold for passing legislation from 51% to 60%, blocking legislation which has enough votes to pass but not enough votes to come to the floor). Once before I started an "oppose having this vote" section on a vote, which garnered as much support as the vote itself; one could consider such an action if this vote is opened. (Side note, all the examples in the vote are English usexes, but I think it may be wise to consider English usexes — which don't need transliteration or translation — differently from foreign-language usexes.) - -sche (discuss) 02:14, 11 August 2015 (UTC)
Here's Wiktionary:Votes/2015-03/Templatizing topical categories in the mainspace; it has 50% support. Here's Wiktionary:Votes/2014-08/Migrating from Template:term to Template:m; it ended with 60% support. I find the above implication that editors at large should not have a consensus-based say in matters of template use in the mainspace and formatting in mainspace disconcerting. The wiki and template markup is the user interface and it matters a lot. The formatting instructions WT:ELE are a policy and cannot, in most circumstances, be edited without a vote. I oppose the use of ux and usex templates in English and Czech entries; it adds almost no value and makes the markup ugly to read. I never said so since I did not have the energy to do so; there are usually all too many things to discuss, in part since there are too many unnecessary changes being introduced by various editors without discussion. I have finally lost my patience, after seeing an editor chastise another editor for not using these templates. If I am a lone voice, the vote will easily pass. --Dan Polansky (talk) 08:03, 11 August 2015 (UTC)
As for "meet a higher threshold", can you clarify what the lower and the higher thresholds are in this particular Wiktionary situation? Do you consider 2/3 to be too high a threshold to pass? --Dan Polansky (talk) 08:09, 11 August 2015 (UTC)
You shouldn't create a vote before the issue has ever even been discussed. --WikiTiki89 10:33, 11 August 2015 (UTC)
The vote can be postponed as much as the discussion needs. Furthermore, overtemplatizing has been discussed, AFAIR. I remember one editor expressing his dislike of quotation templates and his preference for plain non-templated markup for attesting quotations; that's a case similar though not the same as example sentences. --Dan Polansky (talk) 11:49, 11 August 2015 (UTC)
  • Could someone remind me of what the benefit of this template is to new contributors, to passive users, or to others? If the benefit is a technical benefit that inures in a diffuse way to many, please explain. DCDuring TALK 12:30, 11 August 2015 (UTC)
See my comment up top about the benefit of the template, although there may be other reasons as well. Benwing (talk) 14:04, 11 August 2015 (UTC)
I asked not about the generalized benefits of templates, but of this one. I was hoping there were more.
So the total benefit is in the statement "the format can be changed, either by the end user through CSS or by editing the template -- e.g. if we figure out how to automatically use CSS to decide whether to put such an example on one line or multiple lines"
  1. What portion of our "end users" (admins? whitelisted editors? newbies?) will be trusted to make CSS changes of broad implementation? How would that work? Can you point to any examples or analogs in existing templates?
  2. Generally it seems that the features of templates quickly become Luacized, which dramatically reduces the ability of more casual contributors like me to make changes, especially since there is no group of responsive technical contributors willing to respond to requests, rather than implement their own cryptic agendas.
  3. All benefit depends on either:
    1. total implementation of a very capable (ergo, hard to develop successfully) template or
    2. allowing user-option non-use of the template when it fails to provide good output by the person using the template, ie, someone who knew or was willing to learn the switches etc.
But we do not even have consistent use of our existing format, which is almost certainly needed for successful mass conversion to the template approach. What steps have we taken to discover inconsistencies in formatting, to learn from them, and to either correct them or amend WT:ELE?
Our failure to successfully continue deployment of Autoformat worries me. The existing format-maintenance system seems to be a regression requiring much more manual involvement.
It would be much easier for me to accept changes if they did not make it harder for newer content contributors, did not require more typing, did not make editing harder by uglifying the edit frame, led to specific benefits that were achievable with reasonable certainty, and were implemented by a responsive group of technical contributors. Continued overtemplatization in areas for which we need more contributors, ie, definitions, usage examples, citations, seems approximately opposite to the direction we should go. DCDuring TALK 14:50, 11 August 2015 (UTC)

Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3[edit]

Some people recently mentioned they missed Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3, despite the fact that the vote was open for 5 months. Some of the people who missed it could have been User:Cloudcuckoolander, User:Ungoliant MMDCCLXIV, and User:DCDuring. I would like to encourage such people to post late votes, properly indented so that they do not count (e.g. #: Late '''oppose'''). We can't keep votes open forever, but we can continue to collect best evidence of consensus or its lack. Having a rationale accompanied with a late vote would be very preferable, I think. --Dan Polansky (talk) 10:13, 10 August 2015 (UTC)

Thanks. Nevertheless, pinging for votes, even after-the-fact, looks like electioneering, a use of discretion that biases the process. It is what political parties do in elections: get out their vote. As I recall, there is some policy (probably unenforceable) against using e-mail to solicit votes. This has the merit of being more transparent, but still. Is but still an includable idiom or just elision? DCDuring TALK 14:15, 10 August 2015 (UTC)
I see your point. By pinging, I notified three people who explicitly said that they missed the vote, two of whom are likely to oppose the vote and one of whom would support based on his past comments. At the same time, I posted to Beer parlour so everyone who monitors Beer parlour is indirectly notified. I don't know what better I could have done other than stay silent. Late votes won't change the vote result anyway but are interesting, so I think they are a good idea. --Dan Polansky (talk) 17:42, 10 August 2015 (UTC)
The BP note alone would be more to my taste. But, as I said, pinging from a well-watched page is at least transparent, especially compared to alternatives. DCDuring TALK 18:14, 11 August 2015 (UTC)

Rare senses x rare forms[edit]

I have noticed that the parameter "rare" of the {{template:context}} categorizes entries into the Category:Terms with rare senses by language, while the parameter "uncommon" into the Category:Rare forms by language. What is the difference between these two categories? Originally I thought that "rare forms" contains only forms of some lemma, which are rare (e. g. common Czech word "pes ‎(dog)" has a common plural "psi", but rarely "psové" can be found too), but the real content of the category does not look so. Jan Kameníček (talk) 00:19, 11 August 2015 (UTC)

The fact that various rare, historical, dated, archaic, and obsolete things are categorized differently is due to (1) a desire to categorize terms with only obsolete/rare senses (like heleth) differently from terms which are still current/common in some senses (like land), combined with (2) the fact that categorizing such entries differently requires a lot of work (edits to entries, templates, etc), most of which has not been done yet. I think the ideal/plan/hope is that one day terms like heleth will be in Category:English obsolete terms (I am not sure why Category:English obsolete forms exists with the name and content it has; as you note, it should properly be used only for e.g. low as a form of laugh), while land et al will be in Category:English terms with obsolete senses. (And likewise with rare things.) - -sche (discuss) 01:03, 11 August 2015 (UTC)

Retiring the codes of spurious languages[edit]

As of this year, the ISO has retired or has received requests to retire the following codes on the grounds that they are spurious and the languages they ostensibly refer to never existed. I suggest we also retire the codes.

  1. cbh Cagua, kox Coxima, cum Cumeral, ome Omejes, toe Tomedes, rna Runa. I quote from the change request forms (cbh, kox, cum, ome, toe): "Alan Wares, in correspondence with Barbara Grimes (5/28/1971), stated that [each one] should be deleted as 'non-existent.' Moreover, the Ethnologue has not added any information to the language entry in nearly 40 years. Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Landaburu 2000, Ortiz 1965) for the non-attestation of [it]." (rna's change request is similarly blunt about the total lack of evidence that it exists.)
  2. cbe Chipiajes and pod Ponares. These are surnames rather than language names. Quoth the change request form for cbe: "Alan Wares, in correspondence with Barbara Grimes (5/28/1971), stated that Chipiajes should be deleted as 'non-existent.' The only information that the Ethnologue has added for Chipiajes: 'A Sáliba surname. Many Guahibo also have that name.' Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Landaburu 2000, Ortiz 1965) for the non-attestation of Chipiajes." The comments on pod are similar.
  3. xbx Kabixí. See the change request form, where it is noted that the term Kabixí is a catch-all for any hostile tribe, and the linguist who studied it "concedes that there was no information on" it.
  4. iap Iapama. Quoth [3]: "There is no evidence that this language exists. No information has been added to the Ethnologue since the 1980s. Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Grenand & Grenand 1994, Gallois & Ricardo 1983) for the non-attestation of Iapama."
  5. svr Savara. Quoth [4]: "Hammarstrom (2014, in press) states that it has been checked quite carefully that no Dravidian language exists matching the name Savara or any of the other information in the entry (p.c. David Stampe 2011) , nor, for that matter, the Indo-Aryan Oriya variety labeled Sahara/Saora in Mahapatra (2002:183-184). Barb Waugh, in an email dated 10 November 2009, responded to queries about Savara stating that she did not believe that the language existed at all. She only knew Savara as an alternate name for Sora [srb], a Munda language. She pointed out that Ruhlen (1987) lists Savara as a Dravidian language. Around the same time, Kirk Miller (UCSB) wrote questioning the existence of this language."
  6. yds Yiddish Sign Language. See jewish-languages.org's entry for more. [has been removed from Wiktionary]
  7. btl Bhatola. See [5].
  8. myi Mina (India). See [6]. Neatly, retiring this will allow us to include hna Mina (Cameroon) without a disambiguator.
  9. pry Pray (aka "Pray 3", as Ethnologue called it because they just gave up on disambiguating it in any of their usual ways). This one is not strictly spurious; rather, it turns out to be no more than a duplicate of prt Prai (aka Pray). See the change request form with data from recent field research.
  10. yos "Yos" was retired and merged into zom "Zo", with the change request noting that Yos is simply the English plural of Yo which is no more than a variant form of Zo. [has been removed from Wiktionary]

If you have objections to any of these, speak up. (This list does not include codes retired by being split or merged [except pry and yos], or some other codes; I'll post about those later.) - -sche (discuss) 02:39, 12 August 2015 (UTC)

(Pinging because this is old.) Have we any translations into (or entries in) these? If so, I guess merge pry and yos, but what do we do with the others?—msh210 (talk) 20:43, 8 September 2015 (UTC)
@Msh210 We have no words (entries or translations) which are claimed to be from these languages, AFAICT. (Which is good, given that the linguists cited above are saying the languages don't exist.)
These aren't the first codes we or the ISO have retired for having nonexistent referents, btw.
I'll wait until the 15th, when the ISO/SIL say they will post their notes on which of the retirement requests they themselves accepted, before acting on most of these. But yos and yds, which they retired years ago, I'll go ahead and merge now. - -sche (discuss) 02:18, 10 September 2015 (UTC)
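The question above, whether any entries or translations still use these codes, could be checked mechanically. The following is only a sketch under a stated assumption: that templates carry the language code as their first positional parameter, as in {{t|cbh|...}}. The function name find_retired_codes is hypothetical; the code list is the one given at the top of this thread.

```python
import re

# Codes proposed for retirement in this thread.
RETIRED_CODES = {
    "cbh", "kox", "cum", "ome", "toe", "rna", "cbe", "pod",
    "xbx", "iap", "svr", "yds", "btl", "myi", "pry", "yos",
}

# Assumption: the language code is the first positional parameter of a
# template, e.g. {{t|cbh|word}}. Templates that put it elsewhere are missed.
TEMPLATE_LANG = re.compile(r"\{\{\s*[^{}|]+\|\s*([a-z]{2,3})\s*(?=[|}])")


def find_retired_codes(wikitext: str) -> list:
    """Return the retired codes used as a first template parameter, sorted."""
    found = {code for code in TEMPLATE_LANG.findall(wikitext)
             if code in RETIRED_CODES}
    return sorted(found)
```

Because some of these codes are also ordinary English words (cum, toe, pod), a template whose first parameter is free text could produce false positives, so any hits would still need to be checked by hand.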

About gloss parameter in term templates in all other sections, except etymology[edit]

I am not sure how helpful a gloss is in Derived and Related terms or even in Synonyms. A term, like any word, may have many different senses. These can include a figurative one, a literal one etc. A gloss definition in the etymology section is useful, since possibly (but not always) only one sense is the specific one that "caused" people to use the word (or phoneme). I came to these conclusions after @Saltmarsh pointed out that I should add some gloss definitions to my additions. I tried it to start with, but there were some "mind" troubles when I had to add terms that have more than one widely used sense or more than one gender. Someone might say that in such cases one should not add a gloss. But even if only one sense is widely used, we "provoke" the rejection of all other senses that the user may find in the article. --Xoristzatziki (talk) 06:05, 12 August 2015 (UTC)

Synonyms should use {{sense}}. Derived and related terms should specify a gloss but it can be of the form |gloss=foo, bar, baz with multiple defns; doesn't have to be all possible senses but should be the principal ones. Benwing (talk) 06:45, 12 August 2015 (UTC)
[I've been away]   Reasoning: the gloss may be interesting/relevant and having it there saves the user linking through to find out. User Benwing shows the syntax.   Gender: I think I would do this for multiple gender forms of different meaning:
Idiomatic phrases: I would normally link each term (do we want all of these with a separate entry) and these certainly need a gloss.  — Saltmarshσυζήτηση-talk 05:57, 15 August 2015 (UTC)
  • Synonyms and Derived terms sections should not specify a gloss, by my lights. It is not only my preference but also a long-term overwhelming practice not to specify a gloss. Some people prefer glosses, obviously, but it is nowhere close to being a prescribed or recommended practice. Similarly, these sections should not provide gender, IMHO, but here the practice varies language by language. As for rationale, glosses make these sections too busy with information that is available elsewhere. Gender is okay as for being too busy or not, but is available in the lemma, and IMHO not so important that it should be available in term lists. --Dan Polansky (talk) 17:42, 16 August 2015 (UTC)

Allow Etymology as level 4 header[edit]

Me and some others (I don't know who, or where the discussion was) have expressed in the past a desire to have the etymology section nested under part-of-speech sections, rather than floating alongside them (both on level 3) or having the part of speech nested under the etymology. I think it makes more sense to put etymology underneath an individual word:

  1. Users generally look up terms for their definitions, etymology is of lesser importance overall. Therefore, it makes more sense to put it below the definitions.
  2. Etymology always applies to a single word and part of speech. If it happens to apply to multiple parts of speech, then the chances are that one of them was first, and the others were derived from that. That's something we can and should note in the etymologies of each individual part of speech.
  3. Having to increase the heading level whenever there are multiple etymologies is annoying. It also makes it look less consistent; sometimes POS is level 3, sometimes level 4? Level 5 headings are hard to distinguish visually from level 4, so I think level 4 should be the highest level we use.
  4. For non-lemma entries, we generally don't have or need etymologies, but we're forced to create etymology sections for them whenever there is another word in the entry. For example, rose ‎(rise, past) needs an etymology header to separate it from the header for rose ‎(flower), but the etymology section itself is left empty or doesn't have any useful information, because the etymology is at the lemma, rise.

So I'd like to ask/propose that etymologies be allowed to be nested underneath the POS header, as level 4. It would be added below the definitions, usage notes and inflection headers, but above synonyms, antonyms and derived/related terms. This is done in accordance with the general principle in our entry layout that information about the current term precedes information about relationships to other terms.

This proposal is intended as an indefinite trial, to let users who prefer this alternative format apply it to entries and evaluate its merits and problems. The original format will continue to be allowed as well, at least until there is a decision to phase it out. —CodeCat 18:50, 12 August 2015 (UTC)

"If it happens to apply to multiple parts of speech" -- isn't that overwhelmingly common? It would be tiresome to have ety sections repeated all over the place when they are basically the same word/sense. Do other dicts do that? Equinox 18:54, 12 August 2015 (UTC)
I've not found it particularly common in the languages I've worked with, it's quite rare. Maybe English is just an exception. But this is why I'm not proposing to get rid of the old format just yet; we can still keep using it in situations that we haven't found alternatives for yet. That said, I think it's pretty easy to handle this with nested etymologies, as I noted in point 2. Just put the etymology on the term that was first, and the rest get etymologies saying they were derived from that first term. For example, up ‎(preposition) is derived from up ‎(adv), which our entry fails to note. —CodeCat 19:01, 12 August 2015 (UTC)
For many words, it might not be known which POS came first. Other dictionaries do not do this. They generally list all the parts of speech and give one etymology at the top or bottom of the entry (which sometimes mentions different derivations of specific senses of the word, but is still in one section). --WikiTiki89 19:12, 12 August 2015 (UTC)
"English is just an exception." And also merely, technically the host language of this wiki.
No matter how many times this is proposed, it still seems like a bad idea. The structuring advantage of having semantically related terms that are different PoSes is enormous for English. Though it is not particularly helpful for one not familiar with large dictionaries, it is quite helpful once one gets the hang of it. It is almost essential where there are homonyms both with, say, nouns as PoSes. Is it the proposal to combine all of the noun PoSes, no matter what the etymologies? We have spent a fair amount of effort trying to split etymologies where semantically warranted. To run the definitions through the blender as seems to be proposed seems like a regression. We may have come to accept them in technical areas as people abandon the project and their creations lapse, but I don't see why they should be allowed in content. DCDuring TALK 23:26, 12 August 2015 (UTC)
Where are you getting the idea that POS sections are going to be merged? They'll be split by etymology as they always have been. —CodeCat 23:40, 12 August 2015 (UTC)
  • If POS is level 3, and etym is level 4... how would the POS sections not be merged? I find this rather confusing. ‑‑ Eiríkr Útlendi │Tala við mig 00:40, 13 August 2015 (UTC)
And I'm confused that it confuses you, because it seems pretty simple to me. POS sections are not merged, they're kept separate as they are now. Nothing more to it. —CodeCat 00:54, 13 August 2015 (UTC)
Just for fun (and clarity), could you take one of the more complex entries and reformat it in your proposed style (perhaps within your userspace)? It'd be useful for reference. Equinox 01:11, 13 August 2015 (UTC)
Ok, can you give me one you had in mind? —CodeCat 01:25, 13 August 2015 (UTC)
I think what is being proposed is (for e.g. rose):
===Noun===
A flower. (blah blah blah, headword line template, synonyms, etc)
====Etymology====
From Oscan.
===Verb===
Past tense of "rise".
====Etymology====
Inflected form of "rise".
Which indeed obviously keeps the POS sections distinct. I'm not sold on such an arrangement, but one obvious benefit is that etymology could be pushed below the definitions, which some people have favoured. - -sche (discuss) 03:01, 13 August 2015 (UTC)
  • To clarify my concern about merging POS bits, I'm not talking about nouns and verbs being thrown together. Instead, I'm concerned about terms that have multiple senses of a single POS type, and where those separate senses have different etymologies. If the etymology header is made subordinate to POS, then things get confusing pretty quickly. Consider the Japanese entry at , for instance. This term has nine different noun senses, all with distinct etymologies (and eight distinct pronunciations even). The proposed structure of an ====Etymology==== header at level 4, under a ===Noun=== header at level 3, would make this entry a complete mess. ‑‑ Eiríkr Útlendi │Tala við mig 18:07, 14 August 2015 (UTC)
See diff for an example of how I think etymologies should be handled. Each POS has its own etymology, including (especially) forms of another lemma. Two different lemmas can't possibly have the same etymology, because after all, if they have the same development history, why are they still different?
As for Eirikr's entry above, I'm not really seeing the issue. The entry already has one etymology section for each POS, so all that would be left to do is to switch the headers around. —CodeCat 20:09, 18 August 2015 (UTC)
  • CodeCat, have another look. As visible in the page's TOC, some single etymologies cover multiple POSes -- noun and prefix, noun and suffix.
In addition, I still don't quite understand your proposed layout. Further up the thread, it sounds like all nouns would go together under a single ===Noun=== header -- which then leaves me wondering how the disparate etymologies would be accounted for. Even if your intention is to have as many ===Noun=== headers as there are etymologies, this produces a strange circumstance where we are organizing higher-level headers in a way that's dependent on lower-level headers. Just in terms of hierarchical organization, that seems backwards.
And that still doesn't account for the case where an entry has multiple etymologies, and some of those etymologies apply to multiple POSes. Numerous Japanese terms have a single spelling, with multiple POSes under a single etym and pronunciation. Fewer, but still numerous, entries have multiple separate etymologies, each etym with its own pronunciation and possibly multiple POSes.
Would you be willing to edit the entry into your proposed structure, as you did for dice? A more concrete example would illustrate things more clearly, I think. ‑‑ Eiríkr Útlendi │Tala við mig 21:07, 18 August 2015 (UTC)
See User:CodeCat/ja. Since I didn't know the etymologies of all the terms, I had to make something up. —CodeCat 21:44, 18 August 2015 (UTC)
An alternative.​—msh210 (talk) 20:57, 8 September 2015 (UTC)

Why don't we have an Unattested namespace?

Putting unattested terms in the Appendix namespace gives no information about why they are there or how they differ from other appendices. Why not give them their own namespace? There is nothing particularly appendix-like about them. DTLHS (talk) 02:57, 13 August 2015 (UTC)

New namespace I'm confused: how would a new namespace help? —Justin (koavf)TCM 03:06, 13 August 2015 (UTC)
If you're talking about reconstructed proto-language terms (versus, say, this kind of unattested terms), I think giving them their own namespace (say, "Reconstructed:") would be a fine idea. We could even perhaps then write a Mediawiki: page or, at worst, some js/css, to automatically display the "this term is reconstructed" warning atop such pages, which people currently have to remember to add manually. - -sche (discuss) 03:08, 13 August 2015 (UTC)
Right, reconstructed, sorry. DTLHS (talk) 03:11, 13 August 2015 (UTC)
It would also make it easier to parse reconstructed pages, which should be treated like all other namespace pages, vs appendix pages which mostly should not be. DTLHS (talk) 03:30, 13 August 2015 (UTC)
I was a bit confused by the use of "Unattested" at first, (Appendix:English unattested phobias and Appendix:English dictionary-only terms come to mind) but for reconstructed terms I, too, support the idea of creating the separate namespace Reconstructed:. --Daniel Carrero (talk) 06:02, 13 August 2015 (UTC)
We should name the namespace so as to include constructed languages as well. --WikiTiki89 06:10, 13 August 2015 (UTC)
I definitely support a Reconstructed: namespace. I don't think we should include appendix-only constructed languages in it. What we should do with them, I don't know, but muddling the reconstructed namespace with them is a bad idea and would take away some of the technical benefits of such a namespace. My own personal preference is to just delete them altogether. —CodeCat 11:41, 13 August 2015 (UTC)
I support a Reconstructed: namespace too, without conlangs. They can have a namespace of their own, e.g. Conlang:. —Aɴɢʀ (talk) 12:19, 13 August 2015 (UTC)
I also support a Reconstructed: namespace. Not sure about conlangs; either they should go into Conlang: or into the main namespace. Arguably, conlangs that are well enough attested should go into the main namespace and others shouldn't be included at all. If we use a Conlang: namespace, where do we draw the line? Esperanto was originally a conlang, too, but we put it in the main namespace. Same with Lojban, for example. Benwing (talk) 12:53, 13 August 2015 (UTC)
This is why I prefer deleting them. It's a bit strange if we say "yeah, we don't actually allow these conlangs, but if you hide them away in an appendix then it's ok". —CodeCat 12:54, 13 August 2015 (UTC)
Question: What about reconstructed terms in, say, Vulgar Latin? What differentiates them from terms in Proto-Romance? Our entry claims VL and Proto-Romance are synonyms, but w:Vulgar Latin says that the two are "often confused". DCDuring TALK 13:33, 13 August 2015 (UTC)
Something to consider: should the pages in this new namespace be named with the language name as they are now? Or should we have entries named only with the headword, like in the main namespace? —CodeCat 16:33, 13 August 2015 (UTC)
There would be some pages with multiple proto-languages on them, e.g. the strings *me- and *ke- are so short that they're surely found in more languages than just proto-Algonquian. OTOH, we handle that just fine in the main namespace. - -sche (discuss) 18:23, 13 August 2015 (UTC)
I'm in favor of organizing them like the main namespace rather than like the current layout, e.g. /wiki/Reconstructed:bʰer- with a ==Proto-Indo-European== heading rather than /wiki/Reconstructed:Proto-Indo-European/bʰer-, where the ==Proto-Indo-European== heading would be redundant. Maybe we could pick a shorter name for the namespace though, like Proto:. —Aɴɢʀ (talk) 18:18, 14 August 2015 (UTC)
Proto: would not work for non-protolanguage reconstructions. —CodeCat 18:25, 14 August 2015 (UTC)
It would work, it just wouldn't be the optimal name. "Recons:", maybe? I just don't feel like typing out "Reconstructed:" all the time. —Aɴɢʀ (talk) 18:45, 14 August 2015 (UTC)
I would suggest "R:" were it not for the fact that that would conflict with how we name and transclude reference templates. - -sche (discuss) 18:47, 14 August 2015 (UTC)
The software allows for namespace shortcuts. WT: is a shortcut to Wiktionary:. —CodeCat 18:57, 14 August 2015 (UTC)
Yeah, I (and, on my talkpage, JohnC5) have thought about the utility of having more namespace shortcuts, e.g. AP: for appendices. The shortcut might still have to be RC:, though, since I suspect the existence of an R: namespace (even as a redirect) might cause {{R:OED}} to be interpreted as a transclusion of R:OED rather than Template:R:OED (certainly I would expect it to fail to reach Template:R:foo for any {{R:foo}} where R:foo was a page). Side note, @Angr, how often would you be typing out rather than copy-pasting the first part of the pagename (Reconstructed:) given that the second part would probably contain characters like ɸ or ʰ₂r̥ that you'd have to copy-paste or insert from the edittools? Perhaps we could add Reconstructed: to the things edittools can insert... - -sche (discuss) 19:11, 14 August 2015 (UTC)
Even then, all our linking templates already treat * as a shortcut to reconstructed pages. So you'd only need to type the namespace name in the very rare occasion that you're not using a linking template. —CodeCat 19:23, 14 August 2015 (UTC)
I feel like I waste hours of my time typing the words Appendix and Category. If the abbreviations AP and CT respectively existed, I would be very pleased. Also Temp or TP for Template would be great for that matter. I don't see why we don't have more of these. I also support the creation of the Reconstructed namespace. —JohnC5 19:28, 14 August 2015 (UTC)
@CodeCat: Separate to this discussion, could we look into adding those shortcuts to the search bar? —JohnC5 12:58, 19 August 2015 (UTC)
Ideally, we'll get a few more users to chime in here supporting such namespace-redirects. Then we can file a Phabricator ticket asking for (a) a 'Reconstructed' namespace, and (b) 'RC'→'Reconstructed', 'AP'→'Appendix' and 'CT' (or maybe 'CA', since 'CT' sounds like 'Category talk' although we almost never have discussions on Category talk pages) → 'Category' namespace-redirects. It shouldn't be hard / take long for the devs to grant such things to us. - -sche (discuss) 07:49, 20 August 2015 (UTC)
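
The concern raised above about an R: namespace can be illustrated with a simplified model of how a transclusion target might be resolved. This is only a sketch in Python, not MediaWiki's actual code: the function name resolve_transclusion and the namespace sets are hypothetical, and it assumes the rule suspected above, namely that a name whose prefix matches an existing namespace is looked up in that namespace, while any other name is looked up in Template:.

```python
def resolve_transclusion(name, namespaces):
    """Return the page a {{name}} call would point at under the assumed rule.

    `namespaces` is a set of namespace names (hypothetical data, not read
    from any real wiki configuration).
    """
    prefix, colon, _rest = name.partition(":")
    if colon and prefix in namespaces:
        # The prefix is a real namespace, so the name is used as written.
        return name
    # Otherwise the call falls back to the Template: namespace.
    return "Template:" + name


current = {"Wiktionary", "Appendix", "Category"}
with_r_shortcut = current | {"R"}   # hypothetical: an R: namespace or alias exists

print(resolve_transclusion("R:OED", current))
print(resolve_transclusion("R:OED", with_r_shortcut))
```

Under these assumptions the first call still reaches Template:R:OED, while the second no longer does, which is why a shortcut such as RC: would avoid the clash.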
  • I agree that conlangs should be handled separately from reconstructed terms. In contrast to how we handle proto-languages, our current approach to conlangs actually is fairly well suited to the appendix namespace, in that we have one page (one appendix, total) on each conlang. However, most of them are constant copyvio magnets, since we can only allow short appendices, but the inclusion of any appendix at all tempts people to expand said appendix: see e.g. [7] (BP discussion of copyright issues). I wouldn't mind deleting most of them, perhaps moving a few (de minimis) words into our mainspace entries on the names of the conlangs, using {{examples-right}}, like this. - -sche (discuss) 18:23, 13 August 2015 (UTC)
  • Support a separate namespace for reconstructed languages (for one, it's by far the busiest part of the Appendix: namespace, and trying to find out whatever is going on with all the other appendices is a pain). — I do not think that a mainspace-type approach to lumping "homographic" roots from different protolangs on the same page is a good idea though. Notational systems for protolangs vary greatly, and this could imply a senseless amount of repetition of "Alternate spelling of…" sections in the future. The basic object of protolang pages is an etymological group, not the graphical representation of its proto-form, per se.
    In fact I could suggest that the new namespace be named simply Etymology:, and that it could include appendix pages tracing the descendants of attested words just as well (a la Appendix:Names derived from Marcus). --Tropylium (talk) 14:10, 22 August 2015 (UTC)


Eskayan

Last year, the ISO approved the code esy for Eskayan. Should we follow suit, and if so, should we allow it in the main namespace? It's technically a conlang from the early 1900s, but it comes with a mythology that claims it's much older and it functions as a medium for recording traditional stories (both in Roman script and in a native script which lacks an ISO 15924 code). It has no native speakers but a few hundred secondary speakers and a few schools to teach it. - -sche (discuss) 06:50, 14 August 2015 (UTC)

I've added it to Module:languages. It is spoken by a few hundred people, and schools teach it and literature is published in it and has been for almost a century, so I suppose it is allowed in the main namespace like Esperanto. Its creator intended it for widespread use (by his ethnic group) and attributed it to his tribe's mythical ancestor rather than to himself, and then he (the actual creator) died in 1949, so as far as copyright concerns go, it seems similar to e.g. Esperanto and different from e.g. Dothraki. Shall we update WT:CFI#Constructed_languages to note the existence and inclusion of Eskayan, or is that not necessary because the ISO doesn't categorize esy as a constructed language, and it does not itself admit that it is one (even though it is identifiable as such by linguists)? - -sche (discuss) 19:03, 16 August 2015 (UTC)
I don't see any reason to exclude it.--Prosfilaes (talk) 20:32, 17 August 2015 (UTC)
@-sche: Reading up on it, I see that it's pretty much relexified Boholano Cebuano. If that's the case, it resembles the avoidance registers of Australia or the pandanus languages of New Guinea, which we treat as part of whatever language's grammar they have. Perhaps, then, we ought not to be including Eskayan on those grounds instead. —Μετάknowledgediscuss/deeds 02:20, 18 August 2015 (UTC)
That seems to value internal consistency over ease of use and external consistency. If the world treats it as a separate language, it seems like people looking it up are going to be expecting it to be a separate language.
Also, people looking up a language that has multiple known registers are going to know about the registers. It's a lot easier for students and the like to get confused if we mix Eskayan words in with Boholano Cebuano words, no matter how they're labeled.--Prosfilaes (talk) 04:56, 18 August 2015 (UTC)
Right. Furthermore, I'm not sure treating Eskayan as Cebuano would even provide internal consistency: if (as is my understanding) the entire lexicon is different to the point that there is zero mutual intelligibility, on what basis would we consider them the same language, while considering languages with very similar grammars and lexicons (say, Danish and Swedish) to be distinct? It's my impression that even the largest avoidance registers contain only a fraction of the number of words the main language possesses. - -sche (discuss) 03:27, 26 August 2015 (UTC)
  • I'm big on recording what people are actually using to communicate. On the other hand, there's a lot of missionary-mangled versions of languages that aren't really worth bothering with, and this looks like it might be just another example. If someone wants to do it, I'm not going to object.--Prosfilaes (talk) 07:33, 29 August 2015 (UTC)
    • I agree. If islanders use this language to communicate amongst themselves, then it would seem to be comparable to (a much less widespread form of) Esperanto, or even to Michif with only the difference that the group of people who created it lived recently enough to be identifiable by name rather than lost to the mists of time. But if it never gained use outside of the missionaries' materials, then it would seem comparable to other failed attempts at language-blending conlangs. - -sche (discuss) 16:57, 29 August 2015 (UTC)

Neo and Talossan (the two ISO-coded conlangs CFI doesn't specifically address)

Quoth CFI as updated to reflect current ISO numbers, in addition to the 7 (self-identified-as- and identified-by-the-ISO-as-) constructed languages which are approved for inclusion in the mainspace, there are 14 more languages which are classified as constructed languages, of which 9 "have not yet been approved for inclusion in the English Wiktionary", and are included in appendices: these are languages like Láadan. "Another 3 of those fourteen languages are prohibited", namely Quenya, Sindarin and Klingon, which are also included in appendices.

  1. What is the difference between being 'not approved' and included only in appendices, and being 'prohibited' and included only in appendices?
  2. What should be done with the two languages which are left out of the above count (9+3=12≠14), Neo and Talossan? Should they be 'not approved' and limited to appendices, or 'prohibited' and limited to appendices, or something else?
  3. What should be done with WT:BP#Eskayan, discussed above, which the ISO does not classify as a constructed language but which is identifiable as one?

- -sche (discuss) 19:29, 16 August 2015 (UTC)

I think we need to overhaul that part of CFI a bit. Instead of listing languages and thus being both messy and incomplete, we should make it clear that those 7 languages are approved, and no other languages that the ISO considers to be constructed languages may have entries in mainspace. That would leave Eskayan just like any other language, which I think is fine. —Μετάknowledgediscuss/deeds 23:48, 17 August 2015 (UTC)

Two romanization headers in a row

In entries like de and lei, is it preferable to have two romanization headers in a row (one with the "form of X" templates and one with the "nonstandard form of Y" templates), or only one header, like so? - -sche (discuss) 03:52, 17 August 2015 (UTC)

I think it's preferable to have a single Romanization header in such cases. —Aɴɢʀ (talk) 18:44, 17 August 2015 (UTC)

Notes as a valid L3 (esp. along References)

Copied from a related discussion, for separate discussion. (Link removed from sig not to ping unintentionally.) Neitrāls vārds (talk) 06:50, 17 August 2015 (UTC)

As I see it, a discussion on allowing "Notes" as a valid header should be considered.

Vahag has brought this up (Wiktionary:Grease_pit#.7B.7Breflist.7D.7D) and I'm running into a similar problem all the time. As ridiculously silly of an argument as it may be, I do, in fact, agree that numbered and bulleted references together look ugly AF. (I have even gone to such ridiculous steps as removing a reference that didn't add anything critical just because it was bulleted while the other ones were numbered because of how unappealing it looks.)

In more general terms, I kind of get the feeling that there seems to be consensus that references are in fact valuable and add value to the entry, perhaps the discussion should focus more on how to allow more elegant ways of faithfully citing content, particularly in "controversial" cases, e.g., obviously one bulleted reference is enough under, say, an assertion that et kala and liv kalā derive from the same source because, well, it's pretty obvious but then if there is a "weird" controversial cognate there isn't even a way of citing it inline (unless you want the awful looking mixing of numbered and bulleted refs.) Neitrāls vārds (talk) 11:09, 12 August 2015 (UTC)

  • Support having a ==Notes== section separate from ==References==, esp. when both exist. Benwing (talk) 11:21, 12 August 2015 (UTC)
Where in the standard order of headers would this be placed? —CodeCat 18:48, 17 August 2015 (UTC)
Right before ==References==. --WikiTiki89 18:53, 17 August 2015 (UTC)
Am I understanding correctly, then, that the notes section would apply to all POS sections collectively rather than any specific one? —CodeCat 18:59, 17 August 2015 (UTC)
The way it is now (perhaps unofficially) is that the ==References== section may be found in an entry with one etymology as an L3 or L4 or in an entry with more than one etymology as an L3, L4, or L5 (my personal preference is never to have it as an L3 with more than one etymology, so I usually fix these cases). --WikiTiki89 19:04, 17 August 2015 (UTC)
My preference is the opposite, to have it always at L3. —CodeCat 20:00, 17 August 2015 (UTC)
If they are always tagged with <ref> tags, then your way may be better, but often the ==References== section is just used to list references that apply to an entire section, in which case you need to know which section that is. You can have a different set of reference links for each etymology section or even each POS section. --WikiTiki89 20:04, 17 August 2015 (UTC)
Your point is valid, but I don't like references that don't use ref tags to begin with. "Section-wide" references tell you nothing about what comes from where. All they do is say "these references were somehow involved in the creation of this entry", which is rather vague. —CodeCat 20:11, 17 August 2015 (UTC)
This. His point is invalidated since we shouldn't have references without ref tags. — LlywelynII 14:07, 18 August 2015 (UTC)
Support. This will also help prevent misuse of the Usage notes section. I frequently run across entries whose “usage notes” have nothing to do with how the word is used (arachnogenic necrosis is the latest example). — Ungoliant (falai) 19:05, 17 August 2015 (UTC)
  • Oppose. The solution to the layout problem is to not use bulleted references. Usage notes already covers any notes relevant to the entry. Anything that would go into a Wikipedia entry's "Notes" section should either be addressed in the appropriate section directly (as with contested etymologies) or simply removed (as with Ungoliant's "a. necrosis" example). Giving people yet another section in which to include errata isn't an actual solution to the problems people are listing. — LlywelynII 14:07, 18 August 2015 (UTC)
    • IMO this makes little sense. You basically think people should never create references sections listing refs; that's an impossible standard to meet and often way too awkward. In the Arabic entries that I add, I routinely add a "References" section under each part-of-speech entry listing the books where I got the entry definitions from. There's no simpler way of doing it, since the reference really does refer to the POS section as a whole in most cases. And many languages do this. So we really do need Notes and References separate. Likewise if we're using Harvard-style references, with short footnotes under "Notes" that are linked to a list of references under "References". Benwing (talk) 14:17, 18 August 2015 (UTC)
      • IMO you're confused as to what's being proposed which probably goes back to the original discussion's misunderstanding of Wikipedia's #Notes section. #Notes (as the name implies) are for actual notes; they are not for references of any sort. #References are for both generated inline references (what's being called numbered references here) and bibliographic lists. If you feel the layout requires it, you can create subsections for #Citations and #Bibliography or #Works cited or #Whathaveyou.

        There's no call whatsoever for a (second) #Notes section at Wiktionary and creating one will increase the level of errata our users will add to entries, which the editors above felt to be a problem. The w:1st rule of holes suggests not expanding the areas of the entry devoted to random information, beyond that included in the existing and needful areas.

        As for having a subsection of #References for linked #Citations and another for stand-alone #Works... I fall back on my position that you're just being lazy and should create appropriate references as you create entries. At the same time, there's no real problem with creating a subsection within #References to deal with the layout issues, if people really want harvard style references and a separate list of works. But that discussion has nothing to do with a #Notes section. — LlywelynII 14:25, 18 August 2015 (UTC)

        As an example of what I mean, I patched up բալախ, the original entry that prompted this discussion. Note that having a separate inline section means that the inline citations should not fully duplicate the information in the bulleted list. It should be kept terser, with the full information on the source given below. — LlywelynII 14:38, 18 August 2015 (UTC)

        Here's an edit after the #Citation section has been made terser and the bibliographic info has been moved down to the #Bibliography section. Obviously it could be made more helpful and nicer with some of Wikipedia's inline citation templates like sfnp, which create automatic links to the full citation info. — LlywelynII 14:55, 18 August 2015 (UTC)
        WT:NOT#Wiktionary is not Wikipedia. We can do things differently. --WikiTiki89 14:32, 18 August 2015 (UTC)
        We can, but having an infelicitously-named #Notes section is really not a good place to start. If it's intended for storing inline references, it still belongs in the #Reference section. — LlywelynII 14:34, 18 August 2015 (UTC)
        Well we could have the ==Notes== section actually be notes that reference the ==References== section, like Benwing mentioned. --WikiTiki89 14:39, 18 August 2015 (UTC)
        A #Note section giving notes on the #References section would be a section of commentary on the sources being used for the entry. That's completely different from what Benwing was discussing and doesn't seem particularly helpful itself, either. — LlywelynII 14:55, 18 August 2015 (UTC)
While we're on this subject, note that I cannot use inline references for two language sections simultaneously without resorting to ugly tricks. See գութ. --Vahag (talk) 14:48, 18 August 2015 (UTC)
Sure you can. You either duplicate the information in each section or you use a named reference, with a #Reference section below both. I do have to admit I'm confused, though. Your example գութ doesn't have any reference shared between its two sections. Was there one you wanted to share or was it just a bad example? — LlywelynII 14:59, 18 August 2015 (UTC)
Hmm, I wanted to do this and I could swear that format didn't work before. It does now, so I withdraw my comment. --Vahag (talk) 15:29, 18 August 2015 (UTC)
  • Oppose. I also oppose the notion (expressed by some above) that all references need to use ref tags. In particular, because Wiktionary has a longstanding practice, which I support, of not citing other dictionaries inline for definitions, but Wiktionary does allow other dictionaries ("mentions") to verify words in many languages, there will always be many entries which have references which apply to the whole entry, as Benwing notes. I personally don't find a mix of bulleted and numbered citations problematic, but if you do, a solution like the one deployed on բալախ is preferable to a new section which, I agree with Llywelyn, is unnecessary and also apparently misunderstanding what Wikipedia uses ==Notes== for (hint: not references, but actual clarificatory notes, which often don't cite references). Practically speaking, the continued use of "related terms" by new users to mean "semantically related" when it actually is for "etymologically related", and the only very slight distinction that is proposed to be made between ==References== and ==Notes==, convinces me that only a few veteran adepts would use ==Notes== correctly, and other people would either not use it "correctly" vis-a-vis ==References==, or fill it up with trivia. - -sche (discuss) 15:16, 18 August 2015 (UTC)
    • But those are completely different things. There's references ("see here for more") and sourcing ("we got this information from here"). Mixing them into the same references section is bad. I have no problem with listing external reference works, but treating them as sources or mixing them in with sources is very bad. External reference works should, surely, go in the "external links" section, the "references" section should be kept for sourcing only. —CodeCat 20:06, 18 August 2015 (UTC)


See Talk:𐤋𐤏𐤁. Seems we have a hundred-odd entries whose headwords are perfectly correct but whose article titles are written backwards for no apparent reason. (Nothing came up searching the beer parlor but there may have been a discussion about this elsewhere. If so, just kindly link to it.) — LlywelynII 23:33, 17 August 2015 (UTC)

Could this be a problem with the wiki editor (and/or the user's browser)? I mean, if you start typing Hebrew or Arabic, it will correctly switch to right-to-left mode. But it doesn't necessarily "know" about every language. Equinox 23:51, 17 August 2015 (UTC)
The problem is that even though Unicode designated Phoenician as right-to-left, most fonts seem to display Phoenician characters left-to-right. And because of this, the editors who created these entries entered the letters backwards in an attempt to get them to display correctly, so the article titles are actually wrong. --WikiTiki89 02:18, 18 August 2015 (UTC)
Ah. So it's a well-meaning problem all around: the original editors were trying to get it to display correctly; the programmers got around to formatting that language to process correctly; implementing the new coding has now made the existing pages display incorrect backwards names which are getting copied onto other people's work elsewhere on the internet. So, we just need to go fix this, right? Is this something easily automated or do we just slowly do it by hand?

And will the entries now alphabetize correctly? or do they need special treatment in their DEFAULTSORTs? — LlywelynII 13:48, 18 August 2015 (UTC)
This must be done manually by someone with enough familiarity with Semitic languages (such as myself). The entries are very inconsistent. Some are correct, and some are incorrect in different ways. And yes, they will alphabetize correctly after this. --WikiTiki89 14:02, 18 August 2015 (UTC)
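
The diagnosis in this thread (letters entered in reverse order to compensate for fonts that drew a right-to-left script left-to-right) can be sketched in Python. This is a minimal illustration, not a proposed bot: the function names are hypothetical, the sample title is spelled out with code-point escapes purely as an example, and, as noted above, the real entries are inconsistent enough that a blind reversal would not be safe.

```python
import unicodedata


def is_right_to_left(text):
    """True if every character has a right-to-left bidi class (R or AL)."""
    return all(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def reverse_code_points(title):
    """Reverse the stored (logical) order of the code points in a title.

    Adequate for plain Phoenician letters, which carry no combining marks;
    it is not a general-purpose way to reverse Unicode text.
    """
    return title[::-1]


# Illustrative title stored as LAMD, AIN, BET (U+1090B, U+1090F, U+10901).
typed_backwards = "\U0001090B\U0001090F\U00010901"
corrected = reverse_code_points(typed_backwards)

print(is_right_to_left(typed_backwards))
print([unicodedata.name(ch) for ch in corrected])
```

The bidi check only confirms that Unicode treats the letters as right-to-left; deciding whether a given title was actually typed backwards still needs someone who can read the word.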

Nouns mostly used in plural - redirection to singular

I see reduction of content going on in nouns often used in plural, via soft redirection to singular forms. That includes crocodile tears, savings, and scrambled eggs. This seems inferior to me and I would like to revert. We should IMHO host the definitions in the most common form, and if the most common form is the plural, we should host it in plural. What do you think? Does anyone have a link to a previous discussion? --Dan Polansky (talk) 19:10, 18 August 2015 (UTC)

One concern I expressed in this previous discussion (see also this one) was that most people are able to figure out when a word is plural even if they can't tell what it means, and will look up the base form (e.g. foobar, if what they see in the text is "the foobars are blah"), so unless there is some explicit and obvious notice that additional senses are to be found in the plural's entry, readers may never think to look there.
If all senses are most common in the plural, I agree that the plural should be the lemma, with the singular using Template:singular of or a similar template. If only some senses are most common in the plural, I think it's more helpful to the reader to have them all in one place with appropriate labels (like "chiefly plural"). I could live with splitting them, though, as long as there were explicit, obvious notices to readers that they need to look in the other entry for more senses. (I don't think bare Template:singular of as an additional definition-line after some substantive definitions makes it sufficiently obvious that there's more semantic information to be found in the plural, but Template:singular of with a gloss specified might work.) - -sche (discuss) 19:38, 18 August 2015 (UTC)
The way I see it, there is one and only one lemma entry (one with definitions, inflection, -nyms, etc) per lemma. A single lemma should not have more than one lemma entry. So either these should all be concentrated on a single lemma page, as our normal practice is with respect to lemmas and non-lemmas, or we should treat them as separate lemmas entirely and keep them completely separate. I have done this with some entries as well, such as dialectics and darts. Note that in the former case I made sure to split the etymology as well, as different lemmas always have different etymologies. —CodeCat 20:20, 18 August 2015 (UTC)
We also need to establish some limit for how much more common the plural is. According to bgc ngrams, shoes, eyes, and feet are all somewhat more common than their corresponding singulars, but I wouldn't want to treat the plurals as the lemmas. —Aɴɢʀ (talk) 06:28, 19 August 2015 (UTC)

How can we improve Wikimedia grants to support you better?


The Wikimedia Foundation would like your feedback about how we can reimagine Wikimedia Foundation grants, to better support people and ideas in your Wikimedia project. Ways to participate:

Feedback is welcome in any language.

With thanks,

I JethroBT (WMF), Community Resources, Wikimedia Foundation. 05:24, 19 August 2015 (UTC)

What to call plural noun lemmas?

We have the template {{en-plural noun}} to categorise nouns whose lemma is grammatically plural. But this template also categorises in Category:English pluralia tantum. Is every noun that is used primarily in the plural a plurale tantum? I'm thinking a better category name would be Category:English plural nouns or Category:English plural-only nouns. —CodeCat 14:28, 19 August 2015 (UTC)

Very many "plural only" nouns can be found to be attested in the singular, eg scissor. It would, IMO, be misleading to eliminate the category for this reason, but it means that we need a good explanation in the category header. If we have a good explanation, we don't need to worry as much about the category name. I think what users need to know is not that the lemma is plural in form, but whether it is more commonly ("correctly") used ("agrees") with a singular or plural verb. I think this is an empirical question for many such terms, rather than something that follows from the categorization. I wonder whether the category shouldn't be hidden and the "plural-only" display replaced with something that focused on the agreement issue. As a hidden category it would retain its usefulness in directing contributors to reviewing the entries to determine whether they adequately and correctly addressed the agreement issue. DCDuring TALK 14:42, 19 August 2015 (UTC)
scissors pl ‎(normally plural, singular scissor). We can call the category Category:English plural nouns (and use it only for lemmas, not forms-of). --WikiTiki89 14:50, 19 August 2015 (UTC)
With such nouns that do have a singular, we have to ask what the singular actually means. For the derivation singular > plural it's easy, it is simply multiple of a thing. For plural > singular, if the plural form clearly does refer to multiple objects, then I'd reason that it should simply be a non-lemma and the singular is the lemma. But for plural nouns that are not clearly multiple instances of something, it's more difficult. "Scissors" is a single object, so a hypothetical singular form doesn't have a predictable meaning. What is a "scissor"? Saying it's the singular of "scissors" doesn't actually make it clear what it is. So I think that we should evaluate cases where the singular parameter of this template has been specified. —CodeCat 15:00, 19 August 2015 (UTC)
I agree (but that shouldn't prevent it from being on the headword line, just in case that's what you were implying). And the same is the case with plurals of proper nouns, such as Islams; just calling it the "plural of Islam" doesn't explain what it means. --WikiTiki89 15:04, 19 August 2015 (UTC)
"A scissor is for cutting"; "A scissors is for cutting"; "Scissors are for cutting" (could refer to one or multiple pairs of scissors). The pattern doesn't apply to spectacles/glasses.
What label and what category name should be applied to scissors and to glasses/spectacles? DCDuring TALK 16:17, 19 August 2015 (UTC)
Yes, but what is a scissor? Some would say it is one half of a pair of scissors. Others would say it is one pair of scissors. Others would say it is one instance of a scissoring motion. But none of that is clear from the definition of scissors. --WikiTiki89 16:21, 19 August 2015 (UTC)
I don't think it is used much to mean "one of the two parts of a pair of scissors", despite the apparent use of scissor in just that sense in pair of scissors. We are long past the time when there was a significant group of speakers who used scissor that way. DCDuring TALK 16:29, 19 August 2015 (UTC)
Challenge accepted. --WikiTiki89 16:34, 19 August 2015 (UTC)
I don't doubt that you can find current attestable usage of scissor in the sense you have dredged up from history and etymology. I think it is more likely the subject of humor (eg, George Carlin) than conversation that adheres to the Gricean maxims, in particular "Avoid obscurity of expression" and "Avoid ambiguity" (presumably in context). DCDuring TALK 16:46, 19 August 2015 (UTC)

Guidance requested on religious terminology

Quaker terms: I would like to make entries or a listing for Quaker-related terminology, as some of it is very particular, but I'm not sure if it belongs in the main body of the dictionary, in an appendix, or what-have-you. For instance, Quakers traditionally didn't refer to the days of the week by their common pagan-derived names but used "first day" for Sunday, "second day" for Monday, etc. I could easily imagine someone reading about a "Friend going to meeting-house on first day" and not realizing that this means a "Quaker going to church on Sunday". Should I create entries for all of these terms or simply something like Appendix:Quaker terminology? Thanks. —Justin (koavf)TCM 02:17, 20 August 2015 (UTC)

  • Be bold, and make a start. We'll let you know if you do anything wrong. SemperBlotto (talk) 05:20, 20 August 2015 (UTC)
    • Do we have a context label for Quakerism? If not, we should make one. —Aɴɢʀ (talk) 09:49, 20 August 2015 (UTC)
      • A strategy is to start with a simple, but formatted, list in an Appendix, with lines such as * {{l|en|first day}}, yielding first day. That would enable you to see how many of the terms already existed in English (blue link), possibly with the right definition, how many required a new English section (orangish link), and how many needed new entries (red link). Each of these situations can be sped up by having specific cut-and-paste text. DCDuring TALK 18:06, 21 August 2015 (UTC)
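
A list of this kind could be generated rather than typed by hand. The following is only a minimal Python sketch, assuming the terms are kept in a plain list; the function name `appendix_lines` and the sample terms are illustrative and not an existing Wiktionary tool. Only the `{{l|en|...}}` link template comes from the suggestion above.

```python
def appendix_lines(terms, lang="en"):
    """Return one wikitext bullet per term, each wrapped in the {{l}} link template."""
    # %-formatting is used so the literal template braces need no escaping.
    return ["* {{l|%s|%s}}" % (lang, term) for term in terms]


# Illustrative terms mentioned earlier in this thread.
quaker_terms = ["first day", "second day", "meeting-house"]
print("\n".join(appendix_lines(quaker_terms)))
```

Pasting the printed lines into the appendix would then show, by link colour, which entries still need work.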

@DCDuring:, @Angr:, @SemperBlotto: A lot of them are at Appendix:Quakerism. There are probably a few more but I'm tired now. Do you think that a context label and tracking category would be useful? Thanks. —Justin (koavf)TCM 03:16, 23 August 2015 (UTC)

I do. We already have them for other Christian denominations such as Category:en:Anglicanism‎, Category:en:Eastern Orthodoxy‎, Category:en:Coptic Church‎, Category:en:Mormonism‎, Category:en:Protestantism‎, Category:en:Roman Catholicism‎, so why not Quakerism? —Aɴɢʀ (talk) 06:25, 23 August 2015 (UTC)

French French, Spanish Spanish and the like

This came up tangentially in May, but I'd like to raise it in its own thread. Currently, most regional categories are named "[place-adjective] [language]", as in "French French", "Welsh English" and "Austrian German", while a minority are named "[place-noun] [language]", as in "Louisiana English" (not *"Louisianan English") and "Quebec French".
I and some others find "French French" (and also to some extent "Welsh English") awkward and confusing, because it's easy to interpret both instances of "French" (and "Welsh") as referring to a language rather than a place. The "[place-adjective] [language]" scheme is also impossible or undesirable for some languages: "Swiss German" was felt [by some people, not me] to be so similar to the name of the Swiss German language [which Wiktionary calls Alemannic] that its category was moved to "Switzerland German", and it's currently impossible to distinguish French terms specific to the DRC from those specific to the ROC, because both go in "Congolese French". OTOH, "Austrian German" and most other category names are fine.
I propose we move all the reduplicated categories (like "French French") to either the "France French" format some categories already use, or to a format like "French of France". (Should we move all categories, including "Austrian German", etc, to one of those formats? It'd be consistent, but unnecessary in most cases.) - -sche (discuss) 22:05, 20 August 2015 (UTC)

Using the "French in France" format has the nice advantage, from a technical standpoint, that it fits the same name format as all our other part-of-speech type categories. —CodeCat 22:30, 20 August 2015 (UTC)
  • Support Absolutely. I always support "X in Y" or "X of Y" constructions because of Congo/Congo and Dominican/Dominican (Dominica and the Dominican Republic). —Justin (koavf)TCM 03:58, 21 August 2015 (UTC)
We need to make sure we use linguistic borders rather than political borders. Anything with the word "Republic" in it is not likely to be a linguistic border. --WikiTiki89 05:25, 21 August 2015 (UTC)
I could support this in cases where it's ambiguous (like Congolese French) or highly misleading (like Swiss German was), but some of the reduplicated names (e.g. English English for the English of England) are actually well established and I wouldn't be happy to see them go. And I really wouldn't want to change the names of local varieties when the names are nonreduplicated, well established, and unambiguous, like Austrian German or Munster Irish. —Aɴɢʀ (talk) 12:43, 21 August 2015 (UTC)
Yeah, that's a concern I have, too — "Austrian German" and most categories have perfectly good names as-is, it's only a minority that are problematic. I certainly don't want to have three competing formats ("[place-adjective] [language]", "[place-noun] [language]", "[language] of [place]"), so if we're not prepared to switch in general to a "[language] of [place]" format, I suppose the status quo of occasionally deviating from "[place-adjective] [language]" to "[place-noun] [language]" is functional, if a bit unschön. "Dominica English" and "Dominican Republic English" work, and I guess so does "DRC French" (probably the least ugly option, compared to "DR Congo[lese] French" or the atrocious "Democratic Republic of the Congo French"). - -sche (discuss) 19:13, 21 August 2015 (UTC)
  • Where can I see uses of "English English"? google books:"English English" gives me a high number of hits, but from clicking the hits I find no quotations of use of "English English". --Dan Polansky (talk) 21:20, 21 August 2015 (UTC)
google books:"English English dialects" turns up a handful, which I've added to Citations:English English. Obviously, I don't dispute that the phrase is attested, only that it's the best/clearest name we could choose to use. - -sche (discuss) 03:15, 22 August 2015 (UTC)
We don't have names for linguistic divisions. It's English of England, not the more accurate English of England minus the northern half of Northumberland and the southwestern part of Wrexham, Wales and various enclaves in Paris, Dublin, New York City, Hollywood, etc., etc. (Yes, that was made up; I don't know the exact lines of English of England, and in fact the edges aren't that clean, the lines between Welsh English and Scottish English and the English of England are in fact slow changes.) By the difficulty of moving across national borders, and cultural identities tied to them, national borders tend to have some effect on language division, and where they don't, we probably can't say anything about it. So, no, "Republic" in the name doesn't mean anything.--Prosfilaes (talk) 20:56, 21 August 2015 (UTC)
Fr.Wikt lists ~70 words which are used in one Congo but not the other; I welcome suggestions on how to categorize them without using the names of the countries (which is what fr.Wikt does, if anyone wondered). :-) Fr.Wikt also lists a handful of words which are used in both Congos, which it might be tempting to conflate into one category, but I note that we don't conflate words used in Canada with words used in the US even when the words are used in both places — we dual categorize them as "Canadian English" and "American English". (In fact, we had a discussion which specifically deprecated the geographic label "North American" and made it so {{lb|en|North America}} displays and categorizes as "Canada, US".) - -sche (discuss) 03:23, 22 August 2015 (UTC)
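
The dual categorisation described above can be pictured as a simple label expansion. This is only a hedged Python sketch: the two tables and the function `expand_label` are hypothetical and do not reproduce the actual label module; only the label "North America", its display as "Canada, US", and the two category names are taken from the comment above.

```python
# Hypothetical alias table: an umbrella label expands to the regional labels it stands for.
LABEL_ALIASES = {"North America": ["Canada", "US"]}

# Hypothetical label-to-category table, using the category names quoted above.
LABEL_CATEGORIES = {"Canada": "Canadian English", "US": "American English"}


def expand_label(label):
    """Return (display text, list of categories) for one regional label."""
    labels = LABEL_ALIASES.get(label, [label])
    display = ", ".join(labels)
    categories = [LABEL_CATEGORIES[name] for name in labels if name in LABEL_CATEGORIES]
    return display, categories


print(expand_label("North America"))
```

Under the same assumptions, a word used in both Congos would simply receive two labels and two categories instead of one merged category.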

Get rid of the parentheses around inflections in headword lines

Instead of putting parentheses there, I'm thinking it might look cleaner to separate the inflections with an m-dash or something similar. Something like this:

test — plural tests

An advantage is that it looks nicer when you put qualifiers or transliterations there. Those features aren't used much, but they are available.

What do you think? —CodeCat 21:15, 21 August 2015 (UTC)

@CodeCat: I think it could be visually appealing but mdashes with spaces is bad typography. Space ndashes or use mdashes immediately between the terms. —Justin (koavf)TCM 03:18, 22 August 2015 (UTC)
No, it's not. Languages other than English frequently use m-dashes with spaces. It's not "bad typography", just not typical in English text. --WikiTiki89 03:30, 22 August 2015 (UTC)
@Wikitiki89: If it's not typical typography, then we shouldn't use it. —Justin (koavf)TCM 03:43, 22 August 2015 (UTC)
It's not typical in English running text. That says nothing about specially formatted things like tables or dictionaries. --WikiTiki89 00:49, 23 August 2015 (UTC)
Here's what other dictionaries do (a slash denotes a line break):
online dictionaries:
  • Cambridge: thesis / noun (plural theses) / [definition]
  • Collins: thesis / noun / (plural) -ses
  • dictionary.com: thesis / noun, plural theses [...] / [definition]
  • Merriam-Webster: thesis / noun [...] / [definition] / plural theses
  • Oxforddictionaries: thesis / noun (plural theses) / [definition]
  • thefreedictionary.com: thesis / n. pl. theses / [definition]
paper dictionaries:
  • Concise Oxford English Dictionary: thesis n. (pl. theses) [definition]
  • Webster: thesis n., pl. -ses [definition]
The trend is to have as little horizontal space as possible between the singular and the plural, which is consistent with using parentheses or a comma, and inconsistent with a dash. - -sche (discuss) 05:13, 22 August 2015 (UTC)
But the trend for online dictionaries like us is to have a line break, so maybe {{head|en|noun|plural|tests}} and {{en-noun}} should generate:
and so on, e.g. {{de-noun|m|Tischs|gen2=Tisches|Tische|Tischlein|dim2=Tischchen}} gives:
I think that's easier to read than piling the forms up horizontally. —Aɴɢʀ (talk) 06:09, 22 August 2015 (UTC)
  • This would make sense on cell phones, but on laptops there's usually limited vertical space and lots of horizontal space.
  • In response to CodeCat, I've long wanted the parens gone, because with translits you end up with two layers of parens. Benwing2 (talk) 06:31, 22 August 2015 (UTC)
I think the line break some other dictionaries provide between the first mention of the lemma form and the mention of its plural is the same one we already provide between those two things: we just separate the first mention of the lemma form (up at the top of every page) from the rest of the headword line by so much other stuff like etymology that we repeat the lemma form a second time before we give the plural. I don't think we should add another line break on the PC version of the site, although as Benwing notes, it might actually make sense to do so on the mobile version. - -sche (discuss) 07:54, 22 August 2015 (UTC)
What about just a comma?
noun, plural nouns
ко (ko), plural кои (koi)
Arabic entries like حدث already employ commas rather than parentheses for this kind of thing. - -sche (discuss) 08:02, 22 August 2015 (UTC)
There are many formats that would get my vote, but any format that would take up any additional vertical screen space on a desktop, laptop, or good-sized tablet would not. I'd prefer an endash over an emdash too.
Wouldn't the space constraints of a cellphone be better addressed by the Wiktionary app than by our efforts? DCDuring TALK 11:34, 22 August 2015 (UTC)
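
For comparison, the three single-line formats discussed in this thread (parentheses, a spaced m-dash, and a comma) can be put side by side. This is a minimal Python sketch for illustration only; the function `headword_line` and its `style` names are hypothetical and are not how {{head}} is implemented.

```python
def headword_line(lemma, forms, style="parens"):
    """Format a headword line.

    forms is a list of (label, form) pairs, e.g. [("plural", "tests")].
    style is one of "parens", "dash" or "comma" (names invented for this sketch).
    """
    parts = ["%s %s" % (label, form) for label, form in forms]
    if not parts:
        return lemma
    joined = ", ".join(parts)
    if style == "parens":
        return "%s (%s)" % (lemma, joined)
    if style == "dash":
        # \u2014 is the m-dash, written with surrounding spaces as in the proposal.
        return "%s \u2014 %s" % (lemma, joined)
    if style == "comma":
        return "%s, %s" % (lemma, joined)
    raise ValueError("unknown style: %r" % style)


for style in ("parens", "dash", "comma"):
    print(headword_line("test", [("plural", "tests")], style))
```

None of the three styles adds vertical space, which is the constraint several commenters raised.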

The dot before the first transliteration on some entries' headword lines

The thread above this prompted me to look closely at how headword-lines are formatted, and it strikes me that having a dot before only the first transliteration on only the headword line of only some entries creates an awkward and inconsistent amount of space. For example, in буква: why should "(búkva)" be further away from "бу́ква" than it is from "f inan"? And why should "(búkva)" be further away from "буква" than "(Latin spelling žagati)" is from "жагати" in жагати, or than "(romaji aizōban)" is from "あいぞうばん" in that entry? The dot is especially awkward in entries like حَدُثَ (ḥaduṯa), where the headword-line goes on to give another word and its translit, and the second translit is not separated by a dot. I propose we eliminate the dot.
I know that for the tiny number of languages which have WT:_ transliteration pages, the dot serves as an easter egg for the tiny number of people who notice that it contains a link. The link could either be moved to the transliteration itself, i.e. бу́ква (búkva), or just omitted because a nearly unnoticeable link that only exists for a few languages and points to a page that's frankly not very useful is, well, not so useful that it needs to stay... I mean entries don't even normally link to WT:About _ pages AFAIK, and those are frequently more useful. - -sche (discuss) 08:30, 22 August 2015 (UTC)

I was one of the proponents and spreaders of the "dot" format, but recently I changed my mind for the reasons you give. I now think it should be removed altogether. --Vahag (talk) 11:07, 22 August 2015 (UTC)

User:Pereru and sources again

This user has started adding all kinds of templates like {{needsources}} and {{needref}} to entries again. These templates don't serve a purpose as there is no strict need for sources. You can't ask for sources if there aren't any.

More annoyingly, the user is now also preventing me from editing and fixing up etymologies, reasoning that I may only write what agrees with the source. This is complete nonsense; if sources restrict what edits Wiktionarians may make, then the sources need to go. Or better yet, the users need to stop doing that and let editors do their work. If sources prevent me from improving Wiktionary, I'm going to start removing them. —CodeCat 18:39, 22 August 2015 (UTC)

The template {{needsources}} was kept, as in the decision above, and is believed to be useful. Adding it to an entry does not change any of its contents; it merely points out that there are no sources and that it would be an improvement to add them. If there are no sources, add a rationale -- the template says so. If you want, we can talk about how to do that. But adding information that is based on something -- even original research -- without mentioning its source or rationale -- that is in no way an improvement.
Ahn... Please don't misrepresent me. What I'm saying is that etymologies cannot float in the vacuum. If you don't have a source to add, add a rationale. You often do a quick'n'dirty one in the edit summary -- why not add a better one to the text itself?
I insist: I am not preventing you from fixing etymologies: I merely think that, by letting them float in the air, you're making them worse. Ground your etymologies, and I'll have no problem with what you do. Please, don't misrepresent what I say.--Pereru (talk) 18:45, 22 August 2015 (UTC)
Why is a rationale needed? Specify which parts of an etymology are in doubt. Or better yet, take it to WT:ES. Putting a template on the entry solves nothing at all. The template itself needs a rationale just as much for it to be useful.
Because the reader is not a Wiktionarian. He's not trying to discuss etymologies. He wants to know the gist of the reason why this form is here rather than some other form. He is not a critic: he just wants to know how Wiktionary decided that this was the right form. It's information, it's relevant, it should be on the page. Why is this even a problem? Are you trying to hide something?
I'm also not amused by your continued stance against Balto-Slavic. Balto-Slavic is accepted and has consensus among linguists, yet whenever I add it to an entry you put brackets around it and add "perhaps", while keeping your own Baltic-only etymologies displayed prominently. Wiktionary is not here to promote your fringe anti-Balto-Slavic views. We should show the current state of research. I think if you continue to exclude Balto-Slavic or play down its relevance or acceptance, then you should stop editing etymologies altogether. —CodeCat 19:07, 22 August 2015 (UTC)
Not the linguists I've talked to, no. But even if it were a consensus -- I don't have a problem with you adding Proto-BS to Wiktionary. I have a problem with you not arguing for the forms (same for Proto-IE, by the way). If you've invented them yourself, say so and state why. Why is this so difficult? If you're proposing a hypothesis, justify it on the page! If it's an argument that is generally valid for many words, write it up somewhere and link to it on the page! The "perhaps" there is meant to show that there is no reason for that form given here in Wiktionary -- if you add a reason, a justification for that form, then I'll be happy to delete any hedges.
Let me turn the argument against you: Wiktionary is also not here to support your anti-source, anti-rationale agenda. Being against sources and insisting on hiding the reasons why you choose one specific protoform when there are others in the literature and when there often is disagreement among Wiktionarians (see Štambuk and you on Kim vs. the Leiden school) does not make anything better here -- it arguably makes things worse. --Pereru (talk) 19:15, 22 August 2015 (UTC)
And yet, you refuse to explain anything at all about the problems you have with etymologies. WT:ES exists for a reason, why don't you use it? That's the place for discussing etymologies. Discuss what's wrong with them. If you find them implausible, then say why in the discussion. Just putting "perhaps" and a bunch of templates doesn't solve any of that. —CodeCat 19:22, 22 August 2015 (UTC)
That's because I don't have any specific problems with the forms in question; I just want to see what the reasons are for their having been chosen. And I keep not understanding why wanting to see this is strange, and why you're so determined to hide it. Again: it's not about discussing, it's about documenting. It's like adding sources to quotations. --Pereru (talk) 19:34, 22 August 2015 (UTC)
Yes, and which reasons are unclear? What needs explaining? Specify which aspects of the etymology are unclear and need explanation. And don't say "all of it" because that would make no sense; not a single etymological source explains everything about an etymology. The reader is always assumed, by every work, to have an understanding of the linguistics. What etymological sources do is they explain special parts of the etymology that may be surprising or unexpected, or aspects of a language's development that are unknown or not fully consensus. So you need to specify which parts of the etymology are unclear and need motivating.
I need to see the rationale in order to tell you if I think there is something wrong about it. Just as I need to see the definition of a word to see if I think it's wrong. A word without a definition is not useful in a dictionary. An etymology without a source or rationale is just floating in space, it is a speculation of its author. This should be obvious. Ask yourself: why is it that every good etymological dictionary known to man has both sources and justifications for the protoforms it lists? Are they really all wrong in doing that?
Also, aside from all of this, you do realise that all this applies to you as well? You'll have to give motivations in all your etymologies as well. Especially the ones that promote Baltic while dismissing Balto-Slavic. Fringe and unusual ideas should always be subject to higher scrutiny. So if you have a particular reason for going against the majority view that Balto-Slavic is a real thing, then you will have to explain this and why this view should be preferred in Wiktionary etymologies. Because I don't think there is a consensus for excluding or minimising Balto-Slavic. It has more support among linguists than Baltic does. —CodeCat 19:43, 22 August 2015 (UTC)
Of course I realize that. That's what I've been doing from the start. Every single etymology I have added has (a) a source and (b) a motivation/rationale. They are just not mine; they are Konstantīn Karuli's. You may disagree with them, and you are free to argue or counterargue (justifying and sourcing your arguments); and if indeed they proceed, then you win. What is the problem with that?
I'm all in favor of scrutiny! My entire point is that you provide no scrutiny. You just carry out your decisions without giving good reasons, and every time you're called on that, you just say something like "I don't need to justify my preferences". Well, you do. Please, do some scrutinizing. And write it down for others to see and scrutinize, too! Your distaste for justifications and/or sources is the fringiest idea I've seen: I don't know a single person interested in historical linguistics who supports that, including the other Wiktionarians here. It's only you, CC. You're the fringe one here, the one who needs scrutinizing. Please accept that. --Pereru (talk) 13:36, 24 August 2015 (UTC)

Sources, despite User:CodeCat

Frankly, here is my personal approach to this. If anyone (except CodeCat, who really isn't impartial about this issue) thinks I'm wrong, please let me know.

  • I think sources and/or rationales (CodeCat always forgets this part, for some reason) make etymologies more trustworthy, because they show that Wiktionary has done its homework and allow the more educated user to check whether or not s/he agrees with Wiktionary (this is especially important when an etymology is Wiktionary's own).
  • Sources and/or rationales (CodeCat always forgets this part, for some reason) are easy to add: if you're copying the info from somewhere, write down where from. If you're creating it yourself, write down why this is better.
  • If an entry doesn't have sources and/or rationales (CodeCat always forgets this part, for some reason), then it's OK to add a template that says so, so that those who are interested can take care of it. It's not different from templates like {{rfap}}, which I also use extensively to encourage Latvian speakers to add audio pronunciation files.

What is wrong about any of the above? And in what way does any of the above prevent anyone from working?

CodeCat and I have been reverting each other's edits for a few minutes. I will no longer do that -- it's more than a bit childish -- but I will leave here my request that something be done about it. This page is a discussion forum where such problems can hopefully be resolved. Let us talk about that, then, and come to some sort of conclusion, so that we can finally go on doing things without sudden tantrums from our esteemed colleagues. --Pereru (talk) 19:03, 22 August 2015 (UTC)

  • CodeCat, I think previous discussions have made abundantly clear that the only person here who thinks there is no need for reconstructions to have some kind of reference is you. Given that fact, it would be wise of you to stop edit-warring {{needsources}} (which was RFD-kept per consensus) out of entries. Let's start working on a template or format for presenting "Wiktionarian research" / "rationales" on entries which lack scholarly references. For reconstructions based on known sound correspondences, perhaps we could document the sound correspondences on an 'About' page (or similar page) and then have a template that says "Reconstructed by Wiktionary according to known sound correspondences" which could be placed in the references section or at the bottom of the entry {{Webster}}- and {{LDL}}-style. - -sche (discuss) 19:08, 22 August 2015 (UTC)
    • The issue I have is that these templates are telling me to add references and sources. There aren't any, so I remove the template. What point is there in asking for something that doesn't exist? —CodeCat 19:11, 22 August 2015 (UTC)
      • The template says sources and/or rationales. If there is no source, add a rationale. Are you claiming the rationales also don't exist? Supposedly you haven't been picking protoforms randomly... have you? --Pereru (talk) 19:20, 22 August 2015 (UTC)
      • (e/c) The template explicitly asks for either pre-existing scholarly sources, or what it currently calls "original research" (that wording and the format it prescribes need to be improved, but the meaning is clear). On Wikipedia and some other Wiktionaries, like de.Wikt, you would be blocked if you kept adding original etymological research. We're offering you a big concession, a big compromise — you get to keep adding your OR (whereas newer users even on this wiki have been threatened with blocks, as recently as last week, for adding OR etymologies), but you have to provide your rationales for it — for each reconstruction you invent. If you aren't willing to do that, previous discussions have made clear that there are quite a few people who would be happy to simply delete and ban all etymological OR. - -sche (discuss) 19:23, 22 August 2015 (UTC)
        • You're making it sound like there's this big change that has to be made to allow unsourced etymologies. But it's just the status quo. So I keep doing what has always been done, as there hasn't been a policy change. Don't make it sound like a concession because it isn't. If you want to require sources for all etymologies, make a policy and enforce it (which would mean removing somewhere around 90% of all etymology sections and reconstructions). That's all I ask for. Until then, you need to be clearer about what's wrong with the etymologies. Just asking for sources and rationales is going to get ignored. Pereru can patrol his fringe etymologies all he wants, that's fine with me. Latvian is not my responsibility. As long as I can make sure the rest of Wiktionary is up to par. —CodeCat 19:31, 22 August 2015 (UTC)
          • Previous discussions have made abundantly clear that you are the only person who subscribes to the view that reconstructions are not required to have any sources. Your long-standing but solitary refusal to accept the status quo does not change the status quo. - -sche (discuss) 19:37, 22 August 2015 (UTC)
            • It's not clear to me at all, the prior BP discussions gave a rather nuanced picture. Make a policy that has clear consensus, and then enforce it. Nothing else will do. —CodeCat 19:45, 22 August 2015 (UTC)
          • I think this is the main point of all these discussions -- to create a new policy. Are we all in agreement now? If anyone other than CodeCat disagrees that sourcing and justifications are good and people should add them to pages, then please say so, or else... do we have a new policy? --Pereru (talk) 19:51, 22 August 2015 (UTC)
            • A policy is a separate page, clearly and delicately worded, and approved by consensus through a vote. Something like WT:CFI. —CodeCat 20:01, 22 August 2015 (UTC)
                • In a previous discussion I did exactly that, on this very page. And since nobody disagreed, I suppose this means we have a policy? --Pereru (talk) 20:48, 22 August 2015 (UTC)
      • There can be no such thing as no source; if you made it up, then write Source: CodeCat's ass. If someone is asking for a source, it's useful information that you just made it up.--Prosfilaes (talk) 20:55, 22 August 2015 (UTC)

@Pereru "And since nobody disagreed, I suppose this means we have a policy?" From what I gather, CC insists that a lack of voted-on, explicit policy negates the fundamental clause that "Wiktionary is a secondary source" (which means that wikt allows some elements of synthesis but the synthesized sources still need to be cited).

Anyways, can this be a thing? In case I disappear I would like to document my support of a potential policy requiring sources, including for synthesis [e.g., "bebe could be considered derived from baba because source X says that ebe is derived from aba" and so forth, the keyword here being source X.] Neitrāls vārds (talk) 21:51, 22 August 2015 (UTC)

Is User:CodeCat's behavior a problem?

I have personally nothing against CodeCat's work, which is excellent in many areas of Wiktionary. But his/her behavior with respect to sourcing and providing support for his/her etymological choices is causing increasing concern. Despite the majority view that {{needsources}} was useful, and that sources and/or rationales (CodeCat always forgets this part, for some reason) improve an entry (just like audio pronunciations do, which is why there are templates like {{rfap}}), CodeCat is doing his/her damnedest to make this particular part of the job -- selecting the entries that need this improvement, and then going about doing it -- irritatingly difficult. Again, I have nothing against all other contributions by CodeCat, who, as far as I know, is a good person. I'm not against the person, I'm against the behavior, which, as I think most people agree, is not justified.
In view of that, is there some administrative procedure here that can be undertaken to deal with such cases of irrational behavior? --Pereru (talk) 19:51, 22 August 2015 (UTC)

I have a problem with Pereru ignoring the consensus agreed upon in BP just this month, to make Proto-Baltic an etymology-only language. Pereru continues to create Proto-Baltic pages and categories, even going so far as to undo page moves. This needs to stop and I would like to know if there is some administrative procedure that can take care of this irrational behaviour. —CodeCat 21:02, 22 August 2015 (UTC)
I've blocked Pereru for one day for disruptive edits, which ignored consensus. —CodeCat 21:04, 22 August 2015 (UTC)
And I've unblocked him because, as I wrote, it was a "bad block by an admin who is actively involved in edit wars with this user, and is herself disruptively editing against community consensus (which is what she accuses Pereru of)". - -sche (discuss) 21:08, 22 August 2015 (UTC)
Of course, my behaviour makes his perfectly excusable. —CodeCat 21:09, 22 August 2015 (UTC)
Yes. You're being irrational, so your decisions don't make sense, whereas I wasn't, and mine do. What's the problem with that? --Pereru (talk) 13:38, 24 August 2015 (UTC)
  • But still, guys: CodeCat is imposing a policy that was never approved, that clearly goes against what the majority here wants, that goes against written recommendations like Wiktionary:Etymology#References; s/he also goes on a tantrum whenever anyone opposes that and takes unmeasured punishing actions such as his/her recent attempt to block me. And yet nobody does anything against it. What is the problem? Why does Wiktionary allow such destructive behavior? Isn't it the time for a disciplinary action? --Pereru (talk) 13:45, 24 August 2015 (UTC)
    • A disciplinary action was tried, but it failed. --Vahag (talk) 15:19, 24 August 2015 (UTC)

Czech possessive adjectives - etymology and related terms

First off, the term "Czech possessive adjective" does not find much use but I do not find a better one. Czech possessive adjectives would be the likes of orlův (eagle's) from orel (eagle). They are much like English possessive forms that we do not include for the reason that the apostrophe makes them effectively sum of parts; that is not the case with the Czech forms. In Czech, there is still a distinction between orlův and orlí; the latter would be used in the translation for "eagle's nest".

Now, how to treat them as for etymology and related terms? I want entries for them not to repeat the etymology of the base term, and I want to see no "Related terms" section. I prefer that they be treated a bit like items in Category:Latin participles. In this, I seem to differ from User:Jan.Kamenicek.

A possessive adjective is created for a great many animate nouns, most often referring to humans but also sometimes to animals. They include matčin (and forms matčina, matčino), otcův, sestřin, bratrův, synův, orlův, etc. They are not to be confused with koní, orlí, kočičí, psí, člověčí, etc.

I am asking for input from other people. I am looking forward to getting a view from other languages that have a similar feature, maybe Russian and other Slavic languages, but also other languages. --Dan Polansky (talk) 13:38, 23 August 2015 (UTC)

The term "Czech possessive adjective" does not find much use because there are not many English books dealing with them. The term "Czech hard adjectives" seems to find even less use, but they do exist. It is also not easy to filter them out, because not all books dealing with Czech possessive adjectives use the phrase "Czech possessive adjectives", they can talk simply about Czech language and use only the phrase "possessive adjectives" (such as here: [8]).
I believe that all the expressions like orel, orlice, orlí or orlův should be listed in the categories like Category:Czech terms derived from Proto-Slavic and therefore their etymology sections should include information that they "come from Proto-Slavic *orьlъ", which also puts it into the correct category.
As for the "eagle's nest": it can be translated in both ways (depending on context) as orlí hnízdo (talking about the kind of nest), or orlovo hnízdo (nominative neuter of orlův). The latter is used quite rarely, usually when referring to a nest belonging to a specific eagle, but examples when it is used as a synonym for "orlí" can be also found (usually in poetry or in old texts, one of them is in the quotation in the entry orlův). Jan Kameníček (talk) 14:16, 23 August 2015 (UTC)
My preferred format is like this нилеце ‎(nilece), which seems to be what Dan Polansky is suggesting (the term "sub-lemma" comes to mind.) Just my "2 cents." Neitrāls vārds (talk) 14:25, 23 August 2015 (UTC)
I think that words categorized as lemmas should be treated as lemmas. Either it is a lemma, or it is not. I do not think that e. g. orlův can be considered a sublemma of orel. It is an adjective derived from orel by a suffix -ův, which is a derivational suffix, not an inflectional suffix. --Jan Kameníček (talk) 17:08, 23 August 2015 (UTC)
Maybe possessive adjectives should be ranked as non-lemmas, along with Latin participles and Czech comparatives (menší). It would be consistent with the practice of PSJC and SSJC. But I do not think it obvious that there should only be lemmas and non-lemmas, and that's it. For instance, many editors prefer to create some entries as alternative forms, and prefer to centralize etymology in the main entry and avoid it in the alternative form. The alternative form is still a lemma, but it is a secondary entry from the standpoint of information management. I have even seen some editors use the word "lemma" to mean "main entry" rather than "the word form representing all the inflected forms of the word".
The question is, like, do we want to repeat the etymology of huge in hugely, and do that for the whole class of -ly adverbs? --Dan Polansky (talk) 17:24, 23 August 2015 (UTC)
Was orlův separate from orel in Proto-Slavic, or was it only formed in Czech? If it was only formed in Czech, then I agree with Dan and Neitrāls: just say how it's derived from orel and put the history of orel in that entry. Just because something is its own lemma doesn't mean we have to duplicate (knowing it will come unsynced) information in multiple entries; rigidify is its own lemma independent of rigid, but doesn't repeat rigid’s etymology. - -sche (discuss) 19:12, 23 August 2015 (UTC)
Generally speaking, possessive adjectives appeared already in Proto-Slavic, see Appendix:Proto-Slavic/-ovъ. The possessives with the suffix *-ovъ changed in Proto-Czech (between the 10th and 13th centuries) to -óv and later -uov, which changed into modern -ův.
Unlike huge x hugely, there are often more changes taking place when creating Czech possessives than adding the suffix, compare e. g. Radka x Radčin.
Besides this, I think that all words which have roots in proto-languages should be listed in the categories like Terms derived from Proto-... . I don't think that only one representative of a group of related words should be listed there. Using the {{template:etyl}} in the etymology section is a good way to do so. Or should the category be added manually? Jan Kameníček (talk) 21:01, 23 August 2015 (UTC)
The fact that going from "matka" to "matčin" does not look like plain suffixing does not matter; it is the property of Czech morphology (inflectional and derivational alike) that it often does not work like plain suffixing on the surface level. For instance, "bedna" --> "bednář" = "bedna" - "a" + "ář"; "samec" --> "samčí"; "vyrobit" --> "výrobce" = "vyrobit" - "it" + "ce" with "y" made acute or the like; "dům" --> "domeček" (ů went to o); "orel" --> "orlíček" (e dropped); "hrdlo" --> "hrdelní"; etc.
What matters is that we are dealing with a very productive derivation or inflection pattern, like in English for -ly, -ness, -hood, -ify, -ing, etc. And what matters is whether we want to have etymologies like the one currently in orlův, which says this:

"From orel +‎ -ův. Noun orel comes from Proto-Slavic *orьlъ, which is from Proto-Indo-European *h₃er- ‎(“big bird, eagle”).[1]"

As you can see, the etymology first indicates the suffixing, and then goes into detailing the etymology of the component "orel". That is really like "swimming" detailing the etymology of "swim", and "merrily" detailing the etymology of "merry".

Whether possessives in general originated early or not does not seem to matter. What matters is the particular etymology, and whether it is of the form "base + suffix. Base is from base-etymology" rather than what we see e.g. in windmill, which could conceivably be "wind + mill", but can in fact be traced to Old English *windmylen. I do not think that all compounds should provide the etymologies of the component terms on the pages of the compounds. Put in general terms, I do not think that all etymologies of all terms resulting from derivation (prefixing, suffixing, compounding, etc.) should repeat the etymologies of their base terms. --Dan Polansky (talk) 19:10, 24 August 2015 (UTC)

Question (re: sourcing)

So, there was this thing that I wanted to get a feel of the general attitude.

Do passages/statements attributed to an author or a book need to actually reflect what the author/book says? Can they be changed ("corrected") with something that author doesn't say while still attributing it to them?

My answer would probably be "are you effing kidding me?" (lol) Then again en.wikt can be a serious "land of the bent mirrors" [don't remember the correct idiom] and things that I see as common sense some others don't even consider.

This is referring to a discussion 3 headers up (that I actually missed) where (to sum it up somewhat snarkily) CodeCat says that that book is stupid and needs her corrections while still proudly displaying the reference [1] at the end despite (in some cases) all the core information being changed. For example in akmens the direct parent root was changed, then extrapolating from that the proto-group was changed and a different PIE root introduced (none of these things are to be found in the source cited.) I call this manufactured references/misattribution but maybe I'm dumb...?

Would like others' input.

And more generally this thing has been lingering on for years, the crux of the matter is that CC demands an explicit, voted-on policy, why not just do it, it could be something very simple, something to the effect:

  • Wiktionary by previous consensus is a secondary source, this explicitly applies to etymologies, sources need to be provided, in case of synthesis, the synthesized works need to be attributed.
    • Usage of templates to keep track of unsourced pages is to be encouraged.
    • Attribution of statements to an author that they didn't make is to be avoided.

What do you think about that? Perhaps User:Dan Polansky could help set it up? Neitrāls vārds (talk) 14:57, 23 August 2015 (UTC)

There is still Wiktionary:Votes/2013-10/Reconstructions need references that never started. How is the present wording of the vote from your standpoint? --Dan Polansky (talk) 15:09, 23 August 2015 (UTC)
As for policy page, the main thing is consensus and evidence of consensus, IMHO. A policy page by itself is a poor evidence of consensus; it merely makes things convenient for newcomers who then do not need to wade through previous votes to find what the decision was. Thus, a policy page is not strictly necessary, IMHO.
For interest, Wiktionary:Votes/pl-2006-12/Proto- languages in Appendicies is a related vote that does not seem to indicate inclusion criteria. --Dan Polansky (talk) 15:15, 23 August 2015 (UTC)
Looks good, pretty much exactly what I had in mind. The only problem – a bit narrow. In Latvian there is this problem that the entries look like doormats (to be a bit dramatic). Would be perfect if it could be extended to mainspace...? P.S. perhaps a clause about misattribution would be necessary – right now I can name two appendices that very dubiously cite template:R:lv:LEV (a connection is attributed to this book that cannot be found there.) Neitrāls vārds (talk) 15:31, 23 August 2015 (UTC)
I agree completely. I don't know what is on CC's mind, but s/he is clearly doing the wrong thing here. I don't really know what "policies" are supposed to imply (CC clearly acts without one), but I say there has to be some order in the usage of references and justifications. I also agree completely that reconstructions need justifications (sources, rationales), a practice that is used in every good etymological dictionary that I know. --Pereru (talk) 13:26, 24 August 2015 (UTC)
@Neitrāls vārds: I updated the vote a bit, to indicate sentence structure in a clearer way.
As for narrowness: I'd suggest to leave it narrow, and see whether it can get enough support as is. We can create another vote for etymologies later. There is still the question whether etymologies should be inline referenced etc.; dealing with these appendices separately seems to be a good initial step. --Dan Polansky (talk) 15:39, 23 August 2015 (UTC)
I added the vote to WT:VOTE and scheduled it to start in a week. Let us postpone the vote as much as a discussion requires. --Dan Polansky (talk) 15:44, 23 August 2015 (UTC)
Great, thanks! Neitrāls vārds (talk) 15:49, 23 August 2015 (UTC)

Native speakers' advice

Native speaker's advice needed, please look at Talk:houbelec#Translation. Thanks very much! Jan Kameníček (talk) 21:15, 23 August 2015 (UTC)



I don't see the use of keeping such empty entries that failed their RFD's. Could someone explain? Thanks 12:15, 24 August 2015 (UTC)

Partly as a place to store the evidence for the word (so that if we eventually find more, we can recreate it more easily – see for instance redamancy, which was a blank entry pointing to Appendix:English dictionary-only terms, until we managed to find enough citations to create a full entry), and partly to stop people trying to recreate the page (which often happens with "words" that correspond to rare phobias, sex acts, political insults etc, which are often mentioned in word lists and novelty dictionaries but never actually used – look how many times "wunch" got deleted, until I created a proper cited entry for it). Smurrayinchester (talk) 13:44, 24 August 2015 (UTC)
Your first reason does not apply, since the citations page would exist even if the soft redirect to it from the main entry did not exist. --WikiTiki89 13:48, 24 August 2015 (UTC)
But who checks whether the citations tab is a blue-link when creating an entry? Smurrayinchester (talk) 14:22, 24 August 2015 (UTC)
That's your second reason. I only said your first reason doesn't apply. --WikiTiki89 14:59, 24 August 2015 (UTC)

Recreating Proto-Baltic (and other "deprecated" languages) with a different status?

Proto-Baltic was recently discontinued as an accepted language in Wiktionary. I was against it, because it doesn't seem to me that the discussion is over (and because there is no real authoritative source for PBS etyma yet), so it seemed premature, but OK, I can live with that. The problem, it seems to me, is that this forces changing quotes from sources in ways that don't seem legitimate. If a source reconstructs a form as Proto-Baltic, renaming it as Proto-Balto-Slavic without any further changes (e.g., replacing it with a different source) seems to me illegitimate. So: how about having a different status for Proto-Baltic? Say, "older/deprecated/obsolete Proto-language" or something like that? In this manner, we could list deprecated protoforms here (with templates duly identifying them as such) in the same way we list "misspellings of" or "alternative forms of" or "obsolete forms of" words in the main namespace. Here are a couple of reasons:

  1. People will still come upon older reconstructions -- they are, after all, attested in papers, etymological dictionaries, and other similar sources -- and may want to know what they were and why they were abandoned; it would thus be useful to have pages with these forms (clearly tagged as "deprecated" or something like that, and linked to the most recent and most widely accepted form), just as in biological taxonomy it is useful to have lists of old, deprecated scientific names so that older articles can still be read and understood correctly
  2. To follow the history of a proposed protoform, knowing its predecessors is important -- often, a new protoform is proposed in explicit opposition to, or as an explicit correction of, an earlier proposal. Being able to track these would be useful in understanding the state-of-the-art.

What do y'all think?

You can still reference a source that reconstructs a Proto-Baltic term in a Proto-Balto-Slavic entry. Think of it this way: we are reconstructing a Proto-Balto-Slavic term based on someone else's reconstruction of a Proto-Baltic term. --WikiTiki89 15:01, 24 August 2015 (UTC)
Sure, but the Proto-Balto-Slavic reconstruction will ultimately look different, at least in that it refers to a different level. (Most PBS entries here look very much different from the PB forms on which they are based). Someone who sees a PB form somewhere and wants to know what it is won't find a page about it here. Shouldn't there be one -- in the same way that there are "alternative spelling of" and "obsolete form of" pages? In this way we don't misrepresent sources, and we allow users to find exactly the form they saw in some source and track its status (deprecated) and understand why it was replaced by the PBS form. --Pereru (talk) 15:16, 24 August 2015 (UTC)
Let me give an example. A proto-Baltic form, like e.g. Appendix:Proto-Baltic/*akemns, would have an initial template saying something like: "This protoform is deprecated. The current consensus form is Appendix:Proto-Balto-Slavic/akmo. Reasons for this change are indicated below. See also Appendix:Proto-Balto-Slavic for the current view on this branch of Indo-European." In the page itself, the sources for that form (say, Karulis' LEV) would be cited. In this way, the reader would know what this form is, where it came from, and what it was abandoned for. The end result would seem to me to be at least as useful as "alternative spelling of" or "obsolete form of" pages. (I imagine there would also be a heading in the current reconstruction -- something like ==Deprecated forms== or ==Older proposed forms== -- to link the currently accepted protoform to its previous incarnations.)--Pereru (talk) 15:22, 24 August 2015 (UTC)
You may be right about including them in some way, shape, or form, but this has nothing to do with misrepresenting sources. We do not misrepresent PB sources by altering the form of the reconstruction to make it PBS-like. --WikiTiki89 15:33, 24 August 2015 (UTC)
Thanks. But as for sources, if a source clearly reconstructs a form as X, and we list it here under page Y, then it seems to me we are misrepresenting it, aren't we? (But one possible solution would be to mention this on the page; i.e., have something akin to a ===Usage notes===, or a footnote, where we explain that what the source said isn't exactly what is on the page. Would that be OK with you?) --Pereru (talk) 17:47, 24 August 2015 (UTC)
If you quote the Pythagorean theorem as (= (+ (* x x) (* y y)) (* z z)) rather than as a^2 + b^2 = c^2, are you misrepresenting the Pythagorean theorem? --WikiTiki89 18:21, 24 August 2015 (UTC)
If you cite someone who quoted it as a^2 + b^2 = c^2, then yes, you are. If you have some standard way of referring to the theorem that supersedes whatever the author you're quoting is saying, then you should say so somewhere and link to it. It would be like, you know, changing US spelling to British spelling in a quote written by an American author -- not the right way to quote. --Pereru (talk) 18:39, 24 August 2015 (UTC)
But that's the thing, we're not quoting, we're paraphrasing. And when you paraphrase, it is totally OK to change British spelling to American. I can talk about the "color" of Winston Churchill's eyes and cite a British source that spells it as "colour", and I would not be misrepresenting the source. --WikiTiki89 18:44, 24 August 2015 (UTC)
But that most clearly should not be the case for reconstructions -- they are ideas and hypotheses, not paraphrases. Their spelling is often exactly what is being claimed -- a *X instead of a *X'. In other words, the sounds that compose the protoform are exactly the theoretical point that is being made; and, in this case, of course the spelling matters, in fact it is what matters most. There can of course be general problems that can be solved in a general way -- researcher 1 uses X for a certain sound (say little glottal stops), while researcher 2 uses Y (say accent marks) -- and you can adapt the spellings to reflect that (as long as you are consistent, and you write up somewhere why you chose to regularize this difference in the way you did -- and link it to the pages where it is relevant). But in most cases this is not so, and differences in spelling mean something much more serious -- and they should be better documented. --Pereru (talk) 20:11, 24 August 2015 (UTC)
The difference is between equality (faithfully representing a source in its original form), and equivalence (understanding its meaning and/or intention). Some sources for PIE write h₂ while others write H₂. These are different things in writing, but we know and understand that they mean the same thing; they are equivalent even if they are not equal. So we can exchange one for the other without problems. Likewise, in Wikitiki's example, "colour" and "color" are unequal representations of equivalent meanings (I might call them "equivalent words", but this hinges on the question of whether different spellings make different words). —CodeCat 18:57, 24 August 2015 (UTC)
And the solution for this is easy: you make a principled choice (I hope, after a discussion with others) for, say, h₂; and then you write somewhere (say, Appendix:Proto-Indo-European) that you did that, and why, and you link this page to those in which h₂ occurs -- so the reader, who may have seen a source that had H₂, doesn't think that you made a mistake. And since you do know why you preferred h₂ to H₂, explaining it in writing shouldn't be a problem. You would only need to do it once, in one page (where you could explain all the other similar choices you made), and then link it to new PIE entries. And again I ask: what is it about this suggestion that is so unreasonable or difficult to do? You spend more time writing comments here than it would take you to do this.--Pereru (talk) 20:11, 24 August 2015 (UTC)
There's nothing against this in principle, and it's even preferred I would imagine. But at the same time, a lot of Wiktionary's practices and conventions are unwritten; we follow them because we learn from existing examples that are already on Wiktionary. In the case of the choice to use lowercase h₂, the earliest that I can find is this. And there, too, it was simply set as a rule without discussion or motivation. To discuss and motivate it now would be a bit pointless, as there's already a consensus for it. —CodeCat 20:19, 24 August 2015 (UTC)
Good, let's do it like that in the future then. Write up your favorite spelling choices for PIE (h₂ instead of H₂), their reasons (in this case, I suppose because h₂ is more recent?), and voilà: no more reasons for complaints, and people can go back to arguing the merits. The point is not justifying it to other Wiktionarians (though that in itself is not bad: there are always new people coming who don't know where this decision came from, and I'm sure they'd appreciate the information), but to users. If someone checks an etymology here and sees something s/he finds strange, and there is no justification anywhere for it, then this doesn't make Wiktionary look more trustworthy. Again: I'm not suggesting a discussion (unless people think there should be one), I'm suggesting documenting choices, to show, at first sight, that they are choices, not mistakes. Besides, there are things that are much more important than h₂, like your current diatribe about how to spell proto-BS intonations. After this is over, don't you think it would be a service to others to write down somewhere why one variant was preferred? Again, so that it doesn't look like a mistake, but like a true principled choice? --Pereru (talk) 20:53, 24 August 2015 (UTC)
I'm not going to start documenting everything for PIE all over again. Not unless enough people feel there is a pressing need for it. So far, nobody has complained about our current standards. If you want motivations for PIE notation, you'll have to write them on your own. —CodeCat 21:05, 24 August 2015 (UTC)
I hope they do, because this would indeed lead you towards actually improving your PIE forms. Of course, nobody can force you to do the right thing; you're a free individual. I'll simply keep adding {{needsources}} and {{needref}} to your unjustified decisions (unless you'll help me by doing it yourself, of course), hoping that someone other than you will have the knowledge to do the right thing. As for complaints, I did complain against your standards, and I've seen several people (Štambuk, -sche) disagreeing with your standards in specific cases, so I think you're assuming a non-existing consensus here. You're more counting on people's inertia than consensus actually. But hey, it is a strong feature of humans too. For all I know, you may well get away with it. --Pereru (talk) 22:37, 24 August 2015 (UTC)
There's also the case where different sources disagree on certain sound laws. For example there's a subset of linguists that thinks the change o > a happened independently in all the Balto-Slavic branches rather than in Balto-Slavic itself. In this case, too, we have to pick one particular set of sound laws as the "main" one. Our existing pages treat the o > a change as Balto-Slavic. Likewise, some sources may neglect to indicate accent or acutes even when all descendants are in agreement. You can compare this to Pokorny's reconstructions for PIE: they don't reflect modern understandings so they have all kinds of weird schwas and long vowels while lacking laryngeals. So imagine that the only source we have on a particular form is Pokorny; should we allow ourselves to bring the form up to par? These are all questions that arise when we start giving too much weight to sourcing. —CodeCat 18:33, 24 August 2015 (UTC)
Disagreement between sources is exactly the reason why you need to argue for the forms you create pages for here -- I'm so glad you brought this up! Look: if different sources give different opinions and explanations, then you discuss them and explain why you favor one over the other. Things like "different sound laws" can be part of the discussion. All the problems you mention above can be summarized and written up in a page (e.g., Appendix:Proto-Indo-European sources) to which you can refer as part of your explanations for preferring one form over another. I've seen this done in etymological dictionaries, and I see no reason why you couldn't do this here. Sources are good -- even when they disagree... --Pereru (talk) 18:39, 24 August 2015 (UTC)
We don't have to motivate and discuss every single choice we make. For Dutch entries, we choose the spelling as prescribed by the Dutch language union as the norm for lemmas, even though not everyone uses it and some people advocate alternatives. This choice is not motivated or discussed; it's simply set as a rule and accepted by our Dutch editors. In the same way, it's not necessary to discuss why we picked one particular set of sound laws to base our reconstructions on. In many cases, the choice is arbitrary and we simply picked one because we had to make a choice. I think it's more important for us all to agree on a set representation and sound laws for Balto-Slavic reconstructions, than it is for us to discuss and motivate it all. Not that it's not welcome and valuable to give reasons for choosing one particular thing, but that's secondary to making the choice in the first place. What we choose is more important than why. —CodeCat 18:52, 24 August 2015 (UTC)
Things are different with reconstructions, especially when there are competing hypotheses. See, a reconstruction is not a word, but an idea; and, as for every idea, justifying it is important. The spelling of a Dutch word is not an idea that is being discussed by several people right now and with different, equally authoritative variants (for a still different, but more comparable case, see Nynorsk vs. Bokmål). That's what historical linguists do -- they justify their reconstructions -- and that's what you should do, too, if you care about reconstructions. When you say "we don't have to justify it", you're making a "petitio principii" without there actually being policy on this. Why don't you start a policy page on why we don't have to justify choosing one etymology over another, one set of sound laws over another (thus disputing Wiktionary:Etymology) and let people vote on it? You keep talking as if everybody agreed with you, when this is clearly not the case, much the opposite. I'd like to see you try to defend and get this "policy" of yours approved. --Pereru (talk) 19:08, 24 August 2015 (UTC)
Well of course, there must be a consensus. I'm aware that the current representation of Balto-Slavic doesn't have consensus, as both you and Ivan seem to disagree on it. But Ivan's solution was to simply create alternative (duplicate) entries or move mine, which of course is no way to come to an agreement. So the question becomes how do we come to an agreement on things, and if we don't, what should be done with existing and future Balto-Slavic pages? Right now, the majority of them has been created by me, so they mostly reflect the (unwritten) standards I follow. But if we insist that there must be consensus first, what do we do with them? Should they be deleted until there is an agreement on them? What about Balto-Slavic forms in etymologies? —CodeCat 19:16, 24 August 2015 (UTC)
It's not simply that there should be a consensus -- the consensus shouldn't be hidden, buried in some page that was archived three years ago. The reason for the consensus should be right there, on the page, or at least in Appendix:Proto-Indo-European, so that the reader knows what consensus decisions you made, and why. Since you're talking about theories, not words that exist in real languages, then your sources and/or arguments are the basic reason why the protoword is here -- i.e., they are precisely the most important piece of information. (I don't disagree with anyone's spelling of proto-BS, by the way, simply because I'm not sufficiently familiar with it to have a principled opinion.) But I see you disagree -- and given this fact, why create pages with one spelling when you can't even agree this is the right thing to do? Why not create a paragraph in, say, Appendix:Proto-Balto-Slavic, that summarizes this discussion -- after you're done with it --, lists your conclusion and the reasons for it? Then you can follow it consistently, and nobody -- or at least not I -- will complain. What I keep not understanding is this need to hide your reasons: that makes no sense to me at all; and it's something that no etymological dictionary I know of has ever done. Why this innovation? --Pereru (talk) 20:16, 24 August 2015 (UTC)
Consensus doesn't necessarily have to be formed through discussion. Sometimes all that's needed is for one person to do something and for others to then follow that example. Consensus is often silent, and therefore undocumented, reflected only in practice. There's no documented consensus for most of the edits people make to Wiktionary entries; it's simply the fact that they're left unreverted that creates a sense of agreement for the new status quo. It's only when someone disputes something that a lack of consensus becomes obvious. In the case of Balto-Slavic, you two have voiced your opinions, so that's how I know. I continued creating entries because I figured, the source of the dispute is the naming, but we can still have good content and when we solve the dispute we can rename the entries. I haven't made further attempts to come to a consensus because the attempts I did make didn't work; Ivan's opinions were fundamentally different from mine on this matter, and nobody else seemed to care enough to provide a third voice, so the matter remained unresolved and both of us just kept doing our own thing. —CodeCat 20:28, 24 August 2015 (UTC)
True. And sometimes what is necessary to challenge it is for someone to come here and say "but this is not right, and here's why". And that's what I'm doing, quite legitimately so, since what I am asking for is no more, no less than what every good etymological dictionary known to man already does -- sources + justifications. So: me being here, and the reactions of several others, show that there is no consensus here. If I were you, I would stop adding any new words, and concentrate on justifying the ones you've added already. You know the reasons you had for creating them, so this shouldn't be a problem. What to do with the proto-BS (or IE, or FU...) words? Justify them. If in the future a given justification is abandoned, because a new one came up... then all those pages will need to be moved, and a new justification added. That's how things go with ideas that aren't attested words (and add the reasons why). --Pereru (talk) 20:44, 24 August 2015 (UTC)
Oh of course, challenge and counter-challenge. And then eventually there's either an agreement, or we all give up until the next time. I actually find it much easier to discuss things with many participants though, that way things are more nuanced and it's not just two opposites clashing and getting nowhere. Much less chance of a stalemate. I will see if I can write up a proposal for PBS reconstructions with those motivations you're after so much. No promises though. I will refrain from creating any more until I do this, but I ask you not to add your templates to the pages. You should also remove them from Germanic pages because the norms are already explained at WT:AGEM, have consensus, and therefore don't need further justification. —CodeCat 21:05, 24 August 2015 (UTC)
Yes, the Wiktionarian way, isn't it? So conducive to the right result!... I also like it when there are more participants. Please! And I will indeed be glad to see you write up your proposals, so that others can see what exactly are the tacit rules you're tacitly following with the tacit (dis)agreement of your peers. WT:AGEM is actually quite good -- proficiat! But it doesn't say why sources or justifications should not be added. (I keep saying: you're following a policy that is not used in any good etymological dictionary anywhere. "Consensus" indeed!...) I will refrain from adding the template to them for now, but unless someone explains why there shouldn't be sources/justifications in Proto-Germanic words I will eventually return to adding them. Why should it be less good to source/justify Proto-Germanic reconstructions than those of any other protolanguage? As in the other cases, they aren't words; their justification is a crucial element to their eligibility for having a page that states they are the "right" protoform, etc. --Pereru (talk) 22:49, 24 August 2015 (UTC)
You can list deprecated reconstructions under ===Alternative reconstructions===, tagging them with {{qual|obsolete}} or whatever. Note the reconstruction from Pokorny in Appendix:Proto-Indo-European/h₂eHs-. The entries for these alternative forms can be soft-redirected as in Appendix:Proto-Indo-European/pel-. --Vahag (talk) 15:29, 24 August 2015 (UTC)
That is a good idea! I didn't know this could be done. Now, would it be OK if I created Proto-Baltic forms as such, under Appendix:Proto-Baltic/xxxx and then redirected them to their Proto-Balto-Slavic equivalents at Appendix:Proto-Balto-Slavic/xxxx? --Pereru (talk) 17:47, 24 August 2015 (UTC)
Of course, the names of Balto-Slavic pages should agree in notation with what is already the current practice for BS entries. Acutes and accent should be indicated when known, and the distinction between ś/ź (former palatovelars) and š (from RUKI) should be maintained, while the letter ž is not used for Balto-Slavic. This means that such things should be corrected for in the redirect as well. If certain features are reconstructible but not indicated in the page name, this should be explained in the entry. For example, if Slavic and Latvian have s while Lithuanian has š, then the expected reconstruction is ś and any difference should be accounted for by the entry. Likewise, if the descendants all indicate an acute but the page name has none, this too needs explaining. —CodeCat 18:05, 24 August 2015 (UTC)
Hi CodeCat! Glad you're not whimsically blocking people today. Now, to keep a form that was reconstructed as PB under a PBS heading would be as wrong as keeping a Latvian word under a Lithuanian heading -- it simply disagrees with the source, i.e. it is factually wrong information. The various letters are just notational conventions, differing from author to author, and could probably be resolved with redirect pages. (You could of course also include this information about Proto-Baltic in the Proto-Balto-Slavic page itself, but I don't see how this would be any better -- care to elaborate?)
Let me give an example. I'm going to recreate the Appendix:Proto-Baltic/akmens page -- CodeCat, please refrain from deleting it until the discussion here is complete -- and make it look like what I'm thinking. Then you guys can give your opinions. --Pereru (talk) 18:34, 24 August 2015 (UTC)
Latvian vs Lithuanian is irrelevant here, that's a completely different case. They are real attested languages, and to label a Latvian word as Lithuanian would be a misrepresentation of the attested facts (not the sources; sources are irrelevant for attestation as we are a secondary source). But for etymologies, sources aren't facts, they're proposals. And as an independent etymological source, we're allowed to make different proposals. So if we think that no, your Baltic reconstruction doesn't make much sense, here's a Balto-Slavic one we agree with more, then we are allowed to do that. Being a secondary source means that we do our own interpretation of the facts. We can of course use the proposals of others as part of ours, which we do. And we should definitely source that. —CodeCat 18:44, 24 August 2015 (UTC)
And reconstructed protoforms are attested as claims at certain levels; and to misrepresent the claims as different from what they were is wrong. If you prefer, compare it to adding a quote to a certain word, but (a) misspelling words in it, or (b) attributing it to the wrong source. Not the right move, ahn?
Thank you so much for saying the sources are proposals -- I had said that to you so many times, I thought you never would agree with me. That's exactly why it's so important to ground them. See, when you create a protoform page here, you're not creating a word: you're creating a proposal. And what makes proposals good or bad are the arguments that support them -- as you yourself said, they are not attested facts. That is exactly why sourcing and arguing them is so important: proposals without the accompanying argumentation are not compelling.
Finally, I have no Baltic reconstructions -- Karulis does. Take it up with him if you want, not me. Just like you haven't invented any of the Dutch words you contributed to Wiktionary (right? you haven't, have you? I mean, maybe you think Dutch is like Proto-BS and you should be allowed to add even the ones you invented yourself without further justification...). I have absolutely no problem with you changing any Proto-Baltic etymologies as long as you document your reason for doing so, or your source, etc. -- so that the reader can see why this is supposed to be better. I repeat: it's not much work, it takes a couple of minutes, and you must have the information already since you're making judgments on the basis of it. There is absolutely no valid reason for you not to do that. Period. --Pereru (talk) 19:08, 24 August 2015 (UTC)
Adding a source to our proposals just says "we agree with this idea". But that doesn't make sourcing important necessarily. Maybe there aren't any proposals that we agree with, and in that case we have nothing to source. So what Karulis says may be nice, but they are your reconstructions as soon as you put them in etymologies. Again, the source simply says that Karulis agrees with you, but you put it in the etymology, so you are proposing it in the name of Wiktionary. And I'm not required to give motivation for changing the etymology if there isn't one to begin with. Take your favourite edit warring target suns for example; the form is not motivated at all, but simply stated as fact, with reference to Karulis. This seems like exactly the kind of thing you're advocating against. A proper etymology, as I understand your view to be, would provide a motivation for the reconstruction. This motivation may itself come from Karulis's work, or it may be your own supplement. Or, it could be documented centrally in an appendix so that we don't have to write it down everywhere. But it would have to exist, even for proposals that are sourced. —CodeCat 19:26, 24 August 2015 (UTC)
Also an added note: Karulis's reconstruction for suns is demonstrably wrong, because it shows the ō > uo diphthongisation for both East and West Baltic. This change only occurred in East Baltic, and is not found in West Baltic{{R:Fortson 2004}} so the form Karulis gives is Proto-East-Baltic. This is one of the reasons I am against over-reliance on sources; sometimes they are quite obviously wrong. —CodeCat 19:34, 24 August 2015 (UTC)
──────────────────────────────────────────────────────────────────────────────────────────────────── If we change suns from saying "From x.<ref>Karulis, Book</ref>" to saying "According to Karulis, from x.<ref>Book by Karulis</ref>", does that solve at least some of this dispute? That's what was done once before when there was dispute over the etymology of bensin — the entry was rephrased to attribute the etymological theory explicitly, rather than giving it in "Wiktionary's voice". - -sche (discuss) 19:45, 24 August 2015 (UTC)
I think it would help, because, to me, the main problem is making sure that everybody's opinion is clearly marked -- Karulis', CC's (or Wiktionary's), etc. It's all a question of knowing we are reporting the right thing.
@CodeCat, look: the point is not whether Karulis is right or wrong -- I have no beef with that. The point is making sure your reasons for agreeing or disagreeing with him are documented somewhere, so the reader can see them and decide if s/he agrees with you or not. So: if you want to copy the paragraph you wrote above and place it, say, somewhere (in suns, or in Appendix:Proto-Balto-Slavic and then link it to suns, adding a few words to the etymology discussion) -- I have no problem with that. My only problem is with you erasing or changing Karulis' opinion, and then contributing something that cannot be checked. Let me see if, when I put it in bold, you will finally react to this: I am not saying you have to believe your sources unconditionally; I am saying that you have to explain the choices you make. You're not explaining your choices; and it would be easy to do so: just create Appendix:Proto-Balto-Slavic and do it there, and link it to other pages. (After discussing the 'best solution' with your colleagues, i.e. after you and Štambuk and whoever else is interested finally agree on how to spell proto-BS words.) But if you simply take down Karulis' opinion without justifying it -- and obviously you can try to justify doing it, since you just did it in the preceding paragraph -- you are NOT improving Wiktionary; you're just making it look more whimsical. My entire point in a nutshell: why hide the reasons for making a choice, especially when this choice is the crucial thing -- the very name of the page you create depends on it? --Pereru (talk) 20:26, 24 August 2015 (UTC)
Are you saying I need an Appendix page in order to remove an etymology I judge to be bad? Lots of other editors before me have simply edited out bad content, nothing to it. I'm just doing what others have also been doing already. It's you that's now trying to change all this and making it much more complicated, and then complaining when someone doesn't simply do it your new way and they start to butt heads with you. —CodeCat 20:32, 24 August 2015 (UTC)
Yes, but it's actually very simple. The Appendix page you need is a general guide to why certain things are 'bad content' -- they don't follow accepted correspondences, or they misapply sound laws, or are based on some idea (say, Glottalic Theory) that has been disproved, etc. Only one such page would probably solve all your problems. Then, when you remove an etymology that you think is bad and replace it with one you think is good, you mention in a footnote that so-and-so proposed the bad etymology, but then there's reason 1 and 2 (say, correspondence nr. 35, and sound law number 4) why this was bad -- see Appendix:Proto-Indo-European reconstructions -- which is why it was removed here. For deeper differences you might have specific pages, but I don't think there would be many of those, no. And it would also be possible to link Wikipedia pages, in case you see one that you agree with and you think actually explains the issue. The actual argumentation for removing an etymology would probably be one sentence long, and be added as a footnote. You could also mark it as a "Wiktionary editorial decision" if you don't want your name there. --Pereru (talk) 21:01, 24 August 2015 (UTC) NOTE: but note also that you'd have to deal with those that disagree with your reason. I suggest that anyone who disagrees with an etymology should first mention it somewhere -- the talk page of the protoform in question, or maybe WT:ES -- before making the change. If you do make the change, then also be ready to discuss with whoever disagrees with it, and if his/her arguments are good, then incorporate them in your rationale for accepting and/or refusing his/her criticism in the original footnote that explains the change. --Pereru (talk) 21:04, 24 August 2015 (UTC)
Unless we make this a rule for the removal of any content, etymological or not, then I'm not on board with this proposal. It would have to be justified why the rules for removing bad etymologies are different from those for removing bad anything else. Wiktionarians have always had the prerogative to delete content they think is bad, and they've never had to refer to some kind of standards document to justify their removal. An edit summary has generally been enough, and often even that is not done. This has worked well enough so far that you're the first to propose a change. So I will be expecting a more general support for this idea as it seems like a solution without an obvious problem. —CodeCat 21:10, 24 August 2015 (UTC)
You might do this if you want, though you yourself have pointed out repeatedly that etymologies are not words, so it's up to you to argue why they should follow the same rules. Feel free to present your arguments. As for me, obviously, protoforms (to quote your post) are proposals, not words; and, in science, proposals exist only because of their arguments. Unless you've changed your mind and no longer think protoforms are proposals rather than words, you should agree, for the sake of consistency with your own stated opinion.
I dispute the idea that Wiktionarians have always been free to delete whatever they thought was bad content; if they don't justify their deletions, they are stopped and blocked after a while -- i.e., others have to agree with them, tacitly or not, or else they are not allowed to continue. Adding justifications to protoforms, especially when you're making choices, falls within this general area. I maintain that for protoforms (= proposals), justifications are more important, let's say as important as sources are for quotes. You don't seem to want to address that, so I'll assume you tacitly agree (as you assume those Wiktionarians who don't revert your edits tacitly agree with you -- "tacit consensus", right? :-) --Pereru (talk) 22:22, 24 August 2015 (UTC)

I kind of like this idea – referenced Proto-Baltic pages with a clear disclaimer that it's a defunct grouping in its classical sense according to the most recent sources. (Disclaimer: I have yet to see serious challenging of Slavic being a daughter of W-Balt, thus I do not believe that there can be W-Balt + E-Balt grouping excluding Slav).

We do not misrepresent PB sources by altering the form of the reconstruction to make it PBS-like. --WikiTiki89 I agree with this, "sadly" that is not exactly the case ("correcting" referenced (even if deprecated by mod. stand.) PB forms to Orig. Res. PBS forms is what led to edit-warring a while ago, in my reading of things.)

My (personal/pseudoscientific) reading between the lines of Pereru's proposal is that it would serve as another "safety valve" and, baby, we couldn't have enough of those, lol. Neitrāls vārds (talk) 22:00, 24 August 2015 (UTC)

From where I stand, PBS does look like a better grouping than PB (the evidence seems to be accumulating). But in the absence of a general work on the topic (say, a PBS etymological dictionary), I don't think it can be regarded as settled -- I'm just conservative on this point. But I have nothing against it as a theory, and as long as things are clearly marked and sources are not misrepresented, I have no problem with it. --Pereru (talk) 22:22, 24 August 2015 (UTC)
Also, I'm not in principle against altering the forms of reconstructions -- I just think this should be done in the open, with the rules clearly laid out and placed in some page where others can see them. What is the point of "adjusting" a form to a spelling that was not in the original source, and then doing nothing, not even adding a footnote, thus misrepresenting the original content? And it's so easy to do it right -- just add the footnote, or change the source to the one whose spelling you think is better. This implies adding only a few words, keeps things clean and organized, and doesn't prevent anyone from expressing his/her agreement or disagreement with this or that protoform. Why not do it? Or, worse yet in CodeCat's case, why fight against it? --Pereru (talk) 22:26, 24 August 2015 (UTC)
Yes, wouldn't it be so much easier if everyone just saw it your way? Why do people always have to make it so difficult by disagreeing with you? It's so inconvenient. —CodeCat 22:38, 24 August 2015 (UTC)
Indeed! You have much more experience than I do with being in this position, so I'm hoping you'll share your wisdom in this respect? And especially with respect to my old, old question: "all good etymological dictionaries do it this way, and CodeCat does the opposite. Now, who do you think is more likely to be wrong?..." --Pereru (talk) 22:55, 24 August 2015 (UTC)
So, to summarize: I'm OK with deprecated pages/redirects, as long as it is clear which form is which, and who proposed what and why. As far as I'm concerned, this settles the question. --Pereru (talk) 05:25, 25 August 2015 (UTC)
But why create Appendix:Proto-Baltic/akmens with an unusual "deprecated" infrastructure? Why a hard redirect wouldn't do? In case of proto-languages on the same level we should use soft redirects, because the page can contain homonymous roots. Why do that for Proto-Baltic? How will a user ever even get to the Proto-Baltic page? --Vahag (talk) 09:09, 25 August 2015 (UTC)
Why a hard redirect wouldn't do? Hard redirect to what though? You mean akmō which is somehow mysteriously unciteable (I was actually looking at it and wondering whether to ask Itsacatfish if it would be possible to come up with some refs (non-agnostic of PBS) but I wouldn't want to draw any "innocent" editors in this drama.) Neitrāls vārds (talk) 22:31, 25 August 2015 (UTC)
Uncitability is a different question and has nothing to do with the policy of redirecting. The PBS page will presumably have CodeCat's original-research justifications (I'm with Pereru on this one). --Vahag (talk) 08:26, 26 August 2015 (UTC)
The discussion here (including other headers above) seems to have some of the problems arising from overdoing lexicography:
  1. from trying to use sources to "attest" reconstructions,
  2. and from treating reconstructions as "headwords" — instead of kind of index words for etymologically connected word groups.
Creating redirects for alternate reconstructions, and discussion of competing (though not necessarily deprecated) approaches both sound like good ideas, but I do not see the benefit in creating separate pages altogether for reconstructions based on more or less the same data as another one.
If (and it appears to me that this is an if) the point of protolang pages is to illustrate the connections between attested languages, then cutting down on repetition is necessary. We do not create separate appendices for things like West Germanic or Anglo-Frisian, even though they are known to have existed; since this stuff can be adequately discussed already in the "Proto-Germanic" appendices.
I would hold that, strictly speaking, we have no such thing as an "accepted reconstructed language" on Wiktionary — that's why they go in the Appendix namespace to begin with. Which is not a namespace that means "just like mainspace, but for second-tier languages". As I see it, an appendix-only status means not only that protolanguages can be subject to new limitations like possibly requiring sources, but also that they don't need, and in some respects probably shouldn't, be treated as lexicographic subjects.
I also welcome explaining systematic details on how and why to present reconstructions on pages like Wiktionary:About Proto-Balto-Slavic. That said, if the dispute is about a current inability to establish a consensus reconstruction of PBS that we could use as the index forms, there are a couple of alternate possibilities that can be considered:
  • Picking an index language and listing forms under its reflex. In the 'stone' example above, we'd perhaps use Lithuanian akmuo. This seems a bit difficult to fit into the Appendix:Proto-Whatever/word notation, though (it might appear to imply that it is a Proto-Baltic or Proto-Balto-Slavic form rather than Lithuanian).
  • Using rough "non-reconstructions". A convention introduced by I think Roger Blench is the symbol "#" in place of "*" when we have a cognate word-family but no systematic reconstruction scheme has been worked out in detail; and adding this to some kind of a "majority representation" (in principle partly arbitrary) of the word root's shape. In this case this would probably bring us to #akmV (since the ending seems to be the main issue).
--Tropylium (talk) 08:21, 29 August 2015 (UTC)
Tropylium said (..) problems arising from overdoing lexicography (..) from trying to use sources to "attest" reconstructions – Dan brought up that in their opinion protoforms need (in wiki jargon) tertiary sourcing (as opposed to secondary sources.) I completely agree with this (I also think that the vote that's "in the pipeline" essentially implies this.) A way to paraphrase it would be to say that protoforms need to be sourced as "ideas" or "concepts" (which is exactly what they are, imo) as opposed to sourcing them like "real words." On one hand the sourcing requirements are more stringent, OTOH in that they are not "words," things such as "uniformifying" their spelling would be allowed (and prob. encouraged) unlike "real words" where one would need an "alternative/archaic/blah spelling of". There's a bit of disc. on that here.
It sounds like there might be some kind of inflation of "source grades" going on here. In lexicography, an attestation is a primary source, while a mention in a research paper would be a secondary source. However, in etymology, attested words are merely data, while a publication proposing an etymology or a reconstruction is a primary source. An etymological dictionary would be a secondary source; it'd take something like an etymology section in a general-purpose dictionary to reach a tertiary source. And as usual, there should be no reason to demand tertiary sources specifically.
It also follows that basing a reconstruction on "just look at these words here" is not merely synthesis of existing sources, it's unambiguously original research; and, similarly, demanding etymdict-type sources would be equal to positioning Wiktionary as a Wikipedia-type tertiary source. --Tropylium (talk) 20:32, 31 August 2015 (UTC)
In lexicography, an attestation is a primary source – I think on wikt. this is treated as secondary(?) – their head/brain was the "primary source," then they publish it (which renders it "archived") and then it becomes secondary (in my reading of wiki jargon anyhow.) Perhaps, indeed, there could be a "shift forward" by one "grade," so, if lexicog sourcing is primary in your terms then it would make the proposed protoform sourcing "secondary" (or tertiary if it's "shifted" by one.) Neitrāls vārds (talk) 20:53, 31 August 2015 (UTC)
Etymological dictionaries are sources of new information as well, not just research papers. I consider them secondary sources, because the primary source is the attestations themselves, and etymological dictionaries and research papers contain interpretations and conclusions based on these attestations. This is the same as what Wiktionary does: Wiktionary collects attestations in the form of citations from primary sources, and then makes interpretations and conclusions as to the meanings of the words and other aspects. Reconstructions are just another kind of interpretation and conclusion drawn from the data, except they're drawn from attestations of many words collectively. Hence, the question that still remains to be answered is whether Wiktionary is an etymological dictionary (secondary source with its own interpretations) or an encyclopedia/compendium of etymological research (tertiary source). Currently, Wiktionary is an etymological dictionary/secondary source as it contains its own interpretations of the data. —CodeCat 22:27, 31 August 2015 (UTC)
Imho the Baltic stuff doesn't even merit a discussion, make it an etyl-only lang and period. However, edit-warring between certain users, a knee-jerk block on a certain user and a lot of generally disruptive stuff (in the strictest sense of this word, aka, some users would probably be making useful contributions if not being swept up in this drama, hence disruptive) show that there is definitely lack of tools to solve (or if you ask me, not have it in the first place) this type of stuff. Neitrāls vārds (talk) 19:52, 31 August 2015 (UTC)

Just in case, here's the page I was referring to WP:TERTIARY.

I'm also confusing some things myself: it's not the act of sourcing that is to be any "-ary", it is what the project is supposed to be, e.g., wikip. is supposed to be a tertiary source, wikt. is supposed to be a secondary source (well, kind of... the lexicog part at least?), as opposed to secondarily/tertiarily sourced. This is where the "grade inflation" came from, sorry, my bad!

Also, I don't think there would necessarily be difference in the "grade" of a res. paper and an etyl dict. as the wikip. page seems to suggest that it's the manner of how it is being discussed that determines the "grade", e.g.,

  • John Doe wrote in his book "the water is so clear and splashy..." – someone on wikt. makes the judgement that the term "water" is used to mean H2O, J. Doe then is primary source and by using this citation to assert that "water" does indeed mean H2O wikt. is a secondary source.
  • But then Jane Doe mentioning a (hypothetical) protoform *wōdor (or whatever) would have to spell out that "wōdor is H2O in Proto-Whatever, because, I, Jane Doe said so." Then this becomes secondary and by quoting this wikt. is being "tertiary."

Neitrāls vārds (talk) 18:15, 1 September 2015 (UTC)

An etymological dictionary can be a primary source, in case its editorial team advances any analyses or etymologies that are entirely new. Most that I have used mainly compile etymologies established by earlier research, however.
This might be something dependent on the language family though… families like Indo-European or Uralic have a deep research history and there is quite a bit to be cited; but elsewhere, it might well happen that a wider etymological dictionary will be the first etymological source to treat a given language at all.
(The arbitrariness of the language/dialect division brings up a couple difficulties here, too; if variety X had earlier been considered merely a dialect of language A, then it'll be debatable if old sources on language A will count as sources on X as well.)
But back to reconstructions: when we're dealing with unattested protolanguages, the crucial point is the lack of lexicographical primary sources. (I'll skip edge cases like Latin or Sanskrit for now.) This means that something else has to be the primary source, and this is going to depend on what exactly we are sourcing.
  • If we want to establish a proto-Fooian word for let's say 'macaroni' having existed at all, then already someone pointing out that a set of words in Fooian languages for 'macaroni' are of common inheritance is a primary source (common inheritance necessarily requires that a PFooian word existed). Of course, this also presumes that a Fooian language family is already established. After all we presumably do not want to leave backdoors open for Proto-Worlders, Hungaro-Sumerists, everything-is-Tamil-ists, or even partisans for debated families like Hokan or Nilo-Saharan…
  • If we want to know some details about how the PFooian word should be reconstructed, someone's paper or monograph or course handout or whatever, on Fooian historical phonology or historical morphology or semantic change or whatever will be a primary source; and I suppose this holds even if they don't treat the particular word we're interested in. (After all, you don't need to list every single example out there in order to establish that e.g. French m- usually corresponds to Spanish m-.)
  • If later on Jane Doe comes along to put the pieces together to state "the Proto-Fooian word for 'macaroni' is *nuduly", this would indeed be a secondary source — provided that she's not proposing new details in the process.
--Tropylium (talk) 21:10, 1 September 2015 (UTC)

Links in examples of non-English words

I mean in the entry πειρατής#noun (meaning: pirate) the example

  • Πειρατές του Αιγαίου (meaning: pirates of the Aegean Sea)

(strictly speaking this example should be in the entry πειρατές, which is the plural nominative of πειρατής)
I believe links of this kind are very useful for an English-speaking person who wants to learn that other language (Greek in the case above), because she/he can examine the word-for-word translation of the example (when a word-for-word translation can be provided for an example).
Another user reverted an edit of mine that added a link of this kind. Is there a Wiki-Decision on this issue? SoSivr (talk) 10:26, 25 August 2015 (UTC)

See WT:ELE#Example sentences: "Example sentences should... not contain wikilinks (the words should be easy enough to understand without additional lookup)". However, that policy may have been written with English example sentences in mind; perhaps it's time to reconsider it for other languages. —Aɴɢʀ (talk) 10:55, 25 August 2015 (UTC)
Yes this occurred to me some time ago. I'd like to split the rule for English and non-English entries, or just abolish it all together. Renard Migrant (talk) 17:04, 25 August 2015 (UTC)

Implementing some type of autolinking in usexes has been brought up (by Benwing, I think?) and I really like this idea. I have been doing this manually (as in Россия ‎(Rossija)), it's a bit of a pita doing it manually though. Neitrāls vārds (talk) 23:52, 25 August 2015 (UTC)

"A bit of a pita"? I don't support autolinking in usage examples. It barely works in headword lines. --WikiTiki89 00:33, 26 August 2015 (UTC)
Oh... I just got what a pita is (only because someone else used it in all caps in a discussion below). --WikiTiki89 03:06, 26 August 2015 (UTC)
I support autolinking. It would be better if they were black links, because the wrong links would be hidden and it would be easier on the eye. — Ungoliant (falai) 00:37, 26 August 2015 (UTC)
@Ungoliant, with the so-called "orange" links (when a landing page doesn't have the header for that lang) built into the software, they could be made pretty accurate (only capitals at the start of sentences would be a problem). @WikiTiki, well, perhaps the person originally suggesting this could share their vision of how it could/couldn't be implemented, Idk. Neitrāls vārds (talk) 01:24, 26 August 2015 (UTC)
Full support. Autolinking has been a de-facto standard for Chinese lects. In fact, you need to add an @ sign to remove links in {{zh-usex}}. In any case, the choice should be available for difficult or rare words, especially in foreign languages. I consider this quite important for languages without spaces between words (existing usexes may need to be rewritten to allow autolinking as in เรียก). --Anatoli T. (обсудить/вклад) 01:27, 26 August 2015 (UTC)
I like the idea of autolinks, if it can be done right. Wikitiki, can you explain what doesn't work currently? Benwing2 (talk) 07:06, 26 August 2015 (UTC)

Adding a collocations tab or section

In the past, there has been support for listing common collocations somewhere (besides usexes, which only fit a few), such as in ====Collocations==== sections. At WT:RFD#sentimental_value, it was suggested that not only collocations but also translations be provided. IMO, it might consume too much visual and byte space to list translations of collocations within entries, so I propose that we [ask the developers to] create a 'Collocations' namespace with its own tab like 'Citations'. We could also link to it using a {{seeCites}}-type template in entries. In that namespace, we could list common collocations, perhaps as the glosses to translation tables to which translations could be added — I have mocked up an example at Talk:goods; note that SOP translations are linked to their component parts. What do you think; would you like a Collocations: tab, a ====Collocations==== section, or neither? Should the tab or section contain translations, like at goods? - -sche (discuss) 19:44, 25 August 2015 (UTC)

Seems like a reasonable solution to a perennial problem, at least if the default search includes the Collocations namespace. If it doesn't, we won't have helped users. I suppose I would support it anyway because we might be able to come up with some other way to facilitate user search access to it or technical possibilities and rules may change. DCDuring TALK 00:18, 26 August 2015 (UTC)
Adding another namespace is a PITA because there's no enforced correspondence between the entry and the other page (which is why the citations namespace should be deleted). If it's too much of a distraction it can go in a collapsed box. DTLHS (talk) 02:18, 26 August 2015 (UTC)
  • I don't know the technicalities of the issue. However, I would strongly support this idea, as it would make a natural repository for SoP expressions which actually have some linguistic value, such as what we often call "set phrases", or the answer to "what is the usual (unexpected) verb which collocates with this noun?" ("wage war", "run for president", "wax lyrical"), as well as being a useful tool for Eng L2 students. There are well known lemming dictionaries out there devoted to the theme of common collocations. -- ALGRIF talk 15:50, 29 August 2015 (UTC)
I support this too. Useful for everybody. — Ungoliant (falai) 16:26, 29 August 2015 (UTC)
As a follow-up to my proposal re translations tables (see my mock-up of what Collocations:goods might look like): when a collocation has a synonym which has a main-namespace entry, we can of course use {{trans-see}}, like so. - -sche (discuss) 03:02, 1 September 2015 (UTC)
  • I support a separate namespace for collocations to such an extent that I strongly oppose an additional Collocations section in principal namespace that took up as much screen space as -sche's example. Imagine what that would look like for words like set, take, have, head. I think users of English Wiktionary who want a usable monolingual English dictionary for definitions and diction guidance already have a lot of drek (from their POV) to contend with:
    1. German or Translingual entries where their WP habits lead them to expect English;
    2. {{also}} that often leads them to FL entries;
    3. alternative forms sections that go to form of entries that convey no additional information;
    4. lengthy etymologies, with PIE and cognates;
    5. pronunciation sections they can't use without IPA;
    6. translation tables;
    7. semantic relations headers using words that don't occur in normal speech [hypernyms, hyponyms, troponyms, meronyms].
Adding something else to this list seems like a good way to drive any English monolingual speakers away for good. DCDuring TALK 13:21, 1 September 2015 (UTC)
The version of Google that I end up using most often shows succinct definitions of English words above the search results, without my having to visit any of the search results. When traveling, Google automatically switches to the local version of the site, and I haven't seen any real differences. I also saw a tidbit about Wikipedia shedding 250 million visits (in July, I think?); aside from a "summer slump," these Google "blurbs" were mentioned as a factor in this rather dramatic decrease.
Imo, if Wiktionary is to ever have "an edge" over competitors, it's by providing highly detailed information. Neitrāls vārds (talk) 23:52, 2 September 2015 (UTC)
  • "collocation" is itself a word that doesn't occur in normal speech. If this is going to be helpful to everyday users, I think it needs a clearer name. Nothing's coming to mind though. Also, which collocations would be common enough to list? Smurrayinchester (talk) 13:20, 3 September 2015 (UTC)
    I think we would have to have the courage and humility to have a name for the tab like "Words used [with] this word". DCDuring TALK 17:48, 3 September 2015 (UTC)
    How about just "phrases" or "related phrases"? Equinox 17:50, 3 September 2015 (UTC)
    Or Derived phrases, as I assume the headword or its inflected forms must be included in the collocations. That would at least be consistent with the use of derived in derived terms. DCDuring TALK 21:40, 3 September 2015 (UTC)
    I would prefer it that derived terms be split into different types more generally. Separating phrases from other derived things is useful, but it's also useful to separate, say, compounds from affixed words, or compounds with the term as a head from compounds with the term as a modifier. —CodeCat 21:42, 3 September 2015 (UTC)

I like this idea. While I don't have particular dislike for the citations tab (admittedly it's a bit of a mystery to me), I'm inclined to agree with DTLHS in that I prefer all the information in the relevant entry. Neitrāls vārds (talk) 23:52, 2 September 2015 (UTC)

Thanks for drawing my attention to the Google definitions. They seem better in quality than ours. They have copious synonyms. We cover more variation in senses, including obsolete, archaic, and obscure ones, though they have an excellent expanded display of additional definitions. They have good etymologies, though what I saw was presented in a confused graphical way that IMO misrepresented the facts they reported in text. They offer translations, too. IMO, all online dictionaries will have problems competing.
The bright [side] of the reduced number of WP visits is the lower load on the servers!!! DCDuring TALK 00:25, 3 September 2015 (UTC)
  • I oppose creating a separate namespace for collocations. Listing them in the mainspace is fine. --Dan Polansky (talk) 18:34, 4 September 2015 (UTC)
  • Re Derived Terms vs Collocations. A derived term would normally be a blue link term. E.g. mineral water is a derived term in the entry for mineral. A collocation is "use of English". Collocations are generally not going to be suitable as main entries, for SoP-iness. It's, for instance, knowing typical useful verb-noun collocations - such as "follow instructions", or "take aim", or "entertain a doubt". It's also about typical adjective-noun or adverb-verb collocations, etc. Words such as "wedge" would be enhanced if you could read that one "drives a wedge between" things, and so on. Word chunks that, if you know them, can make your English sound "good". However, I repeat, these are not derived terms. I am of the opinion that, by the time any Eng L2 learner is wondering about this stuff, they will already be familiar with the term Collocation, and so will be very happy to see the section, or tab, available as a resource. -- ALGRIF talk 15:32, 7 September 2015 (UTC)
I disagree that an English L2 learner would likely know the meaning of the word "collocation". For instance, my French is at the point at which such a tool on Wiktionnaire would be very useful to me, but if it was labelled "collocation", I would only know what it meant because it's the same word in English. I haven't ever seen the word in French before (the only reason I know it's the same is because I just looked it up), and I imagine it would be the situation for second language speakers of English. Something slightly wordier, but using more familiar English, might be "common phrases with this word" or "derived phrases" (as distinct from "derived terms"...which could be confusing, especially for new editors). Andrew Sheedy (talk) 21:43, 7 September 2015 (UTC)
While not detracting from your experience, I would also say in my defense that anyone taking First Certificate or similar would already know and use the term Collocation. Furthermore, whatever is chosen in the end has to fit neatly on a tab. -- ALGRIF talk 13:33, 8 September 2015 (UTC)
I bet more people would know collocation than hyponym, hypernym, coordinate and etymology. — Ungoliant (falai) 14:11, 8 September 2015 (UTC)
That's not saying much. Etymology, though is different, because it's present as a label in lots and lots of dictionaries. My suggestion would be "combinations", as in "what terms do you usually find it combined with?". While combine isn't a basic word like you would find in a book for young children, it's not eye-glaze material like collocation. I read a lot about language and have an undergraduate degree in Linguistics, but I don't remember ever seeing collocation in use before I came here, nor did I remember what it meant when I first saw it- though I must have encountered it at least a few times over the years. Chuck Entz (talk) 01:28, 9 September 2015 (UTC)
How about "Expressions" or "Phrases"? --Panda10 (talk) 13:16, 9 September 2015 (UTC)
FWIW, despite having received a degree in linguistics, I had never heard collocations used in this sense until this discussion; whereas I have heard and used hyponym, hypernym, coordinate and etymology numerous times. It certainly appears to be the correct term for this discussion, and it seems like in the past we have not shied away from using the right terminology (like collateral form or deponent) despite the relative obscurity. I'd certainly be fine with a different name, but also wouldn't dismiss collocation wholesale based on obscurity. —JohnC5 13:34, 9 September 2015 (UTC)
  • I don't know if anyone has looked at the Pedia entry? It is very informative for those of you who are not sure about the use, usefulness, or correctness of collocations. -- ALGRIF talk 15:02, 24 September 2015 (UTC)
 Collocation on Wikipedia


I have created Wiktionary:Votes/2015-09/Adding a collocations or phrases namespace or section so we can obtain a clear, enumerated consensus (or lack thereof) to show the devs, because we will need to ask them to add the namespace if the namespace is what we want. (It's not difficult to ask them and AFAIK it's not difficult for them to add a namespace; it's how we came to have a citations namespace. It's just a technical observation that we can't create a namespace ourselves, we have to ask them.) Please fix/point out any problems you see with the vote, suggest/make improvements, etc. As it gets closer to the scheduled start date, I will ping anyone who has participated in this discussion but doesn't seem to have noticed the vote. - -sche (discuss) 18:09, 24 September 2015 (UTC)

Gender markers in Polish adjective entries

{{pl-adj}} currently requires a gender parameter. However, gender in Polish adjectives is inflectional, not lexical, and the lemma form is almost always masculine nominative singular (with rare exceptions for "female-only" adjectives like ciężarna ‎(pregnant) or szczenna ‎(pregnant with puppies)). I think these markers should be removed and the gender parameter ignored and eventually removed through a bot, as the exceptional cases can be easily identified by looking at the adjective ending. Are there any objections? --Tweenk (talk) 22:30, 26 August 2015 (UTC)

If the gender can always be determined from the ending, then this sounds good to me. Even if in rare cases it can't, it might still be better to have the gender auto-detected and only present as an override. Benwing2 (talk) 08:30, 27 August 2015 (UTC)
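A minimal Python sketch of the "auto-detect, with an override" idea, for illustration only. It is not the actual {{pl-adj}} logic; the function name detect_gender, the override parameter, and the ending rules used here (-y/-i masculine, -a feminine, -e neuter) are assumptions made for this example. Lemmas that end in none of these letters return None, which is where an editor-supplied override would be needed.

```python
def detect_gender(lemma, override=None):
    """Guess the gender of a Polish adjective lemma from its final letter.

    Hypothetical helper: an explicit override always wins; otherwise the
    last letter decides, and unrecognised endings yield None.
    """
    if override is not None:
        return override
    # Assumed ending-to-gender mapping for nominative singular forms.
    endings = {"y": "m", "i": "m", "a": "f", "e": "n"}
    return endings.get(lemma[-1:].lower())
```

With this shape, ordinary lemmas need no gender parameter at all, and the parameter survives only as an escape hatch for the rare cases the ending does not settle.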

Allowing matched-pair entries

I created Wiktionary:Votes/2015-08/Allowing matched-pair entries as a proposal to formalize entries such as ( ), based on the discussion Wiktionary:Beer parlour/2015/July#Merging ( and ) into a single entry. Thoughts? Can this vote be improved? What would be your vote and why? Feel free to edit it. --Daniel Carrero (talk) 14:27, 27 August 2015 (UTC)

Having an entry for () or some variant is one thing, but I doubt we actually want to delete ( or ) as they're real. Of course, they can be used in smileys where pairing is not necessary. Renard Migrant (talk) 16:37, 9 September 2015 (UTC)
A more legitimate unpaired use outside of smileys is, for example, numbering: 1) like this. 2) like that. --WikiTiki89 16:41, 9 September 2015 (UTC)
Yes. Because :) is debatably more of an image than a word or a symbol. Renard Migrant (talk) 16:46, 9 September 2015 (UTC)
True. See also Citations:) for examples. --Daniel Carrero (talk) 18:30, 9 September 2015 (UTC)

Scientific symbols?

At present, there's no good place to put scientific symbols in entries (e.g. E for energy or electric field, t1/2 for half-life, etc.). What would people say to modifying {{en-noun}} or creating a new inflection line template to show these symbols (similar to what's currently done at speed of light, but neater)? So for instance:

speed of light ‎(uncountable, symbol c)
velocity ‎(countable and uncountable, plural velocities, symbol v or v)
magnetic flux ‎(uncountable, symbol Φ or ΦB)
neutron ‎(plural neutrons, symbol n)

There are some shortcomings (for instance, the need to sometimes use bold or italics) so I'd be happy to hear other suggestions. Smurrayinchester (talk) 15:49, 27 August 2015 (UTC)

I think it would be better if we agreed on a guideline on how to add them to definition lines rather than HWLs, because a symbol doesn’t always apply to all senses (i.e. velocity ‎(rapidity of motion) and speed of light ‎(figurative: extremely fast speed)). — Ungoliant (falai) 15:57, 27 August 2015 (UTC)
Why aren't they just displayed next to the appropriate {{sense}} under Synonyms, just like abbreviations sometimes are and always should be, IMO. I could understand making these symbols larger, having a different background or a border, etc to make them more visible as they could get lost in a series or block of synonyms. DCDuring TALK 16:26, 27 August 2015 (UTC)
Surely not worth changing en-noun for this. Use alternative forms or synonyms. If really necessary use {{head|en|noun}}. Renard Migrant (talk) 17:38, 27 August 2015 (UTC)
In an entry for an English word there is a section English, in an entry for a French word there is a section French and so on. But in an entry for a number, e.g. 7 or for a symbol, e.g. c, there is a section Translingual. Therefore similarly one could have an additional translation for e.g. the English noun velocity as rapidity of motion:
  • French: vitesse
  • Spanish: velocidad
  • Symbol (or Translingual): v

SoSivr (talk) 21:39, 28 August 2015 (UTC)

It's not a translation of the word, though: it's a conventional abbreviation. Equinox 21:42, 28 August 2015 (UTC)
Such symbols are normally Translingual. Thus they might be a synonym in many languages. DCDuring TALK 21:56, 28 August 2015 (UTC)
I agree with DCDuring, list abbreviations in the Synonyms or Alternative forms section. This is also how we handle non-scientific abbreviations, in my experience, like United Kingdom → UK. - -sche (discuss) 22:15, 28 August 2015 (UTC)

Attributive use of nouns

How do we tell for certain that a noun that modifies another noun is or isn't an adjective? For instance, I'm pretty sure that the word donkey in "donkey sanctuary" is just a noun, as is beer in beer parlour. An example of true adjectival usage would be welcome. SemperBlotto (talk) 14:57, 29 August 2015 (UTC)

See Wiktionary:English adjectives, which is, of course, not policy. Wiktionary:About English contains no policy that I can see on what separates an adjective from a noun used attributively. I actually don't think it's that hard and in ambiguous cases, there should be three citations which are clearly adjectival, not either nominal or adjectival. For example "this desk is wood" would not count as a clear adjectival cite as it's just as easily (or more easily) identifiable as a noun than an adjective. Renard Migrant (talk) 15:11, 29 August 2015 (UTC)
It's very difficult to get a wording through a vote, though. Even people who agree that we need such a policy will oppose on the grounds of wording, so getting 70% ish approval is unlikely. Renard Migrant (talk) 15:12, 29 August 2015 (UTC)
Would you say that epidemic in "epidemic proportions" is an adjective? It seems so to me (but I can't explain why). SemperBlotto (talk) 15:55, 29 August 2015 (UTC)
Yes, you're right. [9]. Donnanz (talk) 17:02, 29 August 2015 (UTC)
Yes, "proportions" usually takes an adj; e.g. you'd say "canine proportions", not "dog proportions". Equinox 17:11, 29 August 2015 (UTC)
Apply tests of adjectivity, and Occam's razor. Donkey has not (yet) been shown to be used in contexts that are clearly adjectival, like this sanctuary is donkeyer than that one; it was very red and very donkey. In contexts where either a noun or an adjective could work (donkey sanctuary could be compared to noun sanctuary or improbable sanctuary), Occam's razor suggests it's more likely to still be a noun than to have acquired a second part of speech which is peculiarly limited to only those varied contexts where the first part of speech could also be used. On the other hand, epidemic is used in contexts where only an adjective could work, so it must be an adjective (some of the time). It's also used in contexts where only a noun could work, e.g. in the plural, hence it is also a noun. When a word that has been shown to be both an adjective and a noun is used in contexts where it could be either (like epidemic disease), I think we've tended to default to the interpretation that it's an adjective unless semantics make the other interpretation more likely: e.g. the adjective is the best semantic fit in epidemic fraud (widespread fraud), while the noun would be the best fit in *epidemic storage (section of a lab which stores samples of viruses that cause epidemics). But if a prime minister fakes an outbreak of disease in order to push through security measures, you could speak of "his epidemic fraud" with epidemic as a noun, just like you could mock postmodernism as "that postmodernism nonsense". - -sche (discuss) 16:45, 29 August 2015 (UTC)

General thoughts on this:

  1. The reason that it's difficult to get a policy through is that there's no bright line.
  2. This is a particularly confusing subject for non-English speakers. I speak English 1st and French 2nd. French isn't big into attributive nouns. In English, you can construct a sentence "A B", where A is an attributive noun and B is a common noun. In French you usually construct it "B de A", where A and B are nouns and "de" is the preposition "de".

Purplebackpack89 17:01, 29 August 2015 (UTC)

Some French adjectives feel a lot like attributive nouns to me, e.g. routier (not comparable and so forth). Equinox 17:11, 29 August 2015 (UTC)
True adjectives can be qualified by adverbs. Epidemic fraud → fraud was indeed epidemic; Epidemic storage → *the storage was indeed epidemic (in the sense -sche mentioned); the table is wooden → the table is solidly wooden; the table is wood → the table is solid wood, *the table is solidly wood. — Ungoliant (falai) 17:15, 29 August 2015 (UTC)
I wouldn't really have any problem with "the table is solidly wood". —CodeCat 17:45, 29 August 2015 (UTC)
In "the table is solidly wood", solidly is modifying is, not wood. --WikiTiki89 14:35, 1 September 2015 (UTC)

restoring solitary wasp


Perhaps not the best place to post a request but I don't know another place to do it. This is a perfectly attestable expression, and my grammar, though not perfect (I'm not a native speaker) was certainly acceptable, and at least correctable if there were mistakes. Could someone bring back that entry please? I'm really fed up with the cavalier behavior of this admin, really (and not the only one). Thank you 20:50, 29 August 2015 (UTC)

Hi. Yes, it's a real phrase, but doesn't it just refer to any wasp that is solitary (i.e. not social or colony-dwelling)? Then it's obvious from the two words. Equinox 20:59, 29 August 2015 (UTC)
It seems that the terms solitary wasp, social wasp, and hunting wasp have been used as if they referred to well-defined groups, though most modern thinking would apparently have them as SoP. For example, Century 1911 has solitary wasp as a run-in at the entry for solitary. DCDuring TALK 21:33, 29 August 2015 (UTC)
This has been recreated, and I have rewritten it in English. However, I can't find it in any other dictionary and feel it is sum-of-parts. There are plenty of hits for the two words used together so I'm not sure that RfV would be useful. SemperBlotto (talk) 20:33, 31 August 2015 (UTC)
Take a look at the bottom of the entry for solitary in The Century Dictionary, The Century Co., New York, 1911 where it is a run-in entry. I take this to mean that "the solitary wasps" was considered at least an informal grouping at that time and that the most promising source of citations would be before 1910, though the term may have continued in use past that time. DCDuring TALK 22:23, 31 August 2015 (UTC)
I have edited the entry in line with the thoughts above, adding a dated definition with cites from the 19th century and adding {{&lit|social|wasp}} to replace the previous SoP definition. Note that two of the citations are of Social Wasps, the capitalization being suggestive of something other than SoPitude. The entry could be further improved or challenged, of course. DCDuring TALK 22:55, 31 August 2015 (UTC)

Which English entries need pronunciation?

Can someone generate a list of English entries that don't have {{IPA}}? But somehow sort them in order of importance? I'm not sure how we would go about that, but there are basic entries out there, like garbage, which really should have the IPA pronunciation. Ultimateria (talk) 04:41, 30 August 2015 (UTC)

I'd like this too, though "in order of importance" is probably an unattainable goal. I've added pronunciation info at garbage now. —Aɴɢʀ (talk) 06:52, 30 August 2015 (UTC)

Here's a list of top 100 English entries whose English section did not contain "{{IPA" on 28 July 2014, ordered by Wiktionary:Frequency lists/PG/2006/04/1-10000, not constrained to lemmas, based on 20140728 dump: said, no, de, hands, Gutenberg, english, 2, replied, united, john, looking, coming, making, sn, arms, followed, appeared, continued, ety, reached, suddenly, miles, taking, beyond, nearly, laws, comes, natural, laid, copyright, opened, an', 4, makes, tried, Dr, lived, certainly, unto, placed, letters, remained, blockquote, happened, minutes, loved, knows, donations, thoughts, including, filled, seeing, tears, places, raised, moved, giving, laughed, leaving, started, circumstances, c., lines, considered, observed, wished, Charles, formed, trying, allowed, girls, discovered, sitting, ways, officers, offered, happiness, produced, walls, declared, prepared, takes, soldiers, talking, steps, intended, matters, appears, closed, gives, required, ladies, fixed, troops, camp, copies, v., running, cases, names.

If you want to have the list constrained to lemmas, let me know. Basically, let me know:

  • a) How many items you want
  • b) Whether you want to constrain to lemmas
  • c) To what location do you want the list delivered, like someone's talk page, some subpage, or the like

The process is rather simple, based on a dump. The key part is identifying English sections that do not contain "{{IPA". This is done using the following script find-missing-English-IPA.py:

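import re
import sys

in_english = False   # True while we are inside an ==English== section
ipa_found = False    # True once that section has shown {{IPA or {{audio-IPA
title = ""
with open(sys.argv[1], encoding="utf-8") as dump:
  for line in dump:
    line = line.rstrip()
    # Remember the title of the page currently being read.
    if "<title>" in line:
      title = re.sub(" *</?title> *", "", line)
    if in_english:
      if "{{IPA" in line or "{{audio-IPA" in line:
        ipa_found = True
      # The section ends at the ---- separator or at the end of the page text.
      if "----" in line or "</text>" in line:
        in_english = False
        if not ipa_found:
          print(title)
        ipa_found = False
    if "==English==" in line:
      in_english = True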

The rest is intersecting the result with the frequency list in such order that the result is sorted by frequency list. The process was as follows:

  • find-missing-English-IPA.py enwiktionary-20140728-pages-articles.xml >English-entries-with-no-IPA.txt
  • grep -Fx -f English-entries-with-no-IPA.txt frequency-list-English-PG-10000.txt >t.txt
    That's a set intersection, but the order of the files matters.
  • head -100 t.txt
    Output the first 100 lines

You need Python, grep and head. You probably do not really need head, since you can pick the top 100 in your favorite editor. grep is used to do set intersection; if you have another method, you don't need grep. --Dan Polansky (talk) 10:34, 30 August 2015 (UTC)
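For anyone without grep and head, the intersection step can be sketched in Python as well. This is an illustrative alternative rather than the process actually used above; the function name rank_by_frequency and its limit parameter are made up for this example. It keeps the lines of the frequency list that also appear in the no-IPA list, in frequency-list order, which is what the grep -Fx -f step does.

```python
def rank_by_frequency(missing_titles, frequency_list, limit=None):
    """Return the frequency-list entries that are also in missing_titles.

    The order of frequency_list is preserved, so the most frequent words
    come first. limit, if given, caps the result length (like head -N).
    """
    missing = set(missing_titles)  # set membership tests are fast
    ranked = [word for word in frequency_list if word in missing]
    return ranked if limit is None else ranked[:limit]
```

To use it on the files mentioned above, read each file into a list of lines (with the trailing newline stripped) and pass the no-IPA titles first and the frequency list second; passing limit=100 plays the role of head -100.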

By the way, English-entries-with-no-IPA.txt has 519,273 items. --Dan Polansky (talk) 10:36, 30 August 2015 (UTC)
The first two in your list above, said and no use {{audio-IPA}}, so they do have IPA pronunciations given. I bet several of the others in the list do, too. —Aɴɢʀ (talk) 11:55, 30 August 2015 (UTC)
@Aɴɢʀ: I fixed the script above. Do you want to have the list constrained to lemmas? Do you want to have a longer list? --Dan Polansky (talk) 12:58, 30 August 2015 (UTC)
I don't know about Ultimateria, but I don't want it constrained to lemmas, and I'd like to have an exhaustive list unless that would take too long to generate. —Aɴɢʀ (talk) 13:29, 30 August 2015 (UTC)
The exhaustive list of English entries without IPA is approximately the same as the list of all English entries. It has 519,273 items, as stated above. The list of items in PG-10000 that lack IPA has about 4070 items. I am posting the first 500 items to Beer parlour; when you're done adding IPA to those, drop me a line on my talk page to get more:
--Dan Polansky (talk) 13:47, 30 August 2015 (UTC)
Angr is right, I don't want just lemmas. The entries from PG 1-10000 are a good start. Could you put them at User:Ultimateria/en-needing-ipa? Ultimateria (talk) 19:08, 30 August 2015 (UTC)
Done. --Dan Polansky (talk) 19:17, 30 August 2015 (UTC)
Thanks, Dan! But I'm wondering why nearly is in the list; it's had IPA since February. —Aɴɢʀ (talk) 19:19, 30 August 2015 (UTC)
The list is based on a 28 July 2014 dump, as per above. That should be good enough, I think. By the way, the addition of "audio-IPA" had very little effect. 20-80. --Dan Polansky (talk) 19:29, 30 August 2015 (UTC)
Thanks for the list, Dan! Sad to see that in the past 13 months hardly anyone added IPA to these entries... Ultimateria (talk) 21:10, 30 August 2015 (UTC)


I think someone should check the Polish declension of one. I think the recent change looks very, very odd. —Stephen (Talk) 13:57, 1 September 2015 (UTC)

Removed. It was absolute nonsense. --Tweenk (talk) 19:04, 4 September 2015 (UTC)

September 2015

Category:Sanskrit language appears in Category:All extinct languages

To my surprise, w:Sanskrit shows that Sanskrit is an official language in one region of India and that it has native speakers. Yet we class it as an extinct language, so one of us is wrong. Renard Migrant (talk) 18:07, 1 September 2015 (UTC)

The Wikipedia page says "The Mattur village in central Karnataka claims to have native speakers of Sanskrit among its population. Inhabitants of all castes learn Sanskrit starting in childhood and converse in the language." This really seems no different from the status Latin would have had maybe a hundred years ago in some places. --WikiTiki89 18:11, 1 September 2015 (UTC)
@Wikitiki89: For that matter, Latin is in this category. To be sure, there are Latin speakers and even those who are exposed to it from birth (through Catholic Mass) but it is effectively a dead language. —Justin (koavf)TCM 03:25, 2 September 2015 (UTC)

Wiktionary:Votes/2013-10/Reconstructions need references

The vote is now open. (I presume pinging users I have seen working with etyls would be politeering? (Why is this not a word?)) Anyway, if you have an opinion, please participate in the vote, thank you! Neitrāls vārds (talk) 21:55, 1 September 2015 (UTC)

Manual inflection tables: positional or named parameters?

For manual inflection tables, each form is specified separately rather than generated through stems, rules and other logic. I am wondering whether it's preferred to have such templates with numbered/positional parameters to specify each form, or named ones? Named has the advantage that you don't have to remember which parameter is for which form, but it's also a lot more to type. —CodeCat 14:08, 2 September 2015 (UTC)

I think it's better to have named parameters, along with a copy-and-paste template in the template's documentation to save typing time. See Template:uk-adj-table. --WikiTiki89 14:35, 2 September 2015 (UTC)
Or for a more extreme example {{sga-conj-complex}}. And I agree with WikiTiki89 that it's better to use named parameters. —Aɴɢʀ (talk) 14:56, 2 September 2015 (UTC)
I think such a template should have named parameters if there are many parameters or if there are likely to be parameters skipped in some cases, because (in both those cases) people using the template will have a hard time keeping track of positional parameters. I think such a template should have positional parameters if some of its uses will be of only the first not-many parameters, because people using the template will not want to write named parameters' names. Thus, where both those criteria apply, the template should have both named and positional parameters, like {{{so|{{{1}}}}}}. (I don't see the harm in doing things that way even if not both the criteria apply.)​—msh210 (talk) 21:53, 8 September 2015 (UTC)
Well, for these kinds of templates, all the parameters are given the vast majority of the time. —CodeCat 22:04, 8 September 2015 (UTC)
For most of them, yes, I think so. But since this question (whether to use positional or named parameters) is decided when writing each template, and there may be some — I suspect there are — which are often used without all their parameters, it is appropriate to note (and, if writing guidelines, to build into the guidelines) the criteria I mentioned.​—msh210 (talk) 21:56, 9 September 2015 (UTC)
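
For illustration only, here is a minimal Python sketch of the fallback that {{{so|{{{1}}}}}} expresses: the named parameter is used if it was supplied, otherwise the first positional one. The function name resolve_param and the dict-based model of template arguments are assumptions made for this sketch; it is not how MediaWiki itself is implemented.

```python
def resolve_param(args, name, position):
    """Mimic {{{name|{{{position}}}}}}: named value first, then positional.

    `args` is a hypothetical dict of the arguments passed to a template,
    with positional arguments stored under the keys "1", "2", and so on.
    """
    # A named parameter that was supplied wins, even if its value is empty.
    if name in args:
        return args[name]
    # Otherwise fall back to the positional parameter.
    key = str(position)
    if key in args:
        return args[key]
    # An undefined parameter with no default renders as its literal text.
    return "{{{" + key + "}}}"
```

With this model, a call that passes only a first positional argument and a call that passes so= resolve to the same slot, which is the "both named and positional" behaviour described above.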

Introducing the Wikimedia public policy site

Hi all,

We are excited to introduce a new Wikimedia Public Policy site. The site includes resources and position statements on access, copyright, censorship, intermediary liability, and privacy. The site explains how good public policy supports the Wikimedia projects, editors, and mission.

Visit the public policy portal: https://policy.wikimedia.org/

Please help translate the statements on Meta Wiki. You can read more on the Wikimedia blog.


Yana and Stephen (Talk) 18:12, 2 September 2015 (UTC)

(Sent with the Global message delivery system)

Is this really something Wikimedia should be involved in? There goes NPOV... --WikiTiki89 18:21, 2 September 2015 (UTC)
It seems like policy that favors the survival of WMF and the projects. And they can't go it alone so they have to ally and coordinate with other parties. DCDuring TALK 19:28, 2 September 2015 (UTC)
@DCDuring: I don't think they're worried about not surviving. It probably has to do with ensuring that people in every country are allowed to have access to WMF resources. Even though this is highly desirable for the WMF and its projects, it isn't a task that the WMF itself should be taking on (see my other responses below). --WikiTiki89 14:02, 3 September 2015 (UTC)
@Wikitiki89 The survival question in the short run is just the intermediary liability issue, which could easily bankrupt WMF as well as create a chilling effect. All four of the other thrusts are longer-term survival matters: maintaining the economic model (eg copyright) and a political model to enable WMF projects to serve superordinate goals that make contributors feel they are working for a higher cause in lieu of monetary compensation. The superordinate goals also help garner support from other elements in society. DCDuring TALK 17:58, 3 September 2015 (UTC)
@DCDuring: Ok, maybe I'm wrong about the survival question, but the point I was trying to make still stands: this isn't a task that the WMF itself should be taking on. --WikiTiki89 18:04, 3 September 2015 (UTC)
Very few organizations of any size and weight in the world get to ignore public policy matters. WMF has selected issues that are close to its core mission. I would be unhappy if they got involved in other causes, no matter how much I agreed with them, eg, certain environmental issues, official corruption, nuclear proliferation. DCDuring TALK 21:34, 3 September 2015 (UTC)
The second you make a political statement, you alienate everyone that disagrees. The WMF's core mission is to make information available to everyone, and alienating people is not the right way to achieve that. WMF projects need to be able to thrive even in places where free speech is a foreign concept and governments looking over people's shoulders is taken for granted. --WikiTiki89 21:57, 3 September 2015 (UTC)
I think WMF would lose a lot of very committed people if it were to fail to push its values as best it can and many others would lose some of the feel-good that keeps them contributing, weakening their commitment to the projects. DCDuring TALK 00:47, 4 September 2015 (UTC)
You're saying people would quit supporting the WMF if it weren't vocal enough about politics? --WikiTiki89 14:01, 4 September 2015 (UTC)
And people really get excited about such things? SemperBlotto (talk) 19:32, 2 September 2015 (UTC)
  • @Wikitiki89: WMF doesn't have to abide by NPOV, just the projects themselves. —Μετάknowledgediscuss/deeds 21:20, 2 September 2015 (UTC)
    • @Wikitiki89: I'm confused... Are you saying the WMF and broader Wikimedia community of editors and supporters shouldn't be in favor of copyright reform and protecting readers' privacy? I don't see what is inappropriate here. —Justin (koavf)TCM 21:22, 2 September 2015 (UTC)
      • I agree, this is a great thing to have. bd2412 T 22:55, 2 September 2015 (UTC)
      • @Metaknowledge: They don't have to, but they should. The NPOV philosophy of WMF projects loses a lot of credibility if the organization behind these projects is making political statements, regardless of what these political statements are. @Koavf: The editors can be in favor of whatever they want, but the Wikimedia organization itself should remain publicly neutral, even if every single one of the editors shared the same opinions. --WikiTiki89 14:02, 3 September 2015 (UTC)
Hmm, as long as they don't start bringing politics into it like the current GitHub code-of-conduct controversy. Equinox 23:07, 2 September 2015 (UTC)
Thanks for pointing out the GitHub thing. We don't have a hardcore adolescent geek culture here, despite occasional outbreaks of the kind of humor that can be offensive. I also haven't seen much of it in other WMF projects, so there shouldn't be as much occasion for a Code of Conduct. We did manage to deal with that kind of thing without resorting to a formal code of conduct and banning. I do expect that there will be pressure to adopt such a thing, however, though the policy thing seems to deal solely with public policy, not internal policy. DCDuring TALK 23:45, 2 September 2015 (UTC)

Special categories for Euro & Brazilian Portuguese forms

We have category:American English forms and category:British English forms. Why not category:Brazilian Portuguese forms & category:European Portuguese forms? Combining the orthographies and semantics together just makes navigation harder. --Romanophile (talk) 00:48, 3 September 2015 (UTC)

  • @Romanophile: Agreed. It's completely legitimate to separate varieties for navigational purposes and this is particularly common with pt/pt-br. —Justin (koavf)TCM 03:10, 3 September 2015 (UTC)
  • Support. {{tcx}} could be made to categorise {{tcx|Portugal}} as Category:European Portuguese forms, but entries created prior to {{tcx}} will have to be updated. — Ungoliant (falai) 15:08, 3 September 2015 (UTC)
    • But then what if the label "Portugal" is used for another language? —CodeCat 15:21, 3 September 2015 (UTC)
      • {{tcx}} takes a language code. — Ungoliant (falai) 15:48, 3 September 2015 (UTC)
        • That's not the point. What would {{lb|en|Portugal}} result in? —CodeCat 16:04, 3 September 2015 (UTC)
          • Whatever it results in currently. — Ungoliant (falai) 16:09, 3 September 2015 (UTC)
  • Yes, and we should do the same for transpondian Spanish (if we don't already). SemperBlotto (talk) 15:12, 3 September 2015 (UTC)
    Are there that many orthographic differences between European and Latin-American Spanish? --WikiTiki89 15:27, 3 September 2015 (UTC)
    I'm not a Spanish expert - but I know that tortilla has different meanings in Europe and the Americas. SemperBlotto (talk) 15:32, 3 September 2015 (UTC)
    But this isn't about meanings, this is about orthography. We already have Category:Spanish Spanish and Category:Latin American Spanish for things like that. --WikiTiki89 15:50, 3 September 2015 (UTC)
    There are no (longer any) orthographic differences in the standard of Spanish of Latin America versus that of Spain. —Μετάknowledgediscuss/deeds 16:03, 3 September 2015 (UTC)
    @Wikitiki89: Spanish is almost entirely phonetic (with some caveats about c/k/s/th/z sounds running together) so there are very few--if any--orthographic differences. I could imagine some eye spellings becoming somewhat popular in regions but I don't know of any. In fact, I don't know of any Spanish spelling differences like the American/British differences between colo(u)r/hono(u)r/etc. —Justin (koavf)TCM 04:04, 4 September 2015 (UTC)
    I cringe every time someone says any language's orthography is "almost entirely phonetic". This seems true on the surface, but breaks down when you take a deeper look, especially at not-completely-standard varieties. --WikiTiki89 14:04, 4 September 2015 (UTC)
    @Wikitiki89: E.g.? —Justin (koavf)TCM 02:03, 5 September 2015 (UTC)
    @Koavf: The first two things that come to mind for Spanish are:
    • Voicing assimilation of "s": mismo is pronounced [ˈmizmo] rather than [ˈmismo], etc.
    • In some dialects the dropping of syllable-final "s" affects the quality of the preceding vowel and creates a phonemic distinction between, for example, todo [ˈtoð̞o] and todos [ˈtɔð̞ɔ].
    There are many more examples. --WikiTiki89 15:04, 8 September 2015 (UTC)
    @Wikitiki89: Sure, or the peninsular distinction--pronouncing "s" as "sh". But the spelling is a virtual free-for-all as in English. I don't know of any language anywhere near the size of Spanish where spelling can be inferred from sound and vice versa as well. —Justin (koavf)TCM 15:40, 8 September 2015 (UTC)
    @Koavf: Isn't that exactly what I was saying, that virtually no language is "almost entirely phonetic"? --WikiTiki89 15:45, 8 September 2015 (UTC)
    @Wikitiki89: Well, we may have been saying the same thing the entire time but my point was that although Spanish--like any language--is not perfectly phonetic, it is far more regular than the language that we are using right now. Especially when one considers that Spanish has about 600 million speakers. As a general rule of thumb, the language is phonetic but with some necessary explanation. I don't think it's really cringe-worthy when someone says, "Turkish is phonetic" because that is a meaningful statement. Maybe not a perfect one but still a useful one for understanding that spelling is very standardized and maps to pronunciation in a predictable way. (In point of fact, this reminds me of when I worked in a bookstore and a Turk asked me in which "ay-zul" a book was--she meant "aisle".) —Justin (koavf)TCM 15:56, 8 September 2015 (UTC)
    First of all, English is much more phonetic than people give it credit for; there are just a lot more rules to learn. Second of all, the fact that you just said that Turkish spelling is standardized proves that it is not a phonetic language for speakers of less standard dialects (a truly phonetic language would not have a standard and everyone would write in their own dialect, which is almost the case for Serbo-Croatian). Third of all, there is a big difference between "far more regular than [English]" and "almost entirely phonetic". --WikiTiki89 16:49, 8 September 2015 (UTC)
    @Wikitiki89: What I meant is that if you hear Turkish, you will know how it's spelled and if you see something spelled in Turkish, you will know how to pronounce it. Not anything about a standard register. This is not even close to true for English and if you have a huge panoply of rules to remember, then you don't have very phonetic spelling--that's what makes spelling phonetic. If you hear English and try to transcribe the sounds using the ISO-standard Latin alphabet, you can get all kinds of spellings and many of them not close to proper English. This is not true for Spanish. In fact, this is empirical: we could take native and non-native speakers and have them transcribe sounds or guess how words are spelled and we would find that they would be much more accurate for Spanish or Turkish than for English. That's all I'm claiming and I don't think that anyone would make a more over-reaching claim about any language being perfectly phonetic or without variation over time and space. —Justin (koavf)TCM 17:28, 8 September 2015 (UTC)
    You're right that if you see something spelled in standard Turkish, you will know how to pronounce it in the standard language (and not counting the exceptions that I presume exist but do not know about, having never studied Turkish). I don't know what you mean by transcribing English with the "ISO-standard Latin alphabet", but the average English speaker would be able to accurately transcribe spoken English into written English, even when this speech contains words not previously known to the listener. A non-native English speaker who is not very proficient would not be able to. But the same goes for Spanish and (I presume) Turkish. If you heard a Spanish speaker say [ˈtɔð̞ɔ], you might mistakenly transcribe it as todo, depending on your familiarity with this class of dialects. You might also hear [ˈla.o] and not know whether to transcribe it as lado, lago, or lavo. --WikiTiki89 17:56, 8 September 2015 (UTC)
    There are few eye spellings like rajuñar vs. rasguñar but they are rather rarely used and officially considered incorrect. Matthias Buchmeier (talk) 05:05, 4 September 2015 (UTC)
    There are regional morphological differences in the second person, though they don't line up in a completely tidy Europe/America split. Chuck Entz (talk) 06:01, 4 September 2015 (UTC)
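
Returning to the {{tcx}} suggestion earlier in this thread: the point that the template takes a language code can be sketched as a lookup keyed on the language code and the label together, so that a "Portugal" label only produces the Portuguese category when the language is Portuguese. This is a hypothetical Python illustration, not the template's actual code; the table, the "Brazil" label and the function name form_category are assumptions.

```python
# Hypothetical table: (language code, label) -> category name.
# Only Portuguese entries are listed, so the same label used with
# another language code is left uncategorised by this lookup.
FORM_CATEGORIES = {
    ("pt", "Portugal"): "European Portuguese forms",
    ("pt", "Brazil"): "Brazilian Portuguese forms",
}


def form_category(lang, label):
    """Return the forms category for this language and label, or None."""
    return FORM_CATEGORIES.get((lang, label))
```

Under this sketch, a "Portugal" label on an English entry gets no forms category from the table, so whatever it results in currently is unaffected.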

Languages (possibly) needing additional scripts

DTLHS (talk) 19:42, 4 September 2015 (UTC)

Acehnese was formerly written in Arabic script, per WP and Philip A. Luelsdorff, Orthography and Phonology (1987, ISBN 9027274436), page 136. Banjarese either formerly was or still is written in Arabic. Old Javanese was written in Javanese script. I'll update the modules accordingly.
The Algonquian translation I have simply removed (wrong script and part of speech).
The Chinese-script-Chamorro was a typo (of the language code cmn as ch, clearly mnemonic for "Chinese").
- -sche (discuss) 16:50, 5 September 2015 (UTC)

Two Russia German (Russlanddeutsch) languages

I have some words from two languages of the Russlanddeutsche which I would like to add, but we need to decide how to encode them.

  1. The Volga Germans speak primarily Rhine Franconian dialects[1] (with some Russian loans like Erbus (watermelon)),[2] similar to the Pennsylvania Germans.
    We could treat Volga German under the code gmw-rfr which we recently created for the Rhine Franconian varieties of Germany proper (at which point it would probably make sense to also merge Pennsylvania German into that header). In favour of this are the arguments that Volga German, Penn. German and Palatine German have remained very similar despite their geographically separate development, and having separate headers will result in many pages looking like Wasser does now. Some references, like the Pfälzisches Wörterbuch, do treat Volga German, Penn. German and Palatine proper as one language with a huge variety of dialects (since both Volga German and Palatine proper have quite a few dialects).
    Alternatively, we could treat Volga German as its own language (say, gmw-vog). In the past, we have tended to give lects their own codes if they developed independently due to geographic isolation, even if they didn't develop to be very different: hence not only Pennsylvania German but also Transylvanian Saxon, Hunsrik, Alemán Coloniero and other lects have their own codes separate from their parent varieties. And there is another argument: Volga German is not entirely Rhine Franconian; it developed in communities made up of people from all over (Alsace, Baden, East Central German areas, Sweden, etc), and hence it is in practice a mish-mash in which Rhine Franconian is merely the most dominant element (this is also true of Penn. German).[3][4] Several references treat Volga German as its own lect, though most of them comment on its similarity to Penn. and/or Palatine German.
    See User:-sche/Volga for a sample and comparison.
  2. The Russian Mennonites spoke Mennonite Low German, a.k.a. Plautdietsch. At the moment, our only ==Plautdietsch== entries are from American communities; I'd like to include historical texts from those communities while they were still in Russia, and texts from the communities which remain in Russia, under the same L2. Plautdietsch is distinguished from German- and Dutch- Low German by some phonological changes (especially to k) which hamper mutual intelligibility, but within Plautdietsch the distinction between American and European is only as notable as the distinction between Chortitzaer and Molotschnaer, and the references I can find treat the American and Russian (and Chortitzaer and Molotschnaer) varieties as the same language. See User:-sche/RMLG for a comparison: the principal distinction is that American MLG has [c], written 'kj', where Russian MLG has [tʲ], written 'tj'. To re-iterate, I'd like to add Russian Mennonite Low German under our existing Plautdietsch header with {{label}}s and {{a}}s, rather than giving it its own header.

Note that there are and historically were other Germans in Russia (e.g. Swabian speakers in some places), but I'm content to leave them undiscussed for now because I don't have words from them yet. (I intend to start a thread about the Danube Swabians later.) - -sche (discuss) 21:48, 4 September 2015 (UTC)

I'd lean in favor of a separate language code for Volga German, for the reasons you mention. (Is it never written in Cyrillic?) And I'm in favor of treating pdt as one language with a Russian dialect and an American dialect—or rather, a North American dialect, since isn't Plautdietsch also spoken in Canada and Belize? —Aɴɢʀ (talk) 08:04, 5 September 2015 (UTC)
On the other hand, w:Plautdietsch language#Varieties says the two major dialects are Chortitza and Molotschna; does that division correspond to what you're calling the Russian/American division? —Aɴɢʀ (talk) 08:12, 5 September 2015 (UTC)
No; modern Plautdietsch in both the Americas and Europe blends elements of the Chortitza and Molotschna varieties. For instance, early references note that /tʲ ~ c/ (from original */k/) originated as a Chortitza feature, but it is now found everywhere. Also, in the early period Molotschna had [uː] in words like 'Fru' and 'Hus' while Chortitza had [yː], but the references I looked at noticed speakers of both varieties using the other's form. One says that before WWII, "the Chortica rounded front vowel [yː] was replaced by the Molotchna long back vowel [uː], as in [fryː] - [fruː]", while another says that after the war, the "dominierende [Molotschnaer] Varietät setzte sich gleichwohl nicht in allen primären Merkmalen durch, sondern nahm z. B. aus der rezessiven [Chortitzaer] Varietät den Umlaut langes yː statt langem uː (fryː statt fruː 'Frau', hyːs statt huːs 'Haus')". (It has been suggested that the switch to the otherwise less prestigious but older and more original Chortitza form was a way of defiantly resisting Russian pressure to become more Russian.) - -sche (discuss) 18:03, 5 September 2015 (UTC)
Maybe we should recognize four dialects of it then: the two older ones and the two modern ones. —Aɴɢʀ (talk) 19:04, 5 September 2015 (UTC)
re Cyrillic: I would expect Russian-language texts to mention individual Volga German words in Cyrillic the same way English texts mention Russian in transliterated form, but I can't even find examples of that, searching the web for Cyrillizations of common words, like "бам|баам|ман|манн" поволжские немцы. The printed as well as the handwritten texts in the language that I've seen are in Latin script, and the one Russian-language reference I have prints all the Volga German words in Latin script. - -sche (discuss) 21:39, 8 September 2015 (UTC)
I have added "Mennonite Low German", "Chortitza", "Molotschna", and "Russian Mennonite Low German" as alt names of pdt and will add content from Russia/Ukraine-based communities under that code soon.
I have not yet added a code for Volga German. I recognize that the weight of precedent regarding European languages is behind giving it its own code, and I have a weak preference for that myself. I worry that it does show "our" bias ("our" meaning not just Wiktionary, but the various generally European- or American-authored reference works on the languages themselves which we consult), however: that various mutually-intelligible shades of European languages are often treated as distinct while mutually-intelligible shades of African languages are often handled as single languages.
- -sche (discuss) 21:39, 8 September 2015 (UTC)
I have added gmw-vog. - -sche (discuss) 05:26, 28 September 2015 (UTC)

Open call for Individual Engagement Grants

Greetings! The Individual Engagement Grants program is accepting proposals from August 31st to September 29th to fund new tools, community-building processes, and other experimental ideas that enhance the work of Wikimedia volunteers. Whether you need a small or large amount of funds (up to $30,000 USD), Individual Engagement Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.

I JethroBT (WMF), 09:34, 5 September 2015 (UTC)

There is less than one week left to submit Individual Engagement Grant (IEG) proposals before the September 29th deadline. If you have ideas for new tools, community-building processes, and other experimental projects that enhance the work of Wikimedia volunteers, start your proposal today! Please encourage others who have great ideas to apply as well. Support is available if you want help turning your idea into a grant request. I JethroBT (WMF) (talk) 15:31, 24 September 2015 (UTC)

People who have productive names

I've thought about this for years, and haven't floated the idea before for fear of starting an ugly shitstorm of useless argument, but it's continued to bug me.

Wiktionary is not for biographies, but we have a lot of words that derive from proper nouns. It would make sense to me to include definitions for those base terms that have gone on to form other words. I'm not proposing lengthy biographies, just who a person is with a link to their Wikipedia article, and only for the form of their name which has been productive.

We have entries for Hemingwayesque and Hemingwayan, so we should include a biographical definition at Hemingway (e.g. "Ernest Hemingway (1899–1961), American writer and journalist") and list the derived terms. Keeping with the current rules, we don't need an entry at Ernest Hemingway nor Ernest Miller Hemingway, as they are not productive terms and don't form other words.

Similarly, we have the word Obamacare, so we should include a short biographical definition at Obama (but not at Barack Obama, nor Barack H. Obama, etc). Notably, Obama already has a biographical entry in defiance of the "no biographies" rule.

We have the term Darth Vader, as in "the Darth Vader of", but because of strict adherence to the "no biographies" rule, there is no definition for the fictional character w:Darth Vader himself. As such, the entry is ridiculous, redeemed only slightly by squeezing the fictional character into the etymology. You don't need to go to the discussion page to know there's been some weird argument that has led to the elephant-in-the-room definition list. The attributive use derives from the fictional character, so he should also get an entry at Darth Vader (but not Vader, nor Anakin Skywalker unless they also have derived terms).

Darwin has a lengthy list of derived terms but the entry awkwardly squeezes mention of Charles Darwin into the "A surname" definition.

You can have "a Picasso" (a work of art by Pablo Picasso), so Picasso himself should get an entry at Picasso. And under the Spanish heading I see he's snuck in.

Sorry to kick up this argument again, but it seems like a no-brainer to me, and in many cases it's what already seems to be happening. If a name is productive, that form of the name should be allowed a definition. If nothing else, it will allow some consistency for what is already being added. If there are too many odd cases, we could restrict it to only those people who are "notable enough" to have Wikipedia entries. And again, only the productive/attributive form of their name, not the Wikipedia article name. Thoughts? —Pengo (talk) 00:10, 8 September 2015 (UTC)

@Pengo: This issue is completely legitimate and probably thorny. We have commonly-used derivations like "Orwellian", "Kafkaesque", and "Dickensian" but I also added this change to Volapük and had it removed. Maybe constructions like this need to have a kind of secondary citation threshold: not only must someone coin the term "koavfesque" but someone also needs to comment on that usage ("It seems that the sorry state of public infrastructure is being described by both conservatives and liberals as 'koavfesque', meaning truly pathetic and dilapidated...") Does that make sense? —Justin (koavf)TCM 02:49, 8 September 2015 (UTC)
I see nothing ridiculous about providing an etymology for Darth Vader noting the origin of the name as a fanciful coinage for a fictional work. bd2412 T 03:20, 8 September 2015 (UTC)
We have reasonable dictionary definitions for several people who are known just by their surname. Hitler is a reasonable example. This sort of thing is good to have in a dictionary. We should have more of them. SemperBlotto (talk) 07:42, 8 September 2015 (UTC)
The "Picasso" noun entry seems a bit silly. You can do that with any painter's name, e.g. "a Manet". Equinox 17:56, 8 September 2015 (UTC)
Sure, but that's kind of my point, that we have an entry for "a Picasso" but, by our current guidelines, Picasso himself shouldn't have a sense.
I had another read of the CFI guidelines. I thought there was some kind of ban on biographical entries (perhaps because of the Darth Vader kerfuffle, or perhaps because it basically doesn't mention them at all), but the guidelines only exclude names of people that are made up of 2+ words:
No individual person should be listed as a sense in any entry whose page title includes both a given name or diminutive and a family name or patronymic. For instance, Walter Elias Disney, the film producer and voice of Mickey Mouse, is not allowed a definition line at Walt Disney.
So really this only excludes Darth Vader, which is kind of a bit silly and arbitrary, but otherwise biographical entries are basically ignored by CFI. Perhaps they could be made more explicitly allowed, with some notability guidelines.
As for "a Picasso", we kind of need that sense because you can also have several "Picassos", which is certainly attestable and that needs an entry too (just like Monets and Rembrandts). —Pengo (talk) 01:18, 9 September 2015 (UTC)
I think I'm in the minority, but my feeling has always been that we should WP-link to famous names like Einstein from our entries (under "See also", or conceivably "Derived terms") without attempting to "define" them ourselves. People are individuals who bear a name, not a sense or meaning of the name. People are not semantic. Having said that, I have come across hybrid "encyclopaedic dictionaries" (Oxford has or had one) that do include such entries, but I've found that they tend to be poor dictionaries and even worse encyclopaedias. Okay, we don't have the lack-of-paper problem, but WP is always going to be superior on encyclo coverage, and we should exploit that rather than doing some dubious weak copying of a fraction of its entries. Equinox 01:25, 9 September 2015 (UTC)
Of course people are semantic. It's just that names are often not unique globally, so their meaning is context dependent. But that's nothing new; every noun preceded by the is also context dependent. Names can be considered to have an inherent definite article in them. Some languages actually do use a definite article with names, too. —CodeCat 01:51, 9 September 2015 (UTC)
Bearing in mind my suggestion that "people are individuals who bear a name, not a sense or meaning of the name", would you then support a sense at Smith for every individual mentioned attestably (e.g. 3 mentions! CFI) as "Smith"? That's gonna be a long entry. Equinox 02:10, 9 September 2015 (UTC)
BTW, the "several Picassos" thing is almost a red herring to me, since you can pluralise surnames qua surnames (e.g. "I'm going to see the Smiths"). This then just becomes the combination of two rules: (i) you can pluralise a surname, (ii) a surname can stand in for a work by a person who bears that name. Equinox 01:26, 9 September 2015 (UTC)
@CodeCat For me, the evidence that the person's name has become "semantic" in language is when his or her name is used as a base of other words or meanings. E.g. "Darwinian" or "Hitlerite". I don't think you can argue very strongly that "Darwin" or "Hitler" are just names with no particular meaning any more, evidenced by the fact their names have branched off into new words. So we should try to capture what that base word (name) lends to its derived terms. We're certainly not trying to replicate Wikipedia. All cases I've seen link there with about as much text as a Wikipedia disambiguation page listing. See any of the above examples. But having the fictional character "Darth Vader" under "See also" for "the Darth Vader of..." is just silliness to me. Having it under "Derived terms" would be the wrong way around, and an attempt to downplay a main meaning of the term. Pengo (talk) 16:09, 9 September 2015 (UTC)
@Equinox I feel that someone could list "a Picasso" or "a Rembrandt" in any list of random items and it would be understood to specifically mean a painting by that painter by most English speakers, and that someone might legitimately look up "Rembrandt" or "Rembrandts" trying to understand its meaning, e.g. having seen it without enough context to guess what it meant, perhaps confusing it for "a Remington". So perhaps the pluralization part is a red herring, but it still seems dictionary-worthy from a "what does this usually mean in English" viewpoint, and doesn't apply equally to all surnames. Pengo (talk) 16:09, 9 September 2015 (UTC)
Shouldn't the proper noun definition of Darth Vader go? We have the etymology and the noun. Renard Migrant (talk) 16:40, 9 September 2015 (UTC)
No. —Pengo (talk) 04:22, 10 September 2015 (UTC)
Agreed: Obama (Obamacare), Dickens (Dickensian), Picasso (Picassian), Darwin (Darwinian), Popper (Popperian), Kuhn (Kuhnian) should have succinct biographical sense lines, or they should have ", especially ..." parts of the surname sense line. Here's what dicts do:
  • AHD: Obama[10], Dickens[11], and Picasso[12].
  • Collins: Obama[13], Dickens[14], and Picasso[15].
  • Merriam-Webster: Obama[16], Dickens[17] and Picasso[18].
  • oxforddictionaries.com: Obama[19], Dickens[20], and Picasso[21].
  • Macmillan and dictionary.cambridge.org have none of this.
--Dan Polansky (talk) 09:33, 13 September 2015 (UTC)
They should be see-alsos. Vote? Equinox 09:36, 13 September 2015 (UTC)
@Equinox: What is the rationale for excluding these? (They are not excluded by the current CFI.) Is it the redundancy to Wikipedia? If so, should Nile be reduced to "a specific river" to minimize redundancy? Is there at least one dictionary that does such a minimization of "Nile"? --Dan Polansky (talk) 09:49, 13 September 2015 (UTC)
If I take your "People are not semantic" above as the rationale or part of the rationale, do we really mean that referents of proper names are not semantic and should therefore be excluded or reduced? I saw that position before. Applied to extreme, each geographic name would have a single sense line saying just "a geographic name"; even "a specific river" would be too specific; and each astronomical name would say "astronomical name" instead of "an autumn constellation of the northern sky" (Perseus). If this position that referents must be excluded or obscured as far as possible is accepted, I don't see why it should be only accepted for biographical names and not for geographic names or astronomical names. --Dan Polansky (talk) 10:55, 13 September 2015 (UTC)
Re: "would you then support a sense at Smith for every individual mentioned attestably": That's a good point. As a practical matter, we do not want to include a sense line for every attested human individual. That's why the name as referring to the individual should overcome additional hurdles, such as that it gave rise to an adjective or that it is broadly understood to refer to that individual when used out of context. Other hurdles could be devised. The hurdles are not specified in CFI, but CFI allows editors discretion in deleting proper names and their senses, via RFD of course. --Dan Polansky (talk) 11:12, 13 September 2015 (UTC)

Sathmar Swabian

I have some words from Sathmar Swabian which I'd like to add, but as with the Russia German languages I mentioned above, we need to decide how to encode them. The Sathmar Swabians inhabit a region on the border of Hungary and Romania, having migrated there from Swabia, and they speak a dialect of Upper Swabian which has remained very similar to the varieties of Schussenried and Otterswang in Germany.
As I noted above re Volga German, we have on the one hand tended to give lects their own codes if they developed independently due to geographic isolation, even if they didn't develop to be very different (see e.g. Pennsylvania German, Transylvanian Saxon, etc). On the other hand, the variation which does exist between Sathmar Swabian and Upper Swabian proper is comparable to the variation between Upper Swabian and other dialects of Swabian (which are incidentally usually the sources of the differences), which we don't split, and of the four people who are said to be the main scholars of the language, Moser and Stephani speak of it as 'the Sathmar dialect (Upper Swabian)'. (I can't determine the stance of the other two scholars, Fischer and Wonhas. De.WP implies that Fischer considered it a dialect, but I don't see where it's covered in his comprehensive Schwäbisches Wörterbuch, perhaps because I'm missing some obvious terminological difference or perhaps because he doesn't actually include it.)
I have a weak preference for treating Sathmar Swabian as its own language, but compare my comments in the section above, re Volga German, about "bias". You can gauge the similarity of the lects yourself at User:-sche/Sathmar. - -sche (discuss) 01:24, 9 September 2015 (UTC)

I've created the code gmw-stm for this. - -sche (discuss) 05:27, 28 September 2015 (UTC)

How to deal with formations that have no overt phonetic or orthographic form?

There are many cases where a suffix has been reduced to zero, but its effects can still be seen through other grammatical processes. For example, in Northern Sami, the present participle has no actual suffix, and is characterised purely by strengthening the consonant grade. The Estonian genitive case has no actual suffix, but is apparent through consonant gradation, and because the stem-final vowel is not deleted as in the nominative. I am wondering how we could create entries for these. There's no actual suffix to use as the page name, but detailing the function and (especially) etymology would be useful nonetheless. So how should I do this? —CodeCat 23:39, 8 September 2015 (UTC)

I would suggest an appendix, such as Appendix:Arabic verbs, which details Arabic verb classes, some of which are characterized by the doubling of the middle root letter, which is very similar to the issue you are dealing with in Northern Sami. --WikiTiki89 14:33, 9 September 2015 (UTC)
Uralic morphology is almost entirely suffixing though, so such an appendix wouldn't give much extra value. The difficulty here is only that the actual suffix has eroded due to sound changes, leaving some other effect as a residue. But the suffix is still "real" and its identity can be etymologically established, unlike the vowel changes of Arabic. The Northern Sami present participle, for example, can be traced to a suffix *-jē, which disappeared after a sound change that deleted intervocalic -j- with compensatory lengthening of the preceding consonant, and changes to the vowels. While there is no actual suffix, there is still conceptually a suffix that causes this effect when attached.
My own thought is to use the entry - for cases like this. It could also be used, for example, to explain the zero plural suffix in sheep (which goes back to Proto-Germanic *-ō). —CodeCat 19:11, 9 September 2015 (UTC)
But you can't really say it's a suffix anymore at that point, rather a morphological feature. There is no longer any suffix in the plural of English sheep, even if the singular and plural are derived from previously distinct suffixes. You could say that in Russian, there is a null suffix in the nominative of masculine nouns, which also causes epenthetic vowels when it follows a consonant cluster. So I guess you could treat this Northern Sami thing as a null suffix as well, but you would have to think about whether that actually makes sense. Even if Uralic morphology is almost entirely suffixing, an appendix would still be useful. It does not have to be exactly like the Arabic one, since this is a completely different situation; the reason I compared it to Arabic is that some verb forms (such as form X) can be simply analyzed as prefixes (اِسْتَـ ‎(ista-)), while others (such as form II) are characterized solely by the doubling of the middle root letter, which cannot be represented as an affix or infix in any way that makes sense. For Northern Sami, you could create an appendix detailing verb conjugation and participle formation (if it is all suffixes as you say, this page would be short and sweet), and for Estonian you could create an appendix detailing noun morphology (which would also be short and sweet). --WikiTiki89 19:59, 9 September 2015 (UTC)
Zero suffixes are sometimes theoretically justified, but I'd think they pretty clearly should not be dictionary entries. A separate appendix page could be useful in some cases though, I suppose. --Tropylium (talk) 20:19, 9 September 2015 (UTC)
Note the treatment of the "zero suffix" by User:Cinemantique in снос. What do people think about it? --Vahag (talk) 21:54, 9 September 2015 (UTC)
I personally don't very much like it, but I could be convinced of its usefulness. In the case of снос ‎(snos) and similar, I think it would be better to just say "From сноси́ть ‎(snosítʹ)". --WikiTiki89 22:03, 9 September 2015 (UTC)
I agree with you. But it can be argued that some kind of categorization of "suffixless" derivations is useful. --Vahag (talk) 22:11, 9 September 2015 (UTC)
This is a very good reason to have entries for such derivations. We certainly would want derivation lists and categories for them. I think Cinemantique's solution is pretty good, and I think I'll use it as well unless there are bad objections (as well as good alternatives). —CodeCat 23:07, 9 September 2015 (UTC)
Categorizing zero derivatives sounds like a good idea (including things like bug, drink), but surely this can be done without treating zero suffixes as entries. That's kind of the point after all; these are forms that are morphosyntactically treated as derived while being lexically underived.
Note that this is a distinct issue from stem alternations or the like as allomorphs of productive inflectional suffixes — in those cases there's full reason to have a suffix entry in the first place, and given our habit of linking allomorphs to the same entry, there should be no obstacle to doing something like {{suffix|stem|suff|alt2=∅}}. But then again, since when do we link inflected forms to their suffixes anyway? --Tropylium (talk) 07:09, 10 September 2015 (UTC)
If we're going to do this, -∅ does seem like the logical notation to use (or at least, to display; perhaps we could rig it up so that the link went to the appendix: -∅). But we do have to be careful when deciding where to do this; I agree with WikiTiki that it would be inaccurate to describe English "sheep" as having a suffix. - -sche (discuss) 01:11, 10 September 2015 (UTC)
I like the idea of explaining the suffixlessness, but using a -∅ seems overly jargony to me. Why not write a plain English explanation as has been done above, and add it to the entry (either as a line in the etymology or in a "Grammar notes" section), saying something like what has been said from above, "The present participle has no suffix, and is characterised purely by strengthening the consonant grade" or something more easily understood. If this text needs to be repeated on many entries then make a {{suffixless}} template and use that. I don't know how common -∅ notation is, but I suspect it's not common enough to help casual readers of Wiktionary. Pengo (talk) 13:54, 13 September 2015 (UTC)
  • Empty suffix looks like an inferior idea. To derive Russian снос as сносить +‎ -∅ looks most curious to me. Suffixing is a process of adding something; if nothing is added, no suffixing takes place. google books:"empty suffix" does not seem to find many hits from linguistics; most seem to be computer science. --Dan Polansky (talk) 19:12, 13 September 2015 (UTC)
    That’s because it is usually termed a zero morpheme or null morpheme, not an empty suffix, in linguistics. Deriving as x + -∅ is fairly common linguistic practice. Vorziblix (talk) 08:46, 17 September 2015 (UTC)

"New Ancient Greek"?

We already have "New Latin" for Latin words coined in modern times. But Ancient Greek words are also made anew, often for scientific purposes, by combining ancient elements. Should there be a separate "New Ancient Greek" etymology language for these, to match how we treat Latin? —CodeCat 22:48, 12 September 2015 (UTC)

In taxonomy, at least, Greek is Latinized before being incorporated, so that even terms composed of Ancient Greek parts are really Latin (although there are a number of taxonomic names that show Ancient Greek nominative endings, I've never seen one that used Ancient Greek genitive endings instead of Latin). I'm sure this is true throughout the sciences. Chuck Entz (talk) 01:16, 13 September 2015 (UTC)
Ancient Greek was not used in the scientific community and therefore the statement that "Ancient Greek words are also made anew, often for scientific purposes" is false; words may be made from Ancient Greek roots but not actually reflect words in Ancient Greek. The same is true of many scientific coinages from Latin nowadays, but until the 19th century most of them originated in New Latin texts. Therefore there would really be no use for this. —Μετάknowledgediscuss/deeds 01:38, 13 September 2015 (UTC)
It's not just the roots that are Ancient Greek though, the derivational rules used to create the combination are also Ancient Greek. Everything about the words is Ancient Greek, except that they're not used in Ancient Greek. They are Ancient Greek words coined for the sole purpose of deriving loanwords from them. Does that make them words or not? Not in the usual sense, but to say that they are "not words" doesn't seem right either. They're like limbo words. —CodeCat 02:55, 13 September 2015 (UTC)
Well, no, not everything. As Chuck already pointed out, they are almost always Latinised. In any case, we document words that are used, and those in limbo can never have entries, and thus never be an etymon. It's like reconstructing a word in a protolanguage for a modern concept just because all the descendant languages use cognate terms for it; it could be done, but has no lexicographical validity. —Μετάknowledgediscuss/deeds 03:57, 13 September 2015 (UTC)
Examples? I think the idea is worth entertaining (though I'd unlikely be expert enough to weigh in). But I feel we really need some examples of potential modern Ancient Greek words for any meaningful discussion. —Pengo (talk) 13:34, 13 September 2015 (UTC)
@CodeCat, Chuck Entz, Metaknowledge, Pengo: The only "New Ancient Greek" word I can think of is μιξόγλωττος ‎(mixóglōttos), which occurs in Johann Jacob Hofmann's Lexicon Universale (1698) as an adjective qualifying the Latin noun nōmenclātor here. CodeCat's on to something here, but I'm not sure how widespread this phenomenon is. — I.S.M.E.T.A. 15:08, 13 September 2015 (UTC)
I think we're talking about two different things here. I'm talking about Ancient Greek words that are coined to serve as a base for derivations in various languages. Things like hypothermia. I am aware that these words have never been actually used in Ancient Greek, but to ignore their existence altogether seems wrong too. —CodeCat 15:15, 13 September 2015 (UTC)
@CodeCat: So, you want (appendical) entries for hypothetical etyma like *ὑποθερμία ‎(*hupothermía), yes? — I.S.M.E.T.A. 15:49, 13 September 2015 (UTC)
Something like that, yes. They are not reconstructed terms though, since we know they weren't used. Reconstructions are assumed to have been used, just unattested. —CodeCat 15:53, 13 September 2015 (UTC)
@CodeCat: Well, yes; that's why I called them hypothetical. — I.S.M.E.T.A. 16:08, 13 September 2015 (UTC)
There is no reason to have that. The parts can be linked to separately in the etymology. --WikiTiki89 15:55, 13 September 2015 (UTC)
Yes, but what about linking the derivatives together? Are they not cognates? —CodeCat 16:09, 13 September 2015 (UTC)
@CodeCat: Normally we would pick the language which coined "hypothermia" first, and that one gets a list of descendants. Otherwise what would we do with television? Create a "New Ancient Greek-Latin hybrid" language to list its cognates? I think the real issue here is that there's no consistent way to list and/or tag cognates, especially when the language in which it was originally coined is not clear, i.e. when there's no clear descendant relationship. Other than listing descendants, what other benefits are there to New Ancient Greek? Are Ancient Greek grammatical rules applied to hypothermia (or other such terms) which are reflected in multiple descendant languages [and wouldn't those rules be the same just for 'thermia', θέρμη ‎(thérmē), anyway]? —Pengo (talk) 01:19, 14 September 2015 (UTC)
I don't see anything getting in the way of something like Appendix:Wanderwort/Hypothermia or Appendix:Wanderwort/Television for collecting the cognates in cases like these. They would obviously have to be formatted differently from usual entries though. --Tropylium (talk) 18:24, 3 October 2015 (UTC)



A userpage appearing without ever being created is quite surprising at first. Since it is clearly self-advertizing, it has no place here, but it also seems to be undeletable. Thank you 08:10, 13 September 2015 (UTC)

(edit conflict) The actual page is on Mediawiki, but it shows up on every Wikimedia wiki that doesn't have a page by that name. The only way to get rid of a global user page here would be to create a page (even a blank one) to replace the global page locally. That said, I'm not sure that global page is actually advertising, though it probably doesn't meet the standards in WT:USER. We haven't really worked out how to respond to global user pages, yet. Chuck Entz (talk) 08:16, 13 September 2015 (UTC)
In fact it's not really advertizing (at least not "classical" advertizing), but if the page had been created directly here, it would certainly have been deleted for being "promotional material" (and perhaps the user blocked indefinitely), as far as I can see. Am I wrong? Bu193 (talk) 08:28, 13 September 2015 (UTC)

Intransparent headwords for Ancient Greek entries

I noticed a bot introduced intransparent headwords into headword lines of Ancient Greek entries. For Εὐριπίδης, the transparent headword was Εὐριπίδης while the intransparent is "Εὐρῑπίδης". Can User:Benwing, the owner of the bot, point me to the discussion that led to that change? Thank you. --Dan Polansky (talk) 13:24, 13 September 2015 (UTC)

What makes you think there was a discussion? Kind of presumptuous. —CodeCat 13:31, 13 September 2015 (UTC)
Can anyone point me to a discussion, if any? --Dan Polansky (talk) 13:39, 13 September 2015 (UTC)
From the revert war at Euripides, I see that the unfair methods of CodeCat are taking hold. Oh well. I have created Wiktionary:Votes/pl-2015-09/Using macrons in headword lines of Ancient Greek entries. The Euripides entry is now locked for "Disruptive edits by Dan Polansky". --Dan Polansky (talk) 14:00, 13 September 2015 (UTC)
@Dan Polansky: See Wiktionary talk:About Ancient Greek/Archive 1#Breves in Templates and User talk:Benwing#Ͷ, ͷ for two discussions. — I.S.M.E.T.A. 14:56, 13 September 2015 (UTC)
Wiktionary talk:About Ancient Greek/Archive 1#Breves in Templates is a 2007 discussion showing no consensus. User talk:Benwing#Ͷ, ͷ does not seem relevant, and is not a Beer parlour discussion. Is this a joke? --Dan Polansky (talk) 17:20, 13 September 2015 (UTC)
@Dan Polansky: Go to hell, you troll. — I.S.M.E.T.A. 17:50, 13 September 2015 (UTC)
Really? You revert-war against status quo ante, provide irrelevant discussions, and then call me a troll? You have shown true colors, indeed. --Dan Polansky (talk) 17:52, 13 September 2015 (UTC)
No, Dan isn't a troll- he takes himself way too seriously for that. As far as he's concerned, he's the last barrier standing between Civilization As We Know It and Tyranny And Chaos. On occasion, that's not that far from the truth. The problem is that he's so used to seeing himself as this principled Defender Of Truth, that he often can't see it when his own personal grudges take the place of his principles. Chuck Entz (talk) 21:13, 13 September 2015 (UTC)
@Chuck Entz: I apologise for my outburst. I should've written something like what was written by Μετάknowledge in Wiktionary talk:Votes/pl-2015-09/Using macrons in headword lines of Ancient Greek entries#Status quo ante, viz. “your statement of status quo ante is inaccurate. This [discussion] is pointless, and the issue belongs only among Ancient Greek editors, who already have a clear consensus (as you can see from the response [herein]). You are clueless about what has been going on”. @Dan Polansky: Perhaps you don't realise how profoundly irritating it is to have an editor completely unfamiliar with a language-editing community's practices (I have never seen you make a non-trivial edit to Ancient Greek content) come in, revert a completely routine and uncontroversial edit, demand an essay in justification of that edit, begin litigation when that demand is refused, and then dismiss the response of an editor from that community (when he finally concedes) as “a joke”. Your meddlesome, holier-than-thou manner endears you to no one. — I.S.M.E.T.A. 20:34, 15 September 2015 (UTC)

@Dan Polansky: I'm a little confused. Is the claim that using macra lacks transparency? We put diacritics outside the normal written orthography in the headword all the time (like Latin macra). Why is this a problem? —JohnC5 15:06, 13 September 2015 (UTC)

I don't remember Latin using macra all the time. Has this been changed recently? --Dan Polansky (talk) 17:20, 13 September 2015 (UTC)
It has been the case in Latin that macra should be used within Latin entries for a long time. The use of macra has been specified in WT:ALA and WT:AGRC for quite a while. Any entries that do not contain macra are ill-formatted and should be updated. The status quo has always been for their use. —JohnC5 19:17, 13 September 2015 (UTC)
As for Latin, my memory must have failed me; I now recall that macrons were used as long as I can remember. My mistake. As for Ancient Greek, let us have a look at WT:AGRC. This revision from 6 May 2012 tells me, on a slightly unrelated subject, that "Vowel length marks (i.e. the macron and breve) should not be used outside of Ancient Greek entries", which is relevant for the appearance of Ancient Greek in Euripides, which "I'm so meta even this acronym" protected to have his way contrary to WT:AGRC from 6 May 2012. Meanwhile, someone changed WT:AGRC to no longer state that. I admit that the same revision states that 'Secondly, headline templates, such as {{grc-noun}} have a "head" parameter, which can take vowel marks.' I admit my mistake since WT:AGRC suggests allowed use of these on the headword lines. Nonetheless, I point out that this was not an actual widespread practice before the bot run, as far as I remember (but do I remember this correctly?). Be that as it may, it may well be that there never was any controversy about the use of macrons in headword lines of Ancient Greek entries, and that I am quite mistaken here. Whatever the case, the vote should clarify that. --Dan Polansky (talk) 19:39, 13 September 2015 (UTC)
FWIW, I (as well as ISMETA and ObsequiousNewt) have been using macra in AG for a full year now and the new {{grc-decl}} requires them for correct functionality. —JohnC5 19:53, 13 September 2015 (UTC)
Oh noes, that makes you a criminal too! —CodeCat 20:30, 13 September 2015 (UTC)
I've been using macrons in grc exactly the same way as in Latin (i.e. everywhere except page names) ever since Module:languages/data3/g was edited here in January 2014 to automatically strip them in links. Breves have been stripped since October. I see no reason not to take advantage of this functionality, nor do I see any way in which doing so is "intransparent". —Aɴɢʀ (talk) 13:52, 14 September 2015 (UTC)
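As a rough illustration of the kind of stripping described above: the sketch below is not the code of Module:languages (which is written in Lua), and the function name `strip_length_marks` is made up for this example. It assumes that a display form can be turned into a page name by decomposing the text, dropping the combining macron and breve, and recomposing, so that accents and breathings survive.

```python
import unicodedata

# Combining macron (U+0304) and combining breve (U+0306).
LENGTH_MARKS = {"\u0304", "\u0306"}


def strip_length_marks(text: str) -> str:
    """Return `text` without macrons and breves, keeping other diacritics.

    Hypothetical helper for illustration only; not Wiktionary's own code.
    """
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in LENGTH_MARKS)
    # NFC puts the remaining marks back onto their base letters.
    return unicodedata.normalize("NFC", kept)


# The headword form with a macron maps back to the plain page title.
print(strip_length_marks("Εὐρῑπίδης"))
```

Working on the decomposed form means a vowel that carries both a length mark and an accent loses only the length mark.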
What's an intransparent headword? Renard Migrant (talk) 20:43, 14 September 2015 (UTC)
Certainly, all the Greek dictionaries I've seen include breves and macrons in their headwords. Benwing2 (talk) 10:42, 15 September 2015 (UTC)
@Benwing2: Can you please state these specific dictionaries at Wiktionary talk:Votes/pl-2015-09/Using macrons in headword lines of Ancient Greek entries#Dictionary practice? There, can you name specific entries in which it can be verified that they use macrons? --Dan Polansky (talk) 10:00, 20 September 2015 (UTC)

Macrons in Ancient Greek entries

This topic is currently discussed above, in #Intransparent headwords for Ancient Greek entries. --Dan Polansky (talk) 13:50, 13 September 2015 (UTC)

wrz, war translations

There are a bunch of translations labeled as "Waray" (code wrz) with the code for Waray-Waray (war). These are two distinct languages. Can they all be switched to one or the other, or is there a mix of both languages that need careful separation? DTLHS (talk) 20:32, 14 September 2015 (UTC)

@DTLHS: I looked through them, and recognised a lot of words that are definitely war, so I think it would be safe to switch them all. —Μετάknowledgediscuss/deeds 20:36, 14 September 2015 (UTC)

Vote timeline clean up

Wiktionary:Votes/Timeline (current revision: 29123778) has not been updated with new finished votes since 2013. The page is a bit messy, so I am thinking of maybe cleaning it up as a whole when I have the time. I am creating this BP discussion, as opposed to an RFC, because that's a personal project that I intend to do myself, not a request for others. (but if someone else beats me to it, that would be great, too).

Here's a list of what I intend to do. Please say whether you support doing it that way or if you'd rather it done differently.

  • Using the table format (which is being used in votes from 2004 to 2008) in all of the votes.
  • Merging the sections "Archived votes" and "Policy votes", since the division seems pointless in the first place and the former has plenty of "policy votes" to boot. (not to mention that "Archived votes" and "Policy votes" are ordered in opposite directions)
  • Editing the date: leaving just year-month (as in, "2008-12") as it's done since 2009 and not year-month-day like it was done between 2004-2008 (using multiple time formats). Maybe I'd use a single date template to format it consistently across the whole page.
  • Probably more things that I haven't brought up before but would post here before proceeding. Anything that's inconsistent and can be fixed easily. One minor thing:

--Daniel Carrero (talk) 07:25, 15 September 2015 (UTC)

I archived it --Zo3rWer (talk) 13:54, 15 September 2015 (UTC)
I don't see any bad consequence of this. Someone would have to point out some bad effect for me to consider opposing it. DCDuring TALK 22:16, 15 September 2015 (UTC)
  • I oppose using a table format in Wiktionary:Votes/Timeline. The current list format is nice enough, and easier to create for any archiver. People started to use the format on WT:VOTE itself (at the bottom), which makes archiving WT:VOTE easier. Keep things simple. --Dan Polansky (talk) 08:38, 20 September 2015 (UTC)
  • As for "Policy votes", that is a selection that I created in 2010. It is admittedly out of date, but there is nothing to be "merged"; it would have to be removed. As it is now, most people will see it is out of date, I think, and it still adds some value. I think it would be better to update it; a guide for it is at Wiktionary_talk:Votes/Timeline. --Dan Polansky (talk) 08:38, 20 September 2015 (UTC)

Multilingual tables

Based on the multi-language system of Template:list:chess pieces/pt (which I had created pre-Lua), I created a system of tables that can be used in multiple languages.

First tables:

Thoughts? --Daniel Carrero (talk) 17:09, 15 September 2015 (UTC)

I like it. Most lists probably shouldn't be converted into tables, but I think this is a good example of a type of list that would benefit from it. —Μετάknowledgediscuss/deeds 17:37, 15 September 2015 (UTC)
That's ok because it's not too intrusive. As Metaknowledge says it wouldn't work with most lists. Renard Migrant (talk) 14:19, 16 September 2015 (UTC)
That's true. Category:English list templates has 91 members as of now. Many of those are lists of geography/places (countries, continents, oceans, states) which are probably better off the way they are, as lists rather than tables. The table of chess pieces has the advantage of being a simple eight-member group; lists with a varying or unpredictable number of members (canids, religions, blues, reds) probably should not be converted to tables either. --Daniel Carrero (talk) 14:35, 16 September 2015 (UTC)

Assuming we are going to start using this system for other tables in the future, (Disclaimer: I'm not proposing converting every list into a table, it's just that maybe there are other tables that it would be a good idea to have, after the chess thing.) I've been putting those in categories named like this:

Is this a good name? Obviously this is the best I could think of, but "auto-table" really does not explain that much, so I'm very open to other ideas. Or maybe it's a pretty good name anyway. --Daniel Carrero (talk) 07:51, 17 September 2015 (UTC)

Useless statistics

I was curious about belpuga, because it is an entirely ordinary seven letter word apparently found (but marginally) in but one language, as far as Google Books can tell. So I downloaded the page titles, counting small all-lowercase words, to see what density we have of the possible space.

ASCII [a-z], one letter: 26 (possible 26)
ASCII two letters: 521 (possible 676)
ASCII three letters: 4656 (possible 17576)
ASCII four letters: 22800 (possible 456,976)
ASCII five letters: 71184 (possible 11,881,376)
ASCII [a-z][aeiou] or [aeiou][a-z]: 228 (possible 235) (missing qo, iq, iy, ub, uc, uj, uq)
ASCII [a-z][aeiouy] or vice versa: 259 (possible 276) (additionally missing qy, yc, yh, yj, yk, yp, yq, yv, yx, yz)
ASCII three letters, including [aeiou]: 3956 (possible 8315)
ASCII three letters, including [aeiouy]: 4170

[[:lower:]]: 630 (possible 1984 Unicode 8.0 characters, or 1492 excluding MATHEMATICAL Unicode characters)
lower * 2: 2239
lower * 3: 13089

(I don't know if :lower: would count the latest Unicode 8.0 characters, so maybe someone has added them.) I don't know how much this reflects us and how much anything in the world of languages. If anyone cares about noting alphabetic characters, there may be 800 characters that need some sort of "xxth character of the XXX alphabet".--Prosfilaes (talk) 23:28, 15 September 2015 (UTC)
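The "possible" figures in the ASCII counts above follow from simple counting: 26 to the power of the word length for all strings, and, for strings that must contain a vowel, all strings minus those built only from the remaining letters. A minimal Python sketch of that arithmetic (the function name `possible` is only illustrative, and the observed counts from the page-title dump are not reproduced here):

```python
ALPHABET_SIZE = 26


def possible(length: int, required: str = "") -> int:
    """Count lowercase ASCII strings of the given length.

    If `required` is non-empty, count only strings containing at least one
    letter from it: all strings minus those that avoid every such letter.
    """
    total = ALPHABET_SIZE ** length
    if not required:
        return total
    return total - (ALPHABET_SIZE - len(set(required))) ** length


rows = [
    ("four letters", possible(4)),
    ("five letters", possible(5)),
    ("two letters, at least one of aeiou", possible(2, "aeiou")),
    ("two letters, at least one of aeiouy", possible(2, "aeiouy")),
    ("three letters, at least one of aeiou", possible(3, "aeiou")),
]
for label, value in rows:
    print(f"{label}: {value}")
```

The two-letter cases match the "[a-z][aeiou] or [aeiou][a-z]" rows because a two-letter string fits one of those patterns exactly when it contains at least one vowel.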

That is very fascinating stuff, and almost certainly useless. Congratulations! --Zo3rWer (talk) 15:43, 29 September 2015 (UTC)

Old Italian

The fate of Category:Old Italian language is being discussed at Wiktionary:Requests for moves, mergers and splits#Category:Old Italian language. I would like it if more than 3 people commented (most notably, GianWiki who is the only person making Old Italian entries, has not commented). Renard Migrant (talk) 15:52, 16 September 2015 (UTC)

@GianWiki. — I.S.M.E.T.A. 16:01, 16 September 2015 (UTC)

Change the appearance of the "favourite languages" in translations

I don't know what this feature is called, but it's the one that shows translations of languages you select, in the top bar of the translation box, even when collapsed. I think this format isn't so useful, because there's no room for more than one or two languages. I think it would be preferable if, instead of collapsing the box altogether, the box just collapsed smaller, and showed only the translations for the languages you selected. So in collapsed state, it shows favourite translations, and expanding it shows all of them. I'm thinking of something similar to how the inflection box works on muitalit. —CodeCat 01:00, 18 September 2015 (UTC)

I support some sort of improvement in this aspect. My average translation table has 4 featured languages and it does get too cluttery. — Ungoliant (falai) 01:31, 18 September 2015 (UTC)
I would implement this if I had any idea how, or where this feature is currently located. —CodeCat 14:42, 21 September 2015 (UTC)

Min Nan POJ entries

Should Min Nan entries in Pe̍h-ōe-jī have definitions (e.g. in pe̍h-ōe-jī), or should they be like the pinyin entries for Mandarin words (e.g. in pīnyīn), where the character form(s) of the romanization are linked? (This is probably a question about whether POJ is a main script used for Min Nan.) Justinrleung (talk) 01:18, 18 September 2015 (UTC)

The status quo seems to be the former. —suzukaze (tc) 01:40, 18 September 2015 (UTC)
(e/c) I think the main question is: do people communicate using POJ, or do they just use it to transliterate? As I understand it, one wouldn't normally write a letter or a book in Pinyin, Romaji, etc. when the audience was native speakers, except perhaps in dictionaries or language-education materials. In other words it's more about the characters/writing system than about the subject matter. It would also seem to me that texts intended to demonstrate how familiar subject matter looks when written in the script would also be about the writing rather than about the subject matter. There's a book, for instance, that has the Lord's Prayer in hundreds of different languages and scripts- I would consider it strictly on the mention side of the use/mention distinction. Chuck Entz (talk) 01:42, 18 September 2015 (UTC)
The Bible has been published in POJ. —suzukaze (tc) 02:07, 18 September 2015 (UTC)
However, according to the "Current Status" section of the Wikipedia article on POJ, most Taiwanese are unfamiliar with POJ. Justinrleung (talk) 02:16, 18 September 2015 (UTC)
Since we include usage from throughout recorded history, that might not be a problem in itself, if it was in sufficient use at one time, though we have to be careful to avoid giving brief, failed experiments undue weight, and to be clear about the difference between historical and current usage. Chuck Entz (talk) 02:25, 18 September 2015 (UTC)
I just came across Talk:a-bú which may be relevant. —suzukaze (tc) 23:53, 19 September 2015 (UTC)
@Suzukaze-c Thanks for the link. However, a concern I have is that the current format for Chinese entries might make it redundant to have definitions in both the Chinese character entry and the POJ entry. Also, does that mean POJ entries should be linked in translation boxes? Justinrleung (talk) 00:22, 20 September 2015 (UTC)
  • Personally, I'd support making them like pīnyīn entries, but I'm not an especially relevant editor. —Μετάknowledgediscuss/deeds 05:10, 20 September 2015 (UTC)

Multiple translations

I was just merging one section of the Translation section for "chaperone"/"chaperon". The German section has 11 terms - I think this is unhelpful - the user will eventually have to look at 11 pages to find out which might be the best. An editor should really do this for them by restricting translations to normally one (and occasionally two). In this case, IMHO, the reader would find Anstandsdame and then go to that entry to find those other terms, where their nature (dialectal, idiomatic, slang, insulting, … ) could be explained.   — Saltmarshσυζήτηση-talk 09:23, 18 September 2015 (UTC)

Slightly expanded usage of Template:also[edit]

Earlier today, I attempted to add {{also}} to Northwest Territory and Northwest Territories, each linking to the other. This seems to make perfect sense: people could confuse the defunct American jurisdiction with the still-existent Canadian one. The wording at Template:also seems to leave the door open for something like this, yet I was reverted (rather rudely, I might add) by User:Ungoliant MMDCCLXIV, who informed me that this template is only for alternate capitalizations and diacritics, without providing any actual reasoning why it should be limited to those things. Is there any good reason for this constraint, and if not, can we allow use of Template:also in this relatively small case of proper nouns that are unrelated except for the fact that one is the plural of the other? Purplebackpack89

Isn't this what the header "See also" is for? —suzukaze (tc) 05:07, 20 September 2015 (UTC)
I support using {{also}} to link between Northwest Territory and Northwest Territories. --Daniel Carrero (talk) 08:13, 20 September 2015 (UTC)
This is what the header See also is for, however {{also}} is for similar titles and I think it makes sense sometimes to have the disambiguation right at the top and not almost at the bottom in a see also section. How would you handle aliterate and alliterate? Renard Migrant (talk) 09:44, 20 September 2015 (UTC)
Would Usage notes work? —suzukaze (tc) 04:55, 21 September 2015 (UTC)
@Renard Migrant: See alliterate and aliterate DCDuring TALK 07:13, 21 September 2015 (UTC)
  • The good reason for placing a limit on what is included in {{also}} is to attempt to maintain its utility in its original intended purpose: helping users find what they wanted despite limited keyboarding skills or understanding of scripts with different diacritics. Homonyms in Pronunciation perform a similar function, though limited to same-language items. That {{also}} is placed above any L2 section suggests that its primary use is for resolving cross-language/cross-script confusion. The principal exception is that we use it for items in the same language that have different initial capitalization. Under the logic of the headings structure of our entries that would seem to be a mistake.
If someone were to want to add to this confusion by including other kinds of confusions in those allowed in {{also}} above the first L2 or by using {{also}} within L2 sections, I'd like to see a proposal and a vote. DCDuring TALK 11:12, 20 September 2015 (UTC)
I don't really buy into your argument that if additional uses were added, it would be less useful in its original use. Purplebackpack89 12:56, 20 September 2015 (UTC)
@Purplebackpack89: Why is that? Because you can't afford it? Because you don't like the salesperson? Because you are unfamiliar with the research that shows that folks, when faced with a choice of ten items, are less likely to make a selection or purchase of an item than when there are three? Or is it because you are aware of other research that contradicts this? DCDuring TALK 15:33, 20 September 2015 (UTC)
I don't buy into it because I believe it to be not entirely true. I don't believe that {{also}} reaches its maximum utility by being limited in use. Correct me if I'm wrong, but your argument seems to be somewhat that if it was added to more entries, it wouldn't be as useful to the entries it was on originally (or at least before September 19, 2015). I disagree. I think if you add {{also}} to more entries, there is added utility for the entries it is added to, and consistent utility for the entries it was on before. Purplebackpack89 16:19, 20 September 2015 (UTC)
  • I prefer using a ===See also=== section for this sort of thing. It's not a keyboard limitation issue. —Aɴɢʀ (talk) 13:27, 20 September 2015 (UTC)
  • Oppose. {{also}} is meant for language-independent links. Northwest Territory and Northwest Territories can only be confused in English and thus the link should be within the English section. --WikiTiki89 14:01, 20 September 2015 (UTC)
  • I oppose this as well. But I should note that I have on occasion included {{also}} inside language sections. In these cases, there was possible confusion between different words in the same language, for example between c and č or e and é. —CodeCat 14:51, 20 September 2015 (UTC)
  • Support. This is already how the template is used in practice. I've used it with malternative and mallternative, two words that have completely different meanings, but a variance in spelling of a single letter, and thus might reasonably be confused. It makes sense to disambiguate words that might reasonably be confused due to having very similar spellings. Placing an {{also}} template at the top of the entry is a simple, unobtrusive way to point readers in the right direction. Burying a link in a "see also" section probably isn't as helpful, since a reader who takes a wrong turn is likely to be unfamiliar with the structure of Wiktionary entries, and thus won't know to look in the "see also" section. This really shouldn't be controversial. Ultimately, it's about increasing the ease of use of this site for readers -Cloudcuckoolander (talk) 20:07, 20 September 2015 (UTC)

For the record, nowhere does WT:ELE explicitly say what the exact use of {{also}} is. It has not been voted on anywhere. Perhaps it should be. --Daniel Carrero (talk) 23:05, 20 September 2015 (UTC)

IMO, we are not yet to a point of near-consensus that would allow a good vote.
AFAICT, we have a few locations we can use to direct users to an entry with slightly different spelling:
  1. {{also}} above the first L2 header (first and only choice for items with letter-by-letter correspondence (excepting diacritical marks and type case) to headword, but on different pages, in different languages).
  2. {{also}} at the beginning of any L2 header (for items in the same language subject to confusion).
  3. {{homophone}} under Pronunciation header (first and only choice for words in the same language, with same pronunciation, but different spelling and derivation).
  4. Under Alternative forms header (first and only choice for variations of the headword that fit under the same Etymology heading).
  5. Under See also header (first choice for items that are not search targets for users coming to the page and L2, but which provide additional information of possible use to some users).
It would be nice if the solution for first L2 headers also worked for other L2 headers and worked under tabbed languages.
It seems to me that the use of {{also}} above first L2 would not work for languages that appeared below the English L2 section. It would also be conceptually different from the class of items under 1 above. Position 5 doesn't make much sense because it occurs too far down on large pages. The alternative forms and pronunciation headers don't fit the facts of the case. That makes me conclude that option 2 would be the right choice in general. In other cases aesthetics might lead us to combine 1&2 and place it above first L2. DCDuring TALK 00:45, 21 September 2015 (UTC)
I agree with DCDuring's specifications, except that I would remove the restriction "in different languages" from #1. --WikiTiki89 14:37, 21 September 2015 (UTC)
Move {{also}} to English L2 Also remove See alsos. DCDuring TALK 00:45, 21 September 2015 (UTC)
Makes no sense. "See also" L3-section or L4-section is not for syntactic or morphological relations. {{also}} (previously {{see}}) is used to connect syntactic forms regardless of language. Regardless of language does not mean cross-language; it means that it does not matter whether it is within a single language or not. --Dan Polansky (talk) 08:37, 27 September 2015 (UTC)
  • I support using {{also}} to connect "alliterate" and "aliterate". "Northwest Territory" and "Northwest Territories" are syntactically near enough to be connected with "also" as well, I think. --Dan Polansky (talk) 08:37, 27 September 2015 (UTC)

Terms derived from Latin[edit]

For example, beef is derived from the Latin bōs, however not in the nominative, but the accusative bovem. The French bœuf actually mentions this.

The question is, should all terms derived from Latin show the accusative instead of the nominative? --kc_kennylau (talk) 05:19, 20 September 2015 (UTC)

If they click on the accusative, they'll go to a form-of entry. If they click on the nominative, they'll go to the lemma, with all the important information, and the chances of there being a nominative entry are much higher than for an accusative entry. This is similar to the issue of higher-level taxonomic names, which are normally derived from the genitive form of a generic name. I've been known to say "from X, the Y form of Z", but that can be a bit unwieldy. Chuck Entz (talk) 06:17, 20 September 2015 (UTC)
Some mention accusatives and some don't. The thing is, even if you had a policy for this (and some might say that's a step too far) you'd have to implement it by hand and that could take years. A bit like orphaning the abbreviation headers, there are thousands of them and they all need to be fixed manually. Renard Migrant (talk) 09:48, 20 September 2015 (UTC)
I think we should cite etyma in their lemma forms; this means saying that bœuf comes from bōs (not from bovem) and that chanter comes from cantō (not from cantāre). It just makes it easier for people to find the informative entry at first click. One possible compromise I've seen in some entries is to link to the lemma form but display both the lemma and the relevant inflected form, e.g. {{m|la|bos|bōs, bovis|ox}} or {{m|la|canto|cantō, cantāre|to sing}}. I'm not thrilled with that, as it strikes me as pedantic, but I can live with it. —Aɴɢʀ (talk) 13:48, 20 September 2015 (UTC)
I think that when the process is regular, we do not need to specify this on every page and we can just link to the nominative. In this case, French bœuf is regularly derived from the accusative of Latin bōs, because (almost) all French nouns derived from Latin are derived from the accusative. The same would especially be the case for French verbs (we should say that French prendre is "from Latin prehendō", not that it is "from Latin prehendere") because the French infinitive represents the whole paradigm just like the Latin first-person singular present represents the whole paradigm. In the case of irregular derivations from the nominative, we can specify that, for example, French fils derives "from Latin nominative fīlius". As for the English word beef, since it derives from (Old) French, and (Old) French regularly derives from Latin accusatives, we can simply say "from Old French buef, from Latin bōs". If English had taken the word directly from a Latin accusative, then we would have to specify. --WikiTiki89 14:12, 20 September 2015 (UTC)
My practice is to link to the lemma but display the true antecedent. So I’ll have bovem and cantāre. Less clicking. Users should be able to spot the accusative forms and infinitive forms in the declension tables. --Romanophile (talk) 14:16, 20 September 2015 (UTC)
What about the other direction? I imagine it might be useful for ====Descendants==== sections to mention that they are generally derived from the accusative. --Tropylium (talk) 14:23, 22 September 2015 (UTC)
In every one of the thousands? —CodeCat 14:32, 22 September 2015 (UTC)
Why would that be useful? --WikiTiki89 14:45, 22 September 2015 (UTC)
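The two linking practices described in this thread (link to the lemma and show it, or link to the lemma but display the inflected antecedent) differ only in the display field of the {{m}} call. A minimal Python sketch of building such a call is given here for illustration; the helper name `mention` is hypothetical, and the positional layout (language, link target, displayed text, gloss) is inferred only from the examples quoted above, so it should be checked against the template's documentation before relying on it.

```python
def mention(lang, lemma, display=None, gloss=None):
    """Build a {{m}} call that links to `lemma` and optionally shows other text.

    Assumed layout (taken from the examples in this thread, not from the
    template documentation): {{m|lang|link target|displayed text|gloss}}.
    """
    fields = ["m", lang, lemma, display or "", gloss or ""]
    # Drop empty trailing fields so the call stays as short as possible.
    while fields[-1] == "":
        fields.pop()
    return "{{" + "|".join(fields) + "}}"


# Link to the lemma, display lemma plus inflected form (the compromise above).
print(mention("la", "bos", "bōs, bovis", "ox"))
# Link to the lemma, display only the true antecedent.
print(mention("la", "bos", "bovem"))
```

Either way the reader lands on the lemma entry at the first click; only the visible text changes.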

Difference between "regional" and "dialectal"[edit]

When a term clearly exists, but isn't part of Standard English and its origins are murky, we tend to mark it as either "dialectal" or "regional". I can't see any meaningful difference between the two in Category:English regional terms and Category:English dialectal terms – is there a good argument for not merging these? Anything that is "regional" is surely also "dialectal" by definition. Smurrayinchester (talk) 13:02, 22 September 2015 (UTC)

Not necessarily: regional sounds broader, as in maybe northern England as opposed to, say, Scouse. In other words, regional would encompass a number of dialects in a larger area, while dialectal would be in isolated dialects here or there. Of course, there's also the pejorative sense of dialect at play here: my usage is regional, yours is dialectal because it's in some obscure, out-of-the-way backwater that I don't care about... ;) Chuck Entz (talk) 14:16, 22 September 2015 (UTC)
To me, regional is a subset of dialectal, as dialects are not necessarily tied to a region. They can be social as well. —CodeCat 14:31, 22 September 2015 (UTC)
That's how I understand it too. Polari is a dialect, but not regional. I can't think of anything that would be regional but not dialectal – even British English, Indian English, US English etc are dialects of a kind. (And that's another problem with regional – are we talking about a town or a continent?) Smurrayinchester (talk) 14:58, 22 September 2015 (UTC)
Really everyone here is mostly in agreement. Even by CodeCat and Smurrayinchester's definition, Chuck Entz's example makes sense. --WikiTiki89 15:08, 22 September 2015 (UTC)
DARE documents a lot of differences in frequency of usage and pronunciation of words by region in the US, some at the level of city (NY, Chicago), parts of states, states, some over much larger areas (Southern US). I don't think that these vocabulary differences are sufficient to make a dialect. AFAICT linguists typically recognize only Appalachian English, and, sometimes, southern English as dialects. DCDuring TALK 15:54, 22 September 2015 (UTC)
Labov et al's Atlas of North American English (ISBN 978-3110167467) describes layers of dialects in America, from 'South', 'Mid-Atlantic', 'Inland North dialect' and other broad dialects to 'New York City dialect' and other small dialects which are in some cases subsets of the broad dialects. I can imagine some people making a distinction between 'dialectal' and 'regional'; but the categories reveal that insufficiently many people imagine that there is a distinction, and they are in practice used interchangeably, with dialects described as regional (see in particular Category:Classical Hebrew and Category:Biblical Hebrew!) and the distinctive speech of regions (accurately IMO and per Labov) called dialectal. I favour making "regional" an alias of "dialectal". I wouldn't object to having "regional" display as such but categorize as "dialectal". I find the label to be uselessly vague, though: specify which regions, and then you needn't add a vague "regional" label, and if it's desirable that all regional/dialectal entries go into a single category, then make each label double-categorize into the specific category and the regional/dialectal category, rather than appending "regional" to only a small handful of the many entries that have regional (Southern US, Northern England, etc) tags. - -sche (discuss) 15:35, 23 September 2015 (UTC)

The constituent part parameters in Template:calque[edit]

The template {{calque}} has parameters to provide the source language term as well as the terms in the calquing language that the new term was made from. This is problematic, because it makes it impossible to provide more detailed etymologies using more fine-grained templates like {{compound}} or {{affix}}. In fact, many calques are also compounds, but are not categorized as such. I think it would make more sense and work better if, instead of putting this in antibody like now

{{calque|en|anti-|body|etyl lang=de|etyl term=Antikörper}}

this were changed into

{{affix|en|anti-|body}}, {{calque|en|de|Antikörper}}

This way, the entry will be correctly categorized as being prefixed with anti-. And it also makes {{calque}} work more like {{borrowing}}. —CodeCat 16:02, 22 September 2015 (UTC)

I agree. This annoyed me recently when I was editing съвѣсть ‎(sŭvěstĭ). --WikiTiki89 17:30, 22 September 2015 (UTC)
I've now modified {{calque}} so that it works fine when no positional parameters are given. The entries that still do have them are at Category:calque with terms. There's quite a lot to do, and they can't be done by bot because it's not a given that {{compound}} should always be used. grammatical alternation for example is definitely not a compound. But then, we have never been very strict on what a compound is, some people treat anything made up of different parts as a compound. —CodeCat 19:58, 27 September 2015 (UTC)
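To make the proposed change concrete, here is a minimal Python sketch that turns one old-style call into the split form suggested above. It is only an illustration, not an existing bot: the function name `split_calque` is hypothetical, and the choice of {{affix}} for the parts is an assumption that a human editor would still have to confirm for each entry, since, as noted above, not every calque is a compound or affixed term.

```python
import re
from typing import Optional

# Matches a single {{calque|...}} call with no nested templates inside it.
OLD_CALQUE = re.compile(r"\{\{calque\|([^{}]*)\}\}")


def split_calque(call: str) -> Optional[str]:
    """Suggest a split replacement for an old-style {{calque}} call.

    Returns None when the call does not carry both constituent parts and the
    "etyl lang" / "etyl term" parameters, i.e. when there is nothing to split.
    """
    match = OLD_CALQUE.fullmatch(call.strip())
    if match is None:
        return None
    positional, named = [], {}
    for field in match.group(1).split("|"):
        if "=" in field:
            key, value = field.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(field.strip())
    if len(positional) < 2 or "etyl lang" not in named or "etyl term" not in named:
        return None
    lang, parts = positional[0], positional[1:]
    # Assumption: the parts belong in {{affix}}; an editor must confirm this.
    affix = "{{affix|" + "|".join([lang] + parts) + "}}"
    calque = "{{calque|" + "|".join([lang, named["etyl lang"], named["etyl term"]]) + "}}"
    return affix + ", " + calque


print(split_calque("{{calque|en|anti-|body|etyl lang=de|etyl term=Antikörper}}"))
```

A helper like this could at most produce suggestions for the entries in the tracking category; the decision about which morphology template fits remains manual.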

Modules, schmodules[edit]

When did it become policy that you can't add a category to another category? This is unfair to users who can't code (i.e. most of us). Anyone should be allowed to put a category into another category, and we should not be forced to use User:CodeCat's confusing, elitist and generally unnecessary module system. Purplebackpack89 21:29, 22 September 2015 (UTC)

It's unfair that people with no medical knowledge can't perform surgery. BAWWWW. Equinox 21:30, 22 September 2015 (UTC)
Editing Wiktionary should be easy enough that we shouldn't compare it to surgery. You should be able to edit nearly all aspects of Wiktionary, including putting categories into other categories, without knowing how to code a module. Coding modules is really advanced stuff. Purplebackpack89 21:34, 22 September 2015 (UTC)
If you are talking about your recent edit war with CodeCat about Category:Penutian languages, maybe you should wait for WT:RFDO#Category:Penutian languages to end. Perhaps the problem is not that using modules is the "hard" way to use categories; I'd argue that templates/modules actually make the category system easier to manage as a whole. It is just that the category in question is not part of the system because nobody created a code for "Penutian". Assuming that the category fails RFDO it will get deleted; if it passes you would just have to insert it normally into the category system. The proper code would not be {{langcatboiler}}, but {{famcatboiler|qfa-pen}}; provided that the category passes RFDO, a code qfa-pen would be created for it. --Daniel Carrero (talk) 21:54, 22 September 2015 (UTC)
I'm arguing about modules in general, and I totally disagree that it's easier. It's so much easier to use HotCat to add categories to other categories than it is to edit modules or add templates. FWIW, I am also concerned that User:CodeCat has too much OWNership of the categorization system of this project; particularly when CodeCat has forced us to use modules instead of just HotCatting everything like most other projects do. Purplebackpack89 22:03, 22 September 2015 (UTC)
@Purplebackpack: Most languages/families already have a code so it works for them; Penutian is the odd one out, still under discussion. I suggested above: That is clearly a disputed/proposed language family, wait for the RFDO discussion to end before using that category further. But you'd rather use that category (and "Terms derived from Penutian languages", etc.) now, before discussion? I wouldn't advise doing that manually. That's a lot of work for a category under discussion: You would have to edit Category:Terms derived from Wintuan languages, Category:Terms derived from Chinookan languages and others as subcategories of Category:Terms derived from Penutian languages individually or leave the work unfinished. The purpose of the module system is doing or undoing all that at once with few edits to the modules.
For better or for worse, when you complain about @CodeCat, I demand credit, too, since pre-Lua I created {{poscatboiler}}, {{langcatboiler}}, {{famcatboiler}} and a good chunk of the category structure in use now. Though lots of other people contributed to the system as well, with edits and also votes and discussions. (I admit my coding was a mess, often in the form of large templates, and it made people's lives difficult editing them. I apologize to everyone who tried to edit the templates and remember this.) One thing I think is an improvement is that you know what to expect in a category system that is structuralized through templates: before the creation of a number of standardized subcategories, language categories like Category:English language had entries, appendices, templates and indexes randomly placed directly in it instead of in the subcategories, for example. Also many categories "Category:XYZ nouns" were NOT in "Category:Nouns by language", a number of categories were using different names for the same language (Category:Nynorsk language vs. Category:Norwegian Nynorsk language) and so on. --Daniel Carrero (talk) 22:52, 22 September 2015 (UTC)
"The purpose of the module system is doing or undoing all that at once with few edits to the modules." If you can't code and/or can't find modules, the number of edits it takes isn't particularly relevant. With HotCat (and even without it), I can add a category to a large number of pages relatively quickly, as HotCat is relatively easily to use and doesn't require coding. Also, Dan, while you may have had a role in our current confusing system, you revert me (and other uses) far less in this area than CodeCat does. Purplebackpack89 23:54, 22 September 2015 (UTC)
I should make myself clearer. "The purpose of the module system is doing or undoing all that at once with few edits to the modules." That is in cases where you want to change how the system works, not just simple tasks like adding or deleting categories, which often don't require any change to the modules themselves.
Even if you did your edits with HotCat in a perfect and consistent way, we can't guarantee everyone will do it. Modules make the category system consistent. Consistency is important: Category:English nouns is a subcat of Category:English lemmas, not a subcat of Category:English language, and that is true for all languages. But what if people want to change this? What if User:Example wants Category:German nouns to be a subcat of Category:German language? My point is: There are dozens of thousands of categories, that is an increasingly complex system by itself. In my opinion, User:Example should not be able to place Category:German nouns under Category:German language while leaving all other "X nouns" categories unattended. If that is to change, then all "X nouns" would have to change together. That is why we are using modules now. What's your opinion on this? --Daniel Carrero (talk) 10:15, 23 September 2015 (UTC)
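The consistency argument above can be illustrated with a small Python sketch. It is not the code of the actual category modules; the table `PARENT_RULES` and the function `parent_categories` are hypothetical names, and the single rule shown only encodes what is stated in this discussion (an "X nouns" category sits under "X lemmas" and under "Nouns by language").

```python
# Hypothetical rule table: one entry per kind of category, shared by all languages.
PARENT_RULES = {
    "nouns": ["{lang} lemmas", "Nouns by language"],
}


def parent_categories(lang_name, kind):
    """Derive the parent categories of 'Category:<lang_name> <kind>' from the shared rule."""
    return ["Category:" + rule.format(lang=lang_name) for rule in PARENT_RULES[kind]]


# Every language gets its parents from the same rule, so none can drift.
for lang in ("English", "German"):
    print("Category:%s nouns ->" % lang, parent_categories(lang, "nouns"))
```

Because the parents are computed from one shared rule rather than stored on each category page, moving "X nouns" elsewhere means editing one table entry, and the change applies to every language at once instead of to German alone.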
My opinion, Dan, is that you're ignoring the main drawback of module use: that most people either don't know what a module is and/or are unable to edit them even if they do. Yes, using HotCat or other manual categorization methods may have a lack of standardization, but it's far less complicated. I could probably add or remove a category from every single category we've got in less time than it'd take me to learn to code. Purplebackpack89 14:32, 23 September 2015 (UTC)
I am not ignoring that modules are complicated to people who don't know how to edit them. (I am one of those people, I can make templates using MediaWiki code, but I know virtually nothing of Lua.) I am explicitly supporting the consistency of modules in favor of the freedom of editing any category individually, on the grounds that categories are far more useful (at least IMO) in this multilingual system if all languages follow exactly the same categorization system. You say: "manual categorization methods may have a lack of standardization", but that downplays the fact that the categorization system without templates (as it was without poscatboiler and some votes and discussions) was a huge mess, with some examples that I already mentioned in this discussion. I also find it hard to believe that everything could be solved manually, like adding "Category:Nouns by language" and "Category:X lemmas" manually to all the 1,567 "Category:X nouns" categories, without having any categories using different language names. Surely there would be some problems solved with higher priority and others left unattended? Do you have interest in the whole categorization system or just a few of the existing categories? You said: "most people either don't know what a module is and/or are unable to edit them even if they do." What do you want to edit in modules? Or: What problem are you trying to solve where you would use HotCat rather than modules? --Daniel Carrero (talk) 15:52, 23 September 2015 (UTC)
If by "problem", you mean "everything I've ever wanted to recategorize or create since we created modules"...Purplebackpack89 16:30, 23 September 2015 (UTC)
That does not answer my questions. Be more specific. You only have 42 edits to categories but apparently since the beginning you had known how to use {{poscatboiler|gul|verbs}}, for example, so fitting new categories into the existing system does not seem to be an issue. What did you want to recategorize or create ever since we created modules? --Daniel Carrero (talk) 16:54, 23 September 2015 (UTC)
I have some questions:
  1. Can we direct User:Purplebackpack89 to the documentation for the relevant part of our infrastructure?
  2. Can we direct him to the documentation for how to accomplish specific tasks?
  3. Can we tell him where the system for accumulating user requests, issues or questions and the responses thereto is?
No truly professional system with a user/contributor interface would have so little documentation accessible from the users' PoV. If we can't have a truly professional system, then why do we place so much of the project in the hands of coders? Why do we accept so much regression in the features of the software? It seems as if many of the software projects simplify user-interface and other matters to match the skill level and "tidiness" preferences of the coders. DCDuring TALK 22:20, 22 September 2015 (UTC)
Question 1: {{famcatboiler}} / Module:families/data.
Question 3: WT:GP. --Daniel Carrero (talk) 22:52, 22 September 2015 (UTC)
Re: Question 1: How would one find that?
Re: Question 3: What gives some assurance that the problem is actually solved and how it is solved (any regressions?) rather than oversimplified, ignored, or deleted?
DCDuring TALK 23:29, 22 September 2015 (UTC)
Question 1: By looking at the code of other family categories. Random suggestion: Category:Italic languages. (or by asking other people)
Question 3: No assurance; this is a volunteer project. Though I'm curious: Did you have a request, issue or question that was posted in GP and later oversimplified, ignored or deleted? --Daniel Carrero (talk) 23:38, 22 September 2015 (UTC)
re: "this is a volunteer project" That seems to be interpreted as coders doing whatever they want, whether asked or not, conforming only to standards of their own invention. Other users are simply to conform to the changes. Those users are also volunteers and they have stayed away from or left this project in droves. DCDuring TALK 01:22, 23 September 2015 (UTC)
What do you want to change in categories? I don't see people complaining that way about complex templates other than categorization ones, such as {{en-noun}} and {{context}} (and previous incarnations, such as {{obsolete}}). What would people do if they wanted to change {{en-noun}} some way but didn't have the skills to do it? {{poscatboiler}} and related templates are great; but I'm just saying my opinion. If a number of people really hate them so much, why don't you nominate them for deletion? --Daniel Carrero (talk) 10:15, 23 September 2015 (UTC)
And Question 2, which is the "how to code" question? Also, having coding issues actually makes things harder for people like you and CodeCat who can code, because not only do you have to do your coding, you have to do the coding of everybody who needs coding done. Purplebackpack89 23:54, 22 September 2015 (UTC)
You started your complaint with "having coding issues"; what coding issues? --Daniel Carrero (talk) 10:15, 23 September 2015 (UTC)
And let me remind you: this is a general beef I have with categorization requiring coding. Penutian wasn't the first time I ran into this wall, and I'm definitely not the only person to run into this wall. Purplebackpack89 23:59, 22 September 2015 (UTC)
Nominate {{poscatboiler}} for deletion. I'd vote oppose, but you have the right to pursue what you think is best. Or hire someone to create something better if you can't code. Or make a list of everything you think is wrong with the template so that can be discussed/fixed. --Daniel Carrero (talk) 10:15, 23 September 2015 (UTC)
How 'bout I just nominate all modules for deletion, and we go back to manual categorization? Your "solutions" essentially require that I acquiesce to coding being necessary for categorization on this project. I'm not willing to do that. I shouldn't be asked to. Purplebackpack89 14:32, 23 September 2015 (UTC)
"...require that I acquiesce to coding being necessary for categorization on this project" No. What is the difference between presenting an opposing idea in a discussion and "requiring that the other person acquiesce" to something? On that logic, I could accuse you of making me acquiesce to your ideas right now, too. But the fact is that I am not trying to "manipulate" you into changing your mind. I am presenting my points and I am asking you to discuss yours, especially after you have explictly complained of your edits being reverted or what you have to say being ignored. What else do you want? Ultimately, nominating {{poscatboiler}} for deletion as I said may have sounded sarcastic, but that was not the case. If a template is problematic, creating a deletion discussion would be the natural thing to do. Even though I think it's a great and helpful template, you can do whatever you want. --Daniel Carrero (talk) 15:52, 23 September 2015 (UTC)

Formatting proposal: always put cognates in a separate paragraph[edit]

Previous discussion: Wiktionary:Beer parlour/2015/June#Proposal: Always collapse cognate lists in entries

Back in June I complained that lists of cognates make etymologies messy and hard to read. Several other editors seemed to agree, but there wasn't a consensus for any change that I can see. So I'd like to propose something much milder and less intrusive: cognates must always be in their own paragraph, separate from the explanation of the term's origin. See landschap for an example. —CodeCat 00:16, 24 September 2015 (UTC)

Support. Some etymologies have the evolution chain and the cognates intertwined (i.e. “from Fooese foo (cognate to Barese bar, Voynichean cthar), from Klingon kplar (cognate to ...)”). How can these be dealt with? — Ungoliant (falai) 01:11, 24 September 2015 (UTC)
They should probably be detwined and made into a paragraph. Of course, there's nothing that says there can't be more than one paragraph of cognates, so one could have one for close cognates and another for more distant ones. —CodeCat 01:27, 24 September 2015 (UTC)
I also support, though I would like to know more about this issue. Would we format multiple cognate paragraphs to begin with a word to indicate from what level the cognates come? If so, is the cognate paragraph labeled by the parallel cognate or the shared etymon? Also, do we represent collateral forms under the list of cognates? —JohnC5 16:15, 24 September 2015 (UTC)
Would it be possible to make a template for this, something like {{cognates|fi=...|eu=...}}, or do you think there would be too much variation? DTLHS (talk) 01:42, 24 September 2015 (UTC)
I Support. And I am a huge fan of consistency, so in cases like neap etym_2, will the one cognate be in a separate paragraph? What should be the proper way in this instance? Leasnam (talk) 11:44, 24 September 2015 (UTC)
And we also treat comparisons in a similar way to true cognates, so would we need a separate template (if any) for these? Leasnam (talk) 11:47, 24 September 2015 (UTC)
That template would be possible and is an interesting new way to use template parameters, but I don't think we should use it for this because cognates need to be arranged in a logical and structured order which makes sense in terms of the etymology and the template would have no way of knowing how to arrange them. --WikiTiki89 14:40, 24 September 2015 (UTC)
For descendants we usually have a fixed ordering, so could we not use that for cognates too? —CodeCat 14:47, 24 September 2015 (UTC)
The ordering is fixed only when the word descended through its "natural path", which is not always the case. --WikiTiki89 14:51, 24 September 2015 (UTC)
True, but that might not matter for listing cognates. —CodeCat 15:29, 24 September 2015 (UTC)
Well it does still matter, but you may be right that it would not apply in many cases because the words did descend in their natural paths to all the listed cognates. --WikiTiki89 15:38, 24 September 2015 (UTC)
Support: saying they "must always" be in their own paragraph seems needlessly firm to me. There's always room for exceptions to be made. But in general it seems like a good idea. WurdSnatcher (talk) 16:18, 24 September 2015 (UTC)
Support, seems reasonable. Also WT:ELE is currently lacking any mention of cognates. I realise they don't have their own header, but this is the place users ought to be able to look for guidance on how to add them to an entry. —Pengo (talk) 01:49, 26 September 2015 (UTC)
When an etymology is very long, this would be helpful, but when an etymology is very short ("From Proto-Algonquian *foo, whence also Penobscot foo."), a paragraph break seems unnecessary and excessive. It would also make entries with different levels of cognates (as Ungoliant describes) messy. - -sche (discuss) 05:21, 26 September 2015 (UTC)
I agree with this point. "Must always" seems a bit overly firm here. We could consider e.g. separating cognates if the etymology runs to more than one line? --Tropylium (talk) 22:42, 28 September 2015 (UTC)
@Tropylium How long is a line? —CodeCat 18:05, 1 October 2015 (UTC)
Depends on the window size (which further depends on one's screen resolution), of course. But we should probably aim for downward compatibility. Is it possible to find out what the median non-mobile user's screen width is? Off the cuff I'd guess 1024 px, and allowing for other screen uses, about 600-800 px might then be a reasonable range for "a line". --Tropylium (talk) 20:20, 1 October 2015 (UTC)
  • I oppose that "cognates must always be in their own paragraph" since it makes etymology sections take more vertical space, especially the short ones. I am quoting, including "must" and "always". --Dan Polansky (talk) 11:45, 26 September 2015 (UTC)
Define "list": is a list just more than one? Renard Migrant (talk) 14:32, 26 September 2015 (UTC)
  • I support the option to use separate paragraphs, but would not be comfortable enforcing it if, for instance, there are only one or two examples. Ƿidsiþ 14:31, 30 September 2015 (UTC)

Wikiproject Siriono[edit]

Hi all,

I am writing to tell you about a project I am trying to build for fr.wiktionary and es.wiktionary. It is named Wikiproject Siriono, and it is about Siriono, a language spoken in Bolivia. I am a French linguist who has been studying this language for five years for my PhD in linguistics, and I want to export my database into Wiktionary. Then, as the project proposal describes, I plan to go to Bolivia to train the speakers to access and manage the entries in the Spanish edition of Wiktionary. My database is made with FieldWorks Language Explorer and includes translations from Siriono into local Bolivian Spanish, French and English. I don't know anything about the structure of the English edition of Wiktionary, so I will not try to create a bridge for this edition, but I am willing to collaborate if someone from here wants to. In my opinion, it would only take some basic formatting of the XML data and a language check, as English is not my mother tongue. Also, if there is someone here who speaks Spanish and wants to be part of the team that will go to Bolivia next year, there is still room for a name. I want to work mainly, if not only, on the Spanish Wiktionary, so it would be better if it is someone who knows that project well. I hope you will find this interesting; you are welcome to comment, criticize and fix mistakes in the proposal if my English is too vague or incorrect. Yours, Eölen (talk) 17:51, 24 September 2015 (UTC)

@Eölen: Very exciting project! I've copy edited the proposal as requested (hopefully I haven't added any errors). The only part I wasn't sure about was the line "...and be host in family during one month." I wasn't sure if you meant "and his family will host for one month" or maybe "a local family will host the team for one month" so I left it (I'll let you fix it). Looks like a very worthwhile project and wish you the best with it. —Pengo (talk) 03:04, 26 September 2015 (UTC)
@Pengo: Thank you very much! You did a great job! I will clarify this sentence. A local family will host the team, in a separate house I built in the village, next to their house. I don't have relatives there! I am very grateful you helped to fix the language and improve the writing. I hope this project will go well. I am still trying to define precisely how to schedule it, and I am still looking for a volunteer for the team! Thanks again! Eölen (talk) 04:25, 26 September 2015 (UTC)

"Compare" in etymologies[edit]

I've always been annoyed when etymologies say "compare", with a bunch of words in other languages, because it's so meaningless. Why am I supposed to compare them? Am I supposed to note their superficial similarity? Is their meaning interesting? The number of syllables? Do I win a meerkat if I compare them? Obviously the insinuation is that there is some connection, but why does the entry not just say what the connection is? Are they cognate? Or otherwise related? Then say so.

Sorry, a bit of a rant, but I hope people agree with me. —CodeCat 00:15, 26 September 2015 (UTC)

Then say so.—how about you go speak to your cat like that?
I guess I'll have to be the first to disagree, since those were my two entries that you just defaced. Like I said when I reverted your edits, at least mine were helpful, which is more than I can say for yours. If you don't find this information useful—don't use it. If you don't find it interesting—ignore these sentences. And let other people make whatever sense they make of those things. They don't break any rules, the connections pointed out are factual and real.
P.S. Does anybody honestly care what someone has "always been annoyed" by? Pfftallofthemaretaken (talk) 00:21, 26 September 2015 (UTC)
But you didn't point out any connection. You said "compare", which means nothing. Things can be compared to show the absence of connections, too. And Wiktionary isn't a free-for-all where you can just put anything you think is interesting. Things have to serve a purpose and fit into our formatting rules. A "compare with" heading does neither. And for the record, I didn't "deface" "your" entries. I edited the entry, and the entry was Wiktionary's from the moment you made your edit. —CodeCat 00:27, 26 September 2015 (UTC)
There is a place for "compare" in etymologies, because establishing the exact relationship between words isn't always possible, but there are plenty of cases where there seems to be some kind of connection. I would rather have "compare" in many cases than to either have a long explanation of why the term is of interest, but maybe not a cognate, or to eliminate anything other than that which is rigorously proven to be a cognate. As with anything open-ended, this can certainly be abused: I don't really want to see a bunch of Mongolian terms in English entries because they pass some kind of Pan-Turkist's Rorschach test. But then, I don't really want to see Norwegian, Swedish, Danish, Icelandic and Faroese cognates because one of the languages was included and partisans for the others felt they deserved equal treatment.
As for the two cases at contention: I agree that having a number of terms in other languages under a separate "Compare with" header is overkill, though they all probably trace back to the same Pali term through some combination of borrowing and inheritance. On the other hand, removing a Chinese term from a Sino-Vietnamese etymology that's made up of the same characters as those cited in the main part of the etymology seems like overkill, too. It's a matter of proportion and degree of relevance, which can't be tidily summed up in some kind of rule. Chuck Entz (talk) 01:20, 26 September 2015 (UTC)
It's more the wording that annoys me than anything else. I have no problem with listing cognates or related terms, but then say what they are. Don't use "compare" and leave people guessing at what the idea is. Related terms should be called related terms, because then you know what they are. —CodeCat 01:38, 26 September 2015 (UTC)
I agree. I assume cf. is shorthand for "see also this other word which sounds a bit similar so maybe its etymology is correct for this word too or at least had an influence on its formation, who knows?", or more briefly: possibly cognate with or possibly influenced by or formed in a similar way with a different meaning etc, which are the kinds of phrases I'd prefer.
Similarly Related terms is also fairly meaningless, and it annoys me that we have no standard way (afaik) to, for example, mark a related term as "adjective form of this noun", etc. —Pengo (talk) 02:11, 26 September 2015 (UTC)
For related terms, I was talking about putting them in etymologies. Just as we have "cognate with" lists, there's also "related to" lists, which are much preferable to the nondescript "compare". —CodeCat 02:18, 26 September 2015 (UTC)
Compare makes perfect sense. It means, look for similarities or analogies. "Compare" does not state what the connection is since that would be too wordy, and since that is often obvious once you actually start doing the comparing. It looks like I largely agree with Chuck Entz above. --Dan Polansky (talk) 11:51, 26 September 2015 (UTC)
Compare is used for cognates in other languages and also cognates in the same language, or words that aren't cognates but follow a similar pattern in their formation. You're free to be pissed off about it, it's your life, but I'd like a Wiktionary where not only what CodeCat thinks is important, and other people can make some decisions too. Renard Migrant (talk) 14:24, 26 September 2015 (UTC)
Of course, because whenever I think something can be improved, that must automatically be interpreted as my attempts to force my will through. Because we all know that leaving problems unsaid works so well. At least now that I've said this, it's clear that some people agree with me. So how can you even think it's about what I alone think is important? Do those other people not count? —CodeCat 14:40, 26 September 2015 (UTC)
I agree with CodeCat, "compare" only means something if you already know what the relationship is or know enough about linguistics to deduce what's meant. It makes sense in paper dictionaries where you have to save space, but here we can be clear. Why not say "possibly cognate with" or "may be related to" or whatever? It seems like Wiktionary tries to be as confusing and off-putting as possible sometimes. Why use such stilted, unnatural language? Are we trying to communicate info to each other and other linguistics enthusiasts? Or are we trying to educate ordinary people? If we're trying to educate ordinary people, "compare" is pointless obfuscation about as useful as a "Meronyms" section. WurdSnatcher (talk) 14:30, 26 September 2015 (UTC)
“I would rather have "compare" in many cases than to either have a long explanation of why the term is of interest, but maybe not a cognate, or to eliminate anything other than that which is rigorously proven to be a cognate”—this.
@WurdSnatcher Educate ordinary people? I didn’t know that was the aim of the project. I would be curious to know how many people here would say they contribute because they’re trying to educate ordinary people. I’d say ordinary people don’t care about dictionaries at all—they use them once or twice a year to look up a word or two. And when they do that, they’re certainly not interested in things like, say, the etymology of the word at all. Should we start removing all etymologies then?
“Why use such stilted, unnatural language?”—oh wow, that’s rich, coming from someone championing the interests of ordinary people and proposing to use “possibly cognate with” instead of “compare with”. I didn’t even know what a ‘cognate’ was until a couple of months ago. How are ‘ordinary’ (whatever that means anyway) people supposed to know that? “Compare with” is certainly more natural than “possibly cognate with” or, god forbid, ‘cf.’
“It makes sense in paper dictionaries where you have to save space, but here we can be clear.”—there’s another thing we need to try and save, whether we’re talking about paper or online dictionaries. That thing is people’s time. And that is the reason why “compare with” is preferable to “possibly [insert an obscure linguistic term that only linguists understand]”, or etymology sections spanning half a dozen sentences. This is especially true if we’re keeping in mind the interests of those ‘ordinary’ people, who just want to quickly figure out what that obscure word they’ve just heard means.
@CodeCat Are you going to nitpick my use of the word ‘my’ in front of the word ‘entry’ now? The entries might not be mine, but can I at least keep the messages to my unworthy person? With m’lady’s permission, of course. Thank you. And while my messages are still mine, the way that I see it is that I get to call things whatever I want (as long as I’m not employing insults, and I’m not), so ‘deface’ you did. I like the word, and it describes accurately the nature of your edits.
By the way, here’s an idea for you. This is how books used to be censored in tsarist Russia: [[22]]. Maybe next time you don’t like something in an entry you could just replace that part with dots, like in the book pictured. You still get to censor things (which you seem to enjoy doing), but don’t do as much damage in the process. Pfftallofthemaretaken (talk) 19:34, 26 September 2015 (UTC)
Wow, that was quite a bit of bile there. I'm not really sure that was merited in response to anything. I do not mean to lecture you, but no user really has chief control over any entry outside of his or her user namespace. Code may often be rather brusque when she makes changes to things, but the reason we are having this discussion is to decide how we, not she and not you, will handle this issue in the future. I really don't want us to be so angry about such a small matter, so I would ask all parties involved to please calm down just a touch.
Compare has always seemed like shoddy explanation to me personally, but not enough that I think it should be removed wholesale. I normally understand compare to act as a stub for bigger and better things!
P.S. I can whip up a template like {{redacted|US}} or {{redacted|tsarist}} that will place black bars or dots over text respectively. It wouldn't be too hard, really. (N.B. the preceding joke is meant solely for the purpose of humor and not to antagonize, mock, or promote one side's argument in any way. Please do not get bent out of shape about a silly suggestion.) —JohnC5 22:32, 26 September 2015 (UTC)
I agree that removing "compare" along with all the terms is probably bad. But at the same time, it's often not possible for a random person to improve things, because this means knowing just what the connection being alluded to is - exactly the vagueness that is making me complain about "compare" in the first place. If I see "compare", I have no way of knowing whether the words are cognates, unless I have knowledge of those particular languages (e.g. Germanic). For Chinese, I would have no clue what to replace "compare" with; only the person who added it there presumably knows. Preventing that situation is better than having to tediously fix it afterwards, hence my revert to Pfftallofthemaretaken's edits. —CodeCat 23:10, 26 September 2015 (UTC)
That seems fair, especially considering the illegal headers. Also, sorry for calling you brusque. —JohnC5 23:53, 26 September 2015 (UTC)
@JohnC5 After reading your message I realized I was wrong and amended mine accordingly.
"I do not mean to lecture you, but"—"but that's exactly what I'll proceed to do."
"I normally understand compare to act as a stub for bigger..."—I sincerely hope that this isn't what this project is all about.
"...but the reason we are having this discussion is to decide how we, not she and not you, will handle this issue in the future."—really? I thought we were having this discussion because one particular editor wished to communicate to the rest of us what she's "always been annoyed" by.
But anyways, I agree that the situation could use some defusing, so let's all play a game. The rules are as follows. I will now revert CodeCat's edit of the entry on bảo đảm and put back "compare with Chinese 擔保" that I originally put there. These are the same two characters that are used in the hán tự spelling of the word, but in reverse order. I'm sure there's a name for it in linguistics, but haven't the faintest idea what that term might be. So I'll use "compare with". Now, whoever reverts that edit will have to explain why they didn't also remove all the instances of compare in the Etymology section of the entry on father—as the section invites us to compare that word with 15 other words, in 15 different languages! I don't know about everyone else, but I'm curious what's gonna happen. Because, after all, maybe those reverts that I mentioned in my original reply here were just personal... Pfftallofthemaretaken (talk) 08:21, 27 September 2015 (UTC)
This game seems more antagonistic than conciliatory…. —JohnC5 15:35, 27 September 2015 (UTC)
The way I see it, "compare" is not the ideal, but sometimes details are simply not available or the editor writing the etymology is not aware of the details and does not want to spend too much time researching them. You can look at it as an invitation for anyone who cares enough to do some research and replace "compare" with something more informative. --WikiTiki89 16:29, 28 September 2015 (UTC)
I agree. And I didn't see the abovementioned entry edits (so don't know whether they were removal of "compare" lists or what), but don't think that "compare" lists should be removed. Clarified, certainly. Removed, no.​—msh210 (talk) 18:03, 1 October 2015 (UTC)
One point, tho not directly related to the kerfuffle at hand, is that a lot of our "compare" stretches seem to date from before we had much in the way of protolang appendices, and seem to fulfill the role of listing cognates. This at least I think can be done much more efficiently with the appendices (which would also help with the occasionally seen complaint that our etymology sections take too much space).
I do not oppose "compare" if given context, e.g. if we say that "Barese X is borrowed from Proto-Fooian Y", then it seems pretty clear what adding a "(compare Classical Foo Z)" is doing. But yeah, a completely hanging "compare Zoinks Ø" seems unhelpful. --Tropylium (talk) 22:38, 28 September 2015 (UTC)

Automation of French conjugation[edit]

This user is requesting that all French conjugations use the Template {{fr-conj-auto}}. If the consensus agrees, this user will start converting all French verbs to the new usage. --kc_kennylau (talk) 08:32, 27 September 2015 (UTC)

  • Any chance of some documentation? SemperBlotto (talk) 08:37, 27 September 2015 (UTC)
    Obviously it's a good idea, but does it ever need parameters, or is all irregular verb information stored in the module itself? Renard Migrant (talk) 15:18, 27 September 2015 (UTC)
    @SemperBlotto: Okay, documentation done. --kc_kennylau (talk) 01:40, 28 September 2015 (UTC)
    @Renard Migrant: No parameter would be needed (usually). --kc_kennylau (talk) 01:40, 28 September 2015 (UTC)

Template inh in etymologies[edit]

I see a bot replacing {{etyl}} with {{inh}}, e.g. in diff. The resulting markup combines what was etyl and term into a single template, like {{inh|cs|sla-pro|*čьlověkъ}}. I don't really like this. Everyone happy? --Dan Polansky (talk) 11:24, 27 September 2015 (UTC)

Yes. I've been waiting a long time for someone to finally do this. I find it really annoying to have to separate the {{etyl}} and {{m}} templates, and half the time I find myself putting the etymon directly in {{etyl}}, and then I have to go back and fix it. Using {{inh}} makes life much easier. —Aɴɢʀ (talk) 12:12, 27 September 2015 (UTC)
Well, {{inh}} isn't meant as a drop-in replacement for {{etyl}} and {{m}}. It's meant specifically for inherited terms, as its documentation notes. For borrowed terms, you'd use {{bor}}. There are also cases where terms are neither inherited nor borrowed. For example, foxhound does not inherit from Proto-Germanic *fuhsaz or *hundaz but it can be said to be derived from Proto-Germanic nonetheless. I intended "inheritance" to mean specifically cases where the actual morphological formation was inherited. In other words, inheritance can be traced until the point that the actual word "foxhound" came into existence. —CodeCat 13:16, 27 September 2015 (UTC)
Good idea. Funnily enough the French template étyl has done this almost since its inception many years ago. The only issue with bot replacements is that sometimes you get "Borrowed from {{etyl|la|fr}} {{m|la|<word here>}}" which, for obvious reasons, shouldn't use {{inh}}. Renard Migrant (talk) 14:52, 27 September 2015 (UTC)
I haven't done wholesale bot replacements. The ones I did were always pairwise: for only one given source language and current language. That way I could make sure that the current language was in fact a descendant of the source. I've also only replaced instances where the source is the first to appear in the etymology, and only when the preceding word is "From" or empty, never "Borrowed" or anything else. On top of that, I've so far avoided language pairs where the current language might have reasonably borrowed from the source as well as inherited from it, i.e. Romance borrowings from Latin. The Germanic languages have generally not borrowed words from older stages, so they are safe. I've also skipped cases where reconstructed terms end in -, because those are likely to be stems or roots rather than fully-formed words. Roots have no descendants, only derived terms which have descendants. —CodeCat 15:13, 27 September 2015 (UTC)
Side note: it's often hard to tell whether a word's history was "derivative formed in proto-X → develops into word in X" or "root inherited in early X → derivative formed within the history of X". I would suggest that, at least in cases where we can only clearly establish a proto-root and a language-specific derivative, it's safest to analyze them as a derivative within the language, and to link only the root of the derivative to the proto-root. --Tropylium (talk) 22:28, 28 September 2015 (UTC)
I seem to remember cleaning up a couple of cases (maybe not with this specific template) where the language code was an etymology-only subset along the lines of NL. and the module tried to use it for the mention. You would need to allow for this before converting such cases. Chuck Entz (talk) 15:54, 27 September 2015 (UTC)
Are you sure? It explicitly handles etymology-only languages already. —CodeCat 16:12, 27 September 2015 (UTC)
Is the bot checking to make sure there's no borrowing along the way (e.g., if an English word from Latin via a French/Norman borrowing says only "from Latin foo", or if an English word from Hebrew via Yiddish says only "from Hebrew foo"), CodeCat?​—msh210 (talk) 17:57, 1 October 2015 (UTC)
Well, so far I've only done Dutch and the Finnic and Samic languages. I don't have any immediate plans to do more. That said, this script and template are only for inherited terms, so I wouldn't use it for English from Latin because English didn't descend from Latin. —CodeCat 18:03, 1 October 2015 (UTC)
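As a rough illustration of the conservative rules CodeCat lists above (one language pair at a time, only at the very start of the etymology, only after "From" or nothing, and never for terms ending in a hyphen), here is a minimal Python sketch. It is not the actual bot script: the function name `convert_to_inh`, the regular expression and the choice of Python are assumptions made purely for illustration, and it deliberately leaves alone any {{m}} call that has extra parameters.

```python
import re

# Hypothetical pattern: an etymology that opens with an optional "From ",
# then {{etyl|SOURCE|CURRENT}} followed by {{m|SOURCE|TERM}}.
PATTERN = re.compile(
    r"^(?P<lead>From )?"
    r"\{\{etyl\|(?P<src>[^|{}]+)\|(?P<cur>[^|{}]+)\}\} "
    r"\{\{m\|(?P<msrc>[^|{}]+)\|(?P<term>[^|{}]+)\}\}"
)


def convert_to_inh(etymology, source, current):
    """Return the etymology with the leading etyl+m pair merged into {{inh}}.

    The text is returned unchanged whenever any of the safety checks fails.
    """
    m = PATTERN.match(etymology)
    if m is None:
        # Not at the start, or preceded by something other than "From"
        # (for example "Borrowed from"), so leave it for a human.
        return etymology
    if m["src"] != source or m["cur"] != current or m["msrc"] != source:
        # Only the single language pair being processed is touched.
        return etymology
    if m["term"].endswith("-"):
        # Likely a stem or root rather than a fully formed word.
        return etymology
    replacement = "{{inh|%s|%s|%s}}" % (current, source, m["term"])
    return (m["lead"] or "") + replacement + etymology[m.end():]


print(convert_to_inh(
    "From {{etyl|sla-pro|cs}} {{m|sla-pro|*čьlověkъ}}.", "sla-pro", "cs"
))
# prints: From {{inh|cs|sla-pro|*čьlověkъ}}.
```

Note that a check like this cannot detect the case msh210 asks about, where a borrowing happened along the way but the etymology only says "from"; that still depends on choosing safe language pairs by hand.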

Proposal: Drop "explained" and make it "Wiktionary:Entry layout"[edit]

I've been thinking about this for a while. I propose renaming Wiktionary:Entry layout explained (WT:ELE) to just Wiktionary:Entry layout (WT:EL), while retaining the old name and shortcut as usable redirects.

1. "explained" does not add anything new or make the title any better. We could just as well have Wiktionary:Criteria for inclusion explained, Wiktionary:Blocking policy explained, Wiktionary:Page deletion guidelines explained and maybe Wiktionary:Bots explained without the new name making the policies any more accurate. Maybe one reason "explained" is in the title is because a three-letter shortcut ("ELE") is catchy?

Bonus, less important, reason:
2. Speaking for myself, if the title is shorter, it would invite me to type the full name of the policy in discussions or in my personal browser more often if I want to ("WT:Entry layout" or "Wiktionary:Entry layout"). Conversely, if the title is longer, it makes me more likely to use the shortcut only ("WT:ELE"). I am willing to bet this would be true for other people, too. Not that big of a reason (it boils down to "'explained' makes the title longer!"), but it's good to mention it, as secondary to the question above.

Previous discussion:

I did not use RFM this time because that's a major policy. In any event, if other people agree with the name change in this discussion, I plan to follow up with a vote to close the deal.


  • WT:EL was the shortcut to Wiktionary:External links and was used in three places: a 2006 discussion, a 2014 discussion and as a proposed shortcut to ELE itself. IMO that's unused enough that the shortcut could be changed; I renamed it to WT:EXT to free it for this proposal

--Daniel Carrero (talk) 14:50, 27 September 2015 (UTC)

Good idea, obviously. Renard Migrant (talk) 14:53, 27 September 2015 (UTC)
Support. —CodeCat 15:10, 27 September 2015 (UTC)
Seems reasonable enough - support SemperBlotto (talk) 15:13, 27 September 2015 (UTC)
Support. - -sche (discuss) 06:36, 28 September 2015 (UTC)
Support. Pengo (talk) 03:19, 29 September 2015 (UTC)
Why not!   — Saltmarshσυζήτηση-talk 04:40, 29 September 2015 (UTC)
Support - --Zo3rWer (talk) 15:38, 29 September 2015 (UTC)
Support. --WikiTiki89 15:43, 29 September 2015 (UTC)
Support.​—msh210 (talk) 17:49, 1 October 2015 (UTC)
The downside of it is that it will no longer be a three-letter acronym (TLA). --Dan Polansky (talk) 15:55, 3 October 2015 (UTC)
We can still continue to call it ELE. --WikiTiki89 15:21, 5 October 2015 (UTC)
WT:ELE is definitely going to be kept, since it has been a widely used shortcut. We can use WT:ELA (Entry LAyout) as an alternate three-letter acronym, though personally I prefer just WT:EL. (Also, in Portuguese, ele = he, ela = she.) --Daniel Carrero (talk) 16:37, 5 October 2015 (UTC)

I count 10 supports (including myself, not counting @Dan Polansky). I take it we can move WT:Entry layout explained to WT:Entry layout now without the need for a vote? --Daniel Carrero (talk) 16:37, 5 October 2015 (UTC)

Yeah; I've moved the page and its talk page and subpages, leaving redirects in all cases. Any double redirects will soon show up at Special:DoubleRedirects and I'll fix them. - -sche (discuss) 21:43, 6 October 2015 (UTC)

Reimagining WMF grants report[edit]

Last month, we asked for community feedback on a proposal to change the structure of WMF grant programs. Thanks to the 200+ people who participated! A report on what we learned and changed based on this consultation is now available.

Come read about the findings and next steps as WMF’s Community Resources team begins to implement changes based on your feedback. Your questions and comments are welcome on the outcomes discussion page.

Take care, I JethroBT (WMF) 17:02, 28 September 2015 (UTC)

On three-part compounds[edit]

Ancient Greek words have a tendency to form compounds that are not directly derivable from their component words—not in the same way English compounds are.

There is also an analog in English: words of the form X-Yed, like quick-witted. However, the treatment of these words with respect to etymology is not entirely consistent, and the most common manner is apparently to create a separate lemma, e.g. witted, called an adjective but described as a suffix. That may be the most accurate way to describe such a morpheme, I'm not sure, although I do think it should at least include a hyphen. With respect to Ancient Greek—after making this list and pondering my findings, I have come to the conclusion that the best practice is to mark the etymology with {{compound|stem|stem|suffix}}, or {{compound|stem|stem}} when the "suffix" is really just the thematic ending (i.e. type 1 above) or no ending (type 1b). Verbs that change grade should probably get separate pages, e.g. -λογος (although the POS of this is uncertain—perhaps "adjective" is best?). Harder is the zero-grade—perhaps entries such as -τραφής?

Comments? —ObsequiousNewt (εἴρηκα|πεποίηκα) 19:30, 28 September 2015 (UTC)

Why not use {{affix|en|quick|wit|-ed}}? —CodeCat 20:27, 28 September 2015 (UTC)
Oh, hey, that is better than {{compound}}, isn't it. Great. Do you have any other thoughts regarding this? —ObsequiousNewt (εἴρηκα|πεποίηκα) 02:56, 29 September 2015 (UTC)
@User:ObsequiousNewt: It depends on whether you want to emphasize the compounding process or the suffixing process; "quick-witted" contains both processes, and compounding seems more marked to me in this term. And then, you need to know whether you want to have the terms categorized as compounds. I don't see anything wrong about "{{compound|stem|stem|suffix}}", really. --Dan Polansky (talk) 16:07, 3 October 2015 (UTC)
{{compound}} won't categorise in Category:English words suffixed with -ed, while the entry should be located there. —CodeCat 17:34, 3 October 2015 (UTC)
{{compound|lang=en|quick|wit}}{{suffix||-ed|lang=en}} will, though. —Aɴɢʀ (talk) 17:56, 3 October 2015 (UTC)
Which does the same as {{affix|en|quick|wit|-ed}}, but is less logical. —CodeCat 18:00, 3 October 2015 (UTC)
It might make sense to modify {{compound}} to categorize to Category:English words suffixed with -ed when it sees {{compound|quick|wit|-ed|lang=en}}. Then the categorization effect would be the same, and it would only be the matter of deciding whether the markup should emphasize compounding or suffixing. --Dan Polansky (talk) 19:22, 3 October 2015 (UTC)
I'm not sure what you mean by emphasizing. The end result is probably indistinguishable on the page. What you seem to be suggesting is to make {{compound}} work like {{affix}}, minus differences in the parameters. But I already checked in the past whether this would work, and it won't. Not everything beginning with - is a suffix, and not everything ending with - is a prefix. Those cases don't work with {{affix}} either, no, but that template was created anew so there were no issues with backwards compatibility. With {{compound}} on the other hand... —CodeCat 22:46, 3 October 2015 (UTC)
Will {{affix|en|quick|wit|-ed}} put it in Category:English compound words? —Aɴɢʀ (talk) 19:32, 3 October 2015 (UTC)
It will; try it in a dummy entry in preview, without saving. --Dan Polansky (talk) 19:41, 3 October 2015 (UTC)
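To make the categorisation behaviour discussed in this thread concrete, here is a small Python sketch of a hyphen-based classifier. It is only an assumption about how an affix-style template could decide its categories, not the real template or module code; the name `classify_parts` and the category strings are hypothetical. It also shows the limitation CodeCat points out: the guess rests purely on where the hyphen sits, so it will be wrong for anything beginning with "-" that is not a suffix.

```python
def classify_parts(parts):
    """Guess categories for a term from the hyphen placement of its parts.

    Assumed heuristic: "x-" is a prefix, "-x" is a suffix, anything else is
    a base word; two or more base words make the term a compound.
    """
    prefixes = [p for p in parts if p.endswith("-") and not p.startswith("-")]
    suffixes = [p for p in parts if p.startswith("-") and not p.endswith("-")]
    bases = [p for p in parts if not p.startswith("-") and not p.endswith("-")]

    categories = []
    if len(bases) >= 2:
        categories.append("compound words")
    categories += ["words prefixed with " + p for p in prefixes]
    categories += ["words suffixed with " + s for s in suffixes]
    return categories


print(classify_parts(["quick", "wit", "-ed"]))
# prints: ['compound words', 'words suffixed with -ed']
```

Under this heuristic the parts of quick-witted yield both a compound category and a "suffixed with -ed" category, which matches what the thread reports for {{affix|en|quick|wit|-ed}}.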

Proper nouns and capitalization across all languages[edit]

We have established through many discussions here in the BP that proper nouns are defined by their usage and semantics and not by their capitalization. Most of these discussions, however, focused mainly on English. I am wondering whether we have a consensus that this also applies to all other languages that distinguish capitalized and uncapitalized nouns. I know that many of us believe that we should not distinguish between proper and common nouns at all, but as long as we do, we should do it consistently. Let's do a poll on whether we agree with the following principle: For all languages, usage patterns and semantics should take priority over capitalization and punctuation in determining whether a noun is common or proper. --WikiTiki89 15:32, 29 September 2015 (UTC)


“For all languages, usage patterns and semantics should take priority over capitalization and punctuation in determining whether a noun is common or proper.”

Please state whether you agree or disagree. If you disagree, please explain why and, if possible, include an example of when you believe this should not apply.


  1. Agree. --WikiTiki89 15:32, 29 September 2015 (UTC)
  2. Agree, but my preference is to treat them all as nouns and use categorisation alone to distinguish them. —CodeCat 15:33, 29 September 2015 (UTC)
  3. Agree. The usefulness of distinguishing is lost if capitalization is used as the distinguisher, as there would be no need to look up the part of speech. --Andrew Sheedy (talk) 23:22, 30 September 2015 (UTC)
  4. Agree. Capitalization does not necessarily tell whether something is a proper noun; rather, capitalization habits are to an extent adjusted based on the perception of grammarians of whether something is a proper noun or not. In particular, "Frenchman" is not a proper noun. However, I am not clear what role punctuation might have in something being a proper noun; what language would that pertain to? On another note, I don't think we should necessarily be consistent across languages: If English grammarians deem names of languages to be proper nouns, I am okay with marking them so in English, while marking them as common nouns in Czech as per Czech grammatical tradition. --Dan Polansky (talk) 10:01, 3 October 2015 (UTC)
    Punctuation was just hypothetical; what I had in mind was something like Ancient Egyptian cartouches. --WikiTiki89 13:14, 3 October 2015 (UTC)


This becomes problematic when you get into dead languages. Academic consensus for Latin, Ancient Greek, etc. is to capitalize proper nouns, even though actual writing only had one (upper) case. —ObsequiousNewt (εἴρηκα|πεποίηκα) 16:12, 29 September 2015 (UTC)

I think you misunderstood. What I am saying is not about how we should capitalize nouns, but about how we should classify them. In other words, that we should ignore the capitalization when deciding whether to put ===Noun=== or ===Proper noun=== as the POS, but still follow the established capitalization practices for deciding where to put the entry. --WikiTiki89 17:17, 29 September 2015 (UTC)
Yeah, I see what you mean now. My bad. —ObsequiousNewt (εἴρηκα|πεποίηκα) 17:10, 30 September 2015 (UTC)
  1. Disagree. Semantics are not always easy to determine. Wikitiki89 recently turned the famous sense of перестро́йка ‎(perestrójka, perestroyka) into a proper noun, so he is making a case now.

    There is no convention or precedent for treating lower case words as proper nouns in Russian. Lower case words can never be proper nouns in Russian, regardless of semantics, and I see no need to introduce this. Specific rules and conventions with different languages should not be ignored. I oppose treating differentiation of proper/common nouns the same way we do for English. E.g. language names, ethnicities, month and weekday names are common nouns and are spelled in lower case in Russian but some political terms are proper nouns, made so by the Soviet Communist government.

    I am not too interested in making points on the topic. Suffice that I expressed my opinion on the matter but I may join the discussion later on. --Anatoli T. (обсудить/вклад) 01:28, 1 October 2015 (UTC) (I've reformatted your post slightly, Anatoli.​—msh210 (talk) 17:46, 1 October 2015 (UTC))

  2. Disagree. I have no reason to think that semantics and/or usage is the usual standard by which proper vs. common noun is determined in every language (that has proper and common nouns). Maybe for some languages it is indeed orthography (e.g. capitalization) or something else. I think this should be decided by each language's editors.​—msh210 (talk) 17:44, 1 October 2015 (UTC)


And what are the cross-linguistic usage patterns and semantics that determine whether a noun is proper or common? —Aɴɢʀ (talk) 17:20, 29 September 2015 (UTC)

These might vary by language (I didn't mean to imply that they are universal). The general idea is that proper nouns usually cannot take determiners or adjectives, cannot change their number, and cannot be possessed (unless these features are already part of the lemma, or unless the proper noun is being commonized). As far as semantics, they generally refer to one specific thing, rather than a class of things. --WikiTiki89 17:35, 29 September 2015 (UTC)
We should be consistent across languages, though. If day or month names are common nouns in some languages, we should treat them that way in all languages. This, by the way, is one huge reason for treating proper nouns and common nouns the same on Wiktionary: the distinction is semantic, and can be determined by analysing the referent, so the result is the same for all words with the same referent in all languages. They are a subset of nouns, not separate from nouns. —CodeCat 23:39, 30 September 2015 (UTC)
That's not necessarily true though. Some languages might handle certain concepts with common nouns that other languages handle with proper nouns. In most cases, you would be right, but there will be exceptions. --WikiTiki89 01:11, 1 October 2015 (UTC)
That seems kind of unlikely to me. Can you give an example? —CodeCat 17:25, 1 October 2015 (UTC)
For example, Navajo bilagáana tʼáá biʼałkʼiijééʼ (American Civil War). The Navajo term is not a proper noun, as it translates literally to "when the Americans fought each other". —Stephen (Talk) 01:55, 2 October 2015 (UTC)
  • Also, what English calls "Boxing Day" (proper noun) is in many languages just "second day of Christmas" (a specific common noun - one of the days of Christmas). Smurrayinchester (talk) 09:47, 2 October 2015 (UTC)
You aren't seriously suggesting applying the same POS rules across languages? In some languages, adjectives are verbs. In others, they're nouns. In still others, they're neither. I'm sure there are similar discontinuities in assignment of terms to the proper-noun POS. Chuck Entz (talk) 02:31, 1 October 2015 (UTC)
  • Many Australian languages differentiate proper nouns from common nouns by suffixes, which gives us a linguistic way of telling common and proper nouns apart. The Western Desert languages treat ŋana (determiner "who") as a proper noun (ref), so do we make it a proper noun across all languages? Smurrayinchester (talk) 14:58, 1 October 2015 (UTC)
    • "Who" is proper in English too. It refers to the specific person whose identity you are asking about. It also fits all of the criteria WikiTiki noted above: it can't take determiners or adjectives, can't change number, and can't be possessed. —CodeCat 17:27, 1 October 2015 (UTC)
      • But it's a pronoun, not a noun. Are there "proper pronouns" to be distinguished by a new header? Equinox 01:32, 2 October 2015 (UTC)
        • All pronouns are proper; I don't think there are common-noun-like pronouns. --WikiTiki89 15:32, 6 October 2015 (UTC)
          • What about indefinite pronouns like someone, anyone, whoever, etc.? They don't refer to a specific person. In many languages (e.g. Italian and Portuguese) possessive pronouns can take determiners (il mio padre / o meu pai), and as for pronouns taking adjectives, what about "poor me" and "lucky you"? —Aɴɢʀ (talk) 19:54, 6 October 2015 (UTC)
            • Good point. I realized that right after I posted that. I think indefinite pronouns would be a separate category altogether, still unlike common nouns (and even they can sometimes be "commonized" into a common noun: e.g. "this particular someone"). "Possessive pronouns" are determiners and are not really pronouns, much like English possessives such as "John's" are determiners and are really no longer proper nouns. And in case anyone was going to mention "o João", this is one of the reasons I said "usually" and not "always". --WikiTiki89 20:43, 6 October 2015 (UTC)
  • Can you give an example of a proper noun that meets all these criteria? I'm struggling to think of one. Even terms that are universally agreed upon as proper nouns violate these rules in at least some edge cases. Germany: "the Germany of my youth", "beautiful Germany", "the two Germanies", "Merkel's Germany". John: "the John I knew and loved", "good old John", "Which one of the Johns do you mean?", "my darling John". White House: "the White House", "the rebuilt White House", "they wanted their own White Houses", "Hoban's White House". Allah: "a vengeful Allah", "the groups worship totally different Allahs", "thy Allah and the Allah of thy fathers". Smurrayinchester (talk) 13:30, 6 October 2015 (UTC)
    • You seem to have missed my parenthetical remark "unless the proper noun is being commonized". Practically any proper noun (in English at least) can be turned into a common noun with a slightly different meaning. --WikiTiki89 15:32, 6 October 2015 (UTC)
I'm often not sure whether to make something a common or proper noun, e.g. medical syndromes. Can't think of examples right now, but suppose there's a "small face syndrome": it refers to one specific thing and has no plural; it also isn't uncountable (because it's "the syndrome", not "some syndrome"); I would probably make it a proper noun but it feels odd since it's somehow nothing like Paris. Equinox 14:21, 1 October 2015 (UTC)
I feel that this is better decided on a language basis, by each of their contributor communities individually. That said, I support this as a default guideline. — Ungoliant (falai) 14:36, 1 October 2015 (UTC)
If this is such a hazy question, then why do we need to determine the proper/common status of all nouns anyway? For a user interested in onomastics specifically, the unambiguous cases like placenames or personal names will be clearly identifiable in any case. --Tropylium (talk) 20:10, 1 October 2015 (UTC)

Should we display the active votes in the watchlist?

I moved the vote list from Wiktionary:Votes to a separate template that can be used anywhere as a reminder of the votes to participate.

Do you think it would be a good idea to display this box in the watchlist of all users to increase awareness of votes?

I believe this could be accomplished by editing MediaWiki:Watchlist-details. Maybe we should have some way for each user to opt out of displaying the vote box in the watchlist, but I'm not sure how to do that.

Previous discussion:

--Daniel Carrero (talk) 17:32, 29 September 2015 (UTC)

Support. I always miss votes that are not explicitly advertised in the BP. --WikiTiki89 17:38, 29 September 2015 (UTC)
Before, the way not to miss votes was to watchlist Wiktionary:Votes. Now, you have to watchlist Template:votes. —Aɴɢʀ (talk) 18:24, 29 September 2015 (UTC)
I do watch WT:Votes and I still manage to miss them all. --WikiTiki89 18:28, 29 September 2015 (UTC)
@Angr @Wikitiki89 My idea was placing that box in the watchlist of all users, so the list of votes itself should appear whether you watchlist Template:votes or not. But, anyway, if that idea proves unpopular, I'll delete the template and restore WT:Votes to the previous version. --Daniel Carrero (talk) 00:05, 1 October 2015 (UTC)
Also, I think the actual list of votes should not be located in the template namespace. We should move it to something like Wiktionary:Votes/Active and have {{votes}} transclude it. --WikiTiki89 19:57, 29 September 2015 (UTC)
Since no one responded to this suggestion, I just went ahead with it. --WikiTiki89 19:04, 1 October 2015 (UTC)
I don't mind making the votes more visible, but I don't like the idea of pushing the actual watchlist even further down the screen; already the "preamble" takes up so much space that only a few lines of the actual watchlist make it onto the first screen on my computer. Is there a way of making it the version that gets displayed in the watchlist smaller, maybe a horizontal list separated by mid dots? Alternatively, what if we adopted the same idea as the Beer Parlor month-subpages, and had a single page that new votes were moved through (I mean using the "move" function), so that each vote page itself was added to the watchlist of everyone who watched that central page, and each vote would thus show up in everyone's watchlist every time it was edited, the same way that new BP month subpages show up in your watchlist without you ever doing anything to watchlist them? (Contrast how, currently, if you watchlist WT:VOTE, you only see the one edit when a new vote is listed on that page; you aren't updated when the vote starts, and when people vote, unless you watch each individual vote page.) - -sche (discuss) 21:01, 29 September 2015 (UTC)
  1. "I don't mind making the votes more visible, but I don't like the idea of pushing the actual watchlist even further down the screen"
    • Since that box floats to the right, maybe the box could stay side-by-side with the watchlist instead of pushing it down further?
  2. "Is there a way of making it the version that gets displayed in the watchlist smaller, maybe a horizontal list separated by mid dots?"
    • Yes, that could be done and it's not very difficult to do.
  3. "Alternatively, what if we adopted the same idea as the Beer Parlor month-subpages, and had a single page that new votes were moved through (I mean using the "move" function), so that each vote page itself was added to the watchlist of everyone who watched that central page, and each vote would thus show up in everyone's watchlist every time it was edited, the same way that new BP month subpages show up in your watchlist without you ever doing anything to watchlist them? (Contrast how, currently, if you watchlist WT:VOTE, you only see the one edit when a new vote is listed on that page; you aren't updated when the vote starts, and when people vote, unless you watch each individual vote page.)"
    • That sounds great. But I'm not sure how I would do that; can it be done, in the first place?
--Daniel Carrero (talk) 00:05, 1 October 2015 (UTC)

Update: I edited MediaWiki:Watchlist-details to make the vote box currently appear in the watchlist of all users, as proposed by me above. I request feedback on this. Does it look good? Edit Template:votes/layout to change the appearance if needed. --Daniel Carrero (talk) 00:05, 1 October 2015 (UTC)

I think the list should be sorted by end date, and the end date should be shown as well. —CodeCat 00:20, 1 October 2015 (UTC)
Support. --Daniel Carrero (talk) 04:38, 1 October 2015 (UTC)
Good idea! I recently proposed instituting a system for notifying users of votes, and I think this would do the job well. -Cloudcuckoolander (talk) 00:37, 1 October 2015 (UTC)
  • The bad consequence of this is that, where before I only had to edit one page to move votes to the bottom, now I need to edit two. I don't really like the change. OTOH, if the change makes it easier for people to watch active votes, that's fine. I still think that regularly glancing at WT:VOTE with its less than 10 items at each point of time is an easy way to not miss any votes. --Dan Polansky (talk) 10:07, 3 October 2015 (UTC)

I initially disliked the box, considering it clutter. Knowing the human tendency for habituation, I left it a couple of days before commenting. I'm fine with it now, and I think it's a good idea. I support this addition. — I.S.M.E.T.A. 13:30, 3 October 2015 (UTC)

I don't mind it being there, but conceptually the watchlist seems a slightly irrelevant place for it. Equinox 14:41, 3 October 2015 (UTC)
I agree with Equinox. There is no good conceptual logic to placing it on the watchlist page, but it does place one of our wiki-citizenship duties squarely in front of us better than any page I can imagine. DCDuring TALK 18:37, 3 October 2015 (UTC)
It makes about as much sense as Wanted entries being on the Watchlist. --WikiTiki89 15:23, 5 October 2015 (UTC)

Update: Per CodeCat's suggestion, I edited the list to sort votes by end date and show the end date as well. --Daniel Carrero (talk) 00:07, 7 October 2015 (UTC)
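
For illustration only, the ordering described in this update (votes sorted by end date, with the end date shown) could be sketched in Python as below. This is not the code behind {{votes}}; the function name and the display format are assumptions made for the example.

```python
from datetime import date
from typing import List, Tuple


def format_vote_list(votes: List[Tuple[str, date]]) -> List[str]:
    """Return display lines for votes, earliest end date first.

    Each vote is a (title, end_date) pair. The "(ends YYYY-MM-DD)"
    suffix is a hypothetical display format, not the real layout.
    """
    ordered = sorted(votes, key=lambda vote: vote[1])
    return [f"{title} (ends {end.isoformat()})" for title, end in ordered]
```

Sorting on the end date rather than the start date puts the votes that are about to close at the top of the box, where they are hardest to miss.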

FYI: Vote on namespace for reconstructed terms

Wiktionary:Votes/2015-09/Creating a namespace for reconstructed terms --WikiTiki89 20:46, 29 September 2015 (UTC)

Vote: Installing DynamicPageListEngine

See: Wiktionary:Votes/2015-09/Installing DynamicPageListEngine. --Daniel Carrero (talk) 21:48, 29 September 2015 (UTC)

October 2015

Templates for place names

I created a few templates for place names; they should be able to generate standardized definitions for them in all languages. I've been using these templates for entries of places in Brazil only, but this system should be usable for other countries by copying and adapting the existing templates.

I chose the format of "municipalities of São Paulo, Brazil" to copy Category:en:Municipalities of China and others. (see Place names and Earth modules for a complete list of place name categories; I should mention I've found some different, inconsistent naming formats that could hopefully be fixed eventually)

The templates I created:

Main template:

Known issue:

  • These templates generate simple standardized definitions like "A municipality in São Paulo, Brazil." They still lack the functionality of linking from states to state capitals, and vice-versa. I plan on implementing that feature soon.

Thoughts? --Daniel Carrero (talk) 13:14, 1 October 2015 (UTC)

I was expecting something more like {{surname}} and {{given name}}. —CodeCat 13:22, 1 October 2015 (UTC)
How so? --Daniel Carrero (talk) 13:26, 1 October 2015 (UTC)
Like {{municipality|São Paulo|Brazil|lang=pt}}. —CodeCat 20:05, 1 October 2015 (UTC)
I did start to develop something like this at User:Daniel Carrero/place (while it is just a stub, it did work perfectly in this revision, see the code). But, in my opinion, I'd rather use hard-wired full definitions (no matter if they are stored in MW code or Lua) like "municipality of São Paulo, Brazil" for this reason: if we use parts of definitions and allow people to use these parts in any way they see fit, then it would be potentially impossible to make the whole system consistent (judging by all the current un-templatized entries for place names, which are inconsistent in various levels, from the act of categorizing or not some entries, to the internal logic of the category naming system itself!):
  1. There would be definitions like {{city|Florianópolis|Brazil}} (basically all second-level subdivisions in Brazil are "municipalities", presumably that's why multiple Wikipedias and Wiktionaries use "municipality" categories; yet I found many of those to be randomly defined as "cities" or "towns", which could mess up the categorization and definitions).
  2. There would be too much freedom to change levels, like {{municipality|São Paulo|Southeastern Region|Brazil}} with a "Southeastern Region" in the middle.
  3. And also just {{municipality|Florianópolis|Brazil}} does not account for the fact that Florianópolis is a state capital unless you add another parameter.
I don't mind changing the system, but since the current system restricts each of the full definitions and associates them with categories individually, it does not have any of the aforementioned limitations, so I would like any other proposed system to be safe from all these problems as well. --Daniel Carrero (talk) 20:32, 1 October 2015 (UTC)
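
As a rough illustration of the "hard-wired full definitions" idea argued for above, here is a minimal Python sketch. It is not the actual template or Lua code: the registry keys, the function name define_place and the exact category strings are hypothetical. The point it shows is that every allowed definition is listed once together with its categories, so an editor cannot invent a {{city|...}} variant or slip an extra level into the middle.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: one key per complete, fixed definition.
# Anything not listed here simply cannot be produced.
PLACE_DEFINITIONS: Dict[str, dict] = {
    "Brazil/municipality/São Paulo": {
        "definition": "municipality of São Paulo, Brazil",
        "categories": ["Municipalities of São Paulo, Brazil"],
    },
    "Brazil/state capital/Santa Catarina": {
        "definition": "capital of Santa Catarina, Brazil",
        "categories": [
            "State capitals of Brazil",
            "Municipalities of Santa Catarina, Brazil",
        ],
    },
}


def define_place(key: str, lang: str) -> Tuple[str, List[str]]:
    """Look up a fixed definition and build its per-language categories.

    Raises KeyError for any combination that is not registered.
    """
    entry = PLACE_DEFINITIONS[key]
    categories = [f"Category:{lang}:{name}" for name in entry["categories"]]
    return entry["definition"], categories
```

Under these assumptions, define_place("Brazil/state capital/Santa Catarina", "pt") yields the definition plus both categories, while an unregistered key such as "Brazil/city/Florianópolis" fails outright instead of producing an inconsistent entry.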

Update: Rather than adding new parameters to the previous templates, I created different templates for capitals because I needed more specific definitions and categories (they use 2 categories each: state capitals of Brazil; and municipalities of each state). I've thought of something along the lines of {{place:capital of São Paulo, Brazil}} or {{place:São Paulo (capital)}}, but that could change. They work well. The only problem I fear is having too many different templates, though that seems manageable. Naturally, I am open to different suggestions, such as using Lua or fewer templates with more parameters, but first I have one thing to say: The current system is simple and intuitive enough (at least, that's my opinion) and very customizable in case other countries have different needs. (like, provinces instead of states; or more or less comma-separated levels) For this reason, I'd suggest waiting some time before attempting to merge the current templates into any more condensed model, because that might not work for all countries. In the meantime, I plan to continue using these templates for new entries. As usual, I also request feedback of other people, too. --Daniel Carrero (talk) 19:14, 1 October 2015 (UTC)

Update: Using fewer templates for Brazil: I'm deleting state-specific and (oh God.) city-specific templates in favor of {{place:Brazil/municipality}} and {{place:Brazil/state capital}} (the last one I'm going to create in a moment.) --Daniel Carrero (talk) 08:19, 2 October 2015 (UTC)

Should all foreign-language place names have counterparts in English?

I've created most of the 853 entries for municipalities (a.k.a., cities/towns) of Minas Gerais (a state of Brazil) in Portuguese. See Category:pt:Municipalities of Minas Gerais, Brazil. I am thinking of doing the same to fill the English category completely as well. See Category:en:Municipalities of Minas Gerais, Brazil.

Should all place names in foreign languages have counterparts in English? Surely many of those (or all of those?) are citable anyway. I've done some cursory search of small Brazilian towns on Google Books and as of yet found all of them to be citable in English. Random example: Comercinho is citable on this book. What do you think? --Daniel Carrero (talk) 08:33, 2 October 2015 (UTC)

Might as well. I doubt anyone would stop you. --Zo3rWer (talk) 12:58, 2 October 2015 (UTC)
How can you be so sure that every single placename in every language has an English translation? Making all FL placenames link to English by default was not a good idea. — Ungoliant (falai) 15:42, 3 October 2015 (UTC)
Of course they have an English name. How else would English speakers refer to it? —CodeCat 15:47, 3 October 2015 (UTC)
I doubt that some little village up in a Chinese mountain has an English name. It will certainly have an English transliteration - but we don't accept those. SemperBlotto (talk) 16:04, 3 October 2015 (UTC)
Attestation in running English text is a prerequisite for an English entry, as per CFI. --Dan Polansky (talk) 16:13, 3 October 2015 (UTC)
Place names are not words, they're designations. If a person comes along and introduces themselves as Katherine, then the other party must use that name to refer to her. And that's true cross-linguistically; speakers of all languages must use that name, because that's what she said her name is. The same would apply to a random village in China. It can be assumed that foreign speakers will adopt the name that locals use, because that is the name of the village. Of course, there's exonyms, but that's a different story: with exonyms, there is a name, but certain speakers decide to use another. When there is no known name, it must be assumed that there is still at least one name. —CodeCat 17:33, 3 October 2015 (UTC)
That sounds almost like an argument to make such entries Translingual rather than pin them down to a specific language. —Aɴɢʀ (talk) 17:54, 3 October 2015 (UTC)
Maybe that's a good idea. —CodeCat 17:59, 3 October 2015 (UTC)
Names as "translingual" would seem to make it pretty difficult to deal with even things like Москва ‎(Moskva)/Moscow, let alone more divergent cases like Köln/Cologne. And in case you're only suggesting this approach for placenames that only exist in one language: suppose that we discover that a tiny Chinese village does have a distinct name in a minority language spoken nearby; would this turn it from translingual back into (Mandarin) Chinese? I'd say treat placenames as particular to the local language, and attestations in other languages as citation loans, unless there's some kind of evidence to the contrary (e.g. for a pronunciation or spelling particular to English). --Tropylium (talk) 18:17, 3 October 2015 (UTC)
Single-word place names (London, not New York) either are words per me and John Stuart Mill or in any case they behave like words: they get written down using alphabet, get pronounced, inflected and have an etymology. Having them as translingual very often does not work since they get language-specific inflection. But even if place names somehow were not "words", they still get included and regulated by CFI, and the criterion of attestation applies to them. We even had this vote Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2, so I do not think I am in a minority to think they need to be attested. --Dan Polansky (talk) 19:08, 3 October 2015 (UTC)
Then does this mean that many of the articles for places that exist in English Wikipedia actually have titles and describe places in running text, when their names are not words usable in English according to our own standards? I find that a bit bizarre. Let's take w:Tegal Buleud as a random obscure place. The article uses the name of the place twice in running text. Is "Tegal Buleud" not English if it's used in this way? If not, then what would make it English? —CodeCat 22:55, 3 October 2015 (UTC)
It's not that "their names are not words usable in English", it's that their names are not attestable in English by our standards. --WikiTiki89 15:54, 5 October 2015 (UTC)
I understand that, but then this does mean that our mission statement is false. We don't include all words, just the attestable ones. —CodeCat 16:18, 5 October 2015 (UTC)
And that's always been the case. --WikiTiki89 16:23, 5 October 2015 (UTC)
What I do (with Italian placenames) is generate an Italian entry then, if I can find an English translation, I add an English entry for the translation. If I can't find a translation, I have been known to add an English section if it is a well-known place. SemperBlotto (talk) 18:23, 3 October 2015 (UTC)
For Spanish places, I usually add an English and a Spanish, but sometimes just a Spanish or just an English. This is pure laziness from my part, but I suppose in theory we could have the same entry in loads of languages. I know that the French Wiktionnaire often does this - see [23] as an example of (excessive?) repetition. --Zo3rWer (talk) 10:31, 4 October 2015 (UTC)
Place names are designations, but these designations are words. There should be a section for a language only if the word is used in the language: attestations are required. These sections may be very useful, especially for pronunciation (have you ever heard of another dictionary with a pronunciation given for place names from all over the world?), homophones, examples/citations, usage notes, derived words, gentilics, anagrams, etc. In the example given above, most sections are repetition, sure, but these sections will be completed with time. Lmaltier (talk) 20:20, 5 October 2015 (UTC)
  • I don't see any reason to forbid creating English entries for foreign-language place names. I also don't see a need to rush out and create them all immediately. People can create them from time to time, though. Purplebackpack89 22:12, 5 October 2015 (UTC)
    Yes, especially when they see them used in the language. It's the same for Italian place names used in Spanish, etc. Lmaltier (talk) 17:58, 6 October 2015 (UTC)

Place name format: English (non-gloss) vs. foreign language (translation + gloss)

As I said in the discussions above, I created a few placename templates. I'd like to discuss the results as they appear in the entries. See the entry Ouro Preto (a municipality in Brazil). It is currently defined in 3 languages using the same template ({{place:Brazil/municipality}}). I checked on Google Books, and it's attestable in all three.

If the language is English, the definition is formatted as a main (non-gloss) definition and if it's a foreign language, it's formatted as an English translation (linking back to the English entry or section) + {{gloss}}. It looks good IMO in the entry I linked, and it's a consistent system overall, but it also causes a problem: the translation still points back to the English section even if there's no English section to begin with, like in the entry Comercinho. (Which in my opinion, is a bad thing that should be fixed some way or other, but it's not extremely harmful. Ultimately, it's just a pointless link back to the same page.) I was just kind of expecting an English section to be present most of the time, like it happens randomly in entries like Laredo and Colorado (each of these entries have definitions for places in multiple countries, in the English section).

Note that the template uses the same syntax for all languages (only the language code changes), so it's supposed to make copypasting between languages easy. If you wanted to add a French section, you would just use the same code with "fr".

# {{place:Brazil/municipality|en|state=Minas Gerais}}
# {{place:Brazil/municipality|pt|state=Minas Gerais}}
# {{place:Brazil/municipality|es|state=Minas Gerais}}

This is one reason why I asked directly above "Should all foreign-language place names have counterparts in English?". If we could simply add English entries/sections for all placenames, the problem would be solved. But for cases where there's no language section in English, what should the template do? Should the template allow for non-gloss formatting (first letter capitalized and the period at the end) in foreign language entries? Can we use the Translingual section some way for place names? Most likely, any new functionality would be controlled by new parameters to the templates, so the system would become a bit more complex than it is now.

My favorite proposal is this:

  • For foreign language entries without an English translation, make the template keep the gloss format but without linking the main word [like this: Ouro Preto (municipality in the state of Minas Gerais, Brazil)]. Rationale: Consistent formatting in all entries, and when you translate Ouro Preto into English, you would use the original language name ("Ouro Preto"), even in cases where it's not attestable in permanently recorded media in English.

I plan to continue using the current system for the new entries I am creating. Edit {{meta-place}} if needed, to change how it works. --Daniel Carrero (talk) 08:21, 5 October 2015 (UTC)
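
To make the formatting rules in this post concrete, here is a small Python sketch of the behaviour described: a non-gloss sentence for English, a linked translation plus gloss for other languages, and (per the proposal above) an unlinked name when no English section exists. It is only an illustration and not the code of {{meta-place}}; the function name and the has_english_entry flag are assumptions.

```python
def format_place_definition(name: str, lang: str, description: str,
                            has_english_entry: bool) -> str:
    """Render a place-name definition line as wikitext.

    English gets a full sentence (leading article, final period).
    Other languages get the name followed by a gloss in parentheses;
    the name links to the English section only if that section exists.
    """
    if lang == "en":
        return f"A {description}."
    if has_english_entry:
        target = f"[[{name}#English|{name}]]"
    else:
        # Proposed behaviour: keep the gloss format, drop the pointless link.
        target = name
    return f"{target} ({description})"
```

With description set to "municipality in the state of Minas Gerais, Brazil", a Portuguese entry for Comercinho without an English section would render as plain "Comercinho (municipality in the state of Minas Gerais, Brazil)", whereas Ouro Preto would keep its link back to the English section.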

Proposal: Extinct unwritten languages should not qualify for inclusion

WT:CFI includes some interesting clauses for the inclusion of terms from languages that are not "well-documented on the internet". Perhaps the most interesting is that entries may be created even on the basis of a single mention, without any attestation or evidence of attestability required.

Let's think a little bit about what are we even doing here. I would not say this criterion is always unreasonable; but obviously it does not exist just as a backdoor to document words even when they are not attestable per the usual standards, since we do not allow this method for creating entries for rare words in well-documented languages.

The impression I get — though this does not appear to be written out anywhere! — is that there's an underlying idea that some languages mainly exist elsewhere than on the Internet: e.g. as old literary languages that are not spoken anymore, or mainly as spoken languages which so far do not have many written materials available. And so we assume that if a word has been documented in a scholarly source or the like, it could in principle be also easily attested once the speakers of Pohnpeian get around to hanging online in greater amounts, or once people have uploaded enough Karakhanid Turkic materials on Wikisource, or so forth. But as long as this is not the case, asking editors to provide attestations would be simply adding to a backlog.

"Mention implies potential attestation" however fails to hold for some languages. I propose that to qualify for inclusion in the main namespace, a language must have at least one of the following:

  1. A surviving written tradition.
  2. Continuing existence (≈ the potential for a written tradition to be established in the future).

This is relatively lenient still. If we consider epigraphic attestation a written tradition (but see below), languages like Oscan would continue to qualify under criterion #1, alongside any more abundantly attested extinct languages, say Hittite, Old Tupi or Ubykh. In the absence of other updates to CFI, any individual words in these languages would also continue to qualify on the basis of mentions alone.

What this serves to exclude are languages like Crimean Gothic, Pumpokol or Tasmanian. Any material that is known of languages of this sort generally only exists in linguistic sources, in all but exceptional cases comes with glosses attached, and thus seems to fall clearly short of Wiktionary's general rule of inclusion, as stated at WT:CFI:

A term should be included if it's likely that someone would run across it and want to know what it means.

That is: it seems to me like entries in dead-and-buried languages do not exist to fulfill this need. They exist solely for the sake of linguistic curiosity. No one randomly runs across text in Crimean Gothic, and is left wondering what it might mean.

Linguistic curiosity is of course still a need, and Wiktionary is doing a good job at answering it, I believe. Some people might indeed wonder "so how does one count to ten in Crimean Gothic", or "has this Ket word of alleged Yeniseian ancestry been even recorded from the related languages" and want to look it up. Hence I am not proposing flat-out deleting what we currently have in languages that fail the current-or-potential-written-tradition test. A better solution probably would be the inclusion of recorded data from any natural language variety in the Appendix namespace. (Or, perhaps, the creation of a new Extinct namespace?)

You might ask what difference it makes to switch from regular entries to an appendix, other than making the terms not come up by default in search. One thing is that this would also seem to be grounds to diverge from the usual layout requirements. For example:

  • If a language's known corpus is something like twenty or fifty words, we can put them in a single appendix rather than sprinkle them across several stub entries.
    • Individual quotations could in such cases be replaced with a single references section.
  • If a language has only been recorded in phonetic transcription, we could accordingly provide only the pronunciation/transcription, and avoid implying that an orthography exists.
    • If competing transcription schemes exist, we could standardize one of them and cover the others by means of an equivalence table, rather than creating duplicate entries.
  • Missing information such as parts of speech could be left unknown.
  • If glosses are only available in a language other than English, and the precise meaning cannot be verified, we might leave the glosses in the original language and not risk mistranslating things.

--Tropylium (talk) 01:08, 6 October 2015 (UTC)

I support this proposal, if I understand that it essentially means "Mentions may only be used for attestation, if the language's corpus as a whole has actual attestations of other words. Languages whose entire corpus is mentions do not qualify for inclusion." —CodeCat 09:27, 6 October 2015 (UTC)
It's still more lenient than that: "Entries may be based on mentions only if the language's corpus includes actual attestations, or, if the language is not extinct." --Tropylium (talk) 16:35, 6 October 2015 (UTC)


The above argumentation could additionally be extended to also exclude two further types of languages from mainspace:

  • Extinct languages whose known corpus is highly limited and does not allow directly establishing the language's grammar, pronunciation, the meaning of its words, etc. Often information of this type can be determined via the comparative method (a particularly good example might be Proto-Norse), but it would seem to me that this is not too much different from entirely unattested reconstructed languages.
  • Moribund languages, for which all available material is linguistic documentation and no revitalization efforts exist. For such languages, we have effectively no foreseeable hope of ever gaining anything resembling a written standard that could be documented according to regular attestation criteria. An example might be Ter Sami.

I would however like to go on record as strongly in favor of continuing to include endangered languages for which even marginal natural transmission can be suspected to remain (e.g. Ishkashimi), or even elementary attempts at formulating a written standard are underway (e.g. Votic). Hence any exclusion of languages by these criteria should probably be "opt-in", i.e. with the burden of proof on the side claiming that a language is indeed poorly-documented enough to not merit inclusion. --Tropylium (talk) 01:08, 6 October 2015 (UTC)

I strongly oppose this. Your "impression" (i.e. assumption) recorded above is inaccurate; we have more lenient criteria for such languages because we would otherwise have no way of documenting them (the use-mention distinction is merely meant as a tool for us to determine whether a word is really used). There is no point to separating off certain natural languages thus, and it would be counterintuitive to users. —Μετάknowledgediscuss/deeds 01:25, 6 October 2015 (UTC)
An interesting interpretation. (And in the absence of explicit policy support, I could ask whether it is also merely an assumption.) It seems to carry fairly strong implications, though.
  • If Wiktionary's purpose is to document any use of language whatsoever, on an equal level;
  • if this includes extinct languages just as well as living ones;
  • and if we're not tied to direct attestation, but are to also allow documentation thru indirect inference such as mentions
then this would actually seem to require that we also document proto-languages in mainspace, to an extent! The most reliably "entirely reconstructed" words are at least on an equally probable footing as words in otherwise attested languages that are presumed to have existed based on hapax attestations by non-native speakers, reconstructed semantics, etc.
You might also want to note that the proposals above are not in opposition to the documentation of anything at all, only on what should be presented as regular dictionary entries. Contrary to its name, WT:CFI is not actually the criteria for inclusion on the Wiktionary servers period, but merely the criteria for inclusion in mainspace.
(Also, should I assume that this reply is meant also in opposition to the main proposal, not just to the two subproposals?)
--Tropylium (talk) 16:31, 6 October 2015 (UTC)
I oppose this as well. Attested terms should be included, period. —CodeCat 09:27, 6 October 2015 (UTC)
Fair enough, though this counterargument appears to cover only languages of the Proto-Norse type. With languages of the Ter Sami type, no attestations exist whatsoever. --Tropylium (talk) 16:31, 6 October 2015 (UTC)
What exists for Ter Sami then? —CodeCat 16:58, 6 October 2015 (UTC)
The only major source is a comparative dialect dictionary of Kola Sami (the majority of whose materials are Kildin Sami) rendered in phonetic transcription, i.e. a huge bunch of mentions. Accordingly, what entries we have essentially have been created only as a pronunciation. Consider e.g. лa̭i̭ja ≈ IPA [ɫʌi̝jɑ]. (I doubt if anyone with native knowledge of Ter Sami would recognize either of these written forms, if presented with it.)
It's not only moribund languages that suffer from this problem, though. Tons of endangered languages only have field research materials available so far. You may recall a BP discussion about a new "Languages without a Written Tradition" in February. Hence I draw a difference here not between written and unwritten languages, but on whether Wiktionary inclusion could be expected to actually benefit the speaker community, and whether we could expect native speakers to contribute at some point. --Tropylium (talk) 19:39, 6 October 2015 (UTC)
  • Query: How would this proposal affect proto-languages? Do you suggest that we remove all of our proto-language appendix entries? ‑‑ Eiríkr Útlendi │Tala við mig 08:00, 6 October 2015 (UTC)
    None of this has any effect on reconstructed proto-languages, which are already excluded from mainspace inclusion. --Tropylium (talk) 16:31, 6 October 2015 (UTC)

Away until mid-October.

I will be away until mid-October. Please try to have this project completed by the time I return. Cheers! bd2412 T 15:28, 7 October 2015 (UTC)

You're not my supervisor! WurdSnatcher (talk)