Wiktionary:Beer parlour/2013/September: difference between revisions

:::::::We split proper nouns senses from common noun senses, but many, many Sanskrit sections do not. Not all Sanskrit sections include the link to the underlying dictionary, which is itself an omission, and not every bit of explanatory note in the original dictionary seems to have survived. A glossary for the abbreviations used does not seem to be included. The language used is not contemporary English and the definitions lack glosses. [[User: DCDuring |DCDuring]] <small >[[User talk: DCDuring|TALK]]</small > 23:23, 12 September 2013 (UTC)
:::::::For an example of a problem see [[WT:RFC#सह]] and join the discussion there. [[User: DCDuring |DCDuring]] <small >[[User talk: DCDuring|TALK]]</small > 23:25, 12 September 2013 (UTC)
:::::::: MW dictionary is perfectly valid even today (because we're dealing with an extinct language, doh), some of the entries were created before the online version of MW dictionary was available, Sanskrit grammar tradition doesn't make the distinction between proper and common nouns, 98% of the words in its definitions are perfectly valid contemporary English as far as I recall, and anyone studying Sanskrit doesn't need a meaning gloss. In other words, there are no problems with Sanskrit entries. --[[User:Ivan Štambuk|Ivan Štambuk]] ([[User talk:Ivan Štambuk|talk]]) 16:02, 13 September 2013 (UTC)


== Pages with protolanguage information? ==

Revision as of 16:02, 13 September 2013


Term only citable with different spellings counting

I can find one hit for Copenhagenisation and two for Copenhagenization (meaning “(sociolinguistics) the process of Danish speakers beginning to use the dialect of Copenhagen”). Not enough citations for either, but they’re just different ways of spelling the same word, so should they be included? — Ungoliant (Falai) 12:35, 2 September 2013 (UTC)[reply]

Our entries are for spellings. DCDuring TALK 13:41, 2 September 2013 (UTC)[reply]
There is support for it (Wiktionary:Information_desk/Archive_2012/July-December#Request for clarification: How strict is WT:CFI regarding attestation of spellings which vary slightly?). — Ungoliant (Falai) 14:05, 2 September 2013 (UTC)[reply]
Why do you call that support? DCDuring TALK 15:52, 3 September 2013 (UTC)[reply]
Support creating both entries. I don't think there is much point in gerrymandering the CFI to exclude terms merely because of spelling differences. It's the same word. —CodeCat 13:58, 2 September 2013 (UTC)[reply]
Agreed. If it were a regional term or an alternative spelling, where the spelling is what's in question, it might be different, but -ize and -ise are substituted into words by an extremely regular and mechanical process analogous to inflection (most of the time, we don't even notice we're doing it). If we accept plurals for singular lemmas, or past for present lemmas, we should accept these. Chuck Entz (talk) 14:41, 2 September 2013 (UTC)[reply]
Yup, though not without exception. (Consider compromise and exercise and advertise, whose counterparts in <-ize> are quite rare by comparison. And, for that matter, consider matrices and hypotheses and phalanges, whose regularly-backformed singulars matrice and hypothese and phalange are, similarly, quite rare compared to the standard singulars. So we do need to exercise caution.) —RuakhTALK 21:49, 2 September 2013 (UTC)[reply]
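The pooling idea discussed above, together with the caveat about words like "exercise", can be sketched in a few lines of Python. This is only an illustration: the threshold of three citations is an assumption to be checked against WT:CFI, the exception list holds just the three examples named in the thread, and `normalise` and `pooled_counts` are hypothetical helper names.

```python
from collections import defaultdict

# Assumption: number of citations needed for inclusion; check WT:CFI.
REQUIRED_CITATIONS = 3

# Words ending in -ise that are not mechanical -ize variants.
# Only the examples named in the discussion; a real list would be longer.
NOT_IZE_VARIANTS = {"compromise", "exercise", "advertise"}


def normalise(spelling: str) -> str:
    """Map -ise/-isation spellings onto their -ize/-ization counterparts."""
    lower = spelling.lower()
    if lower in NOT_IZE_VARIANTS:
        return lower
    for ise, ize in (("isation", "ization"), ("ise", "ize")):
        if lower.endswith(ise):
            return lower[: -len(ise)] + ize
    return lower


def pooled_counts(citations: dict[str, int]) -> dict[str, int]:
    """Sum citation counts over spellings that normalise to the same word."""
    totals: dict[str, int] = defaultdict(int)
    for spelling, count in citations.items():
        totals[normalise(spelling)] += count
    return dict(totals)


# Counts from the opening post: one hit for one spelling, two for the other.
counts = pooled_counts({"Copenhagenisation": 1, "Copenhagenization": 2})
citable = {word: n >= REQUIRED_CITATIONS for word, n in counts.items()}
```

Under these assumptions neither spelling passes alone, but the pooled word does. The exception set is the weak point: any -ise word missing from it (for example "promise") would be wrongly merged, which is the caution raised above.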
I agree with CodeCat. —RuakhTALK 21:49, 2 September 2013 (UTC)[reply]
Yet another step that means increase in quantity and decrease in quality of entries. DCDuring TALK 15:52, 3 September 2013 (UTC)[reply]
You’re just being a concern troll. — Ungoliant (Falai) 18:34, 3 September 2013 (UTC)[reply]
I'm not sure what "step" you're referring to. Are you implying that hitherto we have not allowed entries in cases where a word meets the CFI but has not had any individual spellings/forms that do? —RuakhTALK 20:29, 3 September 2013 (UTC)[reply]
Exactly. Am I wrong? I know I am not wrong about the poor quality of our definitions, both English and other. It's hard to say whether they are getting worse or not as we have no metrics (not that we could readily develop any, except on a sample basis). I'm quite sure that our definitions are not rapidly improving and that we are constantly adding FL terms with ambiguous glosses. DCDuring TALK 23:58, 3 September 2013 (UTC)[reply]
Support including the term, but I would like to see one proper entry for a lemma, plus a form-of page.
Our pages are for spellings, but many of our full entries are for lemmas, with form-of references for inflections and spelling variations. The latter is a much better arrangement for the reader, and also for integrity of the dictionary, per the w:DRY principle. Our citation practices encourage me to think that we cite terms, not spellings: “Unlike the main space, inflected forms and alternate spellings should be redirected to the primary entry. Variations in case should be on the same page, with the other(s) redirecting, even if the definitions are distinct” (from WT:CITE#Naming).
Some of these are also citable: Copenhagenize/Copenhagenise, Copenhagenized/Copenhagenised, Copenhagenizes/Copenhagenises, Copenhagenizing/Copenhagenising. Michael Z. 2013-09-06 04:33 z
Lightly object. To generalize from this example, we're talking about words that have two spellings in English, which means you can't come up with 5 examples in English--with Google Books, that's not usually a huge hurdle. You're also usually talking about words that are predictable variations on words we should have; if we have Copenhagen, Copenhagenization should be pretty clear. I don't see the benefits as being huge.--Prosfilaes (talk) 05:52, 6 September 2013 (UTC)[reply]

Wiktionary's definition of a word is spelling-based, and I don't see why we should make an exception for Copenhagenisation and Copenhagenization. If both can be independently cited they both deserve a separate entry, with one being a lemma and the other an alternative form, misspelling, or whatever. --Ivan Štambuk (talk) 17:22, 12 September 2013 (UTC)[reply]

I'd've thought this was a good idea for things that Wiktionary:Entry layout explained either doesn't mention or doesn't give an unambiguous verdict on. For example, definitions may be formatted as sentences, or not. There's very little consistency. Even with two consecutive definitions in a single entry, the first may have an initial capital and a full stop while the second has neither. Mglovesfun (talk) 11:11, 3 September 2013 (UTC)[reply]

Good or featured articles?

Hi,

Is there a system of good or featured articles here, like on Wikipedia (Wikipedia:Featured articles/Wikipedia:WikiProject Good articles)?

Thanks in advance, Automatik (talk) 13:18, 3 September 2013 (UTC)[reply]

But we do have WT:WOTD. DCDuring TALK 13:25, 3 September 2013 (UTC)[reply]
Thanks for your answer. Automatik (talk) 15:35, 3 September 2013 (UTC)[reply]
But WT:WOTD isn't really comparable. There are no quality requirements on WOTD and no process for "bringing an entry up to WOTD level". The non-English WOTD must have a pronunciation and at least one citation (at least one mention for a limited-documentation language), but the requirements on English WOTD all have to do with the nature of the word itself, not with the quality of the entry. —Angr 21:46, 3 September 2013 (UTC)[reply]
That's true, but I think it's in part because we can improve an English entry quickly once it's announced as an upcoming word of the day. —RuakhTALK 04:14, 7 September 2013 (UTC)[reply]

Hello, I came from the French Wiktionary too. We are trying to create a system for quality evaluation, and it seems no other Wiktionary has a system like that. Do you want to join the discussion? If so, we need to work on it for a few more weeks, and then we can translate it into English to share the ideas with you. Eölen (talk) 22:52, 3 September 2013 (UTC)[reply]

Can you please provide a link to the relevant discussion on French wiktionary? --Ivan Štambuk (talk) 17:16, 12 September 2013 (UTC)[reply]

Since I saw it on a "needed badly" list somewhere, I decided to start this page. It has been brewing on my disk for some time. I loosely based it on WT:ACS, while trying to explain some grammatical features, and highlight a few gaps in current practices. Please tell me what you think, whether anything is missing, needs change or an explanation. Keφr 10:34, 6 September 2013 (UTC)[reply]

Good work! Not much to criticise or suggest at this point. I'll watch this project and may use something to add to Wiktionary:About Russian. I'd like to see more treatment of verbs, including perfective/imperfective (not just entries but translations), abstract/concrete, semelfactive. Also interested in the policy for reflexive verbs, which seem to be handled differently across languages (separate entries or separate senses?). Polish could perhaps use more etymology info, which can often be looked up at Serbo-Croatian (or sometimes Russian) entries with Proto-Slavic derivations. --Anatoli (обсудить/вклад) 23:52, 8 September 2013 (UTC)[reply]
I do not remember ever encountering a semelfactive aspect which would be distinct from perfective. Translations - noted, will write something up. I tried to be descriptive of current practices rather than prescriptive, so if you people want to discuss how the policy ought to be, feel free. Not sure what you mean by the abstract-concrete distinction. Remember, this is not a complete guide to Polish grammar, just a quick summary to explain how it is relevant to presenting terms in Wiktionary. Keφr 07:25, 9 September 2013 (UTC)[reply]
Re: semelfactive vs simply perfective, example: "krzyknąć" and "pokrzyczeć" are both perfective, the former is semelfactive (instantaneous, momentive), the latter is not. Abstract vs concrete (verbs of motion only): "chodzić"/"iść". I've added some categories for a few Slavic languages other than Russian before. Your project page doesn't have to describe all that, of course. --Anatoli (обсудить/вклад) 12:24, 9 September 2013 (UTC)[reply]
And I used to think that aspect is an easy language… aspect. Did you notice the mention of frequentatives? Any idea whether and how this aspect mess should be handled? (I think I remember Russian having a similar feature.)
The abstract-concrete gave me some idea, but I am not sure I got it right. I think you will not find a good translation of the verb go in all its generality. The verb iść still sort-of refers to using feet, even if the main focus is on something else.
And this page is not "mine" by any standard. If you think you have something to add, go ahead. In the worst case you will get reverted once or twice. Keφr 13:54, 9 September 2013 (UTC)[reply]
Although I would not mind having the former type of page somewhere in here, to be honest. There are a few languages I would like to learn, but have something of a hard time finding good resources. A brief grammar reference would be helpful. Keφr 07:28, 9 September 2013 (UTC)[reply]

Interwikis and translation-links for languages without Wiktionaries, or whose Wiktionaries are closed.

As far as I can tell, Wiktionaries can be classified into four groups:

  1. Regular Wiktionaries, like fr.wikt and es.wikt. Except for various annoying edge cases that aren't the subject of this discussion, these work just fine, and exactly as you'd expect.
  2. Nonexistent Wiktionaries that redirect to the Wikimedia Incubator, like vep.wikt. I'm not sure quite how we should handle these, but I think we can basically do whatever we want; we just need to decide what we want to do with them, and then do it. Interwiki-links to [[:vep:...]], for example, work fine, linking to vep.wikt URIs that redirect to Incubator URIs.
  3. Nonexistent Wiktionaries that don't redirect to the Wikimedia Incubator, like zza.wikt. With these we can do whatever we want for translation-links (we just have to link directly to the Incubator entry if we want that), but interwiki-links are uglier (we'd have to add them JavaScript-ically).
  4. Closed/locked Wiktionaries, like aa.wikt and dz.wikt. (I suppose these could be considered a subset of the previous.) These are annoying, because they have some existent pages, and they have database-dumps, but redlinks to them are rather pointless (since content can't be added), even bluelinks to them are rather dubious (since problematic content can't be fixed or removed), and in some (most? all?) cases there's at least as much content on Incubator as on the Wiktionary domain itself.

Group #1 needs no discussion, but how do we want to handle each of groups #2–4?

RuakhTALK 06:21, 7 September 2013 (UTC)[reply]

Since no one's weighed in yet, here are my own views:
  • we should never link to closed/locked Wiktionaries — not as interwiki-links, and not as translation-links.
  • we should never link to non-existent pages on Incubator — not as interwiki-links (obviously), and not as translation-links.
  • when a translation has an appropriate-language Wiktionary entry on the Wikimedia Incubator, we should link to it using {{t+}}. (Note: since e.g. [[zza:...]] and [[aa:...]] don't work properly, this will require a change to the translation-templates. Actually these templates are already a bit broken when it comes to languages without Wiktionaries — {{t|zza|foo}} links to a page named zza:foo on en.wikt — so we'll want to make some sort of change to them regardless.)
  • when an interwiki-link would appropriately link to a redirect to an existent entry on the Wikimedia Incubator, we should use it. For example, [[April]] should include [[vep:April]] among its interwiki-links.
  • when an entry exists on the Wikimedia Incubator, but an interwiki-link wouldn't work, should we hack up some JavaScript to make it work? I'm not sure.
RuakhTALK 19:47, 7 September 2013 (UTC)[reply]
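The four groups and the proposed rules above can be written down as a small decision function. This Python sketch is only one reading of the proposal: the enum, the function name and the returned labels are hypothetical, group 1 is simply left as it is today, and the JavaScript question is reported as "undecided" rather than settled.

```python
from enum import Enum, auto


class Group(Enum):
    REGULAR = auto()             # group 1, e.g. fr.wikt
    INCUBATOR_REDIRECT = auto()  # group 2, e.g. vep.wikt
    NO_REDIRECT = auto()         # group 3, e.g. zza.wikt
    CLOSED = auto()              # group 4, e.g. aa.wikt


def link_target(group: Group, incubator_entry_exists: bool, kind: str) -> str:
    """Return where a link should point under the proposed rules.

    kind is "translation" or "interwiki". The result is one of
    "wiki", "incubator", "wiki-redirect", "undecided" or "none".
    """
    if kind not in ("translation", "interwiki"):
        raise ValueError(f"unknown link kind: {kind!r}")
    if group is Group.REGULAR:
        return "wiki"  # outside the proposal; behaviour unchanged
    if not incubator_entry_exists:
        return "none"  # no links to closed wikis or missing Incubator pages
    if kind == "translation":
        return "incubator"  # link straight to the Incubator entry
    if group is Group.INCUBATOR_REDIRECT:
        return "wiki-redirect"  # interwiki link that redirects to Incubator
    return "undecided"  # would need JavaScript; left open in the proposal
```

For example, `link_target(Group.INCUBATOR_REDIRECT, True, "interwiki")` gives `"wiki-redirect"`, matching the [[vep:April]] case, while any closed Wiktionary without an Incubator entry gives `"none"`.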
Sounds all reasonable to me on the face of it. As for the Javascript question in the last item, my instinct would be to avoid adding Javascript unless it generates significant added value, which does not seem to be the case. --Dan Polansky (talk) 20:14, 7 September 2013 (UTC)[reply]

IMHO, apart from top-X (where X < 5), other Wiktionaries are so much inferior in quality that linking to them in both interwikis and translation tables seems like a waste of time, database space and edit counts. --Ivan Štambuk (talk) 17:10, 12 September 2013 (UTC)[reply]

Number forms

Based on the Category:Inflections, I believe Wiktionary needs a new category called Numeral forms because some languages have inflections for their cardinal numbers. I hope this isn't a difficult suggestion. --KoreanQuoter (talk) 18:08, 7 September 2013 (UTC)[reply]

But we already have one? —CodeCat 18:26, 7 September 2013 (UTC)[reply]
I tried to make a separate page for одно (neuter form of один) and I think Numeral forms is more appropriate for a category. --KoreanQuoter (talk) 18:51, 7 September 2013 (UTC)[reply]
I still don't understand. What is wrong with the existing numeral forms category? —CodeCat 19:24, 7 September 2013 (UTC)[reply]
Wait. There was an existing numeral forms category? --KoreanQuoter (talk) 05:47, 8 September 2013 (UTC)[reply]
…yes? Keφr 06:02, 8 September 2013 (UTC)[reply]
Oh. Silly me. Thank you. --KoreanQuoter (talk) 06:18, 8 September 2013 (UTC)[reply]

CFI and Wiktionary is not an encyclopedia

I have created vote Wiktionary:Votes/pl-2013-09/CFI_and_Wiktionary_is_not_an_encyclopedia. I propose to remove or at least trim WT:CFI#Wiktionary is not an encyclopedia section.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 08:08, 8 September 2013 (UTC)[reply]

Let's keep the comments on the talk page of the vote. Mglovesfun (talk) 09:06, 8 September 2013 (UTC)[reply]

CFI and trimming the Idiomaticity section

I have created vote Wiktionary:Votes/pl-2013-09/CFI and trimming the Idiomaticity section.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 09:00, 8 September 2013 (UTC)[reply]

Underestimating idiomaticity of Finnish translations

While going around fixing translation lists, I noticed that very often, the Finnish translations are marked up as if they were sum-of-parts. At first I thought, "well, I guess Finnish is weird", but recently I started to doubt the accuracy of that characterisation. Take door-to-door. The Finnish translations listed look like simple inflections of the Finnish word for "door" (ovi). The English meaning of "door-to-door" is apparently idiomatic, so I have quite a hard time imagining how the Finnish entry, which breaks down into constituents pretty much the same way, would be sum-of-parts. I am also suspicious of entries where Finnish translations are broken into roots and affixes.

Do you think we should go over these? Keφr 11:54, 8 September 2013 (UTC)[reply]

"ovelta ovelle" translates literally as "from door to door". ovelta is the ablative case of ovi and means "(away) from a/the door", while ovelle is the allative case and means "to/towards/onto a/the door". —CodeCat 12:19, 8 September 2013 (UTC)[reply]
I've noticed this too and I think it's just the way they've been added (by a human editor) and nothing to do with the language itself. Mglovesfun (talk) 19:58, 8 September 2013 (UTC)[reply]
So in that case, should we not have an entry for the whole phrase and link to it in the translations list? Keφr 20:55, 8 September 2013 (UTC)[reply]
I would trust Hekaheka's judgement on how to translate into Finnish. We should invite her if any doubt. Translations may be done as "solids" (if they are idiomatic in the target language) or using "sum of part" methods. Using "Template:tø" allows you to see individual components and the grammar but "ovelta ovelle" requires the entry to exist or at least interwiki. --Anatoli (обсудить/вклад) 22:30, 8 September 2013 (UTC)[reply]
I agree (w/Anatoli). Even if it is just a case where the Finnish-speaking editors have made a different decision than most of the non-Finnish-speaking editors would have made . . . well, that doesn't seem like a big deal to me. There are a lot of things that it's important to be consistent about across languages, but I'm not sure this is one of them. —RuakhTALK 23:16, 8 September 2013 (UTC)[reply]
There are pros and cons in both approaches but it's often safer to use the "SoP" approach, even if it's more time-consuming. It usually draws less criticism. This is the first time I've seen criticism of the SoP approach.--Anatoli (обсудить/вклад) 04:32, 9 September 2013 (UTC)[reply]
Safer? Maybe. Though note that [[ovelta ovelle]] does exist in this case. The specific problems I see with overmarking as sum-of-parts are that 0) this discourages creation of entries for terms which may be non-trivial to translate into English; 1) these terms will not be picked up by Yair rand's gadget on the search page, and it does not seem obvious to me how it could be extended to do so; leading into 2) the translation of such a term will be harder and more time-consuming, especially when the entries for the constituent words are missing some meanings. So, marking translations as idiomatic can be beneficial even when that makes them redlinks. Keφr 07:09, 9 September 2013 (UTC)[reply]
By "safer" I mean in terms of someone disputing idiomaticity. I've translated double-decker bus and death camp as SoP двухэтажный автобус m (dvuxetážnyj avtóbus) and Template:tø as tomorrow someone may dispute both the English terms and the Russian translations. It's still educational to see the grammar of the translations, showing the individual parts and how the translation is made. --Anatoli (обсудить/вклад) 13:00, 9 September 2013 (UTC)[reply]
Educational — I am not denying that. Having these translations listed as SoPs is helpful, even if only by the virtue of it being better than having no translations at all. But the grammatical structure of a multi-word term can be also analysed in the whole term's entry, and can also be inferred by hand when the constituent word pages are reasonably complete, so this is not hugely relevant. Creating pages for these I find rather easy. Though granted, I do not always create these myself.
All I wanted to know is whether everyone is okay with treating such terms as SoPs. There are many other examples, they often land in Category:Translations to be checked (Finnish), because I usually just leave them there when xte suggests reviewing the translation. (Hekaheka would probably like to call me a perkeleen vittupää because of that, but oh well. Cannot please everyone.) Keφr 14:10, 9 September 2013 (UTC)[reply]

Merging Mari and Buryat varieties

Can we or should we merge some varieties of Mari and Buryat at Wiktionary?

  • Mari (chm, mhr, mrj):
Hill/Western Mari (mrj) can probably stay separate, it has a few more letters than the standard or Eastern (Meadow) Mari. Extra Cyrillic letters in Western Mari: Ӓ, ӓ and Ӹ, ӹ and they don't use standard Mari letter Ҥ, ҥ. This variety has about 30 thousand speakers. It's still possible to merge if Western Mari has context labels and the additional letters are handled. Anyway, chm and mhr can be merged safely.
Language codes with names:
  • chm - "Mari", "Standard Mari"
  • mhr - "Eastern Mari", "Meadow Mari"
  • mrj - "Western Mari", "Hill Mari"


  • Buryat (bxr, bxu, bxm, bua):
Russian and Mongolian Buryat use the same alphabet. Mongolian and Vagindra are hardly used. The overwhelming majority of Buryats live in Buryatia, some in Mongolia, even fewer in China.
Language codes with names:
  • bxr - "Russia Buryat"
  • bxm - "Mongolia Buryat"
  • bxu - "China Buryat"
  • bua - "Buryat", "Buriat"
(it's obvious that at least one is redundant)

--Anatoli (обсудить/вклад) 04:53, 10 September 2013 (UTC)[reply]

Meadow and Hill Mari have separate written standards so they should be kept separate (and the chm code deleted). Buryat, easy thing, merge them. There is no difference between these 'lects, except perhaps for some loanwords. -- Liliana 07:39, 10 September 2013 (UTC)[reply]
Seems like we have an agreement on Buryat. So we can delete bxr, bxm and bxu and make bua the only code for Buryat.
With Mari, I would rather delete mhr and leave the name "Mari". Standard Mari is "Eastern Mari" or "Meadow Mari" and chm is more common. OK, let's leave mrj but I'll make a transliteration page and a module, which works for both alphabets. --Anatoli (обсудить/вклад) 22:42, 10 September 2013 (UTC)[reply]

Overriding manual transliteration

This has been discussed in other pages, but no consensus was reached.

Automated transliteration works perfectly for several languages, such as Armenian. Some suggest always overriding manual transliteration for these languages, because many manual transliterations are incorrect (due to human error) or inconsistent (due to changes to the transliteration system, etc.). Others say we should always let editors use |tr=.

Another solution is to remove the old manual transliterations for terms in these languages and not override manual transliterations after that. (We can put the pages with |tr= for terms in these languages in a category to keep track of them.) --Z 13:29, 10 September 2013 (UTC)[reply]

Wouldn't we want to be able to let that be decided on a language-by-language basis? What about also allowing the overriding of everything with tr=, but allowing tr0= to override bad automatic transliterations, also on a per-language basis? DCDuring TALK 14:47, 10 September 2013 (UTC)[reply]
It is being decided language-by-language. See the override_translit section of Module:links. --Vahag (talk) 15:15, 10 September 2013 (UTC)[reply]
I support overriding manual transliteration for languages whose automatic transliteration works perfectly, e.g. Armenian, Georgian. For such languages manual transliteration will be redundant in the best case and wrong in the worst case. --Vahag (talk) 15:15, 10 September 2013 (UTC)[reply]
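The per-language choice described in this thread could look roughly like the Python sketch below. It is not the code of Module:links (which is written in Lua and not reproduced here); the `OVERRIDE_MANUAL` table, the function name, and the stand-in transliterator are all hypothetical, and the language codes `hy` (Armenian) and `ka` (Georgian) are used only because those two languages were named as examples.

```python
from typing import Callable, Optional

# Hypothetical per-language switch: True means the automatic transliteration
# replaces whatever an editor supplied via |tr=.
OVERRIDE_MANUAL = {"hy": True, "ka": True}


def choose_transliteration(
    lang: str,
    term: str,
    manual: Optional[str],
    automatic: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Pick the transliteration to display for a term."""
    generated = automatic(term)
    # Override only when the language opted in and the module produced output.
    if generated is not None and OVERRIDE_MANUAL.get(lang, False):
        return generated
    # Otherwise the editor's value wins; fall back to the automatic one.
    return manual if manual else generated


def fake_auto(term: str) -> str:
    """Stand-in transliterator for the example; not a real scheme."""
    return term.upper()


overridden = choose_transliteration("hy", "abc", "manual-value", fake_auto)
kept = choose_transliteration("ru", "abc", "manual-value", fake_auto)
```

Keeping the switch in one table means the decision stays language-by-language, and a language can be moved in or out of the override set without touching entries.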

Great language game

I know that this isn't a forum, but there isn't really anywhere else to put it. And it would be a shame not to share it because I think many people on Wiktionary will like it. There's a new website called the Great Language Game where you can see how well you can tell different languages apart by ear. I seem to do pretty well with it, I hope it's fun to others as well. —CodeCat 12:32, 12 September 2013 (UTC)[reply]

Love it, thanks. Only 800 for me... And I got lucky, I kept ending up with Slavic languages. --Fsojic (talk) 20:16, 12 September 2013 (UTC)[reply]

What can be done to improve quality?

The more my wanderings take me to visit a wide range of non-English entries, the more I think that the English-language entry quality problem is not our only quality problem.

For non-English entries the problems range from the near-incoherent terseness of our copyings of a 110-year-old Sanskrit dictionary to the frequent presentation of non-idiomatic calques as glosses and the use of terms that simply don't belong in a definiens of a contemporary dictionary due to the age, rareness, or unglossed polysemy of the term or terms used.

For English entries the quality problem includes the obsolete language of definiens and the poor coverage of polysemic terms, especially uses that developed in the 20th century and remain common today. The entries for polysemic terms contain many important definitions that are buried and lost in visual clutter. The definiens of many terms includes words that are rare and/or technical when neither characteristic is necessary.

Are there technical means that could help? An example might be processing the dumps to identify uses of terms labeled rare, obsolete, archaic in definiens. Or words used only once in any definiens.
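As a rough illustration of the two checks suggested here, the Python sketch below assumes the definitions have already been extracted from a dump into a mapping from entry title to definition lines, and that a set of words labelled rare, obsolete or archaic is available. Both inputs, and the function names, are hypothetical; parsing the dump itself is not shown.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by an apostrophe or hyphen.
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokens(definition: str) -> list[str]:
    """Lower-cased words of one definition line."""
    return [w.lower() for w in WORD.findall(definition)]


def flagged_uses(
    definitions: dict[str, list[str]], flagged: set[str]
) -> dict[str, set[str]]:
    """Map each entry title to the flagged words used in its definitions."""
    hits: dict[str, set[str]] = {}
    for title, lines in definitions.items():
        used = {w for line in lines for w in tokens(line)} & flagged
        if used:
            hits[title] = used
    return hits


def used_once(definitions: dict[str, list[str]]) -> set[str]:
    """Words that occur exactly once across all definitions."""
    counts = Counter(
        w for lines in definitions.values() for line in lines for w in tokens(line)
    )
    return {w for w, n in counts.items() if n == 1}


# Made-up sample data, for illustration only.
sample = {
    "entry one": ["embroidery", "binding"],
    "entry two": ["a kind of embroidery", "a betimes greeting"],
}
problems = flagged_uses(sample, {"betimes"})
singletons = used_once(sample)
```

On a real dump the once-only list would be dominated by inflected forms and proper nouns, so some lemmatisation or filtering would probably be needed before the output is useful as a cleanup list.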

What can we do to get more effort by existing and past editors devoted to entry improvement?

Are there helpful ways to more actively recruit or develop contributors? DCDuring TALK 12:56, 12 September 2013 (UTC)[reply]

I do think that we should avoid using obscure terms in definitions, but sometimes there just happens to be that one word that describes it so much better than anything else. In such cases I usually prefer to show both. I often include multiple glosses if it helps to narrow the meaning down more.
I'm not sure if there is much we can do to increase the effort. People will work on what they feel like working on. We can raise awareness, but that's about all we can do. Wiktionary is pretty decentralised and we have no central announcement system that everyone is guaranteed to see, except for WT:NFE which a lot of people ignore regardless. So if we want to raise awareness of issues we first need some kind of global platform to raise them on to begin with. Beer Parlour isn't really enough.
As for visual clutter, I think this is a real problem and I think it could be improved substantially by adopting a visual style similar or identical to the French Wiktionary. Their use of colours, borders and icons is far easier on the eye and does a lot to direct the user's attention to certain parts of the page. It makes things stand out more and gives visual structure to the page which is pretty much essential. —CodeCat 13:19, 12 September 2013 (UTC)[reply]
For foreign language entries, heavy use of glosses or listing several possible translations is a must. For example, in the Serbo-Croatian entry vȇz, a definition given is “binding”. The reader is left to guess which sense of binding it refers to. When I add a Portuguese entry, I always try to add enough information via glosses and possible translations so the user won’t need to follow any link nor rely on guesswork to understand precisely what the term means.
Shortcut glosses like “(all senses)” should be avoided as well, IMO, as they can lead to error. — Ungoliant (Falai) 13:28, 12 September 2013 (UTC)[reply]
"For foreign language entries, heavy use of glosses or listing several possible translations is a must." Very much this. I already do this when adding entries in Polish, and I put a similar recommendation at WT:APL#Definitions. This should be a project-wide policy, because the reasons I gave there are not exclusive to Polish at all. This is a no-brainer for anyone who deals with translations, really.
As for the visual side, I will disagree. I actually like our, shall I say, ascetically colourless style. I think the best solution would be to convert pages to some kind of semantic markup so that we do not have to enforce any particular style at all. Dislike a style? Switch your skin.
Regarding the lack of a central propaganda tube, I would add a "add N4E to watchlist" link to the welcome template, and maybe streamline the template to the most essential bits. Could help. And the Beer parlour does not cut it, I presume, partly because the main BP page is just too damn big, loads slowly, and you have to keep adding and removing per-month pages to your watchlist to keep being updated, which is tedious. Wikipedia's archive pages system is better in this regard, although it has its own flaws. I cannot wait for mw:Flow to solve all our wiki discussion problems. In the meantime, why not convert the central discussion pages to LiquidThreads? Keφr 15:29, 12 September 2013 (UTC)[reply]
By comparing other meanings it is obvious that the binding sense of the Serbo-Croatian noun vȇz refers to "A finishing on a seam or hem of a garment". Sometimes using a dictionary requires a minimum amount of intelligence on the reader's part. Ditto for what DCDuring calls "near-incoherent terseness" of the Monier-Williams Sanskrit dictionary, the most comprehensive Sanskrit dictionary compiled by the most authoritative Sanskrit lexicographer in the West. --Ivan Štambuk (talk) 17:04, 12 September 2013 (UTC)[reply]
Easy to say that when you’re a native speaker and already know the word. And the best one can do is figuring out that “A finishing on a seam [] ” is the most likely meaning, but without a gloss there is no certainty. Even then, people expect a dictionary, not a test on their figuring-out-the-most-likely-meaning-of-words skills. — Ungoliant (Falai) 17:14, 12 September 2013 (UTC)[reply]
@Ivan: I don't think that those providing support for this project intend that it be usable only by an intellectual elite. The intellectual elite that uses and contributes to this wiki needs to also serve the general population of those who need dictionaries.
I don't doubt the underlying quality of the Sanskrit dictionary in terms of its coverage of Sanskrit. It seems like an outstanding basis for good Sanskrit Wiktionary entries. I just don't think that it is very usable for a non-specialist, partially because the style and wording of the Wiktionary entries resulting from the copying is not similar to that of other Wiktionary entries. The problem is not unlike the problem of copying Webster 1913 definitions, except the stylistic difference is even more dramatic. As it stands our Wiktionary Sanskrit entries are, in many ways, worse than the underlying dictionary because of omissions. DCDuring TALK 17:37, 12 September 2013 (UTC)[reply]
But Sanskrit and other extinct and classical languages are only used by intellectual elite. The only way a common person is going to come across a Sanskrit entry is through some etymology.
I don't recall having seen Sanskrit entries that are formatted radically different than entries in other languages. The only problem is the abundance of meanings that words have, and which sometimes get grouped by eras or sources, and not by semantic closeness as they are normally - but that's a particular issue of classical languages that have been used over a long period of time, which other "normal" languages don't have. But users looking up Sanskrit words expect such layout which enables them to quickly isolate set of meanings appearing in a particular work that they are reading.
I don't understand what omissions you are referring to. If you have some constructive proposal of how to change some user-unfriendly entry of your choice I'd be happy to hear it. --Ivan Štambuk (talk) 22:02, 12 September 2013 (UTC)[reply]
With other meanings being embroidery and needlework, I can't imagine anyone construing the binding sense in any definition of [[binding]] other than the one that has sewing context label attached. I don't think that the average reader is that stupid. --Ivan Štambuk (talk) 22:02, 12 September 2013 (UTC)[reply]
This is more of an observation than a suggestion, but when I was experimenting with using links to senses with sense IDs using {{senseid}} when writing definitions for foreign-language terms, I found that the sense that I wanted to link to was missing--about half the time, actually--and that there were many senses of the English term that I hadn't thought of, and which forced more disambiguation on the foreign language end than I had realized. The process of matching senses exposed gaps on both ends, so if that could be integrated in the editing process, it would be a big aid to editors. That would really tap into the collaborative power of the project. This is not to push or oppose sense IDs, just my experience with them. --Haplology (talk) 16:05, 12 September 2013 (UTC)[reply]
I would expect contemporary non-English terms to often need well-worded, contemporary senses of common English terms that the English entries lack. That is one of the biggest problems for English entries. {{rfdef}} helps us identify the need, but a note in the template to explain the need would be helpful in prioritizing work on the English entry. Do we have a tag to mark the FL definition as waiting for a suitable English definition to be provided? Would a use of {{sense-id}} in the English and FL entries help by providing a way to find the original FL entry problem (missing gloss)? DCDuring TALK 16:36, 12 September 2013 (UTC)[reply]
We also have {{gloss-stub}}. I add it to entries whenever I find that the definition doesn't identify the meaning specific enough. —CodeCat 16:46, 12 September 2013 (UTC)[reply]
That presumably goes in the FL entry and {{rfdef}} goes in a new line at the English L3/L4 section. How could those be linked more or less automagically by the use of {{senseid}} in each? DCDuring TALK 17:25, 12 September 2013 (UTC)[reply]
Install Wikidata. DTLHS (talk) 20:30, 12 September 2013 (UTC)
And I realize that saying "install Wikidata" isn't helpful; it's just pent-up frustration about trying to implement features of a database in something that very much isn't one. DTLHS (talk) 20:49, 12 September 2013 (UTC)
Speaking of Wikidata — I often see statements in the Wikipedia metaspace that there are plans to deploy Wikidata on Wiktionary in some form. At the same time I am yet to see anybody from Wikidata approaching the community here about this. Trying to explain how it would work, how to handle existing dictionary content, and such. I smell a disaster. Keφr 20:55, 12 September 2013 (UTC)[reply]
d:Wikidata:Wiktionary. --Yair rand (talk) 22:06, 12 September 2013 (UTC)
Huh, I just realized I'm the only Wiktionary admin who's also a Wikidata admin. We're probably going to need some more Wiktionarians paying attention to Wikidata's progress if WD use is going to turn out well here. ... --Yair rand (talk) 22:20, 12 September 2013 (UTC)[reply]
@Ivan. I think we would want to be more than a wikisource for a 110-year-old dictionary, no matter how good that dictionary may be, especially as it is already available: e.g., [1]
We split proper noun senses from common noun senses, but many, many Sanskrit sections do not. Not all Sanskrit sections include the link to the underlying dictionary, which is itself an omission, and not every bit of explanatory note in the original dictionary seems to have survived. A glossary for the abbreviations used does not seem to be included. The language used is not contemporary English and the definitions lack glosses. DCDuring TALK 23:23, 12 September 2013 (UTC)
For an example of a problem see WT:RFC#सह and join the discussion there. DCDuring TALK 23:25, 12 September 2013 (UTC)[reply]
MW dictionary is perfectly valid even today (because we're dealing with an extinct language, doh), some of the entries were created before the online version of MW dictionary was available, Sanskrit grammar tradition doesn't make the distinction between proper and common nouns, 98% of the words in its definitions are perfectly valid contemporary English as far as I recall, and anyone studying Sanskrit doesn't need a meaning gloss. In other words, there are no problems with Sanskrit entries. --Ivan Štambuk (talk) 16:02, 13 September 2013 (UTC)[reply]

Pages with protolanguage information?

CodeCat and I have discussed a couple of times the question of reconstructed forms without references in Etymology sections (the most recent discussion is here). One conclusion now seems to be that it would be a good idea to have pages (perhaps in the Appendix) with more detailed historical information, including perhaps original research by Wiktionarians, on specific topics, which could then be linked to from individual words. Case in point: Proto-Baltic vs. Proto-Balto-Slavic. The current tendency goes in the direction of Proto-Balto-Slavic, but there are not many published reconstructions of words out there, whereas Proto-Baltic has clearer sources. Now, if Wiktionarians want to add Proto-Balto-Slavic etymologies, or simply replace the Proto-Baltic label ({{etyl|bat-pro|LANG}}) with the Proto-Balto-Slavic one ({{etyl|ine-bsl-pro|LANG}}) on the assumption that most PB reconstructions will be acceptable PBS reconstructions as well, wouldn't it be nice to have a page (called, say, "Appendix:Proto-Baltic and Proto-Balto-Slavic") that discusses this in detail, with correspondences, derivations, and clear statements of what things in PB we think will remain the same in PBS, and why? In this way, any changes of PB to PBS can be referred to this page: it will be the basic source for the reconstruction, and the interested reader can read it to see on what grounds we have an (as yet unpublished) PBS etymology rather than the (already published) PB one. Also, Appendix pages with reconstructed PBS words could be linked to it. One objection is that this page would contain "encyclopedic" information. Yet I feel that this kind of information is quite vital for someone who is navigating the thorny area of Indo-European etymology and wants to feel sure the etymological information given at Wiktionary is correct and accurate -- as vital as having, say, a page on IPA and its symbols, or a page with the definitions of all grammatical terms used to tag words. What do you guys think? 
--Pereru (talk) 20:25, 12 September 2013 (UTC)

I think you missed a few important parts of the original discussion. To me, the point of having a special page for this is to act as a repository of sourced knowledge related to the reconstruction of a given language, but it's also the place that we as Wiktionarians would use to collect our own conclusions about certain minor issues surrounding them. Specifically I called it a way to allow original research, while keeping it both contained and publicly accessible as a reference for etymologies and reconstructed entries on Wiktionary. Basically, to enable peer review of Wiktionary's reconstructions. I also think that the last two posts in our conversation are important:
Me: My main objection with really big stuff is that it is the kind of area where even professional linguists get things very wrong, so that makes it even more likely for amateurs to miss things. I don't have any professional schooling in linguistics, just a lot of curiosity that made me want to look for things and learn more. So I know a bit I think but what I know is not at a professional level and I don't think it is for anyone else here either. The limitations are mainly there to protect ourselves, Wiktionary and its users from our own incompetence. :P
Pereru: I am a professional linguist, though not an Indo-Europeanist (I work on South American indigenous languages). But one of the things I've learned is to stick to logic and good arguments, because (a) big stars with famous diplomas often think their fame is all they need to justify something, and (b) non-big-stars, without any diplomas, surprisingly often contribute really intelligent, insightful ideas that deserve recognition.
CodeCat 21:03, 12 September 2013 (UTC)
That is an interesting thing, and I certainly support it. But I do see the main point of having such pages in a dictionary (as opposed to a research journal) in being able to add references to specific reconstructions -- be they in ===Etymology=== sections, be they independent pages on PBS reconstructed forms. --Pereru (talk) 06:48, 13 September 2013 (UTC)
No we can not add original research by Wiktionarians in etymologies. Neither as reconstructions nor as speculations on word origins. Etymologies are like small encyclopedic articles and all of the Wikipedia policies on no OR and maintaining NPOV apply to them as well. If we allowed original research Wiktionary would become worthless as an etymological dictionary because there would be no way to differentiate among credible sources. We might as well restore H&M's Chinese phonosemantic interpretations. If you want to make up theories on word origins go write a blog or paper. It is not up to us to deem sources "right" or "wrong", but simply to collect all of the competing theories from established authorities and present them to the reader in the most appropriate fashion, taking into account issues such as neutrality, acceptance, and newness.
Proto-Baltic is an obsolete theory and it's quite irritating to see you intentionally replacing Proto-Balto-Slavic reconstructions that can be cited with the ones based on the 1980s scholarship. I don't think that there are linguists today (apart from some Russophobic Baltic nationalists) that dispute PBSl. There is no "tendency", it's a settled matter. There are many details that need to be settled, but the grouping itself is not a point of contention. --Ivan Štambuk (talk) 21:26, 12 September 2013 (UTC)
But where is it actually cited as policy that we don't allow original research? We constantly do original researching when we document definitions, why is this different? If Wiktionary editors can be lexicographers, why not also etymologists? Pereru explicitly encourages the matter and he is a professional linguist himself, so he understands what is involved. I understand that you want to differentiate reliable theories from bogus ones and that is exactly what this proposal is supposed to prevent, as the idea is just that: to collect all of the competing theories and build up a body of peer reviewed research that can be used to support reconstructions in Wiktionary articles. I wonder if you even understand what has been suggested? —CodeCat 22:10, 12 September 2013 (UTC)[reply]
Writing definitions on the basis of attestations is "original research" in much the same way that writing Wikipedia articles based on cited sources is. We do not invent new meanings, but rather collect the ones attested in usage on the basis of our CFI (which are really "criteria for attestation"). The original part there is to word the definition in a manner that doesn't coincide with any of the existing dictionaries (unless they are out of copyright). No original research is one of the pillars of Wikipedia that protects users from obscure theories, and the project itself from being a propaganda machine for every fringe group that sees the lack of an editorial or peer-review process as an opportunity to present its fringe view.
Pereru is just someone nicknamed "Pereru" (what is a "professional linguist" BTW? Somebody paid by taxpayers to produce work hidden from general populace behind paywalls and costly volumes?). I don't care if he is de Saussure reincarnated.
You seem to be conflating two separate points: 1) etymologist as somebody writing a paper on a word origin, postulating reconstructions and speculating on word origins 2) etymologist as somebody writing an etymological dictionary, which is usually done by every single headword having references to various scholarly opinions, with the etymologist then choosing what he thinks is the "best" explanation. We can only do OR in the second sense, by being a synthetic work of the most recent scholarship. Not invent reconstructions and deep theories of word origins based on our own opinions of how languages evolved. Which is what you have been doing and seem to be keen on getting community approval for. --Ivan Štambuk (talk) 08:26, 13 September 2013 (UTC)
When we apply existing principles known to linguistics to come to a reconstruction that nobody has published before, is that not just applying the same science that linguists do? My intention was specifically to allow our own reconstructions while at the same time have every detail of that reconstruction accounted for by sources. This is currently an area that is lacking like Pereru points out below; we either have references to the whole reconstruction verbatim, or none at all. For example take Template:term/t. I don't need a source to tell me that it's a sound reconstruction, because I can see that it fits perfectly with all the relevant sound laws in Balto-Slavic and its descendants. Yet it has no source because no source happens to attest this word in Balto-Slavic, even though every single phoneme of the reconstruction can be accounted for by established and sourced sound laws. Also I'm not sure why you think there would be a lack of peer review. I specifically noted that the whole point of this is peer review, and wikis as a whole are founded on the principle of peer review. So fringe theories would be rejected because there is no consensus for them on Wiktionary. As long as we assume that Wiktionary editors are knowledgeable about the area, there would be peer review of new reconstructions to ensure that the science has been applied correctly according to the most mainstream theories. I have done this with many Germanic reconstructions in the past, and it has worked well. —CodeCat 12:09, 13 September 2013 (UTC)[reply]
I'll have to agree with CodeCat here: where is it said that there can be no original work? To me, it seems every time you add new definitions to words -- definitions not previously published in other dictionaries --, you are doing original work. Where is it said that original work is not OK on Wiktionary, and why? (All I've seen is references to "Wiktionary is not Wikipedia".)
On your objections:
(a) If we allowed original research Wiktionary would become worthless as an etymological dictionary because there would be no way to differentiate among credible sources -- Why not? All you need to do is make accurate references. If you're taking something from a published source, by all means refer to it! (Shall we make it official policy that reconstructed protoforms are only allowed here with references?) If you're proposing one, write a page here with the details not yet found in published sources and refer to it! In what way is this confusing, and how would this make it impossible to differentiate among credible sources? If anything, references would make it easier to differentiate among these sources... (On the subject of original research, I refer to published etymological dictionaries, in which the authors often advance original contributions and ideas for specific words, always carefully labeling them -- in the LEV, with a letter "K" at the end -- as the author's own work).
(b) It is not up to us to deem sources "right" or "wrong", but simply to collect all of the competing theories from established authorities and present them to the reader in the most appropriate fashion, taking into account issues such as neutrality, acceptance, and newness -- I agree fully. But note that most etymologies thus far presented here at Wiktionary are not like that: they are given without a source, and the casual reader has no way of judging whether they were presented "appropriately", with attention to "neutrality, acceptance, and care". It seems to me that adding a page in which things like PBS vs. PS etymologies could be explicitly discussed would be a great step forward in the direction of achieving precisely the goal you state. (In fact, here is another suggestion: how about a page, maybe in the Appendix, discussing precisely the good and bad points of all published sources for PIE etymologies that are used at Wiktionary, and why we trust some of them more than others? In the interest of full transparency and disclosure, wouldn't this increase the level of precision, as well as trustworthiness, of Wiktionary etymologies as a whole?)
(c) Proto-Baltic is an obsolete theory and it's quite irritating to see you intentionally replacing Proto-Balto-Slavic reconstructions that can be cited with the ones based on the 1980s scholarship. -- If they can be cited, why is (almost) nobody doing that? I've seen a couple of good citations of PBS forms (usually by you, actually), but most PBS forms proposed here have no support in published sources and, as per your own policy (in (b) above), should not be here at all. So why are they, and why is it wrong to remove them and replace them with sourced ones?
I don't care how "well established" you think PBS is (and a couple of Leiden specialists I've talked to -- both Dutch, not "Russophobic Baltic nationalists", whatever that is -- would beg to differ from you): the issue here is "what published source does a given reconstructed form come from"? Currently, almost nobody is adding sources to reconstructions here. If you have a good, published source for PBS etymologies, by all means refer to it! Heed your own advice! But when I see PBS forms being added without supporting evidence, and that in a world, no matter how well established you think PBS to be as a hypothesis, in which published PBS reconstructions are still few and far between, I think that the best policy is -- as you yourself propose! -- to trust the published sources, in which PB is still much more frequent. And, to follow this policy -- which, again, you yourself explicitly subscribe to! -- I delete, and will go on deleting, unsourced PBS etymologies and replacing them with sourced PB ones. After all, in a dictionary, sourced should always defeat unsourced. If a PBS etymology is sourced, it stays. If it isn't, it doesn't. I honestly don't see how you can subscribe to the "honesty and neutrality" policy you described above, and still disagree with that. Unless you simply want to push your personal vision of "what's right" in PBS reconstructions -- in which case, how is this NPOV?
Alternatively, you can do what CodeCat suggests: write a page in which YOU say why it is that PB reconstructions should be relabeled as PBS even in the absence of a published source that explicitly states that PBS = PS. You can sketch arguments, give examples, correspondences, etc... and then cite this page as your source.
How on earth would this be confusing, and how would this create trust problems for Wiktionary? Please riddle me that! If anything, what we're recommending is that things be done more responsibly, and with more references. Don't you think that the current etymologies-without-references bonanza creates a much, much worse trust problem than any PBS-vs-PS page would?
I end up having to agree with CodeCat above: I think you didn't understand what it is you're disagreeing with. There is no contradiction between what is proposed here and any of the principles you espouse. Please read it again. --Pereru (talk) 06:48, 13 September 2013 (UTC)[reply]

Just out of curiosity, because I still don't understand what is at stake here: what's exactly the difference between Proto-Balto-Slavic and Proto-Baltic? Does Proto-Balto-Slavic theory say that there was simply no Proto-Baltic language, but that Latvian and Lithuanian evolved from Proto-Balto-Slavic exactly the same way that Proto-Slavic did? I've just drawn this (sorry for the probably simplistic view) so... which tree represents best the actual Proto-Balto-Slavic theory? The second or the third one? --Fsojic (talk) 13:18, 13 September 2013 (UTC)[reply]

The first has been more or less discredited, although some still hang on to it, maybe for political reasons. The second is how linguists generally saw it in the past. Newer research suggests that there are really three branches of Balto-Slavic (not the same as your third image): East Baltic, West Baltic, and Slavic. Each of those, it is supposed, had its own proto-language, but the proto-language of East and West Baltic together (what is called "Proto-Baltic") is not demonstrably different from Proto-Balto-Slavic itself. That is, if you try to find out what the common ancestor of all Baltic languages was, then you end up with a language that Slavic can also descend from. —CodeCat 14:30, 13 September 2013 (UTC)[reply]
But West Baltic evidence is limited. If one reconstructs a word from Latvian and Lithuanian - or East Baltic in general - alone because there is no known corresponding word in Old Prussian - or West Baltic in general - and label it as Proto-Baltic rather than Proto-East-Baltic (and I suppose some do this; well, I don't know), can we be sure it's the root for Proto-Slavic as well? --Fsojic (talk) 15:17, 13 September 2013 (UTC)[reply]
It's a matter of applying knowledge of how each language evolved, and then making all the ends fit together. Linguists formulate the phonetic evolution of a language through a series of ordered rules called "sound laws", each of which changes the pronunciation of words in some specific way under certain conditions. The sound laws for the Balto-Slavic languages are all more or less known, with some difficulty in the details still, but the general picture is clear. This means that it's fairly easy to find out if a given form can be an ancestor for a given Slavic term. All you need to do is apply all the Balto-Slavic-to-Slavic sound laws and see if the result you get matches what is actually found in attested Slavic or in reconstructed Proto-Slavic. An example: you start with Proto-Balto-Slavic *dūmas. There are two sound laws that apply in this particular case. The first is Balto-Slavic *ū > Slavic *y, the second is masculine nominative singular Balto-Slavic *-as > Proto-Slavic *-ъ. Applying these two rules together gives *dūmas > *dymъ. And that is the form that is actually found in Slavic (see *dymъ). Thus, the reconstruction is correct for Slavic. The same can then be applied to all the other Balto-Slavic languages, and if it matches all of them, then you have successfully reconstructed a Proto-Balto-Slavic term. —CodeCat 15:27, 13 September 2013 (UTC)
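For anyone who finds it easier to follow as code, the check described above can be sketched roughly as below. This is only an illustration: it encodes just the two sound laws named in the example, the names `SOUND_LAWS`, `apply_sound_laws` and `is_possible_ancestor` are made up for this sketch, and a real rule set would be far larger and would have to care about rule order and conditioning environments.

```python
import re
import unicodedata

# Ordered list of (regex pattern, replacement) pairs.
# Only the two laws from the example above; everything else is omitted.
SOUND_LAWS = [
    ("ū", "y"),     # Balto-Slavic *ū > Slavic *y
    ("as$", "ъ"),   # masc. nom. sg. *-as > Proto-Slavic *-ъ (word-final only)
]

def apply_sound_laws(form, laws=SOUND_LAWS):
    """Apply each sound law in order to a form written without the leading asterisk."""
    # Normalise so that precomposed and combining spellings of ū compare equal.
    form = unicodedata.normalize("NFC", form)
    for pattern, replacement in laws:
        pattern = unicodedata.normalize("NFC", pattern)
        form = re.sub(pattern, replacement, form)
    return form

def is_possible_ancestor(balto_slavic_form, slavic_form):
    """True if the sound laws turn the Balto-Slavic form into the Slavic form."""
    expected = unicodedata.normalize("NFC", slavic_form)
    return apply_sound_laws(balto_slavic_form) == expected

print(apply_sound_laws("dūmas"))               # dymъ
print(is_possible_ancestor("dūmas", "dymъ"))   # True
```

The same function could be run with a different rule list for each descendant branch; a candidate reconstruction would pass only if every branch's rules yield the form found in that branch.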