Wiktionary:Beer parlour/2013/September: difference between revisions

From Wiktionary, the free dictionary
:::: The claim that "Katze" is a valid German translation of English "cat" is a ''fact'', not an ''opinion''. --[[User:Dan Polansky|Dan Polansky]] ([[User talk:Dan Polansky|talk]]) 10:08, 18 September 2013 (UTC)
::::: we define foreign words with one word translations where we can; also I believe you are correct because of the derivative work rule (regarding your first paragraph). [[User:Mglovesfun|Mglovesfun]] ([[User talk:Mglovesfun|talk]]) 10:11, 18 September 2013 (UTC)
:::::: I believe that translation pairs of single words themselves cannot be copyrighted. Translation pairs of single English words are facts. It is difficult to show originality in translation pairs when it comes to translation dictionaries. Originality in translation pairs does not apply to single word translations. These translation rules you are imposing are getting out of hand. Relying on only one source is what may constitute a copyright violation. [[User:Tedius Zanarukando|Tedius Zanarukando]] ([[User talk:Tedius Zanarukando|talk]]) 02:40, 19 September 2013 (UTC)

Revision as of 03:51, 19 September 2013


Term only citable with different spellings counting

I can find one hit for Copenhagenisation and two for Copenhagenization (meaning “(sociolinguistics) the process of Danish speakers beginning to use the dialect of Copenhagen”). Not enough citations for either, but they’re just different ways of spelling the same word, so should they be included? — Ungoliant (Falai) 12:35, 2 September 2013 (UTC)

Our entries are for spellings. DCDuring TALK 13:41, 2 September 2013 (UTC)
There is support for it (Wiktionary:Information_desk/Archive_2012/July-December#Request for clarification: How strict is WT:CFI regarding attestation of spellings which vary slightly?). — Ungoliant (Falai) 14:05, 2 September 2013 (UTC)
Why do you call that support? DCDuring TALK 15:52, 3 September 2013 (UTC)
Support creating both entries. I don't think there is much point in gerrymandering the CFI to exclude terms merely because of spelling differences. It's the same word. —CodeCat 13:58, 2 September 2013 (UTC)
Agreed. If it were a regional term or an alternative spelling, where the spelling is what's in question, it might be different, but -ize and -ise are substituted into words by an extremely regular and mechanical process analogous to inflection (most of the time, we don't even notice we're doing it). If we accept plurals for singular lemmas, or past for present lemmas, we should accept these. Chuck Entz (talk) 14:41, 2 September 2013 (UTC)
Yup, though not without exception. (Consider compromise and exercise and advertise, whose counterparts in <-ize> are quite rare by comparison. And, for that matter, consider matrices and hypotheses and phalanges, whose regularly-backformed singulars matrice and hypothese and phalange are, similarly, quite rare compared to the standard singulars. So we do need to exercise caution.) —RuakhTALK 21:49, 2 September 2013 (UTC)
I agree with CodeCat. —RuakhTALK 21:49, 2 September 2013 (UTC)
Yet another step that means an increase in quantity and a decrease in quality of entries. DCDuring TALK 15:52, 3 September 2013 (UTC)
You’re just being a concern troll. — Ungoliant (Falai) 18:34, 3 September 2013 (UTC)
I'm not sure what "step" you're referring to. Are you implying that hitherto we have not allowed entries in cases where a word meets the CFI but has not had any individual spellings/forms that do? —RuakhTALK 20:29, 3 September 2013 (UTC)
Exactly. Am I wrong? I know I am not wrong about the poor quality of our definitions, both English and other. It's hard to say whether they are getting worse or not as we have no metrics (not that we could readily develop any, except on a sample basis). I'm quite sure that our definitions are not rapidly improving and that we are constantly adding FL terms with ambiguous glosses. DCDuring TALK 23:58, 3 September 2013 (UTC)
Support including the term, but I would like to see one proper entry for a lemma, plus a form-of page.
Our pages are for spellings, but many of our full entries are for lemmas, with form-of references for inflections and spelling variations. The latter is a much better arrangement for the reader, and also for integrity of the dictionary, per the w:DRY principle. Our citation practices encourage me to think that we cite terms, not spellings: “Unlike the main space, inflected forms and alternate spellings should be redirected to the primary entry. Variations in case should be on the same page, with the other(s) redirecting, even if the definitions are distinct” (from WT:CITE#Naming).
Some of these are also citable: Copenhagenize/Copenhagenise, Copenhagenized/Copenhagenised, Copenhagenizes/Copenhagenises, Copenhagenizing/Copenhagenising. Michael Z. 2013-09-06 04:33 z
Lightly object. To generalize from this example, we're talking about words that have two spellings in English, which means you can't come up with 5 examples in English--with Google Books, that's not usually a huge hurdle. You're also usually talking about words that are predictable variations on words we should have; if we have Copenhagen, Copenhagenization should be pretty clear. I don't see the benefits as being huge.--Prosfilaes (talk) 05:52, 6 September 2013 (UTC)

Wiktionary's definition of a word is spelling-based, and I don't see why we should make an exception for Copenhagenisation and Copenhagenization. If both can be independently cited they both deserve a separate entry, with one being a lemma and the other an alternative form, misspelling, or whatever. --Ivan Štambuk (talk) 17:22, 12 September 2013 (UTC)

I'd've thought this was a good idea for things that Wiktionary:Entry layout explained either doesn't mention or doesn't give an unambiguous verdict on. For example, definitions may be formatted as sentences, or not. There's very little consistency. Even with two consecutive definitions in a single entry, the first will sometimes have an initial capital and a full stop, and the second will have neither. Mglovesfun (talk) 11:11, 3 September 2013 (UTC)

Hi,

Is there a system of good or featured articles here, like on Wikipedia (Wikipedia:Featured articles/Wikipedia:WikiProject Good articles)?

Thanks in advance, Automatik (talk) 13:18, 3 September 2013 (UTC)

But we do have WT:WOTD. DCDuring TALK 13:25, 3 September 2013 (UTC)
Thanks for your answer. Automatik (talk) 15:35, 3 September 2013 (UTC)
But WT:WOTD isn't really comparable. There are no quality requirements on WOTD and no process for "bringing an entry up to WOTD level". The non-English WOTD must have a pronunciation and at least one citation (at least one mention for a limited-documentation language), but the requirements on English WOTD all have to do with the nature of the word itself, not with the quality of the entry. —Angr 21:46, 3 September 2013 (UTC)
That's true, but I think it's in part because we can improve an English entry quickly once it's announced as an upcoming word of the day. —RuakhTALK 04:14, 7 September 2013 (UTC)

Hello, I came from the French Wiktionary too. We are trying to create a system for quality evaluation, and it seems no other Wiktionary has a system like that. Do you want to join the discussion? If so, we need to work on it for a few more weeks, and then we can translate it into English to share the ideas with you. Eölen (talk) 22:52, 3 September 2013 (UTC)

Can you please provide a link to the relevant discussion on the French Wiktionary? --Ivan Štambuk (talk) 17:16, 12 September 2013 (UTC)

Since I saw it on a "needed badly" list somewhere, I decided to start this page. It has been brewing on my disk for some time. I loosely based it on WT:ACS, while trying to explain some grammatical features, and highlight a few gaps in current practices. Please tell me what you think, whether anything is missing, needs change or an explanation. Keφr 10:34, 6 September 2013 (UTC)[reply]

Good work! Not much to criticise or suggest at this point. I'll watch this project and may use something to add to Wiktionary:About Russian. I'd like to see more treatment of verbs, including perfective/imperfective (not just entries but translations), abstract/concrete, semelfactive. Also interested in the policy for reflexive verbs, which seem to be handled differently across languages (separate entries or separate senses?). Polish could perhaps use more etymology info, which can often be looked up at Serbo-Croatian (or sometimes Russian) entries with Proto-Slavic derivations. --Anatoli (обсудить/вклад) 23:52, 8 September 2013 (UTC)[reply]
I do not remember ever encountering a semelfactive aspect which would be distinct from perfective. Translations - noted, will write something up. I tried to be descriptive of current practices rather than prescriptive, so if you people want to discuss how the policy ought to be, feel free. Not sure what you mean by the abstract-concrete distinction. Remember, this is not a complete guide to Polish grammar, just a quick summary to explain how it is relevant to presenting terms in Wiktionary. Keφr 07:25, 9 September 2013 (UTC)[reply]
Re: semelfactive vs simply perfective, example: "krzyknąć" and "pokrzyczeć" are both perfective, the former is semelfactive (instantaneous, momentive), the latter is not. Abstract vs concrete (verbs of motion only): "chodzić"/"iść". I've added some categories for a few Slavic languages other than Russian before. Your project page doesn't have to describe all that, of course. --Anatoli (обсудить/вклад) 12:24, 9 September 2013 (UTC)[reply]
And I used to think that aspect is an easy language… aspect. Did you notice the mention of frequentatives? Any idea whether and how this aspect mess should be handled? (I think I remember Russian having a similar feature.)
The abstract-concrete gave me some idea, but I am not sure I got it right. I think you will not find a good translation of the verb go in all its generality. The verb iść still sort-of refers to using feet, even if the main focus is on something else.
And this page is not "mine" by any standard. If you think you have something to add, go ahead. In the worst case you will get reverted once or twice. Keφr 13:54, 9 September 2013 (UTC)[reply]
Although I would not mind having the former type of page somewhere in here, to be honest. There are a few languages I would like to learn, but have something of a hard time finding good resources. A brief grammar reference would be helpful. Keφr 07:28, 9 September 2013 (UTC)[reply]

As far as I can tell, Wiktionaries can be classified into four groups:

  1. Regular Wiktionaries, like fr.wikt and es.wikt. Except for various annoying edge cases that aren't the subject of this discussion, these work just fine, and exactly as you'd expect.
  2. Nonexistent Wiktionaries that redirect to the Wikimedia Incubator, like vep.wikt. I'm not sure quite how we should handle these, but I think we can basically do whatever we want; we just need to decide what we want to do with them, and then do it. Interwiki-links to [[:vep:...]], for example, work fine, linking to vep.wikt URIs that redirect to Incubator URIs.
  3. Nonexistent Wiktionaries that don't redirect to the Wikimedia Incubator, like zza.wikt. With these we can do whatever we want for translation-links (we just have to link directly to the Incubator entry if we want that), but interwiki-links are uglier (we'd have to add them JavaScript-ically).
  4. Closed/locked Wiktionaries, like aa.wikt and dz.wikt. (I suppose these could be considered a subset of the previous.) These are annoying, because they have some existent pages, and they have database-dumps, but redlinks to them are rather pointless (since content can't be added), even bluelinks to them are rather dubious (since problematic content can't be fixed or removed), and in some (most? all?) cases there's at least as much content on Incubator as on the Wiktionary domain itself.

Group #1 needs no discussion, but how do we want to handle each of groups #2–4?

RuakhTALK 06:21, 7 September 2013 (UTC)[reply]

Since no one's weighed in yet, here are my own views:
  • we should never link to closed/locked Wiktionaries — not as interwiki-links, and not as translation-links.
  • we should never link to non-existent pages on Incubator — not as interwiki-links (obviously), and not as translation-links.
  • when a translation has an appropriate-language Wiktionary entry on the Wikimedia Incubator, we should link to it using {{t+}}. (Note: since e.g. [[zza:...]] and [[aa:...]] don't work properly, this will require a change to the translation-templates. Actually these templates are already a bit broken when it comes to languages without Wiktionaries — {{t|zza|foo}} links to a page named zza:foo on en.wikt — so we'll want to make some sort of change to them regardless.)
  • when an interwiki-link would appropriately link to a redirect to an existent entry on the Wikimedia Incubator, we should use it. For example, [[April]] should include [[vep:April]] among its interwiki-links.
  • when an entry exists on the Wikimedia Incubator, but an interwiki-link wouldn't work, should we hack up some JavaScript to make it work? I'm not sure.
RuakhTALK 19:47, 7 September 2013 (UTC)[reply]
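The positions listed above can be read as a small decision table. What follows is only an illustrative sketch in Python, not anything that exists in the translation templates or in MediaWiki: the `WiktStatus` enum, both function names, and the returned strings are hypothetical, and the "undecided" branch simply mirrors the open JavaScript question instead of settling it.

```python
from enum import Enum


class WiktStatus(Enum):
    """The four groups from the classification above (names are hypothetical)."""
    REGULAR = 1                # e.g. fr.wikt, es.wikt
    INCUBATOR_REDIRECT = 2     # e.g. vep.wikt
    INCUBATOR_NO_REDIRECT = 3  # e.g. zza.wikt
    CLOSED = 4                 # e.g. aa.wikt, dz.wikt


def translation_link(status, entry_exists):
    """Suggest how a translation-link should behave under the proposal."""
    if status is WiktStatus.REGULAR:
        return "usual handling"  # group 1 needs no discussion
    if status is WiktStatus.CLOSED:
        return "no link"  # never link to closed/locked Wiktionaries
    # Groups 2 and 3: link only to Incubator entries that actually exist.
    return "t+ to Incubator entry" if entry_exists else "no link"


def interwiki_link(status, entry_exists):
    """Suggest how an interwiki-link should behave under the proposal."""
    if status is WiktStatus.REGULAR:
        return "usual handling"
    if status is WiktStatus.CLOSED or not entry_exists:
        return "no link"
    if status is WiktStatus.INCUBATOR_REDIRECT:
        return "interwiki link"  # e.g. [[vep:April]] on [[April]]
    # Group 3 would need JavaScript; the proposal leaves this open.
    return "undecided (would need JavaScript)"
```

For example, `interwiki_link(WiktStatus.INCUBATOR_REDIRECT, True)` yields `"interwiki link"`, matching the [[vep:April]] case, while any closed Wiktionary yields `"no link"` for both kinds of link.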
Sounds all reasonable to me on the face of it. As for the Javascript question in the last item, my instinct would be to avoid adding Javascript unless it generates significant added value, which does not seem to be the case. --Dan Polansky (talk) 20:14, 7 September 2013 (UTC)[reply]

IMHO, apart from top-X (where X < 5), other Wiktionaries are so much inferior in quality that linking to them in both interwikis and translation tables seems like a waste of time, database space and edit counts. --Ivan Štambuk (talk) 17:10, 12 September 2013 (UTC)[reply]

Number forms

Based on the Category:Inflections, I believe Wiktionary needs a new category called Numeral forms because some languages have inflections for their cardinal numbers. I hope this isn't a difficult suggestion. --KoreanQuoter (talk) 18:08, 7 September 2013 (UTC)[reply]

But we already have one? —CodeCat 18:26, 7 September 2013 (UTC)[reply]
I tried to make a separate page for одно (neuter form of один) and I think Numeral forms is more appropriate for a category. --KoreanQuoter (talk) 18:51, 7 September 2013 (UTC)[reply]
I still don't understand. What is wrong with the existing numeral forms category? —CodeCat 19:24, 7 September 2013 (UTC)[reply]
Wait. There was an existing numeral forms category? --KoreanQuoter (talk) 05:47, 8 September 2013 (UTC)[reply]
…yes? Keφr 06:02, 8 September 2013 (UTC)[reply]
Oh. Silly me. Thank you. --KoreanQuoter (talk) 06:18, 8 September 2013 (UTC)[reply]

CFI and Wiktionary is not an encyclopedia

I have created vote Wiktionary:Votes/pl-2013-09/CFI_and_Wiktionary_is_not_an_encyclopedia. I propose to remove or at least trim WT:CFI#Wiktionary is not an encyclopedia section.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 08:08, 8 September 2013 (UTC)[reply]

Let's keep the comments on the talk page of the vote. Mglovesfun (talk) 09:06, 8 September 2013 (UTC)[reply]

CFI and trimming the Idiomaticity section

I have created vote Wiktionary:Votes/pl-2013-09/CFI and trimming the Idiomaticity section.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 09:00, 8 September 2013 (UTC)[reply]

Underestimating idiomaticity of Finnish translations

While going around fixing translation lists, I noticed that very often, the Finnish translations are marked up as if they were sum-of-parts. At first I thought, "well, I guess Finnish is weird", but recently I started to doubt the accuracy of that characterisation. Take door-to-door. The Finnish translations listed look like simple inflections of the Finnish word for "door" (ovi). The English meaning of "door-to-door" is apparently idiomatic, so I have quite a hard time imagining how the Finnish entry, which breaks down into constituents pretty much the same way, would be sum-of-parts. I am also suspicious of entries where Finnish translations are broken into roots and affixes.

Do you think we should go over these? Keφr 11:54, 8 September 2013 (UTC)[reply]

"ovelta ovelle" translates literally as "from door to door". (deprecated template usage) ovelta is the ablative case of (deprecated template usage) ovi and means "(away) from a/the door", while (deprecated template usage) ovelle is the allative case and means "to/towards/onto a/the door". —CodeCat 12:19, 8 September 2013 (UTC)[reply]
I've noticed this too and I think it's just the way they've been added (by a human editor) and nothing to do with the language itself. Mglovesfun (talk) 19:58, 8 September 2013 (UTC)[reply]
So in that case, should we not have an entry for the whole phrase and link to it in the translations list? Keφr 20:55, 8 September 2013 (UTC)[reply]
I would trust Hekaheka's judgement on how to translate into Finnish. We should invite her if there is any doubt. Translations may be done as "solids" (if they are idiomatic in the target language) or using "sum of parts" methods. Using "Template:tø" allows you to see individual components and the grammar but "ovelta ovelle" requires the entry to exist or at least interwiki. --Anatoli (обсудить/вклад) 22:30, 8 September 2013 (UTC)
I agree (w/Anatoli). Even if it is just a case where the Finnish-speaking editors have made a different decision than most of the non-Finnish-speaking editors would have made . . . well, that doesn't seem like a big deal to me. There are a lot of things that it's important to be consistent about across languages, but I'm not sure this is one of them. —RuakhTALK 23:16, 8 September 2013 (UTC)[reply]
There are pros and cons to both approaches but it's often safer to use the "SoP" approach, even if it's more time-consuming. It usually draws less criticism. This is the first time I have seen criticism of the SoP approach. --Anatoli (обсудить/вклад) 04:32, 9 September 2013 (UTC)
Safer? Maybe. Though note that [[ovelta ovelle]] does exist in this case. The specific problems I see with overmarking as sum-of-parts are that 0) this discourages creation of entries for terms which may be non-trivial to translate into English; 1) these terms will not be picked up by Yair rand's gadget on the search page, and it does not seem obvious to me how it could be extended to do so; leading into 2) the translation of such a term will be harder and more time-consuming, especially when the entries for the constituent words are missing some meanings. So, marking translations as idiomatic can be beneficial even when that makes them redlinks. Keφr 07:09, 9 September 2013 (UTC)
By "safer" I mean in terms of someone disputing idiomaticity. I've translated double-decker bus and death camp as SoP двухэтажный автобус m (dvuxetážnyj avtóbus) and Template:tø as tomorrow someone may dispute both the English terms and the Russian translations. It's still educational to see the grammar of the translations, showing the individual parts and how the translation is made. --Anatoli (обсудить/вклад) 13:00, 9 September 2013 (UTC)[reply]
Educational — I am not denying that. Having these translations listed as SoPs is helpful, even if only by the virtue of it being better than having no translations at all. But the grammatical structure of a multi-word term can be also analysed in the whole term's entry, and can also be inferred by hand when the constituent word pages are reasonably complete, so this is not hugely relevant. Creating pages for these I find rather easy. Though granted, I do not always create these myself.
All I wanted to know is whether everyone is okay with treating such terms as SoPs. There are many other examples, they often land in Category:Translations to be checked (Finnish), because I usually just leave them there when xte suggests reviewing the translation. (Hekaheka would probably like to call me a perkeleen vittupää because of that, but oh well. Cannot please everyone.) Keφr 14:10, 9 September 2013 (UTC)[reply]

Merging Mari and Buryat varieties

Can we or should we merge some varieties of Mari and Buryat at Wiktionary?

  • Mari (chm, mhr, mrj):
Hill/Western Mari (mrj) can probably stay separate, it has a few more letters than the standard or Eastern (Meadow) Mari. Extra Cyrillic letters in Western Mari: Ӓ, ӓ and Ӹ, ӹ and they don't use standard Mari letter Ҥ, ҥ. This variety has about 30 thousand speakers. It's still possible to merge if Western Mari has context labels and the additional letters are handled. Anyway, chm and mhr can be merged safely.
Language codes with names:
  • chm - "Mari", "Standard Mari"
  • mhr - "Eastern Mari", "Meadow Mari"
  • mrj - "Western Mari", "Hill Mari"


  • Buryat (bxr, bxu, bxm, bua):
Russian and Mongolian Buryat use the same alphabet. Mongolian and Vagindra are hardly used. The overwhelming majority of Buryats live in Buryatia, some in Mongolia, even fewer in China.
Language codes with names:
  • bxr - "Russia Buryat"
  • bxm - "Mongolia Buryat"
  • bxu - "China Buryat"
  • bua - "Buryat", "Buriat"
(it's obvious that at least one is redundant)

--Anatoli (обсудить/вклад) 04:53, 10 September 2013 (UTC)[reply]

Meadow and Hill Mari have separate written standards so they should be kept separate (and the chm code deleted). Buryat, easy thing, merge them. There is no difference between these 'lects, except perhaps for some loanwords. -- Liliana 07:39, 10 September 2013 (UTC)[reply]
Seems like we have an agreement on Buryat. So we can delete bxr, bxm and bxu and make bua the only code for Buryat.
With Mari, I would rather delete mhr and leave the name "Mari". Standard Mari is "Eastern Mari" or "Meadow Mari" and chm is more common. OK, let's leave mrj but I'll make a transliteration page and a module, which works for both alphabets. --Anatoli (обсудить/вклад) 22:42, 10 September 2013 (UTC)[reply]

Overriding manual transliteration

This has been discussed in other pages, but no consensus was reached.

Automated transliteration works perfectly for several languages, such as Armenian. Some suggest always overriding manual transliteration for these languages, because many manual transliterations are incorrect due to human error or inconsistent (due to changes to the transliteration system, etc.). Others say we should always let editors use |tr=.

Another solution is to remove the old manual transliterations for terms in these languages and not override manual transliterations after that. (We can put the pages that use |tr= for terms in these languages in a category to keep track of them.) --Z 13:29, 10 September 2013 (UTC)

Wouldn't we want to be able to let that be decided on a language-by-language basis? What about also allowing the overriding of everything with tr=, but allowing tr0= to override bad automatic transliterations, also on a per-language basis? DCDuring TALK 14:47, 10 September 2013 (UTC)[reply]
It is being decided language-by-language. See the override_translit section of Module:links. --Vahag (talk) 15:15, 10 September 2013 (UTC)[reply]
I support overriding manual transliteration for languages whose automatic transliteration works perfectly, e.g. Armenian, Georgian. For such languages manual transliteration will be redundant in the best case and wrong in the worst case. --Vahag (talk) 15:15, 10 September 2013 (UTC)[reply]
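To make the per-language choice concrete, here is a minimal sketch in Python. It is not the code in Module:links (which is a Lua module and is not reproduced in this discussion): the `OVERRIDE_TRANSLIT` table, `choose_transliteration`, and the toy transliterator are all hypothetical, and the language codes "hy" (Armenian) and "ka" (Georgian) are used only as examples of languages where automatic output would win.

```python
# Hypothetical policy table: languages whose automatic transliteration is
# trusted enough to replace whatever an editor typed in |tr=.
OVERRIDE_TRANSLIT = {"hy": True, "ka": True}


def choose_transliteration(lang, term, manual_tr, auto_translit):
    """Pick the transliteration to display for one term.

    auto_translit is any callable (lang, term) -> str or None; None means
    that no automatic transliteration is available for that language.
    """
    auto = auto_translit(lang, term)
    if auto is not None and OVERRIDE_TRANSLIT.get(lang, False):
        return auto  # automatic output overrides the manual |tr=
    if manual_tr:
        return manual_tr  # otherwise the editor's |tr= is kept
    return auto  # no manual value: fall back to automatic (may be None)


def toy_auto_translit(lang, term):
    """Stand-in transliterator for the example; covers a single word."""
    table = {("hy", "կատու"): "katu"}
    return table.get((lang, term))
```

With this arrangement a wrong manual value on an Armenian term is ignored, while a language missing from the table keeps whatever the editor supplied. The alternative of stripping old |tr= values once and never overriding afterwards would need no policy table at all.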

Great language game

I know that this isn't a forum, but there isn't really anywhere else to put it. And it would be a shame not to share it because I think many people on Wiktionary will like it. There's a new website called the Great Language Game where you can see how well you can tell different languages apart by ear. I seem to do pretty well with it, I hope it's fun to others as well. —CodeCat 12:32, 12 September 2013 (UTC)[reply]

Love it, thanks. Only 800 for me... And I got lucky, I kept ending up with Slavic languages. --Fsojic (talk) 20:16, 12 September 2013 (UTC)[reply]

What can be done to improve quality?

The more my wanderings take me to visit a wide range of non-English entries, the more I think that the English-language entry quality problem is not our only quality problem.

For non-English entries the problems range from the near-incoherent terseness of our copyings of a 110-year old Sanskrit dictionary to the frequent presentation of non-idiomatic calques as glosses and the use of terms that simply don't belong in a definiens of a contemporary dictionary due to the age, rareness, or unglossed polysemy of the term or terms used.

For English entries the quality problem includes the obsolete language of definiens and the poor coverage of polysemic terms, especially uses that developed in the 20th century and remain common today. The entries for polysemic terms contain many important definitions that are buried and lost in visual clutter. The definiens of many terms includes words that are rare and/or technical when neither characteristic is necessary.

Are there technical means that could help? An example might be processing the dumps to identify uses of terms labeled rare, obsolete, archaic in definiens. Or words used only once in any definiens.
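As a rough illustration of the dump-processing idea, the sketch below assumes the dump has already been parsed into a plain dictionary mapping each term to a list of (labels, definition) pairs. That input shape, the function names, and the choice to flag a word only when all of its senses are labelled rare, obsolete, or archaic are assumptions made for the example, not an existing tool.

```python
import re
from collections import Counter

FLAGGED_LABELS = {"rare", "obsolete", "archaic"}
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def tokenize(text):
    """Lower-case a definition and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


def flagged_terms(entries):
    """Terms whose every sense carries at least one flagged label."""
    return {
        term
        for term, senses in entries.items()
        if senses and all(FLAGGED_LABELS & set(labels) for labels, _ in senses)
    }


def audit_definitions(entries):
    """Return (hits, once).

    hits: (entry, word) pairs where a definition of `entry` uses a flagged word.
    once: words that occur exactly once across all definitions.
    """
    flagged = flagged_terms(entries)
    counts = Counter()
    hits = []
    for term, senses in entries.items():
        for _labels, definition in senses:
            words = tokenize(definition)
            counts.update(words)
            for word in sorted(set(words) & flagged):
                if word != term:
                    hits.append((term, word))
    once = sorted(word for word, n in counts.items() if n == 1)
    return hits, once


# Tiny made-up sample standing in for parsed dump data.
SAMPLE = {
    "yclept": [(["archaic"], "called; named")],
    "cat": [([], "a small domesticated animal yclept feline")],
    "dog": [([], "a domesticated animal")],
}
```

Running `audit_definitions(SAMPLE)` reports that the definition of "cat" uses the archaic word "yclept". The once-only list would be very noisy on real data, so it would probably need a stop list or lemmatisation before it was useful.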

What can we do to get more effort by existing and past editors devoted to entry improvement?

Are there helpful ways to more actively recruit or develop contributors? DCDuring TALK 12:56, 12 September 2013 (UTC)[reply]

I do think that we should avoid using obscure terms in definitions, but sometimes there just happens to be that one word that describes it so much better than anything else. In such cases I usually prefer to show both. I often include multiple glosses if it helps to narrow the meaning down more.
I'm not sure if there is much we can do to increase the effort. People will work on what they feel like working on. We can raise awareness, but that's about all we can do. Wiktionary is pretty decentralised and we have no central announcement system that everyone is guaranteed to see, except for WT:NFE which a lot of people ignore regardless. So if we want to raise awareness of issues we first need some kind of global platform to raise them on to begin with. Beer Parlour isn't really enough.
As for visual clutter, I think this is a real problem and I think it could be improved substantially by adopting a visual style similar or identical to the French Wiktionary. Their use of colours, borders and icons is far easier on the eye and does a lot to direct the user's attention to certain parts of the page. It makes things stand out more and gives visual structure to the page which is pretty much essential. —CodeCat 13:19, 12 September 2013 (UTC)[reply]
For foreign language entries, heavy use of glosses or listing several possible translations is a must. For example, in the Serbo-Croatian entry Template:l/sh/Latn, a definition given is “binding”. The reader is left to guess which sense of Template:l/en it refers to. When I add a Portuguese entry, I always try to add enough information via glosses and possible translations so the user won’t need to follow any link nor rely on guesswork to understand precisely what the term means.
Shortcut glosses like “(all senses)” should be avoided as well, IMO, as they can lead to error. — Ungoliant (Falai) 13:28, 12 September 2013 (UTC)[reply]
"For foreign language entries, heavy use of glosses or listing several possible translations is a must." Very much this. I already do this when adding entries in Polish, and I put a similar recommendation at WT:APL#Definitions. This should be a project-wide policy, because the reasons I gave there are not exclusive to Polish at all. This is a no-brainer for anyone who deals with translations, really.
As for the visual side, I will disagree. I actually like our, shall I say, ascetically colourless style. I think the best solution would be to convert pages to some kind of semantic markup so that we do not have to enforce any particular style at all. Dislike a style? Switch your skin.
Regarding the lack of a central propaganda tube, I would add a "add N4E to watchlist" link to the welcome template, and maybe streamline the template to the most essential bits. Could help. And the Beer parlour does not cut it, I presume, partly because the main BP page is just too damn big, loads slowly, and you have to keep adding and removing per-month pages to your watchlist to keep being updated, which is tedious. Wikipedia's archive pages system is better in this regard, although it has its own flaws. I cannot wait for mw:Flow to solve all our wiki discussion problems. In the meantime, why not convert the central discussion pages to LiquidThreads? Keφr 15:29, 12 September 2013 (UTC)[reply]
By comparing other meanings it is obvious that the binding sense of the Serbo-Croatian noun vȇz refers to "A finishing on a seam or hem of a garment". Sometimes using a dictionary requires a minimum amount of intelligence on the reader's part. Ditto for what DCDuring calls the "near-incoherent terseness" of the Monier-Williams Sanskrit dictionary, the most comprehensive Sanskrit dictionary, compiled by the most authoritative Sanskrit lexicographer in the West. --Ivan Štambuk (talk) 17:04, 12 September 2013 (UTC)
Easy to say that when you’re a native speaker and already know the word. And the best one can do is figuring out that “A finishing on a seam [] ” is the most likely meaning, but without a gloss there is no certainty. Even then, people expect a dictionary, not a test on their figuring-out-the-most-likely-meaning-of-words skills. — Ungoliant (Falai) 17:14, 12 September 2013 (UTC)[reply]
@Ivan: I don't think that those providing support for this project intend that it be usable only by an intellectual elite. The intellectual elite that uses and contributes to this wiki needs to also serve the general population of those who need dictionaries.
I don't doubt the underlying quality of the Sanskrit dictionary in terms of its coverage of Sanskrit. It seems like an outstanding basis for good Sanskrit Wiktionary entries. I just don't think that it is very usable for a non-specialist, partially because the style and wording of the Wiktionary entries resulting from the copying is not similar to that of other Wiktionary entries. The problem is not unlike the problem of copying Webster 1913 definitions, except the stylistic difference is even more dramatic. As it stands our Wiktionary Sanskrit entries are, in many ways, worse than the underlying dictionary because of omissions. DCDuring TALK 17:37, 12 September 2013 (UTC)
But Sanskrit and other extinct and classical languages are only used by intellectual elite. The only way a common person is going to come across a Sanskrit entry is through some etymology.
I don't recall having seen Sanskrit entries that are formatted radically differently from entries in other languages. The only problem is the abundance of meanings that words have, and which sometimes get grouped by eras or sources, and not by semantic closeness as they are normally - but that's a particular issue of classical languages that have been used over a long period of time, which other "normal" languages don't have. But users looking up Sanskrit words expect such a layout, which enables them to quickly isolate the set of meanings appearing in a particular work that they are reading.
I don't understand what omissions you are referring to. If you have some constructive proposal of how to change some user-unfriendly entry of your choice I'd be happy to hear it. --Ivan Štambuk (talk) 22:02, 12 September 2013 (UTC)[reply]
With other meanings being embroidery and needlework, I can't imagine anyone construing the binding sense in any definition of [[binding]] other than the one that has sewing context label attached. I don't think that the average reader is that stupid. --Ivan Štambuk (talk) 22:02, 12 September 2013 (UTC)[reply]
This is more of an observation than a suggestion, but when I was experimenting with using links to senses with sense IDs using {{senseid}} when writing definitions for foreign-language terms, I found that the sense that I wanted to link to was missing--about half the time, actually--and that there were many senses of the English term that I hadn't thought of, and which forced more disambiguation on the foreign language end than I had realized. The process of matching senses exposed gaps on both ends, so if that could be integrated in the editing process, it would be a big aid to editors. That would really tap into the collaborative power of the project. This is not to push or oppose sense IDs, just my experience with them. --Haplology (talk) 16:05, 12 September 2013 (UTC)[reply]
I would expect contemporary non-English terms to often need well-worded, contemporary senses of common English terms that the English entries lack. That is one of the biggest problems for English entries. {{rfdef}} helps us identify the need, but a note in the template to explain the need would be helpful in prioritizing work on the English entry. Do we have a tag to mark the FL definition as waiting for a suitable English definition to be provided? Would a use of {{sense-id}} in the English and FL entries help by providing a way to find the original FL entry problem (missing gloss)? DCDuring TALK 16:36, 12 September 2013 (UTC)[reply]
We also have {{gloss-stub}}. I add it to entries whenever I find that the definition doesn't identify the meaning specifically enough. —CodeCat 16:46, 12 September 2013 (UTC)
That presumably goes in the FL entry and {{rfdef}} goes in a new line at the English L3/L4 section. How could those be linked more or less automagically by the use of {{senseid}} in each? DCDuring TALK 17:25, 12 September 2013 (UTC)[reply]
Install Wikidata. DTLHS (talk) 20:30, 12 September 2013 (UTC)
And I realize that saying "install wikidata" isn't helpful, it's just pent-up frustration about trying to implement features of a database in something that very much isn't. DTLHS (talk) 20:49, 12 September 2013 (UTC)
Speaking of Wikidata — I often see statements in the Wikipedia metaspace that there are plans to deploy Wikidata on Wiktionary in some form. At the same time I am yet to see anybody from Wikidata approaching the community here about this. Trying to explain how it would work, how to handle existing dictionary content, and such. I smell a disaster. Keφr 20:55, 12 September 2013 (UTC)[reply]
d:Wikidata:Wiktionary. --Yair rand (talk) 22:06, 12 September 2013 (UTC)
Huh, I just realized I'm the only Wiktionary admin who's also a Wikidata admin. We're probably going to need some more Wiktionarians paying attention to Wikidata's progress if WD use is going to turn out well here. ... --Yair rand (talk) 22:20, 12 September 2013 (UTC)[reply]
@Ivan. I think we would want to be more than a wikisource for a 110-year-old dictionary, no matter how good that dictionary may be, especially as it is already available: eg, [1]
We split proper nouns senses from common noun senses, but many, many Sanskrit sections do not. Not all Sanskrit sections include the link to the underlying dictionary, which is itself an omission, and not every bit of explanatory note in the original dictionary seems to have survived. A glossary for the abbreviations used does not seem to be included. The language used is not contemporary English and the definitions lack glosses. DCDuring TALK 23:23, 12 September 2013 (UTC)[reply]
For an example of a problem see WT:RFC#सह and join the discussion there. DCDuring TALK 23:25, 12 September 2013 (UTC)[reply]
MW dictionary is perfectly valid even today (because we're dealing with an extinct language, doh), some of the entries were created before the online version of MW dictionary was available, Sanskrit grammar tradition doesn't make the distinction between proper and common nouns, 98% of the words in its definitions are perfectly valid contemporary English as far as I recall, and anyone studying Sanskrit doesn't need a meaning gloss. In other words, there are no problems with Sanskrit entries. --Ivan Štambuk (talk) 16:02, 13 September 2013 (UTC)[reply]
The problem with the dictionary is not the definienda, it's the definiens. We need to convert the definitions to a more contemporary English, at least removing the archaisms and obsolete terms. Formatting the entries to Wiktionary standards, eg, Proper noun sections. Including references to the underlying dictionary to aid the work wouldn't hurt. Having excellent coverage of Sanskrit is certainly an important goal for Wiktionary, which perhaps subsequent contributors will achieve. DCDuring TALK 16:42, 13 September 2013 (UTC)

Pages with protolanguage information?

CodeCat and I have discussed a couple of times the question of reconstructed forms without references in Etymology sections (the most recent discussion is here). One conclusion now seems to be that it would be a good idea to have pages (perhaps in the Appendix) with more detailed historical information, including perhaps original research by Wiktionarians, on specific topics, which could then be linked to from individual words. Case in point: Proto-Baltic vs. Proto-Balto-Slavic. The current tendency goes in the direction of Proto-Balto-Slavic, but there are not many published reconstructions of words out there, whereas Proto-Baltic has clearer sources. Now, if Wiktionarians want to add Proto-Balto-Slavic etymologies, or simply replace the Proto-Baltic label ({{etyl|bat-pro|LANG}}) with the Proto-Balto-Slavic one ({{etyl|ine-bsl-pro|LANG}}) on the assumption that most PB reconstructions will be acceptable PBS reconstructions as well, wouldn't it be nice to have a page (called, say, "Appendix:Proto-Baltic and Proto-Balto-Slavic") that discusses this in detail, with correspondences, derivations, and clear statements of what things in PB we think will remain the same in PBS, and why? In this way, any changes of PB to PBS can be referred to this page: it will be the basic source for the reconstruction, and the interested reader can read it to see on what grounds we have an (as yet unpublished) PBS etymology rather than the (already published) PB one. Also, Appendix pages with reconstructed PBS words could be linked to it. One objection is that this page would contain "encyclopedic" information. Yet I feel that this kind of information is quite vital for someone who is navigating the thorny area of Indo-European etymology and wants to feel sure the etymological information given at Wiktionary is correct and accurate -- as vital as having, say, a page on IPA and its symbols, or a page with the definitions of all grammatical terms used to tag words. What do you guys think? 
--Pereru (talk) 20:25, 12 September 2013 (UTC)[reply]

I think you missed a few important parts of the original discussion. To me, the point of having a special page for this is to act as a repository of sourced knowledge related to the reconstruction of a given language, but it's also the place that we as Wiktionarians would use to collect our own conclusions about certain minor issues surrounding them. Specifically I called it a way to allow original research, while keeping it both contained and publicly accessible as a reference for etymologies and reconstructed entries on Wiktionary. Basically, to enable peer review of Wiktionary's reconstructions. I also think that the last two posts in our conversation are important:
Me: My main objection with really big stuff is that it is the kind of area where even professional linguists get things very wrong, so that makes it even more likely for amateurs to miss things. I don't have any professional schooling in linguistics, just a lot of curiosity that made me want to look for things and learn more. So I know a bit I think but what I know is not at a professional level and I don't think it is for anyone else here either. The limitations are mainly there to protect ourselves, Wiktionary and its users from our own incompetence. :P
Pereru: I am a professional linguist, though not an Indo-Europeanist (I work on South American indigenous languages). But one of the things I've learned is to stick to logic and good arguments, because (a) big stars with famous diplomas often think their fame is all they need to justify something, and (b) non-big-stars, without any diplomas, surprisingly often contribute really intelligent, insightful ideas that deserve recognition.
CodeCat 21:03, 12 September 2013 (UTC)[reply]
That is an interesting thing, and I certainly support it. But I do see the main point of having such pages in a dictionary (as opposed to a research journal) in being able to add references to specific reconstructions -- be they in ===Etymology=== sections, be they independent pages on PBS reconstructed forms. --Pereru (talk) 06:48, 13 September 2013 (UTC)
No we can not add original research by Wiktionarians in etymologies. Neither as reconstructions nor as speculations on word origins. Etymologies are like small encyclopedic articles and all of the Wikipedia policies on no OR and maintaining NPOV apply to them as well. If we allowed original research Wiktionary would become worthless as an etymological dictionary because there would be no way to differentiate among credible sources. We might as well restore H&M's Chinese phonosemantic interpretations. If you want to make up theories on word origins go write a blog or paper. It is not up to us to deem sources "right" or "wrong", but simply to collect all of the competing theories from established authorities and present them to the reader in the most appropriate fashion, taking into account issues such as neutrality, acceptance, and newness.
Proto-Baltic is an obsolete theory and it's quite irritating to see you intentionally replacing Proto-Balto-Slavic reconstructions that can be cited with the ones based on the 1980s scholarship. I don't think that there are linguists today (apart from some Russophobic Baltic nationalists) that dispute PBSl. There is no "tendency", it's a settled matter. There are many details that need to be settled, but the grouping itself is not a point of contention. --Ivan Štambuk (talk) 21:26, 12 September 2013 (UTC)
But where is it actually cited as policy that we don't allow original research? We constantly do original researching when we document definitions, why is this different? If Wiktionary editors can be lexicographers, why not also etymologists? Pereru explicitly encourages the matter and he is a professional linguist himself, so he understands what is involved. I understand that you want to differentiate reliable theories from bogus ones and that is exactly what this proposal is supposed to prevent, as the idea is just that: to collect all of the competing theories and build up a body of peer reviewed research that can be used to support reconstructions in Wiktionary articles. I wonder if you even understand what has been suggested? —CodeCat 22:10, 12 September 2013 (UTC)[reply]
Writing definitions on the basis of attestations is "original research" in much the same way that writing Wikipedia articles based on cited sources is. We do not invent new meanings, but rather collect the ones attested in usage on the basis of our CFI (which are really "criteria for attestation"). The original part there is to word the definition in a manner that doesn't coincide with any of the existing dictionaries (unless they are out of copyright). No original research is one of the pillars of Wikipedia that protects users from obscure theories, and the project itself from being a propaganda machine for every fringe group that thinks that the lack of editorial or peer-review process as an opportunity to present its fringe view.
Pereru is just someone nicknamed "Pereru" (what is a "professional linguist" BTW? Somebody paid by taxpayers to produce work hidden from general populace behind paywalls and costly volumes?). I don't care if he is de Saussure reincarnated.
You seem to be conflating two separate points: 1) etymologist as somebody writing a paper on a word origin, postulating reconstructions and speculating on word origins 2) etymologist as somebody writing an etymological dictionary, which is usually done by every single headword having references to various scholarly opinions, with etymologist then choosing what he thinks is the "best" explanation. We can only do OR in the second sense, by being a synthetic work of the most recent scholarship. Not invent reconstructions and deep theories of word origins based on our own opinions of how languages evolved. Which is what you have been doing and seem to be keen on getting a community approval. --Ivan Štambuk (talk) 08:26, 13 September 2013 (UTC)
When we apply existing principles known to linguistics to come to a reconstruction that nobody has published before, is that not just applying the same science that linguists do? My intention was specifically to allow our own reconstructions while at the same time have every detail of that reconstruction accounted for by sources. This is currently an area that is lacking like Pereru points out below; we either have references to the whole reconstruction verbatim, or none at all. For example take *dūmas. I don't need a source to tell me that it's a sound reconstruction, because I can see that it fits perfectly with all the relevant sound laws in Balto-Slavic and its descendants. Yet it has no source because no source happens to attest this word in Balto-Slavic, even though every single phoneme of the reconstruction can be accounted for by established and sourced sound laws. Also I'm not sure why you think there would be a lack of peer review. I specifically noted that the whole point of this is peer review, and wikis as a whole are founded on the principle of peer review. So fringe theories would be rejected because there is no consensus for them on Wiktionary. As long as we assume that Wiktionary editors are knowledgeable about the area, there would be peer review of new reconstructions to ensure that the science has been applied correctly according to the most mainstream theories. I have done this with many Germanic reconstructions in the past, and it has worked well. —CodeCat 12:09, 13 September 2013 (UTC)
By creating a reconstruction you are ipso facto making statements "these words are inherited" and "this is the proto/ancestral form" and "these are the sound laws that have occurred". These statements constitute true original research. Specifically, Proto-Balto-Slavic *dūmas "smoke" is by Kortlandt, Derksen and others from Leiden reconstructed as *dúʔmos, with segmental laryngeal merger as glottal stop, and without the change PIE *o > PBSl *a. *dūmas is far from being a sound reconstruction if radically different alternatives are given by reputable authorities in the field. And I think that I saw forms *duHmos or *duHmas as well in the literature.
Wikis are based on the peer review of content that is itself based on solid evidence. There is no peer review of original research. What is susceptible to discussion are issues such as "is this wording neutral" or "is that prominent opinion or theory sufficiently represented". Not completely new and original interpretations of ex-wiki facts that are repeatedly revised by wiki editors.
That you have done such original research with Proto-Germanic - i.e. postulating reconstructions not found anywhere - only demonstrates that urgent action is needed to stop you from turning this project even more into your personal playground. I don't care about Germanic languages much, but some Balto-Slavic reconstructions and paradigms that you've been making are nothing but original research.
I do support however going beyond traditional etymological dictionaries which are constrained by space, by making extended etymologies describing every sound change that has occurred, and have even proposed how these should be formatted the last time PBSl. was discussed in the BP. But not creating our own reconstructions that cannot be found anywhere. There are thousands of published works that deal with proto-forms, and if no reference can be found for a particular reconstruction that doesn't necessarily mean "this reconstruction is unreferencable not because it's implausible, but because no linguist has yet studied it" but rather "this reconstruction is unreferenced because it is implausible, and nobody authoritative has wasted time with it". Formally there is no way to distinguish the two cases, absence of evidence and of counter-evidence. You can combine countless theories on the development of particular properties in proto-languages, yielding dozens of equally "valid" reconstructions that individually cannot be attested, but with each sound change within being attested. --Ivan Štambuk (talk) 16:29, 13 September 2013 (UTC)
I'll have to agree with CodeCat here: where is it said that there can be no original work? To me, it seems every time you add new definitions to words -- definitions not previously published in other dictionaries --, you are doing original work. Where is it said that original work is not OK on Wiktionary, and why? (All I've seen is references to "Wiktionary is not Wikipedia".)
On your objections:
(a) If we allowed original research Wiktionary would become worthless as an etymological dictionary because there would be no way to differentiate among credible sources -- Why not? All you need to do is make accurate references. If you're taking something from a published source, by all means refer to it! (Shall we make it official policy that reconstructed protoforms are only allowed here with references?) If you're proposing one, write a page here with the details not yet found in published sources and refer to it! In what way is this confusing, and how would this make it impossible to differentiate among credible sources? If anything, references would make it easier to differentiate among these sources... (On the subject of original research, I refer to published etymological dictionaries, in which the authors often advance original contributions and ideas for specific words, always carefully labeling them -- in the LEV, with a letter "K" at the end -- as the author's own work).
(b) It is not up to us to deem sources "right" or "wrong", but simply to collect all of the competing theories from established authorities and present them to the reader in the most appropriate fashion, taking into account issues such as neutrality, acceptance, and newness -- I agree fully. But note that most etymologies thus far presented here at Wiktionary are not like that: they are given without a source, and the casual reader has no way of judging whether they were presented "appropriately", with attention to "neutrality, acceptance, and care". It seems to me that adding a page in which things like PBS vs. PS etymologies could be explicitly discussed would be a great step forward in the direction of achieving precisely the goal you state. (In fact, here is another suggestion: how about a page, maybe in the Appendix, discussing precisely the good and bad points of all published sources for PIE etymologies that are used at Wiktionary, and why we trust some of them more than others? In the interest of full transparency and disclosure, wouldn't this increase the level of precision, as well as trustworthiness, of Wiktionary etymologies as a whole?)
(c) Proto-Baltic is an obsolete theory and it's quite irritating to see you intentionally replacing Proto-Balto-Slavic reconstructions that can be cited with the ones based on the 1980s scholarship. -- If they can be cited, why is (almost) nobody doing that? I've seen a couple of good citations of PBS forms (usually by you, actually), but most PBS forms proposed here have no support in published sources and, as per your own policy (in (b)) above, should not be here at all. So why are they, and why is wrong to remove them and replace them with sourced ones?
I don't care how "well established" you think PBS is (and a couple of Leiden specialists I've talked to -- both Dutch, not "Russophobic Baltic nationalists", whatever that is -- would beg to differ from you): the issue here is "what published source does a given reconstructed form come from"? Currently, almost nobody is adding sources to reconstructions here. If you have a good, published source for PBS etymologies, by all means refer to it! Heed your own advice! But when I see PBS forms being added without supporting evidence, and that in a world, no matter how well established you think PBS to be as a hypothesis, in which published PBS reconstructions are still few and far between, I think that the best policy is -- as you yourself propose! -- to trust the published sources, in which PB is still much more frequent. And, to follow this policy -- which, again, you yourself explicitly subscribe to! -- I delete, and will go on deleting, unsourced PBS etymologies and replacing them with sourced PB ones. After all, in a dictionary, sourced should always defeat unsourced. If a PBS etymology is sourced, it stays. If it isn't, it doesn't. I honestly don't see how you can subscribe to the "honesty and neutrality" policy you described above, and still disagree with that. Unless you simply want to push your personal vision of "what's right" in PBS reconstructions -- in which case, how is this NPOV?
Alternatively, you can do what CodeCat suggests: write a page in which YOU say why it is that PB reconstructions should be relabeled as PBS even in the absence of a published source that explicitly states that PBS = PS. You can sketch arguments, give examples, correspondences, etc... and then cite this page as your source.
How on earth would this be confusing, and how would this create trust problems for Wiktionary? Please riddle me that! If anything, what we're recommending is that things be done more responsibly, and with more references. Don't you think that the current etymologies-without-references bonanza creates a much, much worse trust problem than any PBS-vs-PS page would?
I end up having to agree with CodeCat above: I think you didn't understand what it is you're disagreeing with. There is no contradiction between what is proposed here and any of the principles you espouse. Please read it again. --Pereru (talk) 06:48, 13 September 2013 (UTC)[reply]

Just out of curiosity, because I still don't understand what is at stake here: what's exactly the difference between Proto-Balto-Slavic and Proto-Baltic? Does Proto-Balto-Slavic theory say that there was simply no Proto-Baltic language, but that Latvian and Lithuanian evolved from Proto-Balto-Slavic exactly the same way that Proto-Slavic did? I've just drawn this (sorry for the probably simplistic view) so... which tree represents best the actual Proto-Balto-Slavic theory? The second or the third one? --Fsojic (talk) 13:18, 13 September 2013 (UTC)[reply]

The first has been more or less discredited, although some still hang on to it, maybe for political reasons. The second is how linguists generally saw it in the past. Newer research suggests that there are really three branches of Balto-Slavic (not the same as your third image): East Baltic, West Baltic, and Slavic. Each of those, it is supposed, had its own proto-language, but the proto-language of East and West Baltic together (what is called "Proto-Baltic") is not demonstrably different from Proto-Balto-Slavic itself. That is, if you try to find out what the common ancestor of all Baltic languages was, then you end up with a language that Slavic can also descend from. —CodeCat 14:30, 13 September 2013 (UTC)[reply]
But West Baltic evidence is limited. If one reconstructs a word from Latvian and Lithuanian - or East Baltic in general - alone because there is no known corresponding word in Old Prussian - or West Baltic in general - and label it as Proto-Baltic rather than Proto-East-Baltic (and I suppose some do this; well, I don't know), can we be sure it's the root for Proto-Slavic as well? --Fsojic (talk) 15:17, 13 September 2013 (UTC)[reply]
It's a matter of applying knowledge of how each language evolved, and then making all the ends fit together. Linguists formulate the phonetic evolution of a language through a series of ordered rules called "sound laws", which each act to change the pronunciation of words in some specific way according to certain rules. The sound laws for the Balto-Slavic languages are all more or less known, with some difficulty in the details still, but the general picture is clear. This means that it's fairly easy to find out if a given form can be an ancestor for a given Slavic term. All you need to do is apply all the Balto-Slavic-to-Slavic sound laws and see if the result you get matches what is actually found in attested Slavic or in reconstructed Proto-Slavic. An example: you start with Proto-Balto-Slavic *dūmas. There are two sound laws that apply in this particular case. The first is Balto-Slavic *ū > Slavic *y, the second is masculine nominative singular Balto-Slavic *-as > Proto-Slavic *-ъ. Applying these two rules together gives *dūmas > *dymъ. And that is the form that is actually found in Slavic (see *dymъ). Thus, the reconstruction is correct for Slavic. The same can then be applied to all the other Balto-Slavic languages, and if it matches all of them, then you have successfully reconstructed a Proto-Balto-Slavic term. —CodeCat 15:27, 13 September 2013 (UTC)
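The procedure described in the post above (apply ordered sound laws, then compare with the attested or reconstructed form) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the rule table holds just the two laws quoted above, written as plain regular expressions, and the names SOUND_LAWS and apply_sound_laws are hypothetical, not taken from any Wiktionary module.

```python
import re
import unicodedata

# Hypothetical, heavily simplified rule table: only the two laws quoted above.
# Each entry is (regex pattern, replacement); order matters.
SOUND_LAWS = [
    ("ū", "y"),    # Balto-Slavic *ū > Slavic *y
    ("as$", "ъ"),  # masc. nom. sg. Balto-Slavic *-as > Proto-Slavic *-ъ
]

def apply_sound_laws(form, laws=SOUND_LAWS):
    """Apply each sound law in order to a reconstructed form (without the asterisk)."""
    # Normalise to NFC so that a precomposed "ū" and "u" + combining macron match alike.
    form = unicodedata.normalize("NFC", form)
    for pattern, replacement in laws:
        form = re.sub(unicodedata.normalize("NFC", pattern), replacement, form)
    return form

if __name__ == "__main__":
    # Check the predicted outcome against the Proto-Slavic form cited in the post.
    print(apply_sound_laws("dūmas") == "dymъ")
```

A real check would need the full ordered set of laws and their conditioning environments; the sketch only shows why the test is mechanical once the laws themselves are agreed on.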
Except that not everybody accepts PIE *o > Proto-Balto-Slavic *a. You can get both Baltic and Slavic forms independently from Post-PIE *d(ʰ)ūmos. What is important here is w:Hirt's law yielding Balto-Slavic acute accent with fixed (columnar) paradigm on the root, and which is an exclusive Baltic-Slavic isogloss not found in other branches. Superficially, Lithuanian dūmas is more similar to Sanskrit dhūmás, but "under the hood" it's really not. --Ivan Štambuk (talk) 16:41, 13 September 2013 (UTC)
Vote: Wiktionary:Votes/2013-09/Translation-links to other Wiktionaries

I'm starting to think that maybe our Translations sections should only link to target-language-Wiktionary entries that are actually known to exist (just like how we only have interwiki-links to existent pages). Under such an approach:

  • {{t}} would behave like {{}} does now.
  • {{t-}} and {{}} would redirect to {{t}}, and presumably eventually be eliminated.
  • various tools and bots (Conrad's translation-editor, Kephir's {{t}}-ifier, Rukhabot, etc.) would only deal in {{t}} and {{t+}}.

If y'all are on board with this, I think we'd probably want some sort of vote — the current system, give or take, has been endorsed by votes — but I figured I would start a discussion first, to see (1) if y'all are on board, and (2) if y'all have any alternative/additional ideas.

So . . . any thoughts?

RuakhTALK 20:18, 13 September 2013 (UTC)[reply]

It will not simplify anything for the tools for the same reason the move to {{g}} will not simplify anything until it is completely done, which will not be very soon: in the meantime, we have to deal with both unconverted and converted pages. Complexity in fact at best stays at the same level. My tool always generates {{t}} anyway and will have to recognise existing uses of {{t-}} and {{}} (which it currently does not touch at all). For other tools, including bots, it should be similar.
I am mildly opposed, actually. Contributors from foreign Wiktionaries might be actually looking for redlinks into their native Wiktionaries simply to create the missing entries. With the current approach, it takes two middle-clicks, two keyboard shortcuts, some typing and tab switching to copy our entry into their native Wiktionary, or just two clicks to start the entry from scratch. Although now they would have a somewhat hard time actually finding these. Categorising usages of {{t-}} would be useful for this. Maybe not the best use case, but… I can see some value in this.
So why, really? I fail to see any advantage in the above-mentioned… characteristics of this approach, for lack of a better word. Keφr 20:57, 13 September 2013 (UTC)[reply]
I'll start with your third paragraph ("So why, really? [] "), since I think that's the crux of your comment. (I didn't actually give my reasons for thinking we shouldn't link to nonexistent FL-wikt entries; I guess I should have.) The reason is, I think such links are useless clutter:
  • In the case of {{t-}}, they're bright red, like redlinks within en.wikt, but unlike redlinks within en.wikt, there's little chance that readers and editors here will be able to help with them, and they're likely to be not-very-useful for en.wikt readers even once they exist. Note that we don't add red interwiki-links, for example, because the goal is to indicate what FL-wikts information can be found in.
  • In the case of {{t}}, the links aren't bright red, but in a way, that's even worse: it's hard to tell at a glance that it's linking to a non-existent FL-wikt entry (because the external-link blue is so similar to the bluelink blue), so it's a link to trick readers into thinking they're going to get more information, when in fact they're not.
That out of the way . . .
Re: first paragraph ("It will not simplify anything [] "): I'm not sure I completely agree with your literal statement, but I think we can agree on a key point, say, "we shouldn't do this because it's a simplification": you because you don't think it is a simplification, me because I don't think a small technical simplification (even if real) can justify a much-larger functionality change.
Re: second paragraph: Thanks for weighing in. For the specific use-case you mention (contributors from an FL wikt looking for our redlinks to them), I'd be happy to generate language-specific lists, which I think would work better for that use-case than searching for entries with {{t-}}. (And of course, even that use-case doesn't recommend {{t}}'s current behavior.) But if you can think of any other relevant use-cases, I'd be interested to hear about them.
RuakhTALK 03:26, 14 September 2013 (UTC)[reply]
Okay, I am fine with that. You can go ahead as far as I am concerned. Keφr 08:37, 14 September 2013 (UTC)
I support this. —CodeCat 01:51, 14 September 2013 (UTC)
  • Support. As for "Contributors from foreign Wiktionaries might be actually looking for redlinks into their native Wiktionaries simply to create the missing entries": I don't think it is en.wikt's job to act as a worklist for other Wiktionaries, presenting the editors of en.wikt with redlinks that they cannot turn blue by editing en.wikt. --Dan Polansky (talk) 08:28, 14 September 2013 (UTC)

X-system and H-system in Esperanto

Discussion moved to Wiktionary talk:About Esperanto#X-system and H-system.

Block of User:MewBot

Ruakh blocked MewBot for updating {{it-noun}} quite profoundly without any prior discussion. This seems to violate WT:BOT#Policy. CodeCat has been unblocking her own bot. Since both the blocking and the unblocking are unilateral, I thought I'd bring it here. I also support an indefinite (but presumably not infinite) block both for this issue and the fact that CodeCat can't always act alone on updating things in her grand vision of things without discussing it first. Mglovesfun (talk) 21:10, 14 September 2013 (UTC)

What are you talking about? You even took part in the discussion, and it wasn't even the only one that took place, there's more on SemperBlotto's talk page. —CodeCat 21:13, 14 September 2013 (UTC)[reply]
I must admit that CodeCat can be very annoying, particularly when modifying heavily-used modules/templates without testing them. But in this case, the modifications were discussed with me (the major editor of Italian nouns) in advance, and they seem to work OK. SemperBlotto (talk) 21:21, 14 September 2013 (UTC)[reply]
Thanks for bringing this here. Personally, I actually don't support any long-term block: CodeCat obviously enjoys running a bot, and as long as she's using it to do things that the community has agreed should be done, I think that's great. My blocks were under the assumption that she would quickly fix the issue and then unblock it (I think my block-summary even said as much); I had no idea how much of a trial this would be. She's taken it personally, so has started making it personal herself, casting aspersions on my intentions, so now I'm annoyed enough that I'm half-tempted to support a long-term block, :-P   but my best current judgment is that I should trust my earlier, non-annoyed judgment. —RuakhTALK 02:43, 15 September 2013 (UTC)[reply]
  • I support temporarily blocking User:MewBot for bot actions made without first gaining consensus for them via an appropriate channel such as the Beer parlour. Whenever a dispute over there being a consensus for bot actions arises, CodeCat should provide links that show there is consensus for their actions. Only after the blocking admin is satisfied that the actions are supported by consensus can User:MewBot be unblocked, on a case-by-case basis. --Dan Polansky (talk) 09:48, 15 September 2013 (UTC)[reply]
    • And what is consensus? Yes, this is a redlink. As far as I know, we never practised "consensus" here. Keφr 10:32, 15 September 2013 (UTC)[reply]
    • Dan, I did in fact show that there was a consensus, but Ruakh wasn't satisfied. Not much I can do then. —CodeCat 12:01, 15 September 2013 (UTC)[reply]
      • Ruakh said: "I want the changes to stop until they are discussed on a page such as Wiktionary:Beer parlour, Template talk:it-noun, or Wiktionary talk:About Italian." I support his request. --Dan Polansky (talk) 14:28, 15 September 2013 (UTC)[reply]
        • And you think anyone else would read it and care to respond? This wiki has like, twenty regular contributors, most of them admins (which I think is telling something), and the areas of their interests hardly ever overlap. We cannot afford being bureaucratic here. Keφr 14:42, 15 September 2013 (UTC)[reply]
          • Re: "And you think anyone else would read it and care to respond?": Then there wouldn't be a problem. You just post, with a question like "Does anyone object?" or "Any objections?", and if no one cares to respond, you just go ahead after a day or two. CodeCat refuses to do even that much. —RuakhTALK 15:13, 15 September 2013 (UTC)[reply]
            • One question first. Do you want to object to these edits, given their content? Keφr 15:29, 15 September 2013 (UTC)[reply]
              • I don't think that's relevant. Suppose that someone ran a bot to delete hundreds of entries in a language that you don't know, but that several editors contribute in. Suppose that this bot-task wasn't discussed or mentioned anywhere that you can find; you only find out about it because you happen to see one of the deletions. You wouldn't (couldn't) object to the deletions themselves, because you don't know the language — for all you know, the deletions are perfectly correct — but then again, for all you know, the deletions might be enforcing one editor's idiosyncratic or prescriptivist views. Wouldn't you want to make sure that the other editors in the language were aware of what was going on? Wouldn't you be annoyed that someone took it upon themselves to do this without any discussion? (That may sound like a reductio ad absurdum, but given CodeCat's other recent mass actions, such as deleting all Slovene translations whose gender was given as "masculine", I expect to see something like this any day now. Maybe she'll change her mind about the script Gothic should be in, and no one else will realize it until it's a fait accompli. (N.B.: In all fairness regarding the Slovene thing, I should mention that she did intend the deletions to be somewhat temporary: she hoped to restore the translations herself using a bot. Dunno how well that would have worked; if the change hadn't been reverted, it's almost certain that at least a few translations would still be gone today, but it's hard to say how many. Actually it's quite possible that a few still are gone, and there's no way to tell.)) —RuakhTALK 06:46, 16 September 2013 (UTC)[reply]
        • It also concerns Italian templates. How many people beside Semper do you think would care for that? My resistance to all of this is from Ruakh saying "I'm not convinced of your way of forming consensus, do it my way or I'll keep blocking your bot". —CodeCat 14:53, 15 September 2013 (UTC)[reply]
          • I certainly never made reference to "your way of forming consensus", because until now it never even occurred to me that you thought you were forming consensus! You've often taken infrastructural actions unilaterally, with no pretense of consensus-building — once, recently, when I called you out on making breaking changes to {{support}}, your entire reply was "So being bold is not ok anymore?" — and I assumed that this was more of the same. So, just to be absolutely clear about this: as far as your bot-edits regarding {{it-noun}} are concerned, "your way of forming consensus" was simply to ask SemperBlotto (talk • contribs) about it, and to leave it at that? —RuakhTALK 06:46, 16 September 2013 (UTC)[reply]
If we don't have a quorum of knowledgeable and/or interested people to decide on something of broad impact, i.e., in a language, then we shouldn't be doing it. If it is so obvious that language-specific expertise is not required, then there should be no problem getting some kind of consent from parties with less expertise. If language expertise is needed, can't contributors from other Wiktionaries be solicited for advice or help?
In any event, there is plenty to do at the level of cleaning up messes and long-standing problems. One could even get involved in individual entry improvement, or with some kind of support for welcoming new contributors and making it easy for them to make contributions that we value. If this seems hard or vague, that might be an indication that it is the kind of task that is neglected and might have a substantial payoff. DCDuring TALK 16:16, 15 September 2013 (UTC)[reply]
Expertise isn't the issue, it's people affected by it. People who don't use Italian templates will generally just go "I don't care because I am not affected by it". So is it that surprising that I went directly to one of the few people who would be affected? —CodeCat 16:35, 15 September 2013 (UTC)[reply]
With that attitude, we are going to stagnate. And get feedback like this: Wiktionary:Feedback#Why Wiktionary sucks. And no one will be able to do anything. If there is only one person capable of identifying a problem, able and willing to fix it, they should solve it, quorum be damned. People will not join if we neglect to address issues because of an insufficient number of, well, people; they will just go elsewhere. There will never be a quorum at all. I repeat, we cannot afford to be bureaucratic.
Where are these tasks? I want to see them. (Coincidentally, I have been thinking for a while about creating a global to-do list page, named like Wiktionary:Open tasks). Keφr 16:44, 15 September 2013 (UTC)[reply]
I like how Linus Torvalds put it: "[...] don't expect people to jump in and help you. That's not how these things work. You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project." Keφr 16:46, 15 September 2013 (UTC)[reply]
We won't stagnate because someone's favorite technical project can't proceed. And we can't rely on the would-be problem-solvers to find the right problems to solve. We may stagnate because we don't have the resources to maintain and improve what we have. I am not so sure that it is advisable to add to the maintenance burden by attempting to maintain a font system that fights with our host software.
Does anyone have any ideas about how to make it fun and easy for new users to make useful contributions? DCDuring TALK 17:04, 15 September 2013 (UTC)[reply]
Ideas how to make it easy? Having ideas is simple, implementing them is harder. Make an editor which abstracts away the markup syntax, everywhere. Something to replace the NEC and WT:EDIT. It should also meaningfully support inflection tables and requests for etymology/pronunciation/verification/etc. But for that, we need to harmonise markup and template usage in some way — pages which render similarly may have wildly different markup, the semantics of which will not always be obvious, and therefore it will be hard to parse. Or even completely migrate Wiktionary to some kind of semantic database, because MediaWiki markup is quite lousy for our purposes. (Though I am a bit sceptical about Wikidata — I never saw a discussion between regulars here and the Wikidata people. There might be some friction.) The "fun" part may be harder to have ideas about. Perhaps we could ask the WMF to enable the WikiLove extension and the "thanks" feature (and create an opt-out or even opt-in for ULS and WebFonts by the way), and encourage people to use them. Although I am not particularly enthusiastic about these, because I do see barnstars degenerate into meaninglessness on Wikipedia, so… On the other hand, I do miss "thanks". I will probably use it somewhat often if it be here. Keφr 17:23, 15 September 2013 (UTC)[reply]
What wikis should be good for is capturing input from users. If we can bring format bit by bit to a certain level of consistency so data can be extracted from the dumps, I think we will have done our job. I think that means keeping at least one level simple; having bots run to identify non-conforming entries; having easy, form-based ways to add definitions, citations, usage examples, even in-line comments, which can be easily flagged by patrollers for specific kinds of further review. I don't know if we have any statistics about how many definitions and usexes are added using the specific tools. They aren't available by default to unregistered users, which might be where they do the most good (and have the most risk of abuse). Having thumbs-up/down rating for individual definitions and sections might be nice and give us a way to capture some feedback.
"Fun" has a lot of degrees to it. "Satisfaction" is a level. And thank you for being you! ;-}} DCDuring TALK 19:48, 15 September 2013 (UTC)[reply]
…which is pretty similar to what I had in mind, yes. And I like the thumbs-up/down for definitions idea. In addition to creating some gratification to editors, it would be a somewhat useful feedback tool. (Better than our current one, anyway. Buried deep next to interwiki links. Ugh.) And compared to barnstars, it would better fit our until-now unwritten (to an extent) philosophy that this project is ostensibly a dictionary and not a circlejerk. (Wikipedia has it written down, which is probably why they never follow it.) But it seems we hijacked the thread. Time to go back to lynching CodeCat, Ruakh, or whoever deserves, or whoever does not deserve, but we just feel like lynching. Keφr 20:39, 15 September 2013 (UTC)[reply]
  • My main point was the procedural one that if there isn't broad support for, rather than merely a lack of opposition to, a reform of how bots support a language, the reform should not proceed. The "broad" support could be among en.Wiktionarians knowledgeable about the language, among en.Wiktionarians as a whole (which usually means the reform is obviously beneficial and doesn't seem to require a lot of specific knowledge), or either type of grouping supplemented by support from those active on the language's Wiktionary. I could imagine relevant opinion coming from other wikis.
It just isn't a very good idea for powerful technical means to be unleashed without being fairly sure that the ends to be achieved are on balance desirable. DCDuring TALK 22:20, 15 September 2013 (UTC)[reply]

Eliminating adjective PoS for Ainu

Although John Batchelor includes adjectives for Ainu in his works about 100 years ago, according to scholars such as Tamura and Kumagai [2], Ainu has no adjectives; that category of speech is best characterized as intransitive verbs. Wiktionary has four adjectives, listed at Category:Ainu adjectives. Are there any objections to changing all of these to verbs? Since these words include the inchoative sense (become X), a possible way to gloss them is "To be/become X." BB12 (talk) 08:19, 15 September 2013 (UTC)[reply]

No objection to changing them to verbs. As to the rest: I would say a better way to categorize them is as stative verbs. See Category:Hawaiian stative verbs for one way of handling these without resorting to "be/become" in every definition. Chuck Entz (talk) 09:56, 15 September 2013 (UTC)[reply]
Thank you for the reference. There's a description of this at wt:About Hawaiian, but I don't understand it all. So, for example, does keʻokeʻo = white, clear, including the meaning of "become white, become clear"?
I found minor problems with the following Hawaiian stative verbs: makahiki and wikiwiki (no "stative" label, probably safe for me to add), kea (it's not clear why White Mountain has quotes and is italicized), and luahine (I think the comma just needs to be deleted from inside the template). BB12 (talk) 21:20, 15 September 2013 (UTC)[reply]
Stative, not inchoative. The meanings are "be white, be clear". —Μετάknowledgediscuss/deeds 21:22, 15 September 2013 (UTC)[reply]
Right. So what would be a good way of showing the user that these Ainu words have the inchoative meaning as well? I don't understand how Hawaiian really makes the stative issue clear, either, as a stative label does not seem very user-friendly. BB12 (talk) 22:09, 15 September 2013 (UTC)[reply]
Comment: English words like "white" and "clear" (and "brown", etc) are verbs, yet the basic meaning of the Hawaiian verbs currently defined/translated as "white", "clear" etc is not necessarily white#Verb, clear#Verb, etc (unless the English verbs happen to have stative senses). The meaning of the Hawaiian verbs is "be white#Adjective", etc. Hence I agree with BB12 that it's not user-friendly to omit "be" from the definitions. Someone could make a pass over the entries with AWB to insert it anywhere it's missing. - -sche (discuss) 04:47, 16 September 2013 (UTC)[reply]
I have no particular expertise with regard to Ainu, but do know a bit about stative verbs. It seems likely that, Batchelor notwithstanding, ピリカ is a stative verb. Batchelor even glosses it as "to be good" in addition to calling it an adjective glossed as "good". This seems similar to Lakota, where missionary linguists identified the stative verbs as adjectives since their English or French translations are usually adjectives. I'm not sure about アィヌ, though. It seems clear that it exists as a noun; that Batchelor calls it an adjective might be faulty reasoning from a stative verb (if one exists), but could equally be false reasoning from use of the noun to modify other nouns, as in アィヌモシㇼ (Ainu land; Hokkaido). In either case, though, it's not an objection to eliminating the adjective POS for Ainu. (I'd also echo Chuck Entz in pointing out that statives are not necessarily inchoative.) Cnilep (talk) 05:23, 16 September 2013 (UTC)[reply]
I have sent the adjective アィヌ to wt:RfV.
The fact that Batchelor calls words such as "white" adjectives seems reasonable. If I were creating a glossary for myself, I would probably go with adjective as well, just to make it easier in my mind. But Masayoshi Shibatani in "The languages of Japan" (1994, p. 19) says: "Forms corresponding to adjectives in meaning and function of other languages function as predicates in exactly the same way as intransitive verbs. Not only do they share the same personal affixes, but they both function as nominal modifiers in exactly the same way (section 3.3). Furthermore, these forms can have an inchoative reading, as well as their basic stative one... Thus, there does not seem to be any need to set up an independent category for adjectives in Ainu."
There seems to be little objection to erasing the adjective category. Would it work to have a template that generates "(stative) To be" for these Hawaiian adjective/verbs, and another template that generates "(inchoative, stative) To be/become" for Ainu? Other languages could use them as well, of course.... BB12 (talk) 08:31, 16 September 2013 (UTC)[reply]
Supplemental: I forgot to check the Japanese Wiktionary for the PoS. They have nine adjectives for Ainu, and the Japanese Wikipedia article on Ainu does not discuss the issue. BB12 (talk) 08:37, 16 September 2013 (UTC)[reply]

Deleting list of protologisms

I have created vote Wiktionary:Votes/pl-2013-09/Deleting list of protologisms.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 12:47, 15 September 2013 (UTC)[reply]

Appendix talk:List of protologisms#Deletion debate looks like a good place to start. Note that Wiktionary:Criteria for inclusion no longer has a section 'Protologism', and also Conrad.Irwin's comment of "an institution, albeit a terrible one" (and he voted to keep). Mglovesfun (talk) 13:35, 15 September 2013 (UTC)[reply]

A new way of formatting definitions I saw someone use

I just came across the entry paraconsistent logic, and noticed that the whole definition has been wrapped into {{l|en|...}}. I had not thought of this way of using our templates but it's quite a nice idea. If the text contains links, then the template only processes the links that exist in the text, it does not add any new ones. So this is quite a simple and effective way to ensure that all of the links in the definition point to the #English section. I think it's worth considering turning into normal practice, but I would prefer to use a dedicated template name such as {{def}} (currently a language code template, which might be deleted) or the even shorter {{d}} (currently a redirect to {{delete}}, so we could usurp the name). Using a dedicated template would let us avoid all the superfluous logic that {{l}} has such as language tags, script detection, gloss annotations and so on. Would others be interested in adopting this practice? (I can already guess one person who probably isn't) —CodeCat 19:54, 15 September 2013 (UTC)[reply]
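As a rough illustration of the behaviour described above (existing links are re-pointed at the #English section and no new links are added), here is a minimal Python sketch. It is not the code behind {{l}}; the function name `anchor_links` and the regular expression are assumptions made only for this example, and it ignores edge cases such as nested templates.

```python
import re

# Matches [[target]] or [[target|display]]. Illustrative only, not the {{l}} logic.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def anchor_links(definition: str, section: str = "English") -> str:
    """Point every plain wikilink in a definition at the given language section."""

    def fix(match):
        target, display = match.group(1), match.group(2)
        if "#" in target or ":" in target:
            # Already anchored, or a namespaced/interwiki link: leave it alone.
            return match.group(0)
        return f"[[{target}#{section}|{display or target}]]"

    # Only text that is already a link is touched; unlinked words stay unlinked.
    return WIKILINK.sub(fix, definition)


print(anchor_links("A [[logic]] that is not [[explosion|explosive]]."))
```

A dedicated template such as the proposed {{def}} or {{d}} would presumably need only this step, without the language tagging and script detection that {{l}} also performs.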

Yes I quite like it because it marks everything in the definition as English, which it is. Words don't exist independently of language, no reason not to mark all words in a sentence written in English as English. Mglovesfun (talk) 20:01, 15 September 2013 (UTC)[reply]
Well, all text on Wiktionary is marked by default as English. The language of each web page has lang=en on it at the top level, so we don't really need to mark the language for definitions. The benefit of this idea is mainly to add #English to the links, but it is also nice that definitions are tagged specially with their own template. That might have some future use for bots or other kinds of semantic parsing. —CodeCat 20:22, 15 September 2013 (UTC)[reply]
Yes, {{d}} looks like it is free to repurpose. I have also seen {{senseid}} being put forward a few times earlier. Why not merge its functionality into the new {{d}} template? Keφr 20:04, 15 September 2013 (UTC)[reply]
{{dfn}} is also free. We are probably not going to introduce any more language code templates? Keφr 20:13, 15 September 2013 (UTC)[reply]
You could also merge it with {{label}} by passing everything after the first parameter as label arguments... DTLHS (talk) 20:19, 15 September 2013 (UTC)[reply]
To Kephir's first post: We could do that, and I definitely think we should, but what about context labels? {{context}}/{{label}} is a bit too complex to combine neatly into another template, so we would probably want to keep it separate. Template calls can be nested, but we should avoid it if at all possible because it means that bots using regular expressions can no longer parse them, recursion would be required. (MWParserFromHell parses recursively, but not everyone can use it).
To the second post: either {{def}} or {{dfn}} is fine, but the latter might cause some confusion with the HTML element <dfn> which has a very different purpose.
To DTLHS: That can be done, but it means that the order in which the labels and definitions appear in the wikicode is the opposite of the order in which they appear on the page. It's somewhat counterintuitive. —CodeCat 20:22, 15 September 2013 (UTC)[reply]
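To make the point about nested template calls concrete, here is a small Python sketch of why a plain regular expression falls short and what tracking nesting depth looks like instead. The function name `top_level_templates` and the sample wikitext (including the hypothetical `{{d|...}}` call) are assumptions for illustration; triple-brace parameters and nowiki sections are not handled.

```python
import re


def top_level_templates(wikitext: str) -> list[str]:
    """Return each outermost {{...}} call, keeping any nested calls inside it."""
    found, depth, start, i = [], 0, 0, 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # remember where the outermost call begins
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                found.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return found


sample = "# {{d|en|{{label|en|logic}} A [[logic]].}} and {{t|de|Katze}}"
print(re.findall(r"\{\{.*?\}\}", sample))  # non-greedy regex stops at the first }}
print(top_level_templates(sample))         # depth counter keeps the nested call whole
```

The regex version cuts the outer call off at the closing braces of the inner one, which is the breakage that a bot relying on regular expressions would hit.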
As for parsing, the API (mw:API:Properties#revisions / rv, mw:API:Expandtemplates) has a "generate parse tree XML" feature (the same as used in Special:Expandtemplates), which may help with the issue, although I presume it puts some load on the servers, so it would be nice to avoid it. I agree that putting {{label}} into the mix would complicate things; never mind the template logic, the cognitive load of editing the entry using that all-in-one template (I imagine it would be something like: {{d|lang|senseid|label|label|...|definition}}) could be a problem, while the expressiveness of the syntax stays at pretty much the same level. Keφr 20:48, 15 September 2013 (UTC)[reply]
It would make the wikicode hard to parse for humans as well, although that might be in part because we're just not used to it. A distinct label template stands out as much visually in the wikitext as it does on the page, which helps in quickly finding the part of the source you want to edit. And despite some of the protests, I think it helps too that the labels now always begin with {{context|, it makes them stand out more to the editor and leaves less room to guess about the nature of that particular piece of code. Where do we currently place {{senseid}}? Before the label or after it? —CodeCat 20:54, 15 September 2013 (UTC)[reply]
{{senseid}} has to be placed immediately after the # for it to work. --Yair rand (talk) 20:59, 15 September 2013 (UTC)[reply]
I do think that's the best practice, because the sense ID would "apply" as much to the label following it as to the definition itself. The label is a part of the definition, in a sense. However, it sounds like this is a technical restriction. What reason is there for that? —CodeCat 21:03, 15 September 2013 (UTC)[reply]
The template is essentially overwriting the # in order to attach an ID to the element. If there's anything else before the template, you'll just get an extra list item. --Yair rand (talk) 21:07, 15 September 2013 (UTC)[reply]
  • That looks pretty horrible, IMHO. --Dan Polansky (talk) 21:04, 15 September 2013 (UTC)[reply]
  • I saw it as well and thought "What a dreadful waste of time, space, resources etc". It exemplifies all that is wrong with this Wiki and probably goes some way to explain why it's getting slower and slower. SemperBlotto (talk) 21:13, 15 September 2013 (UTC)[reply]
    • I think that such strongly polarised opinions are also a cause. There has been this schism developing with one side wanting to progress towards a more functional, semantic and manageable practice, and another side preferring the old Wikipedia-style markup. Or said another way, one side sees our current software as a limitation and wants to develop ways to overcome it, while the other side thinks it's fine. Because people hold such strongly differing opinions on what Wiktionary should be, a lot of time is spent arguing over even relatively small things, and progress just grinds to a halt because many attempts to make any significant changes are blocked. So you get a situation where nobody is happy, but nobody is able to do anything about it either. —CodeCat 21:58, 15 September 2013 (UTC)[reply]
WP recently introduced a "visual editor". Personally I hate it, but I'm the kind of person who prefers to hand-code HTML instead of using a tool, and I'm a minority. Maybe we should introduce a "visual editor" without removing the ability to write markup if preferred. Equinox 01:27, 16 September 2013 (UTC)[reply]
No, enabling that would be horrible. It would just invite people to violate WT:ELE, misuse templates and misformat definitions. Besides, the implementation is buggy, and until quite recently it was very obnoxious. If we are going to implement an editor, it should understand our entry formatting practices. Something more like WT:EDIT. Though WT:EDIT also has missing features and is somewhat unintuitive to use. Keφr 07:42, 16 September 2013 (UTC)[reply]
  • If it is going to be used only for linking to the "English" section, I think it's not a good idea. Note that there are alternative ways to do that, such as with JS; that may not be neat, but overall it's better than doing it with a template and module. If it is really supposed to be used for semantic purposes, or if it would make operations significantly easier for bots, besides linking to the "English" section, then it's a good idea. By the way, the current code of {{senseid}} is hackish: it leaves an unfinished tag. Browsers fix this error, but still. If we merge it into the proposed template for definitions, we can add 'id's neatly. On the other hand, if we want to put the labels in the element too (which I think we should do), we have to somehow merge {{label}}/{{context}} into the proposed template as well. --Z 09:24, 16 September 2013 (UTC)[reply]

Changes to Template:en-noun

I have gradually been working towards converting this template to Lua, specifically Module:en-headword. {{en-adj}} and {{en-adv}} were already converted a while ago, but they were far easier to convert and did not have such intricate parameter usage. With this template things have not been so easy so I've been trying to untie the rather confusing mess of parameters that this template used to support, and also to fix any errors that might have crept into existing entries. In the process I have made some changes to the templates that made certain old uses no longer work. I converted the existing entries but I realise now that I should have discussed these changes more widely before applying them, for which I apologise.

The current situation for the parameters is now relatively simple. The first parameter gives the plural form, or it can be given as "s" or "es" which are interpreted specially by the template. In the past, you could also give the stem and the ending as separate parameters, but this did not seem to have any benefit for English nouns so I removed this feature (again I apologise for not discussing this first). You can also give the first parameter as "-" or "~", which indicates that the noun is usually or partially uncountable. In that case, the parameters shift up by one, so the second parameter gives the plural then. If a noun has more than one plural form, the additional plural forms are given with pl2=, pl3= and so on.
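The parameter handling described in the paragraph above can be sketched roughly as follows. This is a Python illustration, not the template or Module:en-headword code; the function name `en_noun_plurals` is hypothetical, and returning no plural when "-" or "~" is given without a following parameter is an assumption, since the text does not say what happens in that case.

```python
def en_noun_plurals(pagename, positional, named=None):
    """Sketch of the current {{en-noun}} parameters as described above.

    Returns (marker, plurals), where marker is "-", "~" or None.
    """
    named = named or {}
    args = list(positional)
    marker = None
    if args and args[0] in ("-", "~"):
        # Usually or partially uncountable: the other parameters shift up by one.
        marker = args.pop(0)

    plurals = []
    if args:
        first = args[0]
        # "s" and "es" are shorthand for pagename + ending; anything else is a full form.
        plurals.append(pagename + first if first in ("s", "es") else first)
    elif marker is None:
        # No plural given at all: the default simply adds -s.
        plurals.append(pagename + "s")
    # ASSUMPTION: a bare "-" or "~" with no following parameter yields no plural here.

    # Additional plurals come from pl2=, pl3=, ... and must be whole words.
    n = 2
    while f"pl{n}" in named:
        plurals.append(named[f"pl{n}"])
        n += 1
    return marker, plurals


print(en_noun_plurals("cactus", ["cacti"], {"pl2": "cactuses"}))
```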

What I would like to do is convert the pl2= (etc.) parameters into positional parameters. So {{en-noun|first plural|pl2=second plural}} would become {{en-noun|first plural|second plural}}. I also want to add support for the shorthand "s" and "es" to these additional plurals; currently this is not supported and you always have to give the whole word. Module:en-headword is not currently used for this template, but after these changes are done, we should be able to change the template to use it. The code for English nouns is already there and should work, please check it to be sure.
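For the conversion of existing entries, a bot pass might look something like the Python sketch below. It is only an illustration under stated assumptions: the function name `positionalise` is hypothetical, the call is assumed to contain no nested templates or links with pipes, and a call that uses pl2= while relying on the default first plural is flagged for a human rather than rewritten, since making it positional would change its meaning.

```python
import re


def positionalise(call: str) -> str:
    """Rewrite pl2=, pl3=, ... in one {{en-noun}} call as positional parameters."""
    name, *params = call[2:-2].split("|")
    positional, numbered, other = [], {}, []
    for param in params:
        match = re.fullmatch(r"\s*pl(\d+)\s*=(.*)", param, flags=re.S)
        if match:
            numbered[int(match.group(1))] = match.group(2).strip()
        elif "=" in param:
            other.append(param)  # unrelated named parameters are kept as they are
        else:
            positional.append(param)

    # Skip a leading "-" or "~" when checking for an explicit first plural.
    explicit = positional[1:] if positional[:1] in (["-"], ["~"]) else positional
    if numbered and not explicit:
        raise ValueError("plN= used without an explicit first plural: " + call)

    positional += [numbered[n] for n in sorted(numbered)]
    return "{{" + "|".join([name] + positional + other) + "}}"


print(positionalise("{{en-noun|first plural|pl2=second plural}}"))
```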

Once the conversion to Lua has been done, we can look at ways to make the default plural form that the template shows a bit more useful. Rather than just adding -s onto the end of the word, it could look at the last consonants and decide what to add, so that we would not need to specify "es" anymore. It could also be made to change the ending, like convert -y to -ies. Such changes have been made to the adjective and adverb templates already and they work well. But although I am a native speaker of English, I am not really all that familiar with the intricacies of the spelling and grammar, so it would be helpful if someone could make a list of the most common rules for forming the plural in English. Keep in mind that these should be sensible default rules (rules of thumb), so they don't need to work all the time, only enough of the time for the rule to be worthwhile.
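As a starting point for such a list, here is a Python sketch of three common rules of thumb for English plurals. These are general spelling conventions, not rules taken from the adjective or adverb templates, and the function name `default_plural` is hypothetical. They are meant as sensible defaults only; words such as "stomach" (plural "stomachs") or "quiz" (plural "quizzes") would still need an explicit plural.

```python
import re


def default_plural(word: str) -> str:
    """Guess a default English plural from the ending of the word."""
    if re.search(r"[^aeiou]y$", word):
        # Consonant + y: city -> cities (but day -> days, handled below).
        return word[:-1] + "ies"
    if re.search(r"(s|x|z|ch|sh)$", word):
        # Sibilant endings take -es: box -> boxes, church -> churches.
        return word + "es"
    return word + "s"


for noun in ("cat", "city", "day", "box", "church", "why"):
    print(noun, default_plural(noun))
```

Note that "why" comes out as "whies" under these rules, so a default of this kind always needs an easy way to override it.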

Please let me know what you think of this proposal. If it's agreed, I will make the changes to the template and convert the plural parameters. Then the changeover to Lua can be made. —CodeCat 20:50, 16 September 2013 (UTC)[reply]

That makes sense to me. One difficulty, after we've Luacized and want to start supporting things like <-y> → <-ies>, is that we can't change a default-generated plural without identifying beforehand all the cases where this will actually affect the entry. (For example, if the pagename ends with <-y> and we're currently using the default-generated plural in <-ys>, as we currently are for one of the nouns spelled <why>, we don't want it to suddenly become <-ies>. Even when the default-generated <-ys> is actually wrong, we'll want a human to take a look, if only because there's a good chance that we'll need to delete a plural entry that was autocreated off the mistaken version.) Also, we'll need to make sure that the documentation is very clear about how to override the default-generated plural when necessary, for the benefit of less-savvy editors: IME, such editors tend to find it frustrating when computers do complex-but-mistaken things. (Actually I personally think it would be better never to autogenerate plurals at all — we can just always use {{en-noun|s}} and {{en-noun|-|es}} and so on — but I know that a lot of editors would never accept that, so I won't try to push for it. :-P   ) Lastly — thank you for bringing this here. I realize that it can be frustrating, when you have a great idea for how to improve things, to have to restrain your excitement and wait for feedback before beginning work. —RuakhTALK 21:09, 16 September 2013 (UTC)[reply]
Actually, come to think of it, we might want to detect nouns that end in <-y> (or <-ss> or whatnot) and don't specify a plural, not so we can try to autogenerate the correct plural, but just so we can assume |? and tag them for human examination. In cases like <whys>, we can require an explicit |s or |whys. This way we can supply a default in only the (very common) case that we can be reasonably sure it's not totally wrong, while still having a simple and easy-to-understand behavior for other cases. —RuakhTALK 21:14, 16 September 2013 (UTC)[reply]
We can do this the same way I did it in the past for the adjectives, and for {{it-noun}} currently. We can add code to the module which generates default plurals internally both the "old" way and the "new" way, and then categorises depending on whether they match or not. The new default would become whies, the old would be whys, and so the entry why would end up in the "does not match" category, where we can take measures to fix it, by explicitly adding the full plural form "whys" into the entry. Once we fix all entries in that category, we can be sure that no entry will be affected by changing from the old default to the new. —CodeCat 21:28, 16 September 2013 (UTC)[reply]
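The old-versus-new comparison described above could be sketched like this in Python. It is not the module code used for the adjectives or for {{it-noun}}; the function names and the particular "new" rules are assumptions for illustration. The input is taken to be the page names of entries that rely on the default plural, i.e. that give no explicit plural.

```python
import re


def old_default(word: str) -> str:
    """The existing behaviour: always add -s."""
    return word + "s"


def new_default(word: str) -> str:
    """A hypothetical smarter default (rules of thumb only)."""
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"
    if re.search(r"(s|x|z|ch|sh)$", word):
        return word + "es"
    return word + "s"


def split_by_match(pagenames):
    """Separate entries whose default plural would change from those unaffected."""
    match, mismatch = [], []
    for page in pagenames:
        same = old_default(page) == new_default(page)
        (match if same else mismatch).append(page)
    return match, mismatch


unaffected, to_review = split_by_match(["cat", "why", "day", "box"])
print(unaffected, to_review)
```

Entries in the second list are the ones that would land in the "does not match" category and need an explicit plural added before the default is switched.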
Yup, though I think it's best to first examine a database dump to look for any identifiable instances. The problem with relying on template-edits and categorization is that MW is not 100% reliable, and when you edit a widely-transcluded template in a way that affects the categories it generates on a given page, it will often happen that the page itself is updated, but still doesn't actually show up in the category. (Database-dumps are not 100% reliable either, firstly because they're out-of-date by up to a few weeks, and secondly because the code used to examine them can never perfectly match the code that MediaWiki uses to parse wikitext — it's difficult-to-impossible to catch all edge-cases and intertemplate magic — but by using database-dumps for the first pass, and categorization for the second pass, you can minimize the chances of undetected breakage.) —RuakhTALK 22:03, 16 September 2013 (UTC)[reply]
We could always counter that by doing a null edit on all transclusions to make sure they're updated. That's what I usually do. —CodeCat 00:35, 17 September 2013 (UTC)
I assume that the reason MW is unreliable is that the system is overtaxed, and that performing additional mass edits (null or otherwise) simply exacerbates the problem. (But perhaps I assume wrongly. It may be worth consulting the developers.) —RuakhTALK 05:13, 17 September 2013 (UTC)
I would say that the job queues are long right now because so many changes are being made to widespread templates. The software isn't getting enough time to catch up. But I don't think that has anything to do with the actual CPU usage or anything like that. As far as I know, doing null edits would only tell the system to prioritise the page you edit, it shouldn't affect things in general. And actually, I think that each view or edit also causes the system to process a small part of the job queue, so maybe doing many edits actually helps the system a little bit. In any case, I don't think that editing regular pages, which don't transclude anything, would make the job queue longer. —CodeCat 12:55, 17 September 2013 (UTC)

TTBC and language names

I propose that language names used in {{ttbc}} are left alone rather than being replaced with a language code, as in diff. The language names are used in translation tables, so they make it easier to switch a ttbc entry to a verified one. --Dan Polansky (talk) 16:29, 17 September 2013 (UTC)

I have a different proposal. Rather than using {{ttbc}} to replace the language name, we just write the language name and place {{ttbc}} after it instead. Then there is no need for the template to take a language name at all. Actually, why not convert it into an actual translation link template like {{t}}? Then we would be able to mark specific translations to be checked, like {{ttbc|nl|gedrag|n}}. —CodeCat 16:32, 17 September 2013 (UTC)
My current solution is just to replace {{ttbc|xyz}} with {{subst:xyz}} when checking a translation. That will work as long as we don't delete the language-code templates. —Angr 16:36, 17 September 2013 (UTC)
I like that last idea. Keφr 16:38, 17 September 2013 (UTC)
Replacing ttbc| with subst: is easier because you only have to remove one piece of text instead of two. Keφr 16:38, 17 September 2013 (UTC)
You can subst module invocations too. You could replace {{ttbc|xyz}} with {{subst:#invoke:language utilities|lookup_language|xyz|names}}. But if the language code does not exist, then this will substitute the script error into the page instead. —CodeCat 16:41, 17 September 2013 (UTC)
Yes, imagine me typing that every time. (Conclusion: We need a JS tool for that. Or maybe temporarily extend {{ttbc}} so that substituting it will return just the language name. Or change ttbc to mark specific translations like you proposed. Then it will be just a question of "ttbc" → "t".) Keφr 16:45, 17 September 2013 (UTC)
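The kind of tool wished for above could be sketched in Python as follows. This is only an illustration under stated assumptions: the code-to-name table is a tiny hypothetical stand-in for Wiktionary's real language data, and the function name is made up. Unknown codes are left untouched rather than turned into an error, which sidesteps the script-error problem mentioned above.

```python
import re

# Hypothetical stand-in for the real language data.
LANGUAGE_NAMES = {"cs": "Czech", "nl": "Dutch", "de": "German"}

# Matches {{ttbc|xyz}} used in place of a language name (one argument only).
TTBC_NAME_SLOT = re.compile(r"\{\{ttbc\|([^|{}]+)\}\}")


def resolve_ttbc(wikitext, names=LANGUAGE_NAMES):
    """Replace {{ttbc|code}} or {{ttbc|Name}} with the plain language name."""

    def replace(match):
        arg = match.group(1).strip()
        if arg in names:
            return names[arg]       # a known code: substitute its name
        if arg in names.values():
            return arg              # already a language name
        return match.group(0)       # unknown: leave the template in place

    return TTBC_NAME_SLOT.sub(replace, wikitext)
```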
  • How many newbie editors know that they can do a thing like {{subst:cs}} to get "Czech"? Why is it easier to delete "ttbc" and type "subst" than just deleting "ttbc", in terms of keystrokes? Why do you take something that works flawlessly and is obvious and replace it with something unobvious? --Dan Polansky (talk) 16:42, 17 September 2013 (UTC)
  • ──────────────────────────────────────────────────────────────────────────────────────────────────── So you mean this idea, I guess: "Rather than using {{ttbc}} to replace the language name, we just write the language name and place {{ttbc}} after it instead." So instead of "{{ttbc|Czech}}", we are going to write "Czech {{ttbc|cs}}" right? For what benefit? And this leaves the key question unanswered: why do you take something that works flawlessly and is obvious and replace it with something unobvious? --Dan Polansky (talk) 16:58, 17 September 2013 (UTC)
    • I suppose we disagree on the assumption that the current setup works flawlessly and is obvious? I would not have proposed it otherwise. —CodeCat 17:01, 17 September 2013 (UTC)
      • Transferring "ttbc|Czech" to "Czech" is obvious; the burden of proof is on you to show that what you are doing is more obvious, IMHO; the editor has to guess that "subst:cs" is going to work. If you claim the current setup does not work flawlessly, then what are its flaws? --Dan Polansky (talk) 17:04, 17 September 2013 (UTC)
        • It is documented, guessing is unnecessary. On the other hand, manuals are written to be never read. How about CodeCat's other proposal? Keφr 17:11, 17 September 2013 (UTC)
        • In the case of my proposal, it would involve changing * Czech: {{ttbc|cs|chování|n}} to * Czech: {{t|cs|chování|n}}. I don't think that is really any more difficult than your idea. But I think it's far more intuitive to do it this way, because it's clear which translation needs checking. And you also avoid all issues with mixing language names and codes, because the name is always there. It makes it much easier for tools like XTE to parse it too. —CodeCat 17:18, 17 September 2013 (UTC)
          • Two things. What are the flaws of the current setup?
          • Now that I see it fully exemplified, I like your proposal, and I can confirm that it is no more laborsome than the current setup. --Dan Polansky (talk) 17:23, 17 September 2013 (UTC)
            • Aren't the flaws more or less why you posted this discussion? {{ttbc}} allows both language names and codes, which is exceptional and somewhat strange in itself. When a code is given, then that doesn't work neatly with the other translations because they begin with a name. This could be solved by changing {{ttbc}} to use a language name at all times, but that is also unusual (all of our other templates use codes), and it still means that the language name portion of the translation line has to be parsed separately, because it can either be a plain language name or {{ttbc|language name}}. My proposal increases consistency by saying that all translation lines begin with the language name, no exceptions. It also allows you to tag individual translations for checking, which the current method does not allow; the best we can do now is marking all translations for a given language to be checked and hoping that others will figure it out. And because my proposal takes the form of just another translation template, existing tools such as XTE only need to be adapted by adding "ttbc" to the list of recognised translation templates (which currently contains "t", "t+", "t-", "tø", "t0"), so it does not make translations harder to parse. —CodeCat 17:31, 17 September 2013 (UTC)
            • Oh and just to be clear, my idea affects {{trreq}} as well. It would be placed after the language name, rather than replacing it. So a request for a Czech translation would look like this: * Czech: {{trreq|cs}}. —CodeCat 17:34, 17 September 2013 (UTC)
              • I like the idea, but I think it would be better to use a new template; for example, we could use {{t?}} (both for translations-to-be-checked and for translation requests, the difference just being that the former includes a provisional translation while the latter does not). And we'd probably want to still support the ability to link to the FL wikt, since that's actually very helpful in checking a translation. Maybe {{t?+}}? (Or is that starting to become inscrutable?) —RuakhTALK 20:06, 17 September 2013 (UTC)
                • We can use {{t?}} or something that's a bit longer, {{t-check}} just to make it clear and stand out more. I'm not sure what to do with the interwiki link. I suppose it should be included, but then we just end up adding lots more templates, which doesn't exactly make things easier to follow. —CodeCat 20:27, 17 September 2013 (UTC)
                  • All of {{ttbc}}, {{t?}} and {{t-check}} seem okay to me: it does not need to stand out, as it is located in a dedicated "translations to be checked" section of the Translations section. As for {{t?+}}, I don't know; it could be useful. --Dan Polansky (talk) 08:33, 18 September 2013 (UTC)
                    • Not all translations to be checked, or even most of them, appear in a separate section. Most appear among the regular translations. XTE tags translations to be checked that way, but it was only copying existing practice which existed long before then. —CodeCat 13:13, 18 September 2013 (UTC)
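To make the parsing argument in this thread concrete, here is a minimal Python sketch of how a tool might read a translation line in the proposed format, where the line always begins with the language name and "ttbc" is just one more recognised translation template. This is not XTE's actual code (XTE is a JavaScript gadget). The function names and the returned structure are assumptions made for the example.

```python
import re

# Templates listed in the thread, plus "ttbc" as proposed.
RECOGNISED = ("t", "t+", "t-", "tø", "t0", "ttbc")

# {{name|code|term|optional further arguments}}
TRANSLATION = re.compile(
    r"\{\{([^|{}]+)\|([^|{}]+)\|([^|{}]+)((?:\|[^|{}]*)*)\}\}"
)


def parse_line(line):
    """Parse '* Czech: {{ttbc|cs|chování|n}}' into (language name, entries)."""
    name, _, rest = line.lstrip("* ").partition(":")
    entries = []
    for template, code, term, _extra in TRANSLATION.findall(rest):
        if template in RECOGNISED:
            entries.append({
                "template": template,
                "code": code,
                "term": term,
                "needs_check": template == "ttbc",
            })
    return name.strip(), entries


def mark_checked(line):
    """Turn every {{ttbc|...}} on the line into {{t|...}} once verified."""
    return re.sub(r"\{\{ttbc\|", "{{t|", line)
```

Because the language name is always plain text at the start of the line, the parser never has to look inside a template to find it, which is the consistency gain argued for above.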

Wikisaurus and attestation

I have created vote Wiktionary:Votes/pl-2013-09/Wikisaurus and attestation.

Let us postpone the vote as much as the discussion needs. --Dan Polansky (talk) 17:14, 17 September 2013 (UTC)

I believe that copying translation pairs from copyrighted translation dictionaries with an incompatible license into Wiktionary is a copyright violation. As at least two editors claim otherwise, I'd like to discuss this in Beer parlour. --Dan Polansky (talk) 07:54, 18 September 2013 (UTC)

To be specific, I cannot take the bulk of http://eudict.com/ and copy their translation pairs to Wiktionary. Likewise, I cannot take the translation pairs from http://slovnik.seznam.cz/ (provided by Lingea company) and copy them to Wiktionary. And I cannot take the multilingual translation pairs from http://www.thefreedictionary.com/ (a multilingual dictionary popular per Alexa rank) and copy them to Wiktionary as I see fit. --Dan Polansky (talk) 08:01, 18 September 2013 (UTC)

Can someone please help me find the topic, also in Beer parlour, where I asked about the necessity of references in entries? For some reason I think Dan Polansky also took part in the discussion. Anyway, the outcome was that we can use published dictionaries for separate words, if I remember correctly. --Anatoli (обсудить/вклад) 08:05, 18 September 2013 (UTC)
I believe that copying even a single translation pair from a single copyrighted translation dictionary constitutes a copyright violation. Entering translation pairs such that each of them is present in several independent copyrighted translation dictionaries does not seem to be a copyright violation. Granted, one infringed translation pair is not a real issue; the issue is the use of distributed editorship to copy a translation dictionary while each of the copying editors would only copy a small fraction. --Dan Polansky (talk) 09:26, 18 September 2013 (UTC)
Translations are facts. Facts by themselves are not copyrightable as per US copyright law. Please see the w:Feist v. Rural ruling for more information. -- Liliana 09:31, 18 September 2013 (UTC)
Translation dictionaries show originality in their choice of target terms for source terms. That originality is protected by copyright law. If all translation dictionaries ended up with the same target terms for each source term, it would be true that translation pairs are not protected by copyright as being a straightforward unoriginal obvious expression of a fact, but such is not the case. Strictly speaking, translation pairs cannot be facts; they can at best be a straightforward unoriginal unobvious expression of facts. Likewise, a sentence is not a fact; it is the meaning of a sentence that captures a fact. A fact is a state of affairs; in the case of a translation pair, the state of affairs is that one of the meanings of the source term is identical or similar to one of the meanings of the target term. --Dan Polansky (talk) 09:55, 18 September 2013 (UTC)
Corrected myself by inserting "unoriginal" and striking "un". --Dan Polansky (talk) 10:17, 18 September 2013 (UTC)
Translations are not facts; they are opinions and are potentially copyrightable. However, as far as I understand it, there has to be a minimum level of creativity before a definition is copyrightable. For example, French casser translated as break is not copyrightable; indeed that's why so many dictionaries have it. If there was a sentence of usage notes, that IMO would be copyrightable. Mglovesfun (talk) 10:03, 18 September 2013 (UTC)
I am talking about translation pairs, not definitions and not usage notes. Again, the choice of translation pairs shows originality. They do not need to show creativity in any artistic sense. I believe that a set of, say, 10,000 genuinely random numbers that a person publishes is subject to copyright because of the originality shown in that set, albeit not artistic originality.
The claim that "Katze" is a valid German translation of English "cat" is a fact, not an opinion. --Dan Polansky (talk) 10:08, 18 September 2013 (UTC)
We define foreign words with one-word translations where we can; also, I believe you are correct because of the derivative work rule (regarding your first paragraph). Mglovesfun (talk) 10:11, 18 September 2013 (UTC)
I believe that translation pairs of single words themselves cannot be copyrighted. Translation pairs of single English words are facts. It is difficult to show originality in translation pairs when it comes to translation dictionaries. Originality in translation pairs does not apply to single-word translations. These translation rules you are imposing are getting out of hand. Relying on only one source is what may constitute a copyright violation. Tedius Zanarukando (talk) 02:40, 19 September 2013 (UTC)