Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!



June 2015

Good site?

Hello. Do you like Wiktionary? --Keyboard Masher (talk) 23:31, 2 June 2015 (UTC)

@Keyboard Masher: I do. Do you? —Justin (koavf)TCM 04:53, 3 June 2015 (UTC)
Meh, it's OK. I'd enjoy it more if it came in more colours. --Keyboard Masher (talk) 08:05, 3 June 2015 (UTC)
We need a "smellyvision" version. SemperBlotto (talk) 06:01, 4 June 2015 (UTC)
Two words: Cleveland steamer. Chuck Entz (talk) 13:41, 4 June 2015 (UTC)
Yuck! --Hekaheka (talk) 20:16, 8 June 2015 (UTC)

Inuktitut characters

Would someone like to add entries for each of the characters used in Inuktitut words (those that we haven't already got)?

For instance the word ᑕᕝᕙ (tavva) contains the characters ᑕ (ta) (which we have), and ᕝ (v) and ᕙ (va) (which we haven't). SemperBlotto (talk) 16:30, 4 June 2015 (UTC)
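A rough sketch of the check described above, in Python: split a word into its characters and list those that have no entry yet. The function name and the set of existing entries are assumptions made for illustration; a real run would need the actual list of page titles.

```python
def characters_missing_entries(word, existing_entries):
    """Return the characters of `word` that are not in `existing_entries`.

    `existing_entries` is a hypothetical set of one-character page titles.
    Order of first appearance is kept and duplicates are dropped.
    """
    missing = []
    for ch in word:
        if ch not in existing_entries and ch not in missing:
            missing.append(ch)
    return missing


word = "ᑕᕝᕙ"           # tavva, the example from the post
existing = {word[0]}   # assume only the first character (ta) has an entry
for ch in characters_missing_entries(word, existing):
    # Print each missing character with its Unicode code point
    print(ch, "U+%04X" % ord(ch))
```

The same function could be run over every Inuktitut headword to collect the full list of characters still lacking entries.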

Normalization of entries vote

Wiktionary:Votes/pl-2015-05/Normalization of entries started today. --Daniel 22:47, 4 June 2015 (UTC)

Pywikibot compat will no longer be supported - Please migrate to pywikibot core

Sorry for English, I hope someone translates this.
Pywikibot (then "Pywikipediabot") was started back in 2002. In 2007 a new branch (formerly known as "rewrite", now called "core") was started from scratch using the MediaWiki API. The developers of Pywikibot have decided to stop supporting the compat version of Pywikibot due to bad performance and architectural errors that make it hard to update, compared to core. If you are using pywikibot compat it is likely your code will break due to upcoming MediaWiki API changes (e.g. T101524). It is highly recommended you migrate to the core framework. There is a migration guide, and please contact us if you have any problem.

There is an upcoming MediaWiki API breaking change that compat will not be updated for. If your bot's name is in this list, your bot will most likely break.

Thank you,
The Pywikibot development team, 19:30, 5 June 2015 (UTC)

Your usage of English is unforgivable :) —suzukaze (tc) 23:15, 9 June 2015 (UTC)

Votes on desysopping inactive admins

WF has created votes for the desysopping without prejudice of four sysops who have been wholly inactive for years. As these votes have largely escaped public notice, I've extended them for 10 more days so that more of the community can weigh in on whether or not to remove their bits. Please vote here: Wiktionary:Votes/sy-2015-05/User:Caladon for de-sysop, Wiktionary:Votes/sy-2015-05/User:Jun-Dai for de-sysop, Wiktionary:Votes/sy-2015-05/User:Celestianpower for de-sysop, Wiktionary:Votes/sy-2015-05/User:EivindJ for de-sysop. —Μετάknowledgediscuss/deeds 20:54, 5 June 2015 (UTC)

Has anyone tried contacting these admins for their input? bd2412 T 02:00, 6 June 2015 (UTC)
(Follow-up) I have posted messages on the talk pages of these four admins, and have sent e-mails to the three who have e-mail set up. bd2412 T 02:09, 6 June 2015 (UTC)

Entries for ISO codes

About having entries for language codes, like en or ang — and also ISO family/script/country codes. Can we have those, provided they are attested as usual, or was there some discussion or some issue preventing creating entries for them? Granted, one can predict that comparatively only a few, far from all codes would be attestable.

It's a bit difficult finding previous discussions in this subject as I naturally can't search for "language code entries" or "language code" without seeing a thousand unrelated discussions, but here are some, all of those are from 2010:

I've tried my hand at attesting Citations:Latn meaning Latin script. What do you think, that's good enough that we can create the entry Latn? I tried to find citations where Latn is being used in running text, in accordance with WT:CFI#Conveying meaning. --Daniel 10:11, 6 June 2015 (UTC)

Case in point: We have Category:ISO 3166-1 (country codes) with 471 entries. --Daniel 11:58, 6 June 2015 (UTC)
I suppose it comes down to the individual attestability of every single code. Some might be attestable and others might not. Remember we don't even keep units of measure (like stupid zettakilograms or whatever) unless they are attested, even if they follow official naming rules. Equinox 10:16, 6 June 2015 (UTC)
I seem to think after jv failed we deleted all ISO 639 codes, which is dubious because they didn't all fail RFV, just one or a couple of them. Renard Migrant (talk) 11:03, 7 June 2015 (UTC)

Contemporary Old High German

This is merely a mental exercise on a dogmatic question, but who knows, one day the Alemans could descend off their mountains into our dictionary, so give it a serious shot.
In the south of Switzerland, the local dialects

  • Do not feature final obstruent devoicing
  • Do not diphthongise PGM long vowels
  • Have long consonants
  • Have not lengthened short vowels in open syllables
  • Know at least five different vowel qualities in unstressed syllables (i, u, e, o, a)

So nobody can tell me that's not Old High German. At the same time, there are Alemannic dialects which have merged all unstressed vowels into /ə/. How they are not Middle High German is beyond me as well. Is it really sensible to list both as Alemannic rather than as living forms of OHG and MHG? Korn [kʰʊ̃ːæ̯̃n] (talk) 22:47, 7 June 2015 (UTC)

A language is not defined solely by sound changes, but by other things as well like grammar and lexicon. I would be more convinced by your argument if OHG were more intelligible to these speakers than Old Norse is to Icelanders. —CodeCat 23:25, 7 June 2015 (UTC)
Pardon? I wasn't making an argument, I was asking a question. Korn [kʰʊ̃ːæ̯̃n] (talk) 13:20, 8 June 2015 (UTC)
Is it really sensible to follow the lead of professional scholars on the subject? I'm going with yes.--Prosfilaes (talk) 12:36, 8 June 2015 (UTC)

Category:Northern German - Category:Southern German

Neither of these has a definition. If they're to be kept, they should. Especially in the former category, many of these terms aren't actually restricted to Northern Germany, they're used everywhere. They might be more common in some regions than others but that's not what the regional label is for. -- Liliana 15:32, 8 June 2015 (UTC)

Define "elsewhere" and please give examples. I'm highly baffled by your statement. (Which doesn't mean I'm not believing you.) Korn [kʰʊ̃ːæ̯̃n] (talk) 21:02, 8 June 2015 (UTC)
For example, the term Rummel is definitely not restricted to Northern Germany, it's used everywhere. Same with moin, it might have originated there, but it's used in the whole country nowadays. You can find terms like this in the other category as well: händisch is definitely not restricted to Southern Germany.
There is no line you can draw on the map to denote that north of it is Northern German and south of it is Southern German, unlike (say) Swiss and Austrian German which stop pretty much at the national borders. -- Liliana 21:15, 8 June 2015 (UTC)
That's not a reason to ditch the labels, especially given how common they are in other dictionaries, including de.Wikt and the Duden. There's no single precise definition of the "Southern US", and not all of the terms used in Category:Southern US English are used in all of the same exact places. But if certain terms are widely perceived/agreed to be "southern German" or "Southern US", categorizing them as such can still be useful. - -sche (discuss) 21:50, 8 June 2015 (UTC)
Actually, "Southern US" is a very well defined region, it refers to very specific states. You can't make that claim for the German categories discussed here. -- Liliana 22:03, 8 June 2015 (UTC)
But Southern US English is not restricted to the states considered the South. Indiana isn't the South, but the language of the southern half of Indiana is distinctly Southern. —Aɴɢʀ (talk) 14:48, 11 June 2015 (UTC)
You rediscovered the fact that language boundaries do not conform to political boundaries. Nevertheless, there is a more-or-less defined region, even if there is a gray area in between. --WikiTiki89 15:18, 11 June 2015 (UTC)
Comment 1: de.Wikt also uses these labels (e.g. in de:kross, de:Obacht), often simply linked to de:süddeutsch (süddeutsch) and de:norddeutsch (norddeutsch). Does that provide sufficient definition?
Comment 2: A while ago, I started a discussion about the redundancy of having both Category:North American English/Category:North American French and Category:American English/French and Category:Canadian English/French. The decision was to reduce the "North American" label to an alias of "US, Canada" and deprecate its category. We already have an "Austrian" label, would it be better to deprecate these two labels in favour of other state- or dialect- specific labels? OTOH, a category for "Bavarian German" regional German could be considered confusing by some. (Compare how "Swiss German" regional German was renamed "Switzerland German" by me because some users, although not me, felt the former name was too confusing.)
- -sche (discuss) 21:26, 8 June 2015 (UTC)
That might indeed be better although I have no idea how to divide the regions. There definitely are terms that are used only in Bavaria and nowhere else (grüß Gott being perhaps the most famous example). We already have {{DDR}} for terms from East Germany. -- Liliana 21:34, 8 June 2015 (UTC)
Two things: 1. If I ever catch a Bavarian saying 'moin', I'll smack some Grüß Gott into him. 2. To me Northern Germany always seemed very strictly defined as "Bundesländer with a sea coast" and more loosely as "areas where Low German happens". "Southern Germany" seems to be defined as areas where Alemannisch+Bairisch+Oberfränkisch happen. Isn't this how it's used 99% of the time, especially in linguistic context? Korn [kʰʊ̃ːæ̯̃n] (talk) 10:16, 9 June 2015 (UTC)
And the middle states are what, nothing? -- Liliana 20:34, 9 June 2015 (UTC)
Central Germany. - -sche (discuss) 21:56, 9 June 2015 (UTC)
I propose that we formally define Southern Germany for WT as the area south of the Speyrer and Northern Germany north of the Uerdinger line. Korn [kʰʊ̃ːæ̯̃n] (talk) 09:27, 11 June 2015 (UTC)
"Bundesländer with a seacoast" would exclude Berlin and Brandenburg (which are often considered part of Northern Germany) and Hamburg (which always is). The Uerdingen line seems better for linguistic purposes such as ours. —Aɴɢʀ (talk) 14:48, 11 June 2015 (UTC)
I've not encountered Berlin and Brandenburg being considered Northern Germany, culturally, by anyone in my life. Hamburg of course is part, but it's within the realm of coastal states, so to speak. Korn [kʰʊ̃ːæ̯̃n] (talk) 23:41, 11 June 2015 (UTC)

What to do when the lemma form (and only that form) has alternative forms?

When a single lemma has several different options for a particular inflection, we create entries for all of them and give them the appropriate definition. So for example, an English noun with two possible plurals will simply have one plural entry for each of them. But in some cases, the form that we have chosen as the lemma will have alternative forms itself. In some cases, this implies that the stem of the word is different, so that they have two separate sets of inflections. In this case, we define one of them as "alternative form" and include inflection tables on both.

But it's also possible that there is one single paradigm that happens to have two possible forms for the lemma form only. For example, a noun might have two different nominative singular forms, but there is only one form for all other inflections. Or a verb could have two possible infinitives. How should we handle these cases? If we use "alternative form of" then it's misleading because the user might think that this is an entirely different verb with its own inflections, when in reality only the lemma form happens to have an alternative form. So I'm thinking that it would make more sense to list this as, for example, "nominative singular of" or "infinitive of", just like we do with any other inflected form.

Of course, the question also arises which of the forms should be chosen as the "real" lemma. —CodeCat 22:41, 9 June 2015 (UTC)

I think our current practice is good and needs no changing. See for example honos. —Μετάknowledgediscuss/deeds 22:44, 9 June 2015 (UTC)
Ok, but which lemma do you have the inflections point to, honor or honos? —CodeCat 22:51, 9 June 2015 (UTC)
Either to the lemma, or to both the lemma and alternative form. I prefer just to the lemma, which in this case is honor. --WikiTiki89 23:03, 9 June 2015 (UTC)
Alternative forms are also lemmas. They're categorised as such. —CodeCat 23:23, 9 June 2015 (UTC)
That means nothing more than that there is a discrepancy between how I used "lemma" in my sentence and how we use it in categorization. You still understood what I said. --WikiTiki89 01:03, 10 June 2015 (UTC)
I agree with Meta and WikiTiki. - -sche (discuss) 23:52, 9 June 2015 (UTC)
The problem seems to be that we use "alternative form" to mean both a single inflected wordform sometimes, and an entire paradigm at others. --Tropylium (talk) 00:06, 17 June 2015 (UTC)
That's a very good point. Maybe we should start distinguishing between them. --WikiTiki89 16:44, 17 June 2015 (UTC)
I think "alternative form" should only be used for lemmas. For non-lemma forms, the distinction is moot because both are equally forms of their lemma. —CodeCat 17:57, 17 June 2015 (UTC)
Not sure what you mean. What I thought Tropylium was talking about was that we have things like "ax is an alternative form of axe", which means nothing more than "the form ax is an alternative of the form axe", and then we have things like "plow is an alternative form of plough" which really means "plow and all of the forms plow represents are respectively alternatives of plough and all the forms plough represents". --WikiTiki89 18:51, 17 June 2015 (UTC)
Yes. And I'm saying we should only be using "alternative form" for the latter. We should also not be using "alternative form" for non-lemmas, so nothing like marking octopodes as an alternative form of octopuses. Both should be marked simply as the plural of octopus. —CodeCat 19:07, 17 June 2015 (UTC)
So then what would you do for ax? --WikiTiki89 19:20, 17 June 2015 (UTC)
What we should consider is what to do with the plural. If axes should be defined as a plural of both axe and ax, then the latter is an alternative spelling of the former. But if it's only the plural of axe, then ax would just be an alternative spelling of the singular form only. —CodeCat 19:41, 17 June 2015 (UTC)
If we follow what you said, then we can't do the latter because you can't have an alternative form of a single form. Unless you want to make an exception and only allow this for the lemma form. --WikiTiki89 19:51, 17 June 2015 (UTC)
Well, axes is, in fact, the plural of ax as well as the plural of axe (not to mention the plural of axis), which is exactly what the page already says. Do you think that we should say it's the plural of axe (and axis) alone, and not mention ax on the page at all? —Aɴɢʀ (talk) 19:47, 17 June 2015 (UTC)
That's the idea of this proposal. If one word happens to have two or more lemma forms, then it's not helpful to have inflected form entries that link to each of them separately.
The way I see it is that we treat lemmas as "inflection sets". An alternative form has, by the treatment proposed here, a different (or at least partially different) inflection set from the word it's an alternative form of, and is therefore a lemma in itself, albeit one that is used in variation with another. This means that if two lemmas have the same inflection set except for the lemma form, then they clearly have the same inflection set and only the lemma form of the inflection set has several forms. This is no different from having a non-lemma form that has several forms. For example, having one inflection with two nominative singular forms is conceptually no different from one with two genitive plural forms.
This is not a simple rule, though. After all, there are cases like English nouns where the inflection set consists of only two items, singular and plural. Does a word belong to a different inflection set if the singulars differ but have the plural forms in common? This is something we would need to determine separately. —CodeCat 20:48, 17 June 2015 (UTC)
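The "inflection set" idea above can be sketched in Python. This is only an illustration under assumptions: each paradigm is modelled as a hypothetical dict from slot name to form, and the helper names are invented here. It checks whether two lemmas share every form except the lemma form itself, using the ax/axe and plow/plough pairs mentioned earlier in the thread.

```python
def differing_slots(paradigm_a, paradigm_b):
    """Return the sorted slot names whose forms differ between two paradigms."""
    slots = set(paradigm_a) | set(paradigm_b)
    return sorted(s for s in slots if paradigm_a.get(s) != paradigm_b.get(s))


def same_set_except_lemma(paradigm_a, paradigm_b, lemma_slot):
    """True if the two paradigms differ in the lemma slot and nowhere else."""
    return differing_slots(paradigm_a, paradigm_b) == [lemma_slot]


# Toy two-slot English noun paradigms (singular is the lemma slot)
ax = {"singular": "ax", "plural": "axes"}
axe = {"singular": "axe", "plural": "axes"}
plow = {"singular": "plow", "plural": "plows"}
plough = {"singular": "plough", "plural": "ploughs"}

print(same_set_except_lemma(ax, axe, "singular"))       # only the lemma form differs
print(same_set_except_lemma(plow, plough, "singular"))  # the whole paradigm differs
```

As the post notes, a two-slot paradigm is the hard case: the check says ax and axe share one inflection set, but whether that is the right treatment for English nouns would still need to be decided separately.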

Citation links

Using {{seeCites}} at the entry example returns the text:

"For usage examples of this term, see the citations page."

Sometimes the citation page linked is different from the entry name, but the template text shows absolutely no indication of that. For that reason I would like to edit the text.


  • Insert the title of the citations page in the text.
  • If the entry is different, link to the entry too. (Use {{l-self}}.)

For example it might return this text:

"For usage examples, see the citations page of example." (unlinked entry)
"For usage examples, see the citations page of example." (linked entry, if the template is used anywhere other than example itself)


At the Portuguese entry como, some citations are at Citations:como, but the citations which are verb forms of comer would be at Citations:comer, both citation pages are linked from the respective POS section but I would like to change the fact that there is no indication that these are actually different citation pages.

After editing the template, at the entry como we would have both:

  • Adverb/Conjunction/Interjection

"For usage examples, see the citations page of como." (unlinked)

  • Verb

"For usage examples, see the citations page of comer." (linked)
--Daniel 14:27, 18 June 2015 (UTC)
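The proposed behaviour can be sketched as a small Python function. This is not the template's actual code, just an illustration of the two rules above; the function name is made up, and the wikitext it returns (a piped link to the citations page, and a plain link standing in for {{l-self}}) is an assumption.

```python
def see_cites_text(current_entry, cited_entry):
    """Build the proposed notice for a citations page.

    current_entry: title of the page the template is used on.
    cited_entry:   entry whose citations page is being pointed to.
    The entry name is linked only when it differs from the current page.
    """
    if cited_entry == current_entry:
        shown = cited_entry                  # unlinked on the entry itself
    else:
        shown = "[[" + cited_entry + "]]"    # linked everywhere else
    return ("For usage examples, see the [[Citations:" + cited_entry
            + "|citations page]] of " + shown + ".")


# The two sections of the Portuguese entry "como"
print(see_cites_text("como", "como"))
print(see_cites_text("como", "comer"))
```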

Support. Good idea. It's not uncommon for citations of one word to be in another word's citations page, either because citations of all spellings and hyphenations have been gathered in one place (Citations:moose-misse), or because citations of all inflected forms have been placed on the lemma's page (Citations:they), or potentially for some other reason. - -sche (discuss) 15:59, 18 June 2015 (UTC)
Why not just show the actual name of the citation page? Citations:example? —CodeCat 17:31, 18 June 2015 (UTC)
That'd be fine by me. - -sche (discuss) 22:45, 18 June 2015 (UTC)
  • Support per -sche. DCDuring TALK 21:37, 18 June 2015 (UTC)
  • Support.​—msh210 (talk) 06:52, 21 June 2015 (UTC)

Done, with both {{seeCites}} and {{seemoreCites}}. --Daniel 00:25, 4 July 2015 (UTC)

A new (better) way to collapse inflection tables?

MediaWiki:Gadget-legacy.js contained a second, apparently unused method of collapsing elements. It's more flexible: rather than having to put everything in wrapper divs, you can specify for individual elements whether they should be hidden or not. Moreover, it's possible to specify that elements should be displayed only when the element is collapsed. This makes it possible to have a table that shows one set of table rows when collapsed, and another set when expanded. For inflection tables, the expanded version could show the full table, while the collapsed version shows only the most important/least predictable forms (principal parts).

I have created an example of this at User:CodeCat/vsExample. Compare it to the original table at gooien. Note that there are no more wrapper divs, the table itself is the outermost element now. This makes it possible, in theory, to have the table scale automatically as its contents get too wide. This was not possible with the old method. —CodeCat 23:52, 18 June 2015 (UTC)
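To make the per-element idea concrete, here is a minimal model in Python. It is not the gadget's JavaScript; the three mode names and the function are invented for illustration. Each row says whether it shows always, only when collapsed, or only when expanded.

```python
def visible_rows(rows, collapsed):
    """Return the texts of the rows that should be shown.

    rows:      list of (text, mode) pairs, where mode is one of the
               hypothetical values "always", "collapsed-only", "expanded-only".
    collapsed: current state of the table.
    """
    hidden_mode = "expanded-only" if collapsed else "collapsed-only"
    return [text for text, mode in rows if mode != hidden_mode]


table = [
    ("table title", "always"),
    ("principal parts", "collapsed-only"),
    ("full present tense", "expanded-only"),
    ("full past tense", "expanded-only"),
]

print(visible_rows(table, collapsed=True))
print(visible_rows(table, collapsed=False))
```

Because the state is attached to individual rows rather than to a wrapper, the table can remain the outermost element, which is the property the post relies on.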

Looks good. Might be a good way to hide selected portions (like cognate lists) of our too-long etymologies. DCDuring TALK 01:28, 19 June 2015 (UTC)
Yes. We already do that with the old "div" method in some pages, but with the new method, we can make it look different and more appealing. Though, to be fair, I usually remove cognates if they are just duplicated in many entries and if they can easily be found on a proto-language page. —CodeCat 01:55, 19 June 2015 (UTC)
There are more than 7,500 entries containing "{{m|ine-pro|", "English", "Etymology", and "cognate" and/or "compare", so we have a way to go in cleaning them out. I could see why it is handy not to compel those with specialized interests to rummage around in multiple entries, but most users have no interest in such matters and find our above-the-definition material intimidating and confusing. Perhaps all lists of cognates should be enclosed in a template, which allowed them to be hidden by default and displayed for a given registered user always by gadgetry or by use of a show/hide control. Perhaps something similar would make sense for the portions of etymology related to all or some reconstructed languages. DCDuring TALK 22:55, 19 June 2015 (UTC)

I've implemented this for the Dutch inflection tables now, and I'm quite pleased with the result. See groot, goed, verbogen, zijn, werpen, uitwerpen for some examples. But now that the most important forms are shown in the inflection table even when it's collapsed, it's a bit redundant to show them in the headword line as well. Presumably they should be removed from there. If we start extending this kind of inflection table to other languages, then we should probably remove the forms from the headword line then too. For example, we would no longer need to show the principal parts of Latin words in the headword line if they are already shown in the inflection table. It's rather redundant otherwise. —CodeCat 13:45, 23 June 2015 (UTC)

@kc kennylau pinging because he has worked on Latin templates recently. —CodeCat 13:48, 23 June 2015 (UTC)

They do look good. I wouldn't rush to remove the redundancy on the inflection line as we have trained users to look there for core inflection information. Perhaps the redundancy is really in the new tables. DCDuring TALK 15:24, 23 June 2015 (UTC)

Category:Perching birds

Discussion moved to Wiktionary:Requests for moves, mergers and splits#Category:Perching_birds.


22:00, 19 June 2015 (UTC)

Apache nesting

(Has this been discussed before?) Only some Apache varieties are nested. Should they all be nested, or should none of them be nested? - -sche (discuss) 22:05, 21 June 2015 (UTC)

Judging from the Wikipedia article (w:Southern Athabaskan languages), Navajo is more closely related to Western Apache and the Chiricahua/Mescalero group than it is to Plains Apache, Jicarilla Apache or Lipan Apache. That makes it kind of pointless to talk about nesting based on linguistic criteria. We have to decide whether we're nesting based on cultural/historical commonalities, in which case Plains Apache shouldn't be included, or just convenience- lumping together everything named "Apache". I think anything but the latter is going to be confusing to the average user, so I'm inclined to either nest them all or nest none of them. Chuck Entz (talk) 23:41, 21 June 2015 (UTC)
I would nest none of them, in part because users will probably look for Plains Apache under "P", etc. I question our nesting of Ancient Greek and Mycenaean Greek under modern Greek, too. - -sche (discuss) 00:52, 22 June 2015 (UTC)
Taxonomic (family) nesting and evolutionary nesting (not yet suggested) seem to suit us, not users unlike us. Listing under hypernyms is at least accessible for ordinary users, as long as the modern language name, which is also usually the hypernym, appears where it belongs in an alphabetical sequence. Sortable tables would address this and similar issues in other data (such as definitions), but they may not be feasible, reliable etc. DCDuring TALK 01:12, 22 June 2015 (UTC)
We don't really have any rules on nesting anyway, do we? We seem to do it on a very subjective, intuition-based basis. I feel like it makes sense to nest Primitive Irish, Old Irish, and Middle Irish under Irish, but if the rule is to group ancestral forms under the equivalent name without words like "Primitive", "Old", "Middle", etc., then it isn't clear where to put Old English and Middle English (since we never have English in translation tables) or Old Norse (since there isn't a language we call "Norse"). I think I would look for Plains Apache under A rather than P, but I don't know how representative I am. —Aɴɢʀ (talk) 12:32, 22 June 2015 (UTC)
In my experience, people look up all varieties of Greek under Greek, and all varieties of Apache under Apache. Navajo is expected to be under Navajo. —Stephen (Talk) 12:49, 22 June 2015 (UTC)

Category:Plurals with a red link for singular

May I bring this category to your attention. It contains plural words that various people have come across but don't know how to define the singulars.

Any help in providing such a definition would be welcome. Please ignore the appendices, user pages, talk pages and the like. SemperBlotto (talk) 16:21, 22 June 2015 (UTC)

p.s. I have got as far as "g".

With some adjustments to Module:form of, this can probably be extended for any form-of entry whose lemma is missing. —CodeCat 17:26, 22 June 2015 (UTC)
Any idea why Appendix:Proto-Algonquian/aya·pe·waki is in the category? - -sche (discuss) 18:10, 22 June 2015 (UTC)

Synchronic and diachronic derived terms

Many languages have terms which were derived through processes that are no longer productive, but where the relationship is still clear enough to be recognised by most speakers. For example, Dutch has many cases in which a noun is derived from the root of a verb through ablaut, or by using a variety of obsolete suffixes. Some examples: springen > sprong, dringen > drang, spreken > gesprek, zien > zicht. The question is whether these can be considered derived terms. I think most Dutch speakers would understand that sprong is derived from springen, even if the actual method of derivation is opaque. But the actual derivation occurred in Pre-Proto-Germanic times.

And if these are derived terms, then where should we draw the line? Is dawn still a derivative of day? lord a derivative of loaf? —CodeCat 18:13, 22 June 2015 (UTC)

I think ====Derived terms==== should be limited to regular derivations whose process is transparent and could be applied to other words. In the examples you gave from Dutch, there are two issues: that it is not clear exactly how the vowel is determined, and that it is not clear without looking at historical evidence whether it was the noun or the verb that came first. Thus, in my opinion, all the examples you gave are better off in the ====Related terms==== section. However, I completely agree that for words whose derivations are still regular and transparent, they should be allowed in ====Derived terms==== even if the derivation took place thousands of years ago in a vastly different parent language. --WikiTiki89 18:27, 22 June 2015 (UTC)
A cutoff point with the clause "could be applied to other words" can get unwieldy quite fast for agglutinative languages. These often have a wide variety of derivative suffixes that are entirely transparent, but not really at all productive in the sense of being applicable to any arbitrary word. E.g. the Finnish suffix -sto regularly yields collectives, but this does not mean it is actually possible to take any random word like roskakori and form something like ˣroskakoristo. Often they will still be productive in the weaker sense that every so often, a new instance of a word using the suffix is added to the language — but this is not really a synchronically measurable property.
On the other hand: mere transparency seems to be too weak a condition. This will generate things like Category:Finnish words prefixed with geronto- or Category:Finnish words prefixed with terato-, although I do not think there are any cases of native Finnish formations using these Hellenic prefixes.
So, perhaps: morphophonological transparency for native derivational processes, versus evidence of productive use for originally foreign derivational processes? --Tropylium (talk) 07:45, 23 June 2015 (UTC)
First of all, we are only talking about the ====Derived terms==== section, not the ===Etymology=== section. Second of all, everything you described about -sto fits my definition of "could be applied to other words". Note that I did not say "could be applied to any other word". --WikiTiki89 16:46, 23 June 2015 (UTC)

Request to add glosses in etymology sections

Could we make it a policy and/or guideline that editors should add glosses for etyma when working on etymology sections? For instance, knowing that knǫrr ‎(a large merchant ship used in mediaeval Scandinavia) comes from Proto-Germanic *knarzuz is interesting, but what does *knarzuz mean? It would be more useful if *knarzuz were provided with a gloss right there in the etymology section -- especially when we have no entry yet for the given etymon. ‑‑ Eiríkr Útlendi │Tala við mig 18:14, 22 June 2015 (UTC)

The normal practice is to give glosses only if the word means something else than the one preceding it (its descendant). So if knǫrr means the same as *knarzuz then only the former would have a gloss. This also means that if the word never changed meaning throughout its known history, then no glosses should be present at all. —CodeCat 18:20, 22 June 2015 (UTC)
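The rule as stated can be written out in a few lines of Python. This is a sketch under assumptions: the etymology is modelled as a hypothetical list of (term, meaning) pairs starting with the entry itself and going back in time, and meanings are compared as plain strings, which is much cruder than an editor's judgement.

```python
def etyma_needing_gloss(chain):
    """Return the etyma that get a gloss under the 'only if it differs' rule.

    chain: list of (term, meaning) pairs; chain[0] is the entry word and each
           later item is the etymon of the one before it.
    """
    needing = []
    previous_meaning = chain[0][1]
    for term, meaning in chain[1:]:
        if meaning != previous_meaning:
            needing.append(term)
        previous_meaning = meaning
    return needing


# Placeholder chains; the etymon names are stand-ins, not real forms
unchanged = [("foot", "foot"), ("Middle English etymon", "foot"),
             ("Old English etymon", "foot")]
changed = [("entry", "a"), ("etymon-1", "a"), ("etymon-2", "b"),
           ("etymon-3", "b")]

print(etyma_needing_gloss(unchanged))
print(etyma_needing_gloss(changed))
```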
  • It's only unclear if it isn't followed rigidly (which of course it isn't), but I do feel it would be tedious to see that foot comes from a Middle English word that means 'foot', which comes from an Old English word that means 'foot', which comes from a Proto-Germanic word that means 'foot', which comes from a Proto-Indo-European word that means 'foot' and is cognate with a Sanskrit word that means 'foot' and an Ancient Greek word that means 'foot' and a Latin word that means 'foot'. —Aɴɢʀ (talk) 19:03, 22 June 2015 (UTC)
  • At the bare minimum, it would be useful to have a gloss for the last etymon in the chain, in cases where the meaning hasn't changed. ‑‑ Eiríkr Útlendi │Tala við mig 19:12, 22 June 2015 (UTC)
  • Another set of cases for which we need glosses involves etymon redlinks.
Still another would be an etymologically important missing definition where we have only an incomplete entry for the etymon.
Yet another involves any etymon that is/was highly polysemic, especially in a sense that is less common, archaic, or obsolete.
I find myself constantly trying to look up etymon definitions and being frustrated. When I am able to find the definitions from other sources, the "same definition as previous etymon" assumption proves unwarranted except in the loosest of senses of same. I am often interested in whether a term had achieved a specialized meaning in Ancient Greek or Latin, and such specialized meanings are often neglected in our entries.
As a result I would favor having an explicit requirement that we have glosses, except in cases where we have an entry for the etymon, the applicable sense(s) are in the entry, and the applicable sense is clear. DCDuring TALK 20:44, 22 June 2015 (UTC)
A problem here is that the meanings of words in proto-languages are not necessarily even reconstructible in too much detail. Often it is easy enough to figure out that a word meaning "a" in language A and a word meaning "b" in language B are cognate, but it can be an intractable question if the original meaning was "a", "b", both of them, or perhaps something slightly different altogether. I'm in favor of glossing attested pre-forms in e.g. Latin, especially if they differ, but this policy cannot be fully generalized for all pre-forms. --Tropylium (talk) 07:26, 23 June 2015 (UTC)
For my use of a dictionary that is a reason to exclude such reconstructions, perhaps by hiding them so they don't waste screen space. DCDuring TALK 09:38, 23 June 2015 (UTC)
So this is another argument for a user setting "Hide etymologies", I guess? --Tropylium (talk) 12:48, 29 June 2015 (UTC)
The proto-form explains how the cognates fit together, and the cognates themselves give clues about the possible range of meanings for the proto-form; they're complementary. The problem with too many similar cognates is that they obscure that relationship, especially if one branch shares an innovation, and the sheer number of cognates in that branch gives the impression that they're the norm. Chuck Entz (talk) 13:33, 29 June 2015 (UTC)

Standard forms of words versus regional variants

If there is a standard version of a word, should it be used in place of regional variants? Changing from one regional variant to another regional variant is counterproductive, but what about changing from a regional variant or alternative form of a word to the word's standard version? --WikiWinters (talk) 00:26, 23 June 2015 (UTC)

This should be decided for each language individually. Some languages have standard forms, but the standard is not widely followed by speakers. So the standard form is not always the most-used or best-known form. —CodeCat 14:01, 23 June 2015 (UTC)

Proposal: Always collapse cognate lists in entries

In the discussion above, User:DCDuring suggested that cognate lists should always be hidden behind a collapsible element of some sort. I do think this is a good idea, because cognates often clutter up etymologies, and it's not unusual to see huge lists of them because of course everyone insists on including their favourite language.

Aside from this, I think it's also worth discussing what else we can do about cognates. In a lot of cases, the cognates are already listed neatly in the descendants section of the term's ancestor. Listing them in the entry as well is redundant then, and duplicates information, so we may want to remove cognates altogether if they're already listed more thoroughly on another, central page. On the other hand, having them in the entry is convenient to the user, at least. So what can we do to alleviate the duplication? —CodeCat 13:59, 23 June 2015 (UTC)

I suppose we might consider whether there is any reasonable way to decide whether a cognate for a term should:
  1. Appear unhidden as part of the etymology in the entry for the term (Some cognates seem to be more or less essential elements of an etymology.)
  2. Appear hidden as part of the etymology. (This might be particularly warranted if there is no entry for the term's ancestor at which the term's cognate would appear as a derived term or descendant.)
  3. Not appear at all in the entry, but rather in descendants or derived terms of an ancestor of the term.
But hiding seems to be a good tool for handling cognate lists, pending moving the cognate to descendant or derived term in another entry or possibly creating such entry. Although this is in principle just a temporary solution, it is likely that there will always be some cognates that have no home as descendants or derived terms. DCDuring TALK 15:20, 23 June 2015 (UTC)
We could make a template similar to {{etymtree}} in order to list cognates without duplicating information. — Ungoliant (falai) 15:26, 23 June 2015 (UTC)
Combining hiding with avoidance of duplication seems like a good idea. I never cease to be amazed at how little the performance penalty is for well-designed templates/modules+data of such apparent complexity. DCDuring TALK 15:40, 23 June 2015 (UTC)
It should depend on the number. If there are only three or four cognates listed, there really is no need to hide them and they serve to illustrate the etymology. --WikiTiki89 17:22, 23 June 2015 (UTC)
I understand and sympathize with that view, but I think many current and potential users find cognates distracting and irrelevant, even to etymology. Curious users will click on whatever unhide control we have and registered users can set it up to display by default for them. That CSS flexibility seems to me to fully address the concerns of all parties, if we are willing to do the work. DCDuring TALK 17:34, 23 June 2015 (UTC)
So you would hide them even if there is only one cognate? --WikiTiki89 17:38, 23 June 2015 (UTC)
I think that cognates in a few representative major languages should be shown as presently. When the number grows beyond "a few" they could be hidden behind a "click to see longer list of cognates" feature. 21:57, 27 June 2015 (UTC)
There's always going to be non-neutrality in which languages we choose. For example if we choose Swedish, then people will start adding Norwegian and Danish. Include Finnish, and soon there'll be Estonian too. That's just how it always goes. —CodeCat 22:23, 27 June 2015 (UTC)
I didn't mean that a fixed list of "major" languages should be enforced. If a Swedish word is used in one place then a Norwegian or Danish one can be used somewhere else. Of course, if people are going to be keeping score ... 22:57, 27 June 2015 (UTC)
It's not a matter of keeping score. There are editors who see it as their purpose in life to add cognates in their language to every English term with an etymology, and especially to those with cognates in languages that they see as linguistic rivals. This is most obvious with Albanian and Kurdish, but various Scandinavian and Romance languages do it too. There are also some real partisans in Turkic, Dravidian and in some African language families, but they don't have English cognates to work with. Chuck Entz (talk) 23:53, 27 June 2015 (UTC)
You can understand "keeping score" as covering all kinds of activities where individuals must have their favourite language in the non-collapsed part of the list on every occasion, rather than accepting a spirit of give and take. 00:05, 28 June 2015 (UTC)
I'll just throw in what I seem to be saying in every discussion recently: If someone wants to do non-harmful work, why would one undo it? Just collapse them where they are non-essential parts of an etymology section or at least be consistent and disallow them in etymology sections completely. No picking and choosing; we must avoid every tiniest opportunity for people to argue. Korn [kʰʊ̃ːæ̯̃n] (talk) 17:58, 28 June 2015 (UTC)
Oppose. I love (all) cognates. Wyang (talk) 23:59, 27 June 2015 (UTC)
Delete cognates when they are listed in the proto-page. --Vahag (talk) 12:48, 28 June 2015 (UTC)
  • I've noticed, when using Century 1911, which is accessible as indexed scans of the pages of the print dictionary, that the often longish etymologies seem to defeat the role that etymologies play in grouping definitions by similarity of meaning. I think that very same defeat of user accessibility is what we have achieved in some of our entries with longer Etymology sections. I had formerly supported the current Etymology-first presentation, but I now wonder whether we should revisit the notion of putting Etymology at the bottom of the group of definitions to which it applies. That practice is what most online dictionaries follow, presumably reflecting their beliefs about user behavior, some of which are almost certainly based on actual click data. Having the Etymology sections below the definitions (in each homonym section) would allow the cognate lists to be as long as anyone wanted without much interfering with users who were interested in definitions. DCDuring TALK 19:21, 28 June 2015 (UTC)
    • Definitely agree. Most users want definitions first, so why present them with etymology at the top? —CodeCat 19:50, 28 June 2015 (UTC)
    • For words with multiple different etymologies, the "Etymology" headings presently also act as section headings, so some consideration would need to be given to how that would work. Would there be an "Etymology 1" heading, for example, and then later a further "Etymology" heading within the "Etymology 1" section? Having said that, I essentially agree that etymologies should come after definitions. More generally, I think there is very considerable further scope for improvement in Wiktionary page design, so as to make it more attractive and appealing to users. 20:27, 28 June 2015 (UTC)
    • I agree the definition should probably come first, as it simply has to be the most important thing for most dictionary users. It's not just online dictionaries that put def before ety, either. Many put the pronunciation before the def, but they don't have a big subtitled section for it! Equinox 20:35, 28 June 2015 (UTC)
      I like the etymology-first presentation when the etymology section is short, but we don't seem to be getting very far in limiting its use of the scarce screen space that users see first. Cognates are only part of the problem. Many etymologies are just verbose. An alternative to reordering sections would be to have the Etymology sections collapsed in their entirety, with a terse etymology appearing in the show/hide bars such as are produced by {{rel-top}}. DCDuring TALK 21:28, 28 June 2015 (UTC)
      That's just a workaround at best. First we decide to put it first, then we decide that we don't want it there and hide it? We should just move it elsewhere completely. —CodeCat 21:35, 28 June 2015 (UTC)
      You can be dismissive as a rhetorical tactic, but I thought it was a way of having one's cake (terse etymology visibly organizing the entry) and eating it too (drastically reducing the space taken by the worst-offending lengthy etymologies). DCDuring TALK 22:49, 28 June 2015 (UTC)

The Index namespace

I think we should either keep the indexes updated, or completely delete them. It is confusing to our readers to have seriously out-of-date indexes. --WikiTiki89 19:09, 23 June 2015 (UTC)

For the most part, our lemma categories have replaced these. —CodeCat 19:10, 23 June 2015 (UTC)
Which is why I favor the latter option (i.e. deleting them). --WikiTiki89 19:12, 23 June 2015 (UTC)
Large categories are pretty hideous to navigate through, though. The lemma categories should be as easy to browse as the pages of a real dictionary. —CodeCat 19:22, 23 June 2015 (UTC)
Yeah, I don't know why they make the categories so difficult. On all other pages (history, watchlist, etc.), you can adjust how many entries you see on the screen and skip multiple pages or to a particular page number; but in categories, the number is fixed to 200 and you can only move forward or backward one page at a time. We need to complain harder to the devs about this. --WikiTiki89 19:29, 23 June 2015 (UTC)
What about cattoc's? Like here Category:English lemmas or here (with just alphabet which I think is enough) Category:Latvian lemmas.
Agreed on "unwieldiness" of browsing cats. There've been times where I've been "shopping" for an audio file to be used in wiki to illustrate a particular sound and the fact that the only thing for navigation that I have is "Next 200" is very inconvenient, cattoc makes this much more convenient.
Also agree about indices, I understand some people have invested time (at some distant point in the past) in maintaining them but the whole point escapes me. Something like that should always be auto-updated (like categories are.) The indices, imo, should be replaced with lemma cats with cattocs. It probably takes a couple minutes of work to add this cattoc but then there could simply be a drive "want your pet language featured in the indices box on the first page? Well, then go and make a cattoc for an alphabetical index of its lemma cat." Neitrāls vārds (talk) 11:30, 25 June 2015 (UTC)
We do have "TOC"s in the lemma categories, as you have already pointed out. But having both that and more navigable pages would be much better. --WikiTiki89 15:41, 25 June 2015 (UTC)

Are there any languages for which the Index is satisfactorily updated? In other words: can we delete the whole Index namespace at once or are there any languages that should be kept? "Chinese radical" index is one that comes to mind since it is different from the rest - it is not a list of Chinese words but a (large) list of Chinese characters. I don't have the ability to tell if it's accurate, of good quality, complete or near completion. Also, what about proto-language indices like Index:Proto-Indo-European/d? Can those be deleted too? --Daniel 00:22, 4 July 2015 (UTC)

See Special:Contributions/Conrad.Bot. The only languages for which the Index is potentially up to date are those that have not had any new entries since the last time Conrad.Bot updated it (which was May 2, 2012). In other words, if there is such a language, it's rather insignificant. As far as proto-languages, it seems they were updated by a different bot, NadandoBot, which last updated them on September 22, 2012, but I see no reason they should be treated any differently; they have lemma categories just like any other language. Any valid red links can be collected on a requests page. --WikiTiki89 20:26, 6 July 2015 (UTC)
Looks good enough to me. Since deleting the whole index (or most of it minus Chinese radical, I guess; I don't know if it could be replaced by categories, but it seems it hasn't) is a major project, if it's alright I'm thinking of creating a vote for it sometime in the next few days. --Daniel Carrero (talk) 01:20, 13 July 2015 (UTC)
Sounds good. --WikiTiki89 13:15, 13 July 2015 (UTC)
It seems that I am the only one who wants to keep the Index namespace. There are so many talented editors here who could update Conrad's code and run it as a bot a couple of times a year. Reasons to keep (I am repeating what I said elsewhere):
  1. an audio link if there is an audio
  2. the part of speech
  3. asterisks linking back to the English entries where translations were added
  4. red-link entries that were added to translations but not created yet
  5. orange-link entries that were added to translations but the FL section is missing on the entry page
  6. it is also an excellent tool for troubleshooting and maintenance, showing mistranslations and incorrect entries
  7. it is more compact than the lemma category (a full-size window can show even 5 columns and all this extra information)
  8. it is easier to navigate than the lemma category (this was also mentioned by others above)
Would you all please reconsider? --Panda10 (talk) 14:31, 13 July 2015 (UTC)
Your reasons for keeping the Index sound good, but IMO the great problem is that the Index is out of date, with nobody yet to update it. I propose carrying on with the project of deleting most of the Index namespace, while we could mention on the vote that this is without prejudice; that people are encouraged to "update Conrad's code and run it as a bot a couple of times a year" in case someone volunteers to do so. --Daniel Carrero (talk) 15:05, 13 July 2015 (UTC)


Previous discussion: Wiktionary:Grease pit/2015/June#Template:archive-top

The terms "passed" and "failed" are not very clear when it comes to RFD/RFDO. It would be clearer to use "kept" and "deleted". However, for RFV, it does make more sense to use "passed" and "failed". Therefore I would like to propose that we change both the displayed text and the template parameter from "passed"/"failed" to "kept"/"deleted" for and only for archives of RFD/RFDO discussions. The downside would be that it would complicate the template logic and possibly confuse the users of the template to have different sets of values for RFD/RFDO and RFV. --WikiTiki89 18:48, 25 June 2015 (UTC)

I will just point out that you do not even have to think about such inane details if you just use the archiving script I wrote. Which archives better than you ever could manually. Keφr 18:52, 25 June 2015 (UTC)
@Kephir: Using "passed" and "failed" for RFD/RFDO archives is still confusing to the readers of the archive, regardless of how it was archived. Has it failed to be deleted, or has it failed to be kept? --WikiTiki89 18:53, 25 June 2015 (UTC)
I surmise that readers of the archive read it in page view mode, not directly as wikitext. I have no idea how a detail they are not even aware of could confuse them. Keφr 18:58, 25 June 2015 (UTC)
"The following information has failed Wiktionary's deletion process." Is that not the text they would see? --WikiTiki89 19:00, 25 June 2015 (UTC)
Yes, but that is a completely different issue from what template parameters trigger this text. Keφr 19:18, 25 June 2015 (UTC)
I believe I said that this concerns "both the displayed text and the template parameter". --WikiTiki89 19:26, 25 June 2015 (UTC)
If you wish to change the text, just do it. (I was not particularly happy about some phrasing there anyway.) Keφr 19:34, 25 June 2015 (UTC)
Well I want to change both, which is why I started this discussion to get consensus. I realize that we would need a bot run and you would have to change your aWa tool, but that shouldn't be too hard. --WikiTiki89 19:40, 25 June 2015 (UTC)
You can change the template in ways that do not break existing uses. Using bots is unnecessary in that case. Keφr 20:14, 25 June 2015 (UTC)
Yes you can. But then people will continue to use what they see. Not everybody uses your aWa tool. --WikiTiki89 20:17, 25 June 2015 (UTC)
I do not follow. What is wrong with it? Keφr 20:28, 25 June 2015 (UTC)
In addition to what I've already mentioned, to maintain consistency between entered content and displayed content. --WikiTiki89 20:35, 25 June 2015 (UTC)
If you wish to adjust the blurb or add aliases for parameter values, I have little against it, but I think changing existing usage is too much hassle for no benefit. I am fine with current parameter names. And people who are such masochists that they would want to use the template manually should look up its documentation. Keφr 20:58, 25 June 2015 (UTC)
So hypothetically, if it didn't take any hassle at all, what would the ideal parameters be? --WikiTiki89 21:01, 25 June 2015 (UTC)
"0" and "1". Short, sweet and to the point. Keφr 21:03, 25 June 2015 (UTC)
<sarcasm>Not "f" and "j", so that you don't have to move your fingers?</sarcasm> You're forgetting that people read code. This is why people do #define TRUE 1 and #define FALSE 0 in C, so that they can type "TRUE" and "FALSE", even though just using "1" and "0" would be much faster. --WikiTiki89 21:11, 25 June 2015 (UTC)
Well, you already have to move your fingers to type the pipe character; if you take that into account, you may try "\" and "]". However, "0" and "1" offer a nice balance between readability and brevity. They are also much more universal; they would be just as fit when someone proposes to reword the displayed text. Keφr 09:52, 26 June 2015 (UTC)
  • Support adding "kept" and "deleted" as parameters in one way or another: My preference would be to have "kept" and "deleted" as additional parameters that are supported when somebody enters "kept" or "deleted" where they would normally enter "passed" or "failed". Purplebackpack89 18:51, 25 June 2015 (UTC)
  • I would still like to see the ability to close discussions as "RFD kept" rather than "RFD passed" in archive-top template. That is, I want to be able to enter {{archive-top|rfd|kept}} and {{archive-top|rfd|deleted}}. I don't want to use AWA tool to archive discussions. --Dan Polansky (talk) 20:22, 25 June 2015 (UTC)
  • Yeah, that tool's too complex for most editors, and there's not really any harm in keeping templates that aWa mimics. Purplebackpack89 20:43, 25 June 2015 (UTC)
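The "add aliases without breaking existing uses" option raised in this thread can be sketched roughly as follows. This is only an illustration in Python, not the real template or module code: the names `OUTCOME_ALIASES` and `canonical_outcome` are hypothetical, and the mapping of "passed" to "kept" and "failed" to "deleted" for RFD/RFDO is an assumption taken from the order in which the proposal lists the terms.

```python
# Hypothetical sketch of alias handling for archive outcomes.
# Nothing here mirrors the actual {{archive-top}} implementation.

# Assumption: for RFD/RFDO, "passed" means the entry was kept and
# "failed" means it was deleted; RFV keeps "passed"/"failed" as-is.
_DELETION_ALIASES = {
    "kept": "kept",
    "passed": "kept",
    "deleted": "deleted",
    "failed": "deleted",
}

OUTCOME_ALIASES = {
    "rfd": _DELETION_ALIASES,
    "rfdo": _DELETION_ALIASES,
    "rfv": {"passed": "passed", "failed": "failed"},
}


def canonical_outcome(kind, value):
    """Map a discussion type and an entered outcome to the displayed term.

    Old parameter values keep working, so existing archives need no bot run.
    Raises ValueError for an unknown type or outcome.
    """
    kind = kind.strip().lower()
    value = value.strip().lower()
    try:
        return OUTCOME_ALIASES[kind][value]
    except KeyError:
        raise ValueError(f"unsupported outcome {value!r} for {kind!r}") from None
```

With a table like this, both `canonical_outcome("rfd", "passed")` and `canonical_outcome("rfd", "kept")` would display as "kept", while "kept" would be rejected for RFV, which matches the "for and only for archives of RFD/RFDO" wording of the proposal.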

Appendix:Unicode subpages no longer have article links

Due to edits by User:Kephir at Module:character info and Module:character list, redlinks (and also regular links) are no longer showing up at Appendix:Unicode. I don't recall such an action being discussed earlier anywhere in the discussion rooms, so I'm bringing it up here since I'm just curious as to what is going on. Bumm13 (talk) 19:15, 25 June 2015 (UTC)

Deprecated German spellings

Deprecated in 1996

In case of spellings that were deprecated in 1996 it's kind of easy to see what they are, even though they were inconsistently labeled as obsolete, dated, nonstandard or alternative forms here.

  • obsolete: obsolete here at WT is a stronger term than archaic and forms deprecated in 1996 aren't even archaic. Thus: obsolete doesn't fit.
  • dated: many (or even all?) forms which were in use before 1996 are still in use - though maybe rather by older than younger people and also being rarer now than they were years ago. Thus: dated doesn't fit.
  • nonstandard: Appendix:Glossary#nonstandard: "Not conforming to the language as accepted by the majority of its speakers."
    • There were many surveys that showed that a majority is against the reform, so it's doubtful that deprecated spellings aren't "accepted by the majority of its speakers", even though deprecated spellings became rarer and might and at least sometimes do count as errors in schools.
    • In many cases many people don't know which form is correct according to the spelling reform of 1996 (2004, 2006, 2011) anyway. This leads to hypercorrections such as "ausser" and "Fussball", and to the use of deprecated forms which aren't recognised as deprecated or are used anyway (and thus most likely aren't nonstandard; e.g. geschrieen).
    • Thus: nonstandard is doubtful or doesn't fit.
  • alternative: If a form is attestable even after the reform and when there is the "the spelling became deprecated" note, this should be fine. At least it's more fitting than the other labels.
  • Another alternative label: instead of nonstandard (which is doubtful) and alternative (which might be "too neutral"), something like "unofficial" (German: nichtamtlich) might be better. The term isn't doubtful (in contrast to "nonstandard") and is also neutral/descriptive (and not prescriptive like a misuse of "obsolete" or (sometimes) "nonstandard"), but might be more precise (than just "alternative").
Deprecated in 1902

ATM there's no entry which says that a spelling was deprecated in 1902, but anyway:

  • Forms that were deprecated in 1902 most likely aren't in use anymore and thus aren't attestable for the 21st century. Thus it shouldn't be "alternative form" or "nonstandard form".
  • Usually forms deprecated in 1902 are easy to understand (e.g. compare Thür and Tür). Thus "dated" should be more fitting than "archaic" or "obsolete".


  • How about labeling forms deprecated in 1996 as "unofficial"?
  • How should words deprecated in 1902 be labeled?

- 10:54, 26 June 2015 (UTC)

My two cents. I don't speak or edit in German, but Portuguese has similar issues. Proposal:
Re labels: at Wiktionary talk:About_German#pre-1996_spellings_are_.22_forms_of.22_current_spellings, we worked out to use Template:de-superseded spelling of (or Template:superseded spelling of if it would be feasible to greatly expand its functionality without making it prohibitively expensive for the servers and for users who have to add parameters and have the template know that the German spelling reform of 1996 is not the same as the Foobarese spelling reform of 1996). That template handles the variable labelling of things, based on the age of the reform that deprecated them, as "superseded", "obsolete", etc. Re categories, Daniel's basic suggestion is good (precise category names TBD); we do already have Category:German words affected by 1996 spelling reform. - -sche (discuss) 17:54, 26 June 2015 (UTC)
Regarding that template:
  • And what about re-superseded spellings, where one spelling superseded another and was then itself superseded by the former spelling? For example, daß/dass became daß in 1901 and then officially dass in 1996. Thus, "dass" is a superseded spelling of "daß" (as of 1901-1996), but then "daß" is a superseded spelling of "dass" (as of 1996). Omitting the fact that "dass" was deprecated between 1901 and 1996 is no solution, as this would be a lack of information and would in a way also be non-neutral.
  • "Obsolete spelling [...] deprecated in [...] 1901." -- Please read Appendix:Glossary#obsolete: "No longer in use, and no longer likely to be understood." The first part is true (at least in case of dropped "h" like in "Thür"), but the second part is not. "Thür" is likely to be understood as it looks pretty much like "Tür". Thus, as the definition in the glossary uses an "and" and not an "or", "obsolete" doesn't fit - and maybe it even is some kind of false friend of German "obsolet" in the sense of "unneeded". Even spellings which came out of use in the 17th century aren't always obsolete - e.g. uncapitalised words are likely to be understood.
  • The "First Orthographic Conference" failed, so it doesn't make sense to say that a word was "deprecated in the First Orthographic Conference".
  • "1600s" is 1600-1609, which is something different than "16th century" which is 1601-1700 - so the parameters should rather be just "1700", "1800" (like in "till 1700", "till 1800"). 15:48, 1 July 2015 (UTC)
1600-1700 is the 17th century. Also, how about the label 'wrong'? I don't see a need to avoid prescriptivism when there is a legal prescription. The only way the German orthography could be even more prescriptivist was if the state put fines on media for misspelling words. Korn [kʰʊ̃ːæ̯̃n] (talk) 20:30, 1 July 2015 (UTC)
The German language is not owned by the country of Germany. Wiktionary mentions prescriptions because they are often relevant, but does not itself prescribe. And forgive me if I am wrong, but there are people even in Germany who categorically refuse to follow the orthographic reforms; Wiktionary is not here to decide whether these people are right or wrong. --WikiTiki89 20:36, 1 July 2015 (UTC)
As such, shouldn't we be marking these clearly by what orthographic prescriptions they follow or don't, and ignore tags like dated until they're really necessary?--Prosfilaes (talk) 22:02, 1 July 2015 (UTC)
Yes. I feel that dated refers more to words that have naturally fallen out of use, rather than those that were banned. --WikiTiki89 22:05, 1 July 2015 (UTC)
It's not 100% correct to say that the German language is not owned by the country of Germany, at least in the field of orthography. (As for pronunciation standards: God, no.) Germany, Switzerland, Luxemburg, and I believe Liechtenstein too, have declared the Duden as the legally binding institution for their orthographies. The Duden is based in Germany and its decisions are mainly influenced by discussions within German society and politics. I wouldn't be surprised if its editorial and panel were exclusively German as well. The legal situation is the same in Austria, with their home-based Austrian Dictionary being the entity entitled to decide the rules. Point I want to make is that they're all equally prescriptive. Korn [kʰʊ̃ːæ̯̃n] (talk) 00:31, 2 July 2015 (UTC)
Yes, but not every German writer in the world lives in the countries you mentioned. And not every German writer that does live in the countries you mentioned actually follows the legally prescribed rules. Should we also say that in the Persian language, any anti-Iranian propaganda is grammatically incorrect? --WikiTiki89 14:21, 2 July 2015 (UTC)
Depends, is there an authority with any sort of binding power regulating grammar, rather than content, in such a way that anti-Iranian propaganda would automatically fail its requirements? If so, yes. Don't try to be thick on purpose just for political reasons. We do have the label 'misspelling' in English where there is no regulation whatso-fucking-ever and suddenly we're having an argument how labeling something as a misspelling is unacceptable for a language for which every country that has it as a national language has a law regulating its spelling, and all of them based on the same source? Really? Because if there are no misspellings, then we have to include a lot of stuff. I might author a German book entirely in a mixture of runes and devanagari and enter every single one of the words I used here. Descriptivist dictionary ho! Korn [kʰʊ̃ːæ̯̃n] (talk) 14:39, 2 July 2015 (UTC)
So if there were such an authority in Iran, then you think Wiktionary should follow it as well? Wiktionary is not supposed to take sides—any sides. Wiktionary is only supposed to describe the existing situation. I would have no problem saying "now considered incorrect by Duden" with a link to an appendix page explaining how authoritative Duden is. But we should definitely not mark something as simply "wrong", because that implies Wiktionary supports that view and Wiktionary does not support any views. --WikiTiki89 14:50, 2 July 2015 (UTC)
The existing situation is that virtually every German speaker considers non-Duden spellings as wrong. Korn [kʰʊ̃ːæ̯̃n] (talk) 16:07, 2 July 2015 (UTC)
Let's see some evidence of that. I can find plenty of Google Books hits for daß from well after the reform. --WikiTiki89 16:41, 2 July 2015 (UTC)
I'm not sure what you intended to link me, but I looked at the first twelve pages of the link you gave. The overwhelming majority of hits are from the 18th century, another fair share is from even before that and the two or so hits which are post 1996 are at reprints of pre-reform texts. Korn [kʰʊ̃ːæ̯̃n] (talk) 23:12, 3 July 2015 (UTC)
It was supposed to show you hits from 2005 and later. Some of them are reprints, but it seems to me that some of them are not; although I may be wrong. --WikiTiki89 20:30, 6 July 2015 (UTC)
I opened the link in another browser and looked over the first ten pages. They mainly are two things: Unedited reprints of older textbooks and books which quote historical writings. There are 5 genuinely new books which use 'daß', but two are false entries in Google, where the actual book cover uses 'dass', which serves us as a warning to look twice. So in the small sample survey I did in that link, new books with old spellings make up 3%. Korn [kʰʊ̃ːæ̯̃n] (talk) 09:41, 7 July 2015 (UTC)
re "'1600s' is 1600-1609" = no, 1600s is 1600-1699 in most contexts in English.
As I noted on WT:T:ADE, "no longer likely to be understood" only applies to words; for spellings, the concern is only whether or not they are still in use. Spellings which fell out of use more than a century ago are obsolete unless they are still used for effect, in which case they are archaic.
- -sche (discuss) 22:38, 1 July 2015 (UTC)
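For anyone checking the date arithmetic behind the "1600s" versus "17th century" exchange above, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes the conventional numbering in which the 1st century runs from year 1 to year 100 and the "century" reading of a label like "1600s".

```python
def century_of(year):
    """Ordinal century of a year, counting years 1-100 as the 1st century."""
    if year < 1:
        raise ValueError("only years from 1 onward are handled in this sketch")
    return (year - 1) // 100 + 1


def hundreds_span(label):
    """Inclusive year range for a label such as '1600s', read as a century.

    Assumption: the label names a full hundred-year span (1600-1699),
    not the decade 1600-1609.
    """
    start = int(label.rstrip("s"))
    return (start, start + 99)
```

Under these assumptions 1601 and 1700 both fall in the 17th century, 1600 is still the last year of the 16th, and "1600s" covers 1600-1699, so the two ranges overlap in all but one year at each end.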
Spellings deprecated by a regulatory authority but still in widespread use are alternative spellings. They are neither non-standard, nor obsolete, nor misspellings, nor wrong. The English Wiktionary, being a descriptivist dictionary, does not label entries or spellings as "wrong" based on stipulations of regulatory authorities. This revision of Eßstäbchen looks good to me: it ranks the spelling as alternative but informs the reader via a usage note that the spelling was deprecated. That is the accurate, informative reporting to the reader that we should strive for. We should not prescribe, but there is no need for us to omit the fact that an authority has deprecated the spelling, since many a reader wishes to know that. --Dan Polansky (talk) 19:26, 3 July 2015 (UTC)
It seems that a "usage note" is too heavyweight for this. We could say "1902 Duden standard version (deprecated by 1996 standard) of ..."? That seems clunky, but short of notation more appropriate for a German-language dictionary, I'm not sure how to compress it.--Prosfilaes (talk) 20:46, 6 July 2015 (UTC)
  • Excuse me, but if a spelling which is 1. virtually not used and 2. not part of the official standard of literally every country which uses that language, is not fitting the term non-standard for you, you might need to get a new dictionary since you seem to have grabbed an edition printed in w:Bizarro World. Korn [kʰʊ̃ːæ̯̃n] (talk) 09:41, 7 July 2015 (UTC)
As for the above claim that certain spellings are "virtually not used", here is Google Ngram Viewer in the German corpus, for Eßstäbchen, Essstäbchen, going to 2008. What I see there leads me to report that "Eßstäbchen" is an alternative spelling that still finds plenty of use. On another note, my understanding of the phrase "non-standard spelling" is not as "not fitting a prescriptive, stipulated standard" but rather "very rare, causing surprise to native speakers when seen in print". I oppose attempts to label spellings not fitting prescriptive standards as "non-standard". --Dan Polansky (talk) 19:36, 7 July 2015 (UTC)
What's the advantage in going all judgmental and calling it non-standard instead of correctly stating which standards it's a part of? Besides being prescriptive, it's misleading to readers to have spellings used in 1965 labeled "non-standard", as if the author was a bad speller or the editor incompetent, instead of labeling it not conformant to the 1995 standard.--Prosfilaes (talk) 20:24, 7 July 2015 (UTC)
It seems to me that what you're thinking of is what we already have, Template:de-superseded spelling of (which was indeed designed to, among other things, obviate/replace usage notes and provide descriptive language on the sense line). - -sche (discuss) 17:44, 7 July 2015 (UTC)
In Brennessel, the template produces this: "Former spelling of Brennnessel which was deprecated in the spelling reform (Rechtschreibreform) of 1996." That seems slightly misleading since the spelling is not so much former as alternative; it is not "former" because it is still in use, and furthermore, because it is still a spelling; if something is "former X", it means it is "no longer X", so "former spelling" suggests "it used to be a spelling but is no more". For brevity and accuracy, the template should IMHO better produce "Alternative spelling of Brennnessel deprecated in the spelling reform of 1996.", having changed "former" to "alternative" and having dropped "which was" and "(Rechtschreibreform)" for brevity. --Dan Polansky (talk) 19:43, 7 July 2015 (UTC)
A concern with labelling Foobar an "alternative spelling of Fubar deprecated in the spelling reform of 1996" is that it could be interpreted as saying that Foobar used to be an alternative spelling of Fubar (i.e. that both Foobar and Fubar were standard) until Foobar was deprecated in 1996, leaving only Fubar. This is actually the case with some words, e.g. geschrieen (both geschrieen and geschrien were standard before 1996, now only geschrien is). Just "spelling of Fubar which was deprecated in the spelling reform of 1996" ("which was" strikes me as necessary for clarity, especially if we drop the adjective) is one idea, but it fails to note that the spellings were formerly standard. Hmm... I have changed the wording to "Formerly standard spelling of Fubar which was deprecated in the spelling reform (Rechtschreibreform) of 1996." - -sche (discuss) 17:53, 17 July 2015 (UTC)
@-sche: By my lights, the word "standard" should not appear anywhere in the output of the template, whether as "standard", "non-standard" or the like. Put differently, "standard" is not a lexicographical category, IMHO. --Dan Polansky (talk) 10:19, 19 July 2015 (UTC)
I notice that the argument with me is beginning to turn in circles, so I'll make my final case, in order to not hinder the progress of the discussion. There is only one standard. Technically, there are two standards, Austrian Dictionary and Duden, but I can't think of anything where they would diverge in terms of spelling. Then there are a zillion reprints of university textbooks, for which it is neither necessary nor cost efficient to revise their spelling, and texts quoting works with older spellings, which fuck with robots like Google, and then you get a preciously scarce group of ultraconservatives who consciously do not follow the reform. The official standard isn't some fringe idea, it's what defines what people perceive as wrong and right. The ß you can get past people as an odd quirk, but omitting an N in Brennnessel will not be considered a dated alternative, it will be perceived as a mistake by something between virtually and literally everyone. I know it's not the best argument, but I have to say it once, because it keeps popping up in my mind: I can't help but feel that the local foreigners have a wrong impression of how the average and absolutely crushing majority of German speakers see the reform. (They don't care. The state said 'this is correct now' and they accept that as the new given.) It might not be perfectly clear why this bothers me so much, so let me voice why I'm so irked by this debate: I am of the honest and stern conviction that we do a disservice to our users if we do not abundantly clearly tell them that these spellings are not equal to the official spellings. 'Alternative' doesn't cut it. The very least we can do is 'proscribed'; personally, I'd move for 'misspelling'. If you believe that such a thing as a 'misspelling' does exist as a concept, this is it. Brennessel, adultry, who are we to judge? My answer to that is: We're a dictionary, not a copy shop. We don't embezzle context information.
Korn [kʰʊ̃ːæ̯̃n] (talk) 22:39, 7 July 2015 (UTC)
Can you express that in some way that's not so dated? Like a Youtube video, or Twitter?
When B. F. J. Scheller published their Die amerikanische Brennessel, they correctly spelled the title. No force in heaven or on Earth can turn back the hands of time and make that a misspelling. Any dictionary that presumes to cover the totality of cited human writing like Wiktionary does would fail if it were to mark a word that was correctly spelled as a misspelling because some reform one hundred years later presumed to make it so. Even if there is but one standard today, it differs from the standard then.
I have no clue what you mean by "we don't embezzle context information." It is entirely appropriate for a dictionary to clearly state what standards spellings adhere to.--Prosfilaes (talk) 07:48, 8 July 2015 (UTC)

Requests for quotations

From time to time I come across notes like "Can we find and add a quotation of <author's name> to this entry?" I am curious about these since there is no explanation (and certainly no obvious reason) why some seemingly random author should be so important for some particular word sense. And, if there is a good reason, i.e. the editor knew of a particularly pertinent quote, then why didn't the editor just add it? I would be interested to know more about the reason for these notes. 23:54, 26 June 2015 (UTC)

Webster 1913 often cited authors who used a word, but didn't give the citation; we have copied these while importing the Webster data, because sometimes there are very few authors who ever used a rare word. Also, sometimes we're too busy to fill out the entire entry at the time. Equinox 00:03, 27 June 2015 (UTC)
For further background, Webster 1913 is a core source of Wiktionary entries, being out of copyright and available in readily usable form. Also, it is simply a lot of work to add citations. It would take at least 1,000 hours for one person to add just the citations marked with the template {{rfquotek}}, assuming they can manage 10-11 per hour. Try adding just one. DCDuring TALK 00:20, 27 June 2015 (UTC)
I see, thanks for info. 00:45, 27 June 2015 (UTC)
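
As a rough illustration of the estimate above: the thread gives only the hours ("at least 1,000") and the rate ("10-11 per hour"), so the backlog size below is implied by those two figures rather than stated by anyone. The function name is made up for this sketch.

```python
def implied_backlog(hours, citations_per_hour):
    """Number of citations one person could add in the given time at the given rate."""
    return hours * citations_per_hour


# Figures quoted in the thread: at least 1,000 hours at 10-11 citations per hour.
low = implied_backlog(1000, 10)
high = implied_backlog(1000, 11)
print(f"Implied backlog: roughly {low:,} to {high:,} citations")
```

Since 1,000 hours was given as a lower bound, the real number of open requests could be larger than this range.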

Format of definitions

I think it looks messy that some definitions start with a capital letter and end with a full stop, while others don't. The page Wiktionary:Entry layout explained says "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop." This seems undesirably vague to me. Is there a reason why "may" is not "should"? Another slightly messy inconsistency is that some verb definitions begin with the infinitive marker "to", while others do not. Apologies if I missed it, but I don't see any instruction about this on the "Entry layout explained" page. I think this should be covered. 00:49, 27 June 2015 (UTC)

A definition consisting of a single capitalised word followed by a full stop looks silly. One word doesn't make a sentence. —CodeCat 01:16, 27 June 2015 (UTC)
(Almost) no definitions are grammatically full sentences. It doesn't make any difference in that respect whether a noun phrase, for example, is one word or twenty. 01:19, 27 June 2015 (UTC)
How do other dictionaries do it? —CodeCat 01:20, 27 June 2015 (UTC)
Well, you can see as well as me that they vary. I should say that I am not especially certain that the "sentence" format is the best. I think the main thing is that all entries should be consistent. If you can find a dictionary that has inconsistency like Wiktionary then that would be more interesting. 01:28, 27 June 2015 (UTC)
We've sought consistency and have not achieved it. Non-English entries are overwhelmingly lower case without period. English entries are mostly upper case with period. Other formats exist but tend to be converted to the dominant format for English or FLs. DCDuring TALK 01:34, 27 June 2015 (UTC)
Do you know offhand what the barrier(s) to achieving consistency are/were? 01:38, 27 June 2015 (UTC)
Adherents of one or the other option being vehemently against codifying anything but their preference, and the sheer magnitude of work required to synchronize millions of entries that are constantly being edited by an unknown number of people at all hours of the day and night. Chuck Entz (talk) 03:45, 27 June 2015 (UTC)
The inconsistency between English and FLs is largely attributable to the fact that English definitions tend to be longer, closer to full sentences in length if not structure. (How could a substitutable definition of anything other than a sentence be a sentence?) FL definitions are most frequently a single English word (whether or not that should be the case), sometimes with a disambiguating gloss, for polysemic words (and homonyms). Inconsistency within English is due to some contributors having disagreed with me. DCDuring TALK 04:12, 27 June 2015 (UTC)
I wonder whether there is any appetite to look at this again with a view to settling on a single format, at least for English. It's not so bad if separate foreign-language sections have different formats from English, but when adjacent English definitions are formatted differently, the effect is, as I say, messy. It does not look designed or intentional, but just like different people are doing different things at random. Even if exact criteria were developed for choosing one format over the other, say based on length, I think that for ordinary readers there would always be an arbitrary-seeming cutoff point at which one would ask "Why are these two formatted differently?" I do not agree that hope of standardisation should be abandoned just because Wiktionary is user-editable. Sure, people may not follow standards, and things may have to be corrected, and they may go uncorrected for a long time, but the same is true of any layout or style requirement. If you go down that route you might as well give up on the whole of "Wiktionary:Entry layout explained". By the way, does anyone have a view on my other point about use of "to" in verb definitions? 11:55, 27 June 2015 (UTC)
I like the use of to in English verb definitions as it accelerates and confirms the recognition that the definition is for a verb. On mobile screens and for longer definitions the PoS heading may not be visible. For a non-native speaker especially the possibility that a defining word is a verb and has a homonym that is of another part of speech adds to the potential for confusing, even ambiguous definitions. DCDuring TALK 12:52, 27 June 2015 (UTC)
It does look messy when there's inconsistency, but for the most part, the definitions are capitalized and punctuated for English words, and not for non-English words. I have been taking care of inconsistencies as I come across them (that goes for the occasional headers that are the wrong sizes). In fact, those little inconsistencies are what started me editing Wiktionary not too long ago, since they were bugging me. I definitely think we should strive for consistency in every way possible. JodianWarrior (talk) 22:20, 30 June 2015 (UTC)
We do have some more important problems, like missing definitions, wording of a definition of a part of speech that makes it seem like another part of speech, confusing order of definitions, etc., but working on format is a great way to get exposure to the content of a lot of definitions. By doing so one can pick up prevailing good (and not so good) practice in definitions and other parts of entries. DCDuring TALK 23:20, 30 June 2015 (UTC)

Strange tables in Korean entries

I think that these need to be removed and the "orthoepy" sent to ko-pron. 04:24, 27 June 2015 (UTC)

July 2015

Words used "in dialects, including A, B, C"

Quite a few entries use the labels "dialectal", "dialect" and "dialects". This is allowable, because sometimes a user may not know which dialects a word is used in. But we should always attempt to be more specific, IMO. I'd like to make people aware of the label "including", which allows listing dialects in a way that makes clear the list isn't exhaustive. E.g. in the entry favor: {{lb|en|transitive|in|_|dialects|including|Southern US|and|Cajun}} (transitive, in dialects, including Southern US and Louisiana). (I can also find evidence that the sense was used in British dialects a century ago; I don't know whether it still is or not.) - -sche (discuss) 00:35, 2 July 2015 (UTC)

Flash card function for language learning publicly requested

In this blog article the author suggests the desirability of having the Wiktionaries offer a flash-card like system for learning African languages. The advantage of hosting such a system is that it would offer the opportunity for teachers or advocates of the language to add entries to the languages of interest to them to achieve sufficient language coverage to make the effort worthwhile. This came up on the Wiktionary-l mailing list, so we should try to make as constructive a response as possible. DCDuring TALK 18:39, 2 July 2015 (UTC)

I've been extracting flashcard files (for Anki et al.) from the dumps for personal use for several years (one component of a language-learning program that has helped me earn a tidy little set of ATA certifications). It would be fairly trivial to make such files available on a regular basis for any given set of languages. -- Visviva (talk) 21:44, 2 July 2015 (UTC)
I think the blog author and the fellow who put it on the mailing list may be looking for more. Actually there must be good, free web-based software or free applications that could run this. Perhaps we could assemble a list with links and select words and (god help me) phrases suitable for basic word and phrasebook flashcards. DCDuring TALK 21:57, 2 July 2015 (UTC)
FWIW, Anki is open source and has web-based, desktop and app versions. The author's idea of a "flashcard mode" for Special:RandomInCategory is interesting (and could be accomplished with some clever JavaScript, I think), and could have some real pedagogical value if combined with a "basic words" category (rather than an "all lemmas" category), but it would still be a pretty poor substitute for a proper spaced-repetition flashcard program. -- Visviva (talk) 22:44, 2 July 2015 (UTC)
The common element is the core list of words for all languages and some target-language-specific words. I wish I could do it. All I'd need is the talent. Maybe some false friends, though that depends on both target and native language. The common element just seems like a good idea. There are the Swadesh lists, but we have to add some more contemporary material. I liked the spirit and tone of the original Gimmick series. Anyway, we can take requests if we want. I suppose this needs to start with just one or a few languages. DCDuring TALK 00:29, 3 July 2015 (UTC)
Is there any reason why particularly African languages would lend themselves better to flash cards? But in all seriousness, this is a good idea, but do we have anyone willing to do anything about it? --WikiTiki89 21:48, 2 July 2015 (UTC)
Are you thinking about the flashcard mode in JS or something? DCDuring TALK 00:29, 3 July 2015 (UTC)
  • This seems like a grant-worthy project for the right talent and proposal. WMF would probably support it. That's probably what the fellow who put it on the Wiktionary list was thinking. DCDuring TALK 00:32, 3 July 2015 (UTC)

  • FWIW, I just wrote User:Dixtosa/history.js that keeps track of searched terms in specified languages (you need to override historyjsIncludeLanguages variable). Then on your history page (username+/history, e.g. User:Dixtosa/history) you can view and manage (like deleting and deleting and even deleting) all the terms and all the targeted languages generated by JS. --Dixtosa (talk) 22:17, 13 July 2015 (UTC)
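
For anyone wondering what a flashcard file of the kind mentioned earlier in this thread might look like: Anki can import plain tab-separated text, one card per line. The sketch below is only an illustration under stated assumptions. It assumes the (term, gloss) pairs have already been pulled out of a dump by some other means, and the function name and sample pairs are made up for the example; it is not the extraction process described above.

```python
import csv
import io


def to_anki_tsv(entries):
    """Return tab-separated text with one 'front<TAB>back' card per line.

    entries: iterable of (term, gloss) pairs, assumed to be already
    extracted from a dump (extraction itself is not shown here).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for term, gloss in entries:
        writer.writerow([term, gloss])
    return buf.getvalue()


# Hypothetical sample data, purely for illustration.
sample = [("Haus", "house"), ("Brennnessel", "stinging nettle")]
print(to_anki_tsv(sample), end="")
```

Saving that text to a file and importing it into Anki gives one card per entry. Choosing which words belong in a "basic words" list is the harder part and is not addressed here.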

Poll: Replace the image in the entry "penis"

Proposal: Replace the image in the entry "penis".
Current image: File:Labelled flaccid penis.jpg (explicit picture of a penis)
Proposed image: File:Illu repdt male erect.jpg (cross-section drawing of a penis)


  1. Support --Daniel 00:52, 4 July 2015 (UTC)
  2. Support Seems obvious. -- Visviva (talk) 18:20, 4 July 2015 (UTC)
  3. Support I don't know why it has to be the erect one? Kaixinguo~enwiktionary (talk) 18:23, 4 July 2015 (UTC)
  4. Support As long as a guideline like Wiktionary:Votes/2015-06/Collapsing offensive images is not in effect, I think supporting this replacement is the best way to go. --Njardarlogar (talk) 13:38, 5 July 2015 (UTC)


  1. Oppose That's a lousy drawing.--Prosfilaes (talk) 05:35, 5 July 2015 (UTC)
    • Ideas for alternate replacements are welcome. --Daniel 13:55, 5 July 2015 (UTC)


  1. Abstain I oppose both images of these Chinese penises. We should replace both with a picture of a more realistic, bigger penis. --Vahag (talk) 11:46, 4 July 2015 (UTC)
  2. Abstain I support replacement with a drawing in principle, but as for the proposed medical drawing, I wonder whether I would recognize it to be a drawing of a penis if I did not already know it was one. I am not sure the proposed edit is really an improvement. I collected some drawings at Commons:Human penis drawing. At the very least, File:Illu repdt male.jpg seems better to me, but I still wish we would have a much nicer drawing. --Dan Polansky (talk) 16:31, 5 July 2015 (UTC)
  3. Abstain I'm not bothered by the current image, but if we do switch to a drawing, I'd suggest one of File:Penis location.jpg, File:Sketch of a flaccid penis.png, or File:Sketch of a human penis.png. —Aɴɢʀ (talk) 16:43, 5 July 2015 (UTC)

Related discussions:

As an aside: If there are any other explicit pictures in any language, I would like to know. The entry masturbation had an explicit animated gif from May to June 2015, it does not have any image at the moment. --Daniel 19:49, 4 July 2015 (UTC)

There is one at ძუძუ ('female breast'). --Njardarlogar (talk) 13:40, 5 July 2015 (UTC)
Add breast to the list. It wouldn't surprise me if many of the non-English entries have such images. --Njardarlogar (talk) 13:43, 5 July 2015 (UTC)
What do you think of the drawing I just placed at ძუძუ? I wish I could find a nicer one, though. --Dan Polansky (talk) 15:51, 5 July 2015 (UTC)
Is there any reason this needs to be in the Beer Parlour? It's discussing one entry and so should be in the Tea Room. --WikiTiki89 17:02, 6 July 2015 (UTC)
I just forgot about TR. Maybe my brain went on autopilot and subconsciously considered this as a follow-up to the other BP discussion. --Daniel Carrero (talk) 22:06, 12 July 2015 (UTC)
For the moment I am replacing the current image with File:Penis location.jpg. After 7 days, poll results are technically 4-1-3 with the majority of voters supporting the proposal. Still, a number of people disapprove of the specific proposed image. File:Penis location.jpg was among Angr's suggestions. --Daniel Carrero (talk) 22:06, 12 July 2015 (UTC)

Presentation of Katharevousa Greek in en:Wiktionary.

I currently treat Katharevousa as shown here, entering it as an alternative form of the Standard Modern Greek one. Where an SMG form does not exist I would define it thus:

1. (Katharevousa) suitable translation

Does this seem the appropriate treatment? Are there better, different examples in other languages?

(@Chuck Entz, @Xoristzatziki, @Flyax, @Eipnvn, @Angr)  — Saltmarshσυζήτηση-talk 05:39, 4 July 2015 (UTC)
  • That looks like a good way to do it to me. See b́edro for an example of how I treated an obsolete word whose modern spelling is unattested. —Aɴɢʀ (talk) 14:25, 4 July 2015 (UTC)

There are two distinct "areas": one is "polytonic orthography" and the other is "Katharevousa". "Katharevousa" has only "polytonic orthography", but "Demotic Greek" (the official language of Greece since 1976) was also printed in "polytonic orthography" (officially until 1982). There are, however, polytonic forms that belong purely to "Demotic Greek" (βασιληᾶς or βασιλιᾶς). Also, "Demotic Greek" is not a "descendant" of "Katharevousa", but many words were created (most translated or transliterated) during the period when "Katharevousa" was the official language, and thus can in some sense be said to come from "Katharevousa". IMHO "Katharevousa" should be used only if a form has only "polytonic orthography" and the printed word cannot be treated as a polytonic form of a word in use. Also, "Katharevousa" is distinguished far more by its own set of grammatical and syntactical rules, which cannot be "presented" in individual lemmas. (About the above-mentioned example: Ἀριθμοί is the polytonic form of Αριθμοί which, in turn, comes from Ancient Greek Ἀριθμοί and not from "Katharevousa".) --Xoristzatziki (talk) 16:14, 5 July 2015 (UTC)

As a start, and by way of suggestion, I have made some changes to Αριθμοί and Ἀριθμοί. A suggested category might be Category:Polytonic Greek. Are there any views on whether "Polytonic spelling of" might be better than "Polytonic form of"?   — Saltmarshσυζήτηση-talk 10:16, 6 July 2015 (UTC)

eye dialect ing

I'm curious about the policy on eye dialect spellings of ing verbs in English. For example we have walkin' but not buyin'. AFAICT none of the eye dialect spellings are cited. Is there a special policy on when to include them? Just to pick an obscure verb, with a little casual googling I found a use for transmogrifyin' -- just one, but if I could find one barely looking, I bet there are more out there. Do we make pages for every English verb where someone has written it like that enough times to meet CFI? Are we actually required to find examples? There are no examples for any eye dialect words I've checked, including some I'd have been surprised to find in writing, like agonizin' and considerin' (neither of which has any easy-to-find results on Google Books). Just curious if this has been discussed, I don't plan on mass-making these pages or nominating them for deletion or anything like that. WurdSnatcher (talk) 13:58, 4 July 2015 (UTC)

I personally consider them less useful than "common misspellings", as the general rule of dropping the "g" becomes obvious to a language learner rather quickly. We are not very good at agreeing on quantitative criteria for any class of inclusion/exclusion decisions, so the motivation and opinion of contributors, subject to the RfV process, governs, leading to an unsystematic result. DCDuring TALK 14:17, 4 July 2015 (UTC)
As far as I'm concerned, they're includable if they meet CFI: at least three uses in independent, permanently archived sources, spanning more than a year. For most verbs it shouldn't be difficult to find usage, considering how widespread such forms are in reported speech. But they're not eye dialect and shouldn't be labeled as such; they should be labeled {{nonstandard form of}}. —Aɴɢʀ (talk) 14:22, 4 July 2015 (UTC)
In an old discussion, we sorta decided to include them like any other word: any that meets our attestation requirement is in; any that doesn't is out. Our discussions since have followed that rule AFAICT. (And I agree with it, personally, fwiw.) However, I've found one discussion that did not apply that rule to multi-word terms, preferring instead to have only the single-word g-less term and the g-full phrase.​—msh210 (talk) 18:14, 7 July 2015 (UTC)
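
To make the attestation rule quoted above concrete ("at least three uses in independent, permanently archived sources, spanning more than a year"), here is a minimal sketch of the counting part only. It is a simplification, not how CFI is actually applied: independence is approximated by distinct source identifiers, "more than a year" is approximated as more than 365 days, durable archiving is not checked at all, and the function name and sample data are hypothetical.

```python
from datetime import date


def meets_attestation(citations, min_uses=3, min_span_days=365):
    """Rough check of the three-uses-over-a-year rule.

    citations: list of (source_id, date) pairs, one per use.
    Distinct source_id values stand in for "independent sources";
    that is an assumption of this sketch, not a CFI definition.
    """
    sources = {source for source, _ in citations}
    if len(sources) < min_uses:
        return False
    dates = [when for _, when in citations]
    # The span between the earliest and latest use must exceed the threshold.
    return (max(dates) - min(dates)).days > min_span_days


# Hypothetical citations for an -in' spelling.
uses = [
    ("novel A", date(2001, 1, 1)),
    ("magazine B", date(2001, 6, 1)),
    ("novel C", date(2003, 1, 1)),
]
print(meets_attestation(uses))
```

Whether a given quotation is a use rather than a mention, and whether two sources are really independent, still needs human judgment, typically through the RfV process mentioned above.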


Category:Appendices and Category:English glossaries are inconsistently formatted and disorganized. Is there a policy on how to handle these pages? I was thinking about trying to clean things up over there, can't find any guidelines or even significant discussion about it. WurdSnatcher (talk) 01:39, 6 July 2015 (UTC)

I don't think there have been any major discussions about appendices.
Personally, I can think of some guidelines I've been applying to them when possible:
--Daniel 06:28, 6 July 2015 (UTC)

Using BBC Voices as a source

A few years ago, the BBC organized a large series of conversations between members of the public across the country about their dialects/accents. This information is now maintained by the British Library (so we can assume it's permanently archived) and although the files don't have full transcripts (just summaries), it's still a useful source for a lot of terms that are difficult to archive. To pick a random example, several participants in the Hartlepool conversation use the word cuddy-wifter/cuddywifter, which is difficult to cite even in its standard meaning of "left-handed person" (only one non-mention on Google Books), but as comes out in the course of the chat it has an additional meaning on Teesside of "Catholic". Given that a lot of the problems that we have with collecting dialectal terms is that they are often used in speech but seldom written down, can we use this archive as a citation? Smurrayinchester (talk) 10:29, 7 July 2015 (UTC)

(ETA: the recordings made in Scotland are fully transcribed, and can be searched here) Smurrayinchester (talk) 10:30, 7 July 2015 (UTC)
Well, we have chosen not to accept voice recordings as sources, even though there are plenty of movies, music, and other media containing voice recordings that are durably archived. However, the transcripts can probably count as written sources. --WikiTiki89 11:48, 7 July 2015 (UTC)
The untranscribed recordings afford an opportunity to cite pronunciations, thereby upgrading the objectivity of our pronunciations. Is there any kind of index to the untranscribed recordings to help one find where a particular word is pronounced? DCDuring TALK 14:15, 7 July 2015 (UTC)
Some (not all) have been looked over by linguists who've identified the phonemes the speakers used. For example, here's the analysis of the Birkenhead recording, which has some snippets with interesting pronunciations highlighted and transcribed (very closely) with IPA. For example: "I was hung-over yesterday [jɛstədᶻi] so (yeah) and then I got a phone call [fʌʊŋkˣɔːɫ] from the college [kˣɒləʤ] saying, “oh you’re in an interview [ɪntsəvjuː] tomorrow” [tsəmɒɾʌʊ] and I was like, [laɪkˣ] “what what about?”" (incidentally, I think the [ts]s in that quote are typos for [tˢ]). There's also more general notation of standard phonemes - it notes that the FOOT vowel is [ʊ], that there's a lot of H-dropping, etc. I don't think there's a proper archive - the best way to find a word is probably to do a Google search of the http://sounds.bl.uk/ domain and try your luck. Smurrayinchester (talk) 15:40, 7 July 2015 (UTC)
Or perhaps they really were saying [ts] rather than [tˢ]. You'd have to listen to the recording. --WikiTiki89 16:00, 7 July 2015 (UTC)
Re "we have chosen not to accept voice recordings as sources": on the contrary, WT:CFI explicitly says "Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived". Libraries often archive copies of CDs and DVDs (songs and movies), and several of our entries cite songs and movies as a result. There was some discussion of the subject in May 2012, where it was pointed out that using only audio citations of a term we can't be sure of the spelling of would be problematic, but audio citations can be used in conjunction with written citation (as on Qapla') or (as Chuck put it) "where only one spelling is possible and the audio or video confirms usage", such as (as Ruakh put it) when "we often RFV a specific sense of a term, or an idiomatic expression whose component words are clear. In both of these cases, it can sometimes be quite clear what the spelling is." - -sche (discuss) 18:11, 7 July 2015 (UTC)
You're right. I really should replace my brain with a RAID array. --WikiTiki89 18:18, 7 July 2015 (UTC)

are religions nouns or proper nouns?

It seems we are not consistent on this on Wiktionary.
Categorised as nouns: Bahá'í Faith, Buddhism, Christianity, Confucianism, Druidry, Hinduism, Islam, Judaism, Scientology, Taoism
Categorised as proper nouns: Cao Dai, Jainism, Luciferianism, Raëlism, Rastafarianism, Shinto, Spiritism, Thelema, Wicca, Zoroastrianism
What do we do about this? ---> Tooironic (talk) 05:38, 8 July 2015 (UTC)

What other dictionaries distinguish proper nouns from common nouns and could offer us guidance? I recall from discussions of personal names and some other words that many other dictionaries don't distinguish proper from common nouns, and a surprising number of works, even high school and college English textbooks, erroneously equate "proper noun vs common noun" with "capitalized vs lowercase". Merriam-Webster, Dictionary.com, Collins and Cambridge all have "Buddhism", "Paul" and "White House" all just labelled noun, strongly suggesting that they simply don't distinguish proper from common nouns. (This means that if we could be consistent and correct in our labelling of things as proper vs common nouns, we'd be offering readers something other dictionaries don't!) Our colleagues at de.Wikt, who do distinguish proper from common nouns, have Buddhism as a common noun. - -sche (discuss) 05:55, 8 July 2015 (UTC)
Other languages do not consider them proper nouns. Same thing with language names and names of days and months. I think English calls them proper nouns only because English capitalizes them. —Stephen (Talk) 07:56, 8 July 2015 (UTC)
They're proper nouns because there's only one of them. Christianity is a particular set of beliefs; you don't generally speak about a Christianity, or these Christianities. (There are of course cases where "Christianities" is used, but that's true as well for any proper noun, e.g. Elvis Presleys or Elvises or Elvii.)--Prosfilaes (talk) 08:09, 8 July 2015 (UTC)
That just means it's uncountable, like iron or physics. —CodeCat 12:45, 8 July 2015 (UTC)
But you can talk about "this iron and that iron", but not "this Christianity and that Christianity". Perhaps physics should be a proper noun. --WikiTiki89 14:14, 8 July 2015 (UTC)
The grammatical (mostly countable usage, modifiability by adjectives, etc) and orthographic (initial upper case) behavior of the names of religions and other systems of belief is almost identical to that of language names, especially those that are not homonymous with adjectives. We treat all languages as proper nouns. DCDuring TALK 14:23, 8 July 2015 (UTC)
Physics can be used in a countable-like way in the phrase "alternative physics", for example. Not sure if that should be considered an idiom or not. Plenty of hits for "an alternate(ive) physics". WurdSnatcher (talk) 14:25, 8 July 2015 (UTC)
We are well aware that nearly any proper noun can be commonized: A Joseph from our Vermont created a new Christianity. But nevertheless the primary usages of these words are as proper nouns. --WikiTiki89 14:29, 8 July 2015 (UTC)
One can even use a proper noun as a verb: Elvised and Elvising would meet our standards for attestation. Even Christianitied and Christianitying can be found on the web. This kind of use doesn't warrant creating a new PoS section IMO. DCDuring TALK 14:47, 8 July 2015 (UTC)
Wikitiki, you can talk about "this or that" Christianity, e.g. "The Christian imprint upon his thought is certainly clearly evident everywhere. But this Christianity is very much modified and very abbreviated." Equinox 17:37, 8 July 2015 (UTC)
See my subsequent comment: "nearly any proper noun can be commonized". --WikiTiki89 17:41, 8 July 2015 (UTC)
User:EncycloPetey has a nice, informative subpage about proper nouns where he explains the criteria for classifying a term as a proper noun. He regards days of the week and names of festivals as borderline cases. But, under his criteria, I'd be inclined to say that names of religions are proper nouns. -- · (talk) 15:42, 8 July 2015 (UTC)
I think we should abandon proper nouns and treat them as nouns. Grammatical properties, like whether an article precedes a word, are more diverse than proper vs. common. The uniqueness of a referent is a semantic property rather than grammatical and not relevant to part of speech. —CodeCat 16:40, 8 July 2015 (UTC)
I have been wanting us to abandon the label "proper noun" for ages. —Aɴɢʀ (talk) 16:59, 8 July 2015 (UTC)
The Penguin Writer's Manual (2004, ISBN 0141924829) says this:
A proper noun is a noun that denotes a specific person or thing. It is, to all intents and purposes, a name. [...] Proper nouns include people's first names and surnames, the names of places, times, events, and institutions, and the titles of books, films, etc. They are spelt with an initial capital letter: Sam, Shakespeare, New York, October, Christmas, Christianity, Marxism, and Coronation Street. All nouns that are not proper nouns are known as common nouns. [...]
Apart from being spelt with an initial capital letter, proper nouns have other characteristics that usually distinguish them from common nouns. They do not, generally, have a plural and they are not, usually, preceded by a or an. There is only one Australia; there was only one Genghis Khan. [...] there are many exceptions to [this]. There are occasions when either a specific example or several examples of something denoted by a common noun must be referred to: keeping up with the Joneses; [...] one of the warmest Januaries on record.
The manual then goes on to describe concrete nouns (like table) and abstract nouns (like happiness and unity), countable nouns (table again) and uncountable nouns (mud, foliage), and collective nouns (flock).
- -sche (discuss) 17:41, 8 July 2015 (UTC)
I have yet to find any English grammar reference, of any vintage, that doesn't discuss proper nouns. I suppose that print dictionaries and their online descendants rely on capitalization, the habits and experience of speakers, and common sense to communicate what needs to be communicated to users without wasting space on pages or screens. CGEL handles proper nouns and proper names in less than twenty pages, so it shouldn't be all that difficult for us to interpret the treatment of proper nouns in grammar references to help us differentiate proper from common nouns. DCDuring TALK 18:04, 8 July 2015 (UTC)
Comment: While there are some borderline cases, such as the names of months and days, most nouns can be readily distinguished as common or proper. The actual criteria and grammar of proper nouns are of far more debate, albeit philosophical. The problem arises in that the names of abstractions and philosophies behave grammatically much like proper nouns. Is socialism a common noun or a proper noun? In older texts, it was capitalized and treated much like Confucianism or Christianity. All three are philosophies. Further, capitalization cannot always be relied upon as a guide, since a number of common nouns and even adjectives are capitalized by virtue of their etymological source (e.g. Welshman, French dressing, African). --EncycloPetey (talk) 20:27, 8 July 2015 (UTC)
While a few users have suggested that distinguishing proper from common nouns is not useful, we have established and seem to be in agreement that religions are proper nouns. As long as we maintain a distinction between common and proper nouns, I will update Islam etc accordingly. - -sche (discuss) 23:09, 10 August 2015 (UTC)

Language labels within Translingual citations

At the moment we have 30 pages in Category:Translingual citations.

I found them to lack consistency, as some of them had only an "English citations" section while others had a "Translingual citations" section without specifying which language each citation is in. So I am trying a new format to standardize them all.

See Citations:VL. I separated the citations with language-specific labels within the "Translingual citations of VL" section. I also put them in the respective language categories: Category:English citations, etc. (Maybe something like Category:Translingual citations in English could be an improvement. Still, maybe this level of granularity is not necessary now because there are only few of those citations. If we had hundreds of Translingual citations I might think differently.) I made sure all the 30 pages are following this new format at the moment. I'd like some feedback to see if other people like this format or it could be improved some other way.

Thoughts? --Daniel Carrero (talk) 21:30, 8 July 2015 (UTC)

I'm thinking that there should be no such thing as a translingual citation. The citation itself is in a language, even if the term it's citing is used cross-language. —CodeCat 22:03, 8 July 2015 (UTC)
@CodeCat: I disagree with you on this point. IMO, having "Translingual citations" mirroring the Translingual section of the entry itself is useful because it allows us to group different language citations into the same senses. See Citations:(, specifically sense "Punctuation mark: expands a word into another word, inflection or spelling"; it has both Portuguese and English examples. It could have even dozens of languages in the future. It serves for easier comparison of how the same specific sense is used, to check if the Translingual definition is true in all languages. Current sense at ( is inaccurate. It is: "Begins denoting an alternative option for a preceding word. / dog(s)", but there are citations with "colo(u)r" and "(re)criação" (Portuguese). For punctuation marks, one could even argue that one sense of a punctuation mark is truly "Translingual" if it has been attested in multiple languages; ¿ has only the Spanish section. But taxonomic names would be truly Translingual even if attested in only one language, I hope: Citations:Anous stolidus has only one Portuguese citation at the moment. --Daniel Carrero (talk) 02:21, 9 July 2015 (UTC)
The Translingual case shows that the naming convention for the citations categories assumes there to be no difference between the language label for the term and the language label for the citation. AFAICT only Translingual violates the assumption, though Translingual itself is a highly heterogeneous collection of ideograms, symbols, taxonomic names, and other scientific names. Arguably it should also include Latin-derived terms that appear in medical, legal, even alchemical running text of many languages.
IMO, Any citation pages for Translingual terms should remain where they are and those pages should be categorized as Translingual, eg, the current Category:Translingual citations. IMO, there could certainly be additional categorization into categories for the language in which the translingual term is embedded in each citation. This would enable the citation to be found and reused for citing the terms of its language that it includes and subjected to any language-specific maintenance that might be required.
I don't see any great advantage to having a category like Category:Translingual citations in English rather than two categories: Category:Translingual citations and Category:English citations, but one disadvantage: there is no simple single category that contains all the pages bearing in whatever content namespace that have all the citations in each language. The current search engine makes it easy to search for the intersection of categories. As long as there is no regression of search-engine capability, we should be good for real-time search. We also have the dumps to process should there be regression. DCDuring TALK 00:17, 9 July 2015 (UTC)
I am thinking, maybe I would support that "Latin-derived terms that appear in medical, legal, even alchemical running text of many languages" be Translingual entries. Some phrases and terms to consider: List of Latin phrases and List of legal Latin terms. Maybe the pronunciation of those would be slightly different among different languages but taxonomical names with pronunciations would also have this issue to consider.
I agree with DCDuring's reasons for having the Category:Translingual citations, in addition to the reasons I stated above in my response to CodeCat. About Category:Translingual citations in English, I'm not really interested in it at the moment, but it's possible that at some point in the future I'm going to bring it up again. IMO, if we had hundreds of Translingual citation pages, then I would prefer using categories than using the search engine for more navigable results. (seeing 200 page titles at once, sorted alphabetically, where it's possible to see how many members one category has, etc.) --Daniel Carrero (talk) 02:21, 9 July 2015 (UTC)
No one except for me wanted a pronunciation header for taxonomic terms. I only wanted one suggested (ie somewhat prescriptive) pronunciation. But since Translingual terms are unlike the usual terms in a few ways, perhaps we should reconsider what the distinctive characteristics of the various types of Translingual entries are and develop a custom ELE for them. For example, we could have a "hidden" pronunciation section for taxonomic terms with pronunciations in as many languages as people care to provide.
I take your point about the possible future value of Category:Translingual citations in English. DCDuring TALK 02:49, 9 July 2015 (UTC)
I added the Brazilian Portuguese pronunciation in the Translingual sections of Homo sapiens and Vulpes vulpes. I deleted the English section of H. sapiens in the process, because it seemed to me it had no value other than having pronunciations, foreign script translations (like ホモサピエンス, moved link to Translingual section) and the plural "Homines sapientes", which I cited in Portuguese too. The English section had some random translations of man/person too, which I just deleted.
Custom ELE for Translingual entries = WT:AMUL? I guess it would be both the CFI and ELE for that "language", though it would have to be edited further as I see it currently focuses almost entirely on criteria for inclusion and says little about layout. I support the proposal: 'we could have a "hidden" pronunciation section for taxonomic terms with pronunciations in as many languages as people care to provide'. But maybe for the moment we could just keep adding pronunciations in any language to Translingual sections without bothering to have them collapsed. Related category: Category:Translingual terms with IPA pronunciation (292 members). --Daniel Carrero (talk) 06:52, 9 July 2015 (UTC)
I think the Chinese entries are a possible model: we should have a collapsible pronunciation table along the lines of the translation table, since the same symbol is read out loud in different languages and dialects as period, full stop, point, Punkt, etc. One problem to deal with: scientific names, at least, also have syntactic information in various languages. For instance, scientific usage in English is to refer to "the family Malvaceae", but a lot of people who aren't familiar with this refer to "the Malvaceae family". Another is that scientific names are more often read than heard, so pronunciation can vary widely from person to person: I would pronounce Malvaceae as something like /malˈvej si ej/, but I often see the pronunciation given as /malˈvej si i/. Writing for the public on pronouncing scientific names tends to say things like "there's no one right way to say it- everyone is different". This is especially true when scientific names are based on names of people: if one recognizes the name, one may pronounce it after the pronunciation of the person's name, rather than by the usual rules. For example, "hopei" might be pronounced as two syllables or three, depending on whether one notices that it's based on the surname Hope. Chuck Entz (talk) 14:00, 9 July 2015 (UTC)
Sounds like we should have a Pronunciation section at WT:AMUL(or WT:ATAX?) to which we can have a standard link, perhaps as part of the control for hiding/showing the pronunciations. It seems impractical and speculative to offer too many idiosyncratic pronunciations within each language. My own inclination is somewhat prescriptive with respect to a term like hopei, in case we don't have an etymology or a user doesn't look at it or make the appropriate inference. DCDuring TALK 14:15, 9 July 2015 (UTC)
If my input is of any value: if Wiktionary had pronunciation of taxonomic names and had them more thoroughly covered, I would look them up regularly, and would expect (and want) the prescriptive (presumably Latin), more neutral pronunciation, rather than the way it is pronounced in various languages (unless it's in common use outside of the scientific community like Homo sapiens or T. rex). The pronunciation would vary from person to person and would have too many variants for there to be any point in looking it up rather than sounding it out using the pronunciation rules of the language of the context. JodianWarrior (talk) 14:49, 9 July 2015 (UTC)
Proposal: Use WT:TAXON, not WT:ATAX: "tax" is ISO 639-3 for Tamki language. --Daniel Carrero (talk) 19:00, 9 July 2015 (UTC)
Sure. DCDuring TALK 23:39, 9 July 2015 (UTC)
While this approach sounds good in theory, it has little practical value. The prescriptive back-constructed Classical Latin pronunciations are not actually used by modern scientists in the area of taxonomy. For example, the genus of pine is Pinus, which in Classical Latin sounds exactly like English penis, so only the most eccentric of taxonomists uses that pronunciation. Even the family names in botany vary wildly in their pronunciations by country, and none of them use the Classical-style pronunciation that I have ever heard (and I've heard US, UK, Danish, French, and Portuguese botanists).
So, what would be the point of prescribing pronunciations that no one actually uses? If we're going to include pronunciations for scientific taxon names, then I would suggest we limit ourselves to those pronunciations found in English-speaking countries, and include as a matter of course, a link to a page where the variability of these names in other cultures is explained with some examples. --EncycloPetey (talk) 20:41, 24 July 2015 (UTC)
That seems like a sensible recommendation. It would give English users of Wiktionary some clue about one or more intelligible (descriptive) pronunciations and might provide hints relevant to other languages, such as stress pattern, diphthongs, or diaeresis, hard vs soft c and ch, etc, which probably apply across many languages. DCDuring TALK 22:58, 24 July 2015 (UTC)

"Audio" in front of pron files for non-pluricentric languages

Do languages that do not have several very well established regional varieties (an example of this could be English (US), English (UK), English (Aus), etc.) need the text "Audio" prepended before their pronunciation file players? Neitrāls vārds (talk) 20:31, 9 July 2015 (UTC)

Proposal for a "best practices" recommendation: "Audio" before a pronunciation file should be used only in the presence of some other qualifier. It is otherwise redundant. As bullet points are used to itemize/list text, a bullet point is not to be used either (because a Flash element is not text). (Ping some users whose editing involves pronunciation files: User:Pereru, User:Panda10; anyone else is welcome to express their opinion.)

Pronunciations would be formatted in the following way for pluricentric languages:

And the following way for languages whose pronunciation files usually do not feature additional qualifiers:



  • Hyphenation: Ame‧ri‧ka

Modified 2nd version: "Audio" is not to be used in the absence of some other qualifier but bullet point must be used.

3rd version: "Audio" is not to be used in the absence of some other qualifier. An editor can choose whether to use a bullet point. ({{IPA}} doesn't appear to be checking for namespace when adding categories; the IPA examples should be removed from this page at some point, or namespace-checking/cat suppression should be added to the template.)

Just learned that ping won't work without signing. User:Pereru, User:Panda10. Neitrāls vārds (talk) 19:57, 10 July 2015 (UTC)
I think the bullet point should be retained in all cases. It makes the flash element align with the other things, and gives them all the same visual 'introduction' (a bullet point); it also makes the edit window more legible, IMO; furthermore, it helps when indentation is used: for example, if audio were added to impact, it could be indented under the 'noun' and 'verb' lines (although see object for another way of presenting such information; we are not consistent).
If we were to drop "Audio" from non-pluricentric languages, could we just drop it from all languages? Then we would have:
- -sche (discuss) 21:22, 10 July 2015 (UTC)
I prefer the bullet point NOT to be retained -- the result is that the flash element is placed right under the pronunciation transcription to which it refers, as part of the same paragraph -- which to me is more logical: the pronunciation file is not a separate pronunciation, a separate item in a list of possible pronunciations, but an actual realization of the same pronunciation that was transcribed with the IPA right above it, i.e. logically part of the same paragraph. (I might even prefer it if the flash element occurred in the same line and after the actual IPA transcription; but occurring right under it is also OK.) --Pereru (talk) 21:56, 10 July 2015 (UTC)
  • Note that we have a template {{audio-IPA}} for when the audio is meant to go with the IPA transcription. --WikiTiki89 22:07, 10 July 2015 (UTC)
I'm fine with removing the "Audio" label. If there are qualifiers, they can be displayed without the "Audio" label. It would also be fine removing the bullets, but without them two or more audio templates will be displayed in a single line. It doesn't look good. Maybe you can modify the audio template to resolve this. Leaving the use of bullets optional is probably not a good policy. Some editors would use it, others won't. It would create too much inconsistency in the layout. I assume the new standards will be implemented by a bot and they will continue to be checked after every edit. --Panda10 (talk) 12:44, 11 July 2015 (UTC)
The only concern I have is that the player and associated graphic do not always display in some browsers or under certain conditions. If we remove the "(Audio)" text, can we ensure that when the player fails to display, that default text of "(Audio)" or something equally descriptive appears in its place? --EncycloPetey (talk) 20:45, 24 July 2015 (UTC)

Old Italic standardization proposals

I've recently been working on Module:Ital-translit and Appendix:Old Italic script and have come to the point where I need some opinions. The Ital code block does not currently possess all the characters needed fully to encode all the languages that use it; so I propose the following rules to standardize the use of Ital. Previous conversations may be found at User talk:JohnC5#Testing transliteration modules and WT:Beer parlour/2013/June#South Picene alphabet.

Proposal 1:All entries should be written left-to-right

The majority (if not all) of the languages that use Ital are written boustrophedon and thus could have lemmata appearing in left-to-right or right-to-left order. Modern scholarship, however, tends to merely unspool the inscriptions and then present them in left-to-right order. I therefore propose that all languages using Ital should be lemmatized in left-to-right order.

Support. This is proper use of Unicode, because Ital is encoded as left-to-right. If we ever decide to make some piece of Old Italic text in boustrophedon or right-to-left, it should be done with HTML, never by typing it backwards like some have suggested. — Ungoliant (falai) 14:00, 10 July 2015 (UTC)
Support. I second everything Ungoliant said above. --WikiTiki89 16:56, 10 July 2015 (UTC)
Support per Ungoliant. It would be wonderful if we could have a template to wrap Old Italic quotations which would present the text boustrophedon. — I.S.M.E.T.A. 18:27, 21 July 2015 (UTC)
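As a rough illustration of the HTML-based presentation suggested above, the following Python sketch wraps lines stored in logical left-to-right order in `<bdo>` elements with alternating directions. The function name `boustrophedon_html` and the choice to start with a left-to-right line are assumptions for illustration only; this is not an existing Wiktionary template or module, and it does not mirror the glyphs themselves, which real boustrophedon inscriptions often do.

```python
from html import escape


def boustrophedon_html(lines):
    """Return HTML in which alternate lines are displayed right-to-left.

    The text of every line stays in logical (left-to-right) order;
    only the presentation changes, via the HTML <bdo> element.
    Hypothetical helper: starting with "ltr" is an assumption.
    """
    wrapped = []
    for index, line in enumerate(lines):
        direction = "ltr" if index % 2 == 0 else "rtl"
        wrapped.append(f'<bdo dir="{direction}">{escape(line)}</bdo>')
    return "<br>\n".join(wrapped)


if __name__ == "__main__":
    # Placeholder text; any Old Italic string could be passed instead.
    print(boustrophedon_html(["first line", "second line", "third line"]))
```

The point of the sketch is that the stored text is never reversed, so searching and lemmatization keep working on the left-to-right form.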

Proposal 2:Allow alternative use of Ital characters

For much of Ital script, the character may be transcribed unambiguously or with only minor phonetic deviations from the canonical. Examples are represented by a blue box in Appendix:Old Italic script and include:

  • 𐌂: canonical - c; Camunic, Oscan, South Picene, Noric, North Picene - g
  • 𐌅: canonical - v; Old Latin - f
  • 𐌈: canonical - θ; Umbrian - t; Noric - d

However, in some cases, one language may use one glyph to represent an entirely different sound (whether by innovation of a new but similar letterform or by reällocation of a previous letterform). Examples are represented by a red box in Appendix:Old Italic script and include:

  • 𐌁: canonical - b; Camunic - ś; Raetic - tʼ / þ
  • 𐌑: canonical - ś; Camunic - b; South Picene - í
  • 𐌣: canonical - 50; Camunic - þ; Faliscan - f

I therefore propose the use of the character which most closely resembles the letterform in a particular language. Therefore South Picene matereíh should be lemmatized as 𐌌𐌀𐌕𐌄𐌓𐌄𐌑𐌇 ‎(matereíh) and not 𐌌𐌀𐌕𐌄𐌓𐌄𐌝𐌇 ‎(matereíh). The rules will be those set forth in Appendix:Old Italic script.
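One way to picture the per-language reading of a shared glyph is a canonical table plus language-specific overrides. The Python sketch below uses only the six glyphs and values listed above; the names `CANONICAL`, `OVERRIDES`, and `transliterate` are hypothetical and do not describe how Module:Ital-translit is actually written. Glyphs not in the table are passed through unchanged.

```python
# Code points written as escapes; the glyph is shown in each comment.
KE = "\U00010302"      # 𐌂
VE = "\U00010305"      # 𐌅
THE = "\U00010308"     # 𐌈
BE = "\U00010301"      # 𐌁
SHE = "\U00010311"     # 𐌑
FIFTY = "\U00010323"   # 𐌣

# Canonical values as listed in the proposal.
CANONICAL = {KE: "c", VE: "v", THE: "θ", BE: "b", SHE: "ś", FIFTY: "50"}

# Language-specific readings as listed in the proposal (not exhaustive).
OVERRIDES = {
    "Camunic": {KE: "g", BE: "ś", SHE: "b", FIFTY: "þ"},
    "Oscan": {KE: "g"},
    "South Picene": {KE: "g", SHE: "í"},
    "Noric": {KE: "g", THE: "d"},
    "North Picene": {KE: "g"},
    "Old Latin": {VE: "f"},
    "Umbrian": {THE: "t"},
    "Raetic": {BE: "tʼ / þ"},  # both values kept verbatim from the list
    "Faliscan": {FIFTY: "f"},
}


def transliterate(text, language):
    """Transliterate the known glyphs in text for the given language."""
    table = {**CANONICAL, **OVERRIDES.get(language, {})}
    return "".join(table.get(char, char) for char in text)


if __name__ == "__main__":
    for language in ("Etruscan", "Camunic", "South Picene"):
        print(language, transliterate(SHE, language))
```

Under this arrangement the same code point 𐌑 yields ś by default, b for Camunic, and í for South Picene, which is the behaviour the proposal relies on.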

Support. I see no reason against this. --WikiTiki89 16:59, 10 July 2015 (UTC)
My initial reaction is to oppose. As was just noted in the Grease Pit, "б in Serbian is sometimes displayed differently from the б in Russian", but we don't handle this by using a different character for Serbian б in an attempt to mimic its shape. For Runic, we make do with or , even when the inscription clearly has S. If Unicode has encoded something as, for example, "LETTER SHE", and we use it in spelling a word which is actually spelled with "LETTER II", I don't see how readers are supposed to figure out that the word isn't spelled with she (and wonkily transliterated by us). - -sche (discuss) 18:47, 11 July 2015 (UTC)
Re "in some cases, one language may use one glyph to represent an entirely different sound (whether by innovation of a new but similar letterform or by reällocation of a previous letterform)", I suppose there is an important theoretical difference between innovational homoglyphs and reällocated glyphs; the former simply doesn't exist in the encoded repertoire, whilst it is perfectly appropriate to transcribe the latter in whatever noncanonical way according to the reällocation of a given language. I doubt that the distinction has any more than theoretical importance, however, so I find myself inclined to support the use of noncanonical transcriptions where a given language calls for it. — I.S.M.E.T.A. 18:27, 21 July 2015 (UTC)

Proposal 3:Add extra characters into Ital temporarily

The interpunct ·, two dot punctuation ⁚, and tricolon ⁝ are all variously used as word separators in Ital languages and should be used as punctuation in entries. Furthermore, South Picene (always the culprit) uses · to represent the letter o and ⁚ for the letter f. Thus the entry mefiín contains the quotation:

  • 𐌀𐌐𐌀𐌄𐌔⁝𐌒𐌖𐌐𐌀[𐌕?⁝𐌄?]𐌔𐌌𐌑𐌍⁝𐌐𐌞𐌐𐌞𐌍𐌉𐌔⁝𐌍𐌑𐌓⁝𐌌𐌄⁚𐌉𐌑𐌍⁝𐌅𐌄𐌉𐌀𐌕⁝𐌅𐌄𐌐𐌄𐌕𐌑
    apaes qupa[t? e?]smín púpúnis nír mefiín veiat vepetí
    The nobleman lies, the chief of the Picenes (?) is (?), in the middle of the tomb.

Until such time as Unicode adds one-, two-, and three-dot word-separators to the Ital code block, · (U+00B7), ⁚ (U+205A), and ⁝ (U+205D) should be used in entries and, in the case of South Picene, in page names (mefiín should be moved to 𐌌𐌄⁚𐌉𐌑𐌍 ‎(mefiín)).
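To make the consequence of this proposal concrete, here is a small Python sketch that looks up the Unicode names of the three code points and splits a string on a configurable set of word separators. The helper names `describe` and `split_words` are hypothetical. The sketch assumes that, for South Picene, only U+205D would be treated as a separator, since under the proposal U+00B7 and U+205A stand for letters in that language.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"
TWO_DOT = "\u205a"
TRICOLON = "\u205d"

ALL_SEPARATORS = {MIDDLE_DOT, TWO_DOT, TRICOLON}
SOUTH_PICENE_SEPARATORS = {TRICOLON}  # assumption: the other two are letters


def describe(chars):
    """Map each character to its code point label and Unicode name."""
    return {f"U+{ord(c):04X}": unicodedata.name(c) for c in chars}


def split_words(text, separators):
    """Split text into words at any character in separators."""
    words, current = [], []
    for char in text:
        if char in separators:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(describe(sorted(ALL_SEPARATORS)))
    sample = "ab" + TRICOLON + "c" + TWO_DOT + "d"
    print(split_words(sample, SOUTH_PICENE_SEPARATORS))
    print(split_words(sample, ALL_SEPARATORS))
```

The same string splits differently depending on the language's separator set, which is the practical cost of reusing punctuation code points as letters.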

Oppose. This is improper use of Unicode. It’s no different than using | (pipe) instead of I (capital i). I prefer using transliteration since the script variant used by South Picene is clearly not covered well enough by Unicode, but using 𐌏 and 𐌚 is also a better solution. — Ungoliant (falai) 13:53, 10 July 2015 (UTC)
Support. I disagree with Ungoliant. This is nothing like using | (pipe) instead of I (capital i), because I (capital i) exists in Unicode. --WikiTiki89 16:58, 10 July 2015 (UTC)
Support. I seriously doubt that Unicode will add Old Italic specific punctuation; punctuation is for all scripts where possible.--Prosfilaes (talk) 18:43, 10 July 2015 (UTC)
I can support using U+00B7, U+205A and U+205D for punctuation, but using them for letters is indeed a misuse of Unicode like Ungoliant said. Why not use the regular "O" (U+1030F) and "F" (U+1031A) codepoints for South Picene? It does not seem particularly distinct from, say, the Serbian variant of Cyrillic to me. Keφr 08:01, 12 July 2015 (UTC)
I support using · (U+00B7), ⁚ (U+205A), and ⁝ (U+205D) as punctuation marks in Old Italic languages (until such time as Unicode encodes punctuation marks specific to Old Italic). I oppose using · (U+00B7) and ⁚ (U+205A) for South Picene o and f; we should instead use 𐌏 and 𐌚 in conjunction with a font that will make those letters display as dots (as I suggested at User talk:JohnC5#Testing transliteration modules). — I.S.M.E.T.A. 18:27, 21 July 2015 (UTC)

Proposal 4:Page names should be in Ital when applicable

For several of these languages (Old Latin most notably), there may exist a corpus written in the Latn alphabet. The majority of the languages exist primarily in their version of the Ital alphabet and should be lemmatized as such. It is the scholarly practice to place words transcribed from Ital in boldface and those found in Latn in italics or roman. Where possible, we should strive to put words found in Ital or Latn according to their appearance in the source. The major offender at this point is Faliscan, the majority of whose entries, I suspect, should be in Ital (also, -el̄u shouldn't have a macron in the page name).

Support. Entries should be in the same script as the original attestation, not printed transcriptions. --WikiTiki89 17:01, 10 July 2015 (UTC)
Oppose. We are a printed work, therefore we should follow the standards of printed works. Don't Proliferate; Transliterate!. Trying to post entries in Old Italic also demands that we have translation entries for Latin script so people actually using printed works can look things up.--Prosfilaes (talk) 18:53, 10 July 2015 (UTC)
Speak for yourself. I am not a printed work. Keφr 20:40, 10 July 2015 (UTC)
Extremely strong support, except, perhaps, for Old Latin (iff its corpus is primarily Latn). Perhaps we should do something similar to what is done with Gothic, and have entries for the Latn spellings of every Ital lemma. — I.S.M.E.T.A. 18:27, 21 July 2015 (UTC)

Sorry for how long this is, but I needed to discuss all the different issues because each affects how words will be lemmatized. When we have a decision, I will create WT:AITAL with the information.

People who may be interested: @I'm so meta even this acronym, Ungoliant MMDCCLXIV, EncycloPetey, The Man in Question, Wikitiki89, Kephir. —JohnC5 03:04, 10 July 2015 (UTC)

Ping fail. Please read mw:Help:Echo#Technical details to learn why (you added section headers). Keφr 06:04, 10 July 2015 (UTC)
Grrrrr, that explains a lot. @I'm so meta even this acronym, Ungoliant MMDCCLXIV, EncycloPetey, The Man in Question, Wikitiki89, Kephir. —JohnC5 13:34, 10 July 2015 (UTC)
For the record, I do not feel that I have enough knowledge of either Old Italic or its script to offer any meaningful opinions in this discussion. --EncycloPetey (talk) 02:40, 12 July 2015 (UTC)

Collapse multiple inflection-of definitions into one with subsenses?

I've always been bothered by entries like agri and aquae. There's no need to repeat "of (word)" four times on separate lines. So I'm thinking it would be good to extend {{inflection of}} so that you can specify distinct multiple inflections instead of just one. These would be displayed as subsenses, so that aquae would look like:

  1. inflections of aqua:
    1. nominative plural
    2. genitive singular
    3. dative singular
    4. vocative plural

I think this would look a lot better, and above all there is only one link to the lemma rather than 3 extra redundant ones. We can also make the list of subsenses collapsible in cases where there's too many (like for German adjectives).

To implement this, {{inflection of}} would need some way to indicate how to separate multiple inflections. This would have to be some kind of special tag that is inserted as a separator, like: {{inflection of|aqua||nom|p|(sep)|gen|s|(sep)|dat|s|(sep)|voc|p|lang=la}}. My question is what the separator should be. It should be something that isn't legitimately used in existing entries and would not likely be used in future ones. If proposals are made, the current template can be modified to track any uses of those proposed tags in current entries, which would then allow us to assess the situation better. —CodeCat 20:20, 10 July 2015 (UTC)
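The grouping step described above can be sketched in Python as follows. This is only an illustration of the idea, not the template's or module's actual code: the function names, the abbreviation table, and the use of the placeholder `(sep)` as the default separator are all assumptions taken from the example in the proposal.

```python
# Hypothetical abbreviation table, limited to the tags in the aquae example.
ABBREVIATIONS = {
    "nom": "nominative",
    "gen": "genitive",
    "dat": "dative",
    "voc": "vocative",
    "s": "singular",
    "p": "plural",
}


def split_inflections(tags, separator="(sep)"):
    """Split a flat list of tags into one group per inflection."""
    groups, current = [], []
    for tag in tags:
        if tag == separator:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(tag)
    if current:
        groups.append(current)
    return groups


def render(lemma, tags, separator="(sep)"):
    """Return display lines: one definition, or a header plus subsenses."""
    groups = split_inflections(tags, separator)
    labels = [" ".join(ABBREVIATIONS.get(t, t) for t in g) for g in groups]
    if len(labels) == 1:
        # A single inflection needs no subsense list.
        return [f"{labels[0]} of {lemma}"]
    lines = [f"inflections of {lemma}:"]
    lines += [f"  {n}. {label}" for n, label in enumerate(labels, 1)]
    return lines


if __name__ == "__main__":
    tags = ["nom", "p", "(sep)", "gen", "s", "(sep)",
            "dat", "s", "(sep)", "voc", "p"]
    print("\n".join(render("aqua", tags)))
```

Because the separator is compared as a whole tag, a value such as a comma inside a longer free-text tag would not be mistaken for it; only a parameter consisting solely of the separator starts a new group.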

  • Support. I love this idea! --WikiTiki89 20:27, 10 July 2015 (UTC)
    And I was just about to suggest making it collapsible and then I realized you already said that. --WikiTiki89 21:32, 10 July 2015 (UTC)
  • Meh. A trivial case of redundancy not worth addressing. And I think subsenses should not be used unless a word has so many definitions that they would become too hard to navigate otherwise. And I especially oppose having them collapsed. Which particular inflected form the word is in is crucial information and the reader should not be required to click through to see it. Keφr 20:38, 10 July 2015 (UTC)
    • Have you ever seen roten? —CodeCat 12:51, 11 July 2015 (UTC)
      • So you want to make definition lists less usable for the sake of one word? (Which is not going to help with anything anyway; collapsible or not, you still have a long list.) Keφr 13:05, 11 July 2015 (UTC)
        • I don't see how they're made less usable to begin with, I think this increases usability. But roten isn't the only German adjective like this; all German adjectives have one form ending in -en which has this many definitions. Editors have so far sidestepped this issue by using {{inflected form of}}, which really isn't any good as it doesn't say anything about what inflected form. You said yourself that this is crucial information. —CodeCat 13:22, 11 July 2015 (UTC)
          • How about this? Keφr 13:32, 11 July 2015 (UTC)
            • It's an improvement, but still quite a long list. —CodeCat 13:39, 11 July 2015 (UTC)
              • I find it completely bearable. Your "solution", on the other hand, does not shorten it at all. You are going to have a long list no matter what. The only situation in which this proposal would be of any help is when a single token is derived from multiple lemmata; but I suspect these situations to be quite rare. Keφr 13:48, 11 July 2015 (UTC)
  • Support. Looks nice to me. —JohnC5 20:48, 10 July 2015 (UTC)
  • Support. I would like to suggest a comma or a semicolon, since those are separators people would use in running text; we might as well make it as intuitive as possible. Chuck Entz (talk) 00:19, 11 July 2015 (UTC)
    • A comma could be used legitimately, like if someone wants to write "nominative, accusative and dative". —CodeCat 00:20, 11 July 2015 (UTC)
      • Another thought: allow groups that share one element, so that you could have "|nom|acc|voc|s|" that would be the same as "|nom|s|,|acc|s|,|voc|s|" Chuck Entz (talk) 00:26, 11 July 2015 (UTC)
        • It would be quite complicated to do that, the module would have to somehow guess that the singular label applies to all of the previous. Not worth the trouble considering all the edge cases there are with this. Dutch has a form for masculine/feminine gender, plural or definite adjective inflection, just to name an example of overlapping sets. —CodeCat 00:33, 11 July 2015 (UTC)
  • Support. How would this impact verb forms where the person can be the same for each form but also could be different? See vizsgáljuk. --Panda10 (talk) 12:57, 11 July 2015 (UTC)
    • It wouldn't change anything directly, it just adds new possibilities. You'd have to modify the entry to make use of it. In this case, you'd change the definition to: {{inflection of|vizsgál||1|p|indic|pres|def|;|1|p|subj|pres|def|lang=hu}}. —CodeCat 13:03, 11 July 2015 (UTC)
      • Ok, that works. I noticed the word "inflections of" is in plural since it lists multiple inflected forms below, but for some reason "inflection of" would sound better to me because it is after all a single form used in multiple places. --Panda10 (talk) 13:36, 11 July 2015 (UTC)
        • I'm indifferent about this personally, what do other editors think? —CodeCat 13:38, 11 July 2015 (UTC)
          • I lean towards "inflection of", too, even when multiple inflections are given. It certainly shouldn't be "inflections of" if only one inflection is given. Will the template still indent the "list" if it only contains one item, like abnegationem? (It's not disastrous if it does, but it's unnecessary.) - -sche (discuss) 18:56, 11 July 2015 (UTC)
            • Well, see for yourself on abnegationem. :) It displays the way it always did. —CodeCat 20:03, 11 July 2015 (UTC)

Since there seems to be overwhelming support, I've added the necessary code for this to {{inflection of}}. I've chosen ; (semicolon) as the separator. See aquae, which I've changed to make use of this new option. We would likely want to inform bot owners of this, and also run a bot to convert existing entries. —CodeCat 13:05, 11 July 2015 (UTC)
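
The separator behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the code that was actually added to {{inflection of}}; the function name `split_inflections` and the list-of-strings input are assumptions made for the example. It splits a flat list of tags into one group per inflection wherever the separator tag appears.

```python
def split_inflections(tags, sep=";"):
    """Split a flat list of inflection tags into groups at each separator.

    `tags` stands in for the positional template parameters after the
    lemma (hypothetical input format); `sep` is the separator tag.
    """
    groups = [[]]
    for tag in tags:
        if tag == sep:
            # A separator closes the current group and opens a new one.
            groups.append([])
        else:
            groups[-1].append(tag)
    # Drop empty groups caused by leading, trailing or doubled separators.
    return [group for group in groups if group]


# Tags taken from the vizsgáljuk example in this discussion.
tags = ["1", "p", "indic", "pres", "def", ";", "1", "p", "subj", "pres", "def"]
groups = split_inflections(tags)
# groups == [["1", "p", "indic", "pres", "def"], ["1", "p", "subj", "pres", "def"]]
```

When no separator is present the result is a single group, which is consistent with the observation that entries like abnegationem display the way they always did.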

One day after starting the discussion? Typical CodeCat. Just stop and let the discussion proceed in a regular fashion. --Dan Polansky (talk) 07:20, 12 July 2015 (UTC)
  • Oppose (just saw this) Ummm. . . so how will we key quotations to specific senses, if they're all collapsed? --EncycloPetey (talk) 02:41, 12 July 2015 (UTC)
    • Aren't quotations under the lemma form, not under the inflected form? —Aɴɢʀ (talk) 05:46, 12 July 2015 (UTC)
      • They can be, in languages like English that have little inflection. But for highly-inflected languages like Latin, they cannot. We want documentation of the various inflected forms, and many Latin verbs are incompletely conjugated, and some other Latin words have inflectional irregularities. It is not feasible to try to include supporting quotations for all forms of a Latin verb under the lemma; there are simply too many forms, and identifying and sorting the various forms within a lemma page would be disastrous for the sanity of both editors and users. --EncycloPetey (talk) 23:40, 12 July 2015 (UTC)
        • Angr is right, quotations and usage examples go on the lemma form. Quotations shouldn't be used merely to attest a term, they exist as a higher-quality alternative to usage examples. If the idea is to show attestation of a term, then it should go on the citation page, which exists for that purpose. —CodeCat 13:55, 13 July 2015 (UTC)
          • No, Angr is wrong about this. We are not simply attesting the term and its usage as a collection of forms, but are cataloging spelling variation, different plural forms, and different inflected forms. While any of these can be listed at the lemma, it serves no useful purpose whatsoever to restrict them there. It is much more useful to be able to find supporting citations associated with the specific form of the word, rather than with a lemma form that, in some cases, has been chosen arbitrarily from among the possible alternatives. If we are showing usage examples, as you say and as I agree, then those need to be placed on the forms pages too. If I want to see how the dative form of a word is used, I want to look at a collection of usages in the dative, not usage of all of the forms together. --EncycloPetey (talk) 20:51, 24 July 2015 (UTC)
            • Then that certainly goes against the common practice among editors. Editors put usage examples for any of the inflected forms on the page of the lemma, and have done so since forever. They don't restrict themselves to putting only usage examples for the lemma form on that page. This practice is established enough that some editors will move the usage examples from a non-lemma page to the lemma. A change to this would certainly need further discussion.
              If you're looking for usage examples for the dative in a given language, you should not expect to find them in a dictionary under some random word. Explaining how cases are used is the job of a grammar, not a lexicon. —CodeCat 21:07, 24 July 2015 (UTC)
              @CodeCat Re: "Editors put usage examples for any of the inflected forms on the page of the lemma, and have done so since forever.": Evidence, please. That does not match my recollection. --Dan Polansky (talk) 09:53, 25 July 2015 (UTC)
            • As CodeCat and Angr have said: citations showing how a term is used go in the lemma entry. If in specific, unusual cases citations are needed to verify that the dative plural of a term is foobarenn rather than the usually-expected foobaren or whatever, then those citations go on the citations page. This has been the case for years. - -sche (discuss) 21:47, 24 July 2015 (UTC)
              • Argument from what "has been the case for years" (and I will not bother to argue whether this has actually been the situation or not) is a weak argument that does not address any objective Wiktionary is trying to accomplish. Further, your argument above makes sense only if (1) citations exist solely for the purpose of demonstrating grammatical usage, and (2) grammatical usage does not vary with form. But in some languages, the usage of a term may actually vary with the form of the inflection. As a simple example, the grammar of singular and plural millē is very different in Latin. I would also argue that assuming point (1) is an unnecessary limitation on Wiktionary. Citations exist to document forms and spelling at least equal in measure to documenting proper grammar. To that end, each variant should ideally be (eventually) documented from sources. That cannot be reasonably accomplished if all the various citations are limited to the lemma page. And CodeCat, no one is suggesting we put the citations under a "random word"; that is a straw man argument. --EncycloPetey (talk) 00:21, 25 July 2015 (UTC)

the writing on the wall

We do not have an entry for any form of mene mene tekel upharsin (numbered, numbered, weighed, and divided). I think the language is Chaldean Aramaic (Biblical Aramaic) and it was probably written on the wall in Neo-Babylonian cuneiform script. Today, however, it is commonly used in English texts in Roman letters. Should there be an entry in Roman letters, and if so, what language to label it? I supposed it could be written in Hebrew (מְנֵ֥א מְנֵ֖א תְּקֵ֥ל וּפַרְסִֽין), Syriac, and/or cuneiform (if the spellings could be found in those scripts). —Stephen (Talk) 15:33, 11 July 2015 (UTC)

  • It's written in Hebrew in a famous painting (I think in the National Gallery, London). SemperBlotto (talk) 15:35, 11 July 2015 (UTC)
    • It's certainly written in Hebrew letters in the Bible. —Aɴɢʀ (talk) 18:22, 11 July 2015 (UTC)
I had a children's Bible that showed it in Roman letters. Not very "English" though. Redirect? Equinox 18:24, 11 July 2015 (UTC)
Yeah, probably redirect, since the string is long enough that it's unlikely to be an unrelated word in another language. - -sche (discuss) 18:58, 11 July 2015 (UTC)
The phrase was never used in Aramaic in Roman letters. The pronunciation currently in the entry is the English pronunciation. So either it should be converted to English, or it should be a redirect. --WikiTiki89 12:44, 13 July 2015 (UTC)

Use "male" and "female" for gendered nouns

Many languages have nouns that occur in different forms depending on the natural gender of the referent, like French comédien/comédienne, English actor/actress. This is not actually grammatical gender the way we know it, exemplified by the fact that languages that have no grammatical gender can still often make such distinctions. Of course grammatical gender may align with natural gender in this case, but it doesn't have to (I can't think of an example, but maybe someone else can). Spanish amiga is not a grammatically feminine form of the lemma amigo; rather both are independent nouns and have different meanings. The choice is made based on the referent rather than based on grammatical rules.

So I think that using the terms "masculine" and "feminine" and using {{feminine of}} and such for these cases is incorrect and confusing, as it conflates grammatical and natural gender. It's especially bizarre in entries like mayoress with languages that don't even have grammatical gender. I'm therefore proposing to introduce the separate terms "male" and "female" to refer to natural gender in these cases. amiga is the female equivalent or female counterpart of amigo, not a form. There would need to be two new form-of templates. —CodeCat 20:17, 11 July 2015 (UTC)

If they are independent nouns then why do we need more templates at all? Define them separately, as e.g. "a man who mows lawns" and "a woman who mows lawns", and each can link to the other as a related term. Equinox 20:20, 11 July 2015 (UTC)
That's not ideal, because they might have many distinct meanings. Duplicating them all would be bad. The idea of a new template is to indicate "this noun means the same as this other one, except referring to a female individual". —CodeCat 20:25, 11 July 2015 (UTC)
I don't agree. The Italian words gato and gata both mean cat. The animals are male and female, but the words are masculine and feminine. SemperBlotto (talk) 20:22, 11 July 2015 (UTC)
No, gato means male/unspecified cat, while gata means female cat. And, this is exactly my point. Grammatical gender is arbitrary. "Feminine of gato" tells us nothing; it merely indicates that this noun is related to "gato" but has feminine grammatical gender. Nothing in the entry indicates that the cat itself has to be female, only that the word referring to it is feminine. —CodeCat 20:25, 11 July 2015 (UTC)
I can think of cases where grammatical gender doesn't match natural gender (cailín ‎(girl) is masculine, while gasóg ‎(boy scout) and stail ‎(stallion) are feminine), but I can't think of a case where a word referring to a person of one gender is derived from a word referring to a person of the other gender, but grammatical and natural genders don't match (in a language that has grammatical gender, unlike English). —Aɴɢʀ (talk) 20:31, 11 July 2015 (UTC)
Oh, and if anyone's wondering why gato and gata are orange links, it's because the Italian words are actually gatto and gatta. —Aɴɢʀ (talk) 20:32, 11 July 2015 (UTC)
gatta, as it is now, looks good. But as I said above, with highly polysemic words it becomes a problem to copy all the definitions. A simple template that refers to the definitions of the gender-neutral term is more effective. Also, this entry illustrates another important distinction between "feminine of" nouns and adjectives: the female equivalent noun can have meanings the male one doesn't have, or the reverse. With true grammatical gender, like that found in adjectives, that would be unthinkable. They really are separate nouns. —CodeCat 20:36, 11 July 2015 (UTC)
I'll write more later, but at the moment I just want to highlight that the observation that "the female equivalent noun can have meanings the male one doesn't have, or the reverse" calls into question the sensibility of avoiding spelling out which senses each word has and instead using a template that would "indicate 'this noun means the same as this other one, except referring to a female individual'". - -sche (discuss) 21:05, 11 July 2015 (UTC)
You're right about that point. I just wanted to accommodate users who certainly want to use a template, and also existing entries that have no definition beyond {{feminine of}}. —CodeCat 21:08, 11 July 2015 (UTC)
It may be wise to distinguish languages which have grammatical gender from those which do not. Because English does not normally* mark gender grammatically, it's at least debatable whether mayoress should be described as 'female' or 'feminine'. (The references turned up by google books:English "-ess" "feminine form", compared to the irrelevance turned up by google books:English "-ess" "female form", suggest that the traditional analysis has been that it's a 'feminine' rather than a 'female' form.) *Of course, note how google books:"blonde mayoress" gets two hits while "blond mayoress" gets none, and "blonde mayor" gets no hits while "blond mayor" gets at least five (plus a lot of chaff), suggesting that there are some areas where grammatical gender agreement is found in English.
In German and other languages with grammatical gender, the case for describing Wissenschaftlerin et al. as 'feminine' rather than 'female' forms of Wissenschaftler et al. is necessarily stronger, since they are feminine, and take feminine adjectives, etc, independent of whether or not they are regarded as 'feminine forms' or 'female forms' of the corresponding masculine nouns. - -sche (discuss) 22:26, 11 July 2015 (UTC)
One other thing to keep in mind is that sometimes the "female equivalent of X" means "woman who is an X" but sometimes it means "wife of an X". In the UK at least, a duchess is always the wife or widow of a duke; no woman can become duchess by virtue of her birth. Our definition of Burggräfin is "female burgrave", but when burgraves were still running around they were always male; a Burggräfin is the wife of a burgrave. A hundred years ago or so, Professorin almost always meant "wife of a professor" but today it almost always means "female professor". In the E. F. Benson novel Trouble for Lucia, a woman becomes mayor of a town in England and has to choose a mayoress to help her, but she is not the mayoress herself—in that context, then, mayoress means neither "female mayor" nor "wife of the mayor" but rather "woman who assists the mayor". I doubt a single template can or should accommodate all this variation. —Aɴɢʀ (talk) 05:56, 12 July 2015 (UTC)
Those are excellent points. For words like those, I'm persuaded that we should give the words actual definitions (as Equinox said). For English, almost all of the -ess and -rix and other such entries I've seen do have definitions, and we should just clean up the few that don't. For German, the Duden has -in entries only as pointers to their masculine counterparts, but de.Wikt gives them full definitions, and entries like de:wikt:Professorin (which records the different meanings) vs Duden: Professorin (which doesn't acknowledge them) convince me that full definitions are preferable (and, I think, already the norm). That doesn't preclude the existence of entries that would be better handled by a template whose wording could then be debated, but we should probably identify some such entries before debating wording further. - -sche (discuss) 07:26, 12 July 2015 (UTC)
  • I agree with CodeCat that there is a problem with some entries: The current presentation at amiga#Spanish reads "feminine of amigo, friend", which seems suboptimal since it highlights grammatical properties rather than focusing on the referent. However, I don't agree with CodeCat's solution of using a template. Czech učitelka says "female teacher", which seems fine to me, and preferable to using a template. Above, Angr makes a good point about duchess: female duke vs. wife of a duke. --Dan Polansky (talk) 08:23, 12 July 2015 (UTC)
  • I came across another problematic entry, coreana. The noun presumably indicates a female person, but there is again nothing in the entry to indicate that. —CodeCat 19:38, 12 July 2015 (UTC)

Language codes

2015-07.11 16:30 I'd like to add some language codes for some Swedish "dialects" (they should be considered languages IMO) because I don't wanna clutter the Swedish entries with tons of dialectal versions, not to mention the dialects have their own grammar and pronunciations, and I would like it if I could list those.

The ones I have in mind are Pitemål (Peijtmåle), Lulemål (Leulmale), Överkalixmål (Överkölismale) and Jamtlandic (Jamska).

Don't know what else I'm supposed to say really, CodeCat told me to post here about it. —This unsigned comment was added by (talk).

@Br0shaan: just letting you know that I've moved the discussion to here (from Wiktionary talk:Beer parlour, which is the talk page for discussing the Beer Parlour itself...). - -sche (discuss) 22:02, 11 July 2015 (UTC)
@-sche: Thanks! I'm a little new to Wiktionary, or well, the discussion parts of it anyway. Is there anything special I need to provide to get this suggestion accepted? Br0shaan (talk) 19:58, 12 July 2015 (UTC)
@Br0shaan: we do not have the requisite framework for handling languages with a lot of dialects. Armenian has circa 50 dialects with their own word forms, pronunciations and grammar. For now I have come up only with a way to show the word forms on the entry with the literary spelling: Module:hy:Dialects. See it used in փետուր ‎(pʿetur), գազար ‎(gazar), բամբակ ‎(bambak). You can create a similar module for Swedish. --Vahag (talk) 08:30, 13 July 2015 (UTC)
@Vahagn: That's a shame, but the module looks alright. How do you deal with words with formations completely different from standard words? Just create a new word entry? Also, how should I list these in derivation trees when looking at things like Norse or Proto-Germanic? Not at all? Because that would be very disappointing. Anyway, thanks for the help and the quick reply! :) 13:30, 13 July 2015 (UTC)
Just create a new entry, like կյա̈զա̈ր, and label it with {{label|sv|dialectal|Lulemål or whatever}}. The list of labels can be added to Module:labels/data which will allow automatic categorization and linking to Wikipedia. As for derivation trees, there is no accepted way of doing things; I have tried a format like in ճանդարի (čandari). --Vahag (talk) 13:43, 13 July 2015 (UTC)
I'll see what I'll come up with for derivation trees, thanks for the help! Br0shaan (talk) 15:18, 13 July 2015 (UTC)
@Vahagn: Also could you familiarize me with how the module works and how to implement it? Is there any good documentation?
The module is invoked by {{alter}}. It has some documentation. Just copy the format of Module:grc:Dialects; it is pretty simple. --Vahag (talk) 13:46, 13 July 2015 (UTC)
I actually figured it out myself before I saw your answer, so yeah it was pretty easy. Most of the trouble was finding out how to make a new module page haha. Br0shaan (talk) 15:18, 13 July 2015 (UTC)

Merging ( and ) into a single entry

I was thinking that maybe it would be a good idea to merge entries of separate brackets "(" and ")" into matched-pair entries such as "()" and leaving only single-character entries with definitions about actual uses of a single character without the other, when they exist; when they do not exist the single-character entries could redirect to the matched-pair entries.

1. (repetition) The way it is now, most definitions are repeated: sometimes, the left side sense is "Begins X" and the right side sense is "Ends X". I don't think one should be required to check two separate pages to see definitions for the same thing; also, "begins" and "ends" makes it a bit longer to read, especially when these two words are present in almost all senses in the two pages.

2. (consistency) With two almost identical entries, editing one entry requires editing the other for consistency. I am in the process of updating ( and ) to conform with uses quoted in Citations:(, but that makes it somewhat more cumbersome to keep both entries updated. One example of inconsistency (although easy to be fixed) is that { currently has a sense that } doesn't.

3. (lexical unit) I'd argue that since in most senses of () you can't use one without the other, they are together only one lexical unit. IMO, having them separated is like having the entry . (full stop) with the sense "The first, second, or third dot in an ellipsis, which indicates a pause or omission."

(The reason 4 was added later the same day, 21:44, 12 July 2015 (UTC) - original message linked here.)
4. (incompleteness) “ is defined as "Starts a quotation." and ” is defined as "Ends a quotation." Like a number of other current single-character entries, this seems directed to readers who already know how to use the brackets or quotation marks. (Compare ― (horizontal bar), defined as: "Introduces quoted text.", which from its definition seems exactly synonymous with “ but does not actually require any mark at the end of the text.) If they are merged into “”, then the definition is obviously going to change some way. But if they are kept as separate one-character entries, it would be more accurate to define them as:

entry for left quotation mark (“) - "Starts a quotation that ends with ”."
entry for right quotation mark (”) - "Ends a quotation that begins with “."

More examples of repeated definitions:
( - Begins supplemental information.

Sen. John McCain (R., Arizona) spoke at length.

) - Ends supplemental information.

Sen. John McCain (R., Arizona) spoke at length.

Some affected entries:

Thoughts? --Daniel Carrero (talk) 08:45, 12 July 2015 (UTC)

  • Yes, please. Though I would prefer a title of [[( ... )]] to avoid the impression that "()" appears in texts without anything in between, and because the ellipsis also lends itself well to phrases that are otherwise hard to lemmatise (e.g. I'm ... year(s) old). Keφr 09:49, 12 July 2015 (UTC)
  • Oppose , at least is not usually a paired character. is also used as an unpaired character. Even when not used as apostrophes or transliteration characters, what pairs of quotes there are and what order they come in is language-dependent; to turn « and » into «», »« and »», as w:Guillemet says is necessary, is not a win.--Prosfilaes (talk) 10:58, 12 July 2015 (UTC)
    • I have never seen an unpaired use of ; I have seen , however. Guillemets could be handled by {{alternative form of}} (and a usage note saying which form is preferred in which language): not perfect, but not tragic either. Keφr 12:34, 12 July 2015 (UTC)
      • And quotes, which come in various directions. Why not just a {{paired with}} template instead of physically merging the pages? --Prosfilaes (talk) 21:42, 12 July 2015 (UTC)
        • There is no such template; we do not have circumfixes or entries like how do you say...in English split into two pages either, so why this? Keφr 23:15, 12 July 2015 (UTC)
          • Because it's not as neat as a- -ing. Because quotes stand alone at the starts of lines to mark continued quotes in older books; because quotes are used in different pairs in different languages and different time periods. Because many quotes, whether or not you've seen them, have been used for non-quote purposes. The braces and parentheses can probably justifiably be put together, at least I haven't seen an exception, but the quotes are much more complex across all writing than just simple standardized matched pairs.--Prosfilaes (talk) 23:47, 12 July 2015 (UTC)
          • For example, says "starts a quotation". That's incomplete; it also ends a quotation, as per w:Quotation mark. It can be matched up with several others.--Prosfilaes (talk) 23:52, 12 July 2015 (UTC)
            • If we create entries for every combination in w:Quotation mark#Summary table for all languages, then we would have these, as long as they are attestable: (using the format with space in the middle) “ ”, „ “, „ ”, ‘ ’, ‚ ’, « », " ", ‹ ›, » «, › ‹ and » ». I propose creating all the matched-pairs (11 entries don't look like too many, unless I'm missing something?) while leaving the single-characters with indexes to the matched-pairs. If we decide to do the opposite, e.g. keep only single-character entries and thus not create any of those matched-pairs, then the single characters would still need to have accurate indices of all formats of all language(s) either way. As an aside: 「 」 and 『 』 are fine to me, but I don't know exactly what to do with vertical brackets, I wonder if they could be treated as single-character exceptions until further discussion: and . --Daniel Carrero (talk) 00:28, 13 July 2015 (UTC)
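
The idea of keeping the single-character entries as indexes to the matched-pair entries can be sketched as a simple lookup table. This is a hypothetical Python illustration, not anything that exists on Wiktionary; the names `PAIRS` and `build_index` are assumptions, and the pair list is just the eleven combinations listed above, using the page-title format with a space in the middle.

```python
# The eleven opening/closing combinations listed in the comment above.
PAIRS = [
    ("“", "”"), ("„", "“"), ("„", "”"), ("‘", "’"), ("‚", "’"),
    ("«", "»"), ('"', '"'), ("‹", "›"), ("»", "«"), ("›", "‹"),
    ("»", "»"),
]


def build_index(pairs):
    """Map each single character to the matched-pair titles it appears in."""
    index = {}
    for opening, closing in pairs:
        title = f"{opening} {closing}"  # hypothetical page title, e.g. "« »"
        for char in (opening, closing):
            titles = index.setdefault(char, [])
            # Avoid listing a pair twice when both marks are the same character.
            if title not in titles:
                titles.append(title)
    return index


index = build_index(PAIRS)
# index["»"] == ["« »", "» «", "» »"]
```

Under this sketch a character such as » points to three different pair entries, which illustrates why a single-character entry would need an index rather than a plain redirect to one pair.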
The merger seems like a good idea, as the matched-pair usage is, in a sense, not SoP, but it is probably also true that we can find attestation of the use of each character in isolation and, just as in the case of the morphemes that make up a compound, we would probably want to keep separate entries, even if there were no attestation apart from the matched-pair use.
We would in any event need to have hard redirects from the unmatched characters to the corresponding matched-pair entry. If we go the route of extensive hard (and soft) redirects, then the objection that no normal person would ever search for [[( ... )]] becomes moot. IOW as I see it each paired entry would need at least 3 hard or soft redirects to it and would not be useful without them. DCDuring TALK 13:35, 12 July 2015 (UTC)
BTW, why don't we have non-gloss definitions for the use of most of these as part of the character-based emoticons that some of us use, eg, ?(:-【} ? They seem to be usable productively, possibly even in widespread use, eg, in Usenet. DCDuring TALK 13:44, 12 July 2015 (UTC)
I do not think it should be listed as a sense. For one, the meanings of individual characters of emoticons are very context-dependent: the "]" in ":]" is a mouth, but in "]:->" it represents devil's horns. Would you add a sense of "represents a head in orz" to [[o]]? Keφr 16:00, 12 July 2015 (UTC)
@Kephir: I would use {{n-g|Used to form images, especially of faces, used in some text-based computer communications|lang=mul}}. Usage examples would probably be better than explicit glosses. DCDuring TALK 23:52, 12 July 2015 (UTC)
Having in mind the emoticon o_o or o_O, you could also tweak your definition to mention that the letter "o" is used "to form images, especially of faces or eyes". --Daniel Carrero (talk) 02:05, 13 July 2015 (UTC)
How about [1]? Definitions of that kind would apply to so many other characters (while the actual meaning is so context-dependent, and relatively obvious in context anyway) that I doubt it would be practical or necessary to cover them all. Ever heard of ASCII art? Keφr 07:56, 13 July 2015 (UTC)
I take your point about ASCII art and the "cell division" example that you linked here. --Daniel Carrero (talk) 12:45, 13 July 2015 (UTC)
  • Oppose: They are different characters and are rarely, if ever, used successively. There's also no clear-cut way to represent them in a single entry. Purplebackpack89 15:18, 12 July 2015 (UTC)
  • I oppose using ( ... ) as the central location; the target should be blank. I abstain on whether, say, ) could be created as a soft redirect to (, for the time being. --Dan Polansky (talk) 16:11, 12 July 2015 (UTC)
    I request that pages ( and ) are left as they were at the start of this discussion for at least three days after the start of the discussion. I have undone moves of ( and ) done today by another user. --Dan Polansky (talk) 16:18, 12 July 2015 (UTC)
    • If you so wish, then go on and create a page with a blank title. I will be waiting here. Keφr 16:24, 12 July 2015 (UTC)
      • My mistake: ( ... ) should be a redlink, like it was a couple of hours ago. --Dan Polansky (talk) 16:30, 12 July 2015 (UTC)
    • Let me point out that ) has sense "Separates a number or letter from an item in a list" in which "1) New York, 2) London, 3) Paris" is given as one of multiple examples; that does not fit ( ... ). --Dan Polansky (talk) 21:34, 12 July 2015 (UTC)
      • In my mind, we would have the entry ( ) with all uses of both parentheses together and the separate, cross-linked, entry ) for that sense you mentioned. One of multiple senses of ( ) would be exactly a variation of the sense you mentioned: "Encloses a letter or number starting an item in a list.", with "(1) New York, (2) London, (3) Paris." as the list of examples. --Daniel Carrero (talk) 01:32, 13 July 2015 (UTC)
  • No, I don't like it. What do we do with constructions like the French "ne pas" ? SemperBlotto (talk) 16:14, 12 July 2015 (UTC)
    • Is it grammatical to say ne pas just like that? If yes, the entry can be at [[ne pas]]. If not, use [[ne ... pas]]. The ellipsis should only be used as a last resort. Keφr 16:24, 12 July 2015 (UTC)
      • In French, ne pas is used without anything between the two words when negating infinitives. Andrew Sheedy (talk) 01:22, 13 July 2015 (UTC)
  • I'm adding now the 4th bullet point of the rationale in my first message above; please see it. Concerning the page name, IMO I was thinking of adding a space in the page name, like this: ( ), « », ¿ ?, etc. Although I still see much merit in the spaceless (), «», ¿?, etc. I don't like ( ... ), particularly the fact that it's more difficult to type; though entries like this certainly would be linkable or redirected from their single-character parts. I am worried about ' and "; spaceless matched-pair entries for these two would be '' and ""; these look too ugly and '' (two apostrophes) looks identical to " (one quotation mark) to me. I think the same with space (' ' and " ") is great. --Daniel Carrero (talk) 21:44, 12 July 2015 (UTC)

In accordance with this proposal, or to test it a little to see if it looks good, I created 18 new entries for most of the variations of quotation marks listed in w:Quotation mark. I chose to link to and from all single-characters rather than using redirects. This improves our coverage since our entries didn't mention all these varieties before. Having separate entries is also an opportunity to explain better how they are used in each language. IMO, just having the entry ” with "Ends a quotation." is worthless if we can write the quotation marks as many ways as “ ”, ” ” or „ ”.

See this link, it is very interesting. It is the previous version of the entry “ with a translation table of 33 languages - just the starting quotation mark in each one, no mention of how to end the quotation, which I find confusing and annoying since you had to go to the other page to see how the quotation mark ends. Sure enough, this other link is also with a translation table under the same circumstances, except with only 30 languages. If you saw the first table and discovered that Hungarian and Romanian apparently start quotations with „ and Swedish starts them with ”, the second table won't help you to know how they end. Apparently this was intentional: since all these three seemingly end with ”, putting this information on the table would not be a "translation". Anyway, I deleted both tables and replaced them with one of my own. ({{quotation marks}})

Also, Dan Polansky (talkcontribs) requested: "I request that pages ( and ) are left as they were at the start of this discussion for at least three days after the start of the discussion." While I did not touch the parentheses specifically yet, I've intentionally done as he said since the discussion started on July 12 and I edited the entries of quotation marks on July 16.

New entries:

Thoughts? Is it just me or do other people think they look good too? Do people think it was a waste of time and that the new entries should all be deleted? (I acknowledge some people here opposed the proposal, others supported it.)

I mostly just copied Wikipedia as I don't speak all those languages, and I used only minimal definitions for each entry. If there's any mistake in the entries or the table feel free to fix it, also expand the entries if you like. If it's alright, I'd like to do the same for ( ), square brackets, ¿ ?, etc. --Daniel Carrero (talk) 08:16, 16 July 2015 (UTC)

  • Oppose proposed merge of ( and ) on the grounds that each is susceptible to unique senses. Specifically, ) typically signifies a "smile" in emoticons, and ( typically signifies a frown. While it is true that it is possible to write emoticons going the other way, this is far less common in practice. I have no opposition to having a separate entry for () or ( ) or ( ... ) for uses unique to that setup. bd2412 T 13:27, 16 July 2015 (UTC)
    • I argued above that being a part of an emoticon should not constitute a sense. Does "(" in "(^_^)" signify a frown? Keφr 14:30, 16 July 2015 (UTC)
Oppose merging, but it wouldn't hurt to have an entry for the combined form, with a single sense line at the left and right symbols' entries referring users to the combined ones for more complete information/more senses. We don't want to remove information, just add to it and organize it better. Chuck Entz (talk) 13:56, 16 July 2015 (UTC)
Those two last comments read like supports, actually. When a paired character has a definition corresponding to standalone usage, then this definition obviously cannot be merged with the counterpart character. Keφr 14:30, 16 July 2015 (UTC)
I agree that BD2412 (talkcontribs) and Chuck Entz (talkcontribs)'s comments actually read like supports, in that both are supporting the proposal of creating entries in the format of ( ). ("I have no opposition to having a separate entry for () or ( ) or ( ... )", "it wouldn't hurt to have an entry for the combined form"). I take the point that BD2412 and Chuck are opposing specifically the possibility of having hard redirects from single-characters to matched-pairs, like redirecting ( to ( ). --Daniel Carrero (talk) 19:41, 16 July 2015 (UTC)
That is correct. I am specifically opposed to "merging ( and ) into a single entry". I do think that we should have an entry for "[]" (if that is possible), because that can be used to indicate the elision of text in a quote. bd2412 T 22:16, 16 July 2015 (UTC)

Poll: Format of the matched-pair entries[edit]

What should be the format for the matched-pair entries? (This says nothing about keeping or deleting the entries for single characters, just what to do with the matched-pair entries.)

As with other polls and votes in the past, if you'd like to, you are allowed to support either one or multiple options, the same holds true for oppose and abstain.

  1. left, space, right: ( ), “ ”, « », ¿ ?, " ", ' ', [ ], { }
  2. left, right: (), “”, «», ¿?, "", '', [], {}
  3. left, space, ellipsis, space, right: ( … ), “ … ”, « … », ¿ … ?, " … ", ' … ', [ … ], { … }
  4. left, ellipsis, right: (…), “…”, «…», ¿…?, "…", '…', […], {…}

Support option 1

  1. Support That's the one I've been using for the matched-pair entries I've been creating. --Daniel Carrero (talk) 10:37, 17 July 2015 (UTC)
  2. Support I think this looks neatest and makes it clear that the punctuation isn't one continuous symbol. Andrew Sheedy (talk) 18:01, 17 July 2015 (UTC)

Oppose option 1

Abstain option 1

Support option 2

Oppose option 2

  1. Oppose As I said before, IMO, (), ¿? and others look great, but "" and '' look ugly and confusing in this format. --Daniel Carrero (talk) 11:28, 17 July 2015 (UTC)
  2. Oppose Same reason as Daniel Carrero. Andrew Sheedy (talk) 02:39, 18 July 2015 (UTC)

Abstain option 2

Support option 3

  1. Support, to clearly indicate that something goes between the paired characters, and to follow entries like I'm ... year(s) old. Keφr 10:52, 17 July 2015 (UTC)
Are parentheses used like that outside of the phrasebook? I'm of the opinion that the phrasebook should be a semi-separate thing, like the rhymes and Wikisaurus are. (It could include translation targets that are SOP as well.) I just don't think the phrasebook should be used as a base for formatting. Andrew Sheedy (talk) 18:01, 17 July 2015 (UTC)

Oppose option 3

  1. Oppose Harder to type. Even if we have redirects from ( ) and () to ( … ), most people would try to type the entry name with ellipsis anyway before figuring out the redirects, since the name with ellipsis would be the actual entry name. Redirects would not be intuitive unless we start adding the {{shortcut}} template to entries. I've created […] as an example of an entry which has the ellipsis as part of the entry name, not as an indication that it is a blank space to be filled. Having [ … ] simultaneously with that entry would require some additional explanation of what is a space to be filled and what is an actual ellipsis. Just like with the English circumfixes, I don't think the ellipsis is necessary to demonstrate that the space between parentheses is a blank to be filled, because: 1) in the case of parentheses and other common English symbols, most readers probably already know how they are positioned in relation to the text anyway; but 2) especially in the case of unknown and FL brackets, the definitions should explain this satisfactorily; even a simple phrase like "Encloses supplemental information." at ( ) is good enough IMO, especially when together with examples, and perhaps usage notes when needed. --Daniel Carrero (talk) 11:28, 17 July 2015 (UTC)
  2. Oppose The ellipses aren't actually part of the punctuation, and while they may indicate that something should go between, they look messy to me. If the user looking up the brackets/parentheses/whatever doesn't know that something goes between the two parts, then they may not know that the ellipses just stand in for something. If they know that text is supposed to go in between the two sides, then the ellipses are redundant. Andrew Sheedy (talk) 18:01, 17 July 2015 (UTC)

Abstain option 3

Support option 4

Oppose option 4

  1. Oppose Same reasons as my opposing vote in the option 3. --Daniel Carrero (talk) 11:28, 17 July 2015 (UTC)
  2. Oppose Same reasons as what I wrote above. Andrew Sheedy (talk) 02:39, 18 July 2015 (UTC)

Abstain option 4

Note that (…) is defined as something else than just the parentheses: "Symbol used to substitute parts of a quotation that are deliberately omitted.". --Daniel Carrero (talk) 10:37, 17 July 2015 (UTC)

FL example sentences[edit]

There seems to be a great deal of inconsistency in the formatting of example sentences under foreign language entries. I've been reformatting them as I come across them, but it's a lot of work, and I'm not sure if there are any accepted formats besides the one given in the guidelines. Speaking of which:

  1. (Definition.)
    Voici un exemple.
    Here is an example.
  1. (Definition.)
    Voici un exemple.
    Here is an example.

Both of the above are considered correct according to WT:ELE, and both are common. Is one preferred over the other, or are both in equal use and equally allowed?

Now, here are some formats of FL examples that I've come across frequently for Spanish sentences (but with often missing punctuation included):

  1. (Definition.)
    Voici un exemple. - Here is an example.
    Voici un exemple. — “Here is an example.”
    Voici un exemple. -- Here is an example.

There are others, but the above seem to be especially widespread (at least in Spanish entries), and at least some are being included in new definitions. Should I just leave them alone, or fix them as I see them? Is it possible to fix something like that with a bot? Andrew Sheedy (talk) 19:36, 13 July 2015 (UTC)

For very short usage examples, it is sometimes better to display them as a single line. You can add the argument inline=1 to {{ux}} or {{usex}} to make it so. — Ungoliant (falai) 19:39, 13 July 2015 (UTC)
As for bolding the term in the translation, you should do so whenever possible. The only exception is that sometimes the differences between the languages will make it impossible to isolate the term in the translation. --WikiTiki89 19:55, 13 July 2015 (UTC)
Unless it is debated, I think it should be noted at WT:ELE#Example_sentences that the translation of the term should be in bold as well, since it isn't clear due to lack of consistency. Andrew Sheedy (talk) 22:40, 13 July 2015 (UTC)
I've updated WT:ELE and WT:USEX. Did I miss anything (does anything still need to be updated)? - -sche (discuss) 01:25, 14 July 2015 (UTC)
The example translations and transcriptions further down the page at WT:USEX don't show that the translation/transcription of the word is to be in bold as well as the term itself, nor is that mentioned at WT:ELE. I would add it for clarity's sake, so new users like me know to do it, as trivial as it may be.... Andrew Sheedy (talk) 02:19, 14 July 2015 (UTC)
@-sche I missed this before, but the example "For non-English words in non-Latin alphabets" at WT:USEX specifies that there are to be no italics or words in bold in the translation. Andrew Sheedy (talk) 01:02, 15 July 2015 (UTC)
OK, I've updated both of those sections. Please let me know if I've missed anything else that needs to be done. :) - -sche (discuss) 16:13, 16 July 2015 (UTC)

Persistent extensions of votes[edit]

I consider these numerous persistent extensions (in summa: 4 with a fifth attempt thwarted; I find the præsence of the adjective fair in this fifth attempt maladroit) of a single vote truly inappropriate or at least disconcerting. I would like to clarify that currently this is not a critical remark regarding the vote’s closing or outcome; instead I would like to discountenance said adjustment ad libitum of the expiration date of that already protracted vote with the aim to impede an outcome that at the time of the second extension (beginning of April) was an evident lack of consensus (7-6). Actually this had been mine initial motivation for participating in the vote: the desire to contribute with one more vote to the manifestness of the rejection and hopefully præcipitate the closure of that vote.
To me, there is no reasonable justification for extending any vote more than one month (to put it simply or to appeal to æsthetics: those numerous struck extensions encumber the mere lecture of the vote’s content), or at most one and a half months, but I would be interested to heed to others’ suggestions (if any arise) for a temporal limitation in that sense in order to præclude future unconfined extensions. The uſer hight Bogorm converſation 20:01, 13 July 2015 (UTC)

I would like to know what are the reasons for extending a vote. Should we really wait for the people who voted later?
Concerning the Sanskrit vote, I've made a chart of the extensions and what would be the results if the vote, which ended 6 July 2015, had ended on each of the previous scheduled dates:
  • (5-5-1) 5 March 2015
  • (6-5-1) 5 April 2015
  • (9-6-1) 5 May 2015
  • (11-6-1) 5 June 2015
  • (11-6-1) 5 July 2015
  • (12-6-1) 6 July 2015
--Daniel Carrero (talk) 20:27, 13 July 2015 (UTC)
If, instead of respecting the deadline, we repeatedly move it ahead until such time as we happen to have a sufficient number of voters to call it a consensus (which 12–6 isn't really, but let's ignore that arguendo), then we're favoring the view of such latecomers as happen to come across the vote first, a selection bias. If we believe that a longer time is necessary or desired, then (0) that longer time should be set when first proposing the vote and not extended. And if that realization comes post facto, then, ideally, (1) call it no consensus, discuss and advertise the issue better in the BP and perhaps elsewhere, and start a new and better vote, if desired. Or, at least, (2) we should have a limit of one extension on a vote. Or, at the very least, (3) we should extend a vote as long again after consensus is achieved as we did before it was (and as long again after it's achieved in the opposite direction, if that happens). Any of those would seem much fairer than the method employed at the particular vote that led to this discussion.—msh210 (talk) 20:55, 14 July 2015 (UTC)
Ever heard of an Allen charge? --WikiTiki89 21:10, 14 July 2015 (UTC)
I agree with Msh210 that the practice of extending votes until victory (or defeat) is achieved is an unfair procedure. I don't think it matters on which side the extender votes or whether the extender abstains, but it is particularly suspect when the outcome is the same as the extender's vote. It is at best a lazy procedure and at worst a corrupt one.
The only remedy is to void the vote. Obviously it can be reproposed and revoted, possibly after recrafting the proposal. DCDuring TALK 23:09, 14 July 2015 (UTC)
I agree with your assessment of particular suspectness and your proposed remedy. Voiding often provides relief.  :-) ​—msh210 (talk) 06:26, 15 July 2015 (UTC)
Everyone had four full months in which to question, complain, or lodge a protest, but everyone was silent during all of that time. Voiding the vote now, after acquiescing to the multiple extensions by maintaining silence, would be unfair to those on the winning side of the decision. The best thing to do at this point would be to leave this vote as it stands, and to develop a policy that will address the extensions issue in future votes. However, if a lot of people are hell-bent on overturning the decision, then we should put it to an official vote on whether to void the decision and redo the original vote (emphasis on official vote to void the decision). —Stephen (Talk) 12:41, 15 July 2015 (UTC)
It seems to be a fairly common practice for votes (and discussions) here to drag out seemingly without end. This one was no exception. Furthermore, discussions are not mere exercises in bean counting. Five out of six editors expressing opposition to the proposal provided no substantive argument on the matter. The sixth provided a factual error as their premise. All of that can reasonably have weighed into the outcome. Are we now going to reopen every discussion that was closed after a series of extensions? bd2412 T 13:40, 15 July 2015 (UTC)
This is about a corrupting procedural matter, not substance. For those whose ox is gored as a result of the abuse of voting procedure, the option of not accepting the extensions was open.
"Are we now going to reopen every discussion that was closed after a series of extensions?"
No. If we do it once and adhere to a policy of no unilateral extensions, we will never have to void a vote again. If the extension process had not been abused by repeated extensions only to result in a bare victory for the view supported by the person extending, this would not have come up. DCDuring TALK 14:35, 15 July 2015 (UTC)
If you thought it was a corrupting procedural matter, why didn’t you say something about it during those four months? There have been quite a few votes where the end date was extended, often more than once. Why didn’t you say something during all of those times? Any one of you could have close the vote and made the decision at the end of each of those extensions, but none of you did. Why not? So why now, all of a sudden, has it become a "corrupting procedural matter"? Whether it was a good idea or a bad idea, you, like all the rest, went right along with it until somebody didn’t like a decision, so now you want to throw around accusations of corruption. That’s ridiculous; you had ample time and opportunity to speak up and say that you are against it. Instead of bashing someone who was just trying to do what he thought was right, while you kept silent and looked the other way, just propose that we have a vote to void the decision.
And whether you like it or not, it creates a precedent, and anybody in the future who does not like an outcome can claim malfeasance of some sort and demand the vote be thrown out. Either we accept that it’s okay to void a vote someone does not like, or we don’t do it. —Stephen (Talk) 14:55, 15 July 2015 (UTC)
You're right: I should have spoken up after each extension. I saw them and ignored them. Maybe it's w:Kitty Genovese syndrome or a simple desire to avoid confrontation. (To answer your "Any one of you could have close[d] the vote and made the decision at the end of each of those extensions, but none of you did. Why not?", though — I fully intended to after two of the later ones, but they were re-extended before I had a chance.) But closure on the first opportunity, on a slim margin, by someone who voted like the closure? I needed to say something. Note, though, that I don't mind the substance of the decision at all: I looked only a little into the Sanskrit issue, but think the proposal makes sense. Nonetheless, the procedure followed stank.​—msh210 (talk) 22:15, 15 July 2015 (UTC)
Re "intended to after two of the later ones, but they were re-extended before I had a chance", consider e.g. Wiktionary:Votes/2015-03/Templatizing topical categories in the mainspace, which was repeatedly extended just before its deadline. Obviously, no one can close it at that time (last-minute voters may yet come). (Pinging SGB.)—msh210 (talk) 20:08, 22 July 2015 (UTC)
I completely agree with Stephen here. --WikiTiki89 14:58, 15 July 2015 (UTC)
As do I. You can't void a vote because you don't like the use of the established process. If you want to overturn the decision, start a vote for that. If you want to change or clarify the rules going forward, we can discuss that.--Prosfilaes (talk) 21:03, 15 July 2015 (UTC)
I didn't get involved in the vote because I didn't have an opinion and wasn't watching the page. I've missed lots of votes. We haven't had anything quite as egregious as this lately. Were the procedural process not such a bad precedent I wouldn't have cared. Sorry that your ox is gored as a result of the practice of other supporters of the proposal.
I've got another idea. Why don't we have another extension? DCDuring TALK 15:32, 15 July 2015 (UTC)
Better yet, extend again as long as it's been extended hitherto, as I suggested above.​—msh210 (talk) 22:15, 15 July 2015 (UTC)
The vote is ended and decided. What you are suggesting is voiding the decision (without a vote to do so) and opening the vote again so that you can beat the bushes to scare up enough votes to win the opposite decision. It is the same thing as overturning the vote and having a redo. Why not just save everybody the trouble and declare the decision reversed (failed)?
If you want to void the decision (which is unfair to the majority who supported and won already), you need to hold an official vote for the purpose of voiding the decision of the Sanskrit vote and doing the vote over again (which will set a precedent for having do-overs whenever anybody does not like the outcome of a vote). —Stephen (Talk) 23:16, 15 July 2015 (UTC)
By the way, DCDuring, there are several votes on WT:Votes that are ready for closure and decision right now. Since you think we’re egregiously corrupt and bereft of ethics, why don’t you nip over there and close the votes yourself? Or would you prefer that we continue to do it so that it’s more convenient for you to say we’re corrupt? —Stephen (Talk) 23:28, 15 July 2015 (UTC)
I don't think I ever said that individuals were corrupt, only that the process was. In any event, that is what I intended and I stand by that. I'd favor other people closing votes rather than me as I can't figure out how the archiving is supposed to go, but I closed a few votes that had run their appointed term.
Judging by the low participation, I wonder why we give any force at all to the outcome of some votes. If we can't muster a quorum (6, 7, 8, 10?; counting abstainers?; differing for various classes of votes (bot status, admin votes, substantive?)), then there should be no mandatory policy resulting from the vote. Votes probably need to be more publicized. The subpage structure interferes with achieving comprehensive coverage of votes. Would Editor news be good for that or BP? Do we need a tickler system (a single page?) of some kind to remind folks when a vote starts, when it is about to end, when and how it was decided?
Perhaps BP polls would provide guidance without something becoming mandatory. DCDuring TALK 14:01, 18 July 2015 (UTC)
Your accusing me of making my suggestion "so that [I] can beat the bushes to scare up enough votes to win the opposite decision" is inappropriate and insulting. First of all, I mentioned above that I mind the procedure followed not the decision itself. Second, even if I disagreed with the decision substantively, that'd be a groundless accusation. You're right that the vote has been called. Arguably, it's been called inappropriately. Can't people contest the closure on the vote page and see if consensus builds there to let it stand closed or not, without holding a new vote on the issue, and with the burden on those who wish to reopen it (viz so that, if no consensus builds at all, the vote stays closed)? In my opinion yes.​—msh210 (talk) 05:03, 16 July 2015 (UTC)
@msh210: I don't think the initially set end date of the vote is a deadline, or that our procedure forbids extending a vote. That would be another procedure, not the one that we have. In fact, we do not have a specified procedure as for the meaning of the end date of the vote, merely the common practice. And the common practice is to allow extensions of a vote, as was done e.g. in Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2; if anyone is interested, I can collect all votes that were ever extended. My extending this particular vote was driven by the same tentative unspoken principles I was using in previous votes that I have extended. You participated in the extension of Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3; have you changed your mind, meanwhile? --Dan Polansky (talk) 08:26, 19 July 2015 (UTC)
You are misrepresenting my participation on the 2014 vote, Dan, no doubt due to unawareness rather than malice. As the talkpage there shows, my extension was only after there was a clear consensus and because the consensus had been newly reached during a previous extension. That is exactly in the spirit of my comments here (if perhaps the details vary slightly).​—msh210 (talk) 04:04, 20 July 2015 (UTC)
@msh210: Oh, I see, sorry for that. As an aside, I do realize the danger of selection bias, and do see where you are coming from in principle even though I happen to think the concern with selection bias is excessive, and that the real risk is much lower than it appears. --Dan Polansky (talk) 22:28, 20 July 2015 (UTC)
  • Repeated extensions of votes are a great thing. People are slow to come to votes. There is nothing unethical about extending votes, nothing that I was able to discover when I investigated the question. There is very little opportunity for the extender to use selection bias. You cannot on any vote just wait until 2/3 supermajority happens; most votes do not create that opportunity. In any vote in which waiting until 2/3 worked, there was in fact a lot more support than opposition. --Dan Polansky (talk) 08:03, 19 July 2015 (UTC)
  • Let me echo multiple editors above: Extending votes is a common practice. This practice can be objected to, but it is not like Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit is the first or second vote to use that practice. During the multiple extensions of the vote, there was ample opportunity for those who deem the practice "corrupt" to speak their mind on this issue, on the vote talk page, in Beer parlour or elsewhere. Obviously, I do not consider the practice "corrupt". Nonetheless, by my principles, Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit should be extended by one month; it should not be voided. --Dan Polansky (talk) 08:42, 19 July 2015 (UTC)
  • FYI: Wiktionary:Votes/pl-2015-07/Disallowing extending of votes. --Dan Polansky (talk) 08:58, 19 July 2015 (UTC)
    It is a simple question of allowing rabid parliamentarians to dominate the voting process by effectively steering marginal proposals to victory or defeat by the simple rule of: Terminate the vote at the stated termination date if the outcome is what one wants; extend the vote if it is not. It was and is how many legislative bodies are run, but we don't have to let it happen that way. DCDuring TALK 22:31, 19 July 2015 (UTC)
    • And how do you want to accomplish that? —Keφr 00:55, 20 July 2015 (UTC)
      No extension of votes, to prevent the closer/extender from choosing the right time to close it. Minimum number of voters for a vote to be effective, to prevent votes that were not well publicized from being enacted. I'm open to other suggestions. Any proposed vote needs a seconder. DCDuring TALK 01:17, 20 July 2015 (UTC)

All we should do is to show active votes more actively. For example, putting them in the watchlist page below Wanted Entries will dramatically increase the awareness of new votes. This has been suggested by YairRand ••Dixtosa (talk) 10:36, 19 July 2015 (UTC)

  • In general, oppose extension of votes. Discussions need not drag on indefinitely. Votes are open for a month as it is; easily long enough for people who edit here with any kind of activity at all to notice them. Same with RfDs: after a month, they should be closed and archived, even if there isn't a clear consensus, with the "no consensus" outcome defaulting to keep. Purplebackpack89 19:31, 22 July 2015 (UTC)

Deletion of good faith edits with no explanation[edit]

I have been a very sporadic contributor to Wiktionary for a number of years. Sometimes I have little bursts of activity, and then sometimes long gaps of inactivity. One of the things that repeatedly drives me away just when I might be getting enthusiastic about joining the project is the unexplained deletion of added content, such as happened here. This comes across as extremely rude and hostile. I understand that a lot of vandalism and nonsense has to be reverted, and I understand that mistakes are sometimes made. However, this has happened to me too often, and mostly (as far as I recall) from certain editors, for it to always be a mistake. I think instead it is a cultural problem here amongst certain members that the community would do well to address. 20:56, 13 July 2015 (UTC)

One thing that would help would be for you to become a registered user. That is what makes it possible to communicate and helps us take contributions more seriously. It also helps if the name is not too frivolous, though that is not a requirement.
I see that one can find attestation for pair of marigolds so your contribution would be a good one. DCDuring TALK 22:24, 13 July 2015 (UTC)
I agree, this community has a problem with biting newbies. WurdSnatcher (talk) 00:10, 14 July 2015 (UTC)
Do we not have a notice that unsourced material may be challenged or removed? That might be a good start. It's hard to see why the patrollers like SB should be expected to do the work of verifying (or formally RFVing) every random unverified sense that gets added. -- Visviva (talk) 01:57, 14 July 2015 (UTC)
I see no notice on the frame of the edit window that suggests anything of the kind, only the license links.
It seems that we really would like Wiktionary to be less wiki-like for anonymous users, imposing some kind of limits on their changes. Isn't that like what WP has, with some changes from some users being held in suspense until reviewed? DCDuring TALK 02:31, 14 July 2015 (UTC)
The rubber gloves are Marigolds, not marigolds. SemperBlotto (talk) 08:04, 14 July 2015 (UTC)
According to what DCDuring mentioned above, pair of marigolds seems to be attested (both capitalized and uncapitalized). Andrew Sheedy (talk) 15:08, 14 July 2015 (UTC)
  • I'm with WurdSnatcher on this one. There seem to be a number of "experienced" editors on this page who never bother to explain their reverts of good-faith edits, especially to new editors, and get uptight when asked to. And we wonder why we're bad at attracting new editors... Purplebackpack89 17:50, 14 July 2015 (UTC)
Ungoliant and other admins have explained some of my errors to me, and I didn't make those mistakes again. I do find it very helpful when I'm told what I did wrong, since I usually do it out of ignorance. I would likely have been discouraged from editing, or would have repeated the same mistakes had my edits been undone with no explanation. Andrew Sheedy (talk) 18:54, 14 July 2015 (UTC)
I think an explanation should be given. The revert tool shouldn't be used when the editor can be reasonably expected to take heart. —CodeCat 19:59, 14 July 2015 (UTC)
Yep, the auto-revert tool should really only be used for obvious bad-faith edits. If they're making a meaningful attempt, they deserve a real message explaining what's wrong. WurdSnatcher (talk) 00:27, 15 July 2015 (UTC)
+1. Revert should only be used for vandalism. Speed, schmeed. Purplebackpack89 01:01, 15 July 2015 (UTC)
Not necessarily only for vandalism, but for any edit where it is judged that an explanation will not have a significant effect on the editor. So it would also include editors who persistently make mistakes and bad edits and won't change their ways. —CodeCat 15:17, 15 July 2015 (UTC)
If a user is unregistered then it is very difficult to have meaningful communication. DCDuring TALK 22:25, 15 July 2015 (UTC)
The edit history isn't just for the benefit of the user being reverted; it will be seen by any other editor happening across the page. That's reason enough to make it helpful. Keith the Koala (talk) 06:17, 16 July 2015 (UTC)

Italicizing the entry name of taxonomic names[edit]

I am just announcing an edit I made, since I was thinking about it for a while and decided to just do it today without discussing beforehand.

I made {{taxoninfl}} italicise the entry title of all entries for taxonomic names that use this template, so that:

--Daniel Carrero (talk) 18:14, 17 July 2015 (UTC)

Not good, because, unlike genera, families should not be italicised. Equinox 18:16, 17 July 2015 (UTC)
Sorry about that. Based on your comment, I've changed the template further to italicize the entry name only when i=1, just like the headword line. That way, Homo is italicized while Hominidae isn't. --Daniel Carrero (talk) 18:29, 17 July 2015 (UTC)
And that conflicts with the German use of Homo. Maybe I should just undo the change and leave all the affected entries without italics like they were? That said, the italicized name looks good on Homo sapiens, Acer rubrum, etc. and all the species names, though. --Daniel Carrero (talk) 18:32, 17 July 2015 (UTC)
There's no good reason to have a pl parameter. All taxa are proper nouns. At rank of genus or lower they have the form of a singular Latin noun. At ranks higher than genus they have the form of a plural Latin noun. That is more or less part of the prescribed "grammar" of such names. Plural forms of generic and subgeneric rank taxa are not, strictly speaking part of the taxonomic name system. One could consider them to be borrowings into whatever language they are embedded. It would be interesting to see whether they appeared in New Latin genus and species descriptions, but arguably they would then be Latin. DCDuring TALK 18:46, 17 July 2015 (UTC)
Other cases besides Homo#Translingual/Homo#German include all the entries for genera that are named after historical and mythological figures for which we now have or may have an entry. DCDuring TALK 19:24, 17 July 2015 (UTC)
There are at least 179 existing entries for which English capitalized forms correspond to Translingual genus names. DCDuring TALK 19:36, 17 July 2015 (UTC)
How many taxonomic names at rank of genus or lower did not have i=1? DCDuring TALK 18:46, 17 July 2015 (UTC)
I reverted my edits to {{taxoninfl}} concerning italicization of entry names; now Homo sapiens and the like don't have the entry name italicized any more.
Concerning pl=, I used it with exactly 2 names: Homo sapiens=Homines sapientes and Pithecanthropus erectus=Pithecanthropi erecti. At least Homines sapientes is cited in English and Portuguese through Citations:Homo sapiens. DCDuring (talkcontribs), about your comment, particularly "Plural forms of generic and subgeneric rank taxa are not, strictly speaking part of the taxonomic name system. One could consider them to be borrowings into whatever language they are embedded." In the past, before I started editing Homo sapiens and Homines sapientes for a number of different reasons, there were English sections, an (odd) translation table and pronunciations; I moved all the applicable information into Translingual. Personally, I'd rather keep them that way, even if other entries for declensions of homo+sapiens are attestable (Hominis sapientis? Homini sapienti?), especially if those are found in running text in multiple languages. But it would be understandable if you and/or other people wanted to use different language sections for those like we do for CJK languages. You said the plurals are not strictly part of the system, for this reason I apologize since the current format with pl= makes it seem like the plurals really are part of the system. I propose keeping the plurals Translingual, at least until further discussion, while linking from the singular forms as Derived terms or the like, if you'd agree with that. --Daniel Carrero (talk) 20:18, 17 July 2015 (UTC)
@Daniel Carrero: Why bother for two instances? I would have thought that {{mul-proper noun}} (which is not deprecated, just not my preference for taxonomic names) was perfect for that. Furthermore, it is difficult for me to accept that plural and genitive forms are taxonomic names. The citations indicate that the terms are being used as plurals for members of the group Homo sapiens, not as plurals of the group. Every taxonomic name is of a group, not of its members. One great advantage of limiting the use of {{taxoninfl}} to taxonomic names is that it can be used to identify taxonomic entries that are lemmas. Remember that the heterogeneity of Translingual makes the idea of a single class of Translingual lemmas useless for most practical purposes. DCDuring TALK 22:07, 17 July 2015 (UTC)
@DCDuring: You are most involved with entries for taxonomic names and I edit them only occasionally. I have the feeling I'm probably going to fold and revert quick if you say I've done something wrong with the templates or the entries. Still, there's a point I would like to discuss. About: "Every taxonomic name is of a group, not of its members." as well as "Plural forms of generic and subgeneric rank taxa are not, strictly speaking part of the taxonomic name system. One could consider them to be borrowings into whatever language they are embedded." Wiktionary is a descriptive dictionary. Even if taxonomic names are intended to be used as proper nouns representing entire groups, while this should be respected and informed in the entries, I'd argue that their separate usage as nouns is nothing special. Just like you can say: "I've found a member of Vulpes vulpes!", you could say "I've found a Vulpes vulpes!" and find plenty of citations of "noun" versions of taxonomic names like this in multiple languages. IMO, cited uses like this don't constitute a reason for having separate sections other than Translingual for any languages, let alone a great number of language sections just for cited noun senses for a given entry as they are found, especially if any plurals attested use the rules of Latin grammar in multiple languages. I'm not sure if we could have Translingual noun senses along with proper noun senses, or maybe not? My point is just that it does not seem to merit separate language sections just for this. What do you think? --Daniel Carrero (talk) 01:17, 19 July 2015 (UTC)
I don't think that the way people use them is the same in every language and I have no idea how to get that information. I'm not even going to do it in English. What authoritative resource would we use for that? I can't imagine doing the attestation. I'm not going to beat my brains out to incorporate relatively subtle variations which most users won't even notice. Our dictionary is rife with omission of much less subtle information in areas that are known to cause English language learners problems: ambiguous, erroneous, and misleading use of determiners in our definitions and failure to provide basic grammatical information ((un)countability, (in)transitivity, complements) come to mind.
In any event we would have to document the usage of taxonomic names in the communities that use them most. A very small share of taxonomic names even have vernacular-language homonyms that correspond to the taxa and we have entries for some of those, especially in horticulture, eg. azalea, andromeda, rhododendron. DCDuring TALK 02:02, 19 July 2015 (UTC)
Daniel is right, though: while the authorities may prescribe that the names be used only for "the group X", many of them are well attested in multiple languages as terms for "a member of the group X", which can be used with the indefinite article and in the plural (see e.g. Citations:Homo sapiens, and google books:"un Homo sapiens"). - -sche (discuss) 03:37, 19 July 2015 (UTC)
I would think it more useful to have a note on how folks borrow taxonomic terms into each language in general than to lexicalize a million or even a hundred instances of such borrowing.
All someone has to do is attest the pattern of usage (capitalization, pluralization, and other inflections in some languages) for each language in which the Translingual term is borrowed and used. I don't see any way around it. Today I looked at plurals of Virus. In some Germanic languages the plural is Virusen. I don't think that belongs in Translingual as it reflects a pattern specific to at most a group of languages.
I certainly won't protest if someone chooses to do all of that, but I am more interested in having Translingual entries for purposes of disambiguating vernacular names; helping folks read scientific literature by providing etymology, pictures, and translations; and even providing gender to help folks with naming species. DCDuring TALK 04:03, 19 July 2015 (UTC)
A language-specific plural is evidence that the Latin/Translingual term has been borrowed into another language. (Jumi Vogler, Was der Humor für Sie tun kann, wenn in Ihrem Leben mal ..., 2014, page 20, has this example of Eindeutschung: Zumal damals das Warenangebot an Homo sapiensen noch relativ klein war.) If a Latinate plural, however, is used in as many languages as a Latinate singular, I don't see how only one of them could be excluded from the Translingual section short of saying "we copy what the authorities prescribe on this matter", which doesn't comport with descriptivism. Here's one way such information could be presented (note not only my added sense and usage note, but the plural which is already provided). If one wanted to weigh the scales a bit in favour of prescriptivism, one could even confine both things to the usage note, i.e. not add a second sense-line nor a plural to the headword-line, but mention both in the usage note.
I suppose if the 2 or 3 entries which currently have plurals are the only ones that pluralize and/or are used with the indefinite article to refer to members of a group / species / etc, and they only do so in 5 or 6 languages, one could argue it's easier to add 18 different language sections than to expand 3 Translingual sections... but if more entries than that pluralize, it becomes untenable, IMO, to require a myriad of different language sections rather than expand the Translingual section. - -sche (discuss) 07:43, 19 July 2015 (UTC)
Is that how we handle borrowing from English? DCDuring TALK 12:44, 19 July 2015 (UTC)
It isn't the way we handle borrowings into English, which we show as English whether or not there is any alteration in the term, eg, sang-froid. Wouldn't we need to include multiple pronunciations in a Translingual entry? DCDuring TALK 12:50, 19 July 2015 (UTC)

Why even keep taxonomic names here anyway? I thought species: is for that. Keφr 18:56, 17 July 2015 (UTC)

I won't invoke our slogan. Wikispecies generally does not bother with obsolete taxa or with the gender and etymology of any taxa. (Few other taxonomic databases bother with gender and etymology either.) They also do not always have entries that correspond to well-attested vernacular names including those we already have, which is the purpose of the lists at User:DCDuring/MissingTaxa. Wikipedia doesn't bother with gender and is very uneven about covering etymology and obsolete taxa.
That we don't provide pronunciations or translations of taxa is a result of our decisions, not of whether such would be useful to users. Our decision about pronunciations is apparently based on the perceived need to reflect how native speakers of various languages actually pronounce the taxon, not how it ought to be pronounced, though that is what users seem to want. Our decision not to have translations seems as much to be that a vernacular name could be viewed as a monolingual synonym, as a translation, or as a term identifying members of the group named by the taxon, so we didn't want to depart from the gem-like precision of our conceptual model of language to include them. DCDuring TALK 19:14, 17 July 2015 (UTC)
@DCDuring: Going the other way around, what is so special and different about Wikispecies, then? Would you say that Wikispecies can be totally replaced by Wiktionary's coverage of species? --Daniel Carrero (talk) 01:17, 19 July 2015 (UTC)
@Daniel Carrero: They have some big offsetting advantages relative to us, but few relative to outside databases.
  1. They have vernacular names in multiple languages in many species and genus entries. We have decided to exclude non-English names on the taxon page, relying on the English vernacular name, which may not exist, e.g., for species that don't occur in English-speaking lands, especially plants.
  2. They pay more attention to the authorities behind each name. We don't, which on a small number of occasions has led to some confusion.
  3. They have about 20 or more times as many taxon entries as we do.
  4. Their average page is better linked to external sources. But for some reason they don't link to WP or Commons very much. Our best entries are better linked to outside sources than theirs (useful for determining gender, checking consensus on circumscription and placement).
One other disadvantage they have is that they don't do much (translations?) that other databases don't do and most other databases do something they don't. DCDuring TALK 01:40, 19 July 2015 (UTC)
Some comments about the utility and challenges of Wikispecies:
  1. For a long time, a single sysop ran the entire operation his way, 24/7, overruling anyone else making edits there. A lot of animosity developed between this sysop, other sysops, and some other wiki projects. That user has since been banned, but this also means that style and content are a bit unstable as the community finds its footing again.
  2. Wikispecies goes in heavy for sourcing the publication, description, revision, and circumscription of taxa. This often has no bearing on the use of the word, but is of vital importance to researchers.
  3. Wikispecies has a highly navigable taxonomic tree built into every entry, such that taxonomic changes can be easily implemented without having to re-edit every affected entry.
  4. Commons links to Wikispecies whenever there is an entry to match a Commons category. Some Wikipedias (such as fr) also build in a link to Wikispecies from their taxoboxes. This isn't universal, though, in either direction, in part because the classification systems in use at different Wikipedias do not always match.
  5. Further, since Wikidata now controls interwiki links between the Wikipedias, the link situation has deteriorated. The editing of interwiki links between botanical taxa, for example, is under the control and supervision of User:Brya, who has been banned here, at the English Wikipedia, and at the Dutch Wiktionary and Wikipedia, for contentious edits, sockpuppetry, and a number of other problems. Her idiosyncratic ideas have led to a fragmentation of data items on Wikidata so that identical circumscriptions of taxa given different names, attributions, or rank on different Wikipedias are no longer interlinked. And links will only exist if everything about the taxa matches exactly (and even then I've come across baffling counterexamples).
So, we're a long way from useful interlinking between taxon entries on different projects. It is therefore difficult to avoid or streamline any duplication of content or redundancy of data. --EncycloPetey (talk) 21:11, 24 July 2015 (UTC)
Please don't italicize the headword lines of taxonomic names. Acer rubrum should not show the headword line in italics. Note they are translingual, and you have not shown that they are universally used in italics in multiple languages. Please undo your changes while the discussion is pending. --Dan Polansky (talk)

cs-noun and animacy

Can someone please undo the recent edits of {{cs-noun}} to provide for pseudo-genders m-an and m-in. They are intended to mark "an" for "animate" and "in" for inanimate. Animacy is not gender and should not be marked as part of a gender. Thanks. --Dan Polansky (talk) 07:19, 19 July 2015 (UTC)

See my other comment ... I think rather than asking for undoing this change, if you really object to the general concept of having "gender" include "pseudo-genders" then you should (a) propose an alternative, (b) open a more general discussion about how to handle this. As I mentioned, this is far from the only place that "gender" has been co-opted to include other gender-like properties. Benwing (talk) 08:06, 19 July 2015 (UTC)
I have not seen pseudo-genders in Czech templates. I do not watch the template situation outside of Czech closely. Which other comment should I see and where? As for an alternative, that is obvious: create an animacy parameter. --Dan Polansky (talk) 08:12, 19 July 2015 (UTC)
We have added an animate and inanimate parameter to our masculine template on the French wiktionary. It is most useful to distinguish nouns, compare French entry kohoutek with local entry kohoutek. --Diligent (talk) 08:36, 19 July 2015 (UTC)
I agree with creating the animacy parameter, among others it could also enable adding the entries into special animacy categories. However, I strongly oppose removing the "pseudo-genders" (as Dan calls it) before such a parameter is added. Jan Kameníček (talk) 18:30, 19 July 2015 (UTC)
@Dan Polansky I was referring to my comment on WT:GP, where you've also responded. Benwing (talk) 09:36, 20 July 2015 (UTC)

Normalization of entries 2

Wiktionary:Votes/pl-2015-05/Normalization of entries failed. See also at the end of the vote my comments about the result of the vote, which I'm cool with, since the affected policy is still imperfect. The vote proposed having Wiktionary:Normalization of entries (WT:NORM) as an official policy alongside WT:CFI and WT:ELE. WT:NORM deals with aspects of formatting that are invisible to the user but are expected to be standardized nonetheless, such as whitespaces, spaces between == ==, the placement of interwikis at the end of the page and the placement of categories at the end of the language section.

The list of items currently in the policy was developed from this extensive 2006 thread, which, with the major role of User:AutoFormat (2007–2010), shaped the wiki code of our entries as we know it to this date. I proposed making the list official through this discussion from May 2015, with 13 polls. Controversial, outdated or undiscussed items were removed from the list and moved to here. Continuing from where the previous discussion left off, I thought of 2 more polls to address issues that were raised in the vote. I feel it's a good idea to keep asking questions until the policy is just right. --Daniel Carrero (talk)

Poll 14

Having WT:NORM only with rules that affect the wiki code of the entry and are invisible to the readers.
Currently, most rules listed in WT:NORM are invisible (such as whitespace, line breaks, spaces between == ==, spaces after * and interwikis at the end of the list), so it does not matter whether editors follow the rules or not: the page would look the same to readers. If there are any rules that affect the layout of the pages, they should be kept in WT:ELE, not WT:NORM. Use the comments of this poll to discuss exactly which rules can be affected by this poll.

I believe the rules that exist in the current version of WT:NORM and can be removed for affecting the layout of the entries are, specifically:

  • Language names should not be linked
  • Translation sections: Markup such as gender should be provided within the {{t}}/{{t+}} template, except for qualifiers, which should use {{qualifier}}
  • ---- before each language heading except the first

--Daniel Carrero (talk) 08:04, 19 July 2015 (UTC)

  • Hi Daniel. I'm unclear as to what you mean exactly by "invisible to the reader". Can you spell out which rules aren't invisible? As I mentioned, I had two objections. One concerns the insistence that categories need to be put at the end of the language section instead of at the end of an etymology subsection; I assume this is "invisible to the reader"? The other is about only one headword line per section, which simply doesn't work well for some Arabic entries. I assume this is "visible to the reader"? Benwing (talk) 08:14, 19 July 2015 (UTC)
    Hi Benwing. After you sent this message, since no one besides myself had voted for this poll yet, I've changed the whole text of the poll; maybe it does look clearer now?
    After you gave your reasons for opposing both rules of "only one headword line per section" and "categories need to be put at the end of the language section", I simply removed them from WT:NORM and added them to Wiktionary_talk:Normalization_of_entries#Removed_items until further discussion. But, since following these rules does affect how the entry looks to readers, I'd say these are "visible" rules and thus I don't think they should be applicable in WT:NORM anyway. --Daniel Carrero (talk) 08:35, 19 July 2015 (UTC)
    I'm striking this poll. I edited WT:NORM so that all rules of this policy concern whitespace, blank lines, etc. and removed everything else that changes the layout of the entry, thus is "visible" to the reader of the entry. I don't think there's any reason to leave any rules at WT:NORM if they can be placed in WT:ELE instead. --Daniel Carrero (talk) 17:48, 19 July 2015 (UTC)

Poll 15

WT:NORM should be mandatory for bots only.


  1. Support --Daniel Carrero (talk) 08:04, 19 July 2015 (UTC)
  2. Support DCDuring TALK 20:54, 19 July 2015 (UTC)
  3. Support With the two issues I object to removed, I have no problem supporting this and I already try to follow rules of this sort in any case in my bot changes. Benwing (talk) 09:42, 20 July 2015 (UTC)




  • I assume that we are not compelling every bot to implement all aspects of WT:NORM, but only to make its changes in conformity with it, so that the immediate neighborhood of each of its changes conforms. DCDuring TALK 20:57, 19 July 2015 (UTC)
    That is correct; what you said is what I had in mind too. --Daniel Carrero (talk) 21:06, 19 July 2015 (UTC)

Poll 16

Between an image and content that follows, should there be a blank line or not?

Examples with blank line:

[[File:Example 1.jpg|thumb|250px|upright|Description.]]

===Alternative forms===
[[File:Example 1.jpg|thumb|250px|upright|Description.]]

* form1
* form2

[[File:Example 1.jpg|thumb|250px|upright|Description.]]

{{term|example|lang=en}} + {{term|example|lang=en}}

[[File:Example 1.jpg|thumb|250px|upright|Description.]]

* {{a|foo}} {{IPA|/example/|lang=en}}
* {{audio|example.ogg|Audio (US)|lang=en}}

[[File:Example 1.jpg|thumb|250px|upright|Description.]]


[[File:Example 1.jpg|thumb|250px|upright|Description.]]

* synonym1

====Usage notes====
[[File:Example 1.jpg|thumb|250px|upright|Description.]]

In all examples, this example is exemplified by a process of exemplification.

===See also===
[[File:Example 1.jpg|thumb|250px|upright|Description.]]

* something

Examples without blank line:

[[File:Example 1.jpg|thumb|250px|upright|Description.]]
===Alternative forms===
[[File:Example 1.jpg|thumb|250px|upright|Description.]]
* form1
* form2

[[File:Example 1.jpg|thumb|250px|upright|Description.]]
{{term|example|lang=en}} + {{term|example|lang=en}}

[[File:Example 1.jpg|thumb|250px|upright|Description.]]
* {{a|foo}} {{IPA|/example/|lang=en}}
* {{audio|example.ogg|Audio (US)|lang=en}}

[[File:Example 1.jpg|thumb|250px|upright|Description.]]

[[File:Example 1.jpg|thumb|250px|upright|Description.]]
* synonym1

====Usage notes====
[[File:Example 1.jpg|thumb|250px|upright|Description.]]
In all examples, this example is exemplified by a process of exemplification.

===See also===
[[File:Example 1.jpg|thumb|250px|upright|Description.]]
* something

Poll 16 - Comments

Rather than having support/oppose/abstain options, I would like to discuss what looks better in each case.

Personally, my opinions are:

  • Yes - I believe it's especially important that we do insert a blank line between the image and a new section that follows below the image (===Noun===, for example), because if there were no image, a blank line would precede the new section anyway.
  • No - don't insert a blank line between the image and a headword template. (In cases where the image is between ===Noun=== and {{en-noun}}, for example, just don't insert a blank line anywhere.) That is because, in my mind, the headword template is sort of an extension of the POS heading.
  • In all other cases, I'd probably be fine either way, but I'm leaning towards: yes, have the space in all situations, it looks better and a bit easier to read, by properly separating one type of content from the other.

Thoughts? --Daniel Carrero (talk) 12:53, 20 July 2015 (UTC)

  • Don't the added spaces in some cases change the appearance that results? DCDuring TALK 19:20, 20 July 2015 (UTC)
    • @DCDuring No, not that I'm aware of. I tested both versions of the whole code that I used as an example for this poll and the presence or lack of spaces did not change anything in the appearance of the entry. In addition, poll 6 from May 2015 was specifically about having an image or a {{wikipedia}} box between two headings. In that poll, I addressed a similar question about spaces changing the appearance of the page. My reply was: "[E]xtra vertical space only appears if we use a broken template with extra newlines at the end of the code before <includeonly/>, I presume? [...]" and I mentioned five second rule and feminism as two entries which use images with spacing without breaking anything. Also, the results of the poll I mentioned were 0-6-0-2, meaning 6 votes supporting the spacing, no votes supporting the space-less version, no opposes and 2 abstains. --Daniel Carrero (talk) 19:42, 20 July 2015 (UTC)
  • When would it make sense to add a picture under Alternative forms or Synonyms?--Dixtosa (talk) 19:37, 20 July 2015 (UTC)
    • Sorry, I was just testing various possibilities for the code. You can ignore them if that'd be better. --Daniel Carrero (talk) 19:43, 20 July 2015 (UTC)
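
For bot operators following Poll 15, the spacing preference described in the comments above (a blank line after an image, except when a headword template follows directly) could be sketched roughly as below. This is only an illustration of one of the options under discussion, not an agreed rule; the function name space_after_images and the short HEADWORD_PREFIXES list are hypothetical, and a real bot would need a much fuller list of headword templates.

```python
import re

# A line consisting solely of an image link, e.g. [[File:Example 1.jpg|thumb|...]]
IMAGE_LINE = re.compile(r"^\[\[(?:File|Image):[^\n]*\]\]\s*$")

# Hypothetical, deliberately incomplete list of headword-template prefixes.
HEADWORD_PREFIXES = ("{{en-noun", "{{en-verb", "{{head|")


def space_after_images(wikitext: str) -> str:
    """Add one blank line after each image line, unless the next line is
    already blank, is missing, or starts with a headword template."""
    lines = wikitext.split("\n")
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if not IMAGE_LINE.match(line):
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if nxt is None or nxt.strip() == "":
            continue  # end of text, or already separated by a blank line
        if nxt.startswith(HEADWORD_PREFIXES):
            continue  # keep the image attached to the headword line
        out.append("")
    return "\n".join(out)
```

Because the function leaves already-spaced text alone, running it twice gives the same result as running it once, which fits the idea that a bot only needs to keep the neighborhood of its own changes in conformity.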

Uncommon and exotic words in Translations section

Someone added German Weltnetz and Zwischennetz to the "Translations" section of Internet: see diff. These words are hardly used, and the usual German word for Internet is simply Internet. The presence of these words in the "Translations" section suggests that they are normal German translations of the English term Internet.

What should one do with them?

  • Delete them? As English to German translations they are useless and misleading.
  • Add labels? Such as hardly used?

Wiktionary:Translations doesn't say much about this problem.

See also:

--MaEr (talk) 11:22, 19 July 2015 (UTC)

Delete them. Due to the crammed nature of translation tables, it’s not worth presenting information of such limited usefulness. — Ungoliant (falai) 14:47, 19 July 2015 (UTC)
Some native speakers may prefer such terms to recent-vintage borrowed terms. Is one of the German terms noticeably more common? DCDuring TALK 15:08, 19 July 2015 (UTC)
See German WP, which argues for the terms being uncommon and politically fraught. Also Internetz seems as common as either of the above, if not more so. DCDuring TALK 15:16, 19 July 2015 (UTC)
I agree about deleting. I present archaic, dialectal, colloquial, uncommon forms in the main FL entry, under ===Synonyms===. --Vahag (talk) 15:10, 19 July 2015 (UTC)

Delete from translations; I have never heard of those words. Matthias Buchmeier (talk) 17:21, 19 July 2015 (UTC)

Thank you, everybody! I will remove these "translations" from now on, or move them to the foreign language entry, as Vahagn suggested.
I would like to add this suggestion to Wiktionary:Translations. Does one need a formal poll or decision for this? --MaEr (talk) 17:52, 19 July 2015 (UTC)

They should be deleted except when there is no normal, common form. Right?--Dixtosa (talk) 17:55, 19 July 2015 (UTC)

I'm sure I've seen things like {{t|fo|bar}} {{qualifier|rare}} (which yields "bar (rare)"), and with other qualifiers. (Ping.)​—msh210 (talk) 18:19, 20 July 2015 (UTC)

ISBN - request for more opinions

There is a discussion at Wiktionary talk:About Czech#Rejzek 2015 whether an ISBN parameter can stay in the reference template {{R:Rejzek 2015}} or whether it should be removed. After several reverts were made at the template I would like to ask the community for more opinions to decide the issue. Thanks. Jan Kameníček (talk) 17:28, 19 July 2015 (UTC)

My reasoning, for a Beer parlour discussion: ISBN is visual noise, and makes the user experience worse for people like me. It is inessential for identification. It is inessential for search purposes. It is not used in the references sections of multiple English books that I own and that I checked. I prefer that the use of ISBN in reference templates is avoided. I also prefer that it is avoided in attesting quotations, but that is less urgent since these are hidden in the mainspace by default. --Dan Polansky (talk) 18:53, 19 July 2015 (UTC)
What we could do is create an appendix with references. The reference template would link to a location in the appendix, like Appendix:References#Rejzek_2015. That location would provide more extensive information, including the ISBN, and maybe multiple relevant searches, and links related to the reference, including one to Wikipedia. Book identifiers other than ISBN could be provided as well, if wished. Thus, we could keep the appearance of the reference template in the mainspace short and simple, while providing extensive detail to those readers who need or want it. --Dan Polansky (talk) 19:22, 19 July 2015 (UTC)
An ISBN uniquely identifies a particular book, in theory and usually in practice. I fail to see how a few extra characters makes that much difference, but it does make searching a hundred times easier. As Dan Polansky points out, one can type in http://www.google.com/search?q=2015+%C4%8Cesk%C3%BD+etymologick%C3%BD+slovn%C3%ADk+Rejzek; or as I point out, one can click on the ISBN which Wikimedia helpfully links to various book sites, no guessing what values to feed into Google.--Prosfilaes (talk) 23:37, 19 July 2015 (UTC)
What Prosfilaes said.​—msh210 (talk) 18:07, 20 July 2015 (UTC)
I think the ISBN should be included when possible. It's essential information, and the comment about visual noise is just moot. —CodeCat 12:25, 20 July 2015 (UTC)
I have always felt the ISBN parameter as noise wherever it occurs on content pages. When I accidentally click on it, I wish I hadn't and I curse those who made it possible for a time-waste (waiting for the linked-to site to allow the back button to take effect in a controlled way) like that to occur. It is also misleading when it refers to a specific binding and edition of a work that is available in numerous forms. When the reference is to something that at least provides something like full text, the noise is worth it. Otherwise, kill with fire. DCDuring TALK 12:39, 20 July 2015 (UTC)
"When the reference is to something that at least provides something like full text, the noise is worth it." If I understood this correctly, despite your criticism of ISBN, when citation is linked to the visualization on Google Books it's okay? --Daniel Carrero (talk) 13:04, 20 July 2015 (UTC)
On mature reflection, I think I'd rather have a link from the repetition of the headword or from a page number. DCDuring TALK 13:37, 20 July 2015 (UTC)
I think it's useful information to have, but I agree it's "visual noise". Might be good to have a little hyperlink (to some standard ISBN lookup location? Wikipedia uses one, IIRC) but not to display the actual number on screen. Equinox 13:40, 20 July 2015 (UTC)
Great idea, IMHO. Having the text "ISBN" there with a hyperlink to an ISBN lookup location would be a huge improvement. And it would make all sides relatively happy, wouldn't it? In case of Rejzek, it would look like this: ISBN. When you click that link, it takes you to what is transparently marked up as Special:BookSources/9788073353933. No one can possibly argue that the ISBN was not provided to the readers who want to search by it. --Dan Polansky (talk) 19:36, 20 July 2015 (UTC)
I, too, think Equinox's idea is grand, but the link text probably should be something other than "ISBN". After all, the running text "1997, John Smith, Some Book Title, ISBN, page 37" doesn't really make much sense. Arguably the link should be from the book title itself (as I think someone suggested above); the only problem with that is that we sometimes link to the book's w: article from the book title. Or, arguably the link should be from the page number (as DCD suggested above); but we often link to bgc from the page number (directly to the right page, which special:booksources does not). I'm just spelling out some issues; I don't have a good solution, I'm afraid.​—msh210 (talk) 16:04, 21 July 2015 (UTC)
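
To make the proposal concrete, here is a minimal sketch of how a hidden-number link could be generated: it checks the ISBN-13 check digit and emits a piped link to Special:BookSources, so only the label is displayed. The function names and the default label are assumptions for illustration only; as noted above, the choice of link text is still open.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def booksources_link(isbn: str, label: str = "ISBN") -> str:
    """Return a piped wikilink that shows only `label`, not the number.

    `label` is a placeholder; the discussion has not settled on link text.
    """
    cleaned = isbn.replace("-", "").replace(" ", "")
    if not isbn13_is_valid(cleaned):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return f"[[Special:BookSources/{cleaned}|{label}]]"


print(booksources_link("978-80-7335-393-3"))
```

Passing label="Some Book Title" instead would give the variant where the link is carried by the book title.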
ISBN is only for most books published since 1970 (1967 with some conversion adjustments). It is not the same as EAN, though it can be converted to EAN. It is most relevant for those who would purchase a book, as libraries don't always make it easy to find a book from its ISBN.
The ISBN is overly specific in that it specifies particular stock-keeping units for book retailers, not specific texts, which may be available in multiple ISBNs.
It is the display of "ISBN" followed by the ISBN number that is my core problem. Can we not have less clutter while achieving the same link as a result?
I would much prefer that we standardize on the display of desired links, of which I can think of only two at the moment. The more desirable of the two is a link to a particular page of the reference work (or database) available online. The second is the special:booksources link. For the link to text available online: page xx or a display of the headword or other term linked to; and something analogous for the link to special:booksources. One possibility is that we link to special:booksources using the title of the work and link to any WP article via "WP" or something similar. DCDuring TALK 18:00, 21 July 2015 (UTC)
That last sounds good to me fwiw.​—msh210 (talk) 22:06, 21 July 2015 (UTC)
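
To make the remark above about converting an ISBN to an EAN concrete, here is a minimal Python sketch. It assumes the standard "978" prefix and the usual EAN-13 check-digit rule; the function names are illustrative only and are not part of any Wiktionary template or module. The ISBN-10 form 8073353938 is simply the 10-digit equivalent of the 9788073353933 mentioned in the thread.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to its 13-digit (EAN-13) form with the 978 prefix."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit() or chars[9] not in "0123456789X":
        raise ValueError("not a well-formed ISBN-10: %r" % isbn10)
    # Drop the old check digit and prepend the 978 prefix.
    core = "978" + chars[:9]
    # EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


def booksources_title(isbn):
    """Build the page title that an ISBN link would point to."""
    return "Special:BookSources/" + isbn.replace("-", "").replace(" ", "")


if __name__ == "__main__":
    isbn13 = isbn10_to_isbn13("80-7335-393-8")
    print(isbn13)
    print(booksources_title(isbn13))
```

Whatever link text is chosen (the title, the page number, or the word "ISBN"), the target stays the same; only the displayed label changes.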

Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit extended

Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit has been extended. Some concern was expressed that this and/or other votes were poorly advertised, so let this serve as advertisement. Who has participated in the previous vote and discussions, or in discussions of this vote, without voting (even to abstain) in this vote yet? @Angr comes to mind. - -sche (discuss) 23:22, 19 July 2015 (UTC)

I don't consider this extension legitimate. It seems like bullying over the result, with the effect of cowing editors to change positions in order to achieve a different result. bd2412 T 18:01, 20 July 2015 (UTC)
I'd be surprised if anyone changed position on this. The problem is the process not the result. But to accept the result is to accept the fruit of a poisoned tree. DCDuring TALK 18:08, 20 July 2015 (UTC)
One editor already has. Unfortunately, this tells us nothing about the merits of providing more information as a dictionary, and everything about keeping up appearances. The 2/3 bean-counting requirement is not set in stone in any case. Where the question is one of presenting a more informative lexicon, a vocal minority opposing for no reason or based on factually flawed premises should not prevail. bd2412 T 18:24, 20 July 2015 (UTC)
Do you think that oppose votes with no rationale should be disqualified? Or were you thinking of something more nuanced? —CodeCat 18:27, 20 July 2015 (UTC)
Oppose votes with no rationale should certainly be given less weight. Otherwise, we open the process up to opposition by rote, rather than for a reason. bd2412 T 18:38, 20 July 2015 (UTC)
I don't think it's a good idea for users to start trying to discount votes that they don't like just because the voters didn't spell out explicitly "I do not agree with the rationale offered for doing what this vote proposes to do; I oppose doing it". If you do want to suggest such disqualifications with any veneer of propriety, you'll have to also discount support votes that offer no rationale, like Stephen's old support vote, Saltmarsh's, or SemperBlotto's. - -sche (discuss) 18:43, 20 July 2015 (UTC)
The first five oppose votes look to me like someone's idea of a joke. bd2412 T 18:45, 20 July 2015 (UTC)
Oppose votes without rationale come across as "I just don't like it"; there's no recourse for editors to come to a consensus except by discussing more (which vote pages are not really meant/good for). Wikipedia even has a page w:WP:I just don't like it suggesting that such argumentation should be avoided. So can we really take a vote seriously if everyone is just voting for preference without substantiating anything? For political voting that works, but not for a community based on consensus. We have no coalition and opposition here, nor should we. If each side just uses "I don't like it" to the other side, that isn't consensus, that's just tyranny of the majority and grudging acceptance by the minority. —CodeCat 18:53, 20 July 2015 (UTC)
A user who favoured the passage of the vote didn't mind extending it repeatedly for as long as it took to obtain the appearance of a majority in favour of the vote, but now objects to extending the vote any further than that because he thinks the further extension will result in it being clear that there isn't a (passage-sufficient) majority in favour of the proposal after all. And he suggests changing the customary threshold for passage or disqualifying "oppose" votes so that the vote could still pass without consensus. Hmm... can you see why people are suspicious of the legitimacy of the vote? In the past (for years, vide Wiktionary:Votes/Timeline), when a vote showed that there was no consensus for something, the vote was closed at the scheduled time as "no consensus" (or simply as "fails", because votes require consensus to pass). If necessary/desired, another vote was held later after further discussion and advertisement. - -sche (discuss) 18:32, 20 July 2015 (UTC)
We seem to have no problem closing RfDs (which have no maximum time) with "kept no consensus to delete", ie, status quo ante. DCDuring TALK 18:51, 20 July 2015 (UTC)
Requiring consensus to delete is a position that favors the inclusion of more information in the dictionary, unless there is a strong sense that the information should be excluded. The vote at issue here is also to include more information in the dictionary - reliably attested information found in books in print (although one opposer would prefer to limit inclusion because those books don't come from "a publishing house that has published writings of eminent Indologists", and another is solely concerned with the possibility that we will rely on uses from websites, which is not this proposal at all). bd2412 T 18:58, 20 July 2015 (UTC)
It is NOT any Wiktionary policy to "favor the inclusion of more information in the dictionary" without limit. That may be your desire and you may feel that History is on your side and therefore you are justified in using any means you choose to achieve your desire, but not everyone agrees with your views and certainly not with the use of any means, whatever principles of fairness or "due process" they violate. DCDuring TALK 19:11, 20 July 2015 (UTC)
What are you talking about? BD2412 was just observing that the way RFD works, it skews Wiktionary's preference in favour of keeping. A supermajority is required to delete, therefore purely by statistics, content is easier to keep than to delete, and will naturally lead to keeping more than deleting. It has nothing to do with any explicit Wiktionary policy, only a consequence of our existing ones (insofar as RFD's rules are policy). —CodeCat 19:21, 20 July 2015 (UTC)
Correct. --Dan Polansky (talk) 19:25, 20 July 2015 (UTC)
@DCDuring We had a vote on whether to default to excluding romanizations. That vote failed. The consequence is that anyone can enter any transliteration, and whether it is kept or not is up to the whims of RfD (or VfD, if it is entered without citations). My proposal would avoid those disputes for a limited class of transliterations. bd2412 T 19:40, 20 July 2015 (UTC)
Again, I am concerned about process. BD has no trouble closing RfDs, which have no time limit, rather than keeping them open because he apparently likes the result. When it comes time to close a vote, which has a definite time limit, he has no objection to extending the vote, apparently because he prefers to see a positive outcome. The common element is that the process selected is one that favors his desired outcome. An effort to mount a principled argument needs to overcome the indisputable appearance of the manipulation of process. I don't doubt that all participants believe that the manipulation of process is justified. I find it hard to believe that they don't think the process is being manipulated. I think that is betrayed by the proposal that someone should assume the role of a judge and throw out the result of a vote based on no policy or practice. It seems that the idea is to achieve one's objective by any means necessary and practical. DCDuring TALK 19:54, 20 July 2015 (UTC)
Your premises are factually wrong. From WT:RFD: "Time and expiration: Entries and senses should not normally be deleted in less than seven days after nomination. When there is no consensus after some time, the template {{look}} should be added to the bottom of the discussion. If there is no consensus for more than a month, the entry should be kept as a 'no consensus'". I have always abided by those time limits. I have often closed votes as against my own stated preference; no one has ever asserted otherwise. Can you show me a single instance where I closed a vote early because I 'liked the result'? bd2412 T 19:59, 20 July 2015 (UTC)
The above by DCDuring, is in poor taste, IMHO. --Dan Polansky (talk) 20:03, 20 July 2015 (UTC)
I considered the repeated extensions, starting with the first one on this second incarnation (at which the vote was 5-5-1), to be worse than poor taste, to be manipulative of the process. The proposal had failed once before. Why not just end it? DCDuring TALK 21:11, 20 July 2015 (UTC)
1) If you deemed it worse than poor taste, it was your moral duty to say so, which you did not do. You even voted after the 1st extension (diff), although you could have abstained with the comment "I object to the extension" or the like. You did not do that. 3) All I am saying is give votes a chance. Give them a better chance. Recent experience shows that more people do come to votes when they are extended. Recent experience with multiple extensions of votes is a positive one, as far as I am concerned. --Dan Polansky (talk) 22:06, 20 July 2015 (UTC)
Just to prove otherwise, here are five RfD discussions where I supported or would have preferred deleting an entry, and closed the discussion as keep or no consensus: Talk:Mobil, Talk:police protection, Talk:bacon and eggs, Talk:am I right or am I right, Talk:big balls. bd2412 T 20:27, 20 July 2015 (UTC)
That only shows that YOUR VIEW on the principle over your preference in an individual case. How many times have you exercised discretion to delete something not patently garbage? —This unsigned comment was added by DCDuring (talkcontribs).
As I noted on your talk page, there is no discretion involved. If there is consensus to delete, I delete. If not, I close as no consensus, as required by the page instructions. There are also several occasions where I have deleted an entry, per consensus, where I would have preferred to keep it. For example, Talk:dolemite. bd2412 T 22:33, 20 July 2015 (UTC)
I never paid attention to the extensions until after the vote was properly closed and they were raised as an issue; however, the latest one has only yielded opposition based on an apparent misunderstanding of the proposal itself, which is actually much more limited than the new opposition suggests. Currently, well-attested Sanskrit transliterations are included as words in English, which is absurd; the point of the proposal is to allow those transliterations to be called Sanskrit. Opposition at this point seems like an excuse to bash the procedure, not deal with the responsibility of informing readers. bd2412 T 18:44, 20 July 2015 (UTC)
I extended the vote again since I consider the closing illegitimate and irregular, and I said so on the day of the closure on the talk page of the closer. This discussion and previous ones confirm that multiple editors see this the same way I did. I repeatedly extended the vote knowing that I must not stop as soon as a threshold is reached since that would create accusations of selection bias; and it did create such accusations. Notice that, based on my preference and my cast vote, my preferred outcome would result from keeping the vote closed and not interfering. It must be obvious that I do not act so as to convince more people to oppose; I wish more people to support, as I did. I act on principle, as best as I can. --Dan Polansky (talk) 19:13, 20 July 2015 (UTC)
@-sche I've voted now. —Aɴɢʀ (talk) 06:13, 21 July 2015 (UTC)
  • This is not the first time that Polansky is pushing his version of justice. Any vote must expire on the date set when it was started; otherwise all of the voters should be informed of an extension. Extending the vote just before the expiry is retroactively changing the rules. If there are doubts as to whether the vote is legitimate, or whether it reflects a consensus of the relevant community, it can be restarted again in the future. --Ivan Štambuk (talk) 11:55, 3 August 2015 (UTC)
    If understood as a description of the actual practice, the above is untrue: There is an uncontested precedent of extending votes, as I documented at Wiktionary:Votes/pl-2015-07/Disallowing extending of votes. On an alternative reading, the above is a set of prescriptions (not descriptions) that is probably not supported by consensus of editors. Especially "Extending the vote just before the expiry is retroactively changing the rules" is wrong. --Dan Polansky (talk) 09:03, 8 August 2015 (UTC)

No LDL for sign languages?

Is there any particular reason that LDL is restricted to spoken languages? It seems strange that sign languages can't be cited that way, after all they're languages as well and there really aren't that many texts written in sign language. -- Liliana 08:53, 21 July 2015 (UTC)

Is it restricted to spoken languages?​—msh210 (talk) 16:08, 21 July 2015 (UTC)
WT:CFI#Number of citations says, "For all other spoken languages that are living, only one use or mention is adequate, subject to the following requirements:". Perhaps whoever wrote that meant "natural languages", since constructed languages are subject to their own CFI. —Aɴɢʀ (talk) 17:55, 21 July 2015 (UTC)
Ah. The "spoken" comes from [[Wt:Votes/2012-06/Well Documented Languages]], where it was part of the original version of the page by BenjaminBarrett12, and where it seems to have gone unnoticed.​—msh210 (talk) 22:16, 21 July 2015 (UTC)
And that came from [[Wt:Beer parlour/2012/June#New update to languages with limited documentation]], where, too, the "spoken" appears to have gone unnoticed.​—msh210 (talk) 22:20, 21 July 2015 (UTC)
I'd support changing "spoken" to "natural" so that sign languages are also treated as LDLs. We do have some specific criteria for sign languages, although they are not on WT:CFI proper but on a page it links to from a clearly-marked section: Wiktionary:About sign languages#Criteria_for_inclusion. - -sche (discuss) 22:05, 22 July 2015 (UTC)
I purposefully left sign languages out of the LDL because they have their own rules for inclusion as shown in the CFI, which references the sign CFI Wiktionary:About sign languages.
The sign language CFI says: 'Unlike spoken languages, sign languages are rarely written outside of reference materials and academic publications. Thus, the "clearly widespread use" condition of Wiktionary:Criteria for inclusion (CFI) is considered to be met by any sign that is used by multiple independent deaf communities, and the "usage in permanently recorded media" condition includes any visual media that has been widely distributed, including DVDs, broadcast television, and sign language dictionaries.' I have not been active on Wiktionary for some time, so I might be out of date, but I would not be in favor of adding sign languages to the LDL.
As to Angr's point about natural languages, the CFI page includes a link to Wiktionary:Criteria_for_inclusion/Well_documented_languages which specifically notes that only approved constructed languages are acceptable.
A very picky follow-up WRT Angr's point. The number of citations requirement says: "For languages well documented on the Internet, three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. For all other spoken languages that are living, only one use or mention is adequate, subject to the following requirements:" Somebody might argue that a spoken, living constructed language that is not in the list of "languages well documented on the Internet" therefore requires only one use or mention. However, constructed languages are specifically addressed later on the page, so I don't think this is an issue of concern. -BB12 (talk) 00:31, 23 July 2015 (UTC)

A suggestion about Category:place names

Category:Place_names should have "Place names by territorial entities" not directly "► Place names of England" in it, because otherwise specific categories will easily overshadow other meaningful subcategories.

Also, it could have "Hydronyms" as a subcategory, containing categories like lakes, rivers and seas, plus any direct entries that are none of these.--Dixtosa (talk) 16:46, 21 July 2015 (UTC)

Re "Place names by territorial entities", I agree. - -sche (discuss) 22:06, 22 July 2015 (UTC)

User:Benwing for admin

Benwing (talkcontribs) has accepted my nomination for adminship. I think most of us know his great contributions, abilities and character (in terms of his presence, activities and interactions with others). Let's support him at Wiktionary:Votes/sy-2015-07/User:Benwing for admin! --Anatoli T. (обсудить/вклад) 13:01, 22 July 2015 (UTC)

Benwing's user page says (s)he is on wikibreak as of last September, but Special:Contributions/Benwing suggests otherwise. If the wikibreak is over, please remove that statement. —Aɴɢʀ (talk) 13:08, 22 July 2015 (UTC)
Yes, good point. --Anatoli T. (обсудить/вклад) 13:13, 22 July 2015 (UTC)
I removed that; it was out of date. Benwing (talk) 13:22, 22 July 2015 (UTC)

Main page of the app

The main page of the Wiktionary app just shows the (English-language) Word of the Day. Can/should it also display the Foreign Word of the Day? If so, how do we implement that? —Aɴɢʀ (talk) 05:16, 23 July 2015 (UTC)

Instructions for Mobile homepage formatting. --Panda10 (talk) 12:12, 23 July 2015 (UTC)
Hmm, that says that anything appearing on the mobile main page should be tagged with mf-XXX, but when I look at the code of our main page, not even the (English) Word of the Day has that tag, so I can't figure out how the mobile main page knows to show it. —Aɴɢʀ (talk) 13:58, 23 July 2015 (UTC)
Right click on the page and select View Page Source. Search for mf- and you will see that the word of the day has an id=mf-wotd next to it. I'm not sure why this is not visible on the edit screen. --Panda10 (talk) 14:46, 23 July 2015 (UTC)
I figured it out: the id=mf-wotd is in Template:WOTD, not directly in the Main Page. However, since {{WOTD}} and {{FWOTD}} have totally different setups, I can't figure out where to put the id=mf-wotd to get the Foreign Word of the Day tagged correctly. —Aɴɢʀ (talk) 15:16, 23 July 2015 (UTC)
I added the tag. What else needs to be done? --WikiTiki89 15:21, 23 July 2015 (UTC)
Nothing, I guess. I just checked both my phone and my tablet and it looks good on both. Thanks! —Aɴɢʀ (talk) 16:36, 23 July 2015 (UTC)
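
For anyone repeating the "View Page Source and search for mf-" check described above, here is a rough Python sketch that lists every element id starting with mf- in a piece of rendered HTML. The sample markup and the id mf-fwotd are made up for illustration; the thread only confirms mf-wotd, and the real Main Page markup may differ.

```python
from html.parser import HTMLParser


class MobileIdFinder(HTMLParser):
    """Collect id attributes that start with the 'mf-' prefix."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "id" and value and value.startswith("mf-"):
                self.ids.append(value)


def find_mobile_ids(html):
    """Return all 'mf-' ids found in the given HTML, in document order."""
    finder = MobileIdFinder()
    finder.feed(html)
    finder.close()
    return finder.ids


# Hypothetical rendered output: the ids come from the expanded templates,
# which is why they do not show up in the Main Page wikitext itself.
SAMPLE = (
    '<div id="mf-wotd">Word of the day</div>'
    '<div id="mf-fwotd">Foreign word of the day</div>'
    '<div id="sidebar">not shown on mobile</div>'
)

if __name__ == "__main__":
    print(find_mobile_ids(SAMPLE))
```

Running the same function on the saved page source before and after a template edit is a quick way to confirm that a new tag actually reaches the rendered page.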

Proposal to create PNG thumbnails of static GIF images

The thumbnail of this gif is of really bad quality.
What a PNG thumb of this GIF would look like

There is a proposal at the Commons Village Pump requesting feedback about the thumbnails of static GIF images: It states that static GIF files should have their thumbnails created in PNG. The advantages of PNG over GIF would be visible especially with GIF images using an alpha channel. (compare the thumbnails on the side)

This change would affect all wikis, so if you support/oppose or want to give general feedback/concerns, please post them to the proposal page. Thank you. --McZusatz (talk) & MediaWiki message delivery (talk) 05:08, 24 July 2015 (UTC)


I created {{huh}} here because {{cleanup}} would not adequately explain the problem; the way wikipedia uses {{huh}} would have explained what I wanted to convey. I suggest making a template similar to the way it is used on Wikipedia. 00:12, 25 July 2015 (UTC)

We operate differently here from Wikipedia. If you feel like {{cleanup}} is inadequate, it's better to start a new discussion about the term in question at the Tea room. —Aɴɢʀ (talk) 08:23, 25 July 2015 (UTC)

Transliteration of Ξ

split off from an old general discussion of transliteration at Wiktionary:Grease pit/2014/June#Automatic transcription appears to override manual transcription?

@LlywelynII has pointed out that Wiktionary's idiosyncratic automatic transliteration of Ξ as ks should be changed to x; I support this, as it is how every other authority I can find on Greek transliterates the character (viz ELOT, UN, ISO 843, ALA-LC, BGN/PCGN). It is also how other etymological dictionaries transliterate the character (look at the etymology of climax in Merriam-Webster, Dictionary.com, Collins, and OxfordDictionaries). - -sche (discuss) 17:56, 28 July 2015 (UTC)

@-sche: Yes, I'd support this change, too. — I.S.M.E.T.A. 18:30, 28 July 2015 (UTC)
My preference is for ks, because we also transliterate ps. —CodeCat 18:39, 28 July 2015 (UTC)
I don't take your point. ⟨ps⟩ is the standard transliteration and always has been. ⟨ks⟩ isn't and never has been. It's not even useful since ⟨x⟩ simply is a /ks/ sound; indeed, it's actively misleading since ⟨κσ⟩ is actually ⟨ks⟩.
Now, I'm fully on board keeping ⟨χ⟩ as ⟨kh⟩ because it has nothing to do with English's /t͡ʃ/ noise and even support treating ⟨φ⟩ differently once other scholars do as well. But it's not a biggie either way. We can quickly link to a full Greek entry and the Greek pronunciation template does a good job presenting the changing pronunciations over time. — LlywelynII 23:00, 28 July 2015 (UTC)
If there hadn't been any standards, I would have preferred ks. Now we just have to decide between being less confusing or following the standards. --WikiTiki89 18:43, 28 July 2015 (UTC)
Why? English has a letter for the sound /ks/ and it's ⟨x⟩. What do you think is confusing about it? The transliteration is into English, not IPA. Further, how do you feel that it isn't confusing to use an idiosyncratic standard which conflates ⟨ξ⟩ and ⟨κσ⟩? — LlywelynII 23:00, 28 July 2015 (UTC)
Because transliterations don't only go by English. For example, we use x to transliterate Russian /x/ and Persian /χ/. Not to mention that it looks too much like the Greek χ. --WikiTiki89 13:32, 29 July 2015 (UTC)
For the same reason it's not confusing to conflate ψ and πσ. —CodeCat 00:55, 29 July 2015 (UTC)
Obviously, I support the change, even though it is somewhat off-topic to talk about romanization schemes for modern Greek (ELOT & al.) when dealing with ancient Greek. (All the romanization schemes for ancient Greek also use ⟨x⟩, though, so it's no biggie.)
I'll take the opportunity, though, to note that once you saw every single transliteration scheme backed me up there was absolutely nothing helpful in maintaining broken transliterations by repeatedly reverting my proper corrections. If Wiktionary doesn't have a wp:iar analogue, you need one. We're here to improve the entries, not just make ourselves feel big by screwing with people and maintaining errors on procedural grounds. There isn't even a policy that the term template must always be used in every etymology section. You just felt like that. It's nuts. — LlywelynII 23:00, 28 July 2015 (UTC)
Yes, we're here to improve the entries, but reasonable people can disagree about how to achieve that, and reasonable people can do things for other reasons than to "make ourselves feel big by screwing with people". There's a difference between argumentation and argumentativeness, between logic and ad hominem. Please try to stay on the right side of it. You're very sure you're right, and you want everyone to let you do things your own way, but then, the same could be said of this guy. He would have said it was all about improving the entries, too.
Now, to the merits: there are reasons for the current transliteration scheme that have nothing to do with anyone being dropped on their head when they were little. X is open to confusion: not only does it look like χ, but it's been used to represent it, for instance in beta code. There's no doubt about what "ks" represents. It's also a matter of being consistent in using digraphs for both the consonant + s series and the aspirated consonant series. Your way has merit, but it's not the only way that makes sense. Chuck Entz (talk) 06:46, 29 July 2015 (UTC)
Done. I note that transliteration as 'x' was the original behaviour, until it was changed in 2013. - -sche (discuss) 03:20, 8 August 2015 (UTC)
@LlywelynII please note that it took a single edit to change not just the entry you were screaming bloody murder over, but every single entry that uses a template to link to any Ancient Greek word with that letter in it anywhere on Wiktionary, and every link using the templates that will ever be added to Wiktionary as long as the module is in its current state. Someone will have to check all the entries with Ξ in them, though, because other people may have done what you wanted to do and hard-coded the transliterations. Chuck Entz (talk) 05:20, 8 August 2015 (UTC)
  • One should not be confused about the status quo ante at Module:grc-translit, which was created on 8 September 2013‎ by User:ZxxZxxZ. The decisive thing should be the status quo in the manually entered transliterations that were used in Ancient Greek entries before the module was created. I recall User:Atelaes had some cards in Ancient Greek transliteration. I don't have enough energy to do this, but someone should investigate what the mainspace transliteration was back then, and then either keep the -sche change to the module or revert it as not yet supported by consensus. --Dan Polansky (talk) 08:53, 8 August 2015 (UTC)
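
The disagreement above can be seen in a toy example. The following Python sketch is not Module:grc-translit; it is a deliberately tiny table (no accents, breathings or vowel length, and only a handful of letters) whose sole purpose is to show what changes when ξ is rendered as "x" versus "ks". The function name and the table are assumptions made for illustration.

```python
# Toy table: unaccented lower-case letters only, far from complete.
BASE_TABLE = {
    "α": "a", "ε": "e", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "π": "p", "σ": "s", "ς": "s", "τ": "t", "ψ": "ps",
}


def transliterate(text, xi="x"):
    """Transliterate letter by letter; `xi` selects the output for ξ."""
    table = {**BASE_TABLE, "ξ": xi}
    # Characters missing from the toy table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    # Unaccented stand-in for the word behind "climax".
    print(transliterate("κλιμαξ", xi="x"))
    print(transliterate("κλιμαξ", xi="ks"))
    # With "ks", ξ and κσ become indistinguishable; with "x" they do not.
    print(transliterate("ξ", xi="ks") == transliterate("κσ", xi="ks"))
    print(transliterate("ξ", xi="x") == transliterate("κσ", xi="x"))
    # ψ and πσ collapse to "ps" under either choice.
    print(transliterate("ψ") == transliterate("πσ"))
```

Because the choice lives in one table entry, changing it alters every template-generated transliteration at once, while hard-coded transliterations in individual entries are untouched and have to be checked separately.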

Derivation categories for multiple homonymous morphemes

Many languages have morphemes that are spelled and/or pronounced the same, but have different origins and different uses. An example is the English -er: it has a variety of unrelated uses. Currently, {{suffix}}, {{affix}} and family would put words in the same category even if they are derived from different underlying suffixes. Consequently, the categories are a bit of a mess, just see Category:English words suffixed with -er. The same is now also happening with PIE root categories; some roots are actually two distinct but homonymic roots, and it's necessary to distinguish which of them a word came from. I think this should be fixed somehow, but I'm not sure in what way. —CodeCat 21:24, 28 July 2015 (UTC)

Perhaps we could put a disambiguation suffix on the category names so that different homonyms go to different categories (maybe [[Category:English words suffixed with -er/2]] or [[Category:English words suffixed with -er:2]]?). That would require adding a parameter to the affix templates to specify the disambiguator. It would also require adding the same parameter to the catboilers so they could accommodate the suffixes. The catboilers would need to add the unsuffixed category so the suffixed categories would show up as subcategories in the unsuffixed categories. It would also be a good idea to add a parameter for a sense ID or similar anchor to the catboilers so that they could add the anchor to the url in the same way that catfix adds the language tag. It might make it easier if the anchor and the parameters/suffixes were all the same.
The difficult part would be keeping the unsuffixed category empty: I don't really see a way to inform people who add the affix templates to etymologies about whether the morpheme they're adding has suffixed categories- they would have to check. I suppose you could have the catboilers check for both suffixed subcategories and entries being present at the same time, and adding the unsuffixed categories to a maintenance category. Chuck Entz (talk) 01:51, 2 August 2015 (UTC)
I was thinking of numbers as well at first, but you mentioned senseid. If we're going to be using senseid (which we should) then why not use the senseid itself as the disambiguator? We'd have something like Category:English words suffixed with -er (agent noun), and the page -er itself would have {{senseid|en|agent noun}}.
This does bring up a shortcoming of senseid though. It's designed and intended for tagging individual senses. But what if we want to tag whole etymologies or parts of speech? Do we need a new template, or should we just continue using the existing senseid? —CodeCat 12:37, 2 August 2015 (UTC)
How about using the part of speech in the category name? English nouns suffixed with -xyz? This would require a pos= parameter in the template call, though. --Panda10 (talk) 13:09, 2 August 2015 (UTC)
That's not going to work if many have the same part of speech. Part of speech is not enough to uniquely separate them. Consider for example bystander versus bylaw. Moreover, it doesn't work at all with the PIE root categories or any other case where POS is not relevant. —CodeCat 13:33, 2 August 2015 (UTC)
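
As a rough illustration of the senseid-based naming discussed above, here is a small Python sketch. It is not how {{suffix}}, {{affix}} or the catboilers are implemented; the function names are hypothetical, and the sense id "comparative" is invented for the example (only "agent noun" appears in the thread).

```python
def affix_category(language, affix, sense_id=None, relation="suffixed"):
    """Build a derivation category name, optionally disambiguated by sense id."""
    name = "Category:%s words %s with %s" % (language, relation, affix)
    if sense_id:
        # The disambiguator reuses the sense id given on the affix's own entry.
        name += " (%s)" % sense_id
    return name


def parent_category(language, affix, relation="suffixed"):
    """The undisambiguated category that would hold the subcategories."""
    return affix_category(language, affix, None, relation)


def needs_cleanup(direct_entries, disambiguated_subcategories):
    """Flag a parent category that has both direct entries and
    disambiguated subcategories, as suggested for a maintenance category."""
    return bool(direct_entries) and bool(disambiguated_subcategories)


if __name__ == "__main__":
    agent = affix_category("English", "-er", "agent noun")
    comparative = affix_category("English", "-er", "comparative")
    print(agent)
    print(comparative)
    print(parent_category("English", "-er"))
    # An entry left in the parent while subcategories exist gets flagged.
    print(needs_cleanup(["baker"], [agent, comparative]))
```

Free-text ids avoid the problem with part-of-speech labels noted above, since two suffixes that both form nouns can still carry different ids.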

NORM vote 2

I revised WT:NORM based on comments/criticism from the first vote, then created Wiktionary:Votes/pl-2015-07/Normalization of entries 2. --Daniel Carrero (talk) 00:05, 29 July 2015 (UTC)

Model pages

I have been trying to rewrite Wiktionary:About Greek for some time (years), the current page is at least 5 years out of date and has not been revised to reflect changes. Since a picture is worth a thousand words I'm thinking of creating model pages to illustrate how entries should be structured - obviating the need to update About Greek very often. These would be protected, categorised, and limited in number to a bare minimum. I would welcome any comments.   — Saltmarshσυζήτηση-talk 10:42, 29 July 2015 (UTC)

Spanish Voseo forms

I would like to fix up all the voseo redlinks, but am unsure if these conjugations are correct since they're also missing from the Spanish Wiktionary. Could someone check these before I add them? Thanks. —This unsigned comment was added by Codeofdusk (talkcontribs) at 00:54, 31 July 2015 (UTC).

I'd say that all the regular -ar verbs are conjugated fine. A few other ones might be tricky. I changed a couple things on one template, which suggests that there could be more errors in others. Also, I'm gonna make a few more missing voseo categories. I'll let you know. --A230rjfowe (talk) 14:19, 31 July 2015 (UTC)
There's a new one at Category:Spanish verbs having voseo red links in their conjugation table (regular -ar verbs) for verbs using Template:es-conj-ar. You might want to start with them. --A230rjfowe (talk) 14:22, 31 July 2015 (UTC)
Thanks! Will fix those when I'm able. Codeofdusk (talk) 16:58, 31 July 2015 (UTC)
Done, but the categories need to be updated. Codeofdusk (talk) 22:19, 31 July 2015 (UTC)
Could we create a Voseo redlinks category for -car and -gar verbs? Codeofdusk (talk) 07:12, 1 August 2015 (UTC)
Done: Category:Spanish verbs having voseo red links in their conjugation table (-car) and Category:Spanish verbs having voseo red links in their conjugation table (-gar) --A230rjfowe (talk) 07:25, 1 August 2015 (UTC)

What does a Healthy Community look like to you?

The Community Engagement department at the Wikimedia Foundation has launched a new learning campaign. The WMF wants to record community impressions about what makes a healthy online community. Share your views and/or create a drawing and take a chance to win a Wikimania 2016 scholarship! Join the WMF as we begin a conversation about Community Health. Contribute a drawing or answer the questions on the campaign's page.

Why get involved?

The world is changing. The way we relate to knowledge is transforming. As the next billion people come online, the Wikimedia movement is working to bring more users on the wiki projects. The way we interact and collaborate online is key to building sustainable projects. How accessible are Wikimedia projects to newcomers today? Are we helping each other learn?
Share your views on this matter that affects us all!
We invite everyone to take part in this learning campaign. Wikimedia Foundation will distribute one Wikimania Scholarship 2016 among those participants who are eligible.

More information

Happy editing!

MediaWiki message delivery (talk) 23:43, 31 July 2015 (UTC)

A healthy community is definitely one where people get blocked for hate speech. Codeofdusk (talk) 07:18, 3 August 2015 (UTC)

August 2015

Category:(langname) plurals and Category:(langname) noun plural forms

Continuing the discussion from Module talk:category tree/poscatboiler/data/non-lemma forms § Plurals and noun plural forms

Right now both Category:(langname) plurals and Category:(langname) noun plural forms exist, and their descriptions are the same: "(langname) nouns that are inflected to be quantified as more than one (more than two in some languages with dual number)." They are also used in mostly the same way. The main difference is that there are counterparts to Category:(langname) noun plural forms, such as Category:(langname) noun dual forms, which don't exist for Category:(langname) plurals. The noun plural forms name also follows the naming scheme of (langname) adjective * forms.

I think we should either change plurals to be more general ((langname) terms that are... (vs (langname) nouns that are...)) and move it out of noun forms, or better yet just remove it. Note that (langname) singularia/dualia/pluralia tantum categories exist. Enosh (talk) 18:56, 2 August 2015 (UTC)

I proposed merging the plurals category into the noun plural forms category before, for consistency with other categories. I still support this. —CodeCat 19:02, 2 August 2015 (UTC)
I thought the goal was to merge any plural category into their appropriate forms category. For example Category:Hungarian plurals were merged into Category:Hungarian noun forms a long time ago. So there is no separate category for plurals at this moment. Isn't the goal the same for all languages? Is this discussion related: [2]? --Panda10 (talk) 20:18, 2 August 2015 (UTC)
Yes, it's the same proposal. But I'm not sure what you're asking. —CodeCat 20:23, 2 August 2015 (UTC)
I support merging Category:English plurals into Category:English noun plural forms for consistency with other categories. See also: Category:Noun plural forms by language. --Daniel Carrero (talk) 21:38, 2 August 2015 (UTC)
I support doing this in general. We have both Category:Arabic plurals and Category:Arabic noun plural forms, which ought to have the same contents but don't for reasons I'm not quite sure of; it's a bit of a mess. Benwing (talk) 05:53, 4 August 2015 (UTC)
Finally done for English. That was a lot of work for sure. A lot of entries needed manual fixing too so it wasn't just a simple bot run. In many entries, the plural-of definition was mixed in with other "proper" lemma definitions even though these should be kept to separate noun sections. There were also many entries where the headword line specified a noun lemma, rather than a noun plural form. —CodeCat 22:24, 19 August 2015 (UTC)

Thai transliterations with tones

Discussion moved from Wiktionary:Grease pit/2015/August#Thai transliterations with tones.

Native speakers seem to dislike the dictionary and textbook transliterations designed for learners, which include tones, and replace them with the Royal Thai General System of Transcription (RTGS). I see toned transliterations in my older edits being replaced with RTGS.

I think it's a problem. The standard Thai transliteration system (RTGS) not only lacks tones but also displays short and long vowels the same way and merges some consonants. I think it can be used as one of the systems but not the main one. I mentioned this in this discussion.

I insist that transliterating Thai tones is very important, not just the nominal but irregular tones as well. We could include RTGS along with phonetic transliterations (another parameter in Thai headwords?).

For example, ฉัน is nominally "chăn" but normally pronounced "chán" (pronoun), also ไหม (sense 1) is pronounced "mái" (nominally "măi"). I suggest we should use toned transliterations, as dictionaries and textbooks do, not as prescribed by the Thai government. @Stephen G. Brown, Iudexvivorum, Iyouwetheyhesheit. --Anatoli T. (обсудить/вклад) 12:17, 3 August 2015 (UTC)

  1. I agree that we need a romanisation system that better reflects tones, short and long vowels, etc.
  2. What system should we use then?
  3. The system developed by Thai2english (T2E) might be okay. But the T2E machine transliterator should be used with caution, as it sometimes gives incorrect transliterations (see the table below).
  4. Some other systems that might work:
    1. The now-defunct 1939 version of the RTGS (English translation) contains a general system and a precise system (which records tones, short and long vowels, etc.).
    2. The ALA-LC system is generally used by libraries in English-speaking countries. But this system lacks tone marks. (Could we add tone marks ourselves?)
    3. ISO 11940 is used in academic context.
--iudexvivorum (talk) 14:22, 3 August 2015 (UTC)
term | romanised by T2E transliterator | correctly romanised according to RTGS system | correctly romanised according to T2E system
--- | --- | --- | ---
ภิยโย | pí-yá-yoh | phin-yo | pin-yoh
อธิกมาส | a-tík-mâat | a-thi-ka-mat; |
ทรูก | trôok | suk | sôok
ซอมซ่อ | som-sôr | sommaso | som-má-sôr
รอมร่อ | rom-rôr | rommaro | rom-má-rôr
เทพรัตนราชสุดา | tâyp-rát-dtà-ná-râat-chá-sù-daa | theppha rat rat suda | tâyp-pá-rát-râat-sù-daa
นิลรัตน์ | nin-rát | ninlarat | nin-lá-rát
อุตบล | u-dtà-bon | utbon | ùt-bon
I completely agree that transliterations need to reflect long vowels and tone marks. If I'm trying to learn Thai, it will do me no good to have important phonetic information like this left out. Native speakers should not be the ones determining transliteration; translit is not designed for them. However, I think that this T2E system looks just awful, and I don't think it will help. People expect foreign words to follow the usage where a e i o u stand for the sounds they have in Latin and Spanish, rather than using weird things like "ay" for /e/, "oo" for /u/, "or" for (presumably) /ɔ/ (this latter notation is especially unhelpful for American English speakers), etc. ISO 11940 won't work either because it's a translit system in the narrow sense in that it reflects the writing rather than the pronunciation (properly speaking, Wiktionary misuses "transliteration" to mean "transcription" but that is a discussion for another day). Adding the tone marks to the ALA-LC system is not a bad idea; you could imagine taking the T2E tone marks and adding them to the ALA-LC system. You could also imagine rewriting long vowels as e.g. aa instead of ā, to avoid the stacking up of diacritics when long vowels are combined with tone marks. Benwing (talk) 06:15, 4 August 2015 (UTC)

Here's a comparison between some systems: --iudexvivorum (talk) 11:39, 4 August 2015 (UTC)

# | Thai | meaning | IPA | romanisation | romanisation | romanisation (without tone marks) | tone marks added (using numbers to indicate tones - see notes below)
--- | --- | --- | --- | --- | --- | --- | ---
1 | ไม้ใหม่ไหม้มั้ย | Was that new piece of wood burnt by the fire? | mäːj˦˥ mäj˩ mäj˥˩ mäj˦˥ | mai mai mai mai | máai mài mâi mái | māi mai mai mai | māi4 mai2 mai3 mai4
2 | กรุงเทพมหานคร อมรรัตนโกสินทร์ | The city as great as a celestial city, where the Emerald Buddha stays in perpetuity. | krũŋ˧ tʰeːp̚˥˩ mä˥.häː˩˦ nä˥.kʰɔ̃ːn˧ ʔä˩.mɔ̃ːn˧ rät̚˥.tä˩.nä˥ koː˧.sĩn˩˥ | krungthepmahanakhon amonrattanakosin | grung-tâyp-má-hăa-ná-kon a-mon-rát-dtà-ná-goh-sĭn | krungthēpmahānakhǭn ʻamǭnrattanakōsin | krung1-thēp2-ma4-hā5-na4-khǭn1 ʻa2-mǭn1-rat4-ta2-na4-kō1-sin5
3 | เสียงลือเสียงเล่าอ้าง อันใด พี่เอย | What tales, what rumours, you ask? | siːä̃ŋ˩˦ lɯː˧ siːä̃ŋ˩˦ läw˥˩ ʔä̃ːŋ˥˩ ʔä̃n˧ däj˧ pʰiː˥˩ ʔɤːj˧ | siang lue siang lao ang an dai phi oei | sĭang leu sĭang lâo âang an dai pêe oie | sīang lū’ sīang lao ʻāng ʻan dai phī ʻœi | sīang5 lū’1 sīang5 lao3 ʻāng3 ʻan1 dai1 phī3 ʻœi1
4 | อันมือไกวเปลไซร้แต่ไรมา คือหัตถาครองพิภพจบสากล | The hand that rocks the cradle is the hand that rules the world. | ʔä̃ːŋ˥˩ mɯː˧ kwäj˧ pleː˧ säj˦˥ tɛː˨˩ raj˧ mäː˧ kʰɯː˧ hät̚˩.tʰäː˩˥ kʰrɔ̃ːŋ˧ pʰi˥.pʰop̚˥ t͡ɕop̚˩ säː˩˥.kõn˧ | an mue kwai ple sai tae rai ma khue hattha khrong phiphop chop sakon | an meu gwai bplay sái dtàe rai maa keu hàt-tăa krong pí-póp jòp săa-gon | ʻan mū’ kwai plē sai tǣ rai mā khū’ hatthā khrǭng phiphop čhop sākon | ʻan1 mū’1 kwai1 plē1 sai4 tǣ2 rai1 mā1 khū’1 hat2-thā5 khrǭng1 phi4-phop4 čhop2 sā5-kon1
Tone representation:
"1" = สามัญ (mid; [aː˧])
"2" = เอก (low; [aː˨˩] / [aː˩])
"3" = โท (falling; [aː˥˩])
"4" = ตรี (high; [aː˦˥] / [aː˥])
"5" = จัตวา (rising; [aː˩˩˦] / [aː˩˦])
I got the idea of using numbers from the Wade–Giles system for romanising Chinese. But the numbers will be superscript under the WG system (e.g. "p'in¹-yin¹" for "拼音").
@Iudexvivorum Thanks. Good job! I was going to suggest the system used by Benjawan Poomsan Becker. In her dictionaries she uses special characters for vowels: "ʉ" for อึ, "ɛ" for แอะ, "ɔ" for เอาะ and "ə" for เออะ. Long vowels are simply duplicated, e.g. ตืน is "dtʉʉn". Tone marks are used on the first vowels only, e.g. เบิก is "bə̀ək". Tone marks are (using "a"): "a" (1 - no tone mark), "à" (2), "â" (3), "á" (4) and "ǎ" (5). Like T2E she uses d-dt-t, b-bp-p.
Using that system the examples above become:
  • ไม้ใหม่ไหม้มั้ย: máai mài mâi mái
  • กรุงเทพมหานคร อมรรัตนโกสินทร์: grung-têep-má-haa-ná-kon a-mon-rát-dtà-ná-goh-sǐn
  • เสียงลือเสียงเล่าอ้าง อันใด พี่เอย: sǐang leu sǐang lâo âang an dai pêe oie
I agree that Thai2English may not correctly transliterate words that it doesn't have in its dictionary. (I wonder if อธิกมาส has various readings, though. Both T2E and http://www.thai-language.com transliterate it as "atíkmâat".) Are "a-tí-gà-mâat" and "a-tík-gà-mâat" irregular alternative readings? --Anatoli T. (обсудить/вклад) 12:32, 4 August 2015 (UTC)
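The tone-number notation (1 to 5) and the diacritic notation described above carry the same information, so one can be derived from the other mechanically. The following is a minimal Python sketch of that conversion. It assumes each syllable is written with a trailing tone digit and that the vowel letters are a, e, i, o, u plus ʉ, ɛ, ɔ, ə; the names `add_tone_mark` and `convert` are hypothetical and do not belong to any existing Wiktionary module.

```python
import unicodedata

# Combining marks for tones 1-5, following the mapping quoted above:
# 1 = no mark, 2 = grave, 3 = circumflex, 4 = acute, 5 = caron.
TONE_MARKS = {1: "", 2: "\u0300", 3: "\u0302", 4: "\u0301", 5: "\u030C"}

# Vowel letters assumed for this sketch, including the special characters.
VOWELS = set("aeiou\u0289\u025B\u0254\u0259")  # a e i o u ʉ ɛ ɔ ə


def add_tone_mark(syllable: str, tone: int) -> str:
    """Place the diacritic for `tone` on the first vowel of `syllable`."""
    mark = TONE_MARKS[tone]
    for i, ch in enumerate(syllable):
        if ch in VOWELS:
            marked = syllable[:i + 1] + mark + syllable[i + 1:]
            # NFC merges letter + mark into one code point where one exists.
            return unicodedata.normalize("NFC", marked)
    return syllable  # no vowel found; leave the syllable unchanged


def convert(numbered: str) -> str:
    """Convert space-separated syllables with tone digits to diacritics."""
    out = []
    for token in numbered.split():
        if token[-1] in "12345":
            out.append(add_tone_mark(token[:-1], int(token[-1])))
        else:
            out.append(token)  # no tone digit; keep the token as written
    return " ".join(out)
```

Under these assumptions, `convert("maai4 mai2 mai3 mai4")` yields the first example in the list above. Hyphenated multi-syllable words would need to be split on hyphens first, which this sketch leaves out.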
  1. The term อธิกมาส is never pronounced "a-thik-mat" (a-tík-mâat). Grammatically, it is pronounced "a-thi-ka-mat" (a-tí-gà-mâat), as it is from Sanskrit अधिकमास adhikamāsa. But people also pronounce it as "a-thik-ka-mat" (a-tík-gà-mâat) and this pronunciation has become so popular. The Royal Institute Dictionary, the official dictionary of the Thai language, therefore accepts both pronunciations.
  2. There are many other similar cases. Some are shown in the table below.
  3. FYI: The Royal Institute of Thailand publishes a popular book called "อ่านอย่างไรและเขียนอย่างไร" ("How to Write? How to Read?"), containing common misspellings and mispronunciations, pronunciations of proper nouns, useful rules concerning writing and reading, etc. The book is regularly updated. The 2014 edition (22nd edition; ISBN 9786167073965) seems to be its latest edition. But it is in Thai only.
--iudexvivorum (talk) 14:29, 4 August 2015 (UTC)
term | acceptable pronunciations: grammatical | acceptable pronunciations: popular | notes
--- | --- | --- | ---
กรณี | RTGS: karani; T2E: gà-rá-nee; IPA: kä˩.rä˥.niː˧ | RTGS: korani; T2E: gor-rá-nee; IPA: kɔː˧.rä˥.niː˧ | from Sanskrit करणि karaṇi
ครหา | RTGS: kharaha; T2E: ká-rá-hăa; IPA: kʰä˥.rä˥.haː˩˩˦ | RTGS: khoraha; T2E: kor-rá-hăa; IPA: kʰɔː˧.rä˥.haː˩˩˦ | from Sanskrit गर्हा gar'hā
ปรัชญา | RTGS: prat-ya; T2E: bpràt-yaa; IPA: prät̚˩.jäː˧ | RTGS: pratchaya; T2E: bpràt-chá-yaa; IPA: prät̚˩.t͡ɕʰä˥.jäː˧ | from Sanskrit प्राज्य prājya
ปรมาจารย์ | RTGS: paramachan; T2E: bpà-rá-maa-jaan; IPA: pä˩.rä˥.mäː˧.t͡ɕä̃ːn˧ | RTGS: poramachan; T2E: bpor-rá-maa-jaan; IPA: pɔː˧.rä˥.mäː˧.t͡ɕä̃ːn˧ | from Sanskrit परम parama + आचार्य ācārya
มนุษยสัมพันธ์ | RTGS: manutsayasamphan; T2E: má-nút-sà-yá-săm-pan; IPA: mä̃˧.nut̚˥.sä˩.jä˧.sä̃m˩˥.pʰä̃n˧ | RTGS: manutsamphan; T2E: má-nút-săm-pan; IPA: mä̃˧.nut̚˥.sä̃m˩˥.pʰä̃n˧ | from Sanskrit मनुष्य manuṣya + सम्बन्ध sambandha
อธิบดี | RTGS: a-thi-bodi; T2E: a-tí-bor-dee; IPA: ʔä˩.tʰi˥.bɔː˧.diː˧ | RTGS: a-thipbodi; T2E: a-típ-bor-dee; IPA: ʔä˩.tʰip̚˥.bɔː˧.diː˧ | from Sanskrit अधिपति adhipati
อาชญา | RTGS: at-ya; T2E: àat-yaa; IPA: ʔäːt̚˨˩.jäː˧ | RTGS: atchaya; T2E: àat-chá-yaa; IPA: ʔäːt̚˨˩.t͡ɕʰä˥.jäː˧ | from Sanskrit आज्य ājya
If I were to design a Thai translit system, I'd want the following:
  1. Use diacritics for tones rather than numbers; numbers look ugly to me and take up extra room.
  2. Use double letters rather than macrons; this is necessary with diacritic tonal marks to avoid double diacritics.
  3. Don't separate syllables with hyphens; that looks ugly to me and takes up lots of extra room.
  4. Use t th d rather than d t dt.
However, if Benjawan Poomsan Becker's system satisfies 1-3 but not 4, then maybe we should go ahead and use it in the interest of using an existing system rather than rolling our own. Benwing (talk) 08:31, 5 August 2015 (UTC)
@Iudexvivorum Thanks for providing this info. Irregular pronunciation was a side question. We still want to transliterate Thai words with irregular pronunciations phonetically. BTW, you can use automatic transliterations for Sanskrit, e.g. करणि ‎(karaṇi), गर्हा ‎(garhā), प्राज्य ‎(prājya), etc. Unfortunately, it seems that some online dictionaries, including thai2english and thai-language.com don't always provide phonetic transliterations or respellings for irregular words. (The latter uses yet another transliteration system, which is great for learning but not good for dictionaries) If I get some words wrong, I'd appreciate your corrections!
@Benwing I favour Benjawan Poomsan Becker's system but it also uses hyphens, like Thai2English. Hyphens can be either removed or added regardless of what system we choose. It's easier to read Thai correctly when syllables are split by hyphens. Initials and finals are pronounced quite differently in Thai, as in many East Asian languages: consonants change pronunciation when they are finals. Specifically, s, ch, j, d, dt, t are all pronounced as a clipped "t" [t̚] when they are finals; p, bp, b, f are all [p̚]; g, k are [k̚]; and n, l and r become [n]. It's important to separate clusters like "kla" from "-k-la", "tra" from "-t-ra", etc. User:Stephen G. Brown also favours using solid words, without hyphens. Both approaches have pros and cons for languages like Thai. Textbooks and dictionaries favour hyphens, sometimes spaces after each syllable.
Shall I make proposed full tables with Benjawan Poomsan Becker's system? --Anatoli T. (обсудить/вклад) 11:39, 6 August 2015 (UTC)
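The neutralisation of final consonants described above amounts to a small lookup table. Here is a minimal Python sketch of it, using only the groupings listed in the comment above; the names `FINAL_CLASSES` and `final_sound` are hypothetical, and any final not listed (for example "m" or "ng") is assumed to be returned unchanged.

```python
# Romanised final consonants grouped by the sound they take in final
# position, as listed in the discussion above.
FINAL_CLASSES = {
    "t\u031A": ["s", "ch", "j", "d", "dt", "t"],  # clipped [t̚]
    "p\u031A": ["p", "bp", "b", "f"],             # clipped [p̚]
    "k\u031A": ["g", "k"],                        # clipped [k̚]
    "n": ["n", "l", "r"],                         # all become [n]
}

# Invert the grouping into a direct letter -> sound lookup.
FINAL_SOUND = {
    letter: sound
    for sound, letters in FINAL_CLASSES.items()
    for letter in letters
}


def final_sound(final: str) -> str:
    """Return the sound a romanised consonant takes as a syllable final.

    Finals not covered by the table are returned unchanged (an assumption
    of this sketch, not something stated in the discussion).
    """
    return FINAL_SOUND.get(final, final)
```

This only makes sense once syllable boundaries are known, which is the argument for hyphens made above: without them, a cluster such as "kla" cannot be told apart from a final "k" followed by "la".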
@Atitarev As for hyphens, I guess I'm used to Pinyin, written without them. But I also kind of would have expected final s, ch, j, etc. to be transcribed as t to follow the pronunciation. But I imagine whatever Becker does should work fine. If dictionaries tend to use hyphens, for example, then that's what we should do. Benwing (talk) 21:35, 6 August 2015 (UTC)
@Iudexvivorum, Benwing I've slowly started using Becker's transliteration, as in เรียก, including a usex, e.g.:
เรียกรถแท็กซี่แล้วยัง? rîak rót tɛ́k-sîi lɛ́ɛo yang? ― Did you call the taxi?
I've also started Category:Thai terms with irregular pronunciations, which I think could be useful. For irregular pronunciations as in ชาติ ‎(châat) I've added a line "Phonetic respelling: ชาด". What do you think? Sorry, I haven't provided a full table for your consideration because I don't know your opinion on the change (see my post above - 12:32, 4 August 2015). --Anatoli T. (обсудить/вклад) 00:49, 11 August 2015 (UTC)
  1. What you've done above looks great! Anyway, "เรียกรถแท็กซี่หรือยัง?" sounds more natural than "เรียกรถแท็กซี่แล้วยัง?". I've edited the entry เรียก. But I haven't provided transliterations (because I don't know how) and I haven't replaced "เรียกรถแท็กซี่แล้วยัง?" with "เรียกรถแท็กซี่หรือยัง?". I hope you will further improve the entry.
  2. I've been waiting for the full table; that's why I didn't give any opinion, lol! I'll also start using the system as soon as possible. And I think, for readers' sake, you should create a page on Wiktionary that contains the table (like the page Wiktionary:International Phonetic Alphabet) and the transliterations should be linked to that page (by means of template or any other means).
--iudexvivorum (talk) 02:12, 11 August 2015 (UTC)
@Iudexvivorum OK, great. I'll make a table and it will make it easy to look up and copy/paste if needed and I'll teach you some tricks to make adding transliterations easier (if you use Firefox, it's even easier). We don't normally link transliterations to templates (just using tr=) but if entries contain more than one transliteration, it could be done, I guess - I will ask for assistance to enhance Thai headword modules/templates. Wiktionary:Thai transliteration and Wiktionary:About Thai will need to be updated. I will try adding new transliterations to your usage examples. You can use the new transliteration "rʉ̌ʉ-yang" for หรือยัง, if you want to replace แล้วยัง with หรือยัง :). BTW, can หรือยัง be considered a single term? Does it need a space instead of a hyphen between the two syllables? I trust your judgement on what sounds more natural, of course, since my Thai is very basic, LOL! --Anatoli T. (обсудить/вклад) 02:50, 11 August 2015 (UTC)
  1. Thank you so much! I've replaced "แล้วยัง" with "หรือยัง".
  2. "หรือยัง", "แล้วหรือ", "แล้วหรือยัง" are generally interchangeable. For example:
    1. "จะไปหรือยัง", "จะไปแล้วหรือ", "จะไปแล้วหรือยัง" = "shouldn't we go yet?"
    2. "ไปได้หรือยัง", "ไปได้แล้วหรือ", "ไปได้แล้วหรือยัง" = "can't we go yet?"
    3. "ไปหรือยัง", "ไปแล้วหรือ", "ไปแล้วหรือยัง" = "hasn't he gone yet?" / "hasn't he left yet?"
  3. Using "แล้วยัง" in a question is rare in the Central Thai dialect, though it would mean the same as the above phrases. But it can be found in the Northern Thai and Northeastern Thai dialects. (In fact, in Northern Thai, "แล้วยัง" is even less common than "แล้วกา".)
  4. I don't think "แล้วยัง", "หรือยัง", "แล้วหรือ", "แล้วหรือยัง" can be considered single terms, just as "should not", "have not", "is not", "are not", etc., are not single terms. (That's why I removed the hyphen from "rʉ̌ʉ-yang".)
--iudexvivorum (talk) 04:03, 11 August 2015 (UTC)

Feedback on alternative layout for Template:de-decl-adj-table

I created an alternative layout for this template, see User:CodeCat/de-adj. The three sections for strong, mixed and weak are now merged into one piece, with the distinction instead shown through columns. Please comment; is it better, worse? Should we use it? —CodeCat 14:50, 3 August 2015 (UTC)

Your table is more compact. On the other hand, the current arrangement with all strong forms in one place, all weak forms in one place, and all mixed forms in one place seems better for what I expect is the main use of the tables: someone has "[definite article] _ [noun]" or "[indefinite article] _ [noun]" or "_ [noun]" (i.e. they know whether they're looking for a strong or weak or mixed form), and they want to know what ending to put on "rot", for the case and gender they're dealing with, when they plug it into that blank. Both online (de.Wikt, Canoo) and print references seem to favour the "all strong (etc) forms in one place" format. Notably, I would expect printed works to prefer a more space-saving compact format if they didn't think there was a compelling reason for the longer format. OTOH, if your table were rotated 90°, it might be compact enough to have the advantage of fitting all on one screen for mobile users (but as it is, I imagine it's still too wide). - -sche (discuss) 00:23, 4 August 2015 (UTC)
The main reason I made it was to show the similarities of forms between strong, weak and mixed declensions. This is something that I personally always struggled with, so I thought a different table layout might help. But I'll leave it then. —CodeCat 00:34, 4 August 2015 (UTC)
A slightly different issue -- surely the order "nom gen dat acc" is unhelpful for German? My German textbooks use "nom acc dat gen", which IMO is far better since nom and acc are so often the same. Benwing (talk) 06:37, 4 August 2015 (UTC)
I agree that this order is more helpful. The order used for old Germanic languages is generally nom acc gen dat, and this is still used for Icelandic. I never saw the point in having accusative fourth; it's "traditional", but traditions are superseded when we realise they're stupid. —CodeCat 20:11, 9 August 2015 (UTC)
Like Benwing, I'd prefer nom-acc-dat-gen. Nom-gen-dat-acc was traditionally the most common order, but I wouldn't mind improving upon tradition, and there certainly are references which have already done so, as Benwing notes; e.g. Günter Kempcke, Wörterbuch Deutsch als Fremdsprache (2000); Paul G. Graves, ‎Henry Strutz, Master the Basics: German (1995, ISBN 0812090012); David Crowner, ‎Klaus Lill Impulse: Kommunikatives Deutsch Fur Die Mittelstufe (1998, ISBN 0395909341); Karsten Fink, Workbook Deutsch: Das Übungsbuch zu Eine wesentliche Grammatik (2014); and even Robert P. Ebert, ‎Oskar Reichmann, ‎Hans-Joachim Solms, Frühneuhochdeutsche Grammatik (1993), which all use Nom-Akk-Dat-Gen order. - -sche (discuss) 20:34, 9 August 2015 (UTC)
Time for a proposal then? I wouldn't mind one for Latin either to be honest, but Latin tends to be full of tradition freaks... x.x —CodeCat 20:57, 9 August 2015 (UTC)
My only objection is that I am so used to nom-gen-dat-acc that I get confused every time I see nom-acc-dat-gen. But I'll get over it if it's really a better order and we start using it more. Whichever order we choose though, we should try as much as possible to use it consistently not only within languages, but across all languages. --WikiTiki89 17:46, 10 August 2015 (UTC)
Heh, I have the reverse problem. (*looks at second row of inflection table* "what?! there's no way that's the accusative form..." *looks at legend* "oh, it really isn't.") I don't think all languages can necessarily be handled the same; perhaps for some (e.g. Latin) there really is a case for nom-gen order, while for others we already use nom-acc order (e.g. Proto-Germanic, Middle Dutch). I'd rather handle German first and worry about unrelated languages I don't speak later (e.g. Finnish, which uses nom-gen-part-acc, in contrast to Hungarian which uses nom-acc-dat). - -sche (discuss) 19:11, 10 August 2015 (UTC)
I agree with User:-sche that we should do one language at a time. Different languages may have different orders that make the most sense, and also there's the issue of tradition -- German textbooks often prefer nom-acc-dat-gen but Old English textbooks use nom-acc-gen-dat. Sanskrit has a traditional order nom-voc-acc, inst-dat-abl, gen-loc which makes total sense for Sanskrit (and for PIE, and it looks like we indeed use it for PIE) but for Latin the order that makes the most sense might be something like nom-voc-acc, gen-dat, abl-loc, which is similar but moves the genitive. Lithuanian seems to have its own order nom-gen-dat-acc-inst-loc-voc and people working on it might object to changing the order (although personally I think the first two should be nom-voc because they're the same in the plural). Benwing (talk) 01:32, 11 August 2015 (UTC)
For Slovene, the traditional order is nom-gen-dat-acc-loc-ins, but on Wiktionary that's changed into nom-acc-gen-dat-loc-ins. So here, too, genitive precedes dative. For IE languages with a vocative, the order should indeed be nom-voc-acc, like for Proto-Germanic. Balto-Slavic languages tend to put the vocative last; for Proto-Slavic and Proto-Balto-Slavic we currently use the order nom-acc-gen-loc-dat-ins-voc. —CodeCat 01:40, 11 August 2015 (UTC)
Russian seems to do nom-gen-dat-acc-ins-prep which reverses the order of the last two from Slovenian (since "prepositional" is really the locative case). But it would make a lot more sense to move the acc to come after nom, like we do for Slovenian, since the acc is usually the same as either nom or gen (presumably Slovenian is like this too). I guess the point is that the most appropriate order depends somewhat on the language ... for German, acc-dat-gen makes sense since dat and acc are often the same but gen is different, whereas for Russian, acc-gen-dat makes sense since gen and acc are often the same. Benwing (talk) 08:57, 12 August 2015 (UTC)

Deletion of inflected forms

I see an editor deleting inflected form entries that use {{inflected form of}}, including kveldi, kljenuta, and κυκλῶν. Do we want this? I don't. --Dan Polansky (talk) 23:15, 3 August 2015 (UTC)

Most uses of the template are gone now, via Special:Contributions/MewBot and its e.g. "Rename inflected form of > lb-inflected form of for Luxembourgish entrie" or "Rename inflected form of > yi-inflected form of for Yiddish entries".

I ask that the bot be immediately blocked for a gross violation of WT:BOT and that it remain blocked until the changes are undone. (I might as well talk to a tree, I guess.) --Dan Polansky (talk) 23:24, 3 August 2015 (UTC)


shows that the bot made more than 5000 edits to remove {{inflected form of}}, at the rate of approximately 60 edits per second. --Dan Polansky (talk) 23:30, 3 August 2015 (UTC)

I think you mean minute. DTLHS (talk) 23:36, 3 August 2015 (UTC)
Yes, my mistake. --Dan Polansky (talk) 23:41, 3 August 2015 (UTC)
The change to kveldi looks correct; {{inflected form of}} should be avoided in favor of specifying the actual inflection, which is what was done here. But I totally disagree with simply deleting the pages that use this template, as in kljenuta and κυκλῶν. They should be left alone until someone manages to fix them up to specify which inflection is involved. As for templates like {{de-inflected form of}} instead of the generic one, I'm not sure the point of them, but I imagine CodeCat can explain, and at least there is no loss of information. Benwing (talk) 06:25, 4 August 2015 (UTC)
I agree that these deletions are not okay, and CodeCat should recreate all the entries she has bot-deleted for this reason. —Μετάknowledgediscuss/deeds 06:29, 4 August 2015 (UTC)
Just so we're on the same page, "all the entries she has bot-deleted" = zero entries, and she only deleted three by hand (kveldi, kljenuta, and κυκλῶν). The bot work consisted of switching German uses to {{de-inflected form of}} (which was proposed on the 22nd, met with agreement from a German speaker on the 23rd, and thereafter met with silence until after the changes had been made; only then did someone object) or relatedly switching Yiddish and Luxembourgish uses to corresponding templates. The fact that more languages than were initially thought use {{inflected form of}} may mean we want to go back to the general-purpose template and use langcodes, rather than using language-specific templates — if so, we can do that, since nothing was deleted, but rather only renamed. - -sche (discuss) 08:20, 4 August 2015 (UTC)
Someone else has restored kveldi and I've restored κυκλῶν and made it more precise than it was. I've left kljenuta deleted since if the declension table at kljenut is right, kljenuta isn't a form of it. —Aɴɢʀ (talk) 10:33, 4 August 2015 (UTC)
Thanks for the clarification, -sche. Dan Polansky's wording was evidently intentionally misleading, but my faulty assumptions derived therefrom aside, I still do not support those deletions without process. —Μετάknowledgediscuss/deeds 16:15, 4 August 2015 (UTC)
I apologize to anyone who was misled by my wording. I should have already been fast asleep at the time when I posted the initial post here; 23:30 means it was 1:30 CET, summer time. --Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
  • Manual creation of a subset of word's inflected forms should be banned, and such entries deleted. Making such entries only complicates botting the rest of the inflection in the future. Too much time is wasted cleaning up such entries. If you are creating inflected forms manually either create it entirely for a lemma, using one and only one template, or don't create it at all. --Ivan Štambuk (talk) 09:13, 4 August 2015 (UTC)
    No it shouldn't, and no they shouldn't. I don't know how to use a bot, and I don't always have the time to create entries for all the inflected forms. I often create entries only for those inflected forms that already exist as spellings in other languages. For example, if some random Irish or Old Irish verb form happens to share a spelling with an existing Spanish entry, I'll create the Irish form there, but I won't bother creating brand-new entries for all the other forms of the verb. In other words, I'll work to remove orange links from inflection tables, but not (always) black/red ones. —Aɴɢʀ (talk) 10:39, 4 August 2015 (UTC)
    Extinct languages like Old Irish which have irregular paradigms and limited attestation of inflection should of course be manually treated. But for living languages that don't have such issues you are just creating more cleanup in the future. Blueing orange links seems to me the only valid reason to do so (convenience over thoroughness). --Ivan Štambuk (talk) 10:23, 5 August 2015 (UTC)

Some relevant data:

  • There were 45419 uses of {{inflected form of}} on a definition line on 2014-07-28. I used the following Windows command line to ascertain that: find /c "# {{inflected form of" enwiktionary-20140728-pages-articles.xml
  • {{de-inflected form of}} was created on 3 August 2015‎ by CodeCat. --Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
  • AWB shows 25000 uses of {{de-inflected form of}} as of now, but there is probably a limit of 25000 built into AWB. I hazard a guess that almost all uses of {{inflected form of}} were replaced with {{de-inflected form of}}.

--Dan Polansky (talk) 10:01, 8 August 2015 (UTC)
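For anyone who wants to repeat the count above without the Windows `find` command, here is a minimal Python sketch that does the same thing: it counts the lines of a dump that contain a given string. The function names are hypothetical, and the dump filename in the usage note is simply the one quoted in the command above.

```python
from typing import Iterable


def count_lines_containing(lines: Iterable[str], needle: str) -> int:
    """Count the lines that contain `needle` (case-sensitive).

    This mirrors `find /c`, which reports the number of matching lines
    rather than the number of individual occurrences.
    """
    return sum(1 for line in lines if needle in line)


def count_in_file(path: str, needle: str) -> int:
    """Stream a possibly very large file line by line and count matches."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_lines_containing(handle, needle)
```

A call such as `count_in_file("enwiktionary-20140728-pages-articles.xml", "# {{inflected form of")` would then correspond to the command quoted above, assuming the dump has been downloaded and decompressed locally.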

{{ux}} in Eastern Mari?

Recently, CodeCat (talk · contribs) changed the format of the examples in Eastern Mari лум, inserting {{ux}}. The result looked like this: [3]. I certainly understand the need to use standard templates, but the resulting format was much less compact and less practical: three lines per example (including transliteration). Since I thought one line per example would be nicer on the eye and easier for anyone actually interested in seeing how the word can be used, I reverted her change. But I wondered if it wouldn't be possible to change said templates (or create a new one) that has the one-line-per-example format, and keep using it. Would that be a problem to anyone? Is there a reason why the three-line-per-example format should be preferred to the one-line-per-example one? --Pereru (talk) 20:02, 4 August 2015 (UTC)

@Pereru: Just add the parameter |inline=1 to the {{ux}}/{{usex}} template. --WikiTiki89 20:13, 4 August 2015 (UTC)
OK. Now, can this be the standard format? Or is there any reason to prefer the three-line-per-example format? Or is this up to every Wiktionarian to decide? --Pereru (talk) 20:20, 4 August 2015 (UTC)
The reason is that most usage examples are much longer and wouldn't fit well on one line. It's the short ones that are the exception. --WikiTiki89 22:14, 4 August 2015 (UTC)
I think this should be automated in some way. Once the length exceeds a threshold, put it on multiple lines, otherwise keep it on one line. —CodeCat 20:25, 4 August 2015 (UTC)
It's hard to determine length other than by counting characters, which is not so accurate. I think it is better to leave it as is. Perhaps we can make it easier by having a template such as {{uxln}} or {{ux1}} which would effectively be a redirect to {{ux|inline=1}}. --WikiTiki89 22:14, 4 August 2015 (UTC)
That's probably better than having such a parameter. But there is an alternative to counting characters: CSS layout. I'm not sure if it's feasible, but at least the client-side stuff knows exactly how wide text is, and it can overflow when necessary. —CodeCat 22:17, 4 August 2015 (UTC)
But it would also need to hide the dashes when it overflows. How would you do that? Also semi-relatedly, |tr=- doesn't work to hide the transliteration in {{ux}}. --WikiTiki89 22:35, 4 August 2015 (UTC)
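The character-counting approach mentioned above could look roughly like the following Python sketch. It is only an illustration of the threshold idea: the function name `choose_layout` and the default limit of 80 characters are assumptions, not anything the {{ux}} template or its module currently does, and as noted above a character count is only a rough proxy for rendered width.

```python
def choose_layout(example: str, transliteration: str = "",
                  translation: str = "", max_chars: int = 80) -> str:
    """Pick 'inline' or 'multiline' from the combined character count.

    `max_chars` is an arbitrary illustrative threshold. Characters differ
    in rendered width, so this can misjudge some scripts.
    """
    total = len(example) + len(transliteration) + len(translation)
    return "inline" if total <= max_chars else "multiline"
```

A client-side CSS solution, as suggested above, would avoid the width guess entirely, at the cost of having to handle the separating dashes when the text wraps.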

Sourcing etymologies?

Recently, in Latvian ūdrs, I reverted a change that introduced a Proto-Baltic reconstruction in the Etymology section, without proper sourcing. Given the way the text was written, it would seem that the Proto-Baltic proposed reconstruction came from Karulis' Latviešu Etimoloģijas Vārdnīca, when in fact it came from an as yet unpublished article by R. Kim. I changed the format, to make it clearer where the Proto-Baltic form was taken from. Can't we perhaps agree on a general policy for Etymology sections whereby we try to explicitly source what is what -- so that, if two protoforms from different sources are cited, the reader can know which was proposed by which source? The format doesn't have to be the one I used in ūdrs, of course, but it would be nice to have something that would avoid this kind of confusion. A second, unrelated question is whether unpublished sources should be accepted in Wiktionary. I'd say no: let it be published before it can be cited here. But I don't know what the others here think. --Pereru (talk) 20:19, 4 August 2015 (UTC)

My general issue with your etymologies is that they're huge blocks of text. They need to be structured better in order to be readable. The long list of cognates is not necessary either, especially if we already have PIE pages and, more recently, categories to hold them. At the very least, they should be made collapsible or presented in a separate paragraph to make the rest easier to read.
This is of course not the problem I wanted to talk about here, but OK, there we go...
The 'huge blocks of text' are necessary when the etymology is not simple, or is disputed, or involves changes, semantic or otherwise, that are not obvious, as in liegt. When the etymology is simple -- just PIE to PB to the word, without semantic changes, as in acs, you have only one short sentence. I suppose your problem here is how much information should be given: should there be only a reference to the etymon, with no indication of how you got from that form and from that meaning to the current state? Or should more information be provided? I, for one, favor the latter, because this extra information is important to judge and accept the etymology, and is part of the history of the word, which is what the etymology section is about. It is also often interesting and brings new light to the understanding of the word, as several other people here told me when commenting favorably on the 'huge blocks of text' that you dislike. Call that 'humanistic etymology' if you will.
I don't have anything against presenting a lot of information. My problem is more the way it's presented. One giant paragraph doesn't invite the user to read it, and instead they'll just go tl;dr at it. If I want to know, at a glance, what the origin is, I don't want to have to read through a lot of blabber to get to the point. So what I would suggest is to write etymologies that focus first on the known and reconstructed history, and leave the details until later. That way, people who aren't interested in the extra details can skip them, rather than having to sift through. Make the information that users want more accessible by splitting it. —CodeCat 21:26, 4 August 2015 (UTC)
Most users don't want to look at etymologies, they just want to see what the word means; so they won't read the etymology (or the alternative forms, or the pronunciation) at all. If they glance at the etymology section, they're as likely to go tl;dr at mysterious cabalistic symbols like *h₃ḗHḱ-ō as they are at longish texts. Only if they are interested will they read it. Interestingly, the information I present is already in the format you suggest: the very first sentence gives the PB and the PIE etymon, you don't have to read any further than that. Perhaps the only necessary change here is to add a carriage return after that first sentence, to put the rest of the information in a separate paragraph? --Pereru (talk) 22:23, 4 August 2015 (UTC)
The list of cognates is less relevant, I agree. The only problem is that different sources often quote different cognates, and this may be a problem. One solution is to bypass cognates altogether, but this only works for the (relatively few) 'famous' words or roots that already have reconstructed forms here at Wiktionary (where one can add cognates and refer to the specific sources that mention them). But over 90% of Latvian words for which Karulis' LEV gives etymologies are not in this category: rather, they are words with only a couple of cognates, mostly in Baltic (e.g. liegt) or maybe a couple of other non-Baltic languages. It will be a long time before those etyma have Wiktionary pages, so eliminating these cognates looks like a bad idea. I would agree, though, with the cases in which there is already a good Appendix page with the etymon (as long as different cognates proposed by different sources are clearly distinguished there). Do you have one such example, so we could discuss the format further?
If sources conflict, then Wiktionary has to find a compromise through the usual consensus process. Consensus may invalidate some sources or even all of them, or choose a particular one that seems most usable by the people discussing the matter. —CodeCat 21:28, 4 August 2015 (UTC)
Sure. Let it happen, then. The LEV, for instance, cites cognates that are not cited in some Wiktionary reconstructed entries; should I add them? Or should I start somewhere a discussion about whether or not to do this? Or whether or not the LEV is a good source? And, if so, where? --Pereru (talk) 22:23, 4 August 2015 (UTC)
Showing different takes on the issue by different people is good. I think the best way to present it would be through an unordered list. See for example *fanhaną. —CodeCat 20:24, 4 August 2015 (UTC)
Back to the problem at hand. Yes, that would be good, so separate paragraphs for your PBS etymologies (with correct sourcing) might be a good idea. You could start such paragraphs with 'According to a different source,...' and then add the information. Or you could mention the forms with a footnote to the source, as I did in ūdrs. Either way would be OK with me, as long as the wording is fluent and there is no confusion as to what comes from where. What I would disagree with is what you did before: just adding a form with no sourcing to a text that is itself attributed to a specific source, as if that form also came from the same source (i.e., your original PBS etymon at ūdrs looked like it came from Karulis' LEV, when in fact it came from Kim's unpublished paper).
Besides, note that English fang (from *fanhaną) -- where you find one of those 'huge blocks of text' you so much dislike -- does NOT mention the two proposed PIE etyma mentioned under *fanhaną: rather, it only mentions the first one, and without references to sources. So the information under fang is misleading at best. Shouldn't such things be changed in a more principled way, so that a reconstructed entry does not seem to be in contradiction with the information found in the etymology section of one of its reflexes? --Pereru (talk) 21:23, 4 August 2015 (UTC)
And I would add that I don't think it's a good idea, in principle, to cite unpublished sources. (But maybe Kim's paper has already been published? It was going to come out in a Handbook, as I recall; maybe it is already there?) --Pereru (talk) 21:12, 4 August 2015 (UTC)
This is the problem with the paragraph approach that you use. You source the whole paragraph, which makes it impossible for anyone to make adjustments to the text. Any edits make it no longer faithful to the source. Instead what should be sourced is individual facts. That way, people can add or change things without invalidating the references. Again, splitting etymologies into separate sections with paragraph breaks and lists should help with that. Again look at *fanhaną: each list item has its own separate sourcing.
Yet I did source other forms in ūdrs, for example, so that it is clear that the PBS form is not from the LEV; just put the footnote next to the material from the other source, not at the end of the paragraph. Why not make it standard practice? Another possibility is simply to start a new paragraph with a different source, perhaps starting with "A different source claims that..." or something similar. So this isn't a problem. --Pereru (talk) 22:23, 4 August 2015 (UTC)
I'm not sure what you mean by unpublished sources. If they are available, then they are public, right? —CodeCat 21:26, 4 August 2015 (UTC)
An unpublished article has not yet passed peer review. It may be complete nonsense, or more likely it may have a few minor errors that will be corrected before publishing. --WikiTiki89 22:20, 4 August 2015 (UTC)
Nowadays everything is on the internet: manuscripts, unpublished sources, papers at various levels of completion... because we always want to invite comments from other interested researchers, comments that may improve a paper even before it's completely finished (academia.edu is a great site for this, as are individual researchers' pages at their institution website). When a paper is published, however, it is officially released, be it on paper, be it in a publishing website. After that, it can no longer be edited or altered; and the year of its publication becomes fixed. Also, a published paper went through a refereeing process in which it was read and commented upon by two or three of the author's peers; an unpublished paper, of course, didn't. So the gist of it is that an unpublished paper is (supposed to be) less good and less final than its published version. Its author, for instance, wouldn't like you to cite an unpublished version if there is a published one already. (Kim's paper states quite clearly -- at the end, I think -- that it is an unpublished version, to appear in a Handbook of something or other). --Pereru (talk) 22:23, 4 August 2015 (UTC)
Different pages conflicting on each other is an unfortunate effect of how Wiktionary works. There's not much that can be done about it other than checking and updating things regularly. I would say that generally, the reconstruction pages are more reliable than the etymologies within entries, as they've been created and reviewed by more knowledgeable editors. Etymologies in entries often tend to be copied from just one source, often an outdated or nonspecialised one. They are then inserted into entries by editors who are relatively inexperienced with such matters, so that they are not able to spot and correct problems in their sources. And then, when new entries are created for cognate terms, then the etymologies are just copied over. This tends to propagate old/bad etymologies. And it's one of the reasons I prefer keeping etymologies to a bare minimum and letting the proto-language pages handle the rest. —CodeCat 21:33, 4 August 2015 (UTC)
This is, again, difficult for words that have a more complicated history, as I mentioned above. For such words, their etymology section is the only place where, say, discussing a strange semantic evolution or comparing two or three different etymologies is logical: after all, in the reconstructed entries, you are not interested in the details of the semantic evolution of one reflex in one sub-branch of the family (I haven't seen a single reconstructed proto-entry here that does that); rather, the focus is on the reconstructed protoform and how it fits in the proto-system. So I think you would lose more than gain by doing that. The only thing that I would indeed relegate to the reconstructed entries is the list of cognates -- assuming that we can source cognates that occur in only one source, for instance.
And here's a final thought: if inconsistencies are unavoidable at Wiktionary, if no policy can be devised to address them, then we're basically giving up on the idea that Wiktionary can become a quality work. No -- I'm sure something can be done. Wikipedia found solutions, so can Wiktionary. --Pereru (talk) 22:23, 4 August 2015 (UTC)
  • I support banning original research with reconstructions in etymologies, as well as inventive editorial corrections, such as how "ū́drā́-" (the form cited in the article by R.K.) became "ūdrāˀ" (which is what CodeCat inserted in the etymology). Additionally, for protolanguages, when there is no accepted general framework, which is the case with Proto-Baltic/Proto-Balto-Slavic, all of the competing theories should be presented on an equal footing. That means that there can be no single and "true" reconstruction, and that there could be multiple inflection tables for a word according to different sources. --Ivan Štambuk (talk) 10:19, 5 August 2015 (UTC)
Agreed. And since the number of reconstructed entries in Wiktionary is not so high, this is probably quite feasible, isn't it? Shouldn't for instance the page *ūdrāˀ be moved to *ū́drā́-, then? Or does CodeCat have another source that references the form she prefers? --Pereru (talk) 13:46, 5 August 2015 (UTC)
I oppose a ban on editorial corrections; to fail to harmonize notation schemes is misleading. In both Menominee (living language) and Proto-Algonquian (reconstructed language), for example, most people notate long vowels like a·, but some people write a: or ā. To have individual words/forms in different systems based on who attested the particular word/form (e.g. fooba·r, plural foobārs) would confuse readers into thinking the vowels were of some different quality. - -sche (discuss) 17:24, 5 August 2015 (UTC)
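To make the harmonization idea concrete, here is a rough sketch in Python of what "simple equivalent notations" could mean in practice: vowel plus a length mark is rewritten as vowel plus macron. The list of length marks, the vowel set, and the function name are assumptions made for illustration only; this is not an existing Wiktionary module, and it only covers the case where the notations differ in nothing but the length symbol.

```python
import unicodedata

# Assumed set of length marks treated as equivalent (illustrative only):
# middle dot, IPA length mark, ASCII colon.
LENGTH_MARKS = ("\u00b7", "\u02d0", ":")
VOWELS = "aeiouAEIOU"
MACRON = "\u0304"  # combining macron


def harmonize_long_vowels(form: str) -> str:
    """Rewrite vowel + length mark as vowel + macron, e.g. 'a' + middle dot -> 'ā'."""
    out = []
    for ch in form:
        if ch in LENGTH_MARKS and out and out[-1] in VOWELS:
            # Attach a macron to the preceding plain vowel and compose it (NFC).
            out[-1] = unicodedata.normalize("NFC", out[-1] + MACRON)
        else:
            # Anything else, including a length mark after a non-vowel, is kept.
            out.append(ch)
    return "".join(out)


print(harmonize_long_vowels("fooba\u00b7r"))  # the made-up example form from above
```

A conversion like this is only safe where the schemes really are one-to-one; where sources disagree about length or quality, the forms would have to stay distinct.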
I agree with -sche here; notation schemes should be harmonized to the extent that this is a simple case of equivalent notations. As for "ū́drā́-" vs. "ūdrāˀ", it's not obvious to me what's going on here. Do the two acute accents indicate Balto-Slavic acute? If so, then it's fine to convert them to use the superscript glottal stop, which can be viewed as simply another way of indicating the BS acute register -- the fact that it expresses an opinion as to how that register was phonologically realized is irrelevant here. But then shouldn't it be "ūˀdrāˀ"? Benwing (talk) 07:16, 6 August 2015 (UTC)
Using acute accent marks to indicate the acute is actually very misleading, because Proto-Balto-Slavic also had a proper phonemic word accent like that of PIE. We should definitely use the same symbol, ´, to denote the accent in both of them. Anything else would just be unnecessarily confusing. That said, it does seem that there is somewhat of a linguistic consensus that the acute register involved some kind of glottal feature. The Latvian broken tone is a direct continuation of the acute, and is realised as glottalisation. So if there is any serious disagreement among linguists about the approximate nature of the acute, then I would like to hear about it. —CodeCat 00:42, 7 August 2015 (UTC)
@CodeCat OK, I think I agree with you here, but what I don't understand is why you didn't write "ūˀdrāˀ" rather than "ūdrāˀ". Isn't the acute register on both syllables? And where's the stress? Benwing (talk) 10:09, 7 August 2015 (UTC)
You're right, I moved the page. But I wonder why the masculine form *udras doesn't have an acute, at least according to the source Pereru gave. Did Winter's law skip that word or something? —CodeCat 12:06, 7 August 2015 (UTC)
I think it does have an acute, it's just mis-written. The Latvian descendant has a long broken-tone vowel, and AFAIK broken-tone is descended from an unstressed Balto-Slavic acute vowel (one of the other two tones reflects a stressed acute vowel, I think, but I forget which one). Benwing (talk) 12:17, 7 August 2015 (UTC)

Sourcing etymologies bis: a proposal

Well, here is a modest proposal for sourcing (and otherwise formatting) etymologies in etymology sections:

  • For "simple" etymologies (A < proto-B AA' < proto-C AAA),

(a) State in the first sentence what the path is from the current form to the oldest protoform you want to cite ('From proto-B AA, from proto-C AAA'). Make it a separate paragraph.
(b) Further information (semantic evolution, irregular transformations, etc.) can be described in the following paragraph, if need be, as succinctly as possible.

  • For "complicated" etymologies (there are several suggested paths or etyma):

(a) Start with "There are (two, three, several) proposed hypotheses:";
(b) State each hypothesis in a single sentence in a separate paragraph, starting with a letter -- (a), (b), (c), etc. -- to identify the hypothesis;
(c) If further information is necessary on a given hypothesis, add it in a separate paragraph after all the hypotheses, referring back to it by its letter.

  • Cognates would be listed, in full agreement with the source (i.e., no tampering with the data!) in a separate paragraph at the end. If one of the protoforms (preferably the oldest) already has a good, consensus-approved entry in the Appendix, then move all cognates to that entry, making sure that each cognate is duly and correctly sourced. (This is not the current state in most reconstructed entries here, and those interested in entering protoforms should add their sources.)
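As a rough illustration of the layout proposed above, here is a small Python sketch that assembles an etymology section from a list of hypotheses. The function name, the input shape, and the "On (a):" and "Cognates:" wording are assumptions made for the example; it also writes the number of hypotheses as a numeral rather than a word, which is a simplification.

```python
import string


def format_etymology(hypotheses, cognates=None):
    """Lay out an etymology as separate paragraphs.

    hypotheses: list of (summary, details) pairs; details may be None.
    cognates: optional list of strings, placed in a final paragraph.
    """
    paragraphs = []
    if len(hypotheses) == 1:
        # "Simple" case: the path first, further information afterwards.
        summary, details = hypotheses[0]
        paragraphs.append(summary)
        if details:
            paragraphs.append(details)
    else:
        # "Complicated" case: lettered one-sentence hypotheses, details later.
        paragraphs.append(f"There are {len(hypotheses)} proposed hypotheses:")
        labels = string.ascii_lowercase
        for label, (summary, _) in zip(labels, hypotheses):
            paragraphs.append(f"({label}) {summary}")
        for label, (_, details) in zip(labels, hypotheses):
            if details:
                paragraphs.append(f"On ({label}): {details}")
    if cognates:
        paragraphs.append("Cognates: " + ", ".join(cognates))
    return "\n\n".join(paragraphs)


print(format_etymology(
    [("From proto-B AA, from proto-C AAA.", "Note on the semantic shift."),
     ("From proto-D BBB.", None)],
    cognates=["cognate-1", "cognate-2"],
))
```

The point is only that the short path statement always comes first, so a reader who wants the origin at a glance never has to read past the lettered lines.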

What do y'all think?

I'm impressed with the detail you put into ūdrs. My only suggestion would be to put the cognates into a separate paragraph to avoid the "wall of text" feeling. It sounds like you're in agreement with this. Benwing (talk) 08:37, 5 August 2015 (UTC)
Is this arrangement in ūdrs (a carriage return between the two paragraphs) what you had in mind? --Pereru (talk) 14:05, 5 August 2015 (UTC)
I don't like how you are duplicating the cognates in ūdrs. They are already listed in *udrós. The Latvian page is not the proper place to discuss the development of Latin lutra. --Vahag (talk) 16:38, 5 August 2015 (UTC)
In this case I actually agree. But before removing them, we need to solve inconsistencies. So there are cognates in my source that aren't mentioned in *udrós. Should I copy them and source them there? How about the fact that my source mentions a Proto-Baltic form, whereas *udrós lists only Proto-Balto-Slavic? I agree that basically cognates (at least for the 'richer' words with cognates in many branches) should be in the reconstructed entry page, but we need to know which forms should be there, from which sources... or else we simply don't know what kind of information we have there. In ūdrs, at least I know who made the claim and where.
I made a first attempt to change *udrós, introducing information from the Latvian source and footnoting it. I don't quite like the look of the result, but it's a first attempt. Any thoughts? --Pereru (talk) 08:43, 6 August 2015 (UTC)
Yes, you can add the cognates to the proto-entry and source them there. That way the information from LEV can be enjoyed by everyone, not just the viewers of the Latvian page. For the format of referencing individual descendants you can look at *tep-. As to which descendants should be there and from which sources, I think at first all descendants from all sources can be added. If people have objections, a centralized discussion will happen on the talk page of the proto-entry or in WT:ES. The bad cognates from outdated sources will be eventually weeded out. That will not happen if you keep the information on the Latvian page. --Vahag (talk) 09:24, 6 August 2015 (UTC)
Well, @Vahagn Petrosyan:, I did add LEV cognates to the list, but my changes at *udrós were reverted without explanation (diff). Unless this is better explained, so that I can know what is going on, what is the point of adding cognates there? It seems safer to leave them on the Latvian page...--Pereru (talk) 01:31, 7 August 2015 (UTC)
I gave an explanation, so did you just choose to not read it? —CodeCat 01:56, 7 August 2015 (UTC)
Reverting good faith edits is not cool, CodeCat. You did the same to me recently.
There is no accepted format for listing both Proto-Balto-Slavic and Proto-Baltic. Pereru, can't you list the cognates under Balto-Slavic only and still reference LEV? Sure, LEV says "from Proto-Baltic", but we understand that in essence what he is saying is that ūdrs is from PIE *udrós, whatever the intermediate details. When my dated Armenian equivalent of LEV says հոտ ‎(hot) is from PIE *ōd-, I understand that I should list it under modern PIE *h₃ed- and still reference my old source. I have seen it done by academic scholars. Martirosyan 2010 can write that a source in 1920s derives a word from such-and-such PIE root and use modern reconstruction for that root. It seems to me that you are trying to give a literal translation of LEV in Wiktionary. That is a job for Wikisource, not Wiktionary. The best practice is to synthesize sources new and old under the light of modern knowledge. --Vahag (talk) 09:43, 7 August 2015 (UTC)
I agree with Vahag that User:CodeCat probably shouldn't have reverted that change, and should definitely have given a better explanation than "this just looks ugly". I can understand CodeCat's objection to the form ū́drā́ with acute accents indicating the acute register (i.e. it conflicts with the conventional use of accents to indicate stress, which is also phonemic in Balto-Slavic, and it's inconsistent with the way other Balto-Slavic entries have been formatted in Wiktionary [granted, it was CodeCat doing that formatting]), but in that case, she should have just undone that one change, with explanation, rather than the whole thing. I also agree with Vahag that we should feel free to modernize/canonicalize proto-forms and such. Benwing (talk) 10:05, 7 August 2015 (UTC)
This is not about canonicalization. Those glottal stops are phonemes on their own in the reconstruction of Proto-Balto-Slavic by the Leiden School, and according to it only after the parent language disintegration did individual branches develop their own acute/circumflex distinctions. The notation with acute accents by R. K. is an entirely different reconstruction, where acute accent marks indicate intonation/tone. Those two also have different originating points - the glottalic theory of PIE vs. the standard PIE Frankenstein's monster with laryngeals, genders and thematic inflection existing contemporaneously. You can't mix those two notations, because they refer to two different protolanguages, in two different chronological stages. There are also other differences that go beyond mere character substitutions. --Ivan Štambuk (talk) 12:19, 7 August 2015 (UTC)
I think my point still stands, though, that these can be viewed as equivalent notations, with acute register vs. non-acute register marked either by acute accent vs. tilde (or circumflex) accent or by presence or absence of superscript glottal stop, without necessarily committing to a phonological interpretation of the notation. As long as it's agreed that there was a two-way register distinction -- regardless of whether that is interpreted as tonal, as glottal, or whatever -- then the notations are equivalent in that you can convert from one to the other without loss of information, and we may as well be consistent. Benwing (talk) 12:53, 7 August 2015 (UTC)
What does the "two-way register distinction" actually mean? It's a meaningless notion, vague and abstract. Those symbols mean different things in different protolanguages. Leiden School theory also has short *o and *a, and different assumptions on Auslautgesetze and paradigms leading to different endings and forms in inflections. Also, some of the origins of the glottal stop or "acute" are disputed (Winter's law formulation, long/hyperlong vowels), which in particular renders the acute accent notation inapplicable, whereas with the glottal stop you can just use parentheses as is the customary notation for optional parts of reconstruction. Lastly, the superscript notation is biased as to the phonation character of what you call the Balto-Slavic "acute register" - there are different theories (rising/falling tone, glottalization/stod). It's best not to mix those two protolanguages, and use two different reconstructions. There are some Proto-Slavic appendices that already do it like that. --Ivan Štambuk (talk) 13:32, 7 August 2015 (UTC)

"Register distinction" is an abstract way of referring to a distinction with unknown phonetics, but it's certainly not meaningless, any more than the three laryngeals of PIE, which are equally abstract. No one would have any problem regularizing e.g. Ringe's laryngeal notation, where he writes something like ç x xʷ in his Tocharian book, into more standard h₁ h₂ h₃, even though they may have a completely different interpretation of what these symbols mean phonologically. Differences that cannot be treated as notational variants, e.g. differences in which register or vowel length is reconstructed in a particular word, or in numbers of vowels, obviously shouldn't be confounded, but when there's an equivalence to be made between notational variants I don't see the point of not making it. Benwing (talk) 17:00, 7 August 2015 (UTC)

But the difference is that the glottal stop is not a mere "abstract register". It's a phoneme with a very specific phonetic value. Nobody disputes the phonemic status of PIE laryngeals. The differences between the protolanguage described by Ronald Kim (which has no glottal stop as a phoneme, and "acute" is a property of certain vowels) and the one of the Leiden School are irreconcilable and these two should not be mixed. This "canonicalization" is a thinly-veiled attempt at giving undue prominence to certain theories at the expense of others. It should be resisted and denounced. --Ivan Štambuk (talk) 17:09, 7 August 2015 (UTC)
The difference you mention is nothing more than relative chronology and allophony. Compare the sequence -Vnh- in Proto-Germanic. It eventually gave way to -Ṽ:h-, with a long nasal vowel. It doesn't matter in the slightest whether we write one or the other, because they represent the same phonological units. It's merely a matter of phonetic interpretation, but notation certainly does not have to indicate any particular phonetic reality. The same applies here with the acute. The interpretation in which there is an actual glottal stop, and the one in which there is merely glottalisation or some other change in the vowel, are different interpretations of the same phonological units. How you interpret it phonetically, or denote it in writing, is completely irrelevant to what it is. It's the acute, nothing more, nothing less. Whatever symbol we choose to show that it's there is equally valid, because it's just a symbol that says "acute is here". —CodeCat 19:00, 7 August 2015 (UTC)
I am not trying to give prominence to one theory over another. I'd be just as happy if you denote the acute with a superscript A, and the non-acute with a superscript B (or vice-versa). That makes it obvious that we're talking about what is ultimately an abstract register difference. It is entirely analogous to the situation in Old Chinese, where everyone agrees there was a distinction between "Type A" and "Type B" syllables but no one agrees what the relevant feature was. Some think type B syllables have an extra /j/ phoneme before the vowel, some think type A syllables have pharyngealization of the syllable-initial consonant, some think the difference is vowel length, some think it's a phonation difference (creakiness/breathiness/whatever), etc. But these theories are hardly irreconcilable just because of this. My concern is not to favor one theory over another but to avoid needless complication introduced by notational differences. Since you don't seem to ever believe in canonicalizing notations, we might end up just having to agree to disagree. Benwing (talk) 19:45, 7 August 2015 (UTC)
Two points that have been made in previous discussions: one, we have to recognize that many obvious derivations are not cited, e.g. it is unlikely that a dictionary has gone through Spanish's (or, even more likely, Rumantsch's) massive corpus of words inherited from Latin and noted, in each and every case "yep, this one too was inherited from its obvious Latin predecessor, rather than, like, borrowed from Welsh or something". The same sort of research we undertake to determine what words mean, and e.g. how they inflect (in contradistinction to what scholars and authorities think they mean and how scholars and authorities think they inflect), will sometimes be necessary when tracing etymologies. Two, it would be misleading and foolish not to allow for standardization of notation schemes, as I note above. - -sche (discuss) 17:36, 5 August 2015 (UTC)
@Pereru There's still a bit of a "wall of text" effect, since the paragraph break is barely visible. If it could be set off better, I think people would object less. An alternative is to just list a few cognates, the subjectively most "interesting" ones (e.g. Greek, Sanskrit) and put the rest on a reconstruction page; and if such a page doesn't exist, create it. I personally don't object to seeing all the cognates listed on the Latvian page, but I understand the objections of the others, and I also see how it's likely to lead to inconsistencies (e.g. you give an etymology for the unexpected l and t in lutra whereas the reconstruction page doesn't and says it's unknown). Benwing (talk) 07:23, 6 August 2015 (UTC)
Does indenting the cognate paragraph make it look better (see ūdrs)? As for cognates in general, I do understand the feelings, but there are too few reconstructed entries in Wiktionary for all cognates to be easily transferable (and given that there are discussions about "what the right form" is, I'm a bit afraid of creating hundreds of new reconstructed pages on the authority of my Latvian source, the LEV, just to see them moved to other titles, or incorporated into other pages, etc.; I'd rather wait till there are more solid criteria). I frankly think Wiktionary simply follows no real policy on dealing with reconstructed entries, etymological sources, etc. -- everybody pretty much does whatever s/he wants... For differences between sources, case in point: Latin l in lutra. I simply copied what my source had on this problem, while the reconstructed page made that claim apparently on the basis of an old etymological dictionary of Ossetian (though it is not clear whether the reference refers to the entire page or only to the reconstructed protoform -- again, we lack a good format for these things). Should we mention both? Only the most recent? The best source? Again, where's the policy?... --Pereru (talk) 08:43, 6 August 2015 (UTC)
Indenting is better, although not perfect. I also tried indenting with ':' (where you don't see the bullet point) and setting the paragraph off with two blank lines. All are possibilities.
As for there being no real policy on reconstructed entries, I think you're right. Mostly that's probably because few people are actually creating those entries -- mostly it seems to be CodeCat, at least for IE languages. You might consider proposing a policy and getting people to vote on it (although that may be a bit like herding cats). Benwing (talk) 08:08, 7 August 2015 (UTC)
BTW you could also try just "being bold" and editing pages like Wiktionary:Etymology and Wiktionary:About Proto-Indo-European and Wiktionary:About Latvian and so on that purport to be policy pages; if anyone objects, they will change it. Benwing (talk) 08:11, 7 August 2015 (UTC)
I'd like to have OP's third point clarified a bit. First, "listing cognates in full agreement with the source": sources on languages that have been unwritten or scarcely written until recently will often utilize technical or otherwise non-standard orthography or transcription; but I would suggest that this does not mean we are obliged to provide a separate source for the actual native orthography. E.g. the Udmurt reflex of Proto-Uralic *käle is кыл ‎(kyl), but all basic sources appear to list only the transliteration kyl, kïl, kɨl or ki̮l. (And, as mentioned above, I agree that transcription schemes should definitely be unified here as well.)
Second, would "making sure that each cognate is duly sourced" involve simply watching that people don't add new cognates out of the blue, or actually adding an inline citation for every single cognate? The latter would sound like overkill, whenever the majority of a cognate list is based on a reliable and comprehensive source, such as an authoritative major etymological dictionary (or dictionaries), and is not explicitly contradicted by other equally reliable sources. Not every language group necessarily has such a source available of course, and establishing what sources to consider "reliable by default" (and to what extent; often a source might be quite reliable for cognates but outdated for reconstructions or etymologies) should be determined by the consensus of editors involved with the language or language group in question.
For "unexpected" cognates that are added from somewhere else than from a standard source (say, if someone were to release a paper arguing that Mongolian хэл ‎(hel) is a Uralic loanword), I'd be in favor of annotating the etymologies in more detail, but it should probably be sufficient doing this on the "main" etymology hubs — the entry's own page and its posited origin's entry (whether an attested form or a reconstructed proto-form) — rather than on every single page that refers to it. --Tropylium (talk) 11:52, 7 August 2015 (UTC)
Here are my personal opinions on the clarifications you ask about:
(a) "In full agreement with the source" is not supposed to mean that you can't regularize transcriptions, as long as this is described in a policy page (e.g., WK:Etymology or WK:About Proto-Uralic or something like that), so that the interested reader can always see what can be done to source transcriptions.
(b) "Making sure that every cognate has a source" is meant as making sure the reader can tell where cognates came from. So, if all cognates come from the same source (some authoritative etymological dictionary, for instance), you can refer to it only once at the end of the page. But then it becomes necessary to indicate deviations if they occur. If someone adds a new cognate to this page that happens not to come from the common source, then this cognate needs a footnote indicating its source, so the reader isn't fooled into thinking it is from the same source as the others. In short: don't necessarily add a footnote to every cognate, but always make it possible for the reader to know where the cognate comes from. If it's an original suggestion of a Wiktionarian (e.g., CodeCat, who is into original research), then also say so by adding an "original research" template. --Pereru (talk) 19:55, 8 August 2015 (UTC)

As I see it, a discussion on allowing "Notes" as a valid header should be considered.

Vahag has brought this up (Wiktionary:Grease_pit#.7B.7Breflist.7D.7D) and I'm running into a similar problem all the time. As ridiculously silly of an argument as it may be, I do, in fact, agree that numbered and bulleted references together look ugly AF. (I have even gone to such ridiculous lengths as removing a reference that didn't add anything critical just because it was bulleted while the other ones were numbered because of how unappealing it looks.)

In more general terms, I kind of get the feeling that there seems to be consensus that references are in fact valuable and add value to the entry, perhaps the discussion should focus more on how to allow more elegant ways of faithfully citing content, particularly in "controversial" cases, e.g., obviously one bulleted reference is enough under, say, an assertion that et kala and liv kalā derive from the same source because, well, it's pretty obvious but then if there is a "weird" controversial cognate there isn't even a way of citing it inline (unless you want the awful looking mixing of numbered and bulleted refs.) Neitrāls vārds (talk) 11:09, 12 August 2015 (UTC)

  • Support having a ==Notes== section separate from ==References==, esp. when both exist. Benwing (talk) 11:21, 12 August 2015 (UTC)
Thinking of copying this to a separate header for separate discussion. Neitrāls vārds (talk) 06:45, 17 August 2015 (UTC)

Transliteration obligatory?[edit]

It seems that transliterating non-Latin scripts has become obligatory in all templates, but in certain cases -- "Latin-like" scripts like Cyrillic or Greek -- I think transliteration actually annoys more than it helps. Why is transliteration, especially of Cyrillic and Greek, obligatory in all cases, including inflection tables and examples? I would rather have it only next to the headword... Case in point: Eastern Mari лум, where having two Mari lines is somewhat disruptive. Can't we have a parameter tr=- (which is what I used in this case) to avoid the obligatory transliteration? As things are, the only option is not to use templates... which I would prefer not to do. --Pereru (talk) 04:47, 5 August 2015 (UTC)

Generally, we transliterate everything in Wiktionary -- we don't assume readers are able to handle foreign scripts. So I don't think it's a good idea to disable the translit just because it seems disruptive to you -- we're not limited in space or anything like that (and "disruptive" is in the eye of the beholder). Benwing (talk) 07:48, 5 August 2015 (UTC)
I would also say that translit is especially important for an obscure language like Eastern Mari -- even though it's "just" Cyrillic, it invariably has different conventions from more familiar languages like Russian. (Consider, for example, the Abkhaz language, which is written in Cyrillic but with all sorts of strange non-Russian characters.) Benwing (talk) 07:52, 5 August 2015 (UTC)
I have fixed the problem that |tr=- didn't work in {{ux}}/{{usex}}. However, it should not be used except in exceptional circumstances and this is not one of them. --WikiTiki89 13:30, 5 August 2015 (UTC)
So you guys don't think that it is confusing to have two "examples" separated by em-dashes -- the original spelling example and the transliterated example -- followed by a translation? My first reaction was that it looked like it had two translations, or at least that there were too many elements, enough to clutter the view. Wouldn't it be better to restrict transliterations to headwords, and leave them out of examples and inflection tables? --Pereru (talk) 13:53, 5 August 2015 (UTC)
Should people reading the examples and inflection tables be required to be able to read the script? I think that's too high/elitist a requirement. People might be wanting to read inflection tables for all kinds of reasons. For example, I might be interested in Armenian inflections even though I can't read the script at all. Why make that impossible for me to do? —CodeCat 14:20, 5 August 2015 (UTC)
I agree with the obligatory transliteration of usexes. The format may be tweaked though. Perhaps the first em-dash can be replaced with parentheses. --Vahag (talk) 16:41, 5 August 2015 (UTC)
What about this: лум лумеш, возеш (lum lumeš, vozeš) ― it (lit. snow) is snowing ? DTLHS (talk) 16:51, 5 August 2015 (UTC)
I would not italicize, nor use a small font. I would use a format as in the headword line or {{l}}. --Vahag (talk) 20:20, 5 August 2015 (UTC)
I agree with Wikitiki89 and CodeCat. For short usexes DTLHS's suggestion is good. (I suppose this comes back to the subject we were discussing elsewhere, of having the template 'know' when to make a multi- vs a single-line usex.) - -sche (discuss) 17:37, 5 August 2015 (UTC)
I also like DTLHS's idea with parentheses and a smaller font. My problem with having transliterations everywhere is simply that it affects compactness, which we also want to strive for. I'd be in favor of some solution that doesn't force inflection tables, sometimes already too big (especially in an agglutinative language like Eastern Mari), to become twice as big. Wouldn't it be possible, for instance, to have a second, alternative table with the transliterations? Perhaps with a clickable point to change one version of the table into the other? Or could we maybe have the transliteration become visible in a hovering bubble as you move your cursor over the table? --Pereru (talk) 08:52, 6 August 2015 (UTC)
  • The widespread assumption on Wiktionary is that users are idiots, so having redundant and often unjustifiable data cluttering the entry is generally seen as a good thing. Perhaps we need a two-tier Wiktionary: one for "common people" - without dead words and meanings, complicated etymologies, transliterations on every place under the sun and generally anything that could hurt their attention spans in search of that precious datum of information that landed them here, and one for "serious people", with all that extra stuff. --Ivan Štambuk (talk) 11:02, 7 August 2015 (UTC)
That's actually a good idea. Is it possible to do something like that, maybe by having different shells for "specialists" and "non-specialists"? Even 'normal' articles seem cluttered with all those translation tables and alternate forms and what not, especially for the casual user who just wants to know what a word means. Note that other online dictionaries often have this extra information in some clickable-access format, but not immediately displayed when one asks for a certain word. --Pereru (talk) 00:58, 8 August 2015 (UTC)
I prefer to see transliterations, when templates are used, including for scripts and languages I can read, e.g. Korean or Hindi, etc.
내가 어찌 알겠어?
Nae-ga eojji algesseo?
How should I know?
एक नई शुरुआत
ek naī śuruāt
a new beginning
It's much easier that way for most users. "Smart" users can bear with those who are dumb :) --Anatoli T. (обсудить/вклад) 12:16, 7 August 2015 (UTC)
But wouldn't it be just as good if the transliteration were 'clickable' or available on a hovering bubble? --Pereru (talk) 00:58, 8 August 2015 (UTC)
This feature is currently unavailable. There's no point mentioning something that doesn't exist. (I am not saying, it's not possible to implement.) --Anatoli T. (обсудить/вклад) 13:05, 8 August 2015 (UTC)
If you're "not saying it's not possible to implement", then what's wrong with proposing it? --WikiTiki89 17:51, 10 August 2015 (UTC)
  • I oppose obligatory transliteration of usage examples. I would even favor banning transliteration in usage examples, but there won't be consensus for this. Then at least, don't make it mandatory. These transliterations present inessential (disposable) visual noise. --Dan Polansky (talk) 11:17, 8 August 2015 (UTC)
I suspect that any non-Roman script would be visual noise for you. Unless you can read all scripts, of course. --Anatoli T. (обсудить/вклад) 13:05, 8 August 2015 (UTC)
I am not saying the non-Roman script is noise; I am saying that the romanization in the example sentence is noise. Thus, in лум, I see this:
  • мамык лум ― mamyk lum ― fluffy snow
But I'd like to see this:
  • мамык лум ― fluffy snow
Romanizations in headword lines are fine, IMHO. --Dan Polansky (talk) 13:37, 8 August 2015 (UTC)
At least for one-line usexes, I'd like to see:
  • мамык лум (mamyk lum) ― fluffy snow
but that may become unwieldy for usexes where the translation is on a different line from the usex. —Aɴɢʀ (talk) 14:20, 8 August 2015 (UTC)
I agree with Dan Polansky above, of course. But I still ask: why not find some other way of handling transliterations, such as making them visible when one moves the cursor over them in hovering bubbles, or having a button that makes them visible or invisible depending on the taste of the viewer? What would be wrong with that? We make inflectional tables appear closed by default, and only open when you click on them; why not do the same with transliterations? --Pereru (talk) 19:39, 8 August 2015 (UTC)
@Pereru. Someone will probably create a technical solution for this but I don't understand your dislike for transliterations in usexes. You can ignore them if you don't need them but do you realise that other users may be interested? They may not know the script or be willing to learn it, they could be interested in analysing the grammar, vocabulary or language comparison. Foreign scripts just put off some people who are only used to Roman letters. I know this for a fact - this includes people who are familiar with foreign scripts but not fluent in them and reading foreign characters takes some effort. Besides, I'm sure you're having Cyrillic in mind when wanting to get rid of transliterations but the change (if implemented) will affect all non-Roman scripts, some are very complicated and hard to read! How useful would a string of Thai characters like this: เรียกรถแท็กซี่แล้วยัง be to you, compared to เรียกรถแท็กซี่แล้วยัง? ― rîak rót tɛ́k-sîi lɛ́ɛo yang? ― Did you call the taxi? You would probably even have some difficulty in finding the headword term (เรียก) at first? --Anatoli T. (обсудить/вклад) 01:19, 11 August 2015 (UTC)
@Atitarev, maybe I'm making this seem more important to me than it really is. It all boils down to an esthetical preference: examples plus translations tend to already be long enough, if you still add transliterations the result will often be longer than one line, and that offends my sense of proportion. I would prefer no transliterations even in languages whose script I don't read (I can read the Thai script, so that's not a big deal for me, but, for instance, I don't read Chinese characters; and yet, for me, lines with just the original Chinese example and a translation look nicer than those with the transliteration). The esthetics gets especially bad with inflection tables, which become at least twice larger than they need to be only to accommodate transliterations. Now, I understand and agree that others have a right to think differently, and I won't mind too terribly if things remain as they are. But if there's a chance of getting a nicer format... then I'm all for it! --Pereru (talk) 06:20, 11 August 2015 (UTC)
@Pereru Thanks for the reply. Yes, various enhancements are welcome but until they are implemented, I think it's good to keep transliterations as they are. Yes, foreign language examples can look nice but are sometimes meaningless or very hard to digest. It is very true when you look for them. For me, full FL examples with translations and with transliterations (or phonetic guide/help like Japanese furigana, Arabic vocalisations, word stresses, etc.) were always a blessing in learning the basics of new tongues in a relatively short period. You can focus on scripts, grammar, vocabulary, syntax - it's your choice what you do and when, when you have all three (audio recording is a fourth important component). --Anatoli T. (обсудить/вклад) 07:08, 11 August 2015 (UTC)
  • Readability and usability I think that adding transliteration to other scripts is extremely valuable and would like to see it implemented throughout the dictionary but I am also concerned about the perspective that encourages adding an extra step on clicking or focusing for the browser and mouse because this is difficult for users with certain disabilities and on some platforms. —Justin (koavf)TCM 06:16, 12 August 2015 (UTC)

Eastern Mari possessed forms[edit]

I'm thinking about how to do a template that will include possessed forms in Eastern Mari, but because every possessed form ('my house', 'your house', etc.) can also be inflected for ten cases, singular and plural ('my house', 'in my house', 'in my houses', 'to my house', 'to my houses', etc.), we end up having 6 persons x 10 cases x 2 numbers = 120 forms, most of which are predictably formed. This means creating tables that are rather big and unwieldy. I was wondering if someone working with similar cases (in other Finno-Ugric languages, or in Turkish, etc.) has found a better solution than just creating big tables? (Right now, I'm tempted to make each possessed but otherwise undeclined form -- e.g., 'my house' -- an independent sublemma, with its own case inflection table under it, but I'm not sure this is the best solution.) --Pereru (talk) 04:56, 5 August 2015 (UTC)
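A quick way to sanity-check the size of such a table is to enumerate its cells. The sketch below is illustrative only: it takes the "ten cases" figure from the post at face value, uses placeholder case labels (the post does not list the cases), and the name `paradigm_cells` is made up for this example. Under those assumptions each possessor needs 10 x 2 = 20 cells, and six possessors need 120.

```python
from itertools import product

# Six possessor persons, as in the post ('my', 'your', ...).
PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]
NUMBERS = ["singular", "plural"]
# Placeholder labels: the post says "ten cases" but does not name them.
CASES = [f"case{i}" for i in range(1, 11)]


def paradigm_cells(persons, cases, numbers):
    """Return every (person, case, number) combination as a tuple."""
    return list(product(persons, cases, numbers))


cells = paradigm_cells(PERSONS, CASES, NUMBERS)
per_possessor = len(CASES) * len(NUMBERS)

print(len(cells))      # total cells in one big table
print(per_possessor)   # cells in a per-possessor sub-table
```

Splitting the paradigm by possessor, as in the sublemma idea, would turn one large table into six tables of `per_possessor` cells each.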

I'm a bit fuzzy on the details, but I vaguely remember someone saying that you can nest collapsible boxes. That means you could have just one form showing, but a whole sub-paradigm that opens up when you click on it. Chuck Entz (talk) 06:58, 5 August 2015 (UTC)
Finnish declension tables ignore possessive forms. The possessive endings (which can also be added to verb forms) have separate entries like -ni, -si, -nsä etc with lots of usage examples. --Makaokalani (talk) 10:32, 5 August 2015 (UTC)
For Hungarian entries, each possessive form contains its own declension table. For example: ablak (‘window’) → ablakom (‘my window’), of which the latter has a separate table with forms such as ablakommal (‘with my window’), ablakomban (‘in my window’), etc. Einstein2 (talk) 19:09, 5 August 2015 (UTC)
Nagyon szépen! I like the Hungarian solution. But how do you get those green links? They speed up the making of form-of pages considerably. --Pereru (talk) 01:57, 6 August 2015 (UTC)
Here's a description about how to make a template use the script which generates the green links: User:Conrad.Irwin/creation.js/documentation. Einstein2 (talk) 11:22, 6 August 2015 (UTC)

Make Proto-Baltic an etymology-only language[edit]

Linguists don't all agree on the nature of the Baltic languages as a group. There are three main proposals, that I know of:

  1. Balto-Slavic splits into Baltic and Slavic. Baltic then split into East and West Baltic. (this is the traditional view)
  2. Balto-Slavic splits into East Baltic and Slavic-West Baltic. Slavic-West Baltic then split into Slavic and West Baltic.
  3. Balto-Slavic splits into East Baltic, West Baltic and Slavic.

Proto-Baltic only exists in the first of these proposals. Moreover, it has been noted that there aren't really any common linguistic changes that separate Proto-Baltic from Proto-Balto-Slavic. As reconstructed, the two are essentially identical.

In the past, we've deleted and merged different proto-languages when there is no definite agreement on their existence and definition, and when they are too similar to their parent language to make separate pages for them worthwhile. For example, Proto-Finno-Permic and Proto-Finno-Ugric were recently merged into Proto-Uralic. There was also a discussion on merging various Polynesian languages, although I'm not sure where that went. In any case, I don't see the value in having separate pages for Proto-Baltic reconstructions when they're all just going to be identical to Proto-Balto-Slavic reconstructions. So I think that Proto-Baltic should be changed into an etymology-only language, so that it can be mentioned with {{etyl}}, but there can be no entries or links to it. All existing links would be changed to Proto-Balto-Slavic. —CodeCat 12:14, 7 August 2015 (UTC)

  • Support. --WikiTiki89 14:59, 7 August 2015 (UTC)
  • Support also. Like you, I have also heard that Baltic = East + West Baltic is not a valid clade. Benwing (talk) 16:30, 7 August 2015 (UTC)
  • Disagree. PBS is still not consensus, and as far as I understand the assumption PB = PBS is not obviously true -- Slavic can alter PB reconstructions significantly if it is taken into account for PBS. So, since there is no consensus, I say keep the PB pages as long as they're sourced. After there is a PBS etymological dictionary then this issue can be dealt with here; before that, doing this would simply be premature. --Pereru (talk) 00:55, 8 August 2015 (UTC)
But would there be any difference? It would receive the same treatment as fiu-pro – valid for use in etymologies (in {{etyl}}) but not having its own appendices. Does bat-pro even have any appendices, I think majority are bsl-pro, is that correct? Hopefully this would be another step towards lessening confusion/misguided deletions like this: User_talk:Tropylium#Category:Proto-Finnic_terms_derived_from_Proto-Baltic (I'm sure it was done with good intentions but a user should be able to use such oft-cited (in published literature) genetic groupings in etymologies even if they are considered defunct by the most recent research and don't have their own appendices.) Neitrāls vārds (talk) 09:25, 9 August 2015 (UTC)
There was no deletion: that category simply hasn't been created yet. The category showed up in Special:WantedCategories, and I wanted to make sure it was a good idea to create it before doing so. I wouldn't have deleted it if someone else had created it, but I try to avoid creating categories that are only going to be deleted later (though it inevitably happens some of the time). I do weed out a lot of mistaken categories from bad edits, which I correct, like Category:Spanish adejctive forms, but I generally wouldn't do that with a knowledgeable editor who intended to do it that way. I didn't create the category, but I didn't "fix" the entry itself. Chuck Entz (talk) 14:50, 9 August 2015 (UTC)
  • Question: has Proto-East Baltic been worked out to any major degree? As far as I know, everyone accepts East Baltic, which means that effectively the Baltic vs. Balto-Slavic debate should only come up whenever there's Old Prussian or similar data involved. I would not be surprised if there were even sources defining "Proto-Baltic" as only the common ancestor of Latvian + Lithuanian anyway. (I tentatively support a merger between the appendices; bear in mind that we could still cover in prose differences between Baltic and Balto-Slavic if they were to come up. But I have no opinion on which of the two should remain.) --Tropylium (talk) 18:46, 8 August 2015 (UTC)
Nope. AFAIK no such a thing has been worked out as of yet. Neitrāls vārds (talk) 09:25, 9 August 2015 (UTC)
  • Support. Like Benwing, my understanding of the scholarship is that Baltic is not a genetic group and there was no Proto-Baltic. Even Derksen, who writes of Proto-Baltic, says "I am not convinced that it is justified to reconstruct a Proto-Baltic stage; the term Proto-Baltic is used for convenience’s sake." Reconstructing Prehistorical Dialects: Initial Vowels in Slavic and Baltic says "Baltic scholars who have concerned themselves with this question conclude that one cannot reconstruct a Proto-Baltic." The situation seems comparable to Proto-Algonquian, which was initially reconstructed as Proto-Central-Algonquian (contrasted with Eastern and Plains), before scholars realized that only Eastern was a genetic group with a proto-language (PEA), and that what had been reconstructed as PCA was, with only a few minor changes here and there, simply Proto-Algonquian. - -sche (discuss) 19:14, 8 August 2015 (UTC)
    But there is the question of accuracy. Since PBS still hasn't really been reconstructed (no etymological dictionary), mentioning fleeting forms or original research should only be done explicitly, which is not (yet) done here as policy. What is available out there often does have PB, not PBS, forms -- only those few words that are important for an author's paper, such as the Derksen paper you cite. In the absence of a body of consensus reconstructions for Proto-Balto-Slavic, disregarding the Proto-Baltic ones or changing them automatically into Proto-Balto-Slavic is simply too hasty. The work hasn't been done yet to justify this. We're still at "Proto Central Algonquian" time; to assume that the work of demonstrating that all those forms are simply "Proto Algonquian" has already been done is at best temerary. --Pereru (talk) 19:36, 8 August 2015 (UTC)
I'm confused ... AFAIK no one questions that Balto-Slavic is a clade. Benwing (talk) 05:12, 9 August 2015 (UTC)
Some Lithuanian (and perhaps Latvian) nationalists deny it. I've seen the claim made that the Balto-Slavic theory was a Soviet plot to justify the annexation of the Baltic States into the Soviet Union. I don't know whether any reputable linguists free of ideological motivations deny it, but if so, they're in the minority. —Aɴɢʀ (talk) 15:24, 9 August 2015 (UTC)
One of them tried to hijack the Wikipedia pages on the subject not that long ago. As to Benwing's confusion: the issue isn't whether it's a clade, but whether the details have been worked out on the proto-language. Also, proto-languages are theoretical constructs that are only as good as the information on which they're based: including Slavic in a reconstruction provides extra material to work with, so a PB reconstruction may not be as a complete a picture as a PBS one. I have no problem with documenting that a referenced reconstruction was for PB rather than PBS. My main issue has been with categorizing entries as derived from PB. Even experienced editors sometimes forget about the categories that are added by the templates. Chuck Entz (talk) 15:57, 9 August 2015 (UTC)
@Chuck Entz, well, using it in etyl would imply categorization as well, this is how it's done for fiu-pro as well, do you think the cat should redir?
@Angr, one way it can be valuable (if one reads between the lines) is that it often is used as a "code word" for Proto-East Baltic (the hypothetical parent of Latv. and Lith. that hasn't been worked out yet and judging by current theories wouldn't include Slavs if it is, in fact, worked out at some point) which gives geographic and chronological clues (this can be important in Uralic/Finnic etymologies for example, as there appear to be several layers – a pre-Slavic Balt(o-Slav)ic layer and for Finnics a "Proto-Baltic" (read "Proto-East Baltic") layer of borrowings.) Neitrāls vārds (talk) 16:31, 9 August 2015 (UTC)
@Chuck: It's possible that Slavic would include more information, but someone with enough knowledge of Slavic sound changes could easily evaluate if the Proto-Baltic reconstruction is also valid for Proto-Balto-Slavic. In most cases, it will be. This is not limited to Slavic either; information from outside Balto-Slavic can also contribute to a Balto-Slavic reconstruction. —CodeCat 20:19, 9 August 2015 (UTC)

Adding our own diacritics in quotations of prose works printed without them[edit]

I've had an ongoing debate in the past with User:Atitarev about whether we should add stress marks to quotations of Russian prose. He believes that this is helpful to readers, but I am against this for a number of reasons. Firstly, I believe that out of respect to the author and publisher, all of our quotations should reproduce as closely as possible the original work with the exception of the bolding we add to the word(s) that the quote is demonstrating. Secondly, this forces us in some instances to choose between two or more equally acceptable stress variants of some words, or worse in some cases between two or more homographs with different meanings. Note that this does not apply as much to poetry from which stress can be inferred by the meter, or to songs or movies in which the stress can be heard. This problem is significantly exacerbated in languages such as Hebrew and Arabic, where we would not only be inferring stress, but also vowels, leaving much more possibility for ambiguity.

The question is: Should we (Wiktionary) do this in general? Should we do this for languages like Russian, even if not for languages like Hebrew and Arabic? Should we do this even for languages like Hebrew and Arabic? Should we remove diacritics from quotations where we have already added them? --WikiTiki89 15:32, 7 August 2015 (UTC)

As far as I know, the practice is to leave quotes relatively unchanged. I don't think we add macrons to Latin or old Germanic quotes, for example. —CodeCat 15:37, 7 August 2015 (UTC)
Adding macrons to Latin is a completely different story, because these texts are often already printed with macrons. I'm not talking about always sticking with the most original version of the quote, but about sticking to existing publications. This question mostly applies to relatively modern quotations. --WikiTiki89 15:55, 7 August 2015 (UTC)
As for Arabic, stress of course doesn't really apply, but I think it would be a huge help to the reader to add the vowels to the extent that they can be inferred reasonably unambiguously. Reading Arabic is hard for non-fluent speakers due to the underspecified text, esp. with verbs. I think in the case of Russian, similar arguments could be made -- if you're concerned about ambiguous cases, just leave off the stress in those cases or (perhaps better) follow Anatoli's convention of putting a stress mark in each possible place of stress. I'd also like to see individual words inside quotes linked -- again it would be a great help for the language learner. Benwing (talk) 16:26, 7 August 2015 (UTC)
Not all the quotations we include need to be targeted toward beginners. We can have usage examples with the full diacritics, which would be helpful for beginners. But quotations are meant to show how the words are really used in reality; and in reality, Russian is not written with stress marks and Arabic is not written with vowels. --WikiTiki89 17:32, 7 August 2015 (UTC)
In reality we always (or should always) transliterate Arabic text, at the very least. (And who's to decide what's targeted towards beginners and what's not? The same arguments could be made for not transliterating at all.) Benwing (talk) 19:49, 7 August 2015 (UTC)
What I mean is that not everything needs to be targeted toward beginners who can't read without vowels. And even for people who are not so comfortable reading without vowels, it's not as hard when you already know what word you're looking at. With transliteration, we're not actually altering the original text; the original is still there and anyone who doesn't want or need the transliteration can ignore it. --WikiTiki89 20:19, 7 August 2015 (UTC)
Another thing is that adding vowels prevents us from being able to show how vowels actually are used in the text (such as the fatḥatān, šadda, and other sporadic disambiguators). This applies to all three of the languages I've mentioned. --WikiTiki89 21:10, 7 August 2015 (UTC)
When we're giving a direct quote, we should keep the original spelling of the whole quote, i.e. without Russian stress marks (unless we happen to be quoting some text that for whatever reason uses them). We should also keep е for ё if that's how it was spelled in the original. (I don't quite understand why we allow ё in page names in the first place.) We can include stress marks in the transliteration if need be, though that will mean writing the transliteration out manually instead of letting it happen automatically. —Aɴɢʀ (talk) 10:55, 8 August 2015 (UTC)
We allow ё in the page names because this is a dictionary convention, it's so also in the Russian Wiktionary. The Russian Wikipedia makes the letter mandatory throughout articles and many native speakers prefer to write it all the time. Letter ё isn't exactly banned in Russian! It's also considered a separate letter, not a е with two dots (две точки). Every Russian dictionary uses it in the alphabetical order. Knowing that ё is replaced with е by native speakers lets you figure out how to spell it in the real world. For the same reason, I don't see how adding stress marks, normalising texts with ё, adding Arabic or Hebrew diacritics, Japanese furigana is a problem in quotations. Many editors suggest photographic image of the original texts, even using the glyphs. Modern Russian books don't reprint texts in the pre-1918 reform spellings. China republished all old books in the simplified script. Japanese publishers partially follow the post-war reform.
Another point, some Russian books appear in accented forms with consistent usage of ё, designed for foreigners or children. Or Arabic texts can be with or without vocalisations. Japanese texts appear with furigana (ruby) to help with the pronunciation, especially when aiming at young readers.
My strong opinion is that a dictionary should be user-friendly and help master languages, it's about the language, not the facts. Showing how languages are written out there in the real world can be described in appendices. Learners learn this as the first thing. For me, a learner of Arabic, it is much more useful to have vocalised Arabic than telling me over and over again that diacritics are not used by Arabs. Imposed restrictions is the reason I dislike adding citations. --Anatoli T. (обсудить/вклад) 12:46, 8 August 2015 (UTC)
I support Benwing's idea of linking words in usage example. It has long been used by Chinese templates, which do it automatically. E.g.
中國的首都是北京。 [MSC, trad.]
中国的首都是北京。 [MSC, simp.]
Zhōngguó de shǒudū shì Běijīng. [Pinyin]
The capital of China is Beijing.
As you can see, it has a semi-automatic script conversion and transliteration, it can also be used for quotes, which will display both traditional and simplified forms, regardless of the original form. --Anatoli T. (обсудить/вклад) 12:53, 8 August 2015 (UTC)
All of this is fine for our own example sentences, but I do think we should follow the original orthography when we're giving a direct quote. We're showing how the word is used "in the wild", and I don't think we should pretty that up. But headword lines and translation listings and usage examples can be as learner-friendly as we want them to be. —Aɴɢʀ (talk) 05:54, 9 August 2015 (UTC)
What do we violate by providing "самолёт лети́т на за́пад" instead of "самолет летит на запад" with word stresses and normalising "е" as "ё"? The text is the same, it just has accents to make the reading easier. It's completely uncommon in Russia to use pre-1918 reform spelling when quoting old authors and Chinese don't have to use traditional script when quoting old authors, regardless of what script the original was in. Chechen texts often replace Cyrillic palochka with |, l, 1, etc. for technical reasons but the normalised spelling distinguishes capital and small Ӏ and ӏ , e.g. лугӏат ‎(luġat) (the correct spelling) will appear in a printed text as луг|ат, лугlат, луг1ат or лугӀат. Should we also copy the fonts and word breaks in citations? --Anatoli T. (обсудить/вклад) 07:23, 9 August 2015 (UTC)
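For readers who want to see how mechanical these normalisations are, here is a minimal Python sketch. It is not any existing Wiktionary module; the function names are invented for illustration. It assumes stress is encoded as the combining acute accent U+0301, that ё is the precomposed letter, and that a palochka stand-in (|, Latin l, digit 1, Latin I) is only rewritten when it directly follows a Cyrillic letter. Word-initial palochka and genuine digits after Cyrillic letters are not handled.

```python
import re

COMBINING_ACUTE = "\u0301"   # stress mark as used in accented Russian text
SMALL_PALOCHKA = "\u04CF"    # ӏ
CAPITAL_PALOCHKA = "\u04C0"  # Ӏ

# A stand-in character (or a capital palochka) right after a Cyrillic letter.
_STAND_IN = re.compile(r"(?<=[\u0400-\u04FF])[|l1I\u04C0]")


def strip_stress(text: str) -> str:
    """Remove stress marks only; the text is not decomposed, so ё and й survive."""
    return text.replace(COMBINING_ACUTE, "")


def yo_to_ye(text: str) -> str:
    """Spell ё as е, as many printed Russian texts do."""
    return text.replace("\u0451", "\u0435").replace("\u0401", "\u0415")


def normalize_palochka(text: str) -> str:
    """Replace stand-ins with a palochka matching the case of the preceding letter."""
    def repl(match: re.Match) -> str:
        previous = text[match.start() - 1]
        return SMALL_PALOCHKA if previous.islower() else CAPITAL_PALOCHKA
    return _STAND_IN.sub(repl, text)


accented = "самол\u0451т лети\u0301т на за\u0301пад"
print(yo_to_ye(strip_stress(accented)))
print(normalize_palochka("луг1ат"))
```

Going from the accented form back to the printed form is lossless in this direction only: adding stress marks or restoring ё requires knowing the word, which is where the ambiguity discussed in this thread comes in.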
I feel like with direct quotes, we should present them as faithfully as Wikisource presents source texts: we don't copy over fonts and word breaks, and incorrect character shapes can be replaced with correct ones when the intent is clear (e.g. when the original author is clearly attempting to write a palochka but doesn't have the exact character available), but we do present misspellings, misprints, typos, etc., uncorrected (though they can be [sic]ed) and we don't add pedagogical diacritics. —Aɴɢʀ (talk) 08:17, 9 August 2015 (UTC)
I agree with Angr. - -sche (discuss) 06:03, 9 August 2015 (UTC)
I disagree, as mentioned above. Although in any case there shouldn't be problems linking individual words in quotes. Benwing (talk) 07:08, 9 August 2015 (UTC)
FWIW, I think the argument for adding diacritics to Arabic (it's often unintelligible without them) is much stronger than the argument for adding diacritics to Russian (it's perfectly intelligible without them), and I would sooner allow the former than the latter. At the risk of adding far too much visual noise to non-Latin script citations, perhaps we could have vocalized forms display on mouse-over or something? - -sche (discuss) 16:54, 9 August 2015 (UTC)

WikiTiki's and Anatoli's disagreement is very deep and philosophical. It stems from the disagreement over the purpose of Wiktionary. Anatoli and his camp see Wiktionary mainly as a learning tool for non-native speakers. Hence the reading aids in quotations, the note in Template:ru-adj1 and the unscientific, pronunciation-based transliteration system for Russian. The other camp, which includes me, sees Wiktionary as a scholarly resource, a kind of an encyclopaedia of language, useful for native speakers too. One side wants to write an OALD, the other an OED. Both projects are useful and have a right to exist, but we have to choose one. --Vahag (talk) 13:56, 9 August 2015 (UTC)

I'm not sure what you mean by "unscientific" here. Also, maybe I'm an optimist but I think it's possible to resolve this issue through compromise. As for OALD vs. OED, keep in mind this is the English Wiktionary, and hence designed for English speakers. That means that foreign-language entries are inevitably geared somewhat towards language learners, just like all cross-language dictionaries. I don't think there's much disagreement over this. This means the OALD isn't the right point of comparison. We're rather trying to create something like the OED for the English-language entries and the Hans Wehr dictionary for Arabic language entries (this is the best dictionary of Modern Standard Arabic I can think of), and similarly for other foreign-language entries. Benwing (talk) 21:25, 9 August 2015 (UTC)
Vahag, neither the OALD nor the OED covers topics in the detail we do here. Published Russian dictionaries lack transliterations, so there's nothing to compare with. Well-known dictionaries are unconcerned about Russian transliteration; they simply don't do it. When they do (in citations, etc.), you get both "narodnovo" (phonetic) and "narodnogo" (graphic) transliterations (genitive or animate accusative of наро́дный (naródnyj)). You made negative comments about word stresses and genders as well, but most users and editors find them useful, AFAIK. Therefore, I have to use other languages again as examples, for the umpteenth time.
Examples of irregular pronunciations and transliterations, using very common words in various scripts:
  • Thai: ชาติ ‎(châat) (written as "châa-dti") but the final "i" is silent. Can you find a (scientific) source, which claims that it should be transliterated as "châa-dti" or similar, with a transliterated "i"?
  • Korean: 십육 ‎(simnyuk) (written as "sibyuk"). Can you find a (scientific) source, which claims that it should be transliterated as "sibyuk" or similar?
  • Japanese: 今日は ‎(こんにちは, konnichi wa) (written as "konnichi ha"). Can you find a (scientific) source, which claims that it should be transliterated as "konnichi ha" or similar?
  • Arabic: شُوكُولَاتَة ‎(šokolāta) (written as "šūkūlāta"). Can you find a (scientific) source, which claims that it should be transliterated as "šūkūlāta" or similar? Perhaps a better example is إِنْجْلِيزِيّ ‎(ʾinglīziyy) written as "ʾinjlīziyy".
I can give more examples where phonetic transliteration (closer to pronunciation) is considered standard and scientific. Are there sources that claim that "что" should only be "čto" and never "što" and "кого" should only be transliterated as "kogo" and never "kovo"? --Anatoli T. (обсудить/вклад) 01:01, 10 August 2015 (UTC)
Anatoli, Benwing, Russian transliteration has been discussed a million times (see Wiktionary talk:Russian transliteration) without achieving consensus. Let's not start a new discussion here. I was merely pointing it out as an example of scientifically rigorous vs convenient. The issue at hand is the usage examples. When you are giving a quote from Pushkin's Eugene Onegin, I want it to be without stress marks and in pre-reform orthography, as it was published in 1833. If you normalize the text, it is less valuable to me and others who are interested in the diachronic, historical development of Russian. Language learners would prefer normalized quotes. Our needs are irreconcilable. --Vahag (talk) 10:50, 10 August 2015 (UTC)
If I were to provide citations for the pre-1918 reform spelling of пока́мест ‎(pokámest) (modern) - пока́мѣстъ ‎(pokáměst) (pre-1918 spelling reform), then the old spelling would be more appropriate:
Покамѣстъ, въ утреннемъ уборѣ
Надѣвъ широкій боливаръ,
Онѣгинъ ѣдетъ на бульваръ,
И такъ гуляетъ на просторѣ,
Пока недремлющій брегетъ
Не прозвонитъ ему обѣдъ
But why would I need to confuse users/readers if the entry is the modern spelling (покамест)? Pre-reform spellings are, of course, allowed, but they should be clearly marked as old or obsolete. Pushkin's works are enjoyed today by most readers, who don't have to struggle to read the old orthography; citing pre-revolution authors works just fine. If anyone is interested in the old orthographies, they are free to pursue them, but that has little to do with the dictionary of (modern) Russian. --Anatoli T. (обсудить/вклад) 12:32, 10 August 2015 (UTC)
You can reference a more modern printing of the work, which would have already converted the orthography. I would have no problem with that. --WikiTiki89 17:59, 10 August 2015 (UTC)
That's great but why would a learner of Russian seek archaic spellings in the Russian sections of the English Wiktionary, even if the reference is for a term, which hasn't changed with the reform? It's fine if you already mastered modern standard Russian and wish to take the next step and familiarise yourself with historical spellings. Yes, we can add all historical spellings but they are not a priority for this project. --Anatoli T. (обсудить/вклад) 01:07, 11 August 2015 (UTC)
In case you misunderstood me, you can quote a more modern printing of the work that uses the modern orthography. I'm only concerned with us altering the text ourselves. --WikiTiki89 01:22, 11 August 2015 (UTC)
Perhaps, my comment was to get my point across in reply to Vahag's comment earlier, where he said that he would prefer the original orthography quotes. Eugene Onegin (or Yevgeny Onegin) is available in both pre-reform and modern spellings or in fact any old literature for that matter. I just don't see the need to quote pre-reform orthography for modern terms. Not at the expense of modern orthography, in any case. --Anatoli T. (обсудить/вклад) 01:48, 11 August 2015 (UTC)
Some people may be interested in them. If we already have a few quotations in modern orthography, who does it hurt to have one in the old orthography as well? --WikiTiki89 02:08, 11 August 2015 (UTC)
  • Adding stress marks to attesting quotations of Russian prose is a poor practice, IMHO. Adding these to headword lines is acceptable; adding these to lists of terms such as synonyms and derived terms is equally poor, IMHO. In most places, terms should be presented in the form in which they appear in print. I don't believe the learners of Russian should be reminded on every single occasion how to pronounce a word; the headword line itself should suffice. --Dan Polansky (talk) 11:40, 10 August 2015 (UTC)
    I agree with Dan in principle. Language learners are smart enough to click the link if they forgot the stress or pronunciation of a word. --WikiTiki89 17:59, 10 August 2015 (UTC)
As an (admittedly not very committed) learner of Russian, I would find more pervasive usage of stress marks very useful. Looking up the stress every time is very tedious and a drag on learning. Seeing the stress mark in quotes, examples, links and synonyms would make learning faster through repetition and reinforcement. --Tweenk (talk) 22:48, 26 August 2015 (UTC)

Tagging unsourced reconstructed entries

I've just made {{needsources}} to tag reconstructed entries (protoforms) that were created without explicit published sources. Since, after all, reconstructed forms are simply hypotheses, not attested words, they need sources (who proposed that reconstruction, in what publication, and based on which cognates) just as much as a "normal" word needs usage examples so we know it really exists. I therefore suggest that any reconstructed entries that have no sources in them be tagged, so that those interested in them can add the sources. (I started doing this, but my edits were reverted since the issue had not been discussed here first, so I am doing this now.) --Pereru (talk) 01:40, 8 August 2015 (UTC)

I don't understand why they must absolutely have sources. From its conception, Wiktionary has been a dictionary and therefore stands on par with other dictionaries. Other dictionaries do not source all their definitions to another linguistic work; they interpret and present their research independently. In the same way, Wiktionary and its editors have directly interpreted evidence in the form of attestations. Parroting other dictionaries has always been explicitly forbidden and independent research of lexicographic content has been a requirement, enshrined in WT:CFI and the process of WT:RFV. For lexicographical content, we have never once required corroboration by an outside source; we require evidence and make our own decisions based on that through consensus and peer review.
Because Wiktionary presents etymological information as well, it's also an etymological dictionary. That means that other etymological dictionaries stand on par with Wiktionary. Etymological dictionaries, too, present independent and sometimes novel interpretation of the evidence, and are not required to take all of their contents from other linguistic sources. Of course, when information is corroborated by another source, they can and do indicate this, to strengthen their own claims. But etymological works may equally question or refute what other sources say; they're not limited to parroting others.
Wikipedia is an encyclopedia, a compendium of existing knowledge. This makes sourcing vital to Wikipedia, and original research a problem. But as I have shown here, Wiktionary is of a very different nature, and through this nature it is bound by different rules. It's not a compendium of lexicographic or etymologic knowledge presented by others; it's an independent source of this knowledge. We are not subservient to other linguistic sources, we are their equivalents, or even competitors. Original research within Wiktionary is important, it's an integral part of how Wiktionary works and has always worked. Therefore, it's not appropriate to require sourcing to another linguistic work for information presented on Wiktionary. This goes directly against what Wiktionary is, and the principles and processes written down in our policies. —CodeCat 12:09, 8 August 2015 (UTC)
Contrary to the above, requiring references for etymologies is not against en wikt policies since we do not have any on the matter. WT:ATTEST, the important evidence-based criterion, says nothing about etymologies. Some people are even pushing a requirement that etymologies should be referenced into WT:ETY; my removal (diff) of an undiscussed addition of such a requirement was undone. I think the whole section References in WT:ETY should be removed as not traceable to a discussion or vote showing consensus, but I have better things to do at this point; maybe a couple of months later. Again, while for definitions we have WT:ATTEST and WT:CFI in general, for etymologies we have a policy vacuum. --Dan Polansky (talk) 13:10, 8 August 2015 (UTC)
Let's see if I can help CodeCat understand why sources for etymologies are a good thing:
(a) Etymologies are hypotheses, not the truth; the interested reader should be able to see why a certain etymology is given here rather than others, without having to trace some discussion of its correctness somewhere in the archives.
(b) Etymologies, being hypotheses, have authors: unlike words, they aren't simply "in usage" or "out of usage" or "dated" and whatnot, they were actually ideas, good or bad, proposed by someone. To omit this information is (b1) a disservice to the interested reader, since it hides available information, and (b2) unethical, since it amounts to not giving credit to an author for his/her idea, which is a kind of intellectual theft.
(c) To the non-specialist, more information is better than less information. I am sure that a specialist can probably quickly assess and evaluate the goodness of a specific etymology, but others would need more than that. Claiming you don't need this information because "expert Wiktionarians" can assess the correctness of an etymology anyway is like claiming that attestations are not necessary to qualify a word for inclusion because "expert Wiktionarians" can tell if a very rare or dialectal word actually exists...
(d) "Other dictionaries don't do that" is not a good argument ("Wiki is not paper", etc.). Some do: etymological dictionaries, where the sources are so important they are usually listed at the beginning of the book rather than at the end, because the author knows that the interested reader will want to form his/her opinion on the author's choice of sources. Non-etymological dictionaries indeed often don't, but they also often don't cite any etymologies at all, and they certainly don't have appendices with reconstructed protoforms -- if we want to follow them, then we should delete all reconstructed entries, shouldn't we?
(e) "Etymological dictionaries present independent and novel interpretations of the evidence" -- indeed, and they always label it as such! And they also always give sources for ideas that are not "independent and novel"! Why should Wiktionary be any different? Personally, I am not against independent research, as long as it is (e1) labeled as such, and (e2) argued for, preferably on the same page. Why are you not doing that? In other words: Etymological dictionaries do distinguish original ideas from other people's ideas, which they give sources for; why don't we -- why don't YOU -- do the same?
(f) Mentioning sources is not equivalent to parroting other dictionaries' definitions--quite the opposite! Mentioning sources means respecting other people's intellectual property rights, and also giving the reader the possibility of exploring the basis for a given etymology being used here.
Besides, both Wiktionary:Reconstructed_terms#References_and_verifiability and Wiktionary:Etymology#References mention the need for sources in etymologies. Why shouldn't we follow these guidelines?
@CodeCat:, you seem to believe that sources vs. lack of sources boils down to Wikipedia vs. Wiktionary. It doesn't. The reason for adding sources to etymologies is that it is a good idea (see above), not a simple imitation of other wiki projects. Please get off the soapbox!... Also, it's not a question -- at least not to me -- of "original research". As I said elsewhere, I have nothing in principle against original research; I just want it to be labeled as such. If the reconstructed protoforms you created entries for are all your own work, then they should be labeled as such, and your reasons for creating them with that form should be on their page (or on a page like WK:About_Proto-Indo-European, or WK:About_Proto-Balto-Slavic, etc.). I'm not "requiring sourcing to another linguistic work", I'm just "requiring sourcing" -- if it's your work, say so on the page! That's what etymological dictionaries do: they label their own work as such. It's also not about "criteria for inclusion or deletion": I'm not saying 'delete it if it's original research', I'm saying 'label and argue for it if it's your work' -- not just in obscure discussions two years ago in the Scriptorium, but right on the reconstructed entry page! WHY THE HELL NOT? Something I really don't understand is why you are hellbent on obscuring the reasons why a certain protoform is included here. In what way does hiding the reasons/sources for including a form help Wiktionary become better? Claiming that "expert Wiktionarians" can judge it so we don't need to argue for them on their page is like claiming that "expert Wiktionarians" can tell if, say, Arabic usage examples are correct or not, so we don't need to translate them into English on the page of the word they are an example for...--Pereru (talk) 19:05, 8 August 2015 (UTC)
(a) Sourcing doesn't actually tell the readers any reasoning. It just suggests that the reasoning might be found in another work instead, but even that's no guarantee as plenty of other works just give forms without any arguments. I am completely for reasoning and giving arguments for reconstructions, within reason. Some widely known and accepted sound changes like Grimm's law should not need to be pointed out in every etymology. So I'm not sure how this point is relevant. External sourcing doesn't change anything about it. If anything, I understand your argument to mean that we should provide argumentation for etymologies in addition to, and regardless of, sourcing.
(b) It can be assumed that all information on Wiktionary is the result of Wiktionary's own editorial process. All content on a wiki is already sourced through the page history, so that gives credit to everything users have ever added to pages. Adding references to Wiktionary users only complicates things. External sources are fine, but we should not be required to tag everything we add with our own usernames, that's just stupid.
(c) Again, a source provides no information, it merely says where information came from. We use many specialised linguistic works as sources on Wiktionary, and I don't think many Wiktionary readers will have access to them. So to the majority of readers, the source is nothing more than a name.
(d) I have nothing against providing a reference to a source when information is taken from them. I admit I have been rather sloppy about this, and still am to some degree. But I am trying to improve things, as you may have noticed from my recent edits to PIE root pages. Do as I say not as I do. Just because I'm not perfect doesn't mean I'm not right.
(e) Again, I have nothing against sourcing information that does come from an external source. What I disagree with is requiring that all information comes from an external source; this is what your new template's wording appears to imply. I also disagree with sourcing particular ideas to individual editors. Wiktionary is a wiki, and information can and should be edited and improved by other editors. This means it's not right to place certain parts of pages on "lockdown", not allowing anyone else to edit them. Etymological information originating from within Wiktionary should be sourced to Wiktionary editors as a whole, and to editorial consensus. But since all information not sourced to external sources can be assumed to have been provided by Wiktionary editors, this is entirely redundant.
(f) Copyright doesn't apply and never has applied to information alone. So intellectual property is not relevant here. Scientists give each other credit and require it from others, because of plagiarism, but that's not intellectual property as far as I know. And I have no idea what the laws and rules are on plagiarism anyway. Wiktionary doesn't have any rules for it.
Those two pages you mentioned were written long ago, long before there was really any significant number of reconstructed pages. I also doubt whether they actually reflect consensus and common practice, so they should be changed to reflect what we actually do. My objections to your proposals, now and before, are that we should not be required to have an external source for all etymological information on Wiktionary. This is where my comparison with Wikipedia comes in. Wikipedia has a simple rule: unsourced material that is challenged can and should be removed. I object to bringing this practice to Wiktionary, as we are a dictionary (lexicographical, etymological and other) and it is in the nature of this project to be able to interpret, research and peer review available evidence (attested words) on our own.
So, again, to recap: I have nothing against sourcing. If information comes from somewhere else, source it. That's a good thing. Explaining reasoning for particular reconstructions, in the entries themselves, is also a good thing. I have no problem with that either, but within reason. Very obvious things like Grimm's law probably don't need to be mentioned, but there is no objective standard for this and if we want to go this route, we should figure out among ourselves which information is obvious enough to leave out. —CodeCat 19:39, 8 August 2015 (UTC)
(a): Of course, only if you mention bad sources. Good sources do have the reasoning behind the proposals. It's up to you if you cite good sources or bad sources. Don't cite bad ones; cite good ones. If you see bad sources being cited, mention that to the author or start a discussion about that source. Don't just omit it -- as always, there is nothing to be gained by using a source -- including your own original research -- and not mentioning it. How many etymological dictionaries do you know that fail to mention their sources? And they are not Wikipedia... Now, it would indeed be better if you added the entire reasoning behind a suggestion rather than just reference the source, but the latter is easier and is the standard practice in etymological dictionaries. And most reconstructed entries here -- especially the ones you made -- still lack such an explanation, which is why they should be tagged with {{needsources}}.
(b): Sure. But as others have said there is no policy with respect to etymologies and their sources, so saying "it's the result of Wiktionary editorial process" still tells us nothing about what was done. What if I want to know the reasons? Where do I find this information -- information that most etymological dictionaries give by means of, among other things, indicating their sources? And by giving detailed reasonings when it's their own idea?
(c): A good source does provide information. Are you familiar with good etymological dictionaries? They provide further sources, so you can trace it down to the original proposer, and they provide rationales for deviant forms. They also compare different hypotheses, and often provide further evidence for preferring one or the other. Plus they list correspondences and sound laws, especially the least known ones. They're full of argumentation, reasonings, rationales... What the heck are you talking about? What sources are you talking about?
(d): Good! Please continue doing that. If you add sources to your pages I have no problems with them. In fact that is my entire point: not having sources and reasons for including a particular protoform on the page itself is not a plus for Wiktionary, it's actually, as you put it, being sloppy. I'm glad you're fixing that, and you'll get my support for this. The goal, of course, should be to fix everything.
(e): And here we are apparently in full agreement: I am in favor of referencing external sources only when the information comes from an external source (duh!...). But now, "if" a given word is the result of your own original research, then this should be sourced, so that the reader knows that it is your original research. If you have your reasoning on the page, what the heck is bad about saying it is your idea? In what way is that bad for Wiktionary? And again, good etymological dictionaries do that (Karulis adds a big "K" to every paragraph in the LEV that contains his own ideas, for instance). That is what good etymological dictionaries do: they do not shy away from original research, but they label it as such and argue for it on the entry itself! Why is that so bad?
(f): Intellectual property is not simply a question of law; it's a question of ethics. "Plagiarism", i.e. people taking advantage of other people's ideas without mentioning them, is exactly what the concept of intellectual property is supposed to prevent; why else do you think it exists? I think scientists don't own the legal copyright over their own ideas after they're published, but they certainly have the moral/ethical copyright. Do you think Dr Kim would be happy if you wrote him an e-mail telling him you've mentioned Proto-Balto-Slavic protoforms he proposed in a public forum like Wiktionary without mentioning his name? Would you, if you were in his place? Maybe he thinks Wiktionary is "just internet" or "not trustworthy" and thus not worth the trouble, but I'm sure he wouldn't think that not mentioning his name is the right thing to do -- in fact, I'll bet he would mention this as an argument against taking Wiktionary seriously. Which in fact it is.
(g): Maybe the pages should indeed be changed; two other Wiktionarians in the rfd discussion have already suggested that I myself "be bold" and edit and change them. I don't want to do that, though; but if you feel so strongly about it, why don't you? I do point out, though, that several others have said there is no official policy, so I'm not sure that there is a "what we do" yet: you seem to be placing the cart before the horses here. I think you still need to argue for "what YOU do" as being "what we should do". And frankly, I don't see how you can argue that not mentioning sources actually enhances Wiktionary. There is no self-respecting etymological dictionary that doesn't mention sources and doesn't label independent original research as such; why should Wiktionary?
In sum, if you don't have anything against sourcing, then remove the {{rfd}} from a template that merely asks for what you say you have nothing against. If you are in favor of explaining reasons for particular reconstructions in the entries themselves, then do so. In fact create a framework for doing that, with a special page in the Wiktionary namespace for listing all correspondences, all sound laws, etc. so you can easily refer to them in the shorter explanations in every reconstructed entry. By all means do so! The problem thus far is that this is not being done, and when I started requesting that it be done ("source" = "published source" OR "original research rationale") you reverted all my changes and asked for my template to be deleted. Be consistent! Do as you claim to believe! --Pereru (talk) 20:34, 8 August 2015 (UTC)
@CodeCat:, to summarize:
It seems we agree on most things. We both think it's good to have sources if the information comes from an external source. We both think original research is OK, and we're both in favor of writing down the reasons for a certain reconstruction in the entry itself. I'm further in favor of you also mentioning yourself as the author of a given idea if indeed that is the case, or at least of referencing/copying the discussion that led to a given form being accepted here. So why not do it? And what is the problem with tagging the entries where this wasn't done yet? I also add {{rfap}} to basically every new Latvian entry I make, because this puts them in a single category where Latvian native speakers like Neitrāls vārds can comfortably find the words they want to add pronunciation files to. Because, just as in the case of etymologies being sourced (and I don't mean only external sources), this actually adds value to the entry. Why not make this official Wiktionary policy?--Pereru (talk) 20:46, 8 August 2015 (UTC)
I just don't want my name to be placed in entries, and especially not my real name. I think that's my prerogative. —CodeCat 21:28, 8 August 2015 (UTC)
Not even CodeCat? Why not? I don't want my real name here either, but I wouldn't mind signing something here as "Pereru", the same way I sign a picture I upload to Commons as "Pereru"... If our names are in the histories of the pages we edit, and here as signatures in the comments we write, why not also in suggestions in pages? But well, it *is* your prerogative. Call it then "Wiktionary contribution", or tag it with a "W" or "WK" to show that the idea originated here, rather than in the outside world.--Pereru (talk) 04:18, 9 August 2015 (UTC)
I don't accept the notion that we need to cite sources to list descendants; that would hobble us. Regular inheritance by a language of a word from an earlier stage of that language (including from a proto-language) is usually so obvious and non-noteworthy that it is not mentioned except for common words in well-documented languages, or for proto-language terms that an author needs to grasp at less-documented languages to demonstrate; good luck finding a reference that confirms, for any sizeable number of words from e.g. Rumantsch, that they indeed derive from Latin/PIE foobar. Even borrowing may be obvious but unreferenced; no reference in supra confirms that the word derives from Georgian, but it's fairly obvious.
I do think the sheer existence of a word in a proto-language is something we need to provide a reference for, though if a reference attests that a certain word existed in a proto-language, I think we can and should certainly adapt that reference's potentially outdated notation; when I do this in Proto-Algonquian appendices I write source (has form) (sometimes visibly and sometimes in an HTML comment). If no previous scholarship attests the existence of a word, we could put a template at the bottom of the entry (a bit like {{LDL}} and {{Webster}}) saying something like "this reconstruction is the product of deduction by Wiktionary editors"; users could then (as with every other claim on every non-talk-page) look to the page history to see who added what. Such a template would provide a nice way of tracking and periodically revisiting such entries to see if references for them had become available, since it seems to be obvious to everyone except CodeCat that citing external authority wrt the existence of words in proto-languages is better than leaving it at "well, a random, vehemently anonymous person on the internet thinks so". - -sche (discuss) 22:15, 8 August 2015 (UTC)
@-sche: the problem here is simply when you have cognates proposed by different sources. Cognates can be sourced by default (they will mostly come from the same source anyway), without necessarily adding a footnote to each of them; but those that come from some other source will need to be footnoted, so that we are clear the source in question did not claim cognacy in this case. This applies even to words suggested as cognate by Wiktionarians: we could add a little superscript "W" to those, for example. This happens because "obvious" is not always true. French parler looks like a cognate of Portuguese falar, but it isn't. In fact, it's standard scientific practice: when you are presenting cognates, they must be either (a) sourced, or (b) your claim, or at least (c) be attested in some very well-known source, so they can be presented as "known to everybody already".--Pereru (talk) 04:48, 9 August 2015 (UTC)
I'm still being misunderstood here it seems. I do think that citing an external authority improves etymologies and reconstructions further. However, I don't think that reconstructions are necessarily less reliable without them. Sometimes, the reconstruction is just so obvious that there's nothing else it could possibly be. A great example is Proto-Finnic *kala. It's exactly the same form as its ancestor and many of its descendants. If we can find sources that agree with our own ideas, then all the better, that just shows that we're not alone in thinking that. But the same applies to sources with respect to each other, too. If we have two sources that disagree with each other, then we can mention the idea from both of them. But we should also feel free to poke holes in these proposals. Maybe we (through WT:ES or a talk) could decide that one of them has more merit than the other, and we can mention our reasoning in the entry. As editors and researchers, we don't have to consider all sources equally valid. —CodeCat 22:28, 8 August 2015 (UTC)
I think the misunderstanding is actually yours, about how science works. Yes, *kala is maybe an obvious case, but it was not discovered by you or me. It has a proposer, and saying who it is is, I think, something an etymologist would be interested in. See, this is like saying we don't have to provide usage examples or definitions for words that "everybody knows". Yes, everybody knows what time and happy mean; yet Wiktionary provides them with definitions. Is this useless? No. Is it useless to provide a source for *kala? Again, no. Just ask any scientist: is it useless to provide sources for 'obvious' things? No, both for credit/historical reasons (the guy who said it first deserves the credit), and for scientific reasons ('obvious' ideas sometimes turn out to be wrong...). If the source is well-known ('everybody knows who proposed that'), then scientists will not mention the author (everybody knows the laws of gravitation were proposed by Sir Isaac Newton).
Looking for "the proposer of" obvious etymologies is not a good idea. Finnic is a dialect continuum, and it has always been known by the speakers that people in nearby areas use plenty of the same words. This would be sort of like asking "who was it that proposed that fish in British English and fish in American English are cognate?" (Or: "who was it to discover that the moon has phases?")
It's possible to do historiography on when an etymology like this starts turning up in scientific literature of course, but that's more constrained by the development of linguistic methodology and publication practices themselves. {{R:fi:SSA}} mentions appearances of kala spanning 350 years; the earliest inter-Finnic comparison found by them is Finnish ~ Estonian from 1786, followed by Karelian in 1799, Veps in 1830 (in the first linguistic report on Veps to be published), Votic in 1856 (in the first grammar of Votic to be written), etc. (There's no specific date on who was the first to claim that this is also a Proto-Finnic word; but if we grant modern theoretic understanding, this is already implied by the Finnish-Hungarian comparisons from the 17th century, so essentially the date would be as soon as someone came up with the concept of "Proto-Finnic" in the first place.)
I agree that this is information that someone might be interested in, but just referencing SSA itself should be enough so that people interested in the history of etymology would know where to look for more details. At Wiktionary we're only working on etymology itself, not its history. --Tropylium (talk) 13:47, 9 August 2015 (UTC)
Indeed, I agree, especially because a source like SSA would probably give you the beginning of the trail leading to the first proponent if need be. I'm not saying that you need to find out the very first historical source ever to make the claim; but that, unless the claim is yours, some source should be indicated (so the interested reader can follow the trail). And it seems that we agree on that, right? (The "fish" in AE and BE case is not really parallel: I don't think these words were popularly believed to be cognates, but rather they were believed to be the same word, much as when I use "fish" as opposed to when you use "fish": we are using the same words, even if we pronounce them differently. Now, English "fish" and German "Fisch" or Dutch "vis": that is not perceived as the 'same word', and cognacy enters the picture.) --Pereru (talk) 19:20, 10 August 2015 (UTC)

Transliterations in parentheses?[edit]

From the above discussion, it seems to me that most people want to keep the automatic transliteration of non-Latin-script examples. Would it be possible to implement DTLHS's suggestion of putting the transliteration in parentheses rather than after an em-dash, to distinguish it more clearly from the following translation? Could someone perhaps make the necessary changes in the appropriate module, assuming nobody has any objections? --Pereru (talk)

I oppose using brackets but perhaps a light-grey colour for transliterations would be more palatable? --Anatoli T. (обсудить/вклад) 12:59, 8 August 2015 (UTC)
What's wrong with brackets/parentheses? Transliterations on the headword line are in parentheses. Light grey text is hard for people with bad or limited eyesight (e.g. partial blindness) to read, although such people are probably only a tiny minority of our readers. I'd prefer parentheses to lighter text. I like the suggestion (made above) of putting transliteration on the same vs a different line according to the length of the line, but I guess it has no chance of actually corresponding to "fits on one line" vs "doesn't", given the variety of phone- and computer-screen sizes (unless we implement it as a CSS feature?). - -sche (discuss) 17:58, 8 August 2015 (UTC)
I agree. (Personally, I would even favor a smaller font, in addition to parentheses, but parentheses would already be enough to separate more clearly transliteration from transcription and from the original text).--Pereru (talk) 18:43, 8 August 2015 (UTC)
How about an option that allows transliterations to be shown and hidden at will? —CodeCat 19:49, 8 August 2015 (UTC)
Sounds OK to me. Is that easy to implement? --Pereru (talk) 04:16, 9 August 2015 (UTC)
As long as "at will" means something the end user does, not the editor. Benwing (talk) 05:19, 9 August 2015 (UTC)
Yes, it should work more or less like showing and hiding inflection tables. But there should probably be something that saves the user's preference too, so that transliterations stay hidden forever unless you show them again. —CodeCat 17:02, 9 August 2015 (UTC)
I can agree with that. I'll wait for implementation before using the templates in Eastern Mari, but after that it shouldn't be a problem. --Pereru (talk) 19:21, 10 August 2015 (UTC)

When adding RFC to entries[edit]

Would it be too much to ask that, when an RFC is added to an entry, the date be added as well (perhaps automatically), so that it can be traced back much more easily in the RFC records? Some RFCs remain in entries for years and get forgotten about, and are not easily traceable in the entry's history. Donnanz (talk) 16:00, 9 August 2015 (UTC)

Wikipedia has a bot that goes around adding dates to cleanup templates. Perhaps we could ask the folks who run it to run one here, too. You can always find the RFC discussion via the whatlinkshere (restrict it to searching the Wiktionary namespace and ctrl-f "cleanup"), unless the page was tagged but not listed. - -sche (discuss) 01:59, 11 August 2015 (UTC)
One way of adding the date is by adding your "four tildes" next to the RFC, but very few users would think of that, hence this thread. I try not to create too many RFCs! Donnanz (talk) 16:23, 11 August 2015 (UTC)
We already have the capability to deploy "oldest" and "newest" tables (such as the "oldest" table at the top of this page) for categories, which addresses one of your concerns.
The very existence of these suggests that the dates when an item was added to a category must already be accessible. Does anyone know how? DCDuring TALK 18:09, 11 August 2015 (UTC)
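As a rough illustration of the date-stamping idea raised in this thread, here is a minimal Python sketch of what a bot pass over an entry's wikitext might do. It is not the Wikipedia bot's actual code: the assumption that the tag is written as `{{rfc}}` or `{{rfc|...}}`, and the `date=` parameter name, are hypothetical and would have to match whatever the real template accepts.

```python
import re
from datetime import date

# Hypothetical: assumes the cleanup tag is written {{rfc}} or {{rfc|...}}
# with no nested templates inside it. Tags such as {{rfc-sense}} are not matched.
RFC_TAG = re.compile(r"\{\{rfc((?:\|[^{}]*)?)\}\}")


def add_rfc_date(wikitext: str, today: date) -> str:
    """Append a date= parameter (hypothetical name) to undated {{rfc}} tags."""
    stamp = today.isoformat()

    def stamp_tag(match: re.Match) -> str:
        params = match.group(1)
        # Leave tags alone if they already carry a date parameter.
        if re.search(r"\|\s*date\s*=", params):
            return match.group(0)
        return "{{rfc" + params + "|date=" + stamp + "}}"

    return RFC_TAG.sub(stamp_tag, wikitext)
```

Running the function a second time over the same text changes nothing, so a bot could revisit pages safely; a dated tag would then let the oldest requests be listed without digging through page history.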

Templatizing usage examples[edit]

FYI, I created Wiktionary:Votes/pl-2015-08/Templatizing usage examples. Let us discuss the proposal, and postpone the start of the vote as much as the discussion requires. --Dan Polansky (talk) 09:38, 10 August 2015 (UTC)

I support this and I don't see why anyone wouldn't. It's analogous to why we templatize headwords and such. Templatized foreign-script languages, for example, allow for automatic translit. And likewise, the format can be changed, either by the end user through CSS or by editing the template -- e.g. if we figure out how to automatically use CSS to decide whether to put such an example on one line or multiple lines, which should definitely be doable since things like Bootstrap (a CSS library released by Twitter) can do it. Benwing (talk) 01:39, 11 August 2015 (UTC)
Is this something we even need to vote on? Is anyone against it? We've been templatizing usage examples for quite a while now and I don't remember anyone complaining. --WikiTiki89 01:45, 11 August 2015 (UTC)
I'm getting tired of all these pointless votes to be honest. —CodeCat 01:48, 11 August 2015 (UTC)
I too oppose votes on matters of formatting and template usage (and have stated as much in the past). Such votes could be seen as, at best, pointless, or as disruptive attempts to block the implementation of relatively minor changes by requiring the changes undergo more hurdles and meet a higher threshold (compare how US congresspeople use the filibuster to raise the threshold for passing legislation from 51% to 60%, blocking legislation which has enough votes to pass but not enough votes to come to the floor). Once before I started an "oppose having this vote" section on a vote, which garnered as much support as the vote itself; one could consider such an action if this vote is opened. (Side note, all the examples in the vote are English usexes, but I think it may be wise to consider English usexes — which don't need transliteration or translation — differently from foreign-language usexes.) - -sche (discuss) 02:14, 11 August 2015 (UTC)
Here's Wiktionary:Votes/2015-03/Templatizing topical categories in the mainspace; it has 50% support. Here's Wiktionary:Votes/2014-08/Migrating from Template:term to Template:m; it ended with 60% support. I find the above implication that editors at large should not have a consensus-based say in matters of template use in the mainspace and formatting in mainspace disconcerting. The wiki and template markup is the user interface and it matters a lot. The formatting instructions WT:ELE are a policy and cannot, in most circumstances, be edited without a vote. I oppose the use of ux and usex templates in English and Czech entries; it adds almost no value and makes the markup ugly to read. I never said so since I did not have the energy to do so; there are usually all too many things to discuss, in part since there are too many unnecessary changes being introduced by various editors without discussion. I have finally lost my patience, after seeing an editor chastise another editor for not using these templates. If I am a lone voice, the vote will easily pass. --Dan Polansky (talk) 08:03, 11 August 2015 (UTC)
As for "meet a higher threshold", can you clarify what the lower threshold and the higher threshold are in this particular Wiktionary situation? Do you consider 2/3 to be too high a threshold to pass? --Dan Polansky (talk) 08:09, 11 August 2015 (UTC)
You shouldn't create a vote before the issue has ever even been discussed. --WikiTiki89 10:33, 11 August 2015 (UTC)
The vote can be postponed as much as the discussion needs. Furthermore, overtemplatizing has been discussed, AFAIR. I remember one editor expressing his dislike of quotation templates and his preference for plain non-templated markup for attesting quotations; that's a case similar though not the same as example sentences. --Dan Polansky (talk) 11:49, 11 August 2015 (UTC)
  • Could someone remind me of what the benefit of this template is to new contributors, to passive users, or to others? If the benefit is a technical benefit that inures in a diffuse way to many, please explain. DCDuring TALK 12:30, 11 August 2015 (UTC)
See my comment up top about the benefit of the template, although there may be other reasons as well. Benwing (talk) 14:04, 11 August 2015 (UTC)
I asked not about the generalized benefits of templates, but of this one. I was hoping there were more.
So the total benefit is in the statement "the format can be changed, either by the end user through CSS or by editing the template -- e.g. if we figure out how to automatically use CSS to decide whether to put such an example on one line or multiple lines"
  1. What portion of our "end users" (admins? whitelisted editors? newbies?) will be trusted to make CSS changes of broad implementation? How would that work? Can you point to any examples or analogs in existing templates?
  2. Generally it seems that the features of templates quickly become Luacized, which dramatically reduces the ability of more casual contributors like me to make changes, especially since there is no group of responsive technical contributors willing to respond to requests, rather than implement their own cryptic agendas.
  3. All benefit depends on either:
    1. total implementation of a very capable (ergo, hard to develop successfully) template or
    2. allowing user-option non-use of the template when it fails to provide good output by the person using the template, ie, someone who knew or was willing to learn the switches etc.
But we do not even have consistent use of our existing format, which is almost certainly needed for successful mass conversion to the template approach. What steps have we taken to discover inconsistencies in formatting, to learn from them, and to either correct them or amend WT:ELE?
Our failure to successfully continue deployment of Autoformat worries me. The existing format-maintenance system seems to be a regression requiring much more manual involvement.
It would be much easier for me to accept changes if they did not make it harder for newer content contributors, did not require more typing, did not make editing harder by uglifying the edit frame, led to specific benefits that were achievable with reasonable certainty, and were implemented by a responsive group of technical contributors. Continued overtemplatization in areas for which we need more contributors, ie, definitions, usage examples, citations, seems approximately opposite to the direction we should go. DCDuring TALK 14:50, 11 August 2015 (UTC)

Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3[edit]

Some people recently mentioned they missed Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 3, despite the fact that the vote was open for 5 months. Some of the people who missed it could have been User:Cloudcuckoolander, User:Ungoliant MMDCCLXIV, and User:DCDuring. I would like to encourage such people to post late votes, properly indented so that they do not count (e.g. #: Late '''oppose'''). We can't keep votes open forever, but we can continue to collect best evidence of consensus or its lack. Having a rationale accompanied with a late vote would be very preferable, I think. --Dan Polansky (talk) 10:13, 10 August 2015 (UTC)

Thanks. Nevertheless, pinging for votes, even after-the-fact, looks like electioneering, a use of discretion that biases the process. It is what political parties do in elections: get out their vote. As I recall, there is some policy (probably unenforceable) against using e-mail to solicit votes. This has the merit of being more transparent, but still. Is but still an includable idiom or just elision? DCDuring TALK 14:15, 10 August 2015 (UTC)
I see your point. By pinging, I notified three people who explicitly said that they missed the vote, two of whom are likely to oppose the vote and one of whom would support based on his past comments. At the same time, I posted to Beer parlour so everyone who monitors Beer parlour is indirectly notified. I don't know what better I could have done other than stay silent. Late votes won't change the vote result anyway but are interesting, so I think they are a good idea. --Dan Polansky (talk) 17:42, 10 August 2015 (UTC)
The BP note alone would be more to my taste. But, as I said, pinging from a well-watched page is at least transparent, especially compared to alternatives. DCDuring TALK 18:14, 11 August 2015 (UTC)

Rare senses x rare forms[edit]

I have noticed that the parameter "rare" of the {{template:context}} categorizes entries into the Category:Terms with rare senses by language, while the parameter "uncommon" into the Category:Rare forms by language. What is the difference between these two categories? Originally I thought that "rare forms" contains only forms of some lemma, which are rare (e. g. common Czech word "pes ‎(dog)" has a common plural "psi", but rarely "psové" can be found too), but the real content of the category does not look so. Jan Kameníček (talk) 00:19, 11 August 2015 (UTC)

The fact that various rare, historical, dated, archaic, and obsolete things are categorized differently is due to (1) a desire to categorize terms with only obsolete/rare senses (like heleth) differently from terms which are still current/common in some senses (like land), combined with (2) the fact that categorizing such entries differently requires a lot of work (edits to entries, templates, etc), most of which has not been done yet. I think the ideal/plan/hope is that one day terms like heleth will be in Category:English obsolete terms (I am not sure why Category:English obsolete forms exists with the name and content it has; as you note, it should properly be used only for e.g. low as a form of laugh), while land et al will be in Category:English terms with obsolete senses. (And likewise with rare things.) - -sche (discuss) 01:03, 11 August 2015 (UTC)

Retiring the codes of spurious languages[edit]

As of this year, the ISO has retired or has received requests to retire the following codes on the grounds that they are spurious and the languages they ostensibly refer to never existed. I suggest we also retire the codes.

  1. cbh Cagua, kox Coxima, cum Cumeral, ome Omejes, toe Tomedes, rna Runa. I quote from the change request forms (cbh, kox, cum, ome, toe): "Alan Wares, in correspondence with Barbara Grimes (5/28/1971), stated that [each one] should be deleted as 'non-existent.' Moreover, the Ethnologue has not added any information to the language entry in nearly 40 years. Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Landaburu 2000, Ortiz 1965) for the non-attestation of [it]." (rna's change request is similarly blunt about the total lack of evidence that it exists.)
  2. cbe Chipiajes and pod Ponares. These are surnames rather than language names. Quoth the change request form for cbe: "Alan Wares, in correspondence with Barbara Grimes (5/28/1971), stated that Chipiajes should be deleted as 'non-existent.' The only information that the Ethnologue has added for Chipiajes: 'A Sáliba surname. Many Guahibo also have that name.' Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Landaburu 2000, Ortiz 1965) for the non-attestation of Chipiajes." The comments on pod are similar.
  3. xbx Kabixí. See the change request form, where it is noted that the term Kabixí is a catch-all for any hostile tribe, and the linguist who studied it "concedes that there was no information on" it.
  4. iap Iapama. Quoth [4]: "There is no evidence that this language exists. No information has been added to the Ethnologue since the 1980s. Mention of this language is missing from all the major sources on South American languages (Adelaar 2004, Campbell 1997, Dixon & Aikhenvald 1999, Loukotka 1968, Crevels 2007, Voegelin & Voegelin 1977). Hammarstrom (2014, in press) cites two additional sources (Grenand & Grenand 1994, Gallois & Ricardo 1983) for the non-attestation of Iapama."
  5. svr Savara. Quoth [5]: "Hammarstrom (2014, in press) states that it has been checked quite carefully that no Dravidian language exists matching the name Savara or any of the other information in the entry (p.c. David Stampe 2011) , nor, for that matter, the Indo-Aryan Oriya variety labeled Sahara/Saora in Mahapatra (2002:183-184). Barb Waugh, in an email dated 10 November 2009, responded to queries about Savara stating that she did not believe that the language existed at all. She only knew Savara as an alternate name for Sora [srb], a Munda language. She pointed out that Ruhlen (1987) lists Savara as a Dravidian language. Around the same time, Kirk Miller (UCSB) wrote questioning the existence of this language."
  6. yds Yiddish Sign Language. See jewish-languages.org's entry for more.
  7. btl Bhatola. See [6].
  8. myi Mina (India). See [7]. Neatly, retiring this will allow us to include hna Mina (Cameroon) without a disambiguator.
  9. pry Pray (aka "Pray 3", as Ethnologue called it because they just gave up on disambiguating it in any of their usual ways). This one is not strictly spurious; rather, it turns out to be no more than a duplicate of prt Prai (aka Pray). See the change request form with data from recent field research.
  10. yos "Yos" was retired and merged into zom "Zo", with the change request noting that Yos is simply the English plural of Yo, which is no more than a variant form of Zo.

If you have objections to any of these, speak up. (This list does not include codes retired by being split or merged [except pry and yos], or some other codes; I'll post about those later.) - -sche (discuss) 02:39, 12 August 2015 (UTC)

About gloss parameter in term templates in all other sections, except etymology[edit]

I am not sure how helpful a gloss is in Derived and Related terms or even in Synonyms. A term, as all words, may have many different senses. These can include a figurative one, a literal one etc. The existence of a gloss definition in the etymology section is useful since possibly (but not always) only one sense is the specific one that "caused" the people to use the word (or phoneme). I came to these conclusions after @Saltmarsh pointed out that I should add some gloss definitions to my additions. I tried it for a start, but there were some "mind" troubles when I had to add terms that have more than one widely used sense or more than one gender. Someone might say that in such cases do not add a gloss. But even if only one sense is widely used we "provoke" the rejection of all other senses that the user may find in the article. --Xoristzatziki (talk) 06:05, 12 August 2015 (UTC)

Synonyms should use {{sense}}. Derived and related terms should specify a gloss but it can be of the form |gloss=foo, bar, baz with multiple defns; doesn't have to be all possible senses but should be the principal ones. Benwing (talk) 06:45, 12 August 2015 (UTC)
[I've been away]   Reasoning: the gloss may be interesting/relevant and having it there saves the user linking through to find out. User Benwing shows the syntax.   Gender: I think I would do this for multiple gender forms of different meaning:
Idiomatic phrases: I would normally link each term (do we want all of these with a separate entry) and these certainly need a gloss.  — Saltmarshσυζήτηση-talk 05:57, 15 August 2015 (UTC)
  • Synonyms and Derived terms sections should not specify a gloss, by my lights. It is not only my preference but also a long-term overwhelming practice not to specify a gloss. Some people prefer glosses, obviously, but it is nowhere close to being a prescribed or recommended practice. Similarly, these sections should not provide gender, IMHO, but here the practice varies language by language. As for rationale, glosses make these sections too busy with information that is available elsewhere. Gender is okay as for being too busy or not, but is available in the lemma, and IMHO not so important that it should be available in term lists. --Dan Polansky (talk) 17:42, 16 August 2015 (UTC)

Allow Etymology as level 4 header[edit]

Me and some others (I don't know who, or where the discussion was) have expressed in the past a desire to have the etymology section nested under part-of-speech sections, rather than floating alongside them (both on level 3) or having the part of speech nested under the etymology. I think it makes more sense to put etymology underneath an individual word:

  1. Users generally look up terms for their definitions, etymology is of lesser importance overall. Therefore, it makes more sense to put it below the definitions.
  2. Etymology always applies to a single word and part of speech. If it happens to apply to multiple parts of speech, then the chances are that one of them was first, and the others were derived from that. That's something we can and should note in the etymologies of each individual part of speech.
  3. Having to increase the heading level whenever there are multiple etymologies is annoying. It also makes it look less consistent; sometimes POS is level 3, sometimes level 4? Level 5 headings are hard to distinguish visually from level 4, so I think level 4 should be the highest level we use.
  4. For non-lemma entries, we generally don't have or need etymologies, but we're forced to create etymology sections for them whenever there is another word in the entry. For example, rose ‎(rise, past) needs an etymology header to separate it from the header for rose ‎(flower), but the etymology section itself is left empty or doesn't have any useful information, because the etymology is at the lemma, rise.

So I'd like to ask/propose that etymologies be allowed to be nested underneath the POS header, as level 4. It would be added below the definitions, usage notes and inflection headers, but above synonyms, antonyms and derived/related terms. This is done in accordance with the general principle in our entry layout that information about the current term precedes information about relationships to other terms.

This proposal is intended as an indefinite trial, to let users who prefer this alternative format apply it to entries and evaluate its merits and problems. The original format will continue to be allowed as well, at least until there is a decision to phase it out. —CodeCat 18:50, 12 August 2015 (UTC)
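To make the proposed ordering concrete, here is a small Python sketch that assembles the wikitext for one part-of-speech section with Etymology as a level 4 header. It is only one reading of the proposal above: the function name, the two ordering lists, and the placeholder section bodies are illustrative assumptions, not an agreed layout.

```python
# Level-4 sections that the proposal places before and after Etymology.
# These lists are an illustrative reading of the proposal, not policy.
BEFORE_ETYMOLOGY = ["Usage notes", "Inflection"]
AFTER_ETYMOLOGY = ["Synonyms", "Antonyms", "Derived terms", "Related terms"]


def render_pos_section(pos: str, definitions: str, subsections: dict) -> str:
    """Build one level-3 POS section with its level-4 subsections in order."""
    order = BEFORE_ETYMOLOGY + ["Etymology"] + AFTER_ETYMOLOGY
    lines = [f"==={pos}===", definitions]
    for name in order:
        # Sections the entry does not have are simply skipped, so a
        # non-lemma form needs no empty Etymology header.
        if name in subsections:
            lines.append(f"===={name}====")
            lines.append(subsections[name])
    return "\n".join(lines)
```

Under this reading, a non-lemma form such as the past tense of rise would be rendered with no Etymology header at all, which is the situation point 4 of the proposal describes.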

"If it happens to apply to multiple parts of speech" -- isn't that overwhelmingly common? It would be tiresome to have ety sections repeated all over the place when they are basically the same word/sense. Do other dicts do that? Equinox 18:54, 12 August 2015 (UTC)
I've not found it particularly common in the languages I've worked with, it's quite rare. Maybe English is just an exception. But this is why I'm not proposing to get rid of the old format just yet; we can still keep using it in situations that we haven't found alternatives for yet. That said, I think it's pretty easy to handle this with nested etymologies, as I noted in point 2. Just put the etymology on the term that was first, and the rest get etymologies saying they were derived from that first term. For example, up ‎(preposition) is derived from up ‎(adv), which our entry fails to note. —CodeCat 19:01, 12 August 2015 (UTC)
For many words, it might not be known which POS came first. Other dictionaries do not do this. They generally list all the parts of speech and give one etymology at the top or bottom of the entry (which sometimes mentions different derivations of specific senses of the word, but is still in one section). --WikiTiki89 19:12, 12 August 2015 (UTC)
"English is just an exception." And also merely, technically the host language of this wiki.
No matter how many times this is proposed, it still seems like a bad idea. The structuring advantage of having semantically related terms that are different PoSes is enormous for English. Though it is not particularly helpful for one not familiar with large dictionaries, it is quite helpful once one gets the hang of it. It is almost essential where there are homonyms both with, say, nouns as PoSes. Is it the proposal to combine all of the noun PoSes, no matter what the etymologies? We have spent a fair amount of effort trying to split etymologies where semantically warranted. To run the definitions through the blender as seems to be proposed seems like a regression. We may have come to accept them in technical areas as people abandon the project and their creations lapse, but I don't see why they should be allowed in content. DCDuring TALK 23:26, 12 August 2015 (UTC)
Where are you getting the idea that POS sections are going to be merged? They'll be split by etymology as they always have been. —CodeCat 23:40, 12 August 2015 (UTC)
  • If POS is level 3, and etym is level 4... how would the POS sections not be merged? I find this rather confusing. ‑‑ Eiríkr Útlendi │Tala við mig 00:40, 13 August 2015 (UTC)
And I'm confused that it confuses you, because it seems pretty simple to me. POS sections are not merged, they're kept separate as they are now. Nothing more to it. —CodeCat 00:54, 13 August 2015 (UTC)
Just for fun (and clarity), could you take one of the more complex entries and reformat it in your proposed style (perhaps within your userspace)? It'd be useful for reference. Equinox 01:11, 13 August 2015 (UTC)
Ok, can you give me one you had in mind? —CodeCat 01:25, 13 August 2015 (UTC)
I think what is being proposed is (for e.g. rose):
A flower. (blah blah blah, headword line template, synonyms, etc)
From Oscan.
Past tense of "rise".
Inflected form of "rise".
Which indeed obviously keeps the POS sections distinct. I'm not sold on such an arrangement, but one obvious benefit is that etymology could be pushed below the definitions, which some people have favoured. - -sche (discuss) 03:01, 13 August 2015 (UTC)
  • To clarify my concern about merging POS bits, I'm not talking about nouns and verbs being thrown together. Instead, I'm concerned about terms that have multiple senses of a single POS type, and where those separate senses have different etymologies. If the etymology header is made subordinate to POS, then things get confusing pretty quickly. Consider the Japanese entry at , for instance. This term has nine different noun senses, all with distinct etymologies (and eight distinct pronunciations even). The proposed structure of an ====Etymology==== header at level 4, under a ===Noun=== header at level 3, would make this entry a complete mess. ‑‑ Eiríkr Útlendi │Tala við mig 18:07, 14 August 2015 (UTC)
See diff for an example of how I think etymologies should be handled. Each POS has its own etymology, including (especially) forms of another lemma. Two different lemmas can't possibly have the same etymology, because after all, if they have the same development history, why are they still different?
As for Eirikr's entry above, I'm not really seeing the issue. The entry already has one etymology section for each POS, so all that would be left to do is to switch the headers around. —CodeCat 20:09, 18 August 2015 (UTC)
  • CodeCat, have another look. As visible in the page's TOC, some single etymologies cover multiple POSes -- noun and prefix, noun and suffix.
In addition, I still don't quite understand your proposed layout. Further up the thread, it sounds like all nouns would go together under a single ===Noun=== header -- which then leaves me wondering how the disparate etymologies would be accounted for. Even if your intention is to have as many ===Noun=== headers as there are etymologies, this produces a strange circumstance where we are organizing higher-level headers in a way that's dependent on lower-level headers. Just in terms of hierarchical organization, that seems backwards.
And that still doesn't account for the case where an entry has multiple etymologies, and some of those etymologies apply to multiple POSes. Numerous Japanese terms have a single spelling, with multiple POSes under a single etym and pronunciation. Fewer, but still numerous, entries have multiple separate etymologies, each etym with its own pronunciation and possibly multiple POSes.
Would you be willing to edit the entry into your proposed structure, as you did for dice? A more concrete example would illustrate things more clearly, I think. ‑‑ Eiríkr Útlendi │Tala við mig 21:07, 18 August 2015 (UTC)
See User:CodeCat/ja. Since I didn't know the etymologies of all the terms, I had to make something up. —CodeCat 21:44, 18 August 2015 (UTC)

Why don't we have an Unattested namespace?

Putting unattested terms in the Appendix namespace gives no information about why they are there or how they differ from other appendices. Why not give them their own namespace? There is nothing particularly appendix-like about them. DTLHS (talk) 02:57, 13 August 2015 (UTC)

New namespace I'm confused: how would a new namespace help? —Justin (koavf)TCM 03:06, 13 August 2015 (UTC)
If you're talking about reconstructed proto-language terms (versus, say, this kind of unattested terms), I think giving them their own namespace (say, "Reconstructed:") would be a fine idea. We could even perhaps then write a Mediawiki: page or, at worst, some js/css, to automatically display the "this term is reconstructed" warning atop such pages, which people currently have to remember to add manually. - -sche (discuss) 03:08, 13 August 2015 (UTC)
Right, reconstructed, sorry. DTLHS (talk) 03:11, 13 August 2015 (UTC)
It would also make it easier to parse reconstructed pages, which should be treated like all other namespace pages, vs appendix pages which mostly should not be. DTLHS (talk) 03:30, 13 August 2015 (UTC)
I was a bit confused by the use of "Unattested" at first, (Appendix:English unattested phobias and Appendix:English dictionary-only terms come to mind) but for reconstructed terms I, too, support the idea of creating the separate namespace Reconstructed:. --Daniel Carrero (talk) 06:02, 13 August 2015 (UTC)
We should name the namespace so as to include constructed languages as well. --WikiTiki89 06:10, 13 August 2015 (UTC)
I definitely support a Reconstructed: namespace. I don't think we should include appendix-only constructed languages in it. What we should do with them, I don't know, but muddling the reconstructed namespace with them is a bad idea and would take away some of the technical benefits of such a namespace. My own personal preference is to just delete them altogether. —CodeCat 11:41, 13 August 2015 (UTC)
I support a Reconstructed: namespace too, without conlangs. They can have a namespace of their own, e.g. Conlang:. —Aɴɢʀ (talk) 12:19, 13 August 2015 (UTC)
I also support a Reconstructed: namespace. Not sure about conlangs; either they should go into Conlang: or into the main namespace. Arguably, conlangs that are well enough attested should go into the main namespace and others shouldn't be included at all. If we use a Conlang: namespace, where do we draw the line? Esperanto was originally a conlang, too, but we put it in the main namespace. Same with Lojban, for example. Benwing (talk) 12:53, 13 August 2015 (UTC)
This is why I prefer deleting them. It's a bit strange if we say "yeah, we don't actually allow these conlangs, but if you hide them away in an appendix then it's ok". —CodeCat 12:54, 13 August 2015 (UTC)
Question: What about reconstructed terms in, say, Vulgar Latin? What differentiates them from terms in Proto-Romance? Our entry claims VL and Proto-Romance are synonyms, but w:Vulgar Latin says that the two are "often confused". DCDuring TALK 13:33, 13 August 2015 (UTC)
Something to consider: should the pages in this new namespace be named with the language name as they are now? Or should we have entries named only with the headword, like in the main namespace? —CodeCat 16:33, 13 August 2015 (UTC)
There would be some pages with multiple proto-languages on them, e.g. the strings *me- and *ke- are so short that they're surely found in more languages than just proto-Algonquian. OTOH, handle that just fine in the main namespace. - -sche (discuss) 18:23, 13 August 2015 (UTC)
I'm in favor of organizing them like the main namespace rather than like the current layout, e.g. /wiki/Reconstructed:bʰer- with a ==Proto-Indo-European== heading rather than /wiki/Reconstructed:Proto-Indo-European/bʰer-, where the ==Proto-Indo-European== heading would be redundant. Maybe we could pick a shorter name for the namespace though, like Proto:. —Aɴɢʀ (talk) 18:18, 14 August 2015 (UTC)
Proto: would not work for non-protolanguage reconstructions. —CodeCat 18:25, 14 August 2015 (UTC)
It would work, it just wouldn't be the optimal name. "Recons:", maybe? I just don't feel like typing out "Reconstructed:" all the time. —Aɴɢʀ (talk) 18:45, 14 August 2015 (UTC)
I would suggest "R:" were it not for the fact that that would conflict with how we name and transclude reference templates. - -sche (discuss) 18:47, 14 August 2015 (UTC)
The software allows for namespace shortcuts. WT: is a shortcut to Wiktionary:. —CodeCat 18:57, 14 August 2015 (UTC)
Yeah, I (and, on my talkpage, JohnC5) have thought about the utility of having more namespace shortcuts, e.g. AP: for appendices. The shortcut might still have to be RC:, though, since I suspect the existence of an R: namespace (even as a redirect) might cause {{R:OED}} to be interpreted as a transclusion of R:OED rather than Template:R:OED (certainly I would expect it to fail to reach Template:R:foo for any {{R:foo}} where R:foo was a page). Side note, @Angr, how often would you be typing out rather than copy-pasting the first part of the pagename (Reconstructed:) given that the second part would probably contain characters like ɸ or ʰ₂r̥ that you'd have to copy-paste or insert from the edittools? Perhaps we could add Reconstructed: to the things edittools can insert... - -sche (discuss) 19:11, 14 August 2015 (UTC)
Even then, all our linking templates already treat * as a shortcut to reconstructed pages. So you'd only need to type the namespace name in the very rare occasion that you're not using a linking template. —CodeCat 19:23, 14 August 2015 (UTC)
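To make the page-naming options from this thread concrete, here is a minimal Python sketch of how a `*`-prefixed term might map to a page title under the current Appendix layout and under the two proposed Reconstructed: layouts. The function name, the `layout` labels, and the use of a full language name rather than a language code are assumptions made for illustration; this is not how the actual linking templates are implemented.

```python
def reconstructed_title(language_name, term, layout):
    """Build a page title for a reconstructed term (illustrative only).

    `term` must start with "*", the marker the linking templates are said
    to treat as a shortcut to reconstructed pages.
    `layout` is one of three hypothetical labels:
      "current"       -> Appendix:<language>/<term>
      "per-language"  -> Reconstructed:<language>/<term>
      "headword-only" -> Reconstructed:<term>
    """
    if not term.startswith("*"):
        raise ValueError("reconstructed terms are expected to start with '*'")
    bare = term[1:]  # drop the asterisk; it is not part of the page name
    if layout == "current":
        return f"Appendix:{language_name}/{bare}"
    if layout == "per-language":
        return f"Reconstructed:{language_name}/{bare}"
    if layout == "headword-only":
        return f"Reconstructed:{bare}"
    raise ValueError(f"unknown layout: {layout!r}")


for layout in ("current", "per-language", "headword-only"):
    print(layout, "->", reconstructed_title("Proto-Indo-European", "*bʰer-", layout))
```

Under the "headword-only" layout the language no longer appears in the title, so it would have to be carried by a ==Proto-Indo-European== heading on the page, as in the main namespace.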
I feel like I waste hours of my time typing the words Appendix and Category. If the abbreviations AP and CT respectively existed, I would be very pleased. Also Temp or TP for Template would be great for that matter. I don't see why we don't have more of these. I also support the creation of the Reconstructed namespace. —JohnC5 19:28, 14 August 2015 (UTC)
@CodeCat: Separate to this discussion, could we look into adding those shortcuts to the search bar? —JohnC5 12:58, 19 August 2015 (UTC)
Ideally, we'll get a few more users to chime in here supporting such namespace-redirects. Then we can file a Phabricator ticket asking for (a) a 'Reconstructed' namespace, and (b) 'RC'→'Reconstructed', 'AP'→'Appendix' and 'CT' (or maybe 'CA', since 'CT' sounds like 'Category talk' although we almost never have discussions on Category talk pages) → 'Category' namespace-redirects. It shouldn't be hard / take long for the devs to grant such things to us. - -sche (discuss) 07:49, 20 August 2015 (UTC)
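The concern about an R: shortcut colliding with reference templates can be illustrated with a simplified model of how a transclusion name is resolved. The sketch below is an approximation written for this discussion, not MediaWiki's actual code: the namespace and alias tables are hypothetical, and details such as case-insensitive namespace matching are left out.

```python
# Hypothetical tables; the real ones live in the wiki's configuration.
NAMESPACES = {"Template", "Appendix", "Category", "Wiktionary"}
ALIASES = {"WT": "Wiktionary"}


def resolve_transclusion(name, namespaces=NAMESPACES, aliases=ALIASES):
    """Return the page that {{name}} would transclude in this simplified model."""
    if name.startswith(":"):
        # A leading colon points at a main-namespace page.
        return name[1:]
    prefix, sep, rest = name.partition(":")
    if sep:
        canonical = aliases.get(prefix, prefix)
        if canonical in namespaces:
            # A recognised namespace (or alias) wins over the Template default.
            return f"{canonical}:{rest}"
    # No recognised prefix: fall back to the Template namespace.
    return f"Template:{name}"


# Today: "R" is not a namespace, so {{R:OED}} reaches Template:R:OED.
print(resolve_transclusion("R:OED"))

# With a hypothetical R -> Reconstructed alias, the same call is captured.
with_r = {**ALIASES, "R": "Reconstructed"}
print(resolve_transclusion("R:OED", NAMESPACES | {"Reconstructed"}, with_r))

# With RC instead, reference templates are unaffected.
with_rc = {**ALIASES, "RC": "Reconstructed"}
print(resolve_transclusion("R:OED", NAMESPACES | {"Reconstructed"}, with_rc))
```

In this model an `R` alias would capture every `{{R:...}}` call, whereas `RC` leaves the existing reference templates alone, which matches the reasoning given above for preferring RC:.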
  • I agree that conlangs should be handled separately from reconstructed terms. In contrast to how we handle proto-languages, our current approach to conlangs actually is fairly well suited to the appendix namespace, in that we have one page (one appendix, total) on each conlang. However, most of them are constant copyvio magnets, since we can only allow short appendices, but the inclusion of any appendix at all tempts people to expand said appendix: see e.g. [8] (BP discussion of copyright issues). I wouldn't mind deleting most of them, perhaps moving a few (de minimis) words into our mainspace entries on the names of the conlangs, using {{examples-right}}, like this. - -sche (discuss) 18:23, 13 August 2015 (UTC)
  • Support a separate namespace for reconstructed languages (for one, it's by far the busiest part of the Appendix: namespace, and trying to find out whatever is going on with all the other appendices is a pain). I do not think that a mainspace-type approach to lumping "homographic" roots from different protolangs on the same page is a good idea, though. Notational systems for protolangs vary greatly, and this could imply a senseless amount of repetition of "Alternate spelling of…" sections in the future. The basic object of protolang pages is an etymological group, not the graphical representation of its proto-form, per se.
    In fact I could suggest that the new namespace be named simply Etymology:, and that it could include appendix pages tracing the descendants of attested words just as well (a la Appendix:Names derived from Marcus). --Tropylium (talk) 14:10, 22 August 2015 (UTC)

Eskayan


Last year, the ISO approved the code esy for Eskayan. Should we follow suit, and if so, should we allow it in the main namespace? It's technically a conlang from the early 1900s, but it comes with a mythology that claims it's much older and it functions as a medium for recording traditional stories (both in Roman script and in a native script which lacks an ISO 15924 code). It has no native speakers but a few hundred secondary speakers and a few schools to teach it. - -sche (discuss) 06:50, 14 August 2015 (UTC)

I've added it to Module:languages. It is spoken by a few hundred people, and schools teach it and literature is published in it and has been for almost a century, so I suppose it is allowed in the main namespace like Esperanto. Its creator intended it for widespread use (by his ethnic group) and attributed it to his tribe's mythical ancestor rather than to himself, and then he (the actual creator) died in 1949, so as far as copyright concerns go, it seems similar to e.g. Esperanto and different from e.g. Dothraki. Shall we update WT:CFI#Constructed_languages to note the existence and inclusion of Eskayan, or is that not necessary because the ISO doesn't categorize esy as a constructed language, and it does not itself admit that it is one (even though it is identifiable as such by linguists)? - -sche (discuss) 19:03, 16 August 2015 (UTC)
I don't see any reason to exclude it.--Prosfilaes (talk) 20:32, 17 August 2015 (UTC)
@-sche: Reading up on it, I see that it's pretty much relexified Boholano Cebuano. If that's the case, it resembles the avoidance registers of Australia or the pandanus languages of New Guinea, which we treat as part of whatever language's grammar they have. Perhaps, then, we ought not to be including Eskayan on those grounds instead. —Μετάknowledgediscuss/deeds 02:20, 18 August 2015 (UTC)
That seems to value internal consistency over ease of use and external consistency. If the world treats it as a separate language, it seems like people looking it up are going to be expecting it to be a separate language.
Also, people looking up a language that has multiple known registers are going to know about the registers. It's a lot easier for students and the like to get confused if we mix Eskayan words in with Boholano Cebuano words, no matter how they're labeled.--Prosfilaes (talk) 04:56, 18 August 2015 (UTC)
Right. Furthermore, I'm not sure treating Eskayan as Cebuano would even provide internal consistency: if (as is my understanding) the entire lexicon is different to the point that there is zero mutual intelligibility, on what basis would we consider them the same language, while considering languages with very similar grammars and lexicons (say, Danish and Swedish) to be distinct? It's my impression that even the largest avoidance registers contain only a fraction of the number of words the main language possesses. - -sche (discuss) 03:27, 26 August 2015 (UTC)
  • I'm big on recording what people are actually using to communicate. On the other hand, there's a lot of missionary-mangled versions of languages that aren't really worth bothering with, and this looks like it might be just another example. If someone wants to do it, I'm not going to object.--Prosfilaes (talk) 07:33, 29 August 2015 (UTC)
    • I agree. If islanders use this language to communicate amongst themselves, then it would seem to be comparable to (a much less widespread form of) Esperanto, or even to Michif with only the difference that the group of people who created it lived recently enough to be identifiable by name rather than lost to the mists of time. But if it never gained use outside of the missionaries' materials, then it would seem comparable to other failed attempts at language-blending conlangs. - -sche (discuss) 16:57, 29 August 2015 (UTC)

Neo and Talossan (the two ISO-coded conlangs CFI doesn't specifically address)

Quoth CFI as updated to reflect current ISO numbers, in addition to the 7 (self-identified-as- and identified-by-the-ISO-as-) constructed languages which are approved for inclusion in the mainspace, there are 14 more languages which are classified as constructed languages, of which 9 "have not yet been approved for inclusion in the English Wiktionary", and are included in appendices: these are languages like Láadan. "Another 3 of those fourteen languages are prohibited", namely Quenya, Sindarin and Klingon, which are also included in appendices.

  1. What is the difference between being 'not approved' and included only in appendices, and being 'prohibited' and included only in appendices?
  2. What should be done with the two languages which are left out of the above count (9+3=12≠14), Neo and Talossan? Should they be 'not approved' and limited to appendices, or 'prohibited' and limited to appendices, or something else?
  3. What should be done with WT:BP#Eskayan, discussed above, which the ISO does not classify as a constructed language but which is identifiable as one?

- -sche (discuss) 19:29, 16 August 2015 (UTC)

I think we need to overhaul that part of CFI a bit. Instead of listing languages and thus being both messy and incomplete, we should make it clear that those 7 languages are approved, and no other languages that the ISO considers to be constructed languages may have entries in mainspace. That would leave Eskayan just like any other language, which I think is fine. —Μετάknowledgediscuss/deeds 23:48, 17 August 2015 (UTC)

Two romanization headers in a row

In entries like de and lei, is it preferable to have two romanization headers in a row (one with the "form of X" templates and one with the "nonstandard form of Y" templates), or only one header, like so? - -sche (discuss) 03:52, 17 August 2015 (UTC)

I think it's preferable to have a single Romanization header in such cases. —Aɴɢʀ (talk) 18:44, 17 August 2015 (UTC)

Notes as a valid L3 (esp. along References)

Copied from a related discussion, for separate discussion. (Link removed from sig not to ping unintentionally.) Neitrāls vārds (talk) 06:50, 17 August 2015 (UTC)

As I see it, a discussion on allowing "Notes" as a valid header should be considered.

Vahag has brought this up (Wiktionary:Grease_pit#.7B.7Breflist.7D.7D) and I'm running into a similar problem all the time. As ridiculously silly an argument as it may be, I do, in fact, agree that numbered and bulleted references together look ugly AF. (I have even gone to such ridiculous lengths as removing a reference that didn't add anything critical just because it was bulleted while the other ones were numbered because of how unappealing it looks.)

In more general terms, I kind of get the feeling that there seems to be consensus that references are in fact valuable and add value to the entry, perhaps the discussion should focus more on how to allow more elegant ways of faithfully citing content, particularly in "controversial" cases, e.g., obviously one bulleted reference is enough under, say, an assertion that et kala and liv kalā derive from the same source because, well, it's pretty obvious but then if there is a "weird" controversial cognate there isn't even a way of citing it inline (unless you want the awful looking mixing of numbered and bulleted refs.) Neitrāls vārds (talk) 11:09, 12 August 2015 (UTC)

  • Support having a ==Notes== section separate from ==References==, esp. when both exist. Benwing (talk) 11:21, 12 August 2015 (UTC)
Where in the standard order of headers would this be placed? —CodeCat 18:48, 17 August 2015 (UTC)
Right before ==References==. --WikiTiki89 18:53, 17 August 2015 (UTC)
Am I understanding correctly, then, that the notes section would apply to all POS sections collectively rather than any specific one? —CodeCat 18:59, 17 August 2015 (UTC)
The way it is now (perhaps unofficially) is that the ==References== section may be found in an entry with one etymology as an L3 or L4 or in an entry with more than one etymology as an L3, L4, or L5 (my personal preference is never to have it as an L3 with more than one etymology, so I usually fix these cases). --WikiTiki89 19:04, 17 August 2015 (UTC)
My preference is the opposite, to have it always at L3. —CodeCat 20:00, 17 August 2015 (UTC)
If they are always tagged with <ref> tags, then your way may be better, but often the ==References== section is just used to list references that apply to an entire section, in which case you need to know which section that is. You can have a different set of reference links for each etymology section or even each POS section. --WikiTiki89 20:04, 17 August 2015 (UTC)
Your point is valid, but I don't like references that don't use ref tags to begin with. "Section-wide" references tell you nothing about what comes from where. All they do is say "these references were somehow involved in the creation of this entry", which is rather vague. —CodeCat 20:11, 17 August 2015 (UTC)
This. His point is invalidated since we shouldn't have references without ref tags. — LlywelynII 14:07, 18 August 2015 (UTC)
Support. This will also help prevent misuse of the Usage notes section. I frequently run across entries whose “usage notes” have nothing to do with how the word is used (arachnogenic necrosis is the latest example). — Ungoliant (falai) 19:05, 17 August 2015 (UTC)
  • Oppose. The solution to the layout problem is to not use bulleted references. Usage notes already covers any notes relevant to the entry. Anything that would go into a Wikipedia entry's "Notes" section should either be addressed in the appropriate section directly (as with contested etymologies) or simply removed (as with Ungoliant's "a. necrosis" example). Giving people yet another section in which to include errata isn't an actual solution to the problems people are listing. — LlywelynII 14:07, 18 August 2015 (UTC)
    • IMO this makes little sense. You basically think people should never create references sections listing refs; that's an impossible standard to meet and often way too awkward. In the Arabic entries that I add, I routinely add a "References" section under each part-of-speech entry listing the books where I got the entry definitions from. There's no simpler way of doing it, since the reference really does refer to the POS section as a whole in most cases. And many languages do this. So we really do need Notes and References separate. Likewise if we're using Harvard-style references, with short footnotes under "Notes" that are linked to a list of references under "References". Benwing (talk) 14:17, 18 August 2015 (UTC)
      • IMO you're confused as to what's being proposed which probably goes back to the original discussion's misunderstanding of Wikipedia's #Notes section. #Notes (as the name implies) are for actual notes; they are not for references of any sort. #References are for both generated inline references (what's being called numbered references here) and bibliographic lists. If you feel the layout requires it, you can create subsections for #Citations and #Bibliography or #Works cited or #Whathaveyou.

        There's no call whatsoever for a (second) #Notes section at Wiktionary and creating one will increase the level of errata our users will add to entries, which the editors above felt to be a problem. The w:1st rule of holes suggests not expanding the areas of the entry devoted to random information, beyond that included in the existing and needful areas.

        As for having a subsection of #References for linked #Citations and another for stand-alone #Works... I fall back on my position that you're just being lazy and should create appropriate references as you create entries. At the same time, there's no real problem with creating a subsection within #References to deal with the layout issues, if people really want harvard style references and a separate list of works. But that discussion has nothing to do with a #Notes section. — LlywelynII 14:25, 18 August 2015 (UTC)

        As an example of what I mean, I patched up բալախ, the original entry that prompted this discussion. Note that having a separate inline section means that the inline citations should not fully duplicate the information in the bulleted list. It should be kept terser, with the full information on the source given below. — LlywelynII 14:38, 18 August 2015 (UTC)

        Here's an edit after the #Citation section has been made terser and the bibliographic info has been moved down to the #Bibliography section. Obviously it could be made more helpful and nicer with some of Wikipedia's inline citation templates like sfnp, which create automatic links to the full citation info. — LlywelynII 14:55, 18 August 2015 (UTC)
        WT:NOT#Wiktionary is not Wikipedia. We can do things differently. --WikiTiki89 14:32, 18 August 2015 (UTC)
        We can, but having an infelicitously-named #Notes section is really not a good place to start. If it's intended for storing inline references, it still belongs in the #Reference section. — LlywelynII 14:34, 18 August 2015 (UTC)
        Well we could have the ==Notes== section actually be notes that reference the ==References== section, like Benwing mentioned. --WikiTiki89 14:39, 18 August 2015 (UTC)
        A #Note section giving notes on the #References section would be a section of commentary on the sources being used for the entry. That's completely different from what Benwing was discussing and doesn't seem particularly helpful itself, either. — LlywelynII 14:55, 18 August 2015 (UTC)
While we're on this subject, note that I cannot use inline references for two language sections simultaneously without resorting to ugly tricks. See գութ. --Vahag (talk) 14:48, 18 August 2015 (UTC)
Sure you can. You either duplicate the information in each section or you use a named reference, with a #Reference section below both. I do have to admit I'm confused, though. Your example գութ doesn't have any reference shared between its two sections. Was there one you wanted to share or was it just a bad example? — LlywelynII 14:59, 18 August 2015 (UTC)
Hmm, I wanted to do this and I could swear that format didn't work before. It does now, so I withdraw my comment. --Vahag (talk) 15:29, 18 August 2015 (UTC)
  • Oppose. I also oppose the notion (expressed by some above) that all references need to use ref tags. In particular, because Wiktionary has a longstanding practice, which I support, of not citing other dictionaries inline for definitions, but Wiktionary does allow other dictionaries ("mentions") to verify words in many languages, there will always be many entries which have references which apply to the whole entry, as Benwing notes. I personally don't find a mix of bulleted and numbered citations problematic, but if you do, a solution like the one deployed on բալախ is preferable to a new section which, I agree with Llywelyn, is unnecessary and also apparently misunderstands what Wikipedia uses ==Notes== for (hint: not references, but actual clarificatory notes, which often don't cite references). Practically speaking, the continued use of "related terms" by new users to mean "semantically related" when it actually is for "etymologically related", and the only very slight distinction that is proposed to be made between ==References== and ==Notes==, convince me that only a few veteran adepts would use ==Notes== correctly, and other people would either not use it "correctly" vis-a-vis ==References==, or fill it up with trivia. - -sche (discuss) 15:16, 18 August 2015 (UTC)
    • But those are completely different things. There's references ("see here for more") and sourcing ("we got this information from here"). Mixing them into the same references section is bad. I have no problem with listing external reference works, but treating them as sources or mixing them in with sources is very bad. External reference works should, surely, go in the "external links" section, the "references" section should be kept for sourcing only. —CodeCat 20:06, 18 August 2015 (UTC)


See Talk:𐤋𐤏𐤁. Seems we have a hundred-odd entries whose headwords are perfectly correct but whose article titles are written backwards for no apparent reason. (Nothing came up searching the beer parlor but there may have been a discussion about this elsewhere. If so, just kindly link to it.) — LlywelynII 23:33, 17 August 2015 (UTC)

Could this be a problem with the wiki editor (and/or the user's browser)? I mean, if you start typing Hebrew or Arabic, it will correctly switch to right-to-left mode. But it doesn't necessarily "know" about every language. Equinox 23:51, 17 August 2015 (UTC)
The problem is that even though Unicode designated Phoenician as right-to-left, most fonts seem to display Phoenician characters left-to-right. And because of this, the editors who created these entries entered the letters backwards in an attempt to get them to display correctly, so the article titles are actually wrong. --WikiTiki89 02:18, 18 August 2015 (UTC)
Ah. So it's a well-meaning problem all around: the original editors were trying to get it to display correctly; the programmers got around to formatting that language to process correctly; implementing the new coding has now made the existing pages display incorrect backwards names which are getting copied onto other people's work elsewhere on the internet. So, we just need to go fix this, right? Is this something easily automated or do we just slowly do it by hand?

And will the entries now alphabetize correctly? or do they need special treatment in their DEFAULTSORTs? — LlywelynII 13:48, 18 August 2015 (UTC)
This must be done manually by someone with enough familiarity with Semitic languages (such as myself). The entries are very inconsistent. Some are correct, and some are incorrect in different ways. And yes, they will alphabetize correctly after this. --WikiTiki89 14:02, 18 August 2015 (UTC)
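As a small aid to that manual review, the Python sketch below shows one way to flag titles made up entirely of right-to-left characters and to print the reversed code-point order as a candidate for a human to check. It is only an illustration: the helper names are made up here, and, as noted above, some entries are already correct or wrong in other ways, so it cannot decide which order is right.

```python
import unicodedata


def is_rtl_only(text):
    """True if every character has a right-to-left bidi class (R or AL)."""
    return bool(text) and all(
        unicodedata.bidirectional(ch) in ("R", "AL") for ch in text
    )


def reversed_candidate(title):
    """Return the title with its code points reversed, or None if not pure RTL.

    The result is only a candidate for manual review, not a verified fix.
    """
    if not is_rtl_only(title):
        return None
    return title[::-1]  # Python 3 strings reverse by code point


# The title from Talk:𐤋𐤏𐤁, written with escapes to avoid display confusion:
# U+1090B (lamd), U+1090F (ain), U+10901 (bet) in stored (logical) order.
title = "\U0001090B\U0001090F\U00010901"
candidate = reversed_candidate(title)
print([f"U+{ord(ch):05X}" for ch in title])
print([f"U+{ord(ch):05X}" for ch in candidate])
```

Printing code points rather than the glyphs themselves sidesteps the font and display-direction problem described above, since the stored order is what matters for the page title.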

Nouns mostly used in plural - redirection to singular

I see reduction of content going on in nouns often used in plural, via soft redirection to singular forms. That includes crocodile tears, savings, and scrambled eggs. This seems inferior to me and I would like to revert. We should IMHO host the definitions in the most common form, and if the most common form is the plural, we should host it in plural. What do you think? Does anyone have a link to a previous discussion? --Dan Polansky (talk) 19:10, 18 August 2015 (UTC)

One concern I expressed in this previous discussion (see also this one) was that most people are able to figure out when a word is plural even if they can't tell what it means, and will look up the base form (e.g. foobar, if what they see in the text is "the foobars are blah"), so unless there is some explicit and obvious notice that additional senses are to be found in the plural's entry, readers may never think to look there.
If all senses are most common in the plural, I agree that the plural should be the lemma, with the singular using Template:singular of or a similar template. If only some senses are most common in the plural, I think it's more helpful to the reader to have them all in one place with appropriate labels (like "chiefly plural"). I could live with splitting them, though, as long as there were explicit, obvious notices to readers that they need to look in the other entry for more senses. (I don't think bare Template:singular of as an additional definition-line after some substantive definitions makes it sufficiently obvious that there's more semantic information to be found in the plural, but Template:singular of with a gloss specified might work.) - -sche (discuss) 19:38, 18 August 2015 (UTC)
The way I see it, there is one and only one lemma entry (one with definitions, inflection, -nyms, etc) per lemma. A single lemma should not have more than one lemma entry. So either these should all be concentrated on a single lemma page, as our normal practice is with respect to lemmas and non-lemmas, or we should treat them as separate lemmas entirely and keep them completely separate. I have done this with some entries as well, such as dialectics and darts. Note that in the former case I made sure to split the etymology as well, as different lemmas always have different etymologies. —CodeCat 20:20, 18 August 2015 (UTC)
We also need to establish some limit for how much more common the plural is. According to bgc ngrams, shoes, eyes, and feet are all somewhat more common than their corresponding singulars, but I wouldn't want to treat the plurals as the lemmas. —Aɴɢʀ (talk) 06:28, 19 August 2015 (UTC)

How can we improve Wikimedia grants to support you better?


The Wikimedia Foundation would like your feedback about how we can reimagine Wikimedia Foundation grants, to better support people and ideas in your Wikimedia project. Ways to participate:

Feedback is welcome in any language.

With thanks,

I JethroBT (WMF), Community Resources, Wikimedia Foundation. 05:24, 19 August 2015 (UTC)

What to call plural noun lemmas?

We have the template {{en-plural noun}} to categorise nouns whose lemma is grammatically plural. But this template also categorises in Category:English pluralia tantum. Is every noun that is used primarily in the plural a plurale tantum? I'm thinking a better category name would be Category:English plural nouns or Category:English plural-only nouns. —CodeCat 14:28, 19 August 2015 (UTC)

Very many "plural only" nouns can be found to be attested in the singular, eg scissor. It would, IMO, be misleading to eliminate the category for this reason, but it means that we need a good explanation in the category header. If we have a good explanation, we don't need to worry as much about the category name. I think what users need to know is not that the lemma is plural in form, but whether it is more commonly ("correctly") used ("agrees") with a singular or plural verb. I think this is an empirical question for many such terms, rather than something that follows from the categorization. I wonder whether the category shouldn't be hidden and the "plural-only" display replaced with something that focused on the agreement issue. As a hidden category it would retain its usefulness in directing contributors to reviewing the entries to determine whether they adequately and correctly addressed the agreement issue. DCDuring TALK 14:42, 19 August 2015 (UTC)
scissors pl ‎(normally plural, singular scissor). We can call the category Category:English plural nouns (and use it only for lemmas, not forms-of). --WikiTiki89 14:50, 19 August 2015 (UTC)
With such nouns that do have a singular, we have to ask what the singular actually means. For the derivation singular > plural it's easy, it is simply multiple of a thing. For plural > singular, if the plural form clearly does refer to multiple objects, then I'd reason that it should simply be a non-lemma and the singular is the lemma. But for plural nouns that are not clearly multiple instances of something, it's more difficult. "Scissors" is a single object, so a hypothetical singular form doesn't have a predictable meaning. What is a "scissor"? Saying it's the singular of "scissors" doesn't actually make it clear what it is. So I think that we should evaluate cases where the singular parameter of this template has been specified. —CodeCat 15:00, 19 August 2015 (UTC)
I agree (but that shouldn't prevent it from being on the headword line, just in case that's what you were implying). And the same is the case with plurals of proper nouns, such as Islams; just calling it the "plural of Islam" doesn't explain what it means. --WikiTiki89 15:04, 19 August 2015 (UTC)
"A scissor is for cutting"; "A scissors is for cutting"; "Scissors are for cutting" (could refer to one or multiple pairs of scissors). The pattern doesn't apply to spectacles/glasses.
What label and what category name should be applied to scissors and to glasses/spectacles? DCDuring TALK 16:17, 19 August 2015 (UTC)
Yes, but what is a scissor? Some would say it is one half of a pair of scissors. Others would say it is one pair of scissors. Others would say it is one instance of a scissoring motion. But none of that is clear from the definition of scissors. --WikiTiki89 16:21, 19 August 2015 (UTC)
I don't think it is used much to mean "one of the two parts of a pair of scissors", despite the apparent use of scissor in just that sense in pair of scissors. We are long past the time when there was a significant group of speakers who used scissor that way. DCDuring TALK 16:29, 19 August 2015 (UTC)
Challenge accepted. --WikiTiki89 16:34, 19 August 2015 (UTC)
I don't doubt that you can find current attestable usage of scissor in the sense you have dredged up from history and etymology. I think it is more likely the subject of humor (eg, George Carlin) than conversation that adheres to the Gricean maxims, in particular "Avoid obscurity of expression" and "Avoid ambiguity" (presumably in context). DCDuring TALK 16:46, 19 August 2015 (UTC)

Guidance requested on religious terminology

Quaker terms I would like to make entries or a listing for Quaker-related terminology, as some of it is very particular but I'm not sure if it belongs in the main body of the dictionary or an appendix or what-have-you. For instance, Quakers traditionally didn't refer to the days of the week by their common pagan-derived names but used "first day" for Sunday, "second day" for Monday, etc. I could easily imagine someone reading about a "Friend going to meeting-house on first day" and not realizing that this means a "Quaker going to church on Sunday". Should I create entries for all of these terms or simply something like Appendix:Quaker terminology? Thanks. —Justin (koavf)TCM 02:17, 20 August 2015 (UTC)

  • Be bold, and make a start. We'll let you know if you do anything wrong. SemperBlotto (talk) 05:20, 20 August 2015 (UTC)
    • Do we have a context label for Quakerism? If not, we should make one. —Aɴɢʀ (talk) 09:49, 20 August 2015 (UTC)
      • A strategy is to start with a simple, but formatted, list in an Appendix, with lines like * {{l|en|first day}}, yielding first day. That would enable you to see how many of the terms already existed in English (blue link), possibly with the right definition, how many required a new English section (orangish link), and how many needed new entries (red link). Each of these situations can be sped up by having specific cut-and-paste text ready. DCDuring TALK 18:06, 21 August 2015 (UTC)
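A starter list of this kind could be generated rather than typed by hand. The following is only a minimal sketch in Python: the helper name appendix_lines and the sample term list are made up for illustration, and the only thing taken from the discussion is the "* {{l|en|...}}" line format.

```python
def appendix_lines(terms, lang="en"):
    """Return one wikitext bullet per term, each wrapped in the {{l}} link template."""
    return ["* {{l|" + lang + "|" + term + "}}" for term in terms]


# Hypothetical starter list; a real appendix would use the full set of terms.
terms = ["first day", "second day"]
print("\n".join(appendix_lines(terms)))
```

Pasting the printed lines into an appendix page would give one link per term, so the link colours can be used to sort the terms as described above.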

@DCDuring:, @Angr:, @SemperBlotto: A lot of them are at Appendix:Quakerism. There are probably a few more but I'm tired now. Do you think that a context label and tracking category would be useful? Thanks. —Justin (koavf)TCM 03:16, 23 August 2015 (UTC)

I do. We already have them for other Christian denominations such as Category:en:Anglicanism‎, Category:en:Eastern Orthodoxy‎, Category:en:Coptic Church‎, Category:en:Mormonism‎, Category:en:Protestantism‎, Category:en:Roman Catholicism‎, so why not Quakerism? —Aɴɢʀ (talk) 06:25, 23 August 2015 (UTC)

French French, Spanish Spanish and the like

This came up tangentially in May, but I'd like to raise it in its own thread. Currently, most regional categories are named "[place-adjective] [language]", as in "French French", "Welsh English" and "Austrian German", while a minority are named "[place-noun] [language]", as in "Louisiana English" (not *"Louisianan English") and "Quebec French".
I and some others find "French French" (and also to some extent "Welsh English") awkward and confusing, because it's easy to interpret both instances of "French" (and "Welsh") as referring to a language rather than a place. The "[place-adjective] [language]" scheme is also impossible or undesirable for some languages: "Swiss German" was felt [by some people, not me] to be so similar to the name of the Swiss German language [which Wiktionary calls Alemannic] that its category was moved to "Switzerland German", and it's currently impossible to distinguish French terms specific to the DRC from those specific to the ROC, because both go in "Congolese French". OTOH, "Austrian German" and most other category names are fine.
I propose we move all the reduplicated categories (like "French French") to either the "France French" format some categories already use, or to a format like "French of France". (Should we move all categories, including "Austrian German", etc, to one of those formats? It'd be consistent, but unnecessary in most cases.) - -sche (discuss) 22:05, 20 August 2015 (UTC)

Using the "French in France" format has the nice advantage, from a technical standpoint, that it fits the same name format as all our other part-of-speech type categories. —CodeCat 22:30, 20 August 2015 (UTC)
  • Support Absolutely. I always support "X in Y" or "X of Y" constructions because of Congo/Congo and Dominican/Dominican (Dominica and the Dominican Republic). —Justin (koavf)TCM 03:58, 21 August 2015 (UTC)
We need to make sure we use linguistic borders rather than political borders. Anything with the word "Republic" in it is not likely to be a linguistic border. --WikiTiki89 05:25, 21 August 2015 (UTC)
I could support this in cases where it's ambiguous (like Congolese French) or highly misleading (like Swiss German was), but some of the reduplicated names (e.g. English English for the English of England) are actually well established and I wouldn't be happy to see them go. And I really wouldn't want to change the names of local varieties when the names are nonreduplicated, well established, and unambiguous, like Austrian German or Munster Irish. —Aɴɢʀ (talk) 12:43, 21 August 2015 (UTC)
Yeah, that's a concern I have, too — "Austrian German" and most categories have perfectly good names as-is, it's only a minority that are problematic. I certainly don't want to have three competing formats ("[place-adjective] [language]", "[place-noun] [language]", "[language] of [place]"), so if we're not prepared to switch in general to a "[language] of [place]" format, I suppose the status quo of occasionally deviating from "[place-adjective] [language]" to "[place-noun] [language]" is functional, if a bit unschön. "Dominica English" and "Dominican Republic English" work, and I guess so does "DRC French" (probably the least ugly option, compared to "DR Congo[lese] French" or the atrocious "Democratic Republic of the Congo French"). - -sche (discuss) 19:13, 21 August 2015 (UTC)
  • Where can I see uses of "English English"? google books:"English English" gives me a high number of hits, but from clicking the hits I find no quotations of use of "English English". --Dan Polansky (talk) 21:20, 21 August 2015 (UTC)
google books:"English English dialects" turns up a handful, which I've added to Citations:English English. Obviously, I don't dispute that the phrase is attested, only that it's the best/clearest name we could choose to use. - -sche (discuss) 03:15, 22 August 2015 (UTC)
We don't have names for linguistic divisions. It's English of England, not the more accurate English of England minus the northern half of Northumberland and the southwestern part of Wrexham, Wales and various enclaves in Paris, Dublin, New York City, Hollywood, etc., etc. (Yes, that was made up; I don't know the exact lines of English of England, and in fact the edges aren't that clean, the lines between Welsh English and Scottish English and the English of England are in fact slow changes.) By the difficulty of moving across national borders, and cultural identities tied to them, national borders tend to have some effect on language division, and where they don't, we probably can't say anything about it. So, no, "Republic" in the name doesn't mean anything.--Prosfilaes (talk) 20:56, 21 August 2015 (UTC)
Fr.Wikt lists ~70 words which are used in one Congo but not the other; I welcome suggestions on how to categorize them without using the names of the countries (which is what fr.Wikt does, if anyone wondered). :-) Fr.Wikt also lists a handful of words which are used in both Congos, which it might be tempting to conflate into one category, but I note that we don't conflate words used in Canada with words used in the US even when the words are used in both places — we dual categorize them as "Canadian English" and "American English". (In fact, we had a discussion which specifically deprecated the geographic label "North American" and made it so {{lb|en|North America}} displays and categorizes as "Canada, US".) - -sche (discuss) 03:23, 22 August 2015 (UTC)
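The dual categorisation described for the "North America" label can be illustrated with a small sketch. This is not how the label templates are actually implemented; the table names LABEL_ALIASES and REGION_CATEGORY and the function display_and_categories are assumptions made up for the example, and only the "Canada, US" display and the two category names come from the comment above.

```python
# Hypothetical alias table: one deprecated geographic label expands to several regional labels.
LABEL_ALIASES = {"North America": ["Canada", "US"]}

# Hypothetical mapping from a regional label to its regional category name.
REGION_CATEGORY = {"Canada": "Canadian English", "US": "American English"}


def display_and_categories(label):
    """Expand a label, then return its display text and the categories it adds."""
    parts = LABEL_ALIASES.get(label, [label])
    display = ", ".join(parts)
    categories = [REGION_CATEGORY[p] for p in parts if p in REGION_CATEGORY]
    return display, categories


print(display_and_categories("North America"))
```

The same pattern would let a word used in both Congos carry two separate regional categories instead of one conflated category.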

Get rid of the parentheses around inflections in headword lines

Instead of putting parentheses there, I'm thinking it might look cleaner to separate the inflections with an m-dash or something similar. Something like this:

test — plural tests

An advantage is that it looks nicer when you put qualifiers or transliterations there. Those features aren't used much, but they are available.

What do you think? —CodeCat 21:15, 21 August 2015 (UTC)

@CodeCat: I think it could be visually appealing but mdashes with spaces is bad typography. Space ndashes or use mdashes immediately between the terms. —Justin (koavf)TCM 03:18, 22 August 2015 (UTC)
No, it's not. Languages other than English frequently use m-dashes with spaces. It's not "bad typography", just not typical in English text. --WikiTiki89 03:30, 22 August 2015 (UTC)
@Wikitiki89: If it's not typical typography, then we shouldn't use it. —Justin (koavf)TCM 03:43, 22 August 2015 (UTC)
It's not typical in English running text. That says nothing about specially formatted things like tables or dictionaries. --WikiTiki89 00:49, 23 August 2015 (UTC)
Here's what other dictionaries do (a slash denotes a line break):
online dictionaries:
  • Cambridge: thesis / noun (plural theses) / [definition]
  • Collins: thesis / noun / (plural) -ses
  • dictionary.com: thesis / noun, plural theses [...] / [definition]
  • Merriam-Webster: thesis / noun [...] / [definition] / plural theses
  • Oxforddictionaries: thesis / noun (plural theses) / [definition]
  • thefreedictionary.com: thesis / n. pl. theses / [definition]
paper dictionaries:
  • Concise Oxford English Dictionary: thesis n. (pl. theses) [definition]
  • Webster: thesis n., pl. -ses [definition]
The trend is to have as little horizontal space as possible between the singular and the plural, which is consistent with using parentheses or a comma, and inconsistent with a dash. - -sche (discuss) 05:13, 22 August 2015 (UTC)
But the trend for online dictionaries like us is to have a line break, so maybe {{head|en|noun|plural|tests}} and {{en-noun}} should generate:
and so on, e.g. {{de-noun|m|Tischs|gen2=Tisches|Tische|Tischlein|dim2=Tischchen}} gives:
I think that's easier to read than piling the forms up horizontally. —Aɴɢʀ (talk) 06:09, 22 August 2015 (UTC)
  • This would make sense on cell phones, but on laptops there's usually limited vertical space and lots of horizontal space.
  • In response to CodeCat, I've long wanted the parens gone, because with translits you end up with two layers of parens. Benwing2 (talk) 06:31, 22 August 2015 (UTC)
I think the line break some other dictionaries provide between the first mention of the lemma form and the mention of its plural is the same one we already provide between those two things: we just separate the first mention of the lemma form (up at the top of every page) from the rest of the headword line by so much other stuff like etymology that we repeat the lemma form a second time before we give the plural. I don't think we should add another line break on the PC version of the site, although as Benwing notes, it might actually make sense to do so on the mobile version. - -sche (discuss) 07:54, 22 August 2015 (UTC)
What about just a comma?
noun, plural nouns
ко (ko), plural кои (koi)
Arabic entries like حدث already employ commas rather than parentheses for this kind of thing. - -sche (discuss) 08:02, 22 August 2015 (UTC)
There are many formats that would get my vote, but any format that would take up any additional vertical screen space on a desktop, laptop, or good-sized tablet would not. I'd prefer an endash over an emdash too.
Wouldn't the space constraints of a cellphone be better addressed by the Wiktionary app than by our efforts? DCDuring TALK 11:34, 22 August 2015 (UTC)
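For comparing the single-line formats floated in this thread side by side, a rough sketch follows. The function headword_line and its style names are hypothetical and do not describe how {{head}} works; the sketch only renders a lemma plus labelled forms with parentheses, a comma, or a spaced en dash.

```python
def headword_line(lemma, forms, style="parens"):
    """Render a lemma with its labelled forms; forms is a list of (label, form) pairs."""
    inflections = ", ".join(label + " " + form for label, form in forms)
    if style == "parens":
        return lemma + " (" + inflections + ")"
    if style == "comma":
        return lemma + ", " + inflections
    if style == "dash":
        # Spaced en dash (U+2013), one of the separators suggested above.
        return lemma + " \u2013 " + inflections
    raise ValueError("unknown style: " + style)


for style in ("parens", "comma", "dash"):
    print(headword_line("test", [("plural", "tests")], style))
```

None of the three variants takes any extra vertical space, which was the main constraint raised above.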

The dot before the first transliteration on some entries' headword lines

The thread above this prompted me to look closely at how headword-lines are formatted, and it strikes me that having a dot before only the first transliteration on only the headword line of only some entries creates an awkward and inconsistent amount of space. For example, in буква: why should "(búkva)" be further away from "бу́ква" than it is from "f inan"? and why should "(búkva)" be further away from "буква" than "(Latin spelling žagati)" is from "жагати" in [[жагати]], or than "‎(romaji aizōban)" is from "あいぞうばん" in that entry? The dot is especially awkward in entries like حَدُثَ ‎(ḥaduṯa), where the headword-line goes on to give another word and its translit, and the second translit is not separated by a dot. I propose we eliminate the dot.
I know that for the tiny number of languages which have WT:_ transliteration pages, the dot serves as an easter egg for the tiny number of people who notice that it contains a link. The link could either be moved to the transliteration itself, i.e. бу́ква (búkva), or just omitted because a nearly unnoticeable link that only exists for a few languages and points to a page that's frankly not very useful is, well, not so useful that it needs to stay... I mean entries don't even normally link to WT:About _ pages AFAIK, and those are frequently more useful. - -sche (discuss) 08:30, 22 August 2015 (UTC)

I was one of the proponents and spreaders of the "dot" format, but recently I changed my mind for the reasons you give. I now think it should be removed altogether. --Vahag (talk) 11:07, 22 August 2015 (UTC)

User:Pereru and sources again

This user has started adding all kinds of templates like {{needsources}} and {{needref}} to entries again. These templates don't serve a purpose as there is no strict need for sources. You can't ask for sources if there aren't any.

More annoyingly, the user is now also preventing me from editing and fixing up etymologies, reasoning that I may only write what agrees with the source. This is complete nonsense; if sources restrict what edits Wiktionarians may make, then the sources need to go. Or better yet, the users need to stop doing that and let editors do their work. If sources prevent me from improving Wiktionary, I'm going to start removing them. —CodeCat 18:39, 22 August 2015 (UTC)

The template {{needsources}} was kept, as in the decision above, and is believed to be useful. Adding it to an entry does not change any of its contents, it merely points out that there are no sources and that it would be an improvement to add them. If there are no sources, add a rationale -- the template says so. If you want, we can talk about how to do that. But adding information that is based on something -- even original research -- without mentioning its source or rationale -- that is in no way an improvement.
Ahn... Please don't misrepresent me. What I'm saying is that etymologies cannot float in the vacuum. If you don't have a source to add, add a rationale. You often do a quick'n'dirty one in the edit summary -- why not add a better one to the text itself?
I insist: I am not preventing you from fixing etymologies: I merely think that, by letting them float in the air, you're making them worse. Ground your etymologies, and I'll have no problem with what you do. Please, don't misrepresent what I say.--Pereru (talk) 18:45, 22 August 2015 (UTC)
Why is a rationale needed? Specify which parts of an etymology are in doubt. Or better yet, take it to WT:ES. Putting a template on the entry solves nothing at all. The template itself needs a rationale just as much for it to be useful.
Because the reader is not a Wiktionarian. He's not trying to discuss etymologies. He wants to know what the gist is of the reason why this form is here rather than some other form. He is not a critic: he just wants to know how Wiktionary decided that this was the right form. It's information, it's relevant, it should be on the page. Why is this even a problem? Are you trying to hide something?
I'm also not amused by your continued stance against Balto-Slavic. Balto-Slavic is accepted and has consensus among linguists, yet whenever I add it to an entry you put brackets around it and add "perhaps", while keeping your own Baltic-only etymologies displayed prominently. Wiktionary is not here to promote your fringe anti-Balto-Slavic views. We should show the current state of research. I think if you continue to exclude Balto-Slavic or play down its relevance or acceptance, then you should stop editing etymologies altogether. —CodeCat 19:07, 22 August 2015 (UTC)
Not the linguists I've talked to, no. But even if it were a consensus -- I don't have a problem with you adding Proto-BS to Wiktionary. I have a problem with you not arguing for the forms (same for Proto-IE, by the way). If you've invented them yourself, say so and state why. Why is this so difficult? If you're proposing a hypothesis, justify it on the page! If it's an argument that is generally valid for many words, write it up somewhere and link to it on the page! The "perhaps" there is meant to show that there is no reason for that form given here in Wiktionary -- if you add a reason, a justification for that form, then I'll be happy to delete any hedges.
Let me turn the argument against you: Wiktionary is also not here to support your anti-source, anti-rationale agenda. Being against sources and insisting on hiding the reasons why you choose one specific protoform when there are others in the literature and when there often is disagreement among Wiktionarians (see Štambuk and you on Kim vs. the Leiden school) does not make anything better here -- it arguably makes things worse. --Pereru (talk) 19:15, 22 August 2015 (UTC)
And yet, you refuse to explain anything at all about the problems you have with etymologies. WT:ES exists for a reason, why don't you use it? That's the place for discussing etymologies. Discuss what's wrong with them. If you find them implausible, then say why in the discussion. Just putting "perhaps" and a bunch of templates doesn't solve any of that. —CodeCat 19:22, 22 August 2015 (UTC)
That's because I don't have any specific problems with the forms in question; I just want to see what the reasons are for their having been chosen. And I keep not understanding why wanting to see this is strange, and why you're so determined to hide it. Again: it's not about discussing, it's about documenting. It's like adding sources to quotations. --Pereru (talk) 19:34, 22 August 2015 (UTC)
Yes, and which reasons are unclear? What needs explaining? Specify which aspects of the etymology are unclear and need explanation. And don't say "all of it" because that would make no sense; not a single etymological source explains everything about an etymology. The reader is always assumed, by every work, to have an understanding of the linguistics. What etymological sources do is they explain special parts of the etymology that may be surprising or unexpected, or aspects of a language's development that are unknown or not fully consensus. So you need to specify which parts of the etymology are unclear and need motivating.
I need to see the rationale in order to tell you if I think there is something wrong about it. Just as I need to see the definition of a word to see if I think it's wrong. A word without a definition is not useful in a dictionary. An etymology without a source or rationale is just floating in space, it is a speculation of its author. This should be obvious. Ask yourself: why is it that every good etymological dictionary known to man has both sources and justifications for the protoforms it lists? Are they really all wrong in doing that?
Also, aside from all of this, you do realise that all this applies to you as well? You'll have to give motivations in all your etymologies as well. Especially the ones that promote Baltic while dismissing Balto-Slavic. Fringe and unusual ideas should always be subject to higher scrutiny. So if you have a particular reason for going against the majority view that Balto-Slavic is a real thing, then you will have to explain this and why this view should be preferred in Wiktionary etymologies. Because I don't think there is a consensus for excluding or minimising Balto-Slavic. It has more support among linguists than Baltic does. —CodeCat 19:43, 22 August 2015 (UTC)
Of course I realize that. That's what I've been doing from the start. Every single etymology I have added has (a) a source and (b) a motivation/rationale. They are just not mine; they are Konstantīn Karuli's. You may disagree with them, and you are free to argue or counterargue (justifying and sourcing your arguments); and if indeed they proceed, then you win. What is the problem with that?
I'm all in favor of scrutiny! My entire point is that you provide no scrutiny. You just carry out your decisions without giving good reasons, and every time you're called on that, you just say something like "I don't need to justify my preferences". Well, you do. Please, do some scrutinizing. And write it down for others to see and scrutinize, too! Your distaste for justifications and/or sources is the fringiest idea I've seen: I don't know a single person interested in historical linguistics who supports that, including the other Wiktionarians here. It's only you, CC. You're the fringe one here, the one who needs scrutinizing. Please accept that. --Pereru (talk) 13:36, 24 August 2015 (UTC)

Sources, despite User:CodeCat

Frankly, here is my personal approach to this. If anyone (except CodeCat, who really isn't impartial about this issue) thinks I'm wrong, please let me know.

  • I think sources and/or rationales (CodeCat always forgets this part, for some reason) make etymologies more trustworthy, because they show that Wiktionary has done its homework and allow the more educated user to check whether or not s/he agrees with Wiktionary (this is especially important when an etymology is Wiktionary's own).
  • Sources and/or rationales (CodeCat always forgets this part, for some reason) are easy to add: if you're copying the info from somewhere, write down where from. If you're creating it yourself, write down why this is better.
  • If an entry doesn't have sources and/or rationales (CodeCat always forgets this part, for some reason), then it's OK to add a template that says so, so that those who are interested can take care of it. It's not different from templates like {{rfap}}, which I also use extensively to encourage Latvian speakers to add audio pronunciation files.

What is wrong about any of the above? And in what way does any of the above prevent anyone from working?

CodeCat and I have been reverting each other's edits for a few minutes. I will no longer do that -- it's more than a bit childish -- but I will leave here my request that something be done about it. This page is a discussion forum where such problems can hopefully be resolved. Let us talk about that, then, and come to some sort of conclusion, so that we can finally go on doing things without sudden tantrums from our esteemed colleagues. --Pereru (talk) 19:03, 22 August 2015 (UTC)

  • CodeCat, I think previous discussions have made abundantly clear that the only person here who thinks there is no need for reconstructions to have some kind of reference is you. Given that fact, it would be wise of you to stop edit-warring {{needsources}} (which was RFD-kept per consensus) out of entries. Let's start working on a template or format for presenting "Wiktionarian research" / "rationales" on entries which lack scholarly references. For reconstructions based on known sound correspondences, perhaps we could document the sound correspondences on an 'About' page (or similar page) and then have a template that says "Reconstructed by Wiktionary according to known sound correspondences" which could be placed in the references section or at the bottom of the entry {{Webster}}- and {{LDL}}-style. - -sche (discuss) 19:08, 22 August 2015 (UTC)
    • The issue I have is that these templates are telling me to add references and sources. There aren't any, so I remove the template. What point is there in asking for something that doesn't exist? —CodeCat 19:11, 22 August 2015 (UTC)
      • The template says sources and/or rationales. If there is no source, add a rationale. Are you claiming the rationales also don't exist? Supposedly you haven't been picking protoforms randomly... have you? --Pereru (talk) 19:20, 22 August 2015 (UTC)
      • (e/c) The template explicitly asks for either pre-existing scholarly sources, or what it currently calls "original research" (that wording and the format it prescribes need to be improved, but the meaning is clear). On Wikipedia and some other Wiktionaries, like de.Wikt, you would be blocked if you kept adding original etymological research. We're offering you a big concession, a big compromise — you get to keep adding your OR (whereas newer users even on this wiki have been threatened with blocks, as recently as last week, for adding OR etymologies), but you have to provide your rationales for it — for each reconstruction you invent. If you aren't willing to do that, previous discussions have made clear that there are quite a few people who would be happy to simply delete and ban all etymological OR. - -sche (discuss) 19:23, 22 August 2015 (UTC)
        • You're making it sound like there's this big change that has to be made to allow unsourced etymologies. But it's just the status quo. So I keep doing what has always been done, as there hasn't been a policy change. Don't make it sound like a concession because it isn't. If you want to require sources for all etymologies, make a policy and enforce it (which would mean removing somewhere around 90% of all etymology sections and reconstructions). That's all I ask for. Until then, you need to be clearer about what's wrong with the etymologies. Just asking for sources and rationales is going to get ignored. Pereru can patrol his fringe etymologies all he wants, that's fine with me. Latvian is not my responsibility. As long as I can make sure the rest of Wiktionary is up to par. —CodeCat 19:31, 22 August 2015 (UTC)
          • Previous discussions have made abundantly clear that you are the only person who subscribes to the view that reconstructions are not required to have any sources. Your long-standing but solitary refusal to accept the status quo does not change the status quo. - -sche (discuss) 19:37, 22 August 2015 (UTC)
            • It's not clear to me at all, the prior BP discussions gave a rather nuanced picture. Make a policy that has clear consensus, and then enforce it. Nothing else will do. —CodeCat 19:45, 22 August 2015 (UTC)
          • I think this is the main point of all these discussions -- to create a new policy. Are we all in agreement now? If anyone other than CodeCat disagrees that sourcing and justifications are good and people should add them to pages, then please say so, or else... do we have a new policy? --Pereru (talk) 19:51, 22 August 2015 (UTC)
            • A policy is a separate page, clearly and delicately worded, and approved by consensus through a vote. Something like WT:CFI. —CodeCat 20:01, 22 August 2015 (UTC)
                • In a previous discussion I did exactly that, on this very page. And since nobody disagreed, I suppose this means we have a policy? --Pereru (talk) 20:48, 22 August 2015 (UTC)
      • There can be no such thing as no source; if you made it up, then write Source: CodeCat's ass. If someone is asking for a source, it's useful information that you just made it up.--Prosfilaes (talk) 20:55, 22 August 2015 (UTC)

@Pereru And since nobody disagreed, I suppose this means we have a policy? From what I gather CC insists that a lack of voted-on, explicit policy negates the fundamental clause that "Wiktionary is a secondary source" (which means that wikt allows some elements of synthesis but the synthesized sources still need to be cited.)

Anyways, can this be a thing? In case I disappear I would like to document my support of a potential policy requiring sources, including for synthesis [e.g., "bebe could be considered derived from baba because source X says that ebe is derived from aba" and so forth, the keyword here being source X.] Neitrāls vārds (talk) 21:51, 22 August 2015 (UTC)

Is User:CodeCat's behavior a problem?

I have personally nothing against CodeCat's work, which is excellent in many areas of Wiktionary. But his/her behavior with respect to sourcing and providing support for his/her etymological choices is causing increasing concern. Despite the majority view that {{needsources}} was useful, and that sources and/or rationales (CodeCat always forgets this part, for some reason) improve an entry (just like audio pronunciations do, which is why there are templates like {{rfap}}), CodeCat is doing his/her damnedest to make this particular part of the job -- selecting the entries that need this improvement, and then going about doing it -- irritatingly difficult. Again, I have nothing against all other contributions by CodeCat, who, as far as I know, is a good person. I'm not against the person, I'm against the behavior, which, as I think most people agree, is not justified.
In view of that, is there some administrative procedure here that can be undertaken to deal with such cases of irrational behavior? --Pereru (talk) 19:51, 22 August 2015 (UTC)

I have a problem with Pereru ignoring the consensus agreed upon in BP just this month, to make Proto-Baltic an etymology-only language. Pereru continues to create Proto-Baltic pages and categories, even going so far as to undo page moves. This needs to stop and I would like to know if there is some administrative procedure that can take care of this irrational behaviour. —CodeCat 21:02, 22 August 2015 (UTC)
I've blocked Pereru for one day for disruptive edits, which ignored consensus. —CodeCat 21:04, 22 August 2015 (UTC)
And I've unblocked him because, as I wrote, it was a "bad block by an admin who is actively involved in edit wars with this user, and is herself disruptively editing against community consensus (which is what she accuses Pereru of)". - -sche (discuss) 21:08, 22 August 2015 (UTC)
Of course, my behaviour makes his perfectly excusable. —CodeCat 21:09, 22 August 2015 (UTC)
Yes. You're being irrational, so your decisions don't make sense, whereas I wasn't, and mine do. What's the problem with that? --Pereru (talk) 13:38, 24 August 2015 (UTC)
  • But still, guys: CodeCat is imposing a policy that was never approved, that clearly goes against what the majority here wants, that goes against written recommendations like Wiktionary:Etymology#References; s/he also goes on a tantrum whenever anyone opposes that and takes unmeasured punishing actions such as his/her recent attempt to block me. And yet nobody does anything against it. What is the problem? Why does Wiktionary allow such destructive behavior? Isn't it the time for a disciplinary action? --Pereru (talk) 13:45, 24 August 2015 (UTC)
    • A disciplinary action was tried, but it failed. --Vahag (talk) 15:19, 24 August 2015 (UTC)

Czech possessive adjectives - etymology and related terms[edit]

First off, the term "Czech possessive adjective" does not find much use but I do not find a better one. Czech possessive adjectives would be the likes of orlův (eagle's) from orel (eagle). They are much like English possessive forms that we do not include for the reason that the apostrophe makes them effectively sum of parts; that is not the case with the Czech forms. In Czech, there is still a distinction between orlův and orlí; the latter would be used in the translation for "eagle's nest".

Now, how to treat them as for etymology and related terms? I want entries for them not to repeat the etymology of the base term, and I want to see no "Related terms" section. I prefer that they be treated a bit like items in Category:Latin participles. In this, I seem to differ from User:Jan.Kamenicek.

A possessive adjective is created for a great many animate nouns, most often referring to humans but also sometimes to animals. They include matčin (and forms matčina, matčino), otcův, sestřin, bratrův, synův, orlův, etc. They are not to be confused with koní, orlí, kočičí, psí, člověčí, etc.

I am asking for input from other people. I am looking forward to getting a view from other languages that have a similar feature, maybe Russian and other Slavic languages, but also other languages. --Dan Polansky (talk) 13:38, 23 August 2015 (UTC)

The term "Czech possessive adjective" does not find much use because there are not many English books dealing with them. The term "Czech hard adjectives" seems to find even less use, but they do exist. It is also not easy to filter them out, because not all books dealing with Czech possessive adjectives use the phrase "Czech possessive adjectives", they can talk simply about Czech language and use only the phrase "possessive adjectives" (such as here: [9]).
I believe that all the expressions like orel, orlice, orlí or orlův should be listed in the categories like Category:Czech terms derived from Proto-Slavic and therefore their etymology sections should include information that they "come from Proto-Slavic *orьlъ", which also puts it into the correct category.
As for the "eagle's nest": it can be translated in both ways (depending on context) as orlí hnízdo (talking about the kind of nest), or orlovo hnízdo (nominative neuter of orlův). The latter is used quite rarely, usually when referring to a nest belonging to a specific eagle, but examples where it is used as a synonym for "orlí" can also be found (usually in poetry or in old texts, one of them is in the quotation in the entry orlův). Jan Kameníček (talk) 14:16, 23 August 2015 (UTC)
My preferred format is like this нилеце ‎(nilece), which seems to be what Dan Polansky is suggesting (the term "sub-lemma" comes to mind.) Just my "2 cents." Neitrāls vārds (talk) 14:25, 23 August 2015 (UTC)
I think that words categorized as lemmas should be treated as lemmas. Either it is a lemma, or it is not. I do not think that e. g. orlův can be considered a sublemma of orel. It is an adjective derived from orel by a suffix -ův, which is a derivational suffix, not an inflectional suffix. --Jan Kameníček (talk) 17:08, 23 August 2015 (UTC)
Maybe possessive adjectives should be ranked as non-lemmas, along with Latin participles and Czech comparatives (menší). It would be consistent with the practice of PSJC and SSJC. But I do not think it obvious that there should only be lemmas and non-lemmas, and that's it. For instance, many editors prefer to create some entries as alternative forms, and prefer to centralize etymology in the main entry and avoid it in the alternative form. The alternative form is still a lemma, but it is a secondary entry from the standpoint of information management. I have even seen some editors use the word "lemma" to mean "main entry" rather than "the word form representing all the inflected forms of the word".
The question is, like, do we want to repeat the etymology of huge in hugely, and do that for the whole class of -ly adverbs? --Dan Polansky (talk) 17:24, 23 August 2015 (UTC)
Was orlův separate from orel in Proto-Slavic, or was it only formed in Czech? If it was only formed in Czech, then I agree with Dan and Neitrāls: just say how it's derived from orel and put the history of orel in that entry. Just because something is its own lemma doesn't mean we have to duplicate (knowing it will come unsynced) information in multiple entries; rigidify is its own lemma independent of rigid, but doesn't repeat rigid’s etymology. - -sche (discuss) 19:12, 23 August 2015 (UTC)
Generally speaking, possessive adjectives appeared already in Proto-Slavic, see Appendix:Proto-Slavic/-ovъ. The possessives with the suffix *-ovъ changed in Proto-Czech (between the 10th and 13th centuries) to -óv and later -uov, which changed into modern -ův.
Unlike huge x hugely, there are often more changes taking place when creating Czech possessives than adding the suffix, compare e. g. Radka x Radčin.
Besides this, I think that all words which have roots in proto-languages should be listed in categories like Terms derived from Proto-... . I don't think that only one representative of a group of related words should be listed there. Using the {{template:etyl}} in the etymology section is a good way to do so. Or should the category be added manually? Jan Kameníček (talk) 21:01, 23 August 2015 (UTC)
The fact that going from "matka" to "matčin" does not look like plain suffixing does not matter; it is the property of Czech morphology (inflectional and derivational alike) that it often does not work like plain suffixing on the surface level. For instance, "bedna" --> "bednář" = "bedna" - "a" + "ář"; "samec" --> "samčí"; "vyrobit" --> "výrobce" = "vyrobit" - "it" + "ce" with "y" made acute or the like; "dům" --> "domeček" (ů went to o); "orel" --> "orlíček" (e dropped); "hrdlo" --> "hrdelní"; etc.
What matters is that we are dealing with a very productive derivation or inflection pattern, like in English for -ly, -ness, -hood, -ify, -ing, etc. And what matters is whether we want to have etymologies like the one currently in orlův, which says this:

"From orel +‎ -ův. Noun orel comes from Proto-Slavic *orьlъ, which is from Proto-Indo-European *h₃er- ‎(“big bird, eagle”).[1]"

As you can see, the etymology first indicates the suffixing, and then goes into detailing the etymology of the component "orel". That is really like "swimming" detailing the etymology of "swim", and "merrily" detailing the etymology of "merry".

Whether possessives in general originated early or not does not seem to matter. What matters is the particular etymology, and whether it is of the form "base + suffix. Base is from base-etymology" rather than what we see e.g. in windmill, which could conceivably be "wind + mill", but can in fact be traced to Old English *windmylen. I do not think that all compounds should provide the etymologies of the component terms on the pages of the compounds. Put in general terms, I do not think that all etymologies of all terms resulting from derivation (prefixing, suffixing, compounding, etc.) should repeat the etymologies of their base terms. --Dan Polansky (talk) 19:10, 24 August 2015 (UTC)

Question (re: sourcing)[edit]

So, there was this thing that I wanted to get a feel of the general attitude.

Do passages/statements attributed to an author or a book need to actually reflect what the author/book says? Can they be changed ("corrected") with something that author doesn't say while still attributing it to them?

My answer would probably be "are you effing kidding me?" (lol) Then again en.wikt can be a serious "land of the bent mirrors" [don't remember the correct idiom] and things that I see as common sense some others don't even consider.

This is referring to a discussion 3 headers up (that I actually missed) where (to sum it up somewhat snarkily) CodeCat says that that book is stupid and needs her corrections while still proudly displaying the reference [1] at the end despite (in some cases) all the core information being changed. For example in akmens the direct parent root was changed, then extrapolating from that the proto-group was changed and a different PIE root introduced (none of these things are to be found in the source cited.) I call this manufactured references/misattribution but maybe I'm dumb...?

Would like others' input.

And more generally this thing has been lingering on for years, the crux of the matter is that CC demands an explicit, voted-on policy, why not just do it, it could be something very simple, something to the effect:

  • Wiktionary by previous consensus is a secondary source, this explicitly applies to etymologies, sources need to be provided, in case of synthesis, the synthesized works need to be attributed.
    • Usage of templates to keep track of unsourced pages is to be encouraged.
    • Attribution of statements to an author that they didn't make is to be avoided.

What do you think about that? Perhaps User:Dan Polansky could help set it up? Neitrāls vārds (talk) 14:57, 23 August 2015 (UTC)

There is still Wiktionary:Votes/2013-10/Reconstructions need references that never started. How is the present wording of the vote from your standpoint? --Dan Polansky (talk) 15:09, 23 August 2015 (UTC)
As for policy page, the main thing is consensus and evidence of consensus, IMHO. A policy page by itself is a poor evidence of consensus; it merely makes things convenient for newcomers who then do not need to wade through previous votes to find what the decision was. Thus, a policy page is not strictly necessary, IMHO.
For interest, Wiktionary:Votes/pl-2006-12/Proto- languages in Appendicies is a related vote that does not seem to indicate inclusion criteria. --Dan Polansky (talk) 15:15, 23 August 2015 (UTC)
Looks good, pretty much exactly what I had in mind. The only problem – a bit narrow. In Latvian there is this problem that the entries look like doormats (to be a bit dramatic). Would be perfect if it could be extended to mainspace...? P.S. perhaps a clause about misattribution would be necessary – right now I can name two appendices that very dubiously cite template:R:lv:LEV (a connection is attributed to this book that cannot be found there.) Neitrāls vārds (talk) 15:31, 23 August 2015 (UTC)
I agree completely. I don't know what is on CC's mind, but s/he is clearly doing the wrong thing here. I don't really know what "policies" are supposed to imply (CC clearly acts without one), but I say there has to be some order in the usage of references and justifications. I also agree completely that reconstructions need justifications (sources, rationales), a practice that is used in every good etymological dictionary that I know. --Pereru (talk) 13:26, 24 August 2015 (UTC)
@Neitrāls vārds: I updated the vote a bit, to indicate sentence structure in a clearer way.
As for narrowness: I'd suggest to leave it narrow, and see whether it can get enough support as is. We can create another vote for etymologies later. There is still the question whether etymologies should be inline referenced etc.; dealing with these appendices separately seems to be a good initial step. --Dan Polansky (talk) 15:39, 23 August 2015 (UTC)
I added the vote to WT:VOTE and scheduled it to start in a week. Let us postpone the vote as much as a discussion requires. --Dan Polansky (talk) 15:44, 23 August 2015 (UTC)
Great, thanks! Neitrāls vārds (talk) 15:49, 23 August 2015 (UTC)

Native speakers' advice[edit]

Native speaker's advice needed, please look at Talk:houbelec#Translation. Thanks very much! Jan Kameníček (talk) 21:15, 23 August 2015 (UTC)



I don't see the use of keeping such empty entries that failed their RFD's. Could someone explain? Thanks 12:15, 24 August 2015 (UTC)

Partly as a place to store the evidence for the word (so that if we eventually find more, we can recreate it more easily – see for instance redamancy, which was a blank entry pointing to Appendix:English dictionary-only terms, until we managed to find enough citations to create a full entry), and partly to stop people trying to recreate the page (which often happens with "words" that correspond to rare phobias, sex acts, political insults etc, which are often mentioned in word lists and novelty dictionaries but never actually used – look how many times "wunch" got deleted, until I created a proper cited entry for it). Smurrayinchester (talk) 13:44, 24 August 2015 (UTC)
Your first reason does not apply, since the citations page would exist even if the soft redirect to it from the main entry did not exist. --WikiTiki89 13:48, 24 August 2015 (UTC)
But who checks whether the citations tab is a blue-link when creating an entry? Smurrayinchester (talk) 14:22, 24 August 2015 (UTC)
That's your second reason. I only said your first reason doesn't apply. --WikiTiki89 14:59, 24 August 2015 (UTC)

Recreating Proto-Baltic (and other "deprecated" languages) with a different status?[edit]

Proto-Baltic was recently discontinued as an accepted language in Wiktionary. I was against it, because it doesn't seem to me that the discussion is over (and because there is no real authoritative source for PBS etyma yet), so it seemed premature, but OK, I can live with that. The problem, it seems to me, is that this forces changing quotes from sources in ways that don't seem legitimate. If a source reconstructs a form as Proto-Baltic, renaming it as Proto-Balto-Slavic without any further changes (e.g., replacing it with a different source) seems to me illegitimate. So: how about having a different status for Proto-Baltic? Say, "older/deprecated/obsolete Proto-language" or something like that? In this manner, we could list deprecated protoforms here (with templates duly identifying them as such) in the same way we list "misspellings of" or "alternative forms of" or "obsolete forms of" words in the main namespace. Here are a couple of reasons:

  1. People will still come upon older reconstructions -- they are, after all, attested in papers, etymological dictionaries, and other similar sources --, and may want to know what they were and why they were abandoned; it would thus be useful to have pages with these forms (clearly tagged as "deprecated" or something like that, and linked to the most recent and most widely accepted form), just as in biological taxonomy it is useful to have lists of old, deprecated scientific names so that older articles can still be read and understood correctly
  2. To follow the history of a proposed protoform, knowing its predecessors is important -- often, a new protoform is proposed in explicit opposition to, or as an explicit correction of, an earlier proposal. Being able to track these would be useful in understanding the state-of-the-art.

What do y'all think?

You can still reference a source that reconstructs a Proto-Baltic term in a Proto-Balto-Slavic entry. Think of it this way: we are reconstructing a Proto-Balto-Slavic term based on someone else's reconstruction of a Proto-Baltic term. --WikiTiki89 15:01, 24 August 2015 (UTC)
Sure, but the Proto-Balto-Slavic reconstruction will ultimately look different, at least in that it refers to a different level. (Most PBS entries here look very much different from the PB forms on which they are based). Someone who sees a PB form somewhere and wants to know what it is won't find a page about it here. Shouldn't there be one -- in the same way that there are "alternative spelling of" and "obsolete form of" pages? In this way we don't misrepresent sources, and we allow users to find exactly the form they saw in some source and track its status (deprecated) and understand why it was replaced by the PBS form. --Pereru (talk) 15:16, 24 August 2015 (UTC)
Let me give an example. A proto-Baltic form, like e.g. Appendix:Proto-Baltic/*akemns, would have an initial template saying something like: "This protoform is deprecated. The current consensus form is Appendix:Proto-Balto-Slavic/akmo. Reasons for this change are indicated below. See also Appendix:Proto-Balto-Slavic for the current view on this branch of Indo-European." In the page itself, the sources for that form (say, Karulis' LEV) would be cited. In this way, the reader would know what this form is, where it came from, and what it was abandoned for. The end result would seem to me to be at least as useful as "alternative spelling of" or "obsolete form of" pages. (I imagine there would also be a heading in the current reconstruction -- something like ==Deprecated forms== or ==Older proposed forms== -- to link the currently accepted protoform to its previous incarnations.)--Pereru (talk) 15:22, 24 August 2015 (UTC)
You may be right about including them in some way, shape, or form, but this has nothing to do with misrepresenting sources. We do not misrepresent PB sources by altering the form of the reconstruction to make it PBS-like. --WikiTiki89 15:33, 24 August 2015 (UTC)
Thanks. But as for sources, if a source clearly reconstructs a form as X, and we list it here under page Y, then it seems to me we are misrepresenting it, aren't we? (But one possible solution would be to mention this on the page; i.e., have something akin to a ===Usage notes===, or a footnote, where we explain that what the source said isn't exactly what is on the page. Would that be OK with you?) --Pereru (talk) 17:47, 24 August 2015 (UTC)
If you quote the Pythagorean theorem as (= (+ (* x x) (* y y)) (* z z)) rather than as a^2 + b^2 = c^2, are you misrepresenting the Pythagorean theorem? --WikiTiki89 18:21, 24 August 2015 (UTC)
If you cite someone who quoted it as a^2 + b^2 = c^2, then yes, you are. If you have some standard way of referring to the theorem that supersedes whatever the author you're quoting is saying, then you should say so somewhere and link to it. It would be like, you know, changing US spelling to British spelling in a quote written by an American author -- not the right way to quote. --Pereru (talk) 18:39, 24 August 2015 (UTC)
But that's the thing, we're not quoting, we're paraphrasing. And when you paraphrase, it is totally OK to change British spelling to American. I can talk about the "color" of Winston Churchill's eyes and cite a British source that spells it as "colour", and I would not be misrepresenting the source. --WikiTiki89 18:44, 24 August 2015 (UTC)
But that most clearly should not be the case for reconstructions -- they are ideas and hypotheses, not paraphrases. Their spelling is often exactly what is being claimed -- a *X instead of a *X'. In other words, the sounds that compose the protoform are exactly the theoretical point that is being made; and, in this case, of course the spelling matters, in fact it is what matters most. There can of course be general problems that can be solved in a general way -- researcher 1 uses X for a certain sound (say little glottal stops), while researcher 2 uses Y (say accent marks) -- and you can adapt the spellings to reflect that (as long as you are consistent, and you write up somewhere why you chose to regularize this difference in the way you did -- and link it to the pages where it is relevant). But in most cases this is not so, and differences in spelling mean something much more serious -- and they should be better documented. --Pereru (talk) 20:11, 24 August 2015 (UTC)
The difference is between equality (faithfully representing a source in its original form), and equivalence (understanding its meaning and/or intention). Some sources for PIE write h₂ while others write H₂. These are different things in writing, but we know and understand that they mean the same thing; they are equivalent even if they are not equal. So we can exchange one for the other without problems. Likewise, in Wikitiki's example, "colour" and "color" are unequal representations of equivalent meanings (I might call them "equivalent words", but this hinges on the question of whether different spellings make different words). —CodeCat 18:57, 24 August 2015 (UTC)
And the solution for this is easy: you make a principled choice (I hope, after a discussion with others) for, say, h₂; and then you write somewhere (say, Appendix:Proto-Indo-European) that you did that, and why, and you link this page to those in which h₂ occurs -- so the reader, who may have seen a source that had H₂, doesn't think that you made a mistake. And since you do know why you preferred h₂ to H₂, explaining it in writing shouldn't be a problem. You would only need to do it once, in one page (where you could explain all the other similar choices you made), and then link it to new PIE entries. And again I ask: what is it about this suggestion that is so unreasonable or difficult to do? You spend more time writing comments here than it would take you to do this.--Pereru (talk) 20:11, 24 August 2015 (UTC)
There's nothing against this in principle, and it's even preferred I would imagine. But at the same time, a lot of Wiktionary's practices and conventions are unwritten; we follow them because we learn from existing examples that are already on Wiktionary. In the case of the choice to use lowercase h₂, the earliest that I can find is this. And there, too, it was simply set as a rule without discussion or motivation. To discuss and motivate it now would be a bit pointless, as there's already a consensus for it. —CodeCat 20:19, 24 August 2015 (UTC)
Good, let's do it like that in the future then. Write up your favorite spelling choices for PIE (h₂ instead of H₂), their reasons (in this case, I suppose because h₂ is more recent?), and voilà: no more reasons for complaints, and people can go back to arguing the merits. The point is not justifying it to other Wiktionarians (though that in itself is not bad: there are always new people coming who don't know where this decision came from, and I'm sure they'd appreciate the information), but to users. If someone checks an etymology here and sees something s/he finds strange, and there is no justification anywhere for it, then this doesn't make Wiktionary look more trustworthy. Again: I'm not suggesting a discussion (unless people think there should be one), I'm suggesting documenting choices, to show, at first sight, that they are choices, not mistakes. Besides, there are things that are much more important than h₂, like your current diatribe about how to spell proto-BS intonations. After this is over, don't you think it would be a service to others to write down somewhere why one variant was preferred? Again, so that it doesn't look like a mistake, but like a true principled choice? --Pereru (talk) 20:53, 24 August 2015 (UTC)
I'm not going to start documenting everything for PIE all over again. Not unless enough people feel there is a pressing need for it. So far, nobody has complained about our current standards. If you want motivations for PIE notation, you'll have to write them on your own. —CodeCat 21:05, 24 August 2015 (UTC)
I hope they do, because this would indeed lead you towards actually improving your PIE forms. Of course, nobody can force you to do the right thing; you're a free individual. I'll simply keep adding {{needsources}} and {{needref}} to your unjustified decisions (unless you'll help me by doing it yourself, of course), hoping that someone other than you will have the knowledge to do the right thing. As for complaints, I did complain against your standards, and I've seen several people (Štambuk, -sche) disagreeing with your standards in specific cases, so I think you're assuming a non-existing consensus here. You're counting more on people's inertia than on actual consensus. But hey, it is a strong feature of humans too. For all I know, you may well get away with it. --Pereru (talk) 22:37, 24 August 2015 (UTC)
There's also the case where different sources disagree on certain sound laws. For example there's a subset of linguists that thinks the change o > a happened independently in all the Balto-Slavic branches rather than in Balto-Slavic itself. In this case, too, we have to pick one particular set of sound laws as the "main" one. Our existing pages treat the o > a change as Balto-Slavic. Likewise, some sources may neglect to indicate accent or acutes even when all descendants are in agreement. You can compare this to Pokorny's reconstructions for PIE: they don't reflect modern understandings so they have all kinds of weird schwas and long vowels while lacking laryngeals. So imagine that the only source we have on a particular form is Pokorny; should we allow ourselves to bring the form up to par? These are all questions that arise when we start giving too much weight to sourcing. —CodeCat 18:33, 24 August 2015 (UTC)
Disagreement between sources is exactly the reason why you need to argue for the forms you create pages for here -- I'm so glad you brought this up! Look: if different sources give different opinions and explanations, then you discuss them and explain why you favor one over the other. Things like "different sound laws" can be part of the discussion. All the problems you mention above can be summarized and written up in a page (e.g., Appendix:Proto-Indo-European sources) to which you can refer as part of your explanations for preferring one form over another. I've seen this done in etymological dictionaries, and I see no reason why you couldn't do this here. Sources are good -- even when they disagree... --Pereru (talk) 18:39, 24 August 2015 (UTC)
We don't have to motivate and discuss every single choice we make. For Dutch entries, we choose the spelling as prescribed by the Dutch language union as the norm for lemmas, even though not everyone uses it and some people advocate alternatives. This choice is not motivated or discussed; it's simply set as a rule and accepted by our Dutch editors. In the same way, it's not necessary to discuss why we picked one particular set of sound laws to base our reconstructions on. In many cases, the choice is arbitrary and we simply picked one because we had to make a choice. I think it's more important for us all to agree on a set representation and sound laws for Balto-Slavic reconstructions, than it is for us to discuss and motivate it all. Not that it's not welcome and valuable to give reasons for choosing one particular thing, but that's secondary to making the choice in the first place. What we choose is more important than why. —CodeCat 18:52, 24 August 2015 (UTC)
Things are different with reconstructions, especially when there are competing hypotheses. See, a reconstruction is not a word, but an idea; and, as for every idea, justifying it is important. The spelling of a Dutch word is not an idea that is being discussed by several people right now and with different, equally authoritative variants (for a still different, but more comparable case, see Nynorsk vs. Bokmål). That's what historical linguists do -- they justify their reconstructions -- and that's what you should do, too, if you care about reconstructions. When you say "we don't have to justify it", you're making a "petitio principii" without there actually being policy on this. Why don't you start a policy page on why we don't have to justify choosing one etymology over another, one set of sound laws over another (thus disputing Wiktionary:Etymology), and let people vote on it? You keep talking as if everybody agreed with you, when this is clearly not the case, much the opposite. I'd like to see you try to defend and get this "policy" of yours approved. --Pereru (talk) 19:08, 24 August 2015 (UTC)
Well of course, there must be a consensus. I'm aware that the current representation of Balto-Slavic doesn't have consensus, as both you and Ivan seem to disagree on it. But Ivan's solution was to simply create alternative (duplicate) entries or move mine, which of course is no way to come to an agreement. So the question becomes how do we come to an agreement on things, and if we don't, what should be done with existing and future Balto-Slavic pages? Right now, the majority of them has been created by me, so they mostly reflect the (unwritten) standards I follow. But if we insist that there must be consensus first, what do we do with them? Should they be deleted until there is an agreement on them? What about Balto-Slavic forms in etymologies? —CodeCat 19:16, 24 August 2015 (UTC)
It's not simply that there should be a consensus -- the consensus shouldn't be hidden, buried in some page that was archived three years ago. The reason for the consensus should be right there, on the page, or at least in Appendix:Proto-Indo-European, so that the reader knows what consensus decisions you made, and why. Since you're talking about theories, not words that exist in real languages, then your sources and/or arguments are the basic reason why the protoword is here -- i.e., they are precisely the most important piece of information. (I don't disagree with anyone's spelling of proto-BS, by the way, simply because I'm not sufficiently familiar with it to have a principled opinion.) But I see you disagree -- and given this fact, why create pages with one spelling when you can't even agree this is the right thing to do? Why not create a paragraph in, say, Appendix:Proto-Balto-Slavic, that summarizes this discussion -- after you're done with it --, lists your conclusion and the reasons for it? Then you can follow it consistently, and nobody -- or at least not I -- will complain. What I keep not understanding is this need to hide your reasons: that makes no sense to me at all; and it's something that no etymological dictionary I know of has ever done. Why this innovation? --Pereru (talk) 20:16, 24 August 2015 (UTC)
Consensus doesn't necessarily have to be formed through discussion. Sometimes all that's needed is for one person to do something and for others to then follow that example. Consensus is often silent, and therefore undocumented, reflected only in practice. There's no documented consensus for most of the edits people make to Wiktionary entries; it's simply the fact that they're left unreverted that creates a sense of agreement for the new status quo. It's only when someone disputes something that a lack of consensus becomes obvious. In the case of Balto-Slavic, you two have voiced your opinions, so that's how I know. I continued creating entries because I figured, the source of the dispute is the naming, but we can still have good content and when we solve the dispute we can rename the entries. I haven't made further attempts to come to a consensus because the attempts I did make didn't work; Ivan's opinions were fundamentally different from mine on this matter, and nobody else seemed to care enough to provide a third voice, so the matter remained unresolved and both of us just kept doing our own thing. —CodeCat 20:28, 24 August 2015 (UTC)
True. And sometimes what is necessary to challenge it is for someone to come here and say "but this is not right, and here's why". And that's what I'm doing, quite legitimately so, since what I am asking for is no more, no less than what every good etymological dictionary known to man already does -- sources + justifications. So: me being here, and the reactions of several others, show that there is no consensus here. If I were you, I would stop adding any new words, and concentrate on justifying the ones you've added already. You know the reasons you had for creating them, so this shouldn't be a problem. What to do with the proto-BS (or IE, or FU...) words? Justify them. If in the future a given justification is abandoned, because a new one came up... then all those pages will need to be moved, and a new justification added. That's how things go with ideas that aren't attested words (and add the reasons why). --Pereru (talk) 20:44, 24 August 2015 (UTC)
Oh of course, challenge and counter-challenge. And then eventually there's either an agreement, or we all give up until the next time. I actually find it much easier to discuss things with many participants though, that way things are more nuanced and it's not just two opposites clashing and getting nowhere. Much less chance of a stalemate. I will see if I can write up a proposal for PBS reconstructions with those motivations you're after so much. No promises though. I will refrain from creating any more until I do this, but I ask you not to add your templates to the pages. You should also remove them from Germanic pages because the norms are already explained at WT:AGEM, have consensus, and therefore don't need further justification. —CodeCat 21:05, 24 August 2015 (UTC)
Yes, the Wiktionarian way, isn't it? So conducive to the right result!... I also like it when there are more participants. Please! And I will indeed be glad to see you write up your proposals, so that others can see what exactly are the tacit rules you're tacitly following with the tacit (dis)agreement of your peers. WT:AGEM is actually quite good -- proficiat! But it doesn't say why sources or justifications should not be added. (I keep saying: you're following a policy that is not used in any good etymological dictionary anywhere. "Consensus" indeed!...) I will refrain from adding the template to them for now, but unless someone explains why there shouldn't be sources/justifications in Proto-Germanic words I will eventually return to adding them. Why should it be less good to source/justify Proto-Germanic reconstructions than those of any other protolanguage? As in the other cases, they aren't words; their justification is a crucial element to their eligibility for having a page that states they are the "right" protoform, etc. --Pereru (talk) 22:49, 24 August 2015 (UTC)
You can list deprecated reconstructions under ===Alternative reconstructions===, tagging them with {{qual|obsolete}} or whatever. Note the reconstruction from Pokorny in Appendix:Proto-Indo-European/h₂eHs-. The entries for these alternative forms can be soft-redirected as in Appendix:Proto-Indo-European/pel-. --Vahag (talk) 15:29, 24 August 2015 (UTC)
That is a good idea! I didn't know this could be done. Now, would it be OK if I created Proto-Baltic forms as such, under Appendix:Proto-Baltic/xxxx and then redirected them to their Proto-Balto-Slavic equivalents at Appendix:Proto-Balto-Slavic/xxxx? --Pereru (talk) 17:47, 24 August 2015 (UTC)
Of course, the names of Balto-Slavic pages should agree in notation with what is already the current practice for BS entries. Acutes and accent should be indicated when known, and the distinction between ś/ź (former palatovelars) and š (from RUKI) should be maintained, while the letter ž is not used for Balto-Slavic. This means that such things should be corrected for in the redirect as well. If certain features are reconstructible but not indicated in the page name, this should be explained in the entry. For example, if Slavic and Latvian have s while Lithuanian has š, then the expected reconstruction is ś and any difference should be accounted for by the entry. Likewise, if the descendants all indicate an acute but the page name has none, this too needs explaining. —CodeCat 18:05, 24 August 2015 (UTC)
Hi CodeCat! Glad you're not whimsically blocking people today. Now, to keep a form that was reconstructed as PB under a PBS heading would be as wrong as keeping a Latvian word under a Lithuanian heading -- it simply disagrees with the source, i.e. it is factually wrong information. The various letters are just notational conventions, differing from author to author, and could probably be resolved with redirect pages. (You could of course also include this information about Proto-Baltic in the Proto-Balto-Slavic page itself, but I don't see how this would be any better -- care to elaborate?)
Let me give an example. I'm going to recreate the Appendix:Proto-Baltic/akmens page -- CodeCat, please refrain from deleting it until the discussion here is complete -- and make it look like what I'm thinking. Then you guys can give your opinions. --Pereru (talk) 18:34, 24 August 2015 (UTC)
Latvian vs Lithuanian is irrelevant here, that's a completely different case. They are real attested languages, and to label a Latvian word as Lithuanian would be a misrepresentation of the attested facts (not the sources; sources are irrelevant for attestation as we are a secondary source). But for etymologies, sources aren't facts, they're proposals. And as an independent etymological source, we're allowed to make different proposals. So if we think that no, your Baltic reconstruction doesn't make much sense, here's a Balto-Slavic one we agree with more, then we are allowed to do that. Being a secondary source means that we do our own interpretation of the facts. We can of course use the proposals of others as part of ours, which we do. And we should definitely source that. —CodeCat 18:44, 24 August 2015 (UTC)
And reconstructed protoforms are attested as claims at certain levels; and to misrepresent the claims as different from what they were is wrong. If you prefer, compare it to adding a quote to a certain word, but (a) misspelling words in it, or (b) attributing it to the wrong source. Not the right move, ahn?
Thank you so much for saying the sources are proposals -- I had said that to you so many times, I thought you never would agree with me. That's exactly why it's so important to ground them. See, when you create a protoform page here, you're not creating a word: you're creating a proposal. And what makes proposals good or bad are the arguments that support them -- as you yourself said, they are not attested facts. That is exactly why sourcing and arguing them is so important: proposals without the accompanying argumentation are not compelling.
Finally, I have no Baltic reconstructions -- Karulis does. Take it up with him if you want, not me. Just like you haven't invented any of the Dutch words you contributed to Wiktionary (right? you haven't, have you? I mean, maybe you think Dutch is like Proto-BS and you should be allowed to add even the ones you invented yourself without further justification...). I have absolutely no problem with you changing any Proto-Baltic etymologies as long as you document your reason for doing so, or your source, etc. -- so that the reader can see why this is supposed to be better. I repeat: it's not much work, it takes a couple of minutes, and you must have the information already since you're making judgments on the basis of it. There is absolutely no valid reason for you not to do that. Period. --Pereru (talk) 19:08, 24 August 2015 (UTC)
Adding a source to our proposals just says "we agree with this idea". But that doesn't make sourcing important necessarily. Maybe there aren't any proposals that we agree with, and in that case we have nothing to source. So what Karulis says may be nice, but they are your reconstructions as soon as you put them in etymologies. Again, the source simply says that Karulis agrees with you, but you put it in the etymology, so you are proposing it in the name of Wiktionary. And I'm not required to give motivation for changing the etymology if there isn't one to begin with. Take your favourite edit warring target suns for example; the form is not motivated at all, but simply stated as fact, with reference to Karulis. This seems like exactly the kind of thing you're advocating against. A proper etymology, as I understand your view to be, would provide a motivation for the reconstruction. This motivation may itself come from Karulis's work, or it may be your own supplement. Or, it could be documented centrally in an appendix so that we don't have to write it down everywhere. But would have to exist, even for proposals that are sourced. —CodeCat 19:26, 24 August 2015 (UTC)
Also an added note: Karulis's reconstruction for suns is demonstrably wrong, because it shows the ō > uo diphthongisation for both East and West Baltic. This change only occurred in East Baltic, and is not found in West Baltic{{R:Fortson 2004}} so the form Karulis gives is Proto-East-Baltic. This is one of the reasons I am against over-reliance on sources; sometimes they are quite obviously wrong. —CodeCat 19:34, 24 August 2015 (UTC)
──────────────────────────────────────────────────────────────────────────────────────────────────── If we change suns from saying "From x.<ref>Karulis, Book</ref>" to saying "According to Karulis, from x.<ref>Book by Karulis</ref>", does that solve at least some of this dispute? That's what was done once before when there was dispute over the etymology of bensin — the entry was rephrased to attribute the etymological theory explicitly, rather than giving it in "Wiktionary's voice". - -sche (discuss) 19:45, 24 August 2015 (UTC)
I think it would help, because, to me, the main problem is making sure that everybody's opinion is clearly marked -- Karulis', CC's (or Wiktionary's), etc. It's all a question of knowing we are reporting the right thing.
@CodeCat, look: the point is not whether Karulis is right or wrong -- I have no beef with that. The point is making sure your reasons for agreeing or disagreeing with him are documented somewhere, so the reader can see them and decide if s/he agrees with you or not. So: if you want to copy the paragraph you wrote above and place it, say, somewhere (in suns, or in Appendix:Proto-Balto-Slavic and then link it to suns, adding a few words to the etymology discussion) -- I have no problem with that. My only problem is with you erasing or changing Karulis' opinion, and then contributing something that cannot be checked. Let me see if I put in bold you will finally react to this: I am not saying you have to believe your sources unconditionally; I am saying that you have to explain the choices you make. You're not explaining your choices; and it would be easy to do so: just create Appendix:Proto-Balto-Slavic and do it there, and link it to other pages. (After discussing the 'best solution' with your colleagues, i.e. after you and Štambuk and whoever else is interested finally agree on how to spell proto-BS words.) But if you simply take down Karulis' opinion without justifying it -- and obviously you can try to justify doing it, since you just did it in the preceding paragraph -- you are NOT improving Wiktionary; you're just making it look more whimsical. My entire point in a nutshell: why hide the reasons for making a choice, especially when this choice is the crucial thing -- the very name of the page you create depends on it? --Pereru (talk) 20:26, 24 August 2015 (UTC)
Are you saying I need an Appendix page in order to remove an etymology I judge to be bad? Lots of other editors before me have simply edited out bad content, nothing to it. I'm just doing what others have also been doing already. It's you that's now trying to change all this and making it much more complicated, and then complaining when someone doesn't simply do it your new way and they start to butt heads with you. —CodeCat 20:32, 24 August 2015 (UTC)
Yes, but it's actually very simple. The Appendix page you need is a general guide to why certain things are 'bad content' -- they don't follow accepted correspondences, or they misapply sound laws, or are based on some idea (say, Glottalic Theory) that has been disproved, etc. Only one such page would probably solve all your problems. Then, when you remove an etymology that you think is bad and replace it with one you think is good, you mention in a footnote that so-and-so proposed the bad etymology, but then there's reason 1 and 2 (say, correspondence nr. 35, and sound law number 4) why this was bad -- see Appendix:Proto-Indo-European reconstructions -- which is why it was removed here. For deeper differences you might have specific pages, but I don't think there would be many of those, no. And it would also be possible to link Wikipedia pages, in case you see one that you agree with and you think actually explains the issue. The actual argumentation for removing an etymology would probably be one sentence long, and be added as a footnote. You could also mark it as a "Wiktionary editorial decision" if you don't want your name there. --Pereru (talk) 21:01, 24 August 2015 (UTC) NOTE: but note also that you'd have to deal with those that disagree with your reason. I suggest that anyone who disagrees with an etymology should first mention it somewhere -- the talk page of the protoform in question, or maybe WT:ES -- before making the change. If you do make the change, then also be ready to discuss with whoever disagrees with it, and if his/her arguments are good, then incorporate them in your rationale for accepting and/or refusing his/her criticism in the original footnote that explains the change. --Pereru (talk) 21:04, 24 August 2015 (UTC)
Unless we make this a rule for the removal of any content, etymological or not, then I'm not on board with this proposal. It would have to be justified why the rules for removing bad etymologies are different from those for removing bad anything else. Wiktionarians have always had the prerogative to delete content they think is bad, and they've never had to refer to some kind of standards document to justify their removal. An edit summary has generally been enough, and often even that is not done. This has worked well enough so far that you're the first to propose a change. So I will be expecting a more general support for this idea as it seems like a solution without an obvious problem. —CodeCat 21:10, 24 August 2015 (UTC)
You might do this if you want, though you yourself have pointed out repeatedly that etymologies are not words, so it's up to you to argue why they should follow the same rules. Feel free to present your arguments. As for me, obviously, protoforms (to quote your post) are proposals, not words; and, in science, proposals exist only because of their arguments. Unless you've changed your mind and no longer think protoforms are proposals rather than words, you should agree, for the sake of consistency with your own stated opinion.
I dispute the idea that Wiktionarians have always been free to delete whatever they thought was bad content; if they don't justify their deletions, they are stopped and blocked after a while -- i.e., others have to agree with them, tacitly or not, or else they are not allowed to continue. Adding justifications to protoforms, especially when you're making choices, falls within this general area. I maintain that for protoforms (= proposals), justifications are more important, let's say as important as sources are for quotes. You don't seem to want to address that, so I'll assume you tacitly agree (as you assume those Wiktionarians who don't revert your edits tacitly agree with you -- "tacit consensus", right? :-) --Pereru (talk) 22:22, 24 August 2015 (UTC)

I kind of like this idea – referenced Proto-Baltic pages with a clear disclaimer that it's a defunct grouping in its classical sense according to the most recent sources. (Disclaimer: I have yet to see serious challenging of Slavic being a daughter of W-Balt, thus I do not believe that there can be W-Balt + E-Balt grouping excluding Slav).

"We do not misrepresent PB sources by altering the form of the reconstruction to make it PBS-like." --WikiTiki89 I agree with this; "sadly" that is not exactly the case ("correcting" referenced (even if deprecated by mod. stand.) PB forms to Orig. Res. PBS forms is what led to edit-warring a while ago, in my reading of things).

My (personal/pseudoscientific) reading between the lines of Pereru's proposal is that it would serve as another "safety valve" and, baby, we couldn't have enough of those, lol. Neitrāls vārds (talk) 22:00, 24 August 2015 (UTC)

From where I stand, PBS does look like a better grouping than PB (the evidence seems to be accumulating). But in the absence of a general work on the topic (say, a PBS etymological dictionary), I don't think it can be regarded as settled -- I'm just conservative on this point. But I have nothing against it as a theory, and as long as things are clearly marked and sources are not misrepresented, I have no problem with it. --Pereru (talk) 22:22, 24 August 2015 (UTC)
Also, I'm not in principle against altering the forms of reconstructions -- I just think this should be done in the open, with the rules clearly laid out and placed in some page where others can see them. What is the point of "adjusting" a form to a spelling that was not in the original source, and then doing nothing, not even adding a footnote, thus misrepresenting the original content? And it's so easy to do it right -- just add the footnote, or change the source to the one whose spelling you think is better. This implies adding only a few words, keeps things clean and organized, and doesn't prevent anyone from expressing his/her agreement or disagreement with this or that protoform. Why not do it? Or, worse yet in CodeCat's case, why fight against it? --Pereru (talk) 22:26, 24 August 2015 (UTC)
Yes, wouldn't it be so much easier if everyone just saw it your way? Why do people always have to make it so difficult by disagreeing with you? It's so inconvenient. —CodeCat 22:38, 24 August 2015 (UTC)
Indeed! You have much more experience than I do with being in this position, so I'm hoping you'll share your wisdom in this respect? And especially with respect to my old, old question: "all good etymological dictionaries do it this way, and CodeCat does the opposite. Now, who do you think is more likely to be wrong?..." --Pereru (talk) 22:55, 24 August 2015 (UTC)
So, to summarize: I'm OK with deprecated pages/redirects, as long as it is clear which form is which, and who proposed what and why. As far as I'm concerned, this settles the question. --Pereru (talk) 05:25, 25 August 2015 (UTC)
But why create Appendix:Proto-Baltic/akmens with an unusual "deprecated" infrastructure? Why a hard redirect wouldn't do? In case of proto-languages on the same level we should use soft redirects, because the page can contain homonymous roots. Why do that for Proto-Baltic? How will a user ever even get to the Proto-Baltic page? --Vahag (talk) 09:09, 25 August 2015 (UTC)
Why a hard redirect wouldn't do? Hard redirect to what though? You mean akmō which is somehow mysteriously unciteable (I was actually looking at it and wondering whether to ask Itsacatfish if it would be possible to come up with some refs (non-agnostic of PBS) but I wouldn't want to draw any "innocent" editors in this drama.) Neitrāls vārds (talk) 22:31, 25 August 2015 (UTC)
Uncitability is a different question and has nothing to do with the policy of redirecting. The PBS page will presumably have CodeCat's original-research justifications (I'm with Pereru on this one). --Vahag (talk) 08:26, 26 August 2015 (UTC)
The discussion here (including other headers above) seems to have some of the problems arising from overdoing lexicography:
  1. from trying to use sources to "attest" reconstructions,
  2. and from treating reconstructions as "headwords" — instead of kind of index words for etymologically connected word groups.
Creating redirects for alternate reconstructions, and discussion of competing (though not necessarily depreciated) approaches both sound like good ideas, but I do not see the benefit in creating separate pages altogether for reconstructions based on more or less the same data as another one.
If (and it appears to me that this is an if) the point of protolang pages is to illustrate the connections between attested languages, then cutting down on repetition is necessary. We do not create separate appendices for things like West Germanic or Anglo-Frisian, even though they are known to have existed; since this stuff can be adequately discussed already in the "Proto-Germanic" appendices.
I would hold that, strictly speaking, we have no such thing as an "accepted reconstructed language" on Wiktionary — that's why they go in the Appendix namespace to begin with. Which is not a namespace that means "just like mainspace, but for second-tier languages". As I see it, an appendix-only status means not only that protolanguages can be subject to new limitations like possibly requiring sources, but also that they don't need, and in some respects probably shouldn't, be treated as lexicographic subjects.
I also welcome explaining systematic details on how and why to present reconstructions on pages like Wiktionary:About Proto-Balto-Slavic. That said, if the dispute is about a current inability to establish a consensus reconstruction of PBS that we could use as the index forms, there are a couple of alternate possibilities that can be considered:
  • Picking an index language and listing forms under its reflex. In the 'stone' example above, we'd perhaps use Lithuanian akmuo. This seems a bit difficult to fit into the Appendix:Proto-Whatever/word notation, though (it might appear to imply that it is a Proto-Baltic or Proto-Balto-Slavic form rather than Lithuanian).
  • Using rough "non-reconstructions". A convention introduced by I think Roger Blench is the symbol "#" in place of "*" when we have a cognate word-family but no systematic reconstruction scheme has been worked out in detail; and adding this to some kind of a "majority representation" (in principle partly arbitrary) of the word root's shape. In this case this would probably bring us to #akmV (since the ending seems to be the main issue).
--Tropylium (talk) 08:21, 29 August 2015 (UTC)

Links in examples of non-English words

I mean in the entry πειρατής#noun (meaning: pirate) the example

  • Πειρατές του Αιγαίου (meaning: pirates of the Aegean Sea)

(strictly speaking this example should be in the entry πειρατές, which is the plural nominative of πειρατής)
I believe the links of this kind are very useful for an English-speaking person who wants to learn that other language (Greek in the case above) because she/he can examine the word for word translation of the example (when a word for word translation can be provided for an example).
Another user reverted an edit of mine that added a link of this kind. Is there a Wiki-Decision on this issue? SoSivr (talk) 10:26, 25 August 2015 (UTC)

See WT:ELE#Example sentences: "Example sentences should... not contain wikilinks (the words should be easy enough to understand without additional lookup)". However, that policy may have been written with English example sentences in mind; perhaps it's time to reconsider it for other languages. —Aɴɢʀ (talk) 10:55, 25 August 2015 (UTC)
Yes this occurred to me some time ago. I'd like to split the rule for English and non-English entries, or just abolish it all together. Renard Migrant (talk) 17:04, 25 August 2015 (UTC)

Implementing some type of autolinking in usexes has been brought up (by Benwing, I think?) and I really like this idea. I have been doing this manually (as in Россия ‎(Rossija)), it's a bit of a pita doing it manually though. Neitrāls vārds (talk) 23:52, 25 August 2015 (UTC)

"A bit of a pita"? I don't support autolinking in usage examples. It barely works in headword lines. --WikiTiki89 00:33, 26 August 2015 (UTC)
Oh... I just got what a pita is (only because someone else used it in all caps in a discussion below). --WikiTiki89 03:06, 26 August 2015 (UTC)
I support autolinking. It would be better if they were black links, because the wrong links would be hidden and it would be easier on the eye. — Ungoliant (falai) 00:37, 26 August 2015 (UTC)
@Ungoliant, With the so-called "orange" links (when a landing page doesn't have the header for that lang) built into the software they could be made pretty accurate (only capitals at start of sentences would be a problem). @WikiTiki, well, perhaps the person originally suggesting this could share their vision of how it could/couldn't be implemented, Idk. Neitrāls vārds (talk) 01:24, 26 August 2015 (UTC)
Full support. Autolinking has been a de-facto standard for Chinese lects. In fact, you need to add an @ sign to remove links in {{zh-usex}}. In any case, the choice should be available for difficult or rare words, especially in foreign languages. I consider this quite important for languages without spaces between words (existing usexes may need to be rewritten to allow autolinking as in เรียก). --Anatoli T. (обсудить/вклад) 01:27, 26 August 2015 (UTC)
I like the idea of autolinks, if it can be done right. Wikitiki, can you explain what doesn't work currently? Benwing2 (talk) 07:06, 26 August 2015 (UTC)
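For concreteness, the kind of autolinking being discussed could be sketched roughly as below. This is only an illustration under stated assumptions, not how {{zh-usex}} or any existing template or module works: the function name `autolink` is hypothetical, words are assumed to be separated by spaces or punctuation, and the "@" prefix for suppressing a link is borrowed loosely from the convention mentioned above. It does not handle capitals at the start of sentences or languages written without spaces.

```python
import re

# A word, optionally prefixed with "@" to opt out of linking (assumed convention).
WORD = re.compile(r"@?\w+")


def autolink(sentence: str) -> str:
    """Wrap every word of a usage example in wikilink brackets."""

    def link(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("@"):
            # "@word" means: show the word, but do not link it.
            return token[1:]
        return "[[" + token + "]]"

    return WORD.sub(link, sentence)


print(autolink("pirates of the @Aegean Sea"))
```

Punctuation and spacing pass through unchanged, so only the words themselves become links.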

Adding a collocations tab or section

In the past, there has been support for listing common collocations somewhere (besides usexes, which only fit a few), such as in ====Collocations==== sections. At WT:RFD#sentimental_value, it was suggested that not only collocations but also translations be provided. IMO, it might consume too much visual and byte space to list translations of collocations within entries, so I propose that we [ask the developers to] create a 'Collocations' namespace with its own tab like 'Citations'. We could also link to it using a {{seeCites}}-type template in entries. In that namespace, we could list common collocations, perhaps as the glosses to translation tables to which translations could be added — I have mocked up an example at Talk:goods; note that SOP translations are linked to their component parts. What do you think; would you like a Collocations: tab, a ====Collocations==== section, or neither? Should the tab or section contain translations, like at goods? - -sche (discuss) 19:44, 25 August 2015 (UTC)

Seems like a reasonable solution to a perennial problem, at least if the default search includes the Collocations namespace. If it doesn't, we won't have helped users. I suppose I would support it anyway because we might be able to come up with some other way to facilitate user search access to it or technical possibilities and rules may change. DCDuring TALK 00:18, 26 August 2015 (UTC)
Adding another namespace is a PITA because there's no enforced correspondence between the entry and the other page (which is why the citations namespace should be deleted). If it's too much of a distraction it can go in a collapsed box. DTLHS (talk) 02:18, 26 August 2015 (UTC)
  • I don't know the technicalities of the issue. However, I would strongly support this idea, as it would make a natural repository for SoP expressions which actually have some linguistic value, such as what we often call "set phrases", or what is the usual (unexpected) verb which collocates with this noun? ("wage war", "run for president", "wax lyrical"), as well as being a useful tool for Eng L2 students. There are well known lemming dictionaries out there devoted to the theme of common collocations. -- ALGRIF talk 15:50, 29 August 2015 (UTC)
I support this too. Useful for everybody. — Ungoliant (falai) 16:26, 29 August 2015 (UTC)

Gender markers in Polish adjective entries

{{pl-adj}} currently requires a gender parameter. However, gender in Polish adjectives is inflectional, not lexical, and the lemma form is almost always masculine nominative singular (with rare exceptions for "female-only" adjectives like ciężarna ‎(pregnant) or szczenna ‎(pregnant with puppies)). I think these markers should be removed and the gender parameter ignored and eventually removed through a bot, as the exceptional cases can be easily identified by looking at the adjective ending. Are there any objections? --Tweenk (talk) 22:30, 26 August 2015 (UTC)

If the gender can always be determined from the ending, then this sounds good to me. Even if in rare cases it can't, it might still be better to have the gender auto-detected and only present as an override. Benwing2 (talk) 08:30, 27 August 2015 (UTC)
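As a rough illustration of the auto-detection with override suggested here, a minimal sketch follows. It assumes the usual nominative singular endings of Polish adjectives (-y or -i for masculine, -a for feminine, -e for neuter); the function names are hypothetical and this is not the actual {{pl-adj}} implementation.

```python
from typing import Optional


def guess_gender(adjective: str) -> str:
    """Guess the gender of a Polish adjective lemma from its ending."""
    if adjective.endswith(("y", "i")):
        return "m"  # e.g. the regular masculine lemma form
    if adjective.endswith("a"):
        return "f"  # e.g. "female-only" adjectives such as ciężarna
    if adjective.endswith("e"):
        return "n"
    return "unknown"  # indeclinable or otherwise irregular; needs an override


def resolve_gender(adjective: str, override: Optional[str] = None) -> str:
    """Use an explicit gender if one is given, otherwise auto-detect."""
    return override if override else guess_gender(adjective)


print(resolve_gender("ciężarna"))
print(resolve_gender("szczenna"))
```

With this shape, the explicit parameter only needs to be kept on the rare entries where the ending does not settle the question.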

Allowing matched-pair entries

I created Wiktionary:Votes/2015-08/Allowing matched-pair entries as a proposal to formalize entries such as ( ), based on the discussion Wiktionary:Beer parlour/2015/July#Merging ( and ) into a single entry. Thoughts? Can this vote be improved? What would be your vote and why? Feel free to edit it. --Daniel Carrero (talk) 14:27, 27 August 2015 (UTC)

Scientific symbols?

At present, there's no good place to put scientific symbols in entries (eg E for energy or electric field, t1/2 for half-life etc.) What would people say to modifying {{en-noun}} or creating a new inflection line template to show these symbols (similar to what's currently done at speed of light, but neater). So for instance:

speed of light ‎(uncountable, symbol c)
velocity ‎(countable and uncountable, plural velocities, symbol v or v)
magnetic flux ‎(uncountable, symbol Φ or ΦB)
neutron ‎(plural neutrons, symbol n)

There are some shortcomings (for instance, the need to sometimes use bold or italics) so I'd be happy to hear other suggestions. Smurrayinchester (talk) 15:49, 27 August 2015 (UTC)

I think it would be better if we agreed on a guideline on how to add them to definition lines rather than HWLs, because a symbol doesn’t always apply to all senses (i.e. velocity ‎(rapidity of motion) and speed of light ‎(figurative: extremely fast speed)). — Ungoliant (falai) 15:57, 27 August 2015 (UTC)
Why aren't they just displayed next to the appropriate {{sense}} under Synonyms, just like abbreviations sometimes are and always should be, IMO. I could understand making these symbols larger, having a different background or a border, etc to make them more visible as they could get lost in a series or block of synonyms. DCDuring TALK 16:26, 27 August 2015 (UTC)
Surely not worth changing en-noun for this. Use alternative forms of synonyms. If really necessary use {{head|en|noun}}. Renard Migrant (talk) 17:38, 27 August 2015 (UTC)
In an entry for an English word there is a section English, in an entry for a French word there is a section French and so on. But in an entry for a number, e.g. 7 or for a symbol, e.g. c, there is a section Translingual. Therefore similarly one could have an additional translation for e.g. the English noun velocity as rapidity of motion:
  • French: vitesse
  • Spanish: velocidad
  • Symbol(or Translingual): v

SoSivr (talk) 21:39, 28 August 2015 (UTC)

It's not a translation of the word, though: it's a conventional abbreviation. Equinox 21:42, 28 August 2015 (UTC)
Such symbols are normally Translingual. Thus they might be a synonym in many languages. DCDuring TALK 21:56, 28 August 2015 (UTC)
I agree with DCDuring, list abbreviations in the Synonyms or Alternative forms section. This is also how we handle non-scientific abbreviations, in my experience, like United Kingdom→UK. - -sche (discuss) 22:15, 28 August 2015 (UTC)

Attributive use of nouns

How do we tell for certain that a noun that modifies another noun is or isn't an adjective? For instance, I'm pretty sure that the word donkey in "donkey sanctuary" is just a noun, as is beer in beer parlour. An example of true adjectival usage would be welcome. SemperBlotto (talk) 14:57, 29 August 2015 (UTC)

Wiktionary:English adjectives which is of course, not policy. Wiktionary:About English contains no policy that I can see on what separates an adjective from a noun used attributively. I actually don't think it's that hard and in ambiguous cases, there should be three citations which are clearly adjectival not either nominal or adjectival. For example "this desk is wood" would not count as a clear adjectival cite as it's just as easily (or more easily) identifiable as a noun than an adjective. Renard Migrant (talk) 15:11, 29 August 2015 (UTC)
It's very difficult to get a wording through a vote, though. Even people who agree that we need such a policy will oppose on the grounds of wording, so getting 70% ish approval is unlikely. Renard Migrant (talk) 15:12, 29 August 2015 (UTC)
Would you say that epidemic in "epidemic proportions" is an adjective? It seems so to me (but I can't explain why). SemperBlotto (talk) 15:55, 29 August 2015 (UTC)
Yes, you're right. [10]. Donnanz (talk) 17:02, 29 August 2015 (UTC)
Yes, "proportions" usually takes an adj; e.g. you'd say "canine proportions", not "dog proportions". Equinox 17:11, 29 August 2015 (UTC)
Apply tests of adjectivity, and Occam's razor. Donkey has not (yet) been shown to be used in contexts that are clearly adjectival, like this sanctuary is donkeyer than that one; it was very red and very donkey. In contexts where either a noun or an adjective could work (donkey sanctuary could be compared to noun sanctuary or improbable sanctuary), Occam's razor suggests it's more likely to still be a noun than to have acquired a second part of speech which is peculiarly limited to only those varied contexts where the first part of speech could also be used. On the other hand, epidemic is used in contexts where only an adjective could work, so it must be an adjective (some of the time). It's also used in contexts where only a noun could work, e.g. in the plural, hence it is also a noun. When a word that has been shown to be both an adjective and a noun is used in contexts where it could be either (like epidemic disease), I think we've tended to default to the interpretation that it's an adjective unless semantics make the other interpretation more likely: e.g. the adjective is the best semantic fit in epidemic fraud (widespread fraud), while the noun would be the best fit in *epidemic storage (section of a lab which stores samples of viruses that cause epidemics). But if a prime minister fakes an outbreak of disease in order to push through security measures, you could speak of "his epidemic fraud" with epidemic as a noun, just like you could mock postmodernism as "that postmodernism nonsense". - -sche (discuss) 16:45, 29 August 2015 (UTC)

General thoughts on this:

  1. The reason that it's difficult to get a policy through is that there's no bright line.
  2. This is a particularly confusing subject for non-English speakers. I speak English 1st and French 2nd. French isn't big into attributive nouns. In English, you can construct a phrase "A B", where A is an attributive noun and B is a common noun. In French, you usually construct it as "B de A", where A and B are nouns and "de" is the preposition.

Purplebackpack89 17:01, 29 August 2015 (UTC)

Some French adjectives feel a lot like attributive nouns to me, e.g. routier (not comparable and so forth). Equinox 17:11, 29 August 2015 (UTC)
True adjectives can be qualified by adverbs. Epidemic fraud → fraud was indeed epidemic; Epidemic storage → *the storage was indeed epidemic (in the sense -sche mentioned); the table is wooden → the table is solidly wooden; the table is wood → the table is solid wood, *the table is solidly wood. — Ungoliant (falai) 17:15, 29 August 2015 (UTC)
I wouldn't really have any problem with "the table is solidly wood". —CodeCat 17:45, 29 August 2015 (UTC)

restoring solitary wasp


Perhaps not the best place to post a request but I don't know another place to do it. This is a perfectly attestable expression, and my grammar, though not perfect (I'm not a native speaker) was certainly acceptable, and at least correctable if there were mistakes. Could someone bring back that entry please? I'm really fed up with the cavalier behavior of this admin, really (and not the only one). Thank you 20:50, 29 August 2015 (UTC)

Hi. Yes, it's a real phrase, but doesn't it just refer to any wasp that is solitary (i.e. not social or colony-dwelling)? Then it's obvious from the two words. Equinox 20:59, 29 August 2015 (UTC)
It seems that the terms solitary wasp, social wasp, and hunting wasp have been used as if they referred to well-defined groups, though most modern thinking would apparently have them as SoP. For example, Century 1911 has solitary wasp as a run-in at the entry for solitary. DCDuring TALK 21:33, 29 August 2015 (UTC)

Which English entries need pronunciation?

Can someone generate a list of English entries that don't have {{IPA}}? But somehow sort them in order of importance? I'm not sure how we would go about that, but there are basic entries out there, like garbage, which really should have the IPA pronunciation. Ultimateria (talk) 04:41, 30 August 2015 (UTC)

I'd like this too, though "in order of importance" is probably an unattainable goal. I've added pronunciation info at garbage now. —Aɴɢʀ (talk) 06:52, 30 August 2015 (UTC)

Here's a list of top 100 English entries whose English section did not contain "{{IPA" on 28 July 2014, ordered by Wiktionary:Frequency lists/PG/2006/04/1-10000, not constrained to lemmas, based on 20140728 dump: said, no, de, hands, Gutenberg, english, 2, replied, united, john, looking, coming, making, sn, arms, followed, appeared, continued, ety, reached, suddenly, miles, taking, beyond, nearly, laws, comes, natural, laid, copyright, opened, an', 4, makes, tried, Dr, lived, certainly, unto, placed, letters, remained, blockquote, happened, minutes, loved, knows, donations, thoughts, including, filled, seeing, tears, places, raised, moved, giving, laughed, leaving, started, circumstances, c., lines, considered, observed, wished, Charles, formed, trying, allowed, girls, discovered, sitting, ways, officers, offered, happiness, produced, walls, declared, prepared, takes, soldiers, talking, steps, intended, matters, appears, closed, gives, required, ladies, fixed, troops, camp, copies, v., running, cases, names.

If you want to have the list constrained to lemmas, let me know. Basically, let me know:

  • a) How many items you want
  • b) Whether you want to constrain to lemmas
  • c) To what location do you want the list delivered, like someone's talk page, some subpage, or the like

The process is rather simple, based on a dump. The key part is identifying English sections that do not contain "{{IPA". This is done using the following script find-missing-English-IPA.py:

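import re
import sys

entryStartFound = False  # True while inside an ==English== section
IPAFound = False         # True once "{{IPA" is seen in that section
title = ""
# Read the XML dump line by line; it is too large to load at once.
for line in open(sys.argv[1], encoding="utf-8"):
  line = line.rstrip()
  # Remember the title of the page currently being read.
  if "<title>" in line:
    title = re.sub(" *</?title> *", "", line)
  if entryStartFound:
    if "{{IPA" in line:
      IPAFound = True
    # "----" separates language sections; "</text>" ends the page.
    if "----" in line or "</text>" in line:
      entryStartFound = False
      if not IPAFound:
        print(title)
      IPAFound = False
  if "==English==" in line:
    entryStartFound = True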

The rest is intersecting the result with the frequency list in such a way that the result keeps the order of the frequency list. The process was as follows:

  • find-missing-English-IPA.py enwiktionary-20140728-pages-articles.xml >English-entries-with-no-IPA.txt
  • grep -Fx -f English-entries-with-no-IPA.txt frequency-list-English-PG-10000.txt >t.txt
    That's a set intersection, but the order of the files matters: the output follows the order of the frequency list.
  • head -100 t.txt
    Output the first 100 lines
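
For anyone without grep, the intersection step above can be sketched in Python. This is only an illustration, not the script that was actually used: the function name ordered_intersection is made up here, and it assumes both inputs are plain lists of words with one entry per line of the corresponding file.

```python
def ordered_intersection(frequency_words, missing_titles):
    """Keep the words of frequency_words that also occur in missing_titles.

    The result follows the order of frequency_words, which mirrors what
    `grep -Fx -f missing.txt frequency.txt` prints (whole-line, fixed-string
    matches, in the order of the second file).
    """
    missing = set(missing_titles)  # a set makes each membership test cheap
    return [word for word in frequency_words if word in missing]


# Tiny made-up example: the frequency list decides the order of the output.
sample = ordered_intersection(["the", "said", "no", "of"], ["no", "zebra", "said"])
print(sample[:100])  # slicing plays the role of `head -100`
```

Building the set from the list of missing entries rather than from the frequency list is what preserves the frequency order, which is the same reason the order of the two files matters for grep.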

You need Python, grep and head. You probably do not really need head, since you can pick the top 100 in your favorite editor. grep is used to do set intersection; if you have another method, you don't need grep. --Dan Polansky (talk) 10:34, 30 August 2015 (UTC)

By the way, English-entries-with-no-IPA.txt has 519,273 items. --Dan Polansky (talk) 10:36, 30 August 2015 (UTC)
The first two in your list above, "said" and "no", use {{audio-IPA}}, so they do have IPA pronunciations given. I bet several of the others in the list do, too. —Aɴɢʀ (talk) 11:55, 30 August 2015 (UTC)
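
One possible way to account for this is to make the check accept either template. The sketch below is an assumption-laden illustration rather than a change anyone has agreed on: the helper name has_ipa is hypothetical, and it assumes that {{IPA}} and {{audio-IPA}} are the only templates that should count as giving an IPA pronunciation.

```python
import re

# Assumption: a line counts as giving IPA if it opens either an {{IPA...}}
# or an {{audio-IPA...}} template. Other templates are not considered.
IPA_TEMPLATE = re.compile(r"\{\{(?:audio-)?IPA")


def has_ipa(line):
    """Return True if the line contains "{{IPA" or "{{audio-IPA"."""
    return IPA_TEMPLATE.search(line) is not None
```

In the script above, the test `"{{IPA" in line` could then be replaced by `has_ipa(line)`; entries such as "said" and "no" would presumably drop out of the list, though that has not been checked against the dump.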