Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!



August 2014

Normalised spellings and CFI

For some languages, we commonly respell the words into a common form. This is done for Old Norse, Old High German, Middle Dutch, and other old languages. As it is now, CFI does not actually allow for this practice, but I think it should be allowed. So we should probably codify this practice as an exemption. Something along the lines of "for languages for which a normalised spelling is adopted, the normalised spelling itself does not need to be attested, as long as there are unnormalised spellings of the same word that do meet CFI". —CodeCat 23:09, 3 August 2014 (UTC)

Support. That's definitely the case with Old Church Slavonic or Old Russian. Most quotations of these in modern Russian use modern Cyrillic letters, instead of old letters, which makes the terms in old spellings difficult to attest. --Anatoli T. (обсудить/вклад) 00:19, 4 August 2014 (UTC)
Support, and for each language that normalizes spellings, we would have to detail the normalization rules on its "WT:About X" page. We can consider a non-normalized spelling to attest its normalized spelling obtained by following the normalization rules that we have listed for the language. --WikiTiki89 13:38, 4 August 2014 (UTC)
I forgot to mention that quotations should always be added in the original non-normalized spellings whenever possible, and known non-normalized spellings should be listed in alternative forms sections. --WikiTiki89 13:53, 4 August 2014 (UTC)
  • Words should be added as they are spelled in attestation. This "normalized spelling" idiocy is another attempt to impose artificial uniformity where there is none, namely in attestations of all languages before the 19th century where there were usually no enforced rules of spelling. It will make Wiktionary completely useless as a resource because we would never know whether the added word was attested as such, or is a guessed transcription according to a scheme devised by some wiki nickname. Any kind of "normalized" spellings should be used strictly as redirects to real spellings. -Ivan Štambuk (talk) 13:46, 4 August 2014 (UTC)
    It does not impose uniformity, it merely makes it easier to find the entry where the definition is located. Also, all dictionaries do this, which nullifies your usual argument of breaking accepted conventions. --WikiTiki89 13:56, 4 August 2014 (UTC)
    If the point were in finding entries, then the normalized spellings themselves would be redirects, not the main entries. Instead, what is suggested is that all of the main entries be somehow normalized, regardless of how they are attested, containing all of the definitions, citations and so on for all of the spellings that they resolve to under some lossy scheme, and actually attested entries be soft redirects, and paradoxically listed as "alternative spellings" under the normalized entry (how can real attestations be alternative spellings to something made up?!)
    When it comes to ancient languages, all of the paper dictionaries have space constraints that require usage of a standardized spelling scheme to help look up entries. A single word could have a dozen different spellings. However, online dictionaries do not suffer from such limitations. We can have everything - the original script in Unicode and not Latin transliteration, citations, as well as a list of widely used scholarly transcriptions, normalization schemes, reconstructed pronunciations or whatever - but the latter not as full-blown entries, because they are not real words but reconstructions. --Ivan Štambuk (talk) 14:19, 4 August 2014 (UTC)
    Online dictionaries have screen real estate constraints as well. I would not like to picture what an inflection table would look like if it includes all attested variant spellings. --WikiTiki89 14:29, 4 August 2014 (UTC)
    For ancient languages inflection tables are not that important. People don't use them to learn to speak those languages (except maybe Latin and Sanskrit, but that's insignificant). For them, much more important points are accuracy and reliability. Inflection tables which only contain attested forms in their original spelling are much more important than inflection tables containing reconstructed forms that were possibly never attested in those spellings. It's a difference between Wiktionary as a serious reference work, and Wiktionary as a conlang community. --Ivan Štambuk (talk) 14:49, 4 August 2014 (UTC)
There's also the problem of unattested lemma forms. We have an entry for πρίαμαι (príamai), for example, but according to Liddell & Scott that particular form is not attested. That doesn't happen too often in Ancient Greek, which has an enormous corpus, but it happens very frequently in languages like Gothic and Old Irish. I've been creating entries for unattested lemmas in both of those languages, but I've been wondering if that's really such a good idea. Maybe we should put them in the Appendix namespace alongside other reconstructed forms. —Aɴɢʀ (talk) 14:54, 4 August 2014 (UTC)
If the lemma form can be easily determined, then I think it is the best place to define the term. We can note on the page that the lemma form is unattested and I guess it makes sense to be able to mark or remove the unattested inflected forms as well. Inflection tables are still very useful, especially when all or most of the forms are in fact attested. --WikiTiki89 15:24, 4 August 2014 (UTC)
Support. For Old Norse and Old High German, I expect that normalized spellings actually meet CFI, firstly because Norse texts are so regularly printed in normalized form, and secondly because print and online dictionaries (the former of which are sufficient verification, per CFI, of extinct or poorly-documented languages) invariably use normalized spellings. Another set of languages that already benefit from normalized spelling and would benefit further from having the practice codified are the indigenous languages of North America, which different dictionaries and text-collections have often used slightly different orthographies to represent. For instance, in many languages, some sources have represented long vowels with macrons (ā), or circumflexes (â), other sources have used trailing mid dots (·), and still others have used doubling (aa). Pace Ivan, I think it'd be hilariously nonsensical for e.g. one third of the inflected forms of a term or one third of a set of compounds that share a common element to use ā, while another third used aa, most of the rest used â and a few entries used a·, all because someone preferred to blindly copy and paste the idiosyncrasies of the different dictionaries the forms were attested in rather than think critically about them for a moment. - -sche (discuss) 17:51, 4 August 2014 (UTC)
To be fair, I don't think that Old Norse texts that are printed in normalized are allowed to attest the normalized spelling. The actual attestation should be of the original spelling(s) from when the language was still in use. --WikiTiki89 18:07, 4 August 2014 (UTC)
Paradoxically, those spellings are much harder to attest. It's much like the scripts of Gothic: it was written in Gothic script originally, but everyone "normalises" it into a transliterated form nowadays. —CodeCat 18:12, 4 August 2014 (UTC)
Yes, that is why if we allow entries at normalized spellings in CFI, we must remember that normalized texts attest the term, but not the spelling. --WikiTiki89 18:18, 4 August 2014 (UTC)
Those "normalized Gothic" texts are not attestations of Gothic language. That is not how the Gothic was written. Those are scholarly transcriptions made for scholarly purposes. They are equivalent to e.g. respelling any language in phonemic transcriptions. Nobody writes Gothic today. It's a dead language with small and fixed corpus. Those kind of transcriptions are not attestations of Gothic. --Ivan Štambuk (talk) 09:31, 5 August 2014 (UTC)
I disagree that only original editions of works can be cited; I don't see such a restriction in WT:CFI. Two editions are not independent of each other for the purposes of citing a single word/spelling (e.g. one can't cite both the American and British versions of Harry Potter and have them count as two citations of castle), but nothing I see prohibits citing different editions to confirm the existence of different words or spellings — e.g. citing an American edition of Harry Potter as a use of the word favor, even if JK Rowling's original used favour. The American edition is durably archived and verifiably uses favor several times (making clear that it isn't e.g. a typo). Likewise, the normalized editions of Norse texts are durably archived. (In most cases, they're far better archived and far more accessible — as copies exist in hundreds of libraries — than the original manuscripts, which are periodically destroyed by fires and in historical cases may even have been destroyed before any un-normalized editions of them were printed. But that's mostly superfluous to my point.) Consider also how many translations of the Bible have been cited to verify various words around here — CFI's prohibition against citing two "verbatim or near-verbatim quotations or translations of a single original source" only stops us from citing two editions of the Bible as citations of the same (spelling of a) word, it doesn't stop us from using two editions of the Bible as citations of two different words. - -sche (discuss) 20:17, 4 August 2014 (UTC)
What I meant was that a reproduction of an Old Norse text printed well after Old Norse died out cannot count as a citation of Old Norse. However, we can assume that an unnormalized reproduction reflects the original spelling and use it to attest spellings, and we can assume that a normalized reproduction does not necessarily reflect the spelling but still reflects the form of the word and we can use it to attest the term and its form, but not its spelling. --WikiTiki89 20:25, 4 August 2014 (UTC)
@-sche: "I think it'd be hilariously nonsensical for e.g. one third of the inflected forms of a term or one third of a set of compounds that share a common element to use ā, while another third used aa, most of the rest used â and a few entries used a·" - Indeed it would be nonsensical from the perspective of someone who imagines that Ancient Greek, Gothic, Sanskrit, Old Church Slavonic, Akkadian, Hittite, Old High German, Middle Persian, Old French and others were written by a single and unified speech community, who spoke a single language in a single point in time, as opposed to being spoken and written across many centuries (often millennia) by diverse communities who never knew each other, who wrote in ill-fitting lossy scripts under the influence of traditional orthography not necessarily reflecting actually spoken sounds, and languages of documents X and Y which are today treated as parts of a single ancient language X would on any other occasion be treated as two completely separate languages, were they attested today.
I'm receptive to the idea of having both 1) reconstructed, template-generated inflection fitting some "idealized" model of a language as well as 2) listing only actually attested forms (I believe Old Irish conjugation and Old Persian declensions currently do that). But simply ignoring all of the variation in order to fit them into some kind of imaginary order is a disservice to any serious potential users of Wiktionary. The only ones who would benefit from that would be non-serious users who could then claim that they "learned" some ancient language as presented by Wiktionary, even though such language never existed in the form it is being presented. It would be similar to many of our protolanguage inflection templates that present some kind of ridiculous Stammbaum-like picture of parent language dissolution reflecting a POV of a single linguist, which never existed as such.
Regarding the barely documented indigenous languages - they are a separate category. They are usually a living thing, and if one scholar uses â and another ā to represent what is indisputably the same sound, it makes sense to standardize on the most common notation and use others as redirects. But if some ancient language uses three different symbols for the [a:] sound, we cannot standardize it on anything because we don't have a clue whether those symbols meant the same thing (even though some, but not all, think they did). We can't make that kind of value judgments. If the original documents are still being published in facsimile editions, it means that no normalization is possible. There could be exceptions - e.g. Gothic with a tiny corpus and a small number of authors (one, is it? Ignoring Crimean Gothic). But for the majority it's not practical at all.
Anyway, this should all be discussed on an individual language basis. --Ivan Štambuk (talk) 10:41, 5 August 2014 (UTC)

In modern languages too, such as French, there may be normalized (recommended) spellings, and it's sometimes very difficult or impossible to find attestations for these normalized spellings. When there is an official recommendation, I think that they should be includable. For old languages, the issue is more difficult. Of course, they should be included when attested (even when the old spelling cannot be found), but not considered as the main entry (the other entry should be as complete as the normalized one). If a (normalized or old) spelling is included even when it seems to be unattested, the fact that no attestation has been found should be made very clear in the entry. Lmaltier (talk) 18:20, 4 August 2014 (UTC)
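A rough sketch of how normalization rules listed on a language's "WT:About X" page might be applied mechanically, using the long-vowel notations mentioned above (â, a· and aa all mapped to ā). The rule table, the function names and the sample strings are hypothetical, only the vowel "a" is covered, and the doubling rule would misfire in any language where a doubled vowel is not simply a long vowel. The attested spellings are kept alongside the normalized one so that they can still be listed as alternative forms.

```python
# Hypothetical normalization rules for one language; not taken from any
# actual "WT:About X" page. Each attested notation maps to the macron form.
LONG_VOWEL_RULES = {
    "â": "ā",   # circumflex
    "a·": "ā",  # trailing mid dot
    "aa": "ā",  # doubling
}


def normalize(spelling):
    """Return the normalized entry name for an attested spelling."""
    for attested, normalized in LONG_VOWEL_RULES.items():
        spelling = spelling.replace(attested, normalized)
    return spelling


def group_by_normalized(attested_spellings):
    """Group attested spellings under the normalized name they resolve to."""
    groups = {}
    for spelling in attested_spellings:
        groups.setdefault(normalize(spelling), []).append(spelling)
    return groups


# Made-up strings, used only to exercise the rules.
print(group_by_normalized(["maa", "mâ", "ma·", "mā"]))
```

Under these assumptions all four made-up spellings end up under a single normalized name, which is the situation the proposed CFI exemption is meant to cover.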

Unattested lemma forms and CFI

Kind of a spinoff based on what Angr brought up. Currently, the common practice is to reconstruct the lemma form if it is not attested, and place the entry there. If several lemmas are possible, we generally include them all and choose one at random. This practice is primarily done with old languages, but it's easily conceivable that it could happen to modern languages as well. For example, if all we have for a particular English lemma is two attestations of fonges, one of fonging and one of fonged, then I doubt we would put the main entry at one of those entries. We'd put it at fonge, even though it's not attested. CFI doesn't say anything about this practice, but as it's so widespread both on Wiktionary and outside it, I think we should clarify and codify it. —CodeCat 18:18, 4 August 2014 (UTC)

I don't think English verbs are the best example of the phenomenon. I have worked on and observed cases where we have -ing forms and -ed forms as distinct entries, but do not have the presumed verb lemma form. In the absence of the lemma, the -ing form is often shown as noun and/or adjective and the -ed form as adjective. This seems to actually be a fairly common evolution, with the base and -s forms coming well after the -ing and -ed forms, if indeed they ever materialize in use. I cannot recall specific cases, but, if it is important, instances could probably be found. The best way would be by extracting the cases from the dump. DCDuring TALK 18:37, 4 August 2014 (UTC)
I did include the 3rd person singular present as one of the attested forms in my example, and the past form could include the past tense as well. —CodeCat 18:50, 4 August 2014 (UTC)
So we're both operating without real cases. I'll rejoin the discussion when someone, possibly me, has a real case. DCDuring TALK 20:04, 4 August 2014 (UTC)
Here is a real case: I cannot find the French verb arsenicaliser in its lemma form, but I can find it in a conjugated form: "Le praticien qui a le plus arsenicalisé le monde et dont l’expérience a le plus d’extension, de richesse et de certitude, M. Boudin, préfère actuellement l’acide arsénieux et se tient exclusivement à lui dans tous les cas : (…)" (Annales de la Société de médecine de Lyon, 1851) (it's indisputably a verb in this sentence) or "Nickel minéralisé par le fer & le cobolt sulphurés & arsenicalisés ;" (François Rozier, Jean André Mongez, Jean-Claude de La Métherie, Journal de physique, de chimie, d’histoire naturelle et des arts, 1777). Lmaltier (talk) 20:28, 4 August 2014 (UTC)
Here are some more real cases: Passargisch, ostweserisch, and several of the other adjectives in Category:German terms with rare senses. - -sche (discuss) 22:23, 4 August 2014 (UTC)
There are plenty of real cases from extinct languages; I already brought up πρίαμαι (príamai), which itself is not attested, but other forms of it are (see [1]). Old Irish examples include ad·gnin, ailid, and claidid. —Aɴɢʀ (talk) 22:32, 4 August 2014 (UTC)
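The fonges / fonging / fonged example above can be sketched in code: each attested form is stripped of a possible suffix, and the lemma candidates shared by all forms are kept. The suffix table is a deliberately crude, hypothetical model of English verb inflection, not a description of any existing Wiktionary module.

```python
# Hypothetical, much simplified English verb endings. Each inflectional
# suffix maps to the endings a lemma may have had before the suffix.
SUFFIX_RULES = {
    "ing": ["", "e"],  # fonging <- fong + ing, or fonge (dropping -e) + ing
    "ed": ["", "e"],   # fonged  <- fong + ed,  or fonge + d
    "es": [""],        # fonges  <- fong + es
    "s": [""],         # fonges  <- fonge + s
}


def lemma_candidates(form):
    """Return every lemma that could have produced this inflected form."""
    found = set()
    for suffix, endings in SUFFIX_RULES.items():
        if form.endswith(suffix) and len(form) > len(suffix):
            stem = form[: -len(suffix)]
            found.update(stem + ending for ending in endings)
    return found


def shared_lemmas(forms):
    """Return the lemma candidates compatible with all attested forms."""
    candidate_sets = [lemma_candidates(form) for form in forms]
    return set.intersection(*candidate_sets) if candidate_sets else set()


print(sorted(shared_lemmas(["fonges", "fonging", "fonged"])))
```

With these rules both fong and fonge survive, which matches the remark that several lemmas may be possible; choosing between them is an editorial decision that the sketch does not attempt.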

Need for entries or a field in the 'create a new entry page' that may just be cross references for all spellings.

There is a requirement for entries that may just be cross references, for all spellings that use characters that are not in the usual Romanised character set.

Just going to the <create a new entry> page is very frustrating. The cross reference might be an extra field on the <create a new entry> page, with the title something like <have you viewed ...>.

Repeatedly I am on a page but cannot search for it to get back to it or to get to similar pages in the index to get back to it.

<kephalḗ> is the Romanisation of <κεφαλή>, but you cannot search for <κεφαλή> using <kephal ...>. It is often very difficult to imagine what you need to do to get back to a page that you have accessed when using the etymology, especially.

I expect to be at Wikimania on Wednesday ...

Genevieve Hibbs
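One way the cross-reference idea above could work is sketched here: romanizations are reduced to a diacritic-free search key, so that typing "kephal" finds κεφαλή. The cross-reference table and the function names are hypothetical; nothing here describes how the Wiktionary search actually behaves.

```python
import unicodedata


def search_key(text):
    """Lower-case the text and strip combining diacritics (e.g. ḗ -> e)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical cross-reference data: romanization -> entry title.
ROMANIZATIONS = {"kephalḗ": "κεφαλή"}


def find_entries(query, romanizations=ROMANIZATIONS):
    """Return entry titles whose romanization starts with the typed query."""
    key = search_key(query)
    return [
        title
        for roman, title in romanizations.items()
        if search_key(roman).startswith(key)
    ]


print(find_entries("kephal"))
```

Matching on the stripped key means the reader does not need to know how to type the macron and accent of the romanization.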

Tocharian question

@Ivan Štambuk: and @Word dewd544: in particular since they seem to be our most prolific Tocharian editors: I see from kuse that the vowel letters that are normally represented as subscripts are represented by full letters in the entry name, but the headword line shows the subscript (in this case, kᵤse). Is this the best way to do this? "Kuse" and "kᵤse" correspond to two different spellings in the original script, don't they? If and when Unicode finally provides the Tocharian alphabet, we will presumably want to move our entries to forms written in the native script (hopefully retaining the Latin-alphabet entries as "Romanizations of..."), and if we want to do that by bot, it would be good to have entries under unambiguous names. Shouldn't the Tocharian B section of [[kuse]] be moved to [[kᵤse]] instead? The only problem I foresee is that sometimes it's "ä" that's subscript, and Unicode doesn't have a character for subscript "ä". For those cases maybe we could cheat and use "ₔ" instead. What do y'all (and anyone else interested) think? —Aɴɢʀ (talk) 14:09, 5 August 2014 (UTC)

Yes, it should be moved. I wasn't even aware that the subscript u sign <ᵤ> existed in Unicode until now. --Ivan Štambuk (talk) 15:02, 5 August 2014 (UTC)
And are you OK with using "ₔ" for "ä"? Are there even any entries that currently call for that? —Aɴɢʀ (talk) 15:17, 5 August 2014 (UTC)
I'm fine with that. These issues should best be discussed on the about-page for Tocharian. We only have a few hundred Tocharian entries, and they need to be rechecked and referenced in any case. Unicode support doesn't seem to be coming anytime soon. If you feel like doing that, just knock yourself out... --Ivan Štambuk (talk) 16:21, 5 August 2014 (UTC)
We don't have an about-page for Tocharian. I don't have the resources to recheck and reference the Tocharian entries, but if I happen to see any subscripts in headword lines, I'm happy to move the info to a new entry name. —Aɴɢʀ (talk) 16:33, 5 August 2014 (UTC)
Yeah, I agree that it should be moved as well. I think the reason I didn't use those characters originally was because I didn't know the others actually existed for use on here, and also just made them based on the format that was used for the few existing entries, from what I remember. But this way is better. Word dewd544 (talk) 13:52, 13 August 2014 (UTC)
Cool. Is [[kᵤse]] the only one? —Aɴɢʀ (talk) 14:31, 13 August 2014 (UTC)
OK, I've moved the Tocharian B section to [[kᵤse]] and, I believe, fixed all the links that were pointing to it. I looked through the lemma categories of both Toch. languages and couldn't find any others with subscript vowels, but I may have overlooked something. —Aɴɢʀ (talk) 10:48, 7 August 2014 (UTC)
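For anyone wanting to check for further cases by script rather than by eye, a minimal sketch follows. It maps the two subscript characters discussed above back to full letters, so that a title such as kᵤse can be paired with the full-letter title it would have been created under. The function names are hypothetical, and treating ₔ as a stand-in for subscript "ä" is the convention proposed in this thread, not a Unicode property.

```python
# The two subscript characters mentioned in the thread.
SUBSCRIPT_TO_FULL = {
    "\u1D64": "u",  # ᵤ LATIN SUBSCRIPT SMALL LETTER U
    "\u2094": "ä",  # ₔ LATIN SUBSCRIPT SMALL LETTER SCHWA, used here for subscript ä
}


def has_subscript_vowel(title):
    """Report whether a title contains one of the subscript vowel characters."""
    return any(ch in SUBSCRIPT_TO_FULL for ch in title)


def full_letter_title(title):
    """Return the title with subscript vowels written as full letters."""
    return "".join(SUBSCRIPT_TO_FULL.get(ch, ch) for ch in title)


print(full_letter_title("k\u1D64se"))
```

Run over a list of headword-line forms, `has_subscript_vowel` would flag candidates for a move and `full_letter_title` would give the page name to look for.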

Representing Old Irish "tense" sonorants

Anyone interested in Celtic languages or IPA transliteration (or both, or anyone who just wants to put their oar in) is invited to join the discussion I've just started at Appendix talk:Old Irish pronunciation#Representing the tense sonorants. —Aɴɢʀ (talk) 01:25, 9 August 2014 (UTC)

Lists of dictionary headwords

Are they subject to copyright? I am interested in creating appendices containing lists of headwords of some notable dictionaries that are still under copyright, as well as some additional information not contained in them. --Ivan Štambuk (talk) 20:31, 9 August 2014 (UTC)

Compiling your own list is not copyrighted as far as I know. —CodeCat 20:32, 9 August 2014 (UTC)
Copyright is a matter subject to interpretation anyway... So let's not give a shit about it --Fsojic (talk) 20:44, 9 August 2014 (UTC)
But I don't want my own list. I want enhanced lists of words or reconstructions exclusively from certain works so that the experience of browsing them could be simulated by clicking. Additionally, references which refer to them could back-link to such lists. It could also be good for verification and inspection of coverage. I prefer lists and tabular presentation over categories. I recall a discussion a while back about Brian's hotlist which was kept, so I suppose it's not a big deal. But such lists would be exposed outside userspace, and that seems a bit more problematic, so I'm asking if it could be prohibited for some reason. --Ivan Štambuk (talk) 21:08, 9 August 2014 (UTC)
Intellectual property lawyer hat on. The namespace that a list appears in is irrelevant to copyright law. As far as I recall, Brian's hotlist is a compilation of headwords from several different dictionaries, and therefore can not identifiably impinge on the copyright of any one of them. I think that it would be problematic, at least, to list the headwords of a specified edition of a specified, in-copyright, printed dictionary. Such a list of words defined reflects the editorial judgment of the dictionary's authors, and is therefore likely to be covered by copyright. Doing so for one that was out of copyright would be fine. A possible workaround would be to make one set of lists of words defined in out-of-copyright versions of specified dictionaries, and a separate list containing a combination of words not defined in the out-of-copyright versions (which will basically be words that are new since their publication) but which are defined in unspecified "major dictionaries". bd2412 T 22:01, 9 August 2014 (UTC)
Interesting. Thank you very much for this explanation. --Dan Polansky (talk) 09:20, 10 August 2014 (UTC)

Can we automate Hiragana and Katakana transliteration?

I see we don't currently have automatic transliteration of Hiragana and Katakana. Is there a technical reason why we can't, or is it just that no one's gotten around to it yet? —Aɴɢʀ (talk) 11:02, 10 August 2014 (UTC)

It's possible, but it would give incorrect results when they are mixed with Kanji. So the module would have to check for the presence of Kanji characters and return nothing if found. —CodeCat 12:09, 10 August 2014 (UTC)
Mixed terms should rely on kana, e.g. 勉強する and 電子メール should use kana spellings べんきょうする (benkyō suru) and でんしメール (denshi mēru), if it only transliterated the hiragana/katakana part する and メール, it would be a mess. --Anatoli T. (обсудить/вклад) 01:00, 11 August 2014 (UTC)
Would that be a lot of work? Obviously we shouldn't do it if it means listing every single one of the 6.3 kilosagans of possible Kanji characters, but if it can be done with less than 100 characters of code, why not? —Aɴɢʀ (talk) 12:19, 10 August 2014 (UTC)
Module:ja already does it (in Japanese headwords). It's not implemented in link templates, as the transliteration may be incorrect. Wyang (talk) 23:53, 10 August 2014 (UTC)
The automatic transliteration is used in Japanese entries and usexes ({{ja-usex}}) and some other templates. It's only not used in translations. For this to happen, the translations would need to follow the same format as entries, using spaces in multipart words or phrases with particles, capitalisation (forced with symbol ^ or automatic on proper nouns). Besides, many kanji translations don't have hiragana, which is needed for transliterations to happen. --Anatoli T. (обсудить/вклад) 00:02, 11 August 2014 (UTC)
The other challenge is that Japanese transliteration is somewhat context-driven: as I said, proper nouns (which excludes language names, demonyms, month names, weekdays) are capitalised; verbs with final おう are transliterated as "-ou" rather than "ō"; there are cases when morphemes need to be separated ("." is used in entries); and the particles は and へ are "wa" and "e" rather than "ha" and "he". --Anatoli T. (обсудить/вклад) 00:56, 11 August 2014 (UTC)
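The check suggested above (return nothing when Kanji are present) can be sketched in a few lines. This is not how Module:ja works; the Kanji ranges are an assumed, incomplete definition, the kana table is a tiny sample, and none of the context-dependent rules (capitalisation, long vowels, the particles は and へ) are handled.

```python
import re

# CJK Unified Ideographs plus Extension A: an assumed, incomplete notion of "Kanji".
KANJI = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF]")

# Deliberately tiny sample table; a real one would cover all kana.
KANA_TO_ROMAJI = {"す": "su", "る": "ru", "で": "de", "ん": "n", "し": "shi"}


def transliterate_kana(text):
    """Transliterate kana-only text; return None if Kanji or unknown characters occur."""
    if KANJI.search(text):
        return None
    try:
        return "".join(KANA_TO_ROMAJI[ch] for ch in text)
    except KeyError:
        return None


print(transliterate_kana("する"))
print(transliterate_kana("勉強する"))
```

Returning None rather than a partial result avoids the mess described above, where only the する of 勉強する would be transliterated.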

Block policy clarification

The current blocking policy page WT:BLOCK seems misleading. I propose to reduce the page content to the following wikitext:

:''See also '''[[Help:Interacting with humans]]'''''

# The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary.
# It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed.

===See also===
* [[Wiktionary:Range blocks]] - when and how to block a range of IP addresses
* [[Wiktionary:Vandalism in progress]] ([[WT:VIP]]) for currently occurring or very recent vandalism
* [[Wiktionary:Vandalism]] (or [[WT:VANDAL]]) for vandalism of Wiktionary in general

As per Wiktionary:Votes/pl-2010-01/New blocking policy, the above text is the only binding part of the page.

Note that I placed "policy-CFIELE" there, so that the criteria for further modification of this page should be identical to those of CFI and ELE.

What do you think? --Dan Polansky (talk) 12:06, 10 August 2014 (UTC)

Special:Abusefilter is supposed to filter enough to allow us to talk with the remaining editors if they're wrong, and so encourage them toward perfection. Many valuable professionals could post their personal site as a reference out of ignorance; we can't treat all of them as incorrigible spammers, as that would contravene WT:Be bold.
Moreover, I'm still considering that if the current WT:BLOCK had been applied with my known Wikimedia bot (3 million edits and 21 flags), the blocker wouldn't have to refuse to assume any hurried arbitrary decision. I saw too much waste because of friendly fire in the past.
That's why leaving a message was a sine qua non condition before indefinitely barring someone from the open wiki. JackPotte (talk) 13:10, 10 August 2014 (UTC)
Apart from that we could also recruit more patrollers, for example by giving this status to them automatically after 500 edits, like on the French Wikipedia. JackPotte (talk) 16:24, 11 August 2014 (UTC)


Template:pedia was redirected from one page to another earlier today, resulting in a number of pages being broken. Instead of Template:pedia redirecting to Template:projectlink/Wikipedia, I request that Template:projectlink/Wikipedia redirect to Template:pedia instead. Since Template:pedia is linked to from thousands of pages (https://en.wiktionary.org/w/index.php?title=Special:WhatLinksHere/Template:pedia&namespace=0&limit=5000), it seems the more likely target. Purplebackpack89 23:56, 11 August 2014 (UTC)

Support it being at Template:pedia
  1. Purplebackpack89 23:56, 11 August 2014 (UTC)
Support it being at Template:projectlink/Wikipedia

Did you even look at the page history? Template:pedia has been a redirect since 2007. —CodeCat 23:58, 11 August 2014 (UTC)

And when you moved it earlier today, it wasn't working on the pages I looked at. It shouldn't have been moved by you earlier today, and you shouldn't have deleted the page you did. You also shouldn't have edit-warred, and you should have provided better edit summaries. Purplebackpack89 00:02, 12 August 2014 (UTC)
That's because you reverted my move while I was still in the process of updating all the redirects to point to the new location. Your revert actually broke the template altogether because it ended up pointing to a deleted page. You should have taken more care before making changes when you didn't know what you were doing. You should also have taken more care to get the facts clear before posting erroneous and misinformed "polls" like you did, which do nothing but embarrass you and waste the time of other editors who have better things to do than deal with you. —CodeCat 00:05, 12 August 2014 (UTC)
You coulda saved yourself the work of not updating all redirects by not making the move in the first place. Nothing will convince me that it was a good idea to make that move. Heck, things would have worked just fine if you'd let Template:pedia have the full text it did in my last edit. Nobody will ever use Template:projectlink/Wikipedia, because just adding Template:pedia is so much easier. Why don't we just have Template:projectlink/Wikipedia redirect to Template:pedia? Everything would be so much simpler that way Purplebackpack89 00:24, 12 August 2014 (UTC)
Template:projectlink/Wikipedia is not meant to be used directly in entries anyway. Rather it's meant to be used through {{projectlink}}, which supports many other projects. {{pedia}} is just a remnant from before it was converted to {{projectlink}} back in 2007. All the projectlink pages are named beginning with PL:, including Template:PL:pedia, which Template:pedia was originally a redirect to. All I did was move Template:PL:pedia to Template:projectlink/Wikipedia. I am intending to move all the other PL: templates too, as they are properly subtemplates of Template:projectlink and are only meant to be used in conjunction with it. Having them as subpages makes that relationship more clear. I really don't understand why you are making such drama out of it. —CodeCat 00:34, 12 August 2014 (UTC)
Because you broke pages, and they wouldn't have been broken if you hadn't messed around with the template. You probably shouldn't have deleted Template:PL:pedia either. Purplebackpack89 00:42, 12 August 2014 (UTC)
It has no transclusions, so why would we keep it? It's useless. —CodeCat 00:47, 12 August 2014 (UTC)
  • By the way, it was COI for CodeCat to protect a page she was edit-warring on. For the life of me, I don't understand why CodeCat is still an administrator. She edit-wars frequently, she rarely explains what she's doing, and she protects things she's engaged in edit wars on. Purplebackpack89 00:28, 12 August 2014 (UTC)
    • You're just looking for reasons to be right when I've already countered your other arguments. You're pretty much pulling the idea that I was edit warring out of your hat in an attempt to put me in a bad light while excusing yourself. If someone breaks things or makes other bad edits repeatedly, there is nothing wrong with edit warring. It's just un-breaking the wiki. Imagine if we had to start a discussion whenever someone kept re-inserting "poop" into an entry. It would be ridiculous! —CodeCat 00:34, 12 August 2014 (UTC)
      • For starters, the last edit I made to Template:pedia wasn't a bad edit. I want you to look closely at it before calling it a bad edit. Secondly, I was acting in good faith trying to restore a template that was showing up as broken on a page. Somebody who inserts "poop" into a page is vandalizing. In one of those cases, it is acceptable to edit-war. In the other, it isn't. If you don't understand which is which, and you think it's OK to edit-war to revert good-faith edits without even an edit summary explaining why you did what you did, then you have no business being an admin. Purplebackpack89 00:42, 12 August 2014 (UTC)
        • To be fair, I think very little of what you do on Wiktionary is truly good faith. You mostly get on people's nerves and are obstructive almost on principle, and people have said so many times in the past. You've even driven away other valued and productive editors with your behaviour. So if I shouldn't be an admin, then I suggest you shouldn't be on Wiktionary at all. —CodeCat 00:47, 12 August 2014 (UTC)
          • I resent your accusation. I tried to fix that template because it was showing up as broken, not to piss you off. I vote keep at RfD because I believe the project would be improved with more articles, not to piss Mglovesfun off. Every mainspace and RfD edit I make is in good faith and with a view to improving the Wiktionary Purplebackpack89 00:50, 12 August 2014 (UTC)
            • Of course, but so are all of my edits. :) I never said that you did it to piss me off. That's not what the other editors who have complained said either. But good faith edits are not equal to good edits, and are therefore not exempt from being reverted. Having good intentions also doesn't prevent you from getting on people's nerves. A while ago there was User:KYPark who kept inserting rather outlandish etymologies at WT:ES, and would get very philosophical about the ideas while not really contributing or making any kind of point. He got upset when we started moving them to his userspace because he didn't understand that it didn't belong there, and after his behaviour continued for about a year or so, he got blocked, I think even several times. There was no discussion about a block, but nobody really minded that he was blocked because he had annoyed and frustrated so many people that nobody was willing to stand up for keeping him. They were glad he was finally gone. The reason I am telling all this is that something similar may eventually happen to you as well. You would do well to try to be a friend of the larger Wiktionary community, because all the good faith in the world will not help you if they are fed up. —CodeCat 01:02, 12 August 2014 (UTC)
From the edit history of Template:pedia it is apparent that Purplebackpack's starting assumptions (since amended) are mistaken. Template:pedia was not moved; it has been a redirect since 2007. And it was not CodeCat's updating of the redirect target but Purplebackpack's revert of that which seems to have broken some existing uses during the update that was being made. Purplebackpack's subsequent unilateral insertion of thousands of bytes of duplicated code also created quite a mess. Purplebackpack says "I was acting in good faith". As Wikipedia observes at w:WP:CIR, "[some users] believe that good faith is all that is required to be a useful contributor. Sadly, this is not the case at all. Competence is required as well. A mess created in a sincere effort to help is still a mess." - -sche (discuss) 01:07, 12 August 2014 (UTC)
  • Explain how the code insertion created a mess. Purplebackpack89 01:10, 12 August 2014 (UTC)
  • Also, there are certain things that acting in good faith entitles you to. One of them is a clear explanation when you are reverted. CodeCat did not give one. Purplebackpack89 01:12, 12 August 2014 (UTC)
    • I was more focused on undoing the damage than on giving an explanation. Fixing thousands of entries had a higher priority to me than satisfying one user. —CodeCat 01:20, 12 August 2014 (UTC)
      • Maybe you shouldn't have broken them, then... FWIW, CIR isn't policy here or even on Wikipedia, it's merely an essay, and it's a bad idea, because it flies in the face of being BOLD, and taking chances with edits. It's also walking too fine a line, because it's impossible to understand why a particular editor did a particular edit. Finally, it requires a level of communication that is present on Wikipedia but not on Wiktionary; Wikipedia not only has fewer things that can be broken (since they don't use as many templates and lack a rigidity of article structure), it also is better at explaining to editors what's wrong. Purplebackpack89 02:57, 12 August 2014 (UTC)
      • Furthermore, @CodeCat:, your attitude that explaining your edits to other editors is of little or no import is disheartening, to say nothing of being wrong. You complain about me being hard-headed, but I've mentioned this to you at least half a dozen times, and other editors have mentioned it as well, and you've ignored them. It's very disingenuous for you to make a CIR-based argument when you have not been forthcoming about why you're right. Purplebackpack89 04:36, 12 August 2014 (UTC)
        • (edit conflict) CodeCat didn't break anything- you did. You assumed bad faith, and didn't bother to ask or investigate. I'm not going to apologize for CodeCat- sometimes I vehemently disagree with her actions, and I've done my share of griping about it. I've even reverted a few of her edits- but only when things were seriously broken and she wasn't around to fix them, and only after carefully analyzing everything to make sure I wasn't going to make things worse.
        • You see, normal people would post a complaint on her talk page or in the forums first and demand to know why she was doing it. You, on the other hand, know better than everyone else and reserve the right to unilaterally step in and take over any time it sort of looks like someone might be doing something wrong- shoot first, and ask questions later. And then, when it's demonstrated that you were mistaken, you don't admit you were wrong, you don't apologize- no, you attack the person you interfered with for not explaining things so even you could understand. After all, you never make mistakes- the only way you could ever be wrong is if someone else misleads you into being wrong. Chuck Entz (talk) 07:54, 12 August 2014 (UTC)
          • Chuck, I saw something was broken and tried to fix it. I did that in good faith, and felt I was owed an explanation for why my edits were wrong. Purplebackpack89 14:07, 12 August 2014 (UTC)
  • Now as before, I find the CodeCat pattern of discussion-free and summary-free edits to infrastructure objectionable. CodeCat hardly ever explains themselves, but requires explanation for opposition to their edits. CodeCat lacks the maturity to understand that excessive change with little added value is bad. --Dan Polansky (talk) 07:11, 12 August 2014 (UTC)
  • PBP, for this and other edit-warring incidents you have been stripped of rollback and autopatrolled privileges. The latter increases the chance that someone without "COI" will notice any disputes with you, so you should be thankful, really. Further misbehaviour will be met by a block. Keφr 08:10, 12 August 2014 (UTC)
    This should be immediately undone, since the PBP vs. CodeCat incident had nothing to do with autopatrolling and rollback flags. Especially the autopatrolling should be returned, since the mainspace edits of PBP are largely undisputed, and removing the flag will increase patrolling cost to the patrollers. Furthermore, the threat of a block is inappropriate, since PBP was edit warring with CodeCat on a page which CodeCat edited without consensus; a block or desysopping of CodeCat could be in order, given the long-term pattern of their editing behavior. --Dan Polansky (talk) 08:31, 12 August 2014 (UTC)
    Flags restored. We can't just punish one party in a conflict, especially considering that the reverts were (perceived as) legitimate, and that there is no pattern of (perceived) abuse of those flags. --Ivan Štambuk (talk) 08:55, 12 August 2014 (UTC)
    The reverts were perceived as legitimate by whom? Keφr 09:07, 12 August 2014 (UTC)
    By them, obviously, otherwise they wouldn't have done them. --Ivan Štambuk (talk) 12:33, 12 August 2014 (UTC)
    I stand by what I did. PBP may not have used the rollback button here, but the repeated combative and misinformed edit-warring is evidence that he cannot be trusted with it. Patrolling burden should not be a problem; PBP made one edit yesterday, two the previous day, three edits two days ago, and the previous 13 edits were on the 3rd of August, so his edits are quite infrequent. However, the few edits he makes do need attention apparently. In my opinion a block is not only appropriate, but long overdue. This is not just a single incident, and PBP refusing to learn (from past mistakes and from everything else) is a huge red flag. And nothing prevents you from starting a desysopping vote. Keφr 09:01, 12 August 2014 (UTC)
    In Wiktionary:Beer_parlour/2014/June#Purplebackpack89, in the hidden section "Rights removal", two editors supported flag removal while four editors opposed, two of whom explained that since PBP has not abused the flags, they should not be removed. Again: since the editor has not abused the flags, they should not be removed. Furthermore, nowhere in this thread have you noticed that CodeCat refuses to learn. You have singled out the fairly harmless PBP, and conveniently ignored the editor who by my lights has caused actual damage in the mainspace, unlike PBP whose only damage is drama threads in Beer parlour, a fairly unimportant thing. In this very thread, the drama was sustained by CodeCat, who continued to respond to PBP posts. But again, the drama itself is fairly harmless, an attribute of an open wiki where people can actually speak up. --Dan Polansky (talk) 09:17, 12 August 2014 (UTC)
    Wasting people's time on futile discussions is not "fairly harmless". And again, if you think CodeCat's actions are so egregious, what are you waiting for? For any punishment to be effective, CodeCat needs to be desysopped first. Keφr 09:35, 12 August 2014 (UTC)
    There is no sound desysopping process. That is why Ruakh left before he would have to deal with CodeCat in this environment. The only desysopping process that we tried relied on the 2/3-supermajority consensus for desysopping. And this of course enables CodeCat to perform mass changes with unclear support, possibly even less than plain majority support, and be fairly sure they will not get desysopped, since there probably is something like 45% or more of supporters of what they are doing; I have invented the 45% number and I do not really know the scope of support for their various changes. --Dan Polansky (talk) 09:46, 12 August 2014 (UTC)
    "Nowhere in this thread have you noticed that CodeCat refuses to learn." What she has refused to learn is that the editing process would be a helluvalot easier for everybody concerned if she used edit summaries. She has repeatedly refused to even consider doing so. Purplebackpack89 14:01, 12 August 2014 (UTC)
As annoyed as I am with PBP's behavior and attendant whining in this incident, I disagree with removing his flags over it. I simply don't see the relevance. This reminds me a lot of the whole Gtroy/Acdcrocks/LuciferWildcat affair: in that case, improper harassment generated enough sympathy that he was able to continue with his prolific creation of subpar and often fabricated content far longer than he would have otherwise. Chuck Entz (talk) 13:57, 12 August 2014 (UTC)
Are you suggesting we should repeat the same mistake by letting him loose? As you see, there is enough strife in this community without stubborn ignoramuses adding to it. Keφr 14:29, 12 August 2014 (UTC)
  • One thing's for sure: Kephir overstepped his bounds with his removal of rights. His "beef" against me has translated into HOUNDing and irrational admin actions, and this after I told him multiple times that interacting with me is unproductive. I am very close to considering he be forbidden from interacting with me for the good of the community. Purplebackpack89 14:07, 12 August 2014 (UTC)

Practically this whole discussion is making me facepalm...stop getting so bent out of shape over such piddly little things... To precounter any (slightly) likely accusations, I'm not taking sides here.

  • PBP: You thought CodeCat fucked things up, but in reality things only got messed up because you didn't let them finish the redirections and such that they were doing. Accept that you made a mistake and move on. I get that this was a somewhat more widely used/high profile, etc template not some obscure thing but maybe instead of, as Chuck said, assuming bad faith you should have posted to CodeCat's page to the effect of "Do you realise you broke this thing?" (since that's what you thought happened) before reverting. User: PalkiaX50 talk to meh 14:59, 12 August 2014 (UTC)
Where do you get the idea I was assuming bad faith? And you guys fail to acknowledge that CodeCat bears some responsibility for not initially communicating why she did what she did. Purplebackpack89 16:22, 12 August 2014 (UTC)

Lots of errors in Old French nouns (and in verbs too, before I fixed them)

(This is a bit of a rant. No offense intended to whoever created all the mistakes ... maybe User:Mglovesfun?)

I notice a bunch of mistakes in Old French noun declension. Of the first 4 words I checked out, 3 had incorrect declensions.

  • seror is mistakenly listed under suer; suer is the nom. sg. and seror the obl. sg. but WT has them reversed.
  • empereor is the std form, but WT claims that empereür is standard and redirects the former to the latter when it should be reversed; it also messes up the nom. sg. (should be emperere not empereres) and obl. pl. (should be empereürs not empereres).
  • ameor has the same declension as empereor (nom. amere - ameor, obl. ameor - ameors) but is listed with a totally different declension, broken in a different way from empereür.

A little more looking reveals

  • chanteor has the same declension as ameor and empereor but is listed with a messed-up declension that is different from both the messed-up declensions of ameor and empereor/empereür; what a mess. Its etymology is also broken ... it lists a mistaken cantor instead of cantātor.
  • BTW empereür's etym. is slightly messed up, listing a non-existent Latin word imperātōr with a stray long mark over the o.
  • chaceor, robeor, troveor have the mistaken declension of ameor.
  • compaignon and felon should have similar declensions; both are broken, each differently from the other.
  • nonain and *ante (should be antain) again should have similar declensions and are broken, each differently from the other.

I'm sure there are tons more. How did this get so messed up?

BTW, the Old French verb conjugations were utterly messed up, too, and full of wrong-way redirects as well, but I've put a lot of work into fixing them.

I'd suggest in the future that it would be better to have no declensions/conjugations at all than completely wrong ones.

Benwing (talk) 09:17, 12 August 2014 (UTC)

  • Well, what happens with editors in the more obscure languages is naturally that fewer people know anything about them, so it is tough to find out if they are correct or not. The same happened with plenty of other editors. There was a guy called User:Razorflame who edited in tonnes of languages, but users better than him kept pointing out his mistakes in these languages, which made him move to other languages - before long he was editing in Kannada, and became the self-proclaimed Kannada expert (and since nobody else knew anything about the language, he was allowed to edit to his heart's content, doubtlessly filling this project with crappy Kannada entries). The same thing has happened myriad times, for example User:Wonderfool with Asturian - he claims to be married to an Asturian woman who knows the language, and since nobody else edits in that language, he becomes the "local expert". My suggestion (and hope) is to fix as many of the Old French entries as you can. It's highly probable that Mglovesfun has made plenty of mistakes, so we appreciate any new editors in less widespread languages like this one. Wonderfool too would appreciate other Asturians to correct his work. --Type56op9 (talk) 14:30, 12 August 2014 (UTC)
    Maybe you can even get your Asturian wife to check all your edits. --WikiTiki89 15:57, 12 August 2014 (UTC)

CodeCat pushing original research

User:CodeCat is again pushing large-scale original research (OR) into etymology sections of mainspace articles, as well as articles for protolanguage reconstructions in the appendix namespace, but this time removing cited scholarship (which he added in several instances) and replacing it with his own fabrications (an example). In the past he objected to tagging his made-up theories with the template {{original research}}, which he unilaterally deleted out of process, having removed all of the instances of articles being tagged with it (an example). Neither of these were discussed anywhere and CodeCat never uses edit summaries. His behavior is both detrimental to the credibility of Wiktionary and discouraging for any editors involved in those areas who see their work undone in such a dictatorial manner. --Ivan Štambuk (talk) 12:31, 12 August 2014 (UTC)

We've been over this before. Wiktionary does not have a policy or prohibition against original research, and just because you say it's unwanted doesn't mean it is. —CodeCat 12:33, 12 August 2014 (UTC)
No we haven't been over this. Many have objected to this practice. And what you're doing here is something entirely different - bending different (legitimate and scholarly-supported) theories into something original and thus useless, but seemingly supported by references. And you do it repeatedly, without discussion, and when it's reverted you revert back to the disputed version containing your original research, claiming that the disputed version should be discussed first before reverting. --Ivan Štambuk (talk) 12:38, 12 August 2014 (UTC)
Yes, because I disagree with your moves. You also disagree with mine. So we're at a standoff. That's why I called for a discussion, to form a real consensus on WT:AINE-BSL rather than to just edit war over it. This discussion is not going to get us anywhere as long as it's just the two of us. —CodeCat 12:41, 12 August 2014 (UTC)
I object to CodeCat placing their unsourced original theories where sourced theories exist. --Dan Polansky (talk) 12:54, 12 August 2014 (UTC)
As for Wiktionary:About Proto-Balto-Slavic, each sentence present there that is not based on consensus should be tagged "[disputed]" or the like, to make it clear the page does not represent consensus. --Dan Polansky (talk) 12:56, 12 August 2014 (UTC)
Every single part of the Proto-Balto-Slavic reconstructions I created can be sourced. What is not sourced is the exact written form of the words. Instead, I converted them to use a common notation, just like we do for other reconstructed languages. I don't understand what is so controversial about it. —CodeCat 13:02, 12 August 2014 (UTC)
Okay. I object to CodeCat replacing (or renaming) particular forms that are sourced with particular forms that are unsourced. --Dan Polansky (talk) 13:04, 12 August 2014 (UTC)
Just to put things into context here. Are you suggesting that if a source attests, say, Indo-European *teutā or *teutéh₂, and someone creates an entry with that name, then we are not allowed to move that to *tewtéh₂ even if no source attests it in that exact written form? Because that's the equivalent of what I've been doing for Balto-Slavic. —CodeCat 13:10, 12 August 2014 (UTC)
Yes, that is correct, just that by "even if no source" you probably meant "since no source". The sourced exact written forms should prevail unless there is an overwhelming consensus to the contrary. --Dan Polansky (talk) 13:18, 12 August 2014 (UTC)
Well, I would say that we already have a consensus as WT:AINE already details how forms are to be normalised. Some of it I've written, but some parts of it were already there before (in particular the bit about laryngeals). There has not been any dispute about that practice, and it has been enforced by other editors as well, so I believe consensus can be assumed. So then I would conclude that there is, in fact, a consensus for moving those entries to the normalised form *tewtéh₂. Furthermore, there is also an established practice to normalise even attested languages, including most prominently Old Norse and Old English, but also languages with a prescribed standard orthography. So I can only assume that normalising spellings is a well-established practice for Wiktionary and if I was supposed to treat it as something controversial or disputed, I would have expected more evidence for that. —CodeCat 13:23, 12 August 2014 (UTC)
Given my past experience with you, I don't believe a single word that you say about "consensus". So please deliver objective evidence of consensus; I will not consider any consensus claims made in the absence of such objective evidence. --Dan Polansky (talk) 13:26, 12 August 2014 (UTC)
You can't prove a negative. Consensus exists through the lack of dispute. As there has not been any dispute regarding the normalising of spellings in PIE, consensus can be assumed. —CodeCat 13:29, 12 August 2014 (UTC)
Re: "Consensus exists through the lack of dispute": Absolutely not. Either an overwhelming common practice or a discussion is a prerequisite for there being a consensus; both can be demonstrated by objective evidence. Since we now know that your consensus claims are based merely on your perceived "lack of dispute" and conveniently fit your long-standing pattern of mass editing without consensus, the need for you to provide objective evidence has been corroborated. --Dan Polansky (talk) 13:33, 12 August 2014 (UTC)
So then there is actually no consensus between us on what consensus is. That's going to be difficult. —CodeCat 13:36, 12 August 2014 (UTC)
Re: "Consensus exists through the lack of dispute": That is an absolutely outrageous view of consensus. In any event, obviously we now have evidence of lack of consensus. DCDuring TALK 13:37, 12 August 2014 (UTC)
For what it may be worth: AINE and AINE-BSL before CodeCat edited them. The latter was only edited by Ivan before. Keφr 14:40, 12 August 2014 (UTC)
What should also be looked at is how many of the entries that existed at that time followed the policy on those pages. I'm fairly sure that when Ivan created the page for PBS, it was when the dispute had already started, and most of the Balto-Slavic entries that existed at the time did not in fact adhere to the practices that Ivan was detailing. So it was not an attempt to codify practices but to establish his own as canonical in contradiction to what was already present on Wiktionary. The edits I made after that corrected that, while also inserting practices I felt were more reasonable, but were not established by anyone prior to that. For the PIE page, I believe at the time most of our entries (there weren't that many yet) also didn't adhere to the spelling norms on that page. So I probably edited the page to reflect the reality, although it's long ago so I don't really remember. —CodeCat 22:08, 12 August 2014 (UTC)
User:CodeCat: Maybe you should compile a list of those instead of saying "fairly sure". Apparently some people here do not take your "fairly sure" very seriously. Which is not necessarily the problem with those people. Keφr 10:09, 13 August 2014 (UTC)
What you did is 1) changed the policy page removing the stuff you don't like and overriding it with your original research 2) mass renamed a bunch of pages, and edited the references to them in the articles 3) undid my reverts when I challenged those changes.
Your claim that "most of the Balto-Slavic entries that existed at the time did not in fact adhere to the practices that Ivan was detailing" is nothing but lies. Most of them were referenced except for the reconstructions that are a figment of your imagination and cannot be found anywhere in the literature, and which are not deleted due to your interpretation of "no policy against original research" = "I can do whatever I like".
And now again you mention spelling norms which this has nothing to do with. These different reconstructions represent completely different theories and cannot be unified under a single "normalized" spelling. I explained that countless times. --Ivan Štambuk (talk) 10:32, 13 August 2014 (UTC)
  • The so-called "normalized spellings" argument is a red herring, and an attempt to push a particular POV. Writing a instead of o, ź instead of ž, or writing a glottal stop sign ʔ or not is not merely a "normalization of spellings" - these represent completely different protolanguages, reflecting different theories by different linguists. Usage of innocent terms such as normalization is merely an attempt to trivialize implications of such edits. These differently reconstructed protolanguages in fact represent completely incompatible theories and cannot be reconciled via notational convention. It's not like w = u̯ in Proto-Indo-European, in the example given by CodeCat above. --Ivan Štambuk (talk) 14:29, 12 August 2014 (UTC)
    • As far as I know, the majority of Balto-Slavic linguists accepts the existence of a so-called "acute" register for Proto-Balto-Slavic. The use of *ś rather than *š reflects a real phonetic difference in Proto-Balto-Slavic (that of PIE *ḱ versus *s + RUKI) and this difference is maintained in Slavic as *s versus *x/*š, and I'm not aware of any dispute about this either. I'm less certain about *o versus *a, but using *a in all cases seems like the more conservative approach, at least until more sources start popping up supporting *o. So far I've only seen Kortlandt's arguments, but his theories are hardly mainstream. —CodeCat 14:36, 12 August 2014 (UTC)
      Indeed they do, but most do not treat them directly as glottal stops - usage of ʔ by Leiden school is in a completely different framework (glottalic theory of PIE, PIE *H > segmental *ʔ) than others who e.g. claim that it was just phonologically redundant feature in long syllables, or perhaps not (when preserving the *V: vs. *VH distinction). I was referring to *ž from PIE palatovelars and not the RUKI-induced *š. Regarding the *a vs. *o - the only Proto-Balto-Slavic dictionary published since Trautmann's 1927 book uses *o, so it's pretty far extreme to impose *a like you've been doing. Different reconstructions = different theories, and NPOV requires us to abandon any attempts of notational "normalization" and treat all of the incompatible sources equally. --Ivan Štambuk (talk) 15:26, 12 August 2014 (UTC)
      The use of a superscript glottalization symbol is not meant to indicate that it was indeed a glottal stop of any kind. It's just an abstract symbol that stands for the acute register, whatever its nature was. As for *ž, I realise that you meant that, but it doesn't make much sense to write *ž < PIE *ǵ but *ś < PIE *ḱ. We should use the same diacritic for both of them. And the sources I've seen so far mostly indicate the PIE palatovelars with an acute accent. If we indicate them as *š and *ž instead, then what symbol shall we use for the RUKI-induced variation of *s? Concerning *a versus *o, the problem is that if we distinguish them only sometimes but not other times, that's going to confuse users who may think there is real significance to this. They might think, quite reasonably, that if one noun ends in *-os and another ends in *-as, that this reflects some real difference rather than different theories. Therefore, I opted to not distinguish them in the normalisation, so that no false impressions are given about this. We could of course normalise in the other direction, but the issue there is that the *o versus *a distinction is not reconstructable in the majority of cases. —CodeCat 15:44, 12 August 2014 (UTC)
  • I think it's high time that the original research policy on etymologies be formulated and voted on. If at any point a legitimate and referenced scholarship that I added could be overridden by some anon on the basis of a WT:ES discussion I want to know it so that I know what I don't want to waste my time on. --Ivan Štambuk (talk) 16:22, 12 August 2014 (UTC)
Why should proto languages be exempt from all attestation rules? It's true that they can't be attested in the same way as attested languages but that doesn't allow us to just use invented forms. Wiktionary is no scholarly platform for linguistics, if you want that, go to Wikiversity. This isn't the first time CodeCat's behavior over here has been criticized - soon I'll be at a point where I am forced to take action in form of a WT:VOTE. -- Liliana 16:27, 12 August 2014 (UTC)
I would love to see that. CC's OR Finnic roots don't bother me all that much, though as I once told her, it would be better for both her and Wikt. if she used an appropriate platform for her OR, e.g., Amazon self publishing (I might even buy it.)
And then there is the much more serious issue of meddling with actual referenced etymologies, see diff where arguably the most authoritative source on lv etymologies – Karulis – was replaced by a root from god knows where, all the while the superscript [1] at the end would have a reader believe that it was, in fact, Karulis who offered that root. Essentially we end up with what is called (I think) a fabricated citation on Wikipedia. The reason for removing intermediate steps (as in diff) in a referenced etymology also eludes me: there are two homographic terms with completely different meanings, so why would trimming down a ref'd etymology be a good idea?
Re: OR Uralic roots I was disappointed by CC's ignoring of important corollary information from authoritative sources when crafting her OR root appendices, for example etymoloogiasõnaraamat was explicit in the fact that Finnic word for "shoulder; help" is ultimately an Iranian borrowing which for some reason she saw as not worthy of inclusion in her appendices for this root (e.g., Appendix:Proto-Finnic/api) and I'm lost as to why... Neitrāls vārds (talk) 18:14, 14 August 2014 (UTC)
*pūteiti is supposed to be the infinitive form, except that the infinitive for Proto-Balto-Slavic verbs cannot be reconstructed because Old Prussian evidence doesn't agree with East Baltic and Slavic. But let's just ignore that conveniently in order to "fix" the inherent flaws of the tree model of language change and "normalize" entries... --Ivan Štambuk (talk) 16:30, 16 August 2014 (UTC)

What is consensus?

There is some disagreement about the exact nature of consensus in the above discussion. Specifically, the debate is over whether lack of debate implies consensus on a particular thing. My view is that it does, as we generally tend to follow the practice that an edit is ok until someone reverts it or complains about it. So my question is, in the absence of any discussion, can consensus be assumed? And if not, what should be done with the many unwritten and undiscussed rules that were never formally "consensusified"? Also, if I'm correct that consensus is needed for every edit on Wiktionary, what does that mean for the millions of edits on mainspace entries for which no discussion was made beforehand? Is requiring explicit consensus for every change workable? —CodeCat 13:47, 12 August 2014 (UTC)

  • This is a clarion call for instituting the BRD process they have at Wikipedia. An undiscussed edit is a "bold" edit. If another editor disagrees with that edit, he/she can revert it. At that point, you discuss. If no one disagrees with a bold edit, it isn't discussed. Purplebackpack89 13:57, 12 August 2014 (UTC)
  • A planned massive change cannot be claimed to be supported "by consensus" if there is no discussion and no evidence of overwhelming common practice, merely "lack of dispute", and the lack of dispute is caused by the fact that the change was not proposed in a public forum in the first place. As for the need of discussion, mass changes absolutely should not be put on par with single edits of mainspace articles. When specific claims of consensus are made without reference to a discussion or a vote where people expressed their agreement, the consensus is less certain but still possible, and can be proven by pointing to a long-standing overwhelming common practice, a sample of which can be provided by the claimant. When such a hypothesis of consensus is presented in a public forum, the rest of the editors can try to find a significant volume of refuting counterexamples to the putative common practice claim.

    The "dispute", "consensus" sequence is the opposite one: if I make an edit in a mainspace and no one opposes it but no one becomes aware of the edit either, there is no point in talking about consensus. It is only after there is at least a shred of dispute that talk about consensus and consensus forming becomes meaningful in the first place. --Dan Polansky (talk) 14:06, 12 August 2014 (UTC)

    • I think we should write WT:Consensus and have it approved by vote. —CodeCat 14:23, 12 August 2014 (UTC)
      • You had better write User:CodeCat/Consensus, so that everyone can know that, by your lights, a proposal that you did not even make is automatically supported by consensus since no one managed to dispute it. I think trying to write WT:Consensus could have some nasty repercussions, since it delves into meta-levels and infinite regress; it is this infinite regress that you are abusing here. --Dan Polansky (talk) 14:30, 12 August 2014 (UTC)
        • Wikipedia does fine with their w:WP:Consensus, and I think we need something similar. In fact, I think copying and amending it would be good. —CodeCat 14:39, 12 August 2014 (UTC)
  • Let us return to the substance. The problem with w:WP:Consensus is that it is not so great. Consider this: "Any edit that is not disputed or reverted by another editor can be assumed to have consensus." That cannot be the case. A consensus is a state of general agreement. No one can be thought to agree with an edit of which they are not even aware. As long as the edit is undisputed, it is not known whether it is supported by consensus, but in the absence of indication to the contrary, there is no need to revert the edit. Agreement and disagreement with an edit is only possible after the edit has come to the attention of the person agreeing or disagreeing. --Dan Polansky (talk) 15:29, 12 August 2014 (UTC)
  • Well, there was opposition to you doing OR in etymologies, but that didn't stop you from continuing to do so. Do you perceive no longer being reverted as "consensus was formed"? --Ivan Štambuk (talk) 15:40, 12 August 2014 (UTC)
  • The sense in which we have "consensus" at Wiktionary when folks fail to object to something is reminiscent of a "consensus" of users in response to arbitrary changes of user interface on must-use systems such as some of those of the federal government or Google or one's IP provider. One of the points of a wiki is to elicit contributor commitment by getting beyond such supposed consensus to one based on authentic participation. That users continue to participate in Wiktionary despite unsatisfactory changes and unsatisfactory methods of enacting changes wrought by technical contributors is a tribute to the preexisting commitment those users have because of the wiki idea and their prior efforts. It is also a tribute to the burgeoning complexity of the way in which many aspects of Wiktionary are implemented, which concentrates power in those very few who have both the time and the motivation to enact that complexity. Whether that complexity is actually necessary rather than a way of increasing the power of the motivated contributors is now moot. We now are stuck with the complexity and are held hostage to the whims of such contributors.
So our "consensus" seems to reflect the realities of power, more than anything else, though laziness and weakness of commitment to the project may also shoulder some of the blame. CodeCat is simply making that explicit. DCDuring TALK 16:59, 12 August 2014 (UTC)
Mostly the latter I think, as you can probably see by observing our "consensus-building" processes. You often either have the discussion stuck in nowhere because people are unable to agree on a minor detail, or complete lack of participation or "meh whatever" responses. And then someone uses a pretentious Latin phrase to justify reverting any changes. I am not surprised at all that CodeCat tends to bypass those venues. All this apathy can be really frustrating. Keφr 10:56, 13 August 2014 (UTC)
Insinuations may sound great, but do not belong on an open wiki and in an open discourse. The "someone" would be me. The Latin phrase would be "no consensus => status quo ante". The phrase "status quo" has been used over the years repeatedly by other editors as well, and has been an important principle that I have not introduced. As for e.g. Wiktionary:Votes/bt-2009-12/User:JackBot (to which you are untransparently referring above, which is a poor practice), 6 editors participated; since the vote did not state the task for the bot, I would have opposed as per Ruakh and msh210 there; I don't know what you are complaining about. We do not need neverending repeated mass changes, especially those with significant oppositions; making many changes is enjoyed by immature juveniles, who avoid the real building of the real dictionary, unlike e.g. SemperBlotto or Equinox. I am sure the reusers and parsers of Wiktionary data (there are some) do not enjoy incessant changes either. --Dan Polansky (talk) 11:16, 13 August 2014 (UTC)
  • Blimey! How is it that the talk page of the article I'm editing on Wikipedia about the Israeli-Palestinian conflict is more collegial than this? I am tempted to collapse the entire last fourth of this month-subpage for generating "more heat than light". I will start blocking editors if they continue to speculate so incivilly about other editors' gender or genitalia; such speculation is not only irrelevant to our stated goal of making a dictionary but outright harmful to that goal, because it creates an exceedingly hostile environment that turns off potential contributors and tries to drive away existing contributors (it did drive Cloudcuckoolander to leave for a while, IIRC). It meets every criterion of our written blocking policy; it directly and deliberately hinders/harms the progress of Wiktionary in the way aforementioned, it has continued despite less drastic means of stopping it being attempted, it wastes everyone's time and it causes editors distress by directly insulting them and being continually impolite towards them.
    Some of you clearly feel that some of CodeCat's edits, and her tendency to implement them without discussion and even in the face of opposition, are also harmful to the project (and it was said above that they drove Ruakh to scale back participation in the project); if you would like to propose to desysop CodeCat, or block her if she continues her own behaviour, those options remain open [to all of you and to those of you who are admins, respectively], though continuing a civil discussion seems like a better course of action. (But, to be clear, CodeCat's behaviour and the misgendering above are not comparable.) - -sche (discuss) 21:46, 12 August 2014 (UTC)
    Thank you. Keφr 10:02, 13 August 2014 (UTC)
    I believe that many do not believe that we can desysop CodeCat without even more damage to the project, as the skimpy documentation for our system would make it difficult to maintain. I think this is probably a wrong belief, as shown by our operating successfully without CodeCat's presence after the last conflict, but who wants to test it for an extended period? I think our ever-increasing dependence on complex modules and relentless "tidying" of templates that would compete with such dependence has the effect of increasing CodeCat's power over the project. DCDuring TALK 22:07, 12 August 2014 (UTC)
    I think DCDuring said we use too many templates and they're too complex. If that's what he said, I agree. Purplebackpack89 22:16, 12 August 2014 (UTC)
  • For the uninformed: -sche (sic) is one of the major enablers of CodeCat's editing without consensus, and himself guilty of repeated controversial mass editing without consensus. --Dan Polansky (talk) 22:28, 12 August 2014 (UTC)
    Right. User:-sche apparently blocked me for "speculating about CodeCat's gender", while at the same time we can see threats and ad hominem attacks against editors like User:Purplebackpack89 go unpunished. --Ivan Štambuk (talk) 14:41, 14 August 2014 (UTC)
    To clarify, I blocked you because you are the only person I've seen continue to speculate after I asked (warned) people to stop that particular long-practised irrelevant/harmful behaviour. (I decided not to institute ex post facto blocks to other users as long as they stopped, and so far they have.) I am none too happy about Kephir calling Purplebackpack a "lying illiterate troll", but I hope that sort of incivility can be discouraged by discussion. (Also, I am not the only one with a mop, other people could step up to the plate and issue warnings and blocks if ad hominem attacks continue...) - -sche (discuss) 16:34, 14 August 2014 (UTC)
    There was already a misunderstanding regarding my usage of the personal pronoun he, so it was necessary for me to elaborate on that. I simply declared my position on a topic I didn't bring up in the first place, so if you want to issue warnings and blocks you're barking up the wrong tree. --Ivan Štambuk (talk) 16:19, 16 August 2014 (UTC)
  • As someone who has interacted with both editors (CodeCat and Polansky) in both negative and positive terms, I can say I am a neutral and impartial party in this conflict. Nonetheless, although I have also encountered posturing behaviour by Polansky, I think that has been balanced out by the helpful lessons he's given me on how guidelines on Wiktionary work. Pass a Method (talk) 14:46, 14 August 2014 (UTC)

Superprotect certain pages such that sysop permissions are not sufficient to edit them

Users may be interested in [2], which creates a new protection level called "superprotect" for "protecting pages such that sysop permissions are not sufficient to edit them". The new protection was developed after a series of events on en.WP, and was applied to de.WP's MediaWiki:Common.js after a series of events there. (De.WP held a RFC on Media Viewer and found that consensus was for it to be disabled by default and opt-in rather than enabled by default and opt-out, a de.WP admin implemented that consensus via w:de:MediaWiki:Common.js, and an edit war occurred between that admin and another admin + a WMF person. Events on en.WP were similar, except en.WP admins didn't edit-war.) There has been some discussion of the new protection level at en.WP (permalink to current revision), though it has generated more heat than light, and there is a RFC on Meta.
You may also be amused by this; if you don't speak German, the key bit (after the initial post by Bene* in English) was BHC's reply "does Bene* even have the right to edit MediaWiki:Common.js now?"
- -sche (discuss) 19:04, 13 August 2014 (UTC)

I'm sure that's a tool CodeCat would love. -- Liliana 19:15, 13 August 2014 (UTC)
All power to the technocrats. DCDuring TALK 19:22, 13 August 2014 (UTC)
I think the idea is to lock a page for a few days, during which nobody can edit, discuss what the right thing to do is, and then do the right thing when the protection ends. I can get behind that Purplebackpack89 19:24, 13 August 2014 (UTC)
What we really need is shadowbanning ;) Equinox 19:28, 13 August 2014 (UTC)
Yeah, then we coulda banned Mglovesfun so I wouldn't have had to read all his low-level digs of me. Purplebackpack89 20:25, 13 August 2014 (UTC)
Was that comment really necessary? -- Liliana 21:14, 13 August 2014 (UTC)
Yes. It serves as evidence that PBP's favourite pastime here is not dictionary-building, but trolling. Of which I think there is abundance already, but whatever. Also, meet kettle, pot. Your remark about CodeCat above in this section was equally gratuitous. Keφr 21:19, 13 August 2014 (UTC)
Kephir, calling me a troll is inaccurate (witness how many entries I've created in the last 24 hours alone) and is evidence of your continual campaign to get me banned from the project for relatively innocuous edits. I have told you numerous times that interacting with me is unproductive. One more remark like that and I will request a one-sided interaction ban on you interacting with me. Purplebackpack89 21:28, 13 August 2014 (UTC)
A "one-sided interaction ban" sounds absurd. Googling the phrase only finds you demanding them in a few places. Equinox 21:30, 13 August 2014 (UTC)
But don't you agree with the general idea that things would be better off if Kephir stopped interacting with me? Purplebackpack89 21:35, 13 August 2014 (UTC)
Oh c'mon. Do you really need to bring your seemingly-unlimited paranoia in here? -- Liliana 21:40, 13 August 2014 (UTC)
Paranoia, Liliana? Dude tried to remove my autopatrol and rollback rights less than 48 hours ago (see above discussion where he was shouted down). He has been targeting me for months since that disruptive pump thread a month and a half ago. He freely admits to monitoring my edits. And he just accused me of primarily being a troll, which is a flat-out lie. Purplebackpack89 21:44, 13 August 2014 (UTC)
Since you mentioned that, you have edited a whopping number of six pages in main namespace in the last 24 hours; what do you want, a biscuit? As for you being a troll, your ban at SEW and subsequent lack of remorse for what caused it should be enough evidence of that.
Also, if you really thought that interactions with me are such a waste of your time, you could simply avoid having them and ignore me. Simple as that. Keφr 22:15, 13 August 2014 (UTC)
How many new entries, Kephir? Three new entries in the last 24 hours (one of which had multiple definitions).
I cannot ignore you because you keep inserting yourself into my editing, even though I've asked you to not do so many times, and am asking you to do so again. You have admitted to monitoring my edits, not to better content but to find dirt on me. A perfect example of this is your bullshit removal of my rights a day and a half ago, even though it was blatantly clear that the punishment didn't fit the crime, and you did not have a community consensus to do so. So, I ask you once more: will you voluntarily stop interacting with me for the good of the community? Because it's crystal clear that you continuing to interact with me is unproductive. Purplebackpack89 22:21, 13 August 2014 (UTC)
Three entries? Whoop-de-doo. Wonderfool makes more in 15 minutes. What does it prove? "You have admitted to monitoring my edits, not to better content but to find dirt on me" — show a quotation of me saying that. Keφr 22:49, 13 August 2014 (UTC)
In the only discussion I've had with you on your talk page. If you were merely looking at edits for errors, it wouldn't matter who created them. I take it you're not going to agree to stop monitoring my edits, nor to stop using your tools in a way that results in a wheel war? What a shame. I thought you'd be the bigger man. Purplebackpack89 22:54, 13 August 2014 (UTC)
I asked for a quotation. If you fail at reading comprehension, you should not participate in writing a dictionary. Keφr 22:58, 13 August 2014 (UTC)
Your quote "Also, stop randomly jumping accounts for no reason, it makes it harder to review your "contributions", for lack of a better word." would indicate that you have been "reviewing" my contributions. But quotation, schmotation. I ask you once more: are you going to stop doing it? Purplebackpack89 23:01, 13 August 2014 (UTC)
I noticed two different accounts using the same signature at random. Did I "admit to monitoring [your] edits, not to better content but to find dirt on [you]" as you claimed? No. Lying illiterate troll. Keφr 23:21, 13 August 2014 (UTC)
The Wiktionary community has the right to review any edits anyone makes; this is a public project. I would be happy if someone were reviewing my edits, because any mistakes I make would be fixed. The quote about PBP from his ban on the Simple English Wikipedia (which Kephir linked to above) that I think basically sums him up very well is "it's clear that he cannot collaborate in a constructive fashion". Even though his mainspace edits may be productive, he is incapable of dealing with criticism. --WikiTiki89 23:39, 13 August 2014 (UTC)
I'm supposed to let a wheel war over my permissions and being called a "lying illiterate troll" roll off my back? I'm not the problem here, Kephir is. Kephir cannot collaborate in that he continues to call me a troll even after I've asked him to stop interacting with me altogether. Purplebackpack89 23:45, 13 August 2014 (UTC)


@Purplebackpack89: If you continue to target other users in the BP, I will block you. --WikiTiki89 23:43, 13 August 2014 (UTC)

PBP, if you continue to comment here, I would too. Wyang (talk) 23:46, 13 August 2014 (UTC)
You're going to block ME because another editor attacked me? That seems unfair. Purplebackpack89 23:49, 13 August 2014 (UTC)
Why don't you scroll up and look at who is starting all of the antagonism in these discussions (hint: it's you). --WikiTiki89 23:52, 13 August 2014 (UTC)
Except I wasn't talking to Kephir, and he went in there, and called me a troll. Heck, the Mglovesfun comment wasn't even altogether serious, and that was blatantly obvious! I can't believe you think that it's OK for Kephir to do that, or to wheel war over my permissions. He acts abominably towards me, and it has got to stop immediately. Purplebackpack89 23:54, 13 August 2014 (UTC)
Please don't propose ridiculous "interaction bans", and I will not have to strike them out. Notice I said "discussions" in the plural. I don't think anyone here got the humor in your remark about MG. --WikiTiki89 23:59, 13 August 2014 (UTC)

This Purplebackpack89 is very similar to User:Razorflame — no substance, disruptive, irreverent, spoilt brat. And what do they have in common? They are both American children. I blame the American school system. No European or Asian youth would behave like this. --Vahag (talk) 07:00, 14 August 2014 (UTC)

I think that's not a good generalisation, Vahagn. It won't get us anywhere. There are good and bad, well-educated and spoilt people everywhere. (No comment on the topic at hand). --Anatoli T. (обсудить/вклад) 07:08, 14 August 2014 (UTC)
This is exactly what's wrong with liberals like you, Anatoli — false equivalency. "All religions are peaceful", "all cultures are good", etc... It is only America that instills in its children an undeserved sense of specialness and entitlement. You know very well that in our societies someone like Purplebackpack89 would immediately eat a couple of slaps in the face and wouldn't dare raise voice against such a valuable editor as Kephir. --Vahag (talk) 10:20, 14 August 2014 (UTC)
First of all, I'm not defending Purplebackpack89, I'm just saying that it is wrong to judge people by their origin. Using slaps in the face has little to do with being liberal, use of force is often justified. Skinheads in Russia killed hundreds of people from the Caucasus (including Armenians) and Central Asia, just for having the wrong looks and accents, for being generalised as "less civilised" or having bad behaviour or upbringing. I'm not always justifying American politics either but it's Russia that now instills in its children an undeserved sense of specialness and entitlement. Russia may treat Armenia better than other neighbours but don't be fooled, Putinism in this stage can't have real friends, allies or simply partners, it can only have vassals or enemies. Real liberal societies, which you dislike, give a chance to everyone, regardless of where they come from, and a slap in the face goes to whoever deserves it, no matter where they come from. I work and socialise with people of different races and colours in a country that treats everyone equally, more than the US or Russia, and I don't see any problem with that. You would think the same way if you lived in a friendlier environment. I don't blame you for your views. I don't want to be involved in political debates or discuss which nation is better, just replying to your comment. "No European or Asian youth would behave like this". This is funny really, people behave badly "in our societies", even if they are beaten. I am actually having trouble finding civilised and open conversations in runet (Russian Internet), besides it's problematic to punish someone on the Internet for bad behaviour. --Anatoli T. (обсудить/вклад) 06:02, 15 August 2014 (UTC)
You never get my trolling, Anatoli... And by the way, regarding modern Russia — I'm probably the biggest Russophobe you know. --Vahag (talk) 06:22, 15 August 2014 (UTC)
Well, в каждой шутке есть доля ... шутки. I do get your trolling but not always, I admit. Your jokes get you into trouble, so I'm not the only one who doesn't always get your trolling. :) I am a Russian Russophobe, as far as the Russian politics go, even if I was born in the Eastern Ukraine and lived in Russia most of my life. --Anatoli T. (обсудить/вклад) 06:38, 15 August 2014 (UTC)
@Vahagn Petrosyan:, you've crossed the line with your comments. For one, it's a personal attack to call somebody a "spoilt brat". It's also inaccurate to say I'm without substance, as I have created over 100 entries. Purplebackpack89 13:32, 14 August 2014 (UTC)
Purpleback reminds me a little of Wonderfool too. Socially naive, mostly well-meaning, and occasionally a genuine asshole. Definitely could do with some more mainspace edits tho. --ElisaVan (talk) 01:31, 15 August 2014 (UTC)
Just a note, and sorry for this being a few months late, but I am not this user. Just thought I'd clear that up. Razorflame 17:48, 22 October 2014 (UTC)
Good to know. Keφr 18:05, 22 October 2014 (UTC)

Personal word list/vocabulary list?

I use Wiktionary as my learning tool for learning foreign languages.
I am using the "Watchlist" feature as my word list but I think we need a proper personal word list.
If people could create a word list and then choose to add words that they looked up to the list, it would help them remember those words better. Burkhankhaldun (talk) 07:17, 14 August 2014 (UTC)

Hit Ctrl-D. (This is not a prank. That would be Ctrl-W.) Keφr 07:34, 14 August 2014 (UTC)
You could simply edit your own user page and add the words of interest there; it would also allow grouping and formatting using the wiki markup. Equinox 07:44, 14 August 2014 (UTC)
What Equinox said. :) If you click the button at the top of the edit window that says "advanced", the icon on the far right of the menu that appears will even help you add a sortable table if you want to put in both foreign-language words and translations, and be able to sort them. Cheers! - -sche (discuss) 16:38, 14 August 2014 (UTC)
Userspace should be used for activity related to the improvement of the project, and not as a personal diary or a learning tool. That kind of functionality is outside the purpose of this project. --Ivan Štambuk (talk) 14:55, 16 August 2014 (UTC)

Not renaming template pedialite

I have only now noticed that "pedialite" is being renamed in the mainspace to "projectlink|wikipedia", like in diff. I object. Let me also repeat my sense of exasperation about the perpetrator of that renaming. This is very angering, and I feel helpless. I don't think I would quit Wiktionary about this, since Wiktionary is too great a project regardless, but the exasperation does move me in that direction. --Dan Polansky (talk) 09:20, 14 August 2014 (UTC)

Short markup is highly beneficial to editors who actually create entries. (I suppose it makes no odds to those who primarily work on templates. Ha.) Every time I have to start using {{cx|cookery|lang=en}} instead of {{cookery}}, or {{l|en|-mone}} instead of [[-mone]] (or have to dig my edit cursor through masses of such code generated by others), I feel a similar frustration to Dan's. I don't care about the final underlying representation — it could be a huge complex XML document — but the stuff I type in a text box, including others' work as presented to me for ongoing editing, should omit the complexities. Editing facilities do not seem to be keeping up with internal changes. Equinox 09:28, 14 August 2014 (UTC)
Amen. Content seems to be taking a back seat to a single individual's urge to "tidy". DCDuring TALK 12:21, 14 August 2014 (UTC)
And yet, when things are confusing, people wonder why we don't attract many new editors. My efforts to increase consistency and tidiness are aimed at reducing the substantial mental load that comes with editing Wiktionary, so that it becomes more accessible to newcomers. Wiktionary is too strongly biased towards its existing user base. —CodeCat 13:03, 14 August 2014 (UTC)
  • I object as well. The confusing thing is that Template:pedia and Template:pedialite are redirects to something, rather than being templates themselves. I seriously doubt that any editor would understand why projectlink exists at all. The reason that Wiktionary is too strongly biased to its existing user base is an over-reliance on an increasingly few templates, and a lack of redundancy in templates. Purplebackpack89 13:22, 14 August 2014 (UTC)
    • I see it as the opposite. I see having too many templates that perform similar functions as the problem. Redundancy should be eliminated, not increased. The more templates are alike, and the less different kinds with subtle differences, the easier it is for new editors to learn them. But to address the point about pedia and pedialite specifically, I was not going to delete them. I was only converting them to something equivalent. After all, I've seen other people's bots convert {{cx}} to {{context}} which is no different. It's just eliminating the shortcut, as shortcuts are intended primarily to give editors less to type. —CodeCat 14:25, 14 August 2014 (UTC)
  • This is a common tactic: first a "redundant" template is orphaned by bots, and afterwards they are listed for deletion on the obscure template discussion board which barely anyone keeps track of, and after a discussion involving one or two editors they get summarily deleted. You take a 1 month wikibreak and suddenly templates that were stable for years, had short and easily recallable names are all gone and replaced with some verbose monstrosities. --Ivan Štambuk (talk) 14:50, 14 August 2014 (UTC)
  • After a fortuitous e/c that illustrates my point.
    @Purplebackpack: You seem to misunderstand. You should know that the only important confusion is that faced by those who control the system, not any confusion on the part of actual users or contributors. That confusion is asserted to affect new contributors, though a moment's thought would clearly show the assertion to be implausible.
Most new potential contributors are likely to be contributing in one or two languages, their native language or their native language plus English. Any uniformity of system design across languages makes no difference to them. Redirecting templates are the means of getting the best out of our technical infrastructure while allowing customization for individual languages.
In the case at hand, we have redirecting templates applied to a different class of items. It would be easy enough to address the naming inconsistencies among our project-linking templates. The problem is that CodeCat/Mewbot fails to listen to or heed objections, let alone seek input in advance. I can only conclude based on the consistent pattern of behavior that CodeCat can't handle disagreement and avoids it by doing elaborate end runs and using technical malarkey to put an end to any discussion that might lead to a frustration of ambitions or whatever. I suppose that we can expect various neuroses (or worse) among contributors to become evident over time. It is only when the resulting behavior causes problems or inconvenience for others that we are entitled to object. This seems to be one of those times. DCDuring TALK 15:03, 14 August 2014 (UTC)
I see value in merging templates in the backend, but there is no reason that {{pedialite}} can't be kept as a redirect. I also see no value in mass-converting uses of this template. --WikiTiki89 15:11, 14 August 2014 (UTC)
I agree with Wikitiki on all points. Improving the backend of the template is good. The (semi-)memorable name ({{pedialite}}) should be kept as a redirect, and RFDed if someone wants it to be deleted. Renaming existing usages is not harmful, but it's not helpful or necessary, either. (Ditto renaming existing usages of {{cx}}, as Mglovesfun did.) - -sche (discuss) 18:36, 14 August 2014 (UTC)
I noticed that DP is now running an unapproved bot making undiscussed edits to many pages. I find it ironic that in trying to undo what he considers my wrongs, he commits those very same wrongs himself. Apparently the rules don't apply, and there's no need to discuss anything, if you already know you're right? This kettle disapproves of the pot. —CodeCat 20:35, 22 August 2014 (UTC)
A neat trick of yours. It shows your modus operandi in the clearest. I am merely restoring the state before your undiscussed changes. I do not need to actively seek consensus to restore status quo ante. --Dan Polansky (talk) 20:42, 22 August 2014 (UTC)
Thank you for confirming that you did not intend to discuss your edits or your bot with anyone. —CodeCat 20:44, 22 August 2014 (UTC)
Again, I am not instating a change; I am abolishing a change. This very thread shows the degree of consensus or its lack for the change that I am abolishing. --Dan Polansky (talk) 20:59, 22 August 2014 (UTC)
Can you show the vote or other kind of discussion that demonstrates that there is consensus for mass-reverting other editors without discussion and with an unapproved bot? —CodeCat 21:10, 22 August 2014 (UTC)
I recall instances unilateral reversal of changes to individual entries to the status quo ante. But I don't see any particular reason why the same should not apply to multiple bot-installed items. I see no particular reason why any single person's unvoted-on changes deserves any special protection from reversion or reversal. In the case of someone who simply institutes changes without any consensus, motivated principally by a purely personal compulsion to tidy, and then leaves to others the task of completing the change (Redlinked categories come to mind.), reversion would seem to be warranted and a failure to do so to reward bad behavior. DCDuring TALK 21:57, 22 August 2014 (UTC)
No, I cannot show you the vote showing that this wiki should be governed by consensus; that I accepted as a given when I joined the project. Re: "Unapproved bot": this is AWB from a menial-work user controlled by me, not a bot. --Dan Polansky (talk) 22:10, 22 August 2014 (UTC)

Debotting MewBot

FYI: Wiktionary:Votes/2014-08/Debotting MewBot. --Dan Polansky (talk) 09:34, 14 August 2014 (UTC)


How do I go about rendering Inupiak entries when a specific font is required to view the intended characters? When I copy and paste words from the original typeface into wiki entries, they are rendered with completely wrong characters. I don't want to create entries that aren't accurate, so it's probably best if I wait for a solution before creating more work for myself by having to change them later on! The font can be found here. —JakeybeanTALK 17:53, 14 August 2014 (UTC)

Use the proper Unicode characters: Ġ, ġ, Ḷ, ḷ, Ł, ł, Ł̣, ł̣, Ñ, ñ, Ŋ, ŋ. — Ungoliant (falai) 17:59, 14 August 2014 (UTC)
Brilliant, thank you. —JakeybeanTALK 18:29, 14 August 2014 (UTC)
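
For anyone who wants to check pasted text against the letters listed above, here is a minimal Python sketch that prints the code points behind each one. It relies only on the standard `unicodedata` module; the names `LETTERS` and `describe` are made up for this illustration. Ł̣ and ł̣ have no precomposed form in Unicode, so each is written as the base letter followed by U+0323 COMBINING DOT BELOW.

```python
import unicodedata
from typing import List

# The twelve letters from the reply above, written as escapes so the
# code points are unambiguous. LETTERS is an illustrative name only.
LETTERS = [
    "\u0120", "\u0121",              # G with dot above
    "\u1E36", "\u1E37",              # L with dot below
    "\u0141", "\u0142",              # L with stroke
    "\u0141\u0323", "\u0142\u0323",  # L with stroke + combining dot below
    "\u00D1", "\u00F1",              # N with tilde
    "\u014A", "\u014B",              # eng
]


def describe(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for each code point of text after NFC normalisation."""
    normalised = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in normalised]


for letter in LETTERS:
    print(letter, describe(letter))
```

If a pasted word shows code points outside this set (for example, characters from a legacy font's private mapping), that is a sign the text still needs converting.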

Empowering JackBot

FYI: Wiktionary:Votes/bt-2014-08/User:JackBot for bot status. --Dan Polansky (talk) 20:33, 14 August 2014 (UTC)

Old Provençal or Old Occitan?

Wiktionary knows of a language called "Old Provençal", which nowadays is normally termed "Old Occitan". Should it be renamed? Benwing (talk) 04:38, 15 August 2014 (UTC)

Support —CodeCat 13:03, 15 August 2014 (UTC)
@CodeCat: Since you repeatedly complained of people posting no rationales, do you have any? --Dan Polansky (talk) 13:47, 15 August 2014 (UTC)
I support the rationale that Benwing gave. —CodeCat 13:49, 15 August 2014 (UTC)
@CodeCat: Thank you. How can we verify that "Old Provençal" is nowadays normally termed "Old Occitan"? --Dan Polansky (talk) 13:51, 15 August 2014 (UTC)
My book "Introduction to Old Occitan" says re. Occitan vs. Provençal:
Occitan enjoys increasing acceptance in all the languages of scholarship on the subject (despite the resistance of Provençal partisans) [with a footnote here] and will be adopted here.
The footnote says
"Only among specialists outside France has Occitan come to be the generally accepted term for the language" (Field 233).
Since we are "outside France" then we should use Old Occitan. Note that we're already using Occitan for the modern language, since Provençal is properly speaking only one of the dialects of Occitan. Benwing (talk) 16:17, 15 August 2014 (UTC)
Thank you very much. I have now also looked at Old Provençal,Old Occitan at Google Ngram Viewer and W:Old Occitan, and support. I've seen User:Renard Migrant edit Occitan, so I am pinging him, in case he has input. --Dan Polansky (talk) 16:48, 15 August 2014 (UTC)
Support as well. --WikiTiki89 17:14, 15 August 2014 (UTC)

Removing the number sign (#) from voting templates

I propose to remove the automatic number sign (#) from the voting templates {{support}}, {{oppose}}, and others. There has been discussion about this before (can someone find where?) and I recall that the objections were that people might not precede the template with a number sign, but from looking at our votes, all of the uses I see have a number sign before the template. This will solve those annoying indentation problems with these templates, by putting all of the indentation control outside of the templates.

To clarify, this means that this will still work:

# {{support}} ~~~~

This will no longer work:

{{support}} ~~~~

But this will now work as expected:

: {{support}} ~~~~

As part of updating these templates, we can even merge the backends so that all of the voting templates will function the same way (such as {{vote delete}}).

--WikiTiki89 13:02, 15 August 2014 (UTC)

  • Support. I feel the templates should not generate the number, and should not be used like they are used in the second option above. It seems to me that the # sign is not part of the support itself, unlike the support icon and the text "support". And someone may want to write # Weak {{support}} ~~~~, and this should work seamlessly. --Dan Polansky (talk) 13:49, 15 August 2014 (UTC)
Support per nom and per Dan's point that the current format makes it hard to cast qualified votes like "weak support". A short previous discussion was here (prompted by edits to Template:vote-generic and other vote-setup templates). - -sche (discuss) 18:40, 15 August 2014 (UTC)

Additions/changes to Template:en-verb

Recently I converted this template to Lua, but without changing how its parameters work at all. But with Lua I think we can streamline it some and hopefully make it easier to use. The way I propose is as follows:

  1. For completely regular verbs, which add -s, -ing and -ed to the page name, nothing changes. You just specify no parameters at all.
  2. For slightly irregular verbs, you can specify only the first parameter (this is new):
    1. If the first parameter equals "es", then the present 3rd singular gets that ending instead of the normal -s. (example: smash {{en-verb|es}})
    2. If it equals "ies", then the final -y of the page name is replaced by -ies in the present 3rd singular, and by -ied in the past, while the present participle will end in -ying. (example: carry {{en-verb|ies}})
    3. If it equals "d", then the past ending will be that instead of the normal -ed. (example: free {{en-verb|d}})
    4. If it is a word ending in "es", then this ending is replaced with -ing and -ed to form the present participle and past. (example: recognize {{en-verb|recognizes}})
    5. If the first parameter is anything else, then it is taken as the stem to form the present participle and past. (example: plot {{en-verb|plott}})
  3. If the second parameter and possibly third and fourth parameter are present, the template works as before. This is done for backwards compatibility. In particular, you can specify the present, present participle, past and optionally past participle directly using the four positional parameters.

CodeCat 13:32, 15 August 2014 (UTC)
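
The rules in items 1 and 2 above can be sketched in a few lines. This is only an illustration in Python, not the actual Lua module: the function name `en_verb_forms` and the `VerbForms` tuple are invented here, multi-word page names are not considered, and the legacy multi-parameter behaviour of item 3 is deliberately left out.

```python
from typing import NamedTuple, Optional


class VerbForms(NamedTuple):
    third_singular: str
    present_participle: str
    past: str


def en_verb_forms(page_name: str, param: Optional[str] = None) -> VerbForms:
    """Sketch of the proposed zero- and one-parameter rules (hypothetical helper)."""
    if param is None:
        # Rule 1: completely regular verb.
        return VerbForms(page_name + "s", page_name + "ing", page_name + "ed")
    if param == "es":
        # Rule 2.1: -es instead of -s in the third-person singular.
        return VerbForms(page_name + "es", page_name + "ing", page_name + "ed")
    if param == "ies":
        # Rule 2.2: final -y becomes -ies / -ied; the participle keeps the y.
        if not page_name.endswith("y"):
            raise ValueError("'ies' only makes sense for page names ending in -y")
        stem = page_name[:-1]
        return VerbForms(stem + "ies", page_name + "ing", stem + "ied")
    if param == "d":
        # Rule 2.3: past in -d instead of -ed.
        return VerbForms(page_name + "s", page_name + "ing", page_name + "d")
    if param.endswith("es"):
        # Rule 2.4: a full third-person form; strip -es to get the stem.
        stem = param[:-2]
        return VerbForms(param, stem + "ing", stem + "ed")
    # Rule 2.5: anything else is the stem for the participle and the past.
    return VerbForms(page_name + "s", param + "ing", param + "ed")
```

With the examples from the list, `en_verb_forms("carry", "ies")` gives carries, carrying, carried, and `en_verb_forms("plot", "plott")` gives plots, plotting, plotted. The exact-match checks for "es" and "ies" have to come before the "ends in es" check, since both strings would otherwise be caught by it.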

I agree with #1 and #3, but I think many of the things in #2 can be automated. #2.1, #2.2, #2.3, and #2.4 can all be detected automatically (in case of false positives, we can use {{en-verb|s}} to provide the default behavior). For #2.5, I think we can do something like {{en-verb|tt}}, {{en-verb|dd}}, etc. --WikiTiki89 14:15, 15 August 2014 (UTC)
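
To make the auto-detection idea concrete, here is a rough Python sketch. The detection heuristics are assumptions made up for this illustration (the comment above does not spell any out), the function name `guess_en_verb_forms` is hypothetical, and the rules are known to misfire on some verbs such as tie or dye, which is exactly where an explicit override would be needed.

```python
from typing import Optional, Tuple

VOWELS = "aeiou"


def guess_en_verb_forms(page_name: str, param: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (third singular, present participle, past) using assumed heuristics."""
    regular = (page_name + "s", page_name + "ing", page_name + "ed")
    if param == "s":
        # Explicit override: force the default behaviour on a false positive.
        return regular
    if param is not None:
        # Doubled final consonant, e.g. "tt" for plot (assumed reading of {{en-verb|tt}}).
        if len(param) == 2 and param[0] == param[1] and page_name.endswith(param[0]):
            stem = page_name[:-1] + param
            return (page_name + "s", stem + "ing", stem + "ed")
        raise ValueError("parameter not covered by this sketch")
    if page_name.endswith(("s", "sh", "ch", "x", "z")):
        # Guess for #2.1: sibilant ending takes -es.
        return (page_name + "es", page_name + "ing", page_name + "ed")
    if len(page_name) > 1 and page_name.endswith("y") and page_name[-2] not in VOWELS:
        # Guess for #2.2: consonant + y.
        stem = page_name[:-1]
        return (stem + "ies", page_name + "ing", stem + "ied")
    if page_name.endswith("ee"):
        # Guess for #2.3: keep the e before -ing, add only -d.
        return (page_name + "s", page_name + "ing", page_name + "d")
    if page_name.endswith("e"):
        # Guess for #2.4: drop the silent e before -ing and -ed.
        stem = page_name[:-1]
        return (page_name + "s", stem + "ing", stem + "ed")
    return regular
```

Under these assumptions smash, carry, free and recognize need no parameter at all, while plot still needs `tt`. How many real verbs the guesses get wrong is not something this sketch can answer.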

Why not also add another parameter in the first place, which would be a number denoting the verb's position. So for example carry out would be like
Programmatically, it will first check if the first argument is a number and treat the 2nd, 3rd and other arguments as if they were the 1st, 2nd and so on (the arguments' meanings don't change).
Otherwise it will behave just like it does now.--Dixtosa (talk) 14:17, 15 August 2014 (UTC)
That is a good idea, but I think it would be better off as a named parameter: {{en-verb|p=1}}. --WikiTiki89 14:19, 15 August 2014 (UTC)
I didn't want to make the proposal too complicated technically speaking. We could still do that in a later proposal, but for now I'd rather focus on what is there first. —CodeCat 14:40, 15 August 2014 (UTC)
The technical side doesn't matter in the proposal. My suggestions simplify the interface that editors will use, by not requiring any arguments in most cases. --WikiTiki89 14:55, 15 August 2014 (UTC)
I'm more wary of making proposals that make too many changes. I've noticed in the past some people don't like changes, so I've tried to keep them to a minimum. —CodeCat 14:56, 15 August 2014 (UTC)
What you should have noticed was that people don't care about the internal changes, but about the external changes. And my suggestions will have fewer external changes. People tend to complain when you require extra parameters or change the names of templates or parameters, but not when the template is updated to do all the work for them, while supporting backwards compatibility. --WikiTiki89 15:00, 15 August 2014 (UTC)
What Wikitiki said. DCDuring TALK 15:48, 15 August 2014 (UTC)
I agree with Wikitiki about automating the things in #2. Benwing (talk) 16:23, 15 August 2014 (UTC)
How will this handle the existing cases such as tie#Verb when used in accordance with the current documentation, ie, {{en-verb|t|y|ing}}? Presumably per 3 above? DCDuring TALK 18:04, 15 August 2014 (UTC)
Yes, it looks like that is covered by #3, although ideally there should be a shortcut for it. --WikiTiki89 18:17, 15 August 2014 (UTC)
Agreed: "What you should have noticed was that people don't care about the internal changes [changes to template and module internals], but about the external changes [changes to wiki markup]." --Dan Polansky (talk) 09:15, 23 August 2014 (UTC)
@CodeCat: I have blocked User:MewBot for going ahead with this without consensus. --WikiTiki89 23:53, 24 August 2014 (UTC)
I see a consensus for it here. —CodeCat 23:53, 24 August 2014 (UTC)
I see a few people agreeing with me about my proposed changes to your changes. I also see no mention of a bot run in this discussion. --WikiTiki89 23:56, 24 August 2014 (UTC)
Your proposals were all additions to mine. I've just implemented a subset, forgoing the autodetection part. And while there is no mention of a bot run, why would there be opposition to one if we already agreed on what the new parameters are? —CodeCat 00:00, 25 August 2014 (UTC)
One reason is that if we do implement the auto-detection part, then you would have to do another bot run on the same verbs. Another reason, is that you need to start following bot procedure more closely. I would have thought that while there is a vote going on to debot your bot, you would be on your best behavior. --WikiTiki89 00:08, 25 August 2014 (UTC)
I thought I was? That's why I'm confused, I really thought I was finally doing something nobody would find a reason to be upset at me about. Maybe my bot should be blocked then, it seems I'm not good enough at judging when it's ok to use it. —CodeCat 00:13, 25 August 2014 (UTC)
You shouldn't be guessing at what people would be upset about. All you need to do is ask before every bot run. It's a simple enough rule to follow. --WikiTiki89 00:17, 25 August 2014 (UTC)
People never give a straight answer though. Just look at what happened here; I thought there was assent, when you judged it differently. I don't deal well with trying to figure out just what people mean, and clearly when I try to make sense in interpreting all the different comments and opinions, people get upset because I inevitably get it wrong. —CodeCat 00:22, 25 August 2014 (UTC)
You have to ask the question before you complain about getting unclear answers. After you were done making the changes in the module, you should have come back to this thread and asked something like "Can I start the bot run now?" --WikiTiki89 00:27, 25 August 2014 (UTC)
The module/template is specially designed to make all the parameters spell one of the inflected forms, just as I designed {{de-conj-auto}} to make all the parameters spell the pagename (advertisement!). For example, criticize would have {{en-verb|criticiz|es}} for aesthetic purposes. And talking about unclear answers, would you like people to just oppose you if they don't agree with you? Isn't that a bit rude? I oppose criticize having {{en-verb|criticiz}} as parameter. --kc_kennylau (talk) 01:49, 25 August 2014 (UTC)
That is one of the reasons I want auto-detection. I think we can all agree that just plain {{en-verb}} is much nicer than either of {{en-verb|criticiz|es}} and {{en-verb|criticiz}}. --WikiTiki89 02:00, 25 August 2014 (UTC)
I think the second is better than the first, because the second doesn't use a superfluous parameter. The "es" is literally just a no-op, like if you wrote {{en-verb|criticiz|lang=en}}, since the template doesn't need or use a lang= parameter. —CodeCat 02:02, 25 August 2014 (UTC)
What Kenny is saying is that criticiz looks very wrong by itself. And speaking of superfluous parameters, the entire word criticiz is superfluous as well, since it can be deduced from the page name. --WikiTiki89 02:06, 25 August 2014 (UTC)
Not any wronger than some of the parameters you might find in other templates. For the inflection table of Finnish suomalainen for example you write {{fi-decl-nainen|suomalai|a}}. If you assume that parameters should look like natural words, then yes, the parameters look strange. But that's a wrong assumption. And while the parameter can indeed be deduced from the page name, we haven't yet determined how many cases there are for which that deduction gives the wrong results. That's one of the reasons I preferred to play it safe at first, given that this is such a widely used template and we can't afford errors. Once the proposed changes had been made, I was going to research how feasible your additions would be. But I never got that far now... —CodeCat 02:16, 25 August 2014 (UTC)
If there is a choice between using a parameter that looks natural or one that looks unnatural, you should go with the natural one. As for the reliability, of course if we haven't discussed it, we couldn't have determined anything. I wouldn't call going ahead with a bot run "playing it safe". --WikiTiki89 02:25, 25 August 2014 (UTC)
What my bot did was apply the more limited version, the proposal I originally made. That I had already done a lot of researching and experimenting with, so I already knew that it would be possible to implement it fully as proposed, before I even proposed anything. I wanted to make sure that I wouldn't be proposing something that ended up not being feasible. —CodeCat 02:41, 25 August 2014 (UTC)
It is of course feasible, but the aestheticness is lost. --kc_kennylau (talk) 03:30, 25 August 2014 (UTC)

The meaning of "word"

We have had some disagreements of late as to what constitutes a "word". Our corpus offers a few relevant definitions, including:

  • A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern.
  • A distinct unit of language which is approved by some authority.
  • Any sequence of letters or characters considered as a discrete entity.
  • Different symbols, written or spoken, arranged together in a unique sequence that approximates a thought in a person's mind.

So what is a "word" so far as "all words in all languages" is concerned? bd2412 T 18:33, 15 August 2014 (UTC)

There is that old classic definition, as MWOnline puts it: "any segment of written or printed discourse ordinarily appearing between spaces or between a space and a punctuation mark". This definition has the virtue of fitting with the use of written documents for attestation, which virtue is not shared by any of the four above. Of course, it is not the only definition that might have that virtue. DCDuring TALK 20:02, 15 August 2014 (UTC)
It also has the drawback of not working for languages that aren't written with spaces, or aren't written at all. —Aɴɢʀ (talk) 20:04, 15 August 2014 (UTC)
And, of course, MWOnline has 10 main senses, 20 individual definitions of word, of which two senses, four definitions (including that above), may be relevant to (y)our discussion. DCDuring TALK 21:52, 15 August 2014 (UTC)
Several of the recent discussions about what constitutes a word were specifically about whether romanizations were words. On that subject, I've opined that romanizations are not words but representations of written words, like the shadows of things in Plato's Cave. Cambridge’s definition is interesting to consider in this context; it says a word is "a single unit of language that has meaning and can be spoken or written" — as if in their view words themselves are concepts, like the Platonic concept of jar, and spoken and written forms are just instances, like actual jars (and then, in my analysis, romanizations are the shadows of the jars). Cambridge's definition also implies that words have to belong to languages.
Even if one doesn't share my view of romanizations, it may be difficult to write a definition of "word" that applies to all words, doesn't apply to anything other than words, and yet isn't a paragraph long. I'll think about it, but for now, to add to the above list of other references' definitions, Wikipedia defines word as "the smallest element that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning) [...or more concisely...] the smallest meaningful unit of speech that can stand by [itself]".
- -sche (discuss) 21:20, 15 August 2014 (UTC)
I would actually go further and say that written language isn't language at all but only the representation of language, just like a painting of a pipe isn't a pipe. —Aɴɢʀ (talk) 22:10, 15 August 2014 (UTC)
I would have to disagree with you there. I would say that written languages and spoken languages are different languages using different media, but heavily influenced by each other. --WikiTiki89 22:21, 15 August 2014 (UTC)
It would be fascinating if we had a dictionary of sounds, where the "reader" could speak the sound and in response be told the meaning of it, with no visual symbols being used at all. bd2412 T 23:01, 15 August 2014 (UTC)
An IPA dictionary would be the next closest thing. I'd love to work on that, but it would be hard. —CodeCat 00:50, 16 August 2014 (UTC)
  • From MW Online:
  • a - (1) : a speech sound or series of speech sounds that symbolizes and communicates a meaning usually without being divisible into smaller units capable of independent use
    (2) : the entire set of linguistic forms produced by combining a single base with various inflectional elements without change in the part of speech elements
  • b - (1) : a written or printed character or combination of characters representing a spoken word <the number of words to a line> []
    (2) : any segment of written or printed discourse ordinarily appearing between spaces or between a space and a punctuation mark
  • -- HTH DCDuring TALK 21:52, 15 August 2014 (UTC)
    I notice that all of the definitions we've discussed so far fail to cover words from sign languages that lack written forms. Many of the above-mentioned definitions (e.g. Cambridge's) do cover words from sign languages like ASL, DGS, etc, because words from those languages can be written (using SignWriting, HamNoSys, etc), but not all sign languages can be written. - -sche (discuss) 22:36, 15 August 2014 (UTC) clarified 03:33, 16 August 2014 (UTC)
    There's another question raised by that - do we need to pick one definition for "word" or should we, in an abundance of caution, include within our sweep several different formulations? Should we include every distinct unit of sounds in speech or written letters with a particular meaning, plus every distinct unit of language which is approved by some authority, plus every sequence of letters or characters considered as a discrete entity? bd2412 T 00:45, 16 August 2014 (UTC)
    That's a good question. My initial reaction when I thought of sign languages was that it would make sense to have different definitions for (1) spoken/written languages' words and (2) sign languages' words, or even (1) spoken languages' words, (2) written languages' words and (3) sign languages' words. But it seems to me that there's a basic concept behind all of those, a basic sense of "word"—if we can figure out how to formulate it—that "spoken word", "written word" and "signed word" are subsenses of. (Does it seem that way to you?) Other senses like "unit of language approved by some authority" — which I guess is the sense people use when they say "irregardless isn't a word"? — could either be additional senses on the same level as that basic sense, or subsenses of it.
    For a definition of the basic sense, what about "the smallest unit of language which has meaning and can be expressed by itself"? Would that include or omit anything it shouldn't, bearing in mind that I'd envision it having subsenses that would provide the details on the nature of written / spoken / signed words? - -sche (discuss) 03:33, 16 August 2014 (UTC)
  • An interesting issue is caused by clitics. English has a clitic in the 's ending, but some languages have a lot of them. For example, Arabic has clitic object and possessive pronouns, plus clitic prefixes like wa- "and", fa- "so", ka- "like", sa- (future tense), etc., and many Arabic dialects have a clitic negative circumfix ma- ... . In some sense, adding a clitic to a word makes a larger word rather than two joined words; that's the nature of a clitic. At the very least, the resulting entity behaves as a single phonological word, usually with a single primary stress and possibly secondary stress(es). But it's possibly not a single linguistic word (morphological word?) in that it's not the "smallest unit of language which has meaning and can be expressed by itself" -- that would be the part without any clitics added. I would say there should be a general rule that word+clitic combinations should not be entered into Wiktionary; I think there are already agreements of this sort in specific cases, e.g. the -que, -ve, -ne clitics in Latin.
  • Compound words are also a problem since they seem to also violate the "smallest unit of language which has meaning and can be expressed by itself" criterion but often have an idiosyncratic meaning, e.g. blackboard and blackbird are not the same as black board and black bird. Sometimes in English we write such compound words with no spaces, but not consistently; cf. red tape, red-eye/redeye/red eye (flight), data base/database, file name/filename, etc. In Mandarin the issue comes up even more acutely since most words are compounds of one sort or another and there are widely varying degrees of compositionality of meaning, phonological behavior as one word or several, etc. Benwing (talk) 10:03, 16 August 2014 (UTC)
  • So I guess that we are considering abandoning the notion of the lexicon in favor of our slogan. I suppose that each word in our slogan should be similarly parsed and that any limits of our user interface be ignored in pursuit of an ideal that won't be realized before Wiktionary collapses under the burden of idealism. DCDuring TALK 12:27, 16 August 2014 (UTC)
    We abandoned the notion of the lexicon (a collection of lexemes) right at the very beginning, when we decided to have individual entries (and in some cases citations and the like) for plural forms like peaches, conjugated forms like stumbled, and superlative forms like hungriest. Your fear of the "limits of our interface" is interesting, given that the entirety of Wiktionary can fit on some USB thumb drives, and that our sister project Wikipedia is showing no signs of such problems despite having over 33 million pages across all namespaces with no signs of slowing, compared to our 4.1 million. I'd say we have plenty of room to grow. bd2412 T 14:19, 16 August 2014 (UTC)
    The limits are only that we are stuck with screen output and keyboard input. Other forms of input and output are much more limited in the portion of our content that they can accommodate. DCDuring TALK 15:06, 16 August 2014 (UTC)
    Sorry, I misinterpreted your actual shortsightedness for an altogether different kind of shortsightedness. My wife asks her phone things all the time and gets answers (often from Wikipedia). Quite a few technologies exist to allow blind people to use the Internet, generally. The limitation on our interface is that Wikimedia has not yet initiated a vocal interface. bd2412 T 15:28, 16 August 2014 (UTC)
    I am very happy to leave farsightedness and idealism to those who enjoy living in fantasy worlds. Judging by the way speech recognition works as delivered by Google and Apple and its rate of progress to date, I would say that it has no relevance whatsoever for casual users of Wiktionary for the next ten years or more. Of course, hard-core user/contributors such as ourselves may find some use for it sooner, though judging by the use of speech-recognition technology in workplace situations (ie, one language, one speaker, narrow range of vocabulary) much less challenging than ours and much more equipped with technical resources, even this may not turn out to be true. DCDuring TALK 17:34, 16 August 2014 (UTC)
    You should brush up on Moore's law. Also, web accessibility. Your ten-year projection may be a bit out of line with the current state of the art. I just tried my wife's phone with a few definitions of non-English words and it did pretty well. Of course, all of this is a separate discussion from what is a "word" for our purposes. bd2412 T 21:36, 16 August 2014 (UTC)
    Show me. DCDuring TALK 01:56, 17 August 2014 (UTC)
    How would I go about doing that? In any case, I'll concede that it's a bit beyond the scope of the question of how "word" is defined for purposes of writing the dictionary. bd2412 T 03:33, 17 August 2014 (UTC)
  • The question is too general and philosophical, and thus of no practical value. What is a word should be decided on an individual language basis, primarily on the basis of the criterion of usability: how the users and third-party software would look up Wiktionary entries using the search box/API to find out a word's meaning and other metadata. The purpose of the "all words in all languages" motto is not to impose exclusive inclusion of words (as opposed to non-words, as we already have countless entries for non-words), but rather to extol the liberal principles of the project, namely the absence of "authorities" which decide what goes in or not, as opposed to actual attestations of language which provide real-world evidence of usage. --Ivan Štambuk (talk) 14:27, 16 August 2014 (UTC)
    • The question may well be overly general and philosophical, but if we're in the business of offering people definitions of words, we should have a handle on what words are. bd2412 T 13:42, 18 August 2014 (UTC)
      We are in the business of collecting human knowledge, at a scale that vastly exceeds the needs of 99% of people. People don't care whether something is a word or not when they look it up in the dictionary - they just want the meaning/translation and other goodies. Optimization of user interface for human consumption should be orthogonal to the underlying goal of documenting all instances of written (perhaps one day even spoken) language. The potentially harmful impact of overextending the formal definition of word to include the supposedly non-opaque compounds or set phrases, or "real" words with various affixes attached with debatable level of transparency is trivial. --Ivan Štambuk (talk) 13:45, 21 August 2014 (UTC)

I have overhauled our entry [[word]], adding several verb senses, adding more citations, etc. I left the four definitions which pertained to the linguistic sense of 'word' alone until the end of the overhaul, when I changed them like this (if it's too hard to pick through that diff, see this): I didn't reword the last three of the four senses at all (though I am about to RFV at least one of them), but I split the sense which had said
  1. A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern. [from 10th c.]
into one sense and two subsenses with supporting citations:
  1. The smallest unit of language which has a particular meaning and can be expressed by itself; the smallest discrete, meaningful unit of language. (Contrast morpheme.) [from 10th c.]
    1. The smallest discrete unit of spoken language which has a particular meaning, composed of one or more phonemes and one or more morphemes.
    2. The smallest discrete unit of written language which has a particular meaning, composed of one or more letters or symbols and one or more morphemes.
Further improvements are welcome and indeed encouraged. - -sche (discuss) 03:42, 18 August 2014 (UTC)
Postscript: I modified another of the 'linguistic'-sense-related definitions, like this. - -sche (discuss) 05:06, 18 August 2014 (UTC)

Multiple Categories For 1 Label?

I run into items in Special:WantedCategories from time to time that result from labels categorizing differently than contributors expect. For instance, kibutz is a Turkish word for an Israeli institution, so it only seemed logical to put "Israel" in the {{context}} template. This added the redlinked category Category:Israeli Turkish, because the system assumes Israel is where the term is spoken, rather than where the referent of the term is found. The same thing happens when someone uses classical to refer to something associated with ancient Greece and Rome: we have things set up to interpret that as referring to the classical stage of the language (this was done with Classical Chinese in mind), and thus we had the redlinked category Category:Classical English. Is there a way to have both a regional and a topical category for the same term, and to select between them? Or, lacking that, is there a way to turn off categorization for a single parameter in the {{context}} template, to have it display in the list, but not generate a bogus category? Chuck Entz (talk) 02:53, 16 August 2014 (UTC)

This has been raised and ignored before, but it might be the singer not the song. DCDuring TALK 02:56, 16 August 2014 (UTC)
I'd say the problem is more general in that we never had a proper way to distinguish regions as topics and as dialects. I think I brought this up before, when the label templates were being converted to Lua, but I think it was mostly ignored at the time like DCDuring said. —CodeCat 14:28, 16 August 2014 (UTC)
The problem isn't limited to regions. A region can be a usage context or a "topic", but so can a discipline, trade or profession. In addition, as Chuck points out, the categories are not necessarily intuitively connected to the label. Furthermore the mapping from labels to categories only has the sullen consent of users as it was basically imposed in the new regime. In the former regime there were many fewer topical categories, so much less often-inappropriate force-fitting was required and there could be a template for each major "context". These templates were apparently added based on a count of frequently used parameters of {{context}}, after some editing of user-inserted labels that were "close" to existing ones. The by-product was that there was an incentive for users to stick to the "contexts" for which templates existed because fewer keystrokes were required, but user flexibility was not otherwise discouraged. There were specific technical difficulties with template-only implementation of {{context}}, which difficulties made it an early target of Lua-based reform. However, there was a significant loss of capability in the simplistic implementation of the new approach. DCDuring TALK 14:53, 16 August 2014 (UTC)

where to put quotations when there are multiple spellings of a word?

Old French is notorious for having inconsistent spelling, e.g. standard arachier also appears as arrachier and arracher, among others. I have the latter two linked to the former as alternative forms, and the former lists the latter two as alternative forms. I have one quotation which involves the form arrache, which is a form of both arrachier and arracher. Should I (a) put this quote under arrachier because that's the more standard spelling of the two, or (b) put it under arachier because it's under this lemma that all the variant forms are "gathered"? Is there a rule for this? Benwing (talk) 10:10, 16 August 2014 (UTC)

There's also: (c) put the quote under arrache, since it's specific to the spelling. Chuck Entz (talk) 17:34, 16 August 2014 (UTC)
Are you saying "the more standard spelling of the two" is not the entry that has been made the lemma? That should be fixed before anything else is done, and then the quotation should go in the lemma entry = entry for the most standard spelling (=, if I follow what you're saying, the or at least a spelling from which the form found in the quotation derives).
For English and German, I often see quotations of one spelling placed in the lemma entry even if it is a different spelling, especially if the quotations are famous uses (e.g. from Shakespeare), or contribute to showing that the word has been in continuous use for a long time. There are even quotations of Chaucer (Middle English) placed in English entries, which I have been told off for moving to Middle English entries. Quotations are also placed in the entries for the spellings they use, if that is necessary to verify the existence of those spellings. So, in the general case, it seems you can place the quotation wherever seems most appropriate. - -sche (discuss) 17:48, 16 August 2014 (UTC)
Sorry, let me clarify, arrachier is more standard than arracher but arachier is the most standard form. Both arracher and arrachier would produce the same 3rd-singular present indicative arrache. I fixed things so that both arrachier and arracher point to arachier. You (-sche) suggest putting it under arachier if I'm not mistaken. Chuck says put it under arrache but there's no entry at all for that. Benwing (talk) 19:10, 16 August 2014 (UTC)
I didn't say "put it under arrache", just that doing so should be included as one of the options. I would vote for either the lemma or the actual spelling, but I don't claim to be an expert on Wiktionary best practices. As -sche (who knows far more about this than I do) said, putting the quote in the actual spelling is mostly done when the quote is for the purpose of showing usage of the spelling itself, rather than of the term independent of its spelling. It might also be useful if you had a quote that was ambiguous enough so you weren't sure which lemma it belonged to.
In general, the issue of which form to put content on boils down to a tradeoff of usefulness vs. practical concerns: it would be most useful to have things like the definition at every spelling, but practically, that would be a nightmare to keep in synch, so we have everything in the lemma. Being consistent in format is just another aspect of usefulness: the more consistent things are, the less thought you have to put into finding things. Here, I don't see practical concerns unless you want to put it in multiple places, so you're better off figuring out where people would likely want or expect it to be, and putting it there. Chuck Entz (talk) 20:30, 16 August 2014 (UTC)

is there a policy statement somewhere indicating the standard order of "alternative forms", "etymology", "external links", etc. sections?

There seem to be standards for how these sections are ordered, but I'm not sure where this is documented. I'm guessing it's something like this:

At level 3:

  1. Alternative forms
  2. Etymology
  3. Verb / Noun / etc.
  4. External Links

At level 4, under the headword:

  1. Definition
  2. Pronunciation
  3. Conjugation / Declension
  4. Derived words, Related words
  5. Descendants
  6. Synonyms
  7. Anagrams

But not really sure.

Benwing (talk) 10:15, 16 August 2014 (UTC)

See WT:ELE. DCDuring TALK 10:52, 16 August 2014 (UTC)
Pronunciation is level 3, just after etymology. An exception is when there are multiple etymologies and they have different pronunciations. — Ungoliant (falai) 18:01, 16 August 2014 (UTC)
I'd say the exception is when there are multiple etymologies and they have the same pronunciation—in that case, Pronunciation precedes Etymology 1. —Aɴɢʀ (talk) 18:30, 16 August 2014 (UTC)
Yeah, you’re right. — Ungoliant (falai) 18:34, 16 August 2014 (UTC)

Catalan rhymes

Our current rhymes pages deal exclusively with central/eastern Catalan, which is the basis for the standard language in Catalonia itself. In these dialects, there is significant vowel reduction in unstressed syllables, so that many more words rhyme when they did not before. But in Valencia as well as large parts of western Catalonia, there is much less vowel reduction. The Balearic dialects take an intermediate position, reducing less than central Catalan but more than Valencian.

Because our rhymes pages use central Catalan as a base, they're no use for anyone outside of that dialect area. On the other hand, vowel reduction is entirely predictable, so information would be gained if we used unreduced vowels in rhymes, while none would be lost. So I'd like to propose that we do not reflect vowel reduction in Catalan rhymes, so that all dialects can be covered.

This does not solve all cross-dialect problems for Catalan, as Balearic dialects have an extra vowel phoneme that the other dialects lack, while Eastern and Western Catalan differ in the application of é versus è. But it would be a step in the right direction at least. —CodeCat 19:51, 17 August 2014 (UTC)

If I understood it, this means basically to make rhymes in Western Catalan with regard to vowels (é/è is another problem). For example, Rhymes:Catalan/atʃus will be renamed to "Rhymes:Catalan/atʃos" indicating both pronunciations and Rhymes:Catalan/aɾə will be split into "Rhymes:Catalan/aɾa" and "Rhymes:Catalan/aɾe" with a note redirecting to the other page for rhymes in Eastern Catalan. I think it is clear enough for an intermediate Catalan and it is useful for rhymes in Western Catalan or Valencian. --Vriullop (talk) 17:25, 28 September 2014 (UTC)
Yes, much like that. For Catalan as well as other languages, we already have notices on the rhyme pages, which say that the words on the page rhyme with the words on another rhymes page in certain dialects that merge or drop some sounds. The Central Catalan -r in infinitives is an example of this, English has rhotic and non-rhotic pages, and in Dutch there's final -n. In each case, we choose the rhyme that gives the most distinction. —CodeCat 17:33, 28 September 2014 (UTC)
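
As a rough illustration of why nothing is lost by storing unreduced vowels, the Central Catalan rhyme can be derived mechanically from the unreduced one. The sketch below is only that: the function name and the vowel tables are made up for the example, and it assumes that the first vowel of a rhyme is the stressed one and that unstressed a/e reduce to ə and unstressed o to u. It ignores diphthongs and the é/è question mentioned above.

```python
# Hypothetical sketch: derive a Central Catalan rhyme from an unreduced one.
# Assumption: the first vowel of the rhyme carries the stress and is kept;
# every later (unstressed) a/e becomes ə and every later o becomes u.
VOWELS = set("aeɛioɔuə")
REDUCTION = {"a": "ə", "e": "ə", "o": "u"}


def reduce_rhyme(unreduced: str) -> str:
    """Apply Central Catalan vowel reduction to the unstressed part of a rhyme."""
    out = []
    seen_stressed = False
    for ch in unreduced:
        if ch in VOWELS:
            if seen_stressed:
                ch = REDUCTION.get(ch, ch)  # unstressed vowel: reduce if applicable
            seen_stressed = True  # the first vowel met is the stressed one
        out.append(ch)
    return "".join(out)


print(reduce_rhyme("atʃos"))                      # atʃus
print(reduce_rhyme("aɾa"), reduce_rhyme("aɾe"))   # aɾə aɾə
```

Going the other way is not possible, since aɾə could come from either aɾa or aɾe, which is the reason for keeping the unreduced form as the page name.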

Nynorsk translations and interwiki links


As you know, the Bokmål translations can have an interwiki link to the Bokmål Wiktionary when the translation exists in this project. But for the Nynorsk translations, we don't often have this chance. Indeed, the Nynorsk Wiktionary still exists but its community does not seem very active anymore (nn:Special:RecentChanges). Therefore, maybe we could route the « nn » code to « no » in Module:translations, as is done for « nb »; that is to say, have an interwiki link to no.wiktionary for translations in Nynorsk, instead of an interwiki link to nn.wiktionary.

Here is a previous discussion about the proposal to rally efforts on Bokmål Wiktionary: w:nn:Wikipedia:Samfunnshuset/Arkiv/2009#Wiktionary, et fellesnorsk prosjekt.

And you'll find here an analysis of Nynorsk translations on this project that would have an interwiki link if they were routed to no:: User:Automatik/Analysis of Nynorsk translations.

What do you think of routing nn code to no for interwiki links in translations? — Automatik (talk) 00:23, 19 August 2014 (UTC)

Support. Before the Chinese merger we used to do this for Mandarin (cmn) terms, which linked to Chinese (zh) Wiktionary. BTW, Chinese merger is a success and helped increase contents for Chinese varieties (esp. Cantonese, Min Nan, Wu and Hakka), perhaps a good indication of what could have been done for Norwegian (Bokmål and Nynorsk) (vote didn't pass) and Arabic (no vote), even if the situation is not the same (e.g. Chinese lects have no inflection and PoS headers don't require genders or plural forms). --Anatoli T. (обсудить/вклад) 00:30, 19 August 2014 (UTC)
I think the question we need to ask ourselves first is what the purpose of interwiki links is in translations. Is the purpose to lead the user to more information about the term outside of what the English Wiktionary has available, regardless of what language it is presented in? Or is the purpose specifically to lead to the entry in that specific language? Or something else? If we just want to link to more information, then we really want lots of interwiki links, not just to one specific language. But if the idea is to match the language specifically, then we shouldn't be linking to Bokmål from Nynorsk translations. —CodeCat 13:20, 20 August 2014 (UTC)
The purpose is to link to a definition of the term in its own language. In this case, it's the same language. It may even make sense to link to all three of the Norwegian Wiktionaries (when the entries exist) for every Norwegian term. --WikiTiki89 14:09, 20 August 2014 (UTC)
Actually there are not three but two Norwegian Wiktionaries (no: and nn:). And 'no' stands for Bokmål even if in ISO it stands for Norwegian and 'nb' stands for Bokmål. — Automatik (talk) 21:08, 20 August 2014 (UTC)
Oh. In that case there is no question that, at the very least, we should point Bokmål translations to the no.wikt. I assumed there were three because Serbo-Croatian has four (Bosnian, Croatian, and Serbian, as well as the combined Serbo-Croatian). --WikiTiki89 21:22, 20 August 2014 (UTC)
Bokmål translations already point to no.wikt. I asked for Nynorsk translations because no.wikt is also written in Norwegian and is more active. But I can understand the reluctance of some (why not point to nn.wikt if it's not closed and still exists?… for the reason explained before I guess, but I understand). — Automatik (talk) 22:14, 20 August 2014 (UTC)
Oh sorry, I completely understand now. In that case, I'm still going to say that all three of {{t+|no}}, {{t+|nn}} and {{t+|nb}} should point to both of no: and nn:. --WikiTiki89 00:15, 21 August 2014 (UTC)
But what should be done for the distinction between {{t}} and {{t+}} in this case? The template would not have any way to say that no: has an entry but nn: does not. —CodeCat 00:19, 21 August 2014 (UTC)
It would have to be a bit more complex. Something like {{t|nn|foo|iw=no,nn}}, {{t|nn|bar|iw=no}}. Each language would have to have a list of allowed interwikis. Not to mention User:Rukhabot would have to be updated (ping User:Ruakh). --WikiTiki89 00:35, 21 August 2014 (UTC)

Sorry I'm very late here. The no project isn't a Bokmål only project anymore. It encompasses both nn and nb. All words are (at least should be) clearly marked to denote if they are one or the other or both. So linking to the no project will yield the most up to date definitions and correct grammar etc. --Teodor (dc) 20:43, 22 September 2014 (UTC)

One could just as well argue that sv, da, nb and nn should link to whichever of them is the biggest, most active, with the best content or whatever at the time, since they are, linguistically, versions of the same language.

I could support having fallback links, like, for Nynorsk: 1. nb > 2. sv > 3. da. Similar arrangements for other languages (with the requirement that they are very closely related, to promote a linguistic rather than political and/or geographical link). --Njardarlogar (talk) 17:18, 23 September 2014 (UTC)
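
To make the "list of allowed interwikis per language" idea above a bit more concrete, here is a minimal sketch. It is not how Module:translations or any bot actually works; the table contents, the function name and the `has_entry` check are all assumptions made for the example.

```python
# Hypothetical table: for each translation language code, the Wiktionary
# editions it may link to, in order of preference. The values are assumed
# for illustration only.
ALLOWED_INTERWIKIS = {
    "nb": ["no"],
    "nn": ["nn", "no"],
    "no": ["no", "nn"],
}


def interwiki_targets(lang, term, has_entry):
    """Return the editions a translation should link to.

    has_entry(wiki, term) is a caller-supplied check (hypothetical) that says
    whether the given edition has a page for the term. Languages without an
    entry in the table simply link to their own edition.
    """
    candidates = ALLOWED_INTERWIKIS.get(lang, [lang])
    return [wiki for wiki in candidates if has_entry(wiki, term)]


# Toy data standing in for "which pages exist where".
existing = {("no", "foo"), ("nn", "foo"), ("no", "bar")}


def check(wiki, term):
    return (wiki, term) in existing


print(interwiki_targets("nn", "foo", check))  # ['nn', 'no']
print(interwiki_targets("nn", "bar", check))  # ['no']
print(interwiki_targets("nn", "baz", check))  # []
```

An empty result corresponds to plain {{t}}, a non-empty one to {{t+}} with the listed targets, which is roughly what the iw= parameter suggested above would record. A fallback order across closely related languages would just be a longer list per code.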

Cantonese translations

(Notifying Kc_kennylau, Wyang): The topic above gave me an idea that Cantonese (yue) translations could be linked to the Chinese (zh) Wiktionary. Although the Chinese Wiktionary doesn't always provide Cantonese transliterations or other info, it may be helpful to look up other things. Written Cantonese shares about 99% of terms with Mandarin. Perhaps Cantonese jyutping can be loaded with a bot here and Chinese Wiktionary, Wyang has created a framework for this. --Anatoli T. (обсудить/вклад) 00:38, 19 August 2014 (UTC)

@Atitarev: 係咪?(Do you mean this?) --kc_kennylau (talk) 00:41, 19 August 2014 (UTC)
@Kc kennylau: Thanks, Kenny. Well, yes, single character entries (zi) have jyutping but multi-characters often don't. Since Cantonese Wiktionary doesn't exist, it would probably make sense to use Chinese (for semantics, translations). -Anatoli T. (обсудить/вклад) 00:50, 19 August 2014 (UTC)
@Atitarev: I still don't understand what you mean. Can you provide me with an example? --kc_kennylau (talk) 00:56, 19 August 2014 (UTC)
@Kc kennylau: OK, Mandarin translation (at China#Translations) 中国 (zh) (Zhōngguó) links to zh:中国, even if it uses "cmn" language code but Cantonese 中国 (zung1 gwok3) links to nothing, uses {{t}}, not {{t+}}. I suggest to redirect yue to zh, the way cmn is used in translations and use {{t+}} when a Chinese entry exists in zh:wikt. --Anatoli T. (обсудить/вклад) 01:03, 19 August 2014 (UTC)
@Atitarev: Oh, I thought you were talking about {{zh-usex}}, that's why I didn't understand. --kc_kennylau (talk) 01:05, 19 August 2014 (UTC)
@Kc kennylau: Do you support this idea? Or you think zh:wikt shouldn't be used for Cantonese? --Anatoli T. (обсудить/вклад) 01:11, 19 August 2014 (UTC)
@Atitarev: Support. --kc_kennylau (talk) 14:30, 19 August 2014 (UTC)

Renaming existing uses of Template:term to Template:m

Recently I've noticed that User:Mulder1982 has been working through many entries and correcting and fixing etymologies, but he has also consistently replaced {{term}} with {{m}}. I fully support this action, but I think it would be more effective to use a bot to do this. Is it ok for me to run a bot to replace existing instances of {{term}} with {{m}}? —CodeCat 13:24, 20 August 2014 (UTC)

Yes, please do! It's hard work. :D But well, that has actually been a by-product. The main thing I've been doing these days is to remove redundant transliterations of Gothic. That could probably be done with bots too, I suppose. But yes, speaking for myself, do run the bot. Mulder1982 (talk) 13:27, 20 August 2014 (UTC)
I weakly support this and would point out that there is absolutely no urgency in the matter. --WikiTiki89 14:07, 20 August 2014 (UTC)
  • Oppose converting "{{term}}" to anything. Wiki markup is the user interface; a significant widespread conversion like this should be done via a vote, IMHO. See also Wiktionary:Beer parlour/2014/April#Convert Template:term.2Ft_and.2For_Template:term to Template:m.3F. By the way, where is the rationale? Moreover, Mulder1982 should not have been manually performing the conversion until they can demonstrate widespread support for such a conversion. --Dan Polansky (talk) 17:40, 20 August 2014 (UTC)
  • Support; at first I found it confusing that both {{term}} and {{m}} existed, since it wasn't obvious that they do the same thing. Being consistent would be less confusing for new users. Benwing (talk) 22:32, 20 August 2014 (UTC)
  • Support; I do this manually all the time. There is zero reason to use 10 characters (term|lang=) where 2 will do the same thing (m|). CodeCat, you mentioned once before that there are still many instances of {{term}} where no language is specified. I would say those should not be changed at all, rather than being changed to {{m|und}}; after all, probably over 95% of those cases are English anyway. —Aɴɢʀ (talk) 22:44, 20 August 2014 (UTC)
    • We can either change them to "und" and then later fix all instances of {{m|und|...}}, or we can leave them as they are. They're going to need to be fixed either way. —CodeCat 22:49, 20 August 2014 (UTC)
      • There are some cases where we anticipate and tolerate that linking templates are used with the language code 'und', aren't there? E.g. when Phaistos disc or Buyla inscription particles are linked to, or when forms from intermediate stages of reconstructed languages or language families are mentioned (pre-Proto-Algonquian (post-Proto-Algic), etc, and IIRC 'Middle Iranian'). Hence, I think it makes sense to leave language-less {{term}}s as they are, as they constitute a different category of things that need to be cleaned up from {{m|und}}s. (In some cases, language-less {{term}}s may actually need to be converted to {{m|und}}s, but in a lot of cases they need instead to be labelled as English or, in etymologies, Danish or Norwegian.) - -sche (discuss) 18:09, 21 August 2014 (UTC)
        • Of course. But if we have a cleanup category for uses needing a language, it doesn't imply that the category needs to be emptied out altogether. There will be some where the use of "und" is legitimate, but by far the most of them won't be, so a category would still be helpful. —CodeCat 18:12, 21 August 2014 (UTC)
          • It makes it harder to clean up when the cleanup category is filled with legitimate uses. It would be better to leave correct usages as {{m|und|...}} and unknown usages as {{term|...}}. --WikiTiki89 18:24, 21 August 2014 (UTC)
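
For what it's worth, the conversion discussed in this thread could look roughly like the sketch below. It is only an illustration, not the bot's actual code: the function name and regular expression are made up, it assumes that lang= in {{term}} simply becomes the first positional parameter of {{m}}, and it does not handle nested templates or piped links inside the parameters. Uses without a language are left untouched, as suggested above.

```python
import re

# Matches a {{term|...}} call with no nested braces inside (an assumption
# that keeps the sketch simple; real wikitext can be messier).
TERM_RE = re.compile(r"\{\{term\|([^{}]*)\}\}")


def convert_term(wikitext: str) -> str:
    """Rewrite {{term|...|lang=xx}} as {{m|xx|...}}; skip language-less uses."""

    def repl(match):
        lang = None
        rest = []
        for param in match.group(1).split("|"):
            if param.startswith("lang="):
                lang = param[len("lang="):].strip()
            else:
                rest.append(param)
        if not lang:
            # No language given: leave as {{term}} so it can be fixed by hand.
            return match.group(0)
        return "{{m|" + "|".join([lang] + rest) + "}}"

    return TERM_RE.sub(repl, wikitext)


print(convert_term("From {{term|foo|lang=de}} and {{term|bar}}."))
# From {{m|de|foo}} and {{term|bar}}.
```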

This was proposed just a couple of days back at Wiktionary:Requests_for_moves,_mergers_and_splits#Template:term_into_Template:m, with three opposes. --Dan Polansky (talk) 22:58, 22 August 2014 (UTC)

I have created a vote: Wiktionary:Votes/2014-08/Migrating from Template:term to Template:m. --Dan Polansky (talk) 23:09, 22 August 2014 (UTC)

Capitalizing proper nouns in reconstructed languages

Should we capitalize proper nouns in reconstructed languages? For example, should the Proto-Slavic word for Rome be located at *Rimъ or *rimъ?

From my point of view, reconstructions represent sounds, not spellings, and sounds have no notion of capitalization. Additionally, if we choose to use capitalization, we would be forced to pick a particular language's rules for when to capitalize words. In the case of Proto-Slavic, the various Slavic languages themselves have varying rules for capitalization. --WikiTiki89 17:42, 20 August 2014 (UTC)

I think we should not capitalise. —CodeCat 17:57, 20 August 2014 (UTC)
If it's a convention to standardize them in the literature, which seems to be the case, we should do it as well.
Contrary to the popular misconception, reconstructions don't represent sounds but phonemes. Phonetic reconstruction (*abc -> *[abc]) is a different category and has little to do with comparative method. --Ivan Štambuk (talk) 18:02, 20 August 2014 (UTC)
Allophones can be reconstructed in some cases, though. Sometimes allophones become phonemes later on, but we can also reconstruct them through the effect they have on sound changes. For example, we can reconstruct two allophones of Old English /x/ because of later developments in late Middle English. For Proto-Germanic we know that an allophone [ŋʷ] existed because of the effects of the w:Boukolos rule. And for Proto-Indo-European the allophony of syllabic and consonantal semivowels is also well known. —CodeCat 18:08, 20 August 2014 (UTC)
Unless I'm going into detail about phonology/phonetics, there is no reason to differentiate sounds from phonemes here. Neither sounds nor phonemes have any notion of capitalization. But as to your first point, I'm willing to bet that most sources that capitalize reconstructed proper nouns would use the rules of their own language to determine when a word should be capitalized. For example, Vasmer would use the Russian rules, while the Hrvatski etimološki rječnik would use the Croatian rules. In the cases where the rules differ, how would we decide which one to follow? --WikiTiki89 18:11, 20 August 2014 (UTC)
Probably we should find common rules and use them, like we do with reconstructed words. —Useigor (talk) 18:15, 20 August 2014 (UTC)
We already capitalize phonetic transcriptions like Pinyin so there is a precedent. The whole transcription of sounds vs. alphabetic words dichotomy is false - this is an issue of arbitrary convention and nothing else. Thousands of largely unwritten languages use scholarly transcription schemes invented to accommodate their phonology, with capitalization rules conveniently imposed. Rules for capitalization are generally the same across all Slavic languages - the few corner cases like demonyms could be decided by counting the preferred form in reflexes. --Ivan Štambuk (talk) 13:29, 21 August 2014 (UTC)
Pinyin has its own standard capitalization rules. --WikiTiki89 13:39, 21 August 2014 (UTC)
Which are completely arbitrary (though based on Latin-script languages) which just proves my point that something being not a "real" word but rather a transcription is no argument. --Ivan Štambuk (talk) 13:49, 21 August 2014 (UTC)
The rules themselves may have been chosen arbitrarily, but our rule is straightforward and logical (namely: follow the standard). In the case of reconstructions, there is no standard to follow, so the arbitrary choice is on us, and we shouldn't be making such arbitrary choices. --WikiTiki89 13:59, 21 August 2014 (UTC)
We do not follow the standard but the attested usage. That the usage overwhelmingly conforms to the standard is a different issue - but even if it didn't we'd still have to add them because they're attested as such. Conversely, the lack of de iure standard does not preclude the possibility of forming one on our own, as it has already been done countless times, on the basis of de facto standard established in the literature, or for pragmatic reasons. Furthermore, the precedent set by Pinyin where capitalization and other orthography rules are being assigned to a purely transcriptional notation demonstrates that the practice is far from unnatural, and that the dichotomy of word vs. transcription is a false one. --Ivan Štambuk (talk) 18:29, 21 August 2014 (UTC)
The "dichotomy" that I was referring to is not word vs. transcription, but spelling vs. transcription. Both spellings and transcriptions represent words (in fact, as we see in the discussion above, no one actually knows what a word is). I never said that capitalization in transcriptions is unnatural, only that we shouldn't use it for proto-languages. Pinyin was designed as a multipurpose system. One of these purposes is embedding Chinese words in foreign language text ("President Xí Jìnpíng did such-and-such."), which is why capitalization is useful, since foreign languages expect capitalization. However, we do not do this at all with proto-languages; we don't write "Shrines of *Perunъ were located either on top of mountains or hills", we only use this transcription to talk about the word itself. Attestations of the spelling "*Perunъ" are not really attesting anything, they are just simply how other researchers choose to transcribe the term. We are free to transcribe the term however we please (for example, we could transcribe it as *Perūnu) as long as it is internally consistent, but it is in our own best interest to ensure that the transcription conforms with some sort of norm, so that our readers who are accustomed to reading about Proto-Slavic will not be confused. However, I doubt much confusion would come from not capitalizing nouns. In fact it does our readers a disservice to pretend that *Perunъ the god is somehow a different word from *perunъ the weather phenomenon, when we in fact do not know whether the Slavs of the time saw them as the same thing or different. --WikiTiki89 18:55, 21 August 2014 (UTC)
The deity appellation is the original term, and the common noun *perunъ is secondary, derived from the proper noun. Far from being a disservice, it is in cases such as this when proper and common noun denote different entities but differ only in capitalization, that capitalization of proper nouns is helpful for lemmatization. Proper noun is uncountable and animate, whereas the common noun would be countable and inanimate, and both have separate and different set of reflexes in daughter languages.
Your argument was that "reconstructions represent sounds and not spellings", from which it is now evident that when you typed spelling you meant word. Why it shouldn't be used for reconstruction still remains a mystery. This "fictional reader" argument gets thrown a lot for every imaginable dispute and I don't buy it. If anything, readers would expect proto-terms to conform to capitalization rules of ordinary languages.
Regardless of the Pinyin's design goals, there is fundamentally no difference between "Shrines of *Perunъ" and "President Xí". Both are not "real" words to the same extent, regardless of how you define the term. --Ivan Štambuk (talk) 00:08, 22 August 2014 (UTC)
It doesn't matter what the original term was (in fact I'm curious how we even know which one it was). The point is they may or may not be different words, they may or may not have had different spellings (if they even had any sort of writing), but the only thing we do know is that they were pronounced the same. "President Xí" and other Pinyin words are attestable as uses in English, while reconstruction notation is only attestable, and therefore only meant for, mentions. --WikiTiki89 03:42, 22 August 2014 (UTC)
Just on capitalisation of Chinese/Japanese/Korean romanisations to no-one in particular, since it was mentioned here. There are certain rules, used by various dictionaries and standards, not just for the use of loanwords in English but e.g. educational purposes. Place names and people's names are definitely capitalised, even the rules for word spacing and hyphenations are described. So, "Xí Jìnpíng" is a standard modern transliteration (pinyin) for 习近平 but "Xí Jìn-píng" or "Xí Jìn Píng" is not. Months, days of the week are not capitalised. There are some discrepancies as for demonyms and language names e.g. 中国人 (Chinese person), 中文 (Chinese language). Both "zhōngguórén" and "Zhōngguórén" (less commonly "Zhōngguó rén"), "zhōngwén" and "Zhōngwén" are used by various dictionaries. An agreement between editors is required on capitalisations in this case. --Anatoli T. (обсудить/вклад) 04:27, 22 August 2014 (UTC)
In some transcription schemes for proto-languages, capital letters are semantically different from lower-case letters. For example, Proto-Brythonic is often reconstructed with both *b and *B, where Proto-Brythonic *B is the descendent of Proto-Celtic *b, while Proto-Brythonic *b is the descendent of Proto-Celtic * in a leniting environment. So capitalizing proper names in those cases would be a bad idea. And I'm generally opposed to it for other proto-languages where that isn't an issue, too. —Aɴɢʀ (talk) 18:22, 20 August 2014 (UTC)
What about the reconstructed Latin words? --kc_kennylau (talk) 06:22, 22 August 2014 (UTC)
The difference with Latin, is that Latin orthography is well known. For most proto-languages, the orthography is 100% artificial. --WikiTiki89 12:07, 22 August 2014 (UTC)

More entries in the lemma category than in its subcategories?

I am trying to figure out where the lemmas are coming from in Category:Hungarian lemmas. It has more than 16000 entries but the subcategories contain only about 14000, even if I include the phrases, proverbs, suffixes, prefixes which are not specifically listed under the subcategories. Are there other categories? --Panda10 (talk) 13:24, 21 August 2014 (UTC)

You can see a full list of the categories that are recognised as "lemmas" at the top of Module:headword. —CodeCat 13:37, 21 August 2014 (UTC)

Letter petitioning WMF to reverse recent decisions

The Wikimedia Foundation recently created a new feature, "superprotect" status. The purpose is to prevent pages from being edited by elected administrators -- but permitting WMF staff to edit them. It has been put to use in only one case: to protect the deployment of the Media Viewer software on German Wikipedia, in defiance of a clear decision of that community to disable the feature by default, unless users decide to enable it.

If you oppose these actions, please add your name to this letter. If you know non-Wikimedians who support our vision for the free sharing of knowledge, and would like to add their names to the list, please ask them to sign an identical version of the letter on change.org.

-- JurgenNL (talk) 17:35, 21 August 2014 (UTC)

Process ideas for software development


I am notifying you that a brainstorming session has been started on Meta to help the Wikimedia Foundation increase and better affect community participation in software development across all wiki projects. Basically, how can you be more involved in helping to create features on Wikimedia projects? We are inviting all interested users to voice their ideas on how communities can be more involved and informed in the product development process at the Wikimedia Foundation.

I and the rest of my team welcome you to participate. We hope to see you on Meta.

Kind regards, -- Rdicerb (WMF) talk 22:15, 21 August 2014 (UTC)

--This message was sent using MassMessage. Was there an error? Report it!

Haida lects

Wiktionary currently includes Southern Haida (hax), Northern Haida (hdn), and the macrolanguage they are sometimes considered to form, Haida (hai). Μετάknowledge and I discussed this on my talk page and are of the opinion that we should deprecate the macrolanguage hai and have only hax and hdn. The phonological and other differences between Northern and Southern Haida are, as linguist Michael Krauss puts it, "rather great, allowing only partial mutual intelligibility without practice, perhaps like Swedish and Danish, or German and Dutch." Translator Robert Bringhurst says "[i]t is, in fact, chiefly out of courtesy that northern and southern Haida are described as two dialects rather than two close but separate languages. By 1900, north and south had clearly known centuries of diverging cultural growth." Each language indeed has its own dialects (Kaigani and Masset Haida are the mutually intelligible dialects of Northern Haida, Skidegate and the now-extinct Ninstints constitute Southern Haida). - -sche (discuss) 19:08, 22 August 2014 (UTC)

CFI and Non-Deities in Classical Mythology

Is there any reason we should keep most entries or senses referring to individual people in Greek mythology? It's true they're not covered by the "given name and surname" rule in CFI, but the names often have no meaning beyond their reference to the individuals themselves. Do we really need definitions that read like "daughter of so-and-so and so-and-so, wife of King so-and-so, and mother of so-and-so"? Chuck Entz (talk) 22:27, 22 August 2014 (UTC)

We should keep them for the lexicographical information that they carry, including etymology and pronunciation; that is for keeping the entries. As for keeping senses, that is kind of natural to me. A related poll: Wiktionary:Beer parlour/2010/December#Poll: Including individual people. --Dan Polansky (talk) 22:31, 22 August 2014 (UTC)
While these words do refer to the individuals themselves, the individuals may be so widely known that their names are assumed understood. If you consider the use-mention distinction, then most people will be introduced to a reader much the same as any other unfamiliar term is. But certain people are assumed to be known and their names are therefore only used and not introduced. This applies not just to Greek mythological names but also to names in modern times. For example Elvis can be seen in this way. Of course the knowledge of the person fades with time as popular culture shifts, but some names stay known for longer than others, just look at Bonnie and Clyde, Hitler, Napoleon, Caesar, Tutankhamun etc. These names definitely have "meaning". The meaning is tied to a cultural context, so that when that context is lost, the meaning goes with it. But in a sense, learning who Napoleon was is not so different from learning what a lute or an ironclad is. —CodeCat 22:38, 22 August 2014 (UTC)
We could include them with a definition such as "Character in Greek mythology" with a link to the Wikipedia article. --WikiTiki89 22:46, 22 August 2014 (UTC)
What a horrible idea. --Dan Polansky (talk) 22:46, 22 August 2014 (UTC)
Why is it a horrible idea? We already do this for many Biblical characters. --WikiTiki89 22:49, 22 August 2014 (UTC)
Actually scratch that. I was thinking of what we do for non-English entries. --WikiTiki89 22:53, 22 August 2014 (UTC)

Asteroid Names

The IP who's been adding these has been better known for adding mountains of junk edits to Japanese and Chinese entries, and to English entries having to do with magic and mythology, among others. These are so innocuous, and the other stuff is so awful, that we've let them slide, for the most part. It occurred to me, though, that there are some conceptual issues to be addressed.

Asteroid/minor planet names are assigned by an international body according to specific rules, usually based on the wishes of the discoverer(s), and are accompanied by a unique number reflecting the order in which the names have been assigned. Thus, the first asteroid discovered and named is w:1 Ceres, but the 1 is often left off. The IP describes it as Ceres being "short for 1 Ceres". I'm not sure, though, if the number is really part of the name, or whether it's added to the name to make the full official designation. In practice, the number seems to be left off quite a bit in normal usage.

A more serious issue is what language header should be used. It seems clear to me that something like 1 Ceres, as a scientific name assigned by an international body, is interlingual. I'm not sure, though, if we have, or should have, entries for any of these. The question then becomes: is the sense at Ceres referring to 1 Ceres English, translingual, or some combination of both? Is the number like the author abbreviations in taxonomic names: necessary to be included at least once in a publication for the name to be technically complete, but mostly left off? Or is omitting it a sign that it's not translingual in that use? How is the name handled in other languages where script or morphology are different than the norm in the language for things like taxonomic names? Chuck Entz (talk) 04:04, 23 August 2014 (UTC)

Besides, w:fr:(1) Cérès begins with these words: "(1) Cérès (international designation (1) Ceres)". We can see in w:Category:Asteroids that the parentheses are often used. JackPotte (talk) 08:31, 23 August 2014 (UTC)
Wikipedia tells us that there are 625 000 asteroids to be named. The IP has concentrated on the first hundred. Could we at least decide that anything beyond the top 100 will be deleted? All human names are likely to be used to name asteroids and hurricanes (and doggies, dolls and teddy bears), it's not possible to define every instance. Ceres certainly needs the asteroid definition, but I don't think the numbered forms belong to a dictionary. --Makaokalani (talk) 11:28, 29 August 2014 (UTC)
@Makaokalani: I think we need something other than an arbitrary criterion for any class of items, like asteroids, that we include. By our often-hostile reaction to items that are boring linguistically and our loving attention to items of the tiniest linguistic interest we betray the self-serving bias of many of our decisions and even policies. If this dictionary is to serve an audience beyond ourselves, we need to have principles and criteria that have more of a user focus, that we can include in CFI and even in a slogan to supplement or even replace "All words in all languages".
Asteroid names seem clearly intended to be suitable for use in all languages, to the greatest practical extent. The official name with the number is almost guaranteed to be attestable in scholarly literature and lists. If we were to exclude mere inclusion in a list on the possible grounds that such inclusion was a mention, not a use, I'd not be surprised if we didn't drastically(!) reduce inclusion. Such a principle would need to be applied in all realms to avoid the kind of bias I refer to above. But such a principle has the disadvantage of requiring many RfVs and much attestation effort.
A "policy" decision to include a class of names (eg, asteroid names with or without the order-of-discovery number) is subject to "correction" by RfVs. Do we know which of the forms is most likely to be in use translingually in running text? Both? Can the lettered portion of the full asteroid name be used again for an asteroid name, so only the number and the combination would be unique? Ie, could there be a 23400987 Ceres as well as 1 Ceres?
It seems silly to have L2 sections for every language that attestably hosts discourse about asteroids for either form of the asteroid names. The pronunciation rationale sometimes advanced for toponyms seems less applicable to asteroids, which will much more rarely be pronounced. The precedent we've established for taxonomic names says that we don't have any pronunciation section for translingual terms. I could imagine some kind of automatic system for generating language-specific IPA to help users pronounce any such term, which might be a reasonable approximation of what actually takes place in discourse using such terms. DCDuring TALK 12:26, 29 August 2014 (UTC)
It's incorrect to say that "Ceres" is short for "1 Ceres", per se. Ceres is the planetoid's name, 1 Ceres is its systematic designation, which combines the object's name with its unique serial number (the number system came later than the naming, and the two are somewhat independent - early asteroids like Ceres were named without being numbered, and most modern asteroids are numbered without being named). A few asteroids/comets do have the same name (eg 209P/LINEAR and 118401 LINEAR, or the 9 Comets Shoemaker-Levy), which are taken from the names of their discoverers and have to be distinguished by their number. In running scientific text, asteroids and comets always seem to be given by name and number, but in tables only the number is given. From a translingual perspective, the English name takes precedence in scientific papers, but in common usage they may differ for transliteration reasons etc (what we call 67P/Churyumov–Gerasimenko, the German press call Tschurjumow-Gerassimenko or Tschuri), and some of the oldest objects have had other names grandfathered in (so the planetoid that we call Ceres, the Chinese call 穀神星 (star of the God of grain)).
Personally, I think a good rule of thumb would be to have any and all asteroids and comets where we can find three citations that name them without giving them any additional title or number (which would serve as an indication to the reader that they should check a sky catalogue, not a dictionary). "Ceres" is easily citable without the 1, and would therefore be kept - likewise, the more major recently-discovered minor planets like 99942 Apophis, 50000 Quaoar, 90377 Sedna and Comet Shoemaker-Levy 9 can be cited without the number. Wilson–Harrington on the other hand only appears in Google Books as "Comet Wilson–Harrington", "Wilson–Harrington (4015)" or "107P/Wilson–Harrington", and would therefore not be worth having, since in each case, it's clear that it represents an astronomical object. Halley's Comet would be an exception, I assume, since it uses the possessive (it's not Comet Halley) and has interesting translations in other languages. Smurrayinchester (talk) 08:55, 11 September 2014 (UTC)
So you would seem to favor making the language header for such names Translingual.
I am not sure that I understand the rationale for excluding any two-part asteroid name (ie, with the number) that is citable outside of a table. We rarely show any willingness to exclude multi-part names based on their ready intelligibility in context. It is neither a policy, nor an accepted practice, despite arguments that it should be. Certainly the first time one sees such a name one would almost certainly not be aware of any naming convention, though one might infer that the number and name collocate and behave as a unit. Is it really adequate for a user to be able to infer the definition "the 4015th astronomical object of this type to be formally recognized"?
Are we being consistent across topical domains in how we treat such proper names, both as to inclusion and the content of the definition? Consistency is not required, but somewhat similar cases might shed some light on lexical features that argue for inclusion and what makes for an adequate definition. Somewhat similar domains include multi-part taxonomic names, especially species names distinguished only by the discoverer's name; food additives (eg, E300), the class of which is almost always clear from context; and vernacular names of taxa (eg, oak and oak tree), which often include a hyponym (eg, tree) that is arguably redundant. Another type, with less broad inclusion, has Route 66 as an exemplar. DCDuring TALK 12:48, 11 September 2014 (UTC)
Smurrayinchester's suggestion sounds reasonable to me. - -sche (discuss) 00:33, 12 September 2014 (UTC)

Migrating from Template:context to Template:cx

FYI: Wiktionary:Votes/2014-08/Migrating from Template:context to Template:cx. --Dan Polansky (talk) 08:40, 23 August 2014 (UTC)

Where was this discussed? —CodeCat 12:33, 23 August 2014 (UTC)
You could start discussing it now, instead of retreating into a procedural defense invoking those you routinely ignore. It does make it look like you have no particular substantive arguments.
The proposal seems sensible. {{context}} could be retained as a redirect, as it affords an option that may not require consulting the documentation for casual users. The shorter name would reduce the size of the database. Making {{cx}} the effective template rather than a redirect would eliminate one extra redirect call at the time of page loading. This seems like basic efficiency, even though efficiency is a secondary concern. DCDuring TALK 13:40, 23 August 2014 (UTC)
It's usually Dan who resorts to procedure, so I was only returning what he does to me all the time. Petty maybe, but satisfying. —CodeCat 14:44, 23 August 2014 (UTC)
Ok, but where are the substantive arguments now? DCDuring TALK 15:20, 23 August 2014 (UTC)

FYI: Wiktionary:Votes/2014-08/Templates context and label. --Dan Polansky (talk) 17:47, 23 August 2014 (UTC)

CFI Misspelling Cleanup

FYI: Wiktionary:Votes/pl-2014-08/CFI Misspelling Cleanup. --Dan Polansky (talk) 09:32, 24 August 2014 (UTC)

Format and layout of language categories

Yesterday I made some changes to how language categories like Category:English language display. Instead of some prose, there is now a table which displays the information in a more systematic format, and it also shows information that did not appear before, like ancestors and other names. I do think that the table is a good addition, but I'm not really happy with how it looks. So I wonder if we should keep it this way. Is there something we could do to improve it? Or was the original format better? —CodeCat 13:46, 24 August 2014 (UTC)

A few things: 1. Maybe you could merge this table with the one on the right- having 2 different tables makes the page look messy. 2. If there are no ancestors of a language don't display that row. 3. Can descendants of a language be shown as well as ancestors? 4. In Category:English language, for example, in other names there is "Hawaiian Creole English". But "Hawaiian Creole English" isn't another name for English- it's its own thing. DTLHS (talk) 20:59, 24 August 2014 (UTC)
We include not just names for the language, but also names of varieties that are subsumed under the same language on Wiktionary. Category:Dutch language also lists Flemish for example. Showing descendants would be much harder to do, and would involve basically searching through all the languages to find any that have the current language as their ancestor, and then repeat. —CodeCat 21:05, 24 August 2014 (UTC)
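To illustrate why listing descendants costs more than listing ancestors, here is a minimal sketch of the "search through all the languages, then repeat" approach. The `ancestors` table and the function name `findDescendants` are hypothetical and purely illustrative; this is not how the category module actually stores its data.

```javascript
// Hypothetical sample data: each language code maps to the codes of its direct ancestors.
const ancestors = {
  en: ["enm"],
  enm: ["ang"],
  ang: ["gem-pro"],
  sco: ["enm"],
  nl: ["dum"],
  dum: ["odt"],
  odt: ["gem-pro"],
};

// Collect every language that has `code` somewhere in its ancestor chain.
function findDescendants(code, data) {
  const found = new Set();
  const queue = [code];
  while (queue.length > 0) {
    const current = queue.shift();
    // One full scan of the table per language visited:
    // this is the "search everything, then repeat" cost.
    for (const [lang, parents] of Object.entries(data)) {
      if (parents.includes(current) && !found.has(lang)) {
        found.add(lang);
        queue.push(lang);
      }
    }
  }
  return [...found].sort();
}

console.log(findDescendants("enm", ancestors));
```

Ancestors, by contrast, can be read directly from a language's own record, which is presumably why they were the easier row to add to the table.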
(e/c) This is somewhat orthogonal to your question, but the list of "Other names" highlights something I've been thinking about for some time, which is that it's both confusing for anyone looking at our data (e.g. people looking at WT:LOL, and now also people looking at Category:English language), and possibly even undesirable from a technical standpoint, that we conflate in the names= parameter both "X, another name of language Foo" and "Y, name of one dialect of language Foo which is subsumed under the header Foo". "Modern English" is indeed another name for "English"; anything that is ISO-code-en (i.e. post-1500) "English" can also, in linguistic context, be called "Modern English". That's very different from "Hawai'ian Creole English", which is not another name for English — not interchangeable with "English" — but merely the name of one non-independent (non-L2-having) dialect. So I wonder if we shouldn't split a dialects= parameter off from the names= parameter. (There will be a few edge cases where a name refers to both a dialect and the language itself. These could be handled by listing the name in both places, or by giving one parameter priority and saying e.g. "never list anything in the dialects= parameter which is already in the names= parameter".) - -sche (discuss) 21:13, 24 August 2014 (UTC)
I agree. And while we're at it, we might as well split off a separate value for the canonical name? —CodeCat 21:18, 24 August 2014 (UTC)
@CodeCat: After considering it (and then getting distracted and forgetting about this thread for a week), I think that would make sense. As long as we're splitting off both a canonical= parameter (whether we decide to call it that or something else) and a dialects= parameter, should we also rename the rump "names" parameter to something like alt_names=? - -sche (discuss) 05:39, 15 September 2014 (UTC)
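As a rough sketch of the proposed split, the following shows how one combined names list might be divided into the three parameters discussed above. The helper `splitNames` is hypothetical, and treating the first entry of the old list as the canonical name is an assumption made only for this example.

```javascript
// Hypothetical migration helper. The field names follow the proposal above
// (canonical, alt_names, dialects); the input layout is an assumption.
function splitNames(oldNames, knownDialects) {
  const [canonical, ...rest] = oldNames;
  const dialectSet = new Set(knownDialects);
  return {
    canonical,
    // Interchangeable names for the language itself.
    alt_names: rest.filter((name) => !dialectSet.has(name)),
    // Names of varieties subsumed under the same language header.
    dialects: rest.filter((name) => dialectSet.has(name)),
  };
}

const english = splitNames(
  ["English", "Modern English", "Hawaiian Creole English"],
  ["Hawaiian Creole English"]
);
console.log(english);
```

In this sketch a name that refers to both a dialect and the language ends up only under dialects; listing it in both places, as suggested above, would need an explicit exception.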
Somewhat unsurprisingly, I don't like the change. --Dan Polansky (talk) 18:15, 25 August 2014 (UTC)
I agree that the box should be merged with the one on the right. --WikiTiki89 18:20, 25 August 2014 (UTC)
Should the content that is currently on the right be put to the left, right, top or bottom of the current left box? —CodeCat 18:28, 25 August 2014 (UTC)
I'm not sure, but the merged box needs to end up on the right. --WikiTiki89 18:34, 25 August 2014 (UTC)
After thinking for a moment longer, I decided that I think what is currently on the left should go on top of what is currently on the right. --WikiTiki89 18:35, 25 August 2014 (UTC)

Red link headline in new entries

When I create a new entry, the line under the part of speech is a red link even after saving the entry. See -obb. I am using the {{head}} template. Is this how it's supposed to work? --Panda10 (talk) 20:05, 24 August 2014 (UTC)

The fuck? I didn't think {{head}} was supposed to create links at all, just the pagename (or head= parameter if there is one) in boldface. —Aɴɢʀ (talk) 20:09, 24 August 2014 (UTC)
Panda10, no this is not normal. The problem is of course in breaking a term.
BTW, null edit makes it blue (because null edit makes the module rerun, which on the second run sees the suffix on the list of available articles).
It is interesting that no one has noticed so far that suffixes and prefixes link to themselves.--Dixtosa (talk) 20:45, 24 August 2014 (UTC)
I think this must be fairly recent behavior. I suspect somehow it's treating affixes as two-term entries, like hot dog, where the two words are automatically linked. Only for some reason it thinks [[-obb]] is something like [[]][[-obb]]. —Aɴɢʀ (talk) 20:56, 24 August 2014 (UTC)
This is probably caused by recent changes to Module:headword that User:Wikitiki89 made. —CodeCat 21:03, 24 August 2014 (UTC)
Yes, this is my fault. I will attempt to fix it shortly. In the meanwhile, I am going to assume that this is not harmful enough to revert. Angr diagnosed it correctly. Only I was smart enough to remove the [[]], but not smart enough to remove the link from the [[-obb]]. --WikiTiki89 22:29, 24 August 2014 (UTC)
Done --WikiTiki89 23:06, 24 August 2014 (UTC)
But wait, I think hyphen should still be a separator. Now it is not--Dixtosa (talk) 12:14, 25 August 2014 (UTC)
If it isn't possible to add links when there are letters on both sides of the hyphen, while not adding links if the hyphen is the first or last character in the string, then better not to have automatic linking in words with hyphens at all, and add it manually where needed. —Aɴɢʀ (talk) 12:22, 25 August 2014 (UTC)
That is possible and fairly easy too. My first thought is one regex replace:
"text lol-lol2".replace(/([^\s\-]+?)(\-|\s)([^\s\-]+?)/g, "$1]]$2[[$3") and wrap the result in "[[" and "]]" if at least one replacement occurs. --Dixtosa (talk) 13:03, 25 August 2014 (UTC)
It's possible, but it is not always wanted. Very often hyphenated words should be linked together. --WikiTiki89 14:56, 25 August 2014 (UTC)
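For reference, the rule discussed above (treat a hyphen as a separator only when it has characters on both sides, and leave a leading or trailing hyphen alone) can be sketched as follows. The function name `linkParts` is hypothetical, the approach is a split rather than the replace suggested earlier, and it is not the code in Module:headword. It also does not address the objection that hyphenated words often should be linked as a whole.

```javascript
// Wrap each space- or hyphen-separated part of a headword in [[...]].
// A hyphen only counts as a separator when it has a non-separator character
// on both sides, so affixes such as "-obb" or "pre-" stay in one piece.
function linkParts(term) {
  // The capturing group keeps the separators in the result array,
  // so parts sit at even indexes and separators at odd ones.
  const pieces = term.split(/(\s+|(?<=[^\s-])-(?=[^\s-]))/);
  if (pieces.length === 1) {
    // Nothing to split: leave the term unlinked.
    return term;
  }
  return pieces
    .map((piece, i) => (i % 2 === 0 && piece !== "" ? `[[${piece}]]` : piece))
    .join("");
}

console.log(linkParts("text lol-lol2"));
console.log(linkParts("-obb"));
```

A manual override would still be needed for hyphenated words that should remain a single link.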

IPA alphabet

Where did the symbols e̞, o̞ and ø̞ go? They are needed at least for Finnish pronunciations. It's a small language, but big in en-Wiktionary. --Hekaheka (talk) 04:37, 25 August 2014 (UTC)

Please check Module:IPA. Someone may add those symbols, if they are valid. --Anatoli T. (обсудить/вклад) 04:46, 25 August 2014 (UTC)
Heka may have been referring to MediaWiki:Edittools, from which the symbols were removed earlier this month and to which they were just re-added. - -sche (discuss) 04:52, 25 August 2014 (UTC)
I saw that too. I thought these IPA symbols generate module errors. It would be the case, if they are missing in the module. --Anatoli T. (обсудить/вклад) 04:55, 25 August 2014 (UTC)
-sche, thanks for re-listing them. --Hekaheka (talk) 05:05, 25 August 2014 (UTC)
It was User:Wyang, actually. --Anatoli T. (обсудить/вклад) 05:14, 25 August 2014 (UTC)
Anatoli, they shouldn't cause an issue, because they are in the module. They are [eoø] + "combining tack below" for lowered articulation. The latter symbol is in Module:IPA/data under "primary articulation". --Catsidhe (verba, facta) 05:12, 25 August 2014 (UTC)
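To make the composition concrete: each of these symbols is a base vowel followed by the combining character U+031E (COMBINING DOWN TACK BELOW in Unicode), so a check only needs to know the base letters and the diacritic. The helper names below are hypothetical and the sketch is independent of how Module:IPA actually validates input.

```javascript
// U+031E COMBINING DOWN TACK BELOW, the IPA diacritic for lowered articulation.
const DOWN_TACK_BELOW = "\u031E";

// Build a lowered vowel symbol from a base vowel.
function lowered(base) {
  return base + DOWN_TACK_BELOW;
}

// List the code points of a symbol, e.g. to see what a validator would receive.
function describe(symbol) {
  return [...symbol].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const symbols = ["e", "o", "ø"].map(lowered);
console.log(symbols.map(describe));
```

Each symbol is therefore two code points, which is why a validator that knows both the base vowels and the diacritic accepts them without needing separate entries.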
OK, I haven't checked. --Anatoli T. (обсудить/вклад) 05:14, 25 August 2014 (UTC)
Why are they needed for Finnish? Does Finnish have three heights of mid vowels? In other words, are /e/, /e̞/, and /ɛ/ all separate phonemes? Or are [e], [e̞], and [ɛ] three distinct allophones of the same phoneme? Because there's no need to use the symbol e̞ at all, either allophonically or phonemically, unless you're already using both e and ɛ and need a third symbol that's distinct from both of them. —Aɴɢʀ (talk) 07:00, 25 August 2014 (UTC)
I can't explain this any better than it's currently done in Wikipedia: Finnish_phonology#Vowels. If I understand it correctly /e/, /e̞/, and /ɛ/ are indeed separate phonemes and Finnish happens to use /e̞/ and not /e/ as equivalent for the Latin alphabet "e". --Hekaheka (talk) 16:18, 26 August 2014 (UTC)
I don't think that it's only necessary to use the diacritic to indicate contrasts. If the phoneme is really /e̞/, why write /e/ which is less accurate? It would be a bit like writing /m/ instead of /n/ when a language has only one nasal consonant and its articulation is alveolar. Or writing /s/ when the language's only sibilant is postalveolar, or writing /p/ when the only labial plosive in the language is voiced. —CodeCat 16:41, 26 August 2014 (UTC)
The clause "if the phoneme is really /e̞/" doesn't make any sense. Phonemes aren't preassociated with IPA symbols. IPA symbols are convenient ways of representing phonemes and allophones, and there is always some wiggle room in their application. By longstanding phonetic convention, diacritics are to be avoided unless they illustrate some important distinction; and ordinary Latin-alphabet characters are to be preferred over modified ones whenever feasible. So if a language has only one front mid vowel, the convention is to use "e" to represent it. Using [e̞] by itself in fact doesn't make any sense, because [e̞] means "a sound more open than [e]", but if you don't use [e] in your transcription system of the language in question, then you haven't defined what sound [e̞] is more open than. You could say it's more open than the cardinal vowel [e], but in fact very few languages' vowels are located at exactly their cardinal value. (Neither English [i] nor German [i] is cardinal [i], though German is closer to it than English is.) So you have to define [e] in the context of your language before the symbol [e̞] is even meaningful. As for Finnish, Finnish phonology#Vowels does not say that /e/, /e̞/, and /ɛ/ are separate phonemes; it says that Finnish has a single mid front unrounded vowel phoneme which falls between cardinal [e] and cardinal [ɛ]. The authors of the Wikipedia article reveal their ignorance of how the IPA works by insisting on /e̞/ to mark that vowel, when by actual IPA conventions it should be transcribed /e/. —Aɴɢʀ (talk) 17:51, 26 August 2014 (UTC)
What he^ said. --WikiTiki89 18:04, 26 August 2014 (UTC)
If understanding of IPA relies on such conventions, then it kind of bypasses the point of having a universally applicable and unambiguous transcription system. The reasoning here also seems a bit circular in a way. On one side, you say you need to define [e] before defining [e̞], but then what does it mean to say that the Finnish sound is between cardinal [e] and [ɛ]? Surely that in itself means that there are absolute reference points that the lowering symbol is relative to? So the way I see it, the question is whether the correct transcription is [e̞] or [ɛ̝]? If the sound is between them, then either symbol is appropriate. —CodeCat 18:21, 26 August 2014 (UTC)
The "cardinal" ones are defined (although not sure how they are defined), but as Angr already said, very few languages have vowels that coincide with the "cardinal" values, so by CodeCat's logic, all languages should use a whole pile of diacritics on every vowel and every consonant. --WikiTiki89 18:28, 26 August 2014 (UTC)
See Cardinal vowels on how they're defined. Cardinal [i], [u], and [ɑ] are defined as the most high front vowel possible, the most high back rounded vowel possible, and the most low back vowel possible, and all the others are defined as being a certain acoustic distance between those. The IPA makes no claim of being able to represent every conceivable nuance in articulation (or acoustics) in every single spoken language in an unambiguous and universally applicable way. German Haus and English house sound quite different from each other, but both are—correctly—transliterated [haʊs]. —Aɴɢʀ (talk) 18:44, 26 August 2014 (UTC)
I realise that IPA is only an approximation. I see it as a kind of set: every symbol used on its own represents a certain set of possible articulated speech sounds. Diacritics narrow that set further down. But in the case being discussed here, it's not quite clear whether the sound belongs to the [e] set or the [ɛ] set, as it's equally distant to the cardinal value of both (as I understand it). So in this case using a diacritic seems warranted to clarify that the sound in question is an edge case. —CodeCat 18:54, 26 August 2014 (UTC)
Modern Hebrew and Standard Spanish are a couple of examples of other languages for which a sound right between [e] and [ɛ] is transcribed as /e/. --WikiTiki89 19:36, 26 August 2014 (UTC)
If the mid vowels of Finnish are midway between canonical [e], etc and [ɛ], etc, and our Finnish-speaking editors want to use the transcriptions [e̞], [ø̞], [o̞], then I think it would be unhelpful for people who don't speak Finnish to try to ban that notation from narrow transcriptions and mandate that the narrow transcriptions be less accurate than the Finnish-speakers want them to be. (The degree of specificity used in narrow transcriptions can and does vary from language to language, so the argument that we'd have to use many diacritics for other languages can be dismissed.)
In broad transcriptions I am inclined to accept that the vowels can be written /e/, /ø/, /o/ or /ɛ/, /œ/, /ɔ/ (de.WP uses the latter).
I would attach weight to how Finnish references transcribe the vowels. Perhaps Hekaheka has references on Finnish with IPA transcriptions and can say what symbols those references use. Checking Google Books, I find only a few English- or German-language references that give IPA-like transcriptions, but it's not clear to me that they are actually IPA and not just regular letters enclosed in IPA-like brackets:
  • Melvin J. Luthy's Phonological and Lexical Aspects of Colloquial Finnish (1973) speaks of "a syllable boundary between all low and mid vowels, e.g. [pa.eta], [kä.etä]."
  • Variation in Finnish phonology and morphology (1997) speaks of "the mid vowels /o, ö/".
- -sche (discuss) 19:47, 26 August 2014 (UTC)
If you ask me, this is an example of the bad Western-European-centric design of IPA. Most Germanic and Romance languages have at least four vowel heights, so for them it makes sense. But globally, far more languages have three heights than four. So having no symbol for a straight mid-vowel is quite a frustrating omission. —CodeCat 20:03, 26 August 2014 (UTC)
You're looking at it wrong. /e/ is what you're describing. /ɛ/ was only added to accommodate languages with four heights. --WikiTiki89 20:18, 26 August 2014 (UTC)
How do you know? —CodeCat 20:47, 26 August 2014 (UTC)
Because "e" has been used for "transcriptions" since Roman times, and they did not have a separate letter "ɛ". --WikiTiki89 20:50, 26 August 2014 (UTC)

I have relied on this source [3]. BTW, to my untrained ear, the Finnish "e" sounds about the same as English /ɛ/. --Hekaheka (talk) 21:42, 26 August 2014 (UTC)

But that's because English has no actual [e]. It has [eɪ] which is a slight closing diphthong, which probably sounds more like the Finnish [e̞i]. —CodeCat 23:05, 26 August 2014 (UTC)
Can you prove it? Lysdexia (talk) 04:05, 29 August 2014 (UTC)

Policy for Translations entries

Within the definition of most(?) English words, there is a "Translations" section giving corresponding words in various languages. (Like anything else with the word "translation" in it, this is not perfect, but can be immensely helpful.)

I guess there is a policy of not putting a "Translations" section on non-English words, which sounds reasonable since one can look at the English word, but there is an interesting class of exception to this rule -- when the English word does not exist. For example, 何番目 and wievielte are perfectly good Japanese and German words respectively, but the best we can do in English is the non-word whatth. (Personally I would have looked for "how manyth", but that isn't even a single word.) It seems to me it would be helpful to add the translations section to the foreign words, since otherwise a person who recalls that there is a way of saying this in at least two languages has no way of navigating from one to the other. Imaginatorium (talk) 10:08, 25 August 2014 (UTC)

I've thought about this too, and part of the problem is deciding which language to put the translation table in if we're going to add them for non-English languages. For example, a whole lot of languages have a single verb for "to be silent" (schweigen, zwijgen, taire, taceō, etc.), but English doesn't. I've often wanted to be able to put all of those words in a translation table, but where? If we allowed them in non-English languages, we'd have to have the same translation table in each one of the languages where this word exists, and that's a lot more than just the four I mentioned above. —Aɴɢʀ (talk) 11:20, 25 August 2014 (UTC)
Incidentally, to judge from Google Books, how manyth and/or how manieth might actually be attestable. —Aɴɢʀ (talk) 11:26, 25 August 2014 (UTC)
Another example would probably be "double eyelid" and "single eyelid". Wyang (talk) 12:56, 25 August 2014 (UTC)
The only way of adding these would IMHO be to allow sum-of-parts translation-only English entries. We have some of them already, see Category:English_non-idiomatic_translation_targets, but there is some risk that these entries are eventually deleted after discussion on WT:RFD. See also the discussion on Wiktionary_talk:Criteria_for_inclusion#Translation_target Matthias Buchmeier (talk) 17:54, 25 August 2014 (UTC)
Do you guys realize that we are not here to put everything that is interesting (even if it is linguistically interesting) into articles? Examples are this, dord (and the like), and some sum-of-parts terms that may have an interesting etymology or may be so widely used that they have become a thing, but... they are still sum of parts.
Anyway, I'll throw out an idea (probably a stupid one though xD): let's make a user script that will fetch (assuming JS can follow external links) the translation table from the language's own Wiktionary and insert it into our article, formatted in the same way as in English entries, and also apply assisted adding of translations, which actually adds to the foreign Wiktionary. In this way en:wikt runs the world :D of wiktionaries--Dixtosa (talk) 18:08, 25 August 2014 (UTC)
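On the question of whether a script can reach another wiki: the MediaWiki Action API can return a page's wikitext, and passing origin=* permits anonymous cross-origin reads from a browser. Below is a minimal sketch of building such a request. The function name `buildWikitextUrl` is hypothetical, and finding and reformatting the translation table inside the returned wikitext is left out entirely.

```javascript
// Build a MediaWiki Action API URL asking a given Wiktionary edition
// for the raw wikitext of one page.
function buildWikitextUrl(langCode, title) {
  const params = new URLSearchParams({
    action: "parse",
    page: title,
    prop: "wikitext",
    format: "json",
    origin: "*", // anonymous cross-origin request
  });
  return `https://${langCode}.wiktionary.org/w/api.php?${params.toString()}`;
}

console.log(buildWikitextUrl("de", "wievielte"));
```

A user script could pass this URL to `fetch` and then look for the translation section in the result; how that section is marked up differs from one Wiktionary to another, so the extraction step would have to be written per language.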
Hate to be a spoilsport, but we were not able to make automatic generation of entries work properly even within the English wiktionary. In principle we already have a way to find out translations of wievielter to other languages: one clicks "Deutsch" in the "In other languages" -list on the left side of the page and checks the translations in the German wiktionary. In the particular case of wievielter there's the problem that there's no article on it in de-wikt, but that would be a problem in any automated model as well. --Hekaheka (talk) 04:46, 27 August 2014 (UTC)
@Imaginatorium: I've made an entry for what number. --Anatoli T. (обсудить/вклад) 23:21, 25 August 2014 (UTC)
  • This could be solved by wrapping all such sum-of-parts English translations into a special template to track them in FL entries, and then generating (by bot) the appropriate entries in the Appendix namespace. This would really be an awesome feature - a cross-lingual glossary of terms otherwise not directly translatable to English, like that disputed list of terms that there was a big discussion about which I cannot find now (deleted?) containing "to call a mobile phone and let it ring once" and others. Some additional tags would be necessary though (e.g. part of speech). {{translation only}} is absurd - how am I supposed to know the meaning to translate if I knew only English and the missing FL? --Ivan Štambuk (talk) 15:19, 26 August 2014 (UTC)
    • From the entry name of course. —CodeCat 15:30, 26 August 2014 (UTC)
      Ha ha! Except that it's often not that self-explanatory - what number is barely even valid English and could be translated both as "what kind of" and "how much". Those translations referring to concepts that need entire English sentences to translate (like the notorious "to call a mobile phone") are too cumbersome to have as their own entries. --Ivan Štambuk (talk) 08:49, 27 August 2014 (UTC)

Proclitics in Hebrew

I saw the paragraph Wiktionary:About_Hebrew#Proclitics and the entry הערב. This seems inconsistent, although I don't see any problem with having entries such as הערב. — Automatik (talk) 15:14, 26 August 2014 (UTC)

הָעֶרֶב is a different story, since it is idiomatic in the sense of "this evening/tonight". There are a few other similar cases, such as הַיּוֹם (today). The problem with including the non-idiomatic ones, is that there is a very large number of them and they add nothing useful to the dictionary. As I recently pointed out in the WT:GP, we would end up with ridiculous cases such as וּכְשֶׁבְּבֵיתְכֶם (u-kh'-she-b'-veit'-khém, and when in your house). --WikiTiki89 15:28, 26 August 2014 (UTC)

Allow the hyphen to be specified explicitly on Template:suffix and such

The templates {{suffix}}, {{suffixcat}}, {{prefix}} and so on currently require that you omit the hyphen from the affix parameter. If you put it in anyway, you end up with a double hyphen. This sometimes causes problems mainly because people might not think about it and add the hyphen in the parameter. But I believe that having the hyphen as part of the parameter explicitly is also a bit clearer. So I'd like to ask for two changes:

  1. Instead of never being able to add the hyphen, allow the hyphen to be specified optionally. If the module detects that the hyphen is already present, it will not add another one, but it will add one like before if it's not already present.
  2. If you agree with the first change, I would also like to ask if I may use a bot to convert entries that don't include the hyphen in the parameter, so that they include it. (For example: {{suffix|hyphen|less|lang=en}} > {{suffix|hyphen|-less|lang=en}}) This would effectively establish the use of the hyphen in the parameter as the more common form of use, while leaving the hyphenless form as a possible alternative for backwards compatibility for those who are used to it or prefer to do it this way.
  3. If you agree with the second change, there's the option to convert to {{affix}}. This template requires hyphens, so if we are going to have hyphens in the entries' wikicode anyway, we could also opt to use this template instead.

Please specify whether you agree with just option 1, or with both option 1 and 2. Or alternatively, if you oppose, specify whether you oppose only option 2 or both option 1 and 2. —CodeCat 14:40, 27 August 2014 (UTC)

I strongly oppose two sources producing the same output.--Dixtosa (talk) 16:50, 27 August 2014 (UTC)
Why? —CodeCat 16:59, 27 August 2014 (UTC)
Option 1
Support option 1. I mistakenly save pages with a hyphen all the time. Abstain on option 2. --Vahag (talk) 14:52, 27 August 2014 (UTC)
Support option 1, unlikely to change my mind. —This unsigned comment was added by DCDuring (talk • contribs).
Support option 1. --WikiTiki89 16:52, 27 August 2014 (UTC)
Mild support, since people (including me) do save pages with explicit hyphens sometimes, and it would be useful if the template handled that smoothly. - -sche (discuss) 00:15, 28 August 2014 (UTC)
Option 2
Support option 2, might change my mind. DCDuring TALK 15:32, 27 August 2014 (UTC)

Now that there's also {{affix}}, I've added a third option. I don't expect people will support migrating everything to it only days after it was created, but at least there's the opportunity. —CodeCat 00:37, 28 August 2014 (UTC)

Option 3

I've applied the first change. The templates will now first test to see if the given form already has hyphens where they belong, and if it doesn't, it will add them. So {{suffix|red|-ness|lang=en}} correctly shows -ness and not --ness. But it also means that something like {{suffix|brako|um-|lang=eo}} will have the hyphen applied, giving -um-, as the suffix parameter is a prefix here, and hence not of the expected type. This was also the previous behaviour, so it's necessary for it to work this way to not break anything. More subtly, it means that {{infix|education|-ma|lang=en}} will end up linking to --ma-, because infixes are expected to have hyphens on both ends, and this parameter didn't. —CodeCat 19:20, 2 September 2014 (UTC)
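The behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual module code (which is written in Lua); the function name `add_hyphens` and the `kind` argument are assumptions made up for this example.

```python
def add_hyphens(affix: str, kind: str) -> str:
    """Return the affix with the hyphens expected for its kind.

    Hypothetical helper: hyphens are added only when they are not
    already in the position where this kind of affix expects them.
    """
    if kind == "suffix":
        # A suffix is expected to start with a hyphen.
        return affix if affix.startswith("-") else "-" + affix
    if kind == "prefix":
        # A prefix is expected to end with a hyphen.
        return affix if affix.endswith("-") else affix + "-"
    if kind == "infix":
        # An infix is expected to have a hyphen on both ends;
        # if either is missing, both are added, as before the change.
        if affix.startswith("-") and affix.endswith("-"):
            return affix
        return "-" + affix + "-"
    raise ValueError(f"unknown affix kind: {kind}")


print(add_hyphens("-ness", "suffix"))  # -ness
print(add_hyphens("um-", "suffix"))    # -um-
print(add_hyphens("-ma", "infix"))     # --ma-
```

The last two calls reproduce the edge cases mentioned in the comment: a prefix passed as a suffix gets a leading hyphen, and an infix with only one hyphen ends up with a doubled one.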


User:Kephir created {{affix}} a few months ago but left it unfinished. So I made it usable. It replaces {{prefix}}, {{suffix}}, {{confix}} and {{interfix}}, but not {{compound}}, {{infix}} or {{circumfix}}. I hope it's useful, and if there are any problems or shortcomings, please report them on Module talk:compound. —CodeCat 22:37, 27 August 2014 (UTC)

Module:ar-verb needing testing, work, use of new {{ar-conj}}[edit]

Moved to Module talk:ar-verb. --Anatoli T. (обсудить/вклад) 01:36, 3 September 2014 (UTC)

Away for a few days.[edit]

I will be away for family matters until September 2. Please try to have the dictionary done by the time I get back. I will, however, be going to town on RfD cleanup when I return, unless someone else does it first. See you next Tuesday! bd2412 T 03:15, 29 August 2014 (UTC)

I am back, and I am going to town. Cheers! bd2412 T 13:39, 2 September 2014 (UTC)
I'm sorry to disappoint you, but the dictionary hasn't been completely finished yet. --WikiTiki89 14:09, 2 September 2014 (UTC)
It's closer than it was. I'll take that. bd2412 T 19:37, 3 September 2014 (UTC)

Request for comment: Admin corruption[edit]

When I was blocked I left the below message on my talk page but nobody dealt with or said anything about it. After the term it was no longer appropriate to post it at WT:VIP.

Earlier today Chuck Entz applied his assumption of Grimm's law and wrote a rude comment about me on https://en.wiktionary.org/wiki/Wiktionary:Etymology_scriptorium/2014/August#.CE.BA.CE.B1.CF.84.CE.AC_and_Appendix:Proto-Germanic.2Fgad.C5.8Dn.C4.85. Lysdexia (talk) 03:30, 29 August 2014 (UTC)

Transclude to Wiktionary:Vandalism in progress[edit]

Atelaes and EncycloPetey are ultraconservative English-haters who revert glosses and bully and block the user, me, who adds them. Their NPOV violations are linked at User talk:Lysdexia#2 week block and thereabove. Help:FAQ is exactly the standards I follow where one section tells the editor to use OED and other dictionaries for verification. When I tried to post an OED link to prove my edit to the entry sense, EncycloPetey said he didn't get OED and couldn't read the link; therefore he wouldn't consider my edit and block, even if one can access OED.com for free if one enters anything in the library card field. Atelaes has picked on me the whole time where at last he's reverted a legitimate edit after I had a talk with another admin, Chuck Entz, and blocked me for two weeks after I had another talk with Chuck on another word where I stated my case where he and the other admins or editors all break the NPOV policy and that the policies agree with my edits and disagree with theirs; however, Chuck has not answered and I did not revert this word. These two editors bring a hostile work environment to this project and misrepresent its policies and my edits. Lysdexia (talk) 17:51, 19 May 2014 (UTC)

We know Atelaes and EncycloPetey quite well and we know that they are neither vandals nor English-haters. I don’t know what you mean by ultraconservative. By making exaggerated accusations such as these, your own reputation and credentials suffer. I looked at a couple of the entries you may have meant, and it was clear to me that you think this is Wikipedia. We don’t work like Wikipedia. You think policy and rules must be stated somewhere in black and white. That has never been the case in this Wiktionary. First and foremost, listen to the advice and explanations of the old hands such as Atelaes and EncycloPetey. If you find something written in a policy page that disagrees with them, mention it with a link so that we can correct the policy page. —Stephen (Talk) 18:33, 30 August 2014 (UTC)

Make rhymes pages use / as the separator[edit]

Previous discussion: WT:RFM#Rhymes pages from using : as the separator to using /

In the previous discussion there was no opposition but also not much support either. Mglovesfun suggested removing the hyphen as well. I think that's a good idea, so I propose moving Rhymes:Dutch:-ɑn to Rhymes:Dutch/ɑn, Rhymes:Dutch:-ɑ- to Rhymes:Dutch/ɑ-, and doing the same kind of change to other pages. The {{rhymes}} template would need to be changed to point to the new place. —CodeCat 18:20, 30 August 2014 (UTC)
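As a rough sketch of the proposed renaming, the mapping from old to new titles could look like the following. This is only an illustration in Python; the function name `new_rhymes_title` is hypothetical, and it assumes every old title has the shape `Rhymes:<language>:<rhyme>`.

```python
def new_rhymes_title(old_title: str) -> str:
    """Map an old-style rhymes title to the proposed new style.

    Assumed old shape: "Rhymes:<language>:<rhyme>", where the rhyme
    may start with a hyphen that the proposal drops.
    """
    # Split on the first two colons only, so the rhyme part stays intact.
    namespace, language, rhyme = old_title.split(":", 2)
    if namespace != "Rhymes":
        raise ValueError(f"not a rhymes page: {old_title}")
    # Drop only the leading hyphen; a trailing one is kept.
    if rhyme.startswith("-"):
        rhyme = rhyme[1:]
    return f"{namespace}:{language}/{rhyme}"


print(new_rhymes_title("Rhymes:Dutch:-ɑn"))  # Rhymes:Dutch/ɑn
print(new_rhymes_title("Rhymes:Dutch:-ɑ-"))  # Rhymes:Dutch/ɑ-
```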

Should showI3raab be on by default in Module:ar-translit?[edit]

Apparently there is a standard for transliteration functions, involving 3 params: text, lang, sc. The Arabic one has a fourth parameter showI3raab that causes final-syllable short vowels to be displayed; otherwise they are omitted. I think this should be the default -- if someone writes in those vowels they should be displayed. If you don't write them in, they won't be displayed, simple as that. Arabic script containing vowel diacritics isn't that common anyway, and when it's there, often those final vowel diacritics are omitted when it's intended that they not be read. Benwing (talk) 02:51, 1 September 2014 (UTC)

Yes, for automatic transliteration, "showI3raab" should be the default, otherwise verbs won't show full endings. Otherwise, there's no point in displaying classical Arabic grammar endings. The ability to drop the endings in pausa or in a more relaxed MSA should be assumed. So, يُسَالِمُ (yusālimu) should be transliterated "yusālimu", even if it can be pronounced as "yusālim". The module will probably be restricted to conjugation and headers (if implemented), anyway. We have a tradition to drop "i3raab" vowels in translations - "katab", not "kataba". If translations were supplied with full vowels and automated, we could revisit this. --Anatoli T. (обсудить/вклад) 02:57, 1 September 2014 (UTC)
I think it should be disabled by default (mostly for the headword line and for links), but enabled in conjugation and inflection tables. --WikiTiki89 01:31, 2 September 2014 (UTC)
There is no real reason to disable "showI3raab" by default in the headers as well. We transliterate كَتَبَ (kataba) as "kataba", not "katab" and يَكْتُبُ (yaktubu) as "yaktubu", not "yaktub". If case or conjugation diacritics are written, they can also be transliterated, e.g. "يُسَالِمُ" is "yusālimu" with the final ḍamma, "يُسَالِم" is "yusālim", without it. It also matches Arabic grammar references. Textbooks often skip final vocalisation to reflect the actual pronunciations, which also matches transliteration then. Learners who know this feature will have no problem with this, dropping i3raab is a common knowledge, e.g. even if we transliterate "yusālimu", they will know to drop "u" in the less formal environment or in pausa. You can arrive at the abbreviated pronunciation by dropping i3raab, the reverse is more difficult ("kataba" and "yaktubu" transliterations are informative and unambiguous, you can always remove -a and -u). --Anatoli T. (обсудить/вклад) 02:13, 2 September 2014 (UTC)
Module:ar-translit isn't used in the headword line as far as I know; nor in links. Headword translations do contain final vowels in verb forms. I would rather have the default be to translate what is written; this is what all other languages do. If you don't write the final vowels in Arabic, they won't get transliterated, simple as that. Benwing (talk) 05:41, 2 September 2014 (UTC)
Thanks to your improvement in Module:ar-translit (it seems to do the right thing now) it will be possible to use it in the headword (without the override of the manual transliteration). It will only be necessary to provide all the necessary diacritics and the translit module will do the rest, if manual translit is missing. A headword containing نَظَافَة (naẓāfa) would transliterate it as "naẓāfa" or "naẓāfatun" if tanwīn is added, e.g. نَظَافَةٌ (naẓāfatun). --Anatoli T. (обсудить/вклад) 05:53, 2 September 2014 (UTC)
Our current practice for nouns is to write مَكْتَبَةٌ (with the tanween), but transliterate it as "maktaba". Do you think this should be changed? --WikiTiki89 12:06, 2 September 2014 (UTC)
I think this current practice makes no sense. Either do or don't write the tanween, but either way, transliterate what's written. I've never seen an Arabic book that uses a practice like what you just described; e.g. John Mace's "Arabic Verbs" writes the i`rāb vowels on verbs but not on verbal nouns or participles, and transliterates accordingly.
BTW Currently Module:ar-translit transliterates final taa' marbūṭah as (t), although it could as well write it as h or nothing. Writing as nothing has the problem that it becomes hard to distinguish feminine endings from the verbal -a ending, and hard to distinguish the long feminine ending -āh from masculine alif maqṣūrah ending . Benwing (talk) 00:33, 3 September 2014 (UTC)
I am all for transliterating the tanween in nouns, but that is not our current practice. If we do that, then we won't have any problem of how to transliterate the ta marbuta. --WikiTiki89 00:47, 3 September 2014 (UTC)
(E/C) @Wikitiki89. Yes, I think it makes sense to change it, especially if we are going to use more the automatic transliteration module. In my translations or occasional Arabic entries I make I usually don't use iʿrāb, so I don't think I didn't follow it.
@Benwing. I think it's better to transliterate tāʾ marbūṭa (in pausa, without iʿrāb) as nothing. It's not possible to avoid all possible confusions and we have no intention to reverse-engineer Arabic script, and we will still need to cater for exceptions in pronunciations, especially loanwords and dialects. alif maqṣūra and final alif (pronounceable) may also end up as "ā". Another issue is initial alif, Mahmudmasri insisted on not transliterating it and I eventually agreed. --Anatoli T. (обсудить/вклад) 00:50, 3 September 2014 (UTC)
Currently we do transliterate an initial hamza when the Arabic script writes alif-hamza, and we don't if the Arabic script has a plain alif (hamzat-al-waṣl). I think this is the right thing to do, and again it keeps the transliteration consistent with the original script. I know that Hans Wehr's dictionary doesn't transliterate initial hamza in any case but I think this doesn't make a lot of sense and it's contrary to the practice of most other Arabic books (e.g. John Mace "Arabic Verbs"). I thought what you had actually agreed was not to transliterate hamza in loanwords; if you really want to implement that, just write the Arabic without an initial hamza. I really think in general the transliteration should not gratuitously lose information, which is what happens if you omit initial hamzas.
BTW the ending اة is commonly pronounced -āt even in pausa (I'm pretty sure at least), so it could be transliterated this way. Benwing (talk) 01:42, 3 September 2014 (UTC)
Re: initial hamza. Well, that was my idea as well, all the time. I conceded to Mahmudmasri's demands/requests with reluctance but he is not a very active editor. Hans Wehr doesn't usually use initial hamza, so we can't use it for reference here, anyway. Wikitiki89 seems to favour "ʾ" symbol over nothing "" (for initial hamza) as well. I agree with your point but he (Mahmudmasri) had a point too. It's a matter of convention, anyway, subject to change. Yes, اة can be transliterated as "āt". --Anatoli T. (обсудить/вклад) 02:02, 3 September 2014 (UTC)

I've turned it on by default. Benwing (talk) 01:12, 4 September 2014 (UTC)

September 2014[edit]

English plural nouns: agreement and countability[edit]

We tell our users what is mostly obvious by inspection, that a noun is plural in form. The only significant information that is provided is that there is no singular form, which information is sometimes false.

I would expect that it would be more useful to a learner of English to know what form of verbs is used for agreement in number and whether the noun is countable, which indicates which set of determiners it would be used with.

  1. Apart from the work involved, is there any reason not to try to provide this information?
  2. Is there any reason for this information not to be on the inflection line?

If we can agree to the answers to these simple questions, then it should be possible to emend {{en-plural noun}} and/or {{en-noun}} appropriately. {{en-plural noun}} is transcluded 1454 times in the principal namespace and probably should be used in additional existing entries. This may overlap in some cases with the separate phenomenon of British/Commonwealth English requiring that a noun like team take a plural form of verbs for agreement. DCDuring TALK 15:05, 1 September 2014 (UTC)

Responding to your last sentence first, the British use of "team (etc) are" vs the American use of "team (etc) is" seems like a general grammatical phenomenon, perhaps not worth mentioning in individual words' entries. If one invents a new word, e.g. the country name "Triceleuden", I expect it will be adapted to the existing grammar and hence a Brit will report sporting news as "Triceleuden face Australia in their next match" while an American will say "Triceleuden faces Australia".
Some entries do mention whether they take singular or plural verbs, in varying ways: feces, data, dramatics, bobby socks, grits. I agree that indicating the "verb number preference" of plural-only nouns would be useful. Grits and data show why the information cannot always be on the inflection / headword line, and feces shows how the information may be too long to fit into a sense-line {{label}}. Perhaps we could use sense-line labels and just expand on them in the usage notes when necessary, though. Regalia is an example of a plural-only noun that seems to take singular verbs and plural verbs in even measure. - -sche (discuss) 16:28, 1 September 2014 (UTC)

Chinese Medicine Entries[edit]

There are hundreds of entries for Chinese medicinal preparations that nobody seems to be aware of. They go beyond encyclopedic into a level of detail that Wikipedia doesn't touch. To illustrate the magnitude of the phenomenon, here's the "definition" for 二十五味珊瑚丸:

  1. A reddish-brown pill used in traditional Tibetan medicine to "promote the restoration of consciousness, promote blood circulation and relieve pain, when there are symptoms that include unconsciousness, numbness of body, dizziness, headache, abnormal blood pressure, epilepsy and cranial neuralgia".

Ershiwuwei Shanhu pills have the following herbal ingredients:

Name Chinese (S) Grams
Qingjinshi 青金石 20
Os Corallii 珊瑚 75
Margarita 珍珠 15
Concha Margaritifera 珍珠母 50
Fructus Chebulae 诃子 100
Radix Aucklandiae 木香 60
Flos Carthami 红花 80
Flos Caryophylli 丁香 35
Lignum Aquilariae Resinatum 沉香 70
Cinnabaris 硃砂 30
Os Draconis 龙骨 40
Calamina 炉甘石 25
Naoshi 鱼脑石 25
Magnetitum 磁石 25
Limonitum 禹余粮 25
Semen Sesami 白芝麻 40
Fructus Lagenariae 壶芦 30
Flos Asteris 野冬菊 45
Herba Swertiae Bimaculatae 獐牙菜 80
Rhizoma Acori Calami 白菖 50
Radix Aconiti Preparata 制川乌 45
Herba Chrysanthemi Tatsiiensis 打箭薹草 75
Radix Glycyrrhizae 甘草 75
Stigma Croci 红花 25
Moschus 麝香 2

二十五 is Chinese for 25, and, not coincidentally, there are 25 ingredients listed. Though it's no doubt got more ingredients in the list than most of these entries, there are hundreds of them. DCDuring has added hyperlinks to some of the Latin, but other than that, a look at the edit history shows no one but the creator of the entry and some bots, and that seems to be the norm.

I may be wrong, but I think someone listing the names and amount of the ingredients for various standardized foods or beverages would be reverted pretty quickly, but these have been there for half a decade. The question is: do we want to have different standards for entries that no one cares about? Chuck Entz (talk) 07:35, 2 September 2014 (UTC)

Most of those articles need to be deleted. Only the ingredients themselves warrant inclusion. Wyang (talk) 09:57, 2 September 2014 (UTC)
Other than the fact that the tables are not WT:ELE compliant, what exactly is wrong with the entries? The items on all the tables are meronyms of the headwords, with quantification. I know that EncycloPetey has expressed a linguistic interest in Chinese herbal medicine. I have become convinced that the meronyms are themselves not necessarily SoP as they are often short names for ingredients more specific than the names would suggest, either in the species involved, in the form, or the manner of preparation. The large literature on Chinese herbal medicine makes it highly likely that all of the terms involved, headwords and meronyms, are attestable. The subject matter is basically irrelevant to inclusion.
It seems to me that all of the entries with the tables would be worth some effort to bring into greater conformity with WT:ELE and would otherwise be subject to the same RfV and RfD as other entries.
I would also appreciate views about whether "radix Aconiti preparata", as used in Chinese or English works on herbal medicine, are or might be likely to be entry-worthy or, if not, why. DCDuring TALK 13:41, 2 September 2014 (UTC)
Here is a list of them:
These are not considered words by Chinese dictionaries. Apart from SoPness, problems also include: 1) lack of creator's knowledge in Chinese. Many of these entries have been fixed over the years, but numerous mistakes are still present. For example, this nonsense entry 发育迟缓 ("growth retardation"). 2) incorrect formatting. Most of these fall into Category:Chinese terms with uncreated forms, and have weird formatting errors (eg. nonstandard use of the hanzi box, initial or trailing spaces in Pinyin, hidden characters in title). Wyang (talk) 23:50, 2 September 2014 (UTC)
Which ones should be RfDed? Which ones have something that can be saved (ie, should be RfCed)? I would like to at least harvest the taxonomic species or genera referred to in the tables. The English or "Medical Latin" terms seem attestable and not necessarily SoP, as they are used as units. DCDuring TALK 02:28, 3 September 2014 (UTC)

Arabic transliteration module enabled, a minor change needed, verb header template needs work[edit]

(Notifying Benwing, Wikitiki89, ZxxZxxZ, Mahmudmasri): I have just added the Arabic transliteration module Module:ar-translit to Module:languages/data2 to allow automatic transliteration, which is now enabled in all templates that use automatic transliterations. It's not added to Module:links, so the manual transliteration is not overridden. It's a good time to change the Arabic verb template {{ar-verb}}. As most verbs use full diacritics, it would be much easier if the manual transliteration were removed and all missing diacritics checked when the template starts using the automatic transliteration. It works fine in most cases. We need to make "showI3raab" the default to show case and verb conjugation endings. As was previously agreed, we don't need word stresses for Arabic words, so a fully vocalised word نَزَفَ (nazafa) should be transliterated as "nazafa", not "nazaf" (missing iʿrāb ending) or "názafa" (stress mark). (I should mention, if it's not obvious, that the module is supposed to be used on fully or partially vocalised forms, i.e. with diacritics, which are normally unwritten in a running Arabic text). You're doing a great job, Benwing, thanks! --Anatoli T. (обсудить/вклад) 03:28, 3 September 2014 (UTC)

You're welcome.
Can you give an example of templates that use automatic transliterations? In the case of {{ar-verb}}, can you point to a verb where this transliteration happens? Will it happen if you remove the manual transliteration?
I'm all for making "showI3raab" the default; if no one objects I'll go ahead and do this.
Also (Notifying Lo Ximiendo): I haven't done much with {{ar-verb}} so far. I think it should be rewritten entirely so that it basically takes the same params as {{ar-conj}} and makes use of the same code. That would obviate the need for explicitly writing out the perfect and imperfect verb forms and would automatically supply the right vowels and such. This should be retrofitted into the existing params, which I think is possible. What this means is that form I verbs need only the form and past and non-past vowels specified, and augmented (non-form-I) verbs need only the form specified (the radicals are inferred from the headword; the few cases where ambiguity exists all involve weak radicals i.e. و or ي, and as it happens the call to {{ar-verb}} already specifies the radicals in these cases, e.g. in تسلى where III=و is specified, see below). The form is already present in the call to {{ar-verb}} but the past/non-past vowels aren't; however, this isn't an issue if the verb forms are manually given (which they are, currently), and we can arrange things so that there's a category containing form-I verbs whose call to {{ar-verb}} is missing the past or non-past vowels, so they'll eventually be fixed. For augmented (non-form-I) verbs, I'm thinking we should actually ignore the parameters specifying verb forms, because of cases like تسلى, which has a call to {{ar-verb}} declared as {{ar-verb|III=و|form=5|tr=tasallā|impf=يتسلى|impftr=yatasallā}} with missing vowel diacritics. Module:ar-verb will correctly generate the verb forms on its own: it currently handles all "regular" verbs and almost all of the very few truly irregular verbs, and if any cases come up where it doesn't work properly, just fix the module. (A more conservative approach is to check to see whether the diacritics are present and use the manually specified verb form if so.) I don't have time to work on this now, so Anatoli if you're interested in working on it, go ahead.
Benwing (talk) 04:49, 3 September 2014 (UTC)
I have just updated نَزَفَ (nazafa) (and the verbal noun), which now uses automatic transliteration ("showI3raab" is currently off), so it now automatically shows "nazaf", instead of "nazafa". Note that imperfect forms only need one parameter - the form with diacritics, |impfhead=يَنْزِفُ is not necessary, neither is |impftr= (inflected forms are not transliterated in the headword). The entry got into Category:Arabic terms lacking transliteration, which should probably be removed from {{ar-verb}}. --Anatoli T. (обсудить/вклад) 05:25, 3 September 2014 (UTC)
Re: templates using automatic transliteration: {{t}}, {{l}}, {{term}}, headword templates, etc. One problem with that is that if a term is missing both manual transliteration and diacritics, it will transliterate incorrectly, e.g. نزف is, as you can see, just "nzf". Various Arabic translations using {{t}} or {{t+}} will now have wrong transliterations. Someone may complain about this. --Anatoli T. (обсудить/вклад) 05:32, 3 September 2014 (UTC)
Actually, the intention is clearly that the imperfect is translated in the headword. The call to {{ar-verb}} passes in the |impftr= param as {{head}} param |f1tr=, but this is (no longer?) supported. I think this intention is correct. Benwing (talk) 05:49, 3 September 2014 (UTC)
(You must have meant transliterated in the headword). It's no longer supported. I posted on the GP discussion you started. --Anatoli T. (обсудить/вклад) 06:01, 3 September 2014 (UTC)
Yeah, sorry, I meant transliterated. Benwing (talk) 06:11, 3 September 2014 (UTC)

Swedish entries give glosses rather than translations[edit]

The entry for malm, for example, has, among other definitions:

  1. (archaic) an alloy consisting of copper, zinc, lead and some tin
  2. (archaic) the geological period of late Jurassic
  3. (archaic) a hill or ridge consisting of sand or gravel

Unless these are specifically terms for which there is no English equivalent, these should be translations rather than glosses - for example, the first looks as if it should simply be "bronze".

I've looked at several other Swedish entries and they seem to give translations, so this may be an isolated example after all.

Grants to improve your project[edit]

Greetings! The Individual Engagement Grants program is accepting proposals for funding new experiments from September 1st to 30th. Your idea could improve Wikimedia projects with a new tool or gadget, a better process to support community-building on your wiki, research on an important issue, or something else we haven't thought of yet. Whether you need $200 or $30,000 USD, Individual Engagement Grants can cover your own project development time in addition to hiring others to help you.

Four RfV topics to clear 2013.[edit]

If we can settle the four oldest RfV issues, we'll have 2013 cleared off of that board. Any takers? bd2412 T 20:01, 3 September 2014 (UTC)

returning nil in Module:ar-translit when vowel diacritics not available?[edit]

Now that we've turned on automatic transliteration for Arabic, one issue is that not all words have the vowel diacritics supplied, leading to incorrect transliterations. One possibility is to check for this, and return nil when encountering a word that isn't completely vocalized (with an exception made for vowels omitted on the last letter of a word). Some questions:

  1. Will this work? What happens in general when a transliteration module returns nil?
  2. Is this a good idea?
  3. Is this the right place to ask a question like this (apologies to Wikitiki89)?

Benwing (talk) 05:18, 5 September 2014 (UTC)

Maybe @Wyang: can help? You made the Korean transliterate hangeul 공부하다 (gongbuhada), while hanja is not transliterated: 工夫? (工夫하다 (工夫hada) should also be nil, IMO)
Nil shouldn't transliterate anything, like with languages for which automatic transliteration is not enabled, e.g. Hindi, Hebrew, if there is no manual transliteration. Most Arabic entries have manual transliterations, they shouldn't be removed, before vowel diacritics are added, that's all. --Anatoli T. (обсудить/вклад) 05:39, 5 September 2014 (UTC)
This probably should be a WT:GP question. But to address the question itself, I was actually thinking that we should do the exact same thing. User:CodeCat would know if it does what we want it to do. I was thinking further that if any letter where a diacritic is expected does not have one, then we should return nil. For example بَني would return nil because a diacritic is expected on the ن to distinguish بَنِي from بَنَيْ, and دَم would return nil because the iʿrāb is not specified. But لَا would not return nil because no diacritic is expected on the ʾalif. --WikiTiki89 19:32, 5 September 2014 (UTC)
My thought was to allow final consonants without iʿrāb (which is frequently omitted in otherwise properly vocalized nouns), and to allow a few other cases where things are unambiguous even without diacritics -- specifically, alif or tā' marbuṭa with missing fatḥa before it. This is not intended to encourage people to write things like this but to handle existing usage like كاتِب, which is written as such in the كاتب entry and can be unambiguously transliterated as kātib. (Sorry if this is drifting into an Arabic-specific discussion again.) Benwing (talk) 20:52, 5 September 2014 (UTC)
But doing that would be a way to catch all existing cases and fix them. --WikiTiki89 21:31, 5 September 2014 (UTC)
This is true. I guess it comes down to a compromise between current usefulness and usefulness in fixing. Since I don't see any people currently offering to go fix all the (thousands of) existing cases needing fixing, I'd rather have the iʿrāb-less transliterations there. If someone wants to fix the iʿrāb, they can edit Module:ar-translit and temporarily comment out the lines that allow iʿrāb-less transliterations (there's a comment indicating where to do this). Ideally there would be a way of allowing iʿrāb-less transliterations while still marking them but I don't see how to do it.
BTW in case it's not clear I did implement returning nil on unvocalized text. Hopefully I did it correctly. So far all the places I can find that should have transliterations do. Benwing (talk) 08:07, 6 September 2014 (UTC)
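For illustration, the kind of check discussed here might look roughly like the sketch below. It is written in Python rather than Lua and is not the code in Module:ar-translit; the names `is_vocalized` and `translit_or_none` are hypothetical. It is deliberately simplified: it treats every non-diacritic character as a letter and does not handle long vowels written with wāw or yāʾ, so some correctly vocalized words would be rejected.

```python
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
SHADDA = "\u0651"
TANWIN = {"\u064B", "\u064C", "\u064D"}
VOWEL_MARKS = {FATHA, DAMMA, KASRA, SUKUN} | TANWIN
ALIF, TA_MARBUTA = "\u0627", "\u0629"


def is_vocalized(word: str) -> bool:
    """Simplified test of whether a single word carries enough diacritics."""
    units = []  # each unit is [letter, set of diacritics attached to it]
    for ch in word:
        if ch in VOWEL_MARKS or ch == SHADDA:
            if not units:
                return False  # a diacritic with no letter to attach to
            units[-1][1].add(ch)
        else:
            units.append([ch, set()])
    for i, (letter, marks) in enumerate(units):
        if marks & VOWEL_MARKS:
            continue  # letter has a vowel mark, sukun or tanwin
        if letter == ALIF:
            continue  # no diacritic is expected on alif
        if i == len(units) - 1:
            continue  # final vowel may be omitted
        if units[i + 1][0] in (ALIF, TA_MARBUTA):
            continue  # fatha before alif or ta marbuta is unambiguous
        return False
    return True


def translit_or_none(text: str, transliterate):
    """Call `transliterate` only if every word passes the check, else None."""
    if all(is_vocalized(word) for word in text.split()):
        return transliterate(text)
    return None
```

Under these assumptions, كاتِب passes (the missing fatḥa before alif and the missing final vowel are both tolerated), while unvocalized نزف and partially vocalized بَني yield `None`.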
If a transliteration module returns nil, it's equivalent to having no transliteration module at all. —CodeCat 19:34, 5 September 2014 (UTC)
Can we somehow tag it with a category such as Category:Arabic terms lacking transliteration? --WikiTiki89 19:45, 5 September 2014 (UTC)
How would Module:headword know that a transliteration is needed? —CodeCat 20:37, 5 September 2014 (UTC)
Any time a call to the transliteration module returns nil, it should be inserted into a category indicating this, perhaps Category:Arabic terms lacking vocalization; presumably this would be language-specific (for Arabic, maybe Hebrew as well and other languages using Arabic script) or script-specific. Module:links should also do this; it in fact already inserts categories like Category:Terms with manual transliterations different from the automated ones. Benwing (talk) 20:52, 5 September 2014 (UTC)
I think Category:Arabic terms lacking transliteration makes more sense, since it would be difficult for Module:headword to infer the reason for the transliteration failure. This would only apply to Arabic, not other Arabic-script languages, since most of them do not have vocalization systems. --WikiTiki89 21:31, 5 September 2014 (UTC)
I think we should be able to make this more general. After all, we really want transliterations for any language not written in Latin script, right? Whether it's generated or manually supplied is not even relevant, as long as it's there. So maybe we could apply this general rule: if the term is written in a script that is not Latn, Latinx or varieties, then if there is no transliteration, add a category to request one. That way we don't have to make it specific to Arabic. —CodeCat 21:50, 5 September 2014 (UTC)
Did I ever say "Let's make it specific to Arabic so that other languages can't take advantage of this useful feature."? Anyway, jokes aside, I agree, if there is no transliteration then it should be placed in a category, whether it's a headword or just a link. But, I think that if {{{2}}} is specified, then the category should not be added, and we need some language-specific exceptions such as for Serbo-Croatian. --WikiTiki89 21:54, 5 September 2014 (UTC)
I also think this is a good idea. CodeCat, could you take a crack at this when you have a chance? I can't do it myself since Module:links is locked. Benwing (talk) 08:07, 6 September 2014 (UTC)
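A minimal sketch of the general rule proposed above, for illustration only. It is written in Python rather than as an actual change to Module:headword or Module:links. The function name, the per-language category name pattern, and the exemption set are assumptions; only "Latn", "Latinx", the Serbo-Croatian exception, and the Arabic category name come from the discussion.

```python
# Hypothetical illustration of the proposed rule; not the real module code.
LATIN_SCRIPTS = {"Latn", "Latinx"}      # script codes named in the discussion
EXEMPT_LANGUAGES = {"Serbo-Croatian"}   # language-specific exception suggested above


def transliteration_request_category(language, script, transliteration=None):
    """Return a request category name, or None if no request is needed."""
    if script in LATIN_SCRIPTS:
        return None                     # Latin-script terms need no transliteration
    if language in EXEMPT_LANGUAGES:
        return None                     # exempted by language-specific rule
    if transliteration:
        return None                     # manual or generated, it is already there
    # Assumed naming pattern, generalised from the Arabic category proposed above.
    return f"Category:{language} terms lacking transliteration"
```

Under these assumptions, an Arabic-script term with no transliteration would be placed in Category:Arabic terms lacking transliteration, while the same term with any transliteration supplied would get no category.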

Bracketed ellipses and widowing

A week or so ago I fell sick and faced a time of lassitude and confinement. To counter or alleviate this I decided to undertake a detailed but repetitive Wiktionary project that had suggested itself during my normal occupation of adding quotations to senses, a project that could be done with a dull mind, and sitting up in bed.
 I had noticed that Robert Burton's The Anatomy of Melancholy was frequently and justifiably quoted in various entries, but also that the quotations varied in details and only occasionally (and sometimes wrongly) linked to the two directly relevant Wikipedia entries. So I created a template (Template:RQ:RBrtn AntmyMlncly) to use in sense headings. I've recently finished (I think), and there were over 400 quotations.
 As is my habit, I occasionally made what I saw as minor changes, such as adding full stops (periods) at the end of sense definitions, and making leading letters in those definitions upper case if needed. After I had been doing the project for a few days I became aware that one of the most frequent problems I was fixing was potential widowing. Because of a background in the printing industry going back to the early '50s and the days of moveable type, this was not a minor problem to me. I'll explain.
 A widow, as I was taught it, was several possible things. A line widow, as it was told to me, was the first or last line of a paragraph placed on a separate page from the rest of the paragraph. And this was considered unutterably evil if a word was split by hyphenation across the pages. Of course, in the days of moveable type this could very easily be fixed before the flong was flung. But there are widows at the letter level if mistakes are made in the typesetting, particularly with punctuation because punctuation usually needs to be juxtaposed to a word. These were avoided automatically by typesetters and simple ones are nowadays usually avoided by formatting software. And line widows are irrelevant to the Wiktionary as running text is not split between separated pages; keys such as PgUp and PgDn move text on the screen.
 However, the widowing that I became aware of was particular to quotations related to the bracketed ellipses (hereinafter simply called ellipses). It looks as if the {{...}} template was introduced to make ellipses easy to key in. This is just fine if the ellipsis is simply a gap between words, but there is a problem: if the ellipsis is at the end of a sentence or quotation then it needs to be followed by a full stop (period). Unfortunately for the full stop, {{...}} puts a simple space in the display on either side of the ellipsis.
 That it looks bad is not the major problem, which is that the full stop will be split off if it won't fit onto the end of the line, and worse still if the split happens at the end of a quotation. Ordinary users of the Wiktionary will have different line widths, so some would see it even if most wouldn't. Not quite so bad, but bad enough, is the effect of the simple space put in front of the ellipsis, which can split the ellipsis from the text it follows; this harms the aesthetics and somewhat hampers readability.

  I have been criticised for making my modifications to avoid these problems and have had some undone. What I am asking for here is to have agreement that something should be done about the problems like the ones I have outlined. If agreement is reached, I suggest that what should be done is to create three new templates:
(1) {{..,}} to leave off the trailing blank of the ellipsis,
(2) {{,..}} to replace the leading blank of the ellipsis with a non-breaking space (&nbsp;),
(3) {{,.,}} to combine the above two actions.
Note that it would be a mistake to use (1) to add a full stop to the ellipsis as the omission will also be needed for other punctuation, such as the comma.

I've tried to look for what's in the {{...}} template, but have got lost, perhaps because of my medical state, which continues. If something can be done very quickly I would be extremely grateful, as I would like to use another similar project to lighten my oncoming days. — ReidAA (talk) 10:06, 5 September 2014 (UTC)

(First of all, thanks for your work on finding the quotations.) Personally I don't at all like seeing the &amp; &hellip; &nbsp; stuff floating around in an entry (I see you've even used it to format your comments here) and I think it hinders editing. Almost the only one I ever use is &mdash; and that's because I get tired of having to open Character Map to find the literal symbol. Anyway: IMO, if we need to fix this "widow" problem at all (has anyone else been upset by it, or even noticed it?), then Wiktionary's markup is not the correct place to fix it. It sounds more like something that should be addressed by the CSS (Cascading Style Sheets) standard, surely...? Equinox 10:16, 5 September 2014 (UTC)
I must plead guilty to sometimes using &amp; in place of an ampersand and always using &#0133; in place of an ellipsis - on the assumption (maybe mistaken) that these characters might not show up correctly on some machines. Should I desist? — Saltmarshαπάντηση 10:54, 5 September 2014 (UTC)
I would greatly appreciate any advice on how to achieve the effect of a non-breaking space using CSS. Specifically, I would like to replace many instances of &nbsp;- with something less intimidating to potential contributors. In the application I have in mind all instances of a dash in a piece of text (a line with no line-breaks other than those imposed by the width of the frame) would warrant such treatment. DCDuring TALK 13:13, 5 September 2014 (UTC)
This discussion seems to be straying from the requests that I started this discussion with. Let me start again with a different tack.
  Someone wrote the (presumably very simple) {{...}} template that seems to have had wide acceptance, probably both by those like me who are concerned with presentation details, and those who are put off by plain HTML code, at least such as uses ampersands.  Furthermore I have been told that the Wikimedia software has trouble handling the directly coded […] ([&hellip;]) so that I should use the template, and I have been doing so when no widowing is threatened.
  If there are others like me concerned with presentation effects, and if it would be a simple task for someone with the requisite knowledge to create the three new templates I suggest, what objection is there to creating them so that concerned contributing editors like myself could improve the presentation as we go along making other contributions?  What effect would there be on editors who are not worried about widows?  My work here suggests to me that there is a hell of a lot of such tidying to be done. — ReidAA (talk) 23:07, 5 September 2014 (UTC)
I am not sure that I did this right, but it seems to work. It substitutes a nonbreaking space for what turned out to be an ordinary space. Try it: {{nb...}}. Frankly, if it works, it would seem to be preferred to the original. I can't see why it would ever be right to risk having the [...] widowed. DCDuring TALK 23:26, 5 September 2014 (UTC)
I tried it by simply putting {{nb..}} into an entry I was working on, but it gave an error. Is there something special I need to code to get to your brave attempt? And, yes, I agree about it being better than the original. I can't offhand think of any case where the change would cause problems. — ReidAA (talk) 00:05, 6 September 2014 (UTC)
@ReidAA: It is {{nb...}}, not {{nb..}}. If you typed it in with three dots and got an error, leave it there and let me know the entry. DCDuring TALK 00:46, 6 September 2014 (UTC)
@DCDuring: Not needed; I took it out when it didn't work.  But, I would prefer the shorter name for elegance; or how about {{nb.}}? — ReidAA (talk) 01:04, 6 September 2014 (UTC)
@ReidAA: I'd like to understand in which situations it goes wrong. Maybe I could fix it. DCDuring TALK 01:12, 6 September 2014 (UTC)
This addressed your second requested item. But why not eliminate the trailing space for all cases? What harm could come of having the leading space always be a non-breaking space? DCDuring TALK 23:31, 5 September 2014 (UTC)
Unfortunately there is a lot of existing code that relies on the trailing space being added. Oh, and I agree about the leading space. — ReidAA (talk) 00:05, 6 September 2014 (UTC)
@ReidAA: I removed the trailing space for {{nb...}}. The impression I get is that most uses of {{...}} have additional leading and trailing space inserted by users after the template. The wiki software renders the two spaces, one from the user and the other from the template, as one. I think the problematic entries that would exist if we were to substitute {{nb...}} for {{...}} would be of two kinds. 1. The kind with no space rendered after the template could be found by searching for [] followed by a letter. This wouldn't be so bad for the basic Latin script, but the search would also have to take place for other letters and symbols in other scripts. 2. The kind with an extra space before would need some kind of bot to eliminate the extra breaking space, but the extra space is not very annoying to me. Can you find the extra space on this paragraph? DCDuring TALK 00:24, 6 September 2014 (UTC)
@DCDuring: I put two responses in last time; I think you might have overlooked the first.
  Indeed a lot of editors include both leading and trailing spaces with their ellipsis template invocation, but there are also a helluva lot who don't.  Hullo, hullo, I've just noticed that your template has three dots; in my trial I only used two.  When I've finished here I'll go and try again.  Wouldn't the extra preceding breaking space you refer to potentially widow the ellipsis?  Or have I misunderstood you?  Incidentally, I've just imitated your ping at the start of this response.  Is this supplementary or alternative to the watch this page button? — ReidAA (talk) 00:57, 6 September 2014 (UTC)
@DCDuring: Well, it works, thanks very much indeed.  When will it be alright for me to start using it?  Do you agree with me about the shorter name(s) though? — ReidAA (talk) 01:14, 6 September 2014 (UTC)
{{ping}} is additional. It lights up the number next to your username at the top of the page. I don't remember if it works across projects.
If we insert the altered code into {{...}}, there would be a lot of cleanup to do.
I'd like to keep this five-keystroke name until I know we can't improve it. Your suggestions seem fine, but this might warrant more attention, perhaps at the grease pit.
{{nb...}} should be used without a user-added leading space to avoid the line-break. The user controls what happens after the {{nb...}}. It is by no means idiot-proof.
You could start using it now. It is really fairly simple, probably low risk. If something goes wrong, — stop using it, undo it, and let me know the entry where it went wrong. DCDuring TALK 01:26, 6 September 2014 (UTC)
Then I shall start using it right away.  My tests suggest to me that, since I understand it well (I think), there will be no problems.  My very small experience with RQ templates included the (somewhat obscure) creation of documentation to go with them.  Your code for nb... looks very opaque to me, but would you like me to try to create documentation for its use?  Again, thanks very much for your cooperation and work. — ReidAA
It is not really my code. All I did, starting from a copy of {{...}}, was substitute a nonbreaking space for the ordinary space at the leading position and delete the trailing space. I really couldn't have started from scratch. DCDuring TALK 04:31, 6 September 2014 (UTC)
Well done. I'll put some documentation together in a little while and let you know. — ReidAA (talk) 04:36, 6 September 2014 (UTC)
@DCDuring: Template documentation added at Template:nb.../documentation. You might like to look it over and improve it. — ReidAA (talk) 07:00, 6 September 2014 (UTC)
@ReidAA: I added Category:Text format templates to both {{nb...}} and {{...}} and referenced {{nb...}} in the documentation for {{...}}. There might be some now-unnecessary CSS stuff in {{nb...}}. DCDuring TALK 13:02, 6 September 2014 (UTC)
@DCDuring: Thanks very much.  Great work!  And I've found it useful several times already. — ReidAA (talk) 23:20, 6 September 2014 (UTC)
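The difference between the two templates, as described in this thread, can be sketched as follows. This is a hedged illustration in Python, not the templates' actual wikitext: it assumes {{...}} emits an ordinary space on each side of the bracketed ellipsis and {{nb...}} emits a non-breaking space before it and nothing after. All function names are hypothetical.

```python
import re

ELLIPSIS = "[…]"
NBSP = "\u00a0"  # non-breaking space, the character behind &nbsp;


def with_plain(before, after):
    """Assumed output of {{...}}: ordinary spaces on both sides."""
    return f"{before} {ELLIPSIS} {after}"


def with_nb(before, after):
    """Assumed output of {{nb...}}: NBSP before, nothing after."""
    return f"{before}{NBSP}{ELLIPSIS}{after}"


def has_widow_risk(text):
    """True if a line could break just before the ellipsis, or between
    the ellipsis and a following punctuation mark."""
    return bool(re.search(r" \[…\]|\[…\] [.,;:!?]", text))
```

With these definitions, with_plain("He said", ".") leaves two places where a narrow window could split the line, whereas with_nb("He said", ".") keeps the word, the ellipsis and the full stop together. As noted above, a user-added ordinary space before {{nb...}} would reintroduce the first risk.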

Proto-Hellenic and Proto-Greek

Although we tend to think of Greek as a single language, it's actually a family of languages, and those languages have a common ancestor. According to Wikipedia, what is reconstructed as "Proto-Greek" usually includes all Hellenic dialects, including Mycenaean, but not the rather divergent Ancient Macedonian. One noticeable point of early divergence in AM is that the aspirates have voiced reflexes in some cases in AM while they are always devoiced in the rest of Hellenic. Unless Mycenaean underwent a re-voicing, this would have to indicate that their common ancestor still had voiced aspirates as they were inherited from Proto-Indo-European.

So this would mean that a hypothetical Proto-Hellenic, including Macedonian, would have to have voiced aspirates, while Proto-Hellenic-minus-Macedonian would have voiceless aspirates. The problem is how to define the Hellenic language family, and how people reconstruct it. Wikipedia notes that people may or may not include Macedonian in the reconstruction, but generally do not. Of course, if we include such reconstructions, we can't really call them "Proto-Hellenic" because they're missing one branch. So what should we call them? If we call them "Proto-Greek", then we would need to invent a new language family for the non-Macedonian branch of the Hellenic languages, make up a name for it (if we use "Greek" it would cause confusion with the language) and also a code. —CodeCat 23:28, 6 September 2014 (UTC)

Is there really that much consensus that Macedonian shares a common ancestor with Greek later than PIE? I would say we should treat Proto-Hellenic as identical to Proto-Greek and leave Macedonian out of it altogether to be on the safe side. —Aɴɢʀ (talk) 13:01, 7 September 2014 (UTC)
I think there is a fair consensus, at least judging by Wikipedia sourcing. But after having looked at it, there are some common innovations that appear to show that if it wasn't Greek, it was closely allied with it. At the very least, it was still similar enough to Greek that it took part in common sound changes, such as the loss of -y- intervocalically and the change of final -m to -n. w:Ancient Macedonian language shows various classifications that have been made over time. To me, the first and last proposals are the most plausible, and note that some sources call the combination of Greek + Macedonian "Greco-Macedonian" while others call it "Hellenic". So we have to make sure we check how each source understands the terms "Greek" and "Hellenic" before using them as references. In any case, if we do treat Proto-Hellenic as not including AM, then we also need to change the family of AM because it's currently included in Category:Hellenic languages. —CodeCat 13:12, 7 September 2014 (UTC)
I've written a basic draft for Wiktionary:About Proto-Hellenic. Can other editors review it and comment on it? —CodeCat 12:30, 8 September 2014 (UTC)
Oh look - even more legitimate etymologies vandalized by CodeCat's original research. Will this nightmare ever end? --Ivan Štambuk (talk) 18:08, 8 September 2014 (UTC)
Oh look, even more personal attacks from Ivan. Will this nightmare ever end? —CodeCat 19:59, 8 September 2014 (UTC)
CodeCat, what are your sources for reconstructions like *éhər. Also, how is it useful? All its content can be presented at ἔαρ (éar) without duplication. --Vahag (talk) 20:18, 8 September 2014 (UTC)
I agree with Angr that we should not distinguish Proto-Hellenic from Proto-Greek. Wikipedia does not seem to support such a distinction, specifically noting in w:Hellenic languages that most researchers consider them identical. This could happen either if Ancient Macedonian is considered to be a descendant of Proto-Greek or to be outside Proto-Hellenic entirely. The Wikipedia page on w:Ancient Macedonian language describes it as a dialect of Northwest Greek (i.e. a Proto-Greek descendant) but also notes that it's not well-attested. The supposed need for a distinct Proto-Hellenic hangs from a pretty thin thread, IMO -- one single sound change (which Wikipedia identifies as happening "sometimes") in a barely-attested language along with a clear lack of scholarly consensus. As a result I really think that it needlessly complicates the picture to create a Proto-Hellenic separate from Proto-Greek, which is going to have identical reconstructions to Proto-Greek except for mechanically substituting voiceless aspirates with something else. If you want to include AM forms, you should probably just list them under Proto-Greek.
As for the sound changes themselves, there's nothing a priori impossible about a revoicing of aspirates to voiced stops, i.e. there's not really any strong evidence that AM wasn't a Greek dialect. And your examples of common sound changes between AM and Greek don't rule out AM being outside of Proto-Hellenic: The loss of intervocalic -y- and change of -m to -n could easily be areal phenomena, or simply independent changes, esp. -m to -n, which occurred in many different IE branches. Benwing (talk) 21:32, 8 September 2014 (UTC)
The w:Proto-Greek page lists several sound changes that must have occurred after Grassmann's law, and the devoicing of aspirates is one of them. AM didn't appear to have aspirates (or at least they weren't written for us to see?), so it seems that this important identifying feature of Greek did not affect AM. It also implies that any pre-stage of Proto-Greek that would include voiced aspirates would also lack Grassmann's law, as well as palatalization, so it would be much closer phonologically to PIE; basically PIE minus laryngeals. —CodeCat 21:44, 8 September 2014 (UTC)
The devoicing of aspirates occurred before Grassmann's Law. And the list you present isn't necessarily chronological (BTW I wrote most of that list). But I don't see what any of this has to do with whether there should be separate Proto-Hellenic reconstructions, which I don't think makes sense. And as I said before there's no evidence that AM's voiced sounds weren't secondary or due to substrate influence or whatever. Benwing (talk) 02:27, 9 September 2014 (UTC)

Italic and transliterations


I noticed that in the section of translations and for the headline, transliterations are not in italic, while they are in the section of etymology. It's not very important but I'm curious: is there a reason for that? Thanks in advance. — Automatik (talk) 02:00, 7 September 2014 (UTC)

That's odd- I see the opposite (удар (udar) (no italic), удар (udar) (italic)). DTLHS (talk) 02:14, 7 September 2014 (UTC)
Sorry, my mistake. I fixed it. — Automatik (talk) 04:08, 7 September 2014 (UTC)

Layout of IPA in the editing tools

Currently there doesn't seem to be a very clear method to how the IPA characters are arranged in the edit tools (below the edit window). This makes many of the characters hard to find as you basically have to look through the whole list, sometimes several times. So I'd like to propose that the characters be arranged in the standard IPA table format, like on WT:IPA but perhaps more compact. —CodeCat 13:45, 7 September 2014 (UTC)

Certainly more compact; there's no need to include the IPA characters that are also basic ASCII characters. (I believe that set consists entirely of the 26 lowercase letters of the English alphabet.) —Aɴɢʀ (talk) 15:03, 7 September 2014 (UTC)
Then there would just be gaps in the table. We might as well include them if the space goes unused otherwise. —CodeCat 15:37, 7 September 2014 (UTC)
The most helpful layout for a non-linguist editor would be per language: letters in alphabetical order, the IPA symbol immediately after each letter. E.g. for Hungarian: Aa /ɒ/ Áá /aː/ Bb /b/ Cc /t͡s/ Cscs /t͡ʃ/, etc. Since there are more than 1500 languages in this wiki, this might be a very long list in the drop-down, but this could be solved if editors could select a small number of alphabets to appear in the drop-down; the ones they are actually using, let's say between 1 and 15. This could be added to the preferences. --Panda10 (talk) 16:35, 7 September 2014 (UTC)
Surely people who are editing in a language are at least able to name the script it uses? —CodeCat 17:09, 7 September 2014 (UTC)
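The per-language layout suggested above amounts to a table from spelling units to IPA symbols. A minimal sketch in Python, using only the five Hungarian pairs quoted in the thread: the function name and the longest-match strategy are assumptions, and real pronunciation rules are more involved than a lookup table.

```python
# Sample pairs taken from the Hungarian example above; deliberately incomplete.
HU_SAMPLE = {"a": "ɒ", "á": "aː", "b": "b", "c": "t͡s", "cs": "t͡ʃ"}


def letters_to_ipa(text, table):
    """Convert text to IPA by greedy longest-match lookup in the table.

    Units missing from the table are emitted as '?' so that gaps are visible.
    """
    keys = sorted(table, key=len, reverse=True)  # try digraphs before single letters
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        for key in keys:
            if lowered.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append("?")
            i += 1
    return "".join(out)
```

Trying the longer keys first is what lets "cs" be read as a single unit rather than as "c" followed by an unknown "s".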

Automatic pronunciation templates

We now have a fair number of templates that automatically generate pronunciation information in IPA based on the spelling of the headword and/or an orthographic representation given as a template parameter. The ones I'm aware of are:

There may be others; not all of the above are included in Category:Pronunciation templates, so maybe there are more that aren't there. (There's also {{fa-pron}}, which doesn't actually give pronunciation information, and {{liv-IPA}} which requires manual input of IPA rather than generating it automatically.) If you know of others that I haven't listed above, please add them.

The first problem is that these templates are not all gathered into a single category. Is Category:Pronunciation templates sufficient, or should there be a more specific category like Automatically generated pronunciation templates for them?

The second problem is that there is no uniformity of naming. Some are called "xy-pronunc", some are called "xy-IPA" (or "xy-ipa"), some are called "xy-pron" (particularly bad since other templates called "xy-pron" are headword line templates for pronouns), and some have "-auto" appended to the end. Ideally, there should be a uniform name for these; my preference is for "xy-pronunc", but what do others think? —Aɴɢʀ (talk) 14:55, 7 September 2014 (UTC)

I think we should name them {{xx-IPA}}. This resolves the pronoun problem and leaves room for the theoretical possibility of counterparts outputting something other than IPA. --WikiTiki89 13:44, 8 September 2014 (UTC)

Some IPA template names differ in usage, e.g. Korean {{ko-pron}} is to display the user written IPA and {{ko-pron/auto}} is automatic. (There are also a number of templates for Chinese (Mandarin, Cantonese, Min Nan, etc.) and a Japanese template, which weren't listed above.) --Anatoli T. (обсудить/вклад) 05:54, 9 September 2014 (UTC)

Also {{fa-pronunciation}} (takes transliteration as input; not Lua-ized) --Z 08:54, 9 September 2014 (UTC)
I have made the following moves so at least the Luacized templates are consistently named:
Two languages that automatic IPA templates ought to be easy to write modules for (but not by me since I don't know how to write modules) are Finnish and Hungarian. Anyone feel like writing modules and creating {{fi-IPA}} and {{hu-IPA}} to invoke them? —Aɴɢʀ (talk) 15:01, 1 October 2014 (UTC)
I've considered writing a module for Slovene, but for that language we would also want to display the tonal diacritic/respelled form. Naming it "IPA" would not be appropriate in that case. —CodeCat 15:32, 1 October 2014 (UTC)
Yeah; as Wikitiki says, using "-IPA" for the ones that do generate IPA allows the option of having some other name for the ones that don't (or the ones that generate other things in addition to IPA). —Aɴɢʀ (talk) 15:56, 1 October 2014 (UTC)

Workshop: Greek and Latin in an Age of Open Data

This event, at the University of Leipzig in December, may be of interest: Workshop: Greek and Latin in an Age of Open Data. It would be good to have somebody there to represent and speak about Wiktionary and Wikidata. Pigsonthewing (talk) 16:00, 7 September 2014 (UTC)

ORCID and other identifiers

People who edit Wiktionary should be able to show w:ORCID (and other forms of w:Authority Control, such as w:VIAF) on their user pages, as explained at w:WP:ORCID.

The template d:Template:Authority Control allows this. Can someone with the relevant bit please use Special:Import to import it and all its sub-templates (or set the bit, temporarily, to allow me to do so)? I'll then be happy to do the documentation and set up some examples. Pigsonthewing (talk) 16:15, 7 September 2014 (UTC)

@Pigsonthewing: Assuming you meant w:Template:Authority control (d:Template:Authority Control doesn't seem to exist), I imported it to Template:authority control. However, I didn't want to import the complex modules it depended on — like (apparently) w:Module:Arguments, which is probably written according to different coding conventions than are used here, and possibly also redundant to things here — and the template was being waaay more complex than it needed to be by depending on modules, anyway, so I simplified it dramatically. If someone wants to prettify it a bit so that parameters which are not set simply don't display, rather than displaying "—", they can feel free to do that. - -sche (discuss) 02:34, 10 September 2014 (UTC)
Thank you, - -sche. I actually meant d:Template:Authority control (lower-case "c"). Your version has removed both the links to articles about the authority control type (like w:VIAF, w:ORCID) and the links to the authority control databases (like https://viaf.org/viaf/70042340 and http://orcid.org/0000-0001-5882-6823). For a comparison, look at my user page here, and on en.WP. Please will you look to importing the Wikidata template instead, which has the latter links, and a version of the former (which I will then update), and hides empty parameters? Pigsonthewing (talk) 21:10, 14 September 2014 (UTC)
@Pigsonthewing: Ah, I see. Well, I didn't notice this before, but it seems that en.Wiktionary cannot import pages from Wikidata. (Compare how en.WP cannot import pages from en.Wiktionary.) The wikis which our Special:Import is configured to allow importation from are w, b, s, q, v, n, commons, and a long list of other-language Wiktionaries. I can think of a couple of ways around this. One is to get approval for en.Wikt to import Wikidata pages; I suppose the avenue for that would be bugzilla. Another is to copy whatever revision of the Wikidata template you want, using your edit summary to explain what you were doing and to direct users to see the Wikidata page for its history and contributors. - -sche (discuss) 22:54, 14 September 2014 (UTC)
@-sche: Thanks; in that case, I'll do the latter (I was just hoping to preserve the edit histories). Before I do, would you like to delete your imports, or shall I just overwrite them? Pigsonthewing (talk) 18:05, 15 September 2014 (UTC)
You can just overwrite them. - -sche (discuss) 18:17, 15 September 2014 (UTC)
@-sche: OK, that's done; see Template:authority control. I'm out of time now, but tomorrow I will work on the documentation, examples and styling. In the meantime, please see the instance on my user page. Pigsonthewing (talk) 21:03, 15 September 2014 (UTC)


The template is now working. It can be styled horizontally (like on Wikipedia - see my user page on en.WP) or left as it is. To do the former, we'd need to copy the rules for the class hlist from en.WP's Common.css

Next, we need to think about how to encourage contributors to display their authority control IDs on their user pages; and if appropriate to register for an ORCID identifier. How can I get a line added to Wiktionary:News for editors? Pigsonthewing (talk) 15:11, 16 September 2014 (UTC)

I've solicited others' input on whether to copy en.WP's hlist rules or not.
I can add a blurb to WT:NFE. What should it say? "Template:authority control exists and allows users to specify their authority control numbers." ?
- -sche (discuss) 20:10, 16 September 2014 (UTC)

Requests for verification of pronunciation

Occasionally I come across entries where I suspect the pronunciation listed is erroneous. Where can I request verification of suspect pronunciations? We have Wiktionary:Requests for verification but that seems to be for verifying meanings only, not for verifying pronunciations. —Psychonaut (talk) 14:18, 9 September 2014 (UTC)

All I can think of is to use the {{attention}} tag, or take a specific issue to the Tea room. —Aɴɢʀ (talk) 14:48, 9 September 2014 (UTC)
We have {{rfv-pronunciation}}. —CodeCat 16:58, 9 September 2014 (UTC)
Which categorizes entries into "Category:Xyz entries needing reference", but AFAICT all such categories are red. Category:English entries needing reference is at any rate, and adding {{poscatboiler|en|entries needing reference}} creates an error. —Aɴɢʀ (talk) 13:13, 13 September 2014 (UTC)
They haven't been created yet because there's a discussion at WT:RFM about what to name them. Please join in! —CodeCat 13:18, 13 September 2014 (UTC)
The trouble with joining in that conversation is that I don't care what they're named as long as a name gets picked soon. —Aɴɢʀ (talk) 14:33, 13 September 2014 (UTC)
I think I'll just create the categories for now, and delete them once a conclusion is reached. That way things aren't left hanging in the air while people decide. —CodeCat 19:25, 15 September 2014 (UTC)

Change in renaming process

-- User:Keegan (WMF) (talk) 16:22, 9 September 2014 (UTC)

Sant Bhasha

Apparently, a user that goes by the username Bhvintri (talkcontribs) says/claims that the word ਸੋਚੈ is of a language called Sant Bhasha. Should we make a code and category for such a language? --Lo Ximiendo (talk) 19:31, 10 September 2014 (UTC)

Can we wait for someone to comment before deleting the entries? DTLHS (talk) 00:56, 11 September 2014 (UTC)
I wasn't aware of this discussion, and the user kept creating more of them. I deleted them to try to limit the damage as every single one of them was badly formatted, and in any case we'd need to update them all if a code is assigned. —CodeCat 01:08, 11 September 2014 (UTC)
The Wikipedia article (w:Sant Bhasha) seems a bit confused about what this really is: is it a lingua franca, a conlang, or a grab bag of various similar lects used to communicate between members of a particular stratum of society who otherwise don't speak the same languages? Chuck Entz (talk) 02:51, 11 September 2014 (UTC)
Literally, it means "Saint language". I had not heard of it before, but I believe it is like the Slavic esperanto that any group of people who come from each of the Slavic-speaking countries (Russia, Poland, Ukraine, Serbia, Czech Republic, Slovakia, Belarus, Bulgaria) naturally fall into in order that everyone can speak to and understand everyone else. In the time of Germany’s w:Martin Luther (early 1500s), the German theologians, philosophers, and other writers from various parts of Germany, all speaking different dialects of German, did much the same thing, which is where today’s Standard High German comes from. —Stephen (Talk) 03:18, 11 September 2014 (UTC)
I think calling it an "Esperanto-like language" is misleading. It doesn't seem to have been deliberately constructed by someone. I think it's much more like a lingua franca. The Wikipedia article on Sant Bhasha is very confusing, but the article on the Guru Granth Sahib says more clearly, "It is written in the Gurmukhī script, in various dialects – including Lehndi Punjabi, Braj Bhasha, Khariboli, Sanskrit and Persian – often coalesced under the generic title of Sant Bhasha." That leads me to believe that Sant Bhasha doesn't require a code of its own, because all of the words appear in some other language, i.e. ਸੋਚੈ is a word of Lehndi Punjabi, Braj Bhasha, Khariboli, Sanskrit and/or Persian, although perhaps only written in Gurmukhi when it's being used in Sant Bhasha. (Is the phonetic similarity between sant and saint/santo a coincidence?) —Aɴɢʀ (talk) 07:01, 11 September 2014 (UTC)
Hi there! I will stop making new entries until this matter is resolved.
Now, why do we need separate entries for Sant Bhasha? Here are my points in favour:
1.) There is a bulk of literature written in it by a number of writers. I am listing a few works:
a. Guru Granth Sahib - Note it is a combined work of more than 30 authors.
b. Dasam Granth
c. Varan Bhai Gurdas
d. Panth Parkash
e. Suraj Parkash
2.) The languages from which it draws its vocabulary are Punjabi, Hindi, Marathi, Sindhi, the Apabhramshas, Sanskrit and Persian; none of them uses the Gurmukhi alphabet except Punjabi. It is therefore not logical to put Sanskrit, Persian or Hindi words in the Gurmukhi alphabet under those languages. By the same logic, why do we need the word 'algebra' under English in the Latin alphabet when we already have an entry for it under Arabic in the Arabic alphabet? Why do we need words borrowed from Latin under different languages when they are already explained under Latin?
3.) The most important point is that we have a bulk of word-forms where the root word comes from one language and its declension or conjugation is derived from another. E.g. in ਸੋਚੈ 'sochai', the root 'soch' could have been borrowed from Punjabi, whereas the ending -ai is derived from the Sanskrit instrumental plural -aih. So where should we put this word: under Punjabi or Sanskrit?
Q-Is the phonetic similarity between sant and saint/santo a coincidence?
A-Maybe Latin sanctus and Sanskrit santa originated from the same Indo-European root. Bhvintri (talk) 00:32, 12 September 2014 (UTC)
  • Are there any grammars or dictionaries of it, or is it just a literary language of a fixed number of works? I think that it should be mandatory to have citations for any added words for obscure cases such as this one, so that it's easier to clean them up in the future (e.g. if it is decided to treat it as a form of some other Middle/Modern Indo-Aryan language). --Ivan Štambuk (talk) 10:02, 13 September 2014 (UTC)
This is what Wikipedia says under Sacred language:
Sant Bhasha, a mélange of archaic Punjabi and several other languages, is the language of the Sikh holy scripture Guru Granth Sahib.

http://books.google.com/books?id=Itp2twGR6tsC Indo-Aryan Languages by Colin Masica also attests it on page 57 as:
The Sant or Nirguna tradition of mystical poets, beginning with Kabir, preferred a fluid mixed dialect with a strong Khari Boli element.

'An Introduction to the Sacred Language of the Sikhs' by Christopher Shackle is a book on the grammar of this language. But unfortunately I can't find it online. http://www.amazon.com/An-introduction-sacred-language-Sikhs/dp/B0007BRI5W
In Indo-Aryan Languages (edited by George Cardona and Dhanesh Jain) http://books.google.com/books?id=OtCPAgAAQBAJ on page xxi it is listed as 'Language of Adi Granth' (note: Adi Granth is another name for the Sikh holy scripture Guru Granth Sahib), separate from Punjabi or Hindi. Further, on pages 656 to 672, Christopher Shackle gives the declensions and conjugations of its nouns, pronouns, adjectives and verbs and compares them with Modern Punjabi.
The most famous dictionary of this language is the Mahan Kosh, published in 1930 and republished in 1981.
The Mahan Kosh and other dictionaries of this language can be found online at many links like this: http://www.srigranth.org/servlet/gurbani.dictionary
Bhvintri (talk) 19:39, 13 September 2014 (UTC)

Pronunciation needs to be at level 4 for Arabic

WT:ELE claims that pronunciation ought to be at level 3, above individual entries for nouns, verbs, adjectives, etc. Apparently, cases like duplicate where the pronunciation differs by part of speech are handled by listing the part of speech under the pronunciation section, above the corresponding pronunciation. But this fails entirely for Arabic, where a single page may have e.g. two nouns and three verbs on it, each with a different pronunciation (because the vowels are omitted in writing). In fact, it's rarely the case that two different part-of-speech entries will share the same pronunciation. As a result it seems clear to me that pronunciation for Arabic needs to go at level 4. But where exactly? I think the most obvious thing is to place it directly above the definition, possibly without any preceding header.

Note also that the current naming scheme for the .ogg pronunciation snippets is totally broken because it's named for the page title (without vowels), meaning that there's no way with this naming scheme to have separate pronunciations for two different subentries on the page, much less five or six. Benwing (talk) 06:28, 12 September 2014 (UTC)

This is a side-effect of the problem of etymologies in Arabic (both diachronic, i.e. from Proto-Semitic, and synchronic, i.e. from root x-y-z): they usually refer to one specific PoS using one specific derivational mechanism, but are instead usually grouped as if referring to all of them. After splitting by individual etymologies, pronunciations should come at level 4 naturally. These are left at level 3 for now until someone knowledgeable comes along. --Ivan Štambuk (talk) 15:22, 13 September 2014 (UTC)

Cleaning company spam

We're getting quite a lot of this lately (one every day or so?). It advertises various cleaning companies in the UK, in e.g. Brent and Enfield. I added a filter thing to prevent the original spam message, which can also be found on other sites with the same wording, but they seem to have changed to another one. Further filtering would be welcome, as this spammer is creating a lot of accounts — one useful thing might be that the word "clean" appears in many of the user names. Equinox 15:05, 12 September 2014 (UTC)

Chinese Character Composition

During my studies of Chinese I thought it would be useful to be able to look up characters by their components, not limited to the traditional radicals. I saw that Wiktionary already has some of this information and I used it as a starting point.

Altogether I decomposed more than 14,000 characters, traditional and simplified. My decomposition also provides locational information of the components in the characters.

See http://bioinfoc.ch:8081/languages/HanziComp for an application that uses the composition information. The Help link gives a short introduction.

The format of what I could provide is like this:

児 t:131/2s,b:r10 er2

兑 =11/3 =(t:r12a,b:11/2) s7 dui4

兒 =r10a =131/14 =(t:r134,b:r10) s8 er2 r5

兔 =164/1 =(a) s8 tu4

兕 t:69/118,b:r10 si4

兖 t:29/57,b:11/27 yan3

兗 t:29/57,b:11/2 yan3

兘 o:11/13,i:58/6 shi3

兙 o:152/1',i:r24 shi2 ke4

党 =63/32s =(t:200/105,b:11/2) s10 dang3

兛 o:152/1',i:10/3 qian1 ke4

The three fields are: Character Composition Pinyin

The composition field can also name components that are used in other characters:

  • =rNNN: traditional radical
  • =NNN/NNN: name used in Chinese Characters: A Genealogy and Dictionary (English and Mandarin Chinese Edition) by Rick Harbaugh (http://www.zhongwen.com/)

Some of the named components are atomic, marked =(a). Others are further decomposed, e.g. =(t:r134,b:r10). All of the named components carry their number of strokes as sNNN.
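For readers trying to follow the notation, here is a minimal Python sketch of how one line in this format might be split into its parts. It is not the contributor's own tooling: the token rules are inferred only from the sample lines above, the `Entry` class and function names are hypothetical, and anything the sketch does not recognise (such as the trailing `r5` on the 兒 line) is simply kept in an `other` list rather than interpreted.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Token shapes inferred from the sample lines only (assumptions, not a spec).
STROKES = re.compile(r"^s(\d+)$")                       # e.g. s7, s10
PINYIN = re.compile(r"^[a-zü]*[aeiouü][a-zü]*[1-5]$")   # e.g. er2, dang3


@dataclass
class Entry:
    char: str
    names: list = field(default_factory=list)   # "=..." names, e.g. r10a, 131/14
    parts: dict = field(default_factory=dict)   # position letter -> component id
    atomic: bool = False                        # True when "=(a)" is present
    strokes: Optional[int] = None
    pinyin: list = field(default_factory=list)
    other: list = field(default_factory=list)   # unrecognised tokens, kept as-is


def parse_parts(spec):
    """Turn 't:r134,b:r10' into {'t': 'r134', 'b': 'r10'}."""
    return dict(item.split(":", 1) for item in spec.split(","))


def parse_line(line):
    """Parse one whitespace-separated line of the sample format."""
    tokens = line.split()
    entry = Entry(char=tokens[0])
    for tok in tokens[1:]:
        body = tok[1:] if tok.startswith("=") else tok
        if body == "(a)":
            entry.atomic = True
        elif body.startswith("(") and body.endswith(")"):
            entry.parts = parse_parts(body[1:-1])
        elif tok.startswith("="):
            entry.names.append(body)
        elif ":" in tok:
            entry.parts = parse_parts(tok)
        elif STROKES.match(tok):
            entry.strokes = int(STROKES.match(tok).group(1))
        elif PINYIN.match(tok):
            entry.pinyin.append(tok)
        else:
            entry.other.append(tok)
    return entry
```

Calling `parse_line("兒 =r10a =131/14 =(t:r134,b:r10) s8 er2 r5")` under these assumptions yields the two names, a top/bottom split, a stroke count of 8 and the reading `er2`, with `r5` left uninterpreted.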

I'm sure that my analysis still contains errors (but I checked it in multiple ways for consistency), some questionable assignments or incomplete decompositions.

Anyway, is there interest from your side to integrate this information into Wiktionary in order to make it available to a broader audience? I would be able to put more work into this and give you the information in any format. —This unsigned comment was added by Brogerc (talk • contribs).

My Chinese knowledge is extremely limited, but no one else has replied and this information seems valuable. My main question would be how it could be presented to the user so that it could be useful and easily understood. And how does it differ from the composition data which is already present on many entries in Wiktionary? E.g. lists the composition as: ⿱. I assume this is similar to what "t:131/2s,b:r10" represents? Thanks. Pengo (talk) 22:43, 23 September 2014 (UTC)
From what little I know about Chinese characters, the components aren't necessarily just simple combinations. Often a character that is made of smaller components can be used as a component itself in an even larger character. So it's much like compounding, where a compound can be used as a base to form a larger compound. So the question is how we want to show this information. Do we want to show only the most basic components, or do we also want to show the intermediate combinations? —CodeCat 22:53, 23 September 2014 (UTC)
All I know is that this notation looks confusing and darn-near unreadable. I don't know how helpful it would be to add such an obtuse notation for character decomposition, especially given its dependence on using Latin letters. There's never going to be an "easy" method of breaking down these characters; many characters aren't based on regular radical forms but rather on however the Chinese could best modernize them from earlier Oracle Bone and Small/Large Seal Script variants. Bumm13 (talk) 22:29, 24 September 2014 (UTC)
The information I can contribute is exactly as already available for some characters such as the above mentioned , but I created it for >14,000 characters and, in addition, I can provide positional information (top, bottom, left, right, in, out) that is currently not available in Wiktionary.
Most of my decompositions are binary: left/right, top/bottom or in/out. Both of the components can possibly be further decomposed, but this can be looked up in the entries of both components (characters), as it is currently done in Wiktionary.
My (internal) IDs don't have to be shown, just the corresponding components (characters).
—This unsigned comment was added by Brogerc (talk • contribs).
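To make the comparison with the existing ⿱-style composition data concrete, here is a rough Python sketch of how a binary positional split could be rendered with Unicode ideographic description characters. Several things are assumptions: the key letters `l`/`r` (only `t`/`b` and `o`/`i` appear in the samples), the choice of ⿴ for every in/out split (some such characters may really need a different surround operator), and the `lookup` table from internal IDs to characters, which is hypothetical and filled here only with two Kangxi radicals.

```python
# Ideographic Description Characters (Unicode block U+2FF0..U+2FFF).
IDC_FOR_KEYS = {
    ("l", "r"): "\u2ff0",  # ⿰ left to right (key letters assumed)
    ("t", "b"): "\u2ff1",  # ⿱ above to below
    ("o", "i"): "\u2ff4",  # ⿴ full surround (a crude stand-in for all in/out splits)
}


def to_ids(parts, lookup):
    """Render {'t': id, 'b': id} as an IDS string such as '⿱臼儿'.

    `lookup` maps internal component ids to characters; ids missing
    from it are shown as '?' instead of being guessed.
    """
    for keys, idc in IDC_FOR_KEYS.items():
        if set(parts) == set(keys):
            return idc + "".join(lookup.get(parts[k], "?") for k in keys)
    raise ValueError(f"unsupported layout: {sorted(parts)}")


# Hypothetical lookup: r134 and r10 read as Kangxi radicals 134 and 10.
example_lookup = {"r134": "臼", "r10": "儿"}
ids_for_er = to_ids({"t": "r134", "b": "r10"}, example_lookup)
```

With that lookup, the 兒 sample line comes out as ⿱臼儿, which is the same shape of information Wiktionary entries already show, plus nothing that depends on the internal IDs being displayed.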

Birds in English


In the Finnish version we have many articles on birds. However, we have yet to decide on a naming policy, and I came here to ask your thoughts. I've noticed that here bird names are in lowercase, for example coal tit. In the Finnish version we have both fi:coal tit and fi:Coal Tit, and this is somewhat problematic. In ornithology, capitals seem to be used: http://www.worldbirdnames.org/english-names/spelling-rules/capitalization/. So, how should we write English bird names in the Finnish version? Of course, we would ideally want the interwiki links to work, so a common naming policy could also be sought. Suggestions? --Hartz (talk) 16:18, 13 September 2014 (UTC)

It’s up to you guys, but my suggestion is to include all attested spellings. — Ungoliant (falai) 16:49, 13 September 2014 (UTC)
(edit conflict) Naming policies are for Wikipedia (where there's a good bit of debate on the subject). Wiktionary goes by usage: in theory, the spelling/capitalization that's used most should be the main entry, and any others attestably in use should be alternative forms. In practice, though, the one that's created first tends to be the main article, and I'm not sure if the "most used" criterion has ever been explicitly made a policy. Information about which capitalization is used in which contexts would be good information for usage notes. If we were consistent enough with our context labels, we might indicate it that way, but people tend to use "zoology" or "ornithology" for any word having to do with animals or birds rather than just for words used by ornithologists or zoologists. Chuck Entz (talk) 16:51, 13 September 2014 (UTC)
  • I find there is a conflict between building a good set of substantive entries of such kind and having entries that are true to the most common orthography, which may differ by the source of whatever list or source the contributor was working from, contributor personal preferences, or even actual frequency research. Perhaps frequency of use should govern in principle, but it is a counsel of perfection that may stand in the way of good substance. Almost whatever reasonable two-part spelling a user types in will cause the search engine to find any two-part spellings in the wiki (including hyphenated forms, eg coal-tit, Coal-tit) and place them at the top of the headword-not-found page. Even a single-word spelling would be found, eg coaltit. But for regular users and contributors consistency of orthography makes it easier to know whether there is a substantive entry for a given vernacular name. The tedium of determining which is the more common form seems to me to far outweigh the benefits-in-principle, which seem quite modest relative to the benefits in practice IMO.
A possible practical solution would be to have standardized spelling for all main entries, but indicate the most common orthography among the alternative forms/spellings at the top of the entry. DCDuring TALK 20:20, 13 September 2014 (UTC)
  • The English Wikipedia engaged in a lot of bickering for a long time regarding how to capitalize birds' names, e.g. "rusty blackbird" / "Rusty Blackbird". Finally, recently, a broad (site-wide?) RFC — which NB was judged by an admittedly pro-uppercase editor — determined that birds' names should be lowercase, because they are lowercase in most cases (e.g. in general books about subjects like home decor which happen to mention that a tapestry depicts a rusty blackbird; in works of fiction; in general reference works; etc — in other words, in general use) and it is only in some specialist works on the subject of ornithology that bird names are capitalized, and in those cases, the capitalization is equivalent or akin to honorific capitalization or to the old English practice of capitalizing Important Words. As far as I know, Wiktionary has long used lowercase for that same reason. (Wiktionary and Wikipedia have both always used lowercase for animals other than birds, e.g. rusty tinamou.) Compare how several armed forces uppercase their rank terms and other terms, e.g. "Private", "Sailor", "Ship", etc, but we have just "private", "sailor", "ship", etc. Whether you want to include redirects from other case-forms is up to you. Wiktionary does not use redirects for things like "Ship"; I don't know of any examples of Wiktionary using redirects for birds' names, but I can't rule out that some exist, and I express no opinion at this time on whether or not such redirects should exist. - -sche (discuss) 20:10, 13 September 2014 (UTC)

Official wordlists for specific fields

There are prescriptive lists of common names for some taxonomic groups issued by various scientific and other organizations, e.g. the w:International Ornithological Congress has a list at www.worldbirdnames.org. These obviously aren't a factor as far as CFI, but it might be worthwhile to have categories and/or reference templates to indicate that a name is designated as the preferred name in a given list. I don't think appendices are that good of an idea, since they would duplicate lists available elsewhere online. The IOC list would fit nicely in our current topical framework, since it covers multiple languages, but I believe there are some that only cover a given language in a given region.

I would appreciate any suggestions on how to represent this information, since these lists would be a good way to expand our coverage of names for living things, and some are available online in formats that could be used for mass importation of entries if anyone who knows how is so inclined. Chuck Entz (talk) 17:43, 13 September 2014 (UTC)

A good representation is to create an entry for each bird. I'm sure many already exist. This Excel spreadsheet has a lot of good information and this would be a lot of work. A bot could create the missing English entries, add the given translations, and create the FL entries. The English bird names are all capitalized in the spreadsheet, though. --Panda10 (talk) 18:46, 13 September 2014 (UTC)
We have more than 5200 Translingual bird name entries that use {{R:Gill2006}}, thereby indicating recourse to the IOC publication. These usually contain little more than the English vernacular name, not even that for many genus names. Remarkably that template does not appear in many (any?) English vernacular name entries.
Mass importing might be nice once we agree on a desirable format, which need not be very difficult. I have been working on a more demand-oriented approach for taxonomic names and corresponding vernacular names, but mass-import is good. Spreadsheets are easy to work with for reformatting, so capitalization need not be a problem if we agree on a simple policy of importing in a standard capitalization, leaving the more time-consuming business of determining more frequent capitalization, by date, usage context or whatever, for future generations of contributors with even more powerful tools and resources.
Isn't there also an international bird-watchers body that has different naming ideas? DCDuring TALK 21:39, 13 September 2014 (UTC)
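As an illustration of "importing in a standard capitalization", the following Python sketch lowercases each word of a spreadsheet name unless it is on a keep-list. The keep-list is hypothetical and far too short for real use; deciding which words are place names or eponyms is exactly the part that would still need human checking before any mass import.

```python
# Hypothetical keep-list of words that stay capitalised (place names, eponyms).
# A real import would need a much fuller list, checked by hand.
KEEP_CAPITALISED = {"Eurasian", "European", "American", "African", "Wilson's"}


def normalise_name(name, keep=KEEP_CAPITALISED):
    """Lowercase each space-separated word unless it is listed in `keep`."""
    return " ".join(w if w in keep else w.lower() for w in name.split())


samples = ["Coal Tit", "Eurasian Blue Tit", "Black-headed Gull"]
normalised = [normalise_name(s) for s in samples]
```

The capitalised spreadsheet form could then be recorded as an alternative form of the lowercase entry, leaving frequency research for later.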
  • @Chuck Entz: There is also a similar list for viruses (~2-3K names), published by the International Committee for Taxonomy of Viruses. Checklists from the USDA Plants database are downloadable for each US state, certain territories and possessions, for Canadian provinces, and for the whole. It should be possible to get their USDA official "vernacular" name. The large NCBI (US) taxonomy database seems downloadable and has some vernacular names. There are certainly others for taxonomic families or other groupings. Some may be POV, in the sense of advocating a taxonomic scheme, not necessarily widely accepted. Some more definitive higher-level groupings seem to be restricted to further the sales of print publications or of machine-readable data (eg, mammals). There seem to be many more databases of scientific names than of vernacular names, even of the "recommended" vernacular names. Vernacular names deserve some priority, especially as long as contributors who vote on this page disfavor for some reason I can't fathom translation tables for taxonomic names.
All of that said, Wikispecies has many entries with tables of vernacular names. Simply adding the taxonomic names for such entries, followed by stub L2 sections for the vernacular names we do not have, would be a significant contribution, possibly more to the taste of contributors here. We might be able to do them the favor of identifying possible or even actual errors using gender agreement. I have confirmed some using more definitive online databases such as IPNI. DCDuring TALK 18:18, 23 September 2014 (UTC)

Renaming rhyme pages

I have noticed rhyme pages have been renamed, such as from Rhymes:Czech:-alɪ to Rhymes:Czech/alɪ. I object and ask that they be renamed back. I cannot find the Beer parlour discussion for this. --Dan Polansky (talk) 07:25, 14 September 2014 (UTC)

Note that I am the creator of more than 1000 comprehensive Czech rhyme pages.

I object to subcategorization; I ask that all Czech rhyme pages be found in Category:Czech_rhymes, as they were before not too long. --Dan Polansky (talk) 07:28, 14 September 2014 (UTC)

Wiktionary:Requests for moves, mergers and splits

I think Wiktionary:Requests for moves, mergers and splits does more harm than good. Proposals are being made there that affect more than 1000 pages. Such proposals should IMHO be made in Beer parlour. For one-off moves of single mainspace pages, WT:Tea room should suffice. I think whenever someone makes a proposal there affecting a volume of pages, the proposal should immediately be rejected as being made via a wrong venue. In the ideal hypothetical world, the page would probably be deleted. --Dan Polansky (talk) 08:56, 14 September 2014 (UTC)

Maybe we should merge all the forums into your talk page so you won't miss anything, because, obviously, there's nothing more important than keeping you informed. Never mind that it's been the designated forum for this kind of thing for years- you missed out on something because you weren't paying attention to it, so it has to go. NOW!!! Chuck Entz (talk) 15:01, 14 September 2014 (UTC)
Wrong. I found the forum annoying when it was created back in 2010. It is now doing tangible harm. --Dan Polansky (talk) 18:28, 14 September 2014 (UTC)

Rhyme pages and subcategories or subcategorization

Dutch rhyme pages have subcategories. You can browse them from Category:Dutch rhymes and see how useful or not these are.

  • I oppose creating such subcategories for Czech rhyme pages.
  • I oppose that an editor not working on rhyme pages for a particular language creates rhyme subcategories for that language without having express support for doing so from editors working on rhyme pages for that language.

Subcategories for rhymes are a fairly useless form of organizing rhyme pages, IMHO. My idea of a useful organization of rhyme pages can be seen at Rhymes:Czech, which uses tables AKA matrices rather than a hierarchical tree, which is what categories present. Worse yet, subcategories do not present the hierarchical tree at a glance; rather, you have to click through them one at a time to see their content; even by clicking them one at a time, you won't see the larger picture.

--Dan Polansky (talk) 09:10, 14 September 2014 (UTC)

Please make RFE & RFP mandatory

Requests for etymology and requests for pronunciation should be common practice. Contributors should have the luxury of being able to facilely go through catalogues of terms that are without etymology or pronunciation. It facilitates navigation and accelerates labour. Concerning etymology, exceptions can be made for some noncanonical entries and some alternative forms, but that is about it. --Æ&Œ (talk) 07:16, 15 September 2014 (UTC)

Every lemma entry should have under those headers either good content or a link to an entry that has such content.
But I don't think universal use of {{rfp}} and {{rfe}} will lead to more good content under the headings. Perhaps it would be nice to make sure that all lemma L2 sections have Etymology and Pronunciation headers to reduce tedious typing for those who would add the content. IMO it is more important to find out which entries actually have motivated a specific individual request. It might be nice even to have a mechanism for votes supporting such requests on specific entries.
I would strongly favor creating lists (even just counts) of lemma L2 sections that lack pronunciation headers, etymology headers, translation headers and, more importantly, those that have the headers but lack actual content under those headers. DCDuring TALK 15:12, 15 September 2014 (UTC)
  • Block this troll already. This is not a sane proposal, and the troll knows it very well. --Dan Polansky (talk) 17:17, 15 September 2014 (UTC)
The French Wiktionary always uses {{rfp}} (well the nearest equivalent). Admittedly the total number of entries that use it is in the hundreds of thousands, probably over a million. Renard Migrant (talk) 22:13, 15 September 2014 (UTC)
What exactly is the point of a request category with millions of members? DTLHS (talk) 22:17, 15 September 2014 (UTC)
Maybe we should start making a distinction between entries for which something is requested, and entries which are merely lacking something? —CodeCat 22:27, 15 September 2014 (UTC)
That's what I was trying to get at. I was thinking that lists of entries lacking pronunciation headers or lacking etymology headers would be good applications of dump-processing. I would think they would be most helpful for English one-word lemmas. DCDuring TALK 23:50, 15 September 2014 (UTC)
We have 369K members of the English lemma category, 89K mainspace entries that contain both "English" and "pronunciation" (so, probably an overestimate of entries with English Pronunciation headers) and 1K English rfps. It seems that the English lemma category includes many abbreviations, plurals, multi-word terms, and other items for which pronunciation and etymology are not necessarily worth any significant effort. DCDuring TALK 00:05, 16 September 2014 (UTC)
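The dump-processing idea could be sketched roughly as follows in Python. This is only an illustration of the check itself, operating on the wikitext of a single page; reading the dump, filtering to lemmas, and the function names are all assumptions. It tests only whether a header is present in a language's L2 section, not whether there is real content under it.

```python
import re

# An L2 header is exactly two "=" on each side, e.g. "==English==".
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def language_section(wikitext, language="English"):
    """Return the text of one L2 language section, or None if absent."""
    headers = list(L2_HEADER.finditer(wikitext))
    for i, m in enumerate(headers):
        if m.group(1).strip() == language:
            end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
            return wikitext[m.end():end]
    return None


def lacks_header(wikitext, header, language="English"):
    """True if the language section exists but has no such header at any level."""
    section = language_section(wikitext, language)
    if section is None:
        return False
    pattern = re.compile(
        r"^=+\s*" + re.escape(header) + r"(\s+\d+)?\s*=+\s*$", re.MULTILINE
    )
    return pattern.search(section) is None


page = """==English==

===Etymology===
From somewhere.

===Noun===
# A definition.

==French==

===Pronunciation===
* IPA here

===Noun===
# Another definition.
"""
english_needs_pron = lacks_header(page, "Pronunciation")
```

Running such a check over every lemma page would give the lists (or just counts) of entries lacking a given header, kept separate from entries for which someone has actually made a request.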
I found nothing strange in the request. The requests would not be manageable but it's not a reason for blocking or ridiculing. As for the request itself, I oppose it. We have a huge number of such requests already. We're lucky to have an entry for a term - with an English translation.
Etymology: I actually find Korean entries a good example - the etymologies are split roughly by Sino-Korean (40-60%), native (about 35%), loanwords from European languages (5%). Having something like "native + language name" is already informative. For Slavic, Germanic, Romance, etc. the minimum info could be "Slavic", etc.
Pronunciation: For languages such as Czech, etc. pronunciation can be automated but someone has to create a module. Languages, such as English could use a phonetic respelling to get automatic IPA, look at Persian or Chinese which use transcription to get IPA. --Anatoli T. (обсудить/вклад) 00:19, 16 September 2014 (UTC)

Category:Terms needing transliteration by language

I've not spotted this one before, but some of these use [[:Category:<langname> needing transliteration]] and some use [[:Category:<langname> lacking transliteration]]. Even worse, some use both and split the entries over two categories. Purely because of the name of the parent category, could we align these into [[:Category:<langname> needing transliteration]]? For our purposes lacking and needing are synonymous. Renard Migrant (talk) 22:12, 15 September 2014 (UTC)
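A small Python sketch of how the split categories might be found and what the aligned names would be. It assumes the category titles follow exactly the two patterns quoted above; the sample titles are made up for illustration, and nothing here touches the wiki itself.

```python
import re

PATTERN = re.compile(
    r"^Category:(?P<lang>.+) (?P<kind>needing|lacking) transliteration$"
)


def aligned_name(category):
    """Map a 'lacking' category title to its 'needing' equivalent."""
    m = PATTERN.match(category)
    if m and m.group("kind") == "lacking":
        return f"Category:{m.group('lang')} needing transliteration"
    return category


def split_languages(categories):
    """Languages whose entries are spread over both category names."""
    seen = {}
    for name in categories:
        m = PATTERN.match(name)
        if m:
            seen.setdefault(m.group("lang"), set()).add(m.group("kind"))
    return sorted(lang for lang, kinds in seen.items() if len(kinds) == 2)


# Made-up sample titles, for illustration only.
sample = [
    "Category:Armenian needing transliteration",
    "Category:Armenian lacking transliteration",
    "Category:Georgian lacking transliteration",
]
both = split_languages(sample)
```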

large page navigation

When I’m loading a large page, I have to wait approximately one minute for the content to load just so that I can see the languages that I’m interested in. The table of contents is irritating to navigate, particularly on pages with a huge number of bytes (e.g.: a). The table doesn’t even have nested tabs like on Wiktionnaire.

My idea would be to have some sort of option to ‘filter’ languages before the page loads, but I suspect that this would be very difficult to programme. I can’t think of a superior alternative, though. Do you lot have any better ideas, by any chance?

Am I the only one who finds it annoying to navigate high‐content pages? I’m not sure what we can do about it, though. --Æ&Œ (talk) 13:04, 16 September 2014 (UTC)

Buy a newer computer. I never have problems with large pages. --Vahag (talk) 13:25, 16 September 2014 (UTC)
I don’t have that kind of trouble at all. For me, every page loads as fast as every other page ... in about a second. First, I think you should go to PREFERENCES > Gadgets > User interface gadgets, and tick Enable Tabbed Languages. Each language will have its own page (and no more tables of contents). Second, I don’t know if the browser makes any difference, but I use the very latest Firefox browser, Firefox 32.0.1. Third, RAM memory might be an issue, and you should see if you can get another RAM memory card that will increase your computer’s memory. My computer is just a cheap old laptop that I bought second-hand several years ago. I just added some RAM and upgraded to Windows 7. I think if you do these things, your pages will load quickly. —Stephen (Talk) 13:57, 16 September 2014 (UTC)

Subpage editing weirdness

I went to cull some blue links from User:Brian0918/Hotlist/A2, and was unable to save the edit, instead receiving a message that said:

This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please inform an administrator of what you were trying to do. A brief description of the abuse rule which your action matched is: Users touching other users' user pages and subpages

I was, however, able to move the page to my userspace, delete some of those blue links, and then move it back. I also tried editing User:Robert Ullmann/Oldest redlinks, and got the same message. Since I am an administrator, it seems rather pointless to "inform an administrator" of what I was trying to do. I have edited subpages like these up until earlier today without this issue coming up. What changed? bd2412 T 19:02, 16 September 2014 (UTC)

See Special:AbuseFilter/24. Looking at today's change log, it says: "2014-09-16: Check user_rights for "autoconfirmed" instead of user_groups for "*confirmed". That way global groups providing "autoconfirmed" are also supported whilst still supporting the local "confirmed" user group, too. ––Krinkle". Equinox 19:09, 16 September 2014 (UTC)
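The difference between the two checks named in that change log can be pictured with a short Python sketch. This is not AbuseFilter syntax and does not reproduce the actual filter; the user records, group names and rights lists are hypothetical, and the sketch does not try to explain the false positive reported above.

```python
def passes_old_check(user):
    # Old rule as described: member of a group matching "*confirmed".
    return any(g.endswith("confirmed") for g in user["groups"])


def passes_new_check(user):
    # New rule as described: holds the "autoconfirmed" right, however granted.
    return "autoconfirmed" in user["rights"]


# Hypothetical accounts, for illustration only.
local_user = {"groups": ["autoconfirmed"], "rights": ["edit", "autoconfirmed"]}
global_user = {"groups": ["some-global-group"], "rights": ["edit", "autoconfirmed"]}
new_user = {"groups": [], "rights": ["edit"]}

results = {
    name: (passes_old_check(u), passes_new_check(u))
    for name, u in [("local", local_user), ("global", global_user), ("new", new_user)]
}
```

Under these made-up records, only the account whose right comes from a global group is treated differently by the two rules, which is the case the change log says it was meant to support.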
Fixed. Next time, post this in the WT:GP. --WikiTiki89 19:11, 16 September 2014 (UTC)
... but as long as we're in the BP: there has been discussion of changing the filter to allow users to edit others' subpages; I now support such a change. Would anyone else like to express a view on (or implement) that change? - -sche (discuss) 19:14, 16 September 2014 (UTC)
I support such a change. It's often legitimate to edit subpages of other users, like to update bot feed lists or to remove entries from lists that another user has generated from a dump. Editing the main user page of another user should be restricted to people who know what they are doing (unlike User:WritersCramp who vandalised my user page in diff). —CodeCat 19:27, 16 September 2014 (UTC)
I've been doing that for years. Removing entries from lists that another user has generated from a dump, I mean, not vandalizing CodeCat's user page. bd2412 T 20:09, 16 September 2014 (UTC)
As admin I've been editing selected dump-populated subpages, too. But I have simple questions about this:
  1. Would this be a default setting that could be overridden?
  2. What level of user are we talking about? Registered? Whitelisted?
Prudence would require that the pages be added to one's watchlist, no matter what the default and no matter what level of user. DCDuring TALK 21:47, 16 September 2014 (UTC)

bookworm: movies

Here’s a kind of ngram for movies & TV: movies.benschmidt.org Michael Z. 2014-09-16 21:16 z

Korean lemmas and categories

Discussion: User_talk:Jusjih#Categories

(Notifying TAKASUGI Shinji, Wyang, Jusjih):

We have a bit of a disagreement with User:Jusjih regarding Korean hanja. In my opinion, hanja or Chinese character forms, as not a primary writing system for modern Korean, should not have any topical categories. Cf. Japanese kyūjitai (pre-reform spellings), kana (usually hiragana and sometimes katakana) terms (when not the most common spelling), let alone romanisation entries - rōmaji and pinyin.

E.g. 중국인 (junggugin) (the current standard spelling) belongs to Category:ko:Nationalities but IMO 中國人 (hanja entry) should not. I would say the same about Vietnamese Hán tự entries. Both Korean hanja and Vietnamese Hán tự definitions are, by convention, very short (one-liner) and only link to the current writing system, i.e. hangeul for Korean and Latin spelling for Vietnamese.

What do you think? Comments regarding kyūjitai, kana, Hán tự would also be appreciated. Format for rōmaji and pinyin entries are set in stone by appropriate votes. --Anatoli T. (обсудить/вклад) 06:02, 17 September 2014 (UTC)

As South Korean debates of pure hangul vs. mixed script with hanja are still ongoing, your POV toward Korean hanja could extend the Korean "text war" (w:zh:朝鮮漢字#六十年文字戰争, w:ja:朝鮮における漢字#「文字戦争」). Wiktionary is not a battlefield. Even Korean Google News does get a few hanja without hangul in parentheses, like "사우디전 나서는 이광종號, 주목해야 될 점은?". When using a Korean hanja dictionary, getting homophone compounds is so easy, like 수도 (sudo) matching so many different hanja compounds with very different topics.--Jusjih (talk) 07:16, 17 September 2014 (UTC)
To clarify, I don't hold or promote any negative opinion about hanja, I create hanja entries, too. I'm just saying that hangeul is the primary writing system and the forms learners are more likely to use. Most hanja entries lack categories, pronunciation and other sections too. I don't think it's such a great deal but maintaining categorisation for both forms is also quite difficult. --Anatoli T. (обсудить/вклад) 07:47, 17 September 2014 (UTC)
The practice of centralising all information on the Hangul/Quoc Ngu pages is about minimising the amount of maintenance work needed, as words in various East Asian languages can easily be written in a variety of scripts. I also support merging the Simplified and Traditional Chinese entries, by making Traditional the lemma form where all trad-simp conversions are by default enabled, and making Simplified a pretty soft-redirect containing no information but the link and the Hanzi box. Wyang (talk) 00:30, 18 September 2014 (UTC)
If simplifying maintenance is the reason not to categorize topics in too many pages, here are some scenarios in which your proposed merger from simplified Chinese to traditional Chinese will work:
  1. If a Chinese compound has simplified and traditional characters while unused in Japanese kanji, Korean hanja, and Vietnamese Hán tự, then your proposed merger to redirect will work very well.
  2. If a Chinese compound has simplified and traditional characters while also used in Japanese kanji unchanged and Korean hanja unsimplified, then your proposed merger to redirect will work well, like the noun 结婚 redirected to 結婚 (marriage).
  3. Finally, thanks for promoting traditional Chinese. I have heard that red China has officials calling for reverting to traditional Chinese with no further avail yet, but I have not heard of Japan planning to return to kyujitai. Once Chinese soft redirects work well, would you like Japanese kyujitai (old characters) soft-redirected to shinjitai (new characters) as well? Some but not all Japanese shinjitai are the same as simplified Chinese.--Jusjih (talk) 04:36, 18 September 2014 (UTC)
@Jusjih:. I think User:Wyang means a soft redirect, meaning there will be still entries for simplified forms, traditional having all the term info. It has its merits (centralising all information), even though simplified Chinese beats traditional 10:1 or so (Internet content and publications). --Anatoli T. (обсудить/вклад) 01:06, 19 September 2014 (UTC)
Chinese Wiktionary is even more sensitive on traditional vs. simplified characters. English Wikisource already uses soft redirects for different purposes. Now, what is your concern about maintaining categories on different scripts, like Korean hangul vs. hanja? Because categories may change in the future in some ways?--Jusjih (talk) 06:11, 19 September 2014 (UTC)
My concern is synchronisation. It's difficult to keep the information in traditional and simplified entries in sync. E.g. currently there are many traditional entries without the audio link but simplified entries have them. If we add categorisations to hanja as well, then ALL hanja entries should have them but I don't see the need. If, e.g., the countries, fruits and animals categories contain hangeul only, that's sufficient: one-line hanja entries have the link to hangeul entries. As for the Chinese Wiktionary, it's not very consistent about keeping only simplified entries but they have only one entry per term, not two, like paper dictionaries - they provide both forms but in ONE entry. --Anatoli T. (обсудить/вклад) 06:31, 19 September 2014 (UTC)
Then soft redirects or the like should be considered for these?
  1. simplified to traditional Chinese (Hong Kong and Macau sometimes use different traditional characters from Taiwan.)
  2. Japanese kyujitai to shinjitai
  3. Korean hanja to hangul
  4. Vietnamese Hán tự to Latinized alphabets
As treating different scripts in CJKV as equally as possible would be ideal, I wonder how feasible it is to automatically synchronize topic categories with our proposed soft redirects. This will require some technical work to detect topic categories in target pages.--Jusjih (talk) 05:50, 20 September 2014 (UTC)
  1. Variants (with qualifiers, context labels) would be sufficient. Simplified characters are actually better standardised than traditional and have fewer dated, rare, obscure or regional characters. Well, that was also a reason for the simplification.
  2. Already the case, sort of. Kyūjitai has less info than shinjitai. User:Eirikr thinks they also need dated labels or similar, since kyūjitai is no longer used in Japan.
  3. Don't you like our current hanja structure?
  4. Same for Vietnamese Hán tự. Hanja and Hán tự entries are almost like soft redirects already. At least they are supposed to be one-liners, with pronunciation, etymology, synonyms, usage examples, etc. in hangeul entries.
From User:Wyang's post I understand he wants simplified entries to look something like this:

The problem here is with the definition lines and the missing PoS. Even if we make it like (with a new template {{jiantizi}} or similar):

# {{jiantizi|天氣}}

The community may not accept such formats. --Anatoli T. (обсудить/вклад) 07:02, 20 September 2014 (UTC)
I like Anatoli's second proposed format. Wyang (talk) 00:38, 22 September 2014 (UTC)
@Wyang: I was just materialising your view in an example. As I said, the problem is with the missing PoS header. It won't work without a vote because it violates the current WT:ELE. The format may need tweaking to match something like {{alternative form of}} examples or other templates. If PoS headers are included, it has a better chance of passing. Otherwise, let's see what others will say. If it passes in this format, then WT:ELE needs to be updated to reflect this exception. Other Chinese editors need to be polled as well. --Anatoli T. (обсудить/вклад) 00:51, 22 September 2014 (UTC)
(Notifying Kc kennylau, Atitarev, Tooironic, Jamesjiao, Bumm13, Meihouwang): Pinging other editors. Please express your views here too. Wyang (talk) 00:55, 22 September 2014 (UTC)
I am favourable to a centralisation of the information in the traditional page.
  • First, it will reduce the amount of work for words with simplified and traditional forms: no need to copy and convert the work done on one form to the other.
  • Secondly, the Wiktionary data will be easier to parse as there would be no duplicate information. I parse the Wiktionary data for use in a dictionary I am programming. With characters that have two forms, it's hard to programmatically merge the two entries so I just take one arbitrarily.
A little more work will need to be done as every word in the entry will need to be written in simplified and traditional form (with {{zh-l}}, {{zh-ts}}, {{zh-tra}} or {{zh-sim}}) but in the end there will be less work to do than when synchronizing simplified and traditional entries. Meihouwang (talk) 11:17, 24 September 2014 (UTC)
I am more in favour of centralising trad./simpl. entries but preserving PoS headers and categorisations; that way simplified entries won't be "discriminated against". The definition lines would contain soft redirects to traditional entries. Simplified entries would still require maintenance but less of it: no pronunciation, etymology, synonyms, antonyms, usage examples, usage notes, etc. I won't insist on this if Wyang's proposal passes and others agree with his plan. BTW, we should separate this topic from the original Korean hanja.
(While realising the need to centralise and simplify work, jiantizi is a standard in China, Singapore, Malaysia and most universities teaching Chinese, preferred by foreign learners and, as I mentioned before, has much more Internet penetration and amount of published texts. Japanese shinjitai also often coincides with Chinese jiantizi (about 30%?). Just a thought, which may be raised by opponents.)--01:05, 29 September 2014 (UTC)
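On the earlier point about detecting topic categories in target pages: a minimal sketch of what that check might look like, assuming topical categories always follow the "Category:<language code>:<Topic>" pattern seen in Category:ko:Nationalities and that the wikitext of both pages is already at hand. The function names and the sample wikitext are hypothetical and only for illustration; this is not an existing bot or module.

```python
import re

# Assumption: topical categories look like [[Category:ko:Nationalities]],
# optionally with a sort key after a pipe. Part-of-speech categories such as
# [[Category:Korean nouns]] have no lowercase language-code prefix and are skipped.
TOPIC_CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*([a-z-]+):([^\]|]+?)\s*(?:\|[^\]]*)?\]\]"
)


def topical_categories(wikitext, lang_code):
    """Return the topic names of lang_code's topical categories in wikitext."""
    return [topic for code, topic in TOPIC_CATEGORY.findall(wikitext)
            if code == lang_code]


def missing_categories(redirect_wikitext, target_wikitext, lang_code):
    """List topics present on the target page but absent from the soft redirect."""
    present = set(topical_categories(redirect_wikitext, lang_code))
    return [topic for topic in topical_categories(target_wikitext, lang_code)
            if topic not in present]


# Hypothetical, heavily simplified page contents.
hangul_page = (
    "==Korean==\n===Noun===\n# Chinese person\n"
    "[[Category:ko:Nationalities]]\n[[Category:Korean nouns]]"
)
hanja_page = "==Korean==\n===Noun===\n# hanja form of 중국인"

print(missing_categories(hanja_page, hangul_page, "ko"))
```

Whether the missing categories should then be copied over, or only listed for review, is exactly the policy question under discussion; the sketch only does the detection.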

Visual Editor

For the record, I oppose the introduction of Visual Editor (W:Wikipedia:VisualEditor) in any form or manner into English Wiktionary; I also oppose an opt-in introduction. Some reasons for this opposition were stated by Kephin in Wiktionary:Grease_pit/2014/September#Visual_Editor. --Dan Polansky (talk) 05:40, 18 September 2014 (UTC)

  • Hi Dan. Having this tool would allow me to more easily do some things that would be helpful to the project. Please reconsider. Perhaps we could condition use on the editor having some level of participation which makes misuse unlikely (admin-only or the like). Cheers! bd2412 T 14:09, 18 September 2014 (UTC)
  • I see no reason why this should not be allowed as an opt-in if it does not have any effect on anyone other than the contributor using it. I doubt that it would be a serious server-resource hog, that it would generate a lot of help requests at WT:GP or WT:ID, or that it would generate a higher ratio of bad content to good content, at least as an opt-in. DCDuring TALK 14:19, 18 September 2014 (UTC)
  • I also support the introduction of Visual Editor, at least as opt-in. —Mr. Granger (talkcontribs) 22:30, 18 September 2014 (UTC)
  • I would also say there is no reason for it not to be available for anyone who wants to try it. It should not, of course, be enabled by default, or even promoted in any way, but it should at least be possible for individual users to use it somehow. VE is at the stage where it is not causing page corruption or other undesirable effects that disrupt or pollute wikitext or diffs. This, that and the other (talk) 10:14, 23 September 2014 (UTC)

requests for synonyms

I believe that we should have a template that requests synonyms. This may be somewhat problematic since not all words are going to have synonyms, but perhaps we could compensate by simply inserting ‘This term does not have any synonyms.’ Is anybody totally opposed to this template idea? --Æ&Œ (talk) 13:23, 18 September 2014 (UTC)

From the user's point of view I think it's annoying and unhelpful to have editor-only content in an entry, saying "hey, something is missing!". I would prefer this kind of thing to be some kind of invisible markup that generates a category (for those wanting to help with that category) but doesn't add text to the entry. Also, is this a feature that people would use; do we get many requests for synonyms already, on talk pages? Equinox 13:28, 18 September 2014 (UTC)
You both make good points.
We do get occasional requests for synonyms at Info Desk and Tea Room. As we discourage the use of entry talk pages by not responding to them quickly, it is no surprise that they are not much use for this. Even if the only use of this would be to structure user requests to make them easier to fulfill it would be a help.
I agree that some of our request boxes aimed at requesting content, such as those created by {{rfi}}, {{rfp}}, {{rfe}}, are too prominent. The alternative of eliminating them and relying exclusively on a category misses a chance to teach ordinary users that such requests exist, which may help them make such requests, which may get them involved as content contributors. I also find that occasionally I will be motivated by a visible request in an entry to add the requested content. OTOH I have never been motivated to add content by a category.
Accordingly, we could make a {{rfsyn}} that displayed on the scale of {{rfv-sense}}. I would further favor it not displaying properly unless a sense were provided. Another approach would be to modify {{sense}} to allow a second parameter which could be "?", which could generate the (modest) display and categorize. DCDuring TALK 14:09, 18 September 2014 (UTC)
I have added and used {{rfelite}}, both because it is less intrusive and therefore desirable and as a demonstration of what a {{rfsyn}} could look like. DCDuring TALK 16:23, 23 September 2014 (UTC)
I’m not sure what you mean by ‘editor‐only content.’ Are you saying that your requests for quotations aren’t ‘editor‐only content?’ I’m confused.
I don’t remember many requests from others concerning synonyms, but I certainly do request many synonyms from Mister Brown. Not sure why you think that it must be requested to be merited. Nobody requests my silly entries, but I create them any way. I mean, if I end up helping people besides myself, great. But if it only helps me, whatever. Generally speaking, I create what I want because I want it, not because somebody else does. --Æ&Œ (talk) 14:21, 18 September 2014 (UTC)
  • I oppose there being a template for requests for synonyms. --Dan Polansky (talk) 07:33, 20 September 2014 (UTC)

Can we disable the #babel parser function?

This parser function is quite problematic because it does not use Wiktionary's own database of languages, but uses Wikimedia's. These two lists differ on some crucial points. In particular, the Wikimedia list is not always ISO compatible, using "als" for Alemannic German while this code represents Tosk Albanian in ISO and "gsw" is used in both ISO and Wiktionary for Alemannic. The function also includes codes like "zh-min-nan" for our "nan", "nds" for our "nds-de", and also codes not recognised at all by Wiktionary like the Serbo-Croatian standard varieties, and even "Simple English". And perhaps most crucially, it does not support any of the custom Wiktionary codes (basically all the ones in Module:languages/datax). It could be argued that users should be free to declare their language in the form they prefer, but at the same time the main point of Babel boxes is to allow other users to find people who know a particular language well enough to edit entries for it. In that light, since we have no Croatian or Simple English entries, we don't need categories for speakers of those languages either. —CodeCat 22:37, 19 September 2014 (UTC)

That's the whole point of it: so people can use the same babel template across all Wikimedia sites. I'd say we should discourage its use, especially by frequent contributors, but definitely not prohibit it, and definitely not disable it. Also, I don't see people listing languages Wiktionary doesn't recognize as a problem in any way. --WikiTiki89 22:47, 19 September 2014 (UTC)
What do you suggest we do with categories like Category:User simple, Category:User zh-min-nan or Category:User als then? —CodeCat 22:57, 19 September 2014 (UTC)
That's one minor flaw. Can we make it not categorize? --WikiTiki89 23:05, 19 September 2014 (UTC)
Not that I know of. And if we could, what would be the point in having it at all? —CodeCat 23:19, 19 September 2014 (UTC)
The babel boxes on the userpage... --WikiTiki89 23:34, 19 September 2014 (UTC)
I don't think the current format of the babel boxes is very useful anyway. Because the text is not in English, someone who doesn't understand the language will probably not know what language it is, unless they also know the language code. It would be clearer if the description was in English. —CodeCat 01:15, 20 September 2014 (UTC)
You can use non-predefined templates within the Babel boxes. The templates just have to start with the name "User " (or maybe "user ", not sure what happens on case-sensitive wikis like this one). Such templates could be created for languages not known to Wikimedia. This, that and the other (talk) 10:16, 23 September 2014 (UTC)
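For illustration only, the code mismatches named at the top of this thread could be written down as a small lookup. This is a sketch that contains just the examples mentioned above (als/gsw, zh-min-nan/nan, nds/nds-de, and "simple" as a code with no Wiktionary counterpart); it is not a complete list, and the function name is hypothetical rather than anything in Module:languages.

```python
# Wikimedia babel code -> Wiktionary code, limited to the examples in this thread.
WIKIMEDIA_TO_WIKTIONARY = {
    "als": "gsw",         # Alemannic German ("als" is Tosk Albanian in ISO)
    "zh-min-nan": "nan",
    "nds": "nds-de",
}

# Codes mentioned in the thread that Wiktionary does not recognise at all.
UNSUPPORTED = {"simple"}


def wiktionary_code(babel_code):
    """Return the Wiktionary code for a babel code, or None if unsupported.

    Codes not listed in either table are assumed to be shared and are
    returned unchanged.
    """
    if babel_code in UNSUPPORTED:
        return None
    return WIKIMEDIA_TO_WIKTIONARY.get(babel_code, babel_code)


for code in ("als", "zh-min-nan", "nds", "simple", "de"):
    print(code, "->", wiktionary_code(code))
```

A table like this would only help with categorisation if something could intercept the parser function's output, which, as noted above, may not be possible.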

Category structure documentation, review and correction

I have a few questions about categories.

  1. Where is the topical category structure and its current implementation documented? I can find lots of bits of obsolete documentation, predating modules, but it would take a research project to figure out how things are now working.
  2. Where is the rationale for the particular topical hierarchical structure explained? It seems to be the product of at most two minds, which minds I cannot read.
  3. What is the process for reviewing the adequacy of the structure and then implementing changes? None of our existing review pages seem appropriate based on their names.

Is it supposed to be a secret? DCDuring TALK 16:42, 20 September 2014 (UTC)

Automation of German verb conjugation

In May, I raised the automation of German verb conjugation in both BP and GP, but nobody offered any ideas. If nobody opposes, I shall start the automation soon. --kc_kennylau (talk) 08:12, 21 September 2014 (UTC)

I don't understand what you are proposing, since I thought the German conjugation templates are already automatic to a considerable extent. It won't harm to ping a couple of people: User:-sche, User:Matthias Buchmeier, User:Liliana-60, User:Longtrend, User: --Dan Polansky (talk) 10:56, 21 September 2014 (UTC)
@Dan Polansky: Then treat it as a reform. --kc_kennylau (talk) 13:02, 22 September 2014 (UTC)
It looks to me like the code in Module:de-conj is half-implemented. It handles some strong verbs but there are tons more. Is there not a better way of handling strong verbs than essentially listing each one? I've seen analyses of German strong verbs in terms of the seven classical Germanic strong-verb classes, and if this is fairly regular in the modern language, it might make more sense to do it this way. In classical Germanic languages, which class you're in is predictable to a large degree from the stem vowel. This may not completely apply in the modern language, meaning you sometimes will have to specify the class explicitly.
Also, the documentation in Template:de-conj-auto is totally confusing and needs to be rewritten and expanded, if you expect other people to figure out how to use it. Although the docs say to divide the verb into "prefix", "stem" and "ending", in reality it's not at all obvious how to separate stem and ending in the expected way. Why for example does finden separate into stem f- and ending -inden, or even more strangely, how would someone possibly figure out that in erlöschen the "ending" is -löschen and the stem is empty, while in plain löschen the ending is -en and the stem is lösch-? Benwing (talk) 02:45, 23 September 2014 (UTC)
Maybe Module:nl-verb can be used as a base to work from? —CodeCat 12:12, 24 September 2014 (UTC)
  • If it still requires parameters, then it's not automation. At best, it's parameters reduction. Users don't care about the behind-the-scenes voodoo generating the inflection tables that they see. Perhaps a less time-consuming path would be compiling a table that leverages existing template infrastructure using some of the online German inflection databases, creating a huge correlation table (split into several parts to take care of the memory consumption limits) so that the usage of {{de-conj-auto}} would require no parameters at all? It would be a simple pattern matching exercise. Just a suggestion. --Ivan Štambuk (talk) 00:04, 29 September 2014 (UTC)
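A rough sketch of the parameter-free lookup idea, assuming the table is split into shards by the verb's first letter to keep each part small. The shard layout, the field names ("past", "pp") and the function name are hypothetical; the actual parameters of {{de-conj-auto}} are not reproduced here, and only two well-known verbs are filled in.

```python
# Hypothetical shards keyed by first letter; each maps an infinitive to the
# principal parts a conjugation table would need.
SHARDS = {
    "f": {"finden": {"past": "fand", "pp": "gefunden"}},
    "m": {"machen": {"past": "machte", "pp": "gemacht"}},
}


def lookup(infinitive):
    """Return the stored principal parts for infinitive, or None if unknown."""
    if not infinitive:
        return None
    shard = SHARDS.get(infinitive[0].lower(), {})
    return shard.get(infinitive)


print(lookup("finden"))
print(lookup("machen"))
print(lookup("erlöschen"))  # not in this toy table
```

The editor would then type only the template name, and any verb missing from the table would fall back to whatever explicit parameters the template already accepts.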

Documenting in WT:CFI our treatment of certain typographic and code-point variants

Pursuant to Wiktionary:Votes/2011-06/Redirecting combining characters, whenever Unicode has included both a combining and a non-combining variant of a character, Wiktionary excludes the combining variant except as a redirect to the non-combining variant. I recently moved the documentation of this practice from the "conveying meaning" section of WT:CFI to the "spellings" section, and replaced the hand-wavy lament that the vote didn't explicitly specify text in WT:CFI to be changed with the text of the approved proposal. It got me wondering: should we also document our exclusion of other typographic and code-point variants: our exclusion of ligature variants like fisherwoman, and long-s spellings, and perhaps also our exclusion of The? And perhaps also our inclusion of variants like vp and dies Iouis? (We could add a section with a header like "typography and encoding" next to the "spelling" section, or wherever else is deemed most appropriate.) - -sche (discuss) 18:34, 24 September 2014 (UTC)

Names of letters of the English alphabet and their plurals

When searching to confirm the spelled-out names of the letters of the English alphabet, I found that there were considerable inconsistencies in the entries. I did not know where to expect to find them as I moved through the alphabet.

Some letter entries offer the singular and plural for the noun (with a plural). Others have no noun entry or only the singular spelling is offered. That singular spelling might also be in the <letter> entry or the <number> entry or both.

In any case there are sometimes alternative names.

For these names to be presented consistently it appears that the entries themselves need to be more consistent. GHibbs (talk) 05:20, 25 September 2014 (UTC)

Easier to find them by using the category: Category:en:Latin letter names. —Stephen (Talk) 10:14, 25 September 2014 (UTC)
Who the devil came up with these spellings, by the way? Why "aitch" and not "eitch"? Why "wye" and not "wai", for instance? Tharthan (talk) 18:43, 29 September 2014 (UTC)

you need to expand quotations!!

I was surprised by http://en.wiktionary.org/wiki/gunsel which stated "By misunderstanding of the 1929 Maltese Falcon quotation above". There is no quotation above. So I thought, well, clearly it got removed. I went through the history. I found the ACTUAL EDIT that added the quotation (http://en.wiktionary.org/w/index.php?title=gunsel&diff=7728039&oldid=7715648 ) and then I clicked on that version (after literally seeing it after a + in the diff, so I knew it was there). I got to this page http://en.wiktionary.org/w/index.php?title=gunsel&oldid=7728039 and STILL didn't see it. I clicked back, found what I was looking for, clicked forward, to read it on the page again, STILL couldn't see it. It took me 5 minutes to find that there is a hidden part of Wiktionary, that I NEVER knew about, hiding a lot of text, under an impossible to see superscript that has a similar styling to the IPA key! I never would have thought that has text. This makes me angry, as it means for HUNDREDS of wiktionary entries I've seen over the past year (or whatever length of time), I've been missing really valuable quotations people took the time to upload! For this reason I very strongly suggest changing this template or software so that if you do need to hide it, at least the first few characters (10, say) are shown. This would let you not have to include full quotations open, if you don't want them, yet make sure nobody misses this really valuable content that is added specifically for the entry.

As it is not at all standard template language (like the IPA key) I would actually advise including the full quotations without hiding them, regardless of length. Space is simply not at a premium in a wiktionary entry! The alternative is to change the software so that the full quotation is hidden, but the fact that it is custom content specific to that page is clear, from the first few words being visible. Thank you kindly!!! —This unsigned comment was added by ‎ (talkcontribs) at 08:13, 25 September 2014 (UTC).

This is the first time that I have heard of this kind of problem.
We would like to understand your problem a bit better so that we can address it.
What kind of device were you using? What browser (including version) were you using? Does it have Javascript enabled?
Did you see the text "quotations" where the "superscript that has a similar styling to the IPA key" was?
Were you logged in as a registered user?
I will attempt to get the attention of someone who can address the problem technically. DCDuring TALK 18:11, 25 September 2014 (UTC)
Thank you. Yes of course, I mean that I didn't notice the word "Quotations" has valuable content under it! There is no other important content that is 'hidden' in this way. So this is a UI suggestion. You are, in my humble suggestion, taking away from some of the valuable work people contribute by folding it up in this way. To answer your questions, I was just viewing it from a standard browser, it looks like this to me. http://imgur.com/qVfdaKI I realize this is 'as intended', and I realize that you can click Quotations. I think this should be changed so that the text is not hidden. There is no other hidden content on the page, nor any reason to hide it, but there are irrelevant links (like the IPA key, and the Edit links) that I am used to ignoring. Thanks so much! 09:32, 26 September 2014 (UTC)
@ What do you say to this sort of display instead? Is that more noticeable? The number would change and we could use a more distinctive bullet (like ‣, ❧, or ⦿) if that'd be better. — I.S.M.E.T.A. 14:59, 28 September 2014 (UTC)
That's pretty good, but IMO it would be better without the break. It is still true that we want to get as many definitions as possible on the screen at once within the limits of our entry structure. Giving up a display line for a problem that hasn't surfaced before seems unwarranted to me. DCDuring TALK 15:40, 28 September 2014 (UTC)
A small part of the quotation should be shown in a style that suggests it's a quotation, but the text should fade out along the bottom or towards the end so the user instantly perceives that there is a quote without having to explicitly tell the user "Hey, there's a quotation here just click on this arrow here! We hid it so we wouldn't have to fill this entry with as much stuff." I cannot think where to find an example of the "fading" style but it seems common enough online. I find the current collapsed quotations (and also hidden translation tables, and other collapsed content) quite awful from a user experience perspective, as they are very easy to miss (as demonstrated here), also the user has little idea what will happen when they click, and it requires a bunch of clicking on little arrows to open the whole page. This would at least be partly fixed with a style where part of the quotation (or table or whatever) acts as a suggestion that there is more, rather than a mystery triangle. Pengo (talk) 10:32, 29 September 2014 (UTC)
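To make the "show the first few characters (10, say)" suggestion from earlier in this thread concrete, here is a minimal sketch of the truncation step only. The function name and the sample text are hypothetical, and the real collapsing is done by site JavaScript and templates that this does not attempt to reproduce; the fading style proposed just above would be a matter of CSS on top of something like this.

```python
def quotation_preview(quotation, visible_chars=10):
    """Return the first visible_chars characters of a quotation as a teaser.

    Whitespace is collapsed first; an ellipsis marks that more text is hidden.
    Quotations short enough to fit are returned whole.
    """
    text = " ".join(quotation.split())
    if len(text) <= visible_chars:
        return text
    return text[:visible_chars].rstrip() + "…"


# Hypothetical sample text, not a real citation.
print(quotation_preview("An example quotation that is fairly long."))
print(quotation_preview("Short."))
```

Ten characters is the figure floated above; a real implementation would probably want a longer teaser and a cut at a word boundary.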

Royal Society of Chemistry - Wikimedian in Residence

Hi folks,

I've just started work as w:Wikimedian in Residence at the w:Royal Society of Chemistry. Over the coming year, I'll be working with RSC staff and members, to help them to improve the coverage of chemistry-related topics in Wikipedia and sister projects.

You can keep track of progress at w:en:Wikipedia:GLAM/Royal Society of Chemistry, or use my talk page here if you have any questions or suggestions, or requests for help with chemistry-related terms. Pigsonthewing (talk) 12:59, 25 September 2014 (UTC)

Disable automatic creation of redirects?

Because Wiktionary tends to avoid redirects, we generally don't want to leave redirects when we move pages. But only sysops can move pages without leaving a redirect behind, which is cumbersome, especially as ordinary users are not aware of this and will not mark the redirect for deletion afterwards. So I think that redirects should not be created when moving pages, or at least not by default. Ordinary users should also be given the option to disable the redirect. —CodeCat 16:57, 25 September 2014 (UTC)

Is our patrolling and filtering diligent enough to catch something being moved to a hard-to-find namespace or pagename? DCDuring TALK 18:29, 25 September 2014 (UTC)
Probably not. But would we find them even now? —CodeCat 19:33, 25 September 2014 (UTC)
At least a user would find it with a properly typed search. Also a link to the original name would. We could also periodically search the dump for redirects to implausible namespaces or implausible characters. Without the redirect the search might be harder and the remedy a bit more time-consuming. DCDuring TALK 19:40, 25 September 2014 (UTC)
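A sketch of the periodic check mentioned here, assuming the dump has already been reduced to (title, wikitext) pairs; reading the dump itself is left out. The list of "implausible" namespaces is a hypothetical starting point, and the function names are made up for the example.

```python
import re

# MediaWiki redirects start with "#REDIRECT [[Target]]" (case-insensitive).
REDIRECT = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)

# Hypothetical list of namespaces an entry would not plausibly be moved into.
IMPLAUSIBLE_NAMESPACES = {"User", "User talk", "Template", "Module", "MediaWiki"}


def redirect_target(wikitext):
    """Return the redirect target of a page, or None if it is not a redirect."""
    match = REDIRECT.match(wikitext)
    return match.group(1).strip() if match else None


def suspicious_redirects(pages):
    """Yield (title, target) for redirects pointing into implausible namespaces."""
    for title, wikitext in pages:
        target = redirect_target(wikitext)
        if target is None or ":" not in target:
            continue
        namespace = target.split(":", 1)[0].strip()
        if namespace in IMPLAUSIBLE_NAMESPACES:
            yield title, target


# Hypothetical sample pages.
sample = [
    ("foo", "#REDIRECT [[User:Example/foo]]"),
    ("bar", "#redirect [[baz]]"),
    ("qux", "==English==\n===Noun===\n# A thing."),
]
print(list(suspicious_redirects(sample)))
```

This only works while the redirect exists; if moves stopped leaving redirects, the same check would have to run against the move log instead.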

Category:English coordinates

I'm a bit confused by this category. Many of the terms listed don't actually match the description. For example "et" is not a coordinating conjunction in English, and "etcetera" is not understood as a combination of a coordinator and a head, but is treated by English speakers as a single indivisible set phrase. But even then, is the term "coordinates" even the most common term for this? We already have Category:English non-constituents, should these really go there? —CodeCat 20:22, 25 September 2014 (UTC)

It might be useful to make it a subcategory of English non-constituents. The description could be improved to allow for abbreviations that are synonyms of a combination of coordinator and head. I'm open to other suggested names.
I've often wondered why all MWEs are categorized into phrases when many are not, why we categorize things as interjections when many are not, why we use context tags to make topical categories. I suppose many of our categories are compromises among any contributors' senses of the logic of things, the limited availability of generally accepted terms, and the willingness to police them. DCDuring TALK 21:47, 25 September 2014 (UTC)
I have to say that I don't fully understand the meanings of many linguistic terms like these, and it doesn't help that people might have different ideas of their meanings, not just here but also in scholarly circles. I've been trying to make things fit a bit more, but it's not easy. I would greatly appreciate it if other editors could review the current structure, in particular the categories that now use {{poscatboiler}}. I'm trying to add more and more categories to that module/template, but it's hard when I don't even really know what the categories' names mean or what they're meant to contain. That's why I came here; I have no idea what a "coordinate" is, nor if it's a linguistically standard term, nor whether all the entries in the category belong there. —CodeCat 21:56, 25 September 2014 (UTC)
I don't know that there is any official term to characterize these expressions, but no PoS does them justice. To call them "Adverb" in the wastebasket use of that term hardly does them justice. Calling them Phrase is contrary to the grammatical definition of the word and our general effort to use grammatical PoS terms, correctly applied, for headers. Other dictionaries variously assign nice and no PoS, Adjective with an adverbial definition, Adverb, and Idiom.
I have edited the text, removed the one item, [[honoris causa]], that clearly didn't belong there, added one, and made the category a subcategory of English non-constituents. The category could be made more homogeneous by eliminating items that could not terminate lists of multiple conjuncts, but why? DCDuring TALK 22:52, 25 September 2014 (UTC)
In a sense, these terms feel a bit like prepositions. They are "incomplete" and need something extra to make a whole "thing". —CodeCat 22:55, 25 September 2014 (UTC)
And also like other non-constituents, but also like determiners, conjunctions; adjuncts, like adjectives and manner adverbs; copulas and transitive verbs. Everything short of a complete canonical sentence can be considered grammatically incomplete in some way. DCDuring TALK 00:18, 26 September 2014 (UTC)

Description for Category:Predicatives by language?

I'm struggling a bit with coming up with a description for the terms in this category. Right now, the descriptions of the individual categories just say "?". Can anyone come up with something better? —CodeCat 20:43, 25 September 2014 (UTC)

Maybe it would be easier if we used these words' actual parts of speech rather than making stuff up. —Aɴɢʀ (talk) 21:11, 25 September 2014 (UTC)
If parts of speech actually provided a complete category structure for our entries and exhausted the useful, generally accepted knowledge about a language, they would indeed suffice. Would that they did.
I always thought that the category structure in our software and most other user-oriented software did not compel hierarchical structure for good reason: poor correspondence with most folks' needs and perception of reality.
Supplementing PoS categories is essential, not only to overcome the poor match between PoSes and the actual nature of many of our entries, eg, Proverbs which aren't even proverbs, Phrases which aren't phrases, Interjections which aren't interjections, but also to reflect kernels of knowledge that can break our categories into more manageable units, especially those that reflect some actual specific knowledge about some linguistic class of entries. DCDuring TALK 22:08, 25 September 2014 (UTC)
That's why we now have Category:English terms by semantic function. This category deals with ways of categorising terms that are not strictly a matter of part of speech, but have some semantic element in them too. —CodeCat 22:19, 25 September 2014 (UTC)

Vote on CFI Misspelling Cleanup

Wiktionary:Votes/pl-2014-08/CFI Misspelling Cleanup is nearing its end. Could you please post your vote, even if it is "abstain"? It is a trivial vote, from my standpoint, but it would do with a couple of abstains so that it can be cleanly closed. --Dan Polansky (talk) 17:17, 26 September 2014 (UTC)

@Dan Polansky: Done. — I.S.M.E.T.A. 14:35, 28 September 2014 (UTC)

Some confusion with suffixes and the absence of many prefixes and (is it all) infixes.

1. Some confusion with suffixes.

In Wiktionary, sometimes suffixes have special characters like the Latin <-ālis>, but sometimes they do not, as in the Latin <abdominalis> (Wiktionary entry). The suffix entry does not accommodate the <-alis> version.

Sometimes the suffixes in Wiktionary have two components <-at-> and <-ive>, or even three components <-at-> <-i-> and <-on>. The <-ate> or other components may or may not be acknowledged in the entry. How <-ate> may become <-at-> to combine to form other suffixes is not mentioned, though the archaic <-at> is acknowledged.

The entry <-ation> takes you to the Latin <-ātiō> but there is no mention of the English <-atio> as in <ratio>.

2. Absence of many prefixes and (it may be all) infixes.

Clearly to include them all would be a gigantic effort. Perhaps some temporary rudimentary pages could be made. Prefixes are often used as infixes and identifying just the prefixes would be a valuable move forward.

GHibbs (talk) 08:06, 27 September 2014 (UTC)

I've added the appropriate long marks to abdominālis. It would be easier for us to find the things you're talking about if you linked to them using double square brackets [[like this]] rather than with greater than/less than signs. I don't think the -atio in ratio is really a suffix in English. —Aɴɢʀ (talk) 13:57, 29 September 2014 (UTC)
I see no good reason to declare -at- to be an English infix when we have -ate as an English suffix. This must be the product of the synchronic morphomania in our Etymology sections and overuse or misuse of {{confix}}. DCDuring TALK 15:04, 29 September 2014 (UTC)

Checking for invalid phonemes in Template:IPA

The template already checked for invalid characters before, but I've now added some functionality that lets you check for the validity of the phonemes themselves, according to which language is being used. This is done by listing all valid phonemes for the language in Module:IPA/data. If an entry contains invalid phonemes, it's listed in both Category:IPA pronunciations with invalid phonemes and a language-specific subcategory. I've done it for Dutch, and it seems to work quite well, but I don't know if it would be useful for every language. At least it's there if anyone wants to use it, and I hope it helps. —CodeCat 22:31, 27 September 2014 (UTC)
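For readers who want to see the idea concretely, here is a minimal sketch of a per-language phoneme check of this kind. It is written in Python rather than the Lua used on the wiki, the inventory is a tiny made-up sample (not the contents of Module:IPA/data), and the names VALID_PHONEMES and invalid_phonemes are hypothetical. It segments a transcription greedily, longest match first, and reports whatever characters it cannot match.

```python
import unicodedata

NONSYLLABIC_BELOW = "\u032f"  # combining inverted breve below
LENGTH_MARK = "\u02d0"        # IPA length mark

# Hypothetical sample inventory, for illustration only.
VALID_PHONEMES = {
    "nl": {"d", "t", "s", "k", "n", "ə", "a" + LENGTH_MARK,
           "œy" + NONSYLLABIC_BELOW},
}

# Stress marks, syllable breaks and spaces are skipped, not validated.
IGNORED = set("\u02c8\u02cc. ")

def invalid_phonemes(lang, transcription):
    """Return the characters that match no phoneme listed for lang."""
    inventory = {unicodedata.normalize("NFD", p) for p in VALID_PHONEMES[lang]}
    text = unicodedata.normalize("NFD", transcription.strip("/"))
    longest = max(len(p) for p in inventory)
    bad = []
    i = 0
    while i < len(text):
        if text[i] in IGNORED:
            i += 1
            continue
        # Try the longest candidate first so multi-character phonemes win.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in inventory:
                i += size
                break
        else:
            bad.append(text[i])
            i += 1
    return bad
```

With this sample inventory, a diphthong written with the nonsyllabic mark below the y passes, while the same diphthong written with a mark above the y (U+0311) is reported, because the two spellings are different character sequences.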

Hm, that could be useful for languages with small or closed inventories, like Latin, Old Norse and Esperanto. I notice it's currently highlighting the /œ/ in Duits, but the Dutch Wiktionary says that word does indeed have a /œ/ in both the north Netherlands (where nl.Wikt says it's /dœʏ̯ts/, /dʌʏ̯ts/) and Flanders, Brabant and Limburg (/dœːts/). - -sche (discuss) 02:02, 28 September 2014 (UTC)
It's because it got written with the nonsyllabic diacritic above the y, while most other entries write it below. —CodeCat 11:52, 28 September 2014 (UTC)
Can't this be made to work when transcribing phonetically with [ ]? --Vahag (talk) 09:33, 29 September 2014 (UTC)
  • The purpose of the IPA template is font support, it shouldn't decide whether the characters used in phonemic transcription are valid IPA characters or not. You can phonemically transcribe using whatever set of symbols you like. Even using e.g. Cyrillic characters as it is done for many Cyrillic-script based languages. Phonemic transcriptions are *not* pronunciations. (Which is why Wiktionary's usage of /ɹ/ instead of the /r/ that every single other English dictionary uses is so dumb.) Furthermore, the phonemic inventory of a language depends on the author making the analysis and varies for just about any single language except artificial ones, or those with dictatorial institutions "governing" them. The proper step to reduce inconsistencies is to forbid manual transcriptions altogether and make pronunciation-generating modules in Lua, even if it requires phonetic respelling to properly generate regional variants. --Ivan Štambuk (talk) 23:48, 28 September 2014 (UTC)
    I mostly agree with Ivan, but I don't think it would hurt to have it do some behind the scenes cleanup category stuff, even if it is experimental with a lot of false positives. The average user would not see hidden categories and would not be affected by this in any way. --WikiTiki89 08:23, 29 September 2014 (UTC)
    I disagree that {{IPA}} can be used with whatever set of symbols you like. Pronunciation information can be added using other transcription systems, as we do with {{enPR}}, but non-IPA systems shouldn't use the {{IPA}} template. I asked for a way of categorizing invalid IPA characters because I was tired of finding things like g ' : instead of ɡ ˈ ː in IPA transcriptions and wanted an easy way to find all the instances. (I sometimes regret making that request, though, because the number of characters considered invalid is greater than I expected, and the number of pages in Category:IPA pronunciations with invalid IPA characters is far greater than anyone can work through.) But I am very skeptical of the attempt to find invalid language-specific phonemes, not least because we often give narrow phonetic transliteration in addition to broad phonemic transliteration. If the template knows that /kʰ/ and /æ̃/ are not phonemes of English, won't it incorrectly tag IPA(key): /kæn/, [kʰæ̃n] as containing invalid phonemes? Or is it smart enough to look only inside slashes and not inside square brackets? Then there's the problem of languages with dialects (/æː/ is a valid phoneme of Ulster Irish but not Munster Irish, /ɑː/ is the opposite) and the problem of people not wanting to stick to the symbols listed in our pronunciation appendices (I get a lot of grief from other editors for trying to make English pronunciations comply with Appendix:English pronunciation). —Aɴɢʀ (talk) 19:18, 29 September 2014 (UTC)
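On the slashes-versus-brackets question, here is a minimal Python sketch of how a checker could restrict itself to phonemic transcriptions and leave narrow phonetic ones alone, plus a helper for the ASCII look-alikes mentioned above. This is only an illustration of the idea, not what {{IPA}} or its module actually does; phonemic_spans, fix_lookalikes and the LOOKALIKES table are hypothetical names.

```python
import re

# ASCII characters often typed in place of their IPA look-alikes:
# g -> ɡ (U+0261), ' -> ˈ (U+02C8), : -> ː (U+02D0).
LOOKALIKES = {"g": "\u0261", "'": "\u02c8", ":": "\u02d0"}

# Text between two slashes, with no brackets inside, is treated as phonemic.
PHONEMIC = re.compile(r"/([^/\[\]]+)/")

def phonemic_spans(pronunciation):
    """Return only the /.../ parts, ignoring anything in [...]."""
    return PHONEMIC.findall(pronunciation)

def fix_lookalikes(transcription):
    """Replace ASCII stand-ins with the proper IPA characters."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in transcription)
```

Given "/kæn/, [kʰæ̃n]", phonemic_spans returns only "kæn", so the aspirated and nasalised segments inside the square brackets would never reach a language-specific phoneme check. Dialect differences would still need separate inventories; the sketch does not address that.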
Why would we use /r/ for /ɹ/ when they're completely different phonemes? o_O Tharthan (talk) 11:39, 8 October 2014 (UTC)
@Tharthan: If you know what a phoneme is, then you should realize that in the context of English, that sentence is completely nonsensical. English only has one rhotic consonant phoneme, which is usually pronounced something close to [ɹ]. Whether you use /r/ or /ɹ/ to represent it makes no difference, but if you assume that you can only choose one of them, talking about them simultaneously as in "they're completely different phonemes" is completely nonsensical (or if you assume that you can choose both of them, then your sentence is plain wrong because they are the same phoneme). When choosing the representation of the phoneme, there are criteria to consider, such as how easy it is to input, how recognizable it is, how close to the actual phonetic realization it is, etc. How much weight we give each criterion is up to us. --WikiTiki89 11:58, 8 October 2014 (UTC)
@Wikitiki89: Some dialects of English (not in North America, but nevertheless they do exist) use /r/ or (more rarely) /ɾ/ where other dialects use /ɹ/. As such, we need to distinguish between /ɹ/ and /r/ so that those with an actual /r/ in their phonemic inventories don't get confused. Tharthan (talk) 16:46, 8 October 2014 (UTC)
I believe the dialects you refer to have [r] and [ɾ] as allophones of /ɹ/. --WikiTiki89 20:42, 8 October 2014 (UTC)
No one would be confused if we used the same symbol for the English r-sound that every single dictionary of the English language except Wiktionary uses, namely /r/. At worst we might have to distinguish between [ɹ] and [r] at the phonetic level (using the latter for, say, Scottish English), but never at the phonemic level. —Aɴɢʀ (talk) 17:21, 8 October 2014 (UTC)
The problem is that /r/ isn't the right IPA letter for the English "r" consonant in most dialects. As such, the idea of changing the correct and more-or-less unambiguous /ɹ/ to /r/ is ludicrous. What would be the purpose of making pronunciation transcription more ambiguous? Should we also write modern widespread British English dialectal glottal stops as if they were /t/s? Tharthan (talk) 00:37, 9 October 2014 (UTC)
The IPA is more flexible than you think it is. If Peter Ladefoged, Alfred C. Gimson, Kenyon and Knott, and John C. Wells are comfortable using /r/ to transcribe the English r-sound, we can be too. —Aɴɢʀ (talk) 05:51, 9 October 2014 (UTC)


Can someone please restore Template:policy to the revision from 7 May 2014? I think the color change is inappropriate. --Dan Polansky (talk) 18:45, 28 September 2014 (UTC)

Done, I agree with you. Additionally that template page itself shouldn't be modified without at least some discussion. --Neskaya sprecan? 23:01, 7 October 2014 (UTC)

Wiktionary URL shortcut

Hey guys, so it seems there's a redirect to EN Wikipedia at http://enwp.org/ (such that enwp.org/Foo redirects to en.wikipedia.org/wiki/Foo). I've used this many times and it's really useful, but I feel that it'd be great to extend it to EN Wiktionary.

The information at wikipedia:User talk:Tl-lomas/enwp.org indicates that I can use http://enwp.org/wikt:Foo to redirect to Wiktionary, but I quite feel that many people (myself included) surely must use Wiktionary enough to find a more direct URL useful. Furthermore, the user who created that script has not edited ENWP in four years and Wiktionary never - he isn't responding to any past talk page messages, and presumably won't be around to respond to feature requests. Thus I thought it'd be logical if someone set up an "enwt.org", but someone on IRC claimed that "enwikt.org" would be more logical based on current interwiki links.

Do you guys think it'd be worth it? 20:38, 28 September 2014 (UTC)
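To make the request concrete, here is a minimal Python sketch of the mapping such a shortcut would perform. It is not the script behind enwp.org, whose implementation is not described here; the WIKIS table, the expand function and the use of https are assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical prefix table: no prefix means Wikipedia, "wikt" means Wiktionary.
WIKIS = {
    "": "https://en.wikipedia.org/wiki/",
    "wikt": "https://en.wiktionary.org/wiki/",
}

def expand(path):
    """Turn a short path such as 'wikt:Foo' into a full wiki URL."""
    path = path.lstrip("/")
    prefix, sep, title = path.partition(":")
    if sep and prefix in WIKIS:
        base = WIKIS[prefix]
    else:
        # Unknown prefix (for example a namespace): keep the whole path.
        base, title = WIKIS[""], path
    # Wiki page titles use underscores where readers type spaces.
    return base + quote(title.replace(" ", "_"), safe=":/()")
```

A dedicated short domain for Wiktionary would amount to the same function with the Wiktionary base as the default instead of the Wikipedia one.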

I don't think so. I think most browsers permit custom address-bar searches now. For example, in Opera, I have it set up so that "d blah" finds blah on Wiktionary, and "k blah" finds blah on Wikipedia. Equinox 20:40, 28 September 2014 (UTC)
I don't think that's the point. I think the point is simply to have shortcut URLs, similar to URL shortening. I think this might be useful, but it certainly is not necessary. --WikiTiki89 20:44, 28 September 2014 (UTC)
TBH the main time I would personally find this useful is on my phone when I need to look up a word. Currently it's annoying enough that I had to resort to a - shudder - paper dictionary when reading Tale of Two Cities. I will see if my mobile browsers support what you mention. 20:52, 28 September 2014 (UTC)

<meridium> is it <merīdīum> as in <post meridium> page or <merīdium> as the main entry.

The Latin spelling of <meridium>. Should it be <merīdīum> as in the cross-references on the English <post meridium> and <ant meridium> pages, or <merīdium> as in the main Latin entry? GHibbs (talk) 07:49, 29 September 2014 (UTC)

It should be merīdiem. What the English entries had (until I just now corrected them) was merīdiēm. They never said "merīdīum". —Aɴɢʀ (talk) 09:55, 29 September 2014 (UTC)

The most common binomials in books

Below are the top 20 most common binomial names to be found in books, found via my original research using the Catalogue of Life and Google ngram data. I'm not sure what our policy is for scientific names, but these are the most commonly found ones, so it seems some care should be taken to give them complete entries with etymologies (which several already have, but also almost half are red links). Hope this is useful for editors.

  1. Homo sapiens
  2. Escherichia coli - E. coli
  3. Staphylococcus aureus (8 occurrences in wikt defs., 4 linked, 2 taxlinked) - Staphylococcus - staphylococcus - staph
  4. Candida albicans (4, 2 linked)
  5. Pseudomonas aeruginosa (7, 4 linked, 0) - Pseudomonas - pseudomonas
  6. Mycobacterium tuberculosis (5, 3 linked, 3) - Mycobacterium - mycobacterium
  7. Saccharomyces cerevisiae (8, 7 linked)
  8. Drosophila melanogaster (10, 7 linked)
  9. Zea mays (21, 16 linked)
  10. Bacillus subtilis (8, 6 linked)
  11. Haemophilus influenzae (2, 2 linked, 1) - Haemophilus - influenzae
  12. Pneumocystis carinii (2, 2 linked, 0) - Pneumocystis - pneumocystis
  13. Salmonella typhimurium (2, 1 linked, 0) - Salmonella - salmonella
  14. Treponema pallidum (4, 3 linked)
  15. Streptococcus pneumoniae (2, 2 linked, 0) - Streptococcus - streptococcus - strep
  16. Phaseolus vulgaris (20, 16 linked)
  17. Clostridium botulinum (5, 5 linked)
  18. Listeria monocytogenes (2, 2 linked, 0) - Listeria - listeria
  19. Klebsiella pneumoniae (6, 4 linked)
  20. Xenopus laevis - Xenopus (1, 1 linked)

Pengo (talk) 10:10, 29 September 2014 (UTC)

Thanks. We have only some of the generic names for the redlinks. Bacteria are definitely not well covered, partially because there are rarely vernacular names for them and they therefore aren't "requested" by use in an entry. DCDuring TALK 12:54, 29 September 2014 (UTC)
If you think about it, binomials would be more likely to show up on this list if they didn't have common names to compete with them in usage. Chuck Entz (talk) 13:29, 29 September 2014 (UTC)
Yes. And the strong interest in disease-causing organisms among researchers, clinicians, and the public accounts for 14 of the list members. I guess that a way of measuring the "demand" for these would be to add the organism name to the name of the disease caused in each language for which we have an entry for the disease and to request translations for English disease words. DCDuring TALK 14:32, 29 September 2014 (UTC)
And 5 on the list are model organisms that would appear in a vast number of scholarly publications, including the beer yeast. That leaves us with Homo sapiens. DCDuring TALK 14:43, 29 September 2014 (UTC)
A lot of the "generic names" for bacteria are just the species name written lower case. —Aɴɢʀ (talk) 14:49, 29 September 2014 (UTC)
Those are the really common ones.
w:Model organism and the pages linked at w:List of sequenced eukaryotic genomes#See also contain a good number of potential entries which would have similar usage. DCDuring TALK 14:56, 29 September 2014 (UTC)
When I get a chance, I might try making lists of vertebrates and plants found in the fiction corpus to try and get a less research-centric list. Pengo (talk) 23:04, 29 September 2014 (UTC)
I don't object to what you've done: I welcome it. I certainly don't consider the research bias a weakness, but it is a characteristic of the methodology. I don't hope for much from a fiction corpus.
In my discussion of your list I was trying to understand how your approach differed from the approaches I and others had been taking and specifically why our approaches missed a good number of the specific items in the top 20. (I haven't even looked at the longer list.)
The approaches that have been used, some only sporadically, are:
  1. Top-down filling in of the tree of life, adding hyponyms at each level. (This becomes quite unwieldy sometimes at the genus level, sometimes at the species or lower level. It also leads to a possible overemphasis on extinct taxa and on the proliferating population of clades not used outside of systematic taxonomy.)
  2. Bottom-up filling in of the tree of life, adding hypernyms. (The number of additions declines because so many lower taxa share hypernyms/ancestors.)
  3. Adding items of interest to the contributor, often by type of flora or fauna (eg, birds, spiders, types of mammals: felines, canines, murines, bovines, marine, etc) or based on national or local lists of flora and fauna (most notably Finland). (This is a good fit with our wikiness, but makes for very spotty coverage.)
  4. Adding lists of flora and fauna neglected in other sources (eg, liverworts) (Something of a dead end in this case.)
  5. Adding templates to any taxonomic names already in Wiktionary to determine the "demand" for taxonomic and lately English vernacular names and adding the most common ones. (Limited so far by a lack of automation of the template-adding process, which should perhaps be replaced by counting the number of occurrences in en.wikt's dump of taxa names occurring as headwords in en.wikt, WP, and Wikispecies.)
  6. Adding items from topical lists such as for endangered species, sequenced genomes. (Small numbers of items)
  7. Adding items from WP dab pages for English vernacular names. (Limited use so far, but could become systematic)
Other approaches not yet tried:
  1. Add L2 sections or definitions for all the vernacular names in all languages contained in Wikispecies
  2. Add entries or definitions from available downloadable databases, such as for viruses and birds.
  3. Follow the approach taken at Swedish WP: having webcrawlers gather material for a good stub for such articles.
I favor approach 5 for systematic additions at this stage, but practice 1 and 2 as part of that effort. I also indulge in 3 and 6. If we were to shift to automatic mass addition of entries, I would shift my efforts to making sure that we were linking to external databases as automatically as possible, reviewing such entries, and improving existing entry quality. DCDuring TALK 00:22, 30 September 2014 (UTC)
I'm not sure I understand your #5 point? What kind of templates? Do you mean taxlink templates? and how do they help measure "demand"?
The main thing I've focused on with my approach is ranking the data. I figure if we started at Aa achalensis and worked our way down through 1.5 million species then we'd take a long time to create entries for any of even the most common searches (I know that's not actually an approach you've listed, but it's the alternative I have in my head). My goal is to have definitions and etymologies for the most popular taxa, especially around the species level (genera, epithets, binomials), so that nature enthusiasts and students of biology can understand their meanings better and be less discouraged by the Latin terminology they encounter, perhaps even referring to Wiktionary one day when it comes to naming a new species. I've tried a few other approaches to ranking, such as using Wikipedia's hit counting (although I can't find much of the list I created from it except the short list here), and of course simply counting the most popular epithets in a big list of species. I'd like to try using Google Trends. I don't think Google's API allows doing 1.5 million queries, but perhaps it would be possible just to re-rank the 52,000 species found in books. This might be the best way to discover the scientific names which the broader population are actually searching for.
The other audience for my lists is Wiktionary's editors. I still have little proficiency with Latin. (The first time I posted a list of common epithets, I was actually surprised to discover, after seeing new entries created, that most of them were not terms specific to modern biology but were simply ordinary Latin words). So I largely rely on editors to do the heavy lifting of creating new entries. And that's something that has occurred to me again after reading your post: although I've been editing wikis for over a decade, I don't really have any idea what editors here would actually prefer to be editing, or what their process is, or what motivates editors to do what they do, or what kind of lists they'd like to see. I've basically just made lists and posted them, hoping that they'll list things worth including in Wiktionary and that editors might be interested in creating the entries, and fortunately it's generally worked out well. Although I have plenty of ideas for how to improve the lists or for how other lists could be made, I haven't gotten a lot of broad feedback on what editors would actually like to be editing or creating entries for, or how their process works, or what information would be most useful, whether editors would prefer to focus on one type of entry at a time (e.g. words ending in -ceps), or a bunch of things with a general theme? or how information should be grouped (would it be a big help if masculine, feminine, and neuter forms were listed together or not make much difference?), what trips up editors or slows them down? or what kind of decisions are editors making when looking through a list? —Pengo (talk) 11:19, 30 September 2014 (UTC)
@Pengo: re: 5 above. I have a little perl script that counts occurrences of {{taxlink}}. I will soon modify it to do the same with {{vern}}. I had originally thought that the categorization would be good for generating lists, but, as the list is of entries, not taxa, it isn't. As a result I run the script and add the most common items on the resulting list each time. A more ambitious approach would be to count the occurrences in Wiktionary of words in lists of specific epithets or of entire taxa. It is necessary to count unlinked terms because so many taxonomic names in entries are not linked, for reasons that can't reflect any user considerations. Probably the contributors disliked the redlinks and thought that taxon entries, especially for binomens and trinomens would never be created. This thought reflected expressed opinions of senior contributors. Many such entries don't even have links for the taxa to WP (in any language) or to Wikispecies.
By 'demand' I only mean use on Wiktionary, which reflects some kind of blend of how many languages have one or more vernacular names for the taxon and whether wiktionarians have any interest in either the taxa or the vernacular names. Over time, taxa appearing as Hyponyms or, especially, Hypernyms in Translingual sections have come to be well-represented despite being uncommon except in the literature of systematics.
The list of specific epithets used in the most species names, as useful as it is, does not well correspond to the list of those specific epithets actually used but missing on Wiktionary. Due to a lack of consensus about whether some specific epithets not occurring in Classical Latin were better treated as Latin or as Translingual I use {{epinew}} to link and categorize epithets by the language we choose for them. This is supposed to link to the lemma entry and display the actual term. It sorts the item by the lemma so it is easy to use Category:Species entry using missing Latin specific epithet to find missing specific epithets with multiple occurrences. Adding {{epinew}} to the existing species entries that don't have it is tedious.
I have thought it a little embarrassing for us to have so little knowledge about what users seek. It should be even more embarrassing that we are unwilling to characterize what we like to work on. Speaking for myself, I have liked:
  1. cleanup lists, both one-time lists unlikely to need to be recreated and those that are constantly renewed, often by user error or ignorance. Such lists can be long if the effort required per entry is modest.
  2. relatively short lists of items that IMO need a lot of work, so that I have the satisfaction of emptying them.
  3. individual requests, because I know someone is interested.
  4. variety in my areas of interest and self-perceived responsibility. I have browser tabs open to several lists in those areas (whether categories, search results, or user-created).
I have always been motivated to correct what I see as problems, so lists with such focus are particularly motivating. I now try harder to avoid areas of controversy, unless in an area I feel especially responsible for.
I expect that my preferences are not unique, but also not universal. DCDuring TALK 14:05, 30 September 2014 (UTC)
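The {{taxlink}} counting described above can be sketched in a few lines. This is a Python illustration, not the perl script itself, which is not shown in the discussion. It assumes the taxon name is the first positional parameter of the template, and it takes page texts as plain strings rather than reading a dump; count_taxlinks is a hypothetical name.

```python
import re
from collections import Counter

# Assumption: the taxon name is the first parameter, as in
# {{taxlink|Zea mays|species}}.
TAXLINK = re.compile(r"\{\{taxlink\|([^|{}]+)")

def count_taxlinks(pages):
    """Count how often each taxon is named in a {{taxlink}} call."""
    counts = Counter()
    for text in pages:
        for name in TAXLINK.findall(text):
            counts[name.strip()] += 1
    return counts
```

Calling most_common() on the result gives the taxa ranked by how often entries ask for them, which is the sense of "demand" used in this thread. Counting unlinked mentions, as suggested above, would need a list of taxon names to search for instead of a template pattern.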

There is no British English <-isation> though the US version <-ization> exists.

In Wiktionary there is no entry of the frequent British English <-isation> though the US version <-ization> exists. Both the entry and cross references are required. GHibbs (talk) 13:38, 29 September 2014 (UTC)

Huh? We have an entry -isation, and have had for quite some time. —Aɴɢʀ (talk) 13:48, 29 September 2014 (UTC)

requirements for getting rollback permission?

Special:ListGroupRights shows that the rollback permission can be granted to users without having to make them an admin. I searched the help files, but there is no mention of this practice. So I'm just wondering: what are the requirements of getting the rollback permission on Wiktionary? --Ixfd64 (talk) 19:35, 29 September 2014 (UTC)

I was going to apply for rollback and patrolling rights. You can be nominated by an admin on WT:WL but there's no procedure for nominating yourself (AFAIK). Renard Migrant (talk) 11:05, 3 October 2014 (UTC)
No, WT:WL is only for the autopatrolled flag. If you want rollback, you can just ask here. The only requirement is that you have to convince The Powers That Be that you can do the job well. For me personally, that means you already have the autopatrolled flag, you know WT:CFI and WT:ELE like the back of your hand, and there are no red flags of potential trouble (drama-queening, etc.). (Would be nice to get acquainted with the admins, too.)
User:Ixfd64, your contributions here have been rather sparse lately, and I have not really seen you around in the "(anti-)social" side of the project (so to speak), but otherwise I see no reason not to grant you these flags. Just ask. User:Renard Migrant, I think you know policies well, but I have mixed feelings about letting you deal with newbies given your, shall we say, brutal honesty. We already have a few too many arseholes in power here. Keφr 14:24, 7 October 2014 (UTC)
How is the rollback tool different from just clicking "undo" on an edit or series of edits and then saving the page in its reverted form? — I.S.M.E.T.A. 15:07, 7 October 2014 (UTC)
It only takes one click and automatically generates an edit summary. --WikiTiki89 16:17, 7 October 2014 (UTC)
WT:WL has been used for rollbacker nominations before. — Ungoliant (falai) 16:26, 7 October 2014 (UTC)
Yeah, I haven't been that active on Wiktionary in recent years. I used to regularly create wanted entries, but Wiktionary has since become pretty mature, and most of the wanted entries nowadays are foreign words that I'm not familiar with. So I spend much of the time doing RC patrols now. --Ixfd64 (talk) 17:21, 7 October 2014 (UTC)

Category tree

How does one create a new category these days? Do we have a page with instructions? Ƿidsiþ 07:31, 1 October 2014 (UTC)

The modules are not quite finished yet (at least not how I would like them to be) but I suppose I could write some documentation in the meantime. If you look on Module:category tree, there are various subpages for different parts of the tree. Some are modules containing code, while others are data modules where the categories themselves are specified. —CodeCat 12:35, 1 October 2014 (UTC)
So one hasn't been able to readily add a conforming category for how long now? DCDuring TALK 12:48, 1 October 2014 (UTC)
One can create a category the old-fashioned way and let others bring it into conformity later. DCDuring TALK 12:50, 1 October 2014 (UTC)

October 2014

What makes a single word idiomatic?

I think it would be nice if we took WT:CFI a bit more seriously. I mean, de facto there's no problem because nobody's forcing us to apply our own rules; there's no 'court of appeal' if there's a deletion decision that goes against WT:CFI. Anyway.

Under General rule:

"A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."

Under Idiomaticity:

"An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components."

So, all terms have to be idiomatic (well, it's a 'somewhat more formal guideline'. Viewed from that perspective, it does make it sound like attested and idiomatic aren't in the rules, they're just in the guidelines!), but in terms of CFI, it only gives guidelines on what idiomatic means for an expression. Given that all terms have to be idiomatic, what's the test for say, hat, or reenter?

I know it's hard work, but I just think it would be nice if we could take ourselves a bit more seriously. Renard Migrant (talk) 11:14, 4 October 2014 (UTC)

For hat it's obvious because its meaning cannot be easily derived from its phonemes /h/, /æ/, /t/, since phonemes do not have any meaning to convey. For reenter it's less obvious because its meaning can be derived from the meaning of re- and the meaning of enter, but we seem to have an (unwritten?) agreement here that everything written together without a space is eligible for an entry. That convention breaks down, however, for languages that are not usually written with spaces; and it has been controversial for polysynthetic languages that may write whole phrases like "he had had in his possession a bunchberry plant" as one word without spaces. For English, the only real ambiguity is in expressions that are written with spaces, because there is no unambiguous criterion to distinguish idiomatic ones from unidiomatic ones. Probably everyone agrees that hot dog is idiomatic and hot lightbulb isn't, but between those two extremes there's a continuum, not a clearly defined split. —Aɴɢʀ (talk) 12:53, 4 October 2014 (UTC)

Make Categories Show Where They're Defined

I would like to propose that the category templates be modified to show the name of the data (sub) module where the information for the category resides. This would make it easier to make changes, and also make it easier to figure out where a new category analogous to existing ones could be added.

Adding documentation pages to modules is helpful, but it still takes a bit of wandering the maze of modules and sub-modules and data sub-sub-modules to figure out where category information resides. This shouldn't be too hard, since the modules have to have this information at some point- it's just a matter of developing protocols for passing it back to the templates.

It might also be nice to give instructions on where to go to get changes made, but that may not even be settled yet. This is all part of a larger problem with our newer Lua-based architecture, which is that things are centralized in data modules and impossible for non-admins to access, but I'll leave that for a separate topic. Chuck Entz (talk) 16:43, 4 October 2014 (UTC)

All the categories have an "edit" button already, and it's been there for a few years maybe. You never noticed? —CodeCat 16:47, 4 October 2014 (UTC)
Why are you so surprised? It's not what one would expect from how many other things work. Human attention works that way. Given that the question of category documentation and editing has been asked before without answer, Chuck probably assumed that it must be a policy matter. DCDuring TALK 18:47, 4 October 2014 (UTC)
And, once the edit button is clicked on, then what? DCDuring TALK 18:51, 4 October 2014 (UTC)
I wrote three paragraphs. You never read them?
If you click on Edit for Category:English colloquialisms, you get:
  1. {{poscatboiler|en|colloquialisms}}. poscatboiler contains:
  2. {{#invoke:category tree|show|template=poscatboiler|code={{{1|}}}|label={{{2|}}}|sc={{{sc|}}}}}, so we go to that module.
  3. Module:category tree refers us to:
  4. Special:PrefixIndex/Module:category tree. The logical next step is:
  5. Module:category tree/poscatboiler. This refers us to:
  6. Module:category tree/poscatboiler/data, which refers us to:
  7. Special:PrefixIndex/Module:category tree/poscatboiler/data, which contains dozens of submodules. Fortunately, I've been working with categories long enough to spot:
  8. Module:category tree/poscatboiler/data/terms by usage as the most likely choice.
And there it indeed is. What I'm proposing is a line at Category:English colloquialisms that refers you to Module:category tree/poscatboiler/data/terms by usage without your having to go through all the steps above. I've worked a lot with categories, and I know something about templates and modules, and there are times when I have to look at several data sub-sub-modules before I can find where the configuration is for a given category. Sure- it's simple! Chuck Entz (talk) 18:58, 4 October 2014 (UTC)
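The eight-step lookup above could collapse into one step if each category label carried a pointer to its data submodule. Here is a minimal Python sketch of that idea. The `LABEL_TO_SUBMODULE` table and both function names are hypothetical illustrations, not part of Module:category tree, and the only entry is the pairing named in the post above.

```python
# Hypothetical index from a poscatboiler label to the data submodule defining it.
# Only the single pairing mentioned in the discussion is listed.
LABEL_TO_SUBMODULE = {
    "colloquialisms": "terms by usage",
}

DATA_ROOT = "Module:category tree/poscatboiler/data"


def defining_module(label):
    """Return the full title of the data module for a label, or None if unknown."""
    submodule = LABEL_TO_SUBMODULE.get(label)
    if submodule is None:
        return None
    return f"{DATA_ROOT}/{submodule}"


def category_notice(label):
    """Build a one-line pointer that a category page could display."""
    module = defining_module(label)
    if module is None:
        return "The defining module for this category is not indexed."
    return f"This category is defined at {module}"
```

In practice the modules themselves would have to expose this mapping, since they already hold the information; the sketch only shows how short the lookup becomes once they do.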
CodeCat was referring to the small edit button next to the text. You were referring to the edit tab at the top, which is the first place one would look to edit something other than a section. Someone introduced a non-standard positioning of the edit option and expected it to "of course" be noticed by anyone with half a brain. But that is simply not true: habits that are reinforced by thousands of successful repetitions are not easily overcome and cause attentional blindness to such things as small edit buttons in unexpected places. DCDuring TALK 19:11, 4 October 2014 (UTC)
Ah, that explains it! No, I never noticed it. I was wondering how she could have so completely missed my point. That feature does, indeed, make my proposal rather redundant- but it might still be useful for those who are trying to figure out how the categories work, but aren't going to be editing data modules. Perhaps a combination would be a good idea, such as "This category is defined at Module:category tree/poscatboiler/data/terms by usage" with the edit link at the end. Chuck Entz (talk)
That would be a bit too long to fit where the edit button currently is. Do you know where else it could be placed? —CodeCat 19:52, 4 October 2014 (UTC)
How is what happens after one clicks the edit link self-explanatory? Some kind of help (colored green?) to click on next to the edit button would both make the edit button more visible and afford an opportunity to explain further. DCDuring TALK 20:42, 4 October 2014 (UTC)
This is new to me too. Here are some ideas for making it more visible:
  1. Add a hidden category to the category pages, e.g. Category:Categories defined by Module:category tree/terms by usage. (And that category can then explain more in its description, and link more obviously to the module). Editors are more likely to have hidden categories showing, so may notice.
  2. Change the text to something more descriptive, such as "[Edit category definition]", and/or perhaps an even more wordy hover text, e.g. "Edit the module which defines this category's description, category parent, and category text."
  3. Add an item to the left nav under "Tools". (Though that would probably be even less noticed)
Also, pages like Module:category_tree/poscatboiler/data/terms_by_usage could really use some docs to say what is and isn't safe to edit, how to propose or add new categories, and how to test that your edits aren't going to break everything. (Especially as the [Edit] buttons encourage users to edit it). Even if you know Lua and something about Wiktionary, you still don't know what can be edited safely on that page.
Perhaps a whole other conversation, but the docs on each category page really should say (or link to) how a regular editor can add a page to that category, e.g. which template or group of templates are used in the article space to add the category tag and whether it needs additional parameters to cause it to be added, etc. Though that's a whole other conversation and perhaps a thankless task to document properly. Pengo (talk) 11:52, 5 October 2014 (UTC)
How about in an editnotice? --Yair rand (talk) 14:13, 5 October 2014 (UTC)


Considering that so much time has been wasted on rfv's/rfd's due to misspellings (especially in hyphenation) resulting from scannos, should we expand our criteria for inclusion page with notifications/warnings or something? Just a suggestion. Zeggazo (talk) 20:15, 4 October 2014 (UTC)

Category:Arabic definitive nouns???

First of all, shouldn't these be "definite nouns", not "definitive nouns"? Second of all, four of the five entries in this category are simply the definite equivalents of Arabic lemma nouns (which are always in the indefinite). The definition itself specifies this. The definite equivalents are formed simply by appending "al-" (or rather, the Arabic equivalent) to the noun. I thought there was a policy not to include such forms unless they have an idiomatic definition? I'm going to add {{delete}} tags soon but I want to make sure others don't disagree.

BTW the fifth of five entries is the word العَرَبِيَّة (al-ʿarabiyya), which has a special meaning ("the Arabic language"), separate from the word عَرَبِيَّة ("carriage" or "female Arab"), so it should be kept. Benwing (talk) 08:38, 5 October 2014 (UTC)

Perhaps nouns and proper nouns where ال (al-) is always used should still be categorised as "definite nouns"? It's useful for readers to know that a term is formed by al- + the stem. Not sure if ALL such terms should be redirected to terms without the definite article. --Anatoli T. (обсудить/вклад) 23:36, 5 October 2014 (UTC)
OK, so no one answered my question. For the ones that are simply the definite equivalents of existing lemmas, with no special meaning, should I delete them, or keep them and use something like {{definite of}}? I think we should delete, since otherwise we're setting a precedent for creating definite equivalents of every single noun out there, which is crazy, since they're all formed trivially in exactly the same fashion by just adding "al-" (actually ال (al-), in the Arabic script) onto the beginning of the noun. It would be comparable to creating entries for the car and the boat and the kumquat, etc. etc. Any objections to me deleting them? Benwing (talk) 10:31, 8 October 2014 (UTC)
Normally terms should be RFD'ed for deletion but since they are definitely just "definite article + noun" entries, yes, delete all, except العربية and اللغة العربية. If you don't have the rights to delete, I'll delete them for you. العربية and اللغة العربية should probably be RFD-ed or RFV-ed, not sure. --Anatoli T. (обсудить/вклад) 22:29, 8 October 2014 (UTC)
I would also keep الأمين as one of the names of Muhammad and also a given name after that. --WikiTiki89 22:49, 8 October 2014 (UTC)
Also, {{ar-proper noun}} should automatically add to Category:Arabic definite nouns. --WikiTiki89 22:52, 8 October 2014 (UTC)
Yes, keep الأمين. Agree about proper nouns as well. --Anatoli T. (обсудить/вклад) 22:59, 8 October 2014 (UTC)
Definite forms in Arabic are not written with a separating space, as far as I know, so they closely parallel the definite forms of the Scandinavian languages. Since we have separate entries for those (dag, dagen, dagar, dagarna), we should probably also have separate entries for the definite forms of Arabic nouns. —CodeCat 23:20, 8 October 2014 (UTC)
Arabic grammar doesn't consider definite articles part of the word. Exceptions are proper nouns. Also, monosyllabic prepositions consisting of one consonant and a short (unwritten) vowel are spelled together, but they are separate words, unless they are adverbs (debatable), e.g. بِسُرْعَة (bisurʿa) -quickly (lit.: "with speed"), preposition بِ (bi-) + سُرْعَة (surʿa) (speed), enclitic pronouns بَيْتِي (baytī) "my house", بَيْت (bayt) + my - "ي" (-ī). Scandinavian, Bulgarian/Macedonian, Albanian definite forms are also debatable but they should be considered separately. Korean particles and copulas are also written without a space but they are considered separate words. 도서관에 (doseogwane) "to the library" = 도서관 + 에. --Anatoli T. (обсудить/вклад) 23:43, 8 October 2014 (UTC)
Arabic, Hebrew, and Aramaic have a lot of clitics and we have a consensus generally not to include words with clitics. The definite article is arguably one of these clitics, although in Aramaic the definite form is actually the lemma form. However, we do seem to have a status quo of generally not including the definite forms for Arabic and Hebrew. --WikiTiki89 02:52, 9 October 2014 (UTC)

The Latin word com has no entry.

The Latin word com, a component of commodus does not have an entry. GHibbs (talk) 08:06, 6 October 2014 (UTC)

Is it ever a free-standing word? As a prefix we have com- (and con-, col-, cor-, and co-). DCDuring TALK 10:39, 6 October 2014 (UTC)
The free-standing word corresponding to com- is cum. —Aɴɢʀ (talk) 16:37, 6 October 2014 (UTC)

Transliterations for headword-line inflections

Previous discussion: Wiktionary:Beer parlour/2013/October#Transliterations for inflected forms in headwords?

This was discussed before a while ago, but didn't reach much of a conclusion. The question is how to deal with transliterations of inflected forms that are displayed in headwords. Module:headword, and by extension many of our current headword-line templates, do not support this at all. But for Arabic we've always displayed transliterations for inflected forms, and the templates therefore had to be custom-made to handle this.

I imagine it's best to have a single common behaviour for all languages. So the question is, should we include them for all languages, for none, or for some subset? And if only for some subset, then based on what criteria? —CodeCat 16:08, 7 October 2014 (UTC)

  • My 2p is on all. As the EN WT, our user base can be assumed to read English. If an entry is in a non-Latin script, we cannot assume that our users can read the headword, and as such, for the sake of usability (among other factors), we should provide transcriptions. ‑‑ Eiríkr Útlendi │ Tala við mig 17:26, 7 October 2014 (UTC)
I thought that our "ground rules" said that all non-Roman texts should (eventually) be transliterated - and that this could be by means of "pop-up" text if necessary or wanted. — Saltmarshαπάντηση 17:44, 7 October 2014 (UTC)
Transliterate all. --Vahag (talk) 18:42, 7 October 2014 (UTC)
Don't transliterate Russian inflected forms or some other languages having irregular pronunciations. It may also look quite messy if there are a lot of forms in the header. Arabic editors want to transliterate all, so be it. I don't object to Arabic transliterations. --Anatoli T. (обсудить/вклад) 22:36, 7 October 2014 (UTC)
I'm not sure I understand your reasoning. If I understand correctly that by "irregular pronunciation" you mean "pronunciation not fully predictable from spelling", then it seems to me that those cases are exactly the ones where a transliteration would be useful. Then again, we've already established that many editors here disagree with the practice of using pronunciation as a guide to transliteration in phonemic scripts such as Cyrillic. —CodeCat 22:49, 7 October 2014 (UTC)
I agree with Atitarev that we should transliterate inflected forms only for languages for which the transliteration is essential to understand the structure of the inflected form. For languages such as Arabic, for which transliterations could be considered superfluous when the words are fully vowelated, there is another consideration: It may be difficult for some readers to see the vowel diacritics, making the transliterations essential to these readers. For languages like Persian, for which we do not indicate vowels at all in the native script, transliterations are absolutely essential. --WikiTiki89 22:47, 7 October 2014 (UTC)
What about users who want to know what is written, but are not learned in reading it? Arabic looks like nonsensical squiggles to me, and without transliterations the forms might as well not be there at all. For Cyrillic or Greek the consideration is no different, except that I just happen to be able to read those scripts. But there will of course be many users that can't. —CodeCat 22:51, 7 October 2014 (UTC)
Someone who cannot read a language is unlikely to need to know how a word inflects. --WikiTiki89 00:04, 8 October 2014 (UTC)
@Wikitiki89: I guess I'm unlikely then? —CodeCat 00:33, 8 October 2014 (UTC)
Yes, you are one of the few. Keep in mind that our inflection tables usually do have transliterations. But if you are interested enough in Arabic, I suggest you learn the alphabet. Otherwise you would be comparable to someone wanting to learn chemistry without learning the chemical element symbols or someone wanting to learn calculus without learning mathematical notation. --WikiTiki89 11:30, 8 October 2014 (UTC)
  • Does adding a romanization to inflected forms harm the project in any way? It seems to me instead that it would add value. Perhaps I happened across the term რეჰანმა (rehanma) and simply wanted to know roughly how to read it, without any knowledge of the Mkhedruli script. Thankfully, this entry for an inflected form already includes a romanized spelling. Would you advocate for removing romanizations from inflected forms? If so, why? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)
@CodeCat, "many editors" doesn't mean there's a consensus. If you haven't noticed, there are a lot of languages with irregular pronunciations and transliterations (exceptions). There's no practice in published dictionaries to transliterate Russian or Greek, hence an in-house (Wiktionary) transliteration method is used. "narodnovo" and "narodnogo" are equally attestable transliterations of the genitive form of наро́дный (naródnyj) - наро́дного (naródnovo). Japanese and Korean exceptions are partially handled by smart modules (some Korean exceptions still need to be transliterated manually, such as 십육) but Russian is not, こんにちは is "konnichi wa", not "konnichi ha". Do I need to bring up that argument again? Hindi, Thai, Lao, Greek also have irregularities, which are reflected in standard or Wiktionary transliterations. Automatic transliteration would cause, e.g. ру́сского to appear as "rússkogo", which should be "rússkovo" (gen. of русский) --Anatoli T. (обсудить/вклад) 23:03, 7 October 2014 (UTC)
Cyrillic, Greek, Armenian, Georgian vs Hangeul, Arabic, Hebrew, Thai, Devanagari, etc. The former are considered "easy" by dictionary publishers, although Devanagari is very phonetic. Since dictionaries usually don't use transliterations for the former, we have this argument that those should reflect the spelling, letter-by-letter whereas the difficult ones use phonetic transliterations or transcriptions, mixture of literal and phonetic. You can learn about transliterations for complex scripts and see that they are full of exceptions, most are documented ("standard" or "scientific"). --Anatoli T. (обсудить/вклад) 23:13, 7 October 2014 (UTC)
  • Reading the above, I think it would be useful for us to be clear about transcription -- changing one script for another, such as “ру́сского” → “rússkogo” -- versus transliteration -- which would include phonetic considerations, such as “ру́сского” → “rússkovo”.
Anatoli, do you (or any others) have any objection to transliteration? ‑‑ Eiríkr Útlendi │ Tala við mig 23:29, 7 October 2014 (UTC)
@Eirikr: You seem to have gotten transcription and transliteration backwards. Transcriptions are phonetic while transliterations are (supposed to be) graphemic. --WikiTiki89 00:04, 8 October 2014 (UTC)
  • Fair enough, I may have gotten it backwards. But the point stands -- are we worried about orthographic fidelity, or phonetic? Or do we even want both? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)
@Eirikr:, have you read all of my posts above? Would you agree to transliterate こんにちは as "konnichi ha" and 십육 as "sibyuk"? Modern standard transliterations go far beyond just representing words simply letter-by-letter. They use a lot of phonetic considerations, call them transcriptions, if you wish but they are not. "rússkovo" is not 100% phonetic, only shows irregular pronunciation of "г", it's pronounced [ˈruskəvə] (the phonetic respelling is "ру́скава"). --Anatoli T. (обсудить/вклад) 23:37, 7 October 2014 (UTC)
BTW, fully automated Arabic transliteration will affect irregular Arabic words, such as إنْجِلِيزِيٌّ (ʾinjilīziyyun), which is pronounced the "Egyptian" way - "ʾingilīziyyun" and other loanwords and dialectal pronunciations. It's probably fine, just need to be aware of this. --Anatoli T. (обсудить/вклад) 23:46, 7 October 2014 (UTC)
Just to make sure, you realise that if we do have transliterations for inflections on headword lines, there will also be parameters on {{head}} to override any default ones? —CodeCat 23:49, 7 October 2014 (UTC)
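The override behaviour described here can be sketched as a simple fallback chain: a manual value wins, otherwise an automatic transliterator is tried, otherwise nothing is shown. The Python below is an illustration only. The function names, the `manual` parameter and the toy letter table are assumptions made for the example; it is not the actual interface of {{head}} or Module:headword.

```python
def headword_translit(form, manual=None, auto_translit=None):
    """Pick the transliteration to display for one inflected form.

    manual: an editor-supplied override, which always wins.
    auto_translit: an optional callable returning a string, or None on failure.
    """
    if manual:
        return manual
    if auto_translit is not None:
        return auto_translit(form)
    return None


# Toy letter-by-letter table covering only the letters of the example word.
TOY_TABLE = {"р": "r", "у": "u", "с": "s", "к": "k", "о": "o", "г": "g"}


def toy_auto(form):
    """Transliterate letter by letter; give up on any letter not in the table."""
    try:
        return "".join(TOY_TABLE[ch] for ch in form)
    except KeyError:
        return None  # unknown letter: report failure rather than guess
```

With this sketch, `headword_translit("русского", auto_translit=toy_auto)` yields the letter-by-letter "russkogo", while passing `manual="rússkovo"` yields the irregular form, which is the case raised in the posts above.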
I suspected there would and should be but the task is too big. All adjective-like nouns will be affected first (-ого, -его/-ёго genitive endings), all words where (Cyrillic) "е" is pronounced as "э" (the largest group of exceptions). --Anatoli T. (обсудить/вклад) 23:55, 7 October 2014 (UTC)
  • @Atitarev, Wikitiki89: I'm left unsure -- do you two oppose the addition of romanizations on inflected forms, or do you instead oppose an automated approach that might introduce errors? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)
I oppose the addition of romanizations on inflected forms for two reasons (for Russian) - 1. The irregular words will need to be transliterated manually or might introduce errors. 2. The headwords get cluttered. (genitive sg., nom. plural, feminine form - are the possible inflected forms for Russian). It doesn't have to be for all languages like that. --Anatoli T. (обсудить/вклад) 05:34, 8 October 2014 (UTC)
  • Your mention of "clutter" led me to look into Russian entry format. Here's a sample headword line from the entry for русский:

ру́сский (rússkij) m anim, m inan (genitive русского, nominative plural ру́сские, feminine ру́сская)

This looks like a bit of a mess to me; all of the additional headword information for inflected forms is already given, as expected, in an Inflected forms table contained within the entry.
Redundancy aside, I think русский (russkij) is already fine -- there's a romanization of the headword, and the Inflected forms table provides romanizations of all other forms.
My current understanding of general policy, and this proposal, is that we want to make sure that all entries in non-Latin scripts include romanizations. So I'm really not worried so much about the lack of romanization for the link to русская (russkaja) in the headword line for the русский (russkij) entry. (For that matter, I think the headword line should be simplified to remove the redundant and visually cluttered inflected forms, but that might just be me.) I'm more concerned about whether there is any romanization given in the actual entries for inflected forms. Gladly, русская (russkaja) does provide a romanization.
Would you be amenable to ensuring that all entries have romanizations? ‑‑ Eiríkr Útlendi │ Tala við mig 07:11, 8 October 2014 (UTC)
I'm going to add my 2 cents to transliterating all inflections in all languages, but I think it's most important for languages like Persian and Arabic where vowels may not be written, and is important for Arabic even when vowels are written because of the difficulty that the average user will have in reading the script. So far it looks like Anatoli is opposed to transliterating inflections for Russian but not Arabic, Wikitiki might be similar, and everyone else is OK with transliterating inflections in all languages. Is this right?
I do think it's possible to make an argument that there's something qualitatively different and more "foreign" about Arabic or Devanagari or Thai vs. Greek or Cyrillic. Certainly this is the case for me. However, keep in mind, Anatoli, that you're a native Russian speaker whereas the majority of users of the English Wiktionary will not be, and might well be trying to learn a foreign language and so care about the inflections, but not be very comfortable with the script.
BTW as for the clutter issue, the same "issue" should theoretically appear in Arabic, but IMO the previous way of doing things (before CodeCat changed it), which did display transliteration of all Arabic inflections, didn't look especially cluttered. The trick here I think is to put the inflections outside of the parens, so that you don't end up with nested parens when you display the transliterations. Benwing (talk) 08:20, 8 October 2014 (UTC)
I agree that we put too much information on the inflection lines of Russian nouns. There is absolutely no need for the genitive or plural in the headword line, unless the form is irregular. The feminine form is useful, however. If the argument is about showing the stress pattern, then the genitive is needed only for nouns ending in a consonant (or ь). But I still don't see why the declension table isn't enough for this. --WikiTiki89 11:30, 8 October 2014 (UTC)
Just to clarify my position on Russian headwords. I don't oppose the information (it's helpful, can help quickly identify stress patterns and declension types and plural forms) but I don't think it's a good idea to transliterate inflected forms. --Anatoli T. (обсудить/вклад) 00:33, 9 October 2014 (UTC)
The genitive only helps identify the stress pattern for nouns that end in a consonant, and only the singular stress pattern at that. It is completely useless for nouns that end in consonants, as the singular stress pattern is apparent from the nominative, except for nouns ending in -а, which may need the accusative (but certainly not the genitive). The nominative plural is insufficient to identify the plural stress pattern. You additionally need one other plural form other than the plural genitive and also the plural genitive in some cases. At that point, there is too much information in the headword line and we already have declension tables with all of this information. --WikiTiki89 03:02, 9 October 2014 (UTC)
I disagree (please review your post, you have two contradicting statements - the first two sentences, so I don't know what you mean there). There are 6 stress patterns: Appendix:Russian stress patterns - nouns + some nouns that are irregular.
Consonantal endings:
  1. до́ктор - до́ктора - доктора́
  2. ди́ктор - ди́ктора - ди́кторы
Ь or "hissing" sounds:
  1. ле́карь - ле́каря - ле́кари (stress pattern 3 is also acceptable)
  2. сле́сарь - сле́саря - сле́сари/слесаря́ (то́карь is the same)
  3. глуха́рь - глухаря́ - глухари́
  4. врач - врача́ - врачи́
  5. това́рищ - това́рища - това́рищи
Do I need examples for vowel endings? For people mastering the basics of Russian, including native speakers, this info is usually sufficient without looking at the full declension table. --Anatoli T. (обсудить/вклад) 03:30, 9 October 2014 (UTC)
Maybe you misunderstood my post. For nouns that end in consonants (including ь), I agree that the genitive singular helps determine the stress pattern for the singular. For nouns that end in vowels, the genitive singular is of no help at all, since the stress is always in the same place as in the nominative singular. Furthermore, for nouns that end in -а, the accusative might have a different stress from the nominative, yet for some reason we do not include it. For the plural, the nominative plural is insufficient to determine the full plural stress pattern. More information is needed as I explained above, and that would completely overwhelm the headword line and defeat the purpose of having inflection tables. --WikiTiki89 03:43, 9 October 2014 (UTC)
-а nouns are only one portion of nouns, large but not huge. You still need to know that plural and gen. sg for ка́ша is ка́ши, not ка́шы (beginner level) and томоды́ is a form of томода́. Animacy helps determine the accusative. Well, yes, it's not comprehensive but sufficient in MOST cases. Apart from stress patterns, there are other things - колесо́ -колеса́ - колёса, огонёк - огонька́ - огоньки́, и́мя - и́мени - имена́. Knowing that "-а" nouns (NOT ALL VOWELS, just "а"!) are predictable is a blessing but there are too many other declension and stress patterns. I want to reiterate that gen. sg. and pl. nom. forms are sufficient to determine THE FULL STRESS PATTERN (usually). --Anatoli T. (обсудить/вклад) 04:38, 9 October 2014 (UTC)
Someone who does not know the rules for ы vs и will probably need the full declension table anyway to figure anything out. Can you give me an example of a noun that ends in a vowel (not including ь or й) whose stress pattern for the singular cannot be determined from the nominative? (I don't believe there are such nouns, but if you can prove me wrong, go ahead.) Note that I am all for including the genitive for nouns ending in consonants. As for the plural, the "usually" part is exactly my point. If there are exceptions, then you can't say that the full stress pattern can be "determined", but only "guessed". I noticed other Russian dictionaries tend to include the genitive and/or the dative for the plural in cases where there could be confusion. But the more we include, the more we get back to the question of why isn't the declension table enough? --WikiTiki89 05:26, 9 October 2014 (UTC)
Haven't I already, with колесо, имя, голова, борода (unlike a simple one like женщина with stress pattern 1)? What about о́блако - о́блака - облака́ ? --Anatoli T. (обсудить/вклад) 05:37, 9 October 2014 (UTC)
Perhaps you should re-read which forms I am referring to: nominative singular (колесо́, голова́, борода́) and genitive singular (колеса́, головы́, бороды́). Although you did remind me that the n-stems such as имя are possible exceptions; we should definitely include the genitives for them. --WikiTiki89 05:49, 9 October 2014 (UTC)
And here's a good one for you: with "-а": голова́ - головы́ - го́ловы, борода́ - бороды́ - бо́роды. So it's not absolutely useless, even for this type of nouns. :) --Anatoli T. (обсудить/вклад) 05:01, 9 October 2014 (UTC)
Umm... Yes it is useless. Unless you're blind, you can see that the genitive singulars you just gave have the same stress as their corresponding nominative singulars. --WikiTiki89 05:26, 9 October 2014 (UTC)
Hmm, what?! Have you read it carefully? голова is not like most nouns ending in "-а" and stress patterns can be determined not just by genitive sg but gen. sg + nom. pl in combination! See the table again. It's pattern 6, not 1, example given: полоса́ (same pattern as голова and борода). --Anatoli T. (обсудить/вклад) 05:37, 9 October 2014 (UTC)
Perhaps you should re-read which forms I am referring to. My point is that in these cases, if you have the nominative singular and the nominative plural, then the genitive singular adds no new information (since the singular pattern is determined from the nominative singular and the plural pattern has nothing to do with the genitive singular). --WikiTiki89 05:49, 9 October 2014 (UTC)
Displaying genitive sg just shows that it's "as expected", treating vowel and consonant endings differently doesn't make much sense. --Anatoli T. (обсудить/вклад) 05:58, 9 October 2014 (UTC)
Then instead of treating the vowel and consonants differently, let's use this simple rule: if the stress in the genitive is in a different place from the nominative (or if the stem itself is different, such as for день/дня or имя/имени) then we include the genitive, otherwise it is "as expected" and we exclude it to avoid clutter. If the user is still unsure, then they can check the declension table. --WikiTiki89 06:07, 9 October 2014 (UTC)
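The rule proposed here is mechanical enough to sketch in code: show the genitive only when its stress falls on a different vowel than in the nominative, or when the stem changes. The Python below is an illustrative sketch with hypothetical function names. It assumes stress is marked with a combining acute accent (U+0301), treats "ё" as stressed, and leaves the stem comparison to the caller as a flag rather than attempting to detect it.

```python
ACUTE = "\u0301"  # combining acute accent used to mark stress
VOWELS = set("аеёиоуыэюя")


def stressed_vowel_index(word):
    """Return the position of the stressed vowel, counting vowels from 0.

    Returns None if no stress is marked and the word has more than one vowel.
    """
    index = -1
    stressed = None
    for ch in word.lower():
        if ch in VOWELS:
            index += 1
            if ch == "ё" and stressed is None:
                stressed = index
        elif ch == ACUTE and index >= 0:
            stressed = index  # the accent follows the vowel it marks
    if stressed is None and index == 0:
        stressed = 0  # a one-vowel word can only be stressed on that vowel
    return stressed


def show_genitive(nominative, genitive, stem_differs=False):
    """Apply the proposed rule: include the genitive only if it is informative."""
    if stem_differs:
        return True
    return stressed_vowel_index(nominative) != stressed_vowel_index(genitive)
```

Under this sketch до́ктор/до́ктора and голова́/головы́ would omit the genitive, врач/врача́ would show it, and pairs such as день/дня or имя/имени would need `stem_differs=True`. If neither form carries a stress mark, the function treats them as equal and omits the genitive.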
The modules are complicated as is. I don't see the need to change the Russian noun headword. The Russian headword style was discussed and agreed on a while ago. Even if genitive is hardly the crucial case, it's an example of a case and shows how nouns may change. --Anatoli T. (обсудить/вклад) 01:32, 10 October 2014 (UTC)
Who exactly "agreed" on this, just you and CodeCat? I don't think there is anything wrong with using the genitive as opposed to another case, I just don't think we need to include it for every word. --WikiTiki89 11:21, 10 October 2014 (UTC)
Right, I too favour not including inflected forms in Russian headword lines, but practices for Russian are usually determined by a minority here. Refer to the transliteration debate. --Vahag (talk) 12:46, 10 October 2014 (UTC)
If transliteration for the headwords is chosen I'd favour removing inflected forms from the Russian headword altogether. That way, there won't be any additional reasons for arguments or newly introduced discrepancies with the existing transliteration practice. @Wiki, having genitive in some terms and not the others will be confusing. Also, if you don't like something, don't do it. You're under no obligation to edit in Russian and genitive sg. and plural forms are optional. I've manually added genitives and plurals on many entries, CodeCat did it with a bot and did the headword changes, no conspiracy here. @Vahagn, you can direct your anger at all other languages where transliteration is not 100% graphemic. Transliterating English into Armenian or Russian graphically wouldn't be very useful, would it?--Anatoli T. (обсудить/вклад) 13:22, 10 October 2014 (UTC)
The question isn't about whether the transliteration is graphic, but whether it represents the written expression of the word rather than the spoken one. For example, a reasonable Cyrillization of English that aims to represent the written language would transliterate colonel as колонел rather than as кёрнел, but bite would still be байт rather than the silly бите. --WikiTiki89 14:10, 10 October 2014 (UTC)

Consensus on transliteration of headword inflections?

Irrespective of the question of how much info to include in Russian headwords, can I propose a consensus around the following?
  1. For Cyrillic (and maybe also Greek), don't include transliterations of inflections in headword lines.
  2. For other non-Latin scripts, do so. This info comes either from an explicitly given transliteration or, failing that, from auto-transliteration when it is available and is able to succeed.
My preference would be to transliterate all inflections, but I can accept this compromise for the purpose of consensus. The logic here might be something like this: Cyrillic and Greek are similar enough to Latin script, and easy enough to learn, that there's a reasonable likelihood that someone interested in the inflections of a foreign word has a decent command of these scripts, whereas other scripts are generally much harder to learn and especially to master fluently to the point where a transliteration isn't helpful. This is certainly my experience: I've learned Arabic script and tried to learn Thai script and Devanagari, and my experience with all of these is that it takes a lot more work to become comfortable reading these fluently than it does with Cyrillic or Greek, both of which I learned easily. Even after a lot of work with Arabic I still sometimes stumble over the letters, and find the transliteration very helpful. An additional consideration for Arabic script is that some of the vowels are typically omitted, making transliteration essential. Even when vowels are present, they're often hard to read properly because of font considerations (the vowels are displayed above or below the letters and frequently get drawn over letter descenders or other diacritics, or sometimes a vowel below the line can be confused with a vowel above the next line below). Benwing (talk) 04:19, 9 October 2014 (UTC)
I have already expressed my opinion. Yes, splitting "easy" and "complex" scripts sounds reasonable to me. I have to ask about Korean inflected forms (verbs and adjectives). @Wyang:, what do you think, do we need to transliterate Korean inflected forms in the headword? Vahagn wants Armenian (and probably Georgian) to be fully transliterated. --Anatoli T. (обсудить/вклад) 04:38, 9 October 2014 (UTC)
I think the idea of compulsorily applying headwords to all languages is silly, and a lot of languages would be much better off without it, including the non-inflecting languages and some agglutinative languages. I think the headword is being overused in two aspects: 1) pronunciation; 2) inflection. For Korean, the inflection information in the headword more properly belongs in the conjugation section, and it can be moved to the top of the conjugation table as another table (identifying the key forms) alongside the stems table. The romanisation in the headword is redundant and should be removed. There is then no need for information or parameter duplication as in the cases of 십육 (rv=) and 아름답다 (irreg=y). In the division of "easy" and "complex" scripts, Korean would definitely be classified as an "easy" script, especially according to the Hangul supremacists. It's also called "morning script", as "a wise man can acquaint himself with them before the morning is over; a stupid man can learn them in the space of ten days". Wyang (talk) 22:35, 9 October 2014 (UTC)
This isn't a question of whether to have info in headwords but whether to transliterate them. I personally see Korean as a bunch of random squiggles, so for me it's not that easy. I have also heard that romanization of Korean involves various considerations beyond mere transliteration, i.e. the transcription shows various sorts of assimilations. I think one problem here is that people are thinking in terms of their own expert knowledge rather than the likely audience, which is someone who is a native English speaker and foreign language learner who may not have much experience with a foreign script. Benwing (talk) 00:19, 10 October 2014 (UTC)
I also used to look at Korean and Arabic as a bunch of squiggles, until I started learning these languages. Changes in the Korean transliteration make perfect sense when its phonology is understood. And learning a foreign script without learning a bit of a language using it doesn't make much sense. So, learning a script in a day or in a few days is applicable to people speaking that language. Arabic was somewhat easier for me (with good fonts only) and I still think Arabic script is easier and would be quite easy if vowel points were always written (I'm not suggesting it should). I think some info in the Korean headword is useful but for me the important bits are not those currently appearing there. --Anatoli T. (обсудить/вклад) 01:20, 10 October 2014 (UTC)
OK, consensus appears to be:
  • No translit for Cyrillic, Greek or Korean scripts.
  • Yes for others.
  • @CodeCat:, can you implement that? We can always add additional exceptions later if needed. Benwing (talk) 04:11, 10 October 2014 (UTC)
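The rule as summarised in the list above could be sketched roughly as follows. This is only an illustration in Python, not the actual Module:headword code (which is written in Lua); the function name and the use of ISO 15924 script codes (Cyrl, Grek, Hang, Kore, Latn) are assumptions made for the example.

```python
# Hypothetical sketch of the proposed rule; not the real Module:headword logic.
# Script codes follow ISO 15924; "Hang" and "Kore" both cover Korean writing.
NO_TRANSLIT_SCRIPTS = {"Cyrl", "Grek", "Hang", "Kore"}


def should_transliterate_inflection(script_code: str) -> bool:
    """Return True if an inflected form in this script gets a transliteration."""
    if script_code == "Latn":
        return False  # Latin-script forms never need romanization
    return script_code not in NO_TRANSLIT_SCRIPTS


for code in ("Arab", "Cyrl", "Deva", "Grek", "Kore"):
    print(code, should_transliterate_inflection(code))
```

Under this sketch, adding a further exception later would just mean adding another script code to the set.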
Sorry guys, wrench-thrower here --
What constitutes a "simple script"? Who decides what is "simple"?
Again, I must note that, as the English Wiktionary, the only safe assumption we can make when it comes to scripts is that our user base can read the Latin script. I reiterate my position that we should provide romanizations for all headwords not written in the Latin script.
One argument against including romanizations for certain non-Latin scripts seems to be that the scripts are "simple". Sure, any script (or anything at all, really) can be viewed as simple, once you've already learned it. Many other scripts are also pretty straightforward, with charts providing straightforward phonetic conversions. Are we to no longer provide romanizations for Mkhedruli? Gothic? Amharic?
An undercurrent appears to be that we shouldn't include romanizations because doing so would be difficult. That said, this whole project of creating a multilingual dictionary is itself an enormous amount of work. Is such a relatively small amount of additional work really so much of a hurdle? Romanizations are a very simple way to greatly increase the usability of Wiktionary as a whole.
As with everything here, those who don't want to do the work don't have to. But as far as policy or goals are concerned, I feel very strongly that deciding to not include romanizations for non-Latin-script headwords does us, as a project, a grave disservice. ‑‑ Eiríkr Útlendi │ Tala við mig 04:55, 10 October 2014 (UTC)
@Eirikr:, a few points.
  1. This issue concerns only the inflected forms in headwords. The headword itself is always transliterated, as are links.
  2. I agree with you. I would rather see transliterations (transcriptions or romanizations, more correctly) of inflections for all non-Latin scripts.
  3. I don't think it's an issue of how difficult it is but rather that some people seem to think it's "cluttering" the display.
  4. My main concern for the moment is to find some workable compromise so that CodeCat is willing to put back auto-transliteration of Arabic inflections in headwords; I'd do that myself but I don't have permission to edit Module:headword. (Can I request such permission on a page-by-page basis or do I have to become an admin?)
Here's another possible compromise:
  1. For scripts where there's no objection to transliterating inflections in headwords, we go ahead and put the transliteration there after the native-script inflected form, whether it's explicitly given or auto-transliterated. Let's say this will currently apply to all scripts except for Cyrillic and Korean, maybe Greek as well.
  2. For scripts where people think doing this will "clutter" the headword line, include the transliteration in a mouse-over -- I think this is feasible. (It could be said that we should use mouse-over for all scripts, but I'd rather have the transliteration directly visible whenever possible -- it is faster to read that way, and users might not realize that the transliteration is present on mouse-over.) Benwing (talk) 12:15, 10 October 2014 (UTC)
I've added a temporary exception to Module:headword so that Arabic inflections are always transliterated. This will hopefully alleviate your immediate concerns, but I do hope that you'll continue to participate in the wider discussion. —CodeCat 13:00, 10 October 2014 (UTC)
Thanks, and I will stay in the discussion. I wish more people would contribute; it's hard to form a consensus when only a small number of people speak up. Benwing (talk) 13:16, 10 October 2014 (UTC)
I realised I haven't stated my own opinion. I mostly follow Eirikr's reasoning, and think that transliterations should accompany all non-Latin-script terms in some form, wherever they are. Exceptions can be made in cases where terms generally appear paired with Latin-script alternatives, such as in Serbo-Croatian. —CodeCat 13:18, 10 October 2014 (UTC)
  • I support transliteration of all forms listed in the headword line in all scripts other than Latin, preferably automatically generated, even if this means certain Russian forms will appear to end in -ogo instead of -ovo. Some people might say that's easy for me to say, since the only non-Latin-script language I spend much time on is Burmese, and Burmese doesn't have inflections. Nevertheless, I think it's preferable to transliterate them all rather than to try to decide which scripts are "simple" enough that they don't need it. —Aɴɢʀ (talk) 13:53, 10 October 2014 (UTC)
    • It reminds me a bit of a debate we had some time ago, considering whether languages were "well known" and "major" enough to not be linked in translation tables and in {{etyl}}. Eventually we gave up on the debate and just made translations never link, and {{etyl}} always link. —CodeCat 14:05, 10 October 2014 (UTC)
One thing we seem to be forgetting here: why are the inflections included in the headword line in the first place? They're included for those who know the rules of the language to figure out the inflection without looking through the tables. In other words, they're a shorthand for people who mostly don't need transliterations. For someone who sees the letters as scribbles, an inflected form is most likely just decoration, anyway- whether it's transliterated or not. That means that this isn't a matter of substance (with a few exceptions such as Arabic), but of style. Chuck Entz (talk) 16:58, 10 October 2014 (UTC)
But many languages don't have tables, so we include the forms on the headword line. And even in cases where there are tables, the forms we include on the headword line are sometimes not in those tables. —CodeCat 17:00, 10 October 2014 (UTC)
Certainly for Arabic, this is exactly correct. The inflections list basic and very important things, like feminines and plurals for nouns and adjectives. For nouns and adjectives we don't currently have any inflection tables. There are other languages that are similar. I took a look at other non-Latin-script languages with inflections, and I can find only Russian and Georgian for nouns, and they also list basic things like the plural (and in the case of Russian, the genitive singular). I can easily imagine a situation where a learner has some concept of grammar -- doesn't take much to want to know how to form the plural -- but a shaky grasp on the native script. Benwing (talk) 23:29, 10 October 2014 (UTC)
For Russian, the genitive and plural forms are also in the tables. But for adjectives, there's the comparative forms, which are not in any table. For verbs, the imperfective and perfective counterparts are not in the table either. —CodeCat 23:47, 10 October 2014 (UTC)
The world of language learners is not neatly divided into those who can read the script and those who can't read the script. If push comes to shove, I can read Sanskrit in Devanagari, but I'd rather read it in transliteration because it's easier. I don't know if our Sanskrit headword lines currently include principal parts or not (our coverage of Sanskrit is not great), but if they did, I would want to have translits on each form listed. Devanagari is not just scribbles for me, but it does take me about 10 times longer to read than transliteration. —Aɴɢʀ (talk) 08:28, 11 October 2014 (UTC)


OK, a majority seems to want to see translit of inflections in all languages. This consists of (at least) me, CodeCat, Angr, Eirikr, Vahag, perhaps also Saltmarsh. A minority seems to either want translit of inflections in only some languages, or wants fewer headword inflections in certain languages, or both. This consists of Anatoli (doesn't want translit in Russian, is OK with the rest, is OK with headword inflections in general), Wyang (doesn't want translit in Korean, wants fewer headword inflections in Korean), and Wikitiki (seems to want fewer headword inflections in general, has expressed particular opinions about Russian, might also want less transliteration although I'm less sure about that).
So, we can do two things, it seems:
  1. Take a vote.
  2. Find some compromise that will satisfy both camps. I've proposed above the idea that we can transliterate the headword inflections of most non-Latin-script languages the traditional way (in parens or something similar, after the native-script word), and for the ones where people object (Korean, Russian), transliterate using a mouse-over popup.
I'd like each person who has expressed an opinion, and any others who want, to comment indicating whether they find #2 reasonable and whether they'd accept it, and if not, do they think #1 is the way to go, and if not, what do they think is the way to go? Benwing (talk) 09:02, 12 October 2014 (UTC)
I don't feel super strongly about this, so I'm open to finding a compromise. —Aɴɢʀ (talk) 15:22, 12 October 2014 (UTC)
I really like the idea of the mouse-over popup (or tooltip) transliteration; however, MediaWiki is imposing their own "preview" popup, which does not even work properly in any useful way on Wiktionary and I really wish we could get rid of it and make room for our own popups. --WikiTiki89 14:30, 14 October 2014 (UTC)
  • What I'd like to see for transliterations is 1) the most common scheme used by default, for all languages; 2) the ability to switch between all of the popular transliteration systems available by clicking on a link placed near the headword, opening a popup menu with options; 3) the selected choice remembered when browsing other entries in the same language; 4) the ability to hide/show all transliterations for languages that use them. No "one true transliteration scheme" and no "one true transliteration display option". I believe that all of the necessary data can be generated in Lua, and selectively displayed/hidden using JavaScript. We should give users options, not cripple them. --Ivan Štambuk (talk) 00:27, 15 October 2014 (UTC)

Phrasal verbs whose lemma is not the infinitive

I noticed that there are some phrasal verb entries in English that are conjugated, but the infinitive is not used as the lemma. An example I noticed just now is all hell breaks loose. This verb certainly does have an infinitive, all hell break loose. This is clear when you add auxiliary verbs: I want all hell to break loose or may all hell break loose. So I think we should move these entries to the infinitive. —CodeCat 22:24, 11 October 2014 (UTC)
We usually don't bother with inflecting phrasal verbs, as it just clutters the entry for no real gain. This kind of a case probably warrants it, however. DCDuring TALK 22:38, 11 October 2014 (UTC)

The problem is it just sounds funny when the subject of the verb is included. I know we moved there is to there be a while back, but it has the same problem: with the subject (even just a dummy subject there) present, the bare infinitive just sounds really odd. —Aɴɢʀ (talk) 22:42, 11 October 2014 (UTC)
It does, but you can't deny that the infinitive exists. So either we should make a specific rule for these cases, or we should continue to use the infinitive, right? —CodeCat 23:13, 11 October 2014 (UTC)
Among OneLook dictionaries only Cambridge Idioms actually covers this and they do it at all hell breaks loose. DCDuring TALK 23:54, 11 October 2014 (UTC)
Whichever form we make the lemma, there should be redirects from the other forms. - -sche (discuss) 03:38, 12 October 2014 (UTC)
  • This isn't what I would call a phrasal verb, nor is it so categorized. It is a full sentence. As is the case with virtually all other full English sentences (See Category:English sentences.), the verb and sometimes the noun within can be inflected. (It is trivial to show it to be a full sentence and to show it or any other sentence to occur with an infinitive.) Sentences are usually shown in their canonical form (present indicative tense). DCDuring TALK 05:49, 12 October 2014 (UTC)

use–mention distinction in reference templates

As happened seven months ago, Dan Polansky and I are currently in disagreement about reference-template formatting; this time, we disagree about whether {{R:L&S}} should enclose the cited entry title in quotation marks. I believe that such quotation marks are necessary in order to mark the use–mention distinction, and that quotation marks create a more legible presentation than italicising the entry title would. I don't know why Dan Polansky disagrees, and nor do I know why he reverted the addition of {{documentation}} to the template in the same edit. — I.S.M.E.T.A. 01:37, 13 October 2014 (UTC)

To explain, I come here in the hope that I shall find or obtain consensus to use quotation marks in {{R:L&S}}. — I.S.M.E.T.A. 01:57, 13 October 2014 (UTC)

Just ignore him. Keφr 11:14, 13 October 2014 (UTC)
@Kephir: Forgive me; does "him" refer to Dan Polansky or to me? — I.S.M.E.T.A. 17:02, 13 October 2014 (UTC)
Polansky. He is going to be obstructionist just because he can. But for the sake of having anything said on-topic, I agree with you about the quotation marks. On the other hand, some consistency in formatting mentions would be nice, which would favour italics instead. But either way, bare external link formatting seems rather unfitting to me. Keφr 15:34, 14 October 2014 (UTC)
Thanks, Keφr; I thought you meant him, but I wanted to make sure. I've made the change again; hopefully it'll stick this time round. — I.S.M.E.T.A. 18:28, 14 October 2014 (UTC)
FWIW, I agree with quotation marks, since we are referring to a piece of a larger work: "qua" (for example) is more-or-less a section title. (This is not exactly the same as the use–mention distinction. We are neither using nor mentioning the word qua, we're just citing a source that mentions the word qua. Perhaps a subtle distinction, but IMHO a useful one to keep in mind in cases where the reference work uses a different citation form than we do, or when it assigns a few lemmata to a single entry for whatever reason.) —RuakhTALK 04:56, 15 October 2014 (UTC)

Empowering WingerBot

I filled out a vote request to empower my new bot WingerBot, here:

Wiktionary:Votes/2014-10/Request for bot status: WingerBot

This is my first bot.

It gives a 30-day vote period, which seems excessive. For example, JackBot had a 7-day window, which seems reasonable. If that can be applied here, can someone fix up the start and end times appropriately?

Thanks. Benwing (talk) 07:20, 13 October 2014 (UTC)

FYI, the voting is going on now (and has been for a few days).
My bot's source code is available on github: [4]
See also Wiktionary talk:Votes/2014-10/Request for bot status: WingerBot.
Benwing (talk) 11:29, 23 October 2014 (UTC)

Compound lists for Japanese entries (and possibly CJK in general) -- are these really needed?

With the advent of User:Haplology's various categories for Japanese entries, which compile lists of terms using each kanji (such as Category:Japanese terms spelled with 赤 read as あか, or Category:Japanese terms spelled with 幸 read as こう), it occurs to me that the potentially *huge* lists of compounds that could be compiled and included within each kanji entry are actually redundant and obsolete. Rather than laboriously compile these lists by hand, I think it makes a lot more sense to leverage the categories to do the hard work for us.

Comparing the categories and the manually created lists, the only additional information that the manual lists provide is a possible reading, and a gloss. This leads me to two things:

  • As a proposal: I posit that this information, while potentially helping to improve usability slightly, also represents a sizable negative potential for mistakes and inconsistencies. I therefore propose that we no longer include such lists in Japanese entries, referring users instead to the categories. I also submit for consideration that Chinese and Korean editors might do the same for hanzi and hanja compound lists.
  • As a request: Does anyone familiar with the inner workings of categories know if there might be some technically feasible way to get readings to display automatically in category listings? For instance, 幸運#Japanese is added to category Category:Japanese terms spelled with 幸 read as こう, with the sort argument こううん (kōun). Looking at the list on the category page, we see that 幸運#Japanese is there, but its sort argument is lost -- other than the sorting itself, the sort argument doesn't appear on the page as any kind of useful information. Is there any way of capturing sort arguments and getting them to display somehow in category lists?

I look forward to hearing what others think. ‑‑ Eiríkr Útlendi │ Tala við mig 18:19, 13 October 2014 (UTC)

I find them useful. They are not hard to create. Ideally, a bot should make those categories.--Anatoli T. (обсудить/вклад) 10:09, 14 October 2014 (UTC)
  • Sorry, which them did you mean in I find them useful? Did you mean the categories that list compounds (which are already auto-generated once the appropriate templates are added to an entry), or the in-entry lists of compounds (which so far have to be created by hand)? ‑‑ Eiríkr Útlendi │ Tala við mig 19:11, 14 October 2014 (UTC)
I find categories useful, such as Category:Japanese terms spelled with 飢 read as う. Yes, the template auto-generates cats but they have to be created manually if they are missing. --Anatoli T. (обсудить/вклад) 21:42, 14 October 2014 (UTC)

Rethinking Babel boxes

I did some minor editing at WT:Babel recently, which made me wonder whether it would make sense to rewrite {{Babel}} in Lua. My initial motivation was to integrate it with our central list of languages (maybe even into the category boilerplate system which User:CodeCat developed) and get rid of inline styles on the way. While planning this out, some other ideas emerged in my head:

  • To have the blurbs ("This user speaks Elbonian at an advanced level") in English, and English only. On one hand, this is contrary to how Babel boxes look in other Wikimedia projects. On the other, not only will it massively simplify the code, it also makes the most sense: English is the one language in which English Wiktionary's (duh) definitions, boilerplate and meta-content are written and in which discussions are (usually) conducted, and the only language which can be assumed to be understood by all users. If I am looking at a Babel box of an advanced speaker of Cantonese, I can recognise it only because I remember yue to be the code for Cantonese, and that the number 3 means advanced level. The blurb tells me nothing; I do not know nearly enough Hanzi to recognise a single character.
  • To rename the user categories. "User si-3" is rather terse and again forces me to remember language codes. "Wiktionary:Advanced speakers of Sinhalese" would be more elegant and descriptive.
  • To deprecate {{#babel:}}, as was suggested in Wiktionary:Beer parlour/2014/September#Can we disable the #babel parser function? (I see the English blurb issue was brought up there too). I think some page in the MediaWiki namespace can be edited to point users to the template instead.
  • To suggest that users add themselves to interest groups (in Module:workgroup ping/data) when they speak a certain language at a sufficiently high level.

Some considerations:

  • Integration with our central languages list would mean that, for example, Template:User en-us-N would have to be folded into Template:User en (I see Template:User sr-4 already redirects to Template:User sh-4).
  • I think some users may expect Wikimedia language codes to work in our Babel boxes (they may simply copy the Babel template across projects). I think we should generally not break that expectation; however, I worry about some Wikimedia codes not mapping perfectly to local ones.


Keφr 17:08, 14 October 2014 (UTC)

I support translating the Babel boxes into English. Their very purpose is defeated when they are incomprehensible. — Ungoliant (falai) 17:18, 14 October 2014 (UTC)
I agree with this too, and I definitely agree with converting to Lua to eliminate the unmaintainable mess of templates we currently have. —CodeCat 17:21, 14 October 2014 (UTC)
I support translating to English. I oppose converting to Lua because once we translate to English it will be very easy to turn it into a small maintainable template without Lua. I also oppose, as before, deprecating {{#babel:}}. --WikiTiki89 17:30, 14 October 2014 (UTC)
The Module:workgroup ping integration and (maybe) validation would be much harder to do from a bare template. And I think so would be Eirikr's suggestion to avoid nested tables (while maintaining all current functionality, at least). Keφr 08:35, 17 October 2014 (UTC)
I enjoy seeing the other languages and would be sad to see them go, but I understand and generally agree with the rationale for changing the Babel boxes to be all-English. If we're going to have them redone, my 2p request would be to not use nested tables, and to make sure that the columns actually line up properly. I'm one of those visually oriented people for whom the jagged inconsistencies of the current Babel infrastructure are so jarring that I deconstructed the tables and rebuilt them to line up properly on my own user page. ‑‑ Eiríkr Útlendi │ Tala við mig 19:08, 14 October 2014 (UTC)
Does the text really matter, other than the English, as well as native, language name? Wouldn't luacizing the templates mean that, as a practical matter, the text could only be in English? A new person with a new language could not be assumed capable of adding the text required in their language in a standard-conforming way, unless there were a particularly obvious way to add the text. DCDuring TALK 08:20, 15 October 2014 (UTC)
Well, maybe; translating into every language would be a bit of work (just create a huge data table… the only problem is that it would probably grow even larger than Module:languages, so we would have to split it, and it might become hard to navigate…), but could be done in principle. Though I think we could abuse the Scribunto i18n library to reuse messages provided by mw:Extension:Babel, and have every single Babel box in any language the reader desires (just add ?uselang= to the URL). Though that would put mw:Extension:Babel in a weird limbo of "deprecated but depended upon by its replacement"; and I have no idea how this interface could be exposed. Or we could just use that facility to maintain the status quo (pardon the Polanskyism) of having them in the target language. Keφr 08:35, 17 October 2014 (UTC)
Proof of concept: {{#invoke:User:Kephir/test1|babble|ast|5}} gives
{{GENDER:USER|Esti usuariu|Esta usuaria}} tien un conocimientu [[LEVEL LINK|profesional]] d'[[LANG LINK|asturianu]].
This user has [[LEVEL LINK|professional]] knowledge of [[LANG LINK|Asturian]].
This user has professional knowledge of Asturian.
Try also viewing this page in Chinese. Keφr 14:38, 20 October 2014 (UTC)
I always thought that the purpose of having the blurbs in the target language was to help non-English speakers or English language learners to find users with whom they might be able to communicate if they needed help. I think it is beneficial to see the name of the language in English so that English speakers can easily recognize which language the box indicates. - TheDaveRoss 20:35, 16 October 2014 (UTC)
I did not consider this. This is a good argument. Keφr 08:35, 17 October 2014 (UTC)
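The category rename proposed earlier in this thread ("User si-3" becoming "Wiktionary:Advanced speakers of Sinhalese") can be illustrated with a small sketch. It is written in Python for brevity rather than in Lua, and it is not the proposed module: the function name and both lookup tables are hypothetical, and they contain only the language codes and level names that actually come up in this discussion (3 as advanced, 5 as professional).

```python
# Hypothetical lookup tables; only values mentioned in the discussion are listed.
LANGUAGE_NAMES = {"si": "Sinhalese", "yue": "Cantonese", "ast": "Asturian"}
LEVEL_NAMES = {"3": "advanced", "5": "professional"}


def descriptive_category(old_name: str) -> str:
    """Turn a terse name like 'User si-3' into a descriptive category name."""
    prefix, _, rest = old_name.partition(" ")
    if prefix != "User" or "-" not in rest:
        raise ValueError(f"not a Babel user category: {old_name!r}")
    # Split on the last hyphen so that codes containing hyphens stay intact.
    code, _, level = rest.rpartition("-")
    if code not in LANGUAGE_NAMES or level not in LEVEL_NAMES:
        raise KeyError(f"unknown language code or level in {old_name!r}")
    return (
        f"Wiktionary:{LEVEL_NAMES[level].capitalize()} "
        f"speakers of {LANGUAGE_NAMES[code]}"
    )


print(descriptive_category("User si-3"))
```

A real implementation would presumably read language names from the central language list instead of a hand-written table, which is the integration the proposal has in mind.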

Spaces in alphabetization of language names

How do we treat spaces when we alphabetize language names? Specifically, does "Lower Sorbian" precede or follow "Low German"? If we ignore spaces, then "LowerSorbian" precedes "LowGerman", but if we treat spaces as preceding A in alphabetical order, then "Low_German" precedes "Lower_Sorbian". —Aɴɢʀ (talk) 19:01, 14 October 2014 (UTC)

There are pros and cons to both options. What do Dictionaries that list multi-word phrases as separate entries do? --WikiTiki89 01:35, 15 October 2014 (UTC)
I just checked six print dictionaries (two British, four American) and they all ignore spaces (hotchpot before hot dog before hotel). —Aɴɢʀ (talk) 06:12, 15 October 2014 (UTC)
w:Alphabetical order#Treatment of multiword strings is relevant. —msh210 (talk) 12:30, 15 October 2014 (UTC)
That page basically outlines the question, but does not provide an answer. --WikiTiki89 12:39, 15 October 2014 (UTC)
Both treatments are valid; the question is, which do we want to use? Dictionary headwords apparently usually follow the "ignore the space" rule, but other lists may follow the "treat the words separately" rule. —Aɴɢʀ (talk) 13:53, 15 October 2014 (UTC)
Internet-based sorting, including our own categories, generally treats a space as being ordered before any other character. So that would place Low German before Lower Sorbian. —CodeCat 18:27, 17 October 2014 (UTC)
Some paper dictionaries, too, use this ordering, e.g. the Routledge dictionary of historical slang: have a look at http://books.google.fr/books?id=JRuNMHNcu5cC&pg=PP12&lpg=PP12&dq=%22something+before+nothing%22+dictionaries&source=bl&ots=6iDNPNRHjr&sig=S8mC2Wqar5xb4FCC2zWaw4itGG8&hl=fr&sa=X&ei=yXJBVNebNMnDPPWkgIgG&ved=0CCMQ6AEwAA#v=onepage&q=%22something%20before%20nothing%22%20dictionaries&f=false. This is the better ordering for our kind of dictionary. Lmaltier (talk) 19:52, 17 October 2014 (UTC)
This dictionary calls it something before nothing. Do you understand why? Lmaltier (talk) 20:12, 17 October 2014 (UTC)
On what basis do you say "This is the better ordering for our kind of dictionary."? I happen to be leaning the other way. --WikiTiki89 21:30, 17 October 2014 (UTC)
The reason is the number of multi-word phrases, etc. here. When entries in a dictionary are almost always single words (without spaces, etc.) and phrases are defined in these basic entries, the strict alphabetical order is the logical choice. When each phrase has its own entry, it's much better to get all phrases beginning with the same word together when using a category. An example: you expect boulanger-pâtissier (probably addressed in boulanger in most paper dictionaries) after boulanger but before boulangerie; the order boulanger, boulangerie, boulanger-pâtissier is not what you would expect. Lmaltier (talk) 15:48, 19 October 2014 (UTC)
For the most part, we don't have to worry about alphabetization here; our entries are on separate pages that aren't ordered with respect to each other. Our categories alphabetize automatically, and I see that Category:en:Languages has Low German >> Low Prussian >> Low Saxon >> Lower Lusatian >> Lower Silesian >> Lower Sorbian >> Lower Wendish, meaning that our automatic alphabetization does treat spaces as ordered before any other character. The only alphabetization we have to do manually is the ordering of the languages in entries like se, which is where I first encountered the problem of where to put Lower Sorbian with respect to Low German. My immediate instinct was Low German >> Lower Sorbian, but then I second-guessed myself and asked here. After discovering that dictionary lemmas treat spaces as nonexistent, I went back to se and switched the order to Lower Sorbian >> Low German. But now that I've looked at how our categories alphabetize, I'm gonna go back again and switch it back to my first instinct, Low German >> Lower Sorbian. —Aɴɢʀ (talk) 20:07, 19 October 2014 (UTC)
We have to worry about categories only, but this is very important. They alphabetize automatically, but we must ensure that they alphabetize the best way for readers. For languages: the result is disputable for Lak'ota. Lmaltier (talk) 05:41, 20 October 2014 (UTC)
We have some control over sorting in categories, though I'm not sure if that includes treatment of spaces. As for "Lak'ota", that's not a good example- we call the language Lakota. Chuck Entz (talk) 12:24, 20 October 2014 (UTC)
It was a real example: see Category:en:Languages and look at L. Lak'ota is before Lake Miwok. Lmaltier (talk) 20:17, 20 October 2014 (UTC)
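The two orderings discussed in this thread can be compared with a short sketch. It uses plain character-by-character comparison in Python, which is an assumption made for illustration: the collation actually configured for categories on the wiki may differ in detail. The key function names are hypothetical.

```python
def space_first(name: str) -> str:
    """Sort key: compare character by character, so a space sorts before any letter."""
    return name.lower()


def ignore_spaces(name: str) -> str:
    """Sort key: drop spaces before comparing, as many print dictionaries do."""
    return name.replace(" ", "").lower()


languages = ["Lower Sorbian", "Low German", "Lake Miwok", "Lak'ota"]

# "Something before nothing": Low German comes before Lower Sorbian.
print(sorted(languages, key=space_first))

# Spaces ignored: "lowersorbian" comes before "lowgerman".
print(sorted(languages, key=ignore_spaces))
```

In both orderings Lak'ota lands before Lake Miwok, because the apostrophe compares lower than any letter; treating it differently would need a key that strips or remaps punctuation as well.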

Extended etymologies

I came upon this website illustrating an idea that I had in mind for a while (click on the blue links in the leftmost column). We could extend the < "derives from" operator used in etymologies to generate a drop-down table illustrating intermediate steps between pairs in the derivational chain, i.e. all of the sound changes involved. Short descriptions could link to appendices where more details are available. This would be applicable to both reconstructions and attested etymons, including borrowings (which often undergo some special rules that can nevertheless be described and cataloged). A chronologically inverted list would be used in the descendants sections of the corresponding source word/reconstruction. Support could be added for multiple sequences of derivation, and even multiple sources or different reconstructions reflecting different protolanguages. It would however require some non-trivial investment in the groundwork to make it work, so it should best be approved (or better: not disapproved) first before people waste time. I've seen some recent works that use this method but they use numbers instead of descriptions to explain what's going on, so one has to manually look up what each of the numbers used means, and the layout is horizontal not vertical. --Ivan Štambuk (talk) 00:13, 15 October 2014 (UTC)

Support, although I recognize that there should be a lot of discussion about the specifics of the layout. --WikiTiki89 01:36, 15 October 2014 (UTC)
How would it work, on a technical level? How would you share data between entries? DTLHS (talk) 04:46, 15 October 2014 (UTC)
Support. I had a vague idea about having such lists in appendices somewhere, but never developed it. Filling out the details would seem to go beyond the limits of published sources without resorting to the kind of extrapolation that you've been berating CodeCat for- are you ok with that? Chuck Entz (talk) 13:30, 15 October 2014 (UTC)
Support. Categorization based on sound change could also be added, such as Category:Old Armenian terms derived by Meillet's law. Or such terms could appear on the appendix dedicated to Meillet's law. --Vahag (talk) 10:29, 16 October 2014 (UTC)
  • I think this might overwhelm normal entries, especially if people do it for every morpheme in a polymorphemic word, but it would be nice to do this somehow on reconstructed-form appendix pages. —Aɴɢʀ (talk) 13:49, 15 October 2014 (UTC)
    • It wouldn't be too bad if we restricted it to the rules between a term and its nearest parent (i.e., an English etymology would only have the steps between it and Middle English or maybe Old English), and hid the list so that only those who choose to look at it would see it. Chuck Entz (talk) 13:57, 15 October 2014 (UTC)
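
As a rough answer to the question of how this might work on a technical level, here is a minimal sketch in Python of an ordered list of sound changes applied to a starting form, recording each intermediate step together with its short description. Everything in it is an assumption made for illustration: the `SoundChange` structure, the `derive` function and the two toy rules are hypothetical and do not correspond to any existing module or to real sound laws.

```python
import re
from dataclasses import dataclass


@dataclass
class SoundChange:
    description: str  # short label that could link to an appendix
    pattern: str      # regular expression matched against the form
    replacement: str  # text substituted for each match


def derive(form, changes):
    """Apply each change in order and record every intermediate form."""
    steps = [(form, "starting form")]
    for change in changes:
        new_form = re.sub(change.pattern, change.replacement, form)
        if new_form != form:  # only list changes that actually applied
            steps.append((new_form, change.description))
            form = new_form
    return steps


# Toy rules invented for this example only; they are not real sound laws.
TOY_CHANGES = [
    SoundChange("p > f word-initially", r"^p", "f"),
    SoundChange("loss of final vowel", r"[aeiou]$", ""),
]

if __name__ == "__main__":
    for form, note in derive("pata", TOY_CHANGES):
        print(f"{form:8} {note}")
```

Reversing the returned list would give the chronologically inverted order suggested for descendants sections. Restricting the list of changes to those between a term and its nearest parent, as suggested above, would keep each table short.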

Categories for words that have pronunciations marked in the form of IPA[edit]

Should we create such categories? I believe that it is convenient to go to Special:WhatLinksHere/Appendix:Italian pronunciation for the above information. --kc_kennylau (talk) 09:53, 15 October 2014 (UTC)

What's the general consensus view on handling abusive editors?[edit]

I stumbled across the activities of a new editor and have been quite impressed at how abusive they can be -- foul language, name-calling, lawyering, basically the kind of trollish behavior that drove me from Wikipedia years ago. I analyzed their total contributions, only a short list so far, and found that more than a quarter have been on talk pages, where this editor has mostly argued about editing decisions, illustrated their profound ignorance of the consensus here, and berated other users. Another more-than-quarter has been in this user's own userspace. 40% has been actual constructive mainspace edits, mostly in January-March this year. Out of the total, more than a quarter has been confrontational and even outright abusive.

For what it's worth, this editor has not yet had any direct dealings with me.

How would other admins approach this? ‑‑ Eiríkr Útlendi │ Tala við mig 18:05, 17 October 2014 (UTC)

I would post a warning on his/her user page along the lines of "Start being nice to people, or I will block you." (but in a more polite way). --WikiTiki89 18:10, 17 October 2014 (UTC)

Proposal: use quotation marks to mark headwords cited in reference templates for Latin-script languages[edit]

Further to §: use–mention distinction in reference templates above, may I suggest that we use quotation marks in our R:-prefixed reference templates to mark the headwords cited by those templates? So, for example, the standard format (at least where the headword is concerned) would be:

  • “foo, n.” in Some Big Dictionary

(Because of potential problems with using quotation marks with other scripts, I make this proposal for Latin-script languages only.) Does that seem sensible to everyone? Is there consensus? Shall I prepare a vote? — I.S.M.E.T.A. 18:35, 17 October 2014 (UTC)

  • It's also worth noting that all three of these changes to remove the quotes were in 2009, now half a decade ago. Attitudes and ideas change over time. I suggest we check the opinions of the relevant people here. That said, Ullman is no longer with us, and Spangineer's last edit was in 2010. @DCDuring: do you have any input on this quote issue? ‑‑ Eiríkr Útlendi │ Tala við mig 19:32, 17 October 2014 (UTC)
  • I support adding quotes. It's the only way we can make the cited part stand out without changing text style like the italic "n.". —CodeCat 19:16, 17 October 2014 (UTC)
    The only way to stand out? That is obviously untrue. The text of the word stands out by the use of a different color for the hyperlink, as in cat in Webster’s Revised Unabridged Dictionary, G. & C. Merriam, 1913. --Dan Polansky (talk) 19:27, 17 October 2014 (UTC)
    Not all people can see such colours. —CodeCat 19:35, 17 October 2014 (UTC)
    You mean color blind (are there such that cannot distinguish blue vs. black)? Or people with a simple browser that does not distinguish a piece of text with a link from a piece of text without a link? Even assuming some people do not see such colors, will they miss the link because of the missing quotation marks? If so, will they miss links in general, since in general links are not surrounded by quotation marks? --Dan Polansky (talk) 19:38, 17 October 2014 (UTC)
    Surprisingly, I agree with Dan. Color or other link distinction seems sufficient. Quotation marks, especially double, add visual clutter IMO.
We use quotes for glosses, so any need for glosses in such templates — quite possible IMO — would require multiple quotes.
If we resort to further distinction, I would strongly oppose ever using italics as it makes it impossible to maintain the appropriate typographic contrast for the taxonomic names that are supposed to have it. DCDuring TALK 19:51, 17 October 2014 (UTC)
  • Re: links, are there any cases where a term might not be linked in such a template call? ‑‑ Eiríkr Útlendi │ Tala við mig 20:27, 17 October 2014 (UTC)
    It certainly might not always be the pagename. In some cases having a named link might be misleading, as it implies that it is possible to go to a page that is directly related to the term, rather than, say, a general search-form page. The more I deal with these, the more I appreciate such refinements. Also: optional italics for the taxonomic names that need them ("i=1") and an optional gloss ("gloss="). Not every template needs such options, but they are handy. DCDuring TALK 22:08, 17 October 2014 (UTC)

Redesign-Redefine of Russian Entries[edit]

I'm going towards a large redo of many Russian pages, translating swathes from Russian Wiktionary with a focus to layout consistency, definition intuitiveness/coverage, and relevant design/coding.

Info on en-Wiktionary is generally inadequate for translating literature and often confusing for basic words (e.g. 'весь', see below). We already have all the necessary info, but only on Wiktionary-ru, hence inaccessible to casuals (many definition examples cited there derive from literature). I started translating Dostoevsky ( https://github.com/icarot/bk ), which was when such inadequacies became more obvious.


1) Collaborate with Grease Pit to try to normalize the data layout as consistently as possible, for parsing by robots. A parser/morphological analyzer needs quality, open data. Hacky consistency = hacky parse.

2) Improve word-count and definition count immensely. On the order of a few thousand for one of them. Even ru-Wiktionary is occasionally lacking in this department.

3) Clean up messy pages, e.g. 'весь', which confounds the novice with the unintuitive concept that Russian uses declensions to represent irregular meaning on an unusually multi-purpose [pronoun-adjective] word, and which does not represent all of the critical meanings.

4) Pronunciations from ru-Wiktionary as well. Ours are sufficient but different (we use phonemic vs. narrow transcriptions). In my opinion, the narrow transcriptions are better since they reveal useful subtleties of pronunciation without adding obscure IPA symbols. The main changes would be notating non-phonemes when ru-Wiktionary decided to do so and we did not, such as replacing our alveolar approximants with velarized allophones, and notating unusual instances of vowel allophony, or secondary stress. In short, copying the more precise and still friendly transcriptions from ru-Wiktionary. Consensus?

What are desired improvements I've missed for Russian translations which can be directly bettered from conventions and the scope of information on Russian Wiktionary? Looking for criticisms, guidance, etc. I wouldn't just run rampant without letting the community know what was going on, or asking for help.

Main Points Noted

  • Ivan I can help generate stubs [..] — that would be brilliant! I'd do the same, using lemmas from Dostoevsky. I'll use the corpora from ru-Wiktionary (i.e., National Russian Corpus) because if it's there, logically I assume it's license-compatible. I agree with you about Google Translate — they can't possibly have the copyright on that data. But we should verify to make sure.
  • Ivan German article in Spiegel and there were like 2-3 missing words in every single sentence. I can imagine that the statistics for Dostoevsky are even worse. It has become embarrassing. We should have some kind of stubs for statistically top 20k words in every language IMHO — I think this is a fantastic idea. And you're absolutely right about your inference about Dostoevsky. It's the English equivalent of reading the word 'snicker' and having no entry whatsoever. This is middle- or high-school vocabulary, and is a large problem as a whole for practical use as a dictionary. Can we reach a consensus for doing this specifically for ru-articles?
  • Wikitiki89 do not change the layout without discussing it first. — Main change wanted is inflection tables. These on en-Wiktionary waste huge amounts of space. We should copy ru-Wiktionary's approach: a clean, uncluttered overview of an inflection pattern. While we're on the topic of morphology, I want Andrey Zaliznyak's inflection descriptions from ru-Wiktionary as well. He uses one number and one letter for each word to comprehensively cover the morphology and stress pattern of Russian. I'll work on translating the description from ru-Wiktionary when I get a moment.

Icarot (talk) 00:18, 18 October 2014 (UTC)

We have seen you talking but we haven't seen you working :). You're welcome to demonstrate your ideas. Yes, we need more Russian entries and some entries may need fixes or improvements but you can't make major changes without a prior agreement. --Anatoli T. (обсудить/вклад) 05:20, 18 October 2014 (UTC)
  • Just a heads-up: Any automatic transmission of data from Russian Wiktionary into English Wiktionary has to clearly indicate the source of the data in the edit summary to prevent copyright violation. --Dan Polansky (talk) 05:34, 18 October 2014 (UTC)
  • Feel free to make any changes you want to content, but do not change the layout without discussing it first. --WikiTiki89 14:25, 18 October 2014 (UTC)
  • @Icarot: I can help you generate stubs for Russian nouns, adjectives, verbs and adverbs (the rest are a closed category and mostly covered). Stubs would be entries like in this category - the only thing they are missing are definitions. I could help extract a list of missing lemmas from a particular work. We could also pregenerate a list of examples for every entry and format them using the {{usex}} template, by taking them from ru Wiktionary, glosbe, parallel corpora databases, subtitles, google translate and so on, that editors could easily copy/paste into entries that are missing them. Don't worry about associations (derived terms, *nyms, morphological etymologies etc.) - those can be largely automated once entries with definitions are created. The primary focus should be on coverage. --Ivan Štambuk (talk) 07:32, 19 October 2014 (UTC)
    • Not sure why you are not continuing with this crap in Serbo-Croatian Wiktionary. It already has more than 100 000 Serbo-Croatian definitionless entries. If Wiktionary users are so hungry after such content as you posit, Serbo-Croatian Wiktionary could become one of the most visited Wiktionaries soon. Unless it gets shut down due to copyright violation, that is, such as because of automated lifting of data from Google translate as you seem to suggest above. --Dan Polansky (talk) 07:58, 19 October 2014 (UTC)
      Inflections cannot be copyrighted, the databanks such as HJP are completely free. Besides, I fixed many errors in them, and used two others as well. Definitions on the other hand can be copyrighted, and are nevertheless abundantly stolen by many FL Wiktionaries without anyone so much raising an eyebrow. Don't worry Polansky, soon I'll add many such stubs for Czech as well. --Ivan Štambuk (talk) 08:07, 19 October 2014 (UTC)
      • As you know from a previous discussion on the subject with copious participation, there is no consensus supporting your mass creation of definitionless entries. There is no consensus for blocking that behavior either, though. You may get blocked in the process nonetheless; if I were a crat, I would have blocked you by now for entering definitionless rubbish. You may also get blocked for the above cynical utterances of disrespect toward copyright; if I were the operator of this website, I would block you for that. In the meantime, I will take this opportunity to register my annoyance. --Dan Polansky (talk) 08:16, 19 October 2014 (UTC)
        A wide consensus is not necessary for language-specific work (The original discussion was for all languages). A few editors agreeing and working together is enough. The rest can complain about it all day for all I care. (It seems to be the only thing that you do anyway.) Just looking at the content of Category:Czech nouns: We have 13k Czech nouns and 95% of them don't have inflection and pronunciation. I can guess the meaning of 90% of them and I've never studied Czech in my life. I know it's hard to accept that most of your work has been futile, but such is life. Google Translate is based on statistical correlation in parallel corpora not owned by Google and its translation pairs are uncopyrightable, and can completely substitute all of the work you've done. Working smart not stupid is the way to go, using bots and free databases for heavy lifting and not wasting time on typing wiki syntax. --Ivan Štambuk (talk) 08:31, 19 October 2014 (UTC)
        • Re: 'The rest can complain about it all day for all I care. (It seems to be the only thing that you do anyway.)': That is obviously untrue; it suffices to inspect my mainspace contribution to see otherwise. I propose you use your blocking tools to block yourself for that remark. --Dan Polansky (talk) 08:36, 19 October 2014 (UTC)
        • Re: "I can guess the meaning of 90% of them": Very unlikely. --Dan Polansky (talk) 08:38, 19 October 2014 (UTC)
          Well I took a look at the last 50 contribs of yours, and the only novel mainspace edit is some English misspelling. Anyway, my point was that you've invested too much time into easily replicable manual labor so that you oppose stubbing not by reason but principle. See: neo-Luddite. We have too few editors to do everything manually, and after 10 years we're still missing thousands of top words in many major languages. The other day I was reading a German article in Spiegel and there were like 2-3 missing words in every single sentence. I can imagine that the statistics for Dostoevsky are even worse. It has become embarrassing. We should have some kind of stubs for statistically top 20k words in every language IMHO (including translations). Regarding blocking - using words such as crap or rubbish when referring to other people's work is considered impolite and could be a cause for a block. --Ivan Štambuk (talk) 08:56, 19 October 2014 (UTC)
          Are you semantically challenged? Which part of "the only thing that you do" do you fail to understand? Some recent contributions are [5] and [6]. Your ridiculous insults and inaccuracy are just tiresome. --Dan Polansky (talk) 09:19, 19 October 2014 (UTC)
          You've made ~500 mainspace edits in 4 months, most of which are translation pairs. I could in a few hours write a script that would generate both those and inflections and pronunciations. 4 months of work reduced to a few hundred lines of code. I can even extract context labels from dicts. I understand your anger but there is no need to project it towards others. Behave yourself. --Ivan Štambuk (talk) 09:36, 19 October 2014 (UTC)
          • My point is that what you said was clearly false. I still see no "I stand corrected". Actually, when one rereads your posts above, they are full of obvious inaccuracies. I am not sure why I care to respond to that sort of communication style that is inaccurate by design, and whose author never says "I stand corrected, I was wrong". --Dan Polansky (talk) 09:44, 19 October 2014 (UTC)
            Natural languages are too primitive to convey the nuances of meaning representative of the real world. Nature is stochastic and statistical, and there really exists no such thing as true or false, right or wrong. In practice "never" means "almost never/in 0.something % of cases", and "all" means "100% for all practical purposes". It's real life 101. But I digress. If you don't have anything to say regarding my points I suggest that we terminate this interlocution.--Ivan Štambuk (talk) 10:05, 19 October 2014 (UTC)
            • Re: "Natural languages are too primitive to convey the nuances of meaning representative of the real world." No one should be allowed to get away with this sort of continental nonsense. The relevant distinctions are very easy to express in natural language: there is a clear, easy to understand difference between "The only thing you do is X", "You do almost nothing but X", and "Most of what you do consists of X". No rocket science, nothing to do with stochastic nature of the real world. As I said, remind me of the occasion on which you admit you made an error rather than blaming natural language for lack of expressive power. Your sort of response to clear refuting examples is the sort of behavior which Popper's philosophy of falsificationism was intended to combat. --Dan Polansky (talk) 17:51, 24 October 2014 (UTC)
          Re: 'using words such as crap or rubbish when referring to other people's work is considered impolite and could be a cause for a block': That's utter rubbish. You can hear "rubbish" all the time, used by well educated and generally polite people. These words are not the most polite forms available, but fit well to describe the sort of content that dominates the Russian Wiktionary. --Dan Polansky (talk) 09:23, 19 October 2014 (UTC)
          I'm not sure what kind of polite people you socialize with, but referring to other people's work as crap and rubbish and them as challenged (a jocular pejorative) is generally reserved for intimate contexts where they would not perceive it as an insult (e.g. family or close friends). Russian Wiktionary is doing fine, thanks for asking. And so will the Serbo-Croatian Wiktionary. Not so long ago the SC Wikipedia was ridiculed on similar grounds, and now it is bigger than any of the hr/bs/sr pedias, with the highest growth rate. --Ivan Štambuk (talk) 09:36, 19 October 2014 (UTC)

@Icarot: Feel free to add definitions to Category:Russian entries needing definition, generated by User:Ivan Štambuk, which I have been working on. Plenty of work to do! I'll repeat what was said before: please don't change the design without a prior agreement. As I said before, we haven't seen you working yet. --Anatoli T. (обсудить/вклад) 02:37, 24 October 2014 (UTC)
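
The idea raised above of extracting a list of missing words from a particular work can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `missing_words` is hypothetical, the set of existing entries is assumed to be available already (for example from a dump), and the sketch counts surface word forms rather than lemmas, so a real version would need a lemmatisation step that is not shown here.

```python
import re
from collections import Counter


def missing_words(text, existing_entries):
    """Count word forms in `text` that have no entry, most frequent first."""
    # [^\W\d_]+ matches runs of letters in any script, including Cyrillic.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in existing_entries)
    return counts.most_common()


if __name__ == "__main__":
    sample = "весь мир и весь дом и весь"
    existing = {"и"}  # assumed: forms that already have entries
    for word, count in missing_words(sample, existing):
        print(word, count)
```

Sorting by frequency puts the most useful stubs first, in line with the suggestion that the primary focus should be on coverage.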

IPA, language code and error message[edit]

Whatever changes were made to IPA modules to make older pages (2013) have conspicuous red error message in the IPA section should be undone. Example: this revision. Old revisions should look as legible and sane as possible; this is not. In general, IPA templates should not require the language parameter; filling-all-the-fields concerns should be delegated to editors with a shovel who have no real interest in building the dictionary. --Dan Polansky (talk) 05:31, 18 October 2014 (UTC)

I agree that the lack of a lang parameter shouldn't result in an error message, but we don't have any editors who have no real interest in building the dictionary. People with no interest in building the dictionary don't become editors. —Aɴɢʀ (talk) 07:00, 18 October 2014 (UTC)
I completely agree that there shouldn't be an error message. A cleanup category would be sufficient. --WikiTiki89 14:27, 18 October 2014 (UTC)
I was gonna say exactly what Wikitiki89 said. Renard Migrant (talk) 11:49, 24 October 2014 (UTC)
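
The behaviour suggested above, a cleanup category in place of a visible error, could look roughly like the following Python sketch. The real IPA modules are not shown in this discussion, so the function `format_ipa` and the category name are purely hypothetical and only illustrate the control flow.

```python
def format_ipa(transcription, lang=None):
    """Return display text plus tracking categories; never raise on missing lang."""
    categories = []
    if lang is None:
        # Hypothetical category name, used here for illustration only.
        categories.append("IPA pronunciations needing a language code")
    text = f"IPA: {transcription}"
    return text, categories


if __name__ == "__main__":
    print(format_ipa("/kæt/"))             # old-style call without a language
    print(format_ipa("/kæt/", lang="en"))  # call with a language code
```

With this approach old revisions stay legible, and editors who want to fill in the missing parameter can work through the category.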


There's a lot about this entry that makes me nervous: the word was apparently coined in a journal article published in mid August, with some or all of the authors working at Alabama State University in Montgomery, Alabama. The Wiktionary article was created at the beginning of September by an anonymous contributor whose IP is assigned to ASU. A variety of IPs from the same southern Alabama/northern Florida area as ASU, as well as an account that seems to bear the name of one of the authors, have been adding references, which are all articles/blurbs about either the research program at ASU or about the original article itself. It's tagged as a hot word, but it looks to me to be lukewarm at best: a Google search does show the word in a blog or news article here or there, but this isn't the kind of strong, widespread adoption we saw with olinguito.

I can't escape the impression that we're being used for promotional purposes, and I feel we need to do something- but I'm not sure whether to tag this for cleanup to prune out all the PR from the references, or to rfv it, or something else. It certainly doesn't meet the letter of the CFI, since it's only 2 months old, but how do we decide whether this is "hot" enough to keep it provisionally as a hot word? Chuck Entz (talk) 05:04, 19 October 2014 (UTC)

Some use outside of the group promoting it would be nice. I'd RfV it for starts. DCDuring TALK 12:44, 19 October 2014 (UTC)
It's hard to say which of the "references" have print counterparts or can otherwise be considered to be durably archived. At least one is a self-proclaimed blog. Nothing in CFI says we have to include something as a hot word, especially when it is not at all clear that use would ever get beyond the field of forensic science and practice. I think that means that it would in the end come to a vote, which usually takes place at RfD. And then there's the increasingly important question of how we address the decline of print media.
This particular case seems to me to be part of a campaign by a university PR office. RfC seems inappropriate as the entire issue is with the attestation. I'd RfV it to get a slow clock started. We need to have properly formatted attestation to facilitate wide participation in review. Why should each participant have to click through to each website? DCDuring TALK 13:17, 19 October 2014 (UTC)

Headwords for reconstructed languages[edit]

So I'm putting in the first steps towards an appendix for Proto-Samic, a fairly well-reconstructed proto-language. I'm however wondering what would be a good choice of headword for verbs?

  1. Use just the bare verb stem. This is what the main published sources, including the 1989 dictionary by Lehtiranta [1], seem to do: e.g. *ëstë (to be in time). However this is not an actual wordform by itself.
  2. Use the verb stem, marked by a hyphen to be just a stem and not an actual wordform: e.g. *ëstë-.
  3. Follow the standard for the modern-day Samic languages (and, for that matter, our PF and PGmc appendices) and use the infinitive: e.g. *ëstëtēk. These are not directly listed in the source literature, but they are simple enough to assemble, and the ending itself is uncontroversial.

Worth noting is that some otherwise homophonic roots would be distinguishable under options #2 and #3 (e.g. *ćēkćë 'osprey' ~ *ćëkćë- 'to kick'). OTOH there also exist roots for which it is not clear if the original meaning was nominal or verbal (*teampō 'to become wet / seaweed'), and their placement would end up arbitrary if we strictly separated verbs and nominals by citation form.

(Discussion on further matters perhaps ought to go at Wiktionary talk:About Proto-Samic. [EDIT] 15:34, 24 October 2014 (UTC): Page now up.)

--Tropylium (talk) 20:46, 19 October 2014 (UTC)

[1] Lehtiranta, Juhani. 1989–2001. Yhteissaamelainen sanasto ('Common Sami Vocabulary'). Suomalais-Ugrilaisen Seuran Toimituksia 200. Helsinki: Suomalais-Ugrilainen Seura. ISBN 951-9403-23-X.

I would choose option 3 mainly because it lines up better with modern terms and makes comparisons easier. It also matches our treatment of Proto-Finnic, which also uses the infinitive as the lemma. —CodeCat 21:21, 19 October 2014 (UTC)

On proper nouns[edit]

Previous discussions: Wiktionary:Information desk/2014/July#Are names always proper nouns (or proper names)?, Wiktionary:Beer parlour/2014/July#Proper nouns

Why do we treat proper nouns as a separate POS from nouns? Proper nouns are just a specific type of noun; having separate headings and categories for "Proper nouns" as opposed to "Nouns" is a bit like having separate headings and categories for "Transitive verbs" as opposed to "Verbs". Merging proper nouns in with nouns would solve a lot of ambiguity problems, such as words like Friday and Christmas that can be used both as a proper noun and as a common noun, not to mention the problem that there is no real clear cross-linguistic definition of what constitutes a proper noun. (Most attempts at defining the difference I've seen apply only to English and don't necessarily work for other languages.) —Aɴɢʀ (talk) 16:26, 20 October 2014 (UTC)

I definitely support this. Furthermore, even if this does not pass, I would like to propose categorising all proper nouns as nouns as well, and merging Category:Proper noun forms by language into Category:Noun forms by language. —CodeCat 16:59, 20 October 2014 (UTC)
As a general rule if something (eg, a classification, attribute) is reasonably well researched and documented in a given language and has lexical implications, then we should have it in that language. If other languages don't have the distinction or don't have it documented then we shouldn't have it for those languages. I don't see why we should dumb down presentation of any language, let alone the host language, for the sake of uniformity or the convenience of translators or Lua practitioners.
For English and for taxonomic names, the notion of proper nouns is well-documented and useful. We could make the presentation simpler by acknowledging that large classes of English proper nouns have perfectly predictable (ie, effectively syntactical) patterns of common-noun use. I always wonder whether we can prevent contributors from adding "missing" information such as an Adjective PoS section to cover attributive use of an English noun, but that problem seems to be declining. DCDuring TALK 17:21, 20 October 2014 (UTC)
But we don't have to indicate the "propriety" of nouns by having "Proper noun" considered a separate POS. We could tag nouns {{lb|en|proper}} or {{lb|en|common}}, for example, the way we already label verbs {{lb|en|transitive}} or {{lb|en|intransitive}}. It isn't "dumbing down" the presentation of the language to aim for accuracy as well as precision. —Aɴɢʀ (talk) 18:34, 20 October 2014 (UTC)
You have now taken a position that is better defined than your initial posting, which expressed opposition to proper noun headings and categories. And your initial posting included "the problem that there is no real clear cross-linguistic definition of what constitutes a proper noun", which seems like the kind of cross-linguistic uniformitarianism that is often proposed here and which is probably what has won you CodeCat's support.
Your statement above that 'having separate headings and categories for "Proper nouns" as opposed to "Nouns" is a bit like having separate headings and categories for "Transitive verbs" as opposed to "Verbs"' implies that you are opposed to such headers and categorization in the case of entries that are now proper nouns. But we have categorization of "Intransitive verbs". Are you really opposed to that as well? The ratio of English proper noun entries to total English noun entries is even smaller than the ratio of intransitive English verbs to total English verbs, so the category is arguably more useful. Given our current "efficient" method of implementing labels, we cannot use "what links here" and a template to construct a list of items so labeled, leaving us with only categories, programs run on dump runs, and text searches as ways of constructing such lists from labeled definitions. Speaking from extensive and recent experience, I can say that text searches are not fully satisfactory and that programs run on the XML dumps are inconvenient for many ad-hoc purposes.
Are you opposed to the proper noun category as well as to the proper noun heading? Are you in favor of proper labeling of individual definitions before the proper noun heading is eliminated? Are we sure that proper labeling does not require manual review? Who do you propose do the checking and conversion? DCDuring TALK 19:15, 20 October 2014 (UTC)
I'm not proposing anything yet; at this point all I want is discussion. I do want to consider getting rid of the L3 header, but you're right that parallelism with transitive and intransitive verbs does suggest retaining Category:English proper nouns as well as creating Category:English common nouns. As for a cross-linguistic definition, I'm not even talking about languages that aren't considered to have the proper/common distinction (though I'm not aware of any languages that don't), I'm talking about a definition that would apply to all languages that are considered to have both kinds of nouns. Even for such syntactically similar languages as English, French, and German I don't know how to define "proper noun" in a way that will apply to all three languages. And if each language has to have its own language-specific definition, that's a good indication to me that the concept of "proper noun" has no linguistic basis at all and is useful only for pedagogy. And if it turns out there is no adequate definition of "proper noun", then we shouldn't use the label template or the category at all. What do other dictionaries do? Do other dictionaries label proper nouns separately? What criteria do they use? For that matter, what criteria do we use? Why are AB-yogurt and air chief marshal proper nouns? —Aɴɢʀ (talk) 19:36, 20 October 2014 (UTC)
I'd be willing to stake my reputation as a linguist on there being massive overlap among the sets of things considered as proper nouns in all languages. Many folks don't act as if taxonomic names are proper nouns, but most theoretical taxonomists seem to. And then there is the proper name/proper noun distinction. DCDuring TALK 21:48, 20 October 2014 (UTC)
But "being considered a proper noun" isn't a definition. And I'm not sure there's even always overlap within the same language. For example, we call language names like Latin and Sanskrit proper nouns, just like names like Noah and London. But the American Heritage Dictionary, which gives no part of speech info for Noah and London, labels Latin and Sanskrit "n.", which they otherwise do only for common nouns. So are language names proper or common? What usage of taxonomic names indicates that theoretical taxonomists treat them as proper nouns? (That's an actual question, not a rhetorical one.) Considering our first definition of [[proper name]] is "proper noun", I wonder what the distinction between the two is supposed to be. —Aɴɢʀ (talk) 22:12, 20 October 2014 (UTC)

Support tentatively (but I will see how the discussion goes), even if it causes Japanese, Chinese (only Mandarin, Min Nan/Min Dong and Hakka) and Korean transliterations to become lower case (various dictionaries use different standards for capitalisation in these languages; place and personal names are usually capitalised, but not by all dictionaries). There's definitely no need to treat language names, demonyms, month and weekday names as proper nouns. Various languages here just follow English when using proper nouns. Transliterations, which are never capitalised, don't need and don't benefit from this distinction at all. E.g. Arabic nouns are just nouns. --Anatoli T. (обсудить/вклад) 22:59, 20 October 2014 (UTC)

@Atitarev: Actually, in Arabic there is a very important distinction between proper and common nouns. Proper nouns are automatically definite and never take the definite article الـ (al-) or possessive suffixes, and usually do not take nunation, in which case they also have a slightly different declension pattern. For example: مِصْرُ الْقَدِيمَةُ (miṣru al-qadīmatu, Ancient Egypt) and فِي مِصْرَ الْقَدِيمَةِ (fī miṣra al-qadīmati, in Ancient Egypt). The same applies to Hebrew and Aramaic. --WikiTiki89 21:05, 21 October 2014 (UTC)
Proper nouns never take the definite article in Arabic? So العراق‎, السعودية and الإسكندرية are common nouns? People sometimes make the same claim about English, that proper nouns never take the definite article, but then Netherlands, Gambia, and Philippines (not to mention Ukraine and Crimea in more old-fashioned varieties) would have to be called common nouns. —Aɴɢʀ (talk) 22:29, 21 October 2014 (UTC)
Well in those cases, they don't take another definite article because the definite article is part of the proper noun. For your English examples, I would say that "the Netherlands" is the proper noun, while just "Netherlands" is an incomplete proper noun (or the plural of "Netherland"). --WikiTiki89 22:45, 21 October 2014 (UTC)
Yes, some proper nouns may become diptotes, but this probably has to do with their definiteness rather than the fact that they are proper nouns. The thing is also, not ALL proper nouns are diptotes, e.g. (with full vowelisation) مُحَمَّدٌ (muḥammadun) and, as Angr mentioned, they can also take a definite article, as in العِرَاق (al-ʿirāq) "Iraq" and الأُرْدُنّ (al-ʾurdunn) "Jordan", although the nisba doesn't have it: عِرَاقِي (ʿirāqī) "Iraqi" and أُرْدُنِي (ʾurdunī) "Jordanian". There are some rules about which proper nouns can be diptotes: the length, whether they are loanwords or native Arabic, the endings, certain patterns (e.g. "fuʿal"). --Anatoli T. (обсудить/вклад) 12:35, 22 October 2014 (UTC)
My whole point was that their definiteness (more so, the fact that they cannot be made indefinite and cannot take possessive suffixes) is what makes them proper nouns. Nisbas are not proper nouns, so I don't see how they are relevant. You cannot, for example, say مِصْرُكَ (miṣruka, your Egypt) or عِرَاقُكَ (ʿirāquka, your Iraq); or if you do say that, then you are turning it into a common noun. As for مُحَمَّدٌ (muḥammadun), I did use the word "usually" for a reason. --WikiTiki89 12:56, 22 October 2014 (UTC)
Well, it's obvious that proper nouns, like unique place names, are definite, but I personally don't see this as a grammatical difference that would separate them as proper nouns: they can sometimes take a definite article, they can also take possessive suffixes (converting to common nouns, if you wish), and they can sometimes be triptotes (and common nouns can be diptotes). These features are not reliable (and are also hard to verify, since ʾiʿrāb is seldom written and not often pronounced in full). I found some rules for diptotes for proper nouns, but my source doesn't mention how many are triptotes, so I'm not sure if the list is big. My nisba examples were just to show that الـ (al-) is not part of the word. Since Arabic grammarians do mention Arabic proper nouns, I'll drop this point specific to Arabic. I only think that language names and nationalities should be common nouns in Arabic, reserving proper nouns for place, people's and company names. --Anatoli T. (обсудить/вклад) 14:24, 22 October 2014 (UTC)

A much simpler solution would be to rename the current Noun to Common noun. I would strongly oppose the introduction of an Intransitive verb POS, but I think it's very helpful to readers to keep both POSs when they are meaningful in the language, these two kinds of nouns being used very differently. The precise limit between proper nouns and common nouns only depends on tradition in each language (e.g. we consider italien (the language), septembre or Parisien, a capitalized word, as common nouns in French). Note that, generally speaking, all proper nouns can be used as common nouns (but this does not make them common nouns), and common nouns can be used as proper nouns, this cannot be considered as ambiguity. Lmaltier (talk) 20:26, 21 October 2014 (UTC)

  • We need to indicate whether a noun is common or proper in some way. Whether this is in the POS heading or somewhere else makes little difference, but it seems that the POS heading is the most obvious and best place for it. Verbs do not need transitive/intransitive distinctions as much because it is usually obvious from the definition. --WikiTiki89 21:05, 21 October 2014 (UTC)
    • What's the evidence that the two kinds of nouns are "used very differently"? They seem to be used exactly the same way to me: as the subject or direct object of a sentence, as the object of a preposition, etc. Why do we need to indicate this apparently undefinable and artificial distinction? And if we do, why is the POS heading the most obvious and best place for it? To the extent the distinction actually exists, it's usually obvious from the definition too. —Aɴɢʀ (talk) 22:29, 21 October 2014 (UTC)
      • In some languages, it's clear that they are used very differently, and that they are very different from the reader's point of view. In French, the article is usually used with common nouns, not with proper nouns (it's much less simple, e.g. the definite article is normal with most country names, but this is the general idea). Lmaltier (talk) 05:49, 22 October 2014 (UTC)
        • The only thing that's clear to me so far in this discussion is that many languages have nouns that are definite without the markers of definiteness that are usual in that language, such as being governed by a definite article, a possessive determiner or the like. But in none of the languages discussed so far is that set of nouns exactly coterminous with a set of nouns that can be defined by a semantic property such as being the name of a person, geographical location, language, etc. In English, Arabic, and German, most geographical names don't use the definite article, but some do, and statements like "the definite article in the Netherlands is part of the name" are simply begging the question. In Irish, most language names do use the definite article except in certain constructions, but at least one (Béarla (English)) never uses it. So if we want to label nouns by this property at all, we should label them as being definite even without a definiteness marker, rather than implying that there is some sort of semantic property that causes nouns to be "proper nouns" and that their syntactic behavior results from that. —Aɴɢʀ (talk) 15:13, 22 October 2014 (UTC)
          • No, no, not at all, the only possible criterion is the tradition in the language. It was only an example to show that being a proper noun often has a major impact on the use, including grammatical rules to be used. Lmaltier (talk) 17:29, 22 October 2014 (UTC)

Another argument is that paper dictionaries including both common nouns and proper nouns sometimes have a fully separate part for proper nouns (it's the case of a best-seller dictionary for French: Petit Larousse Illustré). Readers may be used to this clear separation. Lmaltier (talk) 17:35, 22 October 2014 (UTC)

  • I don't think "tradition in the language" is a reason at all, especially since the vast majority of the world's languages don't have a tradition about it one way or the other. If the distinction between common nouns and proper nouns is linguistically real, it must be possible to come up with a definition that applies to all languages regardless of traditional grammars. —Aɴɢʀ (talk) 18:42, 22 October 2014 (UTC)
    • I don't think so. In any case, stating that the French nouns poker, septembre or arménien (the language) are proper nouns would clearly be wrong. They are not proper nouns in French. Lmaltier (talk) 18:58, 22 October 2014 (UTC)
      • But why not? What definition of "proper noun" are you using to determine that? Capitalization alone? Because if that's the only criterion that can be used to distinguish proper nouns from common nouns, then the distinction is definitely nonlinguistic. —Aɴɢʀ (talk) 19:40, 22 October 2014 (UTC)
        • No, we consider Parisien as a common noun in French, too, despite capitalization. When I refer to tradition of the language, I mean that the general meaning is always the same (see proper noun), but how it's interpreted precisely may depend on the language in some cases (in most cases, it's the same in all languages recognizing proper nouns as a word category). Lmaltier (talk) 19:52, 22 October 2014 (UTC)
          • So the distinction is made on the basis of native speakers' intuitions? A noun is a proper noun because it feels like a proper noun? —Aɴɢʀ (talk) 19:59, 22 October 2014 (UTC)
            • This intuition is based on the tradition of the language, on how specialists of the language usually consider the word. In French, traditionally, proper nouns are names of places, people (and peoples), companies, brands, historical events, works of art or books, not much more. Sometimes, we hear about proper adjectives in English (seemingly according to capitalization), this word is meaningless in French. Lmaltier (talk) 05:59, 23 October 2014 (UTC)
              • So still no definition, just an appeal to authority. I'm becoming more and more convinced there's no such thing as a proper noun. —Aɴɢʀ (talk) 12:05, 23 October 2014 (UTC)
                In French the definition is simple: a proper noun is used to describe a unique being or thing. Every modern French dictionary unambiguously distinguishes proper nouns from common nouns: Larousse, Robert, TLFi, Dictionnaire de l'Académie française... Just because you can't find a universal definition for a proper noun doesn't mean that you can ignore this distinction when it is part of a language like French. Dakdada (talk) 13:10, 23 October 2014 (UTC)
                • If my family owns one dog and my mother says "Have you fed the dog?", then "the dog" refers to a unique being; does that make it a proper noun? What about language names like arménien mentioned above? Is that not a unique thing? Then why is it not a proper noun in French? Just because dictionaries invent distinctions to make life easier for language teachers, that doesn't mean those artificial distinctions are actually part of the language. —Aɴɢʀ (talk) 13:49, 23 October 2014 (UTC)
                  That's just the + dog. It doesn't change the fact that dog is a common noun. Language names are debatable, but obviously I can't convince you if you really don't see any difference between e.g. city and London. Dakdada (talk) 16:30, 24 October 2014 (UTC)
The kind of thing that a proper name names can include a lineage (real, hypothetical, or conventional), as a Roman gens or a taxon. It can include a people, race, tribe, breed, family?, etc, even when they are not lineages. All of these can be plural in form, but they are considered to be referring to a single entity. Such a word, whether singular or plural, when referring to an individual member or subset of any such grouping, seems to me to be a common noun.
More generally it is a question of convention, as almost all actual language is, as opposed to part of some ephemeral rational scheme, purported to be universal and timeless, but actually just a hypothesis.
If a given definition has exceptions, that does not invalidate the definition, which is usually of the typical member of the class. Wittgenstein's discussion of game (or was it Spiel?) should be informative. DCDuring TALK 14:29, 23 October 2014 (UTC)
A distinction can be made for analytical purposes (not, IMO, for Wiktionary presentation purposes) between proper names and proper nouns. Mary is a proper noun, sometimes serving as a proper name (where the context makes it sufficient to uniquely identify the individual) and sometimes as part of a noun phrase (Mary Ellen Smith) that serves as a proper name in other contexts (but not necessarily all possible contexts). That White House is a proper name, which we present as a proper noun, does not make House a proper noun or proper name. House is a proper noun by virtue of its use as a surname.
It is hard for me to believe that the request for a definition is anything but a rhetorical ploy, as such definitions are abundant and adequate for most purposes. If we need something more for purposes of knowing what goes under a given language's Proper noun heading or into the category, we can impose the host language's conventions, either universally or by default, allowing exceptions for the conventions of other languages. We already allow orthographic departure from English usage and certainly don't impose English grammar (e.g., use of determiners) on other languages, not even PoS headers, useful though they may be. If someone would like to document the proper noun/proper name practices of a language in an appendix, they would be doing the project a service. DCDuring TALK 14:29, 23 October 2014 (UTC)
No, the request for a definition is not a rhetorical ploy. I'd genuinely like a definition because I am often uncertain whether to label a particular noun as a ===Proper noun=== or not, especially in languages other than English. Usually I simply have to rely on how the English equivalent is labeled. Most conventional definitions seem to be circular and therefore useless, as in: "When is a noun capitalized in English? When it's a proper noun. OK, so when is a noun a proper noun in English? When it's capitalized." Either that or hopelessly vague, as in "a proper noun is the name of a specific, unique being", which doesn't explain why The Hague is a proper noun that just happens to include the word the, but the dog is a common noun made definite by the presence of the definite article. —Aɴɢʀ (talk) 17:23, 24 October 2014 (UTC)
The Hague is the name of a particular city. The dog is not the name of a particular dog (just the + dog). It has nothing to do with the definite article or the capitalization, which are secondary and language related. If you want definitions, what about w:Proper noun? Dakdada (talk) 17:53, 24 October 2014 (UTC)
If a distinction can be made between definite and indefinite reference, then it's a common noun. Otherwise it's a proper noun. If both, then it's both. --Ivan Štambuk (talk) 18:38, 24 October 2014 (UTC)
I can't venture anything about languages other than English.
Not all capitalized words or expressions in English are proper nouns. You should discard with prejudice any reference that says otherwise.
The Hague (sometimes the Hague) is a proper noun because of its definition. I expect that it has the attached because it is a calque of den Haag.
The is attached to Netherlands in running text (but not in mailing addresses, etc.), probably because of the historical Nether Lands, whether factual or imagined.
In English it is usually not too hard to distinguish in current and recent usage between a definite expression (usually with the) that describes or characterizes something and a proper name that includes the. But it was not too long ago that an expression like "John, sawyer" served to uniquely identify someone on parish rolls.
In English the incompatibility of a proper name with a or any or every seems more indicative than the presence of the.
In English the hand of history and fashion is very visible. Usage dictates. How each usage gets started or terminates can be a very particular story. As a result I don't think there is a short list of rules and exceptions that covers all the cases. That is why WP needs a style sheet that documents its decisions about capitalization and why the taxonomic naming authorities have explicit rules. And why users need dictionaries and style guides. Wiktionary can do a better job of providing such lexical information than other references if we continue to be willing to do so. We can check corpora and style guides so users who trust us don't have to. DCDuring TALK 18:44, 24 October 2014 (UTC)

Small, doable modification to WT:CFI#Idiomaticity

WT:CFI#Idiomaticity sentence #1:

An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components.

Change this to

A multi-word term is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components.

The changed part here is A multi-word term.

Rationale: WT:CFI does not define what an expression is, and the Wiktionary entry expression isn't any help either. Some multi-word terms like come in may not be considered expressions. Multi-word term is vastly better than term, because term could include single words with transparent meanings, like improvable, points (plural of point), reenter (enter again) and so on. I'm canvassing to see if there's enough support to make a vote of it. Renard Migrant (talk) 12:01, 23 October 2014 (UTC)

The sentence is better, but is it really useful anyway? Idiomaticity of multi-word terms should not be a condition for inclusion. ice hockey cannot be considered as idiomatic. Nonetheless, it's a term of the English language, and including it is therefore normal. Lmaltier (talk) 16:54, 23 October 2014 (UTC)
Support. I think it is useful. — Ungoliant (falai) 17:16, 23 October 2014 (UTC)
Lmaltier, I appreciate your input, but we also know from past experience it's just you that thinks this. Also I do consider ice hockey idiomatic. It has very different rules to hockey. Like, is table tennis merely tennis played on a table? I certainly don't think so! Renard Migrant (talk) 11:52, 24 October 2014 (UTC)
Of course. Nonetheless, the meaning can be easily derived from the meaning of its separate components (provided you know the sport, even without knowing its name). I copy the definition of idiom: An expression peculiar to or characteristic of a particular language, especially when the meaning is illogical or separate from the meanings of its component words. table tennis is not something peculiar to English or characteristic of English, and its meaning is not illogical nor separate from the meanings of its component words. You understand why I don't like this sentence as a criterion. Lmaltier (talk) 18:18, 24 October 2014 (UTC)
  • I have to oppose. I think the term "expression" was intended to cover both single words and multi-word terms. The new wording would not do that. Therefore, the new wording would no longer define what CFI:idiomatic means for single words like "redefine". Right now, "redefine" is idiomatic because its components are not separate enough. --Dan Polansky (talk) 17:41, 24 October 2014 (UTC)
    • But that interpretation is not the status quo. The status quo, although it's an unwritten rule, is to accept all single words (for varying interpretations of "word") as idiomatic regardless of morphological transparency. Or to say it another way, idiomaticity is not a factor in the inclusion of single "words". —CodeCat 18:23, 24 October 2014 (UTC)
      • What I have written is consistent with current common practice. For instance, we include "blueness", since while "blueness" is clear from "blue" and "-ness", the two are not separate, which matters for "An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components." --Dan Polansky (talk) 22:39, 24 October 2014 (UTC)
        • I can see this interpretation, it just wouldn't be my interpretation; blue and -ness are separate. "Separateness" doesn't mean "separated by a typographical space". Renard Migrant (talk) 13:15, 25 October 2014 (UTC)

Dialect context labels - adjective, dialect name or place name?

There's something vaguely weird on croggan: The first sense is described as "Cornish" while the second is "Scotland", and the mixing of parts of speech stands out a bit. This isn't an isolated thing - among other British Isles dialects, we have "Wales", "Ireland", "Teesside" and "Yorkshire", but "Geordie" (rather than "Newcastle-upon-Tyne" or "Tyneside"), "Bristolian" (rather than "Bristol"), "Manx" (rather than "Isle of Man"), "Northumbrian" (rather than "Northumbria") and "Liverpudlian" (rather than "Liverpool", "Merseyside" or "Scouse").

I understand why we can't use (for example) Welsh or Irish as context labels in English-language entries (and by that logic, "Manx" is probably inappropriate too since there's a Gaelic Manx language), but the mishmash is a bit strange. Would people object to changing the labels to follow this pattern?

We use the proper name of the city/region that spawned it, except for in the handful of cases where the dialect has a widely-understood name that is not etymologically related to its origins (Geordie, Pitmatic, Cockney, Scouse - possibly Cajun, although I don't know whether everything currently tagged "Louisiana" is actually Cajun English.)

It just seems a bit cleaner that way. "croggan" would then be (Cornwall, Scotland), "mam" would be (Scouse, Northumbria). Smurrayinchester (talk) 16:40, 23 October 2014 (UTC)

I think that using the adjective could be more practical. It would allow us to distinguish terms used in a place from terms used in the context of discussing a place. —CodeCat 16:45, 23 October 2014 (UTC)
I prefer using placenames. Using placenames in context labels for senses discussing the place is usually confusing and can always be improved by removing the label and amending the definition (i.e., at ABC “(Brazil) [] cities [] that form the most important industrial area in the country.” → “(geopolitics) [] cities [] that form the most important industrial area in Brazil.”). — Ungoliant (falai) 17:11, 23 October 2014 (UTC)
When we had context labels rather than a module, we used to redirect things like {{Scottish}} to {{Scotland}} so that both displayed (Scotland). I see no reason to discontinue this. Having said that, an adjective is better if it's more accurate or easier to understand, so Geordie rather than Tyneside; I'm fine with that. Renard Migrant (talk) 11:55, 24 October 2014 (UTC)
We still do that, only everything is within the module. --WikiTiki89 14:52, 24 October 2014 (UTC)
Good, then let's keep doing that, unless people don't want to. Renard Migrant (talk) 13:17, 25 October 2014 (UTC)
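
For illustration only, the aliasing described in this thread (an adjective form such as "Scottish" displaying as the place-name label "Scotland", while a widely understood dialect name such as "Geordie" stays canonical) could be sketched roughly as below. This is not the actual module code: the table contents and the names LABEL_ALIASES, canonical_label and format_labels are hypothetical, and the "Cornish" and "Tyneside" mappings simply assume the pattern proposed above.

```python
# Hypothetical alias table: maps alternative label names to the canonical
# label that should be displayed. The entries are assumptions based on the
# pattern proposed in this thread, not the real label data.
LABEL_ALIASES = {
    "Scottish": "Scotland",   # adjective -> place name
    "Cornish": "Cornwall",    # adjective -> place name
    "Tyneside": "Geordie",    # place name -> widely understood dialect name
}


def canonical_label(label: str) -> str:
    """Return the canonical display label, or the label itself if no alias exists."""
    return LABEL_ALIASES.get(label, label)


def format_labels(labels: list[str]) -> str:
    """Resolve aliases, drop duplicates (keeping first-seen order), and format for display."""
    seen: list[str] = []
    for label in labels:
        canon = canonical_label(label)
        if canon not in seen:
            seen.append(canon)
    return "(" + ", ".join(seen) + ")"
```

With such a table, the two senses of croggan would display consistently whichever form an editor typed, e.g. format_labels(["Cornish", "Scotland"]) gives "(Cornwall, Scotland)".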

Black's Law 2d going up at Wikisource

Just a heads up - I am currently creating OCR pages of Black's Law Dictionary, 2d Edition (1910) at Wikisource, and would eventually like to bring as much of it as is useful over here. Cheers! bd2412 T 21:02, 23 October 2014 (UTC)

Cool! Maybe you should make a Template:Black's 1910 or something, similar to {{Webster 1913}}, for entries taken from it. —Aɴɢʀ (talk) 21:26, 23 October 2014 (UTC)
Yes, that is a very good idea. bd2412 T 21:27, 23 October 2014 (UTC)
@BD2412: That is excellent news. Thank you for your efforts. — I.S.M.E.T.A. 23:27, 23 October 2014 (UTC)
Cool. Even if you don't bring it here. DCDuring TALK 00:27, 24 October 2014 (UTC)
Maybe you could link all the terms here, like [7] (a lot of work!) DTLHS (talk) 00:53, 24 October 2014 (UTC)