Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!


June 2014

Template:cx vs. template:context

I prefer {{cx}} over {{context}}. Nonetheless, MglovesfunBot (talk • contribs) is replacing the former with the latter. Was there a discussion from which this action follows? Any links to the discussion? --Dan Polansky (talk) 09:25, 3 June 2014 (UTC)

They're equivalent, one is a shortcut to the other. So there is no problem with replacing one with the other. —CodeCat 11:35, 3 June 2014 (UTC)
Is there a discussion supporting this? They are not visually equivalent in the wiki markup; one is much shorter, so the actual context like "colloquial" is visually much more outstanding with it. --Dan Polansky (talk) 12:27, 3 June 2014 (UTC)

Prescriptivism as to common lay transliterations

According to Wiktionary:Neutral point of view: "On Wiktionary, neutrality directly implies that a descriptive approach is taken towards the documentation of languages, and not a prescriptive approach. This is one of the primary tenets of how Wiktionary works". We do not adhere to this principle, however, when it comes to common lay transliterations (i.e. commonly used transliterations of terms originating in foreign scripts, created without necessarily having or following an authoritative scheme of transliteration). This is exemplified by entries like tovarich (English, really?), ayubowan (decided through RfD to be kept as an English word derived from Sinhalese), and the current discussion at Wiktionary:Requests for deletion#mahā.

We generally decide whether any unbroken string of letters is "a word" by looking to see if it is used in print to convey a consistent meaning. Our CFI is built around this principle. We do this because the existence of the word in print is what makes it likely that a reader will come across it and want to know how it is defined, or possibly how it is pronounced, derived, or translated into other languages. I see no reason consistent with our CFI or our NPOV tenet to exclude any unbroken string of letters used in print to convey a consistent meaning, and certainly not on the basis that this string of letters was not formed by some official arbiter of transliteration. I would propose that our current CFI and NPOV language requires that we include attested words created by lay transliteration, whether or not these words appeal to our own sense of propriety. bd2412 T 19:03, 3 June 2014 (UTC)

Even transliteration from Latin to other scripts? — Ungoliant (falai) 20:19, 3 June 2014 (UTC)
Like フロリダ州 and Флорида? bd2412 T 20:21, 3 June 2014 (UTC)
Those are written in their native scripts. — Ungoliant (falai) 20:29, 3 June 2014 (UTC)
Like гуд морнинг. — Ungoliant (falai) 20:38, 3 June 2014 (UTC)
Those are hardly their "native scripts"; they are transliterations which have been adopted into the language, probably through length of use. As for "гуд морнинг", that's a two-word phrase. I am referring to the "unbroken string of letters", so in addition to use, idiomaticity of the transliterated phrase would need to be shown. Assuming it can be, the question then is whether "гуд морнинг" is used in print to convey a consistent meaning over a sufficient span. If it is, then it is entirely plausible that a reader might come across it and want to know its meaning. This is scarcely different from including non-Latin eye dialect entries like падонки and キレる (or Latin eye dialect like dayum and innerduce, for that matter). I assume that at some point someone will make an entry for тверкинг, also. bd2412 T 21:27, 3 June 2014 (UTC)
Yes they are; Katakana is a native script of Japanese and Cyrillic is the native script of Russian. フロリダ州 is a Japanese word that has been loaned from a language whose native script happens to be Latin, not a mere transliteration like гуд морнинг (or just морнинг, if you prefer), which is the English phrase (good) morning written in Cyrillic script instead of Latin. If you think that гуд морнинг occurs in Russian as a loanword from English, feel free to add it as a Russian entry. My question is: if your proposal is accepted, would we create things like an English entry for морнинг? — Ungoliant (falai) 23:12, 3 June 2014 (UTC)
Is морнинг even attested? I can see that it exists, but since I cannot read Russian, I have no idea whether it is attested with the same meaning as morning in English, or whether the cites that exist are even uses as opposed to mere mentions. Assuming that all of these criteria are satisfied, and there are a CFI-worthy number of uses of морнинг in running text consistently conveying the meaning "morning", then we should have an entry defining the term for the benefit of the reader. Should it be defined as English? It seems absurd to call it Russian when it is merely an English word written in Russian characters. If a new kind of entry is required to accommodate the existence of such words, then we need to put one in place. We would not be able to claim that Wiktionary is a descriptive work, rather than a prescriptive work, if we were to pretend that "морнинг" did not exist, or conveyed no intelligible meaning. bd2412 T 23:59, 3 June 2014 (UTC)
I'm not quite following the purpose of this discussion. What is this for? Is it about allowing transliterations in various languages? We have allowed, on a limited basis, Roman transliteration of a few languages, after a vote or by consensus. I don't think we should extend this to any non-Roman-based language, unless they are a part of another language. New additions should be allowed only after a vote and should never be on the same level as terms in the native script. --Anatoli (обсудить/вклад) 00:39, 4 June 2014 (UTC)
Wouldn't that require us to repeal the more fundamental policy of the above-quoted language of Wiktionary:Neutral point of view? After all, it is a purely prescriptivist position to exclude attested words (using our definition of "word": A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern). bd2412 T 00:52, 4 June 2014 (UTC)
Yes, believe it or not we don't just record every attested combination of letters ever put on paper (and yes, that means we have a point of view and that we aren't purely descriptivist). DTLHS (talk) 00:55, 4 June 2014 (UTC)
If a word is attested in a language, then it can be included in that language in that script, not its transliteration. Transliterations are also attestable, but their use is, well, mere transliteration, used when it's impossible or difficult to use proper native scripts or to teach the script or pronunciation. sijakhada is only a transliteration of Korean 시작하다. It's useful and can be found in published books, but its purpose is different. Transliteration is not a substitute for native scripts. --Anatoli (обсудить/вклад) 01:02, 4 June 2014 (UTC)
How are you defining "word" to permit that distinction? Certainly not the way it is defined in our own corpus. bd2412 T 01:10, 4 June 2014 (UTC)
It depends on the word and on the language, and on what writing systems are used in a given language. For example, "Я читаю книгу, а она смотрит телевизор." consists of Russian words (I'm reading a book and she's watching TV.); "Ja čitáju knígu, a oná smótrit televízor" is the transliteration of the Russian phrase, and its parts are not words. यह लड़की बहुत सुंदर है । is a Hindi phrase (This girl is very beautiful.). Hindi is written in Devanagari. "yah laṛkī bahut sundar hai." is a transliteration of the Hindi phrase; none of yah, laṛkī, bahut, sundar, hai are Hindi words, and none of Ja, čitáju, knígu, a, oná, smótrit, televízor are Russian words, even if it's a standard transliteration and can be attested. There can be a plethora of standard, chat, practical, textbook, and dictionary-specific transliterations. Is that enough? I don't know if I can explain it better. --Anatoli (обсудить/вклад)
How does this help our readers when they come across such things? None of this explains how, for example, laṛkī or televízor are not distinct units of language with a particular meaning. They have as strong a claim to being words as ндрав. There must be some way that we can assist readers who come across these in print (assuming they occur with a sufficient degree of attestation to meet our CFI) in understanding their meaning. If we can't, if readers must turn to some other resource to determine the meaning of these, then we're not fully functioning as a dictionary. I would not oppose stricter constraints on the number and type of references to be required for such things, or a form of presentation that makes it clear that the transliteration is not the native form, but there must be some way to avoid turning a blind eye to the existence of these lexical units and their potential to require defining for our readers. This is perhaps even more pressing in the case of a word like bahut, for which we have a definition that will only confuse and frustrate the person reading transliterated Hindi, if it exists. bd2412 T 02:13, 4 June 2014 (UTC)
It may not be possible to cover all possible transliterations, including scientific, phonetic, practical (čto vs što vs shto for Russian "что"), chatroom (Arabic "3iid mubaarak" for عيد مبارك instead of "ʿīd mubārak"), with or without stress indication (televízor vs televizor), with or without vowel length indication (mahā vs maha), suppressing unpronounced letters or transliterating them as they are written (laṛkī vs laṛakī). If users are not able to separate proper words in a proper script from their transliterations/transcriptions, or loanwords from a phonetic representation, they may not be able to use our dictionary. We have an advanced search facility, and it's more of a technical question than a policy one. --Anatoli (обсудить/вклад) 02:34, 4 June 2014 (UTC)
I don't see why this is any less possible than covering all untransliterated words (particularly if we are including common misspellings and eye dialect terms in multiple languages). However, my concern is focused on words found in books in print, and particularly on words used in running English text without italics, quotation marks, or other distinct presentations designed to indicate their "foreignness". I think the typical reader can be forgiven for not being able to separate those words from proper words in a proper script. Such a limitation would obviously substantially shrink the set of words to consider, probably excluding most of the examples that you have provided here. Would that bring us closer to a resolution where you would feel comfortable having some commonly used "unofficial" transliterations? bd2412 T 02:45, 4 June 2014 (UTC)
Wait, do I understand correctly that you've just said you want to take certain things that are "used in running English text without italics, quotation marks, or other distinct presentations designed to indicate their 'foreignness'" and include them as romanizations/transliterations rather than as English? If so, what basis do you have for saying that strings used in running English text without any indication of foreignness are not English, other than a prescriptivist view of what constitutes English? - -sche (discuss) 03:20, 4 June 2014 (UTC)
I want to include them, period. I'm not picky about how, but I think it's silly to treat mahā as an English word when it is being used no differently than महा would be, except that various authors have transliterated it so their readers will find it more familiar. bd2412 T 03:38, 4 June 2014 (UTC)

I still don’t get what is being proposed, BD. We include all attested words, whether their spelling is determined by standard or non-standard transliteration, transcription, or whatever else (which is probably impossible to determine in most cases). As far as I know, there is nothing written or unwritten preventing the use of ones “created by lay transliteration”. Tovarich, for example, might be based on a French transcription of Russian or something. It doesn’t matter. It is included as English because (I presume) it is used three times in English sources. Michael Z. 2014-06-05 09:33 z

  • I recently proposed to restore the previously deleted mahā, a widely used transliteration of a Sanskrit word. In the discussion, some editors are opposing the inclusion of this word on the grounds that it is "not a word" because it is a transliteration from Sanskrit. I think it should be included in some form, and don't see why we include transliterations of Chinese and Japanese, for example, as Chinese and Japanese, but will not include this at all (perhaps not even as English). bd2412 T 11:56, 5 June 2014 (UTC)
I have four separate issues about this. For one, I think when something like Moskva comes up in English text, whether we consider it English or Russian is not so important as the fact that someone might want to look up that spelling. This applies to relatively rare cases, with words that are sort of edging across boundaries. On the other hand, I'm not thrilled with the concept of citing linguistic transliterations into Latin for whatever reasons; between search and manual transliterations, users of such works should be able to find the words they're looking for, and it seems like that could lead to a huge boost in the number of words. On the flip side, I do believe that we should use scripts that the language is actually published in. Gothic, for example, is published in Latin script; I do not believe that non-trivial amounts of the language have ever been published in Gothic script. We should (and do) support Gothic in the script that it's published in, however much purism might push us toward the script it was once written in. I don't know enough about Sanskrit; I believe it can actually be published in any number of Indic scripts, and if people actually read Sanskrit in Latin script, then Latin script, as well as all of the Indic scripts it's actually read in, should have entries for Sanskrit. Lastly, I'm more worried about this the harder the script is. Russian or Greek transliteration shouldn't be hard to get back into its native script, but ideographs are generally going to be impossible, with Sanskrit being challenging for many users but hopefully not impossible.--Prosfilaes (talk) 01:03, 6 June 2014 (UTC)
I find the concept that mahā is not a word but महा is to be confusing. They're both a unit of communication clearly corresponding to the same Platonic word. abba, 𐌰𐌱𐌱𐌰, and αββα are the same Gothic word, practically spelled with the same letters in a different font (and there's a serious argument that the Gothic script is merely Greek with a few extra letters and should be treated that way[1]; in practice it's treated as Latin with a few extra letters, because it's Germanic and thus was handled by German philologists.)--Prosfilaes (talk) 01:03, 6 June 2014 (UTC)
The issue is not at all abstract to me. Maha is a disambiguation page on Wikipedia. I recently set about fixing the large number of links to that page, and discovered that most of them were from articles referencing mahā. Most did not provide the Sanskrit script, and the editors who wrote them may well have been unaware of its presentation in Sanskrit script, because there are plentiful sources using only the transliteration. I initially thought maha would have the answer, but it did not. It took an extensive amount of poking around, and my ability as an administrator to see the content of the deleted page mahā, for me to figure out what was going on here. bd2412 T 01:49, 6 June 2014 (UTC)
Yeah. I think part of the problem is that we have incredibly strict formatting and structure; we generally require that each word belong to a precise language, that each sense belong to a precise POS, that each quotation belong to a precise sense, that everything be templatizable and Luacizable and MewBottable six ways from Sunday. The strictness of structure not only appeals to the aesthetic sense of programmers (myself included), but also has practical benefits; but it also has drawbacks, one of which is that it does not allow us to do a great job capturing the messy reality of real language, and it forces us to adopt POVs when we would rather not (or at least, should rather not). —RuakhTALK 02:35, 6 June 2014 (UTC)
I would include this with Moskva then; whether or not mahā is English shouldn't get in the way of providing an entry for a word that people may look up.--Prosfilaes (talk) 03:14, 6 June 2014 (UTC)
+1 to what Ruakh said.
A few entries already use ==Undetermined== as their language header. We could start using that header more, although I'm not sure that's the best solution (or even a good solution). - -sche (discuss) 03:35, 6 June 2014 (UTC)
It's not so much that the language is undetermined as that we seem compelled to treat anything transliterated and placed in English text as English. Suppose we had headers that literally read, e.g., ==Transliterated Sanskrit== or ==Transliterated Sinhalese== - would that be an improvement? bd2412 T 17:57, 6 June 2014 (UTC)
Nah, for things like mahā, I don't think we need any new L2 headers; the best approach IMO would be to do what we've done for all other languages for which we've wanted to include romanizations: have a vote to allow romanizations, and then have entries using the language's usual header (in this case, ==Sanskrit==). ==Undetermined== might still be worth considering for a few other things, though, which actually do slip "between the cracks" of other languages. For example, Gott in Himmel occurs only in supposedly-German snippets (often italicized) in English works; it doesn't occur, or at least isn't truly related when/if it occurs, in German works. (But even with that, I'm not sure an ==Undetermined== header would be any less unsatisfactory than our current arrangement of calling it ==German== and/or the other obvious possibility, of calling it ==English==.) - -sche (discuss) 18:38, 6 June 2014 (UTC)
I would definitely support allowing transliterations of Sanskrit, generally, but that's a band-aid. There are still many other languages with common Latin-alphabet transliterations of forms originally written in other scripts: Russian, Ukrainian, Armenian, Arabic, Sinhalese, Hebrew, etc. This entire book collects stories in transliterated Sinhalese, peppered with English explanations. This one has Hebrew transliterated at length. If words in this running text can be found having the same transliteration in other sources, do we need a separate vote to allow Sinhalese transliterations? Won't the end result be a series of votes allowing all attested transliterations? bd2412 T 19:25, 6 June 2014 (UTC)
Someone could start a series of votes, but I don't think all the votes would pass. For example, I'm sceptical there would be support for allowing romanized entries for Russian or Ukrainian (especially given how very many romanization schemes there are; cf. Wiktionary:Beer parlour/2013/November#Including_multiple_transliterations.2C_from_multiple_systems.2C_in_entries). Having someone type a romanization into our search bar and be taken to an entry which defines the string as a romanization of [whatever] is only one of several ways of getting that person to the native-script form of the word. Another is having the search function find the romanization in the native-script entry and bring that up as a search result. (In this case, a search for mahā brings up మహా, महत् and महा as search results. A user who sees those search results gathered on the search page should, IMO, have no more trouble figuring out which one she is looking for than a user who sees those things gathered into an entry.) For some languages, another way of having the user find the native-script form is having an appendix (here or on WP) detailing the conversion between various romanization schemes and the native script. When the issue of romanizations of Phoenician and some other dead languages came up, my impression was that one reason so many people were OK with allowing romanizations of those languages to have entries is that those languages are indeed dead, and no-one is natively writing them in any script, and modern discussion of them often is in Latin script. For a language like Russian, which people are still natively writing in the Cyrillic script, I could see users preferring to skip having entries for the romanizations, and either use the search-bar functionality, or just rely on users to use the appendix to convert between romanization and native script. That's why I think it's useful to discuss each language on its individual merits (Anatoli seems to feel the same way). 
- -sche (discuss) 22:32, 6 June 2014 (UTC)
It seems to me that we are approaching two separate problems. Your take on this addresses the question of how we should provide readers with the ability to find the native-script entry for a particular romanization, which I grant is a reasonable question to consider. My take, however, is that if a "word" (broadly defined) is used in print to the extent that people might come across it and want it defined, we should have an entry for it, whether that word is violono or Portsmouth or mahā. The only remaining question is what form this entry should take. I am thinking, at this point, about drafting a proposal for a vote laying out a series of options (allow all such entries but treat them as loanwords, allow them and treat them as their language of origin, redirect them to the entry for the language of origin if there is a unique target, allow only select languages as approved individually by the community). bd2412 T 16:25, 7 June 2014 (UTC)

How is mahā different from fēngshuǐ?

We have an entry on fēngshuǐ which identifies it as a Mandarin romanization. However, it is easily possible to find examples of "fēngshuǐ" used in running English text. Why is this entry labelled Mandarin and not English? Should the existence of citations in English text lead us to have entries for both Mandarin and English "fēngshuǐ" (as we do for aloha)? If not, then why would a comparable entry on mahā be labelled English rather than Sanskrit? Why not both, since citations can be found both in running English and in long selections of transliterated Sanskrit? If we are not prescriptive, why do we care if there is an "official" transliteration system, so long as we know such a system is in use? With such inconsistencies in our coverage of transliterated terms, it seems to me that we should err on the side of at least covering well-attested phonemes. bd2412 T 00:36, 6 June 2014 (UTC)

This is why fēngshuǐ should be violently killed. Wyang (talk) 00:42, 6 June 2014 (UTC)
@BD2412 You have to be clear about what you're proposing, rather than demanding "I want to include them, period." Is it all about romanised Sanskrit, or specifically the form mahā?
  1. Do you suggest allowing romanised Sanskrit entries? Please specify which transliteration standard and what format. Why do we need them? E.g. Devanagari is too hard to learn/to type, there are too many homophones (I seriously doubt that). AFAIK, it has to be decided by a vote, the way Mandarin standard pinyin and Japanese rōmaji were decided. Note that pinyin and rōmaji entries are soft-redirects and disambiguations; they have no definitions, merely links to standard Chinese and Japanese forms. A pinyin or rōmaji entry is not allowed unless at least one corresponding Han character (or Japanese kana) entry exists.
  2. If it's an English word (borrowed from Sanskrit), citations should be provided. --Anatoli (обсудить/вклад) 00:52, 6 June 2014 (UTC)
@Wyang. I respect your opinion on the matter but... Well, that was decided by a vote, which actually heavily reduced the role of Pinyin, not without very strong opposition. Similarly, the vote on Romaji reduced the role of Romaji, rather than introducing it for the first time. There can be different opinions on the importance of Pinyin or Romaji, but they serve to disambiguate various homophones and are favoured by other Chinese and Japanese editors. --Anatoli (обсудить/вклад) 00:52, 6 June 2014 (UTC)
Re "Why is this entry labelled Mandarin and not English?": this is a conflation of two questions; let me separate them and answer each.
  1. Why do we have an entry for fēngshuǐ labelled as Mandarin? Answer: in Wiktionary:Votes/2011-07/Pinyin entries, the community decided to allow pinyin romanizations "using the tone-marking diacritics, [...] whenever we have an entry for a traditional-characters or simplified-characters spelling"; that vote cited [[yánlì]] as an example of how such entries would be formatted (note the ==Mandarin== header). In a subsequent vote, the community decided that various Chinese lects, including Mandarin, would be unified under the header ==Chinese==. It appears that fēngshuǐ has not yet been updated to use ==Chinese== rather than ==Mandarin== as its header; this can be ascribed to the fact that Wiktionary is a work in progress and is not complete or finished yet.
  2. Why do we not have an entry for fēngshuǐ labelled as English? Answer: either because fēngshuǐ is not attested in English, or because Wiktionary is a work in progress and is not complete or finished yet. If fēngshuǐ is attested, unitalicized, in English text, then it is a loanword and we should have an English entry.
Re "why would a comparable entry on mahā be labelled English rather than Sanskrit": if mahā is comparable to fēngshuǐ in that it is attested, unitalicized, in English text, in a way that conveys meaning (which I am not convinced is the case), then we should have an English entry because it is a loanword from Sanskrit used in English. If mahā is comparable to fēngshuǐ in that the community has voted to allow romanized Sanskrit the way it voted to allow Chinese pinyin, then we should have a Sanskrit entry for mahā per that vote. However, the community has not AFAICT held any vote to allow romanized Sanskrit. The comments I've seen in the BP and at RFD have suggested that such a vote would pass, but no-one has stepped forward to draft such a vote. - -sche (discuss) 01:07, 6 June 2014 (UTC)
(Re ==Mandarin== header in Pinyin entries) Good point. There are a few things to change, though. Our main Chinese editor, Wyang, doesn't support pinyin, but we can still ask him to help change it, or get someone else to. Strictly speaking, Pinyin only applies to Mandarin (standard Chinese), but it can still have a ==Chinese== header and the templates could display Mandarin Chinese Pinyin reading of ... --Anatoli (обсудить/вклад) 01:15, 6 June 2014 (UTC)
@Anatoli, I am not proposing any specific solution per se; this is a discussion to figure out a solution. We have a lot of things going on here - transliterations from some languages presented as words in those languages, transliterations from other languages presented as English words, entries for various languages of "eye dialect" spellings (and a well-attested lay transliteration is basically an eye dialect spelling of a word from another alphabet). I would like to see some solution arrived at that allows us to cover all words in the way that our own corpus defines words, and to provide the most accurate description of what kind of word it actually is (a Chinese word, a Sanskrit word, a Russian word). I'm fine with this being done by redirect, but that only works to the extent that the only meaning of the word is as a transliterated form from the target language. I'm not convinced that allowing Sanskrit solves the problem. There are many languages and scripts for which these kinds of transliterations are well-attested.
@- -sche, citations of mahā attested, unitalicized, in English text, can be found at Citations:mahā. More can be found with a Google Books search, but there are also many instances of mahā being used in running transliterated Sanskrit text. Also, suppose a word is well-attested but exists only in italics? Suppose it is attested only in block of transliterated text from the same language? In that case, it is still an attested phoneme, and one that a reader might look up in a dictionary. Cheers! bd2412 T 02:14, 6 June 2014 (UTC)
I'm not sure I share the view that we have "transliterations from other languages presented as English words". We have entries for loanwords, and we allow romanizations for certain languages. In some cases, a string is both a valid romanization in Chinese or another language and a loanword, but then there are two L2 headers, and only the loanword is [or: should be] presented as English. If you find an ==English== entry that is only a transliteration/romanization, RFV it.
The 1991 citation in Citations:mahā is of Mahā Yogi, not mahā; the 1910 citation is of Mahā Bhārata, the proper name of a work (possibly not includable: we have Iliad, but we don't have The Fault in Our Stars). The 2004 and 2007 and 2014 citations (yes, I added the last of those — it was the 'least unconvincing' one I could find at the time) are italicized. The 2009 citation does not have an intelligible meaning, IMO: "the classification of mahāyoga into three parts, starting with the [great] of [great]"? What does that mean? The 2013 citation is arguably a good "jib"-type citation, although it explicitly uses a somewhat different definition than other citations ("greater" as opposed to "great"). - -sche (discuss) 02:40, 6 June 2014 (UTC)
PS, you seem to be using a sense of [[phoneme]] that our entry lacks; does our entry need to be expanded? - -sche (discuss) 02:45, 6 June 2014 (UTC)
I should say an unbroken collection of phonemes (although this could be read as excluding monosyllabic constructions). We have tovarish and tovarich as English "loanwords", and many others like those, but they are really only thinly disguised transliterations. What other dictionary in the world includes them as words in English? (Okay, this one might, but still.) If we ignore the italicization issue, then there are a great many uses of mahā (Mahā Bhārata is the name of a work, but it is still composed of individual words, just like Holy Bible; we do have the more modern form, Mahabharata). bd2412 T 03:01, 6 June 2014 (UTC)
  • Wiktionary:Neutral point of view is not a policy. WT:CFI is the inclusion policy. WT:CFI does not forbid attested transliterations from being in the mainspace. The claim "transliterations are not words" is hogwash. No discussion or vote has been produced to show that there is a consensus for forbidding attested transliterations from the mainspace. WT:CFI does not trade in the term "native script". There is no policy concerning native and non-native scripts, AFAIK. --Dan Polansky (talk) 17:50, 8 June 2014 (UTC)
    • I think NPOV is beyond policy even; it is one of the five pillars (on Wikipedia), and it is my understanding that it is intended to run through all Wikimedia projects as a guiding principle. That said, I think that both NPOV and the CFI would permit inclusion of all attested transliterations (and I agree that it is nonsensical to call them "not words"), and that we should have a vote if we are going to come up with a specific scheme for the limitation of these, and for the presentation of those that are included. bd2412 T 03:48, 9 June 2014 (UTC)
Hogwash, nonsensical? If you're both so sure that no vote is necessary and transliterations are "words" and are already allowed by current policies, why don't you create some transliterated entries and observe the results? --Anatoli (обсудить/вклад) 04:08, 9 June 2014 (UTC)
If romanizations were by default allowed, we wouldn't have had one vote to allow romanized Gothic, and then a second (more successful) vote after the first was judged to have failed. The fact that such votes have been held for every language for which someone has wanted to include romanizations is, to put it mildly, normative. If you want to change the status quo — you're in the right place doing the right thing (having a BP discussion).
Incidentally, I find it notable that that first vote is the only vote on allowing romanizations which I recall failing, and it failed because people like User:Msh210 objected to the suggestion that we start allowing romanizations of just any old language. - -sche (discuss) 05:01, 9 June 2014 (UTC)
Devanagari is also a living script, used by a number of languages apart from Sanskrit. Even if, for various reasons and after votes, transliterations were allowed for some languages, no reasons for keeping transliterations specifically for Sanskrit have been presented in this discussion yet, unless we count arguments like "transliterations are already allowed by CFI" or "I want them included, I don't care how". It's not even clear whether this discussion is specifically about Sanskrit transliteration or about any transliteration for any language written in a non-Roman script. --Anatoli (обсудить/вклад) 06:42, 9 June 2014 (UTC)
bd2412, I don't think you could be more wrong, to be honest! We're just applying our own rules, and because the people who participate in RFDs vary, decisions aren't always consistent. That's no different to a court of law where it depends on what jury you get! What you seem to be saying is "I don't like this community decision, please change it" with the emphasis on the "I" (that is, you don't like it). Renard Migrant (talk) 10:39, 9 June 2014 (UTC)
In fairness, aren't most proposals for change made by people who say "I don't like the existing state of affairs"?
I'm somewhat surprised by how bristly this debate is getting. I apologize if I'm complicit in that to any extent. - -sche (discuss) 13:05, 9 June 2014 (UTC)
The English word feng shui is normally written without tone marks, since tone marks make no sense in English. —Stephen (Talk) 13:21, 9 June 2014 (UTC)
If someone had, in the past, called a vote to include a common noun like cow in the dictionary, would that mean a vote is now required to include common nouns? There needs to be some standard by which a word is excluded before action needs to be taken to allow its inclusion. There is no actual rule being applied here that anyone has pointed to, and no definition of "word" that anyone has provided that would not include transliterations. The 2011 vote on allowing romanization of languages in ancient scripts, by the way, says nothing about the application of this rule to currently existing languages, and nothing about whether the words at issue meet the CFI. The illusion that an unwritten prohibition represents the "existing state of affairs" is about as reliable as the claims currently made in this dictionary that dagoba, khakkhara, and haramzada are actually words of the English language. I am not content to have us misleading our readers, either about our goal to have "all words in all languages", or about the actual language to which a word belongs. bd2412 T 14:14, 9 June 2014 (UTC)
ᛋᛏᚩᛈ ᛏᚱᚣᛁᛝ ᛏᚩ ᛗᚪᛣᛖ ᚢᛋ ᚠᛖᛖᛚ ᛒᚫᛞ ᚠᚩᚱ ᛏᚫᛣᛁᛝ ᛁᚾᛏᚩ ᚪᚳᚳᚩᚢᚾᛏ ᚦᛖ ᛠᛋᛁᛚᚣ ᚩᛒᛋᛖᚱᚠᚪᛒᛚᛖ ᚠᚫᚳᛏ ᚦᚫᛏ ᛚᚫᛝᚷᚢᚪᚷᛖᛋ ᚫᚱᛖ ᚹᚱᛁᛏᛏᛖᚾ ᛁᚾ ᚫ ᛚᛁᛗᛁᛏᛖᛞ ᚾᚢᛗᛒᛖᚱ ᚩᚠ ᛋᚳᚱᛁᛈᛏᛋ. ᛁᛏ ᛁᛋ ᚾᚩᛏ ᛈᚩᛁᚾᛏ-ᚩᚠ-ᚠᛁᛖᚹ, ᛁᛏ ᛁᛋ ᚾᚩᛏ ᛗᛁᛋᛚᛠᛞᛁᛝ, ᛄᚢᛋᛏ ᚳᚩᛗᛗᚩᚾ ᛋᛖᚾᛋᛖ. — Ungoliant (falai) 16:43, 9 June 2014 (UTC)
For the sake of convenience: running the above through Module:Runr-translit (with the language code "ang") gives "stóp tryiŋ tó maᛣe us feel bæd fór tæᛣiŋ intó accóunt þe easily óbserfable fæct þæt læŋȝuaȝes ære written in æ limited number óf scripts. it is nót póint-óf-fiew, it is nót misleadiŋ, just cómmón sense." The module may need some improvements. Keφr 06:53, 10 June 2014 (UTC)
On my computer, this shows up as a string of boxes. Obviously (like the transliterations I would propose to include) it would be much more helpful to the reader for this to be readable in Latin script. Has Wiktionary stopped caring about helping readers find definitions that they are likely to search for? bd2412 T 18:30, 9 June 2014 (UTC)
Languages may be written in a limited number of scripts, but I would practically bet that all of them have been written in the Latin script. I can try and think of possible exceptions, but if there are any, I'd be surprised if we had any vocabulary in any of them.--Prosfilaes (talk) 07:31, 10 June 2014 (UTC)
Re: "If romanizations were by default allowed, we wouldn't have had one vote to allow romanized Gothic, ...": This argument fails to distinguish attested transliterations from all transliterations. The vote allowed the inclusion of unattested transliterations (as long as the native-script form corresponding to the transliteration is attested), so it went beyond the current CFI. Furthermore, the existence of a vote is no proof at all that there already exists a policy that the voted proposal overrides; many a vote takes place in a policy vacuum, especially if it serves to confirm an uncodified common practice. --Dan Polansky (talk) 16:51, 9 June 2014 (UTC)
As I've said several times, the only real way to attest Gothic is in the Latin script. The Gothic script Gothic entries we have are most likely transliterations from the Latin script. There's a one-to-one correspondence, but still. If you find a book of Gothic text, it will be in Latin script. Unless you're into paleography, you will not find Gothic in the Gothic script. And no, I do not accept that citing published works transcribed from handwritten originals is acceptable for English but not Gothic.--Prosfilaes (talk) 22:13, 9 June 2014 (UTC)
Re: "If you're both so sure that no vote is necessary and transliterations are "words" and are already allowed by current policies, why don't you create some transliterated entries and observe the results?" This is just power talk: when the speaker cannot point to a policy--as he cannot--he instead threatens to delete transliterations, or have them deleted by another admin. There may be a consensus among admins that attested transliterations should be deleted (I don't know), but there is no policy supporting such deletions. --Dan Polansky (talk) 16:51, 9 June 2014 (UTC)
I'm not threatening anyone. It's you who is behaving as if there is nothing to discuss, as if it's a done deal because it matches your opinion on the matter, and ridiculing those who disagree with you. There is no secret "agreement" among admins to talk about, but entries in the wrong script (or non-native script, if you prefer) are usually marked as such, converted to the native script or deleted. --Anatoli (обсудить/вклад) 05:55, 10 June 2014 (UTC)
I dunno, Anatoli, denying the existence of the Deletionist Cabal seems like exactly the sort of thing a member of the Deletionist Cabal would do. I may have to report you to the Delete the 'Deletionist Cabal' Cabal... ;)   - -sche (discuss) 21:54, 10 June 2014 (UTC)
As for the Runic script (ᛋᛏᚩᛈ...) above, this entirely misses the point that we are talking about attested romanizations; this use of runes is not attested.
As for cabal and secrecy, the only people claiming something is done in secret are those claiming that romanizations are already forbidden from the mainspace, since they have so far been unable to provide references to a public record showing consensus for their exclusion.
FYI, I have created Wiktionary:Votes/pl-2014-06/Excluding romanizations by default. --Dan Polansky (talk) 09:49, 15 June 2014 (UTC)

A vote on allowing entries for romanized Sanskrit

I have drafted a vote: Wiktionary:Votes/pl-2014-06/Romanization of Sanskrit. I used basically the same wording as previous votes used; suggest improvements if you have any. The vote can be postponed as much as necessary. - -sche (discuss) 13:32, 9 June 2014 (UTC)

This vote began a while ago, FYI. - -sche (discuss) 16:20, 23 June 2014 (UTC)

A vote on allowing all attested romanizations

I drafted a vote: Wiktionary:Votes/pl-2014-06/Allowing attested romanizations. I tried to reflect, as best I could, what was proposed in this thread (see my notes on the talk page); however, as I am an opponent of blanket inclusion of romanizations, I welcome people who actually support the proposal to reword it as necessary. I am particularly interested in your views on which of these citations could be used to cite/attest (=support the existence of) a romanization entry, vs which, if any, could not. In cases where I only bothered to type up 1 or 2 citations of a given string, please assume for the purposes of discussion that there are enough other citations available to add up to 3: what I am interested in is what kind of citation "counts" as "attesting" a romanization. - -sche (discuss) 07:33, 10 June 2014 (UTC)

FYI: Wiktionary:Votes/pl-2014-06/Excluding romanizations by default. --Dan Polansky (talk) 09:51, 15 June 2014 (UTC)
Both votes' start dates were a couple of days ago, so I have removed the 'premature' tags from both, and voting has begun on both. (Some users had suggested merging the votes, but none showed any interest in drafting a merged vote, so the votes remain separate.) - -sche (discuss) 16:23, 23 June 2014 (UTC)

What are romanizations for?

This seems a deeper question to me than what I've read so far in this thread. What are romanizations for, in the first place? If romanizations are intended to solve a problem, what is that problem?

From my past experience here and from what I've seen mentioned above, the core issue appears to be findability or discoverability. (NOTE: This is setting aside the entire question of attestations.) If a language is generally written in Script X, it can be difficult for EN WT users, who can only be assumed to be able to type using the Latin alphabet, to find entries in that script.

Assuming that users would want to be able to search for entries using the Latin script, do we need to have separate entries just for the romanized forms? Is it enough now to simply include the romanization right there within the entry in Script X? Can our search software now find such entries? My own brief testing suggests that yes, our search software can find an entry in Script X when searching on a Latin-alphabet string, provided that said Latin-alphabet string is included in the page. Importantly, this Latin-alphabet string can be provided by a template, and does not seem to be needed as-is within the wikitext. Cf. this search for kamau, which does find the 構う entry, even though the string "kamau" does not appear anywhere directly within the wikitext, and is instead provided by the {{ja-verb}} template.

So long as our search feature allows users to find Script X entries when searching on Latin strings, I fail to see any need for separate entries just for romanized forms that only serve to redirect users to the Script X entries. As such, separate romanized entries look like a solution in search of a problem. Is my understanding correct? Are there other problems that romanized entries are intended to solve? ‑‑ Eiríkr Útlendi │ Tala við mig 20:32, 10 June 2014 (UTC)

There's more to it than that. The search can find entries based on romanization, yes, but how well does it do that job? Is it obvious to readers how to provide their search query in such a way that the result they need appears first? How well does it perform if users simply type in the romanization and press "search", like they are accustomed to doing for other languages? What if the search term they entered exists, but does not lead them anywhere closer to their goal (and they don't realise there are really two types of search)? Furthermore, how practical is it to do this for every word they want to look up (which may be dozens if not hundreds per day)? All these are problems that having dedicated entries for romanizations will reduce if not eliminate altogether. A user looking for Gothic can now simply type dags in the search box, press enter, and be on their way. —CodeCat 21:15, 10 June 2014 (UTC)
"What are romanizations for" is not really a Wiktionary question, any more than "what are verbs for" or "what are place names for". Writers use romanizations to convey meaning, they are found in print, and they may be read by people who turn to Wiktionary to find out what meaning is intended. It is a dicey proposition to ask readers to rely on our search system. What are they to do when a romanization of a word from one writing system happens to closely match an entry for a word in another language? Look at the Hindi दक्षिण (meaning southern), romanized as dakṣiṇa, just a few diacritic variations away from dakšiņa, the Latvian word for "fork". Not all romanizations even have distinct diacritics. There will be romanizations for which the word exactly matches some other existing word. What are readers to do then, if we have no entry for the romanization? bd2412 T 21:26, 10 June 2014 (UTC)
  • Re: "What are romanizations for" is not really a Wiktionary question, any more than "what are verbs for" or "what are place names for". -- there is no reason to be obtuse, this is entirely within the context of the EN WT and more specifically this very thread. To be entirely explicit, I'm asking, what are romanizations for, with regard to EN WT entries.
Re: not all romanizations have distinct diacritics, it seems you've presented one more obstacle, rather than a reason for adding separate romanization entries. If someone searches for daksina, say, the system should show both the Hindi and Latvian -- as it already does. Having a separate entry at [[dakṣiṇa#Hindi]] won't change that situation at all, other than to require users to click through one more level of abstraction (the romanized redirection page) before finally getting to the entry page. ‑‑ Eiríkr Útlendi │ Tala við mig 21:46, 10 June 2014 (UTC)
Sorry, I was not trying to be obtuse, but we don't generally ask the purpose behind having any kind of entry for a word that meets the CFI. The purpose is always to help the reader define the word. With respect to the obstacle I raise, I actually came into this issue because I was disambiguating "mahā" at Wikipedia, and needed to find a Wiktionary entry to link to when it was being used as a dicdef. Typing "maha" into our search engine took me straight to the unhelpful entry, maha, which only has completely unrelated meanings from completely different languages. The only way that I was able to find the Hindi word at issue was to use my admin bit to look at the deleted entry, mahā. bd2412 T 21:55, 10 June 2014 (UTC)
Thank you for clarifying, I clearly had the wrong end of that stick. Your reply also makes it clear that I needed to clarify that I am thinking past the bounds of CFI: any term in any non-Latin script should ideally have some romanized form included in the entry for better searchability, regardless of whether that romanized term can be attested anywhere.
FWIW, clicking the magnifying glass icon to go to the secondary search and entering mahā gets me this. From there I see the महा#Sanskrit entry, but no independent listing at all for the Hindi entry that you were looking for. (The Hindi term does show up in the etymologies of compounds, such as महेश्वर.)
Perhaps then the root issue isn't necessarily that EN WT should include separate romanizations for the sake of having them, but that the EN WT search feature is still inadequate (doesn't allow specification of languages when entering search strings, doesn't show language names in results, ordering of entries seems a bit arbitrary, etc.), and adding separate romanization entries is one workaround. Is that a good restatement of this issue? ‑‑ Eiríkr Útlendi │ Tala við mig 22:07, 10 June 2014 (UTC)
Come to think of it, I don't know whether the word as used in Wikipedia articles is meant to be Hindi or Sanskrit. All I, as the lay reader, can know for sure is that several Wikipedia articles on topics in Hinduism use the word "mahā", and that many books in print also use this word. The search function is only as useful as our ability to convey the means of using it to the average user, who may have very little experience with Wiktionary. If at all possible, it might be helpful to generate a list of transliterations that theoretically would be made under this proposal, and see how many of those are blue links having existing articles with other meanings in other languages. bd2412 T 01:16, 11 June 2014 (UTC)

Oh, wow.

I have tried my best to go through all of this discussion and the related votes. It is absolutely confused, the terms are used to mean different things by various participants, and the proposals and examples are vague and undefined to the point of meaninglessness.

If you guys can’t come up with something even a bit cogent, I’m just going to vote against all of it. Michael Z. 2014-06-16 22:51 z

@Mzajac: In my understanding, one rationale for allowing romanizations is "having entries for romanizations is the best way (or a good way) to get users to the native-script forms of words"; one rationale for excluding romanizations is that "romanizations are not the best way (or a good way) to get users to the native-script forms of words; it is better to have the search function find the native-script entries directly by finding the romanizations in them". An additional rationale for allowing romanizations is that "romanizations are words in the same languages as the native-script words they romanize, and as such merit inclusion" (so, e.g., in these citations, linked-to from the 'allow romanizations' vote, "svobodnyx" is [asserted to be] a Russian word); while a rationale for excluding romanizations is that "romanizations are not words, they are merely shadows of words, and as such are not intrinsically/automatically inclusion-worthy". Each side disagrees with the other side's rationales; each side further disagrees on whether the status quo is that romanizations like "svobodnyx" are allowed or excluded. - -sche (discuss) 23:10, 16 June 2014 (UTC)
It seems that arguments are now being made, as in Wiktionary_talk:Votes/pl-2014-06/Excluding_romanizations_by_default#Clarification_needed, that if we exclude romanisations, then words like "judo" will be excluded. I agree with Michael that the whole discussion in various places is messy and confusing. If "... to get users to the native-script forms of words" is the rationale, then it should be clear that these entries are soft redirects and should only exist if the native-script entries also exist. BTW, instead of "svobodnyx", could you use the lemma form rather than the inflected one? See свобо́дный (svobódnyj). --Anatoli (обсудить/вклад) 23:40, 16 June 2014 (UTC)
Re "arguments are used [...that] 'judo' will be excluded": such arguments don't hold water, IMO. Re "свободный": If the vote to allow attested romanizations passes, and "svobodnyj" is attested, then both it and "svobodnyx" will be allowed, presumably pointing to свободный and свободных, respectively. - -sche (discuss) 23:53, 16 June 2014 (UTC)
OK. Are you able to provide an example of a romanised entry (which is not English but a romanisation only) for the vote(s)? Doesn't have to use any (final) templates but I need to see the structure. --Anatoli (обсудить/вклад) 01:48, 17 June 2014 (UTC)
I'm not one of the people who want to allow romanizations (I just drafted the votes, since it looked like no-one else had the time/intention to), so I'm all ears if either of them has a different idea of how romanizations' entries should look. However, in both the Sanskrit vote and the 'allow romanizations' vote, I specified that romanization entries "will contain only the modicum of information needed to allow readers to get to the native-script entry"; that's the same clause that was used for the vote on pinyin. For the Sanskrit vote, I spelled out that entries will look basically like the Gothic romanizations' entries (e.g. qino), except with "Sanskrit" (or in the case of svobodnyx, "Russian") headers instead of "Gothic". So, like User:-sche/svobodnyx. - -sche (discuss)
Thanks. I see. I am against romanisation entries (except for those already allowed by a vote), but qino looks OK to me and User:-sche/svobodnyx does not. It goes way beyond "modicum of information...". If citations are required, they should be on the citations page, IMO. --Anatoli (обсудить/вклад) 02:58, 17 June 2014 (UTC)
Okay, I’ll quickly summarize some of what is frustrating me:
  1. Why does this only apply to romanizations? What about Cyrillizations, &c.? Is it because the intent is to serve readers of English-language texts? If that is the case, then why not only accept romanizations attested in English texts?
  2. The wording makes it sound like any term that can be considered a romanization is allowed/disallowed an entry based on this proposal. That’s silly. Thousands of entries are loanwords that are arguably romanizations, in a narrow or broad sense. The wording has to be more specific to account for this conflict.
  3. Why does this refer to romanizations at all? Are we talking about loanwords in English texts whose spellings may be considered transliterations or transcriptions? Does it mean foreign words that are being mentioned in English texts? In other Latin-alphabet language texts like German or Lithuanian? And if so, why not Cyrillicized, Arabicized or Siniticized forms too? Why this level of specificity?
  4. What about words attested in spoken sources? Why does whatever principle is being applied here apply only to the written word?
  5. The given examples don’t help determine what is being proposed (Citations:mahā, User:-sche/svobodnyx). Some of them are proper names used in English. Others are words in transliterated titles of works. Some are accompanied directly in the text with a gloss, indicating that they are foreign terms being mentioned, and not used in an English text. They appear in their sources for various reasons that don’t relate to any rationale for this proposal that I can divine.
Perhaps a Romanizations section in the native-script entry would satisfy whatever needs this is meant to satisfy. It’s already been discussed and sort-of accepted twice. It could be limited to attested romanizations, or to standardized romanizations, or perhaps to the superset of both. (Actually, a more general “Converted forms” section, including any foreign-language, other-script conversions or transcriptions, makes better sense.) Michael Z. 2014-06-17 22:28 z
  1. At the very beginning of this discussion, BD did propose allowing entries for all transliterations, even cyrillizations, etc. The proposal got narrowed early on to only romanizations (see the initial exchange between BD and Ungoliant, and then BD's comment to Anatoli at 02:45, 4 June 2014), as some of the people involved seemed to recognize that the idea of allowing entries for cyrillizations, arabicizations, etc was so much more controversial than the already controversial idea of giving entries to romanizations that it would probably act as a poison pill.
  2. I personally think the wording of the vote is clear, particularly when taken together with the examples, which show that the vote is to take sequences like svobodnyx (in the Latin script) and include them under ==Russian== (not e.g. ==English==) L2 headers. How would you change the wording to make it clearer?
  3. See point 1.
  4. Huh? Audio has sometimes been cited to attest words (e.g. in [[Qapla']] we record that that word was uttered in the film Team America: World Police). I don't see how it could attest a particular spelling or script. In the case of Qapla', we used common sense / Occam's razor to assume that the word was in the script that the surrounding dialogue would normally be written in (the dialogue was English, so: Latin script), just as we routinely assume books that seem to use words like "Москва" are in fact using those (Cyrillic) letters, and not using e.g. "Mоcквa" (a mix of Latin and Cyrillic letters).
  5. Re "Some of them are proper names used in English. Others are words in transliterated titles of works.": Yes. The main proponent of allowing romanizations, BD, has argued that those citations are still using the "words" to convey meaning; see his comment of 21:19, 10 June 2014 on the 'allow romanizations' vote's talk page, and this comment.
Now that we have Lua and transliteration is largely automatable (and now that Lua- and template-generated text is findable by our search engine), I agree that the inclusion of romanizations sections in the native-script entries is a reasonable idea, preferable to the idea of making entries for romanizations. - -sche (discuss) 23:27, 17 June 2014 (UTC)
Re: inclusion of romanizations sections in the native-script entries is a reasonable idea, preferable to the idea of making entries for romanizations. That's my point too. You don't need an entry for akīrtikara if अकीर्तिकर (akīrtikara) shows "akīrtikara" in the transliteration section. Further, if our searches allowed selecting specific languages, then finding a foreign-script term would be even easier. For transliterations with complex diacritics we should develop a reverse transliteration system, e.g. use "jaaGgala" to search for जाङ्गल (jāṅgala), as is done on the Spoken Sanskrit site. --Anatoli (обсудить/вклад) 01:55, 19 June 2014 (UTC)
I am curious about your opinion of cyrillizations. I have imported these tables from Wikipedia's article on Cyrillization of Chinese (there is a similar table on Cyrillization of Japanese), and would like to see the entries made for the Cyrillic phonemes (and here I mean all of the one-syllable phonemes). bd2412 T 04:07, 19 June 2014 (UTC)
Well, (Russian) Cyrillisation of standard Chinese (Palladius) or Japanese (Polivanov) (there's also Korean (Kontsevich)) is only needed to understand how Chinese, Japanese or Korean people's or place names and concepts are usually written in Russian, or how they can be written or transliterated for education (other Cyrillic-based languages have similar but less systematic standards). E.g. it's Хиросима (Hiroshima), not "Хирошима", "гоюй" (Guoyu), not "гуою", Korean "ханча", not "ханджа" - hanja. There are some exceptions (Токио, Иокогама, not Токё, Йокохама/Ёкохама - Tokyo, Yokohama) and traditional spellings (Пекин, not Бэйцзин - Beijing), variants (Аомынь not Аомэнь - Aomen, Macau). They can stay in appendices; I don't know if we need entries for them, unless they are words, like proper nouns or loanwords. I actually think that Cyrillisation is a bit of a misnomer here; they are rather Russifications. These systems are partially used in other languages or sometimes used as a base, but they definitely won't fit Ukrainian, Belarusian, Bulgarian, Serbian and Macedonian without changes. --Anatoli (обсудить/вклад) 04:30, 19 June 2014 (UTC)
An appendix would be suitable, so long as the appropriate redirects were made. Something would need to be done about the many existing entries for Russian words having different meanings that coincide with these Cyrillizations (if that's the wrong word, then Wikipedia's article needs fixing also). This leads me to wonder, by the way: are our romanizations of Chinese and Japanese characters and words universal to all languages using the Latin alphabet? bd2412 T 17:26, 19 June 2014 (UTC)
I suggest Cyrillization for Russian, etc., just like there are romanizations that are characterized as “international,” “English,” “German,” etc. Russification (Russianization?), Anglicization, etc., are very broad terms, and as far as I know, they don’t conventionally refer to script conversions. Michael Z. 2014-06-27 18:09 z

@Michael Z., my entire consideration here is that if there is a reasonable possibility that a reader will come across a "word" while reading books in print (and by "word" I mean something that the average person would look at and believe to be a word), and may want to find out things about that word (definition, etymology, pronunciation, etc.), then that word should be included in our corpus to offer these kinds of information. Whether there is a reasonable possibility is why we have a CFI. We should do these things because our goal is to provide "all words in all languages", and we do not have the limitations that other dictionaries have. Of course, words found only in spoken sources aren't going to be encountered in print until some author chooses to write about and transliterate them. bd2412 T 17:23, 19 June 2014 (UTC)

Well, our framework treats “all words (used) in all languages,” that is, each term meriting an entry, as a word of a particular language. Specifically, every attested term used as a native or naturalized expression.
What you are proposing is broadening our general principle, and accepting every mention too (“all words ever mentioned”). This is a much bigger discussion than just “accepting transliterations,” and should be discussed as a fundamental change to the principles of CFI. Michael Z. 2014-06-27 18:09 z
Where in any policy do we define "word" as limited to a "native or naturalized expression"? That certainly is not our practice, which already includes tens of thousands of entries that do not fall within that limitation, including all Latinized entries from words with dead scripts or languages without scripts, all translingual terms, all of our existing romanizations of Chinese and Japanese words and characters, and many words like tovarish/tovarich that are in reality transliterations from other scripts, even if they are listed as "English". bd2412 T 18:26, 27 June 2014 (UTC)
  • It bears noting that all Japanese and Chinese romanized entries (and, I think, all romanized entries for Gothic as well, among others) are there purely to aid in finding the lemma entries. These are not, nor should they be regarded as, entries unto themselves -- they are no more than workarounds for our profoundly inadequate search features. ‑‑ Eiríkr Útlendi │ Tala við mig 21:55, 27 June 2014 (UTC)
  • I would not envision entries for attested romanizations as serving any other purpose than that. bd2412 T 00:55, 28 June 2014 (UTC)
Bd2412, Nope. At best, you are listing a set of explicit exceptions to the principle that I mentioned.
  1. Dead scripts handling is an explicit exception. They are largely used academically in romanized form.
  2. Translingual is the combining of entries for expressions used natively in more than one language – and arguably, many violate CFI because they are not terms, or words, or lexical items (are 3, , , 𝄇, and “words”, to be “defined” in dictionaries?).
  3. Chinese and Japanese romanizations are native forms used in Chinese and Japanese, are they not?
  4. Tovarish is an English word: I don’t understand how saying that such words “are in reality transliterations from other scripts” contradicts that – a naturalized or semi-naturalized borrowing remains a borrowing, regardless of what script the donor language relies on: tovarishes of the tsar is no less English than three cappuccinos. Furthermore, the argument for picking out only “transliterations” this way makes no sense. The set of spellings of the term tovarish is a result of the combined influences of transliteration, direct transcription, English transcription of utterances, and re-borrowings from other languages. For evidence, the OED entry’s citations include tavarisch, tovarisch, tovaritch, tovarich, and Tovarishch. A spelling is not a term, and the surmised source of a spelling doesn’t determine the identity of a term.
The whole proposal presupposes that “a transliteration” is a kind of term, but it is not. A transliteration is one expression of a term, derived from another expression of a term. Also, don’t forget that although we have a web page for every variant spelling, capitalization, hyphenation, or other orthographic form (form-of entries, “soft redirects,” &c.), our proper entries actually represent terms, not spellings. Michael Z. 2014-06-28 16:28 z

Civility and formatting of citations[edit]

Yesterday, I politely reminded Spinningspark at RfV to format cites per our guidelines when adding them to citations pages, rather than posting a string of unformatted links. Spinningspark told me to die. I told Spinningspark that such hostility was "uncalled-for and completely unacceptable," and in response I was informed "my comment was completely called for."

Such incivility is unacceptable and should not be tolerated. -Cloudcuckoolander (talk) 16:52, 5 June 2014 (UTC)

I agree. It's a very reasonable request to ask a user to format citations correctly. We inform new users when their formatting is off all the time, so an experienced user like Spinningspark should have no problem in handling that. His reaction was definitely not acceptable. —CodeCat 17:14, 5 June 2014 (UTC)
Right. If one is going to add citations to a citations page or an entry, one ought to take the time to properly format them. Otherwise post whatever Google Books/etc. links you find in the RfV discussion so that someone else can step in and do the formatting. Does someone also want to inform Spinningspark of this discussion? I don't want to do it myself due to involvement/conflict of interest. -Cloudcuckoolander (talk) 17:28, 5 June 2014 (UTC)
Strictly speaking, "go hang" is a relatively mild oath, and not one that I would equate with literally telling you to die. bd2412 T 17:34, 5 June 2014 (UTC)
Well, I don't think everyone who adds citations has to know how to format them correctly. Everyone has to learn. But I think if someone asks to format them, it's a very reasonable request that someone who is willing to cooperate with the Wiktionary community (and its practices) should have no problem with fulfilling. So if the user just flat out refuses in such a hostile manner, it's more or less implying that they don't want to take responsibility for their work, hence willful disruption of Wiktionary ("I know how to do it because I have been told, and I know that if I don't do it it gives others more work, but I'm still not doing it"). —CodeCat 17:39, 5 June 2014 (UTC)
I was not aware that "go hang" had an idiomatic meaning roughly equivalent to "take a hike" or "pound sand." To me it read as an instruction to "drop dead" or "go kill yourself." However you cut it, though, it's hostile language, and hostility (especially unprovoked hostility) is not conducive to a collaborative project. -Cloudcuckoolander (talk) 21:13, 5 June 2014 (UTC)
You coming to the Beer Parlor to blow the whistle on a fellow Wiktionarian is uglier than Spinningspark's incivility, IMO. Grow a pair. --Vahag (talk) 17:54, 5 June 2014 (UTC)
We don't have any formal dispute resolution processes or noticeboards etc. like they do on Wikipedia (at least none of which I am aware). This is the only place I could think of to bring this incident to the community's attention, because the alternative, of course, was to ignore it, and incivility has been passively tolerated for way too long around here. Something needs to change. Preventing editors from getting tired of incivility and leaving is more important in the long-term than not stepping on anyone's toes by daring to call out unacceptable conduct. -Cloudcuckoolander (talk) 21:13, 5 June 2014 (UTC)

I always take the time to correctly format any kind of material that I place on an entry page to the best of my abilities. That is our "product" and is the part of the project on display to the public. That is expected of anyone and I am not an exception. However, RFV is a page requesting someone to find citations. I have found some, as requested. That is helpful. Me not formatting them and not putting them in the entry is no more unhelpful than anyone else not doing it. Just because I found some citations does not oblige me to do anything with them. This is a volunteer project and I am choosing not to volunteer.

The citations were placed on the citations page rather than the RFV page because I have repeatedly been asked to do that.[2][3][4][5][6] I am happy to go back to placing them on the RFV page if that is what is wanted, but I really don't understand this attitude that we really don't want to hear about cites unless they are perfectly formatted. SpinningSpark 23:44, 5 June 2014 (UTC)

  • Citations pages are a product too; they exist to show the reader examples of use in historical context. A citations page is not necessary to resolve an RfV discussion, but they tend to be made because the cites have been collected. Therefore, if you are only going to provide bare links, it is probably best to post them in the RfV discussion itself. If you are up to fully formatted cites, those don't need to be on a citations page, but may as well. bd2412 T 00:09, 6 June 2014 (UTC)
    The previous requests that the unformatted links be placed on the Citations page was an effort to put them one step closer to being usable in the face of Spinningspark's failure to format them as the rest of us do. In those locations a dump run could at least find them easily and someone else could clean up Spinningspark's mess. DCDuring TALK 00:42, 6 June 2014 (UTC)
    If I had been running around spoiling nicely formatted citation pages by dropping bare urls on to them then you might have a point, but I would never do that. Where others have already begun the work of producing a formatted page of citations then I am always careful to follow. But then, there is rather less need to find citations if others are already doing it. It is unfair to characterise me as a misfit who is refusing to conform to site standards. What I am really doing is the work of finding cites when no one else has the time, can be bothered, or wants to save the entry. Would you rather have the cites or not? "Fitting in", in the case of most of the entries I have responded to, would not be formatting the cites I provide, it would be doing nothing and leaving the page blank like everybody else has.
The real unacceptable behaviour here is the opprobrium that is being heaped on me for doing this work. I'm not looking to turn this into a dick measuring contest, but if someone were to do some stats of RFV I am pretty sure that a very significant percentage of the cites found in response to those requests would be from me. I should be thanked for fulfilling this task when no one else wants to do it, but instead I am being hounded and criticised for it. SpinningSpark 07:16, 6 June 2014 (UTC)
I agree that you deserve thanks for the work that you do. I do disagree with the rationale enunciated by DCDuring, and would still have bare links put in the RfV discussion rather than on a citations page, regardless. bd2412 T 11:47, 6 June 2014 (UTC)
@Spinningspark: Relatively speaking, finding the citations is the fun part of the task of documenting our definitions. Formatting them, though not difficult and not painful, is definitely less fun. Using our various "quote" templates makes it much less tedious than it could be — but not fun. Therefore we follow the simple fairness rule of allocating to the person having the fun of finding each quotation the less fun task of formatting the quotation in the entry. This is simple plays-well-with-others schoolyard justice.
@BD2412: Because, 1., the final resting place of the citations will be in the entry or its citations page, 2., the definition being "cited" is always part of the problem and is always in the entry but not always in the RfV, and sometimes 3., because the context of multiple definitions in a PoS helps us better grasp a particular definition, it has always seemed more efficient to me to just insert the citations where they will ultimately be, thereby enjoying the benefits of seeing how citations fit in the entry, rather than in an out-of-context argument on the RfV page. DCDuring TALK 12:28, 6 June 2014 (UTC)
Exactly where is this guideline that says the person reporting a cite is allocated the work of formatting it? RFV is a page asking for citations. Providing them I would have thought is exactly what is wanted. You might consider that to be only half the job, but at least I have left the task only half undone when everyone else left it fully undone. Frankly, if such a guideline actually exists, or you manage to get one created, then I will probably unwatch RFV and not bother in the future at all.
As for fun, there are two extremes here. At one end is the case of wifty (which led to this thread in the first place). Cites for wifty are so easily found with the simplest of gbooks searches—no need for complex search terms or scrolling through pages of results, they just leap out at you—that I would have thought that all that was needed was to point that out for the RFV to be settled. I am sure if the proposer of the thread had done that search there wouldn't have been an RFV in the first place. In cases like that I really don't feel inclined, or see the need, to do a lot of work on the entry. It's these kinds of drive-by requests that are the make-work at RFV, not me finding cites for the make-workers. And playground responses along the lines of nah nah nah we're not listening unless you do it properly [7] are really not going to change my mind.
At the other extreme are obscure words that are really difficult to cite. A search term that zeroes in on that exact sense is hard to come up with and many pages of false hits have to be examined. It is not exactly fun to go through page after page of google results. It is quite satisfying when one finds something, but not fun. I am much more inclined to put cites in the entry under those circumstances (properly formatted of course) to preserve a record of what I found. If others feel I am hogging all the fun, then by all means jump in and find some cites yourself! SpinningSpark 16:27, 6 June 2014 (UTC)
Speaking of tiresomeness of formatting citations: some time ago I wrote a gadget (Quiet Quentin) which can assist in finding citations on b.g.c and format them accordingly. It is at the bottom of Special:Preferences#mw-prefsection-gadgets. The generated markup often needs manual corrections, but nevertheless the tool takes much of the burden away. Unfortunately, there will be probably no Usenet support, because I could not find a public JSONP API for searching it anywhere. Keφr 16:43, 6 June 2014 (UTC)
@Spinningspark: re: "Exactly where is this guideline [] ". Certain aspects of behavior don't actually require documentation among well-socialized mammals (animals?). Humans usually pick up basic civility (tit-for-tat, golden rule, taking the good with the bad, etc) in the schoolyard, though application in new realms often proves challenging. Admittedly, the online environment seems to require more explicit norms than some other environments. And some folks may not have had good schoolyard experiences. DCDuring TALK 18:37, 6 June 2014 (UTC)
I didn't learn anything like that in my schoolyard. Maybe I learnt that if someone starts ordering you about then you should hit the fucker first before the fight starts properly, but nothing in etiquette that would actually be useful at a dinner party. SpinningSpark 23:27, 6 June 2014 (UTC)
That explains it. DCDuring TALK 23:45, 6 June 2014 (UTC)
This is a bit painful to watch: two people who unselfishly make massive, high-quality contributions to rfv fighting each other, and a general atmosphere of negativity in the comments posted by others.
Although I tend to side with Spinningspark on substance in this issue, he can be a bit rude at times, though I would say describing it as "extreme incivility" is a bit much. The paradoxical effect is that, of all the people who contribute searches to rfv without adding formatted cites to entries, Spinningspark is the only one consistently singled out for criticism for doing so.
I think everyone here needs to take a step back and look at the big picture: this is a community effort which requires the voluntary efforts of real people, with all their quirks and flaws, virtues and vices. Negativity tends to lead to more negativity, not to the kind of results we want. Too much negativity, and people stop participating.
I would say to Spinningspark: please consider biting your tongue every once in a while and being more civil, even when people make unreasonable or unfair demands. Otherwise, you're moving the focus away from what you're responding to and onto your response, letting the others off the hook.
I would say to those criticizing Spinningspark: don't look a gift horse in the mouth. Maybe Spinningspark should be more careful about what gets put on the citations page if that forces others to fix things, but Spinningspark shouldn't be criticized for not putting things there in the first place. Sure, someone has to prepare and enter the cites, but Spinningspark could just not contribute anything, and then someone would still have to prepare and enter the cites anyway, but they would also have to do Spinningspark's part, too. I don't see the point in criticism over this. It's one thing to make sure that people are aware that there's more to it than just finding the cites, but if someone already knows this and makes it clear they just don't want to do it, nothing good will come of making it an issue: criticizing a volunteer for not volunteering enough is more likely to lead to less volunteering, not more.
Not that I'm accusing anyone of hypocrisy: everyone who's participated in this discussion has contributed many times more than enough to the project to have every right to speak on this issue. Cloudcuckoolander, particularly, isn't appreciated enough for doing what Spinningspark has been doing, but doing all of it and doing it right. My point is that having the right and the standing to say something doesn't make it a good idea: if people see that their contributions will be met with criticism for not contributing enough, they'll just not contribute at all. Chuck Entz (talk) 19:44, 6 June 2014 (UTC)
I also wish Spark would play ball regarding formatting — and if he/she can't, then post citations to some other location where they can be fixed by us before adding to the entries (which won't help our already huge workload) — but I still think that conflicts and disagreements are something that happens, and something that human beings should be raised to deal with. Offence is taken, not given. Some people are rude; it may mean you don't want to deal with them very often; but most people in the world are not you. Usenet policed itself perfectly well long before the WWW, when there were no "moderators" or "administrators", just common sense and a majority who supported netiquette. I don't see incivility as a serious project problem unless it's abuse and threats. Equinox 21:02, 6 June 2014 (UTC)
@Chuck Entz — The choice of the description "extreme incivility" was the result of my initial thought that "go hang" was an instruction to die rather than an idiom roughly equivalent to "take a hike." I went ahead and changed the section title above.
If I'd been catty in my initial request to Spinningspark, I could understand receiving a defensive response. But I wasn't catty. I didn't complain about having to finish what he'd started. I thanked him for taking the time to collect cites and politely requested that in the future he remember to format cites before putting them on the cites page.
This was the first time I ever asked Spinningspark to format cites. I walked into this unaware that there is apparently a history of Spinningspark being asked to format cites. So I didn't know that my request might open an old wound, as it were. Being one of the relatively small pool of editors who regularly do legwork at RfV, I will admit that I've occasionally experienced a sense that the important contributions I'm making toward the project are going unnoticed. But I'm still receptive to polite requests to do things differently, and if I don't agree with a request, I try not to take it personally, and state my objections in a non-confrontational manner.
One of the concerns raised about Luciferwildcat was that he didn't properly format the entries he created, and this meant more work for someone else. This drove home to me that "if you're going to do something, do it right" is a philosophy by which Wiktionary operates. I don't think Spinningspark should be given a free pass in this regard. No, I'm not suggesting that Spinningspark is under any obligation to gather or format cites if he does not wish to do so, but if you are going to post citations in an entry or on a citations page, you are obligated to do it properly. Asking someone to correctly format cites when posting them to a citations page is not an "unreasonable or unfair demand."
And, frankly, I find it troubling that you would request we refrain from calling out Spinningspark on the grounds it might drive away a valuable contributor, and yet not take into consideration that having to endure hostility might also drive away valuable contributors. Do I really need to point out that I am also a volunteer and am under no obligation to "grow a pair" as Vahag suggested above and tolerate hostile treatment? -Cloudcuckoolander (talk) 17:36, 7 June 2014 (UTC)
I think we all feel like our valuable contributions go unnoticed. How often do you see any contributor here thanking another for anything? Maybe it's because of the small size or relative wonkiness of this community, but we don't really have the culture of appreciation that Wikipedia tends to have. There are some rough edges here, and there are likely to continue to be, but all that said, we have managed to build one hell of a dictionary. bd2412 T 18:02, 7 June 2014 (UTC)

Translingual entries and Chinese entries in the new format[edit]

We have tens of thousands of Mandarin, Cantonese, etc. entries that lack definitions and use ===Hanzi=== or ===Han character=== headers with requests for definitions, whereas the Translingual entries have imported definitions, which are rather generic and vague and, naturally, lack part-of-speech information.

As a first step, I suggest merging all or most single-character, definitionless Chinese (i.e. currently Cantonese, Mandarin, Hakka, Min Nan, Wu) entries, using this entry (see this revision) as an example. The latest edits removed the ==Cantonese== and ==Mandarin== sections, which contained nothing particularly useful (only requests for definitions, plus transliterations, which are not lost); created a ==Chinese== section; and merged the definition requests (with {{defn}}) into one language, so all definition requests would now be in Category:Chinese definitions needed. If this format is accepted, perhaps a bot could bring all similar entries into this format (or a similar one, depending on which readings/topolects are available and whether Old Chinese and Middle Chinese information can be obtained). Please comment (hopefully without trolling about "destruction of Sinitic languages"; we are not destroying anything). Another point: the example entry uses the ===Definitions=== header, which is still under discussion on the May 2014 page. Since not only the definition but also the part of speech is unknown, ===Definitions=== may be the best choice in this case.

@Wyang: please join. --Anatoli (обсудить/вклад) 03:51, 6 June 2014 (UTC)

And this is an example of a single-character entry WITH definitions - this revision of . @Bumm13: I have edited just after you, please check. --Anatoli (обсудить/вклад) 04:17, 6 June 2014 (UTC)
The principle seems good to me. The technical side could be very challenging after 12 years of scattered edits. Wyang (talk) 07:27, 6 June 2014 (UTC)
Understood; partial recreation of entries may be required. Entries without definitions and sometimes with erroneous transliterations have little value. --Anatoli (обсудить/вклад) 08:08, 6 June 2014 (UTC)
Spot-checking a few of the approximately 30 000 Chinese-character entries which use {{defn}} leads me to suspect that even after all these years, a large number of them have only been edited by bots making changes to all of them en masse. So, don't be discouraged from trying to sic a bot of your own on them ... the percentage of them that follow a predictable format and thus can be cleaned up by bot may be larger than you expect. :) - -sche (discuss) 05:17, 9 June 2014 (UTC)

Tone and register used in Wiktionary content[edit]

On my talk page here, another editor complained that we should use "whom" because it's "correct", which I disagree with for the reasons I gave in the discussion. But I wonder what others think of this. We have occasionally had complaints that definitions are terse and complicated, or use words no longer in normal use (thou is notorious), and they can't easily be understood. So I think that we should avoid using language that is no longer widely used in speech. —CodeCat 10:43, 7 June 2014 (UTC)

Thou is in a different class from whom. Whereas "thou" is genuinely archaic and is never used except when one desires to be deliberately archaic, "whom" is still used somewhat extensively in formal contexts. Just my 2p. If I may not take part in this discussion, I shall abstain from doing so. Velociraptor888 10:49, 7 June 2014 (UTC)
While I agree that "for who" is not an error, it is less formal and therefore less appropriate than "for whom" in the context of a dictionary definition. Although we are a descriptive and not prescriptive dictionary in our scope, our readers have a right to expect a certain degree of formality and adherence to the prescriptive rules of edited written English in our content. —Aɴɢʀ (talk) 11:27, 7 June 2014 (UTC)
We have every right to be prescriptive for ourselves, and I completely agree that we should use formal language (including whom) in our definitions (but not necessarily usage examples). --WikiTiki89 14:40, 7 June 2014 (UTC)
Formality, yes; words [ie, lemmas] less frequent than the top 30,000 or so of current English, not unless the definiens itself is technical. DCDuring TALK 16:33, 7 June 2014 (UTC)
The overall frequency of whom is irrelevant. What matters is its frequency in the place where you would expect it, which in informal settings is probably nearly 0%, but in formal settings I'd expect it to be around 90%. --WikiTiki89 17:25, 7 June 2014 (UTC)
I've never used it, no matter what the context. It feels very pretentious and unnatural to me. —CodeCat 17:31, 7 June 2014 (UTC)
Yes, but you live in the Netherlands. --WikiTiki89 19:28, 7 June 2014 (UTC)
So when you have a sentence that would be ungrammatical if it used who (e.g., "an unrepentant criminal on whom the court imposes an additional penalty"), do you just make it ungrammatical, or do you rephrase it to avoid whom? —RuakhTALK 00:11, 8 June 2014 (UTC)
I use who. Although I think I would rephrase it anyway, not to avoid whom but just because it sounds more natural: "an unrepentant criminal (that/which) the court imposes an additional penalty on". whom just isn't in my normal vocabulary at all, it feels more or less like an archaic synonym for who, and so who doesn't strike me as ungrammatical in the slightest. It's simply my own feel for the language as a native speaker; it's how I learned to speak and have always spoken and been spoken to in English. —CodeCat 01:37, 8 June 2014 (UTC)
Then you should consider editing your user-page to remove your claim that you're a native speaker. I do not think *" [] criminal on who the court [] " is English. (And I think that this use of "which" is actually archaic, unlike "whom" which is merely formal; but I'm not sure, it may just be dialectal.) —RuakhTALK 07:18, 9 June 2014 (UTC)
That sounds like a w:No true Scotsman argument to me. —CodeCat 10:02, 9 June 2014 (UTC)
Well, sure. English is defined as the native language of native English speakers, and native English speakers are defined as those who have English as their native language. Your English is excellent, but if you've internalized a rule that it's always grammatical to substitute "who" for "whom" without making any other changes, then I believe you are, in that respect, at variance with native speakers. (Of course, English has many dialects, and your difference from an actual dialect seems to be less than many dialects' differences from each other. I'm not saying "OMG yur t3h suck", I'm just saying that I think you're wrong about this point of usage.) —RuakhTALK 17:55, 9 June 2014 (UTC)
I consider myself a native speaker of Russian. However, since I did not grow up in Russia (but in the US), many things that sound outdated or archaic to me, I am often surprised to find are actually still in common use in Russia. This does not mean I'm not a "true" native speaker, but just that I grew up in a different environment and therefore cannot always judge what is archaic or not in Russia itself. --WikiTiki89 13:42, 9 June 2014 (UTC)
That's fair. I recently had a similar realization for Hebrew. (Though unlike you and CodeCat, I've chosen not to list my Hebrew as "native" on my user-page.) —RuakhTALK 17:55, 9 June 2014 (UTC)
I think the issue isn't whether we should use "who" or "whom" in entries, but whether you should be reverting someone for making what you consider to be the wrong choice. Sure, I've reverted edits that added an obsolete 16th-century term to a definition- but that's because it would make it harder for an ordinary person to understand. "Whom" is different: it's still in use, especially in more prescriptive contexts. People may not identify with those who use it, and they may not remember the rules about when to use it, but they still understand it just fine.
With all of the truly awful stuff that gets added to entries, I'm simply not going to bother with borderline issues like this. If someone wants to replace "who" with "whom", or vice versa, I leave them alone unless it looks like it's going to degenerate into an edit war (sometimes I'll revert pondian color/colour edits for that reason). The same goes for hyphens vs. dashes, ending sentences with prepositions, etc. Unless it's going to make it harder to understand, or it's going to make many of our readers cringe (e.g. possessive "it's"), I let people do what they think is best. Chuck Entz (talk) 20:07, 7 June 2014 (UTC)
Re who/whom: These ngrams suggest whom is still dramatically more common than who as the objective case of who following a preposition. Anecdotally, I've often seen objective-case use of "who" deprecated, while I've never until now heard anyone oppose "whom". Therefore, I'd say that while contributors can create entries with whichever of "who" vs "whom" they personally feel more comfortable with, I wouldn't revert someone who cleans up the "who"s to "whom"s. PS, "whom" passes DCDuring's test: oxforddictionaries.com lists it as one of the top 1000 words in its corpus, and in the Corpus of Contemporary American English it's the 1021st most common word. - -sche (discuss) 20:21, 7 June 2014 (UTC)
I'm not surprised. I even feel that "The person to who it was given" sounds really awkward since a formal context would use "whom" and an informal context would use a dangling preposition. --WikiTiki89 06:56, 8 June 2014 (UTC)
In the edited works at Google N-gram "who it was given to" does not appear whereas "whom" variations with and without dangling "to" do. In speech I'd expect the absent version to be more common.
In any event "whom" would seem to be required to give the impression that Wiktionary definitions are written in English. DCDuring TALK 12:12, 8 June 2014 (UTC)
Steven Pinker mentions whom specifically in his book The Language Instinct. He says it's a relict of a dying Germanic case system and shouldn't be mandatory. My words now not his, but try objecting to someone who doesn't use the -est ending for second-person singular forms like sayest and talkest. They'll probably say that's ridiculous, to which you say it's the same principle just -est forms died out longer ago. Renard Migrant (talk) 10:29, 9 June 2014 (UTC)
I must disagree with Pinker. Are we to similarly discard him, them, her, me? ‑‑ Eiríkr Útlendi │ Tala við mig 20:42, 9 June 2014 (UTC)
I have learned that whom is BE, and AE uses who exclusively. -- Liliana 11:21, 9 June 2014 (UTC)
AE uses whom in formal literary language, but who in spoken language. Usually when someone uses whom in spoken language, they are being humorous. —Stephen (Talk) 12:02, 9 June 2014 (UTC)
I doubt there is much actual difference between American and British English on this, despite common perception (but I may be wrong). --WikiTiki89 13:42, 9 June 2014 (UTC)
Whom is used in spoken American English, but not in some situations, especially where who/whom is separated from the preposition or verb of which it is the object. Usages such as "Who did you see?" and "Who did you give it to?" are very common in spoken American English and not rare in written American English. I suspect that other varieties of English show a similar pattern, though frequency may differ. DCDuring TALK 16:38, 9 June 2014 (UTC)
As another anecdotal point on the graph, my experience of US English matches DCDuring's. ‑‑ Eiríkr Útlendi │ Tala við mig 20:42, 9 June 2014 (UTC)
MWDEU devotes 2 full pages to use of who and whom. It notes object use of who and subject use of whom. They cite Shakespeare for instances of all of the four main uses. They specifically say that there is no evidence of the decline of whom in written English. I don't think that Garner's (2009) is entirely accurate, but they strongly defend objective whom and nominative who, except in intentionally casual writing. I commend any good descriptive style book or grammar from Jespersen and Mencken to the present for details on use in both speech and writing. DCDuring TALK 23:52, 9 June 2014 (UTC)
My experience in CanE is similar to DCD’s.
Let’s avoid any construction that would look like an error to readers. It’s distracting, possibly confusing, and probably harmful to readers’ confidence in the dictionary’s reliability. Please use whom where it is conventionally used in formal writing. Michael Z. 2014-06-16 20:53 z

Module:category tree[edit]

I created this module to replace {{catboiler}}. Right now it does exactly the same as the template did, so nothing has changed when it comes to creating new category names for existing templates like {{poscatboiler}}. The module just uses the existing subtemplates, there is no data module yet. So for most editors nothing has really changed, but I just wanted to let them know anyway. —CodeCat 22:39, 9 June 2014 (UTC)

boldfaced forms of invariant lemmata in headword lines[edit]

(Already mentioned at module talk:headword; now I'm bringing it here, where it belongs.)

At [[bonefish]], {{en-noun|bonefish|bonefishes}} displays

''plural'' <span class="form-of lang-en plural-form-of"><b class="Latn" lang="en"><strong class="selflink">bonefish</strong></b></span>

rather than

''plural'' <span class="form-of lang-en plural-form-of"><b class="Latn" lang="en">bonefish</b></span>

as it (IMO) should, because it 'tries' to link to the plural. I propose this be changed. Thoughts? (I guess function format_parts can be modified somehow to fix this.)​—msh210 (talk) 17:38, 11 June 2014 (UTC)

As I mentioned to you in the talk page, it's not the module that is responsible for the extra "selflink" tag, it's the wiki software itself. It does that whenever you link to the current page, like here: Wiktionary:Beer parlour/2014/June. —CodeCat 17:45, 11 June 2014 (UTC)
Hrm... doesn't adding an anchor like "#English" stop the software from thinking that it is in fact linking to the same [exact place on the] page, like here (where only the last link is bolded)? I presume headword templates add such anchors, so er... why are the links still bolded? (And why is it a problem that the invariant plural is bolded, when the '-es' plural is also bolded? I think either bolding all plural forms or bolding none of them is good. Am I just totally misunderstanding what's being discussed here?) - -sche (discuss) 18:06, 11 June 2014 (UTC)
It does, but that anchor is not added if the final parameter to full_link is provided. —CodeCat 18:22, 11 June 2014 (UTC)
Well, yeah, of course. But it's the module that links! I'm proposing that it not do so when the form is the same as the lemma.​—msh210 (talk) 23:05, 11 June 2014 (UTC)
Why does it make a difference? The software already handles links that point to the current page. —CodeCat 23:09, 11 June 2014 (UTC)
Because there's an extra <strong>, which looks awful (bonefish (plural bonefish)).​—msh210 (talk) 04:52, 12 June 2014 (UTC)
Aha, after some testing I can see what you mean. Those two instances of "bonefish" are identical on my computer (Windows 7) if I use Firefox v30 or Opera v12, but I can see the difference if I use Internet Explorer v11. - -sche (discuss) 06:16, 12 June 2014 (UTC)
They look very different on Firefox for me: file:Bonefish msh210.png.​—msh210 (talk) 06:39, 12 June 2014 (UTC)
For me, the second "bonefish" (when it is different from the first, i.e. when I use Internet Explorer) is bigger/bolder, but not underlined. How odd that things would display so differently not just from browser to browser but from user to user! - -sche (discuss) 14:32, 12 June 2014 (UTC)
There are more than two levels of font weight. I remember reading somewhere about Firefox having different font weights for <strong> and <b>. Keφr 21:55, 12 June 2014 (UTC)
Would this be solved by specifying this in the CSS?
b strong { font-weight: inherit; }
—This unsigned comment was added by CodeCat (talkcontribs) at 22:51, 12 June 2014 (UTC).
If there's a class used for all headword lines, then .classname strong.selflink{font-weight:inherit} would probably work. And it'd not inadvertently affect other places b strong may appear.​—msh210 (talk) 04:59, 13 June 2014 (UTC) Stricken.​—msh210 (talk) 07:11, 15 June 2014 (UTC)
Fixing an HTML issue with a CSS hack seems like a bad idea. Why not just test to see if the plural is the same word as the current page and if so don't make it a link? Kaldari (talk) 08:50, 14 June 2014 (UTC)
I agree.​—msh210 (talk) 07:11, 15 June 2014 (UTC)
On the contrary. The issue isn't even in the HTML, as "strong" nested within "b" is perfectly fine as far as HTML is concerned. Rather, the problem is how that combination is displayed, which is what CSS is supposed to take care of. Trying to fix this issue by changing the HTML is a bit like breaking down a wall because you don't like its colour. The wall wasn't the problem, the paint was. —CodeCat 23:36, 17 June 2014 (UTC)
I'm not sure I follow that logic. Isn't <i> nested within <b> fine as far as HTML is concerned? Yet if you don't want any of the bold text on your site to be italic, the solution is not to use CSS to change how <b><i>foo</i></b> displays; it's to drop the <i> from the places where it occurs inside (or outside) <b>. - -sche (discuss) 01:13, 18 June 2014 (UTC)
Actually no, in your example that is what you'd do. That is why CSS exists: to separate presentation from the underlying content. In other words, the content should not contain information about how to display it. That's a very fundamental HTML principle which was not strongly enforced when HTML was new, but is now much more rigid. Hence, in the modern HTML5 interpretation, <b> doesn't actually mean "bold", even though it is commonly used to bold text. It's perfectly valid to make text in that element not display bolded. See this link for an explanation of what these elements actually mean. —CodeCat 01:21, 18 June 2014 (UTC)
According to that page, we don't want inflected forms that are the same as their lemma forms to get <strong>: <strong> is for more emphasis/importance (than, in this case, the lemma form has), which we have no reason to prescribe for the inflected form. It just makes sense for the module to check whether the inflected form matches the lemma form and, if so, not generate a link. —msh210 (talk) 07:29, 18 June 2014 (UTC)
Then that should be addressed to the MediaWiki developers. They are the ones that decided that "strong" would be appropriate to indicate a link to the current page. We shouldn't try to work around that, because we would never be able to catch all cases anyway. —CodeCat 16:47, 19 June 2014 (UTC)
Or we can fix Module:links (lines 214-216) so that it does not replicate this behaviour. Keφr 18:04, 19 June 2014 (UTC)
But we need to replicate it. Otherwise links to the current page won't show in bold. —CodeCat 18:24, 19 June 2014 (UTC)
The above discussion suggests the opposite. Keφr 18:30, 19 June 2014 (UTC)
The above discussion is only about headword templates. The code you referred to is used in many other places too. —CodeCat 19:03, 19 June 2014 (UTC)
Personally, I would get rid of the boldface in inflection tables too. I think it might be quicker to list places where boldface on self-links is desirable, actually. How about changing language_link so that the bolding would depend on the link face? (By which I mean whatever is currently passed as the face argument to full_link.) Keφr 19:08, 19 June 2014 (UTC)
That's probably the best solution. —CodeCat 21:14, 19 June 2014 (UTC)
If we choose to change it only in headword lines, then the fix would be a conditional in the definition of part in function format_parts in module:headword, I think.​—msh210 (talk) 07:10, 20 June 2014 (UTC)
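A minimal sketch of the conditional msh210 describes, written in Python rather than Lua purely for illustration. `format_part` is a hypothetical stand-in for one step of `format_parts` in Module:headword, and the plain string comparison with the page title is an assumption (the real module might need to normalise underscores or other title quirks first).

```python
def format_part(form: str, page_title: str, lang_name: str = "English") -> str:
    """Return the wikitext for one inflected form in a headword line.

    Hypothetical stand-in for a step of format_parts in Module:headword.
    """
    if form == page_title:
        # An invariant form (e.g. plural "bonefish" on [[bonefish]]) is left
        # unlinked, so MediaWiki never wraps it in <strong class="selflink">.
        return form
    # Any other form links to its language section on its own page.
    return f"[[{form}#{lang_name}|{form}]]"


print(format_part("bonefish", "bonefish"))    # bonefish
print(format_part("bonefishes", "bonefish"))  # [[bonefishes#English|bonefishes]]
```

Keφr's alternative (making the bolding depend on the link face in language_link) would move the same test into the shared link code instead of the headword code.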

A small improvement for languages with the same name

We currently have a variety of ways to disambiguate language names so that they're unique, but they may not always help people find the language they're looking for. So what if we created categories like Category:Buli language, and categorise all languages called Buli in there? @-sche: I think this would interest you. —CodeCat 22:45, 12 June 2014 (UTC)

I think that's a good idea. In addition to languages that are distinguished by parenthetical disambiguators, like Buli (Ghana) and Buli (Indonesia), this could be especially helpful for languages that are distinguished by prepended family info, like Austronesian Gimi vs Papuan Gimi. Someone who comes here knowing that a language is called Gimi might look in Category:All languages under 'G', or might start typing "Category:Gim..." into our search bar; right now the search suggestion function won't find anything to suggest, but if there were a Category:Gimi language, it would. (For languages that are distinguished by the use of alternate names, this method isn't possible; they will just have to keep cross-linking via {{also}}.) - -sche (discuss) 02:40, 13 June 2014 (UTC)
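A rough sketch of how such shared-name categories could be derived from canonical language names, assuming the two disambiguation styles mentioned above (a trailing parenthetical, or a prepended family name). The `FAMILY_PREFIXES` list and both helper names are hypothetical; as noted, languages distinguished only by alternate names cannot be grouped this way.

```python
import re
from collections import defaultdict

# Hypothetical list of family names that prefix some canonical names,
# e.g. "Austronesian Gimi" vs "Papuan Gimi".
FAMILY_PREFIXES = ("Austronesian ", "Papuan ")


def base_name(canonical: str) -> str:
    """Strip a parenthetical or family-name disambiguator from a language name."""
    # "Buli (Ghana)" -> "Buli"
    name = re.sub(r"\s*\([^)]*\)$", "", canonical)
    # "Papuan Gimi" -> "Gimi"
    for prefix in FAMILY_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def shared_name_categories(canonical_names):
    """Map "Category:X language" to every language whose base name is X."""
    groups = defaultdict(list)
    for canonical in canonical_names:
        groups[base_name(canonical)].append(canonical)
    # Only names shared by two or more languages need a grouping category.
    return {
        f"Category:{base} language": sorted(members)
        for base, members in groups.items()
        if len(members) > 1
    }


print(shared_name_categories(
    ["Buli (Ghana)", "Buli (Indonesia)", "Austronesian Gimi", "Papuan Gimi", "Dutch"]
))
```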

Using ISO 639-3 private use codes for custom language families

A little while ago I mentioned in passing (I don't remember where) that it might be good to change the way we devise codes for language families lacking one. I suggested using the ISO 639-3 private use area for this. Private use codes are in the range qaa-qtz, so there would be 520 codes for us to use, which I imagine is plenty if we use them for language families alone. The main reason I propose this is so that we can avoid really long 9-letter codes like "ine-bsl-pro" for Proto-Balto-Slavic, or "qfa-kor-jjm" for Jeju. If the family is at most 3 letters, then the whole code will never be more than 6, which is a bit more manageable.

I propose the following for our existing "exceptional" family codes:

Name Old New
Admiralty Islands poz-aay qai
Anatolian ine-ana qan
Andamanese qfa-adm qad
Arabic sem-arb qar
Aramaic sem-ara qam
Arandic aus-rnd qac
Araucanian qfa-ara qau
Arnhem aus-arn qah
Atayalic map-ata qal
Aymaran sai-aym qay
Bahnaric aav-ban qba
Balto-Slavic ine-bsl qbs
Bantoid nic-bod qbd
Benue-Congo nic-bco qbc
Borneo-Philippines poz-bop qbp
Brythonic cel-bry qbr
Bungku-Tolaki poz-btk qbt
Bunuban aus-bub qbn
Burmish tbq-brm qbm
Canaanite sem-can qca
Cariban sai-car qcb
Central Chadic cdc-cbm qcc
Central New South Wales aus-cww qcn
Central Semitic sem-cen qcs
Central-Eastern Oceanic poz-occ qco
Chapacuran qfa-cpc qch
Chinookan nai-ckn qci
Chukotko-Kamchatkan qfa-cka qck
Chumashan nai-chu qcm
Daly aus-dal qdl
Dardic iir-dar qda
Dogon qfa-dgn qdg
Dyirbalic aus-dyb qdy
East Barito poz-bre qeb
East Chadic cdc-est qec
East Semitic sem-eas qes
Edoid alv-edo qed
Eskimo esx-esk qek
Ethiopian Semitic sem-eth qet
Finisterre ngf-fin qfn
Finnic fiu-fin qfi
Finno-Permic fiu-fpr qfp
French Sign Languages sgn-fsl qfs
Frisian gmw-fri qfy
Fur ssa-fur qfu
Garawan aus-gar qgw
German Sign Languages sgn-gsl qgs
Goidelic cel-gae qga
Grassfields nic-grf qgf
Guahiban qfa-gua qgh
Guaicuruan sai-gua qgc
Gunwinyguan aus-gun qgy
Gur nic-gur qgu
Halmahera-Cenderawasih poz-hce qhc
Hurro-Urartian qfa-hur qhu
Inuit esx-inu qiu
Iwaidjan aus-wdj qia
Iwam paa-iwm qiw
Japanese Sign Languages sgn-jsl qjs
Jivaroan qfa-jiv qjv
Jê sai-jee qje
Kadu qfa-kad qka
Kaili-Pamona poz-kal qkp
Kainantu-Goroka paa-kag qkg
Kainji nic-knj qkj
Karnic aus-kar qkr
Keresan qfa-ker qks
Kiowa-Tanoan qfa-kta qkt
Korean qfa-kor qko
Kukish tbq-kuk qkk
Kwa alv-kwa qkw
Kx'a qfa-kxa qkx
Lakes Plain paa-lkp qlp
Lampungic poz-lgx qla
Left May qfa-mal qlm
Lencan qfa-len qln
Macro-Chibchan qfa-mch qmh
Macro-Jê sai-mje qmj
Maiduan nai-mdu qmd
Malayic poz-mly qml
Malayo-Chamic poz-mcm qmc
Malayo-Sumbawan poz-msa qms
Masa cdc-mas qma
Mascoian qfa-mas qmo
Mataco-Guaicuru qfa-mgc qmg
Matacoan qfa-mtc qmt
Mbum alv-mbm qmu
Micronesian poz-mic qmn
Mien hmx-mie qmi
Misumalpan qfa-min qmm
Mixe-Zoquean nai-miz qmz
Mixtecan omq-mix qmx
Muna-Buton poz-mun qmb
Muran sai-mur qmr
Muskogean qfa-mus qmk
Nahuan azc-nah qnu
Nambikwaran sai-nmk qnk
New Caledonian poz-cln qnc
Ngayarda aus-nga qny
North Athabaskan ath-nor qna
North Bahnaric aav-nbn qnh
North Bornean poz-bnn qnb
North Sarawakan poz-swa qnw
North-Central Vanuatu poz-vnc qnv
Northeast Caucasian cau-nec qkc
Northwest Caucasian cau-nwc qpc
Northwest Semitic sem-nwe qns
Northwest Sumatran poz-nws qnm
Nyulnyulan aus-nyu qnn
Oceanic poz-oce qoc
Ok ngf-okk qok
Old South Arabian sem-osa qoa
Pacific Coast Athabaskan ath-pco qpk
Palaihnihan qfa-pal qph
Pama-Nyungan aus-pam qpn
Paman aus-pmn qpm
Pano-Tacanan qfa-pat qpt
Panoan qfa-pan qpa
Polynesian poz-pol qpl
Pomoan nai-pom qpo
Sabahan poz-san qsh
Sahaptian nai-shp qsp
Saluan-Banggai poz-slb qsb
Sama-Bajaw poz-sbj qsj
Savanna alv-sav qsv
Senegambian alv-sng qsg
Sepik paa-spk qse
Siouan-Catawban qfa-sca qsc
Sko paa-msk qsk
South Arabian sem-sar qsa
South Bird's Head ngf-sbh qbh
South Semitic sem-sou qsm
South Sulawesi poz-ssw qss
Southeast Solomonic poz-sls qsl
Southwest Pama-Nyungan aus-psw qsn
Southwestern Tai tai-swe qst
substrate qfa-sub qsu
Sunda-Sulawesi poz-sus qsi
Tacanan qfa-tac qta
Tai-Kadai qfa-tak qtk
Tocharian ine-toc qto
Tomini-Tolitoli poz-tot qtt
Torricelli qfa-tor qtc
Tucanoan qfa-tuc qtn
Tuu qfa-tuu qtu
Tyrsenian qfa-tyn qty
Ubangian nic-ubg qbg
Ugric fiu-ugr qgr
Vietic mkh-vie qfv
Volta-Congo nic-vco qcv
Volta-Niger alv-von qng
West Barito poz-brw qbw
West Chadic cdc-wst qcw
West Semitic sem-wes qsw
Western Oceanic poz-ocw qow
Wintuan qfa-wtq qin
Wotu-Wolio poz-wot qqw
Xincan qfa-xin qic
Yeniseian qfa-yen qey
Yidinyic aus-yid qiy
Yok-Utian qfa-you qou
Yolngu aus-yol qoy
Yuin-Kuric aus-yuk qky
Yukaghir qfa-yuk qqy
Yuman-Cochimí nai-yuc qcy
Zaparoan qfa-zap qrz
Zapotecan omq-zap qpz

(I don't know how to make the table collapsible so that it takes up less space. If you know, please edit my post.) —CodeCat 23:10, 15 June 2014 (UTC)
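The figures in the proposal can be checked directly: the local-use range qaa-qtz allows 20 second letters (a to t) times 26 third letters, i.e. 520 codes, and a proto-language code is the family code plus "-pro". A small sketch, with hypothetical helper names:

```python
from string import ascii_lowercase


def private_use_codes():
    """Every ISO 639 code in the local-use range qaa-qtz."""
    second_letters = ascii_lowercase[: ascii_lowercase.index("t") + 1]  # a..t
    return [f"q{b}{c}" for b in second_letters for c in ascii_lowercase]


def proto_code(family_code: str) -> str:
    """Proto-language code: the family code plus "-pro"."""
    return f"{family_code}-pro"


def letter_count(code: str) -> int:
    """Count letters only, ignoring the hyphens."""
    return sum(ch.isalpha() for ch in code)


codes = private_use_codes()
print(len(codes), codes[0], codes[-1])  # 520 qaa qtz

# Proto-Balto-Slavic under the current scheme and under the proposal
for family in ("ine-bsl", "qbs"):
    print(proto_code(family), letter_count(proto_code(family)))
```

Against the same ceiling, the estimate in the reply below (roughly four times the 167 families in the table, about 668 codes) would exceed the 520 available.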

The previous discussion was at the end of this GP thread from April. As I did then, I oppose this now because I don't think it's workable. The ISO has been relatively stingy when it comes to granting codes to families and subfamilies, and our own Module:families/data is (even after I started working on it) sadly incomplete. In the GP thread I said that, as a ballpark guess, I'd expect us to end up with maybe four times as many exceptional (non-ISO) family and subfamily codes as we have now, by the time Module:families is 'complete'. In particular, our treatment of African, Asian and American languages is often coarse; we're lacking a lot of subfamilies. And I wasn't even thinking of sign language families at the time, but there are probably dozens of those that don't have ISO codes. The list above contains 167 families. If my estimate is correct, we'll end up needing ~660 codes, which means we'd run out of possible codes. Moreover, long before that happened, we'd run out of codes that were memorable approximations of their families' names. Just looking at the last few codes in the list, I see that due to the restriction on codes higher than "qtz", you've already had to resort to things like "qqy" for Yukaghir, "qrz" for "Zaparoan", "qpz" for "Zapotecan", "qky" for "Yuin-Kuric", etc. - -sche (discuss) 00:14, 16 June 2014 (UTC)
There is no disadvantage for us to have more than three characters in a language code. In fact, it makes it easier to come up with more meaningful and memorable codes. --WikiTiki89 00:36, 16 June 2014 (UTC)
Using non-standard language codes invalidates the HTML of our pages. Private-use extension codes are okay, of course, because they are standardized.
And I agree that longer codes might be better. Instead of Yukaghir = qaa, why not use the BCP 47 private-use subtags, allowing, e.g., Yukaghir = x-yuk or x-yukaghir? Michael Z. 2014-06-16 20:14 z
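To make the comparison concrete, here is a sketch of a check for the two standalone private-use forms discussed: a bare qaa-qtz language subtag, or an "x-" tag whose subtags are 1 to 8 letters or digits (as in RFC 5646). It is deliberately not a full BCP 47 validator, and the function name is hypothetical.

```python
import re

# Private-use primary language subtags reserved by RFC 5646 (from ISO 639): qaa-qtz
PRIVATE_LANGUAGE = re.compile(r"q[a-t][a-z]", re.IGNORECASE)
# Private-use tag: "x" followed by one or more subtags of 1-8 letters or digits
PRIVATE_USE = re.compile(r"x(-[a-z0-9]{1,8})+", re.IGNORECASE)


def is_private_use_tag(tag: str) -> bool:
    """True if the whole tag is a bare qaa-qtz subtag or an x- private-use tag.

    Only these two standalone forms are checked; this is not a full
    BCP 47 validator.
    """
    return bool(PRIVATE_LANGUAGE.fullmatch(tag) or PRIVATE_USE.fullmatch(tag))


for tag in ("qqy", "x-yuk", "x-yukaghir", "qua", "x-yukaghirs"):
    print(tag, is_private_use_tag(tag))
```

With these patterns, qqy, x-yuk and x-yukaghir pass, while qua (outside qaa-qtz) and x-yukaghirs (a nine-character subtag) do not.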
I think we should do this only for the roots, not for the branches. Using a q code for something like Benue-Congo is just wasting a limited resource, IMO, especially since some of the entries in your table are bogus (several of the subdivisions of poz are figments of Blust's questionable methodology, for instance).
We should concentrate first on the codes starting with dummy families such as qfa and und; those are the main source of the unwieldy and non-mnemonic clutter you're talking about. Next we might think about regional ones like aus, nai and sai, but they do have a little bit of mnemonic value, even though they have no linguistic merit at all.
Any time we have a three-letter code, everything below it on the tree should start with that code, unless it has its own ISO code, in which case everything below it should have the ISO code. There may be a few cases such as nic and alv where we may decide to make an exception, but that should be the general rule. Chuck Entz (talk) 02:09, 16 June 2014 (UTC)
Hmm, I had considered this idea myself, of giving q__ codes only to top-level families that currently have qfa-___ codes. It's probably workable, but it wouldn't change much. At present, there are 38 codes with "qfa-" prefixes, and only some of their proto-languages' codes would get shorter: "Pano-Tacanan" is currently "qfa-pat", its subfamily "Tacanan" is "qfa-tac" (NB not "qfa-pat-tac"), and their associated proto-languages would be "qfa-pat-pro" and "qfa-tac-pro" (not "qfa-pat-tac-pro") if they existed; if "Pano-Tacanan" were "qpt", its proto-language code would be "qpt-pro", but "Tacanan" would still be nine letters, as "qpt-tac-pro". - -sche (discuss) 20:22, 16 June 2014 (UTC)
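A sketch of the inheritance rule described above (a family without its own three-letter code takes the code of its nearest coded ancestor plus a local suffix), applied to the Pano-Tacanan example. The tree fragment, the proposed "qpt" code, and the function names are illustrative assumptions; it assumes every root family has its own code.

```python
# Hypothetical slice of the family tree:
# name -> (parent, own three-letter code or None, local suffix or None)
FAMILIES = {
    "Indo-European": (None, "ine", None),
    "Balto-Slavic": ("Indo-European", None, "bsl"),
    "Pano-Tacanan": (None, "qpt", None),  # proposed q-code
    "Tacanan": ("Pano-Tacanan", None, "tac"),
    "Panoan": ("Pano-Tacanan", None, "pan"),
}


def family_code(name: str) -> str:
    """Own code if the family has one, else nearest coded ancestor + suffix."""
    parent, code, suffix = FAMILIES[name]
    if code is not None:
        return code
    # Walk up until we reach an ancestor with its own three-letter code.
    # Assumes every root family has such a code.
    ancestor = parent
    while FAMILIES[ancestor][1] is None:
        ancestor = FAMILIES[ancestor][0]
    return f"{FAMILIES[ancestor][1]}-{suffix}"


def proto_language_code(name: str) -> str:
    return f"{family_code(name)}-pro"


for family in FAMILIES:
    print(family, family_code(family), proto_language_code(family))
```

As the reply notes, only the top level shortens: Pano-Tacanan's proto-language becomes qpt-pro, but Tacanan's is still qpt-tac-pro, nine letters.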

Naming scheme for templatized usage notes

Sometimes, it's useful to put the same usage note on several entries. When that happens, the usage note is made into a template. However, we don't have a consistent naming scheme for such templates. Many start their names with language codes followed by 'note' or 'usage', like Template:he-usage-begedkefet and Template:de-note obsolete spelling. A few start with 'usage', like Template:usage less fewer and Template:usage ize. Template:U:Latin stop+liquid poetic stress alteration exists in a 'U:' pseudonamespace, apparently inspired by the 'R:' pseudonamespace that reference templates exist in. I actually quite like that last idea, especially if coupled with the use of language codes, as it groups all templatized usage notes together in Special:AllPages, just like the reference templates are grouped under 'R:'.
I suggest the following naming scheme for templatized usage notes: Template:U:[language code]:[brief identifier]. Template:de-note obsolete spelling would become Template:U:de:deprecated spelling (or similar); Template:usage ize would become Template:U:en:ize (or similar).
My second choice would be to fix the half-dozen outliers to use the [language code]-['note' or 'usage']-[identifier] format that it seems everything else uses.
Thoughts? - -sche (discuss) 05:45, 17 June 2014 (UTC) corrected a typo/thinko in my original post: switched from U:[langcode]- to U:[langcode]: (compare: the reference templates use a colon after the langcode, not a hyphen) - -sche (discuss) 19:12, 18 June 2014 (UTC)
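A small sketch of the proposed Template:U:[language code]:[brief identifier] scheme, in Python for illustration. The helper names are hypothetical, and the regular expression's idea of a language code (two or three lowercase letters, optionally followed by hyphenated three-letter parts) is an assumption, not Wiktionary's actual code validation.

```python
import re

# Assumed shape of a language code: 2-3 lowercase letters plus optional
# hyphenated 3-letter parts (e.g. "en", "grc", "ine-pro").
USAGE_TEMPLATE = re.compile(
    r"Template:U:(?P<lang>[a-z]{2,3}(?:-[a-z]{3})*):(?P<ident>.+)"
)


def usage_template_name(lang_code: str, identifier: str) -> str:
    """Build a title following the proposed U:[language code]:[identifier] scheme."""
    return f"Template:U:{lang_code}:{identifier}"


def parse_usage_template_name(title: str):
    """Return (language code, identifier), or None if the title does not fit."""
    match = USAGE_TEMPLATE.fullmatch(title)
    return (match["lang"], match["ident"]) if match else None


print(usage_template_name("en", "ize"))
print(parse_usage_template_name("Template:U:de:deprecated spelling"))
print(parse_usage_template_name("Template:usage ize"))
```

A parser like this would flag the remaining outliers, including titles such as Template:U:Latin stop+liquid poetic stress alteration, which uses a language name rather than a code.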

Support. We also need a category for these templates. — Ungoliant (falai) 13:26, 17 June 2014 (UTC)
I like the U:langcode: prefix approach, though, as always, I don't think that the langcode is a needless waste of keystrokes for langcode=en. It should dramatically increase the likelihood that one could find the template by typing something in the search box. DCDuring TALK 17:20, 17 June 2014 (UTC)
Support. The Hungarian usage templates can be found here: Category:Hungarian usage templates. --Panda10 (talk) 12:19, 19 June 2014 (UTC)
Oh, and it seems we even have a Category:Usage templates, we just need to be sure to use it(s subcategories) on all the templates. - -sche (discuss) 15:47, 19 June 2014 (UTC)
Using Special:AllPages, I found every template that had "usage" or "note" in its name. I've moved about half of them; these remain to be moved. I haven't categorized many of the templates into Category:Usage templates yet, but it should be easy to find all the templates that need categorizing now that they all begin with "U:" (or are listed here). - -sche (discuss) 21:24, 3 July 2014 (UTC)
Some more candidates: {{rank}}, {{season name spelling}}, {{who vs. whom}}, {{ga-analytic}}, {{oikein väärin}}, {{HTML char}}, {{Hiragana informal}}, {{he-begedkefet}}, {{el-freq-Google}}, {{el-freq}}, {{katakana-in-science}}, {{preferred IUPAC name}}, {{trademark erosion}}, {{sh-coll-link}}, {{arabdialect}}, {{1990}}, {{el-T-Vs}}, {{el-T-Vp}}, {{ja-kun-vs-on}}, {{be-у-ў}} DTLHS (talk) 02:11, 4 July 2014 (UTC)
Thanks for finding those. "1990"? Youch; as terrible names for usage-note-templates go, that one is exceptional... - -sche (discuss) 03:12, 4 July 2014 (UTC)

chapter in quote-book

I recently asked in the Grease Pit for what seemed a minor change in a template that would make a major improvement to the display of quotations from books (see here). It was about shifting the presentation of the chapter number from before the book title to after it. It seemed to me to be an important change of minor coding difficulty to anyone familiar with the templates and their encoding (I am familiar with neither).
Nothing has been done about it, and nobody has even commented on it. Should I have raised the matter here? Or in the Tea Room? ReidAA (talk) 07:25, 19 June 2014 (UTC)


IMO « “Soup from a Sausage Peg”, in The Snow Queen and Other Tales » looks good.​—msh210 (talk) 07:20, 20 June 2014 (UTC)

That's fine if the chapter has a name, but not if it only has a number, as in choc-a-bloc. And I don't understand why your example brings up the word in while mine doesn't. I've tried to find out about the internals of templates, but to call the documentation mind-boggling is a gross understatement. What I would appreciate is a quote-novel template, much like the quote-book one, that puts a chapter number (which the quote-book seems able to detect) after the title of the novel; it would also be useful to have a time parameter that can be used to convey when the story is set. ReidAA (talk) 06:47, 24 June 2014 (UTC)
I like the format at [[choc-a-bloc]]. In any event, you can do what I do and what I suspect most editors do: not use the templates, instead formatting the quotations as in [[Wiktionary:Quotations#How to format quotations]].​—msh210 (talk) 19:04, 24 June 2014 (UTC)
Hey, that's a great link. I only wish I had found it for myself long ago. But I'm not sure that using templates mightn't allow easier coding. What I would like to be able to do is to code those RQ templates (i.e., make my own), preferably without embedded template usage. Can you tell me where there is an explanation of how to code them?—ReidAA (talk) 01:12, 25 June 2014 (UTC)
I don't think there is. Do you know the basics of coding wiki templates? In that case, perhaps find one that does similarly to what you want yours to do and copy and modify it. Otherwise, ask for help coding one in this section of this page or in the Grease pit.​—msh210 (talk) 05:34, 25 June 2014 (UTC)
I fancy I could handle the coding, provided I could look at example template coding. My attempts to get to see one have been fruitless, though I have no difficulty looking at their documentation code because it's easy to pretend to be editing it. So how do I get to look at an example RQ template source code?—ReidAA (talk) 09:40, 25 June 2014 (UTC) Fixed indenting.​—msh210 (talk) 00:37, 26 June 2014 (UTC)
Go to any such template and click "Edit" or "View source" atop the page or add ?action=edit to the URL; e.g., http://en.wiktionary.org/wiki/Template:RQ:Hardy_Laodicean?action=edit.
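The same trick works for any page: MediaWiki's index.php accepts a title and an action, so the edit view (or, with action=raw, just the raw wikitext) can be reached directly. A minimal sketch; the helper name is hypothetical.

```python
from urllib.parse import quote


def page_action_url(title: str, action: str = "edit",
                    base: str = "https://en.wiktionary.org") -> str:
    """Build a MediaWiki index.php URL for a page and action."""
    # MediaWiki titles use underscores for spaces; keep ":" and "/" readable.
    slug = quote(title.replace(" ", "_"), safe=":/")
    return f"{base}/w/index.php?title={slug}&action={action}"


print(page_action_url("Template:RQ:Hardy Laodicean"))
# https://en.wiktionary.org/w/index.php?title=Template:RQ:Hardy_Laodicean&action=edit
```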

If you need to edit someone else's post on a discussion page, it's courteous to indicate you did so (as I've done to your last post, just above); I'd even say it's obligatory if (as was not the case here) the edit is substantive.​—msh210 (talk) 00:37, 26 June 2014 (UTC)

Hey, that link looks great! I think it's just what I needed. I presume I'll be able to copy that, edit it, and put it in the same location. Should I add a link to it to https://en.wiktionary.org/wiki/Category:Reference_templates or is that automatic? (I notice that your example doesn't seem to be there, and there doesn't seem to be a button to allow me to put a change in.)

As to my editing, all I meant to do was change the indentation so that more text would be visible at one time (see boldfaced forms of invariant lemmata in headword lines about 4 items above to see the kind of thing I was trying to avoid). If I did more it was accidental, for which my apologies. In fact, I think it would be preferable merely to alternate the message indentations, but in future I'll just go along with what seems to be the accepted practice.—ReidAA (talk) 11:09, 26 June 2014 (UTC)

It's not automatic.​—msh210 (talk) 07:38, 29 June 2014 (UTC)

Media Viewer is now live on this wiki



The Wikimedia Foundation's Multimedia team is happy to announce that Media Viewer was just released on this site today.

Media Viewer displays images in larger size when you click on their thumbnails, to provide a better viewing experience. Users can now view images faster and more clearly, without having to jump to separate pages — and its user interface is more intuitive, offering easy access to full-resolution images and information, with links to the file repository for editing. The tool has been tested extensively across all Wikimedia wikis over the past six months as a Beta Feature and has been released to the largest Wikipedias, all language Wikisources, and the English Wikivoyage already.

If you do not like this feature, you can easily turn it off, whether you have an account or not, by pulling up the information panel and clicking "Disable Media Viewer" at the bottom of the screen (or in your preferences). Learn more in this Media Viewer Help page.

Please let us know if you have any questions or comments about Media Viewer. You are invited to share your feedback in this discussion on MediaWiki.org in any language, to help improve this feature. You are also welcome to take this quick survey in English, French or Spanish.

We hope you enjoy Media Viewer. Many thanks to all the community members who helped make it possible. - Fabrice Florin (WMF) (talk) 21:54, 19 June 2014 (UTC)

--This message was sent using MassMessage. Was there an error? Report it!

To turn Media Viewer off (which you will probably want to, since it's incredibly annoying), go to Special:Preferences, select the Appearance tab, scroll down to Files, and unclick "Enable Media Viewer". If you don't have an account, I don't think you can turn it off, so you're just out of luck. Despite all appearances to the contrary, it's a feature, not a bug. —Aɴɢʀ (talk) 06:55, 20 June 2014 (UTC)
Oh, great. Another misfeature to turn off. But it is not that pictures are critical to us, anyway. Keφr 07:07, 20 June 2014 (UTC)

Category:Place names and topical subcategorisation in general

Category:en:Place names and its subcategories are presumably meant to contain place names in English. But among its various subcategories are categories like Category:en:United States of America. This is a place name itself, yes, but this category is intended and used for anything related to the US, not just places in the US. What bothers me here is that this category is nonetheless a subcategory of Category:en:Place names. I like to think that any subcategory is a strict subset of its parent category, so that any terms placed in a subcategory would be valid in the main category as well. What do other editors think of this principle? And if we should apply it, how would we do so here? There are other areas within the category tree where this applies too, like Category:en:Hydrology and Category:en:Snow being subcategories of Category:en:Liquids. —CodeCat 22:00, 19 June 2014 (UTC)

Place names is a lexical category. If it is to be subdivided further, it should probably not be by geography. We don’t have Category:en:Adjectives in the United States of America.
If Category:USA is supposed to categorize referents, i.e., the things represented by terms, then it doesn’t belong under Place names at all.
And in my opinion, it belongs in Wikipedia, not in Wiktionary. As long as we continue to categorize terms by qualities of their referents, there will continue to be such confusion among (1) lexical/grammatical categorization of terms, (2) technical/subject-field categorization of terms’ usage, and (3) encyclopedic categorization of things. Why should we put so much energy into creating a far lamer copy of Wikipedia’s categorization? Instead, let’s keep adding Wiktionary links to Wikipedia articles. Michael Z. 2014-06-27 18:22 z
@Mzajac: Are you saying that something like Category:nl:Days of the week does not belong on Wiktionary? I don't think I agree with that, it's a very useful category. —CodeCat 20:19, 29 June 2014 (UTC)
I see that this is a leaf in the branch » Dutch language » All topics » Nature » Time » Days of the week. I guess this relates to meanings, the way a thesaurus classifies words? I don’t even know if these topics relate to or overlap with the technical vocabulary categories that are applied using usage (“context”) labels. I do see it as a problem that this is not called Category:nl:Names of days of the week or Category:nl:Terms for days of the week, because our entries represent terms, not their referents.
But if that is useful, aren’t the following also: Category:nl:Colours of the rainbow, Category:nl:Apostles of Christ, Category:nl:Blackletter fonts, or Category:nl:Dog breeds that are good with children? Michael Z. 2014-06-29 23:56 z
Presumably yes, except that there's not as strong a need to look up those terms. I've been working on the topical tree for a few days now, reorganising and moving things around a bit to what seems more workable to me. What I noticed is that there are two basic types of category: categories of topic or relationship ("Chemistry", "Weather", "Food and drink"), and categories of types or sets ("Days of the week", "Organic compounds", "Countries of Africa"). The actual entries contained in them may not be so strictly separated, however. The former generally have names in the singular, while the latter mostly have plural names. What I've tried to do is to make sure that relationship categories are not subcategorised into type categories, to avoid the scenarios I described above. —CodeCat 00:05, 30 June 2014 (UTC)
I see we have Category:Colors of the rainbow, just no Dutch there.
Well, thanks for working on better organizing these. Michael Z. 2014-06-30 01:14 z
To go back to what you said, though. Our entries represent terms rather than their referents, and that's specifically why topical categories exist. To categorise by their referents instead. Of course you can put "terms related to" or "terms for" in front of every category name, but that doesn't really change anything in the end, the category structure will still be the same and so will the entries in them. So I wonder what you would suggest that wouldn't diminish the utility of these categories. —CodeCat 01:32, 30 June 2014 (UTC)
Of course naming the categories properly would change something. Some editors have only the vaguest idea of what dictionary entries are or how this is fundamentally different from Wikipedia, and imprecise or incorrect language built into the project just increases the confusion.
I’m not sure, but I think there is confusion because topical categories exist for several different reasons, which are not wholly compatible. Editors have been shuffling these around non-stop for years now, with no overall plan as to how they should look. I have a feeling this can’t be resolved without defining exactly what these are, and possibly creating two or three separate category trees for them.
  1. For example, since usage is documented by specialized subject labels like “chemistry,” logic tells me that there should be a category containing technical vocabulary used in the field of chemistry. We have a zillion labels categorizing usage, but no usage categories.
  2. Since editors are adding [[Category:en:Chemistry]] to entries, I suppose there should also be a more general subject-field category for “English terms related to chemistry,” after we define exactly what that means.
  3. If we are also categorizing by definition, distinct from usage or subject field, then I suppose we should settle on some scheme like that of Roget’s Thesaurus, where all words are grouped by concept (unlike most of today’s alphabetical thesauruses) – so Category:Chemical elements would be somewhere in section 635 “Materials.”
Maybe nos. 2 and 3 are the same thing, but they are certainly distinct from no. 1. But since all three are mixed up in a soup, our categories will continue to be shuffled around because they don’t seem right. Michael Z. 2014-07-02 00:34 z
I definitely agree with your first point. If terms are used within specific fields, or certain senses are, then that's lexically significant and not really any different from many of the categories currently in Category:English lexicons. What you mention in point 2 is a general problem with these categories in that "related" is not well-defined, and can be interpreted rather broadly. Some people might consider water to belong in Category:en:Chemistry just because it's the name of a chemical substance and hence related to chemistry. Others might disagree but think carbon dioxide belongs there. And if I understand your third point correctly, it refers to what I called "categories of types" above. They group things by common hypernyms; that is, by what their referents are. —CodeCat 00:46, 2 July 2014 (UTC)
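CodeCat's distinction between topic (relationship) categories and set (type) categories suggests a mechanical check: a topic category filed under a set category breaks the subset principle from the opening post. A sketch over a hypothetical sample of the tree, using examples from this thread; the kind labels and function name are assumptions.

```python
# Hypothetical sample of the topical tree: category -> (kind, parent).
# "set" = category of types ("Days of the week"); "topic" = relationship ("Chemistry").
CATEGORIES = {
    "Place names": ("set", None),
    "United States of America": ("topic", "Place names"),
    "Countries of Africa": ("set", "Place names"),
    "Liquids": ("set", None),
    "Hydrology": ("topic", "Liquids"),
    "Time": ("topic", None),
    "Days of the week": ("set", "Time"),
}


def misplaced_topics(categories):
    """Topic categories filed under a set category, which breaks the subset rule."""
    return sorted(
        name
        for name, (kind, parent) in categories.items()
        if parent is not None
        and kind == "topic"
        and categories[parent][0] == "set"
    )


print(misplaced_topics(CATEGORIES))  # ['Hydrology', 'United States of America']
```

A set under a topic (Days of the week under Time) passes, as does a set under a set (Countries of Africa under Place names).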

The most basic Chinese words are still undefined!

Calling on Chinese-aware editors (natives and learners) to pull up their socks and make some effort to add missing Chinese content. Many HSK beginning-level words, the first few hundred most frequent Chinese words (e.g. Appendix:Mandarin_Frequency_lists/1-1000), still lack definitions and formatting, and have been neglected since their creation many years ago. I don't think the old obstacles (translingual sections, differences between topolects) stand in the way any more. Even basic senses of single-character words are still missing, and the entries carry only the {{defn}} request ("This entry needs a definition. Please add one, then remove {{defn}}."). And these are the most frequent everyday words! I have just added "to do", one of the most frequent Chinese verbs. Recently I added "water", "year", "mountain" and many other basic words. Even if only the definitionless Mandarin entries get definitions, it's a step in the right direction; there's no need to work on a dialect you don't speak if you're not confident.

@Tooironic:, @Kc kennylau:, @Bumm13:, @Meihouwang:, @Wyang: - not such a small group, huh? I may have missed some people. --Anatoli (обсудить/вклад) 02:02, 20 June 2014 (UTC)

The complexity of Mandarin/Chinese entries puts people off, but now they're not so complicated. Here's an example:




Wyang also suggested using:


for terms with complex semantics. --Anatoli (обсудить/вклад) 02:09, 20 June 2014 (UTC)

I've deliberately put off doing 字 entries because they seem such a big undertaking. Is anyone else up for this? I suppose we could get the basic ones done first. ---> Tooironic (talk) 02:10, 20 June 2014 (UTC)
I know that you put off doing them. Thanks. That's exactly my argument: first, they are not THAT complicated; second, the definitions don't need to be exhaustive, e.g. see is good enough, IMHO. :) One definition is better than nothing at all. Besides, you don't have to do stroke orders, Cangjie or anything else that goes into the Translingual section; just the semantics is fine. --Anatoli (обсудить/вклад)
I've been following other people's leads and doing some occasional edits to them... Wyang (talk) 03:52, 20 June 2014 (UTC)
You've been the most productive editor and the leader in Chinese editing. Aren't all the new modules, templates and bot work yours? :) I only included you in the list to invite you to the topic. Thanks to the new structures you made, it has become much easier to add non-Mandarin content too. (I'm not taking credit for what I haven't done myself, in case it sounded like I was, LOL). --Anatoli (обсудить/вклад) 04:01, 20 June 2014 (UTC)
Forgot to ping @Jamesjiao:. You've been very quiet. :) --Anatoli (обсудить/вклад) 07:59, 20 June 2014 (UTC)
LOL... yes.. Let's just say I am no longer alone in my life now, so I have been spending a lot less time on the dictionary lately! Character entries just put me off for some reason; maybe it's because I like to be comprehensive and edit in all the possible definitions all in one go. I could change that attitude and just add in the basic stuff first... JamesjiaoTC 03:17, 24 June 2014 (UTC)
When will the "definitions" thingy be official? I can't wait to see that day! --kc_kennylau (talk) 09:04, 20 June 2014 (UTC)
Probably not in the foreseeable future. —CodeCat 12:12, 21 June 2014 (UTC)
Should we put compound-only definitions under this header? For , I separated it into noun and verb headers. But this is wrong, because this character will never be used as a noun or verb by itself. Should the part of speech correspond to the use of the character in a sentence or to its meaning? Meihouwang (talk) 08:40, 21 June 2014 (UTC)

Using rollback to revert good-faith edits

...needs to stop immediately. It violates numerous policies. Good-faith edits should only be undone with an edit summary. Purplebackpack89 (Notes Taken) (Locker) 00:00, 22 June 2014 (UTC)

Which policies? Also, if I recall, you left no edit summary on your own revert of Chuck's (good faith and correct) edit. —CodeCat 00:03, 22 June 2014 (UTC)
Look again, CodeCat. You'll see I DID leave an edit summary. And, at the time, the module was not working, so Chuck's edit removed a category, and was therefore wrong. If you don't understand the policies governing rollback, CodeCat, you shouldn't be using it. Purplebackpack89 (Notes Taken) (Locker) 00:06, 22 June 2014 (UTC)
"BRD" is just some letters. It's not an edit summary, nor does it explain your reasons for reverting. And again I ask, which policies? Furthermore, if the module was not working, that doesn't mean Chuck's edit was incorrect. It means that the module needed fixing, which he did as far as I can tell. Either way, that category didn't belong on that page whether the module was fixed or not. —CodeCat 00:07, 22 June 2014 (UTC)
BRD has a distinct meaning in the Wikimedia universe. It means bold, revert, discuss. Chuck made an edit. I reverted. Per BRD, instead of reverting, you should have discussed. But this particular edit is beside the point. The point is that edit summaries should be left except in case of vandalism. You going on and on about how right you are doesn't give you an excuse to not leave an edit summary when making a clearly controversial edit. Purplebackpack89 (Notes Taken) (Locker) 00:14, 22 June 2014 (UTC)
CodeCat, before you continue editing, please read this, it's the Meta blurb on rollback. We don't have a blurb on rollback, so in the absence of one, I defer to Meta and to Wikipedia. Purplebackpack89 (Notes Taken) (Locker) 00:16, 22 June 2014 (UTC)
BRD is a Wikipedia concept, not a Wiktionary concept. It's not used in Wiktionary and people here generally are not familiar with it unless they happen to edit Wikipedia too. Furthermore, even on Wikipedia, "BRD" is not a valid justification for any edit, as it is a common Wikipedia practice, not a policy nor a reason for making an edit. In any case, if you want to set up an official policy to make edit summaries required when not reverting vandalism, you're free to do so. But I don't think there's much chance of it succeeding. Regardless of what Meta says, on Wiktionary a revert simply means the same as an undo. It means "I think the page was better before". —CodeCat 00:20, 22 June 2014 (UTC)
The ideal solution is to get the same tool that Wikipedia has to easily revert both good and bad faith edits and optionally leave an edit summary. --WikiTiki89 00:09, 22 June 2014 (UTC)
And until Twinkle arrives on Wiktionary, rollback shouldn't be used for anything except vandalism Purplebackpack89 (Notes Taken) (Locker) 00:14, 22 June 2014 (UTC)
Why does Purplebackpack have rollbacker rights anyway? He is unfamiliar with Wiktionary’s practices and doesn’t seem interested in becoming familiar with them. — Ungoliant (falai) 00:38, 22 June 2014 (UTC)
User:Ungoliant MMDCCLXIV, that's painting with too broad a brush. However, there are a number of "policies" and not-having-policies that seem just plain arbitrary. Some policies even seem like they're different solely to stick a finger in Wikipedia's eye. Also, the general reason for taking rollback away is abuse. If I am found to have abused rollback and you use that as justification for taking mine away, you'd have to also take away CodeCat's and probably other people's as well. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)
Because? Keφr 15:20, 22 June 2014 (UTC)
Well if he keeps up this I-don’t-like-your-practices-so-I’ll-just-follow-Wikipedia’s attitude I will request the removal of his rights. — Ungoliant (falai) 17:14, 22 June 2014 (UTC)
And, as Kephir noted, you would have no basis for doing so. Not liking practices and abusing rollback are two completely different issues. Since how rollback should be used on this wiki is ambiguous, I have not abused it, and therefore there's no reason to remove it. Purplebackpack89 (Notes Taken) (Locker) 00:05, 23 June 2014 (UTC)
Where did I note that? Keφr 00:09, 23 June 2014 (UTC)

Proposal: Require edit summaries for reverting non-vandalism edits

Keeps us in line with many other Wikimedia projects. Not having it makes editors who use rollback on good-faith edits come off as discourteous. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)

  1. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)
  1. “Reverted edits by Foo. If you think this rollback is in error, please leave a message on my talk page.” I see this as humble instead of discourteous, due to the admission that the reversion may be in error. — Ungoliant (falai) 05:04, 22 June 2014 (UTC)
  2. What Ungoliant said. (And that default text is editable if the community so desires.)​—msh210 (talk) 05:50, 22 June 2014 (UTC)
  3. I don't see anything discourteous. The text as Ungoliant quoted it does not seem offensive in any way, especially compared to "Undid revision". —CodeCat 10:36, 22 June 2014 (UTC)
  4. --Yair rand (talk) 05:35, 23 June 2014 (UTC)
  5. For one thing, the 'line' between "vandalism sensu stricto" and "misguided, malformed edits which need to be undone" is quite blurry. Take this diff, for instance, which added a malformatted, vaguely worded, apparently redundant definition onto the headword line. - -sche (discuss) 13:36, 23 June 2014 (UTC)
    "malformatted, vaguely worded, apparently redundant definition". There's your edit summary right there lol. The 'line' is good-faith-but-you-don't-know-what-you're-doing vs. bad-faith-and-you-do-know-what-you're-doing. One is permissible, but not a great idea, and one isn't. You can be blocked for a few instances of one and not for a few instances of the other. Purplebackpack89 19:57, 27 June 2014 (UTC)
  • What matters isn't whether you, the experienced editor, think it discourteous; it's whether the (probably less-experienced) editor thinks it is. It's clear none of you have read this, which explains why other editors would find it discourteous. Please read it before commenting further. Purplebackpack89 (Notes Taken) (Locker) 14:50, 22 June 2014 (UTC)

Proposal: Get Twinkle

We haven't had a serious discussion about getting Twinkle in years. If people are concerned about not being able to make edits fast enough, getting Twinkle could solve those problems. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)

Does importScript('User:AzaToth/twinkle.js','en.wikipedia.org','431551787'); not work here? (I haven't tried it.)​—msh210 (talk) 05:56, 22 June 2014 (UTC)
It still does, but I marked it as deprecated in favour of mw.loader.load('//en.wikipedia.org/w/index.php?action=raw&oldid=431551787') (or a longer version, you get the idea). I probably should have been more explicit about it. The reason being that importScript from our MediaWiki:Common.js clashes with MediaWiki's built-in importScript, and I think we have no good way to guarantee that either version will run at a given moment. (Also, I have been trying to clean up our scripts mess lately. I think introducing inconsistencies between MediaWiki installations in this way is a bad idea, and this is just waiting for a good moment to bite someone in a vaguely specified body part.) However, the built-in importScript only accepts one argument, and can only load scripts from the local wiki. Keφr 07:19, 22 June 2014 (UTC)
Well, in any event, some sort of importation is possible. So those who want Twinkle can use it already, can't they? This discussion is only about whether to host it locally also (and customize it to our local desires)?​—msh210 (talk) 07:34, 22 June 2014 (UTC)
No, I don't think they can Purplebackpack89 (Notes Taken) (Locker) 14:57, 22 June 2014 (UTC)
Why? Keφr 15:14, 22 June 2014 (UTC)
  1. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)
  2. Twinkle provides the ability to do things that are very difficult to do otherwise, such as revert an arbitrary string of edits by multiple editors and leave an optional edit summary behind. Frankly, I don't see any downsides. If we are worried about too many people having access to it, then I'm sure there is a way to restrict its use to users who have rollback privileges. --WikiTiki89 20:15, 22 June 2014 (UTC)
    Like applying maintenance tags to articles, doing all three required steps of starting an AFD/MFD/RFD, initiating a request for page protection, applying protection templates… oh wait, we have none of that nonsense. Keφr 21:03, 22 June 2014 (UTC)
    So we won't use those features. Maybe they can be disabled? --WikiTiki89 21:36, 22 June 2014 (UTC)
    Or we could just use them for RfD/RfV instead. Purplebackpack89 (Notes Taken) (Locker) 00:02, 23 June 2014 (UTC)
    Might as well. On the other hand, starting RfVs and RfDs here is much less of an exercise in bureaucracy than XfD on Wikipedia (add the template, click the "+" link, write the nomination rationale, submit, done; versus editing three or four different pages while manually pasting different template magic in different places on each), and I cannot remember the last time anyone complained about the tedium of our nomination process, so… meh. The amount of work that would need to go into that would be simply not worth it. Or again, do you want to volunteer? Keφr 00:31, 23 June 2014 (UTC)
  1. Equinox 01:00, 22 June 2014 (UTC)
  1. I oppose until sufficient reason to install the gadget here is supplied. I haven't seen such yet.​—msh210 (talk) 05:56, 22 June 2014 (UTC)
  2. Abstaining until someone explains what Twinkle is. —CodeCat 10:37, 22 June 2014 (UTC)
    See w:WP:WikiSpeak#Twinkle. And w:WP:Twinkle for amusement. Most of its functionality makes sense only for Wikipedia, though. Keφr 10:44, 22 June 2014 (UTC)

User:Equinox, care to give a rationale rather than just voting? Wouldn't Twinkle mean faster editing and fewer situations where CodeCat et al. misuse rollback? Purplebackpack89 (Notes Taken) (Locker) 01:04, 22 June 2014 (UTC)

Twinkle does not solve the problem because you can still have good edits reverted by Twinkle with no explanation if none is filled in: see [8]. This has happened to me. So it is essentially no different from our pre-existing features (e.g. the "red D" in Recent Changes — not sure whether you've seen this, as it is only available to admins). Equinox 13:01, 22 June 2014 (UTC)
I usually keep this disabled, because I find it too annoying, but I re-enabled "Patrolling enhancements" just so I could see what you are talking about… but no red "D" appeared. Only a blue "M". Keφr 15:14, 22 June 2014 (UTC)
Twinkle isn't like patrolling, and why should either be limited to administrators? Why can't joe schmos like me have tools? Purplebackpack89 (Notes Taken) (Locker) 15:26, 22 June 2014 (UTC)
Because "joe schmos" like you lack the good judgement to use them well. Keφr 15:43, 22 June 2014 (UTC)
That is a very elitist stance, Kephir. It's also untrue: nobody has ever come up to me on this project and said: "You make too many edits in too short a time". Purplebackpack89 (Notes Taken) (Locker) 15:53, 22 June 2014 (UTC)
Correct, it is the contents of your edits that raise our objections. (Also, they almost did tell it to me. Not that I am complaining, just noting.) Keφr 16:04, 22 June 2014 (UTC)
There you go again redefining what people say so it's easier for you to answer it. Of course no one here has said that, but they have pointed out numerous errors in judgement on your part, which you would be able to propagate much more quickly with more tools. Chuck Entz (talk) 16:11, 22 June 2014 (UTC)
  • User:Msh210, it enables us to do more stuff in regards to reverting vandalism or tagging articles, and enables us to do things quicker. What more rationale do you need? And, User:CodeCat, you'd never heard of Twinkle before? Really? Purplebackpack89 (Notes Taken) (Locker) 14:56, 22 June 2014 (UTC)
    • We do not have articles here. We have entries. Twinkle's tagging module would need significant rework before it could be rendered usable for Wiktionary. Which very few here have time for. Or do you want to volunteer? Keφr 15:14, 22 June 2014 (UTC)
    • So it allows anyone to be a rollbacker, even without the rollback right? Sounds like a poor idea to me. There's a reason it's a right granted (easily but) not to everyone.​—msh210 (talk) 04:32, 23 June 2014 (UTC)
  • @Kephir: The red 'D' appears next to the blue 'M' if and only if an unpatrolled edit has created a page; that probably doesn't happen often on your watchlist, and may not even happen in recent changes at any given time you look. The 'D' allows you to delete the page with no edit summary (unless you provide one in a little box which also appears iff an unpatrolled edit has created a page). - -sche (discuss) 19:51, 22 June 2014 (UTC)
    • Hmm. I opened Special:RecentChanges, plenty of unpatrolled new pages there, but no "D". I can see it on Special:NewPages, however. From skimming the source code, I guess it does not work with the "Group changes by page in recent changes and watchlist" option enabled. Still, I am going to keep the gadget disabled. The buttons look too distracting and feel too easy to misclick. Keφr 20:05, 22 June 2014 (UTC)

Proposal: Codify the BRD practice

General gist: if you make an edit, another editor can undo it, and after that point discussion must take place, or else both editors are in error. Purplebackpack89 (Notes Taken) (Locker) 00:59, 22 June 2014 (UTC)

To clarify, BRD only applies to good-faith edits. Vandalism can still be reverted without discussion, and an unlimited number of times Purplebackpack89 (Notes Taken) (Locker) 00:00, 23 June 2014 (UTC)
  1. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)
  1. Chuck Entz (talk) 22:49, 22 June 2014 (UTC)
  2. --Yair rand (talk) 05:35, 23 June 2014 (UTC)

Nice in theory, but the additional required steps will make patrolling hundreds of edits at a time every day all the more time-consuming. This looks suspiciously like yet another technique PurpleBackpack89 can use to change the subject and shift the blame in order to continue to avoid admitting to ever being wrong or making a mistake. Chuck Entz (talk) 22:49, 22 June 2014 (UTC)

User:Chuck Entz, that is an assumption of bad faith and a DICK-ish comment. BRD doesn't apply to vandalism, it only applies to good-faith edits, so it wouldn't slow down patrolling. Purplebackpack89 (Notes Taken) (Locker) 00:00, 23 June 2014 (UTC)
P89, you are being much more "DICK-ish" than anybody else by constantly resorting to tedious legalistic interpretation of policies — often ones from Wikipedia that don't even apply here. Read the DICK page yourself, especially the top part, and consider how much you are annoying everybody, as is evident from recent discussions, and just shut up for a minute and think about it. Your modus operandi seems to be to attack everybody for not sharing your personal opinion, based on legalistic policies (which half the time don't even exist here), but if anyone ever attacks you for not sharing their opinion, the rules go out of the window and you are suddenly the injured party who needs soothing and pacifying. It's pretty pathetic and disgusting and hypocritical as hell. Equinox 00:04, 23 June 2014 (UTC)
Equinox, you don't get to tell me to shut up. I don't give a damn if you don't like my opinions on things, I am entitled to them as much as you are to yours. I don't attack people for not sharing my opinion, I don't attack people at all. I merely point out that it is wrong to make the broad generalizations people do about me. Take a look at how often I say "User X is always...". You'll find it's never. I only comment on other people's INDIVIDUAL edits. Other people comment on ME. Purplebackpack89 (Notes Taken) (Locker) 00:09, 23 June 2014 (UTC)
Are you saying that we should be prohibited from noticing detrimental patterns in your behaviour? Keφr 00:51, 23 June 2014 (UTC)
Not exactly. I'm saying the way Equinox and Chuck characterized the edits I made was inaccurate. If you think every edit I make is bad (which is damn near what people have been saying), that's inaccurate. If you think every comment I make toward another user is an attack, that's inaccurate. Frankly, everything said about me in this thread is hyperbole at best and completely inaccurate at worst (this is in no way a reflection on edits outside this thread). Purplebackpack89 (Notes Taken) (Locker) 01:04, 23 June 2014 (UTC)
I think you will find that the community here has a relatively thicker skin than yourself. Not that we tolerate gratuitous offence, but most regulars here will probably not care that much about most instances of what you are oh-so-ready to term "personal attacks". We have not been accusing you of "attacking" anyone. "Attacks" are irrelevant. We are accusing you of gross incompetence with regard to our practices and policies (not understanding that Wikipedia's policies may be not applicable here), elementary courtesy (shifting the blame and burden of proof, claiming you cannot be stripped of rights merely because you have not technically violated any written policy, or some imaginary policy — see previous) and reading comprehension (regularly misconstruing other editors' statements and questions, e.g. just above), and not just inability, but an outright refusal to change this state of affairs. Is every single edit of yours wrong? I guess no. But enough of them that the community has decided to watch you closely, and is seriously considering stripping you of your editing privileges. Keφr 01:42, 23 June 2014 (UTC)
(edit conflict) Would you care to cite one example in which you have ever assumed good faith from your opponents in discussions? To my memory I have never seen you admit you were wrong, and I have never seen you assume good faith. I have, however, seen dozens of marginally off-topic rants about why everyone is out to thwart you. Chuck Entz (talk) 00:14, 23 June 2014 (UTC)
Chuck, you need to divorce assuming someone is wrong from assuming someone acted in bad faith. When I ask, "Why did you do X" (as I did with CodeCat yesterday), I assume that that person has a perfectly good reason (and therefore acted in good faith) for doing so. I don't assume that anyone who disagrees with me is a vandal or is doing what he/she did for nefarious reasons; I have no idea where you got the idea I did. You'll also notice that when I comment on a person's particular edits, I ONLY comment on those edits; I make no generalizations. Purplebackpack89 (Notes Taken) (Locker) 00:28, 23 June 2014 (UTC)
As usual, you're not answering the question you were asked. I asked you to cite an example of assuming good faith for opponents in discussions. I didn't ask if you could point to examples where you didn't assume bad faith about someone's edits outside of discussions. As for generalizations: you do it all the time in discussions here. You may not point to individuals most of the time, but you do talk about how people around here are finding fault because they don't like how you dare to disagree with them. And you do spend lots of time implying all kinds of things, without explicitly saying them.
As for your actions yesterday: you saw my edit, and reverted it without asking why I made it. You were reverted. You reverted the revert without asking why you were reverted. You were reverted again, etc. It was only after it was clear you weren't going to prevail that you bothered to ask, and then it was more like demanding to know why than asking. If you had assumed good faith and asked, the whole episode would have been avoided. I'm not saying I've never done anything to make you think I might be acting in bad faith, but you clearly weren't assuming good faith. Chuck Entz (talk) 01:27, 23 June 2014 (UTC)
You are wrong: had you bothered to look at the timing of the reverts, you'd have seen that several CodeCat reverts occurred after I asked why she was doing what she was doing. She continued reverting before answering my question, so it's on her, not me. Also, "not assuming bad faith" and "assuming good faith" are the same thing. To claim they aren't is pedantic. Purplebackpack89 (Notes Taken) (Locker) 01:37, 23 June 2014 (UTC)
You didn't really ask; you came right at me with a rather aggressive tone. "What's the big idea" is not asking, nor is it a civil way to start a reasonable discussion. Furthermore, you did not even wait for me to answer before reverting me anyway. I'm sorry, but I agree with Chuck: to me, that is more of a demand ("stop interfering, who do you think you are?") than an honest attempt to work out a problem. —CodeCat 01:41, 23 June 2014 (UTC)
For the record, I blocked P89 (per WP:DICK, lol). No, mainly because he doesn't really do anything useful, and just sows discord. If anyone who's also dealt with this person for several years thinks this was bad judgement, feel free to undo it. Equinox 03:05, 23 June 2014 (UTC)
It's been undone, and the manner in which you did it was highly inappropriate and frankly a personal attack (for crying out loud, you made an lol about a block you made). It's also inaccurate. Have you actually LOOKED at my contributions in mainspace recently? Have you LOOKED at my "pages created" list? Furthermore, it's probably not the greatest of ideas to block someone just because of how their contributions pie falls. Purplebackpack89 (Notes Taken) (Locker) 03:51, 23 June 2014 (UTC)
Cry us a river. Out loud. So what? Nobody is going to treat you nicely merely because you keep crying about "personal attacks"; in fact, this only worsens your position. But since you failed to learn this until now, I doubt you ever will. Your days here are numbered. Keφr 08:27, 23 June 2014 (UTC)
  • This really shouldn't be in the same thread as a BRD proposal. The discussion over the last four hours has had nothing at all to do with whether BRD is a good idea; it's been using me as a piñata. I tried to split this into another thread, but Ungoliant undid it. Purplebackpack89 (Notes Taken) (Locker) 04:09, 23 June 2014 (UTC)
    For reference, here’s the title of the new thread PBP had added: “Various comments about the behavior of User:Purplebackpack89 that don't really have much to do with whether or not BRD is a good idea”. — Ungoliant (falai) 12:43, 23 June 2014 (UTC)
  • @Purplebackpack89: You can use Wikipedia's "Twinkle" tool by adding importScriptURI("//bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.gadget.Twinkle"); to your personal Special:MyPage/common.js, if you'd like. Please don't assume that users here have heard of Wikipedia's tools or policies, and definitely don't act as though their policies have any sort of relevance here; it's pretty ridiculous, moderately offensive, and generally unacceptable. --Yair rand (talk) 05:35, 23 June 2014 (UTC)

The {{l|en|word}} stuff

Please see User_talk:Ready_Steady_Yeti#The_.7B.7Bl.7Cen.7Cword.7D.7D_stuff. Ungoliant seems to think that I'm a moron. Do other regular editors not agree with me that it's bad enough to have to struggle through a lot of redundant "lang=en" every time we edit an English entry, and it would be worse to have the "l/en" stuff in every single linked word in a definition? Perhaps I sound paranoid or silly, but I do a lot of editing here (second most active editor) and I am genuinely concerned because this kind of stuff could double the amount of time it takes me to create an entry — and make it much more painful and less pleasant. Equinox 01:17, 23 June 2014 (UTC)

I think there are two issues: should editors be discouraged from using the {{l}} template, and should editors be discouraged from not using the {{l}} template. I would say no to both. It's true that RSY has, in typical fashion, taken the practice to extremes, but I don't think there's anything inherently wrong with it. All the same, wikilinking is fine for most purposes when linking to English terms, and for many editors the extra work of using templates isn't worth it, and that should be respected. There are parts of the entry where categorization, formatting and script handling tilt the balance in favor of templates, but that doesn't include definitions. Chuck Entz (talk) 01:50, 23 June 2014 (UTC)
You will not find an ally in me, sorry. I think |lang=en is not redundant to begin with, and that we generally should not treat English much differently from other languages — apart from the fact that the definitions, entry layout and all the documentation of the infrastructure around them are written in English. And I like definitions to have anchors pointing to the right language, although I prefer to accomplish that by just wrapping the whole definition inside {{l|en|...}} — then at least the inside of the template looks more like plain markup.
Related discussion: Wiktionary:Beer parlour/2013/September#A new way of formatting definitions I saw someone use Keφr 01:52, 23 June 2014 (UTC)
Personally, I prefer simple links in definitions, but I tolerate {{l}} links as well. I don't think they are detrimental as long as we are not forced to use them. --WikiTiki89 01:57, 23 June 2014 (UTC)
I don’t think you’re a moron. Also, while I strongly support the use of {{l}} and think it should be encouraged, I don’t think it should be forced on editors. — Ungoliant (falai) 02:22, 23 June 2014 (UTC)
To put it another way, which might be clearer or more convincing to computer people: when you write an HTML document, you can apply an entire property to a section, e.g. <p style="color:red">. This means you can then do anything inside the paragraph, and not care about the colour, because it's already been set. I feel we should be able to do this, language-wise. So all generic links in an English section would link to English, and all generic links in a Spanish section would link to Spanish, etc. I can see the point of specifying the language in a link where it isn't otherwise obvious, but having a default seems very useful and time-saving. I'm honestly amazed nobody agrees. Of course I will bow to consensus but wow! Equinox 03:03, 23 June 2014 (UTC)
Links inside foreign-language definition lines link to English most of the time though. — Ungoliant (falai) 03:27, 23 June 2014 (UTC)
I have always assumed that those whose main contributions are not in English entries simply don't care about any possible time-saving from dispensing with lang=en, as it doesn't much affect them. There may also be some kind of fairness/equality ideology at play: "Why should English get special treatment?" "I have to do these extra keystrokes, so why not those arrogant English-native yokels?" And then there is the technological uniformity-is-easier-to-code-for factor. DCDuring TALK 03:38, 23 June 2014 (UTC)
Well, exactly. Which is pretty much what I wrote above. Keφr 08:28, 23 June 2014 (UTC)
In my opinion, foreign words should be marked up with the HTML lang attribute (to indicate to screenreaders and search engines and so on what languages the words are), and the easiest/briefest way to do so is by use of {{l}} (or {{head}}), which has the added benefit of linking to the correct section. (Actually, it'd be nice if any 'nyms or 'nyms-like section in a FL L2 would have a <div lang=…>. But we don't have that (yet).) However, English words don't need that attribute: the default language for the page is English. Moreover, we don't need a template to ensure linking to the correct section when it's English, as English is the top section on each page where it exists (pretty much). So {{l/en}} is pretty useless: I generally replace it with plain square brackets when I see it.​—msh210 (talk) 04:41, 23 June 2014 (UTC)
Translingual sections and tables of contents prevent plain links from linking directly to the English section, as well as a same-language section in the page being linked to when Tabbed Languages is being used. — Ungoliant (falai) 05:10, 23 June 2014 (UTC)
Ah, right, I knew there was some issue I was forgetting. Yes, I agree: if and when we decide to enable Tabbed Languages by default, then we should switch English links to {{l}} in FL sections.​—msh210 (talk) 05:52, 23 June 2014 (UTC)
I already fix links to use {{l}} even for English. Mainly because I use Tabbed Languages. It seems a bit strange that we don't take TL users into account when we do support it as an option on our wiki. Why would we need to wait until it becomes the default? Even non-default options should still at least work correctly, shouldn't they? —CodeCat 12:04, 23 June 2014 (UTC)
TL would be a reason to use {{l}} for English only in FL sections, not in English sections. (I switch it to square brackets where I see it only in English sections, actually.)​—msh210 (talk) 16:42, 23 June 2014 (UTC)
So then we've been undoing each other's work. I'm not sure if I like that... —CodeCat 16:55, 23 June 2014 (UTC)
I agree with Equinox. I oppose using "{{l|en|" or "{{l/en|" in English definition lines of English entries. Wiki markup is the user interface; it has to be pleasant to use, which includes not only initial creation but also reading and revising. --Dan Polansky (talk) 20:53, 23 June 2014 (UTC)


Jamaican Creole font

I assume that there's a rational reason for writing Jamaican Creole and some other languages with larger font than others (sc=Deva), but what is it? Jamaicans do not have worse eyes than the average citizen of the world, do they? --Hekaheka (talk) 09:18, 28 June 2014 (UTC)

Jamaican Creole shows up in the same font (and same size) as English for me, and in Module:languages/data3/j its script is set to "Latn", just like English's. - -sche (discuss) 02:26, 29 June 2014 (UTC)

No Fun Allowed

Hi. Please make this an official policy A.S.A.P. --Æ&Œ (talk) 10:04, 28 June 2014 (UTC)

A reminder to myself and others: DFTT. --Dan Polansky (talk) 10:30, 28 June 2014 (UTC)
SUOA. TATSOOM. But seriously, just because a topic is “trolling” doesn’t mean that you can’t have fun with it. Congratulations on ignoring the humour and supporting my exaggerated perception that having fun on Wiktionary is wrong. --Æ&Œ (talk) 13:34, 28 June 2014 (UTC)
Silly boy- creating valuable content for an online reference that educates and enlightens the public is all the fun anyone could ever want. Why, just yesterday, correcting an etymology to reference the correct Proto-Indo-European root so filled me with joy that I just had to laugh out loud! I have lots of fun (I understand my neighbors in the apartment building are concerned, though). Chuck Entz (talk) 14:28, 28 June 2014 (UTC)
  • If people get their "fun" making generalizations, low-level digs, and sarcastic remarks, then "fun" should be disallowed Purplebackpack89 14:49, 28 June 2014 (UTC)
  • Mandatory fun? DCDuring TALK 15:39, 28 June 2014 (UTC)
    Admittedly it is difficult to make fun mandatory, but we could make attendance at a Wiknic next weekend mandatory. A selfie/usie with other attendees and signage would be required to prove attendance. DCDuring TALK 16:13, 28 June 2014 (UTC)
Clearly the real issue is that we have too many toasters. -- Liliana 21:20, 28 June 2014 (UTC)

Über-template with tabular output for pronunciation section (2)

Older discussion: Wiktionary:Beer parlour/2014/March#Über-template with tabular output for pronunciation section

The hard parts have been mostly done, here's a prototype:


variety IPA Rhymes Optional column Audio Homophones Hyphenation
Netherlands /ˈrɛizə(n)/ -ə(n) VALUE
rijzen rei‧zen
Belgium /ˈrɛːzn/ [ˈrɛːzn̩] rei‧zen
/ˈrɛːzə/ rijzn
Sandwich Islands /gumbalagumba/

Click on "less ▲" or "more ▼" to switch between the views. (This approach has been stolen, er, adopted from User:Atelaes' Template:grc-pron.) Readers have long complained that they cannot find the definitions, and pronunciation sections are especially big, so I had proposed a show/hide button; but why not show some of the information instead of hiding everything completely?

Regarding the full view:

  • There's a set of predefined columns with predefined displaying text and order. These columns, including the first column (variety/accents), can be omitted, though. In templates/entries, users can define additional columns in their desired order. The hyphenation column will probably be removed from this template. We may predefine enPR column as well. We may want to create separate templates for different languages, similar to headword templates. Any comments would be appreciated.

Regarding the brief view:

  • What information should we put in this mode? I only put accent name, IPA, and audio, and kept them as brief as possible. Should we also include enPR?
  • Should it be a list or a table? How should we arrange and display the information?
  • Some accents are less important. For English, for example, there is usually little demand for pronunciation in accents other than American and British. Should we include all accents in this mode?
  • I made the audio box smaller, but the buttons (play, volume) sometimes disappear or are misplaced. Maybe a CSS-related issue?

--Z 17:54, 28 June 2014 (UTC)

If enPR is included, it should be hidden along with the other extended content until someone clicks 'more', IMO; likewise for SAMPA.
Personally, I prefer bulleted lists to tables (for both the expanded and the condensed views).
- -sche (discuss) 16:50, 29 June 2014 (UTC)
I'm glad to see it stolen. I did my best to make it broadly pilferable. -Atelaes λάλει ἐμοί 20:20, 29 June 2014 (UTC)
  • Is the lack of colons after the dialect name deliberate?
  • Perhaps throw in some use of rowspan when certain content will be identical across accents?
  • Something to consider: We might want to build "About" pages for various accents. Could be some helpful content, and possibly useful for delineating where accents start and end.
  • Something we might not want to consider if it's likely to get too complicated and/or annoying: Flags.
  • This looks like it'll likely kill visibility of the very interesting Rhymes content on Wiktionary. Not in favor.
  • Perhaps Module:IPAc could be integrated, at least for English? Many readers don't know IPA.
  • How should accents that have identical content for a particular word be handled?

--Yair rand (talk) 05:04, 30 June 2014 (UTC)

Module Errors on Empty Input

Is there a good reason to have templates like {{l}} throw a module error if there's no content? I can understand a module error for a missing language code when there's content to be displayed, even if I find it annoying, but it seems to me that {{l|}} or even {{l|en|}} is a minor omission that shouldn't be dealt with by throwing scary-looking module errors.

It would seem to make more sense for the module to test for empty input and simply return nothing for nothing (I suppose a hidden tracking category would be ok). That would also mean it could be used in inflection-table templates without extra code to test for a null parameter.

In general, I think we should move away from using module errors as the way to deal with simple data-entry errors wherever possible: it gives the illusion of there being technical errors that only experts can fix. Chuck Entz (talk) 16:04, 29 June 2014 (UTC)

I disagree that "{{l|}} is a minor omission" — it's the most major omission that it's possible to make from that template, the total omission of all content. But I do note that {{IPA|}} merely categorizes into Category:Pronunciation templates without a pronunciation and Category:Language code missing/IPA, without throwing a module error. Perhaps we should even add a superscript "please add a (link|pronunciation) or remove the template" message, similar to the message Dutch headword-line templates use when diminutives aren't provided. - -sche (discuss) 16:44, 29 June 2014 (UTC)
I also disagree, but I furthermore disagree that not displaying errors is an improvement. There are parts of the site which display big red errors even for relatively minor mistakes, like adding references to a page but nowhere to display them. Showing the errors makes them clearly visible and gives editors more of an incentive to fix them. "Out of sight, out of mind" definitely applies here. Errors that only add categories generally don't get fixed; just look at how big some cleanup/request/attention categories are. —CodeCat 17:23, 29 June 2014 (UTC)
How hard would it be to make the error message depend on the status of the person logged in and on 'type' of error?
I would argue that unregistered users, at one extreme, and admins, at the other, benefit from different approaches. Non-contributing unregistered users probably benefit from suppression of error messages. Our ability to recruit contributors may be enhanced by only gradually revealing how finicky we have made the process of contributing. In any event it seems highly likely that we are making the passive (ie, normal) user's experience worse by exposing such users to raw Module error messages without in any way leading to a better Wiktionary by eliciting valuable corrections from such users.
Further, I am reasonably sure that not all errors merit the same approach, even. Some thought should lead to an architecture that allows discrimination along both the user dimension and the error-class dimension.
I am sorry if (probably that) discriminating by user and specific error situation makes the task of designing modules and templates more complicated, but we already have a situation in which very few contributors can do any editing of modules and therefore cannot readily alter the behavior of templates without begging. I would hope that our talent could conceive of some simplifying architecture to enable this kind of discrimination in the main cases. DCDuring TALK 18:46, 29 June 2014 (UTC)
It's very easy to hide module errors from anonymous users using CSS; I did it before. But at the time I think people didn't like it because it made text look strange, with odd gaps where the errors should be. I don't think CSS can be used to actually change the text "Module error" itself, but it can change how it appears. —CodeCat 20:11, 29 June 2014 (UTC)
I take it that it is not possible to close up the gaps by setting character width to be 0 or very small or selecting a font or pseudofont that has the property of being of zero width.
Another thing that may be useful is {{REVISIONUSER}}, which allows one to test whether a(n) (anonymous) user made an edit. That would allow us to provide a message (on previewing or saving the edit) to an anonymous user who otherwise would not get a message. DCDuring TALK 22:15, 29 June 2014 (UTC)
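The suggestion at the top of this thread (return nothing for empty input, perhaps with a hidden tracking category, while still treating real content without a language code as an error) can be sketched as follows. This is a hypothetical Python illustration, not the Lua actually used in Wiktionary's modules; the function name, the tracking-category name and the tiny language table are all assumptions.

```python
from typing import Optional

# Hypothetical hidden tracking category (an assumption, not an existing category).
EMPTY_INPUT_CATEGORY = "[[Category:Link templates with empty input]]"

# Tiny stand-in for a language-data module; real data would be far larger.
LANG_NAMES = {"en": "English", "nl": "Dutch"}


def render_link(lang_code: Optional[str], term: Optional[str]) -> str:
    """Render a hypothetical {{l|lang|term}}-style link as wikitext."""
    lang_code = (lang_code or "").strip()
    term = (term or "").strip()
    if not term:
        # Nothing to display: return only the (invisible) tracking category
        # instead of a scary "Module error".
        return EMPTY_INPUT_CATEGORY
    if lang_code not in LANG_NAMES:
        # Real content with a missing or unknown language code is still an error.
        raise ValueError(f"missing or unknown language code for {term!r}")
    # Link to the language section of the target page.
    return f"[[{term}#{LANG_NAMES[lang_code]}|{term}]]"


if __name__ == "__main__":
    print(render_link("en", ""))        # tracking category only
    print(render_link("nl", "reizen"))  # [[reizen#Dutch|reizen]]
```

Whether empty input should be silent or loud is exactly what is disputed above; the sketch only shows that the check itself is cheap, and that the two cases (no content vs. no language code) can be handled differently.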

July 2014

Category for all lemmas again

Previous discussion: Wiktionary:Beer parlour/2014/January#A category for all words or lemmas in a language

The previous discussion seemed to have general support, so I would like to make this change, but there are a few details I'd like to ask about first. We can either have just a category for all lemmas, but nothing else changes, or we could split off all "form" categories into their separate tree and have another category for non-lemmas (which may not be all that useful in the end?). A third option would be to have a category for lemmas alongside a category for all terms in a language regardless of lemma status. However, this last option could also be achieved by mentally merging the lemma and non-lemma categories, so this does not have much added value over the second option. —CodeCat 11:34, 1 July 2014 (UTC)

I prefer the idea of having a per-language category with all words (not just lemmata/headwords), rather like the way the Official Scrabble Words is presented. When I've used Index:English in the past — of course, it's years out of date now — I've wished it had all words and word forms. Equinox 13:25, 6 July 2014 (UTC)
I think we should have both: one category for all words, and another category for all lemmata. Fr.Wikt and De.Wikt already have categories for all words in each language. Both categories would have many uses. A category of all words would be useful for scrabble players, and for finding entries in the event that we needed to (a) make some change to all words in a certain language, or (b) examine all words in a certain language to see which of them met a certain criterion (e.g. used an acute accent, if we decided that they were all actually supposed to use a macron). (On De.Wikt I used to use the "all words" categories to look for words I didn't recognize, check Google Books and other dictionaries for them, and 'RFV' them if necessary.) A category of all lemmata would be useful for finding words to alliterate, and would also probably be more useful for any other practical purpose, for highly inflected languages where inflected forms would otherwise swamp the lemmata. Both categories would allow Wiktionary to be used like a paper dictionary, where all words can be seen in alphabetical order regardless of POS. - -sche (discuss) 16:22, 6 July 2014 (UTC)
Would a category for all lemmas, and another for all non-lemmas also be ok? That way, you could still look through all words, by searching through both categories. —CodeCat 16:34, 6 July 2014 (UTC)
No, I think there are advantages to having a category that already contains all words, vs having to merge two categories oneself. And I don't actually see a benefit to having a category for all non-lemmata at all, besides that it might provide a more up-to-date count of "form[-of] definitions" than WT:STATS does.
It's also worth noting that a category for all words will be simpler on a philosophical level, and presumably also on a technical level, to implement than a category for all lemmata, because for the "lemmata only" category we will have to wrestle with questions like: are Template:alternative spelling ofs lemmata? Are Template:standard spelling ofs lemmata? What scalable way is there to know which category to use for entries that only contain {{head|foo}} with no POS set? What scalable way is there to know which category to use for entries like messages (q.v.)? Etc, etc. Whereas, anything with {{head|en}} can go into the "all words" category. - -sche (discuss) 16:58, 6 July 2014 (UTC)
It's more that if we have a category for lemmas and all words, then every lemma in every language will have two more categories added to it. If we split them, it will only be one. As for the question of what is a lemma, I think it's relatively simple: if it would probably be listed as a lemma in a paper dictionary, we would do the same. My intention was to create separate category trees for lemmas and non-lemmas, Category:English lemmas and Category:English non-lemma forms. The former would contain Category:English nouns, Category:English verbs etc, while the latter would have Category:English plurals, Category:English verb forms and so on. I would consider an alternative spelling a lemma, because it is the lemma form of a word, and would presumably be found in a paper dictionary with a "see (other lemma)" notice. —CodeCat 17:31, 6 July 2014 (UTC)

FWIW, De.Wikt and Fr.Wikt both use their equivalents of Category:English language as their "all words in English" categories. We could either follow that model, or come up with a separate category, like Category:English words. - -sche (discuss) 16:58, 6 July 2014 (UTC)

I just came across this discussion from a link from NFE (July 23's entry). That page says "As {{head}} needs its second parameter to be specified for this to work, this should always be specified if possible. All remaining entries that are still missing this parameter are placed in Category:head tracking/no pos and will need fixing.". Is there consensus to that effect? I don't see any on this page, at least, nor at WT:RFM#Split subcategories of Category:English parts of speech between Category:English lemmas and Category:English non-lemma forms (current link), and the January BP discussion linked to above has, as its last point, CodeCat saying we should have more input before going ahead with it: so here's mine. It doesn't make sense to me: form-of entries (many of them, anyway) have always gotten along just fine with {{head|langcode}} as their headword line and a categorizing template as the definition line, obviating the need for the headword to categorize also. If we want terms to be directly in the non-lemma category, besides being in the e.g. past-tense-form category, then the e.g. past-tense-form-of template can so categorize: it makes more sense to require some dozens of template edits now than some thousands of entry edits now followed by making everyone work extra on each entry. Moreover, it's unwise to require users to write duplicate information in each entry. Pinging CodeCat and kc_kennylau, who've discussed this (see the January BP link above).​—msh210 (talk) 19:17, 29 July 2014 (UTC)

I should clarify that I support the categorization, and I support the use of {{head}} to so categorize. I object only to requiring a second parameter in {{head}}.​—msh210 (talk) 19:18, 29 July 2014 (UTC)
The way it works right now is fine. I don't think it is ever right to omit the second parameter, but I don't think there should be a error if it is. A cleanup category is enough. --WikiTiki89 19:22, 29 July 2014 (UTC)
Well, for starters, certainly not all form-of templates categorise, and there have been many difficulties with the ones that do. {{plural of}} was cumbersome to use, for example, as the category it added was often inappropriate, so we removed the category from it some time ago. Of course some of them still categorise, but who is going to remember which ones do and which ones don't? I'm not. So I think the headword line should always add a category, just to remove any ambiguity as to whether a category is needed. I support the exact opposite of what you do: the form-of templates should categorise less, not more. I see no reason why requiring the second parameter on {{head}} would be a problem. —CodeCat 19:25, 29 July 2014 (UTC)
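The category split CodeCat describes earlier in this thread (part-of-speech categories grouped under a lemma tree such as Category:English lemmas and a non-lemma tree such as Category:English non-lemma forms, with {{head}} doing the categorizing and a cleanup category rather than an error when its second parameter is missing) might look roughly like this. It is a hypothetical Python illustration, not Module:headword; which POS categories count as non-lemma forms is an assumption, and the set below is only an example.

```python
from typing import List, Optional

# Example non-lemma POS categories (an assumption; the real list would be
# decided per language and per entry type).
NON_LEMMA_POS = {"plurals", "noun forms", "verb forms", "adjective forms"}


def head_categories(lang_name: str, pos: Optional[str]) -> List[str]:
    """Return the category names a hypothetical {{head|lang|pos}} would add."""
    if not pos:
        # Missing second parameter: a cleanup/tracking category, no module error.
        return ["head tracking/no pos"]
    tree = "non-lemma forms" if pos in NON_LEMMA_POS else "lemmas"
    return [f"{lang_name} {pos}", f"{lang_name} {tree}"]


if __name__ == "__main__":
    print(head_categories("English", "nouns"))       # ['English nouns', 'English lemmas']
    print(head_categories("English", "verb forms"))  # ['English verb forms', 'English non-lemma forms']
    print(head_categories("English", None))          # ['head tracking/no pos']
```

Each entry ends up in exactly one of the two trees, which is the point CodeCat makes about adding one extra category per entry rather than two.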

A plea for more scrupulous patrolling

Rather sloppy content has been slipping through RC patrol recently. I have found some through second-hand monitoring pages like Special:UncategorizedPages and Special:Shortpages. Apparently User:SemperBlotto has been inactive lately, which means that someone else has to do what he has been doing. I urge all sysops and patrollers to visit Special:RecentChanges more often.

On request, I can grant rollback and patroller rights to trusted regulars. Keφr 06:56, 2 July 2014 (UTC)

Rollback and patroller rights AFAIK have in the past been done at WT:WL, requiring two admins' input, not one.​—msh210 (talk) 05:55, 6 July 2014 (UTC)
Given the lack of interest, this question is kind of academic anyway, but: Wiktionary:Beer_parlour/2013/October#Purplebackpack89 Rollback request. And for some (if not most) users listed at Special:ListUsers/rollbacker the rollback or patroller right has been granted without any process at all (just because Stephen sees someone undo a lot of edits). Of course, for me an autopatrolled flag (which is granted at WT:WL with input from two admins) is a prerequisite here. And given that I am announcing this in public, and it can be undone in case someone disagrees with my judgement, I think it should not pose a problem. Keφr 06:49, 6 July 2014 (UTC)
Sounds good to me.​—msh210 (talk) 07:40, 6 July 2014 (UTC)
@Kephir: Please make me a patroller. I don't promise anything, but becoming the patroller will create the temptation for me to actually patrol. Let the patroller flag be removed from me as soon as anyone disagrees. --Dan Polansky (talk) 09:00, 6 July 2014 (UTC)
Granted. Keφr 09:09, 6 July 2014 (UTC)
Wait, Dan doesn't have the mop? If he doesn't, he should. Purplebackpack89 15:02, 6 July 2014 (UTC)
He shouldn’t be an administrator if he still can’t deal with editors peacefully. I don’t think that he’s merited patroller rights either. --Æ&Œ (talk) 18:59, 6 July 2014 (UTC)
To be clear, I am hardly a big fan of Dan's, but given his, shall I say, very critical attitude to other people's editing, I doubt he is going to abuse the "mark as patrolled" button too much. About the rollback button, I am less sure. Keφr 20:08, 6 July 2014 (UTC)
AEOE, if dealing with editors peacefully is a criterion for adminship, there are some admins who should have their mops taken away. Purplebackpack89 22:49, 6 July 2014 (UTC)
Too bad SemperBlotto drove away all the new users that could have picked up the slack :P Kaldari (talk) 08:38, 13 July 2014 (UTC)

I'm willing to help too, so please make me a patroller. I have some experience with this task, with more than 50,000 rereadings on the French Wiktionary and around 3,000 edits here. JackPotte (talk) 17:56, 12 August 2014 (UTC)

Converting WT:Information desk to monthly pages

Moved from Wiktionary:Grease pit/2014/July#Converting WT:Information desk to monthly pages

Can we do this now? The last time this was proposed there was some contention that new users might be confused by the monthly pages system and post things to the wrong page. However, I cannot recall a single such incident, so this seems to be a non-issue. Shall we switch WT:ID to the monthly page system as well? The benefits are quite obvious.

Keφr 21:14, 1 July 2014 (UTC)

There are plenty of examples of people (and in some cases the "+ (add section)" button itself—see [12]) getting confused and mistakenly posting to the main page rather than the monthly subpages, e.g. [13] and [14]. (There are also examples of people posting to the wrong monthly subpage.) However, I no longer feel that this is much of a problem. - -sche (discuss) 22:44, 1 July 2014 (UTC)
Is this even worth asking? The page doesn't get very big and shows no signs of growth AFAICT. DCDuring TALK 23:31, 1 July 2014 (UTC)
Yes it does. --WikiTiki89 23:35, 1 July 2014 (UTC)
I think all these were submitted while MediaWiki:Common.js was broken. So assuming it will not break too often, we are rather safe. Keφr 05:13, 2 July 2014 (UTC)
This is a WT:BP question now, since we know we are technically capable of it. --WikiTiki89 22:49, 1 July 2014 (UTC)
Yes, I see. The method used for BP would work for ID, but not for request pages without further complications. DCDuring TALK 00:28, 2 July 2014 (UTC)

It looks like -sche is for it, DCDuring has been convinced(?), and Wikitiki89 seems kind of supportive. One more supporter and if no one objects I go with it. Keφr 13:32, 5 July 2014 (UTC)

  • I oppose this. The benefit that I can see is no need to archive the page anymore, but the page is low-profile enough that archiving is not really a problem. The subpaging seems less intuitive than having a single page. --Dan Polansky (talk) 15:52, 5 July 2014 (UTC)
    • I think one of the reasons it is "low-profile" is that nobody wants to visit it, because it is so annoyingly large. (See WT89's diff above.) Keφr 16:01, 5 July 2014 (UTC)
      • I don't think so; the information desk is a rather unimportant page, especially compared to Beer parlour, so it gets low traffic; nothing to do with the size. As an aside, you said "I knew I could count on you." in the edit summary. If you want to say such things, be enough of a man and put them in the discussion, or, better yet, drop that juvenile behavior. --Dan Polansky (talk) 16:37, 5 July 2014 (UTC)
  • How about automating the current archiving method by archivebot.py (docs)? There will need to be slight (and probably good, I'd say) changes, though; the month headings need to go, and archiving will be done section by section, not all sections in a period at once. For an example, see ArchiverBot working on [15]. I can volunteer to run it, if there is interest. Whym (talk) 10:53, 6 July 2014 (UTC)
    • Not workable in my opinion. We have rather few bots, and for all I know, there is no one who can afford to run a bot full-time. And even if someone could, they would probably prefer it to handle mainspace tasks. Also, I never liked Wikipedia-style archives. With monthly pages, you know that if you started a thread in one place, it stays there, unless expressly moved. Keφr 13:34, 6 July 2014 (UTC)
      • Just to clarify, I am an operator of the archive bot for two other wikis. It costs almost nothing to me to add one wiki. Whym (talk) 16:05, 6 July 2014 (UTC)
        • And who will replace you if you stop running it? (Which is another problem here. High rotation and little staff.) Keφr 16:43, 6 July 2014 (UTC)
          • (Just responding to "if you stop running it" for the record, not objecting to the other concerns Kephir and -sche noted) My bot uses Tools Lab. [16] Co-maintainers are welcomed. It could also be useful for archiving user talk pages. Whym (talk) 03:09, 7 July 2014 (UTC)

{{look}} Asking User:Æ&Œ, User:Equinox, User:-sche, User:Angr, User:Stephen G. Brown for further input. (Anyone else is also welcome.) Keφr 16:43, 6 July 2014 (UTC)

  • I have no strong opinion on the issue one way or the other. —Aɴɢʀ (talk) 19:13, 6 July 2014 (UTC)
  • I also have no strong feeling about monthly subpages. In the past, I opposed converting the Information Desk to subpages, out of concern for teh noobs, but as evidenced by my comment above, I no longer feel that people posting to the main page rather than the subpages is much of a problem, given how easy it is to move threads. The suggestion that a bot could archive threads on an individual basis is interesting, but the number of pages on which that might conceivably be useful is small (BP, GP, ID, ?TR?), and I think the benefit Kephir notes (of knowing that if you started a discussion on the July subpage, that's where it's staying) outweighs the small potential benefits of per-thread archiving. - -sche (discuss) 19:29, 6 July 2014 (UTC)
  • I don’t have a strong feeling about it. It gets very little traffic, so I don’t think it matters either way. —Stephen (Talk) 03:29, 7 July 2014 (UTC)
    • In the last archived batch, ID had 16 threads per month on average. Which seems rather typical, and is not that small in my opinion. The Etymology Scriptorium often has fewer topics.
    Anyway, what we have here seems to be three "welllll, sure, if you want to" (WT89, -sche, DCD), one oppose (DP), and two strong lacks of opinions (Angr, Stephen). I am going to convert it now. Revert me if you give a shit. Keφr 09:38, 9 July 2014 (UTC)
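With monthly pages nothing needs archiving: a thread goes to the subpage for the month in which it was started and, as Keφr notes above, stays there unless expressly moved. A minimal Python sketch of the naming, assuming the Information desk subpages follow the same "Page/Year/Month" pattern as the Beer parlour archives linked earlier (for example Wiktionary:Beer parlour/2014/March); the function name is hypothetical.

```python
from datetime import date

# English month names, spelled out so the result does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def monthly_subpage(room: str, day: date) -> str:
    """Title of the monthly subpage a thread started on `day` belongs to."""
    return f"Wiktionary:{room}/{day.year}/{MONTHS[day.month - 1]}"


if __name__ == "__main__":
    print(monthly_subpage("Information desk", date(2014, 7, 9)))
    # Wiktionary:Information desk/2014/July
```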

New Word of the Day feed

Featured Feeds for Word of the Day are now available: rss, atom. If you have a suggestion to better format the feed, I'd like to help implementing. Otherwise, enjoy. :) Whym (talk) 15:30, 2 July 2014 (UTC)

I just set up a FWOTD feed; when should I expect it to appear? Also, it would be nice if the feed item contained the actual word for its title. I already know how to set that up, but it requires running a bot over WOTD/FWOTD pages, which I am too lazy to do right now (basically, the same way we solved the problem with context templates). Otherwise, wooooooo! Keφr 16:00, 2 July 2014 (UTC)
Feed names need to be added on the server side; see gerrit:136316. Should FWOTD be added for all Wiktionaries or only for English Wiktionary? Whym (talk) 11:19, 3 July 2014 (UTC)
I have no knowledge of other projects having a FWOTD. Keφr 11:21, 3 July 2014 (UTC)
Ok, I have made the request in bugzilla:67563. Whym (talk) 11:02, 6 July 2014 (UTC)
And it has been resolved: [17][18] Whym (talk) 09:32, 11 July 2014 (UTC)

Recent "Tbot" entries

I've been finding a few entries here and there that are tagged with the {{tbot entry}} template that date to 2013 and 2014. They had redlinked categories, and I created a few of those categories using the {{tbotcatboiler}} template before I realized that these were for new entries.

Not that I have anything against the type of entries Tbot used to create, but if we're going to be doing this sort of thing again, we should change the documentation so we're not listing someone who's no longer here as the contact, and talking about how things are different now that it's 2007. Chuck Entz (talk) 07:44, 6 July 2014 (UTC)

If these entries are not by Tbot… where do they come from? Keφr 13:36, 6 July 2014 (UTC)
See this. One user making a few. I'd ask User:Liuscomaes. --Type56op9 (talk) 00:37, 9 July 2014 (UTC)



Is the {{deprecated}} banner still valid on that template? The template seems to be heavily used, and no replacement is proposed. — Automatik (talk) 15:48, 7 July 2014 (UTC)

The replacement is to use a real part-of-speech header like "noun" or "verb". —CodeCat 16:06, 7 July 2014 (UTC)
What would be the replacement for IANAL? Keφr 16:08, 7 July 2014 (UTC)
Did you even look at the entry? :) —CodeCat 16:09, 7 July 2014 (UTC)
Oh, me stupid. Previous time I checked, the header was "Acronym". But truth is, even "Phrase" does not seem very fitting. Keφr 16:13, 7 July 2014 (UTC)
Well, in any case, the replacement is whatever header you would use for the fully spelled out form. So if "I am not a lawyer" is a phrase, then so is this. If not, then this needs to be changed, but I don't know what into. —CodeCat 16:15, 7 July 2014 (UTC)
And for the categorisation? {{en-noun|-}} doesn't seem to be correct for Mbps, nor does {{en-noun}}, because there is no inflection for this word. — Automatik (talk) 17:12, 7 July 2014 (UTC)
{{en-plural noun}}? (Which I still think to be a stretch.) Keφr 17:15, 7 July 2014 (UTC)
I don't think so, because we can say 1 Mbps. — Automatik (talk) 17:19, 7 July 2014 (UTC)
I guess it's safe to say that it stands for both "megabit per second" and "megabits per second". --WikiTiki89 17:33, 7 July 2014 (UTC)
{{en-noun|Mbps}}? Keφr 17:35, 7 July 2014 (UTC)
Thank you, I used it. — Automatik (talk) 14:43, 8 July 2014 (UTC)
Realistically I don't think this template will ever be orphaned, because it always needs human intervention. That is, a bot can't tell if it's a 'noun', a 'verb', (etc.) so a human editor is always needed. Meanwhile the template is still being used in new entries. But in principle 'noun', 'verb' (etc.) offers more information to the user, while things like 'acronym' should be in the etymology, as 'acronym' explains how the word was formed in the first place. Renard Migrant (talk) 10:13, 17 July 2014 (UTC)
We could make an abuse filter for it. —CodeCat 11:08, 17 July 2014 (UTC)

"Definitions" header in Chinese entries

Apparently, people have been adding this header to Chinese entries instead of part-of-speech headers. But I recall that there was no support for this in the previous discussion. Why is this being done anyway? These entries should be fixed. —CodeCat 11:58, 8 July 2014 (UTC)

Can we just have a real vote on it? Otherwise people are just going to keep going back and forth. DTLHS (talk) 21:53, 8 July 2014 (UTC)
Because the validation of a language-specific header does not require consensus by vote (Wiktionary:Entry layout explained/POS headers#Other headers in use). It only needs the agreement between editors who regularly deal with such entries. The "definitions" header is no different from the "Han character" header in use in the hundreds of thousands of Chinese character entries (e.g. ). Wyang (talk) 03:23, 9 July 2014 (UTC)
Inventing a new part of speech header for languages where it's appropriate (I've done this too, I added the "Relative" POS for Xhosa and Zulu) is not a problem. It's a very different story when you're introducing a new header to remove part-of-speech information altogether. That is my objection here. —CodeCat 11:33, 11 July 2014 (UTC)
Regardless of bureaucracy, what exactly is the reason for replacing POS headers with ===Definitions===? --WikiTiki89 13:54, 11 July 2014 (UTC)
I'm with CodeCat and DTLHS here. Why not just split the meanings by part of speech like we do for literally every other language. If there's a case to be made for not doing this, set it out in a vote where we can all see it. Renard Migrant (talk) 10:43, 15 July 2014 (UTC)
  • Re: what is the reason, there's the simple fact that 1) Chinese doesn't inflect at all, so there's no useful information provided by the POS header other than the POS itself, which can easily enough be included inline; and 2) many Chinese terms have basically the same meanings applied in different POS ways. Take , for example. We've got 13 senses listed under 5 different POS headers. The headers really only serve to break up the page in ways that are unintuitive for Chinese. ‑‑ Eiríkr Útlendi │ Tala við mig 18:52, 15 July 2014 (UTC)
    Thank you for explaining your reasoning. Here's what I think: The POS headers are useful because they make it easier to find the definition you are looking for. Most of the time when you are looking up a word, you already have a good sense of its POS because of how it was used in a sentence, and so you can use the headers to narrow down your search for the definition. It would be very redundant to list "(noun)" before every noun sense, etc. BUT I think it may be a good idea to remove the requirement for "inflection lines" after each POS header, since they serve no purpose other than to duplicate the same information over and over. --WikiTiki89 18:59, 15 July 2014 (UTC)
Some of the reasons were also mentioned here: Template_talk:zh-pron#Why_does_this_categorise_in_part-of-speech_categories.3F. The choice of PoS is often arbitrary, based on the translation into English; dictionaries either mix up PoS or ignore it. By any system, the listed PoS's do not sufficiently represent the actual usage. --Anatoli T. (обсудить/вклад) 05:28, 17 July 2014 (UTC)
I oppose "Definitions" header in Chinese entries, now as before. I already posted this, albeit to what is now ranked by someone as "off-topic", below. --Dan Polansky (talk) 18:05, 23 July 2014 (UTC)

Abbreviated Authorities in Webster[edit]

I have recently discovered the Abbreviated Authorities in Webster Table, and, noticing that a few of the early entries have been linked to Wikipedia, I have been adding a few such links myself. It's interesting, though there are occasionally mismatches of dates (should the Wikipedia date be moved in?). But it's a bit inconvenient for navigation. I feel that the table should be divided by initial letter. If this seems to be generally agreed upon, is it something I would need to do myself or is it something that should be done by a coding whizz? —ReidAA (talk) 08:08, 9 July 2014 (UTC)

Excellent! That table could be quite useful in resolving some of the {{rfquotek}} entries.
I started manually splitting the table by initial letter. It is not hard. It just requires copying the wikitable formatting surrounding the "W" or "Y" headers and inserting it in the appropriate place in the undivided table.
What might be a great help would be adding links to Wikisource, Google Books, or Project Gutenberg versions of some of the specific works. As an example I did so for Hawking and Hunting. To make sure that the work is useful we should extract from the XML dump a list of how often each authority is used within {{rfquotek}}. DCDuring TALK 10:12, 9 July 2014 (UTC)
The table is now initialised. I've done a bit more wiki-referencing some of the authors. --Catsidhe (verba, facta) 12:23, 9 July 2014 (UTC)
Thanks. A dump run would help us see which authorities were actually in use, so, for now, we may as well just pursue what is interesting. DCDuring TALK 14:01, 9 July 2014 (UTC)
As we would want to use this to source citations, the best forms of a work to link to would be those that allowed search at once of the entire range of the authority in question. Wikisource often breaks the work into chapters, which is unsatisfactory for search, though arguably good for linking. It is not so handy to have to download the work to search it. DCDuring TALK 14:22, 9 July 2014 (UTC)
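The dump run suggested above could look roughly like the following minimal Python sketch. It assumes a standard MediaWiki XML export and that the authority is the first positional parameter of {{rfquotek}} (e.g. {{rfquotek|Shak.}}); the regex and the function name count_rfquotek_authorities are illustrative, not an existing tool.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the authority is the first positional parameter of {{rfquotek}}.
RFQUOTEK = re.compile(r"\{\{rfquotek\|([^|}]+)")


def count_rfquotek_authorities(dump_file) -> Counter:
    """Count {{rfquotek}} authorities across all page texts in an XML dump."""
    counts = Counter()
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        # Export tags are namespaced ("{http://www.mediawiki.org/...}text"),
        # so compare on the local tag name only.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            counts.update(name.strip() for name in RFQUOTEK.findall(elem.text))
        elem.clear()  # free memory; each <text> is read at its own end event
    return counts
```

Since iterparse accepts any binary file object, a compressed dump could be read through bz2.open and the result ranked with .most_common(), giving the list of authorities actually in use.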

Proper nouns[edit]

I just came across Bible and Qur'an, which are labelled proper nouns. But at the same time, these have plurals and can take an indefinite article. I just read through w:Proper noun, which suggests that real proper nouns (or proper names) can't take indefinite articles nor have plurals. If they do, then they're not proper nouns, but refer to a class of things rather than a unique entity. The article uses "Toyota" as an example that can be either: the company itself as a proper noun, or a car made by the company as a common noun. In this sense "Bible" is a common noun because it's a book that many copies can exist of. It doesn't act grammatically the same as other book or story titles, whether old or modern. Compare for example "Odyssey", which takes a definite article like "Bible", but doesn't normally have an indefinite article: a Bible versus a copy of the Odyssey, not *an Odyssey. So I wonder what kind of criteria we should apply to proper nouns on Wiktionary, and whether we shouldn't consider relabelling some. —CodeCat 18:33, 11 July 2014 (UTC)

Are given names not proper nouns? They can be pluralised: "All the Jameses in the room raised their heads.", and they do not seem to have a distinct meaning in the plural. Keφr 18:44, 11 July 2014 (UTC)
(e/c) In particular, we currently label personal names as proper nouns, while simultaneously admitting (in many though not yet all entries) that they pluralize. Ditto country names (Germany : Germanies, Germanys, America : Americas, France : Frances). - -sche (discuss) 18:46, 11 July 2014 (UTC)
These words can be both common and proper nouns. Compare the following sentences:
  1. The Bible says to honor one's parents.
  2. Jack read the Bible.
  3. Jack put the Bible he had just bought under his pillow.
In the first sentence, "the Bible" is indisputably a proper noun, while in the third, it is indisputably a common noun; in the second, however, it can be interpreted either way. --WikiTiki89 19:01, 11 July 2014 (UTC)
In cases such as Bible and Qur'an, I think we should include both POS sections, which is what Bible already does. — Ungoliant (falai) 18:54, 11 July 2014 (UTC)
Yes, in the case of books, I think including both sections (Proper noun, and Noun) is best. In the case of personal names, on the other hand, I think including two sections would be unjustifiable; as Kephir notes, the singulars and plurals have the same sense (differing only in number): "one Richard" means one person named Richard, "two Richards" means two people named Richard. Whether that means it would be better to relabel all personal names plain nouns, or live with pluralized proper nouns, I don't know. - -sche (discuss) 19:07, 11 July 2014 (UTC)
"One Richard" is a common noun. "Richard" by itself is a proper noun. However, I think it would be overkill to create common noun sections for every name. --WikiTiki89 19:14, 11 July 2014 (UTC)
We could also just call them all nouns, couldn't we? We could keep the category if needed, but just use the normal "Noun" header. —CodeCat 19:34, 11 July 2014 (UTC)
(@Wikitiki) I don't necessarily disagree that "Richard" can be a proper noun, but I note that whatever parts of speech "Richard" can have, "Richards" can also have. The very reason that given names' definition-lines are italicized is that they are in most uses non-gloss; "and then Richard arrived" means "and then a person named Richard arrived", not *"and then a male given name arrived". An exception would be a hypothetical use like *"not long after the first scribe began to spell the adjective which had been hart as hard, the change spread to instances of the word in compounds, and with that, Richard had arrived", where "Richard" really would be a proper noun meaning "a male given name" — but NB Richards could (equally hypothetically) be used the very same way, e.g. *"and when 'd'-final words began to pluralize with '-s' rather than '-es', Richards arose". - -sche (discuss) 19:36, 11 July 2014 (UTC)
Mentioning a word is an entirely different story. I was not referring to that at all. I also disagree that the plural exists as a proper noun (except in cases where a group of people who are all named "Richard" are collectively named "Richards"; e.g. Richards are coming for dinner, where "Richards" refers to a specific group of people). --WikiTiki89 19:48, 11 July 2014 (UTC)
FWIW, here's how de.Wikt handles it: common names are common nouns, e.g. de:Angela's POS is "noun - first name" and de:Fritz has one POS "noun - first name", another "noun - last name", and a third, labelled "noun", which covers in one section the slang uses that our entry on Fritz split into a "noun" and a "proper noun" section. When a name is defined as referring to only one specific person, e.g. de:Archimedes, it is labelled "noun - proper noun" (but contrast de:Platon). - -sche (discuss) 19:36, 11 July 2014 (UTC)
[e/c] A basic distinction is between proper names (of specific entities, eg, "The White House", "Mack the Knife", "Germany", "The Federal Republic of Germany", "Deutschland", my late dog "Hayek" [short for his full name "Friedrich Augustus von Hayek"]) and proper nouns. CGEL (Huddleston and Pullum) holds that "Proper nouns, by contrast, are word-level units belonging to the category noun. Clinton and Zealand are proper nouns, but New Zealand is not." and "Proper nouns are nouns which are specialised to the function of heading proper names. There may be homonymy between a proper noun and a common noun, often resulting from historical reanalysis in one or other direction." Their examples are sandwich and Sandwich and rosemary and Rosemary.
Our L3 header "Proper noun" is applied both to terms that serve as names of specific entities and to "nouns which are specialised to the function of heading proper names". Even a term such as White House, which is often considered the name of a specific entity, ie, a proper name, can be shown to be attestably made into a plural. Are the uses of White House to be taken as nicknames for the specific entities Roosevelt White House or the Franklin Delano Roosevelt White House?
Whether in a given case we have under the L3 header "proper noun" a proper name or a proper noun (in the CGEL sense), there is no reason not to show plurals, if attestable. Showing a word like Bible as both a common noun and a proper noun seems fine as the common noun meanings are not entirely predictable from any of the meanings of the proper noun and are attestable, but both common and proper noun meanings are likely to be pluralizable, some attestably so. DCDuring TALK 20:09, 11 July 2014 (UTC)

Reading all of the discussion here, I wonder if things would benefit from an approach like the German Wiktionary, with all of them treated as nouns. Our header structure is different, so it would not fit in exactly the same way. So how about relegating proper name-ness to the actual definition line? For given names, we already have a template to do the job, and for others, the definition already implies properness in most cases. So there's nothing that the header "Proper noun" really adds beyond what the definition already tells the user. It would also allow us to list plurals without problems, while labelling the real proper names as uncountable, and we could also merge Noun and Proper noun sections together in entries when the distinction is not so clear anyway (like in Bible). Furthermore, we need to distinguish nouns that are used without the definite article (such as names) from those that are used with it. There is nothing in the current Bible entry that indicates this to the user. —CodeCat 20:59, 11 July 2014 (UTC)

Uncountability, as we use it, is not the same as not having a plural, though many use {{en-noun}} as if that were true, either through lack of understanding of uncountability, not reading the {{en-noun}} documentation, or being defeated by it. The problem would seem to be that we use "uncountable" in reference to mass nouns, specific entities, and nouns whose plural form is the same as the singular form. If we could find attestation for expressions like "too much/little White House" (which we probably can), that would show White House to be uncountable in the sense of mass noun.
Nothing in a template should per se prevent us from making a decision to show plurals for things that appear under the proper noun header. We would just have to revise {{en-proper noun}} and search for instances where the plural shown by the template ("tail") did not conform to usage ("dog").
OTOH, none of the OneLook dictionaries call (the) White House a proper noun. (Most call it a noun; some seem to dispense with PoS labels.) We could either take that as an indication that we have bitten off more than we can chew or that we are making an un-lemming-like advance over other dictionaries.
Use with the is usually grammatical information (eg, no the in attributive use; the used to emphasize that a named entity was the famous one bearing the name), but may also be sense-level information (examples to follow).
It seems to me that we are still some distance away from having a sufficient shared appreciation of the issues involved in altering the thousands of English proper noun L3 headers, let alone those in other languages. DCDuring TALK 22:10, 11 July 2014 (UTC)
  • I dissent from Wikitiki and CodeCat on this. It is possible for a proper noun to have both singular and plural forms. You can have one James or a lot of Jameses, one Henderson or a lot of Hendersons. I also don't understand where CodeCat is coming from with her Wikipedia argument: I read the article last night, and I came out of it thinking the opposite. Purplebackpack89 17:45, 23 July 2014 (UTC)
    @Purplebackpack89: See the section w:Proper noun#Capitalized common nouns derived from proper nouns. --WikiTiki89 17:51, 23 July 2014 (UTC)
    Jameses is the plural form of a common noun. It's very easy to see this just by back-forming the countable singular. A James is not the same thing as James, and there is certainly a big difference between saying you don't look like James and you don't look like a James. Furthermore, the statement James is a James is true, which illustrates that a single specific person called James is a member of the class Jameses (people who have the name James). Compare this to a car is a vehicle which has the same semantic structure. —CodeCat 18:35, 23 July 2014 (UTC)
    The problem, though, is that "Jameses" can be definite or indefinite. "a James" (indefinite) might be common, but "the James" is proper. Purplebackpack89 18:41, 23 July 2014 (UTC)
    "The James" is still a common noun, unless it is turned into a name/nickname. For example: Here "The James" is a common noun: "The James I met yesterday was taller than the James I met the day before." But here "The James" is a proper noun: "There are five people named 'James' at school, but only one of them—the biggest and baddest one—we call 'The James'; Everyone is afraid of The James." --WikiTiki89 18:49, 23 July 2014 (UTC)
  • Famous examples: Thackeray authored The Four Georges, The Newcomes, and The Virginians. A less famous example is The Four Jameses, an anthology of Canada's four worst poets, all named James.
  • Another example is in the absolutely fantastic dialog in Douglas Adams's Mostly Harmless, where one character misinterprets "The King" as "the King", because he did not recognize Elvis.
  • I recently created Mighty Mouse. One of the citations has the plural. I was wondering at the time how to enter the plural, which is Mighty Mouses. (What do people use for the plural of that Apple mouse from way back, nicknamed the "Mighty Mouse"?) Choor monster (talk) 21:47, 31 July 2014 (UTC)
    • Further comment: Regarding consensus on what is a "proper noun", I noted that the examples given at Template:en-proper noun include "Wiktionarian", which is obviously not a proper noun. The link treats it as a common noun. Choor monster (talk) 14:59, 1 August 2014 (UTC)
      You're right. Those examples have been there since the template was created in 2006 and no one has bothered to correct them. I'm going to follow suit and also not bother. --WikiTiki89 15:05, 1 August 2014 (UTC)
      I've changed the example from "Wiktionarian" to "Alex", which is better, though still suboptimal. Side note, it annoys me that {{en-noun|foobar|ies}} behaves differently than {{en-proper noun|foobar|ies}}: vide foobary (plural foobar or ies) vs foobary (plural foobar or ies). - -sche (discuss) 18:56, 1 August 2014 (UTC)
      I could make them behave the same. The hard part is ensuring that all the entries that rely on the old behaviour are updated. —CodeCat 19:21, 1 August 2014 (UTC)

I've now updated {{en-proper noun}} to have the same parameters as {{en-noun}}. It also categorises differently: uncountable proper nouns aren't categorised specially, but countable ones are placed in Category:English countable proper nouns. This category probably needs cleaning up. —CodeCat 13:19, 3 August 2014 (UTC)

Thanks. I notice that when a plural is specified, as on Alex, Template:en-proper noun now says "usually uncountable, plural ___" — something Template:en-noun does not do when the same parameters are supplied to it. I don't think this new wording is correct; I don't think there's anything unusual about Alex’s countability. Alex is not often counted, but this is different from being usually not countable. Something like anger or dark matter is usually uncountable: it's an emotion / type of substance (respectively), and even though one may find various instances/forms of it, they are just that, specific examples/variants of the one underlying emotion / type of substance. Specific people named Alex are not variants of one underlying person or such; there's nothing about "Alex" which is uncountable per se, it's just that people more often have occasion to refer to one Alex (one person, at a time) than to several Alexes. (Wasn't there recently a discussion about the meaning of "uncountable"/"countable"? Postscript: oh, yes, just a bit earlier in this thread.) - -sche (discuss) 15:37, 3 August 2014 (UTC)
It is in principle possible for a proper name or proper noun to be used uncountably. I am sure that it would not be attestable for very many names, but after failing with Mary, I succeeded with Marilyn. From a biography of Edith Head: "If you do, there will be too much Marilyn showing." I suppose we can try to dismiss such cases as metonymy. DCDuring TALK 15:51, 3 August 2014 (UTC)
too much Winston Churchill would be attestable. DCDuring TALK 15:53, 3 August 2014 (UTC)
I think this is again a problem of mixing "countable" and "having a plural". Everything that has a plural is countable, but not everything that has no plural is uncountable, apparently. So when templates receive a parameter that tells them to omit the plural, they should not show "uncountable" as they do now, but "no plural". Something like "usually uncountable" could become something like "plural (rare) ..." —CodeCat 15:56, 3 August 2014 (UTC)
(@DCDuring, after edit conflict) You seem to acknowledge that uncountable use of names is less common (attested for fewer names) than countable use is. That being the case, I'd prefer a wording like "countable and uncountable; plural __" to wording like "usually uncountable, plural __" (emphasis mine). I don't want to discount the uncountable uses [if attested], I just don't want to privilege them over, or discount, the countable uses. - -sche (discuss) 15:59, 3 August 2014 (UTC)
In the case of proper nouns, I can imagine uncountability of an apparent proper noun in some realistic cases (perhaps too much Ebolavirus? (the Translingual proper name used in English)). I find it hard to believe that there could be any proper noun in English that was ever used uncountably in more than a small minority of cases.
I would very much like us to simply show the plural of proper nouns without committing ourselves to the existence of the plural of the lemma or its possible uncountability. Why would we want to use one minute of contributor time in attesting or debating plurals, let alone uncountability, of individual proper nouns? To me the situation is reminiscent of the uncommon inflected forms that occupy some Latin inflection tables. DCDuring TALK 16:48, 3 August 2014 (UTC)
Yes, to be clear, my top choice is for Template:en-proper noun to behave like Template:en-noun in this regard (that was in theory, but not in fact, what the recent edits to it were to do), and just display the provided plural. - -sche (discuss) 17:10, 3 August 2014 (UTC)
Well, if it were to do exactly the same, it would also display a plural by default. —CodeCat 17:12, 3 August 2014 (UTC)
Automatic pluralization can be suppressed (for now, although depending on what percentage of our proper noun entries are names, it might be worth enabling in the future). I'd just like the template not to insert novel claims that terms are uncountable whenever plurals are added (it's oxymoronic). - -sche (discuss) 17:18, 3 August 2014 (UTC)
Indeed. I would very much like to make sure that we don't transfer to proper nouns practices that can be tedious (though sometimes meaningful), even for common nouns, such as RfDing or RfVing proper noun sections or definitions solely for their use in the plural or their uncountability. For proper nouns and even proper names plurals are virtually always possible. Uncountability, too, is significantly rarer, but also, I think, always possible.
Do we need to do anything special to record this as a desirable practice? Does such a recording have any force or would we need a vote? I would like to have a link to this archived discussion in Wiktionary:English proper nouns (very incomplete, awaiting EncycloPetey) or its talk page. DCDuring TALK 17:56, 3 August 2014 (UTC)
  • Would it make sense to have one or more parameters in {{en-proper noun}} that reflected what meaningful class a given proper noun was in, eg, personal name (or given name, or surname, or both), organization name (presumably a proper name of a specific individual), proper individual name, eg, Marilyn Monroe, toponyms, demonyms, language names, and possibly others? Each class could generate the appropriate inflection line, plural or no plural and more specific category membership. Personal names unless of a specific individual would show plurals by default, organization names would not, etc, with the possibility of overriding any of the defaults. DCDuring TALK 20:21, 3 August 2014 (UTC)
    My approach has been to go through the given-, family- and place-name categories and add plurals after checking that they are attested. I would be wary of adding more "class" parameters, in part because it was only when I looked at {{en-proper noun}}'s code recently that I realized that it already had parameters for given and family names — I have never seen anyone use them, and I presume that's what would happen if we added more class parameters. (There would also be problems if, as frequently happens, a string is a placename and a given name and a family name.) Isn't it sufficient to add plurals to the "proper nouns" (or subcategories/types of proper nouns) which pluralize, and leave the other proper nouns as they are, pluralless? - -sche (discuss) 22:08, 3 August 2014 (UTC)
    It certainly is simpler in terms of decision making. We would gain some information by having smaller, more homogeneous categories. It would make it easier to implement any changes in policy that applied to the classes or subsets of the classes. It was just a thought that occurred to me, knowing that you were spending time visiting the entries anyway. DCDuring TALK 22:36, 3 August 2014 (UTC)

Example sentences in ELE, linking of words and delinking of transliterations[edit]

What's the history of the rule behind Wiktionary:ELE#Example_sentences? Who said we can't link individual words? Now that transliteration is (unintentionally) wikified in {{usex}} (see Wiktionary:Grease_pit/2014/May#Transliteration_linked_to_individual_parts_in_usexes_when_hyperlinked), my request to delink it is brushed aside with "we shouldn't link words anyway". Can we change this rule ("not contain wikilinks") for words used in usage examples? Do we really need a vote for that? Can somebody help delink usex transliterations, as in this revision or this revision? --Anatoli (обсудить/вклад) 00:28, 14 July 2014 (UTC)

Someone needs to edit Module:usex to delink transliterations- removing links by hand is a waste of time. DTLHS (talk) 00:39, 14 July 2014 (UTC)
Yes, I agree (thanks for agreeing to fix!), but the rule itself doesn't reflect reality and I think it's not helpful. A lot of Russian usexes are linked (not my edits, but I don't see it as a problem; in fact, it may be quite useful for learners to link to lemmas or some difficult words), and most Chinese usexes are linked, which is very useful for languages with no straightforward word boundaries. Anyway, editors should be free to choose whether they want to wikify individual words in usexes. --Anatoli (обсудить/вклад) 00:58, 14 July 2014 (UTC)
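The delinking itself is a small string transformation. The real fix would belong in the Lua code of Module:usex; this Python sketch, with the hypothetical helper name delink, only illustrates the intended behaviour: keep the visible text of each [[target|display]] or [[target]] link and drop the brackets.

```python
import re

# [[target|display]] -> display ; [[target]] -> target
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")


def delink(text: str) -> str:
    """Replace each wikilink with its visible text, leaving other text alone."""
    return WIKILINK.sub(r"\1", text)
```

Applied to a linked transliteration, delink("[[Moskva|Moskvá]] [[eto]] gorod") would yield plain "Moskvá eto gorod", while an unlinked string passes through unchanged.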

Gender templates for French inflected forms[edit]

{{fr-adj-form}} has been edited so that it no longer accepts gender. I understand that adjectives do not inherently have their own gender but agree in gender with what they are describing. I also understand that the definition already says 'feminine singular of' or 'masculine plural of'... but I still think we should encourage having the gender in the headword wherever possible.

My proposal is to enable gender in {{fr-adj-form}} and to add back the gender to French adjective forms wherever possible. This is very doable by bot, for example \{\{fr\-adj\-form\}\}\n\n# \{\{feminine of\|([\ -9\;-\\\^-z\}-ퟻ]+)(\||\}) is a regex that finds all the uses of {{fr-adj-form}} with no gender, followed by {{feminine of}} on the following line (with a single blank line in between). Renard Migrant (talk) 16:49, 14 July 2014 (UTC)
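Bot-side, the regex above could be applied with a short script along these lines. This is a minimal Python sketch using a simplified capture group (any run of characters other than |, } or a newline) rather than the exact character class quoted; the function name find_bare_feminine_forms is hypothetical, and the script only lists candidates without editing anything.

```python
import re

# Simplified version of the pattern above: a bare {{fr-adj-form}} headword,
# exactly one blank line, then a definition line starting with
# {{feminine of|LEMMA...}}. Group 1 captures LEMMA.
BARE_FEMININE = re.compile(
    r"\{\{fr-adj-form\}\}\n\n# \{\{feminine of\|([^|}\n]+)[|}]"
)


def find_bare_feminine_forms(wikitext: str) -> list[str]:
    """Return the lemmas of feminine forms whose headword lacks a gender."""
    return [lemma.strip() for lemma in BARE_FEMININE.findall(wikitext)]
```

A bot would then visit each matching page and edit the headword line, whichever way the question of gender in the headword is settled.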

Why should the gender information be in two places, both on the headword line and in the definition? - -sche (discuss) 18:28, 14 July 2014 (UTC)
I oppose this for the reason -sche gave, and the reasons you yourself gave too. —CodeCat 18:29, 14 July 2014 (UTC)
I find it quicker to understand with the gender in the head word. I say quicker, probably by a few tenths of a second. Renard Migrant (talk) 21:57, 14 July 2014 (UTC)

How about going the other way then? Actively removing the gender from the headword template? That's even easier to do! Renard Migrant (talk) 11:03, 15 July 2014 (UTC)

Software update: <ref> without <references/> no longer shows an error or categorizes[edit]

As was announced on Wikipedia but oddly not over here yet, "With the deployment of 1.24wmf12 on July 10, missing reference markup will no longer show an error; the reference list will show below the content [...] without adding a category, so there's no way to find and fix the affected pages." See this WP thread (permalink) and this WP thread (permalink) for discussion, and diff for an example of the phenomenon. Note that our abuse filter still (correctly) discourages adding <ref> without <references/>. - -sche (discuss) 17:18, 14 July 2014 (UTC)

If a page has multiple language sections, and un-<references/>ed ref tags are added to one of the upper language sections, the references appear in the last language section. This has the potential to be especially confusing for people who use Tabbed Languages. - -sche (discuss) 18:26, 14 July 2014 (UTC)


In a discussion above, DCDuring noted something that (I think) implied we're not using the term "uncountable" the way we should. But I'm not quite sure what this means, as to me uncountable just means having no plural. Is this not what it means, and what does it mean in that case? I came across a few categories named "singulare tantum", is that the term we should be using instead of "uncountable"? —CodeCat 18:34, 14 July 2014 (UTC)

Uncountable does not mean having no plural, it means that quantities of the noun are not measured in discrete amounts. Theoretically, a noun could be countable, but not have a plural if, for example, there is only one in existence and no one ever speaks of any others. For an uncountable noun, it is impossible to say that "there is only one in existence". Proper nouns can be countable but not have a plural: there is only one William Shakespeare (barring metaphorical usage, or others who happen to have the same name), but William Shakespeare is most certainly countable. --WikiTiki89 18:57, 14 July 2014 (UTC)
What I understand, then, is that uncountable words have no plural for semantic reasons (it makes no sense to speak of a plurality) while the remainder have none only because it is simply rarely used or not at all. —CodeCat 19:04, 14 July 2014 (UTC)
Well I think that proper nouns such as William Shakespeare also don't have a plural for semantic reasons, but it's a different semantic reason. --WikiTiki89 19:23, 14 July 2014 (UTC)
"Paint" is uncountable when you talk about "some paint" but countable when you talk about "three different red paints". If something has no plural but is singular, I tend to use the "plural not attested": {{en-noun|!}}. Equinox 19:30, 14 July 2014 (UTC)
Yes, but the "paint"s in your two examples are different senses. --WikiTiki89 19:34, 14 July 2014 (UTC)
I think CodeCat is thinking about the inflection line for common nouns and {{en-noun}}, ie, not definition-level of countablity/uncountability distinctions.
The prevailing pattern of usage of "uncountability" by English native contributors at Wiktionary coincides with the mass noun concept. However, many uses of various early incarnations of {{en-noun}} used features of the template intended to mark uncountability (mass noun) to suppress the display of plurals, for whatever reason the contributor felt justified that suppression, eg, user didn't-know-how/couldn't-be-bothered to get plural ending in "es" or a truly irregular form to display, user didn't think noun had or should have a plural form, plural form was not attested. If you combine that with the changes to {{en-noun}} wrought by contributors with an imperfect understanding of the concept, you can understand why we have not made much progress in rectifying this. I hope we can come up with some scheme so that our inflection-line displays can be made correct without thousands of hours of tedium and are not too misleading in the interim. I doubt that bots can be relied on however, except perhaps for narrowly circumscribed cases.
At the sense level we use "labels" or "contexts" to distinguish. Usually nothing prevents understanding if someone uses a countable noun uncountably or an uncountable noun (mass noun) countably, but we have invested a great deal of effort in attempting to distinguish countable from uncountable senses, and that effort is worth preserving. The task of marking each English noun sense as countable or uncountable (or both) is quite incomplete.
At the inflection-line level, we do not usually get data to support our claims that a given common noun is always or never countable or that countability or uncountability is the prevailing usage, relying mostly on native-speaker intuition, as most other dictionaries do not expend resources on this matter. DCDuring TALK 20:59, 14 July 2014 (UTC)
Would it be ok then if we adopt the practice of showing "no plural", "singular only" or the like in the headword line, and leave countable/uncountable information to the individual senses? That way the headword line is agnostic about countability, which makes sense if this can be different for different senses anyway. It would also mean changing the categorisation of many nouns, emptying out "uncountable nouns" categories in most cases and substituting it with something else. Possibilities might be Category:English singular-only nouns or Category:English nouns with no plural. We may want to revise the use of "plurale tantum" as well. —CodeCat 21:09, 14 July 2014 (UTC)
CodeCat said "as to me uncountable just means having no plural". Oh, come on, I find it hard to believe you're not better educated than that. There's such a thing as countable singular use, e.g. "I have a grain" is a countable singular use of grain. "I have some grain" is uncountable use of grain. Some countable nouns will be attested in the singular but not the plural. Renard Migrant (talk) 22:04, 14 July 2014 (UTC)
WT:AGF says we should take CodeCat's word for it. DCDuring TALK 22:15, 14 July 2014 (UTC)
No, it only says that we should assume CodeCat's intentions were in good faith. --WikiTiki89 15:00, 15 July 2014 (UTC)
@CodeCat: If eliminating inflection-line information would make things simpler for you, who am I to stand in your way? Why don't we eliminate the display of regular plurals (ending in "s", "es", and "ies") too? Oh, wait, users might value the information.
The logic of our entry display is that inflection-line information is assumed to carry over to definition lines unless there is something contrary indicated on the definition line. Thus exceptional plurals are sometimes displayed at definition lines, sometimes only at definition lines. It is a major change to depart from that formulation for one attribute of one PoS in one language, especially where the language is the wiki's host language.
So, before we start changing modules and templates of wide use, I would like to understand an implementation plan that preserves the correct information that is now in the inflection lines and transfers it to the definition lines for each type of headword-line, whether implemented using {{en-noun}} or {{head}} directly or by other means. A dump-processing run that took a census of the options used in {{en-noun}} would be useful for that. We must have at least a dated one to support the major changes you previously made to {{en-noun}}.
It would be nice if the changes were carried out with more care and knowledge than the changes made to {{en-noun}}. DCDuring TALK 22:15, 14 July 2014 (UTC)
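The census DCDuring asks for could be sketched roughly as follows. This is a minimal Python sketch, not an existing tool: it assumes the page texts have already been extracted from a dump into strings, that calls to {{en-noun}} contain no nested templates or piped links, and that "-", "~", "s", "es", "!" and "?" are the special option values worth counting separately (that list is an assumption; check the template documentation before relying on it). Every explicit plural form is collapsed to `<form>`, so the counts group calls by option pattern rather than by word.

```python
import re
from collections import Counter

# Option values counted as themselves; any other positional value is treated as
# an explicit plural form. Assumed list: confirm against {{en-noun}}'s docs.
SPECIAL = {"-", "~", "s", "es", "!", "?"}

# {{en-noun}} calls whose arguments contain no braces (no nested templates).
EN_NOUN = re.compile(r"\{\{\s*en-noun\s*((?:\|[^{}]*)?)\}\}")


def signature(arg_string):
    """Reduce one call's argument string (e.g. '|-' or '|geese') to a tuple."""
    sig = []
    for arg in arg_string.split("|")[1:]:
        name, eq, value = arg.partition("=")
        if eq:
            sig.append(name.strip() + "=")  # named parameter: keep only its name
        else:
            value = name.strip()
            sig.append(value if value in SPECIAL else "<form>")
    return tuple(sig)


def census(pages):
    """Count option signatures of {{en-noun}} across an iterable of page texts."""
    counts = Counter()
    for text in pages:
        for match in EN_NOUN.finditer(text):
            counts[signature(match.group(1))] += 1
    return counts
```

Sorting `census(pages).most_common()` would then show which option combinations are frequent enough to need an explicit migration path before the inflection-line information is moved to definition lines.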

One way to recognize the difference is that singular countable nouns typically require a determiner, while singular uncountable nouns can get by without one. At Simple English Wiktionary, this is dealt with at the sense level (e.g., alarm). I can't see how it would make sense to deal with it at any other level.--Brett (talk) 23:30, 14 August 2014 (UTC)

@Brett: Yes, but we do not yet have that information at the sense level in many, many of our polysemic English noun sections. In even more cases there is only one noun sense, so the information on the inflection line is adequate. Further, many definitions, whether usually countable or usually uncountable, will show some use of the other kind, but without much other semantic difference, so we would be essentially duplicating the definition in order to more clearly distinguish uncountable and countable use. All of this makes reform a little difficult, especially if it is solely undertaken as a template simplification and "tidying" problem. DCDuring TALK 00:43, 15 August 2014 (UTC)
Understood. In cases where both countable and uncountable uses exist for a given sense, we can simply say as much: rather than duplicating the entry, just have a tag for "countable and uncountable", which, again, is what is done at Simple English.--Brett (talk) 00:56, 15 August 2014 (UTC)
And we may not be able to come to agreement on exactly how it should be done. DCDuring TALK 01:48, 15 August 2014 (UTC)

Context Label: Reflexive[edit]

I have recently edited the module code for the context labels (https://en.wiktionary.org/wiki/Module:labels/data) so as to have it automatically send entries marked with the label "reflexive" into a category named "-LANGUAGE NAME- reflexive verbs". I did this in an attempt to have the Macedonian reflexive verbs compiled into a list, since I didn't see any other way to do this other than add "[[Category:Macedonian reflexive verbs]]" under each entry, which didn't seem like an ideal solution - I wanted something automatic, just like the automatic system that works for intransitive and transitive verbs. I also thought that if I merely wrote "[[Category:Macedonian reflexive verbs]]", it may end up erased in the future, whereas some automatic mechanism would be operable on a longer term. However, things have gone awry.

Apparently, the context label "reflexive" has been used for various entries in various languages to mark reflexive pronouns as well. It has also been used to denote reflexive senses of verbs which are not truly reflexive and thus don't belong in a reflexive verb list. Now, I suppose these things need mending, so I have come here to announce what has happened in hope that someone will be able to restore things the way they were before the change (and possibly advise me as to how to solve the problem I had with the Macedonian reflexive verbs, i.e. how to have them automatically go to a list of reflexive verbs). Martin123xyz (talk) 14:59, 15 July 2014 (UTC)

As far as I understand, you could have a label 'reflexive verb' that displays reflexive but categorizes in reflexive verbs. I don't know about other languages (much) but in French, almost all transitive verbs can be used reflexively, and almost no verbs are always reflexive, so you could talk about reflexive usage but not reflexive verbs (because they're not inherently reflexive, just they can be used that way). Renard Migrant (talk) 15:02, 15 July 2014 (UTC)
I noted before you made this change that calling verbs where one or more senses are used reflexively "reflexive verbs" is silly. Just look at Category:English reflexive verbs now. Almost none of them are actually reflexive, they just happen to have a sense that is used reflexively. The same applies to Category:English transitive verbs and Category:English intransitive verbs as well, which also had categorisation added recently for some reason. And Category:English countable nouns and Category:English uncountable nouns are a similar problem, which prompted me to start the discussion above. —CodeCat 15:03, 15 July 2014 (UTC)
That's the argument for categories with names like Category:English nouns with countable senses. Even then, it would seem even better to just not categorize at all. Renard Migrant (talk) 15:19, 15 July 2014 (UTC)
Probably, yes. Most of the time these labels are only used when it's not clear from the definition, or to contrast with other definitions. So paradoxically, the nouns labelled "countable" are primarily those which are also labelled "uncountable". —CodeCat 15:21, 15 July 2014 (UTC)
I know that many think it pointless to have a reflexive verb category, but in Macedonian there are some verbs that are always reflexive, i.e. whose reflexive form is inherent. For example, "се кае" means "to regret", but "кае" doesn't mean anything. Also, there are many cases where a reflexive form of a verb is unrelated to the basic one when it comes to meaning. Thus, "дере" means "to skin" whereas "се дере" means "to scream". I think that these verbs deserve a separate category. Finally, many of the reflexive verbs in Macedonian have one-word equivalents in English; in those cases, English doesn't convey reflexivity explicitly. For example, Macedonian "се движи" and "движи" both correspond to English "move", but they have different meanings: the former means to be in motion whereas the latter means to cause something to be in motion. I think that in these cases too, the reflexive verb deserves its own category.
It's not as though I created separate entries for all reflexive forms in Macedonian and then declared them unique verbs. For example, I haven't created an entry "се допира" beside "допира", because I don't feel that there is anything special about the reflexive form: it is marked explicitly in English too. Namely, the difference is that between "to touch oneself" and "to touch". This is because "се допира" is a true reflexive verb, and I am not really focusing on the true reflexive verbs. I am more interested in a separate category for the autocausative, anticausative and inherent ones. The true reflexive, reciprocal, and universal passive verbs are predictable and, as you pointed out, derivable from any transitive verb. I really don't know why all of these are under the umbrella term "reflexive verbs"...
Anyway, I have a potential solution. I would create a new context label in the module code, called "mkreflexive", which would send entries to "Macedonian reflexive verbs", and I would mark Macedonian reflexive verbs with it. Meanwhile, I would set the display to simply "reflexive", which is what users actually need to see. Then, only I (and possibly someone who chooses to continue my work in the future) would use this label, and there would be no categories for reflexive verbs for other languages and no problems with reflexive pronouns or pseudo-reflexive verbs. However, could the problem I have caused already be fixed, i.e. could all the unnecessary (and defective) categories be undone? Martin123xyz (talk) 15:24, 15 July 2014 (UTC)
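Renard Migrant's suggestion of a label that displays "reflexive" but categorizes separately, and Martin123xyz's "mkreflexive" proposal, both rest on keeping a label's displayed text independent of the category it adds. The Python sketch below models only that idea; the table, its field names and the `apply_label` helper are hypothetical and do not reflect the actual layout of Module:labels/data, which is written in Lua.

```python
# Hypothetical label table: the displayed text and the category are separate
# fields, so two labels can look identical but categorize differently.
LABELS = {
    # Plain label: shown to readers, adds no category.
    "reflexive": {"display": "reflexive", "category": None},
    # Generic variant: category name is built from the entry's language.
    "reflexive verb": {"display": "reflexive", "category": "{lang} reflexive verbs"},
    # Language-specific variant along the lines of the "mkreflexive" proposal.
    "mkreflexive": {"display": "reflexive", "category": "Macedonian reflexive verbs"},
}


def apply_label(label, lang):
    """Return (text shown to readers, category name or None)."""
    entry = LABELS.get(label, {"display": label, "category": None})
    category = entry["category"]
    if category is not None:
        category = category.format(lang=lang)
    return entry["display"], category
```

Because the plain "reflexive" label adds no category in this model, reflexive pronouns and senses that are merely used reflexively would stay out of the verb category; only entries deliberately tagged with one of the other labels would be collected.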
{{fr-verb}} covers this by allowing type=reflexive. There are some such verbs; s'agir differs in usage from agir, for example. Renard Migrant (talk) 15:28, 15 July 2014 (UTC)
I don't agree with creating such a label. A better solution would be to let the inflection table add the category. —CodeCat 15:31, 15 July 2014 (UTC)
How would I let the inflection table add the category? I use the same inflection table for reflexive verbs, except that I use the parameter "ref" to have it add the reflexive marker "се" where appropriate. 16:08, 15 July 2014 (UTC)
You (or someone) can edit the template and have the "ref" parameter trigger a category. --WikiTiki89 16:48, 15 July 2014 (UTC)
Could you tell me how to have the parameter trigger a category? I have no idea where to code that, as I've never even defined the "ref" parameter anywhere. I just automatically used it in an if-statement and it worked. Martin123xyz (talk) 16:55, 15 July 2014 (UTC)
Exactly the same way. You use it in an if-statement, and have the true-clause add a category: {{#if:{{{ref|}}}|[[Category:WHATEVER]]|}}. You can add that anywhere really, but the end is the best place I think. --WikiTiki89 17:06, 15 July 2014 (UTC)
How very simple - thank you. I didn't think it could just work like that. I'll see to it soon enough. Martin123xyz (talk) 17:08, 15 July 2014 (UTC)
I prefer refl in general to avoid confusion with reference. Renard Migrant (talk) 20:58, 17 July 2014 (UTC)
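The behaviour of WikiTiki89's snippet can be modelled in a few lines of Python, as an illustration rather than anything MediaWiki actually runs. `{{{ref|}}}` falls back to an empty string when the parameter is not supplied, and the ParserFunctions `#if` treats an empty or whitespace-only test string as false, so the category link appears only when `ref` has a non-blank value. The `key` argument reflects Renard Migrant's preference for `refl`; the default category name is an assumption taken from the Macedonian use case.

```python
def ref_category(params, key="ref", category="Macedonian reflexive verbs"):
    """Model {{#if:{{{ref|}}}|[[Category:...]]|}} for a dict of template arguments."""
    # {{{ref|}}}: use the argument if given, otherwise the empty default.
    value = params.get(key, "")
    # #if is false for an empty or whitespace-only test string.
    if value.strip():
        return f"[[Category:{category}]]"
    return ""
```

For example, `ref_category({"ref": "1"})` yields the category link, while `ref_category({})` yields an empty string, which is why the snippet can be appended to the end of the inflection template without affecting non-reflexive verbs.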

Script or language: let us reduce ambiguity and prevent confusion![edit]

On pages such as https://en.wiktionary.org/wiki/Appendix:Proto-Slavic/-ica and in many translation lists, spellings of one word in one language are given in multiple scripts. These scripts are indicated by names that sometimes coincide with language names, such as "Latin" and "Hebrew". That easily creates ambiguity and, with it, confusion, at least for me. I have changed such references several times by adding the word "script" where a script is meant, but such contributions have also been reverted. I do insist that in tables where languages (or language groups) may be branched into several languages, and where languages may be branched into several scripts, it is difficult for the eye to make out whether the final branch concerns a language or a script.

I propose adding the word "script" to all occurrences of script names near language names. I do not know how to do it, but there seem to be scripts (of a different kind, this time) that can help us do this in a rather automated way.

Please help a language enthusiast, and his colleagues!

(I am trying to find my way, and just found out that this page is divided into month parts. This is the second place where I have added my plea, because I put the first one in a month part somewhere in 2013.) Redav (talk) 20:27, 17 July 2014 (UTC)

Support. Before I started using targeted translations, I used to come across some really strange Latin translations, only to find out it was Latin-script Ladino. It was my fault for skimming through the translations too quickly, but it won’t hurt to add script to the lines. — Ungoliant (falai) 20:35, 17 July 2014 (UTC)
(e/c) I don't see where the confusion can come from, since Latin is not a sub-language of Serbo-Croatian. In some places, we do use the word "Roman" instead, but this does not solve the problem in the general case, since some multi-scriptal languages use scripts like Hebrew and Arabic, which are also languages. --WikiTiki89 20:38, 17 July 2014 (UTC)
"Roman" is a misnomer anyway. The script is called Latin; "roman" is one variety of the Latin script, the other being called "italic". —Aɴɢʀ (talk) 20:44, 17 July 2014 (UTC)
By that logic, romanizations would be de-italicizations. "Roman" has both meanings. --WikiTiki89 20:50, 17 July 2014 (UTC)
Would they even be de-italicizations, if they were presented (as they sometimes are) in italics? Would they not then be italicizations? Why, this puts a whole new spin on the debate over whether or not to italicize Cyrillic! lol
As WikiTiki says, "Roman" has both meanings. - -sche (discuss) 20:55, 17 July 2014 (UTC)
For some languages, such as Cree, script names are (in my experience) not provided at all. One sees simply Cree: ᒪᐢᑲᐧ / maskwa. Providing script names, especially with "script" spelled out, would be quite unwieldy:
- -sche (discuss) 20:55, 17 July 2014 (UTC)

For Beer parlour people who work more in discussions than on translations or in the main namespace: it IS confusing to have "Latin" and "Hebrew" mean both the script and the language (likewise the script tags Roman and Cyrillic). If you used User:Conrad.Irwin/editor.js a lot, you'd notice that the name conflict is quite frequent: a translation into Hebrew, Aramaic, Serbo-Croatian or Latin appears where it isn't expected, whether through this tool, a bot or human error. I'm not suggesting any specific solution but just letting you know that I have also experienced these problems firsthand and I am also very interested in the resolution. --Anatoli T. (обсудить/вклад) 00:49, 18 July 2014 (UTC)

You're right that it causes bugs in some of our tools, but I don't think it's confusing to people (at least when everything is formatted correctly). --WikiTiki89 13:13, 18 July 2014 (UTC)

Thanks for your input so far! I read:

  • 1) that at least two more people got or get confused by the ambiguity of names that may either indicate a language or a script;
  • 2) that someone does not see where the confusion could come from; that remark referred to the one particular example I mentioned, and (obviously) to readers who know beforehand that Latin is not a sub-language of Serbo-Croat; luckily I knew, otherwise I might indeed have been misled by the information, and that is exactly my point!;
  • 3) that someone thinks adding the word "script" to script names is unwieldy; I would have thought so too if I had not yet (admittedly only recently) discovered that the actual work may be done by bots.

For substantiation's sake, here I give you more examples where confusion looms.

On https://en.wiktionary.org/wiki/water I read in the translations list:



If I were totally ignorant about these languages and scripts, I would not be able to make out whether Devanagari was a sub-language, whether Campidanese was an orthography name (some languages have several orthographies) or even a script name, or whether Arebica was a sub-language.

On https://en.wiktionary.org/wiki/Appendix:Proto-Sino-Tibetan/%C5%8Ba I encounter:

        • Tibetan
          • Written Tibetan: (nga, I)
            • Modern Tibetan [Lhasa]: /ŋa˩˨/


      • Kiranti
        • Eastern Kiranti = Rai
          • Rodong: /ka-ŋa/
          • Limbu: ᤀᤅ (aṅa) /əŋa/
          • Waling: /aŋ-ka/

Again, I personally know that written and spoken languages exist alongside each other, I know that // are used to enclose IPA-script indicating pronunciation. But if "Waling" had not been mentioned and if the pronunciation in Limbu had been the same as in Rodong, I could have believed that Rodong and Limbu are just different scripts for the same language.

On https://en.wiktionary.org/wiki/Appendix:Proto-Semitic/bayt- I see:

The unsuspecting reader might be led to think that Egyptian Arabic is meant as a script name (since the spelling differs from the vocalized spelling after "Arabic"), and that Maltese is yet another spelling or orthography. (By the way, there are people arguing that Maltese is a dialect rather than a language.) The Aramaic, Hebrew, and Syriac cases were already mentioned by a user above.

I do acknowledge a difference is (often) made between language and script indications by putting a bullet in front of a language name and leaving it out in front of a script name. But the difference in meaning between bulleting and non-bulleting does not seem to be easily understandable, and in my case, I did not notice it until I was already confused.

To return to my proposal:

  • I think sharing and finding information on e.g. languages and scripts is very valuable.
  • I think the information given about language and script names deserves to come across clearly and easily. I and several of you have given examples of things that may (and do) cause confusion to me, to several of you, and to other unsuspecting readers.
  • I get the impression that bots could do the hard part of the job more or less automatically, and that the "only" work a human might have to do is adding " script" to all script names. I volunteer (if I get the access needed, which does not seem the case now).
  • I read of no objections (other than the alleged unwieldiness in adding the word "script"), there are no remarks that e.g. adding the word "script" confuses or makes reading more difficult.
  • I propose adding the word "script" behind a script name, but am open to other suggestions.
  • To the remark "For Beer parlour people who work more in discussions than on translations or in the main namespace" I would like to say: I would like to be adding translations, and I have done that a few times already. But, at least to me, Wiktionary is not always intuitive to handle: I seem to have managed to scramble a translations box when adding a translation; I added the word "script" several times but saw the additions reverted (which is fine with me if the problem I observe is solved in a different way) and at the same time learnt that bots seem to make changes as well; and I added a few other bits and pieces but did not find the expected layout readily at hand (I visited multiple pages to find examples and tried to mimic them). I am not certain that my input will not be overridden by bots, because I cannot see any indication warning me that I am editing in a place where a bot might revert my work. But this is a different topic already, isn't it? Redav (talk) 13:19, 2 August 2014 (UTC)
    Re "someone thinks adding the word "script" to script names is unwieldy; I would have thought so too if I had not yet (admittedly only recently) discovered that the actual work may be done by bots": I don't think it was meant that the work would be difficult, but that the word script adds unnecessary text to the screen. --WikiTiki89 20:56, 2 August 2014 (UTC)
    • You may be right. I reacted to -sche's remark "Providing script names ... would be quite unwieldy." rather than to "Reading script names ... would be quite unwieldy." Personally I am not convinced that the extra word "script" would look unwieldy: only the relatively few languages that are or were written in two or more scripts would show the extra word "script" next to the various spellings of their words. Redav (talk) 15:21, 8 August 2014 (UTC)

I have just come to realize that leaving out any script name (so even the part that would come before the word "script") would solve the problem as well. We would then simply have e.g.:

in the list of translations for "water". Or would that (re)create other problems that were meant to be solved by indicating script names? Redav (talk) 15:21, 8 August 2014 (UTC)

That is actually a brilliant idea! I would support it. --WikiTiki89 15:25, 8 August 2014 (UTC)
Support it as well. —CodeCat 15:27, 8 August 2014 (UTC)
Wikitiki is right, I was saying that spelling out script names like "Canadian Aboriginal Syllabic script" would make the relevant lines in the translations tables undesirably long. At present, for languages that use CAS, script names are IME not provided at all. Dropping script names in other places is a fine solution in my opinion. As far as I know, there are no cases where both script forms and dialect forms are nested on separate lines, because that would be untenably confusing even in the current arrangement. (See for example how the Serbo-Croatian translations of euro and chemistry are provided.) - -sche (discuss) 19:17, 9 August 2014 (UTC)


User:JackBot, a bot which belongs to User:JackPotte, has been active on Wiktionary in the past, but in December I noticed that it has no bot flag, so I followed our current procedure as I understand it and blocked the account. Since a protest of the block has now been posted on the talk page, I thought it would be a good idea to expedite things by bringing it up here. I also want to know if I should have dealt with the matter differently, and if I should handle bot accounts differently in the future.

I should mention that, although most of the edits have been interwikis, a run was performed in March of 2013 that created a large number of entries for Geological era names, at least some of which (if memory serves) ended up in rfv. I don't have any objection to those entries as a whole, and they may very well be a one-time exception to the bot's normal interwiki tasks, but I thought they were worth mentioning, just to be complete. Chuck Entz (talk) 21:58, 19 July 2014 (UTC)

Hello, just to point out that my March edit summaries pointed to the BP permission for those edits. JackPotte (talk) 22:48, 19 July 2014 (UTC)
I think JackBot failed either two or three bot votes. Renard Migrant (talk) 21:20, 23 July 2014 (UTC)
Precisely and objectively I had already proposed here two bot jobs which had been judged unnecessary by a minority:
  1. Wiktionary:Votes/bt-2009-12/User:JackBot
  2. Wiktionary:Votes/bt-2010-11/User:JackBot2
But they could also have been useful here, as they are on 21 other wikis, as you can see, and they are unrelated to the test for which I was later blocked indefinitely without a message (which is not what the current recommendations advise, as I have already pointed out in the dedicated template).
Moreover, I used to publish my scripts on the bot subpages and GitHub, if you want to form your own opinion of the whole context apart from that. JackPotte (talk) 19:30, 24 July 2014 (UTC)

In fact, after three weeks nobody has dealt with Category:Requests for unblock, and the blocker is simply ignoring my follow-up on his talk page. Is the Wiktionary policy something that is not applied? JackPotte (talk) 17:24, 9 August 2014 (UTC)

  • The above-mentioned votes (1, 2) were interpreted at the time as showing "no consensus" for having JackBot perform the tasks which were the subjects of those votes. But firstly I think some of the votes/subvotes might have been interpreted as passing had they been held under present [unwritten] "rules"/sentiments (as they show 2-to-1 support), and secondly we have (as someone observed in another thread) come since that time to realize how useful it is to have multiple interwiki bots. Is there anyone who objects to JackBot adding interwikis? Is there anyone who would like to vocally support JackBot adding interwikis? I am willing to unblock the bot if there is support for that. - -sche (discuss) 01:26, 12 August 2014 (UTC)
Lmaltier (talkcontribs) has seen my work on fr.wikt for six years; I would be grateful if he could testify here... JackPotte (talk) 09:59, 12 August 2014 (UTC)
I support JackBot adding interwikis, as long as they are added correctly, in the right order, etc. I also think we should have User:Ruakh's input as the operator of one of the interwiki bots. --WikiTiki89 13:56, 12 August 2014 (UTC)
I've slowly come to that conclusion myself. It's hard to remember the exact reasons for the block 9 months later, but at the time I was aware of the deletion of some of the geological entries, but had forgotten about any discussions here re: permission to do a bot run. When I saw that he had been refused the bot flag in the past, I apparently thought the block would be the proper way to prompt him to come here to discuss it per WT:BOT. I overlooked the fact that the bot was only doing interwikis, a gray area that we intentionally leave open in our bot enforcement practice. I brought the matter here because I thought there was a possibility that I had made a mistake, but I wasn't sure. I would support unblocking as long as the bot sticks to interwikis. Chuck Entz (talk) 14:15, 12 August 2014 (UTC)
I do not think that interwikis are a gray area that we leave open. You were absolutely correct to block the unapproved bot and direct its owner to seek the bot flag. (And similarly for the geological-entries case. Even if the run had consensus, it should not have been performed without the flag.) —RuakhTALK 20:11, 14 August 2014 (UTC)
  • Incidentally, I do support having more interwiki-bots. There was a time when we decided that we didn't need more than one, because the Interwicket/Rukhabot dump-based approach covered them all — and in mainspace that's true — but since then we've eased up on that, realizing that there's little harm in having more-traditional/Wikipedia-style interwiki-bots as well. The two are not in competition. And the more-traditional/Wikipedia-style interwiki-bots can cover categories and appendices, which the Interwicket/Rukhabot dump-based approach cannot (or at least, currently does not). —RuakhTALK 20:18, 14 August 2014 (UTC)
    @Ruakh: I have created Wiktionary:Votes/bt-2014-08/User:JackBot for bot status. Please instruct me if there are some deficiencies with the vote, such as details on how the interwiki is to proceed. --Dan Polansky (talk) 20:24, 14 August 2014 (UTC)

Russian pronunciation - standard, alternative, regional, dated or simply individual[edit]

User:Wikitiki89 has been persistently adding alternative Russian pronunciations, which I consider not only non-standard but individual and rare, possibly limited to immigrants. He has been very persistent in his edits, so any reversal just results in an edit war. I have no problem with having alternative non-standard forms, but Russian is much more phonetic than he claims it to be: if you pronounce a word irregularly, you can spell it that way. There are notable (well-documented) exceptions, which also follow certain rules or patterns, but there is some limit to irregularities. I tried to compromise by creating alternative non-standard forms, but he insists on adding these irregular pronunciations to the regular entries. In particular, he claims that these words can alternatively be pronounced as follows:

  1. капюшо́н (kapjušón) as капишо́н (kapišón)
  2. двою́родный (dvojúrodnyj) as двою́рный (dvojúrnyj)
  3. во́доросль (vódoroslʹ) as во́дросль (vódroslʹ)
  4. не́который (nékotoryj) as не́кторый (néktoryj)
  5. сейча́с (sejčás) as щас (ščas) (I'm OK with this one but still the casual pronunciation should belong to the alternative forms, since it exists)

Another claim was that бюрокра́тия (bjurokrátija) can also be pronounced as бирокра́тия (birokrátija), which I find quite ridiculous. He is also using кверх нога́ми (kverx nogámi) as the first translation for upside down; "кверх нога́ми" sounds very rustic and illiterate to me (even if this form can be found on the web), while вверх нога́ми (vverx nogámi) is the common and standard form. These alternative forms do exist, but they are less common, and these pronunciations are neither standard nor common. In any case, the alternative pronunciations, IMO, belong to alternative forms. I am creating a request on gramota.ru, since I don't know how to handle this situation. The English Wiktionary doesn't have enough native Russian speakers, so I'm not sure this argument can be resolved. On the Russian Wiktionary such edits would ultimately be reverted. I don't claim to be the ultimate source for the Russian language, but some of Wikitiki89's Russian edits surprise me. Sorry, I don't mean to insult him. My goal is accuracy. --Anatoli T. (обсудить/вклад) 00:43, 21 July 2014 (UTC)

Let it be known that (1) Anatoli's Russian and my Russian are from different regions, (2) I was raised in a highly educated environment and could not possibly have picked up any "illiterate" Russian, and (3) I have been willing to discuss each of the above cases individually with Anatoli and don't see a need for BP discussion. --WikiTiki89 01:44, 21 July 2014 (UTC)
1) My family's accents are a mixture of southern Russian, Ukrainian and Siberian accents. Thanks to my education and self-discipline with the language, I speak standard and common Russian, not southern or Ukrainian Russian; I have travelled a lot in Russia, read many books and watched a lot of movies, videos, etc. My Russian is not regional at all, and I can tell when Russian is regional or non-standard. And since I lived in Russia until I was 30, speak Russian with my family and friends, and communicate with Russians in Russia, I have been exposed to various accents. I'm sure I can judge what is right and what is wrong in Russian to a high degree, but as I said, I by no means consider myself an ultimate source. Having said this, I humbly consider my Russian significantly better than his. 2) It is quite commendable for a long-time emigrant who left Russia at a young age to preserve the language, but there are still small problems, which show in the edits, and I don't think we should allow misleading info. 3) The discussions so far have not been very fruitful, and edit-warring has happened on a number of entries. As an interim solution, I suggest sourcing irregular pronunciations with something other than plain Google searches. As I said, I don't oppose non-standard form entries, which I have also created. --Anatoli T. (обсудить/вклад) 02:05, 21 July 2014 (UTC)
And how should I do that? With links to YouTube videos? --WikiTiki89 02:17, 21 July 2014 (UTC)
Not sure yet. Maybe Youtube, if the pronunciation is clear and the speakers are native speakers. --Anatoli T. (обсудить/вклад) 02:23, 21 July 2014 (UTC)

I support moving non-standard pronunciations to the entries with non-standard spellings. --Vahag (talk) 08:57, 21 July 2014 (UTC)

This would resolve some disagreements, since Russian pronunciation is much more regular and most irregularities are documented. It doesn't matter that much whether a spelling is used more often than a pronunciation or the other way around; we just need to create those non-standard forms. The irregular spelling is quite often used to render the irregular pronunciation, and the existence of irregular spellings can usually be easily confirmed. --Anatoli T. (обсудить/вклад) 09:11, 21 July 2014 (UTC)
On the other hand, most people who use the colloquial pronunciations, use only the formal spellings. --WikiTiki89 11:55, 21 July 2014 (UTC)
Yes, but even if a person reads "what's up" as "wassup", it doesn't mean that "what's up" should have the same pronunciation. It's better to separate regular and irregular pronunciations and spellings, especially when a form is definitely a different (older, colloquial, regional) form of another one, like капюшо́н (kapjušón) and капишо́н (kapišón). --Anatoli T. (обсудить/вклад) 22:52, 21 July 2014 (UTC)
"What's up" and "wassup" is a bad example, because it is colloquial either way and so people will write it exactly as they say it. This is more like environment being pronounced like enviorment (which is citeable in google books:"enviorment"). Most people who say enviorment, still write environment, which is why it makes sense to have the pronunciation right there. --WikiTiki89 23:32, 21 July 2014 (UTC)
Well, it depends on the case and if you can make this type of judgement. If you consider "wassup" a bad example, then "капишон" is worse. It's a dated form, not an alternative pronunciation, since "пю" is never read as "пи" as you insisted. --Anatoli T. (обсудить/вклад) 23:51, 21 July 2014 (UTC)
I said that "wassup" is a bad example, because "what's up" is also colloquial. "Капюшон" is not colloquial, so that reason does not apply. Compare it to my example of environment. --WikiTiki89 23:58, 21 July 2014 (UTC)
I think you misunderstand. "wassup" (colloquial) should have its own pronunciation, so should "капишон" (dated) and "двоюрный" (irregular) but the regular forms shouldn't include them. "водросль", "некторый" may be considered similar to the "enviorment" case. --Anatoli T. (обсудить/вклад) 00:14, 22 July 2014 (UTC)
I did not misunderstand you. You misunderstood me. All I am saying is that "wassup" is a bad analogy because "what's up" itself is also colloquial, while "enviorment" is a much closer analogy. Would you say that the pronunciation /ɪnˈvaɪɚmɪnt/ doesn't belong at environment? --WikiTiki89 00:25, 22 July 2014 (UTC)
It does, I have already said so, so does "сечас" belong to "сейчас", "пожалуста" to "пожалуйста", also "водросль", "некторый", even if pronunciations are less common. --Anatoli T. (обсудить/вклад) 00:33, 22 July 2014 (UTC)
Am I missing something or are you agreeing with me now? --WikiTiki89 00:39, 22 July 2014 (UTC)
What I'm saying is, one needs to judge whether a pronunciation is indeed alternative or it should belong to a different spelling. /ɪnˈvaɪɚmɪnt/ and /ɪnˈvaɪɚnmɪnt/ can belong to "environment" entry. Same with some Russian words I mentioned above, e.g. сейчас as /sʲɪˈt͡ɕæs/ (=сечас). However, "капюшон" and "капишон" should definitely have separate pronunciations, like "wassup" and "what's up". --Anatoli T. (обсудить/вклад) 00:48, 22 July 2014 (UTC)
Ok, so you agree with me for "водросль" and "некторый", but not for "капишон". I'm willing to concede "капишон" for now until I get some more data on it. I have already found a few YouTube examples of the pronunciations "водросли" and "проволка" used with the spellings "водоросли" and "проволока". --WikiTiki89 01:10, 22 July 2014 (UTC)
Yes, "водросль" and "некторый" are OK. Even though I don't think they are common (I found that other people were as surprised by these pronunciations as I was), they can be considered alternative pronunciations with a dropped vowel, which does happen. So, I'm conceding on these. I put "двоюрный" into the same bucket as "капишон", although they differ in etymology. Note that even if you find the pronunciation "капишон", it still belongs to this different spelling. Just making sure you agree on the distinction. --Anatoli T. (обсудить/вклад) 01:19, 22 July 2014 (UTC)
It depends. For example, if I find a video titled "мой классный капюшон" where it is clearly pronounced "капишон", then that is (one piece of) evidence that the pronunciation does belong at "капюшон". I also think there is some confirmation bias going on. When I listen to someone say "капюшон", I hear "капишон"; and I'm sure that if you listen to someone say "капишон", you will hear "капюшон". These vowels are very close and in a short unstressed syllable, they are hard to distinguish. --WikiTiki89 01:27, 22 July 2014 (UTC)
I know what you mean. There are ways: as I suggested, one can use a tool such as Audacity, which lets you listen at a very slow speed (the speed is adjustable). The audio should be available as an MP3 or OGG file, for example. Yes, your example with "мой классный капюшон" would work. As an example, I used Audacity to determine Chinese tones and prove my point that Chinese tones are pronounced even in quick speech. "Hard" is not impossible with technology. --Anatoli T. (обсудить/вклад) 01:42, 22 July 2014 (UTC)
As a longtime student and user of the Russian language, I consider the Russian entries on English Wiktionary to be intended for native English speakers who are interested in a Russian word or who are studying Russian. As such, I see no value in putting these anomalous pronunciations here, and I think American students of Russian will take away the wrong thing from them. Such pronunciations belong in the Russian Wiktionary for the enjoyment of a native Russian audience. This reminds me of w:Charles Robert Jenkins, an American defector to North Korea. Jenkins got a job teaching English at a North Korean university, since the North Koreans wanted to learn English well enough to pass as South Korean. However, Jenkins was from North Carolina and spoke with a strong southern accent. Once the Koreans learned his English pronunciation was very odd, he was fired from his job. When people study a foreign language, they usually want to learn the best standard pronunciation. —Stephen (Talk) 03:31, 22 July 2014 (UTC)
I have nothing against properly indicating which pronunciations are standard and which are colloquial, but there is no reason to suppress information. --WikiTiki89 04:00, 22 July 2014 (UTC)
We’re not suppressing information, it’s a matter of putting the information where it belongs. This information belongs on the Russian Wiktionary. There are three major accent areas in spoken Russian ... if we wanted to see nonstandard pronunciations here, it would be far more preferable to show the pronunciations of the other two major accents, northern (with оканье among other features) and southern (with аканье/яканье among other features). But even this is really not useful to indicate on every page, and would be likely to cause confusion and damage. The northern and southern Russian accents should be described and explained with sufficient examples on Appendix pages. But the idiosyncratic pronunciations you are adding are not so useful or interesting and I would not include them on English Wiktionary at all. —Stephen (Talk) 05:55, 22 July 2014 (UTC)
I will not comment on the specifics of this discussion, as I’m not very familiar with Russian, but I support the inclusion of regional, nonstandard and colloquial pronunciations in the English dictionary. They should be tagged as such, of course. — Ungoliant (falai) 21:48, 23 July 2014 (UTC)
It's about the specifics of various pronunciations. It's not so much about whether we include regional, nonstandard and colloquial pronunciations, but whether they are frequent enough for inclusion (not individual quirks, or used only by limited overseas communities) and whether they belong to the same spelling as the standard pronunciation. Yes, labelling is important and we do include variants. Major variations, such as northern "okanye" and southern "h" for "g", could be considered as well, if they are needed. --Anatoli T. (обсудить/вклад) 22:59, 23 July 2014 (UTC)

User:Wikitiki89 and proper nouns

User:Wikitiki89 has been going around changing things from proper noun to common noun. In particular, he has been doing it with political factions such as Libertarian and Democrat, plus the California separatist group known as the Osos. I believe that he is in error, and I have reverted him pending discussion here. Purplebackpack89 17:34, 23 July 2014 (UTC)

Democrat, Libertarian and Oso are proper nouns
  1. Purplebackpack89 17:34, 23 July 2014 (UTC)
Democrat, Libertarian and Oso are common nouns
  1. Sure, Democrat, Libertarian and Oso are common nouns. Just like many of the items at Category:English words suffixed with -ian. Just like Frenchman, Popperian, or Clintonite. --Dan Polansky (talk) 17:58, 23 July 2014 (UTC)
Please take a look at our POS for Englishman, American, Frenchman, and many more, none of which I have ever edited. --WikiTiki89 17:37, 23 July 2014 (UTC)
But you have mass-edited a number of pages in the last hour or so after I mentioned Democrat was tagged as a proper noun, and have edit-warred with me to keep them common nouns. You should stop changing pages until this discussion is over or you've linked me to another beer parlor discussion that supports your POV. Purplebackpack89 17:43, 23 July 2014 (UTC)
I do not need your permission to make changes that we have had a consensus on for a long time. --WikiTiki89 17:45, 23 July 2014 (UTC)
If you claim such consensus has existed for a long time, the least you can do is provide a link to that discussion (and the discussion from earlier this month is a) still going, and b) not at consensus at the moment). And if the last discussion with consensus was indeed a long time ago, then it may not hold now and it is perfectly acceptable to revisit it. Particularly if the discussion was about some subset of nouns that are different from this subset. Purplebackpack89 17:50, 23 July 2014 (UTC)
Take your pick. --WikiTiki89 17:55, 23 July 2014 (UTC)
Wikitiki89 is correct; these are common nouns, like Briton and Nazi. - -sche (discuss) 18:01, 23 July 2014 (UTC)
I agree, and so do the professionally edited dictionaries in which I just checked Frenchman (Chambers, Merriam-Webster, OED). The test of "properness" of a noun is not, of course, just whether it has a capital letter! Equinox 18:56, 23 July 2014 (UTC)
So a user has been going round making correct edits. Why are we discussing this? Are there so few correct edits nowadays we need to have threads to discuss them in? Renard Migrant (talk) 21:19, 23 July 2014 (UTC)
Democrat isn't the faction (as you put it); the Democratic Party is the faction. Perhaps that's what's causing the confusion here. Renard Migrant (talk) 21:26, 23 July 2014 (UTC)

Can we get 'particularly useful translation target' into CFI?

From Wiktionary:Requests for deletion#emergency physician (later: Talk:emergency physician), two users want to keep it outside of CFI as a translation target. I worry that translation targets are a bit of a slippery slope. Do we want an entry in English for everything that can be expressed as a single word in one other language? No. Because then we'd end up with he had had in his possession a bunchberry plant (I'm not kidding, see xłp̓x̣ʷłtłpłłs). Is there any way to regulate this? There's a further issue: translation is necessarily subjective, so what one person might translate with a two-word noun, I might translate with a slightly different two-word noun. It's tricky.

As a completely separate issue, I've noticed that entries de facto don't need to meet CFI. They just need to not get nominated for deletion or get nominated and pass with a consensus even if they don't meet CFI. I suppose that's why serious efforts to amend CFI into something usable have failed. It's easier to just keep on ignoring it. Renard Migrant (talk) 21:16, 23 July 2014 (UTC)

You're setting up a strawman. No one has ever been proposing to have he had had in his possession a bunchberry plant only because there is a single entry like xłp̓x̣ʷłtłpłłs. If we were after a formal strict set of criteria for translation targets, we would take care to handle these sorts of languages. --Dan Polansky (talk) 21:36, 23 July 2014 (UTC)
I'm not setting up a strawman. I'm saying we would need criteria and you seem to be agreeing. Renard Migrant (talk) 21:45, 23 July 2014 (UTC)
I agree; this practice should be codified. My first suggestion is that it should be used for lexemes, not individual forms, with distinct meaning (i.e., let’s not add I will do because farei, haré etc. exist nor translations of the “sentence-words” of polysynthetic languages). — Ungoliant (falai) 21:40, 23 July 2014 (UTC)
Thirded. We should codify "hot words" too. The problem always is that there are so many issues and so few people willing to tackle them. And often people get distracted before we reach anything conclusive. Keφr 22:10, 23 July 2014 (UTC)
  • Support: Purplebackpack89 22:12, 23 July 2014 (UTC)
  • Support as well. I would also support clarifying CFI in general to make it less opaque and more friendly to people not familiar with Wiktionary. Ideally, it should be written in such a way that someone who has spent only a day using Wiktionary (as a reader, not a contributor) should be able to understand enough of it to not do anything really bad. —CodeCat 22:16, 23 July 2014 (UTC)
  • Support as well. Also, I think we need to include Wiktionary:Lemming_principle#Lemming_test. What about back-translations from English (for lexemes only, as per Ungoliant's comment above)? Terms such as па́лец ноги́ (pálec nogí) and 足の指 (ashi no yubi), etc. have passed RFD, both non-idiomatic translations of toe, literally "finger of the foot". Such terms do make their way into various dictionaries: since "toe" exists in English, what's the word for it in language X? --Anatoli T. (обсудить/вклад) 22:59, 23 July 2014 (UTC)
    • To note: the closing comments at Talk:палец ноги and Talk:足の指 are "Kept: no consensus to delete either entry" and "Kept: no consensus to delete" respectively. Keeping due to "no consensus" is a rather weak outcome in my opinion, and I always had the impression that these "no consensus" entries are more open to renomination than those with clear consensus to keep. This is hardly "passing RFD". Keφr 23:17, 23 July 2014 (UTC)
I know. "No consensus" is not a strong case for closing RFD. Still, they are kept for now. With proper formatting (a soft redirect?) and labelling, they may be a bit more palatable. They are not idiomatic by definition and if they are only there to point users to how an English term is translated, there may be some room for them here. --Anatoli T. (обсудить/вклад) 06:59, 24 July 2014 (UTC)

Looking to get AWB privileges

Hello. I'm relatively new to Wiktionary but I've been active on Wikipedia for a long time. I've been working on Old French verbs and there are a bunch of changes I'd like to make that are too painful to do without an automated regex tool like AWB -- basically, to change the templates used for conjugating a number of verbs. Could someone add me to the list of registered AWB users? Thanks.

Benwing (talk) 05:32, 24 July 2014 (UTC)

I have added you to Wiktionary:AutoWikiBrowser/CheckPage#Approved_users. —Stephen (Talk) 06:59, 24 July 2014 (UTC)
Awesome, thank you. Benwing (talk) 08:50, 24 July 2014 (UTC)

Are phrases lemmas?

Entries marked as {{head|xx|phrase}} are currently listed in the main lemmas category. Are they really lemmas? I think they should be in a Phrases subcategory under the main lemmas category. --Panda10 (talk) 12:33, 24 July 2014 (UTC)

They are lemmas because they are not a form of another lemma. —CodeCat 12:40, 24 July 2014 (UTC)
I agree, though it is sometimes hard to identify the lemma properly, as such multi-word entries, especially with verbs, are essentially defective, or at least have a dramatically different distribution of use across inflected forms. DCDuring TALK 13:51, 24 July 2014 (UTC)
The problem is really that we're not using the term "phrase" properly on Wiktionary. In many cases, it seems that "sentence" is the more appropriate term. See w:Sentence (linguistics). —CodeCat 14:05, 24 July 2014 (UTC)
I'm not sure about the "They are lemmas because they are not a form of another lemma" argument. Phrasebook entries such as I don't understand or this morning clutter up the lemma category and will contribute to an inaccurate count of lemmas. --Panda10 (talk) 13:05, 25 July 2014 (UTC)

Including sum-of-parts terms

One of the reasons against including sum-of-parts terms is that they are counter-productive in defining the term by picking and choosing some of the senses of the component parts, thus under-emphasizing the other senses. Listing all possible combinations would cause too much duplication of information, which is bad for a number of reasons; for example, adding or modifying a sense of the component parts would require also adding or modifying one or more senses of the whole term as well. When we include sum-of-parts terms, we often try to make them sound more idiomatic by making the definition more specific than it needs to be.

On the other hand, there are many reasons to include some sum-of-parts terms:

  • They are defined in other dictionaries and/or people are likely to look them up: random number
  • They have useful translations into other languages: last year
  • They happen to be spelled as one word or have alternative spellings as one word: coal mine, unhelpful (un- + helpful)
  • They have unusual etymologies, pronunciations, or other useful information: (can't think of any at the moment, but I know they exist)
  • They are non-obvious in the encoding direction, even if they are obvious in the decoding direction: and so on and so forth

We have provisions, some of which are controversial, for keeping some of the types of words listed above, but not for all. We also have endless RFD debates about keeping words "outside of CFI".

I think a compromise is needed and I propose allowing the inclusion of some sum-of-parts terms that we decide would be useful to include, but without real definitions, similar to what we already do for translation targets. This can apply to terms included through WT:COALMINE, as well as simple cases of prefixes and suffixes, where a full definition has very little benefit over linking to the component parts. Here are some examples I created: User:Wikitiki89/coal mine, User:Wikitiki89/unhelpful, User:Wikitiki89/and so on and so forth.

--WikiTiki89 15:35, 24 July 2014 (UTC)

The problem I see with your example for "coal mine" is that it requires prior knowledge of the term to understand how to interpret the parts of the term. There is nothing in your entry that specifies that it's the sense "excavation" that is meant, rather than "explosive device". This is exactly why we need a full definition for it and other similar entries. If a term were truly SOP, then it could validly be interpreted and used as any possible combination of its parts' meanings. But the reality is very different: such terms usually have much more restricted uses. —CodeCat 15:48, 24 July 2014 (UTC)
Another more general issue is that we seem to treat "idiomatic" and "SOP" as antonyms where they often are not. and so on and so forth is definitely idiomatic, even if it may also be interpretable as a sum of parts. Idiomatic phrases often translate into idioms in other languages, but we are sorely lacking translations for such terms thanks to our overly strict focus on deleting SOP terms. —CodeCat 15:50, 24 July 2014 (UTC)
But that's the thing about SOP, a "coal mine" could be an explosive device made of coal (also, as I said, we will only do this "where a full definition has very little benefit over linking to the component parts"). As to your second point, that is why I did not use the word "idiomatic" here. --WikiTiki89 15:54, 24 July 2014 (UTC)
I think we need to consider whether a term is a term of art in a specified field. For example, genuine issue of material fact is SoP to one who knows which senses of each term are intended, but is also a set phrase used in the law, and one that can not be substituted for other phrases. I think that if a general dictionary has a phrase, we should have it, and if a specialized dictionary (legal, medical, engineering, slang, etc.) has a term, then we should have it with the appropriate context label. Context labels go a long way towards eliminating the problem of "picking and choosing some of the senses of the component parts" because they indicate that when this phrase is used in this field it only refers to the specified senses of the words included. bd2412 T 15:59, 24 July 2014 (UTC)
That's the idea here: we will have the phrase, but we will link it to the component parts. Note that we can consider "material fact" to be one part rather than two, and possibly likewise for "genuine issue" if it is in fact a set phrase outside of this term. --WikiTiki89 16:03, 24 July 2014 (UTC)
For that example, I'm not aware of "genuine issue" being used outside the complete phrase. Our definition of genuine actually doesn't really capture the meaning used here (an actual controversy between the parties, rather than the facade of a controversy designed to test the law). It is sense 11 of issue. However, I generally think that a veneer definition requiring readers to look at two or three different entries to figure out the complete meaning of a term would be a needless inconvenience. bd2412 T 16:28, 24 July 2014 (UTC)
It's less of an inconvenience than the inconvenience of finding incomplete information presented as if it were complete. --WikiTiki89 16:32, 24 July 2014 (UTC)
That is where I think a context tag helps. If you are talking to a geologist or a civil engineer or a utility company about a coal mine, then there is only one relevant meaning, and the information presented is complete within that context. We could, for all of SoP definitions that are set phrases within a particular context, have an &lit sense, so that we can inform readers that when used other than in the sense of industry or geology, "coal mine" can mean any combination of coal and mine. bd2412 T 17:04, 24 July 2014 (UTC)
There is only one relevant definition of "mine" when talking to a geologist or civil engineer; this has nothing to do with the preceding word "coal". --WikiTiki89 17:22, 24 July 2014 (UTC)
Coal mine is a bad example for this point, since it only exists due to coalmine. If "coalmine" didn't exist, I would agree to deleting "coal mine" as readily as "copper mine" or "uranium mine". However, this principle is directly applicable to random number, which in the context of mathematics will never mean a "slapdash and seemingly directionless performance of a dance routine within a larger show". bd2412 T 13:24, 25 July 2014 (UTC)
That's one of my points. Since we are only including coal mine because of coalmine, it does not need a real definition, so we can just link to its parts. --WikiTiki89 13:29, 25 July 2014 (UTC)
Are we still going to have a complete definition at coalmine? I wouldn't object to coalmine being an "alternative spelling of" template and coal mine being bare links, but I don't think coalmine can be used to describe a military mine that runs on coal, so something would be getting lost in the sequence there. bd2412 T 15:16, 25 July 2014 (UTC)
I think there's a bit of a slippery slope here. Your test page decomposes "unhelpful" into [[un-]] + [[helpful]], but there's nothing stopping it from being decomposed into [[un-]] + [[help]] + [[-ful]]. Is [[electricity]] then SOP too, as [[electric]] + [[-ity]]? Is [[nothing]] just [[no]] + [[thing]]? Are full definitions only for monomorphemic words? —Aɴɢʀ (talk) 16:32, 24 July 2014 (UTC)
Most multimorphemic words are not simply SOP of their morphemes. Out of your examples, only [[nothing]] can actually be defined as just [[no]] + [[thing]], but then it is for us to decide whether it is beneficial to do so in each specific case. --WikiTiki89 16:37, 24 July 2014 (UTC)
I disagree that "nothing" is the only one that is SOP of its morphemes, but either way, I think it would create far too much work for us to decide on a case-by-case basis which polymorphemic words are SOP of their morphemes and which aren't. It's hard enough for us to decide that for multi-word expressions as it is. —Aɴɢʀ (talk) 17:20, 24 July 2014 (UTC)
There isn't much deciding to do. If the definition at the term is clearly equal to the component definitions, then you can replace it with a reference to each component. If someone later decides that that definition is inadequate, he could replace that with an adequate definition. No huge RFD discussions are even required. --WikiTiki89 17:26, 24 July 2014 (UTC)
@Wikitiki89: It can be used with that meaning, yes, but Wiktionary concerns itself only with attestable meanings. So the question we should be asking is: is it used with that meaning? Does coal mine ever mean "explosive device made of coal"? I would be very surprised if it did, precisely because its main sense "excavation for mining coal" is so much more common and using it in any other sense would cause confusion. So in reality, "coal mine" is much more restricted in meaning than its parts allow, which makes it idiomatic and hence includable per CFI. —CodeCat 17:17, 24 July 2014 (UTC)
It's probably possible to attest that meaning. --WikiTiki89 17:22, 24 July 2014 (UTC)
"Probably" isn't good enough for an RFV, though. If our current entry was like your proposal, I could validly RFV all senses that arise from the possible combinations of meanings of its parts. And many of them would likely fail, which would then mean we would have to put in a more specific, limiting definition. —CodeCat 10:17, 25 July 2014 (UTC)
If you want me to find citations, I will. Anyway, something like "see coal, mine" (however we choose to format it) does not imply that all combinations exist, so it is not necessary to narrow it down. --WikiTiki89 10:57, 25 July 2014 (UTC)

My feeling about a set phrase is a bit like the US Supreme Court judge's feeling about hard-core pornography: I can't define it, but I know it when I see it. Some are little more than common collocations, but when they are common enough, especially within a particular field or arena, then to me they start to ‘feel’ like single concepts and not two concepts stuck together. This is unscientific but I'm just trying to explain my process. The CFI tests are good ways to check if something is a set phrase, but sometimes a term can fail all of them and still demand coverage (at least to my mind). DCDuring's ‘lemming test’ is, I think, valuable because it gives us a rationale without having to explain exactly why something should be kept. The weird thing is that when I first joined Wiktionary, I was a firm deletionist. I thought that entries like fried egg and Egyptian pyramid were a waste of time. But over the years I have slowly done almost a complete 180. My feelings in general now are that if there is a significant minority of people who see value in an entry, then we lose nothing by keeping it. Ƿidsiþ 16:58, 24 July 2014 (UTC)

The whole point of my proposal here is to allow us to keep these set phrases, without duplicating their wide range of definitions from the component parts. --WikiTiki89 17:02, 24 July 2014 (UTC)
I don't object to it on principle in some cases, though not necessarily routinely. There is also the issue that if a multi-word term has more than one meaning, we would presumably want to split the two senses so as to show quotation evidence for each one, and then you would have to write some kind of meaningful definition. Ƿidsiþ 17:14, 24 July 2014 (UTC)

Actually, the important word in your proposal is term. If they are terms of the language, if they belong to its vocabulary, they should be includable, SOP or not. Lmaltier (talk) 17:42, 9 August 2014 (UTC)


Misspellings are recognised as lemmas by {{head}}, but that doesn't quite seem right. They have their own parts of speech of course, so they should probably use the normal POS categories and templates like {{en-noun}}. But I imagine some might object to this because they are supposedly not "proper". Recently I created rediculous, which is the spelling I normally use, and which is quite easily CFI-attestable. But I opted to call it an alternative spelling, because it didn't seem right to label a spelling I use normally a "mistake". So I have been wondering whether labelling things as "misspellings" does not go against the descriptivist philosophy of Wiktionary. What we really mean is that these spellings are commonly proscribed, but they are probably not considered misspellings by the people that use them. So what do other editors think of this situation? Should we categorise them simply as misspellings, or should we give the proper POS? And should we continue to label them as "misspellings" or change the wording to something more descriptive?

As a side note, the template {{misspelling of}} originally said common misspelling, but I removed this because it looked silly for entries like animalike. —CodeCat 10:25, 25 July 2014 (UTC)

Rediculous is certainly a misspelling. I think the criteria for that should have something to do with whether most people who use it would admit that it is a misspelling if shown the correct spelling. --WikiTiki89 11:02, 25 July 2014 (UTC)
Well that doesn't include me, because I think the spelling "rediculous" makes more sense. It better reflects how it's pronounced, and that's probably what all the other people think too. —CodeCat 11:04, 25 July 2014 (UTC)
I realize that it does not include you, but I do think that it includes most people. I also think that the main influence of this spelling is not the pronunciation, but the abundance of word initial re- compared to the relative rarity of ri-. --WikiTiki89 11:12, 25 July 2014 (UTC)
That's bizarre: "littel" (little) would make more sense for pronunciation, but everybody knows that's not how English spelling works. Which other words do you respell for this reason? Equinox 12:07, 25 July 2014 (UTC)
The difference is that it was not a conscious effort to change the spelling based on some reasoning. I just wasn't acutely aware of how other people spelled it, and I spelled it the way I figured it would make the most sense. It's only after I found out how people write it that I figured, my way is fine too. —CodeCat 12:11, 25 July 2014 (UTC)
As to the question of whether something we agree is a misspelling should count as a lemma, I would think the answer is simply NO.
Perhaps we also need to review items in our English alternative spellings categories to root out miscategorized entries. We do not serve users well by misleadingly characterizing common misspellings as alternates. After all, we are supposed to have only common misspellings. AFAICT rediculous is not even a "common" misspelling. It occurs 3 times in BNC/COCA combined vs nearly 8,700 occurrences of ridiculous. Results are similar in Google Books and Google N-gram. DCDuring TALK 11:17, 25 July 2014 (UTC)
Why only common misspellings? Why not just any that are attestable per CFI? And why should misspellings not be lemmas? They have plurals and other inflections like any other lemma might have. —CodeCat 11:34, 25 July 2014 (UTC)
It has been our practice to do so because the number of attestable misspellings of common words probably exceeds by far the number uf axepted [spelins. DCDuring TALK 12:02, 25 July 2014 (UTC)
I do hope that reasoning distinguishes between accidental mistakes, deliberate respellings, and deliberate and consistent spelling variants that are intended as normal use. We should definitely have the latter no matter how common, per descriptivism. For the former two, I think a criterion for commonness is OK. —CodeCat 12:07, 25 July 2014 (UTC)
No it does not and should not. We are documenting the set of conventions called language. DCDuring TALK 12:12, 25 July 2014 (UTC)
That would make sense if everyone followed the same conventions, but clearly they don't. If labelling something a misspelling is a matter of one group disagreeing with another group about the spelling, then why can we not label things like color as misspellings? My point is just that: Wiktionary cannot and should not decide what is a misspelling, and clearly misspelling-ness is not strictly defined, as there are varying opinions about it. So what I ask for is clear, verifiable criteria that can be used to decide when the label "misspelling" should be used. If Wiktionary is descriptive (which it is), then a label like "misspelling" should describe some objective verifiable reality, not subjective opinion. —CodeCat 12:17, 25 July 2014 (UTC)
I completely agree with CodeCat here. There is a difference between a misspelling most likely caused by the writer's clumsy typing alone (e.g. typign), a misspelling caused by the writer most likely not knowing how to spell the word (e.g. independance), and a misspelling most likely caused by the writer's intentional choice to use a variant in order to achieve a literary effect like showing snarkiness or dialect (e.g. rediculous, "gawn to the sto'"). The only typo we should include is teh, because its commonness has turned it into a word intentionally used in jest. The second kind we should include if they are common enough that a reader would want them defined, so we can inform the reader in our definition that this is not the correct spelling. The third kind we should include if they are attested, because their specialized use makes them subtly different words in terms of the definition itself. bd2412 T 12:52, 25 July 2014 (UTC)
Yes but CodeCat isn't saying that exactly, he's saying he (or she, not sure) continues to use ‘rediculous’ because he thinks it's ‘more logical’ and therefore it shouldn't be called a misspelling. Ƿidsiþ 13:00, 25 July 2014 (UTC)
I mean to agree with CodeCat's comment immediately preceding my response. But his earlier point is also valid. Isn't that why we have thru and tho? bd2412 T 13:19, 25 July 2014 (UTC)
My objection concerning "rediculous" specifically is that it didn't seem like a misspelling to me, just an uncommon alternative spelling. The "misspelling" part lies only in the proscription against it. This is why I consider "misspelling of" to be equivalent to "(proscribed) alternative/rare spelling of". Whether something is a misspelling is subjective, but widespread proscription against a certain spelling is objective and can be verified at least in theory. Proscription can wane as forms become more accepted, and people will no longer consider them wrong. So I think we should replace "misspelling of" with something else that makes that more clear. Something like "proscribed spelling of" - this fits with how "(dated)" + "alternative spelling of" gives "dated spelling of" and similarly for other usage labels. —CodeCat 13:41, 25 July 2014 (UTC)
I would consider something an "alternative spelling" if a significant number of people believe that it is the correct spelling, even if others proscribe it. --WikiTiki89 13:43, 25 July 2014 (UTC)
Then what about {{rare spelling of}}? —CodeCat 13:48, 25 July 2014 (UTC)
I would consider that an equivalent of {{cx|rare}} {{alternative spelling of}}. If a spelling is considered by almost everyone to be a misspelling, then we should label it as such. --WikiTiki89 13:53, 25 July 2014 (UTC)
Do you think there is a difference between {{context|proscribed}} {{alternative spelling of}} and {{misspelling of}}? —CodeCat 14:18, 25 July 2014 (UTC)
Yes, something can be proscribed by some people and accepted as correct by others. --WikiTiki89 14:24, 25 July 2014 (UTC)
Does that mean that to you, a misspelling is accepted by nobody? —CodeCat 15:01, 25 July 2014 (UTC)
By no significant group at least. Note that I'm saying what the intrinsic criteria are, even if it may be impossible for us to determine whether this is the case or not. --WikiTiki89 15:17, 25 July 2014 (UTC)
I just noticed an entry that uses "misspelling" as the second parameter of {{head}}, i.e. uses "misspelling" as if it were a part of speech: [[aqui]]. I do not recall noticing this before. My initial reaction is that such entries should declare their actual part-of-speech, which in [[aqui]]'s case is "adverb". But I can also see how that would "pollute" the part-of-speech (and "lemma") categories with non-words (to whatever extent we use "misspelling" to describe things that are actually misspellings/mistakes, as opposed to intentional alternative spellings), and so I can see an argument for continuing to not put them into the POS categories.
Regarding the wording of the template: I think the idea behind including the word "common" was that it would emphasize and enforce our exclusion of rare misspellings. In practice, however, rare misspellings were included using the template anyway, so removing the word was probably good.
Similar to BD, I distinguish three categories of nonstandard spellings: (1) typos or typo-like misspellings, which are distinguished by (among other things) not being used consistently throughout a work, and which are not includable, (2) misspellings, or mistaken spellings, and (3) intentional deviations from standard spelling, which we handle through templates like {{alternative spelling of}} and {{eye dialect of}}. (Re "teh": in my opinion, teh is includable because it has come to be used intentionally, and so it does not constitute an exception to the exclusion of typos.) - -sche (discuss) 02:51, 26 July 2014 (UTC)

Use of babel templates from other wikis

I was going to create Category:User eml, but saw that it's based on a language code we don't recognize (it was split into egl and rgn). That led me to wonder why we had Category:User eml-3 and Category:User eml-N. It turns out that there are a couple of user pages that have {{#babel:it| which means they're using the Italian Wiki's babel system, which apparently recognizes some language codes we don't, and that this prompts User:Babel AutoCreate to re-create categories that we had deleted.

Is this ok, and, if not, what should we do about it? Chuck Entz (talk) 19:19, 25 July 2014 (UTC)

The script was actually blocked twice for creating categories like this, once by someone who thought it was a bot and once by someone who seemed to think it was a live user. As I noted when I unblocked it, the solution that's most obvious to me is to salt the categories we don't want by protecting them such that only admins can re-create them. Alternatively, we could allow people to specify fluency even in things we don't consider languages, and specially categorise the categories, e.g. we could allow Category:User eml and put it in Category:User egl and Category:User rgn. - -sche (discuss) 20:15, 25 July 2014 (UTC)
We also ended up with Category:User simple, Category:Romany language, Category:Traditional Chinese language, Category:British English language and Category:Simplified Chinese language thanks to this script. The categories don't exist, but they do have entries in them. —CodeCat 20:46, 25 July 2014 (UTC)
All of those except the first one were due to mistaken hard-coded categories, which it was simple to fix (e.g. 'Romani' was misspelt, I corrected it). We could continue to delete and "salt" those categories even if we decided to allow categories for retired language codes. - -sche (discuss) 02:58, 26 July 2014 (UTC)

Proposed compromise votes on romanizations

Since the various recent votes on romanizations have failed to achieve a consensus, I have drafted two compromise votes incorporating some ideas that had some traction in the various discussions. These are Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit and Wiktionary:Votes/pl-2014-07/Redirecting attested romanizations. Cheers! bd2412 T 02:50, 27 July 2014 (UTC)

Middle Vietnamese

Thanks to Mxn, we have two 'Middle Vietnamese' entries, and . (trên and trời also mention Middle Vietnamese in their etymologies.) However, because Middle Vietnamese doesn't have its own language code, they just use the code and templates of Vietnamese. Is Middle Vietnamese different enough from modern Vietnamese that it should have its own code and templates, or should the two entries be switched to use 'Vietnamese' headers? - -sche (discuss) 18:04, 29 July 2014 (UTC)

See also Wiktionary:Grease pit/2014/April#Middle Vietnamese. – Minh Nguyễn (talk, contribs) 10:06, 13 August 2014 (UTC)
Aha, thanks for the link. I've commented there. - -sche (discuss) 21:12, 14 August 2014 (UTC)

Inclusion of Dothraki

Thanks to one user, Wiktionary currently includes in the tables and subpages of Appendix:Dothraki more than 100 words from Dothraki, an artificial language created a few years ago for the television series Game of Thrones. According to Wikipedia's sources, Dothraki only contains a few thousand words, so Wiktionary is including a substantial part of it. Wiki.dothraki.org notes that "All extant words in the Dothraki language are copyright of HBO, as is the text and audio of the language documents provided to HBO by the LCS." Dothraki.com notes that "Living Language, in conjunction with HBO Global Marketing, is publishing a guide on Dothraki this October! You can preorder the book [...]". Is Wiktionary violating HBO's copyright and competing with or harming Living Language by publishing for free information which they intend to sell? Even if the answer to the previous question is 'no', should Wiktionary be including Dothraki? - -sche (discuss) 18:08, 29 July 2014 (UTC) (added last sentence making the more basic question explicit - -sche (discuss) 18:26, 29 July 2014 (UTC))

I don't know who wrote that wiki but "All extant words in the Dothraki language are copyright of HBO" seems like nonsense. You can't copyright a word, can you? Equinox 18:12, 29 July 2014 (UTC)
The Loglan-Lojban dispute seems to suggest that you can copyright (a collection of) words, provided you're the one who created them in the first place. - -sche (discuss) 18:16, 29 July 2014 (UTC)
(e/c) You can copyright a language. I oppose the inclusion of Dothraki, since as far as I know it has no community of speakers. --WikiTiki89 18:17, 29 July 2014 (UTC)
Some Dothraki words existed before the series was created. They are present in George Martin’s books. — Ungoliant (falai) 18:15, 29 July 2014 (UTC)
...which are copyright GRRM. - -sche (discuss) 18:16, 29 July 2014 (UTC)
Putting on the intellectual property attorney hat. Yes, the language as a whole is subject to copyright because it is a substantial element of a creative work. We can as a matter of fair use present "Dothraki" words that seep into general usage (comparable to the Klingon Qapla'). We most definitely can not make an appendix listing a large number of such words that do not meet this criterion. I recommend speedy deletion. bd2412 T 19:10, 29 July 2014 (UTC)
Speaking of Klingon, Wiktionary has an Appendix:Klingon which seems to suffer from the same issues as Appendix:Dothraki. Should it be deleted too? - -sche (discuss) 19:49, 29 July 2014 (UTC)
Let me put it this way - Anderson v. Stallone and Castle Rock Entertainment, Inc. v. Carol Publishing Group stand for the principle that if you take a large number of original elements of a work covered by copyright, even if those elements are rearranged in some way, then you are liable for copyright infringement. Copyright presently runs for the life of the author plus seventy years (running from the end of the calendar year in which they died). Any invented language that is part of a copyrighted work whose author died after 1943 is likely to be legally barred to us. There are exceptions, for example where an author releases their work into the public domain, or failed to abide by technicalities that were in force up until the 1970s, but there's no reason to think any of those are applicable here. Therefore, Appendix:Klingon is nearly as problematic (although it does have the benefits from a copyright defendant's viewpoint of being much older, so that it has had more time to seep into the culture, and has both a larger number of "authors" diluting the claims of ownership, and a longer history of copyright owners failing to prosecute infringing conduct). I would get rid of both of these, or severely cut the Klingon appendix down to words for which we have entries, and the chart of personal pronouns, which is de minimis. Also, before anyone asks, yes Appendix:Na'vi should go too. bd2412 T 20:15, 29 July 2014 (UTC)
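To make the term arithmetic above concrete, here is a simplified Python sketch. It assumes only the life-plus-seventy rule as described (a single known author, with protection running to the end of the calendar year in which the seventy years expire) and ignores every exception; the function names are illustrative.

```python
def first_public_domain_year(death_year: int, term_years: int = 70) -> int:
    """First calendar year in which a single author's work is out of copyright.

    Assumes protection lasts for the author's life plus `term_years` and runs
    to 31 December of the year in which that period ends (no exceptions).
    """
    return death_year + term_years + 1


def still_protected(death_year: int, year: int, term_years: int = 70) -> bool:
    """True if, under the same assumption, the work is still protected in `year`."""
    return year < first_public_domain_year(death_year, term_years)


# As of 2014, the year of this discussion:
print(first_public_domain_year(1943))  # 2014
print(still_protected(1943, 2014))     # False: author died in 1943
print(still_protected(1944, 2014))     # True: author died after 1943
```

This is why 1943 is the cut-off year in 2014: only works by authors who died in 1943 or earlier have passed the end of their term.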
I have deleted Appendix:Dothraki, Appendix:Na'vi, Appendix:Goa'uld, Appendix:Unas and Appendix:Noxilo. I have truncated Appendix:Klingon, and modified the short lists of Appendix:Eloi, Appendix:Lapine, Appendix:Mandalorian, Appendix:Láadan, Appendix:Toki Pona and Appendix:Bolak, per the suggestion that short lists are de minimis and OK. Of the other languages listed in Template:artistic languages and Category:Appendix-only constructed languages, I tentatively make the following assumptions: Appendix:Communicationssprache was intended for widespread distribution and its creator died in 1843, so it seems to be includable; similarly, Appendix:Mundolinco seems includable; Appendix:Quenya and Appendix:Sindarin and Appendix:Black Speech are probably under copyright and should probably be deleted like Dothraki or truncated like Klingon; Category:Lingua Franca Nova language and Category:Neo language I am not sure what to do with. - -sche (discuss) 21:33, 29 July 2014 (UTC)
There are some differences between constructed languages made as adjuncts to works of fiction and those made as proposed languages for actual speaking. First, the latter type is not part of the sort of creative endeavor that would be diminished in value by republication of the definitions; and second, the latter type is intended for general use by people other than the author, and inclusion in a dictionary is a predictable form of that intended use. Of course, when it comes to Lingua Franca Nova, we could just ask C. George Boeree for permission, but I do note that the Lingua Franca Nova website has copyright notices on every page. bd2412 T 02:23, 30 July 2014 (UTC)
Which of those two does Klingon fit under? It was originally an adjunct to a work of fiction, but has grown enough that it has a comparatively large number of speakers. --WikiTiki89 12:17, 30 July 2014 (UTC)
The Klingon language was initially created as part of a larger creative work, and for the purely commercial purpose of making that work more marketable for potential purchasers. It remains under copyright for that purpose for as long as the original work remains under copyright for that purpose. I had thought that the language was created, at least in some part, for the original Star Trek series in the 60s, but apparently it was not constructed in detail until the making of Star Trek III in 1984. In any case, the key fact is that it was not created for the purpose of serving as a language to be used in real life, but as part of the story, as a harsh and guttural language that filled out the Klingon characters and gave them an extra sense of menace. As long as the original works remain under copyright, the language remains under copyright, and is only susceptible to fair use. Note that the original work for purposes of most of the Klingon language is The Klingon Dictionary by Marc Okrand (under a copyright owned by CBS Television Studios), which is written from an in-universe perspective, as if it related facts about an actual Klingon civilization, and is therefore clearly part of the creative work. bd2412 T 13:27, 30 July 2014 (UTC)
But then the question is what do they do with their copyright? Clearly they allow things such as w:Klingon Language Institute. --WikiTiki89 15:47, 30 July 2014 (UTC)
I have truncated Appendix:Quenya to 20 words (or 30 counting inflected forms), and Appendix:Sindarin and Appendix:Black Speech to 26 words each. Neo already only included its 10 number words. At this point, the largest appendices (not counting those for languages like Communicationssprache which seem to be unproblematic) are for Appendix:Lapine, with 36 words, and LFN, with 140+. I am trimming LFN now. - -sche (discuss) 18:06, 30 July 2014 (UTC)

Multiple accounts


It has come to my attention that BaicanXXX - who has been confirmed (by the administrators of Romanian Wikipedia) to have a sock puppet called WernescU - has created yet another account called BanescuBaican. He has currently three active accounts here.

I've tried to discuss this issue with a Wiktionary administrator, but I haven't received any feedback.

Aren't multiple accounts prohibited? Unsure if this is the right place to discuss this; I just wanted to give the heads-up.

Best regards, --Robbie SWE (talk) 11:19, 30 July 2014 (UTC)

There is no policy against them (and no policy for them either, WT:BOT notwithstanding), but this question, as many others, is ultimately left to administrators' discretion. If the accounts have clearly distinct purposes, are not constantly switched, the number of accounts is reasonably small (say, less than five) and the user is in relatively good standing with the community, I see no problem with that. Is the user doing anything wrong here? Keφr 11:52, 30 July 2014 (UTC)
Well that's the problem; the account owner switches quite freely between accounts, usually every time I correct his translations and/or question his contributions. Baican has been blocked two times in the Romanian Wiktionary because of insubordination and using/adding words that don't exist. He was also blocked in the Romanian Wikipedia because he harassed other users with whom he didn't agree. I've had issues with this user before and that's the reason why I monitor his contributions around here. --Robbie SWE (talk) 12:09, 30 July 2014 (UTC)
I see. Any juicy diffs? I want to see the whole context. What also worries me is that the user is apparently either unable, or simply refuses to write in English. Not that I like the language, but I know no Romanian at all, so I would rather prefer English. (Also, a link to the previous discussion). Keφr 15:48, 30 July 2014 (UTC)
Off the top of my head, his translation for Commonwealth of Nations. I've corrected the Romanian entry, because it was simply not accurate. As I pointed out earlier, he usually provides verbatim translations (think Google Translate) and although I understand why they can at times be useful, they can just as well be confusing. I'm not saying that all his contributions are bad, but I've had a hard time reasoning with him when he hasn't followed the rules. Trying to discuss anything - be it in English, Romanian or any other language - usually turns into a feisty discussion. --Robbie SWE (talk) 17:18, 30 July 2014 (UTC)
Did I mention something about "context"? One bad edit does not a context make. And in Commonwealth of Nations you made a slight mistake of your own. Where can I read about those blocks? Or could, if I knew Romanian? Keφr 18:08, 30 July 2014 (UTC)
I see that I made a mistake of my own and I apologise for that. I was trying to keep up with his edits and became sloppy - not trying to make excuses for myself, just give an explanation :-) Unfortunately the blocks and the discussions leading to the blocks are indeed in Romanian. --Robbie SWE (talk) 19:30, 30 July 2014 (UTC)
  • Wiktionary at present has no sockpuppetry policy. But if it ever gets one, multiple accounts should not be prohibited. There are only three reasons multiple accounts (or one account and an IP) are problematic:
  1. Block evasion (using one account while another is blocked)
  2. Double voting (voting with both accounts)
  3. Handedness (one account)

An account that doesn't do any of those things is perfectly acceptable. Purplebackpack89 17:53, 30 July 2014 (UTC)

One of our block reasons is "abusing multiple accounts". However, it seems that you are correct that no policy page actually describes our policy on it. --WikiTiki89 18:00, 30 July 2014 (UTC)
This editor is somewhat like Gtroy; they create a prodigious quantity of edits which contain enough errors that they need to be checked, and it wears out the admins and other users who take on the task of checking them (for which reason Dick Laurent at one point just blocked the editor for 6 months). And now, like Gtroy, the user is creating multiple accounts. Given its name, the new account does not exactly hide its relationship to the others, but it isn't linked from them, either. If the user creates too many accounts, it has the effect of making scrutiny of their edits difficult (without us even speculating on whether or not that is the user's intent), which is disruptive. If the number of accounts gets any higher, let me know; I am willing to block the 'extras'. (For reference, the accounts mentioned so far are BAICAN_XXX, last edit 26 July, WernescU, last edit 12 July, BanescuBAICAN, last edit 30 July.) - -sche (discuss) 18:29, 30 July 2014 (UTC)
Thank you for taking interest. And I agree. Also, note that WernescU has the autopatrolled flag. I think it should be revoked, given the situation. Keφr 18:53, 30 July 2014 (UTC)
Okay, I've de-autopatrolled him. (I wonder if de-autopatrol is attestable...) —Aɴɢʀ (talk) 19:29, 30 July 2014 (UTC)

Contributing the contents of bi-lingual dictionaries?


Hope I'm asking in the proper place.

I work as a consultant for the Dzongkha Development Commission (DDC), the national language authority of Bhutan. After talking to the head of the DDC and others concerned, and explaining what Wiktionary and the CCbySA license are, it seems we can contribute the data we have for English-Dzongkha, Dzongkha-English, Dzongkha-Dzongkha and Tibetan-Dzongkha Dictionaries.

What is the best way to proceed with this? We have the data for these dictionaries in a MySQL database as well as in XDXF (XML) and StarDict formats.


CFynn (talk) 08:57, 31 July 2014 (UTC)

If you are able to convert the data automatically into the format that Wiktionary uses for its entries, then you could probably create the entries with a program. It would probably still be necessary to tag those entries with some kind of template notice so that it's clear the entry was not created by hand by a user, and may therefore need to be checked. —CodeCat 11:07, 31 July 2014 (UTC)
Please contact me on my talk page or via email if you are not sure about something. Wyang (talk) 12:26, 31 July 2014 (UTC)
A basic Dzongkha noun entry would look like this. For example, if I wanted to create an entry for ལྡུམ་ར (ldum ra) from a dictionary, I would make it like this:


# [[garden]]
--Anatoli T. (обсудить/вклад) 13:27, 31 July 2014 (UTC)
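The automated conversion CodeCat describes could look roughly like the following Python sketch. It assumes each dictionary record has already been exported (for example from the MySQL database) as a headword, a romanization, a part of speech and a list of English glosses. The `DictEntry` record type, the exact `{{head}}` parameters and the `{{imported from DDC dictionary}}` notice template are hypothetical placeholders, not established Wiktionary conventions.

```python
from dataclasses import dataclass, field


@dataclass
class DictEntry:
    """One exported dictionary record (hypothetical field layout)."""
    headword: str
    romanization: str
    pos: str                                  # e.g. "noun"
    glosses: list[str] = field(default_factory=list)


# Hypothetical notice template flagging machine-imported entries for review.
IMPORT_NOTICE = "{{imported from DDC dictionary}}"


def to_wikitext(entry: DictEntry) -> str:
    """Render one record as a basic Dzongkha entry in wikitext."""
    lines = [
        "==Dzongkha==",
        "",
        "===" + entry.pos.capitalize() + "===",
        "{{head|dz|" + entry.pos + "|tr=" + entry.romanization + "}}",
        "",
    ]
    # One numbered definition line per English gloss, linked to its entry.
    lines += ["# [[" + gloss + "]]" for gloss in entry.glosses]
    lines += ["", IMPORT_NOTICE]
    return "\n".join(lines)


print(to_wikitext(DictEntry("ལྡུམ་ར", "ldum ra", "noun", ["garden"])))
```

Keeping the rendering in one small function makes it easy to adjust the layout once the community agrees on the exact templates, and a real import would still need to check for existing pages before writing anything.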

Names categories

Right now we have two different category trees for names, Category:English names and Category:en:Names. I would like to merge the latter into the former, as I suggested at WT:RFM before. But someone suggested bringing it up here because there might be more comments. The point that was brought up in a past discussion was that there is something intuitively different about names that are "English" and names that simply occur in English usage. The problem is partly the category name: "English names" suggests that the names themselves are "English" but we really intend it to mean that it's simply a name used in English at some point, like "English nouns" contains nouns used in English at some point. People interpret "English name" differently from "English noun". But someone rightly pointed out that café is not "English" by many people's standards, yet we have no problem calling it an English noun.

Of course there's also the problem that bearers of names can move around the world and take their name with them. And people in other countries have names which then need to be adapted to English. This is very different from how loanwords are adopted into languages. Loanwords are adopted and used by speakers of another language, simply by choice or for convenience. But names are generally granted by speakers of a certain language and adapted to other languages when people need to refer to someone with that name. On the other hand, names can be "loaned" in the sense that they are adopted by speakers of a language, like common names such as John, Lisa and Hans. So there's a distinction between adapting a foreign-given name into English for the purpose of referring to a person with that name, and English speakers adopting that name as an actual loanword by giving that name to their children.

Then there's the issue of Category:en:Place names. Place names probably don't suffer as much from the above problems as personal names do, because they are simply the names used in English for a certain geographical object. They may be borrowed, or they may be adapted to the language, pretty much like normal words. There's much less dispute that Kilimanjaro is the English name for that specific mountain. So even if we don't find a solution for Category:en:Names, would it at least be ok to migrate Category:en:Place names and its subcategories to become subcategories of Category:English names, renaming them to something like Category:English place names, Category:English names of countries etc.? —CodeCat 11:51, 31 July 2014 (UTC)

Aren't ALL proper nouns used in personal names essentially Translingual with the characteristic "English" (used simply as a concrete example of a language) being one of its Etymology? That there are instances of names sharing a common root (Jean, Sean, Juan etc) or common derivation from something in the real world (Eagle and Adler as surnames) may create an illusion that names are more language-specific than they are in our times. What lasting limits on full translinguality there are are probably script (narrowly construed for language-specific diacritics) differences.
Similarly for toponyms. The vast majority, though not the most common ones, seem essentially Translingual, though some are rarely used outside of the local language. DCDuring TALK 13:26, 31 July 2014 (UTC)
I think this is one of the reasons that many dictionaries don't include proper nouns at all. --WikiTiki89 13:54, 31 July 2014 (UTC)
I think DCDuring makes a very good point. But there are reasons why we should continue to treat names in a language-specific way. A major one is inflection; names inflect differently in different languages and putting all of that in a single translingual section would not be feasible. Of course there's also pronunciation, which can vary quite widely even with the same spelling. Just compare English Jean /dʒiːn/ and French Jean /ʒɑ̃/, not to mention the difference in gender between the two. Inflection and pronunciation are a verifiability concern because it's quite likely that some names have never appeared in a certain language at all and therefore have never been pronounced or inflected in it before, so then any information we add in the name's entry would be pure speculation.
So maybe what we need to do is examine the categories more closely. Category:English given names may not really describe the reality, and maybe what we really want is something along the lines of "Given names adapted into/used in English". That category would of course be entirely separate from "Given names originating in English", which would be etymological.
I don't see that it's the same with toponyms. The big difference is that toponyms are not repeatedly assigned to things, they're names for single definite objects, much like nouns are (albeit that regular nouns name classes, not individuals). Of course because most toponyms are of foreign origin, many of them are borrowed more or less verbatim, but there's certainly no guarantee of that, which is why there is such a thing as an exonym. —CodeCat 14:04, 31 July 2014 (UTC)
Ahem... not repeatedly assigned? --WikiTiki89 14:14, 31 July 2014 (UTC)
re: "Given names originating in English". The word "originating" should be replaced by "derived from". Presumably membership would be dependent on the content of the Etymology section for the word. Derivation reflects the world as it existed with a relatively limited amount of peaceful migration of literate people into places occupied by other literate people, Rome being a great exception.
re: toponyms. Right. Full toponyms (eg, Springfield, Illinois) are proper names, names of specific entities. But why are they not Translingual? Pronunciation?
re: pronunciation differences justifying multiple L2 sections for exact homonyms. I suppose that means we should, in principle have different L2s for taxonomic names for the same reason. You should look at the discussion of that question. DCDuring TALK 14:28, 31 July 2014 (UTC)
Let's take a look at some of the ways by which a name can make its way into English-language texts or utterances. How should each case be categorized?
  1. A word or set of words of native Germanic origin came to be used in Old English as name, and is still used (perhaps in changed form) in the modern English period. Example: Harold. This is the most clearly, purely English kind of name.
  2. A word of native Germanic origin came to be used in the modern English period as a name. Examples: Winter, from winter (think of Harlow Winter Kate Madden); Sky, from sky (think of Sky Ferreira).
  3. A name came to be used in a foreign language and was borrowed into Middle or Old (not modern) English and then passed into modern English. Example: Vernon.
  4. A name came to be used in a foreign language and was borrowed into modern English. Example: Pierre. (If Pierre happens to have been borrowed before 1500, there are many other examples.)
  5. A word of foreign origin was borrowed as a word into Middle or Old (not modern) English, and came to be used as a name in modern English. Example: Joy (and probably also River).
  6. A word of foreign origin was borrowed as a word into modern English, and then came to be used as a name. Example: Karma (currently a red/orange link, but attested).
  7. Non-English roots were combined in a language other than English (say, Mongolian) to form a name which is only ever found in English texts when Mongolian (non-English-speaking) individuals who bear the name are being mentioned. Examples: Tsakhiagiin (name of the current president of Mongolia, and potentially other Mongolians), Toivo (name of a former Prime Minister of Finland, and of various other Finns). This is the most clearly non-English kind of name. (Does it make a difference that Tsakhiagiin is a transliteration of Цахиагийн, while Toivo is not a transliteration?)
  8. The same situation as above, but the name was anglicized. Possible example: Khosrau or Chosroes (referring to inter alia the Persian ruler whose name, if simply transliterated, would have been something more like Husrō(y)). (If that is a poor example, I am sure others exist.)
  9. As above, but the name has come to be used for new people (i.e. the name is given by English-speaking parents to their babies). Example: Virgil (from Latin Vergilius).
  10. Assimilation of all or part of a non-English-speaking culture into an English-speaking culture results in some of that culture's names being used by English-speakers (regardless of whether their forebears were also English-speakers, or were part of the non-English-speaking culture). Examples include many Irish names and some Lakota/Dakota names. (In the Tea Room, I speculated that there may be more [European- and African-American] English monoglots named "Winona"/"Wenona" than there are Sioux-speaking [Lakota and Dakota tribes-]people with that name.)
- -sche (discuss) 22:04, 31 July 2014 (UTC)
I would say that all except 7 and 8 are English names. If we apply the use-mention distinction here, then a word is used as a name if it's used to name a new person, whereas referring to an already-named person is a mention. This is distinct from using the names as simple words in a text. Evaluating names in this way may be a useful criterion? Of course there is still the ambiguity of expatriates naming their child from their own culture/language, or bilingual parents. For example if a Moroccan couple move to the Netherlands and name their child who was born there Abdul, is it then a Dutch name? What if one of the parents is native Dutch? —CodeCat 22:15, 31 July 2014 (UTC)
See Wiktionary:About given names and surnames#The language statement of a name. The etymology of a name says very little about its language statement, since most Europeans and Americans bear given names that are borrowed from other languages: Hebrew, Latin and Ancient Greek mostly, but also from modern languages. Karen and Michelle are definitely English names by now. There are good statistics of given name use available in many modern languages. Foreigners' names are translingual as long as they do not change their spelling. There is no need to define a purely Finnish given name in English. Transliterations like Abdul do need a definition; and if you check statistics, there might be enough Dutch-speaking Muslims to make it a genuine Dutch name. We do have a difference in categorization already: transliterations are in the topic Category:Transliteration of personal names. Notice that places are defined as topics: "A city in England", "Any of a great number of cities in the US", not as "A place name". Unlike given names and surnames, foreign places do need an English section too.--Makaokalani (talk) 11:36, 1 August 2014 (UTC)

August 2014

Normalised spellings and CFI

For some languages, we commonly respell the words into a common form. This is done for Old Norse, Old High German, Middle Dutch, and other old languages. As it is now, CFI does not actually allow for this practice, but I think it should be allowed. So we should probably codify this practice as an exemption. Something along the lines of "for languages for which a normalised spelling is adopted, the normalised spelling itself does not need to be attested, as long as there are unnormalised spellings of the same word that do meet CFI". —CodeCat 23:09, 3 August 2014 (UTC)

Support. That's definitely the case with Old Church Slavonic or Old Russian. Most quotations of these in modern Russian use modern Cyrillic letters, instead of old letters, which makes the terms in old spellings difficult to attest. --Anatoli T. (обсудить/вклад) 00:19, 4 August 2014 (UTC)
Support, and for each language that normalizes spellings, we would have to detail the normalization rules on its "WT:About X" page. We can consider a non-normalized spelling to attest its normalized spelling obtained by following the normalization rules that we have listed for the language. --WikiTiki89 13:38, 4 August 2014 (UTC)
I forgot to mention that quotations should always be added in the original non-normalized spellings whenever possible, and known non-normalized spellings should be listed in alternative forms sections. --WikiTiki89 13:53, 4 August 2014 (UTC)
  • Words should be added as they are spelled in attestation. This "normalized spelling" idiocy is another attempt to impose artificial uniformity where there is none, namely in attestations of all languages before the 19th century where there were usually no enforced rules of spelling. It will make Wiktionary completely useless as a resource because we would never know whether the added word was attested as such, or is a guessed transcription according to a scheme devised by some wiki nickname. Any kind of "normalized" spellings should be used strictly as redirects to real spellings. -Ivan Štambuk (talk) 13:46, 4 August 2014 (UTC)
    It does not impose uniformity, it merely makes it easier to find the entry where the definition is located. Also, all dictionaries do this, which nullifies your usual argument of breaking accepted conventions. --WikiTiki89 13:56, 4 August 2014 (UTC)
    If the point were in finding entries, then the normalized spellings themselves would be redirects, not the main entries. Instead, what is suggested is that all of the main entries be somehow normalized, regardless of how they are attested, containing all of the definitions, citations and so on for all of the spellings that they resolve to under some lossy scheme, and actually attested entries be soft redirects, and paradoxically listed as "alternative spellings" under the normalized entry (how can real attestations be alternative spellings to something made up?!)
    When it comes to ancient languages, all of the paper dictionaries have space constraints that require usage of a standardized spelling scheme to help look up entries. A single word could have a dozen different spellings. However, online dictionaries do not suffer from such limitations. We can have everything - the original script in Unicode and not Latin transliteration, citations, as well as a list of widely used scholarly transcriptions, normalization schemes, reconstructed pronunciations or whatever - but the latter not as full-blown entries, because they are not real words but reconstructions. --Ivan Štambuk (talk) 14:19, 4 August 2014 (UTC)
    Online dictionaries have screen real estate constraints as well. I would not like to picture what an inflection table would look like if it includes all attested variant spellings. --WikiTiki89 14:29, 4 August 2014 (UTC)
    For ancient languages inflection tables are not that important. People don't use them to learn to speak those languages (except maybe Latin and Sanskrit, but that's insignificant). For them, much more important points are accuracy and reliability. Inflection tables which only contain attested forms in their original spelling are much more important than inflection tables containing reconstructed forms that were possibly never attested in those spellings. It's a difference between Wiktionary as a serious reference work, and Wiktionary as a conlang community. --Ivan Štambuk (talk) 14:49, 4 August 2014 (UTC)
There's also the problem of unattested lemma forms. We have an entry for πρίαμαι (príamai), for example, but according to Liddell & Scott that particular form is not attested. That doesn't happen too often in Ancient Greek, which has an enormous corpus, but it happens very frequently in languages like Gothic and Old Irish. I've been creating entries for unattested lemmas in both of those languages, but I've been wondering if that's really such a good idea. Maybe we should put them in the Appendix namespace alongside other reconstructed forms. —Aɴɢʀ (talk) 14:54, 4 August 2014 (UTC)
If the lemma form can be easily determined, then I think it is the best place to define the term. We can note on the page that the lemma form is unattested and I guess it makes sense to be able to mark or remove the unattested inflected forms as well. Inflection tables are still very useful, especially when all or most of the forms are in fact attested. --WikiTiki89 15:24, 4 August 2014 (UTC)
Support. For Old Norse and Old High German, I expect that normalized spellings actually meet CFI, firstly because Norse texts are so regularly printed in normalized form, and secondly because print and online dictionaries (the former of which are sufficient verification, per CFI, of extinct or poorly-documented languages) invariably use normalized spellings. Another set of languages that already benefit from normalized spelling and would benefit further from having the practice codified are the indigenous languages of North America, which different dictionaries and text-collections have often used slightly different orthographies to represent. For instance, in many languages, some sources have represented long vowels with macrons (ā), or circumflexes (â), other sources have used trailing mid dots (a·), and still others have used doubling (aa). Pace Ivan, I think it'd be hilariously nonsensical for e.g. one third of the inflected forms of a term or one third of a set of compounds that share a common element to use ā, while another third used aa, most of the rest used â and a few entries used a·, all because someone preferred to blindly copy and paste the idiosyncrasies of the different dictionaries the forms were attested in rather than think critically about them for a moment. - -sche (discuss) 17:51, 4 August 2014 (UTC)
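As a rough illustration of how a per-language rule set (of the kind WikiTiki89 suggests listing on each "WT:About X" page) could map the notations -sche lists onto one normalized form, here is a minimal Python sketch. It assumes a hypothetical language whose sources write the same long vowel as â, a·, or aa, and whose chosen normalized spelling uses ā; the function name, the rule set, and the sample word "tâka" are made up for illustration and are not anything Wiktionary actually runs.

```python
import re
import unicodedata

# Hypothetical rule set for one language: every long vowel is normalized to a macron.
MACRON = {"a": "ā", "e": "ē", "i": "ī", "o": "ō", "u": "ū"}
CIRCUMFLEX_TO_MACRON = {"â": "ā", "ê": "ē", "î": "ī", "ô": "ō", "û": "ū"}


def normalize_long_vowels(word: str) -> str:
    """Rewrite â, a· and aa (and likewise for e, i, o, u) as ā."""
    # Compose decomposed input so that "a" + combining circumflex is seen as "â".
    word = unicodedata.normalize("NFC", word)
    # Circumflex notation: â -> ā
    word = "".join(CIRCUMFLEX_TO_MACRON.get(ch, ch) for ch in word)
    # Trailing mid-dot notation: a· -> ā
    word = re.sub(r"([aeiou])·", lambda m: MACRON[m.group(1)], word)
    # Doubling notation: aa -> ā (naive: would also merge two vowels across a morpheme boundary)
    word = re.sub(r"([aeiou])\1", lambda m: MACRON[m.group(1)], word)
    return word


for spelling in ["tâka", "ta·ka", "taaka", "tāka"]:
    print(spelling, "->", normalize_long_vowels(spelling))
```

All four spellings come out as "tāka". As Ivan points out further down, a mechanical rule like this is only defensible when the different symbols are known to represent the same sound; for many ancient languages that assumption does not hold.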
To be fair, I don't think that Old Norse texts that are printed in normalized form are allowed to attest the normalized spelling. The actual attestation should be of the original spelling(s) from when the language was still in use. --WikiTiki89 18:07, 4 August 2014 (UTC)
Paradoxically, those spellings are much harder to attest. It's much like the scripts of Gothic: it was written in Gothic script originally, but everyone "normalises" it into a transliterated form nowadays. —CodeCat 18:12, 4 August 2014 (UTC)
Yes, that is why if we allow entries at normalized spellings in CFI, we must remember that normalized texts attest the term, but not the spelling. --WikiTiki89 18:18, 4 August 2014 (UTC)
Those "normalized Gothic" texts are not attestations of Gothic language. That is not how the Gothic was written. Those are scholarly transcriptions made for scholarly purposes. They are equivalent to e.g. respelling any language in phonemic transcriptions. Nobody writes Gothic today. It's a dead language with small and fixed corpus. Those kinds of transcriptions are not attestations of Gothic. --Ivan Štambuk (talk) 09:31, 5 August 2014 (UTC)
I disagree that only original editions of works can be cited; I don't see such a restriction in WT:CFI. Two editions are not independent of each other for the purposes of citing a single word/spelling (e.g. one can't cite both the American and British versions of Harry Potter and have them count as two citations of castle), but nothing I see prohibits citing different editions to confirm the existence of different words or spellings — e.g. citing an American edition of Harry Potter as a use of the word favor, even if JK Rowling's original used favour. The American edition is durably archived and verifiably uses favor several times (making clear that it isn't e.g. a typo). Likewise, the normalized editions of Norse texts are durably archived. (In most cases, they're far better archived and far more accessible — as copies exist in hundreds of libraries — than the original manuscripts, which are periodically destroyed by fires and in historical cases may even have been destroyed before any un-normalized editions of them were printed. But that's mostly superfluous to my point.) Consider also how many translations of the Bible have been cited to verify various words around here — CFI's prohibition against citing two "verbatim or near-verbatim quotations or translations of a single original source" only stops us from citing two editions of the Bible as citations of the same (spelling of a) word, it doesn't stop us from using two editions of the Bible as citations of two different words. - -sche (discuss) 20:17, 4 August 2014 (UTC)
What I meant was a reproduction of an Old Norse text printed well after Old Norse died out cannot count as a citation of Old Norse. However, we can assume that an unnormalized reproduction reflects the original spelling and use it to attest spellings, and we can assume that a normalized reproduction does not necessarily reflect the spelling but still reflects the form of the word and we can use it to attest the term and its form, but not its spelling. --WikiTiki89 20:25, 4 August 2014 (UTC)
@-sche: I think it'd be hilariously nonsensical for e.g. one third of the inflected forms of a term or one third of a set of compounds that share a common element to use ā, while another third used aa, most of the rest used â and a few entries used a· - Indeed it would be nonsensical from the perspective of someone who imagines that Ancient Greek, Gothic, Sanskrit, Old Church Slavonic, Akkadian, Hittite, Old High German, Middle Persian, Old French and others were written by a single, unified speech community that spoke a single language at a single point in time, as opposed to being spoken and written across many centuries (often millennia) by diverse communities who never knew each other, who wrote in ill-fitting, lossy scripts under the influence of traditional orthography not necessarily reflecting actually spoken sounds, and whose languages (say, those of documents X and Y, today treated as parts of a single ancient language X) would on any other occasion be treated as two completely separate languages, were they attested today.
I'm receptive to the idea of having both 1) reconstructed, template-generated inflection fitting some "idealized" model of a language as well as 2) listing only actually attested forms (I believe Old Irish conjugation and Old Persian declensions currently do that). But, simply ignoring all of the variation in order to fit them into some kind of imaginary order is a disservice to any serious potential users of Wiktionary. The only ones who would benefit from that would be non-serious users who could then claim that they "learned" some ancient language as presented by Wiktionary, even though such language never existed in the form it is being presented. It would be similar to many of our protolanguage inflection templates, which present some kind of ridiculous Stammbaum-like picture of parent language dissolution reflecting a POV of a single linguist, which never existed as such.
Regarding the barely documented indigenous languages - they are a separate category. They are usually a living thing, and if one scholar uses â and another ā to represent what is indisputably the same sound, it makes sense to standardize on the most common notation and use others as redirects. But if some ancient language uses three different symbols for the [a:] sound, we cannot standardize it on anything because we don't have a clue whether those symbols meant the same thing (even though some, but not all, think they did). We can't make that kind of value judgments. If the original documents are still being published in facsimile editions, it means that no normalization is possible. There could be exceptions - e.g. Gothic with a tiny corpus and a small number of authors (one, is it? Ignoring Crimean Gothic). But for the majority it's not practical at all.
Anyway, this should all be discussed on an individual language basis. --Ivan Štambuk (talk) 10:41, 5 August 2014 (UTC)

In modern languages too, such as French, there may be normalized (recommended) spellings, and it's sometimes very difficult or impossible to find attestations for these normalized spellings. When there is an official recommendation, I think that they should be includable. For old languages, the issue is more difficult. Of course, they should be included when attested (even when the old spelling cannot be found), but not considered as the main entry (the other entry should be as complete as the normalized one). If a (normalized or old) spelling is included even when it seems to be unattested, the fact that no attestation has been found should be made very clear in the entry. Lmaltier (talk) 18:20, 4 August 2014 (UTC)

Unattested lemma forms and CFI

Kind of a spinoff based on what Angr brought up. Currently, the common practice is to reconstruct the lemma form if it is not attested, and place the entry there. If several lemmas are possible, we generally include them all and choose one at random. This practice is primarily done with old languages, but it's easily conceivable that it could happen to modern languages as well. For example, if all we have for a particular English lemma is two attestations of fonges, one of fonging and one of fonged, then I doubt we would put the main entry at one of those entries. We'd put it at fonge, even though it's not attested. CFI doesn't say anything about this practice, but as it's so widespread both on Wiktionary and outside it, I think we should clarify and codify it. —CodeCat 18:18, 4 August 2014 (UTC)

I don't think English verbs are the best example of the phenomenon. I have worked on and observed cases where we have -ing forms and -ed forms as distinct entries, but do not have the presumed verb lemma form. In the absence of the lemma, the -ing form is often shown as noun and/or adjective and the -ed form as adjective. This seems to actually be a fairly common evolution, with the base and -s forms coming well after the -ing and -ed forms, if indeed they ever materialize in use. I cannot recall specific cases, but, if it is important, instances could probably be found. The best way would be by extracting the cases from the dump. DCDuring TALK 18:37, 4 August 2014 (UTC)
I did include the 3rd person singular present as one of the attested forms in my example, and the past form could include the past tense as well. —CodeCat 18:50, 4 August 2014 (UTC)
So we're both operating without real cases. I'll rejoin the discussion when someone, possibly me, has a real case. DCDuring TALK 20:04, 4 August 2014 (UTC)
Here is a real case: I cannot find the French verb arsenicaliser in its lemma form, but I can find it in conjugated forms: "Le praticien qui a le plus arsenicalisé le monde et dont l’expérience a le plus d’extension, de richesse et de certitude, M. Boudin, préfère actuellement l’acide arsénieux et se tient exclusivement à lui dans tous les cas : (…)" (Annales de la Société de médecine de Lyon, 1851) (it's indisputably a verb in this sentence) or "Nickel minéralisé par le fer & le cobolt sulphurés & arsenicalisés ;" (François Rozier, Jean André Mongez, Jean-Claude de La Métherie, Journal de physique, de chimie, d’histoire naturelle et des arts, 1777). Lmaltier (talk) 20:28, 4 August 2014 (UTC)
Here are some more real cases: Passargisch, ostweserisch, and several of the other adjectives in Category:German terms with rare senses. - -sche (discuss) 22:23, 4 August 2014 (UTC)
There are plenty of real cases from extinct languages; I already brought up πρίαμαι (príamai), which itself is not attested, but other forms of it are (see [19]). Old Irish examples include ad·gnin, ailid, and claidid. —Aɴɢʀ (talk) 22:32, 4 August 2014 (UTC)
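For a regular English verb, the reasoning in CodeCat's hypothetical fonges/fonging/fonged example can be sketched mechanically: each attested form suggests a few candidate lemmas, and only candidates consistent with every form survive. The Python below is a minimal sketch under the assumption of regular inflection only (no consonant doubling, no y/i alternation, no irregular verbs); the function names are hypothetical, and "fonge" is CodeCat's invented word, not a real one.

```python
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def candidate_lemmas(form: str) -> set[str]:
    """Possible bare infinitives behind one regular English verb form."""
    candidates = set()
    if form.endswith("ing"):
        stem = form[:-3]
        candidates |= {stem, stem + "e"}      # e.g. fong + ing, or fonge with its -e dropped
    elif form.endswith("ed"):
        stem = form[:-2]
        candidates |= {stem, stem + "e"}      # e.g. fong + ed, or fonge + d
    elif form.endswith("es") and form[:-2].endswith(SIBILANT_ENDINGS):
        candidates |= {form[:-2], form[:-1]}  # e.g. wish + es, or a lemma ending in -e
    elif form.endswith("s"):
        candidates.add(form[:-1])             # e.g. fonge + s
    return candidates


def reconstruct_lemma(forms: list[str]) -> set[str]:
    """Lemmas consistent with every attested form (empty if none are)."""
    result = None
    for form in forms:
        cands = candidate_lemmas(form)
        result = cands if result is None else result & cands
    return result or set()


print(reconstruct_lemma(["fonges", "fonges", "fonging", "fonged"]))
```

Here "fonges" rules out "fong" (whose third person singular would be "fongs"), so only "fonge" survives. When more than one candidate survives, the sketch returns all of them, which mirrors the current practice of listing every possible lemma and picking one.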

Need for entries or a field in the 'create a new entry page' that may just be cross references for all spellings.

There is a requirement for entries that may just be cross references, for all spellings that use characters that are not in the usual Romanised character set.

Just going to the <create a new entry> page is very frustrating. The cross reference might be an extra field on the <create a new entry> page, with the title something like <have you viewed ...>.

Repeatedly I am on a page but cannot search for it afterwards, either to get back to it directly or to find it again through similar pages in the index.

<kephalḗ> is the Romanisation of <κεφαλή>, but you cannot search for <κεφαλή> using <kephal ...>. It is often very difficult to imagine what you need to do to get back to a page that you have accessed when using the etymology, especially.

I expect to be at Wikimania on Wednesday ...

Genevieve Hibbs

Tocharian question

@Ivan Štambuk: and @Word dewd544: in particular since they seem to be our most prolific Tocharian editors: I see from kuse that the vowel letters that are normally represented as subscripts are represented by full letters in the entry name, but the headword line shows the subscript (in this case, kuse with a subscript u). Is this the best way to do this? A full-size u and a subscript u correspond to two different spellings in the original script, don't they? If and when Unicode finally provides the Tocharian alphabet, we will presumably want to move our entries to forms written in the native script (hopefully retaining the Latin-alphabet entries as "Romanizations of..."), and if we want to do that by bot, it would be good to have entries under unambiguous names. Shouldn't the Tocharian B section of [[kuse]] be moved to [[kᵤse]] instead? The only problem I foresee is that sometimes it's "ä" that's subscript, and Unicode doesn't have a character for subscript "ä". For those cases maybe we could cheat and use "ₔ" instead. What do y'all (and anyone else interested) think? —Aɴɢʀ (talk) 14:09, 5 August 2014 (UTC)

Yes it should be moved, I wasn't even aware that subscript u sign <ᵤ> existed in Unicode until now. --Ivan Štambuk (talk) 15:02, 5 August 2014 (UTC)
And are you OK with using "ₔ" for "ä"? Are there even any entries that currently call for that? —Aɴɢʀ (talk) 15:17, 5 August 2014 (UTC)
I'm fine with that. These issues should best be discussed on the about-page for Tocharian. We only have a few hundred Tocharian entries, and they need to be rechecked and referenced in any case. Unicode support doesn't seem to be coming anytime soon. If you feel like doing that, just knock yourself out... --Ivan Štambuk (talk) 16:21, 5 August 2014 (UTC)
We don't have an about-page for Tocharian. I don't have the resources to recheck and reference the Tocharian entries, but if I happen to see any subscripts in headword lines, I'm happy to move the info to a new entry name. —Aɴɢʀ (talk) 16:33, 5 August 2014 (UTC)
Yeah, I agree that it should be moved as well. I think the reason I didn't use those characters originally was because I didn't know the others actually existed for use on here, and also just made them based on the format that was used for the few existing entries, from what I remember. But this way is better. Word dewd544 (talk) 13:52, 13 August 2014 (UTC)
Cool. Is [[kᵤse]] the only one? —Aɴɢʀ (talk) 14:31, 13 August 2014 (UTC)
OK, I've moved the Tocharian B section to [[kᵤse]] and, I believe, fixed all the links that were pointing to it. I looked through the lemma categories of both Toch. languages and couldn't find any others with subscript vowels, but I may have overlooked something. —Aɴɢʀ (talk) 10:48, 7 August 2014 (UTC)
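If the remaining moves were ever done by bot, the core step is a character substitution from a subscript-marked headword to an entry name using Unicode subscript letters, with "ₔ" (U+2094, subscript schwa) as the stand-in for subscript ä that Angr proposes. The Python below is a minimal sketch that assumes, hypothetically, that headwords mark the subscript vowel with <sub>...</sub> wikitext; the actual headword markup and the function name are assumptions.

```python
import re
import unicodedata

# Unicode subscript vowels; "ₔ" (U+2094, subscript schwa) is only a stand-in for ä,
# which has no subscript form of its own.
SUBSCRIPT_VOWELS = {
    "a": "\u2090",  # ₐ
    "e": "\u2091",  # ₑ
    "o": "\u2092",  # ₒ
    "ä": "\u2094",  # ₔ (stand-in)
    "i": "\u1d62",  # ᵢ
    "u": "\u1d64",  # ᵤ
}


def entry_name(headword_wikitext: str) -> str:
    """Turn a headword such as 'k<sub>u</sub>se' into an entry name such as 'kᵤse'."""
    def replace(match: re.Match) -> str:
        vowel = match.group(1)
        if vowel not in SUBSCRIPT_VOWELS:
            raise ValueError(f"no subscript character for {vowel!r}")
        return SUBSCRIPT_VOWELS[vowel]

    # NFC so that a decomposed "ä" (a + combining diaeresis) counts as one character.
    text = unicodedata.normalize("NFC", headword_wikitext)
    return re.sub(r"<sub>(.)</sub>", replace, text)


print(entry_name("k<sub>u</sub>se"))
```

Raising an error on an unexpected subscript letter, rather than silently keeping it, is a deliberate choice: an entry name that still contains a full-size vowel would recreate the ambiguity the move is meant to remove.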

Representing Old Irish "tense" sonorants

Anyone interested in Celtic languages or IPA transliteration (or both, or anyone who just wants to put their oar in) is invited to join the discussion I've just started at Appendix talk:Old Irish pronunciation#Representing the tense sonorants. —Aɴɢʀ (talk) 01:25, 9 August 2014 (UTC)

Lists of dictionary headwords

Are they subject to copyright? I am interested in creating appendices containing lists of headwords of some notable dictionaries that are still under copyright, as well as some additional information not contained in them. --Ivan Štambuk (talk) 20:31, 9 August 2014 (UTC)

Compiling your own list is not copyrighted as far as I know. —CodeCat 20:32, 9 August 2014 (UTC)
Copyright is a matter subject to interpretation anyway... So let's not give a shit about it --Fsojic (talk) 20:44, 9 August 2014 (UTC)
But I don't want my own list. I want enhanced lists of words or reconstructions exclusively from certain works so that the experience of browsing them could be simulated by clicking. Additionally, references which refer to them could back-link to such lists. It could also be good for verification and inspection of coverage. I prefer lists and tabular presentation over categories. I recall a discussion a while back about Brian's hotlist which was kept, so I suppose it's not a big deal. But such lists would be exposed outside userspace, and that seems a bit more problematic, so I'm asking if it could be prohibited for some reason. --Ivan Štambuk (talk) 21:08, 9 August 2014 (UTC)
Intellectual property lawyer hat on. The namespace that a list appears in is irrelevant to copyright law. As far as I recall, Brian's hotlist is a compilation of headwords from several different dictionaries, and therefore can not identifiably impinge on the copyright of any one of them. I think that it would be problematic, at least, to list the headwords of a specified edition of a specified, in-copyright, printed dictionary. Such a list of words defined reflects the editorial judgment of the dictionary's authors, and is therefore likely to be covered by copyright. Doing so for one that was out of copyright would be fine. A possible workaround would be to make one set of lists of words defined in out-of-copyright versions of specified dictionaries, and a separate list containing a combination of words not defined in the out-of-copyright versions (which will basically be words that are new since their publication) but which are defined in unspecified "major dictionaries". bd2412 T 22:01, 9 August 2014 (UTC)
Interesting. Thank you very much for this explanation. --Dan Polansky (talk) 09:20, 10 August 2014 (UTC)

Can we automate Hiragana and Katakana transliteration?

I see we don't currently have automatic transliteration of Hiragana and Katakana. Is there a technical reason why we can't, or is it just that no one's gotten around to it yet? —Aɴɢʀ (talk) 11:02, 10 August 2014 (UTC)

It's possible, but it would give incorrect results when they are mixed with Kanji. So the module would have to check for the presence of Kanji characters and return nothing if found. —CodeCat 12:09, 10 August 2014 (UTC)
Mixed terms should rely on kana, e.g. 勉強する and 電子メール should use kana spellings べんきょうする (benkyō suru) and でんしメール (denshi mēru), if it only transliterated the hiragana/katakana part する and メール, it would be a mess. --Anatoli T. (обсудить/вклад) 01:00, 11 August 2014 (UTC)
Would that be a lot of work? Obviously we shouldn't do it if it means listing every single one of the 6.3 kilosagans of possible Kanji characters, but if it can be done with less than 100 characters of code, why not? —Aɴɢʀ (talk) 12:19, 10 August 2014 (UTC)
Module:ja already does it (in Japanese headwords). It's not implemented in link templates, as the transliteration may be incorrect. Wyang (talk) 23:53, 10 August 2014 (UTC)
The automatic transliteration is used in Japanese entries and usexes ({{ja-usex}}) and some other templates. It's only not used in translations. For this to happen, the translations would need to follow the same format as entries, using spaces in multipart words or phrases with particles, capitalisation (forced with symbol ^ or automatic on proper nouns). Besides, many kanji translations don't have hiragana, which is needed for transliterations to happen. --Anatoli T. (обсудить/вклад) 00:02, 11 August 2014 (UTC)
The other challenge is that the Japanese transliteration is somewhat context-driven, as I said, proper nouns (which excludes language names, demonyms, month names, weekdays) are capitalised, verbs with final おう are transliterated as "-ou", rather than "ō", there are cases when morphemes need to be separated ("." is used in entries), particles は and へ are "wa" and "e", rather than "ha" and "he". --Anatoli T. (обсудить/вклад) 00:56, 11 August 2014 (UTC)
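CodeCat's suggestion (transliterate kana, but return nothing as soon as kanji are present) can be sketched roughly as follows. This is not Module:ja, which is written in Lua and handles far more; the function names, the choice of only the main CJK Unified Ideographs block, and the deliberately tiny hiragana table are assumptions for illustration. It also does none of the context-dependent work Anatoli describes (particles は and へ, verb-final おう, capitalised proper nouns, morpheme separators).

```python
from typing import Optional


def contains_kanji(text: str) -> bool:
    # Only the main CJK Unified Ideographs block (U+4E00 to U+9FFF) is checked here.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


# A deliberately tiny hiragana table for illustration; a real one would cover all kana,
# small-kana digraphs (e.g. きょ), the sokuon (っ) and long vowels.
HIRAGANA = {
    "か": "ka", "き": "ki", "す": "su", "し": "shi",
    "ね": "ne", "る": "ru", "ん": "n",
}


def transliterate_kana(text: str) -> Optional[str]:
    """Romanize plain kana; return None when the reading cannot be derived."""
    if contains_kanji(text):
        return None  # kanji readings are not recoverable from the characters alone
    out = []
    for ch in text:
        if ch not in HIRAGANA:
            return None  # unknown character: give up rather than guess
        out.append(HIRAGANA[ch])
    return "".join(out)


print(transliterate_kana("すし"))      # sushi
print(transliterate_kana("勉強する"))  # None, because of the kanji
```

Returning None for mixed spellings such as 勉強する, instead of transliterating only the kana part, avoids the half-romanized output Anatoli warns about; such terms would still need an explicit kana spelling like べんきょうする.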

Block policy clarification

The current blocking policy page WT:BLOCK seems misleading. I propose to reduce the page content to the following wikitext:

:''See also '''[[Help:Interacting with humans]]'''''

# The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary.
# It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed.

===See also===
* [[Wiktionary:Range blocks]] - when and how to block a range of IP addresses
* [[Wiktionary:Vandalism in progress]] ([[WT:VIP]]) for currently occurring or very recent vandalism
* [[Wiktionary:Vandalism]] (or [[WT:VANDAL]]) for vandalism of Wiktionary in general

As per Wiktionary:Votes/pl-2010-01/New blocking policy, the above text is the only binding part of the page.

Note that I placed "policy-CFIELE" there, so that the criteria for further modification of this page should be identical to those of CFI and ELE.

What do you think? --Dan Polansky (talk) 12:06, 10 August 2014 (UTC)

Special:Abusefilter is supposed to filter enough to let us talk with the remaining editors when they're wrong, and so encourage them toward perfection. Many valuable professionals may cite their personal site as a reference out of ignorance; we can't treat all of them as incorrigible spammers, as that would contravene WT:Be bold.
Moreover, I still think that if the current WT:BLOCK had been applied to my well-known Wikimedia bot (3 million edits and 21 flags), the blocker would have had to refrain from any hurried, arbitrary decision. I have seen too much waste from friendly fire in the past.
That's why leaving a message was a sine qua non before indefinitely barring someone from the open wiki. JackPotte (talk) 13:10, 10 August 2014 (UTC)
Apart from that we could also recruit more patrollers, for example by giving this status to them automatically after 500 edits, like on the French Wikipedia. JackPotte (talk) 16:24, 11 August 2014 (UTC)


Template:pedia was redirected from one page to another earlier today, resulting in a number of pages being broken. Instead of Template:pedia redirecting to Template:projectlink/Wikipedia, I request that Template:projectlink/Wikipedia redirect to Template:pedia instead. Since Template:pedia is linked to from thousands of pages (https://en.wiktionary.org/w/index.php?title=Special:WhatLinksHere/Template:pedia&namespace=0&limit=5000), it seems the more likely target. Purplebackpack89 23:56, 11 August 2014 (UTC)

Support it being at Template:pedia
  1. Purplebackpack89 23:56, 11 August 2014 (UTC)
Support it being at Template:projectlink/Wikipedia

Did you even look at the page history? Template:pedia has been a redirect since 2007. —CodeCat 23:58, 11 August 2014 (UTC)

And when you moved it earlier today, it wasn't working on the pages I looked at. It shouldn't have been moved by you earlier today, and you shouldn't have deleted the page you did. You also shouldn't have edit-warred, and you should have provided better edit summaries. Purplebackpack89 00:02, 12 August 2014 (UTC)
That's because you reverted my move while I was still in the process of updating all the redirects to point to the new location. Your revert actually broke the template altogether because it ended up pointing to a deleted page. You should have taken more care before making changes when you didn't know what you were doing. You should also have taken more care to get the facts clear before posting erroneous and misinformed "polls" like you did, which do nothing but embarrass you and waste the time of other editors who have better things to do than deal with you. —CodeCat 00:05, 12 August 2014 (UTC)
You could have saved yourself the work of updating all the redirects by not making the move in the first place. Nothing will convince me that it was a good idea to make that move. Heck, things would have worked just fine if you'd let Template:pedia have the full text it did in my last edit. Nobody will ever use Template:projectlink/Wikipedia, because just adding Template:pedia is so much easier. Why don't we just have Template:projectlink/Wikipedia redirect to Template:pedia? Everything would be so much simpler that way. Purplebackpack89 00:24, 12 August 2014 (UTC)
Template:projectlink/Wikipedia is not meant to be used directly in entries anyway. Rather it's meant to be used through {{projectlink}}, which supports many other projects. {{pedia}} is just a remnant from before it was converted to {{projectlink}} back in 2007. All the projectlink pages are named beginning with PL:, including Template:PL:pedia, which Template:pedia was originally a redirect to. All I did was move Template:PL:pedia to Template:projectlink/Wikipedia. I am intending to move all the other PL: templates too, as they are properly subtemplates of Template:projectlink and are only meant to be used in conjunction with it. Having them as subpages makes that relationship more clear. I really don't understand why you are making such drama out of it. —CodeCat 00:34, 12 August 2014 (UTC)
Because you broke pages, and they wouldn't have been broken if you hadn't messed around with the template. You probably shouldn't have deleted Template:PL:pedia either. Purplebackpack89 00:42, 12 August 2014 (UTC)
It has no transclusions, so why would we keep it? It's useless. —CodeCat 00:47, 12 August 2014 (UTC)
  • By the way, it was COI for CodeCat to protect a page she was edit-warring on. For the life of me, I don't understand why CodeCat is still an administrator. She edit-wars frequently, she rarely explains what she's doing, and she protects things she's engaged in edit wars on. Purplebackpack89 00:28, 12 August 2014 (UTC)
    • You're just looking for reasons to be right when I've already countered your other arguments. You're pretty much pulling the idea that I was edit warring out of your hat in an attempt to put me in a bad light while excusing yourself. If someone breaks things or makes other bad edits repeatedly, there is nothing wrong with edit warring. It's just un-breaking the wiki. Imagine if we had to start a discussion whenever someone kept re-inserting "poop" into an entry. It would be ridiculous! —CodeCat 00:34, 12 August 2014 (UTC)
      • For starters, the last edit I made to Template:pedia wasn't a bad edit. I want you to look closely at it before calling it a bad edit. Secondly, I was acting in good faith trying to restore a template that was showing up as broken on a page. Somebody who inserts "poop" into a page is vandalizing. In one of those cases, it is acceptable to edit-war. In the other, it isn't. If you don't understand which is which, and you think it's OK to edit-war to revert good-faith edits without even an edit summary explaining why you did what you did, then you have no business being an admin. Purplebackpack89 00:42, 12 August 2014 (UTC)
        • To be fair, I think very little of what you do on Wiktionary is truly good faith. You mostly get on people's nerves and are obstructive almost on principle, and people have said so many times in the past. You've even driven away other valued and productive editors with your behaviour. So if I shouldn't be an admin, then I suggest you shouldn't be on Wiktionary at all. —CodeCat 00:47, 12 August 2014 (UTC)
          • I resent your accusation. I tried to fix that template because it was showing up as broken, not to piss you off. I vote keep at RfD because I believe the project would be improved with more articles, not to piss Mglovesfun off. Every mainspace and RfD edit I make is in good faith and with a view to improving the Wiktionary Purplebackpack89 00:50, 12 August 2014 (UTC)
            • Of course, but so are all of my edits. :) I never said that you did it to piss me off. That's not what the other editors who have complained said either. But good faith edits are not equal to good edits, and are therefore not exempt from being reverted. Having good intentions also doesn't prevent you from getting on people's nerves. A while ago there was User:KYPark who kept inserting rather outlandish etymologies at WT:ES, and would get very philosophical about the ideas while not really contributing or making any kind of point. He got upset when we started moving them to his userspace because he didn't understand that it didn't belong there, and after his behaviour continued for about a year or so, he got blocked, I think even several times. There was no discussion about a block, but nobody really minded that he was blocked because he had annoyed and frustrated so many people that nobody was willing to stand up for keeping him. They were glad he was finally gone. The reason I am telling all this is that something similar may eventually happen to you as well. You would do well to try to be a friend of the larger Wiktionary community, because all the good faith in the world will not help you if they are fed up. —CodeCat 01:02, 12 August 2014 (UTC)
From the edit history of Template:pedia it is apparent that Purplebackpack's starting assumptions (since amended) are mistaken. Template:pedia was not moved; it has been a redirect since 2007. And it was not CodeCat's updating of the redirect target but Purplebackpack's revert of that which seems to have broken some existing uses during the update that was being made. Purplebackpack's subsequent unilateral insertion of thousands of bytes of duplicated code also created quite a mess. Purplebackpack says "I was acting in good faith". As Wikipedia observes at w:WP:CIR, "[some users] believe that good faith is all that is required to be a useful contributor. Sadly, this is not the case at all. Competence is required as well. A mess created in a sincere effort to help is still a mess." - -sche (discuss) 01:07, 12 August 2014 (UTC)
  • Explain how the code insertion created a mess. Purplebackpack89 01:10, 12 August 2014 (UTC)
  • Also, there are certain things that acting in good faith entitles you to. One of them is a clear explanation when you are reverted. CodeCat did not give one. Purplebackpack89 01:12, 12 August 2014 (UTC)
    • I was more focused on undoing the damage than on giving an explanation. Fixing thousands of entries had a higher priority to me than satisfying one user. —CodeCat 01:20, 12 August 2014 (UTC)
      • Maybe you shouldn't have broken them, then... FWIW, CIR isn't policy here or even on Wikipedia, it's merely an essay, and it's a bad idea, because it flies in the face of being BOLD, and taking chances with edits. It's also walking too fine a line, because it's impossible to understand why a particular editor did a particular edit. Finally, it requires a level of communication that is present on Wikipedia but not on Wiktionary; Wikipedia not only has fewer things that can be broken (since they don't use as many templates and lack a rigidity of article structure), it also is better at explaining to editors what's wrong. Purplebackpack89 02:57, 12 August 2014 (UTC)
      • Furthermore, @CodeCat:, your attitude that explaining your edits to other editors is of little or no import is disheartening, to say nothing of being wrong. You complain about me being hard-headed, but I've mentioned this to you at least half a dozen times, and other editors have mentioned it as well, and you've ignored them. It's very disingenuous for you to make a CIR-based argument when you have not been forthcoming about why you're right. Purplebackpack89 04:36, 12 August 2014 (UTC)
        • (edit conflict) CodeCat didn't break anything- you did. You assumed bad faith, and didn't bother to ask or investigate. I'm not going to apologize for CodeCat- sometimes I vehemently disagree with her actions, and I've done my share of griping about it. I've even reverted a few of her edits- but only when things were seriously broken and she wasn't around to fix them, and only after carefully analyzing everything to make sure I wasn't going to make things worse.
        • You see, normal people would post a complaint on her talk page or in the forums first and demand to know why she was doing it. You, on the other hand, know better than everyone else and reserve the right to unilaterally step in and take over any time it sort of looks like someone might be doing something wrong- shoot first, and ask questions later. And then, when it's demonstrated that you were mistaken, you don't admit you were wrong, you don't apologize- no, you attack the person you interfered with for not explaining things so even you could understand. After all, you never make mistakes- the only way you could ever be wrong is if someone else misleads you into being wrong. Chuck Entz (talk) 07:54, 12 August 2014 (UTC)
          • Chuck, I saw something was broken and tried to fix it. I did that in good faith, and felt I was owed an explanation for why my edits were wrong. Purplebackpack89 14:07, 12 August 2014 (UTC)
  • Now as before, I find the CodeCat pattern of discussion-free and summary-free edits to infrastructure objectionable. CodeCat hardly ever explains themselves, but requires explanation for opposition to their edits. CodeCat lacks the maturity to understand that excessive change with little added value is bad. --Dan Polansky (talk) 07:11, 12 August 2014 (UTC)
  • PBP, for this and other edit-warring incidents you have been stripped of rollback and autopatrolled privileges. The latter increases the chance that someone without "COI" will notice any disputes with you, so you should be thankful, really. Further misbehaviour will be met by a block. Keφr 08:10, 12 August 2014 (UTC)
    This should be immediately undone, since the PBP vs. CodeCat incident had nothing to do with the autopatrolled and rollback flags. The autopatrolled flag in particular should be restored, since the mainspace edits of PBP are largely undisputed, and removing the flag will increase the workload of the patrollers. Furthermore, the threat of a block is inappropriate, since PBP was edit warring with CodeCat on a page which CodeCat edited without consensus; a block or desysopping of CodeCat could be in order, given the long-term pattern of their editing behavior. --Dan Polansky (talk) 08:31, 12 August 2014 (UTC)
    Flags restored. We can't just punish one party in a conflict, especially considering that the reverts were (perceived as) legitimate, and that there is no pattern of (perceived) abuse of those flags. --Ivan Štambuk (talk) 08:55, 12 August 2014 (UTC)
    The reverts were perceived as legitimate by whom? Keφr 09:07, 12 August 2014 (UTC)
    By them, obviously, otherwise they wouldn't have done them. --Ivan Štambuk (talk) 12:33, 12 August 2014 (UTC)
    I stand by what I did. PBP may not have used the rollback button here, but the repeated combative and misinformed edit-warring is evidence that he cannot be trusted with it. Patrolling burden should not be a problem; PBP has made one edit yesterday, two the previous day, three edits two days ago, and previous 13 edits were on 3rd of August, so his edits are quite infrequent. However, the few edits he makes do need attention apparently. In my opinion a block is not only appropriate, but long overdue. This is not just a single incident, and PBP refusing to learn (from past mistakes and from everything else) is a huge red flag. And nothing prevents you from starting a desysopping vote. Keφr 09:01, 12 August 2014 (UTC)
    In Wiktionary:Beer_parlour/2014/June#Purplebackpack89, in the hidden section "Rights removal", two editors supported flag removal while four editors opposed, two of whom explained that since PBP has not abused the flags, they should not be removed. Again: since the editor has not abused the flags, they should not be removed. Furthermore, nowhere in this thread have you noticed that CodeCat refuses to learn. You have singled out the fairly harmless PBP, and conveniently ignored the editor who by my lights has caused actual damage in the mainspace, unlike PBP whose only damage is drama threads in the Beer parlour, a fairly unimportant thing. In this very thread, the drama was sustained by CodeCat, who continued to respond to PBP's posts. But again, the drama itself is fairly harmless, an attribute of an open wiki where people can actually speak up. --Dan Polansky (talk) 09:17, 12 August 2014 (UTC)
    Wasting people's time on futile discussions is not "fairly harmless". And again, if you think CodeCat's actions are so egregious, what are you waiting for? For any punishment to be effective, CodeCat needs to be desysopped first. Keφr 09:35, 12 August 2014 (UTC)
    There is no sound desysopping process. That is why Ruakh left before he would have to deal with CodeCat in this environment. The only desysopping process that we tried relied on a 2/3 supermajority for desysopping. And this of course enables CodeCat to perform mass changes with unclear support, possibly even less than plain majority support, and be fairly sure they will not get desysopped, since there are probably something like 45% or more supporters of what they are doing; I have invented the 45% number and I do not really know the scope of support for their various changes. --Dan Polansky (talk) 09:46, 12 August 2014 (UTC)
    "Nowhere in this thread have you noticed that CodeCat refuses to learn." What she has refused to learn is that the editing process would be a helluvalot easier for everybody concerned if she used edit summaries. She has repeatedly refused to even consider doing so. Purplebackpack89 14:01, 12 August 2014 (UTC)
As annoyed as I am with PBP's behavior and attendant whining in this incident, I disagree with removing his flags over it. I simply don't see the relevance. This reminds me a lot of the whole Gtroy/Acdcrocks/LuciferWildcat affair: in that case, improper harassment generated enough sympathy that he was able to continue with his prolific creation of subpar and often fabricated content far longer than he would have otherwise. Chuck Entz (talk) 13:57, 12 August 2014 (UTC)
Are you suggesting we should repeat the same mistake by letting him loose? As you see, there is enough strife in this community without stubborn ignoramuses adding to it. Keφr 14:29, 12 August 2014 (UTC)
  • One thing's for sure: Kephir overstepped his bounds with his removal of rights. His "beef" against me has translated into HOUNDing and irrational admin actions, and this after I told him multiple times that interacting with me is unproductive. I am very close to considering he be forbidden from interacting with me for the good of the community. Purplebackpack89 14:07, 12 August 2014 (UTC)

Practically this whole discussion is making me facepalm...stop getting so bent out of shape over such piddly little things... To precounter any (slightly) likely accusations, I'm not taking sides here.

  • PBP: You thought CodeCat fucked things up, but in reality things only got messed up because you didn't let them finish the redirections and such that they were doing. Accept that you made a mistake and move on. I get that this was a somewhat more widely used/high profile, etc template not some obscure thing but maybe instead of, as Chuck said, assuming bad faith you should have posted to CodeCat's page to the effect of "Do you realise you broke this thing?" (since that's what you thought happened) before reverting. User: PalkiaX50 talk to meh 14:59, 12 August 2014 (UTC)
Where do you get the idea I was assuming bad faith? And you guys fail to acknowledge that CodeCat bears some responsibility for not initially communicating why she did what she did. Purplebackpack89 16:22, 12 August 2014 (UTC)

Lots of errors in Old French nouns (and in verbs too, before I fixed them)

(This is a bit of a rant. No offense intended to whoever created all the mistakes ... maybe User:Mglovesfun?)

I notice a bunch of mistakes in Old French noun declension. Of the first 4 words I checked out, 3 had incorrect declensions.

  • seror is mistakenly listed under suer; suer is the nom. sg. and seror the obl. sg. but WT has them reversed.
  • empereor is the std form, but WT claims that empereür is standard and redirects the former to the latter when it should be reversed; it also messes up the nom. sg. (should be emperere not empereres) and obl. pl. (should be empereürs not empereres).
  • ameor has the same declension as empereor (nom. amere - ameor, obl. ameor - ameors) but is listed with a totally different declension, broken in a different way from empereür.

A little more looking reveals

  • chanteor has the same declension as ameor and empereor but is listed with a messed-up declension that is different from both the messed-up declensions of ameor and empereor/empereür; what a mess. Its etymology is also broken ... it lists a mistaken cantor instead of cantātor.
  • BTW empereür's etym. is slightly messed up, listing a non-existent Latin word imperātōr with a stray long mark over the o.
  • chaceor, robeor, troveor have the mistaken declension of ameor.
  • compaignon and felon should have similar declensions; both are broken, each differently from the other.
  • nonain and *ante (should be antain) again should have similar declensions and are broken, each differently from the other.

I'm sure there are tons more. How did this get so messed up?

BTW, the Old French verb conjugations were utterly messed up, too, and full of wrong-way redirects as well, but I've put a lot of work into fixing them.

I'd suggest in the future that it would be better to have no declensions/conjugations at all than completely wrong ones.

Benwing (talk) 09:17, 12 August 2014 (UTC)

  • Well, what naturally happens with editors in the more obscure languages is that fewer people know anything about them, so it is tough to find out whether the entries are correct or not. The same happened with plenty of other editors. There was a guy called User:Razorflame who edited in tonnes of languages, but users better than him kept pointing out his mistakes in these languages, which made him move to other languages; before long he was editing in Kannada, and became the self-proclaimed Kannada expert (and since nobody else knew anything about the language, he was allowed to edit to his heart's content, doubtlessly filling this project with crappy Kannada entries). The same thing has happened myriad times, for example User:Wonderfool with Asturian: he claims to be married to an Asturian woman who knows the language, and since nobody else edits in that language, he becomes the "local expert". My suggestion (and hope) is that you fix as many of the Old French entries as you can. It's highly probable that Mglovesfun has made plenty of mistakes, so we appreciate any new editors in less widespread languages. Wonderfool too would appreciate other Asturians correcting his work. --Type56op9 (talk) 14:30, 12 August 2014 (UTC)
    Maybe you can even get your Asturian wife to check all your edits. --WikiTiki89 15:57, 12 August 2014 (UTC)

CodeCat pushing original research

User:CodeCat is again pushing large-scale original research (OR) into etymology sections of mainspace articles, as well as articles for protolanguage reconstructions in the appendix namespace, but this time removing cited scholarship (which he added in several instances) and replacing it with his own fabrications (an example). In the past he objected to tagging his made-up theories with the template {{original research}}, which he unilaterally deleted out of process after removing all of the instances of articles being tagged with it (an example). Neither of these was discussed anywhere, and CodeCat never uses edit summaries. His behavior is detrimental to the credibility of Wiktionary, as well as discouraging for any editors involved in those areas who see their work undone in such a dictatorial manner. --Ivan Štambuk (talk) 12:31, 12 August 2014 (UTC)

We've been over this before. Wiktionary does not have a policy or prohibition against original research, and just because you say it's unwanted doesn't mean it is. —CodeCat 12:33, 12 August 2014 (UTC)
No we haven't been over this. Many have objected to this practice. And what you're doing here is something entirely different - bending different (legitimate and scholarly-supported) theories into something original and thus useless, but seemingly supported by references. And you do it repeatedly, without discussion, and when it's reverted you revert back to the disputed version containing your original research, claiming that the disputed version should be discussed first before reverting. --Ivan Štambuk (talk) 12:38, 12 August 2014 (UTC)
Yes, because I disagree with your moves. You also disagree with mine. So we're at a standoff. That's why I called for a discussion, to form a real consensus on WT:AINE-BSL rather than to just edit war over it. This discussion is not going to get us anywhere as long as it's just the two of us. —CodeCat 12:41, 12 August 2014 (UTC)
I object to CodeCat placing their unsourced original theories where sourced theories exist. --Dan Polansky (talk) 12:54, 12 August 2014 (UTC)
As for Wiktionary:About Proto-Balto-Slavic, each sentence present there that is not based on consensus should be tagged "[disputed]" or the like, to make it clear the page does not represent consensus. --Dan Polansky (talk) 12:56, 12 August 2014 (UTC)
Every single part of the Proto-Balto-Slavic reconstructions I created can be sourced. What is not sourced is the exact written form of the words. Instead, I converted them to use a common notation, just like we do for other reconstructed languages. I don't understand what is so controversial about it. —CodeCat 13:02, 12 August 2014 (UTC)
Okay. I object to CodeCat replacing (or renaming) particular forms that are sourced with particular forms that are unsourced. --Dan Polansky (talk) 13:04, 12 August 2014 (UTC)
Just to put things into context here. Are you suggesting that if a source attests, say, Indo-European *teutā or *teutéh₂, and someone creates an entry with that name, then we are not allowed to move that to *tewtéh₂ even if no source attests it in that exact written form? Because that's the equivalent of what I've been doing for Balto-Slavic. —CodeCat 13:10, 12 August 2014 (UTC)
Yes, that is correct, just that by "even if no source" you probably meant "since no source". The sourced exact written forms should prevail unless there is an overwhelming consensus to the contrary. --Dan Polansky (talk) 13:18, 12 August 2014 (UTC)
Well, I would say that we already have a consensus as WT:AINE already details how forms are to be normalised. Some of it I've written, but some parts of it were already there before (in particular the bit about laryngeals). There has not been any dispute about that practice, and it has been enforced by other editors as well, so I believe consensus can be assumed. So then I would conclude that there is, in fact, a consensus for moving those entries to the normalised form *tewtéh₂. Furthermore, there is also an established practice to normalise even attested languages, including most prominently Old Norse and Old English, but also languages with a prescribed standard orthography. So I can only assume that normalising spellings is a well-established practice for Wiktionary and if I was supposed to treat it as something controversial or disputed, I would have expected more evidence for that. —CodeCat 13:23, 12 August 2014 (UTC)
Given my past experience with you, I don't believe a single word that you say about "consensus". So please deliver objective evidence of consensus; I will not consider any consensus claims made in the absence of such objective evidence. --Dan Polansky (talk) 13:26, 12 August 2014 (UTC)
You can't prove a negative. Consensus exists through the lack of dispute. As there has not been any dispute regarding the normalising of spellings in PIE, consensus can be assumed. —CodeCat 13:29, 12 August 2014 (UTC)
Re: "Consensus exists through the lack of dispute": Absolutely not. Either an overwhelming common practice or a discussion is a prerequisite for there being a consensus; both can be demonstrated by objective evidence. Since we now know that your consensus claims are based merely on your perceived "lack of dispute" and conveniently fit your long-standing pattern of mass editing without consensus, the need for you to provide objective evidence has been corroborated. --Dan Polansky (talk) 13:33, 12 August 2014 (UTC)
So then there is actually no consensus between us on what consensus is. That's going to be difficult. —CodeCat 13:36, 12 August 2014 (UTC)
Re: "Consensus exists through the lack of dispute": That is an absolutely outrageous view of consensus. In any event, obviously we now have evidence of lack of consensus. DCDuring TALK 13:37, 12 August 2014 (UTC)
For what it may be worth: AINE and AINE-BSL before CodeCat edited them. The latter was only edited by Ivan before. Keφr 14:40, 12 August 2014 (UTC)
What should also be looked at is how many of the entries that existed at that time followed the policy on those pages. I'm fairly sure that when Ivan created the page for PBS, it was when the dispute had already started, and most of the Balto-Slavic entries that existed at the time did not in fact adhere to the practices that Ivan was detailing. So it was not an attempt to codify practices but to establish his own as canonical in contradiction to what was already present on Wiktionary. The edits I made after that corrected that, while also inserting practices I felt were more reasonable, but were not established by anyone prior to that. For the PIE page, I believe at the time most of our entries (there weren't that many yet) also didn't adhere to the spelling norms on that page. So I probably edited the page to reflect the reality, although it's long ago so I don't really remember. —CodeCat 22:08, 12 August 2014 (UTC)
User:CodeCat: Maybe you should compile a list of those instead of saying "fairly sure". Apparently some people here do not take your "fairly sure" very seriously. Which is not necessarily the problem with those people. Keφr 10:09, 13 August 2014 (UTC)
What you did is 1) changed the policy page removing the stuff you don't like and overriding it with your original research 2) mass renamed a bunch of pages, and edited the references to them in the articles 3) undid my reverts when I challenged those changes.
Your claim that "most of the Balto-Slavic entries that existed at the time did not in fact adhere to the practices that Ivan was detailing" is nothing but lies. Most of them were referenced except for the reconstructions that are a figment of your imagination and cannot be found anywhere in the literature, and which are not deleted due to your interpretation of "no policy against original research" = "I can do whatever I like".
And now again you mention spelling norms which this has nothing to do with. These different reconstructions represent completely different theories and cannot be unified under a single "normalized" spelling. I explained that countless times. --Ivan Štambuk (talk) 10:32, 13 August 2014 (UTC)
  • The so-called "normalized spellings" argument is a red herring, and an attempt to push a particular POV. Writing a instead of o, ź instead of ž, or writing a glottal stop sign ʔ or not is not merely a "normalization of spellings"; these represent completely different protolanguages, reflecting different theories by different linguists. Usage of innocent terms such as normalization is merely an attempt to trivialize the implications of such edits. These differently reconstructed protolanguages in fact represent completely incompatible theories and cannot be reconciled via notational convention. It's not like w = u̯ in Proto-Indo-European, in the example given by CodeCat above. --Ivan Štambuk (talk) 14:29, 12 August 2014 (UTC)
    • As far as I know, the majority of Balto-Slavic linguists accepts the existence of a so-called "acute" register for Proto-Balto-Slavic. The use of *ś rather than *š reflects a real phonetic difference in Proto-Balto-Slavic (that of PIE *ḱ versus *s + RUKI) and this difference is maintained in Slavic as *s versus *x/*š, and I'm not aware of any dispute about this either. I'm less certain about *o versus *a, but using *a in all cases seems like the more conservative approach, at least until more sources start popping up supporting *o. So far I've only seen Kortlandt's arguments, but his theories are hardly mainstream. —CodeCat 14:36, 12 August 2014 (UTC)
      Indeed they do, but most do not treat them directly as glottal stops: usage of ʔ by the Leiden school is in a completely different framework (glottalic theory of PIE, PIE *H > segmental *ʔ) than others who e.g. claim that it was just a phonologically redundant feature in long syllables, or perhaps not (when preserving the *V: vs. *VH distinction). I was referring to from PIE palatovelars and not the RUKI-induced . Regarding *a vs. *o: the only Proto-Balto-Slavic dictionary published since Trautmann's 1927 book uses *o, so it's rather extreme to impose *a like you've been doing. Different reconstructions = different theories, and NPOV requires us to abandon any attempts at notational "normalization" and treat all of the incompatible sources equally. --Ivan Štambuk (talk) 15:26, 12 August 2014 (UTC)
      The use of a superscript glottalization symbol is not meant to indicate that it was indeed a glottal stop of any kind. It's just an abstract symbol that stands for the acute register, whatever its nature was. As for *ž, I realise that you meant that, but it doesn't make much sense to write *ž < PIE *ǵ but *ś < PIE *ḱ. We should use the same diacritic for both of them. And the sources I've seen so far mostly indicate the PIE palatovelars with an acute accent. If we indicate them as *š and *ž instead, then what symbol shall we use for the RUKI-induced variation of *s? Concerning *a versus *o, the problem is that if we distinguish them only sometimes but not other times, that's going to confuse users who may think there is real significance to this. They might think, quite reasonably, that if one noun ends in *-os and another ends in *-as, that this reflects some real difference rather than different theories. Therefore, I opted to not distinguish them in the normalisation, so that no false impressions are given about this. We could of course normalise in the other direction, but the issue there is that the *o versus *a distinction is not reconstructable in the majority of cases. —CodeCat 15:44, 12 August 2014 (UTC)
  • I think it's high time that the original research policy on etymologies be formulated and voted on. If at any point a legitimate and referenced scholarship that I added could be overridden by some anon on the basis of a WT:ES discussion I want to know it so that I know what I don't want to waste my time on. --Ivan Štambuk (talk) 16:22, 12 August 2014 (UTC)
Why should proto languages be exempt from all attestation rules? It's true that they can't be attested in the same way as attested languages but that doesn't allow us to just use invented forms. Wiktionary is no scholarly platform for linguistics, if you want that, go to Wikiversity. This isn't the first time CodeCat's behavior over here has been criticized - soon I'll be at a point where I am forced to take action in form of a WT:VOTE. -- Liliana 16:27, 12 August 2014 (UTC)
I would love to see that. While CC's OR Finnic roots don't bother me all that much, as I once told her, it would be better for both her and Wikt. if she used an appropriate platform for her OR, e.g., Amazon self-publishing (I might even buy it.)
And then there is the much more serious issue of meddling with actual referenced etymologies: see diff, where arguably the most authoritative source on lv etymologies – Karulis – was replaced by a root from god knows where, all the while the superscript [1] at the end would have a reader believe that it was, in fact, Karulis who offered that root; essentially we end up with what is called (I think) a fabricated citation on Wikipedia. Removing intermediate steps (as in diff) in a referenced etymology also escapes me: there are two homographic terms with completely different meanings, so why would trimming down a referenced etymology be a good idea?
Re: OR Uralic roots, I was disappointed by CC's ignoring of important corollary information from authoritative sources when crafting her OR root appendices. For example, the etümoloogiasõnaraamat was explicit in the fact that the Finnic word for "shoulder; help" is ultimately an Iranian borrowing, which for some reason she saw as not worthy of inclusion in her appendices for this root (e.g., Appendix:Proto-Finnic/api), and I'm lost as to why... Neitrāls vārds (talk) 18:14, 14 August 2014 (UTC)
*pūteiti is supposed to be the infinitive form, except that the infinitive for Proto-Balto-Slavic verbs cannot be reconstructed because Old Prussian evidence doesn't agree with East Baltic and Slavic. But let's just ignore that conveniently in order to "fix" the inherent flaws of the tree model of language change and "normalize" entries... --Ivan Štambuk (talk) 16:30, 16 August 2014 (UTC)

What is consensus?

There is some disagreement about the exact nature of consensus in the above discussion. Specifically, the debate is over whether lack of debate implies consensus on a particular thing. My view is that it does, as we generally tend to follow the practice that an edit is ok until someone reverts it or complains about it. So my question is, in the absence of any discussion, can consensus be assumed? And if not, what should be done with the many unwritten and undiscussed rules that were never formally "consensusified"? Also, if I'm correct that consensus is needed for every edit on Wiktionary, what does that mean for the millions of edits on mainspace entries for which no discussion was made beforehand? Is requiring explicit consensus for every change workable? —CodeCat 13:47, 12 August 2014 (UTC)

  • This is a clarion call for instituting the BRD process they have at Wikipedia. An undiscussed edit is a "bold" edit. If another editor disagrees with that edit, he/she can revert it. At that point, you discuss. If no one disagrees with a bold edit, it isn't discussed. Purplebackpack89 13:57, 12 August 2014 (UTC)
  • A planned massive change cannot be claimed to be supported "by consensus" if there is no discussion and no evidence of overwhelming common practice, merely "lack of dispute", and the lack of dispute is caused by the fact that the change was not proposed in a public forum in the first place. As for the need for discussion, mass changes absolutely should not be put on par with single edits of mainspace articles. When specific claims of consensus are made without reference to a discussion or a vote where people expressed their agreement, the consensus is less certain but still possible, and can be proven by pointing to a long-standing overwhelming common practice, a sample of which can be provided by the claimant. When such a hypothesis of consensus is presented in a public forum, the rest of the editors can try to find a significant volume of refuting counterexamples to the putative common practice claim.

    The "dispute", "consensus" sequence is the opposite one: if I make an edit in mainspace and no one opposes it, but no one becomes aware of the edit either, there is no point in talking about consensus. It is only after there is at least a shred of dispute that talk about consensus and consensus forming becomes meaningful in the first place. --Dan Polansky (talk) 14:06, 12 August 2014 (UTC)

    • I think we should write WT:Consensus and have it approved by vote. —CodeCat 14:23, 12 August 2014 (UTC)
      • You had better write User:CodeCat/Consensus, so that everyone can know that, by your lights, a proposal that you did not even make is automatically supported by consensus since no one managed to dispute it. I think trying to write WT:Consensus could have some nasty repercussions, since it delves into meta-levels and infinite regress; it is this infinite regress that you are abusing here. --Dan Polansky (talk) 14:30, 12 August 2014 (UTC)
        • Wikipedia does fine with their w:WP:Consensus, and I think we need something similar. In fact, I think copying and amending it would be good. —CodeCat 14:39, 12 August 2014 (UTC)
  • Let us return to the substance. The problem with w:WP:Consensus is that it is bullcrap not so great. Consider this: "Any edit that is not disputed or reverted by another editor can be assumed to have consensus." That cannot be the case. A consensus is a state of general agreement. No one can be thought to agree with an edit of which they are not even aware. As long as the edit is undisputed, it is not known whether it is supported by consensus, but in the absence of indication to the contrary, there is no need to revert the edit. Agreement and disagreement with an edit are only possible after the edit has come to the attention of the person agreeing or disagreeing. --Dan Polansky (talk) 15:29, 12 August 2014 (UTC)
  • Well, there was opposition to you doing OR in etymologies, but that didn't stop you from continuing to do so. Do you take no longer being reverted to mean "consensus was formed"? --Ivan Štambuk (talk) 15:40, 12 August 2014 (UTC)
  • The sense in which we have "consensus" at Wiktionary when folks fail to object to something is reminiscent of a "consensus" of users in response to arbitrary changes of user interface on must-use systems such as some of those of the federal government or Google or one's IP provider. One of the points of a wiki is to elicit contributor commitment by getting beyond such supposed consensus to one based on authentic participation. That users continue to participate in Wiktionary despite unsatisfactory changes and unsatisfactory methods of enacting changes wrought by technical contributors is a tribute to the preexisting commitment those users have because of the wiki idea and their prior efforts. It is also a tribute to the burgeoning complexity of the way in which many aspects of Wiktionary are implemented, which concentrates power in those very few who have both the time and the motivation to enact that complexity. Whether that complexity is actually necessary rather than a way of increasing the power of the motivated contributors is now moot. We now are stuck with the complexity and are held hostage to the whims of such contributors.
So our "consensus" seems to reflect the realities of power, more than anything else, though laziness and weakness of commitment to the project may also shoulder some of the blame. CodeCat is simply making that explicit. DCDuring TALK 16:59, 12 August 2014 (UTC)
Mostly the latter I think, as you can probably see by observing our "consensus-building" processes. You often either have the discussion stuck going nowhere because people are unable to agree on a minor detail, or a complete lack of participation or "meh whatever" responses. And then someone uses a pretentious Latin phrase to justify reverting any changes. I am not surprised at all that CodeCat tends to bypass those venues. All this apathy can be really frustrating. Keφr 10:56, 13 August 2014 (UTC)
Insinuations may sound great, but do not belong on an open wiki and in an open discourse. The "someone" would be me. The Latin phrase would be "no consensus => status quo ante". The phrase "status quo" has been used over the years repeatedly by other editors as well, and has been an important principle that I have not introduced. As for e.g. Wiktionary:Votes/bt-2009-12/User:JackBot (to which you are non-transparently referring above, which is a poor practice), 6 editors participated; since the vote did not state the task for the bot, I would have opposed as per Ruakh and msh210 there; I don't know what you are complaining about. We do not need never-ending repeated mass changes, especially those with significant opposition; making many changes is enjoyed by immature juveniles, who avoid the real building of the real dictionary, unlike e.g. SemperBlotto or Equinox. I am sure the reusers and parsers of Wiktionary data (there are some) do not enjoy incessant changes either. --Dan Polansky (talk) 11:16, 13 August 2014 (UTC)
  • Blimey! How is it that the talk page of the article I'm editing on Wikipedia about the Israeli-Palestinian conflict is more collegial than this? I am tempted to collapse the entire last fourth of this month-subpage for generating "more heat than light". I will start blocking editors if they continue to speculate so incivilly about other editors' gender or genitalia; such speculation is not only irrelevant to our stated goal of making a dictionary but outright harmful to that goal, because it creates an exceedingly hostile environment that turns off potential contributors and tries to drive away existing contributors (it did drive Cloudcuckoolander to leave for a while, IIRC). It meets every criterion of our written blocking policy; it directly and deliberately hinders/harms the progress of Wiktionary in the way aforementioned, it has continued despite less drastic means of stopping it being attempted, it wastes everyone's time and it causes editors distress by directly insulting them and being continually impolite towards them.
    Some of you clearly feel that some of CodeCat's edits, and her tendency to implement them without discussion and even in the face of opposition, are also harmful to the project (and it was said above that they drove Ruakh to scale back participation in the project); if you would like to propose to desysop CodeCat, or block her if she continues her own behaviour, those options remain open [to all of you and to those of you who are admins, respectively], though continuing a civil discussion seems like a better course of action. (But, to be clear, CodeCat's behaviour and the misgendering above are not comparable.) - -sche (discuss) 21:46, 12 August 2014 (UTC)
    Thank you. Keφr 10:02, 13 August 2014 (UTC)
    I believe that many do not believe that we can desysop CodeCat without even more damage to the project, as the skimpy documentation for our system would make it difficult to maintain. I think this is probably a wrong belief, as shown by our operating successfully without CodeCat's presence after the last conflict, but who wants to test it for an extended period? I think our ever-increasing dependence on complex modules and relentless "tidying" of templates that would compete with such dependence has the effect of increasing CodeCat's power over the project. DCDuring TALK 22:07, 12 August 2014 (UTC)
    I think DCDuring said we use too many templates and they're too complex. If that's what he said, I agree. Purplebackpack89 22:16, 12 August 2014 (UTC)
  • For the uninformed: -sche (sic) is one of the major enablers of CodeCat's editing without consensus, and himself guilty of repeated controversial mass editing without consensus. --Dan Polansky (talk) 22:28, 12 August 2014 (UTC)
    Right. User:-sche apparently blocked me for "speculating about CodeCat's gender", while at the same time we can see threats and ad hominem attacks against editors like User:Purplebackpack89 go unpunished. --Ivan Štambuk (talk) 14:41, 14 August 2014 (UTC)
    To clarify, I blocked you because you are the only person I've seen continue to speculate after I asked (warned) people to stop that particular long-practised irrelevant/harmful behaviour. (I decided not to institute ex post facto blocks to other users as long as they stopped, and so far they have.) I am none too happy about Kephir calling Purplebackpack a "lying illiterate troll", but I hope that sort of incivility can be discouraged by discussion. (Also, I am not the only one with a mop, other people could step up to the plate and issue warnings and blocks if ad hominem attacks continue...) - -sche (discuss) 16:34, 14 August 2014 (UTC)
    There was already misunderstanding regarding my usage of the personal pronoun he, so it was necessary for me to elaborate on that. I simply declared my position on a topic I didn't bring up in the first place, so if you want to issue warnings and blocks you're barking up the wrong tree. --Ivan Štambuk (talk) 16:19, 16 August 2014 (UTC)
  • As someone who has interacted with both editors (CodeCat and Polansky) in both negative and positive terms, I can say I am a neutral and impartial party in this conflict. Nonetheless, although I have also encountered posturing behaviour by Polansky, I think that has been balanced out by the helpful lessons he's given me on how guidelines on Wiktionary work. Pass a Method (talk) 14:46, 14 August 2014 (UTC)

Superprotect certain pages such that sysop permissions are not sufficient to edit them

Users may be interested in [20], which creates a new protection level called "superprotect" for "protecting pages such that sysop permissions are not sufficient to edit them". The new protection was developed after a series of events on en.WP, and was applied to de.WP's MediaWiki:Common.js after a series of events there. (De.WP held an RFC on Media Viewer and found that consensus was for it to be disabled by default and opt-in rather than enabled by default and opt-out, a de.WP admin implemented that consensus via w:de:MediaWiki:Common.js, and an edit war occurred between that admin and another admin + a WMF person. Events on en.WP were similar, except en.WP admins didn't edit-war.) There has been some discussion of the new protection level at en.WP (permalink to current revision), though it has generated more heat than light, and there is an RFC on Meta.
You may also be amused by this; if you don't speak German, the key bit (after the initial post by Bene* in English) was BHC's reply "does Bene* even have the right to edit MediaWiki:Common.js now?"
- -sche (discuss) 19:04, 13 August 2014 (UTC)

I'm sure that's a tool CodeCat would love. -- Liliana 19:15, 13 August 2014 (UTC)
All power to the technocrats. DCDuring TALK 19:22, 13 August 2014 (UTC)
I think the idea is to lock a page for a few days, during which nobody can edit, discuss what the right thing to do is, and then do the right thing when the protection ends. I can get behind that. Purplebackpack89 19:24, 13 August 2014 (UTC)
What we really need is shadowbanning ;) Equinox 19:28, 13 August 2014 (UTC)
Yeah, then we coulda banned Mglovesfun so I wouldn't have had to read all his low-level digs at me. Purplebackpack89 20:25, 13 August 2014 (UTC)
Was that comment really necessary? -- Liliana 21:14, 13 August 2014 (UTC)
Yes. It serves as evidence that PBP's favourite pastime here is not dictionary-building, but trolling. Of which I think there is abundance already, but whatever. Also, meet kettle, pot. Your remark about CodeCat above in this section was equally gratuitous. Keφr 21:19, 13 August 2014 (UTC)
Kephir, calling me a troll is inaccurate (witness how many entries I've created in the last 24 hours alone) and is evidence of your continual campaign to get me banned from the project for relatively innocuous edits. I have told you numerous times that interacting with me is unproductive. One more remark like that and I will request a one-sided interaction ban on you interacting with me. Purplebackpack89 21:28, 13 August 2014 (UTC)
A "one-sided interaction ban" sounds absurd. Googling the phrase only finds you demanding them in a few places. Equinox 21:30, 13 August 2014 (UTC)
But don't you agree with the general idea that things would be better off if Kephir stopped interacting with me? Purplebackpack89 21:35, 13 August 2014 (UTC)
Oh c'mon. Do you really need to bring your seemingly-unlimited paranoia in here? -- Liliana 21:40, 13 August 2014 (UTC)
Paranoia, Liliana? Dude tried to remove my autopatrol and rollback rights less than 48 hours ago (see above discussion where he was shouted down). He has been targeting me for months since that disruptive pump thread a month and a half ago. He freely admits to monitoring my edits. And he just accused me of primarily being a troll, which is a flat-out lie. Purplebackpack89 21:44, 13 August 2014 (UTC)
Since you mentioned that, you have edited a whopping number of six pages in main namespace in the last 24 hours; what do you want, a biscuit? As for you being a troll, your ban at SEW and subsequent lack of remorse for what caused it should be enough evidence of that.
Also, if you really thought that interactions with me are such a waste of your time, you could simply avoid having them and ignore me. Simple as that. Keφr 22:15, 13 August 2014 (UTC)
How many new entries, Kephir? Three new entries in the last 24 hours (one of which had multiple definitions).
I cannot ignore you because you keep inserting yourself into my editing, even though I've asked you many times not to do so, and am asking you again. You have admitted to monitoring my edits, not to better content but to find dirt on me. A perfect example of this is your bullshit removal of my rights a day and a half ago, even though it was blatantly clear that the punishment didn't fit the crime, and you did not have a community consensus to do so. So, I ask you once more: will you voluntarily stop interacting with me for the good of the community? Because it's crystal clear that you continuing to interact with me is unproductive. Purplebackpack89 22:21, 13 August 2014 (UTC)
Three entries? Whoop-de-doo. Wonderfool makes more in 15 minutes. What does it prove? "You have admitted to monitoring my edits, not to better content but to find dirt on me" — show a quotation of me saying that. Keφr 22:49, 13 August 2014 (UTC)
In the only discussion I've had with you on your talk page. If you were merely looking at edits for errors, it wouldn't matter who created them. I take it you're not going to agree to stop monitoring my edits, nor to stop using your tools in a way that results in a wheel war? What a shame. I thought you'd be the bigger man. Purplebackpack89 22:54, 13 August 2014 (UTC)
I asked for a quotation. If you fail at reading comprehension, you should not participate in writing a dictionary. Keφr 22:58, 13 August 2014 (UTC)
Your quote "Also, stop randomly jumping accounts for no reason, it makes it harder to review your "contributions", for lack of a better word." would indicate that you have been "reviewing" my contributions. But quotation, schmotation. I ask you once more: are you going to stop doing it? Purplebackpack89 23:01, 13 August 2014 (UTC)
I noticed two different accounts using the same signature at random. Did I "admit to monitoring [your] edits, not to better content but to find dirt on [you]" as you claimed? No. Lying illiterate troll. Keφr 23:21, 13 August 2014 (UTC)
The Wiktionary community has the right to review any edits anyone makes; this is a public project. I would be happy if someone were reviewing my edits, because any mistakes I make would be fixed. The quote about PBP from his ban on the Simple English Wikipedia (which Kephir linked to above) that I think basically sums him up very well is "it's clear that he cannot collaborate in a constructive fashion". Even though his mainspace edits may be productive, he is incapable of dealing with criticism. --WikiTiki89 23:39, 13 August 2014 (UTC)
I'm supposed to let a wheel war over my permissions and being called a "lying illiterate troll" roll off my back? I'm not the problem here, Kephir is. Kephir cannot collaborate in that he continues to call me a troll even after I've asked him to stop interacting with me altogether. Purplebackpack89 23:45, 13 August 2014 (UTC)


@Purplebackpack89: If you continue to target other users in the BP, I will block you. --WikiTiki89 23:43, 13 August 2014 (UTC)

PBP, if you continue to comment here, I would too. Wyang (talk) 23:46, 13 August 2014 (UTC)
You're going to block ME because another editor attacked me? That seems unfair. Purplebackpack89 23:49, 13 August 2014 (UTC)
Why don't you scroll up and look at who is starting all of the antagonism in these discussions (hint: it's you). --WikiTiki89 23:52, 13 August 2014 (UTC)
Except I wasn't talking to Kephir, and he went in there, and called me a troll. Heck, the Mglovesfun comment wasn't even altogether serious, and that was blatantly obvious! I can't believe you think that it's OK for Kephir to do that, or to wheel war over my permissions. He acts abominably towards me, and it has got to stop immediately. Purplebackpack89 23:54, 13 August 2014 (UTC)
Please don't propose ridiculous "interaction bans", and I will not have to strike them out. Notice I said "discussions" in the plural. I don't think anyone here got the humor in your remark about MG. --WikiTiki89 23:59, 13 August 2014 (UTC)

This Purplebackpack89 is very similar to User:Razorflame — no substance, disruptive, irreverent, spoilt brat. And what do they have in common? They are both American children. I blame the American school system. No European or Asian youth would behave like this. --Vahag (talk) 07:00, 14 August 2014 (UTC)

I think that's not a good generalisation, Vahagn. It won't get us anywhere. There are good and bad, well-educated and spoilt people everywhere. (No comment on the topic at hand). --Anatoli T. (обсудить/вклад) 07:08, 14 August 2014 (UTC)
This is exactly what's wrong with liberals like you, Anatoli — false equivalency. "All religions are peaceful", "all cultures are good", etc... It is only America that instills in its children an undeserved sense of specialness and entitlement. You know very well that in our societies someone like Purplebackpack89 would immediately eat a couple of slaps in the face and wouldn't dare raise voice against such a valuable editor as Kephir. --Vahag (talk) 10:20, 14 August 2014 (UTC)
First of all, I'm not defending Purplebackpack89, I'm just saying that it is wrong to judge people by their origin. Using slaps in the face has little to do with being liberal; use of force is often justified. Skinheads in Russia killed hundreds of people from the Caucasus (including Armenians) and Central Asia, just for having the wrong looks and accents, for being generalised as "less civilised" or having bad behaviour or upbringing. I don't always justify American politics either, but it's Russia that now instills in its children an undeserved sense of specialness and entitlement. Russia may treat Armenia better than other neighbours, but don't be fooled: Putinism at this stage can't have real friends, allies or simply partners; it can only have vassals or enemies. Real liberal societies, which you dislike, give a chance to everyone, regardless of where they come from, and the slap in the face goes to the one who deserves it, no matter where they come from. I work and socialise with people of different races and colours in a country which treats everyone equally, more so than the US or Russia, and I don't see any problem with that. You would think the same way if you lived in a friendlier environment. I don't blame you for your views. I don't want to be involved in political debates or discuss which nation is better; I'm just replying to your comment "No European or Asian youth would behave like this". This is funny really; people behave badly "in our societies", even if they are beaten. I am actually having trouble finding civilised and open conversations in runet (the Russian Internet); besides, it's problematic to punish someone on the Internet for bad behaviour. --Anatoli T. (обсудить/вклад) 06:02, 15 August 2014 (UTC)
You never get my trolling, Anatoli... And by the way, regarding modern Russia — I'm probably the biggest Russophobe you know. --Vahag (talk) 06:22, 15 August 2014 (UTC)
Well, в каждой шутке есть доля ... шутки ("in every joke there is a grain of ... joke", a twist on the saying "in every joke there is a grain of truth"). I do get your trolling, but not always, I admit. Your jokes get you into trouble, so I'm not the only one who doesn't always get your trolling. :) I am a Russian Russophobe, as far as Russian politics go, even though I was born in Eastern Ukraine and lived in Russia most of my life. --Anatoli T. (обсудить/вклад) 06:38, 15 August 2014 (UTC)
@Vahagn Petrosyan:, you've crossed the line with your comments. For one, it's a personal attack to call somebody a "spoilt brat". It's also inaccurate to say I'm without substance, as I have created over 100 entries. Purplebackpack89 13:32, 14 August 2014 (UTC)
Purpleback reminds me a little of Wonderfool too. Socially naive, mostly well-meaning, and occasionally a genuine asshole. Definitely could do with some more mainspace edits tho. --ElisaVan (talk) 01:31, 15 August 2014 (UTC)

Personal word list/vocabulary list?

I use Wiktionary as my learning tool for learning foreign languages.
I am using the "Watchlist" feature as my word list but I think we need a proper personal word list.
If people could create a word list and then choose to add words that they looked up to the list, it would help them remember those words better. Burkhankhaldun (talk) 07:17, 14 August 2014 (UTC)

Hit Ctrl-D. (This is not a prank. That would be Ctrl-W.) Keφr 07:34, 14 August 2014 (UTC)
You could simply edit your own user page and add the words of interest there; it would also allow grouping and formatting using the wiki markup. Equinox 07:44, 14 August 2014 (UTC)
What Equinox said. :) If you click the button at the top of the edit window that says "advanced", the icon on the far right of the menu that appears will even help you add a sortable table if you want to put in both foreign-language words and translations, and be able to sort them. Cheers! - -sche (discuss) 16:38, 14 August 2014 (UTC)
Userspace should be used for activity related to the improvement of the project, not as a personal diary or a learning tool. That kind of functionality is outside the purpose of this project. --Ivan Štambuk (talk) 14:55, 16 August 2014 (UTC)

Not renaming template pedialite

I have only now noticed that "pedialite" is being renamed in the mainspace to "projectlink|wikipedia", like in diff. I object. Let me also repeat my sense of exasperation about the perpetrator of that renaming. This is very angering, and I feel helpless. I don't think I would quit Wiktionary about this, since Wiktionary is too great a project regardless, but the exasperation does move me in that direction. --Dan Polansky (talk) 09:20, 14 August 2014 (UTC)

Short markup is highly beneficial to editors who actually create entries. (I suppose it makes no odds to those who primarily work on templates. Ha.) Every time I have to start using {{cx|cookery|lang=en}} instead of {{cookery}}, or {{l|en|-mone}} instead of [[-mone]] (or have to dig my edit cursor through masses of such code generated by others), I feel a similar frustration to Dan's. I don't care about the final underlying representation — it could be a huge complex XML document — but the stuff I type in a text box, including others' work as presented to me for ongoing editing, should omit the complexities. Editing facilities do not seem to be keeping up with internal changes. Equinox 09:28, 14 August 2014 (UTC)
Amen. Content seems to be taking a back seat to a single individual's urge to "tidy". DCDuring TALK 12:21, 14 August 2014 (UTC)
And yet, when things are confusing, people wonder why we don't attract many new editors. My efforts to increase consistency and tidiness are aimed at reducing the substantial mental load that comes with editing Wiktionary, so that it becomes more accessible to newcomers. Wiktionary is too strongly biased towards its existing user base. —CodeCat 13:03, 14 August 2014 (UTC)
  • I object as well. The confusing thing is that Template:pedia and Template:pedialite are redirects to something, rather than being templates themselves. I seriously doubt that any editor would understand why projectlink exists at all. The reason that Wiktionary is too strongly biased to its existing user base is an over-reliance on an increasingly few templates, and a lack of redundancy in templates. Purplebackpack89 13:22, 14 August 2014 (UTC)
    • I see it as the opposite. I see having too many templates that perform similar functions as the problem. Redundancy should be eliminated, not increased. The more alike templates are, and the fewer different kinds with subtle differences there are, the easier it is for new editors to learn them. But to address the point about pedia and pedialite specifically, I was not going to delete them. I was only converting them to something equivalent. After all, I've seen other people's bots convert {{cx}} to {{context}}, which is no different. It's just eliminating the shortcut, as shortcuts are intended primarily to give editors less to type. —CodeCat 14:25, 14 August 2014 (UTC)
  • This is a common tactic: first a "redundant" template is orphaned by bots, and afterwards they are listed for deletion on the obscure template discussion board which barely anyone keeps track of, and after a discussion involving one or two editors they get summarily deleted. You take a 1 month wikibreak and suddenly templates that were stable for years, had short and easily recallable names are all gone and replaced with some verbose monstrosities. --Ivan Štambuk (talk) 14:50, 14 August 2014 (UTC)
  • After a fortuitous e/c that illustrates my point.
    @Purplebackpack: You seem to misunderstand. You should know that the only important confusion is that faced by those who control the system, not any confusion on the part of actual users or contributors. That confusion is asserted to affect new contributors, though a moment's thought would clearly show the assertion to be implausible.
Most new potential contributors are likely to be contributing in one or two languages, their native language or their native language plus English. Any uniformity of system design across languages makes no difference to them. Redirecting templates are the means of getting the best out of our technical infrastructure while allowing customization for individual languages.
In the case at hand, we have redirecting templates applied to a different class of items. It would be easy enough to address the naming inconsistencies among our project-linking templates. The problem is that CodeCat/Mewbot fails to listen to or heed objections, let alone seek input in advance. I can only conclude based on the consistent pattern of behavior that CodeCat can't handle disagreement and avoids it by doing elaborate end runs and using technical malarkey to put an end to any discussion that might lead to a frustration of ambitions or whatever. I suppose that we can expect various neuroses (or worse) among contributors to become evident over time. It is only when the resulting behavior causes problems or inconvenience for others that we are entitled to object. This seems to be one of those times. DCDuring TALK 15:03, 14 August 2014 (UTC)
I see value in merging templates in the backend, but there is no reason that {{pedialite}} can't be kept as a redirect. I also see no value in mass-converting uses of this template. --WikiTiki89 15:11, 14 August 2014 (UTC)
I agree with Wikitiki on all points. Improving the backend of the template is good. The (semi-)memorable name ({{pedialite}}) should be kept as a redirect, and RFDed if someone wants it to be deleted. Renaming existing usages is not harmful, but it's not helpful or necessary, either. (Ditto renaming existing usages of {{cx}}, as Mglovesfun did.) - -sche (discuss) 18:36, 14 August 2014 (UTC)
I noticed that DP is now running an unapproved bot making undiscussed edits to many pages. I find it ironic that in trying to undo what he considers my wrongs, he commits those very same wrongs himself. Apparently the rules don't apply, and there's no need to discuss anything, if you already know you're right? This kettle disapproves of the pot. —CodeCat 20:35, 22 August 2014 (UTC)
A neat trick of yours. It shows your modus operandi most clearly. I am merely restoring the state before your undiscussed changes. I do not need to actively seek consensus to restore the status quo ante. --Dan Polansky (talk) 20:42, 22 August 2014 (UTC)
Thank you for confirming that you did not intend to discuss your edits or your bot with anyone. —CodeCat 20:44, 22 August 2014 (UTC)
Again, I am not instating a change; I am abolishing a change. This very thread shows the degree of consensus or its lack for the change that I am abolishing. --Dan Polansky (talk) 20:59, 22 August 2014 (UTC)
Can you show the vote or other kind of discussion that demonstrates that there is consensus for mass-reverting other editors without discussion and with an unapproved bot? —CodeCat 21:10, 22 August 2014 (UTC)
I recall instances of unilateral reversal of changes to individual entries to the status quo ante. But I don't see any particular reason why the same should not apply to multiple bot-installed items. I see no particular reason why any single person's unvoted-on changes deserve any special protection from reversion or reversal. In the case of someone who simply institutes changes without any consensus, motivated principally by a purely personal compulsion to tidy, and then leaves to others the task of completing the change (redlinked categories come to mind), reversion would seem to be warranted, and a failure to revert would reward bad behavior. DCDuring TALK 21:57, 22 August 2014 (UTC)
No, I cannot show you the vote showing that this wiki should be governed by consensus; I accepted that as a given when I joined the project. Re: "Unapproved bot": this is AWB running under a menial-work user account controlled by me, not a bot. --Dan Polansky (talk) 22:10, 22 August 2014 (UTC)

Debotting MewBot

FYI: Wiktionary:Votes/2014-08/Debotting MewBot. --Dan Polansky (talk) 09:34, 14 August 2014 (UTC)


How do I go about rendering Inupiak entries when a specific font is required to view the intended characters? When I copy and paste words from the original typeface into wiki entries, they are rendered with completely wrong characters. I don't want to create entries that aren't accurate, so it's probably best if I wait for a solution before creating more work for myself by having to change them later on! The font can be found here. —JakeybeanTALK 17:53, 14 August 2014 (UTC)

Use the proper Unicode characters: Ġ, ġ, Ḷ, ḷ, Ł, ł, Ł̣, ł̣, Ñ, ñ, Ŋ, ŋ. — Ungoliant (falai) 17:59, 14 August 2014 (UTC)
Brilliant, thank you. —JakeybeanTALK 18:29, 14 August 2014 (UTC)
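A small Python sketch (not part of the original exchange) can confirm which of the letters listed above are single precomposed code points and which are a base letter plus a combining mark. This is useful when moving away from a legacy font, whose glyphs may sit on unrelated code points. It uses only the standard `unicodedata` module; the `describe` helper and the `LETTERS` list are illustrative names, not existing Wiktionary tooling.

```python
import unicodedata

# The letters suggested above, written as escapes so the combining
# sequences are unambiguous. L/l with stroke and dot below has no
# precomposed code point, so it is a base letter + U+0323.
LETTERS = [
    "\u0120", "\u0121",              # G/g with dot above
    "\u1E36", "\u1E37",              # L/l with dot below
    "\u0141", "\u0142",              # L/l with stroke
    "\u0141\u0323", "\u0142\u0323",  # L/l with stroke + combining dot below
    "\u00D1", "\u00F1",              # N/n with tilde
    "\u014A", "\u014B",              # capital/small eng
]


def describe(letter: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point of the NFC form of `letter`."""
    nfc = unicodedata.normalize("NFC", letter)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in nfc]


if __name__ == "__main__":
    for letter in LETTERS:
        print(letter, "->", "; ".join(describe(letter)))
```

Because the helper normalizes to NFC first, a decomposed "L" + combining dot below is reported as the single precomposed letter U+1E36, while the stroked letters with a dot below remain two code points. That two-code-point form is expected and should display correctly in fonts that support combining marks.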

Empowering JackBot

FYI: Wiktionary:Votes/bt-2014-08/User:JackBot for bot status. --Dan Polansky (talk) 20:33, 14 August 2014 (UTC)

Old Provençal or Old Occitan?

Wiktionary knows of a language called "Old Provençal", which nowadays is normally termed "Old Occitan". Should it be renamed? Benwing (talk) 04:38, 15 August 2014 (UTC)

Support —CodeCat 13:03, 15 August 2014 (UTC)
@CodeCat: Since you repeatedly complained of people posting no rationales, do you have any? --Dan Polansky (talk) 13:47, 15 August 2014 (UTC)
I support the rationale that Benwing gave. —CodeCat 13:49, 15 August 2014 (UTC)
@CodeCat: Thank you. How can we verify that "Old Provençal" is nowadays normally termed "Old Occitan"? --Dan Polansky (talk) 13:51, 15 August 2014 (UTC)
My book "Introduction to Old Occitan" says re. Occitan vs. Provençal:
Occitan enjoys increasing acceptance in all the languages of scholarship on the subject (despite the resistance of Provençal partisans) [with a footnote here] and will be adopted here.
The footnote says
"Only among specialists outside France has Occitan come to be the generally accepted term for the language" (Field 233).
Since we are "outside France" then we should use Old Occitan. Note that we're already using Occitan for the modern language, since Provençal is properly speaking only one of the dialects of Occitan. Benwing (talk) 16:17, 15 August 2014 (UTC)
Thank you very much. I have now also looked at Old Provençal,Old Occitan at Google Ngram Viewer and W:Old Occitan, and support. I've seen User:Renard Migrant edit Occitan, so I am pinging him, in case he has input. --Dan Polansky (talk) 16:48, 15 August 2014 (UTC)
Support as well. --WikiTiki89 17:14, 15 August 2014 (UTC)

Removing the number sign (#) from voting templates[edit]

I propose to remove the automatic number sign (#) from the voting templates {{support}}, {{oppose}}, and others. There has been discussion about this before (can someone find where?) and I recall that the objections were that people might not precede the template with a number sign, but from looking at our votes, all of the uses I see have a number sign before the template. This will solve those annoying indentation problems with these templates, by putting all of the indentation control outside of the templates.

To clarify, this means that this will still work:

# {{support}} ~~~~

This will no longer work:

{{support}} ~~~~

But this will now work as expected:

: {{support}} ~~~~

As part of updating these templates, we can even merge the backends so that all of the voting templates will function the same way (such as {{vote delete}}).

--WikiTiki89 13:02, 15 August 2014 (UTC)

  • Support. I feel the templates should not generate the number, and should not be used like they are used in the second option above. It seems to me that the # sign is not part of the support itself, unlike the support icon and the text "support". And someone may want to write # Weak {{support}} ~~~~, and this should work seamlessly. --Dan Polansky (talk) 13:49, 15 August 2014 (UTC)
Support per nom and per Dan's point that the current format makes it hard to cast qualified votes like "weak support". A short previous discussion was here (prompted by edits to Template:vote-generic and other vote-setup templates). - -sche (discuss) 18:40, 15 August 2014 (UTC)

Additions/changes to Template:en-verb[edit]

Recently I converted this template to Lua, but without changing how its parameters work at all. But with Lua I think we can streamline it some and hopefully make it easier to use. The way I propose is as follows:

  1. For completely regular verbs, which add -s, -ing and -ed to the page name, nothing changes. You just specify no parameters at all.
  2. For slightly irregular verbs, you can specify only the first parameter (this is new):
    1. If the first parameter equals "es", then the present 3rd singular gets that ending instead of the normal -s. (example: smash {{en-verb|es}})
    2. If it equals "ies", then the final -y of the page name is replaced by -ies in the present 3rd singular, and by -ied in the past, while the present participle will end in -ying. (example: carry {{en-verb|ies}})
    3. If it equals "d", then the past ending will be that instead of the normal -ed. (example: free {{en-verb|d}})
    4. If it is a word ending in "es", then this ending is replaced with -ing and -ed to form the present participle and past. (example: recognize {{en-verb|recognizes}})
    5. If the first parameter is anything else, then it is taken as the stem to form the present participle and past. (example: plot {{en-verb|plott}})
  3. If the second parameter, and possibly the third and fourth parameters, are present, the template works as before. This is done for backwards compatibility. In particular, you can specify the present, present participle, past and optionally past participle directly using the four positional parameters.

CodeCat 13:32, 15 August 2014 (UTC)
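The first-parameter rules in the proposal above (points 1 and 2) can be sketched as a small function. This is a hypothetical Python illustration, not the actual Lua module behind {{en-verb}}: the function name, the dictionary keys, and the assumption that the past participle equals the past under these rules are invented for the example, and the backwards-compatible multi-parameter mode (point 3) is left out.

```python
from typing import Dict, Optional


def _forms(pres3sg: str, pres_ptc: str, past: str) -> Dict[str, str]:
    # Assumption: under these rules the past participle equals the past.
    return {"pres3sg": pres3sg, "pres_ptc": pres_ptc, "past": past, "past_ptc": past}


def en_verb_forms(pagename: str, param: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical sketch of the proposed one-parameter {{en-verb}} rules."""
    if param is None:                        # 1: fully regular (walk)
        return _forms(pagename + "s", pagename + "ing", pagename + "ed")
    if param == "es":                        # 2.1: smash -> smashes
        return _forms(pagename + "es", pagename + "ing", pagename + "ed")
    if param == "ies":                       # 2.2: carry -> carries, carrying, carried
        if not pagename.endswith("y"):
            raise ValueError(f"'ies' needs a page name ending in -y: {pagename!r}")
        stem = pagename[:-1]
        return _forms(stem + "ies", pagename + "ing", stem + "ied")
    if param == "d":                         # 2.3: free -> freeing, freed
        return _forms(pagename + "s", pagename + "ing", pagename + "d")
    if param.endswith("es"):                 # 2.4: recognizes -> recogniz + ing/ed
        stem = param[:-2]
        return _forms(pagename + "s", stem + "ing", stem + "ed")
    # 2.5: anything else is the stem for the participle and past (plot -> plott)
    return _forms(pagename + "s", param + "ing", param + "ed")
```

The exact matches for "es" and "ies" have to be tested before the general "ends in es" check, since both strings also end in "es". A consequence of this ordering is that a stem that itself happens to end in "es" would be read as a 3rd-singular form under rule 2.4, so a real implementation would need to decide how to handle that overlap.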

I agree with #1 and #3, but I think many of the things in #2 can be automated. #2.1, #2.2, #2.3, and #2.4 can all be detected automatically (in case of false positives, we can use {{en-verb|s}} to provide the default behavior). For #2.5, I think we can do something like {{en-verb|tt}}, {{en-verb|dd}}, etc. --WikiTiki89 14:15, 15 August 2014 (UTC)
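The auto-detection idea can be sketched as a function that guesses, from the page name alone, which first parameter an editor would otherwise have to type under rules 2.1 to 2.4. This is a hypothetical Python heuristic, not anything in the real module: the function name and the suffix patterns are assumptions, and the page name plus "s" stands in for the rule 2.4 parameter (as in {{en-verb|recognizes}}).

```python
import re
from typing import Optional


def implied_param(pagename: str, override: Optional[str] = None) -> Optional[str]:
    """Guess the first {{en-verb}} parameter (rules 2.1-2.4) from the page name.

    Returns None when the plain -s/-ing/-ed pattern should be used.
    """
    if override == "s":                            # {{en-verb|s}}: force the default
        return None
    if re.search(r"(?:s|sh|ch|x|z)$", pagename):   # smash -> smashes (2.1)
        return "es"
    if re.search(r"[^aeiou]y$", pagename):         # carry -> carries; play stays regular (2.2)
        return "ies"
    if re.search(r"(?:ee|ye|oe)$", pagename):      # free -> freeing, freed (2.3)
        return "d"
    if pagename.endswith("e"):                     # recognize -> recognizing, recognized (2.4)
        return pagename + "s"
    return None
```

As the comment about false positives anticipates, such a heuristic is not exact: tie falls through to the final-e case and would be treated like {{en-verb|ties}}, which under rule 2.4 yields *tiing instead of tying, so verbs like that would still need explicit parameters.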

Why not also add another parameter in the first place, which would be a number denoting the verb's position? So for example carry out would be like
Programmatically, it will first check if the first argument is a number and treat the 2nd, 3rd and other arguments as if they were the 1st, 2nd and so on (the arguments' meanings don't change).
Otherwise it will behave just like it does now.--Dixtosa (talk) 14:17, 15 August 2014 (UTC)
That is a good idea, but I think it would be better off as a named parameter: {{en-verb|p=1}}. --WikiTiki89 14:19, 15 August 2014 (UTC)
I didn't want to make the proposal too complicated technically speaking. We could still do that in a later proposal, but for now I'd rather focus on what is there first. —CodeCat 14:40, 15 August 2014 (UTC)
The technical side doesn't matter in the proposal. My suggestions simplify the interface that editors will use, by not requiring any arguments in most cases. --WikiTiki89 14:55, 15 August 2014 (UTC)
I'm more wary of making proposals that make too many changes. I've noticed in the past some people don't like changes, so I've tried to keep them to a minimum. —CodeCat 14:56, 15 August 2014 (UTC)
What you should have noticed was that people don't care about the internal changes, but about the external changes. And my suggestions will have fewer external changes. People tend to complain when you require extra parameters or change the names of templates or parameters, but not when the template is updated to do all the work for them, while supporting backwards compatibility. --WikiTiki89 15:00, 15 August 2014 (UTC)
What Wikitiki said. DCDuring TALK 15:48, 15 August 2014 (UTC)
I agree with Wikitiki about automating the things in #2. Benwing (talk) 16:23, 15 August 2014 (UTC)
How will this handle the existing cases such as tie#Verb when used in accordance with the current documentation, ie, {{en-verb|t|y|ing}}? Presumably per 3 above? DCDuring TALK 18:04, 15 August 2014 (UTC)
Yes, it looks like that is covered by #3, although ideally there should be a shortcut for it. --WikiTiki89 18:17, 15 August 2014 (UTC)
Agreed: "What you should have noticed was that people don't care about the internal changes [changes to template and module internals], but about the external changes [changes to wiki markup]." --Dan Polansky (talk) 09:15, 23 August 2014 (UTC)
@CodeCat: I have blocked User:MewBot for going ahead with this without consensus. --WikiTiki89 23:53, 24 August 2014 (UTC)
I see a consensus for it here. —CodeCat 23:53, 24 August 2014 (UTC)
I see a few people agreeing with me about my proposed changes to your changes. I also see no mention of a bot run in this discussion. --WikiTiki89 23:56, 24 August 2014 (UTC)
Your proposals were all additions to mine. I've just implemented a subset, foregoing the autodetection part. And while there is no mention of a bot run, why would there be opposition to one if we already agreed on what the new parameters are? —CodeCat 00:00, 25 August 2014 (UTC)
One reason is that if we do implement the auto-detection part, then you would have to do another bot run on the same verbs. Another reason is that you need to start following bot procedure more closely. I would have thought that while there is a vote going on to debot your bot, you would be on your best behavior. --WikiTiki89 00:08, 25 August 2014 (UTC)
I thought I was? That's why I'm confused, I really thought I was finally doing something nobody would find a reason to be upset at me about. Maybe my bot should be blocked then, it seems I'm not good enough at judging when it's ok to use it. —CodeCat 00:13, 25 August 2014 (UTC)
You shouldn't be guessing at what people would be upset about. All you need to do is ask before every bot run. It's a simple enough rule to follow. --WikiTiki89 00:17, 25 August 2014 (UTC)
People never give a straight answer though. Just look at what happened here; I thought there was assent, when you judged it differently. I don't deal well with trying to figure out just what people mean, and clearly when I try to make sense in interpreting all the different comments and opinions, people get upset because I inevitably get it wrong. —CodeCat 00:22, 25 August 2014 (UTC)
You have to ask the question before you complain about getting unclear answers. After you were done making the changes in the module, you should have come back to this thread and asked something like "Can I start the bot run now?" --WikiTiki89 00:27, 25 August 2014 (UTC)
The module/template is specially designed to make all the parameters spell one of the inflected forms, just as I designed {{de-conj-auto}} to make all the parameters spell the pagename (advertisement!). For example, criticize would have {{en-verb|criticiz|es}} for aesthetic purposes. And talking about unclear answers, would you like people to just oppose you if they don't agree with you? Isn't that a bit rude? I oppose criticize having {{en-verb|criticiz}} as parameter. --kc_kennylau (talk) 01:49, 25 August 2014 (UTC)
That is one of the reasons I want auto-detection. I think we can all agree that just plain {{en-verb}} is much nicer than either of {{en-verb|criticiz|es}} and {{en-verb|criticiz}}. --WikiTiki89 02:00, 25 August 2014 (UTC)
I think the second is better than the first, because the second doesn't use a superfluous parameter. The "es" is literally just a no-op, like if you wrote {{en-verb|criticiz|lang=en}}, since the template doesn't need or use a lang= parameter. —CodeCat 02:02, 25 August 2014 (UTC)
What Kenny is saying is that criticiz looks very wrong by itself. And speaking of superfluous parameters, the entire word criticiz is superfluous as well, since it can be deduced from the page name. --WikiTiki89 02:06, 25 August 2014 (UTC)
Not any wronger than some of the parameters you might find in other templates. For the inflection table of Finnish suomalainen for example you write {{fi-decl-nainen|suomalai|a}}. If you assume that parameters should look like natural words, then yes, the parameters look strange. But that's a wrong assumption. And while the parameter can indeed be deduced from the page name, we haven't yet determined how many cases there are for which that deduction gives the wrong results. That's one of the reasons I preferred to play it safe at first, given that this is such a widely used template and we can't afford errors. Once the proposed changes had been made, I was going to research how feasible your additions would be. But I never got that far now... —CodeCat 02:16, 25 August 2014 (UTC)
If there is a choice between using a parameter that looks natural or one that looks unnatural, you should go with the natural one. As for the reliability, of course if we haven't discussed it, we couldn't have determined anything. I wouldn't call going ahead with a bot run "playing it safe". --WikiTiki89 02:25, 25 August 2014 (UTC)
What my bot did was apply the more limited version, the proposal I originally made. That I had already done a lot of researching and experimenting with, so I already knew that it would be possible to implement it fully as proposed, before I even proposed anything. I wanted to make sure that I wouldn't be proposing something that ended up not being feasible. —CodeCat 02:41, 25 August 2014 (UTC)
It is of course feasible, but the aestheticness is lost. --kc_kennylau (talk) 03:30, 25 August 2014 (UTC)

The meaning of "word"[edit]

We have had some disagreements of late as to what constitutes a "word". Our corpus offers a few relevant definitions, including:

  • A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern.
  • A distinct unit of language which is approved by some authority.
  • Any sequence of letters or characters considered as a discrete entity.
  • Different symbols, written or spoken, arranged together in a unique sequence that approximates a thought in a person's mind.

So what is a "word" so far as "all words in all languages" is concerned? bd2412 T 18:33, 15 August 2014 (UTC)

There is that old classic definition, as MWOnline puts it: "any segment of written or printed discourse ordinarily appearing between spaces or between a space and a punctuation mark". This definition has the virtue of fitting with the use of written documents for attestation, which virtue is not shared by any of the four above. Of course, it is not the only definition that might have that virtue. DCDuring TALK 20:02, 15 August 2014 (UTC)
It also has the drawback of not working for languages that aren't written with spaces, or aren't written at all. —Aɴɢʀ (talk) 20:04, 15 August 2014 (UTC)
And, of course, MWOnline has 10 main senses, 20 individual definitions of word, of which two senses, four definitions (including that above), may be relevant to (y)our discussion. DCDuring TALK 21:52, 15 August 2014 (UTC)
Several of the recent discussions about what constitutes a word were specifically about whether romanizations were words. On that subject, I've opined that romanizations are not words but representations of written words, like the shadows of things in Plato's Cave. Cambridge’s definition is interesting to consider in this context; it says a word is "a single unit of language that has meaning and can be spoken or written" — as if in their view words themselves are concepts, like the Platonic concept of jar, and spoken and written forms are just instances, like actual jars (and then, in my analysis, romanizations are the shadows of the jars). Cambridge's definition also implies that words have to belong to languages.
Even if one doesn't share my view of romanizations, it may be difficult to write a definition of "word" that applies to all words, doesn't apply to anything other than words, and yet isn't a paragraph long. I'll think about it, but for now, to add to the above list of other references' definitions, Wikipedia defines word as "the smallest element that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning) [...or more concisely...] the smallest meaningful unit of speech that can stand by [itself]".
- -sche (discuss) 21:20, 15 August 2014 (UTC)
I would actually go further and say that written language isn't language at all but only the representation of language, just like a painting of a pipe isn't a pipe. —Aɴɢʀ (talk) 22:10, 15 August 2014 (UTC)
I would have to disagree with you there. I would say that written languages and spoken languages are different languages using different media, but heavily influenced by each other. --WikiTiki89 22:21, 15 August 2014 (UTC)
It would be fascinating if we had a dictionary of sounds, where the "reader" could speak the sound and in response be told the meaning of it, with no visual symbols being used at all. bd2412 T 23:01, 15 August 2014 (UTC)
An IPA dictionary would be the next closest thing. I'd love to work on that, but it would be hard. —CodeCat 00:50, 16 August 2014 (UTC)
  • From MW Online:
  • a - (1) : a speech sound or series of speech sounds that symbolizes and communicates a meaning usually without being divisible into smaller units capable of independent use
    (2) : the entire set of linguistic forms produced by combining a single base with various inflectional elements without change in the part of speech
  • b - (1) : a written or printed character or combination of characters representing a spoken word <the number of words to a line> []
    (2) : any segment of written or printed discourse ordinarily appearing between spaces or between a space and a punctuation mark
  • -- HTH DCDuring TALK 21:52, 15 August 2014 (UTC)
    I notice that all of the definitions we've discussed so far fail to cover words from sign languages that lack written forms. Many of the above-mentioned definitions (e.g. Cambridge's) do cover words from sign languages like ASL, DGS, etc, because words from those languages can be written (using SignWriting, HamNoSys, etc), but not all sign languages can be written. - -sche (discuss) 22:36, 15 August 2014 (UTC) clarified 03:33, 16 August 2014 (UTC)
    There's another question raised by that - do we need to pick one definition for "word" or should we, in an abundance of caution, include within our sweep several different formulations? Should we include every distinct unit of sounds in speech or written letters with a particular meaning, plus every distinct unit of language which is approved by some authority, plus every sequence of letters or characters considered as a discrete entity? bd2412 T 00:45, 16 August 2014 (UTC)
    That's a good question. My initial reaction when I thought of sign languages was that it would make sense to have different definitions for (1) spoken/written languages' words and (2) sign languages' words, or even (1) spoken languages' words, (2) written languages' words and (3) sign languages' words. But it seems to me that there's a basic concept behind all of those, a basic sense of "word"—if we can figure out how to formulate it—that "spoken word", "written word" and "signed word" are subsenses of. (Does it seem that way to you?) Other senses like "unit of language approved by some authority" — which I guess is the sense people use when they say "irregardless isn't a word"? — could either be additional senses on the same level as that basic sense, or subsenses of it.
    For a definition of the basic sense, what about "the smallest unit of language which has meaning and can be expressed by itself"? Would that include or omit anything it shouldn't, bearing in mind that I'd envision it having subsenses that would provide the details on the nature of written / spoken / signed words? - -sche (discuss) 03:33, 16 August 2014 (UTC)
  • An interesting issue is caused by clitics. English has a clitic in the 's ending, but some languages have a lot of them. For example, Arabic has clitic object and possessive pronouns, plus clitic prefixes like wa- "and", fa- "so", ka- "like", sa- (future tense), etc., and many Arabic dialects have a clitic negative circumfix ma- ... . In some sense, adding a clitic to a word makes a larger word rather than two joined words; that's the nature of a clitic. At the very least, the resulting entity behaves as a single phonological word, usually with a single primary stress and possibly secondary stress(es). But it's possibly not a single linguistic word (morphological word?) in that it's not the "smallest unit of language which has meaning and can be expressed by itself" -- that would be the part without any clitics added. I would say there should be a general rule that word+clitic combinations should not be entered into Wiktionary; I think there are already agreements of this sort in specific cases, e.g. the -que, -ve, -ne clitics in Latin.
  • Compound words are also a problem since they seem to also violate the "the smallest unit of language which has meaning and can be expressed by itself" criterion but often have an idiosyncratic meaning, e.g. blackboard and blackbird are not the same as black board and black bird. Sometimes in English we write such compound words with no spaces, but not consistently cf. red tape, red-eye/redeye/red eye (flight), data base/database, file name/filename, etc. In Mandarin the issue comes up even more acutely since most words are compounds of one sort or another and there are widely varying degrees of compositionality of meaning, phonological behavior as one word or several, etc. Benwing (talk) 10:03, 16 August 2014 (UTC)
  • So I guess that we are considering abandoning the notion of the lexicon in favor of our slogan. I suppose that each word in our slogan should be similarly parsed and that any limits of our user interface be ignored in pursuit of an ideal that won't be realized before Wiktionary collapses under the burden of idealism. DCDuring TALK 12:27, 16 August 2014 (UTC)
    We abandoned the notion of the lexicon (a collection of lexemes) right at the very beginning, when we decided to have individual entries (and in some cases citations and the like) for plural forms like peaches, conjugated forms like stumbled, and superlative forms like hungriest. Your fear of the "limits of our interface" is interesting, given that the entirety of Wiktionary can fit on some USB thumb drives, and that our sister project Wikipedia is showing no signs of such problems despite having over 33 million pages across all namespaces with no signs of slowing, compared to our 4.1 million. I'd say we have plenty of room to grow. bd2412 T 14:19, 16 August 2014 (UTC)
    The limits are only that we are stuck with screen output and keyboard input. Other forms of input and output are much more limited in the portion of our content that they can accommodate. DCDuring TALK 15:06, 16 August 2014 (UTC)
    Sorry, I misinterpreted your actual shortsightedness for an altogether different kind of shortsightedness. My wife asks her phone things all the time and gets answers (often from Wikipedia). Quite a few technologies exist to allow blind people to use the Internet, generally. The limitation on our interface is that Wikimedia has not yet initiated a vocal interface. bd2412 T 15:28, 16 August 2014 (UTC)
    I am very happy to leave farsightedness and idealism to those who enjoy living in fantasy worlds. Judging by the way speech recognition works as delivered by Google and Apple and its rate of progress to date, I would say that it has no relevance whatsoever for casual users of Wiktionary for the next ten years or more. Of course, hard-core user/contributors such as ourselves may find some use for it sooner, though judging by the use of speech-recognition technology in workplace situations (ie, one language, one speaker, narrow range of vocabulary) much less challenging than ours and much more equipped with technical resources, even this may not turn out to be true. DCDuring TALK 17:34, 16 August 2014 (UTC)
    You should brush up on Moore's law. Also, web accessibility. Your ten-year projection may be a bit out of line with the current state of the art. I just tried my wife's phone with a few definitions of non-English words and it did pretty well. Of course, all of this is a separate discussion from what is a "word" for our purposes. bd2412 T 21:36, 16 August 2014 (UTC)
    Show me. DCDuring TALK 01:56, 17 August 2014 (UTC)
    How would I go about doing that? In any case, I'll concede that it's a bit beyond the scope of the question of how "word" is defined for purposes of writing the dictionary. bd2412 T 03:33, 17 August 2014 (UTC)
  • The question is too general and philosophical, and thus of no practical value. What is a word should be decided on an individual language basis, primarily on the basis of the criterion of usability: how users and third-party software would look up Wiktionary entries using the search box/API to find out a word's meaning and other metadata. The purpose of the "all words in all languages" motto is not to impose exclusive inclusion of words (as opposed to non-words, as we already have countless entries for non-words), but rather to extol the liberal principles of the project, namely the absence of "authorities" which decide what goes in or not, as opposed to actual attestations of language which provide real-world evidence of usage. --Ivan Štambuk (talk) 14:27, 16 August 2014 (UTC)
    • The question may well be overly general and philosophical, but if we're in the business of offering people definitions of words, we should have a handle on what words are. bd2412 T 13:42, 18 August 2014 (UTC)
      We are in the business of collecting human knowledge, at a scale that vastly exceeds the needs of 99% of people. People don't care whether something is a word or not when they look it up in the dictionary - they just want the meaning/translation and other goodies. Optimization of user interface for human consumption should be orthogonal to the underlying goal of documenting all instances of written (perhaps one day even spoken) language. The potentially harmful impact of overextending the formal definition of word to include the supposedly non-opaque compounds or set phrases, or "real" words with various affixes attached with a debatable level of transparency, is trivial. --Ivan Štambuk (talk) 13:45, 21 August 2014 (UTC)

I have overhauled our entry [[word]], adding several verb senses, adding more citations, etc. I left the four definitions which pertained to the linguistic sense of 'word' alone until the end of the overhaul, when I changed them like this (if it's too hard to pick through that diff, see this): I didn't reword the last three of the four senses at all (though I am about to RFV at least one of them), but I split the sense which had said
  1. A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern. [from 10th c.]
into one sense and two subsenses with supporting citations:
  1. The smallest unit of language which has a particular meaning and can be expressed by itself; the smallest discrete, meaningful unit of language. (Contrast morpheme.) [from 10th c.]
    1. The smallest discrete unit of spoken language which has a particular meaning, composed of one or more phonemes and one or more morphemes.
    2. The smallest discrete unit of written language which has a particular meaning, composed of one or more letters or symbols and one or more morphemes.
Further improvements are welcome and indeed encouraged. - -sche (discuss) 03:42, 18 August 2014 (UTC)
Postscript: I modified another of the 'linguistic'-sense-related definitions, like this. - -sche (discuss) 05:06, 18 August 2014 (UTC)

Multiple Categories For 1 Label?[edit]

I run into items in Special:WantedCategories from time to time that result from labels categorizing differently than contributors expect. For instance, kibutz is a Turkish word for an Israeli institution, so it only seemed logical to put "Israel" in the {{context}} template. This added the redlinked category Category:Israeli Turkish, because the system assumes Israel is where the term is spoken, rather than where the referent of the term is found. The same thing happens when someone uses classical to refer to something associated with ancient Greece and Rome: we have things set up to interpret that as referring to the classical stage of the language (this was done with Classical Chinese in mind), and thus we had the redlinked category Category:Classical English. Is there a way to have both a regional and a topical category for the same term, and to select between them? Or, lacking that, is there a way to turn off categorization for a single parameter in the {{context}} template, to have it display in the list, but not generating a bogus category? Chuck Entz (talk) 02:53, 16 August 2014 (UTC)

This has been raised and ignored before, but it might be the singer not the song. DCDuring TALK 02:56, 16 August 2014 (UTC)
I'd say the problem is more general in that we never had a proper way to distinguish regions as topics and as dialects. I think I brought this up before, when the label templates were being converted to Lua, but I think it was mostly ignored at the time like DCDuring said. —CodeCat 14:28, 16 August 2014 (UTC)
The problem isn't limited to regions. A region can be a usage context or a "topic", but so can a discipline, trade or profession. In addition, as Chuck points out, the categories are not necessarily intuitively connected to the label. Furthermore the mapping from labels to categories only has the sullen consent of users as it was basically imposed in the new regime. In the former regime there were many fewer topical categories, so much less often-inappropriate force-fitting was required and there could be a template for each major "context". These templates were apparently added based on a count of frequently used parameters of {{context}}, after some editing of user-inserted labels that were "close" to existing ones. The by-product was that there was an incentive for users to stick to the "contexts" for which templates existed because fewer keystrokes were required, but user flexibility was not otherwise discouraged. There were specific technical difficulties with template-only implementation of {{context}}, which difficulties made it an early target of Lua-based reform. However, there was a significant loss of capability in the simplistic implementation of the new approach. DCDuring TALK 14:53, 16 August 2014 (UTC)

where to put quotations when there are multiple spellings of a word?[edit]

Old French is notorious for having inconsistent spelling, e.g. standard arachier also appears as arrachier and arracher, among others. I have the latter two linked to the former as alternative forms, and the former lists the latter two as alternative forms. I have one quotation which involves the form arrache, which is a form of both arrachier and arracher. Should I (a) put this quote under arrachier because that's the more standard spelling of the two, or (b) put it under arachier because it's under this lemma that all the variant forms are "gathered"? Is there a rule for this? Benwing (talk) 10:10, 16 August 2014 (UTC)

There's also: (c) put the quote under arrache, since it's specific to the spelling. Chuck Entz (talk) 17:34, 16 August 2014 (UTC)
Are you saying "the more standard spelling of the two" is not the entry that has been made the lemma? That should be fixed before anything else is done, and then the quotation should go in the lemma entry = entry for the most standard spelling (=, if I follow what you're saying, the or at least a spelling from which the form found in the quotation derives).
For English and German, I often see quotations of one spelling placed in the lemma entry even if it is a different spelling, especially if the quotations are famous uses (e.g. from Shakespeare), or contribute to showing that the word has been in continuous use for a long time. There are even quotations of Chaucer (Middle English) placed in English entries, which I have been told off for moving to Middle English entries. Quotations are also placed in the entries for the spellings they use, if that is necessary to verify the existence of those spellings. So, in the general case, it seems you can place the quotation wherever seems most appropriate. - -sche (discuss) 17:48, 16 August 2014 (UTC)
Sorry, let me clarify, arrachier is more standard than arracher but arachier is the most standard form. Both arracher and arrachier would produce the same 3rd-singular present indicative arrache. I fixed things so that both arrachier and arracher point to arachier. You (-sche) suggest putting it under arachier if I'm not mistaken. Chuck says put it under arrache but there's no entry at all for that. Benwing (talk) 19:10, 16 August 2014 (UTC)
I didn't say "put it under arrache", just that doing so should be included as one of the options. I would vote for either the lemma or the actual spelling, but I don't claim to be an expert on Wiktionary best practices. As -sche (who knows far more about this than I do) said, putting the quote in the actual spelling is mostly done when the quote is for the purpose of showing usage of the spelling itself, rather than of the term independent of its spelling. It might also be useful if you had a quote that was ambiguous enough so you weren't sure which lemma it belonged to.
In general, the issue of which form to put content on boils down to a tradeoff of usefulness vs. practical concerns: it would be most useful to have things like the definition at every spelling, but practically, that would be a nightmare to keep in synch, so we have everything in the lemma. Being consistent in format is just another aspect of usefulness: the more consistent things are, the less thought you have to put into finding things. Here, I don't see practical concerns unless you want to put it in multiple places, so you're better off figuring out where people would likely want or expect it to be, and putting it there. Chuck Entz (talk) 20:30, 16 August 2014 (UTC)

is there a policy statement somewhere indicating the standard order of "alternative forms", "etymology", "external links", etc. sections?

There seem to be standards for how these sections are ordered, but I'm not sure where this is documented. I'm guessing it's something like this:

At level 3:

  1. Alternative forms
  2. Etymology
  3. Verb / Noun / etc.
  4. External Links

At level 4, under the headword:

  1. Definition
  2. Pronunciation
  3. Conjugation / Declension
  4. Derived words, Related words
  5. Descendants
  6. Synonyms
  7. Anagrams

But not really sure.

Benwing (talk) 10:15, 16 August 2014 (UTC)

See WT:ELE. DCDuring TALK 10:52, 16 August 2014 (UTC)
Pronunciation is level 3, just after etymology. An exception is when there are multiple etymologies and they have different pronunciations. — Ungoliant (falai) 18:01, 16 August 2014 (UTC)
I'd say the exception is when there are multiple etymologies and they have the same pronunciation—in that case, Pronunciation precedes Etymology 1. —Aɴɢʀ (talk) 18:30, 16 August 2014 (UTC)
Yeah, you’re right. — Ungoliant (falai) 18:34, 16 August 2014 (UTC)

Catalan rhymes

Our current rhymes pages deal exclusively with central/eastern Catalan, which is the basis for the standard language in Catalonia itself. In these dialects, there is significant vowel reduction in unstressed syllables, so many words rhyme that otherwise would not. But in Valencia as well as large parts of western Catalonia, there is much less vowel reduction. The Balearic dialects take an intermediate position, reducing less than central Catalan but more than Valencian.

Because our rhymes pages use central Catalan as a base, they're no use for anyone outside of that dialect area. On the other hand, vowel reduction is entirely predictable, so information would be gained if we used unreduced vowels in rhymes, while none would be lost. So I'd like to propose that we do not reflect vowel reduction in Catalan rhymes, so that all dialects can be covered.

This does not solve all cross-dialect problems for Catalan, as Balearic dialects have an extra vowel phoneme that the other dialects lack, while Eastern and Western Catalan differ in the application of é versus è. But it would be a step in the right direction at least. —CodeCat 19:51, 17 August 2014 (UTC)
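The point that reduction is predictable can be made concrete: central Catalan reduction is a many-to-one mapping, so a central rhyme can be derived from an unreduced one, but not the other way round. The Python sketch below assumes a simplified model (rhymes written in IPA starting at the stressed vowel; unstressed /a e ɛ/ become [ə] and /o ɔ/ become [u]; exceptions such as unreduced vowels in some loanwords are ignored). The function name `central_rhyme` is hypothetical.

```python
# Simplified model of central Catalan unstressed vowel reduction (assumption:
# no exceptions, and the rhyme string starts at the stressed vowel).
VOWELS = set("aeɛioɔuə")
CENTRAL_REDUCTION = {"a": "ə", "e": "ə", "ɛ": "ə", "o": "u", "ɔ": "u"}


def central_rhyme(unreduced: str) -> str:
    """Derive a central Catalan rhyme from an unreduced (dialect-neutral) one."""
    out = []
    seen_stressed = False
    for ch in unreduced:
        if ch in VOWELS:
            if seen_stressed:
                # Every vowel after the stressed one is unstressed: reduce it.
                ch = CENTRAL_REDUCTION.get(ch, ch)
            seen_stressed = True
        out.append(ch)
    return "".join(out)
```

For example, `central_rhyme("ara")` and `central_rhyme("are")` both give `"arə"`: two distinct unreduced rhymes collapse into one central rhyme, which is why storing the unreduced form loses nothing.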

Nynorsk translations and interwiki links


As you know, Bokmål translations can have an interwiki link to the Bokmål Wiktionary when the translation exists in that project. But for Nynorsk translations, we often don't have this option. The Nynorsk Wiktionary does still exist, but its community no longer seems very active (nn:Special:RecentChanges). Therefore, maybe we could route the « nn » code to « no » in Module:translations, as is done for « nb »; that is, Nynorsk translations would get an interwiki link to no.wiktionary instead of to nn.wiktionary.

Here is a previous discussion about the proposal to rally efforts on Bokmål Wiktionary: w:nn:Wikipedia:Samfunnshuset/Arkiv/2009#Wiktionary, et fellesnorsk prosjekt.

And here you'll find an analysis of the Nynorsk translations on this project that would have an interwiki link if they were routed to no: User:Automatik/Analysis of Nynorsk translations.

What do you think of routing nn code to no for interwiki links in translations? — Automatik (talk) 00:23, 19 August 2014 (UTC)
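The proposal amounts to one more entry in a code-to-wiki lookup. Module:translations itself is written in Lua; the Python sketch below only illustrates the idea, and the names `CURRENT_ROUTES`, `PROPOSED_ROUTES` and `interwiki_url` are hypothetical, not taken from the module. It assumes that any code without an override links to the Wiktionary with the same code.

```python
from urllib.parse import quote

# Hypothetical routing table: translation language code -> Wiktionary subdomain.
# "nb" -> "no" mirrors what the thread says is already done for Bokmål.
CURRENT_ROUTES = {"nb": "no"}

# The proposal: also send Nynorsk ("nn") translations to no.wiktionary.
PROPOSED_ROUTES = {**CURRENT_ROUTES, "nn": "no"}


def interwiki_url(lang_code: str, term: str, routes: dict) -> str:
    """Build the URL a {{t+}}-style superscript link would point to."""
    wiki = routes.get(lang_code, lang_code)  # default: same code as the language
    return f"https://{wiki}.wiktionary.org/wiki/{quote(term)}"
```

Under this model, the Cantonese suggestion in the next thread (routing yue to zh) would simply be another entry in the table.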

Support. Before the Chinese merger we used to do this for Mandarin (cmn) terms, which linked to Chinese (zh) Wiktionary. BTW, Chinese merger is a success and helped increase contents for Chinese varieties (esp. Cantonese, Min Nan, Wu and Hakka), perhaps a good indication of what could have been done for Norwegian (Bokmål and Nynorsk) (vote didn't pass) and Arabic (no vote), even if the situation is not the same (e.g. Chinese lects have no inflection and PoS headers don't require genders or plural forms). --Anatoli T. (обсудить/вклад) 00:30, 19 August 2014 (UTC)
I think the question we need to ask ourselves first is what the purpose of interwiki links is in translations. Is the purpose to lead the user to more information about the term outside of what the English Wiktionary has available, regardless of what language it is presented in? Or is the purpose specifically to lead to the entry in that specific language? Or something else? If we just want to link to more information, then we really want lots of interwiki links, not just to one specific language. But if the idea is to match the language specifically, then we shouldn't be linking to Bokmål from Nynorsk translations. —CodeCat 13:20, 20 August 2014 (UTC)
The purpose is to link to a definition of the term in its own language. In this case, it's the same language. It may even make sense to link to all three of the Norwegian Wiktionaries (when the entries exist) for every Norwegian term. --WikiTiki89 14:09, 20 August 2014 (UTC)
Actually there are not three but two Norwegian Wiktionaries (no: and nn:). And 'no' stands for Bokmål even if in ISO it stands for Norwegian and 'nb' stands for Bokmål. — Automatik (talk) 21:08, 20 August 2014 (UTC)
Oh. In that case there is no question that, at the very least, we should point Bokmål translations to the no.wikt. I assumed there were three because Serbo-Croatian has four (Bosnian, Croatian, and Serbian, as well as the combined Serbo-Croatian). --WikiTiki89 21:22, 20 August 2014 (UTC)
Bokmål translations already point to no.wikt. I asked about Nynorsk translations because no.wikt is also written in Norwegian and is more active. But I can understand the reluctance of some (why not point to nn.wikt if it's not closed and still exists? For the reason explained before, I guess, but I understand). — Automatik (talk) 22:14, 20 August 2014 (UTC)
Oh sorry, I completely understand now. In that case, I'm still going to say that all three of {{t+|no}}, {{t+|nn}} and {{t+|nb}} should point to both of no: and nn:. --WikiTiki89 00:15, 21 August 2014 (UTC)
But what should be done for the distinction between {{t}} and {{t+}} in this case? The template would not have any way to say that no: has an entry but nn: does not. —CodeCat 00:19, 21 August 2014 (UTC)
It would have to be a bit more complex. Something like {{t|nn|foo|iw=no,nn}}, {{t|nn|bar|iw=no}}. Each language would have to have a list of allowed interwikis. Not to mention User:Rukhabot would have to be updated (ping User:Ruakh). --WikiTiki89 00:35, 21 August 2014 (UTC)
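WikiTiki89's suggestion of an iw= parameter checked against a per-language list could look roughly like this. It is a Python sketch under stated assumptions: the parameter syntax follows the examples {{t|nn|foo|iw=no,nn}} and {{t|nn|bar|iw=no}}, while `ALLOWED_INTERWIKIS` and `interwiki_targets` are hypothetical names. An empty iw= value stands in for the plain {{t}} case raised by CodeCat.

```python
# Hypothetical per-language list of Wiktionaries a translation may link to.
ALLOWED_INTERWIKIS = {
    "no": ["no", "nn"],
    "nb": ["no", "nn"],
    "nn": ["no", "nn"],
}


def interwiki_targets(lang_code: str, iw: str) -> list:
    """Parse an iw= value such as "no,nn" and check it against the allowed list.

    An empty result means no superscript links, i.e. the plain {{t}} behaviour.
    """
    allowed = ALLOWED_INTERWIKIS.get(lang_code, [lang_code])
    requested = [w.strip() for w in iw.split(",") if w.strip()]
    invalid = [w for w in requested if w not in allowed]
    if invalid:
        raise ValueError(f"{lang_code}: interwiki(s) not allowed: {', '.join(invalid)}")
    return requested
```

Rejecting unlisted codes, rather than silently dropping them, would surface typos in entries instead of hiding them.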

Cantonese translations

(Notifying Kc_kennylau, Wyang): The topic above gave me the idea that Cantonese (yue) translations could be linked to the Chinese (zh) Wiktionary. Although the Chinese Wiktionary doesn't always provide Cantonese transliterations or other info, it may be helpful for looking up other things. Written Cantonese shares about 99% of its terms with Mandarin. Perhaps Cantonese jyutping could be loaded with a bot both here and on the Chinese Wiktionary; Wyang has created a framework for this. --Anatoli T. (обсудить/вклад) 00:38, 19 August 2014 (UTC)

@Atitarev: 係咪?(Do you mean this?) --kc_kennylau (talk) 00:41, 19 August 2014 (UTC)
@Kc kennylau: Thanks, Kenny. Well, yes, single-character entries (zi) have jyutping but multi-character entries often don't. Since a Cantonese Wiktionary doesn't exist, it would probably make sense to use the Chinese one (for semantics, translations). -Anatoli T. (обсудить/вклад) 00:50, 19 August 2014 (UTC)
@Atitarev: I still don't understand what you mean. Can you provide me with an example? --kc_kennylau (talk) 00:56, 19 August 2014 (UTC)
@Kc kennylau: OK, the Mandarin translation (at China#Translations) 中国 (zh) (Zhōngguó) links to zh:中国, even though it uses the "cmn" language code, but Cantonese 中国 (zung1 gwok3) links to nothing; it uses {{t}}, not {{t+}}. I suggest redirecting yue to zh, the way cmn is handled in translations, and using {{t+}} when a Chinese entry exists in zh:wikt. --Anatoli T. (обсудить/вклад) 01:03, 19 August 2014 (UTC)
@Atitarev: Oh, I thought you were talking about {{zh-usex}}, that's why I didn't understand. --kc_kennylau (talk) 01:05, 19 August 2014 (UTC)
@Kc kennylau: Do you support this idea? Or you think zh:wikt shouldn't be used for Cantonese? --Anatoli T. (обсудить/вклад) 01:11, 19 August 2014 (UTC)
@Atitarev: Support. --kc_kennylau (talk) 14:30, 19 August 2014 (UTC)

Renaming existing uses of Template:term to Template:m

Recently I've noticed that User:Mulder1982 has been working through many entries and correcting and fixing etymologies, but he has also consistently replaced {{term}} with {{m}}. I fully support this action, but I think it would be more effective to use a bot to do this. Is it ok for me to run a bot to replace existing instances of {{term}} with {{m}}? —CodeCat 13:24, 20 August 2014 (UTC)

Yes, please do! It's hard work. :D But well, that has actually been a by-product. The main thing I've been doing these days is to remove redundant transliterations of Gothic. That could probably be done with bots too, I suppose. But yes, speaking for myself, do run the bot. Mulder1982 (talk) 13:27, 20 August 2014 (UTC)
I weakly support this and would point out that there is absolutely no urgency in the matter. --WikiTiki89 14:07, 20 August 2014 (UTC)
  • Oppose converting "{{term}}" to anything. Wiki markup is the user interface; a significant widespread conversion like this should be done via a vote, IMHO. See also Wiktionary:Beer parlour/2014/April#Convert Template:term.2Ft_and.2For_Template:term to Template:m.3F. By the way, where is the rationale? Moreover, Mulder1982 should not have been manually performing the conversion until they can demonstrate widespread support for such a conversion. --Dan Polansky (talk) 17:40, 20 August 2014 (UTC)
  • Support; at first I found it confusing that both {{term}} and {{m}} existed, since it wasn't obvious that they do the same thing. Being consistent would be less confusing for new users. Benwing (talk) 22:32, 20 August 2014 (UTC)
  • Support; I do this manually all the time. There is zero reason to use 10 characters (term|lang=) where 2 will do the same thing (m|). CodeCat, you mentioned once before that there are still many instances of {{term}} where no language is specified. I would say those should not be changed at all, rather than being changed to {{m|und}}; after all, probably over 95% of those cases are English anyway. —Aɴɢʀ (talk) 22:44, 20 August 2014 (UTC)
    • We can either change them to "und" and then later fix all instances of {{m|und|...}}, or we can leave them as they are. They're going to need to be fixed either way. —CodeCat 22:49, 20 August 2014 (UTC)
      • There are some cases where we anticipate and tolerate that linking templates are used with the language code 'und', aren't there? E.g. when Phaistos disc or Buyla inscription particles are linked to, or when forms from intermediate stages of reconstructed languages or language families are mentioned (pre-Proto-Algonquian (post-Proto-Algic), etc, and IIRC 'Middle Iranian'). Hence, I think it makes sense to leave language-less {{term}}s as they are, as they constitute a different category of things that need to be cleaned up from {{m|und}}s. (In some cases, language-less {{term}}s may actually need to be converted to {{m|und}}s, but in a lot of cases they need instead to be labelled as English or, in etymologies, Danish or Norwegian.) - -sche (discuss) 18:09, 21 August 2014 (UTC)
        • Of course. But if we have a cleanup category for uses needing a language, it doesn't imply that the category needs to be emptied out altogether. There will be some where the use of "und" is legitimate, but by far the most of them won't be, so a category would still be helpful. —CodeCat 18:12, 21 August 2014 (UTC)
          • It makes it harder to clean up when the cleanup category is filled with legitimate uses. It would be better to leave correct usages as {{m|und|...}} and unknown usages as {{term|...}}. --WikiTiki89 18:24, 21 August 2014 (UTC)
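For illustration, the core of such a bot might be a regex substitution like the Python sketch below. It is hypothetical, not CodeCat's actual bot, and assumes the parameter layout {{term|word|alt|gloss|lang=xx}} maps to {{m|xx|word|alt|gloss}}. Following the suggestions from Aɴɢʀ, -sche and WikiTiki89, calls with no lang= are left untouched; so are calls containing nested templates or piped links, which a plain split on | would mishandle.

```python
import re

# Matches a {{term|...}} call with no nested braces inside it.
TERM_RE = re.compile(r"\{\{term\|([^{}]*)\}\}")


def convert_term(match: re.Match) -> str:
    body = match.group(1)
    if "[[" in body:
        # Piped wikilinks would break the naive split below: leave for manual review.
        return match.group(0)
    lang = None
    kept = []
    for param in body.split("|"):
        name, sep, value = param.partition("=")
        if sep and name.strip() == "lang" and lang is None:
            lang = value.strip()
        else:
            kept.append(param)  # positional and other named params keep their order
    if not lang:
        # No language given: keep {{term}} so it stays visible as needing cleanup.
        return match.group(0)
    return "{{m|" + "|".join([lang] + kept) + "}}"


def migrate(wikitext: str) -> str:
    """Rewrite every convertible {{term}} call in a page's wikitext."""
    return TERM_RE.sub(convert_term, wikitext)
```

Keeping language-less calls as {{term}} means they stay separate from legitimate {{m|und|...}} uses, as suggested in the replies above.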

This was proposed just a couple of days ago at Wiktionary:Requests_for_moves,_mergers_and_splits#Template:term_into_Template:m, with three opposes. --Dan Polansky (talk) 22:58, 22 August 2014 (UTC)

I have created a vote: Wiktionary:Votes/2014-08/Migrating from Template:term to Template:m. --Dan Polansky (talk) 23:09, 22 August 2014 (UTC)

Capitalizing proper nouns in reconstructed languages

Should we capitalize proper nouns in reconstructed languages? For example, should the Proto-Slavic word for Rome be located at *Rimъ or *rimъ?

From my point of view, reconstructions represent sounds, not spellings, and sounds have no notion of capitalization. Additionally, if we choose to use capitalization, we would be forced to pick a particular language's rules for when to capitalize words. In the case of Proto-Slavic, the various Slavic languages themselves have varying rules for capitalization. --WikiTiki89 17:42, 20 August 2014 (UTC)

I think we should not capitalise. —CodeCat 17:57, 20 August 2014 (UTC)
If it's a convention to standardize them in the literature, which seems to be the case, we should do it as well.
Contrary to the popular misconception, reconstructions don't represent sounds but phonemes. Phonetic reconstruction (*abc -> *[abc]) is a different category and has little to do with comparative method. --Ivan Štambuk (talk) 18:02, 20 August 2014 (UTC)
Allophones can be reconstructed in some cases, though. Sometimes allophones become phonemes later on, but we can also reconstruct them through the effect they have on sound changes. For example, we can reconstruct two allophones of Old English /x/ because of later developments in late Middle English. For Proto-Germanic we know that an allophone [ŋʷ] existed because of the effects of the w:Boukolos rule. And for Proto-Indo-European the allophony of syllabic and consonantal semivowels is also well known. —CodeCat 18:08, 20 August 2014 (UTC)
Unless I'm going into detail about phonology/phonetics, there is no reason to differentiate sounds from phonemes here. Neither sounds nor phonemes have any notion of capitalization. But as to your first point, I'm willing to bet that most sources that capitalize reconstructed proper nouns use the capitalization rules of the language they are written in. For example, Vasmer would use the Russian rules, while the Hrvatski etimološki rječnik would use the Croatian rules. In the cases where the rules differ, how would we decide which one to follow? --WikiTiki89 18:11, 20 August 2014 (UTC)
Probably we should find common rules and use them, like we do with reconstructed words. —Useigor (talk) 18:15, 20 August 2014 (UTC)
We already capitalize phonetic transcriptions like Pinyin so there is a precedent. The whole transcription of sounds vs. alphabetic words dichotomy is false - this is an issue of arbitrary convention and nothing else. Thousands of largely unwritten languages use scholarly transcription schemes invented to accommodate their phonology, with capitalization rules conveniently imposed. Rules for capitalization are generally the same across all Slavic languages - the few corner cases like demonyms could be decided by counting the preferred form in reflexes. --Ivan Štambuk (talk) 13:29, 21 August 2014 (UTC)
Pinyin has its own standard capitalization rules. --WikiTiki89 13:39, 21 August 2014 (UTC)
Which are completely arbitrary (though based on Latin-script languages) which just proves my point that something being not a "real" word but rather a transcription is no argument. --Ivan Štambuk (talk) 13:49, 21 August 2014 (UTC)
The rules themselves may have been chosen arbitrarily, but our rule is straightforward and logical (namely: follow the standard). In the case of reconstructions, there is no standard to follow, so the arbitrary choice is on us, and we shouldn't be making such arbitrary choices. --WikiTiki89 13:59, 21 August 2014 (UTC)
We do not follow the standard but the attested usage. That the usage overwhelmingly conforms to the standard is a different issue - but even if it didn't we'd still have to add them because they're attested as such. Conversely, the lack of a de iure standard does not preclude the possibility of forming one on our own, as has already been done countless times, on the basis of the de facto standard established in the literature, or for pragmatic reasons. Furthermore, the precedent set by Pinyin, where capitalization and other orthography rules are assigned to a purely transcriptional notation, demonstrates that the practice is far from unnatural, and that the dichotomy of word vs. transcription is a false one. --Ivan Štambuk (talk) 18:29, 21 August 2014 (UTC)
The "dichotomy" that I was referring to is not word vs. transcription, but spelling vs. transcription. Both spellings and transcriptions represent words (in fact, as we see in the discussion above, no one actually knows what a word is). I never said that capitalization in transcriptions is unnatural, only that we shouldn't use it for proto-languages. Pinyin was designed as a multipurpose system. One of these purposes is embedding Chinese words in foreign language text ("President Xí Jìnpíng did such-and-such."), which is why capitalization is useful, since foreign languages expect capitalization. However, we do not do this at all with proto-languages; we don't write "Shrines of *Perunъ were located either on top of mountains or hills", we only use this transcription to talk about the word itself. Attestations of the spelling "*Perunъ" are not really attesting anything, they are just simply how other researchers choose to transcribe the term. We are free to transcribe the term however we please (for example, we could transcribe it as *Perūnu) as long as it is internally consistent, but it is in our own best interest to ensure that the transcription conforms with some sort of norm, so that our readers who are accustomed to reading about Proto-Slavic will not be confused. However, I doubt much confusion would come from not capitalizing nouns. In fact it does our readers a disservice to pretend that *Perunъ the god is somehow a different word from *perunъ the weather phenomenon, when we in fact do not know whether the Slavs of the time saw them as the same thing or different. --WikiTiki89 18:55, 21 August 2014 (UTC)
The deity appellation is the original term, and the common noun *perunъ is secondary, derived from the proper noun. Far from being a disservice, it is in cases such as this, when proper and common nouns denote different entities but differ only in capitalization, that capitalization of proper nouns is helpful for lemmatization. The proper noun is uncountable and animate, whereas the common noun would be countable and inanimate, and both have separate and different sets of reflexes in daughter languages.
Your argument was that "reconstructions represent sounds and not spellings", from which it is now evident that when you typed spelling you meant word. Why it shouldn't be used for reconstruction still remains a mystery. This "fictional reader" argument gets thrown a lot for every imaginable dispute and I don't buy it. If anything, readers would expect proto-terms to conform to capitalization rules of ordinary languages.
Regardless of Pinyin's design goals, there is fundamentally no difference between "Shrines of *Perunъ" and "President Xí". Both are not "real" words to the same extent, regardless of how you define the term. --Ivan Štambuk (talk) 00:08, 22 August 2014 (UTC)
It doesn't matter what the original term was (in fact I'm curious how we even know which one it was). The point is they may or may not be different words, they may or may not have had different spellings (if they even had any sort of writing), but the only thing we do know is that they were pronounced the same. "President Xí" and other Pinyin words are attestable as uses in English, while reconstruction notation is only attestable, and therefore only meant for, mentions. --WikiTiki89 03:42, 22 August 2014 (UTC)
Just on capitalisation of Chinese/Japanese/Korean romanisations to no-one in particular, since it was mentioned here. There are certain rules, used by various dictionaries and standards, not just for the use of loanwords in English but e.g. for educational purposes. Place names and people's names are definitely capitalised, and even the rules for word spacing and hyphenation are described. So, "Xí Jìnpíng" is a standard modern transliteration (pinyin) for 习近平 but "Xí Jìn-píng" or "Xí Jìn Píng" is not. Months and days of the week are not capitalised. There are some discrepancies as for demonyms and language names e.g. 中国人 (Chinese person), 中文 (Chinese language). Both "zhōngguórén" and "Zhōngguórén" (less commonly "Zhōngguó rén"), "zhōngwén" and "Zhōngwén" are used by various dictionaries. An agreement between editors is required on capitalisation in this case. --Anatoli T. (обсудить/вклад) 04:27, 22 August 2014 (UTC)
In some transcription schemes for proto-languages, capital letters are semantically different from lower-case letters. For example, Proto-Brythonic is often reconstructed with both *b and *B, where Proto-Brythonic *B is the descendant of Proto-Celtic *b, while Proto-Brythonic *b is the descendant of Proto-Celtic * in a leniting environment. So capitalizing proper names in those cases would be a bad idea. And I'm generally opposed to it for other proto-languages where that isn't an issue, too. —Aɴɢʀ (talk) 18:22, 20 August 2014 (UTC)
What about the reconstructed Latin words? --kc_kennylau (talk) 06:22, 22 August 2014 (UTC)
The difference with Latin is that Latin orthography is well known. For most proto-languages, the orthography is 100% artificial. --WikiTiki89 12:07, 22 August 2014 (UTC)

More entries in the lemma category than in its subcategories?

I am trying to figure out where the lemmas are coming from in Category:Hungarian lemmas. It has more than 16000 entries but the subcategories contain only about 14000, even if I include the phrases, proverbs, suffixes, prefixes which are not specifically listed under the subcategories. Are there other categories? --Panda10 (talk) 13:24, 21 August 2014 (UTC)

You can see a full list of the categories that are recognised as "lemmas" at the top of Module:headword. —CodeCat 13:37, 21 August 2014 (UTC)
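One way to find the roughly 2000 unexplained entries is a set difference between the members of the lemma category and the union of its subcategories' members. The member lists could be fetched with the MediaWiki API (action=query&list=categorymembers, following the continuation parameter for large categories); the Python function below is a hypothetical sketch that only does the comparison, and the sample data in the usage note is made up.

```python
def unexplained_lemmas(lemma_members, subcategory_members):
    """Return pages in the lemma category that appear in none of the subcategories.

    lemma_members: iterable of page titles in e.g. Category:Hungarian lemmas.
    subcategory_members: dict mapping a subcategory name to its page titles.
    """
    covered = set()
    for members in subcategory_members.values():
        covered.update(members)
    return sorted(set(lemma_members) - covered)
```

For example, with lemmas `["ház", "fut", "szép", "és"]` and subcategories covering only the first three, the result is `["és"]`; the leftover titles point to whichever headword categories are not being counted.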

Letter petitioning WMF to reverse recent decisions

The Wikimedia Foundation recently created a new feature, "superprotect" status. The purpose is to prevent pages from being edited by elected administrators -- but permitting WMF staff to edit them. It has been put to use in only one case: to protect the deployment of the Media Viewer software on German Wikipedia, in defiance of a clear decision of that community to disable the feature by default, unless users decide to enable it.

If you oppose these actions, please add your name to this letter. If you know non-Wikimedians who support our vision for the free sharing of knowledge, and would like to add their names to the list, please ask them to sign an identical version of the letter on change.org.

-- JurgenNL (talk) 17:35, 21 August 2014 (UTC)

Process ideas for software development


I am notifying you that a brainstorming session has been started on Meta to help the Wikimedia Foundation increase and better affect community participation in software development across all wiki projects. Basically, how can you be more involved in helping to create features on Wikimedia projects? We are inviting all interested users to voice their ideas on how communities can be more involved and informed in the product development process at the Wikimedia Foundation.

I and the rest of my team welcome you to participate. We hope to see you on Meta.

Kind regards, -- Rdicerb (WMF) talk 22:15, 21 August 2014 (UTC)

--This message was sent using MassMessage. Was there an error? Report it!

Haida lects

Wiktionary currently includes Southern Haida (hax), Northern Haida (hdn), and the macrolanguage they are sometimes considered to form, Haida (hai). Μετάknowledge and I discussed this on my talk page and are of the opinion that we should deprecate the macrolanguage hai and have only hax and hdn. The phonological and other differences between Northern and Southern Haida are, as linguist Michael Krauss puts it, "rather great, allowing only partial mutual intelligibility without practice, perhaps like Swedish and Danish, or German and Dutch." Translator Robert Bringhurst says "[i]t is, in fact, chiefly out of courtesy that northern and southern Haida are described as two dialects rather than two close but separate languages. By 1900, north and south had clearly known centuries of diverging cultural growth." Each language indeed has its own dialects (Kaigani and Masset Haida are the mutually intelligible dialects of Northern Haida; Skidegate and the now-extinct Ninstints constitute Southern Haida). - -sche (discuss) 19:08, 22 August 2014 (UTC)

CFI and Non-Deities in Classical Mythology

Is there any reason we should keep most entries or senses referring to individual people in Greek mythology? It's true they're not covered by the "given name and surname" rule in CFI, but the names often have no meaning beyond their reference to the individuals themselves. Do we really need definitions that read like "daughter of so-and-so and so-and-so, wife of King so-and-so, and mother of so-and-so"? Chuck Entz (talk) 22:27, 22 August 2014 (UTC)

We should keep them for the lexicographical information that they carry, including etymology and pronunciation; that is for keeping the entries. As for keeping senses, that is kind of natural to me. A related poll: Wiktionary:Beer parlour/2010/December#Poll: Including individual people. --Dan Polansky (talk) 22:31, 22 August 2014 (UTC)
While these words do refer to the individuals themselves, the individuals may be so widely known that their names are assumed understood. If you consider the use-mention distinction, then most people will be introduced to a reader much the same as any other unfamiliar term is. But certain people are assumed to be known and their names are therefore only used and not introduced. This applies not just to Greek mythological names but also to names in modern times. For example Elvis can be seen in this way. Of course the knowledge of the person fades with time as popular culture shifts, but some names stay known for longer than others, just look at Bonnie and Clyde, Hitler, Napoleon, Caesar, Tutankhamun etc. These names definitely have "meaning". The meaning is tied to a cultural context, so that when that context is lost, the meaning goes with it. But in a sense, learning who Napoleon was is not so different from learning what a lute or an ironclad is. —CodeCat 22:38, 22 August 2014 (UTC)
We could include them with a definition such as "Character in Greek mythology" with a link to the Wikipedia article. --WikiTiki89 22:46, 22 August 2014 (UTC)
What a horrible idea. --Dan Polansky (talk) 22:46, 22 August 2014 (UTC)
Why is it a horrible idea? We already do this for many Biblical characters. --WikiTiki89 22:49, 22 August 2014 (UTC)
Actually scratch that. I was thinking of what we do for non-English entries. --WikiTiki89 22:53, 22 August 2014 (UTC)

Asteroid Names

The IP who's been adding these has been better known for adding mountains of junk edits to Japanese and Chinese entries, and to English entries having to do with magic and mythology, among others. These are so innocuous, and the other stuff is so awful, that we've let them slide, for the most part. It occurred to me, though, that there are some conceptual issues to be addressed.

Asteroid/minor planet names are assigned by an international body according to specific rules, usually based on the wishes of the discoverer(s), and are accompanied by a unique number reflecting the order in which the names have been assigned. Thus, the first asteroid discovered and named is w:1 Ceres, but the 1 is often left off. The IP describes it as Ceres being "short for 1 Ceres". I'm not sure, though, if the number is really part of the name, or whether it's added to the name to make the full official designation. In practice, the number seems to be left off quite a bit in normal usage.

A more serious issue is what language header should be used. It seems clear to me that something like 1 Ceres, as a scientific name assigned by an international body, is interlingual. I'm not sure, though, if we have- or should have- entries for any of these. The question then becomes: is the sense at Ceres referring to 1 Ceres English, translingual, or some combination of both? Is the number like the author abbreviations in taxonomic names: necessary to be included at least once in a publication for the name to be technically complete, but mostly left off? Or is omitting it a sign that it's not translingual in that use? How is the name handled in other languages where script or morphology are different than the norm in the language for things like taxonomic names? Chuck Entz (talk) 04:04, 23 August 2014 (UTC)

Besides, w:fr:(1) Cérès begins with these words: (1) Cérès (international designation (1) Ceres). We can see in w:Category:Asteroids that the parentheses are often used. JackPotte (talk) 08:31, 23 August 2014 (UTC)
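If it helps to check how often the number is actually dropped in running text, the designation forms mentioned here ("1 Ceres", "(1) Ceres", bare "Ceres") can be separated into number and name. This is a hypothetical Python sketch; it assumes the number, if present, comes first and consists only of digits, optionally in parentheses.

```python
import re

# Optional leading number, bare ("1 Ceres") or parenthesised ("(1) Ceres"),
# followed by the name itself.
DESIGNATION_RE = re.compile(r"^(?:\((\d+)\)|(\d+))?\s*(\S.*)$")


def parse_minor_planet(text: str):
    """Split a minor-planet designation into (number or None, name)."""
    m = DESIGNATION_RE.match(text.strip())
    if not m:
        raise ValueError(f"not a recognisable designation: {text!r}")
    number = m.group(1) or m.group(2)
    return (int(number) if number else None, m.group(3))
```

Counting how often citations yield `None` for the number would give some evidence on whether the bare name is the normal form in use.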

Migrating from Template:context to Template:cx

FYI: Wiktionary:Votes/2014-08/Migrating from Template:context to Template:cx. --Dan Polansky (talk) 08:40, 23 August 2014 (UTC)

Where was this discussed? —CodeCat 12:33, 23 August 2014 (UTC)
You could start discussing it now, instead of retreating into a procedural defense invoking those you routinely ignore. It does make it look like you have no particular substantive arguments.
The proposal seems sensible. {{context}} could be retained as a redirect, as it affords an option that may not require consulting the documentation for casual users. The shorter name would reduce the size of the database. Making {{cx}} the effective template rather than a redirect would eliminate one extra redirect call at the time of page loading. This seems like basic efficiency, even though efficiency is a secondary concern. DCDuring TALK 13:40, 23 August 2014 (UTC)
It's usually Dan who resorts to procedure, so I was only returning what he does to me all the time. Petty maybe, but satisfying. —CodeCat 14:44, 23 August 2014 (UTC)
Ok, but where are the substantive arguments now? DCDuring TALK 15:20, 23 August 2014 (UTC)

FYI: Wiktionary:Votes/2014-08/Templates context and label. --Dan Polansky (talk) 17:47, 23 August 2014 (UTC)

CFI Misspelling Cleanup

FYI: Wiktionary:Votes/pl-2014-08/CFI Misspelling Cleanup. --Dan Polansky (talk) 09:32, 24 August 2014 (UTC)

Format and layout of language categories

Yesterday I made some changes to how language categories like Category:English language display. Instead of some prose, there is now a table which displays the information in a more systematic format, and it also shows information that did not appear before, like ancestors and other names. I do think that the table is a good addition, but I'm not really happy with how it looks. So I wonder if we should keep it this way. Is there something we could do to improve it? Or was the original format better? —CodeCat 13:46, 24 August 2014 (UTC)

A few things: 1. Maybe you could merge this table with the one on the right- having 2 different tables makes the page look messy. 2. If there are no ancestors of a language don't display that row. 3. Can descendants of a language be shown as well as ancestors? 4. In Category:English language, for example, in other names there is "Hawaiian Creole English". But "Hawaiian Creole English" isn't another name for English- it's its own thing. DTLHS (talk) 20:59, 24 August 2014 (UTC)
We include not just names for the language, but also names of varieties that are subsumed under the same language on Wiktionary. Category:Dutch language also lists Flemish for example. Showing descendants would be much harder to do, and would involve basically searching through all the languages to find any that have the current language as their ancestor, and then repeat. —CodeCat 21:05, 24 August 2014 (UTC)
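The search described here ("searching through all the languages to find any that have the current language as their ancestor, and then repeat") is a transitive closure over the ancestor data. Inverting the ancestor mapping once makes each step a lookup rather than a full scan. This is a Python sketch under assumed data: the real data lives in Lua modules, and the ancestor table in the usage note is a simplified, hypothetical example.

```python
from collections import deque


def build_children(ancestors: dict) -> dict:
    """Invert a language -> ancestors mapping into ancestor -> direct children."""
    children = {}
    for code, parents in ancestors.items():
        for parent in parents:
            children.setdefault(parent, []).append(code)
    return children


def descendants(code: str, ancestors: dict) -> list:
    """All languages descended from `code`, nearest generations first (breadth-first)."""
    children = build_children(ancestors)
    result = []
    seen = {code}
    queue = deque(children.get(code, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guards against a language reachable by two paths
        seen.add(current)
        result.append(current)
        queue.extend(children.get(current, []))
    return result
```

With a toy table `{"ang": [], "enm": ["ang"], "en": ["enm"], "sco": ["enm"]}`, `descendants("ang", ...)` returns `["enm", "en", "sco"]`. Building the child index once per page render is what keeps this from being a repeated scan of every language.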
(e/c) This is somewhat orthogonal to your question, but the list of "Other names" highlights something I've been thinking about for some time, which is that it's both confusing for anyone looking at our data (e.g. people looking at WT:LOL, and now also people looking at Category:English language), and possibly even undesirable from a technical standpoint, that we conflate in the names= parameter both "X, another name of language Foo" and "Y, name of one dialect of language Foo which is subsumed under the header Foo". "Modern English" is indeed another name for "English"; anything that is ISO-code-en (i.e. post-1500) "English" can also, in linguistic context, be called "Modern English". That's very different from "Hawai'ian Creole English", which is not another name for English — not interchangeable with "English" — but merely the name of one non-independent (non-L2-having) dialect. So I wonder if we shouldn't split a dialects= parameter off from the names= parameter. (There will be a few edge cases where a name refers to both a dialect and the language itself. These could be handled by listing the name in both places, or by giving one parameter priority and saying e.g. "never list anything in the dialects= parameter which is already in the names= parameter".) - -sche (discuss) 21:13, 24 August 2014 (UTC)
I agree. And while we're at it, we might as well split off a separate value for the canonical name? —CodeCat 21:18, 24 August 2014 (UTC)
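A minimal sketch of the split -sche proposes, together with CodeCat's separate canonical name, might look like this. It is JavaScript purely for illustration; the field names `canonicalName`, `otherNames` and `varieties` and the helpers `normalize` and `tableRows` are hypothetical, and the code assumes the "names= takes priority" rule -sche suggests for edge cases. The table helper also omits empty rows, as DTLHS asked for the ancestors row.

```javascript
// Hypothetical split of the single names= list into three fields.
const english = {
  canonicalName: "English",
  otherNames: ["Modern English"],         // interchangeable with "English"
  varieties: ["Hawaiian Creole English"], // dialects subsumed under the English header
};

// Priority rule: anything already given as the canonical name or as
// another name is dropped from the varieties list.
function normalize(lang) {
  const taken = new Set([lang.canonicalName, ...lang.otherNames]);
  return { ...lang, varieties: lang.varieties.filter((v) => !taken.has(v)) };
}

// Rows a category-page table could show; empty rows are omitted.
function tableRows(lang) {
  const rows = [["Canonical name", lang.canonicalName]];
  if (lang.otherNames.length > 0) {
    rows.push(["Other names", lang.otherNames.join(", ")]);
  }
  if (lang.varieties.length > 0) {
    rows.push(["Varieties", lang.varieties.join(", ")]);
  }
  return rows;
}

console.log(tableRows(normalize(english)));
```

Listing an edge-case name in both fields and letting `normalize` drop the duplicate is one way to implement "never list anything in dialects= which is already in names="; the other option -sche mentions (listing it in both places) would simply skip that filter.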
Somewhat unsurprisingly, I don't like the change. --Dan Polansky (talk) 18:15, 25 August 2014 (UTC)
I agree that the box should be merged with the one on the right. --WikiTiki89 18:20, 25 August 2014 (UTC)
Should the content that is currently on the right be put to the left, right, top or bottom of the current left box? —CodeCat 18:28, 25 August 2014 (UTC)
I'm not sure, but the merged box needs to end up on the right. --WikiTiki89 18:34, 25 August 2014 (UTC)
After thinking for a moment longer, I decided that I think what is currently on the left should go on top of what is currently on the right. --WikiTiki89 18:35, 25 August 2014 (UTC)

Red link headline in new entries

When I create a new entry, the line under the part of speech is a red link even after saving the entry. See -obb. I am using the {{head}} template. Is this how it's supposed to work? --Panda10 (talk) 20:05, 24 August 2014 (UTC)

The fuck? I didn't think {{head}} was supposed to create links at all, just the pagename (or head= parameter if there is one) in boldface. —Aɴɢʀ (talk) 20:09, 24 August 2014 (UTC)
Panda10, no, this is not normal. The problem is, of course, in how the term is broken into parts.
BTW, a null edit makes it blue (because a null edit makes the module rerun, and on the second run it sees the suffix in the list of existing pages).
It is interesting that no one has noticed until now that suffixes and prefixes link to themselves. --Dixtosa (talk) 20:45, 24 August 2014 (UTC)
I think this must be fairly recent behavior. I suspect somehow it's treating affixes as two-term entries, like hot dog, where the two words are automatically linked. Only for some reason it thinks [[-obb]] is something like [[]][[-obb]]. —Aɴɢʀ (talk) 20:56, 24 August 2014 (UTC)
This is probably caused by recent changes to Module:headword that User:Wikitiki89 made. —CodeCat 21:03, 24 August 2014 (UTC)
Yes, this is my fault. I will attempt to fix it shortly. In the meantime, I am going to assume that this is not harmful enough to revert. Angr diagnosed it correctly. I was smart enough to remove the [[]], but not smart enough to remove the link from the [[-obb]]. --WikiTiki89 22:29, 24 August 2014 (UTC)
Done --WikiTiki89 23:06, 24 August 2014 (UTC)
But wait, I think a hyphen should still be a separator. Now it is not. --Dixtosa (talk) 12:14, 25 August 2014 (UTC)
If it isn't possible to add links when there are letters on both sides of the hyphen, while not adding links if the hyphen is the first or last character in the string, then better not to have automatic linking in words with hyphens at all, and add it manually where needed. —Aɴɢʀ (talk) 12:22, 25 August 2014 (UTC)
That is possible and fairly easy too. My first thought is one regex replace:
"text lol-lol2".replace(/([^\s\-]+?)(\-|\s)([^\s\-]+?)/g, "$1]]$2[[$3"), then wrap the result in "[[" and "]]" if at least one replacement occurs. --Dixtosa (talk) 13:03, 25 August 2014 (UTC)
It's possible, but it is not always wanted. Very often hyphenated words should be linked together. --WikiTiki89 14:56, 25 August 2014 (UTC)
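The rule Angr describes (split on a hyphen only when there are letters on both sides, never when the hyphen is the first or last character) could also be sketched by splitting on a capturing separator pattern instead of Dixtosa's replace-based one-liner. This is a hypothetical JavaScript illustration, not the actual Module:headword code (which is Lua); the function name `autoLink` and its `splitHyphens` option are assumptions, the option reflecting WikiTiki89's point that hyphenated words are often better linked whole.

```javascript
// Sketch only. Whitespace always separates words; a hyphen separates words
// only when a non-space, non-hyphen character is on both sides, so affixes
// like "-obb" or "un-" are never split.
function autoLink(head, { splitHyphens = true } = {}) {
  const sep = splitHyphens
    ? /(\s+|(?<=[^\s-])-(?=[^\s-]))/
    : /(\s+)/;
  // Because the pattern has a capturing group, split() keeps each
  // separator in the result array at the odd indices.
  const parts = head.split(sep);
  if (parts.length === 1) return head; // one word: leave it unlinked
  return parts
    .map((p, i) => (i % 2 === 1 || p === "" ? p : `[[${p}]]`)) // never emit [[]]
    .join("");
}

console.log(autoLink("-obb"));                              // -obb
console.log(autoLink("text lol-lol2"));                     // [[text]] [[lol]]-[[lol2]]
console.log(autoLink("lol-lol2", { splitHyphens: false })); // lol-lol2
```

The empty-string check is an extra guard so that, for example, a stray leading space can never produce an empty `[[]]` link like the one diagnosed earlier in the thread.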

IPA alphabet

Where did the symbols e̞, o̞ and ø̞ go? They are needed at least for Finnish pronunciations. It's a small language, but big in en-Wiktionary. --Hekaheka (talk) 04:37, 25 August 2014 (UTC)

Please check Module:IPA. Someone may add those symbols, if they are valid. --Anatoli T. (обсудить/вклад) 04:46, 25 August 2014 (UTC)
Heka may have been referring to MediaWiki:Edittools, from which the symbols were removed earlier this month and to which they were just re-added. - -sche (discuss) 04:52, 25 August 2014 (UTC)
I saw that too. I thought these IPA symbols would generate module errors, which would be the case if they were missing from the module. --Anatoli T. (обсудить/вклад) 04:55, 25 August 2014 (UTC)
-sche, thanks for re-listing them. --Hekaheka (talk) 05:05, 25 August 2014 (UTC)
It was User:Wyang, actually. --Anatoli T. (обсудить/вклад) 05:14, 25 August 2014 (UTC)
Anatoli, they shouldn't cause an issue, because they are in the module. They are [eoø] + "combining tack below" for lowered articulation. The latter symbol is in Module:IPA/data under "primary articulation". --Catsidhe (verba, facta) 05:12, 25 August 2014 (UTC)
OK, I haven't checked. --Anatoli T. (обсудить/вклад) 05:14, 25 August 2014 (UTC)
Why are they needed for Finnish? Does Finnish have three heights of mid vowels? In other words, are /e/, /e̞/, and /ɛ/ all separate phonemes? Or are [e], [e̞], and [ɛ] three distinct allophones of the same phoneme? Because there's no need to use the symbol e̞ at all, either allophonically or phonemically, unless you're already using both e and ɛ and need a third symbol that's distinct from both of them. —Aɴɢʀ (talk) 07:00, 25 August 2014 (UTC)
I can't explain this any better than it's currently done in Wikipedia: Finnish_phonology#Vowels. If I understand it correctly /e/, /e̞/, and /ɛ/ are indeed separate phonemes and Finnish happens to use /e̞/ and not /e/ as equivalent for the Latin alphabet "e". --Hekaheka (talk) 16:18, 26 August 2014 (UTC)
I don't think that it's only necessary to use the diacritic to indicate contrasts. If the phoneme is really /e̞/, why write /e/ which is less accurate? It would be a bit like writing /m/ instead of /n/ when a language has only one nasal consonant and its articulation is alveolar. Or writing /s/ when the language's only sibilant is postalveolar, or writing /p/ when the only labial plosive in the language is voiced. —CodeCat 16:41, 26 August 2014 (UTC)
The clause "if the phoneme is really /e̞/" doesn't make any sense. Phonemes aren't preassociated with IPA symbols. IPA symbols are convenient ways of representing phonemes and allophones, and there is always some wiggle room in their application. By longstanding phonetic convention, diacritics are to be avoided unless they illustrate some important distinction, and ordinary Latin-alphabet characters are to be preferred over modified ones whenever feasible. So if a language has only one front mid vowel, the convention is to use "e" to represent it. Using [e̞] by itself in fact doesn't make any sense, because [e̞] means "a sound more open than [e]", but if you don't use [e] in your transcription system of the language in question, then you haven't defined what sound [e̞] is more open than. You could say it's more open than the cardinal vowel [e], but in fact very few languages' vowels are located at exactly their cardinal value. (Neither English [i] nor German [i] is cardinal [i], though German is closer to it than English is.) So you have to define [e] in the context of your language before the symbol [e̞] is even meaningful. As for Finnish, Finnish phonology#Vowels does not say that /e/, /e̞/, and /ɛ/ are separate phonemes; it says that Finnish has a single mid front unrounded vowel phoneme which falls between cardinal [e] and cardinal [ɛ]. The authors of the Wikipedia article reveal their ignorance of how the IPA works by insisting on /e̞/ to mark that vowel, when by actual IPA conventions it should be transcribed /e/. —Aɴɢʀ (talk) 17:51, 26 August 2014 (UTC)
What he^ said. --WikiTiki89 18:04, 26 August 2014 (UTC)
If understanding of IPA relies on such conventions, then it kind of defeats the point of having a universally applicable and unambiguous transcription system. The reasoning here also seems a bit circular in a way. On one side, you say you need to define [e] before defining [e̞], but then what does it mean to say that the Finnish sound is between cardinal [e] and [ɛ]? Surely that in itself means that there are absolute reference points that the lowering symbol is relative to? So the way I see it, the question is whether the correct transcription is [e̞] or [ɛ̝]. If the sound is between them, then either symbol is appropriate. —CodeCat 18:21, 26 August 2014 (UTC)
The "cardinal" ones are defined (although I'm not sure how they are defined), but as Angr already said, very few languages have vowels that coincide with the "cardinal" values, so by CodeCat's logic, all languages should use a whole pile of diacritics on every vowel and every consonant. --WikiTiki89 18:28, 26 August 2014 (UTC)
See Cardinal vowels on how they're defined. Cardinal [i], [u], and [ɑ] are defined as the most high front vowel possible, the most high back rounded vowel possible, and the most low back vowel possible, and all the others are defined as being a certain acoustic distance between those. The IPA makes no claim of being able to represent every conceivable nuance in articulation (or acoustics) in every single spoken language in an unambiguous and universally applicable way. German Haus and English house sound quite different from each other, but both are, correctly, transcribed [haʊs]. —Aɴɢʀ (talk) 18:44, 26 August 2014 (UTC)
I realise that IPA is only an approximation. I see it as a kind of set: every symbol used on its own represents a certain set of possible articulated speech sounds. Diacritics narrow that set further down. But in the case being discussed here, it's not quite clear whether the sound belongs to the [e] set or the [ɛ] set, as it's equally distant to the cardinal value of both (as I understand it). So in this case using a diacritic seems warranted to clarify that the sound in question is an edge case. —CodeCat 18:54, 26 August 2014 (UTC)
Modern Hebrew and Standard Spanish are a couple of examples of other languages for which a sound right between [e] and [ɛ] is transcribed as /e/. --WikiTiki89 19:36, 26 August 2014 (UTC)
If the mid vowels of Finnish are midway between cardinal [e], etc., and [ɛ], etc., and our Finnish-speaking editors want to use the transcriptions [e̞], [ø̞], [o̞], then I think it would be unhelpful for people who don't speak Finnish to try to ban that notation from narrow transcriptions and mandate that the narrow transcriptions be less accurate than the Finnish-speakers want them to be. (The degree of specificity used in narrow transcriptions can and does vary from language to language, so the argument that we'd have to use many diacritics for other languages can be dismissed.)
In broad transcriptions I am inclined to accept that the vowels can be written /e/, /ø/, /o/ or /ɛ/, /œ/, /ɔ/ (de.WP uses the latter).
I would attach weight to how Finnish references transcribe the vowels. Perhaps Hekaheka has references on Finnish with IPA transcriptions and can say what symbols those references use. Checking Google Books, I find only a few English- or German-language references that give IPA-like transcriptions, but it's not clear to me that they are actually IPA and not just regular letters enclosed in IPA-like brackets:
  • Melvin J. Luthy's Phonological and Lexical Aspects of Colloquial Finnish (1973) speaks of "a syllable boundary between all low and mid vowels, e.g. [pa.eta], [kä.etä]."
  • Variation in Finnish phonology and morphology (1997) speaks of "the mid vowels /o, ö/".
- -sche (discuss) 19:47, 26 August 2014 (UTC)
If you ask me, this is an example of the bad Western-European-centric design of IPA. Most Germanic and Romance languages have at least four vowel heights, so for them it makes sense. But globally, far more languages have three heights than four. So having no symbol for a straight mid-vowel is quite a frustrating omission. —CodeCat 20:03, 26 August 2014 (UTC)
You're looking at it wrong. /e/ is what you're describing. /ɛ/ was only added to accommodate languages with four heights. --WikiTiki89 20:18, 26 August 2014 (UTC)
How do you know? —CodeCat 20:47, 26 August 2014 (UTC)
Because "e" has been used for "transcriptions" since Roman times, and they did not have a separate letter "ɛ". --WikiTiki89 20:50, 26 August 2014 (UTC)

I have relied on this source [21]. BTW, to my untrained ear, the Finnish "e" sounds about the same as English /ɛ/. --Hekaheka (talk) 21:42, 26 August 2014 (UTC)

But that's because English has no actual [e]. It has [eɪ] which is a slight closing diphthong, which probably sounds more like the Finnish [e̞i]. —CodeCat 23:05, 26 August 2014 (UTC)

Policy for Translations entries

Within the definition of most(?) English words, there is a "Translations" section giving corresponding words in various languages. (Like anything else with the word "translation" in it, this is not perfect, but can be immensely helpful.)

I guess there is a policy of not putting a "Translations" section on non-English words, which sounds reasonable since one can look at the English word, but there is an interesting class of exception to this rule -- when the English word does not exist. For example, 何番目 and wievielte are perfectly good Japanese and German words respectively, but the best we can do in English is the non-word whatth. (Personally I would have looked for "how manyth", but that isn't even a single word.) It seems to me it would be helpful to add the translations section to the foreign words, since otherwise a person who recalls that there is a way of saying this in at least two languages has no way of navigating from one to the other. Imaginatorium (talk) 10:08, 25 August 2014 (UTC)

I've thought about this too, and part of the problem is deciding which language to put the translation table in if we're going to add them for non-English languages. For example, a whole lot of languages have a single verb for "to be silent" (schweigen, zwijgen, taire, taceō, etc.), but English doesn't. I've often wanted to be able to put all of those words in a translation table, but where? If we allowed them in non-English languages, we'd have to have the same translation table in each one of the languages where this word exists, and that's a lot more than just the four I mentioned above. —Aɴɢʀ (talk) 11:20, 25 August 2014 (UTC)
Incidentally, to judge from Google Books, how manyth and/or how manieth might actually be attestable. —Aɴɢʀ (talk) 11:26, 25 August 2014 (UTC)
Another example would probably be "double eyelid" and "single eyelid". Wyang (talk) 12:56, 25 August 2014 (UTC)
The only way of adding these would IMHO be to allow sum-of-parts translation-only English entries. We have some of them already, see Category:English_non-idiomatic_translation_targets, but there is some risk that these entries will eventually be deleted after discussion on WT:RFD. See also the discussion on Wiktionary_talk:Criteria_for_inclusion#Translation_target. Matthias Buchmeier (talk) 17:54, 25 August 2014 (UTC)
Do you guys realize that we are not here to put everything that is interesting (even if it is linguistically interesting) into articles? Examples are this, dord (and the like), and some sum-of-parts terms that may have an interesting etymology or are so widely used that they have become a thing, but... they are still sum of parts.
Anyway, I'll throw out an idea (probably a stupid one, though xD): let's make a user script that will fetch (assuming JS can access external links) the translation table from the language's own Wiktionary, insert it into our article formatted the same way as in English entries, and also apply assisted adding of translations, which actually adds them to the foreign Wiktionary. In this way en:wikt runs the world :D of wiktionaries --Dixtosa (talk) 18:08, 25 August 2014 (UTC)
Hate to be a spoilsport, but we were not able to make automatic generation of entries work properly even within the English wiktionary. In principle we already have a way to find out translations of wievielter to other languages: one clicks "Deutsch" in the "In other languages" -list on the left side of the page and checks the translations in the German wiktionary. In the particular case of wievielter there's the problem that there's no article on it in de-wikt, but that would be a problem in any automated model as well. --Hekaheka (talk) 04:46, 27 August 2014 (UTC)
@Imaginatorium: I've made an entry for what number. --Anatoli T. (обсудить/вклад) 23:21, 25 August 2014 (UTC)
  • This could be solved by wrapping all such sum-of-parts English translations into a special template to track them in FL entries, and then generating (by bot) the appropriate entries in the Appendix namespace. This would really be an awesome feature: a cross-lingual glossary of terms otherwise not directly translatable to English, like that disputed list of terms that there was a big discussion about which I cannot find now (deleted?) containing "to call a mobile phone and let it ring once" and others. Some additional tags would be necessary though (e.g. part of speech). {{translation only}} is absurd: how am I supposed to know the meaning to translate if I knew only English and the missing FL? --Ivan Štambuk (talk) 15:19, 26 August 2014 (UTC)
    • From the entry name of course. —CodeCat 15:30, 26 August 2014 (UTC)
      Ha ha! Except that it's often not that self-explanatory - what number is barely even valid English and could be translated both as "what kind of" and "how much". Those translations referring to concepts that need entire English sentences to translate (like the notorious "to call a mobile phone") are too cumbersome to have as their own entries. --Ivan Štambuk (talk) 08:49, 27 August 2014 (UTC)

Proclitics in Hebrew

I saw the paragraph Wiktionary:About_Hebrew#Proclitics, and the entry הערב. This seems inconsistent, although I don't see any problem with having entries such as הערב. — Automatik (talk) 15:14, 26 August 2014 (UTC)

הָעֶרֶב is a different story, since it is idiomatic in the sense of "this evening/tonight". There are a few other similar cases, such as הַיּוֹם (today). The problem with including the non-idiomatic ones, is that there is a very large number of them and they add nothing useful to the dictionary. As I recently pointed out in the