Wiktionary:Beer parlour/2014/June

Template:cx vs. template:context

I prefer {{cx}} over {{context}}. Nonetheless, MglovesfunBot (talkcontribs) is replacing the former with the latter. Was there a discussion from which this action follows? Any links to the discussion? --Dan Polansky (talk) 09:25, 3 June 2014 (UTC)[reply]

They're equivalent, one is a shortcut to the other. So there is no problem with replacing one with the other. —CodeCat 11:35, 3 June 2014 (UTC)[reply]
Is there a discussion supporting this? They are not visually equivalent in the wiki markup; one is much shorter, so the actual context like "colloquial" stands out much more with it. --Dan Polansky (talk) 12:27, 3 June 2014 (UTC)[reply]

Prescriptivism as to common lay transliterations.

According to Wiktionary:Neutral point of view: "On Wiktionary, neutrality directly implies that a descriptive approach is taken towards the documentation of languages, and not a prescriptive approach. This is one of the primary tenets of how Wiktionary works". We do not adhere to this principle, however, when it comes to common lay transliterations (i.e. commonly used translations for terms originating in foreign scripts, created without necessarily having or following an authoritative scheme of transliteration). This is exemplified by entries like tovarich (English, really?), ayubowan (decided through RfD to be kept as an English word derived from Sinhalese), and the current discussion at Wiktionary:Requests for deletion#mahā.

We generally decide whether any unbroken string of letters is "a word" by looking to see if it is used in print to convey a consistent meaning. Our CFI is built around this principle. We do this because the existence of the word in print is what makes it likely that a reader will come across it and want to know how it is defined, or possibly how it is pronounced, derived, or translated into other languages. I see no reason consistent with our CFI or our NPOV tenet to exclude any unbroken string of letters used in print to convey a consistent meaning, certainly on the basis that this string of letters is not formed by some official arbiter of transliteration. I would propose that our current CFI and NPOV language requires that we include attested words created by lay transliteration, whether or not these words appeal to our own sense of propriety. bd2412 T 19:03, 3 June 2014 (UTC)[reply]

Even transliteration from Latin to other scripts? — Ungoliant (falai) 20:19, 3 June 2014 (UTC)[reply]
Like フロリダ州 and Флорида? bd2412 T 20:21, 3 June 2014 (UTC)[reply]
Those are written in their native scripts. — Ungoliant (falai) 20:29, 3 June 2014 (UTC)[reply]
Like гуд морнинг. — Ungoliant (falai) 20:38, 3 June 2014 (UTC)[reply]
Those are hardly their "native scripts"; they are transliterations which have been adopted into the language, probably through length of use. As for "гуд морнинг", that's a two-word phrase. I am referring to the "unbroken string of letters", so in addition to use, idiomacity of the transliterated phrase would need to be shown. Assuming it can be, the question then is whether "гуд морнинг" is used in print to convey a consistent meaning over a sufficient span. If it is, then it is entirely plausible that a reader might come across it and want to know its meaning. This is scarcely different than including non-Latin eye dialect entries like падонки and キレる (or Latin eye dialect like dayum and innerduce, for that matter). I assume that at some point someone will make an entry for тверкинг, also. bd2412 T 21:27, 3 June 2014 (UTC)[reply]
Yes they are; Katakana is a native script of Japanese and Cyrillic is the native script of Russian. フロリダ州 is a Japanese word that has been loaned from a language whose native script happens to be Latin, not a mere transliteration like гуд морнинг (or just морнинг, if you prefer), which is the English phrase (good) morning written in Cyrillic script instead of Latin. If you think that гуд морнинг occurs in Russian as a loanword from English, feel free to add it as a Russian. My question is: if your proposal is accepted, would we create things like an English entry for морнинг? — Ungoliant (falai) 23:12, 3 June 2014 (UTC)[reply]
Is морнинг even attested? I can see that it exists, but since I can not read Russian, I have no idea whether it is attested with the same meaning as morning in English, or whether the cites that exist are even uses as opposed to mere mentions. Assuming that all of these criteria are satisfied, and there are a CFI-worthy number of uses of морнинг in running text consistently conveying the meaning "morning", then we should have an entry defining the term for the benefit of the reader. Should it be defined as English? It seems absurd to call it Russian when it is merely an English word written in Russian characters. If a new kind of entry is required to accommodate the existence of such words, then we need to put one in place. We would not be able to claim that Wiktionary is a descriptive work, rather than a prescriptive work, if we were to pretend that "морнинг" did not exist, or conveyed no intelligible meaning. bd2412 T 23:59, 3 June 2014 (UTC)[reply]
I'm not quite following the purpose of this discussion. What is this for? Is it about allowing transliterations in various languages? We have allowed, on a limited basis, Roman transliteration of a few languages - after a vote or by consensus. I don't think we should spread this to any non-Roman-based language, unless they are a part of another language. New additions should be allowed after a vote and should never be on the same level as terms in the native script. --Anatoli (обсудить/вклад) 00:39, 4 June 2014 (UTC)[reply]
Wouldn't that require us to repeal the more fundamental policy of the above-quoted language of Wiktionary:Neutral point of view? After all, it is a purely prescriptivist position to exclude attested words (using our definition of "word": A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern). bd2412 T 00:52, 4 June 2014 (UTC)[reply]
Yes, believe it or not we don't just record every attested combination of letters ever put on paper (and yes, that means we have a point of view and that we aren't purely descriptivist). DTLHS (talk) 00:55, 4 June 2014 (UTC)[reply]
If a word is attested in a language, then it can be included in that language in that script, not its transliteration. Transliterations are also attestable but their usage is well - mere transliteration, when it's not possible or difficult to use proper native scripts or to teach the script or pronunciation. sijakhada is only transliteration of Korean 시작하다. It's useful and can be found in published books but its purpose is different. Transliteration is not a substitute for native scripts. --Anatoli (обсудить/вклад) 01:02, 4 June 2014 (UTC)[reply]
How are you defining "word" to permit that distinction? Certainly not the way it is defined in our own corpus. bd2412 T 01:10, 4 June 2014 (UTC)[reply]
It depends on the word and on the language, what writing systems are used in a given language. For example, "Я читаю книгу, а она смотрит телевизор." are Russian words (I'm reading a book and she's watching TV.), "Ja čitáju knígu, a oná smótrit televízor" is the transliteration of the Russian phrase, they are not words. यह लड़की बहुत सुंदर है । (yah laṛkī bahut sundar hai .) is a Hindi phrase (This girl is very beautiful.). Hindi is written in Devanagari. "yah laṛkī bahut sundar hai." is a transliteration of the Hindi phrase, none of these: yah, laṛkī bahut sundar hai are Hindi words and none of Ja čitáju knígu, a oná smótrit televízor are Russian words, even if it's a standard transliteration and can be attested. There can be plethora of standard, chat, practical, textbook, specific dictionary transliterations. Is that enough? I don't know if I can explain better. --Anatoli (обсудить/вклад)
How does this help our readers when they come across such things? None of this explains how, for example, laṛkī or televízor are not distinct units of language with a particular meaning. They have as strong a claim to being words as ндрав. There must be some way that we can assist readers who come across these in print (assuming they occur with a sufficient degree of attestation to meet our CFI) in understanding their meaning. If we can't, if readers must turn to some other resource to determine the meaning of these, then we're not fully functioning as a dictionary. I would not oppose stricter constraints on the number and type of references to be required for such things, or a form of presentation that makes it clear that the transliteration is not the native form, but there must be some way to avoid turning a blind eye to the existence of these lexical units and their potential to require defining for our readers. This is perhaps even more pressing in the case of a word like bahut, for which we have a definition that will only confuse and frustrate the person reading transliterated Hindi, if it exists. bd2412 T 02:13, 4 June 2014 (UTC)[reply]
It may not be possible to cover all possible transliterations, including scientific, phonetical, practical (čto vs što vs shto for Russian "что"), chatroom (Arabic "3iid mubaarak" for عيد مبارك instead of "ʿīd mubārak"), with or without stress indication (televízor vs televizor), with or without vowel length indication (mahā vs maha), suppressing unpronounced letters or transliterating as they are written (laṛkī vs laṛakī). If users are not able to separate proper words in a proper script from their transliterations/transcription or loanwords from the phonetic representation, they may not be able to use our dictionary. We have an advanced search facility and it's more of a technical question, rather than policy. --Anatoli (обсудить/вклад) 02:34, 4 June 2014 (UTC)[reply]
I don't see why this is any less possible than covering all untransliterated words (particularly if we are including common misspellings and eye dialect terms in multiple languages). However, my concern is focused on words found in books in print, and particularly to words used in running English text without italics, quotation marks, or other distinct presentations designed to indicate their "foreignness". I think the typical reader can be forgiven for not being able to separate those words from proper words in a proper script. Such a limitation would obviously substantially shrink the set of words to consider, probably excluding most of the examples that you have provided here. Would that bring us closer to a resolution where you would feel comfortable having some commonly used "unofficial" transliterations? bd2412 T 02:45, 4 June 2014 (UTC)[reply]
Wait, do I understand correctly that you've just said you want to take certain things that are "used in running English text without italics, quotation marks, or other distinct presentations designed to indicate their 'foreignness'" and include them as romanizations/transliterations rather than as English? If so, what basis do you have for saying that strings used in running English text without any indication of foreignness are not English, other than a prescriptivist view of what constitutes English? - -sche (discuss) 03:20, 4 June 2014 (UTC)[reply]
I want to include them, period. I'm not picky about how, but I think it's silly to treat mahā as an English word when it is being used no differently than महा would be, except that various authors have transliterated it so their readers will find it more familiar. bd2412 T 03:38, 4 June 2014 (UTC)[reply]

I still don’t get what is being proposed, BD. We include all attested words, whether their spelling is determined by standard or non-standard transliteration, transcription, or whatever else (which is probably impossible to determine in most cases). As far as I know, there is nothing written or unwritten preventing the use of ones “created by lay transliteration”. Tovarich, for example, might be based on a French transcription of Russian or something. It doesn’t matter. It is included as English because (I presume) it is used three times in English sources. Michael Z. 2014-06-05 09:33 z

  • I recently proposed to restore the previously deleted mahā, a widely-used transliteration of a Sanskrit word. In the discussion, some editors are opposing the inclusion of this word on the grounds that it is "not a word" because it is a transliteration from Sanskrit. I think it should be included in some form, and don't see why we include transliterations of Chinese and Japanese, for example as Chinese and Japanese, but will not include this at all (perhaps not even as English). bd2412 T 11:56, 5 June 2014 (UTC)[reply]
I have four separate issues about this. For one, I think when something like Moskva comes up in English text, whether we consider it English or Russian is not so important as the fact that someone might want to look up that spelling. This applies to relatively rare cases, with words that are sort of edging across boundaries. On the other hand, I'm not thrilled with the concept of citing linguistic transliterations into Latin for whatever reasons; between search and manual transliterations, users of such works should be able to find the words they're looking for, and it seems like that could lead to a huge boost in the number of words. On the flip side, I do believe that we should use scripts that the language is actually published in. Gothic, for example, is published in Latin script; I do not believe that non-trivial amounts of the language have ever been published in Gothic script. We should (and do) support Gothic in the script that it's published in, no matter what purism leads us to using the script it was once written in. I don't know enough about Sanskrit; I believe it can actually be published in any number of Indic scripts, and if people actually read Sanskrit in Latin script, then Latin script as well as all of those Indic scripts it's actually read in should have entries for Sanskrit. Lastly, I'm more worried about this the harder the script is. Russian or Greek transliteration shouldn't be hard to get back into its native script, but ideographs are generally going to be impossible, with Sanskrit being challenging for many users but not hopefully impossible.--Prosfilaes (talk) 01:03, 6 June 2014 (UTC)[reply]
I do find this concept that mahā is not a word and महा is to be confusing. They're both a unit of communication clearly corresponding to the same Platonic word. abba, 𐌰𐌱𐌱𐌰, and αββα are the same Gothic word, practically spelled with the same letters with a different font (and there's serious argument that the Gothic script is merely Greek with a few extra letters and should be treated that way[1]; in practice it's treated as Latin with a few extra letters, because it's Germanic and thus was handled by German philologists.)--Prosfilaes (talk) 01:03, 6 June 2014 (UTC)[reply]
The issue is not at all abstract to me. Maha is a disambiguation page on Wikipedia. I recently set about fixing the large number of links to that page, and discovered that most of them were from articles referencing mahā. Most did not provide the Sanskrit script, and the editors who wrote them may well have been unaware of their presentation Sanskrit script, because there are plentiful sources using only the latter. I initially thought maha would have the answer, but it did not. It took an extensive amount of poking around - and my ability as an administrator to see the content of the deleted page, mahā - for me to figure out what was going on here. bd2412 T 01:49, 6 June 2014 (UTC)[reply]
Yeah. I think part of the problem is that we have incredibly strict formatting and structure; we generally require that each word belong to a precise language, that each sense belong to a precise POS, that each quotation belong to a precise sense, that everything be templatizable and Luacizable and MewBottable six ways from Sunday. The strictness of structure not only appeals to the aesthetic sense of programmers (myself included), but also has practical benefits; but it also has drawbacks, one of which is that it does not allow us to do a great job capturing the messy reality of real language, and it forces us to adopt POVs when we would rather not (or at least, should rather not). —RuakhTALK 02:35, 6 June 2014 (UTC)[reply]
I would include this with Moskva then; whether or not mahā is English shouldn't get in the way of providing an entry for a word that people may look up.--Prosfilaes (talk) 03:14, 6 June 2014 (UTC)[reply]
+1 to what Ruakh said.
A few entries already use ==Undetermined== as their language header. We could start using that header more, although I'm not sure that's the best solution (or even a good solution). - -sche (discuss) 03:35, 6 June 2014 (UTC)[reply]
It's not so much that the language is undetermined as that we seem compelled to treat anything transliterated and placed in English text as English. Suppose we had headers that literally read, e.g., ==Transliterated Sanskrit== or ==Transliterated Sinhalese== - would that be an improvement? bd2412 T 17:57, 6 June 2014 (UTC)[reply]
Nah, for things like mahā, I don't think we need any new L2 headers; the best approach IMO would be to do what we've done for all other languages for which we've wanted to include romanizations: have a vote to allow romanizations, and then have entries using the language's usual header (in this case, ==Sanskrit==). ==Undetermined== might still be worth considering for a few other things, though, which actually do slip "between the cracks" of other languages. For example, Gott in Himmel occurs only in supposedly-German snippets (often italicized) in English works; it doesn't occur, or at least isn't truly related when/if it occurs, in German works. (But even with that, I'm not sure an ==Undetermined== header would be any less unsatisfactory than our current arrangement of calling it ==German== and/or the other obvious possibility, of calling it ==English==.) - -sche (discuss) 18:38, 6 June 2014 (UTC)[reply]
I would definitely support allowing transliterations of Sanskrit, generally, but that's a band-aid. There are still many other languages with common Latin-alphabet transliterations of forms originally written in other scripts - Russian, Ukrainian, Armenian, Arabic, Sinhalese, Hebrew, etc. This entire book collects stories in transliterated Sinhalese, peppered with English explanations. This one has Hebrew transliterated at length. If words in these running texts can be found having the same transliteration in other sources, do we need a separate vote to allow Sinhalese transliterations? Won't the end result be a series of votes allowing all attested transliterations? bd2412 T 19:25, 6 June 2014 (UTC)[reply]
Someone could start a series of votes, but I don't think all the votes would pass. For example, I'm sceptical there would be support for allowing romanized entries for Russian or Ukrainian (especially given how very many romanization schemes there are; cf. Wiktionary:Beer parlour/2013/November#Including_multiple_transliterations.2C_from_multiple_systems.2C_in_entries). Having someone type a romanization into our search bar and be taken to an entry which defines the string as a romanization of [whatever] is only one of several ways of getting that person to the native-script form of the word. Another is having the search function find the romanization in the native-script entry and bring that up as a search result. (In this case, a search for mahā brings up మహా, महत् and महा as search results. A user who sees those search results gathered on the search page should, IMO, have no more trouble figuring out which one she is looking for than a user who sees those things gathered into an entry.) For some languages, another way of having the user find the native-script form is having an appendix (here or on WP) detailing the conversion between various romanization schemes and the native script. When the issue of romanizations of Phoenician and some other dead languages came up, my impression was that one reason so many people were OK with allowing romanizations of those languages to have entries is that those languages are indeed dead, and no-one is natively writing them in any script, and modern discussion of them often is in Latin script. For a language like Russian, which people are still natively writing in the Cyrillic script, I could see users preferring to skip having entries for the romanizations, and either use the search-bar functionality, or just rely on users to use the appendix to convert between romanization and native script. That's why I think it's useful to discuss each language on its individual merits (Anatoli seems to feel the same way).
- -sche (discuss) 22:32, 6 June 2014 (UTC)[reply]
It seems to me that we are approaching two separate problems. Your take on this addresses the question of how we should provide readers with the ability to find the native-script entry for a particular romanization, which I grant is a reasonable question to consider. My take, however, is that if a "word" (broadly defined) is used in print to the extent that people might come across it and want it defined, we should have an entry for it, whether that word is violono or Portsmouth or mahā. The only remaining question is what form this entry should take. I am thinking, at this point, about drafting a proposal for a vote laying out a series of options (allow all such entries but treat them as loanwords, allow them and treat them as their language of origin, redirect them to the entry for the language of origin if there is a unique target, allow only select languages as approved individually by the community). bd2412 T 16:25, 7 June 2014 (UTC)[reply]

How is mahā different from fēngshuǐ?

We have an entry on fēngshuǐ which identifies it as a Mandarin romanization. However, it is easily possible to find examples of "fēngshuǐ" used in running English text. Why is this entry labelled Mandarin and not English? Should the existence of citations in English text lead us to have entries for both Mandarin and English "fēngshuǐ" (as we do for aloha)? If not, then why would a comparable entry on mahā be labelled English rather than Sanskrit? Why not both, since citations can be found both in running English and in long selections of transliterated Sanskrit? If we are not prescriptive, why do we care if there is an "official" transliteration system, so long as we know such a system is in use? With such inconsistencies in our coverage of transliterated terms, it seems to me that we should err on the side of at least covering well-attested phonemes. bd2412 T 00:36, 6 June 2014 (UTC)[reply]

This is why fēngshuǐ should be violently killed. Wyang (talk) 00:42, 6 June 2014 (UTC)[reply]
@BD2412 You have to be clear about what you're proposing, rather than demanding "I want to include them, period". If it's all about romanised Sanskrit or specifically the form mahā:
  1. Do you suggest allowing romanised Sanskrit entries? Please specify which transliteration standard and what format. Why do we need them? E.g. Devanagari is too hard to learn/to type, there are too many homophones (I seriously doubt that). AFAIK, it has to be decided by a vote, the way Mandarin standard pinyin and Japanese rōmaji were decided. Note that pinyin and rōmaji are soft-redirects and disambiguations; they have no definitions, merely links to standard Chinese and Japanese forms. It's not allowed to have a pinyin or rōmaji entry if no corresponding Han character (or Japanese kana) entry exists.
  2. If it's an English word (borrowed from Sanskrit), citations should be provided. --Anatoli (обсудить/вклад) 00:52, 6 June 2014 (UTC)[reply]
@Wyang. I respect your opinion on the matter but... Well, that was decided by a vote, which actually heavily reduced the role of Pinyin, not without a very strong opposition. Similarly, the vote on Romaji reduced the role of Romaji, rather than introducing it for the first time. There can be different opinions on the importance of Pinyin or Romaji but they serve as disambiguation of various homophones and favoured by other Chinese, Japanese editors. --Anatoli (обсудить/вклад) 00:52, 6 June 2014 (UTC)[reply]
Re "Why is this entry labelled Mandarin and not English?": this is a conflation of two questions; let me separate them and answer each.
  1. Why do we have an entry for fēngshuǐ labelled as Mandarin? Answer: in Wiktionary:Votes/2011-07/Pinyin entries, the community decided to allow pinyin romanizations "using the tone-marking diacritics, [...] whenever we have an entry for a traditional-characters or simplified-characters spelling"; that vote cited [[yánlì]] as an example of how such entries would be formatted (note the ==Mandarin== header). In a subsequent vote, the community decided that various Chinese lects, including Mandarin, would be unified under the header ==Chinese==. It appears that fēngshuǐ has not yet been updated to use ==Chinese== rather than ==Mandarin== as its header; this can be ascribed to the fact that Wiktionary is a work in progress and is not complete or finished yet.
  2. Why do we not have an entry for fēngshuǐ labelled as English? Answer: either because fēngshuǐ is not attested in English, or because Wiktionary is a work in progress and is not complete or finished yet. If fēngshuǐ is attested, unitalicized, in English text, then it is a loanword and we should have an English entry.
Re "why would a comparable entry on mahā be labelled English rather than Sanskrit": if mahā is comparable to fēngshuǐ in that it is attested, unitalicized, in English text, in a way that conveys meaning (which I am not convinced is the case), then we should have an English entry because it is a loanword from Sanskrit used in English. If mahā is comparable to fēngshuǐ in that the community has voted to allow romanized Sanskrit the way it voted to allow Chinese pinyin, then we should have a Sanskrit entry for mahā per that vote. However, the community has not AFAICT held any vote to allow romanized Sanskrit. The comments I've seen in the BP and at RFD have suggested that such a vote would pass, but no-one has stepped forward to draft such a vote. - -sche (discuss) 01:07, 6 June 2014 (UTC)[reply]
(Re ==Mandarin== header in Pinyin entries) Good point. There are a few things to change, though. Our main Chinese editor, Wyang, doesn't support pinyin, but we can still ask him to help change it, or get someone else to do it. Strictly speaking, Pinyin only applies to Mandarin (i.e. standard Chinese), but it can still have a ==Chinese== header, and the templates could display "Mandarin Chinese Pinyin reading of ...". --Anatoli (обсудить/вклад) 01:15, 6 June 2014 (UTC)[reply]
@Anatoli, I am not proposing any specific solution per se; this is a discussion to figure out a solution. We have a lot of things going on here - transliterations from some languages presented as words in those languages, transliterations from other languages presented as English words, entries for various languages of "eye dialect" spellings (and a well-attested lay transliteration is basically an eye dialect spelling of a word from another alphabet). I would like to see some solution arrived at that allows us to cover all words in the way that our own corpus defines words, and to provide the most accurate description of what kind of word it actually is (a Chinese word, a Sanskrit word, a Russian word). I'm fine with this being done by redirect, but that only works to the extent that the only meaning of the word is as a transliterated form from the target language. I'm not convinced that allowing Sanskrit solves the problem. There are many languages and scripts for which these kinds of transliterations are well-attested.
@- -sche, citations of mahā attested, unitalicized, in English text, can be found at Citations:mahā. More can be found with a Google Books search, but there are also many instances of mahā being used in running transliterated Sanskrit text. Also, suppose a word is well-attested but exists only in italics? Suppose it is attested only in blocks of transliterated text from the same language? In that case, it is still an attested phoneme, and one that a reader might look up in a dictionary. Cheers! bd2412 T 02:14, 6 June 2014 (UTC)[reply]
I'm not sure I share the view that we have "transliterations from other languages presented as English words". We have entries for loanwords, and we allow romanizations for certain languages. In some cases, a string is both a valid romanization in Chinese or another language and a loanword, but then there are two L2 headers, and only the loanword is [or: should be] presented as English. If you find an ==English== entry that is only a transliteration/romanization, RFV it.
The 1991 citation in Citations:mahā is of Mahā Yogi, not mahā; the 1910 citation is of Mahā Bhārata, the proper name of a work (possibly not includable: we have Iliad, but we don't have The Fault in Our Stars). The 2004 and 2007 and 2014 citations (yes, I added the last of those — it was the 'least unconvincing' one I could find at the time) are italicized. The 2009 citation does not have an intelligible meaning, IMO: "the classification of mahāyoga into three parts, starting with the [great] of [great]"? What does that mean? The 2013 citation is arguably a good "jib"-type citation, although it explicitly uses a somewhat different definition than other citations ("greater" as opposed to "great"). - -sche (discuss) 02:40, 6 June 2014 (UTC)[reply]
PS, you seem to be using a sense of [[phoneme]] that our entry lacks; does our entry need to be expanded? - -sche (discuss) 02:45, 6 June 2014 (UTC)[reply]
I should say an unbroken collection of phonemes (although this could be read as excluding monosyllabic constructions). We have tovarish and tovarich as English "loanwords", and many others like those, but they are really only thinly disguised transliterations. What other dictionary in the world includes them as words in English? (Okay, this one might, but still.). If we ignore the italicization issue, then there are a great many uses of mahā (Mahā Bhārata is the name of a work, but it is still composed of individual words, just like Holy Bible; we do have the more modern form, Mahabharata). bd2412 T 03:01, 6 June 2014 (UTC)[reply]
  • Wiktionary:Neutral point of view is not a policy. WT:CFI is the inclusion policy. WT:CFI does not forbid attested transliterations from being in the mainspace. The claim "transliterations are not words" is hogwash. No discussion or vote has been produced to show that there is a consensus for forbidding attested transliterations from the mainspace. WT:CFI does not trade in the term "native script". There is no policy concerning native and non-native scripts, AFAIK. --Dan Polansky (talk) 17:50, 8 June 2014 (UTC)[reply]
    • I think NPOV is beyond policy even; it is one of the five pillars (on Wikipedia), and it is my understanding that it is intended to run through all Wikimedia projects as a guiding principle. That said, I think that both NPOV and the CFI would permit inclusion of all attested transliterations (and I agree that it is nonsensical to call them "not words"), and that we should have a vote if we are going to come up with a specific scheme for the limitation of these, and for the presentation of those that are included. bd2412 T 03:48, 9 June 2014 (UTC)[reply]
Hogwash, nonsensical? If you're both so sure that no vote is necessary and transliterations are "words" and are already allowed by current policies, why don't you create some transliterated entries and observe the results? --Anatoli (обсудить/вклад) 04:08, 9 June 2014 (UTC)[reply]
If romanizations were by default allowed, we wouldn't have had one vote to allow romanized Gothic, and then a second (more successful) vote after the first was judged to have failed. The fact that such votes have been held for every language for which someone has wanted to include romanizations is, to put it mildly, normative. If you want to change the status quo — you're in the right place doing the right thing (having a BP discussion).
Incidentally, I find it notable that that first vote is the only vote on allowing romanizations which I recall failing, and it failed because people like User:Msh210 objected to the suggestion that we start allowing romanizations of just any old language. - -sche (discuss) 05:01, 9 June 2014 (UTC)[reply]
Devanagari is also a living script with a number of languages, apart from Sanskrit, using it. If, for various reasons, after votes, transliterations were allowed for some languages, reasons for keeping transliterations specifically for Sanskrit have not been presented yet in this discussion, if we don't count arguments "transliterations are already allowed by CFI" or "I want them included, I don't care how". It's not even clear if this discussion is specifically about Sanskrit transliteration or any transliteration for any language written in non-Roman script. --Anatoli (обсудить/вклад) 06:42, 9 June 2014 (UTC)[reply]
bd2412 I don't think you could be more wrong, to be honest! We're just applying our own rules and because the people who participate in RFD's vary, decisions aren't always consistent. That's no different to a court of law where it depends what jury you get! What you seem to be saying is "I don't like this community decision, please change it" with the emphasis on the "I" (that is, you don't like it). Renard Migrant (talk) 10:39, 9 June 2014 (UTC)[reply]
In fairness, aren't most proposals for change made by people who say "I don't like the existing state of affairs"?
I'm somewhat surprised by how bristly this debate is getting. I apologize if I'm complicit in that to any extent. - -sche (discuss) 13:05, 9 June 2014 (UTC)[reply]
The English word feng shui is normally written without tone marks, since tone marks make no sense in English. —Stephen (Talk) 13:21, 9 June 2014 (UTC)[reply]
If someone had, in the past, called a vote to include a common noun like cow in the dictionary, would that mean a vote is now required to include common nouns? There needs to be some standard by which a word is excluded before action needs to be taken to allow its inclusion. There is no actual rule being applied here that anyone has pointed to, and no definition of "word" that anyone has provided that would not include transliterations. The 2011 vote on allowing romanization of languages in ancient scripts, by the way, says nothing about the application of this rule to currently existing languages, and nothing about whether the words at issue meet the CFI. The illusion that an unwritten prohibition represents the "existing state of affairs" is about as reliable as the claims currently made in this dictionary that dagoba, khakkhara, and haramzada are actually words of the English language. I am not content to have us misleading our readers, either about our goal to have "all words in all languages", or about the actual language to which a word belongs. bd2412 T 14:14, 9 June 2014 (UTC)[reply]
ᛋᛏᚩᛈ ᛏᚱᚣᛁᛝ ᛏᚩ ᛗᚪᛣᛖ ᚢᛋ ᚠᛖᛖᛚ ᛒᚫᛞ ᚠᚩᚱ ᛏᚫᛣᛁᛝ ᛁᚾᛏᚩ ᚪᚳᚳᚩᚢᚾᛏ ᚦᛖ ᛠᛋᛁᛚᚣ ᚩᛒᛋᛖᚱᚠᚪᛒᛚᛖ ᚠᚫᚳᛏ ᚦᚫᛏ ᛚᚫᛝᚷᚢᚪᚷᛖᛋ ᚫᚱᛖ ᚹᚱᛁᛏᛏᛖᚾ ᛁᚾ ᚫ ᛚᛁᛗᛁᛏᛖᛞ ᚾᚢᛗᛒᛖᚱ ᚩᚠ ᛋᚳᚱᛁᛈᛏᛋ. ᛁᛏ ᛁᛋ ᚾᚩᛏ ᛈᚩᛁᚾᛏ-ᚩᚠ-ᚠᛁᛖᚹ, ᛁᛏ ᛁᛋ ᚾᚩᛏ ᛗᛁᛋᛚᛠᛞᛁᛝ, ᛄᚢᛋᛏ ᚳᚩᛗᛗᚩᚾ ᛋᛖᚾᛋᛖ. — Ungoliant (falai) 16:43, 9 June 2014 (UTC)[reply]
For the sake of convenience: running the above through Module:Runr-translit (with the language code "ang") gives "stóp tryiŋ tó maᛣe us feel bæd fór tæᛣiŋ intó accóunt þe easily óbserfable fæct þæt læŋȝuaȝes ære written in æ limited number óf scripts. it is nót póint-óf-fiew, it is nót misleadiŋ, just cómmón sense." The module may need some improvements. Keφr 06:53, 10 June 2014 (UTC)[reply]
On my computer, this shows up as a string of boxes. Obviously (like the transliterations I would propose to include) it would be much more helpful to the reader for this to be readable in Latin script. Has Wiktionary stopped caring about helping readers find definitions that they are likely to search for? bd2412 T 18:30, 9 June 2014 (UTC)[reply]
Languages may be written in a limited number of scripts, but I would practically bet that all of them have been written in the Latin script. I can try and think of possible exceptions, but if there are any, I'd be surprised if we had any vocabulary in any of them.--Prosfilaes (talk) 07:31, 10 June 2014 (UTC)[reply]
Re: "If romanizations were by default allowed, we wouldn't have had one vote to allow romanized Gothic, ...": This argument fails to distinguish attested transliterations from all transliterations. The vote allowed the inclusion of unattested transliterations (as long as the native-script form corresponding to the transliteration is attested), so went beyond the current CFI. Furthermore, the existence of a vote is no proof at all that there already exists policy that the voted proposal overrides; many a vote takes place in a policy vacuum, especially if it serves to confirm an uncodified common practice. --Dan Polansky (talk) 16:51, 9 June 2014 (UTC)[reply]
As I've said several times, the only real way to attest Gothic is in the Latin script. The Gothic script Gothic entries we have are most likely transliterations from the Latin script. There's a one-to-one correspondence, but still. If you find a book of Gothic text, it will be in Latin script. Unless you're into paleography, you will not find Gothic in the Gothic script. And no, I do not accept that citing published works transcribed from handwritten originals is acceptable for English but not Gothic.--Prosfilaes (talk) 22:13, 9 June 2014 (UTC)[reply]
Re: "If you're both so sure that no vote is necessary and transliterations are "words" and are already allowed by current policies, why don't you create some transliterated entries and observe the results?" This is just power talk: when the speaker cannot point to a policy--as he cannot--he instead threatens to delete transliterations, or have them deleted by another admin. There may be a consensus among admins that attested transliterations should be deleted (I don't know), but there is no policy supporting such deletions. --Dan Polansky (talk) 16:51, 9 June 2014 (UTC)[reply]
I'm not threatening anyone. It's you who is behaving as if there is nothing to discuss, as if it's a done deal because it matches your opinion on the matter and ridiculing those who disagree with you. There is no secret "agreement" among admins to talk about but entries in the wrong script (not native if you wish) are usually marked as such, converted to native script or deleted. --Anatoli (обсудить/вклад) 05:55, 10 June 2014 (UTC)[reply]
I dunno, Anatoli, denying the existence of the Deletionist Cabal seems like exactly the sort of thing a member of the Deletionist Cabal would do. I may have to report you to the Delete the 'Deletionist Cabal' Cabal... ;)   - -sche (discuss) 21:54, 10 June 2014 (UTC)[reply]
As for the Runic script (ᛋᛏᚩᛈ...) above, this entirely misses the point that we are talking about attested romanizations; this use of runes is not attested.
As for cabal and secrecy, the only people claiming something is done in secret are those claiming that romanizations are already forbidden from the mainspace, since they were so far unable to provide references to a public record showing consensus for their exclusion.
FYI, I have created Wiktionary:Votes/pl-2014-06/Excluding romanizations by default. --Dan Polansky (talk) 09:49, 15 June 2014 (UTC)[reply]

A vote on allowing entries for romanized Sanskrit[edit]

I have drafted a vote: Wiktionary:Votes/pl-2014-06/Romanization of Sanskrit. I used basically the same wording as previous votes used; suggest improvements if you have any. The vote can be postponed as much as necessary. - -sche (discuss) 13:32, 9 June 2014 (UTC)[reply]

This vote began a while ago, FYI. - -sche (discuss) 16:20, 23 June 2014 (UTC)[reply]

A vote on allowing all attested romanizations[edit]

I drafted a vote: Wiktionary:Votes/pl-2014-06/Allowing attested romanizations. I tried to reflect, as best I could, what was proposed in this thread (see my notes on the talk page); however, as I am an opponent of blanket inclusion of romanizations, I welcome people who actually support the proposal to reword it as necessary. I am particularly interested in your views on which of these citations could be used to cite/attest (=support the existence of) a romanization entry, vs which, if any, could not. In cases where I only bothered to type up 1 or 2 citations of a given string, please assume for the purposes of discussion that there are enough other citations available to add up to 3: what I am interested in is what kind of citation "counts" as "attesting" a romanization. - -sche (discuss) 07:33, 10 June 2014 (UTC)[reply]

FYI: Wiktionary:Votes/pl-2014-06/Excluding romanizations by default. --Dan Polansky (talk) 09:51, 15 June 2014 (UTC)[reply]
Both votes' start dates were a couple of days ago, so I have removed the 'premature' tags from both, and voting has begun on both. (Some users had suggested merging the votes, but none showed any interest in drafting a merged vote, so the votes remain separate.) - -sche (discuss) 16:23, 23 June 2014 (UTC)[reply]

What are romanizations for?[edit]

This seems a deeper question to me than what I've read so far in this thread. What are romanizations for, in the first place? If romanizations are intended to solve a problem, what is that problem?

From my past experience here and from what I've seen mentioned above, the core issue appears to be findability or discoverability. (NOTE: This is setting aside the entire question of attestations.) If a language is generally written in Script X, it can be difficult for EN WT users, who can only be assumed to be able to type using the Latin alphabet, to find entries in that script.

Assuming that users would want to be able to search for entries using the Latin script, do we need to have separate entries just for the romanized forms? Is it enough now to simply include the romanization right there within the entry in Script X? Can our search software now find such entries? My own brief testing suggests that yes, our search software can find an entry in Script X when searching on a Latin-alphabet string, provided that said Latin-alphabet string is included in the page. Importantly, this Latin-alphabet string can be provided by a template, and does not seem to be needed as-is within the wikitext. C.f. this search for kamau, which does find the 構う entry, even though the string "kamau" does not appear anywhere directly within the wikitext, and is instead provided by the {{ja-verb}} template.

So long as our search feature allows users to find Script X entries when searching on Latin strings, I fail to see any need for separate entries just for romanized forms that only serve to redirect users to the Script X entries. As such, separate romanized entries look like a solution in search of a problem. Is my understanding correct? Are there other problems that romanized entries are intended to solve? ‑‑ Eiríkr Útlendi │ Tala við mig 20:32, 10 June 2014 (UTC)[reply]

There's more to it than that. The search can find entries based on romanization, yes, but how well does it do that job? Is it obvious to readers how to provide their search query in such a way that the result they need appears first? How well does it perform if users simply type in the romanization and press "search", like they are accustomed to doing for other languages? What if the search term they entered exists, but does not lead them anywhere closer to their goal (and they don't realise there are really two types of search)? Furthermore, how practical is it to do this for every word they want to look up (which may be dozens if not hundreds per day)? All these are problems that having dedicated entries for romanizations will reduce if not eliminate altogether. A user looking for Gothic can now simply type dags in the search box, press enter, and be on their way. —CodeCat 21:15, 10 June 2014 (UTC)[reply]
"What are romanizations for" is not really a Wiktionary question, any more than "what are verbs for" or "what are place names for". Writers use romanizations to convey meaning, they are found in print, and they may be read by people who turn to Wiktionary to find out what meaning is intended. It is a dicey proposition to ask readers to rely on our search system. What are they to do when a romanization of a word from one writing system happens to closely match an entry from a word in another language? Look at the Hindi दक्षिण (meaning southern), romanized as dakṣiṇa, just a few diacritic variations away from dakšiņa, the Latvian word for "fork". Not all romanizations have distinct diacritics, even. There will be romanizations for which the word exactly matches some other existing word. What are readers to do then, if we have no entry for the romanization? bd2412 T 21:26, 10 June 2014 (UTC)[reply]
  • Re: "What are romanizations for" is not really a Wiktionary question, any more than "what are verbs for" or "what are place names for". -- there is no reason to be obtuse, this is entirely within the context of the EN WT and more specifically this very thread. To be entirely explicit, I'm asking, what are romanizations for, with regard to EN WT entries.
Re: not all romanizations have distinct diacritics, it seems you've presented one more obstacle, rather than reason for adding separate romanization entries. If someone searches for daksina, say, the system should show both the Hindi and Latvian -- as it already does. Having a separate entry at [[dakṣiṇa#Hindi]] won't change that situation at all, other than to require users to click through one more level of abstraction (the romanized redirection page) before finally getting to the entry page. ‑‑ Eiríkr Útlendi │ Tala við mig 21:46, 10 June 2014 (UTC)[reply]
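To make the collision in this exchange concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a diacritic-insensitive search folds strings by decomposing them and dropping combining marks; `fold_diacritics` is a hypothetical helper and does not describe how the actual Wiktionary search is implemented.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The two strings discussed above: a romanization of the Hindi word
# and the Latvian word, which differ only in their diacritics.
hindi_romanization = "dak\u1e63i\u1e47a"  # dakṣiṇa
latvian_word = "dak\u0161i\u0146a"        # dakšiņa

if __name__ == "__main__":
    print(fold_diacritics(hindi_romanization))
    print(fold_diacritics(latvian_word))
    print(fold_diacritics(hindi_romanization) == fold_diacritics(latvian_word))
```

Under this assumption both strings fold to the same plain-ASCII key, so a folded search on its own cannot tell the two languages apart; some extra signal, such as a language filter or language names in the results, would still be needed.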
Sorry, I was not trying to be obtuse, but we don't generally ask the purpose behind having any kind of entry for a word that meets the CFI. The purpose is always to help the reader define the word. With respect to the obstacle I raise, I actually came into this issue because I was disambiguating "mahā" at Wikipedia, and needed to find a Wiktionary entry to link to when it was being used as a dicdef. Typing "maha" into our search engine took me straight to the unhelpful entry, maha, which only has completely unrelated meanings from completely different languages. The only way that I was able to find the Hindi word at issue was to use my admin bit to look at the deleted entry, mahā. bd2412 T 21:55, 10 June 2014 (UTC)[reply]
Thank you for clarifying, I clearly had the wrong end of that stick. Your reply also makes it clear that I needed to clarify that I am thinking past the bounds of CFI: any term in any non-Latin script should ideally have some romanized form included in the entry for better searchability, regardless of whether that romanized term can be attested anywhere.
FWIW, clicking the magnifying glass icon to go to the secondary search and entering mahā gets me this. From there I see the महा#Sanskrit entry, but no independent listing at all for the Hindi entry that you were looking for. (The Hindi term does show up in the etymologies of compounds, such as महेश्वर.)
Perhaps then the root issue isn't necessarily that EN WT should include separate romanizations for the sake of having them, but that the EN WT search feature is still inadequate (doesn't allow specification of languages when entering search strings, doesn't show language names in results, ordering of entries seems a bit arbitrary, etc.), and adding separate romanization entries is one workaround. Is that a good restatement of this issue? ‑‑ Eiríkr Útlendi │ Tala við mig 22:07, 10 June 2014 (UTC)[reply]
Come to think of it, I don't know whether the word as used in Wikipedia articles is meant to be Hindi or Sanskrit. All I, as the lay reader, can know for sure is that several Wikipedia articles on topics in Hinduism use the word "mahā", and that many books in print also use this word. The search function is only as useful as our ability to convey the means of using it to the average user, who may have very little experience with Wiktionary. If at all possible, it might be helpful to generate a list of transliterations that theoretically would be made under this proposal, and see how many of those are blue links having existing articles with other meanings in other languages. bd2412 T 01:16, 11 June 2014 (UTC)[reply]

Oh, wow.

I have tried my best to go through all of this discussion and the related votes. It is absolutely confused, the terms are used to mean different things by various participants, and the proposals and examples are vague and undefined to the point of meaninglessness.

If you guys can’t come up with something even a bit cogent, I’m just going to vote against all of it. Michael Z. 2014-06-16 22:51 z

@Mzajac In my understanding, one rationale for allowing romanizations is "having entries for romanizations is the best way (or a good way) to get users to the native-script forms of words"; one rationale for excluding romanizations is that "romanizations are not the best way (or a good way) to get users to the native-script forms of words; it is better to have the search function find the native-script entries directly by finding the romanizations in them". An additional rationale for allowing romanizations is that "romanizations are words in the same languages as the native-script words they romanize, and as such merit inclusion" (so, e.g., in these citations, linked-to from the 'allow romanizations' vote, "svobodnyx" is [asserted to be] a Russian word); while a rationale for excluding romanizations is that "romanizations are not words, they are merely shadows of words, and as such are not intrinsically/automatically inclusion-worthy". Each side disagrees with the other side's rationales; each side further disagrees on whether the status quo is that romanizations like "svobodnyx" are allowed or excluded. - -sche (discuss) 23:10, 16 June 2014 (UTC)[reply]
It seems that now arguments are used like in Wiktionary_talk:Votes/pl-2014-06/Excluding_romanizations_by_default#Clarification_needed that if we exclude romanisations, then words like "judo" will be excluded. I agree with Michael that the whole discussion in various places is messy and confusing. If "... to get users to the native-script forms of words" is used then it should be clear they are soft-redirects and should only exist if native script entries also exist. BTW, instead of "svobodnyx", could you use the lemma form, rather than inflected, see свобо́дный (svobódnyj)? --Anatoli (обсудить/вклад) 23:40, 16 June 2014 (UTC)[reply]
Re "arguments are used [...that] 'judo' will be excluded": such arguments don't hold water, IMO. Re "свободный": If the vote to allow attested romanizations passes, and "svobodnyj" is attested, then both it and "svobodnyx" will be allowed, presumably pointing to свободный and свободных, respectively. - -sche (discuss) 23:53, 16 June 2014 (UTC)[reply]
OK. Are you able to provide an example of a romanised entry (which is not English but a romanisation only) for the vote(s)? Doesn't have to use any (final) templates but I need to see the structure. --Anatoli (обсудить/вклад) 01:48, 17 June 2014 (UTC)[reply]
I'm not one of the people who want to allow romanizations (I just drafted the votes, since it looked like no-one else had the time/intention to), so I'm all ears if either of them has a different idea of how romanizations' entries should look. However, in both the Sanskrit vote and the 'allow romanizations' vote, I specified that romanization entries "will contain only the modicum of information needed to allow readers to get to the native-script entry"; that's the same clause that was used for the vote on pinyin. For the Sanskrit vote, I spelled out that entries will look basically like the Gothic romanizations' entries (e.g. qino), except with "Sanskrit" (or in the case of svobodnyx, "Russian") headers instead of "Gothic". So, like User:-sche/svobodnyx. - -sche (discuss)
Thanks. I see. I am against romanisation entries (except for those, which are already allowed by a vote) but qino looks OK to me and User:-sche/svobodnyx is not. It goes way beyond "modicum of information...". If citations are required, they should be on the citations page, IMO. --Anatoli (обсудить/вклад) 02:58, 17 June 2014 (UTC)[reply]
Okay, I’ll quickly summarize some of what is frustrating me:
  1. Why does this only apply to romanizations? What about Cyrillizations, &c.? Is it because the intent is to serve readers of English-language texts? If that is the case, then why not only accept romanizations attested in English texts?
  2. The wording makes it sound like any term that can be considered a romanization is allowed/disallowed an entry based on this proposal. That’s silly. Thousands of entries are loanwords that are arguably romanizations, in a narrow or broad sense. The wording has to be more specific to account for this conflict.
  3. Why does this refer to romanizations at all? Are we talking about loanwords in English texts whose spellings may be considered transliterations or transcriptions? Does it mean foreign words that are being mentioned in English texts? In other Latin-alphabet language texts like German or Lithuanian? And if so, why not Cyrillicized, Arabicized or Siniticized forms too? Why this level of specificity?
  4. What about words attested in spoken sources? Why does whatever principle is being applied here apply only to the written word?
  5. The given examples don’t help determine what is being proposed (Citations:mahā, User:-sche/svobodnyx). Some of them are proper names used in English. Others are words in transliterated titles of works. Some are accompanied directly in the text with a gloss, indicating that they are foreign terms being mentioned, and not used in an English text. They appear in their sources for various reasons that don’t relate to any rationale for this proposal that I can divine.
Perhaps a Romanizations section in the native-script entry would satisfy whatever needs this is meant to satisfy. It’s already been discussed and sort-of accepted twice. It could be limited to attested romanizations, or to standardized romanizations, or perhaps to the superset of both. (Actually, a more general “Converted forms” section, including any foreign-language, other-script conversions or transcriptions, makes better sense.) Michael Z. 2014-06-17 22:28 z
  1. At the very beginning of this discussion, BD did propose allowing entries for all transliterations, even cyrillizations, etc. The proposal got narrowed early on to only romanizations (see the initial exchange between BD and Ungoliant, and then BD's comment to Anatoli at 02:45, 4 June 2014), as some of the people involved seemed to recognize that the idea of allowing entries for cyrillizations, arabicizations, etc was so much more controversial than the already controversial idea of giving entries to romanizations that it would probably act as a poison pill.
  2. I personally think the wording of the vote is clear, particularly when taken together with the examples, which show that the vote is to take sequences like svobodnyx (in the Latin script) and include them under ==Russian== (not e.g. ==English==) L2 headers. How would you change the wording to make it clearer?
  3. See point 1.
  4. Huh? Audio has sometimes been cited to attest words (e.g. in [[Qapla']] we record that that word was uttered in the film Team America: World Police). I don't see how it could attest a particular spelling or script. In the case of Qapla', we used common sense / Occam's razor to assume that the word was in the script in which the surrounding dialogue would normally be written (the dialogue was English, so: Latin script), just as we routinely assume books that seem to use words like "Москва" are in fact using those (Cyrillic) letters, and not using e.g. "Mоcквa" (a mix of Latin and Cyrillic letters).
  5. Re "Some of them are proper names used in English. Others are words in transliterated titles of works.": Yes. The main proponent of allowing romanizations, BD, has argued that those citations are still using the "words" to convey meaning; see his comment of 21:19, 10 June 2014 on the 'allow romanizations' vote's talk page, and this comment.
Now that we have Lua and transliteration is largely automatable (and now that Lua- and template-generated text is findable by our search engine), I agree that the inclusion of romanizations sections in the native-script entries is a reasonable idea, preferable to the idea of making entries for romanizations. - -sche (discuss) 23:27, 17 June 2014 (UTC)[reply]
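The mixed-script case mentioned in point 4 can at least be checked mechanically for typed text. The following Python sketch is only an illustration: `scripts_used` and `is_mixed_script` are hypothetical helpers, and taking the first word of each Unicode character name as the "script" is a rough assumption, not an established Wiktionary tool.

```python
import unicodedata


def scripts_used(word: str) -> set:
    """Collect the leading word of each letter's Unicode name, e.g. LATIN or CYRILLIC."""
    return {unicodedata.name(ch).split()[0] for ch in word if ch.isalpha()}


def is_mixed_script(word: str) -> bool:
    """True when the letters of the word come from more than one script."""
    return len(scripts_used(word)) > 1


# "Москва" written entirely with Cyrillic letters.
all_cyrillic = "\u041c\u043e\u0441\u043a\u0432\u0430"
# The same word with a Latin capital M substituted for the first letter.
latin_m_variant = "M\u043e\u0441\u043a\u0432\u0430"

if __name__ == "__main__":
    print(scripts_used(all_cyrillic), is_mixed_script(all_cyrillic))
    print(scripts_used(latin_m_variant), is_mixed_script(latin_m_variant))
```

A check like this only works on typed text. For a printed book the letters look identical, which is why the assumption described in point 4 is still needed there.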
Re: inclusion of romanizations sections in the native-script entries is a reasonable idea, preferable to the idea of making entries for romanizations. That's my point too. You don't need an entry for akīrtikara if अकीर्तिकर (akīrtikara) shows "akīrtikara" in the transliteration section. Further, if our searches allowed selecting specific languages, then finding a foreign-script term would be even easier. For transliterations with complex diacritics we should develop a reverse transliteration system, e.g. use "jaaGgala" to search for जाङ्गल (jāṅgala) as is used by the Spoken Sanskrit site. --Anatoli (обсудить/вклад) 01:55, 19 June 2014 (UTC)[reply]
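A rough Python sketch of the reverse-transliteration idea above, converting an ASCII query to its diacritic form. The mapping table is an assumption containing only the two correspondences implied by the "jaaGgala" example (aa for ā, G for ṅ); a real scheme would need a complete table, and `ascii_to_diacritics` is a hypothetical name.

```python
# Partial, illustrative mapping inferred from the single example above.
ASCII_TO_DIACRITIC = {
    "aa": "\u0101",  # ā
    "G": "\u1e45",   # ṅ
}


def ascii_to_diacritics(query: str) -> str:
    """Rewrite an ASCII query, always trying the longest mapping key first."""
    keys = sorted(ASCII_TO_DIACRITIC, key=len, reverse=True)
    out = []
    i = 0
    while i < len(query):
        for key in keys:
            if query.startswith(key, i):
                out.append(ASCII_TO_DIACRITIC[key])
                i += len(key)
                break
        else:
            # No mapping applies here, so keep the character unchanged.
            out.append(query[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(ascii_to_diacritics("jaaGgala"))
```

The converted string could then be matched against the transliteration already shown in the native-script entry, without a separate romanization entry.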
I am curious about your opinion of cyrillizations. I have imported these tables from Wikipedia's article on Cyrillization of Chinese (there is a similar table on Cyrillization of Japanese), and would like to see the entries made for the Cyrillic phonemes (and here I mean all of the one-syllable phonemes). bd2412 T 04:07, 19 June 2014 (UTC)[reply]
Well, (Russian) Cyrillisation of standard Chinese (Palladius) or Japanese (Polivanov) (there's also Korean (Kontsevich)) is only needed to understand how Chinese, Japanese or Korean people's or place names, concepts are usually written in Russian or can be written or transliterated for education (other Cyrillic-based languages have similar but less systematic standards). E.g. it's Хиросима (Hiroshima), not "Хирошима", "гоюй" (Guoyu), not "гуою", Korean "ханча", not "ханджа" - hanja. There are some exceptions (Токио, Иокогама, not Токё, Йокохама/Ёкохама - Tokyo, Yokohama) and traditional spellings (Пекин, not Бэйцзин - Beijing), variants (Аомынь not Аомэнь - Aomen, Macau). They can stay in appendices, I don't know if we need entries for them, unless they are words, like proper nouns, loanwords. I actually think that Cyrillisation is a bit of a misnomer here, they are rather Russifications, these systems are partially used in other languages or sometimes used as a base but they definitely won't fit for Ukrainian, Belarusian, Bulgarian, Serbian and Macedonian without changes. --Anatoli (обсудить/вклад) 04:30, 19 June 2014 (UTC)[reply]
An appendix would be suitable, so long as the appropriate redirects were made. Something would need to be done about the many existing entries for Russian words having different meanings that coincide with these Cyrillizations (if that's the wrong word, then Wikipedia's article needs fixing also). This leads me to wonder, by the way, are our romanizations of Chinese and Japanese characters and words universal to all languages using the Latin alphabet? bd2412 T 17:26, 19 June 2014 (UTC)[reply]
I suggest Cyrillization for Russian, etc., just like there are romanizations that are characterized as “international,” “English,” “German,” etc. Russification (Russianization?), Anglicization, etc., are very broad terms, and as far as I know, they don’t conventionally refer to script conversions. Michael Z. 2014-06-27 18:09 z


@Michael Z., my entire consideration here is that if there is a reasonable possibility that a reader will come across a "word" while reading books in print (and by "word" I mean something that the average person would look at and believe to be a word), and may want to find out things about that word (definition, etymology, pronunciation, etc.), then that word should be included in our corpus to offer these kinds of information. Whether there is a reasonable possibility is why we have a CFI. We should do these things because our goal is to provide "all words in all languages", and we do not have the limitations that other dictionaries have. Of course, words found only in spoken sources aren't going to be come across in print until some author chooses to write about and transliterate them. bd2412 T 17:23, 19 June 2014 (UTC)[reply]

Well, our framework treats “all words (used) in all languages,” that is, each term meriting an entry, as a word of a particular language. Specifically, every attested term used as a native or naturalized expression.
What you are proposing is broadening our general principle, and accepting every mention too (“all words ever mentioned”). This is a much bigger discussion than just “accepting transliterations,” and should be discussed as a fundamental change to the principles of CFI. Michael Z. 2014-06-27 18:09 z
Where in any policy do we define "word" as limited to a "native or naturalized expression"? That certainly is not our practice, which already includes tens of thousands of entries that do not fall within that limitation, including all Latinized entries from words with dead scripts or languages without scripts, all translingual terms, all of our existing romanizations of Chinese and Japanese words and characters, and many words like tovarish/tovarich that are in reality transliterations from other scripts, even if they are listed as "English". bd2412 T 18:26, 27 June 2014 (UTC)[reply]
  • It bears noting that all Japanese and Chinese romanized entries (and, I think, all romanized entries for Gothic as well, among others) are there purely to aid in finding the lemma entries. These are not, nor should they be regarded as, entries unto themselves -- they are no more than workarounds for our profoundly inadequate search features. ‑‑ Eiríkr Útlendi │ Tala við mig 21:55, 27 June 2014 (UTC)[reply]
Bd2412, Nope. At best, you are listing a set of explicit exceptions to the principle that I mentioned.
  1. Dead scripts handling is an explicit exception. They are largely used academically in romanized form.
  2. Translingual is the combining of entries for expressions used natively in more than one language – and arguably, many violate CFI because they are not terms, or words, or lexical items (are 3, , , 𝄇, and “words”, to be “defined” in dictionaries?).
  3. Chinese and Japanese romanizations are native forms used in Chinese and Japanese, are they not?
  4. Tovarish is an English word: I don’t understand how saying that such words “are in reality transliterations from other scripts” contradicts that – a naturalized or semi-naturalized borrowing remains a borrowing, regardless of what script the donor language relies on: tovarishes of the tsar is no less English than three cappuccinos. Furthermore, the argument for picking out only “transliterations” this way makes no sense. The set of spellings of the term tovarish is a result of the combined influences of transliteration, direct transcription, English transcription of utterances, and re-borrowings from other languages. For evidence, the OED entry’s citations include tavarisch, tovarisch, tovaritch, tovarich, and Tovarishch. A spelling is not a term, and the surmised source of a spelling doesn’t determine the identity of a term.
The whole proposal presupposes that “a transliteration” is a kind of term, but it is not. A transliteration is one expression of a term, derived from another expression of a term. Also, don’t forget that although we have a web page for every variant spelling, capitalization, hyphenation, or other orthographic form (form-of entries, “soft redirects,” &c.), our proper entries actually represent terms, not spellings. Michael Z. 2014-06-28 16:28 z

Civility and formatting of citations[edit]

Yesterday, I politely reminded Spinningspark at RfV to format cites per our guidelines when adding them to citations pages, rather than posting a string of unformatted links. Spinningspark told me to die. I told Spinningspark that such hostility was "uncalled-for and completely unacceptable," and in response I was informed "my comment was completely called for."

Such incivility is unacceptable and should not be tolerated. -Cloudcuckoolander (talk) 16:52, 5 June 2014 (UTC)[reply]

I agree. It's a very reasonable request to ask a user to format citations correctly. We inform new users when their formatting is off all the time, so an experienced user like Spinningspark should have no problem in handling that. His reaction was definitely not acceptable. —CodeCat 17:14, 5 June 2014 (UTC)[reply]
Right. If one is going to add citations to a citations page or an entry, one ought to take the time to properly format them. Otherwise post whatever Google Books/etc. links you find in the RfV discussion so that someone else can step in and do the formatting. Does someone also want to inform Spinningspark of this discussion? I don't want to do it myself due to involvement/conflict of interest. -Cloudcuckoolander (talk) 17:28, 5 June 2014 (UTC)[reply]
Strictly speaking, "go hang" is a relatively mild oath, and not one that I would equate with literally telling you to die. bd2412 T 17:34, 5 June 2014 (UTC)[reply]
Well, I don't think everyone who adds citations has to know how to format them correctly. Everyone has to learn. But I think if someone asks to format them, it's a very reasonable request that someone who is willing to cooperate with the Wiktionary community (and its practices) should have no problem with fulfilling. So if the user just flat out refuses in such a hostile manner, it's more or less implying that they don't want to take responsibility for their work, hence willful disruption of Wiktionary ("I know how to do it because I have been told, and I know that if I don't do it it gives others more work, but I'm still not doing it"). —CodeCat 17:39, 5 June 2014 (UTC)[reply]
I was not aware that "go hang" had an idiomatic meaning roughly equivalent to "take a hike" or "pound sand." To me it read as an instruction to "drop dead" or "go kill yourself." However you cut it, though, it's hostile language, and hostility (especially unprovoked hostility) is not conducive toward a collaborative project. -Cloudcuckoolander (talk) 21:13, 5 June 2014 (UTC)[reply]
You coming to the Beer Parlor to blow the whistle on a fellow Wiktionarian is uglier than Spinningspark's incivility, IMO. Grow a pair. --Vahag (talk) 17:54, 5 June 2014 (UTC)[reply]
We don't have any formal dispute resolution processes or noticeboards etc. like they do on Wikipedia (at least none of which I am aware). This is the only place I could think of to bring this incident to the community's attention, because the alternative, of course, was to ignore it, and incivility has been passively tolerated for way too long around here. Something needs to change. Preventing editors from getting tired of incivility and leaving is more important in the long-term than not stepping on anyone's toes by daring to call out unacceptable conduct. -Cloudcuckoolander (talk) 21:13, 5 June 2014 (UTC)[reply]

I always take the time to correctly format any kind of material that I place on an entry page to the best of my abilities. That is our "product" and is the part of the project on display to the public. That is expected of anyone and I am not an exception. However, RFV is a page requesting someone to find citations. I have found some, as requested. That is helpful. Me not formatting them and not putting them in the entry is no more unhelpful than anyone else not doing it. Just because I found some citations does not oblige me to do anything with them. This is a volunteer project and I am choosing not to volunteer.

The citations were placed on the citations page rather than the RFV page because I have repeatedly been asked to do that.[2][3][4][5][6] I am happy to go back to placing them on the RFV page if that is what is wanted, but I really don't understand this attitude that we really don't want to hear about cites unless they are perfectly formatted. SpinningSpark 23:44, 5 June 2014 (UTC)[reply]

  • Citations pages are a product too; they exist to show the reader examples of use in historical context. A citations page is not necessary to resolve an RfV discussion, but they tend to be made because the cites have been collected. Therefore, if you are only going to provide bare links, it is probably best to post them in the RfV discussion itself. If you are up to fully formatted cites, those don't need to be on a citations page, but may as well. bd2412 T 00:09, 6 June 2014 (UTC)[reply]
    The previous requests that the unformatted links be placed on the Citations page was an effort to put them one step closer to being usable in the face of Spinningspark's failure to format them as the rest of us do. In those locations a dump run could at least find them easily and someone else could clean up Spinningspark's mess. DCDuring TALK 00:42, 6 June 2014 (UTC)[reply]
    If I had been running around spoiling nicely formatted citation pages by dropping bare urls on to them then you might have a point, but I would never do that. Where others have already begun the work of producing a formatted page of citations then I am always careful to follow. But then, there is rather less need to find citations if others are already doing it. It is unfair to characterise me as a misfit who is refusing to conform to site standards. What I am really doing is the work of finding cites when no one else has the time, can be bothered, or wants to save the entry. Would you rather have the cites or not? "Fitting in", in the case of most of the entries I have responded to, would not be formatting the cites I provide, it would be doing nothing and leaving the page blank like everybody else has.
The real unacceptable behaviour here is the opprobrium that is being heaped on me for doing this work. I'm not looking to turn this into a dick measuring contest, but if someone were to do some stats of RFV I am pretty sure that a very significant percentage of the cites found in response to those requests would be from me. I should be thanked for fulfilling this task when no one else wants to do it, but instead I am being hounded and criticised for it. SpinningSpark 07:16, 6 June 2014 (UTC)[reply]
I agree that you deserve thanks for the work that you do. I do disagree with the rationale enunciated by DCDuring, and would still have bare links put in the RfV discussion rather than on a citations page, regardless. bd2412 T 11:47, 6 June 2014 (UTC)[reply]
@Spinningspark Relatively speaking, finding the citations is the fun part of the task of documenting our definitions. Formatting them, though not difficult and not painful, is definitely less fun. Using our various "quote" templates makes it much less tedious than it could be — but not fun. Therefore we follow the simple fairness rule of allocating to the person having the fun of finding each quotation the less fun task of formatting the quotation in the entry. This is simple plays-well-with-others schoolyard justice.
@BD2412 Because, 1., the final resting place of the citations will be in the entry or its citations page, 2., the definition being "cited" is always part of the problem and is always in the entry but not always in the RfV, and sometimes 3., because the context of multiple definitions in a PoS helps us better grasp a particular definition, it has always seemed more efficient to me to just insert the citations where they will ultimately be, thereby enjoying the benefits of seeing how citations fit in the entry, rather than in an out-of-context argument on the RfV page. DCDuring TALK 12:28, 6 June 2014 (UTC)[reply]
Exactly where is this guideline that says the person reporting a cite is allocated the work of formatting it? RFV is a page asking for citations. Providing them I would have thought is exactly what is wanted. You might consider that to be only half the job, but at least I have left the task only half undone when everyone else left it fully undone. Frankly, if such a guideline actually exists, or you manage to get one created, then I will probably unwatch RFV and not bother in the future at all.
As for fun, there are two extremes here. At one end is the case of wifty (which led to this thread in the first place). Cites for wifty are so easily found with the simplest of gbooks searches—no need for complex search terms or scrolling through pages of results, they just leap out at you—that I would have thought that all that was needed was to point that out for the RFV to be settled. I am sure if the proposer of the thread had done that search there wouldn't have been an RFV in the first place. In cases like that I really don't feel inclined, or see the need, to do a lot of work on the entry. It's these kinds of drive-by requests that are the make-work at RFV, not me finding cites for the make-workers. And playground responses along the lines of nah nah nah we're not listening unless you do it properly [7] are really not going to change my mind.
At the other extreme are obscure words that are really difficult to cite. A search term that zeroes in on that exact sense is hard to come up with and many pages of false hits have to be examined. It is not exactly fun to go through page after page of google results. It is quite satisfying when one finds something, but not fun. I am much more inclined to put cites in the entry under those circumstances (properly formatted of course) to preserve a record of what I found. If others feel I am hogging all the fun, then by all means jump in and find some cites yourself! SpinningSpark 16:27, 6 June 2014 (UTC)[reply]
Speaking of tiresomeness of formatting citations: some time ago I wrote a gadget (Quiet Quentin) which can assist in finding citations on b.g.c and format them accordingly. It is at the bottom of Special:Preferences#mw-prefsection-gadgets. The generated markup often needs manual corrections, but nevertheless the tool takes much of the burden away. Unfortunately, there will be probably no Usenet support, because I could not find a public JSONP API for searching it anywhere. Keφr 16:43, 6 June 2014 (UTC)[reply]
@Spinningspark re: "Exactly where is this guideline [] ". Certain aspects of behavior don't actually require documentation among well-socialized mammals (animals?). Humans usually pick up basic civility (tit-for-tat, golden rule, taking the good with the bad, etc) in the schoolyard, though application in new realms often proves challenging. Admittedly, the online environment seems to require more explicit norms than some other environments. And some folks may not have had good schoolyard experiences. DCDuring TALK 18:37, 6 June 2014 (UTC)[reply]
I didn't learn anything like that in my schoolyard. Maybe I learnt that if someone starts ordering you about then you should hit the fucker first before the fight starts properly, but nothing in etiquette that would actually be useful at a dinner party. SpinningSpark 23:27, 6 June 2014 (UTC)[reply]
That explains it. DCDuring TALK 23:45, 6 June 2014 (UTC)[reply]
This is a bit painful to watch: two people who unselfishly make massive, high-quality contributions to rfv fighting each other, and a general atmosphere of negativity in the comments posted by others.
Although I tend to side with Spinningspark on substance in this issue, he can be a bit rude at times- though I would say describing it as "extreme incivility" is a bit much. The paradoxical effect is that, of all the people who contribute searches to rfv without adding formatted cites to entries, Spinningspark is the only one consistently singled out for criticism for doing so.
I think everyone here needs to take a step back and look at the big picture: this is a community effort which requires the voluntary efforts of real people, with all their quirks and flaws, virtues and vices. Negativity tends to lead to more negativity, not to the kind of results we want. Too much negativity, and people stop participating.
I would say to Spinningspark: please consider biting your tongue every once in a while and being more civil, even when people make unreasonable or unfair demands. Otherwise, you're moving the focus away from what you're responding to and onto your response- letting the others off the hook.
I would say to those criticizing Spinningspark: don't look a gift horse in the mouth. Maybe Spinningspark should be more careful about what gets put on the citations page if that forces others to fix things, but Spinningspark shouldn't be criticized for not putting things there in the first place. Sure, someone has to prepare and enter the cites, but Spinningspark could just not contribute anything, and then someone would still have to prepare and enter the cites anyway, but they would also have to do Spinningspark's part, too. I don't see the point in criticism over this. It's one thing to make sure that people are aware that there's more to it than just finding the cites, but if someone already knows this and makes it clear they just don't want to do it, nothing good will come of making it an issue- criticizing a volunteer for not volunteering enough is more likely to lead to less volunteering, not more.
Not that I'm accusing anyone of hypocrisy: everyone who's participated in this discussion has contributed many times more than enough to the project to have every right to speak on this issue. Cloudcuckoolander, particularly, isn't appreciated enough for doing what Spinningspark has been doing, but doing all of it and doing it right. My point is that having the right and the standing to say something doesn't make it a good idea- if people see that their contributions will be met with criticism for not contributing enough, they'll just not contribute at all. Chuck Entz (talk) 19:44, 6 June 2014 (UTC)[reply]
I also wish Spark would play ball regarding formatting — and if he/she can't, then post citations to some other location where they can be fixed by us before adding to the entries (which won't help our already huge workload) — but I still think that conflicts and disagreements are something that happens, and something that human beings should be raised to deal with. Offence is taken, not given. Some people are rude; it may mean you don't want to deal with them very often; but most people in the world are not you. Usenet policed itself perfectly well long before the WWW, when there were no "moderators" or "administrators", just common sense and a majority who supported netiquette. I don't see incivility as a serious project problem unless it's abuse and threats. Equinox 21:02, 6 June 2014 (UTC)[reply]
@Chuck Entz — The choice of the description "extreme incivility" was the result of my initial thought that "go hang" was an instruction to die rather than an idiom roughly equivalent to "take a hike." I went ahead and changed the section title above.
If I'd been catty in my initial request to Spinningspark, I could understand receiving a defensive response. But I wasn't catty. I didn't complain about having to finish what he'd started. I thanked him for taking the time to collect cites and politely requested that in the future he remember to format cites before putting them on the cites page.
This was the first time I ever asked Spinningspark to format cites. I walked into this unaware that there is apparently a history of Spinningspark being asked to format cites. So I didn't know that my request might open an old wound, as it were. Being one of the relatively small pool of editors who regularly do legwork at RfV, I will admit that I've occasionally experienced a sense that the important contributions I'm making toward the project are going unnoticed. But I'm still receptive to polite requests to do things differently, and if I don't agree with a request, I try not to take it personally, and state my objections in a non-confrontational manner.
One of the concerns raised about Luciferwildcat was that he didn't properly format the entries he created, and this meant more work for someone else. This drove home to me that "if you're going to do something, do it right" is a philosophy by which Wiktionary operates. I don't think Spinningspark should be given a free pass in this regard. No, I'm not suggesting that Spinningspark is under any obligation to gather or format cites if he does not wish to do so, but if you are going to post citations in an entry or on a citations page, you are obligated to do it properly. Asking someone to correctly format cites when posting them to a citations page is not an "unreasonable or unfair demand."
And, frankly, I find it troubling that you would request we refrain from calling out Spinningspark on the grounds it might drive away a valuable contributor, and yet not take into consideration that having to endure hostility might also drive away valuable contributors. Do I really need to point out that I am also a volunteer and am under no obligation to "grow a pair" as Vahag suggested above and tolerate hostile treatment? -Cloudcuckoolander (talk) 17:36, 7 June 2014 (UTC)[reply]
I think we all feel like our valuable contributions go unnoticed. How often do you see any contributor here thanking another for anything? Maybe it's because of the small size or relative wonkiness of this community, but we don't really have the culture of appreciation that Wikipedia tends to have. There are some rough edges here, and there are likely to continue to be, but all that said, we have managed to build one hell of a dictionary. bd2412 T 18:02, 7 June 2014 (UTC)[reply]

Translingual entries and Chinese entries in the new format

We have tens of thousands of Mandarin, Cantonese, etc. entries that lack definitions and whose ===Hanzi=== or ===Han character=== sections only request them, whereas the Translingual entries have imported definitions that are rather generic and vague and obviously carry no part-of-speech information.

As a first step, I suggest to merge all or most single-character definitionless Chinese (i.e. currently Cantonese, Mandarin, Hakka, Min Nan, Wu) entries using this entry (lǎo) (see this revision) as an example. The last edits removed ==Cantonese== and ==Mandarin== sections with nothing particularly useful (only request for definitions and transliterations, which are not lost), created ==Chinese==, merged definitions requests (with {{defn}}) into one language, all definition requests would now be in Category:Chinese definitions needed. If this format is accepted, perhaps a bot might bring all similar entries into this format (or similar, depends what readings/topolects are available, whether Old Chinese and Middle Chinese information can be obtained). Please comment (hopefully without trolling about "destruction of Sinitic languages", we are not destroying anything). Another point is, the example entry uses ===Definitions=== header, which is still under discussion in the 2014 May page. Since not only definition is unknown but PoS, ===Definitions=== may be the best choice in this case.

@Wyang please join. --Anatoli (обсудить/вклад) 03:51, 6 June 2014 (UTC)[reply]

And this is an example of a single-character entry WITH definitions - this revision of (huǎng). @Bumm13 I have edited just after you, please check. --Anatoli (обсудить/вклад) 04:17, 6 June 2014 (UTC)[reply]
The principle seems good to me. The technical side could be very challenging after 12 years of scattered edits. Wyang (talk) 07:27, 6 June 2014 (UTC)[reply]
It's understandable, partial recreation of entries may be required. Entries without definitions and sometimes erroneous transliterations have little value. --Anatoli (обсудить/вклад) 08:08, 6 June 2014 (UTC)[reply]
Spot-checking a few of the approximately 30 000 Chinese-character entries which use {{defn}} leads me to suspect that even after all these years, a large number of them have only been edited by bots making changes to all of them en masse. So, don't be discouraged from trying to sic a bot of your own on them ... the percentage of them that follow a predictable format and thus can be cleaned up by bot may be larger than you expect. :) - -sche (discuss) 05:17, 9 June 2014 (UTC)[reply]
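To illustrate what "a predictable format" might mean for such a bot, here is a minimal sketch in Python. It assumes that a lect section counts as mergeable when every definition line in it is only a {{defn}} request. The lect list is copied from the proposal above. The function names and the sample wikitext are hypothetical and simplified, not taken from a real entry or an existing bot.

```python
import re

# Lect names copied from the proposal; treat this list as an assumption.
CHINESE_LECTS = {"Cantonese", "Mandarin", "Hakka", "Min Nan", "Wu"}

# Matches level-2 headers such as ==Mandarin== but not ===Hanzi===.
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def split_language_sections(wikitext: str) -> dict[str, str]:
    """Map each level-2 language header to the text of its section."""
    matches = list(L2_HEADER.finditer(wikitext))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[m.group(1).strip()] = wikitext[m.end():end]
    return sections


def is_definitionless(section: str) -> bool:
    """True if the section has definition lines and all are {{defn}} requests."""
    def_lines = [ln for ln in section.splitlines() if ln.startswith("#")]
    return bool(def_lines) and all(
        ln.lstrip("# ").startswith("{{defn") for ln in def_lines
    )


def mergeable_lects(wikitext: str) -> list[str]:
    """List the Chinese lect sections that contain nothing but definition requests."""
    return sorted(
        name
        for name, body in split_language_sections(wikitext).items()
        if name in CHINESE_LECTS and is_definitionless(body)
    )


# Simplified, made-up sample page.
sample = """==Translingual==
===Han character===
# old

==Cantonese==
===Hanzi===
# {{defn}}

==Mandarin==
===Hanzi===
# {{defn}}
"""

result = mergeable_lects(sample)
```

A real run would still need to carry over the readings and transliterations before removing any section, which this sketch does not attempt.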

Tone and register used in Wiktionary content

On my talk page here, another editor complained that we should use "whom" because it's "correct", which I disagree with for the reasons I gave in the discussion. But I wonder what others think of this. We have occasionally had complaints that definitions are terse and complicated, or use words no longer in normal use (thou is notorious), and they can't easily be understood. So I think that we should avoid using language that is no longer widely used in speech. —CodeCat 10:43, 7 June 2014 (UTC)[reply]

Thou is in a different class from whom. Whereas "thou" is genuinely archaic and is never used except when one desires to be deliberately archaic, "whom" is still used somewhat extensively in formal contexts. Just my 2p. If I may not take part in this discussion, I shall abstain from doing so. Velociraptor888 10:49, 7 June 2014 (UTC)[reply]
While I agree that "for who" is not an error, it is less formal and therefore less appropriate than "for whom" in the context of a dictionary definition. Although we are a descriptive and not prescriptive dictionary in our scope, our readers have a right to expect a certain degree of formality and adherence to the prescriptive rules of edited written English in our content. —Aɴɢʀ (talk) 11:27, 7 June 2014 (UTC)[reply]
We have every right to be prescriptive for ourselves, and I completely agree that we should use formal language (including whom) in our definitions (but not necessarily usage examples). --WikiTiki89 14:40, 7 June 2014 (UTC)[reply]
Formality, yes; words [ie, lemmas] less frequent than the top 30,000 or so of current English, not unless the definiens itself is technical. DCDuring TALK 16:33, 7 June 2014 (UTC)[reply]
The overall frequency of whom is irrelevant. What matters is its frequency in the place where you would expect it, which in informal settings is probably nearly 0%, but in formal settings I'd expect it to be around 90%. --WikiTiki89 17:25, 7 June 2014 (UTC)[reply]
I've never used it, no matter what the context. It feels very pretentious and unnatural to me. —CodeCat 17:31, 7 June 2014 (UTC)[reply]
Yes, but you live in the Netherlands. --WikiTiki89 19:28, 7 June 2014 (UTC)[reply]
So when you have a sentence that would be ungrammatical if it used who (e.g., "an unrepentant criminal on whom the court imposes an additional penalty"), do you just make it ungrammatical, or do you rephrase it to avoid whom? —RuakhTALK 00:11, 8 June 2014 (UTC)[reply]
I use who. Although I think I would rephrase it anyway, not to avoid whom but just because it sounds more natural: "an unrepentant criminal (that/which) the court imposes an additional penalty on". whom just isn't in my normal vocabulary at all, it feels more or less like an archaic synonym for who, and so who doesn't strike me as ungrammatical in the slightest. It's simply my own feel for the language as a native speaker; it's how I learned to speak and have always spoken and been spoken to in English. —CodeCat 01:37, 8 June 2014 (UTC)[reply]
Then you should consider editing your user-page to remove your claim that you're a native speaker. I do not think *" [] criminal on who the court [] " is English. (And I think that this use of "which" is actually archaic, unlike "whom" which is merely formal; but I'm not sure, it may just be dialectal.) —RuakhTALK 07:18, 9 June 2014 (UTC)[reply]
That sounds like a w:No true Scotsman argument to me. —CodeCat 10:02, 9 June 2014 (UTC)[reply]
Well, sure. English is defined as the native language of native English speakers, and native English speakers are defined as those who have English as their native language. Your English is excellent, but if you've internalized a rule that it's always grammatical to substitute "who" for "whom" without making any other changes, then I believe you are, in that respect, at variance with native speakers. (Of course, English has many dialects, and your difference from an actual dialect seems to be less than many dialects' differences from each other. I'm not saying "OMG yur t3h suck", I'm just saying that I think you're wrong about this point of usage.) —RuakhTALK 17:55, 9 June 2014 (UTC)[reply]
I consider myself a native speaker of Russian. However, since I did not grow up in Russia (but in the US), many things that sound outdated or archaic to me, I am often surprised to find are actually still in common use in Russia. This does not mean I'm not a "true" native speaker, but just that I grew up in a different environment and therefore cannot always judge what is archaic or not in Russia itself. --WikiTiki89 13:42, 9 June 2014 (UTC)[reply]
That's fair. I recently had a similar realization for Hebrew. (Though unlike you and CodeCat, I've chosen not to list my Hebrew as "native" on my user-page.) —RuakhTALK 17:55, 9 June 2014 (UTC)[reply]
I think the issue isn't whether we should use "who" or "whom" in entries, but whether you should be reverting someone for making what you consider to be the wrong choice. Sure, I've reverted edits that added an obsolete 16th-century term to a definition- but that's because it would make it harder for an ordinary person to understand. "Whom" is different: it's still in use, especially in more prescriptive contexts. People may not identify with those who use it, and they may not remember the rules about when to use it, but they still understand it just fine.
With all of the truly awful stuff that gets added to entries, I'm simply not going to bother with borderline issues like this. If someone wants to replace "who" with "whom", or vice versa, I leave them alone unless it looks like it's going to degenerate into an edit war (sometimes I'll revert pondian color/colour edits for that reason). The same goes for hyphens vs. dashes, ending sentences with prepositions, etc. Unless it's going to make it harder to understand, or it's going to make many of our readers cringe (e.g. possessive "it's"), I let people do what they think is best. Chuck Entz (talk) 20:07, 7 June 2014 (UTC)[reply]
Re who/whom: These ngrams suggest whom is still dramatically more common than who as the objective case of who following a preposition. Anecdotally, I've often seen objective-case use of "who" deprecated, while I've never until now heard anyone oppose "whom". Therefore, I'd say that while contributors can create entries with whichever of "who" vs "whom" they personally feel more comfortable with, I wouldn't revert someone who cleans up the "who"s to "whom"s. PS, "whom" passes DCDuring's test: oxforddictionaries.com lists it as one of the top 1000 words in its corpus, and in the Corpus of Contemporary American English it's the 1021st most common word. - -sche (discuss) 20:21, 7 June 2014 (UTC)[reply]
I'm not surprised. I even feel that "The person to who it was given" sounds really awkward since a formal context would use "whom" and an informal context would use a dangling preposition. --WikiTiki89 06:56, 8 June 2014 (UTC)[reply]
In the edited works at Google N-gram "who it was given to" does not appear whereas "whom" variations with and without dangling "to" do. In speech I'd expect the absent version to be more common.
In any event "whom" would seem to be required to give the impression that Wiktionary definitions are written in English. DCDuring TALK 12:12, 8 June 2014 (UTC)[reply]
Steven Pinker mentions whom specifically in his book The Language Instinct. He says it's a relict of a dying Germanic case system and shouldn't be mandatory. My words now not his, but try objecting to someone who doesn't use the -est ending for second-person singular forms like sayest and talkest. They'll probably say that's ridiculous, to which you say it's the same principle just -est forms died out longer ago. Renard Migrant (talk) 10:29, 9 June 2014 (UTC)[reply]
I must disagree with Pinker. Are we to similarly discard him, them, her, me? ‑‑ Eiríkr Útlendi │ Tala við mig 20:42, 9 June 2014 (UTC)[reply]
I have learned that whom is BE, and AE uses who exclusively. -- Liliana 11:21, 9 June 2014 (UTC)[reply]
AM uses whom in formal literary language, but who in spoken language. Usually when someone uses whom in spoken language, they are being humorous. —Stephen (Talk) 12:02, 9 June 2014 (UTC)[reply]
I doubt there is much actual difference between American and British English on this, despite common perception (but I may be wrong). --WikiTiki89 13:42, 9 June 2014 (UTC)[reply]
Whom is used in spoken American English, but not in some situations, especially where there is a separation between the preposition or verb of which who/whom might be the object. Usages such as "Who did you see?" and "Who did you give it to?" are very common in spoken American English and not rare in written American English. I suspect that other varieties of English show a similar pattern, though frequency may differ. DCDuring TALK 16:38, 9 June 2014 (UTC)[reply]
As another anecdotal point on the graph, my experience of US English matches DCDuring's. ‑‑ Eiríkr Útlendi │ Tala við mig 20:42, 9 June 2014 (UTC)[reply]
MWDEU devotes 2 full pages to use of who and whom. It notes object use of who and subject use of whom. They cite Shakespeare for instances of all of the four main uses. They specifically say that there is not evidence of the decline of whom in written English. I don't think that Garner's (2009) is entirely accurate, but they strongly defend objective whom and nominative who, except in intentionally casual writing. I commend any good descriptive style book or grammar from Jespersen and Mencken to the present for details on use in both speech and writing. DCDuring TALK 23:52, 9 June 2014 (UTC)[reply]
My experience in CanE is similar to DCD’s.
Let’s avoid any construction that would look like an error to readers. It’s distracting, possibly confusing, and probably harmful to readers’ confidence in the dictionary’s reliability. Please use whom where it is conventionally used in formal writing. Michael Z. 2014-06-16 20:53 z

I created this module to replace {{catboiler}}. Right now it does exactly the same as the template did, so nothing has changed when it comes to creating new category names for existing templates like {{poscatboiler}}. The module just uses the existing subtemplates, there is no data module yet. So for most editors nothing has really changed, but I just wanted to let them know anyway. —CodeCat 22:39, 9 June 2014 (UTC)[reply]

boldfaced forms of invariant lemmata in headword lines

(Already mentioned at module talk:headword; now I'm bringing it here, where it belongs.)

At [[bonefish]], {{en-noun|bonefish|bonefishes}} displays

''plural'' <span class="form-of lang-en plural-form-of"><b class="Latn" lang="en"><strong class="selflink">bonefish</strong></b></span>

rather than

''plural'' <span class="form-of lang-en plural-form-of"><b class="Latn" lang="en">bonefish</b></span>

as it (IMO) should, because it 'tries' to link to the plural. I propose this be changed. Thoughts? (I guess function format_parts can be modified somehow to fix this.)​—msh210 (talk) 17:38, 11 June 2014 (UTC)[reply]

As I mentioned to you in the talk page, it's not the module that is responsible for the extra "selflink" tag, it's the wiki software itself. It does that whenever you link to the current page, like here: Wiktionary:Beer parlour/2014/June. —CodeCat 17:45, 11 June 2014 (UTC)[reply]
Hrm... doesn't adding an anchor like "#English" stop the software from thinking that it is in fact linking to the same [exact place on the] page, like here (where only the last link is bolded)? I presume headword templates add such anchors, so er... why are the links still bolded? (And why is it a problem that the invariant plural is bolded, when the '-es' plural is also bolded? I think either bolding all plural forms or bolding none of them is good. Am I just totally misunderstanding what's being discussed here?) - -sche (discuss) 18:06, 11 June 2014 (UTC)[reply]
It does, but that anchor is not added if the final parameter to full_link is provided. —CodeCat 18:22, 11 June 2014 (UTC)[reply]
Well, yeah, of course. But it's the module that links! I'm proposing that it not do so when the form is the same as the lemma.​—msh210 (talk) 23:05, 11 June 2014 (UTC)[reply]
Why does it make a difference? The software already handles links that point to the current page. —CodeCat 23:09, 11 June 2014 (UTC)[reply]
Because there's an extra <strong>, which looks awful (bonefish (plural bonefish)).​—msh210 (talk) 04:52, 12 June 2014 (UTC)[reply]
Aha, after some testing I can see what you mean. Those two instances of "bonefish" are identical on my computer (Windows 7) if I use Firefox v30 or Opera v12, but I can see the difference if I use Internet Explorer v11. - -sche (discuss) 06:16, 12 June 2014 (UTC)[reply]
They look very different on Firefox for me: file:Bonefish msh210.png.​—msh210 (talk) 06:39, 12 June 2014 (UTC)[reply]
For me, the second "bonefish" (when it is different from the first, i.e. when I use Internet Explorer) is bigger/bolder, but not underlined. How odd that things would display so differently not just from browser to browser but from user to user! - -sche (discuss) 14:32, 12 June 2014 (UTC)[reply]
There are more than two levels of font weight. I remember reading somewhere about Firefox having different font weights for <strong> and <b>. Keφr 21:55, 12 June 2014 (UTC)[reply]
Would this be solved by specifying this in the CSS?
b strong { font-weight: inherit; }
— This unsigned comment was added by CodeCat (talkcontribs) at 22:51, 12 June 2014 (UTC).[reply]
If there's a class used for all headword lines, then .classname strong.selflink{font-weight:inherit} would probably work. And it'd not inadvertently affect other places b strong may appear.​—msh210 (talk) 04:59, 13 June 2014 (UTC) Stricken.​—msh210 (talk) 07:11, 15 June 2014 (UTC)[reply]
Fixing an HTML issue with a CSS hack seems like a bad idea. Why not just test to see if the plural is the same word as the current page and if so don't make it a link? Kaldari (talk) 08:50, 14 June 2014 (UTC)[reply]
I agree.​—msh210 (talk) 07:11, 15 June 2014 (UTC)[reply]
On the contrary. The issue isn't even in the HTML, as "strong" nested within "b" is perfectly fine as far as HTML is concerned. Rather, the problem is how that combination is displayed, which is what CSS is supposed to take care of. Trying to fix this issue by changing the HTML is a bit like breaking down a wall because you don't like its colour. The wall wasn't the problem, the paint was. —CodeCat 23:36, 17 June 2014 (UTC)[reply]
I'm not sure I follow that logic. Isn't <i> nested within <b> fine as far as HTML is concerned? Yet if you don't want any of the bold text on your site to be italic, the solution is not to use CSS to change how <b><i>foo</i></b> displays, it's to drop the <i> from the places where it occurs inside (or outside) <b>. - -sche (discuss) 01:13, 18 June 2014 (UTC)[reply]
Actually no, in your example that is what you'd do. That is why CSS exists; to separate presentation from the underlying content. In other words: the content should not contain information about how to display it. That's a very fundamental HTML principle which was not strongly enforced when HTML was new, but is now much more rigid. Hence, in the modern HTML5 interpretation, <b> doesn't actually mean "bold", despite that it is commonly used to bold text. It's perfectly valid to make text in that element not display bolded. See this link for an explanation on what these elements actually mean. —CodeCat 01:21, 18 June 2014 (UTC)[reply]
According to that page, we don't want inflected forms that are the same as their lemma forms to get <strong>: <strong> is for more emphasis/importance (than, in this case, the lemma form has), which we have no reason to prescribe for the inflected form. It just makes sense for the module to check whether the inflected form matches the lemma form and, if so, not generate a link.​—msh210 (talk) 07:29, 18 June 2014 (UTC)[reply]
Then that should be addressed to the MediaWiki developers. They are the ones that decided that "strong" would be appropriate to indicate a link to the current page. We shouldn't try to work around that, because we would never be able to catch all cases anyway. —CodeCat 16:47, 19 June 2014 (UTC)[reply]
Or we can fix Module:links (lines 214-216) so that it does not replicate this behaviour. Keφr 18:04, 19 June 2014 (UTC)[reply]
But we need to replicate it. Otherwise links to the current page won't show in bold. —CodeCat 18:24, 19 June 2014 (UTC)[reply]
The above discussion suggests the opposite. Keφr 18:30, 19 June 2014 (UTC)[reply]
The above discussion is only about headword templates. The code you referred to is used in many other places too. —CodeCat 19:03, 19 June 2014 (UTC)[reply]
Personally, I would get rid of the boldface in inflection tables too. I think it might be quicker to list places where boldface on self-links is desirable, actually. How about changing language_link so that the bolding would depend on the link face? (By which I mean whatever is currently passed as the face argument to full_link.) Keφr 19:08, 19 June 2014 (UTC)[reply]
That's probably the best solution. —CodeCat 21:14, 19 June 2014 (UTC)[reply]
If we choose to change it only in headword lines, then the fix would be a conditional in the definition of part in function format_parts in module:headword, I think.​—msh210 (talk) 07:10, 20 June 2014 (UTC)[reply]
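The check proposed in this thread (do not link an inflected form that is identical to the lemma) can be sketched as follows. This is only an illustration in Python, not the Lua of module:headword; the helper name `format_inflected_form` and its parameters are hypothetical, and the emitted markup merely imitates the spans quoted at the top of the thread.

```python
from html import escape
from urllib.parse import quote


def format_inflected_form(form: str, page_title: str, lang: str = "en") -> str:
    """Render one inflected form for a headword line (hypothetical helper)."""
    text = escape(form)
    if form == page_title:
        # Invariant form: identical to the lemma, so emit plain bold text.
        # No link is made, hence no nested <strong class="selflink">.
        return f'<b class="Latn" lang="{lang}">{text}</b>'
    # Any other form still links to its own entry.
    href = "/wiki/" + quote(form.replace(" ", "_"))
    return f'<b class="Latn" lang="{lang}"><a href="{href}">{text}</a></b>'


print(format_inflected_form("bonefish", "bonefish"))
print(format_inflected_form("bonefishes", "bonefish"))
```

With this approach the invariant plural and the "-es" plural are both wrapped in a single <b>, so they would render at the same weight regardless of how a browser styles <strong> inside <b>.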

A small improvement for languages with the same name

We currently have a variety of ways to disambiguate language names so that they're unique, but they may not always help people find the language they're looking for. So what if we created categories like Category:Buli language, and categorise all languages called Buli in there? @-sche I think this would interest you. —CodeCat 22:45, 12 June 2014 (UTC)[reply]

I think that's a good idea. In addition to languages that are distinguished by parenthetical disambiguators, like Buli (Ghana) and Buli (Indonesia), this could be especially helpful for languages that are distinguished by prepended family info, like Austronesian Gimi vs Papuan Gimi. Someone who comes here knowing that a language is called Gimi might look in Category:All languages under 'G', or might start typing "Category:Gim..." into our search bar; right now the search suggestion function won't find anything to suggest, but if there were a Category:Gimi language, it would. (For languages that are distinguished by the use of alternate names, this method isn't possible; they will just have to keep cross-linking via {{also}}.) - -sche (discuss) 02:40, 13 June 2014 (UTC)[reply]

Using ISO 639-3 private use codes for custom language families

A little while ago I mentioned in passing (I don't remember where) that it might be good to change the way we devise codes for language families lacking one. I suggested using the ISO 639-3 private use area for this. Private use codes are in the range qaa-qtz, so there would be 520 codes for us to use, which I imagine is plenty if we use them for language families alone. The main reason I propose this is so that we can avoid really long 9-letter codes like "ine-bsl-pro" for Proto-Balto-Slavic, or "qfa-kor-jjm" for Jeju. If the family is at most 3 letters, then the whole code will never be more than 6, which is a bit more manageable.
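As a quick check of the figure of 520, the private-use range qaa-qtz can be enumerated directly. A minimal sketch; the function name `private_use_codes` is made up for illustration.

```python
from string import ascii_lowercase


def private_use_codes():
    """Yield every code in the ISO 639-3 private-use range, qaa through qtz."""
    second_letters = ascii_lowercase[: ascii_lowercase.index("t") + 1]  # a..t
    for second in second_letters:
        for third in ascii_lowercase:  # a..z
            yield "q" + second + third


codes = list(private_use_codes())
print(len(codes), codes[0], codes[-1])  # 20 * 26 = 520 codes, qaa ... qtz
```

Anything from "qua" upward falls outside the range, which is why every new code has a second letter between "a" and "t".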

I propose the following for our existing "exceptional" family codes:

Name Old New
Admiralty Islands poz-aay qai
Anatolian ine-ana qan
Andamanese qfa-adm qad
Arabic sem-arb qar
Aramaic sem-ara qam
Arandic aus-rnd qac
Araucanian qfa-ara qau
Arnhem aus-arn qah
Atayalic map-ata qal
Aymaran sai-aym qay
Bahnaric aav-ban qba
Balto-Slavic ine-bsl qbs
Bantoid nic-bod qbd
Benue-Congo nic-bco qbc
Borneo-Philippines poz-bop qbp
Brythonic cel-bry qbr
Bungku-Tolaki poz-btk qbt
Bunuban aus-bub qbn
Burmish tbq-brm qbm
Canaanite sem-can qca
Cariban sai-car qcb
Central Chadic cdc-cbm qcc
Central New South Wales aus-cww qcn
Central Semitic sem-cen qcs
Central-Eastern Oceanic poz-occ qco
Chapacuran qfa-cpc qch
Chinookan nai-ckn qci
Chukotko-Kamchatkan qfa-cka qck
Chumashan nai-chu qcm
Daly aus-dal qdl
Dardic iir-dar qda
Dogon qfa-dgn qdg
Dyirbalic aus-dyb qdy
East Barito poz-bre qeb
East Chadic cdc-est qec
East Semitic sem-eas qes
Edoid alv-edo qed
Eskimo esx-esk qek
Ethiopian Semitic sem-eth qet
Finisterre ngf-fin qfn
Finnic fiu-fin qfi
Finno-Permic fiu-fpr qfp
French Sign Languages sgn-fsl qfs
Frisian gmw-fri qfy
Fur ssa-fur qfu
Garawan aus-gar qgw
German Sign Languages sgn-gsl qgs
Goidelic cel-gae qga
Grassfields nic-grf qgf
Guahiban qfa-gua qgh
Guaicuruan sai-gua qgc
Gunwinyguan aus-gun qgy
Gur nic-gur qgu
Halmahera-Cenderawasih poz-hce qhc
Hurro-Urartian qfa-hur qhu
Inuit esx-inu qiu
Iwaidjan aus-wdj qia
Iwam paa-iwm qiw
Japanese Sign Languages sgn-jsl qjs
Jivaroan qfa-jiv qjv
Jê sai-jee qje
Kadu qfa-kad qka
Kaili-Pamona poz-kal qkp
Kainantu-Goroka paa-kag qkg
Kainji nic-knj qkj
Karnic aus-kar qkr
Keresan qfa-ker qks
Kiowa-Tanoan qfa-kta qkt
Korean qfa-kor qko
Kukish tbq-kuk qkk
Kwa alv-kwa qkw
Kx'a qfa-kxa qkx
Lakes Plain paa-lkp qlp
Lampungic poz-lgx qla
Left May qfa-mal qlm
Lencan qfa-len qln
Macro-Chibchan qfa-mch qmh
Macro-Jê sai-mje qmj
Maiduan nai-mdu qmd
Malayic poz-mly qml
Malayo-Chamic poz-mcm qmc
Malayo-Sumbawan poz-msa qms
Masa cdc-mas qma
Mascoian qfa-mas qmo
Mataco-Guaicuru qfa-mgc qmg
Matacoan qfa-mtc qmt
Mbum alv-mbm qmu
Micronesian poz-mic qmn
Mien hmx-mie qmi
Misumalpan qfa-min qmm
Mixe-Zoquean nai-miz qmz
Mixtecan omq-mix qmx
Muna-Buton poz-mun qmb
Muran sai-mur qmr
Muskogean qfa-mus qmk
Nahuan azc-nah qnu
Nambikwaran sai-nmk qnk
New Caledonian poz-cln qnc
Ngayarda aus-nga qny
North Athabaskan ath-nor qna
North Bahnaric aav-nbn qnh
North Bornean poz-bnn qnb
North Sarawakan poz-swa qnw
North-Central Vanuatu poz-vnc qnv
Northeast Caucasian cau-nec qkc
Northwest Caucasian cau-nwc qpc
Northwest Semitic sem-nwe qns
Northwest Sumatran poz-nws qnm
Nyulnyulan aus-nyu qnn
Oceanic poz-oce qoc
Ok ngf-okk qok
Old South Arabian sem-osa qoa
Pacific Coast Athabaskan ath-pco qpk
Palaihnihan qfa-pal qph
Pama-Nyungan aus-pam qpn
Paman aus-pmn qpm
Pano-Tacanan qfa-pat qpt
Panoan qfa-pan qpa
Polynesian poz-pol qpl
Pomoan nai-pom qpo
Sabahan poz-san qsh
Sahaptian nai-shp qsp
Saluan-Banggai poz-slb qsb
Sama-Bajaw poz-sbj qsj
Savanna alv-sav qsv
Senegambian alv-sng qsg
Sepik paa-spk qse
Siouan-Catawban qfa-sca qsc
Sko paa-msk qsk
South Arabian sem-sar qsa
South Bird's Head ngf-sbh qbh
South Semitic sem-sou qsm
South Sulawesi poz-ssw qss
Southeast Solomonic poz-sls qsl
Southwest Pama-Nyungan aus-psw qsn
Southwestern Tai tai-swe qst
substrate qfa-sub qsu
Sunda-Sulawesi poz-sus qsi
Tacanan qfa-tac qta
Tai-Kadai qfa-tak qtk
Tocharian ine-toc qto
Tomini-Tolitoli poz-tot qtt
Torricelli qfa-tor qtc
Tucanoan qfa-tuc qtn
Tuu qfa-tuu qtu
Tyrsenian qfa-tyn qty
Ubangian nic-ubg qbg
Ugric fiu-ugr qgr
Vietic mkh-vie qfv
Volta-Congo nic-vco qcv
Volta-Niger alv-von qng
West Barito poz-brw qbw
West Chadic cdc-wst qcw
West Semitic sem-wes qsw
Western Oceanic poz-ocw qow
Wintuan qfa-wtq qin
Wotu-Wolio poz-wot qqw
Xincan qfa-xin qic
Yeniseian qfa-yen qey
Yidinyic aus-yid qiy
Yok-Utian qfa-you qou
Yolngu aus-yol qoy
Yuin-Kuric aus-yuk qky
Yukaghir qfa-yuk qqy
Yuman-Cochimí nai-yuc qcy
Zaparoan qfa-zap qrz
Zapotecan omq-zap qpz

(I don't know how to make the table collapsible so that it takes up less space. If you know, please edit my post.) —CodeCat 23:10, 15 June 2014 (UTC)[reply]

The previous discussion was at the end of this GP thread from April. As I did then, I oppose this now because I don't think it's workable. The ISO has been relatively stingy when it comes to granting codes to families and subfamilies, and our own Module:families/data is (even after I started working on it) sadly incomplete. In the GP thread I said that, as a ballpark guess, I'd expect us to end up with maybe four times as many exceptional (non-ISO) family and subfamily codes as we have now, by the time Module:families is 'complete'. In particular, our treatment of African, Asian and American languages is often coarse; we're lacking a lot of subfamilies. And I wasn't even thinking of sign language families at the time, but there are probably dozens of those that don't have ISO codes. The list above contains 167 families. If my estimate is correct, we'll end up needing ~660 codes, which means we'd run out of possible codes. Moreover, long before that happened, we'd run out of codes that were memorable approximations of their families' names. Just looking at the last few codes in the list, I see that due to the restriction on codes higher than "qtz", you've already had to resort to things like "qqy" for Yukaghir, "qrz" for "Zaparoan", "qpz" for "Zapotecan", "qky" for "Yuin-Kuric", etc. - -sche (discuss) 00:14, 16 June 2014 (UTC)[reply]
There is no disadvantage for us to have more than three characters in a language code. In fact, it makes it easier to come up with more meaningful and memorable codes. --WikiTiki89 00:36, 16 June 2014 (UTC)[reply]
Using non-standard language codes invalidates the HTML of our pages. Private-use extension codes are okay, of course, because they are standardized.
And I agree that longer codes might be better. Instead of Yukaghir = qaa, why not use the BCP 47 private-use subtags, allowing, e.g., Yukaghir = x-yuk or x-yukaghir? Michael Z. 2014-06-16 20:14 z
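For reference, a tag made up only of BCP 47 private-use subtags is "x" followed by one or more hyphen-separated subtags of one to eight letters or digits each. A minimal sketch of a validator under that reading; the names `PRIVATE_USE_TAG` and `is_private_use_tag` are hypothetical, and it does not handle private-use subtags appended to an ordinary language tag.

```python
import re

# "x", then one or more "-subtag" parts, each 1 to 8 alphanumeric characters.
PRIVATE_USE_TAG = re.compile(r"x(-[A-Za-z0-9]{1,8})+")


def is_private_use_tag(tag: str) -> bool:
    """Return True if the whole tag is a private-use-only BCP 47 tag."""
    return PRIVATE_USE_TAG.fullmatch(tag) is not None


for tag in ("x-yuk", "x-yukaghir", "x-chukotkokamchatkan", "qfa-yuk"):
    print(tag, is_private_use_tag(tag))
```

Note the eight-character limit per subtag: "x-yukaghir" just fits, but a longer family name would have to be abbreviated or split across several subtags.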
I think we should do this only for the roots, not for the branches. Using a q code for something like Benue-Congo is just wasting a limited resource, IMO- especially since some of the entries in your table are bogus (several of the subdivisions of poz are figments of Blust's questionable methodology, for instance).
We should concentrate first on the codes starting with dummy families such as qfa and und- those are the main source of the unwieldy and non-mnemonic clutter you're talking about. Next we might think about regional ones like aus, nai and sai, but they do have a little bit of mnemonic value, even though they have no linguistic merit at all.
Any time we have a three-letter code, everything below it on the tree should start with that code, unless it has its own ISO code- in which case everything below it should have the iso code. There may be a few cases such as nic and alv where we may decide to make an exception, but that should be the general rule. Chuck Entz (talk) 02:09, 16 June 2014 (UTC)[reply]
Hmm, I had considered this idea myself, of giving q__ codes only to top-level families that currently have qfa-___ codes. It's probably workable, but it wouldn't change much. At present, there are 38 codes with "qfa-" prefixes, and only some of their proto-languages' codes would get shorter: "Pano-Tacanan" is currently "qfa-pat", its subfamily "Tacanan" is "qfa-tac" (NB not "qfa-pat-tac"), and their associated proto-languages would be "qfa-pat-pro" and "qfa-tac-pro" (not "qfa-pat-tac-pro") if they existed; if "Pano-Tacanan" were "qpt", its proto-language code would be "qpt-pro", but "Tacanan" would still be nine letters, as "qpt-tac-pro". - -sche (discuss) 20:22, 16 June 2014 (UTC)[reply]
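The length comparison above can be reproduced with a few lines. A minimal sketch, assuming only the "-pro" suffix convention for proto-language codes mentioned in this thread; both function names are made up for illustration.

```python
def letter_count(code: str) -> int:
    """Count the letters in a code, ignoring hyphens."""
    return sum(ch.isalpha() for ch in code)


def proto_code(family_code: str) -> str:
    """Form a proto-language code by appending the "-pro" suffix."""
    return family_code + "-pro"


for family in ("qfa-pat", "qpt", "qpt-tac"):
    proto = proto_code(family)
    print(proto, letter_count(proto))
```

Only a top-level family with a bare three-letter code gets a six-letter proto-language code; any subfamily nested under it is back to nine letters.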

Naming scheme for templatized usage notes

Sometimes, it's useful to put the same usage note on several entries. When that happens, the usage note is made into a template. However, we don't have a consistent naming scheme for such templates. Many start their names with language codes followed by 'note' or 'usage', like Template:he-usage-begedkefet and Template:de-note obsolete spelling. A few start with 'usage', like Template:usage less fewer and Template:usage ize. Template:U:Latin stop+liquid poetic stress alteration exists in a 'U:' pseudonamespace, apparently inspired by the 'R:' pseudonamespace that reference templates exist in. I actually quite like that last idea, especially if coupled with the use of language codes, as it groups all templatized usage notes together in Special:AllPages, just like the reference templates are grouped under 'R:'.
I suggest the following naming scheme for templatized usage notes: Template:U:[language code]:[brief identifier]. Template:de-note obsolete spelling would become Template:U:de:deprecated spelling (or similar); Template:usage ize would become Template:U:en:ize (or similar).
My second choice would be to fix the half-dozen outliers to use the [language code]-['note' or 'usage']-[identifier] format that it seems everything else uses.
Thoughts? - -sche (discuss) 05:45, 17 June 2014 (UTC) corrected a typo/thinko in my original post: switched from U:[langcode]- to U:[langcode]: (compare: the reference templates use a colon after the langcode, not a hyphen) - -sche (discuss) 19:12, 18 June 2014 (UTC)[reply]
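The proposed scheme, Template:U:[language code]:[brief identifier], is simple enough to express as a one-line rule. A minimal sketch; the function `usage_template_name` is hypothetical and only assembles the title, it does not check that the language code is a real one.

```python
def usage_template_name(lang_code: str, identifier: str) -> str:
    """Build a usage-note template title in the U:[langcode]:[identifier] form."""
    lang_code = lang_code.strip()
    identifier = identifier.strip()
    if not lang_code or not identifier:
        raise ValueError("both a language code and an identifier are required")
    return f"Template:U:{lang_code}:{identifier}"


print(usage_template_name("en", "ize"))
print(usage_template_name("de", "deprecated spelling"))
```

Because every title then begins with "Template:U:", all of them sort together in Special:AllPages, and within that block they group by language code.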

Support. We also need a category for these templates. — Ungoliant (falai) 13:26, 17 June 2014 (UTC)[reply]
I like the U:langcode: prefix approach, though, as always, I don't think that the langcode is a needless waste of keystrokes for langcode=en. It should dramatically increase the likelihood that one could find the template by typing something in the search box. DCDuring TALK 17:20, 17 June 2014 (UTC)[reply]
Support. The Hungarian usage templates can be found here: Category:Hungarian usage templates. --Panda10 (talk) 12:19, 19 June 2014 (UTC)[reply]
Oh, and it seems we even have a Category:Usage templates, we just need to be sure to use it(s subcategories) on all the templates. - -sche (discuss) 15:47, 19 June 2014 (UTC)[reply]
Using Special:AllPages, I found every template that had "usage" or "note" in its name. I've moved about half of them; these remain to be moved. I haven't categorized many of the templates into Category:Usage templates yet, but it should be easy to find all the templates that need categorizing now that they all begin with "U:" (or are listed here). - -sche (discuss) 21:24, 3 July 2014 (UTC)[reply]
Some more candidates: {{rank}}, {{season name spelling}}, {{who vs. whom}}, {{ga-analytic}}, {{oikein väärin}}, {{HTML char}}, {{Hiragana informal}}, {{he-begedkefet}}, {{el-freq-Google}}, {{el-freq}}, {{katakana-in-science}}, {{preferred IUPAC name}}, {{trademark erosion}}, {{sh-coll-link}}, {{arabdialect}}, {{1990}}, {{el-T-Vs}}, {{el-T-Vp}}, {{ja-kun-vs-on}}, {{be-у-ў}} DTLHS (talk) 02:11, 4 July 2014 (UTC)[reply]
Thanks for finding those. "1990"? Youch; as terrible names for usage-note-templates go, that one is exceptional... - -sche (discuss) 03:12, 4 July 2014 (UTC)[reply]

chapter in quote-book

I recently asked in the Grease Pit for what seemed a minor change in a template that would make a major improvement to the display of quotations from books (see here). It was about shifting the presentation of the chapter number from before the book title to after it. It seemed to me to be an important change of minor coding difficulty to anyone familiar with the templates and their encoding (I am familiar with neither).
Nothing has been done about it, and nobody has even commented on it. Should I have raised the matter here? Or in the Tea Room? ReidAA (talk) 07:25, 19 June 2014 (UTC)[reply]

Here.

IMO « “Soup from a Sausage Peg”, in The Snow Queen and Other Tales » looks good.​—msh210 (talk) 07:20, 20 June 2014 (UTC)[reply]

That's fine if the chapter has a name, but not if it only has a number, as in choc-a-bloc. And I don't understand why your example brings up the word in while mine doesn't. I've tried to find out about the internals of templates but to call the documentation mind-boggling is a gross understatement. What I would appreciate is a quote-novel template, much like the quote-book one, that puts a chapter number (which the quote-book seems able to detect) after the title of the novel and it would be useful to have also a time parameter that can be used to convey when the story is set. ReidAA (talk) 06:47, 24 June 2014 (UTC)[reply]
I like the format at [[choc-a-bloc]]. In any event, you can do what I do and what I suspect most editors do: not use the templates, instead formatting the quotations as in [[Wiktionary:Quotations#How to format quotations]].​—msh210 (talk) 19:04, 24 June 2014 (UTC)[reply]
Hey, that's a great link. I only wish I had found it for myself long ago. But I'm not sure that using templates mightn't allow easier coding. What I would like to be able to do is to code those RQ templates (i.e., make my own), preferably without embedded template usage. Can you tell me where there is an explanation of how to code them?—ReidAA (talk) 01:12, 25 June 2014 (UTC)[reply]
I don't think there is. Do you know the basics of coding wiki templates? In that case, perhaps find one that does similarly to what you want yours to do and copy and modify it. Otherwise, ask for help coding one in this section of this page or in the Grease pit.​—msh210 (talk) 05:34, 25 June 2014 (UTC)[reply]
I fancy I could handle the coding, provided I could look at example template coding. My attempts to get to see one have been fruitless, though I have no difficulty looking at their documentation code because it's easy to pretend to be editing it. So how do I get to look at an example RQ template source code?—ReidAA (talk) 09:40, 25 June 2014 (UTC) Fixed indenting.​—msh210 (talk) 00:37, 26 June 2014 (UTC)[reply]
Go to any such template and click "Edit" or "View source" atop the page or add ?action=edit to the URL; e.g., http://en.wiktionary.org/wiki/Template:RQ:Hardy_Laodicean?action=edit.

If you need to edit someone else's post on a discussion page, it's courteous to indicate you did so (as I've done to your last post, just above); I'd even say it's obligatory if (as was not the case here) the edit is substantive.​—msh210 (talk) 00:37, 26 June 2014 (UTC)[reply]
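The ?action=edit tip above amounts to a simple URL rule, sketched here in Python. The function `edit_url` is hypothetical; the base address is the one quoted in the example URL, and spaces in the title are replaced by underscores as in that URL.

```python
from urllib.parse import quote


def edit_url(title: str, base: str = "http://en.wiktionary.org/wiki/") -> str:
    """Return the address that opens a page's wikitext source for viewing or editing."""
    # Keep ":" and "/" readable; everything else unsafe is percent-encoded.
    path = quote(title.replace(" ", "_"), safe=":/")
    return base + path + "?action=edit"


print(edit_url("Template:RQ:Hardy Laodicean"))
```

On a protected page the same address shows the source read-only, so it works for looking at a template's code even when editing is not possible.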

Hey, that link looks great! I think it's just what I needed. I presume I'll be able to copy that, edit it, and put it in the same location. Should I add a link to it to https://en.wiktionary.org/wiki/Category:Reference_templates or is that automatic? (I notice that your example doesn't seem to be there, and there doesn't seem to be a button to allow me to put a change in.)

As to my editing, all I meant to do was change the indentation so that more text would be visible at one time (see boldfaced forms of invariant lemmata in headword lines about 4 items above to see the kind of thing I was trying to avoid). If I did more it was accidental, for which my apologies. In fact, I think it would be preferable merely to alternate the message indentations, but in future I'll just go along with what seems to be the accepted practice.—ReidAA (talk) 11:09, 26 June 2014 (UTC)[reply]

It's not automatic.​—msh210 (talk) 07:38, 29 June 2014 (UTC)[reply]

Media Viewer is now live on this wiki

Media Viewer lets you see images in larger size

Greetings,

The Wikimedia Foundation's Multimedia team is happy to announce that Media Viewer was just released on this site today.

Media Viewer displays images in larger size when you click on their thumbnails, to provide a better viewing experience. Users can now view images faster and more clearly, without having to jump to separate pages — and its user interface is more intuitive, offering easy access to full-resolution images and information, with links to the file repository for editing. The tool has been tested extensively across all Wikimedia wikis over the past six months as a Beta Feature and has been released to the largest Wikipedias, all language Wikisources, and the English Wikivoyage already.

If you do not like this feature, you can easily turn it off by clicking on "Disable Media Viewer" at the bottom of the screen after pulling up the information panel (or in your preferences), whether you have an account or not. Learn more in this Media Viewer Help page.

Please let us know if you have any questions or comments about Media Viewer. You are invited to share your feedback in this discussion on MediaWiki.org in any language, to help improve this feature. You are also welcome to take this quick survey in English, en français, o español.

We hope you enjoy Media Viewer. Many thanks to all the community members who helped make it possible. - Fabrice Florin (WMF) (talk) 21:54, 19 June 2014 (UTC)[reply]

--This message was sent using MassMessage. Was there an error? Report it!

To turn Media Viewer off (which you will probably want to, since it's incredibly annoying), go to Special:Preferences, select the Appearance tab, scroll down to Files, and unclick "Enable Media Viewer". If you don't have an account, I don't think you can turn it off, so you're just out of luck. Despite all appearances to the contrary, it's a feature, not a bug. —Aɴɢʀ (talk) 06:55, 20 June 2014 (UTC)[reply]
Oh, great. Another misfeature to turn off. But it is not that pictures are critical to us, anyway. Keφr 07:07, 20 June 2014 (UTC)[reply]

Category:Place names and topical subcategorisation in general

Category:en:Place names and its subcategories are presumably meant to contain place names in English. But among its various subcategories are categories like Category:en:United States of America. This is a place name itself, yes, but this category is intended and used for anything related to the US, not just places in the US. What bothers me here is that this category is nonetheless a subcategory of Category:en:Place names. I like to think that any subcategory is a strict subset of its parent category, so that any terms placed in a subcategory would be valid in the main category as well. What do other editors think of this principle? And if we should apply it, how would we do so here? There are other areas within the category tree where this applies too, like Category:en:Hydrology and Category:en:Snow being subcategories of Category:en:Liquids. —CodeCat 22:00, 19 June 2014 (UTC)[reply]

Place names is a lexical category. If it is to be subdivided further, it should probably not be by geography. We don’t have Category:en:Adjectives in the United States of America.
If Category:USA is supposed to categorize referents, i.e., the things represented by terms, then it doesn’t belong under Place names at all.
And in my opinion, it belongs in Wikipedia, not in Wiktionary. As long as we continue to categorize terms by qualities of their referents, there will continue to be such confusion among (1) lexical/grammatical categorization of terms, (2) technical/subject-field categorization of terms’ usage, and (3) encyclopedic categorization of things. Why should we put so much energy into creating a far lamer copy of Wikipedia’s categorization? Instead, let’s keep adding Wiktionary links to Wikipedia articles. Michael Z. 2014-06-27 18:22 z
@Mzajac Are you saying that something like Category:nl:Days of the week does not belong on Wiktionary? I don't think I agree with that, it's a very useful category. —CodeCat 20:19, 29 June 2014 (UTC)[reply]
I see that this is a leaf in the branch » Dutch language » All topics » Nature » Time » Days of the week. I guess this relates to meanings, the way a thesaurus classifies words? I don’t even know if these topics relate to or overlap with the technical vocabulary categories that are applied using usage (“context”) labels. I do see it as a problem that this is not called Category:nl:Names of days of the week or Category:nl:Terms for days of the week, because our entries represent terms, not their referents.
But if that is useful, aren’t the following also: Category:nl:Colours of the rainbow, Category:nl:Apostles of Christ, Category:nl:Blackletter fonts, or Category:nl:Dog breeds that are good with children? Michael Z. 2014-06-29 23:56 z
Presumably yes, except that there's not as strong a need to look up those terms. I've been working on the topical tree for a few days now, reorganising and moving things around a bit to what seems more workable to me. What I noticed is that there are two basic types of category: categories of topic or relationship ("Chemistry", "Weather", "Food and drink"), and categories of types or sets ("Days of the week", "Organic compounds", "Countries of Africa"). The actual entries contained in them may not be so strictly separated, however. The former generally have names in the singular, while the latter mostly have plural names. What I've tried to do is to make sure that relationship categories are not subcategorised into type categories, to avoid the scenarios I described above. —CodeCat 00:05, 30 June 2014 (UTC)[reply]
I see we have Category:Colors of the rainbow, just no Dutch there.
Well, thanks for working on better organizing these. Michael Z. 2014-06-30 01:14 z
To go back to what you said, though. Our entries represent terms rather than their referents, and that's specifically why topical categories exist. To categorise by their referents instead. Of course you can put "terms related to" or "terms for" in front of every category name, but that doesn't really change anything in the end, the category structure will still be the same and so will the entries in them. So I wonder what you would suggest that wouldn't diminish the utility of these categories. —CodeCat 01:32, 30 June 2014 (UTC)[reply]
Of course naming the categories properly would change something. Some editors have only the vaguest idea of what dictionary entries are or how this is fundamentally different from Wikipedia, and imprecise or incorrect language built into the project just increases the confusion.
I’m not sure, but I think there is confusion because topical categories exist for several different reasons, which are not wholly compatible. Editors have been shuffling these around non-stop for years now, with no overall plan as to how they should look. I have a feeling this can’t be resolved without defining exactly what these are, and possibly creating two or three separate category trees for them.
  1. For example, since usage is documented by specialized subject labels like “chemistry,” logic tells me that there should be a category containing technical vocabulary used in the field of chemistry. We have a zillion labels categorizing usage, but no usage categories.
  2. Since editors are adding [[Category:en:Chemistry]] to entries, I suppose there should also be a more general subject-field category for “English terms related to chemistry,” after we define exactly what that means.
  3. If we are also categorizing by definition, distinct from usage or subject field, then I suppose we should settle on some scheme like that of Roget’s Thesaurus, where all words are grouped by concept (unlike most of today’s alphabetical thesauruses) – so Category:Chemical elements would be somewhere in section 635 “Materials.”
Maybe nos. 2 and 3 are the same thing, but they are certainly distinct from no. 1. But since all three are mixed up in a soup, our categories will continue to be shuffled around because they don’t seem right. Michael Z. 2014-07-02 00:34 z
I definitely agree with your first point. If terms are used within specific fields, or certain senses are, then that's lexically significant and not really any different from many of the categories currently in Category:English lexicons. What you mention in point 2 is a general problem with these categories in that "related" is not well-defined, and can be interpreted rather broadly. Some people might consider water to belong in Category:en:Chemistry just because it's the name of a chemical substance and hence related to chemistry. Others might disagree but think carbon dioxide belongs there. And if I understand your third point correctly, it refers to what I called "categories of types" above. They group things by common hypernyms; that is, by what their referents are. —CodeCat 00:46, 2 July 2014 (UTC)[reply]

Chinese most basic words are still undefined!

Calling on Chinese-aware editors (natives and learners) to pull up their socks and make some effort to add missing Chinese contents. HSK Beginning level, first few hundred most frequent Chinese words (e.g. Appendix:Mandarin_Frequency_lists/1-1000) still miss definitions, lack formatting and have been neglected since their creation many years ago. I don't think there are obstacles for doing them now (translingual sections, difference in topolects), yet even basic senses for single-character words are still missing - This entry needs a definition. Please add one, then remove {{defn}}. And they are the most frequent everyday words! I have just added "to do" - one of the most frequent Chinese verbs. Recently added "water", "year", "mountain", many other basic words. Even if only definitionless Mandarin entries get definitions, it's a step in the right direction; there is no need to work with a dialect you don't speak, if you're not confident.

@Tooironic, @Kc kennylau, @Bumm13, @Meihouwang, @Wyang - not such a small group, huh? I may have missed some people. --Anatoli (обсудить/вклад) 02:02, 20 June 2014 (UTC)[reply]

The complexity of Mandarin/Chinese entries puts people off, but now they're not so complicated. Here's an example:
==Chinese==
{{zh-hanzi-box|[[HANZI]]|...}}

===Pronunciation===
{{zh-pron
|m=PINYIN
...
|c=JYUTPING
...
|cat=POS,...
}}

===POS===
{{zh-POS}}

# DEFINITIONS
...

Wyang also suggested using:

Definitions

for terms with complex semantics. --Anatoli (обсудить/вклад) 02:09, 20 June 2014 (UTC)[reply]

I've deliberately put off doing 字 entries because they seem such a big undertaking. Is anyone else up for this? I suppose we could get the basic ones done first. ---> Tooironic (talk) 02:10, 20 June 2014 (UTC)[reply]
I know that you put off doing them. Thanks. That's what I argue: first, they are not THAT complicated; second, the definitions don't need to be exhaustive, e.g. see is good enough, IMHO. :). One definition is better than nothing at all. Besides, you don't have to do stroke orders, canjie, anything that goes into translingual, just semantics is fine. --Anatoli (обсудить/вклад)
I've been following other people's leads and doing some occasional edits to them... Wyang (talk) 03:52, 20 June 2014 (UTC)[reply]
You've been most productive and leading in the Chinese editing. Aren't all the new modules, templates, bot work your doing? :) I only included you in the list to invite you to the topic. Thanks to the new structures you made it has become much easier to add non-Mandarin contents too. (I'm not taking credit for what I haven't done myself, in case it sounded like I do, LOL). --Anatoli (обсудить/вклад) 04:01, 20 June 2014 (UTC)[reply]
Forgot to ping @Jamesjiao. You've been very quiet. :) --Anatoli (обсудить/вклад) 07:59, 20 June 2014 (UTC)[reply]
LOL... yes.. Let's just say I am no longer alone in my life now, so I have been spending a lot less time on the dictionary lately! Character entries just put me off for some reason; maybe it's because I like to be comprehensive and edit in all the possible definitions all in one go. I could change that attitude and just add in the basic stuff first... JamesjiaoTC 03:17, 24 June 2014 (UTC)[reply]
When will the "definitions" thingy be official? I can't wait to see that day! --kc_kennylau (talk) 09:04, 20 June 2014 (UTC)[reply]
Probably not in the foreseeable future. —CodeCat 12:12, 21 June 2014 (UTC)[reply]
Should we put compound-only definitions under this header? For , I separated it into noun and verb headers. But this is wrong because this character will never be used as a noun or verb by itself. Should the part of speech correspond to the use of the character in a sentence or to the meaning? Meihouwang (talk) 08:40, 21 June 2014 (UTC)

Using rollback to revert good-faith edits

...needs to stop immediately. It violates numerous policies. The undoing of good faith edits should only be done while leaving edit summaries Purplebackpack89 (Notes Taken) (Locker) 00:00, 22 June 2014 (UTC)[reply]

Which policies? Also, if I recall, you left no edit summary on your own revert of Chuck's (good faith and correct) edit. —CodeCat 00:03, 22 June 2014 (UTC)[reply]
Look again, CodeCat. You'll see I DID leave an edit summary. And, at the time, the module was not working, so Chuck's edit removed a category, and was therefore wrong. If you don't understand the policies governing rollback, CodeCat, you shouldn't be using it. Purplebackpack89 (Notes Taken) (Locker) 00:06, 22 June 2014 (UTC)[reply]
"BRD" is just some letters. It's not an edit summary, nor does it explain your reasons for reverting. And again I ask, which policies? Furthermore, if the module was not working, that doesn't mean Chuck's edit was incorrect. It means that the module needed fixing, which he did as far as I can tell. Either way, that category didn't belong on that page whether the module was fixed or not. —CodeCat 00:07, 22 June 2014 (UTC)[reply]
BRD has a distinct meaning in the Wikimedia universe. It means bold, revert, discuss. Chuck made an edit. I reverted. Per BRD, instead of reverting, you should have discussed. But this particular edit is beside the point. The point is that edit summaries should be left except in case of vandalism. You going on and on about how right you are doesn't give you an excuse to not leave an edit summary when making a clearly controversial edit. Purplebackpack89 (Notes Taken) (Locker) 00:14, 22 June 2014 (UTC)[reply]
CodeCat, before you continue editing, please read this, it's the Meta blurb on rollback. We don't have a blurb on rollback, so in the absence of one, I defer to Meta and to Wikipedia. Purplebackpack89 (Notes Taken) (Locker) 00:16, 22 June 2014 (UTC)[reply]
BRD is a Wikipedia concept, not a Wiktionary concept. It's not used in Wiktionary and people here generally are not familiar with it unless they happen to edit Wikipedia too. Furthermore, even on Wikipedia, "BRD" is not a valid justification for any edit, as it is a common Wikipedia practice, not a policy nor a reason for making an edit. In any case, if you want to set up an official policy to make edit summaries required when not reverting vandalism, you're free to do so. But I don't think there's much chance of it succeeding. Regardless of what Meta says, on Wiktionary a revert simply means the same as an undo. It means "I think the page was better before". —CodeCat 00:20, 22 June 2014 (UTC)[reply]
The ideal solution is to get the same tool that Wikipedia has to easily revert both good and bad faith edits and optionally leave an edit summary. --WikiTiki89 00:09, 22 June 2014 (UTC)[reply]
And until Twinkle arrives on Wiktionary, rollback shouldn't be used for anything except vandalism Purplebackpack89 (Notes Taken) (Locker) 00:14, 22 June 2014 (UTC)[reply]
Why does Purplebackpack have rollbacker rights anyway? He is unfamiliar with Wiktionary’s practices and doesn’t seem interested in becoming familiar with them. — Ungoliant (falai) 00:38, 22 June 2014 (UTC)[reply]
User:Ungoliant MMDCCLXIV, That's painting with too broad a brush. However, there are a number of "policies" and not-having-policies that seem just plain arbitrary. Some policies even seem like they're different solely to stick a finger in Wikipedia's eye. Also, the general reason for taking rollback away is abuse. If I am found to have abused rollback and you use that as justification for taking mine away, you'd have to also take away CodeCat's and probably other people's as well. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)
Because? Keφr 15:20, 22 June 2014 (UTC)[reply]
Well if he keeps up this I-don’t-like-your-practices-so-I’ll-just-follow-Wikipedia’s attitude I will request the removal of his rights. — Ungoliant (falai) 17:14, 22 June 2014 (UTC)[reply]
And, as Kephir noted, you would have no basis for doing so. Not liking practices and abusing rollback are two completely different issues. Since how rollback should be used on this Wikipedia is ambiguous, I have not abused it, and therefore there's no reason to remove it. Purplebackpack89 (Notes Taken) (Locker) 00:05, 23 June 2014 (UTC)[reply]
Where did I note that? Keφr 00:09, 23 June 2014 (UTC)[reply]

Proposal: Require edit summaries for reverting non-vandalism edits

Keeps us in line with many other Wikimedia projects. Not having it makes editors who use rollback on good-faith edits come off as discourteous. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)[reply]

Support
  1. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)[reply]
Oppose
  1. “Reverted edits by Foo. If you think this rollback is in error, please leave a message on my talk page.” I see this as humble instead of discourteous, due to the admission that the reversion may be in error. — Ungoliant (falai) 05:04, 22 June 2014 (UTC)
  2. What Ungoliant said. (And that default text is editable if the community so desires.)​—msh210 (talk) 05:50, 22 June 2014 (UTC)[reply]
  3. I don't see anything discourteous. The text as Ungoliant quoted it does not seem offensive in any way, especially compared to "Undid revision". —CodeCat 10:36, 22 June 2014 (UTC)[reply]
  4. --Yair rand (talk) 05:35, 23 June 2014 (UTC)[reply]
  5. For one thing, the 'line' between "vandalism sensu stricto" and "misguided, malformed edits which need to be undone" is quite blurry. Take this diff, for instance, which added a malformatted, vaguely worded, apparently redundant definition onto the headword line. - -sche (discuss) 13:36, 23 June 2014 (UTC)[reply]
    "malformatted, vaguely worded, apparently redundant definition". There's your edit summary right there lol. The 'line' is good-faith-but-you-don't-know-what-you're-doing vs. bad-faith-and-you-do-know-what-you're-doing. One is permissible, but not a great idea, and one isn't. You can be blocked for a few instances of one and not for a few instances of the other. Purplebackpack89 19:57, 27 June 2014 (UTC)[reply]
Discussion
  • What matters isn't whether you the experienced editor think it discourteous, it's rather whether the (probably less-experienced) editor thinks it is. It's clear none of you have read this, which explains why other editors would find it discourteous. Please read it before commenting further. Purplebackpack89 (Notes Taken) (Locker) 14:50, 22 June 2014 (UTC)

Proposal: Get Twinkle

We haven't had a serious discussion about getting Twinkle in years. If people are concerned about not being able to make edits fast enough, getting Twinkle could solve those problems. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)[reply]

Does importScript('User:AzaToth/twinkle.js','en.wikipedia.org','431551787'); not work here? (I haven't tried it.)​—msh210 (talk) 05:56, 22 June 2014 (UTC)[reply]
It still does, but I marked it as deprecated in favour of mw.loader.load('//en.wikipedia.org/w/index.php?action=raw&oldid=431551787') (or a longer version, you get the idea). I probably should have been more explicit about it. The reason being that importScript from our MediaWiki:Common.js clashes with MediaWiki's built-in importScript, and I think we have no good way to guarantee that either version will run at a given moment. (Also, I have been trying to clean up our scripts mess lately. I think introducing inconsistencies between MediaWiki installations in this way is a bad idea, and this is just waiting for a good moment to bite someone in a vaguely specified body part.) However, the built-in importScript only accepts one argument, and can only load scripts from the local wiki. Keφr 07:19, 22 June 2014 (UTC)[reply]
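For anyone who wants to try this from their own Special:MyPage/common.js, here is a minimal sketch of the mw.loader.load approach described above. The helper name rawScriptUrl is hypothetical and exists only for illustration; the host and revision ID are the ones quoted in this thread. The optional ctype parameter is an assumption about what the "longer version" of the URL might add, not something confirmed here.

```javascript
// Hypothetical helper (not part of MediaWiki): builds a protocol-relative
// URL that serves one fixed revision of a page as raw text.
function rawScriptUrl(host, oldid, ctype) {
  var url = '//' + host + '/w/index.php?action=raw&oldid=' +
    encodeURIComponent(oldid);
  // Assumption: a "longer version" might also state the content type.
  if (ctype) {
    url += '&ctype=' + encodeURIComponent(ctype);
  }
  return url;
}

// Host and revision ID as quoted in the discussion above.
var twinkleUrl = rawScriptUrl('en.wikipedia.org', '431551787');

// mw.loader.load accepts a URL and fetches it as a script. The guard keeps
// this snippet harmless when it is run outside a MediaWiki page.
if (typeof mw !== 'undefined' && mw.loader) {
  mw.loader.load(twinkleUrl);
}
```

Pinning the oldid means the loaded code cannot change without the user editing their own script page, which is presumably why a fixed revision was suggested in the first place.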
Well, in any event, some sort of importation is possible. So those who want Twinkle can use it already, can't they? This discussion is only about whether to host it locally also (and customize it to our local desires)?​—msh210 (talk) 07:34, 22 June 2014 (UTC)[reply]
No, I don't think they can Purplebackpack89 (Notes Taken) (Locker) 14:57, 22 June 2014 (UTC)[reply]
Why? Keφr 15:14, 22 June 2014 (UTC)[reply]
Support
  1. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)[reply]
  2. Twinkle provides the ability to do things that are very difficult to do otherwise, such as revert an arbitrary string of edits by multiple editors and leave an optional edit summary behind. Frankly, I don't see any downsides. If we are worried about too many people having access to it, then I'm sure there is a way to restrict its use to users who have rollback privileges. --WikiTiki89 20:15, 22 June 2014 (UTC)[reply]
    Like applying maintenance tags to articles, doing all three required steps of starting an AFD/MFD/RFD, initiating a request for page protection, applying protection templates… oh wait, we have none of that nonsense. Keφr 21:03, 22 June 2014 (UTC)[reply]
    So we won't use those features. Maybe they can be disabled? --WikiTiki89 21:36, 22 June 2014 (UTC)[reply]
    Or we could just use them for RfD/RfV instead. Purplebackpack89 (Notes Taken) (Locker) 00:02, 23 June 2014 (UTC)[reply]
    Might as well. On the other hand, starting RfVs and RfDs here is much less of an exercise in bureaucracy than XfD on Wikipedia (add the template, click the "+" link, write the nomination rationale, submit, done; versus editing three or four different pages while manually pasting different template magic in different places on each), and I cannot remember the last time anyone complained about the tedium of our nomination process, so… meh. The amount of work that would need to go into that would be simply not worth it. Or again, do you want to volunteer? Keφr 00:31, 23 June 2014 (UTC)[reply]
Oppose
  1. Equinox 01:00, 22 June 2014 (UTC)[reply]
Abstain
  1. I oppose until sufficient reason to install the gadget here is supplied. I haven't seen such yet.​—msh210 (talk) 05:56, 22 June 2014 (UTC)[reply]
  2. Abstaining until someone explains what Twinkle is. —CodeCat 10:37, 22 June 2014 (UTC)[reply]
    See w:WP:WikiSpeak#Twinkle. And w:WP:Twinkle for amusement. Most of its functionality makes sense only for Wikipedia, though. Keφr 10:44, 22 June 2014 (UTC)[reply]
Discussion

User:Equinox, care to give rationale rather than just voting? Wouldn't Twinkle mean faster editing and fewer situations where CodeCat et al. misuse rollback? Purplebackpack89 (Notes Taken) (Locker) 01:04, 22 June 2014 (UTC)

Twinkle does not solve the problem because you can still have good edits reverted by Twinkle with no explanation if none is filled in: see [8]. This has happened to me. So it is essentially no different from our pre-existing features (e.g. the "red D" in Recent Changes — not sure whether you've seen this, as it is only available to admins). Equinox 13:01, 22 June 2014 (UTC)[reply]
I usually keep this disabled, because I find it too annoying, but I re-enabled "Patrolling enhancements" just so I could see what you are talking about… but no red "D" appeared. Only a blue "M". Keφr 15:14, 22 June 2014 (UTC)[reply]
Twinkle isn't like patrolling, and why should either be limited to administrators? Why can't joe schmos like me have tools? Purplebackpack89 (Notes Taken) (Locker) 15:26, 22 June 2014 (UTC)[reply]
Because "joe schmos" like you lack the good judgement to use them well. Keφr 15:43, 22 June 2014 (UTC)[reply]
That is a very elitist stance, Kephir. It's also untrue: nobody has ever come up to me on this project and said: "You make too many edits in too short a time". Purplebackpack89 (Notes Taken) (Locker) 15:53, 22 June 2014 (UTC)[reply]
Correct, it is the contents of your edits that raise our objections. (Also, they almost did tell it to me. Not that I am complaining, just noting.) Keφr 16:04, 22 June 2014 (UTC)[reply]
There you go again redefining what people say so it's easier for you to answer it. Of course no one here has said that, but they have pointed out numerous errors in judgement on your part, which you would be able to propagate much more quickly with more tools. Chuck Entz (talk) 16:11, 22 June 2014 (UTC)[reply]
  • @Kephir The red 'D' appears next to the blue 'M' if and only if an unpatrolled edit has created a page; that probably doesn't happen often on your watchlist, and may not even happen in recent changes at any given time you look. The 'D' allows you to delete the page with no edit summary (unless you provide one in a little box which also appears iff an unpatrolled edit has created a page). - -sche (discuss) 19:51, 22 June 2014 (UTC)[reply]
    • Hmm. I opened Special:RecentChanges, plenty of unpatrolled new pages there, but no "D". I can see it on Special:NewPages, however. By skimming the source code, I guess it does not work with the "Group changes by page in recent changes and watchlist" option enabled. Still, I am going to keep the gadget disabled. The buttons look too distracting and just feel too easy to misclick. Keφr 20:05, 22 June 2014 (UTC)

Proposal: Codify the BRD practice

General gist: If you make an edit, another editor can undo it, and after that point, discussion must take place, or else both editors are in error. Purplebackpack89 (Notes Taken) (Locker) 00:59, 22 June 2014 (UTC)

To clarify, BRD only applies to good-faith edits. Vandalism can still be reverted without discussion, and an unlimited number of times Purplebackpack89 (Notes Taken) (Locker) 00:00, 23 June 2014 (UTC)[reply]
Support
  1. Purplebackpack89 (Notes Taken) (Locker) 00:57, 22 June 2014 (UTC)[reply]
Oppose
  1. Chuck Entz (talk) 22:49, 22 June 2014 (UTC)[reply]
  2. --Yair rand (talk) 05:35, 23 June 2014 (UTC)[reply]
Discussion

Nice in theory, but the additional required steps will make patrolling hundreds of edits at a time every day all the more time-consuming. This looks suspiciously like yet another technique PurpleBackpack89 can use to change the subject and shift the blame in order to continue to avoid admitting to ever being wrong or making a mistake. Chuck Entz (talk) 22:49, 22 June 2014 (UTC)

User:Chuck Entz, that is an assumption of bad faith and a DICK-ish comment. BRD doesn't apply to vandalism, it only applies to good-faith edits, so it wouldn't slow down patrolling. Purplebackpack89 (Notes Taken) (Locker) 00:00, 23 June 2014 (UTC)[reply]
P89, you are being much more "DICK-ish" than anybody else by constantly resorting to tedious legalistic interpretation of policies — often ones from Wikipedia that don't even apply here. Read the DICK page yourself, especially the top part, and consider how much you are annoying everybody, as is evident from recent discussions, and just shut up for a minute and think about it. Your modus operandi seems to be to attack everybody for not sharing your personal opinion, based on legalistic policies (which half the time don't even exist here), but if anyone ever attacks you for not sharing their opinion, the rules go out of the window and you are suddenly the injured party who needs soothing and pacifying. It's pretty pathetic and disgusting and hypocritical as hell. Equinox 00:04, 23 June 2014 (UTC)[reply]
Equinox, you don't get to tell me to shut up. I don't give a damn if you don't like my opinions on things, I am entitled to them as much as you are to yours. I don't attack people for not sharing my opinion, I don't attack people at all. I merely point out that it is wrong to make the broad generalizations people do about me. Take a look at how often I say "User X is always...". You'll find it's never. I only comment on other people's INDIVIDUAL edits. Other people comment on ME. Purplebackpack89 (Notes Taken) (Locker) 00:09, 23 June 2014 (UTC)[reply]
Are you saying that we should be prohibited from noticing detrimental patterns in your behaviour? Keφr 00:51, 23 June 2014 (UTC)[reply]
Not exactly. I'm saying the way Equinox and Chuck characterized the edits I made was inaccurate. If you think every edit I make is bad (which is damn near what people have been saying), that's inaccurate. If you think every comment I make toward another user is an attack, that's inaccurate. Frankly, everything said about me in this thread is hyperbole at best and completely inaccurate at worst (this is in no way a reflection on edits outside this thread). Purplebackpack89 (Notes Taken) (Locker) 01:04, 23 June 2014 (UTC)
I think you will find that the community here has a relatively thicker skin than yourself. Not that we tolerate gratuitous offence, but most regulars here will probably not care that much about most instances of what you are oh-so-ready to term "personal attacks". We have not been accusing you of "attacking" anyone. "Attacks" are irrelevant. We are accusing you of gross incompetence with regard to our practices and policies (not understanding that Wikipedia's policies may be not applicable here), elementary courtesy (shifting the blame and burden of proof, claiming you cannot be stripped of rights merely because you have not technically violated any written policy, or some imaginary policy — see previous) and reading comprehension (regularly misconstruing other editors' statements and questions, e.g. just above), and not just inability, but an outright refusal to change this state of affairs. Is every single edit of yours wrong? I guess no. But enough of them that the community has decided to watch you closely, and is seriously considering stripping you of your editing privileges. Keφr 01:42, 23 June 2014 (UTC)[reply]
(edit conflict) Would you care to cite one example in which you have ever assumed good faith from your opponents in discussions? To my memory I have never seen you admit you were wrong, and I have never seen you assume good faith. I have, however, seen dozens of marginally off-topic rants about why everyone is out to thwart you. Chuck Entz (talk) 00:14, 23 June 2014 (UTC)[reply]
Chuck, you need to divorce assuming someone is wrong from assuming someone acted in bad faith. When I ask, "Why did you do X" (as I did with CodeCat yesterday), I assume that that person has a perfectly good reason (and therefore acted in good faith) for doing so. I don't assume that anyone who disagrees with me is a vandal or is doing what he/she did for nefarious reasons; I have no idea where you got the idea I did. You'll also notice that when I comment on a person's particular edits, I ONLY comment on those edits; I make no generalizations. Purplebackpack89 (Notes Taken) (Locker) 00:28, 23 June 2014 (UTC)[reply]
As usual, you're not answering the question you were asked. I asked you to cite an example of assuming good faith for opponents in discussions. I didn't ask if you could point to examples where you didn't assume bad faith about someone's edits outside of discussions. As for generalizations: you do it all the time in discussions here. You may not point to individuals most of the time, but you do talk about how people around here are finding fault because they don't like how you dare to disagree with them. And you do spend lots of time implying all kinds of things, without explicitly saying them.
As for your actions yesterday: you saw my edit, and reverted it without asking why I made it. You were reverted. You reverted the revert without asking why you were reverted. You were reverted again, etc. It was only after it was clear you weren't going to prevail that you bothered to ask, and then it was more like demanding to know why than asking. If you had assumed good faith and asked, the whole episode would have been avoided. I'm not saying I've never done anything to make you think I might be acting in bad faith, but you clearly weren't assuming good faith. Chuck Entz (talk) 01:27, 23 June 2014 (UTC)
You are wrong: Had you bothered to look at the timing of reverts, you'll see that several CodeCat reverts occurred after I asked why she was doing what she was doing. She continued reverting before answering my question, so it's on her, not me. Also, "not assuming bad faith" and "assuming good faith" are the same thing. To claim they aren't is pedantic. Purplebackpack89 (Notes Taken) (Locker) 01:37, 23 June 2014 (UTC)[reply]
You didn't really ask, you came right at me with a rather aggressive tone. "What's the big idea" is not asking, nor is it a civil way to start a reasonable discussion. Furthermore you did not even wait for me to answer before reverting me anyway. I'm sorry but I agree with Chuck, that is more of a demand ("stop interfering, who do you think you are?") than an honest attempt to work out a problem to me. —CodeCat 01:41, 23 June 2014 (UTC)
For the record, I blocked P89 (per WP:DICK, lol). No, mainly because he doesn't really do anything useful, and just sows discord. If anyone who's also dealt with this person for several years thinks this was bad judgement, feel free to undo it. Equinox 03:05, 23 June 2014 (UTC)[reply]
It's been undone, and the manner in which you did it was highly inappropriate and frankly a personal attack (for crying out loud, you made an lol about a block you made). It's also inaccurate. Have you actually LOOKED at my contributions in mainspace recently? Have you LOOKED at my "pages created" list? Furthermore, it's probably not the greatest of ideas to block someone just because of how their contributions pie falls. Purplebackpack89 (Notes Taken) (Locker) 03:51, 23 June 2014 (UTC)[reply]
Cry us a river. Out loud. So what? Nobody is going to treat you nicely merely because you keep crying about "personal attacks"; in fact, this only worsens your position. But since you failed to learn this until now, I doubt you ever will. Your days here are numbered. Keφr 08:27, 23 June 2014 (UTC)[reply]
  • @Purplebackpack89: You can use Wikipedia's "Twinkle" tool by adding importScriptURI("//bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.gadget.Twinkle"); to your personal Special:MyPage/common.js, if you'd like. Please don't assume that users here have heard of Wikipedia's tools or policies, and definitely don't act as though their policies have any sort of relevance here; it's pretty ridiculous, moderately offensive, and generally unacceptable. --Yair rand (talk) 05:35, 23 June 2014 (UTC)[reply]

The {{l|en|word}} stuff

Please see User_talk:Ready_Steady_Yeti#The_.7B.7Bl.7Cen.7Cword.7D.7D_stuff. Ungoliant seems to think that I'm a moron. Do other regular editors not agree with me that it's bad enough to have to struggle through a lot of redundant "lang=en" every time we edit an English entry, and it would be worse to have the "l/en" stuff in every single linked word in a definition? Perhaps I sound paranoid or silly, but I do a lot of editing here (second most active editor) and I am genuinely concerned because this kind of stuff could double the amount of time it takes me to create an entry — and make it much more painful and less pleasant. Equinox 01:17, 23 June 2014 (UTC)[reply]

I think there's two issues: should editors be discouraged from using the {{l}} template, and should editors be discouraged from not using the {{l}} template. I would say no to both. It's true that RSY has, in typical fashion, taken the practice to extremes, but I don't think there's anything inherently wrong with it. All the same, wikilinking is fine for most purposes when linking to English terms, and for many editors the extra work of using templates isn't worth it- and that should be respected. There are parts of the entry where categorization, formatting and script handling tilt the balance in favor of templates, but that doesn't include definitions. Chuck Entz (talk) 01:50, 23 June 2014 (UTC)[reply]
You will not find an ally in me, sorry. I think |lang=en is not redundant to begin with, and that we generally should not treat English much differently from other languages — apart from the fact that the definitions, entry layout and all the documentation of the infrastructure around them are written in English. And I like definitions to have anchors pointing to the right language, although I prefer to accomplish that by just wrapping the whole definition inside {{l|en|...}} — then at least the inside of the template looks more like plain markup.
Related discussion: Wiktionary:Beer parlour/2013/September#A new way of formatting definitions I saw someone use Keφr 01:52, 23 June 2014 (UTC)[reply]
Personally, I prefer simple links in definitions, but I tolerate {{l}} links as well. I don't think they are detrimental as long as we are not forced to use them. --WikiTiki89 01:57, 23 June 2014 (UTC)[reply]
I don’t think you’re a moron. Also, while I strongly support the use of {{l}} and think it should be encouraged, I don’t think it should be forced on editors. — Ungoliant (falai) 02:22, 23 June 2014 (UTC)[reply]
To put it another way, which might be clearer or more convincing to computer people: when you write an HTML document, you can apply an entire property to a section, e.g. <p style="color:red">. This means you can then do anything inside the paragraph, and not care about the colour, because it's already been set. I feel we should be able to do this, language-wise. So all generic links in an English section would link to English, and all generic links in a Spanish section would link to Spanish, etc. I can see the point of specifying the language in a link where it isn't otherwise obvious, but having a default seems very useful and time-saving. I'm honestly amazed nobody agrees. Of course I will bow to consensus but wow! Equinox 03:03, 23 June 2014 (UTC)[reply]
Links inside foreign-language definition lines link to English most of the time though. — Ungoliant (falai) 03:27, 23 June 2014 (UTC)[reply]
I have always assumed that those whose main contributions are not in English entries simply don't care about any possible time-saving from dispensing with lang=en as it doesn't much affect them. There may also be some kind of fairness/equality ideology at play: "Why should English get special treatment?" "I have to do these extra keystrokes so why not those arrogant English-native yokels?" And then there is the technological uniformity-is-easier-to-code-for factor. DCDuring TALK 03:38, 23 June 2014 (UTC)
Well, exactly. Which is pretty much what I wrote above. Keφr 08:28, 23 June 2014 (UTC)[reply]
In my opinion, foreign words should be marked up with the HTML lang attribute (to indicate to screenreaders and search engines and so on what languages the words are), and the easiest/briefest way to do so is by use of {{l}} (or {{head}}), which has the added benefit of linking to the correct section. (Actually, it'd be nice if any 'nyms or 'nyms-like section in a FL L2 would have a <div lang=…>. But we don't have that (yet).) However, English words don't need that attribute: the default language for the page is English. Moreover, we don't need a template to ensure linking to the correct section when it's English, as English is the top section on each page where it exists (pretty much). So {{l/en}} is pretty useless: I generally replace it with plain square brackets when I see it.​—msh210 (talk) 04:41, 23 June 2014 (UTC)[reply]
Translingual sections and tables of contents prevent plain links from linking directly to the English section, as well as a same-language section in the page being linked to when Tabbed Languages is being used. — Ungoliant (falai) 05:10, 23 June 2014 (UTC)[reply]
Ah, right, I knew there was some issue I was forgetting. Yes, I agree: if and when we decide to enable Tabbed Languages by default, then we should switch English links to {{l}} in FL sections.​—msh210 (talk) 05:52, 23 June 2014 (UTC)[reply]
I already fix links to use {{l}} even for English. Mainly because I use Tabbed Languages. It seems a bit strange that we don't take TL users into account when we do support it as an option on our wiki. Why would we need to wait until it becomes the default? Even non-default options should still at least work correctly, shouldn't they? —CodeCat 12:04, 23 June 2014 (UTC)[reply]
TL would be a reason to use {{l}} for English only in FL sections, not in English sections. (I switch it to square brackets where I see it only in English sections, actually.)​—msh210 (talk) 16:42, 23 June 2014 (UTC)[reply]
So then we've been undoing each other's work. I'm not sure if I like that... —CodeCat 16:55, 23 June 2014 (UTC)[reply]
I agree with Equinox. I oppose using "{{l|en|" or "{{l/en|" in English definition lines of English entries. Wiki markup is the user interface; it has to be pleasant to use, which includes not only initial creation but also reading and revising. --Dan Polansky (talk) 20:53, 23 June 2014 (UTC)[reply]

Purplebackpack89

Jamaican Creole font

I assume that there's a rational reason for writing Jamaican Creole and some other languages with larger font than others (sc=Deva), but what is it? Jamaicans do not have worse eyes than the average citizen of the world, do they? --Hekaheka (talk) 09:18, 28 June 2014 (UTC)[reply]

Jamaican Creole shows up in the same font (and same size) as English for me, and in Module:languages/data3/j its script is set to "Latn", just like English's. - -sche (discuss) 02:26, 29 June 2014 (UTC)[reply]

No Fun Allowed

Hi. Please make this an official policy A.S.A.P. --Æ&Œ (talk) 10:04, 28 June 2014 (UTC)[reply]

A reminder to myself and others: DFTT. --Dan Polansky (talk) 10:30, 28 June 2014 (UTC)[reply]
SUOA. TATSOOM. But seriously, just because a topic is “trolling” doesn’t mean that you can’t have fun with it. Congratulations on ignoring the humour and supporting my exaggerated perception that having fun on Wiktionary is wrong. --Æ&Œ (talk) 13:34, 28 June 2014 (UTC)[reply]
Silly boy- creating valuable content for an online reference that educates and enlightens the public is all the fun anyone could ever want. Why, just yesterday, correcting an etymology to reference the correct Proto-Indo-European root so filled me with joy that I just had to laugh out loud! I have lots of fun (I understand my neighbors in the apartment building are concerned, though). Chuck Entz (talk) 14:28, 28 June 2014 (UTC)[reply]
Clearly the real issue is that we have too many toasters. -- Liliana 21:20, 28 June 2014 (UTC)[reply]

Über-template with tabular output for pronunciation section (2)

Older discussion: Wiktionary:Beer parlour/2014/March#Über-template with tabular output for pronunciation section

The hard parts have been mostly done, here's a prototype:


 

variety | IPA | Rhymes | Optional column | Audio | Homophones | Hyphenation
NL | /ˈrɛizə(n)/ | -ə(n) | VALUE | (file) | rijzen | rei‧zen
BE | /ˈrɛːzn/ [ˈrɛːzn̩] (invalid IPA characters: //[]) | | | | | rei‧zen
BE | /ˈrɛːzən/ | | | (file) | rijzen |
BE | /ˈrɛːzə/ | | | | rijzn |
Sandwich Islands | /gumbalagumba/ (replace g with ɡ; invalid IPA characters: gg) | | | | |

Click on "less ▲" or "more ▼" to switch between the views. (This approach has been <s>stolen</s> adopted from User:Atelaes' Template:grc-pron.) There has always been feedback from readers that they cannot find definitions; pronunciation sections are especially big, so I had proposed a show/hide button. But why not show some of the info instead of hiding everything completely?

Regarding the full view:

  • There's a set of predefined columns with predefined displaying text and order. These columns, including the first column (variety/accents), can be omitted, though. In templates/entries, users can define additional columns in their desired order. The hyphenation column will probably be removed from this template. We may predefine enPR column as well. We may want to create separate templates for different languages, similar to headword templates. Any comments would be appreciated.

Regarding the brief view:

  • What information should we put in this mode? I only put accent name, IPA, and audio, and kept them as brief as possible. Should we also include enPR?
  • Should it be a list or a table? How should we arrange and display the information?
  • Some accents are less important. For English, for example, there is usually little demand for pronunciation in accents other than American and British. Should we include all accents in this mode?
  • I made the audio box smaller, but the buttons (play, volume) sometimes disappear or are misplaced; maybe a CSS-related issue?

--Z 17:54, 28 June 2014 (UTC)[reply]

If enPR is included, it should be hidden along with the other extended content until someone clicks 'more', IMO; likewise for SAMPA.
Personally, I prefer bulleted lists to tables (for both the expanded and the condensed views).
- -sche (discuss) 16:50, 29 June 2014 (UTC)[reply]
I'm glad to see it stolen. I did my best to make it broadly pilferable. -Atelaes λάλει ἐμοί 20:20, 29 June 2014 (UTC)[reply]
  • Is the lack of colons after the dialect name deliberate?
  • Perhaps throw in some use of rowspan when certain content will be identical across accents?
  • Something to consider: We might want to build "About" pages for various accents. Could be some helpful content, and possibly useful for delineating where accents start and end.
  • Something we might not want to consider if it's likely to get too complicated and/or annoying: Flags.
  • This looks like it'll likely kill visibility of the very interesting Rhymes content on Wiktionary. Not in favor.
  • Perhaps Module:IPAc could be integrated, at least for English? Many readers don't know IPA.
  • How should accents that have identical content for a particular word be handled?

--Yair rand (talk) 05:04, 30 June 2014 (UTC)[reply]
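The rowspan suggestion above could work along these lines. This is a minimal sketch in Python under stated assumptions: `merge_column` is a hypothetical helper, one column is handled at a time, and empty cells are deliberately never merged (a design choice made here, not something proposed in the thread).

```python
from typing import List, Optional, Tuple

# (text, rowspan) for a cell that is emitted; None for a row covered by the cell above.
Cell = Optional[Tuple[str, int]]

def merge_column(values: List[str]) -> List[Cell]:
    """Collapse runs of identical consecutive values into one cell with a rowspan."""
    merged: List[Cell] = []
    run_start = 0
    for i, value in enumerate(values):
        if i > 0 and value != "" and value == values[i - 1]:
            # Same content as the row above: extend the run, emit no cell here.
            text, span = merged[run_start]
            merged[run_start] = (text, span + 1)
            merged.append(None)
        else:
            # New content (or an empty cell): start a fresh run.
            run_start = i
            merged.append((value, 1))
    return merged
```

For example, a hyphenation column that is identical for two accents would come out as one cell spanning both rows, followed by a placeholder for the covered row.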

Module Errors on Empty Input

Is there a good reason to have templates like {{l}} go to a module error if there's no content? I can understand a module error for a missing language code when there's content to be displayed, even if I find it annoying, but it seems to me that {{l|}} or even {{l|en|}} is a minor omission that shouldn't be dealt with by throwing scary-looking module errors.

It would seem to make more sense for the module to test for empty input and simply return nothing for nothing (I suppose a hidden tracking category would be ok). That would also mean it could be used in inflection-table templates without extra code to test for a null parameter.

In general, I think we should move away from using module errors as the way to deal with simple data-entry errors wherever possible- it gives the illusion of their being technical errors that only experts can fix. Chuck Entz (talk) 16:04, 29 June 2014 (UTC)[reply]
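The behaviour proposed above ("return nothing for nothing", plus a hidden tracking category) might look something like this. It is a sketch in Python rather than module code, and everything named here is an assumption made for illustration: `render_link`, the category name, and the output markup are hypothetical.

```python
from typing import Optional

# Hypothetical tracking category name, used only for this illustration.
TRACKING_CATEGORY = "[[Category:Links with empty term]]"

def render_link(lang: Optional[str], term: Optional[str]) -> str:
    """Return link markup; an empty term yields only a hidden tracking category."""
    term = (term or "").strip()
    lang = (lang or "").strip()
    if not term:
        # Nothing to display: emit no visible text instead of raising an error.
        return TRACKING_CATEGORY
    if not lang:
        # Content without a language code is still treated as a hard error.
        raise ValueError("language code is missing")
    return f'<span lang="{lang}">[[{term}]]</span>'
```

With this shape, an inflection-table template could pass a possibly empty parameter straight through without testing it first.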

I disagree that "{{l|}} is a minor omission" — it's the most major omission that it's possible to make from that template, the total omission of all content. But I do note that {{IPA|}} merely categorizes into Category:Pronunciation templates without a pronunciation and Category:Language code missing/IPA, without throwing a module error. Perhaps we should even add a superscript "please add a (link|pronunciation) or remove the template" message, similar to the message Dutch headword-line templates use when diminutives aren't provided. - -sche (discuss) 16:44, 29 June 2014 (UTC)[reply]
I also disagree, but I furthermore disagree that not displaying errors is an improvement. There are parts of the site which display big red errors even for relatively minor mistakes, like adding references to a page but nowhere to display them. Showing the errors makes them clearly visible and gives editors more of an incentive to fix them. "Out of sight, out of mind" definitely applies here. Errors that only add categories generally don't get fixed; just look at how big some cleanup/request/attention categories are. —CodeCat 17:23, 29 June 2014 (UTC)[reply]
How hard would it be to make the error message depend on the status of the person logged in and on 'type' of error?
I would argue that unregistered users, at one extreme, and admins, at the other, benefit from different approaches. Non-contributing unregistered users probably benefit from suppression of error messages. Our ability to recruit contributors may be enhanced by only gradually revealing how finicky we have made the process of contributing. In any event it seems highly likely that we are making the passive (i.e., normal) user's experience worse by exposing such users to raw Module error messages, without in any way leading to a better Wiktionary by eliciting valuable corrections from such users.
Further, I am reasonably sure that not all errors merit the same approach. Some thought should lead to an architecture that allows discrimination along both the user dimension and the error-class dimension.
I am sorry if (probably that) discriminating by user and specific error situation makes the task of designing modules and templates more complicated, but we already have a situation in which very few contributors can do any editing of modules and therefore cannot readily alter the behavior of templates without begging. I would hope that our talent could conceive of some simplifying architecture to enable this kind of discrimination in the main cases. DCDuring TALK 18:46, 29 June 2014 (UTC)[reply]
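One way to picture the two-dimensional discrimination described above is a small policy table keyed by user class and error class. The Python below is only a sketch: the class names, the three presentation levels, and every cell of the table are placeholder assumptions, not anything that has been agreed.

```python
from enum import Enum

class UserClass(Enum):
    ANONYMOUS = "anonymous"
    REGISTERED = "registered"
    ADMIN = "admin"

class ErrorClass(Enum):
    DATA_ENTRY = "data entry"  # e.g. an empty or missing parameter
    TECHNICAL = "technical"    # e.g. a bug in the module itself

class Presentation(Enum):
    HIDDEN = "hidden"  # tracking category only, nothing visible
    HINT = "hint"      # short "please fix this" note
    FULL = "full"      # full red error message

# Hypothetical policy; each cell is a placeholder choice for illustration.
POLICY = {
    (UserClass.ANONYMOUS, ErrorClass.DATA_ENTRY): Presentation.HIDDEN,
    (UserClass.ANONYMOUS, ErrorClass.TECHNICAL): Presentation.HIDDEN,
    (UserClass.REGISTERED, ErrorClass.DATA_ENTRY): Presentation.HINT,
    (UserClass.REGISTERED, ErrorClass.TECHNICAL): Presentation.FULL,
    (UserClass.ADMIN, ErrorClass.DATA_ENTRY): Presentation.FULL,
    (UserClass.ADMIN, ErrorClass.TECHNICAL): Presentation.FULL,
}

def presentation_for(user: UserClass, error: ErrorClass) -> Presentation:
    """Look up how an error should be shown; unknown combinations show in full."""
    return POLICY.get((user, error), Presentation.FULL)
```

Keeping the policy in one table would mean that changing who sees what does not require editing each module separately.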
It's very easy to hide module errors from anonymous users using CSS; I did it before. But at the time I think people didn't like it because it made text look strange, with odd gaps where the errors should be. I don't think CSS can be used to actually change the text "Module error" itself, but it can change how it appears. —CodeCat 20:11, 29 June 2014 (UTC)[reply]
I take it that it is not possible to close up the gaps by setting character width to be 0 or very small or selecting a font or pseudofont that has the property of being of zero width.
Another thing that may be useful is {{REVISIONUSER}}, which allows one to test whether a(n) (anonymous) user made an edit. That would allow us to provide a message (on previewing or saving the edit) to an anonymous user who otherwise would not get a message. DCDuring TALK 22:15, 29 June 2014 (UTC)[reply]