Wiktionary:Beer parlour

Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point of preserving all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives
2002
December
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016


June 2016

Colored box around closed votes?

I think it would be useful if we put a colored box around votes after they're closed, the way we do with RfDs when they're archived. Purplebackpack89 13:47, 1 June 2016 (UTC)

It might be nice, but you could overlook that, too, like you overlooked the "Status/Votes" column. An absent-minded mistake by you doesn't mean we have to rearrange everything to make it impossible for you to make the same absent-minded mistake again. Most people would just say "oops- my bad" and let it go. Chuck Entz (talk) 01:29, 2 June 2016 (UTC)
Purplebackpack89 also overlooked the decision at the end of the vote page. --Daniel Carrero (talk) 01:59, 2 June 2016 (UTC)
@Chuck Entz I'm not the dullest tool in the shed. If I make that absent-minded mistake, it's likely others would too. It's unlikely I'd notice something like the whole vote being shaded red, blue or green. @Daniel Carrero I don't think you're seeing the problem. Because the decision is at the bottom instead of the top, it is BEYOND the voting section. Purplebackpack89 16:47, 2 June 2016 (UTC)
Yeah, but it also has the close date near the top of the vote. Maybe you should give the "I'm not the dullest tool in the shed" thing a rest. —Μετάknowledgediscuss/deeds 17:49, 2 June 2016 (UTC)

Closed votes still in "current votes" section

Should votes that are closed be removed from the "current votes" section and put in a "recently closed" section? It seems bad form to have open votes and closed votes both listed as "current". Purplebackpack89 13:49, 1 June 2016 (UTC)

No. Not worth the extra visual clutter and the extra work. It's hard enough to make things idiot-proof- do we have to make things Purplebackpack89-proof, too?
Please note that I'm not calling you or likening you to an idiot (an idiot would be easier to anticipate). Chuck Entz (talk) 01:43, 2 June 2016 (UTC)
@Chuck Entz They have these fail-safes on Wikipedia. It's not like I'm suggesting anything revolutionary. And you ARE kinda suggesting that mistakes I make are mistakes nobody else could ever conceivably make, while defending a very confusing process. Purplebackpack89 16:48, 2 June 2016 (UTC)
Absent-minded mistakes aren't related to level of intelligence- if anything, people with more going on in their minds are more likely to make them. I also am not saying that you make mistakes that nobody else makes (except with regards to misinterpreting others' intent- but that's different). No, my point was that your response to your mistakes is different: there's no need to find someone or something to blame for an absent-minded mistake- we all make them, and no one would bat an eyelash at your admitting to one. You're not going to be singled out from the midst of the flock and eaten by wolves if you show signs of weakness. Chuck Entz (talk) 02:40, 3 June 2016 (UTC)
This isn't about blame, though, it's about improvement. IMO, there are a lot of ways in which Wiktionary is organized that could be better. This is one of them. Purplebackpack89 04:11, 3 June 2016 (UTC)

Sending thanks

This may seem bloody obvious to some, but…! On the history page you are given the option of thanking an editor. When chosen you are asked "Do you want to send public thanks? Yes or No" - the question could be taken as ambiguous. If I choose "No", am I (1) cancelling the thanks, or (2) sending thanks privately? What does "public thanks" mean? Where are they published?   — Saltmarshσυζήτηση-talk 06:38, 2 June 2016 (UTC)

Special:Log/thanks has a list of them. Wyang (talk) 06:46, 2 June 2016 (UTC)
Yes, I wasn't too sure what it meant when I first thanked someone for an edit. I went with "Yes" just in case, but I imagine many newer users are also confused by the ambiguity. Andrew Sheedy (talk) 07:41, 2 June 2016 (UTC)
@Saltmarsh "Send public thanks for this edit?" If you click no, you are cancelling the thanks.
The question probably should be: "Do you still want to send thanks, with the full knowledge that it will be public? (Yes/No)"
Just to make sure, I clicked "Thanks" in your Beer Parlour edit and then clicked No. If you received my thanks, then I was wrong. --Daniel Carrero (talk) 12:40, 2 June 2016 (UTC)
That's a bit long, though. "Really send public thanks?" would suffice. Equinox 13:38, 2 June 2016 (UTC)
Am I missing something, or does the thanks log not in fact indicate the specific edit that was "thanked"? Equinox 13:40, 2 June 2016 (UTC)
If we want to configure this, it seems the message to edit is mediawiki:Thanks-confirmation2. —msh210 (talk) 14:22, 2 June 2016 (UTC)
It was "Send public thanks for this edit?" and I've changed it to "Send thanks for this edit? It will be public.". How's that? —msh210 (talk) 15:52, 2 June 2016 (UTC)
I like Equinox's version ("Really send public thanks")--Dixtosa (talk) 15:54, 2 June 2016 (UTC)
Personally, I don't like when software uses the word "really". It sounds too colloquial. --WikiTiki89 16:01, 2 June 2016 (UTC)
(You should see the appalling slang in Office 2013!) Alternatively, we could just reduce it to "Send thanks for this edit?", and document the fact that thanks are public elsewhere. We don't warn about public-ness for other common wiki operations. Equinox 16:57, 2 June 2016 (UTC)
I've seen it, and I don't like it. --WikiTiki89 17:40, 2 June 2016 (UTC)
Equinox's Send thanks for this edit? seems to solve the problem succinctly.   — Saltmarshσυζήτηση-talk 04:59, 3 June 2016 (UTC)
Yeah, but it doesn't solve the problem of notifying the user that the thanks will be public. --WikiTiki89 14:27, 3 June 2016 (UTC)
Why is it necessary to double-check that a user really wants to do what he just said to do? I can understand having that sort of failsafe in place for something potentially damaging, but not for something as innocuous as sending thanks. Can't we just eliminate the message altogether and allow clicking on "thank" to immediately do what it says it does? —Aɴɢʀ (talk) 14:23, 3 June 2016 (UTC)
It's right next to "undo". Really not the kind of situation where you want a slip up. Korn [kʰũːɘ̃n] (talk) 17:16, 3 June 2016 (UTC)

Potential Bot for Adding LSJ and L&S Links to Ancient Greek and Latin Entries

Hello. In the last few days I have edited the L&S and LSJ templates and modules so that links to the dictionaries resolve correctly from the page names, without use of arguments, in a very large proportion of cases. The exceptions mainly involve proper nouns, affixes, non-lemma forms, and alternative spellings, which are not precisely bugs. I have tested a robot called OrphicBot to add LSJ external links to the subset of 4,062 of Wiktionary's approximately 7,000 Ancient Greek entries which are not already linked, which are lemmas, and for which the bare template is tested to produce a valid result. Since, for example, almost all German entries link to the Duden dictionary, it seems consistent to include a link to a freely available dictionary for Greek. I also think it could be quite helpful, since too much inconvenience, perhaps, in Hellenistic pursuits is merely typographical in nature. Similarly, the Wiktionary Latin section is much more developed, with nearly 30,000 lemma entries, as I recall. If it seems reasonable to others, I would also like to add links to the L&S dictionary via template where these are not already present. The source code (albeit grossly formatted, and perhaps still in a rough iteration) is linked on the user page of OrphicBot, and a small test run can also be seen in that user's contributions. If others here consider these edits reasonable, I will put the bot user status question to a vote in the voting area. Thank you. Isomorphyc (talk) 03:04, 3 June 2016 (UTC)

I don't know about L[ewis] & S[hort], but we already have a template {{R:LSJ}} that makes links to Liddell and Scott. The problem is the large number of Ancient Greek entries that don't use it, but instead have merely a link to Wikipedia's article on the Liddell and Scott dictionary. What I'd like a bot to do is go through and change all instances of *[[w:LSJ|LSJ]] to *{{R:LSJ}}, adding any necessary arguments as well. —Aɴɢʀ (talk) 14:28, 3 June 2016 (UTC)
Edited for clarity: this is a problem I have felt as well. Here are options by increasing aggressiveness:
1) just add {{R:LSJ}} to External Links where valid, even if an LSJ-mention exists.
2) replace all LSJ-mentions with LSJ-templates where valid; potentially this effaces bibliographical information (negligibly, I think: if an LSJ mention happens to imply the paper dictionary where it differs from the Perseus version, or where it implies the preface rather than the headword entry, for example.) This is close to my preference.
3) move all existing LSJ links to External links for consistency. This is consistent and easy to use, but it destroys far too much bibliographical information.
Additional options/issues:
- Add an additional template to categorise lemmas with no valid entry in LSJ for manual linking. Mostly these are a few hundred non-Attic dialectal spellings and some number of Byzantine words. The former will usually be in LSJ with Attic spellings and the latter will not. A few other examples are prefixes and suffixes. The number is not large and I think this is worth doing.
- I would want to skip over inflected forms; given that there are potentially millions of them, destemming and linking them seems like clutter.
Are there other Greek desiderata that can be addressed?
Isomorphyc (talk) 01:55, 5 June 2016 (UTC)
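The replacement Angr asks for (turning bare *[[w:LSJ|LSJ]] lines into *{{R:LSJ}}) could be done with one regular expression. This is a minimal Python sketch, assuming the bot works on raw page wikitext; it is not OrphicBot's actual code. It only rewrites lines consisting of nothing but the bare link, so lines carrying extra bibliographical detail (the concern raised under option 2) are left alone, and it does not try to add headword arguments.

```python
import re

# A bulleted line holding nothing but the bare Wikipedia link, e.g. "* [[w:LSJ|LSJ]]".
# [ \t]* is used instead of \s* so the pattern can never swallow a newline.
# Lines with extra text after the link (page numbers, "s.v." notes) do not match.
BARE_LSJ_LINE = re.compile(r"^(\*[ \t]*)\[\[w:LSJ\|LSJ\]\][ \t]*$", re.MULTILINE)


def replace_bare_lsj(wikitext: str) -> str:
    """Swap each bare LSJ link line for the {{R:LSJ}} template, keeping the bullet."""
    return BARE_LSJ_LINE.sub(r"\1{{R:LSJ}}", wikitext)
```

Restricting the match to whole lines is a deliberately conservative choice: anything more elaborate would need a human (or a richer parser) to decide what bibliographical information might be lost.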
Hello @JohnC5, @Wikitiki89, @Angr, @Metaknowledge, @Chuck Entz -- thank you all for participating in my small discussion about my robot. I have opened a vote on this topic in the voting [area]. I would respect any of you if you chose not to support me in this, or to abstain, especially since I am so new here; but I would also be gratified should any of you choose to vote. Naturally I would be exceedingly gratified for any of your support. I hope that my recent activity has given some sense of the types of contributions I like to make to Wiktionary. I would still be very grateful for any further concrete References desiderata; I have posted a few blocks of samples on the user page of User:OrphicBot should it bring anything to mind that anyone might like-- or indeed might not like about the presentation. Thanks. Isomorphyc (talk) 07:27, 16 June 2016 (UTC)
Hi @JohnC5, I've been working on the pronunciations a bit. Does the new robot edit on diff:χρηστότης look worth proceeding with? I'm posting this here mainly so that anyone interested knows I am working on this and can object if desired. It seems that 1/8th of the grc-ipa-rows usages have only unambiguous vowels (approx. 235), and these can be replaced with no arguments. If this looks reasonable, I'll proceed with the following steps: 1) test for a, i, u in diphthongs and call grc-IPA with no arguments; 2) test for breves or macrons in head=... and generate arguments; 3) look into finding head=... arguments from LSJ, or possibly flagging ambiguous vowels missing head=... arguments, either from grc-noun (and similar templates) or with a robot. Also: I noticed that grc-ipa-rows produces unexpected output fairly regularly, so I am not using it to test the correctness of, or to generate, any grc-IPA arguments. Isomorphyc (talk) 18:06, 20 June 2016 (UTC)
@Isomorphyc: The diff for χρηστότης looks good to me. I agree that all unambiguous entries should be changed, and your plan for proceeding seems logical. If we can cut down the number of ambiguous ones to a few hundred, we can fix the rest by hand. —JohnC5 00:07, 21 June 2016 (UTC)
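Step 1 of the plan amounts to asking whether every α, ι and υ in a word is either part of a diphthong or explicitly marked for length. A rough Python sketch under stated assumptions (not OrphicBot's actual code): length counts as marked by a macron, breve, circumflex or iota subscript; a diaeresis on the second vowel splits a would-be diphthong; and the diphthong list is simplified.

```python
import unicodedata

# Vowels whose length is not shown by the letter itself.
AMBIGUOUS = {"α", "ι", "υ"}
# Combining marks (after NFD normalisation) assumed here to settle vowel length:
# macron, breve, circumflex (perispomeni) and iota subscript (ypogegrammeni).
LENGTH_MARKS = {"\u0304", "\u0306", "\u0342", "\u0345"}
DIAERESIS = "\u0308"
# Simplified diphthong list; a diaeresis on the second vowel splits the pair.
DIPHTHONGS = {"αι", "αυ", "ει", "ευ", "οι", "ου", "ηυ", "ωυ", "υι"}


def _letters(word: str) -> list[tuple[str, set[str]]]:
    """Split a word into (base letter, set of combining marks) pairs."""
    letters: list[tuple[str, set[str]]] = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and letters:
            letters[-1][1].add(ch)  # attach the mark to the preceding base letter
        else:
            letters.append((ch, set()))
    return letters


def vowels_unambiguous(word: str) -> bool:
    """True if every α, ι, υ is inside a diphthong or carries a length-fixing mark."""
    letters = _letters(word)
    i = 0
    while i < len(letters):
        base, marks = letters[i]
        if i + 1 < len(letters):
            next_base, next_marks = letters[i + 1]
            if base + next_base in DIPHTHONGS and DIAERESIS not in next_marks:
                i += 2  # skip the whole diphthong
                continue
        if base in AMBIGUOUS and not (marks & LENGTH_MARKS):
            return False
        i += 1
    return True
```

Under these assumptions χρηστότης passes (it has no α, ι or υ at all), while ἀρχή fails because its α is unmarked and would need a head=... argument or manual review.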
Hi @JohnC5, per our discussion about the memory utilisation of the data modules, I have sharded the six largest across four shards. Here is a table of the changes:
| new location | new name | old name | old size | description |
|---|---|---|---|---|
| Module:data tables/data[0-2] | grc_RLBG_lemma_to_index | Module:R:LBG/data | 2.2 M | lemma to index |
| Module:data tables/data[0-2] | grc_RWoodhouse_lemma_to_headwords | Module:R:Woodhouse/reverse index | 1.4 M | lemma to headwords |
| Module:data tables/data[0-2] | grc_RWoodhouse_lemma_to_infinitives | Module:R:Woodhouse/psia1 to infs | 737 K | lemma to infinitives |
| Module:data tables/data[0-2] | la_RMA_index_to_phrases | Module:R:M&A/ix to phrase | 400 K | index to phrases |
| Module:data tables/dataUnsharded | grc_R:Cunliffe_lemma_to_index | Module:R:Cunliffe/data | 277 K | lemma to index |
| Module:data tables/dataUnsharded | la_R:M&A_lemmas_no_collision_to_ix_phrase | Module:R:M&A/lemmas no collision to ix phrase | 143 K | lemma to indices |
Data access can now take this form: require("Module:data tables").index_table("grc_RCunliffe_lemma_to_index", title) instead of this form: mw.loadData("Module:R:Cunliffe/data")[title]. Hence, only the shard which contains data for a given lemma now has to be loaded. Better still, since sharding takes place by title-key, all modules retrieving data for a given key on the same page will require only one module load. It turns out that it takes only about ten lines of Python to reshard the data into an arbitrary number of files. I would suggest thirty to one hundred, giving a file size of 50 K - 150 K per shard. An added benefit is that new tables, should they be needed for other modules, can be added to dataUnsharded by hand in whole blocks. If this scheme seems to work, resharding can take place mechanically on an as-needed basis or by watching the file sizes. I view this solution as about six flavours of horrible (data masquerading as code), but it seems to solve our memory ceiling problem in a scalable and extensible way. I haven't altered the production modules yet, or tested them very extensively, but preliminarily it seems to work; I wanted to mention it here in case there are obvious objections which have not come to my mind. Isomorphyc (talk) 03:40, 29 June 2016 (UTC)
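The resharding step can be illustrated with a short Python sketch. It assumes the shard is chosen by a simple byte-sum hash of the page title, picked here only because it is easy to reproduce in Lua; the post does not say which hash Module:data tables actually uses. reshard groups every table's entries by the shard their title falls in, index_table mimics the lookup (only one shard is touched per title), and lua_literal shows how each shard could be written out as a data module of the form "return { ... }".

```python
def shard_of(title: str, n_shards: int) -> int:
    """Pick a shard from the title alone, so all tables keyed by one title land together."""
    # Hypothetical hash: sum of the UTF-8 bytes (reproducible in Lua with string.byte).
    return sum(title.encode("utf-8")) % n_shards


def reshard(tables: dict[str, dict[str, object]], n_shards: int) -> list[dict[str, dict[str, object]]]:
    """Split {table_name: {title: value}} into n_shards dicts of the same shape."""
    shards: list[dict[str, dict[str, object]]] = [{} for _ in range(n_shards)]
    for table_name, table in tables.items():
        for title, value in table.items():
            shards[shard_of(title, n_shards)].setdefault(table_name, {})[title] = value
    return shards


def index_table(shards, table_name: str, title: str):
    """Look up one entry, touching only the shard that can contain it."""
    return shards[shard_of(title, len(shards))].get(table_name, {}).get(title)


def lua_literal(value) -> str:
    """Serialise strings, numbers, lists and dicts as a Lua table constructor."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return '"' + escaped + '"'
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    if isinstance(value, list):
        return "{" + ", ".join(lua_literal(v) for v in value) + "}"
    if isinstance(value, dict):
        fields = (f"[{lua_literal(k)}] = {lua_literal(v)}" for k, v in sorted(value.items()))
        return "{" + ", ".join(fields) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Each shard's module text would then be "return " + lua_literal(shard). Because the shard depends only on the title, every table entry for a given lemma ends up in the same file, which is what makes the single-load property described above hold.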

bot status vote

Planned, running, and recent votes

| Ends | Title | Status/Votes |
|---|---|---|
| Jul 19 | label → lb | passed |
| Jul 22 | User:Smuconlaw for admin | passed |
| Jul 28 | CFI: List of terms | support 4, oppose 6, abstain 0 |
| Aug 1 | Tohru for deadmin | support 5, oppose 0, abstain 1 |
| Aug 13 | Editing "Flexibility" | support 4, oppose 1, abstain 0 |
| Aug 20 | Adding PIE root box | support 0, oppose 7, abstain 0 |
| Aug 23 | Pronunciation 2 | support 2, oppose 1, abstain 0 |
| Sep 21 | Using template l to link to English entries | support 1, oppose 4, abstain 0 |
| Sep 28 | Using template l to link to entries | starts: Jul 31 |

Is no one watching the floating {{votes}} box? Some people are wondering why the vote on User:UT-interwiki-Bot has not been closed out. It appears to have passed a week ago. —Stephen (Talk) 14:43, 3 June 2016 (UTC)

Since you noticed it, could you not have closed it out? Anyway, I just closed it out. --WikiTiki89 14:53, 3 June 2016 (UTC)
That's the issue with having the box require manual updating, which I seem to remember opposing back when @Daniel Carrero instituted it (but I thought he would deal with it). —Μετάknowledgediscuss/deeds 18:11, 3 June 2016 (UTC)
You are referring to Wiktionary:Beer parlour/2016/January#Vote counter. Adding the result in the box was @Benwing2's idea, I just implemented it. In that discussion, I did not even formally support the idea, I "voted" abstain. That is, I don't really care if we have the result in the box or not. --Daniel Carrero (talk) 18:21, 3 June 2016 (UTC)
@Metaknowledge: The problem here is not that the box wasn't updated, but that the vote wasn't closed at all. --WikiTiki89 18:42, 3 June 2016 (UTC)
My mistake. —Μετάknowledgediscuss/deeds 18:49, 3 June 2016 (UTC)
I used to close out most of the votes, but the last time I did it, User:DCDuring began crying corruption! corruption! (or words to that effect), and, try as I might, I was never able to get an explanation of his accusation, so I stopped handling votes. I would not even have mentioned this unnoticed vote here, except that someone asked me to close it out, which I will no longer do. —Stephen (Talk) 18:22, 3 June 2016 (UTC)
I think you are talking about Wiktionary:Votes/pl-2014-07/Allowing well-attested romanizations of Sanskrit and Wiktionary:Beer parlour/2015/July#Persistent extensions of votes.
If you change your mind and decide to close votes again in the future, it would be fine by me, for what it's worth. --Daniel Carrero (talk) 18:37, 3 June 2016 (UTC)
I have four main objections to past practice on voting:
  1. Votes should rarely be extended and never by initiators of proposals or those who have strong opinions. Exceptions might be made by following some reasonable procedure.
  2. Substantive votes should not be too abundant or complex. Some special procedure might be needed to allow a large number of votes or a complex voting structure.
  3. There should be a minimum number of participants, including abstainers, for a vote to be closed out with an outcome that changes the status quo ante.
  4. "Technical" changes that have broad implications should not occur without a vote.
These are essentially due process objections. Does anyone have any reason why I should not have these objections? DCDuring TALK 18:47, 3 June 2016 (UTC)
You should have listed your objections instead of crying corruption! over and over. At first I thought you were accusing Dan of being corrupt because he likes to extend votes. Eventually it dawned on me that you were accusing me of corruption, but could not imagine what you were referring to. In any case, let’s not rehash this here. As far as I’m concerned, the matter ended a year ago and I’m not interested in reliving it. I’m only explaining to Wikitiki why I did not close out the vote. —Stephen (Talk) 18:57, 3 June 2016 (UTC)

Constructed languages and Foreign words of the day

Per the opinion poll taken alongside the original vote that established the Foreign word of the day project, we have only featured words in attested natural languages. However, part of the purpose of FWOTD is to exhibit the content that we have to offer, which includes excellent coverage of various constructed languages and reconstructed languages. Personally, I would like to see mainspace constructed languages like Esperanto be featured, and also well-referenced reconstructed languages like Proto-Germanic (so nothing in the Appendix namespace would be featured). Of course, they would still need to meet all the requirements that terms normally need to meet. What do you think? —Μετάknowledgediscuss/deeds 21:17, 3 June 2016 (UTC)

I don't mind including words in constructed languages already approved for mainspace (e.g. Esperanto), but I'd be opposed to including words in protolanguages. There's a reason protolanguages aren't in mainspace, and that reason should apply to FWOTD as well. —Aɴɢʀ (talk) 23:07, 3 June 2016 (UTC)
I support the featurability of both. — Ungoliant (falai) 02:23, 4 June 2016 (UTC)
Ditto   — Saltmarshσυζήτηση-talk 04:34, 4 June 2016 (UTC)
I also have no real objection to either being featured. It could be interesting to include appendix-only constructed languages as well, but that might limit our credibility as a serious dictionary in the eyes of some. Andrew Sheedy (talk) 05:38, 4 June 2016 (UTC)
It feels as though these would be mainly of interest to linguists and editors, and less so to mainstream language users and learners. Equinox 05:45, 4 June 2016 (UTC)
I was thinking no more than one a month of each, for that reason. But then again, more than 400,000 people are learning Esperanto on Duolingo, so these clearly aren't so unpopular as you might think. —Μετάknowledgediscuss/deeds 05:54, 4 June 2016 (UTC)

Deprecation tags for language codes

@-sche I've added a bit of code to Module:languages that checks for a deprecation tag on the language data, and includes a tracking template if found. This can be used to easily track down uses of a code that is being phased out, without generating errors everywhere when the code is removed. To use it, just place deprecated = true in the entry in the relevant language data module. Pages that use the code will then appear in Special:WhatLinksHere/Template:tracking/languages/deprecated as well as "Special:WhatLinksHere/Template:tracking/languages/deprecated/(language code)". See Special:WhatLinksHere/Template:tracking/languages/deprecated/dlc for an example where this has been used. I hope it's helpful! —CodeCat 12:35, 4 June 2016 (UTC)
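For readers unfamiliar with tracking templates, here is a small Python model of the mechanism described above. It is not the Lua in Module:languages, and the shape of the language data is simplified to a plain dictionary: a code whose entry carries deprecated = true yields two tracking pages, and the WhatLinksHere list of either page then shows every page still using the code.

```python
TRACKING_BASE = "Template:tracking/languages/deprecated"


def deprecation_tracking_pages(code: str, language_data: dict[str, dict]) -> list[str]:
    """List the tracking pages that a use of `code` would be recorded under."""
    entry = language_data.get(code, {})
    if not entry.get("deprecated"):
        return []  # codes without the flag are not tracked
    # One catch-all page plus one page specific to this code.
    return [TRACKING_BASE, f"{TRACKING_BASE}/{code}"]
```

The flag only adds tracking and never raises an error, which is what allows a code to be phased out gradually before it is removed from the data modules.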

Automatic transliteration for Thai has been disabled for now

Previous discussion: User talk:Wyang#Module:links

I disabled the automatic transliteration for Thai, because Module:th-translit isn't generating the right transliterations. Apparently, the code to generate the correct transliteration is located in Module:th in the getTranslit function, so this needs to be added to the transliteration module so that it generates the correct transliterations. User:Wyang had added workaround code to Module:links instead, but this is inappropriate, especially considering the code to generate a proper transliteration already exists, so I removed it again. Module:th-translit should be modified so that such workarounds are no longer necessary; then the automatic transliteration can be reinstated. —CodeCat 12:59, 4 June 2016 (UTC)

What you are doing is a perfect manifestation of your arrogance, ignorance and mindlessness. "So this needs to be added to the transliteration module so that it generates the correct transliterations." – while Module:th-translit is working perfectly fine with phonetically respelled words. You are suggesting that I should turn a transliteration module into a module that actually parses the entire entry's Wikitext and extract certain parts of the text, because "this is what a transliteration module is supposed to do". Sigh! So much for Eurocentric hubris on Wiktionary. "I shall break it, and ask you plebs to explain to me why things broke after this." Wyang (talk) 13:23, 4 June 2016 (UTC)
It's supposed to work with as many words as possible, not just phonetically respelled ones. The getTranslit function is capable of generating better transliterations, so this needs to be integrated into Module:th-translit. Right now, Module:th-translit only correctly transliterates a subset of the words that it could, in theory, but adding custom code to Module:links is not the way to fix that. Modifying Module:th-translit is the right way. User:Wikitiki89 even did so yesterday, and you just reverted it. Why? —CodeCat 13:36, 4 June 2016 (UTC)
Because those codes do not belong to a transliteration module page. How many times do I need to iterate that? Wyang (talk) 13:43, 4 June 2016 (UTC)
Yes they do. And they certainly do not belong on Module:links instead. —CodeCat 13:47, 4 June 2016 (UTC)
Which definition of "transliteration" is for this? Wyang (talk) 13:58, 4 June 2016 (UTC)
The same definition we apply across Wiktionary: generating a Latin-script version of a word, that can be understood by people who don't know the script. The accuracy of the transliteration, or its nature (pronunciation or spelling based) is up to the editors of the language and of the transliteration module. However, under no circumstances should a generic language-agnostic module be used to work around a deficiency of the transliteration module. —CodeCat 14:05, 4 June 2016 (UTC)
In that sense Module:th-translit is working perfectly well. It's just that your Module:links failed to take into account the fact that some languages require another level of phonetic respelling extraction, and it is that phonetic respelling, rather than the entry title itself, that needs to be fed to the transliteration modules. Wyang (talk) 14:17, 4 June 2016 (UTC)
Yes, and in those cases, we use the tr= parameter that is available on countless templates. But let's stick with the situation here. You have a function getTranslit that is clearly capable of generating the correct transliteration, albeit that it has to parse the page's content in order to extract it. The method used is completely irrelevant. It is clear that there exists a function that is capable of doing the transliteration better than Module:th-translit is currently doing. Therefore, it seems obvious that this function should be added to Module:th-translit so that its transliterations become more accurate. This is what Wikitiki89 tried to do, so what is your objection against having better transliterations? And why do you insist on putting inappropriate workarounds in Module:links instead? —CodeCat 14:29, 4 June 2016 (UTC)
Regarding your latest attempt at editing Module:links, the edits are completely unnecessary. This module doesn't have to account for this "phonetic extraction". The transliteration module can perform "phonetic extraction" instead. So please, for the nth time, add it to Module:th-translit and stop edit warring in Module:links. —CodeCat 14:32, 4 June 2016 (UTC)
I just fixed your Module:links, which you again reverted. Module:th-translit is functioning perfectly, given the right inputs. Stop insisting that this belongs at Module:th-translit; it does not. This is not transliteration.
Transliteration is not concerned with representing the sounds of the original, only the characters, ideally accurately and unambiguously. (Wikipedia)
It belongs at Module:links, which is lacking this new functionality of extracting the phonetic respelling to feed into the transliteration module. So for the nth time, please mend your Module:links so that it is fully language-agnostic, not just European language-agnostic. Wyang (talk) 14:49, 4 June 2016 (UTC)
The transliteration module itself should extract this information if it needs it. —CodeCat 14:55, 4 June 2016 (UTC)
Then it is not a module that does transliteration any more. This is exactly why the transliteration module should not be responsible for extracting this. Transliteration module is for transliteration, which is faithfully and systematically converting one writing system to another. Module:th-translit is fully functional at what it does, which is transliteration. A module that tries to extract phonetic respellings is a pronunciation module, which would have to be defined in Module:languages/data2 and have the infrastructure built around it, i.e. mending Module:links. Either way Module:links has to incorporate additional functionalities for non-phonetic languages. Wyang (talk) 15:03, 4 June 2016 (UTC)
I don't care if it doesn't do transliteration according to your narrow idea of what a transliteration is. Nobody else on Wiktionary cares either, I'd bet. What we all care about is that it generates transliterations according to what Wiktionary's idea of transliteration is, and has been for years, not what your idea of it is. —CodeCat 15:07, 4 June 2016 (UTC)
You are arguing whatever you believe in is what Wiktionary believes in, allegedly in opposition to what I believe in. A bit tongue-tied, probably? Wyang (talk) 15:17, 4 June 2016 (UTC)
I have restored automatic Thai transliteration. Remember that what you are doing is against the goal of this project - rather than improving the pages, removing information from numerous entries. Wyang (talk) 13:36, 4 June 2016 (UTC)
I have removed it. It's still not fixed. Stop edit warring and reach a consensus first. —CodeCat 13:37, 4 June 2016 (UTC)
Edit warring? Or undoing highly destructive edits to the project? Wyang (talk) 13:39, 4 June 2016 (UTC)
You added unnecessary custom code to Module:links, and when reverted, you keep reinstating it over and over despite a clear lack of agreement. That is edit warring against consensus. Reach a consensus for your edit first, then it can be reinstated. —CodeCat 13:40, 4 June 2016 (UTC)
It has been there for months. You abruptly removed it, causing all the Thai links to malfunction, prompting Thai editors to ask me to look into the problem and restore the original functionality. Can you be even further from the truth? Wyang (talk) 13:43, 4 June 2016 (UTC)
It never should have been added in the first place. Not in a highly visible and widely used language-generic module like Module:links. Language-specific code belongs in language-specific modules. —CodeCat 13:45, 4 June 2016 (UTC)
User:Wyang, again, please reach a consensus for your edit to Module:links rather than forcing the issue. Do not edit war to push your opinion through. Wait until there is a general agreement that your code belongs in the module. —CodeCat 13:53, 4 June 2016 (UTC)
Stop vandalising the page! Your removal simply wiped out thousands of correct Thai transliterations from Wiktionary pages. Where is your protest when I added it back in February? And where is your explanation when you suddenly removed the code 6 days ago? If you would like to maintain the status quo, at least get the version right. Wyang (talk) 13:58, 4 June 2016 (UTC)
Is there a time limit for contesting something? How long ago should an edit be before it's considered an automatic consensual status quo? Do we have a policy for this? I am contesting your edit now, as have two others so far, but you continue to ignore them and push your edit through. That is edit warring against consensus and I wouldn't be surprised if it got you blocked, though I won't be the one to do it because I'm involved in the dispute and people won't like that. —CodeCat 14:01, 4 June 2016 (UTC)
Did you forget that your edit had been reverted twice [38649499][38650974] by someone other than me? Taking out the block card now? A step-up from your threat to disable on my talk page? Four months seem like a much longer time than 6 days. Wyang (talk) 14:07, 4 June 2016 (UTC)
Reverts aren't the only way to contest an edit. But in any case, your edit was reverted first by me, then by Wikitiki, then by me again, then you started edit warring, and Dixtosa has also contested your edit. In comparison, only you and Metaknowledge have supported it. According to our common practice, consensus requires a 67% majority in favour, which is clearly not the case. So your edit has no consensus. —CodeCat 14:09, 4 June 2016 (UTC)
So stop your vandalism. The reason you dare to tackle Thai specifically is you simply don't care. You just don't care about what Thai editors think at all, hence destroying thousands of Thai entries is perfectly justified in your opinion. Wyang (talk) 14:17, 4 June 2016 (UTC)
Please stop using personal attacks. Reverting an edit that has no consensus is not vandalism. Reinstating that edit over ten times despite being notified that your edit has no consensus is vandalism. —CodeCat 14:29, 4 June 2016 (UTC)
Are you denying that your edit effectively eliminates valid Thai transliterations from thousands of entries? Repeatedly removing any one of those thousands of transliterations would lead to someone being blocked. So not vandalism you say? Wyang (talk) 14:34, 4 June 2016 (UTC)
Only for as long as the transliteration module hasn't been fixed to compensate. The fact that you refuse to do so does not suddenly make my reversions vandalism. In fact, you also reverted Wikitiki89's edit to Module:th-translit, which did fix (or attempt to fix) the module. So it appears you are not actually interested in fixing the transliterations. —CodeCat 14:37, 4 June 2016 (UTC)
I have now reinstated User:Wikitiki89's edit to Module:th-translit. Reverting this again would re-break the transliterations, thus doing the exact same thing that you accuse me of doing. So if you revert this too, then I can only assume you are not interested in finding a solution for this problem. —CodeCat 14:41, 4 June 2016 (UTC)
It looks like พลเรือน (pon-lá-rʉʉan) once again has the correct transliteration. Why you reverted the edits by Wikitiki89 that restored this is beyond me. But please do not break it again. —CodeCat 14:45, 4 June 2016 (UTC)
As I have said numerous times before, this is not transliteration, and it does not belong in a transliteration module. Transliteration is the faithful letter-to-letter correspondence between writing systems, which is obviously not the process you and Wikitiki89 would like to see implemented in Module:th-translit. It therefore belongs elsewhere, i.e. in your Module:links. Wyang (talk) 14:49, 4 June 2016 (UTC)
Transliteration on Wiktionary is not the faithful letter-to-letter correspondence, and it never has been. Many languages have non-orthographic transliterations. Hindi, Chinese, Russian, just to name some. You cannot just unilaterally redefine what "transliteration" means on Wiktionary to suit your purposes, and then demand that everyone else accepts your edits to a generic module to work around it. It seems that this isn't a workaround for code, but a workaround for your own mental idea. —CodeCat 14:58, 4 June 2016 (UTC)
Well, there has never been a Module:zh-translit, because a Chinese-English transliteration system is not possible. Hindi and Russian have two sets of transliteration and pronunciation modules: Module:hi-translit vs Module:hi-IPA, and Module:ru-translit vs Module:ru-pron, with the former doing fairly strict transliteration and the latter IPA interpretation based on transcription. Thai also has two: Module:th-translit vs Module:th-pron. And yet you are suggesting that th-translit should take on the role of the latter. It was never my "own mental idea": it is the definition of transliteration, and it forms the basis for its distinction from "transcription", whether you are willing to accept it or not. Wyang (talk) 15:17, 4 June 2016 (UTC)
  • I do not think any module that is invoked in mainspace should EVER take content from the entry and parse it, because the entry can get arbitrarily large and this introduces a very difficult dependency. It is abusing Lua. As for code placement, it is about how you look at *-translit modules. CodeCat views them (and I share this view) as general transliteration modules which should work independently (i.e. not necessarily through Module:links). But, again, I disapprove of the parsing part. @Wyang, why don't you just pass them as arguments? --Dixtosa (talk) 13:41, 4 June 2016 (UTC)
  • I apparently disagree. There are huge benefits from the use of parsing, as Wiktionary's system is inherently cumbersome and unsuitable for building a dictionary without parsing. See {{zh-forms}} for an example. Wyang (talk) 13:52, 4 June 2016 (UTC)
    Wait, so you've done this for other languages too?? —CodeCat 13:58, 4 June 2016 (UTC)
    Ignorant. You must have been reading only European-language entries for the past year and a half. Wyang (talk) 14:03, 4 June 2016 (UTC)

Recap for us outsiders: Did I understand correctly that the way Thai editors handled the situation worked but was incompatible with some stuff CodeCat's robots do, so CodeCat changed it to make it comply with his/her robots, which in turn broke it for Thai editors? And now you can't decide which way to go because you do not agree on whether the module should scan the entire entry? Korn [kʰũːɘ̃n] (talk) 15:00, 4 June 2016 (UTC)

No this has nothing to do with bots. What happened, it seems, was that Wyang insisted that transliteration modules should only give letter-for-letter transliterations. But doing that would generate incorrect transliterations in many entries because Thai script is rather haphazard. So rather than adjusting their dogma - and the transliteration module - they instead made an edit to Module:links, a generic language-agnostic module, to entirely bypass the defective transliteration module. This code was noticed a few months later by me, and removed, then removed by Wikitiki89 again, then removed a whole lot more by me again. Wikitiki made edits to Module:th-translit and Module:th which fixed the transliterations after removing the Thai-specific code from Module:links had broken them. However, this seemed to go against Wyang's dogma that transliteration modules must transliterate letter-for-letter (even though they don't, and never have, on Wiktionary), so he reverted the edits and again reverted me when I tried to reinstate the fixes Wikitiki made. —CodeCat 15:05, 4 June 2016 (UTC)
The transliteration system for Thai was fully functional from its implementation in February until it was abruptly removed by User:CodeCat six days ago. A bit of investigation led to User:CodeCat's edits, which rendered basically all Thai transliterations on Wiktionary non-functional. Wyang (talk) 15:07, 4 June 2016 (UTC)
Did you miss the fact that Wikitiki fixed the problem, and you undid his edits? Your undoing broke the transliterations again, but rather than putting Wikitiki's edits back in, you insisted that Module:links be edited to fit your dogma. —CodeCat 15:10, 4 June 2016 (UTC)
  • You both claim that your way produces correct results and the system of the other breaks it. Can you each provide a specific example which works with your system and say how it gets broken by your opponent's method? Korn [kʰũːɘ̃n] (talk) 15:13, 4 June 2016 (UTC)
    Both methods work. However, I object to having extra code in Module:links that handles deficiencies in Module:th-translit, deficiencies that were readily remedied by Wikitiki. The issue seems to be that Wyang dislikes Wikitiki's remedies, but to undo them he has to reinstate the extra code that I object to. I think that problems with Module:th-translit ought to be fixed in that same module, as Wikitiki did, rather than introducing workarounds in another module that has nothing to do with Thai. —CodeCat 15:16, 4 June 2016 (UTC)

For many languages transliteration and transcription/pronunciation are very different concepts, and Thai is one of these languages. One can generate a transliterated outcome for a Thai word (Module:th-translit), but oftentimes this is different from the pronunciation. The core issue here is that Module:links provides no support for these non-phonetic languages, which is why I added the new functionality in the module. Such information does not belong to individual transliteration modules, as this is a widespread linguistic phenomenon and the addition would greatly benefit many non-European languages (for example, Chinese and Japanese). The lack of transcription support in the central linking templates/modules is exactly the reason these languages have been moving away from the standard linking templates, resulting in much confusion and repetition during editing. Wyang (talk) 15:43, 4 June 2016 (UTC)

That's irrelevant. Module:links needs no additional support; transliteration modules (in Wiktionary's sense of the word) are sufficient. If they are not, then you have to show why. So far you have failed to do so: Wikitiki's edits (which you reverted) proved you wrong by showing that the existing infrastructure can handle Thai perfectly well. Perhaps you don't want to be proven wrong? —CodeCat 16:31, 4 June 2016 (UTC)

I don't know what's going on. I am a native speaker, and I can only say that direct automatic transliteration of a Thai word can never be done, due to the complexity and uncertainty of the spelling. That's why we do it on basic syllables (which are more certain); this is also how it is taught in school. --Octahedron80 (talk) 15:23, 4 June 2016 (UTC)

Rather than this constant revert war that's going on: is it not possible to apply the code fixes in one single operation that will make the transliterations continue to work as they did before? Fixing only part of it, while leaving Thai users without useful content, seems like a problem. Equinox 16:39, 4 June 2016 (UTC)
As of right now, things work just fine. Wyang keeps reverting it. —CodeCat 16:41, 4 June 2016 (UTC)
Here's how it started (from User talk:Wyang):

I don't understand it either. Why do those edits change the transliterations, even though none is given in the entry? —CodeCat 20:49, 2 June 2016 (UTC)
Even after looking at the section above, I don't see what this edit does. In fact, it seems like it would break cases that have alt forms. --WikiTiki89 14:29, 3 June 2016 (UTC)
I've undone it until we can establish how the special treatment actually changes anything. —CodeCat 14:35, 3 June 2016 (UTC)
It looks like it, somehow, for some reason, changes the transliteration of พล (pon) between "pol" and "pon". But I have no idea why. I think the problem is with the Thai transliteration module here, not Module:links. —CodeCat 14:37, 3 June 2016 (UTC)
I think I fixed the problem with these edits. --WikiTiki89 18:57, 3 June 2016 (UTC)

This was part of a topic where it had been asked what the code was for, and everyone was waiting for Wyang's response. CodeCat acted without knowing what the consequences would be, without waiting to find out what the code was for. That was clearly wrong, and Wyang was understandably upset. Wikitiki89 helpfully came up with an alternative that seems to work.
This whole episode is painful to watch: we have two strong-minded people who have both done great things for the project, but are now butting heads instead of discussing rationally.
Wyang has a history of coming up with ingenious ways to make our system do things that no one would have thought possible. Our Chinese entries are infinitely better than they were, and they're getting better all the time. There are, however, times when the system gets brought to its knees, as at .
CodeCat is as responsible as anyone for the current template, module and category infrastructure that runs this site. This prodigious work ethic and expertise is, however, marred by a willingness to break things in order to force people to fix things she sees as wrong (case in point: Module:parameters). She also has a tendency to ramrod things through, which has created deep resentment in some quarters that has poisoned a number of discussions on unrelated issues.
On the one hand, we have Wyang, still furious about CodeCat's behavior and unwilling to allow anything that would let her get away with it. On the other hand, we have CodeCat, who has gone into Orwellian DoubleSpeak mode to shift attention from her initial, destructive action, portray Wyang as a dangerous loose cannon and portray herself as an innocent victim.
We need to get past all of that and look at the merits of how we want to structure this. Our architecture isn't set up to handle the use of respellings in transliteration, so Wyang came up with a kludge to work around this. At the moment, the debate seems to be over where to put the kludge, not on whether there's a better way to do this. My question is: can we come up with a way to get the respellings to the modules without having the modules swallow an entry whole and rummage through it to find them (please forgive the mixed metaphor)? Chuck Entz (talk) 20:05, 4 June 2016 (UTC)
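One way to answer the question above, along the lines of Dixtosa's suggestion to "pass them as arguments", is for the romanizer to accept an optional respelling supplied in the template call, so that it never reads the page. The sketch below is in Python rather than Lua, and everything in it is a hypothetical illustration: the toy letter table, the inherent-vowel rule (valid only for these two-consonant words) and the respelling พน are assumptions, not the real Module:th-translit or its data.

```python
from typing import Optional

# Toy letter table covering only the characters used below.
# Hypothetical: this is not Wiktionary's Thai romanization scheme.
TOY_LETTERS = {"พ": "p", "ล": "l", "น": "n"}


def letter_by_letter(text: str) -> str:
    """Toy romanizer: map each consonant and put the inherent vowel 'o'
    between consonants. Only valid for the two-consonant words below."""
    return "o".join(TOY_LETTERS[ch] for ch in text)


def romanize(word: str, respelling: Optional[str] = None) -> str:
    """Romanize `word`, or the respelling if the template call supplies one.
    Nothing is read from the page; the output depends only on the arguments."""
    source = respelling if respelling is not None else word
    return letter_by_letter(source)


print(romanize("พล"))                   # pol (spelling-based)
print(romanize("พล", respelling="พน"))  # pon (pronunciation-based)
```

Because the respelling travels with the template call, the result depends only on the arguments, which avoids the page-size and dependency concerns Dixtosa raises; the cost is that editors must supply the respelling wherever the word is linked.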

Wiktionary does not have a JSON-style dictionary system, which is why there is so much formatting nuisance with the use of different headers, headword templates, reduplicating etymologies, ectopic related terms and unsystematic pronunciation notations. Each word in a language should be defined by a JSON set, containing a series of qualities indicating the nature of the various parts of the text and how they are subordinated to one another. All the wikitext on a Wiktionary entry should be generated from scratch from that JSON set, using pre-defined formatting code which tells the entry how the original core information should be displayed. All the JSON information from entries should be made rapidly indexable to other entries, so that there is no need to repeatedly define what the pronunciation of another word in the etymology is, or what the meanings of that word are.

What Wiktionary has is a very different system. A system that tends to make people think about "what are we eating for tonight" rather than "how can we most efficiently make dinner for the next 20 years". You can create a magnificent, all-encompassing entry for a word in a language if you put into the entry everything that is known on Earth about the word, when in actual fact you should not have to do most of what you did, because it can already be found elsewhere in the dictionary and should have been "extracted" rather than "generated or provided de novo". Say you want to link to another word in your perfect entry. Then in the perfect entry on Wiktionary you would have to put in: (1) the word you wish to link to, (2) the transliteration/transcription of that word, (3) the definition of that word, and (4) qualifiers for the definitions (e.g. derogatory, obsolete), although points (2-4) have already been stored in your destination entry. Previously all of (2-4) would have had to be provided in the internal link. Things have improved in that point (2) is sometimes no longer necessary, as Module:links will attempt to generate the transliteration from a series of transliteration modules. This is a great leap forward, as we start to realise that some of what we previously wrote was not necessary at all. However, the source of that omitted information (i.e. the regenerable information) is misunderstood. It is not the transliteration modules that are ultimately the source of the regenerable information; rather it is the destination entry where the regenerable information is stored. For languages where transliteration approximates the transcription/transliteration system we use for that language fairly well, this is an acceptable and quite efficient way of regenerating the information, despite the non-zero failure rate (e.g. the link in коэффицие́нту (koefficijéntu) to коэффицие́нт (koefficijént)).
But for languages where transliteration approximates the transcription system we use very poorly (Thai etc.), or where transliteration is intrinsically impossible (Chinese, Japanese etc.), our hub of Module:links simply gives up, telling editors of these languages "sorry, there is nothing I can help you with here", when in fact it should have been set up to facilitate the extraction of the phonetic pronunciation in the destination entry. Languages lie on various parts of this transliteration–transcription continuum and it is outright inappropriate to call this process of phonetic extraction "transliteration" for languages that fall towards the transcription half of the continuum (Thai, Chinese, Japanese etc.), as that is an obvious oxymoron, and/or transcription vs transliteration are contrastive concepts for these languages. Mixing these two very different concepts or intentionally confusing them to achieve minimal effort could be very dangerous. Wyang (talk) 01:46, 5 June 2016 (UTC)
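The extraction model Wyang describes, in which a link pulls the romanization stored in the destination entry instead of regenerating it through a module, can be sketched roughly as follows. This is Python rather than Lua; the record layout, the `th:` key prefix and the `link` function are hypothetical, and the only data used are the values Wyang gives for พล (transliteration "pol", transcription "pon"). Wiktionary has no such store, which is the point of his complaint.

```python
import json

# Hypothetical JSON-style records, one per (language, word), as in the
# proposal above. Field names are illustrative; the values for พล are the
# ones given in the discussion.
ENTRY_STORE = json.loads("""
{
  "th:พล": {"translit": "pol", "transcription": "pon", "ipa": "/pʰon˧/"}
}
""")


def link(lang: str, target: str, field: str = "transcription") -> str:
    """Render a link, extracting the romanization from the destination
    entry's stored record instead of regenerating it with a module."""
    record = ENTRY_STORE.get(f"{lang}:{target}")
    if record is None or field not in record:
        # Nothing stored for this entry: show the bare word.
        return target
    return f"{target} ({record[field]})"


print(link("th", "พล"))              # พล (pon)
print(link("th", "พล", "translit"))  # พล (pol)
```

In this shape the linking code never guesses at pronunciation; it only looks up what the destination entry already records, and falls back to the bare word when nothing is stored.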

Word | Transliteration outcome | Transcription outcome | What should be returned if transliteration is desired | What should be returned if transcription is desired | What should be returned if IPA is desired
พล (Thai) | pol | pon | pol | pon | /pʰon˧/
십육 (Korean) | sib.yug | simnyuk | sib.yug | simnyuk | /ɕʰimɲjuk̚/
བརྒྱད (Tibetan) | brgyad | gyaew | brgyad | gyaew | /cɛʔ˩˧˨/
(Chinese) | none | dòu | nil | dòu | /toʊ̯˥˩/
(Japanese) | none | mizu | nil | mizu | /mizɯᵝ/
ရှည် (Burmese) | hrany | she | hrany | she | /ʃè/

vs

Word | Transliteration outcome | Transcription outcome | What should be returned if transliteration is desired | What should be returned if transcription is desired | What should be returned if IPA is desired
дли́нный (Russian) | dlínnyj | dlínnyj | dlínnyj | dlínnyj | /ˈdlʲinːɨj/
ტორტი (Georgian) | ṭorṭi | ṭorṭi | ṭorṭi | ṭorṭi | /tʼɔrtʼɪ/
κέντρον (Ancient Greek) | kéntron | kéntron | kéntron | kéntron | /kéntron/
^According to the table, transliteration of Thai would be useless and would cause problems with difficult words, such as เศรษฐศาสตร์ and รัฐธรรมนูญ. You could try to replace them letter by letter, but no one would understand the result. I prefer transcription. --Octahedron80 (talk) 09:46, 5 June 2016 (UTC)
On Wiktionary, the term "transliteration" encompasses transliteration, transcription and general romanization. It's just a historical accident that we call it "transliteration", but it's not transliteration in the strict sense. See Wiktionary:Transliteration and romanization. So it is not an argument that the module can only supply transliterations in the strict sense just because it's called a transliteration module. It's a romanization module, but it's called a transliteration module for historical reasons. —CodeCat 12:35, 5 June 2016 (UTC)
It's not just a historical accident. It is Eurocentrism in Wiktionary at its best. As a consequence of this historical confusion, the central system just assumes that all languages use transliteration as their romanisation method, and Module:links sends words of all languages indiscriminately to their transliteration modules to generate their romanisations. This leaves languages with both transliteration and transcription outcomes unsupported. Thai already has a functioning transliteration module (Module:th-translit), and in addition it also has a transcription module (Module:th). Module:links should relay the 'tr' parameter to the correct place so that it is truly language-agnostic, and this includes distinguishing between the transliterative and transcriptive modules used for a particular language and rendering support to languages that use a transcriptive method of romanisation. Wyang (talk) 12:54, 5 June 2016 (UTC)
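A rough sketch of the per-language routing described above, in which the linking code decides, per language, whether to call a transliterative or a transcriptive module. This is Python rather than Lua, and every name in it (`ROMANIZATION_KIND`, `MODULES`, `link_romanization`, and the two stand-in functions, which know only the word พล) is hypothetical; none of it is the real Module:links, Module:th-translit or Module:th.

```python
from typing import Callable, Dict, Optional, Tuple


def th_translit(word: str) -> str:
    # Hypothetical stand-in for a strict letter-for-letter module
    # (the role Wyang assigns to Module:th-translit). Knows only one word.
    return {"พล": "pol"}[word]


def th_transcribe(word: str) -> str:
    # Hypothetical stand-in for a pronunciation-based module
    # (the role Wyang assigns to Module:th). Knows only one word.
    return {"พล": "pon"}[word]


# Which kind of romanization each language shows after a link (assumption).
ROMANIZATION_KIND: Dict[str, str] = {"th": "transcription"}

# Registry of romanizers, keyed by (language code, kind).
MODULES: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("th", "transliteration"): th_translit,
    ("th", "transcription"): th_transcribe,
}


def link_romanization(lang: str, word: str, tr: Optional[str] = None) -> Optional[str]:
    """Return the romanization shown after a link, or None if there is none."""
    if tr is not None:
        # In this sketch a manual tr= value always takes precedence.
        return tr
    # Languages not listed default to transliteration.
    kind = ROMANIZATION_KIND.get(lang, "transliteration")
    module = MODULES.get((lang, kind))
    return module(word) if module is not None else None


print(link_romanization("th", "พล"))          # pon
print(link_romanization("th", "พล", tr="x"))  # x
```

Under the position CodeCat and Wikitiki89 take, the same output comes from registering the transcription function as Thai's only romanizer, which makes the `ROMANIZATION_KIND` table unnecessary; the disagreement is roughly over which of these two shapes the code should take, not over the romanization itself.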
The correct place for transcriptions is Module:th-translit, so there is no need for additional code. Are you suggesting that we set up an entirely separate system to deal with transcriptions as opposed to transliterations, and have separate transcription and transliteration modules for languages? What's the benefit? And if you are so passionate about it, why don't you start a vote to change the current practice of including transcription in transliteration, rather than edit warring over it for days? Right now you have yet to display any kind of consensus for your views. —CodeCat 13:00, 5 June 2016 (UTC)
Never mind, I've done it for you. —CodeCat 13:08, 5 June 2016 (UTC)
For whom? You ignored the points I raised above and therefore completely misunderstood what I was saying. Again I feel like I am talking to someone who did not care to read my comments at all. The answers to your questions are: No and no, and you would not have asked these questions if you had read my replies above. I'm not suggesting that we set up an entirely separate system to deal with transcriptions as opposed to transliterations, nor am I interested in having separate transcription and transliteration modules for any other languages that do not differentiate between the two concepts on a romanisation level. Likewise I am absolutely uninterested in changing the current practice of including transcription in transliteration. Wyang (talk) 13:19, 5 June 2016 (UTC)
Then why do you keep edit warring? All Wikitiki's edits did was change Module:th-translit to supply a transcription. You say you are fine with this practice, but your edits say otherwise. —CodeCat 13:22, 5 June 2016 (UTC)

Poll: Should there be separate systems for transcription and transliteration?

Currently, both orthographic transliteration and phonetic transcription are subsumed under the term "transliteration" on Wiktionary. Wyang seems to be arguing that we should use "transliteration" only for the strict definition, and have an entirely separate system for transcriptions, allowing them both to exist side by side. Presumably, links and headwords would also show both, if both are available. Do you agree with this change? —CodeCat 13:05, 5 June 2016 (UTC)

Support

Oppose

  1. Oppose. There is no need for separate systems, this overly complicates things without any obvious benefit. The point of transliteration modules currently is to supply a version of a word in the Latin alphabet, without regard to how closely it maps to the orthographic form of the original language. In other words, they are romanization modules, that are called transliteration modules by historical accident. I see no value in being pedantic about the meaning. If we want to display transliteration and transcription side by side whenever applicable, we should be able to demonstrate that users benefit from this information overload. —CodeCat 13:05, 5 June 2016 (UTC)
    What's the use of phonetic transliteration when we already have dedicated Pronunciation section?--Dixtosa (talk) 13:13, 5 June 2016 (UTC)
    The point here is that this involves a change in the status quo. If we want transliterations to be strict transliterations, then we have to change the practices of all languages whose transliteration is not a strict transliteration currently, and make changes to Wiktionary:Transliteration to reflect the new practice. Russian editors have strongly opposed this in the past. @Benwing, Atitarev. —CodeCat 13:19, 5 June 2016 (UTC)

This is the most stupid response I have ever seen. You would not have acted so bizarrely if you were more attentive and respectful, and this includes completely misconstruing my reasoning and thus creating this poll, and abusing your admin rights to block me. I would like to request to have your admin rights reviewed. Wyang (talk) 13:19, 5 June 2016 (UTC)

I blocked you because you keep making edits that have no consensus. This poll is an attempt to establish a consensus, but you continue to revert without awaiting the results of the poll. —CodeCat 13:23, 5 June 2016 (UTC)
Pot, meet kettle; kettle, pot. DCDuring TALK 13:42, 5 June 2016 (UTC)
Aha. Whatever. I'd rather not be compared with this crazy user. Next time I'll just redirect all the Thai complaints to her. Wyang (talk) 13:59, 5 June 2016 (UTC)
  • It seems the status quo ante of Module:links is what CodeCat reverted to, and that therefore CodeCat's edit should be reinstated, but probably not by CodeCat. Wyang should be prevented from reinstating his edits. Wikitiki's edits to Module:th and Module:th-translit should be reinstated and then we should see which Thai entries, if any, display a problem with transliteration or transcription. --Dan Polansky (talk) 20:47, 5 June 2016 (UTC)
Not really. Wyang added the code in February, and CodeCat removed it in May as part of an extensive rework of the module. Wyang was just restoring it under the assumption that it had been removed accidentally. Thai editors have been basing their edits for three months on the presence of that feature. Wyang made his June edit only because Thai editors were complaining about it not working any more. Chuck Entz (talk) 21:21, 5 June 2016 (UTC)
Wyang's February edit cannot be traced to a discussion showing consensus, AFAIK. The edit is now challenged. The status quo ante is the status before the challenged edit. Three months have elapsed between the edit and its challenge, probably because the challenging editor did not notice the edit earlier. Now as before, I propose that CodeCat and Wikitiki edits are reinstated, and that specific problems in Thai entries that are a result of that are clearly stated, including stating at least one Thai entry that has the problem. --Dan Polansky (talk) 21:32, 5 June 2016 (UTC)
The word "consensus" is thrown around here too much. —Aryamanarora (मुझसे बात करो) 23:55, 9 June 2016 (UTC)

Languages of Sweden

The fact that Elfdalian now has an official ISO-639 code reminds me that we have several pages, at least in the Reconstruction: namespace, on which the language names Westrobothnian, Jamtish, and Scanian are used. These languages have neither ISO-639 codes nor Wiktionary-specific ad-hoc codes. What do we want to do with them? Should we make ad-hoc codes (e.g. gmq-vas, gmq-jmk, and gmq-scy) for them? Shall we consider them Regional Swedish dialects? —Aɴɢʀ (talk) 13:13, 4 June 2016 (UTC)

I think Scanian is a dialect rather than a language, I'm not sure about the others. DonnanZ (talk) 13:17, 4 June 2016 (UTC)
I don't agree, at least not about the historical Scanian language. Indeed, Scanian has recently come under heavy influence from Standard Swedish, and most Scanians today speak the Scanian variety of Standard Swedish due to recent language standardization in Sweden. But genuine Scanian had its own grammar, sound developments, own vocabulary etc., differing considerably from Standard Swedish; see for yourself at [1]. The same applies to Jamtish and Westrobothnian. --87.63.114.210 13:27, 4 June 2016 (UTC)
In addition, we list Gutnish (in a similar situation) as a separate language, even though it has come under heavy influence from Standard Swedish and is slowly dying out; meanwhile, there are projects to revive it ([2]). Furthermore, Swedish Wiktionary uses gmq-bot for Westrobothnian. --87.63.114.210 13:37, 4 June 2016 (UTC)
gmq-bot is fine with me. I only suggested gmq-vas because Linguist List's ad hoc code is swe-vas. Are these three lects as different from Standard Swedish as Elfdalian and Gutnish are? If so, then I'm for giving them their own codes. —Aɴɢʀ (talk) 19:29, 4 June 2016 (UTC)
We've discussed this before. Can anyone come up with links to the previous discussions so we don't have to start from scratch? Chuck Entz (talk) 20:11, 4 June 2016 (UTC)
The discussion was at [3], but was left unresolved. --87.63.114.210 20:27, 4 June 2016 (UTC)
It looks like no one objected to giving all these languages their own codes; the discussion stalled over the truly trivial issue of whether or not to prefix the codes with gmq-. I don't care if we leave the prefix off, but I thought it would confuse the HTML if we did. —Aɴɢʀ (talk) 20:51, 4 June 2016 (UTC)
I've created gmq-bot, gmq-jmk, and gmq-scy, so entries can now be made for those languages, and links to them in the Reconstruction namespace can now use {{l}} instead of bare links. —Aɴɢʀ (talk) 12:35, 6 June 2016 (UTC)
P.S. I'm not touching Category:Scanian Swedish, because I'm not capable of saying what's Scanian language and what's Scanian dialect of Standard Swedish. I leave that for someone who knows these languages. —Aɴɢʀ (talk) 12:58, 6 June 2016 (UTC)
Great, thank you. I'll clean up the links. --87.63.114.210 18:09, 6 June 2016 (UTC)

Parameter in quotation templates for earliest attestation

Should we have a parameter in quotation templates for the earliest attestation that can be found? This is not the same as the earliest quotation that might be in the entry- this would specifically indicate that someone had searched for earlier quotations and found none. It would hopefully be a replacement for {{defdate}}, which I've always disliked since it gives no reference for its claim. It could also categorize by century or by a more granular period of time. DTLHS (talk) 21:36, 4 June 2016 (UTC)

what does eminant boot agreement mean

what does eminant boot agreement mean

For BOOT, see [4]. The "eminent" might have something to do with eminent domain...? Equinox 08:53, 5 June 2016 (UTC)

Proposal: Desysopping of User:CodeCat

Reason: Abuse of admin rights – misusing her admin power to block the other party of a personal dispute. Block log: [5]. Wyang (talk) 13:28, 5 June 2016 (UTC)

I blocked you to put an end to the continuous edits which forced Wyang's point of view without a consensus for that view. We block other editors for such behaviour, so why not Wyang? —CodeCat 13:29, 5 June 2016 (UTC)
Well your edit simply removed thousands of correct Thai transliterations on Wiktionary and caused uproar among our Thai editors, which is why it was reverted. Repeated removal of any one of those thousands of transliterations is sufficient to warrant a block. Wyang (talk) 13:31, 5 June 2016 (UTC)
No it didn't. The edits you've been edit warring on for the past day did not break any entry. Please demonstrate that Wikitiki's edits, which you continued to revert, broke or removed thousands of transliterations. —CodeCat 13:34, 5 June 2016 (UTC)
I have once again reapplied Wikitiki's edits. Please show an entry that is currently broken. —CodeCat 13:35, 5 June 2016 (UTC)
Why have you undone Wikitiki's edits yet again? There is no consensus for having transliteration and transcription separate. You should wait for the poll to finish. —CodeCat 13:37, 5 June 2016 (UTC)
I ask that Wikitiki's edits be restored until 1. it is established that a consensus exists for separating transliteration from transcription, or 2. it is established that Wikitiki's edits break anything. —CodeCat 13:39, 5 June 2016 (UTC)
Nor is it appropriate, nor does it have consensus. You seem to be in denial of your repeated vandalism – let me refresh your memory: diff, diff, diff, diff. These are the first four of your edits - did they remove useful content en masse? Wyang (talk) 13:40, 5 June 2016 (UTC)
Wikitiki also made that same edit diff, so should he also be blocked? Wikitiki in fact made additional edits to fix the problems caused by this edit, and you then reverted his edits too. —CodeCat 13:43, 5 June 2016 (UTC)
Circumventing the question huh? Did your edits repeatedly remove useful content en masse? Wyang (talk) 13:46, 5 June 2016 (UTC)
No, they did not, once Wikitiki had provided an appropriate fix. Which you then reverted. So again, please demonstrate that Wikitiki's trio of edits to Module:links, Module:th and Module:th-translit broke something, and that it is therefore warranted to desysop me for restoring those edits. You have yet to show even a single entry that was broken by it, yet you continue to revert these edits. —CodeCat 13:47, 5 June 2016 (UTC)
Go to the time points (1) 12:56, 4 June 2016; (2) 13:34, 4 June 2016; (3) 02:22, 4 June 2016 and (4) 01:01, 4 June 2016. Preview the page พลเรือน. Were the Thai romanisations there? Wyang (talk) 13:51, 5 June 2016 (UTC)
Please stop dodging the question. Did Wikitiki's trio of edits break any entries? Please restore his edits and then show us a broken entry. If you can't demonstrate that his edits broke an entry, how can you ask me to be desysopped for restoring them? —CodeCat 13:53, 5 June 2016 (UTC)
Looks like you are unable to answer my question. You did not restore his edits. You restored your edit, which wiped out thousands of Thai transliterations. Wyang (talk) 13:57, 5 June 2016 (UTC)
For the past day, you have been reverting those three edits Wikitiki made, one of which included the edit I also made. I have been trying to restore those edits because there is no consensus for your views and no evidence that those three edits break anything. —CodeCat 13:59, 5 June 2016 (UTC)

Your continued edit warring shows a severe lack of professionalism and responsibility. You are both perfectly aware that edit warring warrants an admin stepping in if the users can't get a hold of themselves. You both seem to be admins and to be abusing your positions to keep ranting where other users would long since have been shut up (read: prevented from editing the entry in question).
You each keep accusing the other of having no consensus, but your endless bickering makes it harder and harder for people to get an overview of the situation, and thus makes it more and more difficult for the community to actually reach a consensus. Please keep your hands still for a while so that the rest of the community, or at least those members who understand the technobabble, can actually debate this matter. Korn [kʰũːɘ̃n] (talk) 15:28, 5 June 2016 (UTC)

  • +1. I can't even figure out what the primary point of contention is. (I agree very strongly with Dixtosa's point above that no module invoked in the mainspace should ever take content from the entry and parse it, though. Seriously, the devs are going to regret ever giving us Lua if we go in that direction.) Can someone please explain the difference between transliteration and transcription, and where they're each used in entries? --Yair rand (talk) 20:38, 5 June 2016 (UTC)
Whether we want to allow modules invoked from the main namespace to parse other entries should be a separate discussion, if anyone wants to start it. I believe the Chinese modules extensively use this paradigm. DTLHS (talk) 21:45, 5 June 2016 (UTC)
The distinction that seems to be drawn by those who are drawing one is: transliteration takes a set of characters and renders them letter-for-letter into another script (in this case, the Latin script), whereas transcription renders the word itself into another script. The difference is that e.g. cannot be 'transliterated' per se, but it can be transcribed (as dòu, IPA: /toʊ̯˥˩/), and that if e.g. พล is transliterated, it is pol, but if it is transcribed, it is pon (in IPA, /pʰon˧/). In practice, the argument here seems to be (1) not over which of these systems should be used (since I haven't actually seen anyone suggest that พล should be rendered pol), but over which word should be used, and (2) not over whether or not a module should parse a page, but over which module should host the code. - -sche (discuss) 21:01, 5 June 2016 (UTC)

Module:links is protected so that only administrators can edit it; this prevents non-admins from editing or edit-warring over it, and it means the edit-war between admins User:CodeCat and User:Wyang is a wheel war. If the two of you continue to wheel-war, I will ask a bureaucrat such as User:Chuck Entz or a global 'crat to make emergency and hopefully temporary desysoppings to stop the war. - -sche (discuss) 21:14, 5 June 2016 (UTC)

I was already considering doing so, but I've been hoping they would start acting like adults without being forced to. Unfortunately, the action has been taking place while I've been offline (I do sleep, occasionally), so I'm left to wonder whether it's over or it's just waiting to flare up again when both are back online. Chuck Entz (talk) 21:36, 5 June 2016 (UTC)
  • A proposal for desysopping amounts to harassment, in my opinion. DonnanZ (talk) 22:04, 5 June 2016 (UTC)
    Preventing such proposals would seem to be creating an untouchable ruling elite... Equinox 22:21, 5 June 2016 (UTC)
Yes, but no one seems to be backing the proposal, so it's not the brightest of ideas, just a desperate measure. DonnanZ (talk) 22:37, 5 June 2016 (UTC)
  • Each party has suggested the other's desysopping (above and at [6]), and given that both parties are wheel-warring using admin tools/privileges, and that one blocked the other while edit-warring with him (as noted above), following both proposals and emergency-desysopping both may be in order if the warring continues. - -sche (discuss) 22:40, 5 June 2016 (UTC)
  • So blocking the other side of the argument is completely justified and one should not lodge a complaint after such abuse of rights? Ridiculous. Very disappointed in the Wiktionary community; seems to be a place for admin bullies who wilfully block others and maintain their modules without the slightest consideration of the consequences. Will greatly reduce the amount of time spent here. Considering quitting. Wyang (talk) 23:14, 5 June 2016 (UTC)
No one is excusing CodeCat's behavior, but de-sysopping is a very serious step, and one best not considered in the midst of a dispute, unless circumstances demand it. Chuck Entz (talk) 03:37, 6 June 2016 (UTC)

Thai Transliteration Debate Explained (I think)[edit]

This all revolves around what Latin text should be used to represent the letters of the Thai script when templates link to a Thai entry. The Thai script is mostly phonemic, but there are exceptions where the same letters can be read as different sounds, depending on the term. A true transliteration always represents the same letter or sequence of letters with the same Latin letter or sequence of letters, no matter how it's pronounced. A transcription represents the sounds of the text.

The transliteration can also be forced to be more like a transcription by using a respelling: a sequence of letters that can only be interpreted as the actual sounds of the term. That would be like spelling cathouse as "cat-houss" so the "th" doesn't get read as a digraph as it is in cathode, and the "se" doesn't get read as a "z" as it is in "rouse". The template {{th-pron}} is used in Thai entries to display pronunciations, and the input often has to be respelled to get the right results.

The module that does the linking (Module:links) will show a transliteration for a term in a non-Latin script if we pass it as text using the |tr= parameter. If there's no |tr= parameter, it next checks whether there's a transliteration module listed for the language in our language data modules. If there is, it gets the transliteration from that module. Perhaps I should put "transliteration" in quotes here, because we sometimes stray from transliteration to transcription when the sounds depart from the actual letters in odd or unexpected ways.
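The lookup order just described can be sketched in Lua, the language Wiktionary's modules are written in. This is a minimal, hypothetical illustration, not the real Module:links code: `language_data` and `translit_modules` are stand-ins for the language data modules and for modules that would really be loaded with `require`, and the Thai stand-in merely tags its input instead of transliterating it.

```lua
-- Hypothetical sketch of the |tr= / translit-module fallback order.
-- NOT the real Module:links code; all tables below are stand-ins.

-- Stand-in for the language data modules: code -> name of translit module.
local language_data = {
	th = { translit_module = "th-translit" },
	en = {}, -- Latin-script language, so no transliteration module is listed
}

-- Stand-in transliteration modules. On Wiktionary a real one would be
-- loaded with require("Module:" .. name); this one just tags its input.
local translit_modules = {
	["th-translit"] = {
		tr = function(text)
			return "auto(" .. text .. ")"
		end,
	},
}

local function get_translit(lang_code, term, manual_tr)
	-- 1. A non-empty |tr= parameter always wins.
	if manual_tr and manual_tr ~= "" then
		return manual_tr
	end
	-- 2. Otherwise, use the language's transliteration module, if one is listed.
	local data = language_data[lang_code]
	local module_name = data and data.translit_module
	local tr_module = module_name and translit_modules[module_name]
	if tr_module then
		return tr_module.tr(term)
	end
	-- 3. No transliteration is shown.
	return nil
end

print(get_translit("th", "พล", "pon")) --> pon
print(get_translit("th", "พล"))        --> auto(พล)
print(get_translit("en", "cat"))       --> nil
```

The point of the ordering is that a manual |tr= can always override whatever the module would produce, which is also why supplying transcriptions by hand (or by bot) remains possible regardless of what the Thai module does.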

Thai has a transliteration module listed (Module:th-translit), but this just calls the same module that {{th-pron}} uses (Module:th-pron), the one that requires respelling to work right.

What happened[edit]

Back in February, Wyang put code into Module:links that checked for Thai, then called a function in a different module from the one used for transliteration. This function basically checked whether there was an entry for the term and, if there was, looked in the source of the entry for the {{th-pron}} wikicode. If it found the template, it took the template's (respelled) parameters and substituted them for the actual spelling of the entry name, then called the same module that the transliteration module did. Whatever that module returned was returned in turn to Module:links (sorry), which used it instead of calling the regular transliteration module.
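A rough sketch of the parsing step, assuming the entry's wikitext has already been fetched (in Scribunto that would be something like `mw.title.new(term):getContent()`, which is the expensive part). This is not Wyang's code: the function name and its simplifications (first template only, no nested templates, named parameters ignored) are hypothetical, and the sample respelling is illustrative rather than copied from the real entry.

```lua
-- Hypothetical sketch: given an entry's raw wikitext, find the first
-- {{th-pron|...}} call and return its positional parameters (respellings).
-- Simplifications: no nested templates inside the call; any parameter
-- containing "=" is treated as named and skipped.
local function get_th_pron_respellings(wikitext)
	-- "%-" escapes the hyphen, which is a magic character in Lua patterns.
	local inner = wikitext:match("{{th%-pron|([^{}]*)}}")
	if not inner then
		return nil -- no template found: the caller would fall back to the title
	end
	local respellings = {}
	for part in (inner .. "|"):gmatch("([^|]*)|") do
		if part ~= "" and not part:find("=", 1, true) then
			respellings[#respellings + 1] = part
		end
	end
	return respellings
end

-- Illustrative input only; the respelling is not taken from the real entry.
local page = "==Thai==\n===Pronunciation===\n{{th-pron|พน-ละ-เรือน}}\n"
print(get_th_pron_respellings(page)[1]) --> พน-ละ-เรือน
```

The respellings returned here would then be fed to the same romanisation code the transliteration module uses, in place of the page title.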

Nobody but the Thai editors noticed this for 3 months, until, at the end of May, CodeCat reworked that part of the module and, in the process, removed Wyang's code, perhaps without realizing it had been there. Thai editors asked Wyang why the link transliterations weren't working right anymore, so he put his code back in to fix the problem.

This time, CodeCat noticed the code and couldn't immediately figure out what it did, so she left a message on Wyang's talk page. In the meantime, she reverted Wyang's edit. Soon after that, Wikitiki89 came up with a compromise that incorporated Wyang's code from Module:links into the Thai transliteration module.

When Wyang responded to the comments on his talk page 11 hours later, he explained his code and the rationale for it in detail, and expressed his annoyance at CodeCat's reverting his edit before finding out what it did.

Having explained himself, he went back and reverted CodeCat's revert to reinstate his edit.

CodeCat then responded by explaining on Wyang's talk page why she thought it was a bad idea to put custom code in Module:links, but then went on to say that the problem was all due to deficiencies in the transliteration module and to tell him that his code wouldn't be allowed back until she was convinced it was necessary. She then reverted his revert of her revert of his edit.

If you don't already have a headache from this, it gets worse. They then proceeded to revert-war back and forth, stopping every once in a while to argue and denounce each other angrily (see above). Then CodeCat blocked him for edit-warring, which accomplished nothing, since he immediately unblocked himself. Then Wyang called for CodeCat to be de-sysopped, and CodeCat called for Wyang to be de-sysopped.

The issues[edit]

Filtering out the misunderstandings and trash talk, here's what I see as the basic core arguments (my formulation, not theirs):

CodeCat
  1. A general-purpose, high-traffic module like Module:links shouldn't have special cases hardwired into it; language-specific code should go in the language-specific modules.
  2. The transliteration modules aren't just for transliteration; they can provide transcriptions, if that's what's right for the language.
Wyang
  1. Thai and other languages like it need special treatment, because they need transcriptions rather than transliterations.
  2. The version of the modules that CodeCat keeps reverting to isn't the same as his version.
Concerns from others
  1. Modules getting data from entries is a very bad idea.

My 2 cents[edit]

I agree more with Wyang's view of the events, but agree more with CodeCat on the substance.

CodeCat was wrong to revert Wyang's edit without knowing what it did. Her response to Wyang was too confrontational and demanding. Her poll wasn't really an accurate reflection of what Wyang was asking for, and the block did nothing but make things worse, much worse. On top of that, her characterization of the dispute is rife with spin and trash talk.

Of course, once the revert-war started, Wyang was a full partner in the mudfight, so I'm not giving him a pass, either.

I think the place to deal with Thai's peculiarities is in the Thai transliteration module, not in Module:links. Is there any module other than Module:links that gets the name of the transliteration module from our language data modules (in this case Module:languages/data2)? If not, we should take the function called by Wyang's code (Module:th.getTranslit) and use it as the basis for the transliteration module that Module:links calls (basically what Wikitiki89 did).

Except... I'm not qualified to say much about the concerns expressed over going to other entries to get data. After thinking about it, I can see why Wyang felt he needed to do it: most people linking to Thai entries know nothing about respelling, so it's unrealistic to require passing it as a parameter, and creating a data module with all the terms needing respelling would be a monumental and possibly fruitless task. Still, I think the module should eliminate as much as possible of the straightforward stuff before resorting to such tactics, in order to keep them to an absolute minimum.

Sorry for the encyclopedic length of this, but I wanted to make sure I didn't miss anything. Chuck Entz (talk) 04:17, 6 June 2016 (UTC)

This is a fairly good summary of the past events. Looking at the Thai frequency list, I think it is safe to say that more than half of the 4000 most commonly used Thai words require some phonetic respelling. This number will only go up if we consider the entire set of Thai words, meaning that relying only on the linked Thai title is quite hopeless for generating the correct transcription. So it boils down to whether to analyse the link destination to extract the correct pronunciation, or to make it compulsory to supply the romanisation every time. I'm highly biased towards the former, as I think page parsing is the best functionality on Wiktionary, and I imagine the native Thai editors would not be very welcoming of the latter either.
Regarding transliteration vs transcription, this is an issue that extends to many languages beyond Thai. Tibetan and Burmese are good examples that come to mind. I wrote Module:bo-translit (Tibetan) and Module:my-translit (Burmese) a while back, which form the backend for the Wiktionary transliterations of these two languages. The schemes used are the Wylie transliteration and MLCTS schemes respectively, both of which are transliteration schemes, and transliterated outputs of Tibetan and Burmese texts from these schemes have been used wherever the native script appears, whether it be in a Tibetan or Burmese language entry, in the etymologies of other languages or in translation sections.
The universal use of these transliteration schemes is confusing to many people unfamiliar with the languages, especially casual visitors to the site. Consequently, additional transcription modules should be developed for the two languages and used to generate the appropriate romanisation in some circumstances on Wiktionary. The most important circumstance in which transcriptions are desired is probably translation sections. At the moment, someone looking to say "eight" in Tibetan would be absolutely clueless on seeing the following result on the page eight:
བརྒྱད (brgyad)
Same with someone trying to say "long" in Burmese:
ရှည် (hrany)
The pronunciations of these two words are /cɛʔ˩˧˨/ (Transcription: gyaew) and /ʃè/ (Transcription: she), which a person reading the pages eight and long would not have guessed if they only stayed on those pages. For other circumstances, such as ordinary inter-entry linking, a transliteration method of romanisation is probably better (especially in etymologies), although the decision is to be made by all active editors. The realisation that romanisations used in translation sections should resemble the pronunciation as closely as possible is already present on Wiktionary. Compare the wikitext in the Russian translation of catheter:
{{t+|ru|кате́тер|m|tr=katɛ́tɛr}}
This is despite the fact that there is a Russian transliteration module on Wiktionary, which in this case would generate a correct transliteration but an incorrect transcription. On the whole, the distinction between transliteration and transcription in Western languages is very minor compared to languages of the East, for which Wiktionary currently provides no infrastructure for this distinction. This is how Module:languages/data2 appears currently:
m["tt"] = {
	canonicalName = "Tatar",
	scripts = {"Cyrl", "Latn", "Arab", "tt-Arab"},
	family = "trk-kip",
	translit_module = "tt-translit",
}
This works well with alphabetic languages. For many languages of the East, the section should be more detailed:
m["bo"] = {
	canonicalName = "Tibetan",
	scripts = {"Tibt"},
	family = "tbq",
	ancestors = {"xct"},
	translit_module = "bo-translit",
	transcript_module = "bo-...",
	transcript_in_links = false, --optional
	transcript_in_translations = true,
}
This is the reason I regarded this problem as a lack of support from the central modules, and did not consider changing Module:th-translit into a transcription module as an appropriate way to tackle this. Wyang (talk) 08:36, 6 June 2016 (UTC)
@Wyang: One thing I'm confused about, is if you are planning to use the transcription instead of the transliteration, why do you need a transliteration module? --WikiTiki89 18:21, 6 June 2016 (UTC)
Different languages have different uses of transliteration modules. For Thai, editors have agreed on the use of transcriptions in translation sections and in normal links, although transliteration may be the better option for romanising Thai terms in the etymologies of other languages, when the module calling Module:links is Module:etymology. For Tibetan and Burmese, transcription should be used in translations, whereas transliteration is the better mode of romanisation in generic links, as the script has a good one-to-one correspondence with the romanisation, which makes etymologies much more apparent. The modules should be kept and named accordingly for languages where the distinction is important on a romanisation level. Wyang (talk) 00:47, 7 June 2016 (UTC)
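Wyang's proposal amounts to a per-context choice between two modules. A minimal sketch, assuming the proposed (not existing) fields `transcript_module`, `transcript_in_links` and `transcript_in_translations` from the data2 example, plus a hypothetical `context` argument saying whether the caller is an ordinary link, a translation or Module:etymology. The transcription module names are hypothetical placeholders (the data2 example leaves the Tibetan one unspecified), and the Thai settings follow the usage described above.

```lua
-- Hypothetical sketch of the proposed data fields and the selection rule.
-- transcript_module / transcript_in_* are PROPOSED fields, not existing ones;
-- the *-transcript-hypothetical module names are placeholders.
local languages = {
	bo = {
		translit_module = "bo-translit",
		transcript_module = "bo-transcript-hypothetical",
		transcript_in_links = false,
		transcript_in_translations = true,
	},
	th = {
		translit_module = "th-translit",
		transcript_module = "th-transcript-hypothetical",
		transcript_in_links = true,
		transcript_in_translations = true,
	},
}

-- context: "link", "translation" or "etymology" (hypothetical labels).
local function pick_module(lang_code, context)
	local data = languages[lang_code]
	if not data then
		return nil
	end
	-- Etymologies always get the transliteration, per the suggestion above.
	if data.transcript_module and context ~= "etymology" then
		if (context == "translation" and data.transcript_in_translations)
			or (context == "link" and data.transcript_in_links) then
			return data.transcript_module
		end
	end
	return data.translit_module
end

print(pick_module("bo", "translation")) --> bo-transcript-hypothetical
print(pick_module("bo", "link"))        --> bo-translit
print(pick_module("th", "etymology"))   --> th-translit
```

Languages without a `transcript_module` (such as the Tatar entry shown earlier) would fall through to their existing `translit_module` unchanged.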
@Wyang: Ok, now I understand better what your intentions are. However, I don't think it's a good idea to use different transliteration/transcription systems in different places. This is something the Wiktionary community should agree on as a whole, and not just the Thai editors (and the Tibetan and Burmese editors). The other issue is that parsing a linked-to entry to determine the word's phonetic transcription is a really bad idea for a number of reasons that have already been pointed out in the above discussions. What would be wrong with manually supplying these transcriptions? You can even add the manual transcriptions with a bot, which is similar to what User:Benwing2 did for Russian accent marks. Changing the logic of Module:links is not the right solution to either of these problems. --WikiTiki89 14:21, 7 June 2016 (UTC)
From the experience with parsing in the past one and a half years, I would say that the associated harm is very minimal and benefits are extensive. This is somewhat similar to the case of the deletion of Template talk:str index (used in py-to-ipa then) that I contested about five years ago, well before the advent of this Lua system, and the difference is that the benefit-to-harm ratio in this case is even higher. People were not even that warm to the idea of automatic transliteration back then. The earliest and most important use of parsing is in {{zh-forms}}, and it has resulted in dramatic changes in the way that Chinese entries are formatted. Code is much more succinct, and as a consequence efficiency and productivity have exponentially increased (examples of use: 安眠藥, 暗物質, 報酬遞減定律).
Tools should only be used in situations where they must be. In the case of parsing for transcriptions, it is irrelevant to most of the languages hosted on Wiktionary and therefore to most editors on the site. Most people have, and will have, no experience with this. People tend to show aversion to the unfamiliar, and when the aversive mentality is voiced collectively by like-minded peers, the disinclination is irrationally amplified and may well convincingly mask the reality, which may only be visible to those centrally involved. (This may well underlie some political phenomena and explain the difficulty experienced with the Chinese entry format change here.) I would argue that new technology should be actively embraced and not feared (Wikipedia:Don't worry about performance). Likewise, transcription should be achieved automatically, and people/bots should not have been manually supplying the transcriptions, since the infrastructure is fully functional with no demonstrated risks. Even if there are risks, the focus should be on how to solve them, not on how to disable the feature.
With regard to the partial change to transcriptive romanisation, I argued for what I consider as appropriate for Tibetan and Burmese and would be happy to hear about other ideas. On a historical note, before the creation of Module:my-translit, most formatted Burmese entries were using the BGN/PCGN system for romanisation, which is a transcription system, and the change to a transliteration system (MLCTS) occurred due to the higher success rate of automation of the latter, which allowed a much wider coverage of romanisation for the Burmese content. It is a decision to be made by Burmese-language editors collectively, and people should have the freedom to choose a practice of romanisation that is most appropriate for the language, with modules using the two modes (transliteration and transcription) of romanisation for this language already recorded in the backend database, and infrastructure in place for determining which system should be used where. For instance, if Burmese uses transcription in links I would still suggest that any calls to Module:links by Module:etymology use the Burmese transliteration module to generate romanisations, as Burmese transcriptions are much less informative for this purpose. Wyang (talk) 08:53, 8 June 2016 (UTC)
You make some good points. I'll need to think about this for a bit. But also note that {{Wikipedia:Don't worry about performance}} does not apply here. The page states "You, as a user, should not worry about site performance. In most cases, there is little you can do to appreciably speed up or slow down the site's servers. The software is, on the whole, designed to prohibit users' actions from slowing it down much." But the concern is not slowing down site performance, but that since the site's performance is protected by time and memory limits, we have frequently seen on Wiktionary these limits being reached and producing errors. Thus, performance is still an issue, even though its consequences do not affect the site's performance overall. --WikiTiki89 14:40, 8 June 2016 (UTC)
So, what happens now? Can we please get rid of the Thai code from Module:links now, or do we need some more edit warring? —CodeCat 12:06, 11 June 2016 (UTC)
Do you have any constructive suggestions? DCDuring TALK 14:25, 11 June 2016 (UTC)
Reinstate Wikitiki's original 3 edits and be done with it. —CodeCat 15:30, 11 June 2016 (UTC)
I note that Wikitiki's comment of three days ago made it seem that he hadn't come to that final conclusion. DCDuring TALK 00:21, 12 June 2016 (UTC)
  • User:Chuck Entz has described the situation very well. User:Wyang has created working code for Thai transliterations/transcriptions and character sequencing. It is another commendable achievement of his. Few people have attempted to work with scripts as complex as Thai. The majority of developers think that Thai is simply not transliteratable, even with phonetic respelling. User:CodeCat has broken the code for the reasons she mentioned. So, the Thai transliteration modules stopped working and no alternative was offered. Thai editors were left wondering what was going on. User:Wikitiki89 has provided a workaround (later). I don't really know if it's a good fix. It should, of course, be considered, but Wikitiki89 is not sure himself. There could be many other solutions, but breaking existing code without really offering a working solution is wrong. It seems CodeCat simply doesn't care about thousands of Thai entries, translations, editors and the tremendous work put into this. I fully understand Wyang's frustration. I hope this conflict will be resolved peacefully. I don't want anyone desysopped, but I encourage more consideration of other people's work. I'll leave the final technical solution to the people who understand it better. I don't see a huge reason for Module:links not to take on some of the work (language-specific customisations) and/or accommodate the handling of complex scripts with various levels of possible transliteration/transcription. For example, we capitalise transliterations of Korean proper nouns with a symbol "^" using the module.
  • As for the transliteration/transcription for Thai: a graphical (literal) transliteration of the Thai script is not used anywhere, and no Thai dictionary uses non-phonetic transliteration. It would produce nonsensical garbage, even for many words with regular or predictable spellings, just as many English words would if they were transliterated graphically into another script, e.g. "light" (l-i-g-h-t) - Cyrillic лигхт (ligxt). A phonetic Thai transliteration is not only popular but also standard. There are various Thai transliteration standards, but none of them is graphical (showing the sequence of symbols). A graphical spelling can also be provided: please see กรรเชียง (gan-chiiang), which shows the actual orthography (including the phonetic respelling of the term, "กัน-เชียง"). The one adopted here is based on Paiboon, a publisher of dictionaries, phrasebooks and textbooks. The Royal Thai General System of Transcription is also phonetic but not very useful for learners: no tones, no long vowels, etc. --Anatoli T. (обсудить/вклад) 04:27, 14 June 2016 (UTC)
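To make the "light" comparison concrete: a graphical transliteration maps letters one by one, while a phonetic rendering needs per-word knowledge, which is the role respellings play for Thai. This toy Lua sketch maps only the letters needed for the example, and the one-entry lookup table standing in for per-word knowledge is hypothetical.

```lua
-- A deliberately naive "graphical" transliteration: one Cyrillic letter per
-- Latin letter, ignoring pronunciation. Only the letters of "light" are mapped.
local letter_map = { l = "л", i = "и", g = "г", h = "х", t = "т" }

local function graphical_translit(word)
	local out = {}
	for ch in word:gmatch(".") do -- byte-wise iteration; fine for ASCII input
		out[#out + 1] = letter_map[ch] or ch
	end
	return table.concat(out)
end

-- A phonetic rendering cannot be derived letter by letter; it needs
-- per-word knowledge, modelled here as a (hypothetical) lookup table,
-- the analogue of a respelling.
local phonetic = { light = "лайт" }

local function phonetic_translit(word)
	return phonetic[word] or graphical_translit(word)
end

print(graphical_translit("light")) --> лигхт
print(phonetic_translit("light"))  --> лайт
```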

Google Scholar[edit]

Can we use Google Scholar for attestation? --Daniel Carrero (talk) 05:16, 7 June 2016 (UTC)

We can use Google Scholar to locate permanently archived journal articles, so I'd say yes. —Aɴɢʀ (talk) 07:28, 7 June 2016 (UTC)
We have traditionally counted it at RFV. —Μετάknowledgediscuss/deeds 08:13, 7 June 2016 (UTC)

Case order in German declension tables (others too probably)[edit]

German declension tables are vertically split by case. The cases are ordered nominative, genitive, dative, accusative. This makes no sense to me! It would be better if it was nom, acc, dat, gen:

  1. Conceptually, nominative and accusative are the most fundamental, and then dative is a variation on accusative. Genitive is then its own thing.
  2. The forms of practically everything (articles, adjective declension, etc.) tend to match in either nom+acc or acc+dat, and sometimes dat+gen. This ordering would place them next to each other.
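Point 2 can be checked for at least one paradigm. This sketch counts vertically adjacent cells with identical forms in the German definite article under the current order (NGDA) and the proposed one (NADG). It covers only the definite article, so it illustrates the argument rather than proving it for every table.

```lua
-- German definite article forms, keyed by gender/number column, then case.
local article = {
	m  = { nom = "der", gen = "des", dat = "dem", acc = "den" },
	f  = { nom = "die", gen = "der", dat = "der", acc = "die" },
	n  = { nom = "das", gen = "des", dat = "dem", acc = "das" },
	pl = { nom = "die", gen = "der", dat = "den", acc = "die" },
}

-- Count vertically adjacent cells holding the same form for a given row order.
local function count_adjacent_matches(order)
	local matches = 0
	for _, column in pairs(article) do
		for i = 1, #order - 1 do
			if column[order[i]] == column[order[i + 1]] then
				matches = matches + 1
			end
		end
	end
	return matches
end

print(count_adjacent_matches({ "nom", "gen", "dat", "acc" })) --> 1
print(count_adjacent_matches({ "nom", "acc", "dat", "gen" })) --> 4
```

Under NGDA only the feminine genitive/dative pair (der/der) is adjacent; under NADG the nominative/accusative pairs of the feminine, neuter and plural plus the feminine dative/genitive pair are, which is what would allow merged cells in a compact table.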

A similar but more minor thing occurs with gender: it's ordered MFN, when usually the masculine and neuter forms are more similar, or sometimes F+N, but rarely M+F.

Why is it in this order? Would people support it being changed? Issues with this I'm imagining:

  1. There's some (stupid!) tradition that it's written in this order.
  2. It'd have to be changed across all languages or none.

This is how it would look the way I'm suggesting.

Fedjmike (talk) 07:44, 7 June 2016 (UTC)

You seem to think traditions are stupid. We have to cleave closely to traditions to be taken seriously as a scholarly work. Admittedly, some German grammarians do have a different order, but I would say that the one we use is probably the most traditional. Changing things up because you like them better is not a convincing argument. —Μετάknowledgediscuss/deeds 08:12, 7 June 2016 (UTC)
Yeah, guilty as charged wrt tradition. But I'm not saying change it because I don't like it, I gave what I think are good reasons for that order. Which sources use the current order, and why? I'd like to at least read about it and understand why they use it. I'm not sure I understand your argument about needing to match tradition; whose approval is Wiktionary trying to get, and why would it matter to them if it were to use a less conventional ordering of cases in tables? Fedjmike (talk) 08:43, 7 June 2016 (UTC)
Switching to nom-acc-dat-gen order has been proposed before a number of times. I am in favor of it. - -sche (discuss) 08:29, 7 June 2016 (UTC)
As am I. Leasnam (talk) 00:02, 10 June 2016 (UTC)
I don't really care which order the cases are in as long as nominative is first, but the advantage to sticking with tradition is that it's what readers will expect. I would be thrown off by an adjective declension table that put the gender columns in the order masculine-neuter-feminine, because over the years I have come to always expect masculine-feminine-neuter, and not just for German but for all languages with those three genders. I have no doubt we would get a lot more complaints about a declension table that put neuter between masculine and feminine than we get about the current order. —Aɴɢʀ (talk) 09:10, 7 June 2016 (UTC)
  • As someone who favours monolithic integrated tables over clear but repetitive tables, I'm also in favour of ordering the tables so that the number of cells is as small as possible. As such I'm giving strong support for NADG and having n/m and f/p next to each other. Korn [kʰũːɘ̃n] (talk) 09:18, 7 June 2016 (UTC)
  • Wikipedia uses the order NAGD (en and de, as well as fr.wikt). However de.wikt uses NGDA, and fr.wikipedia NADG. I am personally more familiar with NADG (I learned German in a French school). All that to say that the order of German declension seems to be far from being cast in stone, so we may as well choose the one that makes the most sense to learn the language. — Dakdada 11:11, 7 June 2016 (UTC)
  • FWIW, my German learning books mostly use NADG (presumably since that's the order that learners come across them). It depends whether we want to go for the scholarly one or the German-as-a-second-language one. Smurrayinchester (talk) 14:04, 7 June 2016 (UTC)
A while ago I proposed using NADG. This is what I find in my German books, and it definitely makes the most sense to me. The NGDA order is only used in imitation of Latin. Perhaps this should be voted on. Benwing2 (talk) 01:28, 8 June 2016 (UTC)
For Slovene, the common order is also NGDA but we use NAGD here. For old Germanic languages we seem to use NAGD order, while for modern Icelandic and Faroese we use NADG. I personally find NGDA order to be really annoying and counterintuitive (given that nominative and accusative are the most common cases and often identical) and would favour abandoning it for all IE languages, Latin included. —CodeCat 12:27, 8 June 2016 (UTC)

What's the difference between a journal and a magazine?[edit]

We have both {{quote-journal}} and {{quote-magazine}}, with identical parameters. Could we combine these into {{quote-periodical}}? Is there a reason to distinguish journals and magazines, and if so what criteria could be used? DTLHS (talk) 23:59, 8 June 2016 (UTC)

Hmm. To me, a magazine is usually a mainstream popular publication you can find in shops, while a journal (unless we're talking about a personal diary) is usually an academic thing that gets published in volumes and issues. If you look at the APA academic style for citing the two things, there isn't much difference apart from the fact that journals come out in volumes and issues. They don't even require the publisher and city for either of them, despite requiring it for books. Equinox 00:07, 9 June 2016 (UTC)
Periodicals: agreed that the difference is mostly popular perception, and occasionally a title will cross over, such as National Geographic, which is certainly scholarly but also available in popular locations such as bookstores and dentists' offices. There's no particular reason to have separate templates, and certainly many popular magazines have "volumes" and "issues" within those volumes. I agree with rolling them into one and having the other two templates redirect to it. —Justin (koavf)TCM 00:12, 9 June 2016 (UTC)
Okay. Magazines are more a subset of journal than vice versa (I think?), so shall we propose that we keep quote-journal (with volume/issue optional, since some magazines only have a month&year) and drop quote-magazine as redundant? Equinox 00:21, 9 June 2016 (UTC)
An even better idea: call it quote-periodical because "magazines are journals" is open to some debate but "magazines and journals are both periodicals" is not. Equinox 00:22, 9 June 2016 (UTC)
@Smuconlaw Do you have any input here? DTLHS (talk) 02:30, 10 June 2016 (UTC)
Actually, the primary template is {{quote-journal}}; {{quote-magazine}} and {{quote-news}} are just redirects to it. I suppose we used "quote-journal" by analogy to "cite journal" at the English Wikipedia. (According to the OED, a journal is "[a] daily newspaper or other publication; hence, by extension, Any periodical publication containing news or dealing with matters of current interest in any particular sphere", while a magazine is "[a] periodical publication containing articles by various writers; esp. one with stories, articles on general subjects, etc., and illustrated with pictures, or a similar publication prepared for a special-interest readership". A usage note adds: "The use of the word (rather than periodical) typically indicates that the intended audience is not specifically academic.") — SMUconlaw (talk) 02:42, 10 June 2016 (UTC)
Periodical seems the most generic of the candidates and therefore seems the least confusing for new users. But the redirects solve most practical problems. It is only when reading documentation that a user is likely to notice what the "real" template is. DCDuring TALK 10:42, 10 June 2016 (UTC)
I should also add that the template accepts the parameters |journal=, |magazine=, |newspaper=, |periodical= and |work=. — SMUconlaw (talk) 17:39, 10 June 2016 (UTC)
That's handy. But users might expect there to be a parallel in name between the template they want and {{quote-book}}. It wouldn't much inconvenience us to have a few redirects to {{periodical}}, would it? DCDuring TALK 17:59, 10 June 2016 (UTC)
We could create {{quote-periodical}} as a redirect to {{quote-journal}}. It may be a good idea to retain {{quote-journal}} as the primary template for consistency with other Wikimedia projects, as I suspect that many editors work on multiple projects. — SMUconlaw (talk) 00:04, 11 June 2016 (UTC)

{{hu-verb}} - no links in multi-word entries[edit]

Even though {{hu-verb}} is connected to {{head}}, it does not create links for each member of a multi-word entry. I can't figure out why. Can someone please help? It contains only a single line. Thanks. --Panda10 (talk) 12:47, 9 June 2016 (UTC)

Pagename is automatically treated as the argument in |head=. It should be fixed now. Wyang (talk) 12:58, 9 June 2016 (UTC)
Thanks so much! :) --Panda10 (talk) 13:04, 9 June 2016 (UTC)

Phrasebook vs. phrases categories[edit]

Is there a way to place phrasebook expressions/sentences only in the phrasebook category and remove them from the phrases category? In the past, I tried to solve this by using {{head|hu|phrasebook}}, but that was changed by a bot to {{head|hu|phrase}}, so it seems this is not accepted. Is there another way? The phrases category is cluttered up with sentences that really belong to the phrasebook only. Thanks. --Panda10 (talk) 15:57, 9 June 2016 (UTC)

Actually, Category:English phrases has 1,776 entries and Category:English phrasebook has 358 entries. Removing all phrasebook entries from the phrases category would mean a change of about 20%. Just my opinion: I don't think the phrases category is too cluttered by phrasebook entries, and I don't think the improvement from that 20% change would justify the work involved.
If we had some sort of distinction between "phrases" and "phrasebook", a few examples like how are you and good night would still fit both categories; and hello is both an interjection and part of the phrasebook. (Currently, the POS header of good morning is Interjection, that of good night and good afternoon is Phrase, and that of good evening is Noun.) --Daniel Carrero (talk) 16:20, 9 June 2016 (UTC)
I see your point. However, the percentage will be different for every language. Also, the 20% for English is true today, but may change in the future. I would still be interested to find out if there is a way to do this within the policies of this wiki. --Panda10 (talk) 16:45, 9 June 2016 (UTC)
More to the point, {{head}} and other headword templates categorise entries by part of speech. "Phrasebook" is certainly not a part of speech. —CodeCat 16:46, 9 June 2016 (UTC)
@CodeCat: Are you saying that phrase is a part of speech? --Panda10 (talk) 11:54, 10 June 2016 (UTC)
Many of our multi-word expressions are not phrases and not constituents. They are sometimes designed to simply be the target of a long list of redirects or to appear at the top of a no-entry search list. Because we never have an explicit "not elsewhere classified/categorized" category, inevitably some category or categories become the junk-catching category. In English grammar, "adverb" has long been one such. For us, "interjection" and "phrase" serve similar functions.
"Interjection" is a misnomer as we apply it. How does hello fit our definition of interjection in most of its normal uses? Collins uses "sentence substitute" (read "prosentence" if you'd prefer a technical word) for hello for example.
"Phrase" would benefit from a similar kind of split into one or more categories, though "phrasebook" is not any kind of grammatical category and would probably not be part of a long-term solution. DCDuring TALK 22:44, 9 June 2016 (UTC)

bot status vote 2[edit]

Planned, running, and recent votes
Ends | Title | Status/Votes
Jul 19 | label → lb | passed
Jul 22 | User:Smuconlaw for admin | passed
Jul 28 | CFI: List of terms | support 4, oppose 6, abstain 0
Aug 1 | Tohru for deadmin | support 5, oppose 0, abstain 1
Aug 13 | Editing "Flexibility" | support 4, oppose 1, abstain 0
Aug 20 | Adding PIE root box | support 0, oppose 7, abstain 0
Aug 23 | Pronunciation 2 | support 2, oppose 1, abstain 0
Sep 21 | Using template l to link to English entries | support 1, oppose 4, abstain 0
Sep 28 | Using template l to link to entries | starts: Jul 31

Some are asking that the vote on User:RobokoBot be closed out. —Stephen (Talk) 12:47, 10 June 2016 (UTC)

Done --Daniel Carrero (talk) 15:56, 11 June 2016 (UTC)

Category:Form-of templates[edit]

It says that the templates are used in the definition line, but some of the templates can be (and sometimes can ONLY be) used in the etymology section. This can happen when the word in the etymology does not mean the same as the derived word, which makes a link to it useless as a definition. Am I right? If so, I guess we need to add a new parameter to all of these templates to tell them whether they are being used in the etymology section. If this change is implemented, the templates should also allow for the source word being in a different language (as {{compound}} does). For instance, English lb is an abbreviation of Latin libra. --Dixtosa (talk) 13:43, 11 June 2016 (UTC)

Just found that there are already two versions of the templates for Russian: {{ru-etym abbrev of}}, {{ru-etym acronym of}}, {{ru-etym initialism of}}; and {{ru-abbrev of}}, {{ru-acronym of}}, {{ru-clipping of}}, {{ru-initialism of}}. --Giorgi Eufshi (talk) 06:30, 22 June 2016 (UTC)

Conversation about origin of words[edit]

Suppose someone says in conversation: "You're using that word wrong. When the word was first used, the meaning was different." Or: "That word came from (Ancient) Greek and the Greeks used it to mean something else. Therefore we should all use the original (?) meaning invented by the Greeks." What would you say to that person? --Daniel Carrero (talk) 22:24, 11 June 2016 (UTC)

Language changes. I would find some examples where the person, not being a linguist at all, didn't know about the change ("sad" is a good example), and ask them why they are not using the word in its original sense; or why they speak English at all, when it isn't the oldest language ever invented. Equinox 22:26, 11 June 2016 (UTC)
That they are falling for the w:etymological fallacy. Enosh (talk) 15:09, 12 June 2016 (UTC)
That sounds like a perfect Quora question ("perfect" in the sense that it is perfectly characteristic of Quora). --Dixtosa (talk) 15:19, 12 June 2016 (UTC)
Folk etymology: For that matter, a lot of folk etymologies are just wrong. You could ask the person, "If it turns out that the actual original meaning is [X] instead of [Y], then would you change your behavior...?" and the answer is no. —Justin (koavf)TCM 22:55, 12 June 2016 (UTC)

t:cite-meta author format[edit]

While discussing the new template {{R:M&A}}, I noticed the strange fact that we use semicolons to delimit the authors in {{cite-meta}} as opposed to the traditional “A, B, & C”. To that end I created a template {{format list}} which will take parameters and write them out in the normal list format (yes, it could be done more elegantly in Lua, but I wanted to conserve Lua runtime and memory for more important stuff). User:Smuconlaw and User:Isomorphyc then pointed out that there might be some concerns about changing the citation format, so I thought I'd ask here. —JohnC5 02:42, 14 June 2016 (UTC)

I cannot think of a conventional reason to use semicolons rather than commas; I do favour the change suggested by User:JohnC5, though since the template is very widely used, I thought it would be reasonable to ask around a bit first. Isomorphyc (talk) 02:56, 14 June 2016 (UTC)
@Smuconlaw, Isomorphyc Any further thoughts on this? I'm still in favor of making this change. —JohnC5 20:33, 1 July 2016 (UTC)
@JohnC5 I am too, and I think nobody has objected because there is no sensible objection. Isomorphyc (talk) 20:45, 1 July 2016 (UTC)
Perhaps we can consider the following scenarios, and decide how they are best set out:
  • A list of co-authors: "John Doe; Mary Doe; Richard Roe" (example 1A) or "John Doe, Mary Doe, Richard Roe" (1B).
  • An author and a translator: "John Doe; Mary Doe, transl." (2A) or "John Doe, Mary Doe, transl." (2B) (perhaps the latter would need to be changed to "John Doe, Mary Doe (transl.)").
(Please add additional examples if you can think of any.) I don't think there is any issue with either example 1A or 1B. Because we currently use semicolons, example 2A works well, but we are likely to encounter a problem with example 2B if we switch to commas. If we do so, I wonder if we can make corrections by bot? — SMUconlaw (talk) 20:52, 1 July 2016 (UTC)
Either option works quite well. The main issue is that they don't represent the citation convention. While it may be odd and sometimes ambiguous, that's how the system works. That's why I'm bringing this up. —JohnC5 21:07, 1 July 2016 (UTC)
I'm not sure both 2A and 2B work equally well. Example 2B is ambiguous: let's say we have a list of three names, like this: "John Doe, Mary Doe, Richard Roe, transl.". Is Richard Roe the translator, or are both Mary Doe and Richard Roe translators, or (probably unlikely) are all three of them translators? I suppose I am saying that I do not mind a switch to commas, but if we do so I think we will also need to advise editors to start using parentheses for descriptors of that sort (i.e., "(transl.)") as well as to arrange for a bot to update current uses. To maintain consistency in citations, ", editor" will also need to be changed to "(editor)". — SMUconlaw (talk) 21:35, 1 July 2016 (UTC)
I would also add |translator= to enforce such style conventions. —JohnC5 21:46, 1 July 2016 (UTC)
I suppose I could ... just wondering whether it's a good idea to add so many different parameters. Anyway, do we need more views before switching to commas? — SMUconlaw (talk) 12:55, 2 July 2016 (UTC)

Another issue has occurred to me: what if editors use the "last name, first name" format? For example: "Doe, John; Doe, Jane; Roe, Richard" (3A) vs. "Doe, John, Doe, Jane, Roe, Richard" (3B). I think you will agree that example 3B is unacceptable, which means we will have to retain semicolons for that format. Switching to commas will thus require this part of the template to be substantially rewritten. — SMUconlaw (talk) 11:43, 3 July 2016 (UTC)

@Smuconlaw: To be honest, I've been curious why you hadn't brought up 3B this entire time (I thought it would be the main point of contention). The fact is that's precisely how it normally works (except with an ampersand). This, I think, is often mitigated through abbreviation or omission of the first names (as seen in this APA documentation). I could whip up a template that would convert names into abbreviations. The main issue remains that no system uses the semicolon. I like the semicolon personally and think it disambiguates the authors nicely; it just doesn't represent any format I've ever seen. Our decision at the moment seems to be between making an unambiguous format that only we use or following a standard that could be ambiguous. I tend to lean towards the latter, even if it means adding some extra logic to lessen ambiguity where possible. —JohnC5 15:16, 3 July 2016 (UTC)
It only just occurred to me, as I don't use the "last name, first name" format. OK, so what do we do now?
  • Wait to see if other editors have comments?
  • Go ahead and switch to commas for all cases?
  • Switch to commas for all cases except the "last name, first name" format?
SMUconlaw (talk) 09:36, 4 July 2016 (UTC)
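The trade-off between the options above can be illustrated with a small Python sketch. It is illustrative only: the function names are hypothetical, and this is not how {{format list}} or {{cite-meta}} are actually implemented. Joining two names as "A & B" is an assumption, since only the "A, B, & C" pattern was mentioned. format_authors follows the third option, keeping semicolons whenever a name is itself in "last name, first name" form.

```python
def format_semicolon(authors):
    """Join names with semicolons, as {{cite-meta}} currently does."""
    return "; ".join(authors)


def format_standard(authors):
    """Join names in the conventional "A, B, & C" style."""
    if not authors:
        return ""
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        # Assumption: two names are joined with a bare ampersand.
        return f"{authors[0]} & {authors[1]}"
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_authors(authors):
    """Option 3: commas everywhere except for "last name, first name" lists."""
    if any("," in name for name in authors):
        # A comma inside a name would make a comma-separated list ambiguous (example 3B).
        return format_semicolon(authors)
    return format_standard(authors)


print(format_authors(["John Doe", "Mary Doe", "Richard Roe"]))
# prints: John Doe, Mary Doe, & Richard Roe
print(format_authors(["Doe, John", "Doe, Jane", "Roe, Richard"]))
# prints: Doe, John; Doe, Jane; Roe, Richard
```

The check on commas inside names is the "extra logic to lessen ambiguity" in its simplest form; abbreviating first names instead would be another way to keep commas throughout.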

Category:English merisms and Category:English lexical doublets[edit]

What is the difference between these two categories? Do we need both? Is the definition of merism "a pair of contrasting words" in linguistics? --Panda10 (talk) 13:05, 14 June 2016 (UTC)

Merisms seem, on the face of it, to be a subset of lexical doublets, just as reduplicated doublets are. DCDuring TALK 15:15, 14 June 2016 (UTC)

Wikimania 2016[edit]

Slides of the talk about Wiktionary at Wikimania.

Hi, English-speaking Wiktionarians!

Wikimania is an annual meeting to discuss global issues in the Wikiverse. This year Wikimania takes place in Italy, June 22 to 26, and the programme is here. Three nice French contributors plan to be there to talk about Wiktionary! Yes, our little-known project by non-English speakers. Is it not intriguing?

We already mentioned our proposal here in January 2016, and we are now in the process of organizing our slides. We are not ready yet, but we want to make the building of the talk as collaborative as our projects. So feel free to have a look at it and point out any language mistakes or any part you would like more detail on.

Also, we want to meet you guys! So, if some of you come to Wikimania, please come to our talk or to a meetup later in the evening! Come and have a glass of Italian wine with us and discuss our amazing projects! If you plan to travel to France, for instance to see a football game, tell us, we'll be glad to host you! Noé (talk) 13:55, 14 June 2016 (UTC)

Hi! I have updated the slides with the version we presented during Wikimania today. If you have questions, feel free to ping us or to come and visit our Wiktionary. Noé (talk) 15:37, 25 June 2016 (UTC)

When the quotations are in phonetic transcription[edit]

Do we have any established customs regarding what to do with quotations that are written in a phonetic transcription rather than the usual orthography of the language in question? I have a book of Burmese proverbs that writes all the Burmese in transcription, not in Burmese orthography; likewise Die araner mundart has lots of usage examples for Irish written in phonetic transcription rather than conventional orthography. So far, I've just been putting these things in conventional orthography, but that goes against our usual custom of transcribing quotations exactly as they're written in the source. —Aɴɢʀ (talk) 17:35, 14 June 2016 (UTC)

The ideal would be to find a native Burmese/Irish source written in the native orthography and quote from it. --WikiTiki89 17:53, 14 June 2016 (UTC)
I do do that when possible for Irish, but when I'm working through Die araner mundart to make sure we have entries for all the words it lists, it's easier to give the same examples. Also, it's a good source for unalloyed dialectal Irish rather than standard "school" Irish. —Aɴɢʀ (talk) 18:12, 14 June 2016 (UTC)
  • I might leave a note that the orthography of the text doesn't match the standard one, and maybe give both for the Irish example. I did that once, a while back, at mo'ai. —Μετάknowledgediscuss/deeds 18:07, 14 June 2016 (UTC)
    The difference there is much smaller than what I'm talking about. At aithrí, for example, I just added the usex "Mara ndéanfaidh muid aithrí inár bpeacaí, tá muid ar fad caillte", but what the source actually says is "mar ə ńīnə myȷ æŕī ə n-r̥ bȧkī, tā myȷ əŕ fad kāĺcə". —Aɴɢʀ (talk) 18:12, 14 June 2016 (UTC)
Is it a quotation or a usage example? Shouldn't you cite the source if there is one? DTLHS (talk) 18:16, 14 June 2016 (UTC)
It's a quotation of a usage example. The book I'm using is a reference work about this dialect; volume II is the dictionary, which provides usage examples taken from the author's fieldwork among native speakers. They're sentences that he heard spoken while he was living among Irish speakers, so this book is the only form in which these sentences have been published. And rather than writing them in conventional orthography, he writes them in his own ad-hoc phonetic transcription. —Aɴɢʀ (talk) 18:23, 14 June 2016 (UTC)
I suppose I could format it as a quotation along the following lines:
  • 1899, Franz Nikolaus Finck, Die araner mundart, Marburg: Elwert’sche Verlagsbuchhandlung, vol. II, 28:
    mar ə ńīnə myȷ æŕī ə n-r̥ bȧkī, tā myȷ əŕ fad kāĺcə.
    conventional orthography: Mara ndéanfaidh muid aithrí inár bpeacaí, tá muid ar fad caillte.
    Unless we do penance for our sins, we are all lost.
That would make it clear, wouldn't it? —Aɴɢʀ (talk) 18:30, 14 June 2016 (UTC)
That looks good to me. Maybe you want to make a special quotation template for this if you're going to be citing it a lot. DTLHS (talk) 18:36, 14 June 2016 (UTC)
Yeah, I was thinking about doing that. —Aɴɢʀ (talk) 18:47, 14 June 2016 (UTC)

Abbreviations in etymologies[edit]

Sometimes I see etymologies with abbreviations. Example: ferruminate contains "ferruminatus, p.p. of ferruminare".

I don't remember if it was discussed before, but based on Wiktionary:Todo/unhelpful abbreviations, I suppose abbreviations like p.p., q.v., Gr., and so on are disallowed in etymologies. Am I right? --Daniel Carrero (talk) 21:50, 14 June 2016 (UTC)

Since we are not a print dictionary, we don't need to save space. We decided (although I don't know when or where) that it's better not to use abbreviations in etymologies because not everyone will know or be able to guess their meanings. --WikiTiki89 21:57, 14 June 2016 (UTC)

"vernacular" as a label for Russian прост.?[edit]

@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter Russian-language dictionaries of Russian commonly use the label "прост." for a certain type of colloquial words; "прост." stands for просторе́чный (prostoréčnyj), which literally means "vernacular" or "common speech". This is different from "разг." = разгово́рный (razgovórnyj) = "colloquial". I gather that words labeled прост. are considered lower-register than those labeled разг. For a while I labeled these words as "nonstandard", but this doesn't seem quite right, as "nonstandard" would suggest these words are somewhat proscribed, which I don't think is correct. My Russian-English dictionary labels both types of words as simply "colloquial". I've started to label the прост. words using {{lb|ru|vernacular}}, but this label doesn't currently exist, so it doesn't usefully categorize them. Should we create a "vernacular" label that categorizes into e.g. CAT:Russian colloquialisms (as {{lb|ru|colloquial}} does) and maybe also into CAT:Russian vernacular speech or similar? If not, how should these be handled? Benwing2 (talk) 03:06, 15 June 2016 (UTC)

I don't think "vernacular" is the best word for it. I'm tempted to say that we should choose in each case between "regional", "dialectal", and "colloquial", whichever fits better. --WikiTiki89 14:51, 15 June 2016 (UTC)
There's no good equivalent of просторе́чие (prostoréčije) in English. If you have to choose between "colloquial" and "non-standard" labels, "colloquial" is probably better, IMO, but not in all cases, and some would argue otherwise. --Anatoli T. (обсудить/вклад) 21:54, 15 June 2016 (UTC)
English also has informal as a label that covers a broad range of registers, but excludes academic papers, legal and government documents, and similar. DCDuring TALK 23:17, 15 June 2016 (UTC)
It may have to do with perceptions of non-standard language in Russian and English. Non-standard language has always been discouraged, and people who use it are looked down on. Perhaps a good example is ложи́ть (ložítʹ). Someone who uses it without prefixes or reflexive suffixes may immediately be identified as "uneducated", unlike someone saying "gonna", "gotta" or "he/she have" in English (I'm sure there are better examples). There's not always a clear boundary between colloquial and non-standard (as in other languages); e.g. туды́ (tudý) (another example of "просторечие") is often used jokingly. --Anatoli T. (обсудить/вклад) 23:51, 15 June 2016 (UTC)
I see.
Because we have the labels informal ("not suitable for formal speech or writing"), colloquial ("conversational"), non-standard ("scorned by many"), and dialectal ("acceptable only in a region or population") [glosses per my Wiktionary idiolect] available, we should probably avoid using senses of these that overlap with senses of the others. In particular, colloquial has a range of meanings, often overlapping with non-standard, but the "conversational" meaning is distinctive, IMO. DCDuring TALK 00:42, 16 June 2016 (UTC)
@Atitarev OK. I think your point about sounding uneducated is important. In English, educated people will use colloquial or slang speech in a sufficiently informal context but will generally avoid nonstandard speech except when used self-consciously for effect. For this reason we say words like "ain't" and "alls" ("alls you need to do ...") and "drug" (instead of "dragged") and "have dranken" (instead of "have drunk" or "have drank") are nonstandard. ложить sounds like an example of this. But I get the feeling most things that are просторечный are more like colloquialisms or slang. For this reason I'll use "colloquial" from now on for lack of anything better; I'd rather have some way of distinguishing разговорный from просторечный but there may be no help for this. Benwing2 (talk) 04:24, 16 June 2016 (UTC)

{{defdate}} and the Shorter Oxford English Dictionary[edit]

We must have thousands of entries that "reference" this dictionary (example abstain), just copying their dates of earliest attestation. This seems like a copyright violation, given that we are just copying their research across a large number of entries. Am I wrong? DTLHS (talk) 21:10, 15 June 2016 (UTC)

Copyright protects the expression of information, not the information itself. It would likely be a breach of copyright to copy a definition from the SOED word for word, but probably not if what is copied is a piece of information such as the date of earliest attestation. — SMUconlaw (talk) 22:16, 15 June 2016 (UTC)

putting an internal link in a translation of an example sentence[edit]

Are we allowed to put an internal link in a translation of an example sentence? See 行李 (baggage carousel) and 吹噓 (bragging rights). I think it's useful since it makes it clear that the translation is also a set term in English, but I thought I read somewhere that we shouldn't format it as such. ---> Tooironic (talk) 15:13, 17 June 2016 (UTC)

If you really want to, go ahead. I would only do this in very limited circumstances. --WikiTiki89 15:40, 17 June 2016 (UTC)

Sources for pronunciations of English words[edit]

I notice that lots and lots of English words are missing their pronunciations. I'm thinking of trying to write a bot to add them, but I need a free source of pronunciations that contains enough detail to map to IPA. Does anyone know of such a source? It's not obvious that websters1913.com will work; e.g. for Nation they show Na"tion and for National they show Na"tion*al; whatever those symbols mean, they don't seem to indicate the vowel-quality difference in the a in the two words. Benwing2 (talk) 11:28, 18 June 2016 (UTC)

When to use references?[edit]

Should we include references only when a term is rare or disputed? Or should we use them whenever possible? The policy pages don't really say. Is "credibility" such an important factor for our goals? If so there are thousands and thousands of entries we could easily reference in English, Spanish, French, German, etc. Ultimateria (talk) 15:25, 18 June 2016 (UTC)

@Ultimateria: I think each page should have at least one external link to a good monolingual dictionary online as far as possible, or in the case of English, OneLook would do. But that is not really a reference to boost credibility but rather an external link to provide further reading. I am sure readers are going to love having great sources one click away. That said, the reader may use the links for verification as well. We should not pester entry creators for failure to add external links; adding them is up to the people who want to add them, or actually probably up to bots and similar scripted operations. One risk that I see in adding external links is that it may confuse readers into thinking that we actually use these external links for verification when we in fact use attesting quotations. That is one of the reasons why I much prefer the "External links" header to "References" for the purpose.
Wiktionary:Votes/bt-2016-06/User:OrphicBot for bot status is currently running and proposes to bot-add certain external links to as many Latin and Ancient Greek entries as possible, which I welcome, despite opposing.
My account User:DPMaid has been adding external links in bulk to entries in multiple languages, as documented on its talk page, and so far no one has objected. (I do not see why anyone would.) --Dan Polansky (talk) 07:56, 19 June 2016 (UTC)
Good point on the header, I'll start using "external links" and try to add them more consistently. Ultimateria (talk) 10:25, 19 June 2016 (UTC)
@Dan Polansky While the choice for the Korean dictionary is good, the Russian {{R:BTS}} is incorrect. It's not linked to Большой толковый словарь but to gramota.ru. Not useful at all. Please undo your edits in Russian entries. I didn't check your other templates. A good bilingual Russian dictionary is [7], if the term can be added to the URL. --Anatoli T. (обсудить/вклад) 12:50, 19 June 2016 (UTC)
@User:Atitarev: The {{R:BTS}} template uses "?bts=x" in the URL to specifically select Большой толковый словарь from all the dicts available at gramota.ru. If you follow the link in one of the pages, e.g. from словарь, and then look at the right, you will see a list of checkboxes, and only the one for Большой толковый словарь will be checked. I really don't see how that is "not useful at all"; it cannot get much more useful than it is. Would it help if I added "at gramota.ru" to the text shown by the template, to warn the reader that the dictionary is hosted by that site, in case the reader does not like the site or something? --Dan Polansky (talk) 13:02, 19 June 2016 (UTC)
As an alternative, when I was designing the template, I pondered creating "R:DGR" with the text "word in Russian dictionaries at gramota.ru". That would also show the reader some other dicts, like one featuring synonyms and one featuring antonyms. --Dan Polansky (talk) 13:04, 19 June 2016 (UTC)
I see now what you were trying to do. It didn't open correctly on the mobile version. While there's some value in gramota.ru for advanced learners or editors, it's better to use a bilingual or multilingual dictionary for a broader audience, IMO. --Anatoli T. (обсудить/вклад) 13:47, 19 June 2016 (UTC)
Monolingual dictionaries are the best ones as far as coverage, depth and unambiguity. They often contain example sentences, which bilingual ones usually do not do. Some argue that bilingual dictionaries are a really bad thing; while that seems to be overstated, many bilingual dictionaries indeed are severely limited, and lead to a lot of unnecessary misunderstanding on the part of their readers. Nonetheless, I do not object to adding some good bilingual dictionaries as external links. However, removing links to good monolingual dictionaries only because they are monolingual would be a real loss for the reader. Anyone seriously interested in a language should read its monolingual dictionaries. --Dan Polansky (talk) 14:09, 19 June 2016 (UTC)
@Anatoli T.: Mobile: I started my mobile device, went to словарь and followed the link. Indeed, the specific page for the word did not show, and instead, I landed at a page not specific to a word, offering "proverka slova", in the proper script, of course. I was trying to play with the URL on my desktop by placing "m." there, to emulate the mobile view, but the servers seem to redirect me to the desktop view. This will be a rather poor experience for the mobile users, and I do not know how to fix it. One thing the mobile user can do is enter the word again into the search field on gramota.ru and get to the sought dictionary. From my experience, sites that offer both mobile and desktop view usually provide some links at the bottom or the top to make it possible for me to switch between the mobile and desktop views regardless of the type of the actual device; I see no such link at gramota.ru. We may hope they will improve this at some point. --Dan Polansky (talk) 10:23, 20 June 2016 (UTC)
@Dan Polansky: It's okay, I guess; nothing can be done. Mobile users could click on the "полная версия" hyperlink ("full version", i.e. desktop version) at the bottom right corner in gramota.ru and get the expected link. --Anatoli T. (обсудить/вклад) 12:40, 20 June 2016 (UTC)
  • Personally, if I was more strict with quality control, I would add an external link to the DRAE on every valid page in Spanish. In reality though, that's never gonna happen. Obviously though, I'd love other users to add references and external links all over the place. --Turnedlessef (talk) 10:38, 20 June 2016 (UTC)

Inconsistency in the treatment of comparatives[edit]

Hello,

The Latin adjective melior is considered as a lemma, same for the French adjective meilleur, whereas the English adjective best is considered as a non-lemma form. Why this difference, and especially, why best is considered as a non-lemma form? It's not an inflected form of good, is it? — Automatik (talk) 02:43, 19 June 2016 (UTC)

It probably has something to do with the lack of inflected forms for the English term. If you treat meilleur as a form, that makes meilleures a form of a form, which gets confusing. With best, on the other hand, it's always "best", no matter what it modifies. Chuck Entz (talk) 03:27, 19 June 2016 (UTC)
Also, best is, in fact, an inflected form of good, through the process of w:suppletion, in the same way as лучше (lučše) is the comparative form of хорошо́ (xorošó). Chuck Entz (talk) 03:35, 19 June 2016 (UTC)
meilleur is also a suppletion according to w:suppletion, so it is considered both as lemma and non-lemma form?… — Automatik (talk) 12:29, 19 June 2016 (UTC)
This argumentation doesn't work. We commonly treat participles as forms of verbs, but they have inflections too, including in French. Some languages also have possessive forms for nouns, like Hungarian or Turkish, but we don't treat them as lemmas of their own despite having inflections. I think we should use the same treatment regardless of language, when possible. The consideration I go by is whether you'd expect to find a form in a paper dictionary as a lemma. Participles and comparatives would not normally appear in a paper dictionary, being subsumed under the lemma of the main verb/adjective. So by that reasoning I would not treat them as lemmas on Wiktionary. I would consider nonlemmas that have inflections of their own a "sublemma", a lemma that is part of the paradigm of another lemma. —CodeCat 13:31, 19 June 2016 (UTC)
For sure, the comparative meilleur has a specific entry in French paper dictionaries (under M). Is it the case for best? I don't have any English paper dictionary at home. — Automatik (talk) 13:49, 19 June 2016 (UTC)
That's probably because it's irregular. I wouldn't be surprised if was and were appeared in an English dictionary either. But comparatives generally would not be found there. —CodeCat 15:06, 19 June 2016 (UTC)
Regular comparatives and participles are usually listed within the main lemma entry but without definition in English-language paper dictionaries. The following would be typical:
red adj. Of the color of blood. red·der, red·dest.
walk v. To proceed by placing one foot in front of the other. walks, walked, walk·ing.
Irregular forms, at least those that are alphabetically far removed, would have their own minimal entries, e.g.:
bet·ter comparative of good.
went past tense of go.
Obviously each dictionary is different, but that's sort of typical for paper dictionaries. —Aɴɢʀ (talk) 17:40, 19 June 2016 (UTC)

Redirects to matched pairs[edit]

Suggestion:

At least ) has a separate sense: used in lists, like "A) milk, B) eggs, C) flour", so it should be kept as a separate entry and also link to ( ).

It seems that (, ), [, ] can be used alone in set builder notation, so I take it all the 4 entries should be kept as well.

I'd like to do this:

  • Delete all senses of ( and ) that are redundant to ( ).
  • Delete all senses of [ and ] that are redundant to [ ].

And finally:

  • Redirect { and } to { }. (unless { or } can be used alone in some sense)

Rules:

  • If a symbol is only used as part of a matched pair, redirect the symbol to the matched pair.
  • If a symbol is used as part of multiple matched pairs, create the entry for the symbol and link to all matched pairs.
  • If a symbol is used by itself as well as part of a matched pair, create the entry for the symbol and list the individual uses normally, plus link to the matched pair entry.

--Daniel Carrero (talk) 19:08, 20 June 2016 (UTC)

I agree with you on rules 1 and 2. I am also in favor of rule 3; I think we should include a definition-line pointer to the matched-pair entry (for instance using {{only in}}), rather than e.g. just a "See also" link.
- -sche (discuss) 19:49, 20 June 2016 (UTC)
Maybe we ought to extend rule 3 to apply generally to words that appear as part of a larger idiomatic term? For example, include a link among the definitions of give that leads to give up. —CodeCat 21:24, 20 June 2016 (UTC)
When the number of collocations to be linked to is small (especially if it's just one or two), I support that. For punctuation marks and symbols I could see allowing separate {{only in}} lines for each "collocation" even if there are many of them. But for words with a very large number of collocations, like take (take in, take over, take cover, take back, take up, take up for, etc, etc), I can see how some people might think it was better to list them in a collapsible table as is done at present. An alternative might be a template similar to {{only in}} but which allowed an arbitrarily long list of collocations to be linked to all on one line (rather than separate lines), a bit like how {{&lit}} can link to as many constituent parts as necessary. - -sche (discuss) 22:40, 20 June 2016 (UTC)
I favour putting them among the definitions, though. When someone says "give up", the word "give" in that collocation still has a meaning, but that meaning is only apparent in the combination with "up". It's still the word "give", and per our mission statement, if someone wants to know what a word means, they should be able to look it up. It doesn't matter that it's a collocation or idiom, because the person looking it up might not know that. —CodeCat 23:18, 20 June 2016 (UTC)
@CodeCat, -sche: I agree that ) should link to ( ) in a sense line as opposed to a "see also" section or something.
Would we only have collocations of verb + preposition and adverbs? For example, would the full sense line of give look like this one below?
  1. Used in: give away, give back, give in, give off, give out, give over, give up
--Daniel Carrero (talk) 01:40, 22 June 2016 (UTC)
Something like that, yes. If the list gets too long, we could have a separate section to list them instead, but then that definition should be replaced with "used in: see #section" or similar. —CodeCat 16:47, 22 June 2016 (UTC)

Note: The proposal concerning matched pairs affects only a few entries, and people have so far supported it. I'll wait a bit more and if no one objects I'll make all the moves and redirects.

I'd like to write the 3 rules into WT:EL eventually, but it would involve creating a vote and I suppose it can be done later. --Daniel Carrero (talk) 05:23, 25 June 2016 (UTC)

@CodeCat, -sche: Done. I created all the redirects and edited all the entries of matched pairs. See Category:Translingual matched pairs. --Daniel Carrero (talk) 01:36, 28 June 2016 (UTC)

level of detail in English pronunciations[edit]

Under prodigal, the pronunciation looks like this:

/ˈpɹɑdɪɡəl/, [ˈpʰɹ̥ɑɾɨɡɫ̩]

Besides the fact that this is a specifically American pronunciation without being labeled as such, do we really need the level of detail expressed in [ˈpʰɹ̥ɑɾɨɡɫ̩]? IMO this is hardly going to help most people and will likely scare a lot of them off. Benwing2 (talk) 21:57, 20 June 2016 (UTC)

AFAIK we are supposed to show phonemic and not phonetic pronunciation. Equinox 23:20, 20 June 2016 (UTC)
We can show both, as long as the phonetic pronunciation is clearly labelled as to where it's used, register etc. —CodeCat 23:23, 20 June 2016 (UTC)
Like in every conversation on this topic I restate my conviction that the question should never be 'do we need it' but 'does it harm us'. Korn [kʰũːɘ̃n] (talk) 23:44, 20 June 2016 (UTC)
We're supposed to show phonemic, yes, but there's no ban on also showing phonetic. If the information is correct and correctly-labelled and (ideally) verifiable, include it. Average readers have the broad transcription to look at and advanced language learners and others might be interested in the narrow transcriptions. If the phonetic pronunciations become so numerous that they clutter the entry, collapse them. - -sche (discuss) 23:45, 20 June 2016 (UTC)
At the very least they should not be put on the same line. — Dakdada 08:52, 21 June 2016 (UTC)
  • I consider this sort of thing a case of false precision that should be removed. It's a bit like measuring the distance between two cities down to the nearest nanometer. —Aɴɢʀ (talk) 14:25, 21 June 2016 (UTC)
    I agree with Angr. To use language that would satisfy Korn, false precision is harmful. --WikiTiki89 15:25, 21 June 2016 (UTC)
    Which particular features here would be false precision in your opinion? Aspiration, velarization of coda /l/, and, in American English, medial flapping are quite common features of English pronunciation. Voicelessness of glides after voiceless stops does not seem too bad either. [ɨ] for /ɪ/ and syllabic [ɫ̩] for /əl/ seem more dubious, I suppose. --Tropylium (talk) 21:15, 21 June 2016 (UTC)
    Showing both aspiration of the [p] and devoicing of the [ɹ] is redundant. Flapping is common, but optional, in AmEng, so [ɾ] is not the only possibility. The unstressed vowel is not as far back as [ɨ]. And above all, all this information is predictable, so it doesn't need to be shown. There's a reason why paper dictionaries only give phonemic transcription, not phonetic, and saving space isn't it. —Aɴɢʀ (talk) 21:58, 21 June 2016 (UTC)
    Redundancy is not false precision, optionality is not false precision, predictability is not false precision, claiming that the unstressed vowel is never backed to [ɨ] might be false precision. Declension tables are predictable information too. For languages with the right spelling, the pronunciation section itself is redundant since it is predictable. If you know the rules, you can predict large parts of most languages down from their proto form. Where do we draw the line? That said, I agree that false precision is harmful. But I disagree that there is such a thing as too much precision, if the phænomena are well enough recorded. Korn [kʰũːɘ̃n] (talk) 23:30, 21 June 2016 (UTC)
    I do think it's possible to be too precise -- too much obvious detail will swamp the important things and make it harder to read. Benwing2 (talk) 23:43, 21 June 2016 (UTC)
    BTW we ran into this same issue when giving Russian pronunciation. We don't, for example, indicate that non-palatal [l] is heavily velarized, or the exact quality of [ɨ] (which, for example, has a noticeable on-glide preceding it when following labial consonants), but we do indicate the pronunciation of unstressed /a/ as either [ɐ] or [ə] (the rules for this are somewhat complex and easy to forget). The idea here is to include detail that is likely to help language learners and omit detail that is less helpful (either because it's too precise or because it will already be known). Especially unhelpful IMO is including lots of the more obscure IPA diacritics and other symbols, which few people will be familiar with and fewer still will have any idea how to pronounce correctly. Even using [ɾ] for flapped /d/ and /t/ bothers me a bit -- I would be at least as comfortable using [d], even if it's a slight lie. Benwing2 (talk) 23:53, 21 June 2016 (UTC)
    So false imprecision is better than false precision? I'm shocked. The moment we start entering even one smidgen of false information knowingly is the moment when we can scratch the entire project, because we no longer have the goodwill on which this project runs. And as long as we can collapse, I don't see how we can ever get swamped. We can easily make three labeled levels: Archiphonemic (English), phonemic (USA), phonetic (Working Class Michigan) and hide the phonetic levels if they become too many. In all languages. Korn [kʰũːɘ̃n] (talk) 05:27, 22 June 2016 (UTC)
    Giving the phonemic transcription only is not giving false information, but giving the phonetic transcription falsely implies that all other phonetic renderings are wrong, which is harmful. Pronouncing this word without aspiration/devoicing is unusual (except in certain accents like Indian English) but not incorrect. Pronouncing this word without flapping the /d/ is unusual in North America but not incorrect. Pronouncing this word with [ɪ] rather than [ɨ] is normal and not incorrect. Pronouncing this word with a nonvelarized [l] is unusual in North America but not incorrect. That's why this is false precision: it implies that any deviation is wrong, and it isn't. It's like saying the distance from New York to Boston is 13,495,680 inches: it implies that it's more than 13,495,679 inches but less than 13,495,681 inches, which is absurd. —Aɴɢʀ (talk) 11:08, 22 June 2016 (UTC)
    I would not read an implication that everything else is wrong, and even if that was the case, that issue would be fixable by adding labels, even more pronunciations, and not by removing stuff. Korn [kʰũːɘ̃n] (talk) 11:41, 22 June 2016 (UTC)
    No one else has mentioned that this is how they pronounce it so here I am. It's my exact pronunciation (although added by User:msh210). I don't see the harm. If someone wanted to find the phonetic transcription, where else could they find it besides here? Ultimateria (talk) 21:23, 21 June 2016 (UTC)
    OK, I took the liberty of deleting the excessively detailed pronunciation (and adding UK pronunciation in, hopefully I got it right). If we want to put it back we should have a general policy of how to represent phonemic and phonetic detail. I think something like [ˈpʰɹɑɾɪɡəl] is plenty enough detail. Rules for aspiration and flapping are a bit complicated so it may be useful to show them, but devoicing of [ɹ] is obvious and surely excessive, and the quality of [ɪ] and [ə] (and whether the last syllable has a syllabic l) are too variable to quantify, and all /l/'s are velarized in American English so it's probably not necessary to bother with that -- anyone who cares enough about the exact quality of /l/ will almost surely already know that /l/ is velarized. Benwing2 (talk) 23:41, 21 June 2016 (UTC)
    I don't think the devoicing of [ɹ] is obvious to all non-native speakers, nor do I fully agree with your final comment. When learning other languages, I find very detailed phonetic pronunciations extremely helpful, as I am not always able to pick up the finer subtleties of pronunciation just by listening (and by finer, I mean at least as fine as [ˈpʰɹɑɾɪɡəl], and often more specific). In French, for instance, I'm finding that I've become limited in my ability to improve my accent, because nowhere can I find exact enough phonetic transcriptions of words, and I'm often not able to successfully imitate some of the minutiae of pronunciation that I hear. I'm opposed to removing phonetic pronunciations unless their precision is actually false (as opposed to "unnecessary"), but I do think they should be clearly labelled as such. Andrew Sheedy (talk) 04:23, 22 June 2016 (UTC)
    I've restored the pronunciation, labelled as American, as [ˈpʰɹɑɾɪɡɫ̩]. Seeing as many varieties (including US varieties) of English use both [ɫ] and [l], and some languages contrast them, a narrow transcription should distinguish them. I went with [ɫ̩] rather than [əɫ] because the former is what I've seen more of in other entries, e.g. battle, bottle, petal, fiddle (in the broad transcription of that last one — there it should probably be changed to /əl/). - -sche (discuss) 04:47, 22 June 2016 (UTC)
    I think (a select few) non-native speakers might find the transcription of [ɹ] as [ɹ̥] helpful, but I suppose they would be able to find that information elsewhere. Andrew Sheedy (talk) 05:53, 22 June 2016 (UTC)
    No one seems to have mentioned this yet, but the reason I would label this as false precision is that it is selectively precise. It is precise about some aspects of the pronunciation and imprecise about others. The problem with that is that our readers will think it is a precise transcription and assume that all aspects of it are precise. What aspects are we being imprecise about? First of all, the [ɫ] symbol is an intentionally imprecise symbol and should never be used in precise transcriptions; this symbol is intentionally vague about whether the [l] is velarized or pharyngealized. Secondly, we are missing the actual place of articulation of the /l/, which for most Americans is dental. Thus the last syllable can be given as [l̪̩ˠ]. Next, the articulation of the /ɹ/ is most certainly not simply alveolar. In fact I'm not entirely sure what it is. But after saying this word over and over, I have come to suspect that in my pronunciation of this particular word, it is [ʟ̹ʷ] (a rounded labialized velar lateral approximant) or perhaps [ɣ̞̹ʷ] (a rounded labialized velar approximant); this also seems to velarize the /p/, giving [pˠʰʟ̹̊ʷ] or [pˠʰɣ̞̹̊ʷ] for the initial consonant cluster. Now we encounter another problem, which is that I have no idea whether all GenAm speakers pronounce it that exact way or not, and if not then by using this transcription we would be making the inaccurate claim that they do. I'm not even gonna bother analyzing the vowel qualities and lengths, but just note that those are another missing piece of precision. My guideline would be that if the phonetic transcription is not illustrating some important peculiarity of a word, then it is superfluous and falsely precise. --WikiTiki89 14:43, 22 June 2016 (UTC)
    The phonetic pronunciation is showing the peculiarity of a specific accent. This is something I am absolutely looking for in Wiktionary; it's highly interesting information to me and seems to me to be well apt for our pronunciation section. As long as the data given is correct, I don't see the relevance of other pronunciations which diverge more or less from it. They can be added. Assuming that all people in area X pronounce a word exactly the same way is a lack of understanding that's to be fixed by a lecture on linguistics/phonetics, not a dictionary. Korn [kʰũːɘ̃n] (talk) 11:37, 23 June 2016 (UTC)
    ps.: While I'm usually for assuming that the user is not too well acquainted with linguistics and has a short attention span, clearly anyone knowing how to read IPA in the first place has a basic interest in the topic and can be expected to have a basic understanding. If not, add a disclaimer, don't remove information. Korn [kʰũːɘ̃n] (talk) 11:39, 23 June 2016 (UTC)
    No, the only peculiarity of a specific accent that it is showing is the realization of /d/ as [ɾ] (well and the vowel quality of the first syllable's vowel, but that's already given in the phonemic representation and is actually variable within GenAm). The aspiration of /p/ and the darkening of /l/ are universal in English (perhaps with the exception of small dialects that I don't know about?). The devoicing of the /ɹ/ is not something I've ever noticed or paid attention to before, but I suspect that it is not peculiar to GenAm either. The quality of the second syllable's vowel is disputable (I'm not sure what it actually is), and I don't think it is peculiar to GenAm either. The features I mentioned in my previous post, however, such as the dental nature of the /l/ and the precise articulation of the /ɹ/, are peculiar to GenAm (RP has an alveolar /l/ and in this word I suspect the /ɹ/ is simply [ɹ] or [ɻ], and not velar). The vowel length features of GenAm are also completely ignored (the first syllable has a longer vowel than the other two). The features given are not any more important or interesting than the features not given. --WikiTiki89 15:09, 23 June 2016 (UTC)
    Non-velarised L occurs in Northumbria and Ireland. I'm talking about whether this level of pronunciation should be had in general; I have no merit whatsoever to talk about this pronunciation specifically. I'm just saying that, if e.g. GenAm is /bɜrd/, then having New York: [bɜjd] and Some city: [pɚt] seems to me to be within our scope, and desirable. Every phonetic feature which is distinguishing either for or within the dialect should be visible. So l-velarisation should be featured, for, while it is not phonemic anywhere, it is part of what makes Geordie sound like Geordie. When dealing with most German, just having /a/ and /a:/ might be sufficient, but an extra line for northern accents, where /aː/ and /ɑː/ are contrasting phonemes, actually making that difference, and having that line in the first place, is neither superfluous nor false precision, but simply extra service. Can we be on the same page on that? Korn [kʰũːɘ̃n] (talk) 15:56, 23 June 2016 (UTC)
    Yes, I can agree that "New York: [bɜjd]" is useful, because it does not attempt to be overprecise, it is just highlighting a particular feature. --WikiTiki89 17:18, 23 June 2016 (UTC)
    I definitely agree that we should avoid being overprecise. @Korn, keep in mind that many people know or can learn the basics of IPA but will not know how to interpret all the strange diacritics and such. Figuring out that [ʃ] represents the sound of sh is on an entirely different level from figuring out what all the symbols in [l̪̩ˠ] (much less [pˠʰɣ̞̹̊ʷ]) mean. Most people, including IMO many or most people familiar with IPA, have no idea what sounds are denoted by vowel symbols like [ɤ] and [ɞ] and [ɜ], or what the difference between dental vs. alveolar articulations or velarized vs. pharyngealized articulations are, or even what "pharyngealized" means, and have no easy way of figuring this out, either. This is a general dictionary and needs to be aimed towards the intelligent layman, not a specialist in linguistics. We always have to strike a balance between precision and ease of use. Benwing2 (talk) 17:56, 25 June 2016 (UTC)
    Finding out the value of [ʃ] and [ɤ] is equally difficult. Both pieces of information are a mere two clicks away on Wikipedia, tops. And since our users browse this Wikiproject, we can assume they have access to another. If they cannot be arsed to look up the information, that is their choice. Wiktionary should be aimed at everyone, which is my understanding of the Wiki spirit. We should not suddenly stop being informative at a certain level of education. If we cannot approach laymen and expert alike, we're simply not good at what we do. I'm tired of hearing that the user is unknowing and thus may never be given incentive or chance to improve. We can provide both, the simple and the precise, the overview and the in-depth treatment. Korn [kʰũːɘ̃n] (talk) 18:13, 25 June 2016 (UTC)
    I've been away from the discussion for a bit, but I strongly agree with pretty much everything Korn has been saying. If we're worried about confusing the average user with extremely precise (and potentially extremely helpful) phonetic information, then let's put it in collapsible boxes, not remove it. Andrew Sheedy (talk) 03:59, 7 July 2016 (UTC)

"Category:en:Currencies" and "Category:en:Currency"[edit]

What's the difference between "Category:en:Currencies" and "Category:en:Currency"? Do we need both? — SMUconlaw (talk) 22:14, 21 June 2016 (UTC)

"Currencies" contains the names of particular currencies, while "Currency" contains terms related to currency that are not necessarily currencies. So there is a difference. —CodeCat 01:01, 22 June 2016 (UTC)
Maybe in theory, but that's not the case with those categories at present. Equinox 02:25, 22 June 2016 (UTC)
In that case, the categories need usage notes, and some reclassification is in order. — SMUconlaw (talk) 07:17, 22 June 2016 (UTC)
We really need some clear contrast made between "set" categories and "topic" categories. Other than the fact that "set" categories tend to have plural names and "topic" categories singular names, I don't know how we're supposed to predict which category is of which type. Category:en:Horses, for example, has a plural name and says "English terms for horses", but in fact its content includes lots of terms that relate to horses in some way but are not terms for horses (behind the bit, equine, gait, etc.). Some of them could be moved to Category:en:Equestrianism, but not all of them. —Aɴɢʀ (talk) 10:50, 22 June 2016 (UTC)
Perhaps rename to a clearer word? These little differences are annoying to someone like me who speaks a language without plurals. --Octahedron80 (talk) 10:57, 22 June 2016 (UTC)
Wikipedia distinguishes them using the plural in some cases. There's w:Category:Color next to w:Category:Colors. But I do agree that it may make sense to distinguish them more clearly. I just don't know how. Perhaps the simplest solution would be to have Category:Topic:Horses for the topic, or disambiguate the set as Category:Kinds of horses. But then we'd have to do the same for all other categories too, so we might end up with Category:Species of mammals and similar "long" names for all life forms. And the system may not be watertight in any case; someone may still decide to place Stadtkreis in Category:de:Districts of Germany, even if the category may be intended only for the names of actual districts, not terms for specific kinds of districts. Both are sets, but the category would be only intended as one of them. —CodeCat 15:46, 22 June 2016 (UTC)
cat:en:List of colors? --Giorgi Eufshi (talk) 15:54, 22 June 2016 (UTC)
It kind of works, but it also has a connotation to me that implies it's a complete list. Maybe all categories are that way, I don't know, but it feels stronger with "list of". —CodeCat 16:36, 22 June 2016 (UTC)
I don't know if new names are really necessary; it might be sufficient to have more explicit text in the categories themselves. The text currently in CAT:en:Currency and CAT:en:Currencies is pretty good, but maybe they could even say "This is a topic category..." and "This is a set category...". Take CAT:en:Body, which says "English terms for and related to the body and its parts." It seems to be both, as it's both terms for and related to. It should probably be a topic category, with a separate CAT:en:Body parts as the set category. Then, to add to the confusion, there's CAT:en:Anatomy, which I guess is supposed to be just for anatomical technical terms (such as one might learn in anatomy class at university) and not for everyday words, but in practice it's full of everyday words for parts of the body. Maybe we should have a third kind of category, the "technical-term category" and label them as such. I was recently at a loss where to put some language's word for "feather". Not in CAT:Birds, because a feather isn't a kind of bird, and not in CAT:Ornithology because it isn't a technical term. CAT:Body is OK I guess, though it seems odd since the average reader probably expects that to refer to the human body. I notice that feather isn't in any category specific to its primary birdy meaning. —Aɴɢʀ (talk) 17:31, 22 June 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I am in favour of fuller usage notes that clearly explain the intended use of a particular category and suggest alternative categories for related words. I just wanted to point out that "This is a topic category ..." and "This is a set category ..." are not clear enough. — SMUconlaw (talk) 21:09, 22 June 2016 (UTC)

People that add categories to entries won't always look at the description on the category page. If they see one language use it, they might add it to their own language entry without checking. I certainly don't look at the descriptions much myself. —CodeCat 21:34, 22 June 2016 (UTC)
I don't look at the descriptions all that much either, but I do look at them when I'm uncertain whether a particular word belongs in a particular category. —Aɴɢʀ (talk) 21:56, 22 June 2016 (UTC)
Nonetheless, it would be useful to define possibly confusing categories on the category pages so that incorrectly categorized words can be spotted and moved. — SMUconlaw (talk) 23:51, 22 June 2016 (UTC)
Perhaps two separate namespaces, like Category (or Topic) and Set (or List)? Functionally they would both operate like categories but the distinction would then be clear(er) even to those who don't look at the cat page. Equinox 22:26, 22 June 2016 (UTC)
Can we actually create new category namespaces? Do we really want to? We can also have Category:Topic:en:Anatomy or similar. But also consider Angr's point that there is a distinction among topical categories between terms related to a topic, and technical terms and jargons used within a field. —CodeCat 00:22, 23 June 2016 (UTC)
Well, one could potentially have a three-way split: "Category:List:en:Religions" (list of religions: Judaism, Islam, etc), "Category:Topic:en:Religion" (words pertaining to religion: god, church, etc), and "Category:Jargon:en:Religion" (or some other word besides "jargon") (for words used chiefly by scholars of religion, like perhaps actual sin). But the last one might be better named "Category:Jargon:en:Theology". In other cases, the Topic and Jargon categories might share a keyword ("...:en:Aviation") while the List category had a different name ("...:en:Aircraft"); that's not a problem, I'm just mentioning it. A related issue is the tendency of people to use labels for all three purposes, making it hard to tell when a sense simply pertains to religion and when only scholars of religion use the sense. - -sche (discuss) 04:11, 23 June 2016 (UTC)
The whole idea of separating topics and sets is likely lost on the vast majority of potential readers, which leads me to believe there's not a lot of benefit in a dual system. Those (such as CodeCat) who want a dual system or even a triple system are quite focused on the minutiae of categorization. They aren't really creating a user-friendly system. Purplebackpack89 15:33, 23 June 2016 (UTC)
I threw the idea out there to see what people would think. I'm only tangentially interested in categories (and would like to see how much use they get, if we could measure such things). But if nobody cared at all, this discussion wouldn't have come up, I suppose. I do think it's worth stating on each category page what it's supposed to achieve, but realise somebody will always add further entries without reading that text. Equinox 16:46, 23 June 2016 (UTC)
  • Merge: Distinction between the two is essentially meaningless. Purplebackpack89 13:36, 23 June 2016 (UTC)

Editing WT:EL#Flexibility[edit]

Proposal: Editing WT:EL#Flexibility to remove everything except the first sentence. (diff)

Current text:
While the information below may represent some kind of “standard” form, it is not a set of rigid rules. You may experiment with deviations, but other editors may find those deviations unacceptable, and revert those changes. They have just as much right to do that as you have to make them. Be ready to discuss those changes. If you want your way accepted, you have to make the case for that. Unless there is a good reason for deviating, the standard should be presumed correct. Refusing to discuss, or engaging in edit wars may also affect your credibility in other unrelated areas.

Rationale:
I created Wiktionary:Votes/pl-2016-02/Removing "Flexibility" this year with the proposal of removing the paragraph entirely. It ended as no consensus 6–4–1 (60%), with some voters stating that they support the notion of flexibility. That said, as an additional suggestion, I maintain that maybe adding the rest of the text to Help:Interacting with other users would be a good idea.

Note about votes:
I'm trying not to create too many votes at once because people complained when I did it last time, around February 2016. I believe this change in the flexibility text would need a vote, but it can be created later. I still have the intention of editing more of WT:EL but I'll try focusing on this section now because it's the first unvoted section of the policy. --Daniel Carrero (talk) 09:41, 25 June 2016 (UTC)

Should the ‘Vulgar Latin’ distinction be deleted?[edit]

The way we treat Latin is rather unusual. The Latin spoken by the illiterate or barely literate is classified as Vulgar, whereas that which was spoken by the educated is called Classical. There was almost certainly a lot of variation within Vulgar Latin, and between regions and ages. Likewise, Classical Latin had its own set of variations. Presumably Vulgar Latin is unique in that it was usually spoken by the less educated, but I’ve never seen people apply this distinction to other languages.

The phrase “Vulgar Latin”, coined in the nineteenth century by Hugo Schuchardt, is unfortunate, but has come into common usage among scholars. József Herman defined Vulgar Latin as a collective label for those features of Latin which we can be sure did exist, but which were not recommended by the grammarians. It should not be, although it often has been, envisaged as being a separate language (or “system”) co-existing with “Classical” Latin. Both terms are ambiguous, and probably best avoided: all the styles and periods might as well be included under the umbrella of “Latin”, tout court. “Romance” is no clearer a word when applied to these centuries. The word is sometimes used as an alternative label for what others call Vulgar Latin, implying that Latin and Early Romance were then, and are now in retrospect, clearly distinct simultaneous entities. This perspective could only command adherence before the development of sociolinguistics which has made clear that variation is normal within a single language, and we need not assume that the existence of synonymous variants (such as DE and genitives, or AMABO and AMARE plus an auxiliary) implies that they each belong to a different language (or “system”). It is true, of course, that by the second millennium CE it had become normal to distinguish, at least in some contexts, Latin and Romance as separate entities; but it now seems anachronistic to postulate such a mental distinction as existing before the seventh century, and probably (for reasons there is no room to go into now) before the Carolingian “Renaissance” of c. 800 CE. Attested features in writing of the period preceding this separation of Latin and Romance may well therefore be direct evidence of the spoken usage of that time (inasmuch as written evidence can be taken to attest speech anywhere): e.g. when St Benedict of Nursia in the sixth century CE used both manducare and comedere to mean “eat” in his Monastic Rule, it is reasonable to take that as evidence for both words being used at that time and in that area; but it is a pointless argument to discuss whether these uses attest sixth-century southern Italian “Latin” or “Romance”, as if that had been a real distinction in St Benedict’s context. It makes it awkward for us, but we need to realize that calling the spoken language of pre-Carolingian post-imperial times (c. 400–800 CE) “Latin” (or “Late Latin” or “Vulgar Latin”) or “Romance” (or “Early Romance”) is only a terminological distinction. There was just one language there, however variable it was in increasingly elastic and complex ways. (Source; more information.)

Are we better off without this? --Romanophile (contributions) 12:31, 25 June 2016 (UTC)

No, I don't think we are. We treat Vulgar Latin as an etymology-only variant of Latin anyway; it's not as if we treat it as a separate language. And in etymologies it is important to recognize that some reconstructed words that must have existed as early as the 1st century simply aren't attested, or are attested only in nonliterary contexts (e.g. graffiti in Pompeii), or that some words must have had colloquial senses that aren't attested in literature (e.g. focus (fire) as opposed to 'fireplace, hearth'). Of course in real life the distinction between Vulgar Latin and Classical Latin wasn't binary, it was a continuum from basilect to acrolect, but for dictionary-writing purposes it's still more helpful to retain the labels than to ditch them. —Aɴɢʀ (talk) 12:45, 25 June 2016 (UTC)
Welsh has a similar distinction between colloquial and literary language, and they differ in grammar too. But we don't make that distinction as strongly as we do for Vulgar Latin. —CodeCat 13:50, 25 June 2016 (UTC)
I thought that we were just following our sources. Most of the Vulgar Latin etymology labels are from MW 1913, possibly Century 1911 too. The passage cited may lead to a vast amount of scholarship which would enable better labeling in etymologies and in Latin entries. When the fruits of that research are available we could incorporate it into our labeling. Unless the current label leads to loss of users or contributors, it seems silly to eliminate what information value the label has, however modest. DCDuring TALK 14:35, 25 June 2016 (UTC)
I agree with the above and I'm for keeping it the way it is for now, as the label does serve a purpose. But I'll admit the way we handle it can be unusual. On a related note, I think it's also somewhat inconsistent. Technically, most terms that were inherited probably should've passed through a "Vulgar Latin" phase before making their way into what we call "Romance". But here we usually only make a point of listing the Vulgar Latin intermediate if it is sufficiently different from the attested Classical Latin term to be warranted. There are numerous hypothetical intermediates between each Classical term and the corresponding Romance term that we could ascribe to Vulgar Latin, such as forms dropping the final -s or -m, or ones accounting for sound shifts, contractions/syncope, metathesis, etc. So the way we handle it is admittedly somewhat arbitrary. However, adding a Vulgar Latin intermediate form for every inherited entry is unnecessary, of course, as the transformation from the Classical form can already be clearly inferred in many cases. Word dewd544 (talk) 17:43, 27 June 2016 (UTC)

Should I just add basic definitions?

I was looking at this wordlist: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Czech_wordlist

I noticed that a lot of the words have non-working links. I was thinking about adding English definitions for these words. But I don't know all the rules for participating in Wiktionary, and I don't really want to take all the time to learn. So would it be useful for me to just add basic definitions so that someone else can come along and put them into the official Wiktionary format, or would my sloppy additions just be deleted anyway?

I've been emailing info-en@wiktionary.org about this for four days, but nobody is responding. Please let me know if there is a better place to get an answer to my question. Thanks!

  • This is the place. Why not add one or two of the words with English translations (not normally definitions for non-English terms) and we'll tell you how they can be improved (if needed). SemperBlotto (talk) 14:35, 26 June 2016 (UTC)
    • p.s. Look up related or similar words that we already have, and see how they are formatted. SemperBlotto (talk) 14:36, 26 June 2016 (UTC)
    • I just did a few. I hope these minimal additions will be retained and eventually expanded, because I think they're clearly better than nothing, they're quick for me to add, and I really don't want to have to do everything in order to do something. Trreuioteu (talk) 15:01, 26 June 2016 (UTC)
      • See how I have modified pojď. Using our standard templates means that the term gets added to the right categories. SemperBlotto (talk) 15:04, 26 June 2016 (UTC)
  • Off topic: anyone know who has access to info-en@wiktionary.org? It seems like an email we should actually be keeping track of. At the very least, it should have an automatic reply that tells people to post in the Information Desk. —Μετάknowledgediscuss/deeds 17:56, 26 June 2016 (UTC)
    • No, I have never heard of it (being a relative newcomer). SemperBlotto (talk) 04:39, 27 June 2016 (UTC)
      • The list contains too many non-lemma forms and misspellings; it is not a good list, and not a true frequency list. I'm not surprised they are not defined at all. Non-lemma entries should be created by a bot, not by a human editor, provided that inflection tables exist and there's a Lua developer interested in making a module. --Anatoli T. (обсудить/вклад) 04:45, 27 June 2016 (UTC)
    • I think that address goes into the OTRS queue. I did some OTRS work a while back, but I have not participated in a long time. Not sure if anyone from en.wiktionary is currently active on OTRS. - TheDaveRoss 12:50, 27 June 2016 (UTC)
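The suggestion above, that a bot rather than a human generate non-lemma entries from existing inflection tables, can be sketched roughly as follows. This is a Python illustration, not an existing Wiktionary bot or module: the function name `make_nonlemma_pages`, the input shape (a list of form/tag pairs), and the exact wikitext emitted (a `{{head}}` line plus one `{{inflection of}}` definition per tag set) are assumptions, and the real template parameters would need checking before any bot run.

```python
from collections import defaultdict


def make_nonlemma_pages(lang_code, lang_name, pos, lemma, table):
    """Return {page_title: wikitext} for every inflected form except the lemma.

    `table` is a hypothetical list of (form, tags) pairs read from an
    inflection table; the tag abbreviations are assumed Wiktionary-style.
    """
    tags_by_form = defaultdict(list)
    for form, tags in table:
        if form != lemma:  # the lemma already has its own full entry
            tags_by_form[form].append(tags)

    pages = {}
    for form, tag_sets in tags_by_form.items():
        lines = [
            f"=={lang_name}==",
            "",
            f"==={pos}===",
            f"{{{{head|{lang_code}|{pos.lower()} form}}}}",
            "",
        ]
        # One definition line per tag set, so syncretic forms list every reading.
        for tags in tag_sets:
            lines.append(f"# {{{{inflection of|{lang_code}|{lemma}||{'|'.join(tags)}}}}}")
        pages[form] = "\n".join(lines) + "\n"
    return pages


if __name__ == "__main__":
    demo = make_nonlemma_pages("cs", "Czech", "Verb", "dělat", [
        ("dělat", ("inf",)),
        ("dělám", ("1", "s", "pres", "ind")),
        ("děláš", ("2", "s", "pres", "ind")),
    ])
    print(demo["dělám"])
```

Grouping tag sets by form before writing means a form shared by several cells of the table gets a single page with several definition lines, rather than being overwritten.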


The logo vote passed. According to phab:T138801, the change is scheduled for 30 June 15:00-16:00 UTC. --Daniel Carrero (talk) 07:56, 29 June 2016 (UTC)

I look forward to the next new logo vote! - TheDaveRoss 12:44, 29 June 2016 (UTC)
We are going to need a matching favicon as discussed here. --Daniel Carrero (talk) 12:50, 29 June 2016 (UTC)
Finally, we got it! I'll celebrate by editing twice as hard today --Turnedlessef (talk) 10:33, 1 July 2016 (UTC)
I realize now the lighter gray color of the text at the bottom would have been better off black, and there probably should have been a little more clearance at the top, but it's not too bad. Also, they haven't changed our logo on the www homepage. Are they going to? Do we need to enter a separate ticket for that? --WikiTiki89 16:52, 1 July 2016 (UTC)
The old logo has disappeared for me, leaving an empty space ... — SMUconlaw (talk) 17:55, 1 July 2016 (UTC)
You edit meta Www.wiktionary.org_template to affect that page. It is protected, so someone who is an admin there will have to do it I guess. - TheDaveRoss 18:47, 1 July 2016 (UTC)
Something is wrong. As with Smuconlaw, for me the logo has disappeared. Benwing2 (talk) 02:21, 2 July 2016 (UTC)
Same here, on both Firefox and Safari (Mac). I looked at the part of the page source for the logo:
<div id="p-logo" role="banner"><a class="mw-wiki-logo" href="/wiki/Wiktionary:Main_Page" title="Visit the main page"></a></div>
There's no code for an image. Odd. Chuck Entz (talk) 03:22, 2 July 2016 (UTC)
The image is provided via CSS applied to .mw-wiki-logo. (The image is [8].) —suzukaze (tc) 03:45, 2 July 2016 (UTC)
I'm using Chrome on Mac, and still have a missing logo. Benwing2 (talk) 06:09, 2 July 2016 (UTC)
The logo looks fine here; it didn't disappear at all to me. I use Firefox 47.0 on Windows 8.1. --Daniel Carrero (talk) 06:17, 2 July 2016 (UTC)
I'm on the new Firefox 47.0.1 and there is still a gaping grey hole. — SMUconlaw (talk) 12:57, 2 July 2016 (UTC)
I installed the new Firefox 47.0.1, too. The logo is still perfect to me. I even cleared the cache a few times, all is good here. --Daniel Carrero (talk) 13:04, 2 July 2016 (UTC)
Clearing the cache makes no difference for me. :( — SMUconlaw (talk) 13:08, 2 July 2016 (UTC)
Still missing for me, under both Chrome and Safari on the Mac. Can we file a phab bug? Benwing2 (talk) 21:48, 2 July 2016 (UTC)
I created phab:T139255. --Daniel Carrero (talk) 02:42, 3 July 2016 (UTC)
Thanks! Benwing2 (talk) 02:56, 3 July 2016 (UTC)
Hi, there was a mistake with the original configuration change so that people with high-resolution (retina) displays wouldn't see any logo at all. I deployed the fix just now and all should be good. Sorry about the trouble! Legoktm (talk) 07:47, 3 July 2016 (UTC)
Yup, it looks fine now! Thanks. — SMUconlaw (talk) 11:43, 3 July 2016 (UTC)

@Legoktm: Actually, it looks fine on my laptop but I am still not seeing any logo on mobile devices (I am using an iPhone and iPad). — SMUconlaw (talk) 09:37, 4 July 2016 (UTC)

I don't think there ever was one in the mobile view. I may be wrong though. --WikiTiki89 20:24, 5 July 2016 (UTC)
I wasn't viewing the website in the special mobile mode, but as an ordinary website in the Safari browser. Unless I've been hallucinating, my impression is that there used to be a logo visible, in the same way it appears when the website is viewed on a laptop or desktop. — SMUconlaw (talk) 20:36, 5 July 2016 (UTC)
Hmm... On my iPhone the logo appears as expected on the Desktop version of the site in both Chrome and Safari. --WikiTiki89 20:40, 5 July 2016 (UTC)
Tried closing and reopening Safari. Nope, the logo is still missing. I'm on iOS 9.3.2 (the current version). — SMUconlaw (talk) 21:08, 5 July 2016 (UTC)

I can now see the logo on my older-model iPad, but not on my iPhone 6. — SMUconlaw (talk) 12:01, 16 July 2016 (UTC)

Compact Links coming soon to this wiki

Screenshot of Compact Language Links interlanguage list

Hello, I wanted to give a heads up about an upcoming feature for this wiki which you may have already seen in Tech News. Compact Language Links has been available as a beta-feature on all Wikimedia wikis since 2014. With compact language links enabled, users are shown a much shorter list of languages on the interlanguage link section of an article (see image). This will be enabled as a feature in the coming week for all users, which can be turned on or off using a preference setting. We look forward to your feedback and please do let us know if you have any questions. Details about Compact Language Links can be read in the project documentation. Thank you. On behalf of the Wikimedia Language team:--Runa Bhattacharjee (WMF) (talk) 13:09, 29 June 2016 (UTC)

I hate this feature. Can there be a way to disable it in user preferences? --WikiTiki89 20:12, 29 June 2016 (UTC)
Seems to already exist, Wikitiki.
Looks like it is an optional feature. - TheDaveRoss 20:22, 29 June 2016 (UTC)
Thanks! You just made my day! --WikiTiki89 20:25, 29 June 2016 (UTC)
If you don't want it, untick the box that will be available at Preferences > Appearance > Languages. —Stephen (Talk) 03:57, 30 June 2016 (UTC)

July 2016

Renaming all requests categories

See this: WT:RFM#Category:Translation requests (X) to Category:X translation requests / Category:Translations to be checked (X) to Category:X translations to be checked.

There, I suggested renaming all kinds of requests categories to a consistent format.

I am mentioning this here because it's a large proposed change. Thanks. --Daniel Carrero (talk) 20:00, 1 July 2016 (UTC)

Closing OrphicBot vote

Concerning this vote:

Wiktionary:Votes/bt-2016-06/User:OrphicBot for bot status

The vote was scheduled to end on June 30; today is July 3. Current results: 4-1-1 (the abstention is mine).

I am hesitant to close the vote as "Passed". It seems some issues about Ancient Greek are under discussion in the "Oppose" section.

So I decided to extend the vote by 7 days. New end date: July 10. Please check whether all is OK with the bot proposal before we close the vote. Thanks.

--Daniel Carrero (talk) 11:03, 3 July 2016 (UTC)

@Daniel Carrero: Thank you for extending the vote; I was unsure what the process was at this point. I believe the specific concerns of the user who has not responded in twelve days to have been addressed: 1) The robot is not intended to remove bad R:DGE links, but I have demonstrated that it does not add them. There are only two which need to be removed by hand. 2) Edit summaries are now generated automatically based on actual changes. During testing, I wrote them by hand. The seven samples tested two functions, and my edit summary reflected the more experimental one. Further discussion of the robot has since taken place here: User_talk:I'm_so_meta_even_this_acronym#R:LSJ_and_the_Perseus_Resolve_Form. I am a little concerned that the dissenting voter is requesting maximalist features (sorting the References section according to Classical-oriented preferences) without consulting other users. I like his requested features, and I am very pleased he asked for them, but I would not have foisted them on everyone else myself, given that LSJ is not relevant to Byzantine studies, is not a first choice for readers of Homer, and is not as suited to the needs of language learners as Middle Liddell. I had assumed this proposal was so innocuous that this was why it had not generated any attention in almost two weeks. Given that the References modules are somewhat more complicated than a programme for appending strings to a list according to some heuristics and sorting the list, I am puzzled that anyone would think this task would be significantly safer left to another user. However, should anyone else want to do it, my existing code is available at User:OrphicBot and I would not mind withdrawing this proposal. Isomorphyc (talk) 13:20, 3 July 2016 (UTC)
I don't think you should withdraw the proposal. —JohnC5 14:59, 3 July 2016 (UTC)
It is not about LSJ vs. Middle Liddell; it is about LSJ vs. a Spanish dictionary, as detailed in the discussion below my vote. I ultimately returned to opposition not because of template order but because the interaction, the changes in mainspace, and their edit summaries did not give me enough confidence. For that I am really sorry, since this is a great initiative. --Dan Polansky (talk) 17:48, 6 July 2016 (UTC)
Hi @Dan Polansky: I regret this did not work out. If you do have feature requests or other issues regarding the classical languages references after the vote, however, please do let me know, as this is an ongoing process. The issue with your sorting preferences was my concern: I agree with your preferences, but I was concerned you and I were the only participants in that conversation. I know more users now and am more comfortable with this sort order. You may have been less concerned about the possibility of dead links had you participated in the earlier conversations. My initial edits to R:LSJ (a module we have both worked on) removed more than a thousand dead links it produced, and all of my successive modules followed the same procedure of indexing the target dictionary. (I did not work on R:DGE, and only linked it at another user's request, as you can see in the discussions.) I agree with you most about the edit summaries; as you can see in the code, the production version writes these automatically, but I did not have this feature during testing, and it was necessary to test multiple features in the same run. Assuming the robot is approved, I hope the changes work out to your satisfaction, and I trust we may find something about which to agree at a later time. Isomorphyc (talk) 18:26, 9 July 2016 (UTC)

Shortening some 'exceptional' language codes

Currently, exceptional codes for non-proto-languages are created by adding three letters approximating the name of the language onto the end of, as WT:LANG puts it, "a relevant family code". Because WT:LANG goes on to specify that "this system is used even if the relevant family code is itself an exceptional code rather than an ISO-derived code", many exceptional language codes have three parts: ira-azr-klt, ira-azr-kls, nai-yuc-tip, nai-yuc-yav, qfa-ctc-cat (qfa-ctc should actually be sai-ctc for consistency with other family codes, but that's a separate matter), qfa-ctc-col, qfa-len-slv (qfa-len should be nai-len, but that is again a separate matter). However, others have only two parts: Kitanemuk is azc-ktn rather than azc-tak-ktn, Phuthi is bnt-phu rather than bnt-ngo-phu.

At RFM, Μετάknowledge and I were discussing whether or not to always use only the nearest ISO family code (except where there is none), to obtain shorter codes, like ira-klt instead of ira-azr-klt. Other benefits are that when the precise (sub)family membership is uncertain, using only the ISO's high-level family codes is often "safer", and it allows editors to add codes for subfamilies (such as "Upper Amazon Arawakan") without worrying about needing to recode any languages (such as Amarizana and Anauyá and Maypure) which had the newly-added subfamily as their most immediate family.

If you think switching to only two-part codes is a good idea, the following codes will be affected: ira-azr-klt (which would become →ira-klt), ira-azr-kls (→ira-kls), nai-yuc-tip (→nai-tip), nai-yuc-yav (→nai-yav), qfa-ctc-cat (→sai-cat), qfa-ctc-col (→sai-col), qfa-len-slv (→nai-sln, renaming the second element to incorporate 'Lenca' now that the family portion of the code no longer does). Alternatively, if you think we should stick to using the nearest family code, then languages like azc-ktn will need to be renamed (to azc-tak-ktn).

Proto-languages are not part of this proposal, and 'exceptional' languages which belong to families that do not have ISO codes would continue to be named using the existing system. (For example, if we wanted to add an exceptional code for the divergent Korean dialect Foobarese, it would be qfa-kor-foo because the Koreanic language family has no ISO code.) - -sche (discuss) 20:47, 4 July 2016 (UTC)
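The renaming rule in this proposal can be expressed as a small mapping function. The Python sketch below is illustrative only and is not how Module:languages or WT:LANG implements anything; the two lookup tables are hypothetical and contain only the codes named in this thread. A three-part code whose middle family has a listed ISO ancestor collapses to ancestor plus language suffix; two-part codes and codes under families without an ISO ancestor (the Foobarese example) pass through unchanged.

```python
# Hypothetical table: exceptional family code -> nearest ISO family code.
# Only the families named in the proposal are listed.
EXCEPTIONAL_FAMILY_PARENT = {
    "ira-azr": "ira",
    "nai-yuc": "nai",
    "qfa-ctc": "sai",
    "qfa-len": "nai",
}

# Language suffixes renamed during the move (qfa-len-slv becomes nai-sln,
# so that 'Lenca' stays recoverable from the code).
SUFFIX_RENAMES = {"qfa-len-slv": "sln"}


def shorten(code: str) -> str:
    """Collapse a three-part exceptional code to a two-part one where possible."""
    parts = code.split("-")
    if len(parts) != 3:
        return code  # already two-part, e.g. azc-ktn
    family = "-".join(parts[:2])
    parent = EXCEPTIONAL_FAMILY_PARENT.get(family)
    if parent is None:
        return code  # no ISO ancestor (e.g. qfa-kor-foo): keep the existing system
    suffix = SUFFIX_RENAMES.get(code, parts[2])
    return f"{parent}-{suffix}"


if __name__ == "__main__":
    for old in ["ira-azr-klt", "nai-yuc-tip", "qfa-ctc-cat", "qfa-len-slv",
                "azc-ktn", "qfa-kor-foo"]:
        print(old, "->", shorten(old))
```

Keeping the explicit rename table separate from the family table makes one-off exceptions such as nai-sln visible, instead of hiding them inside the general rule.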

I support shortening these codes. They're already long and confusing enough for editors to remember and get used to; the least we could do is make them shorter and more stable, which this proposal would accomplish. —Μετάknowledgediscuss/deeds 21:02, 4 July 2016 (UTC)

Done. - -sche (discuss) 07:21, 7 July 2016 (UTC)

Open call for Project Grants


Greetings! The Project Grants program is accepting proposals from July 1st to August 2nd to fund new tools, research, offline outreach (including editathon series, workshops, etc), online organizing (including contests), and other experiments that enhance the work of Wikimedia volunteers. Whether you need a small or large amount of funds, Project Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.

Also accepting candidates to join the Project Grants Committee through July 15.

With thanks, I JethroBT (WMF) 15:21, 5 July 2016 (UTC)

Module:encodings

This is a new module I just created, announced at WT:NFE. I hope it's useful! If anyone needs additional encodings, let me know. It only supports ISO 8859-1 currently. —CodeCat 22:31, 5 July 2016 (UTC)
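For readers unfamiliar with the encoding: ISO 8859-1 maps each byte value 0x00 to 0xFF to the Unicode code point with the same number, which is what makes it a convenient first encoding to support. The Python sketch below only illustrates that mapping; it is not the Lua interface of Module:encodings, and the function names are hypothetical.

```python
def latin1_decode(data: bytes) -> str:
    """Decode ISO 8859-1: byte value n becomes code point U+00nn."""
    return "".join(chr(b) for b in data)


def latin1_encode(text: str) -> bytes:
    """Encode to ISO 8859-1, rejecting characters above U+00FF."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            raise ValueError(f"{ch!r} (U+{cp:04X}) is not representable in ISO 8859-1")
        out.append(cp)
    return bytes(out)


if __name__ == "__main__":
    raw = b"caf\xe9"
    text = latin1_decode(raw)
    # The hand-written mapping agrees with Python's built-in codec.
    assert text == raw.decode("latin-1")
    assert latin1_encode(text) == raw
    print(text)
```

Because the mapping is the identity on code point values, decoding never fails; only encoding can, for characters outside the first 256 code points (Greek, Cyrillic, and so on).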

Small, ugly modification to improve Ancient Greek ASCII searchability -- justified?

Hello Ancient Greek editors with whom I have corresponded a few times (namely: @CodeCat, @JohnC5, @Metaknowledge, @I'm so meta even this acronym ): sorry to bother all of you, but I've made a trial change to the Template:grc-noun using a new module Module:grc-ascii-searchability which solves some searchability problems I have had in Greek. However, the change is very unaesthetic. I was going to revert it if it seems particularly disliked, or if there's a better solution available, but I was thinking of extending it to the other twelve or so templates if it seems to solve a problem others have. My problem is that I have trouble searching for Greek words on Wiktionary because I tend to type in the Roman alphabet even when I shouldn't. As a result, to find Greek words I usually look for cognates whose etymologies likely link, or else I find the word on Perseus with betacodes and then paste the link into Wiktionary. This process normally takes a few minutes. I thought there had to be a better way-- but possibly I found a worse way. I dangled some invisible text from the Greek headword template with unique variants from up to nine different Romanisation schemes (asciisation schemes, really), so that now a search for something like "qea greek" or "chalix greek" or "elios greek" or "hlios greek" in the search bar turns up the relevant Greek noun usually as the first or second result. (qea, as you know, is unaccented beta code for θεᾱ́; the rest are self-explanatory). In general, anything in this list should be searchable using most reasonable asciisation schemes I could think of: https://en.wiktionary.org/w/index.php?title=Special:WhatLinksHere/Module:grc-ascii-searchability&hideredirs=1&hidelinks=1. My hope is that these Romanisations will percolate into external search engines as well so that our Greek entries will be easily searchable that way. Does this seem like a change worth extending? 
If it is not widely liked I will revert it, but if it is, I will either add it to all of the Greek headword templates, or else (more cleanly, if this method is preferred) look into finding the appropriate place for it in Module:headword. I had wanted to get the Greek words into the search bar's autocomplete feature by typing the Romanisations, but I think that is completely impossible for me. It is worth pointing out that the macron-accented Romanisation already offered by the headword module is also searchable in plain ASCII; it is just a little more detail-oriented to find, and, speaking personally, it is not something that was ever part of my process to look for, or indeed something I knew about until I saw it in my own search results a few minutes ago. Thanks for your time. Isomorphyc (talk) 04:12, 6 July 2016 (UTC)
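To make "unique variants from several asciisation schemes" concrete, here is a Python sketch of how such variants might be generated. It is not the code of Module:grc-ascii-searchability (which is Lua and may use other schemes); it assumes three hypothetical schemes: unaccented lowercase beta code, a plain transliteration with ch for chi, and the same with kh, plus an h-initial variant when a rough breathing is present. Length marks, accents, and other diacritics are dropped by Unicode decomposition.

```python
import unicodedata

# Unaccented lowercase beta code (TLG letter values).
BETA = {"α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "h",
        "θ": "q", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "c",
        "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u",
        "φ": "f", "χ": "x", "ψ": "y", "ω": "w"}

# A common letter-by-letter transliteration, without length marks.
TRANSLIT = {"α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "e",
            "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
            "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u",
            "φ": "ph", "χ": "ch", "ψ": "ps", "ω": "o"}

# Same scheme with kh for chi.
TRANSLIT_KH = {**TRANSLIT, "χ": "kh"}

ROUGH_BREATHING = "\u0314"  # combining reversed comma above (dasia)


def ascii_variants(word: str) -> set:
    """Return a set of ASCII search strings for a polytonic Greek word."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop every combining mark: accents, breathings, macrons, iota subscript.
    letters = "".join(c for c in decomposed if not unicodedata.combining(c))
    variants = {"".join(BETA.get(c, c) for c in letters)}
    for table in (TRANSLIT, TRANSLIT_KH):
        plain = "".join(table.get(c, c) for c in letters)
        variants.add(plain)
        # Rough breathing on a vowel becomes a leading h; initial rho is skipped.
        if ROUGH_BREATHING in decomposed and letters[:1] != "ρ":
            variants.add("h" + plain)
    return variants


if __name__ == "__main__":
    for w in ["θεᾱ́", "ἥλιος", "χάλιξ"]:
        print(w, sorted(ascii_variants(w)))
```

With these assumptions the sketch reproduces the search strings mentioned in the post (qea, thea, elios, hlios, chalix), which illustrates why several schemes are needed: no single scheme yields them all.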

Not sure I see the point. Searching "thea greek" already works (I just tried it), so there's no reason we need to have "qea greek" work as well. Beta code is far more nonobvious than our romanisation, IMO. —Μετάknowledgediscuss/deeds 04:59, 6 July 2016 (UTC)
I think you are quite right. I will delete this if no-one else replies. Thank you! When I started I incorrectly assumed our primary Romanisation was not searchable through the macrons. Isomorphyc (talk) 09:37, 6 July 2016 (UTC)
Edited: @Metaknowledge: I originally did not realise this, but I believe the reason you can search "thea greek" is the template. The reason might be that since the Romanisation appears twice in the page, it outranks other pages which cite it. Try something other than a noun, for example, αἰσθάνομαι via "aisthanomai greek." The problem is that a great many etymologies in derived languages appear before the main entry. The reason I included beta code and other variants was not so much because I like any of them, but because I wanted to include every reasonable asciisation scheme to avoid controversy about which one to use. At the same time, this is a lot of ad-hoc mess for little benefit, and this is a parochial solution to a general problem, in which popular pages with outgoing link Romanisations outrank their link targets in searches. Isomorphyc (talk) 10:50, 6 July 2016 (UTC)
Further edit: The most relevant comparison is with verbs, which remain in their original state: https://en.wiktionary.org/w/index.php?title=Special%3AWhatLinksHere%2FTemplate%3Agrc-verb&hidelinks=1&hideredirs=1 Isomorphyc (talk) 12:55, 6 July 2016 (UTC)
Oh, I see. Well, it could be useful. At the same time, I'm disinclined to do things that are ugly unless they will get a lot of use, which I doubt this will. I suppose you'd best wait for another classically inclined editor's opinion. —Μετάknowledgediscuss/deeds 18:15, 6 July 2016 (UTC)
What would you or others think of repeating ascii versions of all headword Romanisations within non-visible html to make them outrank pages which link to them for ascii-Romanised searches? Clearly, this method works, and what is ugly (and unfair to Modern Greek) is that I am doing it for just one language. The real problem is that we can't (I think) hint to the search function any other way that a Romanisation has almost the force of a headword. The really ugly thing which would solve the problem, but which I would never advocate, is to give the Romanisations their own headwords, as is done in Chinese, which is pretty convenient, but still less so than Greek because it is harder to type in Greek. I know I am suggesting a significant change; I wouldn't want to go forward with any of these options unless at least a few people are ecstatic, but this is not what I am seeing here. Isomorphyc (talk) 19:42, 6 July 2016 (UTC)
I think we should have entries for romanizations of all languages. We seem to go by the rule that lesser-known scripts get their own romanization entry while better-known and more widely used scripts don't. But this argumentation is completely pointless in the face of users who don't understand the script regardless. It doesn't help them in the slightest that the script is well known if they don't know it. —CodeCat 19:46, 6 July 2016 (UTC)
Ultimately, this is probably what we should do. I've categorised that idea under things to deal with in the far future, but it could actually be a great way to increase readership in the short term. It would require a very active romanisation bot, of course, and we'd presumably have to phase in languages one by one (make sure that all the romanisations are good, then create the soft redirects). —Μετάknowledgediscuss/deeds 19:51, 6 July 2016 (UTC)
I took a quick look at a random pageview statistics file amongst the five or ten terabytes available here: [9]. Of the top ten Chinese words visited in that particular hour, all ten were hanzi, not pinyin or another Romanisation. Obviously I could make a more systematic study of this, but the most cursory evidence suggests adding Romanisations will not increase readership much. (Incidentally, Latin words are one of the biggest attractions here). I think this question is not really about readership in general, but about how specific types of users use specific languages of largely academic interest. Isomorphyc (talk) 20:45, 6 July 2016 (UTC)
Of course, the romanizations may not be used nearly as much. But can we get statistics of the romanizations alone? How much are they used? —CodeCat 20:51, 6 July 2016 (UTC)
This will have sampling size issues, time of day issues, language choice issues, etc. but for the file I am looking at, about 12% of Mandarin visits are pinyin and 88% are hanzi. I have most of the dataset locally, so this could be done more systematically or for other languages, of course, with more planning. The numbers are 1492 hanzi entries visited 1922 times, compared to 174 entries visited 261 times in that hour interval for pinyin. Isomorphyc (talk) 21:18, 6 July 2016 (UTC)
In general, Romanisations are 1.2% of pages viewed in this period: 1415 out of 114222. So Mandarin has about a 10x statistical lift for users preferring Romanisations compared to the average language. Isomorphyc (talk) 21:22, 6 July 2016 (UTC)
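The percentages in these two posts follow directly from the raw counts quoted for that single hourly file. A quick Python check, using only those numbers:

```python
# Counts quoted above for one hourly page-view file.
hanzi_views = 1922
pinyin_views = 261
romanisation_views = 1415  # all languages
total_views = 114222

# Share of Mandarin views that went to pinyin entries.
pinyin_share = pinyin_views / (hanzi_views + pinyin_views)
# Share of all views that went to Romanisation entries.
romanisation_share = romanisation_views / total_views
# How much more often Mandarin readers land on Romanisations than average.
lift = pinyin_share / romanisation_share

print(f"pinyin share of Mandarin views: {pinyin_share:.1%}")
print(f"Romanisation share of all views: {romanisation_share:.1%}")
print(f"lift: {lift:.1f}x")
```

This gives roughly 12%, 1.2%, and a lift just under 10, matching the rounded figures in the posts; as noted later in the thread, a single hour is subject to time-of-day bias.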

@Isomorphyc: I am more in favour of including the invisible HTML Romanisations than I am in favour of creating entries for Greek Romanisations. Neither idea fills me with joy, however. For you personally, I recommend learning to type polytonic Greek with your keyboard. — I.S.M.E.T.A. 13:26, 7 July 2016 (UTC)

Hi @I'm so meta even this acronym: I use this method for composition, and it is not too bad, though I find it is not so good for dictionary-type tasks. Pinging also @CodeCat, @Metaknowledge, @JohnC5: following the earlier discussion, I have some statistics for Romanisation. They are quite interesting. Mandarin is the only language which is fully Romanised, and my old numbers were wildly incorrect (because, as I had warned, of time-of-day bias: it was daytime in China, and pinyin was unpopular). Pinyin has received 2m of Mandarin's 4.2m year-to-date page views, mostly during night-time in China. Please see the following chart. The data are summed over all 1H 2016 page views. I have put the table in a user page appendix: User:Isomorphyc/Romanisation_Page_View_Statistics
Several observations: Adding Japanese Romanisations would perhaps yield a 50% increase in Japanese page views. The current convention in Cantonese is to offer Jyutping entries only for characters. Switching to a convention which Romanises all words (as in Mandarin) could perhaps nearly double Cantonese page views. I am not advocating either of these (I don't know much Cantonese or any Japanese), but only mention them; in some ways, wide readership should perhaps be only a secondary objective here. Archaic-script Romanisations are very popular, but a caveat in Gothic is that a great many entries were created in 2016. The script and the Romanisations are surprisingly popular: presumably the script users are clicking on intra-Wiktionary etymologies while the non-script users are arriving from non-script outside references. There is no data on the popularity of Romanisations for languages with widely used scripts (Hebrew, Devanagari, Greek, Cyrillic, etc.). The only user base I can imagine is people (such as myself) who are too lazy to change their keyboard configuration. I believe learners of modern languages only ever use Romanisations for character-based, never script-based, languages. We already widely gloss mentions and lists with Romanisations through templates, so I don't see a case that users are confused. If someone really wants to do this, my suggestion would be to pick one popular-script language, Romanise all of it, and after a reasonable amount of time see whether the usage numbers justify keeping it and proceeding.
More observations (not in the table): Wiktionary has about a billion page views a year, or thirty per second. Fully 8.9% of this traffic is people looking up non-lemma forms in Latin (a further 4.5% is lemma forms). Latin comprises, as I think is widely known, 13.7% of entries and 13.8% of page views. Evidently nearly 10% of Wiktionary by weight and popularity is essentially an easy-to-use Latin stemmer. I suspect the best thing one could do for Greek does not relate to Romanisations, but is instead to create non-lemma entries extensively while ensuring that a core vocabulary of the top 5000 Attic and Koine words is available. I wouldn't mind working on this in the medium term, but for now, to me, the Romanisation searchability seems like very low-hanging fruit.
While I am here, this table may also be of mild interest. User:Isomorphyc/Page_Views_and_Entry_Counts_by_Language_1H_2016
Apologies for the bad formatting, long note and long delay. The raw 2016 data are 3 TB uncompressed, but I can offer a ~200 MB file if anyone would like annualised 2016 data. Thanks for reading. Isomorphyc (talk) 00:07, 12 July 2016 (UTC)
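Two of the headline rates in this post can be checked with a few lines of arithmetic. The Python sketch below uses only the rounded figures quoted (about one billion views a year; 2m of Mandarin's 4.2m year-to-date views going to pinyin) and ignores leap years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year

annual_views = 1_000_000_000  # "about a billion page views a year"
views_per_second = annual_views / SECONDS_PER_YEAR

pinyin_ytd_views = 2.0e6    # pinyin entries, 1H 2016
mandarin_ytd_views = 4.2e6  # all Mandarin entries, 1H 2016
pinyin_ytd_share = pinyin_ytd_views / mandarin_ytd_views

print(f"{views_per_second:.1f} views per second")
print(f"pinyin share of Mandarin views: {pinyin_ytd_share:.1%}")
```

The first figure comes out at a little over thirty per second, and the second shows that pinyin accounted for nearly half of Mandarin views over the half-year, far above the 12% seen in the single daytime hour sampled earlier.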
@Isomorphyc Is there any referrer information (what sites people came here from)? Or any differentiation between internal and external referrers? DTLHS (talk) 02:21, 12 July 2016 (UTC)
@DTLHS: Unfortunately, no. The data are per-page hourly view counts for all Wikimedia projects. I believe this is for privacy concerns. I had wished for the same thing. Isomorphyc (talk) 02:25, 12 July 2016 (UTC)
@Isomorphyc: This information is startlingly interesting and represents the strongest argument I've ever seen for large-scale Romanizations. I've always been a proponent of using native scripts (as evinced by my painstaking construction of AP:Old Italic script), but if extra entries make that much difference, I'd certainly support their broader use. —JohnC5 04:53, 12 July 2016 (UTC)
@JohnC5: For all practical purposes, we do not have data on Romanisation of scripts, only of character systems. I believe the benefit is likely far less, because character systems usually have official Romanisations which are often taught to tourists, beginning students, and casual learners. Russian, Arabic, and Modern Greek are the three most developed non-Roman script languages on Wiktionary. I would see very large issues in creating Romanised headwords for any of them, but it may still be worth a trial. Academic languages (Greek, Sanskrit) I think are a separate category. Isomorphyc (talk) 11:56, 12 July 2016 (UTC)
Modern languages such as Mandarin and Japanese have official romanization standards, so creating romanization entries for the best-known ones isn't that hard. The same is true for languages such as Gothic, with relatively uniform treatment in the literature. Ancient Greek, on the other hand, has a plethora of ad hoc systems that vary in subtle ways: is it chi, khi, or xi? How about xi vs ksi? Or diphthongs: ou or u, ei or i? How is length handled? I would think that, as the predictability of what people would search for diminishes, the benefits from romanization entries would, as well. Even with Mandarin and Japanese, we have Hanyu pinyin and romaji, but not Wade-Giles or Hepburn. Then there are things like beta code: even if the system would let us have entries like fqoggh/ or w)=|, I think they would cause more confusion than they're worth (having entries searchable by such things is another matter). Chuck Entz (talk) 13:31, 12 July 2016 (UTC)
Another idea to throw in the ring: is there any way to create an input method or keyboard of some sort, so people could type beta code into the search box and get Greek to show up? And beta code has standards for other scripts, as well, so it would be useful for many other languages. Chuck Entz (talk) 13:45, 12 July 2016 (UTC)
@Chuck Entz: That idea sounds wonderful! I'm not sure how to implement it, but I'd love to help. —JohnC5 14:21, 12 July 2016 (UTC)
@Chuck Entz: I do not know how to interface with the search bar, or much about JavaScript, but I would be glad to learn if there is a way for me to help. I like this idea very much. Isomorphyc (talk) 01:20, 13 July 2016 (UTC)
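The core of Chuck's idea is a mapping from Beta Code to Unicode Greek; a search-box gadget would have to do the same thing in JavaScript in the browser. The Python sketch below only illustrates that mapping and covers a deliberately small subset, as I understand the TLG conventions: the 24 letters, `*` for capitals, the seven common diacritics (written after a lowercase letter, between `*` and the letter for a capital), and automatic final sigma. Numbered sigmas, punctuation codes and other Beta Code features are left out.

```python
import unicodedata

# Beta Code letters (case-insensitive) mapped to lowercase Greek.
LETTERS = {
    "a": "\u03b1", "b": "\u03b2", "g": "\u03b3", "d": "\u03b4", "e": "\u03b5",
    "z": "\u03b6", "h": "\u03b7", "q": "\u03b8", "i": "\u03b9", "k": "\u03ba",
    "l": "\u03bb", "m": "\u03bc", "n": "\u03bd", "c": "\u03be", "o": "\u03bf",
    "p": "\u03c0", "r": "\u03c1", "s": "\u03c3", "t": "\u03c4", "u": "\u03c5",
    "f": "\u03c6", "x": "\u03c7", "y": "\u03c8", "w": "\u03c9",
}
# Beta Code diacritic symbols mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def _read_marks(text, j):
    """Collect combining marks starting at index j; return (marks, next index)."""
    marks = ""
    while j < len(text) and text[j] in DIACRITICS:
        marks += DIACRITICS[text[j]]
        j += 1
    return marks, j


def beta_to_greek(text):
    """Convert a subset of Beta Code to NFC-normalized polytonic Greek."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            # Capital: '*', then any diacritics, then the letter itself.
            marks, j = _read_marks(text, i + 1)
            if j < len(text) and text[j] in LETTERS:
                out.append(LETTERS[text[j]].upper() + marks)
                i = j + 1
            else:
                out.append(ch)  # malformed capital: keep the asterisk as-is
                i += 1
        elif ch in LETTERS:
            marks, j = _read_marks(text, i + 1)
            letter = LETTERS[ch]
            # Sigma not followed by another letter becomes final sigma.
            if ch == "s" and (j == len(text) or (text[j] not in LETTERS and text[j] != "*")):
                letter = "\u03c2"
            out.append(letter + marks)
            i = j
        else:
            out.append(ch)  # spaces, digits, punctuation pass through
            i += 1
    return unicodedata.normalize("NFC", "".join(out))
```

With this, Chuck's examples `fqoggh/` and `w)=|` come out as φθογγή and ᾦ. Normalizing to NFC matters because page titles are stored precomposed, so a search built on this would need to produce the same code points.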
@Isomorphyc: Where do you find these page view statistics? --WikiTiki89 13:53, 12 July 2016 (UTC)
@Wikitiki89: The raw page-view statistics are here: [10]. There are a few other forms in the same place, with various characteristics. Isomorphyc (talk) 14:06, 12 July 2016 (UTC)

Using template l to link to English entries

FYI, I created Wiktionary:Votes/2016-07/Using template l to link to English entries.

Let us postpone the vote as much as discussion requires. --Dan Polansky (talk) 08:21, 6 July 2016 (UTC)

Imagine me taking off my surgeon's mask as I ask: But...why? Korn [kʰũːɘ̃n] (talk) 10:23, 6 July 2016 (UTC)
Because there may be better ways to do this. E.g. writing the language code "en" is cumbersome and hinders readability. 99.99% of the words linked on the definition lines are to English words, so, we may as well use "en" as a default. But given the stupid order of parameters (language code first? Really?), that is not possible. — Dakdada 10:47, 6 July 2016 (UTC)
I suggested a separate proposal to address the problem you just mentioned. See: Wiktionary talk:Votes/2016-07/Using template l to link to English entries#Separate proposal. --Daniel Carrero (talk) 10:56, 6 July 2016 (UTC)
  • I meant "why is this proposed?" not "why is the vote postponed?" And I prefer templates to take all non-optional parameters first. Korn [kʰũːɘ̃n] (talk) 12:39, 6 July 2016 (UTC)
Ah, sorry I misunderstood. — Dakdada 15:58, 6 July 2016 (UTC)
@Dan Polansky, you said here: "Rationale" - "To be entered by supporters." Are you planning to vote "Oppose"? --Daniel Carrero (talk) 16:22, 6 July 2016 (UTC)

Bot replace Template:etyl with Template:cog?

Is it ok if I run a bot to replace all instances of "{{etyl|xx|-}} {{m|xx|...}}" with "{{cog|xx|...}}", currently categorised in Category:etyl cleanup no target? —CodeCat 20:05, 6 July 2016 (UTC)

I would certainly support this. —JohnC5 20:11, 6 July 2016 (UTC)
How do you know all uses of {{etyl|xx|-}} are intended to be cognates? DTLHS (talk) 20:15, 6 July 2016 (UTC)
What else could they be? —CodeCat 20:17, 6 July 2016 (UTC)
I support and appreciate the change. Please go ahead. --Daniel Carrero (talk) 20:18, 6 July 2016 (UTC)
(e/c) And why would it matter? The template doesn't have to only be used for cognates, even if cognates are its main purpose. --WikiTiki89 20:19, 6 July 2016 (UTC)
I could limit it to just etymology sections for now, if that's better. —CodeCat 20:20, 6 July 2016 (UTC)
Yes, probably limit it to etymology sections (kampung for example) DTLHS (talk) 20:25, 6 July 2016 (UTC)
(e/c) Theoretically, {{etyl}} should only have been used in etymology sections anyway. I would like to know what kinds of other places it's used in. --WikiTiki89 20:26, 6 July 2016 (UTC)
  • I've used {{etyl}} in ====Usage notes==== sections, such as in "compare with English [SOME OTHER TERM]..." constructions. ‑‑ Eiríkr Útlendi │Tala við mig 20:32, 6 July 2016 (UTC)
    Well that's clearly not the intended purpose of {{etyl}}, and I would even say {{cog}} is preferable there. So there is no need to limit this replacement to etymology sections. --WikiTiki89 20:38, 6 July 2016 (UTC)
  • The intention of the template is not clear from its documentation. The existence of the second argument, which explicitly avoids categorization, lends this template to broader use than just in ===Etymology=== sections. Even the name etyl, described as coming from etymological language, suggests that it could be used wherever an editor wants to specify a given term's language.
Before embarking on any bot-driven overhaul of how {{etyl}} is used, I would strongly recommend first finding out where and how it is actually used, based on the hard data available in a dump, rather than just relying on our own individual assumptions. ‑‑ Eiríkr Útlendi │Tala við mig 21:43, 6 July 2016 (UTC)
At worst, the bot run will convert some incorrect uses of {{etyl}} to equally incorrect uses of {{cog}}. We can do such an analysis after the run, too. —CodeCat 21:46, 6 July 2016 (UTC)
  • @CodeCat, so long as the xx in both {{etyl}} and the following {{m}} match, this is fine by me. There have been cases where {{etyl|ojp|...}} is followed by {{m|ja|...}} due to the unresolved status of Old Japanese entries, both here and in Japanese lexicography in general (i.e. there isn't as clear a distinction between the two; many bigger dictionaries of modern JA include obsolete terms that could technically qualify as OJP, and current terms that have specific OJP or Classical senses). ‑‑ Eiríkr Útlendi │Tala við mig 20:36, 6 July 2016 (UTC)
    I would say it's better to put something like "from Old Japanese (compare Japanese XYZ)". --WikiTiki89 20:39, 6 July 2016 (UTC)
  • So far as I know, we do not (yet) have any OJP entries. We do have plenty of JA entries, and in some cases, the JA and OJP differ mainly in conjugation patterns and idiomatic usage. Monolingual JA dictionaries will often put OJP and JA content into a single entry, indicating obliquely in the header that the older forms have a different conjugation. "Compare" doesn't quite seem correct in these cases. ‑‑ Eiríkr Útlendi │Tala við mig 21:54, 6 July 2016 (UTC)
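CodeCat's proposed replacement, with Eiríkr's condition that the language codes in {{etyl}} and {{m}} must match, can be expressed as a single regular expression with a back-reference. This Python sketch assumes that {{cog}} accepts the same trailing parameters as {{m}} (term, alternative form, gloss), which should be checked against its documentation. It does not cover selecting pages from Category:etyl cleanup no target, and it skips any {{m}} call containing nested templates so those can be reviewed by hand.

```python
import re

# {{etyl|xx|-}} followed by {{m|xx|...}}. The back-reference \1 only matches when
# both templates carry the same code, so {{etyl|ojp|-}} {{m|ja|...}} is left alone.
ETYL_M = re.compile(r"\{\{etyl\|([a-z-]+)\|-\}\}\s*\{\{m\|\1\|([^{}]*)\}\}")


def etyl_m_to_cog(wikitext):
    """Rewrite matching pairs as {{cog|xx|...}}; return (new text, number of changes)."""
    return ETYL_M.subn(r"{{cog|\1|\2}}", wikitext)
```

Returning the replacement count makes it easy to log how many changes each page received when reviewing test edits.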

I've made a few test edits with the script, and it seems to work OK. All those Malayo-Polynesian etc. entries that are listed right at the start of the category use {{etyl}} in Descendants sections; we'll probably want to fix them. There are a few I've already encountered that use it in other sections where it could theoretically be replaced with {{cog}}, such as in -ains. —CodeCat 20:43, 6 July 2016 (UTC)

But like I said, {{cog}} is not any more wrong than {{etyl}} there. So it's safe to replace them. --WikiTiki89 20:50, 6 July 2016 (UTC)
Ok, I'll remove the section restriction. —CodeCat 20:52, 6 July 2016 (UTC)
  • Query: does {{cog}} add any categories? If so, which ones? If not, should it?
I'm also a little puzzled by the apparently cavalier attitude for using {{cog}} for relationships that are not cognates. This confuses things unnecessarily. ‑‑ Eiríkr Útlendi │Tala við mig 21:59, 6 July 2016 (UTC)
It does not add categories. Also, we've used etyl cavalierly to mark relationships that are not etymological; this is a step in the right direction. —Μετάknowledgediscuss/deeds 22:05, 6 July 2016 (UTC)
  • {{etyl}} makes more sense to me than {{cog}} for marking the language of a given term.
And it is unclear to me how changing to {{cog}} "is a step in the right direction". Although I disagree that use of {{etyl}} has been cavalier, even granting that, swapping one apparent confusion for another does not strike me as progress. I note also that the documentation for {{cog}} explicitly states that this template is intended to mark cognate relationships, and that it is intended for use solely in ===Etymology=== sections. What's described in this thread here goes well beyond that stated scope. ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 6 July 2016 (UTC)
Please look at how the templates work. The advantage of {{cog}} is that it links to both the language and the term, thus serving the purposes of two templates (less markup means easier for editors to read and easier for bots to manipulate, while eliminating mismatch error). —Μετάknowledgediscuss/deeds 22:44, 6 July 2016 (UTC)
If the naming of the template is an issue, we can create an exactly identical template with a different name and documentation, or we can rename this one. Either way, this doesn't affect the bot run. —CodeCat 23:10, 6 July 2016 (UTC)
  • The functionality (combining two templates into one with essentially the same output → less markup) I agree with. The naming of {{cog}} and its documentation make me think that cog as a name here is inappropriate for this current use, which has nothing to do with cognates. If a different template with a different name could be used for this instead of {{cog}}, my current concerns would be resolved. ‑‑ Eiríkr Útlendi │Tala við mig 01:17, 7 July 2016 (UTC)
So, for example in almagra, you want to change the current etymology from "From Spanish almagra, almagre, from Arabic المُغْرَة ‎(al-muḡra, “red clay or earth”)" into "From Spanish en almagra, almagre, from Arabic en ‎(en) المُغْرَة ‎(al-muḡra, “red clay or earth”)", and you want to remove the categories Category:English terms derived from Spanish and Category:English terms derived from Arabic altogether?
Why don’t you want to keep those categories, and what are you going to do with the en and en ‎(en)? (note: en is a language code, so it will vary according to the language.) —Stephen (Talk) 07:33, 7 July 2016 (UTC)
I think you don't understand this. Please look carefully at what CodeCat wrote at the very beginning, so you can see that it doesn't apply to uses of {{etyl}} to show etymological derivation. —Μετάknowledgediscuss/deeds 07:42, 7 July 2016 (UTC)
Oh, just in Category:etyl cleanup no target, not necessarily "replace all instances of". —Stephen (Talk) 07:58, 7 July 2016 (UTC)

I've run the bot now, and it made a good number of edits, but there are still 5000 cases remaining. I spotted some regularly occurring patterns that a bot could also fix:

  • {{etyl|xx|-}} ''[[foobar]]'', optionally with a language as the anchor.
  • {{etyl|xx|-}} {{l|xx|...}}, presumably added by editors who don't know the difference between the templates.
  • {{etyl|xx|-}}: {{l|xx|...}} in a descendants section. Seems to occur mostly in Malayo-Polynesian languages.

CodeCat 16:03, 8 July 2016 (UTC)
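Before a second bot run, it may help to count how often each of CodeCat's three patterns occurs. The Python sketch below does that and converts only the first pattern, and only when the link's display text (if any) equals its target, so that custom display text is never lost. The pattern names are hypothetical labels. Descendants lines are only counted, because the thread does not settle what they should become. As with any {{cog}} rewrite, this assumes {{cog|xx|term}} is the intended replacement.

```python
import re
from collections import Counter

ETYL = r"\{\{etyl\|([a-z-]+)\|-\}\}"
PATTERNS = {
    # {{etyl|xx|-}} ''[[foobar]]'', optionally ''[[foobar#Language|foobar]]''
    "italic_link": re.compile(ETYL + r" ''\[\[([^\]#|]+)(?:#[^\]|]*)?(?:\|\2)?\]\]''"),
    # {{etyl|xx|-}} {{l|xx|...}} with the same code in both templates
    "etyl_l": re.compile(ETYL + r" \{\{l\|\1\|([^{}]*)\}\}"),
    # {{etyl|xx|-}}: {{l|xx|...}}, typically in Descendants sections
    "descendant": re.compile(ETYL + r": \{\{l\|\1\|([^{}]*)\}\}"),
}


def count_patterns(wikitext):
    """Count occurrences of each pattern in a page's wikitext."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] += len(pattern.findall(wikitext))
    return counts


def convert_italic_links(wikitext):
    """Turn {{etyl|xx|-}} ''[[term]]'' into {{cog|xx|term}}; return (text, count)."""
    return PATTERNS["italic_link"].subn(r"{{cog|\1|\2}}", wikitext)
```

Summing `count_patterns` over the remaining pages would show which pattern is worth automating first.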

I don't understand the difference between the templates either. What do {{l}} and {{m}} actually do differently? Korn [kʰũːɘ̃n] (talk) 17:14, 8 July 2016 (UTC)
See Help:Language sections#Linking to language sections. --Daniel Carrero (talk) 17:16, 8 July 2016 (UTC)
{{m}} italicizes Latin-script terms and transliterations of non-Latin script terms. {{l}} does not. --WikiTiki89 18:18, 8 July 2016 (UTC)
Strictly speaking, it doesn't italicise, but it tags it with the "mention" CSS class, and the CSS then gives it italic formatting. The distinction matters when users start making custom CSS. —CodeCat 18:37, 8 July 2016 (UTC)
@Korn See also Wiktionary:Style_guide#Styling_templates, a section I wrote which shows all (or most) of the styling templates and where to use them. In short, {{l}} is used for lists and {{m}} in running text, and as mentioned, the latter italicizes (usually) but the former doesn't. Benwing2 (talk) 20:56, 8 July 2016 (UTC)

@CodeCat I have thought also about running a bot to convert instances of LANG {{m|xx|...}} to {{cog}}, where LANG and xx agree, e.g. Serbo-Croatian {{m|sh|...}}. This has to be done carefully; one idea is to look for lists of terms (e.g. LANG1 {{m|xx|...}}, LANG2 {{m|yy|...}} ... and LANGN {{m|zz|...}}, probably with additional smarts to allow for parenthesized terms in the list) and only convert them when the immediately preceding text says "[Cc]ompare" or "[Cc]ognate with" or certain other expressions. Benwing2 (talk) 21:03, 8 July 2016 (UTC)

The smarter it is, the more potential there is for errors or oversights - don't write code smarter than yourself, and don't overestimate how smart you are. So I'd prefer simpler heuristics if possible. Limiting it to Etymology sections is a good start. —CodeCat 21:25, 8 July 2016 (UTC)
Sorry, I definitely meant it to be limited to Etymology sections; that's clear. As for the rest of it, I don't think this is terribly over-clever code, and I've written bot code like this before without too much problem. You just have to be careful and review a bunch of the subs (before actually saving anything) to make sure it's behaving like you want. Benwing2 (talk) 23:32, 8 July 2016 (UTC)
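Benwing's idea and CodeCat's preference for simple heuristics can be combined conservatively: only look at text between a trigger phrase ("Compare" or "Cognate with") and the next full stop, and only convert LANG {{m|xx|...}} when LANG is exactly the canonical name for xx. The Python sketch below assumes the Etymology section text has already been extracted, uses a tiny hypothetical `LANG_NAMES` table in place of the full list of codes, and does not handle parenthesized terms or full stops inside template arguments.

```python
import re

# Hypothetical excerpt of a code-to-name table; a real bot would need the full
# list of Wiktionary language codes and canonical names.
LANG_NAMES = {"de": "German", "nl": "Dutch", "sh": "Serbo-Croatian", "la": "Latin"}

# Trigger phrase, then everything up to the next full stop.
TRIGGER = re.compile(r"((?:[Cc]ompare|[Cc]ognate with)\b)([^.]*)")
# A capitalized language name (possibly several words) followed by {{m|code|...}}.
MENTION = re.compile(r"([A-Z][\w-]*(?: [A-Z][\w-]*)*) \{\{m\|([a-z-]+)\|([^{}]*)\}\}")


def _convert_mention(match):
    name, code, args = match.groups()
    if LANG_NAMES.get(code) == name:
        return "{{cog|%s|%s}}" % (code, args)
    return match.group(0)  # name and code disagree, or code unknown: leave for review


def convert_compare_lists(etymology_text):
    """Convert 'Compare LANG {{m|xx|...}}, ...' lists to {{cog}} where LANG matches xx."""
    def convert_segment(match):
        return match.group(1) + MENTION.sub(_convert_mention, match.group(2))
    return TRIGGER.sub(convert_segment, etymology_text)
```

In a sentence like "Compare German {{m|de|Wasser}}, Dutch {{m|nl|water}} and Latin {{m|de|unda}}", the first two mentions are converted and the mismatched third one is left untouched, as is any "From Dutch {{m|nl|...}}" outside a trigger phrase.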

New abuse filter for canned edit summaries?

Wikipedia has a list [11] of edit summaries commonly used by vandals: Added content; Added; Fixed typo; Typo; Fixed grammar; Grammar; I made it better. I don't know whether there are any specific vandalism tools or help-sheets out there that use these phrases, but I've seen them (identically, with the initial capital) around Wiktionary too. Perhaps an abuse filter to tag them is in order? Equinox 21:00, 7 July 2016 (UTC)

The reason vandals use these edit summaries is because normal editors use them too. I'm not sure how effective it would be to tag them. --WikiTiki89 21:37, 7 July 2016 (UTC)
I'm not sure that normal editors do use them. I've never seen "Added content" on a legit edit, but often on bad ones. Equinox 21:45, 7 July 2016 (UTC)
Maybe not "added content", but "fixed typo" and "fixed grammar" are certainly used by normal editors. --WikiTiki89 21:48, 7 July 2016 (UTC)
While "added content" is usually a red flag, I do see a small number of good edits with it all the time. The same goes for "fixed typo", though the size-change criterion is a helpful (but not infallible) additional indicator. Just plain "Fixed" is somewhat reversed in terms of vandalism-to-good-edit ratio. On the other hand, I've never seen a good edit accompanied by "I made it better", or by anything using the pronoun "I". By the way, I think the reason "added content" is used so much by vandals is that it's very vague and seems innocuous. While we're at it, I think any edit summary that includes lol, lulz, or variants should definitely be flagged, too. Chuck Entz (talk) 03:43, 8 July 2016 (UTC)
I have seen "fixed typo" used by so many vandals on WP. I don't recall whether I've seen it here or not. We could tag edits by new users that used them and check the tag log after a while and see whether it was catching enough vandalism (and a high enough ratio of vandalism to helpful edits) to be worth continuing to tag. - -sche (discuss) 22:23, 7 July 2016 (UTC)
Typos are small, so any edit glossed "fix typo" that changes more than, say, six bytes is very suspect. Equinox 22:30, 7 July 2016 (UTC)
  • I use 'typo', 'fixed typo', 'grammar mistake', 'general improvements' and 'expanded'/'extended' with great regularity. And while Germanic, I'm not a vandal. I just make a lot of typos. If you don't want to become scholars of my works, maybe the tagging should exclude autopatrollers. Korn [kʰũːɘ̃n] (talk) 22:41, 7 July 2016 (UTC)
Tags: if a certain tag fires too often, it can easily be turned off. It's worth trying these to see if they are useful. —Justin (koavf)TCM 03:30, 8 July 2016 (UTC)
Good idea. The filter can combine the edit summary with other observable characteristics of the edit, like, quoting Equinox, 'so any edit glossed "fix typo" that changes more than, say, six bytes is very suspect'.
A rather unrelated idea: I would prevent anons from making edits that remove more than, say, 10 bytes. --Dan Polansky (talk) 09:36, 10 July 2016 (UTC)
@Dan Polansky: I think that a policy restricting the bytes that an IP can edit is hostile to users who don't wish to have an account. We should welcome anyone to edit, even if he doesn't want a pseudonym associated with his edits. —Justin (koavf)TCM 14:04, 10 July 2016 (UTC)
It's about bytes removed, not bytes added. --Dan Polansky (talk) 14:10, 10 July 2016 (UTC)
That sounds much too restrictive to me: there are plenty of good reasons to remove material, e.g. cutting down waffle/verbosity, or fixing the vandalism of other anons. Equinox 14:20, 10 July 2016 (UTC)
@Dan Polansky, Equinox: For that matter, someone may make a very helpful edit by removing a lot of text and moving it to the Citations namespace or replacing it with a template. I agree that it can still trigger a tag but not restricting the ability to do it altogether. For that matter, someone can remove 100kb of data and then add back 106kb of junk. The absolute difference in data is not a good metric of quality, hence just tagging it rather than stopping it altogether. It still requires a lot of human discretion. —Justin (koavf)TCM 18:11, 10 July 2016 (UTC)
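The tagging heuristics proposed in this thread can be combined in one rule: tag canned summaries, tag "typo" edits whose size change exceeds six bytes, tag joke words such as lol or lulz, and exempt autopatrollers as Korn suggested. Actual filters are written in the AbuseFilter extension's own rule language; this Python sketch only illustrates the logic, and its `Edit` fields are hypothetical stand-ins for what a filter can see. In line with Justin's point, it only returns tags and never blocks an edit.

```python
from dataclasses import dataclass

# Canned summaries from the Wikipedia list quoted at the start of the thread.
CANNED_SUMMARIES = {
    "added content", "added", "fixed typo", "typo",
    "fixed grammar", "grammar", "i made it better",
}
JOKE_WORDS = {"lol", "lulz"}
TYPO_MAX_BYTES = 6  # Equinox's suggested threshold for a plausible typo fix


@dataclass
class Edit:
    summary: str
    byte_delta: int       # new page size minus old page size
    autopatrolled: bool   # True for users exempt from review


def tags_for(edit):
    """Return a set of review tags; tags only flag an edit, they never block it."""
    if edit.autopatrolled:
        return set()
    summary = edit.summary.strip().lower()
    words = {w.strip("!?.,;:") for w in summary.split()}
    tags = set()
    if summary in CANNED_SUMMARIES:
        tags.add("canned summary")
    if "typo" in summary and abs(edit.byte_delta) > TYPO_MAX_BYTES:
        tags.add("large typo fix")
    if words & JOKE_WORDS:
        tags.add("joke summary")
    return tags
```

As noted in the thread, the byte difference alone is a weak signal (someone can remove 100 KB and add back 106 KB of junk), which is why it only contributes a tag for human review.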

Old Gutnish

I've noticed a few redlinks to Old Gutnish in Proto-Germanic entries. Does anyone know if it is distinctive enough to have its own code, or should it be merged into Old Norse/modern Gutnish? KarikaSlayer (talk) 00:48, 8 July 2016 (UTC)

It's considered a dialect of Old Norse, but quite distinct. Old Norse is considered to be split into three main dialect areas: East, West and Gutnish. Some of the idiosyncrasies of Old Gutnish survive into modern Gutnish, including in particular the triphthong jau. —CodeCat 01:02, 8 July 2016 (UTC)

Compact Language Links enabled in this wiki today

Screenshot of Compact Language Links interlanguage list

Compact Language Links has been available as a beta-feature on all Wikimedia wikis since 2014. With compact language links enabled, users are shown a much shorter list of languages on the interlanguage link section of an article (see image). Based on several factors, this shorter list of languages is expected to be more relevant for them and valuable for finding similar content in a language known to them. More information about compact language links can be found in the documentation.

From today onwards, the compact language links feature has been enabled as the default listing of interlanguage links on this wiki. However, using the button at the bottom, you will be able to see a longer list of all the languages the article has been written in. The setting for this compact list can be changed by using the checkbox under User Preferences -> Appearance -> Languages.

The compact language links feature has been tested extensively by the Wikimedia Language team, which developed it. However, in case there are any problems or other feedback, please let us know on the project talk page. It should be noted that on some wikis an existing older gadget that was used for a similar purpose may interfere with the compact language list. We would like to bring this to the attention of the admins of this wiki. Full details are on this phabricator ticket. Thank you. On behalf of the Wikimedia Language team:--Runa Bhattacharjee (WMF) (talk) 03:12, 8 July 2016 (UTC)

Company names in Russian?

@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter I'm pretty sure Wiktionary's policies don't allow things like company names e.g. Газпром (Gazprom), Лукойл (Lukoil), Роснефть (Rosneft) to be inserted as lemmas. For languages like Russian this may be slightly problematic as it's often useful to be able to give the pronunciation, stress, gender, declension etc., some or all of which may be unpredictable. Some but not all of this information is found in Wikipedia. Any solution? Benwing2 (talk) 04:44, 8 July 2016 (UTC)

I think this would fall under WT:BRAND, which in practice allows major company/brand names to be included (basically if they might be mentioned in passing, without talking specifically about the commerce side). Equinox 04:47, 8 July 2016 (UTC)
Thanks for pinging, but it's not up to me. I find some company names interesting as well, but they get deleted, e.g. 宏碁宏棋 (Hóngqí, “Acer”). --Anatoli T. (обсудить/вклад) 06:11, 8 July 2016 (UTC)
I don't think there is anything special about Russian with regard to company and brand names. Even in English, it is not always clear from the spelling of these names how they are supposed to be pronounced. The reasons for excluding most company and brand names from English and other languages apply equally to Russian. But as Equinox said, some of these names can still pass. --WikiTiki89 14:33, 8 July 2016 (UTC)

Future of Wiktionary and its interface

I want to ask Wiktionary users to think about this more seriously. I myself don't have any alternative in mind right now (I only know that we probably need something new, built from scratch), but I think we should bring up discussions like this every now and then, even if we don't make any apparent progress right now. It's a shame that many of us don't bother to think about what the project should look like in the long run, when it is relevant to our very own efforts.

Wiktionary is not user-friendly, and it looks relatively time-consuming to edit for most people, especially for beginners (note that the bulk of our users are either quite active or quite inactive). I feel that what we currently have is not what our editors deserve. Flexibility is good at the beginning of things, but I think our project has reached enough maturity to finally decide on a less "flexible" and more systematic way to record and display information. I noticed this when I saw that we are trying to make even Etymology, Descendants and Pronunciation sections more systematic.

P.S. Consider Wikidata as well. --Z 17:27, 9 July 2016 (UTC)

Our hands are tied. We don't have control over anything of importance in terms of how this website works, and we have very few people who are willing and able to put enough time into new user-friendly JS or beginner-friendly templates. Wikidata is a must for interwikis, but otherwise I have not seen much prospect for its utility. From my perspective, all I can do is keep adding content. —Μετάknowledgediscuss/deeds 18:25, 9 July 2016 (UTC)
From what I can gather, Wikidata would rather move ahead with their own pseudo-dictionary project rather than touch anything on this site. DTLHS (talk) 18:28, 9 July 2016 (UTC)
@DTLHS: Can you tell me what you mean by this? Can you point me to some relevant pages? —Justin (koavf)TCM 20:47, 9 July 2016 (UTC)
See the wikidata page on "water". There are many translations, with glosses defined in the target language. They are attempting to do many of the same things we do. Notice that there are links to every other Wikimedia project but not Wiktionary. DTLHS (talk) 21:01, 9 July 2016 (UTC)
@DTLHS: Well, see my comment at d:Wikidata:Project chat. The problem is that interwiki links mean two different things only for this project but not for others. For every other project, an interwiki link would be the same idea (wikt:en:foot to wikt:es:pie) but for this one, it would also be good to link to the same idea (a translation) but also the same term (so from wikt:en:foot to wikt:es:foot). —Justin (koavf)TCM 03:31, 10 July 2016 (UTC)
  • @Justin (koavf), what you propose represents a huuuuge problem -- the correct translation of a single term in isolation, into a single term in some other language, is often flat-out impossible. Which sense of water would you link to in the other languages? What about metal? Even compound terms can introduce ambiguity -- for heavy metal, would you link to the corresponding entry for the chemical sense, or the musical sense? This impossibility is precisely why Wiktionary interwiki links only target the corresponding entry for that same spelling in that same script. ‑‑ Eiríkr Útlendi │Tala við mig 00:38, 13 July 2016 (UTC)
@Eirikr: Agreed--it's very difficult to do this. This is why OmegaWiki uses a different database entry per meaning, whereas we have an entirely different page per term/character/etc. —Justin (koavf)TCM 01:24, 13 July 2016 (UTC)
Even meanings are problematic: for every clear, discrete concept like a chemical substance or a species, there are a hundred others that are really clouds of interacting sub-concepts, with the relative importance of each sub-concept influenced by context or personal style, or phases of the moon... or something. These terms don't quite mean exactly the same thing any two times they're used, and the ambiguity can be just as important as the core meanings. This is the stuff of poetry and humor, and pinning it down precisely will often kill it. Translation is an art, not a science, and even a mega-corporation like Google with all the resources it puts into Google Translate can't make computer translation accurate. It's been a few decades since I took Semantics, but I don't think this is going to be solved any time soon- we just keep progressively halving the distance to infinity. Chuck Entz (talk) 02:33, 13 July 2016 (UTC)
@DTLHS: That's not intended to be a dictionary in any form. Needing labels corresponding to topics that need to be used by Wikipedia is incidental, and Wikidata does not currently include any effort to build a database of lexical/linguistic content outside of that. However, there are plans to build one in the future, so that the Wiktionary projects can make use of it. See the most recent proposal for Wiktionary-Wikidata integration. These efforts have not yet begun. --Yair rand (talk) 20:29, 17 July 2016 (UTC)
@Metaknowledge: If you have any ideas for some scripts to make things more user-friendly, I'd be glad to write the code.
Re Wikidata, I think it will be very useful for storing things like hard-to-manage transcriptions (those which are currently scraped out of entries by templates), pronunciations (which could then be shared across projects), quotations, and maybe etymological data. Eventually, we'll probably be able to move basically everything over there without disrupting everyone's workflows, but that's probably many years away. --Yair rand (talk) 20:39, 17 July 2016 (UTC)
@Yair rand: Right now most of my wishlist is for editors, not users, but if that offer stands, I'll try to remember to tell you when something comes up. And I'm still suspicious of our future at Wikidata — if they really wanted to do this the right way in the foreseeable future, they'd try to get us on board and invite discussion here. —Μετάknowledgediscuss/deeds 20:45, 17 July 2016 (UTC)
  • Is there any context for this? Because I've little idea what exactly the topic is. Korn [kʰũːɘ̃n] (talk) 18:28, 9 July 2016 (UTC)
There is a possible way forward. We could write a parser for the syntax of the site as it currently exists (this is not easy!). If we have a parser, we can represent the data contained here independently of our own peculiar conventions, and bootstrap our way towards something better. Without a parser we can only make incremental changes, slowly, and mostly by hand. DTLHS (talk) 18:34, 9 July 2016 (UTC)
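As a rough illustration of the first, easiest layer of such a parser, the Python sketch below splits an entry's wikitext into language sections (level-2 headings) and their subsections. It is not an existing tool, and it makes big simplifications: every heading deeper than level 2 is treated as a flat subsection (so nesting such as Noun under Etymology 1 is lost and repeated subheadings are merged), text before a language's first subheading is ignored, and templates are left as raw strings. Handling templates and the many local conventions is where the real difficulty lies.

```python
import re

# A heading line such as "==English==" or "===Noun===": same run of "=" on both sides.
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def parse_sections(wikitext):
    """Split an entry into {language: {subheading: text}} (simplified, flat)."""
    entry = {}
    language = heading = None
    for line in wikitext.splitlines():
        if line.strip() == "----":  # horizontal rule separating language sections
            continue
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            if level == 2:
                language, heading = title, None
                entry.setdefault(language, {})
            elif language is not None:
                heading = title
                entry[language].setdefault(heading, [])
            continue
        # Only lines under some subheading of some language are kept.
        if language is not None and heading is not None:
            entry[language][heading].append(line)
    return {
        lang: {h: "\n".join(lines).strip() for h, lines in sections.items()}
        for lang, sections in entry.items()
    }
```

Even this flat structure would let a bot address "the Noun section of the Dutch entry" directly instead of by position in the page text.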
The wikidata "water" page addresses part of definition 1 at [[water]]. I think Wiktionary would provide a large, useful set of ambiguations (polysemies) to get them out of what looks like the narrow set of concepts they are covering so far. Perhaps they should work with WordNet or another semantic net until they are ready to work with the complications of natural languages. DCDuring TALK 21:22, 9 July 2016 (UTC)

Hi all, we do talk about this topic on the French Wiktionary. That's why we went to Wikidata to add Translate into all conversations about Wiktionary. Well, I see two different directions. The first is technical improvement, which may go through a lexical database, and we are the ones who have to discuss this, in order to scope what can be integrated. The second aspect is the interface, and in my opinion Visual Editor is the way. OK, I know this tool is far from perfect now, as it was at the beginning for Wikipedia. Well, we can change that. Recently I wrote up an idea I had during Wikimania on the Visual Editor talk page, and it resulted in a proposal on Phabricator. It's something small, and adapting this tool to each project will be a long-term project, but I think we can do it. Noé (talk) 12:17, 10 July 2016 (UTC)

Renaming male -> masculine, female -> feminine in {{given name}}?

Enoshd (talkcontribs) suggested this and refers to w:Sex and gender distinction. He thinks both the text and categories should change. Changing the text is easy but renaming the categories will require a bot probably. Comments? Benwing2 (talk) 22:37, 9 July 2016 (UTC)

FYI: Wiktionary:Votes/pl-2010-01/Renaming given name appendixes, Wiktionary:Votes/2009-12/Masculine and feminine given names. --Dan Polansky (talk) 09:04, 10 July 2016 (UTC)
Pointless; we already have a sense at male that says "Belonging to the masculine gender (social category)." Equinox 12:20, 10 July 2016 (UTC)
We should make a distinction between grammatical and personal gender. These names' gender is personal, and possibly also grammatical but not necessarily (diminutives in Dutch are neuter, names are no exception). Since we already have "feminine nouns" categories and such for some languages, we should use "male" and "female" here to distinguish them. —CodeCat 12:58, 10 July 2016 (UTC)
Cecil is a male given name, but it isn't very masculine. Masculine (and feminine) would create ambiguity. Renard Migrant (talk) 14:43, 15 July 2016 (UTC)
Are you calling Cecil Rhodes a sissy? —Aɴɢʀ (talk) 15:27, 15 July 2016 (UTC)
+1 to what CodeCat said. For example, cailín is a masculine word, but a female given name (google e.g. Cailin Mcloughlin). Also, as Equinox points out, "male" and "female" are sufficiently polysemous to be acceptable. - -sche (discuss) 19:36, 15 July 2016 (UTC)
The Sex and gender distinction is irrelevant here, since no language actually makes that distinction (if you look at our entries for sex and gender, both words can refer to both concepts). Like others have said, there is a completely separate distinction relevant to us that we call grammatical gender vs. whateverthehellyouwanttocalltheotherone (and both "sex" and "gender" fit into the latter). It is true that in many languages the latter often strongly influences the former, but we still need to maintain the distinction, and the way we've been doing that is by using masculine and feminine for the former and male and female for the latter. --WikiTiki89 19:50, 15 July 2016 (UTC)

Do we need to mark accent in Hebrew words? Also, why don't we add redirects for Hebrew romanizations? (among other notes)

1. The accents and stress are very predictable in Hebrew, to the point of graphic accents being unnecessary.

2. Hebrew words are often understood in their romanized form(s), and they aren't easy to search for without a Hebrew keyboard. Why not do for Hebrew what we do for Gothic and Japanese?

3. Do begadkefat letters need to be marked with a tag? There are only six, and it would be redundant to mark them in every single word.

4. Nobody uses the CCaCtem/CCaCten meter in the past pa'al due to analogy, nor does anyone use feminine plurals outside of the present tense. Shouldn't space be dedicated to gerunds and, in the case of pa'al, past participles?

5. Shouldn't irregular verbs that are technically not weak, like "konen", be marked?

6. Why is tzade always transliterated ts rather than tz? Nobody spells the language like that. Nor does anyone use kh for heyt, although if there is a /X/~/X\/ distinction in the orthography, I understand having a /?/~/?\/ distinction too, using ' and ` respectively. Maybe there should be both precise and common transliterations? —This unsigned comment was added by Zontas (talkcontribs).

@Zontas: Re #2: For what it's worth, the Japanese actually use romaji. I agree that we should have redirects for these, though, or pages which read something like "Latinization of [term]" and which are maybe edit-protected. —Justin (koavf)❤T☮C☺M☯ 22:38, 13 July 2016 (UTC)
Yeah, but the Goths never romanized their language and how would anyone find a word without a keyboard dedicated to Hebrew? —This unsigned comment was added by Zontas (talkcontribs).
My answer:
  1. That's not quite true. Most of the time the accent is predictable (in native words), but in many cases it is not. Minimal pairs include תואר \ תֹּאַר ‎(tó'ar, title) vs. תואר \ תֹּאַר ‎(to'ár, he was described) and בָּנוּ ‎(bánu, in us, also they understood) vs. בָּנוּ ‎(banú, they built). I agree that including accents on all words is superfluous, it would have been better to mark accents only when not on the final syllable. But marking the accent on all words is what was decided upon before my time here.
  2. What do you mean by "are often understood"? Gothic and Japanese each have their own peculiarities that make romanizations desirable as entries. In both cases there were votes in order to allow them to be treated as exceptions. Our default policy is that romanizations are not words and do not deserve entries and I don't think Hebrew is special enough to be treated differently. Searching can be solved by using the MediaWiki software keyboard, various online keyboards, copy/pasting, enabling the Hebrew keyboard that you may not have known came with your computer, etc.
  3. Are you referring to the usage note we include? I agree that they are too common for this to be necessary. I don't add them, but I don't really have a problem with others adding them.
  4. I think our conjugation tables mention that. If not, my new conjugation module that I'm working on should address the issue. But remember that our Hebrew entries cover Hebrew from all periods, including Biblical, Mishnaic, Medieval, and Modern.
  5. I don't understand what you mean.
  6. Firstly, keep in mind that American Jews' transliterations of Hebrew words are not Hebrew. People that actually write in Hebrew do so in Hebrew letters. I also have advocated a dual romanization system for Hebrew -- a scholarly one and a modern one -- which I use sometimes in various places. We use ts and kh because that was the common practice before my time here; theoretically, it makes more sense, even though it looks weird to us American Jews who are used to tz and ch. For kh, there is also the added benefit of leaving ch free to indicate the צ׳ sound (as in צ׳יק צ׳ק ‎(chik chak)).
--WikiTiki89 00:35, 14 July 2016 (UTC)
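To make the alternative WikiTiki89 mentions in point 1 concrete (marking the accent only when it is not on the final syllable), here is a minimal Python sketch. It assumes the romanization has already been split into syllables and the stressed syllable is known; `romanize_with_stress` and its `mark_final` switch are hypothetical names, and `mark_final=True` reproduces the current practice of marking every word.

```python
import unicodedata

VOWELS = set("aeiou")
ACUTE = "\u0301"  # combining acute accent


def romanize_with_stress(syllables, stressed, mark_final=False):
    """Join romanized syllables, adding an acute to the stressed vowel.

    Final-syllable stress is left unmarked unless mark_final is True.
    """
    if stressed == len(syllables) - 1 and not mark_final:
        return "".join(syllables)
    parts = []
    for i, syllable in enumerate(syllables):
        if i == stressed:
            # Put the accent on the first vowel letter of the stressed syllable.
            for j, ch in enumerate(syllable):
                if ch in VOWELS:
                    syllable = syllable[: j + 1] + ACUTE + syllable[j + 1 :]
                    break
        parts.append(syllable)
    # NFC turns vowel + combining acute into precomposed letters such as "á".
    return unicodedata.normalize("NFC", "".join(parts))


print(romanize_with_stress(["ba", "nu"], 0))  # bánu (in us)
print(romanize_with_stress(["ba", "nu"], 1))  # banu (they built), stress unmarked
print(romanize_with_stress(["ba", "nu"], 1, mark_final=True))  # banú
```

Under this scheme the minimal pairs stay distinguishable, because only the non-final member carries a mark.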
WT:REDIRECT advises against redirects, and they can be confusing, especially from a Latin to a non-Latin script. Also, you'll get collisions where two words, say Chinese and Hebrew, have the same romanisation, and an entry title can't redirect to both. Thirdly, there are often rival romanisation schemes for languages; Chinese has loads, doesn't it? So one word might have seven different romanisations. Bear in mind we have a search function that will pick up romanisations in an entry's text. Renard Migrant (talk) 14:40, 15 July 2016 (UTC)
  1. Yeah, I guess the accent isn't completely useless, but I suppose the answer is to limit usage, not remove or overdo it.
  2. I concede regarding romanization redirects, I guess a better idea would be adding Hebrew words to "in other scripts" sections and having two official transliterations on here (precise/ Biblical and colloquial/ Modern).
  3. I mean, it wastes space and isn't exactly helpful. But it's a low priority, my only goal is to make it not mandatory.
  4. Ahhh, I completely understand. Tho, the tables mostly mention it being rare rather than obsolete.
  5. Konen is a verb in pi'el that has an odd conjugation (as you can tell by the dictionary form), but it's not remarked upon for some reason. I suppose we should start marking it and other odd verbs like yakhol and lamad.
  6. Most people who write Hebrew do use the Hebrew script, of course, but if they can't for some reason, they use Latin; and in Latin I've never seen ts used for tzade, even on other Wikimedia projects, and kh is mostly used for khaf rather than heyt. It's common here, but almost nowhere else. Plain c, which is unused, could be used for /tS/, with ch and ` becoming the (commonly enough) pharyngeal consonants. —This unsigned comment was added by Zontas (talk • contribs) at 15:32, 15 July 2016.
(4) I would say calling them rare rather than obsolete is accurate. (5) כּוֹנֵן is not irregular; it is simply a member of the sub-binyan po'el of pi'el, just like רָץ is a member of a sub-binyan of pa'al. (6) Transliterations are not meant to represent how these words are written when the Hebrew script is not available. Transliterations are meant to aid our readers in reading our entries when they don't know the script. --WikiTiki89 18:02, 15 July 2016 (UTC)
Re #6: It could be argued that using more standard transliterations makes it easier for naive users to read. Personally ts makes more sense to me than tz but I agree that tz is more common. Benwing2 (talk) 18:33, 15 July 2016 (UTC)
4) I mean, I do concede that it's meant to cover all eras of Hebrew. 5) Technically weak roots are not irregular, but that's part of what I was referring to. Anywho, my main issue is that some sub-conjugations, like that of konen, were never marked as having unique forms. 6) Fair enough about not using redirects, tho I still think we should revise our main transliteration system to match common trends, and add another for precise transliteration. There should be a Hebrew-to-Latin index. Also, what are your opinions on my responses to #1-#3? --Zontas (talk) 20:28, 16 July 2016 (UTC)
כּוֹנֵן does not have "unique" forms. There are many verbs in this sub-conjugation. Perhaps we should categorize these sub-conjugations, but that would be difficult for some of the confusing pa'al subgroups. --WikiTiki89 14:46, 18 July 2016 (UTC)

New Spanish verb template backend

Template:es-conj-güir

Automatically highlights and adds categories (in the main namespace) for any irregular forms. Can export a JSON representation of inflected forms:

I've only added it to two of the existing templates (I will need to go through the existing uses to look for unexpected parameters). Any feedback before I go further? DTLHS (talk) 00:12, 14 July 2016 (UTC)
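For readers wondering how "highlight irregular forms and export JSON" can work, here is a rough Python sketch; it is not the actual Lua backend behind Template:es-conj-güir. It assumes the idea is to build the regular forms from the infinitive, flag every supplied form that differs, and serialize the result. Only the present indicative is modelled, and `regular_present` and `annotate` are hypothetical names.

```python
import json

PERSONS = ["1s", "2s", "3s", "1p", "2p", "3p"]

# Regular present-indicative endings by infinitive class (one tense only).
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def regular_present(infinitive):
    """Build the regular present-indicative forms for a Spanish infinitive."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    return [stem + e for e in PRESENT_ENDINGS[ending]]


def annotate(infinitive, forms):
    """Pair each actual form with a flag saying whether it departs from the regular one."""
    expected = regular_present(infinitive)
    return {
        person: {"form": form, "irregular": form != regular}
        for person, form, regular in zip(PERSONS, forms, expected)
    }


# tener is irregular in 1s, 2s, 3s and 3p of the present indicative.
tener = ["tengo", "tienes", "tiene", "tenemos", "tenéis", "tienen"]
print(json.dumps(annotate("tener", tener), ensure_ascii=False, indent=2))
```

The same "irregular" flag could drive both the highlighting and the category assignment.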

Also I'd like to get rid of all the individual templates and move to three templates (for -ar/-er/-ir verbs) with the pattern as the first parameter, unless there are objections. DTLHS (talk) 00:22, 14 July 2016 (UTC)
@DTLHS: Is it necessary to even have three? Could one suffice? Or maybe one for -ar and another for -er/-ir? —Justin (koavf)TCM 00:50, 14 July 2016 (UTC)
It doesn't really matter to me. But it should either be 1 or 3, the -er / -ir paradigms are different. DTLHS (talk) 00:54, 14 July 2016 (UTC)
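To illustrate DTLHS's point that the -er and -ir paradigms differ, here is a small Python sketch comparing a handful of regular endings (present indicative nosotros/vosotros, vosotros imperative, imperfect yo). The cell names are hypothetical and the table is deliberately incomplete. It also shows how a single template could pick the paradigm from the infinitive's ending, treating accented -ír (as in oír, reír) like -ir.

```python
# Deliberately partial table of regular endings, keyed by conjugation class.
ENDINGS = {
    "ar": {"pres_1p": "amos", "pres_2p": "áis", "impv_2p": "ad", "impf_1s": "aba"},
    "er": {"pres_1p": "emos", "pres_2p": "éis", "impv_2p": "ed", "impf_1s": "ía"},
    "ir": {"pres_1p": "imos", "pres_2p": "ís", "impv_2p": "id", "impf_1s": "ía"},
}


def paradigm_for(infinitive):
    """Choose the conjugation class from the infinitive's last two letters."""
    ending = infinitive[-2:]
    if ending == "ír":  # oír, reír, freír follow the -ir endings
        return "ir"
    if ending in ENDINGS:
        return ending
    raise ValueError(f"not a Spanish infinitive: {infinitive!r}")


def differing_cells(a, b):
    """List the cells whose regular endings differ between two classes."""
    return sorted(cell for cell in ENDINGS[a] if ENDINGS[a][cell] != ENDINGS[b][cell])


print(paradigm_for("vivir"), paradigm_for("oír"))  # ir ir
print(differing_cells("er", "ir"))  # ['impv_2p', 'pres_1p', 'pres_2p']
```

Because -er and -ir share most cells (the imperfect, for instance) but not all, one template would still need the class distinction internally.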

Appendix:List of protologisms/non-English

Right, I admit that I'm one of those people who thinks that having a page like this is essentially counterproductive, but I accept the notion that it's better to have an appendix like this than having to rid Wiktionary of protologisms on a daily basis. However, edits by one user to the Romanian section are prompting me to take action.

My reasons:

  • (1) a vast majority lack English definitions – last time I checked, we're still in the English Wiktionary. I'm uncomfortable having a disproportionately long list of made-up words lacking definitions which users of this site can understand;
  • (2) an overwhelming percentage of these words are ludicrous – they don't have English equivalents and they have 0 hits on Google, Google Books, social media etc.;
  • (3) most of these terms do not fulfil even basic criteria for inclusion found here:
    "[…]should meet an expressive need" – most don't.
    "follow some logic in their etymology" – not in most cases.
    "follow standards of spelling, intonation, and pronunciation in the language" – unfortunately in absurdum.
    "and should be ideally "catchy" enough to have a chance of gaining wider acceptance." – strongly no, considering that they were deleted in other Romanian projects (for instance Wikibooks) for being absurd.

I'm not going to take action unless I have a mandate to do so by the community. --Robbie SWE (talk) 11:42, 15 July 2016 (UTC)

Zero hits isn't an issue for a protologism. I agree that the page (i) should have definitions in English and (ii) isn't very useful to anybody anyway. Equinox 13:28, 15 July 2016 (UTC)
I don't see why we should be cleaning up the garbage dump. A garbage dump is supposed to be filled with garbage. --WikiTiki89 14:12, 15 July 2016 (UTC)
The only real rule I'd like implemented is that anything attestable should be removed. Like Wikitiki89 says it's a rubbish dump and by the way that was its original purpose as well, to discourage people putting them in the main namespace. As for formatting, surely formatting lists of words that don't exist by very definition has to be the lowest of all priorities. But anyone wanting to do it I'm not going to try and stop them. Renard Migrant (talk) 14:37, 15 July 2016 (UTC)

Ok, I see what you guys are saying and I've arrived at the same conclusion – a dump is a dump, no point in trying to organise it. Thanks for the input though! --Robbie SWE (talk) 11:51, 16 July 2016 (UTC)

Vote: Adding PIE root box

FYI, I created Wiktionary:Votes/2016-07/Adding PIE root box. Let us postpone the vote if needed, that is, as long as the discussion requires. --Dan Polansky (talk) 18:28, 15 July 2016 (UTC)

Old English Long Vowels/ Wynn/ Edh/ Orthography

I get keeping thorn and ash (æ) around, as they represent unique phonemes, but given the way we are archiving the words, wouldn't it make more sense to use <w> rather than wynn and <þ> rather than <ð>, and also to use macrons or acute accents to mark long vowels? Old English writing was semi-chaotic, granted, but most of it is archived in a pseudo-modernized uniform script, and words would be easier to find with some consistency. Not to mention that we should stop posting runic words, as OE abandoned the script relatively early.

--Zontas (talk) 21:52, 15 July 2016 (UTC)

  • Hmm, I thought edh and thorn represented distinct phonemes? Or is that just for Icelandic? ‑‑ Eiríkr Útlendi │Tala við mig 22:10, 15 July 2016 (UTC)
  • We already use w and þ primarily, as that's what dictionaries use generally. But we have a policy to allow all attested words, in the representation they were written in (as far as Unicode can represent it). So that also leaves room for the letter wynn, ð and runes. However, they should direct the user to the normalised spelling rather than being a main entry in themselves. —CodeCat 22:27, 15 July 2016 (UTC)
@Eirikr edh and thorn are distinct in Icelandic but not OE, where they both represent a sound that was voiced when between voiced sounds and not doubled, and voiceless elsewhere. Benwing2 (talk) 23:15, 15 July 2016 (UTC)
But that also applied, at least, to Old Norse. The difference is in the origin and distribution of [ð]: in Old English it derives exclusively from voicing of [θ], while in Old Norse it derives from that and also from Proto-Germanic [ð]. I believe even in modern Icelandic the two are in complementary distribution, but I'm not sure. —CodeCat 23:34, 15 July 2016 (UTC)
Actually, in Old English [ð] sometimes comes from frication of [d]. Also, I didn't know Proto-Germanic had a [ð] other than a voiced [θ]. --WikiTiki89 00:06, 16 July 2016 (UTC)
It did, but as an allophone of /d/ much like in Spanish. —CodeCat 01:04, 16 July 2016 (UTC)
That still doesn't mean /ð/ is a phoneme. It's predictably an allophone of /d/ and /θ/. Also, I get allowing original-script redirects over main entries, but nobody has mentioned whether my idea of marking long vowels (and, now that I think about it, soft c and g) is to be used. --Zontas (talk) 19:54, 16 July 2016 (UTC)
@Zontas: We do support using macrons for long vowels in links and such. The normal convention here (also for stress marks in Russian, and similar things in other languages) is that the name of the entry itself doesn't have a macron in it, but the headword does, and links should too; the macron is automatically stripped out when generating the underlying link. See lad#Old English for an example. If I link to it as lād, the link works correctly. Benwing2 (talk) 20:06, 16 July 2016 (UTC)
I still don't get why we can't use diacritics in the name. The length isn't just a rare thing, it's as common as it is in Latin. --Zontas (talk) 20:28, 16 July 2016 (UTC)
We don't show it in Latin, either. Chuck Entz (talk) 20:33, 16 July 2016 (UTC)
The reason we don't add such diacritics is that we put the entry down as attested. So the page is chosen by the spelling actually used. If the original manuscript has the diacritics, you can make an entry at a page with the diacritics; if the diacritics are only a scholarly annotation, they only get added on the page where apt, but the pagename itself will be one without diacritics. Korn [kʰũːɘ̃n] (talk) 22:05, 16 July 2016 (UTC)
It's different with normalised spellings though, and most old Germanic languages use normalised spellings on Wiktionary. Technically, Old English terms are never attested with the letter w (or infrequently, I'll let someone who knows more clarify that), but we have entries with w regardless because the ƿ > w normalisation is usual and normal in dictionaries, grammars and republished OE texts, and we follow this custom. If we went strictly by attestation requirements, we could never have these entries but would be required to spell them with ƿ. So technically, if we allow these changes, and propagate them to page names as well, there's nothing in principle against doing the same for macrons too. Consider that the acute accents for long vowels are part of Old Norse page names, despite not appearing in manuscripts either. —CodeCat 22:13, 16 July 2016 (UTC)
Huh. I didn't know this distinction exists and I do not agree with it. But since it doesn't affect me, I won't complain much about it either. Korn [kʰũːɘ̃n] (talk) 23:41, 16 July 2016 (UTC)
Also, note that, for example, for Hebrew, Arabic, and Russian, diacriticized spellings are attestable but we do not allow them as entry titles. --WikiTiki89 14:44, 18 July 2016 (UTC)
As for Old Norse, I assume we include acute accents (long marks) in page titles because in modern Icelandic the long marks are mandatory (is that correct?), and modern Icelandic is spelled almost identically to Old Norse. Cf. the mandatory long marks in Latvian, which appear in page titles. Benwing2 (talk) 02:54, 21 July 2016 (UTC)
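The convention described in this thread, where the displayed form keeps its macrons but the page name does not, can be sketched as a per-language set of combining marks to strip. The Python table below is hypothetical and only mirrors the examples mentioned here (Old English and Latin macrons and Russian stress marks stripped, Old Norse acute accents kept); it is not the configuration our modules actually use.

```python
import unicodedata

MACRON = "\u0304"  # combining macron
ACUTE = "\u0301"   # combining acute accent

# Hypothetical per-language settings mirroring the examples in this thread.
STRIP_MARKS = {
    "ang": {MACRON},  # Old English: lād -> lad
    "la": {MACRON},   # Latin
    "ru": {ACUTE},    # Russian stress marks
    "non": set(),     # Old Norse: acute long marks stay in the page name
}


def entry_name(lang, form):
    """Derive the page name from a displayed form by dropping the language's marks."""
    marks = STRIP_MARKS.get(lang, set())
    decomposed = unicodedata.normalize("NFD", form)  # ā -> a + U+0304, á -> a + U+0301
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)  # recombine whatever marks remain


print(entry_name("ang", "lād"))  # lad
print(entry_name("non", "hús"))  # hús
```

Decomposing first means only the listed marks are dropped: a Russian ё, for example, keeps its diaeresis.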

Advice for a Sanskrit pronunciation module

Howdy all! I've been working on a Sanskrit pronunciation module similar to those available in Latin and Ancient Greek, and I would love some advice! The temporary template may be found at {{User:JohnC5/sa-IPA}}, a sandbox at User:JohnC5/Sandbox2, and the module at mod:User:JohnC5/Sandbox2. I've finished the basic conversion to IPA, the syllabification, the rudimentary anusvara rules, and some chronolect handling (Vedic and Classical). I was hoping someone could take a look to see if it all makes sense. Also, please suggest what needs to be added (like Abhinidhāna), what needs to be fixed, how it should look, and anything else that comes to mind. It's also possible that this is completely unnecessary and should not have been attempted in the first place (I don't think so, but I'm open to discussion). I feel like a lot more work needs to be done, but I don't yet know what the end state should be nor the acceptance criteria. Thanks! —JohnC5 07:05, 17 July 2016 (UTC)

A possible issue is that we use the unattested base stem as the lemma for nominals, rather than any of the case forms. So there shouldn't really be any pronunciation on those entries. —CodeCat 12:49, 17 July 2016 (UTC)
@CodeCat: I had been wondering about that problem. For nominals, this issue really only affects the desinence, and I would probably suggest using the (masculine) nominative singular as the exemplar for pronunciation. For verbal root entries, I certainly would recommend against this template's usage, but for the 3rd singular entries (e.g. गच्छति (gacchati)), this should be fine. Does this solution work? —JohnC5 17:56, 17 July 2016 (UTC)
Speaking of गच्छति (gácchati), the module currently produces /ɡə́t͡ɕ.t͡ɕʰə.t̪i/. The realization of च्छ (ccha) seems very wrong to me. Abhinidhāna would say that the first plosive becomes unreleased. In the case of an affricate, does this mean the result would be /ɡə́t̚.t͡ɕʰə.t̪i/ and the use of च (ca) is merely a spelling convention? —JohnC5 19:06, 17 July 2016 (UTC)
You could avoid the issue entirely by using ":" instead of doubling the sound. Chuck Entz (talk) 02:36, 18 July 2016 (UTC)
@Chuck Entz: I've already implemented the logic for the abhinidhāna (it's actually not bad). Also I'm not clear how ":" would be used when the cluster is heterosyllabic. Thank you for the advice in any case! —JohnC5 02:48, 18 July 2016 (UTC)
The thing is that it's not exactly [t], because it has the same point of articulation as the following affricate release. That's probably why cc was used to represent it rather than tc. —CodeCat 15:30, 18 July 2016 (UTC)
@CodeCat: Just to make sure there's no confusion, the beginning of the Sanskrit alveolo-palatal affricate /t͡ɕ/ differs from the dental /t̪/ already. So the theoretical difference between त्छ (tcha) (which is both unattested in MW and phonologically impossible) and च्छ (ccha) would be /t̪̚.t͡ɕʰ/ vs. /t̚.t͡ɕʰ/, respectively. Are you, instead, proposing a palatal stop as the form? This would be something like /c̚.t͡ɕʰ/ for च्छ (ccha) and /ɟ̚.d͡ʑʱ/ for ज्झ (jjha). Is that what you are saying? —JohnC5 17:43, 18 July 2016 (UTC)
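Roughly, the abhinidhāna rule as stated in this sub-thread (the first plosive of a cluster is unreleased) could be applied to a list of IPA segments as in the Python sketch below. The segment inventory is an assumption, `apply_abhinidhana` is a hypothetical name, and affricates are accepted only as the second member of a cluster, leaving open the question of how an unreleased first affricate should be written.

```python
UNRELEASED = "\u031a"  # combining left angle above: IPA "no audible release"

PLOSIVES = {
    "k", "kʰ", "ɡ", "ɡʱ", "ʈ", "ʈʰ", "ɖ", "ɖʱ",
    "t̪", "t̪ʰ", "d̪", "d̪ʱ", "p", "pʰ", "b", "bʱ",
}
AFFRICATES = {"t͡ɕ", "t͡ɕʰ", "d͡ʑ", "d͡ʑʱ"}


def apply_abhinidhana(segments):
    """Mark a plosive as unreleased when the next segment is a plosive or affricate."""
    result = []
    for seg, nxt in zip(segments, segments[1:] + [None]):
        if seg in PLOSIVES and (nxt in PLOSIVES or nxt in AFFRICATES):
            result.append(seg + UNRELEASED)
        else:
            result.append(seg)
    return result


# s-a-p-t-a: the /p/ before /t̪/ gets the unreleased mark.
print("".join(apply_abhinidhana(["s", "ə", "p", "t̪", "ə"])))
```

Since this is phonetic detail, it would belong in a phonetic line rather than the phonemic transcription.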
@CodeCat: Do you think the default display chronolect should be Vedic or Classical? Currently, I have it as Vedic, but that's because I like PIE. —JohnC5 02:31, 18 July 2016 (UTC)
Classical, for sure. It's what is normally taught, what people worldwide will be familiar with. —CodeCat 15:28, 18 July 2016 (UTC)
Can't we display both, similar to what we do for Ancient Greek? --WikiTiki89 15:35, 18 July 2016 (UTC)
@Wikitiki89: I've updated the code so that if the Vedic and Classical pronunciation differ, both are displayed with an arrow like in {{grc-IPA}}. You can see that at User:JohnC5/Sandbox2. Is that sufficient? —JohnC5 17:47, 18 July 2016 (UTC)
Looks good! --WikiTiki89 17:50, 18 July 2016 (UTC)
What are the differences between Vedic and Classical pronunciation that you're showing? I thought Classical Sanskrit had a stress accent that was placed in accordance with a rule similar to that for Latin (except that it can go back further than the antepenultimate), so I was expecting /ˈtɕən̪d̪ɽə/ for चन्द्र. Shouldn't कार्त्स्न्य have /ɑː/ (with long mark) in both Vedic and Classical? What about words whose Vedic scansion reveals one more syllable than is written (e.g. /kɑːɽt̪sniə/ for कार्त्स्न्य or /ukt̪uɑː/ for उक्त्वा – I don't know if those particular words belong to the class I'm talking about, but they illustrate the principle)? Does the system have a way of accommodating them? And don't some scholars believe intervocalic laryngeals were still around in Vedic, so that ā for example might sometimes be /əʔə/ or /ɑːʔə/ or /əʔɑː/ or /ɑːʔɑː/? Your sandbox doesn't seem to have any examples of word-final visarga; how would the module transcribe चन्द्रः? I'd expect /tɕən̪d̪ɽə́h/ in Vedic and /ˈtɕən̪d̪ɽəhə/ in Classical. —Aɴɢʀ (talk) 18:14, 18 July 2016 (UTC)
(edit conflict) @Angr: Thanks for all the comments. As mentioned before, feel free to fiddle around in the sandbox and add things. I'm less familiar with the Classical accent; do you have more description on that matter? I had been trying to figure out some of the vowel issues. One of the main distinctions between Vedic and Classical vowels is the change of ऐ (ai) & ए (e) from /ɑːj/ & /ɑj/ ~ /əj/ to /ɑj/ ~ /əj/ & /eː/ (the same is true for औ (au) & ओ (o)). I was unsure what to do about the /ɑj/ ~ /əj/ decision (Does it vary or do scholars disagree?). Also, do we prefer /ə/ over /ɐ/, since the latter shows up as well? Any guidance on this matter would be greatly appreciated. You're obviously right that from a metric standpoint /ɑː/ must remain long. For the Vedic scansion matters, is there a way to predict it, or is it merely on an anecdotal basis? I can add in some functionality around that (whether the user must specify the variation, or the template generates it automatically), but I need more information. For the Vedic laryngeals, I'm not sure how best to approach this: again, the user would need to specify the specific vowel, perhaps similarly to how the Vedic accent is currently specified (with |a=N for the first word where N represents the syllable on which the stress occurs; |a2=, |a3= etc. for subsequent words). For the visarga, it was my impression that the vocalic reduplication around the visarga (/tɕən̪d̪ɽə́h/ to /ˈtɕən̪d̪ɽəhᵊ/) differed between Śākhās. I'm not saying we shouldn't list several different Śākhās' pronunciations, but I wasn't sure where Classical pronunciations fell. Thanks for reading all these questions. I didn't start this module because I claimed to know Sanskrit phonology particularly well—I did it because I thought it could and should be done and that people would tell me when I make mistakes. —JohnC5 19:35, 18 July 2016 (UTC)
@Angr: I've added the Classical accent. Could you check it? —JohnC5 03:35, 19 July 2016 (UTC)
@JohnC5 I don't have answers to the questions I asked. I thought that stress receded to the rightmost long vowel (excluding the final syllable) and fell on the first syllable if all syllables (excluding the final syllable) had short vowels, so that svataṃtraḥ and aupadraṣṭrya would be stressed on the first syllable, but I'm not positive that's right. Wikipedia doesn't say anything at all about post-Vedic accent, and all I have to go on is my memory of the Sanskrit class I took as an undergraduate more than 25 years ago. So please don't interpret my comments above as "This is how things are, you should accommodate them" but rather "Here's something that might bear looking into, but I'm not sure of the details at all". —Aɴɢʀ (talk) 11:55, 19 July 2016 (UTC)
@Angr: Based on everything that I've read, the Classical accent is like the w:Dreimorengesetz but extended to include the preäntepenult. So, starting at the penult and moving leftward, search for the first heavy syllable, stopping at the left edge of the word or at the fourth-to-last syllable. I've added the vowel echoing around the visarga. I'm still not sure what to do about the laryngeals and alternate syllabification. Perhaps that can wait? Is there anything that you think must be changed before this goes into production? —JohnC5 15:10, 19 July 2016 (UTC)
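A minimal Python sketch of the Classical accent rule as described in this post, assuming syllables are already split and written in NFC IAST. Two points are this sketch's own reading: a syllable counts as heavy if it has a long vowel or diphthong or ends in a consonant (including ṃ and ḥ), and when neither the penult nor the antepenult is heavy, the stress falls where the search stops (the fourth-to-last syllable, or the first syllable of shorter words). `is_heavy` and `classical_stress` are hypothetical names.

```python
LONG_NUCLEI = ("ā", "ī", "ū", "ṝ", "ḹ", "e", "o", "ai", "au")
SHORT_VOWELS = set("aiu") | {"ṛ", "ḷ"}


def is_heavy(syllable):
    """Heavy = long vowel or diphthong, or closed by a consonant, ṃ or ḥ."""
    if any(nucleus in syllable for nucleus in LONG_NUCLEI):
        return True
    return syllable[-1] not in SHORT_VOWELS


def classical_stress(syllables):
    """Return the index of the stressed syllable."""
    n = len(syllables)
    for i in (n - 2, n - 3):  # penult, then antepenult
        if i < 0:
            break
        if is_heavy(syllables[i]):
            return i
    # No heavy syllable found: stop at the preantepenult or the left edge.
    return max(n - 4, 0)


print(classical_stress(["gac", "cha", "ti"]))  # 0: light penult, heavy antepenult
```

The weight test is the part most likely to need refinement (for example for vowel hiatus), so it is kept separate from the stress search.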
@JohnC5: I notice you're using /x/ for the visarga; is it really that and not /h/? As for laryngeals and alternate syllabification, we probably don't want them to be generated automatically from the spelling, but maybe the {{sa-IPA}} template could take a parameter like altved=pāat or altved=uktuā that would allow pronunciations not reflected by the spelling to be listed as alternative Vedic pronunciations. —Aɴɢʀ (talk) 10:13, 20 July 2016 (UTC)
@Angr: You are right about the visarga; it took me a while to find a good source for it though. In the case of alternative syllabification, I'm wondering about the vowel hiatus, which does occur in Sanskrit, but rarely. For uktuā, do we assume a homorganic glide to be inserted (/ˈuk.t̪u.ʋɑː/), or pure vowel hiatus (/ˈuk.t̪u.ɑː/), or a glottal stop (/ˈuk.t̪u.ʔɑː/)? Also, can we detect the distinction between a Vedic laryngeal and a Vedic resyllabification, or does the user have to insert a glottal stop like altved=pāʔat? —JohnC5 15:23, 20 July 2016 (UTC)
@JohnC5: I think all we know for sure is that a word could be spelled उक्त्वा and scan as three syllables in the Veda; whether it was realized as /ˈuk.t̪u.ʋɑː/, /ˈuk.t̪u.ɑː/, or /ˈuk.t̪u.ʔɑː/ is probably not really knowable at this point. Likewise all that's known for sure is that some instances of ā scan as two syllables, and comparative evidence shows that these must have been *aHa or *aHā or *āHa or *āHā in PII, but how exactly they were realized in Vedic is again probably not really knowable at this point. I don't know whether vowel hiatus ever occurs when it wasn't due to a laryngeal. I almost regret bringing these issues up now, since I know so little about the details that would help in resolving them. —Aɴɢʀ (talk) 15:38, 20 July 2016 (UTC)
@Angr: Vowel hiatus does occur in other positions, as this paper discusses. I think we should ignore the laryngeals and alternate syllabifications, however, until such time as we find examples of these two phenomena and need to mark them. I'll gladly add the functionality, but it seems far too amorphous at the moment. I hope that someone better informed than we will come to tell us all the hottest new research in Vedic phonology, but for now, we're fine. I'll switch the visargas over to /h/ soon.
On a different note, I'm currently representing the anusvara phonemically as a nasalization of the preceding vowel before ś, ṣ, s or h and a homorganic nasal before stops, but I believe the true phonemic representation is /m/ at a morpheme boundary and /n/ morpheme-internally. The issue then becomes how to tell where morphemes end within heteromorphemic words. Should I just declare all anusvara before a space to be /m/ and all others to be /n/? Also, there's evidence that, along with nasalizing the preceding vowel, the anusvara preceding ś, ṣ, s or h lengthened the vowel too. This is fine for short vowels, but works less well for long vowels (especially once you get into Classical when the w:Pluti vowels disappeared). Should I 1) only lengthen the short vowels 2) lengthen both the short and long vowels or 3) ignore the lengthening altogether since it is not well understood? Thanks for all your commentary thus far! —JohnC5 18:33, 20 July 2016 (UTC)
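A Python sketch of the anusvara handling described above, assuming IAST segments for what follows and a vowel already converted to IPA. The homorganic-nasal table follows the five places of articulation. The split between word-final /m/ and /n/ elsewhere is only the proposal from this post, not a settled rule, and the possible vowel lengthening before fricatives is deliberately left out. `realize_anusvara` is a hypothetical name.

```python
NASALIZATION = "\u0303"  # combining tilde

HOMORGANIC_NASAL = {
    **dict.fromkeys(["k", "kh", "g", "gh"], "ŋ"),
    **dict.fromkeys(["c", "ch", "j", "jh"], "ɲ"),
    **dict.fromkeys(["ṭ", "ṭh", "ḍ", "ḍh"], "ɳ"),
    **dict.fromkeys(["t", "th", "d", "dh"], "n̪"),
    **dict.fromkeys(["p", "ph", "b", "bh"], "m"),
}
FRICATIVES = {"ś", "ṣ", "s", "h"}


def realize_anusvara(vowel, following):
    """Return IPA for a vowel plus anusvara, given the next IAST segment.

    `following` is None at the end of the input and " " at a word boundary.
    """
    if following in FRICATIVES:
        # Nasalized vowel: the tilde goes on the vowel letter, before any length mark.
        return vowel[0] + NASALIZATION + vowel[1:]
    if following in HOMORGANIC_NASAL:
        return vowel + HOMORGANIC_NASAL[following]
    if following is None or following == " ":
        return vowel + "m"  # proposal from the thread: word-final anusvara as /m/
    return vowel + "n"      # proposal from the thread: elsewhere as /n/


print(realize_anusvara("ə", "k"), realize_anusvara("ɑː", "s"), realize_anusvara("ə", None))
```

If lengthening before fricatives is adopted later, it would slot into the first branch.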
I don't really like the arrow notation. I prefer it to always show with labels of what each pronunciation represents. —CodeCat 19:05, 18 July 2016 (UTC)
Also, a few other points:
  • I don't think short a was fully central, but more open, perhaps [ɐ], at least in Vedic, and long ā was still a long vowel. I don't have a source, but it seems fairly likely. In the same vein, I'd say e and o were probably [ɐi̯] and [ɐu̯] in Vedic.
  • Your pronunciation also seems to treat as phonemic details that weren't. The pronunciation changes of visarga before labials and velars in Vedic were not phonemic.
  • I suspect that the Vedic transcription of त्रैंश ‎(traiṃśa) is wrong, the -ai- was probably bisyllabic.
  • औपद्रष्ट्र्य ‎(aupadraṣṭrya) may be syllabified wrong, I'd expect -dr- to be entirely in the next syllable.
  • Where does the nasal /j/ in Vedic कार्त्स्न्य ‎(kārtsnya) come from?
  • The resolution of ṛ into ri was post-classical, and actually differed by dialect. Some dialects have ru or ra instead. So for Sanskrit proper, a syllabic sonorant should still be used.
Finally, I hope that there is a parameter to disable Vedic transcriptions. For certain words, the Vedic equivalent may actually have a different spelling, so listing a Vedic pronunciation would be wrong then. —CodeCat 19:23, 18 July 2016 (UTC)
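One way to handle the point that visarga's variants are phonetic rather than phonemic is to keep /h/ in the phonemic line and compute the phonetic value from the following segment. The Python sketch below assumes IAST input with spaces already removed, and uses the traditional jihvāmūlīya [x] before voiceless velars and upadhmānīya [ɸ] before voiceless labials. The function names are hypothetical, and the echo vowel at a pause is not modelled.

```python
VOICELESS_VELARS = {"k", "kh"}
VOICELESS_LABIALS = {"p", "ph"}


def realize_visarga(following):
    """Phonetic value of visarga (phonemically /h/) before the given segment."""
    if following in VOICELESS_VELARS:
        return "x"  # jihvāmūlīya
    if following in VOICELESS_LABIALS:
        return "ɸ"  # upadhmānīya
    return "h"      # everywhere else, including before a pause


def phonetic(segments):
    """Replace each visarga (IAST "ḥ") with its phonetic realization."""
    out = []
    for seg, nxt in zip(segments, segments[1:] + [None]):
        out.append(realize_visarga(nxt) if seg == "ḥ" else seg)
    return out


print(phonetic(["a", "ḥ", "k", "a"]), phonetic(["a", "ḥ", "p", "a"]))
```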
Should I just switch to Classical for the default display, then?
  • As in my response to Angr, I had been curious about that. I'm perfectly happy to use /ɐ/. Do you also think I should use /i̯/ and /u̯/ over /j/ and /w/?
  • You're quite right about the [ɸ] being phonetic. I'll need to add separate sections for those. What would you say the underlying phoneme of the visarga is? Whenever notated as a visarga, it is [x] or [ɸ], but it is just the allophone of /s/. Should it be that?
  • What would be the method of determining the bisyllabicity of such words?
  • I was unsure about this. The Weerasinghe-Wasala-Gamage method of syllabification makes special cases for /-.CrV-/ and /-.CyV-/ in all contexts except /VCCV/, which it always interprets as /VC.CV/. It does seem more sensible, however, to keep the same rules as in the other cases.
  • William Sidney Allen's Phonetics in Ancient India mentions Vedic turned /m/ + /j/, /l̪/, or /ʋ/ as giving /j̃j/, /l̪̃l̪/, or /ʋ̃ʋ/ respectively. He then says that this occurs only once in Classical and it affects an /n/. First of all, I realize this should be a phonetic rule again. Also is it only /m/ and /n/ or all nasals? It also makes sense that this only applies in the environment of /VN.[jl̪ʋ]V/.
  • That rule was borrowed from w:Vedic_Sanskrit_grammar#Phonology. I'll remove it if you think it is necessary.
  • The ability to turn off the Vedic seems prudent, and I will add it. Also, if you'd like to help with coding this (or fixing my bad code), please do! —JohnC5 19:56, 18 July 2016 (UTC)
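A rough Python sketch of the syllabification question, assuming the word has already been tokenized into IAST segments (so ch or ai count as single items). The default, where only the last consonant of an intervocalic cluster starts the next syllable, is an assumption. The C+r / C+y special case is applied uniformly, two-consonant clusters included, which is the option described above as more sensible, not the Weerasinghe-Wasala-Gamage behaviour. `syllabify` is a hypothetical name.

```python
VOWELS = {"a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "e", "ai", "o", "au"}
ONSET_GLIDES = {"r", "y"}  # second member of a C+r / C+y onset


def syllabify(segments):
    """Split a list of IAST segments into syllables, returned as strings."""
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    syllables, start = [], 0
    for here, nxt in zip(nuclei, nuclei[1:]):
        cluster = nxt - here - 1  # consonants between the two vowels
        onset = min(cluster, 1)   # default: last consonant begins the next syllable
        if cluster >= 2 and segments[nxt - 1] in ONSET_GLIDES:
            onset = 2             # keep C+r / C+y together as the onset
        boundary = nxt - onset
        syllables.append("".join(segments[start:boundary]))
        start = boundary
    syllables.append("".join(segments[start:]))
    return syllables


print(".".join(syllabify(["g", "a", "c", "ch", "a", "t", "i"])))  # gac.cha.ti
print(".".join(syllabify(["au", "p", "a", "d", "r", "a", "ṣ", "ṭ", "r", "y", "a"])))  # au.pa.draṣṭ.rya
```

With this rule aupadraṣṭrya keeps -dr- together in one onset, as CodeCat expected.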
Also, I assume that abhinidhāna would be a phonetic change, not phonemic. —JohnC5 20:17, 18 July 2016 (UTC)

As for what to do with geminate consonants, in Russian we use Cː and syllabify as if it's a single consonant, i.e. if between vowels it forms the beginning of a syllable and is joined with the following vowel. This isn't perfect but it deals with affricates well. The alternative I think is to write e.g. /t.t͡ɕʰ/ (or maybe /t̚.t͡ɕʰ/), which I think will be interpreted correctly as a long affricate even if technically it might mean something else; to me, something like /c̚.t͡ɕʰ/ looks really weird and is likely to be misinterpreted. Benwing2 (talk) 02:50, 21 July 2016 (UTC)

@Benwing2: So, to be clear, you think the current solution is sufficient? 03:34, 21 July 2016 (UTC)

Template alternative case form of to show more intuitive description

I think {{alternative case form of}} should show one of the following, depending on the letter case:

  • Uppercase form of
  • Lowercase form of

Thus, it would no longer show this:

  • Alternative letter-case form of

Easy to implement in a module, I think.

Opinions? --Dan Polansky (talk) 13:25, 17 July 2016 (UTC)

I think this is very sensible. You are also correct that it is fairly easy to implement. —JohnC5 17:50, 17 July 2016 (UTC)
I agree with JohnC5--Dixtosa (talk) 18:55, 17 July 2016 (UTC)
There may be some instances where Lua cannot determine the case. But this should be very rare. DTLHS (talk) 18:58, 17 July 2016 (UTC)
What if the alternative casing is a mix? —CodeCat 19:05, 17 July 2016 (UTC)
Do you have an example in mind? For special cases, the template may be updated to accept a parameter telling it to display the old "Alternative letter-case form of".
Now looking at the proposal again, the "Uppercase form of" should probably be "Capitalized form of"; uppercase would be WORD rather than Word, I fear. --Dan Polansky (talk) 19:08, 17 July 2016 (UTC)
I would say, the template should normally display either "Capitalized form of", "Lowercase form of", or "Uppercase form of", as with Dan P; if the case is mixed somehow or other, it should default to either "Mixed-case form of" or just "Alternative letter-case form of" as before. Note that the "Capitalized form of" detector should be smart enough to allow both Foo-bar and Foo-Bar, i.e. it should treat hyphens as spaces and allow non-initial words to either be capitalized or lowercased, and it will probably need special-casing for Dutch words like IJsland and IJsselmeer. Benwing2 (talk) 20:53, 17 July 2016 (UTC)
I assume that the module could also detect camelcase forms and display accordingly. SemperBlotto (talk) 05:05, 18 July 2016 (UTC)
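A minimal Python sketch of the detection discussed in this thread, written in Python rather than as the Lua module the template would actually use. `case_label` is a hypothetical name, and the input is assumed to be the title of the entry on which the template appears. It follows Benwing2's suggestions (hyphens treated like spaces, non-initial words may be capitalized or lowercase, Dutch IJ capitalized as a unit) and falls back to the current wording for mixed case such as camel case.

```python
import re


def is_capitalized(word):
    """True for "Word", and for Dutch "IJsland", where IJ is capitalized as a unit."""
    if word.startswith("IJ"):
        head, rest = word[:2], word[2:]
    else:
        head, rest = word[:1], word[1:]
    return head.isupper() and (rest == "" or rest.islower())


def case_label(title):
    """Choose the wording for {{alternative case form of}} from the entry title."""
    letters = [ch for ch in title if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        return "Uppercase form of"
    if letters and all(ch.islower() for ch in letters):
        return "Lowercase form of"
    # Hyphens count as word separators, so Foo-bar and Foo-Bar both qualify.
    words = [w for w in re.split(r"[\s-]+", title) if w]
    if words and is_capitalized(words[0]) and all(
        w.islower() or is_capitalized(w) for w in words[1:]
    ):
        return "Capitalized form of"
    # Mixed or undetectable case (e.g. camel case): keep the current wording.
    return "Alternative letter-case form of"


for title in ["word", "WORD", "Word", "Foo-Bar", "IJsland", "iPod"]:
    print(title, "->", case_label(title))
```

The final fallback is where a dedicated camel-case label could be added later.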
"Alternative letter-case form of" while wholly accurate is very confusing for people not familiar with such lexicographical terms. Took me a second in fact to work it out. Renard Migrant (talk) 12:43, 23 July 2016 (UTC)

Lexicography at a Crossroads

Hey. Has anyone read Lexicography at a Crossroads: Dictionaries and Encyclopedias Today? I've been flicking through it, and Wiktionary gets loads of pages dedicated to it, especially the English and Spanish parts. It's pretty interesting, and they bring up several of our flaws. --Turnedlessef (talk) 23:27, 18 July 2016 (UTC)

Page 115: "...they also illustrate two important lexicographical implications. First, only trained English lexicographers can add and/or edit English entries, whereas these requirements are not necessary for working with Spanish ones." Page 119: "Wiktionary not only uses English as default language, but also offers much more data in the English entries than in the Spanish ones. This [...] goes against its democratic philosophy." Can't say I'm much convinced that these fellows have produced an analysis that is useful for us. Korn [kʰũːɘ̃n] (talk) 08:16, 21 July 2016 (UTC)
The author offers his idea of what a Wiktionary entry should look like on page 127. Korn [kʰũːɘ̃n] (talk) 08:30, 21 July 2016 (UTC)
I'm no trained lexicographer, but I edit and create English entries without difficulty, and I also hold the French entries I edit to the same standard as the English ones. The book was also written some time ago, meaning that much of its content is probably out of date. From what I saw, though, there were some valid problems addressed, but they are due mainly to the incomplete state of this project rather than its nature, and will hopefully improve with time (e.g. single translations in FL entries that do not accurately represent the word's range of definitions, such as using "business" to define negocio without any glosses to clarify its meaning—a problem that entry notably still has). Andrew Sheedy (talk) 03:01, 22 July 2016 (UTC)
Thanks for pointing out this article, but it is not that relevant to Wiktionary today. I have started listing this kind of publication on Meta, so I invite you to add others and especially to take a look at the ones about GLAWI :) Noé (talk) 13:42, 22 July 2016 (UTC)
I added a gloss to negocio. --Turnedlessef (talk) 08:31, 23 July 2016 (UTC)

Proposal for Gurjar Apabhraṃśa

I'd like to propose a code for Gurjar Apabhraṃśa, the direct ancestor of Old Gujarati, which is given a grammatical sketch by Hemacandra and is also used in several texts of the era. The code I'd like to propose is inc-agu (which could perhaps also serve as a model for other Apabhraṃśas later on, in the form inc-axx). I'd also like a name without diacritics, but all the literature uses the version with diacritics. Any ideas? DerekWinters (talk) 22:16, 20 July 2016 (UTC)

A tremendous gathering

Hi, colleagues!

I am thrilled to announce the official creation of the Wiktionary Tremendous Group!

It grew from seeds planted at Wikimania a month ago, and it aims to be a common place to make Wiktionaries better and to share our work and thoughts about technological developments. We can also organize events together, such as conferences and LexiSession, a fancy way to contribute together to the same topic during a short period of time. In August, we suggest focusing on cat! Another main goal is to expand our network with the other wiki projects. So, if this interests you, you are very welcome! We want multilingual discussions as much as possible, but my English is not very natural, so feel free to correct any mistakes you see. Also, I am very inexperienced with team management and I have no idea how to make this project more attractive. I think a nice logo would be cool, but suggestions are welcome! It will be fun, so please join! Noé (talk) 01:13, 21 July 2016 (UTC)

I looked at the page, and I like what I see. Maybe I'll join the group later, when I have more time and something to contribute (or maybe I could help you with translations from French into English, if you're interested). My French is much better than it was a year ago, but I imagine I've still made mistakes, and I would be very grateful if you (or someone else) corrected them. :) Andrew Sheedy (talk) 03:16, 21 July 2016 (UTC)
Thanks a lot for your help! I'll clean up the announcement, and I've posted some comments about your French on your talk page, though it is better than my English, for sure! Joining implies no commitment; it is not a secret society, so feel free to visit once a month or less, that's fine! Noé (talk) 09:37, 21 July 2016 (UTC)

Category for employment

I have yet to find a category relating to employment, apart from Category:Occupations. Is there one? DonnanZ (talk) 17:04, 21 July 2016 (UTC)

Am I allowed to create a category for this? No objections? DonnanZ (talk) 12:32, 23 July 2016 (UTC)

You can't assume no answer to mean no objections. I for one have no idea what you're talking about or what would be in this category of yours. —Μετάknowledgediscuss/deeds 12:39, 23 July 2016 (UTC)
Sure. I can think of a few. Interview perhaps? Renard Migrant (talk) 12:41, 23 July 2016 (UTC)
Anything relating to employment which doesn't fit in the Occupations category. DonnanZ (talk) 16:23, 23 July 2016 (UTC)
It's definitely a significant hole in our category structure, but I'm not sure where to put it or what its limits are. Cat:Business is one possible parent. That has a subcategory Cat:Human resources, which is closer but probably not a good fit, since it's restricted to a management perspective. Cat:Occupations is another possible parent, but it's under Cat:People. That doesn't seem a good fit for most things relating to work as an activity.
As far as limits, I associate the term employment more with matters regarding whether one is employed and how one becomes employed (i.e., hiring and firing), rather than with the aspects of being employed. I wish we could use "work", but that's got so many other meanings it would be hard to keep from accumulating unrelated terms.
Would you include:
  1. job action, shop steward, strike?
  2. careerist, McJob, wage slave?
  3. blue-collar, clerical, front office, management, professional?
  4. double-dipping, golden parachute?
  5. pension, retirement?
What we do for a living is so basic a part of modern life that it bleeds into a variety of topics, and covers a lot of ground. Chuck Entz (talk) 18:38, 23 July 2016 (UTC)

label → lb

Wiktionary:Votes/2016-06/label → lb passed. Can someone do the honors and replace {{label}} with {{lb}} in all entries, please? --Daniel Carrero (talk) 18:44, 21 July 2016 (UTC)

Running now. —CodeCat 19:01, 21 July 2016 (UTC)
Thank you. --Daniel Carrero (talk) 17:53, 22 July 2016 (UTC)
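The swap requested here is essentially a text substitution over each entry's wikitext. A minimal Python sketch of that substitution follows, assuming plain template calls of the form `{{label|...}}`; the name `swap_label` is hypothetical, and this is not the bot code that was actually run. It deliberately leaves similarly named templates (e.g. `{{labels}}`) alone and does not try to skip calls inside comments or `<nowiki>` tags.

```python
import re

# "{{label" (optionally padded with whitespace) followed by "|" or "}",
# so that templates merely starting with "label" are not touched.
LABEL_CALL = re.compile(r"\{\{\s*label\s*(?=[|}])")


def swap_label(wikitext: str) -> str:
    """Rename calls to {{label}} as {{lb}}, keeping all parameters unchanged."""
    return LABEL_CALL.sub("{{lb", wikitext)
```

For example, `swap_label("# {{label|en|informal}} A word.")` yields `"# {{lb|en|informal}} A word."`.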

take pity vs. take pity on

I'm not very familiar with creating English phrasal-verb entries. I created the entry take pity for the expression "take pity on", even though it (almost?) always occurs with "on", on the principle of using the shortest form that still preserves the meaning. I then created take pity on as a hard redirect; I know these are frowned on, but I'm not sure what better thing to do. Benwing2 (talk) 05:44, 24 July 2016 (UTC)

Vote: Using template l to link to entries

FYI, I created Wiktionary:Votes/2016-07/Using template l to link to entries.

Let us postpone the vote for as long as discussion requires, if at all. --Dan Polansky (talk) 06:59, 24 July 2016 (UTC)