Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!


July 2016

Renaming all requests categories

See this: WT:RFM#Category:Translation requests (X) to Category:X translation requests / Category:Translations to be checked (X) to Category:X translations to be checked.

There, I suggested renaming all kinds of requests categories to a consistent format.

I am mentioning this here because it's a large proposed change. Thanks. --Daniel Carrero (talk) 20:00, 1 July 2016 (UTC)

Closing OrphicBot vote

Concerning this vote:

Wiktionary:Votes/bt-2016-06/User:OrphicBot for bot status

The vote was scheduled to end on June 30; today is July 3. Current results: 4-1-1 (the abstention is mine).

I am hesitant to close the vote as "Passed". It seems some issues about Ancient Greek are under discussion in the "Oppose" section.

So I decided to extend the vote +7 days. New end date: July 10. Please check if all is OK with the bot proposal, before we can close the vote. Thanks.

--Daniel Carrero (talk) 11:03, 3 July 2016 (UTC)

@Daniel Carrero: Thank you for extending the vote; I was unsure what the process was at this point. I believe the specific concerns of the user who has not responded in twelve days to have been addressed: 1) The robot is not intended to remove bad R:DGE links, but I have demonstrated it does not add them. There are only two which need to be removed by hand. 2) Edit summaries are now generated automatically based on actual changes. During testing, I wrote them by hand. The seven samples tested two functions, and my edit summary reflected the more experimental one. Further discussion of the robot has since taken place here: User_talk:I'm_so_meta_even_this_acronym#R:LSJ_and_the_Perseus_Resolve_Form. I am a little bit concerned the dissenting voter is requesting maximalist features (sorting the References section according to Classical-oriented preferences) without consulting other users. I like his requested features, and I am very pleased he asked for them, but I would not have foisted them on everyone else myself, given LSJ is not relevant to Byzantine studies, is not a first choice for readers of Homer, and is not as suited to the needs of language learners as Middle Liddell. I had assumed this proposal was so innocuous that for this reason it had likely not generated any attention in almost two weeks. Given the References modules are somewhat more complicated than a programme for appending strings to a list according to some heuristics and sorting the list, I am puzzled anyone would think this task would be significantly safer left to another user. However, should anyone else want to do it, my existing code is available at User:OrphicBot and I would not mind withdrawing this proposal. Isomorphyc (talk) 13:20, 3 July 2016 (UTC)
I don't think you should withdraw the proposal. —JohnC5 14:59, 3 July 2016 (UTC)
It is not about LSJ vs. Middle Liddell; it is about LSJ vs. a Spanish dictionary, as detailed in the vote in the discussion below my vote. I ultimately returned to opposition not because of template order but because the interaction, the changes in mainspace, and their edit summaries did not give me enough confidence. For that I am really sorry since this is a great initiative. --Dan Polansky (talk) 17:48, 6 July 2016 (UTC)
Hi @Dan Polansky:; I regret this did not work out. If you do have feature requests or other issues regarding the classical languages references after the vote, however, please do let me know, as this is an ongoing process. The issue with your sorting preferences was my concern-- I agree with your preferences, but I was concerned you and I were the only participants in that conversation. I know more users now and am more comfortable with this sort order. You may have been less concerned about the possibility of dead links had you participated in the earlier conversations. My initial edits to R:LSJ (a module we have both worked on) removed more than a thousand dead links it produced, and all of my successive modules followed the same procedure of indexing the target dictionary. (I did not work on R:DGE, and only linked it at another user's request, as you can see in the discussions). I agree with you most about the edit summaries, but as you can see in the code, the production version writes these automatically, but I did not have this feature during testing, and it was necessary to test multiple features in the same run. Assuming the robot is approved, I hope the changes work out to your satisfaction, and I trust we may find something about which to agree at a later time. Isomorphyc (talk) 18:26, 9 July 2016 (UTC)

Shortening some 'exceptional' language codes

Currently, exceptional codes for non-proto-languages are created by adding three letters approximating the name of the language onto the end of, as WT:LANG puts it, "a relevant family code". Because WT:LANG goes on to specify that "this system is used even if the relevant family code is itself an exceptional code rather than an ISO-derived code", many exceptional language codes have three parts: ira-azr-klt, ira-azr-kls, nai-yuc-tip, nai-yuc-yav, qfa-ctc-cat (qfa-ctc should actually be sai-ctc for consistency with other family codes, but that's a separate matter), qfa-ctc-col, qfa-len-slv (qfa-len should be nai-len, but that is again a separate matter). However, others have only two parts: Kitanemuk is azc-ktn rather than azc-tak-ktn, Phuthi is bnt-phu rather than bnt-ngo-phu.

At RFM, Μετάknowledge and I were discussing whether or not to always only use the nearest ISO family code (except where there is none), to obtain shorter codes, like ira-klt instead of ira-azr-klt. Other benefits are that when the precise (sub)family membership is uncertain, using only the ISO's high-level family codes is often "safer", and it allows editors to add codes for subfamilies (such as "Upper Amazon Arawakan") without worrying about needing to recode any languages (such as Amarizana and Anauyá and Maypure) which had the newly-added subfamily as their most immediate family.

If you think switching to only two-part codes is a good idea, the following codes will be affected: ira-azr-klt (which would become →ira-klt), ira-azr-kls (→ira-kls), nai-yuc-tip (→nai-tip), nai-yuc-yav (→nai-yav), qfa-ctc-cat (→sai-cat), qfa-ctc-col (→sai-col), qfa-len-slv (→nai-sln, renaming the second element to incorporate 'Lenca' now that the family portion of the code no longer does). Alternatively, if you think we should stick to using the nearest family code, then languages like azc-ktn will need to be renamed (to azc-tak-ktn).

Proto-languages are not part of this proposal, and 'exceptional' languages which belong to families that do not have ISO codes would continue to be named using the existing system. (For example, if we wanted to add an exceptional code for the divergent Korean dialect Foobarese, it would be qfa-kor-foo because the Koreanic language family has no ISO code.) - -sche (discuss) 20:47, 4 July 2016 (UTC)
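For illustration, the renaming rule described above can be sketched in Python. This is only a sketch under stated assumptions: the function name `shorten_code` and the override table are hypothetical, the overrides are limited to the codes listed in this proposal, and codes beginning with `qfa` are left alone unless listed, on the assumption that their family has no ISO code (as in the Foobarese example).

```python
# Renames from the proposal that are not a plain "drop the middle part"
# (the family prefix changes, or the language element is renamed).
SPECIAL_RENAMES = {
    "qfa-ctc-cat": "sai-cat",
    "qfa-ctc-col": "sai-col",
    "qfa-len-slv": "nai-sln",
}


def shorten_code(code: str) -> str:
    """Return the two-part form of a three-part exceptional code (hypothetical helper)."""
    if code in SPECIAL_RENAMES:
        return SPECIAL_RENAMES[code]
    parts = code.split("-")
    # Assumption: "qfa" marks a family without an ISO code, so such codes
    # keep the existing three-part system (e.g. qfa-kor-foo).
    if len(parts) == 3 and parts[0] != "qfa":
        family, _subfamily, language = parts
        return f"{family}-{language}"
    return code


for old in ["ira-azr-klt", "nai-yuc-tip", "qfa-len-slv", "qfa-kor-foo", "azc-ktn"]:
    print(old, "->", shorten_code(old))
```

Codes that already have two parts, such as azc-ktn, pass through unchanged.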

I support shortening these codes. They're already long and confusing enough for editors to remember and get used to; the least we could do is make them shorter and more stable, which this proposal would accomplish. —Μετάknowledgediscuss/deeds 21:02, 4 July 2016 (UTC)

Done. - -sche (discuss) 07:21, 7 July 2016 (UTC)

Open call for Project Grants

Greetings! The Project Grants program is accepting proposals from July 1st to August 2nd to fund new tools, research, offline outreach (including editathon series, workshops, etc), online organizing (including contests), and other experiments that enhance the work of Wikimedia volunteers. Whether you need a small or large amount of funds, Project Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.

Also accepting candidates to join the Project Grants Committee through July 15.

With thanks, I JethroBT (WMF) 15:21, 5 July 2016 (UTC)

Module:encodings

This is a new module I just created, announced at WT:NFE. I hope it's useful! If anyone needs additional encodings, let me know. It only supports ISO 8859-1 currently. —CodeCat 22:31, 5 July 2016 (UTC)

Small, ugly modification to improve Ancient Greek ASCII searchability -- justified?

Hello Ancient Greek editors with whom I have corresponded a few times (namely: @CodeCat, @JohnC5, @Metaknowledge, @I'm so meta even this acronym ): sorry to bother all of you, but I've made a trial change to the Template:grc-noun using a new module Module:grc-ascii-searchability which solves some searchability problems I have had in Greek. However, the change is very unaesthetic. I was going to revert it if it seems particularly disliked, or if there's a better solution available, but I was thinking of extending it to the other twelve or so templates if it seems to solve a problem others have. My problem is that I have trouble searching for Greek words on Wiktionary because I tend to type in the Roman alphabet even when I shouldn't. As a result, to find Greek words I usually look for cognates whose etymologies likely link, or else I find the word on Perseus with betacodes and then paste the link into Wiktionary. This process normally takes a few minutes. I thought there had to be a better way-- but possibly I found a worse way. I dangled some invisible text from the Greek headword template with unique variants from up to nine different Romanisation schemes (asciisation schemes, really), so that now a search for something like "qea greek" or "chalix greek" or "elios greek" or "hlios greek" in the search bar turns up the relevant Greek noun usually as the first or second result. (qea, as you know, is unaccented beta code for θεᾱ́; the rest are self-explanatory). In general, anything in this list should be searchable using most reasonable asciisation schemes I could think of: https://en.wiktionary.org/w/index.php?title=Special:WhatLinksHere/Module:grc-ascii-searchability&hideredirs=1&hidelinks=1. My hope is that these Romanisations will percolate into external search engines as well so that our Greek entries will be easily searchable that way. Does this seem like a change worth extending? 
If it is not widely liked I will revert it, but if it is, I will either add it to all of the Greek headword templates, or else (more cleanly, if this method is preferred), look into finding the appropriate place for it in Module:headword. I had wanted to get the Greek words into the search bar's autocomplete feature through typing the Romanisations, but I think that is completely impossible for me. It is worth pointing out that the macron-accented Romanisation already offered by the headword module is also searchable in plain ASCII; it is just a little bit more detail-oriented to find, and speaking personally, it is not something that was ever part of my process to look for, or indeed, not something I knew about till I saw it in my own search results a few minutes ago. Thanks for your time. Isomorphyc (talk) 04:12, 6 July 2016 (UTC)
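As a rough illustration of the idea (not the code of Module:grc-ascii-searchability, which is a Lua module not reproduced here), the Python sketch below strips diacritics from a Greek word and produces two ASCII spellings: an unaccented beta-code form and a plain Romanisation. The letter tables and the function names are assumptions made for this example; breathings are ignored, so ἥλιος yields "elios" and "hlios" but not "helios".

```python
import unicodedata

# Assumed unaccented beta-code letter values (lower case only).
BETA_CODE = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "h",
    "θ": "q", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "c",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u",
    "φ": "f", "χ": "x", "ψ": "y", "ω": "w",
}

# Assumed plain Romanisation: same as above except for these letters.
PLAIN = dict(BETA_CODE)
PLAIN.update({"θ": "th", "ξ": "x", "φ": "ph", "χ": "ch", "ψ": "ps", "η": "e", "ω": "o"})


def strip_marks(word: str) -> str:
    """Lower-case the word and drop accents, breathings, macrons and iota subscripts."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def asciise(word: str, table: dict) -> str:
    """Map each bare Greek letter through the given table; other characters pass through."""
    return "".join(table.get(ch, ch) for ch in strip_marks(word))


def ascii_variants(word: str) -> set:
    """Return the distinct ASCII spellings produced by the two assumed schemes."""
    return {asciise(word, PLAIN), asciise(word, BETA_CODE)}


print(sorted(ascii_variants("θεά")))
print(sorted(ascii_variants("ἥλιος")))
```

A real implementation would need more schemes (for example kh for χ, or an h for the rough breathing) before it covered the nine variants mentioned above.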

Not sure I see the point. Searching "thea greek" already works (I just tried it), so there's no reason we need to have "qea greek" work as well. Beta code is far more nonobvious than our romanisation, IMO. —Μετάknowledgediscuss/deeds 04:59, 6 July 2016 (UTC)
I think you are quite right. I will delete this if no-one else replies. Thank you! When I started I incorrectly assumed our primary Romanisation was not searchable through the macrons. Isomorphyc (talk) 09:37, 6 July 2016 (UTC)
Edited: @Metaknowledge: I originally did not realise this, but the reason you can search "thea greek" I believe is because of the template. The reason might be that since the Romanisation appears twice in the page, it outranks other pages which cite it. Try something other than a noun, for example, αἰσθάνομαι via "aisthanomai greek." One has the problem that a great many etymologies in derived languages appear before the main entry. The reason I included beta code and other variants was not so much because I like any of them, but because I wanted to include every reasonable asciisation scheme to avoid controversy about which one to use. At the same time, this is a lot of ad-hoc mess for a little benefit, and this is a parochial solution to a general problem, in which popular pages with outgoing link Romanisations outrank their link targets in searches. Isomorphyc (talk) 10:50, 6 July 2016 (UTC)
Further edit: The most relevant comparison is with verbs, which remain in their original state: https://en.wiktionary.org/w/index.php?title=Special%3AWhatLinksHere%2FTemplate%3Agrc-verb&hidelinks=1&hideredirs=1 Isomorphyc (talk) 12:55, 6 July 2016 (UTC)
Oh, I see. Well, it could be useful. At the same time, I'm disinclined to do things that are ugly unless they will get a lot of use, which I doubt this will. I suppose you'd best wait for another classically inclined editor's opinion. —Μετάknowledgediscuss/deeds 18:15, 6 July 2016 (UTC)
What would you or others think of repeating ascii versions of all headword Romanisations within non-visible html to make them outrank pages which link to them for ascii-Romanised searches? Clearly, this method works, and what is ugly (and unfair to Modern Greek) is that I am doing it for just one language. The real problem is that we can't (I think) hint to the search function any other way that a Romanisation has almost the force of a headword. The really ugly thing which would solve the problem, but which I would never advocate, is to give the Romanisations their own headwords, as is done in Chinese, which is pretty convenient, but still less so than Greek because it is harder to type in Greek. I know I am suggesting a significant change; I wouldn't want to go forward with any of these options unless at least a few people are ecstatic, but this is not what I am seeing here. Isomorphyc (talk) 19:42, 6 July 2016 (UTC)
I think we should have entries for romanizations of all languages. We seem to go by the rule that lesser-known scripts get their own romanization entry while better-known and more widely used scripts don't. But this argumentation is completely pointless in the face of users who don't understand the script regardless. It doesn't help them in the slightest that the script is well known if they don't know it. —CodeCat 19:46, 6 July 2016 (UTC)
Ultimately, this is probably what we should do. I've categorised that idea under things to deal with in the far future, but it could actually be a great way to increase readership in the short term. It would require a very active romanisation bot, of course, and we'd presumably have to phase in languages one by one (make sure that all the romanisations are good, then create the soft redirects). —Μετάknowledgediscuss/deeds 19:51, 6 July 2016 (UTC)
I took a quick look at a random pageview statistics file amongst the five or ten terabytes available here: [1]. Of the top ten Chinese words visited in that particular hour, all ten were hanzi, not pinyin or another Romanisation. Obviously I could make a more systematic study of this, but the most cursory evidence suggests adding Romanisations will not increase readership much. (Incidentally, Latin words are one of the biggest attractions here). I think this question is not really about readership in general, but about how specific types of users use specific languages of largely academic interest. Isomorphyc (talk) 20:45, 6 July 2016 (UTC)
Of course, the romanizations may not be used nearly as much. But can we get statistics of the romanizations alone? How much are they used? —CodeCat 20:51, 6 July 2016 (UTC)
This will have sampling size issues, time of day issues, language choice issues, etc. but for the file I am looking at, about 12% of Mandarin visits are pinyin and 88% are hanzi. I have most of the dataset locally, so this could be done more systematically or for other languages, of course, with more planning. The numbers are 1492 hanzi entries visited 1922 times, compared to 174 entries visited 261 times in that hour interval for pinyin. Isomorphyc (talk) 21:18, 6 July 2016 (UTC)
In general, Romanisations are 1.2% of pages viewed in this period: 1415 out of 114222. So Mandarin has about a 10x statistical lift for users preferring Romanisations compared to the average language. Isomorphyc (talk) 21:22, 6 July 2016 (UTC)
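The arithmetic behind these figures can be checked with a few lines of Python. The numbers are the ones quoted in the two comments above; the variable names are only for this illustration.

```python
# Figures quoted above for one hourly page-view file.
pinyin_views = 261           # visits to Mandarin pinyin entries
hanzi_views = 1922           # visits to Mandarin hanzi entries
romanisation_views = 1415    # visits to Romanisation pages, all languages
total_views = 114222         # all pages viewed in the period

# Share of Mandarin visits that went to pinyin entries.
mandarin_share = pinyin_views / (pinyin_views + hanzi_views)

# Share of all page views that went to Romanisation pages.
overall_share = romanisation_views / total_views

# How much more often Mandarin readers land on a Romanisation than average.
lift = mandarin_share / overall_share

print(f"Mandarin pinyin share: {mandarin_share:.1%}")
print(f"Overall Romanisation share: {overall_share:.1%}")
print(f"Lift: {lift:.1f}x")
```

The shares come out at roughly 12% and 1.2%, which is where the "about 10x" figure comes from; as noted above, a single hourly file is subject to sample-size and time-of-day effects.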

@Isomorphyc: I am more in favour of including the invisible HTML Romanisations than I am in favour of creating entries for Greek Romanisations. Neither idea fills me with joy, however. For you personally, I recommend learning to type polytonic Greek with your keyboard. -- I.S.M.E.T.A. 13:26, 7 July 2016 (UTC)

Hi @I'm so meta even this acronym:: I use this method for composition, and it is not too bad, though I find it is not so good for dictionary type of tasks. Pinging also @CodeCat, @Metaknowledge, @JohnC5: following the earlier discussion, I have some statistics for Romanisation. They are quite interesting. Mandarin is the only language which is fully Romanised, and my old numbers were wildly incorrect (because, as I had warned, of time-of-day bias. It was daytime in China, and pinyin was unpopular). Pinyin has received 2m of Mandarin's 4.2m year-to-date page views, mostly during night-time in China. Please see the following chart. The data are summed over all 1H 2016 page views. I have put the table in a user page appendix: User:Isomorphyc/Romanisation_Page_View_Statistics
Several observations: Adding Japanese Romanisations would perhaps yield a 50% increase in Japanese page views. The current convention in Cantonese is to offer Jyutping entries only for characters. Switching to a convention which Romanises all words (as in Mandarin) could perhaps nearly double Cantonese page views. I am not advocating either of these (I don't know much Cantonese or any Japanese), but only mention this; wide readership should perhaps be only a secondary objective here, in some ways. Archaic-script Romanisations are very popular, but a caveat in Gothic is that a great many entries were created in 2016. The script and the Romanisations are surprisingly popular-- presumably the script users are clicking on intra-Wiktionary etymologies while the non-script users are searching non-script outside references. There are no data on the popularity of Romanisations for languages with widely-used scripts (Hebrew, Devanagari, Greek, Cyrillic, etc.). The only user-base I can imagine is people (such as myself) who are too lazy to change their keyboard configuration. I believe learners of modern languages only ever use Romanisations for character-based, never script-based, languages. We already gloss most Romanisations in mentions and lists with templates, so I don't see a case that users are confused. If someone really wants to do this, my suggestion would be to pick one popular-script language, Romanise all of it, and see if the usage numbers justify keeping it and proceeding after a reasonable amount of time.
More observations (not in the table): Wiktionary has about a billion page views a year, or thirty per second. Fully 8.9% of this traffic is people looking up non-lemma forms in Latin. (A further 4.5% is lemma forms). Latin comprises, I think as is widely known, 13.7% of entries and 13.8% of page views. Evidently nearly 10% of Wiktionary by weight and popularity is essentially an easy to use Latin stemmer. I suspect the best thing one could do for Greek does not relate to Romanisations, but instead is to create non-lemma entries extensively while ensuring a core vocabulary of the top 5000 Attic and Koine words is available. I wouldn't mind working on this in the medium term, but for now, to me, the Romanisation searchability seems like very low-hanging fruit.
While I am here, this table may also be of mild interest. User:Isomorphyc/Page_Views_and_Entry_Counts_by_Language_1H_2016
Apologies for the bad formatting, long note and long delay. The raw 2016 data are 3 TB uncompressed, but I can offer a ~200 MB file if anyone would like annualised 2016 data. Thanks for reading. Isomorphyc (talk) 00:07, 12 July 2016 (UTC)
@Isomorphyc Is there any referrer information (what sites people came here from)? Or any differentiation between internal and external referrers? DTLHS (talk) 02:21, 12 July 2016 (UTC)
@DTLHS: Unfortunately, no. The data are per-page hourly view counts for all Wikimedia projects. I believe this is for privacy concerns. I had wished for the same thing. Isomorphyc (talk) 02:25, 12 July 2016 (UTC)
@Isomorphyc: This information is startlingly interesting and represents the strongest argument I've ever seen for large-scale Romanizations. I've always been a proponent of using native scripts (as evinced by my painstaking construction of AP:Old Italic script), but if extra entries make that much difference, I'd certainly support their broader use. --JohnC5 04:53, 12 July 2016 (UTC)
@JohnC5: For all practical purposes, we do not have data on Romanisation of scripts, only of character systems. I believe the benefit is likely far less, because character systems usually have official Romanisations which are often taught to tourists, beginning students, and casual learners. Russian, Arabic, and Modern Greek are the three most developed non-Roman script languages on Wiktionary. I would see very large issues in creating Romanised headwords for any of them, but it may still be worth a trial. Academic languages (Greek, Sanskrit) I think are a separate category. Isomorphyc (talk) 11:56, 12 July 2016 (UTC)
Modern languages such as Mandarin and Japanese have official romanization standards, so creating romanization entries for the best-known ones isn't that hard. The same is true for languages such as Gothic, with relatively uniform treatment in the literature. Ancient Greek, on the other hand, has a plethora of ad hoc systems that vary in subtle ways: is it chi, khi, or xi? How about xi vs ksi? Or diphthongs: ou or u, ei or i? How is length handled? I would think that, as the predictability of what people would search for diminishes, the benefits from romanization entries would, as well. Even with Mandarin and Japanese, we have Hanyu pinyin and romaji, but not Wade-Giles or Hepburn. Then there are things like beta code: even if the system would let us have entries like fqoggh/ or w)=|, I think they would cause more confusion than they're worth (having entries searchable by such things is another matter). Chuck Entz (talk) 13:31, 12 July 2016 (UTC)
Another idea to throw in the ring: is there any way to create an input method or keyboard of some sort, so people could type beta code into the search box and get Greek to show up? And beta code has standards for other scripts, as well, so it would be useful for many other languages. Chuck Entz (talk) 13:45, 12 July 2016 (UTC)
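To make the idea concrete, here is a minimal Python sketch of the conversion step such an input method would need: turning typed beta code into Unicode Greek. It is not an implementation of anything that exists on the site. The function name and tables are assumptions for this example, only lower-case letters and the common diacritic signs are handled, and hooking the result into the search box is a separate problem not addressed here.

```python
import re
import unicodedata

# Assumed beta-code letter values (lower case only; capitals via "*" are not handled).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ", "h": "η",
    "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "c": "ξ",
    "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "f": "φ",
    "x": "χ", "y": "ψ", "w": "ω",
}

# Beta-code diacritic signs mapped to Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_greek(text: str) -> str:
    """Convert lower-case beta code to precomposed Unicode Greek where possible."""
    converted = "".join(LETTERS.get(ch) or MARKS.get(ch, ch) for ch in text.lower())
    # Use final sigma at the end of each word.
    converted = re.sub(r"σ\b", "ς", converted)
    return unicodedata.normalize("NFC", converted)


print(beta_to_greek("fqoggh/"))
print(beta_to_greek("w)=|"))
print(beta_to_greek("lo/gos"))
```

The diacritic signs are expected after their letter, as in the two beta-code strings quoted above.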
@Chuck Entz: That idea sounds wonderful! I'm not sure how to implement it, but I'd love to help. —JohnC5 14:21, 12 July 2016 (UTC)
@Chuck Entz: I do not know how to interface with the search bar, or much about JavaScript; but I would be glad to learn if there is a way for me to help. I like this idea very much. Isomorphyc (talk) 01:20, 13 July 2016 (UTC)
@Isomorphyc: Where do you find these page view statistics? --WikiTiki89 13:53, 12 July 2016 (UTC)
@Wikitiki89: The raw page-view statistics are here: [2]. There are a few other forms in the same place, with various characteristics. Isomorphyc (talk) 14:06, 12 July 2016 (UTC)
For reference, I have for the time being reverted the changes from this experiment. Although it made searching for Greek words significantly easier, I am currently looking at other ways to interface with Wiktionary's CirrusSearch extension to produce measurable results extensible to other Romanisations. Thank you all for your discussion of these matters. Isomorphyc (talk) 15:23, 6 August 2016 (UTC)

Using template l to link to English entries

FYI, I created Wiktionary:Votes/2016-07/Using template l to link to English entries.

Let us postpone the vote as much as discussion requires. --Dan Polansky (talk) 08:21, 6 July 2016 (UTC)

Imagine me taking off my surgeon's mask as I ask: But...why? Korn [kʰũːɘ̃n] (talk) 10:23, 6 July 2016 (UTC)
Because there may be better ways to do this. E.g. writing the language code "en" is cumbersome and hinders readability. 99.99% of the words linked on the definition lines are to English words, so, we may as well use "en" as a default. But given the stupid order of parameters (language code first? Really?), that is not possible. — Dakdada 10:47, 6 July 2016 (UTC)
I suggested a separate proposal to address the problem you just mentioned. See: Wiktionary talk:Votes/2016-07/Using template l to link to English entries#Separate proposal. --Daniel Carrero (talk) 10:56, 6 July 2016 (UTC)
  • I meant "why is this proposed?" not "why is the vote postponed?" And I prefer templates to take all non-optional parameters first. Korn [kʰũːɘ̃n] (talk) 12:39, 6 July 2016 (UTC)
Ah, sorry I misunderstood. — Dakdada 15:58, 6 July 2016 (UTC)
@Dan Polansky, you said here: "Rationale" - "To be entered by supporters." Are you planning to vote "Oppose"? --Daniel Carrero (talk) 16:22, 6 July 2016 (UTC)

Bot replace Template:etyl with Template:cog?

Is it ok if I run a bot to replace all instances of "{{etyl|xx|-}} {{m|xx|...}}" with "{{cog|xx|...}}", currently categorised in Category:etyl cleanup no target? —CodeCat 20:05, 6 July 2016 (UTC)
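The replacement being proposed can be sketched as a single regular-expression substitution. This is only an illustration in Python, not the bot's actual script (which is not shown in this discussion); the pattern and the function name `etyl_to_cog` are assumptions. The backreference `\1` makes the substitution apply only when the language code in {{etyl}} and in the following {{m}} is the same, and links containing nested templates are deliberately left untouched.

```python
import re

# {{etyl|xx|-}} followed by whitespace and {{m|xx|...}} with the SAME code xx.
ETYL_M = re.compile(r"\{\{etyl\|([^|{}]+)\|-\}\}\s+\{\{m\|\1\|([^{}]*)\}\}")


def etyl_to_cog(wikitext: str) -> str:
    """Rewrite matching etyl + m pairs as a single cog template (hypothetical helper)."""
    return ETYL_M.sub(r"{{cog|\1|\2}}", wikitext)


print(etyl_to_cog("Compare {{etyl|de|-}} {{m|de|Haus}}."))
print(etyl_to_cog("From {{etyl|ojp|-}} {{m|ja|foo}}."))
```

The first line is rewritten; the second is not, because the two language codes differ.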

I would certainly support this. —JohnC5 20:11, 6 July 2016 (UTC)
How do you know all uses of {{etyl|xx|-}} are intended to be cognates? DTLHS (talk) 20:15, 6 July 2016 (UTC)
What else could they be? —CodeCat 20:17, 6 July 2016 (UTC)
I support and appreciate the change. Please go ahead. --Daniel Carrero (talk) 20:18, 6 July 2016 (UTC)
(e/c) And why would it matter? The template doesn't have to only be used for cognates, even if cognates are its main purpose. --WikiTiki89 20:19, 6 July 2016 (UTC)
I could limit it to just etymology sections for now, if that's better. —CodeCat 20:20, 6 July 2016 (UTC)
Yes, probably limit it to etymology sections (kampung for example) DTLHS (talk) 20:25, 6 July 2016 (UTC)
(e/c) Theoretically, {{etyl}} should only have been used in etymology sections anyway. I would like to know what kinds of other places it's used in. --WikiTiki89 20:26, 6 July 2016 (UTC)
  • I've used {{etyl}} in ====Usage notes==== sections, such as in "compare with English [SOME OTHER TERM]..." constructions. ‑‑ Eiríkr Útlendi │Tala við mig 20:32, 6 July 2016 (UTC)
    Well that's clearly not the intended purpose of {{etyl}}, and I would even say {{cog}} is preferable there. So there is no need to limit this replacement to etymology sections. --WikiTiki89 20:38, 6 July 2016 (UTC)
  • The intention of the template is not clear from its documentation. The existence of the second argument, which explicitly suppresses categorization, lends this template to broader use than just in ===Etymology=== sections. Even the name etyl, described as coming from etymological language, suggests that this could be used in any case where an editor seeks to specify a given term's language.
Before embarking on any bot-driven overhaul of how {{etyl}} is used, I would strongly recommend first finding out where and how it is actually used, based on the hard data available in a dump, rather than just relying on our own individual assumptions. ‑‑ Eiríkr Útlendi │Tala við mig 21:43, 6 July 2016 (UTC)
At worst, the bot run will convert some incorrect uses of {{etyl}} to equally incorrect uses of {{cog}}. We can do such an analysis after the run, too. —CodeCat 21:46, 6 July 2016 (UTC)
  • @CodeCat, so long as the xx in both {{etyl}} and the following {{m}} match, this is fine by me. There have been cases where {{etyl|ojp|...}} is followed by {{m|ja|...}} due to the unresolved status of Old Japanese entries, both here and in Japanese lexicography in general (i.e. there isn't as clear a distinction between the two; many bigger dictionaries of modern JA include obsolete terms that could technically qualify as OJP, and current terms that have specific OJP or Classical senses). ‑‑ Eiríkr Útlendi │Tala við mig 20:36, 6 July 2016 (UTC)
    I would say it's better to put something like "from Old Japanese (compare Japanese XYZ)". --WikiTiki89 20:39, 6 July 2016 (UTC)
  • So far as I know, we do not (yet) have any OJP entries. We do have plenty of JA entries, and in some cases, the JA and OJP differ mainly in conjugation patterns and idiomatic usage. Monolingual JA dictionaries will often put OJP and JA content into a single entry, indicating obliquely in the header that the older forms have a different conjugation. "Compare" doesn't quite seem correct in these cases. ‑‑ Eiríkr Útlendi │Tala við mig 21:54, 6 July 2016 (UTC)

I've made a few test edits with the script, it seems to work ok. All those Malayo-Polynesian etc entries that are listed right at the start of the category use {{etyl}} in descendants sections, we'll probably want to fix them. There's a few I already encountered that use it in other sections but could theoretically replace it with {{cog}}, such as in -ains. —CodeCat 20:43, 6 July 2016 (UTC)

But like I said, {{cog}} is not any more wrong than {{etyl}} there. So it's safe to replace them. --WikiTiki89 20:50, 6 July 2016 (UTC)
Ok, I'll remove the section restriction. —CodeCat 20:52, 6 July 2016 (UTC)
  • Query: does {{cog}} add any categories? If so, which ones? If not, should it?
I'm also a little puzzled by the apparently cavalier attitude towards using {{cog}} for relationships that are not cognates. This confuses things unnecessarily. ‑‑ Eiríkr Útlendi │Tala við mig 21:59, 6 July 2016 (UTC)
It does not add categories. Also, we've used etyl cavalierly to mark relationships that are not etymological; this is a step in the right direction. —Μετάknowledgediscuss/deeds 22:05, 6 July 2016 (UTC)
  • {{etyl}} makes more sense to me than {{cog}} for marking the language of a given term.
And it is unclear to me how changing to {{cog}} "is a step in the right direction". Although I disagree that use of {{etyl}} has been cavalier, even granting that, swapping one apparent confusion for another does not strike me as progress. I note also that the documentation for {{cog}} explicitly states that this template is intended to mark cognate relationships, and that it is intended for use solely in ===Etymology=== sections. What's described in this thread here goes well beyond that stated scope. ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 6 July 2016 (UTC)
Please look at how the templates work. The advantage of {{cog}} is that it links to both the language and the term, thus serving the purposes of two templates (less markup means easier for editors to read and easier for bots to manipulate, while eliminating mismatch error). —Μετάknowledgediscuss/deeds 22:44, 6 July 2016 (UTC)
If the naming of the template is an issue, we can create an exactly identical template with a different name and documentation, or we can rename this one. Either way, this doesn't affect the bot run. —CodeCat 23:10, 6 July 2016 (UTC)
  • The functionality (combining two templates into one with essentially the same output → less markup) I agree with. The naming of {{cog}} and its documentation make me think that cog as a name here is inappropriate for this current use, which has nothing to do with cognates. If a different template with a different name could be used for this instead of {{cog}}, my current concerns would be resolved. ‑‑ Eiríkr Útlendi │Tala við mig 01:17, 7 July 2016 (UTC)
So, for example in almagra, you want to change the current etymology from "From Spanish almagra, almagre, from Arabic المُغْرَة ‎(al-muḡra, “red clay or earth”)" into "From Spanish en almagra, almagre, from Arabic en ‎(en) المُغْرَة ‎(al-muḡra, “red clay or earth”)", and you want to remove the categories Category:English terms derived from Spanish and Category:English terms derived from Arabic altogether?
Why don’t you want to keep those categories, and what are you going to do with the en and en ‎(en)? (note: en is a language code, so it will vary according to the language.) —Stephen (Talk) 07:33, 7 July 2016 (UTC)
I think you don't understand this. Please look carefully at what CodeCat wrote at the very beginning, so you can see that it doesn't apply to uses of {{etyl}} to show etymological derivation. —Μετάknowledgediscuss/deeds 07:42, 7 July 2016 (UTC)
Oh, just in Category:etyl cleanup no target, not necessarily "replace all instances of". —Stephen (Talk) 07:58, 7 July 2016 (UTC)

I've run the bot now, and it did a good number of edits, but there are still 5000 cases remaining. I spotted some regularly occurring patterns that a bot could also fix up:

  • {{etyl|xx|-}} ''[[foobar]]'', optionally with a language as the anchor.
  • {{etyl|xx|-}} {{l|xx|...}}, presumably added by editors who don't know the difference between the templates.
  • {{etyl|xx|-}}: {{l|xx|...}} in a descendants section. Seems to occur mostly in Malayo-Polynesian languages.

CodeCat 16:03, 8 July 2016 (UTC)
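As a rough illustration of the second pattern in the list above, here is a minimal Python sketch that merges {{etyl|xx|-}} followed by {{l|xx|...}} or {{m|xx|...}} into a single {{cog|xx|...}} call, but only when both language codes match. The function name merge_etyl_link is hypothetical, this is not the bot's actual code, and it assumes that {{cog}} accepts the same remaining parameters as the linking template it replaces.

```python
import re

# {{etyl|xx|-}} followed by {{l|xx|...}} or {{m|xx|...}}.
# The backreference \1 forces both templates to carry the same language code.
# [^{}]* refuses nested templates, so anything complicated is left untouched.
ETYL_LINK = re.compile(
    r"\{\{etyl\|([^|{}]+)\|-\}\}\s+\{\{[lm]\|\1\|([^{}]*)\}\}"
)


def merge_etyl_link(wikitext: str) -> str:
    """Replace matching etyl+link pairs with one {{cog}} call (hypothetical helper)."""
    return ETYL_LINK.sub(r"{{cog|\1|\2}}", wikitext)


if __name__ == "__main__":
    print(merge_etyl_link("Compare {{etyl|nl|-}} {{l|nl|water}}."))
    # A mismatched pair (ojp vs. ja) is deliberately left alone for human review.
    print(merge_etyl_link("From {{etyl|ojp|-}} {{m|ja|foo}}."))
```

Pairs whose codes differ, such as the Old Japanese case mentioned earlier in the thread, never match the pattern and so stay as they are.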

I don't understand the difference between the templates either. What do {{l}} and {{m}} actually do differently? Korn [kʰũːɘ̃n] (talk) 17:14, 8 July 2016 (UTC)
See Help:Language sections#Linking to language sections. --Daniel Carrero (talk) 17:16, 8 July 2016 (UTC)
{{m}} italicizes Latin-script terms and transliterations of non-Latin script terms. {{l}} does not. --WikiTiki89 18:18, 8 July 2016 (UTC)
Strictly speaking, it doesn't italicise, but it tags it with the "mention" CSS class, and the CSS then gives it italic formatting. The distinction matters when users start making custom CSS. —CodeCat 18:37, 8 July 2016 (UTC)
@Korn See also Wiktionary:Style_guide#Styling_templates, a section I wrote which shows all (or most) of the styling templates and where to use them. In short, {{l}} is used for lists and {{m}} in running text, and as mentioned, the latter italicizes (usually) but the former doesn't. Benwing2 (talk) 20:56, 8 July 2016 (UTC)

@CodeCat I have thought also about running a bot to convert instances of LANG {{m|xx|...}} to {{cog}}, where LANG and xx agree, e.g. Serbo-Croatian {{m|sh|...}}. This has to be done carefully; one idea is to look for lists of terms (e.g. LANG1 {{m|xx|...}}, LANG2 {{m|yy|...}} ... and LANGN {{m|zz|...}}, probably with additional smarts to allow for parenthesized terms in the list) and only convert them when the immediately preceding text says "[Cc]ompare" or "[Cc]ognate with" or certain other expressions. Benwing2 (talk) 21:03, 8 July 2016 (UTC)

The smarter it is, the more potential there is for errors or oversights - don't write code smarter than yourself, and don't overestimate how smart you are. So I'd prefer simpler heuristics if possible. Limiting it to Etymology sections is a good start. —CodeCat 21:25, 8 July 2016 (UTC)
Sorry, I definitely meant it to be limited to Etymology sections; that's clear. As for the rest of it, I don't think this is terribly over-clever code, and I've written bot code like this before without too much problem. You just have to be careful and review a bunch of the subs (before actually saving anything) to make sure it's behaving like you want. Benwing2 (talk) 23:32, 8 July 2016 (UTC)
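To make the heuristic discussed above concrete, here is a deliberately simple Python sketch. It converts LANG {{m|xx|...}} to {{cog|xx|...}} only when the text immediately before it is "Compare" or "Cognate with" and LANG agrees with xx. The LANG_CODES table is a tiny illustrative subset and convert_named_mentions is a hypothetical name. The sketch handles only the first term after the trigger phrase, and it does not check that the text comes from an Etymology section, which a real run would need to do.

```python
import re

# Illustrative subset only; a real bot would use the full language data.
LANG_CODES = {"Serbo-Croatian": "sh", "Dutch": "nl", "German": "de"}

# Trigger phrase, then a capitalised language name, then {{m|code|...}}.
NAMED_MENTION = re.compile(
    r"((?:[Cc]ompare|[Cc]ognate with) )"      # 1: trigger phrase
    r"([A-Z][\w-]*(?: [A-Z][\w-]*)*) "        # 2: language name
    r"\{\{m\|([^|{}]+)\|([^{}]*)\}\}"         # 3: code, 4: remaining parameters
)


def convert_named_mentions(wikitext: str) -> str:
    """Rewrite 'Compare LANG {{m|xx|...}}' as 'Compare {{cog|xx|...}}' when LANG matches xx."""

    def repl(match: re.Match) -> str:
        trigger, lang, code, rest = match.groups()
        if LANG_CODES.get(lang) != code:
            return match.group(0)  # unknown or mismatched language: leave as is
        return trigger + "{{cog|" + code + "|" + rest + "}}"

    return NAMED_MENTION.sub(repl, wikitext)


if __name__ == "__main__":
    print(convert_named_mentions("Compare Serbo-Croatian {{m|sh|voda}}."))
```

Keeping the rule this narrow follows the advice about simpler heuristics: anything the pattern does not recognise is skipped rather than guessed at, and the substitutions can be reviewed before anything is saved.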

New abuse filter for canned edit summaries?

Wikipedia has a list of edit summaries commonly used by vandals: Added content; Added; Fixed typo; Typo; Fixed grammar; Grammar; I made it better. I don't know whether there are any specific vandalism tools or help-sheets out there that use these phrases, but I've seen them (identically, with the initial capital) around Wiktionary too. Perhaps an abuse filter to tag them is in order? Equinox 21:00, 7 July 2016 (UTC)

The reason vandals use these edit summaries is because normal editors use them too. I'm not sure how effective it would be to tag them. --WikiTiki89 21:37, 7 July 2016 (UTC)
I'm not sure that normal editors do use them. I've never seen "Added content" on a legit edit, but often on bad ones. Equinox 21:45, 7 July 2016 (UTC)
Maybe not "added content", but "fixed typo" and "fixed grammar" are certainly used by normal editors. --WikiTiki89 21:48, 7 July 2016 (UTC)
While "added content" is usually a red flag, I do see a small number of good edits all the time. The same with "fixed typo", though the size-change criterion is a helpful- but not infallible- added indicator. Just plain "Fixed" is somewhat reversed in terms of vandalism-to-good-edit ratios. On the other hand, I've never seen a good edit accompanied by "I made it better"- or by anything using the pronoun "I". By the way, I think the reason "added content" is used so much by vandals is that it's very vague and seems innocuous. While we're at it, I think any edit comment that includes lol, lulz, or variants should definitely be flagged, too. Chuck Entz (talk) 03:43, 8 July 2016 (UTC)
I have seen "fixed typo" used by so many vandals on WP. I don't recall whether I've seen it here or not. We could tag edits by new users that used them and check the tag log after a while and see whether it was catching enough vandalism (and a high enough ratio of vandalism to helpful edits) to be worth continuing to tag. - -sche (discuss) 22:23, 7 July 2016 (UTC)
Typos are small, so any edit glossed "fix typo" that changes more than, say, six bytes is very suspect. Equinox 22:30, 7 July 2016 (UTC)
  • I use 'typo', 'fixed typo', 'grammar mistake', 'general improvements' and 'expanded'/'extended' with great regularity. And while Germanic, I'm not a vandal. I just make a lot of typos. If you don't want to become scholars of my works, maybe the tagging should exclude autopatrollers. Korn [kʰũːɘ̃n] (talk) 22:41, 7 July 2016 (UTC)
Tags: If a certain tag is being flagged too often, it can easily be turned off. It's worth trying these to see if they are useful. —Justin (koavf)TCM 03:30, 8 July 2016 (UTC)
Good idea. The filter can combine the edit summary with other observable characteristics of the edit, like, quoting Equinox, 'so any edit glossed "fix typo" that changes more than, say, six bytes is very suspect'.
A rather unrelated idea: I would prevent anons from making edits that remove more than, say, 10 bytes. --Dan Polansky (talk) 09:36, 10 July 2016 (UTC)
@Dan Polansky: I think that a policy about restricting the bytes that an IP can edit is hostile to users who don't wish to have an account. We should welcome anyone to edit, even if he doesn't want a pseudonym associated with his edits. —Justin (koavf)TCM 14:04, 10 July 2016 (UTC)
It's about bytes removed, not bytes added. --Dan Polansky (talk) 14:10, 10 July 2016 (UTC)
That sounds much too restrictive to me: there are plenty of good reasons to remove material, e.g. cutting down waffle/verbosity, or fixing the vandalism of other anons. Equinox 14:20, 10 July 2016 (UTC)
@Dan Polansky, Equinox: For that matter, someone may make a very helpful edit by removing a lot of text and moving it to the Citations namespace or replacing it with a template. I agree that it can still trigger a tag but not restricting the ability to do it altogether. For that matter, someone can remove 100kb of data and then add back 106kb of junk. The absolute difference in data is not a good metric of quality, hence just tagging it rather than stopping it altogether. It still requires a lot of human discretion. —Justin (koavf)TCM 18:11, 10 July 2016 (UTC)
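The tagging ideas in this thread can be sketched in a few lines of plain Python. This is not AbuseFilter rule syntax, and the function name tag_for_edit and the tag labels are made up for illustration. It combines the canned summaries listed at the top of the section, the six-byte threshold suggested for "typo" edits, and the suggestion to exclude autopatrollers. The match is case-sensitive on purpose, since the summaries were reported as appearing identically, with the initial capital.

```python
from typing import Optional

# Summaries quoted at the top of this thread.
CANNED = {
    "Added content", "Added", "Fixed typo", "Typo",
    "Fixed grammar", "Grammar", "I made it better",
}
TYPO_SUMMARIES = {"Fixed typo", "Typo"}
TYPO_MAX_BYTES = 6  # threshold suggested in the discussion; an assumption, not a tuned value


def tag_for_edit(summary: str, size_delta: int, autopatrolled: bool = False) -> Optional[str]:
    """Return a tag name for a suspicious-looking edit, or None (hypothetical helper)."""
    if autopatrolled:
        return None  # trusted editors are not tagged
    text = summary.strip()
    # A "typo" fix that changes many bytes in either direction is the stronger signal.
    if text in TYPO_SUMMARIES and abs(size_delta) > TYPO_MAX_BYTES:
        return "large-typo-fix"
    if text in CANNED:
        return "canned-summary"
    return None


if __name__ == "__main__":
    print(tag_for_edit("Fixed typo", 250))
    print(tag_for_edit("Added content", 40, autopatrolled=True))
```

The function only returns a tag and never blocks anything, in line with the point above that size changes are a weak signal that still needs human review.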

Old Gutnish

I've noticed a few redlinks to Old Gutnish in Proto-Germanic entries. Does anyone know if it is distinctive enough to have its own code, or should it be merged into Old Norse/modern Gutnish? KarikaSlayer (talk) 00:48, 8 July 2016 (UTC)

It's considered a dialect of Old Norse, but quite distinct. Old Norse is considered to be split into three main dialect areas, East, West and Gutnish. Some of the idiosyncrasies of Old Gutnish survive into modern Gutnish, including in particular the triphthong jau. —CodeCat 01:02, 8 July 2016 (UTC)

Compact Language Links enabled in this wiki today

Screenshot of Compact Language Links interlanguage list

Compact Language Links has been available as a beta-feature on all Wikimedia wikis since 2014. With compact language links enabled, users are shown a much shorter list of languages on the interlanguage link section of an article (see image). Based on several factors, this shorter list of languages is expected to be more relevant for them and valuable for finding similar content in a language known to them. More information about compact language links can be found in the documentation.

From today onwards, compact language links has been enabled as the default listing of interlanguage links on this wiki. However, using the button at the bottom, you will be able to see a longer list of all the languages the article has been written in. The setting for this compact list can be changed by using the checkbox under User Preferences -> Appearance -> Languages

The compact language links feature has been tested extensively by the Wikimedia Language team, which developed it. However, in case there are any problems or other feedback please let us know on the project talk page. It is to be noted that on some wikis the presence of an existing older gadget that was used for a similar purpose may cause an interference for compact language list. We would like to bring this to the attention of the admins of this wiki. Full details are on this phabricator ticket. Thank you. On behalf of the Wikimedia Language team:--Runa Bhattacharjee (WMF) (talk) 03:12, 8 July 2016 (UTC)

Company names in Russian?

@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter I'm pretty sure Wiktionary's policies don't allow things like company names e.g. Газпром (Gazprom), Лукойл (Lukoil), Роснефть (Rosneft) to be inserted as lemmas. For languages like Russian this may be slightly problematic as it's often useful to be able to give the pronunciation, stress, gender, declension etc., some or all of which may be unpredictable. Some but not all of this information is found in Wikipedia. Any solution? Benwing2 (talk) 04:44, 8 July 2016 (UTC)

I think this would fall under WT:BRAND, which in practice allows major company/brand names to be included (basically if they might be mentioned in passing, without talking specifically about the commerce side). Equinox 04:47, 8 July 2016 (UTC)
Thanks for pinging but it's not up to me. I find some company names interesting as well but they get deleted, e.g 宏碁宏棋 (Hóngqí, “Acer”). --Anatoli T. (обсудить/вклад) 06:11, 8 July 2016 (UTC)
I don't think there is anything special about Russian with regard to company and brand names. Even in English, it is not always clear from the spelling of these names how they are supposed to be pronounced. The reasons for excluding most company and brand names from English and other languages apply equally to Russian as well. But like Equinox said, some of these names can still pass. --WikiTiki89 14:33, 8 July 2016 (UTC)

Future of Wiktionary and its interface

I want to ask Wiktionary users to think about this more seriously. Now I myself don't have any alternative in my mind right now (I only know we probably need something new from scratch), but I think we should bring discussions like this up every now and then, even if we don't make any apparent progress right now. It's a shame that many of us users don't bother to think about what the project should look like in the long run, while it is relevant to our very own efforts.

Wiktionary is not user-friendly, and looks relatively time-consuming to edit for most people, especially for beginners (note the fact that the bulk of our users are either quite active or quite inactive). I feel what we currently have is not what our editors deserve. Flexibility is good when you are at the beginning of things, but I think our project has reached a good degree of maturity to move toward finally deciding about a less "flexible" and more systematic way to take and show information. I noticed this when I saw we are trying to make even Etymology, Descendants and Pronunciation sections as such.

P.S. Consider Wikidata as well. --Z 17:27, 9 July 2016 (UTC)

Our hands are tied. We don't have control over anything of importance in terms of how this website works, and we have very few people who are willing and able to put enough time into new user-friendly JS or beginner-friendly templates. Wikidata is a must for interwikis, but otherwise I have not seen much prospect for its utility. From my perspective, all I can do is keep adding content. —Μετάknowledgediscuss/deeds 18:25, 9 July 2016 (UTC)
From what I can gather, Wikidata would rather move ahead with their own pseudo-dictionary project rather than touch anything on this site. DTLHS (talk) 18:28, 9 July 2016 (UTC)
@DTLHS: Can you tell me what you mean by this? Can you point me to some relevant pages? —Justin (koavf)TCM 20:47, 9 July 2016 (UTC)
See the wikidata page on "water". There are many translations, with glosses defined in the target language. They are attempting to do many of the same things we do. Notice that there are links to every other Wikimedia project but not Wiktionary. DTLHS (talk) 21:01, 9 July 2016 (UTC)
@DTLHS: Well, see my comment at d:Wikidata:Project chat. The problem is that interwiki links mean two different things only for this project but not for others. For every other project, an interwiki link would be the same idea (wikt:en:foot to wikt:es:pie) but for this one, it would also be good to link to the same idea (a translation) but also the same term (so from wikt:en:foot to wikt:es:foot). —Justin (koavf)TCM 03:31, 10 July 2016 (UTC)
  • @Justin (koavf), what you propose represents a huuuuge problem -- the correct translation of a single term in isolation, into a single term in some other language, is often flat-out impossible. Which sense of water would you link to in the other languages? What about metal? Even compound terms can introduce ambiguity -- for heavy metal, would you link to the corresponding entry for the chemical sense, or the musical sense? This impossibility is precisely why Wiktionary interwiki links only target the corresponding entry for that same spelling in that same script. ‑‑ Eiríkr Útlendi │Tala við mig 00:38, 13 July 2016 (UTC)
@Eirikr: Agreed--it's very difficult to do this. This is why OmegaWiki uses a different database entry per meaning, whereas we have an entirely different page per term/character/etc. —Justin (koavf)TCM 01:24, 13 July 2016 (UTC)
Even meanings are problematic: for every clear, discrete concept like a chemical substance or a species, there are a hundred others that are really clouds of interacting sub-concepts, with the relative importance of each sub-concept influenced by context or personal style, or phases of the moon... or something. These terms don't quite mean exactly the same thing any two times they're used, and the ambiguity can be just as important as the core meanings. This is the stuff of poetry and humor, and pinning it down precisely will often kill it. Translation is an art, not a science, and even a mega-corporation like Google with all the resources it puts into Google Translate can't make computer translation accurate. It's been a few decades since I took Semantics, but I don't think this is going to be solved any time soon- we just keep progressively halving the distance to infinity. Chuck Entz (talk) 02:33, 13 July 2016 (UTC)
@DTLHS: That's not intended to be a dictionary in any form. Needing labels corresponding to topics that need to be used by Wikipedia is incidental, and Wikidata does not currently include any effort to build a database of lexical/linguistic content outside of that. However, there are plans to build one in the future, so that the Wiktionary projects can make use of it. See the most recent proposal for Wiktionary-Wikidata integration. These efforts have not yet begun. --Yair rand (talk) 20:29, 17 July 2016 (UTC)
@Metaknowledge: If you have any ideas for some scripts to make things more user-friendly, I'd be glad to write the code.
Re Wikidata, I think it will be very useful for storing things like hard-to-manage transcriptions (those which are currently scraped out of entries by templates), pronunciations (which could then be shared across projects), quotations, and maybe etymological data. Eventually, we'll probably be able to move basically everything over there without disrupting everyone's workflows, but that's probably many years away. --Yair rand (talk) 20:39, 17 July 2016 (UTC)
@Yair rand: Right now most of my wishlist is for editors, not users, but if that offer stands, I'll try to remember to tell you when something comes up. And I'm still suspicious of our future at Wikidata — if they really wanted to do this the right way in the foreseeable future, they'd try to get us on board and invite discussion here. —Μετάknowledgediscuss/deeds 20:45, 17 July 2016 (UTC)
  • Is there any context for this? Because I've little idea what exactly the topic is. Korn [kʰũːɘ̃n] (talk) 18:28, 9 July 2016 (UTC)
There is a possible way forward. We could write a parser for the syntax of the site as it currently exists (this is not easy!). If we have a parser, we can represent the data contained here independently of our own peculiar conventions, and bootstrap our way towards something better. Without a parser we can only make incremental changes, slowly, and mostly by hand. DTLHS (talk) 18:34, 9 July 2016 (UTC)
The wikidata "water" page addresses part of definition 1 at [[water]]. I think Wiktionary would offer a useful, large set of ambiguations (polysemies) to get them out of what looks like a narrow set of concepts that they are so far covering. Perhaps they should work with WordNet or another semantic net until they are ready to work with the complications of natural languages. DCDuring TALK 21:22, 9 July 2016 (UTC)

Hi all, we do talk about this topic in the French Wiktionary. That's why we went to Wikidata to add Translate to all conversations about Wiktionary. Well, I see two different directions. The first is technical improvement, which may come through a lexical database, and we are the ones who have to discuss this, so as to scope what can be integrated. The second aspect is the interface, and Visual Editor is the way forward in my opinion. OK, I know, this tool is far from perfect now, as it was at the beginning for Wikipedia. Well, we can change that. Recently I wrote up an idea I had during Wikimania on the Visual Editor talk page, and it resulted in a proposal on Phabricator. It's something small, and adapting this tool to each project will be a long-term project, but I think we can do it. Noé (talk) 12:17, 10 July 2016 (UTC)

Renaming male -> masculine, female -> feminine in {{given name}}?

Enoshd (talkcontribs) suggested this and refers to w:Sex and gender distinction. He thinks both the text and categories should change. Changing the text is easy but renaming the categories will require a bot probably. Comments? Benwing2 (talk) 22:37, 9 July 2016 (UTC)

FYI: Wiktionary:Votes/pl-2010-01/Renaming given name appendixes, Wiktionary:Votes/2009-12/Masculine and feminine given names. --Dan Polansky (talk) 09:04, 10 July 2016 (UTC)
Pointless; we already have a sense at male that says "Belonging to the masculine gender (social category)." Equinox 12:20, 10 July 2016 (UTC)
We should make a distinction between grammatical and personal gender. These names' gender is personal, and possibly also grammatical but not necessarily (diminutives in Dutch are neuter, names are no exception). Since we already have "feminine nouns" categories and such for some languages, we should use "male" and "female" here to distinguish them. —CodeCat 12:58, 10 July 2016 (UTC)
Cecil is a male given name, but it isn't very masculine. Masculine (and feminine) would create ambiguity. Renard Migrant (talk) 14:43, 15 July 2016 (UTC)
Are you calling Cecil Rhodes a sissy? —Aɴɢʀ (talk) 15:27, 15 July 2016 (UTC)
+1 to what CodeCat said. For example, cailín is a masculine word, but a female given name (google e.g. Cailin Mcloughlin). Also, as Equinox points out, "male" and "female" are sufficiently polysemous to be acceptable. - -sche (discuss) 19:36, 15 July 2016 (UTC)
The Sex and gender distinction is irrelevant here, since no language actually makes that distinction (if you look at our entries for sex and gender, both words can refer to both concepts). Like others have said, there is a completely separate distinction relevant to us that we call grammatical gender vs. whateverthehellyouwanttocalltheotherone (and both "sex" and "gender" fit into the latter). It is true that in many languages the latter often strongly influences the former, but we still need to maintain the distinction, and the way we've been doing that is by using masculine and feminine for the former and male and female for the latter. --WikiTiki89 19:50, 15 July 2016 (UTC)

Do we need to mark accent in Hebrew words? Also, why don't we add redirects for Hebrew romanizations? (among other notes)

1. The accents and stress are very predictable in Hebrew, to the point of graphic accents being unnecessary.

2. Hebrew words are often understood in their romanized form(s) and aren't easy to search without a Latin keyboard. Why not do for Hebrew what is done for Gothic and Japanese?

3. Do bekadgefat letters need to be mentioned with a tag? There are only six, and it would be redundant to mark every single word.

4. Nobody uses the CCaCtem/ CCaCten meter in the past pa'al due to analogy, nor does anyone use feminine plurals outside of the present tense. Shouldn't space be dedicated to gerunds and in the case of pa'al, past participles?

5. Shouldn't irregular verbs that are technically not weak like "konen" be marked?

6. Why is tzade always ts not tz? Nobody spells the language like that. Nor does anyone use kh for heyt, altho if there is a /X/~/X\/ separation in the orthography, I get having a /?/~/?\/ separation too, using ' and `, respectively. Maybe there should be precise and common transliterations? —This unsigned comment was added by Zontas (talkcontribs).

@Zontas: Re #2: For what it's worth, the Japanese actually use Romaji. I agree that we should have redirects for these, though, or pages which read something like "Latinization of [term]" and which are maybe edit-protected. —Justin (koavf)TCM 22:38, 13 July 2016 (UTC)
Yeah, but the Goths never romanized their language and how would anyone find a word without a keyboard dedicated to Hebrew? —This unsigned comment was added by Zontas (talkcontribs).
I don't know that Gothic has ever been published in anything besides the Latin script. Having a physical keyboard is becoming less and less important; mobile screen keyboards are easily changed, and virtual computer keyboards can be installed on modern operating systems.--Prosfilaes (talk) 23:08, 25 July 2016 (UTC)
I doubt that most people who are looking up Hebrew words know romanizations well enough to know what to type in the search box. This is especially true since there are a ton of different romanizations. I'm not Jewish, but I grew up in a neighborhood with a large Jewish population. I remember seeing "Happy Hannukka|Hanukkah|Chanukah|Hanukah|Chanukka"- and the Hanukkah entry has a dozen more variations. As to how people look things up: I suspect the most common method for Hebrew would be copypasting from a document or a web page. Either that, or they use the Hebrew keyboard that's offered in that little dialog that shows up in the lower-right corner when you type into any Wikimedia input field- including the search box. Chuck Entz (talk) 02:16, 26 July 2016 (UTC)
My answer:
  1. That's not quite true. Most of the time the accent is predictable (in native words), but in many cases it is not. Minimal pairs include תואר \ תֹּאַר ‎(tó'ar, title) vs. תואר \ תֹּאַר ‎(to'ár, he was described) and בָּנוּ ‎(bánu, in us, also they understood) vs. בָּנוּ ‎(banú, they built). I agree that including accents on all words is superfluous, it would have been better to mark accents only when not on the final syllable. But marking the accent on all words is what was decided upon before my time here.
  2. What do you mean by "are often understood"? Gothic and Japanese each have their own peculiarities that make romanizations desirable as entries. In both cases there were votes in order to allow them to be treated as exceptions. Our default policy is that romanizations are not words and do not deserve entries and I don't think Hebrew is special enough to be treated differently. Searching can be solved by using the MediaWiki software keyboard, various online keyboards, copy/pasting, enabling the Hebrew keyboard that you may not have known came with your computer, etc.
  3. Are you referring to the usage note we include? I agree that they are too common for this to be necessary. I don't add them, but I don't really have a problem with others adding them.
  4. I think our conjugation tables mention that. If not, my new conjugation module that I'm working on should address the issue. But remember that our Hebrew entries cover Hebrew from all periods, including Biblical, Mishnaic, Medieval, and Modern.
  5. I don't understand what you mean.
  6. Firstly, keep in mind that American Jews' transliterations of Hebrew words are not Hebrew. People that actually write in Hebrew do so in Hebrew letters. I also have advocated a dual romanization system for Hebrew -- a scholarly one and a modern one -- which I use sometimes in various places. We use ts and kh because that was the common practice before my time here; theoretically, it makes more sense, even though it looks weird to us American Jews who are used to tz and ch. For kh, there is also the added benefit of leaving ch free to indicate the צ׳ sound (as in צ׳יק צ׳ק ‎(chik chak)).
--WikiTiki89 00:35, 14 July 2016 (UTC)
WT:REDIRECT advises against redirects and they can be confusing, especially for a Latin to a non-Latin script. Also you'll get collisions where two words, say Chinese and Hebrew, have the same romanisation and an entry title can't redirect to both. Thirdly there are often rival romanisation schemes for languages; Chinese has loads, doesn't it? So one word might have seven different romanisations. Bear in mind we have a search function that will pick up romanisations in an entry's text. Renard Migrant (talk) 14:40, 15 July 2016 (UTC)
  1. Yeah, I guess the accent isn't completely useless, but I suppose the answer is to limit usage, not remove or overdo it.
  2. I concede regarding romanization redirects, I guess a better idea would be adding Hebrew words to "in other scripts" sections and having two official transliterations on here (precise/ Biblical and colloquial/ Modern).
  3. I mean, it wastes space and isn't exactly helpful. But it's a low priority, my only goal is to make it not mandatory.
  4. Ahhh, I completely understand. Tho, the tables mostly mention it being rare rather than obsolete.
  5. Konen is a verb in pi'el that has an odd conjugation (as you can tell by the dictionary form), but it's not remarked upon for some reason. I suppose we should start marking it and other odd verbs like yakhol and lamad.
  6. Most people who write Hebrew do use the Hebrew Script of course, but if they can't for some reason, they use Latin; And in Latin I've never seen ts used for tzade even on other Wikimedia, and kh is mostly used for khaf rather than heyt. It's common here, but about nowhere else. Plain c, which is unused, could be used for /tS/, with ch and ` becoming the (commonly enough) pharyngeal consonants. —This unsigned comment was added by Zontas (talkcontribs) at 15:32, 15 July 2016.
(4) I would say calling them rare rather than obsolete is accurate. (5) כּוֹנֵן is not irregular; it is simply a member of the sub-binyan po'el of pi'el, just like רָץ is a member of a sub-binyan of pa'al. (6) Transliterations are not meant to represent how these words are written when the Hebrew script is not available. Transliterations are meant to aid our readers in reading our entries when they don't know the script. --WikiTiki89 18:02, 15 July 2016 (UTC)
Re #6: It could be argued that using more standard transliterations makes it easier for naive users to read. Personally ts makes more sense to me than tz but I agree that tz is more common. Benwing2 (talk) 18:33, 15 July 2016 (UTC)
4) I mean, I do concede it's to mark all eras of Hebrew. 5) Technically weak roots are not irregular, but that's part of what I was referring to. Anywho, my main issue is that some sub-conjugations were never marked as having unique forms, like konen. 6) Fair enough about not using redirects, tho I still think we should revise our main transliteration system to match common trends, and add another for precise transliteration. There should be a Hebrew-to-Latin index. Also, what are your opinions on my responses to #1-#3? --Zontas (talk) 20:28, 16 July 2016 (UTC)
כּוֹנֵן does not have "unique" forms. There are many verbs in this sub-conjugation. Perhaps we should categorize these sub-conjugations, but that would be difficult for some of the confusing pa'al subgroups. --WikiTiki89 14:46, 18 July 2016 (UTC)

New Spanish verb template backend

Template:es-conj-güir

Automatically highlights and adds categories (in the main namespace) for any irregular forms. Can export a JSON representation of inflected forms:

I've only added it to two of the existing templates (I will need to go through the existing uses to look for unexpected parameters). Any feedback before I go further? DTLHS (talk) 00:12, 14 July 2016 (UTC)

Also I'd like to get rid of all the individual templates and move to three templates (for -ar/-er/-ir verbs) with the pattern as the first parameter, unless there are objections. DTLHS (talk) 00:22, 14 July 2016 (UTC)
@DTLHS: Is it necessary to even have three? Could one suffice? Or maybe one for -ar and another for -er/-ir? —Justin (koavf)TCM 00:50, 14 July 2016 (UTC)
It doesn't really matter to me. But it should either be 1 or 3, the -er / -ir paradigms are different. DTLHS (talk) 00:54, 14 July 2016 (UTC)
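To illustrate the idea discussed above (flagging irregular forms and exporting a JSON representation), here is a minimal sketch in Python rather than in the module's own language. It is not DTLHS's implementation: the names `regular_present`, `irregular_slots`, and `export_json`, the slot labels, and the JSON layout are all assumptions made for this example, and it covers only the present indicative. It flags a form as irregular when it differs from what the regular -ar/-er/-ir paradigm would predict.

```python
import json

# Regular present-indicative endings; -er and -ir differ in the 1p and 2p slots.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}
SLOTS = ["1s", "2s", "3s", "1p", "2p", "3p"]  # hypothetical slot labels


def regular_present(infinitive):
    """Return the forms a fully regular verb would have."""
    stem, verb_class = infinitive[:-2], infinitive[-2:]
    if verb_class not in PRESENT_ENDINGS:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive!r}")
    return {slot: stem + ending
            for slot, ending in zip(SLOTS, PRESENT_ENDINGS[verb_class])}


def irregular_slots(infinitive, attested):
    """List the slots whose attested form differs from the regular one."""
    expected = regular_present(infinitive)
    return sorted(slot for slot, form in attested.items()
                  if expected.get(slot) != form)


def export_json(infinitive, attested):
    """Serialize the forms plus the irregular slots (assumed layout)."""
    payload = {
        "infinitive": infinitive,
        "forms": attested,
        "irregular": irregular_slots(infinitive, attested),
    }
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


pensar = dict(regular_present("pensar"),
              **{"1s": "pienso", "2s": "piensas", "3s": "piensa", "3p": "piensan"})
print(export_json("pensar", pensar))
```

The same comparison would drive both the highlighting and the categorization. Infinitives with an accented ending (such as reír) are rejected by this sketch and would need their own handling.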

Appendix:List of protologisms/non-English

Right, I admit that I'm one of those people who thinks that having a page like this is essentially counterproductive, but I accept the notion that it's better to have an appendix like this than having to rid Wiktionary of protologisms on a daily basis. However, edits by one user to the Romanian section are prompting me to take action.

My reasons:

  • (1) a vast majority lack English definitions – last time I checked, we're still in the English Wiktionary. I'm uncomfortable having a disproportionately long list of made-up words lacking definitions which users of this site can understand;
  • (2) an overwhelming percentage of these words are ludicrous – they don't have English equivalents and they have 0 hits on Google, Google Books, social media etc.;
  • (3) most of these terms do not fulfil even basic criteria for inclusion found here:
    "[…]should meet an expressive need" – most don't.
    "follow some logic in their etymology" – not in most cases.
    "follow standards of spelling, intonation, and pronunciation in the language" – unfortunately in absurdum.
    "and should be ideally "catchy" enough to have a chance of gaining wider acceptance." – strongly no, considering that they were deleted in other Romanian projects (for instance Wikibooks) for being absurd.

I'm not going to take action unless I have a mandate to do so by the community. --Robbie SWE (talk) 11:42, 15 July 2016 (UTC)

Zero hits isn't an issue for a protologism. I agree that the page (i) should have definitions in English and (ii) isn't very useful to anybody anyway. Equinox 13:28, 15 July 2016 (UTC)
I don't see why we should be cleaning up the garbage dump. A garbage dump is supposed to be filled with garbage. --WikiTiki89 14:12, 15 July 2016 (UTC)
The only real rule I'd like implemented is that anything attestable should be removed. Like Wikitiki89 says it's a rubbish dump and by the way that was its original purpose as well, to discourage people putting them in the main namespace. As for formatting, surely formatting lists of words that don't exist by very definition has to be the lowest of all priorities. But anyone wanting to do it I'm not going to try and stop them. Renard Migrant (talk) 14:37, 15 July 2016 (UTC)

Ok, I see what you guys are saying and I've arrived at the same conclusion – a dump is a dump, no point in trying to organise it. Thanks for the input though! --Robbie SWE (talk) 11:51, 16 July 2016 (UTC)

Vote: Adding PIE root box

FYI, I created Wiktionary:Votes/2016-07/Adding PIE root box. Let us postpone the vote if needed, that is, as long as the discussion requires. --Dan Polansky (talk) 18:28, 15 July 2016 (UTC)

Old English Long Vowels/ Wynn/ Edh/ Orthography

I get keeping thorn and æsh around as they represent unique phonemes, but given the way we are archiving the words, wouldn't it make more sense to use <w> not wynn, and <þ> not <ð>, and also to use macrons or acute accents to mark long vowels? Old English writing was semi-chaotic, granted, but most of it is archived in a pseudo-modernized uniform script, and it would be easier to find with some consistency. Not to mention we should stop posting runic words, as OE abandoned the script relatively early.

--Zontas (talk) 21:52, 15 July 2016 (UTC)

  • Hmm, I thought edh and thorn represented distinct phonemes? Or is that just for Icelandic? ‑‑ Eiríkr Útlendi │Tala við mig 22:10, 15 July 2016 (UTC)
  • We already use w and þ primarily, as that's what dictionaries use generally. But we have a policy to allow all attested words, in the representation they were written in (as far as Unicode can represent it). So that also leaves room for the letter wynn, ð and runes. However, they should direct the user to the normalised spelling rather than being a main entry in themselves. —CodeCat 22:27, 15 July 2016 (UTC)
@Eirikr edh and thorn are distinct in Icelandic but not OE, where they both represent a sound that was voiced when between voiced sounds and not doubled, and voiceless elsewhere. Benwing2 (talk) 23:15, 15 July 2016 (UTC)
But that also applied, at least, to Old Norse. The difference is in the origin and distribution of [ð]: in Old English it derives exclusively from voicing of [θ], while in Old Norse it derives from that and also from Proto-Germanic [ð]. I believe even in modern Icelandic the two are in complementary distribution, but I'm not sure. —CodeCat 23:34, 15 July 2016 (UTC)
Actually, in Old English [ð] sometimes comes from frication of [d]. Also, I didn't know Proto-Germanic had a [ð] other than a voiced [θ]. --WikiTiki89 00:06, 16 July 2016 (UTC)
It did, but as an allophone of /d/ much like in Spanish. —CodeCat 01:04, 16 July 2016 (UTC)
That still doesn't mean /ð/ is a phoneme. It's predictably an allophone of /d/ and /θ/. Also, I get allowing original script redirects over main entries, but nobody has mentioned whether my idea of marking long vowels (and now that I think about it, soft c and g) is to be used. --Zontas (talk) 19:54, 16 July 2016 (UTC)
@Zontas We do support using macrons for long vowels in links and such. The normal convention here (and also for stress marks in Russian, and similar things in other languages) is that the name of the entry itself doesn't have a macron in it, but the headword does, and links should also, and the macron will automatically be stripped out when generating the underlying link. See lad#Old English for an example. If I link to it as lād or lād, the link works correctly. Benwing2 (talk) 20:06, 16 July 2016 (UTC)
I still don't get why we can't use diacritics in the name. The length isn't just a rare thing, it's as common as it is in Latin. --Zontas (talk) 20:28, 16 July 2016 (UTC)
We don't show it in Latin, either. Chuck Entz (talk) 20:33, 16 July 2016 (UTC)
The reason we don't add such diacritics is that we put the entry down as attested. So the page is chosen by the spelling actually used. If the original manuscript has the diacritics, you can make an entry at a page with the diacritics; if the diacritics are only a scholarly annotation, they only get added on the page where apt, but the pagename itself will be one without diacritics. Korn [kʰũːɘ̃n] (talk) 22:05, 16 July 2016 (UTC)
It's different with normalised spellings though, and most old Germanic languages use normalised spellings on Wiktionary. Technically, Old English terms are never attested with the letter w (or infrequently, I'll let someone who knows more clarify that), but we have entries with w regardless because the ƿ > w normalisation is usual and normal in dictionaries, grammars and republished OE texts, and we follow this custom. If we went strictly by attestation requirements, we could never have these entries but would be required to spell them with ƿ. So technically, if we allow these changes, and propagate them to page names as well, there's nothing in principle against doing the same for macrons too. Consider that the acute accents for long vowels are part of Old Norse page names, despite not appearing in manuscripts either. —CodeCat 22:13, 16 July 2016 (UTC)
Huh. I didn't know this distinction exists and I do not agree with it. But since it doesn't affect me, I won't complain much about it either. Korn [kʰũːɘ̃n] (talk) 23:41, 16 July 2016 (UTC)
Also, note that, for example, for Hebrew, Arabic, and Russian, diacriticized spellings are attestable but we do not allow them as entry titles. --WikiTiki89 14:44, 18 July 2016 (UTC)
As for Old Norse, I assume we include acute accents (long marks) in page titles because in modern Icelandic the long marks are mandatory (is that correct?), and modern Icelandic is spelled almost identically to Old Norse. Cf. the mandatory long marks in Latvian, which appear in page titles. Benwing2 (talk) 02:54, 21 July 2016 (UTC)
  • Most of the entries do use <w> and <þ>. I think that's even what it says at WT:AANG. To be honest it doesn't make much difference since in an ideal world all the attested versions would be covered by =Alternative forms= in any case. I do not agree with using macrons in page titles though. Many of my scholarly editions of OE texts don't have them, for the good reason that they're not in the manuscripts. Ƿidsiþ 16:48, 28 July 2016 (UTC)
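The convention Benwing2 describes above (macrons shown in headwords and link text, but stripped when the underlying page name is generated) can be sketched as follows. This is only an illustration in Python, not the code the site actually runs; the function name `strip_macrons` is an assumption for this example.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(display_form):
    """Map a display form with long-vowel marks to a macron-free page name."""
    # Decompose so that e.g. "ā" becomes "a" + COMBINING MACRON,
    # drop the macron, then recompose whatever is left.
    decomposed = unicodedata.normalize("NFD", display_form)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


print(strip_macrons("l\u0101d"))  # display form "lād" links to page "lad"
```

Only the macron is removed, so letters such as þ, ð and æ pass through unchanged. A language whose page names keep their length marks (the Old Norse acute accents mentioned above) would simply not apply this step.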

Advice for a Sanskrit pronunciation module

Howdy all! I've been working on a Sanskrit pronunciation module similar to those available in Latin and Ancient Greek, and I would love some advice! The temporary template may be found at {{User:JohnC5/sa-IPA}}, a sandbox at User:JohnC5/Sandbox2, and the module at mod:User:JohnC5/Sandbox2. I've finished the basic conversion to IPA, the syllabification, the rudimentary anusvara rules, and some chronolect handling (Vedic and Classical). I was hoping someone could take a look to see if it all makes sense. Also, please suggest what needs to be added (like Abhinidhāna), what needs to be fixed, how it should look, and anything else that comes to mind. It's also possible that this is completely unnecessary and should not have been attempted in the first place (I don't think so, but I'm open to discussion). I feel like a lot more work needs to be done, but I don't yet know what the end state should be nor the acceptance criteria. Thanks! —JohnC5 07:05, 17 July 2016 (UTC)

A possible issue is that we use the unattested base stem as the lemma for nominals, rather than any of the case forms. So there shouldn't really be any pronunciation on those entries. —CodeCat 12:49, 17 July 2016 (UTC)
@CodeCat: I had been wondering about that problem. For nominals, this issue really only affects the desinence, and I would probably suggest using the (masculine) nominative singular as the exemplar for pronunciation. For verbal root entries, I certainly would recommend against this template's usage, but for the 3rd singular entries (e.g. गच्छति (gacchati)), this should be fine. Does this solution work? —JohnC5 17:56, 17 July 2016 (UTC)
Speaking of गच्छति (gácchati), the module currently produces /ɡə́t͡ɕ.t͡ɕʰə.t̪i/. The realization of च्छ (ccha) seems very wrong to me. Abhinidhāna would say that the first plosive becomes unreleased. In the case of an affricate, does this mean the result would be /ɡə́t̚.t͡ɕʰə.t̪i/ and the use of च (ca) is merely a spelling convention? —JohnC5 19:06, 17 July 2016 (UTC)
You could avoid the issue entirely by using ":" instead of doubling the sound. Chuck Entz (talk) 02:36, 18 July 2016 (UTC)
@Chuck Entz: I've already implemented the logic for the abhinidhāna (it's actually not bad). Also I'm not clear how ":" would be used when the cluster is heterosyllabic. Thank you for the advice in any case! —JohnC5 02:48, 18 July 2016 (UTC)
The thing is that it's not exactly [t], because it has the same point of articulation as the following affricate release. That's probably why cc was used to represent it rather than tc. —CodeCat 15:30, 18 July 2016 (UTC)
@CodeCat: Just to make sure there's no confusion, the beginning of the Sanskrit alveolo-palatal cluster /t͡ɕ/ differs from the dental /t̪/ already. So the theoretical difference between त्छ ‎(tcha) (which is both unattested in MW and phonologically impossible) and च्छ ‎(ccha) would be /t̪̚.t͡ɕʰ/ vs. /t̚.t͡ɕʰ/, respectively. Are you, instead, proposing a palatal stop as the form? This would be something like /c̚.t͡ɕʰ/ for च्छ ‎(ccha) and /ɟ̚.d͡ʑʱ/ for ज्झ ‎(jjha). Is that what you are saying? —JohnC5 17:43, 18 July 2016 (UTC)
@CodeCat: Do you think the default display chronolect should be Vedic or Classical? Currently, I have it as Vedic, but that's because I like PIE. —JohnC5 02:31, 18 July 2016 (UTC)
Classical, for sure. It's what is normally taught, what people worldwide will be familiar with. —CodeCat 15:28, 18 July 2016 (UTC)
Can't we display both, similar to what we do for Ancient Greek? --WikiTiki89 15:35, 18 July 2016 (UTC)
@Wikitiki89: I've updated the code so that if the Vedic and Classical pronunciation differ, both are displayed with an arrow like in {{grc-IPA}}. You can see that at User:JohnC5/Sandbox2. Is that sufficient? —JohnC5 17:47, 18 July 2016 (UTC)
Looks good! --WikiTiki89 17:50, 18 July 2016 (UTC)
What are the differences between Vedic and Classical pronunciation that you're showing? I thought Classical Sanskrit had a stress accent that was placed in accordance with a rule similar to that for Latin (except that it can go back further than the antepenultimate), so I was expecting /ˈtɕən̪d̪ɽə/ for चन्द्र. Shouldn't कार्त्स्न्य have /ɑː/ (with long mark) in both Vedic and Classical? What about words whose Vedic scansion reveals one more syllable than is written (e.g. /kɑːɽt̪sniə/ for कार्त्स्न्य or /ukt̪uɑː/ for उक्त्वा – I don't know if those particular words belong to the class I'm talking about, but they illustrate the principle)? Does the system have a way of accommodating them? And don't some scholars believe intervocalic laryngeals were still around in Vedic, so that ā for example might sometimes be /əʔə/ or /ɑːʔə/ or /əʔɑː/ or /ɑːʔɑː/? Your sandbox doesn't seem to have any examples of word-final visarga; how would the module transcribe चन्द्रः? I'd expect /tɕən̪d̪ɽə́h/ in Vedic and /ˈtɕən̪d̪ɽəhə/ in Classical. —Aɴɢʀ (talk) 18:14, 18 July 2016 (UTC)
(edit conflict) @Angr: Thanks for all the comments. As mentioned before, feel free to fiddle around in the sandbox and add things. I'm less familiar with the Classical accent; do you have more description on that matter? I had been trying to figure out some of the vowel issues. One of the main distinctions between Vedic and Classical vowels is the change of ऐ (ai) & ए (e) from /ɑːj/ & /ɑj/ ~ /əj/ to /ɑj/ ~ /əj/ & /eː/ (the same is true for औ (au) & ओ (o)). I was unsure what to do about the /ɑj/ ~ /əj/ decision (does it vary, or do scholars disagree?). Also, do we prefer /ə/ over /ɐ/, since the latter shows up as well? Any guidance on this matter would be greatly appreciated. You're obviously right that from a metric standpoint /ɑː/ must remain long. For the Vedic scansion matters, is there a way to predict it, or is it merely on an anecdotal basis? I can add in some functionality around that (whether the user must specify the variation, or the template generates it automatically), but I need more information. For the Vedic laryngeals, I'm not sure how best to approach this: again, the user would need to specify the specific vowel, perhaps similarly to how the Vedic accent is currently specified (with |a=N for the first word where N represents the syllable on which the stress occurs; |a2=, |a3= etc. for subsequent words). For the visarga, it was my impression that the vocalic reduplication around the visarga (/tɕən̪d̪ɽə́h/ to /ˈtɕən̪d̪ɽəhᵊ/) differed between Śākhās. I'm not saying we shouldn't list several different Śākhās' pronunciations, but I wasn't sure where Classical pronunciations fell. Thanks for reading all these questions. I didn't start this module because I claimed to know Sanskrit phonology particularly well; I did it because I thought it could and should be done and that people would tell me when I make mistakes. —JohnC5 19:35, 18 July 2016 (UTC)
@Angr: I've added the Classical accent. Could you check it? —JohnC5 03:35, 19 July 2016 (UTC)
@JohnC5 I don't have answers to the questions I asked. I thought that stress receded to the rightmost long vowel (excluding the final syllable) and fell on the first syllable if all syllables (excluding the final syllable) had short vowels, so that svataṃtraḥ and aupadraṣṭrya would be stressed on the first syllable, but I'm not positive that's right. Wikipedia doesn't say anything at all about post-Vedic accent, and all I have to go on is my memory of the Sanskrit class I took as an undergraduate more than 25 years ago. So please don't interpret my comments above as "This is how things are, you should accommodate them" but rather "Here's something that might bear looking into, but I'm not sure of the details at all". —Aɴɢʀ (talk) 11:55, 19 July 2016 (UTC)
@Angr: Based on everything that I've read, the Classical accent is like the w:Dreimorengesetz but extended to include the preäntepenult. So starting at the penult and moving leftward, search for the first heavy syllable unless you find the left edge of the word or the fourth-to-last syllable. I've added the vowel echoing around the visarga. I'm still not sure what to do about the laryngeals and alternate syllabification. Perhaps that can wait? Is there anything that you think must be changed before this goes into production? —JohnC5 15:10, 19 July 2016 (UTC)
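The accent rule as JohnC5 states it in the comment above can be sketched as below. This is an illustrative Python reading of that description, not the module's code: the function name `classical_stress` and the representation of a word as a list of syllable weights (True for heavy) are assumptions for this example, and Angr's recollection of the rule differs slightly.

```python
def classical_stress(heavy):
    """Return the 0-based index of the stressed syllable.

    `heavy` holds one boolean per syllable (True = heavy). Starting at the
    penult and moving leftward, the first heavy syllable is stressed; the
    search stops at the fourth-to-last syllable or the left edge of the
    word, which takes the stress if nothing heavy was found.
    """
    count = len(heavy)
    if count == 0:
        raise ValueError("a word needs at least one syllable")
    if count == 1:
        return 0
    limit = max(0, count - 4)  # fourth-to-last syllable, or the left edge
    for index in range(count - 2, limit - 1, -1):
        if heavy[index]:
            return index
    return limit


# gac.cha.ti: heavy, light, light
print(classical_stress([True, False, False]))
```

Deciding which syllables are heavy (long vowel, diphthong, or closed syllable) is left to the caller here; in a real module that would come from the syllabification step.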
@JohnC5: I notice you're using /x/ for the visarga; is it really that and not /h/? As for laryngeals and alternate syllabification, we probably don't want them to be generated automatically from the spelling, but maybe the {{sa-IPA}} template could take a parameter like altved=pāat or altved=uktuā that would allow pronunciations not reflected by the spelling to be listed as alternative Vedic pronunciations. —Aɴɢʀ (talk) 10:13, 20 July 2016 (UTC)
@Angr: You are right about the visarga; it took me a while to find a good source for it though. In the case of alternative syllabification, I'm wondering about the vowel hiatus, which does occur in Sanskrit, but rarely. For uktuā, do we assume a homorganic glide to be inserted (/ˈuk.t̪u.ʋɑː/), or pure vowel hiatus (/ˈuk.t̪u.ɑː/), or a glottal stop (/ˈuk.t̪u.ʔɑː/)? Also, can we detect the distinction between a Vedic laryngeal and a Vedic resyllabification, or does the user have to insert a glottal stop like altved=pāʔat? —JohnC5 15:23, 20 July 2016 (UTC)
@JohnC5: I think all we know for sure is that a word could be spelled उक्त्वा and scan as three syllables in the Veda; whether it was realized as /ˈuk.t̪u.ʋɑː/, /ˈuk.t̪u.ɑː/, or /ˈuk.t̪u.ʔɑː/ is probably not really knowable at this point. Likewise all that's known for sure is that some instances of ā scan as two syllables, and comparative evidence shows that these must have been *aHa or *aHā or *āHa or *āHā in PII, but how exactly they were realized in Vedic is again probably not really knowable at this point. I don't know whether vowel hiatus ever occurs when it wasn't due to a laryngeal. I almost regret bringing these issues up now, since I know so little about the details that would help in resolving them. —Aɴɢʀ (talk) 15:38, 20 July 2016 (UTC)
@Angr: Vowel hiatus does occur in other positions, as this paper discusses. I think that we ignore the laryngeals and alternate syllabifications, however, until such time as we find examples of these two phenomena and need to mark them. I'll gladly add the functionality, but it seems far too amorphous at the moment. I hope that someone better informed than we will come to tell us all the hottest new research in Vedic phonology, but for now, we're fine. I'll switch the visargas over to /h/ soon.
On a different note, I'm currently representing the anusvara phonemically as a nasalization of the preceding vowel before ś, ṣ, s or h and a homorganic nasal before stops, but I believe the true phonemic representation is /m/ at a morpheme boundary and /n/ morpheme-internally. The issue then becomes how to tell where morphemes end within heteromorphemic words. Should I just declare all anusvara before a space to be /m/ and all others to be /n/? Also, there's evidence that, along with nasalizing the preceding vowel, the anusvara preceding ś, ṣ, s or h lengthened the vowel too. This is fine for short vowels, but works less well for long vowels (especially once you get into Classical when the w:Pluti vowels disappeared). Should I 1) only lengthen the short vowels 2) lengthen both the short and long vowels or 3) ignore the lengthening altogether since it is not well understood? Thanks for all your commentary thus far! —JohnC5 18:33, 20 July 2016 (UTC)
I don't really like the arrow notation. I prefer it to always show with labels of what each pronunciation represents. —CodeCat 19:05, 18 July 2016 (UTC)
Also, a few other points:
  • I don't think short a was fully central, but more open, perhaps [ɐ], at least in Vedic, and long ā was still a long vowel. I don't have a source, but it seems fairly likely. In the same vein, I'd say e and o were probably [ɐi̯] and [ɐu̯] in Vedic.
  • Your pronunciation also seems to treat as phonemic details that weren't. The pronunciation changes of visarga before labials and velars in Vedic was not phonemic.
  • I suspect that the Vedic transcription of त्रैंश ‎(traiṃśa) is wrong, the -ai- was probably bisyllabic.
  • औपद्रष्ट्र्य ‎(aupadraṣṭrya) may be syllabified wrong, I'd expect -dr- to be entirely in the next syllable.
  • Where does the nasal /j/ in Vedic कार्त्स्न्य ‎(kārtsnya) come from?
  • The resolution of ṛ into ri was post-classical, and actually differed by dialect. Some dialects have ru or ra instead. So for Sanskrit proper, a syllabic sonorant should still be used.
Finally, I hope that there is a parameter to disable Vedic transcriptions. For certain words, the Vedic equivalent may actually have a different spelling, so listing a Vedic pronunciation would be wrong then. —CodeCat 19:23, 18 July 2016 (UTC)
Should I just switch to Classical then for the default display?
  • As in my response to Angr, I had been curious about that. I'm perfectly happy to use /ɐ/. Do you also think I should use /i̯/ and /u̯/ over /j/ and /w/?
  • You're quite right about the [ɸ] being phonetic. I'll need to add separate sections for those. What would you say the underlying phoneme of the visarga is? Whenever notated as a visarga, it is [x] or [ɸ], but it is just the allophone of /s/. Should it be that?
  • What would be the method of determining the bisyllabicity of such words?
  • I was unsure about this. The Weerasinghe-Wasala-Gamage method of syllabification makes special cases for /-.CrV-/ and /-.CyV-/ in all other cases except /VCCV/, which it always interprets as /VC.CV/. It does seem more sensible, however, to keep the same rules as in the other cases.
  • William Sidney Allen's Phonetics in Ancient India mentions Vedic turned /m/ + /j/, /l̪/, or /ʋ/ as giving /j̃j/, /l̪̃l̪/, or /ʋ̃ʋ/ respectively. He then says that this occurs only once in Classical and it affects an /n/. First of all, I realize this should be a phonetic rule again. Also is it only /m/ and /n/ or all nasals? It also makes sense that this only applies in the environment of /VN.[jl̪ʋ]V/.
  • That rule was borrowed from w:Vedic_Sanskrit_grammar#Phonology. I'll remove it if you think it is necessary.
  • The ability to turn off the Vedic seems prudent, and I will add it. Also, if you'd like to help coding this (or fixing my bad code), please do! —JohnC5 19:56, 18 July 2016 (UTC)
Also, I assume that abhinidhāna would be a phonetic change, not phonemic. —JohnC5 20:17, 18 July 2016 (UTC)

As for what to do with geminate consonants, in Russian we use Cː and syllabify as if it's a single consonant, i.e. if between vowels it forms the beginning of a syllable and is joined with the following vowel. This isn't perfect but it deals with affricates well. The alternative I think is to write e.g. /t.t͡ɕʰ/ (or maybe /t̚.t͡ɕʰ/), which I think will be interpreted correctly as a long affricate even if technically it might mean something else; to me, something like /c̚.t͡ɕʰ/ looks really weird and is likely to be misinterpreted. Benwing2 (talk) 02:50, 21 July 2016 (UTC)

@Benwing2: So, to be clear, you think the current solution is sufficient? 03:34, 21 July 2016 (UTC)
@JohnC5: Sorry, I missed your reply. I don't like the way you currently write IPA: /ˈɡɐt͡ɕ.t͡ɕʰɐ.t̪i/, [ˈɡɐt̚.t͡ɕʰɐ.t̪i], I think it should say IPA: /ˈɡɐt.t͡ɕʰɐ.t̪i/, [ˈɡɐt̚.t͡ɕʰɐ.t̪i] or maybe IPA: /ˈɡɐ.t͡ɕʰːɐ.t̪i/, [ˈɡɐt̚.t͡ɕʰɐ.t̪i]. Benwing2 (talk) 05:04, 27 July 2016 (UTC)
@Benwing2: Aha, that makes sense to me now. I prefer your first alternative since it still denotes phonemically the presence of the underlying de-affricated stop. I'll look into implementing that soon. Anything else? —JohnC5 05:19, 27 July 2016 (UTC)
@JohnC5: The only other thing is that currently when you have a ˈ mark between syllables you leave out the . separating the syllables and you might want to include it, but that's your choice. Benwing2 (talk) 05:25, 27 July 2016 (UTC)
@Benwing2: I've made the change you requested for affricate clusters. What is the convention for stress accent markers and intersyllabic dots? {{la-IPA}} removes the dot, but {{grc-IPA}} does not. —JohnC5 14:35, 27 July 2016 (UTC)
@JohnC5 Presumably there's no convention, so you can pick what you want. I think it looks better with the dot but I'm not going to insist on it. Benwing2 (talk) 16:00, 27 July 2016 (UTC)

@CodeCat, Chuck Entz, Wikitiki89, Angr, Benwing2: So... are we ready to start adding this to entries? —JohnC5 03:13, 27 July 2016 (UTC)

I have no objection. —Aɴɢʀ (talk) 08:05, 27 July 2016 (UTC)

Template alternative case form of to show more intuitive description

I think {{alternative case form of}} should show one of the following, depending on the letter case:

  • Uppercase form of
  • Lowercase form of

Thus, it would no longer show this:

  • Alternative letter-case form of

Easy to implement in a module, I think.

Opinions? --Dan Polansky (talk) 13:25, 17 July 2016 (UTC)

I think this is very sensible. You are also correct that it is fairly easy to implement. —JohnC5 17:50, 17 July 2016 (UTC)
I agree with JohnC5--Dixtosa (talk) 18:55, 17 July 2016 (UTC)
There may be some instances where Lua cannot determine the case. But this should be very rare. DTLHS (talk) 18:58, 17 July 2016 (UTC)
What if the alternative casing is a mix? —CodeCat 19:05, 17 July 2016 (UTC)
Do you have an example in mind? For special cases, the template may be updated to accept a parameter telling it to display the old "Alternative letter-case form of".
Now looking at the proposal again, the "Uppercase form of" should probably be "Capitalized form of"; uppercase would be WORD rather than Word, I fear. --Dan Polansky (talk) 19:08, 17 July 2016 (UTC)
I would say, the template should normally display either "Capitalized form of", "Lowercase form of", or "Uppercase form of", as with Dan P; if the case is mixed somehow or other, it should default to either "Mixed-case form of" or just "Alternative letter-case form of" as before. Note that the "Capitalized form of" detector should be smart enough to allow both Foo-bar and Foo-Bar, i.e. it should treat hyphens as spaces and allow non-initial words to either be capitalized or lowercased, and it will probably need special-casing for Dutch words like IJsland and IJsselmeer. Benwing2 (talk) 20:53, 17 July 2016 (UTC)
I assume that the module could also detect camelcase forms and display accordingly. SemperBlotto (talk) 05:05, 18 July 2016 (UTC)
"Alternative letter-case form of" while wholly accurate is very confusing for people not familiar with such lexicographical terms. Took me a second in fact to work it out. Renard Migrant (talk) 12:43, 23 July 2016 (UTC)
Agreed that "alternative case" is a tad confusing. The capital/lowercase templates can display "form of" if necessary but shouldn't have the word "form" in the template name itself. DAVilla 06:08, 30 July 2016 (UTC)
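The detection logic proposed in this thread can be sketched as follows, in Python rather than Lua. It is only an illustration of the rules suggested above (hyphens treated like spaces, non-initial words either capitalized or lowercase, Dutch IJ treated as a single initial letter); the names `case_label` and `_is_capitalized_word` are assumptions for this example, and anything mixed falls back to the old wording.

```python
import re

FALLBACK = "Alternative letter-case form of"


def _is_capitalized_word(word):
    # Dutch digraph: "IJsland" counts as capitalized, so treat "IJ" as one letter.
    head_length = 2 if word.startswith("IJ") else 1
    head, rest = word[:head_length], word[head_length:]
    return head.isupper() and rest == rest.lower()


def case_label(term):
    """Pick the description to display for a letter-case variant."""
    if term.islower():
        return "Lowercase form of"
    # Require more than one capital so that a lone "A" is not called uppercase.
    if term.isupper() and sum(char.isupper() for char in term) > 1:
        return "Uppercase form of"
    words = [word for word in re.split(r"[\s-]+", term) if word]
    if not words:
        return FALLBACK
    first, others = words[0], words[1:]
    if _is_capitalized_word(first) and all(
        word == word.lower() or _is_capitalized_word(word) for word in others
    ):
        return "Capitalized form of"
    return FALLBACK


for example in ("Word", "WORD", "word", "Foo-Bar", "IJsland", "iPhone"):
    print(example, "->", case_label(example))
```

A camel-case or "Mixed-case form of" label, as suggested above, could be added as one more branch before the fallback; a manual override parameter would still be needed for the rare cases the script cannot classify.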

Lexicography at a Crossroads

Hey. Has anyone read Lexicography at a Crossroads: Dictionaries and Encyclopedias Today? I've been flicking through it, and Wiktionary gets loads of pages dedicated to it, especially the English and Spanish parts. It's pretty interesting, and they bring up several of our flaws. --Turnedlessef (talk) 23:27, 18 July 2016 (UTC)

Page 115: "...they also illustrate two important lexicographical implications. First, only trained English lexicographers can add and/or edit English entries, whereas these requirements are not necessary for working with Spanish ones." Page 119: "Wiktionary not only uses English as default language, but also offers much more data in the English entries than in the Spanish ones. This [...] goes against its democratic philosophy." Can't say I'm much convinced that these fellows have produced an analysis that is useful for us. Korn [kʰũːɘ̃n] (talk) 08:16, 21 July 2016 (UTC)
The author offers his idea of what a Wiktionary entry should look like on page 127. Korn [kʰũːɘ̃n] (talk) 08:30, 21 July 2016 (UTC)
I'm no trained lexicographer, but I edit and create English entries without difficulty, and I also hold the French entries I edit to the same standard as the English ones. The book was also written some time ago, meaning that much of its content is probably out of date. From what I saw, though, there were some valid problems addressed, but they are due mainly to the incomplete state of this project rather than its nature, and will hopefully improve with time (e.g. single translations in FL entries that do not accurately represent the word's range of definitions, such as using "business" to define negocio without any glosses to clarify its meaning—a problem that entry notably still has). Andrew Sheedy (talk) 03:01, 22 July 2016 (UTC)
Thanks for pointing out this article, but it is not that relevant to Wiktionary today. I started to list this kind of publication on Meta, so I invite you to add others and especially to take a look at the ones about GLAWI. Noé (talk) 13:42, 22 July 2016 (UTC)
I added a gloss to negocio. --Turnedlessef (talk) 08:31, 23 July 2016 (UTC)

Proposal for Gurjar Apabhraṃśa

I'd like to propose a code for the Gurjar Apabhraṃśa, the direct ancestor of Old Gujarati, which is given a grammatical sketch by Hemacandra and is also used in several texts of the era. The code I'd like to propose is inc-agu (and perhaps thus as a model for other Apabhraṃśas later on as inc-axx). I'd also like to have a name that is diacritic-less, but all the literature shows the version with diacritics. Any ideas? DerekWinters (talk) 22:16, 20 July 2016 (UTC)

Perhaps Gurjar Apabhramsha? —Aryamanarora (मुझसे बात करो) 19:07, 26 July 2016 (UTC)

A tremendous gathering

Hi, colleagues!

I am thrilled to announce the official creation of the Wiktionary Tremendous Group !

It arose from seeds planted at Wikimania, a month ago and it aims to be a common place to make Wiktionaries better, share our productions and thoughts about technological developments. We can also organize events together such as conferences and LexiSession, a fancy way to contribute together to the same topic during a short period of time. In August, we suggest focusing on cat! Another main goal is to increase our network with the other wiki projects. So, if this interests you, you are very welcome! We want multilingual discussions as much as possible, but my English is not very natural, so feel free to correct any mistake you see. Also, I am very inexperienced with team management and I have no idea how to make this project more attractive. I think a nice logo would be cool, but suggestions are welcome! It will be cool, so please join! Noé (talk) 01:13, 21 July 2016 (UTC)

I looked at the page, and I like what I see there. Maybe I'll join the group later, when I have more time and something to contribute (or maybe I could help you with translations from French into English, if you'd like). My French is much better than it was a year ago, but I imagine I've still made mistakes, and I would be very grateful if you (or someone else) corrected them. :) Andrew Sheedy (talk) 03:16, 21 July 2016 (UTC)
Thanks a lot for your help. I'll clean up the announcement, and I posted some comments about your French on your talk page, but it is better than my English, for sure! There is no obligation that comes with taking part; it is not a secret society, so feel free to visit once a month or less, that's fine! Noé (talk) 09:37, 21 July 2016 (UTC)

Category for employment

I have yet to find a category relating to employment, apart from Category:Occupations. Is there one? DonnanZ (talk) 17:04, 21 July 2016 (UTC)

Am I allowed to create a category for this? No objections? DonnanZ (talk) 12:32, 23 July 2016 (UTC)

You can't assume no answer to mean no objections. I for one have no idea what you're talking about or what would be in this category of yours. —Μετάknowledgediscuss/deeds 12:39, 23 July 2016 (UTC)
Sure. I can think of a few. Interview perhaps? Renard Migrant (talk) 12:41, 23 July 2016 (UTC)
Anything relating to employment which doesn't fit in the Occupations category. DonnanZ (talk) 16:23, 23 July 2016 (UTC)
It's definitely a significant hole in our category structure, but I'm not sure where to put it or what its limits are. Cat:Business is one possible parent. That has a subcategory Cat:Human resources, which is closer but probably not a good fit, since it's restricted to a management perspective. Cat:Occupations is another possible parent, but it's under Cat:People. That doesn't seem a good fit for most things relating to work as an activity.
As far as limits, I associate the term employment more with matters regarding whether one is employed and how one becomes employed (i.e., hiring and firing), rather than with the aspects of being employed. I wish we could use "work", but that's got so many other meanings it would be hard to keep from accumulating unrelated terms.
Would you include:
  1. job action, shop steward, strike?
  2. careerist, McJob, wage slave?
  3. blue-collar, clerical, front office, management, professional?
  4. double-dipping, golden parachute?
  5. pension, retirement?
What we do for a living is so basic a part of modern life that it bleeds into a variety of topics, and covers a lot of ground. Chuck Entz (talk) 18:38, 23 July 2016 (UTC)
Well, employment relates to people, so maybe it can be a subcategory of that? The category can also include labour relations.
Yes, probably all the terms you mentioned - I can also think of job interview, golden handshake, trade union and quite a few other terms. DonnanZ (talk) 16:00, 24 July 2016 (UTC)

label → lb

Wiktionary:Votes/2016-06/label → lb passed. Can someone do the honors and replace {{label}} with {{lb}} in all entries, please? --Daniel Carrero (talk) 18:44, 21 July 2016 (UTC)

Running now. —CodeCat 19:01, 21 July 2016 (UTC)
Thank you. --Daniel Carrero (talk) 17:53, 22 July 2016 (UTC)
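
For readers wondering what such a bot run involves, here is a minimal sketch in Python. It is not the code that was actually run; the function name rename_template and the sample text are hypothetical, and it assumes the rename only touches the template name and leaves every argument as it is.

```python
import re

def rename_template(wikitext: str, old: str, new: str) -> str:
    """Rename calls to template `old` into `new`, keeping the arguments as they are.

    Hypothetical helper for illustration only.
    """
    # Match "{{", optional whitespace, the exact old name, optional whitespace,
    # and require that "|" or "}}" follows. The lookahead keeps templates whose
    # names merely start with `old` (for example {{labelled}}) untouched.
    pattern = re.compile(r"\{\{\s*" + re.escape(old) + r"\s*(?=\||\}\})")
    # A function replacement avoids any backslash interpretation in `new`.
    return pattern.sub(lambda match: "{{" + new, wikitext)

sample = "# {{label|en|medicine}} A {{labelled}} thing."
print(rename_template(sample, "label", "lb"))
```

The sketch ignores details a real run would have to handle, such as template names written with a capital first letter, or calls inside nowiki or comment blocks.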

Guys, I don't know where this movement came from, but making all template names obscure cannot be helpful for someone who's learning how to contribute to the project or those who don't contribute often. When I see {{lb}} my first inclination is to question what the heck it means, and if the first character is an L or a one. I don't automatically think "label" (and frankly when I see "label" I don't automatically think the template necessarily belongs on the definition line either, which is why it was originally called {{context}} although that may have been too narrow). It's great if you feel using abbreviations is going to save you time, but how the heck is replacing every {{label}} template call with the equivalent {{lb}} going to advance the project? I take issue even with the popular {{usex}} which is now replaced by the even more obscure {{ux}} when {{example}} would have sufficed. If you really really need the name to be short, why not something like {{eg}} which actually means something outside of this little universe? And now there's a proposed vote to replace {{loan}} with {{bor}} apparently to save on that one single lonesome character at the expense of legibility. Am I missing something big here, like cross-project support, or is this not some sort of obfuscation contest? Has Wiktionary been taken over by programmers of assembly or some obscure language? Please, come to your senses and quit this nonsense! DAVilla 04:33, 29 July 2016 (UTC)

Exactly! If anything, the bot should be converting the abbreviations to the full forms! Andrew Sheedy (talk) 04:41, 29 July 2016 (UTC)
@DAVilla: About Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor — only 68 entries actually use Template:loan, while thousands of entries use {{bor}}. I'm more interested in converting {{borrowing}} and {{borrowed}} to {{bor}}, for consistency with {{der}} and {{inh}}.
For anyone interested, Wiktionary:Votes/2015-11/term → m; context → label; usex → ux is a rather long vote in which voters discussed why they voted support, oppose or abstain to 3 separate template abbreviation proposals. The 3 passed. There is a trend toward shorter template names, as indicated by that and other template renaming votes that passed.
Concerning {{cx}}, {{lb}}, {{context}}, {{label}}, none of the names is perfect in my opinion. I believe we usually use "context labels" in conversation. It is also a term that WT:EL uses, as per this 2009 vote. I was very happy to support @Metaknowledge's vote to deprecate {{context}} and {{cx}}, but it was not because of their names; it was because they required "lang=", while {{label}} and {{lb}} didn't. As said in the label → lb vote, template abbreviations often make it easier to read the code: in {{lb|en|medicine}}, the word "medicine" is more conspicuous than in {{label|en|medicine}} or {{context|en|medicine}}.
I know you oppose the template abbreviations, but I actually like your idea of using {{eg}} instead of {{ux}}. I created a redirect from {{eg}} to {{ux}}, so now we can use it if we want. I don't intend to create a vote proposing to change {{ux}} to {{eg}} in all entries because I wish to avoid making proposals that cancel each other too quickly. We just changed from {{usex}} to {{ux}}, and we can't say for sure yet that {{eg}} is going to catch on.
I think I was able to answer all the points you raised. Obviously, you don't have to agree with my opinion about the shortcuts. --Daniel Carrero (talk) 06:02, 29 July 2016 (UTC)
The obscure abbreviations are one of the things that I abhor in the en.wikt project. From a programmer's point of view, abbreviated terms always make code unreadable for anyone who is not an elite Wiktionary writer. It can be useful for a handful of templates that are used and repeated everywhere (typically {{l}}), but that's it. — Dakdada 12:09, 29 July 2016 (UTC)
I really disagree with something like {{lb}} being easier to read. Just now I was looking at your comment in wikitext and was thrown off by {{tl}} until I realized it's a different shorthand for "template". I don't think I'd ever use it. How would I remember that it's "tl" and not, say, "tm" or "tp"? At least for an abbreviation like {{abbr}} it's a standard in English. If switching to {{eg}} is something you and I can find agreement on, then I'd really prefer to halt these needless conversions until we have a roadmap and not a general trend set by scattered votes. Shortening names is fine if that's what the community wants, I just really disdain bizarre abbreviations like {{inh}} whose meaning I frankly had to pause to think about. (I wouldn't have guessed "inherited" if it wasn't grouped with the etymology templates.) In fact, I would rather have one letter abbreviations for the really common templates than something like {{cx}}.
What I'd like to see done is a comprehensive survey of which templates are most used and a brainstorm of short words like {{tag}} to be matched with them. I'm a little ignorant on other language Wiktionaries, but we should look for consistencies that we might want to maintain, although things like {{g|ender}} and {{t|ranslation}} I believe we already streamline pretty well. One-letter names must be reserved for uses that are extremely common. If after some discussion we decide that consistently 3-letter abbreviations are easiest to remember for etymology templates, then so be it. I'm not convinced that the 3-letter form is the best home for them, though. I'd rather have 3-letter forms as shortcuts which are expanded to something more legible. There has to be an understanding that what is easiest for the members of the community is not necessarily ideal for growing the community. Certainly I'm not suggesting that a contributor focused on etymology should have to type out {{borrowed}} every time he or she would want to use that template, but the shortcuts are listed right on the documentation for anyone who uses them that often. Everyone else, unfortunately meaning a lot of people who don't vote, I'm sure would be happier with a succinct, yes, but also recognizable character sequence. DAVilla 05:59, 30 July 2016 (UTC)
This effort seems to be the exact opposite of the course of action that we should be taking.
To me the logic of keyboard shortcuts like {{lb}} is to save keystrokes (time, carpal tunnel, etc) at input. What is subsequently seen in the edit window should probably be something more instructive to new or less frequent editors. Bots can give us the best of both worlds by periodically changing shortcut to long-form template names.
The possible advantages of displaying the shorter form in the edit window seem to be three:
1. The stored page is a little smaller and therefore a little faster to download etc. This seems like a small benefit.
2. The edit window shows a little more content. This also seems like a small benefit.
3. We could avoid some editors experiencing a disconnect between what they enter ("lb") and what eventually appears in the entry ("label"). This would be limited to those "in the know" about the shortcuts. I would think those in the know could survive the disconnect.
I think we had better be a bit more concerned about the learning curve for new editors if we are to benefit from broader participation and replace veteran contributors who lose interest in Wiktionary.
IOW, The entire effort to convert to storing and displaying short-form template names rather than long-form template names seems grossly misguided. Why is this so hard to see? DCDuring TALK 12:43, 30 July 2016 (UTC)
The rational contributor is going to use existing entries as models to follow rather than reading ELE or other such statute or regulation. From our existing entries, the rational contributor will pick things up real fast. From the very fact that the template occurs at the beginning of the definition lines and contains such arguments as "slang", "informal" or "geography", the rational contributor will figure things out even if the template gets renamed to "xyz". We should not underestimate the intelligence of the sort of contributors we are seeking. --Dan Polansky (talk) 13:04, 30 July 2016 (UTC)
Those persistent, motivated contributors are the ones we are already getting. How about getting some broader participation from those with special expertise (language or subject matter), but less persistence and motivation? DCDuring TALK 15:55, 30 July 2016 (UTC)
  • Chiming in to add my 2p in support of more descriptive wikicode. All this hyperabbreviation is obstructive and counterproductive. We need fewer barriers to entry, not more.
And what is the benefit of these obscure and excessively short code conventions? I see no advantage. ‑‑ Eiríkr Útlendi │Tala við mig 20:11, 2 August 2016 (UTC)
@Eiríkr Útlendi: I propose you post a late oppose at Wiktionary:Votes/2016-06/label → lb. To do this, indent your vote like ":#" and say "late oppose" or the like. Thus, we can have votes closed and yet track broader support-oppose information than that available at the time of closing. --Dan Polansky (talk) 20:22, 2 August 2016 (UTC)

On the distinction of {{ux}} vs. {{eg}}: This subject was discussed before, in Wiktionary:Grease pit/2014/February#Template for eg over usex like label over context. That's when people decided to use {{ux}} and not {{eg}}. For the record, I did not participate in that discussion and, as I said above, my preference is actually towards {{eg}}. --Daniel Carrero (talk) 06:29, 7 August 2016 (UTC)

take pity vs. take pity on

I'm not very familiar with creating English phrasal-verb entries. I created the entry take pity for the expression "take pity on", even though it (almost?) always occurs with "on", on the principle of using the shortest form that still preserves the meaning. I then created take pity on as a hard redirect; I know these are frowned on but I'm not sure what better thing to do. Benwing2 (talk) 05:44, 24 July 2016 (UTC)

Redirects for prepositions like this are ok (aren't they?) and there's also take pity upon which is amply attested. Still I'd move to take pity on and have the other two as alternative forms rather than redirects, with usage notes or context labels or whatever. Renard Migrant (talk) 16:57, 25 July 2016 (UTC)

Vote: Using template l to link to entries

FYI, I created Wiktionary:Votes/2016-07/Using template l to link to entries.

Let us postpone the vote as much as discussion requires, if needed at all. --Dan Polansky (talk) 06:59, 24 July 2016 (UTC)

Do we want a unified approach to manuscript forms?

Auec having been kept and noticing that mãdar exists, I wonder if it would be beneficial to treat all Latin script languages the same and just allow all the variants. English ones include vp and giuen. The question is genuine, i.e. do we want a unified approach or a per-language approach? NB having some sort of vote or 'consensus' allowing these doesn't mean we have to have bots creating them by the thousand, simply that those we have won't be deleted provided they meet WT:CFI#Attestation. Renard Migrant (talk) 16:45, 25 July 2016 (UTC)

Removing subsection headings from EL § Translations

I'd like to remove the 3 subsection headings of WT:EL#Translations without a vote. See diff.

Translation dos and don’ts
Translating words which are not lemmas in a target language
Translating words without an exact equivalent in the target language

Rationale:

  • I believe it to be just a cosmetic, nonsubstantial change and it does not change any regulations, hence doing it without a vote.
  • It seems the qualifier "Translation dos and don’ts" actually applies to the whole "Translations" section. There are rules like "ONLY add translations that you are CERTAIN of." and "NEVER use automatic translation software" that are outside the section "Translation dos and don’ts", but they qualify as translations dos and don'ts regardless.
  • "Translating words which are not lemmas in a target language" and "Translating words without an exact equivalent in the target language" are just repeating the beginning of each rule. We could just as well have a dozen sections like "Translations not in the Latin script", "Word-for-word translations of idioms", "Translations of taxonomic names", etc.
  • "Translations" is a H4 title; "Translation dos and don’ts" and the other two are H5 titles. In my Firefox, H4 and H5 appear to be the same font size, which does not look too helpful to me.

--Daniel Carrero (talk) 05:45, 26 July 2016 (UTC)

This is a really minor, nonsubstantial edit and people haven't objected so far so I did it in this diff. Feel free to comment or request the change to be undone. --Daniel Carrero (talk) 22:03, 28 July 2016 (UTC)

Appendix:Letters/English, Appendix:Letters/Portuguese

I'm pinging people who participated in Wiktionary:Beer parlour/2015/November#Separate, simplified pages for letters, almost one year ago: @Ungoliant MMDCCLXIV, CodeCat, Aryamanarora, -sche, Wikitiki89.

I created Appendix:Letters/English and Appendix:Letters/Portuguese with the format that was discussed there. This includes categories like: Category:Ee. (Category:E is taken)

I hope I did not make any mistake. But I probably forgot to add some other information that could fit in the appendices. Feel free to edit the pages or make suggestions. As discussed before, if appendices like this can work in all languages, (or just the Latin script languages?) I'd like us to consider killing all redundant entries of letters in the main namespace to reduce clutter. (Except Translingual.) --Daniel Carrero (talk) 08:50, 27 July 2016 (UTC)

I propose moving CAT:E to WT:E, since we have usually used WT: for shortcut links to any namespace. That would free up CAT:E for the letter categories. --WikiTiki89 17:30, 27 July 2016 (UTC)
It would be easy to free up CAT:E. The shortcut was created in February 2016 and according to Special:WhatLinksHere/Category:E, it is used in only 21 pages.
However, I have 2 objections:
  1. I have taken a liking to the name Category:Ee as a letter category. Note that Appendix:Letters/Turkish (currently a redlink) would be a member of Category:İi and Category:Iı, as explained in Dotted and dotless I.
  2. "we have usually used WT: for shortcut links to any namespace" I don't remember any "WT:" shortcuts to other namespaces. Regardless, we only recently installed the "CAT:" shortcut; until recently, we only had "WT:" and "WS:" as namespace shortcuts. I think it's more natural to use "CAT:" for categories than "WT:".
--Daniel Carrero (talk) 17:39, 27 July 2016 (UTC)
Some examples: WT:SD, WT:RT, WT:LOP, WT:ENPRONKEY, WT:H, WT:TAT. --WikiTiki89 17:48, 27 July 2016 (UTC)
Thank you for the examples. --Daniel Carrero (talk) 17:51, 27 July 2016 (UTC)

What counts as "offensive"?

Looking at news media I see a lot more discussion and argument these days about what is "offensive". We use that word as a gloss on some terms, but unlike archaic etc. it's not in the Appendix:Glossary. Is there any way we can usefully define it there? What could our response be in the hypothetical case that a thin-skinned person started to add "offensive" to many words that others feel might not be? Equinox 15:10, 27 July 2016 (UTC)

What we can do is tag words as "insult", or "vulgar", because they reflect the intent of the speaker/writer. However, "offensive" is from the point of view of the listener/reader, which is clearly subjective and a matter of personal feeling. So I don't think we should use this word as a tag. — Dakdada 12:35, 28 July 2016 (UTC)
"Ladies" has been used as an insult, say, to newcomers to military boot camp. A problem is that offensiveness is determined by speaker, object of term, audience, and other context.
What can we help users with? I suppose we can help objects know whether they've been insulted and speakers avoid accidental offense, but the latter seems more plausible to me as something that might bring someone to a dictionary. If so, then "offensive" is a more useful label than "insult" or "pejorative", especially if qualified (eg, "usually", "sometimes", "historically"). DCDuring TALK 12:56, 28 July 2016 (UTC)
Having a mention at ladies that it is a word sometimes used to demean men could be useful. It can sort of slide outside the range of dictionary, given that pretty much any way of referring to a man as feminine (like implying he carries a purse) can be used as an insult.--Prosfilaes (talk) 05:33, 29 July 2016 (UTC)
Ideally, there would be reference to external authority. For example, both cunt and nigger have usage notes documenting studies of precisely how large a portion of the population they offend, and other dictionaries also have strong usage warnings. I also see the label qualified with "sometimes" and "possibly", which could be of use if only a few or minor references attest offensiveness, or the offensiveness could in such cases be in usage notes rather than on the definition-line. - -sche (discuss) 20:05, 28 July 2016 (UTC)
I find this quite useful; I copied part of the note from nigger to wanker, because wanker is apparently more offensive to Brits than nigger, and I'm happy I found that out before it slipped out as a Briticism somewhere inappropriate.--Prosfilaes (talk) 05:33, 29 July 2016 (UTC)

Vote: Placing English definitions in def template or similar

FYI, I created Wiktionary:Votes/2016-07/Placing English definitions in def template or similar.

Let us postpone the vote as much as a discussion requires, if at all. --Dan Polansky (talk) 17:28, 27 July 2016 (UTC)

Annoying as hell for editors. What's the rationale this time? Equinox 17:30, 27 July 2016 (UTC)
I don't know. If I don't create the vote, you'll see it everywhere within a year's time or so. --Dan Polansky (talk) 17:34, 27 July 2016 (UTC)
Let me correct myself since I have guessed the rationale: Some editors won't be happy until the links in English definitions end in #English and thus have the English section as an explicit target. --Dan Polansky (talk) 17:48, 27 July 2016 (UTC)
For the record, I herewith submit my objection to using {{def}}, and consider the state before the creation of {{def}} to be the status quo ante. Right now, the template is used in the following mainspace entries: leo, di, agam, agat, aici, againn, ann, leat, linn, springen, La Niña, léi, daoibh, liom, libh, agamsa, ise, dhuit, chugainn, liach, liag, airsean, asatsa, as-san, aistise, astusan, chugat, chuici, chucu. --Dan Polansky (talk) 17:52, 27 July 2016 (UTC)
For the record, it was also used in El Niño. You removed it from that entry. --Daniel Carrero (talk) 17:54, 27 July 2016 (UTC)
Once again, I'll point out that the vote you created is only about the automated addition of {{def}}. If you really want to stop it, you would just create a BP discussion or vote about disallowing the template altogether. --WikiTiki89 17:56, 27 July 2016 (UTC)
I want to know what editors think and I will find out. The wording of the vote will serve fine for the purpose. Automatic and semi-automatic edits are the gravest danger in that regard, as experience plentifully shows.
@Daniel: Right, I removed it from El Niño using the "no consensus => status quo ante" principle. I removed it from no other entries. I have a baseline above via listing the entries using the template at the point of submitting the objection, I won't waste my energies on removing the template from the other entries. --Dan Polansky (talk) 18:02, 27 July 2016 (UTC)

Pronunciation of obsolete words

The OED and many other dictionaries do not show pronunciations of obsolete words. We do, in many cases. I'm not sure what exactly we are showing. How the word used to be pronounced? Surely not - and if so, when exactly? But if not that, what is it? How the word might theoretically be pronounced by someone reading an old text? I don't see how we can justify or source such claims. My question then is whether there should be some policy about the use of =Pronunciation= for an entry all of whose senses are obsolete. (I have removed some of these sections in the past and been criticised for removing useful information.) Ƿidsiþ 16:55, 28 July 2016 (UTC)

I agree. I made the same exact point in a recent discussion, I forget where. --WikiTiki89 17:18, 28 July 2016 (UTC)
Are there really no credible sources? If not, how do folks manage to reconstruct all those unattested terms and entire unattested proto-languages? DCDuring TALK 17:44, 28 July 2016 (UTC)
It's not that there are no credible sources, it's that those sources give an old pronunciation, while all our pronunciations for modern languages are modern pronunciations. Modern pronunciations do not exist for obsolete terms. --WikiTiki89 18:15, 28 July 2016 (UTC)
So find a way to convey that it's not a modern pronunciation. I don't see the issue here, and if there's a source with an old pronunciation it can be referenced. DTLHS (talk) 18:21, 28 July 2016 (UTC)
What would be wrong with providing sourced obsolete pronunciation, even for words with current pronunciations? After all, we have the obsolete definitions that those terms had as well as the meanings that have persisted from Chaucer and Shakespeare to the present. DCDuring TALK 18:23, 28 July 2016 (UTC)
Well I guess that wouldn't be a problem, but there wouldn't be very many words we could find a sourced pronunciation for. But it would also require coming up with an IPA scheme for every time period, which would require actually defining specific time periods. And if we were to find a sourced pronunciation for each period in the entire history of a given word, it would be overkill to include all of them. The pronunciations would also be pretty useless, since they cannot be compared with other words that don't have a pronunciation from the same time period. --WikiTiki89 18:30, 28 July 2016 (UTC)
I don't think pronunciations have to be sourced for every single word, if we can figure it out in other ways. Linguistics has studied the changes in pronunciation of English quite a bit, so we can rely on that too. Shakespeare's English is pretty well known and may be of particular value to users owing to the popularity of Shakespeare's works. We do have to be aware, though, that no such thing as RP or GA existed centuries ago, so pronunciations are always tied to a particular region or regional standard. Shakespeare's English involves as much a place as it does a time. —CodeCat 18:40, 28 July 2016 (UTC)
If we allow unsourced reconstructions of historical pronunciations, then how do we decide which time period and place for the pronunciation? And what's to stop us from including a pronunciation for every single year and every single place we manage to find information on? --WikiTiki89 18:43, 28 July 2016 (UTC)
We have Latin and Ancient Greek pronunciations don't we? —CodeCat 19:05, 28 July 2016 (UTC)
Sanskrit now too! :)JohnC5 19:36, 28 July 2016 (UTC)
The Latin ones are pretty much useless. The Greek ones are useful; they break Greek down into a few specific times and places and can be automated. I don't think we can do that for English. --WikiTiki89 19:41, 28 July 2016 (UTC)
As we have no limits or even schedules for our ambitions in any area of Wiktionary coverage, why should we start introducing them on this matter? It would be possible to come up with some useful limits; perhaps we could try only EME for starters. DCDuring TALK 20:02, 28 July 2016 (UTC)
If you try to include too much information, Wiktionary would become unusable. If the information is derivable, then people can derive it on their own and we can focus on including only the useful information. --WikiTiki89 20:22, 28 July 2016 (UTC)
When we discussed this at Wiktionary:Tea room/2016/June#Pronunciation_of_proditor, a user argued that we shouldn't include the pronunciations of older words because we can't know how they were pronounced... and then I provided references which noted how they were pronounced. Furthermore, obsolete words which occur in works that are still read today, especially famous ones that have been continually read, will still be pronounced today. As to the notion of including every year: why hello, slippery slope fallacy! - -sche (discuss) 20:15, 28 July 2016 (UTC)
If they occur in texts that are still read and understood today, then how can we call them obsolete? As for the slippery slope fallacy, you're right, but my real issue is how do we choose representative times and places? From what I know, for Greek we just go with the times and places that are well documented; however, for English, we have much more information about times and places. How do we choose which ones to include and which ones to exclude? --WikiTiki89 20:22, 28 July 2016 (UTC)
Also, the slippery slope fallacy is a fallacy for the inclusion of words and phrases only because each word or phrase still needs to be attested. However, if we allow arbitrarily included deduced pronunciations from any time and place that are not individually verified, the slippery slope is real and not a fallacy. --WikiTiki89 20:28, 28 July 2016 (UTC)
Shakespeare is an obvious candidate. He spoke, presumably, 16th century Midlands English, but since his performances happened in London, they were probably performed in 16th century London English. —CodeCat 20:41, 28 July 2016 (UTC)
You're already demonstrating a difficulty of choosing a time and place. --WikiTiki89 20:44, 28 July 2016 (UTC)
I'm not choosing, I'm giving suggestions. —CodeCat 21:09, 28 July 2016 (UTC)
But how are we supposed to know what time period the pronunciation is representing? If a word was used from the 14th to the 19th century and is now obsolete, would we be showing a pronunciation from 1680? 1840? 1550? What phonemes would we use? For Latin and Ancient Greek – and indeed for =Middle English= and =Old English= entries – we have conventions for these things, but in modern English sections we are assumed to be showing a current pronunciation, which makes no sense for an obsolete word. At the very least there should be some time label attached to them, but even then I don't know where we would get the information from. Ƿidsiþ 06:31, 29 July 2016 (UTC)
Old texts get read in modern pronunciation, and obsolete words do turn up in them. The current pronunciation is probably what English professors use for it.--Prosfilaes (talk) 01:29, 30 July 2016 (UTC)
English professors have to guess the pronunciations and likely don't all use the same pronunciation. This pronunciation was not orally transmitted since the time the word was in use (and if it was orally transmitted, then I don't think we can call the word obsolete). --WikiTiki89 14:34, 1 August 2016 (UTC)
Comment I would find Shakespearean pronunciations extremely helpful, as I try to read Shakespeare and older poetry in the original pronunciation (or close) so as not to miss out on rhymes and wordplay. Right now, I have to look at groups of words, sharing certain characteristics of pronunciation, based on RP, and try to extrapolate for words not found on those lists.
It would be equally useful to have pronunciations for 18th-19th century British English. We could always put a little label beside such pronunciations that makes it clear that they are reconstructed and/or obsolete, and maybe put them in collapsible boxes so they wouldn't show up for all users and confuse them. While accuracy is important, I think it's valuable to include information that isn't fully proven, but is most likely true. Andrew Sheedy (talk) 01:37, 30 July 2016 (UTC)
  • When giving pronunciations for modern words, we're painting rough strokes of an exemplaric form as well. We could just as easily conjecture a Shakespearean example as one for "RP" (aka. 1950s southern midlands upper class England). For dead languages, we pick writing-heavy periods and give the user an idealised average of pronunciation. They don't do anything different in universities and theatres. Korn [kʰũːɘ̃n] (talk) 17:46, 30 July 2016 (UTC)
If you think about it, our current practice is misleading: Shakespeare has wordplay based on stale and steal being homophones, but our entries for those words only show modern pronunciations that don't reflect this. By only showing modern pronunciations, we're strongly implying that English pronunciation hasn't changed since 1500. We need to figure out how to represent phonetic etymology in a way that's both concise and honest. Sure, mark reconstructed pronunciations, but don't exclude them.
As for the infinite multiplication of pronunciations: we could have already done that with modern English, given all the thousands of well-documented local pronunciations, but we haven't. I doubt we would do much worse if we added historical variants to the mix. Chuck Entz (talk) 19:28, 30 July 2016 (UTC)

Vote: Request categories

See Wiktionary:Votes/2016-07/Request categories.

I created it based on this RFM discussion.

The RFM discussion was initially about renaming only "Category:Translation requests (X)". Then I proposed renaming all the request categories. This is a major proposal that affects many categories. In the RFM discussion, @Dan Polansky suggested doing this via a vote. I support doing so via a vote.

Because this is a major proposal, I scheduled the vote to start in 2 weeks. Once it starts, it is scheduled to last for 2 months. Feel free to make changes or suggestions/comments. --Daniel Carrero (talk) 01:51, 29 July 2016 (UTC)

Sanskrit vs. Old Indo-Aryan

I notice that we currently don't list Sanskrit in the modules as ancestor for anything. Strictly speaking, this is correct, but it's a very widespread practice in most dictionaries to treat Sanskrit as a stand-in for Old Indo-Aryan in etymologies. It's also, as far as I can tell, the most common practice in our own etymologies.

If we're going to convert the combination of {{etyl|sa|xx}} {{m|xx|... to use {{inh}} in all of these etymologies, shouldn't we make Sanskrit the ancestor of all the prakrits and of all the other Indo-Aryan languages currently shown as descended directly from Proto-Indo-Aryan? Chuck Entz (talk) 18:58, 30 July 2016 (UTC)

{{etyl|sa|xx}} {{m|xx|... is equivalent to {{der}}, not {{inh}}. Converting to {{der}} is always ok, nothing changes. Only {{inh}} demands an ancestor relationship. —CodeCat 19:07, 30 July 2016 (UTC)
I've been curious about this issue for a little while now. Classical Sanskrit has no direct descendants, and someone on here was claiming that all the Prakrits descend from Rigvedic Sanskrit. I've had a good deal of trouble finding sources for this claim. If they do indeed descend from Rigvedic, I'm not sure if there is enough difference between Rigvedic and Classical to give them separate codes. —JohnC5 19:41, 30 July 2016 (UTC)
Classical Sanskrit had diversified a little bit already. "Krishna", "Sanskrit" and "Rigveda" are specifically northern forms. The original pronunciation common to all dialects had a syllabic r instead of ri, other dialects inserted different vowels: ru in the west, ra in the east. —CodeCat 20:02, 30 July 2016 (UTC)
But those are all spelled as syllabic r in Sanskrit. Did Pāṇini describe the pronunciation as ri? Chuck Entz (talk) 21:11, 30 July 2016 (UTC)
There are nationalistic and religious reasons to exaggerate the age and universality of Vedic Sanskrit. I hope those aren't in play here. Chuck Entz (talk) 21:22, 30 July 2016 (UTC)
If this discussion may be believed, then Vedic cannot represent the ancestor of all the Indo-Aryan languages. If there are Prakrits directly descended from Vedic, I don't know which ones, but in the mean time, I think we just have to stick with Proto-Indo-Aryan. —JohnC5 21:47, 30 July 2016 (UTC)
That fixes the technical problem, but the contradiction remains. Chuck Entz (talk) 22:24, 30 July 2016 (UTC)
AFAIK, Sauraseni Prakrit is a direct descendant of Rigvedic Sanskrit. Alfred C. Woolner's "Introduction to Prakrit" may help; it's available on archive.org. —Aryamanarora (मुझसे बात करो) 20:21, 1 August 2016 (UTC)
@Aryamanarora: So on pages 3-4 of Woolner, he says:
“If in "Sanskrit" we include the Vedic language and all dialects of the Old Indian period, then it is true to say that all the Prākrits are derived from Sanskrit. If on the other hand "Sanskrit" is used more strictly of the Pāṇini-Patañjali language or "Classical Sanskrit," then it is untrue to say that any Prakrit is derived from Sanskrit, except that Śauraseni, the Midland Prākrit, is derived from the Old Indian dialect of the Madhyadeśa on which Classical Sanskrit was mainly based.”
The phrase “and all dialects of the Old Indian period” worries me if we want to include all the Prakrits. We should certainly move Śauraseni under Sanskrit if we decide to keep Vedic together with Classical (which I also support). —JohnC5 23:16, 1 August 2016 (UTC)
And what about the apabhraṃśas? Do these deserve language codes or should they fall under their respective Prakrits? And what about re-Sanskritized languages, such as Marathi? According to Wikipedia, "The contemporary grammatical rules described by Maharashtra Sahitya Parishad and endorsed by the Government of Maharashtra are supposed to take precedence in standard written Marathi. Traditions of Marathi Linguistics and the above-mentioned rules give special status to tatsamas, words adapted from Sanskrit" (no citation however). —Aryamanarora (मुझसे बात करो) 15:37, 2 August 2016 (UTC)

Vote: CFI - letting terms be linked to pertinent sections

FYI, I created Wiktionary:Votes/2016-07/CFI - letting terms be linked to pertinent sections.

Let us postpone the vote as much as the discussion requires, if at all. --Dan Polansky (talk) 11:45, 31 July 2016 (UTC)

August 2016

OTRS call for help

Dear colleagues. The volunteer response team (aka OTRS) is currently lacking volunteers to take care of questions regarding the sister projects wikibooks, wikinews, wikiquote and wiktionary. I'd like to invite you to volunteer at meta:OTRS/Volunteering. If you have any questions, please feel free to contact me. Thank you in advance for considering. --Krd (talk) 08:00, 1 August 2016 (UTC)

Why do we have both Category:Mongolian terms derived from Mandarin and Category:Mongolian terms borrowed from Mandarin?

I just noticed that one of the templates in мөөг creates a (currently redlink) Category:Mongolian terms borrowed from Mandarin but we already have a similar category populated by other templates with a very similar name and semantic: Category:Mongolian terms derived from Mandarin, containing terms like бууз, мантуу, etc. - What to do? — hippietrail (talk) 09:12, 1 August 2016 (UTC)

Borrowing is a subset of deriving, "derived from" is the generic category that holds terms not categorised more specifically. —CodeCat 11:47, 1 August 2016 (UTC)
If Mongolian borrows a term from another language that borrowed it from Chinese, that term can't be described as a Mongolian borrowing from Chinese; thus the need for a separate category. Also, the "derived from" node needs to be there to keep the data structure parallel with other categories using sister nodes such as "inherited from". Chuck Entz (talk) 12:41, 1 August 2016 (UTC)
I hadn't noticed this level of subdivision before so I'll leave it to you guys, thanks! — hippietrail (talk) 13:53, 1 August 2016 (UTC)
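
The distinction drawn in the replies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the `directly_borrowed` flag are hypothetical and do not correspond to any actual template or module code; only the two category names come from the discussion.

```python
def etymology_category(term_lang: str, source_lang: str, directly_borrowed: bool) -> str:
    """Pick the etymology category for a term (hypothetical helper).

    A term taken straight from the source language goes in the more
    specific "borrowed from" category; anything else (for example a term
    that arrived via an intermediate language) stays in the generic
    "derived from" category.
    """
    relation = "borrowed" if directly_borrowed else "derived"
    return f"Category:{term_lang} terms {relation} from {source_lang}"


# A direct loan versus a term that came through a third language.
print(etymology_category("Mongolian", "Mandarin", directly_borrowed=True))
print(etymology_category("Mongolian", "Mandarin", directly_borrowed=False))
```

In this sketch the generic category is simply the fallback, which matches the description of "derived from" as holding terms not categorised more specifically.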

First LexiSession : cat

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

Wiktionary Tremendous Group, a cabal nice and open gathering of Wiktionarians, is happy to introduce a new collective experiment: LexiSession.

So, what is a LexiSession? The idea is to coordinate a massive number of contributors from different languages to focus on a shared topic, to enhance all projects at the same time! It may remind you of the Commons monthly contests, but here everyone is a winner! For this first LexiSession, we decided on a month - until the end of August - to make friends with a cat! Not only the cat entry, but also Wikisaurus:cat and other pages dealing with the vocabulary one may need to talk about cats: adjectives, verbs and expressions.

You're welcome to contribute alone, or to create a local project and organize an edit-a-thon in your region. We will probably do at least one edit-a-thon in Lyon soon, and another in Paris during the French WikiConference. Please share your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the following LexiSession.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later for an update! Noé (talk) 21:57, 1 August 2016 (UTC)

Update: in French Wiktionary, we started a thesaurus about cat in French, and that's cool, we have plenty of red links to create! Noé (talk) 09:51, 4 August 2016 (UTC)

pseudo-digraphs

It seems we have categories for English words with consonant pseudo-digraphs and English words with vowel pseudo-digraphs despite the fact that there is no such thing as a "pseudo-digraph". A combination of letters is either a digraph or it isn't. These are cases of letter combinations that look like digraphs, but aren't. Personally, I don't really see the need for these categories, and I don't think we should be using novel orthographic terms or concepts on Wiktionary. What do other folks think? Kaldari (talk) 06:01, 2 August 2016 (UTC)

A digraph is a sound written with two letters, and a pseudo-digraph is a combination that looks like it should be a single sound, but isn't. For instance, zoology doesn't rhyme with eulogy, and a ramshorn is a ram's horn, not a ram shorn. Chuck Entz (talk) 08:05, 2 August 2016 (UTC)
It is sometimes hard to decide what should go in them and they may need to be renamed at least slightly (to use "terms" like everything else). I'm also sceptical that they're maintainable. - -sche (discuss) 15:24, 2 August 2016 (UTC)
I don't think pseudo-digraph is accurate terminology. Is there any better name we can come up with? --WikiTiki89 15:27, 2 August 2016 (UTC)
I've got into the habit of adding this consonant category to certain entries, but I also dislike the non-standard term pseudo-digraph. I don't see maintainability as a big argument against it, however: I hope that one day we can do it with a bot, based on the spelling and the pronunciation, but that is a seriously hard problem and the solution might be decades away. But we already maintain lots of awkward things manually. Equinox 01:53, 3 August 2016 (UTC)

{{lb}} linking to Categories

Why does {{lb}} work with some categories and not others? For example it won't work with Category:Languages. Is it because no one has thought about it? DonnanZ (talk) 15:34, 2 August 2016 (UTC)

I don't understand what you mean. Could you give some examples of what you are trying to do? --WikiTiki89 17:03, 2 August 2016 (UTC)
For example {{lb|nb|rail}} works fine for Category:nb:Rail transportation, but {{lb|nb|language}} or {{lb|nb|languages}} doesn't work if I add it to, say armensk, which is also an adjective. The category Category:nb:Languages has to be added instead. A little more typing. There are a number of categories like this, but I can't remember which ones now. DonnanZ (talk) 17:55, 2 August 2016 (UTC)
Oh that's what you meant. All the categories, auto-linking, and display text stuff are configurable in Module:labels/data. --WikiTiki89 18:01, 2 August 2016 (UTC)
Hmm, OK, that's all Greek to me. Strangely enough I can't find "language" or "languages" listed as a recognised label, assuming it should be listed alphabetically. Is that the list of labels that have been set up? DonnanZ (talk) 18:38, 2 August 2016 (UTC)
If you can't find it, it's probably not there and that is why it doesn't automatically categorize. --WikiTiki89 18:42, 2 August 2016 (UTC)
Oh great, a serious omission. Can that be rectified please? DonnanZ (talk) 18:58, 2 August 2016 (UTC)
I could do that, but there would be very few cases where it would be needed. In the case of armensk, "language" is not a context label. It's a topic category. A context label would mean that this word is only used in the context of languages, but that is not true, it simply refers to a language. --WikiTiki89 19:04, 2 August 2016 (UTC)
Right. {{lb|nb|language}} is a misuse of {{lb}}; labels indicate that a word is restricted in usage to a certain context, but I doubt that people only say armensk when talking about language/linguistics but not when talking about e.g. botany and mentioning in passing that a certain cited book was translated from armensk. The usual and proper thing to do to add a list category is just add the category manually or via {{C}} (see how it's used on French letter). - -sche (discuss) 19:05, 2 August 2016 (UTC)
Additionally, {{C}} also allows you to add multiple categories more easily, for example: {{C|en|Fruits|Trees|Pome fruits|Mythological plants}} at apple. --WikiTiki89 19:15, 2 August 2016 (UTC)
Normally, {{lb}} combined with language code and category will automatically link to a category, whether it's meant to or not, so I'm not sure that that can be classed as misuse. Admittedly I wasn't aware of {{C}} and I'm sure I will be making use of it now. I was also wanting to combine the functions of a qualifier with that of a label and category, but I obviously can't do that here. There shouldn't be any confusion with the armensk entry, adjective and noun are clearly separated; "en armensk forfatter", "en armensk bok", "armenske bøker", a book translated from armensk is obviously referring to Armenian, the language, not the adjective or the book itself. DonnanZ (talk) 20:01, 2 August 2016 (UTC)
I agree with -sche that {{lb|xx|language}} is a misuse. If the term is a language, then the definition should say so. Context labels should not be used to say what the definition should, and certainly not just to categorise. Categorising should always be secondary to the label; something that comes as part of using the label, rather than a reason to use the label in the first place. Perhaps we should start placing restrictions on labels to combat misuse. —CodeCat 20:23, 2 August 2016 (UTC)
I wouldn't class that as one of your most momentous ideas. Use of {{lb|xx|category}} is a good shortcut if used properly, and doesn't cause any harm at all. DonnanZ (talk) 21:08, 2 August 2016 (UTC)
It does cause harm. Let me give an example with one of our most commonly misused labels "anatomy". The "anatomy" label for the term glomerulus is justified because no one who doesn't know anatomy would know what that is, so anatomy is a context in which this term is understood. However, it would be a misuse to put an "anatomy" label for the term kneecap, because everyone knows what a kneecap is and the word can be expected to be understood in practically any context (if you say "I fell and hurt my kneecap", you are not having a discussion about anatomy). Thus, putting the "anatomy" label at kneecap would mislead people to think that it is as much a technical term as glomerulus and that would be harmful. --WikiTiki89 14:50, 3 August 2016 (UTC)
You're calling it misuse, but obviously not everyone agrees with you, if the translations are anything to go by. Some use Category:Anatomy for kneecap, others Category:Skeleton. But that's so-called "misuse" of a category, not of {{lb}}. DonnanZ (talk) 17:31, 3 August 2016 (UTC)
As I just said, "anatomy" is one of our most commonly misused labels. Also, you are confusing categories with labels. The label gives the context, the category just adds the term to a category so that it can be found by browsing the category. The fact that some labels also categorize is just a matter of convenience to not have to put both the label and the category, but that doesn't mean all categories should be given as a label. --WikiTiki89 20:08, 3 August 2016 (UTC)
  • Nope, no confusion. Whether all categories can be accessed via {{lb}} is seemingly another matter that I have no control over. It should be up to the editor's discretion whether they use {{Category|xx|category}}, {{C|xx|category}} or {{lb|xx|category}}, depending on circumstances, and shouldn't be deliberately restricted in this way. DonnanZ (talk) 23:16, 3 August 2016 (UTC)
    It's not deliberately restricted. {{lb}} is meant to add labels not categories. Some of these labels also categorize for convenience, but not all of them do and not all of the categories are textually equivalent to the label, thus there is no way to automatically categorize these labels. Every label that wants to categorize needs to be added to the module so that the module would know the name of the category to use for that label. --WikiTiki89 00:21, 4 August 2016 (UTC)
  • I realise that, but when a request for inclusion is declined, that becomes a deliberate restriction. DonnanZ (talk) 08:27, 4 August 2016 (UTC)
    If you give me an example of where you would use it, I would add it. But so far, I disagree with your use cases. It's important to have a real example in order to actually identify the correct category, and whether we should redirect the label to "linguistics", and things like that. --WikiTiki89 14:56, 4 August 2016 (UTC)
  • No, I'm not confusing languages with linguistics; languages are always categorised as such, linguistics covers related matters, and I wouldn't use the languages label for anything other than actual languages. Most if not all languages in Norwegian have the same spelling as the adjective, which happens in English too. Therefore it would be useful to use a label {{lb|xx|language}} for the language entry. I have already mentioned armensk, other examples are fransk, tysk, japansk, spansk, portugisisk and so on. It's no big deal, but it would be a great convenience, a clear marking and pretty harmless. DonnanZ (talk) 23:11, 4 August 2016 (UTC)
    • But "languages" is not a context. These words are not only used in the context of talking about languages. They're used generally, without context. If I ask "Do you speak tysk?", people will understand regardless of what was being discussed before, and regardless of setting. Therefore the label "languages" is a misuse on these entries. Labels should not, ever, be used to clarify or disambiguate definitions. If the definition by itself is unclear, that's what you'd use a gloss for: {{gloss|language}} after the definition. —CodeCat 23:42, 4 August 2016 (UTC)
  • I give up, some people like to make mountains out of molehills. I don't particularly like {{gloss}} anyway, the note is not in italics. DonnanZ (talk) 10:15, 5 August 2016 (UTC)
    • You mean you wanted to define armensk as "(language) Armenian"? I don't understand what you mean, but the others are right that {{lb}} should not be used to generate a label "language" and categorize an entry. If you just want to categorize the entry without generating a label, use {{C}} or a simple category link. The use of context labels was voted and approved at Wiktionary:Votes/pl-2009-03/Context labels in ELE v2. The voted text says: "A context label identifies a definition which only applies in a restricted context." --Daniel Carrero (talk) 17:03, 8 August 2016 (UTC)
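
The point made in this thread, that a label only categorizes when the module has been told which category belongs to it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the table contents, the display strings and the function name are hypothetical and are not the actual contents of Module:labels/data, which is written in Lua. Only the "rail" to "Rail transportation" pairing is taken from the discussion.

```python
# Hypothetical label table: how a label is displayed and which topical
# category (if any) it also adds. Not the real Module:labels/data.
LABELS = {
    "rail": {"display": "rail transport", "category": "Rail transportation"},
    "anatomy": {"display": "anatomy", "category": "Anatomy"},
    "informal": {"display": "informal", "category": None},
}


def render_label(lang_code: str, label: str) -> tuple[str, list[str]]:
    """Return the displayed label text and the categories it adds."""
    entry = LABELS.get(label)
    if entry is None:
        # Unknown label: show the text as typed but add no category,
        # because the category name cannot be guessed from the label.
        return f"({label})", []
    categories = []
    if entry["category"] is not None:
        categories.append(f"Category:{lang_code}:{entry['category']}")
    return f"({entry['display']})", categories


print(render_label("nb", "rail"))      # known label, adds a category
print(render_label("nb", "language"))  # not in the table, no category
```

Whether a given label belongs in such a table at all is the editorial question debated above; the sketch only shows why an unlisted label cannot categorize by itself.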

Too many pictures?

I wonder if we need a policy on where/when to use pictures in entries. For example, having a picture of a Bible at Bible makes sense, but we also have one at swear on a stack of Bibles. Ditto for on it like a car bonnet. I think that having pictures for purely figurative phrases is actually misleading (it might suggest that a real Bible, or a real car bonnet, is involved), and worse than not having them. I definitely don't feel that every entry, when finally completed, ought to have a picture. Only some entries (usually those for literal things, like moon or dog) benefit from them. Thoughts? Equinox 01:51, 3 August 2016 (UTC)

  • Yes, inappropriate images (as in the two you mention) could be removed without discussion (I have so removed). SemperBlotto (talk) 06:20, 3 August 2016 (UTC)
    • As well as that, there is no need to add an image to an entry if it is linked to a Wikipedia article the image is taken from. That's pointless. DonnanZ (talk) 08:04, 3 August 2016 (UTC)
      • I agree that some image removals, such as those mentioned, are clear-cut. But what about marginal cases? Where should any necessary discussions of candidates for removal (and appeals of removals) be? Tea Room? Rfd? I don't think a new page is necessary now nor will it be ever. DCDuring TALK 11:01, 3 August 2016 (UTC)
      • It makes no sense to me that the mere presence of a Wikipedia link on an entry page would mean that that entry should not have any images. Granted, the Wikipedia article may have tons of images -- but how is that relevant to the content of the Wiktionary entry? A relevant and appropriate (set of) image(s) in the Wiktionary entry increases the utility of the entry. Requiring the user to click through to some other page entirely is not good usability. ‑‑ Eiríkr Útlendi │Tala við mig 20:03, 3 August 2016 (UTC)

Sorry to interrupt. This discussion is interesting. In French Wiktionary we have 26,406 pictures and we think we need more! Do you know how many pictures are used in English Wiktionary? Noé (talk) 14:12, 3 August 2016 (UTC)

  • Hopefully someone can answer your question, I must add that I love images and have added a few myself, but I think it has to be done in an intelligent manner. DonnanZ (talk) 14:25, 3 August 2016 (UTC)
  • I also think that we could do with more pictures. And a linked Wikipedia article with a picture is not good enough, for a couple of reasons: the Wikipedia article might change (and remove or change the picture), the image contained in the article could be at the bottom of the page, and there are cases where it is not obvious which sense the picture is actually meant to illustrate. Having the picture right in the Wiktionary entry is more convenient and fixes these problems. Traditionally most dictionaries (and we as editors, too!) are very focussed on words (somewhat understandably), so more entries with well chosen images would help to make us stand out. Jberkel (talk) 17:33, 3 August 2016 (UTC)
  • I grant that point ("the Wikipedia article might change (and remove or change the picture"). DonnanZ (talk) 15:01, 13 August 2016 (UTC)
  • Pictures are great, but they should make some kind of useful point, such as "It looks like this", "What's different about it is this", "It got its name because of this", "It's important because of this," or "It can be found in these locations." (especially useful if we are looking for translations). One kind of image that isn't too helpful is a picture of a particular kind (eg. a Norway maple) of a thing (eg, a tree) that doesn't show the features that distinguish it from other kinds (eg, of maples or of trees). DCDuring TALK 18:47, 3 August 2016 (UTC)
  • Note that we have a project in Wikipedia called Wikigrenier, whose purpose is to photograph various common objects. See a list of pictures. Photographs such as those are good for dictionary articles, and it is possible to make requests. — Dakdada 08:34, 4 August 2016 (UTC)
  • I generally agree that some pictures are inappropriate. I also agree that pictures in figurative or abstract entries are generally suspect. Of course, there is that vast category of picture-deserving entries which is not under discussion and in which better picture coverage is welcome. --Dan Polansky (talk) 17:43, 12 August 2016 (UTC)
  • As an example, I tried to find applicable images to clarify the many different senses at ‎(sakura, cherry; cherry tree; cherry blossom; etc.). Many of these senses are easier to understand with a visual. ‑‑ Eiríkr Útlendi │Tala við mig 18:40, 12 August 2016 (UTC)
    • That's great, but looks a bit crowded and the markup is confusing, with the inline table. Are there any image related templates in use? Maybe this would help to standardize the use of images, and we could keep track of which entries have illustrations. Jberkel (talk) 19:32, 12 August 2016 (UTC)
  • I looked in the past (possibly when working on that very entry), and I didn't find anything that did what I needed. If anyone is aware of such a template, I'm certainly game to use it. ‑‑ Eiríkr Útlendi │Tala við mig 20:31, 15 August 2016 (UTC)
  • A list and count of images used, as mentioned above by User:Noé may be a useful tool, if this doesn't exist already. DonnanZ (talk) 15:01, 13 August 2016 (UTC)
  • Just wanted to say that I have been adding images to entries, primarily as part of a general effort to improve entries that appear as Words of the Day. Personally, I don't think there's anything wrong with images that are not strictly descriptive. — SMUconlaw (talk) 17:00, 13 August 2016 (UTC)
  • It's quite pleasing when one finds suitable images on Wikimedia Commons that haven't been used anywhere else before, as for hopper wagon. DonnanZ (talk) 18:15, 13 August 2016 (UTC)
  • Did an analysis of the 20160801 dump with the help of Lyokoï (who provided the numbers for the French Wiktionary): we have 35927 image links. – Jberkel (talk) 20:38, 19 August 2016 (UTC)
  • Very interesting, thanks for the figure. More than in French, apparently. DonnanZ (talk) 20:54, 19 August 2016 (UTC)
  • Yeah, just for a short time... We will change this fact quickly! Haha! --Lyokoï (talk) 11:46, 20 August 2016 (UTC)
  • That's the spirit! DonnanZ (talk) 18:24, 20 August 2016 (UTC)
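
The thread does not say how the dump analysis was done. As a rough illustration only, counting image links in page wikitext could look like the Python sketch below. The regular expression and the function name are assumptions; the pattern only recognises the English `File:` and `Image:` prefixes, so localized namespace names and images added through templates would be missed.

```python
import re

# Matches the start of [[File:...]] and [[Image:...]] links, case-insensitively.
IMAGE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)


def count_image_links(pages):
    """Count image links across an iterable of wikitext strings."""
    return sum(len(IMAGE_LINK.findall(text)) for text in pages)


sample_pages = [
    "[[File:Cat.jpg|thumb|A cat]] Some text [[image:Dog.png]]",
    "An entry with an ordinary link to [[cat]] and no images.",
]
print(count_image_links(sample_pages))
```

A real run would feed in the text of each page from a dump rather than a hand-written list.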

Proposed creation of Module:it-IPA

I was wondering whether it would be possible to create a module similar to this one in the Catalan Wiktionary, but more complex; that is:

  1. the apostrophe read as completely absent;
  2. monosyllables only stressed when spelled with an accent;
  3. words treated separately if a space is put between them;
  4. two distinct IPAs, a phonetic and a phonemic one;
  5. the possibility of endless alternative pronunciations.

If anyone is willing to help me, please let me know. Thanks! ;) [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 15:33, 3 August 2016 (UTC)

There's a Module:ca-IPA already, but it was never deployed. Maybe you can do so, and adapt it to Italian as well? —CodeCat 15:41, 3 August 2016 (UTC)
@CodeCat: the fact is I’m not able to create modules, that’s why I was asking for help. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 16:37, 3 August 2016 (UTC)
@IvanScrooge98 I'm willing to help out. Is the idea to automatically create IPA transcriptions? Jberkel (talk) 17:38, 3 August 2016 (UTC)

@Jberkel: thank you so much; yes, basically, that’s the idea. Can you arrange that? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 17:43, 3 August 2016 (UTC)

OK, what I would need are some examples of input where ca:Mòdul:it-general produces the undesired output, with the expected output. It probably also makes sense to clean up / rewrite some of the code there – it's one big chain of regular expressions, a maintenance nightmare. Module:ca-IPA is a lot easier to understand and documented as well. Jberkel (talk) 21:05, 3 August 2016 (UTC)
OK, I’m working on it. I’ll let you have a list of examples. Thank you again, @Jberkel! [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 21:28, 3 August 2016 (UTC)

@Jberkel, here's a short list of comparisons. You will notice certain consonants geminated at the beginning of the word, that's a feature of Italian occurring, between two different words, after vowels only in these circumstances:

  • initial /ʃ/, /ʎ/, /ɲ/, /dz/ and /ts/ are always doubled after vowels, even between two separate words;
  • all beginning consonants (with exceptions for certain clusters, namely cn, pn, ps, tm, tn, and all clusters starting with S which don't give /ʃ/, as st, spr, sc+a, etc.) undergo this gemination if they come after:
    • words ending with a graphically stressed vowel (as città, perché, però, giù, etc.) or the list of stressed monosyllables which are spelled without accent that I provided you there (these monosyllables should display as though O were Ò /ɔ/, E were É /e/, I were Ì, etc.);
    • the unstressed monosyllables (with o = /o/, e = /e/) and the four words I provided.

All other unstressed monosyllables don't make the following consonant geminated; all monosyllables (including the geminating ones) should not display with a stress mark /ˈ/, not even by themselves, unless they are apocopic forms of nouns, etc. as ciel or cuor.
When it comes to secondary stress, I would just leave it on all stressed monosyllables unless they come immediately before the primarily stressed syllable, as in è vero; I would also put it in polysyllabic words if the distance from the primary stress is less than four syllables, otherwise I'd mark them with a normal stress mark; but you can choose to just leave one primary stress and all the others as secondary.
I think I didn't miss anything; I hope I've been clear enough with these few words and that the task won't be hard for you; in any case, if you have any doubts or didn't understand something in my explanation, don't hesitate to ask me for clarification. Enjoy your work!! [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 15:31, 5 August 2016 (UTC)

OK, that should be enough to get going. I started to learn a bit of Italian recently, and the pronunciation seemed to be quite straightforward compared to other languages. – Jberkel (talk) 13:33, 6 August 2016 (UTC)
How would the difference between high-mid and low-mid vowels be handled? Presumably you need to write é or è, ó or ò? What happens if someone leaves out the stress mark (is this an error)? Benwing2 (talk) 21:26, 7 August 2016 (UTC)
@Benwing2: in the module on Catalan Wiktionary there are already some base rules to guess close-mid E and O and proparoxytones, the others are considered paroxytones with open-mid vowels. However, there’s no general rule in Italian and that’s just tentative, pronunciations have to be checked and overwritten if it is the case. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 09:03, 8 August 2016 (UTC)
For {{ca-IPA}}, you have to specify the vowel, it shows an error if you don't. See gos. —CodeCat 11:00, 8 August 2016 (UTC)
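
The gemination rules listed earlier in this thread can be sketched roughly in Python. This is a simplified illustration, not the proposed module: the function name, its parameters and the `trigger_words` argument (standing in for the lists of monosyllables mentioned above, which are not reproduced on this page) are all assumptions. It also assumes that the transcription writes affricates as plain "ts" and "dz", without tie bars.

```python
VOWEL_LETTERS = set("aeiouàèéìòóù")
STRESSED_FINAL_LETTERS = set("àèéìòóù")
VOWEL_PHONEMES = set("aeiouɛɔ")
# Onsets that are always doubled after a vowel (first rule above).
SELF_GEMINATING_ONSETS = ("ʃ", "ʎ", "ɲ", "dz", "ts")
# Spelled clusters that never undergo the doubling (exceptions to the second rule).
BLOCKING_CLUSTERS = ("cn", "pn", "ps", "tm", "tn")


def onset_geminates(prev_word, next_spelling, next_phonemes, trigger_words=frozenset()):
    """Guess whether the first consonant of the next word is doubled."""
    prev = prev_word.lower()
    spelling = next_spelling.lower()
    if not prev or not spelling or not next_phonemes:
        return False

    # First rule: certain onsets are always doubled after a vowel.
    if prev[-1] in VOWEL_LETTERS and next_phonemes.startswith(SELF_GEMINATING_ONSETS):
        return True

    # Second rule: doubling after a graphically stressed final vowel
    # or after one of the listed trigger words.
    is_trigger = prev[-1] in STRESSED_FINAL_LETTERS or prev in trigger_words
    if not is_trigger:
        return False
    if next_phonemes[0] in VOWEL_PHONEMES:
        return False  # only consonants can be doubled
    if spelling.startswith(BLOCKING_CLUSTERS):
        return False
    s_cluster = len(spelling) > 1 and spelling[0] == "s" and spelling[1] not in VOWEL_LETTERS
    if s_cluster and not next_phonemes.startswith("ʃ"):
        return False  # st, spr, sc+a and similar clusters are exempt
    return True


print(onset_geminates("città", "vecchia", "vɛkkja"))
print(onset_geminates("però", "sta", "sta"))
```

Passing the spelling and the transcription separately keeps the sketch short, since the cluster exceptions are stated in terms of spelling while the always-doubled onsets are stated in terms of sounds. Stress marking and the vowel-quality guesses discussed above are left out entirely.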

Abbreviations L4 header

I'm doing some automated fixup on Dutch entries (per WT:NORM and WT:ELE) and came across the entry aanwijzend voornaamwoord. This entry uses an "Abbreviations" section under the part of speech. This section is not listed in WT:ELE as allowed, thus it's also not clear where the section should appear relative to the others. What is the proper fix here?

Note: This isn't about the use of "Abbreviation" as the part of speech, but rather the abbreviations section appearing under a POS header, and therefore describing abbreviations of the current term. —CodeCat 20:01, 3 August 2016 (UTC)

Change it to "Synonyms" and use {{qualifier|abbreviation}}. DTLHS (talk) 20:14, 3 August 2016 (UTC)
Is an abbreviation really a synonym, or an alternative form? I think it's more the latter. —CodeCat 20:27, 3 August 2016 (UTC)
This isn't a black-and-white question. --WikiTiki89 20:30, 3 August 2016 (UTC)
But it needs a black-and-white answer. —CodeCat 20:36, 3 August 2016 (UTC)
What I mean is it's a case-by-case question. Some abbreviations are best handled as alternative forms and some are best handled as synonyms. --WikiTiki89 20:37, 3 August 2016 (UTC)
And how would you decide which to use? DTLHS (talk) 20:41, 3 August 2016 (UTC)
How about using the Derived terms section instead of Synonyms or Alternative forms? --Panda10 (talk) 20:44, 3 August 2016 (UTC)
I wouldn't object to either the alternative forms or the synonyms headers as a home for these for Dutch terms. If one alternative is necessary to make life technically simpler, then who really cares about how users might interpret one rather than the other? DCDuring TALK 21:27, 3 August 2016 (UTC)

Wiktionarian skills list

Dear colleagues,

In February, I started a list of skills on the French Wiktionary: what we do on our project and what we have learned to do. It is neither a guideline nor a Help page, but roughly what could fill a CV with our empirical learning. After a few months of improvement by other people, I am glad to inform you that I have tried to translate it into English as Wiktionarian skills list! Yay! My English is quite awful, but I imagine it can be improved collaboratively. I hope it will be interesting for some of you, and I'll be happy to discuss and improve it with you! Noé (talk) 23:43, 3 August 2016 (UTC)

@Noé: I made an attempt to fix most of your English grammar mistakes: diff. Could you please check to make sure I didn't change the meaning of anything? --WikiTiki89 17:51, 4 August 2016 (UTC)
Thank you a lot, it is much clearer now! I hope you enjoyed reading it! I changed back only one sentence about trademarks, because I was thinking about the problem we have describing objects when their names are brands. We discussed this a lot on the French Wiktionary because of 3M Company. They sent a dozen e-mails to one contributor who wrote scotch (fr:scotch) and post-it (fr:post-it), to point out that they are trademarks. Companies need to protect their brands against dilution, and Wiktionaries have to decide on a policy about this, to explain clearly that Wiktionary is descriptive and not prescriptive, so we do not claim to decide whether these brands are now commonly attested as nouns in a language. Well, long discussion, and I plan to translate our conclusions into English at some point. -- Noé (talk) 12:36, 5 August 2016 (UTC)

External links and references in WT:EL

These headers are given as level 4 headers in the "example entry", but these are actually level 3 headers that go below the parts of speech. In particular, References goes below Anagrams. Can this be fixed? —CodeCat 17:35, 4 August 2016 (UTC)

I fixed this without a vote. --Daniel Carrero (talk) 10:58, 7 August 2016 (UTC)

WT:EL says nothing about multiple etymologies

It is the longstanding practice that when there are multiple POS sections, each with their own etymology, then we have separate numbered etymology sections and POS sections are nested in each etymology section, at level 4 rather than level 3. But WT:EL seems to say nothing about this. In fact, the "example entry" shows a rather atypical entry layout, with multiple POS sections under a single etymology; in the vast majority of cases, separate POSes ought to have separate etymologies too. I would like to see this remedied, does anyone have proposals for a change? —CodeCat 17:42, 4 August 2016 (UTC)

I created this vote, and it failed in January 2016: Wiktionary:Votes/pl-2015-12/Headings. It had some issues as discussed by the voters. Like in the failed vote, I'd like to propose having a decent "Headings" section with a list of all headings and levels (Etymology = level 3, Noun/Adjective/etc. = level 3, Translations = level 4), explaining how the presence of "Etymology 1"/"Etymology 2"/etc. affects the level of other sections. --Daniel Carrero (talk) 06:53, 7 August 2016 (UTC)
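For illustration, the nesting discussed in this thread can be sketched as follows. This is only a minimal sketch of an entry with two numbered etymologies, assuming the levels stated above; the etymology text and definitions are placeholders and do not come from any real entry.

```wikitext
==English==

===Etymology 1===
(etymology of the noun)

====Noun====
{{en-noun}}

# Definition.

=====Translations=====

===Etymology 2===
(etymology of the verb)

====Verb====
{{en-verb}}

# Definition.
```

Under this layout each POS header drops from level 3 to level 4 once it sits inside a numbered etymology section, and a subsection such as Translations drops from level 4 to level 5 accordingly.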

WT:EL homophones and rhymes sections

These basically duplicate the content of Wiktionary:Pronunciation, which is already linked to on the page. Rather than try to elaborate on every detail of the Pronunciation section, WT:EL should stay short and to the point. I therefore propose to remove these two sections, perhaps replacing them with a sentence or two that mentions other things that go in Pronunciation sections. —CodeCat 17:58, 4 August 2016 (UTC)

If Wiktionary:Votes/pl-2016-07/Pronunciation 2 passes, as explained in the "changes and rationale", all the text in the subsections "Homophones" and "Rhymes" is going to be kept, albeit edited to occupy less space, and the subsection titles will disappear. The titles themselves are unnecessary, in my opinion. If the titles were to be kept, we might as well have titles: "Audio pronunciation", "Transcription", "Hyphenation", etc. --Daniel Carrero (talk) 06:45, 7 August 2016 (UTC)

WT:EL: "Language" under "Entry core"

The way WT:EL is currently laid out, it first mentions things that go before the definitions, then the "core" which includes the POS section and definitions themselves, and finally things that go after the definitions. But the "Language" section, which describes the use of L2 language sections, is not part of the entry core as implied here. Note also that it is nested under the "Additional headings" L2 section, which says "There are additional headings which you should include if possible, but if you don’t have the necessary expertise, resources or time, you have no obligation to add them, with the possible exception of “References”." I certainly don't think that the L2 language section is optional or dependent on expertise in any sense. Therefore it should be mentioned earlier on the page, and more prominently. —CodeCat 18:03, 4 August 2016 (UTC)

Some thoughts:
  • I think "Entry name" should be somewhere above "Language" and the explanation of any section.
  • Then "Language", above any explanations of basically everything else (etymologies, POS headers, definitions, etc.), because it is the highest-level section we use (not counting the H1 page title).
  • We could delete the titles "Headings before the definitions" and "Headings after the definitions" and just have "Headings".
  • "Additional headings" is a misnomer. It does not contain what the name promises, and the name or contents should change.
  • WT:EL seems to imply that "References" is mandatory. Is "References" mandatory? Why? If anything, any entry must have the language, POS header, headword line and at least 1 definition. (for the record, it was voted and approved that romanization entries must have a definition, too)
--Daniel Carrero (talk) 07:05, 7 August 2016 (UTC)

Variables extension

What exactly happened to Wiktionary:Votes/2015-12/Install Extension:Variables? On the phabricator (phab:T122934), plans were made to create a similar function, but after 6 months no progress has been made. -Xbony2 (talk) 21:17, 4 August 2016 (UTC)

My understanding is that the work required to make section-aware templates possible is dependent on some work to actually make the MediaWiki parser know what sections are (phab:T114072). As you can see, the Parsing team of the WMF is quite busy, so it looks like it may be some time before this work gets underway. I posted a rather hackish proof-of-concept patch for MediaWiki at phab:T122934, which would solve the problem, but there is practically zero chance of that being accepted - I think the Parsing team would prefer to do it the proper way, rather than introduce yet more technical debt into MediaWiki. This, that and the other (talk) 07:19, 7 August 2016 (UTC)

moisturising cream v. moisturizer

Moved to Wiktionary:Tea Room#moisturising cream v. moisturizer DCDuring TALK 10:57, 5 August 2016 (UTC)

Diacritics

How do I get the diacritic in phah-sǹg to display correctly in the title?--Prisencolin (talk) 00:06, 6 August 2016 (UTC)

It might be a font problem on your own computer. It shows up fine for me (Windows 10, Monobook skin). —suzukaze (tc) 00:14, 6 August 2016 (UTC)

Vote: Using template l to link to English entries from English entries

FYI, I created Wiktionary:Votes/2016-08/Using template l to link to English entries from English entries.

Let us postpone the vote as much as discussion makes necessary, if at all. --Dan Polansky (talk) 10:06, 6 August 2016 (UTC)

Alternative forms after definitions — weaker proposal

Previous discussions:

The vote Wiktionary:Votes/2016-02/Placement of "Alternative forms" had the proposal below. It ended as no consensus (10-9-1 = 52.6%-47.4%) in March 2016.

Voting on:

  • Fix the placement of the "Alternative forms" section directly above the "Synonyms" section, as a subsection of the POS section.

Rationale:

  • Arguably, synonyms and alternative forms are related concepts.
  • Removing "Alternative forms" from above the definitions is a way to promote the definitions.

Simplified entry example: hardworking

==English==

===Adjective===
{{en-adj}}

# Definition.

(possibly other headers between the definitions and the alternative forms)

====Alternative forms====
* {{l|en|hard-working}}

====Synonyms====
* {{l|en|industrious}}

Unfortunately, as mentioned by some voters, if this vote passed, it would have resulted in duplication of alternative forms sections in entries with multiple POS sections.

New, weaker proposal:

  • Rather than editing all entries (as in, by bot or whatever), just allowing entries to be edited on a case-by-case basis: If someone wants to edit an entry manually and place the "Alternative forms" as a L4 section above "Synonyms", that would be OK. If someone wants to edit an entry manually and place the "Alternative forms" as an L3 section above Etymology/Pronunciation, that would be OK too, and individual entries can be discussed in case of disagreement. This would need a new vote.
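The second option in the weaker proposal (an L3 section above Etymology/Pronunciation) could look roughly like this for the same simplified entry. It is a sketch only, mirroring the hardworking example above with the same placeholder definition:

```wikitext
==English==

===Alternative forms===
* {{l|en|hard-working}}

===Adjective===
{{en-adj}}

# Definition.

====Synonyms====
* {{l|en|industrious}}
```

Either placement would be acceptable under the weaker proposal; the only difference is the header level and position of "Alternative forms".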

Pinging all participants of the previous vote (I hope I didn't miss anyone):

@Metaknowledge, Mr. Granger, Equinox, This, that and the other, -sche, Wikitiki89, Makaokalani, Embryomystic, Andrew Sheedy
@Droigheann, Nibiko, I'm so meta even this acronym, Vahagn Petrosyan, Dan Polansky, Xoristzatziki, Erutuon, Korn, Xbony2

Thoughts? --Daniel Carrero (talk) 08:06, 7 August 2016 (UTC)

I think this makes much more sense than placing them above, as though they apply to all terms. But they're not like pronunciation, where it actually makes sense to show it as a "global" thing that applies to the entire entry. Alternative forms are often term/etymology specific. —CodeCat 12:18, 7 August 2016 (UTC)
steden is an entry where the different etymologies have different alternative forms. —CodeCat 16:38, 7 August 2016 (UTC)
I doubt all alternate forms apply to all POS's of every word. Since they need to be attested for each POS independently, I would (now) have no problem repeating them for each POS. They should definitely be split up for separate etymologies. Andrew Sheedy (talk) 20:32, 7 August 2016 (UTC)
I agree with CodeCat here. Alternative forms are very much like synonyms and it makes no sense to stick them at the top where they'll often be missed. Benwing2 (talk) 21:08, 7 August 2016 (UTC)
As for duplication of L4 alternative forms, one alternative (so to speak) is to place them as an L3 header after both or all POS sections. I do this often with Related Terms. (Although I'll grant that it makes more sense to do this for related terms than for alternative forms as often this means no more than converting an L4 to an L3, whereas with alternative forms it will involve moving them below synonyms, antonyms, derived terms and related terms, and they may be missed there just like at the top). Benwing2 (talk) 21:11, 7 August 2016 (UTC)
I don't like this idea. I think moving away from having headers apply to multiple POS sections is the way to go. If we have to duplicate a few, then so be it. It's not that frequent. —CodeCat 21:40, 7 August 2016 (UTC)
I agree with CodeCat on this. --Daniel Carrero (talk) 15:09, 8 August 2016 (UTC)
  • I will make the positioning of alternative forms as an L4 below definitions standard in GML entries. Korn [kʰũːɘ̃n] (talk) 10:35, 18 August 2016 (UTC)

Suggestion for sense tags on antonyms

A while ago CodeCat tried changing the text of the {{sense}} tag to say something like (of sense "foo") instead of just (foo). This was roundly disliked, and reverted. The logic given by CodeCat was that it's confusing to have a simple (foo) sense tag next to antonyms, which suggests that the antonym has the meaning of the sense tag rather than the opposite. How about we do something like what CodeCat tried, but only for antonyms? It could say (of sense "foo") or (antonym of "foo") or similar. The way to implement it is to create a new template {{antsense}}, and use a bot to change all occurrences of {{sense}} in Antonyms sections to {{antsense}}. Thoughts? Benwing2 (talk) 21:17, 7 August 2016 (UTC)

I do think we should do something about this. Unfamiliar users fairly regularly invert the sense, thinking they are fixing an error. Equinox 21:44, 7 August 2016 (UTC)
How many dictionaries actually have antonyms (or any other semantic relations)? I think not many.
How do references of any kind that have antonyms handle this? Among OneLook references, WordNet and Collins Thesaurus have antonyms, which are published online by The Free Dictionary. They offer two presentations (using dark as an example):
  1. the freedictionary, which uses color-coded icons in red and green.
  2. the freethesaurus, which is new and uses color coded boxes, pale green for synonyms, pale red for antonyms, peach(?) for "related words".
Color-coding is imperfect (blindness, red-green color-blindness, monitors or screens that do not show colors).
The icons alone don't seem adequate for the full range of users who need the current approach supplemented or replaced.
Longmans DCE 1985 includes "—opposite light" on the appropriate sense line.
Webster's 2nd Intl. has "syn." heading a block of text explaining synonyms and (mostly) near-synonyms and "ant." before a very short list of antonyms.
What does OED do? DCDuring TALK 23:50, 7 August 2016 (UTC)
OneLook itself has a Thesaurus, which uses color coding in the manner of freethesaurus. DCDuring TALK 23:53, 7 August 2016 (UTC)
Chambers Thesaurus screenshot: [4]. They divide each entry into sections headed by examples (dark hair, dark secrets, etc.) and collect all antonyms at the end, numbered by section and marked by the inequality sign (≠). Equinox 23:56, 7 August 2016 (UTC)
That's sort of what we do, except we label senses directly rather than by number, because numbers tend to change as senses are added and rearranged. —CodeCat 00:12, 8 August 2016 (UTC)
On my screen the Chambers entry shows the antonyms in red type.
This use of the inequality symbol to mark antonyms doesn't seem obvious, though it is quickly learned. DCDuring TALK 04:02, 8 August 2016 (UTC)
We certainly should not use colour exclusively to convey information, only colour in addition to something else. —CodeCat 18:13, 8 August 2016 (UTC)
My screenshot came from the CD-ROM version. I assume they use that symbol to save visual space. Equinox 20:18, 8 August 2016 (UTC)

Is Ushakov's dictionary copyrighted still?

Calling @bd2412. See Copyright law of the Russian Federation. This case is tricky because Ushakov's dictionary was published in 1935-1940 and he died April 17, 1942 (see Dmitry Ushakov). The copyright law of 1993 retroactively established a copyright term of 50 years after the publication date or the author's death (whichever is later), and later laws extended this to a 70-year term. The Wikipedia article says this means anyone who died in 1943 or later was within the copyright period in 1993, but various additional details might possibly make Ushakov's work within this period as well, in which case the copyright would extend (presumably) to 2013, meaning it's (presumably) out of copyright now. But this stuff is sufficiently complicated that I don't know for sure. Basically I want to use some example sentences from this dictionary to illustrate some Wiktionary entries. Benwing2 (talk) 02:52, 8 August 2016 (UTC)

Possibly relevant: this discussion of fair use and de minimis copying, and of the applicability of US vs non-US laws. - -sche (discuss) 05:35, 8 August 2016 (UTC)
For loading on Commons, you have to follow the source country's copyright and US copyright, but Wiktionary doesn't have to follow source country's copyright by WMF rules, and I don't think en.Wiktionary has policy on it. If it was out of copyright in Russia in 1996, it's almost certainly out of copyright in the US, but if it was in copyright in Russia in 1996, it will be in copyright in the US for 95 years from publication, or until 2031-2036.
I'd note that fair use on stuff taken from a dictionary is going to be much more problematic than on stuff taken from a novel, since quotations from a novel don't influence the normal commercial use of the novel, but we are directly competing against a dictionary.--Prosfilaes (talk) 07:33, 8 August 2016 (UTC)
What kind of damages can they claim, though? How much profit do they still make? Also, I can't imagine it is the case that works are not free of copyright until they are in every country in the world. And retroactively copyrighting seems even more dubious, what if someone had published it under a permissive licence in the meantime? Do non-infringing works suddenly become infringing? —CodeCat 18:16, 8 August 2016 (UTC)
@Prosfilaes This stuff is such a mess. It seems quite possible that it went out of copyright in Russia, went back into copyright in 1993 (conceivably due to a rule stating that dates are moved forward to Jan 1 of the next year), went out again later that year (50 years from author's death, moved forward to Jan 1 1943???), hence was out of copyright in 1996, then went back into copyright in 2004 due to the new 70-year-from-death policy, then went out again in 2013. Presumably that means it's out of copyright in the US. But who knows. What exactly happens if you copy from an out-of-copyright work and then it later goes back into copyright? This stuff sucks. Copyright terms IMO are way way too long. Benwing2 (talk) 00:11, 9 August 2016 (UTC)
@CodeCat: They can claim up to $30,000 as statutory damages in the US. Works are not free of copyright everywhere in the world until they are free of copyright everywhere in the world. The WMF is chartered in the US, and therefore has to follow US rules. There are some countries where the rule of the shorter term is in play, and thus lack of copyright in Russia matters there (which is part of the reason Commons cares about it), but the US doesn't have the rule of the shorter term. Putting a work in the public domain back in copyright is a mess, but countries do it sometimes, usually with exceptions for preexisting users.
@Benwing2: It was never in copyright in the US, and the URAA in 1996 would have returned it to copyright in the US only if it was still in copyright in Russia. It looks like it's out of copyright close to world-round, so it should be safe to use. As far as I know, copyright terms virtually always extend through the end of the year they expire in.--Prosfilaes (talk) 08:03, 9 August 2016 (UTC)
Thanks! Benwing2 (talk) 08:33, 9 August 2016 (UTC)

Save/Publish

Whatamidoing (WMF) (talk) 18:02, 9 August 2016 (UTC)

Old Ruthenian

I'm wondering how we should handle this language. Should we give it its own code and make it a descendant of Old East Slavic and the ancestor of Ukrainian, Belarusian, and Rusyn, or should we make it a dialect of Old East Slavic, or even a dialect of Russian? What should we call it, Ruthenian, Old Ruthenian, Old Western Russian, Lithuanian Russian, etc.? What code should we give it? --WikiTiki89 19:33, 9 August 2016 (UTC)

@CodeCat, Atitarev, Useigor, -sche: Pinging people who might be interested. --WikiTiki89 15:58, 10 August 2016 (UTC)
It's mainly about how different they are. Is Old Ruthenian clearly identifiable as a language contrasting with Old East Slavic? —CodeCat 16:06, 10 August 2016 (UTC)
What language do we consider having been spoken in the Grand Duchy of Moscow and the Tsardom of Russia? Old East Slavic, or Modern Russian? If the answer is Old East Slavic, then we can consider the language of the Grand Duchy of Lithuania to also have been a dialect of Old East Slavic; if the answer is Modern Russian, then we would need to make it a separate language. --WikiTiki89 17:35, 10 August 2016 (UTC)
Russian and Ruthenian probably diverged by the 15th century. Old East Slavic (Old Russian) is the predecessor of both. Church Slavonic was used as the official language of Muscovy then. --Anatoli T. (обсудить/вклад) 12:03, 12 August 2016 (UTC)
If we add it, I would call it "Old Ruthenian", because "Ruthenian" seems too ambiguous. "Lithuanian Russian" also seems ambiguous and has been less common since the 1980s (per ngrams). Looking around cursorily, I do find scholars who consider Old Ruthenian distinct from Old East Slavic — some consider Old Ruthenian a jumble of Old East Slavic elements and Polish ones. It makes it sound like it would be possible to tell whether a given text was Old Ruthenian or Old East Slavic, which is an obvious prerequisite to splitting it. - -sche (discuss) 09:30, 13 August 2016 (UTC)

Involved administrator actions

Greetings. I would like to ask a question about Wiktionary policies regarding using administrative tools in situations where the admin is "involved" in a dispute. I am not sure if I am in the right place, and if not, could you please direct me to the appropriate venue?

If I am in the right place, here's the situation. The etymology section at sheng nu has been contested for quite some time, both at the deletion discussion and subsequently on the talk page. The first revert was by User:Wyang with the edit summary, "Western fantasisation". The content was supported by reliable sources including the BBC and the New York Times. This was way back in 2013.

In 2015 I re-added the content because it had reliable sources and it was discussed at the talk page. The conversation ended with me asking them for reliable sources that place the etymology elsewhere, since otherwise it was being removed purely on personal opinion and original research. The topic was seemingly dropped and the content remained.

Then in July 2016, Wyang reverted it again. I came across it today, re-added it, and then made some major changes to the etymology section. Namely, I added that the etymology is disputed, as described in a book I cited, and then proceeded to list the varying origins for the term as cited by the various reliable sources. Wyang reverted my edits with new changes without an explanation. I left a talk page message and restored my new changes. I was reverted almost immediately, and then to my surprise, Wyang protected the article so that only administrators could edit it.

Maybe Wiktionary allows for the removal of cited etymology content. Fine, but Wyang never provided anything other than original research. Even if they did have sources that indicated a different etymology, they could have added it to the section as one of the alternate origins. All of this, as far as I'm concerned, is just a simple editing dispute between two editors, but I was very surprised to see administrative tools used to essentially tilt the argument in one direction. I'm not very familiar with Wiktionary policies, but as admins over at the English Wikipedia, we are expressly prohibited from using our administrative tools in arguments and disputes we are involved in.

Any advice is appreciated. Will totally drop this if this is the custom here. Mkdw (talk) 22:30, 9 August 2016 (UTC)

Wyang did the same to me, with a widely-used module, and I'm also an admin. So you're not the only one. —CodeCat 22:33, 9 August 2016 (UTC)
@CodeCat Do you think it was OK to make a widely used module for Thai transliterations and transcriptions unusable, affecting thousands of entries, upsetting all Thai editors and not really giving a working alternative just because you didn't like the methods used? Please don't mention this in unrelated discussions. Sorry, I don't support you in that. --Anatoli T. (обсудить/вклад) 23:51, 9 August 2016 (UTC)
Please don't misrepresent the problem. The module was not made unusable once Wikitiki had provided an alternative. Those edits were reverted by Wyang. —CodeCat 23:53, 9 August 2016 (UTC)
Wikitiki tried to help but he wasn't sure himself it was working correctly and did what was expected. Wyang gave reasons why. --Anatoli T. (обсудить/вклад) 23:57, 9 August 2016 (UTC)
Wyang was wrong. The fixes did work. I repeatedly asked him to give examples of entries that were broken by Wikitiki's edits. He never gave any. There was no reason to revert the fixes, especially not when they re-created the problem he accused me of creating. Since the edit war, things have been left in a semi-broken state, I'm afraid to try fixing them again for fear of another edit war. I would like a guarantee that it will not happen. —CodeCat 00:00, 10 August 2016 (UTC)
Doesn't seem very collaborative. We didn't even try dispute resolution. Does Wiktionary have a formal process for reporting administrator abuse of the tools? Mkdw (talk) 22:47, 9 August 2016 (UTC)
Bringing up old grievances in unrelated discussions- when you look in the mirror you should be seeing Dan Polansky right now... Chuck Entz (talk) 14:19, 10 August 2016 (UTC)
I failed to see any evidence of substance for your claim (reputable Chinese sources, announcements by the All-China Women's Federation or the Ministry of Education). Unreliable Western media claims should be removed if no original sources can be found. Wyang (talk) 23:03, 9 August 2016 (UTC)
The New York Times, BBC, and the Huffington Post among other sources were provided. In addition, I also included a source from the China Daily, South China Morning Post, and a book by Sandy To. If you believe these sources are "unreliable" that is your personal opinion but is directly in line with Wikimedia Foundation policies on reliable sources. Furthermore, you have failed to provide any sources of your own to support your theory, and even if you had sources, you should have expanded the etymology section to include these other origin explanations. I already added a source that says the etymology is disputed. Lastly, indefinitely protecting the article is prohibited as an abuse of your administrative privileges. Mkdw (talk) 23:11, 9 August 2016 (UTC)
The Wikimedia Foundation has no policies on reliable sources. It is entirely up to the individual sites. DTLHS (talk) 23:14, 9 August 2016 (UTC)
None of these sources makes sense.
"The China Daily reported in 2011 that Xu Wei, the editor-in-chief of the Cosmopolitan Magazine China, coined the term."
This is obviously false (Citations:剩女).
"Chiu, Joanna (04 March 2013). Unlucky in love … or just left out of the market?. South China Morning Post. Retrieved 9 August 2016." is the reference cited for "The term was added to the national lexicon in 2007 and widely popularized by the All-china Women's Federation."
No such claim was found in the actual article.
Wyang (talk) 23:16, 9 August 2016 (UTC)
To repeat my stance from the argument with CodeCat: I consider the application of admin power to either prevent or allow editing in an edit war in which the admin is involved to be misuse of such power, even when the admin in question is correct. I would ask all administrators involved in edit wars to turn to one of their colleagues as a manager of such situations. Korn [kʰũːɘ̃n] (talk) 23:34, 9 August 2016 (UTC)
I would like to petition the community here to unlock the entry sheng nu. Wyang protected the article indefinitely not to prevent vandalism or harm, but to simply enforce their editorial position. As for the editorial dispute, Wiktionary has processes in place such as dispute resolution to which I am a willing participant. Mkdw (talk) 23:38, 9 August 2016 (UTC)
There is no point of petitioning if there is effectively no basis for your claims - the content you added misattributes content from references, or is apparently factually incorrect. Wyang (talk) 23:47, 9 August 2016 (UTC)
"The very first origins of the term sheng nu have been much contested, and it is virtually impossible to find out exactly when and who first coined the term, be it television dramas, talk show hosts, magazine articles, or academic circles. But the most significant aspect of the 2007 official definition that has been endorsed by the Chinese government, and continuously propagated by the government-run All-China Women's Federation"
To, S. (2015). China’s Leftover Women: Late Marriage among Professional Women and its Consequences. Oxford; New York: Routledge.
"The term refers to any unmarried Chinese woman over the tender age of 27, and was coined by the All-China Women's Federation"
Tunstall, Lee (15 November 2012). Are All the Single Ladies Really Like the Oil Sands?. The Huffington Post. Retrieved 2 April 2013.
"State-run media started using the term "sheng nu" in 2007. "
Magistad, Mary Kay (20 February 2013). "BBC News - China's 'leftover women', unmarried at 27". BBC News (Beijing). Retrieved 9 August 2016.
"According to The New York Times, the term was made popular by the All-China Women’s Federation in 2007"
MacLeod, Duncan (11 April 2016). "Marriage Market Takeover for Leftover Women". Inspiration Room. Retrieved 9 August 2016.
"The term “leftover women” surfaced in 2007 in a report by the All-China Women’s Federation, a state agency whose professed purpose is to “protect women’s rights and interests.”"
Reynolds, Christopher (18 April 2016). "Viral video inspires China's 'leftover women'". Toronto Star. Retrieved 9 August 2016.
"The term "sheng nu" was first used by the All-China Women's Federation (founded by the Communist party in 1949) in 2007, to explain that a leftover is an unmarried woman over the age of 27."
Iaccino, Ludovica (31 January 2014). "Single and Educated: the Problem of China's 'Leftover' Women". International Business Times. Retrieved 9 August 2016.
The list of available references goes on and on. The other points you brought up were originally used to cite the sentence, "pressure unwed women into marriage", but you reverted my changes before I was finished. Regardless of whether you think my arguments about sources have merit, it does not excuse abusing your administrative tools, nor does it warrant engaging in an edit war. Mkdw (talk) 23:54, 9 August 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I agree with Korn that a page should not be protected to enforce an editorial position. Unless someone (other than Wyang) objects, I'll unprotect it. Benwing2 (talk) 00:45, 10 August 2016 (UTC)

I support unprotecting the page. I do think more discussion is needed before further edits are made, but they should involve more parties than just Mkdw and Wyang. —CodeCat 00:51, 10 August 2016 (UTC)
Unbelievable. Repeatedly adding unsubstantiated material amounts to vandalism, warranting a block already. This is especially true considering it has been more than three years since I had asked for direct evidence for the claim, and there is none. In this case the editing patterns of the user Mkdw shows that he/she clearly has an agenda with the edits: using fantasy-driven Western media articles misconstruing the actual linguistic scenario in China to push for a point of view, i.e. the viewpoint that the Chinese culture is distorted in the Western eyes - there are words coined by the "All-China Women's Federation" which pejoratively refer to unmarried Chinese women over 25 as "leftover women". This is nonsense. If person 1 writes that "A claims B did something", then it is person 1's task to be able to provide direct evidence that B did something, especially when someone considers A's claim unreliable. If person 1 cannot do so, the claim should be promptly removed. Wyang (talk) 01:18, 10 August 2016 (UTC)
(After edit conflict...)
  • Looking on from outside the argument and sussing out the details, I feel compelled to chime in.
Re: the origins of the term, just looking at Citations:剩女, I see that this term was prima facie not coined by the All-China Women's Federation in 2007 -- all five citations currently on that page are older than 2007: 1964, 1992, 1995, 2002, 2006. Past that, relying on English-language sources to divine the etymology of a Chinese term does not strike me as a wholly viable approach. My field is Japanese, and I've run across numerous instances of English-language sources claiming this or that about a Japanese term, when reliable and respected native-language sources say something else entirely. Relying on mass-media sources is even less viable -- their business is to sell copies, and they do that by printing interesting content, often without much regard to strict veracity.
Re: sources, finding a citation of a term in use is enough to meet our criteria for inclusion, vaguely analogous to Wikipedia's "notability" requirement. But when it comes to the content of an entry, it is not enough that a given source says X or Y: we also pay attention to the identity, reputation, and expertise of sources. As a thought experiment, I wouldn't care one whit if you found that the New York Times itself claimed that the Japanese word gaijin (“outsider, foreigner”) originally came from Hebrew גויים ‎(goyim) -- unless that also agreed with known Japanese sources that make the same claim.
Re: edit warring, it bears noting that Wiktionary's editor base is much smaller than Wikipedia's. We neither need, nor can we use, the kind of bureaucracy that has evolved on Wikipedia. Given also that the number of editors for any given language is much smaller than the total number of Wiktionary editors, we must often rely upon the judgment and expertise of the very small number of people who handle the day-to-day process of maintaining our content. Your edit history (23K+ on Wikipedia, 129 or so here on Wiktionary) and some of the background threads (as at Talk:sheng_nu) suggest that you're well-versed in Wikipedia's culture and way of doing things, but not so much in Wiktionary's.
Ultimately, considering that Wyang is a native speaker of Chinese, can read Chinese source materials, and has a long history of high-quality work on Chinese entries here, I'm much more inclined to trust his judgment over yours, when it comes to the origins of Chinese terms. You discount him entirely by merely posting English-language sources, many of zero etymologic value, and claiming that the burden is on him when he's asking you for reputable Chinese sources backing your claims.
I haven't agreed with everything that Wyang has done, but in this case, it does appear that he is more in the right on the etymology of English sheng nu / Chinese 剩女. ‑‑ Eiríkr Útlendi │Tala við mig 01:21, 10 August 2016 (UTC)
I admit I discounted Wyang when they removed content under the rationale "Western fantasisation". That indicates to me an unreasonable bias. It doesn't matter how Wiktionary treats original research. Any opinion needs to be supported by something otherwise it's simply an opinion. Here is what was removed:
The exact etymology of the term is disputed.[2] The China Daily reported in 2011 that Xu Wei, the editor-in-chief of the Cosmopolitan Magazine China, coined the term.[3] Other sources have indicated the All-China Women's Federation and the Ministry of Education of the People's Republic of China.[1][4] The term was added to the national lexicon in 2007 and widely popularized by the All-China Women's Federation.[1][5][6]
The first two citations at Citations:剩女 seemingly use the term to refer to people who remained after the war. This does not explain the etymology of the term sheng nu as defined by the Chinese lexicon, an unmarried woman in her late twenties. It suggests that leftover and woman (and man in this case) were put together not as a term but as a turn of phrase. The same goes for the 2002 citation that talks about food. Suggesting these are the origins for the term about unmarried women seems unlikely, because nothing ties these co-occurrences of the words to the term or shows an evolutionary process. Where I think Wyang's ability to read and write Chinese could be useful is not in conducting their own original research, but in finding Chinese sources that identify any one of their citations as the origin of the term. In the meantime, the sources we do have are all we have. Wrong? Possibly. Sourced and not original research? Yes. I would have even settled for "The exact etymology of the term is disputed.[2] The term was added to the national lexicon in 2007 and widely popularized by the All-China Women's Federation.[1][5][6]" but that never seemed to be on the table either because Wyang always resorted to wholesale reverts.
I won't even get into all the problems Wiktionary invites by allowing original research to prevail over published material (including books). Or by allowing admins to use their tools to resolve editorial disputes. You're right, I know Wikipedia and Wiktionary are different, but I cannot see how Wiktionary hopes to grow its community and welcome newcomers when special privileges and rules are applied to a select few.
Lastly, so quickly labelling me as a disruptive editor is only evidence of that. I'm not here pushing some fringe idea. I was adding what I've found in the sources. I have adjusted the content as new sources have come up as evident in my last series of edits. It allowed for multiple explanations and acknowledged the origin is disputed. I was seeking compromise. I was using the talk page until Wyang stopped replying. I have even mentioned an openness to dispute resolution. Mkdw (talk) 02:45, 10 August 2016 (UTC)
One point that hasn't been made yet: your sources are adequate for an encyclopedia article, but this is an etymology, which requires a different skill set than most journalists have. There was an article in a non-linguistic journal that made assumptions about prehistoric human culture based on long-range linguistic reconstructions that are considered by most linguists to be way out on the fringe, and there were all kinds of articles in mainstream journalistic sources that treated this without any skepticism at all. Chuck Entz (talk) 14:19, 10 August 2016 (UTC)
  • Thank you, Chuck -- that was the point I tried to make earlier, that mass-media sources are of exceedingly low value when it comes to etymologies. Your restatement is clearer. ‑‑ Eiríkr Útlendi │Tala við mig 23:18, 10 August 2016 (UTC)
  • @Benwing2, I'm with Wyang here -- Mkdw really does not seem to get it, but he keeps pushing his position. In Wyang's place, I might have done the same thing: with few Chinese editors to collaborate with, he's been one of the most active Chinese editors for a while now (at least, from what I've seen). If we unprotect the page, what do we do if Mkdw keeps adding low-value English-language "sources"? What other approach would you all advocate? Should we block disruptive users, rather than locking disrupted pages? Serious questions, BTW, I'm not being rhetorical. ‑‑ Eiríkr Útlendi │Tala við mig 01:25, 10 August 2016 (UTC)
OK, I'll leave it alone now. I still believe that it's an abuse of admin powers to lock a page over editorial disagreements, even if the admin is almost certainly correct, as long as the other user is apparently acting in good faith and is willing to respect the dispute process (which for us would probably be a Tea Room discussion); but Eirik you make good points. Benwing2 (talk) 02:31, 10 August 2016 (UTC)
This sets a terrible precedent on principle alone. This creates a de facto community endorsement whereby administrators can revert an editor and then protect the entry indefinitely to enforce their editorial preference in a situation, even if the editor is willing to go to dispute resolution, is using the talk page, and is changing their edits to find a compromise -- provided the admin feels like they're "right". I'm admittedly very disappointed but I strongly believe in community consensus. If the community consensus here and now is to endorse this action, then I withdraw my request and accept it under protest. The editors at Wiktionary have the right to their own self-determination regarding their practices. It's unfortunate that this type of conduct is being endorsed rather than, say, a community consensus possibly endorsing Wyang's editorial position, if they deemed so, but also finding Wyang's administrative actions inappropriate. I would have accepted it and the community (including administrators) would have had recourse against any editor (including myself) going against the consensus. I would think that would then be deemed a disruptive editor, but seemingly the threshold is way less than that. Maybe there's no appetite for bureaucracy but there must be checks and balances for administrators and this is one step in the opposite direction. Mkdw (talk) 02:51, 10 August 2016 (UTC)

Why are you all walking into a smokescreen now? This is not a debate about the origins of the phrase sheng nü, this is a debate about whether admins should get to abuse their goddamn power. The edits made were both good faith and sourced with media which is generally accepted to be decent, and as such as proper as any Wiki-edit can get. They can be wrong a hundred times over and Wyang can be right a hundred times over, using his superior power in an edit war, not even to stall the warring until a consensus was reached, mind you, but to allow him to forego the argument is crass abuse and should not, for any reason, be tolerated. This should be a place where arguments should be won by superior evidence, not by sucker-punching your opponent with your superior admin-muscle. In the same vein, and you damn well know whom I am looking at, this is a place where arguments should be carried out in civilised debate and not by playing volleyball with a page because nobody can stop you. If you're part of the argument, you don't get to be judge or police, that should be a very simple rule we can all agree on. While I have no doubt that Wyang is an absolute treasure as an editor on Asian languages, the fact that he's involved in two such situations within a short period of time doesn't give me the best impression of him as an admin and the fact that an outsider now has reached the conclusion that we as a whole condone such abuse should shame us all. Korn [kʰũːɘ̃n] (talk) 10:42, 10 August 2016 (UTC)

Well, Wiktionary is not like Wikipedia. A word can be demonstrated to exist at a point in time simply by showing attestations of the word at that time point, and there is no point citing an external article claiming a word was coined in 2011 while there are ample attestations for its use long prior to that. This is exactly what User:Mkdw did not get and what he/she had been trying to do repetitively over the years, including today again. It has been made clear that there are reasonable doubts regarding his edits more than three years ago, but he ignored the comments and the attestations I have gathered at Citations:剩女 to continue pushing for his POV edits. It is apparent they are trying to match the Wiktionary content to the stuff ("Good Article") they have written over at Wikipedia, with complete disregard for criticism and the linguistic facts. This is vandalism and should be dealt with as such. Wyang (talk) 11:57, 10 August 2016 (UTC)
In my book, it's not vandalism as long as it's done in good faith. But no matter whether it is: Even if there is an edit war involving one steadfast defender of the right thing and one plain vandal, then yes, the vandal has to be dealt with, but by a neutral third who has heard both sides, not by one side of the edit war. That is all I'm saying. Korn [kʰũːɘ̃n] (talk) 13:20, 10 August 2016 (UTC)
  • I think the first round of back-and-forth edits in 2013 counts as good faith. Mkdw came back just a couple days ago and added essentially the same content, completely failing to accept or respond to the past argument that the content he was adding was from sources of clearly contestable value. After Wyang made it clear that Mkdw's sources were still inadequate, Mkdw continued to insist -- he again refused to acknowledge the possibility that just having a source isn't enough here on Wiktionary. This is where I start to view Mkdw's edits as not in good faith any more. ‑‑ Eiríkr Útlendi │Tala við mig 23:27, 10 August 2016 (UTC)
PS: See [[Talk:sheng nu]] for the relevant discussion and timeline. ‑‑ Eiríkr Útlendi │Tala við mig 23:29, 10 August 2016 (UTC)
I agree with Korn in principle, but there are some practical issues that need to be dealt with. When someone makes an edit and an admin finds it, at this point the admin is still an uninvolved party, when the admin reverts the edit and the original editor reverts it back, does the admin now suddenly become an involved party and need to seek out another uninvolved admin? When the second admin reverts the edit and the original editor reverts it back, does the second admin now also become an involved party and need to seek out a third admin? It's difficult to know what the "right" thing to do is if you are an admin in that situation and we don't have any clear guidelines on this. I think we really need to draft up a policy on this, so that admins will have a procedure to follow and also so that we can clearly determine when an admin is not following it. --WikiTiki89 15:04, 10 August 2016 (UTC)
My simple proposal for every sort of edit war is that instead of a second undoing, a third person has to be contacted. That is: 1. An edit is made by John. 2. It is undone by Jim. (It is not relevant whether this is done with the undo function or by rewriting the contents.) 3. The original edit is restored by John. 4. Jim is not allowed to undo it a second time. Instead, Jim is now obliged to bring the discussion to the attention of other users of the language or the community in general (such as Beer Parlour). Korn [kʰũːɘ̃n] (talk) 15:45, 10 August 2016 (UTC)
ps.: Obviously, if John and Jim agree that they will debate this amongst themselves or that they are fine with continued editing as a form of successive proposals rather than merely trying to set the page back to a former status quo over and over, a third party need not be bothered with it. Korn [kʰũːɘ̃n] (talk) 15:48, 10 August 2016 (UTC)
My name is John and my brother's, Jim. I sincerely doubt whether we would have such a disagreement. :P JohnC5 15:54, 10 August 2016 (UTC)
  • I'd like to point out a concern here -- we have the Wiktionary editor community (not very big to start with), and we have the individual language editor communities (much smaller still). If our hypothetical Admin Jim is the only active editor for Language Foo during the time that non-admin Editor John is busy adding controversial content to entry Bar, do we now demand that Admin Jim just sit on his hands for possibly several days, or longer, until some other editor for Language Foo comes along? Again, serious question, not rhetorical. I'm interested in people's views here. ‑‑ Eiríkr Útlendi │Tala við mig 23:27, 10 August 2016 (UTC)
In response to Wikitiki89 (talk • contribs), no it seems to me that an uninvolved admin does not become involved by reverting the editor causing the controversial edit, including multiple times. In response to Eirikr (talk • contribs), no the admin shouldn't have to wait until someone else knowing that language comes along. Instead there should be a discussion in Tea Room or Beer Parlour or wherever. That way, others can weigh in based on the evidence. (In practice, in such a case the admin, esp. a long-time contributor to a language, will probably get the benefit of the doubt unless the other user can show a sufficiently good reason why the admin is wrong.) IMO in a controversy a long-time status quo should prevail until the controversy is resolved, and it's probably OK to lock a page on the status quo to prevent an edit war, *if* the user doing the controversial edit insists on edit-warring rather than participating in a discussion; I've seen that happen in Wikipedia. The locking should happen by an uninvolved admin, though, and only while the discussion is happening. Benwing2 (talk) 23:41, 10 August 2016 (UTC)

Proposed addition to WT:NORM: headers cannot be nested inside things

I propose adding an additional rule: Headers must not be nested inside other elements, such as templates and (HTML) tags.

This rule would make parsing a lot easier, because a parser would not need to parse the nesting of templates before they can determine whether a header is "real", appearing at page-level, or is actually nested within some template. A parser would be able to assume that every header is "real". With this change, the code

{{foo|bar=
==baz==
}}

would be disallowed. —CodeCat 20:06, 10 August 2016 (UTC)
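To illustrate the cost the proposal is meant to remove, here is a minimal Python sketch of the check a parser has to do today: it tracks `{{ }}` nesting depth line by line and reports header-like lines that fall inside a template. The function name `find_nested_headers` and the header pattern are assumptions made for this illustration, not part of WT:NORM or of any existing tool, and the brace counting is deliberately naive (it ignores HTML tags, `<nowiki>`, and triple braces).

```python
import re

# Hypothetical pattern: a line consisting of balanced "=" runs around a title.
HEADER_RE = re.compile(r"^(=+)\s*[^=\s].*?\1\s*$")


def find_nested_headers(wikitext):
    """Return (line_number, line) pairs for header-like lines inside {{...}}."""
    depth = 0  # number of currently open "{{" pairs
    nested = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        # A header occupies a whole line, so test it against the depth
        # that was in effect when the line started.
        if depth > 0 and HEADER_RE.match(line):
            nested.append((number, line))
        depth = max(depth + line.count("{{") - line.count("}}"), 0)
    return nested


sample = "{{foo|bar=\n==baz==\n}}\n==English==\n"
print(find_nested_headers(sample))
```

If the rule were adopted, a parser could skip the depth tracking entirely and treat every line matching the header pattern as a page-level header.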

Do you have an example of a page that does this? DTLHS (talk) 20:12, 10 August 2016 (UTC)
Not really, there may well be none (there's some talk pages, but those don't count for WT:NORM). The point is to have an official rule in place that disallows it, so that a parser's design can be simplified. —CodeCat 20:14, 10 August 2016 (UTC)
Is there really a need to make a formal rule about this? I mean to me it just seems obvious not to screw around with the layout and such like that via a template. Although I do see some pretty crazy things done on wikis sometimes... Philmonte101 (talk) 21:38, 10 August 2016 (UTC)
@CodeCat: I think this is a problem with WT:NORM, Templates, 3: 'For templates with many or long parameter values, line breaks are allowed at the end of a template's name or a parameter's value, for the purpose of making the wikitext easier to read.' If one changes 'are allowed at' to 'are only allowed at', I believe this would make your example impossible, because every line inside a template would have to begin with a pipe or a double closing curly brace. I assumed that 'only', although not stated, was probably intended by this line. Edit: For HTML, I agree and see something like this as important. Isomorphyc (talk) 02:57, 11 August 2016 (UTC)
I don't understand what you're saying. This is about nesting headers in templates and other things. —CodeCat 15:24, 11 August 2016 (UTC)
@CodeCat: As I understand it, headers are required to begin immediately after a line break (I assume Headings 1. means 'one blank line [immediately] before all headings', except the first, as anything else is extremely rare and normally treated as an error). If templates which continue onto a second line are required to do so only after the value, then the line break will be immediately followed by a pipe, a closing double curly brace, or whitespace, never an equals sign. Hence, this change to the template rule will prevent embedding pseudo-headers into templates, which it probably was intended to do in the first place. HTML and other things are a different matter. WT:NORM is a little bit subtle in places; am I misunderstanding? Isomorphyc (talk) 16:44, 11 August 2016 (UTC)
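The stricter reading of the template rule described above can be sketched as a small Python check: inside a multi-line template, every continuation line must start (after optional indentation) with a pipe or a closing double curly brace. The function name `bad_continuation_lines` is hypothetical, and the brace counting is a simplification that ignores `<nowiki>`, comments, and triple braces.

```python
def bad_continuation_lines(wikitext):
    """Return (line_number, line) pairs for lines inside {{...}} that do not
    begin with "|" or "}}" once leading whitespace is ignored."""
    depth = 0  # number of currently open "{{" pairs
    bad = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        stripped = line.lstrip()
        # Blank lines are skipped; only real continuation lines are checked.
        if depth > 0 and stripped and not stripped.startswith(("|", "}}")):
            bad.append((number, line))
        depth = max(depth + line.count("{{") - line.count("}}"), 0)
    return bad


print(bad_continuation_lines("{{foo|bar=\n==baz==\n}}"))
print(bad_continuation_lines("{{foo\n|bar=1\n|baz=2\n}}"))
```

Under this reading, the pseudo-header line is rejected simply because it starts with an equals sign, while a conventionally wrapped template passes.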

Proposed addition to WT:NORM: no template parameter expansions

This means that things like {{{1}}}, with three curly braces, can't appear in the wikitext. This is probably something that goes without saying, since regular pages aren't ever passed parameters. But to have it codified would again be a useful assumption for parsers: rather than having to decide whether a run of curly braces should be grouped in twos or threes, a parser can assume it's always twos. —CodeCat 20:11, 10 August 2016 (UTC)
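As a rough illustration of how such a rule could be checked, the following Python sketch flags triple-brace parameter expansions in a page's wikitext. The name `find_parameter_expansions` is made up for this example, and the regular expression is an assumption that only catches the simple, non-nested form (for a nested default such as {{{1|{{{2}}}}}} it reports the inner expansion only).

```python
import re

# Three opening braces, content without braces, three closing braces.
PARAM_RE = re.compile(r"\{\{\{[^{}]*\}\}\}")


def find_parameter_expansions(wikitext):
    """Return every simple {{{...}}} parameter expansion found in the text."""
    return [match.group(0) for match in PARAM_RE.finditer(wikitext)]


sample = "{{foo|{{bar}}}} and {{{1}}} and {{{2|default}}}"
print(find_parameter_expansions(sample))
```

Note that the nested template call {{foo|{{bar}}}} is not reported: it ends in four closing braces but never has three opening braces in a row, which is the grouping ambiguity the proposal would let a parser ignore.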

I support this. Edit: Some rationale: I recognise it is not desirable to turn WT:NORM into a grammar, but I think just a few lines should make the extensions completely orthogonal to the Wikitext abstraction. Given the potential for adding logic into the triple brace notation, and the fact that a pushdown parser is required to fully treat this syntax, this exception is worth codifying. Isomorphyc (talk) 03:57, 11 August 2016 (UTC)
That sounds reasonable. - -sche (discuss) 09:32, 13 August 2016 (UTC)

Wiktionary:About Scots

Who here contributes a lot to our Scots lexicon? I notice that words that mean the same thing in both English and Scots are never added by users other than me for some reason. I've added terms like electromagnetic, Denmark, and others, that have the same meaning as in English. I guess it's because people often consider Scots a dialect rather than a language. I feel like we should have all terms in Scots, or else we should make a formal decision about which Scots terms should be allowed here and which shouldn't. Philmonte101 (talk) 21:36, 10 August 2016 (UTC)

@User:Angr, User:Nbarth, User:Leasnam, you three might be interested in this discussion. Philmonte101 (talk) 23:48, 10 August 2016 (UTC)
I agree that these words should be added, but like we saw with Middle English entries, it tends to fall by the wayside. It may seem like duplicate effort to try and get Scots words in when they're the same as the English word. Time to roll up our sleeves I guess Leasnam (talk) 04:48, 11 August 2016 (UTC)

I just had a few things pop into my head. You're right, for one, it seems really tedious to add all of those entries. But what if we could possibly get a list of these terms from somewhere, like, say, another dictionary dedicated to the Scots language? And then we could either add them manually or automatically create the entries somehow, like, say, a bot of some kind that takes information from the English entries and converts them into Scots (except no definition, just a simple one-word translation)?

@User:Leasnam But then again, we'd have to worry about verification of these terms, as per WT:ATTEST. I wonder, is there an easier way to verify Scots terms? When I try to look for Scots terms in, say, Google Books, for "electromagnetic", all I find are English sources. Is there an online Scots library of some sort, or some way to search in only Scots language books/archived documents? Philmonte101 (talk) 05:09, 11 August 2016 (UTC)

Hrmm, there are several, but I am unsure if they are copyrighted or not. Most likely would be. I tried searching for "electromagnetic" and "maist"/"mair"/"ilka" and didn't turn up anything in Scots. Scots is mainly devoted to poetry and older language. Anything having to do with electromagnetism I think would be written in English (?) Leasnam (talk) 05:24, 11 August 2016 (UTC)
I'll be honest with you, I found a lot of those terms on Wikipedia, and searched them up to make sure they were used consistently on the site, and didn't seem to have any variations. So I assumed that because of this those terms may be attested. But I guess I was wrong about that. But wait, couldn't Scots be one of those languages that has a template that says it's scarcely documented, and so therefore the rules of CFI for it are different than for more common languages? I know I've seen Malagasy terms that had this template. Philmonte101 (talk) 16:47, 11 August 2016 (UTC)
Semi-relevant: one thing that annoys me is that when you create a Scots verb (sco-verb), it inflects in a certain way (can't remember exactly, but something like e.g. walkit instead of walked) that often doesn't reflect the actual literature. It is, of course, hard to draw a clear line between English and Scots words, given the history. Equinox 20:09, 11 August 2016 (UTC)

Telugu wikisaurus

I would like to create a Wikisaurus for the Telugu language. Where and how should I do it? What is the platform: is it the English Wiktionary or the Telugu Wiktionary? Thank you if someone answers.--Rajasekhar1961 (talk) 04:01, 11 August 2016 (UTC)

I would try Telugu Wiktionary. DCDuring TALK 10:45, 11 August 2016 (UTC)
Can someone help me create the Wikisaurus in Telugu Wiktionary? I need some technical assistance to create the necessary Wiktionary:Wikisaurus/Format and templates there. Thank you.--Rajasekhar1961 (talk) 11:28, 11 August 2016 (UTC)
@Dan Polansky did the work on that here. DCDuring TALK 13:13, 11 August 2016 (UTC)

In French Wiktionary there are thesaurus pages in different languages, not only in French, with translations into French. Is it different in English Wiktionary? In which convention page is it specified? Noé (talk) 10:00, 18 August 2016 (UTC)

Usage note at schrift

I removed this usage note, considering it pointless, but User:Morgengave restored it and expanded it further. I don't think this is much of a usage note at all, since it doesn't say anything about the usage of the term (that the definitions don't already say), nor is it customary for us to mention other terms different by capitalisation in usage notes. We have this on Earth vs earth, but not Moon vs moon or most other cases where this happens. What do others think? —CodeCat 21:56, 11 August 2016 (UTC)

Hi CodeCat - My personal view is that if it can help the person consulting the lemma, then let us include it, on condition that it remains sharp and concise obviously. Overall, it would be helpful to have a Wiktionary:Usage notes policy to use as guidance. Morgengave (talk) 22:05, 11 August 2016 (UTC)
We already have {{also}} for this, which works fine for the vast majority of pages. So why do these few cases in particular require a usage note? Also, does your usage note even give any notes on usage? —CodeCat 22:08, 11 August 2016 (UTC)
I'm fairly relaxed about this - I am open to multiple solutions as long as the user is helped. I believe it helps more than the "also" on the top of the page as in my view the also-template is for users to find the right lemma back easily and the usage notes are to help the user on usage (including potentially confusing situations). Two different purposes in short. There's unfortunately no usage notes policy and this one sentence seems fairly harmless even if one would deem it redundant. Can you help me understand you better: why are you keen on removing this one sentence? Morgengave (talk) 22:27, 11 August 2016 (UTC)

Reference specifications

As per a debate I have had recently and in the past with @Dan Polansky, an example of which may be found at Template talk:R:DSMG, I think that references in templates or in entries should be explicit and in full format. Pace Dan, whatever beauty may be derived from a “simple” format of templates such as {{R:DSMG}} and {{R:Webster 1913}} accompanies a loss of relevant citation information. We have templates created (and updated recently by @Smuconlaw), {{cite-book}}, {{cite-journal}}, etc., that provide standardized citation functionality. Indeed, I would be prepared to start a vote to make rules for citation generally. What do people think? Is this a minor quibble, or do people agree that we should have a standardized, full format? —JohnC5 17:26, 12 August 2016 (UTC)

I am fine with a full detail being available on a mouseover. Thereby, the detail would be there for those who require it, while it would not block the radar screen and disturb the skimming focus of those who love succinct identification. --Dan Polansky (talk) 17:28, 12 August 2016 (UTC)
By way of example: into {{R:is:IEO_1989}}, I placed the following two formats, the short one and the long one, the long one being visible upon mouseover:
  • word in Hólmarsson et al.: Íslensk-ensk orðabók. 1989.
  • word in Sverrir Hólmarsson; Sanders, Christopher; Tucker, John • Íslensk-ensk orðabók / Concise Icelandic-English Dictionary • Reykjavík: Iðunn, 1989
--Dan Polansky (talk) 17:31, 12 August 2016 (UTC)
I agree that excessive details can be hidden using a mouseover or expandable box. (I would like this for our quotation templates as well). DTLHS (talk) 17:33, 12 August 2016 (UTC)
If this is to be done it should be done consistently across all templates with regards to what information is hidden. "Box" wasn't the right word, just an expandable section. I prefer that to a mouseover because with a mouseover you can't have any links and you can't select and copy the data. DTLHS (talk) 17:43, 12 August 2016 (UTC)
What should be done for platforms that have no mouse pointer? —CodeCat 17:46, 12 August 2016 (UTC)
An alternative I have in mind is that each reference template would contain a link to a section in an appendix page for reference templates. That section would contain a full identification and more. As for technology, it is a simple wikilink. --Dan Polansky (talk) 17:52, 12 August 2016 (UTC)
The proposal here would be to have giant appendices containing full versions of every citation we use or choose to abbreviate? —JohnC5 18:12, 12 August 2016 (UTC)
You mean every reference, right? I hope you do not intend to push your ornamental cast iron to our poor attesting quotations; they are already too noisy, putting metadata before the quotation itself. As to the substance of your question, the appendices obviously do not need to be "giant"; they can be as granular as we see fit, and therefore as small as we see fit. --Dan Polansky (talk) 18:21, 12 August 2016 (UTC)
Don't worry, none of this will ever happen since there are a million different reference formats none controlled by the same back end, making any kind of unification impossible. DTLHS (talk) 18:42, 12 August 2016 (UTC)
Oh ye of little faith! We certainly can make a standard then fix all the templates. —JohnC5 19:36, 12 August 2016 (UTC)

Proposed extension to criteria for inclusion on proper names of fictional works.

I'd like to propose a change (and if these become votes, they'd be separate votes from one another) to our criteria for inclusion.

Proposal

  • Proper names of titles of fictional works, such as books, television series, video games and video game series, should be included in our lexicon, as long as they have 3 sources that are independent from the book/series itself. I.e. the book citations, or Usenet citations, must not specifically be about the television series or video game.
  • Please note that this proposal is not about appending fictional characters or names of fictional entities into Wiktionary; just about the titles of the works themselves (usually represented by italics). I feel that characters and entities should follow the guidelines that are here already.

Motives

  • Just like countries, cities, county names, etc., titles of these works are proper nouns.
  • Many will argue that including such things "is not traditional." Though it is not traditional, I'm surprised that we didn't include these already. For example, with Wikipedia, traditional paper encyclopedias don't generally include articles about television series or cartoon characters. Well, they might have a few of the really important ones, but not many. So, Wikipedia is thus extremely different from the traditional encyclopedias in many ways, and is in fact better if you ask me. I'd say the same thing about Wiktionary. Wiktionary includes far more information than most paper dictionaries do. Many dictionaries don't include nearly the amount of etymological information, synonym information, derived terms, anagrams, pronunciation, etc., that we do here. Also, they generally don't include rare slang terms. We do. Most paper dictionaries wouldn't include "all words in all languages", because, well, it'd be silly; it would take millions of pages. Most paper dictionaries don't include individual entries for inflected forms. And now add this: paper dictionaries generally don't include names of popular TV series, or classic works of literature, etc. But it would be informative to readers, so why don't we?
  • Many TV series have a few translations in other languages. Such as, The Simpsons sometimes translates to Los Simpson in Spanish. The TV series Cops apparently translates to Zsaruk in Hungarian. I could find more examples, but you get my point. Many people might want to know the translations of these proper nouns. Of course, the translations as well as the English entries would have to be verified as per the changed CFI.

Examples

TV series

Duckman

  • [7] "The network felt that Duckman, the Emmy award-winning, but low-rated series about an acerbic, chauvinistic detective and his bumbling family, did not reflect the general-entertainment brand model that USA was trying to build in prime time [...]" 2013
  • [8] "This may be the show that proves TV animation can stay up past most kids' bedtimes and still find a strong, profitable audience. "Duckman," a new latenight series on USA Network, is crude, violent, cynical, antisocial and a little sexist, pretty [...]" 1996
  • [9] Usenet. Just scroll all the way down until you find ones that don't have "Duckman" in the title.
Classic book

Winesburg, Ohio

  • [10] Groups, from 2010.
  • [11] "As I begin to reevaluate the place of Sherwood Anderson's Winesburg, Ohio in the development of American fiction, I first want to look at Anderson's symbiotic relationship with Gertrude Stein, a relationship most Stein devotees will know about [...]" 1999
  • [12] "In triggering conversation in Winesburg, Ohio, however, a single word can dramatically alienate and isolate dialogic partners in the frightening immediacy of their encounter; such contact is always a "traumatism of astonishment." 2009

Separate proposal

If this doesn't work, we may be able to have these proper names of fictional or nonfictional works somewhere in the appendix namespace.

Conclusion

I just threw this together in about 30-45 minutes. But you get my point, I'm sure. You can find sources that aren't directly about these proper nouns, that are from varying years, and that were not written by its creator. If these were the inclusion standards for entries for book names or TV show names, we should also have a header that italicizes the proper noun, as this is the standard in English.

Comments below

So, what do you think? I feel the urge to start a vote, and it says to start discussion in the beer parlour. I know quite a few of you will most definitely and immediately disagree, and I have a semi-good idea of which users will and won't (whom I know). Although, I'm sure there will at least be some who agree or at least partially agree with this proposal, and I'm open to suggestions to things I should change before starting the vote. (No personal attacks please) Philmonte101 (talk) 05:16, 13 August 2016 (UTC)

  • Oppose. I hope other editors will articulate the reasons; I think it's kind of obvious. --Dan Polansky (talk) 07:49, 13 August 2016 (UTC)
  • Oppose We have enough work to do just filling out, cleaning up, and otherwise maintaining what we've got. DCDuring TALK 12:47, 13 August 2016 (UTC)
  • Oppose. This type of information is better suited to an encyclopedia. If only there were an encyclopedia version of Wiktionary... --WikiTiki89 13:27, 13 August 2016 (UTC)
    Precisely. Oppose. Equinox 13:28, 13 August 2016 (UTC)
  • Oppose for the reason given by Wikitiki89. If a fictional title or character has gained some idiomatic meaning, then it merits inclusion here (e.g., Wonder Woman to mean a woman of extraordinary ability). If it only retains its fictional meaning, then it belongs at Wikipedia. — SMUconlaw (talk) 11:33, 15 August 2016 (UTC)
  • Oppose. We might as well include the names of specific people, like Justin Trudeau, so that people who want to know why his parents chose that name for him can look it up under the etymology. Andrew Sheedy (talk) 17:22, 15 August 2016 (UTC)

oversized Cyrillic for Old Church Slavonic and Old East Slavic

For some reason, the Cyrillic font we use for Old Church Slavonic and Old East Slavic renders bigger than the Cyrillic font for Russian, at least on my Mac OS X laptop under Chrome. See тать for an example; compare the Old Church Slavonic entry to the Russian entry, and see the Russian etymology for an example of Old East Slavic, which looks (on my machine) the same as Old Church Slavonic. Do we want to fix this? Benwing2 (talk) 21:15, 13 August 2016 (UTC)

The reason for this is that the specific Old Cyrillic fonts come out smaller and therefore need to be rendered bigger. Your Mac is probably using Helvetica or whatever the default Mac font is, because it supports the characters and because you don't have Old Cyrillic fonts installed. Ideally, we should be able to provide font-specific sizes, but I don't think CSS supports that. --WikiTiki89 21:32, 13 August 2016 (UTC)
If you lack fonts, these fonts may help you. [13] (In this case, try Noto Sans or Noto Serif.) --Octahedron80 (talk) 12:57, 18 August 2016 (UTC)
The issue is not character support in the fonts, but rather the choice of font. The Old Cyrillic script is meant to be displayed like this. --WikiTiki89 13:18, 18 August 2016 (UTC)
I'll pass if it depends on font variations, since they are located on the same codepoint. --Octahedron80 (talk) 00:38, 19 August 2016 (UTC)

"book cites aren't usexes"

In diff user:Equinox removed the {{ux}} template. It's good and well if we decide that this template is strictly for usexes (which is far from decided as far as I know, but never mind), but the template should not be removed altogether. Instead an alternative template should be provided that is more appropriate. —CodeCat 20:26, 14 August 2016 (UTC)

At the moment, ux puts the text in italics, which doesn't look good for book citations. Equinox 20:27, 14 August 2016 (UTC)
{{ux}} is wrong if we wish to maintain the customary italicization of book/journal/newspaper titles. I can't understand how the failure to explicitly exclude {{ux}} from use for citations constitutes sanction in favor of it. One could as easily claim that wikitext can overwrite all templates not explicitly endorsed by a voted policy. This kind of thinking is dangerous in an admin. DCDuring TALK 21:46, 14 August 2016 (UTC)
Maybe we should have a template that's identical to {{ux}} except it doesn't italicize, for use with quotations. People seem to like to use it in this way. DTLHS (talk) 22:19, 14 August 2016 (UTC)
Maybe we should keep {{ux}} for both usexes and quotes and maybe change it not to use italics for Latin chars. Note that it doesn't currently use italics in Cyrillic. Benwing2 (talk) 22:20, 14 August 2016 (UTC)
It's more or less like {{l}} versus {{m}}. —CodeCat 22:26, 14 August 2016 (UTC)
What is the advantage in using {{ux}} in terms of improved user experience, improved ease of adding content, speed of downloading, server load, etc.? Why aren't we hearing about such advantages? This also seems to go against the idea of intuitive names to speed the learning by new contributors. If that isn't important, why not rename {{ux}} to {{u}} for "qUotation" and "Usage example"? DCDuring TALK 22:38, 14 August 2016 (UTC)
For English, I don't know. For foreign languages it provides uniform formatting of translations and (for non-Latin scripts) transliterations. Benwing2 (talk) 22:57, 14 August 2016 (UTC)
Because the more consistent we make our entries the easier they are to edit. And the last 10 years have proven that we are utterly incapable of any consistency that isn't rigidly enforced by templates. DTLHS (talk) 22:59, 14 August 2016 (UTC)
For quotations, we have a whole series of templates (the most important being {{quote-book}}, {{quote-journal}} and {{quote-web}}) that can be used for a consistent appearance. I don't think any other templates are required. — SMUconlaw (talk) 11:29, 15 August 2016 (UTC)
@DTLHS
  1. How does {{ux}} do a better job of making our entries easier to edit? It looks like just a labeling requirement imposed on others to make life easier for amateur programmers.
  2. The "quote-" family of templates does make for a great deal of uniformity in line and character formatting and in order of the components of citations. What does {{ux}} add? If the idea is that the advantage will emerge in the fullness of time, we would need to have a great deal more faith in the capability of our "technical" contributors than I believe they have earned. DCDuring TALK 13:31, 15 August 2016 (UTC)
{{ux}} provides automatic transliteration, whereas {{quote-book}} et al. do not. A deal breaker for me. --Vahag (talk) 14:03, 15 August 2016 (UTC)
I didn't know that {{ux}} provides automatic transliteration. But in that case, the solution is for someone knowledgeable about Lua to add automatic transliteration to {{quote-meta}}. (The "quote-" family of templates already has a |transliteration= parameter.) Using {{ux}} in this context is not very appropriate because it formats quotations differently from the "quote-" templates, leading to a lack of consistent appearance. — SMUconlaw (talk) 15:34, 15 August 2016 (UTC)
But why mix two different functions (citations and text rendering) in one template, when it would be easier to just use two templates? --WikiTiki89 17:06, 15 August 2016 (UTC)
Since the "quote-" templates are already intended for formatting quotations, why not just build the automatic transliteration function into {{quote-meta}} instead of having to use yet another new template? — SMUconlaw (talk) 17:20, 15 August 2016 (UTC)
Transliteration is not the only thing missing. {{ux}} (and I guess now {{quote}}) support many features that are useful for rendering text, such as allowing language-links, and may support more features in the future, such as automatic linking and who knows what else. It wouldn't make sense to add each of these features in more than one place, when they can just be added to one place. Let the "quote-" templates focus on formatting the citation line itself and not worry about rendering quotation text. --WikiTiki89 17:26, 15 August 2016 (UTC)
I see. But the "quote-" templates also render the quotation text through the |passage= parameter. The current situation means that two separate templates have to be used for formatting quotations depending on what features are required in the quotation text. I hope this is explained somewhere (perhaps at "Wiktionary:Quotations"). — SMUconlaw (talk) 17:45, 15 August 2016 (UTC)
I don't see any inherent problem with using two different templates. In fact I even find that it makes the wikitext more readable. --WikiTiki89 17:52, 15 August 2016 (UTC)
I think {{quote-meta}} should be using the usex module to render the quotation text. DTLHS (talk) 17:47, 15 August 2016 (UTC)
You could do that, but you would have to ensure that there are no argument naming conflicts and such. I don't think there are any at the moment, but it would be an extra thing to worry about whenever adding a new argument to either {{ux}} or any of the individual "quote-" templates. I don't see the point. --WikiTiki89 17:52, 15 August 2016 (UTC)
I have created {{quote}}, which works the same as {{ux}} except for these differences in formatting. And it does do automatic transliteration. —CodeCat 15:44, 15 August 2016 (UTC)
What exactly are the differences in formatting? I tried it at קטון, but the formatting is exactly the same. Does this only apply to Latin script? --WikiTiki89 17:06, 15 August 2016 (UTC)
CodeCat answered this question at User talk:KIeio‎#Template:quote, saying: “It doesn't apply any formatting to the quoted text, so that it preserves its original formatting as much as possible.” @CodeCat: I presume you meant basically that italics are not applied? Is there any other difference? And can we get the following to work as expected: {{quote|ru|Слова ''да'' и ''нет''.}} (currently the italics don't do anything in scripts other than Latin)? --WikiTiki89 19:52, 15 August 2016 (UTC)
This can't be fixed unless we allow for Cyrillic italics in general. Previous discussions have mostly led to the conclusion not to allow them. —CodeCat 21:08, 15 August 2016 (UTC)
This can be fixed without allowing non-Latin scripts to be italicized in mentions and usexes. Previous discussions have led only to the conclusion not to allow italicizing non-Latin mentions and usexes, but that does not apply to quotations. --WikiTiki89 21:55, 15 August 2016 (UTC)
True, this would work thanks to {{mention}} having distinct style tags (finally, a good use for it). I just wanted to make sure that it was ok to remove the blocking of all Cyrillic italics, which I believe we have currently. —CodeCat 22:19, 15 August 2016 (UTC)
Is there a reason why {{ux}} no longer seems to italicize example sentences? It's introduced a whole bunch of inconsistencies (compare, for example, the verb and noun sections at shift, one of which was italicized manually; the other with the template). Also, now that the example sentences are no longer visually distinguished from the definitions, it is much harder to read. I'm guessing this discussion has something to do with the change, but could it be reverted? It's just undone possibly hundreds of my edits which were aimed at increasing consistency between entries. Andrew Sheedy (talk) 20:02, 15 August 2016 (UTC)
In Module:usex: if quote and (sc:getCode() == "Latn" or lang:getCode() == "und") then. @CodeCat Shouldn't it just be if (sc:getCode() == "Latn" or lang:getCode() == "und") then? DTLHS (talk) 20:09, 15 August 2016 (UTC)
That seems to have been a mistake. I fixed it. --WikiTiki89 20:11, 15 August 2016 (UTC)
Perfect, thanks. I'm glad to know it was an accident rather than another template change I have to adjust to.... Andrew Sheedy (talk) 20:17, 15 August 2016 (UTC)
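The difference between the two conditions quoted from Module:usex above can be checked by enumerating cases. What follows is a minimal Python sketch, not the module's actual Lua code: the function names and the sample script and language codes ("Cyrl", "en") are hypothetical, and nothing is assumed about what the guarded branch does in the module.

```python
from itertools import product


def original_condition(quote, script_code, lang_code):
    # Mirrors: if quote and (sc:getCode() == "Latn" or lang:getCode() == "und")
    return quote and (script_code == "Latn" or lang_code == "und")


def suggested_condition(quote, script_code, lang_code):
    # Mirrors: if (sc:getCode() == "Latn" or lang:getCode() == "und")
    # The quote flag is deliberately ignored here.
    return script_code == "Latn" or lang_code == "und"


def differing_cases():
    # Enumerate every combination of the (hypothetical) sample inputs and
    # keep only those where the two conditions disagree.
    cases = product([True, False], ["Latn", "Cyrl"], ["en", "und"])
    return [c for c in cases if original_condition(*c) != suggested_condition(*c)]


if __name__ == "__main__":
    for quote, script_code, lang_code in differing_cases():
        print(quote, script_code, lang_code)
```

Under these assumptions the two conditions disagree only when the quote flag is false, which is consistent with the change being noticed in usage examples rather than in quotations.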
It's pretty standard actually not to use a template for a citation. Not sure why. Perhaps no template has ever exceeded simply doing it by hand. Renard Migrant (talk) 20:27, 15 August 2016 (UTC)
It should never have been standard. DTLHS (talk) 20:31, 15 August 2016 (UTC)
The quotation templates have been greatly improved over the past few months. --WikiTiki89 20:37, 15 August 2016 (UTC)
:) SMUconlaw (talk) 09:10, 16 August 2016 (UTC)
@Smuconlaw: I don't know if you've gotten one, but you deserve a very big thanks for that! It was something I always thought about doing but the templates were such a mess I was too afraid to go near them. --WikiTiki89 14:58, 16 August 2016 (UTC)
Awww, shucks! No problem, it was quite interesting working on those templates. — SMUconlaw (talk) 15:42, 16 August 2016 (UTC)

Templates in Category:Quotation reference templates should use {{quote-book}} et al when possible

I'm hoping there's agreement on this. Some of the templates have extra parameters that may not fit elegantly. It will be some work but there aren't too many templates to convert. DTLHS (talk) 21:27, 15 August 2016 (UTC)

Is this applicable to reference websites? DCDuring TALK 21:56, 15 August 2016 (UTC)
What do you mean? DTLHS (talk) 21:57, 15 August 2016 (UTC)
  • Support: I'm in favour as this would standardize the formatting of quotations. — SMUconlaw (talk) 09:10, 16 August 2016 (UTC)

Extended flexibility vote

FYI, I extended Wiktionary:Votes/pl-2016-07/Editing "Flexibility" by 1 month per request. --Daniel Carrero (talk) 00:44, 16 August 2016 (UTC)

I request that it be closed as per its original creation page. DCDuring TALK 02:30, 16 August 2016 (UTC)
Three people, including myself, supported the extension in the #Decision section in the vote. --Daniel Carrero (talk) 02:46, 17 August 2016 (UTC)

Russian combining forms like -бавить or -ключить

I created a number of entries visible in CAT:Russian verbal combining forms. These are verbs where the base verb is missing but various prefixed derived verbs exist, and I want to create an entry for the base verb for use in etymologies and such. CodeCat (talk • contribs) didn't like the term "combining forms". What do others think? Benwing2 (talk) 09:13, 16 August 2016 (UTC)

I have no strong opinion on this. The categories are useful but I haven't seen similar examples for naming them. --Anatoli T. (обсудить/вклад) 10:25, 16 August 2016 (UTC)
For the L3 header, I'd just call it a verb. We don't seem to have entries for parallel things in English like -ceive, but I did make an entry for Old Irish ·icc and call it a verb. —Aɴɢʀ (talk) 14:36, 16 August 2016 (UTC)
What would make the most sense would be to call them reconstructions: *бавить ‎(*bavitʹ), *ключить ‎(*ključitʹ). But for some reason I don't like that idea. I also don't like the hyphens in -бавить ‎(-bavitʹ) and -ключить ‎(-ključitʹ). I would say we should do what Angr did with ·icc and put them at бавить ‎(bavitʹ) and ключить ‎(ključitʹ). --WikiTiki89 15:01, 16 August 2016 (UTC)
Would they survive an RFV? —CodeCat 15:31, 16 August 2016 (UTC)
It would be an RFD question, because the claim would be that they are attested as part of their derivations. --WikiTiki89 15:49, 16 August 2016 (UTC)
That would make them like ceive or ject, which I doubt would survive. RFV demands attestation of the lemma itself, it doesn't allow for such exceptions as far as I know. —CodeCat 16:05, 16 August 2016 (UTC)
Adding a hyphen onto them doesn't suddenly make them any more or less attestable than they were before. This issue is about what the entry name should be and not about attestation. For some languages, like Arabic, we don't indicate prefixes and suffixes with any sort of hyphen in entry titles. For Sanskrit, all our noun lemmas are actually suffixless stems that don't really exist on their own. This isn't much different. --WikiTiki89 16:27, 16 August 2016 (UTC)
-ceive and -ject aren't quite parallel to -бавить and -ключить because (among other reasons) all words containing the former morphemes were borrowed from French and Latin with those morphemes in them, similar to Russian -инг. And we do have entries for things like Latin -bulum that can't be attested on their own so I don't see how the RFV issue applies. Benwing2 (talk) 00:10, 17 August 2016 (UTC)
What about the case of verdwijnen? There is no verb dwijnen, at least not in current Dutch. The point is that we have essentially an unattested verb that it might be desirable to have an entry for. In Latin, we have opted to go for a reconstructed entry for the unattested base verb, as linked in the etymology of abdō, ēdō, dīdō and others. —CodeCat 00:22, 17 August 2016 (UTC)

User:Wyang is edit warring again

I have tried to explain the situation to him on his talk page, but he doesn't seem to want to understand that he can't just change common practice regarding transliteration to suit his own personal tastes. Big changes to common established practice like this need discussion and consensus, and I consider this a big enough change to require a vote, but I am having a hard time getting him to actually do so and wait for consensus. Instead he edit wars over it to try and force his change through, since he thinks he is right, anything is warranted and any opposition is apparently shortsighted and Eurocentric and therefore it's ok to ignore consensus. Can someone else please try explaining it to him and try to get him to stop messing with the modules? The only thing I can do is continue to revert him. Thank you. —CodeCat 02:01, 17 August 2016 (UTC)

It has been very frustrating interacting with User:CodeCat - unreceptiveness to suggestion, poor participation in discussions at the Beer Parlour, blocking wilfully, replying with completely irrelevant comments, and impetuous reverts without any input to the topic at hand. The word being thrown around is consensus, when there is not even one to begin with. I repeatedly asked for consensus for treating romanisation and transliteration as equivalent in Module:links, but User:CodeCat's response is plain simple - evasion, evasion and evasion. Without any clear and thorough discussion showing your edit is consensus, why are you throwing around the consensus as if there is one? If you are not willing to discuss, you should not be making any changes, let alone reverting impetuously. Disappointing that such blatant bullying is condoned. Wyang (talk) 02:11, 17 August 2016 (UTC)

Proposal: "Description" section for symbols

I've been using the Etymology section to place descriptions like these for some symbols.

Proposal: I'd like to use a "Description" section instead.

Rationale:

  • These are descriptions, not etymologies.
  • Maybe this would discourage definitions that are merely the Unicode description of the symbol, which would be a good thing.

Template:editnotice-exotic symbols says: "When creating this entry please make sure you give the symbol a proper definition, preferably with attestation. Mere Unicode code point name does not constitute a definition. Symbol entries without proper definitions may be deleted." Related discussion: Wiktionary:Beer parlour/2015/January#Is documenting all Unicode characters within the scope of Wiktionary?.

If someone creates an entry like that, (using the code point name as the definition, I mean) I was hoping we would be able to say: "The definition is not the place to describe the symbol, use the Description section instead. The definitions are for real meanings that can be attested."

Note:

As I said, my idea is to use the Description section for symbols like 💾 and the others above, but if we agree about allowing the section for some symbols, it raises the question of whether common letters, numbers, punctuation, etc. as well should have a Description, too. I'm not sure about whether they should. I'm leaning towards allowing it, but I'd like to know what others have to say. I thought of a few examples for consideration:

  • A = "An upside-down V (two symmetrically opposed diagonal lines meeting at the top-middle point) with a horizontal line in the middle, from one diagonal line to the other. (also mention about the appearance of "A" in handwriting)"
  • ! = "A dot below a vertical line."
  • + = "A vertical and a horizontal line, crossing in the middle."
  • ¨ = "Two horizontally-aligned dots, to be placed above a letter."

I think there may be reasons not to want a "Description" section for all characters of all scripts. Han compounds like "秋 = compound of 禾 ‎+ 火" are real etymologies and descriptions too. Correct me if I'm wrong, but for Han compounds, I believe Etymology is enough and they don't need a "Description" section. Other scripts might have other considerations. --Daniel Carrero (talk) 03:02, 17 August 2016 (UTC)

  • Agree I think that describing symbols and how they may appear "in the wild" with actual usage is a valuable resource. I don't know that we need to describe "A"--if you can read English, you already will recognize this character. —Justin (koavf)TCM 05:44, 17 August 2016 (UTC)
  • Unsure. This would add a special case/section just for unicode characters and might be confusing. The entry layout is already complicated enough. I suspect that "Description" might get used outside of the unicode/symbol context. – Jberkel (talk) 11:19, 17 August 2016 (UTC)
"An upside-down V" would be a terrible way to describe something that wasn't actually derived from the letter V. Very misleading. Equinox 15:29, 17 August 2016 (UTC)
Fair enough. --Daniel Carrero (talk) 22:28, 19 August 2016 (UTC)
IMO "a V-shape" or "a V-like shape" is fine, though. I think "an upside-down V-shape" or "the shape of an upside-down V" would probably be OK. - -sche (discuss) 01:10, 20 August 2016 (UTC)

Is it better to put in usage note instead? --Octahedron80 (talk) 06:11, 18 August 2016 (UTC)

Maybe not, it would sound odd to me. Concerning "⚤", the text "Interlocked male and female symbol." is not how to use the symbol. It is how to draw the symbol, or what to expect in Unicode fonts.
I thought of having a separate Description section, also because it is a repeated, specific type of information that many symbols would have. The Usage notes section is for miscellaneous usage information. --Daniel Carrero (talk) 18:33, 20 August 2016 (UTC)
Maybe the Description section would be useful for someone to know what a certain symbol looks like, without installing the right Unicode font. I'm also willing to consider the hypothesis that a textual description of a symbol or letter would be useful to blind people. Maybe also for creators of fonts, I don't know.
I seem to remember that a certain Unicode character was sometimes depicted as a cross and sometimes depicted as a full church. To me, this sounds like something we should mention somewhere. --Daniel Carrero (talk) 11:36, 21 August 2016 (UTC)

I created Wiktionary:Votes/2016-08/Description. --Daniel Carrero (talk) 16:08, 22 August 2016 (UTC)

I worked closely with a blind person for two years. She used Braille (on paper) and screen-reader software (on the computer) and she had no reason whatsoever to know or care about the shapes of letters, let alone obscure mathematical symbols. Let's not create worthless rubbish for no reason please. Equinox 17:17, 22 August 2016 (UTC)
Thanks for the info. I removed "A separate hypothesis, although unproved, is that it would be useful for blind people to know what the symbol looks like, too." from the vote rationale. --Daniel Carrero (talk) 17:54, 22 August 2016 (UTC)

Image availability

Are some images not available for use in Wiktionary? I tried using this one, which is used in Wikipedia, but couldn't get it to work [14]. DonnanZ (talk) 09:34, 17 August 2016 (UTC)

It's not hosted at the Wikimedia Commons. (Note the lack of "View on Commons" tab at the top of the page) —suzukaze (tc) 09:39, 17 August 2016 (UTC)
That's a shame, it's a great image. Is it possible to rectify that? DonnanZ (talk) 09:51, 17 August 2016 (UTC)
It seems like it's been a candidate to be copied to the Commons since February 2012 (see the Licensing section, which also includes detailed information on the moving process). —suzukaze (tc) 09:56, 17 August 2016 (UTC)
I'm not skilled in doing that. I wouldn't like to try! DonnanZ (talk) 10:05, 17 August 2016 (UTC)
@Donnanz: There is a guide at w:Wikipedia:Moving files to Commons. You already have an account at Commons just by virtue of having one here, so you don't need to do anything new. If this guide is too confusing, let me know by typing {{Ping|Koavf}} and respond here. Thanks for being so eager to learn and help us! —Justin (koavf)TCM 14:04, 17 August 2016 (UTC)

I transferred it. It's now called "File:London United Tramways tram in front of its tram-shed, Kew Road, Richmond, UK - c 1900.jpg". — SMUconlaw (talk) 15:24, 17 August 2016 (UTC)

@Smuconlaw: Ah, wonderful. Thanks a lot (and also to Koavf). The result can be seen at tramshed. DonnanZ (talk) 18:10, 17 August 2016 (UTC)

Quotation questions

I recently added a quotation to mouth-breather, Gamergater, alt-right, and bluestockings. The quotation captures a number of colorful words, so I added it to several entries. The quotation was originally:

  • 2016, Ross Douthat, "A Playboy for President," The New York Times, 14 Aug.
    "But the cultural conflict between these two post-revolutionary styles — between frat guys and feminist bluestockings, Gamergaters and the diversity police, alt-right provocateurs and 'woke' dudebros, the mouthbreathers who poured hate on the all-female 'Ghostbusters' and the tastemakers who pretended it was good — is likely here to stay."

Note two things:

  1. The external hyperlinks are in the original article on the NYT website.
  2. My addition was to what I perceived to be the "main" page for mouth-breather, and intentionally not to mouthbreather or mouthbreathers. I viewed the latter two as secondary because they identified themselves as an alternate spelling and a plural, respectively, had no definition or quotations, and directed the reader to the "main" page.

Equinox and I discussed several things, and it was proposed I bring them here:

  1. Is it appropriate to preserve hyperlinks at all? (I don't believe search engines are affected by such relinks in a wiki article.)
  2. If so, should it only be on the entries where they satisfy two criteria: they are in the original source and they clarify the author's meaning of the entry in question. Here that would mean they would stay for Gamergater and alt-right, but not for mouthbreather or bluestocking.
  3. What page should quotations appear on when alternate spellings are in play?
  4. If the alternate spelling has no entry, should it be created and the quotation moved?

--Flex (talk) 21:54, 17 August 2016 (UTC)

1. You should be using {{quote-news}} with the url parameter. 2. I believe quotations should go with the exact spelling of the word, unless the word is exceptionally hard to attest. DTLHS (talk) 22:04, 17 August 2016 (UTC)
One distinction: regarding the word-form where citations go: while I believe in using the exact form (i.e. mouthbreather rather than mouth breather, if there is no space in the citation — since citations need to attest and provide evidence for a specific form existing), I don't think that has to apply to grammatical inflections. In other words, I think it's okay to put a mouthbreathers (plural) citation at mouthbreather, but not at mouth breather. Equinox 22:15, 17 August 2016 (UTC)
Regarding the preservation of hyperlinks in cited text, my other points when talking with Flex were (i) they are something like formatting (e.g. font colour) and not really vital to showing attestation of the word, (ii) our linking might have annoying semantic repercussions (e.g. Google strengthening its PageRank between the linked text and its target, whereas we don't want to reinforce the writer's opinions — though I believe there are technical ways around this, e.g. nofollow), and (iii) Web links die very frequently anyway, and we often have to remove them from entries (plus dead links often benefit subsequent cybersquatters). Equinox 22:22, 17 August 2016 (UTC)
My responses:
  1. It is a common practice for websites to add hyperlinks to previous articles on their own website and, perhaps less frequently, links to external websites. I'm leaning towards the omission of such links in quotations, particularly since those links may well go dead if not archived in some way. (It would theoretically be possible in some instances to archive them at Archive.org, but indicating the archive URLs would be cumbersome. Consider the following: "But the cultural conflict between these two post-revolutionary styles — between frat guys and feminist bluestockings, Gamergaters [archived from the original on 9 August 2016] and the diversity police, alt-right provocateurs [archived from the original on 15 August 2016] and 'woke' dudebros, [] ")
  2. The quotation should appear under the main lemma, not the variant spelling. (This is the practice adopted by the OED.) It would be easier to gauge the vintage of a word that way, as all the quotations would appear on one page. Thus, an 18th-century quotation containing the obsolete form stowadore should appear together with more modern quotations from the 19th to 21st centuries where the lemma is spelled stevedore.
SMUconlaw (talk) 22:24, 17 August 2016 (UTC)
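For the archive links mentioned in point 1, a Wayback Machine snapshot address is formed from a 14-digit timestamp and the original URL. The sketch below is only an illustration: the helper name and the example URL are hypothetical, midnight is used as a placeholder time of day, and it does not check whether a snapshot actually exists for that date.

```python
from datetime import date


def wayback_url(original_url: str, snapshot: date) -> str:
    # Wayback Machine addresses embed a YYYYMMDDhhmmss timestamp.
    # The time of day is unknown here, so midnight is used as a placeholder.
    stamp = snapshot.strftime("%Y%m%d") + "000000"
    return f"https://web.archive.org/web/{stamp}/{original_url}"


if __name__ == "__main__":
    # Hypothetical article URL; the date matches one of the archive dates above.
    print(wayback_url("http://example.com/article", date(2016, 8, 9)))
```

A citation template could store only the original URL and the archive date and build the archive link on display, which might keep the quoted passage itself free of bracketed archive notes.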
(It seems like Wikimedia wikis have nofollow on by default for external links. —suzukaze (tc) 22:24, 17 August 2016 (UTC))
Smuconlaw: I know it's OED practice, but we need a better rationale than "some experts do it that way". We don't know how they store their data internally; we only see the finished product, the dictionary. If we put all cites at the main form then it becomes very hard to see whether alt forms are supported or not. Equinox 22:27, 17 August 2016 (UTC)
[Edit conflict. I added a reason before seeing your latest comment. — SMUconlaw (talk) 22:29, 17 August 2016 (UTC)]
I think this is more of a user-interface issue. Wouldn't you agree that it makes sense to store citations with their exact attested form internally, but to show them all together to users who view an entry? (This would need further development work.) Equinox 22:32, 17 August 2016 (UTC)
Yes, if this can be technically achieved I see no issue with that. — SMUconlaw (talk) 10:31, 18 August 2016 (UTC)
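The idea of storing citations under their exact attested form while showing them together at the main entry can be sketched as a small data model. This is a hedged Python illustration, not existing Wiktionary code: the Citation class, the LEMMA_OF mapping and both function names are hypothetical, and the sample citation texts are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    attested_form: str  # spelling exactly as it appears in the source
    year: int
    text: str


# Hypothetical mapping from each attested spelling to its main entry.
LEMMA_OF = {
    "mouth-breather": "mouth-breather",
    "mouthbreather": "mouth-breather",
    "mouthbreathers": "mouth-breather",
}


def citations_by_form(citations):
    # Storage view: each citation is filed under the form it actually attests.
    stored = defaultdict(list)
    for citation in citations:
        stored[citation.attested_form].append(citation)
    return dict(stored)


def citations_for_lemma(stored, lemma):
    # Display view: gather every form that maps to the lemma, oldest first.
    merged = [
        citation
        for form, cites in stored.items()
        if LEMMA_OF.get(form) == lemma
        for citation in cites
    ]
    return sorted(merged, key=lambda citation: citation.year)


if __name__ == "__main__":
    stored = citations_by_form([
        Citation("mouthbreathers", 2016, "(placeholder for the 2016 passage)"),
        Citation("mouth-breather", 2001, "(placeholder, hypothetical citation)"),
    ])
    for citation in citations_for_lemma(stored, "mouth-breather"):
        print(citation.year, citation.attested_form)
```

With this layout the storage view still shows which alternative forms have support of their own, while the display view presents a single chronological list at the main entry. Grouping by sense, raised below, would need an extra field on each citation.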
I think quotations should all go at the main form, but citations should be on the citations page of the exact form they exemplify. Andrew Sheedy (talk) 18:39, 22 August 2016 (UTC)
What if the main form has more than one sense? Should all the citations for the alternative form be grouped together even if they have different senses? DTLHS (talk) 22:40, 17 August 2016 (UTC)
I think it's logical (and our current practice, as far as I can tell) for quotations to be grouped by sense. — SMUconlaw (talk) 10:31, 18 August 2016 (UTC)
I think we should remove the hyperlinks, because what we are actually quoting is the durably archived version of the article, i.e. the version printed on paper (that will eventually end up on microfilm in libraries), and that version doesn't have hyperlinks. —Aɴɢʀ (talk) 19:04, 22 August 2016 (UTC)
I disagree. I think we're archiving the text the author actually published. In the case of a purely electronic medium, the links can be very relevant to the author's intent (whom does he consider an exemplar of "alt-right provocateurs"? Sarah Palin? Donald Trump? Richard John Newhouse? Ann Coulter? Specifically, he cites Milo Yiannopoulos.) The links in question are to reputable sources (Time and the NYT) which are less likely to expire than most.
Counter argument to myself: the print edition of this article did not have the hyperlinks, so at least in this case, it seems legitimate to remove them. That doesn't answer the general question of whether they should be included when possible. --Flex (talk) 21:17, 23 August 2016 (UTC)
Yes to linking We definitely should include links--even if they expire sometime, we still have the original citation. In fact, we should include more links to Internet Archive and WebCite as archive links. —Justin (koavf)TCM 21:54, 23 August 2016 (UTC)

I'm having trouble detecting consensus here. What should I do on the two points in question? If this is not the place to look for consensus, where should I go? --Flex (talk) 21:17, 23 August 2016 (UTC)

wheel warring between User:Wyang and User:CodeCat -- not cool

Something has to be done here. I'm not following this issue closely but I did notice that Wyang blocked CodeCat for 1 day for edit warring, when (a) almost certainly Wyang was equally guilty, (b) it is absolutely not OK for an involved admin to block someone they're involved in a dispute with, esp. another admin. I get the feeling both are equally guilty and deserve to be blocked. Wikitiki actually did block Wyang, who somehow managed to unblock himself (??), something else that's definitely not OK. My first instinct is to block Wyang again for his bad behavior, but instead I'm just going to unblock CodeCat since this particular block should not have been put in in the first place. What do others think? Benwing2 (talk) 02:06, 18 August 2016 (UTC)

The main reason this has escalated to this point is that Wyang has shown no willingness to find a consensus for his proposed changes to Wiktionary practice, and my previous call for help on the matter was completely ignored. Since Wyang is also an admin, my ability to enforce rules and common practice is limited, and reverting the contested changes while trying to reason with him is all I can do. Please advise what can be done in the future in dealing with a rogue administrator without making myself a guilty party. —CodeCat 02:46, 18 August 2016 (UTC)
To answer your "??": admins have permission to unblock themselves. Obviously if it gets to that point they should hopefully be trying to generate some consensus with the blocking admin or the community. Equinox 03:44, 18 August 2016 (UTC)

Wheel War- Action Taken

The conflict between User:CodeCat and User:Wyang has gone on long enough. They've been edit warring over an absolutely critical module used by huge numbers of entries. I'm not sure what that's doing to the edit queue- but it can't be good.

Both deserve to be blocked, but that would render them unable to contribute in discussions over the issue. It's also true that their misbehavior has been limited to editing protected modules and blocking each other.

Therefore, I have temporarily desysopped both of them, which will prevent them from editing the modules in question. I intend to restore them in one week, or when this is resolved- whichever comes first.

If edits need to be made to protected modules before then, I would appreciate it if our more knowledgeable admins would make themselves available to help out, perhaps User:Wikitiki89 or User:DTLHS?

I hope we can resolve this conflict quickly and get back to building a dictionary.

I would appreciate your feedback on my actions, since such things should only be done with community consensus.

Thanks! Chuck Entz (talk) 05:49, 18 August 2016 (UTC)

  • I can't think of any other action that would have been more appropriate. SemperBlotto (talk) 06:10, 18 August 2016 (UTC)
I think the desysopping was appropriate. I would even strongly propose that community consensus be obtained before reinstating the tools. CodeCat and Wyang have wheel-warred before, and each has blocked the other at least once, among other questioned actions. CodeCat, Wyang: you two are knowledgeable contributors to our content, and you are valuable contributors to our technical infrastructure, but you've both long (and not necessarily in equal measure) shown a tendency towards using your abilities to implement faits accomplis and get your way on e.g. module and entry layout or on treatment of Chinese, respectively. For instance, although on this page CodeCat calls on Wyang "to find a consensus for his proposed changes to [what she asserts is] Wiktionary practice", mere days ago Benwing called her out for again using her bot to create many new entries inconsistent with our existing entries and practices. Wyang, in turn, has threatened a few times to take his ball and go home if we don't agree with an action or, long ago, the unification of Chinese. These attitudes have driven away other editors; for instance, User:Mkdw just recently left after calling out Wyang's use of admin tools in the BP, while User:Ruakh has been largely inactive since earlier disputes with CodeCat over modules (as noted e.g. here) and the presentation of module errors (then and still now I agreed with CodeCat that module errors should generate a visible error message, but the dispute cost us a knowledgeable technical editor). This particular wheel-war seems especially excessive because the dispute seems to be not over whether there should be an automatic translit feature for complex non-European scripts like Thai, but over where it is most elegant to put that feature. - -sche (discuss) 06:49, 18 August 2016 (UTC)
  • As, what it feels like anyway, the only non-admin reading these discussion boards, I express my consensus and agree with -sche that the stripping should not be time-bound but powers should only be restored when the community is convinced that the issue is done with in such a way that neither will have any incentive to do something which sparks it up again. I also repeat my conviction that no party of an edit war (as defined by me above as a conflict where two reversals of an edit have happened) should have the right to block or unblock any participant. Korn [kʰũːɘ̃n] (talk) 10:14, 18 August 2016 (UTC)
  • I pretty much agree with you on everything (not unusual, by the way). We have here two equally stubborn and overbearing people who have met their match- if it weren't for the stakes and the damage done, it might be satisfying to see both get their comeuppance. As for duration, I was careful to say "I intend", because the week was just an arbitrary time picked out of the air, and I was hoping we could come up with something better. Right now both are responding with stereotyped "talking points" about the failings of the other, which shows both are still dealing with this on a strictly emotional level. The truth is, both are basically right about each other for the most part, but it's irrelevant. We need to come up with a solution that makes sense and that both can live with. Chuck Entz (talk) 14:16, 18 August 2016 (UTC)
    I think the desysopping has to continue until the matter is resolved. As long as there is no sulking, the project will continue to benefit from their contributions. I hope that the project does not suffer from lingering bad feelings once these valued contributors regain their sysop status. DCDuring TALK 13:10, 19 August 2016 (UTC)
  • I support the emergency temporary desysopping of both editors made by Chuck Entz on account of interminable wheel-warring. I believe a bureaucrat is authorized to take such temporary measures to eliminate this kind of wheel-warring, without a vote. --Dan Polansky (talk) 11:44, 21 August 2016 (UTC)

To be honest, I am not expecting any functional input from User:CodeCat regarding the topic at hand, based on her bullying behaviour and unwillingness to engage in discussions in the past few days. Her only argument has been that her edit was based on "consensus", which is obviously nowhere to be found, even when requested again and again.

Treating romanisation as equivalent to transliteration is clearly erroneous (since romanisation = transcription + transliteration), but she keeps reinstating this misinterpretation, with total disregard for the infrastructure of languages which make the distinction between transcription and transliteration on a romanisation level. For example, Module:th-translit does not even describe what it does after her edits, and she is apparently nonchalant about these languages ("It's a misnomer, but that's the way it is.").

This lack of regard for correctness, coupled with her previous heedless deletion of the indispensable code in Module:links (which precipitated all this), are acts of admin sloppiness. Her one-line response of "So, what happens now? Can we please get rid of the Thai code from Module:links now, or do we need some more edit warring?" to my detailed rationales for putting transcription support in the central modules, is exemplification of her uttermost apathy towards the actual topic ("would rather fight not explain") and disrespect to people.

This second episode was perfectly bound to happen, and bound to end tragically, when all that one side of the dispute cares about is "getting rid of the Thai code from Module:links now", even if she has to use "some more edit warring" for that. Yet, there are people cheering for her. Wyang (talk) 10:38, 18 August 2016 (UTC)

I support this action and wish that someone had done something sooner. I called for help above, but nobody responded, so I was very unsure what to do as I didn't feel like I had any options left, and it was all up to me. I'm sad to see that the community only cares when there is edit warring going on but is unwilling to help in solving the problem outside of that. At least now, people's attention is finally here so I can't complain too much.

As far as the dispute goes, I can summarise what I see:

  • Wyang, in principle, believes that transliteration modules should only be used for transliteration in the strict sense: letter-by-letter conversion.
  • Consequently, the Thai transliteration module does literal transliteration, which makes it pretty much useless for Thai.
  • This goes counter to how the term "transliteration" is generally used on Wiktionary; we use the term to refer to transcription, transliteration and romanization in general. Transliteration modules perform all of these functions, and the tr= parameter that is present on many templates is frequently provided with something that is not strictly transliteration, but rather adheres to the Wiktionary usage of the word. Our policies with respect to the use of these parameters and modules are labelled "transliteration" as well, as evidenced by WT:RU TR, WT:EL TR and WT:JA TR for example. None of these transliteration policies describes transliteration in the strict sense (av rather than ay for Greek, ō rather than ou for Japanese, etc.).
  • Because the transliteration module for Thai is useless by Wyang's own choice, Wyang decided that the best way around this was to insert special-purpose code into Module:links, a widely-used general-purpose module, to transliterate Thai correctly by using code present in another module, Module:th.
  • This was disputed by me, arguing that such special language-specific code does not belong in a general purpose module, especially not when it can easily be put into the existing transliteration module and have everything work just fine.
  • User:Wikitiki89, in the last war, did just this: he moved the code over to the transliteration module, where it belongs. This was immediately reverted by Wyang however, and his special code in Module:links reinstated despite it already having been disputed. My efforts to reapply Wikitiki's edits were repeatedly reverted by Wyang.
  • Fast forward to now, when I once again noticed Wyang's special purpose code in Module:links, and got frustrated that the issue was never solved. I therefore once again moved the code to the Thai modules. This again resulted in a revert war.
  • I attempted to explain on Wyang's talk page that in order for his alternative interpretation of transliteration, which involved creating separate modules and infrastructure for transliteration versus transcription/romanization modules, to be accepted, he would have to find a consensus with the community for it and seal it with a vote.
  • Wyang showed no intention of doing this, instead arguing on the merits of his views as if to convince me that separating the two was the right way to go. In my view, this missed the point as it wasn't me he was supposed to convince, but the community at large. Thus, I ignored his arguments and instead tried to focus on stopping him from edit warring and trying to get community consensus first.
  • Wyang refused to create a vote, instead telling me to create a vote for him. Two other editors also called for a vote, and even offered to make one if Wyang didn't. I welcomed this, but nothing has been done in this regard yet, and Wyang continued his edit war, rather than waiting on the outcome of the vote.
  • I called for help on the Beer Parlour regarding the matter, hoping that other users would be better capable of solving the issue and, especially, to stop Wyang from reverting me each time and get him to wait for consensus. This call for help was entirely ignored, and thus the warring continued.

CodeCat 14:21, 18 August 2016 (UTC)

It seems the issue is a bit more complicated than that. Wyang seems to want to have both transliterations and transcriptions for Thai, used in different places. This is something that goes against the status quo and should need a vote before being implemented. Wyang has refused to draft this vote claiming that the consensus among Thai editors is enough. However, this impacts not only Thai editors, but our readers as well who may be confused by having two different romanization systems in different places. As long as Wyang continues to refuse to draft a vote, I don't think we should allow his system to be put in place. My personal opinion is that there should be one default romanization system, whether it be strictly a transliteration, or a transcription, and if it is necessary to use a different system in etymologies, this should either be done manually with tr= parameter, or potentially with a dedicated Thai template that would allow choosing a different automatic romanization. In either case, all the automatic Thai romanization code, both transliteration and transcription, should be located in Module:th-translit. --WikiTiki89 14:35, 18 August 2016 (UTC)
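A minimal sketch of the "one default romanization, manual tr= override" arrangement described in the comment above, written in Python purely for illustration. It is not the Lua used in Module:links or Module:th-translit. The function names, the one-entry lookup table, and the choice of a transcription as the default are all assumptions; the Tibetan example word and its two romanizations are taken from elsewhere in this discussion.

```python
from typing import Callable, Optional

WORD = "བརྒྱད"  # Tibetan example used later in this thread


def choose_romanisation(
    text: str,
    default_romanise: Callable[[str], Optional[str]],
    tr: Optional[str] = None,
) -> Optional[str]:
    """Return the manual tr= value if one was given, else the automatic default."""
    if tr:  # a manually supplied romanisation always wins
        return tr
    return default_romanise(text)


def format_link(
    text: str,
    default_romanise: Callable[[str], Optional[str]],
    tr: Optional[str] = None,
) -> str:
    """Render 'word (romanisation)', or just the word if none is available."""
    rom = choose_romanisation(text, default_romanise, tr)
    return f"{text} ({rom})" if rom else text


# Hypothetical stand-in for the single automatic romaniser (toy data only).
auto = {WORD: "gyaew"}.get

print(format_link(WORD, auto))               # uses the automatic default
print(format_link(WORD, auto, tr="brgyad"))  # manual override, e.g. in an etymology
```

Under this arrangement only one automatic system exists per language, and any place that needs the other romanization has to supply it by hand.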
I'd like Thai to follow the pattern we've already established for Burmese: one automatically generated transliteration system used everywhere outside of Thai entries (translation sections, etymology sections, etc.), and Thai entries with additional transliteration systems (both spelling-based and sound-based). Ideally the automatically generated one should be ISO 11940-2 or at least based on it. —Aɴɢʀ (talk) 15:05, 18 August 2016 (UTC)
@Angr Burmese entries are nowhere near the level of current Thai entries. The current Burmese transliteration is much closer to the spelling, which doesn't help users much with the pronunciation. Ideally, we should have a system created for Thai - with phonetic respellings but for that we need more native knowledge or reliable data available. With Thai, we are luckier - we have native speakers, phonetic respellings from some dictionaries and "Paiboon" or other transcriptions from published dictionaries sometimes can help reverse-engineer the phonetic respelling (for non-natives). I'd like to see the same methods used for Burmese and Tibetan. --Anatoli T. (обсудить/вклад) 02:32, 21 August 2016 (UTC)
Transliteration is not concerned with representing the sounds of the original, only the characters, ideally accurately and unambiguously. (Wikipedia)
Romanization, in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. (Wikipedia)
Transliteration is not the same as romanisation. Romanisation = transliteration (script conversion by spelling) + transcription (script conversion by pronunciation). It is quite embarrassing that we as a dictionary are getting this basic concept wrong in some places, and seem proud of propagating and not rectifying the error. The contrast between transliteration and transcription is a fundamental concept in basic linguistics (and I do not even have a linguistics background). The distinction is strictly adhered to when one discusses romanisation schemes for languages which have a noticeable script-pronunciation discordance, i.e. languages which distinguish between transliteration and transcription on a romanisation level. If one wishes to talk about conversion to the Latin script using any mapping, it is romanisation. There are some places on Wiktionary which have confused these two concepts - for example WT:RU TR - which is precisely why people have complained that "this [i.e. the "translit" system] looks like a mess of transliteration and transcription". If we search for "transliteration of Russian" (or other languages) on Wikipedia, we will quite sensibly be redirected to "romanisation of Russian".
So far the confusion has been largely non-disastrous, since (1) most content has been for Latin-script languages, and (2) for languages which do not use the Latin script and have had romanisation systems devised for them on Wiktionary, the difference in the romanisation outcomes generated by transliteration and transcription is comparatively minimal, for example Russian. Then we deal with languages of the East, which are renowned for keeping the spelling forms used hundreds or thousands of years ago, and therefore have a high script-pronunciation discordance. If we go back to the comparison table of various languages in transliteration and transcription outcomes that I created in June, we can see that transliteration and transcription are two visibly dissimilar modes of romanisation, and this confusion of transliteration and transcription is destined to have disastrous consequences.
People may not be aware of this, but this distinction has been faithfully adhered to when we designed the module infrastructure for Oriental languages, until this incident. From the aforementioned table, we can see that transliteration as a concept is inherently impossible for Chinese and Japanese since there is no script-to-script mapping, and appropriately there is no Module:zh-translit or Module:ja-translit on Wiktionary. What we have in place for Chinese and Japanese is Module:zh/Module:zh-pron and Module:ja/Module:ja-pron which help interpret auxiliary or native phonetic representations of these languages to generate romanisations in a transcriptive manner. On the other hand, transliteration is possible and contrastive with transcription for Tibetan and Burmese, therefore we have Module:bo-translit and Module:my-translit to generate transliterations and Module:bo/Module:bo-pron to generate transcriptions. For Thai, the distinction has also been faithfully observed until the incident: we have Module:th-translit which deals with transliteration, and Module:th/Module:th-pron dealing with transcription of Thai. It has been customary practice to devise module infrastructure for languages observing the transcription-transliteration distinction prudently, and this includes naming modules the way they should be named, to avoid future complications.
Why do we have to be prudent in devising the module infrastructure for these languages, and what are the complications of imprudent and misnomeric handling of languages observing this transcription-transliteration distinction? I explained this in detail in the previous discussions (discussion 1, discussion 2). In short, the two divisions of romanisation (transliteration and transcription) symbolise the two polarising ends of the romanisation spectrum in a dictionary-building context:
etymology (transliteration) ———— pronunciation (transcription).
The reasons we use romanisations on Wiktionary are different in various parts of the project. In translation sections, the purpose of romanisations is to inform readers how the word in another language is pronounced. In etymological comparisons, the purpose of romanisations is to inform readers how the term is spelled, i.e. how it used to be pronounced. Among languages which observe the transliteration-transcription distinction, there is variation in how acceptable it is to approximate one romanisation with the other and use it in all places. This really needs to be decided on a language-by-language basis. For some languages (e.g. Tibetan, Burmese), it is not advisable to use one mode of romanisation in all places. Again using the Tibetan example of བརྒྱད (transliteration: brgyad; transcription: gyaew): It makes no sense to say:
བརྒྱད (gyaew) is a cognate of Old Chinese (OC *preːd)
when the word is actually spelt as brgyad; and it similarly makes no sense to put:
བརྒྱད (brgyad)
as the Tibetan translation of eight. The next day after the faithful Wiktionary user downloads our app, he/she is found at a Lhasa stall, trying to bargain by whipping out their phone and awkwardly pronouncing /brgjad/.
The transcription-transliteration distinction is a pan-linguistic phenomenon not just limited to Thai lacking support in the core module system, and more consideration and acknowledgement that many script-pronunciation discordant languages use two methods of romanisation needs to go into the infrastructure. A system is never perfect (e.g. Module:links and Module:translations already contain language-specific adaptations), but changes are gradual and have to be initiated at the correct end. A step in the wrong direction may precipitate amplified counterproductivity before eventual rectification takes place, like the misinterpretation of transliteration here. Wyang (talk) 00:41, 19 August 2016 (UTC)
@Angr: ISO 11940-2 is a transcription system. Wyang (talk) 00:43, 19 August 2016 (UTC)
@Wyang: You're starting to sound like a broken record. We all understand that in linguistics there is a distinction between transliteration and transcription, both of which can be called romanization (when this is done into the Latin alphabet). But the issue here is not of terminology. Yes, we use transliteration incorrectly according to the linguistic definition (although the common non-linguistic definition would include phonetic transcriptions as transliterations), but if we "corrected" ourselves and replaced the word transliteration with romanization everywhere that we misuse it in our templates, modules, "about" pages, etc., you still would not be happy. Why? Because your problem is that you want to use two different automatic romanization systems (one a "transcription" and the other a "transliteration"), when our templates only support using one automatic romanization system. So let's talk about that issue and not the terminology. --WikiTiki89 01:09, 19 August 2016 (UTC)
Well, I have to sound like a broken record because much of this has already been said two months ago in discussions poorly tended to by User:CodeCat (aside from the one-liner). The core issue is the confusion of transcription and transliteration by people who designed Module:links and the consequent awkwardness in its support for transcriptions. If transliteration = romanisation = transliteration + transcription in the central infrastructure, then where does transcription fit? It merely becomes a transliteration2, which it is not and should instead be contrasted with. The technical side is easy to fix - the shorthand "tr" is perfect already. We only need to store the transliteration and transcription modules as separate in language_data, and turn on the transcription modules at the appropriate point. For example, my revision at Module:links. Obviously there are more rigorous ways, but the approach has to be central to start with, not by confusing this concept even further in languages which truly distinguish them. Wyang (talk) 01:30, 19 August 2016 (UTC)
Ok, so let's say that we do this. Now, which of the two modules is called when our modules need a romanization to be auto-generated? And what use is the module that does not get called?
Also, aside from all this, why have you never thrown up a BP discussion or vote to discuss this proposal? Why did you edit war to put it in place instead? —CodeCat 01:37, 19 August 2016 (UTC)
Personally I think it would be hella confusing to display brgyad in one place and gyaew in another place when referring to the same word. If we are to make a systematic distinction between transliteration (in the proper sense) and transcription, we should include both forms consistently. Perhaps we write བརྒྱད (brgyad ・gyaew) where the dot in the middle links to a page explaining what the two romanizations mean. Mind you, I'm not convinced it's worth the trouble, but if we are to do it, something like this would be the way. Benwing2 (talk) 01:50, 19 August 2016 (UTC)
And in fact that suggestion is already possible without Wyang's changes to Module:links. --WikiTiki89 01:53, 19 August 2016 (UTC)
"བརྒྱད (trlit. brgyad; trscr. gyaew)"? (a bit more explicit; the blue dot in headwords is not terribly intuitive IMO) —suzukaze (tc) 02:08, 19 August 2016 (UTC)
This is fine with me, and I agree is more intuitive. Benwing2 (talk) 06:01, 19 August 2016 (UTC)
CodeCat, all of this was in the original discussions (discussion 1, discussion 2). "Why did you edit war to put it in place instead" - this is an irresponsible and unnecessary accusation. Please have a look at the page history of Module:links; the first revert was your heedless revert which paralysed the Thai entries.
བརྒྱད (brgyad ・gyaew) in translations is too confusing for newcomers. The technical support for transcriptions is not difficult to put in place. A simple parallel function of Language:transcribe can be added in Module:languages. This function can be called by Module:links (i.e. to turn on transcription support) unconditionally for language A, or conditionally for language B (e.g. only when Module:links is called by Module:translations, or unless Module:links is called by Module:etymology). Wyang (talk) 04:26, 19 August 2016 (UTC)
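The dispatch proposed in the comment above can be sketched roughly as follows. This is a Python illustration only, not the actual Lua of Module:languages or Module:links. The class layout, the `context` strings, the `transcription_contexts` field, and the toy lookup tables are hypothetical; only the idea of a `transcribe` function parallel to transliteration, and the brgyad/gyaew example, come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

WORD = "བརྒྱད"

# Toy stand-ins for per-language romanisation modules (hypothetical data;
# real modules would compute these from the script).
TRANSLIT = {WORD: "brgyad"}  # spelling-based
TRANSCR = {WORD: "gyaew"}    # pronunciation-based


@dataclass
class Language:
    code: str
    translit: Optional[Callable[[str], Optional[str]]] = None
    transcr: Optional[Callable[[str], Optional[str]]] = None
    # Calling contexts in which transcription is preferred; "*" means everywhere.
    transcription_contexts: frozenset = field(default_factory=frozenset)

    def transliterate(self, text: str) -> Optional[str]:
        return self.translit(text) if self.translit else None

    def transcribe(self, text: str) -> Optional[str]:
        return self.transcr(text) if self.transcr else None


def romanise(lang: Language, text: str, context: str) -> Optional[str]:
    """Pick a romanisation for a link, depending on who is calling."""
    wants_transcription = (
        "*" in lang.transcription_contexts
        or context in lang.transcription_contexts
    )
    if wants_transcription:
        result = lang.transcribe(text)
        if result is not None:
            return result
    # Fall back to transliteration when transcription is off or unavailable.
    return lang.transliterate(text)


tibetan = Language(
    code="bo",
    translit=TRANSLIT.get,
    transcr=TRANSCR.get,
    transcription_contexts=frozenset({"translations"}),
)

print(romanise(tibetan, WORD, "translations"))
print(romanise(tibetan, WORD, "etymology"))
```

A language that should always use transcription would set `transcription_contexts` to `{"*"}`; one without a transcription module simply falls through to transliteration.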
What about suzukaze's suggestion? Do you still think it's too confusing for newcomers? IMO displaying different romanizations in different places is far more confusing than displaying both and I would be strongly against that. Benwing2 (talk) 06:01, 19 August 2016 (UTC)
What about བརྒྱད (gyaew [brgyad])? We already do this for Akkadian, for example: 𒆍𒀭𒊏𒆠 ‎(bābili [KA2.DINGIR.RAKI]). (Although I don't understand why gyaew is even needed, none of the IPA transcriptions at the page look anything like it; they all look much more like brgyad.) --WikiTiki89 09:20, 19 August 2016 (UTC)
gyaew is the Lhasa pronunciation: gy /c/, ae /ɛ/, w /˩˧˨/. Frankly, I would be quite confused by the Akkadian word if I saw it in translations (I still am after reading the entry, especially the etymology). It may be less unsatisfactory for Akkadian, as people may be less interested in the spoken aspects of a dead language. I don't think putting transliteration in translations is a good idea for any of the non-small living languages with a high level of script-language discordance. {{bo-pron}} has more examples of transliteration-transcription correspondences in Tibetan. Wyang (talk) 09:40, 19 August 2016 (UTC)
The fact that you are confused by our Akkadian romanizations is not really a problem. We shouldn't necessarily expect people to automatically understand these things. We need to have appendix pages explaining our romanization scheme for each language, just like any other dictionary would do. Such an appendix would explain to you that bābili is the transcription and KA2.DINGIR.RAKI are the names of each character in the word, named by their usual phonetic value, with capital letters indicating Sumerian logograms (Sumerograms; kind of like Kanji) and superscript indicating determinatives. --WikiTiki89 12:26, 19 August 2016 (UTC)
I would love to read some statistics regarding the traffic of our help pages - I have always been under the impression that very few people are able to navigate to our Wiktionary:About... pages, since we do not have an obvious or subtle link on the entry itself linking to the language help page. We do not have a "translate!" tool alongside the search box that helps a reader check if translations of word A in language X exist (i.e. a simple interface with two fields "word" and "language" (dropdown by #speakers), which parses through the content of the entry A to see if it has the translation of any sense in language X), and prompt the user to suggest that we add this translation if there is none. We also do not have a fuzzy search function, or a reverse transcription search, and many other things. Personally, the reason I look up translations is because I want to know how to say the equivalent in another language. Like the common phrase "How do you say ... in the ... language?", not "How do you spell?". I imagine most readers are expecting to find out the pronunciation of a foreign non-Latin-script word on the translation page itself, which is why I'm suggesting simple, straight-to-the-point phonetic transcriptions inside translation boxes. Wyang (talk) 13:07, 19 August 2016 (UTC)
I'm all for making the "about" pages more easily accessible. As for pronunciation, you're supposed to click on the entry and not simply look at the table. The entry should have all the pronunciation information. Someone unfamiliar with Tibetan will not know how to pronounce gyaew anyway. Someone who knows a little bit about Tibetan would realize that the word might not be pronounced brgyad in Lhasa and click on the entry for further pronunciation information. I don't know why you're bringing up search features, they do not seem relevant to this discussion. --WikiTiki89 13:29, 19 August 2016 (UTC)
Yes, people are supposed to click on them, but people (especially casual visitors) often don't. People may not know how to pronounce gyaew initially, but if the display in translations consistently uses transcription and people are pointed to the correct help page, they are more likely to become regular users and use the translation functionality more frequently. The point about the search features was to lament that our user friendliness is (excuse me) crap... and yet, we are here arguing whether or not we should give support to transcriptions which prominently contrast with transliterations in many languages, and whether or not it is worthy to improve user experience with more consideration. Wyang (talk) 14:24, 19 August 2016 (UTC)
Giving a user a piece of unexplained information without a link or even a name for that information, thus effectively blocking the user from figuring out what that information is, is a problem. Because that means you have not given the user any information at all, you just blurted some nonsensical text. Korn [kʰũːɘ̃n] (talk) 15:11, 19 August 2016 (UTC)
Perhaps all of our transliterations should automatically link to a description page, like this: обезья́на (obezʹjána (key), “monkey”)? --WikiTiki89 15:53, 19 August 2016 (UTC)
That's actually how I handle it for Middle Low German grammar. Though I'm not sure it needs to happen for plain transliteration, which should be more intuitive than Sumerograms. In case of doubt, better safe than sorry, though. Korn [kʰũːɘ̃n] (talk) 16:08, 19 August 2016 (UTC)
The only downside to that idea is that it puts too much emphasis on the transliteration, rather than on the word itself. Another idea I've always contemplated was to just get rid of all transliterations in links and have them only in entries and etymologies, but that's a very radical change. Another idea I just had is what if we have links to transliteration keys after the language name in translation tables and at the top of each language header. --WikiTiki89 16:43, 19 August 2016 (UTC)
What if we just made the transliterations themselves the links to the keys? (Languages where the transliterations have entries (e.g. Gothic) could continue to link to those entries, since they contain, or link to the main entries which contain, much the same information as the key would.) - -sche (discuss) 17:15, 19 August 2016 (UTC)
I did consider that. My first thought is that it would look weird for all transliterations to be colored as links. Also, would the reader know what he would get from clicking the transliteration? But maybe it's not such a bad idea. We should limit this to link templates, though. Usage examples and other such things probably don't need to have their transliterations linked. --WikiTiki89 17:54, 19 August 2016 (UTC)
Strong oppose for any move to remove transliterations / romanizations from links. That would greatly reduce the usability of all Japanese entries. ‑‑ Eiríkr Útlendi │Tala við mig 00:42, 20 August 2016 (UTC)
A lot of online dictionaries have significantly better interfaces than us. Some use hover over for all links to show a sneak peek of the linked-to entry; examples are Moedict, CantoDict, Thai-language. These are all impressive tricks which we can potentially implement to greatly improve the user experience. The link in translations can be turned into a hover-over link which previews the pronunciation and first sense of the term, and on mobiles it can be a simple link with transcription following it in parentheses. The point is we need to suitably name and record our utility modules, so that we can easily call on them and not come to the realisation we have mixed up all the transliteration and transcription modules when there is a need to use transcriptions. Wyang (talk) 00:46, 20 August 2016 (UTC)
But at what point has there ever been a need to choose between them or display them both? If we have both a transcription and a transliteration module, would they ever both be used for anything? —CodeCat 01:10, 20 August 2016 (UTC)
In translations. The purpose of having romanisations in translation sections is to inform readers how to say something in another language. Transcription modules, if they exist, should be preferentially called upon when romanising terms in translation sections. Wyang (talk) 02:08, 20 August 2016 (UTC)
I think presenting both romanisations simultaneously in translations is confusing - readers are unlikely to understand the difference between transliteration and transcription, or between Wylie transliteration and Tibetan Pinyin. I would prefer presenting the information in the entry itself, and presenting only what is necessary in translations, e.g. བརྒྱད (pr. gyaew). Wyang (talk) 09:13, 19 August 2016 (UTC)
You can always make the words give a one line explanation of the difference on hover over. Korn [kʰũːɘ̃n] (talk) 09:37, 19 August 2016 (UTC)
We have to be careful with using hover over though - it does not seem to be well-supported on mobile devices. Wyang (talk) 10:01, 19 August 2016 (UTC)
  • I'd be fine with making the de-sysopping of CodeCat permanent. This is the latest in a series of abuses of the tools, ranging from bad blocks to making major changes without community consensus. Purplebackpack89 18:59, 18 August 2016 (UTC)
IMO, any action like this needs to be by formal vote. (Note that there was already a vote to desysop CodeCat, which failed.) Benwing2 (talk) 20:22, 18 August 2016 (UTC)
There should be no double standard. Either both Wyang and CodeCat have their sysop powers restored upon resolution of this problem, or they both have to reapply and be voted on. I do not understand, however, why CodeCat's edits are no longer autopatrolled. That should be fixed as soon as possible. —Aɴɢʀ (talk) 21:55, 18 August 2016 (UTC)
I overlooked that detail. Fixed. Chuck Entz (talk) 02:11, 19 August 2016 (UTC)
I can always trust you to make everything be about you and your grievances, no matter the subject. That type of attitude is a large part of what caused this mess in the first place- we need less of it, not more. Chuck Entz (talk) 02:11, 19 August 2016 (UTC)
I oppose CodeCat's recent desysop. First off, where is the formal vote? Second of all, I've not really had any problems with her. I think her intentions really are good, but she may have made a mistake, just like all of us have. Jeez if I had a penny for every mistake I've made on the internet, I'd have like 10 bucks (which is a lot of pennies!). I feel like it's only if a person continues to make such mistakes somewhat consistently over a long period of time, or do something really bad (like delete the main page, for instance), that they should be desysopped because of behavior. I'd be willing to put up a vote to get her resysopped (hey look I made up a new word!) if necessary. Philmonte101 (talk) 22:33, 18 August 2016 (UTC)
A vote would be required for a permanent desysop, but in this case, the desysopping was temporary in order to stop an ongoing edit war. Normal users would have received a temporary block for this, but admins can unblock themselves, making such a block useless if the admin is determined to circumvent it (and both of them did so in this case, before they were desysopped). Thus, I think the temporary desysopping was justified. --WikiTiki89 23:36, 18 August 2016 (UTC)
It seems like you completely misunderstand the entire situation. The desysop was the emergency countermeasure to a serious edit war, not "a mistake". —suzukaze (tc) 01:09, 19 August 2016 (UTC)

This topic must not die again. How are we going to set up the transcription/transliteration infrastructure? —suzukaze (tc) 00:39, 21 August 2016 (UTC)

Agreed. I think most people are in agreement that the status quo of one single transliteration is OK, and it's also OK to display two transcriptions/transliterations for languages like Tibetan and Burmese where the pronunciation and written script are far from each other and where the written form carries important etymological information that's missing from the modern pronunciation. This potentially could be done for Thai and Khmer as well although here I think it's less useful, as the difference between the two isn't so much, and the extra information in the written form is mostly only present in Sanskrit loanwords, which are fairly unproblematic etymologically. The main issue here is that Wyang disagrees and wants to impose a system where we show transcriptions in some places and transliterations in others, but I think pretty much everyone else is opposed to this so it won't fly. We could vote on this but Wyang has to be willing to accept the result, since he seems to be the main one who would implement it. Benwing2 (talk) 01:07, 21 August 2016 (UTC)
My main points were: 1) transliteration and transcription should not be confused; 2) for languages which can both be romanised with transliterations and transcriptions, the functional modules should be distinguished and named appropriately; 3) using multiple romanisations is very confusing in translations and readers will not understand the difference; and 4) translation sections should use transcriptions to romanise terms, if transcriptions are contrastive with transliterations. I do not believe I am the only one who is in favour of this. Discussions should involve effective argumentation, not by merely accusing others of being outlandish. Wyang (talk) 01:36, 21 August 2016 (UTC)
Eh, I find it reasonable to display only the relevant romanization to reduce clutter. The entry itself could show which is a transcription and which is a transliteration. —suzukaze (tc) 01:55, 21 August 2016 (UTC)
I prefer to see transcriptions as is currently done by the Thai module. Transliterations or symbol sequence can still be found in Thai entries. --Anatoli T. (обсудить/вклад) 02:32, 21 August 2016 (UTC)
I suggest recording the transcription modules in language_data, creating a parallel Language:transcribe function in Module:languages, and making Module:links call on this function unconditionally or conditionally for certain languages. Wyang (talk) 01:26, 21 August 2016 (UTC)
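As a rough illustration of the proposal above, here is a minimal Python sketch (the real modules are written in Lua, so this is only a model). It assumes a per-language record with a hypothetical `transcr_module` field alongside `translit_module`, a `transcribe` method parallel to `transliterate`, and a hypothetical `link_romanisation` helper that calls it conditionally. The lookup tables are toy stand-ins for real modules, filled with values quoted in this discussion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A romaniser maps native-script text to Latin script (None if unknown).
Romaniser = Callable[[str], Optional[str]]

# Toy lookup tables standing in for real modules (assumption: real modules
# compute these values; here they are simply looked up).
WYLIE: Dict[str, str] = {"བརྒྱད": "brgyad", "རྣ་བ": "rna ba"}
TIBETAN_PINYIN: Dict[str, str] = {"བརྒྱད": "gyaew", "རྣ་བ": "naf-waf"}
GREEK: Dict[str, str] = {"Μπούρμα": "Boúrma"}


@dataclass
class Language:
    code: str
    translit_module: Optional[Romaniser] = None
    transcr_module: Optional[Romaniser] = None  # hypothetical new field

    def transliterate(self, text: str) -> Optional[str]:
        """Letter-based romanisation, if a module is recorded."""
        return self.translit_module(text) if self.translit_module else None

    def transcribe(self, text: str) -> Optional[str]:
        """Pronunciation-based romanisation; falls back to transliterate()
        for languages that record only one module."""
        if self.transcr_module:
            return self.transcr_module(text)
        return self.transliterate(text)


# Stand-in for the language data table.
LANGUAGES: Dict[str, Language] = {
    "bo": Language("bo", translit_module=WYLIE.get,
                   transcr_module=TIBETAN_PINYIN.get),
    "el": Language("el", translit_module=GREEK.get),
}


def link_romanisation(lang: Language, text: str,
                      prefer_transcription: bool = False) -> Optional[str]:
    """Hypothetical link helper: picks the transcription only when asked to."""
    if prefer_transcription:
        return lang.transcribe(text)
    return lang.transliterate(text)
```

With this shape, languages that record a single module behave exactly as before, and only languages that record both can show different output depending on the caller.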
Thanks for summarizing your position. Benwing2 (talk) 01:41, 21 August 2016 (UTC)
Note that you haven't answered whether you will accept the community's consensus if it goes against yours. Benwing2 (talk) 01:41, 21 August 2016 (UTC)
Fine. Bye bye. Wyang (talk) 01:42, 21 August 2016 (UTC)
Christ. I was trying to play mediator but seem not to have been successful. Wyang, I do hope you will reconsider. No one wants to see you leave. Benwing2 (talk) 03:53, 21 August 2016 (UTC)
Repeatedly using imagined “consensus” (your opinion) as majority tyranny to intimidate others is hardly mediation. I am perplexed how the above discussion could be interpreted as me spewing out nonsense and needing to be brought under control. I elaborated my various points in the discussion and there isn't really any opposing argument regarding either using transcriptions in translations or separating transliteration and transcription utilities for certain languages. Then there was your “summary” which identified the need to smother me without providing any counterarguments whatsoever. To my technical proposal, instead of commenting on the feasibility or reasonableness of this, you again tried to smother me by labelling whatever you believe in as “consensus” and coercing me to accept it. This is opposing for the sake of opposing, without bringing in any intelligent arguments to the discussion. This is bullying. How is བཀྲ་ཤིས་བདེ་ལེགས (zhacf-xih-dev-leh [bkra shis bde legs]) not confusing as the Tibetan translation of hello? It is frustrating to try to have people think sensibly and analytically about topics with the future in mind on Wiktionary. Look at how long it took for the community to come to senses with the Chinese merger and now this; time and time again, it is regression led by the unfamiliar majority, without critically analysing proposals for what they are. Wyang (talk) 02:16, 22 August 2016 (UTC)
Wyang, I am sorry things have gotten to the point that you think I am smothering you, bullying you, tyrannizing you, etc. It was not my intention to do any of these things, and I apologize for giving the wrong impression. How about we simply hold a vote on what is the best way to handle this? This is the Wiktionary way of doing things, and will more clearly reveal the consensus. Are you willing to lead that vote? Benwing2 (talk) 03:12, 22 August 2016 (UTC)
Thank you. Wikipedia:Polling is not a substitute for discussion; Wikipedia:What_Wikipedia_is_not#Wikipedia_is_not_a_democracy has more arguments why decision making should be achieved by discussion and consensus, not votes. It is not sensible to expect voting to be the most suitable method of decision making, when the great majority of eligible voters are uninvolved and perhaps have no prior familiarity with how the transliteration-transcription distinction manifests itself in the romanisation of certain languages. It is akin to believing that User:Wyang will be a responsible voter on the topic of Akkadian romanisation; quite the contrary I have no previous experience with this and any stance I take regarding Akkadian romanisation could be very unwise for the project's future. In this discussion we should be appraising whether the preferential use of transcriptions in translations is favourable over transliterations for certain oriental languages (Tibetan, Burmese, etc.; example below), and consequently whether transliteration and transcription modules should be kept separate for these languages. Wyang (talk) 04:27, 22 August 2016 (UTC)
Romanisation | “eight” | “ear” | “hello”
Transcription | བརྒྱད (pr. gyaew) | རྣ་བ (pr. naf-waf) | བཀྲ་ཤིས་བདེ་ལེགས (pr. zhacf-xih-dev-leh)
Transcription (without tone letters) | བརྒྱད (pr. gyae) | རྣ་བ (pr. na-wa) | བཀྲ་ཤིས་བདེ་ལེགས (pr. zhac-xi-de-le)
Transliteration | བརྒྱད (brgyad) | རྣ་བ (rna ba) | བཀྲ་ཤིས་བདེ་ལེགས (bkra shis bde legs)
A vote is a good way out. The above speaker seems to underestimate the ability of voters to inspect and evaluate evidence and to consider arguments presented. Instead, he seems to commit what looks like an authority fallacy, the erroneous notion that only those already familiar with Thai can make a sound judgment about Thai romanization, be it transcription or "transliteration" narrowly construed.
How consensus, mentioned in the above post, could ever be anything different from the result of a vote is beyond me. Consensus is general agreement, even if not unanimity, and I fail to see how a passing vote could ever show anything other than consensus. --Dan Polansky (talk) 08:02, 22 August 2016 (UTC)
A vote, where the topic is unfamiliar to most of the eligible voters, can easily produce ill-advised consensus (“collective stupidity”). And yes, only those familiar with how the transliteration-transcription distinction manifests itself in the romanisation of certain languages can make a sound judgment about the issue. Calling for a vote is not the way out if that side shows no willingness to engage in discussions and present counterarguments to reach consensus. Cases of collective intelligence are when those familiar with and knowledgeable about the topic critically appraise the arguments for and against the proposal to attempt to reach consensus, not when the proposal is relayed to a vote to see which side is more numerous. Wyang (talk) 10:22, 22 August 2016 (UTC)
You have the option to present "how the transliteration-transcription distinction manifests itself in the romanisation of certain languages". In fact, you have just done that in a table above. I trust most of the voters to consider such presentation in their voting decision. Discussion alone is not a mechanism of decision making; indeed, in this dispute, both parties think they are right and that they have presented the right arguments. Strength of argument is not a mechanism of decision making since there is no simple mechanism to assess strength of argument. --Dan Polansky (talk) 11:13, 22 August 2016 (UTC)
Having information presented can never, ever supplant being knowledgeable about a topic. For instance, if I were provided with a comprehensive overview of the Akkadian language, I would still feel that I am in no position to provide any judgment on Akkadian romanisation. In fact, Wiktionary has witnessed many lessons learnt from having the unfamiliar collectively voice opinions and make decisions. The merger of Chinese is one - it was only adopted in 2014, more than 10 years after the launching of Wiktionary. So much more work could have been done in the meantime, and so much work still remains to be done to rectify the initial step in the wrong direction. The misinterpretation of transliteration is another one. All of these resulted from the lack of intelligent decision making from people who are familiar with and knowledgeable about the topics. Discussion is a mechanism of decision making, arguably the most important one, and I would argue that no decision should be made without substantial discussion. In this dispute, there has been a paucity of argumentation from one side throughout, and a paucity of active discussion of the topics at hand (whether the preferential use of transcriptions in translations is favourable over transliterations for certain oriental languages, and consequently whether transliteration and transcription modules should be kept separate for these languages). Wyang (talk) 12:05, 22 August 2016 (UTC)
If you submit that after being presented the table, I still don't appreciate the difference between letter or character based transliteration and pronunciation based transcription, you are drastically underestimating my capacity to understand very simple things. Nor do I think other readers have failed to appreciate the distinction. Your fallacy is grave. --Dan Polansky (talk) 12:58, 22 August 2016 (UTC)
Basically, you're fighting against any way for other editors to disagree with you. Is there any way Wiktionary editors could decide on a course that you disagree with that you'd accept?--Prosfilaes (talk) 13:43, 22 August 2016 (UTC)
Transliteration / transcription is about providing a Latin-script handle for people who don't read the script. Anywhere but the entry for the word itself we should be using one consistent Latin-script version, and there we can and should provide every transcription/transliteration version now or once in standard use.--Prosfilaes (talk) 09:57, 22 August 2016 (UTC)
Romanisation is indeed about providing a Latin-script handle for people who don't read the script. Nonetheless, the reasons we want to use romanisations are different in various parts of the project. It could be to show how the foreign-script word is pronounced (for example in translation sections), or how it is spelt literally (in etymological comparisons). At the moment, Tibetan is romanised with a transliteration method (Wylie transliteration), which is 100% automatable and is fantastic in etymological comparisons, as it faithfully represents how the word is spelt. However, there is no point showing brgyad as the Tibetan translation of eight, as readers will automatically assume the romanisation in translations is the word's pronunciation and attempt to pronounce it as such when communicating with locals. It makes more sense to simply use transcriptions to inform readers of the pronunciation in translation sections. Wyang (talk) 10:22, 22 August 2016 (UTC)
Romanization is not for showing how words are pronounced. That's what the pronunciation section in the entry is for. If you're using Wiktionary's translation tables as a pronunciation key for communicating in that language, epic fail. In any case, showing བརྒྱད shows six rather different pronunciations; giving me gyaew instead of brgyad doesn't really help me pronounce the word.--Prosfilaes (talk) 13:43, 22 August 2016 (UTC)
Are you sure?! You should then talk to Japanese editors and tell them they should transliterate こんにちは as "konnichiha". They've been doing it wrong all these years! Also, get in touch with some other dictionary publishers and tell them their Korean and Thai transliterations are wrong. --Anatoli T. (обсудить/вклад) 13:49, 22 August 2016 (UTC)
Sarcasm aside, the rest of my statement still stands. If you want to know how a word is pronounced, look at the pronunciation key, not the translation table. Readers who "automatically assume the romanisation in translations is the word's pronunciation" are going to be consistently lost, and I fail to see how gyaew is going to help any reader who doesn't know Tibetan figure out the pronunciation is /cɛʔ¹³²/ or /bɡjat/ or /dʑɛʔ⁵³/ or /dʑed/ or /wɟjal/ or /hdʑal/. I in fact feel that any reader who could use gyaew to derive the correct pronunciation probably knows enough to figure out that brgyad isn't the pronunciation transcription they were looking for.
Romanize as you will, but the value of having one romanization throughout Wiktionary and giving readers a consistent Latin-script name for a word outweighs the value of having different romanizations in different places.--Prosfilaes (talk) 15:53, 22 August 2016 (UTC)
Now that CodeCat and her supporters have successfully driven away Wyang from the project, someone has to take over all the work he has been doing. Congratulations! I am disgusted with the community's reaction to the problem. --Anatoli T. (обсудить/вклад) 02:21, 21 August 2016 (UTC)
Anatoli, what do you think should have been done (or should be done) differently? Benwing2 (talk) 04:17, 21 August 2016 (UTC)
I don't think it's anyone's fault but Wyang's, given that the leaving was in response to "Note that you haven't answered whether you will accept the community's consensus if it goes against yours."--Prosfilaes (talk) 13:07, 21 August 2016 (UTC)
I know the technical subject is not the subject of this thread but anyway: could not the naming disagreement be solved by placing CodeCat code in Module:th-transcr rather than Module:th-translit? Then, the misnomer argument would no longer apply, and other argument against CodeCat's solution would have to be sought. --Dan Polansky (talk) 08:06, 22 August 2016 (UTC)
The code was originally placed in Module:th (function getTranslit). Either Module:th or Module:th-transcript is fine, though either way the transcription module needs to be recorded in addition in Module:links or Module:languages/data2, as translit_module is a misnomeric parameter. Wyang (talk) 10:22, 22 August 2016 (UTC)
A further question: are the modules currently present in Category:Transliteration modules in general transcription modules or are they overwhelmingly transliteration modules in the narrow sense, transcribing on the letter or character level? --Dan Polansky (talk) 08:13, 22 August 2016 (UTC)
Just call it all 'Romanisation modules' and be done with it. Korn [kʰũːɘ̃n] (talk) 10:57, 22 August 2016 (UTC)
I oppose calling those modules "Romanisation modules". There are elements of "transcription" (more or less) in many languages, most of them are standard transliteration. Here are examples of transliterations with elements of transcription, "the translit" shows more graphical transliterations of the same word (the actual spelling):
Arabic: عربى ‎(ʿarabiyy), translit: "ʿrbā", vocalised Arabic: عَرَبِيّ ‎(ʿarabiyy)
Greek: Μπούρμα ‎(Boúrma), translit: "Mpoúrma"
Russian: легкого ‎(ljóxkovo) (phonetic respelling: "лёхково"), translit: "legkogo", spelling with "ё": лёгкого ‎(ljóxkovo)
Japanese: こんにちは ‎(konnichiwa), translit: "konnichiha"
Korean: 십육 ‎(simnyuk) (phonetic respelling: "심뉵"), translit: "sibyuk"
Hindi: फिल्म ‎(film), translit: "philma", spelling with "nuqta": फ़िल्म ‎(film)
One can argue that abjad languages like Arabic, Persian, Urdu, Hebrew, etc. can't be transliterated, but romanisations are still called transliterations. Persian and Urdu are seldom fully vocalised, so their graphical transliterations would be completely useless for someone wanting to know how to pronounce Persian or Urdu words. Some irregularities are handled by transliteration modules; for some terms manual (hard-coded) transliteration is required. If someone accuses Wyang of making up transliterations for Thai, check Paiboon dictionaries for terms like ชาติ ‎(châat) (graphical transliteration: "châa-dtì") and see how these terms are transliterated there. --Anatoli T. (обсудить/вклад) 13:18, 22 August 2016 (UTC)
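The contrast in the Japanese example above can be sketched in a few lines of Python. This is only a toy: the `KANA` table covers just the five characters needed, and the `MANUAL` override table is a hypothetical stand-in for the hard-coded romanisations mentioned above, not how any actual module is written.

```python
# Per-character values for the kana in こんにちは only (toy table).
KANA = {"こ": "ko", "ん": "n", "に": "ni", "ち": "chi", "は": "ha"}

# Hypothetical hard-coded overrides for words whose pronunciation
# departs from the spelling.
MANUAL = {"こんにちは": "konnichiwa"}


def transliterate(word: str) -> str:
    """Graphical transliteration: romanise character by character."""
    return "".join(KANA[ch] for ch in word)


def romanise(word: str) -> str:
    """Romanisation with transcription elements: use a manual override
    when one exists, otherwise fall back to the graphical form."""
    return MANUAL.get(word, transliterate(word))


print(transliterate("こんにちは"))  # konnichiha
print(romanise("こんにちは"))       # konnichiwa
```

The same pattern (regular mapping plus exceptions) is what makes the label "transliteration" only approximately accurate for the languages listed above.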
I see no problem with calling any rendering of a non-Latin word in Latin script a romanisation as a hypernym and only referring to it as a transliteration/transcription specifically when it's important to underline the difference. (Maybe leave a note in the documentation.) If a module is made which does both, as some parties propose, or if only one is reasonable for a language, why not go with an indiscriminating 'romanisation' so you can categorise them all easier and don't have to waste debate time on naming conventions? Korn [kʰũːɘ̃n] (talk) 14:37, 22 August 2016 (UTC)
  • A propos of the voting/consensus matter, I am particularly well qualified to contribute as one of those ignorant of most aspects of the matters under discussion.
The rationales for requiring a consensus of more than those knowledgeable about the languages in question are that it might interfere with the module architecture as currently designed, that the translation tables might become cluttered, and that some users (including those not intending to learn the languages and scripts in question) might be confused/overwhelmed/put-off by the transliteration-transcription distinction.
What makes sense for entries in the languages in question is a matter best left to the contributors in those languages IMO. If our module architecture did not anticipate the need for a transliteration-transcription distinction, then so much the worse for the architecture. We cannot have the module architecture unreasonably preventing contributors from contributing in the manner that is best for the languages in question by their lights. IOW, we should not have the tail wagging the dog. How to apply this principle is left as an exercise to the reader.
I can only beg that the translation table matters do not make the tables cluttered and confusing for all to deliver a questionable benefit to some. DCDuring TALK 12:40, 22 August 2016 (UTC)
@DCDuring I agree! But wait, maybe CodeCat is eager to make a new module for Khmer or Burmese language and apply their "best practice" there? Well, Wyang has started, somebody can make those modules perfect!
Seriously, I perfectly understand Wyang's frustration. He created a WORKING SOLUTION for complex Asian languages that nobody had even attempted before. Now, someone starts changing modules without any discussion with him. I would be very upset if someone tried to change my work without first checking with me. Why do people even think they should both be blocked? How would YOU feel if you were in the same situation? I don't want CodeCat blocked but I think she is absolutely wrong here. Yes, the location of the code can be reviewed and discussed, agreed on first and only THEN changed, if agreement is reached.--Anatoli T. (обсудить/вклад) 13:29, 22 August 2016 (UTC)
They were both blocked/desysopped because they both used their admin powers to continue an edit war. It's not a punishment for being on the wrong side of the argument, it's a method for suppressing disruptive behaviour. Korn [kʰũːɘ̃n] (talk) 14:43, 22 August 2016 (UTC)
@Dan Polansky Dan, can you create a vote to short-circuit endless arguing? The vote should have two choices: (1) Continue the current situation where Module:links enforces the constraint that a single romanization (which may be a two-part transcription/transliteration romanization, on a language-specific basis) is used for all types of links; (2) Modify Module:links to allow different romanizations for different types of links (e.g. etym links vs. translation links). The former is User:CodeCat's position, the latter is User:Wyang's. Set the discussion period and vote start/end dates however you think most appropriate. Benwing2 (talk) 15:34, 22 August 2016 (UTC)
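To make the two options concrete, here is a small Python sketch of how each would pick the romanisation shown in a link. The function names and the `LinkType` enum are hypothetical and do not correspond to anything in Module:links; the two-part display format follows the "transcription [transliteration]" pattern quoted earlier in this thread.

```python
from enum import Enum


class LinkType(Enum):
    ETYMOLOGY = "etymology"
    TRANSLATION = "translation"


def option_1(transcription: str, transliteration: str,
             link_type: LinkType) -> str:
    """Option 1: one romanisation for every link type. Shown here in the
    two-part form; link_type is deliberately ignored."""
    return f"{transcription} [{transliteration}]"


def option_2(transcription: str, transliteration: str,
             link_type: LinkType) -> str:
    """Option 2: the romanisation depends on the kind of link."""
    if link_type is LinkType.TRANSLATION:
        return transcription
    return transliteration


# Tibetan "eight", with values quoted in this discussion.
for kind in LinkType:
    print(kind.value, "|", option_1("gyaew", "brgyad", kind),
          "|", option_2("gyaew", "brgyad", kind))
```

Under option 1 both rows print the same string; under option 2 the translation link shows gyaew and the etymology link shows brgyad.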
I agree with what User:DCDuring said above. My proposal is to keep transliteration and transcription utility modules separate in Module:languages/data2 and similar modules, for languages possessing two contrastive sets of romanisation schemes. Notable examples include Tibetan (Wylie transliteration vs Tibetan Pinyin), Burmese (MLCTS vs BGN-PCGN), Thai (ISO 11940 vs Paiboon) and Korean (Yale vs RR). The rationale is that the module infrastructure should anticipate the need for a transliteration-transcription distinction in certain languages, and not unreasonably prevent contributors of these languages from contributing in a manner that is best for the languages in question by their lights. I am in no position to singlehandedly advocate that language X should use romanisation Y for a certain purpose without meticulous discussion having taken place surrounding language X, which needs to happen in separate language-specific discussions. Still, there is a lack of adequate in-depth discussion concerning the issue, especially of arguments against - why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages? Wyang (talk) 23:56, 22 August 2016 (UTC)
If they are kept separate, then there needs to be a functional reason. The distinction needs to have a consequence in how our modules work and treat each one, and where each one appears. I don't think it particularly desirable to have multiple romanization schemes in different parts of Wiktionary, this just confuses users. The system we have now, with a consistent representation across Wiktionary, is just fine. We don't need two systems when one suffices. —CodeCat 00:13, 23 August 2016 (UTC)
  • The whole point of Wyang's argument is that two systems are already in broader use for certain languages, with each system used for specific purposes. I.e., one romanization scheme doesn't suffice, for certain specific languages. ‑‑ Eiríkr Útlendi │Tala við mig 00:40, 23 August 2016 (UTC)
  • The question is “why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages”? It does not make sense to use multiple romanisation schemes for Greek, Russian, Georgian, Armenian, etc., but the languages in question are languages which contrast these modes of romanisation prominently. Does it make sense to keep transliteration and transcription modules separate in the module infrastructure for these languages? Yes. Many editors of these languages have been conscious of the need to use the appropriate romanisation in certain contexts. See for example, how User:Angr changed the romanisation of the Burmese word to a transcription at elephant. Our Korean romanisation scheme is the transcriptive Revised Romanisation scheme, which is official in South Korea (also hidden under the misnomer Module:ko-translit). User:Visviva, our first prolific Korean contributor, created the entry 미끄럽다. Note the differential use of a transcriptive romanisation in the main text (mikkeureopda) and a transliterative romanisation in etymology (muys.kulepta). Considerations of the arguments for and against need to be made in the context of these script-pronunciation-discordant languages. Provided the romanisation is well-annotated, such as 믯그럽다 (Yale: muys.kulepta) at 미끄럽다, the appropriate, purpose-oriented use of romanisations is hugely beneficial to dictionary building, for these languages. Wyang (talk) 00:54, 23 August 2016 (UTC)
I will just add, Wyang, that you keep claiming that the people opposed to you are giving no real reasons for doing so, but you yourself have given no reasons why consistently using a dual romanization scheme, like I've suggested as a compromise between your view and CodeCat's, is unacceptable, other than the unsupported claim that it's confusing for new users. Benwing2 (talk) 01:09, 23 August 2016 (UTC)
  • FWIW, my bias is to using the more-phonetic romanization (I recognize this is not IPA-grade, but it *is* generally closer to how an English speaker would say something) in translation lists and similar locations, and using the strict transliteration (i.e. letter-for-letter) romanization in etymology sections and other discussions of the term's etymology and historical development. I.e., I disagree with Benwing's suggestion, and I don't want to see both systems used in all cases. This would be similar to what is already in practice for Korean. ‑‑ Eiríkr Útlendi │Tala við mig 01:18, 23 August 2016 (UTC)
  • Exactly. The information presented in a dictionary should be succinct; dual romanisation in translations is infoxication. The reason users look up translations is to answer their questions of “how do you say ... in ...?”, and romanisation in translations should cater to the need of users. Let's look at how other translation dictionaries do this: the only previewable English-Tibetan dictionaries on Google Books are 1, 2 and 3 and all are using transcriptions only. Transcriptions answer the users' questions directly, without additional romanisations to complicate their information processing (below). Wyang (talk) 02:57, 23 August 2016 (UTC)
“hello”: བཀྲ་ཤིས་བདེ་ལེགས (pr. zhacf-xih-dev-leh) vs བཀྲ་ཤིས་བདེ་ལེགས (zhacf-xih-dev-leh [bkra shis bde legs])
“birthday”: འཁྲུངས་སྐར (pr. chungf-gaaf) vs འཁྲུངས་སྐར (chungf-gaaf ['khrungs skar])
“brain”: ཀླད་པ (pr. laef-baf) vs ཀླད་པ (laef-baf [klad pa])
Do you have any evidence that users look up translations solely for the pronunciations of things? The main reason that the system with multiple romanisations is confusing is that not a single one of them is explained.
“brain”: ཀླད་པ (pr. laef-baf, sp. [klad] [pa]) Korn [kʰũːɘ̃n] (talk) 10:08, 23 August 2016 (UTC)
There is plenty of evidence for that. Many transliterations are efforts of a few people over a period of time who took part in their development. Apart from Wyang, myself, you can talk to people like Eirikr, Haplology (Japanese), TAKASUGI Shinji (Japanese and Korean), Aryamanarora (Hindi), Benwing2 (Russian), Saltmarsh (Greek) why transliterations are the way they are. They use some phonetic elements, they are not IPA and not supposed to convey the pronunciation accurately. --Anatoli T. (обсудить/вклад) 10:26, 23 August 2016 (UTC)
Anatoli, I don't understand what you are trying to tell me with your comment. Wikipedia says that Tibetan Pinyin does not mark tone. Our pronunciation sections use the label 'Tibetan Pinyin' but according to Wyang, the superscript letters we see are tone marks. What kind of system is that? It seems in need of relabeling. Korn [kʰũːɘ̃n] (talk) 10:34, 23 August 2016 (UTC)
Korn: See Tibetan pinyin#References. This is the modified version of the official Tibetan Pinyin, with tone letters. The Wylie transliteration scheme is the gold standard of Tibetan romanisation; it and its variants are used in almost all scholarly publications. But it is interesting that all of the three previewable English-Tibetan dictionaries on Google Books use transcriptions only to romanise their Tibetan translations. Wyang (talk) 10:56, 23 August 2016 (UTC)
My point was that for each language, the interested and knowledgeable editors decided how to go about transliteration. I could bring up a series of discussions about Korean; Wyang implemented most of it. The phonetic transliteration (RR), which is officially recommended in South Korea, was adopted. There were long, relevant discussions, and decisions were made. Now, with the argument between Wyang and CodeCat, everybody has joined in with their opinions, but they cared little when the actual problems were being discussed and solved. --Anatoli T. (обсудить/вклад) 11:08, 23 August 2016 (UTC)
Wyang, there should be a note on About: Tibetan about what the symbols mean, including what the tone marks mean; this information is not easily retrievable. Anatoli, I assume the reason is that the community is now forced to take note of the situation because it has been brought to the Beer Parlour, whereas before it was discussed among editors of the language. Korn [kʰũːɘ̃n] (talk) 14:55, 23 August 2016 (UTC)
Absolutely; have to work on it later. Wyang (talk) 21:16, 23 August 2016 (UTC)

az.wiktionary

Hello. I am posting here because there is nowhere else to post. It's about az.wiktionary (Azeri, aka Azerbaijani). The problem is that a native-speaking user (perhaps an admin) changed the verb categories from fel to feil last year. Later, another user, also a native speaker, rejected those changes. There are not many people there to discuss this. I hope this community has knowledge of Azeri and can decide which name to use for the categories. --Octahedron80 (talk) 06:02, 18 August 2016 (UTC)

PS If you want to make changes, please make them at az.wiktionary directly. I am off. --Octahedron80 (talk) 06:19, 18 August 2016 (UTC)

(calling @Aabdullayev851) A third spelling for verb is fe'l. I think there are slight differences between fel / feil / fe'l, but I am not sure what the differences are. —Stephen (Talk) 15:12, 22 August 2016 (UTC)
Hello Octahedron80 and Stephen. I am a new admin on the Azerbaijani Wiktionary. Thank you for your attention to Azerbaijani. There is no fe'l in the Azerbaijani language; I am sure about that. But fel or feil? Some sources use fel, and others use feil. I am also confused about these words. How can we decide which is correct? --Aabdullayev851 (talk) 17:00, 22 August 2016 (UTC)
OK. Fe'l was used in Azerbaijani in Soviet times. After that, fel came into use. Since 2004, feil has been used, so the current form is feil. You can see the same changes in şe'r - şer - şeir. --Aabdullayev851 (talk) 17:31, 22 August 2016 (UTC)
Let's use modern spelling. Feil maybe. --Octahedron80 (talk) 00:33, 23 August 2016 (UTC)
Ok Octahedron80 --Aabdullayev851 (talk) 07:25, 24 August 2016 (UTC)

Proposal: 3-revert-rule

Wikipedia has a 3-revert-rule which exists explicitly to stop endless edit wars. I think Wiktionary should have a similar policy.

However, I don't know if it should be the same in every detail as Wikipedia's version. For example, Wikipedia gives both users three reverts, which means that once all three are used up, the edit is kept rather than reverted. On Wiktionary we follow the principle that disputed edits should be reverted first and then discussed for consensus. So it would make more sense if, when both parties have exhausted their reverts, the final state of a page is the original state rather than the state with the disputed edit. I therefore think that the editing party should get only 2 reverts.

Another detail that should be worked out is how to enforce it, especially when two admins are involved. Wikipedia has a special 3RR noticeboard, which makes sense because it happens so often there. But here it's pretty rare, so there's probably not that much need for a special forum. Would WT:VIP do? I feel that the Beer Parlour is insufficient, as I found out myself when I tried to report an incident and it was ignored. Is it possible at all to make a rule against sysop apathy? —CodeCat 22:06, 18 August 2016 (UTC)

I really don't want us to start enshrining common sense as policy or to start having highly regulated drama-fests like Wikipedia does. Editors shouldn't revert-war and wheel-war because it's disruptive; we shouldn't need an explicit policy stating what everyone already knows. As for returning a page to the "original state", who decides what the "original state" is? If today I revert something another editor did back in 2013, and 15 minutes later he reverts my revert, what's the "original state"? I like how unregulated and uncomplicated Wiktionary is compared to Wikipedia, and I want to keep it that way. —Aɴɢʀ (talk) 22:15, 18 August 2016 (UTC)
I don't see a way to enforce any of our policies. Renard Migrant (talk) 22:23, 18 August 2016 (UTC)
The enforcement mechanisms are social pressure and the action of bureaucrats. Sadly, some seem largely immune from social pressure. DCDuring TALK 02:09, 19 August 2016 (UTC)

PIE root

--Daniel Carrero (talk) 11:17, 21 August 2016 (UTC)

Wiktionary:Table of votes


I created Wiktionary:Table of votes. It is automatically generated using Lua.

Feel free to discuss the new page. If there are any suggestions, I can make the changes in the Lua code.

My idea was to let people know when they voted and when they didn't vote. I was hoping that maybe this would increase the turnout in all the vote pages. --Daniel Carrero (talk) 13:58, 22 August 2016 (UTC)

I also added an "expand" link, at the bottom of the vote box, pointing to the new table of votes. --Daniel Carrero (talk) 21:06, 22 August 2016 (UTC)

Unfortunately, the page is too wide. I wonder if I should implement a list of abbreviations for all users. Examples:

--Daniel Carrero (talk) 14:47, 23 August 2016 (UTC)

I will object to any foreshortening of User:I'm so meta even this acronym which is not "ISMETA." - TheDaveRoss 18:08, 23 August 2016 (UTC)
Oops! I typed "IMSOMETA" but I fixed it now. I also removed "META". Those were mistakes. --Daniel Carrero (talk) 18:43, 23 August 2016 (UTC)
Would it be too hard to read the names of the votes if you reoriented the table so that the usernames were in a vertical column? As long as the text of the vote names was wrapped, I think that might be an improvement. Andrew Sheedy (talk) 20:18, 23 August 2016 (UTC)
I agree. There's more voters than votes, so the lower number should be arranged horizontally. —CodeCat 20:42, 23 August 2016 (UTC)
@Andrew Sheedy, CodeCat: Done. Does it look better now? --Daniel Carrero (talk) 21:48, 23 August 2016 (UTC)
Much! —CodeCat 21:54, 23 August 2016 (UTC)
Definitely—it fits on my screen now! :) Andrew Sheedy (talk) 21:58, 23 August 2016 (UTC)

Voting policy etc.

Wiktionary:Votes had a lot of important text hidden in collapsible divs. I moved it all to:

--Daniel Carrero (talk) 12:33, 23 August 2016 (UTC)

Vote: Making usex the primary name in the wiki markup

FYI, I created Wiktionary:Votes/2016-08/Making usex the primary name in the wiki markup. Let us extend the vote as much as discussion requires. --Dan Polansky (talk) 17:32, 23 August 2016 (UTC)

New PIE root categories

Per comments at WT:RFDO#Template:PIE root, I made {{inh}}, {{der}} and {{bor}} populate categories like Category:Czech terms derived from the PIE root *swep- when the current term is derived from PIE.

This caused a number of redlinked PIE root categories to appear in Special:WantedCategories.

Is that okay with everyone? Is there any problem with these categories, or can they all be created normally? If there is no objection, can someone create all the categories automatically? --Daniel Carrero (talk) 19:07, 23 August 2016 (UTC)

I removed the code that did that, because it was creating all kinds of unwanted categories. Essentially, all PIE terms were being given categories, not just the roots. That said, it's impossible for these templates to determine what is or isn't a root, and besides that, there's tons of etymologies which refer to invalid or badly-formed roots as well. —CodeCat 19:40, 23 August 2016 (UTC)
Maybe it would be better to give the templates a parameter so users can manually make them opt in to categorizing, e.g. |PIErootcat=1 or something like that. —Aɴɢʀ (talk) 09:58, 25 August 2016 (UTC)

Proposal: Request categories with longer names

This concerns Wiktionary:Votes/2016-07/Request categories. See also the vote talk page for further discussion: Wiktionary talk:Votes/2016-07/Request categories. The vote has not started yet and is under construction.

Consider all these categories:

Proposal:

Rename all categories, with longer, more accurate names, with proper English grammar/syntax and "requests" in all names. Details are to be discussed below. (Feel free to suggest different names for the categories if you want.)

Proposed names:

--

--

--

--

--

--

--

Rationale and notes:

  • A more consistent naming style proposed to be used in all request categories.
  • ("requests" vs. "needing") These categories track where something was manually requested, not where it is needed.
  • I attempted to propose names with correct English grammar/syntax, as opposed to "English requests for example sentences", for example. As discussed before, there are no "English requests" anywhere.
  • I'd like to replace "needing attention" and "to be checked" by "review". If we are making a request to do something, we are asking people to review the entries.
  • A minor reason: Some of the proposed category names match the request template. {{rfap}} = "request for audio pronunciation", {{rfe}} = "request for etymology".

--Daniel Carrero (talk) 09:21, 24 August 2016 (UTC)

Upcoming 5 million entries milestone

Should be reached in around one or two months max. Maybe an occasion to celebrate a bit and do some events / communication around the project? A WMF guest blog post with some stats / data visualisations / stories? I'm still surprised about the low profile Wiktionary has; many people have not even heard about it, or confuse it with Wikipedia. Or worse, it gets treated as inferior. At the recent WikiConvention francophone I heard the remark (paraphrased from memory, probably meant as a joke) "Wikipedia is where the real work gets done, Wiktionary is for Scrabble players". Time to change this perception! – Jberkel (talk) 10:08, 24 August 2016 (UTC)

The speaker was probably referring to fr.wikt. [;-}] DCDuring TALK 10:33, 24 August 2016 (UTC)
To this day at Wikipedia there are people who consider Wiktionary to be Wikipedia's trashcan. I still see "Transwiki to Wiktionary" at deletion discussions all the time there, even for terms we already have an entry for, and even when our entry is superior to the Wikipedia article up for deletion. —Aɴɢʀ (talk) 15:09, 24 August 2016 (UTC)
I wish that we had a real encyclopedia as a sister project. DCDuring TALK 17:04, 24 August 2016 (UTC)
We are their WT:LOP, and WT:LOP's WT:LOP is Urban Dictionary. Equinox 17:28, 24 August 2016 (UTC)
Deleting WT:LOP would improve our overall quality. --Daniel Carrero (talk) 22:54, 24 August 2016 (UTC)
WT:LOP was formerly the only place to put words that we now call "hot words". It also served as a means of handling good-faith new entries that is less hostile than deletion. I think we are better off looking like the work in progress that we are rather than pretending that we are at all close to being a finished product in whole or in part. DCDuring TALK 00:33, 25 August 2016 (UTC)
Yeah, LOP serves to satisfy/distract (some of the) contributors who would otherwise keep trying to add their neologisms to the mainspace, which makes it useful. 'Cause on our end, we can just ignore it... - -sche (discuss) 08:40, 25 August 2016 (UTC)
Something to be mentioned in whatever news release goes out: we have words from about one-third of the world's languages, according to conventional estimates of how many languages are spoken in the world. As of when WT:STATS was updated, we were up to 2535 languages with entries, and I expect we are at least over 2600 by now, if not higher. We include codes for 7960 languages, and given how many languages we've identified as needing codes, which I am steadily adding, I expect that figure will reach 8000 soon (by which time I expect to have passed the one-third mark of 2667 languages with entries, since most languages I am adding codes for I am also adding entries in). - -sche (discuss) 08:40, 25 August 2016 (UTC)
At ~4,845,000 pages, we're pretty close to Wikipedia's pagecount (~5,223,000). I expect that we'll overtake WP as time goes on, because we define at least 200,000 English base words (number of entries minus number of form-of definitions = 368,098, but many are variant spellings, so I conservatively guess 200,000 base words), and have the potential to include that many entries in several thousand languages (assuming poorly-attested languages on one side and highly agglutinative or inflected languages on the other side will make a wash), which is several hundred million entries. (Even just that many entries in the 500 most common of the languages we include would be 100,000,000.) - -sche (discuss) 08:50, 25 August 2016 (UTC)
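The arithmetic behind the figures in the two comments above can be checked with a few lines of Python. This is only a sketch: the numbers are the ones quoted in the thread, and the variable names are illustrative, not taken from any Wiktionary tool.

```python
import math

# Figures quoted in the thread (variable names are illustrative).
language_codes_target = 8000
one_third_mark = math.ceil(language_codes_target / 3)  # languages needed to cover one third

english_base_words = 200_000   # the conservative guess given above
most_common_languages = 500
potential_entries = english_base_words * most_common_languages

wiktionary_pages = 4_845_000
wikipedia_pages = 5_223_000
page_gap = wikipedia_pages - wiktionary_pages

print(one_third_mark, potential_entries, page_gap)
```

The one-third mark comes out at 2667 and the 500-language estimate at 100,000,000, matching the figures stated above; the page-count gap between the two projects is 378,000.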

Proposal: make headword templates for some languages automatically categorise phrasal verbs

The "phrasal verbs" category is pretty well populated for English, but not so much for other languages. There are also, presumably, many more missing from the English category. I therefore propose that

  • The English headword module be modified so that when the page name contains a space, then the phrasal verbs category is automatically added. I'm not sure if phrasal verbs should also be put in the plain "verbs" category.
  • This change also be applied to the modules of other languages, where this is desirable or applicable.
  • This change be applied to other parts of speech, if desired.

It would also be possible to implement this directly in Module:headword, and then it would apply automatically for all languages. However, I don't know if this would be desirable. If everyone else thinks it's fine, we can do that instead. —CodeCat 16:26, 24 August 2016 (UTC)

I don't think that everything that meets the condition presented is a phrasal verb, within the normal meaning of the term, eg, break wind. I don't believe that every entry for a verb followed by a preposition or particle is a phrasal verb, eg, go to hell or go to in that phrase. DCDuring TALK 17:10, 24 August 2016 (UTC)
I wasn't aware of that definition. I thought it just meant any verb that is a phrase (i.e. the SoP meaning of phrasal verb itself). —CodeCat 17:14, 24 August 2016 (UTC)
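The objection above can be made concrete with a small sketch of the proposed condition. This is Python for illustration only, not the Lua used in the actual modules; the function name is hypothetical, and "give up" is an example added here that does not appear in the thread.

```python
def would_get_phrasal_verb_category(page_name: str) -> bool:
    """Proposed trigger as described above: the page name contains a space."""
    return " " in page_name


# "give up" is an added example of an ordinary phrasal verb;
# the other two are the counterexamples raised in the thread.
examples = ["give up", "break wind", "go to hell"]
flagged = [name for name in examples if would_get_phrasal_verb_category(name)]
print(flagged)
```

All three names pass the test, so the check alone cannot separate phrasal verbs in the narrower sense from other multi-word verb entries such as break wind or go to hell.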

Proposal: automatically categorise palindromes in Module:headword

Recently, logic was added to categorise entries if they have unusual characters in them. We can also do other "analysis" of words automatically in the module, including palindromes. Therefore I propose that we add this feature to the module so that the categories don't have to be added manually anymore. —CodeCat 16:31, 24 August 2016 (UTC)

Would it be undesirably computationally expensive to do anagrams this way too? Equinox 17:27, 24 August 2016 (UTC)
Modules have no way to see what entries are in a category, so they are not able to go over each one and see if a term is an anagram of the current term. —CodeCat 17:39, 24 August 2016 (UTC)
  • This is a good idea, and (much unlike anagrams) I can't think of too many language-specific issues, besides the fact that it's not relevant for certain scripts. —Μετάknowledgediscuss/deeds 22:40, 24 August 2016 (UTC)
  • I agree, this is a good idea. --Daniel Carrero (talk) 22:53, 24 August 2016 (UTC)
  • I also like this idea. Will it be implemented such that periphrastic palindromes ("Madam, I'm Adam") would be allowed? You'd have to remove spaces and punctuation, lowercase the string, and pass it through the sort key logic to get reliable results. —JohnC5 00:44, 25 August 2016 (UTC)
What is the minimum length string we will consider a palindrome? 3 characters? DTLHS (talk) 00:44, 25 August 2016 (UTC)
1 character, it seems. Both a and I are in Category:English palindromes. --Daniel Carrero (talk) 00:53, 25 August 2016 (UTC)
I definitely disagree with categorizing single characters as "palindromes". DTLHS (talk) 01:01, 25 August 2016 (UTC)
Appendix:English palindromes has palindromes with a minimum length of 2 characters. There are a few two-letter palindromes in the category, too: ee, oo, BB. Can abbreviations, such as AAA, be considered normal palindromes? --Daniel Carrero (talk) 01:09, 25 August 2016 (UTC)
Those may be better in a "Repeated character" category than in palindromes. — Dakdada 13:21, 25 August 2016 (UTC)
Suppose we define "palindrome" for these purposes as words containing multiple different characters. This would effectively exclude all two-letter palindromes, which only repeat the same character, and things like WWW and ooo, but would include things like Ana and oro. bd2412 T 14:40, 25 August 2016 (UTC)
I'd like to implement this, but I don't have the sysop rights required to do it. —CodeCat 16:24, 25 August 2016 (UTC)
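The rules floated in this thread (drop spaces and punctuation, lowercase the string, and require more than one distinct character) can be sketched roughly as follows. This is Python for illustration only, not the Lua of Module:headword; the function name is hypothetical, and the sort key step mentioned above is left out, with Unicode normalisation and case folding standing in for it as an assumption.

```python
import unicodedata


def is_palindrome_candidate(term: str) -> bool:
    """Rough sketch of the palindrome test discussed above (hypothetical helper)."""
    # Assumption: NFC normalisation plus case folding stands in for the
    # language-specific sort key logic, which is not modelled here.
    folded = unicodedata.normalize("NFC", term).casefold()
    # Drop spaces, punctuation and anything else that is not a letter or digit.
    chars = [c for c in folded if c.isalnum()]
    # Require at least two different characters, which excludes "a", "oo", "WWW".
    if len(set(chars)) < 2:
        return False
    return chars == chars[::-1]


for term in ["Madam, I'm Adam", "Ana", "oro", "a", "oo", "WWW", "break wind"]:
    print(term, is_palindrome_candidate(term))
```

Under these assumptions "Madam, I'm Adam", Ana and oro are accepted, while a, oo, WWW and break wind are rejected. Scripts where reversal by code point is not meaningful would need separate handling.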

Unicode 9.0

Can someone please update Appendix:Unicode and subpages?

The appendices cover the characters up until Unicode 8.0. Unicode 9.0 was introduced in June, apparently.

List of new Unicode 9.0 characters: http://www.unicode.org/charts/PDF/Unicode-9.0/

--Daniel Carrero (talk) 16:38, 25 August 2016 (UTC)

Trademarks, again

I'm very much opposed to the idea that Wiktionary has to indicate trade marks in any way. We've recently had an editor who seems to be adding trademark nonsense to dodge bow, a common term not at all connected to any particular trademark, or at the very least a genericised one. They've gone to User talk:JohnC5 and argued that their trademark deserves the same recognition as we give to Nike. What is the policy on this? Why is the trademark indicated for Nike in the first place, and how might this case differ from others? —CodeCat 21:36, 25 August 2016 (UTC)

Per WT:TM, we do not indicate trademark status. The talk page of that page has links to the discussions that led up to it, in which a WMF staff member and an[other?] intellectual property lawyer participated. I have removed the "trademark" context label from Nike. (I wonder if we should upgrade that page from think tank to a higher status.) - -sche (discuss) 21:46, 25 August 2016 (UTC)
Yet, we do explicitly define "trademark" as a recognised label. Even though it's not even a usage context. Should we get rid of it? A possibility we could even consider is to include logic in Module:labels and its data modules to explicitly forbid certain labels. —CodeCat 21:58, 25 August 2016 (UTC)

Austro-Asiatic and Mon-Khmer

Austro-Asiatic was traditionally divided into Mon-Khmer and Munda, but more recent classifications have made Mon-Khmer synonymous with Austro-Asiatic. On Wiktionary, we still have mkh (Mon-Khmer) as a subfamily of aav (Austro-Asiatic). Where this becomes a problem is that it prevents Munda terms like दाः from referencing their Proto-Mon-Khmer i.e. Proto-Austro-Asiatic ancestors. How should this be addressed? - -sche (discuss) 21:54, 25 August 2016 (UTC)