Wiktionary:Beer parlour/2018/May


Renaming Taino to Taíno

@-sche, Metaknowledge, I'd like to broach renaming Taino to Taíno again.

  1. Taíno is the spelling commonly used in academic papers (the main source of reconstructions), Ethnologue, and Wikipedia.
  2. It is illustrative of its proper pronunciation, /taɪˈinoʊ/, and not /ˈtaɪnoʊ/.

--Victar (talk) 16:52, 2 May 2018 (UTC)

No opposition from me.
Minor quibble: wouldn't the correct pronunciation be /ta.ˈinoʊ/ instead? ‑‑ Eiríkr Útlendi │Tala við mig 17:06, 2 May 2018 (UTC)
@Eirikr: I normally pronounce it more like /təˈinoʊ/, but I've often heard /taɪˈinoʊ/, as in naïve. --Victar (talk) 17:13, 2 May 2018 (UTC)
Thank you. Part of what prompted my question was the stress, which you'd fixed in the meantime.  :) ‑‑ Eiríkr Útlendi │Tala við mig 17:19, 2 May 2018 (UTC)
Cheers. --Victar (talk) 17:21, 2 May 2018 (UTC)
The same argument about why we have Maori rather than Māori seems to apply here: we should avoid diacritics, which are difficult for most users to type, wherever they are not universally used. In this case, I don't care too much either way, since the language is such a niche interest and I don't think anyone will be caused problems by the change. But I do note that when I search for a phrase like google books:"spoke Taíno", the majority of the results actually have "Taino". —Μετάknowledgediscuss/deeds 17:22, 2 May 2018 (UTC)
@Metaknowledge, Thanks for the reply. I addressed both those concerns in the last discussion. 1. Taino is a purely academic reconstructed language with a very limited lexicon, so ease of typing is something of a moot point. 2. Doing a search comparison for Taíno is innately flawed because a) OCR software often does not recognize i-acute, and b) diacritical marks are commonly dropped from non-academic sources. --Victar (talk) 17:38, 2 May 2018 (UTC)
And correct me if I'm wrong, but Maori and Māori would both be pronounced the same in English, unlike Taino and Taíno, the í representing a breakup of the diphthong. --Victar (talk) 17:47, 2 May 2018 (UTC)
Responding to your points... I agree with #1, which is why I'm abstaining on this. Your 2a is irrelevant, because I went and looked at the actual previews of the books where possible; you can come up with your own search string and try the same. 2b is not a flaw, but actually an important fact: if a lot of people writing about this language are not doing so in academic contexts, maybe we shouldn't use an academic spelling. —Μετάknowledgediscuss/deeds 17:57, 2 May 2018 (UTC)
@Metaknowledge, to give an example of failed OCR, see results #11 and #13 of google books:"Taíno language", Caciques and Cemi Idols and An Account of the Antiquities of the Indians.
Regarding 2b, I think the context is very relevant. When the book is actually about the Taíno language and people or even the Caribbean peoples at large, I've found Taíno very commonly used, yet when Taíno is just a passing mention, we often find Taino. Also, to restress, this is an academic language, so I believe academic publications should hold more weight than, say, a biography about Christopher Columbus. --Victar (talk) 18:19, 2 May 2018 (UTC)
But the passing references still count towards establishing what the common name of the language is, especially if they greatly outnumber the specialist works that use the diacritics, as seems to be the case here. I'd still prefer the diacritic-less name since it's (more?) common and easier to type, but I don't feel strongly about it. - -sche (discuss) 18:32, 5 May 2018 (UTC)
@-sche:, well, if there are no strong objections, which seems to be the case, and since I appear to be the only one interested enough in adding entries for the language, would you mind renaming it to Taíno? --Victar (talk) 04:48, 7 May 2018 (UTC)
Also, FWIW, at least some of the hits showing as non-accented Taino in the excerpt on the hits page actually show up as accented Taíno when viewing the preview. I suspect the opposite might also happen, due to the vagaries of Google's OCR processing. ‑‑ Eiríkr Útlendi │Tala við mig 17:39, 2 May 2018 (UTC)
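
To make the search-comparison point in this thread concrete, here is a minimal Python sketch of how accented and unaccented spellings could be tallied once the underlying text is available. The function names and the sample snippets are hypothetical, invented for illustration; they are not real search results, and nothing here reflects how Google Books actually indexes text. The sketch only shows why Unicode normalization matters before counting: a decomposed "i" plus combining acute should count as the accented form, and a plain hit cannot by itself tell you whether the source dropped the accent or the OCR did.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (e.g. 'Taíno' -> 'Taino')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_spellings(snippets, accented="Taíno"):
    """Count snippets containing the accented form vs. only the plain form."""
    accented = unicodedata.normalize("NFC", accented)
    plain = strip_diacritics(accented)
    counts = {"accented": 0, "plain_only": 0}
    for snippet in snippets:
        # Normalize first so precomposed and decomposed accents compare equal.
        snippet = unicodedata.normalize("NFC", snippet)
        if accented in snippet:
            counts["accented"] += 1
        elif plain in snippet:
            counts["plain_only"] += 1
    return counts


# Hypothetical snippets, invented for illustration only.
SAMPLE = [
    "They spoke Taíno before contact.",
    "They spoke Taino before contact.",  # plain form; cause of the missing accent is unknown
    "Tai\u0301no loanwords in Spanish",   # decomposed: i + combining acute
]

print(count_spellings(SAMPLE))
```

A "plain_only" count from such a tally would still need the manual preview check described above before it says anything about what the printed books actually use.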

Indonesian vs. Indonesian Malay

I thought I had already asked this a few months ago, but I can't find the discussion now, so maybe I only imagined asking it. What is the difference between Indonesian language and Indonesian Malay? Do we need both categories? —Mahāgaja (formerly Angr) · talk 18:16, 2 May 2018 (UTC)

No, I definitely remember you asking that. Lemme check. --Per utramque cavernam 18:20, 2 May 2018 (UTC)
There was a plan to merge Indonesian into Malay and use {{lb}} appropriately, but the vote never started. —Suzukaze-c 18:23, 2 May 2018 (UTC)
Yes, but I'm pretty sure the discussion Mahagaja is referring to is more recent than that. I even remember someone posting a link to a Wikipedia article, possibly this one, and Mahagaja was satisfied after that. --Per utramque cavernam 18:25, 2 May 2018 (UTC)
Can't find it. --Per utramque cavernam 18:49, 2 May 2018 (UTC)
I definitely don't remember being satisfied, though I may have given up in frustration. At any rate, if the previous discussion can't be found, the question stands: What, if anything, is the difference, and do we need both categories? —Mahāgaja (formerly Angr) · talk 19:10, 2 May 2018 (UTC)
I can't find the discussion either. Maybe it somehow got deleted. — Eru·tuon 19:12, 2 May 2018 (UTC)
Do you also remember this discussion taking place? (I'm slowly starting to feel like I imagined it too, but it couldn't have happened to three of us). --Per utramque cavernam 19:23, 2 May 2018 (UTC)
It's the top discussion here. Wyang (talk) 22:09, 2 May 2018 (UTC)
Hah! This is so satisfying, thanks a lot. --Per utramque cavernam 22:16, 2 May 2018 (UTC)
Thanks, Wyang! WT:Feedback doesn't get archived, just erased, which explains why we were unable to find it. —Mahāgaja (formerly Angr) · talk 22:17, 2 May 2018 (UTC)
Wow, you found it! It's unfortunate that the useful discussions on that page simply vanish into thin air, so to speak. — Eru·tuon 01:01, 3 May 2018 (UTC)
If I happen to see a particularly useful discussion, I aWa-archive it to the most relevant talk page I can find so that it's still findable. A lot of the threads are tosh, but we could either get more users in the habit of archiving the useful ones (especially if a discussion leads to improvement of an entry, it can be useful to have the discussion on the entry's talk page), or just save them all as is done with translation requests. - -sche (discuss) 04:26, 3 May 2018 (UTC)
I would be inclined to support a merger. There seems to be a massive amount of duplication. Perhaps we should (re-)start the vote? - -sche (discuss) 18:53, 10 May 2018 (UTC)
The Malay language in modern context is the one used in Brunei, Malaysia and Singapore which is different from the one used in Indonesia which is called Bahasa Indonesia. So I think there should be a distinction between these two languages. --Tofeiku (talk) 07:18, 12 May 2018 (UTC)

Being nice

What is the community feeling about this kind of thing? [1] (StackOverflow is one of the main question-and-answer sites for computer programmers.) Are we bad unkind people? Are we causing problems for women and non-white people? Equinox 00:39, 3 May 2018 (UTC)

Do you actually care? DTLHS (talk) 00:39, 3 May 2018 (UTC)
Yes. Equinox 00:41, 3 May 2018 (UTC)
en.wikt is not as welcoming as some other communities, even other WMF ones. The fact that there is a "women problem" is empirical (i.e. the vast, vast majority of contributors are men). As for solutions, ¯\_(ツ)_/¯. —Justin (koavf)TCM 01:04, 3 May 2018 (UTC)
We are a feisty bunch. Some of it is somewhat viciously circular: we don't have many editors relative to WP, so those we do have may be relatively overworked, so partially-OK but e.g. malformatted edits (which are perhaps more possible here than on WP because our formatting is more rigid) are often rolled back rather than cleaned up or even undone with a specific explanation. Sometimes, editors who make such edits are harangued on their talk pages or forums like RFC/RFV/RFD. This contributes somewhat to our non-retention of new editors, including some prolific ones, which means we don't have many editors, which brings us back to the first point in the circle. Also, established users sometimes reflexively defend each other even when [newbies'] complaints (e.g., that someone should have fixed a partially-OK edit rather than rolling it back...) are probably reasonable. (I'm sure I've been the defender, the defendant and the complainant at different times in such situations before.) Of course, we also attract quite a few editors who persist in making low quality or incorrect edits, whether out of ignorance or POV or malice, who we benefit from shutting down without all the quasi-"due process" of Wikipedia. - -sche (discuss) 19:07, 6 May 2018 (UTC)
OT DCDuring (talk) 20:54, 6 May 2018 (UTC)
LOL. - -sche (discuss) 21:10, 6 May 2018 (UTC)
The other side of that is tolerating "things" (racism, sexism) from contributors because they have some rare resource or know an obscure language. DTLHS (talk) 21:37, 6 May 2018 (UTC)

Category:Old French nouns in Hebrew script

Is this valid? DTLHS (talk) 01:16, 3 May 2018 (UTC)

Why would that be surprising? I'm guessing French Jews wrote the local tongue in Hebrew script, like Jews in many places.--Prosfilaes (talk) 04:13, 3 May 2018 (UTC)
The entries themselves are valid results of (reasonably, IMO) considering Judeo-French zrp to be the same language as Old French fro. I would have thought the category was valid, as we at one time also had Category:Afrikaans nouns in Arabic script, but that category is now empty and the words it contained are simply categorized with other Afrikaans nouns, so maybe we have gotten away from such categorization. - -sche (discuss) 04:20, 3 May 2018 (UTC)
I don't think we need to categorize by both part of speech and script. It would suffice to put these in both Category:Old French nouns and Category:Old French entries in Hebrew script. --WikiTiki89 14:30, 3 May 2018 (UTC)
That's reasonable. I suppose "entries in Hebrew script" might not even need their own category of any sort, since they are usually findably grouped together in the lemmas and non-lemmas categories (and other categories). So, I suppose this category should be removed from the entries that call for it. - -sche (discuss) 18:55, 6 May 2018 (UTC)
I've removed this category from the entries, and deleted the empty Afrikaans categories, but we have many such categories; see e.g. this search. - -sche (discuss) 18:52, 10 May 2018 (UTC)

Tropical cyclone names

I'm planning to create entries for the names of tropical cyclones. They are easily attestable and sometimes have interesting etymologies. (Note tropical cyclone names are recycled and thus they are not names of specific entries.) See Haikui for an example. However, I don't know whether this is within the scope of Wiktionary, and I'd like to gather comments on how the entries should be formatted (including whether to use an English or Translingual header, how the definitions should read, whether we should link to Wikipedia articles about specific cyclones with such names, etc.).--Zcreator alt (talk) 14:36, 3 May 2018 (UTC)

  • That seems to be OK, but I think they need a better definition (and a link to -pedia). SemperBlotto (talk) 14:40, 3 May 2018 (UTC)
  • You should create a template to use for the definitions. That way all the definitions will be the same and we can easily edit it in one place. I suggest a formatting similar to {{tropical cyclone|en|typhoon}}. --WikiTiki89 14:43, 3 May 2018 (UTC)
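
The "edit it in one place" idea behind the template suggestion can be sketched outside wikitext. The following Python fragment is only an analogy, not the proposed template: the function name, its parameters, and the definition wording are all assumptions made for illustration. It shows a single shared pattern from which every entry's definition line is generated, so changing the pattern changes all entries at once.

```python
# Hypothetical wording, keyed by language code; the real text would be decided on-wiki.
DEFINITION_PATTERNS = {
    "en": "A name used for more than one {storm_type} (tropical cyclone names are reused).",
}


def tropical_cyclone_definition(lang: str, storm_type: str) -> str:
    """Build the shared definition line for a tropical cyclone name entry."""
    try:
        pattern = DEFINITION_PATTERNS[lang]
    except KeyError:
        raise ValueError(f"no definition wording for language code {lang!r}")
    return pattern.format(storm_type=storm_type)


print(tropical_cyclone_definition("en", "typhoon"))
```

Because every entry would call the same function (or, on the wiki, transclude the same template), rewording the definition means editing one pattern rather than each entry.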

Format for given names in non-Latin scripts

It seems as though given names are being treated in different ways by non-Latin script languages. In Persian, so far most given names have had a red link for an English entry. On the other hand, some other languages are not linking and just using {{given name}} to display 'A female/male given name'. For example, Bengali, Georgian and Hindi appear to not link usually (e.g. 'श्वेता', exception: 'शबनम'), but Armenian and Persian do give links. Arabic and Hebrew appear to be mixed and Russian links to equivalent names.

If there is no English entry/Romanisation, it might be more difficult to find information. For example, it is unlikely anyone would ever type in 'ğonče' to find غنچه (ğonče), but they will find it in its current format by typing 'Ghoncheh'. This could create a large number of potential entries in English, though.

Should there be a policy across all non-Latin script languages to either link to a future English entry or to not link? Kaixinguo~enwiktionary (talk) 09:20, 5 May 2018 (UTC)

Questions in this topic area (of how to handle names, especially English/Latin-script forms of foreign names) have come up several times over the years and have no easy unproblematic answer. The entries should use the "given name" template, and then if the name has a common English counterpart / translation, provide it after a colon: "A male given name: Ghoncheh." Whether or not there should be a link and an ==English== entry for "counterpart" probably needs to be decided on a case-by-case basis: if there are native English-speakers with the name, e.g. children of Iranian immigrants, we should have an entry that defines it as an English given name of Persian origin. But if the name only occurs when foreign bearers' names are rendered into English / Latin script, there has historically been less agreement about how to handle it. (I may dig up links to previous discussions later.) - -sche (discuss) 19:18, 6 May 2018 (UTC)
Thanks. Amongst the languages other than Persian that I looked at, several do not follow the given name template with the transliteration/Romanisation/whatever it is.
I posted a question at the Information Desk asking about the punctuation. At the moment the template uses a comma rather than a colon, so I was wondering if it should be modified.
You are right about the distinction in how the names are being treated 'in English'. There are two categories, Category:English_female_given_names_from_Persian and Category:en:Persian_female_given_names. I'm not thinking about that so much as the format of the Persian- or foreign-language entries in non-Latin scripts and whether they should take a common approach. Kaixinguo~enwiktionary (talk) 21:22, 6 May 2018 (UTC)
Oh, I have no strong feelings about whether the punctuation should be a comma or a colon. Also, I could understand omitting the second part (the English 'version' or customary transliteration) if there isn't one, as could be the case with names from more obscure languages. But I think the first part (the {{given name}} template) should always be used (right?), except maybe on alt forms of names (which could just use {{altform}}), although even there it could be used (as on Sara). - -sche (discuss) 06:24, 10 May 2018 (UTC)


Birgit Müller (WMDE) 14:45, 7 May 2018 (UTC)

Upcoming reference template votes

After some renewed sparring over the layout of reference and citation templates (in which both sides have become frustrated and acted poorly), I will soon be setting up a vote to standardize the reference template layouts. The proposed text of the vote is as follows:

All citation templates and citations should use the {{cite-*}} or {{R:Reference-meta}} templates underlyingly. All removal of citational information found as a parameter to the {{cite-*}} or {{R:Reference-meta}} templates shall only be motivated via agreement among editors.

As part of this, {{R:Reference-meta}} will need to be brought into alignment with {{cite-*}} where it differs. Please provide comments. —*i̯óh₁n̥C[5] 19:25, 8 May 2018 (UTC)

@Dan Polansky, Sgconlaw, Metaknowledge, Ungoliant MMDCCLXIV, Victar *i̯óh₁n̥C[5] 19:27, 8 May 2018 (UTC)
The proposal looks a bit abstract to me. What does this entail exactly? What difference in display will it cause? --Per utramque cavernam 19:31, 8 May 2018 (UTC)
@Per utramque cavernam: The context is an ongoing disagreement with Dan about what information should be in reference templates and how they should be formatted. Dan prefers that they be simple and not "ornamented" with information that Sgconlaw and I consider to be valid academic citation material. The normal line of argument is that Dan feels there is a "status quo ante" of simple references and that we should not diverge from that. I am proposing this vote to prevent this line of argumentation and the deletion of valid citational information. There is a separate complaint from Dan that our citation method is "not usual academic practice", and, to be sure, en.Wikt has always been a bit idiosyncratic in our layout and in the information we allow (specifically regarding ISBNs), but I do not think we stray particularly far from recognizable academic standards. The advantage of this vote, however, is that if we decide to change our citation standards at any point, we only need to edit a handful of templates to change the whole project. To summarize, I would like to prevent the removal of (what I consider) valid academic information and would like to centralize everything so all changes may be applied uniformly. —*i̯óh₁n̥C[5] 19:46, 8 May 2018 (UTC)
@JohnC5: Ah yes, I've seen that debate happen in a few places.
So, does the proposal mean that any citation template should make use of all the parameters (or all the relevant parameters) of the aforementioned underlying templates; i.e., that they should always be as complete as possible?
While I tend to prefer a more succinct format, I appreciate thoroughness and accuracy too. Could we imagine having the "short form" by default, and an "expand/see more" button for the full view?
I think @Widsith could be interested in this, as I remember him challenging the length of some quotation templates. --Per utramque cavernam 20:10, 8 May 2018 (UTC)
I can understand making the body of an article more succinct and I often have to restrain myself from adding 20 alternative forms, but I don't see any valid reason to hold that same standard to content in the footer. --Victar (talk) 20:27, 8 May 2018 (UTC)
But it's not always used in the footer. For example, {{RQ:Browne Pseudodoxia Epidemica}} is used in the body of the entry; see latitancy. --Per utramque cavernam 20:37, 8 May 2018 (UTC)
Is this vote meant to apply to RQ templates? DTLHS (talk) 20:40, 8 May 2018 (UTC)
@DTLHS: Under this formulation, no. —*i̯óh₁n̥C[5] 20:51, 8 May 2018 (UTC)
@Per utramque cavernam: In your example, it's collapsed content, which I'm also not concerned by the length of. If there are examples of this that are not collapsible, I recommend that they be made so. --Victar (talk) 21:08, 8 May 2018 (UTC)
@Per utramque cavernam: So your question about what parameters will be used is crucial. When we cite printed material, even digitized print material, we are almost always citing a particular version from a particular year. This information should be represented in the citation. The user should be able, with the appropriate library, to find the appropriate version from which a claim is made. Some of these books differ wildly in content from one edition to the next, and finding the correct one is important. Thus providing the standard bibliographic information (author, title, section/chapter/article title, year, [editors], location, publisher, [pages]) is crucial. As to the ISBNs specifically, I could take them or leave them. The issue is how we decide the threshold between what is too little to be accurate and what is excessive. This information should be used (in part) to disambiguate where there are multiple versions. In the case of entirely online dictionaries like dictionary.com, less information would be stable or relevant so less would be included (there might not be author(s), publisher(s), location(s), edition(s), etc.).
As for short vs. long form, if we can do this via css, I'm all for it. I'm less excited about appendices or alt text. —*i̯óh₁n̥C[5] 20:49, 8 May 2018 (UTC)
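
As a rough illustration of the centralization argument above, here is a minimal Python sketch of one shared formatter that every reference "template" would delegate to. It is not the actual {{cite-*}} or {{R:Reference-meta}} implementation; the class name, field names, output layout, and sample data are all assumptions made for this example. Optional fields are simply skipped, which mirrors the point that an online-only dictionary may have no author, publisher, or location to give.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Bibliographic fields; only the title is required (hypothetical schema)."""
    title: str
    author: Optional[str] = None
    entry: Optional[str] = None
    year: Optional[int] = None
    location: Optional[str] = None
    publisher: Optional[str] = None
    pages: Optional[str] = None


def format_citation(c: Citation) -> str:
    """Render a citation in one shared layout, omitting fields that are absent."""
    segments = []
    head = " ".join(p for p in (c.author, f"({c.year})" if c.year else None) if p)
    if head:
        segments.append(head)
    if c.entry:
        segments.append(f'"{c.entry}"')
    segments.append(c.title)
    imprint = ": ".join(p for p in (c.location, c.publisher) if p)
    if imprint:
        segments.append(imprint)
    if c.pages:
        segments.append(f"page {c.pages}")
    return ", ".join(segments)


# Invented sample data: a print source with full imprint, and an online one without.
print_source = Citation(
    title="Example Dictionary", author="A. Author", entry="shirt",
    year=1913, location="Springfield", publisher="Example Press", pages="12",
)
online_source = Citation(title="Example Online Dictionary", entry="shirt")

print(format_citation(print_source))
print(format_citation(online_source))
```

With this shape, a later decision about layout (for instance, whether cited entries get quotation marks) is a change to `format_citation` alone, and every citation picks it up.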
I'd vote yes to that. If someone believes a parameter should be removed or the formatting of templates altered, that should be discussed on {{cite-*}} and {{R:Reference-meta}}, respectively. --Victar (talk) 19:40, 8 May 2018 (UTC)
Seems reasonable to me. Not sure what the problem is to be honest. —AryamanA (मुझसे बात करेंयोगदान) 22:36, 8 May 2018 (UTC)

I was going to start a discussion on this; thanks for beating me to it. The background to the issue is that there is a difference of opinion over what format the so-called "dictionary-related" reference templates (i.e., templates like {{R:Online Etymology Dictionary}} and {{R:Webster 1913}}) should have. It seems to me that the following options are available:

  • Option 1: All reference templates should follow the {{cite}} family or {{R:Reference-meta}} formats, which means that full citation information (including imprint information – e.g., edition number, place of publication, name of publisher, year of publication, and ISBN or OCLC number) can or should be provided.
  • Option 2: Dictionary-related reference templates should have a simpler format (entry not enclosed in quotation marks, little or no imprint information provided), while other reference templates should follow the {{cite}} family or {{R:Reference-meta}} formats.
  • Option 3: All reference templates should have a simpler format.

My preference is for option 1. It seems to me that we need to try to achieve consensus on the following issues:

  • Should we try to maintain consistency in the formats of all reference templates and quotation templates? (Quotations are generally formatted using the {{quote}} family of templates, and the {{cite}} family of templates is aligned with the former.)
  • If yes, should we standardize all reference templates with the {{cite}} family of templates, {{R:Reference-meta}} (which I created to try and bridge the differences between options 1 and 2 above), or some other format?
  • If no, what are the reasons for treating the dictionary-related reference templates differently? How do we consistently identify these templates?

SGconlaw (talk) 22:37, 8 May 2018 (UTC)

  • The further issue is, if we choose Option 2, how will we differentiate when there are multiple, conflicting versions of a dictionary and, if we choose Option 3, how will we differentiate more fiddly scholarly sources?
  • I would be in favor of standardizing to the {{cite}} family of references, as they more closely mirror the {{quote}} family in certain ways.
  • We should decide on the quotation-marks-around-cited-dictionary-entries issue. If we decide to have quotation marks (which I favor), we should build this functionality into all the templates.
*i̯óh₁n̥C[5] 23:19, 8 May 2018 (UTC)
I'm sympathetic to the idea that references don't need so much clutter, though since they're mostly limited to the end of entries (and could be collapsed, I guess, if present halfway through a multi-etymology entry) and can still be input by short names like {{R:MWO}}, I suppose it doesn't matter much how cluttered they are. However, a caution: online references will need to have fewer details in order to be accurate. For a printed dictionary, you could reasonably expect people to cite which edition / publication year they found a term in, but online dictionaries often change versions without anyone noticing ... for example, for more than a decade, Template:R:Dictionary.com referenced "v1.0.1" and the "2006" version of the site, even though the template continued to be applied while the site was updated, including probably to at least some words that weren't present in the 2006 version of the site. So, it was good that the version and date were stripped out of the template.
(And adding e.g. an access date would probably be unhelpful, since if a later editor can't find the term in the dictionary anymore, they can hardly confirm that the earlier editor was telling the truth that they found it in the site, and maybe the entry was removed for inaccuracy. Only in limited circumstances like writing a usage note about how dictionaries formerly included a term does it seem useful to cite e.g. an archived version of a no-longer-present entry in an otherwise-still-online dictionary.)
- -sche (discuss) 15:39, 10 May 2018 (UTC)
  • I have strong feelings about the unwieldy size of RQ-format citation templates, but I'm not clear whether this actually applies to them. Does it? And if not, perhaps we should consider all these similar templates at the same time. Ƿidsiþ 12:48, 17 May 2018 (UTC)
    • I'd say that's an issue which can be discussed separately, otherwise the current discussion will become unwieldy. — SGconlaw (talk) 16:09, 17 May 2018 (UTC)
      • Like the templates! Ƿidsiþ 16:38, 17 May 2018 (UTC)

WT:CFI and "clearly widespread use"

Does "clearly widespread use" have any currency? It seems like we always slap {{rfv}} on entries and ask for 3+ durable cites (and they HAVE to be durable, AND they must be uses), regardless of widespread use, which usually doesn't go over too well for internet slang, and then we have minor debates every so often about whether the Internet Archive is durable, or how we could amend WT:CFI to suit the digital age. —Suzukaze-c 03:07, 10 May 2018 (UTC)

What is an example of a page that failed RFV despite "widespread use"? DTLHS (talk) 03:12, 10 May 2018 (UTC)
Anyway, I view that line as a way to pass RFV without doing the work of actually putting the quotations in the entry, not that they don't have to exist. DTLHS (talk) 03:15, 10 May 2018 (UTC)
I was thinking mostly about internet slang, but I indiscriminately excised part of my text and accidentally removed that part. I edited my original post to fix it. Regional words might fit in this category too, and social media might be used as proof of "clearly widespread use".
IIRC there was an RFV for some gaming-related term some time ago, and someone mentioned that it seemed to be used a lot but not in published literature, and the entry didn't pass. I can't find it though. —Suzukaze-c 03:25, 10 May 2018 (UTC)
I think we should remove the "widespread use" rule as it is not well-defined. Instead we should reconsider WebCite and Internet Archive - Usenet is no longer in fashion and is English-biased.--Zcreator alt (talk) 04:39, 10 May 2018 (UTC)
I only remember using or seeing used "widespread use" to avoid the effort of citing something that was hard to cite but within the experience of almost everyone or almost every native speaker. I don't think it has been abused.
It's harder to apply for regional use because we may not have multiple editors familiar with such use. DCDuring (talk) 09:31, 10 May 2018 (UTC)
I generally agree with DTLHS and DCDuring. When I tended RFV a lot (a tiring task Kiwima is now the most prominent doer of), I didn't consider it useful to sticklerishly make anyone (including me!) spend time typing up three citations, in those pre-aWa days, as long as they could link to them or provide search terms that would find them or something. But if there simply aren't three citations to be found, a term can't be that widespread, now can it? "Widespread use" is a useful shortcut (including against trolling nominations of words like "shirt"), IMO, but valid citations still need to exist. - -sche (discuss) 15:21, 10 May 2018 (UTC)
Trolling nominations can be handled by w:WP:SNOW without adding a vague rule.--Zcreator alt (talk) 19:57, 10 May 2018 (UTC)
How is that not about as vague as "widespread use"? Widespread use is at least understandable at first glance, whatever the theoretically possible problems with its application. BTW, does anyone have specific instances of the principle having been misapplied? DCDuring (talk) 20:25, 10 May 2018 (UTC)
Not a positive example: I think OP's position is that it isn't being applied enough: words which don't have three citations but which are used a lot online or regionally aren't being passed as "in widespread use" (because, I might argue, they aren't in widespread use). - -sche (discuss) 20:42, 10 May 2018 (UTC)
I wouldn't mind seeing examples of that. It is only after something is RfVed that "widespread use" is invoked. If the objection is to excessive use of RfV, people RfV things for lots of reasons, including being tired and cranky at the time. Also please consider the vast number of definitions that have not been challenged.
RfV are a means of documenting that a term means what our definition says it means. Existence for many words is a non-issue. RfV is especially appropriate for words not found in other dictionaries.
I would not object to easing citation requirements for terms that could be found in sources of a quality similar to DARE, though copyvio is an issue. DCDuring (talk) 21:21, 10 May 2018 (UTC)
I would object to that. The OED has a ton of garbage in it that we shouldn't be reproducing. DTLHS (talk) 21:25, 10 May 2018 (UTC)
@DTLHS, I beg your pardon, is that ironically pertinent to the topic? I could see a few parallels. Rhyminreason (talk) 11:09, 14 May 2018 (UTC)
I don't know what you mean by ironically pertinent. The OED has many words where the only citation is from some dictionary from the 18th century. They also include words that we would classify as Old English / Middle English / Scots. The point is that just copying from other reference works is going to lead to trouble since they have different inclusion criteria. DTLHS (talk) 16:08, 14 May 2018 (UTC)
I'll admit that I've invoked the clause for things that are common but not commonly committed to durable media. One example is Unsupported_titles/Hyphen_vertical_line_vertical_line_hyphen (BTW: why isn't that rendering properly?), which I chose to enter because I was and am confident it would be recognized by most or all native speakers, should any be invited (e.g. from da.wiki, since there are few here). I agree that this sort of reasoning can easily become problematic ("but I've seen it many times!"), and that we should clarify what exactly the clause means.__Gamren (talk) 17:13, 15 May 2018 (UTC)
  • I've always understood this to be shorthand for "Shall we all agree that finding three quotes is unnecessary?" But if someone disagrees, three quotes should always be demonstrated IMO. Ƿidsiþ 12:45, 17 May 2018 (UTC)
If that's all it means, I feel we should get rid of it, since in those situations supplying three quotes is really not burdensome, and in any case entries are always improved by having quotes. In my experience, RFVs of easily verifiable words are not frequent enough to be disruptive.__Gamren (talk) 18:42, 19 May 2018 (UTC)

Our representation of Polish ⟨y⟩ (my primary concern) and handling of consensus (a side note)

I recently came across the fact that we erroneously transcribe Polish ⟨y⟩ with ⟨ɨ⟩ in IPA, for no good reason. So I changed the IPA module, but I've been told to seek consensus beforehand - which I already had done to no avail. I then was told that lack of response does not equal consensus. While I find that an understandable notion, I also think that it is a dangerous one as it prevents users from fixing errors simply by the majority, frankly my dear, not giving a damn, completely going against the be bold principle that the Wiki projects were based on and ossifying those parts of Wiktionary less frequented.
Be that as it may, from my writing you see that I consider ⟨ɨ⟩ to be an error. As far as I'm aware, ⟨y⟩ is never pronounced with any sound even close to a high central vowel and authors using this character are not contesting this. The actual sound is close-mid, slightly fronted, and potentially raised to near-close. I think this is because until recent times it was [ɪ], as conservative dialects use this sound, but historical Polish is nothing I'm acquainted with. I had changed the module to use [ɪ̽] for ⟨y⟩ and think this is what we should do. (Although en.Wikipedia uses ⟨ɘ̘⟩, I believe.) For disclosure: The letter is transcribed thus in broad structural descriptions of Polish phonology, although I do not understand why. I think the reasoning behind it was that Rocławski chose to consider [ɪ̞̈] to be basically [ɪ̈], which he then represented with ⟨[ɨ̞]⟩ and then simplified to ⟨/ɨ/⟩. I find that approach too imprecise, especially considering that the actual range of the vowel never even reaches near-close according to Rocławski as presented on Wikipedia (I don't have access to the actual book), and thus not exemplary. We're a dictionary, not a work for peer review in structural linguistics; we owe it to our users that our pronunciation sections actually give them a rough idea of how to pronounce the word instead of teaching them to speak a language with a Russian accent. Korn [kʰũːɘ̃n] (talk) 09:34, 11 May 2018 (UTC)

@Wikitiki89 As the one reverting me, are you actually for using ⟨ɨ⟩? Is there actually consensus for ⟨ɨ⟩ or is it just happening to be the one implemented first by some ancient user who did a lot of work? And if you want to avoid diacritics in broad transcriptions, why use a character with one (centralising bar) instead of ⟨ɘ⟩, which does not have a diacritic and is actually close to the actual value? Korn [kʰũːɘ̃n] (talk) 11:13, 14 May 2018 (UTC)
I recently spoke with my Polish friend (he has very little exposure to Russian) about phonetics and we agreed, among other things, that the Russian "ы" and the Polish "y" sound identical = /ɨ/. If something sounds similar in similar languages, why should this be avoided? --Anatoli T. (обсудить/вклад) 11:22, 14 May 2018 (UTC)
The question is not 'why avoid ⟨ɨ⟩', that's basically an inversion of the burden of proof. The question is 'why consciously choose ⟨ɨ⟩ when we know that it is not the IPA character representing the sound most closely'? Korn [kʰũːɘ̃n] (talk) 11:38, 14 May 2018 (UTC)
@Korn: Please have a listen to Polish ryba (fish), Russian ры́ба (rýba, fish), Ukrainian ри́ба (rýba, fish) and Belarusian ры́ба (rýba, fish). Does the first vowel sound different to you? (Czech, Slovak or South Slavic have a different sound here). --Anatoli T. (обсудить/вклад) 11:41, 14 May 2018 (UTC)
They sound nothing alike to me. White Russian and Ukrainian I can differentiate roughly when hearing them back to back, but those I would count as "roughly the same vowel", but Polish and Russian sound worlds apart to my ear and both at least recognisably different from be/uk. Korn [kʰũːɘ̃n] (talk) 11:48, 14 May 2018 (UTC)
  • The Ukrainian and Belarusian files sound the same to me, as approximately [ˈrɪba ~ ˈri̠ba]. The Polish and Czech also sound the same to me, but somewhat further back than uk and be, approximately [ˈrɨ̞ba]. The Russian sounds somewhat diphthongal, approximately [ˈrɯi̯ba]. —Mahāgaja (formerly Angr) · talk 12:00, 14 May 2018 (UTC)
Well, you're both working hard on finding differences but they are just /ˈrɨ-/ to me. The differences are so minor, that I, a native Russian speaker, could think these people speak the same language. I'm sure the German pronunciation module could be another Wiktionary success story if we were trying to find an acceptable middle ground, rather than trying to nitpick. --Anatoli T. (обсудить/вклад) 12:16, 14 May 2018 (UTC)
I'm not working hard at all, they sound nothing alike to me. Korn [kʰũːɘ̃n] (talk) 12:36, 14 May 2018 (UTC)
"Sounds like" is cheap, spectrograms or bust. Comparing Polish and Russian ryba in Praat, it's obvious that they are different: Polish has an F2 in the 1800 Hz ballpark and an F1 around 500 Hz, which is much lower than high vowels such as /u/ and /i/, but also lower than English /ɪ/ and also not front. Basically an /ɘ/, as Korn said.
In contrast, Russian vowel is consistently high with an unstable F2 (the diphthongal quality described by Mahagaja). An interesting thing is how low the spectral tilt (the relative strength of higher frequencies compared to lower) is in the Russian sample compared to Polish, indicating a much slacker voice, however I'm not sure if this has any phonological relevance (velarization?) Crom daba (talk) 14:43, 14 May 2018 (UTC)

@Hergilei, Tweenk? --Per utramque cavernam 11:23, 14 May 2018 (UTC)

And @Guldrelokk too --Per utramque cavernam 11:50, 14 May 2018 (UTC)
To me Polish y does sound notably different from Russian /ы/, like something between it and the backed Russian /э/ after hard consonants. However, I would not rely on any ear except an actual phonetician’s one for purposes of sound description. Then, what symbols to use in phonemic transcription is a non-issue: hardly any vowel in the usual Danish transcription matches its ‘reference’ phonetic realisation, and for good reason. On the other hand, it is important not to deviate from the major sources on Polish phonology, and there is clearly a preference and a tradition to write /ɨ/. I would stick to it. Guldrelokk (talk) 15:48, 14 May 2018 (UTC)
I wouldn't call the crossbar on ⟨ɨ⟩ a diacritic for IPA purposes since it can't be added to any vowel letter (the centralizing diacritic is the dieresis). And it's true we should avoid diacritics in phonemic representation as much as possible (it isn't always possible, e.g. for phonemically nasalized vowels). So the three main contenders for the phoneme of the sound in question are /ɪ/, /ɨ/, and /ɘ/. Which of these symbols is most commonly used in the literature? I haven't read a lot about Polish phonology, but my impression is that /ɨ/ is the most common (Wikipedia uses it in phonemic representation, reserving [ɘ̟] for the discussion of its precise realization). If it's true that the literature uses /ɨ/ more than /ɪ/ and /ɘ/—especially if it uses it much more—then I think that's what we should stick with. It's not our job to invent new phonological transcriptions for languages that already have well-established traditions. We can explain at Appendix:Polish pronunciation that /ɨ/ is realized in the range of [ɘ̟ ~ ɨ̞] or whatever it is. Whether the Polish and Russian vowels are identical or different is irrelevant: the vowels of German bieten and English beaten aren't identical either, but we transcribe both as /ˈbiːtn̩/ —Mahāgaja (formerly Angr) · talk 11:42, 14 May 2018 (UTC)
You do not expect that hiding the actual realisation in a hard-to-access page will cause our average user to simply leave Wiktionary with a wrong idea of how the word he looked up is pronounced? Also, please explain how commonality in linguistic works has relevance to our dictionary entries. That something is the most common transcription is a non sequitur description of facts, not an argument. Korn [kʰũːɘ̃n] (talk) 11:50, 14 May 2018 (UTC)
I reject your premise that the Appendix:Polish pronunciation is a hard-to-access page as well as your implication that the average user will be better served by the use of obscure symbols. I know professors of phonology who are unfamiliar with the symbol ⟨ɘ⟩ because it's so rarely used in the literature (cross-linguistically, I mean, not specifically for Polish). I suspect that more than 75% of the uses of ⟨ɘ⟩ in the wild are simply errors for ⟨ə⟩ rather than intentional uses of ⟨ɘ⟩. Commonality of usage in other works is relevant because we are not (and should not be) anyone's sole resource for any language. If a Polish learner encounters ryba transcribed /ˈrɨba/ in all his other resources but /ˈrɘba/ here, he'll be confused. He probably won't assume we're just using a different symbol for the same sound, more likely he'll assume we're saying it's pronounced differently than the other materials say (which we aren't), or he'll assume it's an error on our part and will attempt to fix it. —Mahāgaja (formerly Angr) · talk 12:00, 14 May 2018 (UTC)
Well, you looked up mysz. How do you enter Appendix:Polish pronunciation from there? What tells you this page exists in the first place? My argument is simple: Our pronunciation section is meant to tell users how to pronounce a word. When we say the pronunciation section uses IPA, then our users will expect that the glyphs used represent those values assigned to them by the IPA. Therefore, if we use the glyphs assigned to the vowels in question by the IPA, our pronunciation section will tell users how to pronounce a word. I've yet to hear an actual argument as to how using the wrong glyph will benefit anyone or fulfill the purpose of the pronunciation section. [Edit conflict: Mahagaja has since brought forth confusion with other learning materials.] As for your disliking of ⟨ɘ⟩, which can be looked up literally within a second in the IPA vowel chart, which I expect anyone who knows IPA in the first place to be familiar with, I would like to remind you that my proposal was a centralised ⟨ɪ⟩. Korn [kʰũːɘ̃n] (talk) 12:03, 14 May 2018 (UTC)
I could live with ⟨ɪ⟩ if it's more common in Polish learning materials and phonological literature than ⟨ɨ⟩, but I'm not sure that's the case. At mysz, you click on "key", which is a pretty intuitive thing to do when confronted with a set of symbols you might not be familiar with. —Mahāgaja (formerly Angr) · talk 12:15, 14 May 2018 (UTC)
Here's the crux: As someone who cannot read IPA, you might do that. I wouldn't. I would do the alternative method: Look up IPA on a search engine or Wikipedia. So there's a 50:50 chance (based on the number of options, lacking actual user data) that people will find our key. As someone who knows IPA, or has just looked it up as I described, you expect to be familiar with the symbol and have no reason to expect that we're, basically randomly because other people did that too, reassigning them to new values they're not meant for, unless you already have consumed the works of these people we're copying, in which case you don't actually need the pronunciation section because you are already familiar with Polish phonology. If, however, you encounter an IPA symbol you are not familiar with, you will look it up if you want to know what it means. Thus using the appropriate symbol creates fewer pitfalls for the actual target audience of the Pronunciation section. Korn [kʰũːɘ̃n] (talk) 12:23, 14 May 2018 (UTC)
There is a secondary influence on the sound of the vowel that most of you are missing. Both Polish and Russian have hard and soft consonants, but the Russian hard consonant is much, much harder than a Polish hard consonant. For most consonants (such as B, V, P, M), the Polish hard consonant is approximately the same as an English consonant (the Polish L/Ł, English L in various environments, and Russian hard and soft Л being excepted), and the vowel /Y/ after it seems reasonably pure. The Russian hard consonants, including Б, В, П, and М, are much harder, and this hardness strongly influences the sound of a following /Ы/. You have to be familiar with Russian phonology in order to judge Polish /RY/ versus Russian /РЫ/. If you're not familiar with the hard Russian consonants and their effect on /Ы/, then you will think you're hearing an ordinary consonant followed by a weird Russian vowel. Most of the "weirdness" belongs to the consonant.
I think the hard Polish /R/ versus the Russian /Р/ complicates the comparison. It is easier with Polish /M/ versus Russian /М/. So try listening to Polish my (we) and Russian мы (my, we). You still need to be familiar with Russian hard consonants, but with this example, I think it's easier to note and subtract the influence of the hard /М/. After subtracting the influence of the Russian hardness, the Polish /Y/ and Russian /Ы/ sound very much the same to me. Another way to look at it: if you put the Russian /Ы/ after a Polish /R/ or /M/, you get /RЫ/ or /MЫ/, which is identical to /RY/ or /MY/. —Stephen (Talk) 15:23, 14 May 2018 (UTC)
Aside from Crom daba citing what I take to be his own Praat usage, here is the allophonic range according to Rocławski, and here is the chart per Jassem. It really has no relevance how close some Russian allophone gets to a Polish allophone, sections about language X must be based on language X, not be based on language Y. Korn [kʰũːɘ̃n] (talk) 19:35, 14 May 2018 (UTC)
Good point on the /r/ @Stephen G. Brown, it also occurred to me that it might cause some lowering, but a thorough exploration of Polish phonetics is beyond the scope of this project and somebody probably already did it much better than I ever could.
Comparing my, same trends are again present, although Polish is not as low (F1 mostly under 450Hz which is somewhere between /ɪ/ and /ʊ/) and Russian is somewhat lower for an interval before going completely closed (here's the contour, the vowel starts a bit after the sudden F2 rise).
I think it's silly to talk about the hardness affecting /Ы/, it is the hardness. Some analyses consider it an allophone of /i/ after hard consonants; we could write /mˠi/, but I don't think that would be helpful. Crom daba (talk) 20:52, 14 May 2018 (UTC)
I'm usually sympathetic to the idea that we should use the nearest simple symbol. I do note, however, that even in narrow transcription pl.Wikt uses [ɨ] (in ryba and my), and perusing works on Polish phonology, I see Edmund Gussmann's Phonology of Polish also uses [ɨ] (in rybak and my), and Jerzy Rubach's Cyclic and lexical phonology: the structure of Polish analyzes the (dia)phoneme as //ɨ// (in e.g. absolutyzm). Perhaps /ɨ/ can be used in broad transcription, and our narrow transcription can give the actual vowel? - -sche (discuss) 15:55, 14 May 2018 (UTC)
There are a lot of things to consider, such as the variation between different speakers, different regions, different social classes, etc. That sort of stuff is better left to the experts. As a dictionary, we should just stick to the common conventions. --WikiTiki89 22:05, 14 May 2018 (UTC)
If Wikipedia is right, /ɨ/ is realised as /ɪ/ in some dialects and it's the older pronunciation (?) of "y", that's why it's rhyming with "i" in poetry. Well, in East Slavic languages "и" rhymes with "ы" (ru) (equivalents in Ukrainian and Belarusian і (i)/и (y) (uk), і (i)/ы (y) (be)). Czech and Slovak definitely lack /ɨ/, so Czech is /ˈrɪba/, not /ˈrɨba/. --Anatoli T. (обсудить/вклад) 02:05, 15 May 2018 (UTC)
Pity User:Kephir has left us. I'm sure he could clarify his decisions regarding /ɨ/ in Module:pl-IPA. --Anatoli T. (обсудить/вклад) 02:09, 15 May 2018 (UTC)
He'll have used it because everyone uses it, can't blame him. @Wikitiki89 To our best knowledge, which completely agrees with my personal experience for what such anecdote is worth, the allophonic range is this with conservative speakers using [ɪ], which agrees with the neighbouring Slavic languages of Bohemian (which merged /ɪ/ and /i/), Ukrainian (which uses /ɪ/) and Upper Sorbian (which uses /ɪ/) realising the phoneme in a similar manner, as well as, medium fetched, with the fact that local/neighbouring Germanic lects realise /ɪ/ as [ɪ̈~ɘ̘] (a situation continuing well into Dutch territory). So there is no evidence that this vowel developed from [ɨ] nor that it's currently moving upwards to it within living memory or recorded history. Nobody has claimed anything different, neither here nor in the sources brought forth. The common convention is misuse of the IPA, you could just as well represent the vowel as [ə]. Korn [kʰũːɘ̃n] (talk) 09:30, 15 May 2018 (UTC)
@Korn: Most common or standard pronunciation of "y" is not /ɪ/ but /ɨ/. Please listen to this video made by a native speaker aimed at teaching the sound, which does cause some issues to foreigners, unlike /ɪ/ in the neighbouring Czech Republic and Slovakia. Especially words był or dysk. In YouTube, please enter "osiEeCjQAIM" in the search. If you can't pronounce /ɨ/, /ɪ/ will work for you. --Anatoli T. (обсудить/вклад) 09:44, 15 May 2018 (UTC)
@Korn: Another good example is video "aJI6JDAxUd4" ("Polish Pronunciation Guide") by a Polish girl. Listen to cytryny and other examples. A good clear recording. Someone already did the analysis of the Polish phonology, why are we reviewing it? --Anatoli T. (обсудить/вклад) 09:54, 15 May 2018 (UTC)
Anatoli, someone indeed already did the analysis of Polish phonology, which is why we have hard technical proof and at least 40 recorded years of scientific consensus about the specific make-up of the vowel and its allophonic range, which is [ɪ~ɪ̞̈~ɘ̘]. I'm not sure why you keep providing recordings of Poles saying things. As an aside, if this really is the range of Russian /ɨ/, then it's basically 'any central vowel which is not /ɐ/' and not a suitable reference for anything, not that it would matter either way. Korn [kʰũːɘ̃n] (talk) 10:53, 15 May 2018 (UTC)
I think that range includes the unstressed ы, which has a wider range. Also, the range you gave for Polish includes ɨ. --WikiTiki89 11:58, 15 May 2018 (UTC)
'ɨ' is a central close vowel. So the range I gave does not include that, even if you construe /ɨ/ to also cover a near-close central vowel, as Polish Y is moving along an axis of either near-close and front or close-mid and central, with standard language apparently being centered around the latter. We're beginning to turn in circles. Is this too specific a topic to have a vote over? Korn [kʰũːɘ̃n] (talk) 13:36, 15 May 2018 (UTC)
Yes, ɨ covers the near-close central unrounded vowel as well in less precise transcriptions, like ours. That's why it's labeled as such in the diagram you linked to. --WikiTiki89 15:02, 15 May 2018 (UTC)
@Korn: Do you think you could provide a list of online academic references that support your proposal? Or at least non-academic sources that support the proposal? This can be put to a vote, but one that is likely to fail given the opposition above; you can take a chance, though. --Dan Polansky (talk) 13:53, 15 May 2018 (UTC)
@Dan Polansky My proposal is to use the appropriate glyph (AFAIK nobody is arguing that the vowel was not mid-close in standard language) instead of the overwhelmingly common one. What would support that? Korn [kʰũːɘ̃n] (talk) 14:08, 15 May 2018 (UTC)
@Korn I don't know. The problem is, with no academic references to back it up, and with people who know something about the matter opposing, the proposal may be difficult to sell. --Dan Polansky (talk) 14:17, 15 May 2018 (UTC)
Opposition will also come from those of us who think being consistent with what other sources say is simply more important than being "right". We do this for our English transcriptions all the time: almost no variety of English uses cardinal IPA [ʌ] as the strut vowel, but virtually everyone (including us) uses /ʌ/ to transcribe it, and we do it because it's less confusing for our users to encounter the same symbol they're used to from elsewhere than to use the symbol that most precisely matches the vowel sound. —Mahāgaja (formerly Angr) · talk 14:29, 15 May 2018 (UTC)
@Mahagaja To quote the ever so venerated Captain Picard: There are four lights. - For one, [ʌ] at least is a symbol for a sound that is and was ever used by native speakers of English. For the other, I have to say that I literally do not understand those who hold the opposing view. It's inconceivable to me, so it would help me if you explain. I can name the tangible harm I see coming from the pronunciation section as is: People pronounce it wrong. That means the pronunciation section fails to fulfill its very purpose for existing. Can you please name the tangible harm you expect to come from using the actual symbol? Can you please also explain 1. why you expect confusion to arise in the first place (including why you expect people able to read IPA to not be able to read most IPA or look up unknown IPA) and 2. why you value avoiding this confusion more than providing people with the correct pronunciation? Korn [kʰũːɘ̃n] (talk) 19:55, 15 May 2018 (UTC)
We have to leave people who don't know IPA out of consideration, because they will not understand any of the symbols we use, neither the one(s) you prefer nor the one I prefer. People who do know IPA will not actually be at much risk of mispronouncing the words if we use /ɨ/, because as a phonemic representation, /ɨ/ doesn't mean "a close unrounded central vowel", it means, in the context of Polish, "the vowel of mysz and of the stressed syllable of ryba". It's kind of like how in this image, c doesn't refer to a specific length, but just to √(a² + b²), whatever a and b may be. People will learn the exact realization of the vowel from listening to native speakers. —Mahāgaja (formerly Angr) · talk 20:28, 15 May 2018 (UTC)
The above note about the triangle makes no sense: there, the three letters refer to three specific lengths, and these happen to be bound by the formula. Again, 'c doesn't refer to a specific length' makes no sense to me, and I see no analogy to the discussed subject at all. The dependence in the image is mutual; c can be determined from a and b, but also a can be determined from c and b. --Dan Polansky (talk) 05:39, 16 May 2018 (UTC)
Note that this exact same discussion breaks out periodically at w:Polish phonology, where the /ɨ/ won out based on convention in other sources. While I agree that /ɘ/ is a better representation, I would be reluctant to defect from the way Wikipedia did things since this is the source most of our users probably rely on. Perhaps this discussion should be taken there (for the n-th time), if it wins there I'm sure no one will object to its introduction here. Crom daba (talk) 11:31, 16 May 2018 (UTC)
@Dan Polansky: What I mean by the triangle comparison is that c in the triangle image doesn't stand for a specific measurement like "25 mm"; it's a variable whose value depends on the context it occurs in. The same is true of IPA symbols. We can't measure the formants of a vowel in isolation and say "this is /ɨ̞/, not /ɘ̝/" (or vice versa), because the symbols are only interpretable in the context of other vowels in the same system. The closest, frontest, most unrounded vowel that exists in a language is that language's /i/; the closest, backest, most rounded vowel is that language's /u/; the openest, backest, most unrounded vowel is that language's /ɑ/. If there's a back rounded vowel whose height is between /u/ and /ɑ/, then that's that language's /o/. If there are two vowels in that area, then the closer one is /o/ and the opener one is /ɔ/. That's why two very different-sounding vowels in two different languages can correctly be transcribed with the same symbol. I'm sure if Crom daba ran File:en-us-moot.ogg and File:de-Mut2.oga through Praat (s)he would find wildly different formants between them, but that doesn't make it wrong to transcribe them both /muːt/, because both vowels are /u/ within the context of their languages. —Mahāgaja (formerly Angr) · talk 23:12, 16 May 2018 (UTC)
I don't want to confuse people by using a different symbol than all the other general references for the same sound- especially for a distinction that only dogs, bats, dolphins and trained phoneticians will even notice. Chuck Entz (talk) 04:30, 17 May 2018 (UTC)
 :p --Per utramque cavernam 09:48, 17 May 2018 (UTC)
The fact that we use /æ/ rather than /a/, not only for the American vowel which is closer to /æ/ but also for the British vowel that arguably is /a/ (as Widsith mentions in a thread below), seems to rebut the notion that "the closest, frontest, most unrounded vowel that exists in a language is that language's /i/; the closest, backest, most rounded vowel is that language's /u/; [etc]"; clearly consideration is sometimes given to what IPA symbol the vowel is actually nearest to, and also sometimes the symbols at the extremities are avoided even when they're accurate. Anyway, why not just continue to analyse this Polish vowel in the normal/traditional way in broad phonemic transcriptions and give the actual vowel in narrow transcriptions? - -sche (discuss) 14:47, 17 May 2018 (UTC)
An example of that can be viewed at pięść. While not my preferred way, at least that puts the actual pronunciation on the page, so if push comes to shove, I'll support it. Korn [kʰũːɘ̃n] (talk) 20:58, 19 May 2018 (UTC)
I would suggest having the narrow transcription be generated by the same module that's generating the broad transcriptions. Unfortunately, that takes technical skill. - -sche (discuss) 21:10, 19 May 2018 (UTC)
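To make the suggestion above concrete, here is a minimal sketch of deriving a narrow transcription from the same broad string. It is written in Python purely for illustration and does not reflect how Module:pl-IPA is actually implemented; the function names and the mapping table are hypothetical, and the narrow value chosen for /ɨ/ is just one of the candidates mentioned in this thread, not an agreed one.

```python
# Hypothetical table of broad (phonemic) symbols and the narrow (phonetic)
# realization to show for each. Only one entry is given; the value for "ɨ"
# is an assumption taken from this discussion, not a settled choice.
NARROW_REALIZATIONS = {
    "ɨ": "ɘ\u031f",  # ɘ plus the "advanced" diacritic (U+031F)
}


def broad_to_narrow(broad: str, realizations=NARROW_REALIZATIONS) -> str:
    """Replace each broad symbol that has a listed realization; keep the rest."""
    return "".join(realizations.get(symbol, symbol) for symbol in broad)


def format_pronunciation(broad: str) -> str:
    """Show the conventional broad form first, then the derived narrow form."""
    return f"/{broad}/, [{broad_to_narrow(broad)}]"
```

Because the narrow form is computed from the broad one, the two cannot fall out of sync, and changing the preferred narrow symbol would mean editing a single table entry rather than individual entries.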
As I was told that module editing should follow consensus, I suppose having someone say 'sure' is a prior concern. Korn [kʰũːɘ̃n] (talk) 17:38, 22 May 2018 (UTC)
It's also worth considering whether it mightn't be more intelligible to people if we change the module's current t͡ʂ to tʂ and its current tʂ to tʂː, which is used in our narrow pronunciations (as tʂʂ) already anyway and depicts that the difference between cz and trz is a purely chronemic one. Korn [kʰũːɘ̃n] (talk) 21:46, 22 May 2018 (UTC)
Can someone please provide me a link to a sample audio where I can hear a Polish stressed "y" pronounced as [ɘ̟], not [ɨ]? --Anatoli T. (обсудить/вклад) 03:47, 25 May 2018 (UTC)
  • @Korn [kʰũːɘ̃n], others: Side-question: is a chroneme the same thing as a mora? ‑‑ Eiríkr Útlendi │Tala við mig 18:09, 31 May 2018 (UTC)
    No. Moras are used to measure the "length" of a word. Chronemes are just an abstraction of the "thing" that lengthens a vowel or consonant. Adding a chroneme to a word may add a mora, but they are not the same thing. --WikiTiki89 18:30, 31 May 2018 (UTC)

@Korn: A question: how would you transcribe nasal vowels in a narrow phonetic transcription? I hear a diphthong at język. --Per utramque cavernam 16:30, 5 June 2018 (UTC)

From all I've read it's /jɛ̃w̃zɘk/, which is close enough to what I hear there. I mention this in the section of Polish IPA a few headers below this one. Korn [kʰũːɘ̃n] (talk) 17:03, 5 June 2018 (UTC)

Negative polarity items[edit]

I've just created Category:Negative polarity items by language. Do you think this is viable? @-sche, DCDuring --Per utramque cavernam 19:20, 11 May 2018 (UTC)

You could add a lot of know phrases, like know someone from Adam and its alphabetical neighbours. Equinox 19:24, 11 May 2018 (UTC)
I don't see why not. It seems orthogonal to the longstanding issue that by lemmatizing the positive forms and creating redirects to them from the negative forms, we may confuse readers who look up not know someone from Adam, don't notice they've been redirected, and find it defined as "know or recognise someone", the opposite of what the term they looked up means. At least the category usefully groups such entries. - -sche (discuss) 19:31, 11 May 2018 (UTC)
I added a few items. (It was fun.) The category seems useful. Wouldn't one for positive items be similarly useful?
Many MWEs that include any are NPIs, but some are SoP. Should polarity influence our inclusion/exclusion decision? What connection should there be between [[any]] and the category? How about things like budge and red cent? Are we capable of formulating or finding explicit criteria for inclusion of lexical items in the category?
We have lots of cases where the positive form of a term often used in an NPI exists. I suppose that if the positive form exists only in questions with a negative expected answer, we can still say it is an NPI. But usually there are some other, often not very common, even contrived, positive form uses. DCDuring (talk) 23:31, 11 May 2018 (UTC)
  • I found an interesting and useful 1-page pdf filled with examples of NPIs. DCDuring (talk) 15:23, 12 May 2018 (UTC)
    As negative polarity is often a feature of a definition rather than a PoS (let alone a word), we should be populating this from {{lb}}, unless it isn't up to the job. DCDuring (talk) 17:00, 12 May 2018 (UTC)
    I have created User:DCDuring/Negative_polarity_items based, so far, only on CGEL (2002). Whether the redlinked MWEs which are SoP should be included merely because they are negative polarity items in some usage is a policy decision or a policy-application decision. DCDuring (talk) 17:37, 12 May 2018 (UTC)

English pronunciation template and module[edit]

Is it possible to create a template that would automatically generate phonetic English pronunciations in different dialects/accents? I was looking up the entry now today hoping to find some phonetic pronunciations in various English dialects, but all I found was the rather unhelpful /naʊ/. This across-dialect automation seems to be lacking, although it looks like a largely automatable job, using diaphonemes/phonemic forms, with parameters for each dialect (accepting phonemic IPA) that would allow overriding.

Wikipedia has International Phonetic Alphabet chart for English dialects, which has a rather impressive comparison of the English varieties. It doesn't necessarily need to be as elaborate as that; even something as crude as the following would be immensely helpful and educational:

A similar model is {{vi-IPA}} used for Vietnamese. Thanks! Wyang (talk) 11:17, 14 May 2018 (UTC)

I'd support this. This is information that we're definitely lacking, especially for underrepresented but important dialects other than RP and GA, like Australian, NZ and Canadian English. — justin(r)leung (t...) | c=› } 18:54, 14 May 2018 (UTC)
I think that is a good idea. We tend to be somewhat centred on what our most active user in that field considers to be proper language, and making more of these modules, thus establishing a normalcy of variety, will encourage the same for other languages. Korn [kʰũːɘ̃n] (talk) 09:39, 15 May 2018 (UTC)
@Wyang: If you find some data available, go for it, Frank!
I would also support the use of some form of phonetic respelling for English, even if it's not exposed to end users. It proved very useful for a variety of languages for automating pronunciation. --Anatoli T. (обсудить/вклад) 02:53, 16 May 2018 (UTC)
I'm nowhere near as familiar with English phonology as a lot of the other editors here... hopefully this can initiate some discussion and be the catalyst for a more standardised and systematic coverage of regional English pronunciations. Wyang (talk) 04:36, 16 May 2018 (UTC)
@Erutuon? --Per utramque cavernam 07:34, 16 May 2018 (UTC)
I too support this idea. I also think that the template should generate (diaphonemic?) enPR, so we can phase out having multiple phonemic transcriptions that can fall out of sync. —Μετάknowledgediscuss/deeds 04:16, 16 May 2018 (UTC)
Just wondering: is English pronunciation uniform enough for this to work? Also, am I correct in assuming that what is being proposed is a template that would generate pronunciations in various varieties of English based on what is input for RP and/or GA? — SGconlaw (talk) 08:16, 16 May 2018 (UTC)
To be precise, a template merely invokes a module. The question isn't proposing a concrete implementation, it's merely specifying what the implementation should achieve. Text-to-Speech and vice versa exist, so the question is not whether it's possible in principle -- yes, it is, to varying degrees of effort. But how and who, that's the question.
Of course I cannot but remark that English orthography makes this really hard for school children and not any easier for programmers.
I mean it's not a simple task. The assumption, e.g. by Anatoli, seems to be that manual labour is required to normalize the spelling first. Instead, my first thought was to use established machine learning algorithms that compare recordings and transcripts to generate probabilistic models with over 99% accuracy. Some don't even use transcripts. That's just my two cents.
Normalized "authorgrefy" piques a tangential interest, to be sure. Rhyminreason (talk) 17:17, 16 May 2018 (UTC)
It could probably be made to work for most words, enough that it'd be useful, if we allow manual overrides and additions for when a dialect has an unexpected way of pronouncing something either instead of or in addition to the way one would 'expect' based on other dialects. For example, apparently RP and GenAm both pronounce margarine with /dʒ/, but RP exceptionally also pronounces it with /ɡ/. Other complicated words include aunt, eschew, pecan, quahog, and pwn. The "input" would probably have to be a fictional dialect that lacks any of the losses/mergers of real dialects, since the module might not be able to reliably re-insert /ɹ/ into GenAm from RP input (unless we start always using /(ɹ)/?), couldn't reliably un-merge Mary-marry-merry from GenAm input, etc. - -sche (discuss) 17:24, 16 May 2018 (UTC)
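The approach described above (a merger-free input, per-accent rules, and manual overrides) could be sketched roughly as follows. This is only an illustration in Python, not a proposal for the actual module: the token inventory, the rule tables and the function name are all hypothetical, and only the Mary-marry-merry merger mentioned above is modelled.

```python
from typing import Dict, List, Optional

# Hypothetical rules: each key is a token of the merger-free input notation,
# each value is what a given accent does with it. Tokens without a rule are
# passed through unchanged.
ACCENT_RULES: Dict[str, Dict[str, str]] = {
    "RP": {"ɛəɹ": "ɛəɹ", "æɹ": "æɹ", "ɛɹ": "ɛɹ"},  # three-way contrast kept
    "GA": {"ɛəɹ": "ɛɹ", "æɹ": "ɛɹ", "ɛɹ": "ɛɹ"},   # Mary-marry-merry merger
}


def transcribe(
    tokens: List[str],
    overrides: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Derive one transcription per accent; an override replaces the derived
    form for that accent and may list several pronunciations."""
    overrides = overrides or {}
    result: Dict[str, List[str]] = {}
    for accent, rules in ACCENT_RULES.items():
        if accent in overrides:
            result[accent] = list(overrides[accent])
        else:
            result[accent] = ["".join(rules.get(t, t) for t in tokens)]
    return result
```

With input tokens such as `["ˈm", "ɛəɹ", "i"]`, `["ˈm", "æɹ", "i"]` and `["ˈm", "ɛɹ", "i"]`, the three words stay distinct for RP and collapse to one form for GA. A case like margarine, where one accent has an additional unexpected variant, would be handled by passing both forms for that accent through `overrides`.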
Quite apart from the problems of lexical incidence and recoverability, we'd need separate symbols for each of Wells's lexical sets, which gets complicated for the bath and cloth sets since they merge with different sets in RP and GA. The bath words are even more complicated in Australia, where some of them go with palm/start and others with trap. We'd have to split nurse up for Scottish accents that still distinguish /ɪr/, /ɛr/, and /ʌr/; we'd have to split goose up for Welsh accents that distinguish choose and chews; we'd have to split face and goat up for those accents (East Anglian? I'm not sure) that distinguish made from maid and don't rhyme toe and snow; and somehow we'd have to accommodate the old-fashioned New England accents that distinguish road from rode. That's just for starters. —Mahāgaja (formerly Angr) · talk 20:11, 16 May 2018 (UTC)
I don't think anyone expected this to be useful for all English dialects, especially more poorly documented ones. Just producing GA and RP would be great and reduce potential for error in many entries. —Μετάknowledgediscuss/deeds 21:38, 16 May 2018 (UTC)
Oh, so the template would generate pronunciations from ordinary spelling? Can it deal with cases like "bough, enough, thorough, though", to pick one common example? — SGconlaw (talk) 22:02, 16 May 2018 (UTC)
No. —Μετάknowledgediscuss/deeds 22:18, 16 May 2018 (UTC)
No indeed. If this happens at all, it's clear we'll need some kind of respelling to use in a parameter, just as {{fr-IPA}} already has for cases like ville /vil/ vs. fille /fij/. —Mahāgaja (formerly Angr) · talk 22:52, 16 May 2018 (UTC)
I see ... so what exactly is the template being proposed here? — SGconlaw (talk) 19:33, 17 May 2018 (UTC)
What's the distinction of rode/road? Does one of them belong to the hoarse set? Korn [kʰũːɘ̃n] (talk) 22:22, 16 May 2018 (UTC)
No; there seems to be (or to have been, since the accent in question is very old-fashioned and possibly extinct by now) a separate phoneme /ɵ/ in parts of New England (especially New Hampshire and Maine), that was used in some goat words like coat, road, smoke, stone, home, whole, but not in others like shoat, rode, own, knoll. I have no idea where it came from, since it forms minimal and near-minimal pairs with /oʊ/, and it doesn't come from the Middle English /oː ~ ɔu/ contrast (the toe/snow contrast mentioned above). Indeed, the ancestors of road and rode were homophones in Middle and Old English, so it's hard to imagine why they split in this small section of the English-speaking world. /ɵ/ occurs only in closed syllables and thus patterns like a lax or "checked" vowel, but is still distinct from both /ʊ/ (by being more open) and /ʌ/ (by being rounded). —Mahāgaja (formerly Angr) · talk 22:46, 16 May 2018 (UTC)
  • This could be a good way of settling a long-running dispute on whether to use the Oxford or Cambridge phonemes for British English (i.e. /a/ versus /æ/, /ʌɪ/ versus /aɪ/ etc.) by implementing both of them. However I would note that a lot of words are pronounced VERY differently in different regions and it would take more than automated transcription to deal with them – stress, for instance, often comes in different places in US/UK English (clitoris comes to mind, for some reason) and a few words are just wildly divergent (e.g. croissant – kre-SAHNT versus KWA-son). (We used to have a category for such words; has it disappeared?) Ƿidsiþ 12:42, 17 May 2018 (UTC)
What was the category called? I don't remember that. DTLHS (talk) 18:18, 17 May 2018 (UTC)
You're probably thinking of the awkwardly-named Category talk:Pronunciations wildly different across the pond, the contents of which are now hosted at the still-awkwardly-named Appendix:English words with pronunciations wildly different across the pond. (See also Category talk:English words with different meanings in different locations.) - -sche (discuss) 19:26, 17 May 2018 (UTC)
Yeah, that was it! Ƿidsiþ 04:35, 18 May 2018 (UTC)
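
The approach discussed in this thread (a dialect-neutral input that keeps every contrast, per-dialect tables, and manual additions for unpredictable variants) could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `VOWELS`, `EXTRA`, `derive` and `pronunciations` are hypothetical and do not correspond to any existing Wiktionary module, only a handful of Wells's lexical sets are listed, stress is ignored, and the /ɡ/ variant of margarine is an unstressed, illustrative transcription.

```python
# Hypothetical sketch; none of these names exist in any Wiktionary module.

# Vowels are written as Wells lexical-set keywords so that no real dialect's
# mergers are baked into the input. Only a few sets are listed here.
VOWELS = {
    "RP": {"TRAP": "æ", "BATH": "ɑː", "START": "ɑː", "FLEECE": "iː", "commA": "ə"},
    "GA": {"TRAP": "æ", "BATH": "æ", "START": "ɑɹ", "FLEECE": "i", "commA": "ə"},
}

# Manual additions for pronunciations the tables cannot predict,
# keyed by (word, dialect). The transcription is illustrative and unstressed.
EXTRA = {("margarine", "RP"): ["mɑːɡəɹiːn"]}


def derive(tokens, dialect):
    """Map each lexical-set keyword to the dialect's vowel; pass other symbols through."""
    table = VOWELS[dialect]
    return "".join(table.get(tok, tok) for tok in tokens)


def pronunciations(word, tokens, dialect):
    """Return the derived pronunciation first, then any manual additions."""
    return [derive(tokens, dialect)] + EXTRA.get((word, dialect), [])


bath = ["b", "BATH", "θ"]
margarine = ["m", "START", "dʒ", "commA", "ɹ", "FLEECE", "n"]
print(pronunciations("bath", bath, "RP"))
print(pronunciations("bath", bath, "GA"))
print(pronunciations("margarine", margarine, "RP"))
```

Putting rhoticity inside the START entry sidesteps the question of re-inserting /ɹ/, but the complications Mahāgaja lists (split sets, lexical incidence, differing stress) would each need either more keywords or more manual entries.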

'No equivalent expression' template or parameter

Following a discussion at the Tea room about translations of 'the curse' (a very old name for periods), I have updated the translation table so that translations of 'menstruation' are found at that word's page and translations reflecting the nature of the expression in English can be found at 'curse'.

Edit: let's talk about moobs instead (still not ideal, but the best I can think of).

I wonder if there should be a 'no equivalent expression' template for people to use, though, which could express this whilst still allowing for people to give the best translation available. It could reveal some interesting information about some languages. Kaixinguo~enwiktionary (talk) 07:45, 15 May 2018 (UTC)

I thought we had a template for "no translation exists", at least for "translations" of e.g. she into languages that don't have gendered pronouns (or don't have pronouns), but apparently I was just thinking of the {{qualifier}}s she uses for Dyirbal etc. Situations like that, and languages having only verbal and not adjectival constructions for cold, and languages having no jocular/pejorative/etc but only literal translations for things, are common enough that templates for at least the first and third cases seem like they'd be useful. Maybe the first could just say something like "no term for this exists" and the other could say something like "no comparable term exists, use maldita menstruação, literally 'cursed menstruation'". - -sche (discuss) 15:06, 15 May 2018 (UTC)
I'd be concerned about the overuse of any generic "no translation exists" template. I like what was done at curse. Glossing an FL term for which "no English translation exists" may be difficult, but must be attempted. Perhaps only a non-gloss definition is possible or a usage note is required. DCDuring (talk) 15:33, 15 May 2018 (UTC)
@DCDuring: I'm suggesting this for translation tables from English.
@Kaixinguo~enwiktionary: How would that integrate with our translation-facilitation system? If incorporated, wouldn't it generate a lot of easy-way-out non-translations or failures to search for nearly-right translations? If it were something only serious contributors had effective access to (ie, knowledge of), the potential for overuse would be much less. Even then, imitation would still become likely as use of the template spread. DCDuring (talk) 11:07, 16 May 2018 (UTC)
If we do this via a template, especially if we use separate templates for cases like she where no translation at all exists because a language doesn't have gender or pronouns, and cases where a term has to be rendered by an unidiomatic description, we can track and periodically re-examine which entries use the templates, especially the first one. - -sche (discuss) 17:28, 16 May 2018 (UTC)
@DCDuring: I'm not sure where the concern about overuse or generating non-translations comes from. One would have to be at least near-native or even a native speaker to say definitively whether an equivalent expression exists in the foreign language, so in that respect the bar is set higher for such translations, not lower. The question of searching for translations or not isn't that relevant; either one has the level in a language to say whether an equivalent exists, or one does not. You could argue that it could be misused, but that's the case for all translations as it stands. There is very little, if anything, to stop bad translations being added anyway: the potential for bad edits or misuse is the same. Kaixinguo~enwiktionary (talk) 10:36, 17 May 2018 (UTC)
My concern is not specific to translations. I think wherever we have options in our templates that provide an easy way out of doing some work, those options are misused. For a long time {{en-noun}} had only two options. As a result many nouns whose only offense was that the plural was rare or, in any event, not known to the person adding the template were marked as uncountable. Now {{en-noun}} has more options that cover some of the possible cases. But sadly, no one (I include myself.) finds it much fun to review all of the instances of {{en-noun}} to check whether they are correct. Maintenance only works if the task is relatively modest in scope. Initially instances in which no-translation-exists is indicated will be few and easily reviewed. When the category has numerous members it will become difficult to find the recent additions, at least when there are more new additions between reviews than fit in a reasonably sized list of recent additions to the category (assuming such a system of categories is created). DCDuring (talk) 14:30, 17 May 2018 (UTC)
Just to clarify, I'm not suggesting 'no-translation-exists', what I suggested above is 'no equivalent term or expression', which is different. Kaixinguo~enwiktionary (talk) 18:20, 17 May 2018 (UTC)
@DCDuring: What if this could only be used if near or literal translations were added at the same time, which would address your thoughts about it being an 'easy way out'? For example at 'moobs', *Language name: {{t|no equivalent|use|[example]|language code}} Kaixinguo~enwiktionary (talk) 19:58, 17 May 2018 (UTC)
@-sche: I know I used that as an example, as it's what prompted the suggestion, but let's move away from the 'the curse' example, as I suspect most people here are not comfortable discussing even the term. Someone please suggest another example, if possible. But on your point, I think 'maldita menstruação' shouldn't be put unless it is idiomatic in Spanish; that is the point. Kaixinguo~enwiktionary (talk) 05:22, 16 May 2018 (UTC)
I think maldita menstruação or some similarly SOP expression of the concept should be included as 'how this would be translated into Portuguese if it had to be translated into Portuguese'; the first part of the text takes care of clarifying that no direct translation exists, but it's not as if the idea couldn't be rendered into other languages. We already do this without any explanatory preface in a lot of cases where languages can only express some English word/concept via a SOP phrase. - -sche (discuss) 05:30, 16 May 2018 (UTC)
To replace 'the curse' let's talk about 'moobs' instead, it's a slang word for male breast tissue that resembles a woman's that could be hard to translate or not have an equivalent in a foreign language. Kaixinguo~enwiktionary (talk) 11:13, 17 May 2018 (UTC)
Aha, I knew we had a template for (the first part of) this: {{not used}}. The wording perhaps could be improved. - -sche (discuss) 21:57, 16 May 2018 (UTC)
I've just seen this relevant previous discussion linked from that template: Wiktionary:Grease_pit/2017/December#How_to_note_the_absence_of_a_translation_in_translation_sections. Kaixinguo~enwiktionary (talk) 19:16, 17 May 2018 (UTC)

This suggestion is now withdrawn per my talk page. Kaixinguo~enwiktionary (talk) 13:31, 18 May 2018 (UTC)

Given that we already have one such template, I'm not sure this can be withdrawn per se. I think it's useful, anyway, because we already have translations tables that make this claim, but the ones that don't use {{not used}} are hard to track; with templates, they can be easily tracked and checked for correctness. I've switched the tables at [[he]] and [[she]] and [[they]] and [[I'm rubber, you're glue]] to use {{not used}} (which is also used on [[the]], [[be]], [[it]], [[an]], [[that]], [[really]], [[do]], and [[-er]], and on [[corrective rape]], where however I may be able to replace it with a translation), and I've switched the ad-hoc wording at [[own]] and [[deadpan]] to use a second template. - -sche (discuss) 14:37, 18 May 2018 (UTC)
{{not used}} isn't what I was suggesting, and I think it is flawed. I think we have been talking about two different concepts anyway: what I was suggesting was a template or parameter to indicate whether an idiomatic term or expression has an equivalent in a language, to suggest an idiomatic approximation that can be used, an explanation and possibly an optional literal translation.
Also, if you look on my talk page you will see that User:DCDuring is unhappy about it.
{{no direct idiomatic translation}} is also flawed. Quite often an idiomatic translation is indeed possible; my intention was to call attention to whether an equivalent idiomatic term or expression exists in the foreign language, that is not the same. I wouldn't have proposed the template name '{{no direct idiomatic translation}}'; I strongly disagree with the idea of suggesting no idiomatic translation is possible. Kaixinguo~enwiktionary (talk) 15:31, 18 May 2018 (UTC)
Something seems warranted for at least some cases, even if just for labeling/categorizing entries with definitions that are problematic for translators. There seem to be at least a few classes of these. Normal users might find the distinctions we find useful hard to understand. DCDuring (talk) 18:14, 18 May 2018 (UTC)
Perhaps something more like "no idiomatic equivalent". Anything can be referred to or described in any living human language, but no language has a specific, dedicated term for everything. Also, the semantic range covered by a term in one language may only partly overlap with the semantic range of similar terms in other languages, so a term in English may be the intersection of concepts A and B, but the closest term in Language X might be the intersection of A and D and in Language Y the intersection of B and E. Can either of those other terms be a true translation of the English term if they have no concepts in common? Chuck Entz (talk) 21:39, 18 May 2018 (UTC)
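
The tracking that -sche describes above (finding every translation line that makes a "nothing equivalent exists" claim so it can be re-examined) might look something like the sketch below when run over an entry's wikitext. It is a rough illustration only: the helper `template_uses` is hypothetical, the sample lines are made up, and it assumes that the first positional parameter of each template is a language code.

```python
import re


def template_uses(wikitext, names):
    """Return (template name, first positional argument) for each call found.

    Assumes calls look like {{name|arg|...}}; nested templates are not handled.
    """
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"\{\{\s*(" + alternatives + r")\s*\|\s*([^|}]*)")
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(wikitext)]


# Template names taken from the discussion above; the sample lines are invented.
NAMES = ["not used", "no direct idiomatic translation"]

sample = """
* Dyirbal: {{not used|dbl}}
* Finnish: {{t|fi|hän}}
* Portuguese: {{no direct idiomatic translation|pt}}
"""

for name, lang in template_uses(sample, NAMES):
    print(f"{lang}: uses {{{{{name}}}}}")
```

A list like this, regenerated periodically, would address part of DCDuring's maintenance concern only if the number of hits stays small enough to review.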

Vote: Proficiency as a prerequisite for contribution

FYI, I created Wiktionary:Votes/pl-2018-05/Proficiency as a prerequisite for contribution. Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 06:54, 17 May 2018 (UTC)

The vote page was deleted by Wyang. I request that the vote page is restored, in accordance with the amicable civilized practices of the English Wiktionary. --Dan Polansky (talk) 11:32, 17 May 2018 (UTC)
@Wyang I restored the page, it isn't obvious vandalism and will cause no harm if it exists while the discussion is taking place. On a personal note, I cannot see any reason for the vote to actually take place, but that is what the discussion page is for. - TheDaveRoss 11:37, 17 May 2018 (UTC)
Can the editors here do some actual work? Why the hell are we feeding ridiculous proposals like that? Wyang (talk) 11:43, 17 May 2018 (UTC)
The reason for existence of the vote is that the vote proposal seems to represent positions of certain people, and absent the vote, the people are likely to make representations that are hard to verify or disprove, and brought to check. --Dan Polansky (talk) 11:44, 17 May 2018 (UTC)
I might be wrong, but I suspect a quite short BP discussion would demonstrate that this proposal would not succeed in a vote, at least not as written. This is why it is a good practice to have a discussion prior to creating a vote. - TheDaveRoss 11:46, 17 May 2018 (UTC)
I believe vote talk pages are excellent places to discuss proposals, including their ability to have headings and the ease with which they can be separately monitored via the watch feature. In order for the vote talk page to exist, the vote page has to exist. And since the vote page always contains a certain specific textual proposal, the discussion can turn around that proposal, and amends to the text can be proposed and made. This has worked real well in the past in the English Wiktionary. --Dan Polansky (talk) 11:55, 17 May 2018 (UTC)
I can almost hear someone saying: Yes, I support the proposal, and no, this proposal should not be brought to vote to fail. Votes are evil. The powerful, strong and competent must rule, not the mediocre majority. --Dan Polansky (talk) 12:04, 17 May 2018 (UTC)
If that is your belief, you should consider changing our guidelines because I recall them explicitly saying that a vote should be the result of discussion, not its start. Korn [kʰũːɘ̃n] (talk) 14:44, 17 May 2018 (UTC)
I don't know which guideline you mean, but this vote is created based on an established practice with countless instances. --Dan Polansky (talk) 14:53, 17 May 2018 (UTC)
This is obviously a POINTy response to the discussion happening on Dan's (and Meta's) talk page right now. - -sche (discuss) 14:33, 17 May 2018 (UTC)
The linked page says "Wikipedia:Do not disrupt Wikipedia to illustrate a point". This is not a disruption, and therefore, is not POINTy. Rather, it is an attempt to find out how many people support a certain proposal and similar proposals. This has worked well in the past. The discussion on my talk page broke my patience; true enough. Nonetheless, I had seen similar representations before, but did not have the energy to deal with them. --Dan Polansky (talk) 14:53, 17 May 2018 (UTC)
My talk page contains three potential supporters of something like the proposed policy: Metaknowledge, Mellohi!, and AryamanA. --Dan Polansky (talk) 16:10, 17 May 2018 (UTC)
You are intentionally misrepresenting my position. —Μετάknowledgediscuss/deeds 17:11, 19 May 2018 (UTC)
Intentionally, no. I quote Metaknowledge: "Here is a warning that I am more accustomed to giving newcomers, but apparently you need to hear it: do not create entries in languages you do not know and have not studied". --Dan Polansky (talk) 07:41, 20 May 2018 (UTC)
I never said that about proficiency in Ancient Greek. What I said on your talk page was that it's not good form that you wrote a definition for an adjective consisting of solely a single noun. mellohi! (僕の乖離) 18:26, 19 May 2018 (UTC)
I said proficiency in a language might allow a contributor to get away with providing fewer sources in entries, not that only proficient editors should edit. I would never support this vote. See also my response below. —AryamanA (मुझसे बात करेंयोगदान) 01:39, 20 May 2018 (UTC)
  • If voted on and implemented, this proposal would go a long way toward confirming what many newbies already seem to feel: that this place is ruled by insiders who do not welcome outsiders.
I also wonder how to operationalize the 'proficiency' prerequisite: a number of (unreverted)/(content) edits, a score on a quiz, subjective opinion, a vote, a vote of those with proficiency in the admission candidate's language, an absence of blackballing (aka blocking)? DCDuring (talk) 17:50, 17 May 2018 (UTC)
I think this vote violates a lot of principles the Wiki sphere stands for and actually having it is thus probably precluded by the terms of service by which the Wikimedia Foundation hosts us. Korn [kʰũːɘ̃n] (talk) 18:41, 17 May 2018 (UTC)
Citation needed. Lots of other Wikimedia wikis have restricted contributors in various ways. DTLHS (talk) 18:43, 17 May 2018 (UTC)

Dan Polansky, since you have no interest whatsoever in actually introducing this topic to BP to solicit opinions, and would rather see this poorly formulated and completely unimplementable proposal being voted on (09:16, 19 May 2018: “Add "Oppose having this vote" section seen in some votes: there is some opposition to having this vote”), you are wasting everyone's time when this time should have been spent on actual dictionary building. Wyang (talk) 09:28, 19 May 2018 (UTC)

Normal parliamentary processes at least require someone to second a proposal. Is there anyone seconding it? DCDuring (talk) 18:35, 19 May 2018 (UTC)
I agree to the deletion of the vote. This kind of restriction goes against the principles of freely and openly editable wikis (as Korn said). Not to mention: how would it even be enforced!? And what exactly constitutes "proficiency"? Votes on such broad topics shouldn't be considered to begin with; only narrowly defined policy items can be voted on with a simple support/oppose/abstain. —AryamanA (मुझसे बात करेंयोगदान) 18:38, 19 May 2018 (UTC)
@AryamanA: Can you please explain to me and the reader why you did not oppose on my talk page when another editor wrote to me, "do not create entries in languages you do not know and have not studied", but instead, posted a post that could be interpreted as your agreeing with him? You wrote this: "Careful work with sources is good, but it's not a substitute for studying and having knowledge of the language at hand. ..." --Dan Polansky (talk) 07:06, 20 May 2018 (UTC)
@Dan Polansky: I would prefer that editors have some knowledge of the language they edit in. This does not need to imply "proficiency" (however that is defined); for example, I doubt many people are proficient with Ancient Greek or Latin because those are dead languages. But for well-known languages which have good textbooks available (such as Ancient Greek), there's no reason to not bother studying the language. You'll notice I don't edit Ancient Greek at all, for the same reason. If someone who has studied the language at hand tells you that you've made a mistake in your edits (and you don't know the language), there's no need to insist that you're correct, and certainly no need to start such a contentious vote. That only harms the dictionary. And anyways, there is no technology on Wiktionary that could enforce the results of this vote, so I think it's pointless. —AryamanA (मुझसे बात करेंयोगदान) 14:16, 20 May 2018 (UTC)

(outdent) I ask again that the vote/request for comments be restored at Wiktionary:Votes/pl-2018-05/Proficiency as a prerequisite for contribution. Those who disagree with having the vote will have a chance to post to the "Oppose having this vote" section of the vote, and then we will see how many they are. As for lack of Beer parlour discussion, I and multiple other people have been discussing on the vote talk page, and I do not see why I must discuss in Beer parlour instead. It is inappropriate that a vocal minority should prevent a vote from running. Wiktionary:Votes/2018-03/Showing romanizations in italics by default is an example of a vote that Wyang and AryamanA did not want running, but the overwhelming supermajority of 77.8% did want to see running. My creation of votes is in fact subject to editor control and feedback. There are three votes that have "Oppose having this vote" section, and in none of those votes did the vote-having-opposers achieve a superminority, that is, over 1/3. --Dan Polansky (talk) 06:33, 20 May 2018 (UTC)

Anyway, seeing all the responses so far, I propose to change the wording to the following:

Editors can contribute new entries even for languages that they do not know and have not studied. However, in such case, they are strongly encouraged to work very carefully with sources, and get acquainted with the lemmatization practice of the English Wiktionary for the language. For instance, for Latin, some dictionaries use e.g. stare as the lemma while Wiktionary uses sto as the lemma.

--Dan Polansky (talk) 06:49, 20 May 2018 (UTC)

This is already our policy. —AryamanA (मुझसे बात करेंयोगदान) 14:18, 20 May 2018 (UTC)

How can we make it easier for Wikimedia contributors to understand Wikidata?

Dear all

Over the past year or so I've been working quite a lot on Wikidata documentation and have been thinking more about the needs of different kinds of user. I feel that currently Wikidata can be difficult to understand (what it does, how to contribute, what issues there are and what is being done to address them etc) even for experienced Wikimedia project contributors. To help address this I've started an RFC to try and collate this information together. It would be really helpful if you could share your thoughts, especially if you find Wikidata hard to understand or confusing, you can just share your thoughts on the talk page and we will synthesize them into the main document.

Wikidata:Requests for comment/Improving Wikidata documentation for different types of user

Thanks very much

John Cummings (talk) 12:54, 18 May 2018 (UTC)

  • I don't use it because I find it really boring. --Genecioso (talk) 13:03, 18 May 2018 (UTC)
    That is why Wikidata still has a Main Page. - TheDaveRoss 13:25, 18 May 2018 (UTC)

misspellings misconstructions

I put it to you that terms labelled as misspellings or misconstructions (subcategories of a Category:Common mistakes?) should never appear in Google search results or in Wiktionary's search-box suggestions (shown when we type the beginning of a word).
Because: just seeing the word as a title assures us, the users, that it is valid; we rarely visit its page to find out that it is a misspelling. I hope a 'hidden' function is possible. A victim, sarri.greek (talk) 09:37, 20 May 2018 (UTC)

You really should visit the page in Wiktionary and see what it says. What if the form is not a misspelling but a rare word or an archaic one, whereas you want to be understood by a wide audience? Or what if it is a word in a different language? For reference, the vote is at Wiktionary:Votes/pl-2014-04/Keeping common misspellings. --Dan Polansky (talk) 09:50, 20 May 2018 (UTC)
I think this is a good idea, but the best place to discuss this is WT:GP because it's a technical issue. —AryamanA (मुझसे बात करेंयोगदान) 13:10, 21 May 2018 (UTC)
Beer parlour is the ideal place. The issue is not technical; it is about the user-observable behavior of Wiktionary site. --Dan Polansky (talk) 11:51, 27 May 2018 (UTC)

Middle Gujarati

I've added a special code inc-mgu for Middle Gujarati. It looks noncontroversial to me, there's a Wikipedia article and plenty of hits on Google Books. —AryamanA (मुझसे बात करेंयोगदान) 16:56, 20 May 2018 (UTC)

Some principles for citations

Definitions in this dictionary ultimately derive their authority from the citations under them, which should be clear and easy to read. Currently, a lot of the RQ-format templates used to add these definitions are ludicrously long and verbose in their formatting (see, for one example among hundreds, a page like lustrum). Note that the problem is not the sentence(s) cited, but rather the formatting of the citation source. When I've tried to trim these back I've been met with some opposition. So here are some principles I would want to suggest:

  1. A book's subtitle should not usually be used.
    • Emma; not Emma: A Novel. In Three Volumes
    • Hydriotaphia or Urne-buriall; not Hydriotaphia, Urne-buriall, or, A Discourse of the Sepulchrall Urnes Lately Found in Norfolk. Together with The Garden of Cyrus, or The Quincunciall, Lozenge, or Net-work Plantations of the Ancients, Artificially, Naturally, Mystically Considered. With Sundry Observations
  2. If there is a conventionally used short form of the title, that may be used instead.
    • Henry VI, part 3 or 3 Henry VI; not
    • The true tragedie of Richard Duke of York, and the death of good King Henrie the sixt
  3. Publisher details should be limited to name of publisher and, if necessary, place of publication.
    • "Oxford: Joseph Barnes"; not
    • "Oxford: Printed by Ioseph Barnes, and are to be sold in Paules Church-yard at the signe of the Crowne, by Simon Waterson"
  4. If the edition cited is not the first edition, details of the first edition should not be included in the citation (except for the original year of publication, which appears in boldface at the start of any citation).
    • 1621, Robert Burton, The Anatomy of Melancholy, Oxford: Henry Cripps 1624, II.2, p. 209; not
    • 1621, Democritus Junior [pseudonym; Robert Burton], “Ayre Rectified. With a Digression of the Ayre.”, in The Anatomy of Melancholy, Oxford: Printed by Iohn Lichfield and Iames Short, for Henry Cripps, OCLC 216894069; The Anatomy of Melancholy: What It Is. With All the Kindes, Cavses, Symptomes, Prognosticks, and Seuerall Cvres of It. In Three Maine Partitions, with Their Seuerall Sections, Members, and Svbsections. Philosophically, Medicinally, Historically Opened and Cut Up, by Democritvs Junior, with a Satyricall Preface, Conducing to the Following Discourse, 2nd corrected and augmented edition, Oxford: Printed by John Lichfield and James Short, for Henry Cripps, 1624, OCLC 54573970, partition II, section II, member 3, page 209

These are all real examples taken from our existing templates. To me it seems obvious that a citation source should not be five or six times longer than the citation itself; they should be no longer than one line, or two at an absolute maximum. If there are editors who really want to see all this extra information, perhaps there is some way to hyperlink the whole citation source to some other page of references where we can give the full details of the book's entire title page, the publisher's location, friends, tennis partners etc. as desired. Am I alone in this? Should/could we codify this somewhere? Ƿidsiþ 06:44, 21 May 2018 (UTC)

Definitely keep them short. Detail that helps in identifying an edition is good but don't let it get ludicrously long. —Suzukaze-c 06:46, 21 May 2018 (UTC)
As the editor who created many of the quotation templates, I would explain that my aim was to capture the imprint information of works as accurately as possible. Of course, I am always guided by whatever is the consensus on the issue. I would be in favour of some technological solution that would allow templates to display a short citation by default, and a longer citation if some portion of the citation or a link is clicked on (@Erutuon?). — SGconlaw (talk) 06:50, 21 May 2018 (UTC)
I should also add that where abbreviations can be hard for editors to understand, since Wiktionary is not a print publication where space is at a premium, we can do with slightly more verbosity. (Having just read John Simpson's memoir The Word Detective, I note that the OED made a similar choice when it transformed from being a print to an online publication.) For example, in the Burton example given above, I think most people would find "II.2" quite obscure, and "partition II, section II, member 3, page 209" much clearer. (Fortunately, not many works are subdivided to that level.) — SGconlaw (talk) 06:53, 21 May 2018 (UTC)
Good book! But the OED's idea of a "long" citation source is a bit different from yours. Even online, this Burton citation would appear as "R. Burton, Anat. Melancholy II.II.III", and even when you click on the title, the popup only expands this to "Robert Burton, The Anatomy of Melancholy". It does not give the subtitles or chapter titles. It does not expand the "II.II.III" (though personally I wouldn't necessarily be opposed to this, I just don't think it's necessary). Ƿidsiþ 07:02, 21 May 2018 (UTC)
Point 3 seems like a no-brainer, listing where the books were to be sold is absurd. I generally agree with point 1, too. I also agree that "1621, Democritus Junior [pseudonym; ..." is horribly over-long. More liberal use of {{...|the subtitle and all that jazz}} to hide the extra information as hover-text might solve some of this (although if screen-readers read hovertext, people who use them would still have to sit through a minute of book title to get to the quotation). - -sche (discuss) 07:13, 21 May 2018 (UTC)
I am in agreement with these principles. Also there is no reason that we cannot display a subset of the total information given, so that relevant data is kept but only critical information is displayed (unless a user chooses to show more). - TheDaveRoss 12:10, 21 May 2018 (UTC)
I sometimes cite articles, books, or databases used as references. For some, I am able to reference a Wikipedia article, which allows a relatively brief hyperlink and an overall length of reference of less than a line. For articles, there is never a WP article. The articles usually have a specific citation format that seems to be required. Should we have an Appendix that contains the full citation and a {{senseid}}-equivalent and link to that from a tersely worded in-entry link? DCDuring (talk) 15:48, 21 May 2018 (UTC)
That seems annoying to maintain. In addition, many times we will only quote from a particular work one time over the whole site. I would prefer to hide extra details with javascript. DTLHS (talk) 16:03, 21 May 2018 (UTC)
For the taxonomic references, there is likely to be repeated utilization of databases without a WP article (eg, The Scorpion Files) and certain articles (eg, see {{R:Ruggiero}}). Maybe I should just try it for taxonomic references. If others need something similar, it can then be copied, improved, or generalized. DCDuring (talk) 16:19, 21 May 2018 (UTC)
Good luck, but taxonomic references seem tangential to what this discussion is about. They have very different requirements from the quotations in entries. DTLHS (talk) 16:23, 21 May 2018 (UTC)
Ah, the joy of being marginalized.
The common element is the potential for excessive length. Academic article citations are usually two to three lines long. Widsith had expressed a preference for one-line citations, a standard or goal with which I agree. I already have a reference template that displays at two or three lines in nearly a thousand entries and I expect to have more. DCDuring (talk) 16:51, 21 May 2018 (UTC)
I agree for the most part as well. I know Sgconlaw means well, but I think he may have gone a little bit overboard with the details. —Μετάknowledgediscuss/deeds 16:21, 21 May 2018 (UTC)
I agree with Widsith; make source identification short. The point is in the quotations, not in the identification of the source. --Dan Polansky (talk) 11:48, 27 May 2018 (UTC)

Proposal: Amend Wiktionary:Entry layout to explicitly disallow empty section

Wikipedia articles allow empty sections (Wikivoyage even requires section headers even when there is no content to display). However, I don't think they serve any purpose in Wiktionary (e.g. if someone requests the etymology of an entry we have {{rfe}} rather than an empty etymology section) and it seems to be a de facto practice not to include empty sections. Therefore I propose to formally disallow them (see [2] as an example).--Zcreator alt (talk) 19:28, 21 May 2018 (UTC)

JA terms commonly have multiple different readings, each with their own specifics. These are generally grouped under ===Etymology XX=== sections, where XX is a number. If the etym of that particular reading isn't known at entry creation, editors often leave the etymology section itself blank. However, these are required as a means of grouping the different readings.
If the above proposal extends to such Japanese entries, then I must oppose. ‑‑ Eiríkr Útlendi │Tala við mig 15:45, 22 May 2018 (UTC)
This does not disallow any section with subsections (i.e. sections one level lower), even if there's nothing between the section header and the subsection header.--Zcreator alt (talk) 15:52, 22 May 2018 (UTC)
Yes, I agree (with Z), subsections are content, so a section with subsections is not "empty". (And Japanese is far from the only language with "placeholder" ===Etymology N=== sections; English entries have them, too.) - -sche (discuss) 16:36, 22 May 2018 (UTC)
So what about RFE tags? You say we have those instead but aren't they put under an etymology header? Korn [kʰũːɘ̃n] (talk) 17:35, 22 May 2018 (UTC)
RFE tags are content that displays visually. Equinox 18:29, 22 May 2018 (UTC)
This seems sensible, iff a section with the templates you mention ({{rfe}}, {{rfp}} or the like) is not considered "empty". (Someone could just add the template without a section header, yes, but having the header already present seems neater, since someone adding the etymology (etc) will have to add it anyway.) - -sche (discuss) 16:36, 22 May 2018 (UTC)
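
The working definition that emerged in this thread (a section counts as "empty" only if it has neither visible content such as {{rfe}} nor any subsection) can be sketched as a small check. This is an illustrative Python sketch only, not an existing bot or module; the function name `empty_sections` and the heading regex are assumptions made for the example.

```python
import re

# Matches wikitext headings such as "==English==" or "===Etymology 1===".
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

def empty_sections(wikitext):
    """Return titles of sections that have no body text and no subsections."""
    sections = []  # each item: [level, title, has_body]
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append([len(match.group(1)), match.group(2), False])
        elif line.strip() and sections:
            # Any non-blank line (text or a template such as {{rfe}}) is content.
            sections[-1][2] = True

    empty = []
    for i, (level, title, has_body) in enumerate(sections):
        nxt = sections[i + 1] if i + 1 < len(sections) else None
        # A deeper heading immediately following counts as a subsection.
        has_subsection = nxt is not None and nxt[0] > level
        if not has_body and not has_subsection:
            empty.append(title)
    return empty

sample = """==English==

===Etymology 1===

====Noun====
# foo

===Etymology 2===
{{rfe|en}}

===Pronunciation===

===Verb===
# bar
"""
print(empty_sections(sample))  # ['Pronunciation']
```

Under this reading, the placeholder ===Etymology 1=== header and the header holding only {{rfe}} both pass, and only the bare ===Pronunciation=== header is flagged.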

Lexicographical data is now available on Wikidata


After several years of discussion, and one year of development and consultation with the communities, the development team of Wikimedia Germany has now released the first version of lexicographical data support on Wikidata.

Since the start of Wikidata in 2012, the multilingual knowledge base was mainly focused on concepts: Q-items are related to a thing or an idea, not to the word describing it. Starting now, Wikidata stores a new type of data: words, phrases and sentences, in many languages, described in many languages. This information will be stored in new types of entities, called Lexemes, Forms and Senses.

The goal of lexicographical data on Wikidata is to provide a structured and machine-readable way to describe words and phrases in multiple languages, stored in one place and reusable under CC-0. In the near future, this data will be available for Wiktionaries and other projects to reuse, as much as you want to.

For now, we’re at the first steps of this project: the new data structure has been released on Wikidata, and we’re looking for people to try it and give us feedback on what is working or not. Participating in this project is an opportunity for you to have a voice in it, to make sure that your needs and requests are taken into account very early in the process, and to start populating Wikidata with words in your language!

Here’s how you can try lexicographical data on Wikidata:

  • First of all, if you’re not familiar with the data model, I encourage you to have a look at the documentation page. If you’re not familiar with Wikidata at all, I suggest this page as a starting point.
  • You can also look at the Lexemes that already exist (search features will be improved in the future).
  • When you feel ready to create a word, go to d:Special:NewLexeme.
  • If some properties that you need are missing, you can suggest them on this page (if you’re not sure how to do it, just leave a message on the talk page and someone will help you).
  • The main discussion page is d:Wikidata:Lexicographical data. Here, you can ask for help, suggest ways to organize the data, and also leave feedback: if you encounter any bug or issue, let us know. We especially want to know which features are most important to you, so that we can work on them next.

In any case, feel free to contact me if you have a question or problem, I’ll be very happy to help.

Cheers, Lea Lacroix (WMDE) (talk) 12:14, 23 May 2018 (UTC)

Something broken in the "terms derived from" category.

Check this out: CAT:Latin terms derived from Proto-Indo-European has an error in the recent additions to the category. So do the category pages for terms derived by any language from any language. -- 2405:204:9708:8E03:0:0:2396:80B1 13:03, 23 May 2018 (UTC)

User:Eruton made a change recently to Module:auto cat. Probably a typo somewhere. I've undone the change until they can figure out what went wrong. For future reference: this kind of thing is best reported at the Grease pit. Chuck Entz (talk) 14:05, 23 May 2018 (UTC)
Repinging with correct name: @Erutuon Chuck Entz (talk) 14:06, 23 May 2018 (UTC)
@Chuck Entz: Fixed. Sorry about that. I had checked the module errors category after my edit, but I guess the errors took a while to appear. — Eru·tuon 17:31, 23 May 2018 (UTC)

Make words of the day more visible

Would anyone support putting a short section in the sidebar for the words of the day? I think just putting them on the front page means a lot of people don't see them. For length considerations, I think it would have to just be links to the words and maybe the part of speech- no definitions or anything else. @Sgconlaw, Metaknowledge DTLHS (talk) 18:04, 23 May 2018 (UTC)

I like the idea of increasing visibility to (F)WOTD. I also checked to make sure, and the sidebar can include dynamic content (templates, conditional statements) so we can do it pretty easily. - TheDaveRoss 18:29, 23 May 2018 (UTC)
I never look at the front page, but I do like our WOTD feature, and I love Sgconlaw for maintaining it. --Genecioso (talk) 20:57, 23 May 2018 (UTC)
Awww, thanks. @DTLHS, could you provide a mock-up of what you're suggesting? I can't quite picture it. Sounds interesting, though. — SGconlaw (talk) 21:55, 23 May 2018 (UTC)
Something like this: [3]. DTLHS (talk) 22:05, 23 May 2018 (UTC)
Great idea! Wyang (talk) 22:21, 23 May 2018 (UTC)
Sure, I'd support that. — SGconlaw (talk) 22:25, 23 May 2018 (UTC)

On a separate issue, should we add graphics to the WOTD box on the Main Page? — SGconlaw (talk) 22:25, 23 May 2018 (UTC)

Something shiny, spinny and flashy please!!! --Genecioso (talk) 08:21, 24 May 2018 (UTC)
  • I support an addition to the sidebar along the lines of what DTLHS is suggesting. I oppose adding more graphics to the (F)WOTD boxes. —Μετάknowledgediscuss/deeds 08:29, 24 May 2018 (UTC)
The question is how to get the words into the sidebar without making extra work for the two maintainers. DTLHS (talk) 01:44, 25 May 2018 (UTC)
Good question. How does the sidebar work, anyway? — SGconlaw (talk) 03:31, 25 May 2018 (UTC)
MediaWiki:Sidebar and [4]. DTLHS (talk) 03:51, 25 May 2018 (UTC)
I think it can be done without significantly more work. If each WOTD page had a /sidebar subpage which just had the text to be displayed that could be transcluded in the sidebar. If the subpage doesn't exist it can fail gracefully (display nothing). - TheDaveRoss 19:26, 25 May 2018 (UTC)

@Sgconlaw: On another issue, I wonder whether when clicking on WOTD on the front page you can be taken directly to the entry instead of to the top of the TOC, a minor irritation. DonnanZ (talk) 08:08, 28 May 2018 (UTC)

Do you mean that a reader should be taken to the relevant part of speech? I suppose a bookmark can be added, but there's no guarantee that this will work for all time if the entry is later amended by adding or removing sections. (However, in most cases there won't be any changes during the 24 hours that the entry is featured on WOTD.) — SGconlaw (talk) 08:54, 28 May 2018 (UTC)
No, I meant to the top of the English entry, below the TOC, using something like {{l|en|-}}. This may depend on how a user sets up the TOC, I suppose. DonnanZ (talk) 14:22, 28 May 2018 (UTC)
@Donnanz: OK, I've tweaked {{WOTD}} to achieve that effect. — SGconlaw (talk) 16:01, 8 June 2018 (UTC)
@Sgconlaw: Wonderful, it certainly works for today's word, cheers. I wonder if the WOTD nominations list can be adapted too? DonnanZ (talk) 16:22, 8 June 2018 (UTC)
@Donnanz: Done. — SGconlaw (talk) 16:55, 13 June 2018 (UTC)
@Sgconlaw: Thanks once again. I'm sure these improvements will be of benefit to every user. DonnanZ (talk) 17:00, 13 June 2018 (UTC)

Polish IPA transcription

Hergilei (talk • contribs) is removing {{pl-IPA}} from Polish entries and replacing it with manual transcriptions showing dental consonants explicitly marked with the IPA dental diacritic. Before getting into a revert war with him/her, I'd like input from more people, especially ones who know Polish. It seems to me that since /t d s n/ etc. are not contrastively dental in Polish (i.e. they don't contrast with alveolars), it is unnecessary (and thus undesirable) for a phonemic transcription to use diacritics to mark them as dental. The simple characters suffice to show what phonemes they are. If we want to show a narrow phonetic transcription, then the dental diacritic makes sense, but in a broad phonetic transcription it's redundant and should be avoided. Do others agree? Or do others think we should edit Module:pl-IPA to show the sounds in question as explicitly dental? —Mahāgaja (formerly Angr) · talk 20:11, 23 May 2018 (UTC)

We really should update pl-IPA to produce both a broad and a narrow transcription. Then the broad one can omit the dental diacritics, use the traditional symbol for that debated vowel, etc, while the narrow one adds the diacritics and the more accurate vowel symbol, etc. - -sche (discuss) 01:35, 24 May 2018 (UTC)
I don't think it's necessary to use diacritics here: prefer /kɔnt/ to /kɔn̪t̪/ in kąt. Besides, the automated module works fine. Hergilei (talk • contribs) should seek agreement first; if changes are required, they should be made in the module, not manually, which leaves a huge inconsistency behind. --Anatoli T. (обсудить/вклад) 01:58, 24 May 2018 (UTC)
I think having the module let the template show both phonemic and phonetic transcription is a good idea. Then szczyt for example can say /ʂt͡ʂɨt/, [ʂt͡ʂɘ̟t̪] and everyone's happy (or at least a little bit closer to being happy). —Mahāgaja (formerly Angr) · talk 20:31, 24 May 2018 (UTC)
I hear [ʂt͡ʂɨt̪] in the recording, though. --Anatoli T. (обсудить/вклад) 20:52, 24 May 2018 (UTC)
@Mahagaja As you know I disagree with your very precept of how to use IPA here, and not weakly, because I think you're blinded by elitism - meaning due to the fact that you're part of an elite (linguistic professionals), not that you feel better than others. Purely phonemic transcriptions are only of use to those dealing with phonemics on a deeper level - which is a narrow part of the dictionary usership I feel no need to especially cater to, as those are really the most likely to make sense of the details of any transcription scheme we'd choose for our language. Phonemic transcriptions in some languages are tools only for people wishing to analyse a language, not people who wish to have a guide on how to pronounce it. Phonemically, for example, you can render Japanese [nʲipːõ̞ɴ] as /nitpom/, but what normal person actually having need to look at a pronunciation section is actually helped by that? Our IPA should not be a phonemic transcription, it should be as narrow a transcription as possible of the actual sounds produced, to enable people reading it to pronounce the word in question with as little divergence from a native speaker as possible, leaving out only those effects likely to be produced by people without special mention, such as palatalisation before /i/. Hence, I see no limitation of narrowness of transcription, but there's definitely 'too broad' and not marking dentalness is more broad than I like because not using dentals is part of foreign accent. Korn [kʰũːɘ̃n] (talk) 08:23, 25 May 2018 (UTC)
With this approach, you would require 3-5 narrow transcriptions for standardised languages once editors finally agree which version to consider final. For languages with fewer resources, more dialects or more interpretations of what is correct, there will be no progress at all, as it was with German. The /nitpom/ example is not valid, since even the Japanese transliteration is closer to how the term is actually pronounced: "Nippon". --Anatoli T. (обсудить/вклад) 10:46, 25 May 2018 (UTC)
I'm opposed to narrow phonetic transcription for exactly the same reason you (Korn) are opposed to broad phonemic transcription: I think it's elitist (in the same sense you're using the term). It's unrealistic to expect readers to learn all the IPA diacritics and what they stand for; only linguistics experts are really likely to put that much effort into learning the symbols. And the narrow transcriptions are really not necessary: someone learning Polish is not going to look at /t/ in a Polish word and think "That must be alveolar, because if it were dental it would have been written /t̪/." They're going to look at /t/ and think "that's the sound Polish speakers use to correspond to the letter t (or d at the end of a word, etc.)." And no degree of narrowness in our transcription is going to help a nonnative speaker get rid of their foreign accent, but it will confuse them with an excess of information. I don't think we should transcribe にっぽん as /nitpom/, but our current transcription [ɲ̟ip̚põ̞ɴ] is also unhelpful because it goes into excessive and confusing detail, even for me with an advanced degree in phonology. (Questions I would ask myself include: How far do I have to advance the [ɲ]? What exactly is meant by [p̚p]? Is that different somehow from geminate stops in other languages? If so, how; and if not, why aren't we simply writing [pː]? How far open is the normal Japanese [o], and how much more open than that is [õ̞]?) A broad transcription like /nippoɴ/ is sufficient for anyone's purposes.
I'm not advocating highly abstract underlying representations either, though: I don't say we should transcribe しゃく as /sjaku/ and しく as /siku/; /ɕakɯ/ and /ɕikɯ/ are broad enough for dictionary purposes without being either too abstract (requiring the reader to remember that /sj/ and /s/ before /i/ surface as [ɕ] in Japanese) or too narrow (confronting the reader with a forest of tiny marks around the letters, which all look the same on most computer screens and whose significances are difficult to remember). And seriously, I have never seen any dictionary of any language, either monolingual or bilingual, use the kind of narrow transcription you seem to be advocating. If narrow transcriptions were actually helpful to language learners, you'd think other lexicographers would have grasped that by now and would use them. But they aren't helpful to language learners, they're just confusing. I'd honestly rather not have narrow phonetic transcriptions at all, but I'm willing to accept them in addition to broad phonemic transcription to satisfy the people who prefer them. —Mahāgaja (formerly Angr) · talk 14:25, 25 May 2018 (UTC)
There's some food for thought in your reply, I'll just respond to one point right now: Your wondering how open Japanese [o] would be is already caused by your approach to IPA as a tool to describe abstract relative positions in a phonemic system rather than as representations of absolute human sounds to within a negligible margin of variation. And it is this absolute approach which informs my, honestly mostly modest, moves for more precise IPA, not as you two would depict it, the narrowest possible representation (whose implementation I have no problem with) to the exclusion of everything else (which I do not propose). As such, the IPA I would have picked for にっぽん would have been /nipːo̞ɴ/. And if you take that approach, our IPA tells you: It's mid [o̞], as IPA defines. It's more open than [o], not Japanese [o], not English [o], [o], and less open than [ɔ], and how much exactly can be learnt by hearing, which is made easier once our reader is already primed with the info that the vowel is in fact [o̞] and not [o]. Your argument for misinforming our reader [I sort of regret this antagonism, but that's my personal impression as a user.] by hyperbroad IPA-transcriptions was that they can inform themselves on the details from other sources. Well, the same is true about IPA characters - and easier! - as I have said multiple times. You yourself pointed to the available key, claiming it was easy and natural to check it out. So if we have to send our users to read up on the one or the other thing, I'd rather that people who looked at our Pronunciation header then had to look up the meaning of IPA characters rather than what the actual pronunciation is. ps.: What about my question about chain shifting /t͡ʂ/ > /tʂ/ > /tʂː/ and thus eliminating <t͡ʂ>, which I actually DO find confusing? Korn [kʰũːɘ̃n] (talk) 18:48, 25 May 2018 (UTC)
By the way, you seem to me to be of the opinion that literally every non-contrastive diacritic should be avoided in our IPA, is that impression correct? I'd actually like to explain my approach to IPA narrowness, maybe we can find common ground: I would like our IPA to depict all things which are not side effects. In our nippon example, the palatalisation and nasalisation are effects carrying over from the phonetic environment, not inherent parts of the phonemes in question. Polish /t/ on the other hand is inherently a dental stop and speakers of another language will not automatically add this feature if it is not part of their native language. Thus, in order to sound natural, they have to consciously choose to add the feature +dental. In contrast to palatalisation and nasalisation which I hence would leave out from Japanese transcriptions. Or in contrast to, to stick with Polish, vocalisation of nasals before fricatives, which is entirely facultative and thus need not be depicted. Korn [kʰũːɘ̃n] (talk) 19:19, 25 May 2018 (UTC)
@Korn: A lot of my thinking is in agreement with this blog post by John C. Wells: "there is no super-phonemic system somewhere in the sky, consisting of universal sounds in one-to-one relationship with IPA symbols... We can, if we choose, use all manner of unusual letters and diacritics to record subtleties that we may have observed. On the other hand we may start out pretty unsubtle and then subsequently need to refine our transcription (make it 'narrower') if we find that we are overlooking subtleties that turn out to be important in the language concerned. But when we want to use IPA symbols in a dictionary or language textbook, that sort of thing won’t do. There, we need a simple straightforward system that is not burdened down by unnecessary complications." And I think Wells is in agreement with the vast majority of phoneticians, phonologists, and lexicographers. It's for this reason that it doesn't make sense to say "the vowel is in fact [o̞] and not [o]" until you've defined [o] with respect to a particular language; [o̞] does not actually have an absolute meaning, certainly not in a phonemic representation, which is why using /o̞/ as the symbol for a language's only mid back rounded vowel is "misinforming our reader". I would only use /o̞/ for a language in which there are three mid back rounded vowels that differ in height. (I'm unaware of any such language, but I'd be interested to find out about one.) As for the chain shift you mention, I don't know enough about the details to answer. I thought that Polish contrasted the cluster /tʂ/ as in trzy with the affricate /t͡ʂ/ as in czy, but a Polish friend told me that only speech-conscious people speaking carefully actually make that contrast and that the two are merged in casual speech. Finally, yes, I do think that a phonemic representation should avoid diacritics unless they're required to distinguish phonemes (e.g. my hypothetical language with a /o ~ o̞ ~ ɔ/ contrast).
Incidentally, it occurred to me later that Japanese does contrast /n/ and /ɲ/ (e.g. naku vs. nyaku), so I revise my preferred transcription of にっぽん to /ɲipːoɴ/. —Mahāgaja (formerly Angr) · talk 11:02, 28 May 2018 (UTC)
Yes, and the phonemic transcription can become more phonetic when it is really important: będę /ˈbɛn.dɛ/, gęba /ˈɡɛm.ba/, gęś /ɡɛɲɕ/ and ręka /ˈrɛŋ.ka/ are results of automated IPA showing the actual pronunciation of the letter ę in various positions. This level of narrowness is sufficient, in my opinion, for learners of Polish. (gęś should actually produce /ɡɛ̃ɕ/, not /ɡɛɲɕ/) --Anatoli T. (обсудить/вклад) 11:37, 28 May 2018 (UTC)
  • According to Wikipedia, w:Kensiu_language#Vowels does contrast the three levels in question. I was told years ago that the difference between trz and cz is a 'chronemic' one. I always thought it was a long T, but as en.Wiktionary has many examples of [tʂʂ], that must be it. En.Wiktionary also marks the conflation of trz and cz as a feature of Lesser Polish dialect, which sounds vaguely familiar to me. Although, even if the distinction were a feature of conscious speech, I'd still be for marking it, as it is obviously part of the language, and no less the register foreign learners are expected to produce. It seems the main clash our views have is what is 'important in the language concerned'. As a German, I have [o] and [ɔ]. When speaking Spanish or Finnish, using either of these will give me more of a German accent than using a vowel whose height is in between these two. A Finn or Spaniard on the other hand, using his mid vowel which corresponds to the other language's, would sound less foreign using his O. And this is information I consider important. We can convey this in our pronunciation and if I told you to pronounce [ɔ], you...well you specifically might ask 'What's its position relative to all other vowels?' but you don't really expect our average user to have similar thoughts or be baffled by our IPA for using /ɔ/ or /o̞/, do you? (Two non-rhetorical questions.) Also, per your reasoning, do you oppose using /ɛ ɔ œ ɪ ʏ ʊ/ for German IPA and would prefer we use /e o ø i y u/? Korn [kʰũːɘ̃n] (talk) 11:49, 28 May 2018 (UTC)
  • For me as an English speaker who's learned German, it's also important for me to remember that the vowel in German Mut is considerably backer than the vowel in English moot, and if I want to sound less American when I speak German, I have to push my tongue farther back when making the German tense u sound. And conversely a German speaker who wants to reduce his German accent when speaking English has to remember to use a fronter vowel than he's used to. But IMO that fact does not justify transcribing the German word as /mu̠ːt/, nor transcribing the English word as /mu̟ːt/ or /müːt/ or /mʉːt/. Both words are really just /muːt/, but /uː/ means something different in German than it means in English. I really do think the risk of bafflement is pretty high if we use /o̞/ in a language where there's no contrasting /o/; not so much for /ɔ/ (which we use in transcribing Yiddish for example) since it's a distinct character rather than a diacritic modifying /o/. To your last question, I support using /ɛ ɔ œ ɪ ʏ ʊ/ because the tense vowels can be short in unstressed syllables, making it possible for short lax and short tense vowels to contrast (e.g. Fokus doesn't rhyme with Dokus). —Mahāgaja (formerly Angr) · talk 12:08, 28 May 2018 (UTC)
So we're stuck on opposing expectations of user experience; what should we do about that? Although I must say that it seems to me that I more thoroughly elaborated the risk in the opposing view and why my approach leads to fewer, more acceptable and more easily rectifiable problems. Korn [kʰũːɘ̃n] (talk) 15:37, 28 May 2018 (UTC)
I still don't see why we can't simply have both: the broader transcription that I prefer in slashes, and the narrow transcription that you prefer in square brackets. —Mahāgaja (formerly Angr) · talk 16:15, 28 May 2018 (UTC)
I suppose that's a compromise. We'd still have to work out what the narrow transcription will depict. I'm not really submerged enough in Polish to work it out in detail, but I'd personally like it to have [ä], the Y-vowel shown properly, the dental marks and the chronemic difference of trz vs cz - which I'd still prefer to be encoded in the broad transcription. Not sure about /ęś/. It apparently surfaces as [ɛ̃ɲɕ], [ɛ̃j̃ɕ] and [ɛ̃w̃ɕ], but I don't know which one is standard. I assume but don't know that the same goes for /ǫś/. Anything else that comes to mind? Korn [kʰũːɘ̃n] (talk) 18:29, 29 May 2018 (UTC)
  • I'm also under the impression that there is no such thing as palatalised labials, and that they are all a transparent combination of two consonants, /Cj/. Korn [kʰũːɘ̃n] (talk) 16:25, 5 June 2018 (UTC)
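
The compromise discussed in this thread, showing a broad transcription in slashes next to a narrower one in square brackets, could be prototyped roughly as follows. This is a hedged Python sketch and not how Module:pl-IPA works: the function names, the set of consonants treated as dental (taken from the "/t d s n/" mentioned above), and the rule that skips the first half of an affricate are all assumptions for illustration. It only adds the dental diacritic and ignores the vowel-quality questions raised above.

```python
DENTAL = "\u032a"    # combining bridge below, the IPA dental diacritic
TIE_BAR = "\u0361"   # combining double inverted breve, used in affricates
DENTAL_BASES = set("tdsn")  # assumption: the consonants named in the thread

def narrow_from_broad(broad):
    """Add the dental diacritic to a broad transcription (illustrative only)."""
    out = []
    for i, ch in enumerate(broad):
        out.append(ch)
        nxt = broad[i + 1] if i + 1 < len(broad) else ""
        # Assumption: leave the first element of an affricate such as t͡ʂ unmarked.
        if ch in DENTAL_BASES and nxt != TIE_BAR:
            out.append(DENTAL)
    return "".join(out)

def both(broad):
    """Format the broad form in slashes and the derived narrow form in brackets."""
    return f"/{broad}/, [{narrow_from_broad(broad)}]"

print(both("kɔnt"))     # kąt
print(both("ʂt͡ʂɨt"))    # szczyt
```

Deriving the narrow form from the broad one keeps the two in step, which speaks to the consistency concern raised about manual edits; a real implementation would need rules agreed by editors who know Polish phonetics.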

Possible IP range blocks required

A certain editor who got blocked (User:Liedes) has edited using IP addresses ever since the block (for adding incorrect etymologies and personal attacks on editors who reverted the changes), which constitutes block evasion. The IPs have continued the practice of consistently adding incorrect etymologies and etymology-related content for numerous entries (mixed with some less incorrect ones). There seem to be at least three different IPs/IP ranges involved right now. Wiktionary:Etymology_scriptorium/2018/May#Special:Contributions/ concerns the two IP ranges and an additional (broadband?) IP that I've found. Both ranges seem to belong to mobile operators, which may incur some collateral damage, so I'm not certain what the best course of action would be. The original account should also probably be blocked.

The IPs and IP ranges in question are:

Other ranges may also exist, but I'm not certain. SURJECTION ·talk·contr·log· 20:11, 23 May 2018 (UTC)

Additional discovery: Special:Contributions/ SURJECTION ·talk·contr·log· 20:56, 23 May 2018 (UTC)
Let this guy be, dudes. --Genecioso (talk) 20:58, 23 May 2018 (UTC)
You bet I will when the bad edits on etymology sections stop. SURJECTION ·talk·contr·log· 21:02, 23 May 2018 (UTC)
Three /16s is a lot of surface area for a single person, but they do seem to hop around a lot. If someone does block these ranges I suggest blocking IP edits only and allowing account creation. - TheDaveRoss 01:09, 24 May 2018 (UTC)
Yes, I agree. Account creation should still be allowed. SURJECTION ·talk·contr·log· 11:43, 24 May 2018 (UTC)
This is still an issue. is the active range right now as of writing this, but it'll surely change again. SURJECTION ·talk·contr·log· 06:07, 30 May 2018 (UTC)
What exactly is the problem with the anon's edits? — SGconlaw (talk) 06:38, 30 May 2018 (UTC)
Adding clearly incorrect etymologies and refusing to stop doing so. They're interspersed in between less bad edits, but the bad edits are such almost to the point that they could be simply classified as vandalism. There is also a history of personal attacks (see Special:Contributions/Liedes, the original account used before it got a block). SURJECTION ·talk·contr·log· 10:47, 30 May 2018 (UTC)
Could you provide a few examples of entries where incorrect etymologies have been added, and why they are incorrect? Sorry, but I'm unfamiliar with Finnish. — SGconlaw (talk) 14:01, 30 May 2018 (UTC)
A few examples, some even fairly recent:
* Special:Diff/49605000: unsourced and very likely original research suggestion that pieni "small" and piena "cleat" are somehow connected etymologically. I cannot find any sources that would speak for this.
* Special:Diff/49532702: suggesting an "abessive like inflection" to every adverb that ends with -ti, even when it's not an inflection but rather a simple suffix. In this particular case, it's likely just completely wrong.
* Special:Diff/49536367: the editor likes to connect and relate likely unrelated word roots (mostly between Finnic and other languages) like this. This isn't even the worst of them: Special:Diff/49504812 and Special:Diff/49409420 are worse.
* Special:Diff/49448244: suggesting that sileä, a not-recent-at-all word for "smooth" or "flat", is derived from "silkki", a comparatively recent loanword!
I can get some more examples, if necessary. Me and User:Tropylium have spent a while going over and reverting the ones that are clearly wrong. SURJECTION ·talk·contr·log· 14:16, 30 May 2018 (UTC)
Just to add another example, one that got pretty nasty... himmeli (look at the history there). SURJECTION ·talk·contr·log· 14:20, 30 May 2018 (UTC)
I blocked the currently active /16 for a week, let's see how it goes. - TheDaveRoss 14:33, 30 May 2018 (UTC)
Thank you. I don't think it'll work as a remedy for that long because of the IP hopping expected of mobile connections, but it's hard to say how well it will work without trying. SURJECTION ·talk·contr·log· 14:34, 30 May 2018 (UTC)
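
For a sense of scale on the "/16" ranges discussed above, here is a short Python sketch using the standard `ipaddress` module. The address used is a hypothetical one from private address space, not any of the IPs in this thread, and the helper name `covering_block` is made up for the example.

```python
import ipaddress

def covering_block(address, prefix=16):
    """Return the network of the given prefix length that contains address."""
    # strict=False lets us pass a host address rather than the network address.
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)

# Hypothetical address, chosen only for illustration.
block = covering_block("10.20.30.40")
print(block)                    # 10.20.0.0/16
print(block.num_addresses)      # 65536
print(3 * block.num_addresses)  # 196608 addresses across three /16 blocks
```

Each /16 covers 65,536 addresses, which is why blocking anonymous edits while still allowing account creation was suggested as a way to limit collateral damage.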

Request for rollback

Hi. I have started looking at Special:RecentChanges and undoing vandalism. I would like to continue and believe that rollback would be useful to me. I admit that I am blocked on other wikis but have since changed my ways. Kiko4564 (talk) 19:06, 24 May 2018 (UTC)

You should stick around and do some work, then come back once there is evidence of your investment and good judgment, as well as familiarity with policies. - TheDaveRoss 19:16, 24 May 2018 (UTC)
We're not going to give rollback rights to a brand new user we've never seen before. —Suzukaze-c 19:16, 24 May 2018 (UTC)
Rollback isn't a right that is usually handed out to anyone here. —AryamanA (मुझसे बात करेंयोगदान) 19:14, 25 May 2018 (UTC)

Category:United States county index

Completely obliterated by User:Koavf in a fit of vandalism. I very much regret passing him for admin. [5] Can it be restored please? DonnanZ (talk) 21:43, 24 May 2018 (UTC)

Koavf's deletion message said "Previously deleted/failed RFD or RFV". Where was that deletion discussion? DTLHS (talk) 21:51, 24 May 2018 (UTC)
here. User:-sche didn't respond. DonnanZ (talk) 21:57, 24 May 2018 (UTC)
The module was deleted by Koavf as well diff. DonnanZ (talk) 22:11, 24 May 2018 (UTC)

Please see the deletion log from User:-sche: "this is an effort by one POV-pushing user at making an end-run around the clear consensus that entries should be categorized by state, not all lumped into one massive category". There was no consensus for creating this scheme and pushback against its existence; this was the recreation of deleted material against consensus (and as far as I'm aware, without even requesting it). —Justin (koavf)TCM 22:19, 24 May 2018 (UTC)

I think @-sche had second thoughts about that, but didn't express them. DonnanZ (talk) 22:27, 24 May 2018 (UTC)
Further work on counties has been suspended for the time being, needless to say. DonnanZ (talk) 22:34, 24 May 2018 (UTC)
I'd like the broader community to come to a consensus on this. Double-categorizing counties into both state-level categories and a single big category doesn't fit what we do in other cases, e.g. Category:en:Plants seems to mostly contain subcategories (and only a few currently-stray entries); OTOH, there is an argument to be made that categorizing all plants (counties, etc) also into one category that could be perused at a glance (in addition to leaving them in subcategories) would be useful to some readers for some purposes...but it seems to me we should decide to do or not do that in general, not exceptionally for US counties. And iff we were to do that, it would probably make sense to put the entries all into the existing macro-category, not into a new "index" category, which seems like it should rather be an Index:-namespace page. It's unfortunate that the combination of relative heatedness and insignificance which characterize this issue have led relatively few people to comment. - -sche (discuss) 17:33, 26 May 2018 (UTC)
Thanks for responding. Generally I am not against partitioning into subcategories, and I agree a lot of work still needs to be done here, for example on Category:en:Cities. OTOH I feel there can be overcategorisation, which may be OK in English but not suitable in other languages, e.g. for Plants, Birds, Mammals. However in this case (US counties) I have always felt that a master category is definitely of use, the most important reason being that some county names are shared by more than one county, the most extreme example being Washington County which occurs in no less than 30 states. I don't think there is any other category like that. Another example is Category:English surnames which despite having many subcategories currently contains no less than 26,894 entries. I don't think there is any way of reducing that, and it's still growing. United States county index would have around 1800 entries max, it was over 1600 before it was summarily removed. DonnanZ (talk) 18:40, 26 May 2018 (UTC)
  • Due to the apparent general apathy amongst users, I have started a county index on my user page. It has hardly any of the attributes of the deleted index, which worked perfectly, the main bugbear being the lack of links from county entries. It works, but I'm not completely happy with it, so improvement suggestions are welcome. An alternative would be including them as derived terms at county, but my aim is to include parishes in Louisiana and boroughs in Alaska as well. Whether it grows too big for my user page is another matter I have to consider. DonnanZ (talk) 09:29, 29 May 2018 (UTC)

Renaming "Deccani" dcc

I'd like to rename the Deccani language categories (dcc) to use Dakhini, which is overwhelmingly the more common term in modern literature (not to mention that is the name the speakers use). "Dakhini language" gets three times as many hits on Google Books as "Deccani language". Also pinging @DerekWinters. —AryamanA (मुझसे बात करेंयोगदान) 03:02, 26 May 2018 (UTC)

Hm, in Ngram Viewer "Dakhini language" is not common enough to plot, and when I page through Google Books hits, I see only 15 for "Dakhini language" (excluding library subject-heading lists), vs 50+ for "Deccani language". "Deccani Urdu" also outpaces "Dakhini Urdu" 5:1, and searching journal articles with Google Scholar I find the same trend.
When I search for works published after 1990 that mention Dakhini and language, I see 15, vs 18 for Deccani+language, which is admittedly a closer race. And there was a decline in use of "Deccani" and slight rise in "Dakhini" around 1990, but "Deccani" still outpaces "Dakhini" in that Ngram, though it may be helped by the fact that it refers also to things other than the language. (The only reference work I can find which uses either spelling in the title is Ruth Laila Schmidt's 1981 Dakhini Urdu: History and Structure.)
So, I would say the case for a rename is not nearly as clear as you suggest.
Btw, until a recent thread with @Anupam, we called it "Deccan".
- -sche (discuss) 17:43, 26 May 2018 (UTC)

Warning templates

Does anyone else think Wiktionary should include a formal, structured system of warning templates, similar to Wikipedia's? For those who do not know, Wikipedia has a set of templates for each specific violation (eg. BLP, NPOV, vandalism, etc.) with increasing levels of severity. EhSayer (talk) 05:08, 27 May 2018 (UTC)

No. SemperBlotto (talk) 05:09, 27 May 2018 (UTC)
@SemperBlotto: What's your objection? EhSayer (talk) 05:16, 27 May 2018 (UTC)
Far too much effort - I would rather put my energy into building the dictionary. SemperBlotto (talk) 05:19, 27 May 2018 (UTC)
EhSayer is more worried about the abusing Trump being abused. --Anatoli T. (обсудить/вклад) 05:31, 27 May 2018 (UTC)
@Atitarev: You seem to be implying that I support Donald Trump, while I DEFINITELY do not. I just believe that Wiktionary policy should be adhered to. EhSayer (talk) 05:38, 27 May 2018 (UTC)
And I call morons morons, even if it violates some policies. I got your point, though. No stress. --Anatoli T. (обсудить/вклад) 05:44, 27 May 2018 (UTC)
@SemperBlotto: Most of the templates could probably be copied and pasted directly from Wikipedia, with minor changes. From what I know, Wiktionary's general policies are pretty much in line with Wikipedia's. You wouldn't necessarily have to put in effort, as other editors and I will likely be willing to do so. EhSayer (talk) 05:27, 27 May 2018 (UTC)
The problem with warnings is that they take away the immediacy of our response. Yes, there are people who would benefit from a progressively stronger series of polite explanations, but there are lots more who may not know the intricacies of all the formal policies, but they know what they're doing is wrong. For those people, it's better to have them blocked right away rather than waiting until they've been served formal notification and had a chance to respond. Then there are cases such as this one, where the editor in question has been around for years and probably just finds the warning unnecessary and annoying. If they don't get the hint when their edit is reverted, an admin can leave them a note in their own words asking them to knock it off. In a smaller wiki like this, all the canned warnings just come across as empty formalities churned out by faceless people going through the motions- something to just tune out. Chuck Entz (talk) 07:09, 27 May 2018 (UTC)
BTW Chuck's comment here sums up very nicely why shit like "happy birthday" bots on WP set my teeth on edge. Are we trying to assist culture by creating something or do we just want to create 9000 bots that talk to each other about nothing? It's an open fucking question. Equinox 11:32, 27 May 2018 (UTC)
I think this is the "solution in search of a problem". Sometimes we get newbies who are really keen to be part of a wiki and they start greeting people and warning people and they never actually define a word, or translate a word. It's stupid. So: please let's not until it's necessary. We are not big enough yet. Equinox 11:30, 27 May 2018 (UTC)
I hope, in fact, that we never get so big that we have to import all those bureaucratic warning systems from Wikipedia. Wiktionary is far better in that regard, because that way it's much more welcoming to newbies. —AryamanA (मुझसे बात करेंयोगदान) 00:19, 28 May 2018 (UTC)
I agree with Chuck. I doubt such templates would see much use. The current practice of leaving (non-templatized) messages on talk pages (if it seems like a user will engage with feedback, or in block summaries if it seems like a user won't engage, e.g. if they're a repeat vandal), while not without problems, seems sufficient. - -sche (discuss) 15:14, 27 May 2018 (UTC)
Just to clarify. User:EhSayer is seeking how to warn or punish User:Fumiko Take over this edit @Trump, which he has reverted and me, because I said "I personally don't think it was vandalism. Besides, Trump deserves this type of treatment - it's the way he treats everybody around him." without actually restoring Fumiko's edit or demanding it to be restored. Basically, so much fuss over an opinion and he complained here and there and still can't calm down. --Anatoli T. (обсудить/вклад) 05:53, 29 May 2018 (UTC)
I disagree completely with Atitarev's point of view. Trump is an idiot but the original quotation is excessively inappropriate. It might have a home on Wiktionary, but that home is not Trump#English, and absolutely violates NPOV, especially considering the current political climate. Donald Trump is not the only Trump in the world, and IMO vulgar quotes belong on vulgar entries. As for specific warn NPOV templates, they are formalities that have not proven themselves to be necessary (as far as I can tell), but IMO they are harmless. —Suzukaze-c 05:57, 29 May 2018 (UTC)
I also find Atitarev's insistence that EhSayer is specifically pursuing himself and Fumiko Take unjustified and alarming. —Suzukaze-c 06:05, 29 May 2018 (UTC)
@Suzukaze-c: So you think it's OK to persist on seeking punishment over non-issue? He removed the quotation, warned Fumiko Take on her talk page. No more action from me or Fumiko. What else does he want or what else do YOU want? --Anatoli T. (обсудить/вклад) 06:07, 29 May 2018 (UTC)
What punishment? EhSayer added {{warn test}} to Fumiko Take's talk page due to the lack of NPOV, then wondered why we don't have {{warn npov}} (which would be more accurate) like Wikipedia does, which led to this thread. Meanwhile, you were the one who brought politics into it, calling Trump an idiot who deserves it (whether he does or not is unrelated to NPOV policies) and accusing EhSayer of supporting Trump. —Suzukaze-c 06:11, 29 May 2018 (UTC)

@Suzukaze-c: Thank you. That's exactly what I've been trying to say the entire time. EhSayer (talk) 06:16, 29 May 2018 (UTC)

@Suzukaze-c I called Trump a moron, just like you did in your previous comment (idiot, not moron) and I said that in my personal opinion, it wasn't vandalism, expressed my inconsequential opinion but I DIDN'T ask for the edit to be restored. Now User:EhSayer starts complaining. I can only view it as seeking punishment for me, not for Fumiko Take. --Anatoli T. (обсудить/вклад) 06:22, 29 May 2018 (UTC)
@Suzukaze-c: It seems not just about template updates. EhSayer wants to import Wikipedia's punishing policies to Wiktionary with votes. --Anatoli T. (обсудить/вклад) 06:36, 29 May 2018 (UTC)
@Atitarev: You also blatantly said that you don't care about NPOV. I'm not seeking "punishment" for anyone. All I'm attempting to do is make sure Wiktionary policies are adhered to, which is your responsibility as an admin. EhSayer (talk) 06:31, 29 May 2018 (UTC)
@EhSayer: You can start worrying about my responsibilities when I start violating something. You seem to have a lot of time on your hands. --Anatoli T. (обсудить/вклад) 06:36, 29 May 2018 (UTC)

@Suzukaze-c: Do you personally believe that Template:warn npov should be re-created? DTLHS deleted it, and he is currently refusing to re-create it for a vote, citing the fact that I "have no backing". EhSayer (talk) 06:44, 29 May 2018 (UTC)

Much ado about nothing. It wasn't vandalism, and it was removed without trouble. This is not Wikipedia, we don't have thousands of active editors, and we don't need warning templates. The only thing that is alarming is being alarmed over such a trifle. Let us now stop making a big deal about nothing. —Stephen (Talk) 09:10, 29 May 2018 (UTC)
@Stephen G. Brown: Your comment about it "not being vandalism" is exactly what I'm trying to get at. Due to the limited number of templates on this site, I was forced to use a vandalism template where an NPOV one should've been used. The problem was that there was no NPOV template. I then attempted to create one, but it was almost instantly deleted by DTLHS, because I didn't have backing. EhSayer (talk) 13:31, 29 May 2018 (UTC)
I have reverted any number of edits, and have never felt like I was unable to communicate my reasoning for doing so because a template didn't exist. The reality is that this project is not Wikipedia, we have significantly different needs and our processes reflect that. Next time I would suggest that you just type out a message that accurately reflects your intent rather than using a template which has a different meaning. Your edit summary of "rvv" also belies that the primary issue was a missing template.
There is also a lot of needless ad hominem attacking going on in this discussion, it is enough to indicate whether or not you support the proposal and why without feeling the need to call out the actions or intentions of other contributors (this is not directed at EhSayer specifically). - TheDaveRoss 14:01, 29 May 2018 (UTC)
@TheDaveRoss: I did not realize that there was an NPOV policy here at the time of the revision, and I only had a general knowledge of the vandalism policy. I have now read both policy pages. Like I said at User talk:Fumiko Take, I used to edit Wikipedia, and I am relatively new to this site and its policies. EhSayer (talk) 16:40, 29 May 2018 (UTC)

unadapted borrowings

Based on this discussion, I've recently created {{unadapted borrowing}} (with the help of Ungoliant, thanks), to have a way of distinguishing English cubiculum from English cubicle, which have both been borrowed from the same Latin word (i.e. cubiculum). Other examples would be French lapsus vs. laps, processus vs. procès, and probably other cases yet. Note that {{learned borrowing}} doesn't address this issue/is about something else: both French processus and French procès are learned borrowings.

I've described "unadapted borrowings" as "loanwords that have not been conformed to the morpho-syntactic, phonological and/or phonotactical rules of the target language", based on this paper; corrections or suggestions for improvement are welcome.

On another note, we've been saying of English words such as Petrozavodsk or Manasseh that they're "transliterations". But in this discussion, Mahagaja explains that "When used as English words, they're not transliterations. A transliteration is when you write (for example) a Hindi word in the Latin alphabet. But if someone says "I added some methi to the aloos", they aren't speaking Hindi, they're speaking English and using Hindi loanwords. A transliteration can only be found in writing, for one thing, while a loanword can be found in speech. It doesn't make sense to say "transliteration of Hindi जीरा (jīrā)" in the etymology section of an English word."

Based on that, it is my opinion that we should abandon the use of {{translit}} as an etymology template, and that these words would be better described as "unadapted borrowings" too. I thus intend to empty Category:Transliterations of Russian terms (and other similar categories: CAT:Transliterations of Ancient Greek terms, CAT:Transliterations of Hindi terms, etc.) and replace it with Category:Unadapted borrowings from Russian, by replacing all instances of {{translit}} with {{ubor}}.

Any objection? Might I be missing an important distinction?

Edit: at Talk:Փխենյան, Vahagn says that Armenian Փխենյան (Pʿxenyan) is neither a transliteration nor an unadapted borrowing, so the systematic replacement of {{translit}} with {{ubor}} might not be a good idea. However, that does not mean that {{translit}} should be kept either.

@Mahagaja, Vahagn Petrosyan, Ungoliant MMDCCLXIV, and @Erutuon (I think you've been the main user of {{translit}}?) --Per utramque cavernam 11:23, 27 May 2018 (UTC)

It all makes sense but I feel uncomfortable flooding loanword categories with proper nouns - surnames, place names etc. They are so many in every language and virtually any city name/surname may have a transliteration/translation in a different script. Better something like "X surname in Y" where X and Y are language names. --Anatoli T. (обсудить/вклад) 11:41, 27 May 2018 (UTC)
I share Anatoli's concern that systematically using {{ubor}} on place- and personal- names is quite likely undesirable, since for most languages, unadapted borrowings of names will vastly outnumber borrowings of ordinary words. Separating proper nouns into a subcategory (as we do in some other cases), maybe by a parameter in the template, would probably address that issue adequately. Maintaining that distinction (ensuring people use the parameter where appropriate) seems hard, but if anyone was hoping to make distinctions between {{der}}, {{bor}}, {{translit}} and {{ubor}} in the etymologies of borrowed placenames, that is already not being much maintained, heh.
Btw, I'm not convinced "aloo" is an unadapted borrowing; it pluralizes like an English word even in your/Mahagaja's example, unlike some Latin borrowings that retain Latin plurals, and the pronunciation seems to have been anglicized (like, arguably, also the spelling). Several more of the entries that currently use {{translit}} seem like "normal" borrowings rather than mere transliterations or "unadapted" borrowings, e.g. Islam (I just fixed it), and probably also Stalin, both pluralizable and with anglicized pronunciations, and in the case of Islam also with adapted spelling that drops the ʾ. Efforts to maintain a distinction between adapted and "unadapted" borrowings seem likely to be effort-intensive and prone to many errors / gray areas / cases of (possible) disagreement like those.
Which is why I'm unsure: is trying to distinguish "unadapted" borrowings really necessary/desirable? I'm on the fence. - -sche (discuss) 15:50, 27 May 2018 (UTC)
Retiring {{transliteration}} seems sensible, for the reasons Mahagaja mentioned. - -sche (discuss) 15:52, 27 May 2018 (UTC)
@Atitarev, -sche: Yes, I'm concerned about dilution too, and I agree that this endeavour is "prone to many errors / gray areas / cases of (possible) disagreement". I don't know, at first I intended to gather only French latinisms (those at Category:French unadapted borrowings from Latin), then I figured it would be better to generalise. --Per utramque cavernam 10:51, 29 May 2018 (UTC)
If {{transliteration}} is to be retired, then there are implications for Wiktionary:Beer_parlour/2018/May#Format_for_given_names_in_non-Latin_scripts above where I asked for some consensus on whether given names in non-Latin scripts should link to an entry in English because Persian proper nouns were about to be updated, but with no result. One of the main editors of names, User:Makaokalani, has described some given names as transliterations from those languages. That approach (e.g. Negar) is not compatible with this discussion. Kaixinguo~enwiktionary (talk) 11:06, 5 June 2018 (UTC)
For an example of proliferation of terminology, this edit uses "semi-learned borrowing", contrasting with "learned borrowing" and "borrowing". (Discussion above suggests that, in addition to "unadapted borrowings", there are also semi-adapted borrowings.) It might be better to just accept that processus and procès are both in the same category, of learned borrowings; sometimes languages do borrow or even inherit the same word multiple times in multiple ways, and it isn't always possible to distinguish them (for example, newt vs eft, both inherited from the same Old English word, or mishcup, scuppaug, scup and paugie, or the many Portuguese descendants of macula). I sympathize with the desire to distinguish things, but it seems like determining whether or not something has been "adapted" could be non-trivial in many cases: sauerkraut is spelled the same as its German etymon, but it's been in English since the 1600s; some words are borrowed from Latin and pluralize Latinly (but maybe are pronounced Englishly), others pluralize English (or, in French, Frenchly); etc ... - -sche (discuss) 18:29, 4 June 2018 (UTC)

Translation hub label

Wiktionary:Votes/pl-2018-03/Including translation hubs has passed.

I think it would be most beautiful to label translation hubs using translation hub label, which will point to glossary. The look will be like this:

  1. (translation hub) To refrain from speaking; to say nothing.

This makes it possible to state a definition, even a trivial one, while at the same time providing a link to an explanation.

Am I the only one who likes this labeling? --Dan Polansky (talk) 16:46, 27 May 2018 (UTC)

See Appendix_talk:Glossary#"translation_hub". I think it's a misuse of the {{label}} template, and I'm opposed to it. "translation hub" is a made-up Wiktionary term, and should not be put on an equal footing with standard lexicographic/grammatical terms like intransitive or register labels like informal. I maintain that we should continue using {{translation only}}. --Per utramque cavernam 16:50, 27 May 2018 (UTC)
Why is it a misuse? It is a made-up term, but it is one that Wiktionary editors are likely to use to refer to these kinds of entries, one that multiple people seemed to prefer over "translation target". --Dan Polansky (talk) 16:52, 27 May 2018 (UTC)
{{translation only}} was traditionally used as the sole item on the definition line. I find it clumsy; I find it preferable to state a normal definition, even a trivial one. However, a normal definition does not explain that the entry is a translation hub. --Dan Polansky (talk) 16:54, 27 May 2018 (UTC)
As stated on Template:label/documentation, {{label}} has two uses:
  1. "label senses with restricted usage"
  2. "label senses with grammatical information, in addition to that in the part-of-speech heading and headword line"
{{lb|en|translation hub}} accomplishes neither of those.
You're talking about Wiktionary editors, I'm talking about readers. It doesn't matter at all that editors prefer to speak of "translation hubs" over "translation targets" (which I've made no mention of); neither of those should be used in the body of the dictionary, which is about readers.
{{translation only}} is free from any Wiktionary jargon, and thus seems perfectly adequate to me.
To sum up my position: I've no problem with people referring to "translation hubs" and/or "translation targets" on talk pages; I just don't want to see those terms in the main space. --Per utramque cavernam 17:00, 27 May 2018 (UTC)
I agree; "translation only" is easily understandable and not jargon. — SGconlaw (talk) 11:12, 29 May 2018 (UTC)
I don't see "translation hub" as generally less transparent than "translation only", and it is better in pointing to the rationale for having the entry. And since the term in the label contains a link pointing to an appendix providing an explanation, the lack of initial familiarity on the part of the reader does not harm either. The general reader will have to consult the appendix for established terms anyway, for established does not mean familiar to lay readers. The term is lexicographical. It could be preferable to find a term used by other lexicographers, but this is the best term we could find. --Dan Polansky (talk) 17:06, 27 May 2018 (UTC)
"it is better in pointing to the rationale for having the entry": We could very well link to the relevant policy page from {{translation only}}.
"The general reader will have to consult the appendix for established terms anyway, for established does not mean familiar to lay readers": I think there's a difference between using technical state-of-the-art terms that people might not be familiar with, and made-up terms that people will not be familiar with as they could not possibly have encountered them before. The first case is inconvenient but necessary, the second case is inconvenient and unnecessary. I'll grant you that I don't find it terribly inconvenient, but still.
Another way of looking at the issue: teaching people what ergative (which is used as a label) means might be useful; teaching them what "translation hub" means is pointless as they won't see it anywhere outside WT. --Per utramque cavernam 17:29, 27 May 2018 (UTC)
Linking to policies from mainspace is what we usually do not want to do; these are two separate domains. At the very least, we would have to link to Appendix:Glossary from {{translation only}}. And the text produced by the template, "(This entry is here for translation purposes only.)" is already too loquacious; the beauty of "translation hub" is its brevity (4 syllables), combined with its being easy to remember, by my estimate. --Dan Polansky (talk)
In my understanding, our labels serve to indicate that real-world use of a word is restricted to or typical of some context, e.g. it's obsolete (found in only old documents), US-specific, etc. Even when people over-apply topic labels to bring about categorization, it still means that real-world use of the word is typical of that context (e.g. medicine). Even the grammatical labels tell people how a word is used, e.g. "transitive" tells them it's used with an object. But "translation hub" has nothing to do with the real-world use of the word, so it shouldn't be a label, IMO. It's just a Wiktionary-internal signal of why Wiktionary has included the word. I think this approach is good. Some entries alternatively just have definitions and no indication of their translation-hub-ness at all (which seems undesirable; they ought to be tagged/trackable somehow), or they indicate it only with a category (which should IMO be the minimum).
I am sure I saw more kinds of labels than restricted-context ones and grammatical ones. A label is there to label, as the name suggests. --Dan Polansky (talk) 18:14, 27 May 2018 (UTC)
  • I doubt that readers will understand any of "translation hub", "translation target" or "translation only" without an explanation, but if there is a link to an explanation then I guess that is fair enough. However, I do not support this kind of layout:
  1. (translation hub) To refrain from speaking; to say nothing.
This is for reasons mentioned above, namely that "translation hub", or similar phrases, is an internal Wikipedia categorisation and not of the same ilk as other labels. Mihia (talk) 21:50, 29 May 2018 (UTC)
  • "Hot word" is also an internal Wiktionary categorization, and yet appears in the mainspace. I would be fine with having "translation hub" indicated in a way similar to "hot word". An example hot word entry is cronynomics, where it says at the top right, "This English term is a hot word. Its inclusion on Wiktionary is provisional", and the word "hot word" links to Appendix:Glossary#hot word. This is very similar case, since "hot word" is an inclusion criterion, like "translation hub". --Dan Polansky (talk) 06:33, 2 June 2018 (UTC)
    @Dan Polansky: For some reason, I've never been bothered by the {{hot word}} template, though it makes use of WT jargon in the main space. Yet other examples would be {{was wotd}} and {{was fwotd}} (which I've objected to at some point, but am now fine with).
    I think I'd be ok with having "translation hub" in such a place too. Per utramque cavernam 19:30, 4 June 2018 (UTC)
    "Hot word" is a bit different in that it's temporary, but I would still object to using it as a context label. I don't mind making the "translation hub" text into an off-to-the-side thing like "hot word", though. If there are cases where one sense or etymology section or part of speech is included for translations and another is idiomatic, it might be awkward, but still OK. - -sche (discuss) 19:43, 4 June 2018 (UTC)
I updated the half-dozen entries that linked to {{translation hub}} so they link directly to {{translation only}}; I would suggest now turning {{translation hub}} into a hot-word-like template (and bringing it here so we can iron out / agree on format and wording), and then people can go through and replace {{translation only}} with {{translation hub}}, updating the placement of the template and/or adding basic/stub definitions if necessary during the process. - -sche (discuss) 19:47, 4 June 2018 (UTC)

Reflexive verbs

Hahah! You thought there wouldn't be another thread about Polish this month, but here it is! It might affect other languages as well, I can think of at least one example in German. Some verbs mean different things depending on whether they're followed by a reflexive pronoun or not. At least for Polish there's currently two practices, namely 1. making a separate entry for infinitive + reflexive pronoun and 2. listing all definitions at the plain form with a label 'reflexive' where required. I'd like for people to reach some consensus on which to use and then work towards it when they come across forms not fitting it. I like the tidiness of the reflexive label, but I don't know enough Polish to recommend anything. Korn [kʰũːɘ̃n] (talk) 21:14, 27 May 2018 (UTC)

Same problem in French. I never know what to do with those. --Per utramque cavernam 22:32, 27 May 2018 (UTC)
Same deal with Spanish. We sometimes have a separate entry for reflexives and non-refs, often they're together. --Genecioso (talk) 11:50, 28 May 2018 (UTC)
I thought we had reached consensus on this long ago: regardless of semantics, we only have separate entries for reflexive verbs in languages where the reflexive particle is written together (e.g. Spanish casarse, Russian водиться (voditʹsja)), but not in languages where it's written separately (e.g. French se marier [which is a hard redirect], German sich trauen [which I hope is defined at trauen]). By this reasoning (which I vaguely remember having been discussed five or ten years ago, but I'm not going trawling through the archives to find the discussion), everything in Category:Polish reflexive verbs should be moved or merged to the corresponding entries without się. —Mahāgaja (formerly Angr) · talk 12:16, 28 May 2018 (UTC)
If there is such a consensus, that suits me just fine, I'll mind it in the future. Korn [kʰũːɘ̃n] (talk) 12:41, 28 May 2018 (UTC)
That seems like a good approach; centralizing definitions like that is probably the best way to make them findable. Any "verb + another word(s), separated by a space" entries we keep should be linked-to from the main entries, obviously (although they are often not). (Even solidwritten reflexives like casarse should be linked to from casar, and I see it is ... and that casar duplicates its definition.) This is an issue even for English, where 100+ entries have "oneself" in the title, e.g. soil oneself, which is defined at both that entry and at soil... - -sche (discuss) 00:08, 29 May 2018 (UTC)
Casarse is a form of lemma casar. Ideally all definitions should be in the lemma, and the form should have a "form of" template. However, note that some verbs are only used as reflexives: arrepentirse v. arrepentir. In this case arrepentirse is the lemma and arrepentir is just an infinitive form. --Vriullop (talk) 09:38, 29 May 2018 (UTC)
@Mahagaja, do you see the one you were referring to in that lot? I've scanned through them but I didn't see any binding consensus (I'm not disagreeing with what you've said, though). Per utramque cavernam 14:33, 30 May 2018 (UTC)
No, I don't. I thought I had participated in the discussion I'm remembering, but I don't seem to have participated in any of those. It may not have been at the Beer Parlor, and it probably wasn't a binding consensus at all, rather an agreement among the people who were discussing it. Or maybe I'm just remembering what I wish had happened instead of what actually happened. —Mahāgaja (formerly Angr) · talk 14:49, 30 May 2018 (UTC)
Maybe we need some Codex Wiktiorabi. Korn [kʰũːɘ̃n] (talk) 15:13, 30 May 2018 (UTC)
It's relevant for Danish, too, and I've been unsure of what to do. Usually keeping them at the sig-less entries seems most useful, but there are phrasal, reflexive verbs where omitting the sig seems awkward, such as tage sig sammen. Also, is it deliberate that {{lb|xx|reflexive}} doesn't categorize into [cat:Xxx reflexive verbs]? I know there are reflexive pronouns, too, but those tend to be a closed category.__Gamren (talk) 19:00, 2 June 2018 (UTC)
Label definitely needs a fix in that regard. I'd do it, but don't know how. Korn [kʰũːɘ̃n] (talk) 21:39, 2 June 2018 (UTC)
Oh, that's easy. In Module:labels/data, change

labels["reflexive"] = {
	display = "reflexive" }

to

labels["reflexive"] = {
	display = "reflexive",
	pos_categories = { "reflexive verbs" } }

I can do it, but I'll wait and see if there are any objections.__Gamren (talk) 06:39, 3 June 2018 (UTC)
The only impediment I can think of is that we'll need to watch out (in existing entries, and going forward) for "reflexive" being used on something other than a verb. It's used on some pronouns, like [[yourself]] and [[itseni]] and [[ຕົນເອງ]], which should be switched to a label like {{lb|foo|reflexive pronoun}} (which could still display just "reflexive" but not add the verb category) before we make {{lb|foo|reflexive}} start categorizing things as verbs.
I checked a database dump for entries using {{lb|(..|...)|reflexive but not having =Verb= or =Pronoun= sections, and found a couple suffixes, -ózik and -őzik, which however seem to already use a different label ("reflexive suffix") and so are not an issue. I also fixed several verbs mislabelled as nouns, [6] [7] [8] [9].
I have code somewhere I should re-run to find more such header-headword POS mismatches. - -sche (discuss) 07:37, 3 June 2018 (UTC)
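The kind of dump check described above could look roughly like the following Python sketch. It is not the code actually used: the function name `needs_review` and both regular expressions are assumptions, and the label pattern here is stricter than the prefix search quoted above (it skips "reflexive pronoun" and "reflexive suffix").

```python
import re

# Matches {{lb|xx|...|reflexive}} where "reflexive" is a whole label argument.
REFLEXIVE_LB = re.compile(r"\{\{lb\|[^|{}]+\|(?:[^{}]*\|)?reflexive\s*(?:\||\}\})")
# Matches a Verb or Pronoun header at any level, e.g. ===Verb===.
VERB_OR_PRONOUN = re.compile(r"^=+\s*(?:Verb|Pronoun)\s*=+\s*$", re.MULTILINE)


def needs_review(wikitext):
    """True if the page uses the plain reflexive label but has no Verb or Pronoun section."""
    uses_label = bool(REFLEXIVE_LB.search(wikitext))
    has_section = bool(VERB_OR_PRONOUN.search(wikitext))
    return uses_label and not has_section
```

Running this over each page's wikitext from a dump would flag pages like a verb mislabelled under a Noun header, while leaving ordinary verb entries alone.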
Okay, I made a "reflexive pronoun" label, which displays the same as "reflexive" and categorizes into subcats of Category:Reflexive_pronouns_by_language.__Gamren (talk) 14:34, 3 June 2018 (UTC)
OK, searching the database dump from the start of May, I found <100 entries that used the reflexive label and contained a =Pronoun= section, and have switched them to use "reflexive pronoun". I think you can make plain "reflexive" categorize things as verbs now. (I'll add a WT:TODO task of infrequently re-checking for non-verbs using the label.) - -sche (discuss) 22:53, 3 June 2018 (UTC)
I edited WT:ADA to say that non-phrasal reflexive verbs should be entered without sig, and phrasal ones with. I recommend the same thing for English (oneself) and German (sich), but don't think we need a universal rule.__Gamren (talk) 15:49, 11 June 2018 (UTC)

Request for one time manual review of Special:WantedCategories

I create these automatically when I can, but some are mistakes, some haven't been added to the topic category modules, and some should be created by hand. DTLHS (talk) 16:07, 29 May 2018 (UTC)

I see that things like Category:Brazilian English (Category talk:German English) are cropping up again. Separately I am also reminded of how we conflate the terms "regional" and "dialectal", but I won't hijack this thread to discuss that, heh. - -sche (discuss) 05:33, 30 May 2018 (UTC)
Now this entry, prior to my fix, had some interesting categorization... - -sche (discuss) 05:53, 30 May 2018 (UTC)

Position of "category" tags

Should "category" tags, e.g. "[[Category:English basic words]]", always go at the very end of the section for the relevant language, even when that is split into multiple etymologies and the tags do not refer to the final etymology? Mihia (talk) 17:58, 29 May 2018 (UTC)

Yes. DTLHS (talk) 18:00, 29 May 2018 (UTC)
Yes, because it would make them easier for editors to find. Also, if they were placed inside separate etymology sections, HotCat and bots would have to be reprogrammed as currently they assume that all the categories are placed together in a block. — SGconlaw (talk) 18:09, 29 May 2018 (UTC)

OK, thanks. Mihia (talk) 19:10, 29 May 2018 (UTC)
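As a minimal sketch of the convention agreed above, the following Python function gathers the category tags of one language section into a single block at the end. The name `move_categories_to_end` is hypothetical, and the sketch assumes each category tag sits on its own line and that the input is exactly one language section; it is not how HotCat or any existing bot does it.

```python
import re

# A line consisting only of one [[Category:...]] tag.
CATEGORY_LINE = re.compile(r"^\[\[Category:[^\]]+\]\]\s*$")


def move_categories_to_end(section):
    """Return the section text with all category lines collected at the end."""
    body, cats = [], []
    for line in section.splitlines():
        if CATEGORY_LINE.match(line):
            cats.append(line.strip())
        else:
            body.append(line)
    # Drop trailing blank lines so exactly one blank line precedes the category block.
    while body and not body[-1].strip():
        body.pop()
    if not cats:
        return "\n".join(body)
    return "\n".join(body + [""] + cats)
```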

Adjectives v. determiners

Could someone please explain the difference between an adjective and a determiner? At mickle the heading used is "Determiner", and I'm wondering why. Thanks. — SGconlaw (talk) 06:48, 30 May 2018 (UTC)

  • The concept of determiners is less than a century old, and not everyone is sure how useful it is. But the principle is that adjectives "modify" nouns, whereas determiners "introduce" them – a determiner is something that a noun requires in order to be a valid noun phrase. You cannot generally say "house" on its own in a sentence; you have to say "that house" or "my house" or whatever. Unlike adjectives, determiners cannot be graded (something can be redder than something else, but not thatter). With all of that said, however, my understanding is that "mickle" can in fact function as a true adjective meaning "great". Ƿidsiþ 07:10, 30 May 2018 (UTC)
    • thatter/thisser are attested at least once (but that's Ionesco). --Per utramque cavernam 10:33, 30 May 2018 (UTC)
      • In that case it's "thatter way", which looks to me like a rendering of thataway, but I suppose you'd have to look at the Italian to be sure. Ƿidsiþ 10:49, 30 May 2018 (UTC)
The 1932 quote at mickle (for the variant meikle) is for an adjective, which is clear because the phrase already has a determiner: a (Det) meikle (Adj) cock (N). But the 1590 quote (for the variant muchell) is more ambiguous. I think a case could be made for either one there, though my gut feeling is that in "and muchell blood did spend" it's a determiner. I don't know if I can justify that beyond simply saying that "a lot of" is the sort of thing a determiner is likely to mean. —Mahāgaja (formerly Angr) · talk 13:47, 30 May 2018 (UTC)
Eeek. OK. So should we attempt to separate out the adjectival senses of mickle from the determiner ones? (Note that the word is already an adverb, a noun and a pronoun as well.) — SGconlaw (talk) 13:55, 30 May 2018 (UTC)
That's English for you. Yes, based on the quotes, I'd say it should have both a ===Determiner=== and an ===Adjective=== section. —Mahāgaja (formerly Angr) · talk 14:50, 30 May 2018 (UTC)
How do we define the word in the "Determiner" section in a way that is different from its adjectival sense? — SGconlaw (talk) 19:14, 31 May 2018 (UTC)
Another question: is mickle really a pronoun? The supposed pronoun definition is "a large amount or great extent". The noun definition is "a great amount". How is this different? — SGconlaw (talk) 08:58, 1 June 2018 (UTC)

Large numbers

Related active discussion: Wiktionary:Requests for deletion/Non-English#quaterseintsinquantaquàter

So how to handle this?

  1. Include any entries with idiomatic senses beyond the numeric value, of course.
  2. Allow entries for all integers from 0 to 100 (as is current practice).
  3. In addition, only include (single-word and English) entries where only the first significant figure is non-zero (four hundred thousand, six million, etc.)?
  4. Require attestation for any entries above a certain arbitrary threshold (say, 10,000)?
  5. But note that there are existing entries that fail these criteria (such as, for 9000, the following multiword entries: дзевяць тысяч, neuf-mille, níu þúsund, sembilan ribu).

What are your thoughts? -Stelio (talk) 15:28, 31 May 2018 (UTC)

I support your idea only for attested spelled-out forms. — TAKASUGI Shinji (talk) 15:39, 31 May 2018 (UTC)
Just a comment: can't understand the third point. Surely for any number, the first significant digit is non-zero; that's what "significant" means. Imaginatorium (talk) 16:19, 31 May 2018 (UTC)
He means that only the first significant figure is non-zero. — TAKASUGI Shinji (talk) 16:55, 31 May 2018 (UTC)
I still struggle to think of any cases where the first significant figure could ever be zero. Do people ever say things like "zero three thousand and fifty-two"? I do hear things like "zero four hundred", but that is military jargon in reference to a time (04:00, i.e. 4 AM). ‑‑ Eiríkr Útlendi │Tala við mig 18:06, 31 May 2018 (UTC)
Shinji has it right; I've edited to add "only" above. So I'm suggesting, for example, to include attested single-word terms for ...70000, 80000, 90000, 100000, 200000, 300000... but not the numbers in between (unless idiomatic senses also exist). -Stelio (talk) 10:45, 1 June 2018 (UTC)
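To make points 2 and 3 of the proposal concrete, here is a small Python sketch of the numeric test only. The function names are hypothetical, and it deliberately ignores the other points (idiomatic senses, single-word status, attestation), which cannot be decided from the number alone.

```python
def only_first_digit_nonzero(n):
    """True when every digit after the first is zero, e.g. 7, 80, 400000."""
    if n <= 0:
        return False
    digits = str(n)
    return all(d == "0" for d in digits[1:])


def eligible(n):
    """Points 2 and 3: any integer from 0 to 100, plus 'round' numbers above that."""
    return 0 <= n <= 100 or only_first_digit_nonzero(n)
```

So 42, 9000 and 400000 would pass this test, while 101 and 9500 would not.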
  • I think we should include only those large numbers that meet CFI like any other word. That means for WDLs like German and Italian, we keep dreihundertfünfundneunzig and trecentonovantacinque only if there are at least three usages (not mentions) in independent, durably archived works spanning more than a year, and for LDLs like Emilian, we keep terśeintnovantasìnc only if there is at least one mention or usage in a source deemed appropriate by our Emilian editors to be used for reference (some standard Emilian dictionary, reference grammar, or the like). —Mahāgaja (formerly Angr) · talk 12:29, 1 June 2018 (UTC)
If this discussion is about which non-English numerals to include, I agree with Mahagaja (and AFAICR that is what we've been doing).
If this is about which English numerals to include, I am fine with continuing to do what we've been doing there, too. I'm not convinced THUB changes things for numerals (you say "once we have the same sense in any two non-English language, we can also add the number in English as well for translation purposes"), since numerals have their own specific policy which governs them (and says "numbers, numerals, and ordinals over 100 that are not single words or are sequences of digits should not be included in the dictionary"), and considering THUB to overrule it would seem to functionally repeal it, and THUB allows for editor discretion in deleting individual THUBs anyway. - -sche (discuss) 16:23, 1 June 2018 (UTC)

Lexifier etymology template?

I know the issue of derivation from lexifiers has been discussed before, but I'm bringing it up again. Would anyone support creating a new etymology template ({{lex}} or something), with a different arrow for desc trees (maybe or  ?), and categories such as Category:Tok Pisin terms lexified from English (idk if this wording makes sense)? And for ancestors just add the lexifier(s), and have an error message if you add the wrong lexifier. This is a super rough proposal; lmk if anything would present any problems. Also I've been gone for a couple weeks so I might've missed some discussions on this topic, sorry abt that. – Julia (talk• formerly Gormflaith • 23:46, 31 May 2018 (UTC)

It's an intriguing idea. I like the idea of an actual solution for pidgins and creoles, rather than the "let's stick to {{der}}" method that I've been promoting. I don't like the idea of the special category particularly, though. And given the way our data is structured, I'm unsure of how this could be best implemented. —Μετάknowledgediscuss/deeds 00:05, 1 June 2018 (UTC)
@Metaknowledge: Without a category how would this be any different from {{der}}? I feel like the main reason for the etymology templates is categorization. – Julia (talk• formerly Gormflaith • 01:22, 1 June 2018 (UTC)
It would be a specific template (to distinguish from the catch-all nature of {{der}}), and machine-readable (cf. Epantaleo's stuff). It would also allow for easy introduction of categories later, should they be desired. —Μετάknowledgediscuss/deeds 13:40, 2 June 2018 (UTC)
That sounds good to me. If this were implemented, do you think if we add the lexifiers to data modules, a bot could go through and change:
{{der}} and {{inh}} → {{lex}}
  if the third parameter is the lexifier
  if it is the first etymology template in the sequence
all other {{inh}} → {{der}}
I don't think this would cause any problems, but some manual stuff will probably be involved. – Julia (talk• formerly Gormflaith • 16:18, 2 June 2018 (UTC)
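The bot rule above could be sketched in Python roughly as follows. This is only an illustration under several assumptions: `convert` is a hypothetical name, {{lex}} does not exist yet, the lexifier is taken to be the source-language code in the second positional slot of {{der|lang|source|term}}, only {{der}} and {{inh}} are counted when deciding which template comes first, and nested templates are not handled.

```python
import re

# Matches {{der|...}} or {{inh|...}} with no nested templates inside.
ETYM_TEMPLATE = re.compile(r"\{\{(der|inh)\|([^{}]*)\}\}")


def convert(etymology, lexifier):
    """Rewrite the first matching template to {{lex}} and any other {{inh}} to {{der}}."""
    seen_first = False

    def repl(match):
        nonlocal seen_first
        name, args = match.group(1), match.group(2)
        params = args.split("|")
        is_first = not seen_first
        seen_first = True
        # Assumption: the source language sits in the second positional parameter.
        if is_first and len(params) >= 2 and params[1] == lexifier:
            return "{{lex|" + args + "}}"
        if name == "inh":
            return "{{der|" + args + "}}"
        return match.group(0)

    return ETYM_TEMPLATE.sub(repl, etymology)
```

Anything the pattern does not recognise is left untouched, so it would end up in the manual pile.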