Wiktionary:Beer parlour/2018/December

References for Vietnamese readings listed under Template:vi-readings

I would like to add superscript references for readings of Vietnamese Han characters using the following code as a suggestion:

| hanviet = giả - tdcn
| nom = giả - tdcn;gdhn, giã - tdcn, giở - tdcn, rả - tdcn, trả - gdhn;btcn, dã - gdhn

The abbreviations used are: tdcn = {{vi-ref|Nguyen (2014).}}; gdhn = {{vi-ref|Trần (2004).}}; btcn = {{vi-ref|Hồ (1976).}}

The desired output, using 者 as an example, is as follows:

Han character

: Hán Việt readings: giả[1]
: Nôm readings: giả[1][2], giã[1], giở[1], rả[1], trả[2][3], dã[2]


Currently, this is also achievable using the bulkier code below:

| hanviet = [[giả#Vietnamese|giả]]<ref name="tdcn">{{vi-ref|Nguyen (2014).}}</ref>
| nom = [[giả#Vietnamese|giả]]<ref name="tdcn"/><ref name="gdhn">{{vi-ref|Trần (2004).}}</ref>, [[giã#Vietnamese|giã]]<ref name="tdcn"/>, [[giở#Vietnamese|giở]]<ref name="tdcn"/>, [[rả#Vietnamese|rả]]<ref name="tdcn"/>, [[trả#Vietnamese|trả]]<ref name="gdhn"/><ref name="btcn">{{vi-ref|Hồ (1976).}}</ref>, [[dã#Vietnamese|dã]]<ref name="gdhn"/>

If possible, could someone edit Module:vi so that the suggested code in the first paragraph would give the desired output? KevinUp (talk) 15:49, 1 December 2018 (UTC)
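The requested expansion is essentially a string transformation from the compact syntax to the hand-written wikitext. Below is a minimal sketch in Python, not actual Module:vi code (which is Lua); the function name `expand_readings` and the `REFS` table are hypothetical. It assumes readings are separated by ",", a reading and its sources by " - ", and multiple sources by ";", and it relies on MediaWiki's named `<ref>` tags so that the footnote numbering [1], [2], [3] is left to the Cite extension.

```python
# Hypothetical abbreviation table, copied from the proposal above.
REFS = {
    "tdcn": "{{vi-ref|Nguyen (2014).}}",
    "gdhn": "{{vi-ref|Trần (2004).}}",
    "btcn": "{{vi-ref|Hồ (1976).}}",
}


def expand_readings(value: str, seen: set) -> str:
    """Expand 'giả - tdcn;gdhn, giã - tdcn' into linked readings with <ref> tags.

    `seen` is shared between |hanviet= and |nom= so that each source is
    defined in full once and reused by name afterwards.
    """
    out = []
    for item in value.split(","):
        reading, _, codes = item.partition(" - ")
        reading = reading.strip()
        text = f"[[{reading}#Vietnamese|{reading}]]"
        for code in (c.strip() for c in codes.split(";")):
            if not code:
                continue  # reading listed without any source
            if code in seen:
                text += f'<ref name="{code}"/>'  # reuse an earlier definition
            else:
                seen.add(code)
                text += f'<ref name="{code}">{REFS[code]}</ref>'
        out.append(text)
    return ", ".join(out)


seen = set()
print(expand_readings("giả - tdcn", seen))
print(expand_readings(
    "giả - tdcn;gdhn, giã - tdcn, giở - tdcn, rả - tdcn, trả - gdhn;btcn, dã - gdhn",
    seen,
))
```

Run on the |hanviet= and |nom= values of the proposal, in that order and with one shared `seen` set, this should reproduce the bulkier hand-written code above.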

@Suzukaze-c Hi. If you have the time, would you mind comparing the desired output above with 者#Vietnamese? I can't figure out how to implement this within the module. KevinUp (talk) 06:50, 7 December 2018 (UTC)
Done, I think. —Suzukaze-c 07:01, 13 December 2018 (UTC)
Thank you very much! Also, I'd like to mention here that Template:vi-hantu is now officially deprecated and will be replaced by Template:vi-readings (The former template contains readings imported from the Unihan database, which fails to distinguish between Hán Việt and Nôm readings. Previous discussion can also be found here).
Besides that, the Nom Foundation database does contain some mistakes and unverified readings, such as hoả for [1], which is why I wanted to list readings based on what is found in the original reference source. All readings will eventually be given superscript references, but it will take some time for this to be done. KevinUp (talk) 14:34, 13 December 2018 (UTC)

unchanged plural

What exactly does "unchanged plural" mean in the usage note for craft? That's not the general terminology used in Wkt, is it? --Backinstadiums (talk) 16:38, 2 December 2018 (UTC)

I've changed it to be explicit: "The plural craft is used to refer to vehicles. All other senses use the plural crafts." Ultimateria (talk) 19:12, 2 December 2018 (UTC)

Inevitable discussion about reference works from non-Latin cultures

Given that the |lang= issue with {{quote-web}}, raised on this month’s Grease pit page, invites opening up all reference templates, it has become opportune to uniformize their content. It has caught my eye that multiple ways of displaying references are lurking for works published in a script other than that of the Romans, namely where the author name and, of course, the title are written in some other script. To my great surprise, and contrary to Wiktionary’s usual laudable Unicode and internet-standard compliance, I encountered reference templates already created here that did not even include the original title but wrapped it in {{xlit}}, so that only a transliteration of it remained. The same had been done with the names of the authors, so that I could not recognize any of the books and almost failed to find the existing templates with Wiktionary’s search function, having already prepared, in vain, to create them.

So I reasoned that, since we are in late 2018 and our character set is unlimited as far as the languages used for science in the Modern Age are concerned, the templates must all be uniformized so that the original title is displayed. I also hold that transliterations should be discarded for unambiguous scripts, since they are no gain for anyone (if you don’t know the language, you don’t know the transcription either, apart from the negligible case of someone who knows a language written in a non-Latin script but can read only the Latin script), and “En.Wiktionary entries already have too much wasted space”, as @-sche acutely observed on this month’s Grease pit page, something that has also been voiced as a cause of displeasure.

There may be little established practice for reference sections in works containing non-Latin references, but naively and naturally, following how my computer does it, I have always ordered references with the Latin names first and then the Cyrillic ones, and so I have come to believe that original-script author names can be had easily. People might, however, find Latin-transliterated names more appealing, though I suspect those are less iconic; but this is of limited importance for names. It can be grave for logographic writing systems, some of which are still in use, because people’s names in particular can use the most arbitrary characters, and it would be utterly impossible to reconstruct the original name without browsing the web again, only to find a name that a Wiktionary editor has needlessly left out. Currently the Japanese reference templates use every format.

So how does Wiktionary look upon all these factoids? What should references contain, perhaps with distinctions by writing system? I’d like to see transcriptions of titles in alphabetic and syllabic scripts removed completely, because they have no practical use, and I would sort references by Unicode (I don’t actually know how the Chinese sort their reference sections, and perhaps one feels that transliterated Japanese titles could somehow help; I avoid talking about those scripts). Plus, why did people even think that |title= and the author parameters were the correct place for transliterations or transcriptions? These could easily be separate parameters, |tr-title=, |tr-author= and so on, which can be expanded for those who need them (whose existence I deny), and this would make reference templates use more predictable parameters. This of course entails, at a minimum, that we have original-script titles; come on, are readers supposed to reverse-transliterate titles? Author names perhaps in both, since one might not know the script but know the author from other publications in other, Latin-script languages? But this is not generally true, though adapted author names are often around. Avicenna is quite iconic, no need for اِبْن سِينَا (ibn sīnā), but that is more often the case for classics and applies to quotation templates. What does iconicity tell us here? And I have not even mentioned how often title translations should be given, for which a parameter already exists. There is also the issue of quotation templates containing bare long titles, and there are a few “click to expand” solutions for these, as I remember. Pinging some people I find interesting to hear or who are interested: @Sarri.greek, Eirikr, Sgconlaw, Dan Polansky. Fay Freak (talk) 00:29, 5 December 2018 (UTC)

I’m sorry, could you summarize all that? I’m having trouble understanding what your concerns are. — SGconlaw (talk) 01:51, 5 December 2018 (UTC)
@Sgconlaw I wanted to uniformize references to books written in a non-Latin script a bit, raising the questions whether the original script of a) the author name and b) the title should be shown, and whether transliterations of c) the author names and d) the titles should be shown. I was just setting out many pros and cons. My conclusion is to vehemently affirm b), deny d) (hardly valuable clutter) and lean towards a); I am rather open to c), but it would need to look good enough (the Chinese reference page KevinUp linked does it well, but I think we need |tr-author= for this). Fay Freak (talk) 19:35, 5 December 2018 (UTC)
I say show both the original and the transliteration; in the future we will be able to customize this to everyone's satisfaction with CSS magic. Crom daba (talk) 03:42, 5 December 2018 (UTC)
Yes, and at the least the transliteration does not belong in |title=, otherwise there can be no CSS magic. There need to be separate fields for original titles and author names and for their transliterations; I don’t think I can be wrong here, @Sgconlaw. The arguments about what to display are given above. The decision about display should not be influenced by limited ways of storing the information. Fay Freak (talk) 19:35, 5 December 2018 (UTC)
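The point about keeping transliterations out of |title= is a data-modelling one: if the original-script text and its romanization live in separate fields, the display can be switched per reader without losing either. The sketch below is hypothetical Python, not existing Wiktionary template or module code; `tr_author` and `tr_title` stand in for the proposed |tr-author= and |tr-title=, and the `show_tr` flag stands in for a CSS or user preference. The example uses the Starostin work discussed in this thread, with its title in the original Cyrillic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """A reference whose original-script fields are kept apart from romanizations."""
    author: str                        # original script, always shown
    title: str                         # original script, always shown
    year: int
    tr_author: Optional[str] = None    # hypothetical |tr-author=
    tr_title: Optional[str] = None     # hypothetical |tr-title=
    trans_title: Optional[str] = None  # English translation of the title

    def render(self, show_tr: bool = True) -> str:
        """Format the reference; show_tr models a per-reader display setting."""
        author = self.author
        title = self.title
        if show_tr and self.tr_author:
            author += f" ({self.tr_author})"
        if show_tr and self.tr_title:
            title += f" ({self.tr_title})"
        if self.trans_title:
            title += f" [{self.trans_title}]"
        return f"{author} ({self.year}). {title}."


starostin = Reference(
    author="Старостин, Сергей",
    title="Реконструкция древнекитайской фонологической системы",
    year=1989,
    tr_author="Starostin, Sergei",
    tr_title="Rekonstrukcija drevnekitajskoj fonologicheskoj sistemy",
    trans_title="A Reconstruction of the Phonological System of Old Chinese",
)
print(starostin.render(show_tr=False))  # original script and translation only
print(starostin.render())               # romanizations added in parentheses
```

Because nothing is ever stored only in romanized form, hiding or showing transliterations is purely a presentation choice.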
Here are the formats used for Chinese references: Wiktionary:About Chinese/references, Korean references: Wiktionary:About Korean/references and Vietnamese references: Wiktionary:About Vietnamese/references. Also, all Chinese quotations and usage examples (whether it is cited from a book, song, video or the web) are provided using Template:zh-x. A list of abbreviations for well known references used by this template can also be found at Module:zh-usex/data. KevinUp (talk) 04:08, 5 December 2018 (UTC)
@KevinUp The Chinese reference page is great, up to the point where I find: “Starostin, Sergei (1989). Rekonstrukcija drevnekitajskoj fonologicheskoj sistemy (A Reconstruction of the Phonological System of Old Chinese)”. Why is the Russian title not given in Russian script when the Chinese titles are given in Chinese script only (and not in Pinyin)? There is no logic to it.
There is also the issue of some titles being translated and some not, but that’s minor. Fay Freak (talk) 19:35, 5 December 2018 (UTC)
@Fay Freak: I'm not sure why the work by Sergei Starostin was not written in the Cyrillic script. I tried to trace the source of that work, and this is what I managed to find: [2]. Unfortunately I was unable to trace the original source. Perhaps someone else could help by looking up the bibliography of Sergei Starostin.

Phonological reconstructions for Early Zhou, Classical, and Middle Chinese are based on Sergei Starostin's version as originally published in: [Starostin, Sergei. Rekonstrukcija drevnekitajskoj fonologicheskoj sistemy [Reconstruction of the Phonological System of Old Chinese]. Moscow, 1989.] Particular reconstructions are transliterated into the UTS from S. Starostin's etymological database of Chinese characters (bigchina.dbf), available online at http://starling.rinet.ru.

As to why Chinese titles are given in Chinese script only and not in Pinyin, this may have been done to prevent a cluttered appearance of the reference works. Also, it seems that pinyin tone marks are omitted for Chinese reference works in Yale University Library's Quick Guide on Citation Style for Chinese, Japanese and Korean Sources: APA Examples. KevinUp (talk) 16:32, 6 December 2018 (UTC)

Adding pinyin for numbers in Chinese (Mandarin?) example sentences

@Dokurrat, KevinUp, Justinrleung, Suzukaze-c, Tooironic, Wyang & co. (alphabetically organized) I added pinyin for the numbers in a Mandarin Chinese example sentence, and that pinyin was removed (see [3]). I think we should give the pinyin for the numbers (maybe?). I'm okay either way; in fact, I don't think we need to do all sentences one way (no pinyin for numbers in example sentences) or all the other way (pinyin for all numbers in example sentences). But I'm not sure. idk. I'm just putting it out there for y'all to discuss. Any which way is fine to me. --Geographyinitiative (talk) 04:30, 5 December 2018 (UTC)

No, I don't think we should add pinyin for Arabic numerals. Dokurrat (talk) 04:41, 5 December 2018 (UTC)
I like the idea. I usually do it for Japanese. —Suzukaze-c 04:42, 5 December 2018 (UTC)
I'd like to see the numbers as pinyin, because they are read according to their Mandarin pronunciation. Also, depending on context, they can be read as cardinal numbers or as standalone digits:
365天  ―  sānbǎiliùshíwǔ tiān  ―  Three hundred and sixty-five days.
員工365失踪了 / 员工365失踪了  ―  Yuángōng sānliùwǔ shīzōng le.  ―  Employee no. 365 is missing.
KevinUp (talk) 05:04, 5 December 2018 (UTC)
^ this. —Suzukaze-c 05:18, 5 December 2018 (UTC)
Agreed that we should add pinyin conversion for Arabic numerals. ---> Tooironic (talk) 06:09, 8 December 2018 (UTC)
It has to be added manually, of course, otherwise we are asking for future conversion errors. Perhaps re-transliterated numbers need to be displayed differently, so that it is clear whether a transliterated "365" stands for 三百六十五 (sānbǎiliùshíwǔ, “three hundred sixty five”) or 三六五 (sānliùwǔ, “three six five”). A different colour, or underlined? Also, maybe a trick is needed to use a hidden "三百六十五"/"三六五" but display "365", so that manual pinyin is not required? BTW, @KevinUp: I have suppressed the display of "365" in your example with @. --Anatoli T. (обсудить/вклад) 07:15, 8 December 2018 (UTC)
@Atitarev: Automatic pinyin transliteration of Arabic numerals can be done by adding pronunciation data of 0-9 to data.polysyllable_pron_correction in Module:zh-usex/data. However, this would render "365" as 三六五 (sānliùwǔ, “three six five”). Manual input would still be needed if "365" is intended to be read as 三百六十五 (sānbǎiliùshíwǔ, “three hundred sixty five”). KevinUp (talk) 14:45, 8 December 2018 (UTC)
@KevinUp: I understand. As I said, what we need is a new method in the module that takes the transliteration from hidden characters (in this case "三百六十五", used for transliteration purposes only, giving "sānbǎiliùshíwǔ") but displays an unlinked "365" in the Chinese text. --Anatoli T. (обсудить/вклад) 04:16, 9 December 2018 (UTC)
This seems to be slightly complex, so we may have to add this to Wiktionary:About Chinese/tasks. KevinUp (talk) 04:25, 9 December 2018 (UTC)
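KevinUp's point, that a 0-9 pronunciation table alone yields only the digit-by-digit reading, can be illustrated with a small Python sketch. This is not Module:zh-usex code (which is Lua), and `digit_by_digit` and `cardinal` are hypothetical names. Simplifying assumptions: the cardinal reader covers only 0 to 9999, always uses èr rather than liǎng before bǎi/qiān, and writes 一 as yī without tone sandhi. Which of the two readings applies still has to come from a manual annotation or a hidden-character mechanism such as the one proposed above.

```python
# Hypothetical helpers for illustration; not part of Module:zh-usex.
DIGITS = ["líng", "yī", "èr", "sān", "sì", "wǔ", "liù", "qī", "bā", "jiǔ"]
UNITS = ["", "shí", "bǎi", "qiān"]  # ones, tens, hundreds, thousands


def digit_by_digit(numeral: str) -> str:
    """Read each digit separately, as in 三六五 (e.g. an ID number)."""
    return "".join(DIGITS[int(ch)] for ch in numeral)


def cardinal(n: int) -> str:
    """Read 0 <= n <= 9999 as a cardinal number, as in 三百六十五."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return "líng"
    s = str(n)
    parts = []
    zero_pending = False  # a run of internal zeros is read once as líng
    for i, ch in enumerate(s):
        d = int(ch)
        pos = len(s) - 1 - i
        if d == 0:
            zero_pending = True  # trailing zeros are simply never read
            continue
        if zero_pending:
            parts.append("líng")
            zero_pending = False
        if d == 1 and pos == 1 and i == 0:
            parts.append("shí")  # 10-19 begin with 十, not 一十
        else:
            parts.append(DIGITS[d] + UNITS[pos])
    return "".join(parts)


print(cardinal(365))          # the 365天 reading
print(digit_by_digit("365"))  # the employee-number reading
```

The digit reader takes a string rather than an integer so that leading zeros in identifiers are preserved.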

Wiktionary lemmas written in a nonnative script

As Wiktionary grows, I noticed some unusual entries written in a nonnative script, such as 0.5#Chinese and の#Chinese, that qualify under Wiktionary:Criteria for inclusion and may also have passed Wiktionary:Requests_for_verification due to their widespread use in a particular language or region. However, I think that it might be better to list such entries (that have passed RFV) in an appendix or separate namespace, or to put a banner right below the language header to inform our readers that the lemma is written in a nonnative script, along with categorization. KevinUp (talk) 15:14, 5 December 2018 (UTC)

Out of curiosity, do we have Arabic, Greek, Hebrew, Hindi, Russian lemmas that are written in the Latin script, for example? I've also found Category:Terms written in foreign scripts by language, but only Chinese, Japanese and Korean are listed in this category. KevinUp (talk) 15:24, 5 December 2018 (UTC)
Category:Chinese terms written in foreign scripts DTLHS (talk) 15:26, 5 December 2018 (UTC)
These entries are rather interesting: fighting#Chinese, friend#Chinese, part-time#Chinese. Yes, I've heard these terms used in real life, such as in TVB dramas, but I am surprised to see these entries included in Wiktionary. I would like to propose for such terms to be listed in an appendix or separate namespace, because such entries are more likely to be found in an informal dictionary such as an A-Z pocket slang dictionary, rather than a formal dictionary. KevinUp (talk) 15:55, 5 December 2018 (UTC)
The issue has come up before, with marketing being used (in Latin script) in Greek texts. Wiktionary:Beer parlour/2017/September § Modern Greek terms spelt with Latin characters. See also this revision history for a recent disagreement. I'm not comfortable at all with including that sort of things. Per utramque cavernam 16:15, 5 December 2018 (UTC)
Foreign script is a strong argument for code-switching. Even when it is used constantly in Greek it can be the case that it never passes into Greek, and it is no loss not to add it either because the English entry suffices (you read a Greek text, look up a word here but find it as English, that’s enough, you don’t expect anyway that all that you read is in the dictionary as Greek). Fay Freak (talk) 19:39, 5 December 2018 (UTC)
Script is secondary to the actual spoken language, and usage of words should be analyzed for codeswitching, and for what-language lexicon a word belongs to. French has fr:American way of life#Français and fr:web design#Français, and Japanese has サード (sādo, third) and ホエールウォッチング (hoēru wotchingu, whale watching); are these "acceptable"? —Suzukaze-c 19:42, 5 December 2018 (UTC)
Maybe we need to find a way to represent code-switching? It would seem like a common pattern for a foreign word to have a code-switched variant (with foreign pronunciation, in a foreign script) and a nativized one (being closer to the native language's phonology, spelled in the language's native script) with the first one being extremely common and the second at the edge of attestability, but due to our policies we only include the second one and create a distorted picture of actual usage patterns.
I remember @Vahagn Petrosyan having something to say about this. Crom daba (talk) 20:06, 5 December 2018 (UTC)
I create a Usage note, as in վարագույր (varaguyr). --Vahag (talk) 12:18, 6 December 2018 (UTC)
Yes, I think that we need to find a way to represent code-switching. Rather than using foreign script as an argument for code-switching it might be better to decide based on the pronunciation of the entry.
I would like to suggest for entries such as (1) part-time#Chinese, (2) PK#Chinese, (3) SUS#Japanese that have been nativized to become closer to the phonology of the language it was borrowed into (despite retaining its nonnative script) to be accepted as legit entries whereas entries such as (1) fighting#Chinese, (2) fr:American way of life#Français, (3) の#Chinese that are found mostly in written form but rarely in spoken conversations are to be put under some sort of banner to inform our readers that such entries are of unconventional usage and are mostly written for stylistic effect. KevinUp (talk) 16:32, 6 December 2018 (UTC)
Alternatively, we should set up some sort of guideline to decide whether or not an entry is considered code-switching or not. KevinUp (talk) 06:50, 7 December 2018 (UTC)
Yes, language-specific CFI are needed. --Anatoli T. (обсудить/вклад) 07:17, 8 December 2018 (UTC)
I think that the issue of the script is a bit of a red herring. Take the originally English word online, which has become commonplace in many languages, including Serbian. Now when Danas, a major newspaper, uses the word, they write for example “Srbi sve više kupuju online.” The Politika newspaper is also written in Serbian but uses Cyrillic script; when they use the word, they write for example “Политика Online”, as they in fact do on every page of their website. It would be strange to consider the use by Danas a loan word but the use by Politika a case of code switching, merely because one happens to use Roman script and the other Cyrillic for what is the same language.  --Lambiam 17:30, 8 December 2018 (UTC)
In this particular case, the spelling is a strong indicator of code-switching, as Serbian orthography is phonemic and (unlike Croatian) strongly prefers transcribing foreign names and terms. You could consider onlajn (abundantly attested) a nativized variant, although arguably the choice between these spellings is a matter of personal style. Crom daba (talk) 18:00, 10 December 2018 (UTC)

For an example in English, Москва is citeable (Citations:Москва) but was deleted (Talk:Москва), and Citations:ἄρχων is also citeable (as are, I expect, Arabic-script forms of Allah and PBUH, etc). (An older Chinese example is Talk:Thames河, deleted in 2011.) - -sche (discuss) 17:54, 8 December 2018 (UTC)

When I read, “With absolute confidence I can boast that my Frittelle di Fiori di Zucca are the best in the world”, I don’t think, “Oh, perhaps we should consider including an entry for the English term frittella di fiori di zucca.” No, I think this is an instance of code switching, and in this case one of a very common type. I think we should not have an English entry oliebol either. Although the term can be found in English texts, it is obviously a Dutch word. There is a need for a test or criterion for when the use of a foreign term is simply code switching, and when the term becomes part of the lexicon of a borrowing language. As I’ve tried to argue above, being written in a different script is not a litmus test. Being included in quotation marks is a strong indicator of not being seen as part of the lexicon, but not all authors will use these when code switching. When the imported term becomes subject to local inflection, or can serve as a component to form new compound words, this is a strong indicator of having become lexicalized, but as a test this does not work for analytic languages like Mandarin.  --Lambiam 12:33, 9 December 2018 (UTC)
In personal experience, code switched fragments can very easily be inflected and are likely to be joined in compounds to attach them to native sentence structure. Also, lexicalized loans are likely to have defective inflection.
Pronunciation is also no good, since it is extremely speaker and context dependent, and lexicalized loans can themselves have a special phonology. Crom daba (talk) 18:07, 10 December 2018 (UTC)
I don't think we can have a coherent policy or test across different languages. Speakers of different languages will absolutely differ in their criteria for what counts as a native word. This is even more difficult with global languages like English, where different communities are in contact with a huge variety of other languages from which to borrow. DTLHS (talk) 18:23, 10 December 2018 (UTC)
Good words, @Crom daba. I want to point out how language is really written on the internet. In printed works, or works inspired by print practices, many things don’t happen that are unproblematic elsewhere, in unrestrained speech where people can develop their own standards or own morals, unspooked by societal expectation, to put it in Stirnerian language. Remarkably, nowadays in Russian chats (and I mean those where discussions take place and people try to write correctly), people just write some foreign words in foreign script and then immediately attach Russian endings in Cyrillic script to them. It’s also the way I think and do it: writing Russian in Germany, referring to things in Germany without having a notion of a Russian equivalent, I just write German or English words in Latin script and decline them in Russian, with the endings in Cyrillic script (without a space, I mean; most iconic, I think), and this does not make them Russian. I often wonder: “Is this word Russian already?” There are some obvious ones that do exist; for example, everyone uses the word терми́н (termín) for appointments in Germany, a word that does not exist in Russia, and for a long time I did not even know that it doesn’t, it seems so indispensable. This middle ground of dubiosa (is this English or Latin, huh? Not English, for lack of spread) is left out by me and other dictionary editors, often because these words have limited relevance to the wider world and one would look them up in German dictionaries anyway (as I said earlier, an entry in one language suffices; a Greek entry marketing (marketing) is otiose), plus they are CFI-problematic (the best one can do is quote them from forums and comments under articles, perhaps with archive links, but that’s it; these Soviets here don’t produce a corpus that would help to quote Russian as spoken in Germany).
Separating the words is even more difficult in inter-Slavic conversations. For example, is менто́вка (mentóvka, mint liquor as popular in Bulgaria) Russian? It is used in Russian texts here and there, obviously with Russian endings, but is it perceived as Russian? (To use a German legal term with no equivalent in English: how does the Verkehrsanschauung or Verkehrsansicht see it?) I have also read quite a lot of strange words from Russian expats in Serbia and the like; you could make large lists of such words if you wanted to. Theoretically this could lead to words in Russian written with Cyrillic characters we thought do not exist in Russian. I make the strange observation here that Latin-script words with foreign diacritics pass more easily into texts of other languages, whereas the Cyrillic-script languages tend to transcribe everything; i.e. a Russian text with ђ is way more weird than Vietnamese diacritics, Semitic transcriptions and whatever else you can imagine in English texts. And that’s only in Europe; elsewhere things become crazier, which others can describe better.
On the phonetic point, note that legit French words contain pharyngeal fricatives, like hebs (prison, can) and hnouch (popo, bacon). An issue also arises here if we know that a word has passed into French or English and you can attest it from songs (if they have been printed on CD, are buyable as downloads or are otherwise unlikely to vanish, and so are durable). The flip side of words written in a non-native script are words which have passed into a language but cannot be written in the native script, or only with uncertainty. English example: gwop (moolah).
Normal dictionaries largely avoid such problems because they leave out exoticisms, i.e. words for things that do not exist in the area where the documented language community lives. With this I lean towards an exclusion ground: if a word in English is for a foreign thing and the Verkehrsanschauung does not see the word as English, then it is not English. Confer mesdemet! This is “not really English”. What applies to abstracta, then? What is Greek marketing, then? The criterion I have just stated becomes difficult for foreign “ways of life”. Maybe Greek marketing is not actually Greek, because whoever uses such a word ceases to think like a Greek, regardless of the script it is written in. There are many gross things written and said in Arabic or Hindi texts that I would for this reason see as not Arabic and not Hindi. And the same criterion can apply to determine whether a word has passed from German into Russian.
The issue gets complicated however because there is not only code-switching for Wiktionary but there is also Translingual: You could make a case for “marketing” being Translingual and not only English. I have argued already (User talk:Fay Freak § Translingual) for grammatical terms like genitivus absolutus, status constructus and the like being Translingual in the first place. Maybe “marketing” is translingual because teachers of business and marketing have made it so ex cathedra, which is why it is used in Greek, never able to become Greek. Fay Freak (talk) 20:02, 10 December 2018 (UTC)
Agree. Crom daba (talk) 23:18, 10 December 2018 (UTC)

Linking elements of a term in {{en-noun}}

At l'esprit de l'escalier, should the individual elements of the phrase in {{en-noun}} be linked to French words, like this: {{en-noun|head=[[l'#French|l']][[esprit#French|esprit]] [[de#French|de]] [[l'#French|l']][[escalier#French|escalier]]}}? (Pinging @Per utramque cavernam as we discussed this on the entry talk page.) — SGconlaw (talk) 17:11, 5 December 2018 (UTC)

No. They should be linked in the etymology. DTLHS (talk) 17:13, 5 December 2018 (UTC)
In the case at hand, the link is to an entire French term, esprit de l'escalier. Where should the individual elements be linked, or do we just not link them in this case? I was thinking that since the elements of a term in {{en-noun}} are usually linked by the template anyway, it makes sense to include the links to the French words manually. — SGconlaw (talk) 17:20, 5 December 2018 (UTC)
Those links are one click away. Theoretically it can be different if a French phrase exists only in English or an other language, the French not being CFI-compliant as French. Fay Freak (talk) 19:43, 5 December 2018 (UTC)
I usually link to component multi-word terms of a term if they reflect the sense of that term, eg, black sugar maple would link to black and sugar maple. And, as Fay Freak says, the individual words are just one more click away. It seems unhelpful to make a user guess at whether there are multiword components and which grouping leads to a possible entry. DCDuring (talk) 23:08, 5 December 2018 (UTC)
The {{en-noun}} template links to English terms, though. In this case, the terms are French, so it's not appropriate to link. —Rua (mew) 10:45, 6 December 2018 (UTC)
Generally, yes, but arguably not exclusively. For example, sometimes when an element is not present in the Wiktionary (for example, a person's name), I've seen a link to an English Wikipedia article. I see no reason why links can't be to other languages where appropriate. — SGconlaw (talk) 12:01, 6 December 2018 (UTC)
Because, again, {{en-noun}} creates English links. If you put a French word in there, it will still be an English link. A dead link, moreover. —Rua (mew) 18:43, 10 December 2018 (UTC)
No, it works fine. Try pasting {{en-noun|head=[[l'#French|l']][[esprit#French|esprit]] [[de#French|de]] [[l'#French|l']][[escalier#French|escalier]]}} at Wiktionary:Sandbox. — SGconlaw (talk) 07:07, 13 December 2018 (UTC)
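Whether or not such links belong in {{en-noun}}, the head= value in SGconlaw's example is mechanical enough to generate. The sketch below is a hypothetical Python helper, not an existing Wiktionary module or gadget. It assumes words are separated by single spaces and that an ASCII apostrophe always marks an elided article (l', d', etc.); words such as aujourd'hui and typographic apostrophes (’) would need special handling.

```python
import re


def link_french_head(phrase: str) -> str:
    """Build an {{en-noun|head=...}} value linking each French word to its #French section."""
    linked_words = []
    for word in phrase.split(" "):
        # Split an elided article such as l' off the word that follows it.
        tokens = re.findall(r"[^']+'|[^']+", word)
        linked_words.append("".join(f"[[{t}#French|{t}]]" for t in tokens))
    return " ".join(linked_words)


print("{{en-noun|head=" + link_french_head("l'esprit de l'escalier") + "}}")
```

For l'esprit de l'escalier this yields the same head= value as the sandbox test above.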

New sinograph QIOU "poor and ugly"

How should this situation be dealt with in terms of lexicography?

poor and ugly
--Backinstadiums (talk) 00:26, 6 December 2018 (UTC)
The same way we deal with any other word or sinograph — add it if it is attested in durably archived media, spanning over a year, etc. (It doesn't look like this is.) —Μετάknowledgediscuss/deeds 02:01, 6 December 2018 (UTC)


quadrumanus appears in the Cambridge Grammar of the English Language, page 1663; is it a typo or a variant of quadrumanous? --Backinstadiums (talk) 15:58, 6 December 2018 (UTC)

(This sounds like a Wiktionary:Tea room question. — SGconlaw (talk) 16:02, 6 December 2018 (UTC))
It is a taxonomic designation (as in Chiropsalmus quadrumanus). Highly unlikely to be an English adjective because of the spelling. Equinox 16:40, 6 December 2018 (UTC)
The authors were probably looking for a word that began with quadru and was not formed in Latin, as they are talking about "marginal vowels" as English morphological elements, which in the case of 'quadr' can be i, a, or u. Why they didn't choose quadrumane or quadrumanous for the purpose is beyond me. We could ask them. Maybe it was a typo. DCDuring (talk) 18:16, 6 December 2018 (UTC)

New Wikimedia password policy and requirements

CKoerner (WMF) (talk) 20:02, 6 December 2018 (UTC)

Programming languages

Since Wiktionary includes all languages, does it also include programming languages? --2A01:112F:742:C00:14B9:E7A5:D1B3:F0B3 09:23, 8 December 2018 (UTC)

No, as they aren't human language (though a few words may rarely get borrowed into English grammar). Equinox 10:23, 8 December 2018 (UTC)
Is tlhIngan Hol a human language?  --Lambiam 19:26, 8 December 2018 (UTC)
Eh, it's clearly a totally different kind of thing from a programming language. The only programming language I've ever seen that even inflects verbs is Inform 7. Equinox 19:34, 8 December 2018 (UTC)
Programming languages are determined by a language specification, not by usage. That falls under "documentation", not lexicography. DTLHS (talk) 17:31, 8 December 2018 (UTC)
But the reference manuals for a programming language use terms from that language as if they were English, French etc - so we really ought to have them somehow. SemperBlotto (talk) 14:23, 10 December 2018 (UTC)
We've had this discussion before. Early programming languages only had a few keywords, but now there are hundreds of frameworks with thousands of named classes (e.g. ExecutionEngineException, HttpMessageInvoker) and each class may have hundreds of named properties, methods and fields. These, too, are listed in manuals and guides. Equinox 14:30, 10 December 2018 (UTC)
See Wiktionary:Requests_for_verification/English#caddr as well. - TheDaveRoss 14:31, 10 December 2018 (UTC)
Take this sentence from a book on conversational French: “Bonjour is usually used until around six p.m., whereas bonsoir is used after six p.m.” In a book on French you can expect to find French words used as nouns in English sentences. However, they are not used with their French meaning; they stand for themselves. So these sentences mention the word in the sense of the use–mention distinction. Likewise, the English sentence “esac is case spelled backward, rather like fi is if spelled backward” only mentions these keywords. To understand the sentence you don’t have to know the meaning of any of these words. On the other hand, grep, originally just another computer command, can be used as a verb (“I grep, he greps, we grepped”), so it has clearly become lexicalized and merits inclusion.  --Lambiam 18:12, 10 December 2018 (UTC)

Appendix:Reference detail

According to the description: "This appendix provides detail to sources linked by Wiktionary. It is to be linked from reference templates." It contains three items, all created by User:Dan Polansky. Is this a new policy? The only reason I've noticed it is that Dan changed one of the Hungarian reference templates. I'd prefer to link directly from the template to its corresponding website and not to an appendix. Was there a Beer parlour discussion or vote on this? Panda10 (talk) 15:25, 9 December 2018 (UTC)

It is not a new policy, not anything mandatory and rigid. If you don't like my change in Template:R:TotfalusiEty 2005, please revert it. The point of the appendix is to provide more information than comfortably fits in the mainspace, e.g. English rendering of the title. Some reference templates link to Wikipedia, which is similar in that it does not lead to the main website of the reference. --Dan Polansky (talk) 15:31, 9 December 2018 (UTC)
Dan, thanks for your prompt reply. I do see your point, but for now, if you don't mind, I will revert the changes until it is decided by the community how to standardize reference templates. Panda10 (talk) 16:46, 9 December 2018 (UTC)
Thank you. I realized we could link to the appendix via "→Detail", without losing the immediate link to the dictionary website. I added the link as a proposal. --Dan Polansky (talk) 08:13, 15 December 2018 (UTC)
@Dan Polansky: I'm not sure. The only extra information the Appendix provides is the English translation of the book title. There have to be some other benefits to such an Appendix, because it has to be maintained. Who will do it? I appreciate that you care about this, but maybe in the future you could demonstrate the proposals using the Czech reference templates? I don't see any of them in the Appendix. :) Thanks! Panda10 (talk) 14:51, 15 December 2018 (UTC)
@Panda10: The appendix needs to be maintained no less than the templates themselves. Furthermore, once correct information is entered, I do not see much need for further updates. As for Czech reference templates, I have now added {{R:PSJC}} and {{R:SSJC}} to the appendix, and I am glad all the detail I added is not in the template display for the mainspace. In the appendix, I have stated how many entries there are in the dictionaries. --Dan Polansky (talk) 09:25, 16 December 2018 (UTC)
@Dan Polansky: I'm still not convinced. But if you find this system useful, it's fine to add all the Czech reference templates to the Appendix. As for the Hungarian template, I will revert the change. Panda10 (talk) 18:01, 16 December 2018 (UTC)
I'm letting it be now, I guess, but let me note that I don't understand it. I think it pretty obvious that the reader was better off having a link to a page with more detail, including English rendering of the title and the number of entries in the dictionary. --Dan Polansky (talk) 18:05, 16 December 2018 (UTC)
@Dan Polansky: I too see what you want. Instead of listing references in appendices, though, a technical solution I would find agreeable is to have the transliterations, transcriptions and translations present in the templates but hidden until the reader clicks to expand them, no? @Panda10: On this page, section 3 is actually about standardizing the information given by reference templates, but the community has done nothing with it; you could weigh in too. Fay Freak (talk) 20:21, 10 December 2018 (UTC)

Interesting BBC articles

An interesting BBC article on an analysis of Twitter that traces the geographic rise and spread of neologisms in American English: Feeling litt? The five hotspots driving English forward (4 May 2018). -Stelio (talk) 08:26, 12 December 2018 (UTC)

Another one on anti-languages: The secret “anti-languages” you’re not supposed to know (12 Feb 2016).

- Stelio (talk) 13:28, 12 December 2018 (UTC)

Thanks for sharing. The inaugural lecture video has some more details, and the data and scripts are available as well. It would be interesting to apply this to other languages. – Jberkel 22:11, 14 December 2018 (UTC)

English words with contraction-'s, etc

A recent RFD got me thinking: by vote, we don't allow entries for words with possessive-'s, with only a few exceptions. Do we have any policy on which contractions are allowed? I've created some more interesting ones myself (double and triple contractions), but...it seems like contraction-'s can be added to as many words as possessive-'s. Just googling the first few words from various parts of speech that pop into my head, I can find citations of all of them: not just nouns like cat's but difficult's (see google books:"difficult's an", etc), write's (google books:"If any line I write's a nobbler", etc), wow's (google books:"wow's an"), see also google books:dogs're. I presume we don't want entries for all of these! (The small set of ones attached to pronouns (he's, y'all'd've, etc) are worth keeping, IMO.) - -sche (discuss) 18:00, 12 December 2018 (UTC)

Oh yeah!

Guess who has cracked 600,000 edits. [11] That works out at about 150 a day on average, though in practice I have some days when I don't come to Wiktionary at all and some days when I hammer away at it like a lunatic for eight hours. Equinox 00:06, 14 December 2018 (UTC)

I just love how the xtools edit counter has a big red banner at the top saying, "User has made too many edits!" Tsk, tsk. Andrew Sheedy (talk) 02:07, 14 December 2018 (UTC)
Impressive. And I haven't got to the half million mark yet. SemperBlotto (talk) 07:08, 14 December 2018 (UTC)
Impressive. That's over 1% of all the edits made on this site. - -sche (discuss) 22:35, 14 December 2018 (UTC)
Not so impressive. You wouldn't even get into the top 15 in Wikipedia. Also, perhaps you should get help for your wiki-addiction. What is impressive, however, is Wonderfool's hitting 300,000 despite around 130 blockings. --Mustliza (talk) 10:57, 15 December 2018 (UTC)
How about that top 15 on Wikipedia, though? bd2412 T 01:43, 16 December 2018 (UTC)
How on earth am I in sixth place? I feel like a lot of editors are way more active than I am. —Rua (mew) 23:24, 15 December 2018 (UTC)
You deserve some kind of medal, made of some kind of metal, for showing some kind of mettle. bd2412 T 01:44, 16 December 2018 (UTC)

Any use for a "rare character" index?

Hello! There was recently a discussion at Extension:CirrusSearch about creating a new search index for "rare" characters that are currently not indexed by the on-wiki search engine. The three examples of difficult-to-find characters given were ☥ (Ankh), 〃 (ditto mark), and 〆 (ideographic closing mark). (Note that you can currently do an insource regex search like insource:/☥/, but on large wikis this is guaranteed to time out and not give complete results, and it is extremely inefficient on the search cluster.)

We can't index everything; indexing every instance of e or . would be very expensive and less useful than indexing ☥, for example. So, in English, we would ignore A-Z, a-z, 0-9, space, and most regular punctuation (exact list TBD) and index pretty much everything else.

The most plausibly efficient way to implement such an index would only track individual characters at the document level, so you could search for documents containing both ☥ and 〆, but you could not specify a phrase like "☥ 〆" or "〆 ☥", or a single "word" like ☥☥ or 〆☥.
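To make the trade-off concrete, here is a minimal in-memory Python sketch of a document-level rare-character index. It is not CirrusSearch code: the function names are hypothetical, and the COMMON exclusion set is an assumption, since the exact list is still TBD. Only per-page presence is stored, so AND/OR lookups are cheap but phrase queries are impossible.

```python
import string
from collections import defaultdict

# Hypothetical "common" set that is never indexed: ASCII letters, digits,
# ordinary ASCII punctuation and basic whitespace (the real list is TBD).
COMMON = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")


def build_rare_char_index(docs):
    """Map each rare character to the set of document ids that contain it.

    Only presence per document is stored (no positions), so a query can ask
    for pages containing both ☥ and 〆, but not for the phrase "☥ 〆".
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for ch in set(text):
            if ch not in COMMON:
                index[ch].add(doc_id)
    return index


def search_all(index, chars):
    """Pages containing every character in chars (AND)."""
    postings = [index.get(ch, set()) for ch in chars]
    return set.intersection(*postings) if postings else set()


def search_any(index, chars):
    """Pages containing at least one character in chars (OR)."""
    return set().union(*(index.get(ch, set()) for ch in chars))


docs = {"A": "The ankh ☥ sign", "B": "〆 and ☥", "C": "plain text"}
index = build_rare_char_index(docs)
print(sorted(search_all(index, "☥〆")))  # ['B']
print(sorted(search_any(index, "☥〆")))  # ['A', 'B']
```

Exposing AND and OR as separate functions sidesteps, rather than settles, the question of what a multi-character argument should mean.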

I've opened a Phabricator ticket T211824 to more carefully investigate such a rare character index, to get a sense of how big it would be and what resources it would take to support it. If you have any ideas about specific use cases and how this would or would not help with them, or any other thoughts, please reply here or on the Phab ticket. (Increased interest increases the likelihood of this moving forward, albeit slowly, over the next year.)

Thank you! TJones (WMF) (talk) 16:27, 14 December 2018 (UTC)

One thing that comes to mind immediately is searching for control characters, private use block characters and unusual whitespace characters. It would be even more useful if such characters could be grouped together in a single search. DTLHS (talk) 16:35, 14 December 2018 (UTC)
We haven't thought too much yet about how the keyword for this would work. Parsing the query carefully so you can search for whitespace characters is always tricky. So, suppose the keyword is char:, then searching for documents with both ☥ and 〆 could be char:☥ char:〆, while searching for either would be char:☥ OR char:〆. We could have a special syntax like char:☥〆, which is more efficient, but would that be an implicit AND or an implicit OR? Either could be confusing; for example, searching for char:Иван would only incidentally actually find the name Иван.
For control or whitespace characters, being able to specify them by number would probably be useful, so \u2002 or U+2002 for an 'en space'. For all three use cases, it sounds like you'd want OR, not AND, as your combining operation, so you'd have to spell them all out, like char:\u2002 OR char:\u2003 OR char:\u2004 OR char:\u2005 ... for whitespace characters. I can see how something like char:\u2002-\u200D would be useful, but on the back end that would balloon into a fairly expensive search, and something like char:\uE000-\uF8FF for the whole Private Use Area or char:\uF0000-\uFFFFF for the whole Supplementary Private Use Area-A would explode into ~6,400 or ~65,000 search terms on the back end, which we could not support. I could see allowing a range to be specified, but it would have to throw an error for more than some limit of characters in the range. (10? 20? 50?)
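The "range with a hard limit" idea can be sketched in Python as follows. The \uXXXX and U+XXXX notations come from the comment above. The 50-character cap is just one of the values floated there, and expand_char_term is a hypothetical name. The expanded characters would then be OR-ed together as separate char: terms.

```python
import re

MAX_RANGE = 50  # hypothetical cap; the thread floats 10, 20 or 50

# One code point written as \uXXXX or U+XXXX (4 to 6 hex digits).
_CP = r"(?:\\u|U\+)([0-9A-Fa-f]{4,6})"
# A single code point, or two joined by "-" to form a range.
_TERM = re.compile("^" + _CP + "(?:-" + _CP + ")?$")


def expand_char_term(term, limit=MAX_RANGE):
    """Expand 'U+2002', '\\u2002' or '\\u2002-\\u200D' into a list of characters.

    Raises ValueError for malformed input or for ranges larger than `limit`,
    instead of silently fanning out into thousands of back-end terms.
    """
    m = _TERM.match(term)
    if not m:
        raise ValueError(f"not a code point or range: {term!r}")
    start = int(m.group(1), 16)
    end = int(m.group(2), 16) if m.group(2) else start
    if end < start or end > 0x10FFFF:
        raise ValueError(f"invalid range: {term!r}")
    size = end - start + 1
    if size > limit:
        raise ValueError(f"range of {size} characters exceeds the limit of {limit}")
    return [chr(cp) for cp in range(start, end + 1)]
```

With this cap, the en-space-to-ZWJ range (12 characters) is accepted, while the whole Private Use Area (6,400 characters) is rejected with an error.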
Were you hoping to search for an entire private use area at once, or just a limited range of characters? Thanks for the interesting use cases! TJones (WMF) (talk) 18:40, 14 December 2018 (UTC)
Yes, the whole private use area. Maybe that's not such a good fit for this request since I'm more interested in the boolean value "does this page have a private use area character in it or not", and not specifically which character it is. DTLHS (talk) 18:50, 14 December 2018 (UTC)
It might be possible to also index by Unicode block, so if I dig into this, I'll try to get a sense of what that looks like, too. Though I wouldn't expect it to be in the first version if we get that far. TJones (WMF) (talk) 19:30, 14 December 2018 (UTC)
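For the boolean "does this page contain a private-use character?" use case, per-page flags may be enough. Python's standard library has no Unicode block lookup, so this sketch swaps in the Unicode general category as a stand-in. Category Co covers the BMP Private Use Area and both supplementary ones. The flag names and the set of ordinary characters are hypothetical.

```python
import unicodedata

# General categories flagged per page. Block-level indexing is approximated
# with Unicode general categories, which the standard library does expose.
FLAGGED = {
    "Co": "private-use",  # BMP and supplementary Private Use Areas
    "Cc": "control",
    "Cf": "format",       # e.g. ZWNJ, LRM/RLM, soft hyphen
    "Zs": "space",        # non-ASCII spaces such as U+2002 EN SPACE
}
ORDINARY = {" ", "\t", "\n", "\r"}  # too common to be worth flagging


def category_flags(text):
    """Return the set of flags (e.g. {'private-use'}) present in a page's text."""
    flags = set()
    for ch in text:
        if ch in ORDINARY:
            continue
        label = FLAGGED.get(unicodedata.category(ch))
        if label:
            flags.add(label)
    return flags
```

Storing one small flag set per page answers "any private-use character at all?" without expanding into thousands of per-character search terms.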
For our purposes, Wiktionary entry names and links to entry names are far more important in searching for special characters: it helps to know when they include zero-width non-joiners, left-to-right markers, punctuation/whitespace outside of the Basic Latin Block, combining diacritics, or anything else that might produce a visual duplicate with different encoding. A different issue is mixing of scripts: Latin-script English paca and Cyrillic-script Russian раса (rasa) are fine, but we want to know when there's something like pаcа that has both Latin and Cyrillic, for instance. You might think of it as a multilingual version of antispoofing. Chuck Entz (talk) 20:08, 14 December 2018 (UTC)
We already have that capability with the standardChars field in Module:languages. Searching inside entries for specific characters is more challenging. DTLHS (talk) 20:11, 14 December 2018 (UTC)
Latin/Cyrillic homoglyph detection and correction is a sometime hobby of mine on my volunteer account—so I know what a pain that can be. Did you know that intitle: now supports regex searches? This search finds titles (or redirects) that have a Cyrillic and Latin character adjacent to each other: intitle:/([Ѐ-ԯ][A-Za-zÀ-ɏɐ-ʯ]|[A-Za-zÀ-ɏɐ-ʯ][Ѐ-ԯ])/ (no link, because it's an expensive query, so you have to want it enough to copy-n-paste). There are some false positives with redirects that have been fixed, and with Kabardian and a few other languages that do seem to actually mix scripts, so къуэкIыпIэ is probably right, but ларпурлартизaм looks like the final a is Latin. Anyway, intitle: searches on regexes still time out (it's just too expensive to scan for everything), but they probably get closer to completion than insource: queries, which have more text to scan.
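Because intitle: regex searches time out on-wiki, the same check can be run offline over a list of titles, for example one extracted from a dump. This is a minimal Python sketch. The character classes mirror the intitle: regex quoted above, and mixed_script_titles is a hypothetical name.

```python
import re

# Same classes as the intitle: regex quoted above: Cyrillic (U+0400 to U+052F)
# directly next to Latin (ASCII letters, U+00C0 to U+024F, and IPA
# extensions U+0250 to U+02AF), in either order.
MIXED = re.compile(
    r"[\u0400-\u052F][A-Za-z\u00C0-\u024F\u0250-\u02AF]"
    r"|[A-Za-z\u00C0-\u024F\u0250-\u02AF][\u0400-\u052F]"
)


def mixed_script_titles(titles):
    """Titles in which a Cyrillic and a Latin character are adjacent."""
    return [t for t in titles if MIXED.search(t)]
```

Since the pattern requires adjacency, a title like b-кварк, where a hyphen separates the scripts, is not reported. A whole-title "both scripts anywhere" check would catch it, along with more false positives.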
Anyway, it sounds like a second rare-character index for titles would be helpful for finding zero-width joiners, LTR/RTL markers, etc. in titles. Finding them specifically in links would be harder. They do get stripped from search terms, which is what I usually pay attention to. TJones (WMF) (talk) 20:47, 14 December 2018 (UTC)
Actually, that Kabardian word should have a palochka (here on Wiktionary, the lowercase one, ӏ; elsewhere often the uppercase, Ӏ) instead of a capital Latin letter I (see Kabardian orthography on Wikipedia). But about mixed scripts, on Wikipedia someone posted some words from Halkomelem, which adds the Greek letter theta into an otherwise Latin alphabet. (I was surprised that there wasn't a Latin theta character, because theta is regularly used in the IPA.) — Eru·tuon 00:28, 15 December 2018 (UTC)
@Chuck Entz: Here is a list of titles with both Latin and Cyrillic characters from the December 1st dump. Looks like there are a few quark words (like b-кварк) for which this isn't an error. [Edit: See also User:Keith the Koala/Mixed character sets, though it is not up-to-date.] — Eru·tuon 01:16, 15 December 2018 (UTC)
@Erutuon: I would go through that list and fix them, but right now there are too many redirects and valid uses of the palochka. If you could exclude those, the list would have far fewer false positives. —Μετάknowledgediscuss/deeds 02:45, 15 December 2018 (UTC)
@Metaknowledge: The palochka belongs to the Cyrillic script, so anything in the list with a palochka lookalike (like the aforementioned къуэкIыпIэ) needs fixing. — Eru·tuon 04:27, 15 December 2018 (UTC)
I've removed all the redirects. — Eru·tuon 04:40, 15 December 2018 (UTC)
@Erutuon: Thanks. I honestly can't remember the outcome of the old discussions about what to do with different ways to encode the palochka in Caucasian languages. @Atitarev? —Μετάknowledgediscuss/deeds 05:43, 15 December 2018 (UTC)
@Metaknowledge, Erutuon: I don't remember the exact outcome either BUT when Roman letters, numbers or "|" substitute for palochka (upper or lower case), they are definitely wrong but could be used as redirects, since the use of palochka proper is still uncommon. The correct/normalised spelling for Kabardian къуэкIыпIэ (q̇°ăkIəpIă) is къуэкӏыпӏэ (q̇°ăč̣̍əṗă), using the lower case palochka ӏ but some people think we should use the upper case palochka Ӏ: къуэкӀыпӀэ (q̇°ăč̣̍əṗă). It's the form used when palochka was first introduced and there was no upper case/lower case distinction. Both forms look alike, and the lower case palochka was added to Unicode much later. In my opinion, we should use upper case Ӏ and lower case palochka ӏ following the capitalisation rules of the corresponding languages as intended. Lookalikes: !, 1, |, I, l should all be replaced with Ӏ/ӏ. --Anatoli T. (обсудить/вклад) 06:20, 15 December 2018 (UTC)
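A cautious version of the lookalike replacement can be sketched in Python. It replaces I, l, 1 and | only when they sit inside Cyrillic text. The "!" from the list above is deliberately left out, because it is usually real punctuation. The function name and the context rule are assumptions. The sketch always produces the lowercase palochka ӏ (U+04CF), so capitalised, all-caps or word-initial cases still need manual review.

```python
import re

PALOCHKA = "\u04CF"  # CYRILLIC SMALL LETTER PALOCHKA

# Lookalikes I, l, 1 and |, replaced only when preceded by a Cyrillic
# character and followed by a Cyrillic character, whitespace or the end.
_LOOKALIKE = re.compile(r"(?<=[\u0400-\u052F])[Il1|](?=[\u0400-\u052F]|\s|$)")


def fix_palochka(text):
    """Replace Latin/ASCII palochka lookalikes inside Cyrillic words with ӏ.

    Simplification: always uses the lowercase palochka, so capitalized or
    all-caps words (which would need U+04C0) are not handled correctly.
    """
    return _LOOKALIKE.sub(PALOCHKA, text)
```

Requiring Cyrillic on the left leaves Latin words such as "Illinois" and hyphenated forms like b-кварк untouched.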
@Atitarev, Erutuon: I have now fixed everything on the list except for legitimate/unclear uses and palochkas. Anatoli, would you be willing to move the palochka entries as you see fit, leaving redirects behind? —Μετάknowledgediscuss/deeds 06:24, 15 December 2018 (UTC)
To the invisible characters Chuck has mentioned (namely ZWNJs and LTR and RTL marks) as being undesirable in pagenames, and thus desirable to find, I would add: soft hyphens. Currently, all of these are caught by periodic checks of database dumps, as mentioned in Wiktionary:Todo#Semi-regular_tasks; being able to find the characters in a way that didn't require downloading database dumps would make it easier for more people to check for them more often. (This gives me an idea about MediaWiki:Titleblacklist which I will raise in a new section!) - -sche (discuss) 06:35, 15 December 2018 (UTC)
ZWNJs are quite desirable in pagenames for certain languages, e.g. Persian. You'd have to sort by language just to filter out all the good examples of ZWNJs being used. —Μετάknowledgediscuss/deeds 06:37, 15 December 2018 (UTC)
Good point. I suppose one might do a search like char:[ZWNJ] insource:-Persian. - -sche (discuss) 06:47, 15 December 2018 (UTC)
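The periodic dump check for invisible characters in page names, with the Persian exclusion suggested here, might look like the following minimal Python sketch. It assumes titles and their language headers have already been extracted from a dump. The names and the ZWNJ_LANGUAGES allowlist are hypothetical.

```python
# Invisible characters named in the thread as unwanted in page names.
INVISIBLE = {
    "\u00AD": "soft hyphen",
    "\u200C": "ZWNJ",
    "\u200E": "left-to-right mark",
    "\u200F": "right-to-left mark",
}
# Hypothetical allowlist: languages whose entries may legitimately use ZWNJ.
ZWNJ_LANGUAGES = {"Persian"}


def suspicious_titles(pages):
    """Yield (title, sorted names of unwanted characters) for each bad title.

    `pages` is assumed to be an iterable of (title, set of language headers),
    e.g. pre-extracted from a database dump.
    """
    for title, languages in pages:
        found = {name for ch, name in INVISIBLE.items() if ch in title}
        if "ZWNJ" in found and languages & ZWNJ_LANGUAGES:
            found.discard("ZWNJ")
        if found:
            yield title, sorted(found)
```

Excusing ZWNJ per language rather than per title keeps soft hyphens and direction marks flagged even on Persian pages.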
We could probably make a regex that matches ZWNJ in a position where it actually has a visible effect, for instance between a left- or dual-joining Arabic character, zero or more characters transparent to joining, and a right- or dual-joining character. (I imagine it would be long.) But it would have to be applied in the following manner: if the title contains ZWNJ, forbid it unless it matches this regex. Not sure if that's possible. I did notice there is MediaWiki:Titlewhitelist though. Maybe ZWNJ can be unequivocally blacklisted in MediaWiki:Titleblacklist, but then whitelisted under limited circumstances in MediaWiki:Titlewhitelist. — Eru·tuon 07:16, 15 December 2018 (UTC)
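The "blacklist ZWNJ, then whitelist it where it has a visible effect" idea can be sketched as a single Python check. Note the simplification: the real rule depends on the Unicode Joining_Type property (left- or dual-joining before, right- or dual-joining after), which Python's standard library does not expose. Here any letter in the Arabic block counts on either side, and nonspacing marks (category Mn) stand in for transparent characters. The sketch therefore accepts some ZWNJs that have no visible effect, such as one after alef. zwnj_allowed is a hypothetical name.

```python
import unicodedata

ZWNJ = "\u200c"


def _is_arabic_letter(ch):
    # Approximation: any letter in the Arabic block. Real joining behaviour
    # depends on Joining_Type, which the standard library does not expose.
    return "\u0600" <= ch <= "\u06ff" and unicodedata.category(ch).startswith("L")


def zwnj_allowed(title):
    """Forbid ZWNJ by default; allow it only between Arabic-block letters.

    Nonspacing marks (category Mn) next to the ZWNJ are skipped, as a
    stand-in for characters that are transparent to joining.
    """
    for i, ch in enumerate(title):
        if ch != ZWNJ:
            continue
        j = i - 1
        while j >= 0 and unicodedata.category(title[j]) == "Mn":
            j -= 1
        k = i + 1
        while k < len(title) and unicodedata.category(title[k]) == "Mn":
            k += 1
        if j < 0 or k >= len(title):
            return False
        if not (_is_arabic_letter(title[j]) and _is_arabic_letter(title[k])):
            return False
    return True
```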
I mean, we don't have to blacklist ZWNJs if it would be problematic/complicated, we could just keep making periodic database-dump checks for them (excluding Persian), and only blacklist things that are indeed always unwanted. - -sche (discuss) 16:38, 15 December 2018 (UTC)
I can’t rule out a priori that bidirectional control characters might legitimately appear in pagenames, and I can only warn against blocking them, since they do have a purpose and crusades against them frequently lead to gold-plating. They are just unlikely to be needed, as multiple scripts are also unlikely to be needed in page names. I could imagine some mixed chat slang using Latin and Arabic or Hebrew script needing bidi characters, however far off the creation of such pages may be. Still, any bidirectional control character should definitely trigger a warning. The direction-overriding U+202D and U+202E can be blacklisted, though. Fay Freak (talk) 14:21, 15 December 2018 (UTC)
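Here is a sketch of the policy suggested in this comment: hard-block only the explicit direction overrides U+202D (LRO) and U+202E (RLO), and merely warn about the other bidirectional controls. The table names and verdict strings are hypothetical. On-wiki, this would be expressed as a title blacklist entry or an abuse filter rule rather than in Python.

```python
# Explicit direction overrides: proposed for outright blocking.
BLOCK = {"\u202d": "LRO", "\u202e": "RLO"}
# Other bidi controls: proposed for a warning only.
WARN = {
    "\u200e": "LRM", "\u200f": "RLM", "\u061c": "ALM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def bidi_verdict(title):
    """Return 'block', 'warn' or 'ok' for a proposed page title."""
    if any(ch in BLOCK for ch in title):
        return "block"
    if any(ch in WARN for ch in title):
        return "warn"
    return "ok"
```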

Use MediaWiki:Titleblacklist to block titles with undesirable invisible characters

It occurs to me that we could use this to prevent pagenames from containing various undesirable invisible characters which persistently creep up, like soft-hyphens (a recurring problem when people copy-paste words from certain other sites), couldn't we? My understanding is that pages containing those characters would thereafter be impossible to create, but presumably our existing entries on the characters themselves would be unaffected(?)—or if not, we could move them to Unsupported_titles/. - -sche (discuss) 06:37, 15 December 2018 (UTC)

An abuse filter would also work. Which one is more user friendly? DTLHS (talk) 06:42, 15 December 2018 (UTC)
Good point, and an abuse filter could also warn against and block or tag these in article bodies. OTOH, we can only have abuse filters do so many things before they run out of resources. As for user-friendly: it seems to be possible to display a customized message to anyone adding a blacklisted title (like w:MediaWiki:Titleblacklist-custom-imagename), which might be more friendly(?) than the messages abuse filters theoretically display, since those usually don't display for me (I see only the "short descriptions" like "ref-no-references") and apparently other users, based on confused feedback we've gotten from users wondering why their edits were blocked. - -sche (discuss) 07:08, 15 December 2018 (UTC)

Christmas competition

Hey all. I made a new Christmas competition. You have until Nanakusa-no-sekku to submit an entry. --Mustliza (talk) 10:52, 15 December 2018 (UTC)


Another important announcement...another entry has hit 10 years. The one in question is Dakasian, which has been sitting in WT for 10 whole years without being corrected. It was made by some prat called Jackofclubs (talkcontribs). I wonder what came of him... --Mustliza (talk) 11:07, 15 December 2018 (UTC)

I just touched the 10-year-old. Equinox 13:25, 15 December 2018 (UTC)
Why don't you have a seat right over there?Dixtosa (talk) 17:01, 15 December 2018 (UTC)
@Mustliza: Wonderfool, how do you know it is of English origin? Looks like an Anglicization of Armenian Դաքեսյան (Dakʿesyan). --Vahag (talk) 13:46, 15 December 2018 (UTC)
Are the Dakasians something I should be keeping up with? Equinox 15:08, 15 December 2018 (UTC)
IIRC, VP thinks that all words are of Armenian origin. He may be right, though - this website shows Dakasians with first names Hayz, Vahan, Hagop and Vesta. --Mustliza (talk) 20:23, 15 December 2018 (UTC)
I wish I could see the mugs of these people. I can identify an Armenian face with a 99% accuracy. --Vahag (talk) 12:15, 16 December 2018 (UTC)
On Wikipedia, we had a project a while back to identify the oldest and longest untouched pages, touched by the fewest editors. We had a bot assign points based on the age of the page, age of the last edit, and number of people who had edited it. We did come up with a lot of problematic pages that way. bd2412 T 01:47, 16 December 2018 (UTC)
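A scoring scheme like the one described (page age, time since last edit, number of editors) could be sketched in Python as follows. The weights, the PageStats structure and the function names are all hypothetical, since the actual formula used by that bot is not given.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PageStats:
    title: str
    created: date
    last_edited: date
    editor_count: int


def neglect_score(page, today, w_age=1.0, w_stale=2.0, w_editors=100.0):
    """Higher score = older page, untouched for longer, edited by fewer people.

    The weights are illustrative placeholders, not the ones the project used.
    """
    age_days = (today - page.created).days
    stale_days = (today - page.last_edited).days
    return (w_age * age_days
            + w_stale * stale_days
            + w_editors / max(page.editor_count, 1))


def most_neglected(pages, today, n=10):
    """The n pages with the highest neglect score, most neglected first."""
    return sorted(pages, key=lambda p: neglect_score(p, today), reverse=True)[:n]
```

Dividing by the editor count, rather than subtracting it, keeps single-editor pages well ahead of pages with even a handful of contributors.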

Selection of the Tremendous Wiktionary User Group representative to the Wikimedia Summit 2019

Dear all,

Sorry for posting this message in English and for the last-minute notification. The Tremendous Wiktionary User Group can send one representative to the Wikimedia Summit 2019 (formerly "Wikimedia Conference"). The Wikimedia Summit is a yearly conference of all organizations affiliated with the Wikimedia Movement (including our Tremendous Wiktionary User Group). It is a great place to talk about Wiktionary's needs with the chapters and other user groups that make up the Wikimedia movement.

For context, there is a short report on what happened last year. The deadline is only about 24 hours away: the last date for registration is 17 December 2018. As a last-minute effort, a page has been created on Meta to decide who will represent the user group at the Wikimedia Summit.

Please feel free to ask any question on the wiktionary-l mailing list or on the talk page.

For the Tremendous Wiktionary User Group, -- Balajijagadesh 05:56, 16 December 2018 (UTC)

Who wants to go to Berlin? Does anyone know whether there is any money for travel? Otherwise, it will probably be dewikt that sends someone. DCDuring (talk) 22:15, 16 December 2018 (UTC)