Wiktionary:Beer parlour/2018/June: difference between revisions

{{also|Wiktionary:Beer_parlour/2018/April#Disallowing_Appendix-only_constructed_languages}}


[[Wiktionary:Votes/pl-2018-04/Disallowing appendix-only languages]] has failed. While I think Gamren has taken things a bit backwards, he's made a valid point: it would seem that, at present, Appendix-only languages are not subject to any attestation criteria. Is that really what we want, and what we wanted when we relegated Lojban to the Appendix namespace ([[Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix]])?
[[Wiktionary:Votes/pl-2018-04/Disallowing appendix-only languages]] has failed. I think Gamren has taken things a bit backwards, and it made it ''look'' like he wanted to delete the mainspace-like content (i.e. the entries) currently hosted in appendices altogether. If I've understood the issue correctly, that wasn't his point at all. I think his view is that our current separation between main space and appendix-only languages
* 1) is artificial;
* 2) leads to our hosting unchecked content.

I'll address the second issue first. In my view, he's made a valid point: it would seem that, at present, Appendix-only languages are not subject to any attestation criteria. Is that really what we want, and what we wanted when we relegated Lojban to the Appendix namespace ([[Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix]])?


I don't think so, or at least I hope not; in my opinion, all words, wherever they are, should be subjected to ''some'' kind of attestation criteria. Does everyone agree on that, or does it need to be put to the vote?

Revision as of 19:05, 6 June 2018

Labelling of bound morphemes

After a little talk with WF at the talk page of Spanish -dumbre, we've drawn the conclusion that our current "dated/archaic/obsolete" labelling scheme doesn't work too well for suffixes (and probably other affixes too). Why?

For that reason, I think it would be better to use the labels (productive/non-productive) when speaking of affixes. In fact I think we already do that in a few places, but it would be good to codify the practice.

That's not going to solve all our problems, however: there's even some disagreement on the productivity of -tion at Wiktionary:Tea room/2017/November § -tion.

Another thing: maybe the (obsolete) label doesn't have to be relinquished completely: if there are no words in current use using a certain suffix, then that suffix can be said to be both "non-productive" and "obsolete"/"extinct"? --Per utramque cavernam 15:58, 31 May 2018 (UTC)[reply]

  • Yes, for an affix that is no longer productive but which is found on words which are still used, I think "no longer productive" is better than "obsolete". An affix could also be obsolete, like you say, or archaic (perhaps if it's an alternative spelling of another affix, and sounds archaic: -iren isn't the best example because it's apparently outright obsolete, but it's the vein of thing I'm thinking of ... a- -ing says it's an example). I don't know if it's necessary to specifically label productive affixes or if it's implied by the absence of a "not productive" label. - -sche (discuss) 16:22, 1 June 2018 (UTC)[reply]

Footnotes

  1. ^ By the way, would it be correct to speak of a Spanish -dumbre suffix if there had been no word formed with it in Spanish proper, i.e. if all words "using" it had been inherited from Latin?

Lemmatisation of valent adjectives - preposition in entry title?

I think it would be better to turn prepositionful titles into redirects, and work only with prepositionless titles. We can then use a label/template there; in fact, we already do that for verbs: the first sense of French succéder is tagged with {{indtr|fr|à}} → (transitive with à), a template similar to {{label}}.

At Template_talk:indtr#Adjectives, Ungoliant suggests we do that with a usex-level template instead. Though I don't know what it would look like, I think I'd be fine with that too.

But in the end, my main concern is consistency; let's either move them all to prepositionless titles or to prepositionful titles. And even more importantly, let's have a way of keeping track of those (Category:English transitive adjectives using on, etc.?).

Thoughts? @-sche, DCDuring? --Per utramque cavernam 21:03, 31 May 2018 (UTC)[reply]

@Equinox, Kiwima, Ungoliant MMDCCLXIV, -sche, DCDuring. --Per utramque cavernam 21:04, 31 May 2018 (UTC)[reply]
This issue also comes up with verbs: will someone who sees I foobared at/on/to/for/of the widgets really think to look for our entry "foobar at" if they find that we have an entry "foobar" (which simply fails to link to "foobar at" except possibly as a derived term)? So (momentarily speaking orthogonally to your point,) we should probably consistently use an {{only in}}-like template to link from every verb or adjective foobar to every foobar on/at/ etc that we keep a separate entry for. But many foobar at titles could just be turned into redirects, as you say, and I think I'd prefer that (as the probably more-intuitive-to-use and also easier-to-maintain system) to moving things to prepositionful entry titles. We do sometimes use labels to indicate that, in a particular sense, a word is used "with the" or "with on" or "with for", etc. - -sche (discuss) 21:22, 31 May 2018 (UTC)[reply]
Phrasal-verb definitions are more difficult than adjective definitions that have as complements phrases headed by certain prepositions.
I still bear bruises from having been beaten about the head and shoulders for saying that not all our entries for "phrasal verbs" were for authentic phrasal verbs and that not all of our definitions in our entries for authentic phrasal verbs were non-SoP.
It is not easy to decide whether a given verb + particle pair constitutes a phrasal verb: purveyors of dictionaries of English phrasal verbs and those who've built a career on them find them everywhere; others not so much. It seems to me that there are 'true' phrasal verbs, with definitions that are related to the bare verb principally etymologically. For example:
have at, have (someone) on, and have up
These have(!) almost no connection with any intuitively obvious definition of have. Moreover, phrasal verb definitions at [[have]] are very likely to be lost among all the other definitions we have there.
As we often have bastard English entries to serve as translation targets, part of a rationale for having phrasal verb entries could be that the phrasal verbs are often more common in speech than their Latin- or French-derived single-word synonyms. OTOH, as -sche writes, many language learners might not know enough about English to look up 'verb + particle' rather than 'verb'. We don't often make our decisions about inclusion etc on normal-user-behavior grounds rather than, say, syntactic grounds, but perhaps we should do so more often.
I doubt that we can formulate decision rules that would work in all cases. I don't doubt that we can come up with templates that will often be misapplied.
As a rule, it seems to me that we don't use hard redirects from common SoP phrases to the appropriate definition of the key noun, verb, or adjective in the SoP phrase. But, also as a rule, I am inclined to follow the 'lemming' heuristic: if other dictionaries and glossaries have a real entry (not a redirect) for a term, we should too. DCDuring (talk) 22:23, 31 May 2018 (UTC)[reply]
Thoughts on the adjectives: prone to, sweet on, keen on, etc. aren't adjectives, IMO, and can't be broken down into one POS; better entries (if these were to stay and not be moved to the bare adj.) would be be prone to, be sweet on, be keen on as transitive verbs. That being said, I think only be sweet on should be an entry, and prone and keen should just be senses with lil labels b/c they're not really idiomatic. Redirect prone to to prone (because AFAIK to is the only prep. used with prone), but in keen's case, b/c it can be used just as easily with to and (according to a few Google searches, maybe with), no redirect. Sidenote: is have on (to be wearing) really idiomatic? – Julia (talk• formerly Gormflaith • 15:52, 1 June 2018 (UTC)[reply]

Tibetan observations and questions

I recently discovered the area I now live in in Sydney has the largest Tibetan community in Australia, so I'm taking the opportunity to teach myself some Tibetan.

I've made hundreds (I think) of Tibetan entries and translation entries in the past few weeks, mostly from a scanned PDF of English-Tibetan Dictionary of Modern Tibetan compiled by Melvyn C. Goldstein + Ngawangthondup Narkyid. I've borrowed a copy of Colloquial Tibetan but I'm not very far into it yet.

So far I've only really learned the alphabet and Windows keyboard layout. Very little pronunciation, grammar, or vocabulary.

Most of my Tibetan neighbours I've spoken with have very little English, one has OK English and one has excellent English. Most of them have pre-teenage kids who are all bilingual. I've met two who told me they're from Lhasa or nearby and two or three told me they're from Amdo. At least one of the ones from Amdo also speaks Chinese but some others don't even seem to recognize my attempts to speak Mandarin. They all recognize my attempts to speak Tibetan.

My understanding from a friend from Amdo I knew while staying in Xiamen a year ago is that educated Tibetans in Amdo know Lhasa and Amdo dialects but might not know Mandarin. My friend was probably about 30 years old and told me he'd only recently taught himself Mandarin and was now teaching himself English. He did also seem to speak a peculiar variety of Chinese that sounded very Tibetan to me. He used it once when we ate at an eatery run by Han people from Qinghai.

It seems to me that our Tibetan pronunciation template doesn't cover Amdo Tibetan. Or maybe there's many dialects and I just don't know which dialect names I should be looking at?

Colloquial Tibetan uses a variant of Tibetan with two or three tones: high, low, and neutral (no tone / unstressed). How does this relate to the variants covered in our pronunciation tables/template?

If Stephen, Wyang, or any other contributors who have some skills in Tibetan would like to glance at a bunch of my entries and offer constructive feedback if I'm making any consistent mistakes etc, I would appreciate it. — hippietrail (talk) 02:28, 1 June 2018 (UTC)[reply]

Update: Tonight I went to the local Tibetan restaurant and the two staff members I spoke to are both from Kham. On my way home I met another Tibetan, also from Kham. So it seems at least those three major varieties are represented in my neighbourhood. — hippietrail (talk) 13:05, 1 June 2018 (UTC)[reply]
Thanks for creating those Tibetan entries. Only a formatting suggestion from me for now - the Tibetan links in the Etymology section should be in {{m}} rather than {{l}}: diff (it's a ridiculous formatting rule).
For {{bo-pron}}, the |zeku= parameter is used for the Zêkog dialect, and |labrang= is for the Xiahe dialect; both of these are Amdo dialects.
For Lhasa, there are different ways of analysing its tones. I'm assuming the high/low in Colloquial Tibetan to be following a two-tone analysis. In the four-tone model (which is what {{bo-pron}} uses) each of the high/low categories can be further split into two subcategories: high becomes high flat (f) and high falling (h), low becomes low flat (w) and low rising (v). There are some minimal pairs of words in the two kinds of high tones.
I think I know which area of Sydney you meant (DY?). The Tibetan diaspora in Sydney is quite diverse AFAIK; I know some from Amdo living in that area. The Lhasa dialect is the prestige dialect of Tibetan, and many educated Tibetans know Lhasa and can use it when communicating with Tibetans from other dialect regions. Wyang (talk) 03:45, 2 June 2018 (UTC)[reply]
Yep I now live in Dee Why (etymologically comes from the letters "DY" on a map that nobody knows the reason for). Colloquial Tibetan actually describes their MO right at the beginning of the book. They made certain choices for practicality so the learner can communicate with the maximum number of Tibetan speakers, rather than adhere exactly to Standard Lhasa Tibetan. So I believe they used the two-tone model as part of that.
I'll try to remember to stick with "m" in etym sections. I'm so used to seeing a random mix of "m" and "l" that I just stick with the one I knew best. I met another new Tibetan this afternoon, this time a guy with good English who was from Lhasa and taught me several ways to say "goodbye" or "see you" which I've unfortunately already forgotten.
Thanks for your feedback! — hippietrail (talk) 10:26, 2 June 2018 (UTC)[reply]
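
The relationship between the two-tone and four-tone analyses described above can be sketched as a small lookup. This is only an illustration: the letter codes f/h/w/v are the ones quoted in the thread for {{bo-pron}}, while the `FOUR_TONE` table and the `to_two_tone` function are hypothetical names, not part of any Wiktionary module.

```python
# Hypothetical lookup table. Only the letter codes and their tone
# descriptions come from the discussion; the names are illustrative.
FOUR_TONE = {
    "f": ("high", "flat"),
    "h": ("high", "falling"),
    "w": ("low", "flat"),
    "v": ("low", "rising"),
}


def to_two_tone(code: str) -> str:
    """Collapse a four-tone letter code to its two-tone register (high or low)."""
    register, _contour = FOUR_TONE[code]
    return register


def describe(code: str) -> str:
    """Return a readable description such as 'high falling'."""
    register, contour = FOUR_TONE[code]
    return f"{register} {contour}"
```

Under this sketch, a two-tone textbook transcription simply loses the flat/falling or flat/rising distinction that the four-tone codes keep.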

Citing Twitter

Tweets are an absolute goldmine for vernacular, and this is especially crucial for oral-only languages (including a couple I'm particularly interested in like Scots and Swiss German). I assume since they can be deleted that they aren't considered durably archived? Can we find any solution to this? Has this been discussed before anywhere? Ƿidsiþ 05:29, 2 June 2018 (UTC)[reply]

As you acknowledged, Twitter is very, very much not durably archived. We currently have no solution for this, unless you collect a bunch of tweets and self-publish them, I suppose. Ultimately, we could develop new criteria (say, three tweets count as one regular cite, and a photo of the tweets must be uploaded to Commons and the original tweets checked by an admin to ensure that it hasn't been doctored), but that would require a lot of work and need to be subject to a vote. —Μετάknowledgediscuss/deeds 14:08, 2 June 2018 (UTC)[reply]
The Library of Congress is (ostensibly) archiving all tweets, save the ones which get deleted. Some are archived by law (POTUS, etc.) and should be citable. - TheDaveRoss 13:37, 3 June 2018 (UTC)[reply]
Trump's tweets are rather beside the point, as I presume they will all find their way into published material. —Μετάknowledgediscuss/deeds 14:56, 3 June 2018 (UTC)[reply]
True, but it is an interesting point that because his (and other presidents') are known to be archived (presumably durably), by law, arguably they could be cited without waiting for them to get into print. Hmm... I don't know. Also, @ The Dave Ross, the Library of Congress stopped archiving all tweets at the end of 2017. How accessible is their pre-2017 archive? - -sche (discuss) 15:04, 3 June 2018 (UTC)[reply]
Ah, I missed the update about switching to "selective" archiving, whatever that means. And the last I heard (a few years ago) the archive was proving technically challenging for them, and public access was limited. They also say they are going to keep the archive of tweets they acquired up until they stopped, so those are citable. Internet Archive is saving and sharing the "Spritzer" Twitter stream (1% of all public Tweets) but since they are essentially random that isn't useful for our purposes. - TheDaveRoss 17:05, 3 June 2018 (UTC)[reply]
If this were to come to a vote, I'd definitely support it. IMO tweets are one of the best sources because they're often very close to spoken language. Even if it requires a lot of work, as MK acknowledged, I think it'd be worth it. Also, is "publishing to the internet" via Google Drive durably archived? Like this thing? – Julia (talk• formerly Gormflaith • 17:58, 3 June 2018 (UTC)[reply]
Google doesn't have a good track record of keeping its services alive. (It's a pity we now have to rely on Google Groups for a Usenet archive!) Also if we archive others' tweets then there might possibly be legal issues around privacy/copyright. Equinox 18:01, 3 June 2018 (UTC)[reply]
I agree with User:Julia and it's very long (years) overdue. Kaixinguo~enwiktionary (talk) 18:01, 3 June 2018 (UTC)[reply]
We're not a twitter archiving service. We're not going to become a twitter archiving service. There are currently no Twitter archiving services that we can use. That might change in the future, but right now we would just be doing something half-assed. What do you mean by "a lot of work"? That's meaningless. DTLHS (talk) 18:08, 3 June 2018 (UTC)[reply]
Why not just take a screenshot and upload it to Commons? They could be verified by one or possibly two admins at the time, and permission could be asked just as it is for some images. I think the expression 'a lot of work' is fairly common and self-explanatory. Kaixinguo~enwiktionary (talk) 19:35, 3 June 2018 (UTC)[reply]
In that case what do we do if the original author comes with a copyright claim, or claiming distress that we are persisting something they chose to delete? GDPR etc... -- Actually I suppose the same questions apply to Usenet! Hmm. Equinox 19:42, 3 June 2018 (UTC)[reply]
"a lot of work" as in more steps than just getting a cite from a book or something. Regarding copyright issues (which I don't know much about) what rights do the creators of the tweet have? And what rights would we be possibly violating? – Julia (talk• formerly Gormflaith • 20:06, 3 June 2018 (UTC)[reply]
That's a question for the people at Wikimedia Commons. DTLHS (talk) 04:54, 4 June 2018 (UTC)[reply]

Display text of Template:der3 and others

I have reverted Dan Polansky's latest attempts to change the table title, on the grounds that his version looks stupid and repeats "Derived terms" right after the heading that says the same thing. Furthermore his references to the "status quo" seem specious. Finally, the pattern "Terms derived from X" is easy to extend to "Terms derived from X (noun)" if disambiguation is necessary. Others' thoughts? DTLHS (talk) 06:36, 2 June 2018 (UTC)[reply]

I admit that repeating "Derived terms" in the table heading after the same section heading looks a little odd, but "Terms derived from X" is a needless repetition of X, and looks really bad to me. Of course they are derived from X; X is the entry. I admit that I did not raise this when this practice started to be introduced but only recently. There are so many practice changes being introduced without a discussion. I think the "status quo ante" is fundamentally correct, and was used by me in {{rel3}}; I dispute that diff from 2016 was based on consensus, and I have no evidence of that consensus other than silence. --Dan Polansky (talk) 14:31, 2 June 2018 (UTC)[reply]
I like DTLHS's approach for a default. If something else emerges from user-added (non-default) content, that might be considered for a replacement of the default. We (by which I mean DTLHS) could do a dump run to find any potentially desirable innovations. DCDuring (talk) 16:00, 2 June 2018 (UTC)[reply]
Derived terms used to have no collapsible tables; for short derived terms, that was much more user friendly and avoided cruft. Now, when I visit the parta#Czech, I find section "Derived terms", underneath collapsible "Terms derived from parta", and within mordparta, and parťák. It looks ugly and stupid, pardon my French. --Dan Polansky (talk) 09:39, 3 June 2018 (UTC)[reply]
One solution would be to provide no text in the collapsible table: that would remove all cruft and all repetition, even the odd-looking "Related terms" (section heading), "Related terms" (collapsible heading) repetition. --Dan Polansky (talk) 09:48, 3 June 2018 (UTC)[reply]
I consulted the English entry party for an unrelated purpose. This is what I see, as a sequence:
  • "Hyponyms"
  • "Hyponyms of party"
  • "Derived terms"
  • "Derived terms of party (noun)"
  • "Related terms"
  • "Terms related to party"
"party" is in italics there. This is what we call in Czech "jak u blbejch na dvorku", like at a morons' yard.
Making all the collapsible headings empty would be a huge improvement.
--Dan Polansky (talk) 10:05, 3 June 2018 (UTC)[reply]
In your example of parta#Czech, I think it would be better for the 'derived terms' not to be collapsed. There are only two of them and it is important information. Kaixinguo~enwiktionary (talk) 10:14, 3 June 2018 (UTC)[reply]
Our entries don't make good use of space, it's true. There's a lot of whitespace (other online dictionaries usually put all their pronunciation information compactly on the same line as 'pronunciation:', for example, not on three or more short lines with lots of empty space to the right of them). When there is text, a fair amount is redundant. But removing all text from the collapsible headings would be bad because it'd make it too easy to miss that there was information there. The "show" text is in small font all the way at the other side of the screen from where most text starts; the collapsible box itself is a light grey which, when my screen is tilted at some angles, isn't even distinct from the background, so it's possible to miss the entire existence of it, and even when it is seen, it's possible to miss the "show" text (as mentioned) and, if no other text is present, to think it's an empty box i.e. that there are no derived terms. I know, because I stumbled onto an entry where someone had manually suppressed the text. Perhaps the redundant "derived terms" text should be replaced with "list" (which would also discourage use of the template when there's only a single derived term), or the "show" link should be floated to the left. - -sche (discuss) 15:05, 3 June 2018 (UTC)[reply]

Pakistani surnames

There are Pakistani cricketers named Misbah-ul-Haq, Inzamam-ul-Haq, Imam-ul-Haq and probably others. All spelled as a single term with two hyphens. Is "ul-Haq" a surname? If not, can anyone explain the format of the names please? SemperBlotto (talk) 13:38, 2 June 2018 (UTC)[reply]

It is a family name. I think it's "the truth" (حق); see Al-Haqq. Not certain. Equinox 13:46, 2 June 2018 (UTC)[reply]
@SemperBlotto, Equinox: الْحَقّ (al-ḥaqq) is the definite form of حَقّ (ḥaqq, truth). In the formal Arabic the definite article الْ (al-, equivalent of "the") is pronounced with an "a-" at the beginning of an utterance, in other positions it follows desinential inflection (iʿrāb) ending of the previous word and the initial vowel of "al-" is dropped (elided). E.g. "Misbah-ul-Haq" would be مِصْبَاحُ الْحَقِّ (miṣbāḥu l-ḥaqqi), "luminary of the truth" in the nominative case. So "u" belongs to the previous word but this vowel is usually not written, and the initial "ا" is silent. Languages borrowing from Arabic often follow these conventions. "al-" is more common, though. "el-" or "il-" is from dialectal/informal Arabic. --Anatoli T. (обсудить/вклад) 15:02, 2 June 2018 (UTC)[reply]
Thanks. So, would his father, brothers and sons also be xxxx-ul-haq? (and females??) SemperBlotto (talk) 05:56, 3 June 2018 (UTC)[reply]

Idiomatic names.

I'd like to add a type of category for names with idiomatic/sarcastic usage by language. An example in English: You can say 'You don't say, Sherlock' and even people who have not read Sherlock Holmes will understand that the word 'Sherlock' encodes the information that they're being obtuse and have just stated something obvious. This does not work with other fictional detectives such as 'you don't say, Hercule' or 'you don't say, Continental Op', whose names are therefore not idiomatic. I propose a label like 'langname names with idiomatic usage' or 'with sarcastic usage', since I can't recall names like Sherlock/Gandhi/Einstein etc. being used as actual praise. Korn [kʰũːɘ̃n] (talk) 11:59, 3 June 2018 (UTC)[reply]

@Korn: Are there many such names? I think the "sarcastic" part would make the category too narrow; besides, isn't sarcasm a contextual/pragmatic phenomenon more than a lexical one?
However, I definitely agree that we should gather all those genericised names (which run parallel to "genericised trademarks", imo) in a category; I even suggested as much last year. By the way, I think the genericisation process is called antonomasia (sense 2). --Per utramque cavernam 15:26, 3 June 2018 (UTC)[reply]
"Idiomatic" is broader than "sarcastic", of course, since Einstein#Noun "intelligent person", Joe#Noun "a guy", and arguably "mein Name ist Hase" "I know nothing" all seem like idiomatic but not sarcastic uses of names. I do think it'd be useful to have a category for idiomatically-used names (others: John Doe, Bubba, and arguably Johnny Reb and Johnny Foreigner, etc). I'm not sure a category for sarcasm would be as maintainable, since most names and other words are used sarcastically sometimes and so it might largely duplicate the "idiomatic" category (though a vote excludes separate senses for sarcasm except when terms are "seldom or never used literally"). Perhaps we should avoid the very opaque antonomasia, though (I suspect DCDuring will agree with me on this part?). - -sche (discuss) 15:33, 3 June 2018 (UTC)[reply]
The reason I'm thinking about using the 'sarcastic' tag is that it might be that non-sarcastic usages are less idiomatic rather than plain references to the actual person, but the idiomatic label seems preferable to me too. Korn [kʰũːɘ̃n] (talk) 16:50, 3 June 2018 (UTC)[reply]

Adapting Wikipedia template to warn about NSFW/sexual images for Persian

For some reason the Persian Wikipedia has more extreme images than others, for example, there is a gif at 'ejaculate'. When I was checking the translations for 'pearl necklace' and 'footjob' I have seen images of that as well. What if the {{wikipedia}} template were adapted to show a short warning? Would anyone mind? Kaixinguo~enwiktionary (talk) 17:55, 3 June 2018 (UTC)[reply]

w:Pearl necklace (sexuality) has the same image as w:fa:گردنبند مروارید. Wikipedia links are basically off-site links, and have the implied warning that we don't control the content there. Trying to track the current NSFW status of every Wikipedia page we link to, even if there were a clear agreement about what NSFW means, is hopeless.--Prosfilaes (talk) 19:06, 3 June 2018 (UTC)[reply]
No, there is no implied warning at all. Kaixinguo~enwiktionary (talk) 19:30, 3 June 2018 (UTC)[reply]
It says it's going to Wikipedia. Wikipedia is not censored, and a link to Wikipedia could potentially lead to anything, and a stable Wikipedia page for a sexual term may have various types of illustrations. That should give users plenty of warning.--Prosfilaes (talk) 04:54, 4 June 2018 (UTC)[reply]
No, Wikipedia isn't censored (much) and sometimes has sexual pictures. That's how it is. You should install censorware on your own computer if you need to stop this. Equinox 19:33, 3 June 2018 (UTC)[reply]
I didn't request to censor Wikipedia. It's not unreasonable to warn of, not censor, a gif of ejaculation.
By the way, I'm not thinking of myself, although I had never seen those images before. Kaixinguo~enwiktionary (talk) 19:39, 3 June 2018 (UTC)[reply]
I understand the notion and agree that there's too great a laxity with preventable exposure in the Wiki community (We have an entry with a picture of a corpse, which I abhor.), but I too think this is an issue to be fixed at Wikipedia and that the very fact that you're moving to another site, to read an article about sexual practices, implies that you might get exposed to the act in question. — This comment was unsigned. My bad. Korn (talkcontribs)
I don't think it is at all reasonable to expect that there will be a video demonstrating a sex act on every page which describes a sex act. While I don't think we should censor content, I don't see any reason why labeling content that is likely to be offensive or otherwise problematic to a large segment of the population would be a bad thing. The fact that we can't be comprehensive is not an argument against such labeling when we are aware. Personally, I often use Wiktionary and Wikipedia while at work, and would be annoyed if a video of someone ejaculating showed up on my screen if I wanted to find out what some term in a random song or comedy act I was listening to meant. I don't think this is a prudish or censorial viewpoint. - TheDaveRoss 00:00, 4 June 2018 (UTC)[reply]
Then don't click on Wikipedia links. The discussion of what to display on Wiktionary is entirely separate from whether we should try to mark up certain Wikipedia links as potentially NSFW.--Prosfilaes (talk) 04:54, 4 June 2018 (UTC)[reply]
The idea that one should never click on Wikipedia links because some of them will show graphic videos is patently ridiculous. - TheDaveRoss 11:22, 4 June 2018 (UTC)[reply]
The idea that Wiktionary should keep track of and put warnings on links to pages on Wikipedia that might contain "NSFW"/"explicit" images, when Wikipedia itself doesn't put warnings on, is also ridiculous, especially because the pages for which the presence of explicit images is least likely to change and invalidate any such warning-labelling, namely pages about sex or body parts, are ones readers can expect an uncensored encyclopedia to illustrate.
"Explicit"/"NSFW" are nebulous, anyway: is a link to a page with an image of a nipple going to be tagged initially? (I'm sure it'll be tagged in the end; censorship creeps.) Is a link to a page that might or might not contain an illustration of breastfeeding to be tagged, initially? Is a Wikipedia "List of Foo slang" that documents swear words or words for sex acts "explicit", given that we do have at least one user who has a filter that deletes swear words? There are entire communities of bigots who would prefer not to see images of gay people, or of any women. Maybe some people only want to tag the most explicit animations, but the censorship will inevitably creep "to err on the side of caution".
- -sche (discuss) 14:12, 4 June 2018 (UTC)[reply]
There are user-scripts users can individually enable to block images / videos from displaying, if they wish to merely read about ejaculation at work. - -sche (discuss) 14:18, 4 June 2018 (UTC)[reply]
Labeling is not censorship, that is a red herring. As is the fact that we cannot be comprehensive, that is an argument against the project as a whole. The subject of this discussion is not to limit what can or cannot be shown on Wikipedia (or Wiktionary); it is merely suggesting that, in cases where an editor knows that the target of a link contains graphic imagery, the editor can let readers know. This does not impede readers from seeing or following links, it just lets them self-select out of certain content if they wish to, instead of forcing them to play roulette. Graphic content lowers the utility of Wikipedia for many, labeling such content so that users can actively avoid it as they choose mitigates this problem. - TheDaveRoss 15:11, 4 June 2018 (UTC)[reply]
They are labeled; every word that starts this discussion has a definition that clearly warns anyone of what might be in Wikipedia. I'm not even sure where Kaixinguo~enwiktionary expects us to put the warning; he talks about the Persian Wikipedia and says "When I was checking the translations"; are we supposed to warn on every translation? If you can't handle what's on Wikipedia, don't go there, or at least install a filter that should try and protect you.
We are pretty comprehensive in English at least. Our usefulness drops drastically in other languages where we don't have a decent coverage. What good does tagging a handful of Wikipedia links as NSFW do if there are ten times as many NSFW links that aren't so tagged? It gives you no reason to think you can ever safely click on a Wikipedia link, exactly where you started.--Prosfilaes (talk) 21:29, 4 June 2018 (UTC)[reply]

Eliminating undocumented withtext= param in {{borrowed}}

I plan to use a bot to eliminate the remaining places where withtext= is used in {{borrowed}}. The plan is to use a bot to replace "{{bor|...|withtext=1}}" with "Borrowed from {{bor|...}}" whenever the template occurs at the beginning of a line or sentence, and to handle the remaining cases by hand. I've spot-checked a dozen or so cases so far and all of them have {{bor}} at the beginning of a line or sentence, and all of them read fine when using the "Borrowed from" text instead of the auto-generated "Borrowing from" text (and in many cases, "Borrowed" reads better than "Borrowing"). Benwing2 (talk) 18:19, 3 June 2018 (UTC)[reply]

I think you're good to go, this was part of the plan anyway. See Wiktionary:Beer_parlour/2017/November#Template:bor:_Replace_notext=1_with_withtext=1 --Per utramque cavernam 18:27, 3 June 2018 (UTC)[reply]
OK, I wrote the script and it's ready to go. With some special-case hacking, there are only around 135 lemmas (out of 10,200+) that need to be handled manually; most of these are erroneous uses of withtext=1 of various sorts. I'll wait a bit longer to make sure no one objects. Benwing2 (talk) 20:19, 3 June 2018 (UTC)[reply]
I fixed all the manual cases and am running the script to fix the automatic cases. Benwing2 (talk) 03:04, 4 June 2018 (UTC)[reply]
Finished. Benwing2 (talk) 07:21, 4 June 2018 (UTC)[reply]
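
For reference, the replacement rule described in this thread could be sketched roughly as follows. This is not Benwing2's actual script, which is not shown here; the function names, the regular expression, and the simple "start of line or sentence" test are all assumptions made for illustration, and templates nested inside {{bor}} are deliberately not handled.

```python
import re

# Assumption: matches {{bor|...}} or {{borrowed|...}} with no nested templates inside.
BOR_RE = re.compile(r"\{\{(bor|borrowed)\|([^{}]*)\}\}")


def strip_withtext(params):
    """Return (params without any withtext= parameter, whether one was present)."""
    parts = params.split("|")
    kept = [p for p in parts if not p.strip().startswith("withtext=")]
    return "|".join(kept), len(kept) != len(parts)


def at_start(text, pos):
    """True if pos is at the start of the text, of a line, or right after . ! or ?"""
    before = text[:pos].rstrip(" ")
    return before == "" or before.endswith("\n") or before[-1] in ".!?"


def convert(text):
    """Rewrite line- or sentence-initial uses; collect the rest for manual review."""
    manual = []

    def repl(match):
        params, had_withtext = strip_withtext(match.group(2))
        if not had_withtext:
            return match.group(0)  # nothing to do for this template call
        if at_start(text, match.start()):
            return "Borrowed from {{%s|%s}}" % (match.group(1), params)
        manual.append(match.group(0))  # mid-sentence use: leave for a human
        return match.group(0)

    return BOR_RE.sub(repl, text), manual
```

Calling convert() on a page's wikitext returns the rewritten text together with the list of template calls that were left untouched, which mirrors the split between automatic and manual cases mentioned above.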

Parents of foo-mid languages, foo-old languages

The parents of bn-mid (Middle Bengali) and bn-old (Old Bengali) are both given as bn (Bengali), which seems totally wrong. Same for or-mid, or-old, kok-mid, kok-old, etc. Is this correct? Maybe so, because these are etym-only languages, but it seems weird. Benwing2 (talk) 03:03, 4 June 2018 (UTC)[reply]

If the Old/Middle forms of the language aren't considered sufficiently distinct to treat as different languages from the modern language, then I guess it makes sense; after all, Biblical Hebrew is an etymology-only language with (modern) Hebrew as its "parent". - -sche (discuss) 14:06, 4 June 2018 (UTC)[reply]
No, that isn't right. I don't think Old and Middle Konkani have enough attestation to deserve full codes, but Old/Middle Bengali and Odia (or Oriya, whatever we call it here) should be upgraded to real codes. —AryamanA (मुझसे बात करेंयोगदान) 01:47, 5 June 2018 (UTC)[reply]

Soft redirection template for Japanese

Hi everyone. What do you think about the soft redirection format on 貴方?

For pronunciation and definitions of 貴方 – see あなた.
(This term, 貴方, is a kanji spelling of あなた.)

The soft-redirection template is meant to serve the same function as {{zh-see}} for Chinese. Although this is not currently implemented, it should be able to display glosses and copy categories from the lemma entry in the future. If the idea of having a Japanese soft-redirection template is accepted, we can create alternative forms (mainly of pairs like まっとう / 全う) faster by doing away with the need to copy POS headers and to provide glosses manually, which can become out of date if the lemma entry changes.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take): --Dine2016 (talk) 11:31, 4 June 2018 (UTC)[reply]

I support centralisation of Japanese entries. The ongoing trouble is to decide what IS the Japanese lemma. This can't be decided easily. There are good arguments in favour of kana entries and in favour of kanji entries (if a term has both) and it very much depends on:
  1. What is the most frequent spelling?
  2. Are there multiple etymologies and what is their distribution? Unrelated homophones with different etymologies are better off having kanji entries as lemmas, if the kanji spelling is more common than kana.
  3. Verbs or adjectives with the same reading/pronunciation might be better off lemmatised at kana entries with only one inflection table. They are mostly native Japanese words.
  4. Sino-Japanese entries (or more broadly words with 100% on'yomi readings) are better off lemmatised at kanji with the most frequent spelling, only if it happens to be kanji.
It's roughly my position before we jump into making redirects for Japanese entries. --Anatoli T. (обсудить/вклад) 12:02, 4 June 2018 (UTC)[reply]
I was under the impression that the discussion two months ago reached a preliminary consensus for native words. Korn [kʰũːɘ̃n] (talk) 12:28, 4 June 2018 (UTC)[reply]
Thanks for your replies. I think the choice of lemma forms is a separate issue, and existing entries can be left as is before we settle on an approach of centralization. There are lots of words for which most editors would agree on the kanji spelling as the lemma form. At the current stage, the soft redirection template could facilitate the creation of their hiragana forms. --Dine2016 (talk) 12:47, 4 June 2018 (UTC)[reply]
Another term would be ふるさと (furusato, hometown, amongst other meanings). It has (at least) three known attested kanji spellings: 古里, 故里, and 故郷. Wouldn't a kana lemma be better than calling it an alternate spelling of one kanji compound (as in the third)? ~ POKéTalker 13:26, 4 June 2018 (UTC)[reply]
ふるさと (furusato) is a good candidate to be a lemma. It is a native Japanese word, and the Chinese characters for it are only a visual help. Possibly none of them is more common than the kana form, but if one of them is more common, then that spelling should be the lemma. If it's decided that native words are lemmatised at kana, then so be it. --Anatoli T. (обсудить/вклад) 13:37, 4 June 2018 (UTC)[reply]
I support this proposal for {{ja-see}}.
Re: which spelling to choose as lemma, I'll reiterate my preference, as discussed and developed in the earlier thread: native-Japanese terms (i.e. 和語 (wago)) and non-Chinese foreign-derived terms (外来語 (gairaigo)) would go under the kana spellings, while Chinese-derived terms (漢語 (kango)) would go under the kanji spellings. The rationale for this is that wago and gairaigo may have multiple possible kanji spellings (where such exist), whereas kango generally only have one (rarely two) spellings. Kana entries for kango would be soft redirects to the kanji entries (the current status quo), while kanji entries for wago and gairaigo would be soft redirects to the kana entries (this would require changes from our current state). The wago and gairaigo entries could specify spelling frequencies with usage notes or labels.
Example: とる (toru) has a basic meaning of to take. However, which kanji spelling is used most frequently depends on which sense is intended: 撮る for photography and video, 採る for samples to be used for something, 捕る for capturing or trapping a pest or catching a pop-fly, 獲る for capturing or trapping prey, 盗る for taking something illicitly, etc. etc. Personally, I think it makes the most sense to consolidate all of these under the とる (toru) kana spelling, and in fact, this is what monolingual JA-JA dictionaries essentially do. See this entry in Daijirin for the 採る spelling, grouped with the other spellings of the same etymology. ‑‑ Eiríkr Útlendi │Tala við mig 16:59, 4 June 2018 (UTC)[reply]
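
The lemma rule proposed above could be sketched like this. It is only an illustration of the stated preference, not an existing module: the Origin enum, the function names, and the convention that the preferred kanji spelling of a kango is passed first are all assumptions.

```python
from enum import Enum


class Origin(Enum):
    WAGO = "wago"          # native Japanese terms
    GAIRAIGO = "gairaigo"  # non-Chinese foreign-derived terms
    KANGO = "kango"        # Chinese-derived terms


def lemma_script(origin):
    """Kango are lemmatized at kanji; wago and gairaigo at kana."""
    return "kanji" if origin is Origin.KANGO else "kana"


def classify_spellings(origin, kana, kanji_spellings):
    """Return (lemma, soft_redirects) for one term.

    Assumption: for kango, the caller lists the preferred kanji spelling first.
    """
    if lemma_script(origin) == "kana":
        # Every kanji spelling becomes a soft redirect to the kana entry.
        return kana, list(kanji_spellings)
    # The kana reading and any further kanji spellings redirect to the kanji lemma.
    return kanji_spellings[0], [kana] + list(kanji_spellings[1:])
```

For とる this puts the lemma at the kana spelling and turns 撮る, 採る, 捕る and the rest into soft redirects, while a kango such as 精神 stays at its kanji spelling with せいしん redirecting to it.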
@Eirikr: Hi. OK, 繰り返し (kurikaeshi) is a wago. Could you explain why 繰り返し should be a redirect to くりかえし if 繰り返し is the most common spelling? --Anatoli T. (обсудить/вклад) 23:01, 4 June 2018 (UTC)[reply]
Two reasons: 1) technical constraints inherent in the MediaWiki platform, and 2) consistency.
  • The technical constraint has to do with how we have redirects. Electronic JA-JA dictionary apps that I've been able to try out generally have pretty slick redirection, where the user can input kanji or kana and still get to the desired entry either directly or with one additional input. If the user clicked the wrong entry, the list is still there on the screen, so no need to go back: just click another entry in the list. We could do that with hard redirects, but since a single spelling might overlap with terms in other languages, we cannot do so across all JA entries.
  • The consistency consideration is in part to match other JA-JA dictionaries (lemming-wise, including our cohorts at the JA WT), in part so all wago are lemmatized similarly, and in part for usability.
Wago readings tend to be unambiguous, with one reading matching one term. This is consistent with the history, where wago derive from the verbal language. Spellings were an afterthought as literacy was imported into Japan from a completely unrelated donor language. Even today, usage conventions for okurigana (the kana after the kanji) can be somewhat loose -- the KDJ lists kurikaeshi under the spelling 繰返, for instance. But if a user knows the pronunciation, they can always spell out the kana.
Kango, meanwhile, were borrowed from written Chinese, with a focus on the meaning inherent in the characters and without much regard for what they sound like. In extreme instances, a single kango reading might have tens of spellings. せいしん (seishin), for instance, generates 21 distinct hits in my local electronic Daijirin, each with distinct derivations and senses. せいせい (seisei) generates 28 distinct spellings, belonging to 25 different terms. とうし (tōshi) generates 24 distinct spellings for 20 terms.
This difference in history and verbal / visual distinctiveness carries over into how terms are used: wago are used more in spoken and informal speech, where auditory disambiguation is key, while kango are used more in written and formal texts, where the written text allows authors to visually specify meaning that might be lost in a spoken medium.
→ Broadly speaking, wago are phonemically distinct (kana spellings), while kango are graphemically distinct (kanji spellings).
Conversely, one could turn your question around: for any given wago, not just 繰り返し (kurikaeshi), why would we use the kanji for the lemma? There's more overhead for editors in having to identify which spellings are more common (clear for kurikaeshi, but not always so simple for other terms, and not always provided even in modern dictionaries), duplication of data at multiple spellings and/or frustrating arbitrariness where we have to just choose one among multiple current variants of roughly equal frequency, and more potential for confusion among users (which spelling? which okurigana? why is Wago A under a kanji spelling, but Wago B under a kana spelling?). Using kanji spellings for wago can also obscure otherwise-clear relationships, as observable at あばく (abaku). For instance, if we split each sense of とる (toru) out to its kanji spelling, we would fracture the entry and make it harder for users to see that all the spellings of toru are just shades of meaning of the same verb. Imagine if English get were similarly split up, where each sense had a distinct spelling and separate entry, despite all senses having the same reading, same derivation, same underlying meaning.
‑‑ Eiríkr Útlendi │Tala við mig 00:09, 5 June 2018 (UTC)[reply]
@Eirikr: OK, thanks for the detailed answer, almost convinced. Well, the Chinese handling is not perfect either - simplified characters act as redirects, even if their usage is much higher than that of the traditional. --Anatoli T. (обсудить/вклад) 07:22, 5 June 2018 (UTC)[reply]

Transclusion

@Eirikr: I know it's a bold idea, but what if we make the template transclude the appropriate sections from the lemma entry, like this?

==Japanese==
{{ja-see|繰り返し}}
For pronunciation and definitions of くりかえし – see 繰り返し.
(This term, くりかえし, is a kana spelling of 繰り返し.)
Pronunciation
Noun

繰り返し (hiragana くりかえし, rōmaji kurikaeshi)

  1. repetition

--Dine2016 (talk) 01:06, 5 June 2018 (UTC)[reply]

Very nice. Wyang (talk) 03:33, 5 June 2018 (UTC)[reply]
I'm more than okay with that. I've long thought about this kind of transclusion as a means of providing users the relevant info while avoiding flat-out manual duplication. ‑‑ Eiríkr Útlendi │Tala við mig 03:37, 5 June 2018 (UTC)[reply]
@Dine2016, just expanded 繰り返し (kurikaeshi). How can any sections (etymology, kanjitab, derived terms, etc.) be omitted in the appropriate entry with ja-see? ~ POKéTalker 04:18, 5 June 2018 (UTC)[reply]
@Poketalker: I think every relevant section should be transcluded, so that whether the reader searched for くりかえし or 繰り返し, they will be able to get the same information on the word 繰り返し (kurikaeshi, repetition). Different spellings, same word, same information. I believe this is how the electronic dictionaries Eirikr mentioned above work, except that we don't require an extra click if we take this approach. An exception may be made for {{ja-kanjitab}}, which is usually spelling-specific. --Dine2016 (talk) 06:59, 5 June 2018 (UTC)[reply]
@Poketalker: There are technical restrictions on the amount of transclusion. In addition, homograph entries like 上下 (agarisagari, ageoroshi, agekudashi, agesage, ueshita, kamishimo, shōka, jōka, jōge, noboriori) can be long, making it hard for readers to find the entry they're looking for and canceling out the advantage of not requiring an extra click for the full entry. Reconsidering the issue now, I think it's ok to just display the POS headers and definitions (and perhaps pronunciation and usage notes) and direct the reader to the lemma entry for full information. Alternatively, we can include all the information but make the templates collapsed by default (which is probably technically inferior due to page load time). @Wyang, Eirikr, any thoughts? --Dine2016 (talk) 11:24, 6 June 2018 (UTC)[reply]
@Dine2016: You have actually made the kana entry a redirect to kanji in your example but Eirikr wanted the other way around, no? Good job, anyway. --Anatoli T. (обсудить/вклад) 07:22, 5 June 2018 (UTC)[reply]
@Atitarev: Yes, but {{ja-see}} is expected to support both kanji-to-kana and kana-to-kanji, so converting it to the other way should be easy. --Dine2016 (talk) 07:48, 5 June 2018 (UTC)[reply]
The main potential problem I can see would be when the template is on a page with lots of other content, such as huge lists of Chinese compounds on kanji pages. There are limits to the amount of memory and transcluded content allowed on a single page. Going over the memory limit causes highly visible module errors. Going over the transclusion limit, on the other hand, means that every template that hasn't already been transcluded becomes a hyperlink to the template, so {{l|en|example}} is replaced by Template:l (sometimes it just has the invoke statement, but either way, it's useless). Worse, these unexpanded duds are almost always at the bottom of the page, where editors are less likely to spot them: unless you happen to check Category:Pages where template include size is exceeded or notice the category at the bottom of the page, you may not realize anything is wrong. See User:Hermitd/Greek wordlist for an example. That said, the limit is 2 MB of transclusion, so it may not happen much. Chuck Entz (talk) 04:56, 6 June 2018 (UTC)[reply]
@Chuck Entz: Thanks for the heads-up. What contributes to reaching the transclusion limit: fetching the wikicode of the lemma entry with getContent(), expanding the relevant sections with preprocess(), or only returning the expanded wikicode from module to page? --Dine2016 (talk) 07:12, 6 June 2018 (UTC)[reply]

Hey. I'm probably gonna start the new Wiktionary:Summer Competition 2018 soon. It's Wiktionary:Christmas Competition 2013 repeated, but with a less able Gamesmaster. I should probably fix a few things first. I'll keep y'all posted about publication dates etc. when I can be bothered to. --Genecioso (talk) 21:09, 4 June 2018 (UTC)[reply]

How come...

Wiktionary:Translation requests is such a popular page? --Genecioso (talk) 22:04, 4 June 2018 (UTC)[reply]

WMF doesn't keep track of (or at least doesn't publish) referrer statistics for individual pages. If someone here were so inclined, they could make a tool to record referrers in a database and add some JavaScript on this end to populate the table; then we could find out where all of the traffic is coming from. We don't seem to rank highly on Google for many seemingly obvious queries. - TheDaveRoss 14:11, 5 June 2018 (UTC)[reply]

Important: No editing between 06:00 and 06:30 UTC on 13 June

This is just to tell you that your wiki will be read-only between 06:00 UTC and 06:30 UTC on 13 June. This means that everyone will be able to read it, but you can’t edit. This is because of a server problem that needs to be fixed. You can see the list of affected wikis on Phabricator.

If you have any questions, feel free to write on my talk page on Meta. /Johan (WMF) (talk) 12:37, 5 June 2018 (UTC)[reply]

For those who don't know what to do during that time, take up the harmonica - I have one to donate - send me an email with your address, and I'll send it off to you. --Genecioso (talk) 13:43, 5 June 2018 (UTC)[reply]
Sounds great! User:Equinox, c/o Wikimedia Foundation, 1 NOTPAPER Way, Banville, CA 95966. Equinox 16:41, 5 June 2018 (UTC)[reply]

Buryat IPA transcription

I don't know if this is appropriate, but can anyone check if the IPA transcriptions for the Buryat lyrics of the Buryatia regional anthem here are accurate? There might be some errors. If there are any errors, would anyone (who is familiar with Buryat phonology and/or phonologies of Mongolic languages) provide a better transcription than the current one? Thanks. 213.183.63.189 04:18, 6 June 2018 (UTC)[reply]

On the placement of constructed languages, and on the attestation of appendix-only languages

Wiktionary:Votes/pl-2018-04/Disallowing appendix-only languages has failed. I think Gamren has taken things a bit backwards, and it made it look like he wanted to delete the mainspace-like content (i.e. the entries) currently hosted in appendices altogether. If I've understood the issue correctly, that wasn't his point at all. I think his view is that our current separation between main space and appendix-only languages

  • 1) is artificial;
  • 2) leads to our hosting unchecked content.

I'll address the second issue first. In my view, he's made a valid point: it would seem that, at present, Appendix-only languages are not subject to any attestation criteria. Is that really what we want, and what we wanted when we relegated Lojban to the Appendix namespace (Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix)?

I don't think so, or at least I hope not; in my opinion, all words, wherever they are, should be subjected to some kind of attestation criteria. Does everyone agree on that, or does it need to be put to the vote?

If that's agreed upon, the question would then be: what kind of attestation criteria do we want for Appendix-only languages?

  • the stringent ones of WDL?
  • the more lenient ones of LDL?
  • a middle ground (two quotes?)?
  • a mix of both (i.e. submitting some languages to the WDL criteria, others to the LDL criteria)?
  • something else, but something?

Whatever the answer, I see one problem: given that the distinction between main space languages and appendix-only languages would be:

  • neither one of attestation (since we'd have agreed that all of them need some kind of attestation);
  • nor one of strength of attestation (since we already make a distinction in the main space between WDL and LDL – it thus seems difficult to find a third one which would be completely specific to appendix-only);
  • nor one of "naturalness" (since there are both natural languages and constructed languages in the main space);

what exactly would be the criterion? Is there a meaningful distinction to be made between main space languages and appendix-only languages?

  • If we want to make strength of attestation the criterion, we move all LDL-subjected languages – natural or constructed – to the appendix namespace, and all WDL-subjected languages – natural or constructed – to the main space (that's a shit idea, if you ask me);
  • If we want to make "naturalness" the criterion, we move all constructed languages to the Appendix: all natural languages belong in the main space, and all constructed languages (that we've agreed to keep on Wiktionary) belong in the Appendix space, regardless of the attestation criteria we'll choose for each. That would be my preference, but I think many people will be opposed to that.

Given that neither of those solutions is particularly appealing, we have to look further. Fact one: there are only constructed languages in the appendix. Fact two: constructed languages kept in the main space are subject to stringent (WDL) attestation criteria. Fact three: ...

But if, in the end, there really isn't any meaningful distinction to be made (which, again, I'm not convinced of), we go back to Gamren's solution: disallowing appendix-only languages, which means:

  • 1) moving everything to the main space;
  • and 2) working from there: what do we want to keep, under which criteria, and what do we want to see deleted for good?

I might have taken a shortcut or two, but I think that's the gist of it. I hope I haven't misrepresented the facts. --Per utramque cavernam 17:52, 6 June 2018 (UTC)[reply]

With LDL languages the presumption is that we're referencing a dictionary that recorded use, even if we don't have primary access to that use. With constructed languages that may not be the case. I could imagine doing something where words in a particular constructed language would be disallowed in mainspace unless they had three actual examples of use (*not* based on RFV, but required examples before they are in mainspace at all), moved to the appendix if all they had was a dictionary reference, and deleted otherwise. DTLHS (talk) 19:03, 6 June 2018 (UTC)[reply]
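
The three-way rule suggested in the comment above could be written down as a small decision function. This is only a sketch of that one suggestion, not agreed policy; the function name, its parameters, and the returned labels are hypothetical.

```python
def place_entry(use_examples, has_dictionary_reference):
    """Decide where a word in a given constructed language would go.

    use_examples: number of actual examples of use supplied with the entry.
    has_dictionary_reference: whether a dictionary reference is available.
    """
    if use_examples >= 3:
        return "mainspace"  # three actual examples of use are required up front
    if has_dictionary_reference:
        return "appendix"   # a dictionary reference without enough examples of use
    return "delete"         # neither kind of support
```

Under this reading, the threshold of three examples is checked before an entry is ever placed in mainspace, rather than being tested later through RFV.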