Wiktionary:Beer parlour/2017/October


October LexiSession: punishment

The Punishment of Loki.

The monthly suggested collective theme is punishment. Not so funny, but the 10th of October is the World Day Against the Death Penalty so we may look at the alternatives and do better descriptions around this theme.

LexiSession is a collaborative experiment without any guide or direction. You're free to participate however you like and to suggest next month's topic. If you participate, please let us know here or on Meta, so we can keep track of the evolution of LexiSession. In one year, 35+ people have participated! I hope some people will be interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every project at the same time, giving us more insight into the ways our colleagues work in the other projects.

See you soon. Noé 09:28, 1 October 2017 (UTC)

slow slicing and poena cullei are my requests for this month (entries for these, not as punishment...although, they could be useful detractors for trolls...) --P5Nd2 (talk) 09:39, 8 October 2017 (UTC)

Special:Contributions/98.113.14.63

Adding translations to too many unrelated languages. No idea where they get transliterations for Chinese dialects, such as Jin, Gan, Xiang, etc. --Anatoli T. (обсудить/вклад) 10:51, 1 October 2017 (UTC)

How the heck did they find Prakrit translations? They must be going through the entries we have already. —Aryaman (मुझसे बात करो) 20:44, 1 October 2017 (UTC)

French Wiktionary September news

Hello!

The September issue of Wiktionary Actualités just came out in English!

In this issue: Comments about press articles, our information desk is not like yours, a description of a dictionary of short-text signs, a comment on the expression of gender in an Andean language, some cool videos about words (in French and English!), announcements for the Wikiconférence francophone in October and plenty of statistics with fancy fleurons surrounding it all!

As usual, it was translated into English by non-native speakers in less than a day, so it is not perfect, but readers can improve it (wiki-spirit). We did not receive any money for this publication and we are not supported by any user group or chapter. It is written only by the community, for the large community of lexicolovers! I hope you did not feel harassed by this notice. Noé 21:41, 1 October 2017 (UTC)

PulauKakatua19 (talk • contribs) again

This user is adding spurious Hittite entries, with some really bad/outdated etymologies. No references either. They have been warned many times for Russian, Hindi, and Rohingya edits. I suggest a week-long block. —Aryaman (मुझसे बात करो) 14:51, 3 October 2017 (UTC)

Etymological information for strong verb non-lemma forms

There are many forms, e.g. in English, where arbitrary, irregular, strong forms of verbs deserve their own etymology. Many of these individual forms have received particular attention from linguists over the decades, e.g. did (past tense of do, from a unique, non-past reduplicated root form of the ancestor of do dating back to Pre-Germanic, for unknown reasons), sang (past tense of sing derived directly from a Proto-Indo-European form of the ancestor of sing). These non-lemma forms have their own independent etymological lineage that can be traced back thousands of years.

A certain administrator (Rua) has informed me that it is policy on Wiktionary to minimize etymological information on non-lemma forms, and instead place such information in the lemma form's etymology section. This can be understood for weak forms like walked, but those forms need little explanation because they are formed regularly, and for the forms that do require extra explanation, it makes for unsightly etymology pages on the lemma form's etymology sections (see Proto-Germanic *dōną's etymology section for the current policy specification; it doesn't even specify the past form *dedǭ, referring to it only as "the past form").

I understand the concern to avoid etymology fragmentation, but in this case, the etymology itself is fragmented and the two forms are remembered as separate, arbitrary, irregular forms. Perhaps there is a solution to maintain the same etymology information in multiple pages, but I think the simplest solution would be to provide etymological information for such forms on their own pages. There is really no reason to avoid this practice, and avoiding it only makes things more confusing. I am surprised that this is against current policy. Do you agree with this assessment?

128.84.127.223 16:17, 3 October 2017 (UTC)

Strongly oppose putting etymologies on every inflected form, irregular or not. —Rua (mew) 16:35, 3 October 2017 (UTC)
  • Out of curiosity -- where should such etymological information go? Some simple-present verb forms include etymological information for irregular conjugated forms, such as at [[go#Etymology 1]]. Others do not, such as at [[do]], which includes no explanation for the formation of [[did]]. ‑‑ Eiríkr Útlendi │Tala við mig 17:29, 3 October 2017 (UTC)
    • At the lemma entry, where we currently already place them. The IP is arguing that we should put etymologies on nonlemma entries too, which is going to lead to a huge duplicative mess. —Rua (mew) 17:47, 3 October 2017 (UTC)
I am proposing a move of the notable etymologies from the lemma to the non-lemma forms, if they are notable as in strong verbs. There is no duplication going on, only a move, as I indicated in the OP. 128.84.127.223 17:58, 3 October 2017 (UTC)
Then I still oppose it, the etymological information should be centralised on the lemma form. That's how all etymological dictionaries work, that's how we've worked so far too. Our users are accustomed to follow the link to the lemma for information, which is the purpose of non-lemma entries in the first place. They're there to help get users to the right place, nothing more. We should not scatter our information across various non-lemmas. —Rua (mew) 18:34, 3 October 2017 (UTC)
Traditional etymological dictionaries are constrained by the space in a book and give priority to lemma forms because they are the most popular. There is no real reason to ignore non-lemma forms or centralize their etymologies because Wiktionary doesn't have a size constraint, especially for adding reasonable information. I disagree that the only purpose of non-lemma forms is to provide a link to a lemma form; many non-lemma forms have lineages in their own right and there is no reason to marginalize them. Furthermore, users are not accustomed to follow the link to the lemma forms as you suggest; precedents for separate etymologies for non-lemma forms like done, is, are, am, etc. already exist and have existed for a long time. 128.84.127.223 18:37, 3 October 2017 (UTC)
Size constraint isn't the issue. It's keeping our information organised so that information can be found easily. And what I said is the agreed-upon purpose of non-lemma forms. It's why we don't include things such as derived terms, descendants or inflection tables on non-lemmas. Wiktionary is fundamentally lemma-oriented (or lexeme-oriented) rather than word-oriented. If we were word-oriented, we'd also include full definitions on non-lemmas, but thankfully we've been wise enough to not follow that idea. —Rua (mew) 18:42, 3 October 2017 (UTC)
Semantically and synchronically, what you're saying is correct; non-lemma forms don't require separate definitions. The placeholders used now are adequate. Etymologically and diachronically, it's incorrect. Irregular non-lemma forms are entirely independent of their lemma forms. Wiktionary is a semantically lemma-based dictionary, but that's completely unrelated to etymology. There is good reason for irregular non-lemmas to provide etymologies, and the semantic value of the terms has no bearing on it. 128.84.127.223 18:46, 3 October 2017 (UTC)
Ok, but as you must understand by now, that's not how Wiktionary works. The etymologies for the individual parts are noted on the lemma. You'll simply have to adapt to this practice. We're not going to change it just because some random user doesn't like it. —Rua (mew) 18:48, 3 October 2017 (UTC)
You are not arguing against my point, you are arguing your point because "that's the way it's always been done" and based on ad hominem because I'm "some random user". 128.84.127.223 18:54, 3 October 2017 (UTC)
  • I am not proposing putting etymologies on "every inflected form", only on the arbitrary forms with their own separate, traceable etymologies, if only to indicate their significance. The regular forms don't require etymologies because they are predictable. E.g. the etymology for the strong non-lemma form of sing, which is sang:
From Old English sang, from Proto-Germanic *sang, from Proto-Indo-European *songʷh-, o-grade past tense of *sengʷh- (sing, make an incantation).
Right now, the article sang doesn't indicate any of this lineage at all. As a strong and unpredictable form, lexically, sang is just as prominent as its form which is arbitrarily deemed the lemma form, sing, which independently derives from a different PIE form. There is no reason to treat it as a secondary form etymologically, at least in this case. 128.84.127.223 17:36, 3 October 2017 (UTC)
That's not any better. Consider how many times we'd have to duplicate the etymology for all 12 of the past tense forms of vera or syngja. The lemma is a natural place for etymologies, since it's a single central entry that covers all inflected forms. —Rua (mew) 17:47, 3 October 2017 (UTC)
That's not duplication, that's providing the very separate etymologies for very separate forms. If the forms merge at a certain point, then a link can be provided to the form from which they split off to avoid etymology duplication, as is done with borrowed terms. The words is, are, and were, for example, are all forms of be, but does that mean these forms should not provide their own etymologies? Are the etymologies of these forms of less interest and notability than any other term? They are not. 128.84.127.223 17:56, 3 October 2017 (UTC)
  • I could be wrong, but I don't think Rua is arguing that the etymologies of conjugated forms are not worthy of inclusion. I believe that she is instead arguing that the etymologies of conjugated forms should go within the etymology of the lemma form, and that the conjugated-form entries should be minimal.
The issue at hand is not whether to include or exclude certain information -- rather, it is about where to include that information. ‑‑ Eiríkr Útlendi │Tala við mig 18:09, 3 October 2017 (UTC)
Right. I am proposing that it should go in the non-lemma form. In fact, what I'm proposing is already standard practice for many notable forms, e.g. done. I want this to be consistent. For verbs like be, it would clutter its etymology section to list all of the etymologies for all of its many suppletive forms is, are, were, was, am, etc. One interested in these etymologies can follow the links to these forms' pages (which are provided in the term head) and view the etymology. More importantly, someone who specifically searches for irregular forms should have immediate access to their etymologies on the same page.
When I go to the page for am (which actually already follows the format that I'm proposing; I don't think anyone would want to move its etymology to the be page), I want to know the etymology of the term. When dealing with etymology, I don't really care if it's a form of any other (in this case, a completely unrelated lemma form), I want immediate access to its own unrelated and notable etymology. I believe this seems fairly reasonable and already has precedent. 128.84.127.223 18:20, 3 October 2017 (UTC)
The lemma entry is a central place for the term and all of its inflections. Information about am concerns the lemma be, so it should go there. The individual parts of verb paradigms may have separate origins but they don't have separate etymologies because they are inherited as a whole. The verb be in modern English is the same paradigm as the verb been in Middle English. —Rua (mew) 18:38, 3 October 2017 (UTC)
That's incorrect. Separate forms of a verb are not "inherited as a whole". That doesn't even make any sense. Irregular forms all require individual memorization and passing down. The lineage of was, for example, is entirely separate from be, as are both from am. If what you were saying was true, all Indo-European languages would still preserve the verb paradigms of Proto-Indo-European. They do not. They mix, they match, they innovate, they supply. 128.84.127.223 18:41, 3 October 2017 (UTC)
But they still form a single verbal paradigm. A question like "what is the past tense of be?" has an answer precisely because paradigms exist. We have chosen to use a single form to stand in for the entire paradigm, the lemma form, for convenience. That's where etymologies also go. —Rua (mew) 18:46, 3 October 2017 (UTC)
Etymologically, verbal paradigms don't matter for irregular forms. We have chosen the lemma form to stand in for the non-lemma forms semantically, but we have not done so etymologically, because that makes no sense. 128.84.127.223 18:48, 3 October 2017 (UTC)
We've chosen to do both. I'm sorry if that makes no sense to you, but it is what it is. —Rua (mew) 18:49, 3 October 2017 (UTC)
Please point me to the specific passage in Wiktionary policy that says this. I will propose the change through the proper channels. 128.84.127.223 18:53, 3 October 2017 (UTC)
I found it myself, and lo and behold, you seem to be the one who added this into the "common guidelines" page in the first place. While I agree with most of your additions, exclusivity of etymology to the lemma page is one that does not make any sense. 128.84.127.223 19:06, 3 October 2017 (UTC)
I am in favor of continuing to not split etymologies, on the grounds of workability: an editor who is interested in adding this type of information should be able to see at a glance if it has been done already, without checking each relevant non-lemma entry separately. On the other hand, I don't see a problem in directing users from non-lemma forms to the lemma, in cases where they need a separate discussion.
Actual suppletion seems like a different case, though. is, are and be have completely unrelated etymologies, and continuing to maintain separate etymology sections for them seems like a good idea (but I'd again be in favor of pointing users from the lemma form to the other entries for further reading). --Tropylium (talk) 20:49, 3 October 2017 (UTC)
I don't want to split etymologies, except like you said, for terms with suppletive forms and terms with strong forms. For example, the etymology of "did" takes a separate lineage all the way back to Proto-Indo-European that's completely independent from "do"; despite not being a suppletive form, it's a strong form. I don't want to split etymologies for verbs like "walked", only for verbs like "did" and "is/am/are" and "brought". One page should not contain etymologies for different terms if the etymologies are not currently regularly formed. So this would only be an exception that would affect a relative minority of pages. Wouldn't you agree with this? 128.84.127.223 21:47, 3 October 2017 (UTC)
English has relatively few inflected forms, but it can get pretty complicated when you have forms inflected for gender, number, case, etc. Even in English, am, is and are all go back to inflected forms of the same Proto-Indo-European root. As for strong verbs, I don't think differences in ablaut grade are enough to justify maintaining separate etymologies. We have a recognized system of lemmas and non-lemmas, but I'm not sure how you could decide which form to make the "etymology lemma" for forms sharing an etymology. Chuck Entz (talk) 02:15, 4 October 2017 (UTC)
I have trouble with the vagueness of "strong forms". This is well-defined only for Germanic languages, not a generally applicable concept. Likewise, "having a separate lineage" holds for a lot of things, for starters all irregular forms in general. We have a separate etymology for mice; should we also have separate etymologies for taught or bent?
I think the default assumption should be that, if not otherwise specified, it is not merely the lemma but all applicable inflected forms that descend from a given ancestor. If we give mūs as the ancestor of mouse, then this should already imply that the former's plural mȳs is the ancestor of the latter's plural mice. This gets rid of having to treat any irregularities that represent fossilized original regular alternations, no matter how far back they go. We are working on etymology sections here after all, not on historical morphology or historical phonology.
To be fair, without morphological and phonological supplementary information, etymology often becomes fairly opaque just-take-my-word-for-it business, and I do think Wiktionary could benefit from detailing these somewhere; I just do not think etymology sections are the place for this. --Tropylium (talk) 10:26, 4 October 2017 (UTC)
Having mūs as the ancestor of mouse does not immediately imply that mice derives from mȳs, or make it clear to the viewer. There is no duplication of information going on when etymology is given for mice, only clarification and necessary etymology. Apparently, someone rightly found that etymology should be specified for this non-lemma form, since an etymology section for mice already exists. Anyhow, I think this is being blown out of proportion. I would only ask for the option of specifying non-lemma etymologies where they are notable, as has already long been done with the article of am. Rua would delete all these etymology sections (despite am being an oft-cited non-lemma form for the purposes of reconstruction). When I make an etymology section on brought and did to explain their opaque etymologies, I don't want my edits nonsensically moved and crowded under the etymology pages of bring and do (or more often than not, simply deleted). These sorts of power trips by administrators not following the spirit of the guidelines (that they themselves wrote!) just make me incredibly discouraged from adding information to this website. 128.84.125.120 18:04, 4 October 2017 (UTC)
@Tropylium How would you handle the suppletion of the potential of olla, the perfect of sum, or in być? Putting etymologies on each of the forms is not going to be feasible. —Rua (mew) 23:36, 3 October 2017 (UTC)
There's only a limited amount of suppletion for any given case; we could assign an "etymological lemma" for each nonsuppletive group (e.g. lienee for the Finnish potential stem). --Tropylium (talk)
Ew. —Rua (mew) 11:04, 4 October 2017 (UTC)
Seconded Rua. Anti-Gamz Dust (There's Hillcrest!) 00:34, 16 October 2017 (UTC)

Rollbacking/Patrolling

Hullo. I'd like to make a request for the rollbacking or the patrolling tool. Where is it at? --Barytonesis (talk) 08:09, 5 October 2017 (UTC)

@Barytonesis: An admin has to nominate you at WT:Whitelist I think (or is that only for auto patrol)? —Aryaman (मुझसे बात करो) 17:01, 5 October 2017 (UTC)
I think that rollback/patrol most often is applied to people who, for one reason or another, do not want to be administrators. Just apply to be an admin if you want some subset of the tools. - TheDaveRoss 17:04, 5 October 2017 (UTC)
@TheDaveRoss: I'd like to, but I don't think I've gathered enough trust yet. Would you endorse me? --Barytonesis (talk) 16:42, 14 October 2017 (UTC)

A more personal form of Google Translate just for Faroese

https://www.faroeislandstranslate.com/#!/ Justin (koavf) TCM 08:01, 6 October 2017 (UTC)

Entries with deprecated labels

The label (ordinal) used for ordinal numbers is listed in Category:Entries with deprecated labels with no suggested replacement. Should it even be listed there? DonnanZ (talk) 13:21, 6 October 2017 (UTC)

There is no replacement. There should not be a label there at all, add the category with {{head}} or {{cln}} instead. —Rua (mew) 13:27, 6 October 2017 (UTC)
The label automatically generates the category though, as well as saying what it is, so I don't see any reason to change it, e.g. nittende. Besides that, there is no suggestion to use {{head}} or {{cln}} in the above-mentioned category. DonnanZ (talk) 13:45, 6 October 2017 (UTC)
It's a misuse of labels, that's why it's deprecated. "Ordinal" doesn't specify a context in which a term is used. —Rua (mew) 13:57, 6 October 2017 (UTC)
Whoever set up the label didn't take that into account. It surely would be a simple matter to change the label to "ordinal number", although loads of entries would have to be revised. "cln|nb|ordinal numbers" works for generating the category, but a qualifier would then have to be added, which is twice as much writing, and a step backwards. DonnanZ (talk) 14:11, 6 October 2017 (UTC)
The other label (cardinal) when moused over shows "cardinal number", but this doesn't happen with (ordinal). It is not deprecated. DonnanZ (talk) 14:47, 6 October 2017 (UTC)
"ordinal number" is also not a valid context. Context labels should not be used to give definitions or disambiguate them. They are meant to describe how something is used, not what it means. —Rua (mew) 15:20, 6 October 2017 (UTC)
Have you checked ordinal number? Also see here. Nineteenth is an ordinal number. DonnanZ (talk) 15:35, 6 October 2017 (UTC)
Where are you getting the idea that I'm denying that these are ordinal numbers? I only said that a context label is not how this fact should be indicated. The entry should be categorised with {{cln}} or the cat2= parameter on {{head}}, but there shouldn't be a context label saying that it's an ordinal number. —Rua (mew) 15:47, 6 October 2017 (UTC)
I agree with RuaCat. Ordinal numbers should be categorized as such using |cat2= or {{cln}} but not using {{lb}}. —Aɴɢʀ (talk) 16:21, 6 October 2017 (UTC)
I still disagree, but as you are so keen on everything else but, perhaps you would like to come up with some usage examples. DonnanZ (talk) 22:31, 6 October 2017 (UTC)

Please, please reveal the cause of the revert in the edit summary

The default text "If you think this rollback is in error, please leave a message on my talk page." conveys no information. In as many words you could say something specific about the actual problem.

Instead of writing this formula, which is pure junk, it would be more helpful for all of us if you would just write the reason in the edit summary (this way we won't have to bother you on your talk page).

A revert reduces someone's work to nothing. Please, please either correct the error or at least give a hint about the problem to avoid.

(Sorry for my poor English.)

Karmela (talk) 07:09, 8 October 2017 (UTC)

There are relatively few admins who have to go through a flood of edits by new contributors and see whether they belong in the dictionary or not. Given that, we simply do not have the time to give explanations tailored for every rollback that we make (if it wasn't clear, the default text is added automatically). I created the vote that added that default text because previously, it said nothing at all — obviously, this is much better, because you followed the instructions and left a message on Wikitiki89's page, where you can further discuss the edit. —Μετάknowledgediscuss/deeds 07:22, 8 October 2017 (UTC)
Thank you. For a (non-vandal) contributor the cause of the rollback is _never_ clear; s/he made the contribution supposing it was OK.
The list of typical errors cannot be very long; would it be possible to choose from a premade list of explanations when reverting?
Karmela (talk) 16:37, 8 October 2017 (UTC)
We have such a list for deletions of entire entries. It would be a good start for what you recommend. I do not know whether it is readily done technically. DCDuring (talk) 18:59, 8 October 2017 (UTC)
  • @DCDuring, Metaknowledge In en.wikipedia.org you can add two dropdown boxes below the edit summary box with some useful default summaries:
  1. Common edit summaries -- click to use
  2. Common minor edit summaries -- click to use
One can enable this gadget at https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-gadgets
An analogous dropdown box, "Common revert summaries -- click to use", must be technically similar.
Karmela (talk) 07:47, 14 October 2017 (UTC)
So, apparently technically possible. How do we get it? DCDuring (talk) 14:01, 14 October 2017 (UTC)
This is how: mw.loader.load('//en.wikipedia.org/w/index.php?title=MediaWiki:Gadget-defaultsummaries.js&action=raw&ctype=text/javascript'); Dixtosa (talk) 14:20, 14 October 2017 (UTC)
All this presupposes that the community wants it. Is this the correct place and form in which to ask the Wiktionary community?
Karmela (talk) 08:52, 15 October 2017 (UTC)

Requests for deletion - restoring the list of nominations

In June 2017, WT:RFD was changed to no longer list items nominated at the right top of the page. I propose to restore the previous state. The current state is that categories are listed but not the items nominated themselves. That is not very useful, IMHO.

Therefore, I propose:

  • List nominated items again, as a list of items for all languages.
  • To support that, list all nominated items in Category:Requests for deletion instead of listing them only in per-language categories. This, again, is a restoration proposal.

--Dan Polansky (talk) 09:34, 8 October 2017 (UTC)

@Dan Polansky: If you want to see, say, the 5 French requests, click on the "▶" symbol next to the "Requests for deletion in French entries‎ (0 c, 5 e)". In my opinion, this is more useful than before, because now you can choose the language you want to see, as opposed to seeing a mess of entries in all languages. If we want to see a mess of entries in all languages, we may look at the normal TOC (the "Contents" list). I believe we also have the option of making all languages un-collapsed by default, though personally I'd prefer them collapsed as they currently are. --Daniel Carrero (talk) 09:58, 8 October 2017 (UTC)
I want to see the complete list, not by language. I only want to check whether all the items listed there were put to RFD page itself; if I did not want to do that, I would not want to see that right-floating portion of the page at all. --Dan Polansky (talk) 13:32, 8 October 2017 (UTC)
  • I support Dan's proposals. The language-specific RFD categories seem to be useless. —Μετάknowledgediscuss/deeds 16:05, 8 October 2017 (UTC)
    How do we know that no one uses the by-language listings? (BTW, I don't use them)
    BTW, I have noticed that we have a fair number of headings on request pages that do not have tags. Do we need yet another run against the XML to identify:
    1. Tagged L2s that are not on current request pages.
      1. Tagged L2s that are for archived or otherwise closed requests.
    2. Untagged L2s that are on the request pages.
    We'd also need to treat items that have been stricken or closed, but not yet archived.
    At the moment I don't see how this can systematically be accomplished with search. Though I doubt we would need such a run every two weeks, it might be useful every quarter or, at least, every year. DCDuring (talk) 18:47, 8 October 2017 (UTC)
@DCDuring For part 1, User:DTLHS/cleanup/request consistency. I don't think 2 is that important since entries on request pages get archived eventually. It's possible that there are false positives if pages are linked unusually on the request pages. DTLHS (talk) 19:27, 8 October 2017 (UTC)
@DTLHS: For 2 I was thinking about those requests that are entered without use of any request template. Today I noticed it when [[academic institution]] was added to RFDE. (The contributor has now added {{rfd}} at my request.) Perhaps what is needed is to discourage addition of new headers on request pages except through the relevant templates. DCDuring (talk) 20:35, 8 October 2017 (UTC)
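Note: the consistency check DCDuring outlines above comes down to a few set differences. Here is a minimal sketch in Python, assuming the three lists of titles have already been extracted from a dump by some other means. The function name, parameter names and sample titles are hypothetical; this is not how User:DTLHS/cleanup/request consistency actually works.

```python
def request_consistency(tagged_entries, open_headings, closed_headings=()):
    """Group the mismatches between tagged entries and request-page headings.

    tagged_entries:  titles of entries carrying a request tag such as {{rfd}}
    open_headings:   titles listed as still-open headings on the request page
    closed_headings: titles whose requests are struck, closed, or archived
    (All three inputs are assumed to be prepared elsewhere.)
    """
    tagged = set(tagged_entries)
    listed = set(open_headings)
    closed = set(closed_headings)

    # 1. Tagged entries with no open heading on the request page.
    tagged_not_listed = tagged - listed
    return {
        "tagged_not_listed": sorted(tagged_not_listed),
        # 1.1 The subset of (1) whose request is already closed or archived.
        "tagged_but_closed": sorted(tagged_not_listed & closed),
        # 2. Open headings whose entry carries no tag.
        "listed_not_tagged": sorted(listed - tagged),
    }


# Made-up titles, except "academic institution", which is the case from the thread.
report = request_consistency(
    tagged_entries=["foo", "bar", "baz"],
    open_headings=["bar", "academic institution"],
    closed_headings=["baz"],
)
```

Keeping closed requests as a separate input lets items that are struck but not yet archived be reported in their own group, so they are not mistaken for entries that were never nominated.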

Classification of forms with -n't

Hello. Rua, Equinox, Erutuon and I have been talking about the classification of don't, can't and other forms with -n't in User talk:TAKASUGI Shinji/2017#Contractions. I think they are verb forms just like did and could, according to Arnold M. Zwicky and Geoffrey K. Pullum (Cliticization vs. Inflection: English n’t, Language 59(3), 1983, pp. 502-513), but not everyone agrees with their analysis. In my opinion, we shouldn't use “Contraction” as a header because it is not a part of speech, and we should replace it with a part of speech we can reasonably assign. What do you guys think? — TAKASUGI Shinji (talk) 10:09, 8 October 2017 (UTC)

Our level 3 headers are for more than just part of speech. Suffix isn't a part of speech either. We have to use "contraction" because for most cases there is no other way to do it. Look at Category:Middle Dutch contractions for example. So that argument is not very compelling.
As for these contractions specifically, I don't see how they can be considered anything else. They aren't considered verb forms in any standard grammar of English. One paper is interesting, but we should follow linguistic consensus on the matter and not the opinion of a single paper. —Rua (mew) 11:44, 8 October 2017 (UTC)
An analysis of well-known linguists and lack of analysis don't have the same value. I find their analysis convincing. You can only say don't you? and not *do not you?, from which we must conclude that don't is not a contraction of do not. — TAKASUGI Shinji (talk) 11:41, 9 October 2017 (UTC)
What do you mean by "standard grammar"? The Cambridge Grammar of the English Language (naturally, because it was co-written by Pullum) uses the inflectional-suffix analysis of -n't, and its auxiliary verb paradigms show negative forms corresponding to each of the finite forms. Certainly the more traditional version of English grammar that I learned as a kid didn't recognize negative inflected forms, but it wasn't particularly linguistically rigorous and shouldn't be the basis for our decisions on Wiktionary. — Eru·tuon 21:52, 9 October 2017 (UTC)
I'm in favor of the analysis in which -n't is an inflectional suffix and forms like don't are verb forms (and I could go on about that), but the essential thing is to at least be consistent. I don't think it's consistent to label -n't as a suffix (as it's been labeled since 2008) and then call forms like won't contractions. A contraction is basically the combination of a full word plus one or more clitics that are derived from orthographic words, but are not spelled as words in this case. So for won't to be a contraction, -n't has to be a clitic (a variant form of not). The other option is for -n't to be a suffix and won't a verb form. We need to pick an analysis and stick to it. It would be fine to include usage notes explaining the alternative analysis, or alternative inflection tables, or categories, but the headers and headword templates should stick to a single analysis. — Eru·tuon 19:20, 8 October 2017 (UTC)

A week has passed, and there has been one negative vote. I assume the classification of -n't according to their paper is acceptable. — TAKASUGI Shinji (talk) 12:54, 18 October 2017 (UTC)

Any idea for a new "Thesaurus:" shortcut?

WS:good → Thesaurus:good still works, as it should.

But "WS:" does not make a lot of sense anymore, because now "Wikisaurus" is called "Thesaurus".

Then again, "TS:good" and "TH:good" are unavailable, because they are language codes. Is there a good shortcut available? If not, I guess we'll have to keep using only "WS". --Daniel Carrero (talk) 18:37, 9 October 2017 (UTC)

THES seems the obvious choice. Equinox 18:44, 9 October 2017 (UTC)
Alright, I guess. I'm not entirely happy with a mere reduction from 9 to 4 letters, but maybe that's the best option we have.
Maybe THE would be better ("THE:good" → Thesaurus:good), but "the" is the ISO code for Chitwania Tharu (w:Tharu languages). Can't we use it anyway? --Daniel Carrero (talk) 14:31, 11 October 2017 (UTC)
But if we can't use ISO codes then we can't use any three-letter code, even if ISO hasn't used it yet. It should be considered reserved for future ISO use. Equinox 15:07, 11 October 2017 (UTC)
That may be true, but we have violated that rule before. We have "cat" and "mod" as working aliases. See CAT:English nouns and MOD:sandbox. "cat" means Catalan, which seems unlikely to be used by Wikimedia because they have settled for https://ca.wiktionary.org/ and https://ca.wikipedia.org/ (using "ca", not "cat"). "mod" is Mobilian Jargon language. --Daniel Carrero (talk) 15:22, 11 October 2017 (UTC)
To be clear, I would support using "the". ("THE:good" → Thesaurus:good) --Daniel Carrero (talk) 15:24, 11 October 2017 (UTC)
I would prefer "THS" which is the language code for the w:Thakali language, a Nepali Sino-Tibetan language with 5,900 native speakers. Chuck Entz (talk) 02:04, 12 October 2017 (UTC)
SYN. —suzukaze (tc) 02:28, 12 October 2017 (UTC)
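For anyone weighing further candidates, here is a minimal sketch of the collision check being done by hand in this thread. The code table holds only the codes named above (plus the standard names for "ts" and "th"); a real check would need the full language-code list, and the function name `check_shortcut` is hypothetical.

```python
# Language codes mentioned in this thread. This is NOT a complete list;
# a real check would load every code that Wiktionary recognizes.
KNOWN_LANGUAGE_CODES = {
    "ts": "Tsonga",
    "th": "Thai",
    "the": "Chitwania Tharu",
    "ths": "Thakali",
    "cat": "Catalan",
    "mod": "Mobilian Jargon",
}


def check_shortcut(candidate):
    """Report whether a proposed namespace shortcut clashes with a language code."""
    code = candidate.lower()
    if code in KNOWN_LANGUAGE_CODES:
        return f"{candidate}: collides with the code for {KNOWN_LANGUAGE_CODES[code]}"
    if len(code) == 3 and code.isalpha():
        # Equinox's point: any three-letter string sits in the ISO code space.
        return f"{candidate}: three letters, so it is in the ISO three-letter code space"
    return f"{candidate}: no collision with the codes listed here"


for proposal in ["TS", "TH", "THE", "THS", "THES"]:
    print(check_shortcut(proposal))
```

Under these assumptions only "THES" comes back clean, which matches the trade-off discussed above: the shorter options all clash with existing codes.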

Linking active policy proposals

WT:EL should probably link to WT:FORMS in some fashion. I imagine there are also other cases like these, where EL is a dead-end and the actual documentation is hidden away in some obscure undocumented location.

Some might protest that the former is policy while the latter are often drafts, but as long as this is indicated, I do not see any problem in linking. Should we maybe settle on some specific more mildly worded section hatnote, such as "Read more:" (instead of "Main article:" or the like)?

Interestingly, WT:Policies and guidelines, despite being prominently linked from the policy headers ({{policy}}, {{policy-TT}}, {{policy-DP}}), is currently categorized as "inactive". There's Category:Wiktionary think tank policies, but it's not especially user-friendly. --Tropylium (talk) 14:54, 11 October 2017 (UTC)

New section "Synchronic analysis" in WT:EL

w:en:Synchrony and diachrony

It isn't useful to have only a historical analysis (the current "Etymology" section at en.wiktionary) or only a modern analysis.

Example: атония d1g (talk) 08:51, 12 October 2017 (UTC)

We include this in etymology, but the usual wording is "equivalent to". —Rua (mew) 13:36, 12 October 2017 (UTC)

Linking to Wikimedia Commons categories

Hello, I would like to know why Wiktionary entries are not linked to Wikimedia Commons categories (by using statements at Wikidata). For example, the entry Varvel could be connected to commons:Category:Vervels. It can only help readers to (visually) learn more about that particular word. Fructibus (talk) 09:45, 12 October 2017 (UTC)

@Fructibus: Have you seen Wiktionary:Wikidata? We do in fact have some links to that sister project via local templates. E.g. tea. —Justin (koavf)TCM 09:50, 12 October 2017 (UTC)
@Koavf: Thanks a lot! By the way, was there any discussion about including the wiktionary pages into Wikidata, connecting with the Wikipedia/Commons pages? Then the Commons link would show automatically for the Wiktionary pages, in all languages. At this moment, if you want to link to Commons in all language articles, that means you have to edit 67 Wiktionary pages. Fructibus (talk) 19:05, 12 October 2017 (UTC)
@fructibus: "Was there any discussion about including the wiktionary pages into Wikidata" Oh yes, quite a bit. And there are currently options to include Wiktionary entries in Wikidata but I don't feel like I can do a good job of summarizing all of that. You may wish to see the equivalent page here: d:Wikidata:Wiktionary. I 100% agree that we should use Wikidata to make sister links--you may wish to talk with User:Rua, formerly User:CodeCat (I had forgotten he [she?] was renamed for some reason) about that. —Justin (koavf)TCM 19:16, 12 October 2017 (UTC)
People have expressed dislike for Wikidata IDs, so we probably won't be using Wikidata for anything after all. I tried. —Rua (mew) 19:45, 12 October 2017 (UTC)
It will happen, it's just that at the moment the advantages aren't completely obvious. – Jberkel (talk) 20:56, 12 October 2017 (UTC)
@Jberkel: Isn't this one of them? —Justin (koavf)TCM 22:26, 12 October 2017 (UTC)
The page tea has already {{wikidata|Q6097}}. Changing to a template like {{sister links|Q6097}} could fetch all sister project links with automatic update of new links, deleted ones or renamed ones. The problem is that a word may have multiple senses that can be connected to multiple equivalent pages on Wikidata. --Vriullop (talk) 08:22, 13 October 2017 (UTC)
@Koavf: I'm all for Wikidata, it's just that to some editors the advantages are less clear at the moment. @Vriullop: yes that would be great, via Wikidata one should be able to fetch all the other relationships. Couldn't {{senseid}} (or something similar) be used for fine-grained associations? – Jberkel (talk) 14:11, 13 October 2017 (UTC)

@Jberkel - @Koavf - @Vriullop - @Rua - Sorry, I am new to Wiktionary but I really don't see the reason for not linking the Wiktionary definitions in Wikidata. For example the Wikipedia article Water has a link to the Wiktionary definition, at the bottom of the article. Why not show it in the middle-left side of the page, near the other sister project links (Commons, Wikibooks, Wikiquote)? This way all the 220 Wikipedia articles can show the link to the Wiktionary definition in their respective language (if it exists), without the need to actually edit the 220 Wikipedia articles. Fructibus (talk) 18:49, 13 October 2017 (UTC)

@Fructibus: I agree as well but there were concerns that it's too difficult, impossible, or possible-but-difficult and not actually helpful. I disagree with the latter two but it's definitely an undertaking to be sure. Then again, so is everything. —Justin (koavf)TCM 19:01, 13 October 2017 (UTC)
@Koavf: Very nice answer, gives a feeling of touching a perfection in language, thanks :) - Fructibus (talk) 23:39, 13 October 2017 (UTC)
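To make the {{sister links|Q6097}} idea above a little more concrete, here is a rough sketch of turning an item's sitelinks into sister-project URLs. It assumes the sitelinks arrive as a mapping from site IDs such as "enwiki" or "commonswiki" to records with a "title" field, which is how Wikidata's entity data is commonly shaped. The function name, the suffix table, and the sample titles are illustrative only, and the multiple-senses problem raised by Vriullop is not addressed.

```python
from urllib.parse import quote

# Site-ID suffix -> (label, domain). Only a few projects, for illustration.
PROJECTS = {
    "wiki": ("wikipedia", "wikipedia.org"),
    "wikiquote": ("wikiquote", "wikiquote.org"),
    "wikibooks": ("wikibooks", "wikibooks.org"),
}


def sister_links(sitelinks, lang="en"):
    """Build sister-project URLs for one language from a sitelinks mapping."""
    links = {}
    for site, info in sitelinks.items():
        # Keep ":" and "/" readable in titles such as "Category:Tea".
        title = quote(info["title"].replace(" ", "_"), safe="/:")
        if site == "commonswiki":
            links["commons"] = f"https://commons.wikimedia.org/wiki/{title}"
            continue
        for suffix, (label, domain) in PROJECTS.items():
            if site == lang + suffix:
                links[label] = f"https://{lang}.{domain}/wiki/{title}"
    return links


# Hypothetical sitelinks for illustration; not copied from the real item.
sample = {
    "enwiki": {"title": "Tea"},
    "frwiki": {"title": "Thé"},
    "commonswiki": {"title": "Category:Tea"},
}
print(sister_links(sample))
```

A template backed by something like this would pick up new, renamed, or deleted sister pages without anyone editing the entry, which is the advantage described above.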

Ōbaku tō-on/sō-on readings

Found this video: Heart Sutra chanted by Ōbaku monks; is the ruby a Chinese pronunciation or as Wikipedia states: tō-on/sō-on readings? Here's a supporting resource. Domo, --POKéTalker (talk) 04:49, 13 October 2017 (UTC)

Personally it sounds suspiciously(?) too much like accented Mandarin ( () (ji)?  () (e)?), possibly dated ( (けん) (ken)), but I also don't know what I'm talking about. Maybe tō-on is Mandarin. —suzukaze (tc) 05:11, 13 October 2017 (UTC)

For reference, comparison of Japanese, Ōbaku reading (sō-on?) and standard Chinese:

 (かん) () (ざい) () (さつ) (ぎょう) (じん) (はん) (にゃ) () () (みっ) () ()
Kanjizai Bosatsu gyō jin hannya haramitta ji
Avalokitesvara Bodhisattva was practicing deep prajnaparamita when...
 (クヮン) () (サイ) () () (ヘン) (シン) () () () () () () ()
K(w)antsusai Pusa hen shin poze poromito su
Avalokitesvara Bodhisattva was practicing deep prajnaparamita when...
觀自在菩薩行深般若波羅蜜多時 [MSC, trad.]
观自在菩萨行深般若波罗蜜多时 [MSC, simp.]
Guānzìzài Púsà xíng shēn bānruò bōluómìduō shí [Pinyin]
Avalokitesvara Bodhisattva was practicing deep prajnaparamita when...

Though there is probably no clear romanization to the monks chanting, should kanji with these Ōbaku on-readings be provided as sō-on? Just wondering. --POKéTalker (talk) 02:21, 14 October 2017 (UTC)

(The automatic pinyin generated by zh-usex is not correct because it uses the most common readings. [1] has pinyin transcription that seems to be OK —suzukaze (tc) 02:36, 14 October 2017 (UTC))
  • @POKéTalker, to confirm / clarify -- it sounds like you're asking if there is value in adding sōon readings to the individual kanji entries. If that's your proposal, I have no particular opposition, so long as the readings are clearly labeled as sōon (provided that's the correct reading category). ‑‑ Eiríkr Útlendi │Tala við mig 04:36, 14 October 2017 (UTC)
I think POKéTalker wants to make sure that they are indeed tou'on first. —suzukaze (tc) 20:48, 14 October 2017 (UTC)

TabbedLanguages default and English links in definitions

Yesterday, following Wiktionary:Beer parlour/2017/July#TabbedLanguages edit: default to English for unmarked links, I made a change to MediaWiki:Gadget-TabbedLanguages.js so that the default language would always be English, if no language is specified. This means that it's no longer necessary to use {{l|en|...}} in definitions. I'd like to ask the people who do this to use regular links from now on. —Rua (mew) 15:40, 14 October 2017 (UTC)

But not everyone uses tabbed languages. DTLHS (talk) 15:45, 14 October 2017 (UTC)
I agree that this would be a sensible default for those without it, too. But then we'd need a separate gadget. —Rua (mew) 15:46, 14 October 2017 (UTC)
If we made a separate gadget it could be smarter, such as linking derived terms to the correct language, while linking terms in definitions to English. DTLHS (talk) 15:53, 14 October 2017 (UTC)
Also, what happened to the plan to make TL the default? We had a vote and everything. —Rua (mew) 15:51, 14 October 2017 (UTC)
Any way to undo this behavior for searches and search results? Having them always go to English is pretty annoying when you’re working on some other language. — Vorziblix (talk · contribs) 01:48, 18 October 2017 (UTC)
Where else should they go? —Rua (mew) 10:49, 18 October 2017 (UTC)
Ideally to the last language visited, as they did before. — Vorziblix (talk · contribs) 23:15, 18 October 2017 (UTC)

Singapore terms

Just a heads up: a while ago, a Singapore schoolteacher encouraged his students to add Singaporean English terms to Wiktionary (which is, on the whole, a good thing). We seem to have a new batch of these happening at the moment, e.g. bus captain, taxi uncle. So be ready for some cleanup. Equinox 08:23, 16 October 2017 (UTC)

Special:Contributions/86.30.235.176

They've made some drastic changes to pronunciation which might not be correct. Anyone who knows Old English, do you mind taking a look? --Robbie SWE (talk) 18:12, 16 October 2017 (UTC)

@Robbie SWE: It looks like, as far as Old English pronunciation goes, they're changing sequences of /h/ and a sonorant to sonorant and voicelessness diacritic (for instance, /hr/ to /r̥/). That might be correct in a pseudo-phonetic transcription, but I don't know if it is an accepted phonological analysis. — Eru·tuon 21:46, 16 October 2017 (UTC)

Translating both ways

Hello!

When I started working on a project in which I would like to use translations from Wiktionary, I noticed that Wiktionary translations are created separately for each language. That means that even if the English Wiktionary contains the translation of a word into another language, e.g. Mandarin, the Wiktionary in that language will not necessarily have a translation of that word into English.


One example:

library

- the list of translations contains the translation 图书馆

zh:图书馆

- the Chinese Wiktionary page does not have a translation for that word into English (the site contains: 英语(English):[[]])

Since these translations are symmetric, it should be possible to add a large number of translations to these Wiktionaries with much less effort. However, there will surely be a few issues that have to be resolved first.


TheDaveRoss has already replied to me by email, stating some issues:

"1. There are numerous Wiktionaries, each one maintained by a distinct community of volunteers. Each has its own policies regarding what may or may not be included, how translations are to be added, etc. It is very important that you coordinate with the local community wherever you add content to ensure that the content meets their criteria.

2. Translations are very nuanced (as you are probably aware). Automated addition of translation has happened at small scales in the past, however close oversight by a person familiar with both languages is required. Even translations which appear to be symmetrical may require special annotation in the target language which is not included in the original language.

3. The source material may not be correct, and automation can propagate errors. The English Wiktionary, and a few other large Wiktionaries, have enough contributors that many errors are caught quickly. That is not the case for the majority of other languages, so it is important to ensure any additions to other languages are correct."

4. Attribution to the original contributor will be important.

For example, adding the new words as proposed translations first and then checking them for correctness would reduce the risk of wrong translations while still adding some value right away.


What do you think about writing a script to do this? What other problems are there with this? Do you know about previous attempts to do this? I hope this could be very useful!

Noahho (talk) 01:21, 17 October 2017 (UTC)

@Noahho: Something similar to two-way translation could work if we can agree on how we will use Wikidata and how it will be connected across Wiktionaries. Unfortunately, how that would work is very difficult to determine. —Justin (koavf)TCM 02:13, 17 October 2017 (UTC)
@Noahho Hello Noah. Can I please know what project you are working on? If the aim of the project is to extract translations of foreign-language terms into English from Wiktionary, it would be much easier to extract from the pages on English Wiktionary; e.g. for simplified 图书馆 it would be at 圖書館, which says "library". Wyang (talk) 02:17, 17 October 2017 (UTC)
This is an age-old problem. I believe that, somewhere, there is a unified Wiktionary that is not dependent on a "home" language. But I forget what it is called, or what state it is in. SemperBlotto (talk) 05:37, 18 October 2017 (UTC)
@SemperBlotto: omegawiki:? —Justin (koavf)TCM 09:03, 18 October 2017 (UTC)
I will also mention one previous attempt to do this locally was User:Tbot. The person who created that script has passed away, however there is some amount of documentation of his efforts in that user space. - TheDaveRoss 13:57, 18 October 2017 (UTC)
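As a rough illustration of the reverse-translation idea in this thread, here is a minimal sketch that inverts translation tables into unchecked proposals rather than finished entries. It assumes the translations have already been extracted into a plain mapping; the function name, the data layout, and the "cmn" code used for Mandarin in the sample are choices made for this example. Every proposal keeps a pointer to its source entry (for attribution) and is flagged as unchecked (for human review).

```python
from collections import defaultdict


def propose_reverse_translations(entries, source_wiki="en.wiktionary"):
    """Invert {english_word: {lang: [terms]}} into unchecked reverse proposals.

    Returns {(lang, term): [proposal, ...]} where each proposal records the
    English word, a 'checked' flag (always False here), and the source entry.
    """
    proposals = defaultdict(list)
    for english_word, by_lang in entries.items():
        for lang, terms in by_lang.items():
            for term in terms:
                proposals[(lang, term)].append({
                    "english": english_word,
                    "checked": False,  # a human reviewer must confirm this
                    "source": f"{source_wiki}:{english_word}",  # attribution
                })
    return dict(proposals)


# The example from this thread: "library" lists 图书馆 as a translation.
sample = {"library": {"cmn": ["图书馆"]}}
print(propose_reverse_translations(sample))
```

Nothing here writes to any wiki. How proposals would be reviewed and added is up to each local community, as the first issue quoted above points out.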

Turkish vs Ottoman Turkish

The Balkan language loanwords from Turkish should technically be Ottoman Turkish, since that's the era they entered those languages, right? Is the only main difference the script being Arabic vs. Latin? I realized I need to go back and change a bunch of Romanian and some other entries. Word dewd544 (talk) 16:12, 17 October 2017 (UTC)

Yes, they should generally be Ottoman Turkish. The script is one significant difference, but if I’m not mistaken there’s also a huge difference in lexicon, where a large portion of the Ottoman Turkish lexis consists of loanwords from Persian and Arabic that were later stamped out of usage and replaced with neologisms by Atatürk. — Vorziblix (talk · contribs) 21:40, 17 October 2017 (UTC)
There are also grammatical differences. That said, I would personally prefer to treat them as a single language, and I don't think we lose much by claiming Balkan loanwords are from Turkish rather than Ottoman Turkish when the word in question is itself the same. —Μετάknowledgediscuss/deeds 21:44, 17 October 2017 (UTC)
I’m indifferent to merging them, not being knowledgeable enough on the subject, but the split does seem to be mostly a relic of sticking to ISO codes; input from editors experienced with Turkish could be helpful. — Vorziblix (talk · contribs) 06:40, 19 October 2017 (UTC)
Isn’t it more true to the soothfast happenings in the Ottoman Era to describe the Ottoman Turkish as an acrolect of Turkish which the elite prioritized while basically we have had Turkish all the time? It would be awkward to say that we had Turkish once and then, by some peculiar developments in constitutional history, Ottoman Turkish, and then because Atatürk said so Turkish has smitten Ottoman Turkish. Rather there has been one basic Turkish from which the Balkan languages also borrowed rather than from the language we see as Ottoman literary inheritance today, though of course there can be learned borrowings from the literary language as well, though in the case this is largely unlikely because of mostly late literary culture in the Balkan countries and literary culture in the Slavia as well as in Greece (I don’t know about Romania and Albania) also prohibits itself to borrow, as compared to other literary cultures. So in the context of Balkan languages, the Turkish they have been in contact with was coexistent Turkish rather than typical Ottoman Turkish. We are just inveigled to assume that one literary language has borrowed from the other literary language even when the spoken language has borrowed, because for older times we know about the spoken language from its appearance in writing. It is an image of things we have long surpassed in Romance studies, like acknowledging that Spanish has borrowed from colloquial Arabic rather than from literary Arabic. Palaestrator verborum (loquier) 20:51, 18 October 2017 (UTC)
Unfortunately it’s not really clear what Wiktionary means by ‘Ottoman Turkish’ — just the literary acrolect or the language in general during a given time period. Previous discussions don’t seem to have reached a conclusion. Some of the comments made there could be relevant to the issue of how to treat Ottoman Turkish, though. — Vorziblix (talk · contribs) 06:40, 19 October 2017 (UTC)

Listing Translations by Language

It seems to me that quality control of translations is harder than it should be: those of us who patrol new edits can't be knowledgeable in anywhere near all of the languages, and those with expertise in the languages in question are less likely to be spending their time browsing through English entries. {{t-check}} is helpful, but not used in most languages.

Does everybody think it would be a good idea to create a listing of translations in each language, along with the entry they're in? I would envision it as a listing in the language's alphabetical order, with the {{t}} template converted to an {{l}} template and followed by the name of the entry:

This would make it easy for an expert in a given language to scan through all the translations in that language without browsing a bunch of English entries. It also might make the redlink categories and the overhead that goes into creating them unnecessary.

I'm bringing it up here because it would be a major undertaking involving massive processing of the dumps, so I want to make sure it's a good idea before asking anyone to do it. Perhaps it could be started with some of the smaller LDL languages as a test. Chuck Entz (talk) 14:19, 18 October 2017 (UTC)

Matthias Buchmeier maintains lists similar to what you describe. — Ungoliant (falai) 14:59, 18 October 2017 (UTC)
I can share the assumption that it would make it more possible to make Wiktionary serve as bilingual dictionary in relation to single languages, as for now one cannot directly work in Wiktionary to make it intentionally a bilingual dictionary for any language because one does not see what is already there, i.e. one can only add to quantity, but only by serendipity to quality. But the requirement would be that such lists are live dumps which get refreshed as soon as edited, instantly by Javascript or at least after refetching the page. Because it is a core part of motivation for editing to see the results published instantly, that’s why the web is there. Palaestrator verborum (talk) 16:41, 18 October 2017 (UTC)
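A small sketch of the per-language listing proposed at the top of this section may help in judging the effort. It assumes the wikitext of each entry is already available as a string (reading the dumps is not shown), handles only {{t}} and {{t+}} with the language code and term as the first two parameters, and sorts by plain code-point order as a stand-in for each language's own alphabetical order. The function names and sample wikitext are made up for the example.

```python
import re
from collections import defaultdict

# Matches {{t|lang|term...}} and {{t+|lang|term...}}; other variants are ignored.
T_TEMPLATE = re.compile(r"\{\{t\+?\|([^|{}]+)\|([^|{}]+)")


def translations_by_language(pages):
    """pages: {entry_title: wikitext}. Returns {lang: sorted [(term, entry)]}."""
    listing = defaultdict(list)
    for title, wikitext in pages.items():
        for lang, term in T_TEMPLATE.findall(wikitext):
            listing[lang].append((term.strip(), title))
    return {lang: sorted(pairs) for lang, pairs in listing.items()}


def format_listing(lang, pairs):
    """Render each pair as an {{l}} link followed by the entry name."""
    return [f"{{{{l|{lang}|{term}}}}} ({entry})" for term, entry in pairs]


# Illustrative wikitext fragments, not copied from the real entries.
pages = {
    "library": "* French: {{t+|fr|bibliothèque|f}}\n* Mandarin: {{t|cmn|圖書館}}",
    "book": "* French: {{t+|fr|livre|m}}",
}
listing = translations_by_language(pages)
for line in format_listing("fr", listing["fr"]):
    print(line)
```

This is a one-off batch pass; keeping such lists refreshed on every edit, as requested just above, would need a different mechanism.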

Word Frequencies in Wiktionary

We have just finished removing the {{rank}} information based on some old, problematic parsing of the Project Gutenberg corpus. We still have a few appendices which record that information. My main objection to the inclusion of that data was that it was flawed, and outdated. But I don't roundly object to having word frequency information which is accurate. To that end I have a few questions.

  1. Should we include any frequency data in any manner?
  2. If so, should that data be represented within word entries in some way?
  3. Which corpora should be used, or which frequency lists?
  4. Should original research (of the type used for the old data) be allowed?

One starting place for English (and a few other languages) can be found at the BYU corpus page. It is probably best to avoid getting too deeply into the weeds here, but rather if it seems like there is a general consensus around what should be included we can spin off a project page and figure out all of the details. - TheDaveRoss 17:53, 18 October 2017 (UTC)

We should certainly not add frequency data to word entries because the data is doubtless interesting in a list, but too unsure and thus and because nobody wants to know which words have the nearest frequency if he looks up a word – which is a totally random result of sundry capacities of a language and without fruit for erudition – and because it would suck off endeavors to more instructive content creation not worthwhile enough to maintain in the main namespace. And as there is more instructive content to be created by the same endeavors, I also opine that for the collection of frequency data it should be waited until copyright law has been abolished by revolutions in the world and thus representative and illuminatingly separable corpora can be collected. At that time we would only struggle with the technical recognition of what a word be, not of what sources we digest, which together are multiplying error factors. Palaestrator verborum (loquier) 18:25, 18 October 2017 (UTC)
Wut? Sorry, what I meant there was: one, I have no idea what the abolition of copyright law has to do with the inclusion of word frequency on Wiktionary and, two, I disagree with the notion that this would prevent some other work from being done. - TheDaveRoss 18:33, 18 October 2017 (UTC)
The notion is that there are opportunity costs in collecting telling corpora. I don’t think that one could be content with subtitle databases, as these are slanted to Hollywood and mass productions and their fantasy worlds instead of the whole language that we mean when we talk about the language; and actually those current subtitle collections and most other corpora are non-free either. The web-based corpora which are represented on the BYU corpus site have of course their own problems, with the deep web and the dark web and resources being varyingly crawlable. If we want recent and representative data, we can only go illegal by grabbing Library Genesis, accessing journal and newspaper databases via black channels like Sci-Hub does and things like that, perhaps mixed with subtitles, i.e. things that we cannot perform by the means of the Wikimedia Foundation without endangering it. If there would be no copyright law, there would be a large database of works of all kinds which would constitute good (i.e. the technologically and humanly, not legally best possible) and fast data. This is of course a high standard from which I esteem corpus data valuable, and possibly the view of a philosopher against the practical mind of a programmer. But one can set the doubts even higher by asking oneself how to offset different data sources, like how commensurable web data and journal data and parliamentary debates are, even if one has access to all humanly possible corpora, and if one needs to be the man to have correct information about word frequencies. 
Others could be pleased to see lesser corpus data, but I think that the assumption cannot be rejected out of hand that this is not worth it if there are so many doubts about whether this data represent the actual distribution in language (by my common sense, I often wonder about words not being found at all in large frequency dictionaries) – aside from such data not being valuable maintained in individual word articles, which question is subject to entirely different evaluation criteria, because the intention of a reader opening a word’s entry is different from the intention of a reader opening a word list. Palaestrator verborum (loquier) 19:25, 18 October 2017 (UTC)
It's about time we asked the question, how can words be real if our eyes aren't real? DTLHS (talk) 19:34, 18 October 2017 (UTC)
@DTLHS: I'm ded. But it should be "How Can Words Be Real If Our Eyes Aren't Real". —Aryaman (मुझसे बात करो) 20:40, 18 October 2017 (UTC)
I have not asked this question, they are valuable abstractions from language – we can explain and describe them –, but you cannot just cast “the language” into a measuring beaker to know an objective distribution of its constituent parts (which is also a mereological problem), as the language you know about is always constructed to some degree as necessitated by material constraints. What we want, in laying out a frequency list which could praise itself of utilizing the methods fit for the object, is to be at least as exact as possible about it, but that is by far not legal. Palaestrator verborum (loquier) 19:45, 18 October 2017 (UTC)
Palaestrator, your writing style is unnatural and unnecessarily loquacious. I don't know why you are doing it, but I want you to know that your arguments will be taken more seriously if you try to express them clearly and succinctly, rather than in a way that just makes us all think that you're trying to show off. (And as a side note, your understanding of how corpora work in relation to copyright law seems to be flawed, so you might want to try reading up on that first.) —Μετάknowledgediscuss/deeds 19:53, 18 October 2017 (UTC)
It is easy with the law in this case: If the corpus collection is legal (i.e for example the Wikipedia corpus, accessing it is legal), accessing it is legal; if the data collection is illegal, or the manner of accessing it is (i.e. for example a university’s access being used beyond its license, as Sci-Hub does), that is already legally contentious (it is disputed in many jurisdictions, as the United States of America and the Federal Republic of Germany, if just streaming content published in breach of copyright law is violating it, also dependent of dolus), and one should not lay the hands in fire for the collected fruits of such automated accessing, especially if it is commercially exploited as allowed by the licenses used on Wiktionary.
I can’t show off with my writing style, being unnatural in normal people’s view is my expressions’ very nature, or sounding like a 19th century novel (wherefore being natural though if the matter dealt with is a complicated matter of human culture, as language is? I don’t know why people recall nature when we can surpass it.). And it is not loquacious, I already write off parts of it. Besides, the point of reading it could lie in it saving minds from futile pursuits. Like others, I don’t talk if I prognosticate that my verbosity does not pay proportionately. What is the prospect though of how much work hours the creation and maintenance of those lists take? Palaestrator verborum (loquier) 20:32, 18 October 2017 (UTC)
  • It would be nice to have word frequency information available, but there is the serious PoS/Etymology problem (eg, dyke or dike). I am skeptical of both the heavily annotated corpora (which differentiate [or try to] by PoS, but are generally small) or the large corpora (which do not usually make accurate PoS determinations). That said Google N-Grams and the BYU corpora would be fairly useful, though I have not investigated the terms of use for their frequency data. It doesn't seem particularly useful in an entry. It would be very useful to have some kind of quick indication as to what frequency class a given word used in a definiens was in (eg, top 10K, next 40K, next 200K, perhaps next 750K). As an appendix such lists might make it easier for a contributor to check the understandability of a definition. DCDuring (talk) 21:02, 18 October 2017 (UTC)
    Google N-Grams makes possible Reference links like this frequency comparison of canvas and canvass as verb and noun. To me that seems useful to contributors and to passive users. DCDuring (talk) 21:08, 18 October 2017 (UTC)
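The frequency-class idea above (top 10K, next 40K, next 200K, perhaps next 750K) could be sketched roughly as follows. It assumes a ranked word list is available from some corpus; the "rarer" and "unranked" labels and the sample ranks are inventions for the example, not real frequency data.

```python
from bisect import bisect_left

# Cumulative rank boundaries for: top 10K, next 40K, next 200K, next 750K.
CLASS_BOUNDS = [10_000, 50_000, 250_000, 1_000_000]
CLASS_LABELS = ["top 10K", "next 40K", "next 200K", "next 750K", "rarer"]


def frequency_class(rank):
    """Map a 1-based frequency rank to one of the classes above."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return CLASS_LABELS[bisect_left(CLASS_BOUNDS, rank)]


def classify_definition(definition, ranks):
    """Label each word of a definition; words missing from ranks are 'unranked'."""
    return {
        word: frequency_class(ranks[word]) if word in ranks else "unranked"
        for word in definition.lower().split()
    }


# Made-up ranks purely for illustration.
ranks = {"a": 5, "large": 300, "aquatic": 12_000, "mammal": 9_000}
print(classify_definition("A large aquatic mammal", ranks))
```

A contributor could run something like this over a definiens to spot words outside the most common classes, which is the understandability check suggested above.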