Wiktionary:Beer parlour/2017/October

October LexiSession: punishment

The monthly suggested collective theme is punishment. Not so funny, but the 10th of October is the World Day Against the Death Penalty so we may look at the alternatives and do better descriptions around this theme.

LexiSession is a collaborative experiment without any guide or direction. You're free to participate however you like and to suggest next month's topic. If you participate, please let us know here or on Meta, so we can keep track of the evolution of LexiSession. In one year, 35+ people have participated! I hope there will be some people interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every project at the same time, to give us more insight into the ways our colleagues work in the other projects.

See you soon Noé 09:28, 1 October 2017 (UTC)[reply]

slow slicing and poena cullei are my requests for this month (entries for these, not as punishment...although, they could be useful deterrents for trolls...) --P5Nd2 (talk) 09:39, 8 October 2017 (UTC)[reply]

Thank you for your participation!

Noé 08:58, 2 November 2017 (UTC)[reply]

Special:Contributions/98.113.14.63

Adding translations to too many unrelated languages. No idea where they get transliterations for Chinese dialects, such as Jin, Gan, Xiang, etc. --Anatoli T. ^{(обсудить/вклад)} 10:51, 1 October 2017 (UTC)[reply]

How the heck did they find Prakrit translations? They must be going through the entries we have already. —Aryaman ^{(मुझसे बात करो)} 20:44, 1 October 2017 (UTC)[reply]

French Wiktionary September news

Hello!

Hey! September issue of Wiktionary Actualités just came out in English!

In this issue: Comments about press articles, our information desk is not like yours, a description of a dictionary of short-text signs, a comment on the expression of gender in an Andean language, some cool videos about words (in French and English!), announcements for the Wikiconférence francophone in October and plenty of statistics with fancy fleurons surrounding it all!

As usual, it is translated in English by non-native speakers, in less than a day, and it is not perfect, but it can be improved by readers (wiki-spirit). We did not receive any money for this publication and we are not supported by any user group or chapter. It is only written by the community, for the large community of lexicolovers! I hope you did not feel harassed by this notice Noé 21:41, 1 October 2017 (UTC)[reply]

PulauKakatua19 (talk • contribs) again

This user is adding spurious Hittite entries, with some really bad/outdated etymologies. No references either. They have been warned many times for Russian, Hindi, and Rohingya edits. I suggest a week-long block. —Aryaman ^{(मुझसे बात करो)} 14:51, 3 October 2017 (UTC)[reply]

Etymological information for strong verb non-lemma forms

There are many forms, e.g. in English, where arbitrary, irregular, strong forms of verbs deserve their own etymology. Many of these individual forms have received particular attention from linguists over the decades, e.g. did (past tense of do, from a unique, non-past reduplicated root form of the ancestor of do dating back to Pre-Germanic, for unknown reasons), sang (past tense of sing derived directly from a Proto-Indo-European form of the ancestor of sing). These non-lemma forms have their own independent etymological lineage that can be traced back thousands of years.

A certain administrator (Rua) has informed me that it is policy on Wiktionary to minimize etymological information on non-lemma forms, and instead place such information in the lemma form's etymology section. This can be understood for weak forms like walked, but those forms need little explanation because they are formed regularly, and for the forms that do require extra explanation, it makes for unsightly etymology pages on the lemma form's etymology sections (see Proto-Germanic *dōną's etymology section for the current policy specification; it doesn't even specify the past form *dedǭ, referring to it only as "the past form").

I understand the concern to avoid etymology fragmentation, but in this case, the etymology itself is fragmented and the two forms are remembered as separate, arbitrary, irregular forms. Perhaps there is a solution to maintain the same etymology information in multiple pages, but I think the most simple solution would be to provide etymological information for such forms on their own pages. There is really no reason to avoid this practice and it only makes things more confusing. I am surprised that this is against current policy. Do you agree with this assessment?

128.84.127.223 16:17, 3 October 2017 (UTC)[reply]

Strongly oppose putting etymologies on every inflected form, irregular or not. —Rua (mew) 16:35, 3 October 2017 (UTC)[reply]

Out of curiosity -- where should such etymological information go? Some simple-present verb forms include etymological information for irregular conjugated forms, such as at [[go#Etymology 1]]. Others do not, such as at [[do]], which includes no explanation for the formation of [[did]]. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:29, 3 October 2017 (UTC)[reply]
- At the lemma entry, where we currently already place them. The IP is arguing that we should put etymologies on nonlemma entries too, which is going to lead to a huge duplicative mess. —Rua (mew) 17:47, 3 October 2017 (UTC)[reply]

I am proposing a move of the notable etymologies from the lemma to the non-lemma forms, if they are notable as in strong verbs. There is no duplication going on, only a move, as I indicated in the OP. 128.84.127.223 17:58, 3 October 2017 (UTC)[reply]

Then I still oppose it, the etymological information should be centralised on the lemma form. That's how all etymological dictionaries work, that's how we've worked so far too. Our users are accustomed to follow the link to the lemma for information, which is the purpose of non-lemma entries in the first place. They're there to help get users to the right place, nothing more. We should not scatter our information across various non-lemmas. —Rua (mew) 18:34, 3 October 2017 (UTC)[reply]

Traditional etymological dictionaries are constrained by the space in a book and give priority to lemma forms because they are the most popular. There is no real reason to ignore non-lemma forms or centralize their etymologies because Wiktionary doesn't have a size constraint, especially for adding reasonable information. I disagree that the only purpose of non-lemma forms is to provide a link to a lemma form; many non-lemma forms have lineages in their own right and there is no reason to marginalize them. Furthermore, users are not accustomed to follow the link to the lemma forms as you suggest; precedents for separate etymologies for non-lemma forms like done, is, are, am, etc. already exist and have existed for a long time. 128.84.127.223 18:37, 3 October 2017 (UTC)[reply]

Size constraint isn't the issue. It's keeping our information organised so that information can be found easily. And what I said is the agreed-upon purpose of non-lemma forms. It's why we don't include things such as derived terms, descendants or inflection tables on non-lemmas. Wiktionary is fundamentally lemma-oriented (or lexeme-oriented) rather than word-oriented. If we were word-oriented, we'd also include full definitions on non-lemmas, but thankfully we've been wise enough to not follow that idea. —Rua (mew) 18:42, 3 October 2017 (UTC)[reply]

Semantically and synchronically, what you're saying is correct; non-lemma forms don't require separate definitions. The placeholders used now are adequate. Etymologically and diachronically, it's incorrect. Irregular non-lemma forms are entirely independent of their lemma forms. Wiktionary is a semantically lemma-based dictionary, but that's completely unrelated to etymology. There is good reason for irregular non-lemmas to provide etymologies, and the semantic value of the terms has no bearing on it. 128.84.127.223 18:46, 3 October 2017 (UTC)[reply]

Ok, but as you must understand by now, that's not how Wiktionary works. The etymologies for the individual parts are noted on the lemma. You'll simply have to adapt to this practice. We're not going to change it just because some random user doesn't like it. —Rua (mew) 18:48, 3 October 2017 (UTC)[reply]

You are not arguing against my point, you are arguing your point because "that's the way it's always been done" and based on ad hominem because I'm "some random user". 128.84.127.223 18:54, 3 October 2017 (UTC)[reply]

I am not proposing putting etymologies on "every inflected form", only on the arbitrary forms with their own separate, traceable etymologies, if only to indicate their significance. The regular forms don't require etymologies because they are predictable. E.g. the etymology for the strong non-lemma form of sing, which is sang:

From Old English sang, from Proto-Germanic *sang, from Proto-Indo-European *songʷh-, o-grade past tense of *sengʷh- (“sing, make an incantation”).

Right now, the article sang doesn't indicate any of this lineage at all. As a strong and unpredictable form, lexically, sang is just as prominent as its form which is arbitrarily deemed the lemma form, sing, which independently derives from a different PIE form. There is no reason to treat it as a secondary form etymologically, at least in this case. 128.84.127.223 17:36, 3 October 2017 (UTC)[reply]

That's not any better. Consider how many times we'd have to duplicate the etymology for all 12 of the past tense forms of vera or syngja. The lemma is a natural place for etymologies, since it's a single central entry that covers all inflected forms. —Rua (mew) 17:47, 3 October 2017 (UTC)[reply]

That's not duplication, that's providing the very separate etymologies for very separate forms. If the forms merge at a certain point, then a link can be provided to the form from which they split off to avoid etymology duplication, as is done with borrowed terms. The words is, are, and were, for example, are all forms of be, but does that mean these forms should not provide their own etymologies? Are the etymologies of these forms of less interest and notability than any other term? They are not. 128.84.127.223 17:56, 3 October 2017 (UTC)[reply]

I could be wrong, but I don't think Rua is arguing that the etymologies of conjugated forms are not worthy of inclusion. I believe that she is instead arguing that the etymologies of conjugated forms should go within the etymology of the lemma form, and that the conjugated-form entries should be minimal.

The issue at hand is not whether to include or exclude certain information -- rather, it is about where to include that information. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:09, 3 October 2017 (UTC)[reply]

Right. I am proposing that it should go in the non-lemma form. In fact, what I'm proposing is already standard practice for many notable forms, e.g. done. I want this to be consistent. For verbs like be, it would clutter its etymology section to list all of the etymologies for all of its many suppletive forms is, are, were, was, am, etc. One interested in these etymologies can follow the links to these forms' pages (which are provided in the term head) and view the etymology. More importantly, someone who specifically searches for irregular forms should have immediate access to their etymologies on the same page.

When I go to the page for am (which actually already follows the format that I'm proposing; I don't think anyone would want to move its etymology to the be page), I want to know the etymology of the term. When dealing with etymology, I don't really care if it's a form of any other (in this case, a completely unrelated lemma form), I want immediate access to its own unrelated and notable etymology. I believe this seems fairly reasonable and already has precedent. 128.84.127.223 18:20, 3 October 2017 (UTC)[reply]

The lemma entry is a central place for the term and all of its inflections. Information about am concerns the lemma be, so it should go there. The individual parts of verb paradigms may have separate origins but they don't have separate etymologies because they are inherited as a whole. The verb be in modern English is the same paradigm as the verb been in Middle English. —Rua (mew) 18:38, 3 October 2017 (UTC)[reply]

That's incorrect. Separate forms of a verb are not "inherited as a whole". That doesn't even make any sense. Irregular forms all require individual memorization and passing down. The lineage of was, for example, is entirely separate from be, as are both from am. If what you were saying was true, all Indo-European languages would still preserve the verb paradigms of Proto-Indo-European. They do not. They mix, they match, they innovate, they supply. 128.84.127.223 18:41, 3 October 2017 (UTC)[reply]

But they still form a single verbal paradigm. A question like "what is the past tense of be?" has an answer precisely because paradigms exist. We have chosen to use a single form to stand in for the entire paradigm, the lemma form, for convenience. That's where etymologies also go. —Rua (mew) 18:46, 3 October 2017 (UTC)[reply]

Etymologically, verbal paradigms don't matter for irregular forms. We have chosen the lemma form to stand in for the non-lemma forms semantically, but we have not done so etymologically, because that makes no sense. 128.84.127.223 18:48, 3 October 2017 (UTC)[reply]

We've chosen to do both. I'm sorry if that makes no sense to you, but it is what it is. —Rua (mew) 18:49, 3 October 2017 (UTC)[reply]

Please cite for me to this specific point in Wiktionary policy. I will propose the change through the proper channels. 128.84.127.223 18:53, 3 October 2017 (UTC)[reply]

I found it myself, and lo and behold, you seem to be the one who added this into the "common guidelines" page in the first place. While I agree with most of your additions, exclusivity of etymology to the lemma page is one that does not make any sense. 128.84.127.223 19:06, 3 October 2017 (UTC)[reply]

I am in favor of continuing to not split etymologies, on the grounds of workability: an editor who is interested in adding this type of information should be able to see at a glance if it has been done already, without checking each relevant non-lemma entry separately. On the other hand, I don't see a problem in directing users from non-lemma forms to the lemma, in cases where they need a separate discussion.

Actual suppletion seems like a different case, though. is, are and be have completely unrelated etymologies, and continuing to maintain separate etymology sections for them seems like a good idea (but I'd again be in favor of pointing users from the lemma form to the other entries for further reading). --Tropylium (talk) 20:49, 3 October 2017 (UTC)[reply]

I don't want to split etymologies, except like you said, for terms with suppletive forms and terms with strong forms. For example, the etymology of "did" takes a separate lineage all the way back to Proto-Indo-European that's completely independent from "do"; despite not being a suppletive form, it's a strong form. I don't want to split etymologies for verbs like "walked", only for verbs like "did" and "is/am/are" and "brought". One page should not contain etymologies for different terms if the etymologies are not currently regularly formed. So this would only be an exception that would affect a relative minority of pages. Wouldn't you agree with this? 128.84.127.223 21:47, 3 October 2017 (UTC)[reply]

English has relatively few inflected forms, but it can get pretty complicated when you have forms inflected for gender, number, case, etc. Even in English, am, is and are all go back to inflected forms of the same Proto-Indo-European root. As for strong verbs, I don't think differences in ablaut grade are enough to justify maintaining separate etymologies. We have a recognized system of lemmas and non-lemmas, but I'm not sure how you could decide which form to make the "etymology lemma" for forms sharing an etymology. Chuck Entz (talk) 02:15, 4 October 2017 (UTC)[reply]

I have trouble with the vagueness of "strong forms". This is well-defined only for Germanic languages, not a generally applicable concept. Likewise, "having a separate lineage" holds for a lot of things, for starters all irregular forms in general. We have a separate etymology for mice; should we also have separate etymologies for taught or bent?

I think the default assumption should be that, if not otherwise specified, it is not merely the lemma but all applicable inflected forms that descend from a given ancestor. If we give mūs as the ancestor of mouse, then this should already imply that the former's plural mȳs is the ancestor of the latter's plural mice. This gets rid of having to treat any irregularities that represent fossilized original regular alternations, no matter how far back they go. We are working on etymology sections here after all, not on historical morphology or historical phonology.

To be fair, without morphological and phonological supplementary information, etymology often becomes fairly opaque just-take-my-word-for-it business, and I do think Wiktionary could benefit from detailing these somewhere; I just do not think etymology sections are the place for this. --Tropylium (talk) 10:26, 4 October 2017 (UTC)[reply]

Having mūs as the ancestor of mouse does not immediately imply that mice derives from mȳs, or make it clear to the viewer. There is no duplication of information going on when etymology is given for mice, only clarification and necessary etymology. Apparently, someone rightly found that etymology should be specified for this non-lemma form, since an etymology section for mice already exists. Anyhow, I think this is being blown out of proportion. I would only ask for the option of specifying non-lemma etymologies where they are notable, as has already long been done with the article of am. Rua would delete all these etymology sections (despite am being an oft-cited non-lemma form for the purposes of reconstruction). When I make an etymology section on brought and did to explain their opaque etymologies, I don't want my edits nonsensically moved and crowded under the etymology pages of bring and do (or more often than not, simply deleted). These sorts of power trips by administrators not following the spirit of the guidelines (that they themselves wrote!) just make me incredibly discouraged from adding information to this website. 128.84.125.120 18:04, 4 October 2017 (UTC)[reply]

@Tropylium How would you handle the suppletion of the potential of olla, the perfect of sum, or in być? Putting etymologies on each of the forms is not going to be feasible. —Rua (mew) 23:36, 3 October 2017 (UTC)[reply]

There's only a limited amount of suppletion for any given case; we could assign an "etymological lemma" for each nonsuppletive group (e.g. lienee for the Finnish potential stem). --Tropylium (talk)

Ew. —Rua (mew) 11:04, 4 October 2017 (UTC)[reply]

Seconded Rua. Anti-Gamz Dust (There's Hillcrest!) 00:34, 16 October 2017 (UTC)[reply]

Rollbacking/Patrolling

Hullo. I'd like to make a request for the rollbacking or the patrolling tool. Where is it at? --Barytonesis (talk) 08:09, 5 October 2017 (UTC)[reply]

@Barytonesis: An admin has to nominate you at WT:Whitelist I think (or is that only for auto patrol)? —Aryaman ^{(मुझसे बात करो)} 17:01, 5 October 2017 (UTC)[reply]

I think that rollback/patrol most often is applied to people who, for one reason or another, do not want to be administrators. Just apply to be an admin if you want some subset of the tools. - TheDaveRoss 17:04, 5 October 2017 (UTC)[reply]

@TheDaveRoss: I'd like to, but I don't think I've gathered enough trust yet. Would you endorse me? --Barytonesis (talk) 16:42, 14 October 2017 (UTC)[reply]

A more personal form of Google Translate just for Faroese

https://www.faroeislandstranslate.com/#!/ —Justin (koavf)❤T☮C☺M☯ 08:01, 6 October 2017 (UTC)[reply]

Entries with deprecated labels

The label (ordinal number) used for ordinal numbers is listed in Category:Entries with deprecated labels with no suggested replacement. Should it even be listed there? DonnanZ (talk) 13:21, 6 October 2017 (UTC)[reply]

There is no replacement. There should not be a label there at all, add the category with {{head}} or {{cln}} instead. —Rua (mew) 13:27, 6 October 2017 (UTC)[reply]

The label automatically generates the category though, as well as saying what it is, so I don't see any reason to change it, e.g. nittende. Besides that, there is no suggestion to use {{head}} or {{cln}} in the above-mentioned category. DonnanZ (talk) 13:45, 6 October 2017 (UTC)[reply]

It's a misuse of labels, that's why it's deprecated. "Ordinal" doesn't specify a context in which a term is used. —Rua (mew) 13:57, 6 October 2017 (UTC)[reply]

Whoever set up the label didn't take that into account. It surely would be a simple matter to change the label to "ordinal number", although loads of entries would have to be revised. "cln|nb|ordinal numbers" works for generating the category, but a qualifier would then have to be added, which is twice as much writing, and a step backwards. DonnanZ (talk) 14:11, 6 October 2017 (UTC)[reply]

The other label (cardinal number) when moused over shows "cardinal number", but this doesn't happen with (ordinal number). It is not deprecated. DonnanZ (talk) 14:47, 6 October 2017 (UTC)[reply]

"ordinal number" is also not a valid context. Context labels should not be used to give definitions or disambiguate them. They are meant to describe how something is used, not what it means. —Rua (mew) 15:20, 6 October 2017 (UTC)[reply]

Have you checked ordinal number? Also see here. Nineteenth is an ordinal number. DonnanZ (talk) 15:35, 6 October 2017 (UTC)[reply]

Where are you getting the idea that I'm denying that these are ordinal numbers? I only said that a context label is not how this fact should be indicated. The entry should be categorised with {{cln}} or the cat2= parameter on {{head}}, but there shouldn't be a context label saying that it's an ordinal number. —Rua (mew) 15:47, 6 October 2017 (UTC)[reply]

I agree with RuaCat. Ordinal numbers should be categorized as such using |cat2= or {{cln}} but not using {{lb}}. —Aɴɢʀ (talk) 16:21, 6 October 2017 (UTC)[reply]

I still disagree, but as you are so keen on everything else but, perhaps you would like to come up with some usage examples. DonnanZ (talk) 22:31, 6 October 2017 (UTC)[reply]

Please, please reveal the cause of the revert in the edit summary

The default text "If you think this rollback is in error, please leave a message on my talk page" carries no information. In the same number of words you could give some specifics about the actual problem.

Instead of writing ~~pure junk~~ this formula, it would be more helpful for all of us if you would just write the reason in the edit summary (this way we won’t have to bother you on your talk page).

A revert reduces someone's work to nothing. Please, please either correct the error, or at least give a hint about the problem to avoid.

(Sorry for my poor English.)

Karmela (talk) 07:09, 8 October 2017 (UTC)[reply]

There are relatively few admins who have to go through a flood of edits by new contributors and see whether they belong in the dictionary or not. Given that, we simply do not have the time to give explanations tailored for every rollback that we make (if it wasn't clear, the default text is added automatically). I created the vote that added that default text because previously, it said nothing at all — obviously, this is much better, because you followed the instructions and left a message on Wikitiki89's page, where you can further discuss the edit. —Μετάknowledge^{discuss/deeds} 07:22, 8 October 2017 (UTC)[reply]

Thank you. For a (non-vandal) contributor, the cause of the rollback is _never_ clear; s/he made the contribution supposing it was OK.

The list of typical errors cannot be too long; would it be possible to choose from a premade list of explanations when reverting?

Karmela (talk) 16:37, 8 October 2017 (UTC)[reply]

We have such a list for deletions of entire entries. It would be a good start for what you recommend. I do not know whether it is readily done technically. DCDuring (talk) 18:59, 8 October 2017 (UTC)[reply]

@DCDuring, Metaknowledge In en.wikipedia.org you can add two dropdown boxes below the edit summary box with some useful default summaries:

Common edit summaries -- click to use
Common minor edit summaries -- click to use

One can enable this gadget at https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-gadgets

An analogous dropdown box, "Common revert summaries -- click to use", must be technically similar.

Karmela (talk) 07:47, 14 October 2017 (UTC)[reply]

So, apparently technically possible. How do we get it? DCDuring (talk) 14:01, 14 October 2017 (UTC)[reply]

This is how

mw.loader.load('//en.wikipedia.org/w/index.php?title=MediaWiki:Gadget-defaultsummaries.js&action=raw&ctype=text/javascript');

Dixtosa (talk) 14:20, 14 October 2017 (UTC)[reply]

@Dixtosa, DCDuring, Metaknowledge: If I interpret it correctly, we first need to copy en:w:MediaWiki:Gadget-defaultsummaries.js to MediaWiki:Gadget-defaultrevertsummaries.js (or something like that) and change the texts to the list DCDuring mentioned above (where is it?). We will also need something analogous to en:w:MediaWiki:Gadget-defaultsummaries, and Special:Gadgets must get an extra line (analogous to en:w:Special:Gadgets).

All this presupposes that the community here wants it. Is this the correct place and form to ask the Wiktionary community?

Karmela (talk) 08:52, 15 October 2017 (UTC)[reply]
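The adaptation described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not the code of the Wikipedia gadget: the three summary texts are invented placeholders (the real list would have to be agreed on here), and the function names are hypothetical. It shows only the part that changes, namely the list of canned texts and how a chosen text is combined with what is already in the summary.

```javascript
// Hypothetical canned revert reasons; placeholders only, not an agreed list.
const REVERT_SUMMARIES = [
  "Unsourced or doubtful etymology",
  "Wrong language section or header",
  "Formatting does not follow entry layout",
];

// Build the dropdown entries: the prompt text first, then the canned reasons.
function buildRevertOptions(summaries) {
  return ["Common revert summaries -- click to use", ...summaries];
}

// Combine the chosen reason with any text already typed into the summary.
function applySummary(current, chosen) {
  const trimmed = current.trim();
  return trimmed === "" ? chosen : trimmed + "; " + chosen;
}
```

Wiring these functions to the page is left out on purpose, since it depends on how the rollback interface exposes its summary field.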

Requests for deletion - restoring the list of nominations

In June 2017, WT:RFD was changed to no longer list items nominated at the right top of the page. I propose to restore the previous state. The current state is that categories are listed but not the items nominated themselves. That is not very useful, IMHO.

Therefore, I propose:

List nominated items again, as a list of items for all languages.
To support that, list all nominated items in Category:Requests for deletion instead of listing them only in per-language categories. This, again, is a restoration proposal.

--Dan Polansky (talk) 09:34, 8 October 2017 (UTC)[reply]

@Dan Polansky: If you want to see, say, the 5 French requests, click on the "▶" symbol next to the "Requests for deletion in French entries (0 c, 5 e)". In my opinion, this is more useful than before, because now you can choose the language you want to see, as opposed to seeing a mess of entries in all languages. If we want to see a mess of entries in all languages, we may look at the normal TOC (the "Contents" list). I believe we also have the option of making all languages un-collapsed by default, though personally I'd prefer them collapsed as they currently are. --Daniel Carrero (talk) 09:58, 8 October 2017 (UTC)[reply]

I want to see the complete list, not by language. I only want to check whether all the items listed there were put on the RFD page itself; if I did not want to do that, I would not want to see that right-floating portion of the page at all. --Dan Polansky (talk) 13:32, 8 October 2017 (UTC)[reply]

I support Dan's proposals. The language-specific RFD categories seem to be useless. —Μετάknowledge^{discuss/deeds} 16:05, 8 October 2017 (UTC)[reply]
How do we know that no one uses the by-language listings? (BTW, I don't use them)
BTW, I have noticed that we have a fair number of headings on request pages that do not have tags. Do we need yet another run against the XML to identify:
1. Tagged L2s that are not on current request pages.
  1. Tagged L2s that are for archived or otherwise closed requests.
2. Untagged L2s that are on the request pages.
We'd also need to treat items that have been stricken or closed, but not yet archived.

At the moment I don't see how this can systematically be accomplished with search. Though I doubt we would need such a run every two weeks, it might be useful every quarter or, at least, every year. DCDuring (talk) 18:47, 8 October 2017 (UTC)[reply]
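The check described in the numbered list above amounts to two set differences. A minimal sketch, assuming the tagged sections and the request-page headings have already been extracted from the dump as arrays of strings (the function name, the field names, and the input format are hypothetical):

```javascript
// Compare sections carrying a request tag against headings on the request pages.
// Both inputs are arrays of strings that identify a section, e.g. a page title.
function compareRequests(taggedSections, requestHeadings) {
  const tagged = new Set(taggedSections);
  const listed = new Set(requestHeadings);
  return {
    // Case 1: tagged, but no heading on a current request page.
    taggedButNotListed: [...tagged].filter((t) => !listed.has(t)),
    // Case 2: heading on a request page, but the entry carries no tag.
    listedButNotTagged: [...listed].filter((t) => !tagged.has(t)),
  };
}
```

The sketch does not distinguish archived, stricken or closed requests; those would need to be filtered out of the inputs beforehand.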

@DCDuring For part 1, see User:DTLHS/cleanup/request consistency. I don't think 2 is that important since entries on request pages get archived eventually. It's possible that there are false positives if pages are linked unusually on the request pages. DTLHS (talk) 19:27, 8 October 2017 (UTC)[reply]

@DTLHS: For 2 I was thinking about those requests that are entered without use of any request template. Today I noticed it when [[academic institution]] was added to RFDE. (The contributor has now added {{rfd}} at my request.) Perhaps what is needed is to discourage addition of new headers on request pages except through the relevant templates. DCDuring (talk) 20:35, 8 October 2017 (UTC)[reply]

Classification of forms with -n't

Hello. Rua, Equinox, Erutuon and I have been talking about the classification of don't, can't and other forms with -n't in User talk:TAKASUGI Shinji/2017#Contractions. I think they are verb forms just like did and could, according to Arnold M. Zwicky and Geoffrey K. Pullum (Cliticization vs. Inflection: English n’t, Language 59(3), 1983, pp. 502-513), but not everyone agrees with their analysis. In my opinion, we shouldn't use “Contraction” as a header because it is not a part of speech, and we should replace it with a part of speech we can reasonably assign. What do you guys think? — TAKASUGI Shinji (talk) 10:09, 8 October 2017 (UTC)[reply]

Our level 3 headers are for more than just part of speech. Suffix isn't a part of speech either. We have to use "contraction" because for most cases there is no other way to do it. Look at Category:Middle Dutch contractions for example. So that argument is not very compelling.

As for these contractions specifically, I don't see how they can be considered anything else. They aren't considered verb forms in any standard grammar of English. One paper is interesting, but we should follow linguistic consensus on the matter and not the opinion of a single paper. —Rua (mew) 11:44, 8 October 2017 (UTC)[reply]

An analysis of well-known linguists and lack of analysis don't have the same value. I find their analysis convincing. You can only say don't you? and not *do not you?, from which we must conclude that don't is not a contraction of do not. — TAKASUGI Shinji (talk) 11:41, 9 October 2017 (UTC)[reply]

What do you mean by "standard grammar"? The Cambridge Grammar of the English Language (naturally, because it was co-written by Pullum) uses the inflectional-suffix analysis of -n't, and its auxiliary verb paradigms show negative forms corresponding to each of the finite forms. Certainly the more traditional version of English grammar that I learned as a kid didn't recognize negative inflected forms, but it wasn't particularly linguistically rigorous and shouldn't be the basis for our decisions on Wiktionary. — Eru·tuon 21:52, 9 October 2017 (UTC)[reply]

I'm in favor of the analysis in which -n't is an inflectional suffix and forms like don't are verb forms (and I could go on about that), but the essential thing is to at least be consistent. I don't think it's consistent to label -n't as a suffix (as it's been labeled since 2008) and then call forms like won't contractions. A contraction is basically the combination of a full word plus one or more clitics that are derived from orthographic words, but are not spelled as words in this case. So for won't to be a contraction, -n't has to be a clitic (a variant form of not). The other option is for -n't to be a suffix and won't a verb form. We need to pick an analysis and stick to it. It would be fine to include usage notes explaining the alternative analysis, or alternative inflection tables, or categories, but the headers and headword templates should stick to a single analysis. — Eru·tuon 19:20, 8 October 2017 (UTC)[reply]

A week has passed, and there has been one negative vote. I assume the classification of -n't according to their paper is acceptable. — TAKASUGI Shinji (talk) 12:54, 18 October 2017 (UTC)[reply]

Any idea for a new "Thesaurus:" shortcut?

WS:good → Thesaurus:good still works, as it should.

But "WS:" does not make a lot of sense anymore, because now "Wikisaurus" is called "Thesaurus".

Then again, "TS:good" and "TH:good" are unavailable, because they are language codes. Is there a good shortcut available? If not, I guess we'll have to keep using only "WS". --Daniel Carrero (talk) 18:37, 9 October 2017 (UTC)[reply]

THES seems the obvious choice. Equinox ◑ 18:44, 9 October 2017 (UTC)[reply]

Alright, I guess. I'm not entirely happy with a mere reduction from 9 to 4 letters, but maybe that's the best option we have.

Maybe THE would be better ("THE:good" → Thesaurus:good), but "the" is the ISO code for Chitwania Tharu (w:Tharu languages). Can't we use it anyway? --Daniel Carrero (talk) 14:31, 11 October 2017 (UTC)[reply]

But if we can't use ISO codes then we can't use any three-letter code, even if ISO hasn't used it yet. It should be considered reserved for future ISO use. Equinox ◑ 15:07, 11 October 2017 (UTC)[reply]

That may be true, but we have violated that rule before. We have "cat" and "mod" as working aliases. See CAT:English nouns and MOD:sandbox. "cat" means Catalan, which seems unlikely to be used by Wikimedia because they have settled for https://ca.wiktionary.org/ and https://ca.wikipedia.org/ (using "ca", not "cat"). "mod" is Mobilian Jargon language. --Daniel Carrero (talk) 15:22, 11 October 2017 (UTC)[reply]

To be clear, I would support using "the". ("THE:good" → Thesaurus:good) --Daniel Carrero (talk) 15:24, 11 October 2017 (UTC)[reply]

I would prefer "THS" which is the language code for the w:Thakali language, a Nepali Sino-Tibetan language with 5,900 native speakers. Chuck Entz (talk) 02:04, 12 October 2017 (UTC)[reply]

No offence to you or Daniel but I think it's pretty obnoxious to appropriate a language code because not many people speak it and "it might never happen". You can't be half ISO compliant. Equinox ◑ 20:09, 25 October 2017 (UTC)[reply]

SYN. —suzukaze (t・c) 02:28, 12 October 2017 (UTC)[reply]

NYM: is what I like, to stand for -nyms. --Dan Polansky (talk) 07:49, 21 October 2017 (UTC)[reply]

I am actually OK with it not having a shortcut. - TheDaveRoss 20:42, 24 October 2017 (UTC)[reply]

Linking active policy proposals

WT:EL should probably link to WT:FORMS in some fashion. I imagine there are also other cases like these, where EL is a dead-end and the actual documentation is hidden away in some obscure undocumented location.

Some might protest that the former is policy while the latter are often drafts, but as long as this is indicated, I do not see any problem in linking. Should we maybe settle on some specific more mildly worded section hatnote, such as "Read more:" (instead of "Main article:" or the like)?

Interestingly, WT:Policies and guidelines, despite being prominently linked from the policy headers ({{policy}}, {{policy-TT}}, {{policy-DP}}), is currently categorized as "inactive". There's Category:Wiktionary think tank policies, but it's not especially user-friendly. --Tropylium (talk) 14:54, 11 October 2017 (UTC)[reply]

New section "Synchronic analysis" in WT:EL

w:en:Synchrony and diachrony

It isn't useful to have only historical analysis (the current "Etymology" section at en.wiktionary) or only modern analysis.

Example: атония d1g (talk) 08:51, 12 October 2017 (UTC)[reply]

We include this in etymology, but the usual wording is "equivalent to". —Rua (mew) 13:36, 12 October 2017 (UTC)[reply]

Linking to Wikimedia Commons categories

Hello, I would like to know why the wiktionary entries are not linked to the Wikimedia Commons categories (by using statements at Wikidata). For example the entry Varvel can be connected to commons:Category:Vervels. It can only help the readers to (visually) learn more about that particular word. Fructibus (talk) 09:45, 12 October 2017 (UTC)[reply]

@Fructibus: Have you seen Wiktionary:Wikidata? We do in fact have some links to that sister project via local templates. E.g. tea. —Justin (koavf)❤T☮C☺M☯ 09:50, 12 October 2017 (UTC)[reply]

@Koavf: Thanks a lot! By the way, was there any discussion about including the wiktionary pages into Wikidata, connecting with the Wikipedia/Commons pages? Then the Commons link would show automatically for the Wiktionary pages, in all languages. At this moment, if you want to link to Commons in all language articles, that means you have to edit 67 Wiktionary pages. Fructibus (talk) 19:05, 12 October 2017 (UTC)[reply]

@fructibus: "Was there any discussion about including the wiktionary pages into Wikidata" Oh yes, quite a bit. And there are currently options to include Wiktionary entries in Wikidata but I don't feel like I can do a good job of summarizing all of that. You may wish to see the equivalent page here: d:Wikidata:Wiktionary. I 100% agree that we should use Wikidata to make sister links--you may wish to talk with ~~User:CodeCat~~User:Rua (I had forgotten he [she?] was renamed for some reason) about that. —Justin (koavf)❤T☮C☺M☯ 19:16, 12 October 2017 (UTC)[reply]

People have expressed dislike for Wikidata IDs, so we probably won't be using Wikidata for anything after all. I tried. —Rua (mew) 19:45, 12 October 2017 (UTC)[reply]

It will happen, it's just that at the moment the advantages aren't completely obvious. – Jberkel (talk) 20:56, 12 October 2017 (UTC)[reply]

@Jberkel: Isn't this one of them? —Justin (koavf)❤T☮C☺M☯ 22:26, 12 October 2017 (UTC)[reply]

The page tea has already {{wikidata|Q6097}}. Changing to a template like {{sister links|Q6097}} could fetch all sister project links with automatic update of new links, deleted ones or renamed ones. The problem is that a word may have multiple senses that can be connected to multiple equivalent pages on Wikidata. --Vriullop (talk) 08:22, 13 October 2017 (UTC)[reply]

@Koavf: I'm all for Wikidata, it's just that to some editors the advantages are less clear at the moment. @Vriullop: yes that would be great, via Wikidata one should be able to fetch all the other relationships. Couldn't {{senseid}} (or something similar) be used for fine-grained associations? – Jberkel (talk) 14:11, 13 October 2017 (UTC)[reply]

@Jberkel - @Koavf - @Vriullop - @Rua - Sorry, I am new to Wiktionary but I really don't see the reason for not linking the Wiktionary definitions in Wikidata. For example the Wikipedia article Water has a link to the Wiktionary definition, at the bottom of the article. Why not show it in the middle-left side of the page, near the other sister project links (Commons, Wikibooks, Wikiquote)? This way all the 220 Wikipedia articles can show the link to the Wiktionary definition in their respective language (if it exists), without the need to actually edit the 220 Wikipedia articles. Fructibus (talk) 18:49, 13 October 2017 (UTC)[reply]

@Fructibus: I agree as well but there were concerns that it's too difficult, impossible, or possible-but-difficult and not actually helpful. I disagree with the latter two but it's definitely an undertaking to be sure. Then again, so is everything. —Justin (koavf)❤T☮C☺M☯ 19:01, 13 October 2017 (UTC)[reply]

@Koavf: Very nice answer, gives a feeling of touching a perfection in language, thanks :) - Fructibus (talk) 23:39, 13 October 2017 (UTC)[reply]

Ōbaku tō-on/sō-on readings

Found this video: Heart Sutra chanted by Ōbaku monks; is the ruby a Chinese pronunciation or as Wikipedia states: tō-on/sō-on readings? Here's a supporting resource. Domo, --POKéTalker (talk) 04:49, 13 October 2017 (UTC)[reply]

Personally it sounds suspiciously(?) too much like accented Mandarin (如(ぢ) (ji)? 厄(え) (e)?), possibly dated (見(けん) (ken)), but I also don't know what I'm talking about. Maybe tō-on is Mandarin. —suzukaze (t・c) 05:11, 13 October 2017 (UTC)[reply]

@suzukaze, tōon is 唐音 (literally “Tang sound”), i.e. based on the Tang Dynasty pronunciation, no later than 907 CE. At least in theory. ‑‑ Eiríkr Útlendi │^{Tala við mig} 05:31, 13 October 2017 (UTC)[reply]

For reference, comparison of Japanese, Ōbaku reading (sō-on?) and standard Chinese:

観(かん)自(じ)在(ざい)菩(ぼ)薩(さつ)行(ぎょう)深(じん)般(はん)若(にゃ)波(は)羅(ら)蜜(みっ)多(た)時(じ)

Kanjizai Bosatsu gyō jin hannya haramitta ji

Avalokitesvara Bodhisattva was practicing deep prajnaparamita when...

觀(クヮン)自(ツ)在(サイ)菩(プ)薩(サ)行(ヘン)深(シン)般(ポ)若(ゼ)波(ポ)羅(ロ)蜜(ミ)多(ト)時(ス)

K(w)antsusai Pusa hen shin poze poromito su

Avalokitesvara Bodhisattva was practicing deep prajnaparamita when...

觀自在菩薩行深般若波羅蜜多時 [MSC, trad.]
观自在菩萨行深般若波罗蜜多时 [MSC, simp.]

Guānzìzài Púsà xíng shēn bānruò bōluómìduō shí [Pinyin]

Avalokitesvara Bodhisattva was practicing deep prajnaparamita when...

Though there is probably no clear romanization to the monks chanting, should kanji with these Ōbaku on-readings be provided as sō-on? Just wondering. --POKéTalker (talk) 02:21, 14 October 2017 (UTC)[reply]

(The automatic pinyin generated by zh-usex is not correct because it uses the most common readings. [1] has pinyin transcription that seems to be OK —suzukaze (t・c) 02:36, 14 October 2017 (UTC))[reply]

@POKéTalker, to confirm / clarify -- it sounds like you're asking if there is value in adding sōon readings to the individual kanji entries. If that's your proposal, I have no particular opposition, so long as the readings are clearly labeled as sōon (provided that's the correct reading category). ‑‑ Eiríkr Útlendi │^{Tala við mig} 04:36, 14 October 2017 (UTC)[reply]

I think POKéTalker wants to make sure that they are indeed tou'on first. —suzukaze (t・c) 20:48, 14 October 2017 (UTC)[reply]

TabbedLanguages default and English links in definitions

Yesterday, following Wiktionary:Beer parlour/2017/July#TabbedLanguages edit: default to English for unmarked links, I made a change to MediaWiki:Gadget-TabbedLanguages.js so that the default language would always be English, if no language is specified. This means that it's no longer necessary to use {{l|en|...}} in definitions. I'd like to ask the people who do this to use regular links from now on. —Rua (mew) 15:40, 14 October 2017 (UTC)[reply]

But not everyone uses tabbed languages. DTLHS (talk) 15:45, 14 October 2017 (UTC)[reply]

I agree that this would be a sensible default for those without it, too. But then we'd need a separate gadget. —Rua (mew) 15:46, 14 October 2017 (UTC)[reply]

If we made a separate gadget it could be more smart, such as linking derived terms to the correct language, while linking terms in definitions to English. DTLHS (talk) 15:53, 14 October 2017 (UTC)[reply]

Also, what happened to the plan to make TL the default? We had a vote and everything. —Rua (mew) 15:51, 14 October 2017 (UTC)[reply]

@Rua: The vote was conditional on categories being properly sorted by bot in advance of the implementation. This never happened, afaict. --Yair rand (talk) 04:20, 2 November 2017 (UTC)[reply]

Any way to undo this behavior for searches and search results? Having them always go to English is pretty annoying when you’re working on some other language. — Vorziblix (talk · contribs) 01:48, 18 October 2017 (UTC)[reply]

Where else should they go? —Rua (mew) 10:49, 18 October 2017 (UTC)[reply]

Ideally to the last language visited, as they did before. — Vorziblix (talk · contribs) 23:15, 18 October 2017 (UTC)[reply]

Singapore terms

Just a heads up: a while ago, a Singapore schoolteacher encouraged his students to add Singaporean English terms to Wiktionary (which is, on the whole, a good thing). We seem to have a new batch of these happening at the moment, e.g. bus captain, taxi uncle. So be ready for some cleanup. Equinox ◑ 08:23, 16 October 2017 (UTC)[reply]

Special:Contributions/86.30.235.176

They've made some drastic changes to pronunciation which might not be correct. Anyone who knows Old English, do you mind taking a look? --Robbie SWE (talk) 18:12, 16 October 2017 (UTC)[reply]

@Robbie SWE: It looks like, as far as Old English pronunciation goes, they're changing sequences of /h/ and a sonorant to sonorant and voicelessness diacritic (for instance, /hr/ to /r̥/). That might be correct in a pseudo-phonetic transcription, but I don't know if it is an accepted phonological analysis. — Eru·tuon 21:46, 16 October 2017 (UTC)[reply]

Translating both ways

Hello!

When I started working on a project in which I would like to use translations from the wiktionary, I noticed that wiktionary translations are created separately for each language. That means that even if the English wiktionary contains the translation of a word into another language e.g. Mandarin, in that language there will not be a translation of that word into english.

One example:

library

- the list of translations contains the translation 图书馆

zh:图书馆

- the Chinese wiktionary page does not have a translation for that word into English (the site contains: 英语(English)：[[]])

Since these translations are symmetric, it would be correct to add a large number of translations to these wiktionaries with much less effort. However there surely will be a few issues that have to be resolved first.

TheDaveRoss already replied to me by email, stating some issues:

"1. There are numerous Wiktionaries, each one maintained by a distinct community of volunteers. Each has its own policies regarding what may or may not be included, how translations are to be added, etc. It is very important that you coordinate with the local community wherever you add content to ensure that the content meets their criteria.

2. Translations are very nuanced (as you are probably aware). Automated addition of translation has happened at small scales in the past, however close oversight by a person familiar with both languages is required. Even translations which appear to be symmetrical may require special annotation in the target language which is not included in the original language.

3. The source material may not be correct, and automation can propagate errors. The English Wiktionary, and a few other large Wiktionaries, have enough contributors that many errors are caught quickly. That is not the case for the majority of other languages, so it is important to ensure any additions to other languages are correct"

4. Attribution to the original contributor will be important

E.g. adding the new words to proposed translation first and then checking for correctness would decrease the risk of wrong translations but add some value right away.

What do you think about writing a script to do this? What other problems are there with this? Do you know about previous attempts to do this? I hope this could be very useful!

Noahho (talk) 01:21, 17 October 2017 (UTC)[reply]
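For illustration only, here is a minimal sketch of the "proposed translation first" idea described above. It inverts a translation table into unchecked proposals and keeps the source entry for attribution. The data layout, the `Proposal` record and the `propose_reverse` function are assumptions made up for this sketch, not an existing bot or API; nothing here writes to any wiki.

```python
from collections import defaultdict
from typing import NamedTuple


class Proposal(NamedTuple):
    """One proposed reverse translation (hypothetical record layout)."""
    target_wiki: str    # language code of the Wiktionary that would receive it
    headword: str       # entry on that wiki, e.g. "图书馆"
    translation: str    # proposed translation back into the source language
    source_wiki: str    # wiki the translation was taken from (for attribution)
    source_entry: str   # entry it was taken from (for attribution)
    status: str         # stays "unchecked" until a human reviews it


def propose_reverse(source_wiki, translations):
    """Invert {entry: {lang: [words]}} into per-language proposal lists.

    Every proposal is marked "unchecked": symmetry is only a guess,
    so a speaker of both languages still has to confirm each one.
    """
    proposals = defaultdict(list)
    for entry, by_lang in translations.items():
        for lang, words in by_lang.items():
            for word in words:
                proposals[lang].append(
                    Proposal(lang, word, entry, source_wiki, entry, "unchecked")
                )
    return dict(proposals)


# The example from this thread: English "library" lists 图书馆.
sample = {"library": {"zh": ["图书馆"]}}
result = propose_reverse("en", sample)
print(result["zh"][0])
```

Keeping the status and the source fields on every record is meant to address points 2 to 4 of the list above: review by a person, error propagation, and attribution.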

@Noahho: Something similar to two-way translation could work if we can agree on how we will use Wikidata and how it will be connected across Wiktionaries. Unfortunately, how that would work is very difficult to determine. —Justin (koavf)❤T☮C☺M☯ 02:13, 17 October 2017 (UTC)[reply]

@Noahho Hello Noah. Can I please know what project you are working on? If the aim of the project is to extract translations of foreign-language terms into English from Wiktionary, it would be much easier to extract from the pages on English Wiktionary; e.g. for simplified 图书馆 it would be at 圖書館, which says "library". Wyang (talk) 02:17, 17 October 2017 (UTC)[reply]

This is an age-old problem. I believe that, somewhere, there is a unified Wiktionary that is not dependent on a "home" language. But I forget what it is called, or what state it is in. SemperBlotto (talk) 05:37, 18 October 2017 (UTC)[reply]

@SemperBlotto: omegawiki:? —Justin (koavf)❤T☮C☺M☯ 09:03, 18 October 2017 (UTC)[reply]

I will also mention one previous attempt to do this locally was User:Tbot. The person who created that script has passed away, however there is some amount of documentation of his efforts in that user space. - TheDaveRoss 13:57, 18 October 2017 (UTC)[reply]

No Tbot-like program should ever be run again. It was a well-intentioned mistake, the effects of which we are still engaged in cleaning up. —Μετάknowledge^{discuss/deeds} 07:10, 2 November 2017 (UTC)[reply]

Turkish vs Ottoman Turkish

The Balkan language loanwords from Turkish should technically be Ottoman Turkish, since that's the era they entered those languages, right? Is the only main difference the script being Arabic vs. Latin? I realized I need to go back and change a bunch of Romanian and some other entries. Word dewd544 (talk) 16:12, 17 October 2017 (UTC)[reply]

Yes, they should generally be Ottoman Turkish. The script is one significant difference, but if I’m not mistaken there’s also a huge difference in lexicon, where a large portion of the Ottoman Turkish lexis consists of loanwords from Persian and Arabic that were later stamped out of usage and replaced with neologisms by Atatürk. — Vorziblix (talk · contribs) 21:40, 17 October 2017 (UTC)[reply]

There are also grammatical differences. That said, I would personally prefer to treat them as a single language, and I don't think we lose much by claiming Balkan loanwords are from Turkish rather than Ottoman Turkish when the word in question is itself the same. —Μετάknowledge^{discuss/deeds} 21:44, 17 October 2017 (UTC)[reply]

I’m indifferent to merging them, not being knowledgeable enough on the subject, but the split does seem to be mostly a relic of sticking to ISO codes; input from editors experienced with Turkish could be helpful. — Vorziblix (talk · contribs) 06:40, 19 October 2017 (UTC)[reply]

Isn’t it more true to the soothfast happenings in the Ottoman Era to describe the Ottoman Turkish as an acrolect of Turkish which the elite prioritized while basically we have had Turkish all the time? It would be awkward to say that we had Turkish once and then, by some peculiar developments in constitutional history, Ottoman Turkish, and then because Atatürk said so Turkish has smitten Ottoman Turkish. Rather there has been one basic Turkish from which the Balkan languages also borrowed rather than from the language we see as Ottoman literary inheritance today, though of course there can be learned borrowings from the literary language as well, though in the case this is largely unlikely because of mostly late literary culture in the Balkan countries and literary culture in the Slavia as well as in Greece (I don’t know about Romania and Albania) also prohibits itself to borrow, as compared to other literary cultures. So in the context of Balkan languages, the Turkish they have been in contact with was coexistent Turkish rather than typical Ottoman Turkish. We are just inveigled to assume that one literary language has borrowed from the other literary language even when the spoken language has borrowed, because for older times we know about the spoken language from its appearance in writing. It is an image of things we have long surpassed in Romance studies, like acknowledging that Spanish has borrowed from colloquial Arabic rather than from literary Arabic. Palaestrator verborum (loquier) 20:51, 18 October 2017 (UTC)[reply]

Unfortunately it’s not really clear what Wiktionary means by ‘Ottoman Turkish’ — just the literary acrolect or the language in general during a given time period. Previous discussions don’t seem to have reached a conclusion. Some of the comments made there could be relevant to the issue of how to treat Ottoman Turkish, though. — Vorziblix (talk · contribs) 06:40, 19 October 2017 (UTC)[reply]

Listing Translations by Language

It seems to me that quality control of translations is harder than it should be: those of us who patrol new edits can't be knowledgeable in anywhere near all of the languages, and those with expertise in the languages in question are less likely to be spending their time browsing through English entries. {{t-check}} is helpful, but not used in most languages.

Does everybody think it would be a good idea to create a listing of translations in each language, along with the entry they're in? I would envision it as a listing in the language's alphabetical order, with the {{t}} template converted to an {{l}} template and followed by the name of the entry:

bon m, good

This would make it easy for an expert in a given language to scan through all the translations in that language without browsing a bunch of English entries. It also might make the redlink categories and the overhead that goes into creating them unnecessary.

I'm bringing it up here because it would be a major undertaking involving massive processing of the dumps, so I want to make sure it's a good idea before asking anyone to do it. Perhaps it could be started with some of the smaller LDL languages as a test. Chuck Entz (talk) 14:19, 18 October 2017 (UTC)[reply]
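As a rough sketch of what such dump processing might look like, the following pulls {{t}}-style templates out of an entry's wikitext and groups them by language code. The regular expression only covers the simple forms `{{t|…}}`, `{{t+|…}}`, `{{t-check|…}}` and `{{t+check|…}}`; the function names, the output format and the use of plain `sorted()` instead of each language's real alphabetical order are all assumptions of this sketch.

```python
import re
from collections import defaultdict

# Matches {{t|lang|term|...}} and the t+, t-check, t+check variants.
# Nested templates inside the parameters are not handled.
T_TEMPLATE = re.compile(
    r"\{\{(?:t\+check|t-check|t\+|t)\|([^|{}]+)\|([^|{}]+)([^{}]*)\}\}"
)


def collect_translations(entries):
    """Map language code -> list of (term, genders, entry title).

    `entries` is {entry title: wikitext}, e.g. read from a dump.
    """
    by_lang = defaultdict(list)
    for title, wikitext in entries.items():
        for lang, term, rest in T_TEMPLATE.findall(wikitext):
            # Positional parameters after the term are taken as genders;
            # named parameters such as tr=... are skipped.
            genders = [p for p in rest.split("|") if p and "=" not in p]
            by_lang[lang].append((term, genders, title))
    return by_lang


def format_listing(lang, rows):
    """Render rows as '{{l|lang|term}} genders, [[entry]]' lines."""
    lines = []
    for term, genders, title in sorted(rows):  # naive code-point order
        link = "{{l|%s|%s}}" % (lang, term)
        gender_text = (" " + " ".join(genders)) if genders else ""
        lines.append(f"{link}{gender_text}, [[{title}]]")
    return lines


sample = {"good": "* French: {{t+|fr|bon|m}}\n* German: {{t+|de|gut}}"}
listing = collect_translations(sample)
print(format_listing("fr", listing["fr"]))
```

A real run would need proper per-language collation and would have to cope with multi-line or nested templates, which is part of why this counts as a major undertaking.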

Matthias Buchmeier maintains lists similar to what you describe. — Ungoliant ^(falai) 14:59, 18 October 2017 (UTC)[reply]

I can share the assumption that it would make it more possible to make Wiktionary serve as bilingual dictionary in relation to single languages, as for now one cannot directly work in Wiktionary to make it intentionally a bilingual dictionary for any language because one does not see what is already there, i.e. one can only add to quantity, but only by serendipity to quality. But the requirement would be that such lists are live dumps which get refreshed as soon as edited, instantly by Javascript or at least after refetching the page. Because it is a core part of motivation for editing to see the results published instantly, that’s why the web is there. Palaestrator verborum (talk) 16:41, 18 October 2017 (UTC)[reply]

@Chuck Entz: Re quality control via patrolling: There's an option for filtering RecentChanges to only show certain languages in WT:PREFS, near the bottom. Apparently it was broken for a while. Now fixed. (Also, it doesn't work for those who have the new version of recent changes enabled.) --Yair rand (talk) 04:26, 2 November 2017 (UTC)[reply]

Depending on the quality of the translations themselves, the targeted bilingual dictionaries can be quite good. The lists mentioned above (Matthias Buchmeier's) are quite good (mostly English to foreign language) but require some programming work. Building the reverse (foreign language to English) is apparently much harder and depends largely on the entry structure in those specific languages. --Anatoli T. ^{(обсудить}/^вклад) 06:58, 2 November 2017 (UTC)[reply]

Word Frequencies in Wiktionary

We have just finished removing the {{rank}} information based on some old, problematic parsing of the Project Gutenberg corpus. We still have a few appendices which record that information. My main objection to the inclusion of that data was that it was flawed, and outdated. But I don't roundly object to having word frequency information which is accurate. To that end I have a few questions.

Should we include any frequency data in any manner?
If so, should that data be represented within word entries in some way?
Which corpora should be used, or which frequency lists?
Should original research (of the type used for the old data) be allowed?

One starting place for English (and a few other languages) can be found at the BYU corpus page. It is probably best to avoid getting too deeply into the weeds here, but rather if it seems like there is a general consensus around what should be included we can spin off a project page and figure out all of the details. - TheDaveRoss 17:53, 18 October 2017 (UTC)[reply]

We should certainly not add frequency data to word entries: the data is doubtless interesting in a list, but too unsure and thus not worthwhile enough to maintain in the main namespace; nobody wants to know which words have the nearest frequency if he looks up a word – which is a totally random result of sundry capacities of a language and without fruit for erudition – and it would draw endeavors away from more instructive content creation. And as there is more instructive content to be created by the same endeavors, I also opine that for the collection of frequency data it should be waited until copyright law has been abolished by revolutions in the world and thus representative and illuminatingly separable corpora can be collected. At that time we would only struggle with the technical recognition of what a word be, not of what sources we digest, which together are multiplying error factors. Palaestrator verborum (loquier) 18:25, 18 October 2017 (UTC)[reply]

~~Wut?~~ Sorry, what I meant there was: one, I have no idea what the abolition of copyright law has to do with the inclusion of word frequency on Wiktionary and, two, I disagree with the notion that this would prevent some other work from being done. - TheDaveRoss 18:33, 18 October 2017 (UTC)[reply]

The notion is that there are opportunity costs in collecting telling corpora. I don’t think that one could be content with subtitle databases, as these are slanted to Hollywood and mass productions and their fantasy worlds instead of the whole language that we mean when we talk about the language; and actually those current subtitle collections and most other corpora are non-free either. The web-based corpora which are represented on the BYU corpus site have of course their own problems, with the deep web and the dark web and resources being varyingly crawlable. If we want recent and representative data, we can only go illegal by grabbing Library Genesis, accessing journal and newspaper databases via black channels like Sci-Hub does and things like that, perhaps mixed with subtitles, i.e. things that we cannot perform by the means of the Wikimedia Foundation without endangering it. If there would be no copyright law, there would be a large database of works of all kinds which would constitute good (i.e. the technologically and humanly, not legally best possible) and fast data. This is of course a high standard from which I esteem corpus data valuable, and possibly the view of a philosopher against the practical mind of a programmer. But one can set the doubts even higher by asking oneself how to offset different data sources, like how commensurable web data and journal data and parliamentary debates are, even if one has access to all humanly possible corpora, and if one needs to be the man to have correct information about word frequencies. 
Others could be pleased to see lesser corpus data, but I think that the assumption cannot be rejected out of hand that this is not worth it if there are so many doubts about whether this data represent the actual distribution in language (by my common sense, I often wonder about words not being found at all in large frequency dictionaries) – aside from such data not being valuable maintained in individual word articles, which question is subject to entirely different evaluation criteria, because the intention of a reader opening a word’s entry is different from the intention of a reader opening a word list. Palaestrator verborum (loquier) 19:25, 18 October 2017 (UTC)[reply]

It's about time we asked the question, how can words be real if our eyes aren't real? DTLHS (talk) 19:34, 18 October 2017 (UTC)[reply]

@DTLHS: I'm ded. But it should be "How Can Words Be Real If Our Eyes Aren't Real". —Aryaman ^{(मुझसे बात करो)} 20:40, 18 October 2017 (UTC)[reply]

I have not asked this question, they are valuable abstractions from language – we can explain and describe them –, but you cannot just cast “the language” into a measuring beaker to know an objective distribution of its constituent parts (which is also a mereological problem), as the language you know about is always constructed to some degree as necessitated by material constraints. What we want, in laying out a frequency list which could praise itself of utilizing the methods fit for the object, is to be at least as exact as possible about it, but that is by far not legal. Palaestrator verborum (loquier) 19:45, 18 October 2017 (UTC)[reply]

Palaestrator, your writing style is unnatural and unnecessarily loquacious. I don't know why you are doing it, but I want you to know that your arguments will be taken more seriously if you try to express them clearly and succinctly, rather than in a way that just makes us all think that you're trying to show off. (And as a side note, your understanding of how corpora work in relation to copyright law seems to be flawed, so you might want to try reading up on that first.) —Μετάknowledge^{discuss/deeds} 19:53, 18 October 2017 (UTC)[reply]

It is easy with the law in this case: If the corpus collection is legal (for example the Wikipedia corpus), accessing it is legal; if the data collection is illegal, or the manner of accessing it is (for example a university’s access being used beyond its license, as Sci-Hub does), that is already legally contentious (it is disputed in many jurisdictions, as the United States of America and the Federal Republic of Germany, if just streaming content published in breach of copyright law is violating it, also dependent of dolus), and one should not lay the hands in fire for the collected fruits of such automated accessing, especially if it is commercially exploited as allowed by the licenses used on Wiktionary.

I can’t show off with my writing style, being unnatural in normal people’s view is my expressions’ very nature, or sounding like a 19th-century novel (wherefore being natural though if the matter dealt with is a complicated matter of human culture, as language is? I don’t know why people recall nature when we can surpass it.). And it is not loquacious, I already write off parts of it. Besides, the point of reading it could lie in it saving minds from futile pursuits. Like others, I don’t talk if I prognosticate that my verbosity does not pay proportionately. What is the prospect though of how much work hours the creation and maintenance of those lists take? Palaestrator verborum (loquier) 20:32, 18 October 2017 (UTC)[reply]

It would be nice to have word frequency information available, but there is the serious PoS/Etymology problem (eg, dyke or dike). I am skeptical of both the heavily annotated corpora (which differentiate [or try to] by PoS, but are generally small) or the large corpora (which do not usually make accurate PoS determinations). That said Google N-Grams and the BYU corpora would be fairly useful, though I have not investigated the terms of use for their frequency data. It doesn't seem particularly useful in an entry. It would be very useful to have some kind of quick indication as to what frequency class a given word used in a definiens was in (eg, top 10K, next 40K, next 200K, perhaps next 750K). As an appendix such lists might make it easier for a contributor to check the understandability of a definition. DCDuring (talk) 21:02, 18 October 2017 (UTC)[reply]
Google N-Grams makes possible Reference links like this frequency comparison of canvas and canvass as verb and noun. To me that seems useful to contributors and to passive users. DCDuring (talk) 21:08, 18 October 2017 (UTC)[reply]
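To make the frequency-class idea concrete, here is a small sketch that maps a word's rank to one of the bands suggested above (top 10K, next 40K, next 200K, next 750K). It assumes a rank is already available from some corpus; the band sizes come from the comment above, while the function name and the label for ranks past the last band are invented for this sketch.

```python
from bisect import bisect_left
from itertools import accumulate

# Band sizes as suggested in the comment above.
BAND_SIZES = [10_000, 40_000, 200_000, 750_000]
BAND_LABELS = ["top 10K", "next 40K", "next 200K", "next 750K"]

# Cumulative upper bounds of each band: 10_000, 50_000, 250_000, 1_000_000.
UPPER_BOUNDS = list(accumulate(BAND_SIZES))


def frequency_class(rank):
    """Return the band label for a 1-based frequency rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    index = bisect_left(UPPER_BOUNDS, rank)
    if index < len(BAND_LABELS):
        return BAND_LABELS[index]
    return "beyond listed bands"  # label chosen for this sketch only


print(frequency_class(10_000))   # last rank of the first band
print(frequency_class(10_001))   # first rank of the second band
```

A coarse band like this hides small rank differences, which may be an advantage given the doubts about corpus accuracy raised in this thread, but it does nothing about the PoS/etymology problem.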

I completely agree that frequency at the POS or even sense level is much more useful, but how that might happen eludes me. The concept of not necessarily providing a specific rank, but instead indicating a frequency class of some kind, could be interesting, but the underlying data would suffer in the same way. - TheDaveRoss 11:56, 19 October 2017 (UTC)[reply]
The annotated corpora, like Google N-grams, BYU COCA, as well as smaller ones, support PoS at least. Usually, one etymology accounts for the overwhelming majority of the usage in a given PoS, which we could note in such cases with little OR. DCDuring (talk) 13:11, 19 October 2017 (UTC)[reply]

Thanks for getting rid of it! It's been one of my bugbears for years! --P5Nd2 (talk) 10:06, 20 October 2017 (UTC)[reply]

It would be very interesting to have some corpus-related data available here. It's fairly easy to produce ranking lists from Google's N-gram data, I extracted some a while ago for French and German (only top 1K). – Jberkel (talk) 13:55, 6 November 2017 (UTC)[reply]

If you would like to share your methodology, or simply generate some lists, I am happy to work on the insertion into entry side of things. Obviously understanding the benefits and limitations of any given corpus is critical. Is it possible to do things like extract the top n words from books within a date range? - TheDaveRoss 14:08, 6 November 2017 (UTC)[reply]

The code is on gitlab. If you want I can help to generate some lists. To keep the PoS the script needs to be changed. The date range is flexible. Apparently there are some data quality issues (OCR) in Google's corpus before 1800 and after 2000. – Jberkel (talk) 17:47, 6 November 2017 (UTC)[reply]
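
For illustration only, here is a rough sketch of how a top-n list restricted to a date range could be produced. It is not the script mentioned above. It assumes 1-gram lines in the tab-separated layout of the 2012 Google Books Ngram export (ngram, year, match count, volume count); the function name and parameters are hypothetical, and PoS tags are not handled.

```python
from collections import Counter


def top_words(lines, n, first_year, last_year):
    """Sum match counts per word within a year range and return the n most frequent.

    Each line is assumed to look like: ngram<TAB>year<TAB>match_count<TAB>volume_count
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        word, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if first_year <= year <= last_year:
            totals[word] += match_count
    # most_common returns (word, total) pairs, highest total first.
    return totals.most_common(n)
```

Passing an open file object as `lines` would stream the data instead of loading it all at once. Restricting the range to 1800–2000 would sidestep the OCR quality issues mentioned above.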

Catholicism vs. Roman Catholicism vs. Eastern Catholicism

There seems to me to be an inconsistency with how these terms are used in Wiktionary. I just want to clarify what these terms mean and therefore, we can streamline the usage of these terms in Wiktionary. "Catholicism" or "Catholic" (as capitalized) in common parlance would be the faith or word connoting the Catholic Church (those that are in communion with the Pope in Rome). "Roman Catholicism", on the other hand, means something specific, since it would refer to Catholicism that is using the Roman Rite within the Catholic Church (as opposed to those using other rites such as Byzantine Catholicism, Coptic Catholicism, Syriac Catholicism, etc.). However, historically, the term "Roman Catholicism" was used as a pejorative slur in English-speaking circles for the Catholic Church. Actually, if referring to all western rites (together with the Roman rite, the Ambrosian rite, etc.), it would collectively be called "Latin Catholicism", or "Western Catholicism". "Eastern Catholicism", on the other hand, also means something specific, since it would refer to Catholicism (in communion with the Pope in Rome) that uses any Eastern rite, which would be by Byzantine Catholics, Syriac Catholics, Chaldean Catholics, etc.

The problem now arises. Some call anyone in communion with the Pope as a "Roman Catholic", but almost all Eastern Catholics don't like it, because they say they don't use the Roman rite. Therefore, they don't associate with the term "Roman Catholic", but with whatever their sui iuris church or rite is, like a "Ukrainian Catholic", or a "Greek Catholic". Therefore, some terminology associated with the entire Catholic Church, let's say the Council of Trent, would be associated with the entire Catholic Church, but it is labelled as "Roman Catholic", and Eastern Catholics, since they are in communion with the Pope, would also hold the Council of Trent as true. Therefore, we get Wiktionary entries like patron saint that have both, which is pretty redundant, because one could just label this as simply "Catholicism", and it would be simpler and no one would misunderstand it. Everyone would understand that it means something associated with the Catholic Church.

Therefore, for simplicity, clarity, and completeness of information, I move that all labels under "Roman Catholicism" be changed to "Catholicism" unless the entry is really just concerned with the Roman rite of Catholicism, terms like "Agnus Dei" or a "humeral veil", although I find it redundant too that we need to provide the specific rite within the Catholic Church to which the entry is used. --Mar vin kaiser (talk) 13:16, 19 October 2017 (UTC)[reply]

I thought that in English at least, the term Roman Catholic meant a Catholic who is in communion with the Bishop of Rome, i.e. the Pope, and thus includes Eastern Catholics. The "Roman" is necessary in order to exclude Anglicans and the Eastern Orthodox (who are also considered part of the Catholic Church as that term is used in the Creed). —Aɴɢʀ (talk) 13:54, 19 October 2017 (UTC)[reply]

@Angr: Actually, in those cases, the word "catholic" is written in lower case, which is a common practice among Protestants when reciting the creed, as in the "catholic church". When it is capitalized, as "Catholic", it would refer to the Catholic Church in communion with the Pope. This is exemplified by the fact that when one asks what religion you are, one says "Catholic", "Orthodox" (Eastern or Oriental), or "Anglican", and there is no ambiguity with regards to the term "Catholic" that it automatically refers to the Catholic Church in communion with the Pope. As I said, the reason why "Catholic" should be used instead of "Roman Catholic" is because Eastern Catholics simply do not subscribe to the idea that they are Roman Catholics, because they do not follow the Roman rite, nor any Roman tradition, as started in the church in Rome. They have their own liturgy and practices, distinct from the Roman rite, thus they refuse to be called "Roman Catholic". Since the entries in Wiktionary labelled as "Roman Catholic" also apply to "Eastern Catholics", it's better to label as "Catholic". In religious discussion, people actually differentiate "catholic" and "Catholic", wherein the capitalized word refers to the Catholic Church in communion with the Pope. --Mar vin kaiser (talk) 14:31, 19 October 2017 (UTC)[reply]

That's not how I've ever understood "small-c catholic"; I've always taken it to refer to the nonreligious sense of catholic: "universal; all-encompassing; pertaining to all kinds of people and their range of tastes, proclivities etc.; liberal", while "big-C Catholic" has the religious senses. I suppose that, just as with the word American, there are different meanings to both Catholic and Roman Catholic and different people prefer different meanings and get into arguments with other people as to the "proper" meaning. The trouble is, there is no term that is both unambiguous and commonly used that refers to all churches in communion with the Pope. Both "Catholic Church" and "Roman Catholic Church" are ambiguous as they mean different things to different people, and "Church in communion with the Pope" is unwieldy and not exactly a common term (quite apart from the ambiguity of pope, which can refer to other people than the Bishop of Rome). —Aɴɢʀ (talk) 14:44, 19 October 2017 (UTC)[reply]

@Angr: I see what you mean, and I understand the trouble of ambiguity. How about we follow the precedent of Wikipedia? The Wikipedia entry w:Catholic Church pertains to all churches in communion with the Bishop of Rome. How about using the "Catholic Church" as a label instead? --Mar vin kaiser (talk) 17:10, 19 October 2017 (UTC)[reply]

It still seems so weird and funny to me that the entry "particular Church" is labelled as "Roman Catholic" when almost all of the particular Churches except 1 refuse to be called "Roman Catholic". --Mar vin kaiser (talk) 17:12, 19 October 2017 (UTC)[reply]

@Mar vin kaiser: The Anglo-Catholic in me rebels at seeing "Catholic Church" used to mean only the parts of it in communion with the Pope (and I was opposed to Wikipedia's moving "Roman Catholic Church" to "Catholic Church" several years ago), but the pragmatist in me says I suppose it's the least bad solution. What do others think? I feel like this isn't a decision that should be made by Mar vin kaiser and me alone. —Aɴɢʀ (talk) 13:47, 20 October 2017 (UTC)[reply]

@Angr, Mar vin kaiser: I don't see what the difference between "Catholicism" and "Catholic Church" is. And what about just "Catholic"? Would that be problematic? — justin(r)leung _{{ (t...) | c=› }} 15:14, 20 October 2017 (UTC)[reply]

Come to think of it, I just noticed that if you type "Catholic" or "Catholicism" into Wikipedia, it redirects you to the article of the "Catholic Church". --Mar vin kaiser (talk) 15:27, 20 October 2017 (UTC)[reply]

I don't think there is a final solution, since there is ambiguity no matter which term you use. I think "Catholicism" or "Catholic Church" are probably the best ways to label something pertaining to the Church in communion with the Pope, since it is almost always what people are referring to when they use that term, and using more specific labels like "Old Catholicism" or "Old Catholic Church" for other brands of Catholicism. There's no perfect solution, but I think that's the way to go. We almost need another term, like "Papal Catholicism" (although that still leaves an ambiguity between Roman Catholics and sedevacantists...). :P Andrew Sheedy (talk) 03:55, 22 October 2017 (UTC)[reply]

I prefer to keep it at "Roman Catholicism". It is less ambiguous and not really unwieldy. Lingo Bingo Dingo (talk) 11:24, 25 October 2017 (UTC)[reply]

@Lingo Bingo Dingo: As I said, most entries that refer to Roman Catholicism also refer to other rites in the Catholic Church, such as all the Eastern rites. I think it makes more sense to follow the approach in Wikipedia, which is the label "Catholicism". --Mar vin kaiser (talk) 08:12, 19 November 2017 (UTC)[reply]

I agree that the label should simply be Catholicism, except in instances where there is actually a distinction between East and West. For instance, Eastern Catholics (at least in the Byzantine Rite) do not celebrate Mass, strictly speaking, and they prefer the term Divine Liturgy, so the label currently at the former entry should be left as is. Andrew Sheedy (talk) 20:37, 19 November 2017 (UTC)[reply]

Yes, I read the above discussion and understood that you meant that, but that doesn't address my concerns. "Roman Catholicism" immediately disambiguates from "Christianity based on the ecumenical councils", "Latin Christianity", "Old Catholicism", "Anglo-Catholicism/Anglican Catholicism", "Liberal Catholicism" and whatnot, and "Catholicism" doesn't. In normal use this is hardly ever ambiguous because of context, but there isn't a lot of context that we can cram into a label. Lingo Bingo Dingo (talk) 11:54, 21 November 2017 (UTC)[reply]

And "Roman Catholicism" can also be understood to disambiguate "Roman Rite Catholicism" from "Eastern Rite Catholicism", which isn't solved by using that label. Are Old Catholicism and Anglican Catholicism different enough from Roman Catholicism that the definitions wouldn't apply to them? Andrew Sheedy (talk) 18:15, 25 November 2017 (UTC)[reply]

I find that a less problematic ambiguity as these refer to churches that are in full communion.

Based on skimming the Dutch and English categories, a lot of the titles and terms for objects and rituals would be the same, but some particular terms related to the administration of the Roman Catholic Church, monastic orders or Marian devotion wouldn't fit. Lingo Bingo Dingo (talk) 12:13, 4 December 2017 (UTC)[reply]

Please revert vandalism at WT:LOP

Can someone please undo the vandalism of 2017 October 9 at WT:LOP ?

Almost all the page content was deleted by vandal user 2602:306:3B60:F5F0:9DB8:9A6C:2612:A93 (talk)

For some unknown reason pressing revert or trying to go back to the last good version [2] resulted in an edit filter saying it was impossible to save.

-- 70.51.45.76 06:24, 20 October 2017 (UTC)[reply]

Thanks, done. (Repeated vandalism at Appendix:List of protologisms/Q–Z by similar IPs; may require protection or range block.) Wyang (talk) 06:27, 20 October 2017 (UTC)[reply]

Why isn’t it even possible for a non-sysop user to revert multiple commits, at least by a single IP? This could of course be used by vandals, but just deleting content has the same result, and without it the vandals have an advantage, because they just need to save multiple times to make their changes unrevertable. Or do I just fail to see how this can be done by me as a plain user? There have been a number of cases, however, where I would have been faster than an admin in reverting vandalism. Palaestrator verborum (loquier) 09:10, 20 October 2017 (UTC)[reply]

@Palaestrator verborum: You can revert several edits in a single stroke, although it's (slightly) convoluted. Go to the revision history of the entry; on the left corner, there's a button called "Compare selected revisions". By default, it will compare the current revision with the second-to-last. Change that to the last good version of the entry instead, then click the button "Compare selected revisions", then "Undo". --Barytonesis (talk) 10:21, 24 October 2017 (UTC)[reply]

Removing images of coats of arms

Relatively recently, coats of arms have been added to entries as images.

I propose to remove all images of coats of arms.

Images should help find what the thing referred to looks like or help get a clearer idea of the referent in another way. Coats of arms do not serve the purpose at all. For countries, states and cities, geographic maps seem okay as images.

--Dan Polansky (talk) 07:44, 21 October 2017 (UTC)[reply]

Support removing coats of arms. Support having maps in the entries mentioned. --Daniel Carrero (talk) 10:59, 21 October 2017 (UTC)[reply]

Support and maybe include a map through OpenStreetMap with the extension Kartographer?

Noé 12:03, 23 October 2017 (UTC)[reply]

Support (unless the entry is referring specifically to a coat of arms of course) Pengo (talk) 04:41, 25 October 2017 (UTC)[reply]

Support, except if the coat of arms is relevant to a sense or an etymology. Lingo Bingo Dingo (talk) 11:25, 25 October 2017 (UTC)[reply]

Oppose the proposal to "remove all images of coats of arms" -- as others note above, sometimes an entry's senses may refer explicitly to a coat of arms, as at 分銅 (fundō, “a weight used in a balance scale”) or 茗荷 (myōga, “Japanese ginger”) or 家紋 (kamon, “family crest”). Support removal of coats of arms from entries without such senses, such as entries for geographic entities. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:38, 25 October 2017 (UTC)[reply]

As a few people above: I support removing them as clutter, unless there is some relevance beyond just "being the coat of arms of X" (e.g. if the entry is about coats of arms, or perhaps if it's the arms of Doggingham and there's a dog on it). Equinox ◑ 20:02, 25 October 2017 (UTC)[reply]

I'm with Equinox and others, see aapiskukko. --Hekaheka (talk) 22:19, 25 October 2017 (UTC)[reply]

The coat of arms in aapiskukko is fine, and the intent of my proposal is indeed narrower than the formulation: let's remove a coat of arms from city X if the coat of arms is there only for its being coat of arms of that city; ditto for countries. --Dan Polansky (talk) 12:49, 29 October 2017 (UTC)[reply]

I have removed ten coats of arms (more are pending): Panama, Mexico, Portugal, Latvia, Serbia, Malaysia, Romania, New Zealand, Slovakia, and Samoa. --Dan Polansky (talk) 22:09, 4 November 2017 (UTC)[reply]

I have removed as many coats of arms as I could quickly find. Two search terms: incategory:"English proper nouns" coat of arms; incategory:"Norwegian Bokmål proper nouns" coat of arms. --Dan Polansky (talk) 09:21, 12 November 2017 (UTC)[reply]

Oppose. They are images that relate to the entries in question. DonnanZ (talk) 09:03, 12 November 2017 (UTC)[reply]

WT:WDP and senseid

Not going to hide, I'm eager to have this information directly at Wiktionary.

Probably Template:senseid is not the best template and we would have better solutions.

So I suggest two votings:

if wikidata ids could be used parallel to synonyms (to connect same senses with same wikidata ids)
if Template:senseid is the best option for this

I suggest to start voting on 28-10-2017. d1g (talk) 15:50, 21 October 2017 (UTC)[reply]

Discussion

Discuss any questions before voting here. d1g (talk) 15:50, 21 October 2017 (UTC)[reply]

Wikidata ids in order to capture same senses

Support

Oppose

Comment

Template:senseid as current solution to implement topic above

Support

Oppose

Comment

from vs <

I stumbled upon this old line in Wiktionary:Etymology:

Some editors use the word “from” to separate ancestors, while others use the algebraic “<”. The symbol “<” implies an arrow that points in the direction of language change. There is currently no consensus on a preferred form, but a majority of editors prefer "from" over "<".

There was no clear consensus in 2011, but the "from" side has clearly won out by 2017. Can we update WT:ETY or does it need some kind of vote? Pengo (talk) 16:25, 21 October 2017 (UTC)[reply]

I've deleted that paragraph. --Barytonesis (talk) 16:53, 21 October 2017 (UTC)[reply]

I added the paragraph back: it is true and links to evidence of consensus or its lack. If you can provide evidence of consensus, we can update the paragraph to state what the consensus is. --Dan Polansky (talk) 17:27, 21 October 2017 (UTC)[reply]

Newer user here. I registered two and a half weeks ago and have worked on and read the English Wiktionary heavily since, having almost 10,000 pages from en.wiktionary.org in my browser history since then, and several tens of thousands more for the earlier rest of this year, but I have not seen the use of such a sign since then, or if I have seen it, it has not exceeded five times, and I cannot imagine that a reader can have an honest will to see it.

Instead, it appears to me that if one is an editor that has not forgone caring for the visual appearance of a linguistic entry, one is inclined by one’s refinement to substitute such figures en passant; I don’t think that it is good typography and I must own that it would be an understatement to claim that I would need to tax my brain to apprehend what such a mark is intended to signify. It is hard for someone who is lacking of any background of having it seen used other than in unavoidable mathematical education and on the other hand it is hard for a mathematician likewise because he who knows mathematics is irritated by any use of mathematical characters for which he has trained much to have specific concepts. Also I am concerned that it is a bad habit to use that sign anywhere at all on the web with the intention of its glyph being displayed, as its meaning is restricted to being part of the XML and HTML markup and one can make a mess with it easily or have it filtered (on other sites if not here).

Uses in print are based on space rationales while the sign does not belong to the inventory of any writing system, but else you can with as much success use ⊷ or ⊱ or √ if you use < – what you want to express with it is no less obscure, as using “<” is a far-fetched trope no matter of how much frequency it has; “smaller than” in mathematics does not overtly map to “borrowed from” or “inherited from” (the keeping apart of which is also double-crossed by it), and the other uses of the sign on the web further distort the meaning up to conviction that this is not a sign that belongs into etymology sections of community-edited web dictionaries. Of course the same applies to many other parts of the web and goes with the characters “>”, “=”, “"” and “~” in so far as these characters are not used in any language but in the scribal traditions of special disciplines. The fact that your computer has a character on your keyboard does not at all recommend it being used outside of a computer-context, and particularly it can not ever be a standard as it does not need to be included at all in a keyboard layout, or in the keyboard if the key that would carry it is missing as is sometimes the case with the ⟨LSGT⟩ key that carries both ASCII angle brackets in half of Europe (vulgo: the one right of the left shift key).

Is there anything about the “<” sign that outweighs the detriments of its usage? Meseems I have falsified everything that one could possibly say in favor of it, and that this my posting could as well have a place in Wiktionary:Etymology. Come hither, defenders of U+003C in web dictionary etymologies! Can you hold anyone at your side? Or will you shrewdly ignore the issue because of convenience? A community decision would be relieving for the purpose of providing a reference that using “<” for etymologies is uncouth, in the ambit of the rationale. I quit writing for it now, as it seems that I have written the top beer parlor post of this year by length without even being inebriated, but I hope that this has an effect, for the quality of the English Wiktionary and maybe other projects, as it has taken me three hours of thinking and explaining and I have in this time done my best to garner your deltas and I want to spare people from repeating the formulation of the thoughts I have laid down. Palaestrator verborum (loquier) 23:17, 21 October 2017 (UTC)[reply]
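
On the markup concern raised above: a literal "<" has to be escaped before it can be displayed in HTML or XML, because the character otherwise opens a tag. A small sketch using Python's standard `html` module follows; the chain "A < B < C" is only a placeholder and does not come from any entry.

```python
from html import escape, unescape

# Placeholder derivation chain written with the less-than sign.
chain = "A < B < C"

# escape() replaces "<" with the entity "&lt;" so that a browser shows the
# character instead of treating it as the start of a tag.
escaped = escape(chain)

# unescape() reverses the substitution and recovers the original text.
restored = unescape(escaped)

print(escaped)
print(restored)
```

A site that strips or mishandles unescaped angle brackets could drop part of such a chain, which is the kind of mess described above. The word "from" needs no such treatment.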

I have now read the prior utterances in the 2011 thread, and I highlight that the 2011 vote was about introducing a certain format with “from” as a practice. However, now the thread is about whether U+003C should be replaced, not about what can be used in general. From what I have fished out there, the reasons why people actually opposed were not very relevant. One (Mglovesfun) said: “< is easier on the eye than 'from', not least because in reading I 'internally' pronounce the 'from' but for < I pronounce nothing” – which can only very conditionally be true, as I have shewn that its nature is obscure and the “internal pronunciation”, if it exists, which is a dubious phonocentristic claim, cannot have weight as one asks oneself what to pronounce for “<” ⇨ it is illogical. Other people opposed because they had to read “a page full of verbal diarrhea” or because they had been asked to vote (SemperBlotto literally), and mostly because they did not want it to be included in the Wiktionary:Etymology page, thus mostly for formal reasons only. One utterance by some Stevey7788 in opposition is exemplary of the votes in opposition missing the point:

“The "<" sign is easier to read, as wordiness and lack of conciseness tend to cause confusion. This symbol is also widely used in academic publications on historical linguistics. Readers would all quickly learn what "<" means, since it should be very intuitive.” It is the job of the writer not to cause confusion when using words, but “<” does that as a rule; one cannot learn what it means because it means multiple things and one has to interpret it in context, while words can be artful. Also the usage of however-academic publications does not count as we work on a different publication type with unlike constraints and allowances. I want to point out one point that I have not really touched: “less than is not readable for users of screen readers, braille displays, or other assistive technologies.” (Neskaya)

Another guy (Bogorm) wrote that it “is useful in longer derivation chains” – there are ways more useful for the reader to show derivation chains; you write “<” because you are too inert to think of another thing and to accomplish it. Of course it is not that much a sin that it justifies punishment, but there should be a canon for confirmation of the replacement of such inertia. What is Unicode for if people opt for ASCII? What is the increasing store and display space in computers for if they use such mediocre shorthands?

We can do better in reaching a consensus by starting with one summary, as I have started. Palaestrator verborum (loquier) 00:04, 22 October 2017 (UTC)[reply]

The uses of "<" largely disappeared, sure, but whether that was by "consensus" is not entirely clear. I don't believe there are any conclusive arguments unequivocally in favor of "<" or "from"; how to weigh various pros and cons is a matter of preference. Wiktionary:Votes/pl-2011-02/Deprecating less-than symbol in etymologies did not show consensus. The current text in Wiktionary:Etymology does not mislead a new user, I believe. A new user can read the vote, see that nearly a supermajority (2/3) supported "from", look around a bit, and see that "from" has won in the mainspace. That said, another vote may be in order if we want Wiktionary:Etymology to indicate "from" as the recommended practice. --Dan Polansky (talk) 06:04, 22 October 2017 (UTC)[reply]

I disagree that there is "increasing ... display space in computers". Most people seem to use phones or tablets now. Equinox ◑ 16:06, 22 October 2017 (UTC)[reply]

FWIW, from a usability and understandability perspective, I recommend "from" instead of symbolic notation. This is only three characters longer, and it is clearer. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:17, 23 October 2017 (UTC)[reply]

A possible argument against plain "from" (I do not think it is decisive, but it deserves to be out there): "[A] < B < C" clearly indicates seriality, that is "A comes from B, which comes from C". Our current practice "[A] from B, from C" is in principle parseable also in parallel: "from B and from C", i.e. "A comes partly from B and partly from C".

This could be ameliorated with some extra prose: "from B, which comes from C", but this seems excessive. IMO a simple note somewhere relevant on how "from B, from C" is to be interpreted (maybe in the Glossary?) should be sufficient. --Tropylium (talk) 14:45, 28 October 2017 (UTC)[reply]

Move `{{was wotd}}` notices to talk pages

Would anyone support this? I'm not sure of the utility of this template to a reader beyond archiving old words of the day, and the category Category:Word of the day archive can function as an archive with talk pages. DTLHS (talk) 05:33, 22 October 2017 (UTC)[reply]

Why though, is it too ugly? I am afraid it has exactly zero use, only being a source of work that could be used for removing or fixing other template usages. Also it would effectuate additional clicks ad infinitum because people would have to go to the talk pages to check if a word already has been word of the day. Palaestrator verborum (loquier) 05:44, 22 October 2017 (UTC)[reply]

Support moving "was wotd" to talk pages. I don't see why a reader should care to see immediately whether a word was once the word of the day. --Dan Polansky (talk) 06:07, 22 October 2017 (UTC)[reply]

Having it on the talk page is pointless. I personally find it an interesting bit of information. An alternative would be a category (although not very visible, still better than dumping it on talk). – Jberkel (talk) 16:01, 22 October 2017 (UTC)[reply]

@DTLHS: delete from the main space in any case, it's visual clutter. --Barytonesis (talk) 15:46, 31 October 2017 (UTC)[reply]

I disagree, I think it is useful to keep them where they are. It probably helps prevent the same word being nominated for WOTD again. DonnanZ (talk) 11:47, 12 November 2017 (UTC)[reply]

Nobody seems to have consulted @Sgconlaw. DonnanZ (talk) 11:54, 12 November 2017 (UTC)[reply]

Disagree: I think it would be more useful on the entry page. I have a feeling that if it is relegated to the talk page, new nominators will simply not bother to check whether it is there before nominating words. (Also, pinging @Metaknowledge as the result of this discussion would apply to {{was fwotd}} too.) — SGconlaw (talk) 15:03, 12 November 2017 (UTC)[reply]
Disagree. Thanks for the ping, Jack. If there really were a consensus that it is too much visual clutter (which I don't believe there is), the correct change would be to modify the template's behaviour so that it does not display but continues to categorise, not to move it to the talk page and create more work for everyone. —Μετάknowledge^{discuss/deeds} 19:25, 12 November 2017 (UTC)[reply]
In that case the category would need to be made visible to all users, otherwise the template ceases to have the effect of alerting users to the fact that a word has already appeared as WOTD or FWOTD. — SGconlaw (talk) 03:41, 13 November 2017 (UTC)[reply]

Category:Buyeo language

I just RFV-failed our only two entries: 乙那 and 加. Is it clear that there was a Buyeo language, and if so, are there any texts that are indisputably in Buyeo?__Gamren (talk) 16:07, 22 October 2017 (UTC)[reply]

Oh, and @suzukaze-c, -sche, Pedrianaplant.__Gamren (talk) 16:09, 22 October 2017 (UTC)[reply]

From various things I've read, there seems to be agreement that Buyeo had a language of its own. C.f. Buyeo languages on WP. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:21, 23 October 2017 (UTC)[reply]

@Eirikr I did read that, but it did not give me the impression of certainty. The article does list Buyeo as a Buyeo language (which no doubt leads to misunderstandings), but their article on the language redirects to the article of the ("hypothetical") language family. Did any of the things you read make distinctions between the languages, and did they identify any texts as belonging to the language Buyeo, or do they reconstruct words?__Gamren (talk) 15:38, 26 October 2017 (UTC)[reply]

Poll: deploy timeless skin

It'd be great to have the Timeless skin deployed here (opt-in, of course). Wiktionnaire, the French Wikipedia and a bunch of other sites have it already (see T154371 for a list). It has a sticky header and a responsive design that handles different screen sizes much better than vector (demo). We need to have community consensus before it gets deployed, hence this poll. – Jberkel (talk) 16:14, 22 October 2017 (UTC)[reply]

The tables of content in articles of Wiktionnaire look flattering, a bit like i3-gaps and particularly more readable. I have also the perception that I can have more on my screen in general with it even though it looks like i3-gaps. So if it gets deployed, we only get a plus, as one can use other skins? So there is no reason to haggle and I assent. Palaestrator verborum (loquier) 20:42, 22 October 2017 (UTC)[reply]

I see no reason not to. (as long as it's not default) —Aryaman ^{(मुझसे बात करो)} 22:57, 22 October 2017 (UTC)[reply]

If it's not going to immediately replace Vector, I see no harm in including it as an alternative users can choose. —suzukaze (t・c) 19:48, 23 October 2017 (UTC)[reply]

Oppose making the skin default. Sticky headers are a horrible thing, and I am so happy this fashion of the day has not reached Wiktionary yet. I dread the day on which this kind of design fashion prevails on the English Wiktionary. I don't know what to think of an opt-in deployment; I guess if someone wants to use it, that's up to them, and therefore, I don't oppose deployment as opt-in. --Dan Polansky (talk) 06:57, 28 October 2017 (UTC)[reply]

I hate it. I'm opposed to any loss of space to the width of the article and the font size is way too large. Wiktionnaire can keep it. --Victar (talk) 22:56, 21 November 2017 (UTC)[reply]

FYI, it seems that the development team is talking about deploying the timeless skin on every wiki. Pamputt (talk) 07:56, 19 November 2017 (UTC)[reply]

We should probably add it. I could live with the "sticky header". Might stick with the theme I've got, though. Equinox ◑ 16:11, 21 November 2017 (UTC)

This is now deployed on all wikis (opt-in). Our main page looks a bit funky, especially the announcement boxes. And tabbed languages browsing is broken. It's all fixable though. – Jberkel (talk) 11:41, 22 November 2017 (UTC)[reply]

The user Equinox is abusing his administrative authority.

On the Discussion page of the entry glow-up, I provided sources to prove the entry's admissibility to Wiktionary according to all rules of this site. On the first page of Google, you can see the phrase is used by many notable YouTubers with millions of subscribers and well-known sites such as Buzzfeed, Reddit, Pinterest, and many blogs, over the course of many years. Still, the user Equinox let his feelings cloud his judgment and deleted the evidence and blocked me. I will do everything until the user Equinox gets sanctions for his administrative abuse of powers. Wiktionary and Wikipedia have rules and he violated many of them with this abuse of authority. He is an angry INTJ that should never be given any authority. He can't control his emotions. He is poison that hinders Wiktionary's growth. — This unsigned comment was added by 175.193.153.88 (talk).

I beg to differ. You removed an RfV template before the result of the verification process. That is a long-standing cause for blocking. You need to understand our rules. SemperBlotto (talk) 12:25, 23 October 2017 (UTC)

The deleted talk page contains one link: Proof. Ditto. See also User_talk:175.193.153.88 with endless requests to be unblocked. The block is justified. The IP talk page should be blocked as well. --Anatoli T. ^{(обсудить}/^вклад) 12:42, 23 October 2017 (UTC)[reply]

@175.193.153.88: "He is poison that hinders Wiktionary's growth." lol, Equinox has the highest edit count on the whole project. I too suggest that you be blocked. —Aryaman ^{(मुझसे बात करो)} 22:51, 23 October 2017 (UTC)[reply]
"He is an angry INTJ". In case anyone is wondering, you can safely ignore the above user--his claims are baseless. —Justin (koavf)❤T☮C☺M☯ 00:37, 24 October 2017 (UTC)[reply]

I laughed because I really am an angry INTJ. (I've taken that test twice, ten years apart.) Equinox ◑ 00:39, 24 October 2017 (UTC)[reply]

Section "Descendants"

Could someone edit Wiktionary:Entry layout and explain a bit more the scope of the section "Descendants"? It's not supposed to be used to list all the derived terms, but only the direct descendants. An example. --Barytonesis (talk) 09:52, 24 October 2017 (UTC)[reply]

Well I agree. Some editors take it a bit too far and add numerous English derivatives to a basic Latin root entry even if there are other Latin entries that are more immediate ancestors to those words; in that case, I take them out and keep them on the appropriate page. However, I sometimes add a derived term that may not be directly descended if there is no other lemma entry it could go under, but the word is still notable, and perhaps the only "descended" term from that word, even if not directly and through another intermediate (in this case Vulgar Latin) term. Like with poitrine; although in this case pis does exist... but poitrine is now the main word for chest. I mean, should we start adding links to the separate reconstruction pages for Vulgar Latin derivatives in the main Latin entries as opposed to just putting the derived Romance term itself there? Word dewd544 (talk) 20:53, 30 October 2017 (UTC)[reply]

@Word dewd544: You're right, and I was actually a bit hesitant to remove poitrine; I was targeting more specifically the compounds, and especially the neo-classical compounds, which really don't belong there.

On the one hand, I'd prefer we always put the descendants on the page of their very etymon; on the other, I don't want to see all the info disseminated on a myriad of pages, and I certainly don't want to create a Vulgar Latin entry every time.

So what do you think of this? The layout is not terribly pretty but... --Barytonesis (talk) 11:41, 6 November 2017 (UTC)[reply]

It's not bad. I personally like it and think it can work, but there are a few issues I thought about. And this kind of relates to a broader problem. I remember back when I first started, some users were saying that although the nested type of Descendants sections looked better and were more accurate, they hesitated making that a policy because new users unfamiliar with the precise etymological histories of certain words would not necessarily know where to add them, and preferred just having one big descendants section with each descendant word on its own line regardless of the path the word took to get to it. I personally don't much like putting the word "borrowed" or "borrowing" in parentheses after words in descendant sections because it makes it look sort of cluttered and messy (and there are also words that we're not sure about either), but for lack of a better approach I've been doing that lately. I like the big nested tree of Romance languages that are made for some entries and linked to (and edited separately) but again that would depend on people who are well-versed in these things to sort out and probably preclude your more average user from contributing. The same applies with having a Vulgar Latin derivative section... also where would terms that we are unclear about fit, and what if there end up being like six different Vulgar Latin sub-sections because of how differently various languages evolved them? Word dewd544 (talk) 23:27, 6 November 2017 (UTC)[reply]

@Word dewd544: sorry for this late (and for now incomplete) answer. Mh, maybe we don't have to make it a policy. In my view, users should be allowed to add descendants on their own lines without having to worry where it should go precisely, but we would still make it clear that the end goal is to have them accurately sorted. We won't force anyone to do the sorting themselves, but if they can, let them do it. What do you think? --Barytonesis (talk) 13:22, 21 November 2017 (UTC)[reply]

Yeah even before reading this I was thinking along the same lines. There shouldn't be anything to stop new users from adding a word as a descendant of a lemma, but eventually the goal would be to have them sorted out properly. If they know how, then sure, but it may involve people like us or others "cleaning up" some of the pages. Right now there's not that many other people making major or drastic changes to the Romance and Latin pages so it's not like we have to worry about this on a large scale. I'm still steadily going through the inherited lexicon of the Romance languages and I'll do it the way we discussed here. Word dewd544 (talk) 22:41, 21 November 2017 (UTC)[reply]

Prompted by Barytonesis to participate following a discussion I had with Rua, here are my two cents. I've taken issue lately with an anon adding English descendants which aren't directly derived from the lemma entries. For instance, salient is indirectly related to salio, but isn't the leap a bit too big to motivate the inclusion of salient as a descendant of salio? As I pointed out to Rua, where do we draw the line? E.g. consequently I could add the following Romanian words to descendants of salto, because they're inevitably from the same source: salt, sălta, săltare, săltăreț, săltător, săltătură, saltație, etc. The notion of adding every word remotely descendent of a specific lemma would create utter chaos in our descendants sections, and that scares the crap out of me. --Robbie SWE (talk) 19:52, 18 November 2017 (UTC)[reply]

salient is from saliens, the present participle of saliō. Participles are non-lemmas, so we add the descendant to the lemma saliō. If we did not follow this principle, and instead demanded that descendants be listed at the exact form, then French sauter couldn't be listed at saltō either. —Rua (mew) 19:56, 18 November 2017 (UTC)[reply]

Sorry Rua, but salient is already listed as a descendant of saliens. It's not appropriate to add salient as a descendant of saliō. I'd appreciate it if you answer my question about the Romanian terms – do you think it's appropriate to add all those terms in the descendants section at saltō? --Robbie SWE (talk) 20:16, 18 November 2017 (UTC) PS: all of the descendants currently listed at saltō have inherited the Latin term through the present infinitive saltāre. --Robbie SWE (talk) 20:36, 18 November 2017 (UTC)[reply]

When did they make the decision to make participles non-lemmas? I just noticed that recently. Word dewd544 (talk) 00:24, 20 November 2017 (UTC)[reply]

@Word dewd544, I don't remember when it was, but I vividly remember a discussion about it. I also recall Rua participating in that discussion. --Robbie SWE (talk) 19:15, 20 November 2017 (UTC)[reply]

Participles have always been non-lemmas. They were most likely listed in the inflection tables of Latin verbs from the beginning. Same for English, but in the headword line. —Rua (mew) 19:34, 20 November 2017 (UTC)[reply]

@Rua, Robbie SWE: If we're not going to treat present participles as lemmas, I actually agree with Rua that we should move the descendants listed at saliens back to salio, but iff we create subsections in the descendants section ("From the present participle saliens", "From the infinitive saltare" (although that might be pushing it), etc.), as I suggested above (cf. pectus). Otherwise, no; I don't want to have all the derived terms lumped together. --Barytonesis (talk) 20:10, 20 November 2017 (UTC)[reply]

@Barytonesis, it could work. However, I still don't think the problem is solved. --Robbie SWE (talk) 20:15, 20 November 2017 (UTC)[reply]

But Latin participles used to have actual definitions; in the case of saliens, jumping, leaping, springing, etc. If you look in the page history, that was the case. Now they've been removed and just described as participles. Word dewd544 (talk) 13:05, 21 November 2017 (UTC)[reply]

@Word dewd544: I know, I removed them, rather hastily I should add. But either we treat present participles as full lemmas (in which case it has to be reflected in our categorisation scheme), or we don't, in which case we shouldn't have translations or descendants hosted there. --Barytonesis (talk) 13:10, 21 November 2017 (UTC)[reply]

I don't like that policy. It will clutter up the descendants sections of the main verb too much. Word dewd544 (talk) 13:16, 21 November 2017 (UTC)[reply]

Request for review

I've added etymology to πανούκλα (diff). Could a more experienced editor check it and correct my formatting where necessary, please? In particular I wasn't sure whether the etymology for a specific sense should be put in-line with that definition or at the top. Thanks very much! -Stelio (talk) 09:58, 24 October 2017 (UTC)[reply]

Hi. You can ping me on the talk pages of the entries if you want; I'm always happy to work on MGr. In any case, the Beer parlour isn't the place to ask; you should go to the Wiktionary:Etymology scriptorium or Wiktionary:Tea room instead. --Barytonesis (talk) 10:06, 24 October 2017 (UTC)[reply]

@Stelio: I've done this. The inline etymology wasn't too shocking in this case, but I think it's best to put it in the etymology section anyway. --Barytonesis (talk) 10:12, 24 October 2017 (UTC)[reply]

Perfect; thank you very much! (This seemed the best place to me, since the question was on how to format the etymology, rather than what the etymology is. But duly noted for the future.) -Stelio (talk) 10:24, 24 October 2017 (UTC)[reply]

Yes, actually it does belong here rather than there. --Barytonesis (talk) 10:30, 24 October 2017 (UTC)[reply]

ISO codes as Wiktionary entries

We've got entries like SH for "The ISO 3166-1 two-letter (alpha-2) code for Saint Helena." But we don't have entries for the ISO 639 language codes (where SH is the deprecated code for Serbo-Croatian). Arguably the language codes are of greater practical value to users of Wiktionary (for translating what the local codes mean) than the country codes. Options include:

Status quo: keep existing ISO entries; exclude language code ISO entries
Deletionist: remove existing ISO entries; exclude language code ISO entries
Inclusionist: keep existing ISO entries; include language code ISO entries
Alternative: remove existing ISO entries; include language code ISO entries

Is there a clear policy on this already, and if not is this worth putting to a vote? (I'm hesitant to jump straight into calling a vote, not having done that here before.) -Stelio (talk) 11:23, 24 October 2017 (UTC)[reply]

We had a vote on this 7 1/2 years ago, which more people supported than opposed, but not enough more for it to pass, so the motion failed for lack of consensus. Perhaps 7 1/2 years is long enough to warrant a new vote to see if there's a clearer consensus now than there was then. —Aɴɢʀ (talk) 11:34, 24 October 2017 (UTC)[reply]

I oppose the creation of entries for any kind of code, unless they are used in running text. —Rua (mew) 14:31, 24 October 2017 (UTC)[reply]

I am likewise skeptical about inclusion of codes, and in this case I point out that existing endeavors on Wikipedia to create ISO 639 code lists combined with codes being included on disambiguation pages of Wikipedia are preclusive for Wiktionary containing them, even if just for reason of parsimony – I really would like to see you’all doing other things than including that. I also want to call your attention to language codes being more contestable than country codes, as the existence of countries is mostly fixed, while if language codes are started, we will have quarrels about people including codes which are not ISO but common and things like that, cluttering things that are difficult to ascertain. Things that the people at Wikipedia are more sturdy to be tackled by. The country codes may stay – I am conservative and I don’t feel itched by them. Palaestrator verborum (loquier) 22:28, 24 October 2017 (UTC)[reply]

Most of the historical discussion is at Talk:jv and Talk:de. I agree with the broad consensus that emerged in those debates, namely that we should have entries for codes if and only if they pass CFI like any other term. —Μετάknowledge^{discuss/deeds} 20:01, 25 October 2017 (UTC)[reply]

Coptic standardisation

@Aearthrise, Algentem, DerekWinters, Vorziblix but also to any interested: I have started a section to standardise a few practices in editing Coptic. The intention is to write down the outcome of the discussion at the Coptic policy page. Lingo Bingo Dingo (talk) 12:08, 24 October 2017 (UTC)[reply]

Etymology of Copto-Greek verbs

There are three ways in which Coptic dialects have borrowed verbs from Greek. Sahidic borrows them as bare imperatives, Akhmimic and Lycopolitan combine the nominal state of ⲉⲓⲣⲉ with the Greek imperative, whereas Bohairic uses the nominal state of ⲓⲣⲓ/ⲓⲗⲓ (same verb) with the Greek infinitive. Fayyumic may use both the Bohairic and the Sahidic methods, this being somewhat variable depending on the dialect. Bare infinitives are not used in any dialect.[3]

The treatment of Greek etymologies has begun to diverge a little for Bohairic and Sahidic, compare for instance ⲉⲣⲫⲟⲣⲓⲛ (erphorin) and ⲫⲟⲣⲉⲓ (phorei). I think the best option is to treat the Greek stems like borrowings and to put the nominal state before it as needed, e.g. Bohairic ⲉⲣ- + φορέω, using {{bor}}. Unattested bare infinitives could then be avoided in Coptic text. Maybe it would also be nice to add (in the future) the specific Greek conjugated form in which a word was borrowed, but that may prove a hassle for verbs that do not yet have a Greek entry. Lingo Bingo Dingo (talk) 12:08, 24 October 2017 (UTC)[reply]

@Lingo Bingo Dingo: I agree. We should write the etymology something like: From {{prefix|cop|ⲉⲣ}}{{bor|cop|grc|φορέω|notext=1}}. — Algentem (talk) 13:50, 7 November 2017 (UTC)[reply]
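As a rough illustration of the etymology format suggested above, here is a minimal Python sketch that builds the wikitext string. The function name and its parameters are hypothetical, and the bare (no nominal state) output format is an assumption; only the prefixed form is taken from the proposal in this section.

```python
from typing import Optional


def copto_greek_etymology(greek_lemma: str, nominal_state: Optional[str] = None) -> str:
    """Build etymology wikitext for a Coptic verb borrowed from Greek.

    greek_lemma   -- the Greek lemma, e.g. "φορέω"
    nominal_state -- the Coptic nominal state placed before the stem
                     (e.g. "ⲉⲣ" for Bohairic), or None for dialects that
                     borrow the verb bare (assumed output format).
    """
    bor = "{{bor|cop|grc|" + greek_lemma
    if nominal_state is None:
        # Bare borrowing: let {{bor}} supply its own default text.
        return bor + "}}"
    # Prefixed borrowing, following the format proposed in this section.
    return "From {{prefix|cop|" + nominal_state + "}}" + bor + "|notext=1}}"


print(copto_greek_etymology("φορέω", "ⲉⲣ"))
print(copto_greek_etymology("φορέω"))
```

Passing the nominal state in as a parameter avoids hard-coding any dialect form that is not spelled out in the discussion.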

Nominal and pronominal states

The nominal state and the pronominal state are specific forms of words that occur before nouns/adjectives and pronouns respectively. These states can exist for nouns, verbs and prepositions. The scholarly convention is to mark nominal states with a hyphen at the end and pronominal states with a double oblique hyphen at the end, indicating the position of the argument. It seems wise to standardise this before construct states are added in bulk.

Some matters to be solved are:

Should entries be created with the page name as the unhyphenated form (e.g. ⲛ, ⲙⲙⲟ), exclusively as the hyphenated form (e.g. ⲛ-, ⲙⲙⲟ-), or hyphenated for nominal states and double-hyphenated for pronominal states (e.g. ⲛ-, ⲙⲙⲟ⸗)?
Similarly, how should construct states appear in head templates? So should the displayed pronominal state be ⲛⲁ, ⲛⲁ- or ⲛⲁ⸗?
Should construct states have a L3/L4-header as prefixes or as specific parts of speech?
Which form (absolute state, nominal state or pronominal state) should be made the lemma?

Different implementations of 2 and 3 have been tried out at ⲛ- (n-).

I strongly favour displaying nominal states with a hyphen and pronominal states with "doubliques" to follow the modern convention. (point 2)

For lemmatisation I prefer the absolute state, then the nominal state and finally the pronominal state if the other states are unattested. This order of preference is used in most dictionaries. (point 4)

My opinions on the other issues aren't very strong, though I somewhat prefer to have pronominal state entries on pages ending with a hyphen rather than a double hyphen. (point 1) For lemmas it should in my opinion always be clear to the reader what the part of speech is, but I don't favour a particular implementation. (point 3) Lingo Bingo Dingo (talk) 12:08, 24 October 2017 (UTC)[reply]

@Lingo Bingo Dingo:

It would be fine to add hyphenated forms for nominal states and double-hyphenated forms for pronominal states.
The pronominal state should be displayed ⲛⲁ, ⲛⲁ- or ⲛⲁ⸗.
If I understood your question correctly, construct states should have a L3/L4-header as specific parts of speech because verbs; Bohairic can differ widely in construct states.
The nominal state should be made the lemma.
--Aearthrise ^{(Ⲁⲉⲁⲣⲑⲣⲓⲥⲉ)} 00:11, 25 October 2017 (UTC)

@Aearthrise: Generally I agree, but why the nominal state? Standard practice in Coptic reference works is to lemmatize at the absolute state if possible, and most of the existing entries (including most of them that you’ve added) are found at the absolute state, but I’m open to arguments if there are advantages to favoring the nominal state instead. — Vorziblix (talk · contribs) 06:42, 26 October 2017 (UTC)[reply]

Maybe he was mostly thinking of prepositions, which generally don't have an absolute state, when he wrote that? The question was perhaps a little too concise. Lingo Bingo Dingo (talk) 09:32, 26 October 2017 (UTC)[reply]

My confusion derived from the nomenclature of the states; I learned their names as long form, short form, abbreviated form, and past participle form. I agree that we use the absolute state (long form).

--Aearthrise ^{(Ⲁⲉⲁⲣⲑⲣⲓⲥⲉ)} 10:04, 26 October 2017 (UTC)

@Lingo Bingo Dingo:

1 I am ok with the hyphenation and double-hyphenation as long as double-hyphenation is not too cumbersome.

4 I agree that the absolute state should be lemmatized. DerekWinters (talk) 20:53, 6 November 2017 (UTC)[reply]

@Lingo Bingo Dingo: I think that the absolute state should be the lemma. That is the common practice, and most nouns only have the absolute state anyway.

I think that the nominal and pronominal states should be displayed with a hyphen and a double hyphen (oblique or straight?) respectively, but be linked without it (but we do add hyphens to regular prefixes, so I'm not sure). When we create a page for a nominal or a pronominal state we can add the hyphen in the header if we don't have it in the entry.

I also think that nominal and pronominal states should be labeled as prefixes, as they can't stand on their own, but I might change my mind on this in the future.

I created a template that we can use called {{cop-noun}}. It's fully built on the header template, so it works the same. I would like input on it. The only thing I'm unsure about is the plural. I added it at the front, but most entries put it behind the nominal and pronominal states—is there a reason for this? Also, if you don't input a plural form, it assumes that the plural form is identical to the singular and adds that (is this superfluous?). I made it so that you can have up to three plural, nominal and pronominal forms, are more needed? — Algentem (talk) 13:50, 7 November 2017 (UTC)[reply]

Labelling nominal and pronominal states as prefixes is fine in my book, that's basically how they function.

I am afraid (and actually quite certain, though I can't name any examples now) that we may need more plural parameters than three, but I don't know what the attested maximum is.

By the way @Vorziblix, how does {{pre}} handle double oblique hyphens? Lingo Bingo Dingo (talk) 13:03, 22 November 2017 (UTC)[reply]

@Lingo Bingo Dingo: Hmm, right now, I don’t think it does... I’ll look into Module:compound, where I believe this would be handled, and see if I can figure out how to fix that. We might need the help of someone more experienced with Lua scripting, though.

@Algentem: Thanks for making the headword template! That’s something I’d been wanting to do for a long time too. I have no objections to having the plural at the front, and I think showing the identical plural when none is input was a good choice. However, as Lingo Bingo Dingo says, allowing more than three plurals is probably needed — e.g. ⲁⲗⲱ (alō, “pupil”) has the plurals ⲁⲗⲟⲟⲩⲉ (alooue), ⲁⲗⲁⲩⲉ (alaue), ⲁⲣⲟⲟⲩⲉ (arooue), and ⲁⲗⲱ (alō) all attested in Sahidic.

As far as the other issues raised go: I’ve no objection to labelling (pro)nominal states as prefixes if consensus leans in that direction. Regarding the straight vs. oblique double hyphen, the oblique was traditionally used, with the straight only substituted when the oblique was typographically unavailable. Finally, based on the practice of other Coptic dictionaries, and for consistency with other languages on Wiktionary, I’d support linking (pro)nominal states with hyphens and having them in the entry name, particularly if we put them under a prefix header. This also has consequences for templates like {{af}}, where linking with hyphens would categorize a term as prefixed, while linking without would categorize it as a compound; the former seems to be the way to go if we’re treating these states as prefixes. — Vorziblix (talk · contribs) 22:54, 22 November 2017 (UTC)[reply]

Regarding labelling as prefixes, I'd like to clarify that I'm thinking of something like context labels or categories, and that my preference would still be to have POS headers and headword lines as the functional POS. Lingo Bingo Dingo (talk) 12:01, 24 November 2017 (UTC)[reply]
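To make the display convention and the lemma preference discussed in this section concrete, here is a small Python sketch. It assumes the "doublique" is U+2E17 (DOUBLE OBLIQUE HYPHEN); the function names and the dictionary layout are hypothetical and not part of any existing template or module.

```python
DOUBLE_OBLIQUE_HYPHEN = "\u2e17"  # assumed code point for the "doublique"

# Marker appended to each state when displayed.
STATE_MARKERS = {
    "absolute": "",
    "nominal": "-",
    "pronominal": DOUBLE_OBLIQUE_HYPHEN,
}


def display_form(form: str, state: str) -> str:
    """Return the form with the marker conventional for its state."""
    bare = form.rstrip("-" + DOUBLE_OBLIQUE_HYPHEN)  # drop any marker already present
    return bare + STATE_MARKERS[state]


def pick_lemma(forms: dict) -> str:
    """Pick the lemma: absolute state first, then nominal, then pronominal."""
    for state in ("absolute", "nominal", "pronominal"):
        if forms.get(state):
            return display_form(forms[state], state)
    raise ValueError("no attested state given")


print(display_form("ⲛ", "nominal"))
print(display_form("ⲙⲙⲟ", "pronominal"))
print(pick_lemma({"nominal": "ⲛ", "pronominal": "ⲙⲙⲟ"}))
```

The fallback order in `pick_lemma` mirrors the order of preference stated above for lemmatisation; whether page names should carry the markers is a separate question that this sketch does not settle.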

Dialect tags for Copto-Greek words

Greek borrowings were previously not tagged with dialect labels, but they can differ in dialects and some words are completely absent from some dialects. Therefore telling in what dialects a word is attested is useful information for a reader. This is a proposal to make labelling dialects the norm in Coptic. Lingo Bingo Dingo (talk) 12:08, 24 October 2017 (UTC)[reply]

@Lingo Bingo Dingo: I agree that dialect tagging should be the norm. DerekWinters (talk) 20:53, 6 November 2017 (UTC)[reply]

@Lingo Bingo Dingo: I agree. Seems logical. — Algentem (talk) 13:50, 7 November 2017 (UTC)[reply]

Statives

This is an extremely minor matter, but a convention exists to mark statives with obelisks/daggers. These do not mark any arguments and function just as a shorthand. To me it seems rather useless to include, but I really don't care that much. Lingo Bingo Dingo (talk) 12:08, 24 October 2017 (UTC)[reply]

Previous discussion found here. On pretty much all of these points I’m in agreement with Lingo Bingo Dingo; I’d have nominal states with a hyphen and pronominal with ⸗ in both the headword line and the entry name, and I’d lemmatize at the absolute state. Regarding statives, if marking them is done it should probably be via a parameter in a headword template, but I agree that it’s probably unnecessary. — Vorziblix (talk · contribs) 00:14, 25 October 2017 (UTC)[reply]

@Lingo Bingo Dingo: I would also like to add two points to this discussion. I think that we should add guidelines as to how we add alternative forms in Coptic. Most words in Coptic are attested in several different spellings, and pages like ϫⲱⲙ have too many in my opinion. Either we could add a dropdown box, or we only display the "most common" forms, one for each dialect, or a mix between the two?

Secondly, I don't see a reason why text in Coptic has to have a bigger font size. I have no difficulty reading Coptic, no more than for example Ancient Greek. For my conjugation template, I made the Coptic font size regular: {{cop-conj}}. — Algentem (talk) 13:50, 7 November 2017 (UTC)[reply]

@Algentem Dropdown boxes may be a good idea for words with many attested spellings (like more than 4 or 5), Muhammad shows a possible implementation. I'd be fine with changing the font size, though I have no idea how this would be implemented. Also pinging @Aearthrise, DerekWinters, Vorziblix again (also see above). Lingo Bingo Dingo (talk) 12:56, 22 November 2017 (UTC)[reply]

@Algentem, Lingo Bingo Dingo: Another possibility would be to put alternative forms as a subheader under the POS header (as WT:ELE permits) so as to shunt them below the definitions. I’d be fine with dropdown boxes too, though. As yet another alternative, perhaps a Coptic-specific template can be created to list forms by dialect, showing only the ‘most common’ ones until expanded? (Compare the handling of pronunciations with {{zh-pron}} on Chinese entries.) Any one of these solutions is fine by me, but I’d avoid outright deleting uncommon alternative forms.

As far as font size goes, I agree that shrinking it would be nice. — Vorziblix (talk · contribs) 21:59, 22 November 2017 (UTC)[reply]

@Vorziblix, Lingo Bingo Dingo, Algentem: I think it would be best to have dropdowns organized by dialect, and ordered by "most common", if that is known. And yes to size shrinking. DerekWinters (talk) 06:01, 24 November 2017 (UTC)[reply]

Improving Wiktionary

Ideas for improving Wiktionary

Collect ideas from the community for improving Wiktionary,
Readability - format the Table of Contents on the right to allow fluid reading from article top.
Etymologies - increase depth of word etymologies and use flowcharts
Reference similar words in a more integrated cross language form:
Rewrite whole entry into a information redundant but readable definition summary at the top.
Treat big words / core words as essay topics and as idea primes / word primes.
Integrate Proto-Indo-European roots into a main column for parsing through roots in a readable way.

-Aision (talk) 03:47, 25 October 2017 (UTC)[reply]

Ideas are easy. Do you have any of the technical skills necessary to implement them? DTLHS (talk) 03:51, 25 October 2017 (UTC)[reply]

These are really vague IMO. The second one is covered by the Tabbed Languages gadget. —Aryaman ^{(मुझसे बात करो)} 11:24, 25 October 2017 (UTC)[reply]

My own thoughts:

Collect ideas -- Why view this as a separate endeavor? We're *already* actively engaged in improving Wiktionary. We're *already* actively engaged in discussing ideas for improvement (viz. this very page).
Readability (TOC) -- As Aryaman noted, already addressed by some existing gadgetry.
Etymologies -- I believe etymology depth is already being addressed by those interested in etymologies. And I struggle to imagine how flowcharts would be helpful.
Referencing -- I don't understand what Aision means by this suggestion, especially the "cross language" part.
Rewriting -- This also doesn't make sense to me. The sense lines already provide definitions in a readable and concise fashion.
Essay topics / idea primes -- Another puzzler. I don't understand this either.
PIE roots in a main column -- Easier to imagine possibilities for what it would look like, but hard to imagine how this would be implemented, especially in a way that is 1) cross-platform (what about mobile? etc.), and 2) user-friendly (what about those relying on screen readers? etc.).

Clarity on the above would help. ‑‑ Eiríkr Útlendi │^{Tala við mig} 16:53, 25 October 2017 (UTC)[reply]

Looks like a mindmap from a teacher training. As dubious as the business of pedagogues is, you just don’t know if it’s dully obvious or obviously vain. Palaestrator verborum (loquier) 17:34, 25 October 2017 (UTC)[reply]

I don't really like tabbed languages but I would like a way to demote the ToC (and think we should do so by default). Equinox ◑ 19:57, 25 October 2017 (UTC)[reply]

Some thoughts:
1. Re: "Collect ideas". If some user or users want to engage in a systematic effort to prepare for major changes in Wiktionary, they may do so, but we seem to have problems achieving consensus for relatively modest changes.
2. Default design should be for unregistered users, about whom we should get some information, about their preferences and, better, behavior. Otherwise configuration gadgets should solve the most important problems for most users.
3. Re: "Treat big words / core words as essay topics and as idea primes / word primes." This sounds like a job for Wikipedia. Keywords (1st edition 1976) by Raymond Williams provides examples of such essays for the semantics of about 120 words in about 300 pages, excluding introduction. Comprehensive grammars (eg, CGEL) cover grammatical matters of a few individual words with one or a few paragraphs, otherwise mentioning them in lists of words with similar characteristics.

DCDuring (talk) 04:05, 26 October 2017 (UTC)[reply]

On collecting ideas, one problem with our current set-up is that the Beer Parlour etc. archives are set up chronologically, not topically. A topical index of some kind to BP discussions (possibly with notation on the status of the issue: open vs. implemented vs. rejected) would be a big help to getting a better big picture on the state of Wiktionary as a project, I believe. --Tropylium (talk) 14:36, 28 October 2017 (UTC)[reply]

See #Topical organization of BP archives below. DCDuring (talk) 18:20, 29 October 2017 (UTC)[reply]

About "big words": I was thinking about concentrating our efforts on most viewed pages first (have a look at Wikiscan). OK, there are lots of XXX things in there, but we can ignore those. – Jberkel (talk) 17:34, 6 November 2017 (UTC)[reply]

Adding HSK grade to entries

@Atitarev, Bumm13, Dine2016, Dokurrat, Hongthay, Jamesjiao, Justinrleung, kc_kennylau, Suzukaze-c, Tooironic The HSK level of the words' meanings could also be added, just as Japanese entries show. --Backinstadiums (talk) 20:53, 25 October 2017 (UTC)[reply]

Applications

It seems to me that there are a lot of word "applications" that are being overlooked relating to verbs that have present (ing) and past (ed) participles. While many are correctly noted as becoming nouns (present) or adjectives/adverbs (past), a far larger number have neither mentioned. I believe the vast majority of verbs can become adjectives, adverbs, or nouns through their past and present participles, and wonder who is deciding which shall be included in Wiktionary and which shall not? I am an anagram "nut" and include a vast number of such examples in my own anagram list, and wonder why they should not be included in Wiktionary's list. Has this issue come up before, and if so, how is it being handled? — This unsigned comment was added by Scottmacstra (talk • contribs).

Scott MacStravic?? --Backinstadiums (talk) 18:24, 26 October 2017 (UTC)[reply]

@Scottmacstra: Just a quick note, Scott--I replaced your email address with your username. If you want to publish your email address, that is totally up to you of course, just wanted to make certain. —Justin (koavf)❤T☮C☺M☯ 19:10, 26 October 2017 (UTC)[reply]

User Benwing/Benwing2 has been inactive for some time

Ben was a great contributor in Arabic first, then Russian - even more (other languages, modules and many templates) but has been absent for a considerable amount of time. He hasn't handed over any codes, e.g. for generating inflected Russian forms and pronunciations, many other things. I have tried to contact him a few times but didn't get any response. It's sad but is there anything available that someone can take over or, at least, document? To be honest, I sort of relied on him being available on any fixes, enhancements in the infrastructure for Russian and didn't bother to ask him to train how to do things (modules are well documented, though). Also, I want to thank him for creating a great infrastructure, so that creating quality entries (with the knowledge of the complicated Arabic and Russian pronunciation and grammar) has become easy. --Anatoli T. ^{(обсудить}/^вклад) 08:32, 27 October 2017 (UTC)[reply]

@Atitarev: I hope that it's only temporary, or if not, that he's alright anyway. I was really impressed with his work. --Barytonesis (talk) 19:45, 31 October 2017 (UTC)[reply]

How to indicate a long å?

In Northern Sami, various letters can indicate short or long vowels, and we indicate the long ones with a macron, as in many other languages as well as some linguistic descriptions of the language. In Lule Sami, the same would be desirable, but there is a problem with one letter that Northern Sami doesn't use: å. When you put a macron on it, it looks really weird and not really recognisable as a macron, å̄. Does anyone have ideas on how to make this look better? —Rua (mew) 10:53, 27 October 2017 (UTC)[reply]

Is a macron below any good? å̱ —Rua (mew) 10:58, 27 October 2017 (UTC)[reply]

What do published works use? In the font I use, the macron above looks just fine; another option would be to use the acute accent to mark length in this case since precomposed ǻ (U+01FB) is an existing Unicode point. —Aɴɢʀ (talk) 18:18, 27 October 2017 (UTC)[reply]

In the work that distinguishes them that I have, it's distinguished by bolding. Another one just writes in plain text that the å is long. So that's not particularly useful. —Rua (mew) 20:23, 27 October 2017 (UTC)[reply]

No, I guess not. Is this just for headword lines, not for page names? I'm still partial to ǻ. —Aɴɢʀ (talk) 23:08, 27 October 2017 (UTC)[reply]

Lule Sami and Northern Sami already use á as a real letter in orthography, so I'd rather not use the acute for a non-orthographic mark. That's why I prefer the macron: anyone who knows the spelling will know that it's not normally written that way. —Rua (mew) 13:12, 28 October 2017 (UTC)[reply]

What's wrong with following "ː"? Chuck Entz (talk) 21:56, 27 October 2017 (UTC)[reply]

In a headword line? —Rua (mew) 21:57, 27 October 2017 (UTC)[reply]

There is an IPA rule stating that diacritics that would clash with a descender or an ascender can be printed after the letter instead (so e.g. [g˖] for a fronted [g], or [l̩´] for syllabic [l] with high tone). This might be applicable here too: å¯? --Tropylium (talk) 14:30, 28 October 2017 (UTC)[reply]
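The options floated in this thread can be compared at the code-point level. The following is only a sketch: it assumes that the macron above is U+0304, that the macron below is U+0331, and that the trailing mark Tropylium suggests is the spacing macron U+00AF; the thread itself does not name these code points (only U+01FB is given explicitly).

```python
import unicodedata

A_RING = "\u00e5"  # å, LATIN SMALL LETTER A WITH RING ABOVE

# Candidate notations for a long å mentioned in the discussion.
# Apart from U+01FB, the exact code points are assumptions.
CANDIDATES = {
    "combining macron above": A_RING + "\u0304",
    "combining macron below": A_RING + "\u0331",
    "precomposed ring + acute": "\u01fb",
    "spacing macron after": A_RING + "\u00af",
    "length mark after": A_RING + "\u02d0",
}


def code_points(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in CANDIDATES.items():
    print(f"{label}: {text}  {code_points(text)}")
```

Only the acute variant exists as a single precomposed character; the macron variants stay as a base letter plus a combining mark even after NFC normalization, which is why their appearance depends so heavily on the font.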

Please vote: Wiktionary:Votes/2017-07/Templatizing topical categories in the mainspace 2

Please vote here: Wiktionary:Votes/2017-07/Templatizing topical categories in the mainspace 2.

Support, oppose, abstain, anything is fine.

I extended this vote by 1 month per request by @Dan Polansky, if no one minds. He requested it at the "Decision" section at the end of the vote.

To be clear: personally, I too support extending the vote. Like Dan, I know there's been some opposition to vote extensions.

The vote already was way past the scheduled end date, but only 7 people voted and the vote count would result in "no consensus" as of now. --Daniel Carrero (talk) 04:46, 28 October 2017 (UTC)[reply]

To be honest, I do not even understand what the vote is about or to which end it is started. “Templatizing the markup”? What is that supposed to mean? Are the voters supposed to okay the use of bots to replace direct categorizations with templates? Or asked differently: What will change or is changed when the vote passes? Palaestrator verborum (loquier) 13:20, 28 October 2017 (UTC)[reply]

@Palaestrator verborum: The vote is about allowing templates {{cat}} and {{C}}. The thing is, these templates are already being used in an increasingly large number of entries. But as far as I know, there's no proper consensus for their use. (Were they approved in any discussion and/or vote before?) If the vote passes, the templates will be officially accepted and may be added in all entries by bot where applicable. If the vote fails, basically nothing changes. (For example, even if the vote fails, we would not remove the templates from the entries that already use them.)

But a failed vote would also likely discourage people from doing the specific action voted on -- adding the templates in a lot of entries in a short period of time. Some editing conventions and policy pages apparently were never discussed or voted on, but are arguably the "status quo" anyway and persist. If the vote never existed, someone might get the impression that since many entries use {{cat}} and {{C}}, then these templates must be acceptable by default and may be added to many other entries, quickly, without further discussion. --Daniel Carrero (talk) 20:46, 28 October 2017 (UTC)[reply]

Entries for common Finnic-Samic words?

@Tropylium The Finnic and Samic languages have a lot of shared vocabulary that isn't found in any other Uralic languages. A Proto-Finno-Samic language is posited by some, but not generally accepted (see w:Finno-Samic languages) and we don't have it as either a real language or an etymology-only language under Proto-Uralic. This is a shame, because it means we can't create entries that contain these words and their descendants. It would be useful to have an entry for, for example, *ültä, to house Proto-Finnic *ültä (Finnish yltä) and Proto-Samic *ëltē (Northern Sami alde). —Rua (mew) 13:09, 28 October 2017 (UTC)[reply]

Your last example is not a problem; it's also reflected in e.g. Erzya вельде (veľďe) and we can well reconstruct it as Finno-Permic or similar (an option would be to add "West Uralic", for the F-S-Mordv group, supported in many recent analyses). By now this is the case for maybe most cases that are not loanwords (and those can be grouped together wherever the loan is originally from, say *vasara etc. from *wáĵras). E.g. Terho Itkonen's 1997 paper discussing the concept of Finno-Samic lists 98 Finnic-Samic word groups, of which at least 40 have by now been explained otherwise.

For the remaining cases, it's not at all clear if they come from some common ancestor, from some unknown third source, or are old loans between the two. At least some cases are identifiable as substrate loans (e.g. 'island', *saloi ~ *suolōj, has parallels in Baltic and in the later substrate vocabulary in Sami) and therefore should not be reconstructed back to a common native proto-form.

This is really a general problem for etymology entries though, not something specific to the issue of Finno-Samic. If we have an alleged PIE root reflected only in, say, Germanic and Celtic, or Germanic and Balto-Slavic, or in Balto-Slavic and Indo-Iranian — should we reconstruct a PIE entry (possibly tagged as regional), or just refer each daughter entry to the other? Ditto if we have an alleged Malayo-Polynesian root only in some western languages, an alleged Central Semitic root only in Arabic and Aramaic, etc.

I would suggest that we stick to a weak version of the "three-reflex rule": only create bottom-level (etymology-less) proto-entries if they have either

reflexes in at least three separate descendants, or
reflexes in two non-adjacent descendants.

--Tropylium (talk) 14:27, 28 October 2017 (UTC)[reply]
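The weak "three-reflex rule" proposed above can be written down as a small predicate. This is only a sketch under stated assumptions: the function name, the representation of branches as strings, and the way adjacency is supplied (a set of unordered pairs) are all hypothetical, and the example adjacency data is illustrative rather than an agreed classification.

```python
def qualifies_for_proto_entry(reflex_branches, adjacent_pairs):
    """Return True if a bottom-level proto-entry would be allowed.

    reflex_branches: names of the descendant branches that have a reflex.
    adjacent_pairs: set of frozensets, each naming two branches that are
        treated as adjacent (hypothetical input; the proposal does not
        define adjacency formally).
    """
    branches = set(reflex_branches)
    if len(branches) >= 3:
        # Reflexes in at least three separate descendants.
        return True
    if len(branches) == 2:
        # Reflexes in two descendants count only if they are non-adjacent.
        return frozenset(branches) not in adjacent_pairs
    return False


# Illustrative adjacency assumption only.
ADJACENT = {frozenset({"Finnic", "Samic"})}

print(qualifies_for_proto_entry(["Finnic", "Samic"], ADJACENT))
print(qualifies_for_proto_entry(["Finnic", "Samic", "Mordvinic"], ADJACENT))
```

Under these assumptions a word found only in Finnic and Samic would not get a proto-entry, while adding a Mordvinic reflex would make it eligible.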

Twi language code?

What code should we use for words in the Twi language? Both tw and twi give lua errors. (I came across "bonsam" in a pop song - I think it's Twi but have added it as Akan.) SemperBlotto (talk) 16:09, 28 October 2017 (UTC)[reply]

As it says in WT:LANGTREAT, we use "ak" and treat it as part of Akan. Chuck Entz (talk) 16:20, 28 October 2017 (UTC)[reply]

Topical organization of BP archives

Under the topic above #Improving_Wiktionary, User:Tropylium suggested that one problem is the absence of a topical organization for the BP archives. I agree that it is hard to find discussions that bear on a given point and that the effort to do so seems to be forthcoming only in support of a proposal, especially a vote.

What techniques would usefully assure us that relevant prior discussions would be readily available to help raise the level of our discussions?

The existing means seem tedious and are often not used. The principal means I have used for such purposes is keyword search, first for a page using Wiktionary's search restricted to Wiktionary space, then on the search results for that same or another keyword using the browser's search, and finally on the page for that same or yet another keyword. Then begins the process of reading the text to determine whether the discussion is relevant to the matter at hand. Obviously keywords, usually multiple keywords, have to be well selected, especially since we don't limit the vocabulary used in discussions. I sometimes don't find all the discussions that would be relevant to a matter of interest. Usually I only become aware of the missed discussions by accident or by someone else's search. There must be other times when a relevant discussion is not found at all.

Would an appendix of discussions referenced in votes and in other discussions help? That seems fairly simple to implement.

Should we salt archived discussions with keywords that would assure inclusion in relevant keyword searches even when the keyword was not present in the original discussion?

There are other, more radical possibilities such as putting each BP discussion on a separate page to eliminate the last two keyword searches most of the time.

Does anyone have any better ideas? DCDuring (talk) 18:19, 29 October 2017 (UTC)[reply]

I think something like this would be quite helpful on discussion pages in general:

Wyang (talk) 06:00, 30 October 2017 (UTC)[reply]

There's also the search box in Category:Wiktionary-namespace discussion pages, though that searches the Tea Room, Etymology Scriptorium, and Grease Pit as well as the Beer Parlour. — Eru·tuon 06:05, 30 October 2017 (UTC)[reply]

I sometimes wish we had more of a ticket system than a discussion system. Issues would be open and could be commented on until they were resolved, as well as tagged. This probably isn't possible onsite. DTLHS (talk) 06:04, 30 October 2017 (UTC)[reply]

I would support having a ticket system. --Daniel Carrero (talk) 21:02, 30 October 2017 (UTC)[reply]

Whatever the merits of a ticket system for solving more-or-less technical (ie, GP) problems, it does not seem particularly well suited for most of the matters that appear in BP, which tend to be less structured and structurable. In many cases they involve revisitation of matters discussed before. Resolution of a BP matter may require the creation of tickets. I'm sure that a ticket system could help us achieve ever greater ossification of our format, which seems to be where consensus, momentum/inertia, or technocracy lead us. DCDuring (talk) 22:51, 30 October 2017 (UTC)[reply]

Cheers. One way of going about this organization is to go through previous discussions and, without modifying the text of the archived discussion, amend the headers with a label, according to a scheme. Maybe just put the label under the header. Others can then copy from the chronological archives to a new topical archive; it's important not to destroy archives, and it's not a crime to make a redundant but differently organized copy. The first thing would be to set down the number of general topics these ideas come in.-Aision (talk) 02:23, 31 October 2017 (UTC)[reply]

This one will help https://meta.wikimedia.org/wiki/WMDE_Technical_Wishes/AdvancedSearch --Backinstadiums (talk) 19:19, 31 October 2017 (UTC)[reply]

What about this? Wiktionary:Beer parlour index. --Daniel Carrero (talk) 22:03, 9 November 2017 (UTC)[reply]
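The labelling idea discussed in this thread (tag each archived heading with topic labels, then derive a topical view without touching the chronological archive) could look roughly like the sketch below. Everything here is hypothetical: the record layout, the label names and the function name are illustrative assumptions, not an existing Wiktionary tool.

```python
from collections import defaultdict


def build_topical_index(discussions):
    """Group (archive, heading) pairs under each topic label.

    discussions: list of dicts with keys 'archive', 'heading', 'labels'.
    The chronological archive itself is never modified; the index is a
    separate, redundant view.
    """
    index = defaultdict(list)
    for item in discussions:
        for label in item["labels"]:
            index[label].append((item["archive"], item["heading"]))
    # Sort labels and entries so the output is stable and easy to scan.
    return {label: sorted(entries) for label, entries in sorted(index.items())}


# Headings taken from this page; the labels are made up for illustration.
SAMPLE = [
    {"archive": "2017/October", "heading": "Kazakh orthography",
     "labels": ["orthography", "scripts"]},
    {"archive": "2017/October", "heading": "How to indicate a long å?",
     "labels": ["orthography"]},
    {"archive": "2017/October", "heading": "Twi language code?",
     "labels": ["language codes"]},
]

for label, entries in build_topical_index(SAMPLE).items():
    print(label, entries)
```

A discussion may carry several labels, so it appears under each of them; that redundancy is the point of a topical copy.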

Colloquial vs. informal

What is the distinction here? How can we decide whether a non-formal term is colloquial and/or informal? Should some senses be glossed with both? Equinox ◑ 03:45, 30 October 2017 (UTC)[reply]

It's worth mentioning that many people use "slang" to refer to both of these; I've never heard it used as a synonym for "jargon". As for the other two, the glossary definition looks like "informal" is for alternative forms or related synonyms. Ultimateria (talk) 09:45, 30 October 2017 (UTC)[reply]

I'll just complain a bit without solving anything, but as long as we won't have some clear distinctive examples of what we mean exactly with all these labels (perhaps in the form of a table?), we're bound to have the same questions come up again and again. --Barytonesis (talk) 00:16, 31 October 2017 (UTC)[reply]

Wiktionary:Glossary would be the place. Equinox ◑ 19:11, 31 October 2017 (UTC)[reply]

Actually at Appendix:Glossary, there is "informal" and "colloquial". --Dan Polansky (talk) 19:14, 31 October 2017 (UTC)[reply]

Oh! That's how I missed it. Equinox ◑ 14:24, 1 November 2017 (UTC)[reply]

@Equinox: For new Czech entries, I am no longer using "colloquial" and only use "informal". In the archives of my talk page, there is a little write-up concerning the two, showing that most modern English dictionaries use "informal" and do not use "colloquial" (User talk:Dan Polansky/2016#Colloquial vs. informal). --Dan Polansky (talk) 18:59, 31 October 2017 (UTC)[reply]

I'm not sure that a distinction can be maintained or is worth maintaining, but there are expressions that seem to only be used in speech or reported speech. I'll find examples if I can. DCDuring (talk) 00:19, 1 November 2017 (UTC)[reply]

I think they involve anaphora + informality/slang: this here, that there, watcha, luv ya/love you, bye bye. DCDuring (talk) 00:31, 1 November 2017 (UTC)[reply]

For the record, of the ~100 English entries I looked at that had the colloquial label, the overwhelming majority (>90%) did not fit this description and thus would be better characterized as informal or slang. DCDuring (talk) 15:59, 1 November 2017 (UTC)[reply]

Some languages have a bigger difference between literary and colloquial registers than English does; Welsh, for example, is a pro-drop language and makes wide use of synthetic verb forms in its literary register, but is non-pro-drop and uses primarily analytic verb forms in its colloquial register. I always use {{lb|cy|colloquial}} for the Welsh colloquial register because "informal" doesn't quite feel right, though I'd be hard pressed to say exactly why. Maybe just because "colloquial" is and always has been the usual word for the register opposed to literary. Some other languages with very far-reaching linguistic differences between literary and colloquial registers are Burmese and Bengali. —Aɴɢʀ (talk) 15:52, 1 November 2017 (UTC)[reply]

(outdent, perhaps) Our Appendix:Glossary does not seem to clearly map the distinction that says, colloquial = primarily spoken & informal = not formal yet not necessarily primarily spoken:

colloquial: Used primarily in casual conversation or informal writing and not in more formal written works, speeches, and discourse. ...
informal: Denotes spoken or written words that are used primarily in a familiar, or casual, context, where a clear, formal equivalent often exists that is employed in its place in formal contexts. ...

I would submit that the above criteria are basically equivalent. A question is, for tagging English, is there at least one dictionary that uses both or that distinguishes merely informal from informal and primarily spoken? (I said for tagging English since the discussion of other languages can lead to nuance that does not apply to English, so it is better to constrain the question to English) --Dan Polansky (talk) 18:12, 3 November 2017 (UTC)[reply]

My 1995 print edition of Collins COBUILD has both "informal" (eg, decaf) and "spoken" (eg, whoops) as distinct labels. My older Longmans DCE does not. Are there others? DCDuring (talk) 00:57, 4 November 2017 (UTC)[reply]

Favicon and apple-touch-icon

Despite the fact that we switched to a variant of the "Scrabble/Mahjong logo" months ago, the favicon and Apple Touch icon still show the ['w]. KATMAKROFAN (talk) 04:12, 30 October 2017 (UTC)[reply]

I brought this up at Wiktionary talk:Votes/2016-05/New logo 2#Favicon, but never did anything about it, largely because I was simply happy for all those votes to be over. If somebody cares (@Dan Polansky?) and created a new favicon consistent with the logo, I'm sure a vote for it would pass. —Μετάknowledge^{discuss/deeds} 00:32, 1 November 2017 (UTC)[reply]

.ico files can't be uploaded to Wikimedia Commons, it turns out. So here is an icon with a white background and here one with a transparent background (I think the white is more effective). These are in the same format as the existing icon file (three sizes: 16, 32, 48 pixels square). The simplest way to install is to copy over the existing file: https://en.wiktionary.org/static/favicon/wiktionary/en.ico -Stelio (talk) 20:43, 2 November 2017 (UTC)[reply]

Indices on main page?

Why do we link to the Index: namespace on the main page? It's maybe the third thing that people see, but it's embarrassingly outdated (April 2012). We should update them (or even do a complete overhaul, with {l} to mitigate the orange links) or think about replacing them. Perhaps the About X pages or Category:X language. Ultimateria (talk) 09:39, 30 October 2017 (UTC)[reply]

@-sche (you edited the main page recently, IIRC) --Per utramque cavernam (talk) 15:34, 26 February 2018 (UTC)[reply]

I agree they are woefully outdated and should probably go, but I'd rather not remove an entire section of the main page without some more people participating in discussing it! - -sche (discuss) 18:42, 26 February 2018 (UTC)[reply]

Westphalian

@-sche, Rua I'm a bit confused about how to add Westphalian entries to descendant lists. Am I supposed to use nds-de? Should I be specifying that it's Westphalian, ex. * {{desc|nds-de|Sorge}} {{q|Westphalian}} or * {{desc|nds|-}} *: Westphalian: {{l|nds|Sorge}}? Thanks. --Victar (talk) 22:59, 30 October 2017 (UTC)[reply]

Here is a more practical example using Old Saxon ertha. Should I be throwing Dutch Low Saxon, German Low German, and Westphalian under Low German ({{desc|nds|-}})? --Victar (talk) 23:46, 31 October 2017 (UTC)[reply]

de:Verzeichnis:Deutsch/Dialekte und Varietäten has made me of the mind that we shouldn't have gotten rid of the Westphalian language code. --Victar (talk) 06:24, 1 November 2017 (UTC)[reply]

CU Vote for Chuck Entz

Hey all, please note that the vote (Wiktionary:Votes/cu-2017-10/User:Chuck_Entz_for_checkuser) for User:Chuck Entz to become a checkuser is scheduled to end tomorrow. While Chuck currently has a significant level of support, the WMF requires 25-30 affirmative votes for this role. If we are unable to reach that by the currently scheduled end of the vote I will extend it a week, but I thought if I posted about the vote here a few people who might not have noticed would be alerted and we could end on time. - TheDaveRoss 13:02, 31 October 2017 (UTC)[reply]

Kazakh orthography

What will be the consequences of the Kazakh reform for our entries? --Barytonesis (talk) 14:09, 31 October 2017 (UTC)[reply]

Probably should change the Cyrillic entries to soft redirects and move the definitions, etc., to the Latin spellings. —Stephen ^(Talk) 14:21, 31 October 2017 (UTC)[reply]

Why should we follow every whim of a government? The Kazakh entries should stay Cyrillic up into eternity – the Cyrillic alphabet will stay in use outside of Wiktionary because it is fit for the language, so there is no need to bother oneself with the reform. Note that the French entries also do not apply the 1990 reform, because why write it easy if you can write it hard? It has also been a mistake, a shameless kowtow, to start the German entries in 1996/2004/2006 reformed spelling, and I opine that Wiktionary should switch the Russian entries back to the spelling that was the only one before the Bolshevik overthrow (code ru-petr1708) and is of course still the norm, because some government may later decide anyway that Russian should be written more phonetically like Serbian, without double н and things like that. What will Wiktionary do if a cabal between the governments of the majority of English-speakers decides to adopt an “intuitive” spelling: reorganize half the Wiktionary (with category names and descriptions)? In my view it is also more agreeable to create Russian entries in pre-1918 spelling, and the reason why I have not created any Russian entries is that I am averse to communist spelling and tend to read no Russian literature because the publishing houses deface their editions by having them reformed. The governments can keep the language reforms for themselves; the states are not the language communities, but their exploiters that parasitize on them by administrating them and stealing the decisions that they should make themselves. The looser the connection with the government is, the better for the people who use the language, and thus it is desirable to use a different writing system than the government uses. To revolt is a natural tendency of life. Even a worm turns against the foot that crushes it. In general, the vitality and relative dignity of an animal can be measured by the intensity of its instinct to revolt.
Palaestrator verborum (loquier) 16:39, 31 October 2017 (UTC).[reply]

Err, communist spelling? —Aryaman ^{(मुझसे बात करो)} 01:57, 2 November 2017 (UTC)[reply]

@Barytonesis, Stephen G. Brown: We need to wait and see the extent to which the switchover is accepted by the linguistic community. If Kazakh books, magazines, newspapers, billboards, product packaging, etc., really start using the Latin alphabet, then we need to reflect that by moving the content to the Latin spelling and having the Cyrillic spellings be soft redirects (do we have a template {{Cyrillic form of}}?). If not—if only the government switches to the Latin spelling while the rest of the Kazakh-speaking world blithely goes on using the Cyrillic alphabet—then we should keep the content where it is and have the Latin spellings be soft redirects. —Aɴɢʀ (talk) 17:56, 31 October 2017 (UTC)[reply]

I fully agree with Angr. That is the most sensible approach for a descriptive dictionary. — Ungoliant ^(falai) 18:06, 31 October 2017 (UTC)[reply]

But how do you see what the world is? What is the world? Internet chats probably always have used that orthography, and the newspapers are subject to Gleichschaltung, as they usually are, and elsewhere people working with language have various constraints. And not a few people are nomads who do not write anyway. If it really comes to a reform, it would be fastest to just create alternative-form-of pages, so nobody has to survey constantly how the usage is. You can never know the whole language anyway, to say: Wow, now the usage is more than 50%, looks like we have to switch to Latin. And then back when it drops? Whole publishing houses in Germany have switched back to the 1901 orthography after having used the 1996 one. Descriptiveness does not preclude a-priori economic decisions. Usage is a bad guide for description; one always reasons from exterior sources whenever one describes, as long as and because one aspires to organize.

If you accept an inferior orthography, you will get a shedload of problems, like automatic IPA display not working because the government has decided to use digraphs instead of fitting characters, in which case therefore it would be more economical to use the Cyrillic entries as base and treat the official ones as alternatives; these could even be auto-created with zero work (with the only need of a bot). Palaestrator verborum (loquier) 18:31, 31 October 2017 (UTC)[reply]

One sensible option (IMHO anyway) is to treat both scripts equally based on actual attestation evidence. Therefore, Latin script entries should only be created when there are attesting quotations in that script meeting WT:ATTEST. When Latin entries are created, their Cyrillic counterparts should be left as is for considerable time. --Dan Polansky (talk) 18:32, 31 October 2017 (UTC)[reply]

I disagree. What is being attested is the language, not the script. Therefore a quote in Cyrillic attests the word in Roman too. This is the same way the language gets treated in large Serbo-Croatian linguistic works. One author writes something in Latin but it is still in Cyrillic in the Речник српскохрватскога књижевног језика (Vols. I-VI 1967–1990) – even though Croatians participated in the dictionary, who do not use Cyrillic at all. Palaestrator verborum (loquier) 18:41, 31 October 2017 (UTC)[reply]

In each particular quotation, use of a script is being attested or evidenced. Therefore, it is an observational factual statement to say that script X is usually used to write language Y, and it is equally observational fact that spelling A in script X is attested. This is related to particular spellings being attested, not only their pronunciations; in fact, pronunciations are not really attested. --Dan Polansky (talk) 18:46, 31 October 2017 (UTC)[reply]

Of course the script is being attested in this sense, but also the other script: As pronunciations for most languages do not need attestation, so lexemes do not need attestation in every script for each lexeme to be included in each available script, because we can derive the spelling in another alphabet by rules a priori as well as we can derive the pronunciation from rules a priori, as long as we know what the script represents (which can be hard for a bot with digraphs or in as much as the script is unfit otherwise). Or do we need to attest every inflection form of every Latin word? One is content by proofs from which the other forms can be inferred. Palaestrator verborum (loquier) 19:43, 31 October 2017 (UTC)[reply]

Each quotation attests only the script which it uses and none other. That is, it serves as direct evidence supporting only the thing that it shows and it does not support as direct evidence things that it does not show. --Dan Polansky (talk) 20:18, 31 October 2017 (UTC)[reply]

You talk as if there were an essential difference between directness and indirectness; but the delimitation is fuzzy. Surely a quotation attests some things more directly than others. But the second script instance is directly enough known by the instance in the other script plus the conversion rules, and even in the extremely marginal case that we know that the word has no single occurrence whatsoever in the second script, one still wants to have it listed. It would be overkill to list a word with an asterisk only because of not being attested in this script, just as it would be awkward to go through lists of Greek and Latin lemmas and mark the forms which are not attested. What we describe are not all instances, but the general rules of a language, consisting of words, and to the extent that we can circumscribe the rule-dependent usage, for which the script hardly ever matters. Or have requests for verification for Serbo-Croatian required attestations in both scripts? No, or they needed not to, because one and the same Serb who writes a word in Cyrillic writes Roman as well on other occasions; what we prove by quoting him is that the word dealt with exists in his lexicon, not that this word exists in that script – by the way one work by the same author might be printed sometimes in Latin, sometimes in Cyrillic, which also suggests that the language analyst should not distinguish.

And yes, as the script is not the language, and the scope of inclusion is clear (unlike the case if arbitrary invented words were allowed on Wiktionary), I do not have anything against Russian words invented after 1918 listed in the Petrine spelling. Palaestrator verborum (loquier) 21:55, 31 October 2017 (UTC)[reply]

As always, we should wait to see what is used. I'm seeing some kickback on Facebook, as using apostrophes instead of diacritics is pretty ugly, but that's not something we can help.

There's some argument about citing cross-script and orthography. In Esperanto, I've always converted the h-system and x-system for writing Esperanto without diacritics into the standard diacritics. There's no point in separating Latin-script and Cyrillic-script Serbo-Croatian attestations. It doesn't work so well when the spelling changes or orthography changes aren't easily mapped, though.--Prosfilaes (talk) 12:08, 1 November 2017 (UTC)[reply]
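The Esperanto case is easy to normalize mechanically because the x-system maps one-to-one onto the diacritic letters. The sketch below covers only the x-system; the h-system is left out because it cannot be reversed unambiguously without a word list. The function name is a hypothetical helper, not an existing Wiktionary tool.

```python
import re

# x-system digraphs and the standard Esperanto letters they stand for.
X_SYSTEM = {
    "cx": "ĉ", "gx": "ĝ", "hx": "ĥ",
    "jx": "ĵ", "sx": "ŝ", "ux": "ŭ",
}


def x_to_diacritics(text):
    """Convert x-system Esperanto spelling to the standard diacritics."""
    def replace(match):
        letter = match.group(1)
        converted = X_SYSTEM[letter.lower() + "x"]
        # Preserve the capitalization of the base letter.
        return converted.upper() if letter.isupper() else converted

    return re.sub(r"([cghjsuCGHJSU])[xX]", replace, text)


print(x_to_diacritics("ehxosxangxo cxiujxauxde"))
```

A quotation normalized this way can then be filed under the standard spelling, which is the practice described above.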

I don't like the new Kazakh orthography; the current transliteration is far better. They don't want to use any diacritic symbols and use ' instead. It's not much different from the way Uzbek was romanised. They also replaced special letters with apostrophes; a standard symbol is "ʻ". If I'm not mistaken, the new Kazakh orthography will look like this (Cyrillic, current translit, new translit):

Қазақ әліпбиі — қазақ тілінің әріптерінің жүйелі тізбегі, қазақ халқының мәдени өмірінде басқа да түркі халықтарымен бірге пайдаланып келген әр түрлі әріп таңбаларынан тұратын дыбыстық жазу жүйесі.

Qazaq älipbïi — qazaq tiliniñ äripteriniñ jüyeli tizbegi, qazaq xalqınıñ mädenï ömirinde basqa da türki xalıqtarımen birge paydalanıp kelgen är türli ärip tañbalarınan turatın dıbıstıq jazw jüyesi.

Qazaq a'lipbi'i — qazaq tilinin' a'ripterinin' ju'yeli tizbegi, qazaq xalqının' ma'deni' o'mirinde basqa da tu'rki xalyqtarymen birge paydalanyp kelgen a'r tu'rli a'rip tan'balarynan turatyn dybystyq jazy' ju'yesi. --Anatoli T. ^{(обсудить}/^вклад) 07:15, 2 November 2017 (UTC)[reply]
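Since automatic conversion between the scripts comes up in this thread, here is a minimal sketch of a Cyrillic-to-apostrophe-Latin converter. The mapping is an assumption inferred solely from the sample sentence above (which its author already hedges with "if I'm not mistaken"); it is not the official table, it covers only the letters that occur in the sample, and the function name is hypothetical.

```python
# Letter mapping inferred only from the sample sentence in this thread.
# It is NOT the official alphabet table and is incomplete.
SAMPLE_MAP = {
    "а": "a", "ә": "a'", "б": "b", "г": "g", "д": "d", "е": "e",
    "ж": "j", "з": "z", "и": "i'", "й": "y", "к": "k", "қ": "q",
    "л": "l", "м": "m", "н": "n", "ң": "n'", "ө": "o'", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "y'", "ұ": "u", "ү": "u'",
    "х": "x", "ы": "y", "і": "i",
}


def to_apostrophe_latin(text):
    """Transliterate Kazakh Cyrillic using the sample-derived mapping."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in SAMPLE_MAP:
            latin = SAMPLE_MAP[lower]
            # Capitalize the Latin output when the source letter was uppercase.
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            # Spaces, punctuation and unmapped letters pass through unchanged.
            out.append(ch)
    return "".join(out)


print(to_apostrophe_latin("Қазақ әліпбиі"))
```

Note that the mapping sends both й and ы to y, so the conversion is not reversible. Also, one word in the sample (xalqının') uses ı where this sketch would produce y, so the sample itself is not fully regular.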

This orthography is a disaster. I think they should get rid of some characters as well like q in favor of k', p in favor of b', t in favor of d' and x in favor of k''. There are some letters lacking, I am guessing sh is s', ch is s'', and w is b''. I mean they must be, just look at this harmony, it is perfect. --Anylai (talk) 20:30, 3 November 2017 (UTC)[reply]

If the orthography is using double apostrophes, that raises some technical issues: while it's entirely possible to create and link to pages containing double apostrophes in the page name, there may be problems with the system mistaking double apostrophes for wikitext outside of the links, and there may be similar problems with our templates and modules- nothing we can't deal with, but a bit of tinkering and extra typing will be required to make things work. Chuck Entz (talk) 21:13, 3 November 2017 (UTC)[reply]
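The technical concern is that wikitext treats a run of two apostrophes as italic markup and a run of three as bold. A trivial check such as the following could flag affected titles before they are pasted into wikitext; the function name is hypothetical, and the example strings are made up purely to exercise the check.

```python
import re


def has_quote_markup_run(title):
    """Return True if title contains two or more consecutive apostrophes,
    which wikitext would read as italic or bold markup."""
    return re.search(r"'{2,}", title) is not None


print(has_quote_markup_run("a'lipbi'i"))
print(has_quote_markup_run("k''"))
```

Single apostrophes separated by letters, as in the sample sentence earlier in this thread, are not affected.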

Apostrophes … another irksome point I have not thought about. It points of course to the rule “if it ain’t broke, don’t fix it”. It applies to this topic as: Keep Kazakh in Cyrillic, at least that will work well. From the example by Anatoli it seems to me that even reading the script does not work. Sickening. Palaestrator verborum (loquier) 22:04, 3 November 2017 (UTC)[reply]

I think Anylai was joking? Neither Anatoli T.'s example nor the Facebook example I saw complaining about it used double apostrophes, and I'm pretty sure the latter would have complained about double apostrophes.--Prosfilaes (talk) 07:38, 4 November 2017 (UTC)[reply]

I can't believe my joke turned into reality. Nazarbayev, are you following beer parlour or what? --Anylai (talk) 17:45, 25 January 2018 (UTC)[reply]

Nazarbayev wants the transition done by 2025 so there's no need to hurry, if we're lucky maybe he gets putsch-ed over this abominable alphabet. Crom daba (talk) 19:35, 4 November 2017 (UTC)[reply]

This Russian video introduces future Kazakh orthography in a funny way. There are many funny cases, especially names. Nazarbayev himself made a joke about сәбіз (säbız, “carrot”), which will be spelled "saebiz", which looks like vulgar Russian заеби́сь (zajebísʹ). --Anatoli T. ^{(обсудить}/^вклад) 21:23, 4 November 2017 (UTC)[reply]

@Atitarev: Huh. Kazakh alphabets#Correspondences on Wikipedia indicates that ә will be spelled as a'. — Eru·tuon 21:56, 4 November 2017 (UTC)[reply]

Apparently Nazarbayev doesn't know his new alphabet himself. The video used "sa'biz". --Anatoli T. ^{(обсудить}/^вклад) 22:07, 4 November 2017 (UTC)[reply]

Template inflection of - making the arguments required

In Module:form of/templates, an editor insists that arguments of {{inflection of}} are now required. That is to say, it is now technically required that users of the template specify what sort of inflected form it is. The editor has protected the page.

Do we want to make the arguments required on the technical level? (I don't.)

--Dan Polansky (talk) 18:19, 31 October 2017 (UTC)[reply]

What is your specific objection to making the arguments required? DTLHS (talk) 18:23, 31 October 2017 (UTC)[reply]

I would like to be able to create pages that leave the form unspecified, and get no module errors. In such a situation, the reader should still be able to find which kinds of inflected forms are concerned in the lemma entry. --Dan Polansky (talk) 18:26, 31 October 2017 (UTC)[reply]

That turns non-lemma entries into little more than redirections, then. Having the complete morphological description of a form directly on the entry is more useful IMO, so much so that I think it's not a bad idea to enforce the practice by triggering module errors. --Barytonesis (talk) 18:37, 31 October 2017 (UTC)[reply]

The question is which is more valuable: a soft redirect or nothing. You cannot expect the people who would be entering soft redirects (inflection of without the kind of form specified) to enter fully specified forms in the same volume, if at all. --Dan Polansky (talk) 18:49, 31 October 2017 (UTC)

Once you have non-lemmas as soft redirects, other editors can attach pronunciation and rhyming. That allows division of labor and specialization, which is good for productivity and accuracy. --Dan Polansky (talk) 18:54, 31 October 2017 (UTC)

@Dan Polansky: True. But quite a few contributors have said over the years that they prefer a red link, where they can do everything themselves from the start, to a blue one leading to a half-baked entry; handling the first is more gratifying than the second, which feels like a tedious cleanup job. That's probably less of a problem for non-lemma entries, however. --Barytonesis (talk) 18:47, 4 November 2017 (UTC)

I don't know what statements from what contributors you have in mind. Different people find different things gratifying. For Czech lemmas, I hardly ever add pronunciations and inflection tables, and I recently saw an anon add these. If the anon is not so strong in English, it would be difficult for them to do full entries, but adding inflections and pronunciation may be easy for them. For those who find it more gratifying to fill redlinks than to expand existing entries, there are plenty of redlinks left to fill, and there will be for the foreseeable future. --Dan Polansky (talk) 20:04, 4 November 2017 (UTC)

Dutch has entries like Arabische that just say "Inflected form of ...". This seems like a justification for not making the parameters mandatory. DTLHS (talk) 18:29, 31 October 2017 (UTC)

If it is good and desirable to make all inflected-form entries specify this kind of information, then {{nl-adj form of}} used in Arabische could be changed, or it could be replaced with {{inflection of}}. The question is: is it good and desirable to make that mandatory for all contributors? --Dan Polansky (talk) 18:39, 31 October 2017 (UTC)

Note that we have both {{inflection of}}, which requires arguments, and {{inflected form of}}, which doesn't. If you don't want to specify exactly which forms the term corresponds to, just use {{inflected form of}}. Granted, the doc page for that template says to use {{inflection of}} instead, but that recommendation doesn't match the RFD "keep" outcome archived at Template talk:inflected form of. —Aɴɢʀ (talk) 22:01, 31 October 2017 (UTC)

But does it make sense? Was the intent of the disputed edit in Module:form of/templates to force me to use {{inflected form of}}? Would not a better course of action be to make sure {{inflection of}} does not produce any module errors (it can place items into a hidden category instead, or even a visible category), and deprecate the other template? --Dan Polansky (talk) 17:53, 3 November 2017 (UTC)
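
For illustration, the alternative raised at the end of this thread (no module error, a tracking category instead) might look roughly like the sketch below. It is written in Python only to show the control flow; the actual module is a Lua module, and the function name, output wording, and category name here are all hypothetical, not taken from Module:form of/templates.

```python
# Hypothetical tracking category name, chosen only for this illustration.
TRACKING_CATEGORY = "inflection of with unspecified form"


def inflection_of(lemma, tags=()):
    """Return definition-line wikitext for a non-lemma entry.

    With no grammatical tags, fall back to a soft redirect and add a
    tracking category instead of raising an error.
    """
    if not tags:
        # No error: the entry still links to the lemma, and the page is
        # collected in a category so that others can fill in the form later.
        return "inflection of [[%s]][[Category:%s]]" % (lemma, TRACKING_CATEGORY)
    return "%s of [[%s]]" % (" ".join(tags), lemma)
```

Under this approach an entry such as Arabische could be created without tags, and editors who prefer fully specified forms could work through the category afterwards.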

Quotations

When using {{quote}}, the term gets categorized in Category:/Language/ terms with quotations, but when using the quotation templates in Category:Citation templates (such as {{quote-journal}}), there is no categorization. Is it supposed to be this way? I doubt it... Jonteemil (talk) 23:45, 31 October 2017 (UTC)

It's not, but merging the various citation / quotation templates into something unified is a lot of work that nobody really seems interested in right now. DTLHS (talk) 00:24, 1 November 2017 (UTC)

Additionally, {{quote-journal}} and the rest of the quotation templates do not require or even make use of a language code, so that would have to be added before they could categorize as "terms with quotations". DTLHS (talk) 00:32, 1 November 2017 (UTC)

The matter has been itching me too recently. But isn’t it a simple task for a programmer? Solving it would require making an addition to the templates and letting a bot run across the articles, just looking at which language a quotation is placed under. However, yes, I do not see exactly why anyone should be interested in the work, because the categorization is of little use, methinks. Palaestrator verborum (loquier) 00:51, 1 November 2017 (UTC)

I see. It’s just that I found it strange that one quotation template categorizes and another doesn’t. But now I know. Thanks for answering! Jonteemil (talk) 01:52, 1 November 2017 (UTC)

Since quotations are a key part of Wiktionary, they should get some more love. But yes, there are quite a few different templates in use, and lots of quotes / examples without templates, so it will be a bigger project. The first step would be to add language codes to all template invocations. Or we could just wait for T122934. 💤 – Jberkel (talk) 14:05, 6 November 2017 (UTC)
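
The bot pass suggested in this thread (work out which language section each quotation sits under, then add that language code to the template call) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `lang=` parameter name and the small heading-to-code table are made up for illustration, the function only handles templates whose name and first pipe are on the same line, and it does not check whether a language parameter is already present.

```python
import re

# Assumed mapping from level-2 heading text to language code; a real bot
# would take this from Wiktionary's own language data instead.
LANG_CODES = {"English": "en", "Swedish": "sv", "Latin": "la"}

# Matches level-2 headings such as "==English==" but not "===Noun===".
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$")
# Matches the start of a quote-* template call, e.g. "{{quote-journal|".
QUOTE_TEMPLATE = re.compile(r"\{\{(quote-[a-z]+)\|")


def add_language_codes(wikitext):
    """Add a (hypothetical) lang= parameter to each quote-* template,
    using the language section the template appears under."""
    code = None
    out = []
    for line in wikitext.split("\n"):
        heading = L2_HEADING.match(line)
        if heading:
            # Entering a new language section; unknown languages give None.
            code = LANG_CODES.get(heading.group(1).strip())
        elif code:
            line = QUOTE_TEMPLATE.sub(
                lambda m: "{{%s|lang=%s|" % (m.group(1), code), line
            )
        out.append(line)
    return "\n".join(out)
```

Quotations under a language that is missing from the table are left untouched, so such pages would need a manual check.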

