Wiktionary:Beer parlour/2014/May

From Wiktionary, the free dictionary

Proposal: Simple Template for Categorization

As has been noted many times, using {{context}} to categorize often leads to strange things like common names for plants being labeled with (botany) even though botanists are the ones least likely to use them.

Why don't we create a template called "cat" to make topical categories that takes two positional parameters: the language code and the category name (maybe additional parameters would be additional cat names), with maybe a named parameter for the script.

Aside from the really obvious and intuitive name, that would have the advantage of not having to type [[Category: in front of everything, and it would allow checking of the language code and standardization of cat names via Module:labels. People would be able to categorize without cluttering the definition line, and context abuse would be reduced.

Thoughts? Chuck Entz (talk) 16:39, 2 May 2014 (UTC)[reply]
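
A rough sketch of the logic such a template would need, written in Python purely for illustration (on-wiki it would be wikitext or a Lua module). The function name `topical_categories`, the small `KNOWN_LANGS` set, and the `code:Name` category naming are assumptions made for this sketch, not an existing Wiktionary interface.

```python
# Hypothetical sketch of the proposed "cat" template logic.
KNOWN_LANGS = {"en", "de", "fr"}  # assumed stand-in for a real language-code list


def topical_categories(lang, *names):
    """Return category wikitext for each topic name, after checking the language code."""
    if lang not in KNOWN_LANGS:
        raise ValueError(f"unknown language code: {lang!r}")
    links = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # ignore empty positional parameters
        # Standardize only the capitalization of the first letter.
        name = name[0].upper() + name[1:]
        links.append(f"[[Category:{lang}:{name}]]")
    return "".join(links)


print(topical_categories("en", "plants", "Trees"))
```

The point of the sketch is that the language code is validated once and the `[[Category:` prefix is never typed by hand; a script parameter, if wanted, would be one more keyword argument.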

I like this idea, but I think the template should be named {{topic}} or similar. --WikiTiki89 16:47, 2 May 2014 (UTC)[reply]
We already have {{catlangcode}} and {{catlangname}}. —CodeCat 16:51, 2 May 2014 (UTC)[reply]
But those are even longer to type. --WikiTiki89 16:52, 2 May 2014 (UTC)[reply]
We can rename them or make redirects. I was just pointing out that we have templates already, in case anyone decides they want to make something. —CodeCat 17:00, 2 May 2014 (UTC)[reply]
So do you think we should redirect {{topic}} (and perhaps also {{top}}) to {{catlangcode}}? --WikiTiki89 17:03, 2 May 2014 (UTC)[reply]
I like the idea. Any short name would do, at least for a redirect, but "cat" seems very appealing. DCDuring TALK 21:29, 2 May 2014 (UTC)[reply]
"cat" is too vague, it doesn't have anything apparent to do with topical categories specifically. {{topics}} would probably be most fitting (plural because it can take multiple names), and it's still shorter than typing the categories out manually. —CodeCat 22:01, 2 May 2014 (UTC)[reply]
Oh, apparently I already created that... —CodeCat 22:02, 2 May 2014 (UTC)[reply]
That'll do fine. DCDuring TALK 22:10, 2 May 2014 (UTC)[reply]
What about just {{c}}? DTLHS (talk) 22:19, 2 May 2014 (UTC)[reply]
Note also the existence of template:categ (which must be substed and works for all sorts of language-specific categories, not only topical categories).​—msh210 (talk) 06:06, 4 May 2014 (UTC)[reply]
…and which doesn't seem to work any longer. (?!)​—msh210 (talk) 06:10, 4 May 2014 (UTC)[reply]

Automatization of German conjugation table

This discussion is put in both BP and GP because it involves both policy discussions and technical issues.

I am planning to make the German conjugation tables automatic. Module:de-conj has been built to do this, but note that the module is not yet complete. For whatever reason, Angr became the first user other than myself to actually use this template in an entry. Kephir helped me to simplify the module's code. The template calling this module is Template:de-conj-auto. The discussion below (if any) should cover a few things:

  1. Whether to use automatization at all.
  2. The usage of this template. It is documented at Template:de-conj-auto/documentation for reference. The main rule I follow is that the concatenation of all the parameters should equal the page name itself. However, I am open to any changes that would improve it.
  3. Where to put the template. Currently, the template is located at Template:de-conj-auto instead of Template:de-conj because the latter already exists. However, if all German conjugation tables are automatized, it can be moved to Template:de-conj.

Of course, the discussion can also cover anything else related to this topic beyond the three points listed above. Examples of usage are located at Special:WhatLinksHere/Module:de-conj. --kc_kennylau (talk) 10:00, 10 May 2014 (UTC)[reply]
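
The "concatenation equals page name" rule in point 2 can be illustrated with a small check. This is a sketch in Python for illustration only; the function name `params_match_page` and the example split of "abtun" into "ab" and "tun" are assumptions, and the actual parameter layout is whatever Template:de-conj-auto/documentation specifies.

```python
def params_match_page(page_name, *params):
    """Return True if the template parameters, joined in order, spell the page name."""
    return "".join(params) == page_name


# Hypothetical parameter split for a separable verb.
print(params_match_page("abtun", "ab", "tun"))
```

A check like this could be used to flag entries where the parameters were mistyped, since any mismatch means the table would be generated for a different verb than the page it sits on.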

Categorize German compounds by components

Does anybody support adding Haushaltgerat to the categories Category:German words compounded with Haus, Category:German words compounded with halten and Category:German words compounded with Gerat? Does anybody support adding abtun to the category Category:German words compounded with tun? Or should the words "compounded with" be replaced with "derived from"? I'm aware that it would create many new categories. --kc_kennylau (talk) 04:09, 11 May 2014 (UTC)[reply]

First of all, I think you meant Haushaltsgerät. More importantly, your naming scheme is based on a misunderstanding: when you say "compounded with x", you're really saying "added to x", not "containing x": Haushaltsgerät would only be in Category:German words compounded with Haus if there were a compound such as "*Haushausaltsgerät". Chuck Entz (talk) 04:27, 11 May 2014 (UTC)[reply]
I thought it's Haushalt +‎ Gerät so it should be in Category:German words compounded with Haushalt and Category:German words compounded with Gerät? If "compounded with" isn't the phrase to be used, then can you suggest one? I'm not a native English speaker. --kc_kennylau (talk) 04:34, 11 May 2014 (UTC)[reply]
A better naming scheme would be Category:German compounds containing Haus, but I'm not sure if it's a good idea: It's so easy to form compounds in German that we should avoid even the possibility of encouraging the creation of SOP entries for the purpose of filling out categories. Chuck Entz (talk) 04:44, 11 May 2014 (UTC)[reply]
Even English compounds aren't organized by the words they contain; Category:English compound words has subcats for the type of compound, but not for the terms used in compounding. —Aɴɢʀ (talk) 14:43, 11 May 2014 (UTC)[reply]
But English doesn't have as many compounds as German. See tun. --kc_kennylau (talk) 16:14, 11 May 2014 (UTC)[reply]
Abtun is definitely not a compound, it is a trennbares Verb. Compound (Zusammensetzung) infers syntactic comparability, whereas a trennbares Verb is built by means of a præfix. The latter case is comparable to English phrasal verbs in structure. The uſer hight Bogorm converſation 15:20, 23 May 2014 (UTC)[reply]
I oppose creation of these or similar categories. Haushaltsgerät can be placed in the Derived terms sections of Haushalt and Gerät. --Dan Polansky (talk) 19:02, 11 May 2014 (UTC)[reply]
Why do you oppose? —CodeCat 19:11, 11 May 2014 (UTC)[reply]
I generally oppose creation of a huge number of rather small categories that are intended to supplant the direct content of Derived terms sections. There was a similar discussion before: Template_talk:derv#Deletion_debate. I am not sure I can articulate why I dislike having hugely many small categories; at least as a matter of taste, I just don't like it. --Dan Polansky (talk) 19:31, 11 May 2014 (UTC)[reply]
I concur with Dan Polansky and æqually oppose the creation of numerous superfluous categories, in virtue of the applicability of the derived terms section. The uſer hight Bogorm converſation 15:20, 23 May 2014 (UTC)[reply]

Although I was involved in this discussion, I do not see how it can objectively be read other than as expressing a consensus to merge Category:English nouns ending in "-ism" into Category:English words suffixed with -ism. Does anyone disagree with this reading of consensus? If not, I will begin to implement this merge within the next few days (or, of course, anyone else who wants to can beat me to it). bd2412 T 18:45, 12 May 2014 (UTC)[reply]

Hearing no objection, done. Cheers! bd2412 T 20:35, 14 May 2014 (UTC)[reply]
I just realized that I used the wrong AWB edit summary for the entire run, but, oh well. bd2412 T 02:07, 15 May 2014 (UTC)[reply]

Ignoring the lack of consensus exhibited in the major discussion of {{sense}} at WT:Beer_parlour/2013/April#Template:sense, CodeCat has, without bothering to try to build a consensus, simply imposed her favorite change. That change was not even the most favored one at the discussion. The change shows a lack of awareness of the nature of the problem.

The change is unsatisfactory in the numerous cases where {{sense}} uses a label rather than a gloss.

The template now yields results like (zoology): for {{sense|zoology}}.

Some technical effort that went into allowing users to customize the appearance was wasted as the problem is not user preferences as much as it is variation in the nature of the content of {{sense}} as it is now used. Until some consensus on a good design for this is reached, could this simply be reverted? DCDuring TALK 13:26, 21 May 2014 (UTC)[reply]

CodeCat's change was made in good faith, but I agree with DCDuring that it yields undesirable results. The output produced by {{sense}} was already entirely clear and, in addition to being unsatisfactory for use with labels, CodeCat's change adds needless verbiage to every single gloss. Glosses should be as brief as possible. The change should be undone. -- · (talk) 06:15, 23 May 2014 (UTC)[reply]
It wasn’t entirely clear. People would often change the glosses of {{sense}} in antonym sections to match the meaning of the antonym. — Ungoliant (falai) 14:56, 23 May 2014 (UTC)[reply]
Yes, I've noticed anons doing that relatively often. Equinox 22:31, 23 May 2014 (UTC)[reply]
CodeCat's change should be undone, at least since it is not supported by consensus and is actually opposed. No consensus => status quo ante. --Dan Polansky (talk) 19:10, 23 May 2014 (UTC)[reply]
I don't actually have any problem with it being undone. What gets me though is that people are undoing it for the sake of it, not giving any reason. Wikipedia explicitly forbids "I don't like it" arguments, and I think the same should apply on Wiktionary. There should be reasons, and those reasons should be scrutinised. I've seen Dan do this particularly often (just voting "oppose" with no further rationale), but it's a general tendency that others have as well. It goes against the whole process of consensus building if everyone justifies things based purely on personal preference. Consensus can only be formed if everyone understands and accepts the motivation of changes, despite personal preferences to the contrary. Remember that consensus and "liking" are very different things, Wikipedia makes this abundantly clear, and I don't see why Wiktionary needs to have its own distinct definition of what consensus is about. —CodeCat 19:24, 23 May 2014 (UTC)[reply]
Re: "people are undoing it for the sake of it, not giving any reason": Except that they're starting discussions wherein they give their reasons. And that you're replying to. While claiming that they don't exist. —RuakhTALK 22:09, 23 May 2014 (UTC)[reply]
Well, DCDuring only came here after he undid the change without any kind of explanation about why. That screams "I just don't like it" to me. And Dan still hasn't given any motivation for preferring the original version either. —CodeCat 22:15, 23 May 2014 (UTC)[reply]
I absolutely detest this continental pseudo-rationalism, requiring people to give reasons for their preferences in spite of the preferences being most often empirical givens, while the alleged reasons for them are quite often invented implausible falsities. But if you want to have a "reason", I dislike your change since it makes something that was nicely short needlessly long. Why is long bad? I don't know. In any case, I object to the idea that my dislike of a change not supported by consensus is not good enough if it does not come supplied with a reason. I reserve my right to oppose a change while giving no reason whatsoever; if there actually is a consensus for the change, I should easily get outvoted. That said, by cursorily browsing votes, you will see how many votes come with no reasoning at all, and how often my votes--in a proper voting process that deserves that I give it some thought, where my giving something a thought is my use of my scarce resource--come equipped with a summary of a reason. --Dan Polansky (talk) 22:26, 23 May 2014 (UTC)[reply]
I'm not sure what point you're trying to make. When the change was made there was no established consensus that I was aware of, in either direction. It only became apparent that there was no consensus afterwards when there was a revert followed by this discussion. So it doesn't seem fair to complain that I acted against consensus. Advance consensus isn't needed to make edits; the assumption on any wiki is that making changes in good faith is ok, and the edits or discussions that follow are what establishes consensus or lack of it. So I don't see anything being done wrong here, and I don't fault DCDuring either since he did the most reasonable thing in the face of a conflict. You on the other hand are coming down on me very heavy-handedly and seem more concerned with "dealing with" me than with the issue DCDuring presented regarding {{sense}}. —CodeCat 22:38, 23 May 2014 (UTC)[reply]
When DCDuring reverted you in diff, that was enough evidence of lack of consensus. You reverted him in diff. Why don't you just undo your change to status quo ante bellum? Please do so now, since there is no consensus for your change, and I oppose your change since (a) I don't like the change, and since (b) the change is not supported by consensus of other editors. --Dan Polansky (talk) 22:44, 23 May 2014 (UTC)[reply]
On Wikipedia, from what I've seen, it's quite common for unexplained reverts or other changes to be undone. Also, in consensus building on Wikipedia, votes aren't counted, but the arguments are. I've seen votes pass despite a majority opposing, based on the strength of the arguments. Since I consider that a good way of forming and determining consensus, I followed that line here too. I motivated my change, DCDuring didn't, so based on that I considered my argument stronger and reinstated the edit. A single unexplained revert does not demonstrate a lack of consensus, it just means that one editor felt like reverting for some undetermined reason, which leaves me free to reinstate the edit for equally undetermined reasons. Consensus only follows once there is a discussion of arguments for or against the change. Reinstating may not have been the best course of action, but that's what happened and why. —CodeCat 22:53, 23 May 2014 (UTC)[reply]
Since I oppose the change and since I even stated a reason for you (the change makes the result of the template too long), please stop the "you have given no reason" game, and revert to status quo ante. --Dan Polansky (talk) 23:01, 23 May 2014 (UTC)[reply]
Re: "I've seen Dan do this particularly often": Diffs please. If I do it quite often, I figure you should have no difficulty finding 7 such diffs. Furthermore, liking is important; you cannot reduce all decisions to objective reasoning. On yet another note, it is you who performs countless illegal bot runs with no justification or reasoning at all, since many of them are run without even a notice at Beer parlour. Here is an example of your "reasoning" (although the link to those horrible liquid threads where one cannot find anything probably does not quite work): A guy asked why you deleted so many categories. And you answered "They seemed kind of pointless to me, like categorising for the sake of categorising." Your pretense that you are somehow providing objective reasoning that justifies your actions is a delusion. --Dan Polansky (talk) 20:19, 23 May 2014 (UTC)[reply]
With respect to Dan's behavior, I know of few editors more reasonable and circumspect than Dan Polansky, and it was distinctly unhelpful for CodeCat to steer her remarks into an ad hominem attack on Dan. That's sure not how one conducts a consensus-building exercise. -- · (talk) 22:19, 23 May 2014 (UTC)[reply]
Let's just say we differ on that point, then. —CodeCat 22:38, 23 May 2014 (UTC)[reply]
What reason do you have for differing on that point? --Dan Polansky (talk) 22:45, 23 May 2014 (UTC)[reply]
My past experiences with you. —CodeCat 22:53, 23 May 2014 (UTC)[reply]
Do you have any objective evidence to support your experience with me, such as diffs? Am I right to suppose that any past experiences with me are on-wiki, and thus that diffs for them exist? --Dan Polansky (talk) 23:01, 23 May 2014 (UTC)[reply]
The message above is just an example. The evidence doesn't need to be objective because it's subjective by nature. I just don't like how you've treated me in the past and present on Wiktionary. Especially not the personal attacks I've had to endure from you. I'm not going to give diffs, but I recall that several people were absolutely disgusted by it. I also received a private email from another well known and respected active sysop with similar complaints about you, from which I quote: "He [Dan] evidently hates you as well, and I think you've got to acknowledge him as an annoyance but not a real threat to your immense value to the community." So I can at least rest assured that my dislike of you is not a personal grudge, but a grievance shared by others as well. And I'm going to leave this discussion at that as I don't think anything I could still contribute to it would be productive in any way. —CodeCat 23:14, 23 May 2014 (UTC)[reply]
Do you agree that you have provided no evidence to support your claim that I often oppose while providing no reasoning? Would you agree that talk about personal attacks is off-topic for the purpose of the claim under investigation, viz. that I often oppose without any reasoning? --Dan Polansky (talk) 23:19, 23 May 2014 (UTC)[reply]

Per this discussion, I've now reverted.​—msh210 (talk) 08:27, 25 May 2014 (UTC)[reply]

@Ungoliant MMDCCLXIV I agree that the label without "Of the sense" seems to cause contributors confusion, most conspicuously when applied under the Antonyms header. Are the benefits of avoiding the confusion worth the cost in extra verbiage in all the applications where no confusion is apparent? Is there an alternative way to reduce confusion, at least with regard to use under the Antonyms header? DCDuring TALK 13:24, 25 May 2014 (UTC)[reply]
My preference is the creation of a new template with the extra verbiage to be used only in antonym sections. But the extra verbiage in every {{sense}} was not so bad considering it fixed a major problem no one else was willing to fix. — Ungoliant (falai) 18:53, 25 May 2014 (UTC)[reply]
The only drawback to having different wording for the Antonyms header (whether an option in {{sense}}, eg, ant=1, or a separate template) is the loss of consistency. I wouldn't object to implementation of such a change. If it turned out to be an unsuccessful attempt, the option could be disabled or the separate template could be redirected to {{sense}}. DCDuring TALK 19:59, 25 May 2014 (UTC)[reply]
I oppose creating a separate template for antonyms. I acknowledge the existence of the problem of anons sometimes misunderstanding the purpose of the output of the {{sense}} template, but the problem is not serious enough to warrant adding extra wording. I disagree that the fix under discussion was one of a "major problem" (I quote); the problem is quite minor indeed. --Dan Polansky (talk) 20:10, 25 May 2014 (UTC)[reply]

Breaking news from Merriam-Webster

Merriam-Webster has announced the addition of 150 new words - all of which Wiktionary has already had in its compendium for years. Point. Laugh. Yawn. Shuffle off to find a sandwich. bd2412 T 19:41, 21 May 2014 (UTC)[reply]

It would be cool if we could trace down the actual earliest cites they mention. For example, they say that fangirl goes back to 1934; it would be great if we could actually find the 1934 cite and add it. (Also, "find a sandwich"? Just lying around in the kitchen or something? Ew. I'd make a fresh one.) —Aɴɢʀ (talk) 14:02, 25 May 2014 (UTC)[reply]

Merging some of the category boilerplate templates

I managed to make a few changes to these templates and it's now technically feasible to merge many of these into a single template. It concerns the following:

Merging wouldn't really change much in the way the templates are used. The main difference would be that you could no longer leave the second parameter empty for root categories. So you'd have to type "parts of speech" and "etymologies" and such explicitly. But that shouldn't be a major drawback I think. The process of merging wouldn't be too difficult, mainly a matter of moving all the current subtemplates to become subtemplates of the new merged template. It's mostly work intensive and time consuming but not really error prone. There might be some errors appearing during the move process, though?

We would need to think of a new name for the merged template. I don't really have any immediate suggestions, but I would prefer something resembling {{topic cat}}. That is, something ending in a space followed by "cat", rather than ending in "catboiler" with no space. We could adopt this name scheme for templates more widely too if there's support for it.

I have also been working on a Lua replacement for these templates, but that's a much larger task and I haven't worked it all out yet. In the meantime, I think this would be a welcome simplification for editors, as it means fewer template names to remember. —CodeCat 00:52, 22 May 2014 (UTC)[reply]

If no one has better suggestions, I think I will merge them all into {{poscatboiler}} for now. It's the largest and most widely used of the templates, and we already refer to categories named in this way (language name + category name) as "POS categories" in Module:labels/data, so it kind of fits with that, even if it's not the best name. —CodeCat 17:49, 28 May 2014 (UTC)[reply]

Done. See WT:NFE for more information. —CodeCat 14:39, 2 June 2014 (UTC)[reply]

New L3 for Chinese

User:Wyang suggests introducing a new L3 header (at least for single-character entries) - ===Definitions===. Please take a look at this version of , the Chinese section.

It does make sense in complicated cases, when a character has a variety of senses and uses and not all of them fit easily into the usual notion of parts of speech. With some effort, perhaps considerable effort, it may be possible to split the current definitions into several PoS headers, though the result may still be imperfect. However, it may be easier to allow this header, similar to the way "translingual" sections were - no PoS, just basic meanings. Thoughts? --Anatoli (обсудить/вклад) 07:02, 23 May 2014 (UTC)[reply]

I support this for all languages, because WT:FEED has often shown us that newbies have trouble finding our definitions. But I don't know any reason Chinese entries should be different from others. —msh210 (talk) 05:30, 25 May 2014 (UTC)[reply]
Support getting rid of all POS headings. DTLHS (talk) 06:09, 25 May 2014 (UTC)[reply]
Potentially ditto, though I'd like to see a specific proposal. Simply replacing all POS headers with ===Definitions=== is not necessarily the best approach. (I note that msh210's proposal does not actually involve getting rid of POS headings; it would just nest them under ===Definitions===.) —RuakhTALK 06:13, 25 May 2014 (UTC)[reply]
Well, I think my ideal layout would be something like User:DTLHS/export: basically, group as many things as possible with their respective definitions (synonyms, translations, etc.), put the part of speech information preceding the definition, and make the definitions larger and first on the page. DTLHS (talk) 06:35, 25 May 2014 (UTC)[reply]
And obviously if there is only 1 etymology you could do away with the Homograph header, and if there are synonyms or antonyms that cannot be mapped to a specific sense they can get their own header as well. DTLHS (talk) 06:42, 25 May 2014 (UTC)[reply]
Makes sense. How would quotations and example sentences look? —RuakhTALK 07:22, 25 May 2014 (UTC)[reply]
Quite right (Ruakh, 06:13, 25 May 2014 (UTC))! I did not propose being rid of all POS headers. My "I support", above, was for the general idea of a Definitions header as implemented with "it may be possible to split the current definitions into several PoS headers" and not for the other details. Sorry for any confusion.​—msh210 (talk) 08:20, 25 May 2014 (UTC)[reply]
Oppose. I don't think we should do away with parts of speech for all senses just because a few may be hard to classify. I think it would make entries considerably messier and harder to look over and understand. We do have at least one catch-all POS header, for things that don't fit any other part of speech: ===Particle===. Whether or not to have a Definitions header in conjunction with POS headers, as msh's page proposes, is a separate question. - -sche (discuss) 05:43, 25 May 2014 (UTC)[reply]
I oppose this, at least for now. For one thing, changing the current heading structure specifically for Chinese and not for other languages has not been justified in the proposal. The proposal uses the difficulty of assigning a part of speech as a rationale, but I think that is a truly poor one; no case of such a difficulty has been mentioned in the proposal. The best rationale that I can think of is that it would make it easier to find all definitions in one place. However, it would be a sharp deviation from previous dictionary practice, at least for English dictionaries; the entry would no longer be structured by words. Yes, that is the point of the current manner of structuring: to separate by headings what are different words. On another note, currently breaks WT:ELE. Ideally, you should now update the entry to align it with WT:ELE and create an example for discussion in your user space. I fear such an update is not going to happen; for the record, I find this manner of procedure fundamentally uncivil. Let me emphasize that the discussed proposal is to get rid of part of speech headings rather than only to introduce a Definitions heading; when the section heading says "New L3 for Chinese", that is very misleading, to me anyway. --Dan Polansky (talk) 08:39, 25 May 2014 (UTC)[reply]

My ideal format is the one linked to by User:DTLHS. This discussion is proposing that the L3 header "Definitions" be allowed for analytic languages, in which the SoPness of a sense is given an undue amount of emphasis under the current guidelines (i.e. as L3 headers themselves!). The dictionary senses of a Chinese word in other dictionaries are not divided into the SoPs they belong to before they get divided into senses, but directly by the senses the word has. The issue of fragmenting senses as a consequence of the SoP header constraints becomes more prominent in the case of Chinese characters, as one character may have tens or even hundreds of senses.

For example, ("one") has at least tens of senses. Instead of the current format:

===Numeral===
{{zh-num}}

# [[one]]

===Noun===
{{zh-noun}}

# [[one]]

===Adjective===
{{zh-adj}}

# [[first]]
# [[single]]

===Adverb===
{{zh-adv}}

# [[one by one]]

===Determiner===
{{zh-det}}

# [[some]]

===Verb===
{{zh-verb}}

# to [[unify]]

===Conjunction===
{{zh-con}}

# as soon as, once ...

it would be more logical to show it as:

===Definitions===
{{zh-def}}

# (num./n.) [[one]]
#: ...
# (adj.) [[first]]
#: ...
# (n.) [[piece]], [[item]], [[part]]
#: ...
# (adv.) [[one by one]]
#: ...
# (adj.) [[single]]
#: ...
# (adj.) [[same]], [[identical]]
#: ...
# (n.) [[unity]], [[alliance]]
#: ...
# (v.) to [[unite]], to [[combine]]
#: ...
# (adv.) [[once]]
#: ...
# (det.) [[some]], [[several]]
#: ...
# (det.) [[all]], [[every]]
#: ...
# (n./v.) [[start]], [[beginning]]; to [[start]] to, to [[begin]] to
#: ...

The headword templates (previously, "inflection-line templates") for analytic languages are unnecessary templates, as they do not add any real value to the entry. WT:Templates shows that the intended genuine rationale for the sense division by SoP in the L3 headers (and hence, the formulation of WT:ELE) is to account for differently inflected senses - a Eurocentric stance which has not taken non-inflecting languages into account at all. Wyang (talk) 09:37, 25 May 2014 (UTC)[reply]
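
To make the proposed layout concrete, here is a minimal Python sketch that renders a flat sense list, with the part of speech demoted to a per-line tag, in the second format shown above. The function name `render_definitions` and the (PoS, gloss) pair structure are assumptions made for illustration; only the ===Definitions=== header and {{zh-def}} come from the example itself.

```python
def render_definitions(senses, headword_template="{{zh-def}}"):
    """Render (pos_abbreviation, gloss_wikitext) pairs as one Definitions section."""
    lines = ["===Definitions===", headword_template, ""]
    for pos, gloss in senses:
        # The part of speech becomes an inline tag instead of an L3 header.
        lines.append(f"# ({pos}) {gloss}")
    return "\n".join(lines)


senses = [
    ("num./n.", "[[one]]"),
    ("adj.", "[[first]]"),
    ("adv.", "[[one by one]]"),
]
print(render_definitions(senses))
```

Because the senses are a single ordered list, they can be sorted by importance or frequency rather than being grouped under whichever part of speech they happen to belong to.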

The format of User:DTLHS/export still structures the entry by different etymologies even though it stops structuring it by part of speech. Is there a Chinese entry with multiple etymologies? If so, which one is it? How do you propose to format a Chinese entry that has multiple etymologies?
The idea that what you propose is "more logical" (I am quoting) is implausible; it has no bearing on logic AKA the study of correct inference at all. What you propose gives, I admit, an interesting and quite neat presentation of information; "neat" is not "logical".
By "SoPness of a sense" you probably mean "PoS of a sense" AKA "part of speech of a sense".
You want to allow a different format for what you called W:analytic languages, but you would probably realize that English is often considered an analytic language (low-inflected language) as well. In Wiktionary, English does not have inflection tables. The neatness (or lack of it) of presentation in what you propose applies to English as well. For highly inflected languages, to format them on the model of what you are proposing, the headword line would no longer contain inflection information and there would be a separate section "===Inflection===" (probably after "===Definitions===", on the same level) in which inflection tables per part of speech would be given, so there really is no Chinese-specific consideration that I can see; the neatness (or lack of it) of your proposal applies to Chinese no less than to English, German and Finnish, as far as I can see.
Re: "The issue of fragmenting senses as a consequence of the SoP header constraints becomes more prominent in the case of Chinese characters, as one character may have tens or even hundreds of senses.": I don't follow this at all. Why does the multitude of senses make the fragmenting into PoS sections more prominent? I think the contrary: in an English entry with 4 etymology sections and only a couple of senses per etymology, the separation of definitions by the section headings is often quite annoying, to me anyway. However, to solve this, the structuring by both etymology and part-of-speech would have to be removed. --Dan Polansky (talk) 10:10, 25 May 2014 (UTC)[reply]
Words with multiple etymologies would be split by etymology first, each etymology having its own substructure (pronunciation, definitions). The issue of multiple etymologies is less of an issue for Chinese, as one character typically represents one etymology, and mono- and multisyllabic homophones are usually represented by heterographs.
By logical I mean the formatting ignores the unnecessary (for analytic languages) split-by-PoS-first-then-by-sense guideline. The definition information is kept centralised and users are less likely to be distracted by PoS headers and "inflection-line" templates.
English is definitely not analytic (enough). For instance, reducing the level of inflection information in English headword templates to nil, by setting those templates to {{head|en|PoS}}, would result in a loss of information, whereas doing the same for Chinese headword templates would not.
The fragmentation is more prominent when there is a multitude of senses, because (as said above) it decentralises the information in a way originally devised for inflecting languages, and distracts users with the SoP information by overemphasising it. My proposal (at least originally) does not cover inflecting languages like English, as the headword templates are still not of null importance and the split of senses by SoP is at least somewhat justified. Wyang (talk) 10:37, 25 May 2014 (UTC)[reply]
I will again point to my argument above about inflected languages like German or Finnish and how they would be treated under your scheme: there would be an ===Inflection=== section showing the inflection in inflection tables; the same thing could be done for English. Therefore, I still do not see the thing that differentiates Chinese from, say, German for the purpose of the presentation that you propose; the lack of inflected forms in Chinese is not the sought differentiator, since, as I pointed out, inflection info can be separated into a dedicated section. I repeat that the split of senses by part of speech is no more justified in German than in Chinese; it comes down to whether one wants to group senses by part of speech, and prominently so. Furthermore, dictionaries that split senses by part of speech often do not present any inflection information; their reason for the split is that they consider part of speech important; an example is Century 1911 (triggs.djvu.org/century-dictionary.com), which does not supply their "paper" entry with "papers", "papered" and the like. --Dan Polansky (talk) 11:06, 25 May 2014 (UTC)[reply]
I think I know what differentiates Chinese from German and English, at least believing what you posted above: if Chinese hardly ever has multiple etymologies, removing the part-of-speech structure without removing the etymology structure provides much larger benefit for Chinese, leading to definitions all being found in one place for Chinese, unseparated by headings. --Dan Polansky (talk) 11:13, 25 May 2014 (UTC)[reply]
Instances of dictionaries splitting the senses by PoS first and not showing the inflected forms do not disprove the proposition that the fundamental rationale for such practice is to account for differently inflected forms, especially when the inflection paradigms are mostly regular as in the case of English. I agree that the headword templates for inflecting languages could potentially be made redundant through your method of localisation, but I would imagine that at least some people working with inflecting languages would object to nullification of headword templates for their languages, since the templates, as imitations of the practice in most dictionaries, typically serve to identify some key forms in the paradigm (eg. Haus). Such objection would not exist for Chinese, since the inflection-line templates are truly of no value. It makes more sense there to deemphasise the PoS information by demoting PoS from the header and inflection-line template level to the individual line level. Wyang (talk) 00:33, 26 May 2014 (UTC)[reply]
I am still undecided about PoS for Chinese. I have always thought they were useful and I got used to using them when creating Chinese entries. Sometimes it's a challenge to decide which part of speech a Chinese term belongs to, or a term can simultaneously belong to several PoS, like this one 裡頭 (inside) - noun, verb, adjective. The choice is almost random indeed. Some linguist will say they are all nouns, some will say they are postpositions but in fact, it doesn't really matter much, they just have this common idea of "inside" and the role is assigned depending on the context. I don't think we should get rid of PoS headers for Chinese altogether, it's probably Wyang's idea. Definitely not for other languages. However, the valid points he has provided should be given consideration. --Anatoli (обсудить/вклад) 07:16, 27 May 2014 (UTC)[reply]
Another big example of an entry without PoS headers (L4 header Definitions) is this version of . @Wyang Yes, it looks neat and is simpler but it's still non-standard (at the moment). I might agree to get rid of PoS headers (for Chinese only), if you insist so. Thank you for adding the definitions. You still need to get the agreement of the rest of the community, though, and of other Chinese editors. It might violate some rules, e.g. is a Definitions header allowed? I don't know. Will this method be embraced by other editors? What about other languages with similar grammar? It's important to consider these things, too. --Anatoli (обсудить/вклад) 06:59, 4 June 2014 (UTC)[reply]
Oppose. This policy was a mistake, apparently unsupported by consensus and shouldn't actually be used. Why not have "definition" for every word in all languages or for multicharacter entries? Because parts of speech are useful.
I would have understood the original programme where POS were listed in a more convenient way for Chinese, but making things lazy (by nixing the POS headers) makes people lazy. I haven't run across any Chinese entries where the POS are listed as in the original example. — LlywelynII 00:31, 21 November 2016 (UTC)[reply]

Media Viewer


Greetings, my apologies for writing in English.

I wanted to let you know that Media Viewer will be released to this wiki in the coming weeks. Media Viewer allows readers of Wikimedia projects to have an enhanced view of files without having to visit the file page, but with more detail than a thumbnail. You can try Media Viewer out now by turning it on in your Beta Features. If you do not enjoy Media Viewer or if it interferes with your work after it is turned on you will be able to disable Media Viewer as well in your preferences. I invite you to share what you think about Media Viewer and how it can be made better in the future.

Thank you for your time. - Keegan (WMF) 21:29, 23 May 2014 (UTC)[reply]


You are forgiven for using this awful language. Keφr 06:46, 24 May 2014 (UTC)[reply]

Language of entries marked with Template:no entry (formerly "only in")

Currently most of the pages which use this template have no language section at all. I think this is a mistake because it's certainly possible for a word to be unattested in one language but attested in another. And even if the word is not attested in any languages, the use of language headers would still allow users (and editors) to verify which languages this applies to. For example, if an English term is marked as not having any entry, but there is no German section on the page, then that would simply mean that we haven't gotten around to adding an entry for it yet. On the other hand, if there were a German entry with a "no entry" template, then that would be a positive confirmation that it's not attested. So I think we should add language headers to all of these pages. —CodeCat 22:57, 25 May 2014 (UTC)[reply]

Makes sense. The template has a lang parameter which suggests it should be used in language sections anyway. — Ungoliant (falai) 23:06, 25 May 2014 (UTC)[reply]
There's currently a list of all cases of {{no entry}} which lack a language parameter at Category:Language code missing/no entry. There are over 2000 of them right now. These may coincide with the entries lacking language headers, but they don't necessarily correspond exactly. —CodeCat 23:25, 25 May 2014 (UTC)[reply]
They may not correspond exactly, but if you ever want to add headers by bot from WhatLinksHere, they're the ones you would want to get out of the way first. Chuck Entz (talk) 00:26, 26 May 2014 (UTC)[reply]
I've done most of them by bot, and some others by hand. There are 8 entries left that I don't know what to do with. I can't even click on them, for starters. Can anyone try? —CodeCat 22:19, 27 May 2014 (UTC)[reply]
Here they are with links: 1, 2, 3, 4, 5, 6, 7, 8 DTLHS (talk) 22:28, 27 May 2014 (UTC)[reply]
And if you ask me they should be deleted (the control characters at least). DTLHS (talk) 22:29, 27 May 2014 (UTC)[reply]
The second one is translingual (I've labelled it accordingly). - -sche (discuss) 22:31, 27 May 2014 (UTC)[reply]

Request for template protect

Can I request that the template Template:User committed identity be protected with full protection? This is because it is a high-risk template and has the potential to be vandalized to circumvent the security that it creates. It has already been protected on the English Wikipedia and the next logical step would be to protect it on Wiktionary as well. I don't think the code will change soon since I just updated it to the latest code on WP. Thanks, Negative24 (talk) 14:13, 28 May 2014 (UTC)[reply]

Not such a huge deal. Even if the template is vandalised, the hash can still be read from the user page's markup. And I think our RC patrol should catch that rather quickly. Keφr 17:17, 28 May 2014 (UTC)[reply]
True, I just thought it might be good to do if the template would be shown on a high number of user pages. Negative24 (talk) 17:37, 28 May 2014 (UTC)[reply]

Context tags for impolite terms

What should be the tag used for こいつ? --kc_kennylau (talk) 15:59, 29 May 2014 (UTC)[reply]

The closest I can find is "pejorative". You can add various words in the context/label but whether it adds to categories is controlled by Module:labels/data. --Anatoli (обсудить/вклад) 09:37, 31 May 2014 (UTC)[reply]
What about "vulgar"? - -sche (discuss) 22:23, 31 May 2014 (UTC)[reply]

CJK ideographs composition indication

@Atitarev, Kephir, Lo Ximiendo, Wyang Should we use ⿰ as suggested in or ⿲ as suggested in to describe the character ? --kc_kennylau (talk) 07:47, 30 May 2014 (UTC)[reply]

That was not a description but an alternative form. It's used, because of the limitation of the input.--Anatoli (обсудить/вклад) 00:04, 31 May 2014 (UTC)[reply]
Okay, I've changed the example. --kc_kennylau (talk) 00:32, 31 May 2014 (UTC)[reply]
Well, the category of says that it's for description. --kc_kennylau (talk) 01:00, 31 May 2014 (UTC)[reply]

Pregenerating entries

What does the community think about pregenerating a huge number of entries for various languages, to eliminate all of the unnecessary typing waste? For example, for Ukrainian I have 116k nouns of the form:

==Ukrainian==

===Pronunciation===
* {{IPA|[{{uk-pron|держпо́зика}}]|lang=uk}}

===Noun===
{{uk-noun|держпо́зика|f-in|держпо́зики|держпо́зики}}

# {{rfdef|lang=uk}}

====Declension====
{{uk-decl-noun|держпо́зика|держпо́зики|держпо́зики|держпо́зик|держпо́зиці|держпо́зикам|держпо́зику|держпо́зики|держпо́зикою|держпо́зиками|держпо́зиці|держпо́зиках|держпо́зико|держпо́зики}}

===References===
* {{R:uk:SUM-11}}

Basically everything except definitions gets generated. I was thinking of having a bot that would generate them on a request page, but that seems a rather unnecessary middle step. It would be similar to how CJK ideographs were generated in the early days, but with a bit more content (refs, pronunciation, inflection). It would enable existing editors to better focus on definition lines and examples, plus possibly bring in new ones who would be otherwise intimidated by huge number of templates. --Ivan Štambuk (talk) 09:42, 30 May 2014 (UTC)[reply]

On the other hand, we would not be able to tell whether a blue link contains an actual definition or not. (Some tweaks to the "orange links" gadget may alleviate that, though.) Myself, I prefer to approach this through better editing tools and automatic inflection modules (i.e. changing the typing waste to clicking waste). Something that would incorporate WT:EDIT and User:Yair rand/newentrywiz.js. I might not have the time for making that, however.
Also, welcome back. Where have you been??! Keφr 09:55, 30 May 2014 (UTC)[reply]
I’ve been doing it for a few months and no one complained, so I guess it’s fine. — Ungoliant (falai) 10:16, 30 May 2014 (UTC)[reply]
  • I oppose mass creation of entries that lack definition. Definitions are the key content, the one thing that it should not be lacking in an otherwise stubby entry. Compared to definitions, pronunciation and inflection are trivia. --Dan Polansky (talk) 17:43, 30 May 2014 (UTC)[reply]
    • Is no entry better than an entry with everything but a definition? I'm not so sure. It kind of goes against the idea of building and improving entries over time, which is the main principle of a wiki. —CodeCat 17:52, 30 May 2014 (UTC)[reply]
      • I think having a huge number of definitionless entries in various languages in Wiktionary is a poor state of affairs. Above, Ivan reports to have 116 000 nouns in Ukrainian. Wiktionary should not report to have 116 000 Ukrainian entries when the only thing it would have are algorithmically generated pronunciations and inflection tables. By my lights, at least one definition (or translation) is the minimum content, from which the entry can be incrementally expanded. I recall people complaining about Tbot entries (I like Tbot entries), when the only deficiency of these was that a human had to confirm what Tbot entered; Tbot solved precisely the problem of providing the formatting structure while at the same time providing what often turned out to be a correct translation. --Dan Polansky (talk) 18:32, 30 May 2014 (UTC)[reply]
        • Maybe a possible solution is to have some kind of formal definition of how complete an entry is. Entries with no definitions would of course get a low score on that. A bot could generate such a score on a large number of entries, which would allow us to pinpoint problems more easily maybe. Of course a bot can't decide whether information is correct or useful, only if it's present. But it's a good start. —CodeCat 18:37, 30 May 2014 (UTC)[reply]
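
The scoring idea above could be sketched roughly as follows. This is only an illustration: the function name `completeness_score`, the `WEIGHTS` table and its values are hypothetical and not taken from any existing bot, and, as noted above, such a check can only tell whether a part is present, not whether it is correct.

```python
import re

# Hypothetical weights; the discussion does not specify any.
WEIGHTS = {
    "definition": 5,
    "pronunciation": 2,
    "inflection": 2,
    "references": 1,
}


def has_section(wikitext: str, names: str) -> bool:
    """True if a heading such as ===Pronunciation=== is present."""
    pattern = r"^=+\s*(?:" + names + r")\s*=+\s*$"
    return re.search(pattern, wikitext, re.MULTILINE) is not None


def has_real_definition(wikitext: str) -> bool:
    """True if some '#' line holds text other than an {{rfdef}} request."""
    for line in wikitext.splitlines():
        # Skip '#:' / '#*' / '##' lines (examples, quotations, subsenses).
        match = re.match(r"#(?![:*#])\s*(.*)", line)
        if match:
            body = match.group(1).strip()
            if body and not body.startswith("{{rfdef"):
                return True
    return False


def completeness_score(wikitext: str) -> int:
    """Presence-based score for one language section of an entry."""
    score = 0
    if has_real_definition(wikitext):
        score += WEIGHTS["definition"]
    if has_section(wikitext, "Pronunciation"):
        score += WEIGHTS["pronunciation"]
    if has_section(wikitext, "Declension|Conjugation|Inflection"):
        score += WEIGHTS["inflection"]
    if has_section(wikitext, "References"):
        score += WEIGHTS["references"]
    return score
```

Under these assumed weights, a pregenerated stub of the kind shown at the top of this thread would score lower than the same entry once a definition line replaces {{rfdef}}.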

I support this idea and not just for Ukrainian. Definitions are the most important part of entries, without them language entries are incomplete but the proposed example doesn't claim to be a complete entry, it asks user to add a definition. I would add more to the bottom, some message like autogenerated entry and another category to keep track of these. Complexity of entries scares off beginners. Besides, if pronunciation, inflection, headers are already there, it's a big step. Inflection, word stress, gender, part of speech, animacy are all non-trivial information for a language like Ukrainian. Russian Wiktionary also mass-generated entries. Now this policy pays off and definitions have been rapidly added. @Ivan Štambuk please generate sample entries as an example and to check the quality. There's nothing new in this approach, people do import entries, request definitions. --Anatoli (обсудить/вклад) 09:27, 31 May 2014 (UTC)[reply]

So can you show us how the Russian Wiktionary policy has paid off? How many definitionless entries were originally created in Russian Wiktionary, and how many of them are no longer definitionless? --Dan Polansky (talk) 11:00, 31 May 2014 (UTC)[reply]
No, that's only my observation and what I saw in the discussions in the Russian Wiktionary. I can't give numbers. Most of our Tbot entries were cleaned up eventually, as were many imported or generated entries here or in the Chinese Wiktionary. --Anatoli (обсудить/вклад) 12:45, 31 May 2014 (UTC)[reply]
Is there a category for definitionless entries in Russian Wiktionary? If so, can you post a link to it? If not, can you give us an example of a definitionless entry in Russian Wiktionary? --Dan Polansky (talk) 12:52, 31 May 2014 (UTC)[reply]
[стыковка] A while ago I've added a translation into English ("docking") there but didn't add a definition. --Anatoli (обсудить/вклад) 13:09, 31 May 2014 (UTC)[reply]
Entry ru:стыковка contains # {{пример|}}. There does not seem to be any category for definitionless entries in Russian Wiktionary. ru:Template:пример is one for an example sentence; the category for entries lacking an example sentence (which is not the same as lacking a definition) is ru:Категория:Статьи без примеров употребления. I downloaded a dump from http://dumps.wikimedia.org/ruwiktionary/20140519, "ruwiktionary-20140519-pages-articles.xml.bz2". I unzipped the dump. I ran "grep -P "# *{{\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbc\xd0\xb5\xd1\x80\|}}" ruwiktionary-20140519-pages-articles.xml >t.txt". The number of lines in t.txt is 102 239. These scary \x items are just the bytes of the UTF-8 string for пример and I really search for "# *{{пример\|}}". So there seem to be 102 239 definitionless entry lines of the format "# *{{пример\|}}"; but since they don't even bother to categorize them, who knows how many definitionless entries they have in another format. In any case, I do not like what I see there, like a page full of empty section headings including Синонимы, Антонимы, Гиперонимы, Гипонимы. What a bad joke, if you ask me. --Dan Polansky (talk) 14:41, 31 May 2014 (UTC)[reply]
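
For anyone wishing to repeat the count above without spelling out the UTF-8 bytes, a minimal Python sketch follows. The function name `count_empty_definition_lines` is hypothetical; the pattern is the same "# *{{пример\|}}" expression quoted above, and like the grep it counts matching lines, not entries.

```python
import re
from typing import Iterable

# Same pattern as the grep above: '#', optional spaces, then an empty
# {{пример|}} template. Braces and the pipe are escaped for Python's re.
EMPTY_DEFINITION = re.compile(r"# *\{\{пример\|\}\}")


def count_empty_definition_lines(lines: Iterable[str]) -> int:
    """Count lines containing an empty '# {{пример|}}' definition line."""
    return sum(1 for line in lines if EMPTY_DEFINITION.search(line))
```

It could be run over the unzipped dump by opening "ruwiktionary-20140519-pages-articles.xml" with encoding="utf-8" and passing the file object as `lines`, so the file is read line by line rather than loaded whole.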
Re: "please generate sample entries as an example and to check the quality": What the heck? Check the quality? What quality? Algorithmic formatting of an entry? We know what is being discussed from the example posted at the beginning of the thread; no need to start generating "sample entries" before there is consensus for this deviation from status quo ante. --Dan Polansky (talk) 11:13, 31 May 2014 (UTC)[reply]
Did all entries in Category:Definitions needed require someone's sanction? They don't violate any format, do they? And opposition to the idea doesn't mean that discussion can't go on or we should stop editing. The example entry looks fine but I want to see how accurate everything else is and whether entries need other tags/categories. My only criticism is the location of the stress mark in the way Ivan has done (but I couldn't do it better). It should be /derʒˈpɔzekɐ/, not /derʒpˈɔzekɐ/ but that's a minor issue, it's not easy to determine the syllable onset.
I have just created держпо́зика (deržpózyka) as per the suggested format above, leaving it definitionless for now. --Anatoli (обсудить/вклад) 12:45, 31 May 2014 (UTC)[reply]

Support. Makes achieving the eventual goal quicker. Chinese Wiktionary sort of used the same strategy: zh:知了, zh:獨立. Wyang (talk) 10:21, 31 May 2014 (UTC)[reply]

Status quo: I contend that the status quo in English Wiktionary is to avoid having large masses of definitionless entries. {{rfdef}} is used in less than 1500 entries. --Dan Polansky (talk) 11:04, 31 May 2014 (UTC)[reply]

  • Oppose -- Liliana 12:32, 31 May 2014 (UTC)[reply]
  • I oppose too. Add the terms as you get definitions for them, even if it's just a one-word gloss. To save typing, you could create a pseudo-template in your userspace and just have your bot create pages with something along the lines of {{subst:User:Ivan Štambuk/uk-entry|держпо́зик|а|f-in|gloss}}. —Aɴɢʀ (talk) 13:29, 31 May 2014 (UTC)[reply]
  • Strongly oppose. I don't want us to end up like the Russian Wiktionary with tons of entries that fool you into thinking they exist but really lack definitions. --WikiTiki89 13:49, 31 May 2014 (UTC)[reply]
  • Oppose per Wikitiki and Keφr: it would mean we could no longer tell based on blueness vs orangeness vs redness of link whether an entry existed (with a definition) or not, and hence whether we still needed to define it or not. If there were a way to distinguish definitionless entries from complete entries (say, by modifying the gadget that creates orange links), I wouldn't mind as much, though I'd still think it best practice to include definitions in one's entries, and not create entries without definitions, except in rare cases where you e.g. add citations but can't figure out what they mean, and so add {{rfdef}}. The reference to the zh.Wikt is interesting, since it brings up the point that about thirty thousand of en.Wikt's Chinese-character entries (a majority?) have had many definitionless language-sections ever since they were created eight years ago. - -sche (discuss) 16:47, 31 May 2014 (UTC)[reply]
  • Strongly oppose per Dan Polansky and Kephir. —Mr. Granger (talkcontribs) 18:28, 31 May 2014 (UTC)[reply]
  • Vahag is on the fence. I hate Russian Wiktionary's empty pages, that cheat you into believing there is content down the link. But Ivan's pregenerated are not empty; they have a lot of useful content that can be valuable even without a definition. --Vahag (talk) 19:32, 31 May 2014 (UTC)[reply]
    Well, ru:стыковка is not entirely free from information either. It tells you that (a) it is a Russian word, (b) a noun, (c) inanimate, (d) it is feminine, (e) inflection table. What Ivan adds is (f) pronunciation, and (g) a link to possibly a good dictionary. Of all these, (g) might be most valuable, from my standpoint, since taking the reader to a page that has a lot of the sought info about the specific word is quite worthwhile. --Dan Polansky (talk) 19:51, 31 May 2014 (UTC)[reply]
    That useful stuff was added to Russian Wiktionary pages incrementally, over time. Ivan is providing them in version 1.0. --Vahag (talk) 20:09, 31 May 2014 (UTC)[reply]
Support. The advantages (presence of useful content that wouldn’t be there otherwise and making it easier for people not familiar with the Wiktionary entry format to add definitions) override the disadvantages (uselessness of link colouring). As for “report[ing] to have 116 000 Ukrainian entries”, I can update the program that calculates the statistics to ignore {{rfdef}}s. — Ungoliant (falai) 20:06, 31 May 2014 (UTC)[reply]
  • To address some of the points raised and extend the rationale further:
    • Definitions are not necessarily the key content. We have many users that exclusively come because Wiktionary is the only place on the Internet you can look up e.g. inflections or IPA transcriptions for words in many foreign languages. For example, on feedback often there are students thanking for Latin inflections.
    • Entries are never complete but are instead a permanent work in progress. We have many entries that consist entirely of a PoS header and a single-word definition. These entries are much more useless in terms of value provided than the prototypical stub I listed above. Not to mention that 90%+ of Wiktionary entries are inflected forms that don't have any definitions at all.
    • The suggestion to check whether the blue link contains definition or not is intriguing. But this is already the situation for everyone unless they have User:Yair rand/orangelinks2.js imported. So there is no difference for 99.9% of Wiktionary users. The {{rfdef}} template inserts the entry into a language-specific category, so the script could be modified to check for it.
    • The push toward "automatic inflection", i.e. reimplementing the language's generative phonology and inflectional morphology in Lua, is IMHO ultimately not worth it. The user doesn't care how the content was created, and creating dependencies that span the entire individual language's specific PoS category with complicated execution logic like Module:ru-verb for Russian verbs is just begging for accidental runtime errors and bugs in the future. Inflection is just static and immutable data and it should be treated as such. Perhaps one day it could be relocated to e.g. WikiData, because WMF decided to implement lemmatization-enhanced search box across all wikis (both pedias and other wiktionaries), or all wiktionaries decided to pull inflections from a single storage.
    • Personally I think Tbot was the best thing to happen on Wiktionary ever. Checking and expanding its entries is very fun and was adopted by many casual editors. If necessary, these pregenerated entries could have their definition lines changed to have meanings and glosses from the translations of the corresponding English entry if it exists (for vast majority unfortunately it doesn't), or taken from e.g. Google Translate. This could be done in a second pass and it's really easy to do.
    • Scoring entries similar to Wikipedia is an excellent idea, that has already been suggested before. It would be nice to have entries tagged by completion and missing data, as well as having a language-wide average score reported in statistics. The mere number of entries is a very poor metric. But this is really irrelevant for this discussion.
    • I think that the attitude of rather having a small number (a few thousand) of quality entries done by a limited number of editors rather than 200k (a lower figure for any language with a developed literature) of which 90% are likely to be stubs is detrimental to the chief purpose of dictionary which aims to list every used (and unused) word. There are too many words, too many languages, and too few editors. Speeding things up is essential, and that includes lowering the bar on participation to editing only the missing/incomplete definition line, disincentivizing the costly creation of automatic inflection modules by eliminating the problem which they are trying to solve, and leveraging existing architecture such as definition extraction from FL entries in translation tables and the corresponding other-language wiktionaries (like Tbot used to do) to reduce the average editing time per entry. Stubbing on a mass scale would eliminate the worship of entry counts altogether and would instead incentivize adding quality content. --Ivan Štambuk (talk) 21:14, 31 May 2014 (UTC)[reply]
    • So instead of reimplementing the language's generative phonology and inflectional morphology in Lua, you suggest reimplementing these in bots, in which systematic errors are just as probable (if not more), harder to correct (because someone has to run a bot again to fix template invocations, and mine dumps to find out which pages are affected by an error) and harder to notice (for one, because bots' source code is usually not open to public scrutiny the way modules are)? Keφr 17:50, 1 June 2014 (UTC)[reply]
      No, but rather taking them from morphological lexicons, inflection engines or digitized dictionary headwords which are available on the Internet. The problem is already solved by taxpayer-funded research, and there is no need to reinvent the wheel. I mean you could do it, but it seems wasteful. I'm sure that writing Module:pl-verb would be loads of fun, but life is too short. --Ivan Štambuk (talk) 18:05, 1 June 2014 (UTC)[reply]
StubCreationBot (talkcontribs) has started to create stub entries despite notable lack of consensus. --Dan Polansky (talk) 11:49, 1 June 2014 (UTC)[reply]
I'll be running it in small batches and manually checking each entry. That seems to be the least "controversial" solution. --Ivan Štambuk (talk) 18:05, 1 June 2014 (UTC)[reply]
I worked on these small batches just now by adding definitions. It's very easy and a lot of effort was saved. So I'm switching to support. --Vahag (talk) 18:11, 1 June 2014 (UTC)[reply]
If it is so easy to fill-in the gaps left in the definitionless entries, why does Russian Wiktionary have over 100 000 definitionless definition lines? Also, why does none of the supporters present an accurate and verifiable report on how well this worked in Russian Wiktionary, so we can learn from their experience? --Dan Polansky (talk) 19:33, 1 June 2014 (UTC)[reply]
Currently, there are about 60,000 definitionless Russian entries in the Russian Wiktionary, down from 108,000 in 2008. They have a much smaller number of editors. I doubt these 48,000 entries would have been created otherwise. --Anatoli (обсудить/вклад) 20:35, 1 June 2014 (UTC)[reply]
Is it correct that those definitionless entries are in the native tongue of the Russian Wiktionary, while the planned definitionless entries in English Wiktionary are in languages that have only few contributors in English Wiktionary, such as Ukrainian? So do you really think that the overall number of editors of English Wiktionary helps definitions of Ukrainian terms here being entered faster than the definitions of Russian terms were entered in Russian Wiktionary? --Dan Polansky (talk) 05:49, 2 June 2014 (UTC)[reply]
I don't know, Dan, I just support the idea, hoping to attract more/new editors. It's not just Russian words that were auto-generated - Ukrainian, English were imported as well but I don't know the original number to check the progress. There are a lot of enthusiastic editors with some knowledge of Slavic languages here who spend more time in the mainspace, not in the Beer parlour. To add a definition, you don't need to be fluent in a given language, suffice to consult a dictionary and many are obvious. Some words can already be confirmed by existing translations from English, Swadesh lists, etc. There are many obvious loanwords, how hard is it to translate e.g. авока́до (avokádo) or австрі́єць (avstríjecʹ) from Ukrainian into English? Gradual importation seems like a compromise, doesn't it? --Anatoli (обсудить/вклад) 06:22, 2 June 2014 (UTC)[reply]
Putting such definitionless entries in the mainspace and making them searchable and visible to casual users seems very unwise to me. Equinox 19:42, 1 June 2014 (UTC)[reply]
It’s not very elegant, but it’s an improvement on having absolutely nothing on a word. — Ungoliant (falai) 20:15, 1 June 2014 (UTC)[reply]
I don't think having them searchable and visible to non-editors is an improvement. As a user I would think, "oh, it's that site that has pages for words it can't even define" — similar to the feeling I get when Google "corrects" my spelling and gives irrelevant results rather than admitting there are no results. Equinox 15:01, 2 June 2014 (UTC)[reply]
Even if users think that, they may also think “but they have pronunciation and inflection so I can count on Wiktionary whenever I’m in need of that” (better yet: “I know the definition, I’ll add it”), which is better than “they have absolutely nothing on this word, why bother coming here at all?”. — Ungoliant (falai) 15:16, 2 June 2014 (UTC)[reply]
Maybe instead of conjecturing what users consider useful or not, we should do a reader survey? Do we have resources for that? Keφr 18:23, 4 June 2014 (UTC)[reply]
I'm changing my vote to weak support. I guess I got too caught up in the flaws of the Russian Wiktionary to realize that our situation is different. Ivan makes some good arguments here. I still think that we have to be cautious with this. We should only do it if there are people willing to be actively working to fill in the definitions, and we should only do this for languages for which we have editors capable of adding definitions. --WikiTiki89 22:20, 1 June 2014 (UTC)[reply]
Yes- I'd support if the bot doesn't create any new entries after some threshold of definitions needed is reached- maybe 500. That makes sure it isn't creating entries when there is nobody looking at them. DTLHS (talk) 22:21, 1 June 2014 (UTC)[reply]
I weakly support the proposal, but I definitely oppose abandoning automatic declension tables, for the reasons that have been given by Kephir. Templates and modules are not just there for convenience of editors; otherwise we'd just subst them all. They are a huge asset to maintenance and should definitely continue in use. —CodeCat 22:30, 1 June 2014 (UTC)[reply]
I wouldn't support it either, if abandoning automatic declension tables was the plan. --Anatoli (обсудить/вклад) 06:22, 2 June 2014 (UTC)[reply]
I agree with DTLHS's notion of limiting the total number. If there already are more than one thousand entries with missing definitions in a language, perhaps the other missing definitions merit attention first. DCDuring TALK 19:01, 2 June 2014 (UTC)[reply]
I agree now with setting the limit but 500 and 1000 is too low. It should be at least 5,000, IMO. Preferably for words starting with various letters, not just "А". One reason: It happens that majority of Ukrainian, Russian words starting with "А" are loanwords and not the most common, useful words. --Anatoli (обсудить/вклад) 00:08, 3 June 2014 (UTC)[reply]
We could use a frequency list to determine the order to add words. --WikiTiki89 05:05, 3 June 2014 (UTC)[reply]
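
The cap and the frequency ordering suggested above could be combined along these lines. This is a sketch under stated assumptions, not the behaviour of any existing bot: the function `next_batch`, its parameters, and the default of 500 (the lowest figure floated above) are hypothetical, `candidates` is assumed to be already sorted by a frequency list, and `backlog` is assumed to be the current size of the language's definitions-needed category.

```python
from typing import List


def next_batch(candidates: List[str], backlog: int,
               cap: int = 500, batch_size: int = 50) -> List[str]:
    """Pick the stubs to create next without exceeding the backlog cap.

    candidates: words not yet created, most frequent first (assumed).
    backlog:    entries currently awaiting a definition (assumed input).
    cap:        maximum tolerated backlog; the value is open to debate.
    """
    room = max(0, cap - backlog)          # how many more stubs are allowed
    return candidates[:min(batch_size, room)]
```

With a backlog already at or above the cap, the function returns an empty list, so the bot would create nothing until editors have filled in some definitions.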
@Ivan Štambuk What is the source of the Ukrainian data that you are proposing to upload, those 116 000 nouns including gender, inflection, and animacy? --Dan Polansky (talk) 06:45, 3 June 2014 (UTC)[reply]

Support  How can we object to adding information to the dictionary? How is no definition worse than no entry at all? If you want orange links to indicate missing definitions instead of missing entries, then let’s ask someone smart to make them do that.

Can someone smart make a widget that gives me a notification in the red box at the top, offering a random definitionless Ukrainian entry every time I visit Wiktionary? Michael Z. 2014-06-03 07:34 z

Re: "How is no definition worse than no entry at all?": If you Google a word, find it has an entry in English Wiktionary, click it, and only find trivia, you may feel as cheated as I do when this happens to me with Russian Wiktionary; consequently, your esteem of the English Wiktionary project is negatively impacted, a negative outcome from the standpoint of those editors who have donated considerable resources to build it. The key disagreement seems to be that what I consider to be trivia some other editors consider to be "useful content". On another note, if Russian Wiktionary and other Wiktionaries are already busy creating definitionless stubs, then users will be able to find what they were looking for there, the Wiktionaries will massively outperform the English Wiktionary in their ability to provide this "useful content" (trivia by my lights), and if this is what dictionary users are really looking for, then it should ultimately show up in page view statistics in http://stats.wikimedia.org/wiktionary/EN/ReportCardTopWikis.htm. --Dan Polansky (talk) 09:20, 3 June 2014 (UTC)[reply]
If I search for a word in Wiktionary and find its gender, part of speech, translations, or pronunciation, &c., I may feel less cheated than if I find a null. I may also add a definition. We’re a repository for lexicographical information, not only definitions, and we are a work in progress. If an editor has some of that information, I would be grateful if he would add it to the project instead of sitting on it. Michael Z. 2014-06-04 13:06 z
Trivia or not, it's still occupying a large chunk of every entry. It should be added sooner or later, and I really see no point in arguing that it should be added later rather than sooner solely because it would otherwise cheat the user out of what they would deem the most relevant part of the entry, in most cases should they end up reading it. Why? Because the purpose of this project is not user satisfaction. There are specialized dictionaries that only provide accents or inflection of words, some that only provide etymology - there is no problem in having only etymology, inflection or pronunciation section without definitions because it's supposed to be there regardless. --Ivan Štambuk (talk) 20:49, 4 June 2014 (UTC)[reply]
  • Support - it is better to have some of the possible information being sought than none at all. The rest can and will be filled in over time. However, perhaps as an interim measure the entries could be created in Wiktionary space (e.g. Wiktionary:autogenerated/word) and moved to mainspace when the definition is added. bd2412 T 18:47, 3 June 2014 (UTC)[reply]
  • User:Ivan Štambuk: Question — does your bot try to import translations from tables? Because right now, when Special:Searching for a redlinked word, at least User:Yair rand's gadget will generate a list of places where the word appears in translation tables (e.g. [1]). Creating an entry disables that. So this is another advantage of having redlinks. I am not sure how many Ukrainian translations it would impact, however. Keφr 13:55, 4 June 2014 (UTC)[reply]
You can still search even if an entry has been created ([2]). In the search box, you need to linger a little longer and select "containing ...". @Mzajac Not a smart widget, but Ukrainian entries asking for definitions are here: Category:Ukrainian definitions needed. --Anatoli (обсудить/вклад) 14:02, 4 June 2014 (UTC)[reply]
I know that. This is to point out yet another piece of our infrastructure that is built with the assumption that if the entry exists, it already contains the definition you need. Which is not always right, but mass generation of definitionless entries would thwart it even more. (Also, note the additional cumbersomeness of searching for a term for which an entry already exists.) Keφr 18:00, 4 June 2014 (UTC)[reply]
That gadget is obviously poorly written and that's not my problem. --Ivan Štambuk (talk) 20:25, 4 June 2014 (UTC)[reply]
How about this variation on my above thought: we autogenerate all the entries in a Wiktionary:subpage/ space, with a blank definition line, and then have a bot comb the entries and move them to mainspace whenever a definition is added. The entries will be made and will require nothing by editors beyond the addition of definition lines. bd2412 T 22:57, 4 June 2014 (UTC)[reply]
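The staging workflow proposed above could be sketched roughly as follows. This is only an illustration, not an existing bot: the function names are hypothetical, and it assumes that a definition line starts with "#", that "#:" and "#*" lines are examples or quotations, and that an unfilled definition is either empty or a lone request template such as {{rfdef}}.

```python
import re

# Assumption: leading "#" characters mark a definition line; an immediately
# following ":" or "*" marks an example or quotation line instead.
DEF_LINE = re.compile(r"^(#+)([:*]?)(.*)$")
# Assumption: an unfilled definition is a lone request template such as {{rfdef|...}}.
PLACEHOLDER = re.compile(r"^\{\{rfdef(\|[^{}]*)?\}\}$")


def has_definition(wikitext: str) -> bool:
    """Return True if at least one definition line carries real content."""
    for line in wikitext.splitlines():
        match = DEF_LINE.match(line)
        if not match or match.group(2):
            continue  # not a definition line, or an example/quotation line
        body = match.group(3).strip()
        if body and not PLACEHOLDER.match(body):
            return True
    return False


def pages_ready_to_move(staged: dict) -> list:
    """Titles of staged pages (title -> wikitext) that now have a definition."""
    return [title for title, text in staged.items() if has_definition(text)]
```

A bot would then move only the titles returned by `pages_ready_to_move` to mainspace and leave the remaining stubs in the staging space.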
Changed the script to work even when the entry exists. One still has to go to the search page, though. --Yair rand (talk) 21:49, 8 June 2014 (UTC)[reply]
Oppose (in case the discussion is still going). I think the intermediate request page will do better. Besides, the 116k articles Ivan talks about are mainly outdated or combined words, with a great number of regionalisms. I doubt the greater part of them will be populated any time soon. On the other hand, there are hundreds of words used in the modern language that are absent from that dictionary, so having the option to autocreate the articles one is personally going to work on is worth doing.

By the way, speaking of ru.wiktionary, one should also look at the Ukrainian or French parts of it for an illustration of what we will get here in case of mass creation (save lesser number of total crap of entry parts). Alexdubr (talk) 18:09, 12 June 2014 (UTC)[reply]

Uhm, all of the words are taken from the published Ukrainian corpus and are currently generated in order of frequency of appearance in normalized form in modern newspapers. Just because you personally haven't heard of some words doesn't mean that they don't exist, or that they should be treated as being of lesser importance. Regionalisms and outdated words are of particular importance because that's what gives Wiktionary an edge over paper dictionaries, which can't afford to list them. Experiences from desperately understaffed sister wikiprojects are irrelevant. --Ivan Štambuk (talk) 18:35, 12 June 2014 (UTC)[reply]
Implying we are not desperately understaffed. Keφr 19:02, 12 June 2014 (UTC)[reply]
From Category:Ukrainian definitions needed: авансодавець, for example, is a possible word, but I really doubt it exists in practice. авансик and автобусик are diminutives; do you propose to add a diminutive form of every word, or just at random? авіахім is a historicism, which has no equivalent in English, I think. After the letter A, though, the entries really do seem to come from a contemporary corpus. Alexdubr (talk) 19:11, 12 June 2014 (UTC)[reply]
These are all real words that were taken from a written and published (non-Internet) corpus, all of them passing CFI. The remaining A-words were generated alphabetically, the rest by frequency of appearance. --Ivan Štambuk (talk) 19:37, 12 June 2014 (UTC)[reply]
@Kephir: There are levels of desperateness. What is much worse however is this blind idolatry of manual labor in face of new and exciting technologies that offer a significant productivity boost. Rather than leveraging opportunities that would be inexcusable to ignore, neo-Luddites prefer to stick to the stone age methodologies that on average produced a new dictionary edition once a decade (if that often). --Ivan Štambuk (talk) 19:37, 12 June 2014 (UTC)[reply]
Despite my previously noted opposition to the creation of masses of definitionless entries, I think the way things have been proceeding is OK: StubCreationBot has created ~630 entries, but Ivan and possibly other users have then defined the majority of them, such that there are only ~230 entries in Category:Ukrainian definitions needed. I like the suggestion made by several users above of allowing the bot to create stubs, but having it pause after making ~500 or so and wait until people have defined some of those before proceeding. - -sche (discuss) 19:41, 12 June 2014 (UTC)[reply]
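The pause-and-resume rule suggested above can be put as a small calculation. This is only a sketch: the function name and the batch size of 100 are assumptions for illustration, while the cap of 500 comes from the "~500 or so" mentioned in the discussion.

```python
def next_batch_size(pending_undefined: int, cap: int = 500, batch: int = 100) -> int:
    """How many new stubs the bot may create right now.

    pending_undefined: current size of the definitions-needed category.
    cap: maximum number of undefined stubs tolerated at once (~500 per the thread).
    batch: hypothetical upper bound on stubs created per run.
    """
    room = cap - pending_undefined
    return max(0, min(batch, room))


# With ~230 entries still awaiting definitions, a full batch is allowed.
print(next_batch_size(230))
# At or above the cap, the bot waits until editors define more entries.
print(next_batch_size(500))
```

The bot only proceeds when the backlog has shrunk, so the number of definitionless entries in mainspace never exceeds the cap.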
@Alexdubr. I confirm that all pregenerated terms are valid Ukrainian words (if this confirmation is required). As I previously mentioned, they are not necessarily the most common, useful or otherwise "interesting" words. Dated, regional and rare words and diminutives also qualify as words and are, of course, includable. Words without English equivalents can be described; you don't have to give a one-word definition. You're under no obligation to help, though. Ivan has already promised to use frequency lists, which are already making adding definitions easier and more motivating. --Anatoli (обсудить/вклад) 23:50, 12 June 2014 (UTC)[reply]
A small update. Currently at only 104 entries and I haven't worked hard on Ukrainian. There are still many quite rare (but REAL) terms. Some terms have multiple rare etymologies. I think it's OK to remove them and add later, if they are found. E.g. I have no idea what type of plant can be called "роман" in Ukrainian and I'm no expert in plants. --Anatoli T. (обсудить/вклад) 07:47, 21 July 2014 (UTC)[reply]
It's in Bilodid (see references) - a rare alternative form for ромен (romen).
Anyway, I'm exploring other options now:
  1. loading those skeletons via JavaScript from my server when you click them in the redlinked translations, along with translation + gloss from the base entry. This would however force editors to first add translations to English terms and then create the main entry. I'm not sure that we want that, because many FL words don't have exact English equivalents. Many English terms also don't have translation boxes and it's PITA to add them.
  2. Dumping all of the skeletons in a format that GoldenDict can read, so that when you do Ctrl+C (or merely point your mouse if you configure the program that way) you can just copy the preformatted entry from the popup window to clipboard and paste it into the Wiktionary edit box. This has the benefit of also looking up uk words in installed dictionaries for GoldenDict (such as Bilodid, or other uk-en, uk-ru dicts).
What would be the optimal workflow? --Ivan Štambuk (talk) 10:40, 21 July 2014 (UTC)[reply]
I also use accelerated entry creation from translations, User:Ruakh/Tbot.js (it needs some small fixing), which works OK for Russian: it just generates the language header, the PoS header, and a translation with a gloss. It would work for Ukrainian with addOnloadHook(function() { Tbot.greenifyTranslinks('uk'); });. What you're suggesting is quite interesting. So the formatted entry you're talking about can be generated from a translation? Could you demonstrate with a red-linked Ukrainian translation, such as квито́к (kvytók) at ticket? --Anatoli T. (обсудить/вклад) 23:14, 21 July 2014 (UTC)[reply]

Mass creation of definitionless entries

For those who have not noticed, a proposal to create definitionless entries en masse is being discussed in the #Pregenerating entries thread. --Dan Polansky (talk) 19:35, 1 June 2014 (UTC)[reply]