User talk:FinalCeiling

From Wiktionary, the free dictionary

Mass adding usexes

You have been mass adding usexes to Dutch entries which seem to be generated using ChatGPT. This is generally fine, but you're adding way too many to be helpful to anyone, and you're also not checking them for errors. Please slow down or stop, and please review the ones you've added already. Stujul (talk) 12:59, 5 August 2023 (UTC)

I would recommend limiting them to two per sense at most and exercising judiciousness about when to add more than one. The general quality seems good enough to me. ←₰-→ Lingo Bingo Dingo (talk) 15:11, 5 August 2023 (UTC)
Is there an "expand for more" template for usage examples? I couldn't find anything. It's inexcusable that often dozens of quotations (for language learners, rather useless and annoyingly verbose) are embedded inline for word meanings, whereas cohesive and illustrative usexes are frowned upon. FinalCeiling (talk) 15:16, 5 August 2023 (UTC)
Usexes plainly are not frowned upon. I think that most of the usage examples you have added are acceptable, but some more judgement is needed with regard to their quantity. The reason quotations are often favoured is that they attest. I don't know of a template like that. ←₰-→ Lingo Bingo Dingo (talk) 15:22, 5 August 2023 (UTC)
Indeed, but there is the Citations namespace for attestations. What is also funny: on User:Stujul's page I read that "Pages should present the user with as much information as possible" 😊. I've added a second pass to my AI prompt to catch grammatical issues such as the one on neerschieten. FinalCeiling (talk) 15:58, 5 August 2023 (UTC)
Thank you. I want to reiterate that, in general, I think the grammar of the output is appropriate for an example sentence. ←₰-→ Lingo Bingo Dingo (talk) 17:29, 5 August 2023 (UTC)
"As much information as possible" does not mean adding five example sentences when one or two would suffice, especially because they often don't provide any more information anyway. The second part of that sentence is "while still being easy to read", which I think very much applies here too 😊 Stujul (talk) 20:39, 5 August 2023 (UTC)
User:Stujul I understand your perspective, but for language learners, usexes are by far the most important part of a dictionary. I would like to have hundreds of usexes per sense, covering every collocation and every typical usage context imaginable. A random subset could then be displayed to the person who opens the entry. Ideally these would be stored in a centralized data store such as Wikidata, so that other projects could reuse them and a language-specific Wiktionary would just pull the usexes on demand (based on parameters such as the target translation language, the reader's configured language level, and the frequency of the usage context). Based on some quick searches, though, I don't think that's technically possible at the moment. Regardless, I will stick to the guideline of a maximum of two to three usexes per sense. FinalCeiling (talk) 20:48, 5 August 2023 (UTC)
In my opinion, this only really applies when the lemma is hard to translate, or when the definition could be ambiguous. With a word such as zandstorm, do you really need more than one usex?
I feel like what you are describing is a different project all on its own. It reminds me very much of Tatoeba. Stujul (talk) 22:16, 5 August 2023 (UTC)
User:Stujul I've turned on the "inline" parameter on zandstorm so that the usexes now take up half as much space. Does this address your concerns? I can make sure the AI generates usexes up to a certain number of characters, so that even in "inline mode" both the usex and its translation always fit on a single screen line.
Tatoeba looks cool, thanks for sharing that link. However, the usexes there appear to be written by humans and are quite simplistic by comparison. FinalCeiling (talk) 06:43, 6 August 2023 (UTC)
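The single-line check described above could be as simple as the following sketch (hypothetical helper, not existing Wiktionary tooling; the 80-character budget and the example sentence pair are assumptions for illustration):

```python
MAX_CHARS = 80  # assumed per-line budget; tune to a typical screen width

def fits_inline(usex: str, translation: str, limit: int = MAX_CHARS) -> bool:
    """Return True if both the example sentence and its translation
    fit within the per-line character budget, i.e. the generated pair
    is short enough for single-line inline display."""
    return len(usex) <= limit and len(translation) <= limit

# Hypothetical generated pair for the entry zandstorm:
print(fits_inline("Er woedde gisteren een zandstorm.",
                  "A sandstorm raged yesterday."))  # True
```

A post-generation check like this lets overly long AI output be rejected and regenerated instead of trusting the model to respect a length instruction.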
Generating usexes with AI should currently be a no-no for licensing reasons. It is not clear to whom the copyright of such examples would belong. — SURJECTION / T / C / L / 17:33, 5 August 2023 (UTC)
User:Surjection As far as I know, there are no restrictions for Wikimedia projects even with non-permissive models; the terms only apply to commercial use. Besides, no one can prove that I am using a non-permissive model either way. FinalCeiling (talk) 19:45, 5 August 2023 (UTC)
It is understood in the licensing policy that the contributor holds the copyright or is authorised to act on behalf of the copyright holder. To whom the copyright of AI-generated works belongs is a grey area at best. Can the contributed content even be licensed in this case? I think that Surjection has pointed at a serious and pressing concern here. ←₰-→ Lingo Bingo Dingo (talk) 21:23, 5 August 2023 (UTC)
User:Lingo Bingo Dingo All commercial AIs are trained on Wikipedia and probably even Wiktionary. The generated usexes (and other information) cannot be licensed unless they are provably used verbatim elsewhere, which can easily be bypassed by making the AI more "creative" (e.g. by increasing the model's "temperature"). A grey area is also perfectly fine; just look at Wikipedia's "fair use" policy. Small usexes cannot be "watermarked", so there is always plausible deniability regarding their origin even if the matter were black and white. FinalCeiling (talk) 21:35, 5 August 2023 (UTC)
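For readers unfamiliar with the term: "temperature" divides a language model's logits before the softmax, so a higher value flattens the output distribution and makes less likely tokens appear more often. A minimal standalone sketch of that mechanism (plain Python, not any particular model's API):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply softmax.

    Higher temperature flattens the distribution, making sampling
    less deterministic ("more creative" output)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=2.0)
# The most likely token loses probability mass as temperature rises.
print(low[0] > high[0])  # True
```

This is why a higher temperature reduces verbatim reproduction of training text: the sampler strays more often from the single most probable continuation.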

adjective + singular diminutive

It seems that the model you are using does not handle the inflection of attributive adjectives in noun phrases containing a singular diminutive well. Could you perhaps check those manually, or avoid such combinations in your examples? The results are on the whole a lot better in other cases. ←₰-→ Lingo Bingo Dingo (talk) 18:36, 29 August 2023 (UTC)