Wiktionary:Beer parlour/2021/August: difference between revisions

Revision as of 18:08, 16 August 2021

temporary right to move pages without redirect

I would like a temporary extended mover right for one week. I will be moving pages like Sanskrit परिच्छेदासः (paricchedāsaḥ), which is a Vedic Sanskrit form of a term found only in Classical Sanskrit, to [[user:Svartava/...]] (so that it is not in mainspace). See google books:परिच्छेदासः: the word is completely unattested. The Google hits for the term are from Wiktionary and other Wiktionary-based sites. Similarly, there are some non-lemma inflections like डालाभ्याम् (ḍālābhyām), which are possible but completely unattested. They are shown on the main page डाल (ḍāla) (in the declension table), so anyone who searches for this form after it has been moved will be guided to the main page and can see which form it is. There are a lot of such pages needing to be cleaned up. Related link: https://en.wiktionary.org/wiki/Wiktionary:Requests_for_deletion/Non-English#कदाय,_कदासः,_कदेभिः,_कदेभ्यः,_कदेभ्यस्,_खेदासः,_खेदेभिः,_राज्ञीभ्याम्,_राज्ञ्याम्,_राज्ञ्यौ,_शुख Also informing user:Bhagadatta — Svā rt ava • 08:23, 1 August 2021 (UTC)

@Svartava: see Wiktionary:Whitelist, expires at 23:16, 9 August 2021 Kutchkutch (talk) 23:22, 2 August 2021 (UTC)[reply]

Where was this policy approved? While one could sanely exclude परिच्छेदासः (paricchedāsaḥ) on the basis that no one composes good Vedic Sanskrit using later Sanskrit vocabulary, the only argument I can see for automatically excluding words like डालाभ्याम् (ḍālābhyām) is that no one uses the lemma in compositions nowadays. There are plenty of words around that don't occur in Google Books - I have found Google Books generally useless for much of the Pali that I am recording. Instead, we have RfV for inflected forms like डालाभ्याम् (ḍālābhyām). It's also been argued that we should allow perfectly possible word forms like this to be generated by bots. I don't like that being done, but we should follow proper procedure. The absence of this form is surely just an accidental gap. --RichardW57m (talk) 14:47, 3 August 2021 (UTC)

@RichardW57m This page does no good. It is already on the main page. We don't need it. For example, have a look at User:Svartava/कृपाभ्याम् (ins//dat//abl of कृपा (mercy) in dual, i.e. 2 mercies). Moving this is, at least, approved by User:Bhagadatta. (Note that the Google and/or Books results are just examples of how feminine ā-stem terms decline. कृपा, being a short 2-letter word, is convenient to use.) — Svā rt ava • 04:29, 4 August 2021 (UTC)

@Svartava: I don't like the addition of inflected forms without a reason. Reasons for creation include missatisfied links (orange links are an option only available to registered users), their being alternative citation forms (many treat Pali nominative singulars as citation forms - there's a massive backlog there), and homography with lemmas. They also make sense for organising quotations evidencing their existence. They may also need a separate entry for the pronunciation to be recorded adequately. However, despite my dislike of the unnecessary entries, I still accept that we should follow due process when removing entries. --RichardW57m (talk) 09:44, 4 August 2021 (UTC)[reply]

@RichardW57m I mentioned clearly what I will do. 2 administrators have given the approval. The process would actually take a lot of time, so I am just moving these w/o wasting any more time. You can consider this equivalent to {{speedy}}. — Svā rt ava • 10:51, 4 August 2021 (UTC)

Administrators have no special status in determining which pages to delete. Their job is to implement the consensus of the community. If there is any uncertainty about whether there is such a consensus, it is safer not to delete the page, but instead flag it RFD or RFV.

@Svartava, Sodhaksh: I actually consider it vandalism. The only saving grace is that the pages are still accessible. They can still act at least in parts as decoys for forms in other languages. I agree with Benwing2 that these are not cases for {{speedy}}. How will we ever know whether there was consensus to delete these pages? @Kiril kovachev. I'd be happier with the process if Sodhaksh publicly conceded that he should not have created these entries.--RichardW57m (talk) 12:41, 4 August 2021 (UTC)[reply]

I hereby publicly admit that I should not have created these entries --SodhakSH (talk) 13:19, 4 August 2021 (UTC)[reply]

An example of a decoy is Pali अनेन (anena). I thought it was helpful when I created it, because stemming algorithms will not recover the lemma from this form. It turns out that it hides the instrumental singular of three Sanskrit words - two corresponding pronouns and the noun अन (ana, “breath”). The page will ultimately be needed for a lemma, because that word is also a Sanskrit lemma in the MW dictionary. I'm torn between leaving the page as an example decoy, and adding the soft links to the Sanskrit lemmas to the page. --RichardW57m (talk) 12:41, 4 August 2021 (UTC)

@RichardW57m Why is it that I was mentioned here? I am sorry, I don't think I'm competent enough to add anything meaningful, unless there's a way I can be useful to you. I sadly don't know anything about Sanskrit. What is a decoy? Why is Pali अनेन (anena) a decoy, and what's the significance of that? Sorry for my ignorance. Kiril kovachev (talk) 17:02, 4 August 2021 (UTC)[reply]

@Kiril kovachev: You asked for permission to run a bot adding inflected forms. Svartava was asking for (and has been granted) permission to semi-remove manually added grammatical inflected forms with no alternative forms which a simple test finds to be unattested. The page अनेन, which only lists Pali words, is a decoy because the obvious search tools will not find Sanskrit अनेन - they will report the Pali word instead. One has to explicitly look for a link to अनेन to find the Sanskrit words, which appear in declension tables. --RichardW57m (talk) 12:58, 5 August 2021 (UTC)[reply]

RichardW57m, SodhakSH I am a bit confused. What is the official Wiki policy for non-lemma forms? Do we keep those that have any attestation? For example the Sanskrit अनेन has a usage in the Bhagavad Gita - see here. Do we keep it? Asking just so that I don't create unnecessary pages in the future. Rishabhbhat (talk) 15:24, 6 August 2021 (UTC)

@Rishabhbhat thanks for asking. till now i haven't read any 'policy' on this, but actually unattested forms are pretty useless and unnecessary. e.g. user:svartava/मूर्खताभ्याम्. anena is a very well attested form, it should be there. all you should avoid creating is unattested forms. we kept कृच्छ्रे for now but all of it isnt attested (in literature, not dictionary) like the feminine forms and the dual ones. — Svā rt ava • 01:38, 7 August 2021 (UTC)[reply]

@Rishabhbhat, svartava: There is no policy. We seem not to even have a policy for what to do when an inflected form fails an RfV and has its entry or page deleted! Should it be removed from the inflection table? --RichardW57m (talk) 14:07, 9 August 2021 (UTC)[reply]

The concern for 'policy' is reassuring, but I don't feel that the policy on deletion is being followed. --RichardW57m (talk) 14:07, 9 August 2021 (UTC)[reply]

@RichardW57m IMHO we should just speedy-delete impossible forms (e.g. Vedic forms of classical words) and the ones that are just ridiculous (e.g. कृपाभ्याम् - with/for/from two mercies?!?).
But we should keep those that are perfectly possible even if they are not attested, so that anyone searching for an inflection should find exactly what it means.

</opinion>

Rishabhbhat (talk) 04:52, 11 August 2021 (UTC)[reply]

permanent right to move pages without redirect

@Bhagadatta Any objection(s) to:

diff: can you make the moving right permanent and

~~for a period of 7 days~~ at WT:Whitelist

? Kutchkutch (talk) 18:03, 7 August 2021 (UTC)[reply]

@Kutchkutch: I'll defer to the direction of any other administrator regarding this. -- 𝓑𝓱𝓪𝓰𝓪 𝓭𝓪𝓽𝓽𝓪^{(𝓽𝓪𝓵𝓴)} 01:06, 9 August 2021 (UTC)[reply]

Automatic rhymes

See also: Beer parlour/2013/December

So I think it's due time to propose automatic rhymes once again.

The major current problem with rhymes is the fact that one has to add a {{rhymes}} template to the page in question, while also having to manually add it to the appropriate Rhymes: page. This results in those of us editing rhymes being unhappy with the double work, and many others not doing it at all.

Now, I can see two solutions for this problem:

Move all rhymes to categories, which will be automatically populated by pages using {{rhymes}}
Send a formal request to the mediawiki developers to make the Rhyme: namespace work like a category.

What are others' thoughts on this? Notifying @Rua, Dan Polansky, DTLHS, -sche, Atitarev, Ruakh, Erutuon, Vininn126, Fenakhay, Shumkichi. Thadh (talk) 23:24, 2 August 2021 (UTC)[reply]

I remember some people mentioning a problem with basing it on {{IPA}}, namely that languages treat rhymes differently. So maybe users can create the various rhyme pages, which are then entered into the {{rhymes}} template as they already are. The biggest change is that, by doing this, the page would automatically update with the new word. I support the automation of rhymes. Vininn126 (talk) 23:19, 2 August 2021 (UTC)

There's no need to modify the {{IPA}} template since {{rhymes}} already has the information needed to categorize into a particular rhymes category. The scope of this proposal should be made clear: to utilize the existing information contained in {{rhymes}}, or to automatically generate rhymes directly from the IPA string. DTLHS (talk) 00:57, 3 August 2021 (UTC)[reply]

I think it's easiest to go for the former, while leaving the latter to individual language communities. Thadh (talk) 06:25, 3 August 2021 (UTC)[reply]

Hard agree. It feels like the tools are already there, we just need to optimize them. Vininn126 (talk) 11:52, 3 August 2021 (UTC)[reply]

Moving rhymes to categories makes sense to me. —Ruakh_TALK 04:21, 3 August 2021 (UTC)[reply]

Agreed. Ultimateria (talk) 16:48, 4 August 2021 (UTC)[reply]

I support the idea, but there's at least one issue I can think of; we'd need some way to group the entries by syllable count, as many rhyme pages currently do, and I'm not entirely sure if the category infrastructure MediaWiki has lets us do that. — sur jec tion ⟨??⟩ 15:42, 5 August 2021 (UTC)[reply]

And if it doesn't, I don't really think it's a deal breaker either way. — sur jec tion ⟨??⟩ 15:57, 5 August 2021 (UTC)[reply]

Couldn't we just make something like {{dialectboiler}} with a parameter for the number of syllables? That seemed like the most reasonable way to me. Thadh (talk) 16:08, 5 August 2021 (UTC)

That could work. Something like word with a {{rhymes|en|ɜː(ɹ)d}} would end up in Rhymes:English/ɜː(ɹ)d (or Category:Rhymes:English/ɜː(ɹ)d), and with some kind of additional parameter such as {{rhymes|en|ɜː(ɹ)d|syllables=1}}, also in Rhymes:English/ɜː(ɹ)d/1 syllable (or Category:Rhymes:English/ɜː(ɹ)d/1 syllable). — sur jec tion ⟨??⟩ 18:31, 5 August 2021 (UTC)[reply]

I just now understood what you meant. Yes, that seems like a good idea to me, doing something like lemmas vs POS categories. Thadh (talk) 19:42, 5 August 2021 (UTC)[reply]

This is exactly the sort of thing I was thinking. Vininn126 (talk) 20:17, 5 August 2021 (UTC)[reply]

I've written an initial version of Module:User:Surjection/category tree/poscatboiler/data/rhymes (to eventually merge into Module:category tree/poscatboiler/data/rhymes) for the new rhyme categories. The format is as I described: Category:Rhymes:English, Category:Rhymes:English/əʊθ, Category:Rhymes:English/əʊθ/1 syllable, Category:Rhymes:English/əʊθ/2 syllables. The last two would be a subcategory of the second category, which is itself under the first. It doesn't however support "intermediate" categories such as Rhymes:English/əʊ-. Maybe those should remain as kind of "indexes", maybe just as categories that are added manually, or do we need them at all? — sur jec tion ⟨??⟩ 11:43, 6 August 2021 (UTC)[reply]
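A minimal sketch of the naming scheme described above, written in Python purely for illustration (the real implementation is a Lua module, and the function names `rhyme_categories` and `parent_category` are hypothetical). It assumes the language is passed as its full name, that `syllables` is optional as in the proposed template parameter, and that rhyme strings contain no "/".

```python
RHYME_PREFIX = "Category:Rhymes:"


def rhyme_categories(language, rhyme, syllables=None):
    """Return the categories an entry would be placed in (illustrative only)."""
    base = f"{RHYME_PREFIX}{language}/{rhyme}"
    categories = [base]
    if syllables is not None:
        # "1 syllable" but "2 syllables", as in the examples in the thread.
        noun = "syllable" if syllables == 1 else "syllables"
        categories.append(f"{base}/{syllables} {noun}")
    return categories


def parent_category(category):
    """Return the parent of a rhyme category, or None for the language root."""
    if not category.startswith(RHYME_PREFIX):
        raise ValueError(f"not a rhyme category: {category}")
    # Assumes the rhyme itself contains no "/" (an assumption of this sketch).
    head, separator, _tail = category.rpartition("/")
    return head if separator else None


if __name__ == "__main__":
    for name in rhyme_categories("English", "əʊθ", syllables=2):
        print(name, "<-", parent_category(name))
```

An entry tagged with a syllable count lands in both the rhyme category and its per-syllable subcategory, which matches the "put them into both" behaviour discussed later in this thread.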

@Surjection Maybe those should stay as some sort of index - after all it would be nice to have an easier way to organize them. Vininn126 (talk) 19:15, 7 August 2021 (UTC)[reply]

@Rua, Dan Polansky, DTLHS, -sche, Atitarev, Ruakh, Erutuon, Fenakhay, Shumkichi, Thadh Thoughts on this? Vininn126 (talk) 11:34, 9 August 2021 (UTC)[reply]

The way you group things in categories is by using a sort key. The template could build the sort key from a number representing syllable count followed by the entry name, i.e. multisyllabic would have "5multisyllabic" and syllabic would have "3syllabic". Since the system adds a subheader each time the first character changes, you would have "1" followed by all the monosyllables, etc. Allowing for more than 9 syllables would complicate things, but it wouldn't be that hard to make it work. Chuck Entz (talk) 04:48, 6 August 2021 (UTC)[reply]

That is another option, but it only allows single digits, so I'd argue using subcategories is a better option (and it allows grouping by the first letter as usual). — sur jec tion ⟨??⟩ 09:38, 6 August 2021 (UTC)

I rather like User:Chuck Entz's solution since it avoids splitting the rhymes into separate per-syllable-count pages. The number of words with > 9 syllables is small (e.g. out of 37,972 pages in Category:Italian terms with IPA pronunciation, there are only 10 with >= 10 syllables) so I wouldn't worry about distorting the solution to accommodate them; we could e.g. put all words with > 9 syllables under the ">" character or similar. The issue with splitting into per-syllable-count pages is that it makes it less convenient to view the rhymes, particularly for the many rhymes where the number of entries is relatively small. Benwing2 (talk) 03:58, 8 August 2021 (UTC)[reply]
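The sort-key idea, with the ">" bucket for long words, can be sketched as follows. This is an illustration in Python, not the template code; `rhyme_sort_key` and `group_by_heading` are hypothetical names, and the grouping uses plain code-point ordering, which may differ from the collation MediaWiki actually applies to category sort keys.

```python
from itertools import groupby


def rhyme_sort_key(entry, syllables):
    """Prefix the entry name with its syllable count, or '>' above nine."""
    if syllables < 1:
        raise ValueError("syllable count must be positive")
    prefix = str(syllables) if syllables <= 9 else ">"
    return prefix + entry


def group_by_heading(entries):
    """Group entries under the first character of their sort key.

    `entries` maps entry name -> syllable count. The first character plays
    the role of the subheader a category page shows for each group.
    """
    keys = sorted(rhyme_sort_key(name, count) for name, count in entries.items())
    return {
        heading: [key[1:] for key in group]
        for heading, group in groupby(keys, key=lambda key: key[0])
    }


if __name__ == "__main__":
    sample = {"multisyllabic": 5, "syllabic": 3, "monosyllabic": 5}
    print(group_by_heading(sample))
```

Because ">" sorts after the digits in code-point order, the overflow bucket would appear last under this assumption.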

You can put them into both the parent category and into the individual syllable categories. DTLHS (talk) 04:01, 8 August 2021 (UTC)[reply]

That is exactly what I had in mind. It'd put the entry under the rhyme category and the rhyme/syllable count category if available. — sur jec tion ⟨??⟩ 16:27, 8 August 2021 (UTC)[reply]

Good point, that would work. And I agree with the general idea of this thread that moving from the current Rhymes: pages to categories would be a good idea. Even pages like Rhymes:English/æ- linking to Rhymes:English/æb, Rhymes:English/æbd, etc could be reproduced as categories if we want (e.g. "rhymes in æ-" containing "rhymes in æb", "rhymes in æbd", etc, or whatever naming scheme is used). As discussed elsewhere, we could also use categories for anagrams. - -sche (discuss) 01:54, 10 August 2021 (UTC)[reply]

FWIW, I have now implemented this in userspace; Module:User:Surjection/rhymes (updated module), User:Surjection/Template:rhymes/documentation (updated template documentation). The poscatboiler code is already live, so these two are all we "need" to implement this change now. — sur jec tion ⟨??⟩ 15:34, 13 August 2021 (UTC)[reply]

I am for deploying it. Vininn126 (talk) 10:18, 16 August 2021 (UTC)[reply]

I am now deploying it — sur jec tion ⟨??⟩ 17:20, 16 August 2021 (UTC)[reply]

I think the categories should be "Finnish rhymes/..." rather than "Rhymes:Finnish". DTLHS (talk) 17:31, 16 August 2021 (UTC)[reply]

That would've conflicted with the existing Category:X rhymes that links to the old (and arguably now obsolete) rhyme index pages. — sur jec tion ⟨??⟩ 17:44, 16 August 2021 (UTC)[reply]

Delay of the 2021 Board of Trustees election

We are reaching out to you today regarding the 2021 Wikimedia Foundation Board of Trustees election. This election was due to open on August 4th. Due to some technical issues with SecurePoll, the election must be delayed by two weeks. This means we plan to launch the election on August 18th, which is the day after Wikimania concludes.

For information on the technical issues, you can see the Phabricator ticket.

We are truly sorry for this delay and hope that we will get back on schedule on August 18th. We are in touch with the Elections Committee and the candidates to coordinate next steps. We will update the Board election Talk page and Telegram channel as we know more. Best, JKoerner (WMF) (talk) 22:11, 4 August 2021 (UTC)[reply]

Call for Candidates for the Movement Charter Drafting Committee

Movement Strategy announces the Call for Candidates for the Movement Charter Drafting Committee. The Call opens August 2, 2021 and closes September 1, 2021.

The Committee is expected to represent diversity in the Movement. Diversity includes gender, language, geography, and experience. This comprises participation in projects, affiliates, and the Wikimedia Foundation.

English fluency is not required to become a member. If needed, translation and interpretation support is provided. Members will receive an allowance to offset participation costs. It is US$100 every two months.

We are looking for people who have some of the following skills:

Know how to write collaboratively. (demonstrated experience is a plus)
Are ready to find compromises.
Focus on inclusion and diversity.
Have knowledge of community consultations.
Have intercultural communication experience.
Have governance or organization experience in non-profits or communities.
Have experience negotiating with different parties.

The Committee is expected to start with 15 people. If there are 20 or more candidates, a mixed election and selection process will happen. If there are 19 or fewer candidates, then the process of selection without election takes place.

Will you help move Wikimedia forward in this important role? Submit your candidacy here. Please contact strategy2030@wikimedia.org with questions. Best, JKoerner (WMF) (talk) 22:12, 4 August 2021 (UTC)

Internationalism

Earlier discussion: Wiktionary:Beer parlour/2016/January#Internationalisms in etymologies

I think it'd be a good idea to have a {{internationalism}} and Category:Internationalisms by language to mark terms as internationalisms in etymology sections. This would be particularly useful for the plethora of terms that are part of the so-called international scientific vocabulary, which is still gaining new terms mostly constructed of Latinate and Greek elements. It's in theory possible to determine the language in which they were coined for at least some of them, but there are also plenty where that task is much harder if not practically impossible.

Naturally though, I'd say that if {{internationalism}} were to exist, its usage would be restricted in cases where the immediate source language is known. After all, those are just borrowings rather than internationalisms. The problem is that you sometimes simply cannot know the exact chain of languages a word went through.

In order to prepare for this, we'd also need a better definition under Appendix:Glossary#internationalism. The existing definition doesn't really exclude Wanderwörter. I'd argue that internationalisms specifically have to be words that have spread in the modern age, as internationalisms in the sense I see couldn't really have existed in a world before languages were as connected as they are now (or were a couple hundred years ago or so). Another possible prerequisite is that the word has been adapted into the target language, such as by the mostly regular but language-specific processes that govern how Latinate and Greek components are adapted over. As an example, a word like Latin positiō would have once upon a time been adapted into Finnish as positsiooni, after the German and/or Swedish models, but in modern language it's positio. This also applied to other words; postpositio was once postpositsiooni.

Hungarian already appears to have a template, {{hu-int}}, for this sort of purpose. — sur jec tion ⟨??⟩ 15:21, 5 August 2021 (UTC)[reply]

I support this. Vininn126 (talk) 13:59, 6 August 2021 (UTC)[reply]

I think this is a possible solution to a real issue but needs to be thought through carefully. I ran into this issue a lot when creating entries for Russian internationalisms like канонизи́ровать (kanonizírovatʹ, “to canonize”), дисквалифици́ровать (diskvalificírovatʹ, “to disqualify”) and кассацио́нный (kassaciónnyj, “cassation (relational)”). My solution was to assume these entries came from German unless there was some evidence to the contrary (e.g. no such corresponding word in German, or the definitions didn't match), and write something like "Probably borrowed from {{affix|ru|kanonisieren|-овать|lang1=de}}". Sometimes I just said "Ultimately borrowed from " followed by a Latin or Greek term. But both of these solutions are questionable and subject to a good deal of guessing. What I'm concerned about with something like {{internationalism}} is that people will use it lazily and promiscuously to avoid actual etymological investigations, and it will end up being more or less meaningless. As it is, it's not clear to me that an etymology that says nothing but "Internationalism" actually contributes much of anything over just leaving the etymology unspecified. It would be better to include a sample of corresponding terms in other languages and list the underlying Latin or Greek terms that make up the word, which at least provides some context. Benwing2 (talk) 03:50, 8 August 2021 (UTC)

Therein lies the rub. I don't think it's possible to just discourage people from using it lazily, even by having the documentation contain a text in big red all-caps saying "add an ultimate origin (Latin, Greek) or at least comparisons whenever you use this template!" But if one thinks about it, right now most languages simply have no etymologies whatsoever for these so-called internationalisms, so even just an "internationalism" would be better than nothing at all. It'd also add the entry to the appropriate category and thus give people an easy avenue to find etymologies to improve by adding details to. — sur jec tion ⟨??⟩ 17:02, 8 August 2021 (UTC)[reply]

Deleting "Hangul syllable" entries

All entries that consist only of {{ko-syllable-hangul}}, like 괧 (gwael) and 먘 (myak), should be deleted by bot. All etymology sections containing them should also be deleted by bot and the etymologies renumbered, if possible. This has been done manually since a few months ago by the two Korean editors here, but a more drastic solution seems preferable.

For those unfamiliar with Korean, these are the equivalent of nonsense sequences of Latin alphabet letters like "swrg" or "gwerq". For some computer-related reason they have been assigned separate Unicode characters, but they are not in any sense Korean words any more than "wetw" is an English word. As Korean is written in an alphabet, the composition and theoretical pronunciation of these syllables is highly transparent.

I am unsure who exactly these entries benefit or are intended for. The only possible demographic I can think of is people who 1) have zero knowledge of Korean, but 2) decide to look up Korean characters anyhow, and 3) not just any characters but ones that are not actually words in the language. This seems unlikely to be a large group of people.

This has the following effects (in increasing degree of severity):

It clogs up Category:Korean lemmas with "words" that are not only not lemmas but not words at all, and in many cases never actually used in the language.
It wastes the time of editors who are manually creating these because an automated bot could make hundreds upon thousands of them within ten minutes.
The pronunciation section is misleading because the phonetic pronunciation of Korean syllables varies depending on its position within the word.
It makes it appear as if there are actual entries for words like 퀄 (kwol, “(colloquial) quality”) and 힝 (hing, onomatopoeia especially commonly used by young women). This confuses and disappoints readers, who are obviously going to be looking for a definition of the word they have just encountered, only to be faced with this non-entry.--Tibidibi (talk) 15:48, 5 August 2021 (UTC)[reply]

Support —Suzukaze-c (talk) 00:52, 6 August 2021 (UTC)[reply]

Support Benwing2 (talk) 06:35, 6 August 2021 (UTC)[reply]

This proposal is overwhelmingly dishonest:

They are not nonsense words like 'swrg', 'gwerq' or 'wetw'. The English analogy would be nonsense words like 'thung' or 'gwet'.
I believe their character-like nature is a design feature. Remember that they were designed for an environment where writing was done using Kanji.
Tools for manipulation of decomposable Korean characters are not as well-known, so there may be some usage there. Another possible usage group is people without access to a Korean script renderer.
Putting them in Category:Korean lemmas looks like a bug in {{ko-syllable-hangul}}; that should be fixed.
There are exactly 11,172 of them. Perhaps it would be more productive for a bot to complete the set.
It would seem that the pronunciation section is in need of some work.

Furthermore, WT:CFI allows "Characters used in ideographic or phonetic writing such as 字 or ʃ."

Oppose --RichardW57m (talk) 12:05, 6 August 2021 (UTC)[reply]

@RichardW57m How exactly are 뾵 (ppyon) or 먘 (myak) like "thung" or "gwet"? These are outright impossible syllables in Korean. If you ask Korean speakers to come up with nonsense syllables that still sound like they might be words in the language (the same way you could get English speakers to come up with "thung" or "gwet"), they will never come up with 뾵 (ppyon) or 먘 (myak). These are bizarre combinations to anyone with the slightest knowledge of Korean, fully equivalent in absurdity to "swrg", "gwerq", or "wetw".
Please explain why. What constraint on words do they violate? Or is it just that they are not phonetic writings of isolated syllables? For example, word-final 'k' is rare, but it does occur. Or doesn't the suffix 녘 (nyeok) sound Korean? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]

@RichardW57 /Cja/ is not common in standard Korean, and the sequence 먀 (mya) in particular is extremely rare (only three entries in the standard dictionary, excluding redirects to standard forms and recent and uncommon loans, and all of them rare words). The final ㅋ (k) is similarly uncommon. This is not remotely like English /θʌ-/ or /-ʌŋ/, both extremely common sequences in the language. 뾵 (ppyon) should not even be worth discussing, but the final ㄵ (nj) appears only in 앉다 (anda) and 얹다 (eonda) and their derivatives, whereas the iotized vowel ㅛ (yo) does not occur in verbal stems. Perhaps they are not "impossible" in a theoretical sense, but they are certainly combinations so improbable as to be functionally impossible to imagine in the language. And as a native speaker, I can assure you that they are no less bizarre than e.g. "pswm" (which, by the logic you seem to suggest, could also be a plausible English word given the existence of pseudoscience and cwm).--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]

I see nothing particularly "character-like" about Hangul syllables other than the fact that they are written in syllabic blocks. Every letter combines predictably, and there are no particular ligatures that demand special treatment. If anyone has learned the alphabet, they will be able to read all 11,172 Hangul Syllables without any problem whatsoever.
What about writing them? Kerning and sizing look potentially complicated. One could say much the same about the classical Mongolian script - but they insist on teaching it as CV syllables! (There is a small amount of ligaturing in Mongolian, and the Unicode Consortium was beaten down to accepting a horrible and actually undefined phonetic encoding, so copy-typing is nightmarish.) --RichardW57 (talk) 02:02, 8 August 2021 (UTC)

A Korean-English dictionary should not cater to a demographic so inept in the basics of the script as to find the (minimal) changes in the curve or proportional size of the letters "complicated". I don't claim to know anything about the Mongol script, but if it's anything like Manchu, the impact of ligatures is significantly larger than in Korean (where, I repeat, ligatures may as well not exist). Also, Koreans (unlike Manchus or Mongols) do not teach or learn Hangul as a syllabary, so that seems irrelevant.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]

The purpose of a Korean-English dictionary should not be to service "people without access to a Korean script renderer", nor to help people who cannot even read the alphabet. In any case, people who lack access to a Korean script renderer and people who cannot read the alphabet seem extremely unlikely demographics to look up random syllables that are not words in the language and which they are unlikely to encounter in Korean text.
I don't see your point in noting that there are 11,172 Hangul Syllables encoded in Unicode.
You spoke of hundreds of thousands of syllable blocks, giving the impression that there were an enormous number of precomposed syllable blocks. My point is that there is a fixed number of them, and there won't be any more. The current number of Korean lemmas and syllable blocks is 35,778.

11,172 already is "an enormous number of precomposed syllable blocks"! And note that my point there is not the exact number of existing syllable blocks, but that they are a waste of effort if manually created.
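For readers wondering where the figure 11,172 comes from, it is the product of 19 initial consonants, 21 vowels, and 28 finals (27 final consonants plus "no final"), and the Unicode block U+AC00..U+D7A3 is laid out in exactly that order. The sketch below uses the arithmetic decomposition defined in the Unicode Standard and checks it against Python's `unicodedata` normalization; the function name `decompose` is just an illustrative choice.

```python
import unicodedata

LEADS, VOWELS, TAILS = 19, 21, 28          # 28 includes the "no final" case
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
SYLLABLE_COUNT = LEADS * VOWELS * TAILS    # 11172


def decompose(syllable):
    """Split a precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(index, VOWELS * TAILS)
    vowel, tail = divmod(rest, TAILS)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:                               # tail index 0 means no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


if __name__ == "__main__":
    print(SYLLABLE_COUNT)
    # The arithmetic agrees with canonical decomposition (NFD) for every block.
    assert all(
        decompose(chr(cp)) == unicodedata.normalize("NFD", chr(cp))
        for cp in range(S_BASE, S_BASE + SYLLABLE_COUNT)
    )
```

This shows only that the written form of each block is mechanically predictable from its letters; it says nothing about pronunciation, which is the separate point argued below.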

The pronunciation section is an irremediable issue because Hangul syllables do not actually have one fixed pronunciation, even phonemically. Some syllables have long vowels, some syllables have orthographically unmarked tensing, some syllables (depending on dialect) have high pitch. 그 (geu) in isolation is pronounced [kɨ˨] in Middle Korean, [kɯ] in modern Seoul, and [kə] in Busan. How would you remedy this?
Giving them all in a collapsed subsection is the obvious solution. Are all these applicable alternatives being given for words? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]

As a matter of fact, yes. All Standard Korean variation in vowel length and tensing is marked by {{ko-IPA}}, including in different words written with the same Hangul syllabic block.

And "giving them all in a collapsed subsection" is a non-solution. For instance, any syllable might have either a short or long vowel; depending on the morphology, any syllable with a lenis initial might actually be pronounced with a fortis initial not marked in the spelling. Should every syllable block entry therefore have a parenthetical vowel length mark, or note that the initial might be fortis in some collapsible box? This is not a simple matter of dialectal or synchronic distinctions in realization, but a problem caused by the fact that Standard Korean has phonemic distinctions that the modern script does not consistently express. Hence any attempt to assign a definitive pronunciation to a syllabic block is misguided.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)

These are not "characters used in ideographic or phonetic writing". These are compounds of such characters, combined in fully predictable fashion. The two examples given in WT:CFI itself show the failure of the argument; 字 is a logographic character, and ʃ (ʃ) is a single letter in an alphabet. They are equivalent to individual Hangul consonant and vowel letters, not the syllables.
Sounds like the multilingual letter î. Is not that a predictable combination? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]

The analogy is obviously dishonest because the circumflex is a diacritic.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]

You have also not responded to the important point that it makes it appear as if there are actual entries for missing words.--Tibidibi (talk) 14:23, 6 August 2021 (UTC)[reply]
Walk me through this process. How is it worse than looking up 'asta' and only finding words in languages other than the one you want? --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]

It is worse because the entry is utterly valueless to a language learner. Anyone who has learned Korean for more than two days will know the "information" contained therein.--Tibidibi (talk) 06:43, 9 August 2021 (UTC)[reply]

I agree with User:Tibidibi here. It is a shame that Unicode decided to waste so much space in the BMP with these encoded syllables. That's 11,000+ code points out of 65,536 that could have been used for something better. AFAIK Korean is the only language that gets such treatment, and it has encouraged misguided proposals like the Tamil All Character Encoding that seek to emulate this for other languages. I think the decision to do this was made due to a desire to maintain round-trip compatibility with some now-long-obsolete Korean-language multibyte encoding, but whatever. Benwing2 (talk) 23:59, 7 August 2021 (UTC)[reply]
It's worse than that - it was a purely political decision to get past Korean objections in the ISO process. However, these Tamil encodings do represent how some people's minds work. Tamil children in Canada apparently do have trouble conceiving of CV combinations as consonant plus vowel. --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]
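For readers following the code-point figures above: the precomposed Hangul block is laid out arithmetically from 19 initial, 21 medial and 28 final positions (the 28 includes "no final"), which is where the "11,000+" comes from. A minimal Python sketch of that arithmetic, using the constants from the Unicode Standard's Hangul syllable decomposition. The function name `decompose` is only illustrative and is not anything Wiktionary's templates or modules are known to use.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 19 * 21 * 28 = 11172 syllable blocks


def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail == 0 means the syllable has no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


for ch in "그한":
    parts = decompose(ch)
    # Cross-check against the standard library's canonical decomposition.
    assert parts == unicodedata.normalize("NFD", ch)
    print(ch, [f"U+{ord(j):04X}" for j in parts])
```

The cross-check against `unicodedata.normalize("NFD", ...)` is there because canonical decomposition of Hangul syllables is defined by this same arithmetic, so the two should always agree.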

Most Latin script languages get such treatment, in the form of precomposed letters. There's also a fair bit of BMP squandered on precomposed letters, and there's the dead waste of Arabic presentation forms. --RichardW57 (talk) 02:02, 8 August 2021 (UTC)[reply]

I agree about the Arabic presentation forms; totally useless. Precomposed letters are a bit different; there are fewer of them and only ones that actually are in use are defined. Also in some languages, precomposed letters are treated as single entities for sorting, and there were many existing encodings at the time e.g. ISO-8859-1 that contained them. As for Tamil, I strongly suspect that "this is how people's minds work" is not a legitimate argument; the history of writing shows that people in *all* languages think more naturally in terms of syllables or CV sequences than in terms of separate consonants and vowels. Benwing2 (talk) 03:23, 8 August 2021 (UTC)[reply]
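To make the comparison concrete: precomposed Latin letters and Arabic presentation forms both carry decomposition mappings in the Unicode Character Database, canonical for the former and compatibility for the latter. A short Python illustration with the standard `unicodedata` module follows. Apart from î, the sample characters are assumptions chosen for the example, and the helper names are hypothetical.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Render each character of text as a U+XXXX label."""
    return [f"U+{ord(c):04X}" for c in text]


def decompositions(ch: str) -> dict[str, list[str]]:
    """Canonical (NFD) and compatibility (NFKD) decompositions of ch."""
    return {
        "NFD": code_points(unicodedata.normalize("NFD", ch)),
        "NFKD": code_points(unicodedata.normalize("NFKD", ch)),
    }


# U+00EE: precomposed i with circumflex (canonical decomposition).
print(decompositions("\u00EE"))
# U+FE8D: Arabic letter alef, isolated presentation form (compatibility only).
print(decompositions("\uFE8D"))
```

The precomposed letter splits under canonical normalization, while the presentation form is untouched by NFD and only folds to the ordinary letter under the compatibility forms.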

Support It's about time; those entries really just take up space and don't provide much benefit. Just because they have a Unicode codepoint for some reason doesn't mean that we should have an entry for all of them. If anything, they should be under the multilingual header anyways. AG202 (talk) 13:55, 6 August 2021 (UTC)[reply]

So restore them as multilingual? Presumably it's the ones that coincide with words and morphemes that are the resource drain - the nonsense syllables surely chiefly cost for backup operations. "Remember, Wiktionary is not a paper dictionary." --RichardW57m (talk) 13:23, 9 August 2021 (UTC)[reply]

Support –Austronesier (talk) 14:05, 6 August 2021 (UTC)[reply]

Support — Fenakhay ^{(تكلم معاي · ما ساهمت)} 23:23, 6 August 2021 (UTC)[reply]

Another important point I missed is that {{character info}} already contains the entire information in {{ko-defn-hangul}}. So bot-removing all these non-entries will not even result in a loss of information as long as there are other etymologies for the word.--Tibidibi (talk) 06:46, 9 August 2021 (UTC)[reply]

So are you suggesting that someone looking for a decomposition should use a sandbox? I don't see how else one would get access to the decomposition. Of course, {{character info}} could always be optimised to omit Hangul syllable blocks, could it not? I suppose if you can find a word containing the syllable, one can always interpret the transliteration, can't one? --RichardW57m (talk) 13:23, 9 August 2021 (UTC)[reply]

Support —Mahāgaja · talk 08:31, 9 August 2021 (UTC)[reply]

Support — Omgtw15 (talk) 03:53, 11 August 2021 (UTC)[reply]

Implementation

@Suzukaze-c, Benwing2 A week has passed with overwhelming support (8 Support to 1 Oppose), including both of the two regular Korean editors. Could a bot operation be done for this?

I believe there are four tasks at hand:

Delete all entries where the only headword template is {{ko-syllable-hangul}}. I assume this is the simplest. Examples: 괧 (gwael) and 먘 (myak).
Delete Etymology 1 where the headword template is {{ko-syllable-hangul}} and there are multiple other etymologies, and renumber the Etymology sections accordingly. This is probably trickier for the bot. Examples: 응 (eung) and 공 (gong).
Delete Etymology 1 where the headword template is {{ko-syllable-hangul}} and there is only one other etymology, and reorder the header hierarchy. This also sounds potentially tricky for the bot. Examples: 징 (jing) and 껌 (kkeom).
Some entries are extremely poorly formatted and cannot be fixed automatically. Examples: 업 (eop). As apparently the only regular native Korean editor left, I will fix these manually.

If category (1) can all be deleted by this weekend (hopefully this should be quite easy), it might turn out that (2), (3), (4) are few enough to be manually addressed. Right now I'm not sure exactly how many entries belong to categories (2), (3), and (4). It seems like the clear majority of the 1,215 entries that need fixing belong to category (1).--Tibidibi (talk) 13:21, 13 August 2021 (UTC)[reply]

Also pinging @Erutuon as a botter.--Tibidibi (talk) 13:22, 13 August 2021 (UTC)[reply]
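The four tasks listed above could be triaged with something like the following Python sketch. It is only a rough illustration, not an actual bot: it assumes the caller passes in just the Korean language section as wikitext, that pages holding other words always use numbered `===Etymology N===` headers, and that anything not matching the expected shape is left for manual fixing. The function names `classify` and `drop_first_etymology` are hypothetical.

```python
import re

# Numbered L3 etymology headers, e.g. "===Etymology 2===".
ETYM_HEADER = re.compile(r"^===[ \t]*Etymology \d+[ \t]*===[ \t]*$", re.MULTILINE)
SYLLABLE = "{{ko-syllable-hangul"


def classify(korean_section: str) -> int:
    """Return the task number (1-4) for a Korean section, or 0 if unaffected."""
    if SYLLABLE not in korean_section:
        return 0
    sections = ETYM_HEADER.split(korean_section)[1:]  # text under each header
    if not sections:
        return 1  # assumption: an un-numbered section holds only the syllable
    in_first = SYLLABLE in sections[0]
    elsewhere = any(SYLLABLE in body for body in sections[1:])
    if not in_first or elsewhere:
        return 4  # unexpected layout: leave for manual fixing
    others = len(sections) - 1
    if others >= 2:
        return 2
    if others == 1:
        return 3
    return 1


def drop_first_etymology(korean_section: str) -> str:
    """For task (2): remove Etymology 1 and renumber the remaining sections."""
    parts = ETYM_HEADER.split(korean_section)
    head, kept = parts[0], parts[2:]
    out = head
    for number, body in enumerate(kept, start=1):
        out += f"===Etymology {number}===" + body
    return out
```

Task (3) would additionally need the header hierarchy reordered once a single etymology remains, which this sketch deliberately does not attempt.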

Comment:

I keep seeing statements in various contexts here at EN WIKT where the poster is already quite familiar with the subject matter at hand (such as hangul in this thread, or the relationship between Ingrian and Proto-Finnic in another recent thread). The people making such statements seem to have lost sight of the fact that our EN WIKT readership can only be safely assumed to understand written English.

Case in point: before I learned hangul, I thought that each of the composed hangul glyphs represented an independent syllable in some complicated fashion, as a very large syllabary. I had no clear idea that each glyph is just a composite of individual letters. Were I to try to parse a Korean text in such a state of ignorance, I might select a single "syllable" (the selectable glyphs making up a hangul text) and try to look that up. I would have no way of decomposing the glyph into the individual jamo (letters).

Other users with a similar lack of familiarity with hangul might do the same thing.

Considering 1) the lack of any apparent harm from having these entries, and 2) the demonstrable gain in usability from having these entries (at least, for those just learning about written Korean), I lean strongly towards keeping these.

@Tibidibi, the above thread has gotten chopped up in strange ways and various statements are missing any sig, but I think you are the strongest proponent for deletion. Can you better articulate what harm we suffer because these entries exist? ‑‑ Eiríkr Útlendi │^{Tala við mig} 21:38, 13 August 2021 (UTC)[reply]

@Eirikr, the key issue is the same as you raised a while ago when you criticized somebody for making an entry with no definition but {{rfdef}}. A large number of these non-entries are actual words, some actually quite common in the language, for which no entry has been made. When readers click a bluelink, or search for a keyword and find that there are results, they expect to see definitions. They do not expect to see what effectively amounts to a redlink dressed up as an entry. In fact, these are actively worse than the definition-less entries you criticized, because the latter at least had pronunciations and parts of speech.

For sections where there are real entries for words, the issue is the simple waste of space. Why should readers have to go through an entire header section just to find information which is so basic that it is already duplicated in {{character info}}?

It is true that these could be of marginal utility to readers who do not know Hangul at all. But I find it highly unlikely that any such people would be trying to parse Korean text in the first place. In addition, I do not think it is appropriate to cater to a userbase that does not know something taught on the first day of Korean class! In that sense, I find this very different from the relationship between Ingrian and Proto-Finnic; a fluent Ingrian speaker could go their entire life without ever having heard of Proto-Finnic, while the nature of Hangul blocks is something that all Korean learners will know from Week 1, and even a large number of people who have never actually studied the language itself.

To give another analogy, I find this equivalent to adding the Usage note that "The capitalized form of "word" is "Word". In English, initial capital letters are used at the beginning of sentences, to mark proper nouns, and sometimes for emphasis. The fully capitalized form of "word" is "WORD". In English, full capital letters are used in abbreviations, or in Internet slang to mark shouting." to every single English entry. This would also be useful to English learners, but clearly this is overkill. And the composition of Hangul syllables is far, far more basic and simple than English capitalization rules!--Tibidibi (talk) 01:28, 14 August 2021 (UTC)[reply]

@Tibidibi: "I do not think it is appropriate to cater to a userbase that does not know something taught on the first day of Korean class!" --> What if someone never had a Korean class? Indeed, before I had studied any Korean, I seriously did not know that a single "syllable" glyph was composed of multiple simpler jamo. You assume too much about our readership, I think.

One key difference between Korean syllable blocks and arbitrary strings of English letters is that English letters are still individually selectable. An English-reading person (our very audience) can tell that there are individual characters in any such string of Latin letters. Or indeed Cyrillic letters, or Greek letters, etc. as the encoding (on most modern systems) still allows a user to select individual letters. An English-reading person cannot select an individual jamo from within a composed syllable block -- they can only select the whole block. If we have no entries for composed syllable blocks, users have no means of looking up these individually selectable textual units. We have entries for other individually selectable textual units in other scripts, so why not for Korean?

Re: "waste of space", Wiktionary is not paper, so that's not really much of a concern. ‑‑ Eiríkr Útlendi │^{Tala við mig} 04:30, 14 August 2021 (UTC)[reply]

@Eirikr You cannot select individual letters in Hindi, Thai, Tibetan or any other abugida language; you can only select syllables. Yet we don't enter every Devanagari etc. syllable into Wiktionary. The only real difference I can see between Korean and an abugida language is that Korean syllables have special treatment in Unicode, which (as pointed out by User:RichardW57), was a purely political decision to get the Korean delegation on board. Benwing2 (talk) 04:50, 14 August 2021 (UTC)[reply]

@Benwing2, Eirikr, Tibidibi: That remark on selectability is untrue, and as a policy statement merits banning. What is true is that it is difficult to select a letter without its accompanying combining marks. For Thai, script-independent editing generally treats spacing marks as letters, and Thais like it that way. An attempt to prevent selection of Thai letters by changing the Unicode classification was met with howls of protest, and the change was rescinded. Until recently, selection by Unicode default graphemes was generally available even for Devanagari - it may still be generally available where one can find the editor's customisation menu. --RichardW57 (talk) 15:59, 15 August 2021 (UTC)[reply]

@Benwing2: The Thai graphic unit is the syllable, which includes the final consonant and even consonants beyond it (การันต์ (gaa-ran)).

@Benwing2: We ought to allow what people consider as letters. There is a list of strongly backed candidates for letter status in the file NamedSequences.txt in the Unicode Character Database - it includes all the possible Tamil syllables. You'll notice that it also includes the subjoined Khmer consonants. --RichardW57 (talk) 15:59, 15 August 2021 (UTC)[reply]

@RichardW57 I don't know why you're accusing people of being "overwhelmingly dishonest" and suggesting that I be banned merely for making a statement you disagree with, but it reflects very badly on you. Benwing2 (talk) 19:28, 15 August 2021 (UTC)[reply]

@Benwing2 I used the term dishonesty because what was presented wasn't quite a lie. Perhaps I was lily-livered when I suggested that someone should merely be banned if they tried to make it difficult for people to edit text in their own language, e.g. by deliberately preventing the selection of parts smaller than an orthographic syllable - 6 characters in an orthographic syllable is not unusual in some languages. I find it horrifying that there are people who regularly edit text in their own language in (Latin) transliteration rather than in their own script. --RichardW57 (talk) 20:09, 15 August 2021 (UTC)[reply]

The fact remains we don't include all syllables in all abugida languages, and I would be strongly opposed to doing that. Whether Unicode includes a particular list of syllables in their database is irrelevant for Wiktionary's decisions. Benwing2 (talk) 19:28, 15 August 2021 (UTC)[reply]

@Benwing2 I would have hoped that we would at least ask why Unicode has done what it has done. I see we do at least allow the whole of the Welsh alphabet - or is the toleration of letters like ph an oversight? --RichardW57 (talk) 20:09, 15 August 2021 (UTC)[reply]

That's... already been explained to you how that's different. In Welsh, "ph" is a letter in the alphabet, similar to ㄱ or ㅋ in Korean, or gb in Yorùbá; such letters are actual relevant lemmas and have actual usage. That's an entirely separate thing from allowing random combinations like 쫹 in Korean which don't exist whatsoever in the language. Please please don't derail like that, and if you don't have that much experience in the language, please give the space to others that do and know what they're talking about. AG202 (talk) 03:06, 16 August 2021 (UTC)[reply]

@Eirikr If people do not know anything about Korean, they are not the intended audience of the Korean entries on Wiktionary. Every entry for every language on Wiktionary makes certain assumptions about the readership—again, like the capitalization in European languages. Explaining the composition of Hangul syllables is like having a usage note on every German noun that German nouns are always capitalized, or giving a "combining form before suffixes" for every language written in Arabic script.

In any case, @Erutuon has enabled {{character info}} (which already contains all relevant information) even for nonexistent pages, so the issue seems to have been resolved; what we're now left with are the superfluous separate etymology sections.

The space we're talking about is the length and visual formatting of a single page. In that context, it is indeed a concern that redundant and (for the vast majority of readers) irrelevant information always takes up the most prominent part of the page.--Tibidibi (talk) 04:53, 14 August 2021 (UTC)[reply]

@Eirikr, Tibidibi Another issue that should be mentioned is that these syllables are being categorized in CAT:Korean lemmas, which clogs the category with non-words. Benwing2 (talk) 18:21, 14 August 2021 (UTC)[reply]

@Tibidibi: For which a sane answer, given above, was to classify them as something else.

With all the mentions of {{character info}}, I thought, why not show it on nonexistent mainspace pages with only one character in the title? So I added it to MediaWiki:Newarticletext and MediaWiki:Noarticletext. So now people can see the letters that Korean syllables are made up of, even if there isn't an entry. If this is a bad idea, it's easy to revert. — Eru·tuon 02:46, 14 August 2021 (UTC)[reply]

It's an excellent idea. --RichardW57 (talk) 15:59, 15 August 2021 (UTC)[reply]
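As a rough illustration of the "only one character in the title" condition described above: the real change lives in the MediaWiki messages named there, so the following Python is only a sketch of the idea under stated assumptions. It assumes "one character" means one Unicode code point after NFC normalization, and the function name is hypothetical.

```python
import unicodedata


def is_single_character_title(title: str) -> bool:
    """True if the title is exactly one code point once normalized to NFC."""
    # len() on a Python str counts Unicode code points, not bytes or glyphs.
    return len(unicodedata.normalize("NFC", title)) == 1


# A precomposed Hangul syllable block is a single code point ...
print(is_single_character_title("\uD55C"))
# ... while a Devanagari consonant plus vowel sign is two code points.
print(is_single_character_title("\u0915\u093F"))
```

Under this definition a Hangul syllable block qualifies while a Devanagari syllable does not, which mirrors the difference in Unicode treatment raised earlier in the thread.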

Vote to prioritize definitions in entry layout

I'm looking for feedback on a vote that's long overdue: Wiktionary:Votes/2021-08/Prioritizing definitions. Ultimateria (talk) 17:35, 5 August 2021 (UTC)[reply]

Feedback:

I'm not sure about the treatment of Pronunciations. My personal feeling is that they are as important as, if not more than, the definitions, especially for learning a foreign language. Also, in most dictionaries, they come before definitions.
Is there an estimate about the workload if the vote passes? That would include the time to develop, test, run, and maintain the bot(s), I presume. How will irregular entry layouts be dealt with? I feel that some discussion about implementation would be helpful for voters to make their decisions.

--Frigoris (talk) 17:52, 5 August 2021 (UTC)[reply]

Some pronunciation sections are very large and may take up a significant part of the screen estate, so having them before definitions is not a good idea unless there's some way to reduce them down in size for all languages. Placing the pronunciations any higher may on the other hand cause issues if there are multiple parts of speech under the same entry, because you might get something like

==Language==

===Noun===

====Synonyms====

===Pronunciation===

===Verb===

====Synonyms====

which is not very intuitive. — sur jec tion ⟨??⟩ 18:02, 5 August 2021 (UTC)[reply]
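For anyone who wants to inspect layouts like the one above mechanically, here is a small Python sketch that extracts the header outline (level and title) from a page's wikitext. It assumes standard `==Header==` syntax with balanced equals signs; the function name `outline` is just illustrative.

```python
import re

# A header line: 2-6 equals signs, a title, then the same run of equals signs.
HEADER = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def outline(wikitext: str) -> list[tuple[int, str]]:
    """Return (level, title) for every section header, in page order."""
    return [(len(m.group(1)), m.group(2)) for m in HEADER.finditer(wikitext)]


sample = """==Language==
===Noun===
====Synonyms====
===Pronunciation===
===Verb===
====Synonyms====
"""

for level, title in outline(sample):
    # Indent by nesting depth so the hierarchy is visible at a glance.
    print("  " * (level - 2) + title)
```

Printed this way, the Pronunciation header sits at the same depth as Noun and Verb but between them, which is the unintuitive ordering being described.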

Let us take the Chinese entries as an example. For Western learners the pronunciations are possibly more important than definitions, since the script can be far less phonetic than their own familiar ones. The creators of {{zh-pron}} have put a lot into balancing space economy with information content; there are foldable components that unhide the rich information upon click.

At least with Chinese, it would've been way less intuitive if definitions come before pronunciation, since for multi-reading terms, a reading can govern a related cluster of meanings that as a whole is fairly separated from the other reading(s). --Frigoris (talk) 19:08, 5 August 2021 (UTC)[reply]

On 水 (shuǐ), {{zh-pron}} takes up more than a single page's worth of screen estate on mobile with default settings. If our plan is to have definitions (the most important part in dictionaries) first, that is simply not ideal in its current form. — sur jec tion ⟨??⟩ 19:12, 5 August 2021 (UTC)[reply]

If we confine ourselves to just changing the level/location of the Pronunciation headers, the alternative would be either to invade the space right above the next language header, Japanese, causing a large break in front of it, or for multi-reading Chinese terms to be internally sparsened by the possibly large {{zh-pron}}s. The solution, as it seems to me, would have been a better way to present the zh-prons for compactness so that the problem is minimized, rather than shifting them around elsewhere. --Frigoris (talk) 19:20, 5 August 2021 (UTC)[reply]

Right-floating has absolutely zero impact on mobile. It'll still take up the same amount of space as it used to. Having pronunciation info right at the bottom is not ideal, but I'd argue it's less bad than having a massive block of pronunciation info drown out the definitions on a page to the point users get tired and don't even bother. The alternative is making all large pronunciation sections (more than half a page long on mobile) collapsible. — sur jec tion ⟨??⟩ 19:35, 5 August 2021 (UTC)[reply]

Do we have to have the headings that take up an entire line? Most paper and online dictionaries make much better use of space by putting various things into paragraphs: dog, noun, /dOg/, a barking animal; .... Equinox ◑ 18:05, 5 August 2021 (UTC)[reply]

This is a very good point. Maybe someone could compile an example entry also for this. Allahverdi Verdizade (talk) 11:26, 6 August 2021 (UTC)[reply]

Yes, I think we should discuss this (even if only in a separate vote); our current setup has a lot of excess whitespace, especially on mobile, because a whole line is devoted to "Noun" and then a whole nother line to "dog (plural dogs)". But figuring out how to keep the table of contents coherent and usable if we change that is an issue... - -sche (discuss) 19:15, 6 August 2021 (UTC)[reply]

It would be good to have some example pages that demonstrate the new layout. DTLHS (talk) 18:52, 5 August 2021 (UTC)[reply]

@DTLHS: I've made a longer English example here and a shorter Spanish example here. Ultimateria (talk) 01:43, 6 August 2021 (UTC)[reply]

Thanks. Probably the translations / derived terms should go under the POS instead of the etymology? DTLHS (talk) 03:10, 6 August 2021 (UTC)[reply]

@DTLHS: I considered it but it puts etymologies really low in long English entries. I think it's important to put etymology before those headers because it relates to the word directly, and lists of other words do not. You can see at the English mockup that the most relevant information for the term itself is found between Pronunciation and Etymology, then you have all the rest, then you have Entry 2. The new structure just needs some extra part-of-speech labels. Ultimateria (talk) 17:03, 7 August 2021 (UTC)[reply]

@Ultimateria: Other headers like Synonyms, Antonyms, Hyperonyms,... also divert away from the main entry. Not sure of the distinction you're making between these and the Derived and Related terms to justify placing the Etymology section in between there? Sitaron (talk) 17:47, 7 August 2021 (UTC)[reply]

@Sitaron: It is distinct; synonyms and antonyms can help you understand a definition, but knowing e.g. that "breastplate" and "license plate" are terms derived from "plate" doesn't inform your understanding of the term "plate". Plus, the current trend is to nest synonyms et al under definitions, which could work for translations but not for derived/related terms or descendants. Ultimateria (talk) 22:20, 7 August 2021 (UTC)[reply]

OK, but putting derived terms and translations as a subsection of the etymology section makes no sense; these have no relationship to each other. Furthermore it increases the burden of disambiguating the derived terms with labels where previously they were automatically associated with a particular part of speech. DTLHS (talk) 02:31, 8 August 2021 (UTC)[reply]

They wouldn't be subsections of Etymology, they would also be L3 headers (in single-Entry pages). I think extra disambiguation is a small price to pay to cut right to the definitions while keeping etymologies relatively high up. Ultimateria (talk) 06:09, 8 August 2021 (UTC)[reply]

This is why ninjawords was invented. Maybe we could put a banner on the front page like if you just want definitions for English, without all the etymology, pronunciation, translations and other foreign-language crap, use NinjaWords. Not sure if faff or bullshit would be a better term than crap, though...... Queenofnortheast (talk) 19:03, 5 August 2021 (UTC)[reply]
I can't say I'm really psyched about any changes to the headings order, I prefer the way it is now, but I guess that's to be decided by the vote. Thadh (talk) 19:47, 5 August 2021 (UTC)[reply]

I don’t see any reason to change the order of information, since, as recognized, the importance varies between languages, and also between reconstructed and attested languages. It makes not much sense to go, as in the vote, alternative forms or alternative reconstructions first and then, very far away, etymologize; and where is “reconstruction notes”? Oh, that’s even farther on top, with another mass of things between, and oddly “glyph origin” yet farther above, though it be the etymology of the sign.

Wiktionary does not make good use of horizontal space. Nowadays 4K monitors are standard, often multiple, so I see a lot of white space on the right. If Wiktionary really wants to take a step forward then it may switch to using multiple columns. Anyway I wanted to have IPA beside the transcription schemes in Arabic and Persian entries – better though switchable inflection tables –, as employing even any room in the vertical space for pronunciation sections is wasteful and inefficient.

But I doubt the bottability of any proposal. It’s all a waste of manhours, for the order being ever debatable. Fay Freak (talk) 21:34, 5 August 2021 (UTC)[reply]

Support moving etymologies below (but above translations, as per @Geographyinitiative). Oppose moving pronunciations so far below, per @Frigoris, @AG202. At the least they should be above Conjugation, even if we do bring them below the definitions.

Alternatively, languages with simple IPA templates could have them on the headline template and have a separate header for audio files and phonological trivia at the bottom, like fr.wikt does. This allows true "Pronunciation" headers to be reserved for languages that actually need them, like Chinese.--Tibidibi (talk) 00:49, 6 August 2021 (UTC)[reply]

Why are you man presupposing that etymologies aren’t written for multiple parts of speech? Many things would have to be written anew. Fay Freak (talk) 00:55, 6 August 2021 (UTC)[reply]

Thanks for the tag! Yes, like I said in Wiktionary talk:Votes/2021-08/Prioritizing definitions, the change for the Pronunciation section doesn't make that much sense to me, and if anything I'd prefer the fr.wikt solution if any change has to be made. Support the change to etymologies as long as it's above translations (though I do appreciate them at the top as a linguist), but oppose the change to pronunciation unless something else is proposed. AG202 (talk) 02:05, 6 August 2021 (UTC)[reply]

Update: After reviewing the feedback here, the talk page, and on Discord, I've kept the Pronunciation section at the top of the entry, as the only section before part of speech headers (and renamed the vote accordingly). I recommend seeing the new order in action at User:Ultimateria/alt entry layout, and where the change is less stark at User:Ultimateria/alt entry layout 2. The focus of the vote is now to continue grouping pages by etymology while moving the etymologies themselves out of the way of definitions. You can see in my first mockup that definitions are in clearer focus and that definitions, derived and related terms, and translations form more cohesive sections. Ultimateria (talk) 03:35, 6 August 2021 (UTC)[reply]

I will vote yes on the proposal the way it is formulated now. Allahverdi Verdizade (talk) 11:26, 6 August 2021 (UTC)[reply]

I don’t see the advantages in the examples. One isn’t even prioritizing definitions since pronunciations come first. Even for languages where pronunciation is frequently unguessable, I don’t see why this should be so. I can also know English words and their meanings without knowing how to pronounce them. Or Chinese characters. It could be that one is only interested in writing; in fact I have never spoken in English in my life, only written it. A great way to learn languages is treating all like dead ones, like one learned Latin. But here you are even prioritizing the recently controverted pronunciation sections of Latin. And there are more headings than before (Entry N). Fay Freak (talk) 04:28, 8 August 2021 (UTC)[reply]

Should we not explicitly allow for pronunciation and etymology to be shared by all the entries on a page? Perhaps some general words can be found. For Lao-script Pali, it is definitely the case that what have different etymologies may have different sets of alternative forms. --RichardW57m (talk) 12:44, 6 August 2021 (UTC)[reply]

Pages with just one entry have one Etymology (L3) section, and pages with multiple entries need an Etymology (L4) for each entry. Pronunciation is a L3 header above definitions by default, and pages with multiple entries may have L4 Pronunciations under each entry if they don't share pronunciations. As for Alternative forms, they will come shortly after definitions, which is already an option. The parts about Etymology and Alternative forms are already covered in my proposed changes to WT:ELE, but I can add a sentence to the Pronunciation section about what to do in multi-entry pages. Ultimateria (talk) 16:30, 6 August 2021 (UTC)[reply]

Okay, I've added it. Ultimateria (talk) 17:20, 6 August 2021 (UTC)[reply]

I could only find it in the complex layout example. Apart from that, I can only find pronunciation as part of a numbered entry section. --RichardW57 (talk) 02:26, 8 August 2021 (UTC)[reply]

Alternative form sections can be quite long - one for each of about 30 scripts for Sanskrit. That gets unwieldy if there are half a dozen entries. --RichardW57 (talk) 02:26, 8 August 2021 (UTC)[reply]

Personally I am fine with requiring that the alternative forms be placed below the definition but I somewhat like the current system with etymologies placed above. This is consistent with many print dictionaries that I remember consulting in the past, which tend to list entries somewhat like this: headword /pronunciation/ [etymology] 1. definition 2. definition 3. definition etc. Benwing2 (talk) 03:30, 8 August 2021 (UTC)[reply]

I think we need to step back and look at the big picture: Mediawiki is designed around headers. That means that if you want to divide things into sections, you have to stick a big fat piece of text at the top of each section, with the biggest, fattest piece of text first. In other words, organization is always prioritized before content.

I'm not sure how easy it would be to implement, but I think we should concentrate on minimizing the headers and shifting as much as possible to footers. The tricky part, of course, is designing the appearance so you know what section you're in without a grand megalith at the top to make it obvious. Chuck Entz (talk) 05:31, 8 August 2021 (UTC)[reply]

What do you mean by "footers"? Ultimateria (talk) 06:13, 8 August 2021 (UTC)[reply]

A section footer: something that shows that you're at the bottom of the Noun, or Pronunciation, or Etymology 1 section. Chuck Entz (talk) 06:51, 8 August 2021 (UTC)[reply]

Wonderfool again: splitting a, o

Wonderfool is trying to "fix" the Lua memory errors by splitting pages like a, o into other-languages sections. This is introducing its own errors. We should decide now whether to revert these changes. Benwing2 (talk) 23:44, 7 August 2021 (UTC)[reply]

I have undone them and he has posted here: Wiktionary:Grease_pit#Worst_attempt_ever_at_saving_Wiktionary. Equinox ◑ 23:49, 7 August 2021 (UTC)[reply]

Read-only tomorrow

Hello!

A maintenance operation will be performed tomorrow, Tuesday 10th August, at 05:00 UTC.

It will impact 17 wikis, and is supposed to last a few minutes. During this time, saving edits will not be possible. For more details about the operation, please check on Phabricator.

A banner will be displayed 30 minutes before the operation.

Please help make your community aware of this. Thank you! SGrabarczuk (WMF) 02:02, 9 August 2021 (UTC)[reply]

Quotation mark standardization

I was directed to comment here prior to creating a WT:VOTES regarding style and policy. I have been informed that standardizing quotation mark style was previously proposed, with voting for straight, and voting for curly, but both proposals failed. I think this concern is worth looking into again. I was previously unaware that the two styles existed simultaneously on Wiktionary, and that there had been discussion/votes on the matter; I'm sure many were similarly left out.

While inquiring about variations in quotation mark display, I was directed to view Wiktionary:Style guide#Quotation marks. I propose that Wiktionary should adopt a standard style of quotation mark usage, either straight or curly, but not both. For a comparative reference, see w:MOS:STRAIGHT. I find the mixture of quotation mark styles distracting to read/edit. Wiktionary should adopt one style of usage for uniformity, with allowance for agreed upon exceptions as necessary. — CJDOS, Sheridan, OR (talk) 18:51, 12 August 2021 (UTC)[reply]

Seems pretty pointless to me. If there is a change, it probably should come from inside the WT community, anyway. But I'm just a single user with a single account, so am not speaking on behalf of the community. Wubble You (talk) 15:39, 13 August 2021 (UTC)[reply]

See the trees in the etyma forests

We already categorize affixes by their distinct types using |id=, as in Category:Middle English words suffixed with -ly (adverbial).

In contrast we have |from= in definition lines using {{given name}}. It causes the categories Category:Arabic given names from Coptic and Category:Arabic male given names from Coptic to appear in Category:Arabic terms derived from Coptic.

The categorization I reproach not, but it is ironical and irreconcilable to vote to prioritize definitions but then include etymologies in definition lines. Indeed it cannot be avoided sometimes for high-falutin figurative senses, but what concerns me is the duplication. Shouldn’t |from= be deprecated? Its historical ground is, as I understand it, no more than Wiktionary shirking from inclusion of foreign-language content in the unwoke 2000s, to just mention the origin language instead of the origin term, like popular outmoded paper dictionaries. But now the same item is in both Category:Arabic male given names from Coptic and Category:Arabic terms derived from Coptic. This shouldn’t be: I would like to see the actual number of Coptic loans in Arabic, Russian loans in English and so on, because names of people and settlements do not work like generic vocabulary and are not to be viewed as such.

I conclude that we should go further and have a flag in etymology templates ({{bor}}, {{inh}} etc., but also request templates such as {{rfe}}) to mark whether a term is a toponym or an anthroponym. This would make any category Category:Requests for etymologies in langname entries much more workable, since the etymology of place-names is a different field from the etymology of common nouns and verbs. For Category:Requests for clarification of definitions by language there may be different groups; for instance, old units of measure are often given inexactly. Ideally one would categorize them via {{rfdef}} under a particular category of underdefined measures, so that someone can go through all of them and resolve them with the help of metrological material. Fay Freak (talk) 21:07, 13 August 2021 (UTC)[reply]

Clarify what web pages count as "permanently recorded" for WT:ATTEST

There's been a lot of discussion about this over the years (e.g. 1, 2, 3), which ultimately hasn't led to much clarification at all on WT:ATTEST. WT:ATTEST specifies that Usenet is acceptable but is silent with respect to web pages. It should be specific as to which web pages count as permanently recorded for the purpose of this policy and which don't.

As for what should count, I think that anything that can be stuck into the Internet Archive or WebCite should count. A 2012 vote to mention something about WebCite failed with a tie, 7 to 7. It has been alleged that Usenet articles are more durable than web pages on the Internet Archive, but I agree with arguments that the opposite is the case. The broader issue here is that in 2021, Usenet is vastly less popular than the Web, so the first clear written uses of new terms are going to appear on the Web. If we wait for words like sniddy, currently undergoing an RFV, to appear on Usenet or in print, we'll be waiting a long time before we can include words whose usage was clearly established years ago. —Kodiologist (talk) 15:21, 16 August 2021 (UTC)[reply]

A few observations.

Part of the reason for the three durable uses rule is to weed out things that a professional editor would not let through. Yes, I discriminate against lower registers. Tweeting lulz three times doesn't make it a word even if the Internet Archive catches it. The online marketing department's latest coinage can be lost in the dustbin of history with no harm done.
There are words that pop up regularly on Twitter, in newspaper comment sections, and in similar contexts that we do want to keep. This is especially true of words that professional journalists and editors avoid because they are politically unsound. I don't have a formula to apply. We could be more liberal with "clearly widespread use."
I do not consider modern Usenet a good source. I would put a cutoff date around 2000-2010 as so many NNTP servers shut down and access became inconsistent.
Some web sites break all their URLs every few years, regardless of whether they throw away old content. Some keep long-term stable URLs. Some of those with long-term stable URLs are on the Internet Archive.
We might allow a stable web site to provide one of the uses without saying all three uses can be online, say a long term stable web site that allows archiving. In practice this is unlikely to matter very often.

Vox Sciurorum (talk) 17:27, 16 August 2021 (UTC)[reply]

archive.org: "How can I exclude or remove my site's pages from the Wayback Machine? You can send an email request for us to review to info(ad)archive.org with the URL (web address) in the text of your message." So it isn't durably archived in general.

(The @ had to be replaced as an error appeared:

"Warning: Your edit appears to contain an e-mail address. Posting e-mail addresses here is not recommended; they will be viewable publicly, exposing them to the risk of spam.

If you have an e-mail address assigned to your Wikimedia account, you may use the link [[Special:EmailUser/{{subst:REVISIONUSER}}]] (copy the bracketed text below) to refer other users to it. Note however that only users who themselves have an e-mail address set can use this link.

If you understand the risks and wish to save the edit anyway, you may proceed again."

But when trying to save anyway, a captcha reappeared and then this message only popped up again.)

--18:08, 16 August 2021 (UTC)

@@ Line 337: / Line 337: @@
 :* We might allow a stable web site to provide one of the uses without saying all three uses can be online, say a long term stable web site that allows archiving.  In practice this is unlikely to matter very often.
 : [[User:Vox Sciurorum|Vox Sciurorum]] ([[User talk:Vox Sciurorum|talk]]) 17:27, 16 August 2021 (UTC)
+:[https://help.archive.org/hc/en-us/articles/360004651732-Using-The-Wayback-Machine archive.org]: "How can I exclude or remove my site's pages from the Wayback Machine? &nbsp; You can send an email request for us to review to info(ad)archive.org with the URL (web address) in the text of your message." So it isn't durably archived in general.
+:(The @ had to be replaced as an error appeared:
+:: "Warning: Your edit appears to contain an e-mail address. Posting e-mail addresses here is not recommended; they will be viewable publicly, exposing them to the risk of spam.
+::
+:: If you have an e-mail address assigned to your Wikimedia account, you may use the link <nowiki>[[Special:EmailUser/{{subst:REVISIONUSER}}]]</nowiki> (copy the bracketed text below) to refer other users to it. Note however that only users who themselves have an e-mail address set can use this link.
+::
+:: If you understand the risks and wish to save the edit anyway, you may proceed again."
+:But when trying to save anyway, a captcha reappeared and then this message only popped up again.)
+::--18:08, 16 August 2021 (UTC)


