Module talk:User:Theknightwho/UCA/DUCET

From Wiktionary, the free dictionary

@Theknightwho What is all this 000000? Tell me you aren't implementing a new Wiktionary assembly language on top of Lua. Equinox 05:31, 7 January 2024 (UTC)

@Equinox It's the Unicode Collation Algorithm DUCET (i.e. the standardised sorting weights for each character). Unlike codepoints (which are essentially arbitrary), the DUCET weights are actually meaningful, e.g. A, a, á, à, â etc. all have the same primary weight. There are also secondary and tertiary weights, which basically act as tiebreakers (e.g. "man" comes before "mán", since it would otherwise be a tie, but "mán" comes before "map", because "a" and "á" are treated as equal at the primary level). It's a lot more sophisticated than what we do at the moment, which is basically to ignore (a select few) diacritics. That's a problem if either (a) you haven't accounted for that specific character/diacritic in whichever language you're dealing with, which happens a lot with borrowed terms, or (b) your language sorts diacritics in a particular order if all else is equal (e.g. Mandarin, Yoruba).
Implementing it would make our lives a lot easier, because it would replace a ton of language-specific crap we have at the moment, and for the cases where we really do need language-specific handling there are a bunch of off-the-shelf "tailorings" (which is the name they use for language-specific adjustments to the UCA) for a ton of languages that we could steal. Theknightwho (talk) 05:45, 7 January 2024 (UTC)
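A minimal sketch of the tiebreaker idea described above, in Python rather than the module's Lua. The weights in `TOY_WEIGHTS` are made up for illustration and are not real DUCET values, and `sort_key` is a hypothetical helper, not part of the module. It also simplifies by giving "á" a single entry with a heavier secondary weight, whereas the real algorithm works on decomposed characters.

```python
# Toy collation table: character -> (primary, secondary, tertiary).
# ASSUMPTION: these numbers are invented for illustration; they are not DUCET values.
TOY_WEIGHTS = {
    "a": (1, 1, 1),
    "\u00e1": (1, 2, 1),  # "á": same primary as "a", heavier secondary (accent)
    "m": (2, 1, 1),
    "M": (2, 1, 2),       # same primary and secondary as "m", heavier tertiary (case)
    "n": (3, 1, 1),
    "p": (4, 1, 1),
}


def sort_key(word):
    """Build a multi-level key: all primary weights, then all secondary, then all tertiary.

    A 0 separator (lower than any weight in the table) is appended after each level,
    so a lower level is only consulted when every higher level ties.
    """
    elements = [TOY_WEIGHTS[ch] for ch in word]
    key = []
    for level in range(3):
        key.extend(weights[level] for weights in elements)
        key.append(0)
    return tuple(key)


words = ["map", "m\u00e1n", "Man", "man"]

# Plain codepoint order puts "mán" after "map", because U+00E1 > U+0070.
by_codepoint = sorted(words)

# Multi-level order: "man", "Man" and "mán" tie on primary weights and sort before "map".
by_weights = sorted(words, key=sort_key)
```

With this toy table, `by_weights` comes out as "man", "Man", "mán", "map": case is settled at the tertiary level, the accent at the secondary level, and "p" versus "n" at the primary level.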
@Theknightwho: I'll trust ya... once. I assume all these modules are going to get Documentation pages. Equinox 05:49, 7 January 2024 (UTC)
@Equinox This one's a private module - I'm just experimenting to see what's possible. Theknightwho (talk) 05:50, 7 January 2024 (UTC)
Haha, oh yeah, I didn't look at the namespace, did I? Then I'll backpedal and say thanks for not doing your tests in public, like some great Wiktionarians I have known. I had a really funny comment to add but I'm sober enough tonight not to add it. Good luck. Equinox 05:52, 7 January 2024 (UTC)