User talk:CodeCat

Definition from Wiktionary, the free dictionary
Archives: 2009-2010 · 2011 · 2012

Moved/broken rhyme links


I see that your bot MewBot has moved the English rhymes pages from "Rhymes:English:Stressed on /xxx/" to "Rhymes:English:xx-". Unfortunately, it didn't update the pages that linked to those pages or add a redirect, so now all the links at Rhymes:English are broken. Can you explain the reason for the move (I don't think there was anything wrong with the previous format), and fix the broken links? Thanks.

Paul G (talk)17:36, 19 October 2014

Ok I updated the links.

CodeCat17:58, 19 October 2014

Thank you. There are still a lot of broken links in many of the other rhymes pages, though. See the Notes section on this page, for example. Could your bot have a look through the pages it moved to see what links to the old pages?

Paul G (talk)06:30, 20 October 2014


I used in the bot I created (User:WingerBot), with a few changes (mainly, I extracted calls to pywikibot.output() to a function and changed it to directly output UTF-8 encoded text to stdout/stderr, because pywikibot.output() behaves strangely when stdout is redirected to a file, automatically transliterating Arabic text). I would like to make the source code for the bot available. Are you willing to allow to be released? If so, what license do you want on it (e.g. GPL or MIT)?

Benwing (talk)23:43, 14 October 2014

GPL is ok.

CodeCat23:57, 14 October 2014

Thanks. Code is now on github, at [1].

Benwing (talk)08:20, 15 October 2014

Script-related issue in the templates

I noticed a script-related issue in our templates. I don't know exactly which module is responsible, since I've been away for a while and have forgotten which module does what, but it must be related to the language utilities, the script utilities or their related modules, so I'm bringing it up here. The problem is that the script is chosen solely by the script detection function, and if detection fails, the "None" class is used instead of the first script in m.lang.scripts.


{{head|ccp|noun|head=𑄚𑄳𑄟}}: 𑄚𑄳𑄟 (transliteration needed)
{{l|ccp|𑄚𑄳𑄟}}: 𑄚𑄳𑄟

Related data in Module:languages/data3/c, note "scripts":

m["ccp"] = {
        names = {"Chakma"},
        type = "regular",
        scripts = {"Cakm"},
        family = "inc"}

Related data in Module:scripts/data, note the lack of "characters":

m["Cakm"] = {
        names = { "Chakma" },
Z22:24, 19 July 2014

I suppose that if there is only one script listed, we could use it as fallback if detection fails.

CodeCat22:26, 19 July 2014
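The single-script fallback suggested here could be sketched roughly as follows. This is a Python illustration rather than the Lua of the actual modules; `SCRIPTS`, `detect_script` and `choose_script` are hypothetical stand-ins, not the real Module:scripts interface.

```python
import re

# Hypothetical script data: "characters" is a regex character class, or None
# when no characters have been recorded for the script (as with Cakm).
SCRIPTS = {
    "Latn": {"characters": "A-Za-z"},
    "Cakm": {"characters": None},
}

def detect_script(text, script_codes):
    """Return the first listed script whose characters occur in text, else None."""
    for code in script_codes:
        chars = SCRIPTS[code]["characters"]
        if chars and re.search("[" + chars + "]", text):
            return code
    return None

def choose_script(text, script_codes):
    """Detect the script; fall back to the sole listed script if detection fails."""
    detected = detect_script(text, script_codes)
    if detected is not None:
        return detected
    if len(script_codes) == 1:
        # Only one candidate, so there is nothing else it could sensibly be.
        return script_codes[0]
    return "None"
```

With more than one script listed, a failed detection still yields "None", which keeps the case where the text really is in none of the listed scripts.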
Edited by 0 users.
Last edit: 22:34, 19 July 2014

But as far as I recall, we have always treated the first script in the list as the default one, and I think it's good practice. Anyway, in this case we have only one script listed, but our templates mistakenly use "None" instead.

Z22:34, 19 July 2014

That was before we had Lua. Now, all scripts are treated as equal, with none given priority. This is still useful because there are cases where detection fails because the text actually isn't in any of the scripts. But in this case it fails because it's just not able to detect it at all. So that's a different case, and we could look at that.

That said, why can't the characters just be added to the script data instead? That would solve it.

CodeCat22:38, 19 July 2014
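Adding the characters amounts to giving the script a character range that detection can match against. A minimal Python sketch, assuming the Unicode Chakma block (U+11100 to U+1114F) is the range wanted; the dictionary layout and `detect_script` are illustrative only and do not mirror the real module.

```python
import re

# Assumed range: the Unicode Chakma block, U+11100..U+1114F.
SCRIPT_CHARACTERS = {
    "Cakm": "\U00011100-\U0001114F",
}

def detect_script(text, script_codes):
    """Return the first script code whose character range matches text, else None."""
    for code in script_codes:
        chars = SCRIPT_CHARACTERS.get(code)
        if chars and re.search("[" + chars + "]", text):
            return code
    return None
```

Once the range is present, detection succeeds for Chakma text without any fallback, and still returns None for text that is not in the script.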

So that was intentional? OK, but we should have added the characters first; the change has broken older entries and has caused confusion for users.[1]

By the way, the functionality of the detect_script is not perfect.

Z23:00, 19 July 2014

Remove duplicates when there are multiple heads

The entry for أنتليجنسيا has one Arabic vocalization with two possible transliterations, representing two different pronunciations. This is expressed using 1= and head2=, but for this to work properly, the code in Module:headword should check for and remove duplicate heads. (I already do this in various places in Module:ar-verb; you could reuse e.g. the contains() and insert_if_not() functions from there.) Thanks.

Benwing (talk)00:15, 15 October 2014

This doesn't work because Module:headword expects each transliteration to match up with its corresponding headword. If we're going to allow multiple headwords and multiple transliterations per headword, it'll get really messy. Furthermore, no other template on Wiktionary supports multiple transliterations for a single term.

CodeCat00:17, 15 October 2014

What I'm asking you to do is to modify line 287 so that you copy 'heads' to a new array with duplicates removed before concatenating. This is easy to do. I would do this myself if I had permission to modify this file. Take a look at أنتليجنسيا and you'll see what I'm talking about.

Benwing (talk)00:32, 15 October 2014
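The requested change, copying the heads to a new array with duplicates removed, could look something like this. It is a Python sketch, not the Lua of Module:headword; the names `contains` and `insert_if_not` are borrowed from the post above, but their bodies here are a guess at the behaviour rather than the Module:ar-verb code, and `unique_heads` is a hypothetical helper.

```python
def contains(items, item):
    """Return True if item is already in items."""
    return item in items

def insert_if_not(items, item):
    """Append item to items unless it is already present."""
    if not contains(items, item):
        items.append(item)

def unique_heads(heads):
    """Copy heads to a new list, dropping duplicates but keeping order."""
    result = []
    for head in heads:
        insert_if_not(result, head)
    return result
```

The original list is left untouched, so any code that pairs heads with transliterations by position can still use it.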

You have to consider the implications of such a change though. Let's say that you have these parameters: head=A|head2=A|head3=B|tr=X|tr2=Y|tr3=Z. If your change is implemented, that ends up looking like this:

A or B (X or Y or Z)

It's now no longer obvious which transliteration belongs to which headword.

Concerning this specific case, though, what you're doing seems to be something other than transliteration. You're really adding additional pronunciation details into the transliteration field. Those should really go in the pronunciation section. Transliteration should not be used as a substitute for such distinctions.

CodeCat00:41, 15 October 2014
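The ambiguity described above can be reproduced with a small Python sketch. `format_headword_line` is a hypothetical function written for this illustration; it is not how Module:headword builds its output.

```python
def format_headword_line(heads, translits):
    """Join de-duplicated heads and all transliterations, as in the proposal."""
    unique = list(dict.fromkeys(heads))  # order-preserving de-duplication
    line = " or ".join(unique)
    if translits:
        line += " (" + " or ".join(translits) + ")"
    return line

# head=A|head2=A|head3=B|tr=X|tr2=Y|tr3=Z
print(format_headword_line(["A", "A", "B"], ["X", "Y", "Z"]))
# prints: A or B (X or Y or Z)
```

After de-duplication there are two heads but three transliterations, so the positional link between a head and its transliteration is lost.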

Rename {{temp|ar-numeral}} to {{temp|ar-cardinal}}?

The template {{ar-numeral}} is being used exclusively for cardinal numerals, and I modified it to reflect this, so it now puts things into Category:Arabic cardinal numbers as well as Category:Arabic numerals. Since you're quick with bot operations, can you rename the template and calls to it to {{ar-cardinal}} or similar? Thanks.

Benwing (talk)10:00, 11 October 2014

"Cardinal number" is not a part of speech though. That's why it's categorised differently.

CodeCat12:49, 11 October 2014

I don't understand. What's the purpose of Category:Arabic cardinal numbers if not to put cardinal numbers in it? Maybe pedantically it should be Category:Arabic cardinal numerals but so be it.

Benwing (talk)14:28, 11 October 2014

Headword-line templates are meant to reflect parts of speech. But "cardinal number" is not necessarily a part of speech; as we discussed before, some cardinal numbers belong to other parts of speech (Dutch miljoen).

CodeCat14:30, 11 October 2014

Well, you'd have the same objection with just "numeral", I suppose. The thing I was trying to avoid was people thinking that {{ar-numeral}} is useful for e.g. Abjad numerals like ب or Eastern Arabic numerals like ٢.

Benwing (talk)14:40, 11 October 2014

Things with numbers and numerals are kind of confused, and they were a rather contentious issue for a long time. No consensus could be reached and so things were left in an indeterminate state with multiple conflicting uses of terms and categories. For example, there were "numeral", "number", "cardinal numeral", "cardinal number", "ordinal number" and "ordinal numeral" categories all containing similar entries, with different languages using different names.

Now, we've settled on this situation:

  • "Numeral" is considered a part of speech, and contains all (cardinal) number terms that are not clearly members of another part of speech. These entries receive the ===Numeral=== header.
  • All cardinal number terms, regardless of part of speech, go in the "cardinal numbers" category.
  • All ordinal number terms, regardless of part of speech (although they are generally adjectives), go in the "ordinal numbers" category.
  • Symbols for numbers, such as 1, 2, 10, 12 go in the "numeral symbols" category. These entries receive the ===Symbol=== header. (For languages like Chinese, there's not a clear distinction here because all symbols stand for concepts. There's little difference in Chinese between the word for the number 1, and the symbol for it.)

I hope that clears things up some.

CodeCat15:00, 11 October 2014

Could you add a poscatboiler category for "definite nouns"?

This is used for Category:Arabic definite nouns. Ideally this should automatically get triggered when using {{definite of}}, or something of that sort (maybe a different {{definite noun of}} would be needed?)

Benwing (talk)10:27, 8 October 2014

If it's really a category for noun forms, then the entries should probably go in Category:Arabic noun forms instead.

CodeCat12:30, 8 October 2014

It's not. The entries that go in it are lemma forms that have a definite article in them.

Benwing (talk)07:26, 11 October 2014

Can you add a poscatboiler category for "letter forms"?

Category:Arabic letter forms is non-empty but not yet created because there isn't an obvious way to insert a category boiler. In this case, a "letter form" is one of the forms of an Arabic letter, which may have up to four separate glyphs (initial, medial, final, isolated).

Benwing (talk)07:22, 11 October 2014

This template looked strange, and had the word حرف sitting in place of the letter for b. I commented out the hypernym=حرف param and this seems to have fixed it, along with adding a translit for ا. I'm not sure what the purpose of the hypernym arg is.

Benwing (talk)13:57, 10 October 2014

Transliterations of inflections are only being displayed when there's an explicit tr= given

I've been going through and removing redundant transliterations, and if necessary adding vowels to the inflection or headword. However, your recent change to display transliterations of inflections only does so when there's an explicit tr=. Perhaps this was some attempt to finesse the disagreements in the discussion currently in the Beer Parlor, but it doesn't really work for Arabic, where I'd like to discourage having explicit transliterations in favor of specifying the vowels in the Arabic form and transliterating automatically. You might want to go ahead and implement a policy that includes translits if explicitly given or the script is non-Latin, non-Cyrillic; we can always change this if the discussion ends up implementing a different policy. For the moment it looks like the consensus is in favor of transliterating Arabic inflections at least, probably all non-Latin scripts except Cyrillic and maybe Greek.

Benwing (talk)08:36, 8 October 2014
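The interim policy proposed above (show a transliteration if one is given explicitly, or if the script is neither Latin nor Cyrillic) could be sketched in Python as below. The function name and the exclusion set are assumptions made for illustration; the post leaves open whether Greek belongs in that set.

```python
# Scripts for which inflection transliterations are NOT shown automatically.
# The exact membership is an assumption; Greek is deliberately left out here.
NO_AUTO_TRANSLIT = {"Latn", "Cyrl"}

def show_inflection_translit(script_code, explicit_tr=None):
    """Decide whether an inflected form should display a transliteration."""
    if explicit_tr:
        # An explicit tr= always wins.
        return True
    return script_code not in NO_AUTO_TRANSLIT
```

Keeping the excluded scripts in one set would make it easy to adjust once the discussion settles on a policy.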

I'd rather wait; certain other editors here have a habit of chiding me every time I make a change they don't like.

CodeCat12:29, 8 October 2014

You incorporated plurals and such into {{head}}. They were purposely not there before so that transliterations would be displayed of plurals and feminine forms and such. The call to {{ar-linkify-bold}} added such transliterations; it also added vowel diacritics to the Arabic, matching the transliteration. (The latter should eventually be cleaned up with a bot, but for the moment it would be good to keep this feature.) It would be best to leave the old code as-is; at least you should fix up {{head}} to add the transliterations back for inflected forms, perhaps on a language-by-language basis.

Benwing (talk)00:42, 7 October 2014

I was aware of that problem, but I'm not sure what would be the best way to fix it. To find a proper solution we'd have to ask, first, why we have transliterations for Arabic inflections but not for Russian ones. We should really be consistent. If we have them for Arabic, we really ought to have them for other languages too.

CodeCat01:03, 7 October 2014

Maybe, but some languages are easier to read than others. Cyrillic is pretty similar to Latin writing so I can understand people not wanting to transliterate. Arabic script is quite different, and harder to learn to read because the letters have contextual forms and ligatures and such, so it's really very helpful to have the transliteration. I think you should put it back until we find consensus to do things differently.

Also, currently in many cases the transliteration has more info than the Arabic. Displaying the transliteration is thus important for multiple reasons.

You also eliminated the automatic vocalization, which really should be there until it gets fixed up with a bot.

In general I'm not sure why you felt the need to rewrite the templates in Lua; they were working fine before.

Benwing (talk)01:33, 7 October 2014

I've started a Beer Parlour discussion about the transliteration issue now.

CodeCat16:10, 7 October 2014

Do you have example code using pywikibot and mwparserfromhell?

Your MewBot code doesn't use mwparserfromhell and its template-parsing code is fragile (it will break e.g. with embedded links with | in them).

Benwing (talk)00:18, 6 October 2014

It does, actually. Where is it breaking?

CodeCat00:18, 6 October 2014

I'm looking at the source code you've posted. It's (c) 2013 so maybe you've updated the actual code. There's no import that looks like mwparserfromhell, and you parse templates in getTemplate() in using regexps (which will break with embedded templates) and break up params in parseTemplate() using split on '|'.

Benwing (talk)00:22, 6 October 2014
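The fragility of splitting on '|' is easy to demonstrate. The Python sketch below uses only the standard library; `naive_split` and `split_params` are hypothetical helpers written for this illustration and are not taken from MewBot or from mwparserfromhell. The depth-aware version is a simplification (it does not handle triple braces, comments or nowiki), which is the kind of detail a real parser takes care of.

```python
def naive_split(body):
    """Split a template body on every '|', ignoring nesting."""
    return body.split("|")

def split_params(body):
    """Split a template body on '|' only outside nested [[...]] and {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts

body = "l|en|[[foo|bar]]"
print(naive_split(body))   # ['l', 'en', '[[foo', 'bar]]']
print(split_params(body))  # ['l', 'en', '[[foo|bar]]']
```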

That code is only for the part that creates inflection entries. But I haven't run that in some time, mainly because it's so much work to convert it all to use mwparserfromhell. What I run now is mainly small, purpose-built scripts that are backed by pywikibot, mwparserfromhell and a small support library I wrote.

CodeCat00:24, 6 October 2014

Where is your current code available?

Benwing (talk)00:24, 6 October 2014

It's not available anywhere. Most of it is not worth keeping as it's used once and then deleted.

CodeCat00:27, 6 October 2014

What are you looking/aiming for?

I wonder why you made these edits to Module:Template:also.

Would have been nice if you had explained it in the edit summary, by the way.

Keφr12:36, 5 October 2014

They're just for testing purposes, nothing about how the module works has been changed. I was interested mainly in how much of the module could be made automatic.

But it does turn out that Template:tracking/also/uni/noauto and Template:tracking/also/uni/auto have no transclusions. This means that this part of the module can be eliminated.

CodeCat13:26, 5 October 2014


Why do we have these "Wiktionary does not have an entry for..." terms? These are useless.

I understand their points. Their points are to say that we don't have an article on something because it's not notable enough. This is to be said for articles that people commonly look up.

However, I do argue that we could say that for anything. We could have an article for Beyblade: Metal Fury or Mona Parker or Super Mario Galaxy having these kinds of things. I'm sure lots of people would think that Wiktionary would have articles on such things and look them up and not get the answer. Or aogfhdohoolhsfodlshodfgdosf, this article saying "Wiktionary does not have an entry for this term". I could look up random letters anytime I want and still not have a Wiktionary article, but people may type such random words in written works. So, as I say, these are useless and unideal in my opinion. I think we should just get rid of this whole idea.

Rædi Stædi Yæti {-skriv til mig-}00:00, 5 October 2014

Why did you delete Category:Arabic nouns lacking gender, and delete the code that adds lemmas to this category?

The category is actually useful. You also didn't fix up the documentation to reflect this change.

I have half a mind to undo this change; at the very least, you really really need to be giving explanations in the change log.

Benwing (talk)11:20, 3 October 2014

{{head}} automatically adds entries to a cleanup category if the gender is "?" so this extra category isn't needed. See Category:Arabic terms with incomplete gender.

CodeCat12:27, 3 October 2014


Benwing (talk)08:34, 4 October 2014

Edits on Dutch pronunciation

Hi CodeCat, though undoubtedly well-intended, I believe your changes to the pronunciation of Dutch words constitute a Netherlands-centric bias / POV. Dutch does not have a standard pronunciation that covers all regions; Dutch is a pluricentric language. The recommended pronunciation of standard Belgian Dutch (e.g., the highly-regarded "VRT-Dutch") is markedly different in several ways from the recommended pronunciation of standard Netherlandic Dutch. I propose to undo all your recent edits concerning the pronunciation of Dutch words.

Morgengave (talk)22:14, 28 September 2014

Dutch is not a pluricentric language at all. There is only one standard, the one regulated by Taalunie, and it only regulates spelling, not pronunciation. There is no standard for pronunciation at all.

What is standard, though, is the common phonemic system of the majority of more-or-less mainstream varieties of Dutch. That means, specifically, that regardless of where you go, the language has the same set of phonemes, even if they are pronounced differently in different areas. See w:Dutch phonology for more details. Many Wiktionary entries currently give the different regional pronunciations as phonemes (with / /) as if the phonemic system is somehow different between Belgian and Netherlandic. But it's not, of course; /ç/ and /x/ are really the same thing, so we should not write them differently if we are writing the phonemes, which concern only the underlying structure of the word (that is, which distinguishing sound-pieces it is made of).

CodeCat22:20, 28 September 2014

Hi CodeCat, I never said there are Dutch pronunciation "standards"; I did refer to pronunciation "recommendations", which de facto exist, which differ between the regions, and which are the closest Dutch has to pronunciation standards. I'm not entirely certain where you are heading, but it's common practice to recognize pronunciation differences in IPA; see for example English military or mandatory. Equally, the English language appendix on this makes note of the differences in pronunciation between the varieties: Appendix:English pronunciation. Opting for a single IPA transcription, and preferring the Northern-Dutch variant, constitutes a POV. Indeed, if we opt for a single IPA transcription, we could opt for the Southern-Dutch variant as well (which, to be clear, is not my suggestion either).

Morgengave (talk)22:41, 28 September 2014

I'm not preferring any variant. There is no such thing as "northern Dutch", "southern Dutch", "Netherlands Dutch" or "Belgian Dutch" to begin with. There are just different regional varieties, and choosing a single particular variety as "the" standard for either the Netherlands or Belgium, like you are suggesting, would be POV. Recommended pronunciations can be included of course, that's not a problem. But what I disagree with is labelling them simply "Netherlands" and such. That's just wrong, but moreover it horribly skews the picture that readers might get. Contrasting "Netherlands" versus "Belgium" gives the impression that the national border is a linguistic border separating these two varieties, which is not true of course. But contrasting "north" and "south" is not much better. What about northwest versus northeast? Or southwest versus southeast? There are big differences in pronunciation there too. We shouldn't be using labels that give an overly simplistic picture of things.

That said, the one thing that all these varieties generally do share, is that they have the same set of phonemes. Whether someone says /ç/, /x/ or /χ/ doesn't matter, anyone anywhere in the Dutch language area would understand them as being the same. Maybe the article w:Diaphoneme would be useful to you.

CodeCat22:51, 28 September 2014

Hi CodeCat, then you would not mind removing the Northern variants and keeping the Southern ones? I do not believe normal users would read the IPA in terms of phonemes; this is certainly not the common practice on Wiktionary (see for example cancer). Hence, most if not all people would read this as the common and standard way of pronouncing the word, and so excluding the Southern pronunciations remains a POV. To reply to your other point: as you likely know, there is no official American pronunciation, and in the UK, RP is a recognized practice rather than a standard. That does not stop contributors from labelling certain IPAs as UK or US. I am equally open to being more precise. But again, let us keep the different IPAs... I also highly recommend not continuing to remove IPAs; this is not constructive as long as we do not have a consensus. If need be, we can consult some of the contributors to the English IPA page.

Morgengave (talk)23:11, 28 September 2014

I'm not sure why you think I'm excluding southern pronunciations. I'm not including or excluding any pronunciations at all; I'm just removing misleading information and using a single common phonemic system for Dutch. As I said, labelling pronunciations as "Netherlands" is misleading and POV, especially when they contain features that are only found in a relatively small number of speakers. We shouldn't be labelling pronunciations unless the label is accurate and neutral. I will continue to remove the labels "Netherlands" and "Belgium" as these are not correct.

CodeCat23:18, 28 September 2014

Why did you remove the Sexagenary cycle category I created?

Can you tell me why?

Fumiko Take (talk)12:15, 27 September 2014

The category structure is stored in a number of modules, Module:category tree/topic cat and its subpages. This is done so that all languages can easily share the same tree structure and there is no redundant information. If you add categories manually then it bypasses that whole system.

Also I wonder why you created the categories if they show errors like that.

CodeCat12:17, 27 September 2014

I'm no real Wiki guy, so I ain't real good at Wiki stuff. I've done a few discussions and seen a lot, but most of them came to a dead end, unless someone just stood up and did something. So I just created the categories, and hoped that anyone who sees the errors would help me fix them.

Fumiko Take (talk)13:06, 27 September 2014

I can help but I would need to know some things first. What parent category should Category:vi:Sexagenary cycle appear in? Also, why is the second word in "Heavenly Stems" and "Earthly Branches" capitalised?

CodeCat14:10, 27 September 2014

I just typed the terms in the way Wikipedia suggests. You can check out these articles: w:Sexagenary cycle, w:Celestial stem, w:Earthly Branches. The Sexagenary cycle category should be a child category of something related to the w:Chinese zodiac, or something related to the way Asian people mark years or times of a day.

Fumiko Take (talk)14:21, 27 September 2014

I've been reading up on this since I saw the new categories: basically, the sexagenary cycle is a Chinese system for naming units of time that has the 12 Earthly Branches as the low-order cycle and the 10 Heavenly Stems as the high-order cycle, though I'm unclear as to how this adds up to 60 rather than 120. At any rate, the 12 can be times of the day (corresponding to 2-hour periods) with 10 days of the week, months of the year, or years. The 12 Earthly Branches as applied to years are better known in the West by their animal symbols; I believe we're currently in the Year of the Horse. Apparently, the Celestial Stems can also be used anywhere arbitrary names are needed in ordered lists, in the same way we use letters, and the 12 Earthly Branches are used to label compass points.

In modern usage, though, the sexagenary-cycle-based calendar has been replaced by the Gregorian calendar for day-to-day and official purposes, so it mostly survives as the basis for religious/folk beliefs.

All of this means that it's very hard to assign a single parent: in the West, it's mostly known as a sort of astrological cycle applying to years, but it could just as easily apply historically to times of the day, days of the week, and months of the year. As for capitalization: title case seems to be the norm in actual usage.

Chuck Entz (talk)16:37, 27 September 2014
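On the question of 60 rather than 120: the stem and the branch both advance by one at each step, so a pair recurs only after a number of steps divisible by both 10 and 12, and the smallest such number is their least common multiple, 60. Put another way, the stem index and branch index always share the same parity, so half of the 120 conceivable pairs never occur. A short Python check, using bare indices in place of names:

```python
from math import gcd

STEMS, BRANCHES = 10, 12

# Step both cycles together and record each (stem, branch) pair until one repeats.
pairs = []
n = 0
while (n % STEMS, n % BRANCHES) not in pairs:
    pairs.append((n % STEMS, n % BRANCHES))
    n += 1

print(len(pairs))                                # 60
print(STEMS * BRANCHES // gcd(STEMS, BRANCHES))  # 60, the least common multiple
```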

I notice we also have Category:Chinese months. Are those different?

CodeCat17:32, 27 September 2014
Edited by author.
Last edit: 03:37, 28 September 2014

Indeed you got it mostly right. I might add that the Heavenly Stems usually go with the Earthly Branches when it comes to naming years. For example, this year is called "Giáp Ngọ" ("Giáp" is a stem and "Ngọ" is a branch). The next year is "Ất Mùi", in which "Ất" follows "Giáp" and "Mùi" follows "Ngọ", and so on. By the way, I wonder if any of you can create a category for this kind of page, such as giờ Mùi, giờ Tuất, giờ Hợi, etc.?

Fumiko Take (talk)02:37, 28 September 2014
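The year-naming pattern described above can be sketched in Python. The Vietnamese stem and branch lists and the 1984 anchor (a Giáp Tý year) are supplied here for illustration and are not taken from the thread; the function also ignores the fact that the traditional year starts at the lunar new year rather than on 1 January.

```python
STEMS = ["Giáp", "Ất", "Bính", "Đinh", "Mậu", "Kỷ", "Canh", "Tân", "Nhâm", "Quý"]
BRANCHES = ["Tý", "Sửu", "Dần", "Mão", "Thìn", "Tỵ",
            "Ngọ", "Mùi", "Thân", "Dậu", "Tuất", "Hợi"]

def sexagenary_year(year):
    """Name a Gregorian year by stem and branch, counting from 1984 (Giáp Tý)."""
    offset = year - 1984
    # Both indices advance by one per year; Python's % keeps them non-negative.
    return STEMS[offset % 10] + " " + BRANCHES[offset % 12]

print(sexagenary_year(2014))  # Giáp Ngọ
print(sexagenary_year(2015))  # Ất Mùi
```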

More words with no head


DTLHS (talk)20:34, 25 September 2014

Is that a list of all entries which contain the sequence \n''' or \n; ?

CodeCat20:35, 25 September 2014
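A search of that kind could be expressed as a regular expression over the page wikitext. The Python sketch below is only an illustration of the pattern in the question; `looks_headless` is a hypothetical name, and nothing here is claimed to be how the list was actually generated.

```python
import re

# A line break followed directly by ''' (bold markup) or ; (definition-list term).
NO_HEAD_PATTERN = re.compile(r"\n(?:'''|;)")

def looks_headless(wikitext):
    """Return True if some line after the first starts with ''' or ;."""
    return NO_HEAD_PATTERN.search(wikitext) is not None
```

A pattern this narrow misses entries where the bold headword has leading whitespace or sits on the very first line, which may be one reason a list built this way comes out short.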


DTLHS (talk)20:37, 25 September 2014

I'm surprised, I thought there would be more. Thank you!

CodeCat20:38, 25 September 2014

I fixed the Finnish terms on the list. I agree with CodeCat: the list is suspiciously short.

Hekaheka (talk)07:31, 26 September 2014

“Alternative forms are no longer categorised per a previous discussion.”

Please provide me with a link to this discussion.

 — I.S.M.E.T.A.21:18, 21 September 2014

Wiktionary:Beer parlour/2013/December#How useful is Category:Alternative forms by language?. Following that discussion, {{alternative form of}}, {{alternative spelling of}} and {{alternative capitalization of}} no longer categorise. It would be rather strange if {{alternative typography of}} did categorise.

CodeCat21:22, 21 September 2014

Proto-Indo-European pronunciation

I have a question about PIE: how are *h₁, *h₂ and *h₃ pronounced?

Moonspell Bloodlines (talk)01:29, 21 September 2014

There isn't a clear consensus about it. See w:Laryngeal theory. Personally I think they were something like [h] or [x] for *h₁, [χ] for *h₂, and [xʷ] or [ɣʷ] for *h₃.

CodeCat02:01, 21 September 2014