User talk:Robert Ullmann/2009


Suricata

Would you have anything to add to Talk:Suricata? Ƿidsiþ 19:26, 2 January 2009 (UTC)

Spanish newspaper

Would you be willing to extract word lists from a Spanish paper? This is the one I was thinking of- the front page lists a lot of articles for the day, so just following links on 1 page should be enough. Lemmatization isn't a problem- I can do that myself from the non-lemmatized forms easily enough. What do you think? Nadando 22:25, 2 January 2009 (UTC)

Still working bugs out of the code set up now (it ran successfully for the first time while I was watching Spurs-Wigan ;-). But yes, can certainly look at this. On a first glance, the sections look useful too. (pages/subpages are easy) Robert Ullmann 23:20, 2 January 2009 (UTC)
I'm still playing; not sure where I want to go with this. I wrote it because I needed it for Swahili. Robert Ullmann 11:55, 3 January 2009 (UTC)
I've been rethinking this- I just don't think we have a good enough general coverage at this point to need to extract words from an entire paper. It takes me an hour just to do 1 article, which I can easily format to extract the words we don't have by myself. Nadando 01:42, 10 January 2009 (UTC)
Took me just a little bit to convert it for ABC.es; good exercise because I need to generalize it so I don't have a full copy of the code to maintain for each language. I've tried running it; works fairly well.
Note it will catch things you probably want, that the simple method of linking all the words in an article won't: horas exists, but is only Latin. colas and altos only English. A number of other fairly common words have entries in Italian (Portuguese, etc, even Finnish) but not Spanish. The code also prioritizes shorter and more common words. I'll do a full run for you to look at; no obligation on your part (;-). Robert Ullmann 15:09, 10 January 2009 (UTC)
Ok then :D should be interesting. Nadando 15:18, 10 January 2009 (UTC)

Italian papers

Hi there. In User:Robert Ullmann/Italiano/2 January 2009 entry "citt" should be for città. Cheers SemperBlotto 22:47, 3 January 2009 (UTC)

Hi! Yes, already fixed. Wasn't replacing HTML entity for a-grave. Robert Ullmann 00:18, 4 January 2009 (UTC)
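For anyone reading along, the "citt" problem can be reproduced and fixed in a couple of lines. This is only a minimal sketch, assuming the scraped text still contains named HTML entities such as `&agrave;`; the function name `decode_entities` is hypothetical and this is not the papers program's actual code.

```python
import html

def decode_entities(text):
    """Replace HTML character references (e.g. &agrave;) with the characters they stand for."""
    return html.unescape(text)

# Decoding before tokenizing keeps the word in one piece instead of ending it at the '&'.
print(decode_entities("citt&agrave;"))
```

Decoding has to happen before the text is split into words, otherwise the entity is treated as a word boundary.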

Hi again. I would like to add some generic stoppers (maybe using a wildcard) e.g. all terms beginning l', dall', sull' etc as they are just uses of the etc before a vowel. Any chance of being able to add something like "*dall'*" ? SemperBlotto 08:35, 4 January 2009 (UTC) p.s. (changed section title)

I've added a number of those (else you would have seen oodles). But I don't know what they all are. (I can read a reasonable amount of Italian in something like a newspaper, but have never studied the language formally.) Is there a good place to look? I haven't yet done anything simple like looking at the 'pedia ... should the word after the article be checked? Or does it vary in spelling or form? I.e. should I "stop" it, or remove the article? Robert Ullmann 10:50, 4 January 2009 (UTC)
ATM, I'm using l', d', nell', dell', dall', all', s', sull', quell', and un'. Robert Ullmann 11:00, 4 January 2009 (UTC)
There are at least a few others: tutt', sott' ... ;-) Robert Ullmann 11:27, 4 January 2009 (UTC)
Also nell' - the word after should be normal. SemperBlotto 11:30, 4 January 2009 (UTC)
In User:Robert Ullmann/Italiano/3 January 2009 entry "anda" should be for andare. SemperBlotto 22:58, 4 January 2009 (UTC)
One would think so. But look at the source: the r is underlined, and there is a facebook link wrapped around it. (!) Not a usual situation? Robert Ullmann 23:10, 4 January 2009 (UTC)
Hi there and thanks for these wonderful lists. Just a couple of things: 1) it is Gazzetta dello Sport not Gazetta dello Sport 2) why don't you try adding Il Corriere della Sera [1] (the most important Italian newspaper) instead of l'Osservatore Romano that is full of Latin words? --Barmar 15:12, 5 January 2009 (UTC)
Urk. I spell "gazzetta" correctly in a dozen different places, and get it wrong in the one place that is visible. Okay, fixed, thank you. I wanted l'Osservatore because of the different vocabulary; and it is all on one page so easy to deal with. The code does do a fairly good job skipping the Latin, and would do better if our Latin entries had better coverage (there are a lot of very basic Latin words missing). I can certainly add another. Robert Ullmann 15:45, 5 January 2009 (UTC)
Ok for l'Osservatore. Il Corriere has generated a terrific list; I didn't believe we were still missing such important words! As regards stoppers I would add s' because it generates forms of reflexive verbs that don't deserve their own entries (e.g. see industriarsi; s'industria = si industria, s'industriarono = si industriarono, s'industriò = si industriò aren't linked). --Barmar 09:19, 6 January 2009 (UTC)
I've had to drop L'Osservatore anyway, today's (5/6) issue has French, German, and Polish. (A good thing, but not for this ;-) I could still filter it if we had enough words in other languages, but isn't worth it right now. "s" is already in the list of apocoptic forms used as contractions (as of yesterday). You are quite right that adding Il Corriere della Sera was a good idea, very good list of words. The program is now reading several hundred articles (293 today). Robert Ullmann 11:27, 6 January 2009 (UTC)
Good news - lots of useful words - even the English ones are good, as the Italians only use a subset of the meanings e.g. shuttle only = space shuttle.
Bad news - it takes me three days to complete each day's file (and I have had plenty of spare time the last few days, normally I won't have so much). So we will get a bit behind I'm afraid. Never mind. SemperBlotto 17:15, 6 January 2009 (UTC)
Isn't shuttle navette and "shuttle" only English? Yesterday's list was much larger because I added in Corriere della Sera; going forward they will be smaller, as words from articles that appear on the site for multiple days are not repeated. But yes, it may be impossible to "keep up" at this point; we have about 98% plus coverage of the daily vocabulary (by word instance count, not unique word count) which is very good, but still leaves quite a few. Robert Ullmann 15:25, 7 January 2009 (UTC)
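The difference between coverage by word instance and coverage by unique word can be illustrated with a small sketch. The function name `coverage` and the toy word list are made up for illustration; this is not how the papers program computes its figure.

```python
from collections import Counter

def coverage(words, known):
    """Return (coverage by instance count, coverage by unique word) for a word list."""
    counts = Counter(words)
    if not counts:
        return 0.0, 0.0
    total = sum(counts.values())
    # Every occurrence of a known word counts towards instance coverage.
    covered_instances = sum(n for word, n in counts.items() if word in known)
    # Each distinct known word counts once towards unique-word coverage.
    covered_unique = sum(1 for word in counts if word in known)
    return covered_instances / total, covered_unique / len(counts)

by_instance, by_unique = coverage(
    ["il", "il", "il", "gatto", "il", "shuttle"], known={"il", "gatto"}
)
print(by_instance, by_unique)
```

Because common words repeat so often, instance coverage can be very high while a long tail of unique words is still missing.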
Really too many words! Consider that we only have 22,000 Italian nouns when there are probably about 100,000. IMHO it is not a bad idea to run the software once a week instead of daily. --Barmar 07:30, 8 January 2009 (UTC)
Here's some other stoppers to be added: m', v', t', nel' (common misspelling of nell'), nell (see above), del' dell (misspelling of dell'), sull (misspelling of sull'), dev', dov' quand' quant' bell' grand' mezz' qualcos' --Barmar 15:22, 7 January 2009 (UTC)
Okay. The forms that are parsing and showing as separate words should go in the stops file. For example, I saw a use of dell'"overtime" which parses as "dell" and "overtime" (;-). The others I'll add in the program list. Robert Ullmann 15:33, 7 January 2009 (UTC)
Is it possible to somehow exclude words including the letter á? They're all typos; we just use the à. --Barmar 16:57, 9 February 2009 (UTC)
Quite simple. Done. Will take effect from tomorrow. Robert Ullmann 17:10, 9 February 2009 (UTC)
Thank you. Then you can also exclude ó, ú and í (both è and é instead, are correct). --Barmar 13:54, 10 February 2009 (UTC)
Okay, no problem (each is just one line in the profile for Italian). Robert Ullmann 14:44, 10 February 2009 (UTC)
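The two filters discussed in this thread (dropping elided prefixes such as l' and dall', and skipping words containing á, ó, ú or í) could look roughly like the sketch below. The prefix list is only the partial one quoted above, and the names `ELIDED_PREFIXES`, `REJECT_CHARS` and `normalize` are hypothetical; the real program keeps these in its own program list and language profile.

```python
import re

# Elided prefixes mentioned in the thread (partial; not the program's real table).
ELIDED_PREFIXES = ("l", "d", "nell", "dell", "dall", "all", "s", "sull",
                   "quell", "un", "tutt", "sott")
# Accented vowels reported above as typos in Italian text.
REJECT_CHARS = set("áóúí")

# Longest prefixes first, so "dall'" is tried before "d'"; accept ' or ’ as the apostrophe.
_PREFIX_RE = re.compile(
    r"^(?:%s)['’]" % "|".join(sorted(ELIDED_PREFIXES, key=len, reverse=True)),
    re.IGNORECASE,
)

def normalize(word):
    """Strip one elided prefix; return None if the word should be skipped entirely."""
    stripped = _PREFIX_RE.sub("", word)
    if not stripped or REJECT_CHARS & set(stripped.lower()):
        return None
    return stripped

print(normalize("dall'albero"), normalize("s'industria"), normalize("cittá"))
```

Words with à, è and é pass through unchanged, since those are correct in Italian.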
Hi Robert. Have these stopped happening? I know that we have trouble keeping up with you, but one a week (or so) would be useful. Cheers SemperBlotto 11:18, 12 March 2009 (UTC)
I was busy with Interwicket, and we have been having ongoing network problems here where often 70-80% of connection attempts get resets. AF and Interwicket have all network ops wrapped in layers of exception handling, so they just retry again and again (AF has been running for 2.1 Msec since last restarted ;-). I'll need to fix the papers program so it will do the same. Robert Ullmann 11:44, 12 March 2009 (UTC)
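Wrapping network operations in exception handling so they simply retry might be sketched as follows. This is an assumption-laden illustration, not the AF or Interwicket code: the helper name `with_retries`, its parameters, and the choice of `OSError` (which covers connection resets in Python 3) are all hypothetical.

```python
import time

def with_retries(operation, attempts=5, delay=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Call operation(); on a listed exception, wait and try again, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: let the last error propagate
            sleep(delay * attempt)  # wait a little longer after each failure
```

A fetch would then be called as `with_retries(lambda: fetch(url))`, where `fetch` stands for whatever function does the actual network request.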
Buongiorno Robert. Is it still impossible to run the bot (or whatever it is)? I would like to add some swine flu-related terms. --Barmar 08:28, 28 April 2009 (UTC)
Yes, I'll run it again. I've been fixing lots of other things. (;-) I did fix it to be more resilient to network problems. Robert Ullmann 08:56, 28 April 2009 (UTC)
Thank you. --Barmar 18:16, 28 April 2009 (UTC)

Translation languages

Could I get a refresh whenever you think that edits from today will be reflected (I don't have a good sense for what the delay on all that is). Also, please, please, please could we have separate pages for valid and invalid like with the L2's? I realize it seems silly to you, but it makes working on them a million times easier. Finally, I have a couple trans-lang switches for Auto-Format to chew on: Guarani --> Guaraní, Tamazight --> Central Atlas Tamazight, Romany --> Romani, Alsatian --> Swiss German, Capeverdean Crioulo --> Kabuverdianu, Ossetic --> Ossetian, Gallegan --> Galician, Frisian --> West Frisian (there might be a few errors produced by this, but I think they'll be few enough to be acceptable), Luxemburgian --> Luxembourgish, Garífuna --> Garifuna. Also, would it be possible to do Scottish --> (ttbc:Scottish Gaelic, ttbc|Scots) and Sorbian --> (ttbc:Upper Sorbian, ttbc|Lower Sorbian)? I guess I've just gotten into the habit of assuming that AF is just as sentient and capable as I am, just with a lot more time and patience ;-). I suppose a simple switch of Scottish to Scottish Gaelic wouldn't be the end of the world, but in this case I'd like them double-checked. -Atelaes λάλει ἐμοί 03:08, 5 January 2009 (UTC)

Also Piemontese --> Piedmontese. -Atelaes λάλει ἐμοί 11:16, 5 January 2009 (UTC)
Please, please, please, at least the update and valid/invalid separation? -Atelaes λάλει ἐμοί 10:45, 16 January 2009 (UTC)
Sorting the table doesn't work? I can go do some code munging. Robert Ullmann 10:50, 16 January 2009 (UTC)
Sorting does work. To be honest, it's largely psychological. With a list that long, there's a need to be able to cross things off and see the list shrink. Also, with the separation, the list is much smaller and easier to keep track of stuff (especially when it's practical to remove sorted stuff). Also, there's been quite a bit of work done since the current version. The update should take off quite a bit. Finally, perhaps I'm pushing it, but would it be at all possible to add a blank column in the table, just so I could add notes while I work, for things which aren't immediately sortable? I really do appreciate your munging. -Atelaes λάλει ἐμοί 11:12, 16 January 2009 (UTC)
User:Robert Ullmann/Trans languages/uncoded ... will get a newer dump presently; that is 6 January. Robert Ullmann 12:45, 16 January 2009 (UTC)
Excellent. Thanks. -Atelaes λάλει ἐμοί 17:55, 16 January 2009 (UTC)

I didn't do this with AF before; it is a semantic change, not what it usually does. But I do have code, to add to the table and run. Question on "Swiss German": gsw is "Template:gsw"? Robert Ullmann 23:28, 18 January 2009 (UTC)

Hmmm.....I think that should be changed.....maybe. SIL lists Swiss German, Alsatian, and Alemannic as possible names for this code, yet Ethnologue lists it as Swiss German, with Alemannic as an alt name. However, it places it in the Alemannic family. I wonder if this is like Frisian, where most uses of Frisian mean West Frisian, because it is more common or something, even though there are other Frisians. Perhaps we should just leave this one for now until we figure out exactly what to do with it. -Atelaes λάλει ἐμοί 02:55, 19 January 2009 (UTC)
In any case, Alsatian should not be confused with Swiss German, even if they share the same code: Alsatian is spoken in France, and Swiss German in Switzerland. Do you know that Microsoft has developed a version of Windows in Alsatian? (first regional language to be chosen, as an experiment) Lmaltier 15:19, 19 January 2009 (UTC)
Found 294 on test scan on 17 January dump; I'll look at it a bit tomorrow and run it; looks fine. EP may have done a few of those already. Robert Ullmann 00:08, 19 January 2009 (UTC)

Yes, I saw the discussion. I'm only hitting the Romance languages, and most that I'm doing have only one or two erroneous entries (other than Friulian and Galician). I'll send you a list of corrections for the ones that are more common than just one or two. There are also a couple of languages in there that aren't languages, or which do not meet CFI (like Romanica, which is an offshoot of Interlingua). BTW, if AF doesn't already correct "Descendnats" to "Descendants", it should. I've found that it's a frequent typo on my part. --EncycloPetey 00:03, 19 January 2009 (UTC)

Yes, will fix "Descendnats" ... ;-) Good night Robert Ullmann

{{he-present of}}

Hi Robert,

I've just written {{he-present of}}; could you take a look, and let me know if you think this is a good approach? I'm especially interested in your opinions about:

  • whether new form-of templates should support pre-linkified parameters, now that we have a different kludge for page-counting;
  • whether this is a good way to handle capitalized vs. uncapitalized text;
  • and whether this is a good way to handle final dots;

but if you have any other thoughts, I'd love to hear them as well.

Thanks in advance!

RuakhTALK 20:10, 6 January 2009 (UTC)

Hi Robert,
Sorry to bug you again, but is it all right if I go ahead and start using this template widely, and creating and using templates for other Hebrew tenses and such?
Thanks again,
RuakhTALK 17:48, 17 January 2009 (UTC)

Hi, sorry for delay; various things ... including that I find I have more time on "normal" working days; on the weekends I like to watch the matches, and I can't timeshare! (Man U, bit of Chelsea, Hull and Arsenal, AC Milan, and it is 1AM! ;-)

If the template can conditionally wikilink, that is good; if it is a problem, don't.

My thought on the cap= parameter was that it should be cap=uc (default) or cap=lc, and code uses (uc, lc)first: from that to format the text.

"." is not a good parameter name (cute, legal, but ...); better is dot=

But: all of this is because of the idiocy of trying to force the inflection into a definition line, when it in no way defines the term, making it hard to add the actual definition desired, which is what should be there. You might look at User:Robert Ullmann/Form of, re making entries that are useful, instead of meaningless junk. (the "xxx form of yyy" crap mostly originated with Wonderfool, who wanted to add French entries but was too lazy to create proper entries with definitions and examples and translations) From your examples, you'd like to have actual definitions, so put the inflection where it belongs: on the inflection line. Cheers, best regards, (we won ;-) Robert Ullmann 00:19, 18 January 2009 (UTC)

Thanks! :-D
I actually agree with WF on the issue of form-ofs — I think it's better to write "Feminine singular present participle and present tense of הלך (halákh)" than to have four sense lines, one for "(She/it) (is) walking/walks, going/goes, especially on foot", one for each other sense (some of which will need to be modified a bit, because not all the theoretical possibilities actually make sense), and even then it will still miss a lot of important information about what "feminine singular present participle and present tense" means in Hebrew (such as its use in forming the conditional mood, the fact that a subject is required, the fact that it's idiomatic to use it with the verb for "was" but not with the verb for "will be", … and the fact that the "it" in question has to be feminine).
What I would like to do is create an appendix for each Hebrew verb form — there are a ton, maybe several hundred when you count all permutations, but it's still a very finite set — and have the template link to that.
The reason this template supports the alternatives is that msh210 disagrees with me, and I haven't yet gotten my mind-control gun to work over the Internet. (And even once I do, he is so not my first target. There's an Internet full of people who are wrong, why waste jiggawatts on details like this?)
Now, when people start doing this for English words — "defining" goes and went and so on — maybe I'll change my mind, but right now I'm very skeptical.
RuakhTALK 01:01, 18 January 2009 (UTC)
"(some of which will need to be modified a bit, because not all the theoretical possibilities actually make sense)" a very good point, and one that re-inforces having the separate sense lines for each form; not all will make sense, and there are lots of variations. goes and went desperately need to be expanded for each sense that, um, makes sense, with examples. To me, tables in appendices are utterly useless; the number of well-educated English users that can track a tense in and out of tables solely re English is extremely small; doing it with a language you are partly familiar with (which is the point?) is hopeless. I'm looking up a word: I want to be told what it means, not be directed through "lemmas" and tables in which I am utterly guaranteed to get lost. For a non-native speaker of English, as well as 98% of native speakers, (goes:) "Third-person singular simple present indicative form of go." means absolutely utterly nothing at all. Have a good night. Robert Ullmann 01:11, 18 January 2009 (UTC)

User:Tamiko

I don't know whether you have noticed yet, but someone called Tamiko has left a message on the Feedback page - trying to get in touch with you. Dbfirs 20:22, 6 January 2009 (UTC)

etyl and macrolangs

So, I know we've talked about this sporadically, but this has me thinking we should do it now, rather than later. So, since we don't want to confuse bots into thinking that macros are kosher L2's, nor do we want {{infl}}, {{term}}, {{etc}} using them, I figure we go the route you proposed some time ago and do {{etyl:dra}}. I suppose the coding aspect would simply add something like {{#ifexist:{{etyl:{{{1}}}}}|[[w:{{etyl:{{{1}}}}} languages|{{etyl:{{{1}}}}}]]{{#ifeq:{{NAMESPACE}}||{{#switch: {{{2|}}}|en|eng|=[[Category:{{etyl:{{{1}}}}} derivations]]|mul|-=|[[Category:{{{2}}}:{{etyl:{{{1}}}}} derivations]]}}}}| to the beginning (with the balancing brackets at the end of course). Of course this would need to follow a BP discussion about whether we want macro-language categorization at all. Your thoughts? -Atelaes λάλει ἐμοί 08:36, 8 January 2009 (UTC)

note to self

Remember that tear gas is Not Fun. (picture taken across the street from where I live; I was ~50 meters to the left and behind this photographer.)

For others with questions supra and elsewhere; I will get to them (;-) Robert Ullmann 18:07, 9 January 2009 (UTC)

LOL

Say what‽XD (u):Raifʻhār (t):Doremítzwr﴿ 22:06, 9 January 2009 (UTC)

You should hear the one medical students use to learn the spinal nerves (sorry, I'm not going to repeat it). I also half-remember a rude one involving camels (that I wouldn't repeat publicly if I did) which is used to learn the geological time scale. --EncycloPetey 22:13, 9 January 2009 (UTC)
Oh, Oh, Oh To Touch And Feel ... ? (;-) Um, yes. "Camels ordinarily sit down carefully. Perhaps their joints creak. Possibly early oiling might prevent premature hardening." Presumably you have a dirtier version? That one took a bit of googling; probably I'm just finding bowdlerized versions. Robert Ullmann 00:18, 10 January 2009 (UTC)
I couldn’t guess the reference. Much more amusing, them.  (u):Raifʻhār (t):Doremítzwr﴿ 22:33, 9 January 2009 (UTC)

Hardware scrounging

Hiya!

I've cannibalized another box and have some Ram to add (also have two hard drives, so may be able to replace the bad one if I can figure out which it is) for Wiktionarydev. But I notice you have xmlproc.py running, which I assume is the xml updater? Not sure how that should be shut down or restarted, so I'm waiting on a response before proceeding. Also, I don't know if anyone is maintaining the server OS? I have not been. - Amgine/talk 19:13, 12 January 2009 (UTC)

Hi, sorry it took me a little while to answer. xmlproc is just something that is set up to either be a cron event, or sleep in the background and schedule itself; it can be killed off. Do what you will. I haven't been doing anything to maintain/update the OS or other things. Can you send me your email address (using link on the left) I don't have it ... Robert Ullmann 22:28, 12 January 2009 (UTC)
Kk, the server went up and down a few times yesterday, and the extra ram turned out to be the culprit. So it's back up and stable atm. I don't know if xmlproc was set up as cron. Did you document it anywhere, so I can do that (or bug someone who speaks python?) - Amgine/talk 14:34, 14 January 2009 (UTC)

lang/Hans

In reply to: User_talk:Nbarth#Template:lang

Hi Robert,

Thanks for the clarification, and sorry for the mess; I managed to get rather confused about how language and script templates work.

The issue is with the HTML lang tag.

From what I can see, and as I think you say, {{Hans}} is broken; it should set the language as zh-Hans, not Hans; compare with {{zh-forms}}, which uses zh-Hans and works (for me).

My issue is that I use Gothic Japanese fonts by default, and rather calligraphic font for Chinese; thus correctly marked Chinese shows up as calligraphic, but incorrectly marked Chinese shows up as Gothic, with calligraphic characters where there isn’t a Japanese character.

…so current cmn-templates ({{cmn-idiom}}, {{cmn-hanzi}}, etc.) don’t work for me – they don’t tag the language, so I get Japanese fonts for Chinese words; this is what I was trying to fix.

OTOH, {{Han char}} and {{zh-forms}} do work: I get Chinese fonts.

From what I can read at:

…HTML language tags must start with the language code; thus lang="Hans" is invalid, but lang="zh-Hans" is valid.

So:

  • The current Hans/Hani/Hant templates should be fixed to use the valid lang="zh-Hans" etc., instead of the invalid bare script tag lang="Hans", right?
  • Also, from what I can tell {{term}} (etc.) should be fixed to wrap the term in a
<span lang="xx-yyyy">…</span>
tag if la=xx (and optionally sc=yyyy) is present, right?

—Nils von Barth (nbarth) (talk) 01:00, 15 January 2009 (UTC)

(Ain’t Han unification great?)
—Nils von Barth (nbarth) (talk) 01:01, 15 January 2009 (UTC)
(short answer, net is broken badly here, and is 4AM, so I can just sleep anyway); yes, so fix {{Hans}}, not introduce {lang}. We do want to sort it correctly (:-) Good night, morning, whatever! Robert Ullmann 01:10, 15 January 2009 (UTC)
Han unification is pretty cool, I was involved with it around the edges. The work done at UT was a tour de force. I fixed the Hant/Hans/Hani to generate valid tags. What {term} should do is pass the language to the script template, as each script will have rules. Robert Ullmann 13:46, 15 January 2009 (UTC)
{Hans}, {Hant}, {Hani} (and thus cmn-etc.) all look great, thanks!
(…and yeah, bitching aside, Han unification is very sensible – thank you for your (clearly massive) contributions (^.–).
—Nils von Barth (nbarth) (talk) 14:14, 15 January 2009 (UTC)
Re: {term} – good point that individual script templates will do what they will.
However, if no script template is given, but a language is, could we wrap the entry in a
<span lang="xx">...</span>
at least?
My proximate interest is in Japanese terms: it’s a real drag to write (rather, paste) |lang=ja|sc=Jpan into every {term}, rather than |lang=ja: if it’s a Japanese word, it should be marked as such.
That is,
  • rather than defaulting to Latn script for terms (current behavior),
  • it should default to the proper language.
So in pseudo-code (with spaces for legibility):
 {{#if:{{{sc|}}} <!-- check if Script or Language are defined
 -->| {{{{{sc}}}| lang={{{lang|}}} | (content) }}  <!-- if Script defined, use that
 -->| {{#if: {{{lang|}}} <!--
  -->| <span lang="{{{lang|}}}"> (content) </span>  <!-- else if Language is defined, use that
  -->| {{Latn| (content) }} <!-- else default to Latn
  --> }} <!-- end if lang
 --> }} <!-- end if sc
--> (etc.)
Does this seem reasonable?
—Nils von Barth (nbarth) (talk) 14:14, 15 January 2009 (UTC)
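The decision order in the pseudo-code above (script first, then language, then Latn as the fallback) can also be written out as a small Python sketch. It only models the generated HTML; the function name `wrap_term` and the use of a `class` attribute for the script are assumptions for illustration, not what {term} or the script templates actually emit.

```python
from html import escape

def wrap_term(content, lang=None, sc=None):
    """Wrap a term in a <span>, choosing attributes by script, then language, then Latn."""
    text = escape(content)
    if sc:
        # Script given: a valid lang tag must start with the language code (zh-Hans, not Hans),
        # so the lang attribute is only emitted when the language is known too.
        lang_attr = f' lang="{escape(lang)}-{escape(sc)}"' if lang else ""
        return f'<span class="{escape(sc)}"{lang_attr}>{text}</span>'
    if lang:
        # No script, but a language: at least mark the language.
        return f'<span lang="{escape(lang)}">{text}</span>'
    # Neither given: fall back to Latin script.
    return f'<span class="Latn">{text}</span>'

print(wrap_term("日本語", lang="ja"))
print(wrap_term("汉字", lang="zh", sc="Hans"))
print(wrap_term("word"))
```

With this ordering, |lang=ja alone is enough to get the text tagged as Japanese.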
{term} tries to do a very great deal, doesn't it? It wants:
  • the language name, for a section link
  • the language code (for this, presumably, anything else?)
  • the script code
Some conversion of language to script (or something to something) in some cases would seem reasonable; the question is how to do that (the lang2sc method not being workable) I'd rather not add more HTML into {term} (and it needs serious fixing anyway, I'd rather not go near it ;-) The use of the "mention-Latn" class inside the Latn template to customize {term} is a crock, it means Latn isn't useable elsewhere. and so on
Your code is reasonable, but "they" want to use Latn for (say) lang=fr; so that isn't very good either (note the current code defaults the script to Latn). (you would want to use xml:lang="..." as well)
Some kind of magic do-the-right-thing script template. Maybe with a magic name like {Zyyy} (;-) ... don't know; it would get called almost everywhere. (and thus can't use #ifexist to use helper templates, and ... never mind)
will noodle it more Robert Ullmann 14:49, 15 January 2009 (UTC)
Oh, now that you mention it, I suppose playing with {term} perhaps isn’t the most carefree task…
Re: language to script – shouldn’t that be unnecessary?
That is, most languages imply a default script, and if one wishes to style text in a given language, there are good CSS ways of doing it, so one doesn’t need to use templates for all foreign language text.
Re: magic do-the-right-thing script: if you use template {{Xyzy}}, it always Does The Right Thing, even without a language tag.
—Nils von Barth (nbarth) (talk) 02:04, 17 January 2009 (UTC)
Like this: {{Xyzy}} ... Robert Ullmann 13:37, 17 January 2009 (UTC)

fuzzy matching routine in AF bot

Hi Robert,

You are probably aware that I'm trying to create a parser for the Wiktionary projects. I'm concentrating on translations lines for the moment. If you'd like to see what I already accomplished, the code is in svn on the devtionary server. I'm quite proud of what it already can do. There is a unit test in translations/testtranslationlinesParser.py, if you'd like to have a look at it. I have been looking at your code for AF and took some ideas from it that I expanded upon. I hope that was OK to do. When I started writing this code 3 years ago, I simply created lookup dictionaries with all the possible misspellings of language names in it. Of course, back then I was not working with language names for several 1000 languages in several 100. So I didn't foresee that this wouldn't exactly scale all that well. I noticed you solved this problem with your fuzzy lookup routine. I don't think this is something I can easily take the general idea of and then expand upon (or reimplement without ending up with practically the same code), so I wanted to ask you for permission to use it. The code I'm writing together with Conrad is going to be available under the GPL though, so I'll understand if you'd mind. The general idea behind the whole thing is to make the data that is now encapsulated/locked up in many formats in the Wiktionaries, available under a unified format. I think that is what a free information project should be all about. So the end goal is an API to the Wiktionary information.

Kind regards, --Polyglot 13:43, 18 January 2009 (UTC)

The code as published here is GFDL. Since that is odd for code, I'll explain: it is like the examples in a GFDL textbook, which you are free to copy and use as you will. Is very close to public domain in that sense; if you used it in entirely proprietary software it would be okay, although not really ethical.
However, that aside: please do! Take whatever is useful. I'd appreciate a bit of credit if convenient.
The fuzzy routine isn't used by AF for language names; they are too close together, with too many not necessarily known, for it to try to "correct" languages. (there is a note about this somewhere on AF talk). Also with fuzzy: if the strings are longer, and have multiple points of similarity, it will expend a lot of time trying to exhaustively find the best, and it is not the optimum algorithm. (I easily contrived two 80 character strings that would take at least days of compute to "match" ;-) The minimum match length parameter is important. Robert Ullmann 14:49, 19 January 2009 (UTC)
Hi Robert,
I am really glad with your answer concerning the licensing. I have to admit I didn't know what to think about the choice for GFDL. I'll make sure to mention where the code came from.
Now I'm trying to integrate the routine in the code I have, so I was browsing the AF source code to see how it's used. I noticed the following lines:

 if fh and level != 3 and fuzzy(header.lower(), 'pronunciation', 11) >= 11 and len(header) < 15:
 if fuzzy(header.lower(), 'etymology', 7) >= 7 and len(header) < 20:

And I started thinking: Why do the expensive fuzzy() first and then AND it with the len of header, which is a constant. So it's probably more efficient like this, since conditions that are strung together with and are not computed anymore as soon as one of them is False:

 if fh and level != 3 and len(header) < 15 and fuzzy(header.lower(), 'pronunciation', 11) >= 11:
 if len(header) < 20 and fuzzy(header.lower(), 'etymology', 7) >= 7:
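The effect of the reordering can be checked with a small sketch. Note the assumptions: `slow_match_score` is a hypothetical stand-in built on `difflib.SequenceMatcher`, not AF's fuzzy() routine, and the `CALLS` counter exists only to show when the expensive comparison actually runs.

```python
from difflib import SequenceMatcher

CALLS = {"slow": 0}  # how many times the expensive comparison has run

def slow_match_score(a, b):
    """Stand-in for an expensive comparison: total length of the matching blocks."""
    CALLS["slow"] += 1
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    return sum(block.size for block in blocks)

def is_pronunciation_header(header):
    # The cheap length test comes first; `and` stops as soon as one operand is
    # false, so overly long headers never reach the expensive comparison.
    return len(header) < 15 and slow_match_score(header.lower(), "pronunciation") >= 11

print(is_pronunciation_header("This header is far too long to match"), CALLS["slow"])
print(is_pronunciation_header("Pronounciation"), CALLS["slow"])
```

The first call returns without ever invoking the stand-in; only the second one pays for the comparison.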

I want to use it on language names, but only after a dictionary lookup with the normalised/lowercased value on both language name and script name (Roman, Cyrillic, Kurmanci, etc) failed on both the actual names, variations and synonyms of them. I'll do some tests with it to see whether it is suitable for the purpose I have in mind for it.
What I also wanted to mention is that Conrad Irwin wrote some utilities. iso639.py, iso15925.py and maybe most importantly unicodescript a utility that returns the script name of a unicode code point. Maybe they can be useful in some of your projects. The other two modules are on the devtionary.info server. Polyglot 20:52, 30 January 2009 (UTC)

A little help please

I can't really understand programming language so could you look at {{pl-decl-noun}} and subsequently słownik and tell me what I've done wrong (in layman's terms). 50 Xylophone Players talk 18:28, 18 January 2009 (UTC)

I don't understand what is going on; newlines are supposed to disappear (along with linear-WS) around parameters. Does this not work if the parameter is directly in brackets? But I've never noted that before. Robert Ullmann 10:33, 19 January 2009 (UTC)
They (newlines) are only stripped around named parameters; it is doing what it is "supposed" to do. Other templates have worked around this in various ways, mostly I think because people have tried things until they worked. I'm going to revert pl-decl-noun for the moment (so it isn't broken), and then we can fix it presently. Robert Ullmann 10:53, 19 January 2009 (UTC)

I have made some progress with {{pl-decl-noun-ia}} but I'm puzzled as to why it isn't making wikilinks at kategoria#Polish... 50 Xylophone Players talk 17:06, 21 January 2009 (UTC)

The redlinks are hidden by code that should not be active by default (it is in WT:PREFS). Something I've been meaning to fix. Take class="inflection-table" out for now, and you'll get the correct behaviour. Robert Ullmann 18:08, 21 January 2009 (UTC)

Bad language names and their corrected counterparts

Here is what I noticed for Romance languages from the current Translations section listings:

Bolognese → Emiliano-Romagnolo
Català → Catalan
Emilian-Romagnol → Emiliano-Romagnolo
Español → Spanish
Espéranto → Esperanto
France → French
Friulan → Friulian
Galego → Galician
Galiacian → Galician
Gallegan → Galician
Italiano → Italian
Italien → Italian
Rumantsch → Romansch
Spansish → Spanish
Spanush → Spanish
Valencian → Catalan
Waloon → Walloon

Additionally:

  • Auregnais - an extinct language with no ISO code
  • Bresciano - refers to the region of Lombardy and probably means Lombard; the Italian WP has an article on a Lombard dialect by this name w:it:Dialetto bresciano
  • Cantabrian - no ISO code; linguistic status is uncertain and not tied to just one language, so we can't subsume it into an existing language name
  • Lombardo - cannot be corrected automatically, ambiguous; could mean modern Lombard or extinct Lombardic
  • Provençal - cannot be corrected automatically, ambiguous; could mean Franco-Provençal or Occitan
  • Raeto-Romance - probably means Romansch, but Rhaeto-Romance is a language group
  • Romanica - constructed language; offshoot of Interlingua; does not meet CFI

There are also various varieties of Portuguese and Sardinian listed with parenthetical qualifiers, but I am not sure how we have decided to handle the labels for those varieties, and so haven't touched those entries. --EncycloPetey 00:33, 19 January 2009 (UTC)

Are we sure we want to count Valencian as Catalan and Provençal as Occitan? I think we probably do (I'm somewhat of a group-together-ist, except maybe when we risk trampling a minority language by lumping it together with a big fish), but if it hasn't been discussed previously (has it?), I think it might be worth discussing before we make a change that's hard to undo. As with many languages/dialects, there are arguments both for treating them separately, and for treating them together. At the very least, if we want to treat them together, I think we should document that somewhere public. (And in the case of Valencian, we should prepare for flames. I don't think there's any sort of "we're not Occitan!" movement in Provence, but there's definitely a "we're not Catalan!" movement in Valencia.) —RuakhTALK 01:48, 19 January 2009 (UTC)
The Wikipedia article on Valencian notes the debate, but comes down firmly in favor of considering it to be the same language as Catalan, just under a different name. It cites some good papers to back this view. Further, Valencian does not have a separate ISO code. In ISO 639-3, Catalan and Valencian have the same code. We can always fall back on either item as justification if we have to. In English, Provençal is at times a synonym of Occitan, and at times a regional "dialect", but the English and Occitan Wikipedias both treat it as part of Occitan. Again, there is no separate ISO code, so we need not worry about treating it as the same language. --EncycloPetey 05:26, 19 January 2009 (UTC)

I'm not going anywhere near any of the questionable ones with automation (:-). I've added the rest of these to most of Atelaes' from above and it is running; fixing language name, section refs, and ttbc. (the xs= parameter in {t} belongs to Tbot) Robert Ullmann 14:55, 19 January 2009 (UTC)
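To make the split between automatic and manual fixes concrete, here is a minimal sketch of how such a correction table might be applied to translation lines. It is not the bot's actual code: the function name, the regular expression, and the choice to hold back Valencian along with the ambiguous names are all assumptions made for illustration.

```python
import re

# Unambiguous corrections, taken from the list in this thread.
CORRECTIONS = {
    "Bolognese": "Emiliano-Romagnolo",
    "Català": "Catalan",
    "Emilian-Romagnol": "Emiliano-Romagnolo",
    "Español": "Spanish",
    "Espéranto": "Esperanto",
    "France": "French",
    "Friulan": "Friulian",
    "Galego": "Galician",
    "Galiacian": "Galician",
    "Gallegan": "Galician",
    "Italiano": "Italian",
    "Italien": "Italian",
    "Rumantsch": "Romansch",
    "Spansish": "Spanish",
    "Spanush": "Spanish",
    "Waloon": "Walloon",
}

# Names that are ambiguous, disputed, or uncoded: flag them for a human.
# (Putting Valencian here is an assumption based on the discussion above.)
NEEDS_REVIEW = {
    "Valencian", "Provençal", "Lombardo", "Raeto-Romance",
    "Bresciano", "Cantabrian", "Auregnais", "Romanica",
}

# Hypothetical pattern for a translation line such as "* Español: [[agua]]".
LINE_RE = re.compile(r"^(\*+:?\s*)([^:]+?)(\s*:.*)$")


def fix_translation_line(line):
    """Return (line, status) where status is 'fixed', 'review' or 'ok'."""
    m = LINE_RE.match(line)
    if not m:
        return line, "ok"
    prefix, name, rest = m.groups()
    if name in NEEDS_REVIEW:
        return line, "review"          # never rewritten automatically
    if name in CORRECTIONS:
        return prefix + CORRECTIONS[name] + rest, "fixed"
    return line, "ok"
```

Keeping the review set separate from the correction table means that a questionable name can be moved from one to the other once a decision is documented, without touching the rewriting logic.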

Languages with apostrophes[edit]

I think your bot's not reading these correctly. It keeps tagging Are'are' from water as Are'are and incorrect. There are a few of these. I can, of course, simply remove them by hand every time the list is regenerated, as there aren't too many of them, but I figured you'd want to know. -Atelaes λάλει ἐμοί 18:48, 21 January 2009 (UTC)

A (not so little) favour[edit]

I know you're busy at the moment but when you get the time do you think you could write a bot for Hungarian noun forms (including possessives) for me? It will likely be a bit more complex than a bot for another language would be (see fa for an example and note the structure at Category:Hungarian noun forms). Oh yeah, btw how do you "get" pywikipediabot? I've found the site where the "libraries" are but what do I do now? 50 Xylophone Players talk 00:44, 23 January 2009 (UTC)

Umm, is something wrong? Were you not notified of my message? 50 Xylophone Players talk 13:40, 25 January 2009 (UTC)

Eh? Like I don't have eighteen hundred things to do? (;-) I'll get there (here) ... Robert Ullmann 13:52, 25 January 2009 (UTC)
I should have asked you this a few days ago but...what do you mean by "there (here)"? 50 Xylophone Players talk 22:33, 30 January 2009 (UTC)

Source Link[edit]

Hi there, sorry to make a mess. Will be more careful. please post the link to the TBot source code so that I can take a look at it. thanks, mike --Mdupont 07:08, 24 January 2009 (UTC)

"TEST" a good idea? -- Yes![edit]

Very much so, and very useful note from you. It's irritating we need to do it, but it's brilliant being able to search Wikt direct from the browser -- I sometimes used to have separate windows open for searches on Google, Wikipedia, Wikt & Multimap, which was rather silly. Now I only need one window. (I've got as far as downloading Firefox, but haven't yet taken the plunge to ditch IE and actually use it.) --Enginear 03:22, 26 January 2009 (UTC)

Thanks! I set it up in response to a complaint we got, and IIRC, attempts to create the page. Robert Ullmann 15:19, 26 January 2009 (UTC)

update missing forms?[edit]

Would it be easy to update User:Robert Ullmann/Missing forms/English? RJFJR 14:35, 26 January 2009 (UTC)

Running. Robert Ullmann 15:09, 26 January 2009 (UTC)

Interwicket[edit]

Hi! I noticed your bot is adding interwiki immediately after page creation on both pl.wikt and en.wikt. Thanks for it. I have a question however, does it also remove interwiki on en.wikt once page is deleted on pl.wikt? Regards, ABX 07:07, 27 January 2009 (UTC)

It will eventually, when it does its complete sweep of the combined wikt index. But it is not done in "real time" as the additions are. Something I might look at; this part of the process is very new (written over the weekend in-between matches ;-); until now it had just added the link to the en.wikt without the symmetrical link. I've been careful to check the pl.wikt edits as you have a few unique rules (add to top, single line, and no links to "ru" when the entry there is a blank template; that last I don't need to worry about yet ;-). Robert Ullmann 07:22, 27 January 2009 (UTC)
I asked because I wonder about vandalism, which could be deleted but stay as a broken interwiki on en.wikt (and the same problem in the opposite direction). So I wonder if it would be reasonable to add a delay so admins have time to delete a page created as vandalism without thinking about the interwiki added by your bot seconds after creation. ABX 08:45, 27 January 2009 (UTC)
I'm thinking about this; it isn't looking at the deletion logs at present (which it might do). An issue with a delay is that it isn't clear at all how much is good; on some wikts (at some times of day :-) things will go away quickly, in other cases they sit around for many days. Do note that this only happens if the new bad entry is a "real word" and thus exists in the en.wikt; most vandalism (in my experience) is entirely bad titles. The problem does not occur in the opposite direction: the bot has access to the patrol flags here (using my admin account), and only picks up new entries that have been patrolled. Some delay would also be a good idea; not very hard to do of course. Robert Ullmann 09:32, 27 January 2009 (UTC)
Added a delay; reading the log entries for deletions along with new entries isn't too hard, but then I need the code to understand removing the link as well; not done yet. Thanks, Robert Ullmann 10:06, 27 January 2009 (UTC)
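As a rough illustration of the delay idea, the sketch below holds back newly created pages until they reach a minimum age and skips any title already seen in a deletion log. The one-hour threshold, the function name, and the input shapes are assumptions; the thread does not say what delay was actually chosen or how the bot stores this data.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; the discussion notes there is no obviously "right" value.
MIN_AGE = timedelta(hours=1)


def ready_to_link(new_pages, deleted_titles, now):
    """Return titles that are old enough and have not been deleted.

    new_pages: iterable of (title, creation_time) pairs
    deleted_titles: set of titles seen in the deletion log
    now: current time (passed in so the function is easy to test)
    """
    ready = []
    for title, created in new_pages:
        if title in deleted_titles:
            continue                    # deleted in the meantime: no link
        if now - created >= MIN_AGE:
            ready.append(title)         # survived the waiting period
    return ready


if __name__ == "__main__":
    now = datetime(2009, 1, 27, 10, 0, tzinfo=timezone.utc)
    pages = [
        ("water", now - timedelta(hours=2)),
        ("asdf", now - timedelta(hours=2)),
        ("fresh", now - timedelta(minutes=5)),
    ]
    print(ready_to_link(pages, {"asdf"}, now))
```

Pages that are too new are simply left for a later pass, so a longer delay costs nothing except latency on the link.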

lt.wiktionary[edit]

Hi, Bot status is granted in Lithuanian wiktionary. --Vpovilaitis 15:29, 27 January 2009 (UTC)

Ah, thank you very much; I had noticed quite a few being skipped in test mode. Robert Ullmann 15:36, 27 January 2009 (UTC)

your bot (li.wikt)[edit]

Your bot doesn't have a bot flag at li.wiktionary.org. Will it get a global bot flag? --Ooswesthoesbes 12:24, 30 January 2009 (UTC)

I'm just looking at that. It qualifies, and that is the objective of the global bot flags; just need to see what the requirements are for sufficient testing. It has 500K edits on the en.wikt, but editing on the others is new. Robert Ullmann 12:48, 30 January 2009 (UTC)
Alright :) --Ooswesthoesbes 08:48, 31 January 2009 (UTC)
Okay, but on checking, I find that li.wikt explicitly does not permit global bots? Robert Ullmann 09:35, 31 January 2009 (UTC)

Japanese stroke order?[edit]

Hi Robert,

I’ve a suggestion/request for {{stroke order}} with Japanese stroke orders:

could we change the heading:

'''Stroke order'''

to read instead:

'''{{#ifeq:{{{type|}}}|jbw|Japanese stroke order|Stroke order}}'''

?

(This way Japanese stroke orders will have Japanese stroke order as the heading.)

My proximate interest is – this has different Chinese & Japanese stroke orders, but the box doesn’t distinguish them.

I’d like to front both of these in the “Translingual” section (so people don’t just see the one of them & be misled), but currently they’re both displayed in identical boxes, which is confusing.

Does my analysis & solution sound compelling?

Thanks!

—Nils von Barth (nbarth) (talk) 17:38, 30 January 2009 (UTC)

Sounds fine to me, I am very busy right now but I've changed the protection level; go ahead ;-) Robert Ullmann 17:58, 30 January 2009 (UTC)
Thanks, done!
(Check out .)
—Nils von Barth (nbarth) (talk) 02:00, 31 January 2009 (UTC)
I’ve some other changes that I’m interested in making to this – I’ve recently learned (and documented, at Wiktionary:About_Chinese_characters#.28Stroke_order.29) that stroke orders are a bit messier than I’d thought.
Esp., I’d like to make it easy to include multiple stroke orders in one box (Chinese + Japanese, or Traditional + Simplified), rather than the hacks of manual layout or two boxes that we currently have.
I’ll write up test cases and run it by you before making any changes – this is just a heads-up.
—Nils von Barth (nbarth) (talk) 02:00, 31 January 2009 (UTC)

Template:cmn-noun[edit]

Hi Robert. I have found a little error in a template that you created and which requires administrator permission to edit. On the project page about Chinese [2] it says zh = Chinese (Standard Mandarin) in romanized Pinyin, but Category:zh:Nouns does not contain any pinyin entries. This is because the template cmn-noun has a switch for s, t, ts and p, but the p branch is missing a zh/zh-cn/zh-tw category. It should have used the category Category:zh:Nouns. I think you can fix it if you find the place in the template where it says:

  }}<!-- end switch

and put this in front of it:

  [[Category:zh:Nouns|{{{pint}}}]]

Kinamand 15:45, 31 January 2009 (UTC)

Tracking-list code[edit]

Hi, just wanted to mention that I've dropped off the current version of my code at User:Visviva/Tracking/Code. It's still pretty sloppy, I'm afraid, but please feel free to pick through it if you think there are any bits that might be salvageable. And please let me know if you should happen to notice that I'm doing anything too horribly wrong. :-) Thanks for all your help! -- Visviva 15:38, 2 February 2009 (UTC)

I like the bit about "remove obfuscated email". It always amuses me when someone writes their address as "foo at bar dot com", as if the spammers can't read that just as easily ... ;-) I'll look again when I have more time. Robert Ullmann 15:51, 2 February 2009 (UTC)

{{etyl}} update[edit]

Re: WT:BP#ISO 639-5, I made a pass at an {{etyl}} update at User:Bequw/template3. Do you think that will work? I was also wondering about the wording. When I created {{fam:nic}}, I left off "... languages" from the text, that way we can get a more preferable category name ({{etyl}} adds it back for the wiki link). But not all the texts for the codes technically end in "languages". Specifically, we have

  • cpe - Creoles and pidgins, English‑based
  • cpf - Creoles and pidgins, French‑based
  • cpp - Creoles and pidgins, Portuguese-based
  • crp - Creoles and pidgins
  • euq - Basque (family)
  • hyx - Armenian (family)
  • jpx - Japanese (family)
  • ngf - Trans New Guinea
  • qwe - Quechuan (family)
  • zhx - Chinese (family)

Do you think we have to special case these? Or can we just reword them so they end in "languages" (since we also don't like commas or parentheses)? Also, do you think we should code any of the tree hierarchy into the templates? We could add a getparent= param. I bet you got some good ideas on these. Thanks. --Bequw¢τ 07:46, 5 February 2009 (UTC)

You've confused yourself. {etyl} only adds " language" to the wikipedia link. The category just gets " derivations" added to it. no, you probably have it right ;-) No I don't think you should try to get any of the tree in there anywhere.
Just use a name for the thing, add the redirect to the wikipedia (remember they like redirects from all sorts of forms), don't try to stick "(family)" in there. I don't think the "cp" codes are very useful by themselves, why would you want to say that something is derived from "Creoles and pidgins, French‑based"? You'd want to use them as prefixes to define non-conflicting wikt codes for specific ones.
something other than "fam:" as a prefix? "fam" is itself an ISO code for "Fam" (;-). It is confusing, and if it was ever added to the interwiki list, it would cause nasty little problems. I was suggesting "etyl:" (not to say it couldn't be used by others). And the codes we want are not all families by any means. (we have at least one Klingon derivation ;-).
sorry this took a while, been very busy. Robert Ullmann 15:44, 5 February 2009 (UTC)
Thanks. --Bequw¢τ 05:29, 8 February 2009 (UTC)

userpage[edit]

Hello Robert, I have protected temporarily the userpage of is:Notandi:Interwicket because it seems to replace the template I added (which all local bots should have, because it is in Icelandic and has the contact info), [3], please let me know when You programmed the bot to not replace it anymore to unprotect Your page again, thanks and best regards, --birdy (:> )=| 14:18, 5 February 2009 (UTC)

I've fixed it to note templates at the top of the page and leave them there. Robert Ullmann 15:27, 5 February 2009 (UTC)
Interwicket: flagged on it.wikt--Wim b- 13:45, 6 February 2009 (UTC)
Thank you. Robert Ullmann 14:07, 6 February 2009 (UTC)

Interwicket on pl-wikt[edit]

Hi! The vote is finished, no technical questions were asked, your bot pl:User:Interwicket has been granted a nice bot flag :) and now can operate in full speed mode. Once again: thanks for your work! Youandme 17:19, 7 February 2009 (UTC)

Some language code updates[edit]

In January, SIL implemented some ISO 639-3 code changes. Two that have been retired are very close to other useful templates. Would we want to provide redirects to the useful ones?

  • {{acc}} (Merge with [acr]) can be used for {{accusative}}
  • {{sic}} (Split into Keak [keh] and Sos Kundi [sdk]) can be used for {{SIC}}

Also there were two name changes that I wasn't sure about. Do we want to do anything with these

  • swb updated its name from "Comorian" → "Maore Comorian" (there are other "... Comorian" languages).
  • hif "Fijian Hindi" → "Fiji Hindi" (that's what Wikipedia uses). I know this was kicked around earlier, especially with Girmitya who apparently wanted this change.

Cheers --Bequw¢τ 02:24, 9 February 2009 (UTC)

IMHO we're better off discouraging things like {acc} for "accusative". {sic} might be useful (and I can make AF convert it to {SIC} if we like)
swb I don't have much opinion on, but the more specific name seems better
hif Yes, somehow the name change got lost, should be done; "Fiji Hindi" is the common name for the language Robert Ullmann 11:58, 9 February 2009 (UTC)
Made {{sic}} a REDIRECT, moved all of hif, but left swb for now as I was still unsure about it. --Bequw¢τ 20:00, 9 February 2009 (UTC)

What's AutoFormat up to here?[edit]

Hello Robert Ullmann -- It's not a big deal, but I moved the column break in the translations section of smell a rat so that there'd be two items in column1 and one item in column2. Autoformat promptly came along and put the columns back the way they were, with one item in column1 and two items in column2. Not my idea of the correct way to format columns. Is there a rationale for this, or is Autoformat a little quirky in its handling of such situations? -- WikiPedant 02:59, 9 February 2009 (UTC)

I thought that AF tries to ensure that each column is the same length, and in the cases where there is one “spare” line that must go into one column or the other, then it tries to split the translations as close as it can to the A–M/N–Z-divide? –In the above diff., it moves an I.-lang. translation to the second (N.–Z.-lang.) column; I too am curious as to why AutoFormat did this.  (u):Raifʻhār (t):Doremítzwr﴿ 11:30, 9 February 2009 (UTC)
It doesn't do the A-M/N-Z thing at all; that is obsolete (and wrong: look at entries like iron and you'll see the split, if there was to be a fixed split, is earlier, about L/M, but never mind). It is trying to balance the columns, and is a bit quirky when it comes to wrapped lines; for any given window width, the answer will vary. In this case it thinks that the first item is going to display as two lines, and the other two (just a bit shorter ;-) will be one line each. For a window width about 80% of my (fairly wide) screen this is in fact true ... it might be tweaked a bit more to add one extra line to the first column if it "thinks" any lines may or may not wrap. Robert Ullmann 11:53, 9 February 2009 (UTC)
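The balancing behaviour described here can be sketched as follows. This is only an illustration under stated assumptions, not AutoFormat's code: the character-count wrap estimate, the 40-character column width, and both function names are invented for the example.

```python
import math


def display_lines(text, width):
    """Estimate how many display lines a translation line will occupy."""
    return max(1, math.ceil(len(text) / width))


def balance_split(items, width=40):
    """Return how many items go in the first column.

    Picks the split that minimises the difference in estimated column
    heights; on a tie, the extra line goes to the first column.
    """
    heights = [display_lines(t, width) for t in items]
    total = sum(heights)
    best_index, best_diff = 0, None
    left = 0                            # height of items[:i]
    for i in range(len(items) + 1):
        diff = abs(left - (total - left))
        # "<=" means later (larger first column) splits win ties
        if best_diff is None or diff <= best_diff:
            best_index, best_diff = i, diff
        if i < len(items):
            left += heights[i]
    return best_index
```

With three items where the first is estimated to wrap onto two lines, the sketch puts one item in the first column and two in the second, which is the result questioned above. If all three are estimated at one line each, two items go in the first column. The outcome therefore depends entirely on the assumed width, which is why no single split is right for every window.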

Re: Interwicket [4][edit]

Hi, I'm sorry but I honestly didn't understand a word you were saying. :P Could you please put it in more simple English? -- 88.112.34.29 13:49, 9 February 2009 (UTC)

Yes my (standard) English looks fine, but I mean that I don't understand Wiktionary jargon so well...mostly at all. ;) What is TJ? A bot flag? -- 88.112.34.29 13:59, 9 February 2009 (UTC)
Ok, so...how can I help you? -- 88.112.34.29 14:23, 9 February 2009 (UTC)
Aaah, I got it. -- 88.112.34.29 20:09, 9 February 2009 (UTC)


Kullanıcı:Interwicket[edit]

Is there a bot request page? Robert Ullmann 01:47, 11 February 2009 (UTC)

Hi, Dbl2010 is not active. You can find the page here.--Saltinbas 09:25, 11 February 2009 (UTC)
Thank you, request added. Robert Ullmann 15:30, 11 February 2009 (UTC)

New bot name![edit]

Hello, Robert Ullmann. Wow! I can see now that you have started a new bot, which goes under the name of Interwicket. Actually, I didn't know who was running that bot. But the good thing is that the bot belongs to you, bro. Thanks for your response over at sw:wikt. Thanks for your time.--Wikipedian Activist (talK 2 mE) 12:48, 12 February 2009 (UTC)

Bot flag on da[edit]

Your bot, Interwicket, now has a bot flag on dawikt. I got caught up in work, so I didn't get around to it quite as fast as I wanted to. Sorry for the long delay. -- Wegge 20:19, 12 February 2009 (UTC)

Thank you, not a problem; the bot code checks status itself. It is very important to work with the local community (which in a lot of cases, is only a few people) in making sure the bot is working properly; and with 171 wikts, it takes a bit of time! Again, thanks, cheers, Robert Ullmann 23:21, 12 February 2009 (UTC)

FYI[edit]

I took the initiative to request its bot flag for you: vo:Vükivödabuk:Bots#Geban:Interwicket. :) Best regards, Malafaya 17:02, 13 February 2009 (UTC)

Thank you, Robert Ullmann 23:29, 13 February 2009 (UTC)

Finishing off the trans languages[edit]

User:Robert Ullmann/Trans languages/uncoded is basically done, with a few caveats, all of which are noted in the "notes" column. For the most part, I haven't touched Chinese stuff, as the translation format for Chinese still rather mystifies me. Please feel free to take a stab at them if you like. I'm a bit unsure as to what to do with the macros; perhaps we could fit {{ttbc}} with a capability for the 639-5 codes, once we figure out how we're going to do those. The no codes should probably all be listed at Wiktionary:Languages without ISO codes. Perhaps I'll do that later. Some of the "under discussion" ones probably merit a BP convo, which I don't have the energy for at the moment, but probably will later. Finally, I have another (hopefully the last) list for your bots to chew through. Azerbaijani --> Azeri; Garífuna --> Garifuna; Irish Gaelic --> Irish; Norwegian (Bokmål) --> Norwegian Bokmål; Norwegian (bokmål) --> Norwegian Bokmål; Norwegian (Nynorsk) --> Norwegian Nynorsk; Norwegian (nynorsk) --> Norwegian Nynorsk; Panjabi --> Punjabi; Papago --> O'odham; Rapanui --> Rapa Nui; Scottish --> ttbc|Scottish Gaelic, ttbc|Scots; Sorbian --> ttbc|Upper Sorbian, ttbc|Lower Sorbian; Sranan --> Sranan Tongo; Swazi --> Swati; Tzutujil --> Tz'utujil; Waray-Waray --> Waray; Yucatec --> Yucatec Maya. Both this page and User:Robert Ullmann/L2/invalid could probably use an update, whenever you have time (clearly you're busy with Interwicket's newfound global empire), so whenever you get around to it. Finally, a couple of feature creep requests. First, if the update to User:Robert Ullmann/Trans languages/uncoded could preserve the codes where appropriate, that would save me a bit of work. Also, both it and the invalid L2 page have a few languages which are unlikely to be going anywhere anytime soon. It would save me a bit of work if both pages could have a long-term holding page and a new offenders page.
If either of those requests are impractical or you simply don't feel like doing them, I can certainly slog through without them. Many thanks for everything. -Atelaes λάλει ἐμοί 23:40, 13 February 2009 (UTC)

Vegan, animal rights, et al[edit]

Hi; I saw that you recently reverted some of my edits to animal rights and vegan, citing POV issues. I support and affirm your intention to provide accurate definitions. Please see the animal rights and vegan articles on Wikipedia; I think you will discover that my edits to Wiktionary are in line with the definitions, there. The concept of animal rights involves ceasing to regard animals as property, not making animals more comfortable while continuing to use them as property. The latter definition describes animal welfare, not animal rights.

Efforts to "spare animals undue suffering" is absolutely a component of animal welfare, as the concept of animal welfare involves affording animals more comfort and consideration, while still continuing to use them as food, clothing, etc. However, from an animal rights perspective, no amount of suffering is "due" (or "excusable in the course of using animals"). Key to the concept of animal rights is the idea that since animals are sentient and have individual interests, they cannot be considered property, and their usage cannot be morally justified (just as humans, being sentient, cannot be used as slaves).

The "Notable writers" section of the AR navigation template on Wikipedia is a good resource for further reading on the topic, as is the website of Gary L. Francione. I will revert your edits, pending any sources you'd like to raise to support your views. I'd be happy to discuss this further, if necessary. Thanks for reading. Technologe 15:27, 16 February 2009 (UTC)

Yes, as you so clearly state yourself, your edits are unacceptable POV. We define words as used, not as people would like to think they should be used, or ought to mean. If you want to discuss, use the talk pages. "I will revert your edits" is not acceptable. Robert Ullmann 21:18, 16 February 2009 (UTC)
Robert, I've provided my sources; which demonstrate how the terms "animal rights" and "vegan" are used; the community of Wikipedia editors has formed a consensus on these definitions. There are numerous reliable sources at these articles, which back up my assertions: just look in the References section. You haven't provided any sources of your own, yet now you've reverted and locked the articles so that only admins and above can edit them for the next three months. How is this fair? Technologe 02:01, 17 February 2009 (UTC)
It isn't fair, it's what's in the best interest of our dictionary. You seem to be failing to distinguish between the "correct" definition of a word/phrase and the actual definition in usage. We have no interest in the former, only the latter. Also, your audacity and persistence in reverting a long-time administrator of this project shows Robert's protection of the page to be entirely justified. If you want to get the page changed, I suggest you bring it up in our Tea room, as I think it unlikely you will change Robert's mind on his talk page. -Atelaes λάλει ἐμοί 03:59, 17 February 2009 (UTC)
As Atelaes notes, we describe usage (and sometimes prescribe usage, when the issue is grammar and construction). The WP debate you refer to has no relevance here, as WP is a tertiary source. (And from how you describe it, it sounds like proscribed original research.) On procedure: your edits were POV, properly reverted by not one, but three different, editors. The pages are protected only to prevent revert warring, which we do not engage in. It would have been easier to simply block, but this way you may edit the relevant talk pages, and, if you can, make the case for any given sense in usage. (oh, and the vegans I know, while being very strict with diet, have no use for "animal rights"; one usually wears leather ;-) Robert Ullmann 11:53, 19 February 2009 (UTC)

Bot Status on lo:wiktionary[edit]

Hi Robert. I just wanted to tell you that for bot status on lo:wiktionary, you would need to ask for it on meta-wiki, as we still do not have any bureaucrat nor sysop. Regards. Tuinui 08:12, 18 February 2009 (UTC)

Yes I know; to ask there the request has to be pending for 7 days in the local project. Will do. Robert Ullmann 10:19, 18 February 2009 (UTC)

vääntää rautalangasta[edit]

Hi, I made an article of that idiom and I was just wondering if you could come up with an English idiom which means "to explicate [sth] in a very simplified way" (e.g. with a very simplified language, simple words etc.). That's what that means, but I'm unaware of a good English equivalent. -- Frous 15:07, 18 February 2009 (UTC)

Sure, it is to explain in words of one syllable (not meaning literally one syllable, but condescendingly simplified) To use your example: Can you read this manual or do I have to explain it to you in words of one syllable?! Robert Ullmann 15:21, 18 February 2009 (UTC)
Thanks :) -- Frous 15:33, 18 February 2009 (UTC)

Interwicket@lo.wiktionary[edit]

Hi there Robert. I would like to ask you to make a request at meta for your bot Interwicket on lo.wiktionary. After I went to that page and looked at the recent changes, I noticed that your bot is kind of flooding its RC page. I do not actively edit there. I just noticed this. Cheers, Razorflame 01:08, 19 February 2009 (UTC)

See section 2 above this. There is one user adding quite a few things and lots of iwikis missing; when Interwicket is the only other thing it looks like a lot of RC even when editing only a few of the possible times. The request is at lo:Wiktionary:Bots and should be pending until the 23rd. Robert Ullmann 11:09, 19 February 2009 (UTC)

Just an odd question[edit]

Would it be possible to run your dump updating system for each of the wiktionary languages? - Amgine/talk 02:10, 21 February 2009 (UTC)

It is certainly possible; would take some organization. The control process assumes it is just doing one. The process itself assumes it is doing 'en' (easy to fix), but also knows the namespace numbers for content in the en.wikt (Index, Appendix, etc), and these vary per wikt. The other wikts are about the same size or smaller, so that isn't an issue. (The process will not scale up to something with high edit rates, like the bigger wikipedias; not to mention the amount of data.)
The WMF dumps are broken again as I see, and fixing them is their absolute last priority as usual, eh? Robert Ullmann 08:02, 21 February 2009 (UTC)
In a word, yes. And I've been told now that a SAN is completely out of the question, even if it were given for free. I'm not interested in stepping in to manage a larger wiki backup, but if we could back up the wiktionary project I'd feel better. If we could build a dump server for the wiktionary project I think a lot of people would appreciate it. Anything I can do to facilitate this, just ask. - Amgine/talk 02:42, 11 March 2009 (UTC)
According to WMF, Tomasz is currently focused on restoring dump capability for all WMF projects. At various times in the past various other developers were assigned primary focus on dump capability including Tim when he was first being paid iirc, so while I am appreciative of the gesture I'm unsure as to its actual effectiveness. In the meantime, if there's a way to build a dump site for all of wiktionary maybe we should? Apergos at el has a script that's been running rather a long time which I'm told is both pretty efficient and as low cost to WMF as can be reasonably done. Thoughts?

German noun at lo.wiktionary[edit]

Hello again Robert. I found out that German nouns are capitalized (1st letter) on both en.wiktionary and de.wiktionary. I want to use the bot to capitalize the words in the German noun category at lo.wiktionary, but the problem is that some have already been linked to the uncapitalized Dutch nouns. Do you have any suggestions for doing this better? Best regards. Tuinui

Move the entries to the proper capitalized form, and let the bot(s) worry about the links. It won't catch them immediately, but will fix things properly when it does. Robert Ullmann 09:18, 21 February 2009 (UTC)
Thanks for the advice. Tuinui 09:25, 21 February 2009 (UTC)

Input sought on RC thingy[edit]

Hi Robert,

I've been playing with a little script lately that follows non-bot edits in mainspace and sorts them by language section affected. Output is here. I think the server load should be minimal -- maybe equivalent to one desultory, passive human patroller: only mainspace changes are fetched, the API is used so there is no parsing, and the downloaded versions are cached locally so they are never fetched more than once. I'm thinking this could be a useful tool for monitoring, as it's difficult to track specific languages by RC alone. People could subscribe either to the main RSS feed (list of languages edited between updates) or the language-specific feeds. I just wanted to check if there are any reasons why this might be a bad idea, before I go and make a fool of myself with a big announcement.  :-) -- Visviva 16:18, 21 February 2009 (UTC)

Seems very reasonable. I've heard a few people say they'd like some way of "watchlisting" a language; but I don't know how much use it would get, one never knows. Presumably an entry is listed for a language if there was a change inside the language section or the section appears or disappears? You use the API both to read RC and get the versions you need? Sounds fairly cool. If a page is changed twice, you do three retrievals of revision content (not four ;-)? You might want to consider some sort of canonical WS reduction or format before comparing, as people often (w/ or w/o .js help) will change spacing in the entire entry when editing only one language. (No, I don't know the value of "often" ...) Robert Ullmann 11:46, 22 February 2009 (UTC)
1. Yep.
2. Yep.
3. Yes, or if it has been changed semi-recently (so that the old rev is stored locally), only two. Actually the entry-space revisions are so small -- about 1.5 K on average -- that I figure I can accumulate millions of them before I have to worry about disc space (though I'll need to store them in a more structured way). I'm hoping I can find some other useful role for all this data, though I'm not quite sure what yet.
4. I've noticed a few of these, but not so many that it seemed worth filtering them out. Actually I was thinking about removing all whitespace before checking for changes, but there are times when whitespace is significant, and I couldn't think of any obvious way to tell a potentially significant WS edit from a non-significant one. At any rate, leading/trailing whitespace is removed, as are the "----"s.
Yeah, I'm not sure if anyone but me will really care about this. :-) But just in the week I've been testing it, I've found it quite useful for my own purposes; I can catch English and Korean edits that would otherwise have gone right by me, and by clicking through the languages I get a very interesting view of "who's doing what".
Thanks for your time! -- Visviva 12:15, 22 February 2009 (UTC)
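
The approach discussed above (split each revision into language sections, then list the languages whose section changed, appeared, or disappeared) might be sketched as follows. This is only an illustration, not Visviva's script: the function names are hypothetical, and it assumes language sections start at level-2 headers of the form ==Language== and that "----" separators and leading/trailing whitespace are ignored, as described in point 4.

```python
import re

# Level-2 header such as "==English=="; deeper headers ("===Noun===") do not match.
L2_HEADER = re.compile(r'^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$', re.MULTILINE)
# A separator line made only of four or more hyphens.
SEPARATOR = re.compile(r'^-{4,}[ \t]*$', re.MULTILINE)

def language_sections(wikitext):
    """Map language name -> section text, trimmed, with '----' lines removed."""
    sections = {}
    matches = list(L2_HEADER.finditer(wikitext))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = SEPARATOR.sub('', wikitext[m.end():end])
        sections[m.group(1)] = body.strip()
    return sections

def languages_affected(old_text, new_text):
    """Languages whose section changed, appeared, or disappeared between revisions."""
    old = language_sections(old_text)
    new = language_sections(new_text)
    return sorted(lang for lang in set(old) | set(new)
                  if old.get(lang) != new.get(lang))
```

Comparing per-section text rather than whole pages is what lets an edit to one language be reported for that language only.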
On WS: you might do:
import re
# Collapse trailing whitespace and any following blank lines into one newline.
rews = re.compile(r'\s*\n')
...
text = rews.sub('\n', text)
i.e. rm all blank lines and all trailing WS on lines. I suspect the most common thing is people adding in blank lines? I don't know: you can see what is being picked up. You might do two comparisons in testing: one removing all WS, and one not, and report out the cases that are different? Or perhaps not important. Robert Ullmann 12:34, 22 February 2009 (UTC)
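
The "two comparisons" test suggested here could be sketched like this. It is a minimal illustration with hypothetical function names: one comparison uses the blank-line and trailing-whitespace reduction, the other strips all whitespace, and pairs where the two disagree are flagged for a human to look at.

```python
import re

REWS = re.compile(r'\s*\n')   # trailing whitespace plus any following blank lines
ALLWS = re.compile(r'\s+')    # every run of whitespace

def normalize(text):
    """Remove blank lines and trailing whitespace on lines."""
    return REWS.sub('\n', text)

def strip_all_ws(text):
    """Remove all whitespace, significant or not."""
    return ALLWS.sub('', text)

def classify(old, new):
    """Return 'same', 'whitespace-only', or 'changed' for two section texts."""
    if normalize(old) == normalize(new):
        return 'same'
    if strip_all_ws(old) == strip_all_ws(new):
        # Differs only in whitespace that the mild reduction keeps: report it.
        return 'whitespace-only'
    return 'changed'
```

Counting the 'whitespace-only' results over a week of edits would give a rough value for "often".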

Oh, you probably want to also include edits for a language when the change is in the English section, and a line containing the language name is added/removed/modified. You think? (* Korean: ...) Robert Ullmann 12:39, 22 February 2009 (UTC)

Yeah, I think so. I've been procrastinating about it, but should be pretty simple to add. Thanks for the whitespace idea; I really haven't noticed many of these, but less noise is always good. -- Visviva 13:56, 22 February 2009 (UTC)
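
One way to pick up the translation-line case, as a rough sketch: collect the lines in the English section that begin with a language name, then compare them between revisions. The names below are hypothetical, and the pattern assumes translation lines look like "* Korean: ..." (optionally "*: Korean: ...").

```python
import re

# Translation line such as "* Korean: ..." or "*: Korean: ..." (assumed format).
TRANS_LINE = re.compile(r'^\*:?\s*([^:\n]+?)\s*:')

def translation_lines(english_section):
    """Map language name -> set of translation lines that name it."""
    found = {}
    for line in english_section.splitlines():
        m = TRANS_LINE.match(line)
        if m:
            found.setdefault(m.group(1), set()).add(line.strip())
    return found

def languages_touched(old_english, new_english):
    """Languages whose translation lines were added, removed, or modified."""
    old = translation_lines(old_english)
    new = translation_lines(new_english)
    return sorted(lang for lang in set(old) | set(new)
                  if old.get(lang) != new.get(lang))
```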

lzh (Literary Chinese)

See: Wiktionary:Beer parlour#Literary Chinese

A heads-up that there is now an ISO code for Literary Chinese (they don’t seem to be differentiating between 文言 and 古文), and I’ve created a language template for it; hopefully this is of use.

—Nils von Barth (nbarth) (talk) 23:13, 21 February 2009 (UTC)

Isn't 古文 effectively och, with lzh and ltc (and zh, cmn, nan etc) differentiated? Robert Ullmann 07:54, 22 February 2009 (UTC)

Japanese Etymology

FYI, I’ve written a policy think-tank at Wiktionary:About Japanese/Etymology; hopefully this is a useful reference and place for discussion.

—Nils von Barth (nbarth) (talk) 02:45, 22 February 2009 (UTC)

Hello.

Hello, sir. Oneunderall 18:10, 23 February 2009 (UTC)

New entry templates

Sorry for the mistake. I think I've rolled back all of them. There is this, but maybe that's just because the job queue hadn't caught up yet. Nadando 04:41, 26 February 2009 (UTC)

Thanks for the (implicit) encouragement

I just came back to Wiktionary, two days after a less-than-positive experience with adding a new word that I honestly thought would meet the criteria for inclusion. Seeing your gentle comment on Mr. Irwin's talk page on the deleted entry hoocoodanode was quite encouraging to me. I will give a little time to doing a bit more research and see if the word can meet the cfi criteria. N2e 02:32, 27 February 2009 (UTC)

bot voting at trwiktionary

Hi, your bot request has been approved. Please ask for flag change at meta. Sorry for delay.--Saltinbas 05:52, 3 March 2009 (UTC)

Thanks, Spacebirdy set the flag per steward request on 20 February. Robert Ullmann 07:45, 3 March 2009 (UTC)

Multiple lang tags

Hi.

I'm looking at edits to template:Hant,[5] template:Hans,[6] and template:Hani.[7] These have two language tags in the lang and xml:lang attributes, like lang="zh-Hant zh-tw", which causes pages to fail validation (example).

I want to clean these up, but I thought I'd check with you because I'm not familiar with specific browser issues and conventions for these. It would be possible to make language tags with both script and region, like lang="zh-Hant-TW", but it's probably best to keep things as simple as possible and just go with the standard script tag, lang="zh-Hant". Michael Z. 2009-03-03 16:01 z

The browsers (in my experience and to my knowledge) do understand this, using whatever part of the first "token" that they can match; this is basically left over from having less well-defined tags. But at this point we can probably expect browsers to understand "zh-Hant"; things like IE5 that did understand "zh-tw" as well as "foo bar zh-tw" will lose, but that isn't really a problem now (the fonts in IE5 etc being totally messed up anyway, as I hardly need to tell you ;-) So I agree we should go with the "zh-Hant" tag by itself. (We can't do "zh-Hant-TW" anyway, the template doesn't have that information; could be "yue-Hant-HK" is correct for the same case ;-) So, yes, fix it Robert Ullmann 08:06, 4 March 2009 (UTC)
Thanks; will do. Michael Z. 2009-03-04 16:42 z
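
As an illustration of the cleanup agreed on here, the following sketch reduces a multi-tag attribute value such as "zh-Hant zh-tw" to its first tag. It operates on rendered markup purely for demonstration; the actual fix would be made in the template source, and the function name is hypothetical.

```python
import re

# Matches lang="..." and xml:lang="..." but not, for example, hreflang="...".
LANG_ATTR = re.compile(r'\b((?:xml:)?lang=")([^"]*)(")')

def first_tag_only(markup):
    """Keep only the first language tag in each lang / xml:lang attribute."""
    def fix(match):
        tags = match.group(2).split()
        return match.group(1) + (tags[0] if tags else '') + match.group(3)
    return LANG_ATTR.sub(fix, markup)
```

The same pattern can be used in reverse, to list pages whose attribute value contains whitespace and would therefore fail validation.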

Request

Hi there. I cannot help but notice Interwicket's FL status (the one that lists its edit count for every wiktionary). Do you think that it would be possible for you to create a version of this that could be used by interwiki robots on the Wikipedias where it updates it every time it changes a link on a separate Wikipedia? That would be very helpful in facilitating bot requests on Meta, and since toolserver has been down for a while, it would be very helpful for the operators to see how many edits their bots have made on each of the Wikipedias and which ones it has the bot flag for and which ones it doesn't. If you aren't able to do this, I understand, but it would be very helpful to other bot operators if you could make it available to other bot operators on the Wikipedias. Thanks, Razorflame 15:34, 5 March 2009 (UTC)

It is very useful, isn't it? I'm not sure how it would get integrated into the 'pedia interwiki bot, but I can certainly think about what might be useful. You wouldn't want to update the status every time; my program updates it once a day.
A possibility is to create a stand-alone program that can be run, and use the API to look at the bot's contributions and status, etc, that you can just run once in a while. Then if desired, interwiki could just invoke that every few hours. As I said, I'll think about it a bit. Robert Ullmann 16:43, 5 March 2009 (UTC)
Yes, it is very useful, especially with the toolserver replication lag being so high. Both of your suggestions would be great (and either one would be very helpful, so long as it would make sure that it would have up-to-date edit counts without the replication lag of the toolserver). I very much look forward to seeing what you can do with this. Thanks for the consideration, Razorflame 17:23, 5 March 2009 (UTC)
Okay, try running this: User:Interwicket/code/botstatus and tell me what you think. Robert Ullmann 12:57, 6 March 2009 (UTC)
Very nice! Thanks so much! If there were barnstars on this project, I'd give you a hundred! Thanks so much! I'll make sure that this gets around to the other bot operators! Cheers, Razorflame 17:26, 6 March 2009 (UTC)
Just a question, but could you add a little code that would produce the bots' total edit count over all Wikimedia wikis? That would be very helpful. Thanks, Razorflame 17:38, 6 March 2009 (UTC)
Yes, and there are a few other things, now that I have the basics working. If/when you tell others about it, please point them to the code page here (i.e. don't send them copies!) that way they will know where to find the latest. At some point we'll get it into the framework, but that process is an absolute farce. (They don't reply to serious bug tickets for weeks and months...) All suggestions welcome! Robert Ullmann 17:44, 6 March 2009 (UTC)
Actually, I've just found out that the toolserver is back up, so this will be a great backup/companion to toolserv's edit counters. Thanks for the help! Cheers, Razorflame 19:45, 6 March 2009 (UTC)
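
For readers wondering what such a stand-alone status check involves, here is a rough sketch. It is not the code at User:Interwicket/code/botstatus. It assumes the MediaWiki API's list=users query with usprop=editcount|groups, the helper names are hypothetical, and the network fetch itself is left out, so the functions only build the request URL and interpret a JSON response.

```python
import json
from urllib.parse import urlencode

def users_query_url(site, username):
    """Build a list=users query for one wiki, e.g. site='en.wiktionary.org'."""
    params = {
        'action': 'query',
        'list': 'users',
        'ususers': username,
        'usprop': 'editcount|groups',
        'format': 'json',
    }
    return 'https://%s/w/api.php?%s' % (site, urlencode(params))

def parse_status(response_text):
    """Return (editcount, has_bot_flag) from a list=users JSON response."""
    user = json.loads(response_text)['query']['users'][0]
    return user.get('editcount', 0), 'bot' in user.get('groups', [])

def total_edits(responses):
    """Sum edit counts over a dict of {wiki: response_text}."""
    return sum(parse_status(text)[0] for text in responses.values())
```

Running this once a day per wiki, rather than after every edit, matches the update frequency mentioned above.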

Just a note, your botstatus code seems to ignore some wikis that are global bot friendly. da.wiki was the one I noticed. I was wondering if it could be that the code checks the table at meta:Bot policy/Implementation#Where_it_is_policy but doesn't recognize the wikis listed below the table that say they allow global bots but don't otherwise follow standard bot policy? Suppose I could change the table on meta to match the other table unless you can think of an easy way to recognize them. -Djsasso 13:10, 9 March 2009 (UTC)

Quite so; reading the page itself is obviously a really bad hack. What it should do is read the list "wikiset/2" from meta, but that list is only accessible through a rights page that only stewards can access. Fixed, more hackery. It reads the other line format. Of course if that changes, it breaks again. Robert Ullmann 13:45, 9 March 2009 (UTC)

nl.wikt

Hi Robert, Welcome to the Dutch wikt. I took the liberty to put a SUL-tag on your user page. Hope you don't mind 66.26.84.143 18:36, 7 March 2009 (UTC)

Oops no login

Jcwf 18:37, 7 March 2009 (UTC)

Requested for comment

You have been requested for comment at the same page as SemperBlotto. Razorflame 16:02, 10 March 2009 (UTC)

Hmm, someone seems to have deleted it. Robert Ullmann 16:07, 10 March 2009 (UTC)
Nope, it's still there. Razorflame 16:09, 10 March 2009 (UTC)
Too bad, I was hoping (seriously) you would not make yourself look even worse. I'm sorry. Robert Ullmann 16:26, 10 March 2009 (UTC)

I would like to formally apologize for all the drama that I caused, and would like to apologize to you personally for any offense I might have caused you. Thanks, Razorflame 16:50, 10 March 2009 (UTC)

Old English interwiki

I don't think we need one. Old English will need a special kind of dictionary that's different from most others, something closer to a Bilingual style with the Modern form. I myself will be fixing it up, and I can do most of the work, but I don't want my work to be completely undone. I am [8] on it. I would appreciate it if we could turn off the bot.

(resolved, was confusion with IP-anon edits, not the bot) Robert Ullmann 17:09, 15 March 2009 (UTC)

Poking

Not urgent, of course, but I thought I'd remind you of this, based on your BP note. Thanks. -Atelaes λάλει ἐμοί 09:42, 11 March 2009 (UTC)

Done. Note that Seneca turns up as un-coded because of the ersatz template. Robert Ullmann 17:08, 15 March 2009 (UTC)

Spanish verb cleanup

Robert, now that you're back, I have a request. Yesterday, I finished adding {{es-verb}} to all Spanish verbs in Category:Spanish verbs ending in -ar; I had already done this for the -er and -ir verbs. What I need now is a list of all entries listed in Category:Spanish verbs that do not call the template {{es-verb}}.

I need this list so that I can complete the task of cleaning up those entries. I am hopeful that the list will be short, but it may include Spanish verb forms and lemma entries that never had the conjugation table added. It is not necessary to wait for the next database dump if there was one within the last two weeks. Very few of the entries were modified in the past two weeks, so I can deal easily with false positives from that time span.

If you could produce such a list and deposit it at User:EncycloPetey/Verbos, I'd appreciate it very much. --EncycloPetey 17:23, 15 March 2009 (UTC)
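
The kind of scan requested above can be sketched as below. This is an illustration only, with hypothetical names: it assumes the pages are already available as (title, wikitext) pairs, for instance read from a dump, and that category membership shows up as an explicit [[Category:Spanish verbs]] link in the text. Entries categorized only through a template would be missed.

```python
import re

# Explicit category link, with or without a sort key.
CATEGORY = re.compile(r'\[\[\s*Category\s*:\s*Spanish verbs\s*(?:\||\]\])',
                      re.IGNORECASE)
# Call to es-verb, with or without the explicit "Template:" prefix.
ES_VERB = re.compile(r'\{\{\s*(?:Template\s*:\s*)?es-verb\s*(?:\||\}\})')

def missing_es_verb(pages):
    """Yield titles that are in Category:Spanish verbs but never call es-verb."""
    for title, text in pages:
        if CATEGORY.search(text) and not ES_VERB.search(text):
            yield title
```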

Okay. Might take a bit longer than a few minutes ("the merely difficult we do immediately, the impossible takes a little longer ..." I forget where that comes from ;-) as it isn't a combination of code I've already written. Easy enough to spin though. I have a dump from yesterday morning (~38 hours old now). Robert Ullmann 17:36, 15 March 2009 (UTC)
It's a motto of the US Army Corps of Engineers during World War II. --EncycloPetey 18:00, 15 March 2009 (UTC)
Ah, thank you. There may also be some oddities listed: ahincar uses "Template:" explicitly, for example. But of course those could use a little TLC too. I'm going out for a pint and to watch la Liga, if nothing faults you should have a report in a while. If not, I'll fix it when I come home. Cheers, Robert Ullmann 18:16, 15 March 2009 (UTC)
I'm also headed out for a bit soon. No worries. Yes, some of the Spanish verbs call their conjugation tables with "Template:" because, unfortunately, some of the template names include an integral colon, which breaks the usual transclusion syntax. The templates need to be renamed before the linking can be fixed, and I've chosen to deal with just one issue at a time on the Spanish verbs. I'm just fixing the inflection line, removing superfluous category links, and sorting the L4 sections for now. There are many other issues to be cleaned up, and I may deal with some of those afterwards. --EncycloPetey 18:36, 15 March 2009 (UTC)
These don't need template:, there for some unknown reason. Templates with : are only a problem if the "prefix" is a namespace or iwiki language or something. I'm not aware of any of these that are a problem. Robert Ullmann 22:29, 15 March 2009 (UTC)
It causes problems in Safari, at least. See yacer (whose conjugation template begins with es:, which is the ISO code for an interwiki link to the Spanish Wiktionary). --EncycloPetey 23:34, 15 March 2009 (UTC)
Has nothing to do with the browser. As noted, only a problem if the template name starts with an iwiki prefix or whatever. Easy to move them. Let's see: Special:Prefixindex/Template:es: ... hmm, all redirects already, just need to fix the entries. Maybe a bit of automated fixing is in order? Robert Ullmann 13:31, 16 March 2009 (UTC)
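
The rule stated here (a colon in a template name only matters when the part before it is an interwiki or namespace prefix) can be expressed as a small check. The prefix set below is a tiny illustrative subset chosen for the example, not the real list, and the function name is hypothetical.

```python
# Illustrative subset only; the real set of interwiki and namespace
# prefixes is much longer and is defined by the wiki's configuration.
PROBLEM_PREFIXES = {'es', 'fr', 'de', 'w', 'template', 'category'}

def needs_explicit_namespace(template_name):
    """True if the name starts with a prefix that would hijack transclusion."""
    prefix, sep, _rest = template_name.partition(':')
    return bool(sep) and prefix.strip().lower() in PROBLEM_PREFIXES
```

Under this check a name like "es:conj-er" is flagged, while "R:Webster 1913" is not, since "R" is not a prefix in the example set.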
It is fixing those, and writing the list again. Robert Ullmann 16:34, 16 March 2009 (UTC)
Looks good; a very reasonable list is left. Robert Ullmann 22:34, 16 March 2009 (UTC)
Thanks. I hope to have done everything I can with cleanup on the remaining items by this weekend. It looks like Nadando is helping as well. --EncycloPetey 04:59, 17 March 2009 (UTC)

AutoFormat and Categories

Is there any particular reason AutoFormat is moving categories off of definition lines, as in this edit? Those definition lines didn't warrant a {{context}} label, but the categories are linked to those specific definitions, and thus should be adjacent to them to avoid confusion for later editors. Carolina wren 19:16, 17 March 2009 (UTC)

Categories are placed at the end of the language section, one per line. Firm policy, see Wiktionary:Votes/2007-05/Categories at end of language section. They are not to be on definition lines, or floating about anywhere else. Robert Ullmann 23:09, 17 March 2009 (UTC)
It would be nice since it's policy if it were listed someplace other than a vote page. That policy is not on Wiktionary:Categorization and Wiktionary:Entry layout explained gives that placement as only a recommendation not as a requirement. Carolina wren 04:18, 18 March 2009 (UTC)
It was carefully added to WT:CG but Connel trashed it (and other important things) with this edit, hence the vote. I didn't realize ELE had a line re placement, I'll fix that. Robert Ullmann 11:14, 20 March 2009 (UTC)
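
For illustration, the placement rule described in this thread (categories at the end of the language section, one per line) might be applied to a single section like this. It is a simplified sketch with a hypothetical function name, not AutoFormat's code, and it assumes the text passed in is exactly one language section.

```python
import re

# A category link plus any spaces or tabs immediately before it.
CAT_LINK = re.compile(r'[ \t]*\[\[\s*Category\s*:[^\]\n]*\]\]')

def categories_to_end(section):
    """Move category links to the end of the section, one per line."""
    cats = [c.strip() for c in CAT_LINK.findall(section)]
    if not cats:
        return section
    unique = list(dict.fromkeys(cats))           # drop duplicates, keep order
    body = CAT_LINK.sub('', section).rstrip()
    return body + '\n\n' + '\n'.join(unique) + '\n'
```

Note that this loses the visual association between a category and its definition line, which is the concern raised at the start of the thread.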

Another person to run interwicket?

Hi there Robert. Would you like to have a second user running Interwicket with you so that it can keep up with the amount of backlog that your bot is currently reporting at User:Interwicket/FL status? If that is what you want, please let me know on my talk page and we can discuss how to do it. Cheers, Razorflame 20:32, 19 March 2009 (UTC)

Hi! There are two processes that Interwicket uses, iwikt.py and mbwa.py. The first is designed so that anyone (bot runner) can run with some wikt as "home", and update all the links efficiently. It has been run on sv.wikt and several others. It would be useful to run it on fr and vi, as can be seen from the stats.
The problem is that it needs a fairly current XML dump of the home wikt, and the WMF dump "service" is now entirely broken. I tried running it on fr, with the last dump available, and it spent 95% of its time slogging through titles already done since.
The mbwa process I re-wrote to not use the dumps at all, but is very specific to Interwicket. It reads all the indexes and keeps an on-disk cache (about 1/2 GB), and uses about 250MB of RAM while running, but does track everything, including RC.
Something else might be created, but I'd have to do quite a bit of work to replace the missing XML dumps and maintain some efficiency. (This could be done with serious mods to iwikt, to read the links table with the API.)
However, you also need a modified "framework": the current version available in SVN has the iwiki sort order wrong for da, ms, and sv. Have you manually modded the wiktionary_family.py file for these? Robert Ullmann 13:24, 20 March 2009 (UTC)
Not yet, but I would be more than happy and willing to do so, or we could possibly work something out. I could send you an email with my email address, and you can send me the modified version of wikipedia_family.py, or we could do something else. Whatever is needed, I would be more than willing to help, Cheers, Razorflame 00:37, 21 March 2009 (UTC)
No response in about 4 days? Razorflame 00:37, 26 March 2009 (UTC)

Old Norse

Sorry to disturb you here, but AutoFormat goes on adding links to Old Norse translations in the translations section. Could you adjust it, since you agreed that this is not needful for such a well-known language? The uſer hight Bogorm converſation 09:43, 20 March 2009 (UTC)

Ah yes, it needs to be prodded to re-read the list. Will do presently. Robert Ullmann 11:02, 20 March 2009 (UTC)
Speaking of which, why is it unlinking Navajo? Carolina wren 23:56, 20 March 2009 (UTC)
If you are qualified to be an admin/sysop, you should be able to answer that question, not ask it. You seem to be way too new to know enough about how things work. Mind you, asking questions is always fine, but in this case: what is the answer? Robert Ullmann 00:32, 21 March 2009 (UTC)
AutoFormat's user page says it unlinks the Top 40, but there is no way Navajo is in the Top 40 no matter how you're determining that (number of speakers, entries in Wiktionary, translation line in Wiktionary, etc.), so that can't be it. Indeed, as an unusual language, I'd be expecting AF to actively link it under that criterion as given in its user page. Wiktionary:Translations doesn't say anything, all of which I had already examined before coming here to ask. Looking around some more, I see that after following the obscure link on Wiktionary:Translations to Wiktionary:Translations/Wikification, that because Navajo is listed as ¿¡¿commonly?!? known, that's why AutoFormat does that. It would be helpful if mention of the use of that page were made on User:AutoFormat.
However, my question was not about the mechanics, but the reasoning behind it. If Navajo qualifies as a language to be left unlinked because it is commonly known, it seems to me that there are several hundred more languages that need to be in that section as well. Carolina wren 01:07, 21 March 2009 (UTC)
"If Navajo qualifies as a language to be left unlinked because it is commonly known": No, I have never heard of that language and I support linking it. I advocate leaving unlinked the languages with rich cultural heritage, which includes all Europæan languages probably without one or two and most Asian ones (and Afrikaans, Ge'ez and Amharic of course). This is obviously not the case with this relatively unknown language, so please keep it linked. The uſer hight Bogorm converſation 12:55, 22 March 2009 (UTC)
I wouldn't say that Navajo is without a rich cultural heritage. I do know of it, but I would consider it and other Native American languages (with the possible exception of Cherokee due to Sequoyah's syllabary) to be not well known outside the United States. Conversely, I wouldn't include the subnational languages of continental Europe among the commonly known languages as a general rule since we're not all Europeans here either, nor would I include Ge'ez among languages I'd expect to be commonly known by most users. Carolina wren 14:23, 22 March 2009 (UTC)

Dumps

Did I see something about the dumps being broken again? (Or still?) RJFJR 00:05, 22 March 2009 (UTC)

Yes, they are pretty much broken now, although maybe getting better? Something is running on frwiki and dewiki, but without the "split-stub" indexes, so they can't run all-history (although it seems to be trying?). The last all-history backup for enwiki is more than a year old. The "current revisions" for enwiki seems to have completed on March 15, which would be good since the previous one was in October (!). Most projects have had no backups since 19 Jan to 12 Feb (depending on project). We, on the other hand, do have dailies; it isn't really that hard. But when one asks, one is told that they are "low priority". If some event or combination of events took out one rack in Tampa and one rack in Amsterdam, the Wikipedia would cease to exist. (Much of the data could be reconstructed, but the project would probably be dead. Note that there is no indication I have found that backups are copied to permanent media stored elsewhere, they seem to live in the same racks ... and all the user meta-data cannot be downloaded.)
You know where ours are: http://devtionary.info/w/dump/xmlu/ you might want to grab the latest once in a while and stash it. That download directory itself isn't backed up to permanent media. (If that system were to break irrecoverably, I could easily set it up again somewhere else.)
However, our dailies don't have the history or the user info, so we'd have to load it up somewhere and re-establish some user accounts, etc. to continue working. Robert Ullmann 01:03, 22 March 2009 (UTC)

User:Robert Ullmann/L2/invalid

Poke/nudge/brick through your window with threatening letter attached. -Atelaes λάλει ἐμοί 09:39, 22 March 2009 (UTC)

Done. Robert Ullmann 00:15, 23 March 2009 (UTC)
CID has noted the threat, and an Interpol Orange Notice has been prepared in case of any warranted need for distribution. Robert Ullmann 00:15, 23 March 2009 (UTC)
Thanks, for this and the following. Brick rescinded. -Atelaes λάλει ἐμοί 02:30, 23 March 2009 (UTC)

Wiktionary:Grease pit#Template:top, Template:mid

I think your input here would be valuable. -Atelaes λάλει ἐμοί 21:45, 22 March 2009 (UTC)

Missing

I'm starting to run into more and more entries that have already been corrected. If you have the opportunity to update these pages, I'd appreciate it. DCDuring TALK 14:43, 25 March 2009 (UTC)

Done. Robert Ullmann 16:43, 25 March 2009 (UTC)
Appreciated. DCDuring TALK 20:09, 25 March 2009 (UTC)

Conjugation of French verbs ending in -ayer

Hello, There is a slight confusion in the template. For instance, for essayer :

present participle is essayant (not essayé)
past participle is essayé (not essayant)

I don't know how to fix it, and it seems that nobody noticed my request. Sorry for asking you. --Elkaar 19:43, 31 March 2009 (UTC)

Modified. --Elkaar 09:58, 1 April 2009 (UTC)

User:Robert_Ullmann/Mismatched_wikisyntax update?

There's no hurry, (there are still lots left ;-), but it was last updated in Dec, and I'm curious how many the program would find now. Thanks! JesseW 21:19, 31 March 2009 (UTC)

User:Robert Ullmann/Missing forms/English

Is User:Robert Ullmann/Missing forms/English ready for a refresh? RJFJR 16:33, 3 April 2009 (UTC)

manyatta

As the resident Kenya expert, can you check manyatta - I think it is as concise and accurate a definition as possible, but if you know anything else about Masais that could be useful, I'd really appreciate an expansion (maybe an etymology?). Thanks --Jackofclubs 11:54, 4 April 2009 (UTC)

Is more general, any family or clan compound, this use is much more common than the specific use of an encampment of morans. Also more than Kenya; throughout the nomadic region in at least 4 countries. Edited ... good word. Robert Ullmann 12:09, 4 April 2009 (UTC)
Great, thanks Robert - I added a few more Maasai pages - moran, eunoto, emorata. Is there any chance you could generate a page with all the missing words from w:Maasai? Maybe put it at User:Jackofclubs/Maasai? And in the future, how would I be able to generate such a page? --Jackofclubs 12:24, 4 April 2009 (UTC)

Template talk:R:Webster 1913

Discussion moved to Template talk:R:Webster 1913#Discussions moved from User talk:Robert Ullmann.

WT:A

Hello Robert, I'm considering nominating myself for adminship. I've been editing reasonably regularly since last June, and would be interested to help out with some admin duties - blocks, deletions, reverting etc. As a long-term and esteemed user, do you think I'm "qualified", and is this a reasonable request? --Jackofclubs 10:23, 18 April 2009 (UTC)

XML dumps broken?

Hi, I notice there have been no XML dumps for four days. Do you know of this, and do you have some time to fix it? It's not urgent. Thanks Conrad.Irwin 15:01, 19 April 2009 (UTC)

no, sorry, I picked up the last one I'm using on the 13th; and didn't see any problem. Will sort it immediately! Robert Ullmann 22:22, 19 April 2009 (UTC)
devtionary seems to have suffered a longer network outage than the program was prepared to deal with; restarted, and it correctly discarded the bad intermediate file, and is building a new one; I expect the dump tomorrow morning will appear as usual :-) Robert Ullmann 22:41, 19 April 2009 (UTC)
Thanks. Conrad.Irwin 22:47, 19 April 2009 (UTC)
No, much more of a problem: your automation managed to create Template:prr with no revisions. (!) I have managed (apparently) to "delete" it. Robert Ullmann 01:32, 20 April 2009 (UTC)
ditto prw, psm ... WTF were you doing? No check at all on return status? I'm deleting them one at a time. Robert Ullmann 01:37, 20 April 2009 (UTC)

Dump worked today, after much hand fiddling with page IDs. Robert Ullmann 13:00, 20 April 2009 (UTC)

User:EncycloPetey/Verbos

Could you regenerate this? Thanks. Nadando 22:20, 19 April 2009 (UTC)

yes, wilco, as soon as I figure out the problem up one (no direct dependency, this (Verbos) doesn't use the dumps ;-) Robert Ullmann 22:24, 19 April 2009 (UTC)
Done. Robert Ullmann 03:00, 20 April 2009 (UTC)

Your removal of quotation marks from Template:R:Webster 1913

Discussion moved to Template talk:R:Webster 1913#Discussions moved from User talk:Robert Ullmann.

Problem with apostrophes ' and ’

French Wiktionary uses the typographic (or typeset) apostrophe (’) while others use the typewriter (or vertical) apostrophe ('). Your robot removed some good interwiki links here and there. Could you fix the problem with your robot? And could your robot create interwiki links whatever apostrophe is used by wiktionaries? Thank you. --Yun 02:49, 1 May 2009 (UTC)

These are in fact bad interwiki links; they must be to the exact form. The other bots will also remove them. Links to redirects on the other wikt (e.g. br.wikt) will work (but the other iwiki bots will still remove them at this time, we're working on that). See discussions on fr.wikt. Robert Ullmann 07:05, 1 May 2009 (UTC)
I have mentioned this to Melancholie, and he seemed to be able to make the two apostrophes synonymous: [9] (Then Interwicket removed them), meaning that whether a page has ' or ’ is irrelevant: they are still the same page: [10]. V85 07:11, 12 June 2009 (UTC)
Hello Robert Ullmann, I would recommend reverting your code, as it is legitimate for bots to ignore typographic differences like (') and (’); your code was actually better than PyWikipedia's. Please see de:User_talk:Melancholie#Interwikis for further information and my PyWikipedia script changes I am running with now. --- Kind regards, Melancholie 15:07, 13 June 2009 (UTC)
This is only one small bit of a very large class of issues. Please look at meta:Interwiki.py/Wiktionary functionality discussion. And it is not proper to ignore the difference between ' and ’ in all cases, how would you link those pages themselves? (;-) There are several other variants of apostrophe, the differences are not typographical, and the entries may be different. ʻohu and 'ohu are explicitly different, the latter being a useful redirect for now, it might end up as a word in some language other than Hawaiian. It looks like your code would consider 'ukulele and ukulele the same, they are manifestly not, and neither is ʻukulele.
While one could (as you have) add special code for apostrophes, do you plan to add special code for all the other classes of entries that "should be equivalent", with rules varying wikt by wikt? (Note apostrophe-equivalence is not valid for the en.wikt, and would have to be specifically disabled here!) I note that you are doing some other things, but not even close to what wikts vary on; and some wikts (like en.wikt) do not count those as equivalent, so your code would be "wrong". Another severe downside is that it then requires manual additions or "clues" to find the links in the first place; linking to same form, including linking to redirects, is entirely automatic.
The fr.wikt has allowed links to redirects (also en, sw, sv), doing this supports all policies that all wikts wish to use for all kinds of variant forms, of any class (variant characters, idioms, inflected forms, caps, etc, etc, etc.). I can enable links to redirects (which will then add them, Interwicket never removes them) for any wikt at will. Robert Ullmann 13:20, 14 June 2009 (UTC)

An example is de:Côte d'Ivoire, where de.wikt does not permit (as I was informed?) links to redirects. If it did, the links for fr and fi would work fine, and would be added automatically by Interwicket when missing. (fr and fi use the variant apostrophe, fr because they use it everywhere, even for English words, fi presumably because this particular term is French?) Robert Ullmann 14:20, 14 June 2009 (UTC)

Hello Robert, note that my bot is not looking for ukulele if the initial page is 'ukulele, unless someone places an interwiki link to ukulele in a 'ukulele page manually. And even then an interwiki conflict would/should prevent my bot from doing nonsense in these rare cases. The same applies for non-typographical differences of apostrophes/dashes/etc., the bot only accepts other variants if they have been linked manually, without conflict. My bot is always following redirects, but only accepting the very same page name then, so actually only following if there are only typographical variants (if linked already). I thought this would be a good compromise. If there are differences between variants, it is up to human beings to not mix them up before a bot comes spreading given links, I would say. See also no:Brukerdiskusjon:V85#Apostrophes. --- Kind regards, Melancholie 10:23, 17 June 2009 (UTC)
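
The two positions in this thread can be made concrete with a short sketch. The helper names are hypothetical and neither function is taken from Interwicket or PyWikipedia: exact matching treats every apostrophe-like character as distinct, while a folding rule that maps only the typographic apostrophe (U+2019) to the typewriter one (U+0027) conflates exactly the pairs the fr.wikt case involves and still keeps the Hawaiian ʻokina (U+02BB) separate.

```python
TYPEWRITER = "'"        # U+0027
TYPOGRAPHIC = '\u2019'  # right single quotation mark, used on fr.wikt
OKINA = '\u02bb'        # modifier letter turned comma, used in Hawaiian

def exact_match(a, b):
    """Strict rule: titles link only if they are character-for-character equal."""
    return a == b

def fold_typographic(title):
    """Looser rule: treat the typographic apostrophe as the typewriter one."""
    return title.replace(TYPOGRAPHIC, TYPEWRITER)

def folded_match(a, b):
    return fold_typographic(a) == fold_typographic(b)
```

Whether the looser rule is acceptable is a per-wikt policy question, as argued above; linking to redirects sidesteps it, because the redirect carries the equivalence instead of the bot.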

matatu

Hello again, can you please have a look at matatu to see if I missed anything? I was considering adding to the definition something like "... which are really dangerous", but this might be considered POV. It looks as though these things are only found in Kenya, but once again, you're the expert in this field. --Jackofclubs 11:11, 3 May 2009 (UTC)

p.s. have you ever been in a matatu? Why is it seen as so dangerous? Is it just the drivers, or is there something wrong with the vehicle itself? Sorry for the questions. --Jackofclubs 11:13, 3 May 2009 (UTC)
To say the drivers are aggressive is an understatement. They (and the "turnboys") earn money based on the riders and fares collected. The fact that their driving accounts for most of the traffic flow disruption that they in turn suffer from is lost on them. The vehicles (called "nissans" whether they are or not ;-) are not well maintained, and suffer badly from the condition of the roads. Add in the fact that the criminals that sometimes hold up matatu passengers are often in league with the driver and turnboy, and the result is fairly dismal. Yes, I've ridden in a matatu. Once.
It doesn't have to be that way: the exactly equivalent system in Kigali (Rwanda), is safe and comfortable; very convenient. Robert Ullmann 13:13, 9 May 2009 (UTC)
How's the definition of turnboy? Google showed me that they are also found on trucks, but it was difficult to discern what they actually do on these vehicles, so I just called them assistants. --Jackofclubs 17:48, 10 May 2009 (UTC)

{{language}}'s continued support for {{lang:xx}}

Hi Robert,

Seeing as we don't have {{lang:xx}} anymore, should we reduce {{language}} to just {{{{{1}}}|l=}}? (Or maybe just start phasing it out?)

RuakhTALK 18:46, 3 May 2009 (UTC)

I've been thinking of merging it into {{langname}}, so that all places using it/them can take code or full name (unless they need the code as well, e.g. {{t}} when the FL.wikt exists). We do want to keep something (nothing else should use the l= option directly) so we can change it later (for example if we got a proper mapping table, rather than thousands of templates ;-). And note that it handles (blank) correctly: "Template:language", while using {{{{{1}}}|l=}} produces "{{|l=}}". Which is used some place IIRC. Robert Ullmann 13:03, 9 May 2009 (UTC)
Re: merging into {{langname}}: I think that's a good idea; currently we don't have guidelines SFAICT to decide when to use each of the code→name templates ({{language}}, {{langname}}, {{t-sect}}, etc.), and standardizing on {{langname}} will help. However, in some cases it might be worth asking whether we really want a freeform field: for example, is it really desirable for {{etyl|Northern Maine French}} to add an entry to the non-existent Category:Northern Maine French derivations? (Maybe we do — after all, as an open wiki, there's a limit to how much we can constrain user behavior, even assuming we want to — but it's probably worth having a discussion about.)
Re: blank parameter: I don't think that's used anywhere. That works by using {{lang:}}, and Special:WhatLinksHere/Template:lang: is nearly empty. You might be thinking of {{t-sect|}}, which also maps the empty string to the empty string; I'm not sure whether that one's used for anything. (Is there ever a need to type {{t||…}}?)
RuakhTALK 18:55, 10 May 2009 (UTC)
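As an illustration of the "proper mapping table" idea discussed above, here is a minimal Python sketch of a lookup that accepts either a code or a full name and maps the empty string to the empty string. The table contents and the function name `langname` are assumptions made for the example; this is not how the templates themselves are implemented.

```python
# Hypothetical mapping table; a real one would cover every language code in use.
LANGUAGE_NAMES = {
    "en": "English",
    "fr": "French",
    "sw": "Swahili",
}


def langname(value: str) -> str:
    """Return the language name for a code; pass full names through unchanged."""
    value = value.strip()
    if not value:
        return ""  # blank in, blank out
    # Unknown values are treated as free-form names and returned as given.
    return LANGUAGE_NAMES.get(value, value)


print(langname("sw"))
print(langname("Northern Maine French"))
```

Because unknown values pass straight through, this sketch has the same free-form behaviour questioned above for {{etyl|Northern Maine French}}; rejecting unknown names would need an explicit check instead of the fallback.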

Articles in entry names

Hey there, I'm new to Wiktionary, and I'm still learning protocol and such. I recently cross-linked entries for the Irish names of countries in various Wiktionaries, where some had used the article an in the title, and some had not. Your bot, Interwicket, reverted all of these changes. Now, I'm accustomed to Wikipedia, where there's one name and all others, where appropriate, should be merged and/or redirected. I've already been corrected for redirecting, but I'm not sure where to go from here. Clearly the entries should be linked (ga:An Ghearmáin and Gearmáin, for example), but I can't really make their names coincide in a way that Interwicket will like without redirecting. Where Wiktionaries include the toponym (and Germany's just an example; most countries take an article in Irish), they're pretty evenly divided between the two camps, with-article and without-article. Thoughts? Embryomystic 11:37, 9 May 2009 (UTC)

While we fairly generally don't use redirects, there are specific cases in which we do; this would appear to be one of those cases. Sometimes users are criticized fairly automatically for using redirects when it should have been more carefully considered. (Where were you "corrected" for redirecting?)
Interwicket (and the interwiki.py bot in "wiktionary mode" for NS:0) only link/allow links to the exact form, so redirects will be needed. Robert Ullmann 12:55, 9 May 2009 (UTC)
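The exact-form rule can be pictured with a small Python sketch. It is only an illustration under assumed data (the `titles_by_wiki` sets are made up, and a redirect is modelled simply as a title that exists); it is not Interwicket's actual code.

```python
def interwiki_links(title, titles_by_wiki):
    """Return the wiki codes that have a page (or redirect) with exactly this title."""
    return sorted(code for code, titles in titles_by_wiki.items() if title in titles)


# Hypothetical title sets for two wikts.
titles_by_wiki = {
    "ga": {"An Ghearmáin"},
    "fr": {"Gearmáin"},
}

before = interwiki_links("Gearmáin", titles_by_wiki)

# Creating a redirect on ga.wikt makes the exact title exist there too.
titles_by_wiki["ga"].add("Gearmáin")
after = interwiki_links("Gearmáin", titles_by_wiki)

print(before, after)
```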

devtionary

Hi, there seems to be some sort of issue with the DNS for http://devtionary.info, but it can be accessed at http://70.79.96.121/ instead. Conrad.Irwin 18:25, 9 May 2009 (UTC)

pt-wiktionary

(Sorry for my bad English) Hello! I'm from the Portuguese Wiktionary. On pt-wiktionary there is just one bot (pt:Usuário:Interwicket) that finds and adds interwikis on pages. But, I don't know why, the bot is doing this work a little slowly, and we have many pages needing interwikis. So I ask: would it be possible for you to create another bot to find and add interwikis on pt-wiktionary, or something like that? --Jesielt (user talk) 22:57, 17 May 2009 (UTC)

It was running a bit slow in the last 24 hours or so, due to a few changes and some network problems. Should be faster now, and catch up in a little while. Robert Ullmann 05:06, 18 May 2009 (UTC)

a request for an opinion

Regarding your comment on User_talk:Biblbroks#Serbo-Croatian, could you share some thoughts on existence of Template:sr-bs-gender. All the best. 78.30.153.144 02:00, 19 May 2009 (UTC) (Biblbroks)

mbuga

Having difficulty with the definition of mbuga. Different websites call this plain, savannah, swamp, wetland, arid land... Is this particularly good soil for farming? --Jackofclubs 14:32, 20 May 2009 (UTC)

See sw:mbuga ... is all of those (;-). Mbuga ya Maasai Mara: Maasai Mara National Park. Is Kiswahili, not borrowed into English to my knowledge. Robert Ullmann 15:56, 20 May 2009 (UTC)
It is. There's a plural mbugas in many texts on Google Books. Do you think you could add the English entry? P.S. COMMENTING IN THIS SECTION DOES NOT MAKE ME WONDERFOOL. I just happened to be here. Lord! Equinox 23:17, 4 July 2009 (UTC)

User:Robert Ullmann/Missing forms/English

Hi, could you rerun this when you get a chance? Thanks. Also can you please take a look up here? 50 Xylophone Players talk 16:26, 20 May 2009 (UTC)

I don't mean to be rude but could you please reply whether you are currently too busy to do this or not? 50 Xylophone Players talk 12:27, 31 May 2009 (UTC)

neo-Luddite

Thanks; didn't know about that. Equinox 10:43, 2 June 2009 (UTC)

software bug?

Hi, a little while ago I terminated a process (without knowing what would go wrong) for a Veoh-related FF plug-in. Anyway, FF disappeared and I was notified of a DivX update and a Firefox add-on update, which I went along with. However, when I was prompted as to whether I wanted to resume or start a new session and resumed, two tabs that had Wiktionary pages open in them displayed this before I reloaded them:

damn stupid copy-paste didn't work >:( Okay, that aside, the message said something about a _____ (can't remember what was here) syntax error or something. Also the "tab name" was "database error" and it said "This may indicate a bug in the software". Any ideas? 50 Xylophone Players talk 00:48, 6 June 2009 (UTC)

Your message translated into Italian

Here is a translation of your message into Italian. You can use it to replace the text on the actual page, if you want:

<small>Traduzione in italiano:</small>

'''Bot interwiki del Wikizionario inglese'''

L'utente "Interwicket" è un bot che ha il compito di aggiungere i link interwiki (cioè i link ai wikizionari in altre lingue) alle voci nel Wikizionario inglese. Il bot è stato creato appositamente per Wikizionario e le sue caratteristiche sono differenti dai bot creati per wikipedia, rispetto ai quali è più efficiente.

Qui (cioè sul Wikizionario italiano) l'utente "Interwicket" ha il compito di aggiungere i link contemporaneamente dal Wikizionario italiano al Wikizionario inglese e viceversa. Inoltre verifica la presenza di altri link da aggiungere (o rimuovere) e provvede a effettuare le modifiche necessarie.

* Se l'utente "Interwicket" viene bloccato qui, (ovviamente) non effettuerà modifiche
* Se l'utente "Interwicket" è flaggato qui, aggiungerà i link

Al contrario, opererà in modalità "test" (''test mode''), effettuando solo pochissime modifiche, che possono quindi essere facilmente controllati (da me o da chiunque altro). La maggior parte degli aggiornamenti non verrà attuata se sussiste tale limitazione.

:La pagina di discussione di "Interwicket" è [[:en:User talk:Interwicket]] (in inglese).
:Il codice del bot si trova alla pagina [[:en:User:Interwicket/code]].
:Lo stato, il numero delle modifiche, e altre informazioni per ogni Wikizionario sono consultabili alla pagina [[:en:User:Interwicket/FL status]].
:La mia pagina di discussione è [[:en:User talk:Robert Ullmann]] (in inglese).

<!-- da notare che l'intero contenuto di questa pagina viene riscritto dal bot, per cui è inutile modificarlo. Qualsiasi template aggiunto in cima oppure categorie o link aggiunti in fondo saranno cancellati dal bot -->

----

<small>Original message:</small>

'''English Wiktionary interwiki 'bot'''

User "Interwicket" is the 'bot that adds interwiki (inter-language) links to entries in the
English Wiktionary. It is designed for the Wiktionaries. It is not the "wikipedia bot", it is
much more efficient.

Here, user "Interwicket" will add links to the English Wiktionary at the same time as it adds
a link from the English entry to the entry here. If it knows of other links to be added or removed
it will also do that.

* If user "Interwicket" is blocked here, it will not edit (of course)
* If user "Interwicket" is given a bot flag here, it will add iwikis whenever adding to en.wikt

Otherwise it will operate in a test mode, doing only a very few edits, 
that can then be checked (by me, and by anyone else). Most of the possible updates will not be done
because of this limit.

:Discussion page for Interwicket is [[:en:User talk:Interwicket]].
:Code is at [[:en:User:Interwicket/code]].
:Status, number of edits, etc for each wikt at [[:en:User:Interwicket/FL status]].
:My talk page is [[:en:User talk:Robert Ullmann]].

<!-- note that all of the
text in this page is re-written by the 'bot; it is pointless to edit it. Any templates added at
the top or categories and iwikis at the bottom will be left -->

I didn't quite understand this item: "it will add iwikis whenever adding to en.wikt". What does "iwikis" stand for?

Contact me if you need help with other translations from English to Italian or vice versa. See you. --Aushulz 22:25, 10 June 2009 (UTC)

MediaWiki

I moved the main page on no: from 'Hovedside' to 'Wiktionary:Hovedside' as this is the recommended location of that page. This, however, means that the link to the main page in the 'navigation box' no longer is 'Hovedside', but 'Wiktionary:Hovedside'; how can I change this, so that the link is to the right page, and the main page is still in the correct namespace? V85 12:33, 11 June 2009 (UTC)

You want to set no:Mediawiki:mainpage to {{ns:Project}}:Hovedside
and set no:Mediawiki:mainpage-description to Hovedside
If you look at no:Mediawiki:sidebar it might help explain what is going on; that uses the "mainpage" message as the link, and the mainpage-description message (if it exists) as the display. It does look like it is set up almost that way, I suspect maybe that because no:Mediawiki:mainpage is presently pointing at the main namespace redirect that it isn't quite right. I can't play with it there of course. Robert Ullmann 13:01, 11 June 2009 (UTC)
That worked brilliantly, thank you! V85 13:13, 11 June 2009 (UTC)
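For readers following along, the sidebar behaviour described above can be modelled roughly in Python. This is a simplified sketch, not MediaWiki's code: the `messages` dict stands in for the MediaWiki: message pages, the sidebar line format `** target|text` is assumed, and {{ns:Project}} is shown already expanded to "Wiktionary".

```python
def sidebar_entry(line, messages):
    """Resolve one sidebar line into (link target, display text).

    Each half is looked up as a message name; if no such message
    exists, the literal text is used instead.
    """
    target_key, _, text_key = line.lstrip("* ").partition("|")
    text_key = text_key or target_key  # no "|": use the same key for both halves
    target = messages.get(target_key, target_key)
    text = messages.get(text_key, text_key)
    return target, text


# Stand-in for the two message pages after the change suggested above.
messages = {
    "mainpage": "Wiktionary:Hovedside",
    "mainpage-description": "Hovedside",
}

print(sidebar_entry("** mainpage|mainpage-description", messages))
```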

Variations 2.0

Hey bud! Hope you are doing well. If you can, I'd like you to re-run User:Robert Ullmann/Variations. I have some new caveats in mind - first, ignore all terms with existing entries in Category:Variations of words; second, eliminate reports of the target word followed by a letter (such as ang1); third eliminate the three-letter limit (I don't know if a limit is helpful at all in terms of running time, but I think the hits above four letters will be nil).

Also, can you separately run a list that only crunches terms that do have entries in Category:Variations of words, and which indicates which terms should be on those appendix pages, but are missing? Let me know if you can implement these requests. If not, it would be great if you can just re-run last years version, and if at all possible, set aside hits that did not turn up in that version (perhaps bolding or italics). Cheers! bd2412 T 20:06, 12 June 2009 (UTC)

hi! okay, as you probably expect, this should take a bit of looking at. Will see if I can do that in the next few days and get back to you. (Heavy cricket schedule at present!) Robert Ullmann 22:44, 12 June 2009 (UTC)
No rush at all - thanks for looking at it! bd2412 T 00:42, 13 June 2009 (UTC)
Just thought I'd bump this. Cheers! bd2412 T 21:33, 26 July 2009 (UTC)
I'm working on it. A lot more interesting than cleaning up the other disgusting mess. Robert Ullmann 11:23, 27 July 2009 (UTC)
Should be pretty good. Is one list, showing what you want (I think). Robert Ullmann 13:41, 27 July 2009 (UTC)
It's perfect! Thanks! bd2412 T 00:47, 30 July 2009 (UTC)

Tbot support for Serbo-Croatian

Hi Robert. I was wondering whether you had the time to review this request? I'd be glad to hear any opinions.. --Ivan Štambuk 14:41, 21 June 2009 (UTC)

Thank you

Dicfor dictionary write: prosector. anatomist good? Karesz52 17:48, 30 June 2009 (UTC)

Question with regard to your bot

Hello, I am Daniel from the German Wiktionary. I read on your bot's user page there that you cannot speak German, so I am asking whether I can translate it. --Daniel Janke 16:35, 17 July 2009 (UTC)

Hello, I am sad that you have not answered me. I went to edit your bot's user page and read this there: <!-- note that all of the text in this page is re-written by the 'bot; it is pointless to edit it. Any templates added at the top or categories and iwikis at the bottom will be left --> Can you please change your bot so that I can translate the user page? --Daniel Janke 17:15, 22 August 2009 (UTC)
Hi, sorry, lots going on. At the top of the page, add {{/Überschrift}} and then create page de:Benutzer:Interwicket/Überschrift and add the translation which will then appear before the English. And it isn't that I can't read (and write a little) German, it is that that same message goes in all the language wikts. Robert Ullmann 20:31, 22 August 2009 (UTC)
OK, thanks for your answer. If the same message goes on all the language wikts, wouldn't it be better to translate the message into many languages so that everybody can understand it? (Like pages on Meta) --Daniel Janke 17:02, 23 August 2009 (UTC)

{{Xyzy}} to {{suffix}} et al.

Will you look at {{suffix}}, {{prefix}} and {{compound}} please? Will the replacement of Latn with Xyzy everywhere do the work? --Vahagn Petrosyan 06:23, 26 July 2009 (UTC)

Yes, that should work properly. It would be slightly better if those templates didn't pass sc at all if they didn't get it, but that is a tricky bit of code that isn't really needed, as long as they call {term} with the same default as {term}. So change them as you suggest. (I have a lunch party right now, will be busy for a while ;-) Robert Ullmann 09:25, 26 July 2009 (UTC)

IDS composition sequence

Hi Robert. In the template han char there is a field called IDS. It is used in . I can't find any article on Wikipedia about IDS. What is the source for the information? Kinamand 10:22, 28 July 2009 (UTC)

Hi! See Chapter 12.2 of the Unicode spec here (PDF). It was added because someone asked for a place to put IDS sequences; they probably are not too important except for characters not in common fonts. Robert Ullmann 12:38, 28 July 2009 (UTC)
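To make the idea of an IDS concrete, here is a small Python sketch that parses a sequence into a tree. It assumes the twelve ideographic description characters U+2FF0 to U+2FFB, of which ⿲ and ⿳ take three operands and the rest take two; later Unicode versions may define more. The function name `parse_ids` is made up for the example, and the sketch does no validation beyond operand counts.

```python
# Operand counts for the ideographic description characters (assumed set).
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left, middle, right
IDC_ARITY["\u2FF3"] = 3  # above, middle, below


def parse_ids(seq):
    """Parse an ideographic description sequence into nested tuples."""

    def parse(pos):
        if pos >= len(seq):
            raise ValueError("sequence ended before all operands were read")
        ch = seq[pos]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch, pos + 1  # an ordinary component character
        pos += 1
        parts = []
        for _ in range(arity):
            part, pos = parse(pos)
            parts.append(part)
        return (ch, *parts), pos

    tree, end = parse(0)
    if end != len(seq):
        raise ValueError("unexpected characters after a complete sequence")
    return tree


print(parse_ids("⿱艹化"))  # a description of 花: 艹 above 化
```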

Thanks re: {Xyzy} ‼

Just a big thanks! for {Xyzy} – this is a huge help for Japanese entries especially! (And will also save Japanese contributors a lot of typing and cruft.)

—Nils von Barth (nbarth) (talk) 16:26, 31 July 2009 (UTC)

[11]

Call off your goon. — [ R·I·C ] opiaterein — 17:34, 2 August 2009 (UTC)

⿱

Hi Robert. I have created an entry for the ideographic description character ⿱. Can you please check whether it looks OK? If it does, I plan to create entries for the other ideographic description characters too. Kinamand 10:05, 3 August 2009 (UTC)

Autoformat

I'd be interested in running an Autoformat bot on the French Wiktionary. However I know little or nothing about programming! Can you send me anything to read. Some French users like User:Koxinga will be able to help me (he runs the French Tbot). Mglovesfun (talk) 10:42, 15 August 2009 (UTC)

Very interesting idea. Let me think a bit. AF has a lot of things very specific to the en.wikt wrapped into it in all sorts of ways. Would be very interesting to abstract the rules much more and create some code useful on other wikts. Robert Ullmann 17:06, 15 August 2009 (UTC)

{{Xyzy}}

Well, don’t you think it’s time to enact the addition proposals at {{Xyzy}}? --Vahagn Petrosyan 04:33, 19 August 2009 (UTC)

Hmm. Should we include cu to Cyrs? Using it as a default? Robert Ullmann 06:06, 19 August 2009 (UTC)
As I understand it, Cyrs is an esthetic goody, an embellishment, not a necessity. It simply displays the Cyrillic letters in fancy old script for those who have downloaded special fonts like BukyVede, e.g. млѣко. The same can be perfectly rendered by standard Windows fonts: млѣко. I say we introduce support for Gothic instead. Without {{Goth}} enforced you see squares (𐍅𐌿𐌻𐍆𐍃) in Opera and IE, even if you have necessary fonts installed. --Vahagn Petrosyan 07:06, 19 August 2009 (UTC)
Isn't got/Goth a separate issue? Or is there something I'm missing? Robert Ullmann 07:19, 19 August 2009 (UTC)
I meant that since as you said adding new languages is costly, we should not add Cyrs. Instead, it is more beneficial to allocate resources intended for it to newly-proposed Gothic. Ideally, of course, we should add both. --Vahagn Petrosyan 07:34, 19 August 2009 (UTC)

It's time

To stop treating =Jyutping syllable= as a "non-standard header". =Pinyin= syllable isn't, so why would the Cantonese equivalent? — [ R·I·C ] opiaterein — 19:18, 31 August 2009 (UTC)

It is long past time for you to apologise for your extreme personal abuse. Until then, you seriously expect help or cooperation?
The way to do it is quite simple, but vicious abuse is not the way to find out what it is. Do you not understand that? You may be a self-described "dork", (and I very much like the artwork on your wp user page), but acting that out is not helpful to getting what you want.
You want help? Go on WT:BP and apologise profusely for calling me a "twat", and commit yourself to never, ever, attacking anyone here personally ever again on this project. And someone will tell you the way to do just what you want; you don't even need me to do anything. Robert Ullmann 23:31, 31 August 2009 (UTC)
My opinion of you has nothing to do with this request. I'm not going to apologize for calling you a twat, because I think you are one. I would be lying if I said I were sorry, and lying is against my religion. What does my wikipedia userpage have to do with anything? — [ R·I·C ] opiaterein — 00:57, 1 September 2009 (UTC)

French verb sections with red links

Hello Robert. You have been so helpful to me, but I would like to ask for more help. Is it possible to generate a list of all French verbs that have a conjugation table with red links in it (i.e. with inflected forms which I have not so far found)? It would be great, because I keep finding such verbs, usually hidden away, and it would be nice to see how many such verbs exist - to feed the bot. Thanks again --Rising Sun 18:51, 3 September 2009 (UTC)

bo

Thank u very much
B9hummingbirdhoverin'æω 12:13, 9 September 2009 (UTC)

i was using this as a guide: Category:User_Deva
i'm gettin meself in a muddle *humph*
B9hummingbirdhoverin'æω 12:34, 9 September 2009 (UTC)

Wanted pages

Hi, Robert.

I was wondering what query you use to extract Wanted Pages from the XML dump as in User:Robert Ullmann/Oldest redlinks. I would like to implement a similar feature in pt.wikt. Thanks! Malafaya 15:35, 9 September 2009 (UTC)

I haven't run that in a while. Let me clean up the code a bit and re-run it; then I'll be happy to share. Should run on pt.wikt with little modification. Robert Ullmann 17:10, 9 September 2009 (UTC)
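Until that code is shared, the general approach can be sketched in Python. This is not the code behind User:Robert Ullmann/Oldest redlinks; it assumes the dump has already been read into a hypothetical `pages` dict of title to wikitext, and it crudely skips any link target containing a colon (namespaces and interwikis).

```python
import re
from collections import Counter

# Capture the target of [[target]], [[target|label]] or [[target#section]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)")


def wanted_pages(pages):
    """Count, for each missing title, how many pages link to it."""
    wanted = Counter()
    for text in pages.values():
        # Use a set so that a page linking twice to one title counts once.
        for target in {m.strip() for m in LINK_RE.findall(text)}:
            if target and ":" not in target and target not in pages:
                wanted[target] += 1
    return wanted.most_common()


# Made-up sample data in place of a parsed XML dump.
pages = {
    "a": "[[b]] and [[c|see]] [[b]]",
    "b": "[[c#English]] [[fr:b]]",
}
print(wanted_pages(pages))
```

Sorting by the age of the linking revision, as the "oldest redlinks" name suggests, would need the revision timestamps from the dump as well.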

Hanja readings

The term hanja is so well established as belonging to Korean within Wiktionary that not only "Korean Hanja" but also "Japanese Kanji" and "Chinese Hanzi" is tautological or redundant, I fear.

You are supposed to know who really I am. Aren't you? --125.128.159.25 12:28, 10 September 2009 (UTC)

So use your account. And it doesn't matter if "Japanese kanji" is redundant, that's the way we do things, so that category names stay together with the language.
If you persist, I will block the IP and clean things up (handy "delete" tab), see? Robert Ullmann 12:57, 10 September 2009 (UTC)
Robert! Your reply is brief but I can read quite a lot, perhaps thanks to our past history. Again and again, I'm surprised by your huge admin power. Surely, however, overriding me should not be all you seek. That is to say, your power may be abused. You are going to block me asap for one reason or another unclear. You may look like seeing a mosquito and drawing the sword, namely, the Sino-Korean proverb, 견문발검 (見蚊拔劒) -- too hypersensitive to match with the global collaboration like Wiktionary. Have you behaved so strict so far? Should you wish to be so to me, however, go ahead to contribute to the notoriety of hypersensitive wikt administrators against me. But just remember this makes up your karma that would never be undone, on which I may live some day to your great regret. You are not quite sure what may result from your action. --125.128.159.25 14:11, 10 September 2009 (UTC)

Request for automatic entry creations - other languages

Hi Robert,

Please let me know if you think it's possible. I added this request in Tbot user talk page. --Anatoli 05:59, 23 September 2009 (UTC)

Anagrams

In this instance (and probably only this instance) including the header in the template is desirable. All the content of the Anagrams section is within the template in order that it is shared among all pages, therefore the [edit] link should point to the template and all pages get updated in tandem. As mentioned, this information is barely consequential, and can be trivially generated. The only information that would be under an ===Anagrams=== header would be the fact that anagrams for this word exist (duplication of the function of the header), the language (duplication of the language header) and the anagram-normal key for the word {{anagrams:en/adeht}}. As the anagram-normal key is a very specialised piece of data that would need custom treatment anyway, the presence or absence of an explicit header makes very little difference. Conrad.Irwin 17:19, 29 September 2009 (UTC)
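The "anagram-normal key" mentioned above can indeed be generated trivially. A minimal Python sketch, assuming the key is just the lowercased letters in sorted order (as in adeht) and using made-up helper names:

```python
from collections import defaultdict


def alphagram(word):
    """Return the anagram-normal key: the word's letters, lowercased and sorted."""
    return "".join(sorted(word.lower()))


def anagram_groups(words):
    """Group words by key, keeping only keys shared by two or more words."""
    groups = defaultdict(list)
    for word in words:
        groups[alphagram(word)].append(word)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


print(alphagram("death"))  # adeht
print(anagram_groups(["death", "hated", "word"]))
```

Real entries would also need a policy for spaces, hyphens and diacritics, which this sketch does not attempt.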

You have really no idea how much stuff this breaks. Almost any s/w reading the text will treat the anagrams as part of the preceding section; and not be able to tell there is a L3 header there. We need the header in the wikitext. Putting the list in a template (one-per-alphagram) is good (if it must be in the page at all ;-). Robert Ullmann 17:26, 29 September 2009 (UTC)
It seems to me they will have a far easier time of it if they treat it like the category links and interwiki links, {{pageCount}} and other miscellaneous "non-sectional" stuff that already accrues at the bottom of a page (that is the reason it is at the end). I have no idea how much it breaks because it breaks nothing I know of - while I suppose it is conceivable to imagine there is some kind of script that relies on the TOC numbers without reading the TOC properly, such a script is broken anyway. Do you have other examples in which problems are caused? For consumers of Wiktionary wanting to extract our system of anagrams this makes it easy, everything is in a standard format; for consumers who don't want anagrams, it's less for them to ignore. Conrad.Irwin 19:14, 29 September 2009 (UTC)
Conrad, what do you mean by "read the TOC properly?" What TOC? You are thinking only of client side scripts reading (somewhat painfully) the HTML/DOM. I (and Hippietrail below) are talking about the majority of applications using and analysing the database. They read the wikitext, either the XML or from the API. Any consumer "wanting to extract our system of anagrams" should (actually must *) be reading the database, not scraping the data out of the HTML GUI. The vast majority of applications (not "scripts") do this correctly. The headers must be in the entry wikitext.
I am not thinking of them at all, I consider them "broken anyway" as above. If something is wanting to extract anagrams, then having them in templates makes it easier to extract them (all in the same place, in the same format, no need to balance out differences between entries that should list the same anagrams, etc.). The header is only there for users, which is why it should work for them. As you say, no bot should need to know about it. Conrad.Irwin 16:21, 30 September 2009 (UTC)
Earth to Conrad: (;-) they aren't broken; reading the wikitext is the standard way to extract and re-use data from the wikt, and the headers have to be in the wikitext to be parsed. The applications are working properly. Of course it is good to have the anagrams themselves in the templates. The header, however, must be in the wikitext precisely so that applications (of which "bots" are a tiny minority) can read the structure. Robert Ullmann 16:14, 11 October 2009 (UTC)
(* WMF terms of service prohibit "live mirrors" or any applications reading the GUI to return data, the GUI is only for editors and direct users. Applications must use the XML dumps and/or the API within reasonable limits.)
Oppose: I would never accept any heading hidden inside a template. While it may help editors in some small way it completely breaks every automation tool, parser, and extension I have ever written that depends on sections and headings. This is why I can't parse the de.wiktionary at all where all headings are hidden in templates. While cirwin argues that the Anagrams section is "useless" I am utterly against anything which leads to a division of sections into first class and second class. It would mean any of my tools would treat all sections except Anagrams - leaving them out in the cold. And it would get much worse if anyone were to decide that certain other sections would benefit from similar treatment. — hippietrail 00:45, 30 September 2009 (UTC)
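The objection can be shown with a short Python sketch of the kind of header scan that tools reading the wikitext typically perform. The regular expression and the sample entries are illustrative assumptions, not code from any of the tools mentioned.

```python
import re

# A header line: 2 to 6 "=" signs, the title, then the same run of "=".
HEADER_RE = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def sections(wikitext):
    """Return (level, title) for every header present in the wikitext itself."""
    return [(len(m.group(1)), m.group(2)) for m in HEADER_RE.finditer(wikitext)]


header_in_template = "==English==\n===Noun===\n# definition\n{{anagrams:en/adeht}}\n"
header_in_wikitext = "==English==\n===Noun===\n# definition\n===Anagrams===\n{{anagrams:en/adeht}}\n"

print(sections(header_in_template))  # no Anagrams section is visible
print(sections(header_in_wikitext))  # the L3 Anagrams header is found
```

With the header inside the template, the scan attributes the template call to the Noun section, which is the breakage described above.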

daily dumps

Hi Robert.

I just wanted to ask your opinion on chaining some scripts onto the daily jobs once the dumps are completed on devtionary. I have some scripts which index the dumps to enable fast random access to a dump file which I use for some tools currently on Toolserver. Then some other tools which create minidumps per language and per namespace.

Now cirwin thinks I should poll every couple of minutes to look for a new dump but I wanted your opinion since you set up the daily dump process. I thought it would be better if the daily job executed some other script when it's done. Then I or any of the devtionary devs can add stuff they want done to that script.

What do you think? — hippietrail 23:53, 30 September 2009 (UTC)

Poll? Gak ;-) The dumps are run out of a background script (which I might run from cron someday ;-) that runs the published daily just after midnight (local time on the machine). Should usually be done by 2 AM. You might run something (from cron or whatever) at maybe 3 or 4, checking to make sure the file is available. Starting something under a different UID and path is technically possible but tricky. Probably easier for you to run from cron, which does all that.
Sorry for the long delay in answering. Robert Ullmann 17:04, 11 October 2009 (UTC)
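A cron-driven job of the kind suggested above might start with a check like the following Python sketch. The dump path, the six-hour freshness window and the function name are all assumptions for illustration; nothing here reflects the actual devtionary setup.

```python
import os
import time


def dump_is_ready(path, max_age_hours=6.0, now=None):
    """Return True if the dump file exists, is non-empty and was written recently."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds <= max_age_hours * 3600


if __name__ == "__main__":
    # Hypothetical location; e.g. run from cron at 03:00 local time.
    dump_path = "/path/to/daily-dump.xml"
    if dump_is_ready(dump_path):
        print("dump available, starting indexing")
    else:
        print("dump not ready, skipping this run")
```

A file that is still being written would pass this check, so a real job might also compare the size across two runs or look for a completion marker.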

rerun

Hello. Please can you regenerate User:Rising Sun/French verbs needing conjugation. Rising Sun 12:12, 11 October 2009 (UTC)

Done. Robert Ullmann 17:05, 11 October 2009 (UTC)

UllmannBot

If you unilaterally decide to run the UllmannBot for the creation of B/C/S/M entries without a passing vote approving that kind of activity, I'll initiate a vote for your reprimand and desysop here, and report a history of your disruptive behaviour to meta incidents board and Jimbo Wales personally. --Ivan Štambuk 15:30, 13 October 2009 (UTC)

Mr Štambuk, your threats are unacceptable behaviour for a wiki (or indeed any human community). It is astounding that despite your obvious intelligence and knowledge, you have no understanding of the impropriety of your constant pattern of threats, intimidation, and vicious personal attacks on anyone and everyone with whom you have a disagreement. Sir. Robert Ullmann 03:30, 14 October 2009 (UTC)
And where exactly do you see "threats, intimidation and vicious personal attacks" here?
You state that you'll unilaterally run your bot for the type of extremely controversial edits that more than warrant prior community sanction per bot policy. Those edits would severely degrade the quality of Wiktionary entries in a language that happens to be my major focus of contributions around here (by being my mother tongue). I have every reason to be extremely upset by this disruptive behaviour of yours, which comes not as an isolated incident but as a continuation of former practice, and, I have reason to believe, is a carefully organized and targeted provocation (and I see you've been corresponding with SpeedyGonsales and Roberta F. - former Croatian Wikipedia bureaucrats who'd been desysoped and banned a month ago without anyone in the community raising as much as an eyebrow, due to their evil off-wiki machinations which resulted in an irreparable loss of many superb contributors over the years).
I personally find the artificially "over-civilised" tone of yours above, and out-of-place smileys in your BP notice particularly abhorring, in the context of everything you've done in the last few months to obstruct the Serbo-Croatian business, insult and defame everyone participating in it. There is absolutely no reason to assume that you're doing any of this in good faith. If you were, you'd be asking for feedback on the WT:ASH talkpage, notifying the relevant contributors. Instead, you continue to abuse the sysop and bot privileges bestowed upon you by the community, to unilaterally act as if you're some kind of "wiki-god". I'll simply do everything I can to thwart your maliciously-crafted plans until it's too late. What you may perceive, or falsely self-victimize, as "threats" or "intimidation", I perceive as morally-obliging Good Thing to do. --Ivan Štambuk 09:11, 14 October 2009 (UTC)
(did he really just abuse me for being polite? yup, he did ;-)
Mr Štambuk, I would advise you to arrange a meeting with your academic advisor at the University. Sit down with him or her, and explain what you are trying to do, then show him/her the actual replies/comments/statements you have made about other people in the discussion(s). And also keep in mind that everything you write here is on the permanent public record, this is not a fantasy role-playing game, but rather part of the permanent academic literature. Robert Ullmann 07:36, 15 October 2009 (UTC)
Look Robert, it's not my intention to "abuse" you, but given your record of deliberate evilness here, methinks it's safe to assume that the over-politeness of your discourse above follows the line of your previous self-victimization efforts, where your apparently "good intentions" are being verbally massacred by insane Štambuk, where in fact the only thing you're really interested in is pushing the absurd "different languages" theory to the limit that would hopefully result in me doing something stupid, getting me permanently removed from this project. That's because you (and your colleagues) cannot argue with me: my arguments are for the most part flawless and irrefutable. And you cannot concede to some win-win consensus either (which I have - separate new manually-edited B/C/S/M entries are allowed, and they're not being removed). Your motives are completely transparent.
What I'm hoping to achieve is to appeal to your super-ego, not to be driven to do something that will likely result in a fair amount of damage to your public standing. SpeedyGonsales and Roberta F. have nothing to lose here. Were they truly interested in the status of ==Croatian== entries here, they'd actually be creating quality entries, not emit their nationalist BS. And I'd have abs. no problems contributing as IP here. --Ivan Štambuk 09:48, 15 October 2009 (UTC)
Wow. Just wow. You are right because you are right, and anyone who disagrees with you obviously knows absolutely nothing, and should defer to your utterly superior intelligence, and "flawless and irrefutable" argument.
We'll just conveniently ignore the fact that your singular position has been completely rejected and repudiated by every single national and international authority, national constitutions, libraries, standards bodies, even the linguistics department of the university you attend.
Because you are right, and they are all ignorant idiots. Right. I understand. Robert Ullmann 04:40, 16 October 2009 (UTC)

"Sortkey"

Has there ever been talk of a "sortkey" like fr:Modèle:clé de tri in French? For anyone who's reading this and doesn't know, it's used to keep pages in alphabetical order in categories, so Spain becomes spain and crème brûlée becomes creme brulee. I'd have thought at least in theory it's a good idea, albeit the number of pages needing one must be at least 10 000. I thought I'd talk to you before going to the Beer Parlour with it. Mglovesfun (talk) 12:04, 27 October 2009 (UTC)

  • I'm pretty sure that I've seen "sort|..." (or some such) in some templates on this wiki. SemperBlotto 12:13, 27 October 2009 (UTC)
    Oooh....that's a really good idea. We do have templates which utilize sortkeys, but they only apply to the specific categorization that that template is causing, so {{grc-noun}} has a sort key, which sorts ἀββα as "αββα" in Category:Ancient Greek nouns, however, it does absolutely nothing to its sorting in Category:grc:Aramaic derivations. Simply put, it's too tedious to have a sort key to be inputted in every single category, categorizing template, categorizing contag, etc. However, if we had a template like the French do, we could just put one and be done with it. Many languages could have a few rules written and get some bots on the task, and we'd be done in no time. I definitely support a BP thread on this. -Atelaes λάλει ἐμοί 13:30, 27 October 2009 (UTC)

There is {DEFAULTSORT}, which initially looks very useful, but applies to all the cats on the page that don't specify their own keys. The issue is that we have different languages on the page, and languages sort (e.g.) Latin diacritics differently.

So we would need to define some common default sort order. (Which is what the French have done implicitly, by simply using the letter sans diacritic as I understand.) Also we want to handle cases like resume and resumé which should not get the same sort key, right? They should sort together, but in a defined sequence.

Languages that "need" a sort order different from the common order would have to use the explicit keys.

So try writing a specification ... you'll find a pleasantly large set of interesting details to address (;-) Robert Ullmann 14:03, 27 October 2009 (UTC)
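
As a starting point for such a specification, here is a minimal sketch of one possible common default order. It is an assumption for illustration, not what fr.wikt or MediaWiki actually does: the primary key is the title lower-cased with combining diacritics removed, and the title itself serves as a tie-breaker so that resume and resumé sort together but in a defined sequence. The names base_key and sort_key are hypothetical.

```python
import unicodedata

def base_key(title):
    """Primary key: lower-case, with combining diacritics removed."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

def sort_key(title):
    # The original title is the secondary key, so titles that share a
    # primary key still come out in a defined order (plain form first).
    return (base_key(title), title)

titles = ["resumé", "Spain", "resume", "crème brûlée"]
print(sorted(titles, key=sort_key))
```

Letters that do not decompose (ø, ß and so on) pass through unchanged here, which is one of the details a real specification would have to settle per language.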

resume and resumé get the same sortkey on fr.wikt. I can't see what the problem with that is. As pointed out by SB, some templates allow doing it manually, some not. Mglovesfun (talk) 18:08, 27 October 2009 (UTC)
They won't sort in any consistent order. Which one appears first will depend on order of entry in the DB. Is that desired behaviour, or do you (we;-) want "resume" to sort first?
Another issue is that, so far as I can tell, there's no way for a template to determine what the default-sort is; so right now, most of our templates that support sort= sort by something like {{{sort|{{PAGENAME}}}}}, which completely discards any default-sort. (See {{infl}} for one example.) As far as I can tell, the only way to fix that is with something huge and ugly like {{#if:{{{sort|}}}|[[Category:wiki-text that evaluates to the right category name|{{{sort}}}]]|[[Category:wiki-text again]]}}. (Of course, we can wrap that in a {{cat-and-sort|cat=…|sort=…}} that handles both sort= and {{DEFAULTSORT}}; and for the long-term, we can submit a MediaWiki patch that would either (1) treat empty-sort-string as equivalent to no-sort-string, so that {{{sort|}}} would work, or (2) create a magic word that refers to the default-sort, so that templates can use that magic word instead of {{PAGENAME}} as the fallback.) —RuakhTALK 18:44, 27 October 2009 (UTC)
Indeed, {{context}} goes to some convolutions to avoid this. The entire {DEFAULTSORT} business needs work. But we might be able to do something useful with it as it is ... Robert Ullmann 23:40, 27 October 2009 (UTC)
Shall we take this to the Beer Parlour? Mglovesfun (talk) 10:57, 29 October 2009 (UTC)

Tbot mess with Chinese

Hi Robert,

This edit [12] created such a mess with this Chinese translation! Can this be stopped please? --Anatoli 18:20, 27 October 2009 (UTC)

sure. Don't use inappropriate piped links to redirect to pinyin w/o tones. Or don't link them at all, as transliterations are normally not linked. 23:43, 27 October 2009 (UTC)
Look, wikisyntax (the MW version) is not a formal language. You can't just cram syntax in in various places without breaking all sorts of things! The devs, and others, and I, try to make it as general as possible, but in the strict general case, it is not possible. Not just "we think it is hard", but provably impossible (yes: ! and !! and !!! ;-)
(you complained on BP because I didn't reply for a few hours? I was watching the kids from Brazil and Mexico play in Lagos. ... <grin> look: if it is that critical, send me an SMS +254 722 929 463. If not, wait a bit, eh?) Robert Ullmann 00:07, 28 October 2009 (UTC)
Sorry for taking it up to BP, it's not a UN tribunal yet, is it? :) I saw your edits today, thought you ignored the message, also wanted to get other users' opinion about this wikilinks in general. You haven't said whether you are going to do anything about it. I am worried about my Chinese translations and my work, that's all. Anatoli 08:15, 28 October 2009 (UTC)
(not yet ;-) You wrote the message here at 18:20 (9PM my time). I saw it at 3AM, the first time I was looking at the computer again ... I think people have come to expect that I am always here ;-)
The transliteration is a string of Latin-extended letters, not an arbitrary piece of wikisyntax. It is almost always unlinked, although there are some simple cases where links work correctly. E.g. in Japanese, where the tr parameter is usually tr=[[(hiragana)]], romaji. In this case the pinyin should not be linked, and in any case certainly not to the forms w/o tone diacritics. And certainly not one syllable at a time; if the entire term has a pinyin entry, it might be linked, but probably not. The entries need to be corrected. Robert Ullmann 15:36, 28 October 2009 (UTC)
That's true to a point — halting problem and all that — but it is possible to define a proper subset of wikisyntax that covers most translation-table entries and that Tbot knows how to handle properly, and Tbot can certainly be made to ignore or flag entries that don't conform to said subset. (I leave it to you to decide whether that's worth the effort. It doesn't seem like it should be so hard, but I've never actually looked at Tbot's source code. And I don't know how often it happens that Tbot currently borks things.) —RuakhTALK 18:05, 28 October 2009 (UTC)
How can I find which transliterations have been affected by Tbot? Robert, you still haven't explained what happened and whether you are going to stop it? The Japanese transliteration should not be touched either by bots. There could be just tr=Rōmaji; tr=Hiragana, Rōmaji, Hiragana, Rōmaji (where more than one reading exists). Anatoli 22:08, 28 October 2009 (UTC)
It's worse than that (cf. halting problem). The wikitext has no formal grammar. It is, as noted, possible to define a subset, and that subset has been defined for the tr= parameter to cover the Latin-ext strings and some well-formed simple wikilinks. Once that is exceeded, "flagging" etc is in fact a technically "hard" problem. The second part, "Tbot can certainly be made to ignore", is not only not "easy", it is provably impossible (!! yes, I know. !!!) The Japanese transliterations are fine (in the several cases used). We can go and find the cases where too much wikisyntax was wedged into the parameter, and remove the piped links; but there isn't anything to be fixed. Specific cases are not "hard", but the general case, lacking any formal grammar, is not possible. (and it is 3 AM again; I'll be happy to chase some things later, eh?) Cheers, Robert Ullmann 00:13, 29 October 2009 (UTC)
Re: "The second part, 'Tbot can certainly be made to ignore', is not only not 'easy', it is provably impossible": I really disagree. I think you're over-thinking this. I'm not saying, "Tbot can decide correctly, in all cases, whether it is performing the right transformation, and if not, it can back off"; as you say, that is impossible. (Or at least, I'm quite willing to take your word that it is; I count 49 possible tokenizations for }}}}}}}}}}}}, and not one of them makes me want to write a lexer, let alone parser, to test this out for myself.) Rather, I'm saying, "There are cases when Tbot can determine with confidence that it is performing the right transformation, and if you decided to, you could make Tbot behave conservatively by performing no transformation in cases where it isn't confident." (Again, whether that's worth the effort, I don't know, because I don't know how much effort it would be, nor how many lines are currently being handled properly that would cease to be touched at all, nor how many lines are currently being borked. But it's definitely possible. Heck, one could argue that I'm currently running a very conservative version of Tbot. ;-)   —RuakhTALK 02:01, 29 October 2009 (UTC)
"confidence" isn't in it. The bot is totally confident in what it is doing, it is just following its rules ... ;-) To be less flip: the bot, unlike humans, never looks at the results and says "hmmm, that doesn't look right". (And humans don't do that nearly enough!) (And yes, you can always be "valid" be doing nothing; but it is trying to do something, eh?) I'm sure you've probably seen this (written, note, about humans): (<- margin)
  1. there are things you know, and know that you know,
  2. things that you don't know that you know,
  3. things that you know that you don't know,
  4. and things that you don't know that you don't know

If a grammar is completely specified in a formal system (as, say, the C language), then code parsing it can reduce every bit of syntax to (1) and (3). A given input is valid or not, and a (correct) C compiler will produce correct output from valid input, and reject invalid input.

The wikitext parser syntax is not only not specifiable by a formal syntax, it is a very, very long way away from it. It is designed to allow humans to do simple to moderately complex things without writing a lot of extra syntax.

The result is that in any code, other than the MW parser itself (and not even there, as it is not possible in present practice to isolate the parser; there have been suggestions of untangling it ;-), there is a s--tload of stuff that falls into (2) and (4) above.

In this particular case, the code has a string:

{{t.|.....}}

(. is any character, for various definitions of any, in this case not containing "}", and the first being +, -, ø, or nil) and is looking for the tr parameter:

{{t.|...|tr=...|...}}

which is expected to be a string of Latin-ext characters. It matches "tr=" up to "|" or "}", for "any character" not "|" or "}". with expected input, that will work fine. The Japanese case, linking the kana form, happens to work under (2) above. If I "fix" the case of tr containing [[...|...]] by excluding "[" and "]" then it breaks the Japanese case, which turns out to be very useful, if not anticipated.
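
To make the matching described above concrete, here is a minimal sketch of that kind of rule. It is a reconstruction from the description only, not Tbot's actual source; TR_PARAM and find_tr are hypothetical names, and the sample translations are made up for illustration.

```python
import re

# Assumed pattern, not Tbot's real code: "tr=" followed by any run of
# characters other than "|" or "}".
TR_PARAM = re.compile(r"\|tr=([^|}]*)")

def find_tr(template_call):
    """Return the tr= value as this naive matcher sees it, or None."""
    match = TR_PARAM.search(template_call)
    return match.group(1) if match else None

# Expected input: a plain Latin-ext string.
print(find_tr("{{t|ru|слово|n|tr=slovo}}"))              # slovo
# The Japanese case happens to work: no "|" inside the link.
print(find_tr("{{t|ja|言葉|tr=[[ことば]], kotoba}}"))      # [[ことば]], kotoba
# A piped link contains "|", so the value is cut short.
print(find_tr("{{t|cmn|你好|tr=[[ni|nǐ]][[hao|hǎo]]}}"))   # [[ni
```

The third call shows the failure mode under discussion: the matcher stops at the pipe inside the link and hands back a fragment of wikisyntax as if it were the transliteration.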

This happens, and will continue happening, as people will insist on cramming wikisyntax into places where it appears to work. (And then, inevitably, say "it just shouldn't be touched" ;-) Sorry, but the whole point is "touching" the {t} templates, so they have to parse, and parse with something well shy of the 10's of thousands of lines of idiosyncratic code in the MW parser. (!)

In this case, we can "allow" the piped syntax (although, as noted, it shouldn't be used anyway), which involves a largish chunk of code; flag it out (not too hard to identify this one case), or just not do it ... there are many thousands of ways that imaginative users can bork just about anything by cramming in syntax ...

a so simple header

Syntax can be perfectly legal, and completely unreasonable to deal with when doing ordinary things.

And it can be perfectly legal according to any rational grammar, and bork the MW parser:

globalWrapper[edit]

Erm, fontsize? (and the first example lacks an [edit] link, another parser error)


nevermind.

I can certainly look for some more syntax problems, and will do. Then need a tag and cleanup cat. Robert Ullmann 16:35, 29 October 2009 (UTC)

This paragraph:
This happens, and will continue happening, as people will insist on cramming wikisyntax into places where it appears to work. (And then, inevitably, say "it just shouldn't be touched" ;-) Sorry, but the whole point is "touching" the {t} templates, so they have to parse, and parse with something well shy of the 10's of thousands of lines of idiosyncratic code in the MW parser. (!)
sums up the issue. People cram wikisyntax into places where it does work in the currently deployed version of MediaWiki. They don't know that the {t} templates were designed to be updated by Tbot, and even if they do, they have no way of knowing what Tbot will and won't be able to handle. And you can hardly blame them, seeing as your whole argument is that Tbot itself doesn't know what it will and won't be able to handle. ;-)
"Look[ing] for some more syntax problems" is not a bad idea; but it's also possible to take the opposite approach, and write a fairly simple am_I_sure_that_Tbot_can_handle_this_line_properly predicate, and flag or ignore every {t}-call (or apparent {t}-call) for which it returns false. That is, rather than looking for specific problems, you can look for the guaranteed absence of any problems.
RuakhTALK 17:45, 29 October 2009 (UTC)
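
For illustration, a conservative predicate of the kind proposed above might look like the following sketch. Everything in it is an assumption: the whitelist (plain words, optional gender, sc=, and a tr= that is either plain letters or a single linked word followed by plain letters) is invented for the example and is not the subset Tbot actually accepts.

```python
import re

# Whitelist building blocks (hypothetical): words made of letters only,
# separated by a space, apostrophe or hyphen.
PLAIN = r"[^\W\d_]+(?:[ '-][^\W\d_]+)*"
# One simple link followed by a plain string, e.g. "[[ことば]], kotoba".
LINKED = r"\[\[" + PLAIN + r"\]\], " + PLAIN

SAFE_T = re.compile(
    r"\{\{t[+\-ø]?\|[a-z-]+\|" + PLAIN
    + r"(?:\|(?:[mfncp]|tr=(?:" + LINKED + "|" + PLAIN + r")|sc=[A-Za-z]+))*"
    + r"\}\}"
)

def sure_tbot_can_handle(call):
    """True only when the whole call fits the whitelist; otherwise skip or flag."""
    return SAFE_T.fullmatch(call) is not None

print(sure_tbot_can_handle("{{t+|ru|слово|n|tr=slovo}}"))            # True
print(sure_tbot_can_handle("{{t|ja|言葉|tr=[[ことば]], kotoba}}"))     # True
print(sure_tbot_can_handle("{{t|cmn|你好|tr=[[ni|nǐ]][[hao|hǎo]]}}"))  # False
```

The trade-off is the one named in the thread: anything outside the whitelist is left alone, including calls that could have been handled correctly.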
no, it is not possible. Really: NOT POSSIBLE. All we/I can do is write more rules to handle more cases. The MW parser can't even handle the supposed MW syntax! (if you think it is possible, write it. And I'll look forward to your paper in JACM upending 30+ years of theory and practice :-) I could prohibit every possible use of any wikisyntax character or meta character, and that will in turn break lots of stuff that works. Is that the desirable outcome? Robert Ullmann 00:45, 30 October 2009 (UTC)
Re: "I could prohibit every possible use of any wikisyntax character or meta character, and that will in turn break lots of stuff that works": Bingo. You know, you're a really smart guy, but sometimes it takes you an insanely long time to understand very simple points. Sheesh. (And sorry, but you don't need to prohibit every possible use of metacharacters: if there are specific uses of metacharacters that you do want to support, you can add those to the predicate, instead of trying to remove support for specific problematic cases. It's not possible for the latter approach ever to be complete, which means that as long as Tbot follows that approach, it always risks borking things.)
Re: "Is that the desirable outcome?": As I said in both of my first two comments above, I leave that to your judgment. There's a trade-off here: there's no way to handle all translations you possibly can while flagging/ignoring all others. Any step in the direction of flagging/ignoring more translations is going to entail flagging/ignoring some translations that Tbot could actually have handled. And there's also a trade-off in that each additional bit of programming effort is likely to have a diminishing marginal return. I most certainly am not telling you how you should handle this, nor how you should run your bot. But when your bot makes mistakes and borks things, you can't pretend that no judgment was involved here. You chose to write your bot in a way that could bork things; that may well have been the right choice, and I'm not criticizing you for it. But I think you need to acknowledge it, and when specific instances of borking occur and are brought to your attention, I think you should be open to re-evaluating that choice. The longer the {t} templates exist and the more widespread they become, the more kinds of random wikisyntax are going to be introduced. Back in the day, Tbot was the only thing adding {t} to entries, and a choice that made sense back then might — might — eventually stop making sense.
RuakhTALK 01:44, 30 October 2009 (UTC)
I seem to take an "insanely long time" because other people (like you) tend to hand-wave problems that are either not simple, or actually not possible (in the latter case there may be, and usually is, a good approximation but it can take a lot of work to get to). Creating something like mbwa.py (Interwicket) to do 15-20K edits a day without borking everything takes some careful attention to things most others wouldn't bother with.
So what are wikisyntax characters? { [ ] ] | * : ; ' # < > and all sorts of special cases (like RFC 1707 which turns into a link by magic, also http://foo.bar, and that will change the semantics of [. Oh, and I forgot line breaks.) The simple predicate is too far from the desired result. (when did I say judgement wasn't involved?) My point is that people have to be conservative in what they cram into various places, and that isn't particular to Tbot, we do that everywhere.
Patient: Doctor, it hurts when I push a spoon up my nose.
Doctor: Then don't do that!
Patient: But then how do I get the beans out of my nose?
Sometimes the answer is: Don't Do That.
And see below Robert Ullmann 14:10, 31 October 2009 (UTC)
I am lost in this discussion, although I haven't spent a lot of time trying to decipher it. I understand that the issue remains and Tbot will continue to bork Chinese translations, am I right? I was trying to find more translations stuffed by Tbot but couldn't find any. It may not be easy.
Robert, I am also interested in the questions I asked before - does Tbot attempt to create Chinese/Mandarin entries from translations? I haven't seen any. No Arabic either. Japanese entries are based on Kana, haven't seen any Kanji entries. I see Russian entries now get created, thank you for looking into this. I didn't have much chance to clean them myself but I see Stephen Brown and Wanjuscha have been doing some work there. Anatoli 21:36, 29 October 2009 (UTC)
don't worry about it, it is fairly (extremely? ;-) arcane. I have modded tbot to catch cases, and know how to hunt other cases.
the problem with adding translations is that "zh.wikt" (Mandarin) is very thin on Chinese words/terms themselves, so not much is found. Robert Ullmann 00:45, 30 October 2009 (UTC)

Also: right now net access is down, I am using Safaricom GPRS bypass for the moment, but don't know when full service will be back. It is 3:37 AM here. Robert Ullmann 00:45, 30 October 2009 (UTC)

Oh, no need to reply so early in the morning. Answer when you can, I'll wait! So, Tbot is relying on zh.wikt to create new entries? It's a pity. It can't verify correctness or match to the English meaning, anyway. This explains why Arabic doesn't get created, does it? Anatoli 01:57, 30 October 2009 (UTC)
Yes. Initially it wasn't doing any cross checking (and not stealing WP links and pictures, etc). It was generating too many bad entries from bad translations table entries. FL wikts that are thin on words in that language with the English translations don't get a lot of entries generated. The Japanese entries where there is kanji and kana and rōmaji in the translation table don't get created because of a simple bug I need to fix (;-). (the bug is simple, the fix isn't)
Tbot is catching the error cases that cause one subroutine to mis-parse (this is the initial problem stated here), look at Category:Entries with t template problems (a mouthful, but it is easy to change that cat name if anyone wants to). It has picked up several things, I fixed a couple but left the others so you can see. Robert Ullmann 14:10, 31 October 2009 (UTC)
And it will find all the existing cases, sooner or later. Robert Ullmann 14:30, 31 October 2009 (UTC)

Tbot and Armenian

Hi. I was wondering if Tbot can make these kinds of improvements automatically. Armenian looks really bad without its script template enforced. Also, sometimes I find xs=Armenian in translation tables. What is it for and is it really necessary? --Vahagn Petrosyan 14:13, 29 October 2009 (UTC)

xs= is an optimization to reduce the number of individual template calls. Armenian was not in the {t-sect} table(s) until recently, Tbot is now removing xs= in that case (slowly, it keeps minor edits to a minimum, editing just to remove un-needed parameters is mostly avoided).
Converting to the template (as in your example) works if Tbot can match the transliteration; if not, it suspects it might be a qualifier or whatnot, and won't convert. I'm not sure how effective the match is for Armenian. Um, doesn't have the table for Armenian, will fix that ;-). It also wants to see the word exist either here or in hy.wikt, so it isn't bogus. Robert Ullmann 16:55, 29 October 2009 (UTC)
See this edit ;-). And here. I also added the tables for Devanagari and Georgian, and both of those appear to work as well. Robert Ullmann 13:45, 31 October 2009 (UTC)
Thanks. Works nice. Another request: can Tbot do this? No one is going to look for Old Armenian under "O", so I move xcl translations under hy manually. Also, sc=Armn is not necessary for xcl anymore. --Vahagn Petrosyan 18:06, 31 October 2009 (UTC)
The question of which language name to "group" was/has never been sorted. If we had some basic convention, AF could do that. If the sub-line is a full language name, you should use ** instead of *: (that way s/w looking for language can look for * or **, the lines with bullets). Tbot knows that xcl defaults to sc=Armn, it'll elide it eventually. Robert Ullmann 15:32, 2 November 2009 (UTC)
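
A small sketch of why the ** form helps software, under the assumption that a language line is one or two bullets, a name, then a colon. The pattern, the function name and the sample table are hypothetical, not taken from AF or Tbot.

```python
import re

# Hypothetical pattern: one or two bullets, an optional space, a language
# name, then a colon. A "*:" continuation line does not match.
LANG_LINE = re.compile(r"^(\*{1,2}) ?([^:*\n]+):", re.M)

def language_names(wikitext):
    """Return the language names found on bulleted translation lines."""
    return [m.group(2).strip() for m in LANG_LINE.finditer(wikitext)]

table = (
    "* Armenian: {{t|hy|բառ}}\n"
    "** Old Armenian: {{t|xcl|բառ}}\n"
    "*: Old Armenian: {{t|xcl|բառ}}\n"
)
print(language_names(table))
```

Only the first two lines are picked up; the sub-line written with *: is invisible to a scan for bulleted language lines.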

Français newspapers

Hello again, can I request off of you a rerun of /Français? These kind of lists are super to hunt down missing French words. I'd like to see how much smaller these lists will be, since the previous ones and since DRB has been generating thousands of extra entries. And in the older pages, there's mostly just difficult/boring words left. And maybe an update of User:Rising Sun/French verbs needing conjugation, but this time can you ignore any multi-word entries, and ignore anything missing with the {{infl}} instead of {{fr-verb}} ('cause they both do the same thing). --Rising Sun 19:53, 1 November 2009 (UTC)

  • Ditto Italiano - maybe once in a while rather than regularly? SemperBlotto 20:00, 1 November 2009 (UTC)
A fine idea, I'll run it now and see what "they" have broken since the last time (these papers are constantly changing format ;-) Robert Ullmann 15:33, 2 November 2009 (UTC)
I did run it, but some of the pages have a lot of junk. Apparently hyphenations are done differently.
Will run the French verbs again now. Robert Ullmann 12:34, 19 November 2009 (UTC)

Wrapping tables in divs

To keep tables created with {{top2}} from messing with RHS elements, I thought of wrapping them in div tags. Conrad said that aside from editing the templates to add this (because {{bottom}} might be used to close other tables besides {top*} ones), "[t]here's probably some way to bully CSS into treating the table like this directly". Do you know if that's possible? If not, do you think it wise to add a div tag to {top*} and {bottom*}? Thanks. --Bequw¢τ 21:02, 6 November 2009 (UTC)

Message for you @ the Info Desk

Hi Robert, could you please check out the topic entitled "Chinese attention tags on Cuniform characters?" @ the Info Desk? We could use your help. :) Cheers, Tooironic 06:57, 9 November 2009 (UTC)

sorry for the delay, your note here didn't indicate that it was an AF bug. Sounded more like just something of interest I should look at, rather than a problem that needed to be fixed. Robert Ullmann 12:32, 17 November 2009 (UTC)

Esperanto forms

Robert,

What do you think of the code for {{eo-form of}}? This is the template to be used in Esperanto form-of entries by the proposed DarkIceBot. The code looks rather server-intensive to me. --EncycloPetey 02:15, 16 November 2009 (UTC)

Not too bad. It is long, but straight-line conditionals. The problems come with nested templates that iterate switches and things, so that one ends up expanding hundreds of paths to end up with a word or two. This uses some longish switches just once to generate each bit, and the template is itself used only once (I think always, I don't think Esperanto has multiple forms with the same spelling.) per entry.
Doc should be moved to the talk page of course, and could use more explanation, although it is probably just fine for those who know some of the language. (I know a few words ;-) Robert Ullmann 10:41, 16 November 2009 (UTC)

Removed substub from Han/6600

Just an FYI that I removed the literal use of {{substub}} from your Han dump User:Robert Ullmann/Han/6600, as part of cleaning out Category:Section stub. Trust this is ok, but wanted to advise you when treading in your userspace!

—Nils von Barth (nbarth) (talk) 05:36, 16 November 2009 (UTC)
Thanks, is just fine. The code should have elided them anyway. (More recent code I have written for similar things makes sure it doesn't copy templates.) Robert Ullmann 10:43, 16 November 2009 (UTC)

About Interwicket on es.wiktionary.org

Hi Robert,

I have found that apparently the bot Interwicket has created a new entry on es.wiktionary.org [riesgos] that is almost empty, regards --Cvmontuy 20:28, 17 November 2009 (UTC)

It was deleted while the bot was in the middle of the edit (with the re-write queued). I should probably look into the options for no-create. (there is something about create and re-create page) In this instance, just delete it again. Robert Ullmann 12:27, 19 November 2009 (UTC)

French nouns

Hi Robert. Thanks for the recreation of the French verbs page. Can I please request a couple of others, User:Rising Sun/French nouns and User:Rising Sun/French adjectives which would contain French nouns/adjectives lacking {{fr-adj}} and with explicit category link, and any Tbot entries. The adverbs and proper nouns and other parts of speech are not so interesting for me. --Rising Sun 21:05, 19 November 2009 (UTC)

Yes, good idea. Can't promise delivery time ;-) Robert Ullmann 23:48, 19 November 2009 (UTC)

Carsrac at nl

Hi Robert,

I blocked Carsrac's bot at nl, because it was adding false iw's. Now Carsrac is accusing your bot of deleting links for no good reason. I think it was merely cleaning up his mess, but I'd appreciate if you would check to be sure. nl:Gebruiker:Jcwf Jcwf 00:22, 25 November 2009 (UTC)

Thanks for your remarks Robert. I read the meta page and was wondering if something about the differences between Arabic and Persian ya's and kafs should be added. That's another really tricky thing that I feel I only partly understand. It is at least as troublesome for iw's as the apostrophe question. Jcwf 01:15, 25 November 2009 (UTC)
If Carsrac bot had been using the -wiktionary mode flag as required to operate on NS:0 in the wikts, it would not have added improper links. So it isn't/wasn't even following the proper instructions for the code it was running.
We should look at some of the rules as well; at present Interwicket is following the existing rules (as it must, unless we were to stop using interwiki.py on all the wikts). The present method does allow each of the wikts to have their own rules about what is distinguished and what is redirected; this is generally a good thing. Robert Ullmann 15:08, 25 November 2009 (UTC)
Oh dear, it has been making a mess: look at this edit; lots of stuff like that. Interwicket is systematically cleaning it up. It found and fixed hundreds of bad edits on Monday morning. Robert Ullmann 15:35, 25 November 2009 (UTC)

missing forms

User:Robert Ullmann/Missing forms/English is from May. Is it time for a refresh? RJFJR 20:13, 25 November 2009 (UTC)

Tbot

An incongruous photo, yay for polysemy. Conrad.Irwin 21:09, 27 November 2009 (UTC)

indeed, the sw entry needs a bit of work, and then the pix is fine ... I especially liked the Swedish here ;-). There have been a few other serious funnies, hence the tags to check them ... Robert Ullmann 23:20, 27 November 2009 (UTC)

Deleting Special:PrefixIndex/Template:anagrams

There are 4308 templates created for the original format proposal for the anagrams bot, do you still have your politely delete stuff script, or shall I roll my own (either politely, or quickly)? Conrad.Irwin 00:20, 6 December 2009 (UTC)

I have the code. The actual delete part is fairly simple, 95% of the code has to do with finding the correct tasks and checking for links.
We might just task it with this set of entry names? See request below to run it again. Robert Ullmann 12:30, 10 December 2009 (UTC)
Well, it isn't quite that simple, but not very hard either. Shall I try it? Robert Ullmann 13:32, 10 December 2009 (UTC) (probably just run a hacked up version ;-) Robert Ullmann 13:42, 10 December 2009 (UTC)
Yes please. I seem to remember you had clever rate-limiters in place, but it doesn't matter hugely either way. If it's too much hassle, I'll happily do it myself. Conrad.Irwin 13:57, 10 December 2009 (UTC)
Will do. Will be mixed in with other deletes to some extent (so limit applies to all ;-). But will run. Robert Ullmann 06:22, 11 December 2009 (UTC)
Thanks again. Conrad.Irwin 13:03, 11 December 2009 (UTC)

Chinese categories

Since you were involved with them a lot, I'm sure your expertise would be appreciated at Wiktionary:Beer parlour#Chinese categories. Thanks. --Bequw¢τ 00:21, 6 December 2009 (UTC)

CS redirects

Is User:Robert Ullmann/CS redirects with links still active? It didn't appear to be (so I made my own, poor imitation, lists). If it isn't, could it be restarted? --Bequw¢τ 18:44, 7 December 2009 (UTC)

I ought to look at that. It might very reasonably be run again. (I've been ill for a few days, not spending much time on-line, so there are a couple of other things above I should be chasing too. ;-) Robert Ullmann 15:48, 9 December 2009 (UTC)
Not a priority of course, but thanks! The #'s dropped from 40k to 4k so maybe we'll be able to scratch it off the list soon. --Bequw¢τ 15:25, 10 December 2009 (UTC)
Also, could add to your exception list the German word frequency lists and the Surname/Given name lists? I think it would be better if those were red-links so that editors know at a glance that the entry is missing. The JS redirect will act like the hard redirect for the normal user. --Bequw¢τ 19:50, 10 December 2009 (UTC)
Seems to me I recall the thinking was the other way around the last time; but this makes sense. If the redirect is left, someone looking at those lists won't know it is missing. Will change that. I'm also going to dump all the 2008 reports, they are not useful, the process will find everything anew. Robert Ullmann 05:37, 11 December 2009 (UTC)

Ullmannbot for admin?

Have you ever thought of nominating a bot for temporary admin status to be able to delete unwanted pages/templates/categories? I'd support it. Mglovesfun (talk) 11:05, 12 December 2009 (UTC)

See User:Robert Ullmann SysopBot, I think the reason it wasn't used was that it didn't hide the log actions from Special:RecentChanges anyway (but that's ye olde historey now, so it might be worth retrying). Conrad.Irwin 12:52, 12 December 2009 (UTC)
And see User_talk:Robert_Ullmann/2008#Funny,_you_don't_look_like_a_bot. Dvortygirl temporarily made me a bot, and we tested it; the deletes did not appear in RC, but still flooded the deletion log (of course). One reason it wasn't done then (for CS redirects) was to give them a little visibility; in some cases the conversion script was wrong, and the correct action would be to move the entry back, and delete the other redirect. Some of those were caught in RC. Of course this doesn't apply to the other stuff it is doing now, mixed in its queue with CS and other redirects. What trouble we want to go to may depend on whether it is really irritating anyone? Robert Ullmann 13:45, 12 December 2009 (UTC)
Also, if I delete some anagram templates, does that do any harm? Mglovesfun (talk) 09:40, 16 December 2009 (UTC)
No, won't do any harm, it will just go on to others. Robert Ullmann 10:43, 16 December 2009 (UTC)

Entries with nonstandard headers[edit]

Apologies for asking you a question and forgetting to check for a reply, but I suggested that we should eliminate the name of the header in brackets and get them all in one category. Going through Special:UnusedCategories, there are about 30 or so that are unused. Mglovesfun (talk) 22:29, 20 December 2009 (UTC)

They all go into the "parent" cat now. Connel added a second category, with the header in parens, to help find groups of them. Feel free to remove that cat from the template ({{rfc-header}}). (I don't think the intent was ever to have the cats themselves exist, but someone else created a number of them.) Robert Ullmann 15:52, 21 December 2009 (UTC)
Note Category:English circumfixes, circumfix should be a valid header, note we don't have Category:English confixes. Mglovesfun (talk) 18:29, 21 December 2009 (UTC)

Raw analysis data with RSS feeds[edit]

Hi Robert. I just had the idea that it would be really nice if you could publish some of your analysis data such as User:Robert Ullmann/Trans languages in a raw format, such as tab-delimited UTF-8 text, and create an RSS or Atom feed for each one. I know I can scrape and process the data right from the wiki page but a format intended to be authoritative and machine readable would be even better. — hippietrail 05:27, 22 December 2009 (UTC)

RSS or Atom would be huge overkill; even some XML is too much. As you say, tab-delimited UTF-8 text? Convert the ||'s to tabs and you're essentially there. (I hope you don't mean scraping the rendered page? Ugh.) I often read some of these tables into other programs. To use this as an example:
import re
import wikipedia  # the pywikipedia framework module

site = wikipedia.getSite("en", "wiktionary")
page = wikipedia.Page(site, "User:Robert Ullmann/Trans languages")
text = page.get()

reline = re.compile(r'^\| ?(.*?) \|\| ?(.*?) \|\| ?(.*?) \|\| ?(.*?)$', re.M)

for codes, lang, occurs, examples in reline.findall(text):
    print repr(codes), repr(lang), repr(occurs), repr(examples)
    # whatever else ...

Or you could skip all lines that don't contain ||, take the "| " from the front, replace || with tab, then tab-space and space-tab with tab. That gives you the tab delimited format.

for line in text.splitlines():
    if '||' not in line or line[:2] != '| ': continue
    line = line[2:]
    line = line.replace('||', '\t')
    line = line.replace('\t ', '\t')
    line = line.replace(' \t', '\t')
    # and put line wherever

(all code tested ;-) Robert Ullmann 12:34, 22 December 2009 (UTC)
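
For readers without the framework installed, the second approach above can be sketched as a self-contained function. This is only an illustration in Python 3, not the code that was run for the reply; the sample table text and the name table_to_tsv are made up for the example, and it strips whitespace around each cell rather than doing the three replace calls.

```python
# Hypothetical sample in the same wikitable row layout; not real data from the page.
SAMPLE = """{| class="wikitable"
! Code !! Language !! Occurs !! Examples
|-
| xx || Examplish || 12 || [[foo]], [[bar]]
|-
| yy || Sampleese || 3 || [[baz]]
|}"""


def table_to_tsv(text):
    """Turn '| a || b || c' wikitable rows into tab-delimited lines."""
    rows = []
    for line in text.splitlines():
        # Data rows start with '| ' and contain the '||' cell separator;
        # table open/close, header rows and '|-' row breaks are skipped.
        if not line.startswith('| ') or '||' not in line:
            continue
        cells = [cell.strip() for cell in line[2:].split('||')]
        rows.append('\t'.join(cells))
    return rows


if __name__ == "__main__":
    for row in table_to_tsv(SAMPLE):
        print(row)
```

Piped links such as [[a|b]] inside a cell contain only a single |, so they are not split by this approach.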

Bot question[edit]

Hello Robert Ullmann, please can You check, why Interwicked did remove a valid link here and repair it, thanks, --birdy (:> )=| 21:44, 27 December 2009 (UTC)

"sh" is an illegal code, deleted by ISO and every single international and national authority as a furtherance of the Balkans genocide. It is only now being promoted by a destructive troll here, whose "proposal" to delete the standard languages, particularly Croatian, in response to being blocked from the Croatian projects (for more than sufficient reason), was firmly rejected by the community (vote July/Aug 09). Interwicket isn't looking for the invalid and illegal codes; it just uses the "framework" code to replace the iwikis in an entry with valid codes.
The failure of the Foundation and stewards etc to enforce community decisions (as in the vote referenced) is a severe problem, the encouragement by the stewards of people vandalizing content in direct violation of community decisions is another severe problem. The failure of the stewards and the foundation to comply with international law despite repeated complaints is a severe problem.
Birdy: at long last: can some ADULT look at this and, somehow, allow us to restore the valid languages to our database, which, as has been repeatedly pointed out, is, or ought to be, the most valuable lexicographic resource in the world. IF we are permitted to represent reality. Robert Ullmann 23:21, 27 December 2009 (UTC)
Hello Robert, I just saw by chance what avalanche was caused by my simple message. To me my time is too precious to read all of it.
My statement is clear and simple: as long as sh.wikt exists under this domain/code it will be linked with this code, if it is a wrong one they can change it via bugzilla, the bots will then correct the links to it afterwards without problems (like removing links to tlh.wikt which was closed, the bots have no problems doing such things). Best regards, --birdy (:> )=| 13:38, 30 December 2009 (UTC)
(In the meantime, I and many others do have a responsibility to all of the projects to represent reality. If you want to know what the vandal(s) thought about SC before deciding to become the 2009 Wikimedia troll(s), thus abruptly ending their professional career(s), read here. It really is fairly tragic, but I place the blame fairly squarely on the people in the foundation who refused to take any responsibility whatsoever before we reached the point of ending the vandals' career prospects to terminate their misguided vandalism.) 23:21, 27 December 2009 (UTC)
Your bot happily worked with sh code for Serbo-Croatian wikiprojects for years after which you deliberately modified to explicitly remove it. You intentionally crafted destructive behavior into your bot and propagated it to all Wiktionaries it was granted bot flag under some silly self-righteous cause of "fighting Serbian nationalism", which came out of the blue half a year ago, and apparently as an act of "revenge" against me.
sh is perfectly valid code that will never ever mean anything else beside Serbo-Croatian (2-letter codes are not assigned anymore). If there is any problem with it, we can always use "macrolanguage" hbs code which is also perfectly valid. This has been explained to you many times. And anyway, it's not up to you decide which language projects are "valid" or not. (Especially not with languages you have absolutely no clue about).
Let's be honest Ullmann - you're unilaterally committing large-scale vandalism. --Ivan Štambuk 23:56, 27 December 2009 (UTC)
This from someone who continues to intentionally violate the vote taken in July/August by vandalizing content? At long last sir, have you no shame whatever? No one is taking revenge against you, and it certainly has utterly nothing to do with me, it is you who gets into trouble everywhere you go. (As has been solidly documented, Stambuk was virulently anti-SC right up until the day he was blocked from hr.wikt, and is using it only as a vehicle to punish the "Croatian Nazi-pedia" by deleting Croatian here.) It is time you learned that Wikimedia is not a place to work out your emotional problems; we are not counsellors, and all of your abuse is on the permanent public record. Robert Ullmann 09:27, 28 December 2009 (UTC)
No, I'm not violating any vote. That vote failed meaning it can't enforce anything. And it "failed" only because you canvassed countless non-contributors to vote against due to the lack of proper voting policy (which you also sabotaged) here; as analysis shows, almost all of the active Wiktionary community supports the unification effort, overwhelmingly so of those who are actually familiar with any Slavic language.
Again Ullmann you do pathetic ad hominems when confronted with irrefutable evidence. What does my personal opinion of any topic x years ago has anything to do with this matter? Your bot is committing vandalism, you intentionally crafted malicious behavior into it (which it didn't exhibit before). Its destructive behavior was not sanctioned by any of the Wiktionaries the Interwicket operates on. Given that the Interwicket operated perfectly valid for some time with Serbo-Croatian interwiki, and that this disruptive behavior came only after your efforts to abuse me failed (dirty e-mails to Wiktionarians, Wikmedia board, ludicrous attempt to desyop me at meta), the only reasonable conclusion that establishes itself that you did it as a sign of "revenge" against me. --Ivan Štambuk 05:44, 30 December 2009 (UTC)
The vote was seeking permission to delete the standard languages and forcibly merge them into SC; that permission was denied. You looked for community approval of what you were doing, and were told No. That could not be clearer. Trying to project your violations onto others is predictable, and shows that you do understand that your behaviour is unacceptable, and that it is intentional. And you don't know what "ad hominem" means, but then most people don't; I am criticising your behaviour, not trying to undermine your position by attacking your character (you might look it up, there's a dictionary around here somewhere ;-) I agree that it can be confusing when criticism of your unacceptable behaviour is intermixed with discussion of issues and technical problems. But you might note that I do not call you names, I do not say you have "no clue", I don't call your emails "dirty", or your actions "ludicrous", (and so on) I only call you out for that personal abuse. Robert Ullmann 12:20, 30 December 2009 (UTC)
Actually by my count, most (>80%) of the active Wiktionarians agreed with the merger. The only ones who didn't were your puppet votes (Amgine & Co.) who have absolutely no knowledge on Slavic languages whatsoever, and other ones that you canvassed via IRC and e-mail in order to "revenge" against me, as it turned out that I had overwhelming support despite your disgusting "genocidal Greater Serbian nationalist" propaganda.
The vote was merely a codification of > 4 months of effort of Serbo-Croatian contributors, which was initially announced at the Beer Parlour and complained about by no one. The only "issue" with it and "vandalisms" are in the head of Robert Ullmann and his puppet-troll Amgine. Ullmann is strongly suggested to terminate his line of ad hominems and focus on the issues raised. In particular, self-victimization of being "attacked" by me whilst simultaneously emitting large amounts of FUD and lies. One should merely look at this very section: Ullmann is asked a simple question by an Icelandic contributor, and Ullmann's reply contains my personal name, epithets such as "destructive troll" (and this troll is the most productive Wiktionarian for the last several months, mind you), terms such as "genocide" etc. And he dares to attack me of "abusing" him! --Ivan Štambuk 12:50, 30 December 2009 (UTC)
Robert, bots are only allowed to implement community consensus. Unless you can show that the Icelandic Wiktionary has reached consensus to not link to http://sh.wiktionary.org entries you must undo your changes. The same goes for any other language edition where Interwicket works. You are free to bring the issues of Serbo-Croatian (and its deprecated ISO 639-1 code sh) up in all the wikis you wish and argue for your point, but do not enact changes unilaterally through a bot. --Bequw¢τ 01:16, 28 December 2009 (UTC)


As may be seen at the Library of Congress ISO 639-2 list, sh is not a valid entry. You will also note that "Serbo-Croatian" is not listed even as a macro language under that name. The exact text of the deprecation of this term is as follows:
This code was deprecated in 2000 because there were separate language codes for each individual language represented (Serbian, Croatian, and then Bosnian was added). It was published in a revision of ISO 639-1, but never was included in ISO 639-2. It is considered a macrolanguage (general name for a cluster of closely related individual languages) in ISO 639-3. Its deprecated status was reaffirmed by the ISO 639 JAC in 2005.[1]
However, the Wikimedia Foundation Board has specifically approved the re-opening of sh.WP (Request for new language SH) even though it had been previously closed for lack of activity. I believe this would suggest sh interwikis should not be removed. - Amgine/talk 01:33, 28 December 2009 (UTC)
(Do note that was May 2005, the national and international issues were just being resolved; the decision was fairly bad then, but now entirely wrong; SC has been recognized by everyone—including Štambuk—as temporary linguistic nonsense, coded only because Yugoslavia, not Serbia, Croatia etc, was the representative to ISO. None of which is exactly relevant here.) Robert Ullmann 09:27, 28 December 2009 (UTC)
Language codes used by Wikimedia projects in no way must reflect "official" language codes by ISO or any other similar organization. They are usually assigned from ISO 639- namespace, but in many occasions they're not. Simple ISO scheme will never be sufficient for lexicographical purpose of describing all of world's languages in all periods, and our list of exceptions will only grow larger in the course of time.
It is meaningless to argue on the particular case of Library of Congress whether SC is one language or not (although it's amusing to see you Amgine still trying to do so, on the basis of empty contextless quotes): FYI, Library of Congress switched their bibliographical tags for SC only after 18 years of lobbying of Croatian nationalist diplomacy, and they did not do it retroactively (meaning that tens of thousands of books tagged with "deprecated" scr/scc tags will not be reassigned to new separate tags, meaning that 99% of SC literature that is of any worth whatsoever will forever be tagged as scc/scr in their catalogs). --Ivan Štambuk 05:13, 28 December 2009 (UTC)
And the same time the national libraries and standards authorities requested that "scc" be redefined as Serbian in Cyrillic script and that "scr" be redefined as Croatian in Roman script, and ISO and LoC as 639-2 RA concurred [13] (and deprecating both in favour of the T codes sr/srp and hr/hrv, the ones we generally use). So there is in fact no remaining material tagged as "Serbo-Croatian". Robert Ullmann 12:20, 30 December 2009 (UTC)
Yes but that "redefinition" is retarded (as Croatian official Mate maras who supervised this futile campaign notes) because it would tag numerous works by Serbian and Bosniak authors printed in Roman script as "Croatian", and by Croatian and Bosniak authors printed in Cyrillic script as "Serbian". That is merely a recommendation however, countless US libraries still use Serbo-Croatian and have no intention of supporting the ahistorical fabrication of these new "languages", simply because it's impossible to draw the lines on the basis of idiotic criteria of script used (which would categorize Na drini ćuprija in Roman script as "Croatian", and in Cyrillic script as "Serbian"). --Ivan Štambuk 12:50, 30 December 2009 (UTC)

Serbo-Croatian Wiktionary is nowhere near "locked" or "inactive". It's active, as well as Serbo-Croatian Wikipedia, and has no signs of decay. You yourself claimed that the only reason why Serbo-Croatian Wiktionary is not linked to is because, in your opinion, "it should be closed". Interwicket worked perfectly fine with adding interwiki to Serbo-Croatian Wiktionary, and only later did you deliberately alter its functionality in order to install the destructive behavior. --Ivan Štambuk 12:50, 30 December 2009 (UTC)

I am confused. Isn't Interwicket's work dependent on whether a Wiktionary exists in that language, rather than whether an ISO code exists? I notice we do have interwiki links to the Simple English Wiktionary. --Yair rand 05:31, 28 December 2009 (UTC)
That is correct. Interwicket should make absolutely no assumption on whether the linked article represents "language" or not. Simple English Wikiprojects are the best example of this. Wikimedia project codes don't reflect ISO scheme, and many of them are assigned pseudocodes (e.g. bat-smg, fiu-vro, be-x-old, simple). All of them for a particular purpose, whose utility is of no concern of us. Interwicket should simply blindly check whether the article exists on target Wiktionary and add interwiki if it does. Regardless whether Ullmann personally disagrees with the existence of Serbo-Croatian wikiprojects or not, it is not up to him to decide whether the Interwicket bot, that he was in best possible intentions given bot flag, can act in such destructive way. --Ivan Štambuk 05:53, 30 December 2009 (UTC)
No, it isn't correct; the set of wikts to link is not the set of "active" or "existing" wikts (whatever that means; sh.wikt was stone cold dead until his more recent campaign of vandalism). wikts can "exist" (domain name takes you to a page), not exist but be treated as valid in DNS (domain name takes you to page about how to create a new wikt), or domain name takes you nowhere. They can be locked, empty, or both. In a few cases they have been noted to be almost empty, not locked, but API calls crash (!) because of something not initialized. They can be open and unlocked and contain content, but not linked to (tlh/Klingon for a period of time) And more, but it isn't "simple", (I still need to categorize "as" properly...) The table on meta is just the starting point. Robert Ullmann 12:20, 30 December 2009 (UTC)
Serbo-Croatian Wiktionary is nowhere near of being "locked" or "inactive". It's active, as well as Serbo-Croatian Wikipedia, and has no signs of decay (as opposed to B/C/S ethnopedias). You yourself claimed that the only reason why Serbo-Croatian Wiktionary is not linked to by Interwicket is because, in your opinion, "it should be closed". Interwicket worked perfectly fine with adding interwiki to Serbo-Croatian Wiktionary, and only later did you deliberately alter its functionality in order to install the destructive behavior that is being complained about. --Ivan Štambuk 12:50, 30 December 2009 (UTC)

Ignoring Štambuk's usual abuse, it is like this:

Interwicket is not looking for "sh" links to add or remove. Indeed it explicitly avoids the issue, treating the code in the same manner as "tlh" (Klingon). The reason for this is to avoid encouraging and facilitating Štambuk's continued vandalism. When it finds codes in this class, it throws an exception: (part of log)

    ... [[fr:m\xe9connaitrez]] +en
        pivo found in lo
42931 (idx) fr \xe9mergerions links 2 random 3857.7273 queue 31
        pivo found in sw
        renaqu\xeet found in en
    ... hunting iwikis for \xe9mergerions
        pivo found in sv
    ... [[en:paitront]] +fr
42932 (idx) fr repara\xeetraient links 2 random 3893.2822 queue 30
        pivo found in sk
    ... [[fr:renaissez]] +en
        pivo found in lt
exception testing existence of word u'sh'
        renaquissions found in en
    ... hunting iwikis for repara\xeetraient
        pivo found in ko
        poursuivriez found in en
        suivisse found in en
42933 (idx) fr m\xe9connurent links 2 random 3995.3155 queue 29
        pivo found in sl
    ... hunting iwikis for m\xe9connurent
42934 (idx) fr m\xe9conna\xeetrait links 2 random 4028.1212 queue 28

and does not remove any iwikis from the entry. However, in some cases it will still be adding one or more iwikis, and calls the "framework" code to replace the links. The framework code silently elides bad iwikis. Note that the edit summary is (iwiki +li:November), Interwicket is only adding the li link, not attempting to remove anything. In point, it is explicitly coded to not remove sh links, but is then defeated by the framework. (*sigh*)

Yes, I should re-write that part of the "framework", as I have replaced a lot of the other cruft over time. (I probably should get rid of it entirely, it is an endless series of bugs, and never seems to be maintained ;-) Robert Ullmann 09:27, 28 December 2009 (UTC)

"The reason for this is to avoid encouraging and facilitating Štambuk's continued vandalism." - And how many edits do I exactly have on Icelandic Wiktionary? What does English-Wiktionary dispute have anything to do with foreign-language wikiprojects anyway? Who are you to decide that Serbo-Croatian Wikiprojects (which are now flourishing more than ever) are "invalid" ? --Ivan Štambuk 06:08, 30 December 2009 (UTC)
Observe this idiocy:
  • November 24, 2007, [14] - Interwicket adds sh: interwiki
  • November 1, 2009, [15] - Interwicket silently removes sh interwiki from the very same article it added it to.
What's the matter Robert, the label Srpskohrvatski / Српскохрватски in the interwiki links makes you feel uncomfortable? Why don't you fix the destructive behavior of your bot that you deliberately implanted in it, instead of providing petty excuses? --Ivan Štambuk 06:15, 30 December 2009 (UTC)
And for the record, Ullmann has been notified of this destructive behavior as early as 10 May 2009 by our Bulgarian contributor who was utilizing Wiktionary to learn Serbo-Croatian language, and also contributed Serbo-Croatian words here and on Serbo-Croatian (and other-language, he's a polyglot!) Wiktionaries, cross-linking them with interwiki, and was disturbed by the fact that the very same interwiki links he manually added were silently vanishing. Behold Ullmann's "explanation":
Since Serbo-Croatian Wiktionary as well as Wikipedia show no signs of decay and are unlikely to be closed anytime soon, I was wondering if mr. Ullmann has changed his opinion on the value of having Serbo-Croatian interwiki links by fixing the destructive behavior of his bot accordingly. --Ivan Štambuk 06:26, 30 December 2009 (UTC)

Again, there is no end to Mr Štambuk's campaign of personal abuse. This from someone who has spent the better part of a year destroying the Serbian and Croatian content of the wikt, in spite of his "proposal" being firmly rejected by the community. More destructive behaviour would be difficult to imagine.

(by contrast, the iwiki links are minor, and automatic, he raises this issue only to conduct yet another personal attack; as noted supra, this behaviour—relentlessly abusing others—has nothing really to do with me, I'm just the present "lucky" target as I am calling out for his vandalism, it is a pattern of his behaviour across multiple wikis for several years)

For the record, and everyone else's information: mbwa.py, the present code of Interwicket, was written in February 2009, it is less than a year old. The stop-gap program run to update some iwikis on the en.wikt only (to replace the defunct RobotGMwikt) was run (on long one-off runs) several times before that; it operated totally differently. As explained carefully above, there is no intent to drop the "sh" iwikis, simply to not add them in furtherance of his vandalism.

If he had spent a fraction of the effort he has spent trying to take revenge on the "Croatian Nazi-pedia" by deleting Croatian here and writing walls of text to abuse other people so he can feel justified as a "victim", on instead improving and adding Croatian content (as indeed he used to do), his contribution would be immense.

Instead we have a very intelligent, very knowledgeable young man (as others have said, and I heartily concur), willing to spend a great deal of time on a very important project, who is not only wasting that effort, but spending time destroying his own prior work. Tragically, most of his excellent work will probably have to simply be deleted in the end, as the standard languages are restored, as they must be; having them in the wikt is not optional. My attempts to salvage some of his work have so far been met with vicious and abusive resistance.

This entry on wikipedia may help others understand.

The time wasted by dealing with Stambuk's relentless campaign of abuse subtracts from what is available to fix problems, of course that is his intent, to pour out abuse and intimidation until others are worn down and give up; I suspect it infuriates him that I just calmly continue to point out his vandalism?

Bot owners are not politicians, Robert. Interwicket should be running on sh and adding sh interwikis. Keep the politics out of the bot running, ok? Mglovesfun (talk) 12:24, 30 December 2009 (UTC)
Sure, but I need to fix some things. And dealing with his relentless personal abuse is time consuming. If he'd just go back to constructive work on Croatian? Way too much to hope for, I'm afraid. Robert Ullmann 13:29, 30 December 2009 (UTC)
Actually, on this I disagree - bot operators work on what interests them, what they believe or agree with. Attempting to say they are NPOV is delusional - we have enough of illusion here that we should avoid self-delusion. I believe it's perfectly appropriate for M. Ullmann to choose not to support a macro language, just as you might choose not to support Simple.WT. Should we require you to edit there at least as often as you edit en.WT?
However, choosing not to support is very different from actively harming. Removing sh interwikis is an action. Ignoring sh interwikis is not.
Equally, though, deleting L2 headers for languages also supported by WMF is actively harming - in exactly the same way. - Amgine/talk 16:24, 30 December 2009 (UTC)
We already ban languages "supported by WMF" such as Klingon, because they do not interest us (e.g. they are copyrighted). Nowhere does any Wiktionary policy state that we must follow the WMF scheme of language codes for Wiktionary L2 sections: in fact, we're already breaking it in numerous instances. Serbo-Croatian is not a "macrolanguage" - there is no such thing as "macrolanguage", it's an imaginary clade invented by Christian organization SIL International. Who BTW also happens to assign codes for languages which do not exist at all but in their imagination. There is no such thing as "Croatian language" : all the edits I do for Serbo-Croatian are simultaneously equally valid "Croatian", "Serbian" and "Bosnian", modulo exception labels. --Ivan Štambuk 00:56, 31 December 2009 (UTC)

technical issue[edit]

As noted above, the technical problem probably requires re-writing part of the links code from the "pywikipedia framework". This is not a simple task, as there is much magic and oddity involved. The work will take a bit of time.

To return to the actual issue (however minor) it is worth explaining here in more technical detail (as I am going to need to design this anyway ;-)

Consider this code, finding the iwiki links in the text of an entry, and replacing them without changes:

linksites = wikipedia.getLanguageLinks(text)
newtext = wikipedia.replaceLanguageLinks(text, linksites, site = flw.site)

one would now expect newtext == text, correct? Not a chance. Not even if you do it twice (fetch the page, do this, save it, fetch it again, do it again).

Firstly, replaceLanguageLinks uses removeLanguageLinks, not getLanguageLinks, and the two do not match the same set of links (!). Codes that are obsolete are silently elided, as are some other codes returned by the "get" routine; some codes are silently changed (dk->da, jp->ja, nb->no, etc). Next it sorts them in site order (which is why it needs the site; it also annoyingly throws exceptions for site links). It then formats them one per line, or on one line (for pl.wikt), and adds them back to the text (using \r\n for some idiotic reason). (It also handles a special case of categories last, after iwikis, but that case doesn't occur in any of the wikts.) The combined results have several (many?) odd bugs: something that looks like an iwiki inside math tags will be silently replicated outside, the same thing in an HTML comment will not.

Interwicket does have the parameters needed within its own code; it "knows" the pl.wikt special case(s) for example. Needed is a version of the "get" routine that matches all the iwikis that are not in math, comments, or nowiki tags, and that look like iwikis to the MW s/w. (Access to the s/w iwiki table? Haven't seen that anywhere ... or fixed list as in the framework now? Put in mbwa's flw table in any case?)

Then a replace routine that replaces the exact set from the "get" routine with the set given by caller, without doing elisions or replacements. (Interwicket does or will do those explicitly.) No "fixing" of WS etc done there. Then equivalent code to the above will not change the text unless the iwikis need to be moved or sorted.
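
The get/replace pair described in the two paragraphs above might look roughly like the following. This is a minimal sketch in Python 3, not the code that went into mbwa.py; the fixed KNOWN_CODES set, the function names, and the choice to write links one per line at the end of the text are all assumptions made for the illustration.

```python
import re

# Assumption: a small fixed list of interwiki codes for the example. A real
# bot would take these from its own table. No code is elided or rewritten.
KNOWN_CODES = {"en", "fr", "li", "pl", "sh", "sv"}

# Regions where something that merely looks like an iwiki must be left alone.
PROTECTED = re.compile(
    r"<!--.*?-->|<math>.*?</math>|<nowiki>.*?</nowiki>", re.S | re.I)
# [[code:title]] plus trailing blanks and one newline, so removal leaves no gap.
IWIKI = re.compile(r"\[\[([a-z][a-z-]*):([^\[\]|\n]+)\]\][ \t]*\n?")


def find_iwikis(text):
    """Return (start, end, code, title) for each iwiki outside protected regions."""
    protected = [m.span() for m in PROTECTED.finditer(text)]
    found = []
    for m in IWIKI.finditer(text):
        if m.group(1) not in KNOWN_CODES:
            continue
        if any(s <= m.start() < e for s, e in protected):
            continue
        found.append((m.start(), m.end(), m.group(1), m.group(2)))
    return found


def get_iwikis(text):
    """The "get" routine: (code, title) pairs in order of appearance."""
    return [(code, title) for _, _, code, title in find_iwikis(text)]


def replace_iwikis(text, links):
    """Replace exactly the links get_iwikis() sees with `links`, nothing else."""
    spans = find_iwikis(text)
    if [(c, t) for _, _, c, t in spans] == list(links):
        return text  # same links, same order: leave the text untouched
    # Remove from the end so that earlier offsets stay valid.
    for start, end, _, _ in reversed(spans):
        text = text[:start] + text[end:]
    body = text.rstrip()
    if not links:
        return body + "\n"
    block = "\n".join("[[%s:%s]]" % (c, t) for c, t in links)
    return body + "\n\n" + block + "\n"
```

With this shape, replace_iwikis(text, get_iwikis(text)) returns text unchanged, and adding one link (say li) rewrites only the iwiki block while links inside comments, math, or nowiki stay where they are. Both routines share find_iwikis, so they cannot disagree about which links exist.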

This should fix a number of other bugs, most of which have not been seen yet. I'll try putting together a version. Robert Ullmann 13:29, 30 December 2009 (UTC)

Can you simply fix your bot so that it doesn't remove existing and valid sh: interwiki? It doesn't even need to add it (I'll do it manually, or write a bot that does so). --Ivan Štambuk 12:50, 30 December 2009 (UTC)
Robert, I understand that you weren't intentionally removing the SC interwikis, and that it will take some work and some time before you can get your bot to neither-add-nor-remove them. That's very understandable. And I hate to add to the dog-pile of comments you're receiving, some of which are quite rude. But. The thing is, it will not take work nor time to get your bot to add-and-not-remove SC interwikis: just remove the special case for SC. So until you can fix your bot to take your preferred action (neither-add-nor-remove), I think you have to either turn it off completely, or fix it to add-and-not-remove. Because right now, your bot is taking an action that Wiktionaries have not requested and do not support, and that you do not plan to revert. That's not really O.K. —RuakhTALK 13:02, 30 December 2009 (UTC)
The only "special case" for SC (as explained above) is explicit code to not remove it. Which (as also explained above) is defeated by the "framework". I'm working on fixing it. The constant stream of personal abuse does not help. Robert Ullmann 13:36, 30 December 2009 (UTC)
Ah, sorry, I misunderstood; I thought you were saying that the special case for SC was explicit code to not add it. Never mind, then. Thanks for your efforts. :-)   —RuakhTALK 13:51, 30 December 2009 (UTC)

Done.

Nice to get rid of a bunch of the "framework" code and the half-dozen or so bugs. Simpler, faster ....

Will be in integration test for about six hours. If it passes, will then resume. Robert Ullmann 17:27, 30 December 2009 (UTC)

Thank you very much for the time and energy invested in this quick change. Your work is very appreciated. --Bequw¢τ 20:09, 30 December 2009 (UTC)
I concur. I hope that in the future similar discussions would be much more productive programming-wise. --Ivan Štambuk 00:56, 31 December 2009 (UTC)

Email[edit]

Hey there RU. Have you gotten my recent emails? Thanks, Razorflame 23:23, 27 December 2009 (UTC)

Yes, sorry, I have a complicated reply ... will get there soon. Robert Ullmann 23:48, 27 December 2009 (UTC)
Hopefully it isn't so complicated that I won't be able to understand it ;) Razorflame 23:52, 27 December 2009 (UTC)

Re:quæstion[edit]

Thanks so much for pointing that out! I'll be more careful in the future. Thanks again, Wrelwser43 04:44, 30 December 2009 (UTC)

Email[edit]

Did you get my email about that list of words? A text file would be fine. Cheers! bd2412 T 04:02, 31 December 2009 (UTC)

Deleting simple interwikis[edit]

I'm pretty sure that simple is not a valid ISO 639 code. Shall we start deleting those interwikis too? Mglovesfun (talk) 12:12, 31 December 2009 (UTC)

Uh, Mg, perhaps you should read the above (rather long) conversation. --Yair rand 18:22, 31 December 2009 (UTC)