User talk:Interwicket/2008


New bot for iwikis designed for wiktionary

See the code page for current status. Seems to be working well. Robert Ullmann 23:18, 25 September 2007 (UTC)

Functions

So, I know this bot adds iw links. Does it remove bad links? Does it sort existing lists of links on pages? --EncycloPetey 00:37, 1 October 2007 (UTC)

It removes bad links; but it only sorts if it is updating the page (adding/removing). Further, it only moves misplaced iwikis to the end if it is updating. We could do something separate to collect pages that are mis-sorted, and perhaps feed them to AF. Robert Ullmann 00:41, 1 October 2007 (UTC)
When it is scanning the XML, it might (or might optionally, so it wouldn't be init overhead each time) generate a list of pages that either had iwikis mis-sorted, or had text after the first iwiki, and force the page to be updated when it gets to that point in the titles. I've thought about it a bit. Working now on adding the 100K+ iwikis still missing. Robert Ullmann 07:36, 2 October 2007 (UTC)
Updated to sort links if out of order, or not at the end of the entry. Robert Ullmann 14:19, 7 October 2007 (UTC)
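
Editorial sketch of the check described in this thread, not the bot's actual code: a page needs an update when its iwikis are out of order or are not grouped at the end of the entry. The regular expression, the function names, and the plain alphabetical ordering are all assumptions made for illustration; a real bot would check prefixes against the list of existing wikts and use the project's agreed sort order.

```python
import re

# Assumption: an iwiki is a line of its own, "[[code:title]]", where code is a
# lowercase language prefix such as "de" or "zh-min-nan".
IWIKI_RE = re.compile(r"^\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]$")

def normalize_iwikis(text):
    """Return text with all iwikis sorted and moved to the end of the entry."""
    body, links = [], []
    for line in text.splitlines():
        match = IWIKI_RE.match(line.strip())
        if match:
            links.append((match.group(1), match.group(2)))
        else:
            body.append(line)
    # Drop blank lines left behind at the end of the entry text.
    while body and not body[-1].strip():
        body.pop()
    result = "\n".join(body)
    if links:
        # set() also removes exact duplicates; sorting is by code, then title.
        result += "\n\n" + "\n".join("[[%s:%s]]" % link for link in sorted(set(links)))
    return result + "\n"

def needs_update(text):
    """True if the iwikis are mis-sorted or not at the end of the entry."""
    return normalize_iwikis(text) != text
```

With this sketch, an entry whose only problem is an iwiki placed above the body text is reported by needs_update even when no links have to be added or removed.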

vi links, pybot bug

There is a bug in pybot that strips whitespace other than U+0020 from page titles; this caused iwikt.py to lose its place in the vi.wikt this morning when it got to entries starting with U+3000 IDEOGRAPHIC SPACE, and as a result it removed some links. Bug fixed; they are being replaced now. Robert Ullmann 06:40, 5 October 2007 (UTC)
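
Editorial sketch of this class of bug, not the actual pybot code: in Python, calling strip() with no argument removes every Unicode whitespace character, including U+3000, so a title that begins with an ideographic space no longer matches the title the wiki returned. Both function names are hypothetical.

```python
def clean_title_buggy(title):
    # strip() with no argument removes all Unicode whitespace,
    # including U+3000 IDEOGRAPHIC SPACE.
    return title.strip()

def clean_title_fixed(title):
    # Only remove the ordinary space, U+0020.
    return title.strip(" ")

title = "\u3000\u4e00"  # a title starting with U+3000
print(clean_title_buggy(title) == title)  # False: the leading U+3000 was lost
print(clean_title_fixed(title) == title)  # True: the title is unchanged
```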

VolkovBot

With this bot active, what will become of VolkovBot (which performs the same function)? --EncycloPetey 10:56, 11 October 2007 (UTC)

While it generates (some of) the same links, it is doing something different, and useful: it runs on the new pages in a given wikt, and finds the links for it, and adds the reciprocals; it doesn't find old missed links, or remove links in all the cases it should. Interwicket is only adding to the en.wikt; it would be/will be run on other wikts if they want to. So they play well together. It is an advantage to VolkovBot et al that the en.wikt has a complete set (less whatever new entry was just added that it is updating) so it can find them all. It is an advantage to us that we don't have to run multiple passes of iwikt.py -new in between XML dumps to satisfy the people that want new entries quickly interlinked. Robert Ullmann 11:18, 11 October 2007 (UTC)

run status 16 October

Run lasted about two days. Stats: 2681069 in union index, 618941 entries, 6891 possible, 6891 updated. Robert Ullmann 12:23, 19 October 2007 (UTC)

Bot creates interlanguage links to redirects

See Staff. Surely this is not appropriate? __meco 11:09, 20 January 2008 (UTC)

It is doing what it is supposed to; if the FL wikt has the title redirected to another form you will end up there. We don't want to follow the redirect because it may change, become an entry, etc on the FL wikt. (and yes, there is a lot of CS cruft out there) Robert Ullmann 12:18, 20 January 2008 (UTC)

scope of this bot is severely lacking

The running of the RobotGMwikt bot was stopped on the promise that the function of providing interwiki links would be taken over by a more intelligent design. This is good. The RobotGMwikt was stopped as a consequence.

It is extremely disappointing that this bot is a selfish English-only project. It is extremely disappointing that some in the English Wiktionary community are not willing to argue about the way the algorithm of the Interwiki process should be... The only arguments heard so far are based on wishful thinking and not on fact.

Given all this, I am of the opinion that this bot does not qualify for production use.

Thanks, GerardM 09:59, 13 May 2008 (UTC)

GerardM, the Wiktionaries are, by design, separate projects. Whether this is good or bad is another story, but what it means is that we (the English Wiktionarians) can do what we want to the English Wiktionary. In the same way we are happy for other Wiktionaries to do what they want to themselves. I can't see there being a problem with running RobotGMwikt elsewhere - but it seems that some people here do not wish it to be run here.
I chatted with Ullmann (quite some time ago now) about using Interwicket on other projects and he seemed amenable to the idea at the time, though obviously only he knows the specifics. I believe the main reason that Interwicket is English-only is because no other project has asked for it. A better way to proceed would be to find out if the other Wiktionaries want to run Interwicket and then to get this to happen. As you are no doubt aware, imposing bots on projects without their consent is not acceptable.
In terms of arguing about the way that interwikis should work, there is a topic on the beer parlour in which people are discussing the link-to-redirect issue; if there are other issues you want addressed then feel free to raise them there. Conrad.Irwin 10:23, 13 May 2008 (UTC)
I agree with GerardM's opinion here.
Another problem with this bot is that it loads the servers by looking at the databases of all wiktionaries and then only adds links to en.wiktionary. IMHO this is not only selfish, but an unnecessary server load. Thanks, --birdy (:> )=| 10:25, 13 May 2008 (UTC)
I have learned about yet another person coming to me who is upset with the attitude of the English on how to do Interwiki work. Again.. there are good reasons to block this bot. GerardM 15:46, 13 May 2008 (UTC)
Birdy: Interwicket only looks at the indexes for all the wikts. Several orders of magnitude less server load than the interwiki.py bot designed for the 'pedias that reads all the pages (utterly uselessly). Robert Ullmann 15:19, 24 May 2008 (UTC)

Note that the only argument we have heard from Gerard is that his way is right just because he is right, and therefore should be imposed on the en.wikt whether we want it or not. He is perfectly capable of running his bot for all other wikts that accept his imposition and not updating en.wikt (nor pl.wikt, which has its own policy re links to ru.wikt). When asked if he knows how to do that, he has steadfastly refused to answer the question (this on IRC). Of course he does, he disabled editing on the pl.wikt; it is simply a matter of removing the username line for en.wikt from user_config.py. Robert Ullmann 15:28, 24 May 2008 (UTC)

I should also note that RobotGMwikt was not stopped because there was a replacement; it was stopped because Gerard (at the time) flatly refused to make exceptions for pl.wikt. Interwicket was only written because RobotGMwikt was long gone. This is all on the record.

And the code is available for any project. It is, however, designed for fairly large projects; a project with fewer than perhaps 10K entries might do better with interwiki.py for now. On any scale, however, interwiki.py is horrendously inefficient. (Interwicket by contrast updates all of en.wikt running from a laptop in Nairobi; it simply doesn't need to do that much traffic. ;-)

Birdy: you might note that the interwiki.py bot (that Gerard says he is running) requires that entries from "lesser" languages be linked from one of the big wikts before they will be found. Without reading the indexes, it misses a lot of links. Since Interwicket adds all the links to all of the wikts, other bots running the interwiki.py code will get links they would not otherwise have gotten. Some of the newly expanding wikts would never be linked in unless they run their own bot to add links to en, etc. (Which is why we see numerous such requests from people who do understand how it works.) Robert Ullmann 15:48, 24 May 2008 (UTC)

Removing hsb interwikis

Special:RecentChangesLinked/Category:Upper_Sorbian_adjectives Something is wrong... Maro 17:01, 21 May 2008 (UTC)

hsb is missing from meta:List of Wiktionaries. We need to poke mutante. Robert Ullmann 15:50, 24 May 2008 (UTC)

It is now included in the list, at rank 55. Mutante 22:49, 30 June 2008 (UTC)

Has been fixed, now that it is included. Robert Ullmann 22:52, 30 June 2008 (UTC)

RE: Base

Why would you put German definitions on an English dictionary? I'm afraid I'm going to have to dispute this. --IdLoveOne 02:43, 5 July 2008 (UTC)

We are a dictionary of all words in all languages, defined and described in English. I'd suggest you learn a bit more about who and what we are before "disputing" things. Robert Ullmann 16:28, 6 July 2008 (UTC)
Ok, but why two pages for the same word? Shouldn't they be merged? --IdLoveOne 23:41, 7 July 2008 (UTC)
No, they shouldn't. The fact remains that base is different from Base. Please listen to Robert's suggestion, and please take mine -- drop this dispute. --Neskaya talk 23:43, 7 July 2008 (UTC)
See my userpage for my reply. --IdLoveOne 00:03, 8 July 2008 (UTC)

No contributions since 26 June 2008

Is this bot now suspended, or replaced by some other bot? -- Gauss 14:45, 23 July 2008 (UTC)

It is run after XML dumps. And the XML dump process is stuck, waiting on new fileservers long overdue. Normally, it should run every two to three weeks after each dump, and another time in between on all new entries. Not quite sure what to do at this time. Robert Ullmann 14:51, 23 July 2008 (UTC)
It has been that long? Gaak. I'll run it again on new entries at least. (Already did this in June, so it will do a bunch of redundant checking; I'll just have it go slowly; it's a bot, it is patient.) Robert Ullmann 15:28, 23 July 2008 (UTC)
Ah, thanks. I was just wondering and saw nowhere more appropriate to ask this question. -- Gauss 20:25, 23 July 2008 (UTC)

Outsourcing

This bot is now running as sv:User:Conrad.Bot and no:User:Conrad.Bot. Conrad.Irwin 15:03, 23 July 2008 (UTC)

Cool. Btw: your comment (at no.wikt) about "missing newly created pages" is not correct: if it finds a page in the home wikt from allpages that it didn't pick up in the XML, and there are any iwikis, it will check and update the page. The -new option tells it to do only this, so if you've already run complete on an XML, and a week or two later with no XML (*sigh*) you can run it on all the new entries. (rechecking the new ones already done, so this gets less and less efficient if repeated) Robert Ullmann 15:13, 23 July 2008 (UTC)
Thanks! I should read the small print. Conrad.Bot 15:34, 23 July 2008 (UTC)

gl.

Does Interwicket pick up entries that are on the gl.wikt? --msh210 20:32, 10 September 2008 (UTC)

Interwicket reads the local XML dump for en.wikt and the AllPages on every other Wiktionary. Whenever it finds a difference between the interwikis in the dump (assuming no interwikis for a page not in the dump) and the pages listed on the other Wiktionaries, it updates the en.wikt page. So yes, it will. It doesn't work like the normal interwiki bot (which updates pages as they are added to the wiktionaries) but instead is started manually every now and then, at which point it will bring the entire Wiktionary up to date. (Feel free to correct me if I'm wrong, Ullmann) Conrad.Irwin 20:46, 10 September 2008 (UTC)
(note that a page not in the dump is updated, reading it from the wikt and adding all needed links) Robert Ullmann 17:35, 10 November 2008 (UTC)
Yes. Long intervals between XML dumps make it a lot more time consuming, but at present there are two threads, one at about "eg" and one in "с" (in Cyrillic, e.g. скула), updating all of the iwikis. Robert Ullmann 00:19, 11 September 2008 (UTC)

Thanks for the replies. --msh210 16:59, 11 September 2008 (UTC)
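
Editorial sketch of the comparison described in the replies above, under stated assumptions: the title index of each other wikt (as read from AllPages) is modelled as a set of titles keyed by language code, and the links currently on the en.wikt page are given as a list of codes. The names expected_links and plan_update are hypothetical and do not come from the bot's code.

```python
def expected_links(title, indexes):
    """Codes of every wikt whose title index contains exactly this title."""
    return {code for code, titles in indexes.items() if title in titles}

def plan_update(title, current_links, indexes):
    """Return (codes to add, codes to remove) for one en.wikt page."""
    want = expected_links(title, indexes)
    have = set(current_links)
    return sorted(want - have), sorted(have - want)

# Toy indexes standing in for the AllPages listings of three wikts.
indexes = {
    "de": {"cat", "Hund"},
    "fr": {"cat"},
    "gl": {"cat"},
}
to_add, to_remove = plan_update("cat", ["fr", "sv"], indexes)
print(to_add)     # ['de', 'gl']
print(to_remove)  # ['sv']
```

Because only title indexes are compared, no page text has to be fetched from the other wikts; a page is read and saved only when one of the two lists is non-empty.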

Hello. This bot recently removed a crucial interwiki link to the German idiom in the German Wiktionary (here). The difference with or without sein is negligible, just as it is not important whether the entry in English is on the go or be on the go (always with the verb be). Can this problem be fixed so that the interwiki link ceases to be removed? Bogorm 19:58, 7 November 2008 (UTC)

Interwiki links should always point to the exact same form on the other Wiktionary. The way that two idioms are identified as equivalent is by having a redirect from one to the other; then [[fix und fertig]] would have an interwiki link to [[de:fix und fertig]] which would redirect to [[de:fix und fertig sein]] which would have an interwiki link to [[fix und fertig sein]] which would redirect to [[fix und fertig]]. --RuakhTALK 13:12, 8 November 2008 (UTC)
Thus? Bogorm 13:22, 8 November 2008 (UTC)
Yes, this is why (one reason why) we link to redirects; so that wikts that differ over what is the canonical form get linked properly (assuming they have the redirects). Note that these work correctly now. But in general, the links from other wikts will not, as they haven't updated policy to link to redirects, and a lot of iwiki bot operators will insist that those links should be removed. (!) Robert Ullmann 17:30, 10 November 2008 (UTC)
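
Editorial sketch of the redirect arrangement described above, using a toy in-memory model rather than real wiki data. Each wikt is a dictionary of titles, a redirect is modelled as a page with a "redirect" key, and landing_page is a hypothetical helper that follows redirects from the exact title an interwiki link points to.

```python
def landing_page(wikts, code, title, max_hops=5):
    """Return the title a reader ends up on, or None if the page is missing."""
    for _ in range(max_hops):
        page = wikts[code].get(title)
        if page is None:
            return None
        target = page.get("redirect")
        if target is None:
            return title
        title = target
    return None  # too many hops: treat a redirect loop as unresolved

# The two wikts disagree about which form is the canonical entry.
wikts = {
    "en": {
        "fix und fertig": {},
        "fix und fertig sein": {"redirect": "fix und fertig"},
    },
    "de": {
        "fix und fertig sein": {},
        "fix und fertig": {"redirect": "fix und fertig sein"},
    },
}
print(landing_page(wikts, "de", "fix und fertig"))       # fix und fertig sein
print(landing_page(wikts, "en", "fix und fertig sein"))  # fix und fertig
```

The interwiki link itself always uses the same title on both sides; it is the redirect on the target wikt that carries the reader to that wikt's canonical form.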

"sorting" iwikis

I think Interwicket should remove iwikis to non-existent Wiktionaries: [1] [2] [3] [4]

And also, do not add iwikis to redirects, if possible. [5] [6] Maro 21:08, 19 November 2008 (UTC)

We add iwikis to redirects on purpose, see immediately previous section. Cleaning up iwikis for (now) non-existing wikts is a little bit tricky, since they are often someone trying to do something else; I do look for them. Robert Ullmann 23:25, 19 November 2008 (UTC)
Perhaps Interwicket (or AF?) could eventually tag entries with an interwiki link to a not (yet?) existing wikt so that they end up in a suitable category? -- Gauss 23:32, 19 November 2008 (UTC)
Btw: "se" is in the MW table (it displays as an iwiki link), but the wikt doesn't (as you note) exist as an initialized project. This often takes waiting until "they" figure out WTF "they" are doing ;-) Robert Ullmann 23:36, 19 November 2008 (UTC)
Getting slightly off-topic: I just noticed that the bad "[se:burner]" was added (by anon) together with the Swedish translation. Considering that "se" is the TLD for Sweden, I assume that the link was meant to lead to sv.wikt, where burner doesn't exist either. I'm removing that interwiki link now. (done already by Maro) -- Gauss 23:44, 19 November 2008 (UTC)
Yes, very likely, people get confused by country codes vs language codes. (and "sw" is Swahili, not Swedish or Sweden ;-) Robert Ullmann 23:49, 19 November 2008 (UTC)