User talk:UllmannBot

Talk page for 'bot (or pseudo-bot) run by User:Robert Ullmann

See user page for task description.

Eumhun

The bot uses the parameter eumhun, but the template expects emhun. Example taken from : {{ko-hanja|hangeul=재|eumhun=엄숙할 재, 집 재, 상복 자, 재계할 재, 공부방 재|rv=jae|mr=chae|y=cay}}

Either fix the template to use eumhun, or make the bot use emhun. FWIW, only eumhun is correct by the ROK MCT's 2000 Revised Romanization. Thanks – Dustsucker 22:56, 24 November 2006 (UTC)

Um, I know the correct spelling ... it worked when I coded them. Must have then fixed the bot and somehow not managed to fix the template? *sigh* Robert Ullmann 23:23, 24 November 2006 (UTC)

Trad/simplified characters

Currently your bot puts trad and simplified characters on separate lines:

  1. : decrease, lower; censure, criticize
  2. : decrease, lower; censure, criticize

I think it should put them on the same line, something like this:

  1. / : decrease, lower; censure, criticize

Kappa 02:19, 8 December 2006 (UTC)

That would be cool, if it had the information. Some of the entries have that information from Nanshu's attempt to interpret the Unihan database, but he also sometimes got it wrong. And it isn't that simple of course, look at this case: (at biān)

  1. : edge, margin, side, border
  2. : edge, margin, side, border
  3. : edge, margin, side, border
  4. : edge, margin, side, border

There is the simplified form, the shinjitai, a traditional form Z-axis variant (presumably less common) that isn't considered to have been simplified to the first form, and the traditional form that does correspond to the first. But I had to look all that up to confirm it, and I don't know how to tell the bot that much. I could probably get to:

  1. , : edge, margin, side, border
  2. : edge, margin, side, border
  3. : edge, margin, side, border

or just assume that it ought to combine any consecutive characters that have the same definition:

  1. , , , : edge, margin, side, border

although that might have some odd effects. (Besides having the shinjitai and the less common form in between.) Look at (at bài), what does "pathaka" mean? There are two different definitions. And the forms aren't always consecutive in UCS sort order.

It really ought to be something like:

  1. : edge, margin, side, border
  2. (shinjitai): edge, margin, side, border
  3. (less common trad. form): edge, margin, side, border

But that is way out beyond what the bot can do (barring a complete analysis of some external DB). And it still doesn't really explain what the shinjitai is doing listed under Mandarin Pinyin. Sorting the common characters to the top would be useful. (If an entry exists, the bot leaves the lines in order, only adding missing definitions, then adds any missing characters at the end.) Robert Ullmann 05:04, 8 December 2006 (UTC)
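The update rule in the parenthesis above (keep existing lines in order, fill in missing definitions, append missing characters at the end) might look roughly like the sketch below. This is not the bot's actual code; the function name `merge_entry_lines` and the data shapes (a list of `(character, definition)` pairs for the page, a dict for the bot's own data) are assumptions made for illustration.

```python
def merge_entry_lines(existing, known):
    """Merge the bot's data into an existing list of definition lines.

    existing: list of (character, definition_or_None) in current page order.
    known:    dict mapping character -> definition (hypothetical bot data).
    """
    merged = []
    seen = set()
    for char, definition in existing:
        # Keep the page's order; only fill in a definition that is missing.
        if not definition and char in known:
            definition = known[char]
        merged.append((char, definition))
        seen.add(char)
    # Characters the page does not list yet go at the end.
    for char, definition in known.items():
        if char not in seen:
            merged.append((char, definition))
    return merged
```

Note that a definition already on the page is never overwritten, which matches "only adding missing definitions" but also means human edits and the bot's data can drift apart.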

Consider (at biāo)

  1. 标: mark, symbol, label, sign; stand the bole of a tree
  2. 標: a mark, symbol, label, sign; standard
# [[标]]: [[mark]], [[symbol]], [[label]], [[sign]]; stand the bole of a tree
# [[標]]: a mark, symbol, label, sign; [[standard]]

The bot can know these are a sim/tra pair, the info is in the entries. But then what does it do? (;-) This is pretty common, as people have wikilinked the entries in various ways, fixed the definitions in one and not the other. (And the definitions are not always the same, especially when another, rarer, traditional character simplifies to the same form.) Still thinking about what might be done a bit better. Robert Ullmann 06:16, 8 December 2006 (UTC)

Please look at biǎn (and biān). I've combined any that are together, with the same definition. This is better, and I don't think we can get to perfect ;-) Thanks for looking at these; it is not easy sometimes to get anyone to check on what you are doing. Robert Ullmann 08:27, 8 December 2006 (UTC)
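For readers following along, combining adjacent lines that share a definition can be sketched as below. It is only an illustration of the idea, not the bot's code; `combine_consecutive` and the `(character, definition)` pair format are assumed names and shapes.

```python
from itertools import groupby


def combine_consecutive(lines):
    """Collapse runs of adjacent lines that carry an identical definition.

    lines: list of (character, definition) pairs in page order.
    Returns a list of (comma-joined characters, definition) pairs.
    """
    combined = []
    # groupby only groups *adjacent* items, so non-consecutive duplicates
    # stay on separate lines, as described in the discussion above.
    for definition, group in groupby(lines, key=lambda item: item[1]):
        chars = [char for char, _ in group]
        combined.append((", ".join(chars), definition))
    return combined
```

Because the comparison is exact, a pair such as 标 and 標 whose definitions were edited differently stays on two lines, which is the limitation raised in the biāo example.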

{{t}} again

Hi, I saw you are misusing your bot to experiment with your new versions of {t}. That’s ok with me, but it seems to miss some stuff: on die, the Catalan word is not linked to its section, neither is the Spanish. I suppose this is because the word for morir contains only Spanish, but eventually, it is to contain Catalan as well. I do not really know how to handle this. Maybe just like you do: leave out the ls, the robot will add it when eventually the Catalan information is entered. H. (talk) 15:37, 1 April 2007 (UTC)

It should get its own bot name at some point ;-) Yes, it only adds the name for the section link when needed, and will update it when another section is added. Robert Ullmann 13:30, 30 August 2007 (UTC)

{{t}} once again

Hi Robert, I thought the bot was going to introduce the t template as well, not only updating it. Are you working on this? Will it happen in the near future? Also, is there a tag to place to request an update run? H. (talk) 13:15, 30 August 2007 (UTC)

This is the first time in a month or two I've gone back to this; there are a lot more t templates than when experimenting before. The update run should immediately (day or so) follow an XML dump.
Introducing the template has a lot of tricky cases; the simple ones are not hard, but a very large percentage are complex. The most serious problem is that a bot can't tell what is the FL word, what is some sort of gloss, and what else consistently, even when it is "obvious" to us humans. The existing program isn't intended to do that. Robert Ullmann 13:42, 30 August 2007 (UTC)

translation-language

Hi Robert,

Why do you remove |lang=foo from calls to {{t}}?

RuakhTALK 14:17, 30 August 2007 (UTC)

I don't know exactly which entry you are referring to, but the bot code sets lang if and only if the template needs a #section reference. In the majority of cases lang= isn't needed, and users shouldn't worry about it either way. (If we could get a language specific version of the #language parser function, we'd lose this parameter completely.) I had called it ls, but Connel asked to change it to lang=; it would be better if it was ls or X or something so users wouldn't worry about it ;-). Robert Ullmann 14:25, 30 August 2007 (UTC)
It was aircraft. And while it doesn't really make a huge difference for Hebrew (generally only Aramaic can appear above Hebrew, so it's not like you have to scroll past twenty language sections), it seems that in general, language-segment fragment identifiers are either beneficial or neutral. I'm not saying UllmannBot should add them, necessarily, but it certainly seems wrong to remove them when human editors add them. RuakhTALK 16:08, 30 August 2007 (UTC)
Um, the idea is that the parameter (whatever it is) is just automated by the bot; otherwise you get humans going to a lot of trouble adding it when not needed. The bot is replacing lang= in every t template it sees; but of course that replacement is often a no-op, not changing it. (And if the page text doesn't change, it doesn't save, of course.) And when we get some parser function or something, it will be stripping all of them. Or there may be some place in between; I have an idea or two. Robert Ullmann 16:29, 30 August 2007 (UTC)

Request

Since it's my understanding you correct language names, why not also substitute language templates? See drinking water. DAVilla 14:46, 4 September 2007 (UTC)

AF does that. If you'd added the trans-top gloss and just left the language templates it would have fixed all of them. Robert Ullmann 15:03, 4 September 2007 (UTC)

Template {t} update task transferred to Tbot.

I has a favor

I'm not sure if this is entirely possible, but if it is it would be pretty awesome. What I want to do is go through everything that links to {{ro-nounform}} and change the templates to {{ro-noun-def}}. One of the problems is that I changed the parameters to simplify it and it's kind of a bitch to do by hand. I basically just need "gend=x|num=s" changed to "1=xs" and little stuff like that. If it's possible to do this, I'll give you a couple more details :) [ ric | opiaterein ] 19:28, 29 September 2007 (UTC)

I can do this; I recently wrote and ran a bot that did something fairly similar for French. Just let me know the details. RuakhTALK 20:03, 29 September 2007 (UTC)
If it can be spec'd rigorously, it isn't hard; one of us can do it. Robert Ullmann 21:07, 29 September 2007 (UTC)
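As a rough illustration of how the one conversion stated above ("gend=x|num=s" to "1=xs") could be spec'd, here is a minimal sketch. It assumes the two parameters may appear in either order, that any other parameters are carried over unchanged, and that calls missing either parameter are left for a human; none of those details were given in the request, and the helper names are hypothetical.

```python
import re

# Matches a simple, non-nested call such as {{ro-nounform|gend=f|num=s}}.
TEMPLATE_RE = re.compile(r"\{\{ro-nounform\|([^{}]*)\}\}")


def _convert(match):
    named = {}
    rest = []
    for param in match.group(1).split("|"):
        key, sep, value = param.partition("=")
        if sep and key.strip() in ("gend", "num"):
            named[key.strip()] = value.strip()
        else:
            rest.append(param)  # assumption: keep unknown parameters as-is
    if "gend" not in named or "num" not in named:
        return match.group(0)  # incomplete call: leave untouched for review
    params = ["1=" + named["gend"] + named["num"]] + rest
    return "{{ro-noun-def|" + "|".join(params) + "}}"


def rewrite_nounform(text):
    """Rewrite every matching ro-nounform call in a page's wikitext."""
    return TEMPLATE_RE.sub(_convert, text)
```

A bot built on this would save the page only when `rewrite_nounform` actually changes the text; the remaining "little stuff like that" would need the same kind of explicit rule before it could be automated.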

I think we may need to run an update to Wiktionary:Index to templates/languages as it seems a little out of date. Regards --Williamsayers79 21:44, 4 February 2008 (UTC)

Bot Strangeness from July 5 2007

While running some validation code against an offline copy of Wiktionary, I noticed that this bot made some questionable edits to zhì, zhí, zhī, zhǐ and . All of the edits were on July 5 2007, and resulted in duplication of most of the article. I thought you might want to know in case it's a bug you can find and fix. I've been away from Wiktionary too long (several years) to feel comfortable making edits right now, or I'd have just fixed them myself. By the way, I'm writing a substantially improved Wiktionary module that should be compatible with pywikipedia. It checks for most of the things that you seem to detect. I'm writing it for my own purposes, but I'd be happy to share it with you. If you're interested, let me know. -- CoryCohen2 04:01, 28 March 2008 (UTC)

I see, thanks for pointing that out; User:Jusjih had added the audio in the wrong (nonstandard) place; it should be in a Pronunciation section. The bot code found the Pinyin header and headword template correctly, but then it wasn't followed by the definition lines as expected. I'll fix those. You might be interested in looking at User:AutoFormat/code. Robert Ullmann 07:08, 28 March 2008 (UTC)

Language code list

Would it be possible the next time you generate Wiktionary:Index to templates/languages to include the dialect and language-family codes in the "etyl:*" area? Could they be added as separate tables? Thanks. --Bequw¢τ 19:22, 11 November 2009 (UTC)

Never mind, did it myself. --Bequw¢τ 21:46, 19 January 2010 (UTC)

lacking conjugation

Hi. I have to bug you again for creation of the following pages, for our conjobots. User:Rising Sun/German verbs needing conjugation, User:Rising Sun/Spanish verbs needing conjugation, User:Rising Sun/Latin verbs needing conjugation and User:Rising Sun/Italian verbs needing conjugation --Rising Sun talk? contributions 13:21, 16 May 2010 (UTC)

Well don't now as he's been blocked. I've moved his subpage to User:Mglovesfun/French verbs needing conjugation, could you update it? Ideally I'd like an AWB loadable text file so I can 'fix' them that way. Mglovesfun (talk) 08:25, 18 September 2010 (UTC)