User talk:TheDaveBot

Questions, comments or jobs should be submitted here.

Templates

Comments:

  • Verb form is used as a header a lot; I think it is as "standard" as anything. The existing Spanish verb forms use it.
    POS line is now templated, currently "Verb", can be changed to suit consensus.
  • {{infl|es|verb form}} will cat in Category:Spanish verb forms and generate the inflection line; this is easy for a bot to replace with a specific template if desired later.
    There is a dummy template for adding any categories we wish to all bot-made entries.
  • the definition lines need templates or we will need another bot to carefully try to recognize all of them.
    Done. All forms are created with template:esbot:conjline or template:esbot:conjline_no.
  • the "form of" templates have specific style that can be customized, if there aren't specific templates, it at least needs one that does the right CSS/span HTML magic.
    addressed above.
  • adding the template to the end for categories doesn't help much; it doesn't have the parameters it would need to put the entry into conjugation categories.
    do we really need to sort them by tense, mood, person, and plurality? this seems superfluous to me.
  • when it finds an entry "already exists" does that mean the language section exists? Or just the entry?
    The bot won't add to an existing page. I am working on making it examine existing pages and add to them if they don't have a Spanish section, but I am a beginner at the bot thing, so this is taking some time. For now, it doesn't add to existing pages, and it won't create conjugated forms of verbs whose infinitive we don't have a page for.
  • Either way, these need to be remembered and fixed, because they won't be red links, and otherwise may be missed for a long time.
    All of this information is recorded on the wiki in the /summary subpages of the bot so that we can see what it made and what it skipped.
  • Making the templates is fairly easy; how many conjugations are entirely specific to Spanish? Robert Ullmann 20:34, 17 September 2006 (UTC)
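
For illustration only, the create-or-skip rule described in the replies above can be sketched in Python roughly as follows. This is not the bot's actual code: the function names and the in-memory set of existing titles are assumptions made for the example, and the real bot would query the wiki and write its results to the /summary subpages.

```python
def plan_page(title, infinitive, existing_titles):
    """Decide what to do with one conjugated form.

    Returns ("create", "") or ("skip", reason).
    """
    # Never point at a red link: the infinitive must already have an entry.
    if infinitive not in existing_titles:
        return ("skip", "no entry for infinitive")
    # Never touch an existing page, even if it lacks a Spanish section.
    if title in existing_titles:
        return ("skip", "page already exists")
    return ("create", "")


def summarize(forms, infinitive, existing_titles):
    """Split the forms into created and skipped lists for a summary page."""
    created, skipped = [], []
    for title in forms:
        action, reason = plan_page(title, infinitive, existing_titles)
        if action == "create":
            created.append(title)
        else:
            skipped.append((title, reason))
    return created, skipped
```

Keeping the skip reason next to each title is what lets someone come back later and fix the pages that were not red links.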

Discussion and Vote on TheDaveBot

TheDaveBot is currently set up to spit out pages for all of the conjugations of regular Spanish verbs, about 40 pages per verb. The algorithm and other information are on the Bot's Userpage, and examples of its work are in its contribs page. There are a few minor formatting considerations which need to be worked out, which can be seen in the "Requested features" section of the Bot's userpage.
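
As a rough idea of what generating the forms of a regular verb involves, here is a minimal Python sketch covering only two tenses of regular -ar verbs. It is not the algorithm from the Bot's Userpage; the table layout, the label strings, and the function name are assumptions made for the example.

```python
# Endings for regular -ar verbs, in the order
# 1st sg, 2nd sg, 3rd sg, 1st pl, 2nd pl, 3rd pl.
AR_ENDINGS = {
    ("indicative", "present"): ["o", "as", "a", "amos", "áis", "an"],
    ("indicative", "imperfect"): ["aba", "abas", "aba", "ábamos", "abais", "aban"],
}

SLOTS = [
    ("first", "singular"), ("second", "singular"), ("third", "singular"),
    ("first", "plural"), ("second", "plural"), ("third", "plural"),
]


def conjugate_ar(verb):
    """Return one record per conjugated form of a regular -ar verb."""
    if not verb.endswith("ar"):
        raise ValueError("this sketch only handles -ar verbs")
    stem = verb[:-2]
    forms = []
    for (mood, tense), endings in AR_ENDINGS.items():
        for (person, count), ending in zip(SLOTS, endings):
            forms.append({
                "form": stem + ending,
                "person": person,
                "count": count,
                "tense": tense,
                "mood": mood,
            })
    return forms
```

Some spellings come out more than once (saltaba is both first- and third-person singular), so a single page can need more than one definition line.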

Important facts
  • It only creates new pages, will not overwrite anything which already exists.
  • It only creates pages for verbs which en.wikt presently defines, won't point to red entries.
  • It works.
  • We can templatize the sense lines so they can be changed later en masse.
  • I will probably run it from ~2am EST to ~10am EST, so there will be a minimum of slowing as it pertains to "meaty" users.
  • I will request a bot flag, so you won't have to see the entries.
  • We should have non-redirect entries for every single verb form anyway, this will enter them accurately and quickly with a bare minimum of effort on anyone's part but mine ;)
  • I will clean up any problems myself.

Questions, comments and concerns are all welcome, or just supports and opposes without rhyme or reason. - TheDaveRoss 07:21, 17 September 2006 (UTC)

Is this a request for 'bot approval? If so, then:

  • Absolut support. We should have these for many other conjugation and declension patterns in many languages! bd2412 T 07:54, 17 September 2006 (UTC)
I would be willing to run it for other languages, provided a conjugation template, a list of regular verbs, and a go-to guy on accuracy questions. It wouldn't be too unpleasant to rewrite the relevant portions. - TheDaveRoss
  • Support --Connel MacKenzie 17:32, 17 September 2006 (UTC)
  • Support --Enginear 19:26, 17 September 2006 (UTC)
  • Support MGSpiller 21:53, 17 September 2006 (UTC)
  • Support \Mike 21:57, 17 September 2006 (UTC)
  • conditional support -- I prefer using Verb form for non-lemma entries in highly inflected languages, such as Spanish. (see discussion below) --EncycloPetey 22:10, 17 September 2006 (UTC)
    I templatized the POS line; once consensus is reached, all of the bot's entries can easily be made to reflect it. -Dave
  • Support Oppose excellent idea, not ready quite yet. Robert Ullmann 14:28, 20 September 2006 (UTC)
  • Support when and trusting that owner thinks it's ready. DAVilla 14:45, 21 September 2006 (UTC)
    • Granted. Connel and Alhen are keeping an eye on things. Dave, we're counting on you to run a clean bot, here. —Dvortygirl 04:27, 25 September 2006 (UTC)

Comments

  • Sometimes we worry about our article count (currently 3,782,591), and whether it's artificially low or artificially high. Clearly, adding entries for every declension of every word will bloat this count artificially. That's not a reason not to add them, but we should make sure that entries for declined forms (especially when mechanically added like this) always contain nicely definitive tags or categories so that we can easily subtract them all out of the count, some day and if we want to, when we try to come up with a more meaningful number of how many words our dictionary contains. —scs 15:00, 17 September 2006 (UTC)
Typically, paper dictionaries list "x terms with y definitions", or one or the other. While these entries will not define the term, I think they are absolutely deserving of inclusion for one reason: it is our purpose to include meaningful content for every term in every language, not to have a good-looking article count. At this point, if someone comes across the form pensábamos they have to go elsewhere to get the definition, even though we have pensar defined. After this bot runs, they will be able to discover that pensábamos is simply a conjugated form (the first-person plural of the imperfect indicative) of the verb pensar, with a link to find out what that means if they would like. Far and away more important than the number of entries, or the quibbles about formatting and style, is the usefulness to those who simply want a dictionary; this is a fact that I think gets lost in the shuffle all too often on wikis. - TheDaveRoss 15:56, 17 September 2006 (UTC)
I absolutely agree about usefulness. And that's why I was trying not to sound like I was arguing against the declined (declensed? :-) ) entries. (I do have a few doubts about these entries, but for now, the usability argument overrules them.)
I was serious about the tagging/categorizing, though -- it will definitely be useful to be able to definitively identify all these merely-declined forms sometime down the road. —scs 16:07, 17 September 2006 (UTC)
How do you feel about the inclusion of Template:es:conjugation on each page created by the bot? We can amend it later to include it in any category (secondary entries, whatever). - TheDaveRoss 16:47, 17 September 2006 (UTC)
Poifekt! (Well, as far as I'm concerned, anyway.) —scs 17:26, 17 September 2006 (UTC)

The cat template has no parameters, so all it can do is cat in Category:Spanish verb forms. Since there are 62 (with 50 unique combinations), subcats will be very useful. We are talking about 15,000 entries here. If the conjline template is to do this catting, it will require a lot of conditionals, which will make it un-subst'able. It could be done with 2 more careful complete bot passes, but better to have the bot do it now? (Name the cat template something permanent, and pass it a code for one of the 50 combinations. Then it can do a switch as desired? Or work out the desired catting and do more in the bot?) Want to support, this is very good.
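
To make the "switch" idea concrete, here is a small Python sketch of the mapping such a template (or the bot itself) would have to compute. Only Category:Spanish verb forms is named in this discussion; the subcategory names below are hypothetical placeholders, and nothing here reflects an agreed category scheme.

```python
BASE_CATEGORY = "Spanish verb forms"  # the one category named in the discussion


def categories_for(mood, tense, person, count):
    """Return the category names for one conjugated form.

    The subcategory naming patterns are hypothetical.
    """
    cats = [BASE_CATEGORY]
    if mood and tense:
        cats.append(f"Spanish {tense} {mood} verb forms")
    if person and count:
        cats.append(f"Spanish {person}-person {count} verb forms")
    return cats


def category_wikitext(mood, tense, person, count):
    """Render the categories as wikitext category links."""
    names = categories_for(mood, tense, person, count)
    return "".join(f"[[Category:{name}]]" for name in names)
```

Doing this in the bot keeps the templates free of conditionals; doing it in a template keeps the scheme changeable after the entries exist.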

Also you could make the conjline a permanent template that could then be modded to add the CSS/HTML magic in Form of templates. Or not, we can do a subst pass. Robert Ullmann 14:28, 20 September 2006 (UTC)

What parameters would you like to see exactly in esbot:conjugation? I can make it take all the same parameters as the conjline or more if you like, I just don't see the necessity of categorizing that deeply (mood and tense perhaps, but person?). As for the conjline, what do you mean by permanent? It isn't subst:ed in, if that is what you mean; it can easily be amended with color schemes and so forth. I am not sure I understand what you are looking for here. - TheDaveRoss 16:07, 20 September 2006 (UTC)
I updated the bot script so that the es:conjugation template is passed like so:
{{esbot:conjugation|person=' + p + '|count=' + c + '|verb=' + verb + '|tense=' + tense + '|mood=' + m + '}}
Are those five parameters enough? Or is there something else which would be useful? - TheDaveRoss 16:16, 20 September 2006 (UTC)
not sure I understand the syntax you are using. Python. Person is one of the more likely cat targets given what people do in other languages. What does the bot do if there is more than one form? Do you replicate this template? Robert Ullmann 19:18, 20 September 2006 (UTC)
Yes, the template is added once per conjugation, with the appropriate variables passed to each of them. - TheDaveRoss 19:56, 20 September 2006 (UTC)
Good. It would be useful if instead of verb, it just got the ending? (The template can't parse.) so instead of saltar it got -ar ? Then we could have Category:Spanish past forms of -ar verbs or some such. (there's discussion of something like this in English). What plan do you have to subst these later? E.g. what templates are we going to end up with in the entries when the whole process is done? We need a "form of" template with the CSS magic (see {{plural of}}) Robert Ullmann 11:28, 21 September 2006 (UTC)
I will give you both ending and verb, since who knows if someone will want to associate the inflected forms with the verb. So now the passed parameters are: {{esbot:conjugation|person=' + p + '|count=' + c + '|verb=' + verb + '|tense=' + tense + '|mood=' + m + '|ending=' + verb[-2:] + '}}. An interesting note is that if we asked nicely, we might be able to get some very neat string parsing functions added to our templates; there is an extension already written, and I think Wiktionary is the most likely user for them. As for the subst:ing, I would think that the POS line and categorization templates would be subst:ed once we are set with exactly how we want them, and even the definition lines could be subst:ed out. I don't know that any of the templates need to remain, but I wasn't going to monopolize the entries; people can do with them as they please. As for the "form of" template, I am still really not sure what you are looking for with the "form of" CSS thing. It might be helpful if you made the template and then I could look at it and include it... do you mean a template which encloses the whole entry? Or the inflection line? Or the definition lines? - TheDaveRoss 16:05, 21 September 2006 (UTC)
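
Written out as a complete function, the string concatenation quoted above might look like this in Python. It is a sketch, not the bot script itself: the function name is made up, and the example values ("first", "plural", and so on) are assumptions, since the discussion does not say what strings the bot passes.

```python
def conjugation_template(person, count, verb, tense, mood):
    """Build the {{esbot:conjugation}} call for one conjugated form."""
    params = [
        ("person", person),
        ("count", count),
        ("verb", verb),
        ("tense", tense),
        ("mood", mood),
        ("ending", verb[-2:]),  # last two letters of the infinitive, e.g. "ar"
    ]
    body = "|".join(f"{name}={value}" for name, value in params)
    return "{{esbot:conjugation|" + body + "}}"
```

For saltar this passes ending=ar, which is enough for a template to pick a category without having to parse the verb.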
Look at {{es-conjline}}. It does the customization magic, and takes mood= conditionally so you don't need a separate conjline_no form. Just pass mood as blank or not at all. You know, if you added ending= to it, you could drop the other esbot:conjugation template since they have the same parameters now ;-) it can do cats (it doesn't matter if the same cat like Spanish verb forms is repeated a couple of times). Then you can leave this template "permanently". The only subst would be the POS heading. Robert Ullmann 17:00, 21 September 2006 (UTC)
My concern with dropping the esbot:conj template is that the categories will be scattered between the definition lines, and I like cleanliness :) I like the idea about dropping the second template as long as we are substing them out, if not, conditional templates are more intensive than replacement ones, and I would rather just have two. - TheDaveRoss 17:16, 21 September 2006 (UTC)
I understand what you mean. The categories aren't visible (being in the template); they are actually on the end of the line, else it is impossible to add anything after it, and I think it would break the # sequence. As to being "intensive", having to load two templates is several orders of magnitude more compute than a conditional, which is almost vanishingly small. (I'd bet that one conditional is less than 0.0001 of the compute time in generating the page.) For the pages that use both templates, you save serious time not loading two different ones. Not that the total matters that much. Note that the required CSS magic has to stay inside a template, not get subst'd into the page. (We don't use HTML on the pages, that really wouldn't be clean!) Robert Ullmann 17:35, 21 September 2006 (UTC)

(taking this back to the left margin if no-one minds ;-) a couple of more things I've thought of. It would be better if the "# " was not in the template, but genned by the bot. This allows someone to add things on the line before or after the conjugation sentence. Much cleaner too!
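
A sketch of what that would mean on the bot side, in Python: the bot writes the "# " itself and the template supplies only the sentence. The parameter names are assumed to mirror the ones discussed for esbot:conjugation, the function name is made up, and mood is simply left out when empty, as suggested for {{es-conjline}}.

```python
def definition_line(verb, person, count, tense, mood=None, note=""):
    """Return one numbered definition line for a conjugated form.

    The "# " list marker is produced here, not inside the template,
    so editors can add text before or after the template call.
    """
    params = [("person", person), ("count", count), ("verb", verb), ("tense", tense)]
    if mood:
        # mood is optional; omit the parameter entirely when not given
        params.append(("mood", mood))
    call = "{{es-conjline|" + "|".join(f"{k}={v}" for k, v in params) + "}}"
    line = "# " + call
    if note:
        line += " " + note
    return line
```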

Also, about the placement of categories. When inside things like templates, code elements like this go in lots of places. If you looked, it would probably fry your brain ... consider the (innocent, infamous) {{en-noun}} if it was used here, this is what the template would generate and pass to the next wikitext eval, and thence to the HTML level:

 [[Category:English nouns]]<span class="infl-inline">'''Beer Parlour''' (''plural'' '''[[Beer parlours]]''')</span>
 <div class="infl-table">
 {| border=0 width=100%
 |-
 |bgcolor="#F8F8FF" valign=top width=49.75%|
 Singular<br>
 '''Beer Parlour'''
 | width=0.5% |
 |bgcolor="#F8F8FF" valign=top width=49.75%|
 Plural<br>
 '''Beer Parlours'''
 |}</div>

And this is just one level of the evaluation, which uses a dozen different computer languages from wikitext to template to perl or C to HTML to HTTP to network code to assembler to machine code. A couple of levels down it looks like this:

 <span class="infl-inline"><b>Beer parlour</b> (<i>plural</i> <b><a href="/w/index.php?title=Beer_parlours&action=edit"
 class="new" title="Beer parlours">Beer parlours</a></b>)</span></p>

 <div class="infl-table">
 <table border="0" width="100%">
 <tr>
 <td bgcolor="#F8F8FF" valign="top" width="49.75%">
 <p>Singular<br />
 <b>Beer parlour</b></p>
 </td>
 <td width="0.5%"></td>
 <td bgcolor="#F8F8FF" valign="top" width="49.75%">
 <p>Plural<br />
 <b><a href="/w/index.php?title=Beer_parlours&action=edit" class="new" title="Beer parlours">Beer parlours</a></b></p>
 </td>
 </tr>

 </table>
 </div>

(see that </p> after the span? that's a bug in the wiki s/w that was brought up recently on WT:GP)

Click on this link, in a new tab or window, then come back here. Between the server, the internet routers, and your PC, you just executed hundreds of millions of machine instructions. The conditional in {{kanji}} that decided whether to cat if the readings are present was at most 500 of those instructions ... less than a millionth of a second ... there are things that are compute intensive ... ({{nav}}), this ain't it. Robert Ullmann 21:52, 21 September 2006 (UTC)

I will certainly move the # out, that does make sense. I will wait until the current set of entries which were bot generated are edited though, before changing the template. If you change the conjline template so it makes the mood param conditional, I will also change that portion of the bot. - TheDaveRoss 14:46, 22 September 2006 (UTC)
Sorry I missed this for a couple of days; I see you've got it. I wrapped the first cat in includeonly; what are you trying to do with the 2nd one? That isn't recognized syntax ... Robert Ullmann 14:03, 25 September 2006 (UTC)
The second category is more of an example of what can be done with that template, I don't have any important purpose for it. I am going to shift those templates around anyway, I think, so I can subst out conjline while keeping a second template for creating categories with, with the intent of making these entries more user-edit friendly. - TheDaveRoss 16:16, 25 September 2006 (UTC)
  • The heading templates have forever plagued en.wiktionary. It has always been taboo to include a template inside a heading (with the only accepted exceptions that I know of, being abbreviations, acronyms and initialisms.) But to have the entire heading contained in a template is always unacceptable, as that breaks section editing. --Connel MacKenzie 05:09, 25 September 2006 (UTC)