User talk:AutoFormat/2007

Definition from Wiktionary, the free dictionary


Newline after L2 headers

First of all, please list what the bot is already doing, since I think this might be something it already does:

The rule should be newline before all headers. There may be an image or something that should be after the L2 header, but then there is a newline before the L3 header. The code is taking the structure about down to L2 now, next will be all the other headers. I've read Connel's (and others') doc 3 or 4 times now; part of the inspiration for this. More doc RSN ;-) Robert Ullmann 13:56, 18 March 2007 (UTC)
It is now handling the spacing around all headers. Robert Ullmann 21:27, 18 March 2007 (UTC)
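A minimal Python sketch of the spacing rule stated above: exactly one blank line before every header, with nothing forced after it. It assumes headers are lines of the form `==Name==` through `======Name======`. The function name `normalize_header_spacing` is hypothetical, and this is not AutoFormat's actual code.

```python
import re

# Assumed header shape: 2-6 equals signs, a title, the same number of equals signs.
HEADER_RE = re.compile(r"^(={2,6})\s*([^=].*?)\s*\1\s*$")


def normalize_header_spacing(wikitext: str) -> str:
    """Ensure exactly one blank line before every header (except at the top)."""
    out = []
    for line in wikitext.split("\n"):
        if HEADER_RE.match(line):
            # Drop any run of blank lines already emitted before this header...
            while out and out[-1].strip() == "":
                out.pop()
            # ...then add back exactly one, unless the header opens the page.
            if out:
                out.append("")
        out.append(line)
    return "\n".join(out)
```

Because blank lines are only collapsed immediately before a header, an image or other line placed directly under an L2 header stays where it is.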

Blank line after ----

I noticed AutoFormat is removing blank lines after the horizontal rule, but whenever someone does a language-section edit, wiki will add it back. Try editing the English section on mina and pressing "show changes" without making any. Seems futile to remove it. Cynewulf 16:37, 23 March 2007 (UTC)

Odd, that... AutoFormat isn't really trying to remove them; it takes all the language sections apart and then regenerates them in sort order, with specific stuff in between. I didn't want to add in space that "belonged" to the previous section, though it doesn't really belong there. Hhum... Robert Ullmann 16:46, 23 March 2007 (UTC)

See also and Related terms as L4

This cannot be automated, since sometimes the related terms and see also apply to more than one POS. Neither can the References, since this may apply to the etymology and not the information in the POS section.

--EncycloPetey 16:49, 24 March 2007 (UTC)

WT:ELE only allows them at L4 under a POS ... Of course, it doesn't mention See also at all, even though we use it everywhere ... shouldn't any references for the etymology be in-lined? Robert Ullmann 17:45, 24 March 2007 (UTC)
No, they are supposed to be L3, generally, according to WT:ELE#Usage_notes. --Connel MacKenzie 20:55, 24 March 2007 (UTC)
Which refers (only) to Usage notes, which can be at any level (well, presumably > 2 ;-) It is listed under "Headings after the definitions", i.e. within a POS section. Usage notes is only moved 3->4 when it is for a single POS.
The discussion about etymology references was begun (though not by me) and never settled. One idea (that wasn't popular) was to have a separate References subsection under Etymology. There are problems with the really needs some discussion and updating. --EncycloPetey 18:07, 24 March 2007 (UTC)
We do have a page on ety format now don't we? There are all sorts of issues. The thing with references for the ety is that they need to be tied to a specific bit most of the time, and that's (I think) why people usually inline them.
The bot does know how many POS sections there are, and what the tree structure looks like; it isn't flying blind ;-) Robert Ullmann 18:11, 24 March 2007 (UTC)
  • Reviewing AutoBot's latest edits, they seem to all be wrong, moving to L4 when they should be L3. --Connel MacKenzie 20:51, 24 March 2007 (UTC)

Where exactly do we want "See also"? Since we don't mention it anywhere? Related terms is L4 under a POS (WT:ELE), and certainly belongs there if there is only one POS. See also in use is most frequently sequenced after Related terms and Derived terms, and before References and External links, those appearing at L4 as WT:ELE specifies, so "See also" is at L4. The bot isn't standardizing this to ELE style if there is any ambiguity (listed with/after multiple POS). Robert Ullmann 21:22, 24 March 2007 (UTC)

Meta comment: WT:ELE is very clear on the structure, and is written very well. If what you are saying is that it is very badly wrong, then we have a very big problem. Robert Ullmann 21:22, 24 March 2007 (UTC)

The bot code should only be moving the headers listed in ELE as L4 under POS sections, as well as others that are clearly in the same class (Conjugation, Declension, etc) when they unambiguously belong to a single POS (usually because there is only one POS ;-). Mind you, I've re-written the header/subsection code about 2 1/2 times, so I'm not sure this was always true.

"See also" is apparently a different (and heretofore undocumented) case, hence the note on BP. At present it gets flagged with rfc if found at L1/2, left at 3+. Robert Ullmann 22:41, 24 March 2007 (UTC)

I disagree that the bot should be second-guessing human edits. When I edit a Usage notes or See also heading to level 3, it is because the heading applies to the entire entry for that language. The presence of only one POS is no guarantee that moving the note to a lower level is appropriate - usually just the opposite! As I pointed out in WT:BP, we should not allow L4 for either of these headings under any circumstances, anyway. --Connel MacKenzie 22:54, 24 March 2007 (UTC)
If you are interested, I'll try and dig up some of the Ncik/Connel flamewars from my talk page archives. --Connel MacKenzie 22:59, 24 March 2007 (UTC)
(Not allow L4? But Usage notes can be anywhere, as you observed) We seem to have two different classes of headers here. There are things like Usage notes and References, which are shown in ELE as L4, but make sense at L3 after the POS sections; and then there are things like Conjugation:

is just wrong. But very common. Translations has to be at L4, in a POS section. But very often isn't. There are lots of entries that simply have everything at L3, because people who are just looking at entries and not ELE don't notice the difference. (network goes down for hours as I am writing ...) Robert Ullmann 11:29, 25 March 2007 (UTC)

Here I am, naively assuming that the primary style policy document on the wikt actually described reality ... silly me.

Seems very clear that ELE needs to explain that some of the headers shown and described at L4 are also used at L3 after all the POS sections. While there are others that can only be at L4 within a POS section.

I've broken out the two classes for the bot, so it can fix things like product, without messing with headers in the other class. Robert Ullmann 12:08, 25 March 2007 (UTC)

I am sorry, but I don't have the time to flit through the edit history, to see when the errors were introduced. ===See also=== and ===Usage notes===, in particular, were always supposed to be at level three (with a tiny number of exceptions.) Those exceptions may have spurred the errors in WT:ELE. But those two certainly should never recommend L4, as that is inappropriate most of the time. --Connel MacKenzie 19:06, 25 March 2007 (UTC)
  • Likewise many of the other headings. The bot must not second-guess when it is appropriate. Even with one POS heading, there is no way for it to not introduce inaccuracies by inappropriately moving the heading level down. --Connel MacKenzie 16:44, 13 April 2007 (UTC)
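The two-class split Robert Ullmann describes in this thread can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions, not the bot's code: the class memberships below are hypothetical and only loosely follow the thread (Translations, Conjugation and Declension as L4-only; Usage notes, References and See also as valid at L3 or L4), and Connel's objection that even the single-POS move can be wrong still applies.

```python
# Hypothetical header classes, loosely following this thread; the real
# lists live in the bot's control table and were still being debated.
POS_ONLY = {"Translations", "Conjugation", "Declension"}  # belong at L4 under one POS
EITHER_LEVEL = {"Usage notes", "References", "See also"}  # valid at L3 or L4


def header_action(name: str, level: int, pos_count: int) -> str:
    """Decide what to do with a non-language header found at `level`."""
    if level <= 2:
        # L1/L2 is reserved for language names, so flag it for cleanup.
        return "rfc"
    if name in EITHER_LEVEL:
        # Editors choose the level for this class; never second-guess it.
        return "leave"
    if name in POS_ONLY and level == 3 and pos_count == 1:
        # Only one POS section exists, so the owner is unambiguous.
        return "move to L4"
    # Anything ambiguous (several POS sections, unknown class) is left alone.
    return "leave"
```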

Delinking language headers

Is this bot supposed to de-link language headers like Basque? (see koala) I thought we allowed languages outside of the Top 40 to have the name of the language link from the header. --EncycloPetey 17:34, 24 March 2007 (UTC)

The WT:40 list is for the languages in the translations tables. The use in language headers seems to have come from the infernal wikilinks in the language templates. (Which would be far more useful if they never had links.) Robert Ullmann 17:52, 24 March 2007 (UTC)
But since they're used for the translations tables, I find that having the templates linked is far more useful. Otherwise, you have to know the English name associated with each and every language code, even if you don't know much English. I was just wondering whether we had ever decided for or against language names being linked from L2 headers. I can't recall ever seeing a definitive statement (or discussion) either way. I'm not particularly in favor of linking them (or not); I merely want to know whether there is policy on the matter. --EncycloPetey 18:09, 24 March 2007 (UTC)
huh? Why would you have to know the English name if you know the code and subst: the template? Yes, you then have to sort it in the right place in the table, but the linking is not going to help with that? All it does is default the linking for a given language name, so you don't have to know the "top 40". (And you really oughtn't be adding a translation for a language into the English entry if you don't even know the English name of your own language?)
as to the language headers and policy, the policy docs that reference the headers never mention links, and over time we have removed a LOT of cruft. Robert Ullmann 18:17, 24 March 2007 (UTC)
It's because (1) your language may be rather obscure, so that it doesn't have a widely known English translation, though you are quite likely to know the ISO code in such a case. This is one of the reasons the French are using ISO code templates as language headers (though we should not). And (2) because your language may have more than one correct English translation (Slovene/Slovenian; Scots Gaelic/Scottish Gaelic) and you might be unaware that one form is preferred over the other on Wiktionary. There are also a number of cases where I as an English speaker would want the language name linked because of possible ambiguity. There are many languages with names similar enough that I get them confused (Malay/Malayalam; and a host of Bantu languages). There are even some English names of languages that apply to more than one language. Standardizing the link when a template is subst'ed allows a user to check to see which language was meant. --EncycloPetey 23:10, 24 March 2007 (UTC)
As I recall, L2 language heading wikification matched Translation section language name wikification exactly. I don't think the distinction was ever an issue to anyone. --Connel MacKenzie 22:57, 24 March 2007 (UTC)

It will be reading the TOP40 list soon; I can improve this. I do want it to fix stuff like [[Korean]] [[Hanja]]. Robert Ullmann 12:11, 25 March 2007 (UTC)


Do you have code in there, to prevent multiple edits to a single entry per day? I think you should, if you don't. --Connel MacKenzie 23:01, 24 March 2007 (UTC)

There is code that prevents it from looking at an entry more than once on the same run; but that doesn't help too much when I am editing code and restarting it frequently (and often need it to look at the same entry again!); it would/will function properly if running in production. Robert Ullmann 11:42, 25 March 2007 (UTC)

Other thoughts

I just read this bot's userpage for the first time.


I will set up a bot to tag everything in my "/todo" lists with {{rfc-auto}} after this next XML dump. Would it help you to know which list it is coming from? E.g. {{rfc-auto|todo2}} or would that be superfluous?

Is there a parameter to give to {{rfc-auto}} that tells autobot to replace it with {{rfc}} if it can't fix it (only under certain preconditions)? Or {{wikify}} or {{substub}} as appropriate?


Didn't you say you are a lawyer? Where do you find the time?

--Connel MacKenzie 23:10, 24 March 2007 (UTC)

Thanks! I don't think there is anything useful the bot can do with the parameter, though I did think it might copy it into the edit summary. It tags headers that shouldn't be at L2 (or L1!) with rfc; I'd like to see it tag all headers that aren't recognized (whether fixable or not), but getting the control table to that point needs more work (as seen above ;-). It never adds a second rfc tag to an entry. I'm not sure what a parameter saying "fix this specifically or rfc" would be useful for. Oh, and IANAL ;-) Robert Ullmann 12:17, 25 March 2007 (UTC)
To be precise: if you tag an entry with {{rfc-auto|[[User:Connel MacKenzie/todo2|todo2]]}} the edit summary will start with "rm tag:todo2" Robert Ullmann 15:24, 25 March 2007 (UTC)

I am experimenting with adding {{rfc}} for any remaining un-recognized header. Any variants and erroneous forms that can be automatically corrected (and occur more than one random time) should be added to the control tables. And note the last table, for headers that are wrong, but need specific attention. Robert Ullmann 18:58, 25 March 2007 (UTC)

By the way, did you scour User:Connel MacKenzie/reformat.js for additional rules for your control file(s)? I probably have some leftover/outdated cruft in there now. But every automatic change there is because there were more than a dozen entries with any given mistake, that I semi-auto-correct in there. --Connel MacKenzie 19:04, 25 March 2007 (UTC)
Yes, but it is probably worth doing again, I picked up the most likely stuff. Robert Ullmann 19:08, 25 March 2007 (UTC)
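The {{rfc-auto}} parameter behaviour described in this section (a tag of `{{rfc-auto|[[User:Connel MacKenzie/todo2|todo2]]}}` yields an edit summary starting "rm tag:todo2") can be sketched as follows. Only that summary prefix comes from the discussion; the function name `remove_rfc_auto`, returning `None` when there is no parameter, and also removing the newline after the tag are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# {{rfc-auto}} with an optional single parameter, which may itself be a piped link.
RFC_AUTO_RE = re.compile(r"\{\{rfc-auto(?:\|([^{}]*))?\}\}\n?")
# [[target|label]] -> label, [[target]] -> target
LINK_RE = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")


def remove_rfc_auto(wikitext: str) -> Tuple[str, Optional[str]]:
    """Remove the first {{rfc-auto}} tag; return (new text, summary prefix or None)."""
    m = RFC_AUTO_RE.search(wikitext)
    if m is None:
        return wikitext, None
    summary = None
    if m.group(1):
        # Reduce any wikilink in the parameter to its displayed label.
        label = LINK_RE.sub(r"\1", m.group(1)).strip()
        summary = "rm tag:" + label
    return wikitext[:m.start()] + wikitext[m.end():], summary
```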

higher being

The bot had an awful time reformatting higher being. --EncycloPetey 20:09, 25 March 2007 (UTC)

I noticed that. I was trying to tag it at one point, and someone else edited it and removed the tag, and then the bot code hit it again. It is trying to sort L2 headers as languages. I'm thinking about that; at some point it should probably be giving up ;-) Is amusing (at least to me). What should we do with entries that are beyond help? Appeal to a ... never mind. Robert Ullmann 22:13, 25 March 2007 (UTC)
Well, I have been working on that list of languages and ISO templates. There might be a way to dump the list into the bot for making comparisons before it sorts header sections. Does it first compare L2 headers against known L3 headers? That would help somewhat...if it could adjust the header level before sorting. --EncycloPetey 23:26, 25 March 2007 (UTC)
The bot reads User:AutoFormat/Languages which is written by the same code that writes my L2 list. It also has the header table, that's how it is generating the rfc tag. What is probably needed is some logic telling it to punt once it hits that error, or anything else similarly egregious. That is why I was watching what it was doing to higher being. Robert Ullmann 11:33, 26 March 2007 (UTC)

Pronunciation N

moved from my user page, refactored, (EP signature duplicated here and next section)

You might wish to add Pronunciation 1 (2, 3) to the list. --EncycloPetey 22:57, 25 March 2007 (UTC)

Since when are Pronunciation N entries legit? They should be ety entries I should think? Robert Ullmann 23:11, 25 March 2007 (UTC)
Not in cases where the etymology is the same for two or more pronunciations; each of which pertains to a different part of speech. I've just edited abject to show you what I mean. Compare the format of the pronunciation section(s) before and after my edit. This sort of situation is relatively uncommon overall, but common enough that I encounter it on a regular basis. --EncycloPetey 23:10, 25 March 2007 (UTC)
Then that is a structure I've not seen before. I've only seen cases with note in the Pronunciation section. Is this described somewhere? Robert Ullmann 23:13, 25 March 2007 (UTC)
Officially, I don't think so, but it's one of the conventions that's been in quiet use for some time. I forget where I first saw it used. It is my intention to make a case for this sort of structure in certain limited cases of English heteronyms like abject. It makes much better use of our hierarchical data structure, for one thing. For another, I find it much easier to parse as a user. --EncycloPetey 23:24, 25 March 2007 (UTC)
Hmmm... We have seen multiple pronunciation sections (without numbering). The list, BTW, is at User:AutoFormat/Headers. It is read by the bot code (and thus also protected sysop/sysop, as there is a huge potential for mischief). Robert Ullmann 23:27, 25 March 2007 (UTC)
See second for an even better example...where the pronunciation applies to one verb but not another, yet there is a single etymology. I'm mostly suggesting that the bot ignore these, and assume they're legit for now rather than tagging them all. --EncycloPetey 23:31, 25 March 2007 (UTC)
Moi, being a cretin even though I speak some French, pronounce them exactly the same ... ;-) You can add these to the table, follow the lead for the numbered etymologies. Or I will, presently. (At some point soon, they take magic coding changes too, so that the header levels below in the structure work, I'll worry about that!) Robert Ullmann 23:36, 25 March 2007 (UTC)
  • WTF? "Pronunciation" is the heading. It is not valid, if numbered. It gets "Translations" section style disambiguation, if needed. --Connel MacKenzie 05:24, 26 March 2007 (UTC)
  • No need to overact the shock and surprise WTF business every other post Connel. What's the big deal? We've always been bold with formatting when the guidelines don't seem to quite fit. Hell it's from just that kind of experimentation that we got to the still imperfect semistandard we have and then codified it. Nobody believes it's perfect. There's plenty of room to experiment and to discuss it. There's no need to pretend to faint for god's sake. Just discuss the idea without the dramatics. — Hippietrail 20:42, 9 April 2007 (UTC)
    Excellent point. Drama sucks. --Connel MacKenzie 16:42, 13 April 2007 (UTC)

Transitive and Intransitive verb

Also, I *think* we eventually decided that Transitive verb and Intransitive verb should simply be "verb" with the {{transitive}} or {{intransitive}} template inserted at the head of each definition. This is a trickier issue than I'd want a bot to handle, but it would be great if your bot tagged these in some way and categorized them for people who like to do that sort of repair editing. --EncycloPetey 22:57, 25 March 2007 (UTC)

Yes, Transitive/Intransitive verb should be fixed, but not by a bot I don't think. If we had a category, we should tag all of them with the category. (fairly easy to do). Robert Ullmann 23:11, 25 March 2007 (UTC)
simply removing them from the table will get them tagged with {{rfc}}. But perhaps something more sophisticated? Robert Ullmann 23:31, 25 March 2007 (UTC)
But these cases often require that two Verb sections be merged, along with all the synonyms, antonyms, quotations, translations, etc. I wouldn't want to be the guy to write a bot that could cleanly do that kind of merge given the typical state of our data! The entries with separate headers are usually the ones that have not had recent attention from experienced editors. --EncycloPetey 23:34, 25 March 2007 (UTC)
I'm just suggesting a bot run to tag all of them with something other than rfc, preferably something invisible, that puts them in a specific attention cat. Not something to try to fix them! (Although: I suspect a decent number of cases only have one or the other, and a single def line: changing the header to Verb and adding the tag to the def line would be fine.) But as you say, most require careful attention. Robert Ullmann 23:59, 25 March 2007 (UTC)
A separate "invisible" cleanup category sounds like the best approach. Note that many times "transitive" or "intransitive" is simply wrong...the verbs can be used as either, which allows for some consolidation of the definitions given. (There never was a vote on the topic, but I learned from those conversations that it is only useful to identify when a verb cannot be used transitively. Everything else is just "regular" or "normal.") --Connel MacKenzie 16:40, 13 April 2007 (UTC)
I have encountered the transitive/intransitive verb format inconsistency, and while I've worked around it for my purposes I am collecting a list of those entries which have the problem. I agree with the idea, "that Transitive verb and Intransitive verb should simply be "verb" with the {{transitive}} or {{intransitive}} template inserted at the head of each definition" suggested by EncycloPetey. Is there a place where entries using the nonconforming style are accumulating? Probably better would be a way to mark them for subsequent review by human or bot. Whatever, I'm a newbie interested in looking for ways to improve things and am willing to help where I can. Makearney 15:24, 7 May 2007 (UTC)
The cat Category:Entries with non-standard headers would collect these, but right now any header in the control table is not tagged with the cat, even if marked NS. (Would be too many right now.) So these aren't included. We could create such a thing without too much trouble. Robert Ullmann 15:36, 7 May 2007 (UTC)
Maybe you could take a look at that category; the simplest thing would be to take Transitive and Intransitive verb out of the table, and they would start being collected. I'll think about it. Robert Ullmann 15:39, 7 May 2007 (UTC)
I have a bunch of questions: I looked at Category:Entries with non-standard headers. Are you suggesting that I add my list of verbs with non-standard headings (currently 94 verbs) to Category:Entries with non-standard headers? I'll be glad to do that if that's what you mean. However, it sounds as if modifying "the control table" would be a better way to go since it sounds as if the non-standard headers would be discovered automatically if I understand correctly. What is the "control table"? Is it accessible to a general user or is it in a bot somewhere? It might be accessible anyway I suppose. Discovering trans/nontrans headers is one task. Fixing them is another. Is there any reason why I should not just fix any I would like to have fixed? That should not be too difficult since it is a form rather than a content edit. --Makearney 15:26, 14 May 2007 (UTC)
The table that tells AutoFormat what headers to consider standard/non-standard/unknown is User:AutoFormat/Headers, go take a look. Right now it flags only headers that don't appear in the table at all. The question is whether the bot should flag tr/intr verb. By all means fix any you have that you'd like to fix! (And it is a semantic-content edit, as you will find when you start combining the definition lists: simple cases are easy, but there are more complex ones, and you will find errors.) Robert Ullmann 12:59, 15 May 2007 (UTC)
Well I fixed a handful, and will plod on. I believe some low-level automation would help here though. I am concerned about making errors in basic cut-and-paste operations, to say nothing of the tedium. I am thinking along the lines of functions/macros one could invoke to remove transitive/intransitive from a header and insert "Template:transitive/intransitive" at the beginning of definition lines, or to merge two sections, be they definitions, translations or whatever. I've seen a description of assisted bots. Is the wiki API effective for this sort of thing? I certainly could do this if I could get an entry's source text. Someone must have thought of this. Can you recommend a pointer? I am certain I will only have variations on these sorts of problems in the future.--Makearney 13:40, 17 May 2007 (UTC)
You might look at AWB. Might be helpful. Robert Ullmann 01:26, 29 May 2007 (UTC)

See section below as well. Robert Ullmann 01:26, 29 May 2007 (UTC)

From the feedback I got after making some changes, I don't think a final decision was made. It appears that there are two ways to proceed: 1. The EncycloPetey approach, in which we mark every definition transitive or intransitive. 2. The Ullmann approach, in which we mark all intransitive definitions; for verbs that have both intransitive and transitive usage we mark each definition, and we don't mark verbs that are exclusively transitive. We have rejected the option of separate Transitive and Intransitive headers. So how do we reach a decision here? I've stopped making edits for now. Makearney 12:57, 31 May 2007 (UTC)

There isn't any conflict; EP was only objecting to you removing {{transitive}} tags. The headers have been deprecated for a while, they need to be replaced with the tags. I am not telling you not to use the tags! The only observation I made is that often neither the header nor the tag is needed. If you aren't sure, then just always add the transitive tag when you take out the transitive header. Robert Ullmann 05:42, 1 June 2007 (UTC)

To be clear about the conclusion of this discussion: The ELE section marked, Non-standard, deprecated headers, says that "Transitive Verb" and "Intransitive Verb", while still in use, are being replaced. "Transitive Verb" should be replaced by Verb with {{transitive}} on the definition lines as appropriate. "Intransitive Verb" should be Verb with {{intransitive}} on the definition lines as appropriate.
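The "simple case" mentioned earlier in this section (only one of the two headers, nothing to merge) can be sketched as below: rename the header to Verb and, following the advice to always add the tag when unsure, prefix every definition line with {{transitive}} or {{intransitive}}. This is a minimal, hypothetical Python illustration (`fix_simple_verb_header` is not an existing tool); it returns `None` whenever a Verb section already exists or both headers appear, since those cases need a human merge, and, as noted above, even the simple case is a semantic edit that deserves review.

```python
import re
from typing import Optional

# "===Transitive verb===" / "====Intransitive verb====", any capitalization.
TV_HEADER_RE = re.compile(r"^(={3,6})\s*(transitive|intransitive) verb\s*\1\s*$", re.IGNORECASE)
VERB_HEADER_RE = re.compile(r"^(={3,6})\s*verb\s*\1\s*$", re.IGNORECASE)
# A definition line: "#" not followed by "#", ":" or "*".
DEF_RE = re.compile(r"^#(?![#:*])\s*(.*)$")


def fix_simple_verb_header(entry: str) -> Optional[str]:
    """Handle only one (In)transitive verb section with no existing Verb section."""
    lines = entry.split("\n")
    hits = [i for i, line in enumerate(lines) if TV_HEADER_RE.match(line)]
    if len(hits) != 1 or any(VERB_HEADER_RE.match(line) for line in lines):
        return None  # nothing to do, or sections that need a human merge
    i = hits[0]
    m = TV_HEADER_RE.match(lines[i])
    equals, kind = m.group(1), m.group(2).lower()
    lines[i] = equals + "Verb" + equals
    # Tag each definition line until the next header of any level.
    j = i + 1
    while j < len(lines) and not lines[j].startswith("="):
        d = DEF_RE.match(lines[j])
        if d:
            lines[j] = "# {{" + kind + "}} " + d.group(1)
        j += 1
    return "\n".join(lines)
```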


Hi, you requested a cleanup for minun because the header "Personal pronoun" is invalid. So...should the same thing be done e.g. to I and ich? There's also the same header. -- Frous 19:23, 29 March 2007 (UTC)

I think Williamsayers is working on this; it should be "Pronoun" with a definition line tag using context tag personal. We'll go chase the others once we are happy with it ... (some of the odd headers are also some of the oldest entries in the wikt, that predate the standards that are being established as an on-going process) Robert Ullmann 19:28, 29 March 2007 (UTC)

Moving categories

I strongly oppose that AutoFormat moves categories together at the end of a language section. I think category tags should go there where they are most meaningful for editors, i.e. if it has to do with multiple etymologies, it should go in the etymology section, if it’s about prescriptivism, then it should go in the Usage notes section etc. See die. Please move that code out of the bot, or first raise a discussion about it. H. (talk) 15:21, 1 April 2007 (UTC)

Actually, the standard now is to move all the cats to the end of the entire entry. (!) There is going to be a discussion soon on BP, because we need a vote on keeping them inside/moving them inside the language section. Robert Ullmann 15:36, 2 April 2007 (UTC)
No, that is the standard on Wikipedia. I think it is generally accepted here that we put categories on more meaningful places. H. (talk) 14:44, 6 April 2007 (UTC)
Connel will tell you that the existing policy is to move them to the end. (I concur that what WP does doesn't matter; but it did and does have an effect, because the pywikipedia category code routinely moves them to the end unless you are very careful to tell it not to.) You are right, we generally put them at the end of the language section (that is about the 70% case; I ran some stats). Robert Ullmann 15:16, 6 April 2007 (UTC)
the end of the entry. I've heard compelling arguments for "end of language section" but never for having them willy-nilly throughout an entry. --Connel MacKenzie 16:35, 13 April 2007 (UTC)

{{top}} to {{trans-top}}

At Connel's request, the bot code has been taught to change these to the new format. Robert Ullmann 15:36, 2 April 2007 (UTC)

This is only happening when the bot can capture the gloss. I found that there were a lot of entries with top/mid/bottom and no gloss (usually single sense), and one or two translations, or none at all. Better left for now I think. Robert Ullmann 11:32, 5 April 2007 (UTC)
I thought {{trans-top}} has a cleanup category for things with no heading? Wouldn't that suffice? --Connel MacKenzie 14:43, 7 April 2007 (UTC)
Yes, that is the cat DAVilla didn't want to flood. I'm planning on running some stats on where top is used, so we can figure out what the cases are. It might be useful to automatically fix top->top2 when outside trans sections; there used to be a lot of these, but I fixed quite a number in the Han formatting process. Robert Ullmann 15:02, 7 April 2007 (UTC)
There are (6.4.7) 23690 occurrences of top in translations sections, 398 in ttbc, and 671 in other sections, mostly related and derived terms. But see entries like key (look at the section for transitive verb). Robert Ullmann 15:01, 8 April 2007 (UTC)
15024 entries that use top. 12228 of the template calls don't have a gloss; if we converted them all that cat would be very big. Robert Ullmann 15:50, 8 April 2007 (UTC)
Wow. Well, no ideas for automating, nor even for semi-automating the addition of a gloss of the first definition (when gloss is absent) are presenting themselves. Hmmm. --Connel MacKenzie 16:32, 13 April 2007 (UTC)
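The conversion rule discussed in this section (convert {{top}}/{{mid}}/{{bottom}} to the trans-* templates only when a gloss can be captured, and leave gloss-less tables alone) can be sketched in Python as follows. The gloss format is an assumption: the sketch treats a line directly above {{top}} written as `;gloss` or `'''gloss'''` as the gloss, and passes it as the first parameter of {{trans-top}}. It also only touches tables under a Translations header, since many {{top}} calls sit in Related and Derived terms sections and should not become translation tables. `convert_top_tables` is a hypothetical name.

```python
import re
from typing import List

HEADER_RE = re.compile(r"^=+\s*(.*?)\s*=+\s*$")
# Assumed gloss forms on the line directly above {{top}}: ";gloss" or "'''gloss'''".
GLOSS_RE = re.compile(r"^(?:;\s*(.+?)|'''(.+?)''')\s*:?\s*$")


def convert_top_tables(text: str) -> str:
    """Convert top/mid/bottom to trans-top/mid/bottom in Translations sections,
    but only for tables whose gloss can be found; everything else is unchanged."""
    out: List[str] = []
    section = ""
    in_table = False
    for line in text.split("\n"):
        s = line.strip()
        header = HEADER_RE.match(s)
        if header:
            section = header.group(1)
        if s == "{{top}}" and section == "Translations" and out:
            gloss = GLOSS_RE.match(out[-1])
            if gloss:
                # The gloss line is folded into the trans-top call.
                out[-1] = "{{trans-top|" + (gloss.group(1) or gloss.group(2)) + "}}"
                in_table = True
                continue
        if in_table and s == "{{mid}}":
            out.append("{{trans-mid}}")
        elif in_table and s == "{{bottom}}":
            out.append("{{trans-bottom}}")
            in_table = False
        else:
            out.append(line)
    return "\n".join(out)
```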

unknown and non-standard headers

Are now tagged with {{rfc-header}} rather than rfc; I don't want to flood the rfc cat with classes of headers that we may want to either accept or handle with some automation. (E.g. "Verb form"). Headers in the table marked as NS are not tagged yet. Robert Ullmann 11:32, 5 April 2007 (UTC)

I'm going to add a whole bunch more of these. You seem to be missing the Lojban headings. --Connel MacKenzie 14:42, 7 April 2007 (UTC)
I have "Cmavo". Why do we have these at all? They are just the Lojban terms for particle, root, and affix. Shouldn't they be in English? Note that they should get the NS flag. Robert Ullmann 15:00, 7 April 2007 (UTC)
I agree they should be in English, but everyone that I have spoken to about Lojban who speaks it, disagrees. I have no desire to learn that language, and little desire to pursue it. But before they go getting marked nonstandard universally, we should have an About Lojban page, and a separate bot fired off to correct them all as a result of the community discussion. --Connel MacKenzie 16:29, 13 April 2007 (UTC)
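A sketch of the header lookup described in this section: headers missing from the control table are tagged with {{rfc-header}} rather than {{rfc}}, known variants are rewritten, and headers marked NS are not tagged yet. The dictionary below is a hypothetical stand-in for User:AutoFormat/Headers (whose real format is not shown here); "Cmavo" is marked NS as suggested above, and the misspelling entry is invented purely to show a rewrite. Any parameters the real {{rfc-header}} takes are not modelled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the control table at User:AutoFormat/Headers:
#   None  -> standard header
#   "NS"  -> known but non-standard (not tagged yet)
#   other -> a variant that should be rewritten to that header
HEADER_TABLE: Dict[str, Optional[str]] = {
    "Noun": None,
    "Verb": None,
    "Pronunciation": None,
    "Transitive verb": "NS",
    "Intransitive verb": "NS",
    "Cmavo": "NS",
    "Pronounciation": "Pronunciation",  # invented example variant
}


def check_header(name: str) -> Tuple[str, Optional[str]]:
    """Return (action, detail) for a single header name."""
    if name not in HEADER_TABLE:
        # Unknown header: tag with {{rfc-header}} rather than plain {{rfc}}.
        return "tag", "{{rfc-header}}"
    target = HEADER_TABLE[name]
    if target is None or target == "NS":
        # Standard, or known non-standard that is deliberately left untagged for now.
        return "leave", None
    return "rename", target
```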

Korean definitions need

May I remind you, Mr. Bot, that Korean syllables such as are often no more than a meaningless phoneme? Why do you ask me to make all of them sememes, which is simply impossible? --KYPark 10:37, 9 April 2007 (UTC)

Syllable should be followed by # lines, for the things that are meaningful. See for example how the phoneme is handled in Mandarin pinyin.
This also answers the next question. Robert Ullmann 11:15, 9 April 2007 (UTC)
I admit it may be hard for Westerners and Chinese to imagine there could be some meaningless syllables, which could be as such as any alphabets. Such is the case at least with most of 11,172 Korean syllables, which are called as such because they are phonemes rather than sememes. No need I answer the next question, unless you answer this properly. Cheers anyway. --KYPark 12:48, 9 April 2007 (UTC)
Of course they/we can. The entries get # lines with the corresponding hanja or other meaningful correspondence; if there aren't any, then the entry isn't needed at all. We don't need an entry for each of the blocks if they have no other significance. (We don't have ry or nif, even though they are perfectly good English syllables used in the previous sentence.)
The entry at needs no "Syllable" section, the entry at need not exist at all; it contains no information whatsoever. Robert Ullmann 13:43, 9 April 2007 (UTC)
Why need such entries as b, c, and other nonsensical units by name of Symbol or Letter, exist? They need because they are building blocks, I guess. While happens to be a deadlock syllable, leads to such words as 너무 in so to speak Derived terms, which are well demonstrated at . Regardless of its meaning, a Korean syllable can serve as an initial, stepping stone, or fan-out to a number of words. To find "comedy" in the dictionary, you need to find the syllable "com-" first regardless of its meaning, whether "together" or otherwise. To find 나라, you may better find first, which may not only mean "I" but also show you "nation" under Derived terms of the Syllable. It should be not too bad to find 나라 through , though both are semantically unrelated. --KYPark 14:58, 9 April 2007 (UTC)
We have entries for single letters because English dictionaries have entries for single letters. I still don't have a decent Korean dictionary in my collection so I have no idea if they usually include entries for single letters or single syllables.
Korean creates a grey area in that letters are composed into blocks which at least people who don't know Korean will think of as characters. That they are characters is reinforced by computers which treat composed syllables as the basic units of Korean most of the time. We have plenty of issues in English where people make their decisions based on how ASCII treats the English alphabet. This makes me expect people to base their ideas of the Korean language on how Unicode or other character sets treat Korean letters and composed syllables.
I think we'd do best by having entries for each Korean letter and none for composed syllables. But if many people disagreed, I'd go with entries for every Unicode character, including composed Korean syllables, if that were the consensus. — Hippietrail 21:12, 9 April 2007 (UTC)

Sorry but to the left margin.

The question of Korean syllables in Wikt may well be related to that of soft or electronic pages, which are entirely free and different from the evenly divided hard copy. Soft pagination has yet to evolve into diverse forms, likely through trial and error. The Wikt word-by-word pagination is just one approach, far from being all there is.

The English index fans out to 26 AB pages, each with 26 ab sectional anchors. To find "cony", for example, you click on the /C/ (page) in the Index, and then click on the /co/ (anchor) in the Table of Contents of the /C/ page, hence two clicks. I've just inserted the Direct access section to help reduce this to one-click access to specific anchors.

One serious problem may be that the AB pages, especially /S/, are too long. Wikt has to pump out that much text to let someone find a single word. Thus the word list might better be divided into 26 x 26 short pages than into just 26 long ones. Then the short page containing "cony" may be named "co", whether as Syllable or otherwise.

Then such nonsensical syllables would be mixed in with real words in the common namespace. So far I am not sure how serious the adverse effect of such mixing would be, especially within the common Korean (Hangul) namespace. I am sure that most languages within Wiktionary are mixed into an extremely multiplexed alphabetic namespace. But in practice we rarely if ever care about that, as the entity identification system is well organized, such that the entity "co" is identified as English - Syllable.

In summary, the common Korean (Hangul) namespace, as opposed to the Index:Korean namespace that may be unnecessary, would not be adversely affected by a maximum of one or two thousand nonsensical syllables co-existing with the real words. --KYPark 08:28, 10 April 2007 (UTC)
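The 26 x 26 split proposed above can be sketched as grouping a word list by its first two letters. This is a hypothetical illustration only. The function name `bucketByPrefix`, the lower-casing, and the treatment of one-letter words (each gets its own one-letter page) are assumptions. None of it describes an existing Wiktionary index feature.

```javascript
// Hypothetical sketch: group an English word list into two-letter index pages
// ("ca", "co", ...) instead of 26 long single-letter pages.
function bucketByPrefix(words) {
  const pages = new Map();
  for (const word of words) {
    // Assumption: keys are lower-cased; a one-letter word gets a one-letter key.
    const key = word.toLowerCase().slice(0, 2);
    if (!pages.has(key)) pages.set(key, []);
    pages.get(key).push(word);
  }
  // Sort the words on each page, then the pages themselves, by code-unit order.
  for (const list of pages.values()) list.sort();
  const keys = [...pages.keys()].sort();
  return new Map(keys.map((key) => [key, pages.get(key)]));
}

const indexPages = bucketByPrefix(["cony", "comedy", "cat", "nation", "a"]);
// indexPages.get("co") holds ["comedy", "cony"]: finding "cony" means opening one short page.
```

The trade-off is more pages in exchange for shorter ones. A reader still needs one click per level, but each page stays small even for heavy letters like /S/.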

The nonsensical Chinese syllable (?) hao, for example, in the common alphabetic namespace would be justified, as it leads to the four real sounds hāo, háo, hǎo, and hào of Chinese ideographs. By analogy, then, nonsensical Korean syllables such as , , for example should be justified in the common (rather than Index:Korean) Hangul namespace, it seems to me, as they lead to some real Korean words. --KYPark 10:32, 10 April 2007 (UTC)

This may be really hard for you to understand, but there are people here who understand all this. Two things:
Please follow the standard format. See . List the Hanja characters after the Syllable header with #. No bogus "hanja homophones" header. Do give meanings/glosses where useful. Do not include blank sections ("Derived terms" with no content).
Please make entries only when there is some content. contains no information whatsoever. hao refers the reader to the forms with diacritics; is just a random composed block with nothing useful.
Another point is: were we to have an entry for each composed block, don't you think we should generate them automatically? Are you going to make all 11000+ by hand, and then leave them all for someone else to clean up the formatting mess? Robert Ullmann 14:20, 11 April 2007 (UTC)

Haven’t read all of this, but it was sort of agreed on that every Unicode character deserves its own page. But if it is not meaningful in any language, then the header should be ==Translingual== (or ==Symbol==, but that is still under discussion), with an explanation of what kind of symbol it is. So for a Hangul syllable that would be ‘combination of the letters X and Y’. H. (talk) 11:38, 13 April 2007 (UTC)

Entries with non-standard headers[edit]

May I ask you, Mr. Bot, to let me know what the standard header for such a section would be, instead of my Hanja homophones? --KYPark 10:38, 9 April 2007 (UTC)

See previous. No header at all; a # list of the Hanja, preferably with the short meanings, but at least the characters. This makes a very useful entry in the same style as we use for the Chinese languages and Japanese.
The bot isn't tagging either of these things for now, but eventually will be again. Cheers! Robert Ullmann 11:18, 9 April 2007 (UTC)

See . The "Derived terms" should probably be in an Index though; they aren't "derived" from the noun or syllable. But in no case should "Derived terms" appear if it is blank. Robert Ullmann 14:47, 11 April 2007 (UTC)

Korean characters in computing[edit]

Computing systems do not normally bother with Chinese radicals or Korean letters (jamo) as units, while, ironically, they do handle English letters individually. This may be due to system constraints. In the Chinese computing context, therefore, characters or text processing units are composite ideographs rather than radicals. In Korean computing, characters are syllabic blocks rather than letters.

A remarkable difference lies between Chinese characters, which are semantically more explicit or rigid, and Korean characters, which are phonetically more explicit but semantically more implicit, dynamic, or flexible, and which are as worthy of an entry in Wiktionary as Chinese and English characters, regardless of specific meaning. The practice of the traditional hard-copy dictionary should not be regarded as the ultimate authority over the revolutionary Wiktionary, a soft, electronic, multi-lateral kind of dictionary. I have not heard of any hard-copy dictionary with Translations, for example. Too easy is the justification that Wiktionary has entries for English characters because dictionaries do.

As English characters are worth an entry, so are Korean, however numerous, especially those which are actually used anyway. The short 뇄다 for 뇌었다 (past tense of 뇌다) may be entered and categorized under the class name or . Therefore, these potential class names and need to be entered as symbolic entities in Wiktionary, don't they, ladies and gentlemen? Everybody, please comment. --KYPark 07:51, 11 April 2007 (UTC)

I was part of the ISO 10646/Unicode process, and there was utterly no technical reason for including the precomposed blocks at all; rendering systems would have no trouble rendering the Jamo. (They can render Arabic and Devanagari, with even stranger rules.) The Chinese ideographs are different, you can't do more than approximate a character with (e.g.) IDS. It was only after enduring years of relentless nationalistic POV screaming by the S. Korean contingent that we allowed all these entries. (Then there was more screaming that we moved the resulting block above Han Unified. So sorry. Not.) Robert Ullmann 14:32, 11 April 2007 (UTC)
var arrayC = "g kk n d tt r m b pp s ss ^ j jj ch k t p h".split(" ") ; arrayC[11]="" ;  // initial ㅇ (index 11) is silent
var arrayV = "a ae ya yae eo e yeo ye o wa wae oe yo u wo we wi yu eu ui i".split(" ") ;
function doRR(x) {
  var n = x.charCodeAt(0) - 44032 ;     // offset from U+AC00 (가)
  var c = Math.floor(n / 588) ;         // initial consonant: 588 = 21 vowels * 28 finals
  var v = Math.floor((n % 588) / 28) ;  // vowel; any final consonant (n % 28) is dropped
  return arrayC[c] + arrayV[v] ;
}
Thank you for your frank comment. But I am afraid you have gone too far backward, to the pre-Unicode age and into a strange jungle beyond reason and relevance. We have only to aim to make the best, or better, use of Unicode as given here. There is indeed no need to recall how it came into being, and how annoyed everyone was during the process by relentless S. Korean nationalists; however true, that helps in no way.
The above simple JavaScript helps with the Revised Romanization of Unicode Hangul blocks without a final consonant. I use it in recent Wiktionary edits, as you may guess. Such easy scripting was not possible with the previous KS code, which covered only the Hangul blocks in actual use; it was made possible by Unicode itself, as allowed "after enduring years of relentless nationalistic POV screaming by the S. Korean contingent," as you described.
This single technical merit may be enough to prove that your utterance of "utterly no technical reason" is utterly false. Should it be true, appallingly serious is the implication that not only the S. Korean contingent but also the ISO/Unicode authority concerned, including you, made a big mistake. You seem to be screaming to defeat everybody there, including yourself, self-defeatingly! This would be one of the unspeakable things, I fear.
Your utterance would be most relevant, eloquent, and effective in rallying against the really blind, unreasonable, and relentless S. Korean nationalists whom I myself also hate so much indeed. To our dismay, however, Wiktionary may not be the right place for such rallying. Your comment seems to suggest that I am one of them. Do you really mean it? At the moment I simply wish everybody, including you, could realize that I would remain neither blind nor chauvinist, but as reasonable as possible. "So sorry. Not." --KYPark 03:58, 12 April 2007 (UTC)
As English characters are worth an entry, so are Korean, however numerous, especially those which are actually used anyway. NO. Only those that are actually used. Composed blocks that have no use are pointless. (For everyone else's benefit, these 11,000+ composed blocks are all the mathematically possible combinations of Jamo; whether they are used is a different issue. By comparison, the 70,000+ Han characters are all coded because they all have been used.) Robert Ullmann 14:32, 11 April 2007 (UTC)
NO. There is no great difference, I am very glad to say, between my "especially those" and your "only those." Please don't misunderstand me as aiming for entries for all those silly syllables of no use whatsoever. I am not mad. But my Maginot line is an entry for each of the 399 syllabic blocks without a final consonant. Cheers. --KYPark 03:58, 12 April 2007 (UTC)
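The counts in this exchange (11,000+ composed blocks, 399 without a final consonant) follow from the arithmetic Unicode uses to lay out the precomposed Hangul syllables starting at U+AC00. This is the same arithmetic the romanization snippet above relies on. A minimal sketch extending it to full decomposition is below. The function names are hypothetical. The letter tables use Hangul Compatibility Jamo purely for display, which is an assumption and not Unicode's own canonical decomposition (that maps to conjoining jamo). A decomposed block could also supply the "combination of the letters X and Y" description suggested earlier.

```javascript
// Precomposed Hangul syllables occupy U+AC00..U+D7A3 in this order:
//   code = 0xAC00 + (initial * 21 + vowel) * 28 + final
const S_BASE = 0xac00;
// Display letters (Hangul Compatibility Jamo), listed in Unicode's index order.
const INITIALS = Array.from("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"); // 19 initial consonants
const VOWELS = Array.from("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"); // 21 vowels
const FINALS = [""].concat(Array.from("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")); // none + 27

const OPEN_SYLLABLES = INITIALS.length * VOWELS.length; // 399 blocks with no final consonant
const ALL_SYLLABLES = OPEN_SYLLABLES * FINALS.length;   // 11172 blocks in all

// Build a syllable from 0-based indexes into the tables above.
function composeHangul(initialIdx, vowelIdx, finalIdx = 0) {
  return String.fromCharCode(
    S_BASE + (initialIdx * VOWELS.length + vowelIdx) * FINALS.length + finalIdx
  );
}

// Split a syllable into its letters, or return null if it is not a precomposed block.
function decomposeHangul(syllable) {
  const n = syllable.charCodeAt(0) - S_BASE;
  if (Number.isNaN(n) || n < 0 || n >= ALL_SYLLABLES) return null;
  const finalIdx = n % FINALS.length;
  const vowelIdx = Math.floor(n / FINALS.length) % VOWELS.length;
  const initialIdx = Math.floor(n / (FINALS.length * VOWELS.length));
  return { initial: INITIALS[initialIdx], vowel: VOWELS[vowelIdx], final: FINALS[finalIdx] };
}
```

For example, `composeHangul(2, 0)` gives 나 and `composeHangul(0, 0, 1)` gives 각. Each of the 399 open syllables is the first of 28 consecutive code points, and the other 27 add a final consonant.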
It is 2:00 local time. I have sat up waiting for your response, but can do so no longer. --KYPark 17:03, 12 April 2007 (UTC)
It is 2:49 local time. (Nairobi, almost Saturday UTC). I see you understand what I have had to deal with in the past ... I do want to see useful entries, those with content; and have sections in entries only when the sections have content.
We might very well create entries with Syllable and the Hanja for every one that we have the Hanja characters for, I did this with Mandarin Pinyin with bot code. Robert Ullmann 23:53, 13 April 2007 (UTC)
How nice was your weekend? You can undo whatever injustice your 'bot has done me. The sooner the better for YOU! AND! this is the very prerequisite for our helping each other on Wiktionary. --KYPark 10:05, 16 April 2007 (UTC)
Injustice? Isn't that a bit extreme? All it has done is tag a few entries for the non-standard headers. And the tag isn't even visible, it just makes them show up in the category, where we can sort things out. (And those headers are now listed as non-standard in the control file, so it isn't even tagging any more for now.) Cheers Robert Ullmann 13:28, 16 April 2007 (UTC)

Sorry but to the left margin.

  • As a sysop and bot master, you are supposed to be expert in, or at least very familiar with and keen on, the various wikt policies, especially the well-established WT:POS and WT:AJA, as noted on your User page. But I strongly refer you to 1.1 Flexibility of WT:POS, to 2.1.2 Kanji reading of WT:AJA, and lastly to WT:AK, which, to my regret, is very poor. In this regard, together with some language-specific considerations, I believe I reserve the right to "experiment" with Hanja homophones and especially Hanja reading, which should not be so bad as to be marked by the rfc-header or "Categories:Entries with non-standard headers," though it may be appropriated anyway by someone.
    • Your resolution: e.g., "No, you have no such right."
    • Your resolution: e.g., "No, you have no such a right."
  • Please refer me to the relevant policy that empty sections should never be attempted.
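    • Your resolution: e.g., "No, there's no such policy."
  • What exactly do you mean by content? Can it be some physical rather than semantic description of a Hangul Syllable, similar if not equivalent to Letter, including the Unicode value, constituent letters, pronunciation, romanization, Hanja reading, relation or link to other syllables, etc.? Isn't this useful content?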
  • What exactly do you mean by content? Can it be some physical rather than semantic description of a Hangul Syllable similar if not equivalent to Letter, including the Unicode value, used alphabets, pronunciation, romanization, Hanja reading, relation or link to other syllables, etc.? Aren't these useful contents?
    • Your resolution: e.g., "No, it cannot be as such."
  • Do you agree with User:Hippietrail, who suggests that Wikt might better do without Hangul syllabic entries (pages), hence sections, if traditional dictionaries do so, while keeping those for the English alphabet? The 11000+ (roughly 2000+ used, including 400- basic) Hangul syllables swing between the 70000+ Hanja and the 50 Kana syllables. Hangul simply enumerates far more than Kana, for better or worse.
    • Your resolution: e.g., "Yes, I do."
  • Do you accept my claim that at least the 399 basic Hangul syllables without a final consonant should have an entry, regardless of their semantic content and actual use, as they are vital stems from which a maximum of 27 more syllables with final consonants may be derived and referred to?
    • Your resolution: e.g., "No, I don't."

The entry at needs no "Syllable" section, the entry at need not exist at all; it contains no information whatsoever. Robert Ullmann 13:43, 9 April 2007 (UTC)

  • Is this still your firm stance?
    • Your resolution: e.g., "Yes, surely."
all very nice

But doesn't change the fact that the entry itself has no content whatsoever. Robert Ullmann 15:01, 12 April 2007 (UTC)

What exactly do you mean by content? # content? --KYPark 15:17, 12 April 2007 (UTC)
  • The entry has been changed, hopefully enough to justify its existence. It is still "nominated for deletion." Do you still mean it? I would call this the worst injustice. Oh no robot, Robert. Am I making, or are you shooting, too much trouble? Cheers anyway. --KYPark 03:53, 17 April 2007 (UTC)
    • Your resolution: e.g., "Yes, I mean it."
I filled out "Your resolution" to show how easily you could at least do that. Please tell me if it is still too hard for you. --KYPark 11:49, 18 April 2007 (UTC)
Trying to put words in my mouth is both offensive, and a cheap rhetorical device that completely discredits you.
The entry at was properly tagged for RfD as devoid of content, it now has content, and won't be deleted, the rfd will be struck in the normal process.
The bot tagged several headers that it did not recognize; they have been added to the table so that it won't tag them again. Of course they can be experimented with and developed; Wiktionary:About Korean is a work in progress.
Everything else you attribute to me is completely false and is nothing but a personal attack. Robert Ullmann 12:09, 18 April 2007 (UTC)
Have you blocked me? Isn't it sort of abuse? Excuse me but I aligned your paragraphs with ::: to avoid confusion.
If you say, "Everything else you attribute to me is completely false and is nothing but a personal attack," for the reasons since "Sorry but to the left" at most, you are lying, as I was simply asking you some questions, and exemplifying how simply you could answer. Please do me justice and answer the questions as reasonably as you can. These have exhausted me so much that I have stopped editing. We will get least angry but remain most reasonable, shall we? KYPark -- 14:18, 18 April 2007 (UTC)

This is wrong[edit]

Do not let Autoformat do incorrect edits like this! --Connel MacKenzie 16:26, 13 April 2007 (UTC)

Why is it wrong, Connel? The synonym can only possibly be for the noun, which is the only POS. Is there a verb "last straw" floating about that is not taken into consideration? Parts of speech have synonyms, and antonyms, and translations. "words" (e.g. spellings, as you so often emphasize ;-) with more than one part of speech might have "related terms", but that is only when someone has been sloppy about sorting the terms under the correct POS. But in this case there is utterly only one possible POS.
Note that Hippietrail has already restored the correct AutoFormat edit. And look at trade, where the structure was wrong (Translations is not a sub-section of Synonyms) and AF fixed it correctly by correcting the level of Synonyms. It does a lot of this. People use L3 all the time when they shouldn't, following bad examples. Robert Ullmann 23:47, 13 April 2007 (UTC)
I agree with Robert. I've been keeping my own eye on what AutoFormat does. 95% (and more) of the time, it either fixes a problem or at least makes a necessary change that introduces no new error. Since the last revisions to the bot, it will still occasionally swap one error for a different one, but only because there was a serious format error to begin with. The fact that a bot is looking for these problems means that we're noticing them more, not that they weren't there before. --EncycloPetey 00:01, 14 April 2007 (UTC)
And look at question, where AF would have fixed the structure entirely—and correctly—except for the insistence that "Derived terms" can somehow (?) be applied to more than one POS. (This can be fixed in User:AutoFormat/Headers by changing the level back to 4.) It does know what the structure is, and what it should be in most cases, it isn't just doing some blind regex. It takes the entire entry apart by sections. Robert Ullmann 00:40, 14 April 2007 (UTC)
I'll emphasize this: the structure of question at the previous edit was wrong, AF would have fixed it if it was allowed to move Derived terms to L4 where it belongs in this case. Robert Ullmann 00:51, 14 April 2007 (UTC)
You are saying that last straw can't be used as a verb? Not now, not in the past and not in the future? Yes, L4 is wrong. --Connel MacKenzie 04:07, 14 April 2007 (UTC)
Re: question - yes, AutoFormat goofed again. The word query most certainly is a noun and a verb, and is synonymous in both POS. So yes, it looks to me like your bot is "guessing" wrong. --Connel MacKenzie 04:11, 14 April 2007 (UTC)
At question, the Synonyms were/are listed after the noun, and before the verb; so they applied only to the noun, and only the header level was wrong. If query is to be listed as a synonym for the verb, it has to be after both, right? Look at the structure before the edit; the edit was exactly correct: that list of synonyms is intended for the noun.
Look at before the edit: after the noun, there are synonyms, derived terms, and translations for the noun (as a subsection of derived terms ...), then verb, then translations for the verb. There are no synonyms provided for the verb. Then look at the result, which is structured correctly. Then also look at the edit to black, which has both Related and Derived terms applying to both POS, which it also handled correctly. Robert Ullmann 12:14, 14 April 2007 (UTC)
If someone adds a verb sense for last straw (unlikely), and the synonym applies (extremely unlikely), they will either move synonyms to the end at L3, or correctly following WT:ELE style, add a synonyms section for the verb sense. Robert Ullmann 11:11, 14 April 2007 (UTC)
  • Likewise, Hippietrail is wrong. You could shut me up with a WT:VOTE, if you really think you are right. --Connel MacKenzie 08:22, 14 April 2007 (UTC)

Connel, it is you that wants to change ELE (which would take a vote ;-). It is very clear that these headers are all normally at level 4; it shows that in the examples, and then lists them as "Headings after the definitions", i.e. within the POS section. Under Derived terms, it says explicitly: If it is not known from which part of speech a certain derivative was formed it is necessary to have a "Derived terms" header on the same level as the part of speech headings. In other words, that is an exception when there is more than one POS and it can't be resolved to the standard L4 heading within the POS. The normal case (which should be the 99% case) is that Derived terms is an L4 header. Synonyms always is, there isn't any question what POS it is a synonym for. This is what ELE says, and what we are doing. Robert Ullmann 11:42, 14 April 2007 (UTC)
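One conservative reading of the rule described above can be sketched as follows: subsections like Synonyms sit at L4 under their POS, and L3 is an exception only when more than one POS could be meant. This is hypothetical and not AutoFormat's actual code. The function name and the header sets are illustrative assumptions. When a language section has more than one POS header, the sketch deliberately changes nothing and leaves the case for a human.

```javascript
// Hypothetical sketch: within ONE language section, demote L3 subsection headers
// that follow the only part-of-speech header to L4. Header sets are illustrative.
const POS_HEADERS = new Set(["Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Interjection"]);
const POS_SUBSECTIONS = new Set(["Synonyms", "Antonyms", "Derived terms", "Related terms", "Translations"]);
const HEADER = /^(={2,6})\s*([^=].*?)\s*\1\s*$/; // ==Title==, ===Title===, ...

function demoteUnderSinglePos(lines) {
  const headers = lines.map((line) => line.match(HEADER));
  const posCount = headers.filter((m) => m && POS_HEADERS.has(m[2])).length;
  if (posCount !== 1) return lines; // zero or several POS: ambiguous, leave alone
  let seenPos = false;
  return lines.map((line, i) => {
    const m = headers[i];
    if (!m) return line;
    if (POS_HEADERS.has(m[2])) {
      seenPos = true;
    } else if (seenPos && m[1].length === 3 && POS_SUBSECTIONS.has(m[2])) {
      return `====${m[2]}====`; // belongs to the single POS above it
    }
    return line;
  });
}
```

Refusing to act on multi-POS sections is the design choice here. Those are exactly the cases where the thread disagrees about which POS a list belongs to.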

No, what you are doing is enforcing "instruction creep" - precisely the opposite of what had been the convention for years, particularly for synonyms! --Connel MacKenzie 12:25, 15 April 2007 (UTC)
It isn't instruction creep; the instructions are simple (this set of headers goes at L4, except in a few cases like derived terms that can't be associated with a POS, the example is clear), and we aren't making it more complicated, nor trying to modify behaviour with ever increasing instructions.
The conflict is what Richard was trying to resolve with the (overly complicated) policy process: various conventions exist, shift, get taken in different directions, and never get written down. (and no, two-year-old BP discussions don't count, although I've read them all). In this case, the conflict is between what is (A) written policy, used by most users, used by a very large majority of entries, and (B) what you recall as the "convention for years" .... (;-)
We are at the beginning of scale: in a very short time the wikt will have many thousands of editors and many millions of entries; we don't want to make the instructions more complicated, except where absolutely necessary; but we must actually use our written policy, not be following conventions that have been sort of in the air, but not quite written, or telling people (as WT:ELE does!) "go ask a senior person".
The idea behind AF is that it will fix a lot of the simple things that people do without reading the instructions (the major reason why instruction creep is pointless), and we don't have to go about snapping at newbies. (Or those that have been around for a while, and are still writing "Pronounciation" or whatever. ;-) Robert Ullmann 15:29, 15 April 2007 (UTC)
  • That is the ultimate "irresponsible bot operator" justification I've ever heard. If you want a bot-war, then get ready. Otherwise, stop using your automation to "enforce" something that does not have consensus. --Connel MacKenzie 04:00, 17 April 2007 (UTC)
We have both consensus and well-written and well-established policy. I frankly don't know where you are coming from. We can debate whether WT:ELE is "real" on the BP. Robert Ullmann 11:15, 17 April 2007 (UTC)
Also note that I have been checking every single edit and all of these are correct. Robert Ullmann 11:20, 17 April 2007 (UTC)
No. Your interpretation of that policy is erroneous. You cannot be certain of a given part-of-speech for most derived terms, synonyms and antonyms. The fuzzy wording is the rule not the exception.
Did I miss your WT:VOTE for bot approval for AF? While possible, that seems highly unlikely. --Connel MacKenzie 15:23, 17 April 2007 (UTC)
  • Well, I am content with the "rfc-heading" approach. Back in the day, all "level 2" headings were languages, and all other headings were "level 3." Obviously, I don't find the etymology subdivision layout useful now (yes, I do recall arguing in favor of it, at one point.) I can't tell what the future will bring. But auto-correcting them to L4 still is wrong, to me. --Connel MacKenzie 18:07, 11 May 2007 (UTC)

wikipediapar and cattag[edit]

Code added to change wikipediapar to wikipedia and cattag to context (both presently redirects) Robert Ullmann 13:25, 16 April 2007 (UTC)
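A minimal sketch of this kind of rename is below. It assumes names are matched case-sensitively, and only when followed by `|` or `}`, so that parameters are left untouched and longer names are not caught. The function name is hypothetical.

```javascript
// Hypothetical sketch: replace redirecting template names with their targets.
const TEMPLATE_RENAMES = { wikipediapar: "wikipedia", cattag: "context" };

function renameTemplates(wikitext) {
  // Match "{{name" only when the name is complete, i.e. followed by "|" or "}".
  return wikitext.replace(
    /\{\{\s*(wikipediapar|cattag)\s*(?=[|}])/g,
    (match, name) => `{{${TEMPLATE_RENAMES[name]}`
  );
}
```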

{{rfc-level}} tag[edit]

Code added to tag entries that exhibit structural problems that can't be fixed by AF. Code to change header levels disabled for now. Robert Ullmann 13:15, 18 April 2007 (UTC)

Thank you. --Connel MacKenzie 14:03, 18 April 2007 (UTC)
I don't suppose you could adjust it so the tag is placed within the section headed by the marked heading? It's a bit annoying that we can't fix the problem and remove the tag just by editing the relevant section. —RuakhTALK 16:29, 22 April 2007 (UTC)
Good idea, done. Robert Ullmann 14:17, 26 April 2007 (UTC)
Thanks. :-) —RuakhTALK 14:53, 26 April 2007 (UTC)


Please don't auto-subst pagename. Or, if you do, perhaps you could add yet-another cleanup tag. I (in the past couple years) have seen very few exceptions to the rule that people who somehow use the preload templates wrong usually do LOTS of other things wrong as well. On the other hand, the once-an-XML-dump method of reviewing them has worked pretty well. --Connel MacKenzie 22:53, 26 April 2007 (UTC)

Hmm, I recall the comment like that on your todo page. So far, AF hasn't sub'd any PAGENAME I've noted other than A-cai leaving it in the sortkey of a category. (Oh, and Opterein) And you got a whole new batch on the last XML-but-one that weren't from preload templates. Lemme think about this, ought to be able to do something. (In the meantime, I am watching everything ;-)
What I don't get is how the preload templates affect this? How does someone prevent the subst:? (Do they explicitly remove it?) Robert Ullmann 23:46, 26 April 2007 (UTC)
That, I don't know. I never did get time to go back and review all the preload templates (Special:Prefixindex/Template:new en ). But I don't see how they are slipping in, offhand. I doubt they are all old. --Connel MacKenzie 00:20, 27 April 2007 (UTC)
This new batch seems to be all User:Gilward Kukel (Esperanto) and User:Eric Utgerd (Crimean Tatar), both making decently formatted entries but leaving the PAGENAME in. Robert Ullmann 00:28, 27 April 2007 (UTC)
Yup...the first half, anyhow. {{new eo proper noun}}? Well, anyhow, yes, I rescind my request, provided you are still manually reviewing them one-by-one. --Connel MacKenzie 00:36, 27 April 2007 (UTC)
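If substitution is done, a sketch like the following keeps a count, so that each edit can still be reviewed by hand as agreed above. It is hypothetical. The function name and return shape are assumptions, and it handles only a bare `{{PAGENAME}}`, with no `subst:` prefix and no other magic words.

```javascript
// Hypothetical sketch: replace left-over {{PAGENAME}} with the entry title and
// count the replacements so the edit can be flagged for manual review.
function substPagename(wikitext, title) {
  let count = 0;
  const text = wikitext.replace(/\{\{\s*PAGENAME\s*\}\}/g, () => {
    count += 1;
    return title; // a function replacement stops "$" in a title acting as a pattern
  });
  return { text, count };
}
```

A non-zero `count` is the signal to look at the entry, since (per the comment above) the same editors often leave other problems behind.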

Pronunciation and/or Etymology at L4 at start of language section[edit]

These two headers often appear at L4 at the start of a language section; it is not obvious because the WM s/w displays the TOC correctly as if they were at L3 (unless both appear).

AF fixes this case while checking for Etymologies at L3. Robert Ullmann 14:08, 15 May 2007 (UTC)
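A sketch of that fix for a single language section is below. It assumes the section has already been split out, as the bot does. It considers only unnumbered `Pronunciation` and `Etymology` headers, and the names are hypothetical.

```javascript
// Hypothetical sketch: promote L4 Pronunciation/Etymology headers that appear
// before any L3 header in a language section to L3.
const LEADING_HEADERS = new Set(["Pronunciation", "Etymology"]);
const HEADER_LINE = /^(={2,6})\s*([^=].*?)\s*\1\s*$/;

function promoteLeadingHeaders(sectionLines) {
  let seenL3 = false; // set by an original L3 header, not by a promoted one
  return sectionLines.map((line) => {
    const m = line.match(HEADER_LINE);
    if (!m) return line;
    const level = m[1].length;
    if (level === 3) {
      seenL3 = true;
    } else if (level === 4 && !seenL3 && LEADING_HEADERS.has(m[2])) {
      return `===${m[2]}===`;
    }
    return line;
  });
}
```

Promoted headers do not set `seenL3`, so when both appear at L4 (the case where the TOC shows the problem), both are promoted.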

Usage notes[edit]

νεανίας was recently tagged with rfc-level for having "Usage notes" at L4. The example at WT:ELE has usage notes at L4. It is not specified what level UN should be at. May I ask for some clarification here please? Atelaes 20:57, 15 May 2007 (UTC)

At the time, Usage notes was at L4 in the "Inflection" section (which is in fact okay in this case, but then the related terms and references were also nested inside Inflection ;-) Hence the comment that putting Inflection at L4 would fix the problem ... The tagging is not perfect, working around not being able to fix Inflection to L4 is not fully compensated. (If it was allowed, it would have changed Inflection to L4 as you did, and not complained about anything.) Robert Ullmann 23:51, 15 May 2007 (UTC)
Ah, I see now. I keep forgetting that your little bot is intelligent enough to analyze the overall structure of the entry. Thanks. Atelaes 23:53, 15 May 2007 (UTC)

Another feature?[edit]

Is it possible to add the "minor" edit of changing "(''vulgar slang'')" to "{{context|vulgar slang}}"? This conceivably could have much greater variety in corrections than headings, particularly for multiple tags. --Connel MacKenzie 17:07, 16 May 2007 (UTC)

Hmmm, interesting idea. Would we want a list of tags to recognize? (I should think so; perhaps something generated from the template collection? ...) Would need to add the lang= parameter as well, or lots of presently uncategorized entries would land in the English cats. AF already has the code table. (Would need to invert it, but that is one line of code ;-) Will think on it ;-) Robert Ullmann 14:28, 17 May 2007 (UTC)

So looking at the cases of # (''something'') we find ...

See User:Robert Ullmann/Contexts.

and there is certainly something AF could be doing. Robert Ullmann 16:23, 17 May 2007 (UTC)

<low appreciative whistle> Well, if there are less than 5, and no existing template, perhaps they should be relegated to a cleanup list of some sort.
Wow. --Connel MacKenzie 17:14, 17 May 2007 (UTC)
I tried adding the simplest case as a test: only English, only single tags (so it doesn't generate {context|...|...}), only the ones that appear in the XML (using the report above as a control table).
See fulminant, pound, fly, have eyes bigger than one's belly, spunk, but also beaver in which it made a mistake (fixed). There are a number of considerations; this might be better as a separate task:
  1. handling language and language code; apparently there is some sub-class that wants the language name? The doc at {{context}} is not clear
    When the language name is required, which is rare, the label templates pass the lang code to {{language}}. DAVilla 15:57, 18 May 2007 (UTC)
  2. phrases with qualifiers in {{context/modifier}}
  3. multiple tags, when only one is a known label
  4. phrases like "by extension"; do we want to make this {{context|by extension}}
  5. the many descriptive phrases that make up most of the low occurrence counts (not) in the table above
  6. removing redundant category links (! didja think of that one? ;-)
I'm going to let it try some more, and then turn the simple code off for now. Robert Ullmann 13:54, 18 May 2007 (UTC)
See gusset, garnet, hi. Note at gusset only one of three definitions has a known label, while "Armor" is a category. Robert Ullmann 15:29, 18 May 2007 (UTC)
Some phrases like "by analogy" and "by extension" are okay to context, but it wouldn't be a good idea to do in general, especially for phrases like "Of people" and "Of animals". (My preference for these specifically is to take them out of parentheses, but I don't know if that's controversial.) I would think that the existence of a label template or a category is a sufficient heuristic. DAVilla 15:42, 18 May 2007 (UTC)

I have let it run for a while, doing a number of edits that all look good. reboot is a very interesting case. As noted, there is some serious work before this can be made more general. Turned off for now. Comment please. Robert Ullmann 00:10, 19 May 2007 (UTC)
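The simplest case described above might look roughly like this: English only, one tag per line, and only tags found in a control table. The label set, the two parenthesis/italic patterns, and the function name are assumptions for illustration. Unknown or multiple tags are left alone, in line with the caution about phrases like "Of people" and unknown labels like "Armor".

```javascript
// Hypothetical sketch: turn "# (''tag'') ..." or "# ''(tag)'' ..." into
// "# {{context|tag}} ..." when the single tag is a known label.
const KNOWN_LABELS = new Set(["vulgar slang", "slang", "informal", "archaic", "obsolete"]); // illustrative
const TAGGED_DEFINITION = /^#\s*(?:\(''([^'()|]+)''\)|''\(([^'()|]+)\)'')\s*(.*)$/;

function contextify(line) {
  const m = line.match(TAGGED_DEFINITION);
  if (!m) return line; // not a tagged definition line (or "#:", "##", etc.)
  const label = (m[1] || m[2]).trim().toLowerCase();
  if (label.includes(",") || !KNOWN_LABELS.has(label)) return line; // several or unknown tags
  return m[3] ? `# {{context|${label}}} ${m[3]}` : `# {{context|${label}}}`;
}
```

Adding `lang=` for non-English sections, splitting multiple tags, and removing redundant category links are all left out, matching the list of open considerations above.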


This seems like a pretty high number of edits for such a short period of time. Please do not burn yourself out. --Connel MacKenzie 17:29, 17 May 2007 (UTC)

I do get very tired of changing ''m'' to {{m}} and the endless mid to trans-mid (remember—I am told endlessly, every time through the bloody section loop—do that only if you changed top to trans-top, and that only if you could find the gloss. Sigh ...) My doctor (like I have a doctor) would warn me about spending so much time concerned with "translations" and "gender". Not healthy.
And today? My boss spends the whole day "taking tea" and watching "Test Cricket" from some place called "Lord's" of all things! While I work. Bloody pompous [redacted] arsehole.
I put in my hours. (well, microseconds, but you know)
Not bloody fair, I say. Where are my FA Cup final tickets? No? Well now, do I get to sit at the bar watching it on telly and taking gin and tonic? AutoFormat 23:02, 17 May 2007 (UTC)
Uppity bot. Robert Ullmann 13:30, 18 May 2007 (UTC)
On the topic of translated gender, what gender do you suppose AF is? (Might help to choose a pronoun...then again, with all that translating going on, I guess we'll never know for sure!) --Connel MacKenzie 22:12, 23 May 2007 (UTC)
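The translation-table rules AutoFormat grumbles about above can be sketched as follows. This is a hypothetical illustration, not the bot's code. It assumes the gloss sits on a `;gloss` line directly before `{{top}}`. It handles only `''m''`, `''f''` and `''n''` on `*` translation lines, and it leaves `{{bottom}}` alone because the discussion does not say what happens to it.

```javascript
// Hypothetical sketch of the rules quoted above:
//  * ''m'' (also ''f'', ''n'') -> {{m}} etc. on "*" translation lines;
//  * {{top}} -> {{trans-top|gloss}} only if a gloss is found;
//  * {{mid}} -> {{trans-mid}} only if the preceding {{top}} was changed.
function tidyTranslations(lines) {
  const out = [];
  let changedTop = false;
  for (const original of lines) {
    let line = original;
    if (line.startsWith("*")) {
      // Lookarounds keep bold '''m''' from matching.
      line = line.replace(/(?<!')''([mfn])''(?!')/g, "{{$1}}");
    } else if (line.trim() === "{{top}}") {
      // Assumption: the gloss is a ";gloss" line just before {{top}}.
      const gloss = out.length > 0 ? out[out.length - 1].match(/^;\s*(.+?)\s*$/) : null;
      changedTop = gloss !== null;
      if (changedTop) {
        out.pop(); // the gloss moves into the template
        line = `{{trans-top|${gloss[1]}}}`;
      }
    } else if (line.trim() === "{{mid}}" && changedTop) {
      line = "{{trans-mid}}";
    }
    out.push(line);
  }
  return out;
}
```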

Possible new feature[edit]

AutoFormat does not seem to switch a definition line like:


  • A crocodile

As the use of * for # is a not uncommon mistake, I'd recommend it be able to make the switch. --EncycloPetey 17:31, 17 May 2007 (UTC)

Thought about this. Often it is as you say; just someone using * for #. But there are other cases, where there is something else going on. There isn't any syntactical pattern to recognize a definitional sentence or phrase, except in the very simple cases. Which is why the # is so important, and why we don't use it anywhere except definition lines.
So IMHO better to tag it and have someone look at it. Robert Ullmann 13:38, 18 May 2007 (UTC)
Look at, for example, nigher in which the * line is in fact a definition, but it also needs a bit more work. Robert Ullmann 15:15, 18 May 2007 (UTC)
If I may add my two cents: I agree wholeheartedly with Robert on this one. I don't do that change in my manually-reviewed semi-automatic Javascript as it is more often wrong. --Connel MacKenzie 21:51, 23 May 2007 (UTC)
In the very simple cases, where what follows the * is a template used only in definition lines, e.g. {{misspelling of}} or {{plural of}}, then it should be fine to (semi-)automatically fix. Thryduulf 20:39, 15 July 2007 (UTC)
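The narrow case suggested just above can be sketched as follows. Everything else stays a `*` line for a human to review, as Robert prefers. The template list and the function name are illustrative assumptions.

```javascript
// Hypothetical sketch: switch "*" to "#" only when the line starts with a
// template that is used solely on definition lines.
const DEFINITION_ONLY_TEMPLATES = new Set(["misspelling of", "plural of"]); // illustrative

function fixStarDefinition(line) {
  const m = line.match(/^\*\s*\{\{\s*([^|}]+?)\s*[|}]/);
  if (m && DEFINITION_ONLY_TEMPLATES.has(m[1])) {
    return "#" + line.slice(1); // replace only the leading "*"
  }
  return line; // ordinary text could be anything, so leave it (or tag it)
}
```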

Order for POS

Why did AutoFormat make this switch [1] ? I thought we normally put Adjective before Noun because of alphabetical order? --EncycloPetey 17:35, 17 May 2007 (UTC)

That edit was by User:WikiPedant; AF did the following edit (adding trans-top). He/she should not have switched it: as you say, the POS order was correct. AF doesn't try to order POS sections; there are too many L3 sections that should have been L4, and they would get misplaced. Robert Ullmann 18:43, 17 May 2007 (UTC)
Sorry, I must have misread and blended two edit lines together in my mind while reading. --EncycloPetey 18:53, 17 May 2007 (UTC)


I noticed that AutoFormat put [[Category:English nouns that lack inflection template]] into an English plural here. Could you set it so that it won't put the category into articles that use {{plural of|}}? Thanks, Tim Q. Wells 17:16, 19 May 2007 (UTC)

It only knows it has to generate the missing inflection line, and thus that someone should be looking at the entry. Hence the category. You notice you removed the inflection line entirely by reverting; plan to put it back? Robert Ullmann 17:20, 19 May 2007 (UTC) Tx ;-) Robert Ullmann 17:21, 19 May 2007 (UTC)
Yes, I put it back. Sorry for the misunderstanding (I didn't realize I had forgotten the heading). Tim Q. Wells 17:23, 19 May 2007 (UTC)
Tx, this is AF's whole purpose, to go around being picky mostly so we don't have to. Course I watch cricket while it works. (see above ;-) Robert Ullmann 17:28, 19 May 2007 (UTC)

{{Acronym}} not an L3 POS section header?

I'm not sure what the problem is with RAM; is ==={{Acronym}}=== not considered a level-3 part-of-speech section header? Is it that it should be ==={{acronym}}===? —RuakhTALK 03:18, 20 May 2007 (UTC)

It's still treating those odd headers as special cases. I say odd because as a general policy we don't have templates in headers; this should be the header ===Acronym=== with a POS template on the headword line, like everything else. So AF has several places where it has to treat these differently. (Putting them in the table doesn't work 'cause they take parameters, and thus don't match ...) Will go look. Robert Ullmann 16:51, 20 May 2007 (UTC)
I thought I'd handled that, and I had. If the header was ==={{acronym}}=== like it should be, it would be fine. The problem is the upper case Acronym, redirects to acronym. Suppose I should look for that too and fix it ... between Acronym and Initialism and Abbreviation there are 450+ of these, I guess they are worth hunting down ;-) Robert Ullmann 17:15, 20 May 2007 (UTC)
Fixed. Will correct templates. Robert Ullmann 17:27, 20 May 2007 (UTC)
Cool, thanks. :-) —RuakhTALK 18:59, 20 May 2007 (UTC)
  • Heading note - removed the "=" signs from this section to eliminate this false-positive bad heading. --Connel MacKenzie 19:36, 13 August 2007 (UTC)
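A minimal sketch of the case fix Robert describes: fold a header like `==={{Acronym}}===` (and the Initialism and Abbreviation variants) to the lowercase template name that the capitalized one redirects to. The regex and function name are hypothetical; any parameters after `|` are carried over unchanged.

```python
import re

# Header templates that should be lowercase ("Acronym" etc. are redirects to these).
HEADER_TEMPLATES = {"acronym", "initialism", "abbreviation"}

# e.g. "==={{Acronym}}===", possibly with parameters after "|".
TEMPLATE_HEADER = re.compile(r"^(=+)\s*\{\{\s*([^|}]+?)\s*(\|[^}]*)?\}\}\s*(=+)\s*$")


def lowercase_header_template(line: str) -> str:
    """Lowercase a known header template name; leave every other line alone."""
    match = TEMPLATE_HEADER.match(line)
    if not match:
        return line
    left, name, params, right = match.groups()
    if name.lower() not in HEADER_TEMPLATES or name == name.lower():
        return line
    return left + "{{" + name.lower() + (params or "") + "}}" + right
```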

Demonstrative pronoun

An article ये was tagged by AutoFormat for having a nonstandard header, "demonstrative pronoun". It's a pretty useful and accurate header for the word. I couldn't find a list of acceptable POS anywhere, but it's in Category:Parts of speech. Thanks - Taxman 18:49, 23 May 2007 (UTC)

But the heading is ===Pronoun===. The more specific information can be put on the "inflection line" right after the heading as {{context|demonstrative pronoun}} perhaps. Even better would be to have that at the start of each applicable definition line. --Connel MacKenzie 21:47, 23 May 2007 (UTC)

Feature request - rfc-trans

When you encounter an entry with a translation section, but the number of translation subsections does not match the number of definitions above, can AF add the cleanup tag {{rfc-trans}}? Or perhaps even stub in the needed subsections (I imagine figuring out a gloss would be way too much to ask, right?) --Connel MacKenzie 21:54, 23 May 2007 (UTC)

Never mind. The number of exceptions to the rule is staggering. --Connel MacKenzie 22:08, 23 May 2007 (UTC)
Indeed. There are many entries with definitions that do not need translations sections. Couple of examples: definitions that are "misspelling of" in the same POS as a legitimate word, definitions tagged obsolete, archaic, maybe poetic, and so on. Robert Ullmann 15:16, 24 May 2007 (UTC)

Converting context labels

A bit of explication of process:

  1. first, build a table of the context templates from the XML, with redirects identified
  2. in each definition line, break out a string at the start that is # (''...'')
  3. break string on commas, remove other '' marks
  4. for each tag, separate context modifiers
  5. if template matches, use it, else quit
  6. if more than one template, or modifier + template etc, add context|
  7. if language is known, and not English, add |lang=code, if not known, quit
  8. for each template, remove duplicate category code:name if present

The known languages and the code table come from the language templates in the wikt. Robert Ullmann 15:27, 24 May 2007 (UTC)
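The steps above can be sketched roughly as follows. This is a hypothetical illustration, not AF's code: the three tables are tiny stand-ins for the ones built from the XML dump in step 1, the modifier list is an assumption, and step 8 (dropping duplicate categories) is not shown. Any unknown label or language makes the function return the line untouched, mirroring the "else quit" rules.

```python
import re

# Step 1 (hypothetical stand-ins): label -> template name, with redirects resolved.
CONTEXT_TEMPLATES = {"slang": "slang", "informal": "informal", "US": "US", "American": "US"}
MODIFIERS = {"chiefly", "mostly"}  # assumed context modifiers
LANGUAGE_CODES = {"English": "en", "French": "fr", "Spanish": "es"}

# Step 2: a definition line that starts with # (''...'')
LEADING_LABELS = re.compile(r"^#\s*\(''(.+?)''\)\s*(.*)$")


def convert_context(line: str, language: str) -> str:
    match = LEADING_LABELS.match(line)
    if not match:
        return line
    labels, rest = match.groups()
    params = []
    for tag in labels.replace("''", "").split(","):  # step 3
        words = tag.split()
        # Step 4: peel leading modifiers such as "chiefly" off the tag.
        while len(words) > 1 and words[0] in MODIFIERS:
            params.append(words.pop(0))
        template = CONTEXT_TEMPLATES.get(" ".join(words))
        if template is None:  # step 5: no matching template, so quit
            return line
        params.append(template)
    use_context = len(params) > 1  # step 6: several labels, or modifier + label
    if language not in LANGUAGE_CODES:  # step 7: unknown language, so quit
        return line
    if language != "English":
        params.append("lang=" + LANGUAGE_CODES[language])
    if use_context:
        params.insert(0, "context")
    return ("# {{" + "|".join(params) + "}} " + rest).rstrip()
```

For example, `# (''slang, chiefly US'') A thing.` in an English section becomes `# {{context|slang|chiefly|US}} A thing.`, while a single label stays a bare template.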


Why didn't AF do the trans-top thing? Couldn't it count the number of languages in a given section, and stick the trans-mid in, too? --Connel MacKenzie 19:26, 24 May 2007 (UTC)

It is changing top to trans-top, folding the gloss into the trans-top call when found. The entry quackery doesn't have a top template. Robert Ullmann 01:03, 25 May 2007 (UTC)
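A sketch of the top/trans-top rule, including the condition the bot complains about earlier on this page: convert `{{mid}}` only when this table's `{{top}}` was converted, which in turn requires a gloss. Where the gloss lives is an assumption here (a `'''gloss'''` or `;gloss` line directly above `{{top}}`), the function name is hypothetical, and `{{bottom}}` is left as-is.

```python
import re
from typing import List

# Assumed gloss styles on the line just above {{top}}: '''gloss''' or ;gloss
GLOSS_LINE = re.compile(r"^(?:'''(.+?)'''|;\s*(.+?))\s*$")


def convert_translation_table(lines: List[str]) -> List[str]:
    out: List[str] = []
    converted = False  # did this table's {{top}} become {{trans-top}}?
    for line in lines:
        stripped = line.strip()
        if stripped == "{{top}}":
            gloss = GLOSS_LINE.match(out[-1].strip()) if out else None
            if gloss:
                # Fold the gloss into the trans-top call, replacing the gloss line.
                out[-1] = "{{trans-top|" + (gloss.group(1) or gloss.group(2)) + "}}"
                converted = True
                continue
            converted = False  # no gloss found: leave {{top}} alone
        elif stripped == "{{mid}}" and converted:
            out.append("{{trans-mid}}")
            continue
        elif stripped == "{{bottom}}":
            converted = False  # the next table starts fresh
        out.append(line)
    return out
```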

Multiple pronunciation sections

Take a look at this edit. AutoFormat should be able to cope with this without becoming confused. --EncycloPetey 18:25, 28 May 2007 (UTC)

Really? This is completely non-standard format. We have numbered Etymology sections, not numbered Pronunciation sections. The entry needs to be formatted correctly: different pronunciations for different POS are disambiguated within the (one) Pronunciation section.
AutoFormat is tagging the entry properly as having a serious structure problem. Robert Ullmann 01:30, 29 May 2007 (UTC)
(Besides which, in this case there isn't anything in the Pronunciation sections anyway, and the POS headword repeaters have the macrons or not, so why this structure anyway?) Robert Ullmann 01:53, 29 May 2007 (UTC)

Transitive/intransitive verb headers

AF is tagging these separately, rather than accepting them or lumping them in with the other non-standard headers. These cases have to be cleaned up manually. (Only a very simple case could be done by bot; in any case it is out of scope for AF, as it is a semantic change.)

See Category:Entries with transitive verb header.

Experimental, easy to change/fix/recat/whatever. Robert Ullmann 01:35, 29 May 2007 (UTC)

Infix not recognized as POS

Infixes should be considered on an equal footing with prefixes and suffixes; however, the bot currently doesn't acknowledge this as a POS header[2][3]. __meco 12:42, 29 May 2007 (UTC)

Hmm, it is not listed as standard in WT:POS; it is listed under "other headers in use". Should get added to the AF table ;-) Will do. Robert Ullmann 06:31, 30 May 2007 (UTC)
Oddly, "Affix" was already there. Robert Ullmann 06:34, 30 May 2007 (UTC)

Madaraka Day

Test run without "Connel" flag, details:

  • daemon alt spelling/forms needs to be in EOS list, fixed
  • fenomen, fetr (lots of Crimean Tatar) Declension properly changed to L4
  • animal problem fixed, but English messed up because of a bad dependency on the flag, want to recognize level 4/3 headers in either case as end of last ety section
    code fixed, retested, correctly fixed structure of Latin section
  • fin derived terms from noun properly changed to L4, also related terms in Spanish
  • pee related terms for pence (ety 2)
  • otter translations for noun
  • fokus header levels in numbered ety sections fixed correctly, declension to L5 (this isn't dependent on the flag)
  • moonset from cat, rm tag, correct translations to L4
  • fight translations and derived terms to L5 in two etymology sections
  • recur was able to fix related terms to L4, thus translations
  • related terms to L4 (could still use a POS other than "Kanji")
  • 五大碼 (and others) synonyms to L4 properly
  • hobo derived terms to L4, translations now not under derived terms, so no complaint
  • hai homophones to L4, but is a bad header not in Pronunciations
  • accomodation from cat, rm tag, fix levels correctly

(last change) Robert Ullmann 08:27, 1 June 2007 (UTC)

Mostly pretty routine (dominus), I'll add more if I see any other interesting cases. Venus just lost, Federer doing as expected against Starace ... Robert Ullmann 11:48, 1 June 2007 (UTC)

  • tease is interesting; the terms should be derived, but needed sorting

Trying some more (last edit) Robert Ullmann 23:10, 1 June 2007 (UTC)


See this edit. Why did AF mark this? --EncycloPetey 02:25, 4 June 2007 (UTC)

See also this one; similar issue. --EncycloPetey 02:27, 4 June 2007 (UTC)
I can't speak for AF (obviously), but note that WT:ELE as it stands does not provide for POS sections to be grouped by pronunciation, only by etymology. (The message could be improved, though; it's currently giving the same message for POS headers as for, say "Derived terms", and the message makes sense only for the latter.) —RuakhTALK 05:01, 4 June 2007 (UTC)
Yes, Robert and I have discussed this issue previously. Getting our ELE standards for Pronunciation formatting updated is a high priority for me, and I should have time to draft a vote (or two) in a little over a week. The key issues are heteronyms and homophones. I'm hoping to get a set of regional pronunciation templates with a context-like function as well, but that shouldn't require a vote unless the community decides to adopt them as standard. --EncycloPetey 05:09, 4 June 2007 (UTC)
AF was originally designed to correct header levels when the structure was understood, and not do anything to the entry otherwise (unless there was a completely separate change, like cattag->context). It was flagging Pronunciation 1/2/3 as unknown headers; we added them to the control table, and everything was fine. Then with Connel screaming that AF was doing things wrong (by following the example in ELE, which is the only part of all the policy docs most people look at!) I added a flag to disable the changes, and code to tag header levels that are apparently wrong; so now this gets tagged. If we can somehow decide that ELE does mean what it says, AF can just go back to fixing known errors. Robert Ullmann 13:05, 5 June 2007 (UTC)

Inconsistent POS header levels

In this edit AF changes one POS entry to L4 while allowing another to remain at L3. This seems inconsistent and gives the page a disordered layout. I assume this cannot be overridden without the bot reverting the changes. __meco 11:35, 13 June 2007 (UTC)

I was just looking at that entry! The problem is (or is related to) "Alternative spellings" at L3 after the Etymology n header, where it should be nested at L4 (applies only to that etymology). AF handles this for Pronunciation, but I didn't cover Alternative spellings/forms the same way. I'll tag it and re-test a bit later. thanks! Robert Ullmann 11:52, 13 June 2007 (UTC)
Ouch, what were you thinking? ;-) no, you don't need to nest under Pronunciation. Checking again now. Robert Ullmann 22:48, 13 June 2007 (UTC)
I think it is good now? Robert Ullmann 23:12, 13 June 2007 (UTC)

Noun at L4+ not in L3 POS section

I want to format entries correctly, but I don't think I understand "Noun at L4+ not in L3 POS section". What was wrong with before this? I put ====Noun==== at L4 to show that it's the only POS section associated with the L3 ===Etymology===. Is there a different way to indicate that scenario? Does the entry need another ===Etymology=== section? Rod (A. Smith) 19:57, 13 June 2007 (UTC)

There are two different etymologies, we need another section (even if it contains no ety information known). Robert Ullmann 22:51, 13 June 2007 (UTC)
OK. Obviously every word has an etymology. When an etymology is unknown, we usually don't require it. Strangely, though, we now require an etymology section if another word with the same spelling from the same language has a known etymology. That strange requirement is caused by our illogical entry structure. (See my "Of what is the etymology section an etymology?".) Rod (A. Smith) 23:07, 13 June 2007 (UTC)
Right. We don't want blank sections; but in this case it isn't: it has the nested sections. We do know there is a different etymology, correct? Even without some specific information? Somehow we have to say "not that etymology, but some other specific one". Is at least self-consistent ;-) Robert Ullmann 23:12, 13 June 2007 (UTC)
Correct, I know there is a different etymology, probably one related to the etymology of the interjection (ne, “ne”), which itself is probably a native Korean word. I don't know the etymology, though, and it seems weird to write "===Etymology===" when we mean "a distinct word" (which, by definition has an etymology). Rod (A. Smith) 00:21, 14 June 2007 (UTC)
Entering "unknown" in the etymology section is perfectly valid. --Connel MacKenzie 19:27, 16 June 2007 (UTC)
Good point. I'm not sure I'd feel comfortable saying "unknown", though, because that might imply that there is no consensus as to the etymology. That is, readers might reasonably interpret "unknown" as "unknown [by this editor]" or as "unknown [by etymology experts]". Rod (A. Smith) 20:04, 16 June 2007 (UTC)
The etymology heading has two functions. This was done because we need structure but the only thing available to create structure is headings and we didn't want to create a heading "Word" or "Entry" which would've been too abstract and just a wrapper for the etymology section most of the time anyway.
Because of this there are times we know that two words have the same orthography and different etymologies but we don't know one or more of the actual etymologies. Putting "Unknown" in an etymology section means that you have consulted primary references and they list the etymology as unknown. Never put "Unknown" if it's just you who doesn't know the etymology and you weren't able to find any references. In such cases it's fair to leave the etymology section blank or put in an etymology request. If you have a hunch, it should be fine to put it in, qualified by "possibly" or "probably" depending on your confidence. Another approach might be to put "This word is not related to Foo(1)". — Hippietrail 23:14, 16 June 2007 (UTC)
Yikes, I wasn't clear at all there, in how I worded that. I had (incorrectly) assumed that everyone knew about {{etystub}}, and were discussing the case where that isn't appropriate. Unknown by this Wiktionary editor == {{etystub}}. Unknown by etymology experts == "unknown". On the other hand, a blank etymology section is not acceptable. --Connel MacKenzie 19:39, 1 July 2007 (UTC)

Be careful with Usage -> Usage notes

On this change the section Usage was changed to Usage notes but the section was actually examples of usage rather than notes and should've probably been tagged with RFC instead. I've seen this elsewhere also. — Hippietrail 23:17, 13 June 2007 (UTC)

Okay, but "Usage" isn't a header, and what header would you use other than, say, "Usage notes"? Point being: it can be tagged, and whoever fixes it is probably going to change it to "Usage notes", maybe with ;Examples: or something? What else would you do? (serious question) Robert Ullmann 23:23, 13 June 2007 (UTC)
If they change it to usage notes it will still be wrong. Examples of usage belong beneath each sense, indented with : and are usually in italics with the headword in bold. I would suggest anything with the Usage heading needs to be checked and fixed manually. — Hippietrail 22:44, 14 June 2007 (UTC)
Okay, I've changed it to just be tagged. (Do note that other people can edit the control table.) Robert Ullmann 12:19, 15 June 2007 (UTC)
  • It does cast some doubt on your earlier promise to review each entry as it goes. --Connel MacKenzie 17:37, 16 June 2007 (UTC)
As I said above, it was changing it to Usage notes, not (of course) moving them in among the defns (there is no possible way it could do that); I thought this was okay; there are a number of entries with extensive usage examples under usage notes that cannot reasonably be interpolated in the defns. Robert Ullmann 13:09, 26 June 2007 (UTC)

Noun phrase

Could you please remind me why you aren't correcting "Noun phrase" to "Noun" (etc.)? Last time I checked, WT:ELE was pretty clear that those aren't OK. --Connel MacKenzie 17:39, 16 June 2007 (UTC)

First, it isn't totally resolved that X phrase is not to be used (all but totally ;-). But that isn't the issue, the issue is that "Noun phrase" should sometimes (usually) be "Noun", but sometimes "Phrase", "Idiom", or even "Proverb". No way AF can tell. (This is different from "Verb form" which we can routinely fix if it does get nailed down in policy.) Robert Ullmann 13:05, 26 June 2007 (UTC)
Entries with X phrase headers are now tagged separately, with {{rfc-xphrase}}, and appear in Category:Entries with X phrase header. Robert Ullmann 17:16, 1 July 2007 (UTC)
Please tell me again: under what circumstances can't you auto-correct this? Can you at least correct these when they are the only POS heading? --Connel MacKenzie 21:40, 16 July 2007 (UTC)
That is, from my interpretation, the heading can be ===Phrase===, but any other heading with the word "phrase" in it should always be converted to the correct heading. The only time that can cause a problem is when it would result in a duplicate heading. So, "transitive verb phrase", "verb phrase", "noun phrase", "adjective phrase" should all (at a minimum) have " phrase" removed. --Connel MacKenzie 21:43, 16 July 2007 (UTC)
Hmmm, they also need templates sometimes, and other things. Often they have no language/POS cat. I'll try changing the table, and look at a few; but I may want to change it back. Robert Ullmann 20:42, 17 July 2007 (UTC)
RJFJR and Algrif have been fixing some of these, adding templates and such as well. Might be better to leave them in the RfC category? (I'd just have to revert the table.) Robert Ullmann 21:35, 17 July 2007 (UTC)
RJFJR has indicated (see my talk page) that he/she is adding templates and finding a few other things to do; so I'm leaving it as just tagging for now. I did test the code with the table set to fold them automatically, and it does work properly. Robert Ullmann 14:31, 18 July 2007 (UTC)
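For reference, the automatic fold Connel proposes could look something like this hypothetical sketch (not the code Robert tested): strip a trailing " phrase" from any header other than ===Phrase===, and give up, returning `None` so the entry falls back to {{rfc-xphrase}}, if the result would duplicate a header already in the section. It deliberately does not add POS templates or categories, which is the part RJFJR and Algrif are doing by hand.

```python
import re
from typing import List, Optional

HEADER = re.compile(r"^(=+)\s*([^=]+?)\s*(=+)\s*$")
SUFFIX = " phrase"


def fold_phrase_headers(lines: List[str]) -> Optional[List[str]]:
    """Fold "Noun phrase" -> "Noun" etc.; return None to fall back to tagging."""
    matches = [HEADER.match(line) for line in lines]
    existing = {m.group(2) for m in matches if m}
    out = []
    for line, m in zip(lines, matches):
        name = m.group(2) if m else ""
        if name.lower().endswith(SUFFIX):
            new_name = name[: -len(SUFFIX)].capitalize()
            if new_name in existing:
                return None  # would duplicate a header: leave it tagged
            existing.add(new_name)
            line = m.group(1) + new_name + m.group(3)
        out.append(line)
    return out
```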


Hmm... interesting case. Robert Ullmann 15:44, 26 June 2007 (UTC)

forty-five and others

Note to self: Sanskrit in these entries (numbers in range ??) needs fixing, no Devanagari, transliteration is not IAST. Robert Ullmann 17:16, 4 July 2007 (UTC)

Proper name

Suggest automatically changing Proper name to Proper noun. --EncycloPetey 22:33, 6 July 2007 (UTC)

There are only a few, but it does seem to be consistently proper nouns (names of places at present). Okay. Also changing Proper adjective to Adjective which is what we use. Robert Ullmann 00:04, 7 July 2007 (UTC)
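Renames like these fit naturally into a lookup table. A minimal sketch, assuming a hypothetical dictionary in place of AF's real control table; the header keeps its level and only the name changes.

```python
import re

# Hypothetical fragment of a header-correction table.
HEADER_FIXES = {
    "Proper name": "Proper noun",
    "Proper adjective": "Adjective",
}

# "===Name===" at any level; group 2 is the header text.
HEADER = re.compile(r"^(=+)\s*([^=]+?)\s*(=+)\s*$")


def fix_header(line: str) -> str:
    """Rename a header listed in HEADER_FIXES, keeping its level; leave other lines alone."""
    match = HEADER.match(line)
    if match and match.group(2) in HEADER_FIXES:
        return match.group(1) + HEADER_FIXES[match.group(2)] + match.group(3)
    return line
```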

rfc-xphrase|Verb phrase

Hi. Two questions. 1) Can I fix some of these without having to check with anyone? (See give the elbow for example. Have I done this correctly?) 2) I'm not clear whether Idiom comes before or after Verb, Noun, etc. In the example, I put Idiom before Verb, but looking at ELE, I'm not clear about this. Algrif 12:19, 8 July 2007 (UTC)

By all means. Most are easy, change Noun phrase to Noun, just make sure the result makes sense. In some cases it makes more sense to identify a phrase as an Idiom rather than a POS, those should be apparent. In general, if using the POS template makes sense (such as en-verb in this case), then use the POS header. There are also a few that are Phrase (the kind of things you find in a translation phrasebook ;-) Yes, this one is fine. Carry on! (Oh, if Idiom is in the same entry as Noun or Verb, it comes first; alphabetical order) Robert Ullmann 10:41, 10 July 2007 (UTC)


I am in Kigali, Rwanda for a few days. The code doesn't want to work through the wireless proxy I'm using, so AF isn't working right now. I'm thinking trying to fix it is more trouble than it is worth; back around 16:00 UTC on Thursday. If I'm not on-line and you want me to look at something, SMS +250 088 08 087. Robert Ullmann 10:41, 10 July 2007 (UTC)

Strike that, works now. Windows sockets only work correctly if you have configured IE correctly. Doesn't matter if you never use IE ... Robert Ullmann 13:34, 11 July 2007 (UTC)

Improper heading demotion

In this edit, AutoFormat improperly demoted "===Syllable===" to "====Syllable====". The demotion is improper because it makes the syllable section appear to be part of the preceding etymology. Can such demotions be prevented? Rod (A. Smith) 19:51, 14 July 2007 (UTC)

I noticed that. The problem is that it should (if it is a POS) be "inside" an Etymology ... "Syllable" is a new idea, that hasn't been sorted yet. When we have multiple etys, we don't have a provision for another "POS" not inside one of them. What do you think? (Note the use of syllable here is entirely novel. I'm not saying bad; this is how we advance on a wiki, but what do we want to do?) Robert Ullmann 20:25, 15 July 2007 (UTC)
I should point out that this doesn't worry me too much because the whole issue of 'Syllable' with or without diacritics and tone numbers and such for dozens (um, many) of CVJK languages is on my agenda. I intend to get this sorted! Robert Ullmann 20:35, 15 July 2007 (UTC)
Glad you spotted it, too. An exceptional aspect of "===Syllable===" is, as you say, that a given syllable has multiple etymologies. When we formalize and standardize our treatment of syllables, "===Syllable===" will probably end up as a peer to "===Etymology===". Rod (A. Smith) 16:34, 16 July 2007 (UTC)

bullet point instead of a numbered list

A couple of times now I've spotted that AutoFormat has marked an entry as missing a definition if a * rather than a # starts the line. Would it be possible to correct this where a standard template is used in the definition line of a POS section (e.g. {{past of}}, {{misspelling of}})? Thryduulf 19:39, 15 July 2007 (UTC)

See above User talk:AutoFormat#Possible new feature, these are worth someone looking at? As Connel says, the whole entry/language section is worth having someone look at? Robert Ullmann

Clean up pronunciation sections

Do you think it would be possible for AutoFormat to clean up mostly-well-formatted pronunciation sections to use the templates {{enPR}}, {{IPA}}, and {{SAMPA}}, and to mark un-AutoFormattable ones with some sort of RFC tag? (This comes up because there are hundreds of pages that still say "AHD" with an unhelpful link instead of "enPR" with a helpful one; and the "IPA" links are very inconsistent in where they link to.) The main downside I see is that pronunciation sections tend to come first in an entry and are very often ill-formatted, so AutoFormat's one-RFC-tag-per-entry policy might prevent it from placing lots of potentially-more-meaningful RFC tags. —RuakhTALK 18:10, 17 July 2007 (UTC)

As you say often very ill-formatted. What I think we need first is a specification? Wiktionary:Pronunciation doesn't go into the format, and WT:ELE needs work. (shouldn't audio, homophones, and rhymes all be ** under the particular pronunciation? And why do we have a full Homophones heading floating about, when ELE says it should be a bullet? etc etc). I could always keep it from counting an RfC tag in that section as the first one. What is needed is specific rules. Even trying to change AHD is tricky, there are lots of variants. Robert Ullmann 20:36, 17 July 2007 (UTC) Look at diffuse. Then cent. (!) Robert Ullmann 20:39, 17 July 2007 (UTC)

Welsh mutations

It would be useful if AutoFormat could check that the correct mutation template is used on Welsh entries. Entries that start with a capital letter should use a capital letter template, entries that start with a lower-case letter should use a lower-case template. e.g. Lloegr should use {{cy-mut-Ll}} and llys should use {{cy-mut-ll}}, Bryste should use {{cy-mut-B}} and bardd should use {{cy-mut-b}}, etc. Ideally it would also change it to the correct template if this is possible. Templates named in the format {{welsh mutation b-}} (where "b" is any of b B c C d D g G ll Ll m M p P rh Rh t T) are now just redirects to templates named in the format {{cy-mut-b}} (again "b" can be any of the same letters), so it would be good if AutoFormat could replace the former with the latter at the same time as this check.

Also, would it be possible to flag Welsh entries starting with one of: b B c C d D g G ll Ll m M p P rh Rh t T or that are in category:Welsh mutated nouns which do not have a Mutation section, or have a mutation section with no template in it. This will not catch entries for mutated forms not in that category, but I can't think of how to catch them, as for example words starting with "f" do not mutate, but could be mutated forms of words starting with "b" or "m" (see filiwn for example); and mutated forms of words starting with "g" can start with any of "a", "e", "ng", "i", "l", "o", "r", "u", "w" and "y" and possibly others that could all be non-mutated words.

Thirdly, it would be good if any entries that use a "Conjugation" header could be changed to "Mutation". Thryduulf 23:59, 18 July 2007 (UTC)
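The first two checks can be sketched as below, under stated assumptions. The template's case simply follows the case of the title's first letter (which also covers mutated forms such as filiwn), and the "missing Mutation section" flag is approximated by the absence of any `{{cy-mut-...}}` call, so a call inside an HTML comment still counts. The function names are hypothetical, and the category check and the Conjugation -> Mutation rename are not shown.

```python
import re

# Initials listed in the request as taking a mutation template (compared in lowercase).
MUTATING_INITIALS = ("ll", "rh", "b", "c", "d", "g", "m", "p", "t")

OLD_TEMPLATE = re.compile(r"\{\{\s*welsh mutation (\w+)-\s*\}\}")  # old redirect names
NEW_TEMPLATE = re.compile(r"\{\{cy-mut-(\w+)\}\}")


def fix_mutation_templates(title: str, text: str) -> str:
    """Rename old redirect calls, then make each template's case follow the title."""
    text = OLD_TEMPLATE.sub(r"{{cy-mut-\1}}", text)
    capital = title[:1].isupper()

    def match_case(match):
        letters = match.group(1)
        letters = letters.capitalize() if capital else letters.lower()
        return "{{cy-mut-" + letters + "}}"

    return NEW_TEMPLATE.sub(match_case, text)


def needs_mutation_flag(title: str, text: str) -> bool:
    """Flag a title with a mutating initial when no cy-mut call appears anywhere."""
    return title.lower().startswith(MUTATING_INITIALS) and "{{cy-mut-" not in text
```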

This sounds to me to be more appropriate as a separate task, to go through the entire wikt (e.g. the XML file) and look for things. It can then be run after a subsequent XML dump to re-check. Similar to the task that cross-checks and updates the Mandarin Pinyin syllable entries, see User:Robert Ullmann/Mandarin Pinyin. Robert Ullmann 08:04, 23 July 2007 (UTC)
I don't have a lot of (personal) bandwidth; I've been very seriously ill, and in and out of The Nairobi Hospital for the last 10 days. If I do have a bit of time, it would be a pleasant distraction; I can always hand the Python code to someone else later. No promises :-) Robert Ullmann 08:22, 23 July 2007 (UTC)
See User:Robert Ullmann/Welsh Mutations. IMHO, cases like filiwn probably shouldn't have the mutation template, only the "lemma" form. Sort of like with conjugations or declensions? OTOH maybe you'd prefer to have them? Robert Ullmann 11:54, 23 July 2007 (UTC)
Thanks for this. I think the mutated forms should have the mutation template as a quick reference to what the other forms are. With a maximum of four entries for one word (lemma, soft mutation, nasal mutation and aspirate mutation) it isn't taking up a lot of space. Perhaps on entries like filiwn the mutation tables should be merged into one? This would require some reprogramming of the templates that I can't think how to achieve off the top of my head, but almost certainly can be done. Thryduulf 08:46, 24 July 2007 (UTC)
The standard mutation template is too wide for the page at Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch and Lanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch (soft mutation form), so I've put a custom version over two lines on both pages. I've included a call to the standard template in a comment. If the bot does check for the presence of a template, could you either mark these two pages as exceptions or get it to recognise a call inside a comment as valid. Thryduulf 09:27, 24 July 2007 (UTC)

Language statement

What does language statement mean?--Oraculum 07:44, 24 July 2007 (UTC)

Level 2 header, like ==English==. It is missing, so there are no language sections in the entry. (And while the intended language may be obvious to humans, not so much to the 'bot ;-) Robert Ullmann 07:49, 24 July 2007 (UTC)
Thanks for the clarification. I fixed it.--Oraculum 07:52, 24 July 2007 (UTC)


Hi, I brought this up at user talk: Robert Ullmann, but you must have missed it. Thanks for all your work doing formatting stuff. I updated about:japanese so that it includes the definition of the reciprocal header. So there's no more need to put "rfc-header|reciprocal" in verbs like 落ちる. In fact, this now makes the 2nd time I have to go through and take those rfc's out... so if you could find a moment, please reprogram the bot to stop doing this. Language Lover 23:09, 24 July 2007 (UTC)

Allowing it for Japanese requires discussion and (probably) a vote. I note that the change to WT:AJ was reverted. Robert Ullmann 06:46, 26 July 2007 (UTC)
The vote has been started. In the meantime, it would be wasteful for someone to spend time cleaning up anything, so let's turn that part of the bot off. :) Language Lover 04:21, 27 July 2007 (UTC)
Then just leave them flagged. In any case, they have to be fixed from level 3 to 4 anyway. Robert Ullmann 06:09, 27 July 2007 (UTC)

Automatically insert {{italbrac}}

Hi, maybe you could write a little routine that transforms ('' into {{italbrac| and the corresponding '') into }}. Maybe also something similar for ''(. H. (talk) 10:07, 27 July 2007 (UTC)

Is a little more complicated than that; what with commas and such. (the commas get their own CSS wrap) But more seriously, it depends on where the syntax appears. In defn lines it becomes context (already), in Related terms and the like probably italbrac, but in pronunciations it should probably go to various dialect/variation labels. In other places? Takes thought to make sure it is just a syntactical change/correction, not altering the semantics or meaning. Robert Ullmann 14:07, 28 July 2007 (UTC)
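Along the lines Robert suggests, a conservative sketch might limit the change to Related terms and Derived terms lines and skip anything containing a comma, leaving definition lines to the context conversion. The section names, regex, and function name are assumptions, not AF's behaviour.

```python
import re

# ('' ... '') or ''( ... )'' with no commas, quotes, or parentheses inside.
ITALIC_PAREN = re.compile(r"\(''([^',()]+)''\)|''\(([^',()]+)\)''")

# Assumed sections where italbrac is the right replacement.
ITALBRAC_SECTIONS = {"Related terms", "Derived terms"}


def to_italbrac(line: str, section: str) -> str:
    """Replace simple italic parentheticals with {{italbrac|...}} in the allowed sections."""
    if section not in ITALBRAC_SECTIONS:
        return line
    return ITALIC_PAREN.sub(
        lambda m: "{{italbrac|" + (m.group(1) or m.group(2)) + "}}", line
    )
```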


AutoFormat could remove colons from the "see line". The indentation is done with CSS now but some still exist with the colon.

Change this:

:{{see|Alsatian}}

To this:

{{see|Alsatian}}

Hippietrail 23:59, 27 July 2007 (UTC)

Just a regex table entry ... I added it, we'll see when I restart a run. Robert Ullmann 14:03, 28 July 2007 (UTC)
The AF stats display shows that there are 113 of these remaining; AF will sooner or later get them. Robert Ullmann 15:40, 28 July 2007 (UTC)
See edit Robert Ullmann 19:59, 28 July 2007 (UTC)
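One plausible form of that regex table entry, assuming the colon (or colons) sits at the very start of the line directly before `{{see|`. The pattern and function name are illustrative, not AF's actual table entry.

```python
import re

# One or more colons at the start of a line, directly before {{see|...}}.
SEE_COLON = re.compile(r"^:+\s*(?=\{\{see\|)", re.MULTILINE)


def strip_see_colons(text: str) -> str:
    """Drop the indentation colons in front of a see line; CSS handles the indent."""
    return SEE_COLON.sub("", text)
```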

Regarding ‘Belgium’

Hi Robert,

regarding this edit: I hope you fixed AutoFormat by now? Belgium has three official languages, so there’s no way to know whether an occurrence of (''Belgium'') means Belgian Dutch or Belgian French. Hm, well, ok you could look at the current language section... Maybe Belgian Dutch should just be Flemish, although there is still a slight difference, but I won’t bother you with that. H. (talk) 13:18, 1 August 2007 (UTC)

The problem isn't actually inside AF, it is {{Belgium}}, which you will note is a redirect to {{Belgian French}}. Which, as you note, isn't correct. There is a general case problem with country tags for non-English languages; they can't just be redirects like this. Robert Ullmann 12:33, 2 August 2007 (UTC)
This is the only existing problem template; I'll take it out of the Contexts table, but it really ought to be fixed! Robert Ullmann 14:02, 2 August 2007 (UTC)
This is fixed, both with a new context template, and with AF handling redirects a bit differently. Robert Ullmann 14:50, 11 August 2007 (UTC)

Multiline translation entries

Especially Serbian but also Chinese and other languages sometimes use sub entries. This broken format:

  • Chinese:
Cantonese: 上海
Mandarin: 上海 (Shànghăi)
Min Nan: Siōng-hái

Should be trivially changed into this correct format:

Hippietrail 13:38, 1 August 2007 (UTC)

Yes, I've been thinking about the parsing of the multiline things; right now if it isn't *language: AF doesn't touch it. (Mind you in your example, the Chinese really ought to be sorted out as the individual languages.) There are lots of little nasty variations once tables go past the single line format, and I want to be careful. Robert Ullmann
Do you have any present examples? Robert Ullmann 12:35, 2 August 2007 (UTC)
Tried adding a simple rule. We shall see ;-) Robert Ullmann 13:12, 2 August 2007 (UTC)
Look at this edit, separating the cases is not so simple. We'll just see how it goes. Robert Ullmann
That's better ... Robert Ullmann 13:58, 2 August 2007 (UTC)
  • Please can you also add a rule to add the colon that is missing at the end of the main language name in many entries, e.g. star. Hippietrail 03:26, 3 August 2007 (UTC)

Avoiding selflinks on the pages for exotic language names

I just noticed that on Klingon, in the Translations section the name of the Klingon language was linked, creating a selflink. I assume this was a corner case not thought of for AutoFormat since it usually links language names not in the "top 40". An exception should be made when the link is to the very page so as to avoid self links. — Hippietrail 12:10, 2 August 2007 (UTC)

Yes, that case hadn't occurred to me at that point. (AF didn't actually link it, rather changed the 'pedia link to our own entry.) I saw the case arise with some other language and have fixed it; it will always unlink a translation language that is a self-reference. Robert Ullmann 12:19, 2 August 2007 (UTC)
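The linking rule discussed here (leave well-known "top 40" names plain, link the rest, never link a language name on its own page) can be sketched in a few lines of Python. This is a minimal illustration, not AF's code: it assumes the input is already a bare language name (AF also rewrote 'pedia links), and `TOP40_EXCERPT` and `render_translation_language` are hypothetical names, with the set holding only a few sample entries rather than the real WT:TOP40 list.

```python
# Hypothetical excerpt of WT:TOP40; illustrative only, not the real list.
TOP40_EXCERPT = {"English", "French", "German", "Spanish", "Italian"}


def render_translation_language(name: str, page_title: str) -> str:
    """Wikitext for the language name at the start of a translation line."""
    if name == page_title:
        # On the entry for the language itself a link would be a self-link.
        return name
    if name in TOP40_EXCERPT:
        # Well-known language names are left unlinked.
        return name
    # Less familiar names link to our own entry for the language.
    return f"[[{name}]]"
```

With a table-driven rule like this, the Lao/Macedonian/Burmese question further down is only a matter of which names go into the list.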

Script request categories

The bot should not move these request categories to the bottom of the section. Since the script might be needed at various places the category is put at the place the script is needed so somebody fulfilling the request doesn't have to hunt up and down long articles. — Hippietrail 09:44, 4 August 2007 (UTC)

Seems to me we usually do this sort of thing with tag templates, rather than placing a category at the point. AF likes to collect all of the cats for a section before parsing in detail, else there are endless exceptions. The other things tagged with templates do fine. What do you think of using {{rfscript}}, always at the point needed? (spider is a good example) Then, as with other things in this class, it is much easier to change the requests/cat structure too. Robert Ullmann 16:30, 4 August 2007 (UTC)

Bot error

16:39, 4 August 2007 (UTC)

Where both the Translations to be checked header and the {{checktrans}} template were missing ... ;-)
Just how do we want these ttbc sections formatted now? There is {{ttbc-top}} and {{checktrans-top}} and who knows what else! Robert Ullmann 16:52, 4 August 2007 (UTC)
Look, it wouldn't have been a huge deal if AutoFormat had changed "{{ttbc|Esperanto}}" to "Esperanto"; but AutoFormat removed the "{{ttbc|" and left the "}}", which is clearly wrong. —RuakhTALK 17:58, 4 August 2007 (UTC)
At that point (which it will no longer arrive at ;-) it has a string which should be either a language name, wikilinked language name, or a language template. So it didn't get the right answer ... I have fixed it. (even without the check below). Robert Ullmann 18:07, 4 August 2007 (UTC)
Fixed to just give up on a trans section if it unexpectedly sees {ttbc}. Robert Ullmann 17:02, 4 August 2007 (UTC)

Please restore delay

Having a one hour delay (or so) is better than the rapid-fire mode AF is currently in. The delay was brilliant. Why the change (in the wrong direction?) --Connel MacKenzie 00:12, 16 August 2007 (UTC)

I haven't changed anything. The delay is 6 minutes+. It has been quite slow because of the net here. But it is the middle of the night. What specifically happened too fast? Robert Ullmann 00:22, 16 August 2007 (UTC)
It is 8 minutes + 30 sec per queue entry, from the time an entry is patrolled, and a minimum time of 70 sec between edits to successive pages. Robert Ullmann 00:26, 16 August 2007 (UTC)
(corrected, 480 sec is 8 min) It looks at RC every 10 minutes, and then queues as above; the average time before it looks at a new (newly patrolled) entry with no load or network delay is 13 minutes. This can be made longer; it was much shorter when I was doing the bulk of the development because I didn't want to have to wait a long time each time it was restarted. Robert Ullmann 12:04, 16 August 2007 (UTC)
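The 13-minute average can be checked with a small Python sketch. It assumes, beyond what is stated here, that patrols arrive uniformly between RC polls (so the mean wait for the next poll is half the 10-minute interval) and that the 8-minute delay plus 30 s per queued entry is counted from that poll; that decomposition is a guess that happens to reproduce the stated figure. All constant and function names are made up for illustration.

```python
RC_POLL_INTERVAL_S = 10 * 60  # "looks at RC every 10 minutes"
BASE_DELAY_S = 8 * 60         # 480 s delay before an entry is handled
PER_QUEUE_ENTRY_S = 30        # extra delay per entry already in the queue
MIN_EDIT_GAP_S = 70           # minimum spacing between successive edits


def expected_latency_s(entries_ahead: int = 0) -> float:
    """Mean seconds from patrol until AF looks at the entry (no network delay)."""
    # Assumption: patrols land uniformly between polls, so on average
    # an entry waits half a polling interval before it is even seen.
    mean_poll_wait = RC_POLL_INTERVAL_S / 2
    return mean_poll_wait + BASE_DELAY_S + PER_QUEUE_ENTRY_S * entries_ahead


def earliest_edit_time(ready_at: float, last_edit_at: float) -> float:
    """An entry is edited when ready, but never sooner than 70 s after the last edit."""
    return max(ready_at, last_edit_at + MIN_EDIT_GAP_S)


print(expected_latency_s() / 60)  # 13.0
```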

Translations to be checked at L5+, not in Translations

What happened in this edit? As far as I can tell, the "Translations to be checked" section is indeed inside a "Translations" section. Do your automated eyes see something mine don't? —RuakhTALK 14:52, 21 August 2007 (UTC)

There is (was; fixed now) a {{ttbc}} in the previous table. When it sees the TTBC header, it is essentially saying "wait a sec, I thought we were already in ttbc?!". Robert Ullmann 14:59, 21 August 2007 (UTC)

Changing paren + 2 single quotes + (text) + 2 single quotes + close paren + colon --> c-i template


I've noticed Autoformat is changing my synonym/antonym glosses to use this template. I was just wondering what the merits of this template

{{i-c|blah blah}}

vs. the "longhand" version, e.g.

(''blah blah''): 

are? Seems less intuitive to me, and it uses one more character. Is there an inherent advantage to template use here that I'm missing? --Jeffqyzt 21:03, 24 August 2007 (UTC)
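For reference, the substitution being asked about can be sketched as a single regular expression in Python. This is an assumption about the shape of the rule, not AF's actual pattern: it only handles a single gloss with no quotes or parentheses inside, and does not split comma-separated glosses into separate parameters. The names `ITALBRAC_COLON` and `to_i_c` are hypothetical.

```python
import re

# (''text'''): an italic gloss in parentheses followed by a colon.
ITALBRAC_COLON = re.compile(r"\(''([^'()]+)''\):")


def to_i_c(line: str) -> str:
    """Replace a hand-formatted (''gloss''): with the {{i-c}} template."""
    return ITALBRAC_COLON.sub(r"{{i-c|\1}}", line)
```

For example, `to_i_c("* (''blah blah''): [[foo]]")` gives `* {{i-c|blah blah}} [[foo]]`, and the template form is indeed one character longer than the longhand.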

[edit c-i to i-c Robert Ullmann 22:53, 26 August 2007 (UTC)]

It allows the CSS customization to work, just as with {{context}}. The parentheses can be hidden, or italicized (which is typographically correct). Likewise the colon. And the text can be displayed with whatever font attributes with or without the serial comma. So depending on CSS, {{i-c|foo|bar}} can be

  • (foo, bar):
  • (foo, bar):
  • (foo, bar)
  • (foo, bar)
  • foo, bar:
  • foo, bar
  • foo, bar:

or whatever. We aren't stuck on "i-c" as a name, DAVilla doesn't like it, and he won't like "ic" any better; it looks like a language code, even though it will never be assigned. ("i:" would be cool, but looks too much to the software like a namespace or sister-project reference ;-) Still experimenting; it is easy to replace/subst/whatever all the occurrences of {{i-c}} Robert Ullmann 12:42, 26 August 2007 (UTC)

Ah, so it's a CSS hook then, I see. Isn't the rendering of the generic "two single quote" italics done via .CSS as well? I suppose you might want to separate different instances of italics use. Well...since the "i" and "c" are taken from "italics" and "colon" respectively, in keeping with .CSS philosophy for class labels (describe function not formatting) it would be better to come up with a descriptive name for the function of the tag. Also, you might not want to have the same tag used in all cases (although, at present, I can only remember this being used for the Synonyms/Antonyms/InsertNameHereNyms.) In each of those cases, the formatting is used to present a short gloss of the def given above. How about something like {{gloss|foobar}}?
P.S. If there's a discussion of this somewhere, can you please point me? Thanks! --Jeffqyzt 14:04, 27 August 2007 (UTC)
Yes, look at {{italbrac}}. {i-c} is just a redirect to {{italbrac-colon}}. And yes, they are named for form not function. The name is uncertain right now, but {{gloss}} is used for something else at present. See discussion in the Grease Pit. Robert Ullmann 18:21, 27 August 2007 (UTC)
In fact for this template the function is the form so technically it is named for both the form and the function. — Hippietrail 11:37, 11 September 2007 (UTC)

minor spacing

AutoFormat will now "correct" minor spacing on new entries, even if there is no other reason to save the edit. "correct" being qualified because it generally does not affect rendered format. Robert Ullmann 22:50, 26 August 2007 (UTC)

Turned off again. Too many edits that were just one blank line added. Robert Ullmann 14:34, 6 September 2007 (UTC)

The bot messed up "ducklike"

Check it out, the bot made a mess of ducklike in its most recent revision thereof. I thought you'd like to know, since it's probably messing up other entries besides the ones on my watchlist. —This unsigned comment was added by Language Lover (talkcontribs) at 20:08, 5 September 2007 (UTC).

Can you be more specific? In the edit you mention [4] I cannot see that the bot is messing up anything. The sole change was:
- (''usually [[postpositive]]'') in a [[ducklike]] manner; as, to walk [[ducklike]]
+ {{usually|postpositive}} in a [[ducklike]] manner; as, to walk [[ducklike]]
i.e. it replaced the hardcoded context label with the standard template that conveys the same information. Thryduulf 02:05, 6 September 2007 (UTC)
It's fixed now because Ruakh modified the postpositive template. Thanks Ruakh :). Before Ruakh did that fix, the entry was messed up, the tag said something like, "(usually (context 1))". Language Lover 02:13, 6 September 2007 (UTC)
Heh, sorry, I should have commented here. (Usually I kind of like fixing things silently, but in this case by doing so I hid an example of a problem that exists elsewhere.) BTW, welcome back, Language Lover. :-) —RuakhTALK 02:56, 6 September 2007 (UTC)
Yes, the problem was with the template. If you look at User:Robert Ullmann/Context labels, you'll see postpositive is the only one with unparsed syntax. (there are a few others that don't have a line break before the boilerplate parameters). Don't know how I missed that one... sorry. There aren't any others at present (well, as of Aug 29 XML dump) that are recognized as context templates by AF, but aren't using the new syntax. There may be others that ought to be converted to context syntax, and then would be recognized. (A template is treated as a context label template if it contains {{context but isn't {{context}} itself.) Robert Ullmann 10:13, 6 September 2007 (UTC)
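The classification rule in the last parenthesis is simple enough to state as code. A minimal Python sketch, assuming the template's name and raw wikitext are already available and that a literal substring test is all that is meant; the function name is hypothetical, and the first-letter normalization reflects MediaWiki's default case-insensitive first letter of page titles.

```python
def is_context_label_template(name: str, wikitext: str) -> bool:
    """True if a template should be treated as a context label template."""
    # MediaWiki page titles are case-insensitive in their first letter.
    normalized = name[:1].upper() + name[1:]
    # Rule as stated: it contains "{{context" but is not {{context}} itself.
    return normalized != "Context" and "{{context" in wikitext
```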

Structure problems

If the bot's going to leave {{rfc-level}}, it might as well just fix it automatically. (An exception, of course, being words with multiple etymologies.) I'm sure something can be done about this — [ ric | opiaterein ] — 14:11, 10 September 2007 (UTC)

See all the screaming above about not allowing the bot to follow ELE, including the argument that ELE doesn't say what it says. Will take a vote to permit it. Robert Ullmann 14:13, 10 September 2007 (UTC)

inflection lines for symbols

In this edit, AutoFormat didn't recognize the existing inflection line, presumably because it was wrapped in font tags. Would it be easy to have it ignore font tags? Also, translingual should probably be mul, right? Rod (A. Smith) 20:45, 20 September 2007 (UTC)

We shouldn't have HTML wrapped around the inflection line templates. The language specific templates do the right thing, and infl takes a script parameter (all this as on your talk page, I'm repeating it here for others' benefit). In this case, it should use sc=Brai (and these entries have been fixed). And yes, {{mul}} for Translingual is exactly the right thing. Robert Ullmann 12:51, 21 September 2007 (UTC)

Language names not needing wikification

I see that both Lao and Macedonian are now being linked but they are both very close to the names of countries and pretty well known. I think they should be in the "do not link" category. — Hippietrail 03:14, 28 September 2007 (UTC)

There was a conversation at Wiktionary talk:Translations/Wikification about Macedonian, pointing out that the definition of the country is a hot political issue, and the language area is not the country (for whatever definition of the country), and a link to the definition of the language would be good. As to Lao, I don't see any issue; most people wouldn't recognize it as a language name out of context, but given that it is a language would know what it is. Robert Ullmann 10:24, 29 September 2007 (UTC)
  • Burmese is another that doesn't need to be linked. It is trivially and regularly derived from the name of the country Burma. — Hippietrail 18:05, 1 October 2007 (UTC)
you know, you can edit WT:TOP40 ;-) Robert Ullmann 19:24, 1 October 2007 (UTC)


This just got left on apă potabilă.

Fix. Now. Please. =) — [ ric | opiaterein ] — 18:02, 28 September 2007 (UTC)

I'm usually an inclusionist when it comes to part-of-speech headers, but in this case I agree with AutoFormat. Should the header perhaps be ==Noun==? —RuakhTALK 18:20, 28 September 2007 (UTC)
Certainly. Corrected. (Thanks, AutoFormat!) Rod (A. Smith) 18:59, 28 September 2007 (UTC)

The "acronym" context template.

{{acronym}} is not intended as a context template, but this edit tries to use it as one. Any thoughts how to address this? —RuakhTALK 14:00, 29 September 2007 (UTC)

It just converted cattag to context. AF certainly wouldn't convert (say) (''acronym'') to {{context|acronym}} ... the problem here was pre-existing. Robert Ullmann 19:30, 1 October 2007 (UTC)


The translations don't go on the wrong entry - what do you want it to say, instead? I just removed that friggin' tag! --Connel MacKenzie 22:44, 4 October 2007 (UTC)

The Translations header goes at L4. Robert Ullmann 14:41, 5 October 2007 (UTC)
That doesn't make sense for a soft-link. With only one POS, I suppose it doesn't matter. --Connel MacKenzie 05:33, 7 October 2007 (UTC)
With more than one POS, you repeat it, with the correct links:



See run (noun)



See run (verb)

See? One could also link to the Noun and Verb headers, a bit more robust. Robert Ullmann 14:50, 7 October 2007 (UTC)

I'm sure this made someone laugh...

--Connel MacKenzie 14:28, 7 October 2007 (UTC)

I thought this was a rather incredible whatever; just makes more work to clean up. AF removed the tag because there is after all, no header there ... Don't we have {{impersonal}}? I fixed the entry. Robert Ullmann 14:36, 7 October 2007 (UTC)

Infinitives and Participles

I think I re-triggered AF to catch these better. Please take a look. --Connel MacKenzie 20:23, 20 October 2007 (UTC)

Doesn't quite work that way ;-(. It creates a duplicate table entry for participle(s), and ends up changing the Participle header to Conjugation ... Robert Ullmann 13:44, 22 October 2007 (UTC)
How? Well, it was wrong before I started editing it, too. (Duplicate listing - 1st was wrong.) Hrm. I'd better take a look at what I did wrong. (Why would it confuse "participles/participle"??? The singular part. vs. plural part. mistakes are very different errors.) --Connel MacKenzie 06:18, 25 October 2007 (UTC)
GAH! I took out "Particle"? Ouch. --Connel MacKenzie 06:26, 25 October 2007 (UTC)
A large number of header corrections are pl/sg: Antonym to Antonyms, Nouns to Noun (seen that a few times). So it is set up for that; the table says "Participle" is standard, so it assumes Participles -> Participle, but then later in the table it says Participles is recognized as NS, to keep it from changing it to Participle. Trying to add an additional case of recognizing Participles but then changing it to Conjugation is more than the structure can handle ... it ends up changing Participle to Conjugation (um...). I've tweaked it a little, but this case still isn't handled. Robert Ullmann 16:15, 25 October 2007 (UTC)
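A rough Python sketch of a singular/plural header lookup of the kind described, to show why Participles has to be listed explicitly to escape the automatic plural-to-singular rule. The two sets are tiny hypothetical stand-ins for AF's header table, and `normalize_header` is an illustrative name; the real table and its ordering are not shown in this thread.

```python
# Hypothetical stand-ins for the header table.
STANDARD = {"Noun", "Verb", "Participle", "Synonyms", "Antonyms"}
RECOGNIZED_NONSTANDARD = {"Participles"}  # kept as-is, not singularized


def normalize_header(header: str) -> str:
    """Fix simple singular/plural slips in a section header."""
    if header in STANDARD or header in RECOGNIZED_NONSTANDARD:
        return header
    if header.endswith("s") and header[:-1] in STANDARD:
        return header[:-1]  # e.g. Nouns -> Noun
    if header + "s" in STANDARD:
        return header + "s"  # e.g. Antonym -> Antonyms
    return header  # unknown: leave it for a human
```

Without the `RECOGNIZED_NONSTANDARD` entry, `Participles` would silently become `Participle`.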

Feature request: "#" to "*" in L4s

In valid "L4" sections such as ===Synonyms===, ===Antonyms===, etc., if a line starts with a "#", it can safely be changed to "*". I haven't found a single exception to that rule yet. (Many on my /todo8 can do with a healthy amount of review, besides that, but usually aren't too horrible, so it isn't worth forcing to the manual lists, I think.) --Connel MacKenzie 06:21, 25 October 2007 (UTC)

Okay, but I'll have to look at it; the guts of the loop are getting too complicated, time to redesign a bit. I think it might only take "if # and Level[header] == 4, then ..." about 3 lines, right where it is looking for defn lines. This would be good because it would keep the code out of the context label processing, instead doing the italbrac->sense or qualifier as it should. Robert Ullmann 16:23, 25 October 2007 (UTC)
POS headers (such as Noun and Adjective) sometimes appear at level 4; it is not desirable that numbered definitions become bulleted instead. Also, could an exception be made for Dictionary notes and References sections please? And while we’re at it, could AF be tweaked to insert a space between numbering octothorpes, bulleting asterisks, emboldening semi-colons, and indenting colons and the text that follows them please? –It would make code neater and easier to read.  (u):Raifʻhār (t):Doremítzwr﴿ 02:05, 26 October 2007 (UTC)
It knows when it is in a section under Etymology 1 (or n), and isn't confused. In any case, it would only do this when the section header is in the known "L4" list (whether it appears at L4 or L5). References and Dictionary notes take bullets; the convention is that # numbered lines are only definitions. (I'm not talking about the numbers generated by ref tags.)
The other issue is something I've been thinking about: we add a space after #, but it really ought to be after any sequence of #*: or a ; to make things easier to read. We usually haven't done this after *. (In any case, AF would not save an edit for this alone.) Robert Ullmann 02:17, 26 October 2007 (UTC)
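The "#" to "*" rule as described (only inside sections whose header is in the known "L4" list, whether it sits at L4 or L5) might look roughly like this in Python. The `KNOWN_L4` set is a hypothetical subset, and tracking the section by name rather than depth is an assumption that matches the description; POS headers such as Noun are not in the set, so numbered definitions under them are untouched.

```python
import re

# Hypothetical subset of the headers treated as "L4" sections.
KNOWN_L4 = {"Synonyms", "Antonyms", "Derived terms", "Related terms", "See also"}
HEADER = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def bullets_in_l4_sections(lines: list[str]) -> list[str]:
    """Turn leading '#' into '*' on lines inside known L4 sections."""
    section = None
    out = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            # Track by name, not depth, so an "L4" header at L5 still counts.
            section = m.group(2)
        elif section in KNOWN_L4 and line.startswith("#"):
            # Replace the whole run of '#' so '##' and '#:' keep their shape.
            line = re.sub(r"^#+", lambda h: "*" * len(h.group()), line)
        out.append(line)
    return out
```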
OK, just checking to make sure that this tweak wouldn’t cause an unforeseen and highly undesirable change to be made.
I’ve more than once gone through translation tables adding a space between the asterisk and the language name (I kinda gave up when I saw AF remove the 100+ spaces I added once). Though an adjoining asterisk is less intrusive than an adjoining octothorpe, and much less so than an adjoining colon or semi-colon, it’s still more intrusive than when the asterisk is separated from the text by a space; for which reason and for consistency’s sake, I think adding that space would be a good thing. What do you think?  (u):Raifʻhār (t):Doremítzwr﴿ 02:24, 26 October 2007 (UTC)
I'd like to see a space after the "stackable" wikisyntax characters, *, :, # (and ; although that is really part of the ; : syntax). See a prior discussion. The reason lines in translations sections get changed is that AF takes them apart and re-assembles them, and it is following the usual convention. But we could change this. (not going and trying to fix all the entries! just as edited for something else) Robert Ullmann 16:22, 27 October 2007 (UTC)
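A sketch of the proposed spacing rule in Python: one space after a leading run of the stackable characters *, :, # and ;, leaving lines that are already spaced alone. The pattern and function name are illustrative assumptions, not AF's code; as noted above, AF would apply this only when saving an entry for some other reason.

```python
import re

# A run of stackable list characters at line start, directly followed by text.
LIST_PREFIX = re.compile(r"^([*#:;]+)(?=[^\s*#:;])")


def space_after_list_prefix(line: str) -> str:
    """Insert one space between a leading run of *, #, :, ; and the text."""
    return LIST_PREFIX.sub(r"\1 ", line)
```

For example, `*English:` becomes `* English:` and `#*:quote` becomes `#*: quote`.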
Is this a significant-enough change to warrant a (probably short and uninterested) policy discussion in the Beer Parlour?  (u):Raifʻhār (t):Doremítzwr﴿ 17:37, 27 October 2007 (UTC)
I'd probably say WT:GP, see what some other people think. You never know who might be vociferously opposed. Robert Ullmann 17:58, 27 October 2007 (UTC)
I’ll do that.  (u):Raifʻhār (t):Doremítzwr﴿ 21:15, 27 October 2007 (UTC)
I think spaces would be nice, but so many changes at once might make for an unreadable diff. Or maybe not, since there'd be no visible red. Whatever; I trust your judgment, as I think most contributors here do. —RuakhTALK 19:22, 27 October 2007 (UTC)

Original request: done. It would help if you feed several entries to AF with the tag, and see if it does what is expected. Robert Ullmann 16:22, 27 October 2007 (UTC)

Are Dictionary notes and References sections excepted?  (u):Raifʻhār (t):Doremítzwr﴿ 17:37, 27 October 2007 (UTC)
No. Note that WT:ELE shows/specifies bullets, and templates often include them (which they shouldn't, the format should be in the wikitext). It is very important that only definition lines use #. There are a lot of programs that read the XML, for various kinds of mirrors and such. The simplest ones can simply look at the # lines to get the definitions. We do a number of things to make sure the wikitext is "tractable" to automation. Robert Ullmann 17:58, 27 October 2007 (UTC)
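To illustrate why only definitions should use #, here is roughly what the "simplest" kind of dump reader might do, sketched in Python: collect every # line that is not an example (#:) or quotation (#*). The function name and pattern are hypothetical. A numbered list under References or Dictionary notes would be swept up as definitions by exactly this kind of reader.

```python
import re

# '#' lines that are definitions: not examples (#:) or quotations (#*),
# nor deeper example/quotation lines (##:, ##*).
DEFINITION = re.compile(r"^#+(?![#:*])\s*(.*)$")


def definitions(wikitext: str) -> list[str]:
    """Collect definition lines the way a naive XML-dump reader might."""
    defs = []
    for line in wikitext.splitlines():
        m = DEFINITION.match(line)
        if m:
            defs.append(m.group(1))
    return defs
```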
That’s fair enough. For that reason, I replaced a numbered list with a bulleted one in the Dictionary notes section for encolden.  (u):Raifʻhār (t):Doremítzwr﴿ 21:15, 27 October 2007 (UTC)
I tested this on Armagnac, kithless, bakeshop, seruitus; works okay. Robert Ullmann 17:58, 27 October 2007 (UTC)


Was the recent change to tagliarino‎ correct? RJFJR 13:46, 29 October 2007 (UTC)

I’d say not. The singular form is rare, not inextant, which means that tagliarini is, by definition, not a plurale tantum. Moreover, the singular form tagliarino cannot possibly be a plurale tantum.  (u):Raifʻhār (t):Doremítzwr﴿ 13:58, 29 October 2007 (UTC)
Someone has created a template for "in plural", adding the pluralia tantum category; I'd say that is probably wrong, the category should only be added if you use {{plurale tantum}} (or {{pluralonly}}). "in plural" doesn't necessarily mean that; it seems to be/would be used in the entry for the singular form. Robert Ullmann 14:10, 29 October 2007 (UTC)

AF changed
#(''especially in plural'') narrow [[tagliatelle]]
to
# {{especially|in plural|lang=it}} narrow [[tagliatelle]]
It was oddly formatted and I'm not sure AF knew how to handle it. RJFJR 14:16, 29 October 2007 (UTC)

Yes, it is breaking out one of the standard modifiers. As Doremítzwr notes, the singular form (where the "in plural" template is used) can never be the plurale tantum, and shouldn't be in the cat; I've taken the cat out of the template. Looking through the pages that link to it, I see only singular forms, except for memoirs, and that isn't a plurale tantum either. Robert Ullmann 14:24, 29 October 2007 (UTC)
What was odd about the format? Looked perfectly normal to me? Robert Ullmann 14:27, 29 October 2007 (UTC)

acronym template breakage

Here's something to watch out for! 08:17, 6 November 2007 (UTC)

Ah, thank you, fixed. Very subtle. (can we get rid of the header template sometime, and use an infl/headword template like everything else?) Thank you! Robert Ullmann 23:12, 6 November 2007 (UTC)
um now this edit is really interesting. How does that happen? Ouch. Robert Ullmann 23:35, 6 November 2007 (UTC)

See template

In this edit, AF added the {{see}} template at the end of the entry, whereas it should be placed at the outset. --EncycloPetey 19:07, 12 November 2007 (UTC)

It didn't add the template; SemperBlottoBot added an Italian section on the end and the template, tagging it for AF to sort languages/iwikis; the {see} template isn't added in the right place (known problem) and AF doesn't fix that. Robert Ullmann 19:37, 12 November 2007 (UTC)

Skipping commented code

[5] - L3 ===Synonyms=== is clearly commented out. I don't know if AF ignores everything inside <!-- and -->, but I guess it should. --Ivan Štambuk 17:41, 15 November 2007 (UTC)

Yes, it probably should. OTOH, if something is wrong, it should be deleted (always available in the page history), or moved to the talk page. HTML comments, sometimes in odd places, can cause all sorts of problems. Robert Ullmann 21:12, 15 November 2007 (UTC)
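One way to make the header scan ignore commented-out material is to blank HTML comments before looking for headers, while still saving the original text unchanged. A minimal Python sketch under that assumption; the names are hypothetical, and treating an unterminated `<!--` as hiding the rest of the page mirrors how the rendered page behaves.

```python
import re

# An HTML comment; an unterminated '<!--' hides the rest of the page.
COMMENT = re.compile(r"<!--.*?(?:-->|\Z)", re.S)
HEADER = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.M)


def visible_headers(wikitext: str) -> list[tuple[int, str]]:
    """(level, name) for each header that is not inside an HTML comment."""
    visible = COMMENT.sub("", wikitext)
    return [(len(m.group(1)), m.group(2)) for m in HEADER.finditer(visible)]
```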

Bug! AF messes up formatting of translations

If a translation line starts with 2 *, e.g.

AF changes this to the following, which is incorrect:

see H. (talk) 20:58, 17 November 2007 (UTC)

it is wrong to start with; any sub line should start with *:
and this needs resolving into Kurmanji/Kurdish or something... Robert Ullmann 19:04, 18 November 2007 (UTC)
Do the style manuals specify starting sub-lines with '*:' in the Translations section? The example in Wiktionary:Entry layout explained uses just ':' for languages that are split into several sub-entries, like Kurdish, and I have been using that style. Please let us know if a style manual defines this.
I found the same kind of AutoFormat's edit at Thanksgiving. Thanks. --Eveningmist 06:40, 23 November 2007 (UTC)
Should be *: (and the WT:ELE example should be fixed); the reason not to use just : is that we want to continue the HTML list (the * format). In any case not **, as the sub lines shouldn't get bullets. Still some bug I don't understand; ** should be left alone. Robert Ullmann 07:57, 23 November 2007 (UTC)
Fixed that (regex gobbling too much), ** is converted properly to the preferred *: Robert Ullmann 09:02, 23 November 2007 (UTC)
Scratch that last note: ** when the item is a language should probably be left that way, at least until we figure out a bit more about "sub-" languages in the tables. Robert Ullmann 15:39, 24 November 2007 (UTC)
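The current behaviour (convert `**` sub-lines to `*:`, but leave `**` alone when the item is itself a language) could be sketched like this in Python. `KNOWN_LANGUAGES` is a hypothetical stand-in for the bot's language tables, and taking the label as the text before the first colon is an assumption about the line shape.

```python
import re

# Hypothetical set; real detection would use the bot's language tables.
KNOWN_LANGUAGES = {"Kurmanji", "Sorani", "Cantonese", "Mandarin", "Min Nan"}
DOUBLE_BULLET = re.compile(r"^\*\*\s*(.*)$")


def normalize_subline(line: str) -> str:
    """Rewrite a '**' translation sub-line as '*:' unless it names a language."""
    m = DOUBLE_BULLET.match(line)
    if not m:
        return line
    rest = m.group(1)
    # The text before the first ':' is the item's label, e.g. "[[Kurmanji]]".
    label = rest.split(":", 1)[0].strip("[] ")
    if label in KNOWN_LANGUAGES:
        return line  # sub-language items keep '**' for now
    return "*: " + rest
```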


hey autoformat check out my entries and see if you can help like petey or nanadro --TheRaccoon 16:18, 24 November 2007 (UTC)

RFC stuff

A lot of pages are getting RFC tags from you for pretty minor/pedantic reasons. I think in the end, they'd look better with the minor "errors" than with a big box at the top. I'm thinking here through the eyes not of a wiktionary editor but just someone who comes to look up the entry. Those boxes should really only be displayed to editors because our primary userbase doesn't really care whether the "see also" is on level 2 or level 3. The "see also" mis-level is barely noticeable but the RFC box is an eyesore. Language Lover 08:00, 26 November 2007 (UTC)

As I pointed out on BP, "See also" (or External links, etc) at level 2 is a serious error that needs to be fixed. It is not "minor". They should get cleaned up as quickly as possible. Is there some resistance to fixing them?
The errors that are minor are indeed tagged with invisible tags, only indicated in the page presentation by the category at the bottom. Robert Ullmann 13:19, 26 November 2007 (UTC)

Bot flag

Bot flag finally set. SemperBlotto 16:11, 3 December 2007 (UTC)

Category:English nouns that lack inflection template and plural words

Given that {{en-noun}} cannot handle plurals, could the bot be made not to tag articles using {{plural of}}, please? Circeus 05:45, 11 December 2007 (UTC)

see #Plurals above: if AF has to add an inflection line on an English noun, it ought to be looked at anyway. Robert Ullmann 06:09, 11 December 2007 (UTC)
Okay, I had only added the plural of template, and had missed that. Circeus 06:27, 11 December 2007 (UTC)

Subcategorization of L3+ problems by language

I'm looking for a way to filter [[Category:Entries with level or structure problems]] and [[Category:Entries with non-standard headers]] by the offending language. I couldn't find a toolserver tool to filter a category by articles matching header criteria (too heavy on the server?). Is there a way to modify the addition of {{rfc-level}} and {{rfc-header}} to show in which L2 language the problem occurs? --Bequw¢τ 17:23, 18 December 2007 (UTC)

Well I found a way around that. Using CatScan I found the intersect between those categories and [[Category:Spanish language]]. Now time to clean up. --Bequw¢τ 23:04, 18 December 2007 (UTC)