Lemma List

Fragment of a discussion from User talk:Rua

That won't find entries where some sections use a template but not others.

CodeCat 23:58, 30 July 2014

That's true, I might do that one later. In the meantime, there are 367,479 entries that probably have no headword template (most of them are Italian, ~64,000 are not). Here's the list: https://dl.dropboxusercontent.com/u/28940500/no%20head.txt.gz

DTLHS (talk) 02:25, 31 July 2014

Thank you for that list. Can I ask how it was generated?

CodeCat 11:01, 31 July 2014
 

I've started working on the list, but it will take a while to work through it. Right now the bot is replacing all instances of a bolded page title on a line by itself with {{head|xx}}. That seems to be catching almost all of them now.

I would appreciate it if you could give regular updates as the dumps are released, so that I know what still needs to be done. Also, I'd like to request that the list contain only the page names (not the languages), sorted and with duplicates removed. Would that be possible for the next dump?

CodeCat 17:29, 31 July 2014
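
For readers following along, the replacement described in this message (a bolded page title on a line by itself becoming {{head|xx}}) could be sketched roughly as below. This is only an illustration, not the bot's actual code: the function name `replace_bold_title` and the `lang_code` parameter are hypothetical, and it assumes the language code for the section is already known.

```python
import re


def replace_bold_title(wikitext: str, title: str, lang_code: str) -> str:
    """Replace every line consisting only of '''title''' with {{head|lang_code}}.

    Hypothetical helper: the real bot may handle more cases than this.
    """
    # Match the bolded title only when it fills the whole line
    # (trailing spaces or tabs are tolerated).
    pattern = re.compile(
        r"^'''" + re.escape(title) + r"'''[ \t]*$",
        re.MULTILINE,
    )
    replacement = "{{head|" + lang_code + "}}"
    # A function is used as the replacement so that the template text
    # is inserted literally, with no backslash interpretation.
    return pattern.sub(lambda match: replacement, wikitext)


if __name__ == "__main__":
    sample = "==English==\n===Noun===\n'''cat'''\n\n# The '''cat''' sat.\n"
    print(replace_bold_title(sample, "cat", "en"))
```

A bolded title that appears inside a definition or usage example is left alone, because the pattern requires it to be the only thing on the line.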

Of course. And I just split pages by language section, then looked for any lines containing '''PAGENAME'''.

DTLHS (talk) 19:28, 31 July 2014
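
The approach described here (split each page by language section, then look for lines containing the bolded page name) might look something like the following sketch. It is not the script that was actually used. It assumes '''PAGENAME''' means the page's own title in bold and that language sections are delimited by level-2 headings such as ==English==; the function names and the `pages` mapping of title to wikitext are made up for illustration. The final list holds page names only, sorted and with duplicates removed, as requested earlier in the thread.

```python
import re

# Assumption: a language section starts at a level-2 heading, e.g. "==English==".
# The [^=] guard keeps level-3 headings such as "===Noun===" from matching.
LANG_HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def split_by_language(wikitext: str) -> dict:
    """Map each language name to the text of its section."""
    matches = list(LANG_HEADING.finditer(wikitext))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.end():end]
    return sections


def sections_with_bold_title(wikitext: str, title: str) -> list:
    """Return the languages whose section has a line containing '''title'''."""
    needle = "'''" + title + "'''"
    return [
        lang
        for lang, body in split_by_language(wikitext).items()
        if any(needle in line for line in body.splitlines())
    ]


def pages_to_fix(pages: dict) -> list:
    """Return page names only, sorted and deduplicated."""
    hits = {
        title
        for title, wikitext in pages.items()
        if sections_with_bold_title(wikitext, title)
    }
    return sorted(hits)
```

As noted in the reply that follows, a check this simple will miss some entries, so it is a starting point rather than a complete detector.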

That won't have caught them all, but it's a good start. Thank you.

CodeCat 19:38, 31 July 2014