Translations from an XML dump


On the one hand, I replied to heyzeuss that we are working on something he is looking for, but it will need some more time. On the other hand, I wanted to give you a short update and piggybacked that on the message. I will have a look at the Beer Parlour discussion soon. Scraping data from Wiktionary is difficult since each language version is different. So instead of writing fixed code in Python or Java to scrape the data, we are implementing a configurable system. Every interested Wiktionary user like heyzeuss can then alter the configuration (which is like a simplified regex) to say what he wants to have parsed out. SebastianHellmann 19:20, 23 March 2011 (UTC)

SebastianHellmann 19:20, 23 March 2011

Hm, I'm still not sure I understand what you mean. Do you mean adding code to inflection/headword-line templates to allow data to be pulled out?

Yair rand (talk) 19:34, 27 March 2011

No, I want to scrape data from Wiktionary, covering all language editions. Since all language editions are different, the scraper needs to be configured differently for each language. It is not so easy to write code that can be arbitrarily configured for scraping. It will work similarly to a template, but the other way round. I think the opposite of a template is a pattern such as a regex. It is probably more complicated to explain than to show with an example. I will prepare one and send it to you. SebastianHellmann 20:16, 27 March 2011 (UTC)

SebastianHellmann 20:16, 27 March 2011
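
To make the "template the other way round" idea concrete, here is a minimal sketch in Python. It is not the system described in this thread; the configuration format, the `{name}` placeholder syntax, and the function names `compile_pattern` and `extract` are all assumptions made up for illustration. Each language edition gets one simplified pattern that looks like the template call it should match, and the code turns that pattern into a regex with named groups.

```python
import re

# Hypothetical per-edition configuration (an assumption for illustration).
# Each entry looks like the wikitext it should match, with {name} marking
# a value to pull out.
CONFIG = {
    "en": "{{t|{lang}|{word}}}",
    "de": "{{Ü|{lang}|{word}}}",
}


def compile_pattern(template):
    """Turn a simplified pattern into a regex with one named group per {name}."""
    # re.split with a capture group alternates literal text (even indices)
    # and placeholder names (odd indices).
    parts = re.split(r"\{(\w+)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal wikitext, matched verbatim
        else:
            regex += "(?P<%s>[^|{}]+)" % part  # a value without pipes or braces
    return re.compile(regex)


def extract(wikitext, edition):
    """Return one dict of captured values per match in the given wikitext."""
    pattern = compile_pattern(CONFIG[edition])
    return [m.groupdict() for m in pattern.finditer(wikitext)]


if __name__ == "__main__":
    sample = "* French: {{t|fr|chien}}\n* German: {{t|de|Hund}}"
    for row in extract(sample, "en"):
        print(row)
```

With this shape, supporting another language edition would only mean adding a line to `CONFIG` rather than writing new scraping code, which is the point of making the scraper configurable. Real entries contain many template variants and nesting, so a single flat pattern per edition is a simplification.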

You might want to ask User:Robert Ullmann for guidance. He runs a bot called User:AutoFormat that fixes misshapen entries. User:Prince Kassad does the same thing with User:KassadBot.

heyzeuss 07:41, 28 March 2011
 

Robert Ullmann has been out. Best check with Prince Kassad.

heyzeuss 08:06, 28 March 2011