User:Matthias Buchmeier/ding2dictd


Convert dictionaries from ding-format to dictd-format

Usage:

You need the programs gawk, bash and dictfmt. Save the awk code to a file named ding2dictd.awk and the bash script to a file named ding2dictd.sh, both in the same directory. Then run bash ./ding2dictd.sh SOURCE TARGET from the command line in that directory (the bash script looks for ding2dictd.awk in the current directory), where SOURCE is the filename of the ding file to be converted and TARGET is the base filename of the dictd files to be generated. For example, 'bash ./ding2dictd.sh en-es.txt en-es' will generate the files 'en-es.dict' and 'en-es.index'.
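As a quick check before running the conversion, the following sketch reports which of the required programs cannot be found on the PATH. The function name missing_programs is hypothetical and not part of the scripts on this page.

```shell
# Print the name of every given program that is not found on the PATH.
# missing_programs is a hypothetical helper, not part of ding2dictd.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}

# The three programs named in the usage notes above.
missing_programs gawk bash dictfmt
```

If nothing is printed, all three programs are available.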

Code:

Bash script

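#!/bin/bash
# usage: bash ./ding2dictd.sh SOURCE TARGET
sourcefile=$1
targetfile=$2
# convert the ding file with gawk and pipe the result into dictfmt;
# the variables are quoted so that filenames containing spaces work
gawk -f ding2dictd.awk "$sourcefile" | dictfmt -f \
-s "$sourcefile extracted from en.wiktionary.org" -u "http://en.wiktionary.org/wiki/User:Matthias_Buchmeier" \
--utf8 --columns 0 --without-headword --headword-separator :: "$targetfile"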

AWK script (ding2dictd.awk)

BEGIN {FS=" :: "; print "#";}
# header lines of the ding file (starting with #) are copied with a leading space
/^[#]/ {
	print " "$0;
	header=1; next;
}
/^[^#]/ {
	# after the header, print two more # lines before the first entry
	if(header==1) {print "#\n#"; header=0;}
	# the index (list of headwords) is built from the unmodified line
	indx=$0;
	# convert {...} -> <...> (curly brackets are interpreted as links by some clients)
	gsub(/\{[^{}]*\}/, "<&>"); gsub(/<\{/, "<"); gsub(/\}>/, ">");
	# rm SEE:...
	gsub(/SEE:.*/, "", indx);
	# rm {...}, (...) and [...]; (...) is removed twice to handle one level of nesting
	gsub(/\{[^}]*\}/, "", indx);
	gsub(/\([^)(]*\)/, "", indx);
	gsub(/\([^)(]*\)/, "", indx);
	gsub(/\[[^][]*\]/, "", indx);
	# convert , ; / -> ::
	gsub(/[;,\/]/, "::", indx);
	# rm extra space after and before ::
	gsub(/[ ]*::[ ]*/, "::", indx);
	# rm multiple ::
	gsub(/::+/, "::", indx);
	# rm space at end of line
	gsub(/[ ]*$/, "", indx);
	gsub(/[ ]*::$/, "", indx);
	# rm "::" at the end of trans-see
	gsub(/[ ]*:+[ ]*$/, "", $1);
	# add dictd links {...} to trans-see (needs gawk)
	# undocumented feature, does not work with all clients
	$1=gensub(/(.*SEE: )(.*)/, "\\1{\\2}", "g", $1);
	# printout: headword line first, then the indented entry
	print indx;
	print " "$1"\n\t"$2;
}
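
To illustrate how the headword index is derived from an entry, the sketch below repeats only the index clean-up steps of the AWK script on two sample lines. The sample lines and the function name ding_index are hypothetical and not taken from a real dictionary file. Plain awk is assumed to be sufficient here, because these steps use only gsub and not the gawk-specific gensub.

```shell
# Hypothetical sample entries in ding format (invented for illustration).
sample='cat {n} (domestic animal) :: gato {m}, gata {f}'
sample_see='kitty {n} SEE: cat ::'

# ding_index is a hypothetical helper: it applies only the index clean-up
# steps of ding2dictd.awk to each input line and prints the headword list.
ding_index() {
    awk '{
        indx = $0
        gsub(/SEE:.*/, "", indx)        # drop the SEE: reference
        gsub(/\{[^}]*\}/, "", indx)     # drop {...}
        gsub(/\([^)(]*\)/, "", indx)    # drop (...), innermost first
        gsub(/\([^)(]*\)/, "", indx)    # second pass for one nesting level
        gsub(/\[[^][]*\]/, "", indx)    # drop [...]
        gsub(/[;,\/]/, "::", indx)      # turn , ; / into the separator
        gsub(/ *:: */, "::", indx)      # trim spaces around ::
        gsub(/::+/, "::", indx)         # collapse runs of colons
        gsub(/ *$/, "", indx)           # trim trailing spaces
        gsub(/ *::$/, "", indx)         # drop a trailing ::
        print indx
    }'
}

printf '%s\n' "$sample" | ding_index       # prints cat::gato::gata
printf '%s\n' "$sample_see" | ding_index   # prints kitty
```

The first result shows that words from both sides of the entry become headwords, separated by the :: that the bash script passes to dictfmt as --headword-separator.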