Convert dictionaries from ding-format to dictd-format




You need the programs gawk, bash and dictfmt. Save the awk code to a file named ding2dictd.awk and the bash script to a separate file. Then run 'bash ./ SOURCE TARGET' from the command line, where SOURCE is the filename of the ding file to be converted and TARGET is the base filename of the dictd files to be generated. For example, 'bash ./ en-es.txt en-es' generates the files 'en-es.dict' and 'en-es.index'.
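Before converting, it can help to check that SOURCE really has the layout the awk script relies on: header lines start with '#', and every other non-empty line has exactly one ' :: ' between the two languages. The sketch below is only an illustration under that assumption; the helper name check_ding and the sample entries are made up.

```shell
#!/bin/bash
# check_ding FILE: hypothetical helper that prints the line number and text of every
# entry line that does not contain exactly one " :: " separator.
# Lines starting with # (header/comments) and empty lines are skipped.
check_ding() {
  awk -F ' :: ' '
    /^#/ || /^$/ { next }
    NF != 2 { print NR ": " $0 }
  ' "$1"
}

# Example with a small, made-up ding-style file
sample=$(mktemp)
printf '%s\n' \
  '# Version: example' \
  'Haus {n}; Gebäude {n} | Häuser {pl}; Gebäude {pl} :: house; building | houses; buildings' \
  'kaputt' > "$sample"
check_ding "$sample"   # reports line 3, which has no " :: "
rm -f "$sample"
```

Lines reported this way would end up with an empty or missing second field after the split on ' :: ', so they are worth fixing by hand before running the conversion.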



Bash script

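#!/bin/bash
sourcefile=$1   # ding file to be converted
targetfile=$2   # base filename for TARGET.dict and TARGET.index
gawk -f ding2dictd.awk "$sourcefile" | dictfmt -f \
-s "$1 extracted from" -u "" \
--utf8 --columns 0 --without-headword --headword-separator :: "$targetfile"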
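The script runs the pipeline without checking its arguments or whether the required files and programs are present. A slightly more defensive variant could look like the sketch below. It is an illustration, not the original script: the function name convert_ding is hypothetical, and the dictfmt options are copied unchanged from the script.

```shell
#!/bin/bash
# convert_ding SOURCE TARGET: hypothetical wrapper around the conversion pipeline.
# It checks its arguments, the input files and the required programs, then runs
# the same gawk | dictfmt pipeline as the script.
convert_ding() {
  if [ "$#" -ne 2 ]; then
    echo "usage: convert_ding SOURCE TARGET" >&2
    return 2
  fi
  local sourcefile=$1 targetfile=$2 prog
  if [ ! -r "$sourcefile" ] || [ ! -r ding2dictd.awk ]; then
    echo "cannot read $sourcefile or ding2dictd.awk" >&2
    return 1
  fi
  for prog in gawk dictfmt; do
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "$prog not found" >&2
      return 1
    fi
  done
  # same dictfmt options as the script above
  gawk -f ding2dictd.awk "$sourcefile" | dictfmt -f \
    -s "$sourcefile extracted from" -u "" \
    --utf8 --columns 0 --without-headword --headword-separator :: "$targetfile"
}
```

Called as 'convert_ding en-es.txt en-es', it should behave like the script when everything is in place, and stop with a message otherwise.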

AWK script (ding2dictd.awk)

BEGIN {FS=" :: "; print "#";}
/^[#]/ {
	print " "$0;
	header=1; next;}
/^[^#]/ {
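if(header==1) {print "#\n#"; header=0;}
# copy the left-hand side before the {...} conversion; it is cleaned up below and printed as the headword line
indx=$1;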
# convert {...} -> <...> (curly brackets are interpreted as links by some clients)
gsub(/\{[^\{\}]*\}/, "<&>"); gsub(/<\{/, "<"); gsub(/\}>/, ">");
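# remove "SEE: ..." cross-references from the index
gsub(/SEE:.*/, "", indx);
# remove {...}, (...) and [...] from the index; the (...) pass runs
# twice so that one level of nested parentheses is removed as well
gsub(/\{[^}]*\}/, "", indx);
gsub(/\([^()]*\)/, "", indx);
gsub(/\([^()]*\)/, "", indx);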
gsub(/\[[^][]*]/, "", indx);
# convert , ; | / -> ::
gsub(/[;,|/]/, "::", indx);
# remove extra spaces before and after ::
gsub(/[ ]*::[ ]*/, "::", indx);
# collapse runs of colons (e.g. "::::") into ::
gsub(/::+/, "::", indx);
# remove trailing spaces and a trailing ::
gsub(/[ ]*$/, "", indx);
gsub(/[ ]*::$/, "", indx);
# remove trailing colons (and spaces around them) from the first field
gsub(/[ ]*:+[ ]*$/, "", $1);
# wrap the target of "SEE: ..." in {...} so that clients which support it show a link
# (gensub needs gawk; this undocumented feature does not work with all clients)
$1=gensub(/(.*SEE:[ ])(.*)/, "\\1{\\2}", "g", $1);
# print the headword line, then both sides as indented definition lines
print indx;
print " "$1"\n\t"$2;
}
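To see what the headword (index) line of an entry looks like, the cleanup steps applied to indx can be tried on single ding lines. The sketch below is a simplified re-implementation, not the script itself: it assumes the headword line is built from the left-hand side of each entry, treats '|' as a separator as the script's comment describes, leaves out the {...} to <...> conversion and the SEE link (which only affect the definition lines), and uses plain POSIX awk. The name ding_index and the sample entry are hypothetical.

```shell
#!/bin/bash
# ding_index: hypothetical helper that repeats the indx cleanup of ding2dictd.awk
# on standard input and prints one headword line per ding entry.
# Plain POSIX awk (no gensub), so gawk is not required for this sketch.
ding_index() {
  awk -F ' :: ' '{
    indx = $1                          # left-hand side of the entry
    gsub(/SEE:.*/, "", indx)           # drop "SEE: ..." references
    gsub(/\{[^}]*\}/, "", indx)        # drop {...}
    gsub(/\([^()]*\)/, "", indx)       # drop (...); run twice for one
    gsub(/\([^()]*\)/, "", indx)       # level of nested parentheses
    gsub(/\[[^]]*\]/, "", indx)        # drop [...]
    gsub("[;,|/]", "::", indx)         # separators become ::
    gsub(/ *:: */, "::", indx)         # trim spaces around ::
    gsub(/::+/, "::", indx)            # collapse repeated colons
    gsub(/ *$/, "", indx)              # trailing spaces
    gsub(/ *::$/, "", indx)            # trailing ::
    print indx
  }'
}

echo 'Haus {n}; Gebäude {n} | Häuser {pl}; Gebäude {pl} :: house; building | houses; buildings' | ding_index
# prints: Haus::Gebäude::Häuser::Gebäude
```

Each '::' in that line is the --headword-separator passed to dictfmt, so the entry can be looked up under every word listed on its left-hand side.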