User:Flubot/What links here

From Wiktionary, the free dictionary

This script is not yet complete, but it works well enough in most cases.

It takes a list of entries as input. For each entry it creates a file named <title>-enwikt.temp, with spaces in the title replaced by underscores. The file holds the XML result of the API command list=backlinks, that is, the entries in the main namespace that link to that entry. Only the first 500 backlinks are retrieved (bllimit=500), and any further ones are ignored.
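The script sends one request per entry, so it stops at 500 backlinks. The following is a sketch of how the remaining backlinks might be fetched. It assumes that the MediaWiki API adds an element such as <continue blcontinue="..." continue="..."/> to a response when more results exist (continuation as described in the API:Continue help page), and that sending those values back in the next request returns the next batch. next_blcontinue is a hypothetical helper name, and both responses below are made up.

```shell
#!/bin/bash
# Sketch only, under the assumption stated above: a response with more
# results carries <continue blcontinue="..." continue="..."/>.

# next_blcontinue (hypothetical helper): print the blcontinue value found in
# the XML response given as $1, or nothing if there is no <continue> element.
next_blcontinue() {
    printf '%s\n' "$1" | sed -n 's/.*<continue[^>]*blcontinue="\([^"]*\)".*/\1/p'
}

# Made-up, abbreviated responses for illustration only.
more='<api><continue blcontinue="0|1234" continue="-||" /><query><backlinks><bl pageid="1" ns="0" title="example" /></backlinks></query></api>'
done_resp='<api batchcomplete=""><query><backlinks><bl pageid="1" ns="0" title="example" /></backlinks></query></api>'

next_blcontinue "$more"       # prints 0|1234
next_blcontinue "$done_resp"  # prints nothing
```

A full loop would presumably repeat the curl call and pass back every attribute of the <continue> element, for example -d "blcontinue=$token" together with the continue value. It would stop once next_blcontinue prints nothing.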

An example of such XML output (what links to ace, in namespace 0, i.e. the main namespace): http://en.wiktionary.org/w/api.php?action=query&format=xml&list=backlinks&blnamespace=0&bllimit=500&bltitle=ace
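The API typically returns this XML on a single line. The script therefore inserts a line break after every '>' so that each <bl .../> element lands on its own line, where grep can pick it out. Below is a sketch of that step. It uses awk instead of the GNU-only "\n" in sed's replacement so that it also runs where sed is not GNU sed. split_tags is a hypothetical helper name, and the sample response is invented for illustration (made-up pageids and titles).

```shell
#!/bin/bash
# split_tags (hypothetical helper): read XML on stdin and print it with a
# line break after every '>', so that each element ends up on its own line.
# awk's gsub is used so the sketch does not depend on GNU sed.
split_tags() {
    awk '{ gsub(/>/, ">\n"); print }'
}

# Made-up, abbreviated response (pageids and titles invented for illustration).
sample='<?xml version="1.0"?><api batchcomplete=""><query><backlinks><bl pageid="101" ns="0" title="ace in the hole" /><bl pageid="102" ns="0" title="aces" /></backlinks></query></api>'

# Each <bl .../> element now sits on its own line, so grep can count them.
printf '%s\n' "$sample" | split_tags | grep -c '<bl pageid'   # prints 2
```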

The XML for all entries is appended to log.xml, and the titles of all linking pages are written to log.txt, one [[wikilink]] per line.
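log.txt is built from log.xml by a short pipeline. The sketch below wraps that pipeline in a function and runs it on made-up lines. titles_to_wikilinks is a hypothetical name. The pipeline relies on the attributes of each <bl> element appearing in the order pageid, ns, title. With that order, splitting a line on '"' makes field 6 the title.

```shell
#!/bin/bash
# titles_to_wikilinks (hypothetical helper): the pipeline used for log.txt.
# In a line such as  <bl pageid="101" ns="0" title="ace in the hole" />
# splitting on '"' gives field 2 = pageid, 4 = namespace, 6 = title;
# awk prints field 6 and sed wraps it as a [[wikilink]].
titles_to_wikilinks() {
    grep '<bl pageid' | awk -F'"' '{ print $6 }' | sed -e 's/^/[[/; s/$/]]/'
}

# Made-up lines in the one-tag-per-line format the script writes to log.xml.
printf '%s\n' \
    '<backlinks>' \
    '<bl pageid="101" ns="0" title="ace in the hole" />' \
    '<bl pageid="102" ns="0" title="aces" />' \
    '</backlinks>' | titles_to_wikilinks
# prints:
# [[ace in the hole]]
# [[aces]]
```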

getbacklinks.sh

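#!/bin/bash
# getbacklinks.sh: save the main-namespace pages that link to each given
# headword (first 500 only), using list=backlinks of the MediaWiki API.

wiki="en.wiktionary.org"

# curl options as an array, so that -H 'Expect:' reaches curl as one argument
# without literal quote characters (an unquoted string would keep them).
curlargs=(-s -S --retry 10 -H 'Expect:' -f)
curlurl="https://$wiki/w/api.php?action="


# Fetch the backlinks of "$title", write the XML with one tag per line to
# "$title-enwikt.temp" and append that file to log.xml.
function getblfromtitle() {
    xmltext=$(curl "${curlargs[@]}" -d "list=backlinks" -d "format=xml" -d "blnamespace=0" -d "bllimit=500" --data-urlencode "bltitle=$title" "${curlurl}query")
    # GNU sed: "\n" in the replacement inserts a newline after every '>'
    printf '%s\n' "$xmltext" | sed -e 's/>/>\n/g' > "$title-enwikt.temp"
    cat "$title-enwikt.temp" >> log.xml
}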

function checkusage() {
    if [ "$1" -eq 0 ]; then
        echo "Usage: $0 1st_word 2nd_word 3rd..."
        echo "Note: every headword that contains spaces must be put in quotes."
        echo "For example:"
        echo "$0 'Main page' 'playing card'"
        echo "Alternatively, you can call this script with"
        echo "cat file-with-words | $0 -"
        echo "and it will read the list of headwords from stdin."
        exit 1
    fi
}

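## MAIN
##

checkusage $#

if [ "$1" = "-" ]; then
    # Read one headword per line from stdin, skipping empty lines;
    # spaces in a title become underscores, as in page names.
    while IFS= read -r title0; do
        [ -z "$title0" ] && continue
        title="${title0// /_}"
        getblfromtitle
    done
else
    for title0 in "$@"; do
        title="${title0// /_}"
        getblfromtitle
    done
fi

# Take the title attribute (6th field when splitting on '"') of every <bl>
# element and write it to log.txt as a [[wikilink]], one per line.
grep '<bl pageid' log.xml | awk -F'"' '{ print $6 }' | sed -e 's/^/[[/; s/$/]]/' > log.txt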