User talk:Debaser003

From Wiktionary, the free dictionary

I have an idea for a search engine that would extract the internal links from pages on this site and check whether a domain name exists for each link text under various extensions (.com, .net, .org). Since this is open content, I believe there are no restrictions on doing this. Please tell me if anyone knows otherwise. The page, or its 'edit' source, would need to be fetched and parsed for each page. I am having trouble fetching the source of pages on Wiktionary and Wikipedia from a Perl CGI script. Is there a reason the source might be protected or difficult to capture from a script like this?

There are no protections whatsoever. You can write a script and access the pages through standard web protocols. You can take a look at the PHP script I wrote to get some pointers. I'm not clear on what you are trying to achieve by checking whether word.com, word.net, or word.org exists, but I guess you have your reasons. Polyglot 20:36, 17 Mar 2004 (UTC)

The essence of the search engine is to gather terms and phrases related to the search string and check whether sites with those names exist. It would find synonyms.com, antonyms.net, translations.org, relationships.edu, whereas Google, for example, matches only small variations of the string. Obscure sites would be uncovered for the user. I wish to complete it in Perl, but the code that fetches other sites' source returns nothing for any address on Wiktionary.org. I am using the following code:

    use strict;
    use warnings;
    use LWP::Simple;

    # get() returns undef on any failure, so the cause is not visible here.
    my $source = get('http://wiktionary.org/wiki/Ball');

    print "Content-type: text/html\n\n";
    print defined $source ? $source : "Failed\n";


After much searching and many trials, I got this base script for the search engine working. The problem was identification: when the script identifies itself as Mozilla/8.0 in the User-Agent header, the source can be fetched. I use the following Perl code, if anyone else is interested. I will post the link to the search engine when it is complete.

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    $ua->agent("Mozilla/8.0");    # the fetch returned nothing with the default agent string

    my $req = HTTP::Request->new(GET => 'http://wiktionary.org/wiki/Ball');
    my $res = $ua->request($req);

    if ($res->is_success) {
        print "Content-type: text/html; charset=utf-8\n\n", $res->content;
    } else {
        # status_line reports the HTTP status, which helps when diagnosing failures
        print "Content-type: text/html\n\nError: ", $res->status_line;
    }
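For anyone following along, here is a rough sketch of the next step described above: pulling internal link titles out of a fetched page and turning each into candidate domain names. It is only an illustration under some assumptions, not the code behind the search engine. The helper names (internal_link_titles, candidate_domains, domain_resolves) are made up for this example, the pattern assumes internal links appear as href="/wiki/Title" in the HTML, and a regular expression is a crude substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the titles of internal /wiki/ links in page HTML.
# Assumes links look like href="/wiki/Title"; duplicates are dropped.
sub internal_link_titles {
    my ($html) = @_;
    my (%seen, @titles);
    while ($html =~ m{href="/wiki/([^"#?]+)"}g) {
        my $title = $1;
        next if $title =~ /[:%]/;    # skip namespaced pages (User:, Special:) and %-escaped titles
        push @titles, $title unless $seen{$title}++;
    }
    return @titles;
}

# Hypothetical helper: build candidate domain names from a title.
# Extensions default to com, net and org, as in the original idea.
sub candidate_domains {
    my ($title, @tlds) = @_;
    @tlds = qw(com net org) unless @tlds;
    (my $label = lc $title) =~ s/[^a-z0-9]+//g;    # drop underscores and punctuation
    return () unless length $label;
    return map { "$label.$_" } @tlds;
}

# Hypothetical helper: true if the name resolves in DNS (needs network access).
# A name that resolves is not proof that a web site is actually served there.
sub domain_resolves {
    my ($domain) = @_;
    my $packed_address = gethostbyname($domain);
    return defined $packed_address;
}

# Small offline demonstration on a hand-written HTML fragment.
my $sample_html = '<a href="/wiki/Sphere">sphere</a> '
                . '<a href="/wiki/Ice_cream">ice cream</a> '
                . '<a href="/wiki/Special:Search">search</a>';
for my $title (internal_link_titles($sample_html)) {
    print join(' ', candidate_domains($title)), "\n";
}
```

In a real run, the HTML would come from the fetch above, and each candidate would be passed to domain_resolves before being shown to the user. Spacing out the requests would be the considerate thing to do when crawling many pages.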

It is not quite as I imagined but, being wiki-powered, it has the potential to grow. A bit of a curiosity for the time being.

http://www.fictionfiction.net/main/employ/search