
FTP search frequently asked questions

How often is the data set updated ?

We fetch ls listings once a week from most ftp servers. If the server has a ls-lR.gz file, we fetch that file instead of asking the ftp server to perform a recursive directory listing. Thus, if that file is outdated, our view of the files on that ftp server is outdated.

We generate a new data set from the ls listings at the end of a data collection pass if the current data set is more than 4 days old, or we have more than 700 parsed ls listings that are newer than the current data set. The data set generation interval varies between 2 and 3 days.
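The regeneration rule above can be sketched as a small shell function. This is only an illustration of the two stated thresholds; the function name and the idea of passing the age in whole days are assumptions, not a description of the actual FTP search code.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when a new data set
# should be generated according to the two thresholds in the text.
#   $1: age of the current data set, in whole days (assumed unit)
#   $2: number of parsed ls listings newer than the current data set
should_regenerate() {
    age_days=$1
    newer_listings=$2
    # Regenerate if the data set is more than 4 days old,
    # or if more than 700 newer listings are waiting.
    [ "$age_days" -gt 4 ] || [ "$newer_listings" -gt 700 ]
}
```

For example, `should_regenerate 2 750` succeeds because of the listing count, while `should_regenerate 4 700` fails because neither threshold is exceeded.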

How do I add my Web page to FTP search ?

We do not index Web pages. Thus, URLs of the form http://www.unit.no/ are not indexed.

How do I add my ftp server to FTP search ?

First verify that your server supports anonymous ftp and does one of the following: it has a regularly updated ls-lR.gz file in the root directory of the anonymous ftp area, containing a compressed unix style recursive directory listing, or it gives a unix style recursive directory listing in response to the "LIST -lR" command at the ftp protocol level.
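One way to check the first of these conditions is sketched here. It assumes that curl and gzip are installed, and it only tests for a readable ls-lR.gz file, not for the "LIST -lR" response. The function name check_ls_lR is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: checks whether an anonymous ftp server publishes
# a valid ls-lR.gz file in its root directory.
#   $1: host name of the ftp server
check_ls_lR() {
    host=$1
    tmp=$(mktemp) || return 1
    # curl logs in anonymously when the ftp:// URL carries no user name.
    # gzip -t verifies that the downloaded file is an intact gzip file.
    if curl -s -o "$tmp" "ftp://$host/ls-lR.gz" && gzip -t "$tmp" 2>/dev/null; then
        echo "ok: ftp://$host/ls-lR.gz is a valid gzip file"
        status=0
    else
        echo "missing or unreadable: ftp://$host/ls-lR.gz"
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

A call such as `check_ls_lR ftp.unit.no` prints one line and returns a status that a larger script can act on.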

Then send an email to tegge@idt.ntnu.no. It should contain the word add followed by the URL of the server to be indexed, without any directory specified, e.g. add ftp://ftp.unit.no/
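The request line can be produced from any ftp URL by cutting off the directory part. The sketch below shows one way to do that with sed; the function name add_request_line is hypothetical, and the code only prints the line, leaving the mailing to whatever mail client is at hand.

```shell
#!/bin/sh
# Hypothetical helper: prints the line to put in the email,
# reducing the given ftp URL to the server root.
#   $1: an ftp URL, with or without a directory part
add_request_line() {
    # Keep "ftp://host", drop everything after it, and end with one "/".
    root=$(printf '%s\n' "$1" | sed 's|^\(ftp://[^/]*\).*$|\1/|')
    printf 'add %s\n' "$root"
}

# Example: both calls print "add ftp://ftp.unit.no/"
add_request_line ftp://ftp.unit.no/pub/unix/
add_request_line ftp://ftp.unit.no
```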

To have only a part of the ftp server indexed, you must maintain a ls-lR.gz file in the root directory of the anonymous ftp area containing what you want indexed.

Why does it take 20 seconds for any page to appear ?

Our WWW gateway uses DNS and ident lookups, mainly to produce statistics, e.g. a histogram of how many requests the same person makes daily, how many persons use this service, and how many use it from each country (top-level domain). If these lookups cause a 10-20 second delay in bringing up any page from this WWW gateway, you probably have a problem with your name servers, your firewall, your TCP stack, or your ident daemon.

A firewall should not be configured to drop (deny) incoming TCP connections to port 113. Instead, it should be configured to send back a RST packet (reject), so that the ident request is aborted with a Connection refused error instead of a timeout.
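As an illustration only, on a Linux host that filters with iptables (an assumption; no particular firewall product is named here), a reject rule of this kind might look like the following. It must be run as root, and other firewalls use different syntax.

```shell
# Assumption: Linux with iptables. Answer incoming ident requests
# (TCP port 113) with a RST packet instead of silently dropping them,
# so the lookup fails at once with "Connection refused".
iptables -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset
```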

How do I remove my ftp server from FTP search?

Send an email to tegge@idt.ntnu.no.

How can I prevent FTP search from performing a recursive ls listing on my ftp server ?

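Maintain a ls-lR.gz file in the root directory of the anonymous ftp area. A sample script could be:
        #!/bin/sh
        # Build the listing under a temporary name, then rename it into
        # place, so that a partially written ls-lR.gz is never visible.
        cd /ftparea &&
        ls -laR pub > ls-lR.new 2>/dev/null &&
        # -f overwrites a stale ls-lR.new.gz left behind by a failed run.
        gzip -9f ls-lR.new &&
        mv ls-lR.new.gz ls-lR.gz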
This script could be run from cron on a daily basis. When the file ls-lR.gz is present, FTP search will fetch that file instead of performing a recursive ls listing.

If your system does not have a ls command with the expected output, but your ftp server gives the right listing, you might want to connect to your own ftp server via a command-line ftp client in the script that generates the ls-lR.gz file.
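A minimal sketch of that approach follows. It assumes a traditional BSD-style command-line ftp client that accepts -n (no automatic login) and -i (no interactive prompting); the FTP_HOST and FTP_AREA values, the function name, and the anonymous password are placeholders to adapt.

```shell
#!/bin/sh
# Assumed settings: the ftp server to query and the local path of
# the anonymous ftp area. Both are placeholders.
FTP_HOST=${FTP_HOST:-localhost}
FTP_AREA=${FTP_AREA:-/ftparea}

# Hypothetical helper: builds ls-lR.gz from the ftp server's own listing.
# The body runs in a subshell so the caller's directory is unchanged.
make_listing_via_ftp() (
    cd "$FTP_AREA" || exit 1
    rm -f ls-lR.new ls-lR.new.gz
    # "ls -lR ls-lR.new" requests a recursive listing from the server
    # and stores it in the local file ls-lR.new.
    ftp -i -n "$FTP_HOST" <<EOF
user anonymous nobody@example.org
ls -lR ls-lR.new
bye
EOF
    # Replace the published file only if a non-empty listing arrived.
    [ -s ls-lR.new ] &&
    gzip -9 ls-lR.new &&
    mv ls-lR.new.gz ls-lR.gz
)
```

Calling `make_listing_via_ftp` from the cron script takes the place of the ls command. The previous ls-lR.gz stays in place if the ftp session fails to produce a listing.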

A more radical alternative is to remove the server from FTP search.



tegge@idt.ntnu.no
Last modified: Thu Feb 13 16:35:49 MET 1997