Date: Wed, 4 Feb 1998 14:00:16 -0800
From: Chuq Von Rospach <firstname.lastname@example.org>
At 1:33 PM -0800 2/4/98, Gerald Oskoboiny wrote:
>I disagree -- there's a very good reason for archives to be in
>global search engines: so the information is easily accessible
>to anyone who might need it!
Sorry, but my experience is that the archives tend to clog things up, not
enlighten. There's such a thing as overkill. And even if I didn't believe
that was true, my users' privacy issues override distribution of my archives
in that way.
I want both things: accessibility and privacy. I want any visitor to my
webpage to be able to access the archives but I don't want search engines
picking them up.
Right now, I more or less have that. The archives are in the FTP area of
my web space so the search engines don't get them. The path to them is
clearly accessible from the web site. A friend is writing me a local
search engine so people can pull up individual messages via keywords.
Right now, they are arranged chronologically in files by month. Most are
Are there global search engines I have to worry about that index FTP sites? I
know such things exist, but are they used much? Are they used by spammers?
Will the existence of a local search program change anything? I don't even
have room for all my archives gzipped (they are currently spread over 3
different accounts; the list has been running over 7 years) let alone a
back up copy with full attributions, though I'm in the process of getting
A word about robots.txt. It doesn't work for most of us. I set it up and
the search engines (AltaVista) still went to the pages, months later. I
had blocked out an entire subdirectory and I know I did the logistics
correctly. I asked around my ISP's local newsgroups and it turns out that
you can only block directories if the robots.txt file is at the top level
(i.e., if you have a custom domain).
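The "top level" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular search engine behaves: the helper name robots_url, the host www.example-isp.com, and the /~user/ path are hypothetical, while www.mydomain.com and /keepout/ come from the message. It uses the standard library's urllib.robotparser to show that a crawler looks for robots.txt only at the root of the host name it is visiting, and that Disallow paths are matched from that root.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the one robots.txt location a crawler consults for page_url.

    Hypothetical helper: it keeps the scheme and host and replaces the
    path with /robots.txt, which is where the file has to live.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Rules as they might appear in http://www.mydomain.com/robots.txt
# (assumed content, blocking one subdirectory for all robots).
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /keepout/",
])

private_page = "http://www.mydomain.com/keepout/privatepage.html"
public_page = "http://www.mydomain.com/index.html"
# Hypothetical ISP-style address for the same web space.
isp_page = "http://www.example-isp.com/~user/keepout/privatepage.html"

# Each host name has its own robots.txt location.
print(robots_url(private_page))
print(robots_url(isp_page))

# The rules parsed above apply only to pages on www.mydomain.com.
print(rules.can_fetch("*", private_page))
print(rules.can_fetch("*", public_page))
```

Under these assumptions, a robots.txt kept inside a user directory such as /~user/ is never requested, which matches what the ISP newsgroups reported. A page reachable under a second host name is governed by whatever robots.txt sits at the root of that second host.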
I don't really understand this though. If I have a custom domain that's
really a virtual domain, why would robots.txt work? It would perhaps keep
out searches of, say, http://www.mydomain.com/keepout/privatepage.html but
how would it stop a search of the very same file which is also known as,
People on the groups mentioned alternatives to robots.txt where you put an
HTML command on each page you don't want searched. But I'm afraid it
didn't make any sense to me. Is there someone who could give me the code
(I know HTML and could probably implement it with a brief explanation) to
block searches of individual pages? If there is a way to block directories
or FTP sites from search engines, I'd appreciate that very much.
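The per-page alternative mentioned on the groups is most likely the robots META tag, which goes in the HEAD of each HTML page. The sketch below is an assumption-laden illustration rather than any search engine's actual code: the sample page, the class name RobotsMetaParser, and the variable names are hypothetical. It shows the tag itself and how a cooperating robot might read it, using Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical page. The META line is the part to copy into the HEAD
# of every page that should stay out of search engines.
PAGE = """<html>
<head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>Archive text here.</body>
</html>"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for word in content.split(","):
                if word.strip():
                    self.directives.add(word.strip().lower())


parser = RobotsMetaParser()
parser.feed(PAGE)

# A cooperating robot would skip indexing and not follow the links.
may_index = "noindex" not in parser.directives
may_follow = "nofollow" not in parser.directives
print(sorted(parser.directives), may_index, may_follow)
```

The tag is only a request: it works for robots that choose to honor it, and the robot still has to fetch the page to see it. It also lives inside HTML, so it cannot be attached to plain archive files sitting in an FTP area.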
"There's nothing wrong with me. Maybe there's
something wrong with the universe." (ST:TNG)
Cyndi Norman
email@example.com