Great Circle Associates List-Managers
(February 1998)
 


Subject: Re: Archives and robots.txt
From: Cyndi Norman <cnorman @ best . com>
Date: Wed, 4 Feb 1998 18:41:40 -0800 (PST)
To: list-managers @ GreatCircle . COM
Cc: cnorman @ shell7 . ba . best . com
In-reply-to: <v04003a31b0fe95333d9a@[17.219.12.172]> (message from Chuq VonRospach on Wed, 4 Feb 1998 14:00:16 -0800)
Reply-to: cnorman @ best . com

   Date: Wed, 4 Feb 1998 14:00:16 -0800
   From: Chuq Von Rospach <chuqui@plaidworks.com>

   At 1:33 PM -0800 2/4/98, Gerald Oskoboiny wrote:

   >I disagree -- there's a very good reason for archives to be in
   >global search engines: so the information is easily accessible
   >to anyone who might need it!

   Sorry, but my experience is that the archives tend to clog things up, not
   enlighten. There's such a thing as overkill. And even if I didn't believe
   that was true, my users' privacy issues override distribution of my archives
   in that way.

I want both things: accessibility and privacy.  I want any visitor to my
webpage to be able to access the archives, but I don't want search engines
picking them up.

Right now, I more or less have that.  The archives are in the FTP area of
my web space, so the search engines don't get them.  The path to them is
clearly accessible from the web site.  A friend is writing me a local
search engine so people can pull up individual messages via keywords.
Right now, they are arranged chronologically in files by month.  Most are
gzipped.

Are there global search engines I have to worry about that do FTP sites?  I
know such things exist, but are they used much?  Are they used by spammers?
Will the existence of a local search program change anything?  I don't even
have room for all my archives gzipped (they are currently spread over 3
different accounts; the list has been running over 7 years), let alone a
backup copy with full attributions, though I'm in the process of getting
CD backups.

A word about robots.txt: it doesn't work for most of us.  I set it up and
the search engines (AltaVista, for one) still went to the pages, months
later.  I had blocked out an entire subdirectory, and I know I set the
file up correctly.  I asked around my ISP's local newsgroups, and it turns
out that you can only block directories if the robots.txt file is at the
top level (i.e., if you have a custom domain).
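
In case it helps anyone spot my mistake, the file I set up followed the
standard exclusion format as best I understand it, something like this
(the directory name is just a stand-in for my real one):

   # ask all robots to stay out of one subdirectory
   User-agent: *
   Disallow: /keepout/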

I don't really understand this, though.  If I have a custom domain that's
really a virtual domain, why would robots.txt work any better?  It would
perhaps keep searches out of, say,
http://www.mydomain.com/keepout/privatepage.html, but how would it stop a
search of the very same file, which is also known as
http://www.best.com/~cnorman/keepout/privatepage.html?
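
My guess (and it is only a guess) is that a robot fetches /robots.txt
separately for each hostname it crawls, so the two names of that one page
fall under two different exclusion files:

   http://www.mydomain.com/robots.txt   (mine to edit)
   http://www.best.com/robots.txt       (my ISP's, which I can't touch)

If that's right, then nothing I put in my own robots.txt can protect the
best.com name for the page.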

People on the groups mentioned alternatives to robots.txt where you put an
HTML directive on each page you don't want searched, but I'm afraid it
didn't make any sense to me.  Could someone give me the code (I know HTML
and could probably implement it with a brief explanation) to block searches
of individual pages?  If there is a way to block directories or FTP sites
from search engines, I'd appreciate that very much.
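
From the bits and pieces I've picked up, the per-page mechanism is a META
tag in each page's HEAD, something like this (untested by me, so treat it
as a sketch rather than gospel):

   <HEAD>
   <TITLE>A private archive page</TITLE>
   <!-- ask robots not to index this page or follow its links -->
   <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
   </HEAD>

Like robots.txt, it only works if the robot chooses to honor it, and
presumably only on HTML pages, so it wouldn't help with the gzipped files
in my FTP area.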

Thanks,
Cyndi

-- 
_______________________________________________________________________________
"There's nothing wrong with me.  Maybe there's                     Cyndi Norman
something wrong with the universe." (ST:TNG)                   cnorman@best.com
__________________________________________________ http://www.best.com/~cnorman

