Great Circle Associates List-Managers
(March 2001)
 


Subject: Re: robots.txt
From: "Peter Galbavy" <peter . galbavy @ knowledge . com>
Organization: Knowledge Matters Ltd.
Date: Thu, 1 Mar 2001 17:45:13 -0000
To: "Tim Pierce" <twp @ rootsweb . com>, "Chuq Von Rospach" <chuqui @ plaidworks . com>
Cc: "JC Dill" <inet-list @ vo . cnchost . com>, "List-Managers" <list-managers @ GreatCircle . COM>
References: <5.0.0.25.2.20010228115334.02e015e0@pop3.vo.cnchost.com> <B6C2B066.5DC0%chuqui@plaidworks.com> <20010301110428.K47456@ma-1.rootsweb.com>

> In fact, it's not unlike putting the data behind a passworded web
> page.  The difference is that there are a lot of "passwords" (search
> terms) which are likely to yield access.  But the problem space is
> also so much larger than traditional password access that there is
> no motivation for the harvesting spiders to try to solve it.

Can I quote "assumption is the mother of all f*** ups"?

Let me go to any old search engine (private or public) and try the query
"mailto".
Oops. Perhaps "email" or "e-mail". Oops again. "contact" is another good
one.

While I wholly agree that the problem / query space is too large to be fully
searched, the 80/20 rule should tell us that a mail address harvesting
engine will be written, perhaps, by someone who may know 20 different
generic terms to put into 20 different typical FORM fields for a POST and
try. Then follow all the HREFs off those pages and Roberts is your parent's
sibling.
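
To put a rough number on the 80/20 point, here is a minimal offline sketch.
Everything in it is an assumption made up for illustration: the PAGES
corpus, the search() and coverage() helpers, and the address pattern are
hypothetical, and only the four query terms come from the message above.
It measures how many address-bearing pages a handful of generic queries
would surface in a toy archive; it does not touch the network.

```python
import re

# Hypothetical archive pages: invented data, for illustration only.
PAGES = {
    "/msg001.html": 'Contact the list owner: <a href="mailto:owner@example.org">owner</a>',
    "/msg002.html": "Send e-mail to alice@example.com for details.",
    "/msg003.html": "My email is bob@example.net, write any time.",
    "/msg004.html": "Thread about robots.txt syntax, no addresses here.",
    "/msg005.html": "Reach carol@example.org off-list.",
}

# The generic query terms suggested in the message.
GENERIC_TERMS = ["mailto", "email", "e-mail", "contact"]

# Deliberately loose pattern; good enough to flag pages that expose an address.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def search(term):
    """Toy stand-in for the archive's search form: case-insensitive substring match."""
    term = term.lower()
    return {path for path, body in PAGES.items() if term in body.lower()}


def coverage(terms):
    """Return (address-bearing pages reached by the terms, all address-bearing pages)."""
    exposed = {path for path, body in PAGES.items() if ADDRESS_RE.search(body)}
    found = set()
    for term in terms:
        found |= search(term)
    return len(found & exposed), len(exposed)


if __name__ == "__main__":
    hit, total = coverage(GENERIC_TERMS)
    print(f"{hit} of {total} address-bearing pages reachable with {len(GENERIC_TERMS)} generic terms")
```

On this toy corpus, four generic terms reach three of the four pages that
expose an address. The one they miss is the page that never uses a word
like "email" or "contact", which is the only case where the large query
space actually protects anything.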

Peter



