I think the CERN proxy server is very good at doing what you want. You can
download it from http://www.w3.org/ (binary or source). It will let you Pass,
Fail or Map virtually any URL. The good thing is that you can use a
combination of these at the same time (unlike other proxy servers).
----------
From: firewalls-owner
Sent: Monday, December 16, 1996 12:00 PM
To: FIREWALLS
Subject: FW: Web Proxies
On Mon, 16 Dec 1996 15:14:00 +0800, Hiro <hirod @ hutchnet . com . hk> wrote:
>---------------------------------------------------------------------------
>Hi
>
> Simple question.....
>
> Has anyone 1. heard of or 2. has implemented a proxy that blocks users
>from accessing all http sites on the Net except for a specific one from
>an administrator controlled list?
>
>Regards
>
>
>Hiro
I've written one but can't distribute the code. Our firewall is configured
to reject several of our internal subnets but we needed to allow systems on
those subnets to get to a select number of sites. The solution was to write
a simple pass-through proxy on a third system which validates the request
and passes the request on. The algorithm is to:
1. Read the entire input stream. To do this correctly, there is a field in
the header to look at. I don't do it correctly but have confirmed this works
with all the browsers that our internal people are supposed to be using. I
look for:
a. If the input starts with "P", wait for an EOL.
b. If the input starts with "G" or "H", wait for 2 consecutive EOLs.
(I ignore returns, only looking for line feeds)
c. If the line doesn't start with one of the above 3 characters, reject it.
2. Scan the input line to find the host:
a. Skip to the first white space (skip the PUT, GET, HEAD)
b. Skip any leading "/"
c. The next 7 characters should be "HTTP://" (upper or lower case)
d. The site is between this and the next white space, EOL, or /
3. If the site is on your list (compared without regard to case), open a
connection to the site, pass the input along and wait for the site to
respond. In our case, we always open the connection to our firewall.
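
As a rough illustration of steps 1 to 3, here is a minimal Python sketch. It
is not the code described above (which can't be distributed); the function
names and the example site list are made up for illustration, and it keeps
the same shortcut of looking only at the first character and at line feeds
rather than at the header's length field.

```python
import re

def request_complete(buf: bytes) -> bool:
    """Step 1: decide whether the browser has finished sending."""
    if not buf:
        return False                       # nothing read yet, keep waiting
    data = buf.replace(b"\r", b"")         # ignore returns, look only at line feeds
    first = buf[:1]
    if first == b"P":                      # 1a: wait for one EOL
        return b"\n" in data
    if first in (b"G", b"H"):              # 1b: wait for two consecutive EOLs
        return b"\n\n" in data
    raise ValueError("request does not start with P, G or H")  # 1c: reject

def extract_host(request_line: str):
    """Step 2: return the lowercased site name, or None if there is none."""
    parts = request_line.split(None, 1)    # 2a: skip the method
    if len(parts) < 2:
        return None
    rest = parts[1].lstrip("/")            # 2b: skip any leading "/"
    if rest[:7].lower() != "http://":      # 2c: scheme, upper or lower case
        return None
    # 2d: the site runs up to the next white space, EOL, or "/".
    # Note that a ":port" suffix, if present, stays part of the result.
    host = re.match(r"[^\s/]*", rest[7:]).group(0)
    return host.lower() or None

def is_allowed(host, allowed_sites) -> bool:
    """Step 3: case-insensitive check against a set of lowercase site names."""
    return host is not None and host.lower() in allowed_sites
```

With a hypothetical list such as {"www.example.com"}, a request line of
"GET http://WWW.Example.COM/ HTTP/1.0" passes the check, while a plain
"GET /index.html HTTP/1.0" (no full URL) yields no host and is refused.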
Note that the input (from the browser) phase is completed before scanning
the URL and opening the remote connection. Also, once the connection is
opened, I don't pay attention to what comes up from the browser since I'm
not going to pass it to the remote site anyway. Browsers use a half-duplex,
single-transaction protocol.
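
The ordering described here (read everything from the browser first, then
talk to the remote side) might look roughly like the sketch below. The
function names, the is_complete callback and the size limit are assumptions
for illustration, not part of the original proxy, and the upstream socket is
assumed to be already connected (in our setup it would be the firewall).

```python
import socket

def read_request(client: socket.socket, is_complete, limit: int = 65536) -> bytes:
    """Read from the browser until is_complete(buf) says the request is whole."""
    buf = b""
    while not is_complete(buf):
        chunk = client.recv(4096)
        if not chunk:                      # browser closed the connection early
            break
        buf += chunk
        if len(buf) > limit:               # assumed safety cap, not in the original
            raise ValueError("request too large")
    return buf

def relay_once(client: socket.socket, upstream: socket.socket, request: bytes) -> None:
    """Send the stored request upstream, then copy the reply back to the browser.

    Nothing further is read from the browser: one request, one response.
    """
    upstream.sendall(request)
    while True:
        chunk = upstream.recv(4096)
        if not chunk:                      # remote side finished its response
            break
        client.sendall(chunk)
```

The URL check would sit between the two calls, so the remote connection is
only opened for requests that have already been validated.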
There were several reasons for using a third host, but high on the list was
to keep the ruleset in the firewall simple. Our ruleset is already
complicated enough since we have subnets which can pass through, some which
can't, some which support ports which others don't, and such. Each rule has
been thought through as to whether we really need it and whether we can
somehow simplify the ruleset. I didn't want to throw in URL filtering.
As it stands, we can simply block all access through the firewall from
restricted subnets and use the third system (where this filter runs) both as
the filter and a router to switch from the restricted subnets to the
Internet-savvy subnet. If we only had 1 subnet, we could easily make a rule
that only the third host could go through the firewall. There is no URL
filtering for subnets which are allowed Internet access directly through the
firewall.
The downside is that the browsers on different subnets need to point to
different proxy systems. I decided that trying to maintain the rulesets on
the firewall to support a potentially changing list of URLs, while allowing
only HTTP from the selected subnets, could easily lead to busted rulesets. I
also don't handle any cases where the remote location is allowing URL
proxies (http://remotesite://http:another site) but this isn't allowed at
the sites that we allow.
There is no reason why the above shouldn't work as a modified plug gateway
on a firewall. Check with your firewall vendor first, though; many support
some variant of URL filtering. No sense writing your own unless you have a
good reason for doing so.