While Marcus raised some good points, I can't let this message go
uncorrected. Let us not lose sight of the original question and the
firewall design posed to the list. Many people on this list seem to
think only in terms of Internet-to-private-net firewalls, but a
private-net-to-private-net firewall is just as useful, and in most
cases those are NOT T1 connections but Ethernet or FDDI. In fact, the
posting I was following up on said they wanted to build just such a
firewall.
> Mostly we say that kind of stuff based on real world
> experience installing firewalls for T1 connections and seeing
> that 60MHz boxes consume the load with no problem.
> I'm an experimental computer scientist, not a theoretical
> one. If I try something dozens of times and it works, after a
> while I'm willing to just say, "oh, with a 60MHz pentium it should
> be fast enough." :) Actually, based on real world experience
> with 50MHz '486 platforms I'd say they're fast enough. We went
> to Pentium boxes because Intel has deprecated the 486 in a major
> way - a 60MHz Pentium now costs what a 50MHz '486 used to. I'm
> sure we'll use 90MHz Pentiums someday, and guess what -- if my
> 50MHz '486 could handle the load, I'm willing to gamble that
> a processor more than twice as fast can handle it OK, too.
> Remember that Van Jacobson demonstrated saturating an ethernet
> with a Sun3/50 running TCP/IP. By extension, a Sun3/50
> can consume a T1 line.
First, as you say, you have lots of experience installing firewalls
for T1 connections, but that wasn't the original question. We are
talking about Ethernet speeds. I agree that an application gateway on
a 486 or Pentium can fill a T1, but what about Ethernet, T3, or FDDI?
I think not. Geez, for a 28.8K or 56K circuit a 286 running DOS would
probably work. :)
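
To put those link speeds side by side, here is a small Python
calculation of each one's nominal line rate relative to a T1. The rates
are the standard nominal figures (T1 1.544 Mbit/s, 10 Mbit/s Ethernet,
T3 44.736 Mbit/s, FDDI 100 Mbit/s); usable throughput is lower once
framing and protocol overhead are counted, so treat this as a rough
sketch only.

```python
# Nominal line rates in bits per second (framing/protocol overhead ignored).
LINK_RATES_BPS = {
    "28.8K modem": 28_800,
    "56K circuit": 56_000,
    "T1": 1_544_000,
    "Ethernet": 10_000_000,
    "T3": 44_736_000,
    "FDDI": 100_000_000,
}


def ratio_to_t1(name: str) -> float:
    """How many T1s worth of traffic the named link can carry."""
    return LINK_RATES_BPS[name] / LINK_RATES_BPS["T1"]


if __name__ == "__main__":
    for name in LINK_RATES_BPS:
        print(f"{name:12s} {ratio_to_t1(name):8.2f} x T1")
```

By this arithmetic a 10 Mbit/s Ethernet is roughly 6.5 T1s, a T3 about
29, and FDDI about 65, which is the gap between "fills a T1" and the
question that was actually asked.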
The part about Van saturating an Ethernet with a Sun3/50 is completely
off the point. Van did it running a single FTP process on the Sun
(well, actually two FTPs), not hundreds or thousands of processes.
> At TIS we use a 33 MHz 486 with a RISCOM T1 board
> running BSDI as a router between our Glenwood office and
> a local remote office. It has a Cisco on the other end and
> they talk HDLC. Looking at vmstat on the router box shows
> that typically it's using about 10% of the system. If a
> computer could yawn, this one would be showing tonsils. :)
Again, this is irrelevant unless you have misdescribed this box. It
is a router, not an application gateway. A router rarely has to use
user-level processes to move packets from one interface to another,
whereas an application gateway does.
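
The difference can be sketched in a few lines. The Python below is a
deliberately simplified, hypothetical one-direction relay (the names
`relay_once` and `relay` are illustrative, not from any firewall
product): every chunk is copied from the kernel into the user-level
process by `recv()` and back into the kernel by `sendall()`, so each
chunk costs at least two system calls and two copies that a
kernel-level router never makes. A real gateway would also relay in
both directions and multiplex many sockets with something like
`select()`.

```python
import socket


def relay_once(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Move one chunk from src to dst through user space.

    Returns the number of bytes forwarded; 0 means src has reached EOF.
    """
    data = src.recv(bufsize)  # system call: kernel buffer -> user buffer
    if not data:
        return 0
    dst.sendall(data)  # system call(s): user buffer -> kernel buffer
    return len(data)


def relay(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Relay src -> dst until EOF; return the total bytes forwarded."""
    total = 0
    while True:
        n = relay_once(src, dst, bufsize)
        if n == 0:
            return total
        total += n


if __name__ == "__main__":
    # Two local socket pairs stand in for the "inside" and "outside" connections.
    client, inside = socket.socketpair()
    outside, server = socket.socketpair()
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")
    client.shutdown(socket.SHUT_WR)  # signal EOF to the relay
    print(relay(inside, outside), "bytes relayed")
    print(server.recv(4096))
```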
> The biggest problems in building bastion host firewalls
> are resource starvation issues in the base kernels, not CPU
> or bus horsepower. The usual answer for resource starvation
> in UNIX is: Add memory, increase maxusers, rebuild kernel,
> reboot. :) It scales nicely.
Not for TCP/IP and certain linear searches done by the code. You can
add all the memory you want and those searches won't get any faster.
Ask DEC or Sun how easy it is to scale to thousands of processes.
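
To illustrate why memory doesn't help here: if connection state is kept
in a list that is scanned front to back for every incoming packet (as
classic BSD-derived TCP/IP code did for its protocol control blocks,
with only a one-entry cache in front of the list), the cost of each
lookup grows with the number of open connections no matter how much RAM
the box has. The Python below is a toy model, not kernel code;
`linear_lookup` and `average_comparisons` are hypothetical names.

```python
from typing import List, Tuple

# A connection is identified by (local addr, local port, remote addr, remote port).
Conn = Tuple[str, int, str, int]


def linear_lookup(table: List[Conn], key: Conn) -> Tuple[int, int]:
    """Scan the table front to back.

    Returns (index, comparisons); index is -1 if the key is absent.
    """
    for i, conn in enumerate(table):
        if conn == key:
            return i, i + 1
    return -1, len(table)


def average_comparisons(n: int) -> float:
    """Average comparisons per successful lookup in a table of n connections."""
    table = [("10.0.0.1", 80, "192.168.1.2", 1024 + i) for i in range(n)]
    total = sum(linear_lookup(table, conn)[1] for conn in table)
    return total / n


if __name__ == "__main__":
    for n in (200, 2000):
        print(n, "connections:", average_comparisons(n), "comparisons per lookup")
```

A successful lookup averages (n + 1) / 2 comparisons, so going from 200
to 2000 connections makes every packet's lookup about ten times more
expensive, and adding RAM does not change that.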
> So - no, there are no "real numbers" for memory required
> but I suppose if you're low on memory it's easier to add it
> when your measurements show you're low than it is to pay for
> a load of memory you'll never use. It's your money, though. :)
Memory will sometimes fix the problem and sometimes it won't; when it
doesn't, adding memory is just flushing money away.
> Process creation and forking and all that stuff is an
> absolutely insignificant delay. We're talking microseconds. Maybe
> a millisecond or 2 if you actually need to page [most UNIX boxes
> keep spare pages for startup overhead just against this case].
> Which takes longer, fork() or doing a DNS query for
> www.blagh.foo.com, when it's not in your nameserver's cache
> and you send 2, 3 UDP packets over the backbone, with RTTs
> of 100ms or so apiece?
But the box only does a DNS query at the beginning of a connection.
The rest of the time it is pumping packets from one interface to
another through a user-level process.
What is the latency through your application relay? It is probably
between 30 and 60 milliseconds.
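
One way to see why that latency matters even after setup: a TCP
connection can have at most one window of unacknowledged data in
flight, so its throughput is bounded by roughly window / round-trip
time, and every millisecond the relay adds to the round trip lowers
that bound. The sketch below only does that arithmetic; the 4096-byte
window and 10 ms base RTT are assumed example values, and the 30 and
60 ms delays are the relay estimates given just above.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    WINDOW = 4096  # assumed receive window, bytes
    BASE_RTT = 0.010  # assumed round trip without the relay, seconds
    for added in (0.0, 0.030, 0.060):  # extra delay through the relay, seconds
        bps = max_throughput_bps(WINDOW, BASE_RTT + added)
        print(f"+{added * 1000:.0f} ms -> {bps / 1000:.0f} kbit/s max")
```

Under these assumptions the per-connection ceiling drops from about
3.3 Mbit/s to well under 1 Mbit/s once the relay delay is added.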
> The expense of launching a proxy pales to complete
> insignificance compared to the expense incurred when you
> do those DNS queries (which is not exactly a HUGE expense
> either, to be frank) and then the remote WWW server does a
> bunch of DNS queries to see who YOU are and all these packets
> go back and forth and if you're running that idiotic IDENT
> stuff you have to wait for that and...
Yes, this is true for opening a connection, but once the connection
is up you aren't doing idiotic IDENT stuff; you are just pushing
packets and incurring the overhead of context switches. How does this
scale to thousands of connections, and then to Ethernet or faster
speeds?
> This is one area that has proven to be fun. Large, busy
> firewalls are good at finding out what the boundaries of systems
> are. Usually setting MAXUSERS to something reasonable Just Works.
> If it doesn't, adding RAM and setting MAXUSERS to something a bit
> bigger always Just Works.
> Processor is not an issue, unless you've got
You have been fortunate in that most of the firewalls you seem to have
built sit on T1s, and the T1 is the gating factor. As soon as you
change that and try to fill a T3, the processor and the efficiency of
moving packets through the box ARE an issue.
> >I doubt that a pentium box or sun could adequately handle this, but
> >quite a bit depends on the nature of the connections.
> Have you tried it?
> If not, I respectfully suggest you do. A lot of us have and
> it's worked. That's just a data point to consider. :)
Actually, YES! Getting a Sun to handle 2000+ simultaneous connections
is difficult, and doing it at Ethernet speeds or better with an
application relay is unimaginable.
Are you saying that you have built an application relay that handles
2000+ connections?! And next, that you have also built it to support
Ethernet-to-Ethernet connectivity at Ethernet speeds?
> Yeah, LOTS AND LOTS of overhead. Context switches are
> measured in microseconds and packet RTTs are measured in
> milliseconds (often tens or hundreds)
Latency through an application relay is measured in milliseconds!
If all you want is to support a relatively small number of
simultaneous connections (under 200) over a T1 circuit, the bandwidth
will become a bottleneck well before the firewall does. But if you
are planning to scale to T3 or, heaven forbid, ATM and 2000+
connections, there is no way an application relay running on a 100MHz
Pentium will suffice, and in 6 months it still will not.
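
The bandwidth side of that argument is simple division. The sketch
assumes traffic is split evenly across connections (real traffic never
is) and that the relay reads every byte in once and writes it out once;
`per_connection_bps` and `relay_copy_bytes_per_sec` are hypothetical
helper names used only for this illustration.

```python
def per_connection_bps(link_bps: float, connections: int) -> float:
    """Average share of the link per connection, assuming an even split."""
    return link_bps / connections


def relay_copy_bytes_per_sec(link_bps: float) -> float:
    """Bytes/s moved through user space: each byte is read in once and written out once."""
    return 2 * link_bps / 8


if __name__ == "__main__":
    cases = [("T1", 1_544_000, 200), ("T3", 44_736_000, 2000)]
    for name, rate, conns in cases:
        share = per_connection_bps(rate, conns)
        copied = relay_copy_bytes_per_sec(rate)
        print(f"{name}, {conns} connections: {share / 1000:.1f} kbit/s each, "
              f"{copied / 1e6:.2f} MB/s through the relay")
```

On a T1 the whole circuit is only about 0.4 MB/s of relay traffic in
total, so the line saturates first; at T3 the relay must move roughly
29 times as much data while juggling ten times as many connections.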