October 23, 2006

Managing patch cables

A colleague recently asked if I had any recommendations for software to manage lists of patch cables and ports, and I think my answer might have surprised him...

There are a variety of Visio add-ons and standalone tools available, and while I've often found tools like that useful for planning initial installations, they haven't been so useful for ongoing maintenance. The problem is that whatever documentation you create gets out of date pretty quickly unless you're very disciplined about it, which almost nobody is...

Instead, I've found it most useful to simply follow good cable management practices:

  • Labelling both ends of all cables with a unique identifier, but not with what the cable is currently used for (because that will inevitably change, and the only thing worse than no label is an incorrect one).
  • Always taking the time to dress cables in neatly, rather than draping them haphazardly.
  • Removing cables when you disconnect one end, rather than just leaving them hanging.
  • Using easy-to-change physical cable management systems, like clips and velcro, rather than hard-to-change systems like cable ties, so that it's easy to "do it right" (see the two points above).
  • Using cables of just the right length, rather than too-long cables whose excess you then somehow have to manage (or too-short cables with a patch hidden somewhere inaccessible in the middle of the run). This means keeping a selection of cables available in various lengths, so that the right one is available when you need it.
  • Having and religiously following a color coding scheme.

As usual, Limoncelli and Hogan offer good advice on this topic in their indispensable book The Practice of System and Network Administration (chapter 17, especially sections 17.1.7 and 17.1.8).

Posted by Brent Chapman on Mon 23 Oct 2006 at 2:16 PM UTC-08:00
Permalink | Comments (2)

December 19, 2005

LISA, LOPSA, Interop, and Splunk

I'm back home today after two weeks on the road, and it's good to be home. The first week, I was in San Diego for the annual USENIX LISA conference (where LOPSA was a major topic of discussion), then I was home just long enough to do laundry and repack before heading out to New York City for the Interop conference and exhibition (where I was helping Splunk showcase their product in the InteropNet NOC).

LISA and LOPSA

At LISA, I participated in the Social Technologies and Advanced Technologies Workshops (small, day-long discussions among senior practitioners who are especially interested in a particular topic), gave an invited talk on Incident Command for IT: What We Can Learn from the Fire Department (Adobe Acrobat PDF version), and chaired a Birds of a Feather session on network automation.

Much of the "hallway track" discussion at LISA, of course, was about the implosion of SAGE and subsequent formation of LOPSA, the League of Professional System Administrators. I think that LOPSA has a chance to become a world-class organization, but the first year or so is going to be critical, and it isn't going to happen if everybody takes a "wait and see" attitude. What's going to matter most is money and membership numbers. To try to make sure LOPSA reaches critical mass, I've joined as a Platinum Individual Sponsor, and I encourage others to join/donate/volunteer as they are able.

Interop and Splunk

At Interop, I was working in the NOC for the InteropNet show network, helping Splunk showcase their product. Splunk's product is a really powerful troubleshooting tool for interactively working your way through huge volumes of any sort of text-based system logs (syslog, SNMP traps, whatever). It is designed as a tool for use by smart people who understand what the log messages mean, if only they could wade through the flood; the software doesn't try to understand the messages itself, just makes it easier for a system/network/security administrator to navigate through the flood of messages and find the needle in the haystack. There's a very powerful free version of their software (the paid version adds various features, but the free version is fully functional by itself) available from the Splunk web site; check it out!

Posted by Brent Chapman on Mon 19 Dec 2005 at 3:08 PM UTC-08:00
Permalink | Comments (0)

October 20, 2005

Network Automation BoF scheduled for USENIX/SAGE LISA conference, Dec 4-9, in San Diego

If you are going to be at the USENIX/SAGE LISA conference in San Diego in early December, I've scheduled a Network Automation BoF (Birds of a Feather session, where folks interested in a particular topic get together to chat about it) for Thursday night, 8 December 2005, 8:00-9:00pm (right after the conference reception). Right now, they've got us scheduled in Garden Salon 1, but that's subject to change, so check the scheduling board at the conference. I hope to see you there!

BoF info:

Automating Network Configuration & Management

Organizer/Moderator: Brent Chapman, Great Circle Associates
Thursday, 8 December 2005, 8:00 pm-9:00 pm, Garden Salon 1

What's the state of the art for automated network configuration and management? What systems and tools are available, either freely or commercially? Where are these issues being considered and discussed?

Over the last 15 years or so, much of the research in the system administration field has focused on automation. It's now well accepted that a well-run operation doesn't manage 10,000 servers individually, but rather uses tools like cfengine to manage definitions of those servers and then create instances of those servers as needed. In the networking world, though, most of us seem to be still manually configuring (and reconfiguring) every device.

Posted by Brent Chapman on Thu 20 Oct 2005 at 4:19 PM UTC-08:00
Permalink | Comments (0)

April 8, 2005

Network Automation BoF scheduled for USENIX conf next week in Anaheim

If you are going to be at the USENIX conference in Anaheim next week, I've scheduled a Network Automation BoF (Birds of a Feather session, where folks interested in a particular topic get together to chat about it) for Wednesday night, 8:30-9:30pm. Right now, they've got us scheduled in Salon H, but that's subject to change, so check the scheduling board at the conference. I hope to see you there!

BoF info:

Automating Network Configuration & Management

Organizer/Moderator: Brent Chapman, Great Circle Associates
Wednesday, April 13, 8:30 pm-9:30 pm, Salon H

What's the state of the art for automated network configuration and management? What systems and tools are available, either freely or commercially? Where are these issues being considered and discussed?

Over the last 15 years or so, much of the research in the system administration field has focused on automation. It's now well accepted that a well-run operation doesn't manage 10,000 servers individually, but rather uses tools like cfengine to manage definitions of those servers and then create instances of those servers as needed. In the networking world, though, most of us seem to be still manually configuring (and reconfiguring) every device.

Posted by Brent Chapman on Fri 08 Apr 2005 at 1:37 PM UTC-08:00
Permalink | Comments (0)

April 7, 2005

Network-Automation discussion mailing list created

I've created a Network-Automation mailing list for discussions of issues related to automating network configuration and management, including (but not limited to) methods, mechanisms, techniques, philosophies, policies, and products.

See the list's web page for more information, to view archives, or to subscribe:

http://www.greatcircle.com/network-automation

I look forward to some interesting discussions there, and I hope you'll join us!

-Brent

Posted by Brent Chapman on Thu 07 Apr 2005 at 2:44 PM UTC-08:00
Permalink | Comments (0)

March 17, 2005

Opsware announces Network Automation System 4.0

On 7 Mar 05, Opsware (the company formerly known as Loudcloud, which Marc Andreessen founded after he left Netscape) announced that its Opsware Network Automation System 4.0 would be available beginning 21 Mar 05.

Interesting tidbits from the press release:


  • "The Opsware Network Automation System is based on Rendition Networks' award-winning TrueControl product, which Opsware acquired in February 2005."
  • "Opsware NAS 4.0 moves beyond simple network change and configuration management to offer complete network automation including change automation, compliance management, process automation, security administration and reporting around all these operational activities."
  • "Opsware NAS 4.0 includes key automation capabilities ... such as the ability to automate processes that span different IT groups and systems and an advanced Compliance Center for compliance management. ... The Compliance Center includes automated auditing and reporting for Sarbanes-Oxley, ITIL, HIPAA and COBIT."

... more ...
Continue reading "Opsware announces Network Automation System 4.0"

Posted by Brent Chapman on Thu 17 Mar 2005 at 3:46 PM UTC-08:00
Permalink | Comments (0)

Introductory articles on network management from OPENXTRA

OPENXTRA is a UK VAR that offers a variety of network management and server room monitoring tools and information. They publish a set of related newsletters, including one on Network Management, although there don't appear to have been any issues published in the last few months (since December 2004).

I haven't read all the articles on their web site yet, but the ones I've looked at so far all seem fairly introductory and high-level; for example, An Introduction to Network Configuration Management is a good, very high-level overview of what network configuration management is and why it's useful, but it's fairly short on details. Regardless, I'm glad to see them making these articles available.

Posted by Brent Chapman on Thu 17 Mar 2005 at 2:50 PM UTC-08:00
Permalink | Comments (1)

March 11, 2005

The database is always right; don't fix what you don't understand

One of the concerns folks have about automated network management systems is that they'll become "automated network destruction systems" if things go wrong; in particular, it's a challenge to figure out what to do when the automation system discovers that the way something is currently configured doesn't match the way the system thinks it ought to be configured.

In a comment on another thread (Reluctance to trust automated network management tools), Kirby Files shares an interesting approach to fixing discrepancies found by automated systems (emphasis mine, and edited slightly to highlight Kirby's two key principles):

I agree that it's a bad thing(tm) to have automated tools "fixing" problems. In our home-grown configuration automation system, we take a different approach for service activation changes vs. auditing errors.

User-requested service activation add/modify/delete actions will identify the set of affected equipment from our service management database, dynamically create the configuration by combining templates with user- and datamodel-derived values, then deploy the changes on each piece of equipment, rolling back if one has an error.

By contrast, our nightly network auditing processes generate a list of reports of inconsistencies between the service management / network inventory database and network device configs. These reports do not in and of themselves cause changes to the network; an Ops user goes through them and decides whether to fix the database or update the network.

This follows from two personal principles of configuration management:


  • The database is always right
  • Don't fix what you don't understand

Under this process model, manual entry for service activation is avoided, but there's no automated "fixing" of unexpected configurations that might break the network.

--kirby
NMS Software Lead
Masergy Communications

I think that these are very powerful principles, good advice, and a good way to approach real-world deployments of automated systems. Thanks, Kirby!
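
To make the contrast between the two modes concrete, here's a minimal sketch in Python. It is not Kirby's system; the `FakeDevice` class, the `VLAN_TEMPLATE` line, and the dictionary standing in for the service management database are all hypothetical, invented purely for illustration. Activation renders a template and rolls back on error; the audit only reports, leaving the decision to a person.

```python
from string import Template

# Hypothetical one-line template; real templates would be far richer.
VLAN_TEMPLATE = Template("vlan $vlan_id name $customer")


class FakeDevice:
    """Stand-in for a router or switch: just holds a list of config lines."""

    def __init__(self, name, lines=(), fail_on_apply=False):
        self.name = name
        self.lines = list(lines)
        self.fail_on_apply = fail_on_apply  # simulate a device that rejects changes

    def apply(self, line):
        if self.fail_on_apply:
            raise RuntimeError(f"{self.name}: rejected {line!r}")
        self.lines.append(line)

    def remove(self, line):
        self.lines.remove(line)


def activate(devices, values):
    """User-requested change: render the template, deploy, roll back on error."""
    line = VLAN_TEMPLATE.substitute(values)
    done = []
    try:
        for dev in devices:
            dev.apply(line)
            done.append(dev)
    except RuntimeError:
        # Undo the change on every device that already accepted it.
        for dev in reversed(done):
            dev.remove(line)
        return False
    return True


def audit(database, devices):
    """Nightly audit: report mismatches, but never change a device."""
    report = []
    for dev in devices:
        expected = set(database.get(dev.name, []))
        actual = set(dev.lines)
        for line in sorted(expected - actual):
            report.append((dev.name, "missing on device", line))
        for line in sorted(actual - expected):
            report.append((dev.name, "not in database", line))
    return report
```

Note that `audit` returns a list for an operator to read and has no code path that writes to a device, which is one way to honor "don't fix what you don't understand".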

Posted by Brent Chapman on Fri 11 Mar 2005 at 1:49 PM UTC-08:00
Permalink | Comments (3)

Everybody wants to be a hero

In a comment on another thread (Reluctance to trust automated network management tools), Landon Noll makes some very astute observations about how management can inadvertently strengthen and perpetuate a culture of manual (as opposed to automated) network administration by rewarding "network heroes" (emphasis mine):

Reluctance to trust automated network management tools can also be rooted in the way management encourages heroism.

I have seen clients where their network was maintained on a completely ad hoc / by hand basis. Audits revealed many mistakes and inconsistencies in their network setup. The network admins said "too busy" keeping their network running to automate. When a problem arose, the network admins performed heroic duty to bring the network back from disaster. Management was too grateful for service restoration to ask about the root cause. Management would praise the "skill and dedication" of their network staff instead of being critical of the way their network was managed.

...

... There is a strong desire on behalf of these so-called "network admin heroes" to have a direct personal control over the company's network assets. They feel they need this direct control so that when they are called on, they can perform a heroic rescue and reap their reward.

Network heroes fear that network automation will reduce their level of control. They fear that when an automated network breaks, they won't be able to fulfill the role of network hero. This ad hoc non-automated condition is likely to remain unless some external pressure (i.e., merger/acquisition, major security breach, regulatory compliance) forces things to change.

Excellent observation. I've seen this myself, and even unwittingly indulged in it, both as a "hero" (saving the day, and reaping the rewards) and as a manager (rewarding folks for being a hero rather than asking the hard questions about why the situation reached the point where heroics were necessary).

To counter this, obviously, management needs to ask those hard questions, and figure out a way to reward folks for preventing problems (by automation, for example) as well as "heroically" responding to them. We've got to ask questions like:

  • Why were heroic measures necessary in this circumstance?
  • What could we have done to prevent this situation, so that such heroics wouldn't have been necessary?
  • Are the folks who do good, solid work on preventing problems getting properly recognized for their work? Or are we inadvertently creating an incentive to let problems fester until heroic measures are required (and rewarded)?

Posted by Brent Chapman on Fri 11 Mar 2005 at 1:28 PM UTC-08:00
Permalink | Comments (2)

March 10, 2005

New paper: Rigorous Automated Network Security Management

Steve Lodin of Roche Diagnostics North America was kind enough to tell me about a newly published paper in the Feb 2005 issue of the International Journal of Information Security entitled "Rigorous Automated Network Security Management", by Joshua D. Guttman and Amy L. Herzog of The MITRE Corporation.

The paper's abstract:

Achieving a security goal in a networked system requires the cooperation of a variety of devices, each device potentially requiring a different configuration. Many information security problems may be solved with appropriate models of these devices and their interactions, and giving a systematic way to handle the complexity of real situations.

We present an approach, rigorous automated network security management, which front-loads formal modeling and analysis before problemsolving, thereby providing easy-to-run tools with rigorously justified results. With this approach, we model the network and a class of practically important security goals. The models derived suggest algorithms which, given system configuration information, determine the security goals satisfied by the system. The modeling provides rigorous justification for the algorithms, which may then be implemented as ordinary computer programs requiring no formal methods training to operate.

We have applied this approach to several problems. In this paper we describe two: distributed packet filtering and the use of IP security (IPsec) gateways. We also describe how to piece together the two separate solutions to these problems, jointly enforcing packet filtering as well as IPsec authentication and confidentiality on a single network.

Posted by Brent Chapman on Thu 10 Mar 2005 at 6:04 PM UTC-08:00
Permalink | Comments (0)

Infrastructures.ORG

Steve Traugott at Infrastructures.ORG says:

Most IT organizations still install and maintain computers the same way the automotive industry built cars in the early 1900's: An individual craftsman manually manipulates a machine into being, and manually maintains it afterward. This is expensive. The automotive industry discovered first mass production, then mass customization using standard tooling.

Indeed... Most network devices are still configured by hand and manually maintained, with all of the attendant problems (typos, inconsistency of configuration, difficulty making common changes to many systems in parallel, etc.). I'm very interested in taking the same principles that Steve has been codifying and espousing for systems, and applying them to networks.

For the last several years, Steve has been driving this effort, including creating and hosting the Infrastructures mailing list. Their goal is to develop and discuss the

... standards and practices [that] are the standardized tooling needed for mass customization within IT. This tooling enables:
  • Scalable, flexible, and rapid deployments and changes
  • Cost effective, timely return on IT investment
  • Low labor headcount
  • Secure, trustworthy computing environments
  • Reliable enterprise infrastructures

Posted by Brent Chapman on Thu 10 Mar 2005 at 5:14 PM UTC-08:00
Permalink | Comments (1)

March 7, 2005

Uplogix Envoy network management appliance

Uplogix offers a product named the Envoy, which is a device to help automate management of network devices such as routers and switches. You attach Envoy units to the serial consoles of your network devices (each Envoy can manage up to 4 devices), and use in-band or out-of-band access to manage those devices through the Envoy.

... more ...
Continue reading "Uplogix Envoy network management appliance"

Posted by Brent Chapman on Mon 07 Mar 2005 at 1:44 PM UTC-08:00
Permalink | Comments (2)

Reluctance to trust automated network management tools

From a discussion today with someone who wishes to remain anonymous (emphasis mine):

I think you'll find most of these [network management tools] are sort of a RANCID outgrowth - config monitoring systems + other functions which differ between all the vendors, although there is growth towards an approach of establishing a baseline and then creating and enforcing compliance rules/templates across the network. I think we're a bit cautious of using software written by someone else that writes to a device (all of the [network management tools we were discussing] do, but those functions aren't widely used), opting instead for tell me what's different and I'll change it myself. As more of these tools become well known and stable, and with more people using automated provisioning tools which do network device writes, that attitude will gradually ease off. But I believe many people are a bit scared of auto-enforcing features when it comes to routers/switches/etc., and maybe that explains a bit of what's lacking in comparison to sysadmin tools.

I agree with this assessment, but personally, I'm more worried about somebody fat-fingering a manual configuration. Another concern is that the configurations are just getting too complex to maintain manually, particularly things like packet filtering ACLs, BGP policy statements, and so forth. In a lot of ways, it's like the old arguments about programming in assembly language versus higher-level languages.
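
The "tell me what's different and I'll change it myself" approach can be sketched in a few lines of Python using the standard `difflib` module. This is only an illustration, not how any of the tools discussed actually work; the `config_diff` helper and the sample config lines are made up for the example.

```python
import difflib


def config_diff(baseline, running):
    """Return unified-diff lines showing how `running` departs from `baseline`."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            running.splitlines(),
            fromfile="baseline",
            tofile="running",
            lineterm="",
        )
    )


# Illustrative configs only; someone has added a telnet permit by hand.
baseline = """hostname edge1
access-list 101 permit tcp any any eq 22
access-list 101 deny ip any any"""

running = """hostname edge1
access-list 101 permit tcp any any eq 22
access-list 101 permit tcp any any eq 23
access-list 101 deny ip any any"""

for line in config_diff(baseline, running):
    print(line)
```

The tool only reads and reports; a human still decides whether the baseline or the device is wrong, and makes the change.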

Posted by Brent Chapman on Mon 07 Mar 2005 at 1:29 PM UTC-08:00
Permalink | Comments (4)

Network World review of configuration tools

Network World Fusion did a review of network configuration tools back in April 2004.

Their choice for the best product evaluated was Rendition's TrueControl.

Elsewhere on their web site, they also have a more up-to-date list (but not review) of configuration management products.

... more ...
Continue reading "Network World review of configuration tools"

Posted by Brent Chapman on Mon 07 Mar 2005 at 12:44 PM UTC-08:00
Permalink | Comments (1)

March 4, 2005

IETF Network Configuration Working Group (NETCONF)

IETF has chartered a Network Configuration Working Group (NETCONF) to "produce a protocol for network configuration". (More details in the full blog entry; click the "Continue reading..." link below.)

Their focus seems to be on defining a protocol to supplant SNMP (a worthy goal, in my opinion; SNMP has proven largely unworkable for network configuration, although it has been useful for network monitoring), but they're intentionally punting on the underlying data model to use to describe how the network ought to be configured (which I think is at least as challenging a problem).

... more ...
Continue reading "IETF Network Configuration Working Group (NETCONF)"

Posted by Brent Chapman on Fri 04 Mar 2005 at 9:43 AM UTC-08:00
Permalink | Comments (0)

What's the state of the art for Network Automation?

I posted a message to the NANOG mailing list earlier this morning, hoping to stimulate discussion:


Date: Fri, 4 Mar 2005 09:15:19 -0800
To: nanog@merit.edu
From: Brent Chapman <Brent@GreatCircle.COM>
Subject: Network automation?

What's the state of the art for automated network configuration and management? What systems and tools are available, either freely or commercially? Where are these issues being considered and discussed?

I'm not simply talking about network status monitoring systems like HP OpenView, or device configuration monitoring systems like RANCID, although those are certainly useful. Instead, I'm talking about systems that will start from a description of how a network ought to be configured, and then interact with the various devices on that network to make it so; something like cfengine for network devices.

Over the last 15 years or so, much of the research in the system administration field has focused on automation. It's now well accepted that a well-run operation doesn't manage 10,000 servers individually, but rather uses tools like cfengine to manage definitions of those servers and then create instances of those servers as needed. In the networking world, though, most of us seem to be still manually configuring (and reconfiguring) every device.

... more ...
Continue reading "What's the state of the art for Network Automation?"

Posted by Brent Chapman on Fri 04 Mar 2005 at 9:38 AM UTC-08:00
Permalink | Comments (0)