Reluctance to trust automated network management tools


From a discussion today with someone who wishes to remain anonymous (emphasis mine):

I think you'll find most of these [network management tools] are sort of a RANCID outgrowth: config monitoring systems plus other functions that differ from vendor to vendor, although there is a trend toward establishing a baseline and then creating and enforcing compliance rules/templates across the network. I think we're a bit cautious of using software written by someone else that writes to a device (all of the [network management tools we were discussing] do, but those functions aren't widely used), opting instead for "tell me what's different and I'll change it myself." As more of these tools become well known and stable, and as more people use automated provisioning tools that write to network devices, that attitude will gradually ease off. But I believe many people are a bit scared of auto-enforcing features when it comes to routers/switches/etc., and maybe that explains some of what's lacking in comparison to sysadmin tools.
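The "tell me what's different" mode is easy to picture. Here is a minimal sketch, assuming the baseline and running configs are already available as text (for example, from a RANCID-style archive and a fresh pull from the device). The function name `config_drift` and the sample configs are hypothetical; the comparison uses Python's standard `difflib` module and only reports, never writes to the device.

```python
import difflib


def config_drift(baseline: str, running: str, device: str) -> list[str]:
    """Report how a device's running config differs from its baseline.

    Read-only: returns a unified diff and never touches the device.
    An empty list means the running config matches the baseline.
    """
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            running.splitlines(),
            fromfile=f"{device} (baseline)",
            tofile=f"{device} (running)",
            lineterm="",
        )
    )


# Hypothetical example configs for illustration only.
baseline = "hostname edge1\nntp server 192.0.2.1\nlogging host 192.0.2.10"
running = "hostname edge1\nntp server 192.0.2.1\nlogging host 192.0.2.99"

for line in config_drift(baseline, running, "edge1"):
    print(line)
```

The human still decides what to do with each `-`/`+` pair, which is exactly the comfort zone the quote describes.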

I agree with this assessment, but personally, I'm more worried about somebody fat-fingering a manual configuration. Another concern is that configurations are simply getting too complex to maintain manually, particularly things like packet filtering ACLs, BGP policy statements, and so forth. In a lot of ways, it's like the old arguments about programming in assembly language versus higher-level languages.
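To make the assembly-versus-higher-level analogy concrete, here is a minimal sketch that expands a compact, hypothetical policy ("which services does each server expose?") into IOS-style numbered extended ACL lines. The `Rule` type, `render_acl` function, policy format, and ACL number 110 are all assumptions for illustration, not any particular tool's design. The point is that a person edits the short policy, and the long, error-prone ACL is generated from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    proto: str  # "tcp" or "udp"
    dest: str   # destination host address
    port: int   # destination port


def render_acl(acl_number: int, rules: list[Rule]) -> list[str]:
    """Expand high-level rules into IOS-style numbered extended ACL lines,
    ending with an explicit logged deny."""
    lines = [
        f"access-list {acl_number} permit {r.proto} any host {r.dest} eq {r.port}"
        for r in rules
    ]
    lines.append(f"access-list {acl_number} deny ip any any log")
    return lines


# Hypothetical policy: which services each server exposes.
policy = {
    "192.0.2.10": [("tcp", 80), ("tcp", 443)],
    "192.0.2.53": [("udp", 53), ("tcp", 53)],
}
rules = [Rule(proto, host, port) for host, services in policy.items() for proto, port in services]

for line in render_acl(110, rules):
    print(line)
```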

4 Comments

Yep, at some level of scale manual configuration becomes a nightmare. Depending on how much time staff have to cultivate configurations, that level may be quite small (say, 50 devices).

I have experienced that "fear" of autoconfiguration tools. Network engineers and architects alike often dislike them because of the problems they create, which stem from the tools being stateless while generating, say, an IOS configuration inherently requires state...

Thus, purely declarative solutions (e.g., building config templates with m4) become less useful as the running configuration diverges from the templates, for example when the NOC fixes a problem on the device and doesn't update the configuration system. RtConfig tends to work a little better, because it can work within the existing configured state: it reconfigures only the bits it needs to and builds new configuration where required. Obviously, these systems cannot deal with what needs to be removed, so they will only ever be incremental solutions.
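The limitation of incremental tools can be shown in a few lines. This is a minimal sketch, not how RtConfig actually works: it treats a configuration as a flat set of lines (a simplifying assumption, since real IOS configs are hierarchical and partly order-sensitive), and the function names and route data are hypothetical. An incremental push computes only what is missing; a line added by the NOC outside the configuration system simply stays behind.

```python
def incremental_changes(desired: set[str], existing: set[str]) -> list[str]:
    """Lines to push: only what is desired but not yet configured.

    Like the incremental tools described above, this never proposes
    removals, so anything on the device that is no longer wanted stays.
    """
    return sorted(desired - existing)


def left_behind(desired: set[str], existing: set[str]) -> list[str]:
    """Lines an incremental push silently leaves on the device."""
    return sorted(existing - desired)


# Hypothetical example data; configs are simplified to flat sets of lines.
existing = {
    "hostname edge1",
    "ip route 10.1.0.0 255.255.0.0 192.0.2.1",
    "ip route 10.9.0.0 255.255.0.0 192.0.2.254",  # manual NOC fix, never recorded
}
desired = {
    "hostname edge1",
    "ip route 10.1.0.0 255.255.0.0 192.0.2.1",
    "ip route 10.2.0.0 255.255.0.0 192.0.2.1",
}

print("push:", incremental_changes(desired, existing))
print("left behind:", left_behind(desired, existing))
```

Computing `left_behind` is easy; deciding safely whether each of those lines should really be removed is the hard part the comment alludes to.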

Alva Couch and Paul Anderson have a great presentation about the uses of both declarative and 'proscriptive' configuration models in system administration, and it's well worth the time.

http://www.cs.duke.edu/csl/usenix/04lisa/tech/talks/couch.ppt

An important point from this presentation is that you need to evaluate what's suitable for you now. The two presenters approach the same problem at different points on the curve, and thus with different approaches: one has a distinct cost/complexity advantage, and the other has advantages in other areas. Iterative approaches work well for a variety of tasks, such as applying ACLs or building customer interfaces (this works best on networks that require only one or two interfaces configured per service), but eventually the configuration will diverge, and how you manage that divergence becomes a major issue. This is the problem people run into when deploying tools like cfengine, which is declarative: you have to turn the model into the _things to implement_ yourself.

Reluctance to trust automated network management tools can also be rooted in the way management encourages heroism.

I have seen clients whose networks were maintained on a completely ad hoc, by-hand basis. Audits revealed many mistakes and inconsistencies in their network setup. The network admins said they were "too busy" keeping their network running to automate. When a problem arose, the network admins performed heroic duty to bring the network back from disaster. Management was too grateful for service restoration to ask about the root cause, and would praise the "skill and dedication" of their network staff instead of being critical of the way the network was managed.

The condition of "heroic ad hoc network management" is often revealed by a merger or acquisition of network assets. A gap analysis (of the issues, problems, and concerns relating to merging the networks) shows a lack of consistency in the managed network assets.

"Heroic ad hoc network management" is also revealed when a major security breach occurs and the damage rises beyond the level that heroic efforts can solve.

Finally, "heroic ad hoc network management" is exposed by audits. Often these audits are driven by regulations or legal compliance requirements, or are conducted by major customers or partners.

Returning to the original point: these so-called "network admin heroes" have a strong desire for direct personal control over the company's network assets. They feel they need this direct control so that when they are called on, they can perform a heroic rescue and reap their reward.

Network heroes fear that network automation will reduce their level of control. They fear that when an automated network breaks, they won't be able to fulfill the role of network hero. This ad hoc, non-automated condition is likely to persist unless some external pressure (e.g., a merger/acquisition, a major security breach, or regulatory compliance) forces things to change.

I agree that it's a bad thing(tm) to have automated tools "fixing" problems. In our home-grown configuration automation system, we take a different approach for service activation changes vs. auditing errors.

User-requested service activation add/modify/delete actions will identify the set of affected equipment from our service management database, dynamically create the configuration by combining templates with user- and datamodel-derived values, then deploy the changes on each piece of equipment, rolling back if one has an error.
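The activation flow described here (render templates with per-device values, deploy to each affected device, roll back on error) can be sketched roughly as follows. This is not the actual Masergy system: `FakeDevice`, `activate`, the template text, and the sample values are hypothetical, the device is an in-memory stand-in, and "rolling back" is interpreted as restoring every device already changed to its pre-change snapshot. A real implementation would use the device's own commit/rollback mechanism.

```python
from string import Template

# Hypothetical per-service template; a real system would choose templates
# based on the service type recorded in the service management database.
TEMPLATE = Template(
    "interface $ifname\n description $customer\n ip address $addr $mask"
)


class FakeDevice:
    """In-memory stand-in for a device session (hypothetical, for illustration)."""

    def __init__(self, name: str, fail: bool = False):
        self.name = name
        self.fail = fail
        self.config: list[str] = []

    def apply(self, lines: list[str]) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name}: change rejected")
        self.config.extend(lines)

    def restore(self, snapshot: list[str]) -> None:
        self.config = list(snapshot)


def activate(devices: list[FakeDevice], values: dict[str, dict[str, str]]) -> None:
    """Render and deploy a change to every affected device; if any device
    fails, restore every device already changed, then re-raise."""
    touched: list[tuple[FakeDevice, list[str]]] = []
    try:
        for dev in devices:
            lines = TEMPLATE.substitute(values[dev.name]).splitlines()
            snapshot = list(dev.config)
            dev.apply(lines)
            touched.append((dev, snapshot))
    except Exception:
        for dev, snapshot in reversed(touched):
            dev.restore(snapshot)
        raise


pe1, pe2 = FakeDevice("pe1"), FakeDevice("pe2", fail=True)
values = {
    "pe1": {"ifname": "GigabitEthernet0/1", "customer": "Example Corp",
            "addr": "192.0.2.1", "mask": "255.255.255.252"},
    "pe2": {"ifname": "GigabitEthernet0/2", "customer": "Example Corp",
            "addr": "192.0.2.5", "mask": "255.255.255.252"},
}
try:
    activate([pe1, pe2], values)
except RuntimeError as err:
    print("rolled back:", err)
print(pe1.config)  # empty again after the rollback
```

Treating the whole activation as all-or-nothing keeps a partially provisioned service from lingering on some devices but not others.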

By contrast, our nightly network auditing processes generate reports of inconsistencies between the service management/network inventory database and the network device configs. These reports do not in and of themselves cause changes to the network; an Ops user goes through them and decides whether to fix the database or update the network.
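A report-only audit of this kind might look like the minimal sketch below. The assumptions are mine, not the commenter's: the database is reduced to hypothetical (interface, VLAN) records per device, the device config is IOS-style text with `encapsulation dot1Q` subinterfaces, and `Finding`, `parse_vlans`, and `audit` are illustrative names. The audit returns findings in both directions and changes nothing, leaving the "fix the database or fix the network" decision to a person.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    device: str
    kind: str    # "in_db_not_on_device" or "on_device_not_in_db"
    detail: str


def parse_vlans(config: str) -> set[tuple[str, int]]:
    """Extract (interface, VLAN) pairs from IOS-style subinterface stanzas."""
    pairs: set[tuple[str, int]] = set()
    current = None
    for line in config.splitlines():
        m = re.match(r"interface (\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"\s+encapsulation dot1Q (\d+)", line)
        if m and current:
            pairs.add((current, int(m.group(1))))
    return pairs


def audit(device: str, db_rows: set[tuple[str, int]], config: str) -> list[Finding]:
    """Compare database records with a device config and report mismatches.
    Produces findings for a human to review; never changes the device or database."""
    on_device = parse_vlans(config)
    findings = [
        Finding(device, "in_db_not_on_device", f"{ifname} vlan {vlan}")
        for ifname, vlan in sorted(db_rows - on_device)
    ]
    findings += [
        Finding(device, "on_device_not_in_db", f"{ifname} vlan {vlan}")
        for ifname, vlan in sorted(on_device - db_rows)
    ]
    return findings


# Hypothetical data: one provisioned service is missing from the device,
# and an unrecorded one has appeared on it.
db_rows = {("GigabitEthernet0/1.100", 100), ("GigabitEthernet0/1.200", 200)}
config = """interface GigabitEthernet0/1.100
 encapsulation dot1Q 100
!
interface GigabitEthernet0/1.300
 encapsulation dot1Q 300
"""
for finding in audit("pe1", db_rows, config):
    print(finding)
```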

This follows from two personal principles of configuration management: "The database is always right," and "Don't fix what you don't understand." Under this process model, manual entry for service activation is avoided, but there's no automated "fixing" of unexpected configurations that might break the network.

--kirby
NMS Software Lead
Masergy Communications

An engineer that does not push to automate administrative tasks is not an engineer, he is an administrator with an engineer's title.


This page contains a single entry by Brent Chapman published on March 7, 2005 1:29 PM.


This weblog is licensed under a Creative Commons License.