Everybody wants to be a hero

In a comment on another thread (Reluctance to trust automated network management tools), Landon Noll makes some very astute observations about how management can inadvertently strengthen and perpetuate a culture of manual (as opposed to automated) network administration by rewarding "network heroes":

Reluctance to trust automated network management tools can also be rooted in the way management encourages heroism.

I have seen clients whose network was maintained on a completely ad hoc, by-hand basis. Audits revealed many mistakes and inconsistencies in their network setup. The network admins said they were "too busy" keeping their network running to automate. When a problem arose, the network admins performed heroic duty to bring the network back from disaster. Management was too grateful for service restoration to ask about the root cause. Management would praise the "skill and dedication" of their network staff instead of being critical of the way their network was managed.

...

... There is a strong desire on the part of these so-called "network admin heroes" to have direct personal control over the company's network assets. They feel they need this direct control so that when they are called on, they can perform a heroic rescue and reap their reward.

Network heroes fear that network automation will reduce their level of control. They fear that when an automated network breaks, they won't be able to fulfill the role of network hero. This ad hoc, non-automated condition is likely to remain unless some external pressure (e.g., merger/acquisition, major security breach, regulatory compliance) forces things to change.

Excellent observation. I've seen this, and even unwittingly indulged in it myself, both as a "hero" (saving the day and reaping the rewards) and as a manager (rewarding folks for being heroes rather than asking the hard questions about why the situation reached the point where heroics were necessary).

To counter this, management needs to ask those hard questions and figure out a way to reward folks for preventing problems (by automation, for example) as well as for "heroically" responding to them. We've got to ask questions like:

  • Why were heroic measures necessary in this circumstance?
  • What could we have done to prevent this situation, so that such heroics wouldn't have been necessary?
  • Are the folks who do good, solid work on preventing problems getting properly recognized for their work? Or are we inadvertently creating an incentive to let problems fester until heroic measures are required (and rewarded)?

2 Comments

I definitely agree that there's a real tendency for Ops and Eng folks to want full control over, and praise for, events in the network.

However, we have not found that there's any less opportunity for heroism, despite day-to-day changes being automated by our configuration management systems.

Vendors' equipment still breaks in unpredictable ways; architectural leaps involving massive config changes still need to be made; and equipment upgrades still happen. Resolving one class of really embarrassing issue (manual misconfiguration) has only served to make those people look even better (and much more efficient; our Service Activation staff is held in considerable awe for how much they accomplish with so few people).

--kirby
NMS Software Lead
Masergy Communications

It is one-sided to look at the firefighter syndrome as the only limiter to applying new tools. Outside of system administration, automated tools can cause havoc as well.

Automated tools that link to the core knowledge of the system administrator, recommend instead of implement, and have obvious or well-understood algorithms are likely to be useful. By contrast, a whole class of tools encapsulates hidden models, uncontrollable updates, and opaque actions.

Microsoft Windows is a poster child for this approach, causing strange and sweeping changes when some files are corrupted. Websites are dedicated to providing hints about what certain processes profess to accomplish, and why they might not. System administrators and simple users alike are often reduced to twiddling strategic knobs and reading support forums.

I once tested a beta product aimed at SOHO customers from a company many of us lost money in. It functioned as a DHCP server, gateway, and firewall, and attempted to be a black-box automated network manager. When I turned it on, it didn't work. I twiddled some knobs. There were no diagnostics showing any state, such as "I can ping the outside world" or "A client has connected to me." After I gave up, I pulled out one of the network cards and had both the President and CTO waste half an hour being heroes in the BSD shell figuring it out. I never managed to persuade them that the tool had to link to the knowledge of its audience.

Would you use a network automation tool that could shoot you in the foot?

This page contains a single entry by Brent Chapman published on March 11, 2005 1:28 PM.
This weblog is licensed under a Creative Commons License.