One of the concerns folks have about automated network management systems is that they'll become "automated network destruction systems" if things go wrong; in particular, it's hard to decide what to do when the automation system discovers that a device's current configuration doesn't match what the system thinks it should be.
In a comment on another thread (Reluctance to trust automated network management tools), Kirby Files shares an interesting approach to handling discrepancies found by automated systems (emphasis mine, and edited slightly to highlight Kirby's two key principles):
I agree that it's a bad thing(tm) to have automated tools "fixing" problems. In our home-grown configuration automation system, we take a different approach for service activation changes vs. auditing errors.
User-requested service activation add/modify/delete actions will identify the set of affected equipment from our service management database, dynamically create the configuration by combining templates with user- and datamodel-derived values, then deploy the changes on each piece of equipment, rolling back if one has an error.
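As a rough illustration of that deploy-then-roll-back pattern, here's a minimal Python sketch. Everything in it (the `Device` class, `render_config`, `deploy_service`) is a hypothetical stand-in, not Kirby's actual system:

```python
# Hypothetical sketch of a service-activation workflow with rollback.
# All names here are illustrative, not from any real NMS.

from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    config: dict = field(default_factory=dict)


def render_config(template: dict, values: dict) -> dict:
    # Combine a config template with user- and datamodel-derived values.
    return {k: (v.format(**values) if isinstance(v, str) else v)
            for k, v in template.items()}


def deploy_service(devices, template, values):
    """Apply the rendered config to each device; roll back all on any error."""
    new_config = render_config(template, values)
    applied = []  # (device, previous config) pairs, for rollback
    try:
        for dev in devices:
            previous = dict(dev.config)
            dev.config.update(new_config)  # would hit the real device here
            applied.append((dev, previous))
    except Exception:
        # One device failed: restore every device already touched, then re-raise.
        for dev, previous in reversed(applied):
            dev.config = previous
        raise
```

The key design choice is that a failure on any one device undoes the change everywhere, so the network never ends up with a half-deployed service.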
By contrast, our nightly network auditing processes generate a list of reports of inconsistencies between the service management / network inventory database and network device configs. These reports do not in and of themselves cause changes to the network; an Ops user goes through them and decides whether to fix the database or update the network.
This follows from two personal principles of configuration management:
- The database is always right
- Don't fix what you don't understand
Under this process model, manual entry for service activation is avoided, but there's no automated "fixing" of unexpected configurations that might break the network.
NMS Software Lead
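The nightly audit Kirby describes is report-only: it diffs the database against device configs and hands the results to a human. A minimal sketch of that kind of diff might look like the following (the function and field names are hypothetical, not from Kirby's system):

```python
# Hypothetical sketch of a report-only audit: compare the service
# database against device configs and emit discrepancy reports.
# Crucially, this function changes nothing -- an operator reads the
# reports and decides whether to fix the database or the network.


def audit(database: dict, network: dict) -> list:
    """Return a list of discrepancies between database and network configs."""
    reports = []
    for device, expected in database.items():
        actual = network.get(device, {})
        for key in sorted(set(expected) | set(actual)):
            if expected.get(key) != actual.get(key):
                reports.append({
                    "device": device,
                    "item": key,
                    "database": expected.get(key),
                    "network": actual.get(key),
                })
    return reports
```

Following Kirby's second principle, nothing in the audit path has write access to the network; "don't fix what you don't understand" is enforced by construction.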
I think that these are very powerful principles, good advice, and a good way to approach real-world deployments of automated systems. Thanks, Kirby!