Overviews of incident/outage management principles and practices


I recently went looking for introductory material on incident and outage management principles and practices, to help an ASP-like client of mine educate some members of their management team who are pretty sharp in their own fields but new to Operations and IT management. I'm not talking about security incidents in particular, though those are certainly one type; I'm talking about the more general incidents and outages that a service provider (particularly an ASP) might run into: network outages, hardware failures, system overloads, cooling/power failures, software meltdowns, database debacles, etc.

I found a couple of interesting papers...

First, in the June 2005 issue of the USENIX magazine ;login:, there's an article called "When Disaster Strikes: Cailin and Roland Discuss Crisis Management", by Thomas Sluyter and Roland van Maarschalkerweerd (USENIX membership is required to download the article PDF from the web site). It's a good, high-level, 4-page intro to crisis management, and does a pretty decent job of providing an overview of a workable incident management process.

Second, I found a free white paper written by INS called "A Framework for Incident and Problem Management", by Victor Kapella, April 2003. This is longer (20 pages), more formal, more abstract, and more management-oriented, but very useful and very interesting. One thing to keep in mind when reading this paper is that it draws a distinction (as does ITIL, so I understand) between "incidents" (single occurrences) and "problems" (ongoing sets of individual issues which have a common cause). This paper does a particularly good job of talking about the ways in which Operations, Engineering, and other functions within an organization need to work together to resolve incidents and problems.

3 Comments

I just found your blog entry through Google and I must say that I'm flattered by your review of our article :) I'm happy that it was of use to you!

Oops! Something I forgot to mention in my previous comment (just thought of it while cooking dinner): our article can also be downloaded from my website for free!

Granted, it doesn't have the fancy-schmancy layout of the ;login: version, but it's the same deal (just laid out to match the theme of my website).

You can find it in the Sysadmin section of my website, at http://cailin.tweakdsl.nl.

Should anyone care :) The website where the free version of this document can be found has moved to http://www.kilala.nl.


This page contains a single entry by Brent Chapman published on July 26, 2005 10:28 AM.
