MOM 2005 and ITIL – Part 1

ITIL (IT Infrastructure Library, http://www.itil.co.uk/) was developed by the UK government as a set of best practices for IT service management. It is now recognised internationally and used in both the public and private sectors. MOF (Microsoft Operations Framework, http://www.microsoft.com/technet/itsolutions/cits/mo/mof/default.mspx) is based on ITIL, with a more specific focus and recommendations for Microsoft products. As demand grows for IT organisations to be cost-efficient and effective, the use of ITIL and MOF is growing too.

MOM can be used to help with ITIL processes. No piece of software is ITIL certified (only people can be ITIL certified), but there is no doubt that MOM can support those processes. ITIL can be a big, and sometimes daunting, project for an organisation to embark on: where do they start? I have also seen organisations put in MOM and then ask what to do with it. Using MOM and ITIL together helps with both problems: MOM gives a concrete place to start an ITIL project, and ITIL focuses the reasons why MOM is being used and how it should be configured.

The four main areas where MOM can help are:

  • Incident Management
  • Problem Management
  • Service Level Management
  • Capacity Management

Incident Management  

The goal of Incident Management is to restore normal service operation as quickly as possible with minimum disruption to the business, in order to ensure that the best achievable levels of availability and service are maintained. 

The Incident Management life cycle is  

  • Incident detecting and recording.
  • Initial classification & support.
  • Investigation & diagnosis.
  • Resolution & recovery.
  • Incident Closure.

Incident Detecting and Recording 

This is pretty much what MOM does out of the box. For most alerts, an alert equates to an incident; informational alerts obviously do not. Focusing MOM on getting alerts to equal incidents gives a target for alert tuning. A MOM alert carries the date, time, server, a severity level (warning, error, critical error, etc.) and Product Knowledge to help with investigation and diagnosis, so incident detecting and recording is done automatically. MOM is also proactive: it looks for events that may affect the IT systems and spots issues, such as a disk running out of space, before they become incidents.
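As a rough illustration of the "alert equals incident" target, the sketch below models an alert with the fields listed above and filters out informational alerts. This is not MOM's API: the `Alert` class, the severity strings and `is_incident_candidate` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical severity names, loosely following the levels mentioned above.
INCIDENT_SEVERITIES = {"warning", "error", "critical error"}


@dataclass
class Alert:
    raised_at: datetime   # date and time the alert was raised
    server: str           # server the alert came from
    severity: str         # e.g. "Information", "Warning", "Critical Error"
    description: str


def is_incident_candidate(alert: Alert) -> bool:
    """Informational alerts are not incidents; other levels are candidates."""
    return alert.severity.lower() in INCIDENT_SEVERITIES


alerts = [
    Alert(datetime(2005, 6, 1, 9, 0), "EXCH01", "Information", "Service started"),
    Alert(datetime(2005, 6, 1, 9, 5), "DC01", "Critical Error", "Disk space low"),
]

# Only the disk space alert is kept as an incident candidate.
candidates = [a for a in alerts if is_incident_candidate(a)]
```

Alert tuning, in these terms, is the work of making sure that whatever passes the filter really does deserve an incident record.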

Initial Classification & Support 

Initial classification & support involves the following steps:

  • Classify incidents
  • Match against known errors
  • Assign impact and urgency
  • Provide initial support
  • Close or route to a specialist support group

You can assign a Resolution State and an Owner to an alert. If you use a helpdesk or service desk package, you can also forward the alert to it, manually or automatically, using the MOM Connector Framework (MCF) or a third-party tool. This creates the incident in the service desk software and keeps changes in sync between the two packages. Some organisations prefer this because all incidents are recorded in one package for analysis, regardless of whether the incident comes from a MOM alert or a user call to the helpdesk.

It is also possible to modify a rule so that, where it would normally create a warning alert, it creates a critical alert instead, if you have deemed that alert important to the organisation. The Product Knowledge tab is filled in with details of the problem, its probable cause and potential solutions. Alongside it is the Company Knowledge tab, where the organisation can add further information, workarounds and company-specific knowledge, with web links, to aid resolution when the alert occurs. The Product Knowledge or Company Knowledge may be enough to provide the initial support, and the operations team may be able to close the incident on that basis. Otherwise, further investigation and diagnosis are needed.
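The "assign impact and urgency" step is often implemented as a simple lookup from impact and urgency to a priority. The sketch below shows one way to write that lookup. The matrix values and the `classify` function are illustrative assumptions, not something defined by MOM or prescribed here; each organisation would agree its own matrix.

```python
# Illustrative matrix only: (impact, urgency) -> priority, where 1 is highest.
# The actual values are an assumption and should be agreed by the organisation.
PRIORITY = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("medium", "high"): 2,
    ("high", "low"): 3,
    ("medium", "medium"): 3,
    ("low", "high"): 3,
    ("medium", "low"): 4,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def classify(impact: str, urgency: str) -> int:
    """Look up the priority for an incident; raises KeyError on unknown levels."""
    return PRIORITY[(impact.lower(), urgency.lower())]
```

Raising an alert from warning to critical, as described above, is one way of making MOM's severity agree with the priority the organisation would give the incident anyway.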

Investigation & Diagnosis 

You can use the Resolution State to escalate to another team, and the states can be customised to suit the organisation. A process should be put in place to hand incidents over from one team to another, rather than relying on someone “looking at the console”. Keeping an eye on the console is a front-line task. Second- and third-line support are unlikely to monitor a console, as they should be working on projects and proactive activities, as well as responding to front-line support when an incident needs to be escalated to them.  

The tasks in the Task Pane help with investigation and diagnosis as well as recovery. They range from simple tasks, like pinging a server, to more complex ones, like running a script against a server to determine a specific piece of information.  

The Product Knowledge tab and Company Knowledge tab can assist in this phase by providing information and known fixes.  

Resolution & Recovery 

MOM can be set up to automatically alert a specialist group via e-mail for certain incidents, or to automatically run a script or command-line task to fix the problem if it is a known issue with a known fix. Automating these known issues frees up the operations staff to focus on incidents that need manual intervention. Different groups can also have different views, each showing only the alerts that match their criteria. For example, one group may want to see all alerts for all AD servers, another may want all alerts for Exchange servers in London, and another may want all critical errors that have been in that state for more than 30 minutes.
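The three example views can be thought of as filter predicates over the same list of alerts. The sketch below expresses them that way. It is a conceptual model only, not how views are defined in the MOM console; the `Alert` fields, the view names and `alerts_for` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class Alert:
    server: str
    role: str              # hypothetical field, e.g. "AD" or "Exchange"
    site: str              # hypothetical field, e.g. "London"
    severity: str
    state_since: datetime  # when the alert entered its current state


# A view is a predicate over an alert and the current time.
View = Callable[[Alert, datetime], bool]

VIEWS: Dict[str, View] = {
    # All alerts for all AD servers.
    "ad_all": lambda a, now: a.role == "AD",
    # All alerts for Exchange servers in London.
    "exchange_london": lambda a, now: a.role == "Exchange" and a.site == "London",
    # Critical errors that have been in that state for more than 30 minutes.
    "stale_critical": lambda a, now: (
        a.severity == "Critical Error"
        and now - a.state_since > timedelta(minutes=30)
    ),
}


def alerts_for(view: str, alerts: List[Alert], now: datetime) -> List[Alert]:
    """Return the alerts that satisfy the named view's criteria."""
    keep = VIEWS[view]
    return [a for a in alerts if keep(a, now)]


now = datetime(2005, 6, 1, 12, 0)
alerts = [
    Alert("DC01", "AD", "London", "Warning", now - timedelta(minutes=5)),
    Alert("EXCH01", "Exchange", "London", "Critical Error", now - timedelta(minutes=45)),
    Alert("EXCH02", "Exchange", "Paris", "Critical Error", now - timedelta(minutes=10)),
]
```

Here `alerts_for("stale_critical", alerts, now)` picks out only the alert that has been critical for 45 minutes, which is the kind of list a specialist group would want to be notified about.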

Once an incident is resolved, any additional knowledge or fixes should be added to the Company Knowledge to aid resolution and recovery when the incident occurs again.

Incident Closure 

Once the incident is resolved, the alert can also be resolved, which removes it from the main Alert View.

To be continued.

5 Comments

  1. I wonder whether the alert auto-resolution feature is just marketing. It doesn’t work in most cases. Even when it does work, we can’t receive the “auto resolved” message by e-mail, so we can’t build a fully automated system.

  2. Paul

    There is a certification for ITIL tooling called PinkVerify. This certification mainly states that the tool can be used to support ITIL processes. The added value of the certification can be discussed when ITIL is becoming the de-facto standard and all tool manufacturers will have to comply in order to sell.

    One other remark: you cannot and should not treat alerts as having a one-to-one relation with incidents. One incident can be indicated by many alerts, but many alerts will not necessarily indicate one incident. If you are not careful, the incident management team(s) will be flooded with MOM-related incidents. It’s like the story of the boy who cried wolf.

  3. Nick Madge

    Agree with Paul, re: alerting for incidents, the quickest way to lose your clients is to flood their inboxes with alerts. Correlation of events is a good way of ensuring that key issues are alerted, and your product is valued internally.

  4. Hi,

    Thanks for the comments. I agree 100% with Paul and Nick that an alert does not always equal an incident, and I apologise if I gave that impression. But some alerts do equate nicely to an incident, and for organisations that are new to MOM and ITIL I have found that linking the two helps them.

    As for flooding the incident team, one of the key jobs in any MOM deployment is alert tuning. Again I have found when MOM is going in with ITIL it gives extra focus to the alert tuning so that you are trying to get to the case of an alert equals an incident. I do not believe that you can achieve it 100% but that focus does help in the tuning. Then there is the process of dealing with alerts and whether to automate them to a service desk or manually mark them to be sent. Process is a key part of what ITIL is all about and this again helps in configuring MOM. Putting in technology without process and involving the people will rarely work.

    Valdislav, the auto alert feature does work as you say and it is useful, and if you link it with the MCF to a service desk product you should get a bidirectional flow of changes. The auto resolve feature only works with rules which have state and are therefore shown in the State View, where they turn the dots red and then back to green. I am not clear why not getting an e-mail means that you cannot build a fully automated system. What part do e-mails play in the automation?

    Ian
