
MOM 2005 and ITIL – Part 3

August 3, 2006

Capacity Management 

The goal of Capacity Management is to understand the future business requirements and ensure that all current and future capacity and performance aspects of the infrastructure are provided cost effectively to support those requirements. 

Capacity Management tasks 

  • Monitoring the performance and the throughput of IT services and supporting IT components.

  • Tuning activities to make efficient use of resources.

  • Understanding the current demands for IT resources and deriving forecasts for future requirements.

  • Producing a Capacity Plan predicting the IT resources needed to achieve the agreed service levels.

Out of the box, most management packs monitor a number of performance counters. The product groups that create the management packs have already done the work of deciding which perfmon counters are worth monitoring. Within the console you can view a graph of these counters to check that everything is running normally. More importantly, MOM allows thresholds to be set so that an alert is raised when a counter value, the average of a counter over a number of samples, or the difference between samples is greater or less than a given value. Operators are then alerted and do not have to watch graphs. 
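
The three threshold styles described above can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not how MOM implements it: the function name `check_thresholds`, its parameters and the sample values are all assumptions made for the example.

```python
from statistics import mean

def check_thresholds(samples, value_limit, avg_limit, delta_limit, window=3):
    """Evaluate three threshold styles against a list of counter samples.

    All names and limits here are hypothetical; they only mirror the three
    comparison types described in the text.
    """
    latest = samples[-1]
    recent = samples[-window:]  # the last `window` samples
    previous = samples[-2] if len(samples) > 1 else latest
    return {
        "value": latest > value_limit,                  # single sample over the limit
        "average": mean(recent) > avg_limit,            # average over several samples
        "delta": abs(latest - previous) > delta_limit,  # jump between two samples
    }

# Example: processor usage samples (percent) with a sudden spike at the end.
result = check_thresholds([40, 50, 95], value_limit=90, avg_limit=70, delta_limit=30)
```

With these sample figures the single-value and delta checks fire but the averaged check does not, which is why averaging over several samples is useful for ignoring short spikes.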

MOM 2005 introduced a built-in data warehouse that can be used with SQL Reporting Services. This was a big leap from MOM 2000, which had an Access front end for reports; reports were hard to create and you had to build your own data warehouse. While it is recommended that you keep the MOM database at less than 30 GB, the data warehouse is supported up to 1 TB and, out of the box, the settings will keep data for 13 months. The first step in the process is to decide how long it is useful to keep the data. 

The management packs include numerous reports covering a range of performance counters, and there is also a generic performance counter report that will graph any performance counter that you have decided to collect. These reports enable the organisation to determine whether there are servers that are under-utilised and can be consolidated, removed or virtualised. They also show which servers are nearing capacity and need upgrades. By looking at the graphs over time you can also spot servers that are trending towards running out of capacity and order upgrades ahead of time, with the MOM report helping to make the business case. 

While MOM 2005 cannot take the information in the data warehouse and predict the trend, Microsoft is working on doing this in future, specifically by using the data warehouse in SCOM with a future version of System Center Capacity Planner. 
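
In the meantime, a simple trend can be estimated outside MOM from values read off a report. The sketch below fits a straight line through daily disk usage figures and estimates how many days remain before a limit is reached. The function name, the sample figures and the assumption of linear growth are illustrative only and are not part of MOM.

```python
def days_until_full(daily_used_pct, capacity_pct=100.0):
    """Estimate days left before usage reaches capacity_pct.

    Fits a least-squares straight line through one sample per day.
    Returns None if there are too few samples or usage is not growing.
    """
    n = len(daily_used_pct)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(daily_used_pct) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_used_pct))
    slope = sxy / sxx  # growth in percentage points per day
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_full = (capacity_pct - intercept) / slope  # day index where the line hits the limit
    return max(0.0, day_full - (n - 1))            # days remaining after the last sample

# Hypothetical figures: disk usage growing by two points a day.
remaining = days_until_full([50, 52, 54, 56, 58])
```

A straight line is the simplest possible model; real usage is often seasonal or stepped, so the result is best treated as a prompt to look at the graph rather than a forecast.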

Availability Management 

Availability Management is also a candidate for MOM. I have not added a section on it because the Availability Management pack seems to be unavailable at the moment. This is a shame, as practically every organisation that sees the reports likes them. According to Clive Eastwood’s blog, though, it should be back by the end of August: http://blogs.technet.com/cliveeastwood/archive/2006/07/21/442795.aspx 

Conclusion 

That does not cover all of ITIL, but it shows how you can start an ITIL project with MOM 2005 as the focus. Doing so helps determine how you proceed with the ITIL project and also adds clarity about what you expect MOM to deliver to the organisation. 

One of the things I like about ITIL is that it gives everyone a common language. When someone talks about a problem in ITIL terms, everyone knows what that means, whereas in organisations that do not use ITIL some people will be talking about an incident, others about an ITIL problem and others about issues in general. 

One recommendation is to follow the Keep It Simple rule and not create vast swathes of documentation and processes that no one will read or follow. Remember it is People, Processes and Technology. The people come first. Make sure they are involved in the project and one thing I have learned from the many types of projects that I have done over the years – you can never over communicate! 

MOM 2005 and ITIL – Part 2

July 31, 2006

Part 1 covered Incident Management, so Problem Management logically follows.

Problem Management 

A problem is the unknown underlying cause of one or more incidents. It will become a Known Error when the root cause is known and a temporary workaround or a permanent alternative has been identified. Or, to put it simply: a problem is often identified as a result of multiple incidents that exhibit common symptoms. 

The goal of Problem Management is to minimise the adverse effect on the business of Incidents and Problems caused by errors in the infrastructure, and to proactively prevent the occurrence of incidents, problems and errors. 

As we have seen in Incident Management, MOM detects incidents quickly and hence can help identify problems quickly. MOM’s reporting can help you prioritise your resources by supplying accurate data on which incidents are occurring most often. These are essentially the Most Common Alerts reports. 

You may also want to look at the Alert Tuning Solution Accelerator (http://www.microsoft.com/downloads/details.aspx?FamilyId=F6AC090E-A594-4EB5-96D9-2A5FEB827BCC&displaylang=en) as it has more reports that help in summarising alerts in different views. These are Alert Count by Processing Rules, Alert Count by Device and Alert Count by Date. This solution accelerator is also good at helping tune alerts so that you can get alerts that equate to incidents. 

By seeing which alerts are most common, you can set the second and third line staff to work on the most common incidents, or on the problem that has the highest impact even though it may have fewer incidents. In either case MOM readily provides the evidence of which incidents are occurring and how frequently. 
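
The idea behind a "most common alerts" ranking can be sketched as a simple count by alert name. The records below are invented for the example and the field names are assumptions; in practice the figures would come from the MOM reports themselves.

```python
from collections import Counter

def most_common_alerts(alerts, top=3):
    """Return (alert name, count) pairs, most frequent first."""
    return Counter(alert["name"] for alert in alerts).most_common(top)

# Hypothetical alert records; only the "name" field is used for the ranking.
sample_alerts = [
    {"name": "Low disk space", "device": "FILE01"},
    {"name": "Service stopped", "device": "EXCH01"},
    {"name": "Low disk space", "device": "FILE02"},
    {"name": "Low disk space", "device": "FILE01"},
    {"name": "Service stopped", "device": "EXCH01"},
    {"name": "Replication failure", "device": "DC01"},
]
ranking = most_common_alerts(sample_alerts)
```

Counting by `device` instead of `name` gives the equivalent of a per-device view. A count alone says nothing about impact, so a rare but high-impact problem still needs to be weighed by hand.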

When problems are resolved, MOM has an easily updated and extensible knowledge base that can be used to make solutions and workarounds readily available to first and second line support staff, improving their productivity. If a knowledge base system already exists and can be hyperlinked to, then the process should be set up so that all the information is put into that single system, with a hyperlink to the relevant information added in the Company Knowledge tab. 

MOM Reporting can also assist in providing relevant information to management. 

Service Level Management 

The goal of Service Level Management is to maintain and gradually improve business aligned IT service quality, through a constant cycle of defining, agreeing, monitoring, reporting and reviewing IT service achievements and through instigating actions to eradicate unacceptable levels of service. 

MOM will not write the SLAs for you but MOM provides the monitoring and reporting capabilities to help in determining if those SLAs are being achieved. In particular the MOM SLA Scorecard for Exchange Solution provides you with an executive dashboard to measure and trend service availability and workloads across multiple server roles in an Exchange Server messaging environment. 

The facilities to be able to do Service Level Management will be greatly enhanced in System Center Operations Manager 2007. 

MOM monitors itself and has a view to show if operations staff are not dealing with alerts in the time frames that have been configured for each resolution state in the global settings. 
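
The check behind such a view amounts to comparing how long each alert has sat in its current resolution state against the time frame configured for that state. The sketch below shows that comparison only; the state names, time limits and record layout are hypothetical and do not reflect MOM's internal schema.

```python
from datetime import datetime, timedelta

# Hypothetical time frames (in minutes) allowed in each resolution state.
STATE_LIMITS = {"New": 10, "Acknowledged": 30}

def overdue_alerts(alerts, now):
    """Return names of alerts that have stayed in their state longer than allowed."""
    overdue = []
    for alert in alerts:
        limit = STATE_LIMITS.get(alert["state"])
        if limit is None:
            continue  # no time frame configured for this state
        if now - alert["state_since"] > timedelta(minutes=limit):
            overdue.append(alert["name"])
    return overdue

now = datetime(2006, 7, 31, 12, 0)
alerts = [
    {"name": "Low disk space", "state": "New",
     "state_since": datetime(2006, 7, 31, 11, 40)},
    {"name": "Service stopped", "state": "Acknowledged",
     "state_since": datetime(2006, 7, 31, 11, 45)},
    {"name": "Replication failure", "state": "Resolved",
     "state_since": datetime(2006, 7, 31, 9, 0)},
]
late = overdue_alerts(alerts, now)
```

Only the first alert is reported here: it has been in the "New" state for 20 minutes against a 10 minute limit, while the second is still within its 30 minutes and the third has no limit configured.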

As mentioned earlier, MOM can integrate with a service desk to provide information to the service desk operators. This can be done using the MCF, and there is a solution accelerator to help in setting up a bi-directional connection: http://www.microsoft.com/downloads/details.aspx?FamilyId=E795EDF1-C610-467D-A9D5-92D5239232F6&displaylang=en 

Additionally, there are connectors for Tivoli, HPOV and HP NNM in the resource kit: http://www.microsoft.com/mom/downloads/2005/reskit/default.mspx 

And if you want to buy an off-the-shelf connector with support, there is a list of third-party connectors at http://www.microsoft.com/mom/downloads/momprodconnectors.mspx 

To be continued.

MOM 2005 and ITIL – Part 1

July 27, 2006

ITIL (IT Infrastructure Library – http://www.itil.co.uk/) was developed by the UK government as a set of best practices for IT service management and is now recognised internationally and used by the public and private sectors. MOF (Microsoft Operations Framework – http://www.microsoft.com/technet/itsolutions/cits/mo/mof/default.mspx) is based on ITIL, with a more specific focus and recommendations for Microsoft products. As the demand grows for IT organisations to be cost efficient and effective, the use of ITIL/MOF is growing. 

MOM can be used to help with ITIL processes, although no piece of software is ITIL certified (only people can be ITIL certified). ITIL can be a big, and sometimes daunting, project for some organisations to embark on. Where do they start? I have also seen organisations put in MOM and then ask what they should do with it. Using MOM and ITIL together helps define where to start on an ITIL project and, as ITIL is going in, focuses the reasons why MOM is being used and how to configure it. 

The four main areas where MOM can help are 

  • Incident Management
  • Problem Management
  • Service Level Management
  • Capacity Management

Incident Management  

The goal of Incident Management is to restore normal service operation as quickly as possible with minimum disruption to the business, in order to ensure that the best achievable levels of availability and service are maintained. 

The Incident Management life cycle is  

  • Incident detecting and recording.
  • Initial classification & support.
  • Investigation & diagnosis.
  • Resolution & recovery.
  • Incident Closure.

Incident Detecting and Recording 

This is pretty much what MOM does out of the box. For the majority of alerts, an alert equates to an incident. Obviously information alerts do not. But focusing MOM on getting alerts to equal incidents gives a target for alert tuning. A MOM alert carries the date, time, server, a severity level (warning, error, critical error etc.) and Product Knowledge to help in the investigation and diagnosis. So incident detection and recording is done automatically. Additionally, MOM is proactive: it looks for events that may affect the IT systems and spots issues, such as low disk space, before they become incidents. 

Initial Classification & Support 

What you need to do for initial classification & support is  

  • Classify incidents
  • Match against known errors
  • Assign impact and urgency
  • Provide initial support
  • Close or route to a specialist support group

You can assign a Resolution State and Owner. If you use a helpdesk or service desk package, you can also forward the alert to it, manually or automatically, using the MOM Connector Framework (MCF) or a third party tool. This creates the incident in the service desk software and keeps the changes in sync between the two packages. Some organisations prefer this as all incidents are recorded in one package for analysis, regardless of whether the incident comes from a MOM alert or a user call to the helpdesk. 

It is also possible to modify the rules so that a rule that would normally create a warning alert creates a critical alert instead, if you have deemed that alert important to the organisation. The Product Knowledge tab is filled in with details on the problem, its probable cause and potential solutions. Alongside it is the Company Knowledge tab, where the organisation can add additional information, workarounds and company-specific knowledge with web links to aid resolution when the alert occurs. The Product Knowledge or Company Knowledge may help in providing the initial support, and the operations team may be able to close the incident based on that. Otherwise further investigation and diagnosis is needed. 

Investigation & Diagnosis 

You can use the Resolution State to escalate to another team. This can be customised to suit the organisation. A process should be put in place to have a mechanism to hand over incidents from one team to another and not just rely on someone “looking at the console”. While keeping an eye on the console is a front line task it is unlikely that second and third line support will monitor a console as they should be working on projects and proactive activities as well as responding to requests from front line support when there is an incident that needs to be escalated to them.  

The tasks in the Task Pane help with investigation and diagnosis as well as recovery. They range from simple tasks like pinging a server to more complex tasks like running a script against a server to determine a specific piece of information.  

The Product Knowledge tab and Company Knowledge tab can assist in this phase by providing information and known fixes.  

Resolution & Recovery 

MOM can be set up to automatically alert a specialist group via e-mail for certain incidents, or to automatically run a script or command line task to fix the problem if it is a known issue with a known fix. Automating these known issues frees up the operations staff to focus on incidents that need manual intervention. Different groups can have different views showing only the alerts that satisfy their criteria. For example, one group may just want to see all alerts for all AD servers, another may just want to see all alerts for Exchange servers in London, while another may just want to see all critical errors that have been in that state for more than 30 minutes. 
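
Conceptually, each of those views is just a filter over the same set of alerts. The sketch below expresses the three example views as predicates; the field names (`role`, `site`, `severity`, `raised`), the server names and the view names are all hypothetical, and real views are defined in the MOM console rather than in code.

```python
from datetime import datetime, timedelta

def build_views(now):
    """Return hypothetical view filters keyed by view name."""
    return {
        "all_ad": lambda a: a["role"] == "AD",
        "exchange_london": lambda a: a["role"] == "Exchange" and a["site"] == "London",
        "critical_over_30_min": lambda a: (
            a["severity"] == "Critical Error"
            and now - a["raised"] > timedelta(minutes=30)
        ),
    }

def servers_in_view(alerts, view):
    """Return the servers whose alerts match the given view filter."""
    return [a["server"] for a in alerts if view(a)]

now = datetime(2006, 7, 27, 12, 0)
alerts = [
    {"server": "DC01", "role": "AD", "site": "London",
     "severity": "Warning", "raised": datetime(2006, 7, 27, 11, 50)},
    {"server": "EXCH01", "role": "Exchange", "site": "London",
     "severity": "Critical Error", "raised": datetime(2006, 7, 27, 11, 0)},
    {"server": "EXCH02", "role": "Exchange", "site": "Paris",
     "severity": "Critical Error", "raised": datetime(2006, 7, 27, 11, 55)},
]
views = build_views(now)
london_exchange = servers_in_view(alerts, views["exchange_london"])
```

The same alert can appear in more than one view: EXCH01 matches both the London Exchange view and the long-running critical errors view, so each group sees it from its own perspective.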

Once an incident is resolved, any additional knowledge or fixes should be added to the Company Knowledge tab to aid in the resolution and recovery of future occurrences of this incident. 

Incident Closure 

Once the incident is resolved the alert can also be resolved and is removed from the main Alert View. 

To be continued.

