Archive for February 2009

Discoveries

February 25, 2009

I was investigating a problem with a SQL server having 100% CPU spikes. I found Sysinternals Process Monitor invaluable in helping me see what was going on. A large number of scripts were running at the same time, and because it was a SQL cluster with multiple instances (this customer was doing a SQL Server consolidation) the problem was exacerbated as the scripts ran for each instance at the same time. This has been confirmed by Microsoft as being a problem with the SQL MP (v6.0.6460.0) when running on a cluster with multiple instances all running off one active node.

To improve the situation I created a group with the cluster members, went through all the discoveries that had generic targets (like Windows Computer or Windows Server) and put an override on them to stop them running for that group. This cluster is only going to be used for SQL Server and so there is no need to discover anything else. The ones that really helped were eliminating the discoveries of SRS, Integration and Analysis Services as those were getting run for each instance. The active node is down to running at about 15% CPU, which the SQL DBA thinks is dreadful, but at least there are no 100% spikes now.

In my investigation I found some strange things. The IBM Director Agent script was set to run every 20 seconds, targeted to Windows Computer. I don’t even run discoveries that fast on a demo system! That was definitely changed.

Here is a list of more reasonable times for discoveries and what they are in seconds that you can use for overrides.

1 hour      3600 seconds
2 hours     7200 seconds
4 hours     14400 seconds
8 hours     28800 seconds
12 hours    43200 seconds
24 hours    86400 seconds
1 week      604800 seconds
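
If you need an interval that is not in the list, the conversion is simple arithmetic. Here is a minimal Python sketch; the function name is my own and has nothing to do with OpsMgr itself.

```python
def interval_seconds(hours=0, days=0, weeks=0):
    """Return a discovery interval in seconds, ready to paste into an override."""
    return int(hours * 3600 + days * 86400 + weeks * 604800)


# Reproduce the list above.
for label, seconds in [
    ("1 hour", interval_seconds(hours=1)),
    ("2 hours", interval_seconds(hours=2)),
    ("4 hours", interval_seconds(hours=4)),
    ("8 hours", interval_seconds(hours=8)),
    ("12 hours", interval_seconds(hours=12)),
    ("24 hours", interval_seconds(days=1)),
    ("1 week", interval_seconds(weeks=1)),
]:
    print(f"{label:<12}{seconds} seconds")
```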

I have always thought that one of the advantages of the application since MOM 2000 has been its automatic discovery and downloading of MP rules. This meant that if someone installed IIS on a server and forgot to tell the monitoring team it would not matter, as the discovery would ensure that IIS was found and the rules downloaded. With 2007 using Cscript it seems that there is more chance of hitting 100% CPU, which was pretty much unheard of in 2000 and 2005. I have seen threads on forums about this for the AD and DNS MPs, and now SQL on clusters with multiple instances.

In contrast I have been looking at the beta of the new Exchange 2007 MP, which has been written for 2007 rather than converted from 2005. One of the things that struck me immediately when reading the MP Guide was that the majority of discoveries are switched off by default, and when they are enabled the default is a rather more sensible 24 hours. After all, how often do you change server roles in a production environment? This is a philosophical change, as in the past all discoveries were targeted to all servers. The only discovery in this MP that does this is a lightweight discovery that just checks registry keys. Once that has been done, the other discoveries (when switched on) are targeted at just those servers. That type of behaviour is seen in a number of MPs where a general discovery is run against Windows Computer but then other specific discoveries are targeted towards the class that is discovered. Obviously this puts less load on servers that are not running that application, especially if the discovery uses big scripts and/or WMI.

I like this idea. I have suggested in the past that the Exchange MP is split into two, with the basics (event monitoring) in one and the advanced stuff that needs configuration (synthetic transactions) in a second. While this is not how the Exchange MP has been done, it is split into multiple roles so you can just install the mailbox monitoring or the CAS or Hub MPs. This will make it easier to tune as you can put in one bit at a time.

I would recommend that you have a look at your discoveries and how often they are running and ask yourself what frequency is good for your environment. I would suggest a long period like a day or even a week for most as you can always create an override for a shorter period if you need to speed things up temporarily.

I was hoping to add the almost obligatory one-line PowerShell script to show you how to get that information, but although there is a get-discovery command it does not include the actual frequency of the discovery. You can always use the excellent MP Viewer (from Boris Yanushpolsky). It can only look at one MP at a time, but it can examine an MP before you import it. It has a node for discoveries and will tell you the target, whether it is enabled or not and the all-important frequency (in seconds).
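
If you have an MP as XML (an unsealed MP, or one exported to XML), another option is to read the intervals straight out of the file. The following is only a sketch under assumptions: it assumes each discovery is a Discovery element with ID, Target and Enabled attributes, and that the schedule sits in a child element called IntervalSeconds or Frequency. Not every MP is laid out that way (the interval can be buried in a module type, for instance), so treat a missing value as "go and look by hand". The sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Element names assumed to hold a discovery's schedule in seconds.
# This is an assumption; individual MPs may use other names.
INTERVAL_TAGS = ("IntervalSeconds", "Frequency")


def list_discovery_intervals(xml_text):
    """Return (ID, target, enabled, interval or None) for each Discovery element."""
    root = ET.fromstring(xml_text)
    results = []
    for discovery in root.iter("Discovery"):
        interval = None
        for element in discovery.iter():
            text = (element.text or "").strip()
            if element.tag in INTERVAL_TAGS and text.isdigit():
                interval = int(text)
                break
        results.append(
            (discovery.get("ID"), discovery.get("Target"),
             discovery.get("Enabled"), interval)
        )
    return results


# Hypothetical MP fragment, for illustration only.
SAMPLE = """
<ManagementPack>
  <Monitoring>
    <Discoveries>
      <Discovery ID="Example.Discovery" Enabled="true" Target="Windows!Microsoft.Windows.Computer">
        <DataSource ID="DS" TypeID="Example.DiscoveryProvider">
          <IntervalSeconds>20</IntervalSeconds>
        </DataSource>
      </Discovery>
    </Discoveries>
  </Monitoring>
</ManagementPack>
"""

for row in list_discovery_intervals(SAMPLE):
    print(row)
```

Pointing this at a folder of XML files would give a rough list of the discoveries that run too often, but it is no substitute for checking the MP in MP Viewer.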

32 v 64 Bit Programs

February 20, 2009

Reading Aidan Finn’s post about having problems with agents as he used the 32 bit MOMCertImport.exe tool on a 64 bit server reminded me of a post I wanted to do.

I had a similar problem. The server build team wanted to build the servers with the OpsMgr agent, which was fine. But for DCs they got hold of oomads.msi and were using the 64 bit version on a 32 bit server, so obviously it did not work. Then they came to me with the problem.

The files are different sizes, but why can’t Microsoft call them oomads32.msi and oomads64.msi, etc.? If you download KB patches they name the patches differently. One I downloaded recently was 354607_ENU_i386_zip.exe with the 64 bit as 354627_ENU_x64_zip.exe. In the OpsMgr directories there are the various directories for the different versions. Would it be that difficult to name the files to reflect the OS version? It would save a few PSS calls I am sure.

Health Service Unloaded System Rule(s) Event 1102

February 16, 2009

I have been seeing a number of servers that were greyed out and the alert “Health Service Unloaded System Rule(s)” showing that those agents have unloaded rules. The Alert Knowledge said to initiate a repair but that did not work. The alert auto resolves but then a new alert is created. Looking at the server the health service was running but there were masses of event ID 1102 in the Operations Manager event log. Hundreds of them. Basically all rules and monitors had unloaded.

A quick search for the alert “Health Service Unloaded System Rule(s)” shows that the problem can have a time zone element as mentioned here – http://myitforum.com/cs2/blogs/momlist/archive/2009/01/08/msmom-issue-with-health-service-unloaded-system-rules-alert-jahaig.aspx but that was not relevant to this problem.

I found a different problem. Something in the environment is creating a file called program in the root of c:\. This is obviously confusing OpsMgr with “C:\Program Files” and the agent just unloads all its monitors. When you log into the server you get an error message about the file and you are asked to rename it.
[Screenshot: program-name-warning]
When you do and restart the health service everything is fine. If you do not rename the file and restart the Health Service then you get 1102 errors for every rule and monitor again but luckily only one alert in the console.

The creation of the file coincides with the reboot of the server on the ones that I examined. I also notice that there is DCOM error 10000 in the system log at the same time, where OpsMgr cannot start as “c:\program files\system center operations manager 2007\monitoringhost.exe -embedding is not a valid win32 application.” So it looks like OpsMgr is not loading, so the rules and monitors cannot load.

Apparently it’s a Windows bug when a call to “%systemdrive%\program files\anything” is made without the quotes. The person who wrote about it said that he “wrote a batch file that just deletes it and set Win.ini to activate the batch on boot. You could do the same with a login script or whatever. Problem solved.” There are plenty of posts about it going back to Windows 2000 days.
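
To see why a file called program causes trouble, here is a simplified Python model of what happens to a path with spaces when the quotes are missing: the string is cut at each space in turn and the piece in front of the cut is tried as the program to run. This is my own illustration, not Windows code, and the real behaviour has more to it (it also tries adding .exe, for example).

```python
def unquoted_candidates(command_line):
    """List, in order, the paths tried when a path with spaces is not quoted.

    Simplified model: cut the string at each space in turn and treat
    everything before the cut as the program name.
    """
    parts = command_line.split(" ")
    return [" ".join(parts[:i]) for i in range(1, len(parts) + 1)]


path = r"C:\Program Files\System Center Operations Manager 2007\MonitoringHost.exe"
for candidate in unquoted_candidates(path):
    print(candidate)
```

The first candidate is C:\Program, so if a file of that name exists in the root of the drive it gets picked up before the real path is ever reached.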

In this environment this happens after a server is rebooted. So now I need to try and find out what script is run when a server starts up. Or perhaps create a diagnostic that renames the file and restarts the OpsMgr service.
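
As a starting point for that diagnostic, here is a rough Python sketch of the two steps. The function names and the new file name are my own choices, and I am assuming the agent service is called HealthService, so check that on your own servers before relying on it. It has not been run against a production agent.

```python
import os
import subprocess
import time


def rename_stray_program_file(root="C:\\"):
    """Rename a stray file called 'program' in root; return the new path, or None."""
    stray = os.path.join(root, "program")
    if not os.path.isfile(stray):
        return None
    # Keep the file rather than delete it, in case it helps find the culprit.
    new_path = os.path.join(root, "program.renamed-" + time.strftime("%Y%m%d%H%M%S"))
    os.rename(stray, new_path)
    return new_path


def restart_health_service(service_name="HealthService"):
    """Stop and start the agent service (service name is an assumption)."""
    subprocess.run(["net", "stop", service_name], check=True)
    subprocess.run(["net", "start", service_name], check=True)


if __name__ == "__main__":
    # Only restart the service if a stray file was actually found and renamed.
    if rename_stray_program_file() is not None:
        restart_health_service()
```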

MMS 2009

February 15, 2009

This weekend I have booked MMS 2009 (27th April to 1st May) along with the hotel and a flight, so I am on my way to Vegas. Hurrah! Last year I left it too late and it was sold out. Mind you, it does fit in as my previous attendances were 2003, 2005 and 2007. So it looks like 2011 after this one. Look forward to meeting you if you are attending.

I expect there to be a lot about R2 as the Product Group will be busy trying to ensure that it RTMs for the event as they like to have a launch at the event. It explains why there are not that many blog posts from the Product Group.

I have finally got around to clearing off OpsMgr from my test VM pair and have installed the R2 beta. Hopefully I will have time to work with it over the next few weeks and write about what is different: the good, the bad and the ugly, and whether you should upgrade.

You can give feedback on MP Quality

February 3, 2009

There have been a few posts about this survey but the one I saw originally was from Justin Incarnato of the Product Group.

http://blogs.technet.com/momteam/archive/2009/02/02/your-feedback-requested-operations-manager-2007-management-pack-quality-survey.aspx

Microsoft and the Operations Manager 2007 product team would like to know what you think about the quality of Microsoft management packs for Operations Manager 2007. This community survey is your chance to rate the quality and features of several management packs to help Microsoft understand where they need to focus their time and effort to improve management pack quality. 

This is a short survey (only 9 questions) allowing you to rate the individual quality of several management packs, and communicate your thoughts on monitoring features, tuning effort required – everything that makes for a good MP.

The survey should only take about 5 minutes. We’d like to get your feedback by Monday, February 16th if possible. However, we’ll leave the survey open longer if responses are still coming in sufficient numbers.

Click Here to take the survey

As I have written a few posts about problems and quality issues with the MPs, I have already filled it out. Pity that there are no fields for comments! Also, the first question asks which MPs you have used or not used, but you still have to tick "not used" for them again in multiple questions. If you have a problem with an MP not in the list (too noisy, hard to tune or just does not monitor enough) then there is no option to fill that in. I hope they are going to use it to improve the MP quality and not just as a marketing exercise.

There used to be an e-mail address (momwish@microsoft.com) that you could send feedback and suggestions to. I don’t know if that still exists but there is the ability to give feedback on Connect (you need a Hotmail/Passport/Live ID). You get the link to it at the bottom of the community page.

