Archive for September 2008

Microsoft Exchange Monitoring Service

September 30, 2008

Ensure that your Exchange team has the Microsoft Exchange Monitoring service set to Automatic if you are using SCOM to monitor Exchange 2007.

If this service is not running, you will see a number of Backward Compatibility Script Error alerts mentioning line 279: not just one for each server, but one for each test on each server. They all have an alert description starting with

An error occurred on line 279 while executing script ‘Microsoft Exchange 2007 – Execute Diagnostic Cmdlet’ Source: Microsoft JScript runtime error Description: Number expected One or more workflows were affected by this. Workflow name:

and then there is a description of the command they were trying to run

Execute__Test_ServiceHealth_diagnostic_cmdlet._8_Rule Instance name: Microsoft.Exchange.2007.Microsoft_Exchange_2007_All_Servers_Installation

Execute__Test_ActiveSyncConnectivity__Internal__diagnostic_cmdlet.__Report_Collection__3_Rule Instance name: Microsoft.Exchange.2007.Microsoft_Exchange_2007_Client_Access_Servers_Installation

Execute__Test_ReplicationHealth_diagnostic_cmdlet.__Report_Collection__6_Rule Instance name: Microsoft.Exchange.2007.Microsoft_Exchange_2007_Mailbox_Servers___Physical_Computers_Installation

Execute__Test_WebServicesConnectivity__Internal__diagnostic_cmdlet._3_Rule Instance name: Microsoft.Exchange.2007.Microsoft_Exchange_2007_Client_Access_Servers_Installation

Execute__Test_Mailflow__Remote__diagnostic_cmdlet.__Report_Collection__5_Rule Instance name: Microsoft.Exchange.2007.Microsoft_Exchange_2007_Mailbox_Servers_Installation

and so on for a large number of tests.

This is not mentioned in the Exchange Management Pack Guide.

Views – Or How to Tame The Beast

September 29, 2008

One of the great things about OpsMgr is the sheer number of management packs and rules available. In a typical install there are over 6,000 rules looking at various aspects of the OS and applications. The problem is that this can create a significant number of alerts. Tuning OpsMgr is the most important piece of work done post installation, and it should be the first (major) part of a continuous tuning process. The way I tame the console is by creating views, quite a lot of views in fact, especially during the first stage of major tuning.

The Operations Console is just a view onto the SQL OperationsManager database. In the Monitoring view you will see all alerts that have not been closed for the last x days, where you can change x. Although this is the main view and the first view that many people go into, it is the worst view to use when dealing with a large number of servers. My preference is to use the My Workspace tab and create specific views. These are very easy to set up, and there are examples of specific views in the folders of the various MPs that can be copied. In fact, if you like a view, right click it to create a shortcut in My Workspace.

The main view I use is based on New alerts (so that any other resolution state does not show up). If you are using the console you would tend to put alerts that are being worked on in another resolution state so that they do not show up in New. I create a number of different resolution states so that I can use those to help sort out what needs to be done. I create states like 2nd line support and Rule to be Investigated (whether the rule needs to be disabled or changed). These can all be changed when it gets handed to operations as a working system.

I have this view sorted (nearly all columns in all alert views can be sorted by double clicking on the column header) by time created. Using Age does not seem as reliable as time created. That way the latest new alerts are always at the top and I can see what has been happening over the last few hours. For example in the last 8 hours 18 alerts show up in the visible part of this view but half of those are information or warning which means only 1 Critical alert per hour. This gives a view of how the estate is performing and it is useful to know when the estate is “normal” compared to when there are major problems. Anyone who has run a server estate will be familiar with this process as it is baselining the environment. Not to be confused with baseline monitors.

Monitors will close alerts when they see that the health is good. Unlike MOM 2005 these alerts get closed immediately so you may miss them. One of the key views I have is for auto resolved alerts.

  1. Go to My Workspace and right click on Favourite views.
  2. Choose New and Alert view.
  3. In Name type Auto Resolved Alerts.
  4. Tick the box for “Resolved by a specific user”
  5. Click “specific” and enter SYSTEM. This is not case sensitive.
  6. Additionally you can further filter by time (last 3 days for example) or by a group or targeted towards a specific entity.
  7. Click the Display tab.
  8. The Resolution State column can be removed as all alerts in this view are Closed. Arrange the columns to your liking. Hint: in the view itself you can drag the columns around, which is easier. I always remove the Group view as I don’t find it useful. I would add Path (which is the server FQDN) as sometimes the source is cryptic, like C:, which is not much use when you have 500 servers all with a C: drive. I also add the columns for Last Modified and Repeat Count to most views as they are useful for dealing with Rule Alerts.
  9. You now have a view that shows all auto resolved alerts.

Rule based alerts will not know how to resolve themselves. On a regular basis the console should be sorted by time Last Modified. If the repeat count of an alert has not incremented for a number of days then a view can be taken that the problem is no longer active and those old alerts can be cleared. Also, in order to keep the console clean, when problems are fixed the associated alerts should be resolved. To stop spurious alerts coming into the console, maintenance mode should be used for servers that are getting rebooted or worked on.
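The "not touched for a number of days" check can be sketched outside the console as well. This is a minimal Python illustration, not anything OpsMgr provides: the alert records, the field names and the 3 day threshold are all made up for the example, and only the value 255 for Closed comes from the product.

```python
from datetime import datetime, timedelta

CLOSED = 255  # built-in Closed resolution state

def stale_rule_alerts(alerts, now, idle_days=3):
    """Return names of open alerts not modified within the last idle_days days."""
    cutoff = now - timedelta(days=idle_days)
    return [
        a["name"]
        for a in alerts
        if a["state"] != CLOSED and a["last_modified"] < cutoff
    ]

# Hypothetical alert records; a real list would come from an export of the view.
now = datetime(2008, 9, 29, 12, 0)
alerts = [
    {"name": "Script error", "state": 0, "last_modified": datetime(2008, 9, 22, 9, 30)},
    {"name": "Disk space low", "state": 0, "last_modified": datetime(2008, 9, 29, 8, 0)},
    {"name": "Old closed alert", "state": CLOSED, "last_modified": datetime(2008, 9, 1, 0, 0)},
]

candidates = stale_rule_alerts(alerts, now)
```

Here only the alert last modified a week ago is offered as a candidate for clearing; whether it really is safe to clear is still a judgement call.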

Other views I create

  • Alerts Green (all information alerts)
  • Alerts Amber (all warning alerts)
  • Alerts Red (all critical error alerts)
  • View for each resolution state that is created
  • Alerts based on a source like Clusters or KCC
  • Alerts based on a group like AD or Exchange
  • Alerts based on a computer name when a computer is being troublesome
  • Resolved alerts (not just by system) for the last day and 3 days
  • Alerts based on the name of the alert if I am investigating a rule like Script Errors when there are a large number
  • Alerts based on name like Windows Server has Restarted for information
  • Computer view Servers in Maintenance Mode
  • Computer view No Heartbeats for last 15 Mins
  • Plus more as needed for events, performance or state.

Note that on some alert views I exclude Closed alerts, but on others (like those based on a computer) I usually leave them in to get a full picture of what is going on. A tip: if you create a view to exclude only Closed alerts, be careful. If you just go in and tick the resolution state boxes apart from Closed, you will not get alerts from a new resolution state that is created afterwards. In this case use the formula at the bottom instead and say less than 255, so that it will cover any new resolution state that is not Closed.
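The difference between the two filters can be shown with a small sketch. This is illustrative Python rather than anything the console runs: the alert records and the custom state IDs 10, 20 and 30 are hypothetical, while 0 (New) and 255 (Closed) are the built-in values.

```python
# Illustrative only: the alert records and custom state IDs are hypothetical.
NEW, CLOSED = 0, 255  # built-in resolution states

# States that existed when the view was created (boxes ticked by hand).
# 10 stands for "2nd line support", 20 for "Rule to be Investigated".
ticked_states = {NEW, 10, 20}

alerts = [
    {"name": "Disk space low", "state": NEW},
    {"name": "Script error", "state": 20},
    {"name": "Cluster failover", "state": 30},  # state 30 was created after the view
    {"name": "Service restarted", "state": CLOSED},
]

def view_by_ticked_boxes(alerts, ticked):
    """Return alerts whose state was explicitly ticked in the view."""
    return [a["name"] for a in alerts if a["state"] in ticked]

def view_by_less_than_closed(alerts):
    """Return alerts whose state is anything below Closed (255)."""
    return [a["name"] for a in alerts if a["state"] < CLOSED]

ticked_view = view_by_ticked_boxes(alerts, ticked_states)
range_view = view_by_less_than_closed(alerts)
```

The ticked-box view silently drops the alert in the newer state 30, whereas the "less than 255" view still shows it.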

As views are easy and quick to create, I generally add them as needed and remove them when no longer needed. They help filter down the number of alerts seen in the console to a more manageable view, and creating them is something I always teach people how to do. Divide and conquer.

There are alerts that are constantly repeating, so resolving them does not help as they will be recreated. The choice with these alerts is to switch them off if they are not going to be acted on for ALL servers, or to fix the problem. There are also alerts that are fleeting. They happen due to a server being rebooted or a server being temporarily busy. These need to be looked at to see if they are important but can be cleared after a few days if they do not increment. Ensuring that alerts are cleared regularly and that the underlying problems are fixed will keep the number of alerts in the console down. Creating specific views (especially if using resolution states) will help focus on what needs to be looked at.

In summary if you create your own views you will find it easy to discover issues and focus on fixing them.

Patching Problems Part 3

September 23, 2008

Today I pushed out the agent to a number of new servers to see how patching would be delivered. It worked fine and when I looked at the view I had created to show the patch list field I could see that both patches had been installed. I confirmed by going onto one of the servers and checking the version numbers. However the version number of all these new agents says 6.0.6278.0. This means I have a list where some of the agents that were done say 6.0.6278.36 and some are 6.0.6278.0 with no rhyme or reason.

As this diagram shows you can see the patch field with both patches but the version number is different for some agents.

All the management servers and gateway servers are at 0 as are all the manual updates that I did. Some (not 100% but most) of the ones that I pushed out the other day are at 36 but the new ones are at 0. How is anyone meant to keep track of this?

Patching Problems Part 2

September 22, 2008

Now that I had fixed the problem of the patches being applied to all servers I had a second problem: the manually installed agents. Actually, although I had manually installed some agents before the Gateway server was installed, I found that with this in place it could patch the agents. A useful distinction to make. You only need to do a manual patch if you do not have the same type of access that you would have if you were pushing an agent out, e.g. because of firewalls.

The problem is with the SCCM 2007 servers. These happen to be 64 bit servers but as the current SCCM MP does not work with the 64 bit agent you have to manually install the 32 bit agent on those servers for the MP to work. The 32 bit agent seems to work quite happily monitoring the 64 bit Windows and 64 bit SQL 2005 as well, but it is a bit worrying as you are never 100% sure. It is supported though. It surprises me that the Configuration Manager team, who are responsible for inventory and therefore should have no problem determining 32 bit and 64 bit servers, have problems with this.

I did a manual install of those hotfixes on a couple of the servers. You cannot push out the agent as it wants to push the 64 bit agent but sees the 32 bit agent and errors. I forgot and mistakenly tried to push it, hence I know that it produces an error. Initially I used the msi file as that is what it seems to say in the KB article.

The hotfixes seemed to work but when I looked at the console I saw the alert “Health Service Unloaded System Rule(s)” and the server was greyed out. The OpsMgr log on that server was full of 4507 Health Service events for all the internal System Center rules it had unloaded. I tried restarting the OpsMgr service but all the same 4507 events reappeared in the log.

Looking at Control Panel, it seems that on the agent where I had run the hotfix (msi) manually both hotfixes showed up in the main list, whereas on an agent with a successful patch both hotfixes were under the OpsMgr entry and only show when you tick Show Updates. I uninstalled the 2 hotfixes and then ran the momagent.msi with the 2 MSPs in the directory and chose repair. This put them in Control Panel in the right place but I still got all the 4507 errors along with related 1206 information events.

I uninstalled the agent, made sure the directory and event log were deleted, and deleted it from the list of agents. Interestingly, if you do an uninstall of the agent from the Administrators page in the console it says succeeded and removes the server from the list. On the server, however, the agent is still there but can no longer connect to the management server. I then reinstalled the x86 agent manually. Although the MSPs were in the same directory they were not picked up when I did a new install, but the agent was working correctly. I double clicked on one of the MSPs and that installed the first patch (rather than using the MSI that the patch is in). I left it for a while and it was OK, so I did the other one and that was fine as well.

On the second server I uninstalled the agent and deleted it from OpsMgr. After the reinstall I confirmed that the MSPs had not been installed even though they were in the directory. I tried running momagent.msi with the Repair option but that did nothing. It seems like the only way is to copy the MSP files from the management server and install them directly. It is a bit of a fuss to do but it works. The SCCM team need to pull their finger out and sort their MP out.

A strange thing: even though you delete an agent in the console, when you reinstall it the console remembers that it has the Proxy settings box ticked. So it must not be fully deleted from the database. I suppose it needs to keep track of it for historical reasons until all the data is gone.

Another weird thing is that most of the agents in the Administrator agent view now show the version as 6.0.6278.36, with the 36 coming from the version of mommodules.dll I presume. But some agents still show 6.0.6278.0 even though they have the patch. All the management servers and gateway servers are also at 0. It is a bit inconsistent.

Summary: the best way to install patches on a manually installed agent seems to be to copy the MSP files down to the server and double click them to install them. You can find these files in the agent directory of the management servers that have had the patches installed.

Patching Problems Part 1

September 19, 2008

I knew I would have to patch the OpsMgr system that I am working on at some time as the organisation I am working with is putting in Exchange 2007. At least now you can get the patches without having to phone Microsoft. Just request them and they will e-mail you a link.

Kevin Holman has a great post on all the hotfixes and which ones supersede which ones. http://blogs.technet.com/kevinholman/archive/2008/09/12/what-hotfixes-should-i-apply.aspx. There is always the great debate about applying patches with the philosophy of “if it ain’t broke then don’t fix it”. Although that used to be my philosophy years ago I am much more of the philosophy that you need to keep up to date. However I decided I definitely needed 2 patches for this environment – KB 954903 and KB 956689.

Having read the docs I knew I had to apply them to all management servers and gateway servers. And as Kevin says, check that the versions of the DLLs have been updated just to make sure. That went well, and then back in the console all the agents popped up as Pending as they needed the updates. I went about approving them. As these were in different environments with different domains I had to approve them per Management Server or Gateway, as it needed the primary management server and an administrator user for those servers. No problem, as the Agent view can neatly group them by management server.

Kevin also has some great posts on checking that the patches have been applied.
By reporting
http://blogs.technet.com/kevinholman/archive/2008/06/27/a-report-to-show-all-agents-missing-a-specific-hotfix.aspx

By State View and SQL Query
http://blogs.technet.com/kevinholman/archive/2008/06/24/how-do-i-know-which-hotfixes-have-been-applied-to-which-agents.aspx

And how the hotfix process works
http://blogs.technet.com/kevinholman/archive/2008/06/25/a-little-tidbit-on-hot-fixes-for-opsmgr.aspx

And as he says, the patch list is all in one 256 character field. The two patches I applied took up 241 characters already! This is not good. There is so much junk in there taking up space and you really need to know what patches have been applied.

System Center Operations Manager 2007 Agent installed.{7EEAF9D0-F78D-4C94-874E-66A756A4C510},C:\WINDOWS\Installer\fd1b6428.msp,KB 954903,20080916; {668B6309-9D96-405D-8B98-439C9C5A9A37},C:\WINDOWS\Installer\1874861.msp,KB 956689,20080917;

A bug has been raised on this but there is no fix yet. Everyone should contact them and tell them that this needs fixing!

The actual patching sounds simple enough but I ran into a couple of problems.

The first was that after I approved the pending agents and looked in the State View I had created some fields were bigger than others. Using the SQL queries provided in Kevin’s blog I queried the SQL database and copied the results into an Excel spreadsheet.

select bme.path AS 'Agent Name', hs.patchlist AS 'Patch List' from MT_HealthService hs
inner join BaseManagedEntity bme on hs.BaseManagedEntityId = bme.BaseManagedEntityId
where hs.patchlist not like '%954903%'
order by path

Take out the line starting “Where…” to get the full list. Or take out the not on that line to get the ones that have the patch etc. Good little query.

In the spreadsheet insert 2 columns between the servers (column A) and patch list (now becomes column D) columns. Assuming that you have put headers in row 1, insert the following formulas into row 2: in column B =ISNUMBER(SEARCH("954903",D2)) and in column C =ISNUMBER(SEARCH("956689",D2)). Then copy and paste the formulas down and you should see TRUE or FALSE depending on whether the KB number is found or not.
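The same TRUE/FALSE check can be done without a spreadsheet. The following is a minimal Python sketch, assuming the patch list strings have already been pulled out of the query results; the function names are invented for illustration and the sample string is the one shown earlier in this post.

```python
import re

# Sample patch list copied from the post above; other agents will differ.
patch_list = (
    r"System Center Operations Manager 2007 Agent installed."
    r"{7EEAF9D0-F78D-4C94-874E-66A756A4C510},C:\WINDOWS\Installer\fd1b6428.msp,KB 954903,20080916; "
    r"{668B6309-9D96-405D-8B98-439C9C5A9A37},C:\WINDOWS\Installer\1874861.msp,KB 956689,20080917;"
)

def installed_kbs(patch_list):
    """Return the set of KB numbers mentioned in a patch list string."""
    # An empty or missing patch list simply yields an empty set.
    return set(re.findall(r"KB\s*(\d+)", patch_list or ""))

def missing_kbs(patch_list, required):
    """Return the required KB numbers that do not appear in the patch list."""
    return sorted(set(required) - installed_kbs(patch_list))

required = ["954903", "956689"]
```

Calling missing_kbs(patch_list, required) on each row gives the list of patches an agent still needs, which is the same information as a FALSE in column B or C.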

Note that Management Servers and Gateway Servers do not show a patch list.

What I found was that only 25% of the servers had taken both patches but all had taken the first patch. It was not a single management server or type of OS. It just seemed random. With help from Kevin I worked out what I needed to do. He said that “putting agents in pending is just flipping a bit in the database and putting them in the pending actions table. It REALLY is doing a “repair” behind the scenes when you approve them. The repair forces the agent to download a new MSI from the \Agentmanagement directory of it’s primary MS, PLUS any patch MSP’s present. You need to make sure ONLY the correct and current MSP’s are present, and manually delete any older ones which are not applicable. You should be able to deploy a ton of hotfixes at once, and then patch/deploy new agents. The agents should pick up all the MSP’s present.”

I checked the management servers and gateways and they all had both MSPs. As he says it is really a repair so I just did a repair on all the agents that only had one hotfix and that sorted them out.

As for the second problem, more on that later.

Rule v Simple Monitor

September 18, 2008

I have been asked to create a custom management pack for one customer but they only want a list of events turned into alerts. Immediately this suggested rules to me, but it is a distributed application, which means that rules would not show up in the DA; only monitors do. My initial reaction was that I could not use monitors as they did not have a health model. That is, they did not have an event that could be used to show that the application was now healthy. But there is a monitor that is very similar to a rule: the Manual Reset monitor under Simple Event Detection.

I created a test alert just to see how it works. In creating the monitor it is very similar to the wizard for rule creation except you have to say what branch it goes under – Availability, Configuration, Performance or Security. I had it create an alert as that is optional on a monitor.

With a rule the event triggers an alert in the console. This alert needs to be closed manually. The rule will increment the repeat count every time the event is picked up by the rule. The rule does not affect the Computer (or state) view and does not show up in Health Explorer.

With a monitor an alert is also generated but this time it shows the state in the Computer view and can be seen in Health Explorer which means it would show up in a distributed application. Monitors do not increment the repeat count.

The difference is how you clear the alert. If you close the alert it removes the alert from the Alert view but that does not change the Computer view or Health Explorer view. What you need to do is open up Health Explorer and click Recalculate Health. This clears it from the Computer view and Health Explorer. If the alert is still in the console then clicking Recalculate Health will also close the alert. You can also use Reset Health but that is brute force and will change any monitor back to green (healthy).

So if you are looking to create simple event rules you may want to consider using the Manual Reset monitor, as it will do a similar job but can be used in a distributed application's health. Make sure that the event really does make the DA unhealthy before including it. Also, if you put it in the Availability branch it will affect the Availability report. The only thing to watch for is how you close the alerts. With a rule, close it in the Alert view. With a Manual Reset monitor, open Health Explorer and do a Recalculate Health.

It would have been nice if the Product Group had made it consistent so that closing the alert also clears the health view, leaving only one method of closing alerts. It also brings up the question of how you know what type of monitor has created an alert and whether it will auto resolve or needs manual intervention.

Get SP Info from AD

September 10, 2008

One of the things that is required when you roll out agents is to know that the minimum service pack has been installed. Of course you can find out by trying to deploy the agent. The install failing will tell you but that is not a great strategy. If the customer is really good then they can provide that information but quite often I have to find it out. Or they ask me! There are some ADSI scripts that I looked at in the Script Center which helped but did not give quite what I wanted plus I knew that PowerShell has a nice export to CSV option so pulling it into Excel would be a breeze.

I started with PowerGui as it is an easier way to find out the command line but to use the AD extensions you need to download and install Quest Active Roles (available in 32 and 64 bit).
http://powergui.org/downloads.jspa
http://www.quest.com/powershell/activeroles-server.aspx

I hoped that I could get all the computers that were servers in PowerGui and output that to a text file. And I did get part of the way there but it seemed to choke getting all the computer information for the number of servers. But as it has a tab to show the PowerShell command I was able to take those and build on it.

I did this on the RMS as it had PowerShell installed (no need to run this on a DC) and I used the normal PowerShell console rather than the OpsMgr one. You need to tell it to use the Quest Add-in.

Add-PSSnapin Quest.ActiveRoles.ADManagement

Then using the command that PowerGui built I found all computers that were servers and had no SP (useful if all servers are 2003).

Get-QADComputer -ErrorAction SilentlyContinue -SizeLimit 0 | where { $_.OSName -like '*Server*' } | where { $_.OSServicePack -eq $null } | Select-Object -property "Name","Type","DN"

Instead of displaying the list I wanted it as a CSV file.

Get-QADComputer -ErrorAction SilentlyContinue -SizeLimit 0 | where { $_.OSName -like '*Server*' } | where { $_.OSServicePack -eq $null } | export-csv c:\servers_nosp.csv

Now for all servers

Get-QADComputer -ErrorAction SilentlyContinue -SizeLimit 0 | where { $_.OSName -like '*Server*' } | export-csv c:\servers.csv

There was too much information in the CSV although you can delete the columns. A quick look at the PDF for Active Roles and I found the fields that I wanted. The next line does all computers so if you need to do SCCM as well this is useful.

Get-QADComputer -ErrorAction SilentlyContinue -SizeLimit 0 | Select-Object -property "Name","OSName","OSVersion","OSServicePack","DN" | export-csv c:\allcomputers.csv

I imported that into Excel and with a quick pivot table I had all the information I needed and as I included the Distinguished Name I could see which OUs these servers were in.
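For anyone who would rather not use Excel, the pivot table step can be approximated in a few lines of Python. This is only a sketch: the sample rows are invented stand-ins for the contents of c:\allcomputers.csv, and it assumes the file has the five columns selected above, possibly preceded by the "#TYPE" line that Export-Csv can write.

```python
import csv
from collections import Counter

def summarise(csv_text):
    """Count computers per (OSName, OSServicePack) pair, like a simple pivot table."""
    # Export-Csv may write a "#TYPE ..." line first; skip it if present.
    lines = [ln for ln in csv_text.splitlines() if not ln.startswith("#TYPE")]
    counts = Counter()
    for row in csv.DictReader(lines):
        # Computers with no service pack recorded have an empty field.
        sp = row.get("OSServicePack") or "(none)"
        counts[(row["OSName"], sp)] += 1
    return counts

# Hypothetical sample rows standing in for the exported file.
sample = '''#TYPE System.Management.Automation.PSCustomObject
"Name","OSName","OSVersion","OSServicePack","DN"
"SRV01","Windows Server 2003","5.2 (3790)","Service Pack 2","CN=SRV01,OU=Servers,DC=example,DC=local"
"SRV02","Windows Server 2003","5.2 (3790)","","CN=SRV02,OU=Servers,DC=example,DC=local"
"SRV03","Windows Server 2003","5.2 (3790)","Service Pack 2","CN=SRV03,OU=Web,DC=example,DC=local"
'''

counts = summarise(sample)
for (os_name, sp), n in sorted(counts.items()):
    print(os_name, sp, n)
```

Reading the real file would just mean passing its text to summarise; the grouping by OU that the Distinguished Name allows is left out to keep the sketch short.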

Note that this just provides the information from a single domain.

Now someone will tell me why didn’t I use x, y or z as they have been around for ages to do this. Well I either didn’t know or couldn’t find them and I wanted to extend my PowerShell knowledge.

