We had built a new SCOM 2016 Management Group alongside the existing 2012 R2 UR11 one so that we could dual home agents and test before switching over. As we had seen posts describing issues with APM and .Net Application Pools crashing under UR2, we decided to wait for UR3, but that did not fix the problem.
The System Center Operations Manager team blogged about the issue on 31st May.
On the 6th June they blogged about workarounds that could be used until a fix was released as UR3 still had problems.
- SCOM 2016 Agent can be replaced with SCOM 2012 R2 Agent, it’s forward-compatible with SCOM 2016 Server and APM feature will continue to work with the older bits
- SCOM 2016 Agent can be reinstalled with NOAPM=1 switch in msiexec.exe setup command line, APM feature will be excluded from setup
Doing the install with the recommended NOAPM switch would have meant manually installing the agent on hundreds of servers. The SCOM Team decided that a better option was to keep the existing 2012 R2 agent that was already on each server and initially dual home it to both management groups. We could then upgrade the agent to 2016 when the fix was released. This work was carried out using Kevin Holman’s Agent Management MP (now SCOM Management) as it has a nice task for adding the existing agent to another Management Group. Pushing the agent out from the 2016 MG would have deployed the 2016 UR2 agent. Using the task instead allowed us to have the 2012 R2 agent dual homed to the old Production Management Group and the new 2016 one.
As we had decided to use the 2012 R2 agent and had not configured APM on any server, the SCOM Team thought they had followed the required steps to avoid the application pool crash issue while still being able to move forward with SCOM 2016 and get the benefits of scheduled maintenance mode and other features. Therefore, it was not expected when SharePoint 2016 servers running on Windows 2012 R2 started to exhibit this issue. The SharePoint Team removed the SCOM Agent, which stopped the problem for them but left the servers unmonitored.
The team started to analyse the situation. Two new registry keys had been added.
These are MULTI_SZ with the values being
We discovered that the contents of the registry keys were the GUID for the SCOM APM component and that these keys turned on .Net profiling. We did a web search to see if we could find any reports of the APM issue with the 2012 R2 agent. This was known to happen with the SCOM 2016 agents but we could not find any mention of this combination. We did find a forum post that mentioned the registry key for 2016 agents and a recommendation to change a rule in one of the APM Management Packs.
On investigating this, we found that this rule was in the Operations Manager APM Infrastructure MP. This is a standard MP that is installed as part of the SCOM installation, but the MP in SCOM 2016 had a higher version number than the one in the 2012 R2 Management Group. We checked the differences between the two MPs and found that Microsoft had added an undocumented feature to the newer version.
The rule – Apply APM Agent Configuration
There is a new parameter in the 2016 MP that is not in the 2012 R2 MP – Enable RTIA Profiler.
This parameter is on by default and the rule is targeted at every server that has the APM agent (.Net Application Monitoring class). In practice this means every server, as the APM agent is automatically installed in a switched-off state when the SCOM agent is installed.
We tested this override and found that when we disabled it for a server then the registry keys would be removed and when we set it back to enabled, it would add these keys back again. This was the difference between the 2012 R2 Management Group and the new 2016 Management Group.
Next, after discussion with the SharePoint Team, we requested access to a SharePoint server that had displayed the errors so that we could reproduce the problem on a server known to break and confirm that the fix worked.
We overrode the rule to enable it for the SharePoint server and could see that the two registry keys were created as soon as the server received the MP. We left it for about 6 minutes and saw no issues. As soon as we did an IISreset, the events for the App Pool crashing (1325, 2016 and 1000) appeared in the Application log. These kept repeating, which meant the web site using that application pool no longer worked.
In the SCOM Console we could see the Application Pool alert increment as the pools kept crashing.
After we removed the enabling override, the events stopped and the alert stopped incrementing.
We also found that a number of servers had these events repeating in the Operations Manager event log.
These events were the agent trying to start and failing. We did not investigate these as the main issue was with the application pools.
As a second test we removed the agent and reinstalled it with the NOAPM switch. As this server no longer had the .Net Application Monitoring class, the rule did not apply to it. During all tests this server carried on with no problems regardless of how the rule was set, as the rule would never be sent to a server on which the target class did not exist.
As we were not using APM we put in the override to disable the parameter “EnableRTIA” (set to false) for the rule “Apply APM Agent Configuration”. This will ensure that the registry keys are not created. This will stop the application pool crashes due to the .Net Profiler being activated and the APM Agent error events across the estate.
Note that only one .Net profiler can run at a time, so you may also want to disable this rule if you are using a .Net profiler with a different application. While searching I found that this problem had occurred with a product called New Relic.
Note that this override had to be done in both the 2016 Test and 2016 Production Management Groups (if you have this setup) as dual homed agents will create the following warning alert – “APM .NET Server-Side monitoring Configuration Error or Conflict”.
Note that this does not stop application pools crashing due to genuine problems.
For the SharePoint Farm we recommended installing the agent with the NOAPM switch to give double reassurance to the SharePoint Team that this would not cause problems again.
I hope that SCOM 2016 UR4 fixes this problem for good.
I would like the Product Group to consider adding a tickbox to the agent installation from the console to choose whether or not to install the APM agent. That would have made it easier to change the agent from the console rather than having to install it manually (which is a lot of work unless you have a software deployment tool).
Edit – 6 August 2017
I mentioned Kevin Holman’s SCOM Management MP, which helps to find agents that still report to old Management Groups and remove those groups, and also has a task to add a Management Group to an agent. The MP also contains a task to run a command or PowerShell command from a central location, which I had thought about using, but Kevin has just blogged about that and how to use it to remove the APM agent from your servers.
This is not quite as good as the Product Group doing it as part of the agent install, as the agent has to be deployed before the task can run, but it is still pretty useful for changing the agent.
He has also listed all the APM MPs that you can remove from SCOM 2016 so that APM is never deployed.
I wanted a quick list of all the servers in a 2012 R2 resource pool for documentation rather than go into each one and check it. I haven’t got there yet but I did manage to find a quick way to see which resource pools a management server belongs to.
For a single server
$Member = Get-SCOMManagementServer -Name "FQDN"
$Pools = Get-SCOMResourcePool -Member $Member
For all management servers
$Members = Get-SCOMManagementServer
foreach ($Member in $Members)
{
$Pools = Get-SCOMResourcePool -Member $Member
Write-Host "Management Server - $($Member.DisplayName)"
$Pools | ForEach-Object { Write-Host "  Resource Pool - $($_.DisplayName)" }
}
Useful for me so I thought it might be useful for other people.
Creating a dynamic group makes life easier for overrides, views and reports (the main uses of groups). When a new object is discovered it automatically joins the group if it matches the criteria. And that means that your overrides, views and reports take that into account without any intervention on your part.
These are usually quite simple to create based on a number of criteria. I was wanting to build a group that matched either wildcard 1 or wildcard 2. It was not initially obvious how to do it so I thought I would document it.
This page helped, as it showed it was possible (for example a group with C: and D: drives), but it did not show how to actually do it: http://social.technet.microsoft.com/wiki/contents/articles/7205.operations-manager-dynamic-group-examples.aspx
Start by finding the class and adding it. It will show as an AND group. If you add your criteria you will get this.
( Object is Scheduled Tasks 2008 Job AND ( Job Name Matches wildcard User_Feed_Synchronization* ) AND ( Job Name Matches wildcard GoogleUpdateTask* ) )
This will obviously never work. We need an OR not an AND. Putting the OR between the two expressions does not do what you think it would do. You would expect it to say if it is a Scheduled Task and either of the expressions is true then add the object. But what you need to change is the class expression to an OR group. And the way you do that is by right clicking on that expression.
Now you can switch to an Or Group.
Now the formula is
( Object is Scheduled Tasks 2008 Job AND ( Job Name Matches wildcard User_Feed_Synchronization* ) OR ( Job Name Matches wildcard GoogleUpdateTask* ) )
Which is what is required. If the scheduled task matches either wildcard then it belongs to the group. Easy when you know how.
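The difference between the two formulas can be sketched outside SCOM. This is only a minimal Python illustration of the logic, not how SCOM evaluates group membership internally. The function names and sample job names are made up for the example, and `fnmatchcase` stands in for the console's wildcard matching.

```python
from fnmatch import fnmatchcase

# The two wildcards from the group formula above.
PATTERNS = ["User_Feed_Synchronization*", "GoogleUpdateTask*"]

def in_and_group(job_name, patterns=PATTERNS):
    """Membership when every wildcard must match (the original AND group)."""
    return all(fnmatchcase(job_name, p) for p in patterns)

def in_or_group(job_name, patterns=PATTERNS):
    """Membership when any one wildcard may match (after switching to an OR group)."""
    return any(fnmatchcase(job_name, p) for p in patterns)

# Hypothetical scheduled task names, for illustration only.
jobs = [
    "User_Feed_Synchronization-{1234}",
    "GoogleUpdateTaskMachineCore",
    "DailyBackup",
]

and_members = [j for j in jobs if in_and_group(j)]
or_members = [j for j in jobs if in_or_group(j)]

print("AND group:", and_members)
print("OR group:", or_members)
```

No single job name can start with both prefixes, so the AND version always produces an empty group, while the OR version picks up both kinds of task.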
This alert seems to come up regularly. If you search for it then this post from 2012 by Marnix Wolf comes up saying just to disable the alert.
I would have thought it would have been fixed by now. I am using version 126.96.36.199 of the SQL MP. I tried the fix from the alert help and it seemed to work. Then I pushed some more agents out and a couple of new alerts appeared from 2 more SQL Servers. I tried the setspn command again but this time it did not work and gave the following error:-
Unknown parameter MSSQLSvc/ServerFQDN:1433. Please check your usage.
After some searching I came across this page on Technet. “How to Configure an SPN for SQL Server Site Database Servers” http://technet.microsoft.com/en-gb/library/bb735885.aspx
It was not so much the article but a comment from Asif Shafi that provided the answer – “Do not copy and paste syntax from this article as it has invalid characters”.
He provided a couple of examples and I copied those and inserted my server and account details, and this time it worked straight away.
That was weird as I copied the original details to notepad first and then copied them into a command line. From my Notepad file I copied and pasted another server and got the same error. But when I typed it exactly as it was in Notepad it worked. I am perplexed how strange characters or formatting can get into Notepad. And I tried copying the faulty line into Notepad++ as well with the same results.
I did some further investigation and it turned out to be the dash in front of the A. So if you get this error try the command again but delete the dash and type the dash in again. That sorted it on the copy and paste I tried. In a few minutes the alerts disappeared. But at least I now know a way that works and clears the alert.
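A quick way to spot this kind of problem before running a pasted command is to look for any character outside plain ASCII. The sketch below is a minimal Python example. The account name is a placeholder, and the pasted dash is assumed to be an en dash (U+2013), which is a common substitution on web pages but was not confirmed for this case.

```python
import unicodedata

def find_suspect_characters(command):
    """Return (position, character, Unicode name) for every non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(command)
        if ord(char) > 127
    ]

# Hypothetical pasted command: the dash before the A is assumed to be an en dash.
pasted = "setspn \u2013A MSSQLSvc/ServerFQDN:1433 DOMAIN\\Account"
# The same command typed by hand with an ordinary hyphen-minus.
typed = "setspn -A MSSQLSvc/ServerFQDN:1433 DOMAIN\\Account"

for index, char, name in find_suspect_characters(pasted):
    print(f"position {index}: {char!r} is {name}")

print("typed command is clean:", find_suspect_characters(typed) == [])
```

The two commands look almost identical on screen, which is why retyping the dash fixes a line that appears to be correct.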
Note that if you run the command (with the right characters) again you will get the following error “Duplicate SPN found, aborting operation!” and your SPN will be left as is.
You may also see information on using ADSIEdit as per http://support.microsoft.com/kb/319723. I did not try this method but I have used this method before on the SCOM service as proposed by Kevin Holman.
I was working on a project where Orchestrator 2012 SP1 plays a big part in linking and automating the various parts of the solution which includes Service Manager, Operations Manager, BMC and Orion as well as custom databases. Therefore it was essential that Orchestrator is properly monitored. While Microsoft produce a MP for Orchestrator it is a bit of a disappointing MP (version 7.0. Version 188.8.131.52 has been released since I did this work but still does not monitor Runbooks) which basically monitors the services and a few bits and pieces but not Runbooks. The trouble with the Common Engineering Criteria is that, while it is a good thing that the Product Groups have to create an MP for their product, there is no definition of what it should contain and no quality bar. While some Product Groups are very good at creating a good MP some just go for the simplest tick in the box they can get away with. It is very disappointing when one of the System Center products falls into that category.
Fortunately there is an answer and that is the Infront Consulting System Center Orchestrator Management Pack. http://infrontconsulting.com/services/infront-software/.
This is a free MP that works on SCOM 2007 R2 and 2012. You do need to register to be able to download it and you need to provide the name of the Management Group. They use the MG name as part of the install process. I am not sure why they do that. Maybe at one stage they were going to sell it or they are just testing that technology to see how it works. It does mean that installation is a bit different from importing an MP but is quite quick and painless. Once I tested it in the test environment a quick e-mail with the names of the other management groups got us the files for use with each Management Group. The version that I am reviewing is 184.108.40.206 and this was run on SCOM 2012 SP1.
This MP is about monitoring the Orchestrator Runbooks. As these are the key parts of what makes Orchestrator work it is a great solution for monitoring your Orchestrator Runbook servers. Initially when installed it does nothing, which I like. The first thing is to create an override to discover the servers the Runbooks are running on. There is only one monitor that is enabled by default and that checks if there has been a failure with the Runbook. The next time the Runbook succeeds then the monitor will clear. And to be honest if that was all the MP did it would be a good enough reason to install it.
- Last Runbook execution succeeded.
- Runbook is executing
- Runbook is checked in.
There are other monitors and rules that are switched off by default. One of them is if a Runbook is checked out for a length of time (15 minutes by default). This is handy as Runbooks should not be checked out unless they are being updated. As the MP discovers the Runbooks they are listed as objects so individual Runbooks can be put into maintenance mode while work is being done on them.
Another monitor is checking that a Runbook that is supposed to be continuously running actually is. For this one you need to create a group using the attribute “Manually Triggered”. This means you can target the monitor at Runbooks which need to be constantly running rather than those that get called by another Runbook when needed.
As well as the rule and monitors there are a number of performance collection rules which makes it easy to see how frequently Runbooks are getting called and how long they take to run. This was an interesting one as some Runbooks were taking a long time and so need investigating.
- SC Orchestrator Runbook Instance Failed (With Suppression)
- SC Orchestrator Runbook Instance Failed (Without Suppression)
Rules – Performance Collection
- SCO Runbook Average Execution Time Collection Rule
- SCO Runbook Failed Instances Collection Rule
- SCO Runbook In Progress Instances Collection Rule
- SCO Runbook Success Instances Collection Rule
- SCO Runbook Total Finished Instances Collection Rule
- SCO Runbook Warning Instances Collection Rule
There are no built in reports but the fact that the counters are collected means that you can use the generic performance report to create your own.
Once running you soon realise that a naming standard is critical for your Runbooks. If you start getting more and more Runbooks being used, then when the alert comes up about a particular Runbook you will want to quickly find it. The same is true to see the Runbooks in Service Manager. You will see a list of Runbooks without the hierarchy so a good naming convention really helps. The sessions done by Pete Zerger and Anders Bengtsson at MMS 2013 are essential viewing for anyone involved in Orchestrator in my opinion.
Best Practices For Runbook Authoring and Managing Orchestrator
- Start Runbook
- Start Runbook with Parameters
A couple of handy tasks to save you having to logon to an Orchestrator server.
The one issue I have seen is with the alert that a Runbook is checked out. In the alert description it says the person that checked it out and the Runbook but the date and time is blank. The information is in the alert context. Looking at the alert context the date is in US format whereas the servers I was working on are in UK format. I have passed this information on to Infront.
Not an issue but it would be nice to have the group already created that discovers Runbooks that are set to continuously run.
The monitor “Last Runbook execution succeeded” resolves when the Runbook succeeds. This means that it creates an alert and quickly clears it if you have a busy Runbook. Therefore be sure to check the closed alerts to see if this is happening. The most common alerts report also highlighted this as an issue. It enabled us to find a problem with a particular Runbook. Most of the time it succeeded but occasionally it would fail. Checking the failures, we found that the way the alert description was formed occasionally meant the Runbook picked up the wrong information and failed when creating an incident, as the name was too long in those instances.
This is an essential MP if you are using Orchestrator and would like it to be properly monitored. The ability to get alerted on errors and on when Runbooks are not running or have been checked out is very useful as it is difficult to see that information in the Runbook console. The documentation is clear and easy to follow. Thanks to Infront Consulting for taking the time and effort to create this MP and release it for free. And shame on Microsoft for not doing something similar in the Orchestrator MP.
I was checking out some reports on the SQL Server MP as I had installed the new 220.127.116.11 version in a test system but still had 18.104.22.168 in production. When looking at group membership I noticed that there were a few SQL servers missing from the SQL Server 2012 DB Engine Group but that they were all shown as members of the group SQL Server 2012 Computers. But this was consistent for both versions of the MP.
The SQL 2012 DB Engine Group dynamic membership rule is
( Object is SQL Server 2012 DB Engine AND ( Version Matches wildcard 11.0.* ) AND True )
But all the SQL 2012 SP1 Servers were 11.1.3000. So this would never work. Initially I thought it was a bug in the MP but looking at version numbers according to Microsoft the SP1 version is 11.0.3000.00 which would mean that the group membership would work.
Even CU5 for SP1 is 11.0.3373.0.
This was puzzling as this did not match the version numbers of the customer. Searching on the web I saw a few more mentions of 11.1.3000.
But nothing to say why there were these two version numbers for the same product. The customer had Enterprise, Standard and Developer and all versions were 11.1.3000.0 if SP1 was installed.
On the SQL Server when I do
Microsoft SQL Server 2012 (SP1) – 11.0.3000.0 (X64)
Oct 19 2012 13:38:57
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)
Then I get 11.0.3000.0
But from SCOM Discovered Inventory
Display Name SCOMOPERATIONS
Full Path Name server.local\SCOMOPERATIONS
Instance Name SCOMOPERATIONS
Edition Standard Edition
I get 11.1.3000.0
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\SQLServer2012\CurrentVersion
It shows 11.1.3000.0
I tried looking at the SCOM discovery script for SQL Server 2012 DB engine but it goes on for pages and pages and pages and uses WMI. I am presuming if I found this registry key wrong then the script will find the wrong registry key wherever it is looking which it must do as it returns that value. So it looks like it is a SQL problem putting the wrong value into the registry and the MP developers were going by the documentation.
A search for 11.1.3000 finds that it is in these registry keys (and a lot more for SQL)
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Tools\Setup
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Tools\Setup\Client_Components_Full
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Tools\Setup\Client_Components_Full\1033
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Tools\Setup\SQL_SSMS_Adv
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\Tools\Setup\SQL_SSMS_Adv\1033
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSAS11.SCSM\Setup
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSAS11.SCSM\Setup\Analysis_Server_Full
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSRS11.SCOMOPERATIONS\Setup
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSRS11.SCOMOPERATIONS\Setup\RS_Server_Adv
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSRS11.SCSM\Setup
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSRS11.SCSM\Setup\RS_Server_Adv
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.SCOMDW\Setup\SQL_Engine_Core_Inst
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.SCOMDW\Setup\SQL_Engine_Core_Inst\1033
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.SCOMDW\Setup\SQL_FullText_Adv
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.SCOMOPERATIONS\Setup
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.SCOMOPERATIONS\Setup\SQL_Engine_Core_Inst
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\SqlDom\CurrentVersion
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\sqlls\CurrentVersion
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\SQLNCLI11\CurrentVersion
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\SqlWriter\CurrentVersion
- HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Microsoft SQL Server 2012 Redist\SQL Server System CLR Types\1033\CurrentVersion_64
A search for 11.0.3000 only finds a number of names under this one key
Name = Microsoft.AnalysisServices,fileVersion="11.0.3000.0",version="11.0.0.0000",culture="neutral",publicKeyToken="89845DCD8080CC91",processorArchitecture="MSIL"
All very confusing but if you have SQL 2012 SP1 then be aware that the group SQL 2012 DB Engine Group will not pick up any 2012 SP1 SQL DB engines.
If you need to have a group targeted at all SQL 2012 DB Engines then either use the SQL 2012 Computer group or create your own group with the formula:
( Object is SQL Server 2012 DB Engine AND ( Version Matches wildcard 11.* ) AND True )
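The effect of the two wildcards on the version numbers seen above can be checked with a few lines of Python. This is only an illustration of the wildcard logic, with `fnmatchcase` standing in for SCOM's wildcard matching. The labels and the helper name are made up for the example.

```python
from fnmatch import fnmatchcase

# Version strings mentioned in this post.
versions = {
    "documented SP1 version": "11.0.3000.0",
    "SP1 CU5 version": "11.0.3373.0",
    "version discovered by SCOM": "11.1.3000.0",
}

def matching(pattern):
    """Return the labels of all versions that match the wildcard pattern."""
    return [label for label, version in versions.items()
            if fnmatchcase(version, pattern)]

mp_group = matching("11.0.*")      # wildcard used by the MP's group
custom_group = matching("11.*")    # wildcard used in the custom group

print("11.0.* matches:", mp_group)
print("11.*   matches:", custom_group)
```

The narrower wildcard misses the 11.1.3000.0 value that SCOM actually discovers, while the wider one covers every 11.x version.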
The SQL team need to get their installation or documentation fixed and/or the MP team need to update the SQL MP.
Update Rollup 3 has arrived but not all products are updated. I have created this matrix to show what product has been updated with each rollup.
|Product||UR1||UR2||UR3|
|App Controller Setup Update||N/A||N/A||KB2823452|
|Data Protection Manager||KB2802095||KB2822782||KB2853210|
|Operations Manager – UNIX/Linux||KB2784734||KB2828653||KB2852565|
|Service Provider Foundation||KB2785476||N/A||N/A|
|VMM Administration Console||KB2792925||KB2826392||KB2858509|
|VMMr Guest Agent||N/A||N/A||KB2858511|
Description of Update Rollup 1 for System Center 2012 Service Pack 1 http://support.microsoft.com/kb/2785682
Description of Update Rollup 2 for System Center 2012 Service Pack 1 http://support.microsoft.com/kb/2802159
Description of Update Rollup 3 for System Center 2012 Service Pack 1 http://support.microsoft.com/kb/2836751
* Configuration Manager has separate updates and calls them Cumulative Updates
Description of Cumulative Update 1 for System Center 2012 Configuration Manager Service Pack 1 http://support.microsoft.com/kb/2817245
Description of Cumulative Update 2 for System Center 2012 Configuration Manager Service Pack 1 https://support.microsoft.com/kb/2854009
I am not sure why some products just have one KB per UR while others have multiple. And OpsMgr switches between both methods.
System Center is supposed to be heading towards being one product, with the old products (SCOM, SCVMM etc.) just being elements of that product, the way that Office went. But you can only license System Center as a suite. This presentation from MMS by Carmen Summers, Senior Program Manager, Microsoft is interesting and shows the way they are heading and the challenges that they face with this approach. Anyone who has tried to work out what SQL Collation is required for all the System Center components to co-exist on one SQL Server will know what I mean.
System Center 2012 SP1 Simplifications and Upgrades, MMS 2013 – http://channel9.msdn.com/Events/MMS/2013/SD-B203
I had a situation where the SCOM to SCSM alert connector (both 2012 SP1) had stopped several days ago, yet the CI Connector showed that it was still working. There was no obvious reason why the CI connector was still working when the alert connector was not. Looking at the SCOM side and showing the column Forwarding Status, I could see that all the alerts that met the criteria were showing as Forwarding Pending. This showed that the SCOM side was putting the alerts up for SCSM to pick up but SCSM was not picking them up. When I clicked the Synchronise Now button on the connector in SCSM the following event was generated.
Warning event 34070
Source – Operations Manager Connector
Error of type System.Xml.XmlException while reading configuration. The following information may help take corrective action:
Message: An error occurred while parsing EntityName. Line 1, position 958.
Subsequent error messages in the event log will indicate the affected connector. In order to correct this error, export the ServiceManager.LinkingFramework.Configuration management pack, correct the connector configuration, and then reimport the management pack.
The named MP (ServiceManager.LinkingFramework.Configuration) was exported and I looked at it in an editor. Line 1 was the normal MP start line and certainly did not have 958 characters. But down in the XML there was this section:-
<Name>Customer A &amp; B Incidents</Name>
<ComputerCriteria computerGroup="Customer A &amp; B" />
While the line number and character number in the event were no help, the event did point me to the correct MP to look in. I was wary as soon as I saw the ampersand, which is a reserved character in XML, and even though it was encased properly as &amp; I decided to try that first.
There were a number of Alert Routing Rules that had been created. These used groups to determine which template a particular alert should use when it comes in. One of these groups and the template had an ampersand (&) in the name. This Alert Routing Rule was deleted and the connector sprang back into life. The group and template were renamed without the ampersand and put back into the Alert Routing Rule table and all is fine now.
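Why an ampersand can break an XML parser, even when it was escaped at one stage, can be shown with a short Python sketch. This only illustrates the general XML behaviour and is a guess at the mechanism, not a confirmed description of what the connector does internally. The `Rule` wrapper element and the helper name are made up for the example.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A bare ampersand is not allowed in XML text.
raw = "<Rule><Name>Customer A & B Incidents</Name></Rule>"
# The same content with the ampersand escaped as an entity.
escaped = "<Rule><Name>Customer A &amp; B Incidents</Name></Rule>"

print("raw parses:", parses(raw))
print("escaped parses:", parses(escaped))

# Reading the escaped form gives back a plain '&'. If that text is later
# placed into another XML string without escaping it again, the second
# parse fails even though the first document was valid.
name = ET.fromstring(escaped).find("Name").text
reparsed_ok = parses(f"<Name>{name}</Name>")
print("decoded name:", name)
print("re-parse without re-escaping:", reparsed_ok)
```

Whatever the exact cause in the connector, keeping ampersands out of group and template names avoids the problem altogether.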