SCOM Sizing – an easy win?
One of the statements in the SCOM material is that it is more scalable. Well, that should be easy: the MOM 2005 scalability tests that set the supported limits were run on hardware that was old even for the time, so the 2000 agents per server and 4000 per management group should be easy to increase just by using modern hardware! Looking at the Performance and Sizing white paper that was done for 2005, I suspect they could double the numbers that way alone. I also suspect that they will have done some work to improve things anyway, and the use of 64-bit SQL may help the scalability of a management group.
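To put rough numbers on that, here is a minimal sketch of the arithmetic. It treats the two MOM 2005 figures quoted above as hard caps and ignores everything else that affects sizing (failover, database load, agent type), so the results are lower bounds at best. The function name `mom_sizing` and its parameters are my own, purely for illustration.

```python
def mom_sizing(agents, agents_per_server=2000, agents_per_group=4000):
    """Return (management_groups, management_servers) as rough lower bounds.

    Defaults are the MOM 2005 supported limits quoted in the text.
    Assumption: the limits are hard caps and nothing else constrains sizing.
    """
    if agents < 0:
        raise ValueError("agents must not be negative")
    # Integer ceiling division: -(-a // b) rounds up without using floats.
    groups = -(-agents // agents_per_group)
    servers = -(-agents // agents_per_server)
    return groups, servers


if __name__ == "__main__":
    # 10,000 workstations against the MOM 2005 limits.
    print(mom_sizing(10000))              # (3, 5)
    # The same estate if modern hardware simply doubled both limits.
    print(mom_sizing(10000, 4000, 8000))  # (2, 3)
```

Even a straight doubling of the limits takes a 10,000 agent estate from three management groups down to two, which is why the hardware alone looks like an easy win.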
For server monitoring it will not make much difference to many people, as very few organisations need to monitor more than 4000 servers. However, monitoring workstations is becoming more interesting, and that is where MOM 2005 had problems.
Not only are there scalability issues, which mean that you have to use multiple management groups above 4000 agents, but there are other issues as well:
• No workstation license – so costly
• Workstations are switched on and off more than servers, so heartbeat alerts are a problem
• No support for XP Embedded which is used for systems like POS tills (for a retail environment this is a critical application)
And if you did go down this route and installed multiple management groups, there is the problem of keeping the rules in sync. With SMS you install a package at the top and it filters down; MOM requires you to install the MPs in each management group. If you don’t, alerts forwarded to the top management group don’t work, because only a pointer is sent and not the alert itself. If the rule (with its GUID) does not exist there, there is no alert. That means every time you modify a rule you need to export it and then import it into every lower-level management group. Once you have solved that, you need to set up a system to get the reporting data into the top management group, as that does not happen automatically, although Microsoft has produced a solution accelerator that shows you how to do it.
So I am intrigued by the scalability of SCOM for workstations like POS tills or ATMs, as these are more business critical than normal PCs. Then you have to take into account the three parts of SCOM.
AEM (formerly CER) is based on Windows Error Reporting, which scales up to Internet numbers. As it is agentless, it is easy to deploy and to get large numbers of clients working. After all, you don’t expect 5000 PCs to all have a Dr Watson error and hit the AEM file share at the same time! I expect this figure will be bandied about as “demonstrating” SCOM scalability, especially by marketing.
The ACS agent is now part of the OM agent, although during the beta ACS had its own agent. ACS will be attractive to companies that want to monitor security events for compliance, so I can see organisations wanting to put it on PCs. Some initial figures I have seen are that a separate ACS server can deal with 100 DCs, 1000 servers or 10000 PCs. It looks like you just have to put in extra ACS servers to scale out. There has been no mention of how to get all that information into a single database for reports if you have scaled out. We will just have to wait until the testing is done to get the supported figures.
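As a back-of-the-envelope illustration of those initial figures, the sketch below estimates how many ACS servers a mixed estate might need. The big assumption, which is mine and not Microsoft's, is that the three figures combine linearly, so that one DC costs as much as 10 servers or 100 PCs. The function name `acs_servers_needed` and the load units are hypothetical, and the real supported numbers may look quite different once the testing is done.

```python
# Load units per machine, scaled so that one ACS server handles 10,000 units.
# Derived from the initial figures in the text: 100 DCs, 1000 servers or
# 10000 PCs per ACS server. ASSUMPTION: mixed loads simply add up.
CAPACITY_UNITS = 10000
UNITS_PER_DC = 100      # 10000 / 100 DCs
UNITS_PER_SERVER = 10   # 10000 / 1000 servers
UNITS_PER_PC = 1        # 10000 / 10000 PCs


def acs_servers_needed(dcs=0, servers=0, pcs=0):
    """Estimate the number of ACS servers for a mixed estate (rough sketch)."""
    if min(dcs, servers, pcs) < 0:
        raise ValueError("machine counts must not be negative")
    load = dcs * UNITS_PER_DC + servers * UNITS_PER_SERVER + pcs * UNITS_PER_PC
    # Integer ceiling division, so a partly used ACS server still counts as one.
    return -(-load // CAPACITY_UNITS)


if __name__ == "__main__":
    # A retailer with 20 DCs, 300 servers and 25,000 tills and PCs.
    print(acs_servers_needed(dcs=20, servers=300, pcs=25000))  # 3
```

Note that this only counts ACS servers. It says nothing about consolidating their data into one database for reporting, which is the part that has not been explained yet.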
OM itself will be the bit you want if you need to alert on event logs and performance counters, and if it can handle workstation heartbeats differently from server heartbeats it should be able to scale up to some reasonable numbers – I hope. As the ACS agent is included in the OM agent, OM needs to scale out for that reason alone.
Having written this, although Microsoft are bundling SCOM 2007 as one product, the way I ended up writing about it makes it clear that it is really three separate products. This fits in with the article I did a while back on SCOM architecture, and I think that organisations should look at SCOM as three separate products.