Blueprint for a Big SCOM Design

According to the testing done, a single management group can support up to 5,000 agents. If those are all servers, then not many organisations will need multiple management groups, and those that do will be big enough to hire specialist consultants!

From the Design Guide, Deployment Guide, Security Guide and Performance and Scalability Guide I have come up with what I consider to be the blueprint design for Operations Manager to monitor from 2,000 to 5,000 servers. The first thing is that if that many servers are being monitored, I am assuming you would want a fault-tolerant design.

In Operations Manager 2007 both the database and the RMS (Root Management Server) are single points of failure and so need to be clustered. It would be nice to have them on a single cluster, but at the moment that is not a supported option, although Microsoft may support it in future. So two separate clusters are needed.

To scale up you will want x64 Windows for the database and RMS servers, and 64-bit SQL Server, although I have seen posts from people running 32-bit SQL on 64-bit Windows. After some discussions with people from Microsoft IT and the Program Manager responsible for testing, non-RMS management servers can be 32-bit and have the potential to be virtual. For performance you will want 8 to 16 GB of memory on both the RMS and SQL servers. You can start with 4 GB, but the reason for going 64-bit is to use memory above 4 GB. Obviously database I/O is crucial to performance, and SAN disks have to be organised accordingly.

On the RMS server the SDK service is responsible for providing a communication layer between the OperationsManager database and the rest of the Management Group. The Config Service is responsible for calculating the configuration of all agents, which Management Packs they should receive, and the overall configuration of the Management Group. To allow the RMS to deal with these tasks, and with all the security requests from consoles and web consoles which get funnelled through it, it is recommended that no agents are assigned to the RMS. To deal with agents, three management servers are used, which provide the scale and failover.
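As a rough illustration of why three management servers give both scale and failover, the sketch below spreads the agents as evenly as possible over whichever management servers are still running. The function name and the even split are my own assumptions for the sake of the arithmetic; how agents really redistribute depends on how failover is configured.

```python
def agents_per_server(total_agents, servers_up):
    """Split agents as evenly as possible over the management servers still running."""
    if servers_up < 1:
        raise ValueError("at least one management server must be running")
    base, extra = divmod(total_agents, servers_up)
    # The first `extra` servers carry one more agent than the rest.
    return [base + 1 if i < extra else base for i in range(servers_up)]


if __name__ == "__main__":
    # Normal running (3 servers) and with one management server lost (2 servers).
    for up in (3, 2):
        print(up, "servers up:", agents_per_server(5000, up))
```

The point to take away is that each of the three servers has to be sized for the load it would carry with one of its peers down, not just for its share on a good day.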

Finally, I would keep the Data Warehouse on a separate server (preferably x64). It does not need to be clustered, as it deals with long-term reports which are not usually deemed to be mission critical, with one caveat: the management servers now insert data directly into the data warehouse database, so if it is offline for a period of time then some data may be lost.

For a small number of servers, SCOM 2007 on a single box will do. When you get from a few hundred servers to a few thousand, there are many decisions to be made and consulting an expert will help. But for 5,000 agents I think this is a good blueprint.
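To pull the pieces together, here is the blueprint written down as a small data structure, with a check for roles that are neither clustered nor duplicated. This is only a sketch: the `Role` class, its field names and the helper function are hypothetical, and a cluster is counted as one instance of its role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    count: int          # instances of the role (a cluster counts as one)
    clustered: bool
    architecture: str
    holds_agents: bool


# The roles as described in this post.
BLUEPRINT = [
    Role("OperationsManager database", 1, True, "x64", False),
    Role("Root Management Server (RMS)", 1, True, "x64", False),
    Role("Management server", 3, False, "32-bit possible, potentially virtual", True),
    Role("Data Warehouse", 1, False, "x64 preferred", False),
]


def single_points_of_failure(roles):
    """Return the names of roles that are neither clustered nor deployed more than once."""
    return [r.name for r in roles if not r.clustered and r.count < 2]


if __name__ == "__main__":
    print(single_points_of_failure(BLUEPRINT))
```

Run against this blueprint, the only role flagged is the Data Warehouse, which is the caveat accepted above.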

[Diagram: 5,000-agent management group with Data Warehouse]

Note that I have not put in all lines of communication as the diagram gets messy. Also, as per a previous post, I am treating the product as three separate products, but with this in place it is possible to add ACS collectors and databases as needed. As I am only looking at servers I have not included AEM. This also assumes all servers are in a single forest; for a DMZ, a certificate server and perhaps a Gateway server need to be considered.



  1. Ian

    What about having those 5000 agents distributed?

    Multiple sites with 100+ servers per site? Data line speed 512 Kbps to 2 Mbps?

