Archive for August 2006

What is in a name?

August 31, 2006

Since Microsoft bought Operations Manager from NetIQ it has been known affectionately as MOM. With the next version being called System Center Operations Manager 2007, the obvious shorthand is SCOM, but that does not have the same ring to it as MOM does. While working with a Microsoft Technical Specialist I noticed that he went out of his way not to use SCOM, as he did not like that acronym, and instead referred to it as Ops Mngr or OM.

As the new version comes out, what will people start calling it? Will it be OM, Ops Mngr, Operations Manager or SCOM? I don’t think many people will be saying the full name! I have generally called it SCOM although that does not sound great. I think that people will still refer to it as MOM for a long time yet. MOM is a nice friendly sounding name.

Fault Tolerant MOM 2005

August 31, 2006

Reading an article about the new dual core PCs with dual graphics cards brought to mind a question that frequently comes up: should MOM be made fault tolerant? Some organisations say no, as it is not a business application and so does not matter. Others say yes, because if MOM goes down their view of the business applications goes down with it and they are blind to any problems. The answer breaks down into two parts: what is your attitude to risk, and what is your budget? If you do not have the budget for an extra MOM server and an extra SQL Server, plus the software licences (moving to a SQL cluster requires the Enterprise Edition of both Windows and SQL), then it is a no-brainer. No budget – no fault tolerance.

If you do have the budget then it is down to risk. If you rely heavily on MOM to notify you of problems with Exchange, AD and SQL, have created custom management packs for your in-house applications and are using MOM proactively, then your exposure to any downtime on the MOM server is high. If your use of MOM is light and you are not dependent on it (yet – this is usually in the early days of an installation, before people see what MOM can really do) you can use a non fault tolerant system and rely on the improved uptime of the modern versions of SQL and Windows Server.

For fewer than 4,000 agents (the limit of a Management Group) the options are as follows.

Non fault tolerant

  1. All on one box (only good up to roughly 200 agents)
  2. One MOM management server and one SQL Server (up to 2,000 agents)
  3. Two MOM management servers and one SQL Server (up to 4,000 agents)

Although the last option appears to be fault tolerant as there are 2 MOM servers, it is not: each MOM server is only supported up to 2,000 agents, so if one server fails the survivor would have to carry all 4,000. Does this mean that it won’t work? With modern hardware it may well cope, as the performance testing was done on relatively old hardware, but if you phone Microsoft for support they will tell you that you are running an unsupported installation.

Fault tolerant

  1. Two MOM management servers and a SQL Cluster (up to 2,000 agents – if one MOM server dies then the whole load goes to the other and that can only cope with 2,000 agents)
  2. Three MOM management servers and a SQL Cluster (up to 4,000 agents – if one MOM server dies then the 4,000 agents split across 2 MOM servers which can cope with 2,000 agents each)
  3. Use DTS. This avoids having to use a SQL cluster and is explained comprehensively in the Service Continuity Solution Accelerator. But this is more a hot standby solution than true fault tolerance.
  4. Over the top fault tolerant. I came up with this design for a bank that was very keen on fault tolerance! Have two management groups (either as in bullet 1 for up to 2,000 agents or bullet 2 for up to 4,000 agents) in two separate data centres and dual home the agents so that each management group is fault tolerant and if a whole management group (data centre) goes down then the other one carries on working. An analogy would be RAID 10. Very fault tolerant – very expensive!
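The sizing rule behind the fault tolerant options can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the supported limits quoted above (2,000 agents per management server, 4,000 per management group); the function names are made up for the example and are not part of MOM.

```python
AGENTS_PER_SERVER = 2000  # supported agents per MOM management server (figure from the post)
AGENTS_PER_GROUP = 4000   # supported agents per management group (figure from the post)


def supported_agents(management_servers: int, tolerate_failures: int = 0) -> int:
    """Agents a management group can carry while surviving the given number of server failures."""
    surviving = management_servers - tolerate_failures
    if surviving < 1:
        return 0
    # The surviving servers must carry the whole load, capped by the group limit.
    return min(surviving * AGENTS_PER_SERVER, AGENTS_PER_GROUP)


def servers_needed(agents: int, tolerate_failures: int = 1) -> int:
    """Management servers needed for the agent count plus spare capacity for failures."""
    if agents > AGENTS_PER_GROUP:
        raise ValueError("more than one management group is required")
    carrying = max(-(-agents // AGENTS_PER_SERVER), 1)  # ceiling division, at least one server
    return carrying + tolerate_failures


print(supported_agents(2, tolerate_failures=1))  # two servers, one may fail
print(supported_agents(3, tolerate_failures=1))  # three servers, one may fail
print(servers_needed(4000))
```

With one tolerated failure, two servers cover 2,000 agents and three servers cover 4,000, which matches options 1 and 2 in the list above.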

Another option is to use SQL Log Shipping, but the Service Continuity SA excluded it because:

  • A single failure in the log-shipping mechanism will force a full re-synchronization of the primary and alternate MOM database, which is both labor- and time-intensive.
  •  Log shipping will transport all data including statistical information, which increases the WAN bandwidth requirements.

For more than 4,000 agents you are into multiple management groups. The number of companies in that position is small, and they can afford to bring in consultants to help them.

I have started to look at SQL 2005 and there are more options, but I am still getting to grips with them. SQL 2005 brings in database mirroring, which allows almost the same level of redundancy as clustering without having to have a cluster. See http://www.microsoft.com/sql/prodinfo/overview/whats-new-in-sqlserver2005.mspx for more information. Clustering supports up to 8 nodes (Enterprise Edition) so you can have seven active and one passive. See the table at the bottom of that page to see which editions support which features, and the prices. A cut down version of the table highlighting the fault tolerant features:

  • Express. Pricing: free. Key features: Replication and SSB Client.
  • Workgroup. Pricing: $3,900 per processor or $739 (server + 5 users). Key features: Limited Replication Publishing; Back-up Log Shipping.
  • Standard. Pricing: $6,000 per processor or $2,799 (server + 10 users). Key features: Database Mirroring; Full Replication and SSB Publishing; Clustering (supports two nodes).
  • Enterprise. Pricing: $25,000 per processor or $13,500 (server + 25 users). Key features: Advanced database mirroring, complete online and parallel operations, and database snapshot; Clustering up to the limit of the operating system.

So by using Database Mirroring you can get most of the advantages of clustering. But why not use a 2 node cluster, as that is also included in the Standard Edition? Because a cluster needs Windows 2003 Enterprise Edition and hardware that is supported for clustering, whereas Database Mirroring works on standard server hardware and requires no special storage or controllers. There is a good media show of the two at http://www.microsoft.com/sql/prodinfo/demo/wss-refarchdesign-demo.mspx.

And as always make sure you have a good backup!

Microsoft Softricity

August 23, 2006

As Microsoft has now purchased Softricity (http://www.microsoft.com/presspass/press/2006/jul06/07-17SoftricityPR.mspx) I thought I should take a look at it. If it works as well as they show on the web and in their demos then I wonder why any company would not want to use virtualised application deployment rather than deploying apps the traditional way or using Terminal Server farms (although Softricity can work on those and help with apps that can not normally run on TS).

It creates a sandbox called the SystemGuard Environment that the virtualised application runs in, so it can not write to the registry or other files but can read them, and the application can still cut and paste as if it were installed locally. It reminds me of the Java Virtual Machine, except that any Windows application can be run through the sequencer program. The application can be delivered from a server and they say that only 20% to 40% of the app needs to be downloaded for it to work. It is then cached so that the user can run it next time, which is great for laptops, which have always been Terminal Server’s Achilles heel. Well, until 3G is cheap and pervasive.

The advantages are:

  • No application testing for conflicts as the application is sandboxed
  • Can create a simple single OS image for deployment and stream apps afterwards
  • Reduced application conflicts so reduced helpdesk calls
  • Improved security as application sandboxed
  • Centralise control and reporting
  • Smaller amount of data to transmit to get app working
  • Can be cached for laptop users
  • Multiple versions of applications that clash can be run

There is integration with SMS (http://www.softricity.com/products/softgrid-sms.asp), so you can push the virtualised application to an SMS client in the same way that you would a normal application, or push just the code needed to get the application running and let the client pull down the rest as it needs it.

There is good collateral at http://www.softricity.com/products/howitworks.asp on how it works, and the PDF at http://www.softricity.com/news/ecollateral/public/The-Softricity-Desktop.pdf gives a good overview of the product, how it works and its advantages.

The only thing I am not sure of is how it is to be licensed. Will Microsoft give it as part of SMS, which would be a good deal for SMS customers, or will they continue to sell it as a separate product? And if so for how much? In any case it looks like an amazing technology that Microsoft has bought.

Now Microsoft has virtualisation technologies for the OS with Virtual Server and Virtual PC and for applications with Softricity.

Problems with MOM and McAfee VirusScan Enterprise 8.0i – specifically ScriptScan.dll.

August 14, 2006

If you are getting a lot of failed scripts with MOM 2005 and are also running McAfee VirusScan Enterprise 8.0i then it is a well known problem that has been buzzing about for a while.

A typical symptom is “The remote procedure call failed” error with event 21245.
KB article 890736 – 14 April 2006
http://support.microsoft.com/kb/890736/en-us

The KB states

Patch 11 for McAfee VirusScan Enterprise 8.0i corrects the problem discussed in this Microsoft Knowledge Base article. For more information, visit the McAfee support Web site: http://knowledgemap.nai.com

Note On the McAfee support Web site, search for Solution ID kb40049 for more information about Patch 11. Also, if you currently experience the problem that is described in the Microsoft Knowledge Base article 891605, “Event 21246 is logged on an agent computer, and you receive an error message in the Microsoft Operations Manager (MOM) 2005 Operator Console,” McAfee Solution ID kb40067 describes the same problem. McAfee VirusScan Enterprise 8.0i does not fix the memory leak that is referenced by these articles. Also, McAfee Solution ID kb47302 describes an issue that is related. 

Actually finding the stuff on the Network Associates web site is another matter!

Patch 11 does not cure the problem and on their web site they admit that due to the architecture of 8.0i there will never be a fix! Although later on in the same article they claim that patch 13 fixes it!

Claiming patch 12 fixes it
http://knowledge.mcafee.com/SupportSite/search.do?cmd=displayKC&docType=kc&externalId=KB40049&sliceId=SAL_Public&dialogID=2123018&stateId=0%200%202121540

Coming clean
http://knowledge.mcafee.com/SupportSite/search.do?cmd=displayKC&docType=kc&externalId=KB40067&sliceId=SAL_Public&dialogID=2123018&stateId=0%200%202121540

Actual message:-
“McAfee is aware of an issue where the loaded module for ScriptScan, ScriptProxy.dll, can leak pageable memory. Because of the ScriptScan architecture, the leak cannot be addressed in VirusScan Enterprise 8.0i. Therefore, if this issue is experienced under mission critical conditions, such as on a server, it may be necessary to unregister the ScriptScan module.

The fix for this is now available in Patch 13, which can be downloaded from the Service Portal.”

On this site they also explain that this component was intended for client workstations, to help with Outlook and IE exploits, and was never designed for servers. Organisations should not be running Outlook and IE on servers anyway. This should reassure organisations who are a bit worried about turning it off.

From the notes in that article:-

“When installed to a server, McAfee recommends that ScriptScan be disabled. Jscript and VBScript protection is intended for use with Microsoft Internet Explorer and Microsoft Outlook, which generally are not used on server platforms. Additionally, ScriptScan is not designed for high-throughput requirements of servers. Despite having On-Access Scanner protection, there is some risk in disabling ScriptScan. The On-Access Scanner detects malicious script attacks when the script, or it’s activity, accesses the file system. However, not all scripts must interact with the file system to become a hindrance or modify system settings. ScriptScan would block those malicious scripts from executing.”

If you are seeing a lot of 21245 errors then you need to unregister the DLL.
KB 890736 tells you how to do this manually but supposedly Patch 11 onwards allows you to do it from the VirusScan console. I have not tried this as the ePolicy Orchestrator that I saw was not up to this level.

To work around this problem, you must unregister the ScriptProxy.dll component. To do this, follow these steps.

Important: When you unregister the ScriptProxy.dll component, McAfee VirusScan software does not check any scripts for viruses.

1. Use an account that has domain administrator permissions to log on to the Windows Server 2003-based domain controller.
2. Click Start, click Run, type cmd, and then click OK.
3. At the command prompt, locate the %ProgramFiles%\Network Associates\VirusScan folder.
4. At the command prompt, type regsvr32 /u scriptproxy.dll.
5. You must restart the MOM service to apply the changes. To do this, follow these steps:

a. Click Start, point to Administrative Tools, and then click Services.
b. In the Services snap-in, right-click MOM, and then click Restart.
c. Close the Services snap-in.

Note: If unregistering the Scriptproxy.dll component does not work around this issue, disable the McAfee ScriptScan by using the McAfee Configuration Console.
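If you have to repeat the workaround on several servers, the manual steps could be scripted. The sketch below is only an illustration of steps 3 to 5 and is not taken from the KB article: the function names are hypothetical, it assumes the service's short name is MOM (the name shown in the Services snap-in), and it defaults to a dry run that only prints the commands. Test it on a non-production server before letting it run anything.

```python
import ntpath
import subprocess
from typing import List

DLL_NAME = "scriptproxy.dll"


def build_workaround_commands(program_files: str, service_name: str = "MOM") -> List[List[str]]:
    """Return the commands for the manual workaround without running them."""
    # ntpath always joins with backslashes, so the result can be inspected on any platform.
    dll_path = ntpath.join(program_files, "Network Associates", "VirusScan", DLL_NAME)
    return [
        ["regsvr32", "/u", dll_path],    # step 4: unregister the ScriptScan proxy DLL
        ["net", "stop", service_name],   # step 5: restart the MOM service...
        ["net", "start", service_name],  # ...so that the change is applied
    ]


def apply_workaround(program_files: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them in order, stopping on the first failure."""
    commands = build_workaround_commands(program_files)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


apply_workaround(r"C:\Program Files")  # dry run: prints the three commands only
```

Passing dry_run=False on the server itself, from an account with administrator rights, would execute the same three commands that the steps above describe.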

Also a problem with MOM 2000 and McAfee

9015 and 9014 events in MOM 2000
KB article 891604 – 14 April 2006
http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B891604

Citrix Management Pack

August 13, 2006

When deploying the Citrix MP I have often seen a lot of WMI errors. I recently came across this thread on the Citrix site.

http://support.citrix.com/forums/thread.jspa?forumID=24&threadID=60156&tstart=0

 

The key bit of the thread is to run dscheck (Data Store Validation Utility) and if there are problems reported run dscheck /clean.

 

Another useful tip in the thread is not to use Local System as the agent action account as you need the agent action account to be a Citrix Administrator.

System Center Virtual Machine Manager beta 1

August 13, 2006

Well, beta 1 has only just been released but there are already some good posts on it. I did say in my earlier post that virtualisation management will be a big thing and SC VMM is Microsoft’s answer.

Clive Watson has been busy with three posts on VMM with some good screen shots showing the features.

http://blogs.technet.com/clive_watson/archive/2006/08/08/445532.aspx
http://blogs.technet.com/clive_watson/archive/2006/08/11/446016.aspx
http://blogs.technet.com/clive_watson/archive/2006/08/12/446025.aspx

The Virtual PC Guy’s blog has links to the WinHEC keynotes, which have demos.
http://blogs.msdn.com/virtual_pc_guy/archive/2006/05/25/607780.aspx

You can apply for beta 1 at http://connect.microsoft.com. Read the post at http://blogs.msdn.com/virtual_pc_guy/archive/2006/08/07/691345.aspx first.

Read Tony Soper (Enterprise Management Product Group at Microsoft) for info on VMM.
http://blogs.technet.com/tonyso/archive/2006/08.aspx
And especially the documentation at http://blogs.technet.com/tonyso/archive/2006/08/07/445457.aspx

And thanks to Eileen for pointing out David’s blog on all things virtual including VMware and Xen.
http://vmblog.com/

MOM 2005 and ITIL – Part 3

August 3, 2006

Capacity Management 

The goal of Capacity Management is to understand the future business requirements and ensure that all current and future capacity and aspects of the infrastructure are provided cost effectively to support those objectives. 

Capacity Management tasks 

  • Monitoring the performance and the throughput of IT services and supporting IT components.

  • Tuning activities to make efficient use of resources.

  • Understanding the current demands for IT resources and deriving forecasts for future requirements.

  • Producing a Capacity Plan predicting the IT resources needed to achieve agreed service level agreements.

Out of the box most management packs will monitor a number of performance counters. The product groups creating the management packs have done the work to say which perfmon counters are useful to monitor. Within the console it is possible to view a graph of these performance counters to see that everything is running fine. More importantly MOM allows thresholds to be set when counters, average of counters over a number of samples or differences between samples are greater or less than a value so that operators can be alerted and do not have to monitor graphs. 
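To make the three kinds of threshold concrete, here is a small Python sketch of the checks described above: a single value, an average over a number of samples, and the difference between samples. It is an illustration only, not how MOM evaluates its rules internally, and the function names and sample data are invented for the example.

```python
import operator
from statistics import mean
from typing import Callable, Sequence

Compare = Callable[[float, float], bool]


def check_value(samples: Sequence[float], limit: float, compare: Compare = operator.gt) -> bool:
    """Alert when the latest sample is greater (or, with operator.lt, less) than the limit."""
    return compare(samples[-1], limit)


def check_average(samples: Sequence[float], limit: float, window: int,
                  compare: Compare = operator.gt) -> bool:
    """Alert when the average of the last `window` samples crosses the limit."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = samples[-window:]
    # Do not alert until a full window of samples has been collected.
    return len(recent) == window and compare(mean(recent), limit)


def check_delta(samples: Sequence[float], limit: float, compare: Compare = operator.gt) -> bool:
    """Alert when the change between the last two samples crosses the limit."""
    return len(samples) >= 2 and compare(samples[-1] - samples[-2], limit)


cpu = [40, 55, 92, 95, 97]         # hypothetical % Processor Time samples
print(check_value(cpu, 90))        # latest sample against a limit of 90
print(check_average(cpu, 90, 3))   # average of the last three samples
print(check_delta(cpu, 10))        # jump between the last two samples
```

Averaging over several samples is what stops a single spike from raising an alert, which is why the operators do not have to watch the graphs themselves.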

With MOM 2005 came the built-in data warehouse that could be used with SQL Reporting Services. This was a big leap from MOM 2000, which had an Access front end for reports that were hard to create, and you had to create your own data warehouse. While it is recommended that you keep the MOM database at less than 30 GB, the data warehouse is supported up to 1 TB and out of the box the settings will keep data for 13 months. The first step in the process is to decide how long it is useful to keep the data for.

The management packs include numerous reports covering a range of performance counters, and there is also a generic performance counter report that will graph any performance counter that you have decided to collect. These reports enable the organisation to determine whether there are servers that are under utilised and can be consolidated, removed or virtualised. They also allow the organisation to see which servers are nearing capacity and need upgrades. By looking at the graphs over time you can also spot the trend for servers that will run out of capacity in future and so put in an order, with the MOM report helping to create the business case.
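Spotting a trend by eye can be backed up with a simple calculation on exported report data. The sketch below fits a straight line through daily values by least squares and estimates how many days remain before a limit is reached. It is a deliberately naive illustration, assuming one evenly spaced sample per day and roughly linear growth; the function name and figures are hypothetical and this is not something MOM 2005 does for you.

```python
from typing import Optional, Sequence


def days_until_limit(daily_values: Sequence[float], limit: float) -> Optional[float]:
    """Estimate days from the last sample until a least-squares trend line reaches the limit.

    Returns None when there are fewer than two samples or the trend is flat or falling.
    """
    n = len(daily_values)
    if n < 2:
        return None
    mean_x = (n - 1) / 2                      # samples are taken on days 0, 1, ..., n-1
    mean_y = sum(daily_values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (limit - intercept) / slope  # day on which the fitted line hits the limit
    return max(crossing_day - (n - 1), 0.0)


disk_used_percent = [50, 52, 54, 56, 58]      # hypothetical daily averages
print(days_until_limit(disk_used_percent, 100))
```

A figure like this, alongside the report graph, gives a rough lead time for the purchase order rather than a precise forecast.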

While MOM 2005 can not take the information in the data warehouse and predict the trend, Microsoft is working on doing this in future, specifically by using the SCOM data warehouse with a future version of System Center Capacity Planner.

Availability Management 

Availability Management is also a contender for MOM. I have not added a section on this as the Availability Management pack seems to be unavailable at the moment. This is a shame, as practically any organisation that you show the reports to likes them. According to Clive Eastwood’s blog, though, it should be back by the end of August. http://blogs.technet.com/cliveeastwood/archive/2006/07/21/442795.aspx

Conclusion 

That does not cover all of ITIL, but it shows how you can start an ITIL project with MOM 2005 as the focus. This will help determine how you proceed with the ITIL project, and it also adds clarity about what you expect MOM to deliver to the organisation.

One of the nice things that I like about ITIL is that it gives everyone a common language. So when someone talks about a problem in ITIL terms everyone knows what that means whereas in organisations that do not use ITIL some people will be talking about an incident, others about an ITIL problem and others about issues in general. 

One recommendation is to follow the Keep It Simple rule and not create vast swathes of documentation and processes that no one will read or follow. Remember it is People, Processes and Technology. The people come first. Make sure they are involved in the project and one thing I have learned from the many types of projects that I have done over the years – you can never over communicate! 

SCOM 2007 Architecture

August 1, 2006

One of the interesting things about System Center Operations Manager 2007 is that it is not just MOM v3. It is also Corporate Error Reporting (CER) v3, now called Agentless Exception Monitoring (AEM), and Microsoft Audit Collection Services (MACS or sometimes ACS). The original beta of MACS was called Distributed Audit Database (DAD) so you could have MOM and DAD. The less said about that the better. While MACS was never released as a product, the beta has been deployed by a number of organisations, although it is not in the beta 2 build of SCOM 2007. So SCOM 2007 is actually three products rolled into one. All three have different histories and different ways of working, which reminds me of the early versions of Office. This should make the architecture of SCOM interesting though.

MOM was designed to monitor event logs and performance counters and create an alert. So it was designed not to collect data but to filter it. The exception was the collecting of data for reports, and that went into the data warehouse. One of the new features of SCOM is that data goes straight to the data warehouse from the OM server and does not go into the SCOM operations database, which means you no longer have that pesky DTS job to take the data out of one database and put it in another. This job was run by that well known enterprise tool – Scheduled Tasks in the Accessories part of the Windows menu! If that job did not run then grooming in MOM 2005 would not take place. The other advantage is that reports will be more up to date rather than waiting for the overnight DTS job.

SCOM requires SQL 2005 as well as a number of other prerequisites:

  • Windows 2003 SP1
  • AD
  • WinFX runtime (beta 2 at the moment)
  • MSXML 6.0 (installed by setup)
  • .NET Framework v2
  • MDAC 2.8
  • PowerShell (in beta at the moment)

CER was designed to help organisations collect data to analyse Dr Watson errors. CER v2 was available to Software Assurance customers only. Rather than the user sending the Dr Watson errors to Microsoft, they would be sent to a local server, and a policy could be put in place to allow this to happen automatically without user intervention, something that Microsoft itself would not be allowed to do. The organisation could still voluntarily send the data to Microsoft for analysis, which would help Microsoft develop fixes quicker. In return the organisation would get back information on fixes to known problems.

The data is stored on a local file share. This is still the case with AEM, except that the data must be stored on a SCOM server so that it can be analysed and reported on. The problem is that you can not create the file share using DFS for fault tolerance across multiple SCOM servers, as each SCOM server would analyse each file share. So if you had 2 SCOM servers it would look like you had twice as many errors, and if you had 3 SCOM servers it would look like you had three times as many. Good if you want to show your boss you need more budget to fix the problems, but otherwise a pain. The other option is to create a DFS share on a non SCOM server so that if the SCOM server does go down for some reason then AEM data is still gathered. At the moment there is no official word on whether DFS would be supported for this scenario but I can not see how it would be a problem.

Most people will never have come across MACS, although it has been around inside Microsoft for a while and there were presentations at one TechEd that I know of. There were a number of discussions on how it was going to be released and in what format before they decided to bundle it as part of SCOM 2007. The reason for MACS is to help in the wake of legislation like Sarbanes Oxley (SOX), which affects more than just US firms, although HIPAA, Basel II and others also impose regulatory requirements on some organisations. MACS collects security events (and no other events) from servers and workstations in a secure manner and transports them to a SQL database for storage and analysis. (Only in SCOM are workstations seriously considered for MOM type monitoring, which may also affect the design and scalability of the system.) The point was to have separation from the IT team, who are generally administrators on most systems, but if it is bundled with SCOM then those are the very people that MACS was supposed to be separated from.

MACS is different from MOM as it was designed to collect events. Also, while a MOM event is usually actionable as an event (e.g. disk almost full, SMTP queue over a certain threshold, service stopped etc), security events really need to be correlated, and it is the unique information inside the event that is important rather than the actual event per se. MACS does not store the whole event but the unique bits with pointers, for efficiency. One of the key findings from the beta is that the database can be a bottleneck due to the number of insertions. In which case, would you want that database on the same server as your OM database monitoring your key systems? Initially MACS was designed as the collection and storage system and it was going to be up to third parties and organisations to create front ends and reports. Whether this is still the case now that it is part of SCOM I do not know, but organisations that are used to MOM and out of the box reports will not be pleased if there are no reports or at least some samples.
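The idea of storing only the unique bits with pointers can be illustrated with a small Python sketch. This is not the MACS schema, which I have not seen documented; it is an assumption-laden toy in which repeated strings (account names, machine names) are stored once in a lookup table and each event keeps only integer references to them. The class name, fields and sample values are all invented.

```python
from typing import Dict, List, Sequence, Tuple


class EventStore:
    """Toy store that keeps each distinct string once and events as references to them."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}   # string -> reference number
        self._strings: List[str] = []    # reference number -> string
        self._events: List[Tuple[int, Tuple[int, ...]]] = []

    def _intern(self, value: str) -> int:
        # Store a string the first time it is seen; reuse its reference afterwards.
        if value not in self._ids:
            self._ids[value] = len(self._strings)
            self._strings.append(value)
        return self._ids[value]

    def add(self, event_id: int, fields: Sequence[str]) -> None:
        self._events.append((event_id, tuple(self._intern(f) for f in fields)))

    def get(self, index: int) -> Tuple[int, List[str]]:
        event_id, refs = self._events[index]
        return event_id, [self._strings[r] for r in refs]

    @property
    def unique_strings(self) -> int:
        return len(self._strings)


store = EventStore()
store.add(528, ["CONTOSO\\alice", "SRV01"])  # hypothetical logon events
store.add(528, ["CONTOSO\\alice", "SRV02"])
print(store.unique_strings)  # the account name is stored once, not twice
print(store.get(1))
```

Even in this toy version every event still costs an insert, which fits the beta finding that the database becomes the bottleneck when security events arrive in volume.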

At this stage there is no guide from Microsoft on the best practices of setting up SCOM 2007 with all three components. That type of information will come later. But you have one component (AEM) that stores information on a file share that then gets analysed and put in a SQL database. OM itself stores its data in a SQL database that is designed to be groomed. OM also has the data warehouse for long term storage, and MACS has its own database for storage, optimised for security events. And this is before you even take into account the requirements for fault tolerance and failover. Then if you have SMS and want OM, SMS and AD to feed into System Center Reporting Manager you will need to work out the best way to do that. And that is before sending the information to a service desk or a manager of managers. As I said, designing a SCOM 2007 architecture should be interesting. I just hope that Microsoft publishes some good guidelines and best practices when it is released.

