RMS and Time
I was about to write a post about the RMS going grey when this post came up –
OpsMgr 2007: The Health of the Root Management Server is in a Gray “Not Monitored” State
While it describes the same symptoms, it is not the same problem. What I was seeing was the RMS health service on a new install going critical every minute and then being resolved by the system. That in itself was very strange, as the alert was created by a rule and I had always believed that only alerts from monitors can be auto-resolved. Looks like I was wrong on that.
The alert was “Root Management Server Unavailable” and the description was “The root management server (Healthservice) has stopped heartbeating soon after 8/28/2008 4:10:08 PM. This adversely affects all availability calculation for the entire management group”. There was nothing in the Alert Context. The RMS would flick between grey and green. The auto-resolved view I had created showed that this alert was firing every minute. (That view is very useful for finding problems that keep coming and going without you noticing. It is also good for checking how baseline monitors are doing, i.e. whether their alerts are being created and then resolved at the next check. To build it, create a new alert view, choose “resolved by a specific user” and put in System.)
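The idea behind that view can be sketched outside the console too. This is only an illustration in Python, not the OpsMgr API: the alert records are hypothetical dictionaries with made-up field names (name, resolved_by, resolved_at), and the one-hour window and threshold of five are arbitrary assumptions. It counts alerts that were closed by System and reports the ones that keep recurring.

```python
from collections import Counter
from datetime import datetime, timedelta


def flapping_alerts(alerts, now, window=timedelta(hours=1), min_count=5):
    """Return {alert name: count} for alerts auto-resolved by System
    at least min_count times within the window ending at now.

    alerts is a list of dicts with hypothetical keys:
    "name", "resolved_by" and "resolved_at" (a datetime).
    """
    cutoff = now - window
    counts = Counter(
        a["name"]
        for a in alerts
        if a["resolved_by"] == "System" and cutoff <= a["resolved_at"] <= now
    )
    # Keep only the alerts that recur often enough to look like flapping.
    return {name: n for name, n in counts.items() if n >= min_count}
```

An alert that is raised and auto-resolved once a minute, as in the case above, would show up here with a count far above the threshold, while one-off alerts closed by an operator are ignored.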
I then noticed that the RMS time was 1 hour ahead, so I reset it, but within a minute (coincidence? I think not) the server’s time went 1 hour ahead again. I tried several ways of changing the time with the same result. All the other servers were in the same time zone and they were fine. I spent some time with w32tm and net time before checking the VMware settings, where I found that the VM was getting its time from the ESX server. I did not think to look there initially as the build was not supposed to have that switched on. Once I removed that setting, the time was picked up correctly from the domain. The ESX server this VM was on was an hour out, but all my other VMs must have been on another server as their time was correct.
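An offset of almost exactly one hour is a useful clue in itself: it points at a time zone, DST or host time sync mismatch rather than ordinary clock drift. A minimal Python sketch of that reasoning follows. The function name and the two-minute tolerance are my own assumptions, and it only compares two timestamps you supply; it does not query w32tm or VMware.

```python
from datetime import datetime, timedelta


def classify_offset(local, reference, tolerance_s=120.0):
    """Classify the difference between a server clock and a reference clock.

    Both arguments are datetimes expressed in the same time zone.
    tolerance_s is an assumed allowance for normal skew, in seconds.
    """
    offset_s = (local - reference).total_seconds()
    if abs(offset_s) <= tolerance_s:
        return "in sync"
    nearest_hours = round(offset_s / 3600)
    # Close to a whole number of hours: suspect time zone, DST or host sync.
    if nearest_hours != 0 and abs(offset_s - nearest_hours * 3600) <= tolerance_s:
        return f"whole-hour offset ({nearest_hours:+d}h)"
    return f"drift ({offset_s:+.0f}s)"


domain_time = datetime(2008, 8, 28, 16, 10, 8)
rms_time = domain_time + timedelta(hours=1, seconds=5)
print(classify_offset(rms_time, domain_time))
```

With the RMS one hour and a few seconds ahead of the domain, this reports a whole-hour offset, which is the pattern that should send you looking at the hypervisor or time zone settings rather than at the time service alone.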
I had already uninstalled the OpsMgr software to confirm that it was the OS, not OpsMgr, that was doing the time shift. Once I reinstalled it with the server at the correct time, everything was OK. Very bizarre.
This is a different problem from the one I had with another system, where the AD server and the RMS/DB server were one hour out, which caused problems.
Time is obviously very important to OpsMgr.