Recently, I noticed errors occuring daily at 2:00AM on my SQL Cluster. The errors showed Windows logins being routed to the wrong named instances, along with errors from SQL Browser not being able to service client requests. I noticed that all of the instances being affected were SQL Server 2008, but never any of my 2005 instances.
My first guess was maybe the server was being inundated with connections by some rogue job and causing SQL Browser to get confused. One day, SQL Browser even suffered a fatal error and stopped servicing requests altogether. This meant all of my instances on the server were unavailable until I noticed that morning and restarted the SQL Browser service.
After some investigation, I noticed that the syspolicy_purge_history job that every SQL 2008 install gets was running on all of my 2008 instances at 2:00AM. So, I disabled all of the jobs and the next day all of the nightly errors were gone.
I found this documented by Microsoft here:
I had seen references to this before in regards to updating the job when using Virtual SQL Host names, especially in clustered configurations. The default setup uses the local node name, but if you fail the instance over to another node, the PowerShell script that gets called is called with the wrong host name… which may lead the job to fail.
I still don’t know exactly why this job causes SQL Browser to freak out, but it does. I’d rather not even use the job, since I’m not currently using Policy Management, but I haven’t yet investigated far enough to see if I need the step that cleans up the health records. I will just update all of the jobs to use the virtual SQL host name and check for errors every morning until I’m comfortable.
So, moral of the story – If you have SQL 2008 instances running in a clustered environment, make sure you update the node names in Step 3. Staggering the jobs wouldn’t be a bad idea either.