Failed Domain Relationships on SQL Server Cluster Nodes

You know what’s scary as hell? When one node of an important cluster loses its trust relationship with the domain and you see the error “The trust relationship between this workstation and the primary domain failed”. That happened to me late last year with one of my SQL Server 2008 R2 nodes. The scary part was that I just didn’t know what to expect. The fix could be simple, or it could require a node rebuild.

Generally, I address this issue not by removing servers from the domain, but by running the following in PowerShell v3 at an admin prompt:
Reset-ComputerMachinePassword

In this case, however, I did not have access to v3, so at an admin prompt I ran netdom instead:

netdom resetpwd /s:dc.ad.local /ud:ad\adminaccount /pd:*

Here /s names the domain controller to contact, /ud is the domain account used for the connection, and /pd:* prompts for that account’s password. This successfully reset the machine account password and I was able to log in again with a domain account. I then started the cluster service and failed my SQL Server over with no issues. I then failed back and rebooted the server for good measure (and to ensure the trust still existed). I also tested this on Windows 2012 R2 with a SQL 2014 cluster with success, though Reset-ComputerMachinePassword was easier to remember and worked just as well.
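
If you want to confirm the trust is actually broken before resetting anything, and confirm it is healthy afterwards, something like the following should work. Treat it as a rough sketch rather than the exact procedure I ran: it assumes PowerShell v3 or later, an elevated prompt on the affected node, and the same placeholder names used in the netdom example above (dc.ad.local and ad\adminaccount).

```powershell
# Sketch only. 'dc.ad.local' and 'ad\adminaccount' are placeholders
# carried over from the netdom example; substitute your own DC and account.
$cred = Get-Credential 'ad\adminaccount'

# Test-ComputerSecureChannel returns $true when the secure channel
# between this machine and the domain is healthy.
if (-not (Test-ComputerSecureChannel -Server 'dc.ad.local')) {
    # Reset the machine account password against a specific DC.
    Reset-ComputerMachinePassword -Server 'dc.ad.local' -Credential $cred
}

# Check again after the reset; -Verbose prints a short status message.
Test-ComputerSecureChannel -Server 'dc.ad.local' -Verbose
```

Pointing both cmdlets at the same DC with -Server keeps the check and the reset talking to one domain controller, which seems like the safer choice while things are in a shaky state.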

What caused the loss of trust? I haven’t figured it out yet, but I’m assuming that node probably cheated with his ex.

Chrissy is a PowerShell MVP who has worked in IT for nearly 20 years, and currently serves as a Sr. Database Engineer in Belgium. Always an avid scripter, she attended the Monad session at Microsoft’s Professional Developers Conference in Los Angeles back in 2005 and has worked and played with PowerShell ever since. Chrissy is currently pursuing an MS in Systems Engineering at Regis University and helps maintain RealCajunRecipes.com in her spare time. She holds a number of certifications, including those relating to SQL Server, SuSE Linux, SharePoint and network security. She recently became co-lead of the SQL PASS PowerShell Virtual Chapter. You can follow her on Twitter at @cl.

Posted in Active Directory, PowerShell, SQL Server
9 comments on “Failed Domain Relationships on SQL Server Cluster Nodes”
  1. Shiny says:

    Hi, when you reset the password, did you reset it on the domain server or on the SQL cluster servers? I’m facing a similar issue: the cluster servers show they are connected to the domain, but when I try to log in with a domain account, I can’t and I get this error: “There are currently no logon servers available to service the logon request”.

    • Chrissy LeMaire says:

      The servers themselves – but no logon servers generally means that your cluster can’t even find/talk to the DCs. Try checking your DNS and pinging your domain, see if it resolves.

  2. Mike says:

    Great article. We’ve come *this* close to getting both our cluster nodes back up in service (using the netdom command). At one point both nodes could log in to the domain again and showed as “up” in cluster manager. However, we did not have the option to move SQL services to Node1. We tried connecting across to the other node in File Mgr and it couldn’t access the C$ share.

    Mayhem ensued while we tried a variety of tactics to sync things, but now we’re back to where Node2 appears to be happy and the cluster is running on it. Node1 is marked down and cannot log in to the domain.

    We’re about to have to rebuild the cluster… scheduled for tomorrow at noon. If you happen to see this and have any suggestions, I’ll be sure to watch for an update. Thanks++

    • Chrissy LeMaire says:

      Hey Mike! Yuck, hopefully you’ll have some luck. I put out an APB on Twitter so hopefully someone will know.

    • Chrissy LeMaire says:

      Oh, and if you are on Twitter, you can use the #sqlhelp tag to ask, or in the SQL Server Community Slack, the channel #sqlhelp. SQL Slack is at sqlps.io/slack

  3. Mike says:

    Thanks, Chrissy, you rock. It looks like they actually turned off DC2 before using netdom “because they wanted to have a good DC if something went wrong”. Then when they brought it back up, things didn’t jibe. That and not handling the Kerberos caches properly on the two DCs during the effort resulted in the disconnect.

    We’re proceeding to rebuild – their call. I think it could be salvaged with judicious procedure.

    Thanks again for your time & attention – much appreciated. TGIF.

    • Chrissy LeMaire says:

      Seems like a somewhat valid reason for failure! How are you doing your restores btw? Just rebuilding the node, or everything?

      • Mike says:

        We copied off the file contents on the shared node storage and will do a db backup right before a complete rebuild. Before we do that, I hope to grab 20 minutes to execute the process I think would right the ship, just as an exercise. It would help me ensure I understand the way this works(!)
