Have You Considered Not Using SQL Server High Availability?

When someone asks about SQL Server architecture, the reflexive answer is usually "High Availability," as if it were a requirement rather than a choice. But after 20 years of managing SQL Server environments, I've found that High Availability (HA) often creates more problems than it solves, especially in organizations with high turnover, long RTOs, or highly secure networks.

The Turnover Problem

Most places I've worked have rotating staff: people move on within one to four years for all kinds of reasons. That means the infrastructure we build needs to stay as simple as possible, because any of us, including me, might leave in that same timeframe.

SQL Server High Availability is not simple. It goes offline constantly in highly secure environments, it needs babysitting, and it requires specialized skills. Plus, I haven't seen a dedicated "SQL Server DBA" job posting in years, so I'm likely going to be replaced by a generalist when I leave. Because of that, I don't set up anything that needs babysitting or advanced knowledge. Good ol' Standalone SQL Server doesn't require either.

When Your RTO is Measured in Days

Most places I've worked have insanely long recovery time objectives (RTOs). I once laughed with glee when I found out that my workplace had 48 hours to get back online. I could build 48 SQL Servers in that time! So why would we use something that depends on network and storage stability when the environments I work in don't provide it?

In highly secure environments, the "High" in High Availability often doesn't exist. Maybe it's the encryption, the antivirus suites, or the unstable or deployed networks, but Availability Groups and Failover Clusters frequently introduce more downtime than they prevent.

The Hidden Complexity: It's Not Just SQL

SQL Server HA isn't just a SQL Server problem. Clusters and Availability Groups require coordination between storage admins, network engineers, virtualization specialists, and SQL Server DBAs.

During COOP (Continuity of Operations) testing, we tried bringing up our cluster at the secondary storage site. Getting it running involved VMware RDMs (Raw Device Mappings) and storage configurations that required multiple teams. Thirty-three hours later, we finally had it working. Afterward, the storage admin begged me to switch to standalone servers: the best justification I ever got.

When your HA solution is the thing that fails disaster recovery testing, something's wrong. And when your storage team is asking you to simplify, you know the complexity has gotten out of hand. When things break at 2 AM on a weekend, you need all those specialists available at the same time. Good luck with that!

Real World Examples

I remember one job where I replaced all our clusters with Standalone SQL Servers. I moved the disaster recovery work to VMware Site Recovery Manager, because that's what the rest of our infrastructure used. That meant others could restore SQL Server in an emergency, not just me.

After I left, I talked to an old coworker about how things were going. He said, "Pretty much everything is going to hell except for your SQL Servers." I'm not even exaggerating—that's one of my proudest accomplishments. Standalone SQL Server is the best.

Recently, I left a job where I'd inherited an environment running Availability Groups. The person who set it up was knowledgeable and it stayed stable while we both took care of it. But after I left, a huge network outage sent the AGs straight into the dumpster. After a few days of downtime, they asked me to come back and help. We worked out a plan to replace them with standalones, and they've been stable since, even when network engineers are making aggressive changes.

When This Approach Makes Sense

So for those of you who see the same need in your own environment, I can tell you and your boss that I, Chrissy LeMaire, Microsoft Data Platform MVP, 20+ year DBA, book author, and dbatools creator, don't default to SQL Server High Availability. I carefully evaluate whether it's truly necessary. Here's what I consider:

  • My RTO requirement is always longer than 2 seconds and always longer than it takes for me to just build a whole new SQL Server
  • I usually support a small number of people: typically 250 to 1,000 users
  • If I inherit a SQL Server HA environment, I don't immediately gut it. As a DBA, I'm conservative in my approach and I'll wait until it acts up or until a migration is otherwise required
  • My environment doesn't have the network stability or specialized staff to maintain HA solutions long-term
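If you want to make this checklist concrete for your team or your boss, one option is to write it down as a rule. The Python sketch below is a hypothetical encoding, not anything from dbatools: the `Environment` fields, the `recommend()` function, and the 1,000-user cut-off are assumptions chosen to mirror the four points above. The inherited-HA check runs first because of the conservative "don't gut it immediately" rule.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical inputs mirroring the checklist; all field names are illustrative."""
    rto_hours: float             # agreed recovery time objective
    rebuild_hours: float         # measured time to build a new server and restore from backup
    users: int                   # approximate number of people supported
    stable_infrastructure: bool  # network and storage stable enough to keep HA online
    ha_staff: bool               # people on hand who can maintain HA long-term
    inherited_ha: bool = False   # an AG or failover cluster is already in place
    ha_acting_up: bool = False   # that inherited HA setup is causing outages
    migration_due: bool = False  # a migration is required for other reasons anyway


def recommend(env: Environment, max_users: int = 1000) -> str:
    """Return a rough recommendation; the thresholds are assumptions, not rules."""
    # Conservative: leave inherited HA alone until it misbehaves or a migration comes up.
    if env.inherited_ha and not (env.ha_acting_up or env.migration_due):
        return "keep existing HA for now"

    rebuild_fits_rto = env.rebuild_hours < env.rto_hours  # a full rebuild beats the RTO
    small_user_base = env.users <= max_users               # assumed cut-off
    can_sustain_ha = env.stable_infrastructure and env.ha_staff

    if rebuild_fits_rto and (small_user_base or not can_sustain_ha):
        return "standalone + automated DR"
    return "evaluate HA"
```

For example, an environment with a 48-hour RTO, a 2-hour rebuild, 500 users, and shaky infrastructure comes back as "standalone + automated DR", while the same environment with a 2-second RTO (`rto_hours=2 / 3600`) falls through to "evaluate HA". The point isn't the exact thresholds; it's that writing the criteria down makes the decision reviewable.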

And again, the people who inherit my environments are relieved, and this change has always been welcomed. As a sweetener when pitching this to management, I always have a dbatools script ready to deploy everything from backup. Show them you can restore the entire server in minutes, and suddenly HA looks like overkill for a 48-hour RTO.

Making It Easy with dbatools

Check out dbatools.io/dr, specifically Export-DbaInstance, which exports logins, Database Mail, credentials, Agent jobs, linked servers, and more: basically everything you need to rebuild a server. Paired with Install-DbaInstance, it can make rebuilds a breeze. Practice an automated install, see how long it takes, then include that in your justification.
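To turn "see how long it takes" into a number you can show management, one approach is to time each step of a practice rebuild and compare the total with the RTO. This Python sketch does that. It assumes PowerShell 7 (`pwsh`) is on the PATH and that you've wrapped Install-DbaInstance, your restores, and the Export-DbaInstance output in your own scripts; the helpers `pwsh_step()`, `run_drill()`, and `summary()` are hypothetical and not part of dbatools.

```python
import subprocess
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def pwsh_step(script: str) -> Callable[[], None]:
    """Wrap a (hypothetical) PowerShell script so it can be timed as one drill step."""
    def run() -> None:
        # check=True raises CalledProcessError if the script fails,
        # so a broken rebuild can never be reported as a passing drill.
        subprocess.run(["pwsh", "-NoProfile", "-File", script], check=True)
    return run


def run_drill(steps: List[Step], rto_hours: float) -> Tuple[List[Tuple[str, float]], float, bool]:
    """Time each step; return per-step seconds, total hours, and whether it fits the RTO."""
    timings = []
    for name, action in steps:
        start = time.perf_counter()
        action()
        timings.append((name, time.perf_counter() - start))
    total_hours = sum(seconds for _, seconds in timings) / 3600
    return timings, total_hours, total_hours <= rto_hours


def summary(total_hours: float, rto_hours: float) -> str:
    """One-line justification, e.g. for an email to management."""
    pct = 100 * total_hours / rto_hours
    return f"Full rebuild took {total_hours:.2f} h, {pct:.1f}% of the {rto_hours:g} h RTO"
```

A drill might then be `run_drill([("install", pwsh_step("install.ps1")), ("restore", pwsh_step("restore.ps1"))], rto_hours=48)`, with those script names standing in for whatever your own automation uses. Keeping per-step timings rather than a single total also shows which part of the rebuild dominates, which is useful when someone asks where the time goes.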

As the blog post outlines, HA and DR are two different things, but with a 48-hour RTO, the distinction becomes less critical. Whether SQL Server goes down from a cluster failure or a full disaster, I've got the same recovery window. Good DR automation can handle both scenarios without the overhead of maintaining HA infrastructure.

Is It Time to Revisit the Convention?

I know this goes against conventional wisdom. High Availability became the default recommendation when building a new server was a multi-day ordeal. But with modern automation and tools like dbatools, restoring a full environment can take minutes instead of days.

The question isn't "Can we afford downtime?" It's "Does HA actually reduce downtime in our environment?" In many cases—especially in highly secure or unstable networks—the answer is no. HA introduces complexity, requires specialized knowledge, and often creates more outages than it prevents.

I'm not saying HA is never needed. If you're running life-critical systems or supporting thousands of concurrent users with strict SLAs, you probably need it. But if your RTO is measured in hours or days, and your environment lacks the stability or staffing to maintain HA properly, a solid DR plan with automation might serve you better.

Maybe it's time to question whether your specific environment needs HA, rather than treating it as a universal best practice.

With love for Standalone SQL Server,
Chrissy