SQL Server: Block Level Replication Failures Causing Suspect Database

Recently, I worked with a client who uses block level replication software (think XOSoft/ArcServ) for their clustered SQL Server databases. During proactive testing one night, replication was broken and the failover server was brought up. For the first time in years, the procedure failed and the primary application database came up as SUSPECT. Examining Event Viewer revealed the following error:

Source: MSSQLServer Category:2 Type: Error Event ID: 824

SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:9718172; actual 0:0). It occurred during a read of page (1:9718172) in database ID 3 at offset 0x00000f6b000000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\myDatabase.mdf’. Additional messages in the SQL Server Serror log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

We backed up the primary server’s application database, restored the backup to the failover server and the database came up no problem. We restarted replication, waited for it to complete, stopped replication, brought up the SQL Server engine and again the database was in SUSPECT mode. We then deleted the .MDF and .LDF files and replicated them from scratch. Once replication was complete, we brought up the SQL Server engine on the failover machine and the database came up just fine. We restarted replication, stopped it, brought up the SQL Server and again it was in SUSPECT mode.

Rinse, repeat these steps with a few changes. Uninstall all updates and HIDS software, no luck.

Finally, the client’s disk administrator detached the failover cluster’s LUN, formatted it, reattached it, rereplicated the database and boom! That seemed to do the trick. Several stops/starts/replication breaks later, the database still comes up healthy. Because we were able to get the database to repeatedly come up healthy (after the restore and initial replication) I, for one, am surprised that this was the solution, but that issue made for a stressful week, and I’m glad the solution was finally found.

Chrissy is a PowerShell MVP who has worked in IT for nearly 20 years, and currently serves as a Sr. Database Engineer in Belgium. Always an avid scripter, she attended the Monad session at Microsoft’s Professional Developers Conference in Los Angeles back in 2005 and has worked and played with PowerShell ever since. Chrissy is currently pursuing an MS in Systems Engineering at Regis University and helps maintain RealCajunRecipes.com in her spare time. She holds a number of certifications, including those relating to SQL Server, SuSE Linux, SharePoint and network security. She recently became co-lead of the SQL PASS PowerShell Virtual Chapter. You can follow her on Twitter at @cl.

Posted in SQL Server

Leave a Reply

Your email address will not be published. Required fields are marked *