Database corruption repair with SQL Server is one of those things that you generally don’t see every day, but as a DBA, you are expected to be able to fix it quickly without any data loss. From the risk analysis matrix, it is one of those low frequency high impact scenarios that results in an extremely risky situation. This is one of those situations which could lead to great success, or massive failure depending on your training and skills.
These low frequency, high impact, high risk scenarios are the times that you need to rely on your training, or the training of someone more experienced in these areas.
If you are reading this post because you have database corruption right now, and you have never dealt with database corruption before, I would suggest that you ask for help. Depending on the type of corruption that your SQL Server database has, it could be a quick fix with no data loss, or it could be a complete catastrophic failure with the entire database being destroyed. Let me mention it again, this is the time that you should ask for help from someone who has dealt with fixing database corruption before.
Complete Database Destruction
When corruption occurs the complete database destruction is usually not caused by the corruption itself, but more so by the steps that people take to repair the database. Most of the time data loss or database destruction during database corruption repair can be avoided if you know what to do, and know what is really happening in the database engine when you run commands to fix the corruption.
There are 2 general ways that a DBA finds out about database corruption, first if you are proactive, you are running DBCC CheckDB regularly and your DBCC CheckDB command will give you some error messages about corruption, allocation errors, or other types of messages. For instance:
DBCC CheckDB() WITH NO_INFOMSGS;
The second common way that DBA’s find out about corruption is from users who get errors when they attempt to query a table or index that is corrupt. You might see something like this:
SELECT * FROM Revenue;
Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: invalid protection option. It occurred during a read of page (1:244) in database ID 21 at offset 0x000000001e8000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\DATA\CorruptionChallenge2.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Either way, you are faced with a corrupt database that you need to repair. You can take the brute force at this point, and possibly lose some unknown amount of data, or you can take the professional approach to first identify what data is corrupt, what would be lost on a repair, and have a way to get he missing data back.
Will the corruption just go away over time?
Unless your standard database activity involves the complete removal of tables and indexes (both) on a regular basis the answer is NO, the corruption will not go away. In fact if you have corruption that is not dealt with, it is likely that it will get worse.
Can’t I just run DBCC for Database Corruption Repair?
The answer is yes, you could just run DBCC CheckDB, DBCC CheckTable, or a variety of other DBCC commands to fix the corruption, but if this is where you start, you are likely to end up with data loss that could possibly be avoided.
DBCC CheckDB() WITH NO_INFOMSGS;
Then you attempt to fix it with a rebuild.
ALTER DATABASE [CorruptionChallenge1] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DBCC CheckTable(Revenue, REPAIR_REBUILD); ALTER DATABASE [CorruptionChallenge1] SET MULTI_USER WITH ROLLBACK IMMEDIATE;
Although the rebuild sounds like a good idea it doesn’t often fix anything, you end up having to use the REPAIR_ALLOW_DATA_LOSS option as shown here.
ALTER DATABASE [CorruptionChallenge1] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DBCC CheckTable(Revenue, REPAIR_ALLOW_DATA_LOSS); ALTER DATABASE [CorruptionChallenge1] SET MULTI_USER WITH ROLLBACK IMMEDIATE;
The problem with REPAIR_ALLOW_DATA_LOSS is that when a page containing corruption is encountered, that whole 8k page is just thrown out, and all rows in that 8k page are then lost. If there are multiple pages with corruption they all get thrown out. Even if only one row in the entire page is corrupt, the whole page with all rows gets thrown out. This is the brute force approach which almost always creates data loss, however following the steps above you may not know what data is missing, how many rows or missing, or what rows are missing.
It is best to assess what is going to be lost, then save off what you can so that you can then bring the data back after the corrupt pages are deleted.
Recommended Steps to Repair Corruption
The steps to repair corruption will vary with every specific case. These are just some general steps to consider, they may not apply to your specific case, but you should at least determine if they do apply.
- If possible get a current backup, or determine a plan to get back to the exact point that you are at right now. This is important so that you don’t lose everything if something goes wrong in the recovery.
- Determine exactly where the corruption is at. What tables are impacted? What rows are bad, and what rows are good? Is the corruption in a heap or clustered index, or is the corruption I a nonclustered index? Is the corruption in a system database or is it in a user database?
- Can you work on the corruption in a restored copy of the database, or do you have to work on your only production database?
- Determine what backups are available? Have you taken a tail of the log backup?
- Determine the last known good time. Did this corruption just occur today, or did it happen 6 weeks ago and nobody noticed it. The answer to this question may help determine what backups you might consider in the recovery process.
- Save off whatever data you can in the tables where the corruption occurs? Have a plan to put this data back if it is lost during the repair process.
- Determine a plan to be able to determine exactly what is lost if there is data loss, and how you might be able to get that data back.
- Determine how you will know when you are done. For instance if your plan is to run DBCC CheckDB or DBCC CheckTable with the REPAIR_ALLOW_DATA_LOSS option, what are you doing to do next?
- When you have the corruption resolved, with minimal or no data loss, can you take a full backup to save your known good state going forward?
- Most importantly: Who are you going to turn to if you get in to deep? Can you run your plan by that person before you get in too deep?
These are just a few of the common steps to consider, it can be much more complex that just those steps depending on the type of corruption, amount of corruption or time the data has been corrupt?
Need help right now?
At Stedman Solutions, LLC we regularly deal with database corruption, in fact the Database Corruption Challenge a 10 week (or almost weekly) event was hosted by Steve Stedman of Stedman Solutions, LLC. In this contest some of the best people at fixing SQL Server database corruption on the planet participated. Stedman Solutions can help you with your database corruption, after working through hundreds of corrupt database scenarios and solutions just this year we are ready to help. Just visit the Stedman Solutions, LLC website, and fill out the contact us form.
Plan Ahead for Database Corruption Repair
Two recommendations that I have for planning ahead are:
- Practice your skills at recovering corrupt databases.
- Have an established consulting relationship with someone who can help you when corruption occurs.
To practice your skills, I recommend visiting the 10 challenges in the Database Corruption Challenge (without looking at the solutions), download the corrupt database and attempt to solve it yourself. Week 1 of the corruption challenge is a good place to start if you are new to dealing with database corruption.
Stedman Solutions offers on call emergency response for database corruption and other SQL server help. Contact Steve to establish a consulting arrangement so that you can just call and get help when corruption occurs. If you don’t have the pre-established relationship, we can still help, it just takes a bit more time to get a signed agreement and get the billing agreement in place.