SQL Server Corruption: Corruption vs Drive Failure
Over the years I have documented many of the concepts around corruption repair. Want to find out more, check out the corruption category at by blog.
Transcribed from a presentation on Database Corruption by Steve Stedman and Derrick Bovenkamp.
Steve : Now, one of the things that when we talk about drive failures and that there is a big difference between corruption and complete drive failure. And here’s an example. It was another medical group. We do a lot of work with medical groups. They called us with a corrupt database. So we started looking into it to find out what was corrupt. And it turned out they had a raid 50 drive array and on that was the storage for the virtual machine. It had multiple drive failures and with RAID 50, if you lose enough disks, eventually you end up with a big hole in your data. And by a big hole, I mean, like 20% to 25% of every disk was gone. And for big files, 20% to 25% of those files were gone.
Derrick : So another tough thing they had here is they did have backups but they were on the same RAID 50 array as the database. So they also had the same 20% hole missing out of them. And, you know, this is another one, I don’t know if this client had hot spares available, and maybe even through hot spare as well. But they’ve definitely been through two drives and not just the one drive.
Steve : And with this, basically, there was nothing we could do other than help them restore a backup from six months ago. But that was something that they were able to do on their own. And basically 20% of every large file on the drive was missing. And when we look at that, that’s not corruption. Corruption is areas where things are damaged, that they can be repaired. But this is like the difference between a car getting a fender bender and doing a repair on that, and your car being stolen. It’s just gone at that point there was no option from the database side to repair this in any way. And here’s an example of what the SQL server error log looked like.
Derrick : Yeah, so if you can look at that, you can see the error log looks normal at the top and at the bottom. And there’s just a whole bunch of some kind of garbage in the middle. That’s not error log,
Steve : Every single large file, and not even that large, but every single medium to large file on the disk had this type of problem with it. So their only option was a install their operating system and build rebuild from scratch based off of the situation they were in. So there’s a lot of confusion with database corruption. And if you’ve ever run check DB and got errors, and you start Googling on some of those errors, you find a lot of people out there who may not have had enough corruption repair experience as they should have to be talking about it. And they come up with all kinds of things that they suggest or tell you about what’s going on with that corruption. And one of those is that a full backup and restore of a corrupt database may help fix the corruption.
Derrick : Unfortunately, that is false. And when you do a full database backup, it backs up the corrupted data with it. And that’s if you can back it up. There’s a lot of times where the backup just quits halfway through. And you know, one of the ways that we can prove this is if you so choose to get to the Database Corruption Challenge. That’s how we’ve distributed the corrupt databases is we back them up.
Steve : Yeah, in the Database Corruption Challenge. If you download several weeks of database corruption, you download a backup file, you do that restore, and it comes back corrupt. So doing a full backup and restore of the database will not help fix it. However, there are some situations where if you happen to have a full backup, and maybe some transaction log backups, you might be able to replay the transaction log backups to get past the date, the point in time where the corruption occurred, if the storage holding the transaction log backups does not have the same corruption that caused the database to be corrupt to begin with.
Derrick : Yeah, and the next one is the first thought of many system analysts and system administrators is, “if I reboot the servers is that going to fix it?
Steve : That’s one of those things that if it was a file server, I’d say yeah, kick it for a reboot and see if it works any better. But in SQL Server that’s not the case, once the file becomes corrupt, a reboot won’t help it. And actually, it may make things worse, like a whole lot worse. And what I mean by that is if you have corruption and your database is running, that’s a bad spot to be in. But if you have corruption, and your system is shut down and it will not start up. That’s a much worse place to be than corrupted and running.
Derrick : So what about if I ignore corruption? Is it going to go away? Is it gonna repair itself?
Steve : That’s not going to happen. Technically, it’s unlikely because if you have a regular process that maybe truncates a table and then refills it was some kind of a data warehousing or ETL type process. Yeah, you might get lucky it truncates it, it goes away, and then you wouldn’t have corruption. But other than that, without the act of like truncating a table or clearing out a table completely, it’s extremely unlikely that ignoring it will make go away. And, in fact, ignoring it may make things a whole lot worse, because without fixing it and without knowing what caused it. You may get more and more corruption over time. Be careful with what people are telling you on the internet because a lot of those things may not be accurate on database corruption.
Want to learn more about how to prepare for corruption on SQL Server? Take a look at my class that will teach how to to be prepared if corruption strikes.
Corruption Class by Steve Stedman.
More from Stedman Solutions:
Steve and the team at Stedman Solutions are here for all your SQL Server needs.
Contact us today for your free 30 minute consultation..
We are ready to help!
Leave a Reply