Do you know what to do when corruption hits? You should watch this quick 10 min video first if you plan on fixing it yourself?
Steve Stedman and Derrick Bovenkamp wants to save you from making a big mistake. In this video they will share the Do’s and Do Nots when faced with a database corruption.
And we are always here for guidance or full corruption repair.
We are available for emergency help (contact Steve first to confirm availability). Visit our store at Stedman Solutions to learn more.
Transcription:
Derrick Bovenkamp 0:12
So what do you do when corruption strikes and and again, this is I wish we had a product that just threw up a master alarm on our desk, when we ran into corruption,
Steve Stedman 0:24
you know, it would have to flash red, really. And then it would also have to have really loud sound that was really, like, annoying if we had a device like that. But really the tool is check TV. And that’s our master alarm when something goes wrong. Now, the first thing that will happen when you when you’re hit with corruption, if you haven’t experienced this before practice it is you’ll go to Google or Bing or your favorite search engine. And you’ll say, check DB errors, you’ll search on the error code or the specific message. And you’ll find somebody who sounds brilliant, who tells you just run check DB with repair allowed data loss, and your data will be repaired. Those people should be I don’t know, lined up and thrown out of the database community. Because what repair allow data loss, this is an option on check dB. And what it says is to go through and scan the database just like check DB normally does. But if you find a page and a page is an 8k chunk of the database, which may contain a few rows to a few dozen rows, and when it finds a page that’s corrupt if even if one bite or one row and that entire page is corrupted, just SQL Server says Huh, let’s just throw that away. And it’s sort of like if you had a book and you had a couple bent pages, and you were scanning through and you said, Oh, this page looks bad. Let’s just throw it away. And you throw away the whole thing there is if you value your data, there is really no good outcome from running, repair allow data loss.
Derrick Bovenkamp 1:43
Yeah, and I think the other thing is, we frequently see it not even work like it ran, it threw out a bunch of data, and it still not repaired. Yep,
Steve Stedman 1:54
exactly. And not only at that point is you’re you’re you’re in a mess, but you’ve lost some data, and it may be even harder to repair things. Now, if you’ve ever seen the movie Fight Club, and you remember there’s three rules in Fight Club. And if you remember those rule number one is do not use repair allow data loss. And rule number two is do not use repair allow data loss. And rule number three is certainly do not use repair log data loss. Now, after I’ve said that, there are some scenarios where you could use it but never use it on a production environment. And it’s okay to use it on a copy of a database, not your production system. If you’re going to use it, use it with check table rather than check dB. And part of the problem with check DB is you might run check DB and get a couple dozen errors or something. And then you run parallel data loss and it throws out those couple dozen pages. But then it just keeps going to find more corrupt stuff after it gets past those. And you might lose huge, huge chunks of your database. Yeah, so if you do have to use it, use it with check table once you’ve rescued any of the data from that table. And if you’re going to do it with check table, usually I’m going to assume that everything is going to be thrown away. So I’m going to figure out, how do we backup all the data out of that table. And instead of running check table with parallel data loss, I’m just gonna use truncate table and then have a way to bring all the data back in after we do that.
Derrick Bovenkamp 3:14
Alright, so if you get corruption, you, you realized you ran into corruption through one of the ways that we showed you with the very first thing you should do, even if you suspect corruption is do not restart that SQL Server. Don’t restart the service, don’t reboot the machine. And I know that’s the first thing that you want to do is it’s still the first thing that I want to do. And it’s just it’s just one of the things you have to stop yourself. Take your hand off the mouse, don’t reboot. And Steve says the very best case is if you actually have corruption, you’re going to be in the same spot that you are now. The worst case is, you know, what if that machine doesn’t even bit because there was also something corrupt in Windows? Or what if it does bid but the SQL Server can’t start out or that database can’t even connect at all.
Steve Stedman 4:09
So then once you’ve got past that urge to try and restart your server. The next thing you’re really going to want to do is go and figure out what’s going on with your backups and determine are your backups failing, or are you able to take a backup of that corrupt database, and then try and track down when was the last pre corruption backup. And if you’ve got good check DB jobs, you should be able to see when check DB started failing and then go look at the last good backup before that. I know so often I see people have like three to five days of backup retention. And that’s it. And if the corruption was introduced, and it took you a little while to figure out it was there, like you’re only running check DB once a week when you’ve only got three days for the backups. Well, your options to do the repair are pretty limited at that point. So with those backups, you want to consider how important the data is and how recent the corruption was. And is it easier just to go and grab a break. eruption backup and restore that music. I remember at past summit one time, I was talking to a guy, and he had hundreds of SQL servers that he managed. And he said to me, why would you ever want to repair a corrupt database, we have good alerting we know when corruptions hit we have good backups, we know that we can restore right up to when the corruption was hit, we would never repair a database. He said, he said, we just restore from backup and get it done quickly so that nobody even noticed there was much of an impact there. Now, if that was a bank, and that was my bank account database that had my checking account, I just deposited a check. And then they went restored a backup before the corruption, I might lose that deposit. That might not be so good. But some businesses cases will allow you to do that, depending on the nature of the data.
Derrick Bovenkamp 5:39
So that’s talking with your management team. And part of it is is even, you know, the business needs, they need to know what’s possible with the budget that they supplied you. And we’ve seen sometimes those discussions, the outcome of that discussion is, okay, well, I have to live with that risk, or the outcome of that discussion is here’s some more money to close that. close that loophole. The next thing is is this is a good question is can you reply, the transaction replay replay the transaction logs to get past the corruption? And you know, do you have room to go try that? Do a restore?
Steve Stedman 6:17
So Derek, you’ve got corruption, your manager is coming down and saying, Derek, when is it going to be done? What’s wrong with it? What are we missing? When is it going to be back when one of the customers can really use the system, one of the employees is going to be able to use the system, and when you’re gonna have it back online, and maybe you’ll have a bullhorn, like this guy in the picture does. They’re gonna keep asking you that you can’t even get a word out, when is it gonna be done? And that can be really painful.
Derrick Bovenkamp 6:47
Yeah, so don’t let stress dry your decision making process. And I’ve certainly been been there, you know, it wasn’t corruption. But I dealt with system outages, there was times where I had to literally like, put my phone on DND, turn off my cell phone, close outlook. And just focus on the problem. Now, obviously, keeping management aware is a high priority. Steve and I do that when we do repairs, we tell people, here’s the update, now expect an update in an hour, or expect an update in 30 minutes. And we hold to that. A lot of the times, even if they go to sleep at night, we will keep sending updates throughout the night to keep them aware to try to starve off the blow horn there. But don’t let stress drive you into the corner and then you made a bad decision. A bad decision could be lost data, potentially you put people back in a system that isn’t stable. And now all of a sudden, people are writing more to that database that’s already corrupt. Yep,
Steve Stedman 7:48
absolutely. So another important question is, can I get a do over? And if you’ve ever played a video game, and you get partway into it, and you screw something up, you realize, oh, I’ll just try that again. And I’ll get it right next time. Well, in life, it’s not always that way. But if there’s a way that you can figure out how to get a do over when you’re doing corruption repair, specifically, if that’s a backup of the corrupt database, or to make sure that if you go through the process, and you get to some point, where you realize, Oh, I wish I’d save that in the beginning of the process, you get to some point that you can roll everything back, I don’t want you to not roll back like a sequel, rollback, but that you can go back to that starting point and start over again with a do over and do it right. Because most of the time, when you’re recovering data, you don’t know what’s all broken until you get all the way through it and having a way to do it over and if that means you’ve taken a backup of the corrupt database and put it on a test server, that is an awesome way to handle it. Because then you know that what you’re doing certainly isn’t going to make it worse. And you can certainly do a restore and start it over again if you need to.
Derrick Bovenkamp 8:51
Yeah, and I think the other part there about putting on another server is if you don’t know what caused the corruption, if you’re actually getting it on fully different hardware and storage, you can be protecting yourself there is for corruption.
Steve Stedman 9:06
So before fixing or removing corruption, there’s some things you’re gonna want to consider. The first one we just talked about is the do over Do you have a way to start over if something goes wrong. And that’s probably the most important thing while you’re doing the recovery to make sure and that way, you it’s sort of the do no harm, no matter what you do, if you can undo it and start over again, you’re making it no worse than when you started.
Derrick Bovenkamp 9:28
And you know, you have again, a backup of the current state, can you make a backup to get you back there?
Steve Stedman 9:37
And if your solution is going to cause data loss of any kind, can you save anything before causing that data loss? And a great example is if you know that there’s a page that’s corrupt and it has 12 rows in it, can you go and figure out how to pull those 12 rows back from a backup or from from a copy or some other part of the system to fill them in after they’ve been lost?
Derrick Bovenkamp 9:58
And then do you have somebody To review your ideas with and they can’t cover how important this is. They don’t even need to be a technical person, but just somebody to allow you to whiteboard out your idea or, you know, talk through the idea of what you’re going to do and what order you’re going to do them. Because that can help you make sure that you’re making good decisions,
Steve Stedman 10:19
you know, and that doesn’t even have to be a person because really the the key thing here is that you’re explaining it and I used to I knew a guy years ago that I worked with, when he ran into a problem and there was nobody to explain it to he would explain it to his dog. And even though the dog couldn’t reply and couldn’t tell him there was anything wrong. It was the act of verbalizing it and explaining it that made him realize that he was saying something wrong. Even if it’s a dog or a non technical spouse or your kids or a baby that can’t talk yet. It might be better to explain your ideas. Before proceeding. You’re in much better place if the person you’re talking to is technical and understands what you’re saying. But even if they’re not go for it. You can also you can call us for help. If you need help there.
Related Links:
- REPAIR_ALLOW_DATA_LOSS example
- Corruption Repair Help from Stedman Solutions.
- Contact us if you need help. https://Stedman.us/30
More from Stedman Solutions:
Steve and the team at Stedman Solutions are here for all your SQL Server needs.
Contact us today for your free 30 minute consultation..
We are ready to help!