Importance of Running CheckDB for Database Health

Steve Stedman

0:15

So Derrick, one of the things we run into is customers who are a little bit resistant to running CheckDB as frequently as maybe we think they should normally. And I guess, I’m just curious, what are your thoughts on how frequent? Or how important it is to run CheckDB frequently?

Derrick Bovenkamp

0:51

Yeah, that’s, that’s a good question. Steve, I think let’s touch base on the resistance a little bit. Usually, when we run into it, there’s performance concerns, right? “How is this going to affect my performance? Is it going to slow my system down at night when we run it?” Or, “is my system too slow to run it in a reasonable time?” Is the other one that we run into? So usually, it’s all around performance. And, you know, I think you and I have the gold standard. We run CheckDB at least once it once a day if possible, once a week for those systems that are a little bit slower. Or, you split it up and you run it day by day. There’s one customer whos databases are so large that we run it once a week, but each database gets its own night.

Steve Stedman

1:52

And you know, what’s interesting is that, when there’s the resistance on the performance, mainly the server’s already overloaded a little bit in some way, meaning they can’t, they don’t have the IO to really do CheckDB as frequently as you as you should. What that really equates to is that if they do get to the point where there’s corruption, it’s going to be that much harder for us to do the repair, when they’re on that slow IO. And instead of a repair, taking a short amount of time, it might take days to weeks to do a repair. And that situation, if we don’t know about it, and I guess that kind of brings back to the primary reason for running checkdb is that, we all know that things can happen with I/O here and there. Occasionally corruption may get introduced into a database if there’s some glitch with the I/O. And the reason we running CheckDB frequently is so that we catch that quickly, so that it’s easier to repair. Or that things are as minimally damaged as possible before things get worse.

Derrick Bovenkamp

2:58

You know, that’s a great point. And that reminds me, I want to touch on with the importance of CheckDB. It is another way to know that the backups you’re getting are good. The same customers that are hesitant to run CheckDB regularly, consider they’re backups are important to them. Because the databases are really important to them we may be backing up corrupt data, and you got to do that restore, you can’t restore it.

Steve Stedman

3:33

And that’s a really good point. So what are your thoughts on actually running CheckDB against a restored backup somewhere, as opposed to running it against the actual production database?

Derrick Bovenkamp

3:45

You know that theoretically should work fine. If you’re doing it regularly, and you have it scripted out, and you get that restore, theoretically, if you take that backup, you restore it and you CheckDB it.

Steve Stedman

4:01

You should be okay, yes I would agree. And I think the only kind of negative against doing it that way would be the time delay that would be involved in doing that restore and doing the check. But if it’s something you have all automated where maybe your backups run at 1am and then they finished by 3am and then on some other server it gets restored and checkdb gets run there that’s yeah, that’s pretty legit way to do it.

Derrick Bovenkamp

4:55

Another thing to include then is you may actually need Server Enterprise Edition of SQL Server. Because CheckDB does run faster on enterprise. It’s multi threaded and as long as your storage is fast enough, you can run CheckDB a lot faster that with that than SQL Server Standard Edition. Every corruption that we’ve run into, almost every single one they had not been running CheckDB regularly. And the ones where they have been running CheckDB regularly their recovery process is usually a lot easier. You just either go back to that last backup, or the fixed part of it. A lot less complex than the one that’s been corrupt for a long time. Right?

Steve Stedman

6:01

Yes, and really the key there is, it comes down to you’re fixing corruption that was introduced in the last day or two versus fixing corruption that was introduced weeks, months, or even years ago, that nobody knew about until things got really bad. And the system stopped working for some reason.

Derrick Bovenkamp

6:16

Yeah, or how many corruption repairs have we done where they didn’t realize that corruption until they went to do an update? And they went up to their software and update wouldn’t work. And then they found out they had corruption? So I just, I just think it is so important. That sounds like we’re preaching here. But it is it is so important to run that CheckDB’s often.

Steve Stedman

6:42

Yep. And if we might seem a little bit fanatical, when we’re describing the importance of running CheckDB, there’s a reason for that. And the reason is because the most difficult part of the job is when we have to look at it and realize, oh, there’s nothing that can be done to recover what happened there. And sometimes that’s the case, it hasn’t happened very many times. But I hate having to tell someone, “sorry, your database is completely gone, because it’s so corrupt, and you don’t have good backups.” But the fact is, if they’d come to us when they first when the corruption was first introduced, there may have been hope. But there may not be hope, when it waits too long.

Derrick Bovenkamp

7:40

Yeah. And that’s really unfortunate. You know, I remember one client who was able to find their backup but it was over a year old. So essentially they lost a years worth.

Steve Stedman

8:03

All right. Well, I think parting words on this little, little blog post here would be run CheckDB, as frequent as you can once a day is like the gold standard. But if you can’t do it once a day, do it as frequently as you possibly can and it will save you time, money, and it’ll may save your business one day.

Derrick Bovenkamp

8:27

Yeah, that’s a perfect way to end it.

Steve Stedman

9:13

So one of the things that I think happens is we get caught in the day to day maintenance, day to day things that often times we miss out on the big picture items. And well, someone who knows about analyzing risks, once said, “You look at the problem and think, if the worst case happens, is that something you can live with? And if the answer is no, then maybe you should do something about it and run CheckDB more often.

So for instance, can you live with losing this entire database. And if this is just some little IT database that you can rebuild if you had to, and it’s not the end of the world, you’re not going out of business because of it, that’s different than if this is the core database for your business that has all of your customer information. And if it’s gone, it’s gone. And you don’t even know who those customers are. That’s a tough spot to be in. So ask yourself, what’s the status of things? And can I live with it? And if the answer is no, you can’t live with that. Well, maybe we need to look at fixing it or doing something different.

Derrick Bovenkamp

10:32

Yeah, no, that’s perfect. “What’s the worst that can happen? And can you live with it?”

Steve Stedman

10:40

Yep. Yep.

1 thought on “CheckDB Chat with Steve and Derrick”

RJH September 21, 2021 at 07:20


I’ve been in this situation with a bit of a twist. I had a database that was corrupt. It is a very large database and running checkdb daily causes issues so we went to once a week. So we found out the database had corruption on it and went back and found a backup that was a couple days old that was not corrupt. We restored, ran checkdb to verify it was not corrupt, and then released it to users. We were lucky in that all the data is added via a nightly process so that process just had to catch up. After the nightly process ran the database was corrupt again. So we ran through the above process one more time only to find after the nightly process that the database was corrupt yet again. Each time we noticed the corruption on the same table so we tried dropping and recreating the indexes but that only worked for a short period. We ended up creating a new empty version of that table, copying all the data into this new table, dropping the old table, and renaming the new table to the correct name and have not had an issue yet.

So yes, I agree do those checkdb’s and also check the results of them and make sure you setup and configure the alerts for corruption as well.

CheckDB Chat with Steve and Derrick

1 thought on “CheckDB Chat with Steve and Derrick”

Leave a Reply Cancel reply