SQL Server Corruption: Stories About Causes of Corruption


The integrity of a Microsoft SQL Server database can be severely undermined by corruption, leading to data inaccessibility and potential operational disasters. Steve Stedman’s comprehensive approach to identifying and rectifying such corruption is often considered an essential tool in the SQL Server administrator’s kit, underlining its significance in maintaining database health.

These stories come from a recording of a presentation by Steve Stedman and Derrick Bovenkamp.

Steve: Now imagine, with this picture, your fiber optic cable running through the middle of a fire while you’re doing a backup or trying to load a database across the network. You’re probably not going to expect your connection to last very long. But while it’s burning up, it may still be sending some data through, and that data may be getting corrupted. This is one of the more unusual possible causes of corruption, but hopefully you never look outside your building and see this type of situation with your network cables.

There are a lot of different things that we have seen, from experience with different customers, that have led to a corrupt database. The first one was the very first customer that I ever did a database corruption repair for, and it happened during a lightning storm. They had their accounting system running on a, we will call it “production,” in quotes, SQL Server under their desk, with no power strip, no uninterruptible power supply, nothing like that. And they were in an area that was frequented by lightning storms. A lightning storm hit, we think the server got some kind of a jolt, and then it shut down uncleanly. When it started back up, there were chunks of the database that were not coming back and were not accessible. In that situation, there were about a dozen tables we had to repair. But the exact timing of when things went bad, they could pin down to the lightning storm. Of course, this was a situation where the customer didn’t have any backups, and that always makes it interesting. But I think that one could have been avoided if they had an APC battery backup system or something that filters the power.

Derrick: So this is not one that I’ve personally seen; I’ve heard the story two or three times at the SQL PASS Summit. The story goes that somebody had a network room installed on the side of their building, close to train tracks. This company started to have drive failures like nobody had ever seen before. It took them a while to realize it, but they finally connected the dots: every time the train went by, the building would shake. You would think that would not cause drive failure. But there is a video on YouTube (this is specifically with spinning drives, you know, old, regular spinning disks, not solid state drives) of somebody screaming into an array. They were monitoring latency, and they noticed the latency would go up on the disks every time the guy would yell at the array.

Steve: You know, that reminds me, Derrick. About 20 years ago, I worked in an office that was right next to the railroad tracks. When the passenger train came by, we didn’t really notice much other than a bit of shaking. But when a freight train came by, especially the ones that were carrying ore cars or these big bins of scrap metal, it was different. This was back before we had all the flat screen monitors, when we actually had CRT type monitors. When those cars would go by with ore or scrap metal, they would create enough of a magnetic field that you would see the monitor picture move and distort as they were going by. And I can imagine that would just not be good for any type of storage or disk situation either.

Derrick: Yeah, that kind of sends shivers down my spine when you’re thinking about a magnetic drive. The next one here is actually two stories. Steve will tell one, but this is one that I have heard of. It was a janitor with a vacuum cleaner in the middle of the night, and the server was on the ground. Every time they would vacuum the room, or maybe not every time, but often enough, the vacuum cleaner would run into the server. And you can imagine that’s also not good for the spinning drives.

Steve: Oh, yeah, absolutely. And then the other situation, where again you can blame the janitor. I wasn’t part of it; I only heard the stories later. They had a server that, twice a week, would just power off and reboot. After months of tracking this down, they eventually found out that this server was just sitting in an empty cubicle in an office. The reason it would power cycle was that the janitor would come in to vacuum and needed an outlet. So they would pull the power cord out for the server, do the vacuuming, and then plug the server back in. And you know, that can’t be good for your database.

Derrick: The next one here is really hard for me to talk about as a systems administrator who really likes technology and likes to think very highly of some of the storage technology that we have today. But we have seen at least two cases where we were able to trace corruption back to something with an iSCSI switch, or something with that connection between the storage and the server, both with iSCSI. So yes, network switch errors certainly can cause corruption.

Steve: The last one we want to talk about here is drive failure. Now, if you have complete drive failure, and your hard drive is just totally dead, that’s a different situation, because either you’ve got a backup, or you’ve got a RAID array of some kind, or you’ve completely lost all your data. That’s not corruption in that case. But where you have a partial drive failure, where things are starting to fail with the drive and you’re getting things that are not being written or read correctly, that can lead to the database being corrupted. If it reads and has a read error, but somehow, when it writes the data back out, it writes it out incorrectly, that can definitely lead to database corruption. But really, most of the time, it comes down to problems with I/O somewhere between the SQL Server process and where the data actually lands on disk. There is something that is corrupting, distorting, manipulating, or changing the data in some way, so that when you try to read it back later, it doesn’t look like a regular database file.
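To make that last point concrete, here is a minimal Python sketch of how a per-page checksum can catch data that changed somewhere between the write and the later read. It is only an illustration: SQL Server’s PAGE_VERIFY CHECKSUM option uses its own algorithm and page layout, and the CRC32 stand-in, the trailing 4-byte checksum, and the function names `write_page`, `page_is_intact`, and `flip_bit` are all assumptions made for this example.

```python
import zlib

PAGE_SIZE = 8192      # SQL Server data pages are 8 KB
CHECKSUM_BYTES = 4    # hypothetical layout: CRC32 stored at the end of the page


def write_page(payload: bytes) -> bytes:
    """Build a page: zero-padded payload followed by a CRC32 of that payload area."""
    body_size = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    return body + zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big")


def page_is_intact(page: bytes) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    body = page[:-CHECKSUM_BYTES]
    stored = int.from_bytes(page[-CHECKSUM_BYTES:], "big")
    return zlib.crc32(body) == stored


def flip_bit(page: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate a fault in the I/O path by flipping one bit of the page."""
    damaged = bytearray(page)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


good_page = write_page(b"invoice 1001: 250.00")
bad_page = flip_bit(good_page, byte_index=10, bit=0)

print("clean read intact:  ", page_is_intact(good_page))
print("damaged read intact:", page_is_intact(bad_page))
```

The checksum only detects the damage on the next read; it does not say whether a failing drive, a switch, or a power event caused it, which is why the stories above took so long to track down.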

Want to learn more about how to prepare for corruption on SQL Server? Take a look at my class, which teaches how to be prepared if corruption strikes.

Corruption Class by Steve Stedman.

 

