November 28, 2014
As the DBA, you get a call from the IT Team who monitors disk space and they tell you the following:
“One of our servers that has SQL Server running on it is running low on disk space. It has gone from around 60% disk utilization to 95% disk utilization just overnight. What's the problem?”
In this case it happens that this is a server that you have never seen before. Someone installed SQL Server to test out a product along the way, they never told the DBA about it, and suddenly this server has become an important part of the company's internal tools.
Sound familiar? I know I have been hit with this before.
Right off, you have never seen this server before, you have no established baseline to compare it to, and you need to find out why it is running out of disk space rapidly. For me, some of the first things that would cross my mind would be:
- Has anyone installed a new database or product on this server recently?
- Are any of the databases growing rapidly?
- Are backup files being saved locally?
- Are backups being run at all?
- How big are each database's data files and log files?
- Is there something outside of SQL Server taking up space?
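Several of these questions can be answered quickly with a couple of catalog queries. This is a rough sketch using the standard sys.master_files view and the msdb backup history tables; adjust it as needed for your environment:

```sql
-- Size of every data and log file on the instance, largest first.
-- size is stored in 8KB pages, so divide by 128 to get MB.
SELECT DB_NAME(database_id) AS databaseName,
       name AS fileName,
       type_desc,
       size / 128.0 AS sizeMB,
       physical_name
FROM sys.master_files
ORDER BY size DESC;

-- Most recent backup of each type per database
-- (D = full, I = differential, L = transaction log).
SELECT database_name,
       type,
       MAX(backup_finish_date) AS lastBackup
FROM msdb.dbo.backupset
GROUP BY database_name, type
ORDER BY database_name;
```

If the second query returns no rows at all for a database, backups are not being run on it.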
Before I start troubleshooting, I am going to just remote desktop to the server and double-check the disk space myself, to be sure that someone hasn't misinterpreted the disk utilization.
At first glance the free disk space is 4.39GB out of 63GB.
As I am thinking about where to look first, the disk space drops to 2.28GB free. Now I am very concerned. I know that the C: drive is filling up, this server only has a C: drive, and if the Windows operating system runs out of disk space on the boot drive C:, then the server will crash, and often will no longer boot. Now things are serious, and must be dealt with quickly.
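If you would rather check free space without leaving Management Studio, the old xp_fixeddrives procedure reports the free megabytes on each fixed drive. It is undocumented, but it has shipped with SQL Server for many versions:

```sql
-- Free space, in MB, for each fixed drive on the server.
EXEC master.dbo.xp_fixeddrives;
```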
Now to start hunting, I could dig out several dozen of my favorite troubleshooting queries, or I could just run Database Health Monitor. It is my quick choice for tracking down problems like this.
I start up Database Health Monitor and connect to the SQL Server that is running out of space, to start looking around.
Nothing immediately jumps out. I see 12 databases on this SQL Server, and I see that the PerformanceTrouble database is using more CPU than any other database on the system. Now to start looking at individual databases. The first database listed is called BadDB. For this demo I am using a SQL Server that I use to test the Database Health Monitor application, and I tend to use names that are descriptive of what each database is intended to do.
From this database I can see a few red flags. We learn that the database and log files all exist on the C: drive, and that this could cause the server to run out of disk space if the database grows, but this database isn't very big, it's only a few megabytes, so I am not worried here.
Next I take a look at the Real Time Reports for Backup Sizes:
For this database I can see that it has grown over the last year from around 2 megabytes to around 10 megabytes. All really small. I also notice from the file names that the backups are being saved on the C: drive too.
As I click through the rest of the databases I see very similar results; it's not great, but none of them are very big until I get to the database called PerformanceTrouble. Database Health Monitor shows this overview.
Now this page points out a few things. In bright red, there is a notice showing that the log space allocated to this database is around 24GB, for a database that is only 341MB in size. This is an indication that backups are not being run, or are not being run often enough, transaction log backups in particular. On this same page, I can see that the last full backup was run today at 5:00am. I then click through the rest of the databases and nothing else comes anywhere near this problem.
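A quick way to confirm this kind of log bloat across every database at once is DBCC SQLPERF, which reports the log file size and the percentage of the log actually in use for each database:

```sql
-- Log file size and percent of the log in use, for every database.
DBCC SQLPERF (LOGSPACE);
```

A huge log file that is only a few percent used, on a database in the FULL recovery model, usually means transaction log backups are not running.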
At this point I have spent about 3 minutes running Database Health Monitor, and I have found one problem that is using up about 33% of our entire hard drive.
How do we fix this issue?
First, if I just double-click the Large Log File warning at the bottom of the application, my browser opens with one solution. The solution presented looks a lot like this:
First we run this query to get the names of the log files.
-- check the size of the files.
SELECT size / 128.0 AS sizeMB, name FROM sys.database_files;
From that query we see that the name of the log file is PerformanceTrouble_log; the script below would be updated with your actual database name and log file name. The following script will shrink the log file for this database to around 1MB.
WARNING: Running this script will disconnect all other users from the database. Only run this on a production database if you have no other option.
-- Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE PerformanceTrouble SET RECOVERY SIMPLE;
GO
-- Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (PerformanceTrouble_log, 1);
GO
-- Reset the database recovery model.
ALTER DATABASE PerformanceTrouble SET RECOVERY FULL;
GO
-- Be sure to do a full backup, then kick off transaction log backups.
-- check the size of the files.
SELECT size / 128.0 AS sizeMB, name FROM sys.database_files;
Once the script is run we can see that the log file has been shrunk from 24GB to around 1MB.
When we check the drive space on the server it now looks like this:
Now we are no longer at the critical point where we think the server is going to run out of space, but there are two things that should be done next.
1. Run a full backup of this database. The log was truncated when the recovery model was switched to SIMPLE, so if your SQL Server were to crash before the full backup runs, you would have no way to recover back to the current point in time.
2. Monitor and confirm whether the database log is continuing to grow, or whether this was a one-time event that caused it to grow this large. If you determine that it is still growing, then research and find out what is causing the transaction log to fill up. You can use many of the features of Database Health Monitor to do this research.
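For step 1, a minimal full backup followed by a transaction log backup looks something like this. The backup path here is just a placeholder; point it at a drive other than C: so that the backup files themselves don't fill the boot drive again:

```sql
-- Full backup first (path is a placeholder; use a drive other than C:).
BACKUP DATABASE PerformanceTrouble
TO DISK = N'D:\Backups\PerformanceTrouble_full.bak'
WITH INIT;

-- Then start regular transaction log backups to keep the log from growing.
BACKUP LOG PerformanceTrouble
TO DISK = N'D:\Backups\PerformanceTrouble_log.trn';
```

Scheduling the log backup as a recurring SQL Server Agent job is what actually keeps the log file under control going forward.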
The overall troubleshooting process took about 3 minutes, and the time to fix this issue only took around 2 minutes.