Here is a quest that was submitted on the blog today.
Hello
Just have a question about a table of mine
one of the columns is of type text and its of estimated size of 128gb (99%)
Now I do not have permissions to remove the column, but I was told that I can remove data from it (to save space).
But of course, setting the data to null does not de-allocate the space.
Do you have any tips for how to reclaim the space?
My first comment here is that the text column type is deprecated, meaning that Microsoft has announced that it is going away in a future version of SQL Server. For the purpose of an examples in this post, I have tested on SQL Server 2019.
You will want to understand how the text data is stored. It is always store out of the row. What you have stored in the data page with the row is a 16 byte pointer to the text data. It appears that even if the text data type is null, it stores a pointer to an off row location of which indicates that the row is null.
Lets take a look at how how can clean up some of the space after the text data has been set to an empty string or to null.
First we will start by creating a test databases an creating a table to hold some data.
CREATE DATABASE TextSizeTest;
GO
USE TextSizeTest;
GO
CREATE TABLE bigTable
(
id INTEGER IDENTITY NOT NULL,
textField TEXT NULL DEFAULT ''
);
GO
CREATE CLUSTERED INDEX [idx_id] ON [dbo].[bigTable]
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
Next we will use the master.sys.messages table as a way to grab a bunch of text data and insert it into the sample test column.
INSERT INTO bigTable (textField)
SELECT TOP 60000 text + text + text + text + text + text + text + text + text + text + text + text as textField
FROM master.sys.messages;
SELECT COUNT(*) as NumRows
FROM bigTable;
Note the text + text … where we are just taking the text from sys.messages and replicating it 12 times. We could have used the replicate command to do that also.
Next lets take a look at what is stored in the row versus off row with the following query.
SELECT alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('dbo.bigTable'), NULL, NULL , 'DETAILED')
You can see in the picture above that there are 260 pages of row data, and 11555 pages of LOB data or off row data that is holding all those text strings.
Next we update the textField and set it to an empty string.
UPDATE bigTable SET textField = '';
You can see that the page_count drops when we go with the empty string, but there are still many pages being used. 7419 pages to hold the empty string.
You can also try setting the text field to null and there is no additional space savings.
So what can we do to save space. This is where reorganizing or rebuilding the table will help release some of the lob_data pages and free up space. Lets first try reorganizing the index:
ALTER INDEX [idx_id]
ON [dbo].[bigTable]
REORGANIZE WITH ( LOB_COMPACTION = ON );
SELECT alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(),
OBJECT_ID('dbo.bigTable'), NULL, NULL , 'DETAILED')
Just over half the lob rows disappear. Here is where it gets interesting, the above code to reorg and check the lob_data page count, try running it multiple times.
After reorganizing 15 times the lob_data ended up at 752 pages, I tried a few more reorgs and there was no additional savings. Doing these reorgs took the table in this example for 92mb to 8.4mb.
Running the same test again and REBUILDing the index didn’t end up saving any space, we needed to do the reorg to save that much space.
What happens if you don’t have a clustered index in this example. Well you don’t have the index to rebuild this way, so just throw a clustered index on the table, reorg the indexes multiple times, then drop the clustered index (if you don’t need it any longer).
CREATE CLUSTERED INDEX [idx_id] ON [dbo].[bigTable]
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
GO
ALTER INDEX [idx_id]
ON [dbo].[bigTable]
REORGANIZE WITH ( LOB_COMPACTION = ON );
SELECT alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(),
OBJECT_ID('dbo.bigTable'), NULL, NULL , 'DETAILED')
GO 15
DROP INDEX [idx_id] ON [dbo].[bigTable];
I hope this helps.
-Steve Stedman
More from Stedman Solutions:
Steve and the team at Stedman Solutions are here for all your SQL Server needs.
Contact us today for your free 30 minute consultation..
We are ready to help!
I would not use Text data type for this since it was depreciated from 2005(maybe earlier), more importantly, it does not support quite a lot as what varchar(max) does.
Tony –
I agree on text data type being deprecated, and that varchar(max) is a much better option, however the request specifically stated that the table is using the text type and that they didn’t have permissions to change that.
It would indeed be crazy to use a the text data type on any modern system, but there are many older systems out there, even those that were created on SQL Server 2000 or older that have been migrated to newer versions of SQL Server and that do use the text data type.
Thanks for the feedback, I appreciate it.
-Steve Stedman