Skip to content

SQL 2019 Page and Row Compression with XML and JSON

On a week long theme on XML and JSON in SQL Server, this is the second post. After the first post of just showing how to store these values in the database I am going to take a look at how big the data is and how much we can compress each one with row and page level compression.

Lets take the same table from the first post, and fill it full of 10,000 rows of data.

CREATE TABLE XmlAndJson
(
	id INTEGER IDENTITY NOT NULL,
	XmlData XML,
    JsonData NVARCHAR(MAX)  
);
GO
ALTER TABLE XmlAndJson
  ADD CONSTRAINT [JsonData record should be formatted as JSON]
  CHECK (ISJSON(JsonData)=1);



GO
SET NOCOUNT ON;
DECLARE @XmlData as XML;
set @XmlData = '<employees>
  <employee>
    <firstName>Steve</firstName>
	<lastName>Stedman</lastName>
  </employee>
  <employee>
    <firstName>Derrick</firstName>
	<lastName>Bovenkamp</lastName>
  </employee>
  <employee>
    <firstName>Bill</firstName>
	<lastName>Simpkins</lastName>
  </employee>
</employees>';

DECLARE @JsonData as VARCHAR(MAX);
set @JsonData = '
{"employees":[
  { "firstName":"Steve", "lastName":"Stedman" },
  { "firstName":"Derrick", "lastName":"Bovenkamp" },
  { "firstName":"Bill", "lastName":"Simpkins" }
]}';


INSERT INTO XmlAndJson(XmlData, JsonData)
VALUES (@XmlData, @JsonData );
GO 10000
SET NOCOUNT OFF;

Note the GO 10000 line near the bottom. This tells SQL Server to run the batch above 10,000 times, adding 10,000 rows into the table.

Lets take a look at how many data pages are in the table after adding the 10,000 rows with the following query that we will use to assess the table size (in pages).

SELECT  OBJECT_SCHEMA_NAME(s.object_id) schema_name,
        OBJECT_NAME(s.object_id) table_name,
        SUM(s.used_page_count) used_pages,
        SUM(s.reserved_page_count) reserved_pages
FROM    sys.dm_db_partition_stats s
JOIN    sys.tables t
        ON s.object_id = t.object_id
GROUP BY s.object_id
ORDER BY schema_name,
        table_name;

The table as shown used 842 pages, with a total allocation of 849 pages. Lets see what we can do with row and page level compression. First we will try row level compression on SQL Server 2019.

ALTER TABLE dbo.XmlAndJson REBUILD PARTITION = ALL 
WITH  (DATA_COMPRESSION = ROW) 

After altering the table with row compression we run the check again to see how many pages we have allocated. The page count drops slightly from 849 reserved pages to 811 reserved pages, which is about a 4.5% savings.

Next we try page level compression. Since the compression uses a pattern type compression where the more similar values there are the better it compresses, lets see how well that works.

First we turn on page level compression for our table.

ALTER TABLE dbo.XmlAndJson REBUILD PARTITION = ALL 
WITH  (DATA_COMPRESSION = PAGE) 

Now to check the compression…

The 842 reserved pages dropped to 91 reserved pages, and the used pages dropped to 26. This give us an 89.2% savings on those data types.

Now, which compresses better, the XML or the JSON. Another try with only filling in one value or the other produced the following results.

  • XML 10,000 rows, with no compression takes up 305 pages, with page compression 91 pages. 71% compression savings.
  • JSON 10,000 rows, with no compression takes up 457 pages, with page compression 91 pages. 81% compression savings.

Overall the JSON appears to compress about 10% better than the XML, however the XML starts out smaller.

Next we go for a wider variety of text instead of using the same values for each row, I am going to shake it up a bit by using newid() to generate a random string and chop it to 10 characters, and use that to populate the table with more random XML and JSON data.

SET NOCOUNT ON;
DECLARE @name as varchar(100);
SET @name = LEFT(cast(newid() as varchar(100)), 10);
DECLARE @XmlData as XML;
set @XmlData = '<employees>
  <employee>
    <firstName>' + @name  + '</firstName>
	<lastName>' + @name  + '</lastName>
  </employee>
  <employee>
    <firstName>' + @name  + '</firstName>
	<lastName>' + @name  + '</lastName>
  </employee>
  <employee>
    <firstName>' + @name  + '</firstName>
	<lastName>' + @name  + '</lastName>
  </employee>
</employees>';

DECLARE @JsonData as VARCHAR(MAX);
set @JsonData = '
{"employees":[
  { "firstName":"' + @name  + '", "lastName":"' + @name  + '" },
  { "firstName":"' + @name  + '", "lastName":"' + @name  + '" },
  { "firstName":"' + @name  + '", "lastName":"' + @name  + '" }
]}';


INSERT INTO XmlAndJson(XmlData, JsonData)
VALUES (NULL, @JsonData );
GO 10000

After that change the compression ratio is much less, but it still has some savings. From 529 pages total to 467 pages or a 12.8% compression ratio. Your actual compression will vary depending on how much variety you have in your XML or JSON. Either way we still end up with some savings, which is better than not compressing the data.

Summary, you can have significant saving on SQL Server 2019 in your XML or JSON data values by using page level compression. Some savings with row level compression, but more with page level compression. This applies to Standard, Enterprise, and Express editions of SQL Server 2019.

Related Links:

 

More from Stedman Solutions:

SteveStedman5
Steve and the team at Stedman Solutions are here for all your SQL Server needs.
Contact us today for your free 30 minute consultation..
We are ready to help!

Leave a Reply

Your email address will not be published. Required fields are marked *