There are many great reasons to use Common Expressions, from derived table query re-use, to doing recursive queries and creating hierarchies to doing data paging, cleaner code, or de-duplication of data.
I am curious what you are using CTE’s for?
Please post a reply with the top things you use CTE’s for.
Day 15 of Common Table Expression Month (June) at SteveStedman.com, today we will be taking a look at data paging with CTE’s in a function. Yesterday we took a look at CTEs for Data Paging, today we will build on what we used yesterday.
These queries will be using that database that was set up in a previous posting on the CTE_DEMO Sample Database, if you haven’t set up the sample database, download it and set it up now.
The concept of data paging is taking a large set of results and breaking it up into smaller pages of result sets. For instance if you search your favorite internet search engine for a term, you are usually shown results 1 to 20 of X where X is a very large number. For instance page 1 is results 1 to 20 of 652,443 results.
CTE Data Paging
declare @pageNum as int;
declare @pageSize as int;
set @pageNum = 2;
set @pageSize = 10;
;WITH TablesAndColumns AS
(
SELECT OBJECT_NAME(sc.object_id) AS TableName,
name AS ColumnName,
ROW_NUMBER() OVER (ORDER BY OBJECT_NAME(sc.object_id)) AS RowNum
FROM sys.columns sc
)
SELECT *
FROM TablesAndColumns
WHERE RowNum BETWEEN (@pageNum - 1) * @pageSize + 1
AND @pageNum * @pageSize ;
Now lets wrap it in a procedure to simplify things.
CREATE PROCEDURE TablesAndColumnsPager @pageNum int, @pageSize int
AS
BEGIN
SET NOCOUNT ON;
;WITH TablesAndColumns AS
(
SELECT OBJECT_NAME(sc.object_id) AS TableName,
name AS ColumnName,
ROW_NUMBER() OVER (ORDER BY OBJECT_NAME(sc.object_id)) AS RowNum
FROM sys.columns sc
)
SELECT *
FROM TablesAndColumns
WHERE RowNum BETWEEN (@pageNum - 1) * @pageSize + 1
AND @pageNum * @pageSize ;
END
Then to call the paging function it is a bit cleaner.
If you enjoyed this posting, and want to learn more about common table expressions, please take a look at my book on CTE’s at Amazon.com. The book is titled Common Table Expressions - Joes 2 Pros® - A CTE Tutorial on Performance, Stored Procedures, Recursion, Nesting and the use of Multiple CTEs.
Day 14 of Common Table Expression Month (June) at SteveStedman.com, today we will be taking a look at data paging with CTE’s and how it compares to the offset and fetch in T-SQL 2012.
These queries will be using that database that was set up in a previous posting on the CTE_DEMO Sample Database, if you haven’t set up the sample database, download it and set it up now.
The concept of data paging is taking a large set of results and breaking it up into smaller pages of result sets. For instance if you search your favorite internet search engine for a term, you are usually shown results 1 to 20 of X where X is a very large number. For instance page 1 is results 1 to 20 of 652,443 results.
Data Paging – Before SQL 2012
First lets take a look at a query with no data paging Shows all tables on this database, and all columns, and returns many rows.
SELECT OBJECT_NAME(sc.object_id) as TableName,
name as ColumnName
FROM sys.columns sc
ORDER BY OBJECT_NAME(sc.object_id);
When this is run we get 702 rows in the result set:
Now to introduce data paging with a CTE, Here we have a CTE example to get data paging. This example assumes that the page size is 10 the first page would display the first 10 rows, the the second page would display rows 11 to 20, and the third page would display rows 21 to 30. Here we are using the ROW_NUMBER() function with the OVER clause for the windowing functionality to get the row number from the output. In a non-CTE query you can’t use ROW_NUMBER() in the where statement, but if it is wrapped in a CTE, and given a column alias we can then use it in the WHERE clause selecting from the CTE.
declare @pageNum as int;
declare @pageSize as int;
set @pageNum = 2;
set @pageSize = 10;
;WITH TablesAndColumns AS
(
SELECT OBJECT_NAME(sc.object_id) AS TableName,
name AS ColumnName,
ROW_NUMBER() OVER (ORDER BY OBJECT_NAME(sc.object_id)) AS RowNum
FROM sys.columns sc
)
SELECT *
FROM TablesAndColumns
WHERE RowNum BETWEEN (@pageNum - 1) * @pageSize + 1
AND @pageNum * @pageSize ;
When we run this we get the following results for page 2:
Now if we change the @pageNum variable to be 3 we get the following:
SQL 2012 Data Paging
If you are using SQL Server 2012, there is a new option added to the SELECT statement, this is called OFFSET and FETCH. You have an option to do data paging without the CTE, the OFFSET and FETCH option allows you to do the following:
</pre>
declare @pageNum as int;
declare @pageSize as int;
set @pageNum = 2;
set @pageSize = 10;
SELECT OBJECT_NAME(sc.object_id) AS TableName,
name AS ColumnName
FROM sys.columns sc
ORDER BY TableName
OFFSET (@pageNum - 1) * @pageSize ROWS FETCH NEXT @pageSize ROWS ONLY;
<pre>
Which produces similar results for page 2 as the CTE version as shown here:
Which is Better?
The only correct answer is “It Depends”. It depends on a number of things. First if you are writing code that needs to run on an older version of SQL Server than 2012, then you can’t use the OFFSET and FETCH way 0f doing it, your best option is the CTE data paging. If you are running on only SQL Server 2012 or newer, then you have a choice. The size of your result set, the design of your tables, the number of JOINs in the query will all impact the overall performance. I would suggest if you are using SQL 2012 and you have a choice, code it both ways, and performance test it. See which runs best base on your specific query.
If you enjoyed this posting, and want to learn more about common table expressions, please take a look at my book on CTE’s at Amazon.com. The book is titled Common Table Expressions - Joes 2 Pros® - A CTE Tutorial on Performance, Stored Procedures, Recursion, Nesting and the use of Multiple CTEs.
Quite often when working testing different performance measures from the Database Health Reports to testing Resource Governor configuration I end up in a situation where I need to emulate a high CPU load on my test SQL Server. In this case it is usually my desktop or a Hyper-V virtual machine, where I want to see how things will work if I had a nearly 100% CPU load. In the real world you would just have to get millions of users to visit your website that has a SQL Server backend, that its not that easy in my development / test lab.
Here is that I came up with for a plan. Keeping in mind the goal here is to write queries that will use up as much CPU as possible. This is counter-intuitive, completely against everything that I practice on a daily basis, but here goes.
First create a table with poor design. Using UNIQUEIDENTIFIERS for a primary key and a foreign key (parent_id) is probably ugly enough.
CREATE TABLE SplitThrash
(
id UNIQUEIDENTIFIER default newid(),
parent_id UNIQUEIDENTIFIER default newid(),
name VARCHAR(50) default cast(newid() as varchar(50))
);
Next we fill the table up with lots and lots of rows, specifically 1,000,000 rows, remember here the goal is to simulate CPU load. If this isn’t enough I often times run this script several times. Keep in mind the GO statement followed by a number says to execute the batch that many times.
SET NOCOUNT ON;
INSERT INTO SplitThrash DEFAULT VALUES;
GO 1000000
Next, this part makes me just feel nasty. Create a CLUSTERED index on the table that we just filled up, and cluster on BOTH columns that were UNIQUEIDENTIFIERS.
CREATE CLUSTERED INDEX [ClusteredSplitThrash] ON [dbo].[SplitThrash]
(
[id] ASC,
[parent_id] ASC
);
At this point is is a bit ugly, but it still doesn’t use much memory. You are probably wondering why I called the table split thrash. I gave it this name so that updating the UNIQUEIDENTIFER would cause as many page splits or new page allocations as possible. So we update the parent_id which is part of the clustered index
</pre>
UPDATE SplitThrash
SET parent_id = newid(), id = newid();
GO 100
<pre>
This update statement causes chaos in the page structure for the table as updating the unique identifiers causes quite a bit of processor work.
On my wimpy VM for this development environment I need to repeat this entire process creating 4 or 5 tables, and doing the update in 4 or 5 SSMS windows in order to use up all of the CPU on the database.
Don’t try this on a production system, but it is a great test to run on a development server.
Day 12 of Common Table Expression Month (June) at SteveStedman.com, today we will be taking a look at how to use multiple nested CTEs in a query to use up all the memory on your SQL Server.
These queries will be using that database that was set up in a previous posting on the CTE_DEMO Sample Database, if you haven’t set up the sample database, download it and set it up now.
How to use up all of your SQL Server available memory with a single CTE query
The article could be named, “How to use up all of your SQL Server available memory with a single CTE query.” Another name for the article could just be “SQL Server Bug Report” depending on how you look at it.
When presenting unleashing Common Table Expressions at SQL Saturday a while back, I was asked a couple of great questions that I didn’t know the answer to. So I did the research and tracked it down:
1. How many levels of recursion can you have in a CTE?
2. How many levels of nesting can you have in a CTE?
So I started doing the research and doing some testing to figure it out.
How many levels of recursion can you have in a CTE?
This is the easier one to answer between the two questions. The answer is, you can have more levels of recursion that you would ever need for standard recursion. I have tested CTEs with up to 1 million levels of recursion, and the have performed pretty well. If you are writing queries that need more than 1 million levels of recursion, perhaps you should take a look at a different approach.
How many levels of nesting can you have in a CTE?
This is where it gets really interesting. SQL Server does a really good job with overall performance on CTEs, but where it completely breaks down is on deep nested CTE queries. By deep nested I mean more than a thousand or two thousand. To answer the question of how many levels can you have, I would answer this as you can nest more than you would ever really need. If you really need more than a thousand levels of nesting in a CTE, you might want to rethink your approach.
On SQL Server 2005 the limit is 255. But in SQL Server 2008 and newer this limit was extended, and appears to not have a fixed limit, rather the limit is based on the amount of memory available for the query to use.
So to build the crazy CTE, I used excel to build out several thousands of rows of nested CTEs. One calling another, calling the next, and so on. What I found was that this was a very easy way to use up almost all the memory on your database. DO NOT TRY THIS ON A PRODUCTION DATABASE. The reason that DBAs and developers have test or development databases is to play around with things that may be dangerous on a production server. This is one that you could try on a test server.
The other interesting thing that the query does when it uses up all the memory, somehow it dumps some of the connections that are currently active on the SQL Server at that point.
Watch the video to find out how a nested CTE query could use up all of the memory on your SQL Server.
If anyone has a test server with 512GB, or 1TB of available, memory, I would love to see how this type of query performs. Give it a try and let me know.
You can download the script NestedCTE.zip here. The zip file contains 2 files, one called CTE.sql, and one called CTE2500.sql. The 2500 version is the one that I used in the demo, and the CTE.sql has 32767 nestings. I would love to see the bigger on run on a server with a TB or more of memory to see how it performs.
If you enjoyed this posting, and want to learn more about common table expressions, please take a look at my book on CTE’s at Amazon.com. The book is titled Common Table Expressions - Joes 2 Pros® - A CTE Tutorial on Performance, Stored Procedures, Recursion, Nesting and the use of Multiple CTEs.