CTE Hierarchy compared to the alternative

After my CTE presentation at SQL Saturday 108 in Redmond, I was asked many questions, and received several great suggestions from people.  Based on that feedback, I am updating my presentation for SQL Saturday 114 in Vancouver to include some additional content.

One question was how does the performance compare between a recursive CTE to generate a hierarchical tree path listing and a query using self JOINs and UNION ALL to generate similar results.  To test this I created a simple table for departments in an online store that contained the following rows.

The I wrote two queries to generate a tree path for all departments in the list.

First the CTE Query:



-- Performance Differences 
-- Remember Ctrl+M to turn on Execution Plan 
-- Recursive CTE compared to Multiple Self Joins 
;WITH departmentcte(deptid, department, parent, LEVEL, treepath) AS
( SELECT id AS deptid, department, parent, 0 AS LEVEL,
CAST(department AS VARCHAR(1024)) AS treepath
FROM departments
WHERE parent IS NULL
UNION ALL -- and now for the recursive part  
SELECT d.id AS deptid, d.department, d.parent,
departmentcte.LEVEL + 1 AS LEVEL,
CAST(departmentcte.treepath + ' -> ' +
CAST(d.department AS VARCHAR(1024))
AS VARCHAR(1024)) AS treepath
FROM departments d
INNER JOIN departmentcte
ON departmentcte.deptid = d.parent
)
SELECT deptid, treepath
FROM departmentcte
ORDER BY treepath;

 

 

Then the Non CTE Query (ugly):



-- Multiple Self Joins Unioned 
-- Difficult to display Parent categories 
SELECT d.id AS deptid, d.department AS treepath
FROM departments d
WHERE d.parent IS NULL
UNION ALL
SELECT da2.id AS deptid,
da1.department + ' -> ' +
da2.department AS treepath
FROM departments da1
INNER JOIN departments da2 ON da1.id = da2.parent
WHERE da1.parent IS NULL
UNION ALL
SELECT db3.id AS deptid,
db1.department + ' -> ' +
db2.department + ' -> ' +
db3.department AS treepath
FROM departments db1
INNER JOIN departments db2 ON db1.id = db2.parent
INNER JOIN departments db3 ON db2.id = db3.parent
WHERE db1.parent IS NULL
UNION ALL
SELECT dc3.id AS deptid,
dc1.department + ' -> ' +
dc2.department + ' -> ' +
dc3.department + ' -> ' +
dc4.department AS treepath
FROM departments dc1
INNER JOIN departments dc2 ON dc1.id = dc2.parent
INNER JOIN departments dc3 ON dc2.id = dc3.parent
INNER JOIN departments dc4 ON dc3.id = dc4.parent
WHERE dc1.parent IS NULL
UNION ALL
SELECT dd3.id AS deptid,
dd1.department + ' -> ' +
dd2.department + ' -> ' +
dd3.department + ' -> ' +
dd4.department + ' -> ' + dd5.department AS treepath
FROM departments dd1
INNER JOIN departments dd2 ON dd1.id = dd2.parent
INNER JOIN departments dd3 ON dd2.id = dd3.parent
INNER JOIN departments dd4 ON dd3.id = dd4.parent
INNER JOIN departments dd5 ON dd4.id = dd5.parent
WHERE dd1.parent IS NULL

ORDER BY treepath
;

 

The two queries were confirmed to produce the same results:

But when looking at the actual execution plan for the queries, there is a very big difference.  The self JOIN, and UNION ALL query took 13 times as long to run as the CTE query did.

Results:

The CTE compared to the self JOIN with UNION ALL is a double win, first the CTE is much easier to read, second the CTE is much faster with 1/13th of the time it took to execute the self JOIN, UNION ALL solution. Also in the examples shown we showed the queries were used to show up to 5 levels of categories. If we wanted to go further than 5 levels, the difference gets even worse between the CTE and the self JOIN, UNION ALL solution.

Related Links:

Common Table Expressions Book

If you enjoyed this posting, and want to learn more about common table expressions, please take a look at my book on CTE’s at Amazon.com. The book is titled Common Table Expressions – Joes 2 Pros® – A CTE Tutorial on Performance, Stored Procedures, Recursion, Nesting and the use of Multiple CTEs.

A Great Day at SQL Saturday 108 in Redmond.

WOW, what a great day at SQL Saturday 108 in Redmond.  I started out with a keynote presentation from Buck Woody.  Then I delivered my  presentation on Common Table Expressions.  Which I think went well.  There were some great questions and I look forward to refining my presentation to include some of the questions that I was asked at the end of the session.

After my presentation I was able to enjoy several other sessions.

Mark Simms (@mabsimms) – The Elephant in the Room (Hadoop for DBAs)

Donabel Santos (@SQLbelle) – SQL Server Reporting Services – Beyond the Basics

Aaron Nelson (@SQLvariant)- PowerShell for Data Professionals

Kevin Kline (@KeKline) – Top 10 Admin Mistakes on SQL Server

 

Here is one of my mementos for the day.

 

Thanks everyone for all the great feedback.

 

All of the content was great today.  Thanks to Microsoft for hosting this event on their campus.

 

Now to start preparing for SQL Saturday in Vancouver in March.

KeePass – for security and time savings

After a recent tour at a couple companies who are using the LEAN process, I have decided to take on LEAN in my everyday work.

This is the first in what I think will be many videos on the Lean Office topic.

Click the link below to watch the video.

Lean2SecondKeePass

 

For more info on KeePass see the detailed article with instructions on using KeePass.

 

Follow hash tag #leanOffice on Twitter

KeePass – to improve your process

Before proceeding with the article, please take this quick password safety test.

Password Safety Test – Score one point for every YES answer.:

  1. Do you write your passwords down?  ( YES  /  NO )
  2. Do you ever re-use a password on multiple systems or websites?  ( YES  /  NO )
  3. Are your passwords shorter than 8 characters?  ( YES  /  NO )
  4. Do you use dictionary words as your password or part of your password? ( YES  /  NO )
  5. Do you ever use family or pet names as your password?  ( YES  /  NO )
  6. If you lift up your keyboard right now, will you find one or more passwords? (YES  /  NO )

Scoring:

If you answered the above questions with YES more than 5 times, then your use of passwords are extremely insecure and you may be in danger, you may be in serious trouble, keep reading and find out what you can do to fix this.

If you answered YES more than 3 times then you passwords are insecure and could lead to trouble for you, keep reading and find out what you can do to fix this.

If you answered YES at least once, then keep reading.  You are not doing bad, but you could use some help.

 

KeePass is a free open-source password manager and safe which helps you keep track of all of your passwords.  KeePass will help you be more secure in your use of passwords.

Too Many Passwords:

Prior to using KeePass I would need to remember dozens of passwords for all of the different websites and systems that I regularly use, along with the password for the systems that you rarely use.  These passwords usually ended up on sticky notes on the side of my monitor or under my keyboard.

Re-using passwords:

It is very common for someone to use the same password again and again at every website they signup at.  This alone is a security risk, for instance if you sign up with PayPal with one username and password combination, then you sign up some online store to buy something.  If you were to use the same password, then the company with the online store would know your PayPal password.  Even worse than that, if one website with your password on it was hacked, and the password list was stolen, then those hackers would have access to a password that matched your PayPal account.

Forgetting your passwords:

Have you ever gone on vacation for a couple weeks, then when returning home or to work, you realized that you had forgotten one of your commonly used passwords.  Have you ever attempted to configure you email account on a new computer and realized that you entered your email account when you set up your old computer, and haven’t typed it since, and have no idea what that password is.  KeePass can help with this.

KeePass is a tool that I have been using for a while now.  When I first started using it, I wasn’t too sure if it would work for me, but it turned out great.  This article outlines the cool things that I have discovered while using KeePass.  This is not a paid advertisement of any kind, just my opinion and nothing more.

The main KeePass page is shown below.  You can build a hierarchy of passwords and organize them into groups or folders.  You can also search on a term if you forget which group your password is in.

 

All it takes to get started is to pick the category that you want to add a password to, then click to add an entry.  The screen below will be shown.  Passwords are generally hidden with ******* since you don’t have a reason to see the password, you just need to copy it to the clipboard and paste it into a page.

 

Another common problem that is fixed by KeePass, is the generation of strong passwords.  Strong passwords are passwords that would not be easily guessed by someone trying to break into your account.  The challenge is creating strong passwords that you can easily remember.  KeePass eliminates the need to remember any passwords, beyond the one password required to get into your KeePass system.

Rules of thumb when choosing passwords:

  • Avoid using dictionary words.
  • Never use your account name as its password.
  • Use different passwords for each machine.
  • Don’t use common misspellings of dictionary words (including replacing “i” with “1”).
  • Don’t use your first or last name in any form.
  • Don’t use your spouse’s or child’s name.
  • Don’t use your pet’s name.
  • Don’t use a password shorter than 8 characters.
  • Don’t use any “method” for creating passwords.

There are many other rules around choosing passwords, but to keep it simple the longer and more random the password, the more secure it will be.

The chart below shows the cracking time based on the length of the password.

 

KeePass provides tools for generating passwords for you, completely at random, using your keyboard and mouse for random input on the password.

Here are some examples of passwords generated by KeePass:

XUkaojtVwlMxHAZi

9YIlMRz1edPcieagLLpFipnJs8VJ4JUoyGk6Ay01

8A3FpRnkyv08oMS3CVhj6qRX3YFffRcST6ZpCYbw

After looking at these passwords you might think “How can I ever remember a password like that”.  The simple answer is that you won’t, but you don’t need to.   Just have KeePass store the password for you, then when you need it, copy it to the clipboard and paste it into the password field.

AutoType:

KeePass has a nice feature called autotype.  With this you can define the sequence that you passwords should be entered with usernames.  For instance an AutoType sequence of {USERNAME}{TAB}{PASSWORD}{ENTER} would send your username, hit tab to go to the next field, then enter your password and hit enter to submit the form.  This is very useful for forms that require multiple lines of input to log in.

 

With KeePass you set the master password for the whole password safe.  Once this has been set, you need to enter this password every time that you start KeePass.  If KeePass is minimized to an icon, you need to enter this password to restore it.

 

So what passwords do I need to remember?

You will only need to remember one password, the password used to use the KeePass program.  You can choose this password, keep in mind that you will be typing this password often as you use the system.

 

Click here to see a video on KeePass and how it applies to LEAN Process.

Calculating Factorials with a Recursive CTE

What is a Factorial:

The product of an integer and all the integers below it; e.g., factorial four (4!) is equal to 24.

The factorial of a positive integer n, written n!, is the product of all the positive integers from 1 up to and including n Example: 1! = 1 2! = 1 * 2 = 2 3! = 1 * 2 * 3 = 6 4! = 1 * 2 * 3 * 4 = 24

It is simple to do with a recursive function, and actually commonly used as a programming interview question to determine if you understand recursion.  For instance, years ago, I was asked the following during an interview at Microsoft.

What is recursion? Calculate factorial numbers using a recursive function?

Which is very easy to answer using C, which was the programming language of choice when I was asked it.  But what if you were asked that on an intervie for a SQL Server DBA, or SQL Server Programmer job.  Your first answer might be to use a recursive stored procedure, which would be a good answer, but what if you were asked the following, how would you answer it?

What is recursion?  Using TSQL write a query to calculate factorials without creating any stored procedures or functions.

Think, think, think…. You want the job, and that is the question that the interviewer is asking you.  How do you solve it?

  1. First, make sure you understand factorials.  If you don’t understand what a factorial is you won’t get anywhere with this.
  2. Then write the query, using a Recursive Common Table Expression.

Here is how I solved it.

First, get started with a simple CTE that just calculates the first factorial 1! as shown below.  Pretty simple at this point.  The result set is correct, the Factorial of 1 is just 1.

Next, I would add the recursive CTE query to continue with the factorial calculations beyond 1!, this time going up to 5!  When working with recursive CTE‘s you usually want to have an escape clause to avoid too much recursion.  In this example the CTE exits at 5.

That’s nice, but lets go further.  Lets go for 20!  Which you will see in the image below caused some problems as shown with an error of “Arithmetic overflow error converting expression to data type int.”

Now to fix the overflow.  It turns out that Factorial of 20 gets big pretty quick, and an integer only can hold a number up to just over 2 Billion. But a BIGINT can go much further, to just over 9 Quintillion (US).  So to fix this we cast the factorial column to be a BIGINT rather than an INT.

INT

Range: -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647) Space: 4 Bytes

BIGINT

Range: -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) Space: 8 Bytes

So here we go with an attempt to get to 20! , which works great.

But how far can we go with the BIGINT 9 Quintillion limit of BIGINT.  As it turns out 21! exceeds the size of a BIGINT.

So what now?   What if we want to calculate more than 20! using SQL Server?     Cast it to a NUMERIC(38,0) which gives us 38 digits to work with as shown here.

Now what???

I haven’t been able to find a solution to do math and store the results larger than a NUMERIC(38,0).

Here is the final query.

WITH factorial (n, factorial)
 AS (SELECT 1,
 CAST(1 AS NUMERIC(38, 0)) -- Cast to BIGINT to avoid overflow
 UNION ALL -- here is where it gets recursive
 SELECT n + 1,
 ( n + 1 ) * factorial
 FROM factoria
 -- reference back to the CTE
 WHERE n < 33 -- abort when we get to 33!
 )
SELECT n,
 factorial
FROM factorial;

Although you could create a stored procedure in sql to calculated factorial, the CTE is a much easier way of doing it.

I hope you have found this useful.

CTE – With An Insert Statement

Queries with Common table expressions (CTE) are made up of two parts, the CTE part, and the SQL that references the CTE.  In preparation for SQL Saturday, the question came up of can you use an INSERT or UPDATE statement with a CTE.  Referring to the documentation I confirmed that using an insert or update inside of the CTE is invalid, but you can use an insert or update statement outside of the CTE.

 

For Example.


DECLARE @NumTableVar TABLE( n INT);

 

;WITH numbers (n)
AS (SELECT 1
UNION ALL
SELECT 1 + n
FROM   numbers
WHERE  n < 1000)

INSERT INTO @NumTableVar   (n)
SELECT n
FROM   numbers
OPTION (MAXRECURSION 0);


SELECT *
FROM   @NumTableVar;

When run confirms that you can use the insert statement with a CTE, but not inside of a CTE.

 

This would be very useful if you had just created a table and wanted to fill it up quickly for testing purposes.