SQL Sunday Fun – Word Search with New T-SQL 2012 Functionality

 

 

new_sql2012_word_jumbleClick on the grid to enlarge, then print it.

Find the following words in the grid up, down, left, right, horizontal, vertical, and diagonal.

ROWS  RANGE  PRECEDING 
FOLLOWING  UNBOUNDED  IIF 
CHOOSE  OFFSET  FETCH 
FORMAT  CONCAT  SEQUENCE 
PARSE  TRYPARSE  TRYCONVERT 
THROW  LEAD  LAG 
FIRSTVALUE  LASTVALUE  PERCENTRANK 
CUMEDIST  PERCENTILEDISC  PERCENTILECONT 

NOTE:  some of these terms normally have an underscore in them, but for the purpose of the word scramble the underscores have been omitted.

 

Related Links:

SEQUENCE object

CHOOSE function

Analytics - Analytic Functions – PERCENT_RANK

Analytics PERCENTILE_DISC and PERCENTILE_CONT

IIF function in TSQL.

TSQL Analytic Functions LEAD and LAG

OVER clause with ROWS PRECEDING and FOLLOWING in TSQL 2012

OFFSET / FETCH in a SELECT statement

Cumulative Distribution Function (CDF) or as it is known in TSQL CUME_DIST()

 

 

Happy Birthday SQL Server 2012

It has been a year since SQL Server 2012 released.

There were some fun posts on Twitter about the SQL Server 2012 birthday or anniversary. For instance:

declare @greetings varchar(100)

SET @greetings= 'Happy Birthday @SQLServer 2012'

SELECT @greetings

and another anniversary query (which doesn’t actually run):

SELECT Happy FROM Anniversary WHERE Year = 1 AND Version = '11.0.2100.60';

1 year out since SQL 2012 released, and here are some things to note:

There is no more Business Intelligence Development Studio (BIDS) in SQL Server 2012, instead it has been replaced with the SQL Server Data Tools that uses the core of Visual Studio 2010, with the Report Builder and Report Designer built in to the new Visual Studio.

SQL Server 2012 introduced a new T-SQL object used to generate numbers, a SEQUENCE, similar to an identity column.

SQL Server 2012 introduced the IIF statement. It is very similar to the IIF statement in MS Access.

In SQL Server 2012 there is a new function called CHOOSE that takes in an offset, and a list of options.

The OFFSET and FETCH functionality was added  to achieve data paging server side in TSQL.

Other Notable features:

I hope you are able to get as much out of SQL Server 2012 as I am.  Happy Birthday to SQL Server 2012.

My Top 10 TSQL Enhancements in SQL Server 2012

So far after playing around with SQL Server 2012 there are many new features to SQL.  Overall I think my favorites are in the Analytic Functions category.  Here is my breakdown of the top 10 transact SQL enhancements to SQL Server 2012.

10.  New SEQUENCE object

9.  CHOOSE function

8.  Analytics - Analytic Functions – PERCENT_RANK

7.  Analytics PERCENTILE_DISC and PERCENTILE_CONT

6. IIF function in TSQL.

5. TSQL Analytic Functions LEAD and LAG

4.  OVER clause with ROWS PRECEDING and FOLLOWING in TSQL 2012

3.  OFFSET / FETCH in a SELECT statement

2.  LAST_VALUE and FIRST_VALUE

1.  Cumulative Distribution Function (CDF) or as it is known in TSQL CUME_DIST()

Whats your favorite?

More TSQL 2012 Analytics PERCENTILE_DISC and PERCENTILE_CONT

Like the other new Analytic functions, PERCENTILE_DISC and PERCENTILE_CONT require the use of the OVER clause.

For this example I will be using almost the same revenue table in the sample database that I set up for the LEAD and LAG posting earlier in the week, and the PERCENT_RANK posting yesterday, just a few values changed to show the differences between these two functions.


CREATE DATABASE [analytics_demo];
GO

USE [analytics_demo];

-- same Revenue Table used in previous examples of the OVER clause

CREATE TABLE REVENUE
(
[DepartmentID] int,
[Revenue] int,
[Year] int
);

insert into REVENUE
values (1,20000,1999),(2,60000,1999),(3,49000,1999),
 (1,40000,2000),(2,40000,2000),(3,60000,2000),
 (1,30000,2001),(2,30000,2001),(3,700,2001),
 (1,90000,2002),(2,20000,2002),(3,80000,2002),
 (1,10300,2003),(2,1000,2003), (3,900,2003),
 (1,10000,2004),(2,10000,2004),(3,10000,2004),
 (1,20000,2005),(2,20000,2005),(3,20000,2005),
 (1,40000,2006),(2,30000,2006),(3,300,2006),
 (1,70000,2007),(2,40000,2007),(3,40000,2007),
 (1,50000,2008),(2,50000,2008),(3,42000,2008),
 (1,20000,2009),(2,60000,2009),(3,600,2009),
 (1,30000,2010),(2,70000,2010),(3,700,2010),
 (1,80000,2011),(2,80000,2011),(3,800,2011),
 (1,10000,2012),(2,90000,2012),(3,900,2012);

With these two percentile functions the main difference is that PERCENTILE_CONT interpolates the appropriate value, even if it is not in the set, and PERCENTILE_DISC picks a percentile that exists in the set.  Keep in mind that any nulls in the data set are ignored.  The CONT stands for continuous, and DISC stands for discrete distribution which means that the percentile will exist in the set.

Here is how they work.

First PERCENTILE_CONT to calculate the 90th percentile:


-- now on to PERCENTILE_CONT to calculate the 90th percentile
select DepartmentID, Revenue, Year,
 PERCENTILE_CONT(.9)
 WITHIN GROUP(ORDER BY Revenue)
 OVER(PARTITION BY DepartmentID) as Percentile90
 from REVENUE
order by DepartmentID;

which produces these results:

From here you can see that the 90th percentile is 77000, but there is no 77000 in the set, thus the continuous distribution.

Then PERCENTILE_DISC to calculate the 90th percentile (Discrete distribution):


-- now on to PERCENTILE_DISC to calculate the 90th percentile
select DepartmentID, Revenue, Year,
 PERCENTILE_DISC(.9)
 WITHIN GROUP(ORDER BY Revenue)
 OVER(PARTITION BY DepartmentID) as Percentile90
 from REVENUE
order by DepartmentID;

From the output can can see that the 90th percentile for the Revenue at DepartmentID of 1 is 80000, meaning that 90% of the values in the set are less than or equal to 80000.  Since the 90th percentile is calculated using values in the set when using PERCENT_DISC, this gives you a different value than the previous example with PERCENT_CONT.

Not on to calculating the median and comparing DISC and CONT


-- calculating the median with PERCENTILE_DISC and PERCENTILE_CONT
 -- http://stevestedman.com/?p=1533
 select DepartmentID, Revenue, Year,
 PERCENTILE_DISC(.5)
 WITHIN GROUP(ORDER BY Revenue)
 OVER(PARTITION BY DepartmentID) as MedianDisc,
 PERCENTILE_CONT(.5)
 WITHIN GROUP(ORDER BY Revenue)
 OVER(PARTITION BY DepartmentID) as MedianCont
 from REVENUE
order by DepartmentID;

With this example you can see below that Department 1 and Department 2 have the same value for the PERCENTILE_DISC(.5) and PERCENTILE_CONT(.5), but when you look at Department 3 you can see that are very different numbers based on the distribution of this set.

So overall the thing to remember here is the difference between CONT and DISC, in that DISC will include a number that is in your set, and that CONT may include the actual percentile even if it doesn’t fit into your set.

More blogging to come on the TSQL 2012 Analytic function CUME_DIST tomorrow.

More TSQL Analytic Functions – PERCENT_RANK

Percent rank is defined as the number of values that are the same or less than the current value divided by one less than the number of values. Percent rank is different than PERCENTILE, stay tuned for PERCENTILE_DISC and PERCENTILE_CONT which are different from PERCENT_RANK.

For this example I will be using the same revenue table in the sample database that I set up for the LEAD and LAG posting yesterday.

select DepartmentID, Revenue, Year,
       RANK() OVER(ORDER BY Revenue) as RankYear,
       PERCENT_RANK() OVER(ORDER BY Revenue) as PercentRank
from REVENUE
where DepartmentID = 1;

RANK() will give us the position in the overall list. For instance in a list with 15 items, the two values tied for the bottom value will be position 1. The largest number will be position 15, and the two that are tied in the middle will be at position 8.

PERCENT_RANK() shows the percentage of values that are less than or equal to the current value.

So in the above example you can see that the year 2002 was the best revenue year with $90,000 in revenue, and 2004 with 2012 were the worst years for revenue with $10,000 in revenue.

TSQL Analytic Functions LEAD and LAG

SQL Server 2012 introduces 8 new analytic functions. This post will cover 2 of them LEAD and LAG, which can be used to reference a value in the row before or the row after the current row.

If you have read my OVER clause articles you will recognize the REVENUE table. Just a table listing department revenue year over year for 3 departments.

USE [Master];

IF EXISTS(SELECT name FROM sys.databases WHERE name = 'analytics_demo')
BEGIN
	ALTER DATABASE [analytics_demo] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [analytics_demo];
END
GO
CREATE DATABASE [analytics_demo];
GO

USE [analytics_demo];

-- same Revenue Table used in previous examples of the OVER clause
-- http://stevestedman.com/?p=1454

CREATE TABLE REVENUE
(
[DepartmentID] int,
[Revenue] int,
[Year] int
);

insert into REVENUE
values (1,10030,1998),(2,20000,1998),(3,40000,1998),
 (1,20000,1999),(2,60000,1999),(3,50000,1999),
 (1,40000,2000),(2,40000,2000),(3,60000,2000),
 (1,30000,2001),(2,30000,2001),(3,70000,2001),
 (1,90000,2002),(2,20000,2002),(3,80000,2002),
 (1,10300,2003),(2,1000,2003), (3,90000,2003),
 (1,10000,2004),(2,10000,2004),(3,10000,2004),
 (1,20000,2005),(2,20000,2005),(3,20000,2005),
 (1,40000,2006),(2,30000,2006),(3,30000,2006),
 (1,70000,2007),(2,40000,2007),(3,40000,2007),
 (1,50000,2008),(2,50000,2008),(3,50000,2008),
 (1,20000,2009),(2,60000,2009),(3,60000,2009),
 (1,30000,2010),(2,70000,2010),(3,70000,2010),
 (1,80000,2011),(2,80000,2011),(3,80000,2011),
 (1,10000,2012),(2,90000,2012),(3,90000,2012);

 --just double check the table to see what's there for DepartmentID of 1
 select DepartmentID, Revenue, Year
 from REVENUE
 where DepartmentID = 1;

-- Using LAG
-- http://stevestedman.com/?p=1513

select DepartmentID, Revenue, Year,
       LAG(Revenue) OVER (ORDER BY Year) as LastYearRevenue
 from REVENUE
 where DepartmentID = 1
 order by Year;

With LAG you can grab a value from the previous row, or previous value in the set as as set by the OVER (ORDER BY ) clause.

Now for LEAD which allows you to pull from the next row in your set.

select DepartmentID, Revenue, Year,
       LAG(Revenue) OVER (ORDER BY Year) as LastYearRevenue,
       LEAD(Revenue) OVER (ORDER BY Year) as NextYearRevenue
  from REVENUE
 where DepartmentID = 1
  order by Year;

Which produces the following output adding in a new column called NextYearRevenue

So far, this is pretty exciting as a way to grab a value from the previous year or next year. Now to use it!

Calculating the Year Over Year Delta in the revenue numbers. Sure you could do this with a CTE and a lot of recursion, but it is just so much easier using LAG.

--So how do we calculate the difference between last year's
-- numbers and this years numbers

select DepartmentID, Revenue, Year,
       LAG(Revenue) OVER (ORDER BY Year) as LastYearRevenue,
       Revenue - LAG(Revenue) OVER (ORDER BY Year) as YearOverYearDelta
  from REVENUE
 where DepartmentID = 1
  order by Year;

and here is what you get.

That’s it for LEAD and LAG, but keep in mind these are just building blocks for the analytic functions. More to come on the analytic functions in SQL 2012.

I hope you find LEAD and LAG in SQL Server as useful as I have.  Just remember “Statistics can be made to prove anything—even the truth.”