More TSQL 2012 Analytics CUME_DIST – Cumulative Distribution

Download PDF

Continuing on the TSQL 2012 Analytic Series now on to the CUME_DIST function

SQL Server 2012 introduces another new analytic function.  The Cumulative Distribution Function CUME_DIST()  refers to the probability that the value of a random variable falls within a specified range.

CUME_DIST  is the function that maps values to their percentile rank in a distribution.  CUME_DIST function calculates the possibility of another occurrence being of that value or lesser than that among a group of values.

Lets jump into the examples:


-- CUME_DIST for Year
select DepartmentID, Revenue, Year,
 CUME_DIST() OVER (ORDER BY Year) as CumulativeDistributionYear
 from REVENUE
 where DepartmentID = 1
 order by Year;

-- CUME_DIST for Revenue
select DepartmentID, Revenue, Year,
 CUME_DIST() OVER (ORDER BY Revenue) as CumulativeDistributionRev
 from REVENUE
 where DepartmentID = 1
 order by Revenue;

Which produces the following output: In the first example with the years being the range looked at from 1999 to 2012, the difference between each cumulative distribution value is 7.1% which with an even calculation could be determined by dividing 100 by the number of values wich produces 7.1% (.0714285….). That’s the easy one. You don’t need a TSQL Function to calculate 100 divided by the the number of rows.

Now on the the second distribution which calculates the distribution over the revenue numbers. I have added another column to the example above which rounds the Cumulative Distribution to a percentage with one decimal.

</pre>
select DepartmentID, Revenue, Year,
 CUME_DIST() OVER (ORDER BY Revenue) as CumulativeDistributionRev,
 ROUND(CUME_DIST() OVER (ORDER BY Revenue) * 100, 1) as PercentLessThanOrEqual
 from REVENUE
 where DepartmentID = 1
 order by Revenue;

From here you can see that the lowest Revenue value of 10,000 was hit in both 2004 and 2012.  Keeping in mind the number 7.14% from the previous example you can see that 14.3% is double 7.14% since there are two years with a $10,000 revenue.

This can be useful to determine the percentage of values that are less than or equal to the current value.

That’s it for today.  I hope you find CUME_DIST to be useful.

 

More from Stedman Solutions:

SteveStedman5
Steve and the team at Stedman Solutions are here for all your SQL Server needs.
Contact us today for your free 30 minute consultation..
We are ready to help!

Leave a Reply

Your email address will not be published. Required fields are marked *

*