FULL OUTER JOIN vs CROSS JOIN

As I have been working on the SQL Server JOIN Types poster, I have received several questions around the difference between a CROSS JOIN, and a FULL OUTER JOIN. After looking at the Venn diagrams for the two, they are both shown as the same, however they are not the same by any means.

Yes, they both include all rows from both the LEFT and RIGHT side of the JOIN, however they are matched up or JOINed in a very different way.

Lets take a look, first at the Venn diagrams, you can see below that the Venn diagrams show that all rows from Table1 are included in the results, and all rows from Table2 are included in the results. This is indeed correct, the differences come in the number of rows and how the results are matched up.

CrossJoinVennFullOuterJoinVenn

 

Now lets take a look at some sample cod that shows the differences. First we will create a database and create a couple tables to use for the demo, and each table has 3 rows inserted to it:

CREATE DATABASE [QueryTraining];
GO
USE [QueryTraining];
GO

CREATE TABLE People
(
	id   INTEGER IDENTITY NOT NULL,
	Name VARCHAR(100) NOT NULL,
    fk   INTEGER NULL
PRIMARY KEY (id)
) ;

CREATE TABLE Colors
(
	id   INTEGER NOT NULL,
	FavoriteColor VARCHAR(100) NOT NULL
) ;

INSERT INTO Colors (id, FavoriteColor) VALUES
    (1, 'red'),
    (2, 'green'),
    (3, 'blue');

INSERT INTO People (Name, fk) VALUES
    ('Steve', 1),
    ('Aaron', 3),
    ('Mary', NULL);

Look at this as a database to store people and their favorite color, in this example, there are 3 people , and 3 colors. One person does not have a favorite color, and one color has not been favorited by anyone.

Now lets take a look at the differences between the two.

CROSS JOIN

CrossJoinVenn

The CROSS JOIN gives you the Cartesian product of the two tables, by matching every row from one table with every row from another table. If there are X rows in the first table, and Y rows in the second table, the result set will have exactly X times Y rows. Notice on the CROSS JOIN, there is no ON clause specified. This is because there is not matching.

CROSS JOIN Sample

SELECT *
  FROM People p
  CROSS JOIN Colors c;

Which gives us the following result set:

CrossJoinResults

 

In the above result set, you can see that for each row in the People table (3) it was matched up with each row  from the Colors table (3) which gives us a total of 3 x 3 = 9 rows in the result set.

FULL OUTER JOIN

Now for the full outer JOIN, this is like asking for a LEFT OUTER JOIN and a RIGHT OUTER JOIN in the same query. You are asking to get every row from Table1 matching what could be matched in Table2, for those that don’t match on the Table1 side, just display the Table2 side, with NULLs on the LEFT side, and for those that don’t match on the TABLE2 side, just display the Table1 side with NULLs on the RIGHT side.

FullOuterJoinVenn
FULL OUTER JOIN Sample

SELECT *
  FROM People p
  FULL OUTER JOIN Colors c
    ON p.fk = c.id;

In this case everything from Table1 will be represented, and everything from Table2 will be shown, but they will be matched up base on the ON clause, rather than with the CROSS JOIN which gives every possible combination.FullOuterJoinResults

I hope that this helps answer some of the questions with the differences between a CROSS JOIN and a FULL OUTER JOIN.

If you haven’t already taken a look, please check out the free download of the SQL Server JOIN Types poster.

For More Information:

Using the TSQL CASE Statement

Here is a quick video tutorial on how to use the T-SQL CASE function on SQL Server 2012, SQL Server 2014 0r newer. This was originally part of my free SQL query training for the 70-461 certification exam.

Here is the sample or demo code from the video tutorial.

CREATE DATABASE [QueryTraining];
GO

USE [QueryTraining];
GO

------------------------ EXAMPLES SETUP ------------------------
-- Table to be used for training
CREATE TABLE REVENUE
(
[DepartmentID] int,
[Revenue] int,
[Year] int
);

insert into REVENUE
values (1,10030,1998),(2,20000,1998),(3,40000,1998),
(1,20000,1999),(2,60000,1999),(3,50000,1999),
(1,40000,2000),(2,40000,2000),(3,60000,2000),
(1,30000,2001),(2,30000,2001),(3,70000,2001),
(1,90000,2002),(2,20000,2002),(3,80000,2002),
(1,10300,2003),(2,1000,2003), (3,90000,2003),
(1,10000,2004),(2,10000,2004),(3,10000,2004),
(1,20000,2005),(2,20000,2005),(3,20000,2005),
(1,40000,2006),(2,30000,2006),(3,30000,2006),
(1,70000,2007),(2,40000,2007),(3,40000,2007),
(1,50000,2008),(2,50000,2008),(3,50000,2008),
(1,20000,2009),(2,60000,2009),(3,60000,2009),
(1,30000,2010),(2,70000,2010),(3,70000,2010),
(1,80000,2011),(2,80000,2011),(3,80000,2011),
(1,10000,2012),(2,90000,2012),(3,90000,2012);

declare @corners as int = 6;
-- the old way using case.
SELECT CASE @corners
		WHEN 1 THEN 'point'
		WHEN 2 THEN 'line'
		WHEN 3 THEN 'triangle'
		WHEN 4 THEN 'square'
		WHEN 5 THEN 'pentagon'
		WHEN 6 THEN 'hexagon'
		WHEN 7 THEN 'heptagon'
		WHEN 8 THEN 'octagon'
		ELSE NULL
	   END;

-- assume we want to display an indicator to see if we are above
-- or below average. First we start with the average over departmentID
SELECT Year,
       DepartmentID,
       Revenue,
       AVG(Revenue) OVER (PARTITION BY DepartmentID) AS AverageRevenue
  FROM REVENUE
 ORDER BY DepartmentID, year;

-- using the CASE statement we would get the following
SELECT Year, DepartmentID, Revenue, AverageRevenue,
       CASE WHEN Revenue > AverageRevenue
            THEN 'Better Than Average'
            ELSE 'Not Better'
        END AS Ranking
  FROM (SELECT Year, DepartmentID, Revenue,
               avg(Revenue) OVER (PARTITION by DepartmentID) as AverageRevenue
          FROM REVENUE
	   ) as t
 ORDER BY DepartmentID, year;

More Info:

Using the TSQL CHOOSE Function

Here is a short video tutorial that shows how to use the CHOOSE function in T-SQL on SQL Server 2012, SQL Server 2014 or Newer. This was originally part of my free SQL query training for the 70-461 certification exam.

Here is the sample code that goes along with the CHOOSE function training. This sample code has been tested and confirmed working on SQL Server 2012 and SQL Server 2014.

 

CREATE DATABASE [QueryTraining];
GO
USE [QueryTraining];
GO
------------------------------------------------
-- CHOOSE (2012, 2014 or newer)
-- returns the item at a specific index

declare @corners as int = 6;
-- the old way using case.
SELECT CASE @corners
		WHEN 1 THEN 'point'
		WHEN 2 THEN 'line'
		WHEN 3 THEN 'triangle'
		WHEN 4 THEN 'square'
		WHEN 5 THEN 'pentagon'
		WHEN 6 THEN 'hexagon'
		WHEN 7 THEN 'heptagon'
		WHEN 8 THEN 'octagon'
		ELSE NULL
	   END;

-- now with choose
declare @corners as int = 6;
SELECT choose(@corners, 'point', 'line', 'triangle', 'square', 'pentagon',
                        'hexagon', 'heptagon', 'octagon')

-- CHOOSE day of week example
DECLARE @day as int=4
SELECT CHOOSE(@day,'Sunday','Monday', 'Tuesday', 'Wednesday',
                   'Thursday', 'Friday', 'Saturday');

-- Pick 5 people at random from a list of 10
GO
DECLARE @myRandonNum INTEGER;
SET @myRandonNum = CAST(RAND() * 10 AS INTEGER) + 1;
SELECT @myRandonNum as Num, CHOOSE(@myRandonNum, 'Steve', 'Aaron', 'Pete', 'Cody',
                     'Gabe', 'Bill', 'Aaron', 'Jordan', 'Darren', 'Mark', 'Aaron') as Name;
GO 5

Notice on the last line of the sample, the GO command has the number 5 right after it. This tells SSMS to run the batch 5 times. See my blog post on GO for more details.

More Info:

Using the TSQL IIF Function

Here is a quick video tutorial on how to use the IIF function on SQL Server 2012, SQL Server 2014 0r newer. This was originally part of my free SQL query training for the 70-461 certification exam.

Here is the sample code from the video.


USE Master;
GO

IF EXISTS(SELECT name
            FROM sys.databases
		   WHERE name = 'QueryTraining')
BEGIN
	ALTER DATABASE [QueryTraining]
	  SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
	DROP DATABASE [QueryTraining];
END
CREATE DATABASE [QueryTraining];
GO

USE [QueryTraining];
GO

------------------------ EXAMPLES SETUP ------------------------
-- Table to be used for training
CREATE TABLE REVENUE
(
[DepartmentID] int,
[Revenue] int,
[Year] int
);

insert into REVENUE
values (1,10030,1998),(2,20000,1998),(3,40000,1998),
(1,20000,1999),(2,60000,1999),(3,50000,1999),
(1,40000,2000),(2,40000,2000),(3,60000,2000),
(1,30000,2001),(2,30000,2001),(3,70000,2001),
(1,90000,2002),(2,20000,2002),(3,80000,2002),
(1,10300,2003),(2,1000,2003), (3,90000,2003),
(1,10000,2004),(2,10000,2004),(3,10000,2004),
(1,20000,2005),(2,20000,2005),(3,20000,2005),
(1,40000,2006),(2,30000,2006),(3,30000,2006),
(1,70000,2007),(2,40000,2007),(3,40000,2007),
(1,50000,2008),(2,50000,2008),(3,50000,2008),
(1,20000,2009),(2,60000,2009),(3,60000,2009),
(1,30000,2010),(2,70000,2010),(3,70000,2010),
(1,80000,2011),(2,80000,2011),(3,80000,2011),
(1,10000,2012),(2,90000,2012),(3,90000,2012);

------------------------------------------------
-- IIF (2012)
USE [QueryTraining];
GO

-- now the same functionality using IIF and simplifying the code
-- http://stevestedman.com/?p=1578
SELECT  Year,
        DepartmentID,
        Revenue,
        AverageRevenue,
		IIF(Revenue > AverageRevenue, 'Better Than Average', 'Not Better' ) AS Ranking
FROM    (SELECT Year,
                DepartmentID,
                Revenue,
                AVG(Revenue) OVER (PARTITION BY DepartmentID) AS AverageRevenue
         FROM   REVENUE
        ) AS t
ORDER BY DepartmentID, year;

----------------------------------
SELECT IIF(3 = 2, 'true', NULL); -- ERROR ?

----------------------------------

SELECT IIF(1 = 7, NULL, NULL); -- ERROR !
--Do you think that IIF actually uses the CASE statement?

More Info:

Executing a TSQL batch multiple times using GO

Using the GO commend in TSQL to run a batch of code multiple times is something that I commonly do in demo’s and training’s.  What amazes me is how many people after the training will tell me that they didn’t know about using the GO command to run a statement multiple times.

Here’s how it works. First lets create a table, then we will insert 100000 into the table using a while loop.

CREATE TABLE SampleTable(
	ID INT NOT NULL IDENTITY(1,1),
	SomeData uniqueidentifier NOT NULL DEFAULT(NEWID())
);

DECLARE @i INT; 
SET @i = 1;
WHILE @i <= 100000
BEGIN
	INSERT dbo.SampleTable DEFAULT VALUES ;
	SET @i = @i + 1;
END 

Sure that works, but it takes several lines of code to loop. There is an easier way to do it.

CREATE TABLE SampleTable(
	ID INT NOT NULL IDENTITY(1,1),
	SomeData uniqueidentifier NOT NULL DEFAULT(NEWID())
);

GO
	INSERT dbo.SampleTable DEFAULT VALUES; 
GO 100000

In this example, GO is the batch separator, SSMS breaks up the TSQL on GO statements, then runs each batch one at a time.
The first GO command is used to simply separate the CREATE statement from the INSERT statement. Then the second GO with followed by 100000 tells SSMS to run that batch 100000 times. If we didn’t have the first GO, the CREATE TABLE would have been attempted 100000 times, not just the INSERT STATEMENT.

GO IS NOT A TSQL KEYWORD.

GO is simply a SSMS batch separator word that can be modified to be something else in the SSMS settings if you want to change it. GO is not part of TSQL, and won’t work when you call TSQL from your own programs.

 

A quick tip that will hopefully save you some time.

TSQL Queries – Using NTILE

NTILE is a SQL Server function that can be called in a TSQL Query. This is part of the free sql query training to prepare for the Microsoft 70-461 queries exam.
Here is a video presentation on using the TSQL NTILE function.

And here is the sample code associated with the presentation. Feel free to copy and paste to SSMS and run it yourself.


CREATE DATABASE [QueryTraining];
GO

USE [QueryTraining];



-- Table to be used by Over Clause Rows/Range
CREATE TABLE REVENUE
(
[DepartmentID] int,
[Revenue] int,
[Year] int
);

insert into REVENUE
values (1,10030,1998),(2,20000,1998),(3,40000,1998),
       (1,20000,1999),(2,60000,1999),(3,50000,1999),
       (1,40000,2000),(2,40000,2000),(3,60000,2000),
       (1,30000,2001),(2,30000,2001),(3,70000,2001),
       (1,90000,2002),(2,20000,2002),(3,80000,2002),
       (1,10300,2003),(2,1000,2003), (3,90000,2003),
       (1,10000,2004),(2,10000,2004),(3,10000,2004),
       (1,20000,2005),(2,20000,2005),(3,20000,2005),
       (1,40000,2006),(2,30000,2006),(3,30000,2006),
       (1,70000,2007),(2,40000,2007),(3,40000,2007),
       (1,50000,2008),(2,50000,2008),(3,50000,2008),
       (1,20000,2009),(2,60000,2009),(3,60000,2009),
       (1,30000,2010),(2,70000,2010),(3,70000,2010),
       (1,80000,2011),(2,80000,2011),(3,80000,2011),
       (1,10000,2012),(2,90000,2012),(3,90000,2012);



-- first lets look at the REVENUE table

SELECT * 
  FROM Revenue;



-- NTILE Function


-- Rank all the revenue amounts on a scale of 1 to 10
SELECT *, NTILE(10) OVER(ORDER BY [Revenue] ASC) 
  FROM Revenue
  ORDER BY [Year];

-- Rank all the revenue amounts on a scale of 1 to 100
SELECT *, NTILE(100) OVER(ORDER BY [Revenue] ASC) AS Percentile
  FROM Revenue
  ORDER BY [Year];
-- or does it.....


SELECT *, NTILE(20) OVER(ORDER BY [Revenue] DESC) AS NTile20 
  FROM Revenue
  ORDER BY NTile20;


-- multiple NTILES
SELECT *, 
	   NTILE(10) OVER(ORDER BY [Revenue] DESC) AS NTile10, 
	   NTILE(20) OVER(ORDER BY [Revenue] DESC) AS NTile20, 
	   NTILE(4) OVER(ORDER BY [Revenue] DESC) AS Quartile
  FROM Revenue
  ORDER BY NTile20;