Corruption Challenge 1 – An alternative solution

After posting the winning solution for Corruption Challenge 1 from Brent Ozar, I realized that he and I both solved the corruption by using the REPAIR_ALLOW_DATA_LOSS option on CheckDb. A very nasty move, however it did repair the corruption.


DBCC CHECKDB ('',REPAIR_ALLOW_DATA_LOSS);

 

After reading some feedback, one of the winners stated:

As soon as he ran REPAIR_ALLOW_DATA_LOSS, I knew we weren’t on the same page. I just never do that unless I’ve exhausted all the other options.

Which is a good point, in this solution I was fairly certain as to what REPAIR_ALLOW_DATA_LOSS was going to do, however in a real world scenario, who knows what might be effected beyond the initial table that we know about.

There are several other options to clean up the corrupt table besides the REPAIR_ALLOW_DATA_LOSS option. These options still involve copying the data off to another table and finding the missing data from row 31, however how the corruption gets cleaned up varies widely with the following options:

Read more of this post

Corruption Challenge 1 – how I corrupted the database

Since the corruption challenge completed yesterday, I have had several request asking how I created the corrupt database. So here is the script that I used to create the Database Corruption Challenge 1.

First the initial setup. Most of this I stole from a query training session that I did several weeks ago. All I really needed was a table with some data in it.


CREATE DATABASE [CorruptionChallenge1];
GO

USE [CorruptionChallenge1];

CREATE TABLE Revenue
(
[id] INTEGER IDENTITY,
[DepartmentID] INTEGER,
[Revenue] INTEGER,
[Year] INTEGER,
[Notes] VARCHAR(300)
);

INSERT INTO Revenue ([DepartmentID], [Revenue], [Year])
VALUES (1,10030,1998),(2,20000,1998),(3,40000,1998),
 (1,20000,1999),(2,600400,1999),(3,500400,1999),
 (1,40050,2000),(2,400300,2000),(3,604000,2000),
 (1,30000,2001),(2,30000,2001),(3,703000,2001),
 (1,90000,2002),(2,200200,2002),(3,80000,2002),
 (1,10300,2003),(2,1000,2003), (3,900300,2003),
 (1,10000,2004),(2,10000,2004),(3,100300,2004),
 (1,208000,2005),(2,200200,2005),(3,203000,2005),
 (1,40000,2006),(2,30000,2006),(3,300300,2006),
 (1,709000,2007),(2,40000,2007),(3,400300,2007),
 (1,50000,2008),(2,50000,2008),(3,500300,2008),
 (1,20000,2009),(2,600030,2009),(3,600300,2009),
 (1,300700,2010),(2,70000,2010),(3,700300,2010),
 (1,80000,2011),(2,80000,2011),(3,800200,2011),
 (1,100030,2012),(2,90000,2012),(3,900300,2012),
 (1,10000,2013),(2,90000,2013),(3,900100,2013),
 (1,100400,2014),(2,900300,2014),(3,903000,2014),
 (1,102000,2015),(2,902000,2015),(3,902000,2015);

UPDATE Revenue SET [Notes] = CAST(NEWID() as VARCHAR(300)) + 'This is some varchar data just to fil out some pages... data pages are only 8k, therefore the more we fill up in each page, the more pages this table will flow into, thus simulating a larger table for the corruption example';

CREATE CLUSTERED INDEX [clustId] ON [dbo].[Revenue]
(
 [id] ASC
);

CREATE NONCLUSTERED INDEX [ncDeptIdYear] ON [dbo].[Revenue]
(
 [DepartmentID] ASC,
 [Revenue] ASC
);

CREATE NONCLUSTERED INDEX [ncBadNameForAnIndex] ON [dbo].[Revenue]
(
 [Year] ASC
)
INCLUDE ( [Notes]) ;

-- first lets look at the REVENUE table
SELECT *
 FROM Revenue;

Setup1

Read more of this post

Introducing the DataBase Corruption Challenge (DBCC) – Week 1 Challenge

corruption

Welcome to the DataBase Corruption Challenge, this is an about weekly blog challenge where I will post a corrupt SQL Server database with some details on what happened to it.

If at this point you are already a bit irked by my use of capitalization in the DataBase Corruption Challenge, and the acronym of DBCC that I have used to describe it, then you are already ahead of many people reading about this challenge. Welcome to the challenge.
The challenge will be to download the corrupt database and attempt to recover it. If you can recover it, please send me the steps to recover it, along with some proof that the database has been recovered. The goal each week will be the following:

Read more of this post

FULL OUTER JOIN vs CROSS JOIN

As I have been working on the SQL Server JOIN Types poster, I have received several questions around the difference between a CROSS JOIN, and a FULL OUTER JOIN. After looking at the Venn diagrams for the two, they are both shown as the same, however they are not the same by any means.

Yes, they both include all rows from both the LEFT and RIGHT side of the JOIN, however they are matched up or JOINed in a very different way.

Lets take a look, first at the Venn diagrams, you can see below that the Venn diagrams show that all rows from Table1 are included in the results, and all rows from Table2 are included in the results. This is indeed correct, the differences come in the number of rows and how the results are matched up.

CrossJoinVennFullOuterJoinVenn

 

Now lets take a look at some sample cod that shows the differences. First we will create a database and create a couple tables to use for the demo, and each table has 3 rows inserted to it:

CREATE DATABASE [QueryTraining];
GO
USE [QueryTraining];
GO

CREATE TABLE People
(
	id   INTEGER IDENTITY NOT NULL,
	Name VARCHAR(100) NOT NULL,
    fk   INTEGER NULL
PRIMARY KEY (id)
) ;

CREATE TABLE Colors
(
	id   INTEGER NOT NULL,
	FavoriteColor VARCHAR(100) NOT NULL
) ;

INSERT INTO Colors (id, FavoriteColor) VALUES
    (1, 'red'),
    (2, 'green'),
    (3, 'blue');

INSERT INTO People (Name, fk) VALUES
    ('Steve', 1),
    ('Aaron', 3),
    ('Mary', NULL);

Look at this as a database to store people and their favorite color, in this example, there are 3 people , and 3 colors. One person does not have a favorite color, and one color has not been favorited by anyone.

Now lets take a look at the differences between the two.

CROSS JOIN

CrossJoinVenn

The CROSS JOIN gives you the Cartesian product of the two tables, by matching every row from one table with every row from another table. If there are X rows in the first table, and Y rows in the second table, the result set will have exactly X times Y rows. Notice on the CROSS JOIN, there is no ON clause specified. This is because there is not matching.

CROSS JOIN Sample

SELECT *
  FROM People p
  CROSS JOIN Colors c;

Which gives us the following result set:

CrossJoinResults

 

In the above result set, you can see that for each row in the People table (3) it was matched up with each row  from the Colors table (3) which gives us a total of 3 x 3 = 9 rows in the result set.

FULL OUTER JOIN

Now for the full outer JOIN, this is like asking for a LEFT OUTER JOIN and a RIGHT OUTER JOIN in the same query. You are asking to get every row from Table1 matching what could be matched in Table2, for those that don’t match on the Table1 side, just display the Table2 side, with NULLs on the LEFT side, and for those that don’t match on the TABLE2 side, just display the Table1 side with NULLs on the RIGHT side.

FullOuterJoinVenn
FULL OUTER JOIN Sample

SELECT *
  FROM People p
  FULL OUTER JOIN Colors c
    ON p.fk = c.id;

In this case everything from Table1 will be represented, and everything from Table2 will be shown, but they will be matched up base on the ON clause, rather than with the CROSS JOIN which gives every possible combination.FullOuterJoinResults

I hope that this helps answer some of the questions with the differences between a CROSS JOIN and a FULL OUTER JOIN.

If you haven’t already taken a look, please check out the free download of the SQL Server JOIN Types poster.

For More Information:

Using the TSQL CASE Statement

Here is a quick video tutorial on how to use the T-SQL CASE function on SQL Server 2012, SQL Server 2014 0r newer. This was originally part of my free SQL query training for the 70-461 certification exam.

Here is the sample or demo code from the video tutorial.

CREATE DATABASE [QueryTraining];
GO

USE [QueryTraining];
GO

------------------------ EXAMPLES SETUP ------------------------
-- Table to be used for training
CREATE TABLE REVENUE
(
[DepartmentID] int,
[Revenue] int,
[Year] int
);

insert into REVENUE
values (1,10030,1998),(2,20000,1998),(3,40000,1998),
(1,20000,1999),(2,60000,1999),(3,50000,1999),
(1,40000,2000),(2,40000,2000),(3,60000,2000),
(1,30000,2001),(2,30000,2001),(3,70000,2001),
(1,90000,2002),(2,20000,2002),(3,80000,2002),
(1,10300,2003),(2,1000,2003), (3,90000,2003),
(1,10000,2004),(2,10000,2004),(3,10000,2004),
(1,20000,2005),(2,20000,2005),(3,20000,2005),
(1,40000,2006),(2,30000,2006),(3,30000,2006),
(1,70000,2007),(2,40000,2007),(3,40000,2007),
(1,50000,2008),(2,50000,2008),(3,50000,2008),
(1,20000,2009),(2,60000,2009),(3,60000,2009),
(1,30000,2010),(2,70000,2010),(3,70000,2010),
(1,80000,2011),(2,80000,2011),(3,80000,2011),
(1,10000,2012),(2,90000,2012),(3,90000,2012);

declare @corners as int = 6;
-- the old way using case.
SELECT CASE @corners
		WHEN 1 THEN 'point'
		WHEN 2 THEN 'line'
		WHEN 3 THEN 'triangle'
		WHEN 4 THEN 'square'
		WHEN 5 THEN 'pentagon'
		WHEN 6 THEN 'hexagon'
		WHEN 7 THEN 'heptagon'
		WHEN 8 THEN 'octagon'
		ELSE NULL
	   END;

-- assume we want to display an indicator to see if we are above
-- or below average. First we start with the average over departmentID
SELECT Year,
       DepartmentID,
       Revenue,
       AVG(Revenue) OVER (PARTITION BY DepartmentID) AS AverageRevenue
  FROM REVENUE
 ORDER BY DepartmentID, year;

-- using the CASE statement we would get the following
SELECT Year, DepartmentID, Revenue, AverageRevenue,
       CASE WHEN Revenue > AverageRevenue
            THEN 'Better Than Average'
            ELSE 'Not Better'
        END AS Ranking
  FROM (SELECT Year, DepartmentID, Revenue,
               avg(Revenue) OVER (PARTITION by DepartmentID) as AverageRevenue
          FROM REVENUE
	   ) as t
 ORDER BY DepartmentID, year;

More Info: