VARCHAR diff function

Last month I posted another CTE blog entry Using a CTE to Split a String Into Rows with Line Numbers.  Since then I have used it but, but realized that what I really needed was a diff tool to compare two strings similar to the command line diff tool.

First we take a look the SplitWithLineNumber from the previous article.  This will be used in the CTE diff tool

CREATE FUNCTION dbo.SplitWithLineNumber (@sep char(1), @s varchar(max))
RETURNS table
AS
RETURN (
WITH splitter_cte AS (
  SELECT CHARINDEX(@sep, @s) as pos,
         cast(0 as bigint) as lastPos,
         cast(0 as int) as LineNumber
  UNION ALL
 SELECT CHARINDEX(@sep, @s, pos + 1),
        cast(pos as bigint),
        LineNumber + 1 as LineNumber
  FROM splitter_cte
 WHERE pos > 0
)
SELECT LineNumber,
       SUBSTRING(@s, lastPos + 1,
         case when pos = 0 then 2147483647
         else pos - lastPos -1 end) as chunk
  FROM splitter_cte
);

Next we look at the new function which calls the original, and compares the results.

CREATE FUNCTION [dbo].[varcharDiff](@s1 varchar(max), @s2 varchar(max))
RETURNS table
AS
RETURN (
WITH FirstStringAsTable as
(
SELECT leftDiff.lineNumber,
       leftDiff.chunk AS leftChunk,
       rightDiff.chunk AS rightChunk
  FROM dbo.SplitWithLineNumber(char(10), @s1) AS leftDiff
  LEFT JOIN dbo.SplitWithLineNumber(char(10), @s2) AS rightDiff
    ON leftDiff.lineNumber = rightDiff.lineNumber,
)
SELECT *
  FROM FirstStringAsTable
 WHERE leftChunk <> rightChunk
)

In order to run it we need to include two varchar’s that are slightly diff so that we can run the diff tool against them.

declare @InputString as varchar(8000);
set @InputString = 'Anyone can perform a SQL Server upgrade, but the risk of failure is much
greater if you don’t plan it correctly. By the end of this chapter you will understand how
to plan for the things that could go wrong during the upgrade, and options you can take to
mitigate those risks.
This chapter will cover the details that you will need to know when upgrading a stand alone,
non High Availability (HA), SQL Server from one version to a newer version, with minimal
downtime. I use the term “minimal downtime” as the goal, as it is very expensive to get to the
point of zero downtime with a SQL Server upgrade without an HA solution. For the purpose of
this upgrade process we will be considering a single SQL Server being upgraded, although it
may end up on different hardware or it may be the same hardware.
Upgrading may involve upgrading just the hardware for better performance or more capacity and
not the SQL Server version. The specific upgrade path may depend on your business needs and
the resources or budget that you have available.
Most of the summaries and stories in this chapter are based on my experiences using SQL Server
over the last 21 years. I will refer to several examples in this chapter of things that went
wrong, and then at the end follow up with a case study of an upgrade that went very well.
Considering the things that can go wrong will help you think about ways to prevent them, or
reduce their likelihood of happening.'

declare @InputString2 as varchar(8000);
set @InputString2 = 'Anyone can perform a SQL Server upgrade, but the risk of failure is much
greater if you don’t plan it correctly. By the end of this chapter you will understand how
to plan for the things that could go wrong during the upgrade, and options you can take to
mitigate those risks.
This chapter will cover the details that you need to know when upgrading a stand alone,
non High Availability (HA), SQL Server from one version to a newer version, with minimal
downtime. I use the term “minimalistic downtime” as the goal, as it is very expensive to get to the
point of zero downtime with a SQL Server upgrade without an HA solution. For the purpose of
this upgrade process we will be considering a single SQL Server being upgraded, although it
may end up on different hardware or it may be the same hardware.
Upgrading may involve upgrading just the hardware for better performance or more capacity and
not the SQL Server version. The specific upgrade path may depend on your business needs and
the resources or budget that you have.
Most of the summaries and stories in this chapter are based on my experiences using SQL Server
over the last 21 years. I will refer to several examples in this chapter of things that went
wrong, and then at the end follow up with a case study of an upgrade that went very well.
Considering the things that can go wrong will help you think about ways to prevent them, or
reduce their likelihood of happening.'

And then call the new function.

SELECT *
  FROM [dbo].[varcharDiff](@InputString, @InputString2);

Which produces the following output.

CTE_DIFFWhere we can see that there are 3 lines different between the two varchar strings.  Lines 4, 6 and 12 are the lines that are different, and the left and right pieces are shown here.

Useful for doing Test Driven Development (TDD) to test the output of a function against what yoa are expecting it to return.

I hope this script helps you and that you find enjoy it.

Using a CTE to Split a String Into Rows with Line Numbers

Last year while working on my CTE presentation for a SQL Saturday I added a blog post called “Using a CTE to Split a String Into Rows“, and since that posting I have used it many times.   But as things go in development, eventually there is a need to do something more.

Doing some complex string building to create files lately I cam across the need to use a CTE to split strings into rows but to also include line numbers.  To accomplish that I first started with the same function that I used last year, but renamed it, and added a second output column called LineNumber, here is what it looked like:


CREATE FUNCTION dbo.SplitWithLineNumber (@sep char(1), @s varchar(max))
RETURNS table
AS
RETURN (
 WITH splitter_cte AS (
 SELECT CHARINDEX(@sep, @s) as pos,
        cast(0 as bigint) as lastPos,
        0 as LineNumber
 UNION ALL
 SELECT CHARINDEX(@sep, @s, pos + 1),
        cast(pos as bigint),
        LineNumber + 1 as LineNumber
 FROM splitter_cte
 WHERE pos > 0
 )
 SELECT LineNumber,
        SUBSTRING(@s,
                 lastPos + 1,
                 case when pos = 0 then 2147483647
                      else pos - lastPos -1 end) as chunk
 FROM splitter_cte
 );

The things that have changed since the last post “Using a CTE to Split a String Into Rows“, are the following:

  • I added another column to the output called Line Number.
  • Modified the input to be varchar(max).
  • Adjusted the case statement to accommodate varchar(max).
  • Cast the positions as bigint’s in order to accomodate varchar(max).

So now when you run this on the original input, you get the following:

SELECT *
 FROM dbo.SplitWithLineNumber (' ',
          'the quick brown dog jumped over the lazy fox')
OPTION(MAXRECURSION 0);

CTEtoSplitaStringIntoRowswithLineNumbers

Just a small change, but if you need to lave line numbers on the output, this will do it for you.