Tuesday, August 17, 2010

Integer List Splitting: A SQL Fable

Once Upon A Time...

Once upon a time, not too long ago, there was a company called AdventureWorks. And working in this company was a diligent database developer named Dan Druff. Dan was responsible for developing and maintaining the AdventureWorks SQL Server database.

One day, Dan got a request to put together a query. The specs called for accepting a string of comma-delimited ProductID’s and to produce a result set representing orders that used those ProductID’s. The result set was to consist of the ProductID, the Product Name, the SalesOrderID, the Order Date, and the name of the Territory associated with the order. The final output was to be ordered by ProductID and SalesOrderID.

This sounded easy enough. Dan already had a user-defined function to split a comma-delimited list of integers. He had written it himself:

create function dbo.ufn_SplitIntArray
(
  @List varchar(max)
 ,@Delimiter char(1)
)
returns @Items table (Item int)
as
begin
  declare @Item varchar(12)
         ,@Pos int
  while len(@List)>0
  begin
    set @Pos=charindex(@Delimiter,@List)  --Find the next delimiter
    if @Pos=0 set @Pos=len(@List)+1       --No delimiter left? Pretend there's one just past the end
    set @Item=left(@List,@Pos-1)          --Grab everything up to the delimiter...
    insert @Items select convert(int,ltrim(rtrim(@Item)))  --...and convert it to an integer
    set @List=substring(@List,@Pos+len(@Delimiter),len(@List))  --Chop the processed item off the front
  end
  return
end
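A quick standalone test (the same kind of test Dan would later run on the other splitters) shows it in action:

select * 
from dbo.ufn_SplitIntArray('123,456,789',',')
/*
Item
----
123
456
789
*/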
He put together the requested query, incorporating his function, and he tested it out:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArray(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
It seemed to work just fine… it spit out the result quickly and efficiently.

But something nagged at Dan. His function had served him well in the past, but it was so “yesterday”… so “1970s”… it just felt kind of old-fashioned and tired. Dan wanted a function that was more “hip” and more “with it”.

And then Dan had an idea.

He recalled reading an excellent blog post by MVP Aaron Bertrand from last July that did a roundup of various string-splitting techniques. One of the techniques was just like Dan’s, and Aaron had given the approach a nickname of RBAR, meaning “Row By Agonizing Row”. The roundup showed that this technique was one of the poorest performers.

So Dan made a decision right then and there that he was going to move into the 21st century… he was going to upgrade his function to one of the better performers that Aaron had outlined in his blog.

The fastest performer in Aaron’s roundup by far was the CLR approach. But that kind of scared Dan… he didn’t want to get into something that he didn’t fully understand.

So he gave one of the other techniques a try.



In general, the Numbers Table technique seemed to be the next-best performer after CLR in Aaron’s roundup, so Dan went about creating a table of numbers from 1 to 1,000,000 in his AdventureWorks database, and created a primary key for it:

;with
L0(c) as (select 0 from (values (0),(0),(0)) x(c)) --3 Rows
,L1(c) as (select 0 from L0 a, L0 b, L0 c) --27 Rows (3x3x3)
,L2(c) as (select 0 from L1 a, L1 b, L1 c) --19683 Rows (27x27x27)
,L3(c) as (select 0 from L2 a, L2 b) --387,420,489 Rows (19683x19683)
,NN(n) as (select row_number() over (order by (select 0)) from L3)
select Number=isnull(convert(int,n),0) --Force it to be a "not null" column
into dbo.Numbers
from NN
where n<=1000000
go
alter table dbo.Numbers
add constraint PK_Numbers
primary key clustered (Number)
with (fillfactor=100)
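As a quick sanity check on the new table, an aggregate query should report exactly 1,000,000 rows running from 1 to 1,000,000:

select NumberOfRows=count(*)
,MinNumber=min(Number)
,MaxNumber=max(Number)
from dbo.Numbers
/*
NumberOfRows MinNumber MaxNumber
------------ --------- ---------
1000000      1         1000000
*/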
And then he put together a function to split a string of integers using this auxiliary table of numbers:

create function dbo.ufn_SplitIntArrayNum
(
@List varchar(max)
,@Delimiter char(1)
)
returns table
as
return
select
Item=convert(int,String)
from dbo.Numbers with (nolock)
cross apply (select ItemPos=Number) F1
cross apply (select DelimPos=charindex(@Delimiter,@List+@Delimiter,ItemPos)) F2
cross apply (select String=rtrim(ltrim(substring(@List,ItemPos,DelimPos-ItemPos)))) F3
where ItemPos<=convert(int,len(@List))
and substring(@Delimiter+@List,ItemPos,1)=@Delimiter
He tested it out…

select * 
from dbo.ufn_SplitIntArrayNum('123,456,789',',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayNum(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
Well, it gave the correct result, but something seemed strange. The query seemed sluggish. It didn’t seem as quick as his query that used his tired old “RBAR” function. But that couldn’t be possible… after all, Aaron’s blog had clearly demonstrated the superiority of the Numbers approach.

So Dan fired up Profiler and did a comparison between the two queries. And he looked at the Actual Execution Plans for the two queries and he was shocked at what he discovered:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
*/
What!!?? This couldn’t be! Dan’s RBAR approach completely annihilated the Numbers approach! Heck, it didn’t just beat it by a mile… it beat it by a light-year!

The CPU and Reads of the Numbers Technique were huge! The memory grant was almost 38MB! And the query took over 2 full seconds to run! This was ridiculous!

Dan was very puzzled and a little scared. Maybe he was doing something wrong. Maybe something was going on that he didn’t understand.

He decided to try another technique.



The approach that Aaron’s blog named “Inline 2” looked promising. So Dan created that function in his database:

create function dbo.ufn_SplitIntArrayInline
(
@List varchar(max)
,@Delimiter char(1)
)
returns table
as
return
select
Item=convert(int,(substring(@Delimiter+@List+@Delimiter
,w.n+1
,charindex(@Delimiter
,@Delimiter+@List+@Delimiter
,w.n+1) - w.n - 1
)))
from (select n=v0.n+v1.n+v2.n+v3.n
from (select n = 0
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
union all select 10 union all select 11 union all select 12
union all select 13 union all select 14 union all select 15) v0
,(select n = 0
union all select 16 union all select 32 union all select 48
union all select 64 union all select 80 union all select 96
union all select 112 union all select 128 union all select 144
union all select 160 union all select 176 union all select 192
union all select 208 union all select 224 union all select 240) v1
,(select n = 0
union all select 256 union all select 512 union all select 768
union all select 1024 union all select 1280 union all select 1536
union all select 1792 union all select 2048 union all select 2304
union all select 2560 union all select 2816 union all select 3072
union all select 3328 union all select 3584 union all select 3840) v2
,(select n = 0
union all select 4096 union all select 8192 union all select 12288
union all select 16384 union all select 20480 union all select 24576
union all select 28672 union all select 32768 union all select 36864
union all select 40960 union all select 45056 union all select 49152
union all select 53248 union all select 57344 union all select 61440
union all select 65536 union all select 69632 union all select 73728
union all select 77824 union all select 81920 union all select 86016
union all select 90112 union all select 94208 union all select 98304
union all select 102400 union all select 106496 union all select 110592
union all select 114688 union all select 118784 union all select 122880
union all select 126976 union all select 131072 union all select 135168
union all select 139264 union all select 143360 union all select 147456) v3
) w
where w.n=charindex(@Delimiter,@Delimiter+@List+@Delimiter,w.n)
and w.n<len(@Delimiter+@List)
He tested it out…

select * 
from dbo.ufn_SplitIntArrayInline('123,456,789',',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayInline(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
Msg 537, Level 16, State 5, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function.
*/
What!!?? What was going on? The query completely bombed! This didn’t make sense. The function worked when run by itself, but once it was JOINed with other tables, it turned over and died.

Dan looked at the estimated execution plan and recorded some information to compare it with his previous two approaches:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
*/
Omigosh! Look at the Estimated Query Cost! The optimizer estimated this query was going to take 116532/3600 = 32 Hours to run! No wonder it choked!

Dan was really confused at this point. But he decided to bravely trudge onward.



He figured he would give the XML approach a try… it certainly seemed to hold its own in Aaron’s tests. This XML technique was an approach that some joker named Brad Schulz had talked about several times in his blog. Dan didn’t always read Brad’s blog posts… Sometimes Brad had a weird sense of humor and that didn’t always sit well with Dan… Dan was kind of a serious fellow.

But, despite his feelings about Brad, Dan went ahead and created a function using the XML string-splitting function in his database:

create function dbo.ufn_SplitIntArrayXML
(
@List varchar(max)
,@Delimiter char(1)
)
returns table
as
return
select
Item
from (select XMLString='<x>'+replace(@List,@Delimiter,'</x><x>')+'</x>') F1
cross apply (select XMLList=cast(XMLString as xml).query('.')) F2
cross apply XMLList.nodes('/x') F3(XMLNode)
cross apply (select Item=XMLNode.value('(./text())[1]','int')) F4
He tested it out…

select * 
from dbo.ufn_SplitIntArrayXML('123,456,789',',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayXML(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
Hmmm… well at least this approach didn’t croak like the Inline technique did. The result was correct. And it seemed pretty snappy in execution. Dan investigated the Profiler and Actual Execution Plan data and added data to his table of figures:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
XML Technique 237ms 843 449ms 26.408000 35,993.500 25,856KB
*/
What!!?? This was really weird! Dan’s original RBAR approach was still the clear winner in all these tests. The XML approach came pretty darn close… MUCH closer than the other methods… but it wanted a memory grant of 26MB!

Dan’s world had turned upside-down. He was expecting to hear news bulletins any minute talking about pigs flying and snowballs discovered in Hell. This just wasn’t right!

Dan figured he’d have to go the final step… There was only one clear path. He took a deep breath, let it out slowly, and decided he would have to brave the CLR technique.



Aaron Bertrand’s roundup had given a link to Adam Machanic’s blog that had the C# source code for doing the string splitting. But, as little as Dan knew about C#, he could tell that Adam’s code returned a table of NVARCHAR values, not INT values.

So Dan gave a call to an old friend from college named Biff Wellington who was a C# consultant and asked if he could help. Biff was thrilled to hear from Dan and said he’d be happy to assist… anything for an old pal. Dan referred him to the source code at Adam Machanic’s blog and, an hour later, Biff emailed Dan the revised code… and a bill for $150. Dan swore under his breath.

Here was the code that Biff had sent, which amended Adam Machanic’s code to include a FillRowInts function and SplitStringOfInts function to return a table of integers:

Click here to download the C# Code

Now Dan had to jump through all the hoops to get this C# code to work. He downloaded a copy of C# 2008 Express Edition from Microsoft (at least that was free, unlike Biff’s services… the jerk). He created a project using the Class Library template, calling the project BiffSucks. He pasted in Biff’s code and hit the F6 key to Build the Solution and it built successfully! Now he was ready to get into SSMS to create his function.

First he made sure that the server was enabled for CLR:

sp_configure 'CLR Enabled',1
go
reconfigure
go
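To confirm the change took effect, the sys.configurations catalog view shows the value currently in use:

select name
,value_in_use
from sys.configurations
where name='clr enabled'
/*
name        value_in_use
----------- ------------
clr enabled 1
*/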
Then he created an assembly, which referenced the DLL he had built:

create assembly StringHelper
from 'c:\Users\DanDruff\Documents\Visual Studio 2008\Projects\BiffSucks\BiffSucks\obj\Release\BiffSucks.dll'
And then he created the function to use that assembly and its SplitStringOfInts function:

create function dbo.ufn_SplitIntArrayCLR
(
@List nvarchar(max)
,@Delimiter nchar(1)
)
returns table (Item int)
external name StringHelper.UserDefinedFunctions.SplitStringOfInts
Gee, this wasn’t so hard after all!

He tested it out…

select * 
from dbo.ufn_SplitIntArrayCLR(N'123,456,789',N',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList nvarchar(max) = N'897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayCLR(@ProductList,N',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
It worked! And it seemed pretty quick! Dan was hopeful as he started to record the Profiler and Actual Execution Plan figures into his table:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
XML Technique 237ms 843 449ms 26.408000 35,993.500 25,856KB
CLR Technique 443ms 1541 526ms 22.483400 359,935.000 99,744KB
*/
What!!?? Huh? The CLR approach performed in under a second, but it was still slower than the RBAR and XML approaches! And it had almost twice the reads! And it wanted almost 100MB of memory to do its job!!

Dan was feeling faint. This must be a dream, he thought. He pinched himself. But he didn’t wake up… it was not a dream.

Dan was at wit’s end… The world no longer made any sense.



Not knowing what else to do, he decided to go to Brad Schulz’s blog to read more about the XML technique. Even though it underperformed compared to Dan’s boring old RBAR approach, at least the whole idea of XML was kind of cool and modern.

And when he navigated to the blog, he came upon a post that Brad had written on August 13, 2010 about Query Hints. Brad was describing a situation frighteningly similar to what Dan was experiencing. By adding the query hints FORCE ORDER, MAXDOP 1, and LOOP JOIN to the query, Brad demonstrated how the execution plan could be changed to give better performance.

So Dan decided to add these query hints to each of the 5 techniques and see what would happen.
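For example, the RBAR version of the query needed nothing more than an OPTION clause tacked onto the end (the other four queries got the same treatment):

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArray(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
option (force order, maxdop 1, loop join)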

The new query plans created by the hints certainly made a lot more sense… no more parallelism and weird JOIN configurations and such… and the Inline technique actually worked this time!

Here is what he recorded:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
NO HINTS:
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
XML Technique 237ms 843 449ms 26.408000 35,993.500 25,856KB
CLR Technique 443ms 1541 526ms 22.483400 359,935.000 99,744KB
----------------------------------------------------------------------------------------
WITH HINTS:
Original (RBAR) 18ms 257 67ms 1.13386 288.875 1,024KB
Numbers Technique 23ms 212 178ms 42676.40000 20,968,100.000 383,976KB
Inline Technique 130ms 202 157ms 30394.90000 10,049,000.000 383,976KB
XML Technique 34ms 203 41ms 45.78860 28,887.500 6,768KB
CLR Technique 32ms 203 32ms 220.80600 288,875.000 63,000KB
----------------------------------------------------------------------------------------
*/
With the hints incorporated into the queries, all of the approaches were nice and quick, with Durations under 200ms. The Numbers and Inline techniques were clearly the worst performers… and they both wanted an ENORMOUS memory grant of about 384MB!!! This was because the optimizer so grossly over-estimated the number of resulting rows that it figured it would have to sort tens of millions of them. Dan eliminated those two techniques outright.

Oddly enough, his RBAR technique still had the lowest CPU, but among the three remaining techniques, it had the highest Reads and longest Duration. On the other hand, RBAR had the lowest memory grant by far. The CLR, which was the fastest performer, wanted 60 times the memory that Dan’s RBAR technique wanted. At least the XML approach was not as much of a memory hog, since it had a more reasonable estimate of the number of resulting rows than CLR.

Ultimately, Dan made a decision. He was happy with his RBAR function. Sure, it wasn’t “sexy” like the other approaches, but it had a lot going for it. It was readable. It was simple. The optimizer made reasonable row estimates when it was incorporated into a query. It did not demand a lot of memory. And if, in the future, Dan forgot to include the FORCE ORDER and MAXDOP 1 and LOOP JOIN hints in a query that used his function, it would still perform just fine with very little memory required (unlike the other approaches).

However, Dan also vowed that, from then on, he would test each new query to make sure that his RBAR function was still a reasonable performer. There might be other, more complicated queries in his future that would work better with one of the other techniques. He now knew, better than ever, the golden rule of SQL Server: “It depends.”

Dan sat back in his chair, pleased with his decisions.

And he lived happily ever after.



The morals of the story are:

What works well in theory may not work well in practice… It depends… Test, test, test.

and (forgive me)…

If you're itching to get out of a hairy splitting situation, apply the Dan Druff treatment:
  • Gather data in testing various approaches

  • Hints if necessary

  • Repeat for future queries

Friday, August 13, 2010

Taking A Hint

A new client of mine needed some performance tuning in several of his stored procedures. In most cases, it involved creating an appropriate index, or changing a table-valued function call to an equivalent inline TVF call, or just correcting the predicates in the WHERE clause to make it more SARGable. But there were a couple times when I could eke out even more performance by doing a little something extra.

And what “extra thing” did I give to those queries? I’ll give you a hint: I gave them a hint.

I Want It FAST

For example, there was one procedure that did a little calculation to come up with a particular date, and then it needed to pull out some information from some tables based on a 1-day date range from that date. It was something like this query in AdventureWorks:

declare @DesiredDateAtMidnight datetime = '20010709'
declare @NextDateAtMidnight datetime = dateadd(day,1,@DesiredDateAtMidnight)
select OrderID=h.SalesOrderID
,h.OrderDate
,h.TerritoryID
,TerritoryName=t.Name
,c.CardType
,c.CardNumber
,CardExpire=right(str(100+ExpMonth),2)+'/'+str(ExpYear,4)
,h.TotalDue
from Sales.SalesOrderHeader h
left join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
left join Sales.CreditCard c on h.CreditCardID=c.CreditCardID
where OrderDate>=@DesiredDateAtMidnight
and OrderDate<@NextDateAtMidnight
order by h.SalesOrderID
/*
OrderID OrderDate TerritoryName CardType CardNumber CardExpire TotalDue
------- ---------- --------------- ------------- -------------- ---------- ---------
43728 2001-07-09 Southwest Distinguish 55553397232060 03/2007 3953.9884
43729 2001-07-09 United Kingdom ColonialVoice 77771647096870 03/2006 3756.989
43730 2001-07-09 Southwest Distinguish 55552313505236 09/2007 3756.989
43731 2001-07-09 Australia Vista 11114001441348 11/2006 3953.9884
43732 2001-07-09 Australia Distinguish 55557723662910 11/2007 3729.364
43733 2001-07-09 Northwest SuperiorCard 33338859457470 04/2008 3953.9884
*/
That particular query produces this (actual) execution plan:

Original Query Plan

The optimizer decided to approach this by doing a Merge Join between the CreditCard table (which has 19118 rows) and the result of a Join of the Territory and the SalesOrderHeader tables. Note that it had to SORT the Territory/SalesOrderHeader data by CreditCardID so that it could do that Merge Join. And then, after that, it had to SORT the final data by SalesOrderID, because that’s what the ORDER BY clause of the query called for.

Note, also, that the optimizer estimated that my WHERE clause would pull 2831.85 rows (out of a total of 31465) from SalesOrderHeader. How did it arrive at this figure? Since the @DesiredDateAtMidnight and @NextDateAtMidnight variables had values that were unknown to the query, it had to make assumptions. The optimizer assumes that equality-based predicates are 10% selective and inequality-based predicates are 30% selective. Since I had two inequality-based predicates, it assumed that together they would be 30% * 30% = 9% selective. And, sure enough, 31465 total rows * 9% = 2831.85 rows.

But the final result of the query was only 6 rows.

This is the situation that I was in with my client’s query. The optimizer grossly over-estimated how many rows it would find, and so it came up with a query that was efficient for handling that many rows. But I knew that a day’s worth of data from the particular tables that I was querying would produce about an average of 10 rows, not thousands of rows.

So I gave the optimizer a hint… specifically I gave it a FAST 10 hint, indicating that it should put together a plan that would return 10 rows as fast as possible. Adding OPTION (FAST 10) to the end of our query above produces this (actual) execution plan:

Query Plan With OPTION (FAST 10) Hint

Since the optimizer now knew that only 10 rows were estimated, it could put together a more appropriate plan for that small result. Note that there are Nested Loop Joins rather than a Merge Join and a Hash Join, and there are no more Sort operators, since the main driving table of the query is the SalesOrderHeader Clustered Index, which is already sorted by SalesOrderID.

I could have used the OPTION (RECOMPILE) hint instead, which would allow the optimizer to “sniff” the values of the variables and put together an appropriate plan based on their values, but that would involve having to recompile the plan from scratch every time this procedure was run, and therefore the plan would not go into the cache for others to use. The OPTION (FAST 10) plan, on the other hand, would be cached for others.
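For reference, here is the query again with the FAST 10 hint appended (you could swap in OPTION (RECOMPILE) on that last line to try the alternative):

declare @DesiredDateAtMidnight datetime = '20010709'
declare @NextDateAtMidnight datetime = dateadd(day,1,@DesiredDateAtMidnight)
select OrderID=h.SalesOrderID
,h.OrderDate
,h.TerritoryID
,TerritoryName=t.Name
,c.CardType
,c.CardNumber
,CardExpire=right(str(100+ExpMonth),2)+'/'+str(ExpYear,4)
,h.TotalDue
from Sales.SalesOrderHeader h
left join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
left join Sales.CreditCard c on h.CreditCardID=c.CreditCardID
where OrderDate>=@DesiredDateAtMidnight
and OrderDate<@NextDateAtMidnight
order by h.SalesOrderID
option (fast 10)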

The Profiler statistics for the two queries are as follows:

/*
Description CPU Reads Duration QueryCost Relative%
-------------------------------------------------------------
No Hint Provided 63ms 889 288ms 0.9698090 96%
OPTION (FAST 10) 31ms 775 173ms 0.0450217 4%
*/
The CPU is so low for both queries that it’s really meaningless to compare them, but you can see that the Number of Reads and the Duration are lower with the hint introduced. The optimizer figured the second approach was much cheaper, but that’s mainly because, as far as it was concerned, it was comparing a 2832-row query to a 10-row query.

In my particular case with the client, the query was a bit more complicated, so the difference in CPU and Reads was much more dramatic. By adding the FAST 10 hint, the CPU was decreased by 60% and the Reads were decreased by 75%.

Using the FORCE

Another query that I wanted to tune incorporated an array-splitting (inline) table-valued function that used a Numbers table.

Let’s put together an example in AdventureWorks to illustrate what I was dealing with. First, create a Numbers table consisting of the integers from 1 to 1,000,000 and create a clustered index on it:

;with
L0(c) as (select 0 from (values (0),(0),(0)) x(c)) --3 Rows
,L1(c) as (select 0 from L0 a, L0 b, L0 c) --27 Rows (3x3x3)
,L2(c) as (select 0 from L1 a, L1 b, L1 c) --19683 Rows (27x27x27)
,L3(c) as (select 0 from L2 a, L2 b) --387,420,489 Rows (19683x19683)
,NN(n) as (select row_number() over (order by (select 0)) from L3)
select Number=isnull(convert(int,n),0) --Force it to be a "not null" column
into dbo.Numbers
from NN
where n<=1000000
go
alter table dbo.Numbers
add constraint PK_Numbers
primary key clustered (Number)
with (fillfactor=100)
Next, create an inline table-valued function that would use that Numbers table to split a comma-delimited list of integers:

create function dbo.ufn_SplitIntArray
(
@List varchar(max)
,@Delimiter varchar(10)
)
returns table
as
return
select
Item=convert(int,String)
from dbo.Numbers with (nolock)
cross apply (select ItemPos=Number) F1
cross apply (select DelimPos=charindex(@Delimiter,@List+@Delimiter,ItemPos)) F2
cross apply (select String=rtrim(ltrim(substring(@List,ItemPos,DelimPos-ItemPos)))) F3
where ItemPos<=convert(int,len(@List))
and substring(@Delimiter+@List,ItemPos,1)=@Delimiter
You can see how this function is used in the example below:

select * 
from dbo.ufn_SplitIntArray('123,456,789',',')
/*
Item
----
123
456
789
*/
Now that we have that in place, we can see a query that makes use of this function. The following will take a list of TerritoryID’s and will list the SalesOrderHeader rows that are in those Territories, along with the Territory Description:

declare @TerritoryList varchar(max) = '2,3,5'
select h.SalesOrderID
,h.OrderDate
,t.TerritoryID
,TerritoryName=t.Name
,h.PurchaseOrderNumber
,h.TotalDue
from dbo.ufn_SplitIntArray(@TerritoryList,',') a
join Sales.SalesTerritory t on a.Item=t.TerritoryID
join Sales.SalesOrderHeader h on t.TerritoryID=h.TerritoryID
order by h.SalesOrderID
/*
SalesOrderID OrderDate TerritoryID TerritoryName PurchaseOrderNumber TotalDue
------------ ---------- ----------- ------------- ------------------- ----------
43659 2001-07-01 5 Southeast PO522145787 27231.5495
43660 2001-07-01 5 Southeast PO18850127500 1716.1794
43667 2001-07-01 3 Central PO15428132599 8095.7863
...
74193 2004-07-02 5 Southeast NULL 43.0729
74548 2004-07-13 5 Southeast NULL 38.675
75103 2004-07-31 5 Southeast NULL 44.1779
(1223 rows total)
*/
And here is the (actual) execution plan for that query:

Original Query Plan

Notice that Parallelism is involved in this plan. The optimizer figured our query was so expensive that it decided it was cheaper to spread the burden of the query across the 4 processors in my system.

As far as how to attack the query, the optimizer put together a plan that starts by scanning the SalesTerritory table and Hash Joins it with the SalesOrderHeader table, resulting in an estimated 24,832 rows.

It also scans the Numbers table, first satisfying the predicate of

where Number<=convert(int,len(@TerritoryList))
Since the optimizer has no idea what is in @TerritoryList, it makes an inequality-based assumption that the predicate will have a selectivity of 30%, and therefore it estimates that it will produce 1,000,000 * 30% = 300,000 rows. In reality, it only produced 5 rows.

Then it applies a filter to those rows to pull the positions of the comma-delimited numbers in @TerritoryList… in other words a filter to satisfy the predicate of

where substring(','+@TerritoryList,Number,1)=','
It estimates that those 300,000 rows will filter down to 9486.83 (I have no idea how it arrived at this 3.16% selectivity figure). In reality, it only produced 3 rows. Finally, the optimizer plans on storing those rows into a Spool for repeated access by the Nested Loops operator.

Now the Nested Loops operator will match up each of the estimated 24,832 rows with each of the estimated 9486.83 rows from the spool and will find the ones that satisfy this JOIN predicate (which is implied by incorporating our inline table-valued function in our query):

convert(int
,rtrim(ltrim(substring(@TerritoryList
,Number
,charindex(',',@TerritoryList+',',Number)-Number
)
)
)
) = SalesTerritory.TerritoryID
Since that’s an equality predicate, it assumes 10% selectivity, so it comes up with a final estimated result of 24,832 * 9486.83 * 10% = 23,557,700 rows for the final result. Wow! In reality, the Nested Loops will match 31,465 rows with the 3 rows in the spool (producing 94,395 matches) and end up only finding 1223 that satisfy the JOIN predicate (a selectivity of only 1.3%).

The bottom line here is that the optimizer got scared by that million-row table of Numbers and grossly overestimated how many values it would pull from that table and so put together a plan to most efficiently handle a final product of over 23 million rows. That’s why it started by processing the 31,465-row SalesOrderHeader table and spooled the estimated 9486.83 rows from the Numbers table that it would match up with each of those.

But we know that, in reality, our @TerritoryList will only consist of a small handful of comma-delimited values, so the plan should ideally start with the Numbers table, producing the positions of the values in @TerritoryList and, for each of those values, look them up in the SalesTerritory and then find the rows in SalesOrderHeader with that TerritoryID.

In other words, it should process the plan in exactly the order that I specified in the query:

from dbo.ufn_SplitIntArray(@TerritoryList,',') a
join Sales.SalesTerritory t on a.Item=t.TerritoryID
join Sales.SalesOrderHeader h on t.TerritoryID=h.TerritoryID
Well, there’s a hint for that! If we add an OPTION (FORCE ORDER) to our query, we end up with a completely new (actual) execution plan:

Query Plan With OPTION (FORCE ORDER) Hint

And here are the statistics that Profiler gives us in comparing the two plans:

/*
Description CPU Reads Duration QueryCost Relative%
----------------------------------------------------------------
No Hint Provided 1499ms 63,783 854ms 1167.11 27%
FORCE ORDER 141ms 3,660 314ms 3149.68 73%
*/
Nothing changed in what the optimizer estimates… it still thinks the query will produce 23 million rows. And it obviously thinks we’re completely insane to do a FORCE ORDER, because it estimates that the cost will almost triple by doing it this way. But you and I know better, and by introducing that hint, we decreased the CPU and Reads dramatically.

But this new plan still has Parallelism in it. The optimizer thought we needed it because it thinks the cost is so ridiculous that it will be cheaper to split up the query among the 4 processors in my computer. And, interestingly enough, it figures that by doing that split, it can introduce Sort operators to order the two main streams by TerritoryID and do a Merge Join, and then turn around and do a Sort on SalesOrderID for the final result.

Well, we know that our final result is not going to be that large… it’s certainly not going to be anywhere even close to 23 million rows! So let’s just get rid of the Parallelism. We’ll add one additional hint (MAXDOP 1) to our query in order to force it to only use one processor:

option (force order, maxdop 1)
Here’s the (actual) execution plan that results from that hint:

Query Plan With OPTION (FORCE ORDER, MAXDOP 1) Hint

See how simple that new plan is? Now how does it compare to the previous two plans?

/*
Description CPU Reads Duration QueryCost Relative%
--------------------------------------------------------------------
No Hint Provided 1499ms 63,783 854ms 1167.11 12%
FORCE ORDER 141ms 3,660 314ms 3149.68 31%
FORCE ORDER, MAXDOP 1 31ms 741 175ms 5788.61 57%
*/
The optimizer thinks we’ve completely lost our mind, since it estimates the Cost is 5788.61… over 5 times more expensive than the plan it would have produced without hints. It’s probably snickering… laughing behind our backs and thinking, “Your stupid plan with hints will take over an hour and a half to run, but if you let me do my job, I can do it all in only 20 minutes!”

But that couldn’t be further from the truth. By introducing the hints, we reduced the CPU by 98% and reduced the Reads by 99%.

One More LOOPy Enhancement

But wait! We can even do better!

The plan still involves a Clustered Index Scan of the entire SalesOrderHeader table. If we can create an index on TerritoryID and INCLUDE the other columns required by the query, then we could SEEK into that index by TerritoryID rather than SCANning it.

So let’s create an index… Note that the TotalDue column in SalesOrderHeader is actually a computed column based on SubTotal and TaxAmt and Freight, so we have to INCLUDE those columns:

create index ix_Terr_iOrdDt_iPO_iTDue 
on Sales.SalesOrderHeader
(TerritoryID)
include (OrderDate,PurchaseOrderNumber,SubTotal,TaxAmt,Freight)
Now let’s run our query (with our two hints) and see if that index made a difference in the plan:

Query Plan With OPTION (FORCE ORDER, MAXDOP 1) Hint And New Index

Hmmm… the only thing that changed was that it was SCANning our new index rather than SCANning the Clustered Index. That improves the query somewhat, decreasing the number of reads, because our new index is smaller in terms of number of pages:

/*
Description CPU Reads Duration
-----------------------------------------------
No Hint Provided 1499ms 63,783 854ms
FORCE ORDER 141ms 3,660 314ms
FORCE ORDER, MAXDOP 1 31ms 741 175ms
Add New Index 31ms 232 172ms
*/
But we created that index to take advantage of the TerritoryID. We wanted it to SEEK into the index rather than SCAN the entire thing.

So another hint to the rescue!

This time we will tell the optimizer that we only want it to use LOOP JOINs. So here’s our new query with all the hints involved:

declare @TerritoryList varchar(max) = '2,3,5'
select h.SalesOrderID
,h.OrderDate
,t.TerritoryID
,TerritoryName=t.Name
,h.PurchaseOrderNumber
,h.TotalDue
from dbo.ufn_SplitIntArray(@TerritoryList,',') a
join Sales.SalesTerritory t on a.Item=t.TerritoryID
join Sales.SalesOrderHeader h on t.TerritoryID=h.TerritoryID
order by h.SalesOrderID
option (force order, maxdop 1, loop join)
And here’s the new (actual) execution plan that it produces:

Query Plan With OPTION (FORCE ORDER, MAXDOP 1, LOOP JOIN) Hint

Note that we now have a Nested Loops Join that now SEEKs into our new index. And the results?

/*
Description CPU Reads Duration
-----------------------------------------------
No Hint Provided 1499ms 63,783 854ms
FORCE ORDER 141ms 3,660 314ms
FORCE ORDER, MAXDOP 1 31ms 741 175ms
Add New Index 31ms 232 172ms
Add LOOP JOIN Hint 16ms 69 144ms
*/
Hah! So we’ve ultimately decreased the Reads by 99.9% from 63,783 down to a measly 69. That’s the way it should be with a straightforward query like this one.

Thank goodness for the existence of hints. They’re one more effective tool we can use to beat the optimizer into submission… er, persuade the optimizer to form a plan that we know is best for our query.