Tuesday, August 17, 2010

Integer List Splitting: A SQL Fable

Once Upon A Time...

Once upon a time, not too long ago, there was a company called AdventureWorks. And working in this company was a diligent database developer named Dan Druff. Dan was responsible for developing and maintaining the AdventureWorks SQL Server database.

One day, Dan got a request to put together a query. The specs called for accepting a string of comma-delimited ProductID’s and to produce a result set representing orders that used those ProductID’s. The result set was to consist of the ProductID, the Product Name, the SalesOrderID, the Order Date, and the name of the Territory associated with the order. The final output was to be ordered by ProductID and SalesOrderID.

This sounded easy enough. Dan already had a user-defined function to split a comma-delimited list of integers. He had written it himself:

create function dbo.ufn_SplitIntArray
(
  @List varchar(max)
 ,@Delimiter char(1)
)
returns @Items table (Item int)
as
begin
  declare @Item varchar(12)
         ,@Pos int
  while len(@List)>0
  begin
    set @Pos=charindex(@Delimiter,@List)  --Find the next delimiter
    if @Pos=0 set @Pos=len(@List)+1       --No delimiter left? Pretend there's one just past the end
    set @Item=left(@List,@Pos-1)          --Grab everything up to the delimiter...
    insert @Items select convert(int,ltrim(rtrim(@Item)))  --...and convert it to an integer
    set @List=substring(@List,@Pos+len(@Delimiter),len(@List))  --Chop the processed item off the front
  end
  return
end
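A quick standalone test (the same kind of test Dan would later run on the other splitters) shows it in action:

select * 
from dbo.ufn_SplitIntArray('123,456,789',',')
/*
Item
----
123
456
789
*/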
He put together the requested query, incorporating his function, and he tested it out:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArray(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
It seemed to work just fine… it spit out the result quickly and efficiently.

But something nagged at Dan. His function had served him well in the past, but it was so “yesterday”… so “1970s”… it just felt kind of old-fashioned and tired. Dan wanted a function that was more “hip” and more “with it”.

And then Dan had an idea.

He recalled reading an excellent blog post by MVP Aaron Bertrand from last July that did a roundup of various string-splitting techniques. One of the techniques was just like Dan’s, and Aaron had given the approach a nickname of RBAR, meaning “Row By Agonizing Row”. The roundup showed that this technique was one of the poorest performers.

So Dan made a decision right then and there that he was going to move into the 21st century… he was going to upgrade his function to one of the better performers that Aaron had outlined in his blog.

The fastest performer in Aaron’s roundup by far was the CLR approach. But that kind of scared Dan… he didn’t want to get into something that he didn’t fully understand.

So he gave one of the other techniques a try.



In general, the Numbers Table technique seemed to be the next-best performer after CLR in Aaron’s roundup, so Dan went about creating a table of numbers from 1 to 1,000,000 in his AdventureWorks database, and created a primary key for it:

;with
L0(c) as (select 0 from (values (0),(0),(0)) x(c)) --3 Rows
,L1(c) as (select 0 from L0 a, L0 b, L0 c) --27 Rows (3x3x3)
,L2(c) as (select 0 from L1 a, L1 b, L1 c) --19683 Rows (27x27x27)
,L3(c) as (select 0 from L2 a, L2 b) --387,420,489 Rows (19683x19683)
,NN(n) as (select row_number() over (order by (select 0)) from L3)
select Number=isnull(convert(int,n),0) --Force it to be a "not null" column
into dbo.Numbers
from NN
where n<=1000000
go
alter table dbo.Numbers
add constraint PK_Numbers
primary key clustered (Number)
with (fillfactor=100)
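As a quick sanity check on the new table, an aggregate query should report exactly 1,000,000 rows running from 1 to 1,000,000:

select NumberOfRows=count(*)
,MinNumber=min(Number)
,MaxNumber=max(Number)
from dbo.Numbers
/*
NumberOfRows MinNumber MaxNumber
------------ --------- ---------
1000000      1         1000000
*/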
And then he put together a function to split a string of integers using this auxiliary table of numbers:

create function dbo.ufn_SplitIntArrayNum
(
@List varchar(max)
,@Delimiter char(1)
)
returns table
as
return
select
Item=convert(int,String)
from dbo.Numbers with (nolock)
cross apply (select ItemPos=Number) F1
cross apply (select DelimPos=charindex(@Delimiter,@List+@Delimiter,ItemPos)) F2
cross apply (select String=rtrim(ltrim(substring(@List,ItemPos,DelimPos-ItemPos)))) F3
where ItemPos<=convert(int,len(@List))
and substring(@Delimiter+@List,ItemPos,1)=@Delimiter
He tested it out…

select * 
from dbo.ufn_SplitIntArrayNum('123,456,789',',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayNum(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
Well, it gave the correct result, but something seemed strange. The query seemed sluggish. It didn’t seem as quick as his query that used his tired old “RBAR” function. But that couldn’t be possible… after all, Aaron’s blog had clearly demonstrated the superiority of the Numbers approach.

So Dan fired up Profiler and did a comparison between the two queries. And he looked at the Actual Execution Plans for the two queries and he was shocked at what he discovered:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
*/
What!!?? This couldn’t be! Dan’s RBAR approach completely annihilated the Numbers approach! Heck, it didn’t just beat it by a mile… it beat it by a light-year!

The CPU and Reads of the Numbers Technique were huge! The memory grant was almost 38MB! And the query took over 2 full seconds to run! This was ridiculous!

Dan was very puzzled and a little scared. Maybe he was doing something wrong. Maybe something was going on that he didn’t understand.

He decided to try another technique.



The approach that Aaron’s blog named “Inline 2” looked promising. So Dan created that function in his database:

create function dbo.ufn_SplitIntArrayInline
(
@List varchar(max)
,@Delimiter char(1)
)
returns table
as
return
select
Item=convert(int,(substring(@Delimiter+@List+@Delimiter
,w.n+1
,charindex(@Delimiter
,@Delimiter+@List+@Delimiter
,w.n+1) - w.n - 1
)))
from (select n=v0.n+v1.n+v2.n+v3.n
from (select n = 0
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
union all select 10 union all select 11 union all select 12
union all select 13 union all select 14 union all select 15) v0
,(select n = 0
union all select 16 union all select 32 union all select 48
union all select 64 union all select 80 union all select 96
union all select 112 union all select 128 union all select 144
union all select 160 union all select 176 union all select 192
union all select 208 union all select 224 union all select 240) v1
,(select n = 0
union all select 256 union all select 512 union all select 768
union all select 1024 union all select 1280 union all select 1536
union all select 1792 union all select 2048 union all select 2304
union all select 2560 union all select 2816 union all select 3072
union all select 3328 union all select 3584 union all select 3840) v2
,(select n = 0
union all select 4096 union all select 8192 union all select 12288
union all select 16384 union all select 20480 union all select 24576
union all select 28672 union all select 32768 union all select 36864
union all select 40960 union all select 45056 union all select 49152
union all select 53248 union all select 57344 union all select 61440
union all select 65536 union all select 69632 union all select 73728
union all select 77824 union all select 81920 union all select 86016
union all select 90112 union all select 94208 union all select 98304
union all select 102400 union all select 106496 union all select 110592
union all select 114688 union all select 118784 union all select 122880
union all select 126976 union all select 131072 union all select 135168
union all select 139264 union all select 143360 union all select 147456) v3
) w
where w.n=charindex(@Delimiter,@Delimiter+@List+@Delimiter,w.n)
and w.n<len(@Delimiter+@List)
He tested it out…

select * 
from dbo.ufn_SplitIntArrayInline('123,456,789',',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayInline(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
Msg 537, Level 16, State 5, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function.
*/
What!!?? What was going on? The query completely bombed! This didn’t make sense. The function worked when run by itself, but once it was JOINed with other tables, it turned over and died.

Dan looked at the estimated execution plan and recorded some information to compare it with his previous two approaches:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
*/
Omigosh! Look at the Estimated Query Cost! The optimizer estimated this query was going to take 116532/3600 = 32 Hours to run! No wonder it choked!

Dan was really confused at this point. But he decided to bravely trudge onward.



He figured he would give the XML approach a try… it certainly seemed to hold its own in Aaron’s tests. This XML technique was an approach that some joker named Brad Schulz had talked about several times in his blog. Dan didn’t always read Brad’s blog posts… Sometimes Brad had a weird sense of humor and that didn’t always sit well with Dan… Dan was kind of a serious fellow.

But, despite his feelings about Brad, Dan went ahead and created a function using the XML string-splitting function in his database:

create function dbo.ufn_SplitIntArrayXML
(
@List varchar(max)
,@Delimiter char(1)
)
returns table
as
return
select
Item
from (select XMLString='<x>'+replace(@List,@Delimiter,'</x><x>')+'</x>') F1
cross apply (select XMLList=cast(XMLString as xml).query('.')) F2
cross apply XMLList.nodes('/x') F3(XMLNode)
cross apply (select Item=XMLNode.value('(./text())[1]','int')) F4
He tested it out…

select * 
from dbo.ufn_SplitIntArrayXML('123,456,789',',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayXML(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
Hmmm… well at least this approach didn’t croak like the Inline technique did. The result was correct. And it seemed pretty snappy in execution. Dan investigated the Profiler and Actual Execution Plan data and added data to his table of figures:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
XML Technique 237ms 843 449ms 26.408000 35,993.500 25,856KB
*/
What!!?? This was really weird! Dan’s original RBAR approach was still the clear winner in all these tests. The XML approach came pretty darn close… MUCH closer than the other methods… but it wanted a memory grant of 26MB!

Dan’s world had turned upside-down. He was expecting to hear news bulletins any minute talking about pigs flying and snowballs discovered in Hell. This just wasn’t right!

Dan figured he’d have to go the final step… There was only one clear path. He took a deep breath, let it out slowly, and decided he would have to brave the CLR technique.



Aaron Bertrand’s roundup had given a link to Adam Machanic’s blog that had the C# source code for doing the string splitting. But, as little as Dan knew about C#, he could tell that Adam’s code returned a table of NVARCHAR values, not INT values.

So Dan gave a call to an old friend from college named Biff Wellington who was a C# consultant and asked if he could help. Biff was thrilled to hear from Dan and said he’d be happy to assist… anything for an old pal. Dan referred him to the source code at Adam Machanic’s blog and, an hour later, Biff emailed Dan the revised code… and a bill for $150. Dan swore under his breath.

Here was the code that Biff had sent, which amended Adam Machanic’s code to include a FillRowInts function and SplitStringOfInts function to return a table of integers:

Click here to download the C# Code

Now Dan had to jump through all the hoops to get this C# code to work. He downloaded a copy of C# 2008 Express Edition from Microsoft (at least that was free, unlike Biff’s services… the jerk). He created a project using the Class Library template, calling the project BiffSucks. He pasted in Biff’s code and hit the F6 key to Build the Solution and it built successfully! Now he was ready to get into SSMS to create his function.

First he made sure that the server was enabled for CLR:

sp_configure 'CLR Enabled',1
go
reconfigure
go
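To confirm the change took effect, the sys.configurations catalog view shows the value currently in use:

select name
,value_in_use
from sys.configurations
where name='clr enabled'
/*
name        value_in_use
----------- ------------
clr enabled 1
*/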
Then he created an assembly, which referenced the DLL he had built:

create assembly StringHelper
from 'c:\Users\DanDruff\Documents\Visual Studio 2008\Projects\BiffSucks\BiffSucks\obj\Release\BiffSucks.dll'
And then he created the function to use that assembly and its SplitStringOfInts function:

create function dbo.ufn_SplitIntArrayCLR
(
@List nvarchar(max)
,@Delimiter nchar(1)
)
returns table (Item int)
external name StringHelper.UserDefinedFunctions.SplitStringOfInts
Gee, this wasn’t so hard after all!

He tested it out…

select * 
from dbo.ufn_SplitIntArrayCLR(N'123,456,789',N',')
/*
Item
----
123
456
789
*/
…and it worked just fine. So he incorporated it into his query and executed it:

declare @ProductList nvarchar(max) = N'897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArrayCLR(@ProductList,N',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
/*
ProductID ProductName SalesOrderID OrderDate TerritoryName
--------- -------------------------------- ------------ ---------- --------------
897 LL Touring Frame - Blue, 58 51823 2003-08-01 United Kingdom
897 LL Touring Frame - Blue, 58 51875 2003-08-01 Australia
911 LL Road Seat/Saddle 51140 2003-07-01 Central
911 LL Road Seat/Saddle 53472 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53488 2003-09-01 Canada
911 LL Road Seat/Saddle 53495 2003-09-01 United Kingdom
911 LL Road Seat/Saddle 53501 2003-09-01 Southeast
911 LL Road Seat/Saddle 53529 2003-09-01 Canada
942 ML Mountain Frame-W - Silver, 38 51120 2003-07-01 United Kingdom
942 ML Mountain Frame-W - Silver, 38 51711 2003-08-01 Northwest
942 ML Mountain Frame-W - Silver, 38 51758 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51799 2003-08-01 Southeast
942 ML Mountain Frame-W - Silver, 38 51856 2003-08-01 Southwest
*/
It worked! And it seemed pretty quick! Dan was hopeful as he started to record the Profiler and Actual Execution Plan figures into his table:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
XML Technique 237ms 843 449ms 26.408000 35,993.500 25,856KB
CLR Technique 443ms 1541 526ms 22.483400 359,935.000 99,744KB
*/
What!!?? Huh? The CLR approach performed in under a second, but it was still slower than the RBAR and XML approaches! And it had almost twice the reads! And it wanted almost 100MB of memory to do its job!!

Dan was feeling faint. This must be a dream, he thought. He pinched himself. But he didn’t wake up… it was not a dream.

Dan was at wit’s end… The world no longer made any sense.



Not knowing what else to do, he decided to go to Brad Schulz’s blog to read more about the XML technique. Even though it underperformed compared to Dan’s boring old RBAR approach, at least the whole idea of XML was kind of cool and modern.

And when he navigated to the blog, he came upon a post that Brad had written on August 13, 2010 about Query Hints. Brad was describing a situation frighteningly similar to what Dan was experiencing. By adding the query hints FORCE ORDER, MAXDOP 1, and LOOP JOIN to the query, Brad demonstrated how the execution plan could be changed to give better performance.

So Dan decided to add these query hints to each of the 5 techniques and see what would happen.
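For example, the RBAR version of the query needed nothing more than an OPTION clause tacked onto the end (the other four queries got the same treatment):

declare @ProductList varchar(max) = '897,911,942'
select d.ProductID
,ProductName=p.Name
,h.SalesOrderID
,h.OrderDate
,TerritoryName=t.Name
from dbo.ufn_SplitIntArray(@ProductList,',') a
join Sales.SalesOrderDetail d on a.Item=d.ProductID
join Production.Product p on d.ProductID=p.ProductID
join Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID
option (force order, maxdop 1, loop join)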

The new query plans created by the hints certainly made a lot more sense… no more parallelism and weird JOIN configurations and such… and the Inline technique actually worked this time!

Here is what he recorded:

/*
Estimated Estimated
Description CPU Reads Duration QueryCost Number of Rows MemoryGrant
----------------------------------------------------------------------------------------
NO HINTS:
Original (RBAR) 62ms 832 304ms 0.790177 359.935 2,192KB
Numbers Technique 5961ms 244,199 2229ms 3556.620000 56,629,000.000 37,984KB
Inline Technique N/A N/A N/A 116532.000000 34,912,200.000 N/A
XML Technique 237ms 843 449ms 26.408000 35,993.500 25,856KB
CLR Technique 443ms 1541 526ms 22.483400 359,935.000 99,744KB
----------------------------------------------------------------------------------------
WITH HINTS:
Original (RBAR) 18ms 257 67ms 1.13386 288.875 1,024KB
Numbers Technique 23ms 212 178ms 42676.40000 20,968,100.000 383,976KB
Inline Technique 130ms 202 157ms 30394.90000 10,049,000.000 383,976KB
XML Technique 34ms 203 41ms 45.78860 28,887.500 6,768KB
CLR Technique 32ms 203 32ms 220.80600 288,875.000 63,000KB
----------------------------------------------------------------------------------------
*/
With the hints incorporated into the queries, all of the approaches were nice and quick, with Durations under 200ms. The Numbers and Inline techniques were clearly the worst performers… and they both wanted an ENORMOUS memory grant of about 384MB!!! This was because the optimizer so grossly over-estimated the number of resulting rows that it figured it would have to sort tens of millions of them. Dan eliminated those two techniques outright.

Oddly enough, his RBAR technique still had the lowest CPU, but among the three remaining techniques, it had the highest Reads and longest Duration. On the other hand, RBAR had the lowest memory grant by far. The CLR, which was the fastest performer, wanted 60 times the memory that Dan’s RBAR technique wanted. At least the XML approach was not as much of a memory hog, since it had a more reasonable estimate of the number of resulting rows than CLR.

Ultimately, Dan made a decision. He was happy with his RBAR function. Sure, it wasn’t “sexy” like the other approaches, but it had a lot going for it. It was readable. It was simple. The optimizer made reasonable row estimates when it was incorporated into a query. It did not demand a lot of memory. And if, in the future, Dan forgot to include the FORCE ORDER and MAXDOP 1 and LOOP JOIN hints in a query that used his function, it would still perform just fine with very little memory required (unlike the other approaches).

However, Dan also vowed that, from then on, he would test each new query to make sure that his RBAR function was still a reasonable performer. There might be other, more complicated queries in his future that would work better with one of the other techniques. He now knew, better than ever, the golden rule of SQL Server: “It depends.”

Dan sat back in his chair, pleased with his decisions.

And he lived happily ever after.



The morals of the story are:

What works well in theory may not work well in practice… It depends… Test, test, test.

and (forgive me)…

If you're itching to get out of a hairy splitting situation, apply the Dan Druff treatment:
  • Gather data in testing various approaches

  • Hints if necessary

  • Repeat for future queries

Friday, August 13, 2010

Taking A Hint

A new client of mine needed some performance tuning in several of his stored procedures. In most cases, it involved creating an appropriate index, or changing a table-valued function call to an equivalent inline TVF call, or just correcting the predicates in the WHERE clause to make it more SARGable. But there were a couple times when I could eke out even more performance by doing a little something extra.

And what “extra thing” did I give to those queries? I’ll give you a hint: I gave them a hint.

I Want It FAST

For example, there was one procedure that did a little calculation to come up with a particular date, and then it needed to pull out some information from some tables based on a 1-day date range from that date. It was something like this query in AdventureWorks:

declare @DesiredDateAtMidnight datetime = '20010709'
declare @NextDateAtMidnight datetime = dateadd(day,1,@DesiredDateAtMidnight)
select OrderID=h.SalesOrderID
,h.OrderDate
,h.TerritoryID
,TerritoryName=t.Name
,c.CardType
,c.CardNumber
,CardExpire=right(str(100+ExpMonth),2)+'/'+str(ExpYear,4)
,h.TotalDue
from Sales.SalesOrderHeader h
left join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
left join Sales.CreditCard c on h.CreditCardID=c.CreditCardID
where OrderDate>=@DesiredDateAtMidnight
and OrderDate<@NextDateAtMidnight
order by h.SalesOrderID
/*
OrderID OrderDate TerritoryName CardType CardNumber CardExpire TotalDue
------- ---------- --------------- ------------- -------------- ---------- ---------
43728 2001-07-09 Southwest Distinguish 55553397232060 03/2007 3953.9884
43729 2001-07-09 United Kingdom ColonialVoice 77771647096870 03/2006 3756.989
43730 2001-07-09 Southwest Distinguish 55552313505236 09/2007 3756.989
43731 2001-07-09 Australia Vista 11114001441348 11/2006 3953.9884
43732 2001-07-09 Australia Distinguish 55557723662910 11/2007 3729.364
43733 2001-07-09 Northwest SuperiorCard 33338859457470 04/2008 3953.9884
*/
That particular query produces this (actual) execution plan:

Original Query Plan

The optimizer decided to approach this by doing a Merge Join between the CreditCard table (which has 19118 rows) and the result of a Join of the Territory and the SalesOrderHeader tables. Note that it had to SORT the Territory/SalesOrderHeader data by CreditCardID so that it could do that Merge Join. And then, after that, it had to SORT the final data by SalesOrderID, because that’s what the ORDER BY clause of the query called for.

Note, also, that the optimizer estimated that my WHERE clause would pull 2831.85 rows (out of a total of 31465) from SalesOrderHeader. How did it arrive at this figure? Since the @DesiredDateAtMidnight and @NextDateAtMidnight variables had values that were unknown to the query, it had to make assumptions. The optimizer assumes that equality-based predicates are 10% selective and inequality-based predicates are 30% selective. Since I had two inequality-based predicates, it assumed that together they would be 30% * 30% = 9% selective. And, sure enough, 31465 total rows * 9% = 2831.85 rows.

But the final result of the query was only 6 rows.

This is the situation that I was in with my client’s query. The optimizer grossly over-estimated how many rows it would find, and so it came up with a query that was efficient for handling that many rows. But I knew that a day’s worth of data from the particular tables that I was querying would produce about an average of 10 rows, not thousands of rows.

So I gave the optimizer a hint… specifically I gave it a FAST 10 hint, indicating that it should put together a plan that would return 10 rows as fast as possible. Adding OPTION (FAST 10) to the end of our query above produces this (actual) execution plan:

Query Plan With OPTION (FAST 10) Hint

Since the optimizer now knew that only 10 rows were estimated, it could put together a more appropriate plan for that small result. Note that there are Nested Loop Joins rather than a Merge Join and a Hash Join, and there are no more Sort operators, since the main driving table of the query is the SalesOrderHeader Clustered Index, which is already sorted by SalesOrderID.

I could have used the OPTION (RECOMPILE) hint instead, which would allow the optimizer to “sniff” the values of the variables and put together an appropriate plan based on their values, but that would involve having to recompile the plan from scratch every time this procedure was run, and therefore the plan would not go into the cache for others to use. The OPTION (FAST 10) plan, on the other hand, would be cached for others.
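For reference, here is the query again with the FAST 10 hint appended (you could swap in OPTION (RECOMPILE) on that last line to try the alternative):

declare @DesiredDateAtMidnight datetime = '20010709'
declare @NextDateAtMidnight datetime = dateadd(day,1,@DesiredDateAtMidnight)
select OrderID=h.SalesOrderID
,h.OrderDate
,h.TerritoryID
,TerritoryName=t.Name
,c.CardType
,c.CardNumber
,CardExpire=right(str(100+ExpMonth),2)+'/'+str(ExpYear,4)
,h.TotalDue
from Sales.SalesOrderHeader h
left join Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
left join Sales.CreditCard c on h.CreditCardID=c.CreditCardID
where OrderDate>=@DesiredDateAtMidnight
and OrderDate<@NextDateAtMidnight
order by h.SalesOrderID
option (fast 10)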

The Profiler statistics for the two queries are as follows:

/*
Description CPU Reads Duration QueryCost Relative%
-------------------------------------------------------------
No Hint Provided 63ms 889 288ms 0.9698090 96%
OPTION (FAST 10) 31ms 775 173ms 0.0450217 4%
*/
The CPU is so low for both queries that it’s really meaningless to compare them, but you can see that the Number of Reads and the Duration are lower with the hint introduced. The optimizer figured the second approach was much cheaper, but that’s mainly because, as far as it was concerned, it was comparing a 2832-row query to a 10-row query.

In my particular case with the client, the query was a bit more complicated, so the difference in CPU and Reads was much more dramatic. By adding the FAST 10 hint, the CPU was decreased by 60% and the Reads were decreased by 75%.

Using the FORCE

Another query that I wanted to tune incorporated an array-splitting (inline) table-valued function that used a Numbers table.

Let’s put together an example in AdventureWorks to illustrate what I was dealing with. First, create a Numbers table consisting of the integers from 1 to 1,000,000 and create a clustered index on it:

;with
L0(c) as (select 0 from (values (0),(0),(0)) x(c)) --3 Rows
,L1(c) as (select 0 from L0 a, L0 b, L0 c) --27 Rows (3x3x3)
,L2(c) as (select 0 from L1 a, L1 b, L1 c) --19683 Rows (27x27x27)
,L3(c) as (select 0 from L2 a, L2 b) --387,420,489 Rows (19683x19683)
,NN(n) as (select row_number() over (order by (select 0)) from L3)
select Number=isnull(convert(int,n),0) --Force it to be a "not null" column
into dbo.Numbers
from NN
where n<=1000000
go
alter table dbo.Numbers
add constraint PK_Numbers
primary key clustered (Number)
with (fillfactor=100)
Next, create an inline table-valued function that would use that Numbers table to split a comma-delimited list of integers:

create function dbo.ufn_SplitIntArray
(
@List varchar(max)
,@Delimiter varchar(10)
)
returns table
as
return
select
Item=convert(int,String)
from dbo.Numbers with (nolock)
cross apply (select ItemPos=Number) F1
cross apply (select DelimPos=charindex(@Delimiter,@List+@Delimiter,ItemPos)) F2
cross apply (select String=rtrim(ltrim(substring(@List,ItemPos,DelimPos-ItemPos)))) F3
where ItemPos<=convert(int,len(@List))
and substring(@Delimiter+@List,ItemPos,1)=@Delimiter
You can see how this function is used in the example below:

select * 
from dbo.ufn_SplitIntArray('123,456,789',',')
/*
Item
----
123
456
789
*/
Now that we have that in place, we can see a query that makes use of this function. The following will take a list of TerritoryID’s and will list the SalesOrderHeader rows that are in those Territories, along with the Territory Description:

declare @TerritoryList varchar(max) = '2,3,5'
select h.SalesOrderID
,h.OrderDate
,t.TerritoryID
,TerritoryName=t.Name
,h.PurchaseOrderNumber
,h.TotalDue
from dbo.ufn_SplitIntArray(@TerritoryList,',') a
join Sales.SalesTerritory t on a.Item=t.TerritoryID
join Sales.SalesOrderHeader h on t.TerritoryID=h.TerritoryID
order by h.SalesOrderID
/*
SalesOrderID OrderDate TerritoryID TerritoryName PurchaseOrderNumber TotalDue
------------ ---------- ----------- ------------- ------------------- ----------
43659 2001-07-01 5 Southeast PO522145787 27231.5495
43660 2001-07-01 5 Southeast PO18850127500 1716.1794
43667 2001-07-01 3 Central PO15428132599 8095.7863
...
74193 2004-07-02 5 Southeast NULL 43.0729
74548 2004-07-13 5 Southeast NULL 38.675
75103 2004-07-31 5 Southeast NULL 44.1779
(1223 rows total)
*/
And here is the (actual) execution plan for that query:

Original Query Plan

Notice that Parallelism is involved in this plan. The optimizer figured our query was so expensive that it decided it was cheaper to spread the burden of the query across the 4 processors in my system.

As far as how to attack the query, the optimizer put together a plan that starts by scanning the SalesTerritory table and Hash Joins it with the SalesOrderHeader table, resulting in an estimated 24,832 rows.

It also scans the Numbers table, first satisfying the predicate of

where Number<=convert(int,len(@TerritoryList))
Since the optimizer has no idea what is in @TerritoryList, it makes an inequality-based assumption that the predicate will have a selectivity of 30%, and therefore it estimates that it will produce 1,000,000 * 30% = 300,000 rows. In reality, it only produced 5 rows.

Then it applies a filter to those rows to pull the positions of the comma-delimited numbers in @TerritoryList… in other words a filter to satisfy the predicate of

where substring(','+@TerritoryList,Number,1)=','
It estimates that those 300,000 rows will filter down to 9486.83 (I have no idea how it arrived at this 3.16% selectivity figure). In reality, it only produced 3 rows. Finally, the optimizer plans on storing those rows into a Spool for repeated access by the Nested Loops operator.

Now the Nested Loops operator will match up each of the estimated 24,832 rows with each of the estimated 9486.83 rows from the spool and will find the ones that satisfy this JOIN predicate (which is implied by incorporating our inline table-valued function in our query):

convert(int
,rtrim(ltrim(substring(@TerritoryList
,Number
,charindex(',',@TerritoryList+',',Number)-Number
)
)
)
) = SalesTerritory.TerritoryID
Since that’s an equality predicate, it assumes 10% selectivity, so it comes up with a final estimated result of 24,832 * 9486.83 * 10% = 23,557,700 rows for the final result. Wow! In reality, the Nested Loops will match 31,465 rows with the 3 rows in the spool (producing 94,395 matches) and end up only finding 1223 that satisfy the JOIN predicate (a selectivity of only 1.3%).

The bottom line here is that the optimizer got scared by that million-row table of Numbers and grossly overestimated how many values it would pull from that table and so put together a plan to most efficiently handle a final product of over 23 million rows. That’s why it started by processing the 31,465-row SalesOrderHeader table and spooled the estimated 9486.83 rows from the Numbers table that it would match up with each of those.

But we know that, in reality, our @TerritoryList will only consist of a small handful of comma-delimited values, so the plan should ideally start with the Numbers table, producing the positions of the values in @TerritoryList and, for each of those values, look them up in the SalesTerritory and then find the rows in SalesOrderHeader with that TerritoryID.

In other words, it should process the plan in exactly the order that I specified in the query:

from dbo.ufn_SplitIntArray(@TerritoryList,',') a
join Sales.SalesTerritory t on a.Item=t.TerritoryID
join Sales.SalesOrderHeader h on t.TerritoryID=h.TerritoryID
Well, there’s a hint for that! If we add an OPTION (FORCE ORDER) to our query, we end up with a completely new (actual) execution plan:

Query Plan With OPTION (FORCE ORDER) Hint

And here are the statistics that Profiler gives us in comparing the two plans:

/*
Description CPU Reads Duration QueryCost Relative%
----------------------------------------------------------------
No Hint Provided 1499ms 63,783 854ms 1167.11 27%
FORCE ORDER 141ms 3,660 314ms 3149.68 73%
*/
Nothing changed in what the optimizer estimates… it still thinks the query will produce 23 million rows. And it obviously thinks we’re completely insane to do a FORCE ORDER, because it estimates that the cost will almost triple by doing it this way. But you and I know better, and by introducing that hint, we decreased the CPU and Reads dramatically.

But this new plan still has Parallelism in it. The optimizer thought we needed it because it thinks the cost is so ridiculous that it will be cheaper to split up the query among the 4 processors in my computer. And, interestingly enough, it figures that by doing that split, it can introduce Sort operators to order the two main streams by TerritoryID and do a Merge Join, and then turn around and do a Sort on SalesOrderID for the final result.

Well, we know that our final result is not going to be that large… it’s certainly not going to be anywhere even close to 23 million rows! So let’s just get rid of the Parallelism. We’ll add one additional hint (MAXDOP 1) to our query in order to force it to only use one processor:

option (force order, maxdop 1)
Here’s the (actual) execution plan that results from that hint:

Query Plan With OPTION (FORCE ORDER, MAXDOP 1) Hint

See how simple that new plan is? Now how does it compare to the previous two plans?

/*
Description CPU Reads Duration QueryCost Relative%
--------------------------------------------------------------------
No Hint Provided 1499ms 63,783 854ms 1167.11 12%
FORCE ORDER 141ms 3,660 314ms 3149.68 31%
FORCE ORDER, MAXDOP 1 31ms 741 175ms 5788.61 57%
*/
The optimizer thinks we’ve completely lost our mind, since it estimates the Cost is 5788.61… over 5 times more expensive than the plan it would have produced without hints. It’s probably snickering… laughing behind our backs and thinking, “Your stupid plan with hints will take over an hour and a half to run, but if you let me do my job, I can do it all in only 20 minutes!”

But that couldn’t be further from the truth. By introducing the hints, we reduced the CPU by 98% and reduced the Reads by 99%.

One More LOOPy Enhancement

But wait! We can even do better!

The plan still involves a Clustered Index Scan of the entire SalesOrderHeader table. If we can create an index on TerritoryID and INCLUDE the other columns required by the query, then we could SEEK into that index by TerritoryID rather than SCANning it.

So let’s create an index… Note that the TotalDue column in SalesOrderHeader is actually a computed column based on SubTotal and TaxAmt and Freight, so we have to INCLUDE those columns:

create index ix_Terr_iOrdDt_iPO_iTDue 
on Sales.SalesOrderHeader
(TerritoryID)
include (OrderDate,PurchaseOrderNumber,SubTotal,TaxAmt,Freight)
Now let’s run our query (with our two hints) and see if that index made a difference in the plan:

Query Plan With OPTION (FORCE ORDER, MAXDOP 1) Hint And New Index

Hmmm… the only thing that changed was that it was SCANning our new index rather than SCANning the Clustered Index. That improves the query somewhat, decreasing the number of reads, because our new index is smaller in terms of number of pages:

/*
Description CPU Reads Duration
-----------------------------------------------
No Hint Provided 1499ms 63,783 854ms
FORCE ORDER 141ms 3,660 314ms
FORCE ORDER, MAXDOP 1 31ms 741 175ms
Add New Index 31ms 232 172ms
*/
But we created that index to take advantage of the TerritoryID. We wanted it to SEEK into the index rather than SCAN the entire thing.

So another hint to the rescue!

This time we will tell the optimizer that we only want it to use LOOP JOINs. So here’s our new query with all the hints involved:

declare @TerritoryList varchar(max) = '2,3,5'
select h.SalesOrderID
,h.OrderDate
,t.TerritoryID
,TerritoryName=t.Name
,h.PurchaseOrderNumber
,h.TotalDue
from dbo.ufn_SplitIntArray(@TerritoryList,',') a
join Sales.SalesTerritory t on a.Item=t.TerritoryID
join Sales.SalesOrderHeader h on t.TerritoryID=h.TerritoryID
order by h.SalesOrderID
option (force order, maxdop 1, loop join)
And here’s the new (actual) execution plan that it produces:

Query Plan With OPTION (FORCE ORDER, MAXDOP 1, LOOP JOIN) Hint

Note that we now have a Nested Loops Join that now SEEKs into our new index. And the results?

/*
Description CPU Reads Duration
-----------------------------------------------
No Hint Provided 1499ms 63,783 854ms
FORCE ORDER 141ms 3,660 314ms
FORCE ORDER, MAXDOP 1 31ms 741 175ms
Add New Index 31ms 232 172ms
Add LOOP JOIN Hint 16ms 69 144ms
*/
Hah! So we’ve ultimately decreased the Reads by 99.9% from 63,783 down to a measly 69. That’s the way it should be with a straightforward query like this one.

Thank goodness for the existence of hints. They’re one more effective tool we can use to beat the optimizer into submission… er, persuade the optimizer to form a plan that we know is best for our query.