Wednesday, December 23, 2009

It's a Small World After All

I don't have any heavy in-depth T-SQL code or analysis today…

Instead, let’s just sit back and take a quick break from the hectic holiday season…

Sip some hot cocoa or cider…

And have a little chuckle over a couple “Small World” incidents…

The late 1990’s were kind of a busy whirlwind blur for me… I worked at home with 3 young children running around; I was doing a lot of preventative Y2K work; I was a Visual FoxPro MVP frequenting the old CompuServe FoxForum and writing articles for a couple of VFP publications; and I was doing lots of short-term trading in the stock market. (Ahhh… those were the days… check out the 1100% advance in InfoSpace in less than 5 months during 1999/2000).

[Chart: InfoSpace (INSP) stock price, October 1999 through March 2000]

Yes, it was a busy time, and that’s probably why I can’t remember writing a book on SQL Server 7 back in 1999 with none other than Buck Woody (known then, more formally, as Greg Woody):

[Image: cover of the SQL Server 7.0 book]

I stumbled upon this book the other day at Amazon quite by mistake.

Hmmm… I guess I’d better update my résumé.

It’s amazing the kind of wild stuff you can find out about yourself on the web. For example, about a year ago, SQL Server MVP Kalen Delaney was shocked to discover that she had somehow written a book on Oracle… Kalen may be interested to learn that this book is now available at Amazon in Spanish!

One other funny Google story…

Several years ago, I decided (for whatever reason) to Google my young son’s name. What I found came as quite a surprise. I found a guy named Qxbr Schulz (name disguised) who was a self-employed software developer who had a son named Brad. Well, I was a guy named Brad Schulz who was a self-employed software developer who had a son named Qxbr. And the really weird thing is that we lived only 3 miles from each other!

Yes, as the song goes: “It’s a small world after all!”

Anyway, he and I met soon thereafter at a Fourth-of-July block party. And no, we’re not related in any way. But he and I have a little chuckle every time we run into each other in town.

Best wishes for the holidays, everybody…

Saturday, December 12, 2009

Delimited String Tennis Anyone?

SQL Server MVP Adam Haines and I have been going back and forth for months in our blogs on the subject of creating and splitting a delimited string… it’s been kind of a virtual T-SQL Tennis match… not a competition, but rather a friendly rally, knocking the subject back and forth, practicing our skills. Between the two of us, we’ve just about beaten the subject to death; however, I like to think that our collective research has been a win-win situation for the SQL Server community.

(If you’d like to look at these past articles, read Adam’s posts here, here, here, and here and read my posts and comments here, here, here, and here.)

In Adam’s latest volley a few days ago (which you can read here), he did some in-depth time tests on various methods of splitting a delimited string… the XML approach using XPath and the .nodes() and .value() methods, the permanent Numbers table approach, and the approach using a TVF with a virtual table of Numbers.
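
(In case you haven’t seen it before, here’s a minimal sketch of the Numbers-table flavor of the split, just for reference… it’s one common formulation, not necessarily the exact code Adam tested, and it assumes a permanent dbo.Numbers table of sequential integers starting at 1 with at least as many rows as the longest string you plan to split; the table and column names here are purely illustrative.)

/* 
A minimal sketch of the Numbers-table split (illustrative only).
Assumes dbo.Numbers(Number) holds sequential integers 1,2,3,...
with at least len(@CommaList) rows.
*/
declare @CommaList varchar(max)
set @CommaList='ABC,DEF,GHI'

select Item=substring(@CommaList
                     ,Number
                     ,charindex(',',@CommaList+',',Number)-Number)
from dbo.Numbers
where Number<=len(@CommaList)
  and substring(','+@CommaList,Number,1)=',' /* Number marks where an item starts */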

His verdict was that the XML approach was a biiiiig disappointment when handling a comma-delimited string of many items, in one case performing over 200 times slower (!!) than the Numbers table approaches. Like Adam, I was a bit crestfallen. The XML method is the usual method suggested on the T-SQL forums for doing this sort of thing.

It is literally shocking how badly it performs.

Take a look at the following example, in which I create a variable that contains a comma-delimited list of 5000 items and then use the XML approach to shred the list into a result set of 5000 rows:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)) a
cross apply XMLList.nodes('i') x(i)

The task that we’re asking SQL Server to do is not that complicated, but do you want to guess how long this takes to fully execute on my system? Are you sitting down?

It takes over 6 minutes!

Really disappointing…

I had a lot of trouble accepting this. After doing several experiments, I finally figured out what was going on.

The problem lies in the derived table in the FROM clause, where we create a derived XML column via the CAST function.

It seems we are not really creating a complete column or value at all. Apparently we are inadvertently creating an XML stream, which sends small chunks of data to the .nodes() method and then the chunks get handed over to the .value() method to come up with our result… one chunk at a time.

On my system, when I execute the above code, I see nothing at all happen for almost 50 seconds. Then the first 676 rows get spit out into the results window. After another 50 seconds or so, another similar-sized chunk of rows gets spit out. This happens like clockwork until it spits out the final 5000th row, over 6 minutes after I started the query.

You may recall a previous post of mine in which I said that an XML stream is processed in 2033-byte chunks. Well, 676 rows of 3 bytes (‘ABC’) each equals 2028 bytes. (I’m assuming that there’s a tiny bit of overhead involved, and that’s why we can’t squeeze in a 677th row.) My comma-delimited string consists of 5000 items at 3 bytes each, coming to 15000 bytes. Divide that by 2033 bytes and you get 7.38, which is the number of chunks of data that get processed. Multiply that by 50 seconds per chunk, and you get… voilà… 6.15 minutes.
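
(For the record, here’s that same back-of-the-envelope math done in T-SQL… the 2033-byte chunk size and the ~50 seconds per chunk are observed figures from my tests above, not documented constants:)

/* Sanity-checking the streaming math described above */
select BytesPerChunk    = 676*3              /* 2028 bytes... just under 2033         */
      ,TotalBytes       = 5000*3             /* 15000 bytes of list items             */
      ,NumChunks        = 15000.0/2033       /* about 7.38 chunks                     */
      ,EstimatedMinutes = 15000.0/2033*50/60 /* about 6.15 minutes... sound familiar? */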

Well, this must all be really exciting for you math nerds out there, but how does it help us?

It does help to explain what’s going on, and it does show us that streaming the XML is reeeeeaaalllly sloooooow.

So is there hope for the XML method?

YES! YES! YES!

This slow streaming only occurs when we are creating the XML to be processed via a derived table or CTE. What if we instead transfer our comma-delimited string into an actual XML variable and then process that variable, like so?:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

declare @XMLList xml
set
@XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)

select x.i.value('.','varchar(3)')
from @XMLList.nodes('i') x(i)

Believe it or not, this takes only 0.174 seconds!
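
(By the way, if you’d like to reproduce these timings on your own system, one simple approach is to bracket the query with a getdate() delta, as sketched below… or just use SET STATISTICS TIME. Your numbers will vary with hardware and load, of course.)

/* A simple way to time either version on your own system */
declare @Start datetime
set @Start=getdate()

/* ...put either list-splitting query from above here... */

select ElapsedMs=datediff(ms,@Start,getdate())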

In comparing the two pieces of code outlined in this article, here are the results:

/*
                Derived      XML
                Table        Variable     Percent
Measurement     Method       Method       Improvement
-----------------------------------------------------
CPU             386,172      125          2989%
Duration(ms)    368,747      174          2019%
*/
So I’m happy to report that all is not lost for the XML method. It’s still a contender in the list-splitting competition, giving the Numbers table methods a run for their money… as long as it is implemented in a certain way.
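
(If you’d like to package the fast variable-based technique up for reuse, here’s a rough sketch of it wrapped in a multi-statement TVF… I use a multi-statement function because an inline TVF can’t assign to an xml variable. The function name is just illustrative, it assumes the list items contain no characters that need XML-escaping (like < or &), and I haven’t re-run the full timing suite on this packaged form, so test it before trusting it:)

create function dbo.SplitCSV_XML (@CommaList varchar(max))
returns @Items table (Item varchar(100))
as
begin
  /* Assign to an xml VARIABLE first so we get the fast (non-streaming) behavior */
  declare @XMLList xml
  set @XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)
  insert @Items (Item)
    select x.i.value('.','varchar(100)')
    from @XMLList.nodes('i') x(i)
  return
end
go
select Item from dbo.SplitCSV_XML('ABC,DEF,GHI')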

Back to you, Adam…

UPDATE on Jan 6, 2010: Please read the follow-up to this article, in which I talk about some additional methods to speed up the process even further.

Tuesday, December 8, 2009

Friday the 13th 2009: Cluster of Terror

This blog entry is participating in the very first T-SQL Tuesday, hosted this month by Adam Machanic. You are invited to visit his blog to join this rollicking blog party and read more blogs participating in this month’s theme: Date/Time Tricks.

Approximately 715,298 articles have been written over the past 20 years or so regarding the handling of dates and times in T-SQL. So, instead of fruitlessly trying to come up with an earth-shattering new clever way of handling dates that no one’s ever seen before, I just thought I’d have some fun with some analysis of a very specific date that we all know and fear… er, love.

By the way, before we proceed, as long as we’re talking about date-related stuff, I should remind you that the deadline for TSQL Challenge #18 (Generate Text-Formatted Month Calendars) is on December 14th, so check it out and give it a try.

Okay, on with our fun stuff…



Is it just my imagination, or have we experienced a bunch of Friday the 13th’s lately?

For those of you who are not up-to-speed on superstitions, the number 13 is an unlucky number in many cultures, and, similarly, Friday is an unlucky day of the week, and so Friday the 13th is double bad luck. People who have a genuine fear of Friday the 13th suffer from paraskevidekatriaphobia, which is a concoction of Greek words for Friday (Paraskeví), thirteen (dekatreís), and fear (phobía).

Let’s do a little investigating into Friday the 13th. First, we’ll build a Calendar table of every single date in the 1247-year range of January 1, 1753 (the minimum date of the T-SQL datetime datatype) to December 31, 2999 (just a reasonable stopping point). That comes out to 455,457 dates, but our friend T-SQL can create that table in less than a second:

if object_id('tempdb..#Calendar') is not null drop table #Calendar
go
with L0(N) as (select 0 union all select 0 union all select 0)  /* 3 Rows */
    ,L1(N) as (select 0 from L0 A cross join L0 B)              /* 9 Rows (3x3) */
    ,L2(N) as (select 0 from L1 A cross join L1 B)              /* 81 Rows (9x9) */
    ,L3(N) as (select 0 from L2 A cross join L2 B)              /* 6561 Rows (81x81) */
    ,L4(N) as (select 0 from L3 A cross join L3 B)              /* 43,046,721 Rows */
    ,Numbers(Num) as (select row_number() over (order by (select 0)) from L4)
select [Date]
,[Year]=year([Date])
,[Month]=month([Date])
,[Day]=day([Date])
,DayOfWeek=datename(weekday,[Date])
into #Calendar
from Numbers
cross apply (select [Date]=convert(datetime,dateadd(day,Num-1,'17530101'))) F1
where Num between 1 and datediff(day,'17530101','29991231')+1

Now we can check out the 13th and see how often it falls on the 7 days of the week:

select DayOfWeek
,NumOccurrences=count(*)
from #Calendar
where [Day]=13
group by DayOfWeek
order by NumOccurrences desc
/*
DayOfWeek NumOccurrences
--------- --------------
Friday    2146
Wednesday 2142
Sunday    2142
Tuesday   2137
Monday    2133
Saturday  2132
Thursday  2132
*/
Hmmm… interesting. The 13th occurs on Friday more than any other day of the week. That’s kind of scary, don’t you think?

In fact, as you can see below, Friday the 13th is in the family of most frequently occurring dates in history (and the future):

with HighestOccurrences as
(
select top 1 with ties DayOfWeek
,[Day]
from #Calendar
group by DayOfWeek,[Day]
order by count(*) desc
)
select [MostFrequentlyOccurringDay]
         =DayOfWeek
         +' the '
         +ltrim(str([Day]))
         +case when [Day] in (1,21,31) then 'st'
               when [Day] in (2,22) then 'nd'
               when [Day] in (3,23) then 'rd'
               else 'th'
          end
from HighestOccurrences
order by [Day],DayOfWeek
/*
MostFrequentlyOccurringDay
--------------------------
Sunday the 1st
Monday the 2nd
Tuesday the 3rd
...
Wednesday the 11th
Thursday the 12th
Friday the 13th
Saturday the 14th
Sunday the 15th
...
Thursday the 26th
Friday the 27th
Saturday the 28th
*/
Of course, logically, we can see that this also means that Thursday the 12th and Saturday the 14th and so on down to Sunday the 1st and up to Saturday the 28th occur the most frequently, but still… it’s kind of disconcerting that the most-feared day of all occurs the most often.
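
(A quick sanity check against our #Calendar table bears this out… both of the day/weekday pairs below should come back with the very same count of 2146:)

select DayOfWeek
,[Day]
,NumOccurrences=count(*)
from #Calendar
where (DayOfWeek='Thursday' and [Day]=12)
or (DayOfWeek='Friday' and [Day]=13)
group by DayOfWeek,[Day]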

Let’s build a table of just the Friday the 13th dates:

if object_id('tempdb..#Friday13s') is not null drop table #Friday13s
go
select [Date]
,[Year]
,[Month]
into #Friday13s
from #Calendar
where DayOfWeek='Friday' and [Day]=13

That table is now populated with all the Friday-the-13th dates from 1753 to 2999… There are 2146 of them.

Wait a minute…

If you add the digits of 2146, you get 2 + 1 + 4 + 6 = 13.

Now that’s an interesting (and somewhat unsettling) coincidence, isn’t it? Ha-ha-ha.

Ahem… Oh well, never mind… Let’s move on…

Using that table, we can see that a Friday the 13th occurs every single year without exception:

select Result=case
when (select count(distinct [Year]) from #Friday13s)=2999-1753+1
then 'Yep... it occurs every single year'
else 'Nope... there are some years when it doesn''t happen'
end
/*
Result
----------------------------------
Yep... it occurs every single year
*/
Apparently there’s no escape!

And here’s an interesting tidbit: Every century (except the 1700s because they are only partially represented in our table) contains the exact same number of Friday the 13th’s:

select Century
,NumOccurrences=count(*)
from #Friday13s
cross apply (select Century=str([Year]/100*100,4)+'s') F1
where [Year]>=1800
group by Century
order by Century
/*
Century NumOccurrences
------- --------------
1800s   172
1900s   172
2000s   172
2100s   172
2200s   172
2300s   172
2400s   172
2500s   172
2600s   172
2700s   172
2800s   172
2900s   172
*/
Fascinating…

Now I mentioned earlier that I thought that we had experienced a bunch of Friday the 13th’s lately. Was I right? How many did we have this year (2009)?:

select [Date]
from #Friday13s
where [Year]=2009
order by [Date]
/*
Date
-----------------------
2009-02-13 00:00:00.000
2009-03-13 00:00:00.000
2009-11-13 00:00:00.000
*/
Three of them! A Triple! We just had one last month in November, and we had two months in a row with February and March! Omigosh!

Now that makes me wonder… What years contain the most occurrences of Friday the 13th?:

with HighestOccurrences as
(
select top 1 with ties [Year]
,NumOccurrences=count(*)
from #Friday13s
group by [Year]
order by NumOccurrences desc
)
select [Year]
,NumOccurrences
from HighestOccurrences
order by [Year]
/*
Year NumOccurrences
---- --------------
1761 3
1764 3
1767 3
...
1931 3
1942 3
1953 3
1956 3
1959 3
1970 3
1981 3
1984 3
1987 3
1998 3
2009 3
2012 3
2015 3
2026 3
2037 3
2040 3
2043 3
...
2989 3
2992 3
2995 3
*/
Yikes! Sure enough… Three times is the most it can occur in a year, and this year (2009) is one of those! And we haven’t experienced a Triple like that since 1998!

Wait a minute… Look at that list again. It looks like we’re going to experience another Triple in just 3 short years… in 2012! And wait! We’ll have another Triple just 3 years after that… in 2015!

Whoa! I thought we had experienced many Friday the 13th’s lately, but it looks like we are only at the beginning of a really scary cluster of them! It’s a Triple-Triple-Triple (3-3-3) Cluster! Three Friday the 13th’s each occurring in three different years that are three years apart from each other.

This calls for more investigation. Let’s look at all the years that have a Triple… and then cluster them chronologically in groups of three… and then find the Clusters that span the least amount of time (i.e. the tightest, densest, scariest Clusters):

with TripleFri13Years as
(
select [Year]
,RowNum=row_number() over (order by [Year])
from #Friday13s
group by [Year]
having count(*)=3
)
,TripleFri13YearClusters as
(
select FirstTriple=A.[Year]
,SecondTriple=B.[Year]
,ThirdTriple=C.[Year]
,YearSpan
,YearSpanRank=rank() over (order by YearSpan)
from TripleFri13Years A
join TripleFri13Years B on A.RowNum+1=B.RowNum
join TripleFri13Years C on B.RowNum+1=C.RowNum
cross apply (select YearSpan=C.[Year]-A.[Year]+1) F1
)
select FirstTriple
,SecondTriple
,ThirdTriple
,YearSpan
from TripleFri13YearClusters
where YearSpanRank=1 /* The Shortest Span */
order by FirstTriple
/*
FirstTriple SecondTriple ThirdTriple YearSpan
----------- ------------ ----------- --------
1761        1764         1767        7
1789        1792         1795        7
1801        1804         1807        7
1829        1832         1835        7
1857        1860         1863        7
1885        1888         1891        7
1925        1928         1931        7
1953        1956         1959        7
1981        1984         1987        7
2009        2012         2015        7
2037        2040         2043        7
2065        2068         2071        7
2093        2096         2099        7
...
2933        2936         2939        7
2961        2964         2967        7
2989        2992         2995        7
*/
Oh no!

It looks like the 3-3-3 Cluster is the shortest cluster time span possible, and therefore, we are now on the verge of the worst possible Triple Cluster of Triple Friday-the-13th Years: the dreaded 3-3-3 Cluster of 2009-2012-2015! The last time this kind of dense 3-3-3 Cluster happened was the 7-year span of 1981-1984-1987.

Wait… Let me think… Did anything terrible happen from 1981 to 1987? Well, AIDS was first reported in 1981. And… uh-oh… (GULP)… The H1N1 Swine Flu Pandemic started this very year in 2009!

OH… MY… GAWD! It can’t be a coincidence!

So, let me make sure I got this straight… it seems that 2009 has the highest possible occurrences of Friday the 13th’s and it’s the beginning of a dreaded 3-3-3 Cluster. This doesn’t look good.

Wait!… I’ve gotta check this out… We already saw that every century has the same amount of Friday the 13th’s, but what about 3-3-3 Clusters?:

with TripleFri13Years as
(
select [Year]
,RowNum=row_number() over (order by [Year])
from #Friday13s
group by [Year]
having count(*)=3
)
,TripleFri13YearClusters as
(
select FirstTriple=A.[Year]
,SecondTriple=B.[Year]
,ThirdTriple=C.[Year]
,YearSpan
,Ranking=rank() over (order by YearSpan)
from TripleFri13Years A
join TripleFri13Years B on A.RowNum+1=B.RowNum
join TripleFri13Years C on B.RowNum+1=C.RowNum
cross apply (select YearSpan=C.[Year]-A.[Year]+1) F1
)
select Century
,Num333Clusters=count(*)
from TripleFri13YearClusters
cross apply (select Century=str(FirstTriple/100*100,4)+'s') F1
where Ranking=1 and FirstTriple>=1800
group by Century
order by Century
/*
Century Num333Clusters
------- --------------
1800s   4
1900s   3
2000s   4
2100s   4
2200s   4
2300s   3
2400s   4
2500s   4
2600s   4
2700s   3
2800s   4
2900s   4
*/
This current century has the most possible 3-3-3 Clusters as well!… There are going to be 4 of them. And it… Hold on a minute… Four 3-3-3 Clusters… Uh-oh… 4 + 3 + 3 + 3 = 13!

NOOOOO! It’s a sign! It looks like 2009 is the beginning of the end… Maybe the Mayans and Columbia Pictures are wrong about 2012 being the end of the world. All indicators point to NOW.

I don’t know about you, but now I’m really getting panicky… In fact, I think that…

OH NO!… IT CAN’T BE!

Are you sitting down?

Look at what happens if you add the ASCII values of the characters of the ominous string “FRIDAY THIRTEENTH 333 CLUSTER”:

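/* 
master..spt_values with Type='P' is just a handy built-in table of
sequential numbers (0 through 2047 on my system)... it saves us from
needing a Numbers table for this little stunt
*/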
select SumOfAsciiChars=sum(ascii(substring(OminousString,Number,1)))
from (select OminousString='FRIDAY THIRTEENTH 333 CLUSTER') X
join master..spt_values on Type='P' and Number between 1 and len(OminousString)
/*
SumOfAsciiChars
---------------
2009
*/
It’s an omen! We’re doomed! MAKE IT STOP!! AAAAAIIIIIEEEEEEEE…





We regret to inform you that Mr. Schulz has recently been admitted to the Institute for the Calendrically Insane. His planned release is on Friday, August 13th, 2010.