Wednesday, December 23, 2009

It's a Small World After All

I don't have any heavy in-depth T-SQL code or analysis today…

Instead, let’s just sit back and take a quick break from the hectic holiday season…

Sip some hot cocoa or cider…

And have a little chuckle over a couple “Small World” incidents…

The late 1990’s were kind of a busy whirlwind blur for me… I worked at home with 3 young children running around; I was doing a lot of preventative Y2K work; I was a Visual FoxPro MVP frequenting the old CompuServe FoxForum and writing articles for a couple of VFP publications; and I was doing lots of short-term trading in the stock market. (Ahhh… those were the days… check out the 1100% advance in InfoSpace in less than 5 months during 1999/2000).

[Chart: INSP stock price, October 1999 through March 2000]

Yes, it was a busy time, and that’s probably why I can’t remember writing a book on SQL Server 7 back in 1999 with none other than Buck Woody (known then, more formally, as Greg Woody):

[Image: the SQL Server 7.0 book cover]

I stumbled upon this book the other day at Amazon quite by accident.

Hmmm… I guess I’d better update my résumé.

It’s amazing the kind of wild stuff you can find out about yourself on the web. For example, about a year ago, SQL Server MVP Kalen Delaney was shocked to discover that she had somehow written a book on Oracle… Kalen may be interested to discover that this book is now available at Amazon in Spanish!

One other funny Google story…

Several years ago, I decided (for whatever reason) to Google my young son’s name. What I found came as quite a surprise. I found a guy named Qxbr Schulz (name disguised) who was a self-employed software developer who had a son named Brad. Well, I was a guy named Brad Schulz who was a self-employed software developer who had a son named Qxbr. And the really weird thing is that we lived only 3 miles from each other!

Yes, as the song goes: "It's a small world after all!"

Anyway, he and I met soon thereafter at a Fourth-of-July block party. And no, we’re not related in any way. But he and I have a little chuckle every time we run into each other in town.

Best wishes for the holidays, everybody…

Saturday, December 12, 2009

Delimited String Tennis Anyone?

SQL Server MVP Adam Haines and I have been going back and forth for months in our blogs on the subject of creating and splitting a delimited string… it's been kind of a virtual T-SQL Tennis match… not a competition, but rather a friendly rally, knocking the subject back and forth, practicing our skills. Between the two of us, we've just about beaten the subject to death; however, I like to think that our collective research has been a win-win situation for the SQL Server community.

(If you’d like to look at these past articles, read Adam’s posts here, here, here, and here and read my posts and comments here, here, here, and here.)

In Adam's latest volley a few days ago (which you can read here), he did some in-depth time tests on various methods of splitting a delimited string… the XML approach using XPath and the .nodes() and .value() methods, the permanent Numbers table approach, and the approach using a TVF with a virtual table of Numbers.
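
(For reference, here's a minimal sketch of the Numbers-table flavor of the split… my own illustration rather than Adam's exact code. It assumes a hypothetical permanent dbo.Numbers table with a single integer column Num covering 1 up to at least the length of the string… about 20,000 rows for the string below:)

declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select Item=substring(@CommaList,Num,charindex(',',@CommaList+',',Num)-Num)
from dbo.Numbers
where Num<=len(@CommaList)
  and substring(','+@CommaList,Num,1)=','  /* Num marks the start of an item */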

His verdict was that the XML approach was a biiiiig disappointment when handling a comma-delimited string of many items, in one case performing over 200 times slower (!!) than the Numbers table approaches. Like Adam, I was a bit crestfallen. The XML method is the usual method suggested on the T-SQL forums for doing this sort of thing.

It is literally shocking how badly it performs.

Take a look at the following example, in which I create a variable that contains a comma-delimited list of 5000 items and then use the XML approach to shred the list into a result set of 5000 rows:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)) a
cross apply XMLList.nodes('i') x(i)
The task that we’re asking SQL Server to do is not that complicated, but do you want to guess how long this takes to fully execute on my system? Are you sitting down?

It takes over 6 minutes!

Really disappointing…

I had a lot of trouble accepting this. After doing several experiments, I finally figured out what was going on.

The problem lies in the derived table in the FROM clause, where we create a derived XML column via the CAST function.

It seems we are not really creating a complete column or value at all. Apparently we are inadvertently creating an XML stream, which sends small chunks of data to the .nodes() method and then the chunks get handed over to the .value() method to come up with our result… one chunk at a time.

On my system, when I execute the above code, I see nothing at all happen for almost 50 seconds. Then the first 676 rows get spit out into the results window. After another 50 seconds or so, another similar-sized chunk of rows gets spit out. This happens like clockwork until it spits out the final 5000th row, over 6 minutes after I started the query.

You may recall in a previous post of mine where I said that an XML stream is processed in 2033-byte chunks. Well, 676 rows of 3 bytes (‘ABC’) each equals 2028 bytes. (I’m assuming that there’s a little tiny bit of overhead involved and that’s why we can’t squeeze in the 677th row). My comma-delimited string consists of 5000 items at 3 bytes each, coming to 15000 bytes. Divide that by 2033 bytes and you get 7.38, which is the number of chunks of data that get processed. Multiply that by 50 seconds per chunk, and you get… voila… 6.15 minutes.
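
(If you'd like to sanity-check that arithmetic in T-SQL itself, here are my own back-of-the-envelope figures from the timings above:)

select ChunksProcessed=15000.0/2033          /* about 7.38 chunks */
      ,EstimatedMinutes=15000.0/2033*50/60   /* about 6.15 minutes at ~50 seconds per chunk */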

Well, this must be all really exciting for you math nerds out there, but how does it help us?

It does help to explain what’s going on, and it does show us that streaming the XML is reeeeeaaalllly sloooooow.

So is there hope for the XML method?

YES! YES! YES!

This slow streaming only occurs when we are creating the XML to be processed via a derived table or CTE. What if we instead transfer our comma-delimited string into an actual XML variable and then process that variable, like so:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

declare @XMLList xml
set @XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)

select x.i.value('.','varchar(3)')
from @XMLList.nodes('i') x(i)
Believe it or not, this takes only 0.174 seconds!

In comparing the two pieces of code outlined in this article, here are the results:

/*
                  Derived       XML
                    Table  Variable      Percent
Measurement        Method    Method  Improvement
------------------------------------------------
CPU               386,172       125        2989%
Duration(ms)      368,747       174        2019%
*/
So I’m happy to report that all is not lost for the XML method. It’s still a contender in the list-splitting competition, giving the Numbers table methods a run for their money… as long as it is implemented in a certain way.

Back to you, Adam…

UPDATE ON Jan06,2010: Please read the follow-up to this article in which I talk about some additional methods to speed the process up even further.

Tuesday, December 8, 2009

Friday the 13th 2009: Cluster of Terror

This blog entry is participating in the very first T-SQL Tuesday, hosted this month by Adam Machanic. You are invited to visit his blog to join this rollicking blog party and read more blogs participating in this month’s theme: Date/Time Tricks.

Approximately 715,298 articles have been written over the past 20 years or so regarding the handling of dates and times in T-SQL. So, instead of fruitlessly trying to come up with some earth-shattering new clever way of handling dates that no one's ever seen before, I just thought I'd have some fun with some analysis of a very specific date that we all know and fear… er, love.

By the way, before we proceed, as long as we’re talking about date-related stuff, I should remind you that the deadline for TSQL Challenge #18 (Generate Text-Formatted Month Calendars) is on December 14th, so check it out and give it a try.

Okay, on with our fun stuff…



Is it just my imagination, or have we experienced a bunch of Friday the 13th's lately?

For those of you who are not up-to-speed on superstitions, the number 13 is an unlucky number in many cultures, and, similarly, Friday is an unlucky day of the week, and so Friday the 13th is double bad luck. People who have a genuine fear of Friday the 13th suffer from paraskevidekatriaphobia, which is a concoction of Greek words for Friday (Paraskeví), thirteen (dekatreís), and fear (phobía).

Let’s do a little investigating into Friday the 13th. First, we’ll build a Calendar table of every single date in the 1247-year range of January 1, 1753 (the minimum date of the T-SQL datetime datatype) to December 31, 2999 (just a reasonable stopping point). That comes out to 455,457 dates, but our friend T-SQL can create that table in less than a second:

if object_id('tempdb..#Calendar') is not null drop table #Calendar
go
with L0(N) as (select 0 union all select 0 union all select 0) /* 3 Rows */
,L1(N) as (select 0 from L0 A cross join L0 B) /* 9 Rows (3x3) */
,L2(N) as (select 0 from L1 A cross join L1 B) /* 81 Rows (9x9) */
,L3(N) as (select 0 from L2 A cross join L2 B) /* 6561 Rows (81x81) */
,L4(N) as (select 0 from L3 A cross join L3 B) /* 43,046,721 Rows */
,Numbers(Num) as (select row_number() over (order by (select 0)) from L4)
select [Date]
,[Year]=year([Date])
,[Month]=month([Date])
,[Day]=day([Date])
,DayOfWeek=datename(weekday,[Date])
into #Calendar
from Numbers
cross apply (select [Date]=convert(datetime,dateadd(day,Num-1,'17530101'))) F1
where Num between 1 and datediff(day,'17530101','29991231')+1
Now we can check out the 13th and see how often it falls on each of the 7 days of the week:

select DayOfWeek
,NumOccurrences=count(*)
from #Calendar
where [Day]=13
group by DayOfWeek
order by NumOccurrences desc
/*
DayOfWeek NumOccurrences
--------- --------------
Friday              2146
Wednesday           2142
Sunday              2142
Tuesday             2137
Monday              2133
Saturday            2132
Thursday            2132
*/
Hmmm… interesting. The 13th occurs on Friday more than any other day of the week. That’s kind of scary, don’t you think?

In fact, as you can see below, Friday the 13th is in the family of most frequently occurring dates in history (and the future):

with HighestOccurrences as
(
select top 1 with ties
       DayOfWeek
      ,[Day]
from #Calendar
group by DayOfWeek,[Day]
order by count(*) desc
)
select [MostFrequentlyOccurringDay]=DayOfWeek
                                   +' the '
                                   +ltrim(str([Day]))
                                   +case
                                      when [Day] in (1,21,31) then 'st'
                                      when [Day] in (2,22) then 'nd'
                                      when [Day] in (3,23) then 'rd'
                                      else 'th'
                                    end
from HighestOccurrences
order by [Day],DayOfWeek
/*
MostFrequentlyOccurringDay
--------------------------
Sunday the 1st
Monday the 2nd
Tuesday the 3rd
...
Wednesday the 11th
Thursday the 12th
Friday the 13th
Saturday the 14th
Sunday the 15th
...
Thursday the 26th
Friday the 27th
Saturday the 28th
*/
Of course, logically, we can see that this also means that Thursday the 12th and Saturday the 14th and so on down to Sunday the 1st and up to Saturday the 28th occur the most frequently, but still… it’s kind of disconcerting that the most-feared day of all occurs the most often.

Let’s build a table of just the Friday the 13th dates:

if object_id('tempdb..#Friday13s') is not null drop table #Friday13s
go
select [Date]
,[Year]
,[Month]
into #Friday13s
from #Calendar
where DayOfWeek='Friday' and [Day]=13
That table is now populated with all the Friday-the-13th dates from 1753 to 2999… There are 2146 of them.

Wait a minute…

If you add the digits of 2146, you get 2 + 1 + 4 + 6 = 13.

Now that’s an interesting (and somewhat unsettling) coincidence, isn’t it? Ha-ha-ha.

Ahem… Oh well, never mind… Let’s move on…

Using that table, we can see that a Friday the 13th occurs every single year without exception:

select Result=case
when (select count(distinct [Year]) from #Friday13s)=2999-1753+1
then 'Yep... it occurs every single year'
else 'Nope... there are some years when it doesn''t happen'
end
/*
Result
----------------------------------
Yep... it occurs every single year
*/
Apparently there’s no escape!

And here’s an interesting tidbit: Every century (except the 1700s because they are only partially represented in our table) contains the exact same number of Friday the 13th’s:

select Century
,NumOccurrences=count(*)
from #Friday13s
cross apply (select Century=str([Year]/100*100,4)+'s') F1
where [Year]>=1800
group by Century
order by Century
/*
Century NumOccurrences
------- --------------
1800s 172
1900s 172
2000s 172
2100s 172
2200s 172
2300s 172
2400s 172
2500s 172
2600s 172
2700s 172
2800s 172
2900s 172
*/
Fascinating…
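
(A side note of my own: this uniformity is a little less spooky when you recall that the Gregorian calendar repeats exactly every 400 years… that's 146,097 days, which happens to be evenly divisible by 7, so day-of-week patterns recur in a fixed cycle. A quick check:)

select DaysIn400Years=datediff(day,'20000101','24000101')        /* 146097 */
      ,LeftoverDaysOfWeek=datediff(day,'20000101','24000101')%7  /* 0 ... a whole number of weeks */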

Now I mentioned earlier that I thought that we had experienced a bunch of Friday the 13th’s lately. Was I right? How many did we have this year (2009)?:

select [Date]
from #Friday13s
where [Year]=2009
order by [Date]
/*
Date
-----------------------
2009-02-13 00:00:00.000
2009-03-13 00:00:00.000
2009-11-13 00:00:00.000
*/
Three of them! A Triple! We just had one last month in November, and we had two months in a row with February and March! Omigosh!

Now that makes me wonder… What years contain the most occurrences of Friday the 13th?:

with HighestOccurrences as
(
select top 1 with ties
       [Year]
      ,NumOccurrences=count(*)
from #Friday13s
group by [Year]
order by NumOccurrences desc
)
select [Year]
,NumOccurrences
from HighestOccurrences
order by [Year]
/*
Year NumOccurrences
---- --------------
1761 3
1764 3
1767 3
...
1931 3
1942 3
1953 3
1956 3
1959 3
1970 3
1981 3
1984 3
1987 3
1998 3
2009 3
2012 3
2015 3
2026 3
2037 3
2040 3
2043 3
...
2989 3
2992 3
2995 3
*/
Yikes! Sure enough… Three times is the most it can occur in a year, and this year (2009) is one of those! And we haven’t experienced a Triple like that since 1998!

Wait a minute… Look at that list again. It looks like we’re going to experience another Triple in just 3 short years… in 2012! And wait! We’ll have another Triple just 3 years after that… in 2015!

Whoa! I thought we had experienced many Friday the 13th’s lately, but it looks like we are only at the beginning of a really scary cluster of them! It’s a Triple-Triple-Triple (3-3-3) Cluster! Three Friday the 13th’s each occurring in three different years that are three years apart from each other.

This calls for more investigation. Let’s look at all the years that have a Triple… and then cluster them chronologically in groups of three… and then find the Clusters that span the least amount of time (i.e. the tightest, densest, scariest Clusters):

with TripleFri13Years as
(
select [Year]
,RowNum=row_number() over (order by [Year])
from #Friday13s
group by [Year]
having count(*)=3
)
,
TripleFri13YearClusters as
(
select FirstTriple=A.[Year]
,SecondTriple=B.[Year]
,ThirdTriple=C.[Year]
,YearSpan
,YearSpanRank=rank() over (order by YearSpan)
from TripleFri13Years A
join TripleFri13Years B on A.RowNum+1=B.RowNum
join TripleFri13Years C on B.RowNum+1=C.RowNum
cross apply (select YearSpan=C.[Year]-A.[Year]+1) F1
)
select FirstTriple
,SecondTriple
,ThirdTriple
,YearSpan
from TripleFri13YearClusters
where YearSpanRank=1 /* The Shortest Span */
order by FirstTriple
/*
FirstTriple SecondTriple ThirdTriple YearSpan
----------- ------------ ----------- --------
1761 1764 1767 7
1789 1792 1795 7
1801 1804 1807 7
1829 1832 1835 7
1857 1860 1863 7
1885 1888 1891 7
1925 1928 1931 7
1953 1956 1959 7
1981 1984 1987 7
2009 2012 2015 7
2037 2040 2043 7
2065 2068 2071 7
2093 2096 2099 7
...
2933 2936 2939 7
2961 2964 2967 7
2989 2992 2995 7
*/
Oh no!

It looks like the 3-3-3 Cluster is the shortest cluster time span possible, and therefore, we are now on the verge of the worst possible Triple Cluster of Triple Friday-the-13th Years! The dreaded 3-3-3 cluster of 2009-2012-2015! The last time this kind of dense 3-3-3 cluster happened was the 7 year span of 1981-1984-1987.

Wait… Let me think… Did anything terrible happen from 1981 to 1987? Well, AIDS was first reported in 1981. And… uh-oh… (GULP)… The H1N1 Swine Flu Pandemic started this very year in 2009!

OH… MY… GAWD! It can’t be a coincidence!

So, let me make sure I've got this straight… it seems that 2009 has the highest possible number of Friday the 13th's a single year can have, and it's the beginning of a dreaded 3-3-3 Cluster. This doesn't look good.

Wait!… I’ve gotta check this out… We already saw that every century has the same amount of Friday the 13th’s, but what about 3-3-3 Clusters?:

with TripleFri13Years as
(
select [Year]
,RowNum=row_number() over (order by [Year])
from #Friday13s
group by [Year]
having count(*)=3
)
,
TripleFri13YearClusters as
(
select FirstTriple=A.[Year]
,SecondTriple=B.[Year]
,ThirdTriple=C.[Year]
,YearSpan
,Ranking=rank() over (order by YearSpan)
from TripleFri13Years A
join TripleFri13Years B on A.RowNum+1=B.RowNum
join TripleFri13Years C on B.RowNum+1=C.RowNum
cross apply (select YearSpan=C.[Year]-A.[Year]+1) F1
)
select Century
,Num333Clusters=count(*)
from TripleFri13YearClusters
cross apply (select Century=str(FirstTriple/100*100,4)+'s') F1
where Ranking=1 and FirstTriple>=1800
group by Century
order by Century
/*
Century Num333Clusters
------- --------------
1800s 4
1900s 3
2000s 4
2100s 4
2200s 4
2300s 3
2400s 4
2500s 4
2600s 4
2700s 3
2800s 4
2900s 4
*/
This current century has the most possible 3-3-3 Clusters as well!… There are going to be 4 of them. And it… Hold on a minute… Four 3-3-3 Clusters… Uh-oh… 4 + 3 + 3 + 3 = 13!

NOOOOO! It’s a sign! It looks like 2009 is the beginning of the end… Maybe the Mayans and Columbia Pictures are wrong about 2012 being the end of the world. All indicators point to NOW.

I don’t know about you, but now I’m really getting panicky… In fact, I think that…

OH NO!… IT CAN’T BE!

Are you sitting down?

Look at what happens if you add the ASCII values of the characters of the ominous string “FRIDAY THIRTEENTH 333 CLUSTER”:

select SumOfAsciiChars=sum(ascii(substring(OminousString,Number,1)))
from (select OminousString='FRIDAY THIRTEENTH 333 CLUSTER') X
join master..spt_values on Type='P' and Number between 1 and len(OminousString)
/*
SumOfAsciiChars
---------------
2009
*/
It’s an omen! We’re doomed! MAKE IT STOP!! AAAAAIIIIIEEEEEEEE…





We regret to inform you that Mr Schulz has recently been admitted to the Institute for the Calendrically Insane. His planned release is on Friday, August 13th, 2010.

Friday, November 20, 2009

A Loan At Last!

Got your eye on a flashy new car?

Or perhaps you’re considering a new home?

If you don’t have the cash to pay for either of these, you’ll need to get a loan.

And before you jump in head-first, you'll want to know how much you'll be paying each month, and perhaps you'll also want to know the obscene amount of interest you'll end up paying over the life of the loan.

Sure, there are truckloads of loan calculators out there on the web… They’re a dime a dozen.

But who needs those when you have the raw power of SQL Server at your fingertips?

This blog entry will demonstrate how to create a loan payment schedule based on a fixed interest rate with compounded interest, and, as always, we’ll go a little bit beyond that to explore some other things.

The monthly payment amount, c, is expressed by the following formula:

    c = (P × r) / (1 − (1 + r)^(−M))

where P is the loan amount, M is the number of months of the loan, and r is the monthly interest rate expressed as a decimal.

The balance of the loan after payment number N is expressed by the following formula:

    B(N) = P × (1 + r)^N − ((1 + r)^N − 1) × c / r
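
(A quick sanity check of my own for both formulas, plugging in the numbers from the car-loan example coming up below… $20,000 at 6.8% annual interest for 60 months… which should reproduce the $394.14 payment and the 19,719.19 first-month balance shown in the schedule:)

declare @P float,@R float,@M int,@C float
select @P=20000          /* loan amount */
      ,@R=6.8/12/100     /* monthly rate as a decimal */
      ,@M=60             /* number of months */
set @C=round((@P*@R)/(1-power(1+@R,-@M)),2)
select MonthlyPayment=@C                                  /* 394.14 */
      ,BalanceAfterPmt1=round(@P*power(1+@R,1)
                             -(power(1+@R,1)-1)*@C/@R,2)  /* 19719.19 */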

With all this in mind, here is a stored procedure to produce a payment schedule:

use tempdb;
go

if object_id('usp_LoanSchedule') is not null drop procedure usp_LoanSchedule;
go

create procedure usp_LoanSchedule
      @LoanAmount numeric(10,2)
     ,@AnnualRate numeric(10,8) /* Expressed as percent */
     ,@NumberOfMonths int
     ,@StartDate datetime
as
with
InputVariables as
(
select P,M,R
,C=round((P*R)/(1-power(1+R,-M)),2)
from (select P=@LoanAmount
,M=@NumberOfMonths
,R=@AnnualRate/12/100) InputData
)
,
MonthlyPayments(PmtNo
,PmtDate
,Balance
,Principle
,Interest
,CumulPrinciple
,CumulInterest) as
(
select N
,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
,cast(NewBalance as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
from InputVariables
cross apply (select N=1) CalcPmtNo
cross apply (select NewBalance=round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)) CalcNewBalance
union all
select N
,dateadd(month,1,mp.PmtDate)
,cast(NewBalance as numeric(10,2))
,cast(mp.Balance-NewBalance as numeric(10,2))
,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
from MonthlyPayments mp
cross join InputVariables
cross apply (select N=mp.PmtNo+1) CalcPmtNo
cross apply (select NewBalance=case
when N=M
then 0.00 /* Last Payment */
else round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)
end) CalcNewBalance
where N<=M
)
select PmtNo
,PmtMonth=datename(month,PmtDate)+str(year(PmtDate),5)
,PmtAmount=Principle+Interest
,Principle
,Interest
,CumulPrinciple
,CumulInterest
,Balance
from MonthlyPayments
order by PmtNo
option (maxrecursion 1000)
;
If we want to buy a $20,000 car at 6.8% (annual) interest with a 5-year term (60 months) with our first payment in December 2009, then we can call it like so:

exec usp_LoanSchedule @LoanAmount     = 20000
                     ,@AnnualRate     = 6.8
                     ,@NumberOfMonths = 60
                     ,@StartDate      = '20091201'
/*
PmtNo PmtMonth       PmtAmount Principle Interest CumulPrinciple CumulInterest  Balance
----- -------------- --------- --------- -------- -------------- ------------- --------
    1 December 2009     394.14    280.81   113.33         280.81        113.33 19719.19
    2 January 2010      394.14    282.39   111.75         563.20        225.08 19436.80
    3 February 2010     394.14    284.00   110.14         847.20        335.22 19152.80
    4 March 2010        394.14    285.61   108.53        1132.81        443.75 18867.19
  ...            ...       ...       ...      ...            ...           ...      ...
   57 August 2014       394.14    385.33     8.81       18830.89       3635.09  1169.11
   58 September 2014    394.14    387.52     6.62       19218.41       3641.71   781.59
   59 October 2014      394.14    389.71     4.43       19608.12       3646.14   391.88
   60 November 2014     394.14    391.88     2.26       20000.00       3648.40     0.00
*/
Let’s go over how the logic works.

The procedure essentially consists of a single SELECT statement, which is made up of two CTE’s (Common Table Expressions).

The first CTE is called InputVariables, and it just produces a small one-row table of the variables that we need for the calculations… in other words, our friends P, M, r, and c. Note how r is calculated by dividing the @AnnualRate parameter by 12 (to get a monthly rate) and then by 100 (to represent the rate as a decimal). Also note how c (the monthly payment amount) was calculated based on the other 3 variables.

I could have just created local variables (@P, @M, @R, and @C) instead of setting them up in a CTE, but I had two reasons for not doing that. First of all, I didn’t want all those pesky at-signs (@) to be all throughout the code and second, I wanted to encapsulate the entire creation of the loan schedule table in a single SELECT statement (I’ll explain why later).

The second CTE, called MonthlyPayments, is a recursive one. The first part of the recursive CTE (before the UNION ALL) is the anchor. It produces the data for the very first payment (note how I also adjust the first payment date to the first of the month). The second part (the recursive part) of the CTE builds upon that, creating the second payment, then the third, and so on, up until the last payment. It will continue tacking on payments as long as its WHERE clause (WHERE N<=M) is true… in other words, as long as the Payment Number (N) is less than or equal to the number of months of the loan (M). Once the WHERE clause evaluates to false (when N>M), then it will stop.
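
(If recursive CTEs are new to you, here's the anchor-plus-recursive-part pattern stripped down to a toy example of my own… it just counts off payment numbers 1 through 5:)

with Pmts(N) as
(
select 1                           /* anchor: the first payment */
union all
select N+1 from Pmts where N<5     /* recursive part: tack on the next payment */
)
select N from Pmts
/*
N
--
1
2
3
4
5
*/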

Both the anchor and the recursive part of the CTE use a couple of CROSS APPLYs. The first CROSS APPLY establishes a new column N, the Payment Number. The only reason I do this is because I want to use N in the calculation of the Balance. You’ll recall that the formula for the Balance uses N as one of its variables.
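
(Here's that CROSS APPLY naming trick in miniature… again, just a toy example of mine, defining X once and then reusing it in several expressions:)

select Num
      ,Square=X*X
      ,Cube=X*X*X
from (select Num=1 union all select 2 union all select 3) T
cross apply (select X=Num+10) F   /* name the intermediate value once */
/*
Num Square Cube
--- ------ ----
1      121 1331
2      144 1728
3      169 2197
*/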

The second CROSS APPLY is the calculation of that NewBalance, making use of P, r, N, and c. Note the CASE statement that checks if we are processing the final payment (CASE WHEN N=M) and, if so, it forces the final balance to be zero.

The Principle of the payment is calculated by subtracting the NewBalance from the Balance of the previous payment (or in the case of the very first payment, by subtracting NewBalance from the original loan amount P). The Interest part of the payment is calculated by simply subtracting the Principle from the Payment Amount (c).

Since the recursive CTE is building the schedule one payment at a time, we can take advantage of that and calculate cumulative totals for the Principle and the Interest (CumulPrinciple and CumulInterest) as we go along.

Finally, the main SELECT statement pulls all the rows from the MonthlyPayments recursive CTE, dressing up the date by spelling out the month’s name and ORDERing BY the PmtNo.

Note the OPTION clause. By default, the maximum number of recursion levels is 100. The number of levels of recursion we will reach depends on the value of @NumberOfMonths that is passed to the procedure. If we’re doing a mortgage payment that spans 30 years, then @NumberOfMonths will be 360 and we will need a level of recursion of at least 360 or else we will get this message:

/*
Msg 530: The statement terminated.
The maximum recursion 100 has been exhausted before statement completion.
*/
So our general MAXRECURSION value of 1000 should certainly be reasonable enough, assuming that no loan is going to last more than 83 years. (You could also specify MAXRECURSION 0 to remove the limit entirely, but a sanity cap like 1000 is the safer habit.)

Great! We have a stored procedure that produces a loan schedule! Terrific!

Now what?

Not much, that’s what. This stored procedure works just fine, but we can’t really do anything with it. All we can do is EXECUTE it. Unless we INSERT its result set into a temporary table, we can’t query any important information out of it.
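
(To be fair, here's a sketch of my own of what that temp-table workaround would look like, with a table definition matching the procedure's result set:)

if object_id('tempdb..#Sched') is not null drop table #Sched
create table #Sched
(
 PmtNo int
,PmtMonth varchar(20)
,PmtAmount numeric(10,2)
,Principle numeric(10,2)
,Interest numeric(10,2)
,CumulPrinciple numeric(10,2)
,CumulInterest numeric(10,2)
,Balance numeric(10,2)
)
insert #Sched
exec usp_LoanSchedule @LoanAmount     = 20000
                     ,@AnnualRate     = 6.8
                     ,@NumberOfMonths = 60
                     ,@StartDate      = '20091201'
select TotalInterest=sum(Interest) from #Sched  /* 3648.40 */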

However, we could query information out of this loan schedule data if we created an inline table-valued function (TVF) instead of the stored procedure, like so:

use tempdb;
go

if object_id('ufn_LoanSchedule') is not null drop function ufn_LoanSchedule;
go

create function ufn_LoanSchedule
(
@LoanAmount numeric(10,2)
,@AnnualRate numeric(10,8) /* Expressed as percent */
,@NumberOfMonths int
,@StartDate datetime
)
returns table
as
return
with
InputVariables as
(
select P,M,R
,C=round((P*R)/(1-power(1+R,-M)),2)
from (select P=@LoanAmount
,M=@NumberOfMonths
,R=@AnnualRate/12/100) InputData
)
,
MonthlyPayments(PmtNo
,PmtDate
,Balance
,Principle
,Interest
,CumulPrinciple
,CumulInterest) as
(
select N
,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
,cast(NewBalance as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
from InputVariables
cross apply (select N=1) CalcPmtNo
cross apply (select NewBalance=round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)) CalcNewBalance
union all
select N
,dateadd(month,1,mp.PmtDate)
,cast(NewBalance as numeric(10,2))
,cast(mp.Balance-NewBalance as numeric(10,2))
,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
from MonthlyPayments mp
cross join InputVariables
cross apply (select N=mp.PmtNo+1) CalcPmtNo
cross apply (select NewBalance=case
when N=M
then 0.00 /* Last Payment */
else round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)
end) CalcNewBalance
where N<=M
)
select PmtNo
,PmtDate
,PmtAmount=Principle+Interest
,Principle
,Interest
,CumulPrinciple
,CumulInterest
,Balance
from MonthlyPayments
;
(This is why I originally created a single one-stop-shopping query in our original stored procedure without the use of local variables… so that I could easily translate it into an in-line TVF).

Now we can call the TVF just as if it were a table, pulling out any columns we wish:

select PmtNo
,PmtDate
,Principle
,Interest
,Balance
from dbo.ufn_LoanSchedule(20000,6.8,60,'20091201')
order by PmtNo
/*
PmtNo PmtDate                 Principle Interest  Balance
----- ----------------------- --------- -------- --------
    1 2009-12-01 00:00:00.000    280.81   113.33 19719.19
    2 2010-01-01 00:00:00.000    282.39   111.75 19436.80
    3 2010-02-01 00:00:00.000    284.00   110.14 19152.80
    4 2010-03-01 00:00:00.000    285.61   108.53 18867.19
  ...                     ...       ...      ...      ...
   57 2014-08-01 00:00:00.000    385.33     8.81  1169.11
   58 2014-09-01 00:00:00.000    387.52     6.62   781.59
   59 2014-10-01 00:00:00.000    389.71     4.43   391.88
   60 2014-11-01 00:00:00.000    391.88     2.26     0.00
*/
Look back at the code where we created the TVF. Did you notice something missing in the definition? Two things, actually. I deliberately left out the ORDER BY clause because we may want to order the data in a different way when we call the TVF. But the OPTION clause is also missing that specifies the MAXRECURSION value. T-SQL will not allow it in a query that’s part of a TVF definition. If we were to tack it on to the end of the query and try to CREATE the FUNCTION, we’d get the following error message:

/*
Msg 156, Level 15, State 1, Procedure ufn_LoanSchedule, Line 67
Incorrect syntax near the keyword 'option'.
*/
Since we cannot specify any MAXRECURSION in the TVF definition, it is (unfortunately) up to us to include the OPTION clause in any query that calls the TVF if we know that we are going to exceed the default level of 100.

So, for example, let’s say we have just bought a quaint little cottage for $500,000. (Don’t scoff… that price is dirt cheap for a humble little abode here in the San Francisco Bay Area). We get a 30-year (360-month) fixed loan with a 5% interest rate, with our first payment due in January 2010. We need to add the OPTION clause to the following query to prevent an error:

select PmtNo
,PmtDate
,Principle
,Interest
,Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
option (maxrecursion 360)
/*
PmtNo PmtDate                 Principle Interest   Balance
----- ----------------------- --------- -------- ---------
    1 2010-01-01 00:00:00.000    600.78  2083.33 499399.22
    2 2010-02-01 00:00:00.000    603.28  2080.83 498795.94
    3 2010-03-01 00:00:00.000    605.79  2078.32 498190.15
    4 2010-04-01 00:00:00.000    608.32  2075.79 497581.83
  ...                     ...       ...      ...       ...
  357 2039-09-01 00:00:00.000   2639.84    44.27   7984.14
  358 2039-10-01 00:00:00.000   2650.84    33.27   5333.30
  359 2039-11-01 00:00:00.000   2661.89    22.22   2671.41
  360 2039-12-01 00:00:00.000   2671.41    12.70      0.00
*/
Anyway, now that we have a TVF, we can get some interesting information from the loan schedule data.

For example, using our mortgage scenario, how much can we write off in mortgage interest on our income taxes in the year 2012?:

select TotalInterest=sum(Interest)
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where year(PmtDate)=2012
option (maxrecursion 360)
/*
TotalInterest
-------------
24058.33
*/
What will our loan balance be at the end of 2015?:

select Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where PmtDate='20151201'
order by PmtNo
option (maxrecursion 360)
/*
Balance
---------
449676.39
*/
How long will it take for us to pay off half of the loan?:

select top 1 PmtNo,PmtDate
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where CumulPrinciple>=Balance
order by PmtNo
option (maxrecursion 360)
/*
PmtNo PmtDate
----- -----------------------
242 2030-02-01 00:00:00.000
*/
(Yikes… That's the problem with how these loans work. It'll take over 20 years of the 30-year loan term to pay off half of the principal.)

Finally, we can test out different rates and see how the monthly payment changes and how much total cumulative interest we’ll end up paying for each rate:

select Rate
,PmtAmount
,CumulInterest
from (select 5.1 union all
select 5.3 union all
select 5.5 union all
select 5.7 union all
select 5.9) Rates(Rate)
cross apply dbo.ufn_LoanSchedule(500000,Rate,360,'20100101')
where PmtNo=360
option (maxrecursion 360)
/*
Rate PmtAmount CumulInterest
---- --------- -------------
5.1 2714.75 477310.00
5.3 2776.52 499547.20
5.5 2838.95 522022.00
5.7 2902.00 544720.00
5.9 2965.68 567644.80
*/
(Don't you find it disgusting… er, interesting how you pay more in interest over the life of the loan than you do in principal if your rate is a little over 5.3%?)

I hope this article has been helpful not just in showing how to calculate loan schedule data, but also in demonstrating recursive CTE’s and TVF’s and other techniques. I certainly learned something I didn’t expect (the inability to specify a MAXRECURSION value in a TVF) in putting the article together.

I’ll bet that your borrowing future will be brighter now that you have the tools to analyze potential loans. From now on, you’ll be able to make those payments with a smile on your face.

Yeah, right.