Wednesday, December 23, 2009

It's a Small World After All

I don't have any heavy in-depth T-SQL code or analysis today…

Instead, let’s just sit back and take a quick break from the hectic holiday season…

Sip some hot cocoa or cider…

And have a little chuckle over a couple “Small World” incidents…

The late 1990’s were kind of a busy whirlwind blur for me… I worked at home with 3 young children running around; I was doing a lot of preventative Y2K work; I was a Visual FoxPro MVP frequenting the old CompuServe FoxForum and writing articles for a couple of VFP publications; and I was doing lots of short-term trading in the stock market. (Ahhh… those were the days… check out the 1100% advance in InfoSpace in less than 5 months during 1999/2000).

[Chart: INSP stock price, October 1999 through March 2000]

Yes, it was a busy time, and that’s probably why I can’t remember writing a book on SQL Server 7 back in 1999 with none other than Buck Woody (known then, more formally, as Greg Woody):

[Image: the SQL Server 7.0 book cover]

I stumbled upon this book the other day at Amazon quite by accident.

Hmmm… I guess I’d better update my résumé.

It’s amazing the kind of wild stuff you can find out about yourself on the web. For example, about a year ago, SQL Server MVP Kalen Delaney was shocked to discover that she had somehow written a book on Oracle… Kalen may be interested to discover that this book is now available at Amazon in Spanish!

One other funny Google story…

Several years ago, I decided (for whatever reason) to Google my young son’s name. What I found came as quite a surprise. I found a guy named Qxbr Schulz (name disguised) who was a self-employed software developer who had a son named Brad. Well, I was a guy named Brad Schulz who was a self-employed software developer who had a son named Qxbr. And the really weird thing is that we lived only 3 miles from each other!

Yes, as the song goes: "It's a small world after all!"

Anyway, he and I met soon thereafter at a Fourth-of-July block party. And no, we’re not related in any way. But he and I have a little chuckle every time we run into each other in town.

Best wishes for the holidays, everybody…

Saturday, December 12, 2009

Delimited String Tennis Anyone?

SQL Server MVP Adam Haines and I have been going back and forth for months in our blogs on the subject of creating and splitting a delimited string… it's been kind of a virtual T-SQL Tennis match… not a competition, but rather a friendly rally, knocking the subject back and forth, practicing our skills. Between the two of us, we've just about beaten the subject to death; however, I like to think that our collective research has been a win-win situation for the SQL Server community.

(If you’d like to look at these past articles, read Adam’s posts here, here, here, and here and read my posts and comments here, here, here, and here.)

In Adam's latest volley a few days ago (which you can read here), he did some in-depth time tests on various methods of splitting a delimited string… the XML approach using XPath and the .nodes() and .value() methods, the permanent Numbers table approach, and the approach using a TVF with a virtual table of Numbers.
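
(For reference, here's a minimal sketch of the Numbers-table flavor of the split… my own illustration rather than Adam's exact code. It assumes a hypothetical permanent dbo.Numbers table with a single integer column Num covering 1 up to at least the length of the string… about 20,000 rows for the string below:)

declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select Item=substring(@CommaList,Num,charindex(',',@CommaList+',',Num)-Num)
from dbo.Numbers
where Num<=len(@CommaList)
  and substring(','+@CommaList,Num,1)=','  /* Num marks the start of an item */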

His verdict was that the XML approach was a biiiiig disappointment when handling a comma-delimited string of many items, in one case performing over 200 times slower (!!) than the Numbers table approaches. Like Adam, I was a bit crestfallen. The XML method is the usual method suggested on the T-SQL forums for doing this sort of thing.

It is literally shocking how badly it performs.

Take a look at the following example, in which I create a variable that contains a comma-delimited list of 5000 items and then use the XML approach to shred the list into a result set of 5000 rows:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

select x.i.value('.','varchar(3)')
from (select XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)) a
cross apply XMLList.nodes('i') x(i)
The task that we’re asking SQL Server to do is not that complicated, but do you want to guess how long this takes to fully execute on my system? Are you sitting down?

It takes over 6 minutes!

Really disappointing…

I had a lot of trouble accepting this. After doing several experiments, I finally figured out what was going on.

The problem lies in the derived table in the FROM clause, where we create a derived XML column via the CAST function.

It seems we are not really creating a complete column or value at all. Apparently we are inadvertently creating an XML stream, which sends small chunks of data to the .nodes() method and then the chunks get handed over to the .value() method to come up with our result… one chunk at a time.

On my system, when I execute the above code, I see nothing at all happen for almost 50 seconds. Then the first 676 rows get spit out into the results window. After another 50 seconds or so, another similar-sized chunk of rows gets spit out. This happens like clockwork until it spits out the final 5000th row, over 6 minutes after I started the query.

You may recall in a previous post of mine where I said that an XML stream is processed in 2033-byte chunks. Well, 676 rows of 3 bytes (‘ABC’) each equals 2028 bytes. (I’m assuming that there’s a little tiny bit of overhead involved and that’s why we can’t squeeze in the 677th row). My comma-delimited string consists of 5000 items at 3 bytes each, coming to 15000 bytes. Divide that by 2033 bytes and you get 7.38, which is the number of chunks of data that get processed. Multiply that by 50 seconds per chunk, and you get… voila… 6.15 minutes.
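
(If you'd like to sanity-check that arithmetic in T-SQL itself, here are my own back-of-the-envelope figures from the timings above:)

select ChunksProcessed=15000.0/2033          /* about 7.38 chunks */
      ,EstimatedMinutes=15000.0/2033*50/60   /* about 6.15 minutes at ~50 seconds per chunk */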

Well, this must be all really exciting for you math nerds out there, but how does it help us?

It does help to explain what’s going on, and it does show us that streaming the XML is reeeeeaaalllly sloooooow.

So is there hope for the XML method?

YES! YES! YES!

This slow streaming only occurs when we are creating the XML to be processed via a derived table or CTE. What if we instead transfer our comma-delimited string into an actual XML variable and then process that variable, like so:

/* 
Note that I have to CAST() the first parameter of REPLICATE
in order to force it to produce more than 8000 bytes
*/
declare @CommaList varchar(max)
set @CommaList='ABC'+replicate(cast(',ABC' as varchar(max)),4999)

declare @XMLList xml
set @XMLList=cast('<i>'+replace(@CommaList,',','</i><i>')+'</i>' as xml)

select x.i.value('.','varchar(3)')
from @XMLList.nodes('i') x(i)
Believe it or not, this takes only 0.174 seconds!

In comparing the two pieces of code outlined in this article, here are the results:

/*
                  Derived       XML
                    Table  Variable      Percent
Measurement        Method    Method  Improvement
------------------------------------------------
CPU               386,172       125        2989%
Duration(ms)      368,747       174        2019%
*/
So I’m happy to report that all is not lost for the XML method. It’s still a contender in the list-splitting competition, giving the Numbers table methods a run for their money… as long as it is implemented in a certain way.

Back to you, Adam…

UPDATE ON Jan06,2010: Please read the follow-up to this article in which I talk about some additional methods to speed the process up even further.

Tuesday, December 8, 2009

Friday the 13th 2009: Cluster of Terror

This blog entry is participating in the very first T-SQL Tuesday, hosted this month by Adam Machanic. You are invited to visit his blog to join this rollicking blog party and read more blogs participating in this month’s theme: Date/Time Tricks.

Approximately 715,298 articles have been written over the past 20 years or so regarding the handling of dates and times in T-SQL. So, instead of fruitlessly trying to come up with some earth-shattering new clever way of handling dates that no one's ever seen before, I just thought I'd have some fun with some analysis of a very specific date that we all know and fear… er, love.

By the way, before we proceed, as long as we’re talking about date-related stuff, I should remind you that the deadline for TSQL Challenge #18 (Generate Text-Formatted Month Calendars) is on December 14th, so check it out and give it a try.

Okay, on with our fun stuff…



Is it just my imagination, or have we experienced a bunch of Friday the 13th's lately?

For those of you who are not up-to-speed on superstitions, the number 13 is an unlucky number in many cultures, and, similarly, Friday is an unlucky day of the week, and so Friday the 13th is double bad luck. People who have a genuine fear of Friday the 13th suffer from paraskevidekatriaphobia, which is a concoction of Greek words for Friday (Paraskeví), thirteen (dekatreís), and fear (phobía).

Let’s do a little investigating into Friday the 13th. First, we’ll build a Calendar table of every single date in the 1247-year range of January 1, 1753 (the minimum date of the T-SQL datetime datatype) to December 31, 2999 (just a reasonable stopping point). That comes out to 455,457 dates, but our friend T-SQL can create that table in less than a second:

if object_id('tempdb..#Calendar') is not null drop table #Calendar
go
with L0(N) as (select 0 union all select 0 union all select 0) /* 3 Rows */
,L1(N) as (select 0 from L0 A cross join L0 B) /* 9 Rows (3x3) */
,L2(N) as (select 0 from L1 A cross join L1 B) /* 81 Rows (9x9) */
,L3(N) as (select 0 from L2 A cross join L2 B) /* 6561 Rows (81x81) */
,L4(N) as (select 0 from L3 A cross join L3 B) /* 43,046,721 Rows */
,Numbers(Num) as (select row_number() over (order by (select 0)) from L4)
select [Date]
,[Year]=year([Date])
,[Month]=month([Date])
,[Day]=day([Date])
,DayOfWeek=datename(weekday,[Date])
into #Calendar
from Numbers
cross apply (select [Date]=convert(datetime,dateadd(day,Num-1,'17530101'))) F1
where Num between 1 and datediff(day,'17530101','29991231')+1
Now we can check out the 13th and see how often it falls on each of the 7 days of the week:

select DayOfWeek
,NumOccurrences=count(*)
from #Calendar
where [Day]=13
group by DayOfWeek
order by NumOccurrences desc
/*
DayOfWeek NumOccurrences
--------- --------------
Friday              2146
Wednesday           2142
Sunday              2142
Tuesday             2137
Monday              2133
Saturday            2132
Thursday            2132
*/
Hmmm… interesting. The 13th occurs on Friday more than any other day of the week. That’s kind of scary, don’t you think?

In fact, as you can see below, Friday the 13th is in the family of most frequently occurring dates in history (and the future):

with HighestOccurrences as
(
select top 1 with ties
       DayOfWeek
      ,[Day]
from #Calendar
group by DayOfWeek,[Day]
order by count(*) desc
)
select [MostFrequentlyOccurringDay]=DayOfWeek
                                   +' the '
                                   +ltrim(str([Day]))
                                   +case
                                      when [Day] in (1,21,31) then 'st'
                                      when [Day] in (2,22) then 'nd'
                                      when [Day] in (3,23) then 'rd'
                                      else 'th'
                                    end
from HighestOccurrences
order by [Day],DayOfWeek
/*
MostFrequentlyOccurringDay
--------------------------
Sunday the 1st
Monday the 2nd
Tuesday the 3rd
...
Wednesday the 11th
Thursday the 12th
Friday the 13th
Saturday the 14th
Sunday the 15th
...
Thursday the 26th
Friday the 27th
Saturday the 28th
*/
Of course, logically, we can see that this also means that Thursday the 12th and Saturday the 14th and so on down to Sunday the 1st and up to Saturday the 28th occur the most frequently, but still… it’s kind of disconcerting that the most-feared day of all occurs the most often.

Let’s build a table of just the Friday the 13th dates:

if object_id('tempdb..#Friday13s') is not null drop table #Friday13s
go
select [Date]
,[Year]
,[Month]
into #Friday13s
from #Calendar
where DayOfWeek='Friday' and [Day]=13
That table is now populated with all the Friday-the-13th dates from 1753 to 2999… There are 2146 of them.

Wait a minute…

If you add the digits of 2146, you get 2 + 1 + 4 + 6 = 13.

Now that’s an interesting (and somewhat unsettling) coincidence, isn’t it? Ha-ha-ha.

Ahem… Oh well, never mind… Let’s move on…

Using that table, we can see that a Friday the 13th occurs every single year without exception:

select Result=case
when (select count(distinct [Year]) from #Friday13s)=2999-1753+1
then 'Yep... it occurs every single year'
else 'Nope... there are some years when it doesn''t happen'
end
/*
Result
----------------------------------
Yep... it occurs every single year
*/
Apparently there’s no escape!

And here’s an interesting tidbit: Every century (except the 1700s because they are only partially represented in our table) contains the exact same number of Friday the 13th’s:

select Century
,NumOccurrences=count(*)
from #Friday13s
cross apply (select Century=str([Year]/100*100,4)+'s') F1
where [Year]>=1800
group by Century
order by Century
/*
Century NumOccurrences
------- --------------
1800s 172
1900s 172
2000s 172
2100s 172
2200s 172
2300s 172
2400s 172
2500s 172
2600s 172
2700s 172
2800s 172
2900s 172
*/
Fascinating…
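
(A side note of my own: this uniformity is a little less spooky when you recall that the Gregorian calendar repeats exactly every 400 years… that's 146,097 days, which happens to be evenly divisible by 7, so day-of-week patterns recur in a fixed cycle. A quick check:)

select DaysIn400Years=datediff(day,'20000101','24000101')        /* 146097 */
      ,LeftoverDaysOfWeek=datediff(day,'20000101','24000101')%7  /* 0 ... a whole number of weeks */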

Now I mentioned earlier that I thought that we had experienced a bunch of Friday the 13th’s lately. Was I right? How many did we have this year (2009)?:

select [Date]
from #Friday13s
where [Year]=2009
order by [Date]
/*
Date
-----------------------
2009-02-13 00:00:00.000
2009-03-13 00:00:00.000
2009-11-13 00:00:00.000
*/
Three of them! A Triple! We just had one last month in November, and we had two months in a row with February and March! Omigosh!

Now that makes me wonder… What years contain the most occurrences of Friday the 13th?:

with HighestOccurrences as
(
select top 1 with ties
       [Year]
      ,NumOccurrences=count(*)
from #Friday13s
group by [Year]
order by NumOccurrences desc
)
select [Year]
,NumOccurrences
from HighestOccurrences
order by [Year]
/*
Year NumOccurrences
---- --------------
1761 3
1764 3
1767 3
...
1931 3
1942 3
1953 3
1956 3
1959 3
1970 3
1981 3
1984 3
1987 3
1998 3
2009 3
2012 3
2015 3
2026 3
2037 3
2040 3
2043 3
...
2989 3
2992 3
2995 3
*/
Yikes! Sure enough… Three times is the most it can occur in a year, and this year (2009) is one of those! And we haven’t experienced a Triple like that since 1998!

Wait a minute… Look at that list again. It looks like we’re going to experience another Triple in just 3 short years… in 2012! And wait! We’ll have another Triple just 3 years after that… in 2015!

Whoa! I thought we had experienced many Friday the 13th’s lately, but it looks like we are only at the beginning of a really scary cluster of them! It’s a Triple-Triple-Triple (3-3-3) Cluster! Three Friday the 13th’s each occurring in three different years that are three years apart from each other.

This calls for more investigation. Let’s look at all the years that have a Triple… and then cluster them chronologically in groups of three… and then find the Clusters that span the least amount of time (i.e. the tightest, densest, scariest Clusters):

with TripleFri13Years as
(
select [Year]
,RowNum=row_number() over (order by [Year])
from #Friday13s
group by [Year]
having count(*)=3
)
,
TripleFri13YearClusters as
(
select FirstTriple=A.[Year]
,SecondTriple=B.[Year]
,ThirdTriple=C.[Year]
,YearSpan
,YearSpanRank=rank() over (order by YearSpan)
from TripleFri13Years A
join TripleFri13Years B on A.RowNum+1=B.RowNum
join TripleFri13Years C on B.RowNum+1=C.RowNum
cross apply (select YearSpan=C.[Year]-A.[Year]+1) F1
)
select FirstTriple
,SecondTriple
,ThirdTriple
,YearSpan
from TripleFri13YearClusters
where YearSpanRank=1 /* The Shortest Span */
order by FirstTriple
/*
FirstTriple SecondTriple ThirdTriple YearSpan
----------- ------------ ----------- --------
1761 1764 1767 7
1789 1792 1795 7
1801 1804 1807 7
1829 1832 1835 7
1857 1860 1863 7
1885 1888 1891 7
1925 1928 1931 7
1953 1956 1959 7
1981 1984 1987 7
2009 2012 2015 7
2037 2040 2043 7
2065 2068 2071 7
2093 2096 2099 7
...
2933 2936 2939 7
2961 2964 2967 7
2989 2992 2995 7
*/
Oh no!

It looks like the 3-3-3 Cluster is the shortest cluster time span possible, and therefore, we are now on the verge of the worst possible Triple Cluster of Triple Friday-the-13th Years! The dreaded 3-3-3 cluster of 2009-2012-2015! The last time this kind of dense 3-3-3 cluster happened was the 7 year span of 1981-1984-1987.

Wait… Let me think… Did anything terrible happen from 1981 to 1987? Well, AIDS was first reported in 1981. And… uh-oh… (GULP)… The H1N1 Swine Flu Pandemic started this very year in 2009!

OH… MY… GAWD! It can’t be a coincidence!

So, let me make sure I've got this straight… it seems that 2009 has the highest possible number of Friday the 13th's a single year can have, and it's the beginning of a dreaded 3-3-3 Cluster. This doesn't look good.

Wait!… I’ve gotta check this out… We already saw that every century has the same amount of Friday the 13th’s, but what about 3-3-3 Clusters?:

with TripleFri13Years as
(
select [Year]
,RowNum=row_number() over (order by [Year])
from #Friday13s
group by [Year]
having count(*)=3
)
,
TripleFri13YearClusters as
(
select FirstTriple=A.[Year]
,SecondTriple=B.[Year]
,ThirdTriple=C.[Year]
,YearSpan
,Ranking=rank() over (order by YearSpan)
from TripleFri13Years A
join TripleFri13Years B on A.RowNum+1=B.RowNum
join TripleFri13Years C on B.RowNum+1=C.RowNum
cross apply (select YearSpan=C.[Year]-A.[Year]+1) F1
)
select Century
,Num333Clusters=count(*)
from TripleFri13YearClusters
cross apply (select Century=str(FirstTriple/100*100,4)+'s') F1
where Ranking=1 and FirstTriple>=1800
group by Century
order by Century
/*
Century Num333Clusters
------- --------------
1800s 4
1900s 3
2000s 4
2100s 4
2200s 4
2300s 3
2400s 4
2500s 4
2600s 4
2700s 3
2800s 4
2900s 4
*/
This current century has the most possible 3-3-3 Clusters as well!… There are going to be 4 of them. And it… Hold on a minute… Four 3-3-3 Clusters… Uh-oh… 4 + 3 + 3 + 3 = 13!

NOOOOO! It’s a sign! It looks like 2009 is the beginning of the end… Maybe the Mayans and Columbia Pictures are wrong about 2012 being the end of the world. All indicators point to NOW.

I don’t know about you, but now I’m really getting panicky… In fact, I think that…

OH NO!… IT CAN’T BE!

Are you sitting down?

Look at what happens if you add the ASCII values of the characters of the ominous string “FRIDAY THIRTEENTH 333 CLUSTER”:

select SumOfAsciiChars=sum(ascii(substring(OminousString,Number,1)))
from (select OminousString='FRIDAY THIRTEENTH 333 CLUSTER') X
join master..spt_values on Type='P' and Number between 1 and len(OminousString)
/*
SumOfAsciiChars
---------------
2009
*/
It’s an omen! We’re doomed! MAKE IT STOP!! AAAAAIIIIIEEEEEEEE…





We regret to inform you that Mr Schulz has recently been admitted to the Institute for the Calendrically Insane. His planned release is on Friday, August 13th, 2010.

Friday, November 20, 2009

A Loan At Last!

Got your eye on a flashy new car?

Or perhaps you’re considering a new home?

If you don’t have the cash to pay for either of these, you’ll need to get a loan.

And before you jump in head-first, you'll want to know how much you'll be paying each month, and perhaps you'll also want to know the obscene amount of interest you'll end up paying over the life of the loan.

Sure, there are truckloads of loan calculators out there on the web… They’re a dime a dozen.

But who needs those when you have the raw power of SQL Server at your fingertips?

This blog entry will demonstrate how to create a loan payment schedule based on a fixed interest rate with compounded interest, and, as always, we’ll go a little bit beyond that to explore some other things.

The monthly payment amount, c, is expressed by the following formula:

    c = (P × r) / (1 − (1 + r)^(−M))

where P is the loan amount, M is the number of months of the loan, and r is the monthly interest rate expressed as a decimal.

The balance of the loan after payment number N is expressed by the following formula:

    B(N) = P × (1 + r)^N − ((1 + r)^N − 1) × c / r
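
(A quick sanity check of my own for both formulas, plugging in the numbers from the car-loan example coming up below… $20,000 at 6.8% annual interest for 60 months… which should reproduce the $394.14 payment and the 19,719.19 first-month balance shown in the schedule:)

declare @P float,@R float,@M int,@C float
select @P=20000          /* loan amount */
      ,@R=6.8/12/100     /* monthly rate as a decimal */
      ,@M=60             /* number of months */
set @C=round((@P*@R)/(1-power(1+@R,-@M)),2)
select MonthlyPayment=@C                                  /* 394.14 */
      ,BalanceAfterPmt1=round(@P*power(1+@R,1)
                             -(power(1+@R,1)-1)*@C/@R,2)  /* 19719.19 */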

With all this in mind, here is a stored procedure to produce a payment schedule:

use tempdb;
go

if object_id('usp_LoanSchedule') is not null drop procedure usp_LoanSchedule;
go

create procedure usp_LoanSchedule
      @LoanAmount numeric(10,2)
     ,@AnnualRate numeric(10,8) /* Expressed as percent */
     ,@NumberOfMonths int
     ,@StartDate datetime
as
with
InputVariables as
(
select P,M,R
,C=round((P*R)/(1-power(1+R,-M)),2)
from (select P=@LoanAmount
,M=@NumberOfMonths
,R=@AnnualRate/12/100) InputData
)
,
MonthlyPayments(PmtNo
,PmtDate
,Balance
,Principle
,Interest
,CumulPrinciple
,CumulInterest) as
(
select N
,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
,cast(NewBalance as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
from InputVariables
cross apply (select N=1) CalcPmtNo
cross apply (select NewBalance=round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)) CalcNewBalance
union all
select N
,dateadd(month,1,mp.PmtDate)
,cast(NewBalance as numeric(10,2))
,cast(mp.Balance-NewBalance as numeric(10,2))
,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
from MonthlyPayments mp
cross join InputVariables
cross apply (select N=mp.PmtNo+1) CalcPmtNo
cross apply (select NewBalance=case
when N=M
then 0.00 /* Last Payment */
else round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)
end) CalcNewBalance
where N<=M
)
select PmtNo
,PmtMonth=datename(month,PmtDate)+str(year(PmtDate),5)
,PmtAmount=Principle+Interest
,Principle
,Interest
,CumulPrinciple
,CumulInterest
,Balance
from MonthlyPayments
order by PmtNo
option (maxrecursion 1000)
;
If we want to buy a $20,000 car at 6.8% (annual) interest with a 5-year term (60 months) with our first payment in December 2009, then we can call it like so:

exec usp_LoanSchedule @LoanAmount     = 20000
                     ,@AnnualRate     = 6.8
                     ,@NumberOfMonths = 60
                     ,@StartDate      = '20091201'
/*
PmtNo PmtMonth       PmtAmount Principle Interest CumulPrinciple CumulInterest  Balance
----- -------------- --------- --------- -------- -------------- ------------- --------
    1 December 2009     394.14    280.81   113.33         280.81        113.33 19719.19
    2 January 2010      394.14    282.39   111.75         563.20        225.08 19436.80
    3 February 2010     394.14    284.00   110.14         847.20        335.22 19152.80
    4 March 2010        394.14    285.61   108.53        1132.81        443.75 18867.19
  ...            ...       ...       ...      ...            ...           ...      ...
   57 August 2014       394.14    385.33     8.81       18830.89       3635.09  1169.11
   58 September 2014    394.14    387.52     6.62       19218.41       3641.71   781.59
   59 October 2014      394.14    389.71     4.43       19608.12       3646.14   391.88
   60 November 2014     394.14    391.88     2.26       20000.00       3648.40     0.00
*/
Let’s go over how the logic works.

The procedure essentially consists of a single SELECT statement, which is made up of two CTE’s (Common Table Expressions).

The first CTE is called InputVariables, and it just produces a small one-row table of the variables that we need for the calculations… in other words, our friends P, M, r, and c. Note how r is calculated by dividing the @AnnualRate parameter by 12 (to get a monthly rate) and then by 100 (to represent the rate as a decimal). Also note how c (the monthly payment amount) was calculated based on the other 3 variables.

I could have just created local variables (@P, @M, @R, and @C) instead of setting them up in a CTE, but I had two reasons for not doing that. First of all, I didn’t want all those pesky at-signs (@) to be all throughout the code and second, I wanted to encapsulate the entire creation of the loan schedule table in a single SELECT statement (I’ll explain why later).

The second CTE, called MonthlyPayments, is a recursive one. The first part of the recursive CTE (before the UNION ALL) is the anchor. It produces the data for the very first payment (note how I also adjust the first payment date to the first of the month). The second part (the recursive part) of the CTE builds upon that, creating the second payment, then the third, and so on, up until the last payment. It will continue tacking on payments as long as its WHERE clause (WHERE N<=M) is true… in other words, as long as the Payment Number (N) is less than or equal to the number of months of the loan (M). Once the WHERE clause evaluates to false (when N>M), then it will stop.
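
(If recursive CTEs are new to you, here's the anchor-plus-recursive-part pattern stripped down to a toy example of my own… it just counts off payment numbers 1 through 5:)

with Pmts(N) as
(
select 1                           /* anchor: the first payment */
union all
select N+1 from Pmts where N<5     /* recursive part: tack on the next payment */
)
select N from Pmts
/*
N
--
1
2
3
4
5
*/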

Both the anchor and the recursive part of the CTE use a couple of CROSS APPLYs. The first CROSS APPLY establishes a new column N, the Payment Number. The only reason I do this is because I want to use N in the calculation of the Balance. You’ll recall that the formula for the Balance uses N as one of its variables.
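
(Here's that CROSS APPLY naming trick in miniature… again, just a toy example of mine, defining X once and then reusing it in several expressions:)

select Num
      ,Square=X*X
      ,Cube=X*X*X
from (select Num=1 union all select 2 union all select 3) T
cross apply (select X=Num+10) F   /* name the intermediate value once */
/*
Num Square Cube
--- ------ ----
1      121 1331
2      144 1728
3      169 2197
*/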

The second CROSS APPLY is the calculation of that NewBalance, making use of P, r, N, and c. Note the CASE statement that checks if we are processing the final payment (CASE WHEN N=M) and, if so, it forces the final balance to be zero.

The Principle of the payment is calculated by subtracting the NewBalance from the Balance of the previous payment (or in the case of the very first payment, by subtracting NewBalance from the original loan amount P). The Interest part of the payment is calculated by simply subtracting the Principle from the Payment Amount (c).

Since the recursive CTE is building the schedule one payment at a time, we can take advantage of that and calculate cumulative totals for the Principle and the Interest (CumulPrinciple and CumulInterest) as we go along.

Finally, the main SELECT statement pulls all the rows from the MonthlyPayments recursive CTE, dressing up the date by spelling out the month’s name and ORDERing BY the PmtNo.

Note the OPTION clause. By default, the maximum number of recursion levels is 100. The number of levels of recursion we will reach depends on the value of @NumberOfMonths that is passed to the procedure. If we’re doing a mortgage payment that spans 30 years, then @NumberOfMonths will be 360 and we will need a level of recursion of at least 360 or else we will get this message:

/*
Msg 530: The statement terminated.
The maximum recursion 100 has been exhausted before statement completion.
*/
So our general MAXRECURSION value of 1000 should certainly be reasonable enough, assuming that no loan is going to last more than 83 years. (You could also specify MAXRECURSION 0 to remove the limit entirely, but a sanity cap like 1000 is the safer habit.)

Great! We have a stored procedure that produces a loan schedule! Terrific!

Now what?

Not much, that’s what. This stored procedure works just fine, but we can’t really do anything with it. All we can do is EXECUTE it. Unless we INSERT its result set into a temporary table, we can’t query any important information out of it.
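
(To be fair, here's a sketch of my own of what that temp-table workaround would look like, with a table definition matching the procedure's result set:)

if object_id('tempdb..#Sched') is not null drop table #Sched
create table #Sched
(
 PmtNo int
,PmtMonth varchar(20)
,PmtAmount numeric(10,2)
,Principle numeric(10,2)
,Interest numeric(10,2)
,CumulPrinciple numeric(10,2)
,CumulInterest numeric(10,2)
,Balance numeric(10,2)
)
insert #Sched
exec usp_LoanSchedule @LoanAmount     = 20000
                     ,@AnnualRate     = 6.8
                     ,@NumberOfMonths = 60
                     ,@StartDate      = '20091201'
select TotalInterest=sum(Interest) from #Sched  /* 3648.40 */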

However, we could query information out of this loan schedule data if we created an inline table-valued function (TVF) instead of the stored procedure, like so:

use tempdb;
go

if object_id('ufn_LoanSchedule') is not null drop function ufn_LoanSchedule;
go

create function ufn_LoanSchedule
(
@LoanAmount numeric(10,2)
,@AnnualRate numeric(10,8) /* Expressed as percent */
,@NumberOfMonths int
,@StartDate datetime
)
returns table
as
return
with
InputVariables as
(
select P,M,R
,C=round((P*R)/(1-power(1+R,-M)),2)
from (select P=@LoanAmount
,M=@NumberOfMonths
,R=@AnnualRate/12/100) InputData
)
,
MonthlyPayments(PmtNo
,PmtDate
,Balance
,Principle
,Interest
,CumulPrinciple
,CumulInterest) as
(
select N
,dateadd(month,datediff(month,'19000101',@StartDate),'19000101')
,cast(NewBalance as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
,cast(P-NewBalance as numeric(10,2))
,cast(C-(P-NewBalance) as numeric(10,2))
from InputVariables
cross apply (select N=1) CalcPmtNo
cross apply (select NewBalance=round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)) CalcNewBalance
union all
select N
,dateadd(month,1,mp.PmtDate)
,cast(NewBalance as numeric(10,2))
,cast(mp.Balance-NewBalance as numeric(10,2))
,cast(C-(mp.Balance-NewBalance) as numeric(10,2))
,cast(mp.CumulPrinciple+mp.Balance-NewBalance as numeric(10,2))
,cast(mp.CumulInterest+C-(mp.Balance-NewBalance) as numeric(10,2))
from MonthlyPayments mp
cross join InputVariables
cross apply (select N=mp.PmtNo+1) CalcPmtNo
cross apply (select NewBalance=case
when N=M
then 0.00 /* Last Payment */
else round(P*power(1+R,N)
-(power(1+R,N)-1)*C/R,2)
end) CalcNewBalance
where N<=M
)
select PmtNo
,PmtDate
,PmtAmount=Principle+Interest
,Principle
,Interest
,CumulPrinciple
,CumulInterest
,Balance
from MonthlyPayments
;
(This is why I originally created a single one-stop-shopping query in our original stored procedure without the use of local variables… so that I could easily translate it into an in-line TVF).

Now we can call the TVF just as if it were a table, pulling out any columns we wish:

select PmtNo
,PmtDate
,Principle
,Interest
,Balance
from dbo.ufn_LoanSchedule(20000,6.8,60,'20091201')
order by PmtNo
/*
PmtNo PmtDate                 Principle Interest  Balance
----- ----------------------- --------- -------- --------
    1 2009-12-01 00:00:00.000    280.81   113.33 19719.19
    2 2010-01-01 00:00:00.000    282.39   111.75 19436.80
    3 2010-02-01 00:00:00.000    284.00   110.14 19152.80
    4 2010-03-01 00:00:00.000    285.61   108.53 18867.19
  ...                     ...       ...      ...      ...
   57 2014-08-01 00:00:00.000    385.33     8.81  1169.11
   58 2014-09-01 00:00:00.000    387.52     6.62   781.59
   59 2014-10-01 00:00:00.000    389.71     4.43   391.88
   60 2014-11-01 00:00:00.000    391.88     2.26     0.00
*/
Look back at the code where we created the TVF. Did you notice something missing in the definition? Two things, actually. I deliberately left out the ORDER BY clause because we may want to order the data in a different way when we call the TVF. But the OPTION clause is also missing that specifies the MAXRECURSION value. T-SQL will not allow it in a query that’s part of a TVF definition. If we were to tack it on to the end of the query and try to CREATE the FUNCTION, we’d get the following error message:

/*
Msg 156, Level 15, State 1, Procedure ufn_LoanSchedule, Line 67
Incorrect syntax near the keyword 'option'.
*/
Since we cannot specify any MAXRECURSION in the TVF definition, it is (unfortunately) up to us to include the OPTION clause in any query that calls the TVF if we know that we are going to exceed the default level of 100.

So, for example, let’s say we have just bought a quaint little cottage for $500,000. (Don’t scoff… that price is dirt cheap for a humble little abode here in the San Francisco Bay Area). We get a 30-year (360-month) fixed loan with a 5% interest rate, with our first payment due in January 2010. We need to add the OPTION clause to the following query to prevent an error:

select PmtNo
,PmtDate
,Principle
,Interest
,Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
option (maxrecursion 360)
/*
PmtNo PmtDate                 Principle Interest   Balance
----- ----------------------- --------- -------- ---------
    1 2010-01-01 00:00:00.000    600.78  2083.33 499399.22
    2 2010-02-01 00:00:00.000    603.28  2080.83 498795.94
    3 2010-03-01 00:00:00.000    605.79  2078.32 498190.15
    4 2010-04-01 00:00:00.000    608.32  2075.79 497581.83
  ...                     ...       ...      ...       ...
  357 2039-09-01 00:00:00.000   2639.84    44.27   7984.14
  358 2039-10-01 00:00:00.000   2650.84    33.27   5333.30
  359 2039-11-01 00:00:00.000   2661.89    22.22   2671.41
  360 2039-12-01 00:00:00.000   2671.41    12.70      0.00
*/
Anyway, now that we have a TVF, we can get some interesting information from the loan schedule data.

For example, using our mortgage scenario, how much can we write off in mortgage interest on our income taxes in the year 2012?:

select TotalInterest=sum(Interest)
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where year(PmtDate)=2012
option (maxrecursion 360)
/*
TotalInterest
-------------
24058.33
*/
What will our loan balance be at the end of 2015?:

select Balance
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where PmtDate='20151201'
order by PmtNo
option (maxrecursion 360)
/*
Balance
---------
449676.39
*/
How long will it take for us to pay off half of the loan?:

select top 1 PmtNo,PmtDate
from dbo.ufn_LoanSchedule(500000,5,360,'20100101')
where CumulPrinciple>=Balance
order by PmtNo
option (maxrecursion 360)
/*
PmtNo PmtDate
----- -----------------------
242 2030-02-01 00:00:00.000
*/
(Yikes… That's the problem with how these loans work. It'll take over 20 years of the 30-year loan term to pay off half of the principal.)

Finally, we can test out different rates and see how the monthly payment changes and how much total cumulative interest we’ll end up paying for each rate:

select Rate
,PmtAmount
,CumulInterest
from (select 5.1 union all
select 5.3 union all
select 5.5 union all
select 5.7 union all
select 5.9) Rates(Rate)
cross apply dbo.ufn_LoanSchedule(500000,Rate,360,'20100101')
where PmtNo=360
option (maxrecursion 360)
/*
Rate PmtAmount CumulInterest
---- --------- -------------
5.1 2714.75 477310.00
5.3 2776.52 499547.20
5.5 2838.95 522022.00
5.7 2902.00 544720.00
5.9 2965.68 567644.80
*/
(Don't you find it disgusting… er, interesting how you pay more in interest over the life of the loan than you do in principal if your rate is a little over 5.3%?)

I hope this article has been helpful not just in showing how to calculate loan schedule data, but also in demonstrating recursive CTE’s and TVF’s and other techniques. I certainly learned something I didn’t expect (the inability to specify a MAXRECURSION value in a TVF) in putting the article together.

I’ll bet that your borrowing future will be brighter now that you have the tools to analyze potential loans. From now on, you’ll be able to make those payments with a smile on your face.

Yeah, right.