Tuesday, March 29, 2011

Shrink Your Databases Regularly

Oh, no, no, no… Not that kind of shrinking! (But I did get your attention, didn’t I?)

shrink \shringk\ n. : (slang) [short for headshrinker] a clinical psychiatrist or psychologist [Tony Soprano sees his shrink, Dr. Melfi, every week.] vt. : (slang) to psychoanalyze [Shrink your databases regularly.] shrank, shrunk


A note to the reader: This is based on several true stories. Some events were changed or otherwise fictionalized for dramatic purposes.



Dr. Ben Adryl: Good afternoon. I’m Dr. Ben Adryl, but many people just refer to me by my initials, DBA. And you are Mr. Tabase, I presume?

Deigh Tabase: Yes, my first name is Deighton. Most people just call me Deigh for short.

Dr. Ben Adryl (DBA): Welcome. Now how can I help you today?

Deigh Tabase (Database): Well, I seem to have a lot of problems, and they’ve been getting worse with each passing day. It was suggested that I make an appointment with you. But I have to admit, I’ve never been to a shrink before.

DBA: I would prefer that you not use the “s” word in this office. I’m not really a traditional psychotherapist anyway… I take more of a holistic approach and work with all aspects of a person’s well-being, like sanity, health, productivity. Think of me as a life coach. I’m kind of like Dr. Phil and Dr. Oz and David Allen all rolled into one.

Database: That sounds great, doc. How do we start?

DBA: Tell me a little bit about some of your problems.

Database: Well, for one thing, I’ve been suffering from claustrophobia as long as I can remember. Plus I feel bloated and tired and overworked and I can’t seem to get organized. I guess I’m kind of a mess.

DBA: Don’t worry, all is not lost. I’m sure I can help. Let me explain how this works. I’m going to ask you several questions, using a special technique to draw more information out of your subconscious. This technique is called Dynamic Management View Querying.

Database: Will it hurt?

DBA: Oh no, not at all. Based on what I find, I may be able to give you recommendations on how you can find relief, and, in some cases, I may be able to administer treatment immediately.

Database: That would be wonderful, doc.

DBA: So tell me more about yourself… Your age, your background, etc…

select Created=convert(varchar(20),d.create_date,100)
,LastRead=isnull(convert(varchar(20),s.last_read_date,100),'')
,LastWrite=isnull(convert(varchar(20),s.last_write_date,100),'')
,CompLevel=d.compatibility_level
,RecovModel=d.recovery_model_desc
,LogReuseWait=d.log_reuse_wait_desc
,Collation=d.collation_name
,AutoStats=case
when d.is_auto_create_stats_on=1
then 'Create '
else ''
end
+case
when d.is_auto_update_stats_on=1
then 'Update '
else ''
end
+case
when d.is_auto_update_stats_async_on=1
then 'Asynch '
else ''
end
,Parameterization=case
when d.is_parameterization_forced=1
then 'Forced'
else 'Simple'
end
,UserAccess=d.user_access_desc
,[State]=d.state_desc
from sys.databases d
outer apply (select last_read_date
=max(case
when last_user_scan>=isnull(last_user_seek,'19000101')
and last_user_scan>=isnull(last_user_lookup,'19000101')
then last_user_scan
when last_user_seek>=isnull(last_user_scan,'19000101')
and last_user_seek>=isnull(last_user_lookup,'19000101')
then last_user_seek
else last_user_lookup
end)
,last_write_date=max(last_user_update)
from sys.dm_db_index_usage_stats
where database_id=d.database_id) s
where database_id=db_id('Deighton')
/*
(Reformatted for clarity):
Created............ Dec 18 2010 8:37PM
LastRead........... Mar 29 2011 11:24AM
LastWrite.......... Mar 29 2011 11:23AM
CompLevel.......... 100
RecovModel......... FULL
LogReuseWait....... LOG_BACKUP
Collation.......... SQL_Latin1_General_CP1_CI_AS
AutoStats.......... Create Update
Parameterization... Simple
UserAccess......... MULTI_USER
State.............. ONLINE
*/
DBA: Hmmm… Interesting.

Database: What is it, doc?

DBA: Oh nothing… Don’t worry… I often mumble to myself as I do some of this analysis. Pay it no mind. Tell me a little more about yourself …

use Deighton
go
select
LogicalName=name
,[Type]=case when [type]=0 then 'Data' else 'Log' end
,SizeAllocated=convert(varchar(20),convert(decimal(12,2),size*8./1024))+'MB'
,SpaceUsed=convert(varchar(20),convert(decimal(12,2),UsedPages*8./1024))+'MB'
,[%Full]=convert(varchar(20),convert(decimal(12,1),UsedPages*100./size))+'%'
,SpaceLeft=convert(varchar(20),convert(decimal(12,2),(size-UsedPages)*8./1024))+'MB'
,[%Avail]=convert(varchar(20),convert(decimal(12,1),(size-UsedPages)*100./size))+'%'
,Growth=case
when is_percent_growth=1
then convert(varchar(20),growth)+'%'
else convert(varchar(20),convert(decimal(12,2),growth*8./1024))+'MB'
end
,MaxSize=case
when max_size=-1
then 'Unlimited'
else convert(varchar(20),convert(decimal(12,0),max_size*8./1024))+'MB'
end
,PhysicalFile=physical_name
from sys.database_files
cross apply (select UsedPages=fileproperty(name,'SpaceUsed')) F_Used
where [type] in (0,1) --Rows,Log
/*
(Reformatted for clarity):
Deighton_Data (Data):
Size: 1756.00MB
Used: 1755.00MB (99.9%)
Avail: 1.00MB (0.1%)
Growth: 1.00MB
Max: Unlimited
File: C:\Microsoft SQL Server\MSSQL.1\MSSQL\data\Deighton_Data.MDF
Deighton_Log (Log):
Size: 14198.81MB
Used: 13745.87MB (96.8%)
Avail: 452.95MB (3.2%)
Growth: 10%
Max: Unlimited
File: C:\Microsoft SQL Server\MSSQL.1\MSSQL\data\Deighton_Log.LDF
*/
DBA: Ahhh… I see… You mentioned earlier that you suffer from claustrophobia.

Database: Yes, all the time. And often I have these anxiety or panic attacks where I kind of explode, and it makes me feel a little better, but not for long.

DBA: Well, I can cure you of that immediately.

Database: You’re kidding… Really?

DBA: Yes. Now just hold still while I…

/*
The database felt claustrophobic because it was using up 99.9% of its allocated space.
And its FILEGROWTH is only 1MB (the default), so every time it reached full capacity and
had an anxiety attack, which was often, it only experienced "autogrowth" of a piddly
little 1MB, which might have made it feel relatively better for a short while, but it
would still be only 1MB away from being full capacity again and having another attack.
If the database's initial allocated size upon creation was 200MB, then it had over 1500
autogrowth anxiety attacks in its lifetime!
So DBA bumped the allocated size up to 3500MB to relieve the claustrophobia and,
at the same time, he changed the FILEGROWTH to 100MB. This was just a preventative
measure, in case the database unexpectedly reached that 3500MB capacity. But DBA will
diligently monitor the used space and adjust sizes appropriately long before the
database uses up that space.
*/
alter database Deighton
modify file (name=N'Deighton_Data'
,size=3500MB
,filegrowth=100MB)
DBA: How does that feel?

Database: Omigosh, doc! That’s amazing! I feel so free now! Yippee!

DBA: I think I can take care of that bloating problem you mentioned earlier also. One moment…

/*
The database's transaction log was HUGE. It was about 14GB in size, filled up
to 96.8% capacity. That's compared to its actual data, which was only 1.7GB.
Since the database has a Recovery Model of FULL, its transaction log will
continue to grow larger and larger forever... unless a transaction log backup
is performed. That log backup will truncate the contents of the transaction
log. Note that a full backup of the database WILL NOT truncate the transaction
log automatically... Log backups must be performed in order to do that.
*/
backup log Deighton
to disk=N'C:\SomeBackupLocation\DeightonLog-yyyymmdd-hhmmss.trn'
DBA: How’s that?

Database: Much better, doc!

DBA: Perhaps at another time I can make a further adjustment, though it is a procedure that I very rarely perform.

Database: Okay.
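The “further adjustment” the DBA alludes to is presumably a one-time shrink of the grossly oversized log file, now that regular log backups will keep it from refilling. A minimal sketch, using a hypothetical target size:

```sql
/*
A one-time shrink of the LOG file (never the data file) can be reasonable
once regular log backups are in place. The 4000MB target is hypothetical.
*/
use Deighton
go
dbcc shrinkfile(N'Deighton_Log',4000)
```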

DBA: Now tell me about diet and exercise. Do you exercise regularly?

Database: Well… I… er…

/*
Obtain the 5 most recent backups of each type performed.
*/
with BackupDetail as
(
select BackupType
,backup_finish_date
,RowNum=row_number() over (partition by BackupType
order by backup_finish_date desc)
from msdb.dbo.backupset
cross apply (select BackupType=case [type]
when 'D' then 'Full'
when 'I' then 'Differential'
when 'L' then 'Log'
when 'F' then 'File'
when 'G' then 'Differential File'
when 'P' then 'Partial'
when 'Q' then 'Differential Partial'
else 'N/A'
end) F_Type
where database_name='Deighton'
)
select BackupType
,backup_finish_date
from BackupDetail
where RowNum<=5
order by BackupType
,RowNum
/*
BackupType backup_finish_date
---------- -----------------------
Full 2011-03-20 23:01:33.000
Full 2011-03-04 22:19:52.000
Full 2011-02-24 22:43:39.000
Full 2011-02-19 22:24:03.000
Full 2011-02-11 23:11:42.000
Log 2011-03-29 11:35:31.000 <==This is the log backup just performed
*/
DBA: Hmmm… It seems pretty sporadic.

Database: Yeah, well…

DBA: Listen, Deighton, this is very important. I know you’re the kind of guy who wants to live life to the FULLest. In order to really do that, you have to do regular exercise and have a healthy diet. Look at it this way… If something happens to you and you get very sick or are in a bad accident, you want to get better quickly and RESTORE yourself to FULL health, don’t you?

Database: Yes.

DBA: You don’t want to be a SIMPLEton and only be restored to a fraction of yourself, do you?

Database: No, I suppose not.

DBA: Then we have to get you into an exercise regimen [full and differential backups] and you should be drinking water regularly during the day [log backups] to help flush out your system. I know it seems like a pain, but it can be really easy and doesn’t take all that much time. To get you motivated, I can put you in touch with a personal trainer named C. Quill Agent who will stay on top of these things for you, making sure that you do them regularly.
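That regimen might be sketched as follows (the backup paths and the schedule are hypothetical, and in practice each command would live in its own SQL Server Agent job):

```sql
/* Weekly "exercise": a full backup */
backup database Deighton
to disk=N'C:\SomeBackupLocation\Deighton-Full.bak'

/* Daily "exercise": a differential backup */
backup database Deighton
to disk=N'C:\SomeBackupLocation\Deighton-Diff.bak'
with differential

/* Regular "drinks of water": frequent log backups (e.g. every 15 minutes) */
backup log Deighton
to disk=N'C:\SomeBackupLocation\Deighton-Log.trn'
```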

Database: Thanks, doc.

DBA: Don’t mention it. Now, on to other things… You mentioned overwork and disorganization?

Database: Yeah. I feel like I’m doing unnecessary work at my place of business. I’m trying to remedy this by using a strategy that I saw in a self-help book, but it doesn’t seem to work. My workflow still seems incredibly inefficient, and I just feel kind of scatter-brained, if you know what I mean.

DBA: Yes, I think I have an idea of what you’re talking about. Tell me about your methods of organization…

use Deighton
go
select
TableName=object_name(dm.object_id)
,IndexName=i.name
,IndexType=dm.index_type_desc
,[%Fragmented]=avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),null,null,null,'sampled') dm
join sys.indexes i on dm.object_id=i.object_id and dm.index_id=i.index_id
order by avg_fragmentation_in_percent desc
/*
(Names of Tables and Indexes Disguised):
TableName IndexName IndexType %Fragmented
-------------------- ----------------------- ------------------ ----------------
qufi_efferhmiqfz rhaqwqamxim NONCLUSTERED INDEX 99.9925228054434
qufi_efferhmiqfz foliqemi NONCLUSTERED INDEX 99.9788779993241
qufiz zuamriwiy NONCLUSTERED INDEX 99.9643747773424
qufi_efferhmiqfz qufioh NONCLUSTERED INDEX 99.9625355911884
qufiz qufihefifomi NONCLUSTERED INDEX 99.9515738498789
qufiz rmiefihhefi NONCLUSTERED INDEX 99.9515503875969
qufiz lezfaphefi NONCLUSTERED INDEX 99.9515503875969
qufiz erfogofy_zfemfhefi NONCLUSTERED INDEX 99.9481058640374
qufiz erfogofy_rumplifihhefi NONCLUSTERED INDEX 99.9481058640374
qufiz miplerihwofhoh NONCLUSTERED INDEX 99.9339498018494
qufiz zuamrioh NONCLUSTERED INDEX 99.9339061467284
qufiz hilifih NONCLUSTERED INDEX 99.9093381686310
qufiz umpheq NONCLUSTERED INDEX 99.9092558983666
fuhu PK_fuhu CLUSTERED INDEX 99.8109640831758
xmuwimz PK_xmuwimz CLUSTERED INDEX 99.7950819672131
xmuwimz hxe NONCLUSTERED INDEX 97.4358974358974
qufiz PK_qufiz CLUSTERED INDEX 97.2856847758728
xmuwimz lezfqemi NONCLUSTERED INDEX 97.2222222222222
hogozouqz PK_hogozouqz CLUSTERED INDEX 97.0588235294118
xmuwimz fomzfqemi NONCLUSTERED INDEX 96.9696969696970
xmuwimz reloriqziqamxim NONCLUSTERED INDEX 96.9696969696970
xmuwimz zzq NONCLUSTERED INDEX 95.8333333333333
xmuwimz mzmoh NONCLUSTERED INDEX 95.4545454545455
xmuwimz fizfmirumh NONCLUSTERED INDEX 95.4545454545455
ruqferfzlezferrizzih PK_ruqferfzlezferrizzih CLUSTERED INDEX 93.7500000000000
hogozouqz hogozouqqemi NONCLUSTERED INDEX 92.8571428571429
xmuwimzlezferrizzih PK_xmuwimzlezferrizzih CLUSTERED INDEX 90.9090909090909
...And so on and so on
*/
DBA: My goodness, I can see why you’re feeling overworked… You are scattered and fragmented. What is this strategy you said you’re using to try to alleviate this?

Database: I wrote it down. Here it is… I try to do it often:

use Deighton
go

/*
Rebuild all Indexes and Statistics in the database
*/
declare @Table nvarchar(255)
,@Sql nvarchar(255)

declare TableCursor cursor
for
select
quotename(table_schema)+'.'+quotename(table_name)
from information_schema.tables
where table_type='BASE TABLE'

open TableCursor
while 1=1
begin
fetch next from TableCursor into @Table
if @@fetch_status<>0 break
set @Sql='alter index all on '+@Table+' rebuild'
exec (@Sql)
set @Sql='update statistics '+@Table
exec (@Sql)
end

close TableCursor
deallocate TableCursor

/*
Reclaim disk space by shrinking
*/
dbcc shrinkdatabase(N'Deighton',10)
dbcc shrinkfile(N'Deighton_Data',10)
dbcc shrinkfile(N'Deighton_Log',10)
DBA: Oh my goodness, I definitely see the problem here. And I’ll bet you feel exhausted after doing this, am I right?

Database: Yeah, you got that right.

DBA: Hold on… Let me write down a few notes…

/*
The following will rebuild all indexes from scratch, even though that might not
be necessary. Indexes should be treated on a case-by-case basis. An index should
be REBUILT only when it has high fragmentation, REORGANIZEd if the fragmentation is
moderate, and not touched at all if the fragmentation is very low.
*/
set @Sql='alter index all on '+@Table+' rebuild'
exec (@Sql)
/*
By default, statistics are auto-updated by the system when a table has been
changed by a certain amount. If a table has not changed at all, then there's no
reason to update its statistics. What's worse with the command below is that
it is NOT updating the statistics WITH FULLSCAN. The Index Rebuild above
automatically rebuilt statistics for indexed columns "for free" (using FULLSCAN),
but the command below will just end up re-updating them based on only a
SAMPLE, so it's duplicating the work of the Index Rebuild and creating less
accurate statistics at the same time! Ideally the command below should be doing
UPDATE STATISTICS ... WITH FULLSCAN, COLUMNS so that it only updates non-indexed
column statistics.
*/
set @Sql='update statistics '+@Table
exec (@Sql)
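Following the comment above, a corrected version of that step might look like this (using the same cursor variables as the original script):

```sql
/* Update only non-indexed column statistics, WITH FULLSCAN, so the
sampled re-update doesn't undo the rebuild's FULLSCAN statistics */
set @Sql='update statistics '+@Table+' with fullscan, columns'
exec (@Sql)
```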

/*
Oh, horror of horrors! SHRINKing a database will just end up completely fragmenting
up all the indexes that had been painstakingly rebuilt above! This is like shooting
yourself in the foot. And it's essentially being done twice! A SHRINKDATABASE
automatically shrinks each of its data and log files, so the two SHRINKFILE commands
are just duplicating the process.
*/
dbcc shrinkdatabase(N'Deighton',10)
dbcc shrinkfile(N'Deighton_Data',10)
dbcc shrinkfile(N'Deighton_Log',10)
DBA: Okay, listen to me carefully. You must stop doing this immediately and never do it again! The first part is generally okay, except you’re spending way too much time and effort in redoing EVERYTHING from scratch. Contact a colleague of mine named Ola Hallengren… he has a much more intelligent approach to doing all of this. But the second part is a huge giant no-no, because it’s destroying everything you painstakingly did in the first part. It’s like you’ve arranged your papers on your desk in nice neat stacks and then a hurricane comes into the room and blows them all over the place. After our session, please read this article by Paul Randal on why you should not be doing this.
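A minimal sketch of that case-by-case approach, generating the appropriate command for each index (the 5% and 30% thresholds are commonly cited rules of thumb, not prescriptions from this article):

```sql
/* Generate a REORGANIZE for moderately fragmented indexes and a REBUILD
for heavily fragmented ones; indexes under 5% fragmentation are left alone */
use Deighton
go
select MaintenanceSql='alter index '+quotename(i.name)
+' on '+quotename(object_name(dm.object_id))
+case
when dm.avg_fragmentation_in_percent>=30
then ' rebuild'
else ' reorganize'
end
from sys.dm_db_index_physical_stats(db_id(),null,null,null,'sampled') dm
join sys.indexes i on dm.object_id=i.object_id and dm.index_id=i.index_id
where dm.avg_fragmentation_in_percent>5
and i.name is not null
```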

Database: Okay, whatever you say, doc.

DBA: It’s also possible that some of the stuff you’re organizing may be doing you more harm than good…

/*
Produce a list of all nonclustered, non-primary-key indexes in the database that
deal with more than 5000 rows. Compare their Reads vs Writes. The list is sorted
by Read/Write ratio. Focus on the indexes toward the top of the list with a
Read/Write ratio of under 1.00... They are candidates for DROPping from the database.
*/
use Deighton
go
select
TableName=o.Name
,IndexName=i.Name
,Rows
,Reads
,Writes
,[Reads/Write]
,Seeks=User_Seeks
,Scans=User_Scans
,Lookups=User_Lookups
,Definition='('+IndexColumns+')'+coalesce(' include ('+IncludeColumns+')','')
,DropIt='drop index '+quotename(i.Name)
+' on '+quotename(c.name)+'.'+quotename(object_name(s.object_id))
from sys.dm_db_index_usage_stats s
join sys.indexes i ON s.object_id=i.object_id and s.index_id=i.index_id
join sys.objects o on s.object_id=o.object_id
join sys.schemas c on o.schema_id=c.schema_id
cross apply (select Rows=sum(p.Rows)
from sys.partitions p
where object_id=s.object_id and index_id=s.index_id) F_Rows
cross apply (select Reads=User_Seeks+User_Scans+User_Lookups
,Writes=User_Updates) F_RW
cross apply (select [Reads/Write]=cast(case
when Writes<1
then 100
else 1.*Reads/Writes
end as decimal(12,3))) F_RW2
cross apply (select IndexColumns
=stuff(
(select ','+c.Name
+case ic.Is_Descending_Key
when 1
then ' DESC'
else ''
end
from sys.index_columns ic
join sys.columns c on ic.Object_ID=c.Object_ID
and ic.Column_ID=c.Column_ID
where ic.Object_ID=i.Object_ID
and ic.Index_ID=i.Index_ID
and ic.Is_Included_Column=0
order by ic.Index_Column_ID
for xml path(''))
,1,1,'')
,IncludeColumns
=stuff(
(select ','+c.Name
from sys.index_columns ic
join sys.columns c on ic.Object_ID=c.Object_ID
and ic.Column_ID=c.Column_ID
where ic.Object_ID=i.Object_ID
and ic.Index_ID=i.Index_ID
and ic.Is_Included_Column=1
order by ic.Index_Column_ID
for xml path(''))
,1,1,'')) F_IC
where s.database_id=db_id()
and objectproperty(s.object_id,'IsUserTable')=1
and i.type_desc='NONCLUSTERED'
and i.is_primary_key=0
and i.is_unique_constraint=0
and Rows>5000
order by [Reads/Write]
,Reads
,Writes
/*
(Names of Tables and Indexes Disguised):
(Output abbreviated):
TableName IndexName Reads Writes Reads/Write
---------------------- ---------------------------------------- ----- ------ -----------
eahoffmeol eahoffmoel_rheqgilugmezfimoh 0 1346 0.000
ehhmizziz ehhmizziz_ehhmizzfypi 0 1880 0.000
rheqgilug rheqgilug_gluxelazimoh 0 42235 0.000
zaxlihgim zaxlihgim_gloh 0 62264 0.000
rheqgilug_hifeolih rheqgilug_hifeolih_rheqgilug_mezfimoh 1 37584 0.000
zaxlihgim zaxlihgim_miruqrolexli_ehjazfmiqfoh 82 62292 0.001
zaxlihgim zaxlihgim_rezhmiriopf_hifeolzoh 113 62323 0.002
zaxlihgim zaxlihgim_oqguorioh 323 62641 0.005
zaxlihgim zaxlihgim_hozxamzexliofimoh 732 62996 0.012
phuqiz phuqiz_phuqi 75 2662 0.028
eahoffmeol eahoffmoel_zuamriwiy_zuamrioh 44 1346 0.033
rheqgilug_mezfim rheqgilugmezfim_zizzouqoh 491 6759 0.073
oqguori_xollexliofimz oqguori_xollexliofimz_pulory_pleq_oh 1907 5440 0.351
miruqrolexli_loqiofimz miruqrolexli_loqiofimz_fypi 13723 19833 0.692
oqguori_xollexliofimz oqguori_xollexliofimz_oqguorioh 11240 5440 2.066
oqguori_xollexliofimz oqguori_xollexliofimz_zuamriwiy_zuamrioh 11465 5117 2.241
phuqiz phuqiz_zuamriwiy_zuamrioh 8303 3476 2.389
miruqrolexli_loqiofimz fw_miruqrolexli_loqiofimz_fypi 75078 24978 3.006
rheqgilug_mezfim rheqgilugmezfim_zuamriwiy_zuamrioh 38364 6759 5.676
ehhmizziz ehhmizziz_zuamriwiy_zuamrioh 21305 2959 7.200
zopruhiz rofy 2572 0 100.000
zopruhiz zfefi 2572 0 100.000
zopruhiz zopruhi 27203 0 100.000
*/
DBA: See the items at the top? Those are cases where you’re spending much more time filing items away than you are acquiring them later. For example, in the fourth entry in the list, you’ve spent time and energy filing 62,264 items but you’ve never once accessed one of them after the fact. Organizing your information in that way is most likely more work than it’s worth, since you aren’t really taking advantage of it anyway.

Database: Hmmm… I see what you mean, doc.

DBA: Well, it looks like we’re out of time for today. Even though we accomplished a lot today, we’ve just scratched the surface, so I’d like to schedule you for regular visits so that we can monitor you and administer some more treatments as necessary.

Database: Er… How long will we be doing this monitoring, doc?

DBA: Why, the rest of your life of course. Here’s my bill.

Database: Let’s see… WHAT?! That’s outrageous! I don’t have that kind of money! And you want to continue seeing me and billing me ridiculous amounts like this for the rest of my life? Are you insane? Just who the hell do you think you are? You high-and-mighty types make me want to vomit. You sit there on your high horse and think you know everything. Well, you know what you can do with this here bill? You can take it and just shove---

drop database Deighton
DBA: (Sigh). Another one bites the dust. Oh well… Ms. Foreachdb, would you please send in my next patient?

Tuesday, March 22, 2011

DMV/DMF Info Just A Couple Clicks Away

If you’re not reading Kendra Little’s blog, then you’re really missing out. It’s full of terrific technical information, but it’s her artwork and sense of humor that really make it special.

This morning, Kendra wrote a really cool post called Dynamic Management Quickie: Exploring SQL Server’s System Views and Functions As You Work.

It included a query that will help you find system views and/or functions that have a column name containing a certain search string. For example, to find stuff that has to do with CPU, you can execute the following query (which I rewrote/reformatted in Schulzified style):

select SysObjName
,o.type_desc
,ColumnName=c.name
from sys.system_columns c
join sys.system_objects o on c.object_id=o.object_id
cross apply (select SysObjName=schema_name(schema_id)+'.'+o.name) F_SON
where c.name like '%cpu%' --Enter Search String Here
order by SysObjName
/*
SysObjName type_desc ColumnName
--------------------- --------------------------------- ---------------
sys.dm_exec_requests VIEW cpu_time
sys.dm_exec_sessions VIEW cpu_time
sys.dm_os_schedulers VIEW cpu_id
sys.dm_os_sys_info VIEW cpu_count
sys.dm_os_sys_info VIEW cpu_ticks
sys.dm_os_sys_info VIEW cpu_ticks_in_ms
sys.fn_trace_gettable SQL_INLINE_TABLE_VALUED_FUNCTION CPU
sys.syslogins VIEW totcpu
sys.sysprocesses VIEW cpu
*/
This is really helpful, but Kendra went one step further and posted another query which added a column containing a URL so that you could open a browser window (CTRL+ALT+R) and paste the URL in to get more information on the subject.

I thought this was a great idea, but, lazy person that I am, I wanted to cut down on the number of clicks and keystrokes that I had to do in order to get to the browser window... and I wanted all the information there so it was just one click away.

So I came up with the following query:

with SysObjInfo as
(
select SysObjInfo=(select nchar(13)
+N'System Object: '+SysObjName+nchar(13)
+N'Type of Object: '+o.type_desc+nchar(13)
+N'Column Name: '+c.name+nchar(13)
+N'URL (CTRL+Click): '
+N'http://social.msdn.microsoft.com/'
+N'Search/en-US/?Refinement=117&Query='
+SysObjName
+nchar(13)
from sys.system_columns c
join sys.system_objects o on c.object_id=o.object_id
cross apply (select SysObjName=schema_name(schema_id)+'.'+o.name) F
where c.name like '%cpu%' --Enter Search String Here
order by SysObjName
for xml path(''),type).value('.','nvarchar(max)')
)
select LinkToSysObjInfo
=(select [processing-instruction(q)]=N':'+nchar(13)+SysObjInfo+nchar(13)
from SysObjInfo
for xml path(''),type)
You can find the explanation for the processing-instruction directive and all the other XML stuff in an article I wrote last June called Hyperlinks To T-SQL Code.

When I execute that query in SSMS, it gives me a single hyperlink…

System Object Query Result in SSMS

When I click on that hyperlink, it opens an XML window with a list of all the views/functions that have to do with CPU…

System Object Query Result in XML Window

If I see one that interests me and I want to learn more, I just CTRL+Click on the URL and voila! There’s a browser window with MSDN Search giving me links to the view/function:

System Object Information in Browser Window

So now all the information you want on views/functions on a certain subject is just a couple clicks away.

Thanks again to Kendra for the original idea.

Now do yourself a favor and add her blog to your reader right now. Then sit back in your easy chair by the fire and enjoy catching up on her posts… Make yourself comfortable because you won’t be able to stop.

Tuesday, March 8, 2011

T-SQL Tuesday #016: AWF-Inspiring Queries

This blog entry is participating in T-SQL Tuesday #016, hosted this month by Jes Schultz Borland (also known as grrlgeek). And no, we are not related, because you can plainly see that she puts an extraneous “t” in Schulz. (Sorry, Jes, I couldn’t resist).

You are invited to visit her blog and join the party and read more blogs participating in this month’s theme: Aggregates.

In May and June of 2010, I had the pleasure of giving a talk on Window Functions to the San Francisco and Silicon Valley SQL Server User Groups. I’ve been meaning to put together a blog post or two consisting of highlights of those presentations, but I just never got around to it.

Jes’ T-SQL Tuesday invitation was the perfect catalyst… And so this blog post will be specifically about Aggregate Window Functions (AWF) and some tips and tricks and traps.



Who Am I? Where Do I Fit In?

Consider the following query, which gives a detailed list of all the orders in AdventureWorks. The result is sorted by CustomerID and SalesOrderID for our convenience:

select CustomerID
,SalesOrderID
,SalesPersonID
,OrderDate=convert(date,OrderDate)
,OrderDollars=TotalDue
from AdventureWorks.Sales.SalesOrderHeader
order by CustomerID
,SalesOrderID
/*
CustomerID SalesOrderID SalesPersonID OrderDate OrderDollars
---------- ------------ ------------- ---------- ------------
1 43860 280 2001-08-01 14603.7393
1 44501 280 2001-11-01 26128.8674
1 45283 280 2002-02-01 37643.1378
1 46042 280 2002-05-01 34722.9906
2 46976 283 2002-08-01 10184.0774
2 47997 283 2002-11-01 5469.5941
. . .
(31465 Rows Total)
*/
Take a look at the first row in this result set… Order Number 43860. This row has an identity complex.

Sure, it knows that it’s an order for CustomerID 1, and it was handled by SalesPersonID 280, and it was ordered in the month of August 2001, and it was worth about $14600.

But what does all that really mean? The row is wondering: “What is my place in the world? How do I stack up against other rows? Who am I? I know that I belong to CustomerID 1, but am I that customer’s only order? Or am I one of many? What about my order amount of $14600? Am I a small order for that customer? Or am I huge? What percentage of CustomerID 1’s orders is attributable to me? What about my SalesPerson? Is my order amount above or below average for that SalesPerson? What about compared to ALL orders in AdventureWorks?”

This row has a lot of questions about the world outside of its little 1-row universe.

Since we sorted the result by CustomerID, it’s easy to see from our perspective the partition of rows for CustomerID number 1. And we can see quickly that Order 43860 is just 1 of 4 orders for this Customer, and, if we’re quick with mental calculations, we can see that the average dollars for that Customer’s orders is around $28000, so that makes Order 43860 below average.

We’ve always had the ability to analyze partitions (or groups) of data with the GROUP BY clause, like so:

select CustomerID
,NumOrders=count(*)
,TotalDollars=sum(TotalDue)
,AverageDollars=avg(TotalDue)
from AdventureWorks.Sales.SalesOrderHeader
group by CustomerID
order by CustomerID
/*
CustomerID NumOrders TotalDollars AverageDollars
---------- --------- ------------ --------------
1 4 113098.7351 28274.6837
2 8 32733.9695 4091.7461
3 12 479506.3256 39958.8604
4 8 780035.2121 97504.4015
5 8 114218.8006 14277.35
. . .
(19119 Rows Total)
*/
And we can also get information on the large single partition of ALL orders:

select NumOrders=count(*)
,TotalDollars=sum(TotalDue)
,AverageDollars=avg(TotalDue)
from AdventureWorks.Sales.SalesOrderHeader
/*
NumOrders TotalDollars AverageDollars
--------- -------------- --------------
31465 140707584.8246 4471.8762
*/
But in getting that summary information, we lose the detail.

The Window Functions introduced in SQL2005 allow us to “marry” the detail and the summary information… and we also are not limited to partitioning the data in only one way.

There are Ranking Window Functions like ROW_NUMBER, RANK, DENSE_RANK, and NTILE, but I want to focus instead on Aggregate Window Functions, like SUM, COUNT, AVG, MIN, MAX, etc.

What differentiates a regular Aggregate from a Windowed Aggregate is the OVER clause, where we can specify how we want to partition the data:

Aggregate_Function( <parameter> ) OVER ( [PARTITION BY <partition_list>] )
The OVER clause defines how we want to partition the data into groups of rows, and the Aggregate Function is applied to each of those groups, returning a value for that group.

Note that the PARTITION BY clause is optional, but if it is left out, then it is implied that the aggregate is applied to the partition of the entire result set.

So, with that in mind, let’s look at our original query once again, but this time, through Aggregate Window Functions, let’s answer some of Order 43860’s questions. How many orders does its customer have? What percent of the customer’s orders is its value responsible for? Is its value above or below average for the customer? What about for the salesperson? What about the entire company?

select CstID=CustomerID
,OrdID=SalesOrderID
,SPsnID=SalesPersonID
,Dollars=TotalDue
,[#CstOrds]=count(*) over (partition by CustomerID)
,[%CustTot]=TotalDue / sum(TotalDue) over (partition by CustomerID) * 100
,[+/-CustAvg]=TotalDue - avg(TotalDue) over (partition by CustomerID)
,[+/-SlsPsnAvg]=TotalDue - avg(TotalDue) over (partition by SalesPersonID)
,[+/-OverallAvg]=TotalDue - avg(TotalDue) over ()
from AdventureWorks.Sales.SalesOrderHeader
order by CustomerID, SalesOrderID
/*
CstID OrdID SPsnID Dollars #CstOrds %CustTot +/-CustAvg +/-SlsPsnAvg +/-OverallAvg
----- ----- ------ ---------- -------- -------- ----------- ------------ -------------
1 43860 280 14603.7393 4 12.91 -13670.9444 -32270.8068 10131.8631
1 44501 280 26128.8674 4 23.10 -2145.8163 -20745.6787 21656.9912
1 45283 280 37643.1378 4 33.28 9368.4541 -9231.4083 33171.2616
1 46042 280 34722.9906 4 30.70 6448.3069 -12151.5555 30251.1144
2 46976 283 10184.0774 8 31.11 6092.3313 -16428.8692 5712.2012
2 47997 283 5469.5941 8 16.70 1377.848 -21143.3525 997.7179
2 49054 283 1739.4078 8 5.31 -2352.3383 -24873.5388 -2732.4684
. . .
(31465 Rows Total)
*/
This is great! Now Order 43860 knows so much more about itself and how it fits into the world, and we have relieved it of the burden of expensive therapy sessions.

What is a Result Set?

It’s important to know, however, that these Window Functions are applied to the Result Set, and not to the original source data (i.e. the tables) of the query.

For example, if we add a WHERE clause to that last query to only look at orders placed in the year 2001, then that completely changes the Result Set…

select CstID=CustomerID
,OrdID=SalesOrderID
,SPsnID=SalesPersonID
,Dollars=TotalDue
,[#CstOrds]=count(*) over (partition by CustomerID)
,[%CustTot]=TotalDue / sum(TotalDue) over (partition by CustomerID) * 100
,[+/-CustAvg]=TotalDue - avg(TotalDue) over (partition by CustomerID)
,[+/-SlsPsnAvg]=TotalDue - avg(TotalDue) over (partition by SalesPersonID)
,[+/-OverallAvg]=TotalDue - avg(TotalDue) over ()
from AdventureWorks.Sales.SalesOrderHeader
where OrderDate>='20010101' and OrderDate<'20020101'
order by CustomerID, SalesOrderID
/*
CstID OrdID SPsnID Dollars #CstOrds %CustTot +/-CustAvg +/-SlsPsnAvg +/-OverallAvg
----- ----- ------ ---------- -------- -------- ----------- ------------ -------------
1 43860 280 14603.7393 2 35.85 -5762.564 -25709.6621 4213.9263
1 44501 280 26128.8674 2 64.14 5762.5641 -14184.534 15739.0544
3 44124 277 20504.1755 2 51.57 627.7545 -9620.1415 10114.3625
3 44791 277 19248.6666 2 48.42 -627.7544 -10875.6504 8858.8536
11 43853 282 34946.6266 2 86.60 14771.4029 7888.1527 24556.8136
11 44495 282 5403.8208 2 13.39 -14771.4029 -21654.6531 -4985.9922
. . .
(1379 Rows Total)
*/
…And we see that Order 43860’s perspective completely changes. Now he’s one of only 2 orders for CustomerID 1, and he makes up a full 35.85% of that customer’s order total (as opposed to only 12.91%). And he’s only about $4,200 above the overall average of all orders in AdventureWorks (as opposed to about $10,100 above average).

Because Window Functions are applied to partitions of the Result Set, it’s important to understand the logical processing order of a query so that we really know what a “Result Set” is.

In simplest terms, the logical processing order of a query’s clauses is like so:

/*
FROM and JOIN/APPLY/PIVOT/UNPIVOT
WHERE
GROUP BY
HAVING
(At this point we have a Result Set)
SELECT
ORDER BY
TOP
*/
So Window Functions only come into play after we’ve done the FROM and JOINs and WHEREs and the GROUPing and HAVING stuff.

And this is why the Window Functions can only be used in the SELECT or ORDER BY clauses of a query.
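
The clause list above can be read as a pipeline. Here's a toy Python rendering of it, with hypothetical data and a made-up `run_query` helper; the point is simply that the window aggregate fires only once the Result Set exists:

```python
# A toy pipeline mirroring the logical clause order of a query.
def run_query(rows):
    rows = [r for r in rows if r["qty"] > 0]            # WHERE
    # (GROUP BY / HAVING would collapse rows here)
    # --- at this point we have the Result Set ---
    total = sum(r["qty"] for r in rows)                 # SUM(qty) OVER ()
    rows = [dict(r, pct=100.0 * r["qty"] / total)       # SELECT
            for r in rows]
    rows.sort(key=lambda r: -r["pct"])                  # ORDER BY
    return rows[:2]                                     # TOP 2

result = run_query([{"qty": 5}, {"qty": 0}, {"qty": 15}, {"qty": 10}])
print(result)
```

Because the `total` line runs after the WHERE filter but before the final sort and cut, it naturally sums only the rows that survived filtering, which is exactly how the SQL engine treats a window aggregate.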

What About the WHERE Clause?

You may NOT use Window Functions in a WHERE clause.

For example, let’s consider orders placed in January of 2004… Which orders had an above-average dollar amount (for that time period)? The following query, which employs a subquery to get the average for the month, will work fine:

select SalesOrderID 
,CustomerID
,OrderDate=convert(date,OrderDate)
,OrderDollars=TotalDue
from AdventureWorks.Sales.SalesOrderHeader
where OrderDate>='20040101' and OrderDate<'20040201'
and TotalDue>(select avg(TotalDue)
from AdventureWorks.Sales.SalesOrderHeader
where OrderDate>='20040101' and OrderDate<'20040201')
order by SalesOrderID
/*
SalesOrderID CustomerID OrderDate OrderDollars
------------ ----------- ---------- ---------------------
61173 676 2004-01-01 38511.2882
61174 622 2004-01-01 9128.8743
61175 34 2004-01-01 34218.425
61177 442 2004-01-01 28287.8223
61178 510 2004-01-01 3565.8474
. . .
(498 Rows Total)
*/
But don’t you just hate having to duplicate the WHERE clause in the subquery? One might be tempted to use a Window Function to get the average of the result set and put it in the WHERE clause, like so:

select SalesOrderID 
,CustomerID
,OrderDate=convert(date,OrderDate)
,OrderDollars=TotalDue
from AdventureWorks.Sales.SalesOrderHeader
where OrderDate>='20040101' and OrderDate<'20040201'
and TotalDue>avg(TotalDue) over ()
order by SalesOrderID
/*
Msg 4108, Level 15, State 1, Line 7
Windowed functions can only appear in the SELECT or ORDER BY clauses.
*/
Well, that error message makes things clear, doesn’t it? We simply can’t do it that way. But how do we accomplish our goal? We have to use a CTE that includes the Window Function, and then filter on that computed value in the query that uses the CTE:

with SourceData as
(
select SalesOrderID
,CustomerID
,OrderDate=convert(date,OrderDate)
,OrderDollars=TotalDue
,OverallAverageDollars=avg(TotalDue) over ()
from AdventureWorks.Sales.SalesOrderHeader
where OrderDate>='20040101' and OrderDate<'20040201'
)
select SalesOrderID
,CustomerID
,OrderDate
,OrderDollars
from SourceData
where OrderDollars>OverallAverageDollars
order by SalesOrderID
/*
SalesOrderID CustomerID OrderDate OrderDollars
------------ ----------- ---------- ------------
61173 676 2004-01-01 38511.2882
61174 622 2004-01-01 9128.8743
61175 34 2004-01-01 34218.425
61177 442 2004-01-01 28287.8223
61178 510 2004-01-01 3565.8474
. . .
(498 Rows Total)
*/
There, that’s better.
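
The same two-step pattern, compute the aggregate over the filtered set first, and only then compare each row against it, looks like this in plain Python (hypothetical dollar amounts standing in for one month of orders):

```python
# Hypothetical order dollar amounts for one month
amounts = [38511.29, 9128.87, 34218.43, 3565.85, 120.50]

# Step 1 (the CTE): compute the aggregate over the filtered set once
overall_avg = sum(amounts) / len(amounts)

# Step 2 (the outer query): filter each row against that precomputed value
above_average = [a for a in amounts if a > overall_avg]
print(above_average)
```

Just like the CTE version, the filter never tries to evaluate the aggregate "mid-row"; the average is fully known before any row is tested against it.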

The GROUP BY Puzzle

This knowledge of when the Window Functions come into play is especially important when doing a GROUP BY query.

Let’s take a look at quantities sold by Region/Territory:

select Region=t.CountryRegionCode
,Territory=t.Name
,OrderTotal=sum(OrderQty)
from AdventureWorks.Sales.SalesOrderDetail d
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
group by t.CountryRegionCode
,t.Name
order by Region
,Territory
/*
Region Territory OrderTotal
------ -------------- ----------
AU Australia 18293
CA Canada 49381
DE Germany 13143
FR France 19906
GB United Kingdom 20099
US Central 19493
US Northeast 19843
US Northwest 36776
US Southeast 18875
US Southwest 59105
*/
We get a request to show how each territory contributes (percentage-wise) to the grand total. We might be tempted to do that by using the SUM Aggregate Window Function to calculate the grand total of the entire result set and divide each row’s total by that grand total to get a percentage:

select Region=t.CountryRegionCode
,Territory=t.Name
,OrderTotal=sum(OrderQty)
,PercentOfTotal=100.0 * sum(OrderQty) / sum(OrderQty) over ()
from AdventureWorks.Sales.SalesOrderDetail d
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
group by t.CountryRegionCode
,t.Name
order by Region
,Territory
/*
Msg 8120, Level 16, State 1, Line 4
Column 'AdventureWorks.Sales.SalesOrderDetail.OrderQty' is invalid in the select list
because it is not contained in either an aggregate function or the GROUP BY clause.
*/
Huh? What does that error message mean? OrderQty isn’t sitting naked in the SELECT list… it’s part of an aggregate. Why does SQL Server think it’s in the SELECT list?

Let’s take a step back for a moment. What was the result set of our original query? It was 3 columns: Region, Territory, and OrderTotal. That OrderTotal column was simply SUM(OrderQty). So then what would the Grand Total of the entire result set be? Ah-hah! It would be the SUM of all the SUM(OrderQty)’s. So that’s what we have to use in our Aggregate Window Function:

select Region=t.CountryRegionCode
,Territory=t.Name
,OrderTotal=sum(OrderQty)
,PercentOfTotal=100.0 * sum(OrderQty) / sum(sum(OrderQty)) over ()
from AdventureWorks.Sales.SalesOrderDetail d
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
group by t.CountryRegionCode
,t.Name
order by Region
,Territory
/*
Region Territory OrderTotal PercentOfTotal
------ -------------- ---------- ---------------
AU Australia 18293 6.654080912576
CA Canada 49381 17.962344587761
DE Germany 13143 4.780767803749
FR France 19906 7.240809853263
GB United Kingdom 20099 7.311013626079
US Central 19493 7.090581054438
US Northeast 19843 7.217893595815
US Northwest 36776 13.377274347614
US Southeast 18875 6.865783481379
US Southwest 59105 21.499450737321
*/
So, in short, we can’t apply an Aggregate Window Function to the expression OrderQty, because once the GROUP BY has done its job, the OrderQty column is no longer available… It is not in the Result Set. However, SUM(OrderQty) was in our Result Set, so we can apply our SUM function to that expression… thus, SUM(SUM(OrderQty)).
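
The nesting can be simulated directly: first the GROUP BY collapses the detail rows into one SUM per group, and only then does the window SUM add up those group sums. A small sketch with hypothetical (territory, quantity) rows:

```python
from collections import defaultdict

# Hypothetical (territory, quantity) detail rows
detail = [("CA", 10), ("CA", 5), ("US", 20), ("US", 5), ("AU", 10)]

# GROUP BY: one SUM(OrderQty) per territory -- this is the Result Set
group_sums = defaultdict(int)
for territory, qty in detail:
    group_sums[territory] += qty

# Window SUM over the Result Set: SUM(SUM(OrderQty)) OVER ()
grand_total = sum(group_sums.values())

for territory in sorted(group_sums):
    pct = 100.0 * group_sums[territory] / grand_total
    print(territory, group_sums[territory], round(pct, 2))
```

Notice that `grand_total` is a sum of sums; the raw `qty` values are already gone by the time it is computed, just as OrderQty is gone after the GROUP BY.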

ORDER BY Uses

One final word on Aggregate Window Functions: Using an Aggregate Window Function in the SELECT clause makes sense. But how the heck would you use one in an ORDER BY?

Here’s an interesting example of how that might be useful. Let’s take a look at the top 100 orders (in terms of dollars) and see which SalesPersons are associated with those orders:

select top 100 SalesPersonID
,FirstName
,LastName
,SalesOrderID
,OrderDate=convert(date,OrderDate)
,TotalDue
from AdventureWorks.Sales.SalesOrderHeader soh
join AdventureWorks.Person.Contact c on soh.SalesPersonID=c.ContactID
order by TotalDue desc
/*
SalesPersonID FirstName LastName SalesOrderID OrderDate TotalDue
------------- --------- -------- ------------ ---------- -----------
281 Jauna Elson 51131 2003-07-01 247913.9138
281 Jauna Elson 55282 2003-10-01 227737.7215
285 Gail Erickson 46616 2002-07-01 207058.3754
286 Mark Erickson 46981 2002-08-01 201490.4144
275 Maciej Dusza 47395 2002-09-01 198628.3054
. . .
286 Mark Erickson 51735 2003-08-01 123218.8124
286 Mark Erickson 53497 2003-09-01 122462.2807
275 Maciej Dusza 51153 2003-07-01 122035.2702
281 Jauna Elson 44127 2001-09-01 121141.6984
285 Gail Erickson 46959 2002-08-01 120843.8322
(100 Rows Total)
*/
That gives us the detail about those Top 100 Orders. But it doesn’t give us any kind of perspective. We can see that Jauna Elson had the two biggest orders in the company, but does she have any others? What if we sort the result so that the SalesPerson who has the most orders is at the top of the result, and the SalesPerson with the second most orders is next, and so on? All we have to do is ORDER BY the COUNT of orders of each partition of SalesPersonID, like so:

select top 100 SalesPersonID
,FirstName
,LastName
,SalesOrderID
,OrderDate=convert(date,OrderDate)
,TotalDue
from AdventureWorks.Sales.SalesOrderHeader soh
join AdventureWorks.Person.Contact c on soh.SalesPersonID=c.ContactID
order by count(*) over (partition by SalesPersonID) desc
/*
SalesPersonID FirstName LastName SalesOrderID OrderDate TotalDue
------------- --------- -------- ------------ ---------- ----------
277 Linda Ecoffey 71894 2004-06-01 92952.5948
277 Linda Ecoffey 71908 2004-06-01 2791.3805
277 Linda Ecoffey 71939 2004-06-01 972.785
277 Linda Ecoffey 71940 2004-06-01 668.3798
277 Linda Ecoffey 71929 2004-06-01 66.2255
. . .
277 Linda Ecoffey 63274 2004-02-01 486.3925
277 Linda Ecoffey 63286 2004-02-01 449.506
277 Linda Ecoffey 63221 2004-02-01 50.2703
277 Linda Ecoffey 63226 2004-02-01 50.2703
277 Linda Ecoffey 63232 2004-02-01 102.1309
(100 Rows Total)
*/
Isn’t that cool? We see that Linda Ecoffey has the most orders, so she’s at the top, and at the bottom of the list, we see---

Wait a minute…

Something’s wrong here. Linda Ecoffey is the only one in the report. Where’s Jauna Elson? Why are piddly little $50 orders showing up in our query?

Any idea?

It’s the Logical Processing Order of the query coming into play again! What is the very last thing that happens in a query? The TOP operator! Our COUNT(*) Window Function was applied to SalesPersonID partitions of ALL orders… and THEN the TOP operator was applied.

So once again, we have to split the query up and get the Top 100 Orders in a CTE, and THEN use the ORDER BY in the main query that uses that CTE:

with Top100Sales as
(
select top 100 SalesPersonID
,FirstName
,LastName
,SalesOrderID
,OrderDate=convert(date,OrderDate)
,TotalDue
from AdventureWorks.Sales.SalesOrderHeader soh
join AdventureWorks.Person.Contact c on soh.SalesPersonID=c.ContactID
order by TotalDue desc
)
select *
from Top100Sales
order by count(*) over (partition by SalesPersonID) desc
/*
SalesPersonID FirstName LastName SalesOrderID OrderDate TotalDue
------------- --------- --------- ------------ ---------- -----------
277 Linda Ecoffey 43884 2001-08-01 154912.0712
277 Linda Ecoffey 44528 2001-11-01 144355.8792
277 Linda Ecoffey 46660 2002-07-01 162629.7468
277 Linda Ecoffey 53506 2003-09-01 142587.761
277 Linda Ecoffey 47027 2002-08-01 141111.7487
277 Linda Ecoffey 50683 2003-06-01 140349.1998
277 Linda Ecoffey 44530 2001-11-01 139414.5241
277 Linda Ecoffey 46067 2002-05-01 135306.5844
277 Linda Ecoffey 49888 2003-04-01 128123.3556
277 Linda Ecoffey 47658 2002-10-01 132199.8023
277 Linda Ecoffey 46642 2002-07-01 130825.509
277 Linda Ecoffey 46066 2002-05-01 133271.8758
277 Linda Ecoffey 47712 2002-10-01 133229.5585
277 Linda Ecoffey 47990 2002-11-01 133056.7092
277 Linda Ecoffey 46629 2002-07-01 132987.326
277 Linda Ecoffey 48048 2002-11-01 126309.4039
276 Shelley Dyck 50304 2003-05-01 123326.9867
276 Shelley Dyck 69471 2004-05-01 131279.032
276 Shelley Dyck 44777 2001-12-01 132235.1616
. . .
283 Terry Eminhizer 51123 2003-07-01 125068.34
283 Terry Eminhizer 46643 2002-07-01 149897.3647
283 Terry Eminhizer 51711 2003-08-01 136046.4425
287 Martha Espinoza 50297 2003-05-01 137108.3865
287 Martha Espinoza 69508 2004-05-01 140734.4875
289 Janeth Esteves 55254 2003-10-01 140521.5049
289 Janeth Esteves 51761 2003-08-01 146293.234
280 Shannon Elliott 47033 2002-08-01 125254.4859
280 Shannon Elliott 67297 2004-04-01 125144.0084
268 Gary Drury 51830 2003-08-01 149861.0659
(100 Rows Total)
*/
So that’s more like it. Linda Ecoffey is the top salesperson in terms of number of orders in the Top 100. Then comes Shelley Dyck and her orders (not all listed). And at the bottom of the list is Gary Drury, with only one order in the Top 100.

So this is a novel way of looking at a query result and seeing at a glance that Linda is our big gun and she really leaves Gary in the dust.
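
The gotcha and its fix can be condensed into a few lines of Python (the names and amounts here are hypothetical, and Python's stable sort stands in for the query's ORDER BY):

```python
# Hypothetical (salesperson, order_dollars) rows
orders = [("Linda", 100), ("Linda", 90), ("Linda", 5), ("Linda", 4),
          ("Jauna", 500), ("Jauna", 400)]

def count_by_person(rows):
    counts = {}
    for person, _ in rows:
        counts[person] = counts.get(person, 0) + 1
    return counts

# Wrong order: COUNT over ALL rows, sort by it, THEN take the top 3
counts_all = count_by_person(orders)
top_wrong = sorted(orders, key=lambda r: -counts_all[r[0]])[:3]
# -> all Linda, including a tiny $5 order

# Right order: take the top 3 by dollars first (the CTE), THEN count and sort
top3 = sorted(orders, key=lambda r: -r[1])[:3]
counts_top = count_by_person(top3)
top_right = sorted(top3, key=lambda r: -counts_top[r[0]])
# -> Jauna's two big orders first

print(top_wrong)
print(top_right)
```

The only difference between the two halves is *when* the cut happens relative to the count, which is exactly the difference between the broken query and the CTE version.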

But is this fair? Maybe the quantity of orders is not a good representation of who is our big gun. Perhaps finding out who brings in the highest average dollar amount of orders should be put at the top. Easy enough… We’ll just change our ORDER BY clause:

with Top100Sales as
(
select top 100 SalesPersonID
,FirstName
,LastName
,SalesOrderID
,OrderDate=convert(date,OrderDate)
,TotalDue
from AdventureWorks.Sales.SalesOrderHeader soh
join AdventureWorks.Person.Contact c on soh.SalesPersonID=c.ContactID
order by TotalDue desc
)
select *
from Top100Sales
order by avg(TotalDue) over (partition by SalesPersonID) desc
/*
SalesPersonID FirstName LastName SalesOrderID OrderDate TotalDue
------------- --------- --------- ------------ ---------- -----------
281 Jauna Elson 44762 2001-12-01 123265.0706
281 Jauna Elson 61184 2004-01-01 125364.2684
281 Jauna Elson 50252 2003-05-01 126565.2584
281 Jauna Elson 44127 2001-09-01 121141.6984
281 Jauna Elson 47441 2002-09-01 153055.214
281 Jauna Elson 47018 2002-08-01 144864.8759
281 Jauna Elson 47369 2002-09-01 189198.6156
281 Jauna Elson 51131 2003-07-01 247913.9138
281 Jauna Elson 55282 2003-10-01 227737.7215
281 Jauna Elson 67305 2004-04-01 155260.0958
281 Jauna Elson 48043 2002-11-01 129398.8662
281 Jauna Elson 44795 2001-12-01 139562.6866
268 Gary Drury 51830 2003-08-01 149861.0659
286 Mark Erickson 57143 2003-11-01 126349.7887
286 Mark Erickson 50222 2003-05-01 124696.565
286 Mark Erickson 51735 2003-08-01 123218.8124
. . .
277 Linda Ecoffey 47990 2002-11-01 133056.7092
277 Linda Ecoffey 46629 2002-07-01 132987.326
277 Linda Ecoffey 48048 2002-11-01 126309.4039
283 Terry Eminhizer 51123 2003-07-01 125068.34
283 Terry Eminhizer 46643 2002-07-01 149897.3647
283 Terry Eminhizer 51711 2003-08-01 136046.4425
280 Shannon Elliott 47033 2002-08-01 125254.4859
280 Shannon Elliott 67297 2004-04-01 125144.0084
(100 Rows Total)
*/
So Jauna is the champion in terms of highest-dollar average orders, and our friend Gary Drury, who ranked dead last in the previous query, redeems himself in this one because, although he only has one order, it is for a large amount, making him the second highest SalesPerson in terms of average dollars. Linda Ecoffey comes out towards the bottom.

In Conclusion

I hope this article gave you some insight into Aggregate Window Functions (AWFs)… If you can’t get enough of this stuff, please also read a blog post I wrote last year called OVER Exposed, in which I talk in more detail about AWFs, more gotchas, and what really goes on under the hood with them.

Now go out there and produce some AWF-Inspiring Queries!