Author Archive

T-SQL Tuesday #024: The Roundup

Friday, November 18th, 2011
T-SQL TuesdayThanx to everyone for their contributions to T-SQL Tuesday #024: Prox ‘n’ Funx.

The roundup is finally here… Well, shux, I’ve been so busy that I almost let it fall through the crax, but before weex and weex went by, I finally read through the stax of submissions. I certainly had a lot of kix reading them… you could hear shriex of delight coming out of my office window.

(Okay, I admit it… The roundup was late because I spent a lot of time looking up words that end with a -ks sound… It loox like you can count me among the ranx of freax ‘n’ geex ‘n’ punx ‘n’ jerx ‘n’ dorx ‘n’ quax who carry something a little too far… It stinx, doesn’t it?).

Methinx it was a success! The SQL Community rox!

But don’t take it from me… Read the submissions for yourself. Click on the linx below for some good reading.

declare @Subject varchar(100)='T-SQL Tuesday #024: Prox ''n'' Funx';
 
insert [Brad Schulz].Blog 
select Subject='T-SQL Tuesday #024 Roundup'
      ,Content
from 
  (
  select Content from [Rob Farley].Blog where Subject=@Subject
  union all
  select Content from [Noel McKinney].Blog where Subject=@Subject
  union all
  select Content from [Greg Lucas].Blog where Subject=@Subject
  union all
  select Content from [Michael J. Swart].Blog where Subject=@Subject
  union all
  select Content from [Kent Chenery].Blog where Subject=@Subject
  union all
  select Content from [Steve Wales].Blog where Subject=@Subject
  union all
  select Content from [Merrill Aldrich].Blog where Subject=@Subject
  union all
  select Content from [Aaron Bertrand].Blog where Subject=@Subject
  union all
  select Content from [Pinal Dave].Blog where Subject=@Subject
  union all
  select Content from [Rich Brown].Blog where Subject=@Subject
  union all
  select Content from [Bob Pusateri].Blog where Subject=@Subject
  union all
  select Content from [Kerry Tyler].Blog where Subject=@Subject
  union all
  select Content from [Jes Schultz Borland].Blog where Subject=@Subject
  union all
  select Content from [Thomas Rushton].Blog where Subject=@Subject
  union all
  select Content from [Jason Brimhall].Blog where Subject=@Subject
  union all
  select Content from [Nancy Hidy Wilson].Blog where Subject=@Subject
  union all
  select Content from [David Howard].Blog where Subject=@Subject
  union all
  select Content from [Brad Schulz].Blog where Subject=@Subject
) Contributions
 
Rob Farley contributes a wonderfully detailed post comparing inline functions and non-inline functions, filled with lots of examples and query plans. Rob categorically states that “if it’s not inline, it’s rubbish.” I couldn’t agree more, but then again, I’m always a sucker for any blog post that uses the word “rubbish”.

Noel McKinney talks about a situation concerning the abuse and overuse of stored procedures. Imagine a database that is completely devoid of data and yet consumes truckloads of disk space… because it houses 27,000 stored procedures! A good story to keep in the back of your mind in designing solutions.

Greg Lucas gives us a detailed explanation of a useful ExceptionHandler procedure that is designed to be used in the CATCH block of a TRY…CATCH construct, logging detailed information about the error in a table. It encapsulates and standardizes error handling so that the developer can save time in coding.

Michael J. Swart talks about some interesting numbers related to stored procedures. At first I thought it would be the usual numbers like “maximum parameters allowed” and “maximum nested levels”, but if you know Michael, then you know he wouldn’t only spout out dry statistics like that. By the time you finish his list, you’ll be struggling to catch your breath from laughing.

Kent Chenery joins T-SQL Tuesday for the first time, contributing a CLR routine for calculating Easter in any given year. That’s pretty cool… I never even thought about the fact that an algorithm existed for calculating Easter. Make sure you read the Wikipedia article he references… the detail behind the algorithm is fascinating.

Steve Wales is another first-time contributor to T-SQL Tuesday. He talks about the differences in coding procedures and functions between SQL Server and that other database created by that company in my neck of the woods, Oracle. He also provides a link to an earlier post of his that warns about the fact that you have to be aware of the compatibility mode when using built-in functions.

Merrill Aldrich offers his ode to composable solutions, illustrating how table-valued functions (and views) offer maximum composability and flexibility (and that stored procedures don’t offer that same flexibility). As a rabid fan of TVF’s, I couldn’t agree more.

Aaron Bertrand wrote a terrific post on the shortcomings of INFORMATION_SCHEMA views earlier this month, but he waited for T-SQL Tuesday to post his pièce de résistance, plunging the knife deeper in talking about INFORMATION_SCHEMA.ROUTINES (and syscomments and sp_helptext) and how they just don’t cut it in getting the full procedure definition.

Pinal Dave talks about the new CUME_DIST() analytic function in SQL2012, which provides a really cool easy-to-implement percentile-calculating capability. Pinal’s post is the first of a series of blog posts about the new SQL2012 analytic functions… make sure you add them to your must-read list.

Rich Brown tells us about the benefits of using stored procedures for Reporting Services datasets, abstracting away the data layer from the presentation layer. He also brings up an interesting idea of putting Reporting Services procedures into their own schema.

Bob Pusateri gives an overview of the pros and cons of both procedures and functions. Speaking of procedures, though, I have to also mention that I found Bob’s post of October 18th to be very interesting in terms of using a cool trick of using sp_executesql to create a view in another database besides the current one.

Kerry Tyler discusses scenarios (and some myths) regarding transactions being left open by procedures that produce errors or experience timeouts, concluding with a discussion on XACT_ABORT.

Jes Schultz Borland expresses her frustration about the fact that a Reporting Services dataset only recognizes the first result set from a procedure that produces multiple result sets. It is confusing, being that the term dataset has different meanings across platforms (in .NET the dataset class truly is a collection of datatables).

Thomas Rushton makes a “Hell yes” case in answer to the question, “Should I learn to use stored procedures?” And he follows up with a second contribution talking about the best practice in releasing stored procedure scripts (or scripts for any kind of object for that matter).

Jason Brimhall talks about functions… specifically functions that he is thankful for, functions he can’t live without, and functions that he had fun with in delving into SQL Server internals. I’m going to set aside some time to read more about those.

Nancy Hidy Wilson reminds us of the myriad of information we can glean from the good old SERVERPROPERTY() function… and she reminds us to check out DATABASEPROPERTY() and DATABASEPROPERTYEX() as well.

David Howard discussed bad plans resulting from parameter sniffing and gave a list of techniques to solve the problem.

And, finally, my contribution talked about a procedure that I use to deal with a client system that contains no stored procedures at all… it finds the original text of ad-hoc queries in a Forced Parameterization database.

Thanx again for your contributions!

T-SQL Tuesday #024: A Procedure To Deal With No Procedures

Wednesday, November 9th, 2011
T-SQL TuesdayOh my… Here it is Wednesday, and I’m a day late for T-SQL Tuesday #024, which is hosted this month by some joker named Brad Schulz. Hopefully he’ll understand and accept my late entry… If he doesn’t, then I’ll just have to make him understand.

The theme for this month is Prox ‘n’ Funx (aka Procedures and Functions)… and I’m going to talk about a procedure that I wrote to help me deal with the lack of procedures.

You probably had to re-read that latest sentence, didn’t you? Well, this whole blog post might turn your brain inside-out a bit, so you might want to read it slowly or be prepared to re-read it after you finish it. My apologies in advance.

Here’s the situation… purely hypothetical, you understand… (wink, wink).

Imagine yourself faced with a SQL Server Database that is just a big dumb file cabinet. All it holds is data, and there isn’t a single procedure or function to be found… every single query that goes to the system is an ad-hoc query constructed within an enormous .NET application.

For example, a typical query in the C# code may be put together like this:

string SQL = "";
SQL += " SELECT soh.SalesOrderNumber ";
SQL += "       ,soh.ShipDate ";
SQL += "       ,soh.CustomerID ";
SQL += "       ,s.Name ";
SQL += "       ,soh.TotalDue ";
SQL += " FROM Sales.SalesOrderHeader soh ";
SQL += " JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID ";
SQL += " JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID ";
SQL += " JOIN Sales.Store s ON soh.CustomerID=s.CustomerID ";
SQL += " WHERE soh.SalesOrderID IN ";
SQL += "         (SELECT SalesOrderID ";
SQL += "          FROM Sales.SalesOrderDetail ";
SQL += "          WHERE ProductID=" + SQLUtils.SQLValue(ProdID) + ") ";
SQL += "   AND soh.ShipDate>=" + SQLUtils.SQLValue(FromDate);
SQL += "   AND soh.ShipDate<" + SQLUtils.SQLValue(ToDate);
SQL += "   AND a.City=" + SQLUtils.SQLValue(CityName);
SQL += "   AND sp.Name=" + SQLUtils.SQLValue(StateProvinceName);
DataTable dtResult = SQLUtils.ExecSQLQuery(SQL)
 
Never mind the thoughts that are going through your head regarding readability. Never mind the maintenance nightmare. Never mind the potential for SQL Injection. Never mind the…

Oh never mind.

So hundreds of thousands of ad-hoc queries get sent to the system all day long.

My job? Make things go faster.

Okay, fine. However, there’s one other wrinkle. This particular database operates under Forced Parameterization. This may reduce the frequency of query compilations, but it makes my job a little more complicated.

Let me illustrate… Let’s set the AdventureWorks database to use Forced Parameterization:

alter database AdventureWorks set parameterization forced
With that in place, every ad-hoc query sent to the system is parameterized… In other words, any literal within the ad-hoc query is converted into a parameter and the query as a whole is parameterized so that its plan can be reused by any subsequent ad-hoc query that has the same “code shape”.

Let’s execute the following query in AdventureWorks:

SELECT soh.SalesOrderNumber
      ,soh.ShipDate 
      ,soh.CustomerID
      ,s.Name
      ,soh.TotalDue
FROM Sales.SalesOrderHeader soh 
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID 
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID 
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID 
WHERE soh.SalesOrderID IN (SELECT SalesOrderID 
                           FROM Sales.SalesOrderDetail 
                           WHERE ProductID=897)
  AND soh.ShipDate>='20030801'
  AND soh.ShipDate<'20030901'
  AND a.City='Milsons Point'
  AND sp.Name='New South Wales'
(Those of you who read my blog regularly may be shocked by my use of UPPER CASE keywords… I never do that… But I’m doing it here to illustrate a point).

As you may know, there is a DMV called sys.dm_exec_cached_plans, which contains information about all the query plans that have been cached, along with a plan_handle column that you can use to acquire the actual text of the query that created the plan. There’s also a DMV called sys.dm_exec_query_stats, which contains performance statistics information for cached plans, but its plan_handle (or sql_handle) column points to a different kind of text when you’re dealing with Forced Parameterization.

Here is a simplified version of a Performance Statistics Query that I run that uses sys.dm_exec_query_stats to acquire performance information so that I can find plans that perform a lot of reads (the ones with the most reads are at the top).:

select Code=(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
             for xml path(''),type)
      ,qp.Query_Plan
      ,[#Reads]=qs.total_logical_reads
      ,[#Execs]=qs.execution_count
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
where qt.text not like '%sys.dm\_%' escape '\'
order by [#Reads] desc
Here’s what the result looks like in SSMS:



In case you’re wondering about that “processing-instruction” and XML gibberish for the generation of the first column, you can read more about that here. But, in short, you can see that it creates a hyperlink to the query text. When I click on that link, I can see that the text looks like this:

(@0 int,@1 varchar(8000),@2 varchar(8000),
@3 varchar(8000),@4 varchar(8000))select soh . SalesOrderNumber , 
soh . ShipDate , soh . CustomerID , s . Name , soh . TotalDue 
from Sales . SalesOrderHeader soh join Person . Address a on 
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp 
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s 
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in 
( select SalesOrderID from Sales . SalesOrderDetail where ProductID 
= @0 ) and soh . ShipDate > = @1 and soh . ShipDate < @2 and 
a . City = @3 and sp . Name = @4
I have split it into multiple lines for this blog post… In reality it’s one looooonnnng string. But you can see that this is NOT the actual text of the query that I executed. In parameterizing this query, SQL converted all my UPPER CASE keywords into lower case, and it got rid of any extraneous white space (multiple spaces, tabs, carriage returns, line feeds), and it inserted a single space between all commas and periods/full-stops and operators.

Now when another query comes along that is exactly like the first query (except querying for different values)…

SELECT soh.SalesOrderNumber
      ,soh.ShipDate 
      ,soh.CustomerID
      ,s.Name
      ,soh.TotalDue
FROM Sales.SalesOrderHeader soh 
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID 
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID 
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID 
WHERE soh.SalesOrderID IN (SELECT SalesOrderID 
                           FROM Sales.SalesOrderDetail 
                           WHERE ProductID=870)
  AND soh.ShipDate>='20030101'
  AND soh.ShipDate<'20040101'
  AND a.City='London'
  AND sp.Name='England'
…and we once again execute our Performance Statistics Query to look at the plan cache...



…we see that the same parameterized query has been reused (note the #Execs is equal to 2).

(Those of you with a quick eye may have also noticed that the #Reads skyrocketed… I’ll talk about that in a later post).

So my problem is this: In order to improve the performance of the application, I can use my Performance Statistics Query in order to find those plans that exhibited the most reads and see if they can be tuned via either the addition of an index or via a recommendation of rewriting the query a different way.

But unfortunately all I get is that parameterized version of the query text. I have no indication what the ACTUAL ad-hoc queries were that were sent to the system. Yes, I can see the text of the ad-hoc queries via the sys.dm_exec_cached_plans DMV, but how do I find the exact ones that have to do with this parameterized query? It’s like looking for a needle in a haystack.

So I wrote a procedure to do that for me. It essentially takes the parameterized query text and creates a LIKE pattern out of it so that I can find its original ad-hoc code via the plan_handle of the sys.dm_exec_cached_plans DMV.

I create this procedure in a database called DBATools (which houses other utilities and data that I use for monitoring the server). It starts like so, receiving the text of the parameterized query code and putting it into a local @Text variable:

use DBATools
go
 
create procedure GetAdhocQueries
   @ParameterizedQuery nvarchar(max)
as
 
declare @Text nvarchar(max)
set @Text=@ParameterizedQuery
 
Its first step is to get rid of the parameter declarations at the beginning of the query. Well, luckily, all these parameterized queries are of the pattern “(@0 … )xxxxx”, where “xxxxx” is one of the following: SELECT or UPDATE or INSERT or DELETE or MERGE or WITH (in the case of a CTE). So let’s use that knowledge to strip off the parameter list:

/*Get rid of parameter list*/
select @Text=substring(@Text,isnull(min(Pos),0)+1,len(@Text))
from (select charindex(N')select ',@Text)
      union all
      select charindex(N')insert ',@Text)
      union all
      select charindex(N')update ',@Text)
      union all
      select charindex(N')delete ',@Text)
      union all
      select charindex(N')merge ',@Text)
      union all
      select charindex(N')with ',@Text)) x(Pos)
where Pos>0
 
So that makes our @Text variable looks like the following, stripped of its parameter list (again, I’m artificially word-wrapping the text here… it’s just one loonng string):

select soh . SalesOrderNumber , soh . ShipDate , 
soh . CustomerID , s . Name , soh . TotalDue 
from Sales . SalesOrderHeader soh join Person . Address a on 
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp 
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s 
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in 
( select SalesOrderID from Sales . SalesOrderDetail where ProductID 
= @0 ) and soh . ShipDate > = @1 and soh . ShipDate < @2 and 
a . City = @3 and sp . Name = @4
Next I need to find all the parameters in the text. They are all surrounded with a single space, so they are easy to find because they have the pattern “ @[0-9] ” (for a single digit parameter) or “ @[0-9][0-9] ” (for a two-digit parameter), etc. So the following will find all 4-digit, 3-digit, 2-digit, and 1-digit parameters in the text and substitute them with a single percent sign (%):

/*Substitute all parameters (i.e. @0 or @1 or ...) with Percents (%)*/
declare @NumDigs tinyint
       ,@Pos int
set @NumDigs=5
while @NumDigs>1
begin
  set @NumDigs=@NumDigs-1
  while 1=1
  begin  --Continue substituting until there are no more
    set @Pos=patindex(N'% @'+replicate(N'[0-9]',@NumDigs)+N' %',@Text)
    if @Pos=0 break  --No more... we're done with the loop
    set @Text=stuff(@Text,@Pos,3+@NumDigs,N' % ')
  end
end
And now our @Text variable looks like this:

select soh . SalesOrderNumber , soh . ShipDate , 
soh . CustomerID , s . Name , soh . TotalDue 
from Sales . SalesOrderHeader soh join Person . Address a on 
soh . ShipToAddressID = a . AddressID join Person . StateProvince sp 
on a . StateProvinceID = sp . StateProvinceID join Sales . Store s 
on soh . CustomerID = s . CustomerID where soh . SalesOrderID in 
( select SalesOrderID from Sales . SalesOrderDetail where ProductID 
= % ) and soh . ShipDate > = % and soh . ShipDate < % and 
a . City = % and sp . Name = %
Now, I need to take care of those spaces that had been inserted by the optimizer between every operator and keyword in the query. So I replace all those spaces with percent signs. At the same time, I also replace any normal pattern-matching characters (like a carat or left square bracket or right square bracket) with percent signs. I don’t want them screwing up my final query into the plan cache. Finally, I place a percent sign at the beginning and the end of the text:

/*Replace all spaces and pattern characters (i.e. "[", "]", "^") with Percents (%)*/
/*Also place a Percent at the beginning and the end*/
set @Text=N'%'
         +replace(replace(replace(replace(@Text
                                         ,N' '
                                         ,N'%')
                                 ,N'['
                                 ,N'%')
                         ,N']'
                         ,N'%')
                 ,N'^'
                 ,N'%')
         +N'%'
At this point our @Text variable looks like this:

%select%soh%.%SalesOrderNumber%,%soh%.%ShipDate%,%
soh%.%CustomerID%,%s%.%Name%,%soh%.%TotalDue%
from%Sales%.%SalesOrderHeader%soh%join%Person%.%Address%a%on%
soh%.%ShipToAddressID%=%a%.%AddressID%join%Person%.%StateProvince%sp%
on%a%.%StateProvinceID%=%sp%.%StateProvinceID%join%Sales%.%Store%s%
on%soh%.%CustomerID%=%s%.%CustomerID%where%soh%.%SalesOrderID%in%
(%select%SalesOrderID%from%Sales%.%SalesOrderDetail%where%ProductID%
=%%%)%and%soh%.%ShipDate%>%=%%%and%soh%.%ShipDate%<%%%and%
a%.%City%=%%%and%sp%.%Name%=%%%
Just to be neat and tidy, I get rid of all the multiple percent signs and replace them with a single one:

/*Transform all multiple Percents into single ones*/
set @Text=replace(@Text,N'%%%',N'%%')
set @Text=replace(@Text,N'%%',N'%')
And that results in this:

%select%soh%.%SalesOrderNumber%,%soh%.%ShipDate%,%
soh%.%CustomerID%,%s%.%Name%,%soh%.%TotalDue%
from%Sales%.%SalesOrderHeader%soh%join%Person%.%Address%a%on%
soh%.%ShipToAddressID%=%a%.%AddressID%join%Person%.%StateProvince%sp%
on%a%.%StateProvinceID%=%sp%.%StateProvinceID%join%Sales%.%Store%s%
on%soh%.%CustomerID%=%s%.%CustomerID%where%soh%.%SalesOrderID%in%
(%select%SalesOrderID%from%Sales%.%SalesOrderDetail%where%ProductID%
=%)%and%soh%.%ShipDate%>%=%and%soh%.%ShipDate%<%and%
a%.%City%=%and%sp%.%Name%=%
Finally, since LIKE patterns are not allowed to be more than 4000 characters, I truncate the @Text variable to that length if needed:

/*Truncate to 4000 characters max*/
if len(@Text)>4000 set @Text=left(@Text,3999)+N'%' 
 
So finally, at this point, we can now find all the ad-hoc queries in the cache whose text is LIKE our @Text variable:

/*Find our Adhoc queries that match the pattern*/
select Code=(select [processing-instruction(q)]=N':'+nchar(13)+qt.text+nchar(13)
             for xml path(''),type)
      ,DB=db_name(QueryDatabaseID)
      ,[#Bytes]=cp.size_in_bytes
      ,cp.usecounts 
from sys.dm_exec_cached_plans cp
cross apply (select QueryDatabaseID=convert(int,value)
             from sys.dm_exec_plan_attributes(cp.plan_handle)
             where attribute='dbid') F_DB
cross apply sys.dm_exec_sql_text(cp.plan_handle) qt
where objtype='Adhoc'
  and qt.text like @Text
order by cp.usecounts desc
Here is the result:



And when we click on one of the hyperlinks, we can see the original ad-hoc query in all its glory, just as it was originally submitted.

<?q :
SELECT soh.SalesOrderNumber
      ,soh.ShipDate 
      ,soh.CustomerID
      ,s.Name
      ,soh.TotalDue
FROM Sales.SalesOrderHeader soh 
JOIN Person.Address a ON soh.ShipToAddressID=a.AddressID 
JOIN Person.StateProvince sp ON a.StateProvinceID=sp.StateProvinceID 
JOIN Sales.Store s ON soh.CustomerID=s.CustomerID 
WHERE soh.SalesOrderID IN (SELECT SalesOrderID 
                           FROM Sales.SalesOrderDetail 
                           WHERE ProductID=870)
  AND soh.ShipDate>='20030101'
  AND soh.ShipDate<'20040101'
  AND a.City='London'
  AND sp.Name='England'
 
?>
So, to take advantage of this new procedure, I just incorporate it into my original Performance Statistics Query, like so:

select Code=(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
             for xml path(''),type)
      ,qp.Query_Plan
      ,[#Reads]=qs.total_logical_reads
      ,[#Execs]=qs.execution_count
      ,AdHocStmts
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) qt
cross apply sys.dm_exec_query_plan(plan_handle) qp
cross apply
  (select AdHocStmts=case
                       when qt.text like '(@0 %'
                       then N'exec DBATools.dbo.GetAdhocQueries '''
                           +replace(qt.text,N'''',N'''''')+N' '''
                       else null
                      end) F_Adhoc
where qt.text not like '%sys.dm\_%' escape '\'
order by [#Reads] desc
In short, if the query text (in sys.dm_exec_sql_text) starts with a “(@0 “, I know it’s a parameterized query, so I will have the AdHocStmts column populated with the text to call my GetAdhocQueries procedure. When I execute the above query, I get the following result with the new AdHocStmts column:



And I can copy/paste the AdHocStmts column into a query window and execute it (again, I only word-wrap it here for clarity… it’s actually a looonng single line of text).

exec DBATools.dbo.GetAdhocQueries '(@0 int,@1 varchar(8000),@2 varchar(8000),
@3 varchar(8000),@4 varchar(8000))select soh . SalesOrderNumber , soh . 
ShipDate , soh . CustomerID , s . Name , soh . TotalDue from Sales . 
SalesOrderHeader soh join Person . Address a on soh . ShipToAddressID = 
a . AddressID join Person . StateProvince sp on a . StateProvinceID = 
sp . StateProvinceID join Sales . Store s on soh . CustomerID = s . 
CustomerID where soh . SalesOrderID in ( select SalesOrderID from 
Sales . SalesOrderDetail where ProductID = @0 ) and soh . ShipDate 
> = @1 and soh . ShipDate < @2 and a . City = @3 and sp . Name = @4 '
And that will find the ad-hoc queries that are associated with that particular parameterized query.

So this GetAdhocQueries procedure allows me to find the ad-hoc query needles in the plan cache haystack. Now if I only had a way to find those $#&@ queries in the C# code as easily… oh well, that’s somebody else’s problem.

Invitation for T-SQL Tuesday #024: Prox ‘n’ Funx

Monday, October 31st, 2011
T-SQL TuesdayBoy, time really flies, doesn’t it? It’s hard to believe, but we are finishing up our second year of T-SQL Tuesdays! Adam Machanic started the whole concept back in November 2009 with T-SQL Tuesday #001, and here we are already up to #024.

So, without further ado…

You are hereby invited to this month’s T-SQL Tuesday #024, which will take place on November 8, 2011.

So, all you T-SQL Bloggers out there, please join the blog party and write up something revolving around this month’s topic: Prox ‘n’ Funx (which is just a coo-ul way of referring to Procedures and Functions).

This topic covers a lot of ground, so there’s a myriad of possibilities in what you can write about. You could discuss a really cool stored procedure or function that you wrote. You could write about a Dynamic Management Function that you can’t live without… or perhaps write about some of the new functions that are coming in SQL2012. How about limitations or “gotchas” or performance issues in working with procedures and functions? And on and on and on…

Now for those nasty rules:

1) Your post must go live between 00:00:00 GMT on Tuesday November 8 and 00:00:00 GMT on Wednesday November 9. If you’re unsure exactly when that is, well, guess what? There’s a function for that! It’s called GETUTCDATE().

2) Your post must link back to this post, and the link must be anchored from the logo (found above) which must also appear at the top of your post.

3) Make sure you leave a comment or a trackback here on this blog regarding your post (so that I can collect the posts and write a round-up).

Optionally:

4) “T-SQL Tuesday #024” should be included in the title of the post.

5) Tweet about your post using the hash tag #TSQL2sDay.

(I have to admit… that last rule is kind of amusing since I don’t have a Twitter account myself… A big thank-you to Rob Farley for initially tweeting this invitation).

I’m looking forward to reading your submissions. But don’t delay in composing your post! T-SQL Tuesday 24 is coming fast! I can almost hear the clock ticking down… Ka-chink! Ka-chunk! Ka-chink! Ka-chunk!

T-SQL Tuesday #023: Flip Side of the JOIN

Tuesday, October 4th, 2011
T-SQL TuesdayUnbelievable… It’s been almost 5 months since I last posted something here. I had a lot going on since that last post: A daughter graduating from college, a son graduating from high school, a few family vacation trips, moving my son into college, and between all of that, I was juggling an overwhelming amount of work from 3 demanding clients (and still am, quite frankly).

But I’m back now, ready to finally re-JOIN the SQL blogging world once again.

And that is very apt, because this post is part of the October T-SQL Tuesday on the subject of JOINs, hosted by Stuart Ainsworth.

So, let’s not waste any time… Let’s plunge in…

We are all aware of the various JOINs that are available to us in the T-SQL syntax: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN. But what I wanted to talk about today are two other kinds of JOINs: The (LEFT or RIGHT) Semi JOIN and the (LEFT or RIGHT) Anti Semi JOIN.

These are not available to us directly in the syntactical sense, but they are employed in a Query Plan when you use certain types of queries.

A LEFT Semi JOIN returns rows from the left side that have at least one matching row on the right side. At first glance, this seems very much like a regular INNER JOIN, except for one thing. With a LEFT Semi JOIN, the rows from the left table will be returned at most once. Even if the table on the right side contains hundreds of matches for the row on the left side, only one copy of the left-hand row will be returned.

For example, let’s find Handlebar Products (SubCategory=4) in the Products table that appeared on at least one Order. If we use a regular INNER JOIN…

select p.ProductID
from Production.Product p
join Sales.SalesOrderDetail sod on p.ProductID=sod.ProductID 
where ProductSubcategoryID=4
/*
ProductID
---------
      808
      808
      808
    . . .
      809
      809
      809
    . . .
      947
      947
      947
(1531 row(s) affected)
*/
…We get tons of duplicate Product IDs. The (actual) execution plan for the above query looks like so:



Note that the data flow arrow from the Products table showed that 8 rows were processed, and the total rows that were requested by the Nested Loops operator from the SalesOrderDetail Index Seek for those 8 product rows were 1531.

In order to eliminate all the duplicates, we have to introduce a DISTINCT to the query:

select distinct p.ProductID
from Production.Product p
join Sales.SalesOrderDetail sod on p.ProductID=sod.ProductID 
where ProductSubcategoryID=4
/*
ProductID
---------
      808
      809
      810
      811
      813
      946
      947
*/
Interestingly enough, if we look at the actual execution plan for that query, you would probably expect to find the same plan as before except with an Aggregate operator at the top of the tree to eliminate the duplicates and shrink the 1531 rows down to 7, but instead, this is what we find:



Note that the Nested Loops operator is a LEFT Semi JOIN. Somehow the optimizer is “smart enough” to realize (because of our INNER JOIN and the DISTINCT and our result set only consisting of columns from one side of the JOIN) that it would be more expensive to do the full INNER JOIN and then eliminate the duplicates, and so it employed a LEFT Semi JOIN. This is much more efficient because the Nested Loops operator only needs to request a single row from the SalesOrderDetail Index Seek operator for each Product processed… If a single row exists in SalesOrderDetails, then it can release the Product row to the Select operator for output. If no row exists, then it tosses out the Product row and moves on. If you hover over the data flow arrow coming out of the Index Seek, you’ll see that only 7 rows were passed along (because one of the 8 Product rows did not have a match).

I don’t know about you, but the hairs on the back of my neck stand up whenever I see a DISTINCT in a query. This same solution could be employed (more clearly in my opinion) via three other types of queries, all of which (by nature) involve a Semi JOIN… an INTERSECT query or an EXISTS query or an IN query:

select ProductID
from Production.Product
where ProductSubcategoryID=4
intersect
select ProductID
from Sales.SalesOrderDetail 
 
select ProductID
from Production.Product p
where ProductSubcategoryID=4
  and exists (select * 
              from Sales.SalesOrderDetail 
              where ProductID=p.ProductID)
 
select ProductID
from Production.Product
where ProductSubcategoryID=4
  and ProductID in (select ProductID
                    from Sales.SalesOrderDetail)
All three of the above queries produce the exact same plan, which is the very efficient Semi JOIN plan that we just examined.

It’s really a matter of style as to which approach that you use. I prefer EXISTS or IN. The INTERSECT operator is kind of cool, but it is very limiting. For example, let’s say we wanted the result set to include the Name of the Product as well. In the EXISTS and IN queries (and for that matter in our original INNER JOIN/DISTINCT query), we simply add the Name column to the SELECT clause and we’re done… And the query plan would remain unchanged except for the fact that an extra column will come from the Product table.

But with the INTERSECT query, we have to introduce the Name column to both sides of the INTERSECT, meaning that we have to add an extra JOIN to get the Name:

select ProductID
      ,Name
from Production.Product
where ProductSubcategoryID=4
intersect
select sod.ProductID
      ,p.Name
from Sales.SalesOrderDetail sod
join Production.Product p on sod.ProductID=p.ProductID 
/*
ProductID Name
--------- ----------------------
      808 LL Mountain Handlebars
      809 ML Mountain Handlebars
      810 HL Mountain Handlebars
      811 LL Road Handlebars
      813 HL Road Handlebars
      946 LL Touring Handlebars
      947 HL Touring Handlebars
*/
And the execution plan now involves a lot more work:



We could also try to re-work the INTERSECT query using a CTE or a derived table to just get the ProductID’s and then JOIN the result to the Products table to get the Name column like so…

with ProductsInOrders as
(
  select ProductID
  from Production.Product
  where ProductSubcategoryID=4
  intersect
  select ProductID
  from Sales.SalesOrderDetail 
)
select pio.ProductID
      ,p.Name
from ProductsInOrders pio
join Production.Product p on pio.ProductID=p.ProductID 
/*
ProductID Name
--------- ----------------------
      808 LL Mountain Handlebars
      809 ML Mountain Handlebars
      810 HL Mountain Handlebars
      811 LL Road Handlebars
      813 HL Road Handlebars
      946 LL Touring Handlebars
      947 HL Touring Handlebars
*/
…But we still can’t get around the fact that we have to access the Products table multiple times. In fact, the execution plan for the above query is quite amusing when you look at it:



For each Product row acquired in the Clustered Index Scan, it does a SEEK into the same Clustered Index to get the Name! What a waste of resources.

Now on to Anti Semi JOINs…

A LEFT Anti Semi JOIN returns rows from the left side that have no matching rows on the right side… It’s the exact opposite of the Semi JOIN. As you can probably guess, this kind of JOIN is employed when you execute a query using EXCEPT or NOT EXISTS or NOT IN:

select ProductID
from Production.Product
where ProductSubcategoryID=4
except
select ProductID
from Sales.SalesOrderDetail 
/*
ProductID
---------
      812
*/
 
select ProductID
from Production.Product p
where ProductSubcategoryID=4
  and not exists (select * 
                  from Sales.SalesOrderDetail 
                  where ProductID=p.ProductID)
/*
ProductID
---------
      812
*/
 
select ProductID
from Production.Product
where ProductSubcategoryID=4
  and ProductID not in (select ProductID
                        from Sales.SalesOrderDetail)
/*
ProductID
---------
      812
*/
All three of the above queries produce the exact same execution plan using a LEFT Anti Semi JOIN:



So, for each of the 8 Product rows, the Nested Loops operator requests a row from the SalesOrderDetail table. If one exists, then it tosses the Product row aside and moves on. If one does not exist, then it releases the Product row up to the Select operator for output.

The EXCEPT operator has the same limitations as was described for the INTERSECT operator and therefore is not as useful as the NOT EXISTS or NOT IN types of queries.

One important note about NOT IN. It is only equivalent to the NOT EXISTS query if the column being checked is non-nullable. If the ProductID in Sales.SalesOrderDetail allowed NULLs, then the NOT IN query plan would look like this:



There’s a lot of logic employed in the plan to handle the fact that there may be NULLs in SalesOrderDetail. We can return back to our more simplified query, however, by adding a WHERE IS NOT NULL predicate to our IN subquery:

select ProductID
from Production.Product
where ProductSubcategoryID=4
  and ProductID not in (select ProductID
                        from Sales.SalesOrderDetail
                        where ProductID is not null)
So if you prefer the NOT IN style over the NOT EXISTS style of querying, it’s a good idea to get in the habit of including a WHERE IS NOT NULL predicate to the subquery.

By the way, many people in the past have tried to emulate the Anti Semi JOIN behavior by doing a LEFT JOIN and adding a WHERE IS NULL to the query to only find rows that have no match on the right side, like so:

select p.ProductID
from Production.Product p
left join Sales.SalesOrderDetail sod on p.ProductID=sod.ProductID 
where ProductSubcategoryID=4
  and sod.ProductID is null
But this actually produces a plan that LEFT JOINs everything and then employs a Filter to only allow the IS NULL non-matches, creating a lot of unnecessary work and a much more inefficient query than the true Anti Semi JOIN:



I’ve seen people in the past saying that the LEFT JOIN/IS NULL approach is faster than the NOT EXISTS approach, but frankly, I can’t see it. If anyone has an example to offer, I’d certainly like to take a look.

CTE: Coolest T-SQL Enhancement

Tuesday, May 10th, 2011
T-SQL TuesdayThis blog entry is participating in T-SQL Tuesday #018, hosted this month by Bob Pusateri.

You are invited to visit his blog and join the party and read more blogs participating in this month’s theme: CTEs.

CTE stands for Common Table Expression, but it should stand for Coolest T-SQL Enhancement. In fact, this T-SQL Tuesday topic is a perfect followup to last month’s topic of APPLY, because APPLY and CTEs (and Window Functions) were all added in SQL2005 as new indispensable enhancements to the language.

The APPLY Operator is an incredibly versatile tool that helps you to create “functions on-the-fly” and do column manipulations. Similarly, a CTE is a tool that helps you create “views on-the-fly” and do row manipulations.

There are many things that you can do with CTEs that you could do with APPLY as well; however, what sets CTEs apart are that they can be recursive and that they can be re-used (in the same query). You can’t re-use an APPLY operation.

I’m sure there will be many posts about recursion this month, but I did a big treatise on that subject last year, so I’m going to focus on a couple of other cool features of CTEs.

Column Aliasing

One aspect of CTEs that are not used that often (and that surprisingly not many people even know about) is the fact that you can provide column aliases in its definition.

For example, let’s say you have a simple CTE query like the following using the NorthWind database (I’m taking a break from AdventureWorks):

with MadridCusts as
(
  select ID=CustomerID
        ,Company=CompanyName
        ,Contact=ContactName
        ,Phone
  from Customers
  where City='Madrid'
)
select ID,Company,Contact,Phone
from MadridCusts
/*
ID    Company                              Contact          Phone
----- ------------------------------------ ---------------- --------------
BOLID Bolido Comidas preparadas            Martin Sommer    (91) 555 22 82
FISSA FISSA Fabrica Inter. Salchichas S.A. Diego Roel       (91) 555 94 44
ROMEY Romero y tomillo                     Alejandra Camino (91) 745 6200
*/
As you can see, I’m defining new column aliases for each of the columns (except Phone) within the CTE. But in the WITH clause, I can override those columns aliases with different aliases if I want to. (In fact, you didn’t hear it here, but this is a hilarious trick you can play on your colleagues when they move away from their desk for a few moments):

with MadridCusts(Phone,ID,Company,Contact) as
(
  select ID=CustomerID
        ,Company=CompanyName
        ,Contact=ContactName
        ,Phone
  from Customers
  where City='Madrid'
)
select ID,Company,Contact,Phone
from MadridCusts
/*
ID                                   Company          Contact         Phone
------------------------------------ ---------------- --------------- -----
Bolido Comidas preparadas            Martin Sommer    (91) 555 22 82  BOLID
FISSA Fabrica Inter. Salchichas S.A. Diego Roel       (91) 555 94 44  FISSA
Romero y tomillo                     Alejandra Camino (91) 745 6200   ROMEY
*/
Diabolical, isn’t it?

Multiple CTEs Building Upon Each Other

You can define several CTEs that build upon each other. In fact, last year, I wrote a blog post about a query that plays Poker. Via about a dozen CTEs, it creates the deck of cards, deals them out to 10 people, evaluates the hands, and shows the winner… all in one single query.

The following is a ridiculously simple (and useless) example of CTEs building upon each other:

with USACustomers as
(
  select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode
  from Customers
  where Country='USA'
)
,USACustomersInOregon as
(
  select *
  from USACustomers
  where Region='OR'
)
,OregonCustomersInPortland as
(
  select *
  from USACustomersInOregon
  where City='Portland'
)
select * from OregonCustomersInPortland
 
Each CTE uses the result of the previous to continue to narrow down the result set. But since each CTE is treated as a view, the query optimizer is able to “push” the predicates into a single one. It is exactly the same in every aspect as the following query:

select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode
from Customers
where Country='USA'
  and Region='OR'
  and City='Portland'

Updating and Deleting

Another aspect of CTEs that many people don’t realize is that they are updatable.

Let’s make a temporary copy of the Customers table and Orders table add a new column to them called UpdateColumn:

if object_id('tempdb..#Custs','U') is not null drop table #Custs
select *, cast(null as char(1)) as UpdateColumn
into #Custs
from Customers
if object_id('tempdb..#Orders','U') is not null drop table #Orders
select *, cast(null as char(1)) as UpdateColumn
into #Orders
from Orders
 
Now we can do our same OregonCustomersInPortland CTE query, but this time we will update that new UpdateColumn with an ‘X’ for each of those Portland Customers. Note that we have to introduce the UpdateColumn to the SELECT list of the initial query:

with USACustomers as
(
  select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode,UpdateColumn
  from #Custs
  where Country='USA'
)
,USACustomersInOregon as
(
  select *
  from USACustomers
  where Region='OR'
)
,OregonCustomersInPortland as
(
  select *
  from USACustomersInOregon
  where City='Portland'
)
update OregonCustomersInPortland 
set UpdateColumn='X'
/*
(2 row(s) affected)
*/
As long as the column(s) you want to update are defined in the CTE, you can update it. In fact, you can introduce the UpdateColumn multiple times within the CTEs and you can update any one of them and it will work fine. (Note that you can’t update more than one of them, because SQL will not allow you to update a column more than once:

with USACustomers as
(
  select CustomerID,CompanyName,ContactName,Region,City,Phone,PostalCode,UpdateColumn
  from #Custs
  where Country='USA'
)
,USACustomersInOregon as
(
  select *,UpdateColumnAgain=UpdateColumn
  from USACustomers
  where Region='OR'
)
,OregonCustomersInPortland as
(
  select *,UpdateColumnStillAgain=UpdateColumn
  from USACustomersInOregon
  where City='Portland'
)
update OregonCustomersInPortland 
set UpdateColumnStillAgain='X'
/*
(2 row(s) affected)
*/
Using CTEs to UPDATE or DELETE can perhaps make things a little clearer when you’re JOINing in other tables or involving more complicated query methods. For example, let’s say Howard Snyder, a contact for some Customer in Northwind, calls us and wants to change his last open order so that it ships via Speedy Express rather than the usual Federal Shipping.

So we can use a CTE to find open orders belonging to a customer whose contact is named Howard Snyder that are set up for Federal Shipping, get the most recent one (based on Order Date) and update its Shipping Method to Speedy Express instead:

with HowardSnyderOpenOrdersViaFedShipping as
(
  select o.ShipVia
        ,RowNum=row_number() over (order by o.OrderDate desc)
  from #Orders o
  join Customers c on o.CustomerID=c.CustomerID
  join Shippers s on o.ShipVia=s.ShipperID 
  where o.ShippedDate is null  --Open Orders
    and c.ContactName='Howard Snyder'
    and s.CompanyName='Federal Shipping'
)
update HowardSnyderOpenOrdersViaFedShipping 
set ShipVia=(select ShipperID 
             from Shippers 
             where CompanyName='Speedy Express')
where RowNum=1  --Only most recent order
/*
(1 row(s) affected)
*/
Coo-ul, huh?

Generating Numbers

This has been seen in hundreds of blogs and books, but I still marvel at the following method of generating a table of numbers (from 1 to 1,000,000). It’s elegant and brief and fast:

with 
  L0(c) as (select 0 from (values (0),(0),(0)) f(c)) --3 Rows
 ,L1(c) as (select 0 from L0 a,L0 b,L0 c)            --27 Rows (3x3x3)
 ,L2(c) as (select 0 from L1 a,L1 b,L1 c)            --19683 Rows (27x27x27)
 ,L3(c) as (select 0 from L2 a,L2 b)                 --387,420,489 Rows (19683x19683)
 ,NN(n) as (select row_number() over (order by (select 0)) from L3)
select n into #Nums from NN where n<=1000000
 
That’s just too cool. Note that because of the VALUES row constructor syntax, the above will only work in SQL2008. To make it work in SQL2005 or earlier, just change the first CTE to use UNION ALLs instead:

with 
  L0(c) as (select 0 union all select 0 union all select 0) --3 Rows

Step-By-Step Clarity

Like the APPLY operator, CTEs are terrific for step-by-step self-documentation. They make your code easier to follow and understand what’s going on.

Here’s an example of seeing whether the current date/time falls within the Daylight Savings Time (as defined by the United States, unless you live in Hawaii or Arizona). Again, this will only work in SQL2008… Change the VALUES row constructor syntax to UNION ALL syntax to make it work in SQL2005:

with 
  ZeroThruSix as
    (
       select N from (values (0),(1),(2),(3),(4),(5),(6)) F(N)
    )
 ,FirstDayInMarch(FirstDayInMarch) as
    (
       select convert(datetime,str(year(getdate()),4)+'0301 02:00')
    )
 ,FirstSevenDaysInMarch(MarchDate) as
    (
       select dateadd(day,N,FirstDayInMarch)
       from FirstDayInMarch
       cross join ZeroThruSix
    )
 ,SecondSundayInMarch(SecondSundayInMarch) as 
    (
       select dateadd(day,7,MarchDate)
       from FirstSevenDaysInMarch
       where datename(weekday,MarchDate)='Sunday'
    )
 ,FirstDayInNovember(FirstDayInNovember) as
    (
       select convert(datetime,str(year(getdate()),4)+'1101 02:00')
    )
 ,FirstSevenDaysInNovember(NovemberDate) as
    (
       select dateadd(day,N,FirstDayInNovember)
       from FirstDayInNovember 
       cross join ZeroThruSix 
    )
 ,FirstSundayInNovember(FirstSundayInNovember) as
    (
       select NovemberDate
       from FirstSevenDaysInNovember 
       where datename(weekday,NovemberDate)='Sunday'
    )
select IsDST=convert(bit,case
                           when getdate()>=SecondSundayInMarch
                            and getdate()<FirstSundayInNovember 
                           then 1
                           else 0
                         end)
from SecondSundayInMarch
cross join FirstSundayInNovember 
 
Cute, huh? Note how the ZeroThruSix CTE was used more than once? That’s the reusability I was mentioning earlier. You can also see how I incorporated column aliasing in most of the CTEs as well.

You can follow what’s going on step-by-step in the above query. Before SQL2005 and CTEs you would have to write the query like so:

select IsDST=convert(bit,case
                           when getdate()>=SecondSundayInMarch
                            and getdate()<FirstSundayInNovember 
                           then 1
                           else 0
                         end)
from 
  (
    select dateadd(day,N+7,convert(datetime,str(year(getdate()),4)+'0301 02:00'))
    from (select 0 union all select 1 union all select 2 union all select 3
          union all select 4 union all select 5 union all select 6) F(N)
    where datename(weekday
                  ,dateadd(day,N,convert(datetime,str(year(getdate()),4)+'0301 02:00')))
          ='Sunday'
  ) SecondSundayInMarch(SecondSundayInMarch)
cross join
  (
    select dateadd(day,N,convert(datetime,str(year(getdate()),4)+'1101 02:00'))
    from (select 0 union all select 1 union all select 2 union all select 3
          union all select 4 union all select 5 union all select 6) F(N)
    where datename(weekday
                  ,dateadd(day,N,convert(datetime,str(year(getdate()),4)+'1101 02:00')))
          ='Sunday' 
  ) FirstSundayInNovember(FirstSundayInNovember)
Not quite as clear, is it? But both queries produce the exact same query plan.

I hope you’ve enjoyed these CTE examples and appreciate their power. I couldn’t live without them.

The Index Tuning Detective

Friday, April 29th, 2011
A Shocking Revelation!Part of tuning queries is being a detective in figuring out what indexes need to be created… and figuring out what indexes may need to be tweaked a bit.

So to all you Nancy Drews and Hardy Boys out there: Get your magnifying glass and let’s unravel some mysteries!

Our first adventure will be The Case Of The Missing Indexes.

This has been talked about before in other blogs, because SQL Server already provides some tools to find missing indexes, but hopefully I’ll go just a little bit farther with the concept.

But it’s the second adventure, The Case Of The Key Lookup Killer, that I’m looking forward to sharing with you.

It may help you uncover some shocking revelations about your queries.

Anyway, please read on…



The Case of the Missing Indexes

Consider the following query in AdventureWorks:

select SalesOrderID,OrderDate 
from Sales.SalesOrderHeader 
where PurchaseOrderNumber is not null
When we look at the Estimated Execution Plan for this query in SQL2008, we get a helpful hint about a missing index. It says that we can cut down the cost of the query by 94.2762% if we were to add the index that it suggests. And we can even right-click on the plan and choose Missing Index Details and it will provide a code window with the code necessary to create that index:

SQL2008 Showing Missing Index

/*
Missing Index Details from SQLQuery6.sql-BRADPC\SQL08.AdventureWorks (BRADPC\Brad (53))
The Query Processor estimates that implementing the following index 
could improve the query cost by 94.2762%.
*/
 
/*
USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Sales].[SalesOrderHeader] ([PurchaseOrderNumber])
INCLUDE ([SalesOrderID],[OrderDate])
GO
*/
Of course, the index is not a perfect suggestion because it suggests we INCLUDE the SalesOrderID column, which is ridiculous because SalesOrderID is the Clustered Index Key, so it would be part of the index automatically anyway.

But never mind that… This is still pretty cool stuff.

And it’s not limited to SQL2008 either. Yes, SQL2008 will provide the helpful hint when you look at the Estimated Plan, but SQL2005 still has the same information behind the scenes. If you look at the Execution Plan XML (by right-clicking on the Plan and choosing Show Execution Plan XML), you will find the Missing Index information buried in there… usually towards the top, but not always… there can potentially be many Missing Index sections in the XML if the plan is for a multiple-statement batch (particularly a batch with IF conditions).

/*
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML ... xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
...
<MissingIndexes>
  <MissingIndexGroup Impact="94.2762">
  <MissingIndex Database="[AdventureWorks]" Schema="[Sales]" Table="[SalesOrderHeader]">
     <ColumnGroup Usage="INEQUALITY">
        <Column Name="[PurchaseOrderNumber]" ColumnId="9" />
     </ColumnGroup>
     <ColumnGroup Usage="INCLUDE">
        <Column Name="[SalesOrderID]" ColumnId="1" />
        <Column Name="[OrderDate]" ColumnId="3" />
     </ColumnGroup>
   </MissingIndex>
  </MissingIndexGroup>
</MissingIndexes>
...
</ShowPlanXML>
*/
This is great if you happen to be looking at a specific plan, but what if you want to know about missing indexes across many plans?

There is a collection of DMV’s that report missing indexes for queries that have been executed on the server. The following is a query used by many DBA’s in order to find indexes that they can potentially add in order to speed up their queries:

select index_advantage=user_seeks*avg_total_user_cost*(avg_user_impact*0.01)
      ,migs.last_user_seek
      ,TableName=mid.statement
      ,mid.equality_columns
      ,mid.inequality_columns 
      ,mid.included_columns 
      ,migs.unique_compiles 
      ,migs.user_seeks 
      ,migs.avg_total_user_cost 
      ,migs.avg_user_impact 
from sys.dm_db_missing_index_group_stats migs with (nolock)
join sys.dm_db_missing_index_groups mig with (nolock) 
     on migs.group_handle=mig.index_group_handle 
join sys.dm_db_missing_index_details mid with (nolock) 
     on mig.index_handle=mid.index_handle 
order by index_advantage desc 
/*
 index_advantage last_user_seek          TableName                                  
---------------- ----------------------- -------------------------------------------
3.23713751033778 2011-04-29 08:58:11.960 [AdventureWorks].[Sales].[SalesOrderHeader]
  
equality_columns inequality_columns   included_columns            
---------------- -------------------- --------------------------- 
NULL             [PurchaseOrderNumber [SalesOrderID], [OrderDate] 
  
unique_compiles user_seeks avg_total_user_cost avg_user_impact
--------------- ---------- ------------------- ---------------
              1          6   0.572255959259259           94.28
*/
You can see our familiar Impact figure of 94.28% in the last column, but since the DMV’s record the number of seeks that could have potentially been performed (based on the number of times the query was executed on the server), this query calculates a theoretical “advantage” figure using that information, and orders the data in descending order of that “advantage”.

Again, this is really cool that this information is recorded and we can effortlessly find out suggestions for indexes in our databases.

But these indexes shouldn’t be created blindly… they should be created after some consideration of how they will impact the system. If you know your database backwards and forwards, you probably have a good idea of what kind of queries would benefit from the creation of these indexes. But if you’re a consultant coming in cold, you don’t really know much about the queries in the system.

However, all is not lost!

We can look directly inside the query cache (via the sys.dm_exec_cached_plans DMV), hunting for queries that are running right now (or have run recently), and pick out those that have missing index suggestions in their Execution XML data. This can be used in concert with the Missing Index DMV query above to gain more knowledge about the queries that will benefit.

I put together the following query to do just that. The comments within the code should help you to figure out what is going on. Use the XML snippet shown earlier to follow how the code hunts for what it wants.

Note that I only wanted to pay attention to cached queries that had been used at least 5 times. I also wanted to limit the number of columns making up the key to 5 and the number of INCLUDEd columns to 5. I also only wanted to pay attention to Missing Indexes that have an Impact of at least 50%. You can fiddle with these quantities in the WHERE clause to match your needs.

The query provides the Key Column list and the INCLUDE Column List and the command you can use to actually create the index if you wish. It also provides a hyperlink to the code of the batch or procedure that would benefit from the index so you can see the query involved. There is also a hyperlink to the Execution Plan so you can investigate that as well.

The query is ORDERed BY the Impact figure in descending order, so the ones with the biggest benefit will be at the top.

with xmlnamespaces 
(
  default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
select Impact
      ,TableName=IxDB+'.'+IxSchema+'.'+IxTable
      ,KeyCols
      ,IncludeCols
      ,IndexCommand
      ,usecounts
      ,size_in_bytes
      ,objtype
      ,BatchCode
      ,QueryPlan=qp.query_plan 
from sys.dm_exec_cached_plans qs 
cross apply 
  --Get the Query Text
  sys.dm_exec_sql_text(qs.plan_handle) qt             
cross apply 
  --Get the Query Plan
  sys.dm_exec_query_plan(qs.plan_handle) qp
cross apply
  --Get the Code for the Batch in Hyperlink Form
  (select BatchCode
            =(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
              for xml path(''),type)
  ) F_Code
cross apply
  --Find the Missing Indexes Group Nodes in the Plan
  qp.query_plan.nodes('//MissingIndexes/MissingIndexGroup') F_GrpNodes(GrpNode)
cross apply 
  --Pull out the Impact Figure
  (select Impact=GrpNode.value('(./@Impact)','float')) F_Impact
cross apply 
  --Get the Missing Index Nodes from the Group
  GrpNode.nodes('(./MissingIndex)') F_IxNodes(IxNode)
cross apply 
  --Pull out the Database,Schema,Table of the Missing Index
  (select IxDB=IxNode.value('(./@Database)','sysname')
         ,IxSchema=IxNode.value('(./@Schema)','sysname')
         ,IxTable=IxNode.value('(./@Table)','sysname')
  ) F_IxInfo
cross apply 
  --How many INCLUDE columns are there;
  --And how many EQUALITY/INEQUALITY columns are there?
  (select NumIncludes
            =IxNode.value('count(./ColumnGroup[@Usage="INCLUDE"]/Column)','int')
         ,NumKeys
            =IxNode.value('count(./ColumnGroup[@Usage!="INCLUDE"]/Column)','int')
  ) F_NumIncl
cross apply 
  --Pull out the Key Columns and the Include Columns from the various Column Groups
  (select EqCols=max(case when Usage='EQUALITY' then ColList end)
         ,InEqCols=max(case when Usage='INEQUALITY' then ColList end)
         ,IncludeCols=max(case when Usage='INCLUDE' then ColList end)
   from IxNode.nodes('(./ColumnGroup)') F_ColGrp(ColGrpNode)
   cross apply 
     --Pull out the Usage of the Group? (EQUALITY of INEQUALITY or INCLUDE)
     (select Usage=ColGrpNode.value('(./@Usage)','varchar(20)')) F_Usage
   cross apply 
     --Get a comma-delimited list of the Column Names in the Group
     (select ColList=stuff((select ','+ColNode.value('(./@Name)','sysname')
                            from ColGrpNode.nodes('(./Column)') F_ColNodes(ColNode)
                            for xml path(''))
                          ,1,1,'')
     ) F_ColList
  ) F_ColGrps
cross apply
  --Put together the Equality and InEquality Columns
  (select KeyCols=isnull(EqCols,'')
                 +case 
                    when EqCols is not null and InEqCols is not null 
                    then ',' 
                    else '' 
                  end
                 +isnull(InEqCols,'')
  ) F_KeyCols
cross apply 
  --Construct a CREATE INDEX command
  (select IndexCommand='create index <InsertNameHere> on '
                      +IxDB+'.'+IxSchema+'.'+IxTable+' ('
                      +KeyCols+')'
                      +isnull(' include ('+IncludeCols+')','')) F_Cmd
where qs.cacheobjtype='Compiled Plan'
  and usecounts>=5    --Only interested in those plans used at least 5 times
  and NumKeys<=5      --Limit to the #columns we're willing to have in the index
  and NumIncludes<=5  --Limit to the #columns we're willing to have in the INCLUDE list
  and Impact>=50      --Only indexes that will have a 50% impact
order by Impact desc
Here is the output (without the hyperlinks) for the AdventureWorks query we were discussing:

/*
 Impact TableName                                  
------- -------------------------------------------
94.2762 [AdventureWorks].[Sales].[SalesOrderHeader]
  
KeyCols               IncludeCols                
--------------------- -------------------------- 
[PurchaseOrderNumber] [SalesOrderID],[OrderDate]  
  
IndexCommand                                                                                           
----------------------------------------------------------------------------
create index <InsertNameHere> on [AdventureWorks].[Sales].[SalesOrderHeader] 
([PurchaseOrderNumber]) include ([SalesOrderID],[OrderDate])
  
usecounts size_in_bytes objtype
--------- ------------- -------
        9         40960 Adhoc  
*/



The Case of the Key Lookup Killer

Now on to the part of this article I’m really excited about, because it will help you possibly tweak existing indexes to help improve your queries. This is something that you cannot get from any DMV’s.

Consider the following query:

select h.SalesOrderID
      ,h.CustomerID 
      ,h.OrderDate 
      ,d.LineTotal 
from Sales.SalesOrderHeader h
join Sales.SalesOrderDetail d on h.SalesOrderID=d.SalesOrderID 
where h.CustomerID in (117,119,126,196,236,435)
  and d.ProductID=942
 
This is what its query plan looks like (click on the image to see a larger view):

Query with Key Lookups

Note that it is able to use the SalesOrderDetail’s index on ProductID to easily find the rows for ProductID 942. However, for each of those rows, it has to do a Key Lookup into the Clustered Index in order to get the columns for the LineTotal, which is a computed column based on the columns OrderQty, UnitPrice, and UnitPriceDiscount. The LineTotal is calculated by the Compute Scalar operator.

Similarly, the query makes use of SalesOrderHeader’s index on CustomerID to find the rows for the desired CustomerID’s. But again, it has to do Key Lookups to get the OrderDate.

What if we could eliminate those Key Lookups? If we were to INCLUDE the OrderQty and UnitPrice and UnitPriceDiscount columns in the ProductID index and INCLUDE the OrderDate column in the CustomerID index, then the query would be covered completely by those indexes. The query plan would then look like this:

Query Covered by Indexes

The cost of this query is 0.0128915, which is a 92% improvement over the Key Lookup query, which had a cost of 0.163265.

I put together a query to look in the cache for queries containing Key Lookups and then pull out the columns that we could potentially INCLUDE in an index in order to improve performance.

Here’s the section of the Execution Plan XML that involves the Sales.SalesOrderDetail Index Seek and its Key Lookup… I abbreviated it so that only the relevant portions are displayed:

/*
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML ... xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
...
<RelOp ... PhysicalOp="Nested Loops" ...>
  ...
  <NestedLoops ...>
    ...
    <RelOp ... LogicalOp="Index Seek" ...>
      ...
      <IndexScan ...>
        ...
        <Object Database="[AdventureWorks]" 
                Schema="[Sales]" 
                Table="[SalesOrderDetail]" 
                Index="[IX_SalesOrderDetail_ProductID]" 
                Alias="[d]" ... />
      </IndexScan>
    </RelOp>
    <RelOp  ... LogicalOp="Compute Scalar" ...>
      <ComputeScalar>
        ...
        <RelOp ... LogicalOp="Clustered Index Seek" ...>
          <OutputList>
            <ColumnReference ... Column="OrderQty" />
            <ColumnReference ... Column="UnitPrice" />
            <ColumnReference ... Column="UnitPriceDiscount" />
          </OutputList>
          <IndexScan Lookup="true" ...>
            ...
            <Object Database="[AdventureWorks]" 
                    Schema="[Sales]" 
                    Table="[SalesOrderDetail]" 
                    Index="[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]" 
                    Alias="[d]" ... />
          </IndexScan>
        </RelOp>
      </ComputeScalar>
    </RelOp>
  </NestedLoops>
</RelOp>
...
</ShowPlanXML>
*/
You can see the Nested Loops RelOp Node has a node down its hierarchy for the Index Seek into the IX_SalesOrderDetail_ProductID index, and it also has a node down its hierarchy for the Clustered Index Seek.

Here’s what my query does, step by step…

It finds Clustered Index Seek RelOp Nodes with an Index Scan Node with the Lookup attribute equal to True. Those are the Key Lookups. For each one found, it saves the Database, Schema, Table, and (possible) Alias Name. It also gets the list of columns in the OutputList node and also counts how many columns there are.

Then it goes up the hierarchy, looking for a Nested Loops RelOp Node. It may be the immediate RelOp Node, or it could be two RelOp Nodes up the hierarchy. In this case, because of the Compute Scalar, we have to go up two RelOp Nodes.

Once it finds the appropriate Nested Loops RelOp Node, it goes down its hierarchy looking for any Index Seek or Index Scan RelOp Nodes.

For each of those found, it saves the Database, Schema, Table, and (possible) Alias Name. And it gets the name of the Index that was involved in the Seek or Scan.

So now we have Lookup Data, and we have to see if any of the Index Data we found (there could be more than one) match in terms of Database and Schema and Table and (possible) Alias. Once the match is found, we can output it.

It’s a bit complicated, but with a little study in reading the above steps and in reading the comments in the code, you can hopefully figure out what’s going on.

Like the Missing Index query, I only look for cached plans that have been used at least 5 times, and I’m only interested in an INCLUDE list of no more than 5 columns. The output includes the usual hyperlinks to the code and to the plan.

I order the query’s output by TableName and IndexName so you can see similar suggestions clustered together. Note also that the columns in the suggested INCLUDE list are in alphabetical order.

Here is what my query suggested for our AdventureWorks Key Lookup query:

/*
TableName                                   IndexName                         
------------------------------------------- --------------------------------  
[AdventureWorks].[Sales].[SalesOrderDetail] [IX_SalesOrderDetail_ProductID]   
[AdventureWorks].[Sales].[SalesOrderHeader] [IX_SalesOrderHeader_CustomerID]  
  
TableAliasInQuery ColumnsToInclude                     usecounts size_in_bytes objtype
----------------- ------------------------------------ --------- ------------- -------
[d]               OrderQty,UnitPrice,UnitPriceDiscount         6        196608 Adhoc
[h]               OrderDate                                    6        196608 Adhoc
*/
So it is suggesting that the SalesOrderDetail index called IX_SalesOrderDetail_ProductID should INCLUDE the columns OrderQty, UnitPrice, and UnitPriceDiscount. And it is making a similar suggestion to INCLUDE the OrderDate column in the IX_SalesOrderHeader_CustomerID index.

Again, as with any kind of tuning exercise, you should not make these changes blindly, but consider the possible implications before going forward.

The code for this query for INCLUDE Column Suggestions is below. I hope you found it as useful as I have. It actually helped me to find an error in a client’s stored procedure… In suggesting a column to INCLUDE, I could see by looking at the query that it was actually using an incorrect WHERE predicate, and so we were able to fix it before it caused a bug. I’ve also used its suggestions to improve some queries considerably.

So here’s the query… Please let me know if it’s been useful to you… Enjoy!

with xmlnamespaces 
(
  default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
select TableName=IxDB+'.'+IxSchema+'.'+IxTable
      ,IndexName=IxIndex
      ,TableAliasInQuery=isnull(IxAlias,IxTable)
      ,ColumnsToInclude=ColList
      ,usecounts
      ,size_in_bytes
      ,objtype
      ,BatchCode
      ,QueryPlan=qp.query_plan 
from sys.dm_exec_cached_plans qs 
cross apply 
  --Get the Query Text
  sys.dm_exec_sql_text(qs.plan_handle) qt             
cross apply 
  --Get the Query Plan
  sys.dm_exec_query_plan(qs.plan_handle) qp
cross apply
  --Get the Code for the Batch in Hyperlink Form
  (select BatchCode
            =(select [processing-instruction(q)]=':'+nchar(13)+qt.text+nchar(13)
              for xml path(''),type)
  ) F_Code
cross apply
  --Find the Key Lookups in the Plan
  qp.query_plan.nodes
  (
    '//RelOp[@LogicalOp="Clustered Index Seek"]/IndexScan[@Lookup=1]'
  ) F_Lookup(LookupNode)
cross apply 
  --Get the Database,Schema,Table of the Lookup
  --Also get the Alias (if it exists) in case the table
  --  is used more than once in the query
  (select LookupDB=LookupNode.value('(./Object[1]/@Database)','sysname')
         ,LookupSchema=LookupNode.value('(./Object[1]/@Schema)','sysname')
         ,LookupTable=LookupNode.value('(./Object[1]/@Table)','sysname')
         ,LookupAlias=isnull(LookupNode.value('(./Object[1]/@Alias)','sysname'),'')
         ,ColumnCount=LookupNode.value('count(../OutputList[1]/ColumnReference)','int')
  ) F_LookupInfo
cross apply 
  --Get the Output Columns
  (select stuff(
            (select ','+ColName
             from LookupNode.nodes('(../OutputList[1]/ColumnReference)') F_Col(ColNode)
             cross apply 
               (select ColName=ColNode.value('(./@Column)','sysname')) F_ColInfo
             order by ColName
             for xml path(''),type).value('(./text())[1]','varchar(max)')
            ,1,1,'')
  ) F_ColList(ColList)
outer apply
  --Get the Parent RelOp Node, hoping that it is a Nested Loops operator.
  --Use OUTER APPLY because we may not find it
  LookupNode.nodes
  (
    '(./../../..[@PhysicalOp="Nested Loops"])'
  ) F_ParentLoop(ParentLoopNode)
outer apply
  --Get the GrandParent RelOp Node, hoping that it is a Nested Loops operator.
  --Use OUTER APPLY because we may not find it
  LookupNode.nodes
  (
    '(./../../../../..[@PhysicalOp="Nested Loops"])'
  ) F_GrandParentLoop(GrandParentLoopNode)
cross apply 
  --Get the Nested Loop Node... Could be the Parent or the GrandParent
  (select LoopNode=isnull(ParentLoopNode.query('.')
                         ,GrandParentLoopNode.query('.'))
  ) F_LoopNode
cross apply
  --Now that we (hopefully) have a Nested Loops Node, let's find a descendant
  --of that node that is an Index Seek or Index Scan and acquire its Object Information
  LoopNode.nodes
    (
      '//RelOp[@LogicalOp="Index Scan" or @LogicalOp="Index Seek"]
       /IndexScan[1]/Object[1]'
    ) F_SeekNode(SeekObjNode)
cross apply
  --Get the Database,Schema,Table and Index of the Index Seek/Scan
  --Also get the Alias (if it exists) so we can match it up with 
  --  the Lookup Table
  (select IxDB=SeekObjNode.value('(./@Database)','sysname')
         ,IxSchema=SeekObjNode.value('(./@Schema)','sysname')
         ,IxTable=SeekObjNode.value('(./@Table)','sysname')
         ,IxAlias=isnull(SeekObjNode.value('(./@Alias)','sysname'),'')
         ,IxIndex=SeekObjNode.value('(./@Index)','sysname')
  ) F_SeekInfo
where qs.cacheobjtype='Compiled Plan'
  and usecounts>=5       --Only interested in those plans used at least 5 times
  and LookupDB=IxDB          --( Lookup and IndexSeek/Scan )
  and LookupSchema=IxSchema  --(   Database,Schema,Table,  )
  and LookupTable=IxTable    --(   and [possible] Alias    )
  and LookupAlias=IxAlias    --(   must match              )
  and ColumnCount<=5     --Limit to the #columns we're willing to INCLUDE
order by TableName
        ,IndexName
        ,ColumnsToInclude
 

T-SQL Tuesday #017: APPLY: It Slices! It Dices! It Does It All!

Tuesday, April 12th, 2011
T-SQL TuesdayThis blog entry is participating in T-SQL Tuesday #017, hosted this month by Matt Velic.

You are invited to visit his blog and join the party and read more blogs participating in this month’s theme: APPLY Knowledge.

I’ve been a fan of the APPLY operator since the beginning. That’s why it’s incorporated into the name of my blog. And I've blogged about its power about a dozen times.

I don’t know how anyone lived without it before SQL2005.

Practically everyone knows that you can use it to invoke table-valued functions. That is its most obvious usage and it’s about the only way you’ll see APPLY demonstrated in 97% of the books on SQL Server (if they even mention it at all).

APPLY is a cool cat, baby!But, as you can see by the twinkle in APPLY’s eyes at the left (even behind the cool shades), he’s got other things up his sleeve.

(Okay, the goofy pointed hat and the goatee are a little much, but hey, if you want great artwork, you’re in the wrong place… Michael J. Swart or Kendra Little are the masters of illustration).

APPLY is capable of soooooo much more than just invoking TVF’s. It is incredibly versatile. It seems like it can do anything!

Let’s take a look at what you can do!



Call Table-Valued Functions!

Yawn. Okay, so this is the most common way people use APPLY. Here’s a quick demo… For each Vista credit card expiring in Jun2008, let’s get the Contact information for that card using a built-in function in AdventureWorks.

select f.FirstName
      ,f.LastName
      ,f.JobTitle
      ,f.ContactType
      ,cc.CardNumber
from Sales.CreditCard cc
join Sales.ContactCreditCard ccc on cc.CreditCardID=ccc.CreditCardID
cross apply dbo.ufnGetContactInformation(ccc.ContactID) f
where cc.ExpYear=2008
  and cc.ExpMonth=6
  and cc.CardType='Vista'
/*
FirstName LastName JobTitle         ContactType   CardNumber
--------- -------- ---------------- ------------- --------------
Peggy     Justice  Owner            Store Contact 11119759315644
John      McClane  Purchasing Agent Store Contact 11119490672347
Laura     Cai      NULL             Consumer      11112813884091
Natalie   Gonzales NULL             Consumer      11114369564985
Jarrod    Sara     NULL             Consumer      11116045498593
Katherine Bailey   NULL             Consumer      11119100149656
Stephanie Gray     NULL             Consumer      11112324154556
Shawna    Sharma   NULL             Consumer      11116413893110
Mindy     Rai      NULL             Consumer      11115163407997
Jackson   Jai      NULL             Consumer      11112011871602
And so on... (74 rows total)
*/
That’s very convenient, but kind of boring in the grand scheme of things. So let’s move on.

Execute SubQueries!

Why even bother to create a function when you can just create a table on the fly via a correlated subquery with APPLY?

You are only limited by your imagination.

Here’s an example…

For each store with a main office in Wisconsin, let’s look at the top 3 products (and their dollar amounts) that they bought in terms of dollars.

select c.CustomerID
      ,s.Name
      ,f.ProductID
      ,ProductName=p.Name
      ,f.PurchaseAmt
from Sales.Customer c
join Sales.Store s on c.CustomerID=s.CustomerID
join Sales.CustomerAddress ca on c.CustomerID=ca.CustomerID
join Person.Address a on ca.AddressID=a.AddressID
join Person.StateProvince sp on a.StateProvinceID=sp.StateProvinceID
cross apply (select top 3 ProductID
                         ,PurchaseAmt=sum(LineTotal)
             from Sales.SalesOrderHeader soh
             join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
             where CustomerID=c.CustomerID
             group by ProductID
             order by sum(LineTotal) desc) f
join Production.Product p on f.ProductID=p.ProductID
where ca.AddressTypeID=3 --MainOffice
  and sp.StateProvinceCode='WI'
/*
CustomerID Name                        ProductID ProductName                 PurchaseAmt
---------- --------------------------- --------- -------------------------- ------------
       418 Good Bike Shop                    795 Road-250 Black, 52         30367.350000
       418 Good Bike Shop                    794 Road-250 Black, 48         24136.807500
       418 Good Bike Shop                    792 Road-250 Red, 58           23508.517500
       453 Unique Bikes                      773 Mountain-100 Silver, 44    16319.952000
       453 Unique Bikes                      771 Mountain-100 Silver, 38    12239.964000
       453 Unique Bikes                      772 Mountain-100 Silver, 42    12239.964000
       543 Friendly Neighborhood Bikes       782 Mountain-200 Black, 38      9638.958000
       543 Friendly Neighborhood Bikes       868 Women's Mountain Shorts, M   671.904000
       543 Friendly Neighborhood Bikes       869 Women's Mountain Shorts, L   335.952000
       606 Little Bicycle Supply Shop        717 HL Road Frame - Red, 62     1717.800000
       606 Little Bicycle Supply Shop        838 HL Road Frame - Black, 44    858.900000
       606 Little Bicycle Supply Shop        738 LL Road Frame - Black, 52    809.328000
*/
That’s pretty slick, huh?

Shred XML!

Using the .nodes() function, coupled with the .value() and .query() functions, we can use APPLY to do some cool tricks with XML.

For the first 10 JobCandidates, let’s pull information out of the Resume column, which is of type XML. We’ll get their Name and the schools (there might be more than one) that they attended, listing them in order of their graduation date.:

with xmlnamespaces 
(
  'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume' as ns
)
select JobCandidateID
      ,Name
      ,Education=stuff(EduList,1,2,'')
from HumanResources.JobCandidate
cross apply
  Resume.nodes('/ns:Resume') F_ResumeNode(ResumeNode)
cross apply 
  ResumeNode.nodes('(./ns:Name)') F_NameNode(NameNode)
cross apply 
  (select Name=NameNode.value('(./ns:Name.First[1])','nvarchar(50)')
              +' '
              +NameNode.value('(./ns:Name.Last[1])','nvarchar(50)')
  ) F_Name
cross apply 
  (select EduList=ResumeNode.query('for $p in (./ns:Education)
                                    order by $p/ns:Edu.EndDate
                                    return concat("; ",string($p/ns:Edu.School))'
                                   ).value('.','nvarchar(200)')
  ) F_Edu
where JobCandidateID<=10
/*
JobCandidateID Name                 Education
-------------- -------------------- ----------------------------------------------------
             1 Shai Bassli          Midwest State University
             2 Max Benson           Evergreen High School ; Everglades State College
             3 Krishna Sunkammurali Western University
             4 Stephen Jiang        Louisiana Business College of New Orleans
             5 Thierry D'Hers       Université d'Aix-Marseille
             6 Christian Kleinerman Lycée technique Émile Zola ; Université de Perpignan
             7 Lionel Penuchot      Université de Lyon
             8 Peng Wu              Western University
             9 Shengda Yang         Evergreen High School ; Soutern State College
            10 Tai Yee              Midwest State University
*/
As Miley Cyrus would say: That’s really cool.

Introduce New Columns!

This is probably the best use of APPLY because it makes code so much more clear.

Consider the following query, which groups the 2002 Sales by Month. That’s done by the DATEADD/DATEDIFF logic, but it has to be repeated in the GROUP BY and the SELECT and the ORDER BY:

select Mth=datename(month
                   ,dateadd(month
                           ,datediff(month,'19000101',OrderDate)
                           ,'19000101'))
      ,Total=sum(TotalDue)
from Sales.SalesOrderHeader 
where OrderDate>='20020101'
  and OrderDate<'20030101'
group by dateadd(month
                ,datediff(month,'19000101',OrderDate)
                ,'19000101')
order by dateadd(month
                ,datediff(month,'19000101',OrderDate)
                ,'19000101')
/*
Mth              Total
--------- ------------
January   1605782.1915
February  3130823.0378
March     2643081.0798
April     1905833.9088
May       3758329.2949
June      2546121.9618
July      3781879.0708
August    5433609.3426
September 4242717.7166
October   2854206.7518
November  4427598.0006
December   3545522.738
*/
I don’t know about you, but I hate all that repetition, and it looks a little busy. So APPLY to the rescue:

select Mth=datename(month,FirstDayOfMth)
      ,Total=sum(TotalDue)
from Sales.SalesOrderHeader 
cross apply 
  (
    select FirstDayOfMth=dateadd(month
                                ,datediff(month,'19000101',OrderDate)
                                ,'19000101')
  ) F_Mth
where OrderDate>='20020101'
  and OrderDate<'20030101'
group by FirstDayOfMth
order by FirstDayOfMth
/*
Mth              Total
--------- ------------
January   1605782.1915
February  3130823.0378
March     2643081.0798
April     1905833.9088
May       3758329.2949
June      2546121.9618
July      3781879.0708
August    5433609.3426
September 4242717.7166
October   2854206.7518
November  4427598.0006
December   3545522.738
*/
Now isn’t that much clearer as to what’s going on? And it costs nothing at all! The query plans of both of the queries above are exactly the same!

Perform Complicated Calculations!

This is the part of APPLY that I really love. Let’s look at an example.

Let’s say that you have a table of comma-delimited lists of one or more integers:

create table #t
(
   ID int identity(1,1)
  ,ListOfNums varchar(50)
)
insert #t
values ('279,37,972,15,175')
      ,('17,72')
      ,('672,52,19,23')
      ,('153,798,266,52,29')
      ,('77,349,14')
select * from #t
/*
ID ListOfNums
-- -----------------
 1 279,37,972,15,175
 2 17,72
 3 672,52,19,23
 4 153,798,266,52,29
 5 77,349,14
*/
Your job: Pull out only the rows that have the 4th number in the list less than 50 and sort the output by the 3rd number in the list.

Easy, right? Ha ha ha ha ha ha ha hee hee hee hee hee hee ho ho ho ho haw haw giggle chuckle guffaw!

Before the APPLY operator, SQL2000 folks would have to resort to something ludicrous like this in order to accomplish this task:

select ID
      ,ListOfNums
from #t
where substring(ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
      charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)+1,
      (charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
      charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)+1)-
      charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
      charindex(',',ListOfNums+',,,,')+1)+1))-1)
      < 50
order by substring(ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',
         charindex(',',ListOfNums+',,,,')+1)+1,(charindex(',',ListOfNums+',,,,',
         charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)-
         charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1))-1)
/*
ID ListOfNums
-- -------------
 2 17,72
 5 77,349,14
 3 672,52,19,23
 1 279,37,972,15
*/
But now, through the magic of APPLY, you can have a much clearer query:

select ID
      ,ListOfNums
from #t
cross apply (select WorkString=ListOfNums+',,,,') F_Str
cross apply (select p1=charindex(',',WorkString)) F_P1
cross apply (select p2=charindex(',',WorkString,p1+1)) F_P2
cross apply (select p3=charindex(',',WorkString,p2+1)) F_P3
cross apply (select p4=charindex(',',WorkString,p3+1)) F_P4      
cross apply (select Num3=convert(int,substring(WorkString,p2+1,p3-p2-1))
                   ,Num4=convert(int,substring(WorkString,p3+1,p4-p3-1))) F_Nums
where Num4<50
order by Num3
/*
ID ListOfNums
-- -------------
 2 17,72
 5 77,349,14
 3 672,52,19,23
 1 279,37,972,15
*/
See how I used APPLY to write a little program of a sort? First, I added commas to the end of the column to account for possibly missing numbers in the list. Then I calculated the position of the first comma in that string (p1). Then I calculated the position of the second comma (p2), and that can only be done by using the p1 position I calculated in the previous step. I continue on getting the position of the third and fourth comma. And now that I have those, I can pull out Num3 (from between the second and third comma) and Num4 (from between the third and fourth comma). And I can now use those values in my WHERE and ORDER BY clause.

And the best part? NO COST! The above two queries are exactly the same as far as the optimizer is concerned. All those CROSS APPLYs are glommed together into a Compute Scalar operator, essentially coming up with really complicated expressions like you see in the first query. Take a look at the query plan yourself and you’ll see.

Replace the UNPIVOT operator!

Throw the UNPIVOT operator out the window… The optimizer really translates it into an APPLY operator under the hood anyway… and you can have control over NULLs and differing datatypes.

Look at the following example, which is a function that accepts a CustomerID and spits out information on the customer in a vertical fashion (note that there can be multiple contacts for a customer… this just spits out the first one… thus the ROW_NUMBER() logic):

create function VerticalMainOfficeData
(
  @CustomerID int
)
returns table
as
return
with BaseData as
(
  select SeqNo=row_number() over (order by ContactType)
        ,Name,AddressLine1,AddressLine2,City,StateProvinceName
        ,PostalCode,CountryRegionName
        ,ContactType,ContactName,Phone,EmailAddress
        ,YearOpened,NumberEmployees,Specialty 
        ,SquareFeet,Brands,AnnualSales
  from AdventureWorks.Sales.vStoreWithDemographics
  cross apply (select ContactName=isnull(Title+' ','')
                                 +FirstName+' '
                                 +isnull(MiddleName+' ','')
                                 +LastName
                                 +isnull(' '+Suffix,'')) F_Name
  where CustomerID=@CustomerID
    and AddressType='Main Office'
)
select Property,Value
from BaseData
cross apply 
   (values ('NAME AND ADDRESS:','')
          ,('  Name',Name)
          ,('  Address',AddressLine1)
          ,('  ',AddressLine2)
          ,('  City',City)
          ,('  State/Province',StateProvinceName)
          ,('  Postal Code',PostalCode)
          ,('  Country/Region',CountryRegionName)
          ,('','')
          ,('CONTACT:','')
          ,('  Type',ContactType)
          ,('  Name',ContactName)
          ,('  Phone',Phone)
          ,('  EmailAddress',EmailAddress)
          ,('','')
          ,('DEMOGRAPHIC INFO:','')
          ,('  Year Opened',str(YearOpened,4))
          ,('  Number of Employees',convert(varchar(10),NumberEmployees))
          ,('  Specialty',Specialty)
          ,('  Square Feet',convert(varchar(10),SquareFeet))
          ,('  Brands',Brands)
          ,('  Annual Sales','$'+convert(varchar(20),AnnualSales,1))) P(Property,Value)
where SeqNo=1
  and Value is not null
Watch it in action:

select * from VerticalMainOfficeData(34)
/*
Property              Value
--------------------- -----------------------------
NAME AND ADDRESS:     
  Name                Cycles Wholesaler & Mfg.
  Address             Science Park South, Birchwood
                      Stanford House
  City                Warrington
  State/Province      England
  Postal Code         WA3 7BH
  Country/Region      United Kingdom
                      
CONTACT:              
  Type                Owner
  Name                Ms. Barbara J. German
  Phone               1 (11) 500 555-0181
  EmailAddress        barbara4@adventure-works.com
                      
DEMOGRAPHIC INFO:     
  Year Opened         1999
  Number of Employees 15
  Specialty           Touring
  Square Feet         21000
  Brands              4+
  Annual Sales        $800,000.00
*/
Can UNPIVOT do that? Not in a million years. It’s toast!

Make JOINs Extinct!
`
Okay, this is really kind of a joke, but really, think about it… What is a JOIN? For each row in the first table, I want to JOIN it somehow with a row or rows in the second table. That sounds like an APPLY type of thing, doesn’t it? Well it is!

Look at the following traditional JOIN query, which finds all the Accessories that are Yellow, Blue, White:

select SubCategoryName=s.Name 
      ,p.ProductID
      ,ProductName=p.Name
      ,p.Color
from Production.ProductSubCategory s 
join Production.Product p on s.ProductSubcategoryID=p.ProductSubcategoryID 
where s.ProductCategoryID=3  --Accessories
  and p.Color in ('Yellow','Blue','White')
order by SubCategoryName 
        ,p.ProductID   
/*
SubCategoryName ProductID ProductName                     Color
--------------- --------- ------------------------------- ------
Jerseys               881 Short-Sleeve Classic Jersey, S  Yellow
Jerseys               882 Short-Sleeve Classic Jersey, M  Yellow
Jerseys               883 Short-Sleeve Classic Jersey, L  Yellow
Jerseys               884 Short-Sleeve Classic Jersey, XL Yellow
Socks                 709 Mountain Bike Socks, M          White
Socks                 710 Mountain Bike Socks, L          White
Socks                 874 Racing Socks, M                 White
Socks                 875 Racing Socks, L                 White
Vests                 864 Classic Vest, S                 Blue
Vests                 865 Classic Vest, M                 Blue
Vests                 866 Classic Vest, L                 Blue
*/
You can replace that JOIN with a CROSS APPLY:

select SubCategoryName=s.Name 
      ,p.ProductID
      ,ProductName=p.Name
      ,p.Color
from Production.ProductSubCategory s 
cross apply (select *
             from Production.Product 
             where ProductSubcategoryID=s.ProductSubcategoryID) p
where s.ProductCategoryID=3  --Accessories
  and p.Color in ('Yellow','Blue','White')
order by SubCategoryName 
        ,p.ProductID   
/*
SubCategoryName ProductID ProductName                     Color
--------------- --------- ------------------------------- ------
Jerseys               881 Short-Sleeve Classic Jersey, S  Yellow
Jerseys               882 Short-Sleeve Classic Jersey, M  Yellow
Jerseys               883 Short-Sleeve Classic Jersey, L  Yellow
Jerseys               884 Short-Sleeve Classic Jersey, XL Yellow
Socks                 709 Mountain Bike Socks, M          White
Socks                 710 Mountain Bike Socks, L          White
Socks                 874 Racing Socks, M                 White
Socks                 875 Racing Socks, L                 White
Vests                 864 Classic Vest, S                 Blue
Vests                 865 Classic Vest, M                 Blue
Vests                 866 Classic Vest, L                 Blue
*/
The query plans for both of those are exactly the same!

And LEFT JOINs can be replaced by OUTER APPLYs!

Today UNPIVOTs and JOINs… Tomorrow the world! Bwu hu hu ha ha ha ha haaaaaaa (Diabolical laughter).

Do it ALL!

For my final example, I’ll do ALL of the above (except for the JOIN replacement, which was just kind of a joke/trick anyway).

In doing the examples above, my query cache got populated with the text and plans of the queries I executed. We will look in the cache for the CROSS APPLY(TOP 3) query that was in the Execute SubQueries! section above, shred its query plan, looking for the operators, figure out their percentage cost, and list them in descending order of that cost. For Scans and Seeks and Joins, we will show the table, column and/or index used. And it will be presented in a vertical manner.

Note that the challenge here is finding the cost of each operator… it is not stored in the plan. Each operator has a Total Subtree Cost, but that is the cost of the operator itself PLUS the Subtree Costs of each of its immediate children operators. So for each operator, I had to find its children, total up their Subtree Costs and subtract that from the Subtree Cost of the operator to get the Individual Cost of the operator. This is done in the CROSS APPLY of the OperatorCosts CTE.

Hopefully the comments are self-explanatory:

with xmlnamespaces 
(
  default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
,OperatorData as
(
  select ParentNodeID
        ,NodeID
        ,OperatorDesc 
        ,ScanSchema,ScanTable,ScanIndex
        ,LoopSchema,LoopTable,LoopColumn
        ,HashSchema,HashTable,HashColumn
        ,SubTreeCost 
  from sys.dm_exec_query_stats qs
  cross apply 
    --Get the Query Text
    sys.dm_exec_sql_text(qs.sql_handle) qt             
  cross apply 
    --Get the Query Plan
    sys.dm_exec_query_plan(qs.plan_handle) qp
  cross apply
    --Get the RelOp nodes from the Plan
    qp.query_plan.nodes('//RelOp') F_RelNodes(RelNode)
  cross apply 
    --Pull out the various attributes from the RelOp Node
    --And also extract the ParentNodeID of the operator
    (select ParentNodeID=RelNode.value('(../../@NodeId)','int')
           ,NodeID=RelNode.value('(./@NodeId)','int')
           ,LogicalOp=RelNode.value('(./@LogicalOp)','varchar(50)')
           ,PhysicalOp=RelNode.value('(./@PhysicalOp)','varchar(50)')
           ,SubTreeCost=RelNode.value('(./@EstimatedTotalSubtreeCost)','float')
    ) F_OpInfo
  cross apply 
     --Make a nice description out of the Operator
     (select OperatorDesc=case
                            when LogicalOp=PhysicalOp
                            then PhysicalOp
                            else PhysicalOp+' ('+LogicalOp+')'
                          end
    ) F_OpDesc
  outer apply 
    --Get child nodes having to do with a Scan/Seek
    --Note that OUTER APPLY is used since there may not
    --be any child nodes of this type
    RelNode.nodes('(./IndexScan[1]/Object[1])') 
    F_ScanNode(ScanNode)
  outer apply 
    --And pull out their Table/Index information
    (select ScanSchema=ScanNode.value('(./@Schema)','varchar(50)')
           ,ScanTable=ScanNode.value('(./@Table)','varchar(50)')
           ,ScanIndex=ScanNode.value('(./@Index)','varchar(100)')
    ) F_ScanInfo
  outer apply 
    --Get child nodes having to do with Nested Loops
    --Note that OUTER APPLY is used since there may not
    --be any child nodes of this type
    RelNode.nodes('(./NestedLoops[1]/OuterReferences[1]/ColumnReference[1])') 
    F_LoopNode(LoopNode)
  outer apply 
    --And pull out their Table/Column information
    (select LoopSchema=LoopNode.value('(./@Schema)','varchar(50)')
           ,LoopTable=LoopNode.value('(./@Table)','varchar(50)')
           ,LoopColumn=LoopNode.value('(./@Column)','varchar(50)')
    ) F_LoopInfo
  outer apply
    --Get child nodes having to do with Hash Joins
    --Note that OUTER APPLY is used since there may not
    --be any child nodes of this type
    RelNode.nodes('(./Hash[1]/HashKeysBuild[1]/ColumnReference[1])') 
    F_HashNode(HashNode)
  outer apply
    --And pull out their Table/Column information
    (select HashSchema=HashNode.value('(./@Schema)','varchar(50)')
           ,HashTable=HashNode.value('(./@Table)','varchar(50)')
           ,HashColumn=HashNode.value('(./@Column)','varchar(50)')
    ) F_HashInfo
  where qt.text like '%select top 3 ProductID%'
    and qt.text not like '%with xmlnamespaces%'  --Exclude this query
)
,OperatorCosts as
(
  --Calculate the Individual Costs by subtracting each Operator's
  --SubTreeCost minus its immediate children's SubTreeCosts
  select NodeID
        ,OperatorDesc 
        ,ScanSchema,ScanTable,ScanIndex
        ,LoopSchema,LoopTable,LoopColumn
        ,HashSchema,HashTable,HashColumn
        ,OperatorCost=convert(numeric(16,8),SubTreeCost-ChildrenSubTreeCost)
  from OperatorData o
  cross apply 
    --Calculate the sum of the SubTreeCosts of the immediate children
    (select ChildrenSubTreeCost=isnull(sum(SubTreeCost),0)
     from OperatorData
     where ParentNodeID=o.NodeID) F_ChildCost
)
,CostPercents as
(
  --Calculate the CostPercent using a window function
  select NodeID
        ,OperatorDesc 
        ,ScanSchema,ScanTable,ScanIndex
        ,LoopSchema,LoopTable,LoopColumn
        ,HashSchema,HashTable,HashColumn
        ,CostPercent=convert(numeric(5,1),100*OperatorCost/sum(OperatorCost) over ())
  from OperatorCosts       
)
select Information
from CostPercents
cross apply
  --UNPIVOT the information into a vertical presentation
  (values ('NodeID '+convert(varchar(5),NodeID)
          +' ('+convert(varchar(10),CostPercent)+'%):')
         ,('  '+OperatorDesc)
         ,('    Table: '+ScanSchema+'.'+ScanTable)
         ,('    Index: '+ScanIndex)
         ,('    Table: '+LoopSchema+'.'+LoopTable)
         ,('    Column: '+LoopColumn)
         ,('    Table: '+HashSchema+'.'+HashTable)
         ,('    Column: '+HashColumn)) P(Information)
where Information is not null   --Eliminate NULL rows
order by CostPercent desc
/*
Information
----------------------------------------------------------------
NodeID 13 (19.6%):
  Sort (TopN Sort)
NodeID 15 (19.6%):
  Sort
NodeID 9 (18.2%):
  Clustered Index Scan
    Table: [Sales].[CustomerAddress]
    Index: [PK_CustomerAddress_CustomerID_AddressID]
NodeID 11 (14.5%):
  Clustered Index Seek
    Table: [Sales].[Store]
    Index: [PK_Store_CustomerID]
NodeID 20 (8.7%):
  Clustered Index Seek
    Table: [Sales].[SalesOrderDetail]
    Index: [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]
NodeID 5 (5.8%):
  Hash Match (Inner Join)
    Table: [Person].[Address]
    Column: AddressID
NodeID 12 (4.9%):
  Clustered Index Seek
    Table: [Sales].[Customer]
    Index: [PK_Customer_CustomerID]
NodeID 18 (4.7%):
  Index Seek
    Table: [Sales].[SalesOrderHeader]
    Index: [IX_SalesOrderHeader_CustomerID]
NodeID 30 (2.3%):
  Clustered Index Seek
    Table: [Production].[Product]
    Index: [PK_Product_ProductID]
NodeID 7 (0.6%):
  Index Seek
    Table: [Person].[StateProvince]
    Index: [AK_StateProvince_StateProvinceCode_CountryRegionCode]
NodeID 8 (0.6%):
  Index Seek
    Table: [Person].[Address]
    Index: [IX_Address_StateProvinceID]
NodeID 6 (0.2%):
  Nested Loops (Inner Join)
    Table: [Person].[StateProvince]
    Column: StateProvinceID
NodeID 3 (0.2%):
  Nested Loops (Inner Join)
    Table: [Sales].[CustomerAddress]
    Column: CustomerID
NodeID 0 (0.0%):
  Nested Loops (Inner Join)
    Table: [Sales].[SalesOrderDetail]
    Column: ProductID
NodeID 1 (0.0%):
  Nested Loops (Inner Join)
    Table: [Sales].[Customer]
    Column: CustomerID
NodeID 2 (0.0%):
  Nested Loops (Inner Join)
    Table: [Sales].[CustomerAddress]
    Column: CustomerID
NodeID 19 (0.0%):
  Compute Scalar
NodeID 16 (0.0%):
  Compute Scalar
NodeID 17 (0.0%):
  Nested Loops (Inner Join)
    Table: [Sales].[SalesOrderHeader]
    Column: SalesOrderID
NodeID 14 (0.0%):
  Stream Aggregate (Aggregate)
*/
Is that, like, waaaaay cool, or what?!

I hope I’ve convinced you how powerful the APPLY operator can be. I couldn’t live without it.

Documenting Your SQL Agent Jobs

Wednesday, April 6th, 2011
XKCD RTFMAs a developer, I hate writing documentation.

And people in general hate reading (or just don’t read) documentation or manuals, as indicated by the XKCD comic.

So why in the world would I go through the crazy exercise of writing a stored procedure to generate documentation for SQL Agent Jobs?

Well, for one thing, I guess I must be a masochist.

Second of all, I enjoy a challenge.

Third, I’m not really writing documentation… I’m just, er, regurgitating it.

And fourth, I have to admit that it does come in very handy when I go in to visit a new client and find out what they’ve got running. ”What? You SHRINK your databases every single night? Do you like poking yourself in the head with an icepick too?”

This documentation I’m spitting out is not just a bunch o’ columns that get plopped into an SSMS grid results window… I went the extra mile and created actual HTML code to generate nice professional content suitable for viewing in your favorite browser (and, when printed, it’s also suitable for wrapping fish, training puppies, and lining birdcages).

Here’s a sample of the output (click on the picture to see a larger version):

Documentation Sample in Browser

If that makes you drool, then keep on reading. If it makes you yawn, then go take a nap.

The name of the stored procedure is called usp_SQLAgentJobDocumentation, and you can download it from my SkyDrive and install it into whatever database you wish.

It just generates a (multi-row) single-column NVARCHAR(MAX) result called HTM. You could just run it in SSMS, but the output is not going to do you much good unless you copy/paste it into a text file. (Important note: Direct the output to GRID, not to TEXT, because a TEXT output window can truncate some of the output. Once it has output to GRID, click in the grid and do a Select All (CTRL+A) and then Copy/Paste).

You can create a (CmdExec) job that uses BCP to output the procedure’s contents to a file, as in the following example:

bcp "exec YOURDATABASEHERE.YOURSCHEMAHERE.usp_SQLAgentJobDocumentation" 
    queryout "X:\SomeFolder\$(ESCAPE_DQUOTE(MACH))_SQLAgentDocumentation.htm" 
    -S$(ESCAPE_SQUOTE(SRVR)) -T -c -w
(If you have a local named instance, you may want to add $(ESCAPE_DQUOTE(INST)) to the filename).

Alternately, you can create an SSIS Package that will execute the stored procedure and output the contents to a Flat File Destination.

Then, you can browse away in the file to your heart’s content.

I made my best attempt to FULLY document the jobs, including multiple schedules, alerts, all possible notifications, etc… The only thing that is left out is anything involving Target Servers, mainly because I couldn’t test it out, but I imagine it’s as easy as JOINing in the sysjobservers and systargetservers tables of the msdb database. If someone would like to test it for me, leave a comment for me or send me an email and I’ll send off a revised copy that will address that.

One extra tidbit is that the documentation shows the average job duration over the last 100 executions, so you can get a good idea of how long the job takes to run. In addition, each individual Job Step shows the duration of its last execution. Both are spelled out in xx Hours xx Mins xx Secs easy-to-read format.

By the way, I did not include any information as to the last date/time executed or next date/time to be executed, because I figured it would be outdated within a half-hour of producing the documentation.

The procedure is just one huge single SELECT statement, with liberal use of CTE’s and CROSS APPLY’s (so it will only run in SQL2005 and beyond). Currently it will generate HTML for ALL jobs, but if you want to revise it to accept some kind of parameters to filter only certain jobs (by Job_ID or Name or whatever), it’s easy enough to do. The procedure’s first CTE is called SelectedJobs and you can make your modifications there:

alter procedure usp_SQLAgentJobDocumentation
  --Optional parameters here to filter jobs you want
as
with SelectedJobs as
(
  --If you want to add parameters to the procedure to filter out
  --certain jobs, you can do it in the WHERE clause here
  select *
  from msdb.dbo.sysjobs j
  --where job_id=@Job_ID
  --where name like '%'+@JobNamePattern+'%'
  --where category_id=@JobCategoryID
)
...
There were a couple of challenges in putting this together. Namely…

One: In my last blog post, I talked about handling the goofy integer representations of dates and times and durations in the sysjobhistory table.

Two: Books Online is a little spotty in giving detail about some of the msdb table columns, so I had to search around the web or use trial-and-error to find out how some of the columns worked. One example is the flags column in the sysjobsteps table, which (I found) is used to represent the various outputs of a Job Step. Books Online simply says “Reserved” for the description of this column, which is no help whatsoever.

Three: I had to unpivot the various data into the various HTML table structures, but that was easily accomplished via a CROSS APPLY, which I’ll talk about a little next week in my T-SQL Tuesday post. And getting the various HTML output to come out in the correct order was something I had to always keep in mind. I also had to worry about various HTML encodings of the ampersand and less-than and greater-than characters.

Four: Finally, translating all the various sysschedules columns into a readable English phrase was fun, coming up with possible phrases like ”Every 20 Minutes From 6:00am to 10:00pm Every 2 Months on the 3rd Wednesday of the Month”. I realize now that a few others have done this already (like SQLFool Michelle Ufford), but I’m glad I attacked it from scratch anyway. The code from that CTE is below if you’re interested.

I’m sure there are 3rd-party products that produce stuff like this, but I don’t care. I did it because I can, and it was fun. Yes, I’m a SQL nerd.

I hope you find this to be useful. I must admit it’s been a great help to me and my clients.

select schedule_id
      ,name
      ,SchedDesc=TimeOfDay+Frequency+EffDtRange
from msdb.dbo.sysschedules 
cross apply 
  --Translate the dates and times into DATETIME values
  --And translate the times into HH:MM:SSam (or HH:MMam) strings
  (select StDate=convert(datetime
                        ,convert(varchar(8),active_start_date))
         ,EnDate=convert(datetime
                        ,convert(varchar(8),active_end_date))
         ,StTime=convert(datetime
                        ,stuff(stuff(right(1000000+active_start_time
                                          ,6)
                                    ,3,0,N':')
                              ,6,0,N':'))
         ,EnTime=convert(datetime
                        ,stuff(stuff(right(1000000+active_end_time
                                          ,6)
                                    ,3,0,N':')
                              ,6,0,N':'))
  ) F_DtTm
cross apply
  --Translate the times into appropriate HH:MM:SSam or HH:MMam char formats
  (select replace(replace(replace(substring(lower(convert(varchar(30),StTime,109))
                                           ,13,14)
                                 ,N':000',N'')
                         ,N':00a',N'a')
                 ,N':00p',N'p')
         ,replace(replace(replace(substring(lower(convert(varchar(30),EnTime,109))
                                           ,13,14)
                                 ,N':000',N'')
                         ,N':00a',N'a')
                 ,N':00p',N'p')
  ) F_Tms(StTimeString,EnTimeString)
cross apply 
  --What Time of Day? Single Time or Range of Times/Intervals
  (select case 
            when freq_subday_type=0
            then N''
            else case 
                   when freq_subday_type=1
                   then N'At '
                   else N'Every '
                       +convert(nvarchar(10),freq_subday_interval)
                       +' '
                       +case freq_subday_type
                          when 2 then N'Second'
                          when 4 then N'Minute'
                          when 8 then N'Hour'
                        end
                       +case 
                          when freq_subday_interval=1 then N'' else N's' end
                       +N' From '
                 end
                +StTimeString
                +case
                   when freq_subday_type=1
                   then N''
                   else N' to '+EnTimeString
                 end
                +N' '
          end
  ) F_Tm(TimeOfDay)
cross apply
  --Translate Frequency  
  (select case freq_type
            when 1
            then N'One Time Only'
            when 4
            then N'Every '
                +case freq_interval 
                   when 1
                   then N'Day'
                   else convert(nvarchar(10),freq_interval)+N' Days'
                 end
            when 8
            then N'Every '
                +case freq_recurrence_factor
                   when 1
                   then N''
                   else convert(nvarchar(10),freq_recurrence_factor)+N' Weeks on '
                 end
                +stuff(case when freq_interval& 1<>0 then N', Sunday' else N'' end
                      +case when freq_interval& 2<>0 then N', Monday' else N'' end
                      +case when freq_interval& 4<>0 then N', Tuesday' else N'' end
                      +case when freq_interval& 8<>0 then N', Wednesday' else N'' end
                      +case when freq_interval&16<>0 then N', Thursday' else N'' end
                      +case when freq_interval&32<>0 then N', Friday' else N'' end
                      +case when freq_interval&64<>0 then N', Saturday' else N'' end
                      ,1,2,N'')
            when 16
            then N'Every '
                +case freq_recurrence_factor 
                   when 1
                   then N'Month '
                   else convert(nvarchar(10),freq_recurrence_factor)+N' Months '
                 end
                +N'on the '
                +convert(nvarchar(10),freq_interval)
                +case 
                   when freq_interval in (1,21,31)
                   then N'st'
                   when freq_interval in (2,22)
                   then N'nd'
                   when freq_interval in (3,23)
                   then N'rd'
                   else N'th'
                 end
                +N' of the Month'
            when 32
            then N'Every '
                +case freq_recurrence_factor 
                   when 1
                   then N'Month '
                   else convert(nvarchar(10),freq_recurrence_factor)+N' Months '
                 end
                +N'on the '
                +case freq_relative_interval 
                   when  1 then N'1st '
                   when  2 then N'2nd '
                   when  4 then N'3rd '
                   when  8 then N'4th '
                   when 16 then N'Last '
                 end
                +case freq_interval 
                   when  1 then N'Sunday'
                   when  2 then N'Monday'
                   when  3 then N'Tuesday'
                   when  4 then N'Wednesday'
                   when  5 then N'Thursday'
                   when  6 then N'Friday'
                   when  7 then N'Saturday'
                   when  8 then N'Day'
                   when  9 then N'Weekday'
                   when 10 then N'Weekend Day'
                 end
                +N' of the Month'
            when 64
            then N'When SQL Server Agent Starts'
            when 128
            then N'Whenever the CPUs become Idle'
            else N'Unknown'
          end
  ) F_Frq(Frequency)
cross apply
  --When is it effective?
  (select N' (Effective '+convert(nvarchar(11),StDate,100)
         +case  
            when EnDate='99991231'
            then N''
            else N' thru '+convert(nvarchar(11),EnDate,100)
          end
         +N')'           
  ) F_Eff(EffDtRange)

SysJobHistory Outliers

Tuesday, April 5th, 2011
Bell Curve with Standard DeviationsIn honor of NCAA March Madness, former basketball star (and current SQL MVP and SQLRockStar) Thomas LaRock (b | t) wrote a series of blog posts about SQL Server System Tables entitled, appropriately enough, March Madness. They ran from Mar17 to Apr04. Check them out… there is sure to be something in there that you didn’t know about before or that you’ll find to be really cool and useful.

Which brings me to this post. I really enjoyed the concept of Tom’s Apr02 post on the sysjobhistory table in msdb. He had a query that found the SQL Agent Jobs that are outliers… in other words, jobs that are atypical in terms of their run duration… to put it still another way, this query found jobs that ran longer than what would seem to be statistically “normal”.

Tom’s query looked at the previous 24 hours’ worth of job history and found those job steps that ran longer than the average duration plus two standard deviations (i.e. the mean plus two sigmas). In the picture of the bell curve above, that is in the green and gray zone at the right.

I thought this was a great concept, but I wanted to expand on it and introduce some additional features.

First of all, as Tom mentioned, the sysjobhistory table stores dates and times as integers, which is a real pain in the posterior. The dates are stored as YYYYMMDD integers and the times (and durations) are stored as HHMMSS integers. So, for example, if I look at a random entry in my sysjobhistory table…

select top 1 run_date, run_time, run_duration
from msdb.dbo.sysjobhistory
/*
run_date run_time run_duration
-------- -------- ------------
20110302    74856          106
*/
the run_time of 74856 means that the item started at 07:48:56 and the run_duration of 106 means that the item ran for a total of 00:01:06, or 1 minute and 6 seconds. So my first job was to translate these goofy integer representations into something meaningful.

The run_date is easy, since we are beyond the Middle Ages and all years that we deal with are 4 digits, so I could just directly translate a date by converting the integer to a VARCHAR(8) and then converting that to a DATETIME:

select top 1 convert(datetime,convert(varchar(8),run_date))
from msdb.dbo.sysjobhistory
/* 2011-03-02 00:00:00.000 */
The run_time takes a little more work, because the CONVERT function insists on times being in HH:MM:SS format, with 2 digits for each element, meaning leading zeroes had to be there. So first I would introduce the leading zeroes I need by adding 1000000 to the number and then taking the RIGHT-most 6 characters (T-SQL will implicitly convert the integer to a VARCHAR before doing the RIGHT function):

select top 1 right(1000000+run_time,6)
from msdb.dbo.sysjobhistory
/* 074856 */
Then it’s a matter of introducing the colons to separate the elements:

select top 1 stuff(stuff(right(1000000+run_time,6),3,0,':'),6,0,':')
from msdb.dbo.sysjobhistory
/* 07:48:56 */
Now, when I combine that with my VARCHAR representation of the run_date and put them together in YYYYMMDD HH:MM:SS format, I can CONVERT it into a complete date and time of the item:

select top 1 convert(datetime,convert(varchar(8),run_date)
                             +' '
                             +stuff(stuff(right(1000000+run_time,6),3,0,':'),6,0,':'))
from msdb.dbo.sysjobhistory
/* 2011-03-02 07:48:56.000 */
The run_duration requires a little more manipulation to translate the HHMMSS integer representation into just a number of seconds (so the value of 106, representing 00:01:06, is translated into 66 seconds):

select top 1 run_duration/10000*3600   --Hours*3600 = Seconds
            +run_duration%10000/100*60 --Minutes*60 = Seconds
            +run_duration%100          --Seconds
from msdb.dbo.sysjobhistory
/* 66 */            
 
So now that I have those formulas down, I can put together a query that would give me more meaningful information. This query will give me information for successfully-completed jobs… Entries with a Step_ID of 0 indicate a date/time/duration of the job as a whole:

select job_id
      ,date_executed=convert(datetime,convert(varchar(8),run_date))
                                     +' '
                                     +stuff(stuff(right(1000000+run_time
                                                       ,6)
                                                  ,3,0,':')
                                            ,6,0,':')
      ,secs_duration=run_duration/10000*3600
                    +run_duration%10000/100*60
                    +run_duration%100
from msdb.dbo.sysjobhistory
where step_id=0     --Job Outcome
  and run_status=1  --Succeeded
/*
job_id                               date_executed           secs_duration
------------------------------------ ----------------------- -------------
ECA51518-7144-49DE-9153-EB12D42AE0D9 2011-03-02 07:59:17.000            66
ECA51518-7144-49DE-9153-EB12D42AE0D9 2011-03-02 08:23:49.000            74
ECA51518-7144-49DE-9153-EB12D42AE0D9 2011-03-02 13:07:24.000            61
etc, etc, etc
*/
Now that I have that information, I can put that query into a CTE and make good use of it.

In order to find the outliers, I wanted the ability to define a certain “history window” or “baseline window” of a range of dates of job runs from which I could calculate the Average and the Standard Deviation. So the query would accept a @HistoryStartDate and @HistoryEndDate date range (I would convert the dates to full days starting/ending at midnight). Also, for a halfway meaningful sample, I wanted an option to calculate the statistics for jobs that had run a minimum number of times (represented by @MinHistExecutions). And, since we may only want to look at outliers of jobs that are kind of “beefy”, I wanted to introduce the option to only pay attention to jobs that had a minimum average duration of some kind (@MinAvgSecsDuration).

So this query would give us our history/baseline calculations:

declare @HistoryStartDate datetime = '19000101'
       ,@HistoryEndDate datetime = getdate()
       ,@MinHistExecutions int = 10
       ,@MinAvgSecsDuration int = 30
;
with JobHistData as
(
  select job_id
        ,date_executed=convert(datetime,convert(varchar(8),run_date))
                                       +' '
                                       +stuff(stuff(right(1000000+run_time
                                                         ,6)
                                                    ,3,0,':')
                                              ,6,0,':')
        ,secs_duration=run_duration/10000*3600
                      +run_duration%10000/100*60
                      +run_duration%100
  from msdb.dbo.sysjobhistory
  where step_id=0     --Job Outcome
    and run_status=1  --Succeeded
)
select job_id
      ,AvgDuration=avg(secs_duration*1.)
      ,AvgPlus2StDev=avg(secs_duration*1.)+2*stdevp(secs_duration)
from JobHistData
where date_executed>=dateadd(day,datediff(day,'19000101',@HistoryStartDate),'19000101')
  and date_executed<dateadd(day,1+datediff(day,'19000101',@HistoryEndDate),'19000101')
group by job_id
having count(*)>=@MinHistExecutions
   and avg(secs_duration*1.)>=@MinAvgSecsDuration
/*
job_id                               AvgDuration    AvgPlus2StDev
------------------------------------ ----------- ----------------
ECA51518-7144-49DE-9153-EB12D42AE0D9   69.373333 108.739927181991 
0BC1CB96-602F-417A-AB42-CAF98E1E39A1   47.266666  82.860423255389
etc, etc, etc
*/
And now that we have that, we can look for jobs in an “analysis window” date range (@AnalysisStartDate and @AnalysisEndDate)… perhaps the last 24 or 48 hours… to see which ones have a duration that exceeds the AvgPlus2StDev of the baseline.

Here is a stored procedure to do just that.

create procedure usp_SQLAgentJobOutliers
   @HistoryStartDate datetime  = null
  ,@HistoryEndDate datetime    = null
  ,@AnalysisStartDate datetime = null
  ,@AnalysisEndDate datetime   = null
  ,@MinHistExecutions int      = 10
  ,@MinAvgSecsDuration int     = 30
as
 
if @HistoryStartDate is null set @HistoryStartDate='19000101' ;
if @HistoryEndDate is null set @HistoryEndDate=getdate() ;
if @AnalysisStartDate is null set @HistoryStartDate='19000101' ;
if @AnalysisEndDate is null set @HistoryEndDate=getdate() ;
 
with JobHistData as
(
  select job_id
        ,date_executed=convert(datetime,convert(varchar(8),run_date))
                                       +' '
                                       +stuff(stuff(right(1000000+run_time
                                                         ,6)
                                                    ,3,0,':')
                                              ,6,0,':')
        ,secs_duration=run_duration/10000*3600
                      +run_duration%10000/100*60
                      +run_duration%100
  from msdb.dbo.sysjobhistory
  where step_id=0     --Job Outcome
    and run_status=1  --Succeeded
)
,JobHistStats as
(
  select job_id
        ,AvgDuration=avg(secs_duration*1.)
        ,AvgPlus2StDev=avg(secs_duration*1.)+2*stdevp(secs_duration)
  from JobHistData
  where date_executed>=dateadd(day,datediff(day,'19000101',@HistoryStartDate),'19000101')
    and date_executed<dateadd(day,1+datediff(day,'19000101',@HistoryEndDate),'19000101')
  group by job_id
  having count(*)>=@MinHistExecutions
     and avg(secs_duration*1.)>=@MinAvgSecsDuration
)
select jd.job_id
      ,JobName=j.name
      ,ExecutionDate=jd.date_executed
      ,[Duration (secs)]=jd.secs_duration
      ,[Historical Avg Duration (secs)]=AvgDuration
      ,[Min Threshhold (secs)]=AvgPlus2StDev
from JobHistData jd
join JobHistStats jhs on jd.job_id=jhs.job_id
join msdb.dbo.sysjobs j on jd.job_id=j.job_id
where date_executed>=dateadd(day,datediff(day,'19000101',@AnalysisStartDate),'19000101')
  and date_executed<dateadd(day,1+datediff(day,'19000101',@AnalysisEndDate),'19000101')
  and secs_duration>AvgPlus2StDev
 
If you don’t specify any parameters, it will use the ENTIRE sysjobhistory table for both the “history/baseline window” and the “analysis window”. Try calling the procedure with various parameters and see what you come up with. See if any jobs of yours have run with an atypically long duration.

If you want to get more granular and analyze individual Job Steps as opposed to Jobs as a whole, the following procedure will do that for you.

create procedure usp_SQLAgentJobStepOutliers
   @HistoryStartDate datetime  = null  
  ,@HistoryEndDate datetime    = null
  ,@AnalysisStartDate datetime = null
  ,@AnalysisEndDate datetime   = null
  ,@MinHistExecutions int      = 10
  ,@MinAvgSecsDuration int     = 30
as
 
if @HistoryStartDate is null set @HistoryStartDate='19000101' ;
if @HistoryEndDate is null set @HistoryEndDate=getdate() ;
if @AnalysisStartDate is null set @HistoryStartDate='19000101' ;
if @AnalysisEndDate is null set @HistoryEndDate=getdate() ;
 
with JobHistData as
(
  select job_id
        ,step_id
        ,date_executed=convert(datetime,convert(varchar(8),run_date))
                                       +' '
                                       +stuff(stuff(right(1000000+run_time
                                                         ,6)
                                                    ,3,0,':')
                                              ,6,0,':')
        ,secs_duration=run_duration/10000*3600
                      +run_duration%10000/100*60
                      +run_duration%100
  from msdb.dbo.sysjobhistory
  where step_id<>0
    and run_status=1  --Succeeded
)
,JobHistStats as
(
  select job_id
        ,step_id
        ,AvgDuration=avg(secs_duration*1.)
        ,AvgPlus2StDev=avg(secs_duration*1.)+2*stdevp(secs_duration)
  from JobHistData
  where date_executed>=dateadd(day,datediff(day,'19000101',@HistoryStartDate),'19000101')
    and date_executed<dateadd(day,1+datediff(day,'19000101',@HistoryEndDate),'19000101')
  group by job_id
          ,step_id
  having count(*)>=@MinHistExecutions
     and avg(secs_duration*1.)>=@MinAvgSecsDuration
)
select jd.job_id
      ,JobName=j.name
      ,jd.step_id
      ,s.step_name
      ,ExecutionDate=jd.date_executed
      ,[Duration (secs)]=jd.secs_duration
      ,[Historical Avg Duration (secs)]=AvgDuration
      ,[Min Threshhold (secs)]=AvgPlus2StDev
from JobHistData jd
join JobHistStats jhs on jd.job_id=jhs.job_id
join msdb.dbo.sysjobs j on jd.job_id=j.job_id
join msdb.dbo.sysjobsteps s on jd.job_id=s.job_id and jd.step_id=s.step_id 
where date_executed>=dateadd(day,datediff(day,'19000101',@AnalysisStartDate),'19000101')
  and date_executed<dateadd(day,1+datediff(day,'19000101',@AnalysisEndDate),'19000101')
  and secs_duration>AvgPlus2StDev
 
However, a word of warning: I was disappointed to find that the sysjobhistory table stores only the Step_ID and not the Step_UID unique identifier. So a Job’s Step_ID #1 today may not have been its Step_ID #1 a month ago. If you move steps up/down in the step list of the SQL Agent Job dialog, it changes their Step_ID value from that point forward, but it doesn’t change anything in history. So you may be comparing apples and oranges.

Anyway, I hope you find these procedures useful for finding those outliers. Thanks again to Thomas LaRock for the idea.

My next blog post will also be dealing with SQL Agent Jobs… I think you’ll really like that one… Stay tuned!

Shrink Your Databases Regularly

Tuesday, March 29th, 2011
Oh, no, no, no… Not that kind of shrinking! (But I did get your attention, didn’t I?)

shrink \shringk\ n. : (slang) [short for headshrinker] a clinical psychiatrist or psychologist [Tony Soprano sees his shrink, Dr. Melfi, every week.] vt. : (slang) to psychoanalyze [Shrink your databases regularly.] shrank, shrunk


A note to the reader: This is based on several true stories. Some events were changed or otherwise fictionalized for dramatic purposes.



Shrink Your Databases RegularlyDr. Ben Adryl: Good afternoon. I’m Dr. Ben Adryl, but many people just refer to me by my initials DBA. And you are Mr. Tabase, I presume?

Deigh Tabase: Yes, my first name is Deighton. Most people just call me Deigh for short.

Dr. Ben Adryl (DBA): Welcome. Now how can I help you today?

Deigh Tabase (Database): Well, I seem to have a lot of problems, and they’ve been getting worse with each passing day. It was suggested that I make an appointment with you. But I have to admit, I’ve never been to a shrink before.

DBA: I would prefer that you not use the “s” word in this office. I’m not really a traditional psychotherapist anyway… I take more of a holistic approach and work with all aspects of a person’s well-being, like sanity, health, productivity. Think of me as a life coach. I’m kind of like Dr. Phil and Dr. Oz and David Allen all rolled into one.

Database: That sounds great, doc. How do we start?

DBA: Tell me a little bit about some of your problems.

Database: Well, for one thing, I’ve been suffering from claustrophobia as long as I can remember. Plus I feel bloated and tired and overworked and I can’t seem to get organized. I guess I’m kind of a mess.

DBA: Don’t worry, all is not lost. I’m sure I can help. Let me explain how this works. I’m going to ask you several questions, using a special technique to draw more information out of your subconscience. This technique is called Dynamic Management View Querying.

Database: Will it hurt?

DBA: Oh no, not at all. Based on what I find, I may be able to give you recommendations on how you can find relief, and, in some cases, I may be able to administer treatment immediately.

Database: That would be wonderful, doc.

DBA: So tell me more about yourself… Your age, your background, etc…

select Created=convert(varchar(20),d.create_date,100)
      ,LastRead=isnull(convert(varchar(20),s.last_read_date,100),'')
      ,LastWrite=isnull(convert(varchar(20),s.last_write_date,100),'')
      ,CompLevel=d.compatibility_level
      ,RecovModel=d.recovery_model_desc
      ,LogReuseWait=d.log_reuse_wait_desc
      ,Collation=d.collation_name
      ,AutoStats=case 
                   when d.is_auto_create_stats_on=1
                   then 'Create '
                   else ''
                 end
                +case
                   when d.is_auto_update_stats_on=1
                   then 'Update '
                   else ''
                 end
                +case
                   when d.is_auto_update_stats_async_on=1
                   then 'Asynch '
                   else ''
                 end
      ,Parameterization=case
                          when d.is_parameterization_forced=1
                          then 'Forced'
                          else 'Simple'
                        end
      ,UserAccess=d.user_access_desc
      ,[State]=d.state_desc
from sys.databases d
outer apply (select last_read_date
                      =max(case
                             when last_user_scan>=isnull(last_user_seek,'19000101')
                              and last_user_scan>=isnull(last_user_lookup,'19000101')
                             then last_user_scan
                             when last_user_seek>=isnull(last_user_scan,'19000101')
                              and last_user_seek>=isnull(last_user_lookup,'19000101')
                             then last_user_seek
                             else last_user_lookup
                           end)
                   ,last_write_date=max(last_user_update)
             from sys.dm_db_index_usage_stats
             where database_id=d.database_id) s
where database_id=db_id('Deighton')
/*
(Reformatted for clarity):
Created............ Dec 18 2010  8:37PM
LastRead........... Mar 29 2011 11.24AM
LastWrite.......... Mar 29 2011 11:23AM
CompLevel.......... 100
RecovModel......... FULL
LogReuseWait....... LOG_BACKUP
Collation.......... SQL_Latin1_General_CP1_CI_AS
AutoStats.......... Create Update
Parameterization... Simple
UserAccess......... MULTI_USER
State.............. ONLINE
*/
DBA: Hmmm… Interesting.

Database: What is it, doc?

DBA: Oh nothing… Don’t worry… I often mumble to myself as I do some of this analysis. Pay it no mind. Tell me a little more about yourself …

use Deighton
go
select LogicalName=name
      ,[Type]=case when [type]=0 then 'Data' else 'Log' end
      ,SizeAllocated=convert(varchar(20),convert(decimal(12,2),size*8./1024))+'MB'
      ,SpaceUsed=convert(varchar(20),convert(decimal(12,2),UsedPages*8./1024))+'MB'
      ,[%Full]=convert(varchar(20),convert(decimal(12,1),UsedPages*100./size))+'%'
      ,SpaceLeft=convert(varchar(20),convert(decimal(12,2),(size-UsedPages)*8./1024))+'MB'
      ,[%Avail]=convert(varchar(20),convert(decimal(12,1),(size-UsedPages)*100./size))+'%'
      ,Growth=case
                when is_percent_growth=1
                then convert(varchar(20),growth)+'%'
                else convert(varchar(20),convert(decimal(12,2),growth*8./1024))+'MB'
              end
      ,MaxSize=case
                 when max_size=-1
                 then 'Unlimited'
                 else convert(varchar(20),convert(decimal(12,0),max_size*8./1024))+'MB'
               end
      ,PhysicalFile=physical_name
from sys.database_files 
cross apply (select UsedPages=fileproperty(name,'SpaceUsed')) F_Used
where [type] in (0,1)  --Rows,Log
/*
(Reformatted for clarity):
Deighton_Data (Data):
  Size:   1756.00MB
  Used:   1755.00MB (99.9%)
  Avail:     1.00MB (0.1%)
  Growth:    1.00MB
  Max:    Unlimited
  File:   C:\Microsoft SQL Server\MSSQL.1\MSSQL\data\Deighton_Data.MDF
Deighton_Log (Log):
  Size:  14198.81MB
  Used:  13745.87MB (96.8%)
  Avail:   452.95MB (3.2%)
  Growth:       10%
  Max:    Unlimited
  File:   C:\Microsoft SQL Server\MSSQL.1\MSSQL\data\Deighton_Log.LDF
*/
DBA: Ahhh… I see… You mentioned earlier that you suffer from claustrophobia.

Database: Yes, all the time. And often I have these anxiety or panic attacks where I kind of explode, and it makes me feel a little better, but not for long.

DBA: Well, I can cure you of that immediately.

Database: You’re kidding… Really?

DBA: Yes. Now just hold still while I…

/*
The database felt claustrophobic because it was using up 99.9% of its allocated space.
And its FILEGROWTH is only 1MB (the default), so every time it reached full capacity and
had an anxiety attack, which was often, it only experienced "autogrowth" of a piddly 
little 1MB, which might have made it feel relatively better for a short while, but it 
would still be only 1MB away from being full capacity again and having another attack.  
If the database's initial allocated size upon creation was 200MB, then it had over 1500 
autogrowth anxiety attacks in its lifetime!
So DBA bumped the allocated size up to 3500MB to relieve the claustrophobia and,
at the same time, he changed the FILEGROWTH to 100MB. This was just a preventative
measure, in case the database unexpectedly reached that 3500MB capacity.  But DBA will
diligently monitor the used space and adjust sizes appropriately long before the
database uses up that space.
*/
alter database Deighton
modify file (name=N'Deighton_Data'
            ,size=3500MB
            ,filegrowth=100MB)
DBA: How does that feel?

Database: Omigosh, doc! That’s amazing! I feel so free now! Yippee!

DBA: I think I can take care of that bloating problem you mentioned earlier also. One moment…

/*
The database's transaction log was HUGE. It was about 14GB in size, filled up
to 96.8% capacity.  That's compared to its actual data, which was only 1.7GB.
Since the database has a Recovery Model of FULL, its transaction log will 
continue to grow larger and larger forever... unless a transaction log backup
is performed.  That log backup will truncate the contents of the transaction
log.  Note that a full backup of the database WILL NOT truncate the transaction
log automatically... Log backups must be performed in order to do that.
*/
backup log Deighton
to disk=N'C:\SomeBackupLocation\DeightonLog-yyyymmdd-hhmmss.trn'
DBA: How’s that?

Database: Much better, doc!

DBA: Perhaps at another time I can make a further adjustment, though it is a procedure that I very rarely perform.

Database: Okay.

DBA: Now tell me about diet and exercise. Do you exercise regularly?

Database: Well… I… er…

/*
Obtain the 5 most recent backups of each type performed.
*/
with BackupDetail as
(
  select BackupType
        ,backup_finish_date
        ,RowNum=row_number() over (partition by BackupType 
                                   order by backup_finish_date desc)
  from msdb.dbo.backupset 
  cross apply (select BackupType=case [type]
                                   when 'D' then 'Full'
                                   when 'I' then 'Differential'
                                   when 'L' then 'Log'
                                   when 'F' then 'File'
                                   when 'G' then 'Differential File'
                                   when 'P' then 'Partial'
                                   when 'Q' then 'Differential Partial'
                                   else 'N/A'
                                 end) F_Type
  where database_name='Deighton'
)
select BackupType
      ,backup_finish_date
from BackupDetail
where RowNum<=5
order by BackupType
        ,RowNum
/*
BackupType backup_finish_date
---------- -----------------------
Full       2011-03-20 23:01:33.000
Full       2011-03-04 22:19:52.000
Full       2011-02-24 22:43:39.000
Full       2011-02-19 22:24:03.000
Full       2011-02-11 23:11:42.000
Log        2011-03-29 11:35:31.000  <==This is the log backup just performed
*/
DBA: Hmmm… It seems pretty sporadic.

Database: Yeah, well…

DBA: Listen, Deighton, this is very important. I know you’re the kind of guy who wants to live life to the FULLest. In order to really do that, you have to do regular exercise and have a healthy diet. Look at it this way… If something happens to you and you get very sick or are in a bad accident, you want to get better quickly and RESTORE yourself to FULL health, don’t you?

Database: Yes.

DBA: You don’t want to be a SIMPLEton and only be restored to a fraction of yourself, do you?

Database: No, I suppose not.

DBA: Then we have to get you into an exercise regimen [full and differential backups] and you should be drinking water regularly during the day [log backups] to help flush out your system. I know it seems like a pain, but it can be really easy and doesn’t take all that much time. To get you motivated, I can put you in touch with a personal trainer named C. Quill Agent who will stay on top of these things for you, making sure that you do them regularly.

Database: Thanks, doc.

DBA: Don’t mention it. Now, on to other things… You mentioned overwork and disorganization?

Database: Yeah. I feel like I’m doing unnecessary work at my place of business. I’m trying to remedy this by using a strategy that I saw in a self-help book, but it doesn’t seem to work. My workflow still seems incredibly inefficient, and I just feel kind of scatter-brained, if you know what I mean.

DBA: Yes, I think I have an idea of what you’re talking about. Tell me about your methods of organization…

use Deighton
go
select TableName=object_name(dm.object_id)
      ,IndexName=i.name
      ,IndexType=dm.index_type_desc
      ,[%Fragmented]=avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),null,null,null,'sampled') dm
join sys.indexes i on dm.object_id=i.object_id and dm.index_id=i.index_id 
order by avg_fragmentation_in_percent desc
/*
(Names of Tables and Indexes Disguised):
TableName            IndexName               IndexType               %Fragmented
-------------------- ----------------------- ------------------ ----------------
qufi_efferhmiqfz     rhaqwqamxim             NONCLUSTERED INDEX 99.9925228054434
qufi_efferhmiqfz     foliqemi                NONCLUSTERED INDEX 99.9788779993241
qufiz                zuamriwiy               NONCLUSTERED INDEX 99.9643747773424
qufi_efferhmiqfz     qufioh                  NONCLUSTERED INDEX 99.9625355911884
qufiz                qufihefifomi            NONCLUSTERED INDEX 99.9515738498789
qufiz                rmiefihhefi             NONCLUSTERED INDEX 99.9515503875969
qufiz                lezfaphefi              NONCLUSTERED INDEX 99.9515503875969
qufiz                erfogofy_zfemfhefi      NONCLUSTERED INDEX 99.9481058640374
qufiz                erfogofy_rumplifihhefi  NONCLUSTERED INDEX 99.9481058640374
qufiz                miplerihwofhoh          NONCLUSTERED INDEX 99.9339498018494
qufiz                zuamrioh                NONCLUSTERED INDEX 99.9339061467284
qufiz                hilifih                 NONCLUSTERED INDEX 99.9093381686310
qufiz                umpheq                  NONCLUSTERED INDEX 99.9092558983666
fuhu                 PK_fuhu                 CLUSTERED INDEX    99.8109640831758
xmuwimz              PK_xmuwimz              CLUSTERED INDEX    99.7950819672131
xmuwimz              hxe                     NONCLUSTERED INDEX 97.4358974358974
qufiz                PK_qufiz                CLUSTERED INDEX    97.2856847758728
xmuwimz              lezfqemi                NONCLUSTERED INDEX 97.2222222222222
hogozouqz            PK_hogozouqz            CLUSTERED INDEX    97.0588235294118
xmuwimz              fomzfqemi               NONCLUSTERED INDEX 96.9696969696970
xmuwimz              reloriqziqamxim         NONCLUSTERED INDEX 96.9696969696970
xmuwimz              zzq                     NONCLUSTERED INDEX 95.8333333333333
xmuwimz              mzmoh                   NONCLUSTERED INDEX 95.4545454545455
xmuwimz              fizfmirumh              NONCLUSTERED INDEX 95.4545454545455
ruqferfzlezferrizzih PK_ruqferfzlezferrizzih CLUSTERED INDEX    93.7500000000000
hogozouqz            hogozouqqemi            NONCLUSTERED INDEX 92.8571428571429
xmuwimzlezferrizzih  PK_xmuwimzlezferrizzih  CLUSTERED INDEX    90.9090909090909
...And so on and so on
*/
DBA: My goodness, I can see why you’re feeling overworked… You are scattered and fragmented. What is the strategy that you said you’re trying to alleviate this?

Database: I wrote it down. Here it is… I try to do it often:

use Deighton
go
 
/*
Rebuild all Indexes and Statistics in the database
*/
declare @Table nvarchar(255)
       ,@Sql nvarchar(255)
 
declare TableCursor cursor
for
select quotename(table_schema)+'.'+quotename(table_name)
from information_schema.tables
where table_type='BASE TABLE'
 
open TableCursor
while 1=1
begin
  fetch next from TableCursor into @Table
  if @@fetch_status<>0 break
  set @Sql='alter index all on '+@Table+' rebuild'
  exec (@Sql)
  set @Sql='update statistics '+@Table
  exec (@Sql)
end
 
close TableCursor
deallocate TableCursor
 
/*
Reclaim disk space by shrinking
*/
dbcc shrinkdatabase(N'Deighton',10)
dbcc shrinkfile(N'Deighton_Data',10)
dbcc shrinkfile(N'Deighton_Log',10)
DBA: Oh my goodness, I definitely see the problem here. And I’ll bet you feel exhausted after doing this, am I right?

Database: Yeah, you got that right.

DBA: Hold on… Let me write down a few notes…

/*
The following will rebuild all indexes from scratch, even though that might not
be necessary.  Indexes should be treated on a case-by-case bses.  An index should
be REBUILT only when it has high fragmentation, REORGANIZEd if the fragmentation is 
moderate, and not touched at all if the fragmentation is very low.  
*/
set @Sql='alter index all on '+@Table+' rebuild'
exec (@Sql)
/*
By default, statistics are auto-updated by the system when a table has been
changed by a certain amount.  If a table has not changed at all, then there's no
reason to update its statistics.  What's worse with the command below is that
it is NOT updating the statistics WITH FULLSCAN.  The Index Rebuild above
automatically rebuilt statistics for indexed columns "for free" (using FULLSCAN), 
but the command below will just end up re-updating them based on only a
SAMPLE, so it's duplicating the work of the Index Rebuild and creating less
accurate statistics at the same time!  Ideally the command below should be doing
UPDATE STATISTICS ... WITH FULLSCAN, COLUMNS so that it only updates non-indexed
column statistics.  
*/
set @Sql='update statistics '+@Table
exec (@Sql)
 
/*
Oh, horror of horrors!  SHRINKing a database will just end up completely fragmenting
up all the indexes that had been painstakingly rebuilt above!  This is like shooting
yourself in the foot.  And it's essentially being done twice!  A SHRINKDATABASE 
automatically shrinks each of its data and log files, so the two SHRINKFILE commands
are just duplicating the process.
*/
dbcc shrinkdatabase(N'Deighton',10)
dbcc shrinkfile(N'Deighton_Data',10)
dbcc shrinkfile(N'Deighton_Log',10)
DBA: Okay, listen to me carefully. You must stop doing this immediately and never do it again! The first part is generally okay, except you’re spending way too much time and effort in redoing EVERYTHING from scratch. Contact a colleague of mine named Ola Hallengren… he has a much more intelligent approach to doing all of this. But the second part is a huge giant no-no, because it’s destroying everything you painstakingly did in the first part. It’s like you’ve arranged your papers on your desk in nice neat stacks and then a hurricane comes into the room and blows them all over the place. After our session, please read this article by Paul Randal on why you should not be doing this.

Database: Okay, whatever you say, doc.

DBA: It’s also possible that some of the stuff you’re organizing may be doing you more harm than good…

/*
Produce a list of all nonclustered, non-primary-key indexes in the database that
deal with more than 5000 rows.  Compare their Reads vs Writes.  The list is sorted
by Read/Write ratio.  Focus on the indexes toward the top of the list with a
Read/Write ratio of under 1.00... They are candidates for DROPping from the database.
*/
use Deighton
go
select TableName=o.Name
      ,IndexName=i.Name
      ,Rows
      ,Reads
      ,Writes
      ,[Reads/Write]
      ,Seeks=User_Seeks
      ,Scans=User_Scans
      ,Lookups=User_Lookups
      ,Definition='('+IndexColumns+')'+coalesce(' include ('+IncludeColumns+')','')
      ,DropIt='drop index '+quotename(i.Name)
             +' on '+quotename(c.name)+'.'+quotename(object_name(s.object_id))
from sys.dm_db_index_usage_stats s  
join sys.indexes i ON s.object_id=i.object_id and s.index_id=i.index_id 
join sys.objects o on s.object_id=o.object_id
join sys.schemas c on o.schema_id=c.schema_id
cross apply (select Rows=sum(p.Rows)
             from sys.partitions p
             where object_id=s.object_id and index_id=s.index_id) F_Rows
cross apply (select Reads=User_Seeks+User_Scans+User_Lookups
                   ,Writes=User_Updates) F_RW 
cross apply (select [Reads/Write]=cast(case
                                         when Writes<1
                                         then 100
                                         else 1.*Reads/Writes
                                       end as decimal(12,3))) F_RW2
cross apply (select IndexColumns
                      =stuff(
                         (select ','+c.Name
                                +case ic.Is_Descending_Key 
                                   when 1 
                                   then ' DESC' 
                                   else '' 
                                 end
                          from sys.index_columns ic
                          join sys.columns c on ic.Object_ID=c.Object_ID 
                                             and ic.Column_ID=c.Column_ID
                          where ic.Object_ID=i.Object_ID 
                            and ic.Index_ID=i.Index_ID
                            and ic.Is_Included_Column=0
                          order by ic.Index_Column_ID
                          for xml path(''))
                       ,1,1,'')
                   ,IncludeColumns
                      =stuff(
                         (select ','+c.Name
                          from sys.index_columns ic
                          join sys.columns c on ic.Object_ID=c.Object_ID 
                                             and ic.Column_ID=c.Column_ID
                          where ic.Object_ID=i.Object_ID 
                            and ic.Index_ID=i.Index_ID
                            and ic.Is_Included_Column=1
                          order by ic.Index_Column_ID
                          for xml path(''))
                       ,1,1,'')) F_IC
where s.database_id=db_id()
  and objectproperty(s.object_id,'IsUserTable')=1
  and i.type_desc='NONCLUSTERED'
  and i.is_primary_key=0
  and i.is_unique_constraint=0
  and Rows>5000
order by [Reads/Write]
        ,Reads
        ,Writes
/*
(Names of Tables and Indexes Disguised):
(Output abbreviated):
TableName              IndexName                                Reads Writes Reads/Write
---------------------- ---------------------------------------- ----- ------ -----------
eahoffmeol             eahoffmoel_rheqgilugmezfimoh                 0   1346       0.000
ehhmizziz              ehhmizziz_ehhmizzfypi                        0   1880       0.000
rheqgilug              rheqgilug_gluxelazimoh                       0  42235       0.000
zaxlihgim              zaxlihgim_gloh                               0  62264       0.000
rheqgilug_hifeolih     rheqgilug_hifeolih_rheqgilug_mezfimoh        1  37584       0.000
zaxlihgim              zaxlihgim_miruqrolexli_ehjazfmiqfoh         82  62292       0.001
zaxlihgim              zaxlihgim_rezhmiriopf_hifeolzoh            113  62323       0.002
zaxlihgim              zaxlihgim_oqguorioh                        323  62641       0.005
zaxlihgim              zaxlihgim_hozxamzexliofimoh                732  62996       0.012
phuqiz                 phuqiz_phuqi                                75   2662       0.028
eahoffmeol             eahoffmoel_zuamriwiy_zuamrioh               44   1346       0.033
rheqgilug_mezfim       rheqgilugmezfim_zizzouqoh                  491   6759       0.073
oqguori_xollexliofimz  oqguori_xollexliofimz_pulory_pleq_oh      1907   5440       0.351
miruqrolexli_loqiofimz miruqrolexli_loqiofimz_fypi              13723  19833       0.692
oqguori_xollexliofimz  oqguori_xollexliofimz_oqguorioh          11240   5440       2.066
oqguori_xollexliofimz  oqguori_xollexliofimz_zuamriwiy_zuamrioh 11465   5117       2.241
phuqiz                 phuqiz_zuamriwiy_zuamrioh                 8303   3476       2.389
miruqrolexli_loqiofimz fw_miruqrolexli_loqiofimz_fypi           75078  24978       3.006
rheqgilug_mezfim       rheqgilugmezfim_zuamriwiy_zuamrioh       38364   6759       5.676
ehhmizziz              ehhmizziz_zuamriwiy_zuamrioh             21305   2959       7.200
zopruhiz               rofy                                      2572      0     100.000
zopruhiz               zfefi                                     2572      0     100.000
zopruhiz               zopruhi                                  27203      0     100.000
*/
DBA: See the items at the top? Those are cases where you’re spending much more time filing items away than you are acquiring them later. For example, in the fourth entry in the list, you’ve spent time and energy filing 62,264 items but you’ve never once accessed one of them after the fact. Organizing your information in that way is most likely more work than it’s worth, since you aren’t really taking advantage of it anyway.

Database: Hmmm… I see what you mean, doc.

DBA: Well, it looks like we’re out of time for today. Even though we started to accomplish a lot today, we’ve just scratched the surface, so I’d like to schedule you for regular visits so that we can monitor you and administer some more treatments as necessary.

Database: Er… How long will we be doing this monitoring, doc?

DBA: Why, the rest of your life of course. Here’s my bill.

Database: Let’s see… WHAT?! That’s outrageous! I don’t have that kind of money! And you want to continue seeing me and billing me ridiculous amounts like this for the rest of my life? Are you insane? Just who the hell do you think you are? You high-and-mighty types make me want to vomit. You sit there on your high horse and think you know everything. Well, you know what you can do with this here bill? You can take it and just shove---

drop database Deighton
 
DBA: (Sigh). Another one bites the dust. Oh well… Ms. Foreachdb, would you please send in my next patient?