Archive for the ‘SQL Server 2005’ Category

Index Analysis: The Re-Return

Tuesday, December 18th, 2012

Lego ComputerA couple years back, I wrote a few blog posts about an indexing script that I use with clients from time to time.  While I provided some updates to it over the years, I haven’t pushed out anything new on it in quite some time.  Seems I got distracted with an indexing book and never got back to pushing out those updates.

Time to fix all that.  If you had previously used the script and followed the old blog posts, which have all been tagged with Index Analysis, today’s the day for an update.  If you haven’t seen these posts before, take a look.  You might find this information interesting and useful.  Especially if you need to get a handle on the indexing in your environment.

Index Analysis Script

The basic premise with the index analysis script is to provide a snapshot of the indexes within a database.  Depending on the parameters provided, the script can provide results for all of the tables in a database or a single table.  By default, the script correlates the missing index and foreign key information into the results to assist in identifying indexes that can be created.  Through the index_action column, a brief analysis of the index is performed to assist in identifying issues with the index.  The index_pros and index_cons columns provides information on things that are good and bad about the index.

Index Analysis Updates

The main updates to the script in this release are the following:

  • Rewrote information collection to reduce dynamic SQL
  • Fixed issue with Object_ID that wasn’t returning correct table
  • Added unique check and filter definition to duplication and overlapping indexes
  • Fixed errors with documentation section.
  • Added parameters @IncludeMissingIndexes, @IncludeMissingFKIndexes, and @Output
  • Removed DDL create and drop columns for indexes, too much risk in automatic index building
  • Split the script into three versions for the past three major SQL Server releases

Index Analysis Downloads

Click on the following links to download the index analysis script:

More Information

If you are looking for a few hundred pages of indexing information, you can check out Expert Performance Indexing for SQL Server 2012.  Along with that, for indexing resources from my blog, check out the index resource page.

Discovering Binary Pages In Sys.Indexes

Wednesday, October 24th, 2012

6972394436_74ebe2c1c3_qIn a previous post, we looked at how to take the the binary page values in sysindexes and convert the values into file id and page id for the first and root index page and first IAM page for each index.  This information can be useful when you need to find the base pages for indexes and don’t want to run DBCC IND on each index to retrieve those pages.

While this information is readily available in sysindexes, it doesn’t happen to be a part of sys.indexes. And since sysindexes is deprecated and potentially on it’s last hurrah in SQL Server 2012 – a new solution is needed.

If the information isn’t a part of sys.indexes, is there another place this information is available?  Fortunately the answer is yes.  There is a new undocumented dynamic management object (DMO) in SQL Server 2012 called sys.dm_db_database_page_allocations.  This DMO generally provides the same output that DBCC IND does but in a query-able format.

Two Data Elements into a Binary Value

The purpose of this post is to demonstrate how to take the file id and page id from designated pages and reconstruct the binary value that is being returned from sysindexes.  While I’m not certain on the advantages and uses for the values in the native sysindexes format, that discussion is really for another day.  The aim is just to reconstruct with no questions asked.  Ok, maybe some questions – that’s what the comment section is for.

Start With the Pages

Building the binary values for the root and first index pages and the first IAM pages requires a few steps.  The first is to identify the proper page types from the output of sys.dm_db_database_page_allocations that are needed.  There are a number of different page types returned by the DMO.  For the purpose here, the types that map to the intended outputs are  INDEX_PAGE, IAM_PAGE, and DATA_PAGE.  The query in Listing 1 returns a list of these pages for object id 245575913.



--Listing 1 – Page types from sys.dm_db_database_page_allocations

SELECT page_type_desc
,object_id
,index_id
,page_level
,allocated_page_file_id
,allocated_page_page_id
,next_page_page_id
,previous_page_page_id
FROM sys.dm_db_database_page_allocations(db_id(), NULL, NULL, NULL, 'DETAILED')
WHERE page_type_desc IN ('INDEX_PAGE','IAM_PAGE','DATA_PAGE')

Reviewing the results from the query, shown in Figure 1, show a couple interesting items that need to be considered for the final results.  First, as with the DBCC IND output, the information returned is hierarchical in nature.  Each of the pages in the index trees relate to one another (shown with red arrows) and have multiple levels (highlighted in yellow).  In order to discern the first and root pages, the pages will need to be laid out in the proper order.

10-24-2012 2-44-26 AM
Figure 1 – Results from sys.dm_db_database_page_allocations

Ordering the Pages

The next step with the data is three-fold.  First, the pages need to be ordered following the hierarchy of the index.  Then the pages need to be ranked according to their order between page levels and the hierarchy level.  And finally, the binary representation of the values needs to be created.

To start the ordering of pages, the previous and next page ids need to be used to walk down the page order for the indexes.  This is a fairly basic recursive operation that creates a page order value as each page is traversed.  This is needed since the order of pages in a database isn’t necessarily the order of the pages in the index.

The next step is to assign an order to the pages. The easiest way to order the page is to use the ROW_NUMBER() function.  Each ordering needs to be partitioned by page type, object id, and index id.  For the actual order, the rows are ordered by index level in a descending order and then by page order (the value created during recursion) in ascending.  Additionally, for heaps and clustered indexes, a page order is needed for the data pages to identify the first data page.  Of course, we won’t need more than the first two or three order pages, so the recursion can be existed after those pages have been identified.

For the last step, the binary values are generated in much the same way as was required to decode the binary value, but in reverse.  The page id and file id values are converted to binary(4) and binary(2), respectively.  Then the resulting value is split by each of the values in the binary and reassembled in reverse.

The query that accomplishes both of the tasks laid out in this section is accomplished with query in Listing 2.  Before this query will run, though, the results from Listing 1 need to be inserted into a temporary table named #page.  Otherwise, an error will be generated when running the query.


--Listing 2 – Query to order pages

WITH pages_rec AS (
SELECT *, 1 AS page_order
FROM #pages
WHERE previous_page_page_id IS NULL
UNION ALL
SELECT p.*, page_order + 1
FROM #pages p
INNER JOIN pages_rec pr ON p.page_type_desc = pr.page_type_desc
AND p.object_id = pr.object_id
AND p.index_id = pr.index_id
AND p.page_level = pr.page_level
AND p.allocated_page_page_id = pr.next_page_page_id
WHERE page_order < 3
)
SELECT pr.page_type_desc
,pr.object_id
,pr.index_id
,ROW_NUMBER() OVER (PARTITION BY pr.page_type_desc, pr.object_id, pr.index_id
ORDER BY pr.page_level DESC, page_order) page_sequence
,ROW_NUMBER() OVER (PARTITION BY pr.page_type_desc, pr.object_id, pr.index_id
ORDER BY pr.allocated_page_page_id, page_order) primary_page_sequence
,page_level
,allocated_page_page_id
,allocated_page_file_id
,SUBSTRING(CAST(allocated_page_page_id AS VARBINARY(6)),4,1)
+ SUBSTRING(CAST(allocated_page_page_id AS VARBINARY(6)),3,1)
+ SUBSTRING(CAST(allocated_page_page_id AS VARBINARY(6)),2,1)
+ SUBSTRING(CAST(allocated_page_page_id AS VARBINARY(6)),1,1)
+ SUBSTRING(CAST(allocated_page_file_id AS VARBINARY(6)),2,1)
+ SUBSTRING(CAST(allocated_page_file_id AS VARBINARY(6)),1,1) page_binary
FROM pages_rec pr

The results from the query, shown in Figure 2, show how the page_sequence and heap_page_sequence have ordered the pages appropriately for each index and page type.  Also, the page_binary value for each page has been calculated and ready to be used when the proper pages are identified for each of the needed values.

10-24-2012 2-51-17 AM
Figure 2 – Results from sys.dm_db_database_page_allocations

Pivoting the Results

At this point, the pages are ordered and the data is ready to pick through to for the proper values for each of the required columns.  For the root column, if the index is a heap then there is no root to the index, since a heap isn’t an index.  Otherwise, the root will be the first INDEX_PAGE in the page sequence.  The first column for heaps and clustered indexes will be the first DATA_PAGE in the primary_page_sequence.  For non-clustered indexes it is the first page in the page_sequence if the first page has a level of 0.  Otherwise, it is the second page in the page_sequence.  And for the first IAM column, regardless of the index type or heap, the value is always the first IAM_PAGE in the database.  This logic is all included in the query in Listing 3.


--Listing 3 – Pivot page binary values

SELECT ps.OBJECT_ID
,ps.index_id
,MAX(CASE WHEN index_id = 0 THEN CAST(0 AS BINARY(6))
WHEN page_type_desc = 'INDEX_PAGE' AND page_sequence = 1 THEN page_binary
END) AS root
,MAX(CASE WHEN index_id IN (0,1) AND page_type_desc = 'DATA_PAGE' AND primary_page_sequence = 1 THEN page_binary
WHEN page_type_desc = 'INDEX_PAGE' AND page_sequence = 1 AND page_level = 0 THEN page_binary
WHEN page_type_desc = 'INDEX_PAGE' AND page_sequence = 2 THEN page_binary
END) AS first
,MAX(CASE WHEN page_type_desc = 'IAM_PAGE' AND page_sequence = 1 THEN page_binary END) AS first_iam_page
FROM #page_sequence ps
GROUP BY ps.OBJECT_ID
,ps.index_id

Reviewing these results, shown in FIgure 3, provides the same values as querying sysindexes does.  WIth these results, it is now possible to use DMOs in SQL Server 2012 and forward to continue to retrieve the columns root, first, and FirstIAM which are missing from sys.indexes

10-24-2012 3-00-14 AM
Figure 3 – Results from pivot page binary values

Wrapping Up

Digging into all of this has been an interesting exercise.  I found out a bit about how the SQL Server team identifies different page types and how to access them without the need for DBCC commands.  One thing that might be tempting, with these queries, is to join them all together into a single CTE.  You might want to avoid that – with my machine, the CPU fan got a good workout when trying to get the values for every index in AdventureWorks2012.  Instead, just change the object id in the first query to NULL and run each individually – the results should return pretty quickly.

After reading through this post, if you have any comments, please leave them below.  Also, if you are interested in how to map compatibility views to the their replacement objects, see my blog series Lost in Translation – Deprecated System Tables,

Related posts:

  1. Decoding first, root, and FirstIAM in Sysindexes
  2. XQuery for the Non-Expert – Binary Values
  3. What Happened to rowmodctr from Sysindexes?

Decoding first, root, and FirstIAM in Sysindexes

Tuesday, October 23rd, 2012

Have you ever looked at the first, root, and FirstIAM columns in sysindexes and wondered…

“What do these values mean?”

Recently, I took the time to dig into these values to figure out what they meant.  And then, how to take the information that they contain and turn it into something useful.

By the Book

To begin with, sysindexes is a documented compatibility view, so there is some information on these values within Books Online.  According to that source, the columns contain the following:

  • First: Pointer to the first or root Index page.
  • Root: Pointer to the root Index page
  • FirstIAM: Not defined, but is actually the first Index Allocation Mapping (IAM) page.

The pointer value is stored as a binary(6) data type that actually contains two pieces of information.  The first is the page id for the data page this is being references and the second is the file id where the page exists.

Along with those details about the columns are a few other items about the columns, which don’t necessarily apply to this post.  For instance, there are some special conditions based on partitioning and whether the indexes is a heap, clustered, or non-clustered.

In this post, we’ll look at the columns first, root, and FirstIAM and turn them into something useful.  To start, each of these columns contains two pieces of information.  They each contain the file id and the page id for the page in the database that the column references.

Binary Unpacking

Now that we’ve defined the columns in the sections above, the next step is to unpack the values stored in the binary format.  Since there are two values, the key to extracting the values is to understand where they are stored and the how to reconstruct them into the page id and file id.

To start, as already mentioned, the values are stored in binary value with a length of 6.  That means that each of the values are represented in text by 2-characters, as shown in Figure 1.  The first four binary values are the page id and the last two values are the file id.

Binary Page

Figure 1 – Binary representation of database page

Of course, it can’t be quite this simple.  The values in the binary values are stored in reverse and need to be flipped before they can be converted to an INT.  Thus, the file id is comprised of values 5 and 6 and need to be rearranged to 6 and 5 before being converted to an int.  And for the page id, the values 1, 2, 3, and 4 are required and need to re-ordered into 4, 3, 2, then 1, after which they can be converted to an int.

To demonstrate retrieving the file id and page id from a binary value, execute the code in Listing 1.  This statement will return a file id of 1 and a page id of 4621.  With this information at hand, you can use your favorite DBCC command to dig in and uncover more information about the index.


--Listing 1 – Query for file id and page id

DECLARE @BinaryPage BINARY(6)
SET @BinaryPage = 0x0D1200000100
SELECT FileID = CAST(SUBSTRING(@BinaryPage,6,1)
+ SUBSTRING(@BinaryPage,5,1) AS INT)
,PageID = CAST(SUBSTRING(@BinaryPage,4,1)
+ SUBSTRING(@BinaryPage,3,1)
+ SUBSTRING(@BinaryPage,2,1)
+ SUBSTRING(@BinaryPage,1,1) AS INT)

Applying to sysindexes

The most useful way to use this logic, though, is not going to be through a one-time, one-value query.  Instead, the need will be to push it into a query for sysindexes and dig out the file id and page id values from there.  To do this, use the query in Listing 2 and you will be able to get the values split apart for the First, Root, and FirstIAM pages.


--Listing 2 – Query binary values from sysindexes

SELECT id
,indid
,name
,first
,CAST(SUBSTRING(first,6,1)
+ SUBSTRING(first,5,1) AS INT)
AS first_file_id
,CAST(SUBSTRING(first,4,1)
+ SUBSTRING(first,3,1)
+ SUBSTRING(first,2,1)
+ SUBSTRING(first,1,1) AS INT)
AS first_page_id
,root
,CAST(SUBSTRING(root,6,1)
+ SUBSTRING(root,5,1) AS INT)
AS root_file_id
,CAST(SUBSTRING(root,4,1)
+ SUBSTRING(root,3,1)
+ SUBSTRING(root,2,1)
+ SUBSTRING(root,1,1) AS INT)
AS root_page_id
,firstiam
,CAST(SUBSTRING(firstiam,6,1)
+ SUBSTRING(firstiam,5,1) AS INT)
AS firstiam_file_id
,CAST(SUBSTRING(firstiam,4,1)
+ SUBSTRING(firstiam,3,1)
+ SUBSTRING(firstiam,2,1)
+ SUBSTRING(firstiam,1,1) AS INT)
AS firstiam_page_id
FROM sysindexes

Wrapping Up

Hopefully, if you are digging into these, you find this useful.  As I was looking at these, there really weren’t any sources that explained how to do these – just a few forum posts here and there that discussed a little but not all of this information.  The question that I have for people reading this is – why and where would you use these values?  Is this a short cut to, or way to avoid, using DBCC IND?

If you are interested in how to map compatibility views to the their replacement objects, see my blog series Lost in Translation – Deprecated System Tables,

Related posts:

  1. Discovering Binary Pages In Sys.Indexes
  2. What Happened to rowmodctr from Sysindexes?
  3. Lost in Translation – Deprecated System Tables – Introduction

What Happened to rowmodctr from Sysindexes?

Monday, October 22nd, 2012

5300385405_d9747596a5_q

In my blog series Lost in Translation – Deprecated System Tables, I’ve been going through the compatibility views in SQL Server and translating and mapping them over to catalog views and dynamic management objects.  One of the ones that I was working on recently was mapping sysindexes to sys.indexes.  While working on that post, there is one question, from myself and others, that came up time and time again…

What happened to rowmodctr from Sysindexes?

When looking at the schema and definition for sys.indexes, you’ll notice that the column rowmodctr is not included in the catalog view.  Due to changes in the purpose of catalog views versus dynamic management views, it was no longer appropriate to store this value in the catalog view.  But then, where did it go?

Rowmodctr Definition

Before figuring out where the column went, let’s start by first looking at what the rowmodctr column provides.  According to Books Online, the column rowmodctr returns counts of how many rows have been inserted, updated, or deleted on tables.  Or to put it quite simply, the rowmodctr value is a measurement of change for the index.

One thing to note is that prior to SQL Server 2005, the value for rowmodctr was calculated slightly different than it is today.  BY this I mean that  that with SQL Server 2000, the value provided wouldn’t be quite the same as it is now in SQL Server 2012, or in 2005 or 2008.  Regardless of which calculation for rowmodctr your are looking for, the resulting value is nearly the same – so with either, you can continue reading along in this post and find a new manner to calculate rowmodctr.

Rowmodctr Purpose

It is, of course, useful to want to know why people are interested in rowmodctr and what it provides that is useful.  As mentioned earlier, rowmodctr measures the amount of change that has occurred to an index.  It’s a fairly generic measure of change though – for every insert, update, or delete – the value is incremented.  It doesn’t matter if the change was on all rows or a single row – change is change and the counter is incremented.

The reason that people do care about these values is that in conjunction with the STATS_DATE() function, the rowmodctr value can assist in statistics maintenance scenarios.  When considering large environments with many tables and many indexes and a high throughput of queries, getting the query plans right every time is kind of a big deal.  Statistics are critical in making this happen.  But when the volume of work to update all statistics is large, how do you dicide which statistics to update first?

The most often used solution is the STATS_DATE() function, but that function only tells you the age of the statistics.  Just being isn’t a definitive reason to update statistics.  But if you combine the age of the statistics with a metric identifying the amount of change on the table, then you can start to prioritize which statistics to update first.

Redefining Rowmodctr

We are now left with the problem that we know what rowmodctr is and why it is useful, but where did it go?  Is it still there and available to use?

As mentioned in the introduction, sys.indexes is a catalog view, which are primarily focused on storing metadata about database structures.  The information that provides a replacement to the rowmodctr value can instead be found in the dynamic management object sys.dm_db_index_operational_stats.

By using the columns leaf_insert_count, leaf_delete_count, leaf_update_count, and leaf_ghost_count, you arrive at all of the rows that are inserted, updated and deleted from a table or an index.  Through these, you can see the change that is happening and get an idea for what the rate of change is, along with the amount of data that has changed.  The query in LIsting 1 lists the rowmodctr calculation for all of the indexes in the database by using the DMO.


--Listing 1 – Rowmodctr from sys.dm_db_index_operational_stats

SELECT object_id
,index_id
,leaf_insert_count
+ leaf_delete_count
+ leaf_update_count
+ leaf_ghost_count AS rowmodctr
FROM sys.dm_db_index_operational_stats(DB_ID(),NULL,NULL,NULL)

Wrapping Up

WIth this all detailed out, what are your thoughts?  Do you agree that you can use the DMO to get to the same information that was provided from rowmodctr?  Do you trust the results?  Let me know what you think in the comments below.

Follow-Up

Ben Thul (Blog | @SpartanSQL), my former co-worker, pointed out the modification_counter in sys.dm_db_stats_properties.  This column can be used to do what the rowmodctr did in pre-SQL Server 2005 versions of SQL Server.  The catch with this DMO is that it is only available in SQL Server 2008 R2 Service Pack 2 and, soon to be, SQL Server 2012 Service Pack 1.

Related posts:

  1. Decoding first, root, and FirstIAM in Sysindexes
  2. Discovering Binary Pages In Sys.Indexes
  3. Lost in Translation – Deprecated System Tables – sysfulltextcatalogs

Get a Free SQL Server Indexing Book

Wednesday, October 3rd, 2012

What would you say if I offered you the best indexing book on the market for SQL Server 2012?

Wow, right?  The “best” indexing book!  Unfortunately, this is also the “worst” indexing book.  Coincidentally, it is also the “only” indexing book for SQL Server 2012.  And even though the book was written with SQL Server 2012 in mind, most of it applies to SQL Server 2005 and 2008.  So many qualities for one little book… what is this book?

The book that I am speaking of is Expert Performance Indexing for SQL Server 2012; which I wrote earlier this year  with Ted Krueger (Blog | @Onpnt).

One of the perks in writing books is that they send you a box of the books.  I’ve given a few away here and there, but still have a number laying around.  So… let’s have a contest.  I’m going to make this one easy.  Leave a comment on a post between now and my birthday – October 11 – and I’ll enter you into a drawing.  It doesn’t matter which post you comment on – just leave a comment and for each, I’ll enter you into this drawing.

I’ll take all the entries that I get by 8 AM PST on October 11 and send them over to Sarah Strate (Blog | @DanceM0m).  She’ll pick one and then I’ll mail it out to you.  Good luck…


Follow me on Twitter at StrateSQL.

Original article: Get a Free SQL Server Indexing Book

©2012 Strate SQL. All Rights Reserved.

Related posts:

  1. Book: Expert Performance Indexing for SQL Server 2012
  2. Free SQL Server Security Book
  3. Book Status #1: Indexing for Performance

Can You Dig It? – Determining Index Plan Usage

Friday, May 4th, 2012
She Can Dig It!

She Can Dig It!

A while back I was asked, if you can use the plan cache to determine which plans are using specific indexes.  The answer is, of course, yes, you can.  And it’s important that people, especially DBAs, understand how to do this.  Usually during index analysis, we concern ourselves with whether an index is being used.  If it isn’t used, it becomes a candidate for removal.  After a few months of monitoring index usage, you are often safe to remove indexes that never get used.

But what about the indexes that are used infrequently?  What if an index is only used once a day or once a week.  Is it worth the resources for maintaining the index and the storage space dedicated to the index?  Frequency of use is only one measurement of the value of an index.  To get the whole picture, you need to know what the index is doing when it is being used.  In other words, what business process is the index supporting and how will that process function without the index in place.

Since we know the tables in our databases, we often have a good idea how tables are used.  But we don’t really know how the indexes are being used.  To allow us to continue to be able to drop indexes, we need to be able to demonstrate where infrequently used indexes are being used and detail what dropping the index will affect.  While this information is not stored as a set of statistics in dynamic management objects (DMOs), it can be retrieved by querying the SHOWPLAN XML available in sys.dm_exec_query_plan.

Index Usage Setup

Before we can begin with demonstrating this query, we’ll first setup a scenario that uses the index PK_SalesOrderHeader_SalesOrderID in a few queries.  The first two queries select individual orders from the Sales.SalesOrderHeader table and the last returns all records from the table.  The purpose of the variations of SELECT statements is to also demonstration how eve similar queries can generate different plans.


USE AdventureWorks2012
GO

SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 10000

SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 51531

SELECT * FROM Sales.SalesOrderHeader

Index Usage Query

When querying sys.dm_exec_query_plan, the query_plan column provides an XML representation of the execution plan in the SHOWPLAN XML format.  As noted in previous posts, in this series, there is a lot of great information in execution plans that can be used to performance tune and troubleshoot your servers.

In this example, we are interested in plans that are using PK_SalesOrderHeader_SalesOrderID.  This can be found in the Object element under the StmtSimple XML element.  There are a number of attributes in the Object element.  The one we are interested in today is the Index attribute.  This attribute is quite literally the index that the execution plan is using to satisfy the query request.  By leveraging the exist XQuery function, we can quickly identify any execution plans that are using this particular index.


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO

DECLARE @IndexName sysname = 'PK_SalesOrderHeader_SalesOrderID';
SET @IndexName = QUOTENAME(@IndexName,'[');

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
,IndexSearch
AS (
SELECT qp.query_plan
,cp.usecounts
,ix.query('.') AS StmtSimple
FROM sys.dm_exec_cached_plans cp
OUTER APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY qp.query_plan.nodes('//StmtSimple') AS p(ix)
WHERE query_plan.exist('//Object[@Index = sql:variable("@IndexName")]') = 1
)
SELECT StmtSimple.value('StmtSimple[1]/@StatementText', 'VARCHAR(4000)') AS sql_text
,ixs.usecounts
,obj.value('@Database','sysname') AS database_name
,obj.value('@Schema','sysname') AS schema_name
,obj.value('@Table','sysname') AS table_name
,obj.value('@Index','sysname') AS index_name
,ixs.query_plan
FROM IndexSearch ixs
CROSS APPLY StmtSimple.nodes('//Object') AS o(obj)
WHERE obj.exist('//Object[@Index = sql:variable("@IndexName")]') = 1

As you can can see in the results, the output returns all of the queries that used the index from the demonstration script.  Each appeared twice, since I ran the demo script twice as well.  One bit of interest, the first two queries, while identical except for the value used in the predicate, resulted in two separate plans.  The first plan uses the variable that is a smallint and the second uses an int.  This is a situation where parameterizing the plans may prevent plan cache bloating.

Index Usage Wrap-Up

As was mentioned in the introduction, it is important to know the true value of an index before you make plans to drop it.  Just because the index is infrequently used doesn’t mean that the index isn’t used in critical processes.  I recall past situations where the elimination of an index used once a day increased import times from minutes to hours.  When you need to use this in your environment, simply change the value for the parameter in the script and dig into you plan cache.

Do you see a use for doing this in your environment?  Or do you remember a time that you could have used a query like this?

Related posts:

  1. Can You Dig It? – Plan Cache Series
  2. Can You Dig It? – The Plan Cache
  3. Can You Dig It? – Parallelism in the Plan Cache

Turning the sp_BLITZ SQL Server Takeover Into a General Health Report

Thursday, April 19th, 2012

As many of you have probably noticed, I haven’t blogged in quite a while due to work commitments, health issues and various family commitments (don’t want to go on too long here with excuses J), but I decided a perfect first blog post back might be taking the stored procedure that a friend and his consulting group have so graciously shared with the community. I am, of course, speaking of Brent Ozar’s sp_BLITZ stored procedure intended to help a DBA see what they are inheriting when someone dumps a new server on them. I kind of took a different twist on this and figured that this might be a great tool to use on all of the SQL Servers in an environment periodically by creating a report around it.

Some Background

I work as the lead DBA in an environment with over 160 SQL Server instances (a number that seems to grow by at least five or so every quarter) and somewhere in excess of 2,000 databases ranging in size from what some of you might consider laughably small to some rather large data warehouse databases, many of which are mission critical to our business. To manage this environment, I have a team of two other DBA’s that I lead. One started with the company the week after I started and the other, a junior DBA, has been with us just over a year. We have a great team, but even with three DBA’s, it is hard to be proactive without some tools to let you know what is going on. Unfortunately, we don’t have the budget for some of the major monitoring tools as the cost for our environment would be rather substantial. Needless to say, it is left to me and my team to be creative and create our tools and instrumentation. That is where Brent’s sp_BLITZ script comes in. With a report written around it that my junior DBA can go through on a weekly or monthly basis, we can be much more proactive with some of the more basic or fundamental settings that someone who shouldn’t have access to change, but always inevitably does, changes without our knowledge.

 

The Report

So, the report itself is pretty simple. Unfortunately, it does require that you have a server that has linked servers to all of your servers (we have a centralized DBA server that we use for this) and the sp_BLITZ script that can be downloaded from here has to be installed on each of these servers. This is a perfect use for the SQL Server 2008 Central Management Server feature that we have set up on our DBA monitoring server. What I have done in the report is created two datasets, one that queries a table that we maintain with an inventory of all of our SQL Servers which will feed the “Server Name” report parameter and the second which actually runs the sp_BLITZ stored procedure on the server that has been chosen from the dropdown. Brent has a great video on exactly what his script does at http://www.brentozar.com/blitz/. This report just gives you a format that you can go out and run off of your Reporting Services site or even schedule to run automatically in a Reporting Services subscription and have it automatically emailed to you or posted out in a document library on a SharePoint site if you are running Reporting Services in SharePoint integrated mode. This report does require that your Reporting Services service is at least 2008 R2 in order to work. One of the nice things about this report is that the URLs that Brent provides in the output for this stored procedure are active links in this report, so if you click in that URL cell, you will be taken to the page on Brent’s site that explains the Finding. Below are some screenshots of the report in collapsed and expanded form (all private information has been blacked out to protect the innocent or at least those who sign my paycheck J):

 

    

Figure 1 Collapsed Version of Report

 

    

    

Figure 2 Expanded View of Report

 

Setting Up The Report

So, to use the report that is freely downloadable at the end of this blog post, all you need to do is go into the Data Source for the report and change it to the name of your monitoring SQL Server or at least a server that has linked servers to all of the servers that you want to manage with this report, like so, replacing the text <Type Your Monitoring Server Here> with the name of your monitoring server.:

    

    

 

The next step is to make sure that you have a table on your monitoring server that has an inventory list of all of the servers from your environment and replace the <ServerName.database.schema.tablename> text in the query in the Servers Dataset with the pertinent information for your environment. See below:

 

    

 

 

From here, it is just a matter of deploying the report to your Reporting Services server and making sure that Brent’s stored procedure has been created on all of the servers that you wish to monitor.

 

The report can be downloaded here (you will need to go to Brent’s site mentioned earlier in this blog post to get the latest version of his sp_BLITZ script). I hope that you find this to be one of the many helpful tools in your tool chest to keep your environment in check.

    

Execution Plan for a User Defined Function

Monday, March 26th, 2012

When you execute a multi-statement user-defined function you may see an execution plan that looks something like this:

image

It appears as if the cost of the UDF is free. This is especially true if you use the UDF in a query with other objects, such as joining it to actual tables. Since the optimizer always assumes a multi-statement UDF has a single row for statistics estimates, it’ frequently displays a low cost.

But you know that there’s more going on there, right? It’s a multi-statement UDF because it’s doing a lot of work, but that is not reflected in the execution plan.. or is it?

What if we went after the cache? Let’s run this little query:

SELECT  deqp.query_plan,
dest.text,
SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
(deqs.statement_end_offset – deqs.statement_start_offset)
/ 2 + 1) AS actualstatement
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   deqp.objectid = OBJECT_ID(‘dbo.ufnGetContactInformation’);

This query doesn’t do anything really fancy. I’m using the sys.dm_exec_query_stats which holds aggregated performance metrics for any query currently in cache because it has the plan_handle that lets me look up execution plans in sys.dm_exec_query_plan. It just so happens that you also get the objectid there in the plan so we can look for the specific plan that corresponds to our UDF (I’m using an example from AdventureWorks2008R2). The results from the query look like this:

image

Each row represents a statement that has been executed from the UDF. There’s just a single plan for the UDF, and it looks like this:

image

Without even drilling down to all the details of what’s going on in the plan, I think we can agree, that this UDF is not zero cost, but may in fact be extremely expensive depending on what’s going on inside that execution plan.

Why is this useful? Because it lets you look behind that simplistic… lie, that is shown as a plan for the execution of a UDF to see what the UDF is actually doing. This may make it possible to improve the performance of the UDF since you’ll be able to identify missing indexes and other information thanks to your direct access to what’s going on inside the UDF.

I had demoed this during my 24 Hours of PASS presentation and was asked to put the code up on my blog, so here you go. I hope it’ proves helpful.

How can I tell if a SQL Server system is affected by Forwarded records?

Thursday, March 8th, 2012

Update: This post is participating in the 4th installment of the T-SQL Tuesday hosted this time by Mike Walsh.

One of the side affects of having Heaps in a system is the possibility of generating forwarded records. When reading data from a heap, Forwarded records (if they are present in a heap) can generate extra, random and potentially physical IOs when compared to a table with clustered index and this may hurt the performance if one has to read lot of data from that heap. Note that having a heap doesn’t mean it is guaranteed to have forwarded records. Enough has been said and blogged about the details of forwarded records already and in this post I am NOT going to rehash the details of forwarded records and how it affects the performance but will go over an example and a script that will help to find out if forwarded records were used to fetch the data in the heap. Armed with that information one can see the true cost of these extra IOs generated by forwarded records. Looking at sys.dm_db_index_physical_stats shows if forwarded records are present in the heap but it doesn’t necessarily tell if the workload is indeed affected by them, how much and how many times. In the script I am going to show at the end of this post, one can see how many times forwarded records have been used to retrieve the data and understand the total cost of *extra IOs* and the performance implications of forwarded records.

I will be borrowing Kalen’s example for this test and here is the script for the test data.
USE 
AdventureWorks
GO

IF OBJECT_ID ('Details''U'IS NOT NULL
DROP TABLE Details;
GO

CREATE TABLE dbo.Details
([SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL);
GO
INSERT INTO dbo.Details
SELECT [SalesOrderID]
,[SalesOrderDetailID]
,[CarrierTrackingNumber]
,[OrderQty]
,[ProductID]
,[SpecialOfferID]
,[UnitPrice]
,[UnitPriceDiscount]
FROM [AdventureWorks].[Sales].[SalesOrderDetail];
GO

SELECT OBJECT_NAME(OBJECT_IDAS Object
page_count
avg_page_space_used_in_percent
forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), OBJECT_ID ('dbo.Details'),  NULL, NULL, 'DETAILED');
GO
SET STATISTICS IO ON
GO
-- A table scan takes as many reads as there are pages, i.e. 856 in this case
SELECT FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO

/*
Table 'Details'. Scan count 1, logical reads 856, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.*/

-- Now add a new fixed width column and note that this is a
-- metadata only change
-- The data pages are not modified
-- There is no change in the fullness of the pages

ALTER TABLE dbo.Details ADD notes CHAR(100);
GO

SELECT OBJECT_NAME(OBJECT_IDAS Object
page_count
avg_page_space_used_in_percent
forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), OBJECT_ID ('dbo.Details'),  NULL, NULL, 'DETAILED');
GO
-- The data pages are not affected until we run the following update.
-- Every row on every page will get an additional 100 bytes in the notes field
--  added to it
UPDATE dbo.Details
SET notes 'notes';
GO

-- note there are LOTS of forwarded records now (84408),
-- and many more pages the table (2709)
SELECT OBJECT_NAME(OBJECT_IDAS Object
page_count
avg_page_space_used_in_percent
forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), OBJECT_ID ('dbo.Details'),  NULL, NULL, 'DETAILED');
GO
SET STATISTICS IO ON
GO
-- The number of reads is not just the number of pages as we would expect for
-- a scan of a heap, but is equal to the
-- number of pages PLUS the number of forwarded records:
--  84408 + 2709 = 87117
-- During a scan, the forwarded pointers are followed for EACH row, and then
-- SQL Server goes back to the original position to continue the scan

SELECT FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO

/*Table 'Details'. Scan count 1, logical reads 87117, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.*/
So far we have seen the existence of forwarded records using sys.dm_db_index_physical_stats and the number of logical reads go up using SET STATISTICS IO ON. The next query will tell you how many times forwarded records have been traversed to satisfy the queries against the heap.

SELECT

DB_NAME(database_id) AS database_name

, OBJECT_NAME(OBJECT_ID) AS OBJECT_NAME

, forwarded_fetch_count

FROM sys.dm_db_index_operational_stats (DB_ID('AdventureWorks'), OBJECT_ID('dbo.Details'), NULL, NULL)

In our example, our heap had 84408 forwarded records and so far the entire heap was traversed 2 times 84408 * 2 = 168816
/*
database_name  object_name forwarded_fetch_count
AdventureWorks Details     168816
*/

Before we take off, lets take the forwarded records for a spin and see if the DMV sys.dm_db_index_operational_stats displays the true count of how many times these forwarded records are used.

SELECT * FROM dbo.Details;

GO 10
 
SELECT

DB_NAME(database_id) AS database_name

, OBJECT_NAME(OBJECT_ID) AS OBJECT_NAME

, forwarded_fetch_count

FROM sys.dm_db_index_operational_stats (DB_ID('AdventureWorks'), OBJECT_ID('dbo.Details'), NULL, NULL)

So far our heap was traversed 2 times and we spinned the table another 10 times above with a total of 12 * 84408 forwarded records = 1012896.
/*
database_name  object_name forwarded_fetch_count
AdventureWorks Details     1012896
*/

If forwarded records are a new concept for you then I would urge you to look at these excellent posts by Kalen, Paul and Simon on this topic. This post is participating in the 4th installment of the T-SQL Tuesday hosted by Mike.
whats-worse-than-a-table-scan
fragmentation-and-forwarded-records-in-a-heap
Forwarding-and-forwarded-records-and-the-back-pointer-size
INSERT and UPDATE loading practice – The impact of forwarding pointers.

Execution Plans, What Do I Look At?

Monday, January 30th, 2012

lookThe question came up, what are the top 3-5 things that you look at in a query. I realized then that I hadn’t really written them down. There are some areas that let me know pretty much immediately where problems lie within a plan. You don’t necessarily know the answer from looking at these things, but you know you have a problem. Being me, I couldn’t stick to 5, so here are 6:

  1. Immediately check the properties on the first operator (SELECT/DELETE/INSERT). There’s tons of information in here and people frequently just ignore it. You can see if the plan is based on full optimization or not. That immediately tells me if I’m working on the optimizer’s best estimate at a plan or I’m looking at a timeout. If it’s a timeout, I know I can’t count on this plan being good. Also I get the parameter compile time & run time values to help determine parameter sniffing issues in the properties.
  2. Warnings. If you see no join predicate warnings, that should jump up and poke you in the eye like some jumping eye-poking little monster. Same goes with missing statistics. The new warnings in plans in 2012 are equally important to know about. These are quick pieces of information that should immediately point you in a direction of inquiry within the plan.
  3. The most costly operations. Yes, I know you can’t trust these values because they are just estimates. Yes, the estimated operator cost is the same in both estimated and actual plans. No measurements of actual cost are taken by an execution plan. But these are the numbers available, so I use them. They’re accurate more often than not and quickly lead you to the possible source of the problem.
  4. Fat pipes. Now really, these are usually just an indication of volume and knowing that you’re moving lots of rows helps you read a plan (umpty-million rows joining umpty-million rows through a Loop might be an issue). But the real alarm bells go off when you see big fat pipes going to little skinny ones or skinny ones to big fat ones or even skinny-fat-skinny. That’s a huge indicator of something
  5. Extra operators. This is like that old statement about pornography “I can’t give you a precise definition, but I know it when I see it.” It’s looking for stuff that doesn’t belong. For example, you don’t have a single ORDER BY statement, but there sits a Sort operation. Why? That’s my “extra operator” indicator telling me to dig deeper.
  6. Scans. Scans are not necessarily bad and Seeks are not necessarily good. In general terms, with smaller data sets, you usually would expect to see a Seek over a Scan. Scans can be the right, good, and best choice, especially for very large data sets and in other situations, but they are an indicator of potential issues.

After that, you have a whole slew of things you can get worked up about. Table Spools in SELECT statements are usually not good. Look for indications of multi-statement UDF’s (Scan’s with zero cost). Loop joins when a Merge makes more sense, Merges where you ought to see a Hash, missing index information, mismatch between estimated & actual, blah, blah, blah… You get the point. There’s just tons & tons of information within execution plans. But that list of six are usually the first things I look for.