Adventures in SQL: SP_MSFOREACHTABLE to find the largest table

March 18th, 2010 by Jen Stirrup
When exploring a SQL Server database for the first time, it’s always useful to identify the most populated tables first because they are very likely to be the most important tables. To do this in a situation where there are tight time-scales, it’s possible to use some undocumented SQL Server procedures to quickly produce a list of the tables which have the most rows.

Before proceeding, however, it’s wise to note that undocumented stored procedures are not guaranteed by Microsoft. Further, they are not ‘fixed’ so they could change without warning, so use this command at your own discretion, and treat the results with care!

One example of an undocumented SQL Server procedure which has been useful is SP_FOREACHTABLE.  Although it’s not guaranteed, it might be useful for some to try it out as a starting point when exploring data, when it’s useful to find out the most populated tables in your new unexplored database.

In this example, a new table called TableRowCount will be created. This table will hold the results of the execution of SP_FOREACHTABLE.  To assist you, the SQL to do this is given below:

/****** Check if table TableRowCount exists already, and if so, delete it    ******/ IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TableRowCount]') AND type in (N'U'))
DROP TABLE [dbo].[TableRowCount]
GO
/****** Create table TableRowCount                                                         ******/
CREATE TABLE [TableRowCount](
  TableName sysname,
  [TableRowCount] int )

Once the table is created, the SP_FOREACHTABLE procedure is executed as follows:

/****** Execute stored procedure, and save results in table TableRowCount ******/
EXEC sp_MSForEachTable 'INSERT [TableRowCount](TableName, [TableRowCount]) SELECT ''?'', COUNT(*) FROM ?'

Then, it’s possible to retrieve the results of this command from the TableRowCount table, as follows:

/****** Display the results from table TableRowCount, in descending order ******/


select * from TableRowCount
order by TableRowCount.TableRowCount desc

The above statement will sort the data so that the table with the highest number of rows will appear at the top. It's a guess that the most populated table is an important one, but it's not a bad guess all the same!

A Clustered Index Does Not Guarantee Order

March 18th, 2010 by Larry Leonard

Learned something the hard way recently: just because a table has a clustered index doesn't mean a SELECT without an ORDER BY will return the rows in order. For example, given this table:

CREATE TABLE MyTable
(
    ID  INT IDENTITY (1, 1)  NOT NULL
        CONSTRAINT PK_tblMyTable PRIMARY KEY CLUSTERED (ID)
)

... if you say:

SELECT * FROM MyTable

... then the rows will be returned in undefined (that is, random) order.

"Clustering", I've learned, is a physical construct, not a logical one. The rows are laid out in physical order on the hard drive, but SQL Server is free (in the abscence of an ORDER BY clause) to return them in whatever order it thinks most efficient.

As developers, we don't usually see this behavior, because we tend to have fresh, well-maintained tables. In fact, I only discovered it in a customer's database that hadn't had any maintenance done to it in years. So, the bottom line is, as always, "no ORDER BY, no ordering."

Hyper-V v2 has come a long way, but so has VMware

March 18th, 2010 by Denny Cherry

The timing on this post might seam a little strange, but I’ve been meaning to write this for a while and I finely got a chance to do it.

Back when Hyper-V 1.0 was released it wasn’t all that great of a product.  It showed some promise, but it really wasn’t there.  I had all sorts of people (mostly from Microsoft) telling me that it was way better than ESX and that I needed to give it a shot.  My personal feelings are that it wasn’t anywhere near where ESX was, and for my production environment I needed the better product, so we went with ESX 3.5.

Well a while back Microsoft released Hyper-V 2.0 and it is a much better release than it was at the time.  I’d even be willing to stack it up next to VMware’s ESX 3.5 which was VMware’s competing version at the time of release.  Put next to ESX 3.5 you would have two well matched products.  Both included a real time online migration solution vMotion for ESX and life migration for Hyper-V.  Both support being put into a high availability cluster.  Both support pass through disks so that the guest OS has direct access to the fibre channel storage.

However shortly after Hyper-V 2.0 was release VMware released vSphere 4.0 which is the successor to ESX 3.5 and with vSphere 4.0 they’ve blown the doors off of Hyper-V yet again.

vSphere gives us FT or Fault Tolerance which basically runs a single VM on two machines with only one of the machines being active at a time.  In the event that one host fails the other host being running the VM actively with no outage to the guest.  Users connected to the guest won’t even know that the guest has switched to another machine.

VMware has also introduced some interesting features as experimental which means that we will probably see them show up as full on features in a future release.  This includes the ability to map an HBA directly to a virtual machine to give the VM actual direct access to the HBA.  At the moment that HBA can only be mapped to a single VM, but hopefully in the next release they will fix that.

Now don’t get we wrong, I think that Hyper-V has come a long way since the v1 release.  Do I think that Hyper-V is an Enterprise Ready solution?  Yes I do.  Do I think Hyper-V is ready to be called the winner in the virtualization server space?  No, not at all.  I think it is anyone’s game still before we have a clear winner.  Hyper-V has a big selling point to it, the cost to get into Hyper-V is free, as long as you don’t want to cluster it.  Then you’ll need to purchase a management tools license for each host machine.  How with VMware you’ll want the management tool weather you cluster the machines or not, but its a single purchase from VMware at least.

What it really comes down to is that you need to fully evaluate both platforms and due a solid CBA (Cost Benefit Analysis) as well as a full feature analysis before picking a platform for your enterprise.  Because once you pick one platform moving from one to the other is very tough to do.

Denny

Dirty Tricks – Comparing rowcount

March 18th, 2010 by Jen McCown

Content Rating: Beginner

I don't have many cool tools at my current gig to help me out, so I'm often forced to come up with quick and dirty solutions of my own. Today I have to compare row counts in two identical databases, one in Dev and one in QA. Here's what I did.

1. Generate a list of select count(*) statements

-- Get the SELECT statements for rowcounts for all user tables
select 'select ''' + name + ''' nm, count(*) cnt from ' + [name] + ' UNION '
from sys.objects where type_desc = 'USER_TABLE' order by [name]

This generates a list of statements like this:

select 'Client' nm, count(*) cnt from Client UNION

2. Copy-paste that list into a query window and run. (Note that we remove the last Union statement).

select 'Client' nm, count(*) cnt from Client UNION

select 'Supplier' nm, count(*) cnt from Client UNION  
select 'Anchor' nm, count(*) cnt from Client

This gets us a resultset like this:

nm          cnt 
Client       12333 
Supplier    401
Anchor     32

3. Do the same on the other environment.

 Now at this point, you have your data. If there are few enough tables, you can compare by hand. Or you could run each set of SELECT statements with Save As File, import those file to your sandbox DB as two tables, and compare rowcounts that way.

Edit: That's the old school way of getting rowcounts from pre-DMV days. So if you're using SQL Server 2000 or earlier, or need exact rowcounts, use the above.  If you'd like something faster and easier, check out Aaron Bertrand's (blog | twitter) solutions in the comments below!

Another day, another dirty little trick from the MidnightDBA.  Show us your dirty tricks, or a better way to compare - either blog or post in comments!

-Jen McCown
http://www.MidnightDBA.com

Performance Tuning 101 – What You Will Not Learn In The Class Room (Part 2)

March 18th, 2010 by Adam Haines

In my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html I talked about performance tuning queries that appear to be well tuned.  There are a lot of optimization techniques available unbeknownst to most developers that do not require indexes or radical code changes.  These are the optimizations that I will be talking about in this post.  There is absolutely no way I could go over every possible optimization technique available, but I will do my best to present as much content here today, and will make future posts on other techniques.

I will start things off  by talking about a challenge that Ramesh Meyyappan presented in his webcast, http://www.sqlworkshops.com/.  Ramesh’s challenge was to solve the TOP 101 phenomenon, using SQL Server 2005.  To start things off, I will create a sample table, with data.

USE [tempdb]
GO

SET NOCOUNT ON;
GO

IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO

CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeCode CHAR(2000)
);
GO

;WITH 
   L0 AS (SELECT 1 AS C UNION ALL SELECT 1)       --2 rows
  ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B)    --4 rows (2x2)
  ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B)    --16 rows (4x4)
  ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B)    --256 rows (16x16)
  ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B)    --65536 rows (256x256)
  ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)    --4,294,967,296 rows (65536x65536)
  ,Number AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5) 
INSERT INTO dbo.TestData
SELECT
    N AS RowNumber,
    ABS(CHECKSUM(NEWID()))%1000000+1 AS SomeId ,
    REPLICATE('a',2000) AS SomeCode
FROM Number
WHERE [N] <= 50000
GO

UPDATE STATISTICS dbo.[TestData] WITH FULLSCAN;
GO

Next, I will create a query that uses TOP and an order by to return 100 rows. 

--Fast
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION (MAXDOP 1)
/*
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 102 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Now watch what happens when I change the TOP operator to 101.  You will notice that I did not change anything else in the query other than increasing the number of rows returned by 1.

--Slow
SELECT TOP 101 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 312 ms,  elapsed time = 1690 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Wow…. TOP 101 is over 17 times slower than TOP 100 and all I changed is the number of rows in the TOP operator!!!!!  So why does the optimizer take so much longer to optimize and execute a query using TOP 101, oppose to TOP 100?  The short answer is the memory requirements.  The TOP 101 queries requires a lot more query memory than TOP 101, which translates into tempdb sorting.  As you may recall, I addressed some techniques to solve the tempdb sorting problem in my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html.  If you are using SQL 2008, you can use the same optimization techniques presented in my prior post, but SQL 2005 is a completely different animal.  To make the TOP 101 query faster, we need to first understand why it is slower.  Let’s take a look at what is different when we run the TOP 100 and the TOP 101 query.

Lets start by looking at the memory SQL Server grants to each query.  Open two different query windows and execute each top query within a while loop.  We can then use sys.dm_os_memory_grants to get the required memory.

Here is a sample of how to run the TOP query in a while loop.

WHILE 1=1
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

In a new query window, run the following query to get the memory specifications.

SELECT  [granted_memory_kb],[required_memory_kb],[max_used_memory_kb] FROM sys.dm_exec_query_memory_grants WHERE [session_id] = 58

Here are my results:

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
FAST 1024                 216                  216

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
SLOW 6040                 512                  6040

The results are simply astonishing.  The memory requirements increase nearly 28 times when I use TOP 101, instead of TOP 100.  I do not have a formal explanation of why the TOP 101 operator, consumes more memory than TOP 100.  Brad Schulz, http://bradsruminations.blogspot.com/, has contacted Conor Cunningham about this issue and believes that 101 is an arbitrary threshold.  Brad is working on an in-depth post involving the TOP operator.  Keep an eye out for this one, as it should be really good.  Anyway, once the 101 threshold is breached the optimizer uses different calculations to optimize a query, which can effectively bloat the memory requirements for the query.  This memory bloat forces the sort operation to spill into tempdb.  This is where the TOP 101 bottleneck exists.  To verify this problem, open profiler and choose the sort warnings counter and you will see that the slow query has a sort warning error, while the fast query does not.

Now that I have identified the problem, how do I solve it?  I will start by attempting the methods that I used in the previous article, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html.

SELECT TOP 101 [RowNum],[SomeId],CAST([SomeCode] AS VARCHAR(4200))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 3385 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Bloating the estimated row size still did not help our situation.  Next I will try shrinking the row size.

SELECT TOP 101 [RowNum],[SomeId],RTRIM(CAST([SomeCode] AS VARCHAR(2000)))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 2461 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Hmm.  Still no luck….. How can I reduce the row size of the input passed into the sort operator?  When you really sit back and think about the problem, the answer is really simple.  To reduce the row size, all you have to do is reduce the columns involved in the sort.  I like to use the TOP inside a derived table, making sure to only use the RowNum and SomeId columns.  We can then join back onto the TestData table.  This gives us a fast sort and a ultra fast index seek on the 101 rows we are returning.

SELECT t.[RowNum],t.[SomeId], t.[SomeCode]
FROM dbo.[TestData] t
INNER JOIN(
    SELECT TOP 101 [RowNum],[SomeId]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
) AS t2
    ON T.RowNum = t2.RowNum
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 104 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Alternatively, we can use correlated subqueries or the cross apply operator.

SELECT TOP 101
    t.[RowNum],
    (SELECT t2.[SomeId] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeId,
    (SELECT t2.[SomeCode] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeCode
FROM dbo.[TestData] t
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

SELECT TOP 101
    t.[RowNum],
    t2.SomeId,
    t2.SomeCode
FROM dbo.[TestData] T
CROSS APPLY(SELECT t2.SomeId, t2.SomeCode FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS t2
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

It should be noted that the correlated subquery method will produce more IO because it uses two subqueries.  As you can see, the solution to this challenge is quite simple, but the solution requires an understanding of what is occurring underneath the hood of SQL Server. 

Note: It is still possible that some of the sorting will be sent to tempdb, but you should see a elapsed time that rivals TOP 100.

The next optimization technique, I will be demonstrating is a predicate pushing problem.  Unbeknownst to most developers, SQL Server 2005 does have a problem with predicate pushing in views.  A lot of these issues have been resolved in SQL Server 2008, but should be known.  I will be demonstrating a very simple example, using a ranking function.  Ranking functions are relatively new to SQL Server and were introduced in 2005.  I am sure there are other scenarios that cause predicate pushing problems, but I will only be addressing the ranking problem, in this post.

Let’s start by creating a small sample table.

USE [tempdb]
GO

CREATE TABLE Test(
ID INT IDENTITY(1,1) PRIMARY KEY,
FName VARCHAR(50),
LName VARCHAR(50)
);

INSERT INTO dbo.Test VALUES ('Adam','Haines');
INSERT INTO dbo.Test VALUES ('John','Smith');
INSERT INTO dbo.Test VALUES ('Jane','Doe');
GO

CREATE NONCLUSTERED INDEX ncl_idx_LName ON dbo.Test(LName) INCLUDE(FName);
GO

As you can see, the table is relatively simple.  The idea is to present an easy to understand example that demonstrates potential performance problems with views.

Here is my simple query that shows an index seek on LName.

SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
WHERE LName = 'Smith'
GO

image

Let’s see what happens when I put the logic into a view, with no predicate.  The predicate will be called from outside the view and should be pushed down into the view, as views are materialized into the underlying objects at runtime.

CREATE VIEW dbo.vw_Test
AS
SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
GO

I will now query the view using the same predicate as the original query.

SELECT Id,FName,LName,seq
FROM dbo.vw_Test 
WHERE LName = 'Smith'
GO

image

The problem here is the optimizer decided to filter the results of the query AFTER the table “Test” has been scanned.  One should expect the optimizer to seek on the LName column because the optimizer should push the predicate; however, SQL Server 2005 does not do a great job of this.  SQL Server 2008 will appropriately push the predicate deep into the plan to get the index seek.  How do we solve this problem?  Unfortunately, there is not a whole lot you can do to make the plan work more efficiently.  The best option in my opinion is to a INLINE TVF to parameterize the query. 

CREATE FUNCTION dbo.fn_Test(@LName VARCHAR(50))
RETURNS TABLE
RETURN(
    SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
    FROM dbo.Test
    WHERE LName = @LName
)
GO

Now execute a select against the TVF using the same predicate.

SELECT Id,FName,LName,seq
FROM dbo.fn_Test('Smith')

image

There you have it.  I have demonstrated a few optimization techniques that I have used to solve performance problems.  I have only scratched the surface here.  There are many more optimization techniques available.  Stay tuned for future posts, where I will explore even more optimization techniques including a shocking example demonstrating how an index rebuild can introduce fragmentation and how to avoid it.

Until next time, happy coding.

Performance Tuning 101 – What You Will Not Learn In The Class Room (Part 2)

March 18th, 2010 by Adam Haines

In my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html I talked about performance tuning queries that appear to be well tuned.  There are a lot of optimization techniques available unbeknownst to most developers that do not require indexes or radical code changes.  These are the optimizations that I will be talking about in this post.  There is absolutely no way I could go over every possible optimization technique available, but I will do my best to present as much content here today, and will make future posts on other techniques.

I will start things off  by talking about a challenge that Ramesh Meyyappan presented in his webcast, http://www.sqlworkshops.com/.  Ramesh’s challenge was to solve the TOP 101 phenomenon, using SQL Server 2005.  To start things off, I will create a sample table, with data.

USE [tempdb]
GO

SET NOCOUNT ON;
GO

IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO

CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeCode CHAR(2000)
);
GO

;WITH 
   L0 AS (SELECT 1 AS C UNION ALL SELECT 1)       --2 rows
  ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B)    --4 rows (2x2)
  ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B)    --16 rows (4x4)
  ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B)    --256 rows (16x16)
  ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B)    --65536 rows (256x256)
  ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)    --4,294,967,296 rows (65536x65536)
  ,Number AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5) 
INSERT INTO dbo.TestData
SELECT
    N AS RowNumber,
    ABS(CHECKSUM(NEWID()))%1000000+1 AS SomeId ,
    REPLICATE('a',2000) AS SomeCode
FROM Number
WHERE [N] <= 50000
GO

UPDATE STATISTICS dbo.[TestData] WITH FULLSCAN;
GO

Next, I will create a query that uses TOP and an order by to return 100 rows. 

--Fast
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION (MAXDOP 1)
/*
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 102 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Now watch what happens when I change the TOP operator to 101.  You will notice that I did not change anything else in the query other than increasing the number of rows returned by 1.

--Slow
SELECT TOP 101 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 312 ms,  elapsed time = 1690 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Wow…. TOP 101 is over 17 times slower than TOP 100 and all I changed is the number of rows in the TOP operator!!!!!  So why does the optimizer take so much longer to optimize and execute a query using TOP 101, oppose to TOP 100?  The short answer is the memory requirements.  The TOP 101 queries requires a lot more query memory than TOP 101, which translates into tempdb sorting.  As you may recall, I addressed some techniques to solve the tempdb sorting problem in my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html.  If you are using SQL 2008, you can use the same optimization techniques presented in my prior post, but SQL 2005 is a completely different animal.  To make the TOP 101 query faster, we need to first understand why it is slower.  Let’s take a look at what is different when we run the TOP 100 and the TOP 101 query.

Lets start by looking at the memory SQL Server grants to each query.  Open two different query windows and execute each top query within a while loop.  We can then use sys.dm_os_memory_grants to get the required memory.

Here is a sample of how to run the TOP query in a while loop.

WHILE 1=1
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

In a new query window, run the following query to get the memory specifications.

SELECT  [granted_memory_kb],[required_memory_kb],[max_used_memory_kb] FROM sys.dm_exec_query_memory_grants WHERE [session_id] = 58

Here are my results:

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
FAST 1024                 216                  216

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
SLOW 6040                 512                  6040

The results are simply astonishing.  The memory requirements increase nearly 28 times when I use TOP 101, instead of TOP 100.  I do not have a formal explanation of why the TOP 101 operator, consumes more memory than TOP 100.  Brad Schulz, http://bradsruminations.blogspot.com/, has contacted Conor Cunningham about this issue and believes that 101 is an arbitrary threshold.  Brad is working on an in-depth post involving the TOP operator.  Keep an eye out for this one, as it should be really good.  Anyway, once the 101 threshold is breached the optimizer uses different calculations to optimize a query, which can effectively bloat the memory requirements for the query.  This memory bloat forces the sort operation to spill into tempdb.  This is where the TOP 101 bottleneck exists.  To verify this problem, open profiler and choose the sort warnings counter and you will see that the slow query has a sort warning error, while the fast query does not.

Now that I have identified the problem, how do I solve it?  I will start by attempting the methods that I used in the previous article, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html.

SELECT TOP 101 [RowNum],[SomeId],CAST([SomeCode] AS VARCHAR(4200))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 3385 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Bloating the estimated row size still did not help our situation.  Next I will try shrinking the row size.

SELECT TOP 101 [RowNum],[SomeId],RTRIM(CAST([SomeCode] AS VARCHAR(2000)))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 2461 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Hmm.  Still no luck….. How can I reduce the row size of the input passed into the sort operator?  When you really sit back and think about the problem, the answer is really simple.  To reduce the row size, all you have to do is reduce the columns involved in the sort.  I like to use the TOP inside a derived table, making sure to only use the RowNum and SomeId columns.  We can then join back onto the TestData table.  This gives us a fast sort and a ultra fast index seek on the 101 rows we are returning.

SELECT t.[RowNum],t.[SomeId], t.[SomeCode]
FROM dbo.[TestData] t
INNER JOIN(
    SELECT TOP 101 [RowNum],[SomeId]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
) AS t2
    ON T.RowNum = t2.RowNum
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 104 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Alternatively, we can use correlated subqueries or the cross apply operator.

SELECT TOP 101
    t.[RowNum],
    (SELECT t2.[SomeId] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeId,
    (SELECT t2.[SomeCode] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeCode
FROM dbo.[TestData] t
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

SELECT TOP 101
    t.[RowNum],
    t2.SomeId,
    t2.SomeCode
FROM dbo.[TestData] T
CROSS APPLY(SELECT t2.SomeId, t2.SomeCode FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS t2
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

It should be noted that the correlated subquery method will produce more IO because it uses two subqueries.  As you can see, the solution to this challenge is quite simple, but the solution requires an understanding of what is occurring underneath the hood of SQL Server. 

Note: It is still possible that some of the sorting will be sent to tempdb, but you should see a elapsed time that rivals TOP 100.

The next optimization technique, I will be demonstrating is a predicate pushing problem.  Unbeknownst to most developers, SQL Server 2005 does have a problem with predicate pushing in views.  A lot of these issues have been resolved in SQL Server 2008, but should be known.  I will be demonstrating a very simple example, using a ranking function.  Ranking functions are relatively new to SQL Server and were introduced in 2005.  I am sure there are other scenarios that cause predicate pushing problems, but I will only be addressing the ranking problem, in this post.

Let’s start by creating a small sample table.

USE [tempdb]
GO

CREATE TABLE Test(
ID INT IDENTITY(1,1) PRIMARY KEY,
FName VARCHAR(50),
LName VARCHAR(50)
);

INSERT INTO dbo.Test VALUES ('Adam','Haines');
INSERT INTO dbo.Test VALUES ('John','Smith');
INSERT INTO dbo.Test VALUES ('Jane','Doe');
GO

CREATE NONCLUSTERED INDEX ncl_idx_LName ON dbo.Test(LName) INCLUDE(FName);
GO

As you can see, the table is relatively simple.  The idea is to present an easy to understand example that demonstrates potential performance problems with views.

Here is my simple query that shows an index seek on LName.

SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
WHERE LName = 'Smith'
GO

image

Let’s see what happens when I put the logic into a view, with no predicate.  The predicate will be called from outside the view and should be pushed down into the view, as views are materialized into the underlying objects at runtime.

CREATE VIEW dbo.vw_Test
AS
SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
GO

I will now query the view using the same predicate as the original query.

SELECT Id,FName,LName,seq
FROM dbo.vw_Test 
WHERE LName = 'Smith'
GO

image

The problem here is the optimizer decided to filter the results of the query AFTER the table “Test” has been scanned.  One should expect the optimizer to seek on the LName column because the optimizer should push the predicate; however, SQL Server 2005 does not do a great job of this.  SQL Server 2008 will appropriately push the predicate deep into the plan to get the index seek.  How do we solve this problem?  Unfortunately, there is not a whole lot you can do to make the plan work more efficiently.  The best option in my opinion is to a INLINE TVF to parameterize the query. 

CREATE FUNCTION dbo.fn_Test(@LName VARCHAR(50))
RETURNS TABLE
RETURN(
    SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
    FROM dbo.Test
    WHERE LName = @LName
)
GO

Now execute a select against the TVF using the same predicate.

SELECT Id,FName,LName,seq
FROM dbo.fn_Test('Smith')

image

There you have it.  I have demonstrated a few optimization techniques that I have used to solve performance problems.  I have only scratched the surface here.  There are many more optimization techniques available.  Stay tuned for future posts, where I will explore even more optimization techniques including a shocking example demonstrating how an index rebuild can introduce fragmentation and how to avoid it.

Until next time, happy coding.

One TRUNCATE TABLE Can Use Up All Your Locks

March 17th, 2010 by Larry Leonard

Here's an example of how you can learn more from bad schemas than good schemas. This happened to me under SQL Server 2000, and I'm wondering if anyone has seen it on later versions.

I was attempting to truncate a table with maybe 100 million rows (don't ask - that's the bad schema part). When I issued the TRUNCATE TABLE command, however, I got:

Error: 1204, Severity: 19, State: The SQL Server cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users or ask the system administrator to check the SQL Server lock and memory configuration.

What I've learned from http://support.microsoft.com/kb/257345 is that "almost all of the locks acquired for a TRUNCATE statement are for extent deallocations". I didn't even know TRUNCATE TABLE used locks, but it makes sense now, of course. I ended up dropping the table and re-creating it.

One TRUNCATE TABLE Can Use Up All Your Locks

March 17th, 2010 by Larry Leonard

Here's an example of how you can learn more from bad schemas than good schemas. This happened to me under SQL Server 2000, and I'm wondering if anyone has seen it on later versions.

I was attempting to truncate a table with maybe 100 million rows (don't ask - that's the bad schema part). When I issued the TRUNCATE TABLE command, however, I got:

Error: 1204, Severity: 19, State: The SQL Server cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users or ask the system administrator to check the SQL Server lock and memory configuration.

What I've learned from http://support.microsoft.com/kb/257345 is that "almost all of the locks acquired for a TRUNCATE statement are for extent deallocations". I didn't even know TRUNCATE TABLE used locks, but it makes sense now, of course. I ended up dropping the table and re-creating it.

Forewords for DBA Survivor

March 17th, 2010 by Thomas LaRock

I wanted to give a big thanks to Kevin Kline (blog | twitter) and Buck Woody (blog | twitter) for writing the forewords to my book. I put them on their own page at http://dbasurvivor.com, as well as posted a blog entry for them.

I’m not sure I could ever thank them for taking the time to write a foreword for my book. I owe them both a debt of thanks for a lot of help they have given me over the years. Kevin introduced me to my love Operations Manager (we dated for a while and then married four years ago) and Buck once helped me disable a logon trigger using the DAC during a chat session in a LiveMeeting that we were both attending. Having them both agree to write a foreword was easy: I didn’t tell them about each other. But, now with the book out, they are sure to find out I was cheating on them with the other. So, I might as well go public with the info and wanted to share with you what they wrote.

Kevin and Buck, thanks for your help.

SSWUG Spring ‘10 Ultimate Virtual Conference

March 17th, 2010 by jstrate

13_02_42

The SSWUG Spring 2010 Virtual Conference is coming up.  On April 7, 8, and 9, attend a conference from the comforts of your living room, office, or couch.  And while you are at it you’ll get to sit on on some incredible presentations, interaction with speakers, chat with other attendees, make new contacts, lots of fun and surprises.

  • More than 80 technical sessions
  • These are NOT sales presentations.
  • Live keynote presentations and interaction
  • On-Demand access to sessions for 45 days – miss a session? No problem.
  • Chat, Twitter Integration, SKYPE integration for Q&A
  • Experience-based learning – find out what you need to know from people that are using the technology every day
  • 6-Month SSWUG.ORG Membership (or membership extension), included!
  • ALL-ACCESS Pass: SQL Server and Business Intelligence… SharePoint and .NET technologies – all included, one low price
  • Great vendor hall – learn about the best tools, technologies, publications and partners out there for your shop
  • Learning, learning and more learning

If this sounds as awesome as it is, register for the event here.  Among the presenters will be myself with the following topics:

Using XML to Query Execution Plans

SQL Server stores its execution plans as XML in dynamic management views. The execution plans are a gold mine of information. From the whether or not the execution plan will rely on parallelism to what columns are requiring a key lookup after a non-clustered index seek. Through a the use of XML this information can be available at your finger tips to help determine the value and impact of an index and guide you in improving the performance of your SQL Server databases.

Prerequisites

  • Understanding of basic T-SQL coding
  • Understanding of basic XQuery statements

Goals:

  1. Discuss information available in an execution plan
  2. Demonstrate use of XQuery to query execution cache
  3. Demonstrate methods to performance tune

Getting To Know Your Indexes

Without proper indexing SQL Server can be hard pressed to create efficient and performant execution plans. Dynamic Management Views (DMV) and system views provide a slew of information about indexes that can be used to analyze indexes within SQL Server. In this session we’ll go under the hood of SQL Server to look at DMVs and system views to know what indexes you have, should have, and how they feel about the way applications are treating them.

Prerequisites

  • Understanding of performance tuning needs
  • Experience with Database design

Goals:

  1. Identify methods to analyze current and potential indexes
  2. Learn how to alleviate stress found on indexes
  3. Demonstrate methods for tuning indexes

A Function By Any Other Name Is A Function, Right? Right?!?

User defined functions provide a means to encapsulate business logic in the database tier. Often the purpose of the encapsulation is to provide standard method access segments of data within the database. Unfortunately, not all methods of creating user defined functions are equal. In this session we’ll review the types of user defined functions and investigate the performance impact in selecting the different types

Prerequisites

  • Understanding of basic T-SQL

Goals:

  1. Identify purposes for creating user defined functions
  2. Discuss the types of user-defined functions
  3. Demonstrate performance impact in selecting different types of functions

Stop in an check it out, there will be a lot of great sessions and great conversation.