Archive for January, 2008

2008 to RTM next week

Thursday, January 31st, 2008

Windows Server 2008, that is.

"

Windows Server 2008 (and Vista Service Pack 1) are slated to release to manufacturing (RTM) on Feb. 6, according to several sources. That gives the company plenty of time to churn out disks for distribution at the event, which Microsoft executives have characterised as the company's "biggest enterprise launch ever."

The early February RTM means that the long-awaited server operating system will be available for the big event. Visual Studio 2008 is already out. Microsoft last week said SQL Server 2008 has slipped into the third quarter. Previously, the company said the database would be available in the second quarter. Microsoft has also promised to deliver its new database release in 36 to 48 months going forward. SQL Server 2003 shipped in November of that year."

The full story can be found here.

Extended Events in SQL Server 2008 Part 2

Thursday, January 31st, 2008

Wow, I cry uncle. This stuff is hard to figure out. They are simply going to have to add tools to product and\or documentation a little more before I mess around with it. I am looking forward to Bob B’s session on it at SQL Connections in Orlando. Welp, on to Performance data collection and SSRS 2008 :)

SQL Server Shorts

Wednesday, January 30th, 2008

Dan Jones provides some interesting insight into the development process of SQL Server 2008.

SQL Customer Advisory Team releases a partition management utility on CodePlex.

The storage engine team continues the data compression series. This time they cover data compression strategies. I am really looking forward tow planing with this in the next CTP.

Lastly, it is a sad week in the SQL Server world as we lost a living legend. Ken Henderson passed Sunday. Many bloggers paid homage. A few include Kalen Delaney, Bill Ramos, WelselyB, Sean McCown.

Find overlapping indexes with ease in SQL Server 2005

Wednesday, January 30th, 2008

In a previous post I discussed how the SQL Server 2005 DMV [sys].[dm_db_index_usage_stats] could be used to find unused indexes. The SQL Server Programmability & API Development Blog recently posted a good article on identifying overlapping indexes using a simple script.  This article is definitely worth a read because overlapping indexes are just as much of a waste of resources during update operations as are unused indexes.

“Simple script, eh?” Well, in this case we’re talking about a three-part process:

  1. Create a user-defined function and a view
  2. Run a SELECT statement to expose the overlapping indexes
  3. Clean up the view and the function

If you’re worried about running such a thing in a production environment, you can either:

  • Run the script during off-peak hours
  • Alter the script to set the isolation level to READ UNCOMMITTED and include the NOLOCK hint on the objects referenced (if your environment is stable there’s no reason to fear the dirty/phantom read issues inherent in this approach)
  • Create the function and view, then create another database from a snapshot, and then run the SELECT on the snapshot database

The blog outlines some common concerns when dropping existing indexes and reviews a few index usage basics to qualify the results of the query. This can be a very quick, time-saving process and a good way to jump start your own database maintenance initiatives.

Never use table variables?

Wednesday, January 30th, 2008

 

MS pushed tables variables too hard back in 2000 and developers went a little crazy with them. However, we found out that they were not the greatest thing since sliced bread especially when the result set is more that a few records and the query is even mildly complex.

The other case we hear for table variables is to avoid recompilations. This was true in SQL Server 2000. This has changed somewhat in SQL 2005 but you might not realize this by reading some web sites out there. On top of that, I cannot reproduce recompiles until much higher thresholds than what we should see per the documentation. This is a good thing in most scenarios IMO.

Based on this blog post, which is part of a great procedure cache series, we should see a recompile when 6 rows change, 500 more and at 500 + 0.20 * n more where n is the cardinality of the table.

"

After 6 modifications to an empty temporary table any stored procedure referencing that temporary table will need to be recompiled because the temporary table statistics needs to be refreshed.

The recompilation threshold for a table partly determines the frequency with which queries that refer to the table recompile. Recompilation threshold depends on the table type (permanent vs temporary), and the cardinality (number of rows in the table) when a query plan is compiled. The recompilation thresholds for all of the tables referenced in a batch are stored with the query plans of that batch.

Recompilation threshold is calculated as follows for temporary tables: n is the cardinality of the temporary table when the query plan is compiled.

If n < 6, Recompilation threshold = 6.

If 6 <= n <= 500, Recompilation threshold = 500.

If n > 500, Recompilation threshold = 500 + 0.20 * n.

"

That blog post mirrors the numbers in this must read white paper. Both the blog post and the white paper, use this example.

 

create procedure RowCountDemo

as

begin

      create table #t1 (a int, b int)

      declare @i int

      set @i = 0 while (@i < 20)

      begin

            insert into #t1 values (@i, 2*@i - 50)

            select a

            from #t1

            where a < 10 or ((b > 20 or a >=100) and (a < 10000))

            group by a

            set @i = @i + 1

      end

end

Now here is the interesting part... I cannot get it to recompile. I have tried on SQL 2005 RTM, sp2 and sp2 + 3054. The initial run shows up as a recompile in a trace but subsequent runs do not. Not at @i = 100, 500 or 1000. At precisely @i = 1108, recompilation happens every time.

 

create procedure RowCountDemo

as

begin

      create table #t1 (a int, b int)

      declare @i int

      set @i = 0 while (@i < 1108) --<--- Recompilations happen at >= 1108

      begin

            insert into #t1 values (@i, 2*@i - 50)

            select a

            from #t1

            where a < 10 or ((b > 20 or a >=100) and (a < 10000))

            group by a

      set @i = @i + 1

      end

end

Now there may be something totally flawed in my understanding. I am sure you guys will point that out if it is the case :) But the white paper states:

"Recall that the recompilation threshold for a temporary table is 6 when the table is empty when the threshold is calculated. When RowCountDemo is executed, a "statistics changed"-related recompilation can be observed after #t1 contains exactly 6 rows. By changing the upper bound of the "while" loop, more recompilations can be observed."

If temp tables really do not cause recompilations at 6 rows, I really cannot think of a good reason to use table variables except for small sets and then only out of preference.

Ever since I upgraded from SQL Server 2000 to SQL Server 2005 I’ve seen significant plan cache bloat. Nothing’s changed in my application, so why the increase?

Tuesday, January 29th, 2008

Depending on how you look at things, this can be considered to be either a feature, a bug, or a miscalculation in the way caching was initially implemented in SQL Server 2005. Either way, as of SP2 this behavior has been changed.

Reading Microsoft’s whitepapers on Statistics Used by the Query Optimizer in Microsoft SQL Server and Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 provide lots of good information on what’s changed in plan caching and reuse between SQL Server 2000 and SQL Server 2005.

But, your question deals specifically with the size of the cache. Here’s a breakdown of the sizing by SQL Server version:

SQL Server 2000: 4GB upper cap on the plan cache
SQL Server 2005 RTM & SP1: 75% of server memory from 0-8GB + 50% of server memory from 8Gb-64GB + 25% of server memory > 64GB
SQL Server 2005 SP2: 75% of server memory from 0-4GB + 10% of server memory from 4Gb-64GB + 5% of server memory > 64GB

Do you see the difference? To use a practical example, consider a SQL Server with 64 GB total SQL Server memory.

SQL Server 2005 RTM and SP1 will end up with a cachestore limit of:
75% * 8 + 50% * ( 64 - 8 ) = 34GB

Now, if you upgrade that same SQL Server 2005 instance to SP2:
75% * 4 + 10% * ( 64 - 4 ) = 12GB

There is also a significant difference in query plan costing where the cache is concerned. For a detailed breakdown of these differences read the SQL Programmability & API Development Team Blog’s article on Changes in Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2.

I keep reading about the dangers of autogrow but can I really size the database accurately enough to disable it?

Tuesday, January 29th, 2008

First of all, it’s important to realize that the autogrow feature was developed for a reason. Consider an extreme set of alternatives: either your mission-critical database starts throwing out of space errors or processing slows when the autogrow feature gows the database’s data files. Obviously, when faced with these alternatives you’ll want the diminished I/O throughput in lieu of no throughput whatsoever.

Performance tuning specialists and SQL Server internals gurus preach about the dangers of autogrow because it’s important to do everything in your power to avoid having the feature triggered at inopportune moments. Let’s use a more realistic set of alternatives (out of space errors should not be realistic in production) based on a stock trading website with a  backend SQL Server database. This type of website usually guarantees that trades will be performed in the neighborhood of a tenth of a second. You now have a choice between putting in a few hours/days of grunt work calculating the database’s storage needs and potentially running into a situation where autogrow fires right before the market closes; I/O throughput will slow to a crawl and disk contention will lead to blocking locks when autogrow fires; you’d be well advised to start updating your resume because the company will face legal action and/or huge financial consequences.

Planning for database growth over time is not an easy task – it’s not that the process is particularly complicated, but the time required to accurately plan for growth and understand usage trends can make it a tedious process. SearchSQLServer recently posted an article by Denny Cherry, someone who’s got experience dealing with large deployments (he worked on the MySpace 175+ million user installation). His article on Determining SQL Server database storage requirements goes over two generally accepted database sizing methods and points out the danger of using a “good enough” estimation versus a truly accurate one.

A little planning can go a long way…

I can’t figure out what state my database is in. Sysdatabases shows me status values that don’t exist in books online like 532.

Monday, January 28th, 2008

First, let’s dispense with the standard answer, “If you upgrade to SQL Server 2005 (or if you already have) the new system management and catalog views will provide you with more useful information.” While this is a stock answer, it happens to be very true in this particular scenario.

sysdatabases is the SQL Server 2000 catalog view whose status column displays:

1 = autoclose (ALTER DATABASE)
4 = select into/bulkcopy (ALTER DATABASE using SET RECOVERY)
8 = trunc. log on chkpt (ALTER DATABASE using SET RECOVERY)
16 = torn page detection (ALTER DATABASE)
32 = loading
64 = pre recovery
128 = recovering
256 = not recovered
512 = offline (ALTER DATABASE)
1024 = read only (ALTER DATABASE)
2048 = dbo use only (ALTER DATABASE using SET RESTRICTED_USER)
4096 = single user (ALTER DATABASE)
32768 = emergency mode
4194304 = autoshrink (ALTER DATABASE)
1073741824 = cleanly shutdown
Multiple bits can be ON at the same time.

The last sentence about the multiple bits is what’s getting you. In your case, a database showing a status of 532 is set to allow select into/bulkcopy (BULK_LOGGED recovery), to enable torn page detection and is currently offline (4 + 16 + 512 = 532).

sys.databases is the SQL Server 2005 catalog view and offers more insight (while requiring you to query more columns, which is fair in my opinion). Using this view you’ll find columns identifying statuses and configuration options both numerically and using descriptions. So, using your status of 532 again, but by querying sys.databases, you’ll see:

[recovery_model]=2,
[recovery_model_desc]='BULK_LOGGED',
[page_verify_option]=1,
[page_verify_option_desc]='TORN_PAGE_DETECTION',
[state]=6,
[state_desc]='OFFLINE'

Hope that helps.

Heroes Happen Here Comic

Friday, January 25th, 2008

Read more

HHH Comics RSS feed

  

R.I.P. Kenh1234

The problem with local variables

Friday, January 25th, 2008

Have you ever been writing a query and just cannot get it to use the right index. This could be one of the reasons why. Let's use this query with local variables as our example.

declare @Start datetime
declare @End datetime
select @Start = '2004-08-01 00:00:00.000'
select @End = '2004-07-28 00:00:00.000'
select ProductID from sales.SalesOrderDetail where ModifiedDate >= @End and ModifiedDate <= @Start
It generates this plan:

SQL Server Clustered index scan

but we have an index on ModifiedDate. There are many reason why SQL would not use this index but, for this post, we will assume we have eliminated them. Finally, we hard code the dates and we get this plan.

SQL Server index seek with bookmark lookup

So why is it doing this? The reason is because the query optimizer cannot accurately use the statistics to estimate how many rows are returned with local variables. Let's look at how we can tell there is a problem with the cardinality estimates. In the query with the local variables, the optimizer thinks we are getting 10918.5 rows so we do the index scan. In the query with hard coded literals, the estimated rows and actual rows are the same and accurate.

SQL Server Cardinality underestimation

How can we fix this? There are several way. This is another situation that makes a case for stored procedures or parameterized queries.

create proc pDemo01 @Start datetime, @End datetime
as
select ProductID from sales.SalesOrderDetail
where ModifiedDate >= @End and ModifiedDate <= @Start

The stored proc generates the proper plan. However, you will run into the same problem if you modify the parameter within the stored proc like select @start = @start-90. In this case, to should use sp_executeSQL. What if you cannot use a stored proc because it is a 3rd party app or some other reason? A covering index is probably the solution. Once we create this index, it will always be used:

create index ix01 on sales.SalesOrderDetail(ModifiedDate) include (ProductID)

We could use a plan guide or an index hint with a forceseek(SQL 2008) but performance will be really bad when we really do need to get 10k rows. The same problem can happen with stored proc's but that is another post.

To get deeper into this subject, check out this.