Archive for the ‘SQL Server Engine Internals’ Category

Old Performance Tuning Recommendations Die Hard

Tuesday, June 23rd, 2009

It’s interesting to me that old and inaccurate performance tuning recommendations seem to have a life of their own.  In some ways, old performance tuning recommendations are like the Undead from some kind of cheesy, 1970’s zombie movie – no matter how many times you shoot them, they just keep coming back.

 

Here’s a good case in point, http://msdn2.microsoft.com/en-us/library/ms345118.aspx, a white paper discussing Performance Optimizations for the XML data type in SQL Server 2005.  The document states:

 

Multiple tempDB Files for Better Scalability of XML Variables and Parameters

XML variables and parameters use main memory as storage as long as their values are small. Large values, however, are backed by tempdb storage. In a multi-user scenario, if many large XML blobs occur, tempdb contention may become a bottleneck for good scalability. Creating multiple tempdb files reduces the storage contention and yields significantly better scalability. The next example illustrates how multiple tempdb files can be created.

Example: Creating Multiple tempdb Files

This example creates two additional data files for tempdb, each with an initial size of 8 MB, and two log files with an initial size of 1 MB.

Copy Code

USE TEMPDB
GO
ALTER DATABASE tempdb ADD FILE 
   (NAME = 'Tempdb_Data1', 
      FILENAME = 'C:\temp\Tempdb_Data1.MDF', SIZE = 8 MB),  
   (NAME = 'Tempdb_Data2', 
      FILENAME = 'C:\temp\Tempdb_Data2.MDF', SIZE = 8 MB) 
GO
ALTER DATABASE tempdb ADD log FILE 
   (NAME = 'Tempdb_Log1',
      FILENAME = 'C:\temp\Tempdb_Log1.LDF', SIZE = 1 MB),  
   (NAME = 'Tempdb_Log2',
      FILENAME = 'C:\temp\Tempdb_Log2.LDF', SIZE = 1 MB)
GO

These files can be removed by using the ALTER DATABASE tempdb REMOVE FILE command. For more information, see SQL Server Books Online.

 

 

There was a time when this recommendation made sense.  The intent behind multiple tempdb files prior to SQL Server 2005 was to avoid GAM contention on very high throughput scenarios.  Later, SQL Server 2005 introduced segmented GAMs even within a single tempdb file. 

 

This recommendation still holds for data files, especially on systems using NUMA processors, where IO should be aligned with one data file per NUMA socket.  (It also does no harm on an SMP system to organize your data files this way, hence the standardized recommendation.)

 

However, the recommendation fails when you get to the log portion of the equation.  Why?  It’s because data file IO is written using the proportional file algorithm where each data file has data written to it in round-robin style.  On the other hand, log files are written using the active file algorithm where LogFile1 is written first until full, then LogFile2 is written next until full, and so on…  Long story made short (too late, I know) – you get no performance gain from having multiple files in the log because all writes occur on only one file.  You can only get a performance gain from multiple files on the data portion of a database.

 

As an aside, Simon Sabin (http://sqlblogcasts.com/blogs/simons/) has a wonderful collection of addition myths, misunderstandings, and miscommunications.  I’m not sure if they’re put together in a single blog post, so put his blog on your watch list.

 

Cheers,

 

-Kev

 

P.S.  Thanks to Tony Rogerson (http://sqlblogcasts.com/blogs/tonyrogerson) and Geoff Hiten for the inspiration on this blog post, btw.

 

P.P.S.  I like how the MSDN article refers to SQL Server 2005 as SQL Server 9.0.  ;^)

Index Fragmentation

Friday, January 16th, 2009

Is index fragmentation a problem for your SQL Servers?  How would you find out?  Brent Ozar shows how to quickly query all of your SQL Server 2005 and 2008 instances to find out where you’ve got fragmentation.

 

Related wiki articles include:

If you have an MP3 player or a portable video player and you want to download our videos automatically, you can subscribe to our podcast feeds here:

You can also download this video to watch it later:

ORDER BY failings…

Monday, November 17th, 2008

I had an interesting debate with a customer during a demo where I said that his assertion, “I don’t need to specify ORDER BY, I have a clustered index on that table” was problematic. You see, defining a clustered index does tell SQL Server to store data in the order stipulated by the clustering key, and many times running SELECTs against that data will return the data ordered as expected. But there are some gotchas you should keep in mind:

  • When SQL Server runs a query in parallel, each stream will complete it’s set of work and return its result which could cause your clustered-index-ordered data to appear out of order.
  • When another query is already scanning the data when your SELECT statement is run, the data will be read out of the order you’re expecting as your query will piggy-back on the other scan, then come back to the beginning to gather all the data requested.
  • If statistics are out-of-date on your clustered index, or SQL Server otherwise chooses to use a non-clustered index to scan the data, the order of the clustered index will not be reflected in the result set.

But don’t take my word for it, check out what Conor Cunningham has to say on the topic, including some sample code to prove the point.

Memory From Every Angle

Tuesday, October 14th, 2008

It seems like questions always come up about how to monitor memory.  But each time the question comes up, it’s slightly different.

 

Here’s a sort of FAQ for memory questions:

 

1.  How can I figure out memory and CPU consumption by database?

a.  In SQL Server 2005, DMVs are of course the way to go.  Pre-SQL Server 2005, you will probably want to use PerfMon.

b.  Sys.dm_exec_query_stats with sys.dm_exec_sql_text will give per database object stats. Note that there can be ad-hoc statements executed against db which will be difficult to track

c.  Sys.dm_io_virtual_file_stats – File level IO stats per database file. You could track reads/writes this way

d.  Sys.dm_os_buffer_descriptors – Buffer pool information. You can derive this per database

2.  How do I find out memory consumption by worker thread?

a.  This is a tricky challenge because this memory is controlled by the kernel, not the buffer pool. The buffer pool does try to account for it as part of SQL Server’s set-aside which you sometimes override with -g.

b.  Once again, your best bet is the DMV sys.dm_os_threads.  It has two columns that help with worker thread stacks: stack_bytes_committed and stack_bytes_used.

3.  How can I track memory consumption by database object?

a.  SQL Server MVP and luminary, Kalen Delaney, once shared this query that shows consumption of the buffer pool in the local database:

 

— DEMO: Buffer counts by object & index.sql

— break down buffers by object (table, index)

SELECT b.database_id,

   database_name = CASE b.database_id

      WHEN 32767 THEN ‘Resource’

      ELSE db_name(b.database_id) END,

   p.object_id,

   Object_Name = object_name(p.object_id, database_id),

   p.index_id,

   buffer_count=count(*)

FROM sys.allocation_units a

JOIN sys.dm_os_buffer_descriptors b

   ON a.allocation_unit_id = b.allocation_unit_id

JOIN sys.partitions p

   ON a.container_id = p.hobt_id WHERE object_id > 99

GROUP BY b.database_id, p.object_id, p.index_id

ORDER BY buffer_count DESC

 

4.  What other techniques are there for learning about memory consumption?

a.  You should learn about DBCC MEMORYSTATUS discussed at http://support.microsoft.com/?id=907877.  (Thanks to Erland Sommarskog for pointing out this resource.)

b.  You should also read the excellent white paper entitled Troubleshooting Performance Problems in SQL Server 2005 Whitepaper located on TechNet at http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx.

 

I hope this helps.  Enjoy!

-Kev

Can SELECT Statements Cause Blocking to Occur in SQL Server?

Friday, September 26th, 2008

The following question was posed to me during a call today and its one that I’ve heard quite a bit so I figured it warranted a blog post.

The answer is, absolutely!  SELECT statements acquire a shared lock on the tables being accessed.  This shared lock will not affect other SELECT statements hitting the same table, but if someone tries to modify data in the table (via an UPDATE statement for example), the UPDATE statement will be blocked.  When analyzing locks using sp_lock the (S) symbol indicates a shared lock is on an object.

This default behavior can be modified by using one of many query hints.  Using the NOLOCK hint on a SELECT statement will force SQL Server to read data from the table without creating a shared lock on it.  When using this hint, you run the risk of reading uncommited data from the database but in cases where reading data with 100% accuracy is not required, NOLOCK can dramatically reduce blocking and improve the performance of your SELECT queries.  Another option is to use the READ UNCOMMITED isolation level when running your transactions, which conceptually does the same thing.

One additional caveat about both methods.  If your SELECT statement is running, and it expects to read a page that has been deleted by a transaction that is currently executing, SQL Server may deadlock this transaction.  There is an interresting post on this topic that can be found here.

Error Message: Ad hoc update to system catalogs is not supported.

Wednesday, September 24th, 2008

I ran into an interresting situation today while working with a customer and thought it worthwhile to blog on the subject since my internet searching proved lengthy on the subject.  While trying to run a RECONFIGURE statement after making an sp_configure change I received the following message:

Msg 5808, Level 16, State 1, Line 2
Ad hoc update to system catalogs is not supported.

I’m running on SQL Server 2005 SP2+ so my first thought was… obviously, but I’m not trying to update system catalogs, I’m trying to make an sp_configure change.  After doing some digging I found that the culript was the sp_configure ‘allow updates’ parameter.  This configuration in SQL Server 2000 allowed or dissalowed direct system table updates.  In SQL Server 2005, this configuration item still exists but it is obsolete since direct access to system tables in always prohibited.  While the configuration item is obsolete, having it set to 1 in SQL Server 2005 requires you to run the RECONFIGURE statement using WITH OVERRIDE, otherwise you will get the message above.

If you ever see the message above when trying to run RECONFIGURE you will either need to run RECONFIGURE WITH OVERRIDE, or first run:

EXEC sp_configure ‘allow updates’, 0
RECONFIGURE

I find it confusing that an “obsolete” configuration parameter can have an effect on instance behavior, but oh well.  I’d love to hear comments if anyone has any insight into this or similar issues.

Thanks to Jasper Smith, who posted here and pointed me in the right direction.

Architecture Questions – What’s Your Opinion?

Tuesday, September 23rd, 2008

I had a great time speaking with the large communities of SQL Server users as I traveled about Europe last week and much of this week.  I’m always impressed by the skill, intelligence, and creativity of these professionals.

And sometimes they raise questions that I think are worth sharing with everyone because they too might’ve heard the question before or asked it themselves.  Here are a few examples.  What are your thoughts on these questions?

1. To what extent does compatibility mode effect the behavior of SQL Server?  For example, direct catalog updates aren’t supported in SQL Server 2005. If we’re running in SQL Server 2000 compatibility-mode, can we still access the SYS schema?

2. There are strong architectural reasons why SQL Server doesn’t support multiple log readers, right?  Let’s discuss.  (I have my own ideas about why you wouldn’t want to enable it.)  However, the potential for improving the performance of replication by the use of multiple Log Reader agents is very strong because you could have more data flowing from one server to another.  Imagine for example, if you could have one log reader pumping data for the “customer” table and another pumping data for the “orders” table.  How can we build a solution that offers us those benefits?

3. Error messages?  Yes, error messages.  Is there any way we can find out what error messages are most commonly encountered across the SQL Server universe?  Maybe someone on the PSS team can help us out here?

4. How do I give a good presentation?  Well, there’s a hilarious YouTube video that covers all the bases – http://www.youtube.com/watch?v=cagxPlVqrtM&feature=related.  But if you’re looking for some real resources, here are some others that are useful:

Good web sites from Garr Reynolds and his associated  book:
http://www.presentationzen.com/
http://www.garrreynolds.com/Presentation/index.html
http://www.amazon.com/gp/product/0321525655/103-6148611-3957463?ie=UTF8&tag=garrreynoldsc-20&linkCode=xm2&camp=1789&creativeASIN=0321525655

Guy Kawasaki web site:
http://blog.guykawasaki.com/
http://blog.guykawasaki.com/2008/09/winners-of-worl.html
http://blog.guykawasaki.com/2005/12/the_102030_rule.html

How to Give a Steve Jobs Caliber Keynote Presentation:
http://www.maclife.com/article/how_to_give_a_steve_jobs_caliber_keynote_presentation?page=0%2C0

So what do you think about these questions?

Best regards,

-Kevin

Is there a way to find out from system tables when the store procedure was changed last?

Friday, August 29th, 2008

Absolutely. You can use the sys.objects view to determine the create_date and modify_date for an object. The query would be:


select [name],[create_date],[modify_date]
from [sys].[objects]
where [type]=N'P' and [is_ms_shipped]=0

To see the full definition of the procedure you would run:


select [name],[create_date],[modify_date],[definition]
from [sys].[objects] obj
join [sys].[sql_modules] mod
on obj.[object_id]=mod.[object_id]
where [type]=N'P' and [is_ms_shipped]=0

We are trying to implement log shipping to run hourly. If our nightly full backup takes 1 hour and 45 minutes, what is our recovery process? Is it a bad thing to have a transactional backup in the middle of a full backup?

Friday, August 29th, 2008

Yes, this is bad because it can lead to your log shipped database getting out of sync and will probably result in Error 4305; error 4305 states that the log in a backup set is too late to apply, which means your log shipping jobs will fail until you can synchronize the databases. You should schedule your transaction log backup job to stop before your full backup job begins and then to restart once it has completed. See the Microsoft TechNet article on setting up Log Shipping.

When you perform a a full database backup SQL Server stores the ending log sequence number (LSN), which becomes the starting LSN for the next transaction log backup. So, if you are performing a transaction log backup while your full database backup is executing, and that transaction log backup is successful, once it is shipped and applied to the subscriber it will have a different starting LSN than the ending LSN of the full backup. This is not allowed and SQL Server will throw error 4305.

How Can I Determine Which Tables and Indexes Are Not Being Used in My Database?

Wednesday, August 27th, 2008

Q: I have inherited a legacy applications of which I am certain there are many unneccesary tables and indexes.  I am slightly paranoid about ripping them out however as I am not 100% sure that they are not used.  Is there any way that I can determine which indexes and tables are safe to drop?

A: As long as your application is running on SQL Server 2005+, you have access to a Dynamic Management View (DMV) that displays exactlky this information.  Querying from sys.dm_db_index_usage_stats will tell you how many times a given index/table has been scanned, seeked, looked up, updated, and even gives you the last date when an operation happened.  Browsing through this DMV will allow you to determine with confidence any index/table that has never been touched (since SQL Server was last restarted).  There are many exampes of some elegant queries that people have written that can be found by searching the DMV name but its a fairly straight forward one to understand.

One got-chya, the data in sys.dm_db_index_usage_stats will show tables/indexes across all databases where sys.indexes only shows you data from the database you are in.  This creates some challenging hurdles if you are trying to join on those tables to lookup the index name.  It’s best to keep it simple and filter dm_db_index_usage_stats to only show you data from the database you are in.

A fairly simple example with explanation can be found here.