WHY ISN”T AUTO_UPDATE STATISTICS RUNNING?

Filed under: Internals and Architecture, Tuning and Optimization — KKline at 10:07 am on Monday, May 19, 2008

I find that auto_update statistics in SQL Server is a really good thing.  Without it, many third-party applications would simply fall over from lack of preventative maintenance.  With it, they are able to run for extraordinarily long periods of time without really needing a full-time DBA to check up on the databases.

 

Having said all of that, auto_update statistics is a mystery for many of us.  This is reflected in the fact that I get lots of questions about auto_update statistics when I speak publicly (which seems to be at least once per month these days).  The most common question is “why isn’t auto_update statistics running?”

 

There are many elements to the answer to this question.  Naturally, you’re not going to have any new statistics in a database that is marked as readonly.  However, you might not know that approximately 20% of a table needs to change before triggering an auto_update statistics run for the given table.  If the table is really big, it’s possible that you’re simply not changing enough data.

 

Also remember that statistics are not the same in SQL Server 2005 as they are in SQL Server 2000 and that they must be upgraded using the sp_updatestats stored procedure after you migrate from version 2000 to 20005.  Otherwise, the old statistics are unusable by the query engine.  Once you’ve run sp_updatestats, the engine will maintain them automatically.  Be sure to check the article “After Upgrading the Database Engine” at http://msdn2.microsoft.com/en-s/library/ms143695.aspx to make sure you covered all the bases.

 

Thanks to Connor Cunningham and Simon Sabin (http://sqlblogcasts.com/blogs/simons) for help on this post.

I need to know whether a 3rd party application is leading to out of memory errors in my SQL Server…and I need to know right now.

Filed under: Administration, Internals and Architecture, SQL Server 2005, Tuning and Optimization — Ari Weil at 12:57 pm on Thursday, April 24, 2008

First things first - depending on the version of SQL Server that you’re using, SQL Server could be creating this problem for itself. See these articles if you’re running SQL Server 2005 pre-SP2:

So what’s going on? First we need to address two concepts: Virtual Address Space and Working Set. Windows uses virtual address space (VAS) to allocate memory to applications (that’s as deep as I’m going here). A program’s working set is the collection of VAS pages that have been recently referenced. When a process is created, it is assigned a minimum and maximum working set size. As the working set size increases, thereby increasing memory demand, the virtual memory manager will attempt to keep up to the maximum working set size (memory) resident without exceeding the maximum threshold.

One common misconception is that there is as much virtual memory available to a process as there is physical memory (RAM) on a machine. Not so (or I wouldn’t be going there); you can get much more depth by reading Slava OKs’s blog on VAS. One very important point from a SQL Server perspective, however, is that when you’re configuring min and max server memory, you have to take into account the MemToLeave area. MemToLeave is VAS that’s left unused when SQL Server starts so that external components have address space to consume when they’re loaded. Typically, the majority of memory allocations from MemToLeave are from non-SQL Server memory consumers like COM objects, extended stored procedures, and linked servers. You can use DBCC MEMORYSTATUS to see these, but bear in mind that what’s reported will only reflect allocations made by SQL Server directly (SQL Server has no knowledge of outside memory requests).

So, when troubleshooting out of memory errors, do your homework.

  • Check the list above to see whether your SQL Server is suffering from a known SQL Server bug, and understand (at least a little bit about) VAS and memory allocations.
  • You can use Performance Monitor (Task Manager won’t help here) to get an idea of how much of VAS is currently consumed inside of your process by looking at the virtual bytes counter.
  • If you believe that a 3rd party application is loading DLLs into SQL Server’s VAS, or you want to see if your own CLR or external procedures are doing so use SQL Server’s DMV sys.dm_os_loaded_modules to see what’s been loaded into SQL Server’s address space.

Oh, and by the way, for anyone who’s used to configuring the working set size parameter with sp_configure, you need to be aware that it’s been deactivated in SQL Server 2005.

Hope that helps.

I have two Windows 2005 instances running on Windows 2003. Connections to one instance start failing with SSPI context errors and I spot something in the log files aboust deregistering the SPN.

Filed under: Administration, Internals and Architecture, SQL Server 2005 — Ari Weil at 8:57 am on Monday, April 21, 2008

There’s either a network issue with regard to that server’s DNS, or you’re having an issue that’s addressed by KB article 319723, particularly section 3: Configure the SQL Server service to create SPNs dynamically.

Let’s discuss how SQL Server integrated security works to understand what’s happening. When the SQL Server driver attempts to connect it tries to resolve the fully-qualified domain name of the SQL Server using WinSock APIs (look up gethostbyname and gethostbyaddr for details) to determine the Service Principal Name (SPN). SQL Server is using Kerberos over TCP-IP sockets via Security Support Provider Interface (SSPI) to resolve the SPN. If there is a problem verifying the fully-qualified domain name from the client or server side during this exchange, the connection fails. You can see this yourself when/if the problem occurs using NSLOOKUP from the client machine.

Hope that helps.

NASTY RUMORS ABOUT MAXDOP

Filed under: Internals and Architecture, Tuning and Optimization — KKline at 11:53 am on Tuesday, April 15, 2008

You’d think an article called “NASTY RUMORS ABOUT MAXDOP” would have something to do with Britney Spears or maybe Robert Downy Jr, but in that case it’d be total fiction (at least, it would be coming from my pen).

 

So, I was en route to the 2008 Microsoft MVP Summit yesterday and I had a chance to catch up on my reading.  You may have heard some rumors that you should only ever set MAXDOP (maximum degrees of parallelism) to an even number.  I can neither confirm nor deny these rumors since I haven’t had time to test this yet myself.  But I’m curious if any readers out there have definitive information one way or the other.

 

Evidently, there are two roles involved in parallel processing, a writer and a consumer.  Naturally, when two roles are at work, the MAXDOP setting doesn’t have to share resources when it is set in increments of two.

 

Hope this helps,

 

-Kevin

Wondering whether to take a full or differential backup? See what percentage of your data has changed using native SQL Server tools!

Filed under: Administration, Backup and Restore, Internals and Architecture, Transact-SQL (T-SQL) — Ari Weil at 9:22 am on Tuesday, April 15, 2008

If you’ve never read Paul Randal’s blog on the SQLSkills website you should check it out. Paul recently posted a stored procedure that can be used to tell which percentage of a database’s data has changed so you can determine which type of backup to take. The procedure can be scheduled and its output analyzed over time to strengthen your backup and recovery strategy (because we know that you already have one, you just need to improve upon it ;-) ).  The cool thing about code like this is that it can be bundled into other maintenance procedures and it can integrate with monitoring and maintenance tools.  For instance, you could run this procedure on a schedule with predefined diff thresholds that would flag the  appropriate type of backup job for execution.  You could then use enterprise monitoring tools like MOM or SCOM to determine the best maintenance windows to guarantee the backups run at the right time, and voila!  You’ve got the beginnings of an automated backup and recovery strategy.

Dealing with fragmentation - when, why, and how.

Filed under: Administration, Internals and Architecture, SQL, SQL Server 2005, Transact-SQL (T-SQL), Tuning and Optimization — Ari Weil at 4:51 am on Monday, March 3, 2008

Fragmentation: The scattering of parts of a file throughout a disk, as when the operating system breaks up the file and fits it into the spaces left vacant by previously deleted files.

That’s how the dictionary defines fragmentation, which can be a very real performance problem for certain databases. But when is fragmentation truly a problem, and how can you deal with it? The folks posting on the SQL Server Storage Engine blog have a nice series on fragmentation that can help debunk some myths and show you how to evaluate and fix problems once they arise.

On that note, Kalen Delaney has also posted some good information on why going to lengths to defragment system tables is often effort best expended elsewhere. See her posts on this subject here and here.

Finally, you can read the MSDN article Microsoft SQL Server 2000 Index Defragmentation Best Practices for the official word on the subject.

I have a parallel execution environment where I’m seeing PAGELATCH blocking in some DMVs and not in others. What am I reading incorrectly?

Filed under: Administration, Internals and Architecture, SQL Server 2005, Transact-SQL (T-SQL), Tuning and Optimization — Ari Weil at 3:59 am on Wednesday, February 13, 2008

This is a really tough problem for people to troubleshoot because there is little useful information available about non-I/O latches.

First things first, what are latches? Latches are lightweight synchronization locks used to coordinate access to data either on disk (PAGEIOLATCH) or in the bufferpool (LATCH, PAGELATCH); the latter is actually a bit of a gray area as data pages in the bufferpool are still tied to pages on disk. So why not call them locks to save confusion? Well, because they’re different. For one, lightweight is usually interpreted as short-lived, however there’s more to it than that. Latches are lightweight because users cannot directly control them (people will argue with this statement, but for the purposes of this short article I’ll leave it at that), information on the latch owner is not fully (read always) available, and latches are granted differently than locks. For a very thorough review of exactly what is unique about latches and how to troubleshoot latching issues, Ken Henderson published the best information I’ve seen to date on waits and blocking issues entitled SQL Server 2005 Waiting and Blocking Issues.

I’ll admit the aforementioned article can be a tough read - especially if you’re looking for summary information, or you don’t have a lot of background in SQL Server. So, in summary:

  • There are four main causes of page latch blocking:
    1. IO subsystem performance
    2. Contention on internal allocation system tables
    3. Contention on catalog pages
    4. Contention on data pages
  • You will never see a latch wait time greater than 5 minutes (something else that makes latches unique from blocking locks)
  • Latches are granted using the First-In-First-Out (FIFO) method
  • User table latch contention is the simplest to troubleshoot, and is typically the most common source of confusion with latch contention issues. Frequent modifications to the data in specific tables can create hot spots in a database, or more specifially hot pages. This isn’t a symptom of frequent SELECT operations, but rather INSERTs, UPDATEs, and DELETEs.
  • When troubleshooting a latching issue:
    1. Check the latch class description in Books Online, if available.
    2. Examine the latch class for indications of what components or what type of statements may acquire the latch.
    3. Examine the blocked task’s current statement for more hints regarding the latch usage.
    4. Use the suffix of the wait type to determine the mode for the blocked request. This will help identify whether the latch is being acquired for shared access—in which case, it must currently be held for exclusive access. An exclusive request implies that the blocked task’s current session needs to do some update, and this can provide further info regarding resolving the blocking.
    5. Query the following DMVs:
      1. sys.dm_os_wait_stats
      2. sys.dm_os_latch_stats
      3. sys.dm_exec_requests
      4. sys.dm_tran_active_transactions

So, to get to the original question: what are you reading incorrectly? Nothing. In your case you had (screenshots not included since I’m not that well-heeled a blogger yet) two SPIDS, both exhibiting parallel coordination (CXPACKET), shared (PAGELATCH_SH) and update latches (PAGELATCH_UP). Since you were observing queries in a parallel environment and simultaneously saw thread coordination waits, once the threads were complete, the latch was removed and the blocking went away; because of the FIFO behavior of latches, one incompatible latch type will cause the rest to wait. Since SQL Server 2000 SP4, Microsoft’s been identifying latches as the cause of blocking in sysprocesses, however since there is no owner information for certain lock types, the DMVs in SQL Server 2005 will not show you a blocker (for example, in sys.dm_os_waiting_tasks) when the blocking latch type is a SH latch.

For more information on troubleshooting performance problems in SQL Server 2005, read the aptly named Microsoft whitepaper Troubleshooting Performance Problems in SQL Server 2005.

I had a question regarding page life expectancy in SQL Server 2005. What causes page life expectancy to be very low? If you have low page life expectancy, how is it resolved and avoided?

Filed under: Administration, Internals and Architecture, SQL Server 2005, Tuning and Optimization — Ari Weil at 3:21 am on Wednesday, February 13, 2008

Page Life Expectancy can tell you whether your SQL Server has enough memory for its bufferpool to function properly. Extremely low Page Life Expectancy values can be an indication of:

  • Too little memory on the system
  • Too little memory configured for SQL Server’s use
  • Poorly written application code

Brian Moran wrote an article for SQL Server magazine on this back in 2004 entitled Page Life Expectancy a Reliable Indicator of SQL Server Memory Pressure that sums up how you can use Page Life Expectancy to determine whether adding memory is warranted, and things haven’t changed since he wrote the article. Essentially, you might find that there is too little memory installed on the server to enable SQL Server to function properly or that SQL Server hasn’t been configured to use enough of what’s available.

Still, you can throw memory at a server all day long (memory’s cheap, right?) but if your application code is behaving badly it might not help. While you might see a temporary improvement, you’ll likely start running into other issues as the application scales. By addressing application-specific problems like stored procedures that are recompiled due to:

  • Underlying schema changes
  • Statistics changes
  • Including SET options in the procedure definition
  • Temporary table changes
  • Use of the RECOMPILE query hint or OPTION (RECOMPILE)
  • Including data definition language (DDL)

Or by addressing queries that read an entire table into the bufferpool to modify a small number of rows, you can save lots of arguments with systems administrators and managers about adding memory, and improve your application’s performance at the same time. Read Microsoft’s technical document Troubleshooting Performance Problems in SQL Server 2005 for details on these and other tips for improving the performance of your SQL Server, and the Microsoft TechNet article Top SQL Server 2005 Performance Issues for OLTP Applications for OLTP-specific tuning tips.

Scheduler Yields = CPU Pressure…or do they?

Filed under: Administration, Internals and Architecture, SQL Server 2005, Tuning and Optimization — Ari Weil at 3:06 am on Sunday, February 10, 2008

I have been on a number of support calls where the customer insists there is a CPU bottleneck, and in some cases he/she even points to my blog posts and/or those from others indicating that SQL Server scheduler yields are an indication of CPU pressure; however perfmon’s OS CPU counters don’t support that observation. Once we get to the root of the issue and resolve the problem, the customer (very rightly) wants an explanation of why there were scheduler yields (SOS_SCHEDULER_YIELD) without any noticeable CPU pressure.

In some recent cases I have pointed customers to Mario Broodbakker’s excellent article Taking the Guesswork out of SQL Server Performance Profiling Part 2 (Tom Davidson wrote a related albeit shorter article). Mario explains some important concepts relating to the SQL Server Scheduler and monitoring performance data, taking time to explain key concepts, and using clear examples to illustrate his points. The links to other blogs/sites/whitepapers provide good background reading to either familiarize the reader, or help him/her to brush up on related information.

In essence, the article illustrates what can happen when two “killer” queries are executed on the same scheduler. Because the SQLOS attempts to ensure that no one query will monopolize a CPU queries are forced to yield when appropriate.

For background and sample scripts relating to the SQL Server Scheduler you can look at Slava Oks’s posts on SQLOS - unleashed and SQLOS’s DMVs Continued which benefit from first understanding the user mode scheduler (UMS) which was best explained by the late (and great) Ken Henderson (see Inside the SQL Server 2000 User Mode Scheduler).

Mario points out how:

  • when CPU-intensive queries are assigned to different schedulers yields are not observed, however when multiple CPU-intensive queries are assigned to the same scheduler they cause one another to yield thereby producing a situation that appears as CPU Pressure but is in fact what he refers to as SOS Scheduler pressure
  • when the overall CPU load on the system is high a similar phenomenon is observed, however not only are scheduler yields increased, so are other event waits

One factor that I really appreciate about this article is the clear explanation and examples provided for what I usually refer to as cascading resource bottlenecks. The article shows how external CPU pressure can make it difficult for the SQLOS scheduler to schedule processes, and thereby results in increased wait times across all events. In these cases looking at the processor run queue length (this is a great example of when this counter truly is relevant), as well as the increased signal wait times in the [sys].[dm_os_wait_stats] DMV can expose the root issue. In these cases it “takes time to get scheduled on the CPU, and you only hope that your thread is not pushed off of the CPU, preemptively, by a higher priority thread that is ready to run. This is something to be aware of: on very CPU bound systems, sometimes the waits are not what they seem.

As many SQL Server pros like to point out, generalizations and quickly jumping to a conclusion can be dangerous - and this is a perfect example of how a generalization that’s not supported by the facts of the case can lead to drawing the wrong conclusion. To use a cheesy, but very relevant catch phrase from my youth (a la GI Joe) - Now you know, and knowing is half the battle.

;-)

Ever since I upgraded from SQL Server 2000 to SQL Server 2005 I’ve seen significant plan cache bloat. Nothing’s changed in my application, so why the increase?

Filed under: Administration, Internals and Architecture, SQL Server 2005, Tuning and Optimization — Ari Weil at 7:56 am on Tuesday, January 29, 2008

Depending on how you look at things, this can be considered to be either a feature, a bug, or a miscalculation in the way caching was initially implemented in SQL Server 2005. Either way, as of SP2 this behavior has been changed.

Reading Microsoft’s whitepapers on Statistics Used by the Query Optimizer in Microsoft SQL Server and Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 provide lots of good information on what’s changed in plan caching and reuse between SQL Server 2000 and SQL Server 2005.

But, your question deals specifically with the size of the cache. Here’s a breakdown of the sizing by SQL Server version:

SQL Server 2000: 4GB upper cap on the plan cache
SQL Server 2005 RTM & SP1: 75% of server memory from 0-8GB + 50% of server memory from 8Gb-64GB + 25% of server memory > 64GB
SQL Server 2005 SP2: 75% of server memory from 0-4GB + 10% of server memory from 4Gb-64GB + 5% of server memory > 64GB

Do you see the difference? To use a practical example, consider a SQL Server with 64 GB total SQL Server memory.

SQL Server 2005 RTM and SP1 will end up with a cachestore limit of:
75% * 8 + 50% * ( 64 - 8 ) = 34GB

Now, if you upgrade that same SQL Server 2005 instance to SP2:
75% * 4 + 10% * ( 64 - 4 ) = 12GB

There is also a significant difference in query plan costing where the cache is concerned. For a detailed breakdown of these differences read the SQL Programmability & API Development Team Blog’s article on Changes in Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2.

« Previous PageNext Page »