I have a parallel execution environment where I’m seeing PAGELATCH blocking in some DMVs and not in others. What am I reading incorrectly?

Filed under: Administration, Internals and Architecture, SQL Server 2005, Transact-SQL (T-SQL), Tuning and Optimization — Ari Weil at 3:59 am on Wednesday, February 13, 2008

This is a really tough problem for people to troubleshoot because there is little useful information available about non-I/O latches.

First things first, what are latches? Latches are lightweight synchronization locks used to coordinate access to data either on disk (PAGEIOLATCH) or in the bufferpool (LATCH, PAGELATCH); the latter is actually a bit of a gray area as data pages in the bufferpool are still tied to pages on disk. So why not call them locks to save confusion? Well, because they’re different. For one, lightweight is usually interpreted as short-lived, however there’s more to it than that. Latches are lightweight because users cannot directly control them (people will argue with this statement, but for the purposes of this short article I’ll leave it at that), information on the latch owner is not fully (read always) available, and latches are granted differently than locks. For a very thorough review of exactly what is unique about latches and how to troubleshoot latching issues, Ken Henderson published the best information I’ve seen to date on waits and blocking issues entitled SQL Server 2005 Waiting and Blocking Issues.

I’ll admit the aforementioned article can be a tough read - especially if you’re looking for summary information, or you don’t have a lot of background in SQL Server. So, in summary:

  • There are four main causes of page latch blocking:
    1. IO subsystem performance
    2. Contention on internal allocation system tables
    3. Contention on catalog pages
    4. Contention on data pages
  • You will never see a latch wait time greater than 5 minutes (something else that makes latches unique from blocking locks)
  • Latches are granted using the First-In-First-Out (FIFO) method
  • User table latch contention is the simplest to troubleshoot, and is typically the most common source of confusion with latch contention issues. Frequent modifications to the data in specific tables can create hot spots in a database, or more specifially hot pages. This isn’t a symptom of frequent SELECT operations, but rather INSERTs, UPDATEs, and DELETEs.
  • When troubleshooting a latching issue:
    1. Check the latch class description in Books Online, if available.
    2. Examine the latch class for indications of what components or what type of statements may acquire the latch.
    3. Examine the blocked task’s current statement for more hints regarding the latch usage.
    4. Use the suffix of the wait type to determine the mode for the blocked request. This will help identify whether the latch is being acquired for shared access—in which case, it must currently be held for exclusive access. An exclusive request implies that the blocked task’s current session needs to do some update, and this can provide further info regarding resolving the blocking.
    5. Query the following DMVs:
      1. sys.dm_os_wait_stats
      2. sys.dm_os_latch_stats
      3. sys.dm_exec_requests
      4. sys.dm_tran_active_transactions

So, to get to the original question: what are you reading incorrectly? Nothing. In your case you had (screenshots not included since I’m not that well-heeled a blogger yet) two SPIDS, both exhibiting parallel coordination (CXPACKET), shared (PAGELATCH_SH) and update latches (PAGELATCH_UP). Since you were observing queries in a parallel environment and simultaneously saw thread coordination waits, once the threads were complete, the latch was removed and the blocking went away; because of the FIFO behavior of latches, one incompatible latch type will cause the rest to wait. Since SQL Server 2000 SP4, Microsoft’s been identifying latches as the cause of blocking in sysprocesses, however since there is no owner information for certain lock types, the DMVs in SQL Server 2005 will not show you a blocker (for example, in sys.dm_os_waiting_tasks) when the blocking latch type is a SH latch.

For more information on troubleshooting performance problems in SQL Server 2005, read the aptly named Microsoft whitepaper Troubleshooting Performance Problems in SQL Server 2005.

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>