Multiple filegroups: yay or nay?

Filed under: Administration, Database Design — Brent Ozar at 9:34 am on Monday, September 29, 2008

Stephanie wrote in with two questions:

  1. Is there a gain in getting 2 LDF (log) files for one database?  My opinion is no, because log file access is sequential, so no performance gain can be obtained and this will complicate maintenance.
  2. Is there a gain in separating the data & indexes onto 2 specific filegroups?  My opinion is no, because the way SQL Server handles indexes (as opposed to Oracle) does not help if you split them from the data.

First things first, multiple log files - I agree that you don’t want to do two LDF files for a single database unless there’s special circumstances.  At one shop, we had a pre-grown full-size log file for each database (with autogrow turned off), but we kept an “emergency” log file for each database on an empty drive.  The “emergency” log files were 1mb, and set to autogrow.  If a database completely ran out of log space in its normal pre-grown log files, then it would start using that “emergency” log file and growing it.  That was better than stopping the database completely, and gave us enough time to kill transactions and roll things back.

That certainly wasn’t a best practice, and it didn’t buy us any performance, but we had to do it based on the unpredictable nature of those applications and the way they would sometimes load huge amounts of data in a single transaction.  In that case, multiple LDF files served a purpose.  (When our Microsoft contact found out what we were doing, they hit the roof and told us to just go buy enough disk to be done with it, and the business guys finally decided to shell out the dough.)

Adding a second file adds complexity, like you said, and the same holds true whether it’s a log file or a data file, which segues into your second question.  Adding a second data file for indexes adds more design overhead - suddenly the DBA has to pay attention with where they’re placing objects.

In the days of my youth (ahh, for the sweet days of 2004), I liked this idea for my 1tb data warehouse because I could rebuild indexes faster with less impact on the end users.  Now that I’m old (okay, maybe just older) and SQL Server 2005 Enterprise Edition lets us rebuild most indexes online, I don’t bother with that anymore.

Plus, when I started managing storage area networks, I had a different reason.  Storage is getting faster and faster every year, but DBAs aren’t getting any more hours in their day.  Keeping data and indexes on different filegroups requires careful attention during development.  All it takes is one knucklehead adding an index without paying attention, and bam, you’ve got indexes on the data filegroup.  That means you have to introduce a new task into your daily/weekly/monthly routine: running a query to identify which objects are on which filegroups.

What a pain.

I’ve struggled with finding and hiring good DBAs, and if I have a choice between making my database 5% faster or making my DBAs 5% faster, I choose that latter option.  I can get 5% more storage performance by adding a few more drive spindles, but it’s harder - and more expensive - to get 5% more good DBA time.

On the other hand, if I was in a shop where my DBAs had plenty of time (or worked for peanuts, whate, then I would take a sandbox system, set it up the same way as production, and do performance benchmarking to see how much of a difference a separate index filegroup made on my exact hardware, storage and application.

You’re asking me, though, so I bet you don’t have the time to do that research either.  In that case, keep it simple!

Can SELECT Statements Cause Blocking to Occur in SQL Server?

Filed under: Administration, Database Design, I'm a Newbie, Internals and Architecture, Programming, Transact-SQL (T-SQL) — Jason at 11:00 am on Friday, September 26, 2008

The following question was posed to me during a call today and its one that I’ve heard quite a bit so I figured it warranted a blog post.

The answer is, absolutely!  SELECT statements acquire a shared lock on the tables being accessed.  This shared lock will not affect other SELECT statements hitting the same table, but if someone tries to modify data in the table (via an UPDATE statement for example), the UPDATE statement will be blocked.  When analyzing locks using sp_lock the (S) symbol indicates a shared lock is on an object.

This default behavior can be modified by using one of many query hints.  Using the NOLOCK hint on a SELECT statement will force SQL Server to read data from the table without creating a shared lock on it.  When using this hint, you run the risk of reading uncommited data from the database but in cases where reading data with 100% accuracy is not required, NOLOCK can dramatically reduce blocking and improve the performance of your SELECT queries.  Another option is to use the READ UNCOMMITED isolation level when running your transactions, which conceptually does the same thing.

One additional caveat about both methods.  If your SELECT statement is running, and it expects to read a page that has been deleted by a transaction that is currently executing, SQL Server may deadlock this transaction.  There is an interresting post on this topic that can be found here.

Error Message: Ad hoc update to system catalogs is not supported.

Filed under: Administration, Database Design, Internals and Architecture, Other, Programming, SQL, SQL Server 2005, Transact-SQL (T-SQL) — Jason at 7:26 am on Wednesday, September 24, 2008

I ran into an interresting situation today while working with a customer and thought it worthwhile to blog on the subject since my internet searching proved lengthy on the subject.  While trying to run a RECONFIGURE statement after making an sp_configure change I received the following message:

Msg 5808, Level 16, State 1, Line 2
Ad hoc update to system catalogs is not supported.

I’m running on SQL Server 2005 SP2+ so my first thought was… obviously, but I’m not trying to update system catalogs, I’m trying to make an sp_configure change.  After doing some digging I found that the culript was the sp_configure ‘allow updates’ parameter.  This configuration in SQL Server 2000 allowed or dissalowed direct system table updates.  In SQL Server 2005, this configuration item still exists but it is obsolete since direct access to system tables in always prohibited.  While the configuration item is obsolete, having it set to 1 in SQL Server 2005 requires you to run the RECONFIGURE statement using WITH OVERRIDE, otherwise you will get the message above.

If you ever see the message above when trying to run RECONFIGURE you will either need to run RECONFIGURE WITH OVERRIDE, or first run:

EXEC sp_configure ‘allow updates’, 0
RECONFIGURE

I find it confusing that an “obsolete” configuration parameter can have an effect on instance behavior, but oh well.  I’d love to hear comments if anyone has any insight into this or similar issues.

Thanks to Jasper Smith, who posted here and pointed me in the right direction.

How Can I Determine Which Tables and Indexes Are Not Being Used in My Database?

Filed under: Administration, Database Design, Internals and Architecture, SQL Server 2005, Transact-SQL (T-SQL), Tuning and Optimization — Jason at 11:33 am on Wednesday, August 27, 2008

Q: I have inherited a legacy applications of which I am certain there are many unneccesary tables and indexes.  I am slightly paranoid about ripping them out however as I am not 100% sure that they are not used.  Is there any way that I can determine which indexes and tables are safe to drop?

A: As long as your application is running on SQL Server 2005+, you have access to a Dynamic Management View (DMV) that displays exactlky this information.  Querying from sys.dm_db_index_usage_stats will tell you how many times a given index/table has been scanned, seeked, looked up, updated, and even gives you the last date when an operation happened.  Browsing through this DMV will allow you to determine with confidence any index/table that has never been touched (since SQL Server was last restarted).  There are many exampes of some elegant queries that people have written that can be found by searching the DMV name but its a fairly straight forward one to understand.

One got-chya, the data in sys.dm_db_index_usage_stats will show tables/indexes across all databases where sys.indexes only shows you data from the database you are in.  This creates some challenging hurdles if you are trying to join on those tables to lookup the index name.  It’s best to keep it simple and filter dm_db_index_usage_stats to only show you data from the database you are in.

A fairly simple example with explanation can be found here.

How do I query DB2 data from SQL Server?

Filed under: Administration, Database Design, I'm a Newbie, Internals and Architecture, Programming, Security — Jason at 3:15 pm on Wednesday, August 20, 2008

Q:  We use DB2 Connect to access DB2 from MS Access. Do we need this for SQL Server also or does SQL Server have a built-in interface or do we need something else?

A: No you do not need to use DB2 Connect in able to access DB2 data within SQL Server.  SQL Server supports linked servers which allow you to register a remote data source in SQL Server and execute queries against it.  Linked servers can be remote SQL Server instances or instances of Oracle, DB2, Sybase, MySQL, etc…  As long as there is a supported OLEDB driver for the data source you can add it as a linked server.  There is a blog post you can read here:

http://blogs.msdn.com/dotnetinterop/archive/2006/01/20/defining-a-db2-as-a-linked-server.aspx

that walks you through the process of setting up a linked server to DB2 and you can also look up Linked Servers in books online for more information.

Clustered index impacts on writes vs reads

Filed under: Database Design — Brent Ozar at 11:59 am on Thursday, May 29, 2008

After today’s webcast, “What a Cluster! Clustered Indexes” we got an email from David Holt at Kaye Scholer LLP. He writes:

I just attended the webcast “What A Cluster! Clustered Indexes”; thanks for the info!

Clustered indexes obviously impact writes as well as reads, and there can be a conflict between the needs of the two; i.e., if a lot of different people are writing to the same table then you don’t want them to block each other, yet you want simultaneous queries (selects) on that table to be efficient. That seems to argue that the clustered index should make writes as efficient as possible, even if it’s not the most effective clustered index for reads. Would you agree?

Uh oh - you know it’s dangerous when someone from a law firm asks if you would agree, and it’s even more dangerous when that answer is no. I’ll take my life into my own hands and answer anyway.

When designing an index strategy for a table, it helps to know the read/write mix and any time ranges for each. I can illustrate with two examples of opposite ends of the spectrum.

Let’s say our company has a web filtering database that logs all employee web activity to a database table, including the user info, date/time, and the URL they were surfing. Every now and then, they’ll run reports to see what kinds of web sites the users are hitting, and sometimes they’ll audit the web history of a particular individual.

Example #1: The Web History Table

The web activity table would be extremely write-intensive, since we have a lot of company employees who sit around surfing the web all day long. (Like you right now, come to think of it - get back to work!) A write performance reduction as small as 10% might have a significant impact on our ability to keep up with our surf-happy employees. Plus, we store the data on a SATA RAID5 array with slow writes to begin with.In a case like this, we could make an argument for a clustered key using an identity field or using a datestamp field. That would keep write speeds quick at the cost of making reports slower. However, at report time, these users don’t really care how long the report takes to run - they just want to see it sooner or later.

Example #2: The IP Subnet Table

Our web filtering application would store a list of IP address subnets - network TCP/IP address ranges that identify workstations on our network. We would not do very frequent inserts into this table, because our network isn’t expanding like crazy.

When we query this table, the queries use a lot of range queries because they want to group users together by location. The IP address subnets are in similar ranges. The fields on this table include a lot of things that we want to report on, like geographic location and company department.

This table could have a completely different primary key design because it’d have different design objectives: faster queries instead of faster inserts.

Data warehouses are another great example of this kind of design: nightly loads happen during one window (nighttime), and end users do queries during a different window (daytime). Depending on our loads, we might design our primary keys differently. If we have plenty of time at night to load our files, and our users want faster responses on their daytime queries, then our primary keys wouldn’t be designed for fast inserts.

There’s yet another consideration in this design: the number of other indexes on the table. If a table has ten wide covering indexes, then the arrangement of the primary key may not make as much of a difference as you’d think, because there’s so much overhead in maintaining the indexes.

Bottom Line: Measure Every Change

This is why I drove home the Measure, Change, Measure, Revert point in the presentation: the only way to find out for sure is to get a complete picture of your query loads, design a replayable test (like a series of queries) and keep measuring the results.  If possible, replicate the production system as closely as possible in development, because the storage systems will affect your results.

Clustered Indexes & SQL 2005’s Performance Dashboard

Filed under: Administration, Database Design, SQL Server 2005 — Brent Ozar at 6:53 am on Thursday, May 29, 2008

In today’s installment of the Quest Pain of the Week webcasts, Jason Hall and I will be discussing a reader’s question about finding missing clustered indexes, their impact on performance, and how to be proactive about monitoring performance to find those types of design issues.

MSSQLTips.com has a great post called “Finding primary keys and missing primary keys in SQL Server.”  That post includes two snippets of code, one for each of the popular SQL versions:

Finding missing primary keys in SQL 2005:

SELECT c.nameb.name
FROM sys.tables b
INNER JOIN sys.schemas c ON b.schema_id c.schema_id
WHERE b.type ‘U’
AND NOT EXISTS
(
SELECT a.name
FROM sys.key_constraints a
WHERE a.parent_object_id b.OBJECT_ID
AND a.schema_id c.schema_id
AND a.type ‘PK’ )

Finding missing primary keys in SQL 2000:

SELECT c.namea.name
FROM sysobjects a
INNER JOIN sysusers c ON a.uid c.uid
WHERE xtype ‘U’
AND NOT EXISTS
(
SELECT b.name
FROM sysindexes b
WHERE a.id b.id
AND (b.status 2048)<>0)

You can read their post for similar queries on finding tables WITH primary keys too.

Also in today’s webcast, I’ll be showing a few screenshots from Microsoft’s SQL Server Performance Dashboard Reports.  They’re a free set of SSRS reports that are integrated into SQL Server Management Studio.  The reports I’ll be showing in the webcast are somewhat limited in that they only show data that’s currently in the plan cache.  If your server is under memory pressure, it may not retain this data for too long, and you would have to revisit these reports pretty often to get a good picture of what’s going on under the hood.

For more information about the dashboard and how to use it, check out Kevin Kline’s Tool Time article “SQL Server 2005 Dashboard Reports” in SQL Server Magazine’s August 2007 issue.

Implementing a Hash Partition on SQL Server 2005

Filed under: Database Design, Programming, SQL, SQL Server 2005, Transact-SQL (T-SQL) — KKline at 2:54 pm on Monday, April 21, 2008

One of the best things about becoming a Microsoft MVP is meeting other MVPs.  I bring this up because last week was the annual MVP Summit in Seattle, WA.  I was really looking forward to meeting Steve Kass.  Steve Kass is one of the smartest SQL Server MVPs I’ve encountered, especially when it comes to SQL questions.  A while back, I noticed that Steve made an interesting recommendation for a hash function that you could use for partitioning that I thought was worth noting.  A hash function would be very useful if you wanted to implement your own variation of a range partition using a hash function rather than the standard sort of range partitioning where colA values of A-H go to partition 1, values of I-P got to partition 2, and so forth.

 

Steve notes that you could use the following for hashing something small in size:

   CAST(    SUBSTRING(      HASHBYTES(’SHA1′,         CAST(my_col AS NVARCHAR(appropriate_size))),8,1) AS tinyint) 

This is just an off the cuff recommendation from Steve and might need some fine tuning, for example, the CAST might throw off persistence.  However, it’s a good start.

 

Thanks, Steve, for sharing this and thanks, readers, for sharing any improvements you might develop out in the field.

I am interested in migrating from Sybase ASE to SQL Server and I am not making progress with SQL Server Migration Assistant provided by Microsoft. Are there other tools that provide an a the ability to migrated Sybase ASE 12.5 To SQL Server 2005.

Filed under: Administration, Database Design, SQL Server 2005, Transact-SQL (T-SQL) — Ari Weil at 6:12 am on Sunday, February 24, 2008

First of all, welcome to the light, we’ve been waiting for you. ;-) Just kidding…(sorta).

The first question you need to answer is what’s been causing you problems? Have you taken a look at Microsoft’s Guide to Migrating from Sybase ASE to SQL Server 2005 whitepaper? The planning phase is critical in ensuring your migration is successful. You should definitely do some due dilligence with regard to equivalent, nonsupported and emulated functions as well as unique datatype conversions, and syntax differences/disagreements to determine what’s going to create problems.

There is certainly no shortage of third-party applications to help you migrate your databases and applications, but this can be the perfect opportunity to gain some deep insight into your environment. Yes, that is a glass-half-full way of looking at it, but it also happens to be true. A third-party application can help, but if something during the migration goes south, you or the vendor’s support are still going to have to investigate your environment to determine what went wrong and why. Wouldn’t you rather know?

Personally, I’m a big fan of testimonials and the type of information you usually find in forums and Google groups. In an article entitled Migrating from Sybase to SQL Server Sayed Geneidy talks about his experience migrating to SQL Server 2000. Look at the sections on data compatibility mode and optimizer hints for some items that frequently cause people headaches during a migration.

Hope that helps!

I keep reading about the dangers of autogrow but can I really size the database accurately enough to disable it?

Filed under: Administration, Database Design, Tuning and Optimization — Ari Weil at 2:57 am on Tuesday, January 29, 2008

First of all, it’s important to realize that the autogrow feature was developed for a reason. Consider an extreme set of alternatives: either your mission-critical database starts throwing out of space errors or processing slows when the autogrow feature gows the database’s data files. Obviously, when faced with these alternatives you’ll want the diminished I/O throughput in lieu of no throughput whatsoever.

Performance tuning specialists and SQL Server internals gurus preach about the dangers of autogrow because it’s important to do everything in your power to avoid having the feature triggered at inopportune moments. Let’s use a more realistic set of alternatives (out of space errors should not be realistic in production) based on a stock trading website with a  backend SQL Server database. This type of website usually guarantees that trades will be performed in the neighborhood of a tenth of a second. You now have a choice between putting in a few hours/days of grunt work calculating the database’s storage needs and potentially running into a situation where autogrow fires right before the market closes; I/O throughput will slow to a crawl and disk contention will lead to blocking locks when autogrow fires; you’d be well advised to start updating your resume because the company will face legal action and/or huge financial consequences.

Planning for database growth over time is not an easy task - it’s not that the process is particularly complicated, but the time required to accurately plan for growth and understand usage trends can make it a tedious process. SearchSQLServer recently posted an article by Denny Cherry, someone who’s got experience dealing with large deployments (he worked on the MySpace 175+ million user installation). His article on Determining SQL Server database storage requirements goes over two generally accepted database sizing methods and points out the danger of using a “good enough” estimation versus a truly accurate one.

A little planning can go a long way…

Next Page »