Right, I know it’s Friday and everyone’s tired and looking forward to the weekend, but I do need to finish off this indexing section and I’ll try to keep this short and interesting and hopefully keep everyone awake.
There’s no shortage of information available on how to create indexes. Hell, I’ve written a copious amount myself. Most of these many articles however are written from the point of indexing single queries. What you chose for a where clause, what has to go into the include to create the perfect index for this query. Now that’s all well and good, but I’ve never met a system that had only one query per table (maybe there is such a system out there, but I’ve never found it)
So what I’m going to try to do today is address the topic of a strategy for indexing. How to approach indexing, not for a single query, but for the system as a whole. I won’t be able to cover this in-depth, this is material worthy of an entire book chapter, if not an entire book, but I can at least touch on the essential portions.
Now, there’s two main positions that we could be in when considering indexing strategies for an entire system
1) A brand new system that’s still in development
2) An existing system that’s being used actively.
One at a time…
Indexing strategies for a brand new system
Start by choosing a good clustered index. What makes a good clustered index? Well, it depends
- http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx (video)
The clustered index is the base, it will affect each and every nonclustered index, and it’s not trivial to change once the system is in use, so chose carefully. I’m not saying another word on the subject of a clustered index, not today.
Once that’s done…
Design any unique constraints, unique indexes or primary key constraints that are required by the database design. If the primary key should go on the column(s) that were chosen as the clustering key, great, the primary key gets created clustered. If not, then the clustered index goes on the column(s) chosen for the clustered index and the primary key gets created as nonclustered.
Index the foreign keys. These may end up not being the final indexes on these columns, but it’s an excellent place to start and it’s something that’s left out far, far too often.
That’s the absolute bare minimum that must be done. That can all be done with just the database design. The rest is going to require some knowledge of the queries that will be running against the server.
Speak to the developers, see what queries they’re going to be sending to the database, speak to the business analysts (or users) and see what think will be the commonly used aspects. Bear in mind that both the developers and business analysts (or users) may well be wrong. Not intentionally wrong, but wrong because they’re looking at things from a different perspective.
To give an example of that, a system I worked on some time back had custom security built-in to the DB (tables storing access rights to various sets of data). The users swore that the most accessed portion of the system was the address book. The developers claimed that the account balances procedure would have the heaviest impact. A trace showed that the custom security ran far more frequently than anything else.
Hence, if you can trace a workload (from automated testing, user testing or acceptance testing is the best) you should. Combine that with what the developers and business analysts say and that should be reasonably accurate.
Find the most critical queries, the ones that are going to run often. These may be ‘housekeeping’ queries like the custom security, or maybe they’ll be queries run when the user opens the app. It’s going to differ for everyone.
Create a minimal set of indexes to support the most critical, most frequent queries. Do not, at this point, try to index everything. It’s going to be a waste of time without accurate stats of how the users really use the system. Create just a minimum of indexes to start with. You want enough so that the system runs acceptably but not so many that you’ll be cleaning unused indexes off for months after the system goes live.
I’m not going to go into detail here on how to create indexes, see all the links that I gave in the first part of this series. There are, however, a few things to keep in mind
- Fewer, wide indexes are better in general than lots of narrow indexes (http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/)
- Selectivity should not be the first thing you consider when choosing column order for indexes, especially indexes that are going to support multiple queries. (http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/ and http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/)
Once done, come back to the indexes after the system is in use and re-evaluate.
Indexing strategies for an established system
This one depends heavily on how badly the system is indexed.
If tables are missing clustered indexes or primary keys that should be the first priority. This is harder than for a new system as, in the absence of enforced constraints, duplicate values could have crept into the supposedly unique columns.
If primary keys can’t easily be added (due to errant data), the clustered indexes should still be considered. All the same considerations as for a new system apply here. It’s worth noting that adding clustered indexes to huge tables is not a quick exercise.
There are two factors to fixing the indexing for an existing system
- creating or widening indexes to support the current workload
- removing indexes that are not used
Creating or widening indexes
This should be done based on the workload. Either SQLTrace or Extended Events can be used to capture the queries running against the server. This can be examined manually or it can be submitted to the Database Tuning Adviser (DTA).
If using DTA, that must not be the entire story. DTA recommendations have to be carefully examined and tested and implemented only if they make sense (and improve performance). Be wary of accidentally creating redundant indexes this way (How can you tell if an index is REALLY a duplicate?)
If manually tuning, the same cautions as mentioned in the ‘Indexing a new system’ apply.
Also, be aware that you can’t create perfect indexes for all queries (except maybe in a data warehouse, but likely not even then). Index for the important (frequent or high priority) queries, create indexes that can support multiple queries. Tune the system, not the individual query. See Indexes: just because you can, doesn’t mean you should! (Kimberly Tripp).
If the less important, less frequently run queries aren’t quite as optimal as they could be (but are still in the acceptable range), that’s fine. Be very careful of over-indexing.
To be honest, this is fraught with peril. Telling that an index is unused is not as easy as it may seem.
Start with the sys.dm_db_index_usage_stats DMV. Indexes that have no seeks, no scans and just updates, or ones that don’t appear in there at all may appear to be unused. Whether they are really unused however is another question.
The sys.dm_db_index_usage_stats DMV is cleared by a restart of SQL or any time the database is closed (beware auto_close). So if the SQL instance has only been running for three days, then the best that can be said about indexes that appear in sys.dm_db_index_usage_stats with no seeks and no scans is that they haven’t been used in three days.
Before deciding to drop any indexes monitor that DMV over a period of time. How long? Depends on your application. If it’s an app that has a steady and consistent usage, maybe not long. If it’s an app that has radically different usage patterns at different times of the month/year, then long enough that you capture them all.
Also, make sure you keep documentation and preferably scripts of any indexes dropped, so that you can easily recreate them should it be necessary.
Comprehensive indexing strategies are not exactly easy things to write short posts on, but I hope this has given some idea on how to fit all the pieces together. For another view on this topic, be sure to watch Kimberly’s video on indexing strategies: http://technet.microsoft.com/en-us/sqlserver/gg545006.aspx
Oh, I almost forgot the answer to Wednesday’s homework (for anyone that’s still awake). There are likely many answers, but this one satisfies the requirements:
CREATE NONCLUSTERED INDEX idx_HomeworkAnswer ON dbo.CallLog (Severity, LastUpdateDate, CallStatus) INCLUDE (AssignedTo, LogDate) WHERE LastUpdateDate IS NOT NULL AND Severity IN (1,2)