Choosing Index Keys

From SQLServerPedia

The effectiveness of an index largely depends on how selective its keys are: a key is highly selective when each of its values matches one row or only a few rows. For instance, the primary key of the authors table is au_id. This key identifies each author uniquely, so each author identifier is associated with a single record. By contrast, each value of the contract column (1 or 0) in the authors table is associated with many authors, so that column is not very selective. The query optimizer is more likely to use an index with higher selectivity to satisfy a query. Therefore, you should strive to build indexes on keys with high selectivity.
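As a rough illustration, selectivity can be estimated as the ratio of distinct values to total rows. The query below is a minimal sketch against the pubs sample database; the column aliases are made up for this example.

```sql
USE pubs;
GO

-- Selectivity estimate: distinct values / total rows.
-- 1.0 means every value is unique; values near 0 mean many rows share each value.
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT au_id) * 1.0 / NULLIF(COUNT(*), 0) AS au_id_selectivity,
    -- contract is stored as bit; the cast to tinyint keeps the aggregate portable
    COUNT(DISTINCT CAST(contract AS tinyint)) * 1.0 / NULLIF(COUNT(*), 0) AS contract_selectivity
FROM authors;
```

Because au_id is unique, its ratio should come out at 1.0, while contract can have at most two distinct values no matter how many rows the table holds.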

Index density is the inverse of selectivity: SQL Server calculates density as 1 divided by the number of distinct key values, so the lower the density, the higher the selectivity. This also means that the query optimizer is less likely to use an index with high density than one with low density.

Information about index selectivity and density is stored in distribution statistics. You can retrieve this information by executing DBCC SHOW_STATISTICS, as follows:

USE pubs
GO
DBCC SHOW_STATISTICS ('authors', 'aunmind')

The first result set of the output (the statistics header) will look similar to the following:

Updated              Rows     Rows Sampled  Steps  Density  Average Key Length  
Jun 13 2003 11:32PM  7075442  7075442       135    7.45E-07                  8
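If only the density figures are of interest, the output can be narrowed with the DENSITY_VECTOR option. The sketch below assumes the aunmind index from the pubs sample database (built on au_lname, au_fname) and cross-checks the first density value by hand; the alias au_lname_density is made up for illustration.

```sql
USE pubs;
GO

-- Return only the density vector for the aunmind index.
DBCC SHOW_STATISTICS ('authors', 'aunmind') WITH DENSITY_VECTOR;

-- Cross-check the density reported for the leading column (au_lname):
-- density = 1 / number of distinct values.
SELECT 1.0 / NULLIF(COUNT(DISTINCT au_lname), 0) AS au_lname_density
FROM authors;
```

The two numbers may differ slightly if the statistics were built from a sample or have not been updated since the data last changed.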

Although individual columns might not be very selective, several columns combined can provide good selectivity. For example, an index on a marital_status column alone won't be very selective; however, if you combine marital_status with job_category and income_level, the selectivity might go up significantly.
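One way to check whether a combination is worth indexing is to compare the number of distinct values of a single column with the number of distinct combinations. The sketch below assumes a hypothetical dbo.customers table containing the three columns mentioned above; the table name and aliases are illustrative only.

```sql
-- Hypothetical table: dbo.customers (marital_status, job_category, income_level, ...)
SELECT
    (SELECT COUNT(DISTINCT marital_status)
       FROM dbo.customers) AS distinct_marital_status,
    -- T-SQL has no multi-column COUNT(DISTINCT ...), so count the rows of a DISTINCT subquery
    (SELECT COUNT(*)
       FROM (SELECT DISTINCT marital_status, job_category, income_level
               FROM dbo.customers) AS combos) AS distinct_combinations,
    (SELECT COUNT(*)
       FROM dbo.customers) AS total_rows;
```

The closer distinct_combinations is to total_rows, the more selective the composite key.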

The order of keys also matters in a clustered index built on multiple columns. Since data in a table is ordered according to the clustered index key, an index built on (last_name, first_name) won't behave the same way as one built on (first_name, last_name): rows are sorted by the first column, and the second column only orders rows that share the same first value. As a general rule, the most selective columns should be specified first.
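The effect of key order can be seen with a small experiment. The sketch below uses a hypothetical dbo.people table and index name; comparing the execution plans of the two queries should show the difference.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE dbo.people (
    person_id  INT IDENTITY(1,1) NOT NULL,
    last_name  VARCHAR(40) NOT NULL,
    first_name VARCHAR(40) NOT NULL
);

CREATE CLUSTERED INDEX cix_people_last_first
    ON dbo.people (last_name, first_name);

-- Filters on the leading key column: the optimizer can seek into the index.
SELECT person_id, last_name, first_name
FROM dbo.people
WHERE last_name = 'Ringer';

-- Filters only on the second key column: the index is not sorted by first_name
-- alone, so this query typically has to scan the whole index.
SELECT person_id, last_name, first_name
FROM dbo.people
WHERE first_name = 'Anne';
```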

You should attempt to build a clustered index on keys that will NOT change often. Since data is stored in the order of the clustered index key(s), changing a key value might require moving the row from one page to another. Reshuffling rows this way adds overhead to the system.
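A common way to follow this advice is to cluster on a surrogate key that never changes and to index the volatile column separately. The sketch below assumes a hypothetical dbo.members table; all names and values are illustrative.

```sql
-- Hypothetical table: member_id is assigned once and never updated,
-- while email may change over the lifetime of the row.
CREATE TABLE dbo.members (
    member_id INT IDENTITY(1,1) NOT NULL,
    email     VARCHAR(255) NOT NULL,
    CONSTRAINT pk_members PRIMARY KEY CLUSTERED (member_id),
    CONSTRAINT uq_members_email UNIQUE NONCLUSTERED (email)
);

-- The clustered key is untouched, so the row keeps its position in the
-- clustered index; only the non-clustered index entry for email is relocated.
UPDATE dbo.members
SET email = 'new.address@example.com'
WHERE member_id = 42;
```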

Generally, it is recommended to build clustered indexes on columns with narrow data types. For instance, an index built on a column with the INTEGER data type will be more efficient than one built on a wide VARCHAR column. This is because every non-clustered index stores the clustered index key as its pointer to the data row, so a wide clustered key makes every non-clustered index on the table wider as well. Leaner indexes take fewer pages to read through and can therefore be more effective. Keep in mind, though, that this advice is rather broad. The effectiveness of your index largely depends on the way your application reads and writes data in the database. Building a clustered index on the column with the smallest data type in the table does not guarantee that the index will be helpful.
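To see how wide the existing clustered keys in a database are, the catalog views can be queried. This is a sketch that assumes SQL Server 2005 or later, where sys.indexes, sys.index_columns and sys.columns are available; the aliases are made up for this example.

```sql
-- Maximum clustered key width per user table, widest first.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name                   AS clustered_index_name,
    COUNT(*)                 AS key_columns,
    SUM(c.max_length)        AS max_key_bytes
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.type = 1                                        -- 1 = clustered index
  AND ic.key_ordinal > 0                                -- key columns only
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1    -- skip system objects
GROUP BY i.object_id, i.name
ORDER BY max_key_bytes DESC;
```

Tables at the top of the result carry the widest clustered keys, and each of their non-clustered indexes repeats those bytes in every row.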

SQL Server will let you build multiple indexes on the same key(s). However, doing so will not benefit your application. Duplicate indexes give the query optimizer more alternatives to evaluate, which can add overhead when choosing the optimal execution plan, and each duplicate must also be maintained on every insert, update, and delete.
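Duplicates can be hunted down by comparing the key column lists of all indexes on a table. The sketch below is one possible approach, not a complete tool: it assumes SQL Server 2017 or later (for STRING_AGG), compares key columns and sort direction only, and ignores included columns and filters. The CTE name and aliases are made up for this example.

```sql
WITH index_keys AS (
    SELECT
        i.object_id,
        i.index_id,
        i.name AS index_name,
        -- Build an ordered signature such as '2A,3A' from column ids and sort direction
        STRING_AGG(
            CONVERT(varchar(12), ic.column_id)
                + CASE WHEN ic.is_descending_key = 1 THEN 'D' ELSE 'A' END,
            ','
        ) WITHIN GROUP (ORDER BY ic.key_ordinal) AS key_list
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id
       AND ic.index_id  = i.index_id
    WHERE i.type IN (1, 2)                                  -- clustered and non-clustered
      AND ic.key_ordinal > 0                                -- key columns only
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1    -- skip system objects
    GROUP BY i.object_id, i.index_id, i.name
)
SELECT
    OBJECT_NAME(a.object_id) AS table_name,
    a.index_name             AS index_name,
    b.index_name             AS duplicate_index_name
FROM index_keys AS a
JOIN index_keys AS b
    ON  b.object_id = a.object_id
    AND b.key_list  = a.key_list
    AND b.index_id  > a.index_id;    -- report each pair once
```

Each row returned names two indexes on the same table with identical key columns in the same order; review them before dropping anything, since they may still differ in included columns or uniqueness.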

More SQLServerPedia Articles on Indexes

How SQL Server Indexes Work

Types of Indexes in SQL Server

  • Included Columns - starting with SQL Server 2005, you can add fields to indexes with a smaller performance hit.
  • Indexed Views - how to build indexes on top of views for more speed.
  • XML Indexes Overview - how to index XML documents and fragments.
  • XML Index Rules - some additional considerations and requirements for XML indexes.

Best Practices on How to Design Database Indexes

Maintaining Indexes for Top Performance

Indexes need regular maintenance in order to perform well.