Indexing Strategies
From SQLServerPedia
|
See Also: Main_Page - Database Administration - Database Design - Indexing Choosing appropriate indexes can improve your application performance significantly, often by an order of magnitude. At the same time choosing an index strategy is not as simple as it might sound. There are several points to consider when choosing indexes.
More Fields or Less Fields On Each Index?Each index you define might improve performance of a SELECT query; on the other hand the same index will slightly worsen the performance of each INSERT, UPDATE and DELETE statement. The reason for this is that SQL Server automatically maintains the index keys. Therefore, each time you issue a data modification statement, not only does SQL Server have to make changes to the data, but also to each index defined on the affected table. The performance degradation is especially noticeable with large tables with many indexes, or few indexes with long keys. SQL Server 2000 has come a long way compared to its previous releases as far as index maintenance is concerned. However, in some cases, it still makes sense to drop the indexes, issue the data modification statement and then re-create the indexes. For instance, suppose you have a fact table with millions of rows in a data warehouse. Since your data analysis queries affect this table and multiple dimension tables you create a clustered index on the combination of the dimension keys in the fact table. When you try to populate the data in the fact table from the various OLTP systems you will quickly realize that the INSERT statements are extremely slow. Some experts even suggest that you populate your fact tables through bulk loads (using BCP and / or BULK INSERT). Although in some cases this could be beneficial, loading the data out to the file system and then loading it back to SQL Server seems somewhat wasteful. A better solution might be to drop the clustered index, load the new data and then re-create the index. You will find that the INSERT query without the indexes will be much faster. Even with the time it takes to drop and re-create the indexes your whole population routine will be at least several times faster than BULK loads. The performance of INSERTS without the indexes will be at least an order of magnitude faster on a several million row table than the performance of INSERT with indexes. Remember, INSERT is a logged operation, whereas BULK INSERT and BCP are not. Therefore, you will have to clean up the transaction log if populating the fact table with INSERT statements. Since data warehouses are used for read-only purposes and your routine is the only connection to alter the tables, you can safely truncate the transaction log once your INSERT routine is complete. In earlier versions of SQL Server each query could only take advantage of a single index. In such environments it made all kinds of difference to have a proper index for each query - sometimes you had to sacrifice the performance of less important queries for the price of having a good index for the critical queries. With SQL Server versions 7.0 and later, query optimizer can use multiple indexes per table. Therefore SELECT queries will perform better with each index they can use. Once again, you need to balance the performance of SELECT queries with the performance of INSERT, UPDATE and DELETE operations. Clustered or non-clustered?Another decision to make is whether you need a clustered or non-clustered index on a particular column or set of columns. This is where it helps to have an understanding of the index architecture. SQL Server indexes have a B-tree structure with the root node, intermediate levels and leaf nodes. With the non-clustered indexes the leaf nodes are the keys of the clustered index. With the clustered indexes the leaf nodes represent the data itself. This brings up two important points to keep in mind:
To illustrate these points, lets look at the employee table in the pubs database. By default this table has a clustered index employee_ind defined on the combination of the following columns: lname, fname, minit. This means that the data will be stored ordered by last name, then first name and then middle initial. Now, let's drop the clustered index and create a new clustered index on the same table, this time on the hire date column, as follows: DROP INDEX employee.employee_ind GO CREATE CLUSTERED INDEX emp_hire_date ON employee(hire_date DESC)
SELECT * FROM employee
A clustered index will be much faster for data retrieval operations than the non-clustered index, because the data is sorted according to the clustered index definition. What this means to you as a developer is that the clustered index should be used sparingly, for the most important column in each table. SQL Server allows up to 249 non-clustered indexes per table therefore, if you do need an index on other columns you can always create a non-clustered index. Keep in mind that SQL Server enforces PRIMARY KEY constraints by placing a unique index on the table. Unless told otherwise, and if there is no clustered index on the table, SQL Server will make the PRIMARY KEY index clustered. When creating indexes other than PRIMARY or UNIQUE KEY indexes, by default SQL Server will create them as non-clustered. If you have a column with an identity property AVOID putting a clustered index on that column. It is not likely that your users will ever run queries affecting the identity column since it has no business meaning. Nor would you ever have to order by the identity column since such sorting operation makes no business sense. In addition, a clustered index on the identity column will force users entering new data into the table to populate the last data page of the table. This condition is sometimes referred to as a "hotspot" since there may be multiple users competing for the last available spot on a page, therefore making the INSERT statements rather slow. If you are maintaining an application with a clustered index on an identity column, an easy tuning option is to move the clustered index to a more meaningful column. It is very important to have a clustered index on every table. If a table does not have a clustered index then all new records have to be added to the last data page occupied by the table. If there is a clustered index on a table, then new rows can be added to the last page or to the middle of the table, whichever position is suitable to the new row according to the way data is sorted in the table (according to the way clustered index was created). In general the clustered indexes are good for queries that:
Point 3 might need some extra explanation. SQL Server cannot take advantage of the index (clustered or not) if the query does not specify the columns in the same order as they were submitted at index creation. This means, if you have an index on lname, fname, minit in the employee table and your query selects fname, minit, lname, the index cannot be used. The index you have created could be used only for the queries that specify the columns in the following order for the SELECT list:
Any other order of the columns cannot take advantage of the index. You might experience an exception to this rule if all columns needed for the query are in the index and SQL Server decides that it would be faster to scan an index than to scan an entire table. In such case SQL Server might still use the index even if the query does not specify columns in the same order as the index. In general you should avoid placing a clustered index on a frequently changing column. Since SQL Server has to sort the data according to the clustered index definition, changing clustered key values will result in extra work and slow down the UPDATE / INSERT operations. You should also avoid having long keys for clustered indexes, unless your queries refer to the entire combination of all columns in the clustered index frequently. The reason is that the non-clustered indexes on the same table will use the clustered keys for lookups and therefore grow very large. The non-clustered indexes are good for retrieving a few rows and for the queries that have exact match (=) conditions in the WHERE clause. In general you can use non-clustered indexes any time you need to speed up the query on a particular column, but the clustered index is already being used by another column (or combination of columns). Each time you rebuild the clustered index (whether you move it to a different column or not) all non-clustered indexes are rebuilt as well. This happens because the non-clustered indexes use clustered index values as their lookup keys. When you move the clustered index to a different column the whole table is copied to a temporary storage, ordered according to the clustered index definition and recreated. Therefore, dropping and re-creating clustered indexes is a resource intensive and time-consuming operation which should be performed during the periods of limited user activity. How Many Fields Should The Index Have?You also have to decide how many and which columns to include in your indexes. Most of the time your queries will determine whether you need to include a particular column in the index or not. However, you have an option to create more indexes with fewer columns (or index keys). The shorter your index the more index keys will fit on a single page and the index will take up less space. This means that scanning such index pages will take less time and the index will be more efficient. On the other hand, some queries that contain the search criteria defined on several columns could benefit from a long index that includes all columns mentioned in the WHERE clause. Such an index is referred to as a "covering index". Note that if the index key is too long, SQL Server might not be able to store more index keys on a single page than the actual data. So the covering index keys should fit on a single page in order to be beneficial. If the index is narrower than the actual table then more index keys can be placed on a single page and therefore the entire index will be easier to scan. For example, suppose we have a report that retrieves employee id, name, and job level. We could create a covering non-clustered index on the employee table to optimize the query generating this report as follows: CREATE NONCLUSTERED INDEX emp_name ON employee(emp_id, fname, minit, lname, job_lvl )
Index Selectivity and DensityIt is important to understand the concepts of index selectivity and density. Index selectivity measures the number of distinct key values in the table. Therefore, a unique index, such as the PRIMARY KEY index will be perfectly selective. In general the higher the selectivity of an index the better for SQL Server query optimizer. If an index is not very selective the query optimizer might decide that it would be more cost effective to scan an entire table than to scan an index. On the other hand, index density measures number of duplicate index key values in the table. Therefore, more selective indexes will have lower density. Usually the best indexes will be the ones with the highest selectivity. However, that does not mean that non-selective indexes are useless. Non-selective indexes are helpful when they cover a particular query. For instance there are only a handful of states in the authors table, however, the query calculating the number of authors in each state might have to scan the entire table. If we build an index on the state column the query can take advantage of that index and be more efficient. More SQLServerPedia Articles on IndexesHow SQL Server Indexes Work
Types of Indexes in SQL Server
Best Practices on How to Design Database Indexes
Maintaining Indexes for Top PerformanceIndexes need regular maintenance in order to perform well.
|