Optimization Tips & Techniques
From SQLServerPedia
|
See Also: Main_Page - Performance Tuning This topic covers some of the tips and techniques that will get you started in optimizing your applications. The suggestions in this section work in given situations or when accomplishing a particular programming task. Get Rid of the Sub-Queries When You CanSub-queries are the queries included within other queries. At times, you can't help but use a sub-query to accomplish a particular task. For example, if you wanted to examine titles that have sold more copies per sale than the average quantity of copies sold, you could write the following query: SELECT * FROM titles WHERE title_id in (SELECT title_id FROM sales GROUP BY title_id HAVING AVG(qty) > (SELECT AVG(qty) FROM sales))
In some other cases, you can write a query using a join or using a sub-query. As a rule of thumb, the former will work better. Sometimes the query optimizer can replace the sub-query with a join itself, but that does not always happen and if it does, the optimizer does not do a good job of selecting proper indexes. Therefore, you are better off writing a query with a join, rather than a sub-query. SQL Server 2000 has come a long way as compared to its predecessors as far as sub-query optimization is concerned; however, if you feel like you are not getting an optimal performance, try replacing all sub-queries in your code with inner or outer joins. For example, both of the following queries return the authors for those titles that never sold more than 50 copies at a time: -- subquery solution SELECT * FROM titleauthor WHERE title_id NOT IN (SELECT title_id FROM sales WHERE qty > 50) -- execution time: 105 milliseconds -- join solution SELECT ta.* FROM titleauthor ta, sales s WHERE qty >50 AND ta.title_id <> s.title_id -- execution time: 47 milliseconds
Transact-SQL also provides extensions to UPDATE and <a href="DELE"> DELETE]] statements that let you join tables when determining rows to delete and update. This will let you rewrite ANSI SQL UPDATE and DELETE queries without using sub-queries. For instance, the following queries update year to date sales for a particular title; even though both queries perform the same task T-SQL query is more efficient since it does not have to use sub-queries:
-- ANSI SQL solution: UPDATE titles SET ytd_sales = ytd_sales + (SELECT qty FROM sales WHERE stor_id = '6380'AND ord_num = '722a') WHERE title_id = (SELECT title_id FROM sales WHERE stor_id = '6380' AND ord_num = '722a') -- Transact-SQL solution: UPDATE titles SET ytd_sales = ytd_sales + qty FROM titles a, sales b WHERE a.title_id = b.title_id AND stor_id = '6380' AND ord_num = '722a' Consider UNION ALL instead of UNIONThere is a slight difference between UNION and UNION ALL clauses. They both combine outputs from two separate queries that have the same number of columns, with the same (or similar) data types. However, UNION discards the duplicates and orders the output. UNION ALL does not perform either of those operations. If you need to get rid of the duplicates and have the output sorted, then use UNION. Otherwise, if you don't expect duplicates in the output UNION ALL will do a more efficient job. For example, the following query should not have any duplicates, it combines the sales with more than 25 copies and sales with less than or 25 copies: SELECT stor_id, ord_date, qty, title_id, 'sale with more than 25 copies' FROM sales WHERE qty > 25 UNION ALL SELECT stor_id, ord_date, qty, title_id, 'sale with less than 25 copies' FROM sales WHERE qty < = 25 The same query could have been written with UNION, as follows: SELECT stor_id, ord_date, qty, title_id, 'sale with more than 25 copies' FROM sales WHERE qty > 25 UNION ALL SELECT stor_id, ord_date, qty, title_id, 'sale with less than 25 copies' FROM sales WHERE qty < = 25 In this case both queries produce the same result set and since the query only affects a handful of rows the execution time isn't much different. However, performing the same operation on a million row table will quickly take toll on your application's efficiency. Also note that UNION creates a worktable in tempdb database, where it performs sorting. So using a UNION is also creating a danger of overflowing the tempdb if you don't have AUTOGROW option enabled. Impact of Triggers and Constraints on PerformanceTriggers and constraints are incredibly powerful and helpful when enforcing referential integrity. On the other hand, they can be the hidden causes of your performance problems. Since triggers execute automatically their impact on performance is tough to detect, especially if your application utilizes nested triggers - triggers that fire other triggers. When you are certain that your query is perfectly optimized, check for triggers on the affected tables. Ensure all triggers can use appropriate indexes for operations they perform. If those triggers are nested you might have to examine other triggers as well - it takes only one poorly performing code module to cause your application to hang. Depending on their type, constraints might also require useful indexes. For example if a foreign key constraint has to check for existence of a particular record in a related table, then it might have to perform a table or index scan. SELECT INTO versus INSERT SELECTThe SELECT INTO statement can be used to create a new table and populate it with the data in other tables in one shot. However, contrary to the common belief, SELECT INTO is not faster than creating a table first and then executing INSERT SELECT. In fact, most of the time INSERT SELECT will be quicker. You should avoid using SELECT INTO unless it only takes a few seconds, or the database is in a single-user mode and you are the only user. The SELECT INTO statement locks system tables for as long as it takes to execute it. Using that statement is especially harmful if you are populating a temporary table. Keep in mind that all temporary objects are stored in the tempdb database, which is shared by all users on the server. If you lock the system tables in tempdb, none of the other users will be able to create their temporary objects until you're done populating your table. The only benefit of using SELECT INTO is that it is not logged. One of the few times when SELECT INTO can be handy is when creating an empty copy of the existing table as in the following: SELECT * INTO authors_dummy FROM authors WHERE 1 = 10
SET NOCOUNT ONThis option turns off the reporting of the number of affected rows. When you are debugging your applications ensuring that the appropriate number of rows have been affected might be helpful; however, once your application is ready to be deployed consider setting the NOCOUNT option on for all of your stored procedures and triggers. If you have a procedure that contains numerous SELECT, INSERT, UPDATE and DELETE statements you will be returning much useless information to the user, thus placing an unnecessary burden on the network, especially if your procedure is executed frequently. Setting NOCOUNT on reduces information going from the server to the clients and improves overall response time of your transactions. Functions might Suppress IndexesWhen you apply built-in or scalar user-defined functions to the indexed columns the optimizer cannot use the index. However, if you rewrite these queries so that the index keys are not involved in the function, then the optimizer uses the index as expected. For example, consider the following two queries: SELECT * FROM authors WHERE au_lname = 'green' and au_fname = 'marjorie' SELECT * FROM authors WHERE au_lname + ' ' + au_fname = 'green marjorie'
Similarly, SELECT * FROM employee WHERE job_lvl - 34 > 20 query will scan the employees table, whereas SELECT * FROM employee WHERE job_lvl > 54 will scan the index defined on the job_lvl column. As another example SELECT * FROM sales WHERE ord_date > '1/1/94' will scan the index defined on the ord_date column, whereas SELECT * FROM sales WHERE DATEPART(yy, ord_date) > 1993 query, which returns the same rows, will scan the clustered index. So the trick with the queries that use functions and calculations is to rewrite them in a manner that allows the optimizer to use an index. Using Parameters & VariablesValues assigned to the variables might alter the decision of query optimizer as to whether it should use an index. If the variable value necessitates retrieval of the majority of table rows, it might be more efficient to scan the entire table, rather than scanning the index. If only a few rows need to be retrieved, then an index scan will be a better choice. The catch is that query optimizer does not know about the variable values in advance and has to make decisions at the parse and compilation phase. Sometimes, this results in an inefficient use of indexes. On the other hand, if you rewrite the same script as a stored procedure and pass the values as parameters, the optimizer knows about the parameters ahead of compile time and can make correct decisions. For example we could write the same query using parameter and variable solutions as follows: -- local variable solution: DECLARE @au_id varchar(11) SET @au_id = '213%' SELECT * FROM titleauthor WHERE au_id like @au_id
CREATE PROCEDURE get_titleauthor @au_id VARCHAR(11) AS SELECT * FROM titleauthor WHERE au_id like @au_id GO DECLARE @au_id VARCHAR(6) SET @au_id = '213%' EXEC get_titleauthor @au_id
Using Temporary TablesIf you are using temporary tables within your stored procedures and joining these tables to regular tables or other temporary tables, predicting the number of affected rows is difficult. Optimizer will usually choose the best query execution plan there is, however, it cannot create indexes for you. Therefore, you might have to do some extra work of examining the execution time and identifying the bottlenecks. If you find that your temp tables contain a large number of rows, it might help to create a clustered index on the temp tables. When you drop the temporary tables the indexes are automatically dropped as well. Loop OptimizationOptimizing loops is fairly straightforward: you should put all the operations that do not change outside the loop. This reduces the amount of repetitive work and makes loops faster. For example, consider the following query: SET NOCOUNT ON DECLARE @i INT, @m INT SELECT @i = 1, @m = 1 WHILE @i < = 10 BEGIN WHILE @m < = 10 BEGIN DECLARE @msg VARCHAR(20), @header VARCHAR(20) SELECT @header = '@i * @m = ' SELECT @msg = CAST((@i * @m) AS VARCHAR(3)) SELECT @header + @msg SET @m = @m + 1 END SET @i = @i + 1 SET @m = 1 END
SET NOCOUNT ON DECLARE @i INT, @m INT, @header VARCHAR(20), @msg VARCHAR(20) SELECT @i = 1, @m = 1, @header = '@i * @m = ' WHILE @i < = 10 BEGIN WHILE @m < = 10 BEGIN SELECT @msg = CAST((@i * @m) AS VARCHAR(3)) SELECT @header + @msg SET @m = @m + 1 END SET @i = @i + 1 SET @m = 1 END
---------------------------------------- @i * @m = 1 ---------------------------------------- @i * @m = 2 ---------------------------------------- @i * @m = 3 ---------------------------------------- @i * @m = 4 ---------------------------------------- @i * @m = 5 ---------------------------------------- @i * @m = 6 ---------------------------------------- @i * @m = 100
Get Rid of Cursors when PossibleTransact-SQL cursors tend to be rather slow. This is due to the fact that T-SQL is a set based language and it is not optimized for processing one row at a time. In general, if you can write a query with or without a cursor you should opt for the latter. If you are doing a repetitive operation and you have a background in some other procedural language, it is often tempting to use a cursor. For the efficiency sake you should avoid the temptation and try to solve the task at hand without cursors. The following cursor prints the names of authors for the titles that have been sold: DECLARE @au_lname VARCHAR(40), @au_fname VARCHAR(40), @title VARCHAR(80) DECLARE authors_cursor CURSOR FOR SELECT au_lname, au_fname, title FROM authors a, titleauthor ta, sales s, titles t WHERE a.au_id = ta.au_id AND ta.title_id = s.title_id AND t.title_id = s.title_id OPEN authors_cursor FETCH NEXT FROM authors_cursor INTO @au_lname, @au_fname, @title WHILE @@FETCH_STATUS = 0 BEGIN SELECT @title + ' is written by ' + @au_fname + ' ' + @au_lname FETCH NEXT FROM authors_cursor INTO @au_lname, @au_fname, @title END CLOSE authors_cursor DEALLOCATE authors_cursor
SELECT title + ' is written by ' + au_fname + ' ' + au_lname FROM authors a, titles t, titleauthor ta, sales s WHERE a.au_id = ta.au_id AND s.title_id = t.title_id AND ta.title_id = s.title_id
In Place UpdatesSQL Server versions 7.0 and 2000 have two ways of updating data. In many cases SQL Server can flip the bytes in a row without changing the row's location - this is referred to as update in-place. In some other instances SQL Server has to perform an update as a DELETE operation followed by an INSERT. Unlike previous releases, SQL Server 2000 no longer uses deferred updates; those were done by marking the transaction log in the first step and actually modifying the value in the second. Still, it is important to understand when updates can happen in place, since two operations (DELETE followed by an INSERT) are more expensive than one. SQL Server maintains indexes dynamically. Therefore if a clustered index key changes, SQL Server will have to update pointers to the clustered key in non-clustered indexes. If a table does not have a clustered index, then the pointer in non-clustered indexes identifies the physical location of the row. SQL Server stores data sorted according to the order the clustered index was created. If you update the clustered index keys, then the row might have to be moved to a different location. For instance, if you have a clustered index on an ord_date column in the sales table and you change the order data from '1/1/92' to '1/1/01', the row will have to move from the top to the bottom of the table. If this happens all non-clustered indexes need to be updated. So, you should try to create clustered indexes on rarely changing columns. You won't always be able to do so. Another time when a row might have to be moved is when its size changes significantly and it no longer fits in its current location. If a row changes the location and non-clustered indexes need to be updated, the update is implemented as a DELETE followed by an INSERT. Unlike clustered tables, when a row changes position in tables with no clustered indexes the non-clustered indexes don't need to be updated; they still point to the rows original location. Instead, SQL Server stores a pointer in the row's original location to the new location. In such cases, the update can be implemented in-place and will be more efficient. On the other hand, we already mentioned that you should create clustered indexes on every table to use your storage space efficiently. When modifying the non-clustered index keys, if the index is non-unique the updates will happen as DELETES followed by INSERTS. If the index is unique SQL Server tries to do the updates in one shot. In general, an UPDATE cannot be implemented in-place if:
The UPDATES in-place are possible if:
In addition multiple columns in a single row might be updated in-place if these columns are contiguous and there is a unique index covering the UPDATE. The above should not encourage you to examine every UDPATE statement and change your table structure accordingly. This information simply gives you clues about why your queries might be more efficient at times. However, you are more likely to experience UPDATES in-place, and therefore more efficient queries if you follow some or all of these guidelines:
|