Key Lookup Threshold

Gail talks about bookmark lookups…. err.. key lookups in this post. So are they good or bad? Well, like many things in SQL, it depends. The main factor is the number of rows returned. A few rows are fine but the cost rises sharply with larger result sets. There comes a point where the threshold is crossed and a scan is more efficient. This is because a scan leverages sequential IO while a lookup does random IO.

Here are the results of the code at the end of the post.

Rows IO  
Scan 422  
1 row 4  
5 rows 12  
10 rows 22  
25 rows 52  
50 rows 114  
100 rows 217  
250 Rows 526 <--Threshold
500 Rows 1043  
1000 Rows 4141  

 

As you can see, at 250 rows, we have crossed the threshold and it is cheaper do a scan. If you are passing in a literal, the optimizer can detect this and switch to a scan. If it is a stored proc or parameterized SQL, a plan is cached the first time it is run. Problems happen when the result size greatly varies depending on the parameter. There are ways around this all with their pro’s and con’s. Here are some:

  • A covering index.
  • A statement level recompile hint
  • Plan guides and hints

Here is the sample code that can repro these numbers on SQL Server 2008.

CREATE TABLE #temp      
(      
id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,      
c1 CHAR(1000) DEFAULT( 'blah'),      
c2 INT      
)      
INSERT INTO #temp(c2)      
VALUES      
(1)      
GO  
INSERT INTO #temp(c2)      
VALUES      
(5)      
GO 5      
INSERT INTO #temp(c2)      
VALUES      
(10)      
GO 10      
INSERT INTO #temp(c2)      
VALUES      
(25)      
GO 25      
INSERT INTO #temp(c2)      
VALUES      
(50)      
GO 50      
INSERT INTO #temp(c2)      
VALUES      
(100)      
GO 100      
INSERT INTO #temp(c2)      
VALUES      
(250)      
GO 250      
INSERT INTO #temp(c2)      
VALUES      
(500)      
GO 500      
INSERT INTO #temp(c2)      
VALUES      
(1000)      
GO 1000      
INSERT INTO #temp(c2)      
VALUES      
(1000)      
GO 1000      
      
CREATE INDEX ix ON #temp(c2)      
      
--The baseline      
SET STATISTICS io ON      
SELECT *      
FROM #temp WITH (INDEX=1)      
WHERE c2 = 1      
      
--1 row returned      
SELECT *      
FROM #temp      
WHERE c2 = 1      
      
--5 rows returned      
SELECT *      
FROM #temp      
WHERE c2 = 5      
      
--10 rows returned      
SELECT *      
FROM #temp      
WHERE c2 = 10      
      
--25 rows returned      
SELECT *      
FROM #temp      
WHERE c2 = 25      
      
--50 rows returned      
SELECT *      
FROM #temp      
WHERE c2 = 50      
      
--100 rows returned      
SELECT *      
FROM #temp      
WHERE c2 = 100      
      
--250 rows returned      
--Must begin using hints because the optimizer can tell that a scan is better       
SELECT *      
FROM #temp WITH (INDEX=ix, forceseek)      
WHERE c2 = 250      
      
--1000 rows returned      
--Must begin using hints because the optimizer can tell that a scan is better       
SELECT *      
FROM #temp WITH (INDEX=ix, forceseek)      
WHERE c2 = 1000