Indexing for Partitioned Tables

From SQLServerPedia

Jump to: navigation, search

Once you have partitioned your table, you’re ready to tune for performance. As with any table, indexing is a great place to start. If you’re new to partitioning, you probably created all of your indexes on the partitioned scheme, either by design or unintentionally.


Contents

What is a Partitioned Index?

A partitioned index, like a partitioned table, separates data into different physical structures (partitions) under one logical name. Specifically, each partition in a non-clustered index contains its own B-tree structure with a subset of rows, based upon the partitioning scheme. By default, an unpartitioned, non-clustered index has just one partition.


When you create an index on a partitioned table using the basic CREATE command, you are creating the index on the partitioned scheme by default.

CREATE NONCLUSTERED INDEX IX_myIndex
    ON dbo.myTable(myColumn);


In order to create a non-partitioned index on that same table, you must explicitly declare “ON [FileGroup]“, as shown below:

CREATE NONCLUSTERED INDEX IX_myIndex
    ON dbo.myTable(myColumn)
    ON [PRIMARY];


When Should You Use a Partitioned Index?

But should you partition your index? That depends on how you use it. In fact, most environments are best served by a mix of partitioned and non-partitioned indexes.

Two tips:

  • Partitioned indexes perform better when you are aggregating data or scanning partitions.
  • If you need to locate a single, specific record, nothing performs better than a non-partitioned index on that column.

Examples:

Let’s set up some indexes and see how different queries perform.


First, we’ll create a partitioned index and an unpartitioned index:

/* Create a partition function. */
CREATE Partition FUNCTION 
    [test_monthlyDateRange_pf] (DATETIME)
    AS Range RIGHT FOR VALUES
    ('2009-01-01', '2009-01-08', '2009-01-15'
    , '2009-01-22', '2009-01-29');
Go
 
/* Associate the partition function with a partition scheme. */
CREATE Partition Scheme test_monthlyDateRange_ps
    AS Partition test_monthlyDateRange_pf
    All TO ([PRIMARY]);
Go
 
/* Create a partitioned table. */
CREATE TABLE dbo.orders
(
      order_id  INT IDENTITY(1,1)   Not Null
    , orderDate DATETIME            Not Null
    , orderData SMALLDATETIME       Not Null
 
    CONSTRAINT PK_orders PRIMARY KEY CLUSTERED
    (
        order_id
      , orderDate
    )
) ON test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
SET NOCOUNT ON;
 
DECLARE @endDate DATETIME = '2009-01-01';
 
WHILE @endDate < '2009-02-01'
BEGIN
 
    INSERT INTO dbo.orders
    SELECT @endDate, @endDate;
 
    SET @endDate = DATEADD(MINUTE, 1, @endDate);
 
END;
 
SET NOCOUNT OFF;
 
 
/* Let’s create an aligned, partitioned index. */
CREATE NONCLUSTERED INDEX IX_orders_aligned
    ON dbo.orders(order_id)
    ON test_monthlyDateRange_ps(orderDate); 
    /* you don't actually need to declare the last
       line of this unless you want to create the
       index on a different partitioning scheme.   */
 
/* Now let’s create an unpartitioned index. */
CREATE NONCLUSTERED INDEX IX_orders_unpartitioned
    ON dbo.orders(order_id)
    ON [PRIMARY];
Now that we have both a partitioned index and an unpartitioned index, let’s take a look at our sys.partitions table:
/* Let's take a look at our index partitions */
SELECT i.name
    , i.index_id
    , p.partition_number
    , p.ROWS
FROM sys.partitions AS p
Join sys.indexes AS i
    ON p.OBJECT_ID = i.OBJECT_ID 
   And p.index_id = i.index_id
WHERE p.OBJECT_ID = OBJECT_ID('orders')
ORDER BY i.index_id, p.partition_number;

sys.partitions

sys.partitions


As expected, both of our partitioned indexes, PK_orders and IX_orders_aligned, have six partitions, with a subset of rows on each partition. Our unpartitioned, non-clustered index, IX_orders_unpartitioned, on the other hand, has just one partition containing all of the rows.

Now that we have our environments set up, let’s run through some different queries and see the performance impact of each type of index.


Query 1: Look Up a Specific Record Using a Seek

/* Query 1, specific record lookup, covered */
SELECT order_id, orderDate
FROM dbo.orders WITH (INDEX(IX_orders_aligned))
WHERE order_id = 25000;
 
SELECT order_id, orderDate
FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned))
WHERE order_id = 25000;

Query 1

Query 1


The unpartitioned index performs significantly better when given a specific record to look up.


Query 2: Look Up a Specific Record Using a Scan

Now let’s try the same query, but utilizing a scan instead of a seek:

/* Query 2, specific record look-up, uncovered */
SELECT order_id, orderDate, orderData
FROM dbo.orders WITH (INDEX(IX_orders_aligned))
WHERE order_id = 30000;
 
SELECT order_id, orderDate, orderData
FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned))
WHERE order_id = 30000;

Query2

Query 2


Again we see that the non-partitioned index performs better looking up a single record. This can lead to some pretty dramatic performance implications.


Query 3: Aggregation

So when would we want to use a partitioned index? Two instances immediately pop to mind. First, partition switching can only be performed when all indexes on a table are aligned. Second, partitioned indexes perform better when large data sets are being manipulated.


To see this in action, let’s try some simple aggregation:

/* Query 3, aggregation */
SELECT CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) AS 'order_date'
    , COUNT(*)
FROM dbo.orders WITH (INDEX(IX_orders_aligned))
WHERE orderDate Between '2009-01-01' And '2009-01-07 23:59'
GROUP BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) 
ORDER BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME);

 
SELECT CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) AS 'order_date'
    , COUNT(*)
FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned))
WHERE orderDate Between '2009-01-01' And '2009-01-07 23:59'
GROUP BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) 
ORDER BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME);

Query3

Query 3


As you can see, partitioned indexes perform better when aggregating data. This is just a simple example, but the results can be even more dramatic in a large production environment. This is one of the reasons why partitioned tables and indexes are especially beneficial in data warehouses.


Summary

Now you have a general idea of what a partitioned index is and when to use a partitioned index and when to use a non-partitioned index. Ultimately, your indexing needs will depend largely on the application and how the data is used.


When in doubt, test, test, test!


To recap, you should:

  • Specify “ON [FileGroup]“ to create an unpartitioned index on a partitioned table.
  • Consider using non-partitioned indexes for looking up single records.
  • Use partitioned indexes for multiple records and data aggregations.
  • Remember that to enable partition switching, all indexes on the table must be aligned.


Author Credits

Michelle Ufford

This wiki article was adapted from a blog post by Michelle Ufford.

Michelle is a SQL Developer DBA for GoDaddy.com, where she works with high-volume, mission-critical databases. She has over a decade of experience in a variety of technical roles and has worked with SQL Server for the last 5 years. She enjoys performance tuning and maintains an active SQL Server blog.

Her online presences include:

Related Reading

For more information about partitioning, check out these articles:

More SQLServerPedia Wiki Articles on SQL Server Partitioning

When To Use SQL Server Partitioning

How to Implement Partitioned Tables