Archive for the ‘SQL Server 2008’ Category

Grouping and Rounding Dates

Wednesday, March 10th, 2010

DSC01618Ever need to group dates and round them to intervals?  I had this exact need for a report on wait stats I was working on for an upcoming post.  I wanted the flexibility to be able to group a list of dates by either minute, hour, date, etc.  And then also round those dates within those intervals.  For example, round a list of dates to every 15 minutes, every 3 hours, or every 2 days.  At first thought this sounded like something that might be a little tricky to accomplish.  But as I’ll show in the examples below, this is quite easy.

Start With The First of the Month

To resolve this issue, I started by using the logic that I typically when I need to take a date and change it to the first day of a month or quarter.  The logic I use for that utilizes the DATEADD and DATEDIFF functions.  I’ll skip going into how those work since Books Online covers that pretty well. 

To find the start of a month or quarter, these functions can be used together to be compare a date and time against the value 0.  Casting 0 as datetime returns the value 1900-01-01 00:00:00.000.  Using the two functions together, use DATEDIFF to determine how many months or quarters have occurred since 1900-01-01.  And then use DATEADD to add those months or quarters to the date 1900-01-01.  Doing this will provide you with the the first day in either the month or the quarter.

The examples below can be used to demonstrate this:

SELECT DATEADD(M, DATEDIFF(M, 0, '20100310'), 0)

GO

SELECT DATEADD(Q, DATEDIFF(Q, 0, '20100310'), 0)

GO

Returning the following results:

DateAddDateDiff

Round to the Desired Interval

Now that’s the easy part which leaves the next part, where dates and times need to be rounded not to the 1st of the month put to periods that are determined at execution time.  As I mentioned above, I need to change the logic above such that I can get dates to round to every 1 week or 2 days or 3 hours.

On the surface, this sounded a lot harder than it turned out to be.  Because if you use the % to find the remainder of division, you can easy round any date or time to any logical grouping of dates or times. 

Suppose you need to round a datetime to the nearest 4 hours.  Find the distance in hours between 1900-01-01 and the datetime in question.  Take that number and find the modulo using your rounding interval.  In this case that value is 4.  Then subtract the remainder from the distance.  And use the new distance value in the DATEADD function from 1900-01-01.  Hopefully, that didn’t lose too many people.

Rather than explain it another way, take a look at the following T-SQL statements where the datetime is rounded down to the nearest 2, 3, and 4 hour intervals.

--Round down to nearest 2 hours.

SELECT DATEADD(HH, DATEDIFF(HH, 0, '2010-03-10 03:00:00.000')-(DATEDIFF(HH, 0, '2010-03-10 03:00:00.000')%2), 0)

GO

--Round down to nearest 3 hours.

SELECT DATEADD(HH, DATEDIFF(HH, 0, '2010-03-10 03:00:00.000')-(DATEDIFF(HH, 0, '2010-03-10 03:00:00.000')%3), 0)

GO

--Round down to nearest 4 hours.

SELECT DATEADD(HH, DATEDIFF(HH, 0, '2010-03-10 03:00:00.000')-(DATEDIFF(HH, 0, '2010-03-10 03:00:00.000')%4), 0)

GO

These statements produce the following results:

DateAddDateDiff2 

As you’ll hopefully see, this is a very simple technique.  Imagine replacing the hour, day, or month date parts with a variable and then the interval to round the dates with a variable.  Doing that and the expression can be used in a stored procedure or report to group any set of dates along any set of date or time groupings.

Hope you find this as useful as I did.  And in the next couple days, I hope to have something up that uses this technique to make it even easier to use in your own code by providing fully working examples.

March PASSMN Meeting (03/16/2010)

Tuesday, March 9th, 2010

It’s that time of the month again.  Time to network and learn about SQL Server with your peers.

Location: 8300 Norman Center Drive, 9th Floor, Bloomington, MN  55437
Date: March 16, 2010
Time: 2:30 – 5:00

Please click here for meeting details and to RSVP.

Registration has changed with the move from our previous hosting site and you will be required to log into www.sqlpass.org in order to register for our events. If you have any issues with this, please contact support@mnssug.org.

 

ABCs of CTEs
Jason Strate

Common Table Expressions (CTE) aren’t as common as their name implies. CTEs are often seen as a secret part of the dark art of recursion. This session will explore CTEs to show how they can be extremely useful in improving performance and legibility of T-SQL code. And, of course, we will look at their use in returning recursive data.

Database Design Fundamentals
Louis Davidson

In this session I will give an overview of how to design a database, including the common normal forms and why they should matter to you if you are creating or modifying SQL Server databases. Data should be easy to work with in SQL Server if the database has been organized as close as possible to the standards of normalization that have proven for many years. Many common T-SQL programming “difficulties” are the result of struggling against the way data should be structured and can be avoided by applying the basic normalization techniques and are obvious things that you find yourself struggling with time and again (i.e. using the SUBSTRING function in a WHERE clause meaning you can’t use an index efficiently).

How can I tell if a SQL Server system is affected by Forwarded records?

Monday, March 8th, 2010

Update: This post is participating in the 4th installment of the T-SQL Tuesday hosted this time by Mike Walsh.

One of the side affects of having Heaps in a system is the possibility of generating forwarded records. When reading data from a heap, Forwarded records (if they are present in a heap) can generate extra, random and potentially physical IOs when compared to a table with clustered index and this may hurt the performance if one has to read lot of data from that heap. Note that having a heap doesn’t mean it is guaranteed to have forwarded records. Enough has been said and blogged about the details of forwarded records already and in this post I am NOT going to rehash the details of forwarded records and how it affects the performance but will go over an example and a script that will help to find out if forwarded records were used to fetch the data in the heap. Armed with that information one can see the true cost of these extra IOs generated by forwarded records. Looking at sys.dm_db_index_physical_stats shows if forwarded records are present in the heap but it doesn’t necessarily tell if the workload is indeed affected by them, how much and how many times. In the script I am going to show at the end of this post, one can see how many times forwarded records have been used to retrieve the data and understand the total cost of *extra IOs* and the performance implications of forwarded records.

I will be borrowing Kalen’s example for this test and here is the script for the test data.

USE 

AdventureWorks

GO

IF OBJECT_ID ('Details', 'U') IS NOT NULL

DROP TABLE Details;

GO
CREATE TABLE dbo.Details

([SalesOrderID] [int] NOT NULL,

[SalesOrderDetailID] [int] NOT NULL,

[CarrierTrackingNumber] [nvarchar](25) NULL,

[OrderQty] [smallint] NOT NULL,

[ProductID] [int] NOT NULL,

[SpecialOfferID] [int] NOT NULL,

[UnitPrice] [money] NOT NULL,

[UnitPriceDiscount] [money] NOT NULL);

GO

INSERT INTO dbo.Details

SELECT [SalesOrderID]

,[SalesOrderDetailID]

,[CarrierTrackingNumber]

,[OrderQty]

,[ProductID]

,[SpecialOfferID]

,[UnitPrice]

,[UnitPriceDiscount]

FROM [AdventureWorks].[Sales].[SalesOrderDetail];

GO
SELECT OBJECT_NAME(OBJECT_ID) AS Object

, page_count

, avg_page_space_used_in_percent

, forwarded_record_count

FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), OBJECT_ID ('dbo.Details'),  NULL, NULL, 'DETAILED');

GO

SET STATISTICS IO ON

GO

-- A table scan takes as many reads as there are pages, i.e. 856 in this case

SELECT * FROM dbo.Details;

GO

SET STATISTICS IO OFF

GO
/*

Table 'Details'. Scan count 1, logical reads 856, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.*/
-- Now add a new fixed width column and note that this is a

-- metadata only change

-- The data pages are not modified

-- There is no change in the fullness of the pages
ALTER TABLE dbo.Details ADD notes CHAR(100);

GO
SELECT OBJECT_NAME(OBJECT_ID) AS Object

, page_count

, avg_page_space_used_in_percent

, forwarded_record_count

FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), OBJECT_ID ('dbo.Details'),  NULL, NULL, 'DETAILED');

GO

-- The data pages are not affected until we run the following update.

-- Every row on every page will get an additional 100 bytes in the notes field

--  added to it

UPDATE dbo.Details

SET notes = 'notes';

GO
-- note there are LOTS of forwarded records now (84408),

-- and many more pages the table (2709)

SELECT OBJECT_NAME(OBJECT_ID) AS Object

, page_count

, avg_page_space_used_in_percent

, forwarded_record_count

FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'), OBJECT_ID ('dbo.Details'),  NULL, NULL, 'DETAILED');

GO

SET STATISTICS IO ON

GO

-- The number of reads is not just the number of pages as we would expect for

-- a scan of a heap, but is equal to the

-- number of pages PLUS the number of forwarded records:

--  84408 + 2709 = 87117

-- During a scan, the forwarded pointers are followed for EACH row, and then

-- SQL Server goes back to the original position to continue the scan
SELECT * FROM dbo.Details;

GO

SET STATISTICS IO OFF

GO
/*Table 'Details'. Scan count 1, logical reads 87117, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.*/

So far we have seen the existence of forwarded records using sys.dm_db_index_physical_stats and the number of logical reads go up using SET STATISTICS IO ON. The next query will tell you how many times forwarded records have been traversed to satisfy the queries against the heap.

SELECT

DB_NAME(database_id) AS database_name

, OBJECT_NAME(OBJECT_ID) AS OBJECT_NAME

, forwarded_fetch_count

FROM sys.dm_db_index_operational_stats (DB_ID('AdventureWorks'), OBJECT_ID('dbo.Details'), NULL, NULL)

In our example, our heap had 84408 forwarded records and so far the entire heap was traversed 2 times 84408 * 2 = 168816
/*
database_name  object_name forwarded_fetch_count
AdventureWorks Details     168816
*/

Before we take off, lets take the forwarded records for a spin and see if the DMV sys.dm_db_index_operational_stats displays the true count of how many times these forwarded records are used.

SELECT * FROM dbo.Details;

GO 10
 
SELECT

DB_NAME(database_id) AS database_name

, OBJECT_NAME(OBJECT_ID) AS OBJECT_NAME

, forwarded_fetch_count

FROM sys.dm_db_index_operational_stats (DB_ID('AdventureWorks'), OBJECT_ID('dbo.Details'), NULL, NULL)

So far our heap was traversed 2 times and we spinned the table another 10 times above with a total of 12 * 84408 forwarded records = 1012896.
/*
database_name  object_name forwarded_fetch_count
AdventureWorks Details     1012896
*/

If forwarded records are a new concept for you then I would urge you to look at these excellent posts by Kalen, Paul and Simon on this topic. This post is participating in the 4th installment of the T-SQL Tuesday hosted by Mike.
whats-worse-than-a-table-scan
fragmentation-and-forwarded-records-in-a-heap
Forwarding-and-forwarded-records-and-the-back-pointer-size
INSERT and UPDATE loading practice – The impact of forwarding pointers.

Digg This  Reddit This  Stumble Now!  Buzz This  Vote on DZone  Share on Facebook  Bookmark this on Delicious  Kick It on DotNetKicks.com  Shout it  Share on LinkedIn  Bookmark this on Technorati  Post on Twitter  Google Buzz (aka. Google Reader)  

Transaction Log (block) CHECKSUM

Monday, March 8th, 2010

Update: This post is participating in the 4th installment of the T-SQL Tuesday hosted this time by Mike Walsh

.SQL Server 2005 came few years ago with lots of optimizations and in this post I will add some details on one of the lesser known IO enhancement that wasn’t present in previous versions.

CHECKSUM was introduced as a PAGE_VERIFY option starting from SQL Server 2005 and the two widely known options are PAGE CHECKSUM and BACKUP CHECKSUM. The other kind and the lesser known option is log block CHECKSUM.

What is a log block?

The smallest unit of data storage in SQL Server is a page which is 8 kb in size. SQL Server manages space as Extents, which are made of eight contiguous pages. When it comes to Transaction Log, the fundamental unit of IO is a log block. The size of the log block is *NOT* fixed and it varies from 512 bytes to 60 kb. Virtual log files known as VLFs are made up of log blocks. And the Transaction Log is made up many VLFs, which are variable in size. For more details on the log block, refer Paul’s post listed below.

How do I turn on log block CHECKSUM?

PAGE CHECKSUMs are enabled using the ALTER DATABASE PAGE_VERIFY option. Log block CHECKSUMs are enabled automatically when the CHECKSUM option is set for PAGE_VERIFY. Note that turning on TORN_PAGE_DETECTION has no effect on the log block and is NOT applicable.

How does log block CHECKSUM works?

When SQL Server commits a transaction and issues a write for the transaction log, just before the log block is written to disk the CHECKSUM over the log block is computed and stamped on the header of the log block very similar in nature to the data page. When the transaction log is read during the backup and restore (and other activities listed in Paul’s post below), the CHECKSUM value is re-computed and checked against the previously written value in the header.

How does log block CHECKSUM help?

If the CHECKSUM value of the log block doesn’t match with the existing CHECKSUM value then IO subsystem has compromised your data. One has to check and validate the IO Subsystem extensively to find the root cause of the problem.

Does Transact SQL and PAGE/BACKUP/Log Block CHECKSUM use the same algorithm?

No. Transact SQL CHECKSUM algorithm is different from the PAGE/BACKUP/Log Block CHECKSUM, which uses simple a yet efficient algorithm compared to T-SQL CHECKSUM.

I learned about this very recently and thought its good information to share with the community. This post is participating in the 4th installment of the T-SQL Tuesday hosted by Mike.
Long list of references and recommended reading:
Digg This  Reddit This  Stumble Now!  Buzz This  Vote on DZone  Share on Facebook  Bookmark this on Delicious  Kick It on DotNetKicks.com  Shout it  Share on LinkedIn  Bookmark this on Technorati  Post on Twitter  Google Buzz (aka. Google Reader)  

SSIS and LINQ

Saturday, March 6th, 2010

Introduction
Last August I looked to add the LINQ library to a SQL Server Integration Services (SSIS) Script Task. Why? Two reasons: First, because if you’ve looked any LINQ, the features of it, the language, and most importantly why it exists, you know that it is the bomb for allowing you to do many things when working with data. These features are well advertised - it works with all types of data (XML, SQL, Excel, objects, etc.) and accords to the developer the opportunity to focus on the data rather than learning a new API or language,  allows one to remove the “plumbing” out of whatever you may be connecting to, has built in type-checking, and others. The Linq syntax is very simple, straightforward, and concise, and I thought that this would be perfect to use in SSIS. After all, don’t we connect to a myriad of data sources in our various connection managers? Isn't SSIS all about connecting to "disparate" data sources?  Wouldn’t it be nice to have only one line worth of code to connect to SQL Server in an SSIS script component?  Second, because “it’s there” and I wanted to do some investigation.  So this blog is about SSIS and LINQ.

And, if you happened to catch a previous post on "Why I Blog", this blog fits the category of research and experimentation. I’m just geeking here -this is neither a tutorial, a how-to, or a lesson.

Get your groove on

Set up the data source first, so we don’t have to go back and do this later.  Add this to SQL Server:

 

USE [Test]
GO

/****** Object:  View [dbo].[vData]    Script Date: 03/06/2010 08:42:16 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE VIEW [dbo].[vData]
as
SELECT *
FROM
 (SELECT cast(x.ID as varchar (20)) as ID, x.Guid_ID
       FROM 
              (
              SELECT  TOP (1000) ROW_NUMBER() OVER (ORDER BY z.ID)  AS ID,
              CAST(NEWID() as varchar (100)) as Guid_ID
              FROM sys.syscolumns x, sys.syscolumns y, sys.syscolumns z     
              ORDER BY 1                 
              ) x
      ) y


GO

 

Create a table where we will land our result set:

USE [Test]
GO

/****** Object:  Table [dbo].[LinqTest]    Script Date: 03/06/2010 09:17:34 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[LinqTest](
    [ID] [varchar](20) NULL,
    [GUID_ID] [varchar](100) NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO


Let’s open up the SQL Server 2008 Business Intelligence Development Studio (BIDS) and create a new SSIS project.  Add a script task to the Control Flow, open it, right click on references, and add System.Data.Linq.  (If yours is grayed out, then your project isn't set up with the .Net 3.5 framework. Go ahead and do Project>>Properties, and change the Target Framework.  Now you should be able to add it).  See Figure 1.

Figure1
Figure 1  BIDS with Linq library

Right click on the project and select Linq to SQL Classes, and add this to the project. Mine says DataClasses2.dbml, and added automatically are two other files underneath it, DataClasses2.dbml.layout and DataClasses2.designer.cs.  This will allow us to map the entity, and let .Net create the classes, properties, and methods for our entity.  This is the Object Relational Designer ORM tool.  Now all we need to do is find an entity (table or view).  Here, I run into a problem though, because when I try to add an entity from SQL Server (what I want to map to) I get in my Database Explorer only Data Connections; there is no "Server Explorer", and it gives me the option to only add the following:

Figure2 
Figure 2  No SQL Server, no Server Explorer in VSTA. Ugh.

This is a problem.  I don't want to add a database file to SQL Server Express, akin to when you are working with ASPNET database, ASPSTATE, or one of the other databases used in web development. No, I want to connect to SQL Server and access the view that we just created.  I'm going to have to revisit this, because at first go I don't see how I can do this. There is on option to access my SQL Servers in the script task. So, the next best thing is to fire up Visual Studio, create a class, and make a custom component that exposes Linq; then I can reference this library in SSIS.  Let's try this route. 

Create a new class, and do the same thing that we just did – add Linq, and add the Linq classes.  Open up the DataClasses1.dbml file, open Server Explorer, find you view that we just created, and drag the view into the workspace.  It should look like this:

Figure3

Figure 3.  The Object Relational Designer

Now, create a new class and add the following code.

using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Data;
using System.Collections.Generic;
using System.Reflection;
using System.Linq;
using System.Linq.Expressions;
using System.ComponentModel;
using System;

namespace Console
{
    public class DataRepository
    {
       public static IEnumerable<vData> GetData()
        {
            DataClasses1DataContext db = new DataClasses1DataContext();
            IEnumerable<vData> query = from c in db.vDatas
                                        select c;
            return query;
        }
        
    }    
}

 

This is the class that creates the method that we’ll reference to our view from above, vData, that implements IEnumerable interface. This interface is very important in SSIS because we can consume any data that uses this. It’s a good idea to learn this interface by the way, because it is used throughout SSIS behind the scenes.  I created my class name hoping to use a partial class, but it didn’t work. Something else to investigate. The reason that I do this is because I don’t want to change any of the code that the designer created. Why?  Right on. If I do, then when I go change the entity it spins up all new classes, and overwrites my code. Notice the warnings not to change the code in Figure 4.

Notice the simplicity of the Linq query above.  Here I’m just fetching all of the columns in the view; available is a bunch of other TSQL-comparable operations, so check that out in the Linq to SQL documentation for more on particulars with the query language.  Finally, we return a vData object.

At this point, compile the project, create a strong key, and GAC this baby.  If you don’t know how to do this, go see my ever-so-popular blog entitled Create a Custom Assembly in SSIS in another post.  (As a shortcut, by the way, simply drag the .dll into C:\Windows\Assembly after creating your strong key name – no need to do the command line stuff).

Figure3.3

Figure 4.  The Data Classes Designer

 

I thought this was an SSIS blog entry

Ok we’re finally here. Let’s open up BIDS now and go back to that project that we created. Create a variable now in SSIS of type object, and then add the following code to your SSIS script task.  Now, open up your script task and add a reference to the component that you just created. My name is “Console.dll” (a bad name, by the way I know):

Figure4

Figure 5. Addition of the custom assembly

 

Add the following code to your script task. Note: do not copy the namespace here, because yours is different than the below.  Pluck out the using clauses and the remainder of the code and you should be ok.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.SqlClient;
using Console;
using System.Collections.Generic;

namespace ST_5ca62ff84d714027842f55eb01cbe963.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks
.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion


        public void Main()
        {
            IEnumerable<vData> query = DataRepository.GetData();
            Dts.Variables["RecordsetVar"].Value = query;

            SqlConnection _conn = DefineConnection();
            string _sql;

            foreach (Console.vData item in Dts.Variables["RecordsetVar"].Value 
                 as System.Collections.Generic.IEnumerable<Console.vData>)
            {
                _sql = "INSERT INTO LinqTest VALUES ('" + item.ID + "'," + "'" 
                    + item.Guid_ID + "')";
                ExecSQL(_conn, _sql);
            }
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        private static void ExecSQL(SqlConnection _conn, string _sql)
        {
            SqlCommand cmd = new SqlCommand(_sql, _conn);
            cmd.CommandTimeout = 0;
            cmd.ExecuteNonQuery();
        }

        private SqlConnection DefineConnection()
        {
            SqlConnection _conn = (SqlConnection)Dts.Connections["OLE"]
                 .AcquireConnection(null);
            _conn = new SqlConnection(_conn.ConnectionString);
            _conn.Open();
            return _conn;
        }
    }
}

 

Run the project and then view the result in SSMS and view the results.

Figure5

Figure 6. Result set in SQL Server

Conclusion

Some observations here. First, one of our goals was to avoid the “plumbing” of making connections to data that is one of the features of Linq, but we’ve just broken this rule by adding all of the above to consume this data source. Look at what we had to do in order to get to our data - I tried to add my variable of type object to all sorts of places (Foreach container for example) but had a difficult time getting to my dataset, and for now this was the only way that I could figure out how to get to my data, besides simply throwing it to a message box or opening up a TextWriter and sending the results to a text file. This is an area for more investigation as well. Second, the speed of this operation is slow, much slower than simply connecting to the data via the SSIS stock connections, which I anticipated. Third, because I had to create a custom assembly, this workaround indicates to me that Microsoft did not want SSIS developers consuming data from an Object Relational Designer.

Linq is very cool – with regards to using it in SSIS, this excursion indicates that it is not an option to go to in this context.  I will continue with experimentation on the subject, so look for a future post down the road if you are curious about this implementation.

 

Thanks for reading,

Lee Everest, M.S.

 

Flickr Tags:

face

Performance impact of using BACKUP CHECKSUM in SQL Server 2005/2008

Wednesday, March 3rd, 2010

From SQL Server 2005 onwards, CHECKSUM is available as a PAGE_VERIFY option at the database level. All databases created in SQL Server 2005 & above will have CHECKSUM enabled by default. Even tempdb in SQL Server 2008 has this option enabled. It is recommended by MSFT to change the PAGE_VERIFY option of databases to CHECKSUM when migrating from earlier versions to SQL Server 2005 and above.

Why should I enable PAGE CHECKSUM?

CHECKSUM has a robust algorithm compared to TORN_PAGE_DETECTION option which was the default in previous versions. Database pages can get corrupted outside the scope of SQL Server and CHECKSUMS are a great way to detect corruption caused by IO subsystems. When a dirty database page is written to disk, SQL Server computes the CHECKSUM of the contents of the page and stamps the value in the page header just before the page is written to disk. When that page is read back into memory, SQL Server re-computes the CHECKSUM and if that value doesn’t match with the value stamped on the header then someone else wrote to this page outside the scope of SQL Server. By enabling CHECKSUMs one should be able to isolate the IO subsystem as the root cause of corruption.

The key to un-covering corruption above is, only if the page is read back from the disk. What happens if the page is *NOT* read back for a week or say a month. Corruption will be your house guest and you may not notice it and there is a potential chance that you may end up with loosing data.

Why should I use BACKUP CHECKSUM?

When BACKUPs are taken with the CHECKSUM option, SQL Server will verify and re-compute the CHECKSUM/ TORN_PAGE_DETECTION of all pages and gives you extra protection that PAGE CHECKSUMS can’t handle. This comes very handy if the PAGE_VERIFY is set to CHECKSUM or TORN_PAGE_DETECTION. Even if both the options are turned off then BACKUP CHECKSUM will still compute a combined CHECKSUM of all pages and stamp the value on the backup media. This can be used to re-compute the CHECKSUM before restoring to make sure the backup itself is intact.

Impact of using PAGE CHECKSUM:

Linchi Shea did some testing on this topic a while back and I strongly recommend you to visit his work at Performance impact of enabling page checksum and default trace. As per Linchi and MSFT, the total cpu cost of enabling PAGE CHECKSUM is very small and mostly in the range of 1-2%.

Impact of using BACKUP CHECKSUM:

BOL mentions there is a performance penalty for enabling the BACKUP CHECKSUM and here is a quote from BOL.
“ When creating a backup checksum, a backup operation does not add any checksums to pages. Pages are backed up as they exist in the database, and the pages are unmodified by backup. Due to the overhead verifying and generating backup checksums, using backup checksums poses a potential performance impact. Both the workload and the backup throughput may be affected. Therefore, using backup checksums is optional. When deciding to generate checksums during a backup, carefully monitor the CPU overhead incurred as well as the impact on any concurrent workload on the system. ”

I was curious to measure the impact of the BACKUP CHECKSUM and went ahead with 6 different options listed below with my testing. SQL Server 2005 with Service Pack 2 is used and I measured the performance using SQL Profiler. I also did similar testing on SQL Server 2008 with SP1 with almost identical results on small databases. I wish I had better hardware on SQL Server 2008 instance.

1) Database with PAGE_VERIFY set to NONE & use BACKUP WITH CHECKSUM
2) Database with PAGE_VERIFY set to NONE & use BACKUP WITH OUT CHECKSUM
3) Database with PAGE_VERIFY set to TORN_PAGE_DETECTION & use BACKUP WITH CHECKSUM
4) Database with PAGE_VERIFY set to TORN_PAGE_DETECTION & use BACKUP WITH OUT CHECKSUM
5) Database with PAGE_VERIFY set to CHECKSUM & use BACKUP WITH CHECKSUM
6) Database with PAGE_VERIFY set to CHECKSUM & use BACKUP WITH OUT CHECKSUM

Environment:

SQL Server 2005 SP2 64 bit Developer Edition.
4 Processors, 8 GB RAM, RAID 5

Testing:
Each data point has been run 3 times and the average of them is considered. CPUTime and Duration are measured for every test using Profiler. A new database is created with the right PAGE_VERIFY option and data is populated for every test. These tests are done for databases with 4, 8, 16, 32 GB.

The results were startling to me and I didn’t expected the results.

BACKUP WITH CHECKSUM vs BACKUP when PAGE_VERIFY is set to NONE (least recommended)

PAGE VERIFY NONE

BACKUP WITH CHECKSUM vs BACKUP when PAGE_VERIFY is set to TORN_PAGE_DETECTION

PAGE VERIFY TORN PAGE DETECTION

BACKUP WITH CHECKSUM vs BACKUP when PAGE_VERIFY is set to CHECKSUM

PAGE_VERIFY CHECKSUM

BACKUP CHECKSUM when PAGE_VERIFY is set to NONE, TORN_PAGE_DETECTION & CHECKSUM

PAGE VERIFY OPTIONS

From the above results, the least expensive method is when PAGE_VERIFY is set to CHECKSUM and the backups are taken with CHECKSUM option and the most expensive is when TORN_PAGE_DETECTION is set for PAGE_VERIFY.

So, Do I recommend adding CHECKSUM option while taking the backups? Definitely if you have cpu cycles to spare then I would highly recommend adding this option [CHECKSUM] to the BACKUP or if you don’t have enough confidence on your IO subsystem. In this post I tried to show the actual cost of adding this option so that one can take an informed decision about the added cost. Also, note that BACKUP CHECKSUM is *NOT* a replacement to run the full-fledged DBCC CHECKDB which does a lot more checking to identify corruption.

Acknowledgements:
Performance Impact of Enabling Page Checksum and Default Trace
Example 2000/2005 corrupt databases and some more info on backup, restore, page checksums and IO errors
How do I know if my earlier backups used CHECKSUM?
Checksum in SQL2005

Digg This  Reddit This  Stumble Now!  Buzz This  Vote on DZone  Share on Facebook  Bookmark this on Delicious  Kick It on DotNetKicks.com  Shout it  Share on LinkedIn  Bookmark this on Technorati  Post on Twitter  Google Buzz (aka. Google Reader)  

Performance impact of using BACKUP CHECKSUM in SQL Server 2005/2008

Monday, March 1st, 2010

From SQL Server 2005 onwards, CHECKSUM is available as a PAGE_VERIFY option at the database level. All databases created in SQL Server 2005 & above will have CHECKSUM enabled by default. Even tempdb in SQL Server 2008 has this option enabled. It is recommended by MSFT to change the PAGE_VERIFY option of databases to CHECKSUM when migrating from earlier versions to SQL Server 2005 and above.

Why should I enable PAGE CHECKSUM?

CHECKSUM has a robust algorithm compared to TORN_PAGE_DETECTION option which was the default in previous versions. Database pages can get corrupted outside the scope of SQL Server and CHECKSUMS are a great way to detect corruption caused by IO subsystems. When a dirty database page is written to disk, SQL Server computes the CHECKSUM of the contents of the page and stamps the value in the page header just before the page is written to disk. When that page is read back into memory, SQL Server re-computes the CHECKSUM and if that value doesn’t match with the value stamped on the header then someone else wrote to this page outside the scope of SQL Server. By enabling CHECKSUMs one should be able to isolate the IO subsystem as the root cause of corruption.

The key to un-covering corruption above is, only if the page is read back from the disk. What happens if the page is *NOT* read back for a week or say a month. Corruption will be your house guest and you may not notice it and there is a potential chance that you may end up with loosing data.

Why should I use BACKUP CHECKSUM?

When BACKUPs are taken with the CHECKSUM option, SQL Server will verify and re-compute the CHECKSUM/ TORN_PAGE_DETECTION of all pages and gives you extra protection that PAGE CHECKSUMS can’t handle. This comes very handy if the PAGE_VERIFY is set to CHECKSUM or TORN_PAGE_DETECTION. Even if both the options are turned off then BACKUP CHECKSUM will still compute a combined CHECKSUM of all pages and stamp the value on the backup media. This can be used to re-compute the CHECKSUM before restoring to make sure the backup itself is intact.

Impact of using PAGE CHECKSUM:

Linchi Shea did some testing on this topic a while back and I strongly recommend you to visit his work at Performance impact of enabling page checksum and default trace. As per Linchi and MSFT, the total cpu cost of enabling PAGE CHECKSUM is very small and mostly in the range of 1-2%.

Impact of using BACKUP CHECKSUM:

BOL mentions there is a performance penalty for enabling the BACKUP CHECKSUM and here is a quote from BOL.
“ When creating a backup checksum, a backup operation does not add any checksums to pages. Pages are backed up as they exist in the database, and the pages are unmodified by backup. Due to the overhead verifying and generating backup checksums, using backup checksums poses a potential performance impact. Both the workload and the backup throughput may be affected. Therefore, using backup checksums is optional. When deciding to generate checksums during a backup, carefully monitor the CPU overhead incurred as well as the impact on any concurrent workload on the system. ”

I was curious to measure the impact of the BACKUP CHECKSUM and went ahead with 6 different options listed below with my testing. SQL Server 2005 with Service Pack 2 is used and I measured the performance using SQL Profiler. I also did similar testing on SQL Server 2008 with SP1 with almost identical results on small databases. I wish I had better hardware on SQL Server 2008 instance.

1) Database with PAGE_VERIFY set to NONE & use BACKUP WITH CHECKSUM
2) Database with PAGE_VERIFY set to NONE & use BACKUP WITH OUT CHECKSUM
3) Database with PAGE_VERIFY set to TORN_PAGE_DETECTION & use BACKUP WITH CHECKSUM
4) Database with PAGE_VERIFY set to TORN_PAGE_DETECTION & use BACKUP WITH OUT CHECKSUM
5) Database with PAGE_VERIFY set to CHECKSUM & use BACKUP WITH CHECKSUM
6) Database with PAGE_VERIFY set to CHECKSUM & use BACKUP WITH OUT CHECKSUM

Environment:

SQL Server 2005 SP2 64 bit Developer Edition.
4 Processors, 8 GB RAM, RAID 5

Testing:
Each data point has been run 3 times and the average of them is considered. CPUTime and Duration are measured for every test using Profiler. A new database is created with the right PAGE_VERIFY option and data is populated for every test. These tests are done for databases with 4, 8, 16, 32 GB.

The results were startling to me and I didn’t expected the results.

BACKUP WITH CHECKSUM vs BACKUP when PAGE_VERIFY is set to NONE (least recommended)

PAGE VERIFY NONE

BACKUP WITH CHECKSUM vs BACKUP when PAGE_VERIFY is set to TORN_PAGE_DETECTION

PAGE VERIFY TORN PAGE DETECTION

BACKUP WITH CHECKSUM vs BACKUP when PAGE_VERIFY is set to CHECKSUM

PAGE_VERIFY CHECKSUM

BACKUP CHECKSUM when PAGE_VERIFY is set to NONE, TORN_PAGE_DETECTION & CHECKSUM

PAGE VERIFY OPTIONS

From the above results, the least expensive method is when PAGE_VERIFY is set to CHECKSUM and the backups are taken with CHECKSUM option and the most expensive is when TORN_PAGE_DETECTION is set for PAGE_VERIFY.

So, Do I recommend adding CHECKSUM option while taking the backups? Definitely if you have cpu cycles to spare then I would highly recommend adding this option [CHECKSUM] to the BACKUP or if you don’t have enough confidence on your IO subsystem. In this post I tried to show the actual cost of adding this option so that one can take an informed decision about the added cost. Also, note that BACKUP CHECKSUM is *NOT* a replacement to run the full-fledged DBCC CHECKDB which does a lot more checking to identify corruption.

Acknowledgements:
Performance Impact of Enabling Page Checksum and Default Trace
Example 2000/2005 corrupt databases and some more info on backup, restore, page checksums and IO errors
How do I know if my earlier backups used CHECKSUM?
Checksum in SQL2005

Digg This  Reddit This  Stumble Now!  Buzz This  Vote on DZone  Share on Facebook  Bookmark this on Delicious  Kick It on DotNetKicks.com  Shout it  Share on LinkedIn  Bookmark this on Technorati  Post on Twitter  Google Buzz (aka. Google Reader)  

Relative paths in SSIS to make things simpler

Wednesday, February 24th, 2010

Moving packages just got easy.

I worked on a large project last year at a global company that sells household items, and one thing that I liked about the project was the fact that they used ‘relative’ paths for their SSIS packages when referencing resources, such as text files, executables, xml files, etc. If you have used these before, this is old-hat and probably not terribly interesting; if not, it’s pretty slick!

Notice in the below test package that I have a Flat File Connection – FF_Source1 – and it’s wired-up to a file as seen in the below picture:

clip_image002

 

This is great, but what if I want to deploy this to the server?  Chances are, you’re probably not going to land the data file in the C:\My Documents\Visual Studio 2008\Projects\Examples\Data_Files folder. Rather, it might be in some other folder on another drive, maybe E:\SSIS\Projects\Data_Files for instance. And that’s only for the dev box. What if it has to be pushed to four or five servers, and maybe they’re all set up differently. Maybe Dev has an E:\ drive, but Validation or Test has an L:\ drive where the packages are stored. Sounds like a mess, eh?

clip_image004

 

Thank goodness for relative paths. Rather than the entire UNC path name, let’s just put the following in there - add this to your data source:  ..\Data_Files\Source1.txt

clip_image006

 

Niiiiiiiice!  How does this work? It’s easier explained by looking first at the document out on the Microsoft MSDN site, and search for “Paths”. Here are the excerpts from the link:

  • Use a period as a directory component in a path to represent the current directory, for example ".\temp.txt". For more information, see Paths.
  • Use two consecutive periods (..) as a directory component in a path to represent the parent of the current directory, for example "..\temp.txt". For more information, see Paths.
  • A path is also said to be relative if it contains "double-dots"; that is, two periods together in one component of the path. This special specifier is used to denote the directory above the current directory, otherwise known as the "parent directory". Examples of this format are as follows:
  • "..\tmp.txt" specifies a file named tmp.txt located in the parent of the current directory.
  • "..\..\tmp.txt" specifies a file that is two directories above the current directory.
  • "..\tempdir\tmp.txt" specifies a file named tmp.txt located in a directory named tempdir that is a peer directory to the current directory.

 

So, .\ is the current path, ..\ is the parent of the current directory, and ..\..\ is two directories up from where your package sits...relative to it's location.  This requires that you create a set of folders that you will push onto all of your boxes (at the company where I am consulting now they have five (5) environments) so that when you move a package from one box to another, it knows where to look, regardless of the letter drive. Sweet!

Check it out and let me know how it works for you.

Lee

-------------------------

 

 

It's not that I'm lazy; it's just that I don't care

clip_image007

 

Refs: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#paths

 

Visit:  Lee   facebook

Cutover 30+ GB databases in 60 seconds with SQL Server 2005/2008

Tuesday, February 23rd, 2010
You kid brother just messed up a database migration.  You now have sixty seconds to migrate a 30 GB database or you kid brother is…

Okay I gave it a shot :-)

If you are familiar with moving databases across servers most likely you are using one of the following methods.  Today I am going to write about what goes on during these methods with a focus on speed and provide another option that might work for you.

  • Detach/Attach
  • Copy Database Wizard
  • Backup/Restore
  • 3rd Party Tools

Detach/Attach

The detach/attach method allows you to detach and reattach databases. Therefore, in order to move a database from one server to another you have to detach the database, physically move the data and log files to the new server and then attach the database.  An unforeseen bottleneck with this strategy might be network latency. While this method is very straight forward and simple how long can the database be offline while you are moving the database files (mdf, ndf, ldf) during the detach/attach process?

Copy Database Wizard

The copy database wizard is a tool that leverages SQL Server Integration Services (SSIS) to copy a database from one server to another.  The account that the package uses at runtime has to have sysadmin role on both the source and destination instances.  You have two options during the copy process.  The first method is detach/attach see the paragraph above for feedback on using detach/attach.  The second method includes using SMO to script database objects.  This method keeps the source database online during the copy but is much slower than the detach/attach.  Therefore, I have no feedback on this method.  Have you used it? If so please add your comments.

Backup/Restore

Assuming you are using the Full recovery mode for your database the backup restore method for moving a database involves taking a full backup and a transactional log backup where you backup the tail of the log, and leave the database in restoring state.  This will take the source database offline keep the data in sync.  With restore time being a factor you could restore the full backup and do incremental transaction logs up to the point of cutting over assuming assuming no full backups occurred on your database while you started applying transactional backups.

And the winner is…

Drum roll please……..  And the winner is Database Mirroring in High Availability Mode as it can allow you to cutover failover to migrate huge databases in less than sixty seconds.  Okay I might be cheating, it will take much more than sixty seconds to configure. The important fact is that it will seam like it only took seconds to the end users. If your application uses .NET 2.0+ framework and you configure client side redirect the end users might not experience an outage at all.

To setup database mirroring it is highly recommended to make sure the principal and mirror database engine services are using service accounts, they also need access the the ports used by SQL endpoints, and more….  Check out the MSDN site for a great guide for Database Mirroring

Added my first Microsoft Connect Ticket for SQL Server

Monday, February 22nd, 2010

Seems that I’m having SQL Server 2008 installs fail on computers that have Visual Studio 2008 with SP1. We’ll see if anyone can reproduce it.

https://connect.microsoft.com/SQLServer/feedback/details/535434/system-configuration-checker-doesnt-catch-that-visual-studio-needs-sp1-applied-before-install