Reblog: May 3 to May 9

May 10th, 2013 by Jason Strate

BookWelcome to this Friday’s reblog summary post.  The aim of these posts is to bring some old posts that newer readers may not have seen back to the forefront.  As with many technology blogs, just because a post is old, doesn’t mean it doesn’t still have value.

Some posts from years gone by that you may have missed are:

Do you have something from years gone by that was posted during this week?  If so, leave a comment below and we’ll give some new life to good knowledge.

What the VLF?

May 10th, 2013 by Mike Hillwig

When I interviewed for my current position, the hiring manager asked me if I knew anything about VLFs. My response was “Very large filesystem?” Clearly I missed that question. She was talking about SQL Server’s Virtual Log Files.

Until that day, I had never heard of VLFs, and to this day, a lot of DBAs don’t know what they are. But before I get too far into this, let me state that I’m hardly the expert on this. Kimberly Tripp from SQL Skills  has written several really good articles on transaction log throughput where she talks about VLFs.

VLFs are a construct within the SQL Server transaction log file that breaks the transaction log file into chunks. Those are the segements that get marked for being in use or ready to be reused.

The bottom line is that if you have lots of really small VLFs in your log file, you probably have a performance problem and you may not even know it. The best way to avoid this is to avoid the default settings on a new database for the transaction log file size and growth. If you start with a 1 MB transaction log file that grows by 10%, you’re going to have lots of little VLFs before you know what hit you.

My suggestion for a good practice is to start with a larger transaction log file and grow in larger increments. In my environment, for our datamart environments, we start with an 8 GB transaction log file and grow them in 8 GB increments. For smaller databases with smaller throughput, we set a minimum of a 256 MB transaction log file with a growth of 256 MB.

In this session, I talk about how this can be a Goldilocks problem. You might have too many or too few. Or they might be too big or too small. The real way to know is to baseline and test for your environment. The one thing I would suggest is that you don’t want to grow your transaction log file by more then 8 GB at a time. This will lead to 16 VLFs created per growth. And that would mean VLFs greater than 512 MB.

The real value in this session is the demo. The first script creates a database and starts stuffing it with data. It’s absolutely not a model of good design. In fact, it’s far from it. I have multiple columns that contain GUIDs. Worse yet, I have those columns indexed. The reason I’m doing this is because I want to create a ton of transaction log volume. Kick this thing off and let it run. It’ll grow your transaction log to about 12 GB.

The second script will show the VLFs in your database. This script executes

DBCC LOGINFO WITH TABLERESULTS

Notice that as the first script is running, the number of VLFs will continue to grow on each execution.

vlf1

There are some cool details on this output, including the size of each VLF and the LSN of when it was created. Lets do a little analysis on it. What I’m doing is pumping that data into a temp tble and then grouping on the LSN. I can then see how much the log file was grown each time and the number of VLFs it created. What you see can be quite staggering.

CREATE TABLE #VLFS
(recoveryunit int, --remove this column for SQL 2005/2008 environments
fileid int,
filesize bigint,
startoffset bigint,
fseqno bigint,
status int,
parity int,
createlsn varchar(1000))

use miketest
go
INSERT INTO #VLFS EXEC('DBCC LOGINFO WITH TABLERESULTS')

select createlsn, sum(cast(filesize as float))/1024/1024 [size], count(startoffset) [count]
from #vlfs
where createlsn <> '0'
group by createlsn
order by 3, 2, 1

drop table #vlfs
go
use master
go

vlf2

Look at all of those growths. And they’re tiny. That can’t be good. You can see that we’re we’re growing our transaction log file by less than 1 MB several times.  But look at what happens closer to the end of the output. We’re growing our transaction log file in much larger increments. And the bigger the growth, the more VLFs.

vlf3

Here is what I’ve been able to prove. If your growth is less than 512 KB,  you create one VLF. from 512 KB to 768 KB, you create two VLFs. 768 KB to 1 MB creates three VLFs. 1 MB to 64 MB creates four VLFs. Until you reach 1 GB, it’s eight VLFs. Anything over 1 GB is 16 VLFs.

If you poke around your environment, you probably have databases that look just like this. The question is How do I fix it?  The answer is simple. It depends. If you’re not seeing a noticable performance impact, you may want to leave it alone. This is where If it isn’t broken, don’t fix it could apply. If you are facing a performance problem from too many VLFs, I’d suggest doing this  during a quiet period for your database.

First, note how big your transaction log file is. Because I know how my files grow, I’m going to put it back to the size it was before.  Then I use DBCC SHRINKFILE to shrink the transaction log file and then grow it back to the original size, making sure I only grow it in 8 GB increments.

The real lesson here is something that I talk about in other posts: Defy the Defaults! If you’re creating databases with 1 MB transaction files and allowing them to grow in 10% increments, you’re probably doing it wrong.

What the VLF?

May 10th, 2013 by Mike Hillwig

When I interviewed for my current position, the hiring manager asked me if I knew anything about VLFs. My response was “Very large filesystem?” Clearly I missed that question. She was talking about SQL Server’s Virtual Log Files.

Until that day, I had never heard of VLFs, and to this day, a lot of DBAs don’t know what they are. But before I get too far into this, let me state that I’m hardly the expert on this. Kimberly Tripp from SQL Skills  has written several really good articles on transaction log throughput where she talks about VLFs.

VLFs are a construct within the SQL Server transaction log file that breaks the transaction log file into chunks. Those are the segements that get marked for being in use or ready to be reused.

The bottom line is that if you have lots of really small VLFs in your log file, you probably have a performance problem and you may not even know it. The best way to avoid this is to avoid the default settings on a new database for the transaction log file size and growth. If you start with a 1 MB transaction log file that grows by 10%, you’re going to have lots of little VLFs before you know what hit you.

My suggestion for a good practice is to start with a larger transaction log file and grow in larger increments. In my environment, for our datamart environments, we start with an 8 GB transaction log file and grow them in 8 GB increments. For smaller databases with smaller throughput, we set a minimum of a 256 MB transaction log file with a growth of 256 MB.

In this session, I talk about how this can be a Goldilocks problem. You might have too many or too few. Or they might be too big or too small. The real way to know is to baseline and test for your environment. The one thing I would suggest is that you don’t want to grow your transaction log file by more then 8 GB at a time. This will lead to 16 VLFs created per growth. And that would mean VLFs greater than 512 MB.

The real value in this session is the demo. The first script creates a database and starts stuffing it with data. It’s absolutely not a model of good design. In fact, it’s far from it. I have multiple columns that contain GUIDs. Worse yet, I have those columns indexed. The reason I’m doing this is because I want to create a ton of transaction log volume. Kick this thing off and let it run. It’ll grow your transaction log to about 12 GB.

The second script will show the VLFs in your database. This script executes

DBCC LOGINFO WITH TABLERESULTS

Notice that as the first script is running, the number of VLFs will continue to grow on each execution.

vlf1

There are some cool details on this output, including the size of each VLF and the LSN of when it was created. Lets do a little analysis on it. What I’m doing is pumping that data into a temp tble and then grouping on the LSN. I can then see how much the log file was grown each time and the number of VLFs it created. What you see can be quite staggering.

CREATE TABLE #VLFS
(recoveryunit int, --remove this column for SQL 2005/2008 environments
fileid int,
filesize bigint,
startoffset bigint,
fseqno bigint,
status int,
parity int,
createlsn varchar(1000))

use miketest
go
INSERT INTO #VLFS EXEC('DBCC LOGINFO WITH TABLERESULTS')

select createlsn, sum(cast(filesize as float))/1024/1024 [size], count(startoffset) [count]
from #vlfs
where createlsn <> '0'
group by createlsn
order by 3, 2, 1

drop table #vlfs
go
use master
go

vlf2

Look at all of those growths. And they’re tiny. That can’t be good. You can see that we’re we’re growing our transaction log file by less than 1 MB several times.  But look at what happens closer to the end of the output. We’re growing our transaction log file in much larger increments. And the bigger the growth, the more VLFs.

vlf3

Here is what I’ve been able to prove. If your growth is less than 512 KB,  you create one VLF. from 512 KB to 768 KB, you create two VLFs. 768 KB to 1 MB creates three VLFs. 1 MB to 64 MB creates four VLFs. Until you reach 1 GB, it’s eight VLFs. Anything over 1 GB is 16 VLFs.

If you poke around your environment, you probably have databases that look just like this. The question is How do I fix it?  The answer is simple. It depends. If you’re not seeing a noticable performance impact, you may want to leave it alone. This is where If it isn’t broken, don’t fix it could apply. If you are facing a performance problem from too many VLFs, I’d suggest doing this  during a quiet period for your database.

First, note how big your transaction log file is. Because I know how my files grow, I’m going to put it back to the size it was before.  Then I use DBCC SHRINKFILE to shrink the transaction log file and then grow it back to the original size, making sure I only grow it in 8 GB increments.

The real lesson here is something that I talk about in other posts: Defy the Defaults! If you’re creating databases with 1 MB transaction files and allowing them to grow in 10% increments, you’re probably doing it wrong.

QuickHitter 03 : Put some Windows updates in my basket

May 9th, 2013 by Jason Crider

Maybe this is just news to me, but I didn’t know this resource was out there. In comparing the updates between two nodes on a cluster, I found out about the “Microsoft Update Catalog” where you can add updates to your basket and download them all at once. I also found the cluster validation wizard lists the same updates as missing.

Not sure if this will be beneficial in all cases, but when trying to match up servers with updates it might help someone. Don’t forget that updates can be superseded and try to trust your WSUS server.

Data cleansing in SSIS, DQS, and MDS

May 9th, 2013 by James Serra

Data cleansing can be done via SSIS as well as Data Quality Services (DQS) and Master Data Services (MDS).  The lines are a bit blurred when talking about data cleansing using SSIS, DQS and MDS.  In what product should data be cleaned?  To give examples: having to convert a Unicode string to a non-Unicode string can be done in SSIS using the data conversion transformation; converting the word “one” to the number “1” would use the derived column transformation (which has a sophisticated expression language) in SSIS.  Cleaning state codes by comparing them to a knowledge base/reference dataset containing valid state codes can be done with the lookup transformation in SSIS; removing duplicates from a table (i.e. a customer that is entered twice with a different spelling) can be done in SSIS using the fuzzy lookup transformation.  These SSIS transformations would need to be used with other SSIS data flow components to fully complete the data cleaning solution. 

But all those tasks can be done much easier using DQS, which also has a lot more features available.  DQS enables you to build a knowledge base and use it to perform a variety of critical data quality tasks, including correction, enrichment, standardization, and de-duplication of your data (see Data Quality Services Books Online).  Or you can perform matching manually within the Master Data Services Add-in for Excel which leverages the matching functionality in Data Quality Services.  And there is a DQS cleaning transformation that you can use in SSIS (see Overview of the DQS Cleansing Transform).

MDS has limited data cleansing via business rules which can apply default values and change values.  The best approach is to use DQS to clean the data from the source and then copy that data into MDS.

More info:

Data Conversion in SSIS

Cleanse and Match Master Data by Using EIM – Microsoft

sp_Blitz™ v22: Output Results to a Table, Better Comments, and an App

May 9th, 2013 by Brent Ozar

Our free server troubleshooting tool, sp_Blitz™, just learned some new tricks. If you call it like this:

EXEC dbo.sp_Blitz
@OutputDatabaseName = ‘DBAtools’,
@OutputSchemaName = ‘dbo’,
@OutputTableName = ‘BlitzResults’

It will create the table DBAtools.dbo.BlitzResults (if it doesn’t already exist) and insert the results. The table includes columns for the server name and the check date, so you can query this table to see what’s changed since the last time you ran sp_Blitz™. Plus, there’s more:

Added new checks for disabled CPU schedulers (due to licensing or affinity masking), extended stored procedures, disabled remote access to the DAC, databases in unusual states like suspect or emergency, logins with CONTROL SERVER permission.

Easier readability – whenever anyone’s asked how to add checks, I’ve always just said, “Read the code.” Last week, I read the code with fresh eyes and realized it could use some cleanup and comments. I used Red Gate SQL Prompt to clean up the formatting, which is better but could still use some work. The code is nearing 4,000 lines, so I moved the old changes out of the proc and into a new online changelog. I put in some time to explain what the proc is doing, and that should help people reading it from the first time. I’ve also started an sp_Blitz™ Documentation page where I’ll be explaining some of the more advanced uses.

New downloadable sp_Blitz™ app for Windows - over and over, we heard from people that they wanted a quick way to generate a print-friendly copy of the sp_Blitz™ results. Jeremiah worked with a developer and built our first app:

sp_Blitz™ Windows App

sp_Blitz™ Windows App

The app creates sp_Blitz™ in TempDB on the server of your choice, executes it, shows you the results, and can output the results to PDF – complete with links to the explanations for each result.

We’ve got lots of ideas for the app, but we wanted to get it out there first and let you help drive it. If there’s something you’d like to see added or changed, leave a comment here or contact us. The whole point of this thing is to make your life easier. I wish this tool would have been available back when I was a production DBA struggling to figure out what was going on in my environment, so every little improvement helps. Hopefully this can save you from the stress I had back then. Get it from the download page and enjoy!

...
Bored at work? We've got a wall of YouTube videos.

sp_Blitz™ v22: Output Results to a Table, Better Comments, and an App

May 9th, 2013 by Brent Ozar

Our free server troubleshooting tool, sp_Blitz™, just learned some new tricks. If you call it like this:

EXEC dbo.sp_Blitz
@OutputDatabaseName = ‘DBAtools’,
@OutputSchemaName = ‘dbo’,
@OutputTableName = ‘BlitzResults’

It will create the table DBAtools.dbo.BlitzResults (if it doesn’t already exist) and insert the results. The table includes columns for the server name and the check date, so you can query this table to see what’s changed since the last time you ran sp_Blitz™. Plus, there’s more:

Added new checks for disabled CPU schedulers (due to licensing or affinity masking), extended stored procedures, disabled remote access to the DAC, databases in unusual states like suspect or emergency, logins with CONTROL SERVER permission.

Easier readability – whenever anyone’s asked how to add checks, I’ve always just said, “Read the code.” Last week, I read the code with fresh eyes and realized it could use some cleanup and comments. I used Red Gate SQL Prompt to clean up the formatting, which is better but could still use some work. The code is nearing 4,000 lines, so I moved the old changes out of the proc and into a new online changelog. I put in some time to explain what the proc is doing, and that should help people reading it from the first time. I’ve also started an sp_Blitz™ Documentation page where I’ll be explaining some of the more advanced uses.

New downloadable sp_Blitz™ app for Windows - over and over, we heard from people that they wanted a quick way to generate a print-friendly copy of the sp_Blitz™ results. Jeremiah worked with a developer and built our first app:

sp_Blitz™ Windows App

sp_Blitz™ Windows App

The app creates sp_Blitz™ in TempDB on the server of your choice, executes it, shows you the results, and can output the results to PDF – complete with links to the explanations for each result.

We’ve got lots of ideas for the app, but we wanted to get it out there first and let you help drive it. If there’s something you’d like to see added or changed, leave a comment here or contact us. The whole point of this thing is to make your life easier. I wish this tool would have been available back when I was a production DBA struggling to figure out what was going on in my environment, so every little improvement helps. Hopefully this can save you from the stress I had back then. Get it from the download page and enjoy!

...
Bored at work? We've got a wall of YouTube videos.

SQLSaturday #209 – Rochester

May 8th, 2013 by admin
Welcome to the Roc!

Welcome to the Roc!

Just a quick note that I’ll be presenting this weekend, Saturday May 11th, at SQLSaturday in Rochester, NY. Exciting to return to Central New York as I graduated right down the road at Syracuse University. Go Orange!​If you’re in the area come on down, should be a great day of training. Check out the day’s schedule (link), I’ll be presenting two sessions this weekend:

 

You can follow the event along on Twitter using the #sqlsatroc hashtag as well. See you this weekend in the Roc!

 

Share

How To: Migrate Encrypted Procedures To Windows Azure SQL Database (WASD)

May 8th, 2013 by Thomas LaRock

Here’s the easy answer: you don’t.

WASD does not support the WITH ENCRYPTION option for objects such as stored procedures, user defined functions, triggers, or views. So the idea of migrating an object that was compiled with that option is simply not feasible.

If you have a system with encrypted objects and you try migrating to WASD you will get “Validation of the schema model for data package failed. Error SQL71564: The element…cannot be deployed as the script body is encrypted.”

encrypt_error

To unencrypt this proc you have three options: you can use a 3rd party tool, find various scripts through a quick internet search, or take the source code and issue an ALTER command without the encryption option.

I prefer the latter option. Just take the source code and remove the WITH ENCRYPTION statement.

I would also like to remind you that WITH ENCRYPTION does not encrypt anything. It obfuscates the object code, nothing more. It really should be named WITH OBFUSCATION, but that probably wouldn’t be clear for most end users. If you have a requirement to encrypt your source code you should go back and ask for more information as to why it is necessary. Chances are the WITH ENCRYPTION option is not meeting the original requirements.

After you update the object you’ll be ready to complete your migration to WASD. At the very least, you’ll get past that error message!

How To: Migrate Encrypted Procedures To Windows Azure SQL Database (WASD) is a post from: SQLRockstar - Thomas LaRock

The post How To: Migrate Encrypted Procedures To Windows Azure SQL Database (WASD) appeared first on SQLRockstar - Thomas LaRock.

Indexing Wide Keys in SQL Server

May 8th, 2013 by Kendra Little

Key length matters in SQL Server indexes.

It’s a best practice to keep your index keys as narrow as possible, and SQL Server enforces a maximum key length of 900 bytes on most “normal” clustered and nonclustered indexes.

But what happens if you want to optimize the lookup of a wide column? You’re not necessarily out of luck, you may just have to get a bit creative.

What If I Need to do an Equality Search on a Wide Column?

Let’s say I have a simple table. I have a narrow key on my clustered index and then I have a pretty wide variable length column. I need the wide column to be unicode, which makes it even wider, since unicode data types take up more room.

Here’s our sample table with a few rows (just pretend it has a lot more):

CREATE TABLE dbo.LookupValues (
	i int identity,
	bigval nvarchar(2000) default (REPLICATE('d',700)),
	constraint pk_LookupValues_i primary key (i)
);
GO

--Insert rows with the default values
begin tran
	declare @i smallint = 0;
	while @i < 10000
	begin
		insert dbo.LookupValues default values;
		set @i=@i+1;
	end
commit
GO

--Insert a few smaller values
insert dbo.LookupValues (bigval) VALUES ('big');
insert dbo.LookupValues (bigval) VALUES ('bunny');
insert dbo.LookupValues (bigval) VALUES ('bunny bunny');
GO

Let’s say we write to this table rarely, but query it often. When this query runs, I want to make it as fast as possible:

SELECT i
from dbo.LookupValues
where bigval = N'bunny';

Right now, this query has to scan every row in the clustered index (the whole table) to find instances where bigval=N’bunny’. That’s not ideal, and as the table grows it’ll become worse and worse, burning more IO and CPU, and taking longer over time.

There’s usually an easy way to make a query like this fast: just create a nonclustered index on the bigval column. But when I try, it doesn’t work because of restrictions on key size.

--Make my query faster!
CREATE NONCLUSTERED INDEX ix_LookupValues_bigval on dbo.LookupValues (bigval);
GO

SQL Says:

Warning! The maximum key length is 900 bytes. The index 'ix_LookupValues_bigval' has maximum length of 4000 bytes. For some combination of large values, the insert/update operation will fail.
Msg 1946, Level 16, State 3, Line 1
Operation failed. The index entry of length 1400 bytes for the index 'ix_LookupValues_bigval' exceeds the maximum length of 900 bytes.
The statement has been terminated.

Terminated. Yeah. I can’t just index this to make my query fast.

Options for Indexing Wide Keys

So what’s a performance tuner to do?

My first thought when I hit this problem was that I might have to use a fulltext index. A fulltext index can work here– it lets you index large columns, but it would be kind of a bummer to have to do it. Fulltext indexes have extra overhead and are really designed for different things than doing a simple equality search, so it would be like using a jackhammer because you can’t find a mallet.

My partner Jeremiah Peschka came up with a quick and clever solution using an indexed computed column. You can work all sorts of cool magic with computed columns in SQL Server– the trick is just to remember them!

Here’s how it works: you add a computed column to the table that’s the hash of the large value. You then index the computed column and modify your query to take advantage of it.

In this example we use SHA_512 for the hashing algorithm. This will give an output of 64 bytes– well within our limits for index key sizes.

ALTER TABLE dbo.LookupValues ADD bigvalhash AS HASHBYTES('SHA2_512', bigval) PERSISTED;
GO

CREATE NONCLUSTERED INDEX ix_LookupValues_bigvalhash on dbo.LookupValues (bigvalhash) INCLUDE (bigval);
GO

Now, to get the query work, we need to change it a bit:

SELECT i
from dbo.LookupValues
where bigvalhash = HASHBYTES('SHA2_512', N'bunny')
	and bigval = N'bunny';
GO

This revised approach gives me a targeted index seek and limits my logical reads. Voila!

The Fine Print on This Solution

There are a few things to note:

  • HASHBYTES results are dependent upon datatype. If my query used HASHBYTES(‘SHA2_512′, ‘bunny’), it would not find any rows, because the column is hashed unicode values and I provided a hashed non-unicode value.
  • I do still include “bigval= N’bunny’” in my query. In theory there shouldn’t be collisions with SHA-512, but it doesn’t add much expense to the query and in my example I deemed it “worth it” to me. You might make a different choice.

Sometimes Old Tools Do the Trick

What I love most about this solution is that it’s creative, but it’s not really weird, when you think about it. It uses standard features that have been in SQL Server for a long time to create a way to do something that seems like the product wouldn’t support– and that’s really cool.

...
Bored at work? We've got a wall of YouTube videos.