When Did That File Get So Big?

June 30th, 2009 by Jason Strate

omg1 I haven’t blogged much since getting back from vacation.  It seems like a good restart after the two week hiatus would be to address an issue that has arisen countless times.  And while addressing it, I’ll put it into the DBADiagnostics database that I’ve blogged about a few times before.

Sudden File Growth

In a number of SQL Server environments that I’ve worked in there either isn’t a method for monitoring file size or the process is more of a rubber stamp morning check.  The DBA gets in and looks at the size of the files and if there isn’t a log of changes to update nothing is really done and changes aren’t investigated.

In most cases this won’t be a problem.  For instance, if you’ve gone out and pre-grown your data and log files to appropriate sizes then nothing will grow and there is nothing to see here.

In cases were the unexpected happens, though, database files can and will grow.  And in the worst of these cases, which only occur at night or on vacations, the files will grow to a point where there is no longer any disk space available.  And if your annual review is next week, this will happen to the log file and force your database offline.

Knowing Is Half the Battle

In these types of situations, I like to recall the last couple minutes of the GI Joe epos ides that I watched as a kid.  They almost always ended with the quote, “Knowing is half the battle!  Yo, Joe!”.  If I know that a file growth has happened then I can do something about it.

Now the best case is to know, monitor, and plan for upcoming file growths.  This is what might be called a best practice and if you’re not doing it I’d really recommend putting a process in place.  But we need to be prepared for the unexpected.  And even in the most best, rock solid environment, I’d recommend a file growth monitoring process.

The Solution

This procedure for the DBADiagnostics database differs slightly from previous alerts.  Instead of sending out an e-mail that aggregates all of the changes an error is raised for each file change in question.  This method was selected because this allows the process to be tool agnostic.

If the client has a log file monitoring process, then the error can be picked up that way.  Or if all of the alerts are generated from the SQL Server instances, then a SQL Agent Alert can be created to notify people about the issue.

USE [DBADiagnostics]
GO

IF NOT EXISTS(SELECT * FROM sys.schemas WHERE name = 'Monitor')
    EXEC('CREATE SCHEMA [Monitor] AUTHORIZATION [dbo]')
GO

IF OBJECT_ID('Monitor.FileGrowth') IS NULL
BEGIN
    CREATE TABLE [Monitor].[FileGrowth]
        (
        DatabaseName sysname NOT NULL,
        DatabaseFileName nvarchar(260) NOT NULL,
        FileSizeMB decimal(18,3) NOT NULL
        CONSTRAINT PK_MonitorFileGrowth PRIMARY KEY(DatabaseName, DatabaseFileName)
        )
END

IF OBJECT_ID('Alert.FileGrowth') IS NOT NULL
    DROP PROCEDURE [Alert].[FileGrowth]
GO

/*================================================================================
Procedure:    [Alert].[FileGrowth] 
Author:        Jason Strate
Date:        2007-11-14

Synopsis:
    Procedures monitors the size of each file for all user databases and tempdb.  In
    the event of file growrh for any of the monitored databases, a error is raised
    that can be captured through either tools monitoring SQL Servers log files or 
    through SQL Agent Alerts.
         

================================================================================
Revision History:
Date:        By            Description
----------------------------------------------------------------------------------
================================================================================*/
CREATE PROCEDURE [Alert].[FileGrowth] 

As

SET NOCOUNT ON

-- Validate that necessary error message exists
IF NOT EXISTS (SELECT * FROM master..sysmessages WHERE error = 70000)
    EXEC master..sp_addmessage  @msgnum = 70000, @severity = 12, @with_log = 'true', 
        @msgtext = 'File growth has occured in the database %s on the file %s.  The size has increased from %d to %d.  If the file growth was unplanned, please review for unexpected issues.', @replace = 'REPLACE'

-- Insert files for each database that are not currently being watched
INSERT INTO [Monitor].[FileGrowth] (DatabaseName, DatabaseFileName, FileSizeMB)
SELECT d.name, mf.name, CAST(mf.size as float)*8/1024
FROM sys.databases d
    INNER JOIN sys.master_files mf ON d.database_id = mf.database_id
    LEFT OUTER JOIN Monitor.FileGrowth fg ON d.name = fg.DatabaseName AND mf.name = fg.DatabaseFileName
WHERE d.state_desc = 'ONLINE'
AND source_database_id IS NULL
AND d.name NOT IN ('model', 'master', 'msdb')
AND fg.FileSizeMB IS NULL

-- Delete databases that no longer exist
DELETE FROM Monitor.FileGrowth 
FROM Monitor.FileGrowth fg 
    LEFT OUTER JOIN sys.databases d ON fg.DatabaseName = d.name
WHERE d.name IS NULL

DECLARE @FileGrowthDelta table
    (
    DatabaseName sysname
    ,DatabaseFileName nvarchar(260) 
    ,OldFileSizeMB decimal(18,3)
    ,NewFileSizeMB decimal(18,3)
    )

-- Update files that changed size and output delta rows
UPDATE fg
SET FileSizeMB = CAST(CAST(mf.size as float)*8/1024 as decimal(18,3))
OUTPUT INSERTED.DatabaseName, INSERTED.DatabaseFileName, INSERTED.FileSizeMB, DELETED.FileSizeMB
INTO @FileGrowthDelta
FROM Monitor.FileGrowth fg 
    INNER JOIN sys.databases d ON d.name = fg.DatabaseName 
    INNER JOIN sys.master_files mf ON d.database_id = mf.database_id AND mf.name = fg.DatabaseFileName
WHERE d.state_desc = 'ONLINE'
AND source_database_id IS NULL
AND d.name NOT IN ('model', 'master', 'msdb')
AND FileSizeMB <> CAST(CAST(mf.size as float)*8/1024 as decimal(18,3))

--Declare variables section for triggering error event
DECLARE @DatabaseName sysname
    ,@DatabaseFileName nvarchar(260) 
    ,@OldFileSizeMB int
    ,@NewFileSizeMB int

--For each database name in sysdatabases
DECLARE ALTER_FILE_GROWTH_CURSOR CURSOR LOCAL FAST_FORWARD FOR 
    SELECT DatabaseName 
        ,DatabaseFileName 
        ,OldFileSizeMB 
        ,NewFileSizeMB
    FROM @FileGrowthDelta

OPEN ALTER_FILE_GROWTH_CURSOR
FETCH NEXT FROM ALTER_FILE_GROWTH_CURSOR INTO @DatabaseName, @DatabaseFileName, @OldFileSizeMB, @NewFileSizeMB

WHILE @@FETCH_STATUS = 0
BEGIN
    RAISERROR(70000, 0, 1, @DatabaseName, @DatabaseFileName, @OldFileSizeMB, @NewFileSizeMB) WITH LOG
    FETCH NEXT FROM ALTER_FILE_GROWTH_CURSOR INTO @DatabaseName, @DatabaseFileName, @OldFileSizeMB, @NewFileSizeMB    
END

CLOSE ALTER_FILE_GROWTH_CURSOR
DEALLOCATE ALTER_FILE_GROWTH_CURSOR
GO

EXEC [Alert].[FileGrowth]

In the next post, I’ll outline the process that I use to check to see if there are any files that may need be running out of available space.

July Presentations

June 30th, 2009 by Jeremiah Peschka

I know of three presentations coming up in the next month that I thought I’d share with everyone, if only because I’m involved with all three.

The first is Andy Leonard’s presentation on Incremental Loads, Change Data Capture, and SSIS 2008 in scenic Columbus, OH. The event takes place at 6:30PM on July 9th at Battelle for Kids. Lest you think Andy will be in Columbus, he will be presenting via LiveMeeting. This means that you can attend via LiveMeeting!

Presentation the second is Aaron Alton’s presentation on Simple Data Change Tracking in SQL Server. This is a virtual presentation that is happening on July 14th at 2PM Eastern.

Finally, on July 23rd I will be presenting at the Central Ohio .NET Developers Group. The event starts at 6:00 PM and it located at Microsoft building on Polaris Parkway in Columbus, Ohio. My presentation is titled From Tables to Objects: Making Your Database Work With You - it’s all about using database features to enhance the quality of your ORM. There is no LiveMeeting available for this, but I will either be recording via Camtasia or else will record it after the fact.

Google Reader Tutorial Video

June 30th, 2009 by Brent Ozar

Google Reader is an RSS reader tool that helps you stay in touch with more web sites in less time. It consolidates new articles from sites all over the web in one single, easy-to-use interface, and helps you share your favorite articles with your friends. Brent Ozar explains Google Reader this four minute video tutorial.

Get the Flash Player to see the wordTube Media Player.

In the videos, Brent talks about these sites:

  • Google Reader – the online RSS reader tool.
  • Brent’s Shared Google Reader items – if you want to get notified of his favorite SQL Server blog posts.
  • AideRSS PostRank – the best/great/good filtering plugin for Reader that helps you catch up with the most popular blog entries quickly.
  • URLFan – find out when other people mention your blog or link to it.
  • Search.Twitter.com – find out when anyone mentions your name or your site’s name on Twitter.

Subscribing or Downloading the Podcast

If you have an MP3 player or a portable video player and you want to download our videos automatically, you can subscribe to our podcast feeds here:

You can also download this video to watch it later:

  • Digg
  • del.icio.us
  • DZone
  • DotNetKicks
  • Google Bookmarks
  • Ping.fm
  • Reddit
  • StumbleUpon
  • TwitThis

Letter to the Editors & Contributors for June 30th

June 30th, 2009 by Brent Ozar

This week, we have some developments that make me proud to wear the big hat around here.

Comments On Your Stories Belong On Your Site

Brett Epps blogged about the latest round of improvements to SSP.  Most of ‘em focus on making the site load faster, but my favorite one has nothing to do with performance.

As of today, syndicated blog posts don’t have a separate comments area on SQLServerPedia.  If visitors want to comment about your blog entry, they’ll click on a link to go to your own web site.  This isn’t just a switch you can throw in WordPress, either – we couldn’t have done it without Brett’s PHP/MySQL work under the hood.

This keeps all the discussion about a post in a single place, which makes life easier for bloggers.  Readers can interact better with each other regardless of where they found your blog initially.  Plus, if like me, you’ve implemented a Subscribe to Comments plugin for your blog, then readers will get notified whenever anyone else posts a comment too.  Since all comments will be happening in one place, it makes readers more likely to return as new comments are posted in one place.

This also perks up the activity on your blog.  As a blogger, I know it can be frustrating to put a lot of work into an entry and then not get any comments.  If comments ended up on SQLServerPedia, that’s cool, but it’s not as cool as having them end up on your own site.  Then as readers stumble across your site, it’ll look more active.

I don’t want to syndicate your site’s comments over to SQLServerPedia because I want you, the blogger, to have incentives you can use to bring traffic to your own blog.  To make it easier, use FeedBurner FeedFlare to add comment links to the bottom of your posts, because FeedBurner can automatically update the number of comments.  I talk about using FeedBurner FeedFlare in my Tips for Syndicated Bloggers.

Solving Plagiarism Problems

A web site popped up on the radar recently for attempting to plagiarize just about everyone in the database community.  They used RSS feeds to suck the content out of blogs and show it on their own site without any attribution whatsoever.  Several of us bloggers contacted the site and demanded that our content be taken down, and Quest’s legal department sent cease & desist letters to the site.  Eventually, the site’s hosting company intervened after we began sending DMCA violation notices, and the site’s down for the count right now.

For more in-depth coverage of the story, check out my posts on How to Take Action When Your Site is Plagiarized and the followup More Thoughts on Blog Plagiarism.

Is Your Biography Up to Date?

No, we’re not worried about writing your obituary, even though the #celebpocalypse keeps claiming more victims.  Instead, we’re kicking around the idea of including your bio from the SQLServerPedia Editors and Contributors page at the bottom of every blog entry you write.  If there’s anything you’d like to update, take a moment to go edit that page.  Try to resist the temptation to edit the bios of others.

  • Digg
  • del.icio.us
  • DZone
  • DotNetKicks
  • Google Bookmarks
  • Ping.fm
  • Reddit
  • StumbleUpon
  • TwitThis

squeaky wheel… yada, yada

June 29th, 2009 by Stuart Ainsworth

Well, I guess if you complain about something long and hard enough, things happen.  Sometimes, things happen that have nothing to do with your complaining, but the coincidental correlation is so strong that your mind draws associations for it (which sometimes leads to fodder for blog posts).

BTW, sorry for not posting much lately; my personal life has gotten rather complicated lately, so I’ve been avoiding the computer after hours.

Anyway, I’ve been struggling for some time now with Visual Studio for Team Systems: Database Edition, particularly in a scrum environment, and it just seems that I’m not making much headway with it.  It’s just a foreign model of usage, and I’m still struggling with the tool. It’s kind of like learning to drive a car after driving a golf cart; the basic principles are the same, but there’s a heckuva lot more options.

Anyway, Shawn Wildermuth to my rescue; his company, Agilitrain, is hosting a series of workshops on Agile Database Techniques using Visual Studio 2008 Database Edition; I told my boss about it, and even though training money is pretty tight right now, I got approval to go within minutes (that says a lot about my boss, or it may say something about his opinion of me and the need for training).  Anyway, here’s the course description:

Agile Database Techniques

This course provides students with the knowledge and skills to properly manage the SQL Server database development lifecycle in an agile environment. The student will learn how to manage changes to the database structure, ensuring quality through T-SQL unit testing, and how to automate the building and deploying of SQL Server databases. Intended for developers and administrators already comfortable with the concepts of designing, programming, testing, and deploying SQL Server databases, this course focuses on using Visual Studio Team System 2008 Database Edition to maximize productivity while minimizing mistakes.

Price: $1995.00 (USD)
Pricing may be different in locations outside the United States
Language(s): English
Length: 3 Days

Pre-requisites:

Before attending this course, the student should have experience developing and maintaining SQL Server 2000, 2005, or 2008 databases. It would also be beneficial if the student has worked in a team-based software development project and is familiar with their organization’s Software Development Life Cycle.

I’ve also been notified that Nob Hill Software has released their latest version of Randolph (an automatic versioning software for databases) with full support for TFS.  Unfortunately, I haven’t had a chance to install and test it yet (I’ve actually been working on an Analysis Services project; looks like I’m becoming a BI guy), but once I do, I’ll post a review here and at the AtlantaMDF Review site.

The Best Thing I learned at PASS Summit

June 29th, 2009 by Joe Webb

Wow, what a great question!

The Best Thing I Learned at PASS Summit

I vividly remember my first Summit experience. I was sitting with a couple of hundred other people in a session. The speaker was an established industry expert and author. I had a couple of his books on my shelf at home and I was eager to hear him in person. He was doing a great job when someone asked a question. It was obvious he didn’t know the answer – that’s okay no one knows everything. He handled it well. Just then another SQL legend in the audience stood up and gave the answer. As that person finished, a Product Manager for SQL Server stood up and provided even more insight for the future direction of SQL Server.

I remember sliding down in my seat thinking “There’s no way I’ll ever speak at a conference like this. So, I’ve got to find another way to volunteer for PASS.”

I did. I became a chapter manager, then ran for and was elected to the Board of Directors, and eventually became the Executive Vice President of Finance for PASS. During my 9 years of volunteerism (6 years on the Board), PASS afforded me many opportunities to do things that I would not have had a chance to do otherwise as a self employed database consultant, far too many to list! And for that, I’m very thankful.

So what did I learn?  Some call it the Law of the Harvest. The more you put in, the more you tend to get back. The more you give, the more you receive.

And oddly enough since that first experience, I’ve given more than 50 sessions at conferences throughout North America and Europe.

PASS Board of Directors at the PASS European Conference 2005 in Munich

It’s Your Turn!

Got a PASS Summit story? I’d love to hear it. And so could PASS. Jot it down in a blog or email and you would win a trip to this year’s Summit in Seattle or some other prizes. For more details visit the contest site.

Cheers!

Joe

Data Mining Basics with Tom LaRock

June 29th, 2009 by Brent Ozar

We’ve talked about data mining the StackOverflow database dump before, but today we’re going to start from the very beginning.  In this half-hour video, Tom LaRock and Brent Ozar talk about what data mining is, and show some examples using the StackOverflow database.

Get the Flash Player to see the wordTube Media Player.

For more about importing the StackOverflow database export and working with it, check out our articles on data mining the StackOverflow database dump.

Subscribing or Downloading the Podcast

If you have an MP3 player or a portable video player and you want to download our videos automatically, you can subscribe to our podcast feeds here:

You can also download this video to watch it later:

  • Digg
  • del.icio.us
  • DZone
  • DotNetKicks
  • Google Bookmarks
  • Ping.fm
  • Reddit
  • StumbleUpon
  • TwitThis

Speaking Engagements

June 29th, 2009 by Gail Shaw

It’s going to be a busy 6 months in terms of conferences and speaking (well, busy for me, I’m just getting used to the whole speaker thing)

TechEd Africa

TechEd Africa is running from the 2nd to the 5th of August in Durban. I’m presenting two sessions there, one on query hints and plan guides and one on evaluating your indexing strategies. I’ll also very likely be helping out in the community lounge and the Ask the Experts area.

PASS Community Summit

I’m presenting two sessions at the PASS Community Summit this year. A spotlight session on titled ‘Lies, damned lies and Statistics’ and a general session titled ‘Insight into Indexes’

The spotlight session will be covering column statistics, why SQL creates them and how, when they’re updated, the importance of accurate statistics, some of the problems that can result when they’re not accurate and some maintenance strategies.

The general session will be looking at what SQL can tell you about indexes, how they’re been used, what they’re been used for and what indexes SQL thinks it wants, and how reliable all that information is.

SQL Usergroup

I’ll be presenting at the October meeting of the SQL Server usergroup and will be doing a final dry-run of one of the presentations that I’ll be giving at PASS. I haven’t decided which one yet.

More Thoughts on Blog Plagiarism

June 29th, 2009 by Brent Ozar

In the aftermath of the InformationFlash plagiarism incident, several questions have come up from the site’s webmaster and from other bloggers.

Is it okay if the plagiarizer isn’t making money?

No.  Authors work really, really hard to create their original content.  Seeing someone else pass it off as their own, whether there’s a charge or not, reduces the value of our hard work.

If I took the whole content of The Manga Guide to Databases and reproduced it here on my blog, I wouldn’t be making a dime off it.  However, I’d be robbing the author of income.  Even if that author was giving away the work for free, the author might be benefitting in a way that I don’t understand yet, so I need to contact the author before republishing their copyrighted work.

Is it okay if I don’t understand my blog aggregation software?

No.  If you pick up a gun, it’s your responsibility to understand how it works. The first time it accidentally goes off and shoots somebody, you might be able to get away with claiming you didn’t know it was loaded.  After several people complain about gunshot injuries, though, you need to put the gun down.

Just as you can go to a local gun club to learn about firearm safety, you can get help with RSS aggregators too.  Post a message in the product’s support forum, contact other users of the product, or post a message on StackOverflow.  But whatever you do, don’t wave that thing around until you understand what you’re doing.

Shouldn’t the bloggers change their feeds to prevent theft?

Bloggers can choose whether to include the full article or just a few words in the RSS feed.  In my series on how to start a technical blog, I recommend using the full article because readers like it a lot more.  They don’t want to click through to read your full article on your site.  (Personally, I hate the holy hell out of blogs who just include the abstract, and their content has to be insanely good for me to subscribe to one of those kinds of blogs.)

Even if the blogger changes their feed to just include an abstract, it still doesn’t prevent syndication sites from stealing content with screen-scraping techniques.  Then the naysayers would say, “It’s the blogger’s fault for not requiring a username and password in order to read the blog.”

If we have another site pop up like InformationFlash, I’ll probably end up including a copyright note at the bottom of every blog entry.  It’ll say something like, “If you’re not reading this article at BrentOzar.com or SQLServerPedia.com, it was stolen.”  I hate doing that, though, because it looks crappy.  It’s like bolting the TV remote to the nightstand.

Is it okay if end users submit the copyrighted blogs?

No.  When the owner of copyrighted content notifies you that your site has their stuff on it, and they want it taken down, you have to take it down pronto.  YouTube is a good example because people try to upload copyrighted data all the time.  If the original content owner files a DMCA complaint at YouTube, then YouTube acts quickly to take the content down.

Just as a side note – if you try to claim some other user uploaded the copyrighted content, you need to be *very* prepared to show database records and web server access logs to prove the site administrator wasn’t the one uploading content.

How come it’s okay when Digg or DotNetKicks does it?

Because those sites don’t publish the full content of the article.  They show the first few words of the article, and if the reader is interested, they click through to the full content of the article on the blogger’s site.

InformationFlash was showing the entire article, start to finish, without even showing the author’s name.  That isn’t promoting the authors at all.  To make matters worse, InformationFlash had a Google PageRank of a whopping zero – meaning it wasn’t promoting anyone other than itself by stealing content.

Then is it okay if the site promotes the bloggers?

No. When you’re taking copyrighted content from bloggers, you have to get their permission first, period.

Some authors are completely okay with you republishing their work as long as you attribute them appropriately and link back to them.  For example, I’ve told SQL Server Magazine they’re free to use any material from my blog as long as they quote me.  (Part of this is a selfish reason: despite what Compete thinks, I’m pretty sure SQL Server Magazine has more readers than I do.)

Is it okay if it’s not illegal?

Even if you register your domain name anonymously and ignore all incoming emails, sooner or later people are going to figure out your real name.  They’re going to post your name in public along with an explanation of what happened.  That kind of information will turn up in Google searches, and it’ll make for very ugly job interviews and client negotiations down the road.

Besides, don’t you want to be successful?  Your site simply can’t become a success by alienating the very people upon whom your site depends for content.  You can be successful by working with the community and making sure everything is a win-win.  It’s not easy, and it’s not cheap, but it works in the long run.

Stealing is easy and cheap – but the long-term outlook is not so good.

DBAs Behaving Badly (5/10): Data File Management

June 29th, 2009 by Rod Colledge

In the last post in this series, we investigated common problems with storage configuration. A closely related topic is that of data file management, something I’ll be focusing on in this post.

A well known attribute of SQL Server is its ease of use. In contrast with other database management systems, it’s a piece of cake to get up and running, and as many of you would agree, that’s not always a good thing.

SQL Server is designed with good out-of-the-box settings that strengthen security, reduce administration overhead and maximize performance, however, when it comes to individual databases, there’s a number of recommended configuration steps that SQL Server doesn’t perform, in large part due to dependencies on disk configuration and unknown future usage of the databases. In this post, we’ll concentrate on three such settings; autogrowth, file size and the recovery model.

Relying on Autogrowth

When a database is created using the default settings, it’s sized as per the model database, which by default has a 3MB data file set to autogrow in 1MB increments, and a 1MB log file with 10% autogrowth. Autogrow events occur when the files fill to capacity and automatically grow, therefore avoiding “out of space” errors. For very small test or development databases, that may be ok, but what about a large production database that’s growing rapidly? In such a case, the database would be constantly auto-growing every few seconds (or less). So what’s the problem?

Fragmentation; If a database’s files are sized up front for the expected growth over the medium term (let’s say 12 months), the disk space is allocated in one large chunk. In contrast, files that are constantly filling and growing have space allocated in small chunks from whatever disk space is free at the time. The end result is high levels of file fragmentation, a particularly nasty problem for the transaction log file.

Performance; There’s a performance overhead involved in each file expansion. You don’t need to be the sharpest tool in the shed to understand that a file that’s constantly auto-growing will incur a much larger performance overhead compared to one that’s appropriately sized for the expected growth. Instant File Initialization helps in this regard (and should be enabled in almost all cases), but can’t be used for the transaction log, which is arguably the most important file from a performance perspective.

Associated best practice; Autogrowth should not be used as a substitute for good capacity planning and proactive database maintenance. Configuring files with the autogrowth property is fine in handling unexpected surges in growth, so long as the files are appropriately sized in advance, and appropriate monitoring and alerting is in place for disk usage and autogrowth events.

Failing to presize tempdb

Those that rely on auto-growth for databases almost always fail to pre-size tempdb. The unique thing about the tempdb database is that when SQL Server is restarted, tempdb is recreated with the original file sizes. As an example, let’s say that over the course of normal production activity, the tempdb database grows to 20GB. When SQL Server is restarted, the tempdb database will be reset back to the defined size and growth increments, which by default is 8MB with 10% autogrowth. Such a setting would require close to 100 autogrowth events to return the file to the required operating size, bringing with it the corresponding performance and fragmentation overhead. In such cases, the classic outcome, particularly for databases with a heavy reliance on tempdb, is sluggish performance for a period of time after SQL Server restarts.

Associated best practice; Pre-size the tempdb database after observing the production system that’s been up and running for long enough to cover the full range of activities (index rebuilds, dbcc checks, application activity, reporting etc …) All of these things use tempdb, some more than others, and the correct value to use is therefore site specific. Of course, before the system goes into production, an educated guess is required, but a value greater than the default 8MB may be a good place to start!

Full recovery model with no transaction log backups

In the first post in this series we covered backup worst practices, the first of which was not taking backups. One of the less than obvious variations of this practice is not taking transaction log backups on the assumption that a full database backup will “take care of it”. It won’t. A database in the full recovery model (the default for the Standard and Enterprise editions of SQL Server) will retain transactions in the transaction log until an explicit transaction log backup is performed. If transaction log backups are never taken (frighteningly common), the log continues to grow forever.

The classic outcome of situations such as the one just presented are databases with a 20MB data file and a 900GB transaction log file. In most of these cases, the disk eventually fills to capacity and the system falls over with 9002 errors (out of disk space). What happens next is usually a combination of panic-induced hysteria and logic-defying feats of ignorance.

Among many other critical functions, the transaction log is used to ensure the integrity of the database by rolling forward committed transactions and rolling back uncommitted ones. Such a process is performed when the SQL Server instance is restarted. Let’s take the case of the full disk scenario presented earlier; someone notices the 900GB transaction log file and, in a moment of clarity, decides it’s not really required, so they delete it (after stopping SQL Server to remove the file lock). When SQL Server restarts, the usual roll forward/back process cannot happen, given the absence of the transaction log, therefore resulting in a potentially (probably) corrupt database containing half completed transactions.

Associated best practice; Backup the transaction log frequently, or use the simple recovery model if point in time restore is not required (and data loss since the last full backup is acceptable.)

In the next post in this series, we’ll examine indexing worst practices.