Have you tested your database backups lately?

Filed under: Administration, Backup and Restore — Brent Ozar at 10:11 am on Thursday, October 2, 2008

No you haven’t.  I know you haven’t, because I was a DBA, and I hardly ever did it either.  Now that I work for the makers of LiteSpeed, I hear the horror stories from our support folks.  DBAs call up screaming in pain, and we can hear the boss loading the shotgun in the background.  The DBAs wrote backups to tape and the tapes got hosed, or maybe the backups never actually made it to tape because the DBAs wrote the backups to disk but the network staff never backed up that disk share.

Do yourself a favor - pick your three most critical databases, the ones that’d really give you heartburn if they disappeared, and go try to restore them onto a testbed server that has decent performance.  Time how long it takes, and mention it to your boss.

Mention it to your manager in casual conversation.  “Hey, Pointy-Headed Boss, I tested the restores today just like I’ve been doing every week (cough) and they worked great.  I never mentioned it to you before, but in case you’re curious, it’d probably take us about X minutes in order to recover in case the production server barfed.”

Your boss will probably say, “Why are you telling me this?”

You say, “I just thought you might want to know.  If that server dies, I want you to kinda have a feeling for how long it’ll take me to restore, because in the heat of the moment, you know, people kinda stand around the desk waiting for it to happen.  If the box explodes, I want you to be able to answer when your manager and your manager’s manager ask us how long it’s going to take.”

At that point, the fidgeting will start, and maybe they’ll ask you to make it happen faster.

Why am I telling you this?  Because budgeting time is coming up, and that’s a good way to segue into a conversation about getting more storage performance for your servers or getting a faster, dedicated backup share.

Good luck.  Let me know how it goes.  (And start testing your backups regularly.  Please.)

Update 10/2 - Funny Ha Ha - Jason Massie aka StatisticsIO.com posted a funny comic about testing your database backups.

We are trying to implement log shipping to run hourly. If our nightly full backup takes 1 hour and 45 minutes, what is our recovery process? Is it a bad thing to have a transactional backup in the middle of a full backup?

Filed under: Administration, Backup and Restore, I'm a Newbie, Internals and Architecture, Replication, Transact-SQL (T-SQL) — Ari Weil at 1:06 pm on Friday, August 29, 2008

Yes, this is bad because it can lead to your log shipped database getting out of sync and will probably result in Error 4305; error 4305 states that the log in a backup set is too late to apply, which means your log shipping jobs will fail until you can synchronize the databases. You should schedule your transaction log backup job to stop before your full backup job begins and then to restart once it has completed. See the Microsoft TechNet article on setting up Log Shipping.

When you perform a a full database backup SQL Server stores the ending log sequence number (LSN), which becomes the starting LSN for the next transaction log backup. So, if you are performing a transaction log backup while your full database backup is executing, and that transaction log backup is successful, once it is shipped and applied to the subscriber it will have a different starting LSN than the ending LSN of the full backup. This is not allowed and SQL Server will throw error 4305.

Is it a bad thing to have a transactional backup in the middle of a full backup?

Filed under: Administration, Backup and Restore, I'm a Newbie — KKline at 3:53 pm on Thursday, August 14, 2008

Here’s the actual question in its entirety:

We are trying to implement log shipping… We implement a transactional backup hourly. If our nightly full backup takes 1 hour and 45 minutes, what is our recovery process? Is it a bad thing to have a transactional backup in the middle of a full backup? 

 

Now, your question, whether you realize it or not, is pretty involved and has three major elements.  First, you want to know about implementing log shipping.  Second, you want to know about the recovery process considering your current backup process of hourly transaction log dumps and nightly full database dumps.  And finally, you want to know if a transaction log backup can cause problems if it occurs while a database backup is already processing. 

 

FULL DISCLOSURE - I’ll go ahead and say now that I’m going to include verbiage around a Quest product in this response - not a blatant advert, but I will be mentioning one of our products.

 

So, to your first question – remember that log shipping is a high-availability technique used to speed recovery of a database should you ever have a crash.  You don’t say which version of SQL Server you’re running.  But I assume that you must be running on SQL Server 2000, otherwise you would have mentioned other (and better, IMO) alternatives such as database mirroring.  Assuming you’re on SQL2000, then log shipping is not a big deal to implement and is widely discussed elsewhere on the internet, such as Microsoft’s TechNet (www.microsoft.com/technet/prodtechnol/sql/2000/maintain/logship1.mspx) and MSDN (msdn.microsoft.com/en-us/library/ms187103.aspx).  I won’t rehash an elementary concept of SQL Server when others have already done an excellent job of telling you how to implement it.  Since you might still be in the planning stages of your high-availability solution, here are two caveats that are often overlooked by newbies.  One, don’t forget that the database that is the target of log shipping is going to be in recovery mode all the time until it becomes the primary database.  You pretty much cannot use it unless the source database fails.  Two, log shipping works great with LiteSpeed.  So, if you’re a Quest Software customer using LiteSpeed, then you can enjoy all the benefits of speed, reduced disk consumption, and encryption that LiteSpeed offers.

 

To your second question, your current backup process, in which you backup the transaction logs every hour and the database nightly, is pretty sound.  Keep in mind that you could potentially lose up to 59 minutes of data with any given database failure.  If 59 minutes does not represent a significant amount of work on the database in question, then great!  However, I rarely had a production database where I was comfortable making users re-key up to 59 minutes of work.  Fifteen minute intervals for a transaction log dump were more in my range of comfort.  Additionally, consider that most applications do NOT have users working around the clock.  Because of that very natural business cycle, you can often lessen or even completely discontinue transaction log backups during the wee hours.  For example, you might run the transaction log backups every 15 minutes from 6:00 AM EST to 8:00 PM EST every day, since you have few if any users after that time.  Alternately, you might run the log dumps ever 15 minutes from 6:00 AM to 8:00 PM EST, run them hourly from 8:00 – 10:00 PM EST, and then only do one more full database backup until the next morning.  Frequent log backups are important also because that’s the primary means by which you keep the transaction log from growing too large.

 

Finally, with your last question, starting a transaction log backup while a full database backup is still running will behave differently on different versions of SQL Server.  For SQL2000, SQL Server doesn’t allow you to run the transaction log backup while the full database backup is running (at least that’s what my fragile memory is telling me).  For SQL2005 and later, transaction log backups and full database backups can run concurrently though the transaction log backup will certainly be slower because of it and the database backup will likely be slower too.  The full database backup will contain all data in the state of the database at the time it completed, while the transaction log backup will contain all of the transactions that have run since the last full database backup OR transaction log backup.  Since the transaction log backup will probably finish before the 90 minute long database backup, you should think of that transaction log backup as the last log backup before the new full database backup starts the transaction log backup process all over again.  Thus, in an emergency restore situation, you could apply the previous day’s full database backup plus all transaction logs up to the time that the transaction log ran concurrently with tonight’s full database backup to get a recovery that included all transactions up to the most recent point in time.  Once the full database backup completes, however, the clock is started over and you would have to start applying transaction log backups that were subsequent to the completion of the full database backup.

 

Hope this helps,

 

-Kevin

The basics of snapshots

Filed under: Backup and Restore — Brent Ozar at 10:37 pm on Tuesday, August 5, 2008

“Snapshots” is one of those words that gets tossed around a lot.  In a nutshell, a snapshot is a point-in-time copy of a piece of data.  It might be a single database, it might be a drive, or it might be a SAN array.

There’s (at least) 3 kinds of snapshots:

  • Application-level (in this case, SQL Server, but I like to illustrate snapshots by talking about Word’s change-tracking features.  Those aren’t exactly snapshots, but you get the idea.)
  • Windows-level (Volume Shadow Copy Services, or VSS)
  • SAN-level

The difference between the levels is who’s doing the change tracking, and each level has its own advantages.

With SQL Server 2005’s snapshots, the DBA controls the whole process inside SQL Server Management Studio.  The snapshots appear as read-only databases, which can be really useful.  Say you automate the snapshot process and take a snapshot every morning at 8am, and then present that to your users as DatabaseName_8am.  Presto, if somebody drops an index or deletes records, they can go back to the 8am snapshot without calling the DBA.  Problem is, these snapshots aren’t useful for real backup – meaning, we can’t easily get ‘em off the database server.

With Windows-level snapshots using Volume Shadow Copy, the Windows admin controls the process inside their backup program of choice (Veritas, Microsoft DPM, etc).  Windows takes the snapshot and doesn’t care what’s on the drive.  The Wintel folks automate the process and shuffle the data off to tape.  Problem is, these snapshots aren’t useful for application-level restores, like letting our users query directly into the snapshot to see what changed.

With SAN-level snapshots, the SAN admin controls the process inside the SAN.  They take a snapshot of an entire array and present it to a backup server or shuffle it off to tape.  These combine the best of both worlds, because with solutions like EMC’s or NetApp’s, you can present the snapshot to your database servers or Exchange servers and query the snapshot just like it’s a live database.  One problem is that they cost an arm and a leg, and another issue is that they’re usually SAN-specific - meaning, each of your SANs will have its own snapshot technology, snapshot administration tools, and ways to go about using it.

Each of the snapshot technologies has its own pros and cons, and they’re all useful in different situations.

Backup Log with Truncate_Only: Like a Bear Trap

Filed under: Backup and Restore — Brent Ozar at 8:32 am on Friday, July 25, 2008

We got a question about the best practices way to truncate their SQL Server log files, and that kinda scares me.

It’s somewhat akin to asking, “What’s the best way to cut my hand off to free myself from this bear trap before I starve to death in the wilderness?”  Well, you shouldn’t be sticking your hand in bear traps to begin with, but if you find your hand in a bear trap, ANY way to get out of it is a good way.  Pocket knife, teeth, band saw, whatever it takes.  (Now there’s a sunny image to start the day.)

When people ask about truncating their log files, what they usually mean is, “We don’t have enough space on our log drive, and I need to free it up often.”  This happens in systems where folks didn’t get buy enough space for their log files, and they have to repeatedly truncate their logs or else their system will run out of space.

If we’re talking best practices, then really the answer is not to truncate the logs, period.  Avoid that by:

  • Increasing the size of the log drive.  This is the easy answer, and it’s the best long-term answer, but sometimes we don’t have the money to do this, or there’s no space left for drives in the server or the SAN.
  • Add an additional drive for logs, and either move some of the database log files there or add secondary log files on those drives.
  • If the transaction logs truly don’t matter, then change the databases to simple recovery mode so they don’t need to be backed up.  The log files won’t keep growing (unless there’s a monster transaction).
  • Do transaction log backups more frequently (maybe we’re doing t-logs every hour, and need to be done every 15 minutes).  I’m not wild about this answer because at some point, your log backups will fail for some reason (like your backup drive ran out of space) and your log drive will still fill up.

If you do find yourself in a bear trap with a full transaction log drive, then the fastest way to get out of it is to run a query manually via T-SQL – forget LiteSpeed.  Just use the native command:

BACKUP LOG [DatabaseName] WITH TRUNCATE_ONLY

However, like cutting your hand off, that has some really nasty implications – you lose your ability to do a point-in-time restore once you do that.  Do a full backup as soon as possible afterwards, and then resume normal transaction log backups.

Update 8/14/2008 - Paul Randal wrote a good blog post on what to do when your transaction log fills up.

Backing Up and Restoring Replicated Databases

Filed under: Administration, Backup and Restore, Replication — Jason at 8:04 am on Friday, May 23, 2008

At the last user group that I attended, a question was raised adbout restoring databases involved in replication and maintaining the replication settings.  I mentioned that the RESTORE command has a KEEP_REPLICATION option that can be used in this scenario but thought I’d go into a little more detail here.

 Typically when you restore a SQL Server database, replication settings are removed during the restore process.  This allows you to do a database recovery to a dev/test/DR environment and not have to worry about replication settings causing issues in a non replicated environment.  In many scenario’s however you may need to restore a replicated database and maintain these replication settings so that you can re-initialize replication after the restore.  The KEEP_REPLICATION option allows replication settings to be maintained after a database is recovered.  Syntax is below:

 RESTORE DATABASE northwind FROM DISK=’d:\backups\northwind.bak’ WITH KEEP_REPLICATION

For more information, Microsoft has an MSDN article that goes through the entire Backup and Recovery strategy for databases involved in replication.  There is quite a bit more work here than you may initially think, so be prepared.

http://msdn.microsoft.com/en-us/library/ms151152.aspx

If anyone has any real-world scenarios around backing up and restoring replicated databases and some of the challenges that you’ve faced, I’d love to hear them in the comments section.

Wondering whether to take a full or differential backup? See what percentage of your data has changed using native SQL Server tools!

Filed under: Administration, Backup and Restore, Internals and Architecture, Transact-SQL (T-SQL) — Ari Weil at 9:22 am on Tuesday, April 15, 2008

If you’ve never read Paul Randal’s blog on the SQLSkills website you should check it out. Paul recently posted a stored procedure that can be used to tell which percentage of a database’s data has changed so you can determine which type of backup to take. The procedure can be scheduled and its output analyzed over time to strengthen your backup and recovery strategy (because we know that you already have one, you just need to improve upon it ;-) ).  The cool thing about code like this is that it can be bundled into other maintenance procedures and it can integrate with monitoring and maintenance tools.  For instance, you could run this procedure on a schedule with predefined diff thresholds that would flag the  appropriate type of backup job for execution.  You could then use enterprise monitoring tools like MOM or SCOM to determine the best maintenance windows to guarantee the backups run at the right time, and voila!  You’ve got the beginnings of an automated backup and recovery strategy.

We are running legato to backup SQL databases, however one of the servers is failing on two of it’s databases as they are offline, is there a way to put them online in order to run the backup on them but have them ‘blocked’ from running?

Filed under: Administration, Backup and Restore — Ari Weil at 2:40 am on Wednesday, January 16, 2008

There are two ways you can go about this. One would be to set the offline databases to single user mode:

ALTER DATABASE SET SINGLE_USER

…then back them up. Any users attempting to access the databases will receive the following error:

Server: Msg 924, Level 14, State 1, Line 1
Database '‘ is already open and can only have one user at a time.

Once the backups have run the databases can be taken back offline.

The other way would be to bring the databases online and change the access permissions to deny access to all users but the one used to run the Legato backups.

The approach you take will necessarily depend on your company’s security policies and the reason the databases are offline to begin with. If it were me, I’d try to avoid constantly changing database states and simply address the access rights to the database.

My transaction log backups are taking longer and longer to complete. What can be causing this?

Filed under: Administration, Backup and Restore, Database Design, Internals and Architecture, SQL Server 2005, Tuning and Optimization — Ari Weil at 7:03 am on Tuesday, December 18, 2007

I would bet you’re running into an issue where too many virtual log files (VLFs) have been created for your database. Each physical SQL Server log file is internally divided into a number of VLFs. VLFs have no fixed size, and there is no fixed number of VLFs for a physical log file. SQL Server’s Database Engine chooses the size of the virtual log files dynamically while it is creating or extending log files, but information is available on how this works. VLFs are added as follows when the transaction log file grows:
< 64MB = 4 VLFs
64MB - 1GB = 8 VLFs
> 1GB = 16 VLFs

To understand, in more detail, transaction log architecture read: MSDN Virtual Log Files (SQL Server 2000) or MSDN Transaction Log Physical Architecture (SQL Server 2005)

To see how many VLFs exist in your database issue the [undocumented] DBCC LOGINFO command. The number of rows returned is equivalent to the number of VLFs in your database. The Status column tells you a VLF is still active if the value is 2. Don’t be alarmed when you see the output of DBCC LOGINFO, you can rectify the situation; if you haven’t noticed slowdowns you should still be aware of how your logs are configured and how many VLFs exist in your database. VLFs affect system performance when log files are defined with a small initial size and use small growth_increment values. Too many VLFs can slow down database startup and also log backup and restore operations. To see how following some [egad] proven best practices to ensure good transaction log throughput read: Kimberly Tripp’s 8 Steps to better Transaction Log throughput

In your case you’ll likely have to:
Shrink your transaction log file. In SQL Server 2000 many DBAs probably know that multiple BACKUP LOG or DBCC SHRINKFILE commands were necessary to shrink a transaction log. To understand why and to see how to use DBCC SHRINKFILE in SQL Server 2005 for this condition read: KB Article How to use the DBCC SHRINKFILE statement to shrink the transaction log file in SQL Server 2005
…then address the configured size of your transaction log file. I say file, because there’s little to no benefit to creating multiple transaction log files (read Kimberly Tripp’s 8 Steps article above…). You want to attempt to adequately size the transaction log and allow for an auto-growth rate (to ensure you don’t get transaction log full errors) that will ensure very infrequent growth.  You also want to ensure you understand enough about your workload to ensure log truncation operations don’t burden the system (you don’t want the log truncating every second).

To read a practical case, with examples, of how small increment values and auto-shrink can be detrimental to database performance read: Tibor Karaszi’s Blog on why to be restrictive with DB file shrinking

I’m trying to come up with a database backup strategy, do I need to do more planning than to decide when to take full/differential backups?

Filed under: Administration, Backup and Restore, SQL Server 2005, Transact-SQL (T-SQL) — Ari Weil at 8:40 am on Monday, December 17, 2007

The goal of a good backup / recovery strategy is consistent data availability in the event of a corruption or disaster. So, while it’s helpful to determine how frequently you need full and differential backups, you definitely need to consider other factors. Database backups are only helpful if they can be restored. Also, where and how you store backups is important. What happens if your production database server goes down, or what if the hosting facility loses power in a flood or earthquake? Sure, not all scenarios apply to everyone, but storage, redundancy and consistency checks are all crucial in ensuring a successful backup / recovery strategy.

Disaster recovery strategies are outside the scope of this post, however leveraging the following SQL Server features can help strengthen your backup / recovery strategy and avoid integrity problems when you least expect them while maintaining data availability:

DBCC CHECKDB
You’ll almost certainly want to use DBCC CHECKDB to find corruptions before they become a problem. Paul Randal wrote CHECKDB for SQL Server 2005, and frequently posts in-depth, yet concise information on SQL Server internals and architecture on his blog. His series entitled CHECKDB From Every Angle tackles many of the topics you’ll need to consider and dispels the myth that you don’t have time for consistency checking.

BACKUP DATABASE…COPY_ONLY
If you’re performance tuning your production database or you have developers that need a copy of production data this option might help. You can create a full database backup without affecting the differential bitmap in the database. Consider this scenario: you perform full database backups every Monday, Wednesday, and Friday, and perform differential backups in between. Your development staff needs a full backup on Tuesday, but you know that doing so will affect the differential bitmap of your database. Enter the COPY_ONLY option. Using Copy-Only Backups lets you keep the developers happy and keep your backup strategy sound.

CREATE DATABASE…AS SNAPSHOT
Auditing production data is an industry in itself, but many shops rely on fast, ad-hoc reporting to satisfy management requests. These same shops often cannot afford to risk allowing ad-hoc statement executions for fear of data being accidentally modified. In these situations, database snapshots offer a good solution. A snapshot is a complete, read-only copy of a database that can be created extremely quickly and that consumes very little space on disk. When data is read from the snapshot, it is actually being read from the primary database. As data changes in your primary database, SQL Server writes the previous data to a sparse file. The snapshot will then read the sparse file for the data that has been changed and continue to read the primary database for data that has not changed.

One final note - Quest’s Litespeed for SQL Server product’s low impact, high-performance compression technology can reduce storage costs (data can be compressed up to 95%) and SQL backup and recovery windows while maintaining complete control over the backup and recovery process. It even supports object-level recovery!! It’s definitely worth a look.

Next Page »