Backup Performance

From SQLServerPedia

Jump to: navigation, search
See Also: Main_Page - Database Administration - Backup & Recovery

Contents

The Backup Process Described



A typical backup process consists of several steps that are executed in a loop:
  1. Data (or transactions) are read from the SQL Server database files. Backup Input / Output (I/O) operations are sequential, therefore SQL Server performs backups in larger I/O steps than normal read / write operations. Recall that typical SQL Server I/O consists of 8Kb pages. However, by default backups are combined into 64 Kb I/O's. You can override this default by specifying a different value of BLOCKSIZE with the BACKUP command.
     
  2. Data that is read is copied to the backup destination. The backup destination can be a local directory, network share, tape or pipe. If you are backing up to a network share, the data can be written to the memory cache. After that it can be written to the actual disk on the network.
     
  3. When the copy of the read data is complete, the control returns to the backup process to read additional data, until all data has been copied.


This process is sequential in nature, which means that some parts of the SQL Server system as well as the backup server are idle during some portion of the backup. To optimize backups you can try backing up in parallel so that all components of the system are busy at all times. For example, you can backup to a drive array instead of a single disk to write to multiple disk in parallel.

There are three ways of backing up SQL Server databases:
  1. Backup to a local disk drive or disk array
  2. Backup to a different server over a network
  3. Backup to tape device(s)


The advantage of the first method is that network utilization will not be affected and if you have to recover a database you won't have to copy the backup from the network. Generally, local backups will be the fastest. One obvious disadvantage of backing up locally is that this method is not very reliable: if a disaster were to occur (fire or loss of power) backups would be lost along with the database.

Network backups are very common in medium to large scale environments. The main advantage of this method is the ability to share resources among multiple SQL Servers. You can have multiple SQL Servers backing up data to the same shared network server which has larger disk arrays than individual database servers, which in turn translates into higher I/O capacity. The main disadvantage of network backups is their adverse impact on the network, which can become a performance bottleneck. If you had to recover a database, you'd have to either copy backup files to a SQL Server computer or restore over the network, which can be slower than restoring from a local drive.

Backup to tape devices can be a great way of securing the data since tapes can be stored in an off-site location, without the risk of being damaged during a disaster. The disadvantage of tape backups is having to go off-site, get the tape and restore (or copy) the backup to the disk first, prior to recovering the database, which is much slower than restoring from a local drive array.

Performance Impact of Backups



While SQL Server backups are online (users can read / write data while the database is backed up), the increased I/O activity has the potential for a negative impact on performance. Since SQL Server has to read data from disk and write data to a backup device, disk utilization during backups goes up significantly. If you are backing up to network share, network bandwidth utilization also goes up, which might also affect your system availability.

Factors Affecting Backup Performance



I/O on SQL Server and on backup server

I/O operations while performing the backup are no different than reading and writing data from the database, aside from the fact that the backup is performed using larger blocks of data. However, combining backups with heavy read/write activity on SQL Server can bog down the performance by overloading the system.

Another side of the equation is the backup media that is being written to. A tape device or disk on the network drive can also be the bottleneck. Recall that the backup process will NOT read another portion of data until it completes copying already read data to the destination. If you are backing up to a tape device and experience performance degradation you can perform one of the following actions:
  1. Backup to a local disk prior to copying the backup to the tape device
     
  2. Attempt to back up to multiple tape devices in parallel
     
  3. Try using a different (faster) tape device


It is also a common (and recommended) practice to backup databases to a disk drive first and then move the backup to a tape. Many newer tape drives have built-in data compression capabilities, which reduces the time it takes to copy the backup, as well as the size of backup files.

If you are backing up to a disk and experience slow performance, you might wish to examine the RAID configuration on the backup destination. Using RAID 5 configuration is not appropriate for disks that experience over 25% write activity. Such configuration has a high write overhead. Instead try using RAID 1 or RAID 10 as the backup destination media. The disadvantage of RAID 1 (or RAID 10) is the cost, which can be considerably more expensive than RAID 5.

It is recommended to examine the Avg.Disk sec/ Read and Avg. Disk sec /Write counters in the System Monitor to determine whether the backup destination is the bottleneck. If these counters go over 25 milliseconds per read / write then you're likely to have a destination disk problem. You should also examine disk queue length.

Tuning disk I/O for backups is fairly straightforward, but depends largely on what type of disk drives and RAID configuration you can afford. RAID 0 provides no fault tolerance, but supports the maximum number of writes; RAID 1 (or RAID 10) provides mirroring fault tolerance, but requires writing the same data twice, therefore supporting approximately half the throughput provided by RAID 0. Finally RAID 5 provides parity for fault tolerance, but requires much more overhead than RAID 1. Exact throughput of RAID 5 depends on the number of disks in the array as well as how fast those disks are. RAID 5 is typically not well suited for backups since such activity is 100% writes and no reads.

NOTE: if you perform a disk backup, you should not backup to the drive that shares the SCSI controller with the data or transaction log drive. Backing up to a drive / drive array with a separate controller will perform significantly better and won't affect your application's performance.

The Network

The network can cause significant performance issues if you are backing up your database to a network share. If you determine that your network throughput is insufficient for backup needs, you can try upgrading to faster network hardware, for example going from 10BaseT to 100BaseT. Gigabit networks will perform even better but will come at much higher cost.

If you experience network bottlenecks with backups, you may consider generating a local backup first and then copying the backup file to the network. Another way to optimize network throughput is to use multiple (and faster) network cards. To further improve backup throughput, use several network segments.

If you share a network server for backing up data from multiple SQL Servers, you should consider having a dedicated network for backups. This way, backups will not interfere with regular network activity. Furthermore, you should examine backup schedules on each SQL Server backing up to a shared resource and alternate them, so that backups from various servers don't compete for network bandwidth.

If your budget allows, you should consider investing in new software and hardware products such as Storage Area Networks (SAN). SAN allows you to share disk and tape arrays across multiple systems and provides superior performance for backups.

The CPU

Generally you should see greater impact on I/O than on processors during the backup. However, processor utilization does go up during backups. If you're not using multi-processor systems and experience performance degradation during backups, consider investing in more and faster CPU's.

Tips for Tuning Backup Performance



Following are some tips and guidelines for improving backup performance.
  • Use multiple backup devices
    Using multiple backup devices (whether disk, tape or network share) allows SQL Server to perform backup steps in parallel, which reduces the total time required for backups. This holds true for drive arrays - more disks in the drive array will speed up backup operation.
     
  • Use multiple data files
    By using several data files, SQL Server can back them up in parallel, which makes both back and recovery faster.
     
  • Backup to disk first, before copying backups to tape or network
    Backups to a local disk array can often be quicker and have less impact on the system than network or tape backups. In addition, having backup files on the local drive can speed up recovery.
     
  • Plan full backups for off-peak hours
    Doing so will make backups quicker and reduce the impact on user applications.
     
  • Use Differential backups
    Differential backups take less time than full backups since they only copy the data that was changed since the last full backup
     
  • Keep the most recent backup files on disk for optimizing recovery
    Having backup files on a network or tape requires copying the backup locally prior to recovering the database. Therefore, try to keep at least the most recent backup on the local drive / drive array.
     
  • Examine the SQL Server configuration setting Max Worker Threads
    When using multiple data files and multiple backup devices, SQL Server can perform backups in parallel and this improves backup performance. However, each backup operation requires a SQL Server thread to run. If the system is overloaded, not having enough threads allocated for SQL Server use can force the process to wait until the next thread becomes available. If this happens you no longer get the benefit of backup operations performed in parallel. The default setting for Max Worker Threads is 255, which is sufficient in most environments. If you have reduced this setting for other tuning efforts and experience performance degradation during backups you might want to bump it back up to 255.