Alternatives to Replication
From SQLServerPedia
|
See Also: Main_Page - Database Administration - Replication There are other options for distributing data. This topic will examine them and compare them briefly with replication to clarify when replication is the best choice. Distributed TransactionsThis involves programmatic solutions in your application or code where you apply your transactions on both the source server and the destination server, usually within the confines of a transaction. There is a price to pay in terms of latency for this and usually involves considerable programming effort and are not scalable for write intensive applications. Distributed transactions are used when real time synchronization is a must; they are typically used in financial transactions. Linked ServersLinked servers are similar to distributed transactions, except that you typically access a linked server within the database as opposed to within your code. The performance costs of using a linked server do not make them scalable. Linked servers also provide access to a wider variety of data sources than SQL Server replication provides. Linked servers are typically used to connect with data sources that replication does not support or which require ad hoq access. Linked servers involve a network hop each time they are accessed. Whenever you see a linked server being used, consider replicating the data locally to the same database or a different database on the same server and access it there. TriggersTriggers can be used to distribute data, but are best used when distributing data locally and there is considerable performance and administrative overhead with using them. Typically triggers are used for auditing purposes. Triggers are not a good choice to use over a network. Should the destination server be offline or busy, the trigger will hang typically for up to 20 seconds before rolling back. Integration Services (formerly known as DTS - Data Transformation Services)IS can distribute data, and has a very rich set of features for transforming data as it migrates from the source server to the destination server. Integration Services also can be used to distribute data to a wider variety of data sources than SQL Server replication supports. IS does not track individual transactions, so if you choose to use Integration Services to distribute your data you will have to replace the entire data set each time, or have some mechanism to tell what has changed on the source server so you can only replicate the changes. Integration Services is typically used with batch processes and not in processes that require minimal latency. Snapshot Isolation LevelThis option, new in SQL Server 2005, is an isolation level that results in a higher level of concurrency, ie more users can access the database simultaneously. If, at the beginning of a session, a user sets the isolation level to snapshot isolation level, the user can query the state of the database at the point in time he or she set the isolation level. Other users can modify the database without being locked by this user. Backup/RestoreThe above options are used to distribute individual transactions. Backup/Restore involves distributing the entire database. This is only a good solution when customers need copies of the entire database which they will have to treat as read-only because a subsequent restored backup will overwrite the work done on the last restored copy. Users will have to disconnect from the database while the backup is restored and there is no capability to migrate any work done on the destination server back to the source server. There is no capability to distribute a portion of the data; you are limited to the entire database. Log ShippingLog Shipping can be thought of as a continuous Backup/Restore which starts with a restore of the backup on the destination server, followed by periodic restores of the transaction logs. These logs must be restored in order and all logs must be restored. Users will have to go offline as the backup and logs are installed. There is no capability to distribute a portion of the data; you are limited to the entire database. Log shipping does not provide the small latency that replication does, and it does not allow you to send a subset of your data to the destination server. Nor does it allow you to massage the data in any way while sending the data from the source server to the destination server. Log shipping is supported in two recovery models - full and bulk-logged. Microsoft SQL Server Data MirroringData Mirroring, new in SQL 2005, will mirror a database between two servers. In the high safety with automatic failover mode (there are three modes - high performance, high safety, and high safety with automatic failover), it requires a third SQL Server (called a Witness) to determine when the source server goes offline and will failover clients to the destination server. Data Mirroring uses continuous log shipping under the covers. Data Mirroring works well for mid-sized databases under medium load and is only scalable to 10 databases per server. ClusteringClustering involved specialized hardware and requires different versions of the OS. Clustering involves sharing a database, SQL Server or resource between nodes. Clients connect to a virtualized SQL Server that will connect to an underlying node to access the shared resource. If a resource fails, the virtualized SQL Server/Resource will be migrated to the failover node with little impact on the clients. Clustering is not a data distribution technology, but rather a fault tolerant solution designed to minimize downtime or provide high availability. Clustering should be used for fault tolerance over replication; it "distributes" an entire SQL server and automatically fails over to the second node in the cluster on failure. With replication you must manually fail it over. Software Data MirroringThese products install a file system driver which tracks file activity and replicates this activity to a destination server. These products are called "host based" and consume network bandwidth, RAM and CPU on the host (source server). They offer similar functionality as hardware data mirroring but they don't offer the same performance - although they come can come close for medium loads. An example of a software data mirroring provider is Double Take. Hardware Data MirroringHardware data mirroring is a hardware product by vendors like Hitachi Data Systems, EMC, or Veritas that replicate changes that happen on the byte level from one disk array to another array. These solutions make heavy use of caching and tend to be very expensive and complex. However they are required when the cost of downtime is significant. With hardware data mirroring there is always some added latency with write intensive operations. NotesThe above list is ordered based on the level of granularity. For instance, the first 3 alternatives distribute data at the transaction level. DTS and replication distribute data on an article or table level. Log shipping, Data Mirroring distribute data on a database level. Clustering, Hardware and Software Data mirroring mirror servers or disk arrays. These are also ordered by required skill set for the dba and expense; although the expense of programming distributed transactions can be considerable. Replication can replicate a subset of your tables, columns or rows. Replication can also replicate partitioned tables, the schema of stored procedures (as well as the execution of a stored procedure), views, indexed views (as an indexed view or as a table that represents the indexed view), User Defined types and Functions, full-text indexes, Alias Data types, Schema Objects (constraints, indexes, user triggers, extended properties, and collation) and DDL (Data Definition Language alter table statements, etc). The latency you will see with a replication solution in a production environment under load typically is one or two minutes. Replication is a low cost alternative that offers a high degree of flexibility, which the other solutions do not offer. |