Replication Overview
From SQLServerPedia
|
See Also: Main_Page - Database Administration - Replication In any multi-SQL Server environment the dba will sooner or later have to distribute data from one SQL Server to another SQL Server, or other RDBMS. SQL Server has a free rich set of options to distribute data; SQL 2005 has extended the available choices.
Why ReplicateThere are several reasons why a DBA might want to replicate data:
These reasons are examined in the following sections. Fault ToleranceFault tolerance is a solution's ability to withstand faults. Faults generally fall into two categories:
Local failures can be mitigated by hardware redundancy. Regional failures require a Disaster Recovery site which the application will be able to fail over quickly with minimal data loss. Different companies have different high availability requirements. Some companies cannot tolerate data loss; for example medical or financial institutions. Some companies can tolerate some data loss - for example, ecommerce applications will sometimes be designed to tolerate some small amount of data loss in the interests of maximum throughput. Other companies require high availability but can tolerate significant data loss - for example some media companies need to provide continuous server and can tolerate significant data loss as the same media item is in continuous circulation. Clustering is a technology that provides automatic failover. Clustering involves grouping two or more servers (called nodes) together to act as a single virtual server. The nodes will share resources among themselves. The virtual server will float between nodes; at one time one node may be providing resources for the virtual server. This node may experience hardware failure, or may go offline for maintenance, at which time the other node will host the virtual server, with minimal downtime for the clients. Clients will be disconnected as the cluster fails over from one node to another and they will have to reconnect. Logic will have to be built into the client application to queue work which may be lost during cluster failover. Clustering used to be only available in SQL 2000 Enterprise Edition, however in SQL 2005 the Standard edition will support 2 node clustering. Most financial institutions will use hardware data mirroring with a solution from a vendor like EMC, Hitachi (HDS), or Veritas that will mirror the data at the hardware level. Such a solution is more scalable than clustering, and provides for regional fault tolerance and automatic failover. Clustering is limited by separation distance between the nodes (typically a distance that will allow 150 ms round trips for pings). Some of the hardware data mirroring vendors provide Global Server Load Balancing where a server will fail over to another server running in an entirely different location with minimal data loss. Log shipping, replication and database mirroring are all high availability solutions which are able to transcend the small separating distance the clustering has. The other solutions for providing fault tolerance are SQL Server 2005 data mirroring, log shipping and replication. SQL Server 2005 database mirroring also provides automatic failover; however the high availability mode does increase the commit time of each transaction on the source and requires a third server in the mix. Database Mirroring is supported on the Enterprise and Developer Editions of SQL 2005. Data Mirroring is not a scalable solution for SQL Servers under significant load, nor does it scale to more than 10 databases. One important note about database mirroring is that the mirrored database on the destination server (called the mirror) will be in a state of recovery and cannot be accessed - although you can create a database snapshot off it using the following command: CREATE DATABASE DatabaseName ON ( NAME = DatabaseName, FILENAME = 'C:\DatabaseName.ss' ) AS SNAPSHOT OF DatabaseName; GO. Log shipping is another alternative which provides fault tolerance but not automatic failover. In log shipping a backup is restored on the destination server, and then log dumps are shipped from the source server to the destination server and applied. While you can dump the transaction log every minute, the practical limit is closer to 5 minutes, which means log shipping is not as scalable as clustering, hardware data mirroring, or SQL Server Data Mirroring, and your exposure to data loss is greater than other high availability technologies. For example your data loss can be as lengthy as the last time you dumped the log and sent it to the standby server; this can be more than twice your dump frequency. However, log shipping does work well if you are trying to synchronize two databases where you have frequent schema changes. Replication can also be used for fault tolerance, however there are several caveats:
To use replication for fault tolerance you must do work on the destination server to account for the factors discussed above. Replication does free up your destination database for useful work, and offers the lowest latency in all the above solutions except for hardware data mirroring. Application RequirementsReplication is frequently used when applications require data to be consolidated from one or more servers to a central repository; and/or to be transformed as it travels from one server to another. Frequently DBAs will work in environments where they have a central office with multiple branch offices and they have a requirement to consolidate data centrally (Frequently called Point of Sale (POS) applications), or to distribute data to each branch office. Another example is where sales agents visit clients and take orders and synchronize their orders nightly or when they return to the office a week later (Frequently called Sales Force Automation (SFA) applications). A classic example of this is a delivery fleet who update their order details using their PDAs (Personal Data Assistants running SQL Server 2005 Mobile Edition) at their delivery locations and their order details are replicated over the internet to their head office (Frequently called Field Force Automation (FFA) applications). Replication can be used to consolidate data centrally, or to replicate data from one RDBMS server to another; for you can replicate from SQL Server to previous versions of SQL Server, Oracle Sybase, DB2, MS Access, and SQL CE (merge replication only). With SQL 2005 you can now replicate from Oracle servers to SQL Server, or you can replicate from Oracle to a central SQL Server and then replicate from there to other SQL Servers or heterogeneous RDBMs - with SQL Server acting as the gateway. Replication can also be used to replicate to the same database. Performance GainsReplication is frequently used when you need to move your reporting functionality off your main SQL Server to a separate SQL Server. With replication, as opposed to other data distribution solutions (Log Shipping, Data Mirroring, or Hardware Data Mirroring), you can have completely different indexes or even tables on your reporting server and keep both servers in sync with the same data. Most other data distribution solutions do not permit access to the destination database while it is being synchronized. You can also improve aggregate read performance by replicating the same data to a bank of SQL Servers, and instead of having 1000 users reading a single server; distribute the load over 10 SQL Servers each with 100 users, and achieve 10 times the read performance. Another example of improving performance using replication is when you have branch offices connecting over the WAN to their head office to connect to the central SQL Server. The latency involved with this network hop can cause the application to crawl; and while a Citrix or Terminal Services solution is ideal for distributed environments like this, frequently the most effective approach is to replicate the data to SQL Servers in the branch offices and have the users' applications use this database. The changes are then replicated centrally. Another interesting approach is to replicate data to worker nodes which will perform batch operations on the data and then replicate it back upon completion. You can get considerable performance improvements by offloading such batch processing to these worker nodes. Data DistributionFrequently DBAs need to distribute their data. For example you may need a copy of your data replicated daily to a development server, reporting server, or your data warehouse. Or you may need to replicate data to another server so the client application can access it there instead of having the client application access the data across the network. Generally the reason is to bring the data closer to the consuming application or to provide consumers of the data with packages of the data in a manner so that their querying of the data will not degrade performance on the source server. |