Site Recovery Manager is a site migration and disaster recovery solution from VMware. It is fully integrated with VMware vCenter Server™ and VMware vSphere® Web Client. Site Recovery Manager provides orchestration and non-disruptive testing of centralized recovery plans. It works in conjunction with various replication solutions, including VMware vSphere Replication™, to automate the process of migrating and recovering virtual machine workloads.
Multiple recovery plans can be configured to migrate individual applications and entire sites, providing finer control over which virtual machines are failed over and failed back. This also enables flexible testing schedules. For example, one application owner might require quarterly disaster recovery testing while another must test once per month. Site Recovery Manager accommodates both schedules.
Sites that share stretched storage can take advantage of zero-downtime virtual machine migrations. Site Recovery Manager can orchestrate the live migration of virtual machines using Cross-vCenter vMotion, also known as “Long Distance vMotion.”
Storage policy protection groups enable automatic protection of virtual machines residing on array-replicated storage. Items such as networks, folders, and resource pools are mapped between sites in Site Recovery Manager to further automate the migration and recovery of virtual machines between sites. Utilizing VMware NSX™ universal logical switches with Site Recovery Manager enables automatic mapping of networks and virtual machine security policies across sites. NSX supports the spanning of layer 2 networks, eliminating the need to customize virtual machine IP address settings during failover and migration. These features reduce complexity, improve reliability, and minimize recovery times.
Site Recovery Manager roles can be assigned to specific individuals and groups in vCenter Server. For example, an administrator might wish to allow several application owners to test recovery plans, but limit the actual migration and failover of virtual machines to just a few individuals in the organization. Site Recovery Manager also includes vCenter Server alarms for monitoring and alerting.
The key features provided by Site Recovery Manager are:
Recovery time objective (RTO): Targeted amount of time within which a business process should be restored after a disaster or disruption in order to avoid unacceptable consequences associated with a break in business continuity.
Recovery point objective (RPO): Maximum age of files recovered from backup storage for normal operations to resume if a system goes offline as a result of a hardware, program, or communications failure.
Array replication: Replication across one or more storage controllers, which eliminates the processing overhead from servers.
vSphere Replication: Host-based virtual machine replication technology created by VMware included with vSphere Essentials Plus Kit and higher editions.
Logical unit number (LUN): Number used to identify a logical unit, which is a device addressed by the SCSI protocol or Storage Area Network (SAN) protocols.
Consistency group: One or more LUNs or volumes that are replicated at the same time. When recovering items in a consistency group, all items are restored to the same point in time.
Failover: Method of recovering applications and services to a secondary system when the primary system experiences a failure or disaster.
Failback: Restoring applications and services from a secondary system back to the primary system after a failover has occurred.
Reprotect: Specific to Site Recovery Manager, the process of reversing the direction of replication and enabling recovery plans for a failback event.
Protected virtual machine: Virtual machine that is replicated from one site to another and is included in a Site Recovery Manager recovery plan for failover and failback.
Protected site: Site that contains protected virtual machines.
Recovery site: Site where protected virtual machines are recovered in the event of a failover.
NOTE: It is possible for the same site to serve as a protected site and recovery site when replication is occurring in both directions and Site Recovery Manager is protecting virtual machines at both sites.
Datastore group: One or more datastores that are treated as a unit in Site Recovery Manager. A common example is a consistency group in an array replication solution.
Protection group: Collection of protected virtual machines that are migrated or failed over as a unit.
Storage policy protection group: Protection group configured with a tag-based storage policy that enables automatic protection of a virtual machine in Site Recovery Manager simply by assigning the tag-based storage policy to the virtual machine.
Recovery plan: Documented process to recover a business IT infrastructure in the event of a disaster. A recovery plan in Site Recovery Manager includes one or more protection groups.
Storage replication adapter: Software components provided by array replication vendors that are installed on the Site Recovery Manager servers to enable communication between Site Recovery Manager and array replication solutions.
Placeholder virtual machine: Virtual machine created in the vCenter Server inventory at the recovery site when a virtual machine is protected by Site Recovery Manager. Placeholder virtual machines do not have virtual disks attached to them, so the storage capacity they consume is very small.
Inventory mappings: In Site Recovery Manager, the default networks, folders, and resources for protected virtual machines to use at the recovery site.
NSX universal logical switch: Virtual switch that allows layer 2 networks to span multiple sites.
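The RTO and RPO definitions above can be illustrated with a short calculation. The following Python sketch is not part of Site Recovery Manager; the function names, timestamps, and targets are hypothetical values chosen for illustration. It compares the age of the newest replica against an RPO target and the time taken to restore service against an RTO target.

```python
from datetime import datetime, timedelta


def meets_rpo(last_replica: datetime, failure: datetime, rpo: timedelta) -> bool:
    """Return True if the newest replica is no older than the RPO at failure time."""
    return failure - last_replica <= rpo


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """Return True if service was restored within the RTO after the failure."""
    return restored - failure <= rto


# Hypothetical incident timeline (illustrative values only).
failure = datetime(2024, 1, 1, 12, 0)
last_replica = datetime(2024, 1, 1, 11, 50)   # replica is 10 minutes old
restored = datetime(2024, 1, 1, 13, 30)       # service back after 90 minutes

rpo_ok = meets_rpo(last_replica, failure, rpo=timedelta(minutes=15))
rto_ok = meets_rto(failure, restored, rto=timedelta(hours=1))

print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this example the replica is 10 minutes old against a 15-minute RPO, so the RPO is met, while the 90-minute restoration exceeds the 1-hour RTO.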
The purpose of this document is to provide a structured guide for IT professionals to evaluate the primary features and benefits of using Site Recovery Manager to automate planned migration and disaster recovery workflows for applications and services running in virtual machines. The exercises in this guide should be completed in the order prescribed for best results. Some exercises have dependencies on previously completed items.
This guide does not contain detailed steps on performing activities such as installation and configuration since these steps are already included in the product documentation.
Overview of the requirements in SRM
It is assumed the following items are already properly installed and configured in a non-production environment designated for this evaluation.
Recommendation: Verify that the Windows operating systems for the Site Recovery Manager host virtual machines are compatible with Site Recovery Manager using the VMware Compatibility Guides. Consult the Site Recovery Manager documentation when installing and configuring Site Recovery Manager.
Recommendation: While array replication supported by Site Recovery Manager can be used for this evaluation, vSphere Replication is recommended for simplicity and compatibility with a wide variety of storage types including VMware vSAN™. A minimum of one vSphere Replication virtual appliance must be deployed and configured for use with the vCenter Server instance at each site. For more information on deploying and configuring vSphere Replication, see the vSphere Replication documentation. vSphere Replication does not require installation of a storage replication adapter.
NOTE: Storage policy protection groups, cross-vCenter vMotion with stretched storage, and NSX integration require array-based replication. These features are not required to successfully complete the steps in this guide.
The figure below shows a logical diagram of how the evaluation environment can be configured. Network connectivity is required between the two sites, but they do not have to be geographically separated to satisfy the requirements of the evaluation exercises.
The following exercises are covered in this document:
The following checklist can be used to track the progress of the evaluation at a high level. The sections after the checklist provide more details on each exercise, including recommendations, documentation references, VMware Knowledge Base articles, and other resources. This document does not contain detailed, step-by-step instructions for completing the tasks in each exercise. These instructions are documented in items such as the Site Recovery Manager documentation. In most cases, one exercise is dependent on another one. For example, a recovery plan cannot be created until at least one protection group is created. Perform the exercises in the order documented in this guide.
SUCCESS CRITERIA | RESULT |
Sites paired | |
Inventory mappings configured | |
Placeholder datastores defined | |
Array managers added and enabled (if using array replication) | |
Protection group created | |
Recovery plan created | |
Test a recovery plan | |
Run a recovery plan | |
Reprotect a recovery plan | |
Run a reprotected recovery plan (fail-back) | |
Customized virtual machine recovery properties | |
Run a recovery plan with virtual machine customization | |
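Because the exercises build on one another, it can help to track the checklist programmatically. The following Python sketch is a hypothetical tracking aid, not a Site Recovery Manager feature: it lists the success criteria in the documented order and returns the next incomplete item, skipping the array manager step unless array replication is in use.

```python
# Success criteria in the order documented in the checklist above.
ARRAY_STEP = "Array managers added and enabled"

CHECKLIST = [
    "Sites paired",
    "Inventory mappings configured",
    "Placeholder datastores defined",
    ARRAY_STEP,
    "Protection group created",
    "Recovery plan created",
    "Test a recovery plan",
    "Run a recovery plan",
    "Reprotect a recovery plan",
    "Run a reprotected recovery plan (fail-back)",
    "Customized virtual machine recovery properties",
    "Run a recovery plan with virtual machine customization",
]


def next_exercise(completed, using_array_replication=False):
    """Return the first checklist item not yet completed, or None when done."""
    for item in CHECKLIST:
        # The array manager step applies only when array replication is used.
        if item == ARRAY_STEP and not using_array_replication:
            continue
        if item not in completed:
            return item
    return None


done = {
    "Sites paired",
    "Inventory mappings configured",
    "Placeholder datastores defined",
}
print(next_exercise(done))
print(next_exercise(done, using_array_replication=True))
```

With vSphere Replication, the next item after the placeholder datastores is the protection group; with array replication, the array manager step comes first.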
It is assumed that Site Recovery Manager has been installed in both sites, a replication solution has been deployed, and all virtual machines that will be protected by Site Recovery Manager are being replicated.
Site Recovery Manager is managed using vSphere Web Client. During the installation of Site Recovery Manager, a plugin is installed in vSphere Web Client and an icon labeled “Site Recovery” is displayed.
Exercise 2: Configure Inventory Mappings
Inventory mappings consist of three types: Resource mappings, folder mappings, and network mappings. These mappings provide default settings for recovered virtual machines. For example, a mapping can be configured between a network port group named “Production” at the protected site and a network port group named “Production” at the recovery site. As a result of this mapping, virtual machines connected to “Production” at the protected site will, by default, automatically be connected to “Production” at the recovery site.
There is no issue with having a port group at each site with the same name since each site is managed by a separate vCenter Server instance. Having port groups at each site with the same name eases Site Recovery Manager configuration. If port groups at the protected and recovery site have different names, the mappings must be created manually.
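The effect of consistent naming can be sketched in a few lines of Python. This is an illustrative model only and does not call any Site Recovery Manager or vSphere API; the function name and the port group names other than “Production” are hypothetical. It pairs port groups that have the same name at both sites and reports the ones that would need a manually created mapping.

```python
def map_port_groups(protected, recovery):
    """Pair port groups by identical name; return (mapped, needs_manual_mapping)."""
    recovery_names = set(recovery)
    mapped = {name: name for name in protected if name in recovery_names}
    unmapped = [name for name in protected if name not in recovery_names]
    return mapped, unmapped


# Hypothetical port group names at each site.
protected_site = ["Production", "Test", "DMZ-A"]
recovery_site = ["Production", "Test", "DMZ-B"]

mapped, unmapped = map_port_groups(protected_site, recovery_site)
print("Matched by name:", mapped)
print("Require manual mapping:", unmapped)
```

Here “Production” and “Test” match by name, while “DMZ-A” has no identically named counterpart and would have to be mapped by hand.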
Exercise 3: Configure Placeholder Datastore
Site Recovery Manager creates a placeholder virtual machine at the recovery site for every protected virtual machine. Placeholder virtual machines are contained in a datastore and registered with vCenter Server at the recovery site. This datastore is referred to as a “placeholder datastore”. Placeholder virtual machines do not have virtual disks (VMDK files) so they consume minimal storage capacity.
Create a small datastore that is accessible by all hosts at the recovery site for use as a placeholder datastore. Create a similar datastore at the protected site, as well. At least one placeholder datastore is required at each site to utilize the failover and failback functionality in Site Recovery Manager. If you are using array replication, do not configure replication for the placeholder datastores.
It is possible to configure multiple placeholder datastores at each site. Typically, one placeholder datastore at each site is sufficient. Multiple placeholder datastores may be beneficial in larger environments such as a site with multiple vSphere clusters or a shared recovery site.
Exercise 4: Add Array Manager & Enable Array Pair
This exercise is necessary only if using array replication. If you are using vSphere Replication, this exercise can be skipped.
When using array replication, a storage replication adapter is required for the specific array replication solution to be used with Site Recovery Manager. Storage replication adapters are software components that are produced and supported by the array replication vendors. The Site Recovery Manager compatibility guide on VMware’s web site should be used to determine if a storage replication adapter is available for the array replication technology in the evaluation environment. Only storage replication adapters downloaded from vmware.com should be used to ensure compatibility and support.
Recommendation: Use vSphere Replication for this evaluation. While array replication has some advantages over vSphere Replication, it is also more complex to install and configure, and it usually requires additional licensing from the array replication vendor. vSphere Replication has a number of advantages, including simple management using vSphere Web Client, support for virtually any storage supported by vSphere including vSAN, the ability to configure replication on a per-VM basis, and inclusion with vSphere Essentials Plus Kit and higher editions. For a full comparison of array replication and vSphere Replication, see SRM - Array Based Replication vs. vSphere Replication.
See “Configure Array Managers” in the Site Recovery Manager documentation for guidance when working with array managers. Thoroughly read the documentation - release notes, installation guide, etc. - that is typically included with a storage replication adapter. Most storage replication adapters have specific requirements that are outlined in this documentation, not in the Site Recovery Manager documentation.
After array managers have been successfully configured, paired, and enabled, information about the array replication can be seen in the Site Recovery Manager user interface. This information includes items such as the local and remote array names, local and remote devices, and the direction in which replication is currently occurring.
Read “How Site Recovery Manager Computes Datastore Groups” in the Site Recovery Manager documentation to gain a thorough understanding of how datastore groups are composed when array replication is utilized.
Before a protection group can be created, replication must be configured. If you have not configured array replication or vSphere Replication for the virtual machines that Site Recovery Manager will protect, you will need to do so before proceeding.
Details on deploying and configuring vSphere Replication can be found in the vSphere Replication documentation.
A protection group is a collection of one or more virtual machines that are failed over and failed back as a unit. In many cases, a protection group consists of multiple virtual machines that support a service such as an accounting system. For example, a service might consist of a database server, two application servers, and two web servers. In most cases, it is not beneficial to fail over only part of a service (only one or two of the servers in the example). All five servers would be included in a protection group to enable failover of the service.
Creating a protection group for each application or service also has the benefit of selective testing. With Site Recovery Manager, having a protection group for each application enables non-disruptive, low-risk testing of individual applications. Application owners can test disaster recovery plans, as needed.
Larger environments usually have higher numbers of applications. Creating a protection group for each application in these larger environments may not be practical and might exceed the maximum supported number of protection groups in Site Recovery Manager. Please see Operational Limits for Site Recovery Manager (2105500) for details.
There are other organizational methods to consider when creating protection groups. One is creating a protection group for each business unit - all virtual machines belonging to a specific business unit are placed in a protection group. Another method is grouping virtual machines together by application tier. For example, all database servers in one protection group, all middleware servers in a second protection group, and all client-facing servers in a third protection group. While these approaches have their limitations, they also reduce the number of protection groups to create and manage.
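The organizational methods described above can be compared with a small Python sketch. It is a planning aid under stated assumptions, not a Site Recovery Manager API: the inventory records, attribute names, and virtual machine names are hypothetical. The same inventory is grouped by application, by business unit, and by application tier to show how the number of resulting protection groups differs.

```python
from collections import defaultdict

# Hypothetical inventory; names and attributes are illustrative only.
vms = [
    {"name": "acct-db01", "application": "accounting", "business_unit": "finance", "tier": "database"},
    {"name": "acct-app01", "application": "accounting", "business_unit": "finance", "tier": "middleware"},
    {"name": "acct-web01", "application": "accounting", "business_unit": "finance", "tier": "client-facing"},
    {"name": "pay-db01", "application": "payroll", "business_unit": "finance", "tier": "database"},
    {"name": "pay-web01", "application": "payroll", "business_unit": "finance", "tier": "client-facing"},
]


def group_vms(inventory, key):
    """Group virtual machine names into candidate protection groups by one attribute."""
    groups = defaultdict(list)
    for vm in inventory:
        groups[vm[key]].append(vm["name"])
    return dict(groups)


by_application = group_vms(vms, "application")
by_business_unit = group_vms(vms, "business_unit")
by_tier = group_vms(vms, "tier")

for label, groups in [("application", by_application),
                      ("business unit", by_business_unit),
                      ("tier", by_tier)]:
    print(f"Grouped by {label}: {len(groups)} protection group(s)")
```

For this sample inventory, grouping by application yields two protection groups, grouping by business unit yields one, and grouping by tier yields three. Fewer groups are easier to manage but offer less granular testing and failover.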