vSphere 5.1 Announced with Enhanced vSphere Replication

August 27, 2012 by Gregg Robertson 4 Comments

vSphere Replication

vSphere Replication (VR) is the industry’s first and only genuinely hypervisor-level replication engine.

It is a feature first introduced with Site Recovery Manager 5.0 to allow for the vSphere platform to protect virtual machines natively by copying their disk files to another location where they are ready to be recovered.

VR is a software based replication engine that works at the host level rather than the array level.

Identical hardware is not required between sites, and in fact customers can run their VMs on any type of storage they choose at their site – even local storage on the vSphere hosts, and VR will still work.

It provides simple and cost-efficient replication of applications to a failover site

VR is a component delivered with vSphere editions of Essentials Plus and above, and also comes bundled with Site Recovery Manager. This offers protection and simple recoverability to the vast majority of VMware customers without extra cost.

•With VR, a virtual machine is replicated by components of the hypervisor, removing any dependency on the underlying storage, and without the need for storage-level replication.

•VMs can be replicated between *any* type of storage platform: Replicate between VMFS and NFS, from iSCSI to local disk. Because VR works above the storage layer it can replicated independently of the file systems. (It will not, however, work with physical RDMs.)

•Replication is controlled as a property of the VM itself and its VMDKs, eliminating the need to configure storage any differently or to impose constraints on storage layout or management. If the VM is changed or migrated then the policy for replication will follow the VM.

•VR creates a “shadow VM” at the recovery side, then populates the VM’s data through replication of changed data.

•While VR can be deployed through the “thick client” all management and interaction with VR is done strictly through the vCenter 5.1 web interface.

•Only vSphere 5.0 and 5.1 will work for vSphere Replication as the VR Agent is a component of the vSphere 5.x hypervisor.

•vSphere Replication can not co-exist with the vSphere Replication pieces originally shipped with SRM 5.0. If an existing SRM 5.0 vSphere Replication environment is in place it will need to be uninstalled and replaced with the standalone vSphere Replication from vSphere 5.1.

•While both Storage DRS and sVmotion are supported, they will cause certain scenarios to be aware of

•While Storage vMotion of a VR protected VM can be done by an administrator, on vSphere 5.0 this may create a “full sync” scenario in which a VM must be completely resynchronized between source and destination, possibly violating the configured recovery point objective for that VM.

•Storage DRS compounds this problem by automating storage vMotion, and thereby may potentially cause the protected virtual machines to create continual full sync scenarios, driving up I/O on the storage, thereby creating cyclical storage DRS events. Because of this it is unsupported with 5.0.

•Storage vMotion and SDRS are only able to be run on the *protected* VM and can not execute against the *replica* of the VM.

•When using vSphere Replication with Site Recovery Manager, storage vMotion and storage DRS are *not supported*

•Neither of these scenarios is true with vSphere 5.1 as the persistent state file that contains current replication data is migrated along with the rest of the VM, which did not occur in vSphere 5.0.

vSphere Replication is not “new” as it has more than a year-long track record of success with Site Recovery Manager.

VR is a non-disruptive technology: It does not use vSphere file-system snapshots nor impact the execution of the VM in any abnormal way.

Since VR tracks changes at a sub-VM level, but above the file system, it is completely transparent to the VM unless Microsoft Volume Snapshot Service is being used to make the VM quiescent. Even then VR uses fully standard VSS calls to the Microsoft operating system.

Virtual machines can be replicated irrespective of underlying storage type • Can use local disk, SAN, NFS, and VSA
• Enables replication between heterogeneous datastores
• Replication is managed as a property of a virtual machine

• Efficient replication minimizes impact on VM workloads

vSphere Replication Use Cases

Protecting VMs within a site, between sites, or to and from remote and branch offices.

Can use dissimilar storage, low cost NAS Appliances, even independent vSphere hosts with only local disk.

VR Deployment

VR is deployed via a standard virtual appliance OVF format.

The OVF contains all the necessary components for VR.

•What used to be both the “VRMS and VRS” in the SRM 5.0 implementation of VR are included in the “VR Appliance” now

•This allows a single appliance to act in both a VR management capacity and as the recipient of changed blocks

•Scaling sites is an easy task, simply deploy another VR Appliance at the target site and it will contain the necessary pieces to either pair and mange replication for a site or simply receive changed blocks as per the VRS

vSphere Replication Limitations

vSphere Replication is targeted at replicating the virtual disks of powered on virtual machines only. It is based on a disk filter to track changes that pass through it, therefore static images can not be tracked.

Powered-off or suspended VMs will not be replicated. The assumption is that if the VM is important enough for protection, it is powered on.

That also means non-disks attached to a VM (ISOs, floppy images, etc) are not replicated. Also any disks, ISOs, or configuration files not associated with a VM will not be replicated.

Files that moreover are not required for the VM to restart (e.g., vswp files or log files) are not replicated by VR.

Since VR works above the disk itself at the virtual device layer, it can be completely independent of specifics about the VMDK it is replicating. VR can replicate to a different format than its primary disk – i.e. you can replicate a thick provisioned disk to be a thin provisioned replica.

VM snapshots in and of themselves are not replicated but instead are collapsed during replication. A VM with snapshots may be configured for protection by VR (and you can take and revert snapshots), but the remote state for such VMs will be “flat” without any snapshots. Snapshots are aggregated into a single VMDK at the recovery location.

Note: Reverting from a snapshot may cause a full sync!

VMs can be replicated with a recovery point objective (RPO) of at most 15 minutes and at least 24 hours. This means that a recovery of replicated VMs will lose at least 15 minutes worth of recent data.

How it works

Fundamentally VR is a handful of virtual appliances that allow the vSphere kernel to identify and replicate changed blocks between sites. The configuration and deployment is a handful of simple steps.

Once the administrator has deployed the components it is a matter of pairing a source and destination.

Lastly, configuration of an individual VM for protection tells VR to start replicating its changes, and where to put them at the recovery location.

Only replicates changed blocks

On an ongoing basis, after the first sync, VR will only ship changed blocks.

Within the RPO defined by the administrator, VR tracks which blocks are being dirtied and will create a “lightweight delta” (LWD) bundle of data to be transferred to the remote site.

Pointers to changed blocks are kept in both a memory bitmap as well as a “persistent state file” (psf) located in the directory of a VM. Memory contents are always current, the PSF file represents the current shipping LWD. After an LWD is shipped and completely acknowledgd, the memory bitmap is copied to the PSF file and the memory bitmap is restarted for the next LWD.

VR will use the defined RPO to determine how often to create a LWD. Time must be allowed to create the block bundle, transfer it, and successfully complete writing the entire bundle to ensure that the RPO is not violated. In order to do this, VR will track the length of the previous 15 transfers to create an estimate of how long it will take to complete the transaction of the subsequent LWD.

For example, if a transfer takes 1 minute to create, 8 minutes to transfer, and 1 minute to write, by the time the data is successfully written the original VM is now 10 minutes old. With, for example, a 1 hour RPO set for a VM, the next transfer would need to take place at least within the next 40 minutes. This presumes 10 minute old data plus the next 10 minute transfer = 20 minutes gone out of the 1 hour RPO to ensure the data at the recovery site is never older than the RPO defined.

If a transfer of a LWD takes more than half the time of the RPO it is very likely that the RPO will be violated based on the incremental “catch up” to the RPO period and it will be flagged as a potential RPO violation.

VR will create a per-host replication schedule by taking into account *all* the VMs being replicated from that particular host. This allows it to do host-wide scheduling for each replicated VMDK and allows transfers to take place according to variables such as length of transfer, size of LWD, etc. and gives the scheduler flexibility to send data when appropriate.

The scheduler will execute each time an event occurs that alters replication patterns, such as a power task on a replicated VM, changes to RPOs or a full sync, or an HA event such as a host crash.

Only the most-recent transfer information is persisted. If hostd crashes, or the VM is migrated, or reconfigured, the historic transfer state is lost, and must be re-accumulated for the scheduler to be most effective.

It is important to note that VR is *not* using vSphere based snapshots to create redo logs of the primary VMDK. The VMDK is not interrupted in any fashion at all, and there is no snapshot created.

It also does NOT use “CBT” or “Changed Block Tracking”, another feature of the vSphere Platform. The vSCSI filter of VR is completely independent of CBT by design. This allows CBT to remain untouched for other tools such as VADP and backup software. If CBT were to be used it would reset the changed block tracking epoch, breaking backups and other uses of CBT.

VR is 100% isolated from snapshots and CBT.

Recovering a VM with a few clicks

A VM can be recovered only if it is not powered on somewhere else or is not reachable by the recovery vCenter Server. This is to avoid having duplicate VMs running at the same time.

For further safety, the VM is booted with no networks connected to help avoid duplicate VMs colliding.

Once the recovery is processed, you can not reconnect and re-enable replication of that VM. You must re-start protection all over again. You may, however, use the old VMDK that might remain at either site as a seed to begin replication again.

Four steps for full recovery

As long as the replication has completed at least once a VM can be recovered quickly and easily directly from the vCenter Web Client.

From the Replication location in the Web Client, choose a VM that has been replicated, right-click and choose to recover.

Choosing a target folder and resource (Cluster, host, or resource pool) will then instantiate the replicated vm, create and register the vmx, attach the VMDK and power-on the VM if chosen.

This can not be automated, and can only be done a single VM at a time.

4 thoughts on “vSphere 5.1 Announced with Enhanced vSphere Replication”

Narendra Kumar KT
November 20, 2013 at 11:24 am

Very Well and Nicely Descriptive Blog about the VR Internals….

Paul M. Field
November 16, 2015 at 6:54 pm

ok so how do you clear an RPO violation once it goes back to OK (RPO violation)

thanks

- Gregg Robertson
  November 17, 2015 at 11:58 am
  
  the alarm is clearable via the alarms tab in the viclient or the alarms page in the web client
  
  - Paul M. Field
    November 17, 2015 at 12:49 pm
    
    only option i get is “reconfigure” or “stop”
    
    nothing in alarms… using 5.5 web client and 5.0 client