Best Practices for Disaster Recovery as a Service

This article describes the best practices that you must follow while using Disaster Recovery.

Druva AWS Proxy
DR restore
DR Failover
Failback
Billable AWS services

Druva AWS proxy

The Druva AWS proxy, also referred to as the DR proxy, is an EC2 instance that runs in the customer’s AWS account. It runs the Disaster Recovery service and orchestrates DR restore, DR failover, and DR failback. The DR proxy is deployed using an AWS CloudFormation template, and the deployment takes less than 10 minutes.

Druva recommends that you deploy at least two DR proxies in separate availability zones for high availability.

📝 Note
Each DR proxy can run three DR restore jobs concurrently.
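The two figures above (at least two proxies for high availability, three concurrent DR restore jobs per proxy) can be combined into a rough sizing rule. The following is a minimal sketch, not a Druva tool; the function name and the idea of sizing by peak concurrent restores are assumptions made for illustration.

```python
import math

# From the note above: each DR proxy runs three DR restore jobs concurrently.
JOBS_PER_PROXY = 3
# From the recommendation above: at least two proxies, in separate availability zones.
MIN_PROXIES_FOR_HA = 2


def proxies_needed(concurrent_restores: int) -> int:
    """Estimate how many DR proxies cover a given number of concurrent DR restores.

    Hypothetical helper: it only applies the two figures stated in this article.
    """
    if concurrent_restores < 0:
        raise ValueError("concurrent_restores must not be negative")
    by_capacity = math.ceil(concurrent_restores / JOBS_PER_PROXY)
    return max(by_capacity, MIN_PROXIES_FOR_HA)
```

For example, seven concurrent restores need three proxies, while six or fewer are covered by the recommended minimum of two.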

The recommended EC2 instance size for the Druva AWS proxy is c5.2xlarge.

Instance type	vCPU	Memory (GiB)	Instance Storage (GiB)	Network Bandwidth (Gbps)	EBS Bandwidth (Mbps)
c5.2xlarge	8	16	EBS-Only	Up to 10	Up to 4,750

The DR proxy must have access to the following services:
- S3,
- EC2-API, and
- SQS

The Druva CloudFormation template creates endpoints that provide connectivity to these services over the AWS private network.

Ensure that the EC2 key pair assigned to the Druva AWS proxy is stored in a secure location. The key pair is used to access the Druva AWS proxy for troubleshooting only.
The Druva services should be available in the availability zone for the subnet that you intend to select during the Druva AWS proxy deployment. See Prerequisite check Druva AWS proxy VPC and subnets and AWS Proxy deployment fails due to AWS Availability Zone limitation for details.
To avoid throttling the PutSnapshotBlock API, you need to set a minimum required quota, depending on your proxy version. See Deploying the Druva AWS proxy for details.

VPC

While defining network mappings in a DR plan, you must map the vCenter source network to a VPC and subnet in the target AWS account.

If you create a new Amazon VPC, you don’t need to attach an Internet Gateway (IGW) to it, as the Druva AWS proxy uses AWS PrivateLink for all communication.
Ensure that DNS hostnames and DNS resolution are enabled within the VPC.

📝 Note
The Druva AWS proxy can be deployed in a customer VPC that has DNS resolution enabled and provided by an Amazon-owned DNS server. In situations where DNS resolution is disabled or provided by a third-party-owned DNS server, we recommend deploying a new VPC dedicated to the Druva AWS proxy. This newly deployed VPC must have:

DNS resolution enabled.
The DNS server configured in the DHCP options set to an Amazon-provided DNS server, not a custom DNS server.
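The two VPC DNS settings mentioned in this section can be read through the EC2 DescribeVpcAttribute API. The following is a minimal sketch that assumes a boto3 EC2 client is passed in; the function name is hypothetical, and the sketch does not inspect the DHCP options set.

```python
def vpc_dns_ready(ec2_client, vpc_id: str) -> bool:
    """Return True when both DNS resolution and DNS hostnames are enabled on the VPC.

    ec2_client is expected to behave like boto3.client("ec2").
    DescribeVpcAttribute accepts one attribute per call, so two calls are made.
    """
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )
    return bool(
        support["EnableDnsSupport"]["Value"]
        and hostnames["EnableDnsHostnames"]["Value"]
    )
```

A typical call would look like vpc_dns_ready(boto3.client("ec2"), vpc_id), where vpc_id is the ID of the VPC you plan to select for the Druva AWS proxy.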

DR Failover Checks - Guest OS

The DR Failover Checks - Guest OS run while the VM backup is in progress and ensure that the VM meets all the DR failover and failback requirements. All the DR Failover Checks - Guest OS must pass for a DR failback or failover to succeed.

When the DR Failover Checks - Guest OS checks do not execute

The DR Failover Checks - Guest OS may not execute at all for one or more of the following reasons:

The VMware Backup proxy is unable to communicate with the ESX host on port 443. Enable communication between the backup proxy and the ESX host on port 443.
The VMware Backup proxy is on a version older than 4.8.11. Upgrade the VMware backup proxy to the latest version, and ensure that the first VM backup after the proxy upgrade is successful.
The VM cannot connect to the Druva download portal at https://downloads.druva.com/phoenix/ to download the DR Failover Checks - Guest OS executables while the VM backup is in progress. In this case, ensure that:
- The URL https://downloads.druva.com/phoenix/ or *.druva.com is allowed through the network firewall.
- If the VM in question is a Windows VM, disable UAC on the VM.
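The version condition in the list above (a VMware backup proxy older than 4.8.11) is easy to get wrong with plain string comparison, because "4.8.9" sorts after "4.8.11" as text. The following is a minimal sketch of a numeric comparison; it assumes the proxy version is a dotted string of integers, and the function names are hypothetical.

```python
# Minimum VMware backup proxy version named in this article.
MIN_PROXY_VERSION = (4, 8, 11)


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '4.8.11' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def proxy_supports_guest_os_checks(version: str) -> bool:
    """Return True when the proxy version is 4.8.11 or later."""
    return parse_version(version) >= MIN_PROXY_VERSION
```

Tuples compare element by element, so 4.8.9 is correctly treated as older than 4.8.11.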

Exclude the DR Failover Checks - Guest OS executables from any antivirus software running on the VM. The following table lists the DR Failover Checks - Guest OS executables that must be excluded depending upon the VM operating system.

Operating system	Prerequisite check executable
Windows	`PhoenixPreflight_<version number>.exe`
Linux	`PhoenixPreflight_<version number>`

Resolving DR Failover Checks - Guest OS errors

If the DR Failover Checks - Guest OS fail or pass with warnings, resolve the errors or warnings before re-running the backup job.

Resolve Linux prerequisite check errors

Resolve Windows prerequisite check errors

1. Credentials: Ensure that the VMs for which you want to perform disaster recovery have credentials assigned to them. If credentials are not assigned to the virtual machines or are invalid, Druva does not perform the prerequisite checks. You can assign credentials to the VMs from either the VMware page or the Disaster Recovery page.
The user account must have the following privileges:

Windows virtual machines

The account must have local administrative privileges.
UAC must be disabled on the virtual machine. See disabling UAC on Windows server for more information.

Linux virtual machines

A non-root user must have sudo rights and must have the NOPASSWD: ALL tag enabled in the sudoers file. Edit the sudoers file and ensure that the non-root user has the following entry at the end:

username ALL=(ALL) NOPASSWD: ALL

Where username is the name of the user that can execute all commands without being prompted for a password.
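To confirm that the entry shown above is present, the sudoers content can be scanned for a matching line. The following is a minimal sketch, assuming you already have the sudoers text as a string; the function name is hypothetical, and the check looks only for the exact form shown above. It does not evaluate groups, aliases, included files, or the order of entries.

```python
import re


def has_nopasswd_all(sudoers_text: str, username: str) -> bool:
    """Return True if an uncommented 'username ALL=(ALL) NOPASSWD: ALL' line exists.

    The optional ':ALL' inside the parentheses covers the common (ALL:ALL) variant.
    Commented lines do not match because the line must start with the username.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(username)}\s+ALL\s*=\s*\(ALL(?::ALL)?\)\s*NOPASSWD:\s*ALL\s*$",
        re.MULTILINE,
    )
    return pattern.search(sudoers_text) is not None
```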

Verifying permissions

Run the sudo -l command. If the user has sudo privileges and the NOPASSWD: ALL tag is enabled in the sudoers file, the command lists the user's sudo privileges without prompting for a password.

If the user does not have sudo privileges, or the NOPASSWD: ALL tag is not enabled in the sudoers file, the command prompts for a password.

The directory /home/{username} must exist, and the non-root user must have read, write, and execute (RWX) permissions over this directory.
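The directory requirement above can be verified from the guest. The following is a minimal sketch, assuming it runs as the non-root user whose credentials are assigned to the VM; the function name is hypothetical.

```python
import os


def home_dir_ok(path: str) -> bool:
    """Return True if path is an existing directory that the current user
    can read, write, and execute (traverse).

    os.access evaluates permissions for the user running the script, so run
    this as the non-root user rather than as root.
    """
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)
```

For a user named username, the path to check is /home/username.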

While a VM backup is in progress, the prerequisite checks use the working directory /home/{username}/Druva/Phoenix/Preflight for non-root users and the directory /home/{PreflightBinaryName}/Druva/Phoenix/Preflight for root users. Once the prerequisite checks are complete, Druva deletes the directories that it created under /home/{username} for non-root users or /home/{PreflightBinaryName} for root users.

2. Virtual Machines

The VM must be running for the prerequisite check to work.
The VM must have VMware tools installed on it.
The VM must have at least 1 GB of free space on the boot partition.
Ensure that all Druva processes are whitelisted in any antivirus software running on the virtual machine.

Here are all the 14 Windows files that must be whitelisted:

C:\Windows\System32\systeminfo.exe

C:\Druva\Vmtools\Ec2Install\Ec2Install.exe

C:\Druva\Vmtools\Citrix_xensetup.exe

C:\Druva\Vmtools\dotnetfx45.exe

C:\Druva\Vmtools\AWSPVDriverSetup8.2.1.msi

C:\Druva\Vmtools\dotNetFx40_Full_x86_x64.exe

C:\Druva\Vmtools\Ec2Install\AmazonSSMAgentSetup.exe

C:\Druva\Vmtools\XenGuestAgent.exe

C:\Druva\Vmtools\wic_x86_enu.exe

C:\Druva\Vmtools\wic_x64_enu.exe

C:\Druva\Vmtools\WiXEC2ConfigSetup_64.msi

C:\Druva\Model\cli.exe

C:\Druva\Model\run_model.bat

C:\Druva\Service\rmservice.exe

Here are all the Linux files that must be whitelisted. The /opt/druva files are installed by Druva as part of the DR Failover operation.

/opt/druva/rm_startup.sh

/opt/druva/cli

/opt/druva/run_model.sh

/opt/druva/upload_logs.sh

/etc/rc.local

/etc/init.d/after.local
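The free-space requirement listed for virtual machines earlier in this section (at least 1 GB on the boot partition) can be checked from inside the guest. The following is a minimal sketch; reading 1 GB as 1 GiB and the choice of path are assumptions, and the function name is hypothetical.

```python
import shutil

# Assumption: "1 GB" is read as 1 GiB (1024**3 bytes).
MIN_FREE_BYTES = 1024 ** 3


def has_enough_free_space(path: str, minimum: int = MIN_FREE_BYTES) -> bool:
    """Return True if the file system that holds path has at least `minimum` bytes free."""
    return shutil.disk_usage(path).free >= minimum
```

Pass a path that lives on the boot partition of the VM, for example the system drive on Windows or the boot file system on Linux.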

Add virtual machines to DR plan

A DR plan includes a group of virtual machines, the DR restore frequency and all the disaster recovery settings that help you perform a single click failover.

When a VM is added to a DR plan, Druva automatically assigns a few default failover settings. The default settings are:
1. instance_type = t2.medium
2. public_ip = None
3. private_ip = Auto Assign
  These settings can be used to spin up the VM from the DR copy in case of a failover. You can update these settings based on source VM configuration for optimum failover times.
While configuring failover settings for VMs added to the DR plan, ensure that the instance type is not smaller than the virtual machine that you are trying to fail over. You can also use the auto-suggest instance type feature to let Druva choose the appropriate instance type.
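The "not smaller than the source VM" rule above can be expressed as a comparison of vCPU count and memory. The following is a minimal sketch and not how the auto-suggest feature works; the function name and the small catalog are illustrative assumptions. The c5.2xlarge figures come from the table earlier in this article, and the t2 figures should be verified against the current AWS instance type documentation.

```python
# Illustrative catalog: instance type -> (vCPUs, memory in GiB).
INSTANCE_SPECS = {
    "t2.medium": (2, 4),
    "t2.large": (2, 8),
    "t2.xlarge": (4, 16),
    "c5.2xlarge": (8, 16),
}


def is_large_enough(instance_type: str, vm_vcpus: int, vm_memory_gib: float) -> bool:
    """Return True if the instance type has at least the vCPUs and memory of the source VM."""
    vcpus, memory_gib = INSTANCE_SPECS[instance_type]
    return vcpus >= vm_vcpus and memory_gib >= vm_memory_gib
```

With this catalog, the default t2.medium is large enough for a source VM with 2 vCPUs and 4 GiB of memory, but not for one with 4 vCPUs and 8 GiB.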

📝 Note

We've discontinued support for t2.micro and t2.small EC2 instance types for DR failovers. These instance types are not available for manual instance type selection or instance auto-assignment.
Ensure that the Recovery Point Actual (RPA) does not exceed the backup frequency duration. RPA is the time elapsed since the last successful VM recovery point that is available for failover. For more information, see Managing Recovery Point Actual.
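The RPA rule above is a comparison of two durations. The following is a minimal sketch using the definition given in this section (time elapsed since the last successful recovery point available for failover); the function names are hypothetical, and the timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def recovery_point_actual(last_recovery_point: datetime, now: datetime = None) -> timedelta:
    """Time elapsed since the last successful recovery point available for failover."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_recovery_point


def rpa_within_backup_frequency(
    last_recovery_point: datetime, backup_frequency: timedelta, now: datetime = None
) -> bool:
    """Return True when the RPA does not exceed the backup frequency duration."""
    return recovery_point_actual(last_recovery_point, now) <= backup_frequency
```

For example, with a daily backup frequency, a recovery point that is 12 hours old is within bounds, while one that is 30 hours old is not.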

DR restore

DR restore (also referred to as DR copy) is the process where the Druva AWS proxy reads the VM backup data from Druva Cloud, replicates it to an EBS volume in the customer's AWS account, and creates an EBS snapshot of the EBS volume. The frequency with which the data is replicated is defined in the DR plan.

Ensure that the retention period for backups of large virtual machines is longer than the time it can take to create the first full DR copy, that is, to transfer the VM backup data from Druva Cloud to the customer's AWS account. The first DR restore copies the full VM data and therefore takes the longest. Subsequent DR restores are incremental and faster.
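The retention guidance above comes down to simple arithmetic: estimate how long the first full DR copy takes and compare it with the retention period. The following is a minimal sketch; the throughput value is a placeholder that you must measure in your own environment, since this article does not state one, and the function names are hypothetical.

```python
def first_copy_hours(vm_size_gib: float, throughput_mib_per_s: float) -> float:
    """Estimate the hours needed to transfer a full copy at a sustained throughput."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = vm_size_gib * 1024 / throughput_mib_per_s
    return seconds / 3600


def retention_is_sufficient(
    retention_days: float, vm_size_gib: float, throughput_mib_per_s: float
) -> bool:
    """Return True when the retention period is longer than the estimated first copy time."""
    return retention_days * 24 > first_copy_hours(vm_size_gib, throughput_mib_per_s)
```

The estimate ignores snapshot creation time and any queuing behind other DR restore jobs, so treat it as a lower bound.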

DR Failover

Failover is the process where the DR proxy creates an EC2 instance in the customer’s AWS account, creates an EBS volume from the EBS snapshot, attaches it to the EC2 instance, and finally spins up the instance after redirecting the network traffic to the IP addresses of the EC2 servers. On average, a Linux VM failover takes between 15 and 30 minutes, while a Windows VM failover takes between 45 and 75 minutes. A failover can complete within these times provided that the EC2 instance type spawned from the EBS snapshot is the same type and size as the source virtual machine.

Ensure that the DR failover checks are successful before initiating a production or test failover. The DR failover checks preemptively flag issues that can cause the failover jobs to fail. Fixing the identified issues proactively helps ensure that your actual failovers succeed. For more information, see DR failover checks - environment and DR failover checks - Guest OS.

Test Failover

Druva recommends using the Test Failover option to periodically test VM failovers. You specify the production and test failover settings while creating the DR plan. As part of Test Failover Settings, you specify the instance type, the IAM role, Volume Type and Instance Tags. You can also use the same failover settings as used in Production.

On the Disaster Recovery page, select the DR Plan. On the Overview Page, click Failover > Test Failover. For more information, see Manage disaster recovery failover.

Failback

When you initiate a DR failback, the VMware backup proxy creates a target VM in the on-premises infrastructure. This target VM connects to the failed-over EC2 instance and copies the data onto itself. Druva then boots up this VM.

Ensure that the target virtual machine in your on-premises environment to which you will fail back has connectivity to the EC2 instance.
Ensure that the target virtual machine in your on-premise environment used for failback is reachable from the VMware backup proxy.
Ensure that the following ports are open on the target virtual machine:
- Linux: Port 22 for SSH
- Windows: Ports 445 (Used for preflight checks and control messaging) and 50000 (Used for actual data transfer in failback operation).
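The port requirements above can be probed with a plain TCP connection test. The following is a minimal sketch, assuming it runs on a machine that needs to reach the target virtual machine; the function name is hypothetical. A successful TCP connection shows only that the port accepts connections, not that SMB or SSH authentication will succeed.

```python
import socket

# Ports listed above, keyed by the operating system of the target VM.
FAILBACK_PORTS = {
    "linux": (22,),
    "windows": (445, 50000),
}


def unreachable_ports(host: str, ports, timeout: float = 3.0) -> list:
    """Return the subset of ports on host that do not accept a TCP connection."""
    failed = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append(port)
    return failed
```

For a Windows target, call unreachable_ports(host, FAILBACK_PORTS["windows"]); an empty list means that both ports accepted a connection.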

📝 Note
You must manually enable the SMB port for communication. See DR8263.

Ensure that the administrative shares of the source EC2 instance are reachable before attempting a failback. For more information, see error DR8263 and its resolution.

Before initiating a production DR failback job, we recommend running the DR Failback Checks to ensure that your AWS environment and the destination VMware environment do not have any issues that can cause the production DR failback jobs to fail. For more information, see DR Failback Checks.

Billable AWS services

The following AWS services are deployed in your AWS account during the Druva AWS proxy deployment and are billable.

The Amazon EC2 instance type (c5.2xlarge - recommended) used for the Druva AWS proxy.
The following AWS VPC endpoints that are configured as part of proxy deployment:
1. Druva Backup Service Endpoint
2. Druva Node Service Endpoint
3. S3 Endpoint
4. SQS Endpoint
5. EC2 Endpoint
6. CloudFormation Endpoint
7. EBS Endpoint
8. Lambda Endpoint
9. Logs Endpoint

The AWS service costs are to be paid to AWS. For more information on the service costs, refer to Amazon EC2 pricing and AWS PrivateLink pricing.
