Deploy Failover VPC Infrastructure for Disaster Recovery

A Virtual Private Cloud (VPC) is your own isolated network inside AWS. It defines the IP address range, subnets, routing rules, and security controls for the resources you run in the cloud.

In a disaster recovery (DR) scenario, you need a pre-configured VPC in your failover region, known as the Failover VPC, so that when a disaster occurs, you can bring up your recovered servers and services directly into an already prepared network. Without it, you would have to design and create the VPC during the outage, which slows down recovery and increases the risk of errors.

The CloudFormation template described in this article deploys a Failover VPC in a single step, so your DR environment is ready before you need it.

About the Failover VPC template

This CloudFormation template creates isolated (no‑internet) subnets, NAT-backed private subnets, and public subnets across multiple Availability Zones for DR failover. It is intended for scenarios where you need to provision a new VPC rather than configuring one manually, and helps you meet security and compliance requirements such as PCI DSS or HIPAA.

The following are the benefits of using the CloudFormation template:

  • Simplify failover VPC provisioning: Deploy a Failover VPC from a single CloudFormation template instead of manually configuring subnets, NAT gateways, and endpoints.

  • Isolated environments: The VPC separates ‘Test’ and ‘Production’ into distinct environments, so a test failover does not consume bandwidth or affect the stability of your live failover environment.

  • Dedicated NAT gateways: Each environment (Test and Production) gets its own Network Address Translation (NAT) gateway. This avoids a situation where test failover traffic could slow down your production traffic.

  • Private service access (VPC endpoints): Instead of sending sensitive data over the public internet to reach AWS services such as Amazon Simple Storage Service (S3) or Amazon Simple Queue Service (SQS), the template uses private connections within your VPC. This keeps your traffic off the public web, reducing both security risks and data transfer costs.

  • Flexible public and private deployments: You can deploy failover EC2 instances either with or without public IP addresses, depending on whether they need direct internet access.

    • With a public IP: Place instances in public subnets (Prod-IGW, Test-IGW) when they must be reachable from the internet or need direct internet access.

    • Without a public IP: Place instances in private or isolated subnets (Prod-NAT, Test-NAT, Prod-NoInternet, Test-NoInternet) when they should not be directly reachable from the internet.

Prerequisites

Before deploying, ensure you have:

What the template creates

This table summarizes all resources the template deploys.

| Component | Description |
| --- | --- |
| VPC | Single VPC with configurable CIDR (default: 172.31.0.0/16) |
| 6 Subnets | Test-NoInternet, Prod-NoInternet, Test-NAT, Prod-NAT, Test-IGW, Prod-IGW |
| 2 NAT Gateways | One for Test, one for Production |
| S3 Gateway Endpoint | Private S3 access for NoInternet subnets |
| SQS Interface Endpoint | Private SQS access via PrivateLink |
| 4 Route Tables | For public, NAT, and NoInternet subnets |
| Security Group | Restricts SQS endpoint access to VPC CIDR |
| Network ACL | Subnet-level stateless traffic filtering |

Architecture

The following architecture diagram illustrates the failover VPC structure, including its subnets, NAT gateways, VPC endpoints, and route tables.

Subnets (6 total)

| Subnet | Purpose | Use case |
| --- | --- | --- |
| Test-NoInternet | Fully isolated. No internet or NAT. Reaches S3/SQS via VPC endpoints only. | Test failover with no internet access |
| Prod-NoInternet | Fully isolated. No internet or NAT. Reaches S3/SQS via VPC endpoints only. | Production failover with no internet access |
| Test-NAT | Private subnet with outbound internet through NAT Gateway. | Test failover instances that need outbound internet |
| Prod-NAT | Private subnet with outbound internet through NAT Gateway. | Production failover instances that need outbound internet |
| Test-IGW | Public subnet for the Test NAT Gateway and public-facing test resources. | |
| Prod-IGW | Public subnet for the Production NAT Gateway and public-facing resources. | |

NAT Gateways (2 total)

Each environment (Test and Production) gets its own NAT Gateway, placed in that environment's public subnet (Test-IGW or Prod-IGW). Keeping the gateways separate prevents test failover traffic from slowing down your production traffic.

VPC Endpoints (S3 and SQS)

  • S3 Gateway Endpoint: Lets NoInternet subnets access S3 without internet or NAT. Gateway endpoints carry no additional charge, and S3 traffic that uses the endpoint avoids NAT Gateway data processing charges.

  • SQS Interface Endpoint: Lets workloads access SQS via PrivateLink, so traffic stays within the AWS network and does not traverse the public internet.
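
As a rough illustration, the sketch below builds the service names AWS uses for these two endpoint types in a standard commercial Region. The function name and the example Region are assumptions made for this example; the template may construct these values differently.

```python
def endpoint_service_names(region: str) -> dict[str, str]:
    """Return the AWS service names for the S3 and SQS VPC endpoints."""
    return {
        # S3 uses a Gateway endpoint, which is associated with route tables.
        "s3": f"com.amazonaws.{region}.s3",
        # SQS uses an Interface endpoint (PrivateLink), which is placed in subnets.
        "sqs": f"com.amazonaws.{region}.sqs",
    }


# Example Region chosen for illustration only.
names = endpoint_service_names("us-east-1")
print(names["s3"])
print(names["sqs"])
```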

Route Tables (4 total)

The template creates four route tables:

| Route Table | Attached to | Routes to |
| --- | --- | --- |
| PublicRouteTable | Prod-IGW, Test-IGW | Internet Gateway |
| ProdNatRouteTable | Prod-NAT | Production NAT Gateway |
| TestNatRouteTable | Test-NAT | Test NAT Gateway |
| NoInternetRouteTable | Prod-NoInternet, Test-NoInternet | No default route; S3 via VPC endpoint only |
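
To make the routing layout easier to reason about, here is a minimal sketch that models the four route tables as plain data and looks up where a subnet's default route points. The dictionary layout and the function name are illustrative assumptions, not part of the template.

```python
# Route table layout as described in this section.
ROUTE_TABLES = {
    "PublicRouteTable": {
        "subnets": ["Prod-IGW", "Test-IGW"],
        "default_route": "Internet Gateway",
    },
    "ProdNatRouteTable": {
        "subnets": ["Prod-NAT"],
        "default_route": "Production NAT Gateway",
    },
    "TestNatRouteTable": {
        "subnets": ["Test-NAT"],
        "default_route": "Test NAT Gateway",
    },
    "NoInternetRouteTable": {
        "subnets": ["Prod-NoInternet", "Test-NoInternet"],
        "default_route": None,  # no default route; S3 via VPC endpoint only
    },
}


def default_route_for(subnet: str):
    """Return the default-route target for a subnet, or None if it has none."""
    for table in ROUTE_TABLES.values():
        if subnet in table["subnets"]:
            return table["default_route"]
    raise KeyError(f"unknown subnet: {subnet}")


print(default_route_for("Test-NAT"))
print(default_route_for("Prod-NoInternet"))
```

A result of None means the subnet has no path to the internet at all, which is the intended behavior for the NoInternet subnets.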

Security Group

Allows only TCP/443 from the VPC CIDR to reach the SQS endpoint.
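
The effect of this rule can be sketched in a few lines. The example below only models the rule's logic using the default VPC CIDR; the function name is hypothetical, and it does not query or modify a real security group.

```python
import ipaddress

# Default VPC CIDR from the template parameters; adjust if you changed it.
VPC_CIDR = ipaddress.ip_network("172.31.0.0/16")


def sqs_endpoint_allows(source_ip: str, protocol: str, port: int) -> bool:
    """Model the ingress rule: TCP/443 from inside the VPC CIDR only."""
    in_vpc = ipaddress.ip_address(source_ip) in VPC_CIDR
    return in_vpc and protocol.lower() == "tcp" and port == 443


print(sqs_endpoint_allows("172.31.4.10", "tcp", 443))  # inside the VPC, HTTPS
print(sqs_endpoint_allows("10.0.0.5", "tcp", 443))     # outside the VPC CIDR
```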

Network ACL (NACL)

A subnet-level ACL for stateless traffic filtering. The template uses the default NACL rules, which allow all inbound and outbound traffic. You can tighten these rules after deployment based on your security requirements.

Public or private failover instances?

Decide this before deployment, as it determines which subnets your failover instances will use.

With a public IP (internet-facing)

  • Use Prod-IGW or Test-IGW subnets.

  • Instances receive public IPs and are reachable from the internet.

  • Suitable for web servers, load balancers, or other public-facing workloads.

Without a public IP (private only)

  • Use Prod-NAT, Test-NAT, Prod-NoInternet, or Test-NoInternet subnets.

  • Instances have only private IPs and are not directly reachable from the internet.

    • Prod-NAT / Test-NAT — outbound internet through NAT Gateway.

    • Prod-NoInternet / Test-NoInternet — no internet; S3/SQS through VPC endpoints only.
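
The decision above can be summarized as a small helper. This is only a sketch of the selection logic; the function and its parameter names are assumptions made for illustration.

```python
def pick_subnet(environment: str, public_ip: bool, outbound_internet: bool) -> str:
    """Pick a subnet name from the environment and connectivity needs."""
    if environment not in ("Prod", "Test"):
        raise ValueError("environment must be 'Prod' or 'Test'")
    if public_ip:
        suffix = "IGW"         # public subnet, reachable from the internet
    elif outbound_internet:
        suffix = "NAT"         # private subnet, outbound only through NAT Gateway
    else:
        suffix = "NoInternet"  # isolated; S3/SQS through VPC endpoints only
    return f"{environment}-{suffix}"


print(pick_subnet("Prod", public_ip=True, outbound_internet=True))
print(pick_subnet("Test", public_ip=False, outbound_internet=True))
print(pick_subnet("Test", public_ip=False, outbound_internet=False))
```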

IAM permissions

To deploy the stack, the IAM user or role needs at least the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudFormationAccess",
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateStack",
        "cloudformation:UpdateStack",
        "cloudformation:DeleteStack",
        "cloudformation:Describe*",
        "cloudformation:GetTemplate"
      ],
      "Resource": "*"
    },
    {
      "Sid": "EC2NetworkingCore",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateVpc",
        "ec2:DeleteVpc",
        "ec2:ModifyVpcAttribute",
        "ec2:CreateSubnet",
        "ec2:DeleteSubnet",
        "ec2:CreateInternetGateway",
        "ec2:DeleteInternetGateway",
        "ec2:AttachInternetGateway",
        "ec2:DetachInternetGateway",
        "ec2:CreateRouteTable",
        "ec2:DeleteRouteTable",
        "ec2:CreateRoute",
        "ec2:DeleteRoute",
        "ec2:AssociateRouteTable",
        "ec2:DisassociateRouteTable",
        "ec2:AllocateAddress",
        "ec2:ReleaseAddress",
        "ec2:CreateNatGateway",
        "ec2:DeleteNatGateway",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteVpcEndpoints",
        "ec2:ModifyVpcEndpoint",
        "ec2:CreateSecurityGroup",
        "ec2:DeleteSecurityGroup",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:CreateNetworkAcl",
        "ec2:DeleteNetworkAcl",
        "ec2:CreateNetworkAclEntry",
        "ec2:DeleteNetworkAclEntry",
        "ec2:CreateTags",
        "ec2:DeleteTags",
        "ec2:Describe*"
      ],
      "Resource": "*"
    }
  ]
}
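
If you want a quick sanity check that an existing policy document covers the actions you need, a sketch along these lines may help. It matches action names against the wildcard patterns in Allow statements only; it deliberately ignores Deny statements, Resource, and Condition elements, so treat it as a rough pre-check rather than a substitute for IAM policy evaluation. The function names and the abbreviated sample policy are assumptions made for this example.

```python
import json
from fnmatch import fnmatchcase


def allowed_actions(policy: dict) -> list[str]:
    """Collect action patterns from every Allow statement."""
    patterns = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        # "Action" may be a single string or a list of strings.
        patterns.extend([actions] if isinstance(actions, str) else actions)
    return patterns


def missing_actions(policy: dict, required: list[str]) -> list[str]:
    """Return the required actions that no Allow pattern matches."""
    # IAM action names are case-insensitive, so compare in lowercase.
    patterns = [p.lower() for p in allowed_actions(policy)]
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Abbreviated sample policy, for illustration only.
sample = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["cloudformation:CreateStack", "ec2:CreateVpc", "ec2:Describe*"],
     "Resource": "*"}
  ]
}
""")

print(missing_actions(sample, ["ec2:CreateVpc", "ec2:DescribeSubnets", "ec2:CreateNatGateway"]))
```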

Deployment steps

Perform the following:

  1. In the AWS Console, go to CloudFormation > Create stack > With new resources.

  2. Select Upload a template file, and upload the dr-failover-vpc.json file downloaded from https://downloads.druva.com/phoenix/DR/Failover-VPC-CloudFormation/dr-failover-vpc.json.

  3. Enter a Stack name (for example, dr-failover-vpc-infrastructure).

  4. Set the following parameters, and click Create stack. Wait 5–7 minutes for the stack creation to finish.

| Parameter | Description | Default |
| --- | --- | --- |
| VPCCidr | VPC CIDR block | 172.31.0.0/16 |
| TestNoInternetSubnetCidr | Test NoInternet subnet | 172.31.0.0/24 |
| ProdIGWSubnetCidr | Production public subnet | 172.31.1.0/24 |
| TestIGWSubnetCidr | Test public subnet | 172.31.2.0/24 |
| ProdNoInternetSubnetCidr | Production NoInternet subnet | 172.31.3.0/24 |
| TestNATSubnetCidr | Test NAT subnet | 172.31.4.0/24 |
| ProdNATSubnetCidr | Production NAT subnet | 172.31.5.0/24 |
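
If you change any of the default CIDR values, it can be worth checking that every subnet still fits inside the VPC CIDR and that no two subnets overlap before you create the stack. The following is a minimal sketch using Python's standard ipaddress module; the helper names are assumptions made for this example, and the final helper simply formats the values in the ParameterKey/ParameterValue shape that the CloudFormation API and CLI accept.

```python
import ipaddress

# Defaults from the parameter table in this section.
PARAMS = {
    "VPCCidr": "172.31.0.0/16",
    "TestNoInternetSubnetCidr": "172.31.0.0/24",
    "ProdIGWSubnetCidr": "172.31.1.0/24",
    "TestIGWSubnetCidr": "172.31.2.0/24",
    "ProdNoInternetSubnetCidr": "172.31.3.0/24",
    "TestNATSubnetCidr": "172.31.4.0/24",
    "ProdNATSubnetCidr": "172.31.5.0/24",
}


def check_cidrs(params: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(params["VPCCidr"])
    subnets = {
        name: ipaddress.ip_network(cidr)
        for name, cidr in params.items()
        if name != "VPCCidr"
    }
    problems = []
    # Every subnet must sit inside the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC CIDR {vpc}")
    # No two subnets may overlap.
    names = list(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


def to_cfn_parameters(params: dict[str, str]) -> list[dict[str, str]]:
    """Format values as a CloudFormation Parameters list."""
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in params.items()]


print(check_cidrs(PARAMS))
```

With the defaults, the check reports no problems. A malformed CIDR (for example, one with host bits set) raises a ValueError from ip_network instead of being reported in the list.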

Steps after deployment

Perform the following:

  1. Go to CloudFormation > Stacks, select your stack, and open the Outputs tab.

  2. Copy the following output values. You need them for the DR plan network mapping in Druva Phoenix:

    1. VpcId: the Failover VPC to map in the DR plan

    2. TestNATSubnetId, ProdNATSubnetId: for failover instances

    3. TestNoInternetSubnetId, ProdNoInternetSubnetId: for isolated workloads

  3. In Druva Phoenix, create or edit your DR plan and map these VPC and subnet IDs under Network mappings.
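
If you prefer to read the outputs programmatically rather than from the console, a sketch like the following could work. It assumes boto3 is installed and AWS credentials are configured; the helper names and the placeholder IDs in the sample response are hypothetical. The first helper only reshapes the response returned by the CloudFormation describe_stacks call, so you can try it without AWS access.

```python
def outputs_to_dict(describe_stacks_response: dict) -> dict[str, str]:
    """Flatten the Outputs list of the first stack into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def fetch_outputs(stack_name: str) -> dict[str, str]:
    """Query CloudFormation for the stack outputs (needs boto3 and credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("cloudformation")
    return outputs_to_dict(client.describe_stacks(StackName=stack_name))


# Sample response shape with placeholder IDs, for illustration only.
sample = {
    "Stacks": [
        {
            "Outputs": [
                {"OutputKey": "VpcId", "OutputValue": "vpc-EXAMPLE"},
                {"OutputKey": "ProdNATSubnetId", "OutputValue": "subnet-EXAMPLE"},
            ]
        }
    ]
}
print(outputs_to_dict(sample)["VpcId"])
```

Against a deployed stack, you would call fetch_outputs with the stack name you entered during deployment.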
