This document is intended for architects and people who work in operations and administrative teams. The document describes an example pattern that you can use for your own deployments in Google Cloud.

In this pattern, Cloud DNS directs traffic to Compute Engine instances in managed instance groups that serve the content. In an outage, you update the Cloud DNS zone and fail over to a static site in Cloud Storage.

To complete this tutorial, you need a registered domain name that you control and want to use with this document.

In production deployments, your website likely includes many more files and additional application code on your managed instance group virtual machines (VMs) than is shown in this document. Cloud Storage then hosts a more limited static version that provides minimal functionality. In a warm failover scenario, users see this limited website until the managed instance groups recover and can serve traffic for the full website experience.

In this tutorial, you deploy resources to create an environment as shown in the following image:

When you need to fail over, you update the Cloud DNS configuration to direct traffic to Cloud Storage, as shown in the following image:

This warm failover pattern balances the cost of running another managed instance group in a different region that you only use when the primary region fails. The cost of a static site using Cloud Storage is lower than running another managed instance group, but there's a short delay as you update Cloud DNS between the hosting options. The limited website experience in Cloud Storage is better than a completely unavailable website and poor customer experience.

For an alternative approach that uses external HTTP(S) Load Balancing instead of Cloud DNS to control the failover, see Deploy a warm recoverable web server with Compute Engine and Cloud Storage. This pattern is useful if you don't have, or don't want to use, Cloud DNS.

To run reliable applications in Google Cloud, we recommend that you design your application infrastructure to handle outages. Depending on your application and business needs, you might need a cold failover, warm failover, or hot failover pattern. For more information on how to determine the best approach for your own applications, see the Disaster recovery planning guide.

This document uses a basic Apache web server, but the same approach to the infrastructure deployment applies to other application environments you need to create.

## Objectives

- Create regional managed instance groups with a custom VM image

- Create a Cloud Storage bucket

- Create and configure a Cloud DNS zone

- Test the warm web server failover with updated Cloud DNS records

- Test the recovery and failback with updated Cloud DNS records

## Costs
This tutorial uses billable components of Google Cloud. To generate a cost estimate based on your projected usage, use the pricing calculator.

## Before you begin
Some of the steps in this document might not work correctly if your organization applies constraints to your Google Cloud environment. In that case, you might not be able to complete tasks like creating public IP addresses or service account keys. If you make a request that returns an error about constraints, see how to Develop applications in a constrained Google Cloud environment.

- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

- Enable the Compute Engine API.

- Install and initialize the Google Cloud CLI.

You can run the Google Cloud CLI in the Google Cloud console without installing it. To run the gcloud CLI in the Google Cloud console, use Cloud Shell.

## Prepare the environment
In this section, you define some variables for your resource names and locations. These variables are used by the Google Cloud CLI commands as you deploy the resources.

Throughout this tutorial, unless otherwise noted, you enter all commands in Cloud Shell or your local development environment.

Replace `PROJECT_ID` with your own project ID. If desired, provide your own name suffix for resources to help search for and identify them, such as `app`.

Specify two regions, such as `us-west1` and `us-west2`, and a zone within one of those regions, such as `us-west1-a`. This zone defines where the initial base VM is created that's used to create an image for the managed instance group.

Finally, set a domain that's used for your static website, such as `example.com`.

    PROJECT_ID=PROJECT_ID
    NAME_SUFFIX=app
    REGION1=us-west1
    REGION2=us-west2
    ZONE=us-west1-a
    DOMAIN=example.com

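
Because every later command depends on these variables, it can help to check them before you deploy anything. The following is a minimal sketch, not part of the original tutorial: the function name `check_tutorial_env` is hypothetical, and the zone check assumes that the zone must sit in `REGION1`, because the base VM created later attaches to the `REGION1` subnet.

```shell
# Sketch: verify that the tutorial variables are set and consistent.
# The function name and messages are illustrative assumptions.
check_tutorial_env() {
  missing=0
  for name in PROJECT_ID NAME_SUFFIX REGION1 REGION2 ZONE DOMAIN; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing variable: $name" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  if [ "$REGION1" = "$REGION2" ]; then
    echo "REGION1 and REGION2 must be different regions" >&2
    return 1
  fi

  # A zone name is its region name plus a suffix such as "-a".
  case "$ZONE" in
    "$REGION1"-*) ;;
    *)
      echo "ZONE $ZONE is not in REGION1 ($REGION1)" >&2
      return 1
      ;;
  esac

  echo "Environment OK: network-$NAME_SUFFIX, zone-$NAME_SUFFIX, www.$DOMAIN"
}
```

After you set the variables, run `check_tutorial_env` and continue only if it reports that the environment is OK.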
## Create a VPC and subnet
To provide network access to the VMs, you create a Virtual Private Cloud (VPC) network and subnets. Because you need managed instance groups in two regions, you create one subnet in each region. For more information on the advantages of the custom subnet mode to manage IP address ranges in use in your environment, see Use custom mode VPC networks.

Create the VPC with a custom subnet mode:

    gcloud compute networks create network-$NAME_SUFFIX --subnet-mode=custom

Now create two subnets in the new VPC, one for each region. Define your own address ranges that fit in your network range, and set each one as the `--range` value:

    gcloud compute networks subnets create \
        subnet-$NAME_SUFFIX-$REGION1 \
        --network=network-$NAME_SUFFIX \
        --range= \
        --region=$REGION1

    gcloud compute networks subnets create \
        subnet-$NAME_SUFFIX-$REGION2 \
        --network=network-$NAME_SUFFIX \
        --range= \
        --region=$REGION2

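
Subnets in the same VPC can't use overlapping address ranges, so you might want to check your two ranges before you create the subnets. The following sketch does that check with plain shell arithmetic. The function names are hypothetical, and the ranges in the usage note are illustrative values, not values taken from this document.

```shell
# Convert a dotted IPv4 address, such as 10.1.0.0, to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "start end" (inclusive integers) for a CIDR block such as 10.1.0.0/20.
cidr_bounds() {
  addr=${1%/*}
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  base=$(ip_to_int "$addr")
  start=$(( base / size * size ))   # align to the block boundary
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit status 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  # $1..$2 is the first block, $3..$4 is the second block.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidrs_overlap 10.1.0.0/20 10.2.0.0/20` returns a non-zero status because the blocks are disjoint, while `cidrs_overlap 10.1.0.0/20 10.1.8.0/24` succeeds because the second block sits inside the first.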
## Create firewall rules
To let network traffic flow correctly in the VPC, use firewall rules.

Create firewall rules to allow web traffic and health checks for the load balancer and managed instance groups:

    gcloud compute firewall-rules create allow-http-$NAME_SUFFIX \
        --network=network-$NAME_SUFFIX \
        --direction=INGRESS \
        --priority=1000 \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges= \
        --target-tags=http-server

    gcloud compute firewall-rules create allow-health-check-$NAME_SUFFIX \
        --network=network-$NAME_SUFFIX \
        --action=allow \
        --direction=ingress \
        --source-ranges=, \
        --target-tags=allow-health-check \
        --rules=tcp:80

The HTTP rule allows traffic from any source to any VM where the `http-server` tag is applied. For the health check rule, the source ranges are the default Google Cloud health check ranges, which let the platform correctly check the health of resources.

To allow SSH traffic for the initial configuration of a base VM image, scope the firewall rule to your environment using the `--source-ranges` parameter. You might need to work with your network team to determine what source ranges your organization uses.

Set `--source-ranges` to your own IP address scopes:

    gcloud compute firewall-rules create allow-ssh-$NAME_SUFFIX \
        --network=network-$NAME_SUFFIX \
        --direction=INGRESS \
        --priority=1000 \
        --action=ALLOW \
        --rules=tcp:22 \
        --source-ranges=

After you create the firewall rules, verify that the three rules have been added:

    gcloud compute firewall-rules list \
        --project=$PROJECT_ID \
        --filter="NETWORK=network-$NAME_SUFFIX"

The following example output shows the three rules have been correctly created:

    NAME                    NETWORK      DIRECTION  PRIORITY  ALLOW
    allow-health-check-app  network-app  INGRESS    1000      tcp:80
    allow-http-app          network-app  INGRESS    1000      tcp:80
    allow-ssh-app           network-app  INGRESS    1000      tcp:22

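
If you prefer not to compare the listing by eye, you can check it in a script. The following is a minimal sketch that assumes the tabular output format shown in the example output, with the rule name in the first column. The function name `missing_firewall_rules` is hypothetical.

```shell
# Read a firewall-rules listing on stdin and print each expected rule name
# (given as arguments) that does not appear in the first column.
# Returns 0 when every expected rule is present, 1 otherwise.
missing_firewall_rules() {
  listing=$(cat)
  status=0
  for rule in "$@"; do
    # Match the rule name exactly against the first whitespace-delimited field.
    if ! printf '%s\n' "$listing" |
        awk -v r="$rule" '$1 == r { found = 1 } END { exit !found }'; then
      echo "$rule"
      status=1
    fi
  done
  return "$status"
}
```

To use the sketch, pipe the output of the `gcloud compute firewall-rules list` command from the previous step into `missing_firewall_rules allow-health-check-$NAME_SUFFIX allow-http-$NAME_SUFFIX allow-ssh-$NAME_SUFFIX`. No output and a zero exit status mean that all three rules exist.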
## Create and configure a base VM image
To create identical VMs that you deploy without additional configuration, you use a custom VM image. This image captures the OS and Apache configuration, and is used to create each VM in the managed instance group in the next steps.

On the VM, you create a basic `index.html` file on the persistent disk, which is mounted at `/var/www/`. An Apache configuration file at `/etc/apache2/sites-available/` serves web content from the mounted persistent disk location.

The following diagram shows the basic HTML page served by Apache that's stored on the persistent disk:

You build this environment in the following steps.

Create a base VM with an attached persistent disk:

    gcloud compute instances create vm-base-$NAME_SUFFIX \
        --zone=$ZONE \
        --machine-type=n1-standard-1 \
        --subnet=subnet-$NAME_SUFFIX-$REGION1 \
        --tags=http-server \
        --image=debian-10-buster-v20210420 \
        --image-project=debian-cloud \
        --boot-disk-size=10GB \
        --boot-disk-type=pd-balanced \
        --boot-disk-device-name=vm-base-$NAME_SUFFIX \
        --create-disk=type=pd-ssd,name=disk-base-$NAME_SUFFIX,size=10GB,device-name=disk-base-$NAME_SUFFIX

You use parameters defined at the start of this document to name the VM and connect to the correct subnet. Names are also assigned from the parameters for the boot disk and data disk.

To install and configure the simple website, first connect to the base VM using SSH:

    gcloud compute ssh vm-base-$NAME_SUFFIX --zone=$ZONE

In your SSH session to the VM, create a script to configure the VM in an editor of your choice. The following example uses Nano as the editor:

    nano configure-vm.

Paste the following configuration script into the file:

    #!/bin/bash

    NAME_SUFFIX=app

    # Create directory for the basic website files
    sudo mkdir -p /var/www/
    sudo chmod a+w /var/www/
    sudo chown -R www-data: /var/www/

    # Find the disk name, then format and mount it
    DISK_NAME="google-disk-base-$NAME_SUFFIX"
    DISK_PATH="$(find /dev/disk/by-id -name "${DISK_NAME}" | xargs -I '{}' readlink -f '{}')"

    sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard $DISK_PATH
    sudo mount -o discard,defaults $DISK_PATH /var/www/

    # Install Apache
    sudo apt-get update && sudo apt-get -y install apache2

    # Write out a basic HTML file to the mounted persistent disk
    sudo tee -a /var/www/ >/dev/null <<'EOF'
    <!doctype html>
    <html lang=en>
    <head>
    <meta charset=utf-8>
    <title>HA / DR example</title>
    </head>
    <body>
    <p>Welcome to a Compute Engine website with warm failover to Cloud Storage</p>
    </body>
    </html>
    EOF

    # Write out an Apache configuration file
    sudo tee -a /etc/apache2/sites-available/ >/dev/null <<'EOF'
    <VirtualHost *:80>
            ServerName
            ServerAdmin webmaster@localhost
            DocumentRoot /var/www/

            ErrorLog ${APACHE_LOG_DIR}/error.log
            CustomLog ${APACHE_LOG_DIR}/access.log combined
    </VirtualHost>
    EOF

    # Enable the Apache configuration file and reload service
    sudo a2dissite 000-default
    sudo a2ensite
    sudo systemctl reload apache2

Update the `NAME_SUFFIX` variable to match the value set at the start of this document, such as `app`.

Write out the file and exit your editor. For example, in Nano you use Ctrl-O to write out the file, then exit with Ctrl-X.

Make the configuration script executable, then run it:

    chmod +x configure-vm.
    ./configure-vm.

Exit the SSH session to the VM:

    exit

Get the IP address of the VM and use `curl` to see the basic web page:

    curl $(gcloud compute instances describe vm-base-$NAME_SUFFIX \
        --zone $ZONE \
        --format="value(networkInterfaces.accessConfigs.[0].natIP)")

The basic website is returned, as shown in the following example output:

    <!doctype html>
    <html lang=en>
    <head>
    <meta charset=utf-8>
    <title>HA / DR example</title>
    </head>
    <body>
    <p>Welcome to a Compute Engine website with warm failover to Cloud Storage</p>
    </body>
    </html>

This step confirms that Apache is configured correctly, and the page is loaded from the attached persistent disk. In the following sections, you create an image using this base VM and configure an instance template with a startup script.

## Deploy the Compute Engine resources
This warm failover pattern uses managed instance groups to run the VMs. The managed instance groups run in two regions, and each group monitors the health of the VMs. If there's an outage and one of the VMs fails, the managed instance group recreates the VM. This configuration creates a highly available application, even without the warm failover to a static site in Cloud Storage.

Before you can create an image, you must stop the VM:

    gcloud compute instances stop vm-base-$NAME_SUFFIX --zone=$ZONE

Run the following set of commands to create the VM images, instance templates, and managed instance groups:

    # Create the base VM images
    gcloud compute images create image-$NAME_SUFFIX \
        --source-disk=vm-base-$NAME_SUFFIX \
        --source-disk-zone=$ZONE

    gcloud compute images create image-disk-$NAME_SUFFIX \
        --source-disk=disk-base-$NAME_SUFFIX \
        --source-disk-zone=$ZONE

    # Create instance templates
    gcloud compute instance-templates create template-$NAME_SUFFIX-$REGION1 \
        --machine-type=n1-standard-1 \
        --subnet=projects/$PROJECT_ID/regions/$REGION1/subnetworks/subnet-$NAME_SUFFIX-$REGION1 \
        --region=$REGION1 \
        --tags=http-server \
        --metadata=^,@^startup-script=\#\!/bin/bash$'\n'echo\ UUID=\`blkid\ -s\ UUID\ -o\ value\ /dev/sdb\`\ /var/www/\ ext4\ discard,defaults,nofail\ 0\ 2\ \|\ tee\ -a\ /etc/fstab$'\n'mount\ -a \
        --image=image-$NAME_SUFFIX \
        --create-disk=image=image-disk-$NAME_SUFFIX,auto-delete=yes

    gcloud compute instance-templates create template-$NAME_SUFFIX-$REGION2 \
        --machine-type=n1-standard-1 \
        --subnet=projects/$PROJECT_ID/regions/$REGION2/subnetworks/subnet-$NAME_SUFFIX-$REGION2 \
        --region=$REGION2 \
        --tags=http-server \
        --metadata=^,@^startup-script=\#\!/bin/bash$'\n'echo\ UUID=\`blkid\ -s\ UUID\ -o\ value\ /dev/sdb\`\ /var/www/\ ext4\ discard,defaults,nofail\ 0\ 2\ \|\ tee\ -a\ /etc/fstab$'\n'mount\ -a \
        --image=image-$NAME_SUFFIX \
        --create-disk=image=image-disk-$NAME_SUFFIX,auto-delete=yes

    # Create a health check for VM instances
    gcloud compute health-checks create http http-basic-check-$NAME_SUFFIX \
        --port 80

    # Create the managed instance groups
    gcloud compute instance-groups managed create instance-group-$NAME_SUFFIX-$REGION1 \
        --template=template-$NAME_SUFFIX-$REGION1 \
        --size=2 \
        --region=$REGION1 \
        --health-check=http-basic-check-$NAME_SUFFIX

    gcloud compute instance-groups managed create instance-group-$NAME_SUFFIX-$REGION2 \
        --template=template-$NAME_SUFFIX-$REGION2 \
        --size=2 \
        --region=$REGION2 \
        --health-check=http-basic-check-$NAME_SUFFIX

## Create and configure a load balancer
For users to access your website, you need to allow traffic through to the VMs that run in the managed instance groups. You also want to automatically redirect traffic to new VMs if there's a zone failure in a managed instance group.

In the following section, you create an external HTTP(S) load balancer with a backend service for HTTP traffic on port 80, use the health check created in the previous steps, and map an external IP address through to the backend service.

For more information, see How to set up a simple external HTTP load balancer.

Create and configure the load balancer for your application:

    # Configure port rules for HTTP port 80
    gcloud compute instance-groups set-named-ports \
        instance-group-$NAME_SUFFIX-$REGION1 \
        --named-ports http:80 \
        --region $REGION1

    gcloud compute instance-groups set-named-ports \
        instance-group-$NAME_SUFFIX-$REGION2 \
        --named-ports http:80 \
        --region $REGION2

    # Create a backend service and add the managed instance groups to it
    gcloud compute backend-services create \
        web-backend-service-$NAME_SUFFIX \
        --protocol=HTTP \
        --port-name=http \
        --health-checks=http-basic-check-$NAME_SUFFIX \
        --global

    gcloud compute backend-services add-backend \
        web-backend-service-$NAME_SUFFIX \
        --instance-group=instance-group-$NAME_SUFFIX-$REGION1 \
        --instance-group-region=$REGION1 \
        --global

    gcloud compute backend-services add-backend \
        web-backend-service-$NAME_SUFFIX \
        --instance-group=instance-group-$NAME_SUFFIX-$REGION2 \
        --instance-group-region=$REGION2 \
        --global

    # Create a URL map for the backend service
    gcloud compute url-maps create web-map-http-$NAME_SUFFIX \
        --default-service web-backend-service-$NAME_SUFFIX

    # Configure forwarding for the HTTP traffic
    gcloud compute target-http-proxies create \
        http-lb-proxy-$NAME_SUFFIX \
        --url-map web-map-http-$NAME_SUFFIX

    gcloud compute forwarding-rules create \
        http-content-rule-$NAME_SUFFIX \
        --global \
        --target-http-proxy=http-lb-proxy-$NAME_SUFFIX \
        --ports=80

Get the IP address of the forwarding rule for the web traffic:

    IP_ADDRESS=$(gcloud compute forwarding-rules describe http-content-rule-$NAME_SUFFIX \
        --global \
        --format="value(IPAddress)")

Use `curl`, or open your web browser, to view the website using the IP address of the load balancer from the previous step:

    curl $IP_ADDRESS

It takes a few minutes for the load balancer to finish deploying and to correctly direct traffic to your backend. An HTTP 404 error is returned if the load balancer is still deploying. If needed, wait a few minutes and try to access the website again.

The basic website is returned, as shown in the following example output:

    <!doctype html>
    <html lang=en>
    <head>
    <meta charset=utf-8>
    <title>HA / DR example</title>
    </head>
    <body>
    <p>Welcome to a Compute Engine website with warm failover to Cloud Storage</p>
    </body>
    </html>

## Create and configure a storage bucket
Cloud Storage is used to hold static website files. In this basic example, you create a single file with slightly different text than on the VMs.

In production deployments, your website likely includes many more files and additional application code on your managed instance group VMs than is shown in this document. The static version hosted in Cloud Storage is often then a more limited version that provides minimal functionality. In a warm failover scenario, this limited website from Cloud Storage is displayed until the managed instance groups recover and can serve traffic for the full website experience.

Verify the domain you want to use with your Cloud Storage bucket.

Create a Cloud Storage bucket to match the name of the domain you own and want to use:

    gsutil mb gs://static-web.$DOMAIN

The `DOMAIN` variable defined at the start of this document is used, so this example stores the static files at `static-web.$DOMAIN`.

Create a local file that you copy to the Cloud Storage bucket in the next step:

    cat <<EOF > index.html
    <!doctype html>
    <html lang=en>
    <head>
    <meta charset=utf-8>
    <title>HA / DR example</title>
    </head>
    <body>
    <p>Welcome to a test static web server with warm failover from Cloud Storage</p>
    </body>
    </html>
    EOF

Upload the basic HTML file to the Cloud Storage bucket:

    gsutil cp index.html gs://static-web.$DOMAIN

To allow users to view the static web content, set the appropriate permissions on the Cloud Storage bucket:

    gsutil iam ch allUsers:objectViewer gs://static-web.$DOMAIN

Configure the Cloud Storage bucket to serve the `index.html` file as the default web page:

    gsutil web set -m index.html gs://static-web.$DOMAIN

## Create a DNS zone and record
To allow traffic to be directed to the warm static site on Cloud Storage when there's an outage with the managed instance groups, create a Cloud DNS zone. Under normal conditions, this DNS zone directs traffic through the external load balancer to the managed instance groups created in the previous sections.

Create a Cloud DNS zone:

    gcloud dns managed-zones create zone-$NAME_SUFFIX \
        --dns-name=$DOMAIN \
        --description="DNS zone for warm site failover"

The `DOMAIN` variable defined at the start of this document is used.

Get the details of the Cloud DNS zone:

    gcloud dns managed-zones describe zone-$NAME_SUFFIX

The following example output shows the `nameServers` for the zone:

    kind: dns#managedZone
    name: zone-app
    nameServers:
    -
    -
    -
    -

Cloud DNS must be authoritative for your domain. Create nameserver (NS) records with your domain registrar that point to your Cloud DNS zone. Use the nameserver addresses returned in the previous step.

For more information and an example using Google Domains, see How to update name servers.

In your Cloud DNS zone, add a record for `www` using the load balancer IP address obtained in a previous section:

    gcloud dns record-sets transaction start \
        --zone=zone-$NAME_SUFFIX

    gcloud dns record-sets transaction add $IP_ADDRESS \
        --name=www.$DOMAIN \
        --ttl=300 \
        --type=A \
        --zone=zone-$NAME_SUFFIX

This record directs user requests for the website through the load balancer to the managed instance groups.
A TTL of 300 seconds is set to reduce the length of time the cached DNS record exists for a user.

Create a record to be used by the Cloud Storage bucket for the static website:

    gcloud dns record-sets transaction add \
        --name=static-web.$DOMAIN \
        --ttl=300 \
        --type=CNAME \
        --zone=zone-$NAME_SUFFIX

This example uses `static-web` as the subdomain. Again, a TTL of 300 seconds is set to reduce the length of time the cached DNS record exists for a user.

Finally, commit the DNS record additions to the zone:

    gcloud dns record-sets transaction execute \
        --zone=zone-$NAME_SUFFIX

## Verify and test the DNS zone and records
Let's review the resource deployments before simulating a zone failure. All of the resources have been created to support the environment, as shown in the following image:

- Cloud DNS zone records direct users to the load balancer for distribution across the managed instance group VMs.

- A Cloud Storage bucket is configured to host static web pages if there's an outage with the managed instance groups.

- The Cloud DNS zone is configured to use the static site in Cloud Storage, but doesn't currently resolve requests to the storage bucket.

To view the DNS records and test resolution, you must resolve addresses against the Cloud DNS servers. In production deployments, make sure you test and verify the addresses resolve correctly, then update your own DNS servers to resolve appropriately.
This document doesn't detail the steps to update your own DNS servers, only how to verify traffic flows correctly under normal and failover conditions.

Get the details of the Cloud DNS zone again:

    gcloud dns managed-zones describe zone-$NAME_SUFFIX

The following example output shows the `nameServers` for the zone:

    kind: dns#managedZone
    name: zone-app
    nameServers:
    -
    -
    -
    -

To resolve the `www` record for your Cloud DNS zone against one of these name servers, use the `dig` command:

    dig @ns-cloud-b1.googledomains.com www.$DOMAIN

This example uses the `ns-cloud-b1.googledomains.com` nameserver address returned from the previous `describe` command. Provide your own nameserver address shown in the output of the previous command.

The following example output shows that the record resolves to the IP address of the load balancer. If you used this nameserver to access the address, such as using `curl` and the `--resolve` parameter with the Cloud DNS nameserver, the default page would be displayed from one of the managed instance groups behind the load balancer:

    ; DiG @ns-cloud-b1.googledomains.com
    ; (1 server found)
    ;; QUESTION SECTION:
    ; IN A
    ;; ANSWER SECTION:
    300 IN A

Use the `dig` command again to verify the DNS record for the static website in Cloud Storage:

    dig @ns-cloud-b1.googledomains.com static-web.$DOMAIN

The following example output shows that the record resolves to Cloud Storage that can serve the static content from the storage bucket:

    ; DiG @ns-cloud-b1.googledomains.com
    ; (1 server found)
    ;; QUESTION SECTION:
    ; IN A
    ;; ANSWER SECTION:
    300 IN CNAME

## Fail over to the Cloud Storage bucket
In a production environment, you might get an alert using Cloud Monitoring or other monitoring solution when there's a problem with the managed instance groups. This alert prompts a human to understand the scope of the failure before you update the Cloud DNS records to redirect traffic to the Cloud Storage-hosted static website.
An alternative approach is to use your monitoring solution to automatically respond to outages with the managed instance groups.

When you fail over, Cloud DNS resolves traffic to the Cloud Storage-hosted static website, as shown in the following image:

When you or your monitoring solution determine the most appropriate action is to update the Cloud DNS records to direct traffic to Cloud Storage, update the existing DNS `A` record. In this document, you manually update the Cloud DNS records to redirect traffic to the Cloud Storage-hosted static website.

To fail over the Cloud DNS records, remove the existing `A` record that resolves to the load balancer:

    gcloud dns record-sets transaction start \
        --zone=zone-$NAME_SUFFIX

    gcloud dns record-sets transaction remove $IP_ADDRESS \
        --name=www.$DOMAIN \
        --ttl=300 \
        --type=A \
        --zone=zone-$NAME_SUFFIX

Create a `CNAME` record for `www` that points to the Cloud Storage-hosted content:

    gcloud dns record-sets transaction add static-web.$DOMAIN \
        --name=www.$DOMAIN. \
        --ttl=30 \
        --type=CNAME \
        --zone=zone-$NAME_SUFFIX

Commit the updates to the Cloud DNS zone:

    gcloud dns record-sets transaction execute \
        --zone=zone-$NAME_SUFFIX

Use the `dig` command to confirm the `www` record now resolves to the address of the Cloud Storage static website:

    dig @ns-cloud-b1.googledomains.com www.$DOMAIN

The following example output shows that the `www.example.com` record resolves to the CNAME record of the Cloud Storage static website. Requests to access `www.example.com` are redirected to the Cloud Storage bucket, which displays the static website:

    ; DiG @ns-cloud-b1.googledomains.com
    ; (1 server found)
    ;; QUESTION SECTION:
    ; IN A
    ;; ANSWER SECTION:
    30 IN CNAME
    300 IN CNAME

## Fail back to the managed instance groups
After issues with the managed instance groups are resolved, you can fail back to serving content from the load-balanced managed instance groups by updating the Cloud DNS records again. Again, a human might make this decision using Cloud Monitoring insights for the health of the managed instance groups.
Or, you could use automation to respond to the restored health of the managed instance group. 
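
If you automate the decision, one common safeguard is to fail back only after the primary site has answered several probes in a row, so that a single lucky response doesn't trigger a DNS change. The following is a minimal sketch of that idea. The function name `wait_until_healthy` and its arguments are hypothetical, and the thresholds you choose depend on your own environment.

```shell
# Succeed only after the probe command passes several times in a row.
#   $1: consecutive successes required
#   $2: maximum number of probes to attempt
#   $3: seconds to wait between probes
#   remaining arguments: the probe command to run
wait_until_healthy() {
  required=$1
  max_attempts=$2
  interval=$3
  shift 3
  streak=0
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$(( attempt + 1 ))
    if "$@" >/dev/null 2>&1; then
      streak=$(( streak + 1 ))
      if [ "$streak" -ge "$required" ]; then
        return 0
      fi
    else
      streak=0   # any failed probe resets the streak
    fi
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_until_healthy 3 20 10 curl --fail --silent --max-time 5 "http://$IP_ADDRESS/"` probes the load balancer address every 10 seconds and succeeds once three consecutive requests return a successful HTTP response. You could run the Cloud DNS record updates only when the function returns successfully.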