
Multi-Cloud: When One Cloud Isn't Enough (And Why That's Usually Fine)

You've mastered Terraform on a single cloud. Now every conference talk and LinkedIn post is screaming: "Go multi-cloud! Avoid vendor lock-in! Future-proof your architecture!"

Here's the truth nobody wants to admit: Multi-cloud is usually overkill.

But sometimes? It's absolutely necessary. Your company just acquired another company running on a different cloud. You need specific regional compliance that only Azure offers. Or maybe you're genuinely building the next Netflix and need geographic coverage across every cloud provider on Earth.

This tutorial teaches you how to implement multi-cloud Terraform, when it actually makes sense, and when to politely decline the complexity explosion.

📦 Code Examples

Repository: terraform-hcl-tutorial-series
This Part: Part 8 - Multi-Cloud Patterns

Get the working example:

git clone https://github.com/khuongdo/terraform-hcl-tutorial-series.git
cd terraform-hcl-tutorial-series
git checkout part-08
cd examples/part-08-multi-cloud/

# Compare AWS, GCP, and Azure patterns
terraform init
terraform plan

The Multi-Cloud Reality Check Nobody Talks About

Let's start with brutal honesty.

Why Companies Actually Go Multi-Cloud

Legitimate reasons that justify the pain:

1. Mergers and Acquisitions Company A runs everything on AWS. Company B runs everything on GCP. Congratulations on your acquisition! You now have a multi-cloud architecture whether you wanted one or not.

2. Geographic Compliance You need data residency in Germany, but AWS doesn't have the specific compliance certifications you need. Azure does. Boom - you're multi-cloud.

3. Negotiation Leverage When your AWS bill hits 7 figures annually, the ability to credibly say "we can migrate to GCP" during contract negotiations has real financial value.

4. Best-of-Breed Services GCP's BigQuery genuinely beats AWS Redshift for certain analytics workloads. AWS Lambda has the most mature serverless ecosystem. Azure Active Directory integrates better with your enterprise Windows environment. Sometimes you need all three.

5. True Disaster Recovery AWS us-east-1 going down is rare, but it happens. If 6 hours of downtime costs you $10 million, cross-cloud failover makes financial sense.

The Bad Reasons (Most of Them)

"We need to avoid vendor lock-in" You're already locked into Terraform, Kubernetes, Docker, PostgreSQL, and a dozen other technologies. Your cloud provider is the least of your lock-in concerns. Also, migrating clouds is incredibly expensive - you won't do it unless forced to.

"What if AWS shuts down our account?" If AWS terminates your account, you have bigger problems than multi-cloud architecture. Focus on compliance and ToS adherence instead.

"It looks impressive on my resume" Cool. Your company is now paying 3-5x operational overhead so you can pad your LinkedIn. Not a good trade.

"We want flexibility" Flexibility to do what, exactly? Migrate your entire production infrastructure mid-quarter? That's not flexibility, that's chaos.

Real Talk

Multi-cloud multiplies your operational complexity by 3-5x. You need expertise in multiple cloud consoles, billing systems, IAM models, and networking paradigms. Every engineer needs to context-switch between three different ways of doing the same thing. Only proceed if the business value clearly justifies this cost.

The Foundation: Configuring Multiple Providers

If you've decided multi-cloud is worth it (or it was decided for you), here's how to configure Terraform to work with multiple clouds.

The good news? Terraform makes the configuration part straightforward.

Basic Multi-Provider Setup

# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

# Provider configurations
provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-gcp-project"
  region  = "us-central1"
}

provider "azurerm" {
  features {}  # Required empty block (yes, really)
}

Notice how each provider has its own configuration quirks:

  • AWS wants a region
  • GCP wants a project AND a region
  • Azure demands a features {} block even if it's empty

Welcome to multi-cloud! Everything is almost the same but subtly different in ways that will frustrate you at 11 PM.

Authentication: Three Different Ways to Say "Log In"

AWS uses the standard credential chain:

export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"

Or use ~/.aws/credentials, or IAM roles if you're running on EC2, or OIDC if you're in CI/CD. Standard AWS stuff.

GCP uses Application Default Credentials:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

Or use gcloud auth application-default login for local development. GCP has its own way of doing things.

Azure uses the Azure CLI or service principals:

az login  # Interactive authentication

Or for automation:

export ARM_CLIENT_ID="your-client-id"
export ARM_CLIENT_SECRET="your-client-secret"
export ARM_TENANT_ID="your-tenant-id"
export ARM_SUBSCRIPTION_ID="your-subscription-id"

Security Warning

NEVER hardcode credentials in Terraform files. Use environment variables, cloud-native secret managers (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault), or CI/CD pipeline secrets. Treat credentials like nuclear launch codes - because to your infrastructure, they basically are.
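
One way to keep secrets out of source control is a `sensitive` variable fed from the environment - a minimal sketch, assuming a `db_password` variable like the ones used in the database examples later in this post:

```hcl
# variables.tf - mark secrets as sensitive so Terraform redacts them
# from plan/apply output
variable "db_password" {
  type      = string
  sensitive = true
}

# Supply the value outside of source control, e.g. from a secret manager:
#   export TF_VAR_db_password="$(aws secretsmanager get-secret-value \
#     --secret-id db-password --query SecretString --output text)"
```

Terraform reads any `TF_VAR_<name>` environment variable as the value of variable `<name>`, so nothing secret ever touches your `.tf` files or state in plaintext form (state still records values, so protect your state too).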

The Cloud Rosetta Stone: Same Thing, Different Names

Every cloud does the same things - virtual machines, object storage, databases - but with completely different resource names and configuration styles.

Here's your decoder ring.

Virtual Machines: Three Ways to Rent a Computer

AWS EC2 Instance:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"  # Example Ubuntu AMI (AMI IDs are region-specific)
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}

Short and sweet. AWS pioneered cloud computing, so their API is relatively straightforward.

GCP Compute Engine:

resource "google_compute_instance" "web" {
  name         = "web-server"
  machine_type = "e2-micro"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = "default"
    access_config {}  # Ephemeral public IP
  }

  labels = {
    environment = "production"
  }
}

More verbose. GCP requires explicit boot disk and network configuration. Also, they call tags "labels" because consistency is overrated.

Azure Virtual Machine:

resource "azurerm_resource_group" "main" {
  name     = "web-resources"
  location = "East US"
}

resource "azurerm_linux_virtual_machine" "web" {
  name                = "web-server"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  size                = "Standard_B1s"

  admin_username = "adminuser"
  admin_ssh_key {
    username   = "adminuser"
    public_key = file("~/.ssh/id_rsa.pub")
  }

  network_interface_ids = [
    azurerm_network_interface.main.id,
  ]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  tags = {
    environment = "production"
  }
}

Azure takes verbosity to new heights. You need a resource group (a container for other resources), a network interface (separate resource), and an image reference with publisher/offer/SKU. Buckle up.

Quick comparison:

| Concept  | AWS               | GCP                     | Azure                         |
|----------|-------------------|-------------------------|-------------------------------|
| Resource | aws_instance      | google_compute_instance | azurerm_linux_virtual_machine |
| Image    | AMI ID            | Image family/project    | Publisher/Offer/SKU           |
| Size     | t3.micro          | e2-micro                | Standard_B1s                  |
| Location | Region (implicit) | Zone (explicit)         | Location (via resource group) |
| Metadata | Tags              | Labels                  | Tags                          |

Same concept. Completely different execution.

Object Storage: Buckets All the Way Down

AWS S3:

resource "aws_s3_bucket" "data" {
  bucket = "my-company-data-bucket"

  tags = {
    Purpose = "analytics"
  }
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id

  versioning_configuration {
    status = "Enabled"
  }
}

Versioning is a separate resource as of AWS provider v4. Because reasons.

GCP Cloud Storage:

resource "google_storage_bucket" "data" {
  name     = "my-company-data-bucket"
  location = "US"

  versioning {
    enabled = true
  }

  labels = {
    purpose = "analytics"
  }
}

Cleaner. Versioning is inline. Labels instead of tags.

Azure Blob Storage:

resource "azurerm_storage_account" "data" {
  name                     = "mycompanydatastorage"  # Lowercase, globally unique
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  tags = {
    purpose = "analytics"
  }
}

resource "azurerm_storage_container" "data" {
  name                  = "data-bucket"
  storage_account_name  = azurerm_storage_account.data.name
  container_access_type = "private"
}

Azure uses a two-level hierarchy: Storage Account (parent) → Container (child). The storage account name must be globally unique, lowercase, and can't have hyphens. Enjoy debugging that validation error.

Key differences:

  • AWS: Bucket name globally unique, versioning is separate resource
  • GCP: Bucket name globally unique, versioning inline
  • Azure: Storage Account + Container model, strict naming rules
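
Those Azure naming rules can be caught at plan time instead of at apply time with a variable validation - a minimal sketch (the variable name is illustrative):

```hcl
# Fail fast on Azure's storage account naming rules:
# 3-24 characters, lowercase letters and digits only, no hyphens
variable "storage_account_name" {
  type = string

  validation {
    condition     = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "Storage account names must be 3-24 lowercase letters or digits, with no hyphens."
  }
}
```

A failed validation surfaces immediately during terraform plan with your error message, rather than minutes into an apply when Azure rejects the name.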

Managed Databases: PostgreSQL Three Ways

AWS RDS:

resource "aws_db_instance" "postgres" {
  identifier          = "app-database"
  engine              = "postgres"
  engine_version      = "15.3"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  storage_type        = "gp3"
  db_name             = "appdb"
  username            = "dbadmin"
  password            = var.db_password  # NEVER hardcode this
  skip_final_snapshot = true

  tags = {
    Application = "web-app"
  }
}

GCP Cloud SQL:

resource "google_sql_database_instance" "postgres" {
  name             = "app-database"
  database_version = "POSTGRES_15"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"

    database_flags {
      name  = "max_connections"
      value = "100"
    }
  }

  deletion_protection = false  # Set to true in production!
}

resource "google_sql_database" "app" {
  name     = "appdb"
  instance = google_sql_database_instance.postgres.name
}

resource "google_sql_user" "admin" {
  name     = "dbadmin"
  instance = google_sql_database_instance.postgres.name
  password = var.db_password
}

GCP requires separate resources for the database and user. AWS bundles it together.

Azure Database for PostgreSQL:

resource "azurerm_postgresql_flexible_server" "postgres" {
  name                = "app-database-server"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  version             = "15"

  administrator_login    = "dbadmin"
  administrator_password = var.db_password

  sku_name   = "B_Standard_B1ms"
  storage_mb = 32768

  tags = {
    application = "web-app"
  }
}

resource "azurerm_postgresql_flexible_server_database" "app" {
  name      = "appdb"
  server_id = azurerm_postgresql_flexible_server.postgres.id
}

Azure also separates the server and database. Notice the SKU naming convention - completely different from AWS/GCP.

When Multi-Cloud Actually Makes Sense

Use this decision framework before committing to multi-cloud complexity:

Do you have multiple cloud providers TODAY?
├─ No → Stick with single cloud. Add second cloud only with clear business case.
└─ Yes → Continue

Is multi-cloud mandatory (merger, compliance, existing contracts)?
├─ Yes → Multi-cloud is required. Optimize for operational simplicity.
└─ No → Continue

Can you consolidate to single cloud within 12 months?
├─ Yes → Create migration plan. Don't invest in multi-cloud abstraction.
└─ No → Embrace multi-cloud. Build for the long haul.

Scenario 1: Geographic Coverage

You're building a global CDN and need presence in 50+ regions worldwide.

# AWS dominates North America and Europe
provider "aws" {
  alias  = "us_east"
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu_west"
  region = "eu-west-1"
}

# GCP strong in Asia-Pacific
provider "google" {
  alias  = "asia"
  region = "asia-southeast1"
}

# Azure for government and compliance regions
provider "azurerm" {
  alias       = "gov"
  environment = "usgovernment"
}

Valid use case. Each cloud has different regional coverage.

Scenario 2: Best-of-Breed Services

# GCP for machine learning and analytics
resource "google_bigquery_dataset" "analytics" {
  dataset_id = "web_analytics"
  location   = "US"
}

# AWS for mature compute ecosystem
resource "aws_ecs_cluster" "app" {
  name = "production-app"
}

# Azure for enterprise Windows integration
resource "azurerm_active_directory_domain_service" "corp" {
  name                = "corp-domain"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  # ... configuration
}

Choosing the best tool for each job. Defensible if you have the team to support it.

Scenario 3: Disaster Recovery Across Clouds

# Primary application on GCP
resource "google_compute_instance" "app_primary" {
  name         = "app-primary"
  machine_type = "n2-standard-4"
  zone         = "us-central1-a"
  # ... configuration
}

# Failover to AWS if GCP region fails
resource "aws_instance" "app_failover" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.xlarge"
  # ... configuration
}

# DNS failover using Route53
resource "aws_route53_health_check" "gcp_primary" {
  fqdn = google_compute_instance.app_primary.network_interface[0].access_config[0].nat_ip
  type = "HTTPS"
}

Expensive but effective for mission-critical systems where downtime costs millions.

Provider Aliases: Multiple Instances of the Same Cloud

Sometimes you need multiple configurations of the same provider - deploying to multiple AWS regions, for example.

Multi-Region AWS Deployment

# providers.tf
provider "aws" {
  region = "us-east-1"
  alias  = "us_east"
}

provider "aws" {
  region = "eu-west-1"
  alias  = "eu_west"
}

provider "aws" {
  region = "ap-southeast-1"
  alias  = "asia"
}

# main.tf
resource "aws_s3_bucket" "us_data" {
  provider = aws.us_east
  bucket   = "my-app-us-data"
}

resource "aws_s3_bucket" "eu_data" {
  provider = aws.eu_west
  bucket   = "my-app-eu-data"
}

resource "aws_s3_bucket" "asia_data" {
  provider = aws.asia
  bucket   = "my-app-asia-data"
}

Critical rule: Resources without an explicit provider argument use the default provider configuration (the one without an alias). Be explicit to avoid surprises.
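
Aliased providers also have to be handed to child modules explicitly via the providers meta-argument - a sketch, with an illustrative module path:

```hcl
# A child module only sees the provider configurations you pass it.
# Here the module's "aws" provider is mapped to the eu_west alias.
module "eu_storage" {
  source = "./modules/s3_bucket"  # hypothetical module

  providers = {
    aws = aws.eu_west
  }
}
```

Without the providers block, the module would silently inherit the default (unaliased) configuration and deploy to the wrong region.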

Multi-Account AWS Strategy

provider "aws" {
  region = "us-east-1"
  # Default: production account credentials
}

provider "aws" {
  alias  = "dev"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/TerraformDev"
  }
}

provider "aws" {
  alias  = "staging"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::222222222222:role/TerraformStaging"
  }
}

# Deploy to dev account
resource "aws_instance" "dev_web" {
  provider      = aws.dev
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}

# Deploy to production account
resource "aws_instance" "prod_web" {
  # Uses default provider (production account)
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.large"
}

This is actually more common than multi-cloud - separating environments by AWS accounts.
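
Before applying against multiple accounts, it's worth verifying which account each provider configuration actually targets. One hedged sketch using the aws_caller_identity data source:

```hcl
# Confirm the account behind each provider configuration
data "aws_caller_identity" "prod" {}  # default provider

data "aws_caller_identity" "dev" {
  provider = aws.dev
}

output "account_check" {
  value = {
    prod = data.aws_caller_identity.prod.account_id
    dev  = data.aws_caller_identity.dev.account_id
  }
}
```

A quick terraform plan then shows both account IDs, catching a mis-scoped credential before it creates resources in the wrong account.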

Cross-Cloud Networking: Where Dreams Go to Die

You want your AWS VPC to talk to your GCP VPC? Buckle up, this is where multi-cloud gets painful.

Each cloud has different VPC models, IP addressing schemes, routing paradigms, and peering mechanisms. And the data egress fees? Brutal.

VPN Between AWS and GCP (The Hard Way)

AWS side:

resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "aws-to-gcp-vpn"
  }
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = google_compute_address.vpn.address
  type       = "ipsec.1"

  tags = {
    Name = "gcp-gateway"
  }
}

resource "aws_vpn_connection" "gcp" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.gcp.id
  type                = "ipsec.1"
  static_routes_only  = true
}

GCP side:

resource "google_compute_address" "vpn" {
  name   = "gcp-vpn-ip"
  region = "us-central1"
}

resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-to-aws-vpn"
  network = google_compute_network.main.id
  region  = "us-central1"
}

resource "google_compute_vpn_tunnel" "aws" {
  name          = "tunnel-to-aws"
  peer_ip       = aws_vpn_connection.gcp.tunnel1_address
  shared_secret = aws_vpn_connection.gcp.tunnel1_preshared_key

  target_vpn_gateway = google_compute_vpn_gateway.main.id

  local_traffic_selector  = ["10.1.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]

  # Classic VPN also requires forwarding rules (ESP, UDP 500/4500)
  # and routes on both sides - omitted here for brevity
}

Reality Check

Cross-cloud VPNs are expensive (data egress fees are eye-watering), operationally complex (asymmetric routing, MTU issues, BGP debugging at 3 AM), and often unnecessary. Use cloud-native public APIs with authentication instead whenever possible.

The Smart Alternative: Public API Integration

Instead of private networking, expose services via authenticated public APIs:

# AWS: Expose API Gateway
resource "aws_api_gateway_rest_api" "app" {
  name = "app-api"
}

# GCP: Call AWS API from Cloud Run
resource "google_cloud_run_service" "worker" {
  name     = "data-processor"
  location = "us-central1"

  template {
    spec {
      containers {
        image = "gcr.io/my-project/worker:latest"
        env {
          name  = "AWS_API_URL"
          value = "https://${aws_api_gateway_rest_api.app.id}.execute-api.us-east-1.amazonaws.com"
        }
      }
    }
  }
}

Why this is better:

  • No VPN complexity
  • Standard HTTPS encryption (TLS 1.3)
  • Cloud-native authentication (AWS Signature v4, GCP OAuth2)
  • Pay per API request, not 24/7 VPN tunnel uptime
  • Easier debugging (standard HTTP tools work)

Provider-Agnostic Modules: Abstraction vs. Reality

You can write modules that abstract away cloud provider differences. Should you?

The abstraction:

# modules/compute_instance/main.tf
variable "cloud_provider" {
  type = string
  validation {
    condition     = contains(["aws", "gcp", "azure"], var.cloud_provider)
    error_message = "Provider must be aws, gcp, or azure."
  }
}

variable "instance_name" {
  type = string
}

variable "instance_size" {
  type = string
}

# Conditional resources based on provider
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "this" {
  count         = var.cloud_provider == "aws" ? 1 : 0
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_size

  tags = {
    Name = var.instance_name
  }
}

resource "google_compute_instance" "this" {
  count        = var.cloud_provider == "gcp" ? 1 : 0
  name         = var.instance_name
  machine_type = var.instance_size
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = "default"
  }
}

# Outputs that work across providers
# (the Azure resource follows the same count pattern and is omitted
# above, so only the aws and gcp branches are shown here)
output "instance_id" {
  value = var.cloud_provider == "aws" ? aws_instance.this[0].id : google_compute_instance.this[0].id
}

Usage:

module "web_aws" {
  source = "./modules/compute_instance"

  cloud_provider = "aws"
  instance_name  = "web-server"
  instance_size  = "t3.micro"
}

module "web_gcp" {
  source = "./modules/compute_instance"

  cloud_provider = "gcp"
  instance_name  = "web-server"
  instance_size  = "e2-micro"
}

The trade-offs:

Pros:

  • Single interface for multiple clouds
  • Easier to migrate between providers (in theory)
  • Consistent configuration patterns

Cons:

  • Abstracts away cloud-specific features (you lose flexibility)
  • More complex module logic (conditional resources everywhere)
  • Maintenance burden when providers change their APIs
  • You're building your own mini-Terraform on top of Terraform

Honest recommendation: Only build provider-agnostic modules if you're actually deploying the same workload to multiple clouds regularly. Otherwise, you're solving a problem you don't have.
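
If all you really need is a handful of per-cloud equivalents rather than fully abstracted resources, a lookup map is a much lighter-weight middle ground - a sketch with illustrative sizes:

```hcl
# Map per-cloud equivalents once, reference them everywhere,
# and keep each cloud's resources written natively
locals {
  instance_size = {
    aws   = "t3.micro"
    gcp   = "e2-micro"
    azure = "Standard_B1s"
  }
}

# In a native aws_instance / google_compute_instance / azurerm_* resource:
#   instance_type = local.instance_size[var.cloud_provider]
```

You keep the full expressiveness of each provider while centralizing the one thing that actually varies.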

Checkpoint Questions

Before moving to Part 9, ensure you understand:

  1. What are three legitimate business reasons to go multi-cloud? (M&A, geographic compliance, best-of-breed services)

  2. What's the difference between multi-cloud and multi-region? (Multi-cloud = different providers; multi-region = same provider, different locations)

  3. How do you configure Terraform to use multiple cloud providers? (Multiple entries in required_providers block, multiple provider configurations)

  4. What is a provider alias and when do you use one? (Multiple instances of same provider for different regions/accounts)

  5. Compare VM creation: How does creating a VM differ between AWS, GCP, and Azure? (Different resource names, configuration styles, required fields)

  6. Why is cross-cloud networking complex? (Different VPC models, egress fees, routing complexity, operational overhead)

  7. What's a simpler alternative to VPN-based cross-cloud networking? (Public APIs with authentication)

  8. When should you avoid multi-cloud? (Single cloud meets requirements, team lacks multi-cloud expertise, no clear business case, early-stage company)

Checkpoint!

Multi-cloud is a tool, not a religion. Use it when business requirements demand it - mergers, compliance, geographic reach, or specific cloud services. Avoid it when single-cloud simplicity serves you better. The best architecture is the one your team can operate reliably at 3 AM on a Sunday.

What's Next: Part 9 - Terraform Backends & Remote State

You've been running terraform apply on your laptop. That's fine for learning.

But what happens when Sarah runs terraform apply on her laptop at the same time you do?

State conflicts. Infrastructure corruption. Production outages. Panic.

In Part 9, we'll solve team collaboration with remote backends:

  • Why local state files break teams (and how to fix it)
  • Configuring remote backends (S3, GCS, Azure Storage)
  • State locking with DynamoDB/Cloud Storage (preventing simultaneous applies)
  • Migrating existing state to remote backends (without destroying everything)
  • Workspace strategies for multi-environment management

The problem:

Developer A: terraform apply (starts)
Developer B (10 seconds later): terraform apply (starts)
Result: State file corruption! Who created what? Nobody knows!

The solution: Remote state with locking. See you in Part 9.


This post is part of the "Terraform from Fundamentals to Production" series. Follow along to master Infrastructure as Code with Terraform.