Table of Contents
- Multi-Cloud: When One Cloud Isn't Enough (And Why That's Usually Fine)
- 📦 Code Examples
- The Multi-Cloud Reality Check Nobody Talks About
- The Foundation: Configuring Multiple Providers
- The Cloud Rosetta Stone: Same Thing, Different Names
- Virtual Machines: Three Ways to Rent a Computer
- Object Storage: Buckets All the Way Down
- Managed Databases: PostgreSQL Three Ways
- When Multi-Cloud Actually Makes Sense
- Scenario 1: Geographic Coverage
- Scenario 2: Best-of-Breed Services
- Scenario 3: Disaster Recovery Across Clouds
- Provider Aliases: Multiple Instances of the Same Cloud
- Cross-Cloud Networking: Where Dreams Go to Die
- Provider-Agnostic Modules: Abstraction vs. Reality
- Checkpoint Questions
- What's Next: Part 9 - Terraform Backends & Remote State
Multi-Cloud: When One Cloud Isn't Enough (And Why That's Usually Fine)
You've mastered Terraform on a single cloud. Now every conference talk and LinkedIn post is screaming: "Go multi-cloud! Avoid vendor lock-in! Future-proof your architecture!"
Here's the truth nobody wants to admit: Multi-cloud is usually overkill.
But sometimes? It's absolutely necessary. Your company just acquired another company running on a different cloud. You need specific regional compliance that only Azure offers. Or maybe you're genuinely building the next Netflix and need geographic coverage across every cloud provider on Earth.
This tutorial teaches you how to implement multi-cloud Terraform, when it actually makes sense, and when to politely decline the complexity explosion.
📦 Code Examples
Repository: terraform-hcl-tutorial-series
This Part: Part 8 - Multi-Cloud Patterns
Get the working example:
git clone https://github.com/khuongdo/terraform-hcl-tutorial-series.git
cd terraform-hcl-tutorial-series
git checkout part-08
cd examples/part-08-multi-cloud/
# Compare AWS, GCP, and Azure patterns
terraform init
terraform plan
The Multi-Cloud Reality Check Nobody Talks About
Let's start with brutal honesty.
Why Companies Actually Go Multi-Cloud
Legitimate reasons that justify the pain:
1. Mergers and Acquisitions Company A runs everything on AWS. Company B runs everything on GCP. Congratulations on your acquisition! You now have a multi-cloud architecture whether you wanted one or not.
2. Geographic Compliance You need data residency in Germany, but AWS doesn't have the specific compliance certifications you need. Azure does. Boom - you're multi-cloud.
3. Negotiation Leverage When your AWS bill hits 7 figures annually, the ability to credibly say "we can migrate to GCP" during contract negotiations has real financial value.
4. Best-of-Breed Services GCP's BigQuery genuinely beats AWS Redshift for certain analytics workloads. AWS Lambda has the most mature serverless ecosystem. Azure Active Directory integrates better with your enterprise Windows environment. Sometimes you need all three.
5. True Disaster Recovery AWS us-east-1 going down is rare, but it happens. If 6 hours of downtime costs you $10 million, cross-cloud failover makes financial sense.
The Bad Reasons (Most of Them)
"We need to avoid vendor lock-in" You're already locked into Terraform, Kubernetes, Docker, PostgreSQL, and a dozen other technologies. Your cloud provider is the least of your lock-in concerns. Also, migrating clouds is incredibly expensive - you won't do it unless forced to.
"What if AWS shuts down our account?" If AWS terminates your account, you have bigger problems than multi-cloud architecture. Focus on compliance and ToS adherence instead.
"It looks impressive on my resume" Cool. Your company is now paying 3-5x operational overhead so you can pad your LinkedIn. Not a good trade.
"We want flexibility" Flexibility to do what, exactly? Migrate your entire production infrastructure mid-quarter? That's not flexibility, that's chaos.
Multi-cloud multiplies your operational complexity by 3-5x. You need expertise in multiple cloud consoles, billing systems, IAM models, and networking paradigms. Every engineer needs to context-switch between three different ways of doing the same thing. Only proceed if the business value clearly justifies this cost.
The Foundation: Configuring Multiple Providers
If you've decided multi-cloud is worth it (or it was decided for you), here's how to configure Terraform to work with multiple clouds.
The good news? Terraform makes the configuration part straightforward.
Basic Multi-Provider Setup
# main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.0"
}
}
}
# Provider configurations
provider "aws" {
region = "us-east-1"
}
provider "google" {
project = "my-gcp-project"
region = "us-central1"
}
provider "azurerm" {
features {} # Required empty block (yes, really)
}
Notice how each provider has its own configuration quirks:
- AWS wants a region
- GCP wants a project AND a region
- Azure demands a features {} block, even if it's empty
Welcome to multi-cloud! Everything is almost the same but subtly different in ways that will frustrate you at 11 PM.
Authentication: Three Different Ways to Say "Log In"
AWS uses the standard credential chain:
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
Or use ~/.aws/credentials, or IAM roles if you're running on EC2, or OIDC if you're in CI/CD. Standard AWS stuff.
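If you keep named profiles in ~/.aws/credentials, you can also point the provider at one explicitly. A quick sketch (the profile name is illustrative):
provider "aws" {
  region  = "us-east-1"
  profile = "prod" # Named profile from ~/.aws/credentials
}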
GCP uses Application Default Credentials:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
Or use gcloud auth application-default login for local development. GCP has its own way of doing things.
Azure uses the Azure CLI or service principals:
az login # Interactive authentication
Or for automation:
export ARM_CLIENT_ID="your-client-id"
export ARM_CLIENT_SECRET="your-client-secret"
export ARM_TENANT_ID="your-tenant-id"
export ARM_SUBSCRIPTION_ID="your-subscription-id"
NEVER hardcode credentials in Terraform files. Use environment variables, cloud-native secret managers (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault), or CI/CD pipeline secrets. Treat credentials like nuclear launch codes - because to your infrastructure, they basically are.
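When a secret does have to flow through Terraform (a database password, say), declare it as a sensitive variable and feed it from the environment instead of a committed .tfvars file. A minimal sketch - the variable name matches the var.db_password used in the database examples later:
variable "db_password" {
  description = "Master password for the application database"
  type        = string
  sensitive   = true # Redacted from plan and apply output
}
Set it with export TF_VAR_db_password=... in your shell or from your CI secret store. Note the value still ends up in the state file - one reason Part 11 covers secrets management in depth.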
The Cloud Rosetta Stone: Same Thing, Different Names
Every cloud does the same things - virtual machines, object storage, databases - but with completely different resource names and configuration styles.
Here's your decoder ring.
Virtual Machines: Three Ways to Rent a Computer
AWS EC2 Instance:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0" # Ubuntu 22.04
instance_type = "t3.micro"
tags = {
Name = "web-server"
Environment = "production"
}
}
Short and sweet. AWS pioneered cloud computing, so their API is relatively straightforward.
GCP Compute Engine:
resource "google_compute_instance" "web" {
name = "web-server"
machine_type = "e2-micro"
zone = "us-central1-a"
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2204-lts"
}
}
network_interface {
network = "default"
access_config {} # Ephemeral public IP
}
labels = {
environment = "production"
}
}
More verbose. GCP requires explicit boot disk and network configuration. Also, they call tags "labels" because consistency is overrated.
Azure Virtual Machine:
resource "azurerm_resource_group" "main" {
name = "web-resources"
location = "East US"
}
resource "azurerm_linux_virtual_machine" "web" {
name = "web-server"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
size = "Standard_B1s"
admin_username = "adminuser"
admin_ssh_key {
username = "adminuser"
public_key = file("~/.ssh/id_rsa.pub")
}
network_interface_ids = [
azurerm_network_interface.main.id,
]
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts"
version = "latest"
}
tags = {
environment = "production"
}
}
Azure takes verbosity to new heights. You need a resource group (a container for other resources), a network interface (separate resource), and an image reference with publisher/offer/SKU. Buckle up.
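For the Azure example to actually plan, the network interface it references needs its own resource chain (virtual network → subnet → NIC). A minimal sketch of those missing pieces, with illustrative names and CIDR ranges:
resource "azurerm_virtual_network" "main" {
  name                = "web-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_subnet" "main" {
  name                 = "web-subnet"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.0.1.0/24"]
}
resource "azurerm_network_interface" "main" {
  name                = "web-nic"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.main.id
    private_ip_address_allocation = "Dynamic"
  }
}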
Quick comparison:
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Resource | aws_instance | google_compute_instance | azurerm_linux_virtual_machine |
| Image | AMI ID | Image family/project | Publisher/Offer/SKU |
| Size | t3.micro | e2-micro | Standard_B1s |
| Location | Region (implicit) | Zone (explicit) | Location (via resource group) |
| Metadata | Tags | Labels | Tags |
Same concept. Completely different execution.
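If you deploy equivalent workloads to more than one cloud, a locals lookup table keeps the size mapping in one place instead of scattering magic strings through every resource. A sketch - the equivalences below are approximate, not exact:
locals {
  # Roughly equivalent "small" instance sizes per cloud
  small_instance = {
    aws   = "t3.micro"
    gcp   = "e2-micro"
    azure = "Standard_B1s"
  }
}
# Then, in each resource:
#   instance_type = local.small_instance["aws"]   # AWS
#   machine_type  = local.small_instance["gcp"]   # GCP
#   size          = local.small_instance["azure"] # Azure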
Object Storage: Buckets All the Way Down
AWS S3:
resource "aws_s3_bucket" "data" {
bucket = "my-company-data-bucket"
tags = {
Purpose = "analytics"
}
}
resource "aws_s3_bucket_versioning" "data" {
bucket = aws_s3_bucket.data.id
versioning_configuration {
status = "Enabled"
}
}
Versioning is a separate resource as of AWS provider v4. Because reasons.
GCP Cloud Storage:
resource "google_storage_bucket" "data" {
name = "my-company-data-bucket"
location = "US"
versioning {
enabled = true
}
labels = {
purpose = "analytics"
}
}
Cleaner. Versioning is inline. Labels instead of tags.
Azure Blob Storage:
resource "azurerm_storage_account" "data" {
name = "mycompanydatastorage" # Lowercase, globally unique
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
account_tier = "Standard"
account_replication_type = "LRS"
tags = {
purpose = "analytics"
}
}
resource "azurerm_storage_container" "data" {
name = "data-bucket"
storage_account_name = azurerm_storage_account.data.name
container_access_type = "private"
}
Azure uses a two-level hierarchy: Storage Account (parent) → Container (child). The storage account name must be globally unique, lowercase, and can't have hyphens. Enjoy debugging that validation error.
Key differences:
- AWS: Bucket name globally unique, versioning is separate resource
- GCP: Bucket name globally unique, versioning inline
- Azure: Storage Account + Container model, strict naming rules
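Since all three namespaces are global, name collisions are a fact of life. One hedged pattern: generate a random suffix with the hashicorp/random provider, stripping hyphens and uppercase for the Azure storage account. Names here are illustrative:
resource "random_string" "suffix" {
  length  = 6
  special = false
  upper   = false
}
resource "aws_s3_bucket" "unique" {
  bucket = "my-company-data-${random_string.suffix.result}"
}
resource "azurerm_storage_account" "unique" {
  name                     = "mycompanydata${random_string.suffix.result}" # No hyphens, lowercase only
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}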
Managed Databases: PostgreSQL Three Ways
AWS RDS:
resource "aws_db_instance" "postgres" {
identifier = "app-database"
engine = "postgres"
engine_version = "15.3"
instance_class = "db.t3.micro"
allocated_storage = 20
storage_type = "gp3"
db_name = "appdb"
username = "dbadmin"
password = var.db_password # NEVER hardcode this
skip_final_snapshot = true
tags = {
Application = "web-app"
}
}
GCP Cloud SQL:
resource "google_sql_database_instance" "postgres" {
name = "app-database"
database_version = "POSTGRES_15"
region = "us-central1"
settings {
tier = "db-f1-micro"
database_flags {
name = "max_connections"
value = "100"
}
}
deletion_protection = false # Set to true in production!
}
resource "google_sql_database" "app" {
name = "appdb"
instance = google_sql_database_instance.postgres.name
}
resource "google_sql_user" "admin" {
name = "dbadmin"
instance = google_sql_database_instance.postgres.name
password = var.db_password
}
GCP requires separate resources for the database and user. AWS bundles it together.
Azure Database for PostgreSQL:
resource "azurerm_postgresql_flexible_server" "postgres" {
name = "app-database-server"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
version = "15"
administrator_login = "dbadmin"
administrator_password = var.db_password
sku_name = "B_Standard_B1ms"
storage_mb = 32768
tags = {
application = "web-app"
}
}
resource "azurerm_postgresql_flexible_server_database" "app" {
name = "appdb"
server_id = azurerm_postgresql_flexible_server.postgres.id
}
Azure also separates the server and database. Notice the SKU naming convention - completely different from AWS/GCP.
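All three examples take var.db_password. Instead of typing it into a tfvars file, you can pull it from a cloud secret manager at plan time. A sketch using AWS Secrets Manager - the secret name is hypothetical, and the value still lands in state, so protect your state file:
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/app/db-password" # Hypothetical secret name
}
# Reference it in place of var.db_password:
#   password = data.aws_secretsmanager_secret_version.db_password.secret_string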
When Multi-Cloud Actually Makes Sense
Use this decision framework before committing to multi-cloud complexity:
Do you have multiple cloud providers TODAY?
├─ No → Stick with single cloud. Add second cloud only with clear business case.
└─ Yes → Continue
Is multi-cloud mandatory (merger, compliance, existing contracts)?
├─ Yes → Multi-cloud is required. Optimize for operational simplicity.
└─ No → Continue
Can you consolidate to single cloud within 12 months?
├─ Yes → Create migration plan. Don't invest in multi-cloud abstraction.
└─ No → Embrace multi-cloud. Build for the long haul.
Scenario 1: Geographic Coverage
You're building a global CDN and need presence in 50+ regions worldwide.
# AWS dominates North America and Europe
provider "aws" {
alias = "us_east"
region = "us-east-1"
}
provider "aws" {
alias = "eu_west"
region = "eu-west-1"
}
# GCP strong in Asia-Pacific
provider "google" {
alias = "asia"
region = "asia-southeast1"
}
# Azure for government and compliance regions
provider "azurerm" {
alias = "gov"
environment = "usgovernment"
}
Valid use case. Each cloud has different regional coverage.
Scenario 2: Best-of-Breed Services
# GCP for machine learning and analytics
resource "google_bigquery_dataset" "analytics" {
dataset_id = "web_analytics"
location = "US"
}
# AWS for mature compute ecosystem
resource "aws_ecs_cluster" "app" {
name = "production-app"
}
# Azure for enterprise Windows integration
resource "azurerm_active_directory_domain_service" "corp" {
name = "corp-domain"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
# ... configuration
}
Choosing the best tool for each job. Defensible if you have the team to support it.
Scenario 3: Disaster Recovery Across Clouds
# Primary application on GCP
resource "google_compute_instance" "app_primary" {
name = "app-primary"
machine_type = "n2-standard-4"
zone = "us-central1-a"
# ... configuration
}
# Failover to AWS if GCP region fails
resource "aws_instance" "app_failover" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.xlarge"
# ... configuration
}
# DNS failover using Route53
resource "aws_route53_health_check" "gcp_primary" {
fqdn = google_compute_instance.app_primary.network_interface[0].access_config[0].nat_ip
type = "HTTPS"
}
Expensive but effective for mission-critical systems where downtime costs millions.
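The health check alone doesn't move any traffic - you pair it with failover routing records. A minimal sketch, assuming a Route53 hosted zone (aws_route53_zone.main here is an assumption) and the instances defined above:
resource "aws_route53_record" "primary" {
  zone_id         = aws_route53_zone.main.zone_id
  name            = "app.example.com"
  type            = "A"
  ttl             = 60
  records         = [google_compute_instance.app_primary.network_interface[0].access_config[0].nat_ip]
  set_identifier  = "gcp-primary"
  health_check_id = aws_route53_health_check.gcp_primary.id
  failover_routing_policy {
    type = "PRIMARY"
  }
}
resource "aws_route53_record" "failover" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  records        = [aws_instance.app_failover.public_ip]
  set_identifier = "aws-failover"
  failover_routing_policy {
    type = "SECONDARY"
  }
}
When the health check fails, Route53 starts answering with the SECONDARY record.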
Provider Aliases: Multiple Instances of the Same Cloud
Sometimes you need multiple configurations of the same provider - deploying to multiple AWS regions, for example.
Multi-Region AWS Deployment
# providers.tf
provider "aws" {
region = "us-east-1"
alias = "us_east"
}
provider "aws" {
region = "eu-west-1"
alias = "eu_west"
}
provider "aws" {
region = "ap-southeast-1"
alias = "asia"
}
# main.tf
resource "aws_s3_bucket" "us_data" {
provider = aws.us_east
bucket = "my-app-us-data"
}
resource "aws_s3_bucket" "eu_data" {
provider = aws.eu_west
bucket = "my-app-eu-data"
}
resource "aws_s3_bucket" "asia_data" {
provider = aws.asia
bucket = "my-app-asia-data"
}
Critical rule: Resources without an explicit provider argument use the default provider (the one configured without an alias). Be explicit to avoid surprises.
Multi-Account AWS Strategy
provider "aws" {
region = "us-east-1"
# Default: production account credentials
}
provider "aws" {
alias = "dev"
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::111111111111:role/TerraformDev"
}
}
provider "aws" {
alias = "staging"
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::222222222222:role/TerraformStaging"
}
}
# Deploy to dev account
resource "aws_instance" "dev_web" {
provider = aws.dev
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
}
# Deploy to production account
resource "aws_instance" "prod_web" {
# Uses default provider (production account)
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.large"
}
This is actually more common than multi-cloud - separating environments by AWS accounts.
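Aliased providers compose with modules too: the caller maps an alias into the module through the providers argument, so one module definition can deploy to any account. A sketch with a hypothetical ./modules/web module:
module "web_dev" {
  source = "./modules/web"
  providers = {
    aws = aws.dev # Inside the module, "aws" means the dev account
  }
}
module "web_prod" {
  source = "./modules/web"
  # No providers block: the module inherits the default (production) provider
}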
Cross-Cloud Networking: Where Dreams Go to Die
You want your AWS VPC to talk to your GCP VPC? Buckle up, this is where multi-cloud gets painful.
Each cloud has different VPC models, IP addressing schemes, routing paradigms, and peering mechanisms. And the data egress fees? Brutal.
VPN Between AWS and GCP (The Hard Way)
AWS side:
resource "aws_vpn_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "aws-to-gcp-vpn"
}
}
resource "aws_customer_gateway" "gcp" {
bgp_asn = 65000
ip_address = google_compute_address.vpn.address
type = "ipsec.1"
tags = {
Name = "gcp-gateway"
}
}
resource "aws_vpn_connection" "gcp" {
vpn_gateway_id = aws_vpn_gateway.main.id
customer_gateway_id = aws_customer_gateway.gcp.id
type = "ipsec.1"
static_routes_only = true
}
GCP side:
resource "google_compute_address" "vpn" {
name = "gcp-vpn-ip"
region = "us-central1"
}
resource "google_compute_vpn_gateway" "main" {
name = "gcp-to-aws-vpn"
network = google_compute_network.main.id
region = "us-central1"
}
resource "google_compute_vpn_tunnel" "aws" {
name = "tunnel-to-aws"
peer_ip = aws_vpn_connection.gcp.tunnel1_address
shared_secret = aws_vpn_connection.gcp.tunnel1_preshared_key
target_vpn_gateway = google_compute_vpn_gateway.main.id
local_traffic_selector = ["10.1.0.0/16"]
remote_traffic_selector = ["10.0.0.0/16"]
}
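Because the AWS connection is static_routes_only, neither side knows about the other's CIDR until you add routes explicitly. A sketch of the missing route resources (CIDRs match the traffic selectors above); you would also still need routes in your AWS VPC route tables:
# AWS: send GCP's CIDR over the VPN connection
resource "aws_vpn_connection_route" "to_gcp" {
  destination_cidr_block = "10.1.0.0/16"
  vpn_connection_id      = aws_vpn_connection.gcp.id
}
# GCP: send AWS's CIDR into the tunnel
resource "google_compute_route" "to_aws" {
  name                = "route-to-aws"
  network             = google_compute_network.main.name
  dest_range          = "10.0.0.0/16"
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.aws.id
  priority            = 1000
}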
Cross-cloud VPNs are expensive (data egress fees are eye-watering), operationally complex (asymmetric routing, MTU issues, BGP debugging at 3 AM), and often unnecessary. Use cloud-native public APIs with authentication instead whenever possible.
The Smart Alternative: Public API Integration
Instead of private networking, expose services via authenticated public APIs:
# AWS: Expose API Gateway
resource "aws_api_gateway_rest_api" "app" {
name = "app-api"
}
# GCP: Call AWS API from Cloud Run
resource "google_cloud_run_service" "worker" {
name = "data-processor"
location = "us-central1"
template {
spec {
containers {
image = "gcr.io/my-project/worker:latest"
env {
name = "AWS_API_URL"
value = "https://${aws_api_gateway_rest_api.app.id}.execute-api.us-east-1.amazonaws.com"
}
}
}
}
}
Why this is better:
- No VPN complexity
- Standard HTTPS encryption (TLS 1.3)
- Cloud-native authentication (AWS Signature v4, GCP OAuth2)
- Pay per API request, not 24/7 VPN tunnel uptime
- Easier debugging (standard HTTP tools work)
Provider-Agnostic Modules: Abstraction vs. Reality
You can write modules that abstract away cloud provider differences. Should you?
The abstraction:
# modules/compute_instance/main.tf
variable "cloud_provider" {
type = string
validation {
condition = contains(["aws", "gcp", "azure"], var.cloud_provider)
error_message = "Provider must be aws, gcp, or azure"
}
}
variable "instance_name" {
type = string
}
variable "instance_size" {
type = string
}
# Conditional resources based on provider
resource "aws_instance" "this" {
count = var.cloud_provider == "aws" ? 1 : 0
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_size
tags = {
Name = var.instance_name
}
}
resource "google_compute_instance" "this" {
count = var.cloud_provider == "gcp" ? 1 : 0
name = var.instance_name
machine_type = var.instance_size
zone = "us-central1-a"
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2204-lts"
}
}
network_interface {
network = "default"
}
}
# (An azurerm_linux_virtual_machine "this" block, gated on
# var.cloud_provider == "azure", is omitted here for brevity.)
# Outputs that work across providers
output "instance_id" {
value = var.cloud_provider == "aws" ? aws_instance.this[0].id : (
var.cloud_provider == "gcp" ? google_compute_instance.this[0].id :
azurerm_linux_virtual_machine.this[0].id
)
}
Usage:
module "web_aws" {
source = "./modules/compute_instance"
cloud_provider = "aws"
instance_name = "web-server"
instance_size = "t3.micro"
}
module "web_gcp" {
source = "./modules/compute_instance"
cloud_provider = "gcp"
instance_name = "web-server"
instance_size = "e2-micro"
}
The trade-offs:
Pros:
- Single interface for multiple clouds
- Easier to migrate between providers (in theory)
- Consistent configuration patterns
Cons:
- Abstracts away cloud-specific features (you lose flexibility)
- More complex module logic (conditional resources everywhere)
- Maintenance burden when providers change their APIs
- You're building your own mini-Terraform on top of Terraform
Honest recommendation: Only build provider-agnostic modules if you're actually deploying the same workload to multiple clouds regularly. Otherwise, you're solving a problem you don't have.
Checkpoint Questions
Before moving to Part 9, ensure you understand:
1. What are three legitimate business reasons to go multi-cloud? (M&A, geographic compliance, best-of-breed services)
2. What's the difference between multi-cloud and multi-region? (Multi-cloud = different providers; multi-region = same provider, different locations)
3. How do you configure Terraform to use multiple cloud providers? (Multiple entries in the required_providers block, multiple provider configurations)
4. What is a provider alias and when do you use one? (Multiple instances of the same provider for different regions/accounts)
5. Compare VM creation: How does creating a VM differ between AWS, GCP, and Azure? (Different resource names, configuration styles, required fields)
6. Why is cross-cloud networking complex? (Different VPC models, egress fees, routing complexity, operational overhead)
7. What's a simpler alternative to VPN-based cross-cloud networking? (Public APIs with authentication)
8. When should you avoid multi-cloud? (Single cloud meets requirements, team lacks multi-cloud expertise, no clear business case, early-stage company)
Multi-cloud is a tool, not a religion. Use it when business requirements demand it - mergers, compliance, geographic reach, or specific cloud services. Avoid it when single-cloud simplicity serves you better. The best architecture is the one your team can operate reliably at 3 AM on a Sunday.
What's Next: Part 9 - Terraform Backends & Remote State
You've been running terraform apply on your laptop. That's fine for learning.
But what happens when Sarah runs terraform apply on her laptop at the same time you do?
State conflicts. Infrastructure corruption. Production outages. Panic.
In Part 9, we'll solve team collaboration with remote backends:
- Why local state files break teams (and how to fix it)
- Configuring remote backends (S3, GCS, Azure Storage)
- State locking with DynamoDB/Cloud Storage (preventing simultaneous applies)
- Migrating existing state to remote backends (without destroying everything)
- Workspace strategies for multi-environment management
The problem:
Developer A: terraform apply (starts)
Developer B (10 seconds later): terraform apply (starts)
Result: State file corruption! Who created what? Nobody knows!
The solution: Remote state with locking. See you in Part 9.
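If you want a taste before then, the fix is only a few lines of configuration. A preview sketch with illustrative bucket and lock-table names - Part 9 walks through it properly:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state" # Illustrative bucket name
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks" # Enables state locking
  }
}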
Series navigation:
- Part 1: Why Infrastructure as Code?
- Part 2: Setting Up Terraform
- Part 3: Your First Cloud Resource
- Part 4: HCL Fundamentals
- Part 5: Variables, Outputs & State
- Part 6: Core Terraform Workflow
- Part 7: Modules for Organization
- Part 8: Multi-Cloud Patterns (You are here)
- Part 9: State Management & Team Workflows
- Part 10: Testing & Validation
- Part 11: Security & Secrets Management
- Part 12: Production Patterns & DevSecOps
This post is part of the "Terraform from Fundamentals to Production" series. Follow along to master Infrastructure as Code with Terraform.