
Production-Ready Terraform: The Series Finale

You've made it. Part 12. The finish line.

You started learning Terraform basics 11 parts ago. Now you're building reusable modules, deploying multi-cloud infrastructure, and writing comprehensive tests. You're doing things most engineers never bother to learn.

But here's the reality: Everything you've learned so far gets you to 80%. This final part? This is the 20% that separates hobbyists from professionals.

Production infrastructure doesn't fail gracefully. It fails at 3 AM when your monitoring wakes you up. It fails when someone accidentally runs terraform destroy on the wrong workspace. It fails when AWS us-east-1 goes down for the third time this year.

This part is about making your infrastructure bulletproof. Not perfect (nothing is perfect) but resilient enough to survive the chaos of real-world production.

Let's finish this series strong.

📦 Code Examples

Repository: terraform-hcl-tutorial-series
This Part: Part 12 - Production Patterns

Get the working example:

git clone https://github.com/khuongdo/terraform-hcl-tutorial-series.git
cd terraform-hcl-tutorial-series
git checkout part-12
cd examples/part-12-production/

# Explore production-ready patterns
terraform init
terraform plan

What "Production-Ready" Actually Means

Stop me if you've seen this before:

The "Works On My Machine" Infrastructure:

  • Manual terraform apply from your laptop
  • No code review process
  • State file living on your local disk
  • Secrets hardcoded in variables
  • Zero monitoring or drift detection
  • Changes deployed straight to production
  • "Disaster recovery plan" = pray nothing breaks

The "I Can Sleep At Night" Infrastructure:

  • Automated CI/CD with approval gates
  • Every change code-reviewed and tested
  • Remote state with encryption and locking
  • Secrets managed via Vault or OIDC
  • Comprehensive drift detection and alerting
  • Blast radius containment strategies
  • Tested disaster recovery playbooks

The difference? The second one doesn't wake you up at 3 AM because someone accidentally nuked your database.

Your Production Readiness Checklist

Before you deploy infrastructure that matters, check these boxes:

State Management:

  • Remote backend with encryption enabled
  • State locking configured (DynamoDB/Consul)
  • State bucket versioning enabled
  • Cross-region replication for state backups
  • Separate state files per environment
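
For reference, a minimal S3 backend that covers the first three boxes (bucket and table names are placeholders):

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"            # versioning + replication enabled on this bucket
    key            = "production/terraform.tfstate"  # separate key per environment
    region         = "us-east-1"
    encrypt        = true                            # server-side encryption at rest
    dynamodb_table = "terraform-locks"               # state locking
  }
}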

Security:

  • No hardcoded credentials anywhere
  • Secrets managed via Vault/SOPS/OIDC
  • Security scanning in CI/CD (tfsec/Checkov/Trivy)
  • Policy enforcement with OPA or Sentinel
  • All S3 buckets encrypted
  • All EBS volumes encrypted
  • IMDSv2 enforced on EC2 instances

CI/CD Pipeline:

  • Automated plan on pull requests
  • Manual approval gate for production
  • Plan artifacts saved for review
  • Security scans fail the build
  • Notifications on success/failure
  • Rollback procedures documented

Disaster Recovery:

  • Recovery playbook written and tested
  • State restoration tested quarterly
  • Multi-region failover configured (if needed)
  • RTO and RPO defined and measured
  • Chaos engineering drills scheduled

Cost & Compliance:

  • All resources tagged with Owner/CostCenter
  • Budget alerts configured
  • Non-prod resources auto-shutdown enabled
  • RBAC configured for terraform operations
  • Compliance framework validated (CIS/SOC2/HIPAA)

If you can't check every box, you're not ready. And that's okay; just be honest about your risk.
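
One of the quickest boxes to tick is budget alerts. A minimal sketch (the limit and email are placeholders):

resource "aws_budgets_budget" "monthly" {
  name         = "monthly-infra-budget"
  budget_type  = "COST"
  limit_amount = "1000"    # placeholder monthly limit
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80            # alert at 80% of forecasted spend
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["team@example.com"]  # placeholder
  }
}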

Building A Bulletproof CI/CD Pipeline

Manual terraform apply doesn't scale. Here's how to automate safely.

The Pipeline Architecture

Every production Terraform deployment should flow through this gauntlet:

┌─────────────┐
│  Git Push   │
└──────┬──────┘
       │
       v
┌──────────────────┐
│ Format & Validate│  ← Catch syntax errors fast
└──────┬───────────┘
       │
       v
┌──────────────────┐
│ Security Scanning│  ← tfsec/Trivy/Checkov
│  Policy Checks   │  ← OPA/Conftest
└──────┬───────────┘
       │
       v
┌──────────────────┐
│ terraform plan   │  ← Generate execution plan
│ Save artifact    │  ← For approval review
└──────┬───────────┘
       │
       v
┌──────────────────┐
│ Manual Approval  │  ← Human gate for prod
│ (Prod Only)      │
└──────┬───────────┘
       │
       v
┌──────────────────┐
│ terraform apply  │  ← Execute approved plan
│ Notify team      │  ← Slack/Teams/Email
└──────────────────┘

The golden rule: No human touches production manually. Ever.

Every change, every single one, goes through this pipeline. No exceptions. Not even for "quick fixes."

GitHub Actions: The Complete Pipeline

Here's a production-grade workflow that actually works:

name: Terraform CI/CD

on:
  pull_request:
    paths:
      - 'terraform/**'
      - '.github/workflows/terraform.yml'
  push:
    branches:
      - main
    paths:
      - 'terraform/**'

env:
  TF_VERSION: '1.5.0'
  WORKING_DIR: './terraform'

jobs:
  validate:
    name: Validate & Lint
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Format Check
        working-directory: ${{ env.WORKING_DIR }}
        run: terraform fmt -check -recursive

      - name: Init (no backend)
        working-directory: ${{ env.WORKING_DIR }}
        run: terraform init -backend=false

      - name: Validate
        working-directory: ${{ env.WORKING_DIR }}
        run: terraform validate

  security:
    name: Security Scanning
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Trivy Scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          scan-ref: ${{ env.WORKING_DIR }}
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: ${{ env.WORKING_DIR }}
          framework: terraform
          soft_fail: false  # Fail build on issues

  policy:
    name: Policy Enforcement
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Conftest
        run: |
          curl -L https://github.com/open-policy-agent/conftest/releases/download/v0.46.0/conftest_0.46.0_Linux_x86_64.tar.gz | tar xz
          sudo mv conftest /usr/local/bin/

      - name: Run Policy Checks
        working-directory: ${{ env.WORKING_DIR }}
        run: conftest test -p ../policies/ *.tf

  plan:
    name: Terraform Plan
    runs-on: ubuntu-latest
    needs: [security, policy]
    permissions:
      contents: read
      id-token: write  # OIDC authentication
      pull-requests: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Init
        working-directory: ${{ env.WORKING_DIR }}
        run: terraform init

      - name: Plan
        id: plan
        working-directory: ${{ env.WORKING_DIR }}
        run: |
          terraform plan -out=tfplan -no-color
          terraform show -no-color tfplan > plan.txt

      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('${{ env.WORKING_DIR }}/plan.txt', 'utf8');
            const output = `#### Terraform Plan 📋
            <details><summary>Show Plan</summary>

            \`\`\`hcl
            ${plan}
            \`\`\`

            </details>`;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

      - name: Save Plan Artifact
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: ${{ env.WORKING_DIR }}/tfplan
          retention-days: 5

  apply:
    name: Terraform Apply
    runs-on: ubuntu-latest
    needs: plan
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment:
      name: production  # Requires manual approval
    permissions:
      contents: read
      id-token: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: tfplan
          path: ${{ env.WORKING_DIR }}

      - name: Init
        working-directory: ${{ env.WORKING_DIR }}
        run: terraform init

      - name: Apply
        working-directory: ${{ env.WORKING_DIR }}
        run: terraform apply tfplan  # a saved plan applies without prompting

      - name: Notify Success
        if: success()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "✅ Terraform deployment succeeded",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Terraform Apply Succeeded*\n${{ github.event.head_commit.message }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

      - name: Notify Failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "❌ Terraform deployment failed",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Terraform Apply Failed*\nCheck <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|workflow logs>"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Why This Pipeline Works

OIDC Authentication: No long-lived AWS credentials in GitHub Secrets. Temporary credentials issued on-demand.
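
On the AWS side, that means an IAM role whose trust policy is scoped to your repository. A sketch, assuming the GitHub OIDC provider is already registered in the account and using placeholder account/repo values:

resource "aws_iam_role" "github_actions" {
  name = "GitHubActionsTerraform"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRoleWithWebIdentity"
      Principal = {
        Federated = "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      }
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only workflows from this repo's main branch may assume the role
          "token.actions.githubusercontent.com:sub" = "repo:khuongdo/terraform-hcl-tutorial-series:ref:refs/heads/main"
        }
      }
    }]
  })
}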

Security-First: Trivy and Checkov run before planning. If misconfigurations exist, the build fails before you waste time.

Policy Enforcement: Conftest validates organizational rules (e.g., "all S3 buckets must be encrypted").

Manual Approval Gate: Production changes require clicking "Approve" in the GitHub UI. No accidents.

Plan Artifacts: The exact plan gets saved, reviewed, then applied. Prevents plan/apply drift.

Team Notifications: Slack alerts on success or failure. No more "did my deployment work?" questions.
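
The pipeline above reacts to pushes; drift detection needs a schedule. A minimal nightly check, reusing the same OIDC role (terraform plan -detailed-exitcode exits 2 when changes exist):

# .github/workflows/drift.yml (sketch)
name: Drift Detection

on:
  schedule:
    - cron: '0 6 * * *'  # daily at 06:00 UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - name: Check for drift
        working-directory: ./terraform
        run: |
          terraform init
          set +e
          terraform plan -detailed-exitcode -no-color
          code=$?
          set -e
          if [ "$code" -eq 2 ]; then
            echo "::warning::Drift detected - review the plan output above"
          elif [ "$code" -ne 0 ]; then
            exit "$code"
          fi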

GitLab CI Alternative

If you're team GitLab:

# .gitlab-ci.yml
variables:
  TF_VERSION: "1.5.0"
  TF_ROOT: ${CI_PROJECT_DIR}/terraform

stages:
  - validate
  - security
  - plan
  - apply

.terraform_base:
  image:
    name: hashicorp/terraform:${TF_VERSION}
    entrypoint: [""]
  before_script:
    - cd ${TF_ROOT}
    - terraform init

validate:
  extends: .terraform_base
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform validate

security_scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy config --severity HIGH,CRITICAL ${TF_ROOT}

policy_check:
  stage: security
  image: openpolicyagent/conftest:latest
  script:
    - conftest test -p policies/ ${TF_ROOT}/*.tf

plan:
  extends: .terraform_base
  stage: plan
  script:
    - terraform plan -out=tfplan
    - terraform show -json tfplan > plan.json
  artifacts:
    paths:
      - ${TF_ROOT}/tfplan
      - ${TF_ROOT}/plan.json
    expire_in: 1 week
  only:
    - merge_requests
    - main

apply:
  extends: .terraform_base
  stage: apply
  script:
    - terraform apply tfplan  # a saved plan applies without prompting
  dependencies:
    - plan
  only:
    - main
  when: manual  # Manual trigger required
  environment:
    name: production

Blast Radius Reduction: Limiting the Damage

When things go wrongβ€”and they willβ€”limit how much can burn.

Strategy 1: Environment Isolation

Bad: Everything in one state file

terraform/
  main.tf  # Manages dev, staging, AND prod
  terraform.tfstate

Run terraform apply and you impact all environments at once. Terrifying.

Good: Separate states per environment

terraform/
  environments/
    dev/
      main.tf
      backend.tf
      terraform.tfvars
    staging/
      main.tf
      backend.tf
      terraform.tfvars
    production/
      main.tf
      backend.tf
      terraform.tfvars

Destroying dev? Production doesn't even notice. This is how you sleep at night.
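
Each environment's backend.tf points at its own state key, so they can never collide (names are placeholders):

# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "production/terraform.tfstate"  # dev/ and staging/ use their own keys
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}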

Strategy 2: Team-Based Separation

Split state by ownership:

terraform/
  networking/     # Platform team owns VPCs, subnets
  databases/      # DBA team owns RDS, backups
  applications/   # App teams own compute, LBs

Benefits:

  • Smaller blast radius (networking changes don't affect databases)
  • Clear ownership boundaries
  • Faster plan/apply cycles
  • Easier code review
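
Stacks split this way still need to share data. A terraform_remote_state sketch that lets the app stack read networking outputs (the state key and output name are assumptions):

# applications/main.tf
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_lb" "app" {
  name    = "app-alb"
  subnets = data.terraform_remote_state.networking.outputs.public_subnet_ids  # assumed output
}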

Strategy 3: Surgical Changes with -target

Need to change one resource without replanning everything?

# Apply only to specific resource
terraform apply -target=aws_instance.web_server

# Plan only specific module
terraform plan -target=module.networking

Warning: Don't abuse -target. If you're using it constantly, your state is too largeβ€”split it up.

Strategy 4: Immutable Infrastructure

Instead of updating resources in place, replace them:

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.medium"

  # Immutable pattern: stand up the replacement before tearing down
  # the old instance, and force a fresh instance every 30 days
  lifecycle {
    create_before_destroy = true

    # replace_triggered_by is a lifecycle argument
    replace_triggered_by = [
      time_rotating.monthly_rotation
    ]
  }
}

resource "time_rotating" "monthly_rotation" {
  rotation_days = 30
}

Immutable infrastructure reduces drift and security vulnerabilities. You're not patching servers; you're replacing them.

Disaster Recovery: Planning For Catastrophe

Your infrastructure will fail. The question is: how fast can you recover?

Disaster Scenario 1: State File Corruption

The Nightmare: Your terraform.tfstate file is corrupted or deleted.

Recovery Steps:

# 1. Restore from backup
aws s3 cp s3://terraform-state-backups/terraform.tfstate.backup ./terraform.tfstate

# 2. Verify state matches reality
terraform plan
# Should show "No changes" if backup is recent

# 3. If outdated, reconcile state with reality
terraform apply -refresh-only
terraform plan

Prevention:

# Enable S3 versioning
resource "aws_s3_bucket_versioning" "state" {
  bucket = "my-terraform-state"

  versioning_configuration {
    status = "Enabled"
  }
}

# Cross-region replication (versioning must be enabled on both buckets)
resource "aws_s3_bucket_replication_configuration" "state" {
  bucket = "my-terraform-state"
  role   = aws_iam_role.replication.arn  # IAM role S3 assumes to replicate (defined elsewhere)

  rule {
    id     = "disaster-recovery"
    status = "Enabled"

    destination {
      bucket        = "arn:aws:s3:::terraform-state-backup-us-west-2"
      storage_class = "GLACIER"
    }
  }
}

Test restoration quarterly. If you haven't tested it, it doesn't work.

Disaster Scenario 2: Accidental Destruction

The Nightmare: Someone ran terraform destroy on production. Everything's gone.

Recovery Steps:

# 1. Restore state from backup IMMEDIATELY
aws s3api list-object-versions \
  --bucket my-terraform-state \
  --prefix production/terraform.tfstate

# 2. Get the version before destruction
aws s3api get-object \
  --bucket my-terraform-state \
  --key production/terraform.tfstate \
  --version-id <VERSION_BEFORE_DESTROY> \
  terraform.tfstate

# 3. Re-apply infrastructure
terraform apply -auto-approve

# 4. Run smoke tests, verify health

Prevention:

# IAM policy preventing destroy operations
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "ec2:TerminateInstances",
        "rds:DeleteDBInstance",
        "s3:DeleteBucket"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Environment": "production"
        }
      }
    }
  ]
}

Also: in Terraform Cloud, disallow destroy plans on production workspaces, and require MFA for production operations.
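
Terraform itself offers one more guardrail: prevent_destroy makes any plan that would delete the resource fail outright.

resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"

  lifecycle {
    prevent_destroy = true  # terraform errors on any plan that would destroy this bucket
  }
}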

Disaster Scenario 3: Cloud Provider Outage

The Nightmare: AWS us-east-1 is down. Your entire infrastructure is unavailable.

Recovery Steps:

# 1. Failover to backup region
cd terraform/environments/production-us-west-2/
terraform apply -auto-approve

# 2. Update DNS (or let Route53 failover do it automatically)

# 3. Monitor backup region capacity

# 4. When primary recovers, fail back

Prevention: Multi-Region Active-Standby

module "primary_region" {
  source = "./modules/app"

  providers = {
    aws = aws.us-east-1
  }

  region     = "us-east-1"
  is_primary = true
}

module "backup_region" {
  source = "./modules/app"

  providers = {
    aws = aws.us-west-2
  }

  region     = "us-west-2"
  is_primary = false
}

# Automated DNS failover
resource "aws_route53_health_check" "primary" {
  fqdn              = module.primary_region.endpoint
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "app" {
  zone_id = aws_route53_zone.main.id
  name    = "app.example.com"
  type    = "A"

  set_identifier = "primary"
  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary.id
  alias {
    name                   = module.primary_region.load_balancer_dns
    zone_id                = module.primary_region.load_balancer_zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "app_backup" {
  zone_id = aws_route53_zone.main.id
  name    = "app.example.com"
  type    = "A"

  set_identifier = "backup"
  failover_routing_policy {
    type = "SECONDARY"
  }

  alias {
    name                   = module.backup_region.load_balancer_dns
    zone_id                = module.backup_region.load_balancer_zone_id
    evaluate_target_health = true
  }
}
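
The aws.us-east-1 and aws.us-west-2 references above assume aliased providers declared at the root:

provider "aws" {
  region = "us-east-1"
  alias  = "us-east-1"
}

provider "aws" {
  region = "us-west-2"
  alias  = "us-west-2"
}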

Quarterly Disaster Recovery Drills

Schedule chaos engineering exercises:

Q1: State File Recovery

  • Delete state file intentionally
  • Restore from backup
  • Verify infrastructure matches
  • Measure recovery time

Q2: Resource Deletion Recovery

  • Manually delete critical resource via console
  • Run terraform apply to recreate
  • Verify application recovery
  • Document gaps in the process

Q3: Region Failover

  • Simulate us-east-1 outage
  • Execute multi-region failover
  • Measure RTO (Recovery Time Objective)
  • Identify bottlenecks

Q4: Complete Rebuild

  • Destroy all resources
  • Rebuild from code + state backup
  • Measure RPO (Recovery Point Objective)
  • Update runbooks based on findings

If you haven't tested your disaster recovery plan, you don't have a disaster recovery plan.

Cost Optimization: Taming The Cloud Bill

Cloud costs spiral without discipline. Here's how Terraform helps.

Tag Everything For Cost Attribution

locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    CostCenter  = var.cost_center
    Owner       = var.owner_email
    Project     = var.project_name
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type

  tags = merge(
    local.common_tags,
    {
      Name = "web-server-${var.environment}"
    }
  )
}

# Enforce tagging via policy
resource "aws_organizations_policy" "require_tags" {
  name = "RequireTags"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Deny"
        Action = ["ec2:RunInstances"]
        Resource = ["*"]
        Condition = {
          "Null" = {
            "aws:RequestTag/CostCenter" = "true"
            "aws:RequestTag/Owner"      = "true"
          }
        }
      }
    ]
  })
}

Why this matters: You can't optimize costs you can't attribute. Tags let you answer "Who's spending $50k/month on EC2?"
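
On the AWS provider, default_tags applies the same map to every taggable resource automatically, so nobody can forget the merge above:

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = local.common_tags  # stamped onto every taggable resource
  }
}

Resource-level tags still win on key collisions, so the merge pattern above stays useful for per-resource Name tags.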

Right-Size Resources By Environment

locals {
  # Cost-optimized sizing
  instance_types = {
    dev     = "t3.micro"   # ~$7/month
    staging = "t3.small"   # ~$15/month
    prod    = "t3.large"   # ~$60/month
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = local.instance_types[var.environment]

  # Prevent over-provisioning
  lifecycle {
    precondition {
      condition     = contains(keys(local.instance_types), var.environment)
      error_message = "Invalid environment. Must be dev, staging, or prod."
    }
  }
}

Stop paying production prices for dev environments.

Automated Resource Scheduling

Shut down non-production resources nights and weekends:

resource "aws_autoscaling_schedule" "scale_down_evening" {
  count                  = var.environment != "prod" ? 1 : 0
  scheduled_action_name  = "scale-down-evening"
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
  recurrence             = "0 20 * * MON-FRI"  # 8 PM weekdays
  autoscaling_group_name = aws_autoscaling_group.app.name
}

resource "aws_autoscaling_schedule" "scale_up_morning" {
  count                  = var.environment != "prod" ? 1 : 0
  scheduled_action_name  = "scale-up-morning"
  min_size               = 2
  max_size               = 10
  desired_capacity       = 2
  recurrence             = "0 8 * * MON-FRI"   # 8 AM weekdays
  autoscaling_group_name = aws_autoscaling_group.app.name
}

Potential savings: 70% reduction on dev/staging compute costs.

Use Spot Instances For Stateless Workloads

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = data.aws_ami.ubuntu.id
  instance_type = "t3.large"

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price = "0.03"  # ~70% discount vs on-demand
    }
  }
}

resource "aws_autoscaling_group" "app" {
  desired_capacity = 3
  max_size         = 10
  min_size         = 1

  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }
    }

    instances_distribution {
      on_demand_base_capacity                  = 1  # Keep 1 on-demand
      on_demand_percentage_above_base_capacity = 0  # Rest are spot
      spot_allocation_strategy                 = "capacity-optimized"
    }
  }
}

Spot instances for stateless workloads = massive savings with minimal risk.

Detect Unused Resources

# Find unattached EBS volumes (wasting money)
data "aws_ebs_volumes" "unattached" {
  filter {
    name   = "status"
    values = ["available"]
  }
}

# Alert if any exist
resource "null_resource" "unused_volume_alert" {
  count = length(data.aws_ebs_volumes.unattached.ids) > 0 ? 1 : 0

  provisioner "local-exec" {
    command = <<-EOT
      echo "WARNING: ${length(data.aws_ebs_volumes.unattached.ids)} unused EBS volumes detected"
      echo "Potential monthly waste (USD): ${length(data.aws_ebs_volumes.unattached.ids) * 8}"
    EOT
  }
}

Unused resources are burning money while you sleep.

Compliance & Governance Automation

Enforce organizational policies with code, not spreadsheets.

Policy As Code With OPA

Create a policy library:

# policies/aws_s3_encryption.rego
package terraform.s3

import future.keywords.contains
import future.keywords.if

deny contains msg if {
    resource := input.resource.aws_s3_bucket[name]
    not resource.server_side_encryption_configuration

    msg := sprintf("S3 bucket '%s' must enable encryption", [name])
}

deny contains msg if {
    resource := input.resource.aws_s3_bucket[name]
    resource.acl == "public-read"

    msg := sprintf("S3 bucket '%s' cannot be public", [name])
}

# policies/aws_ec2_approved_instances.rego
package terraform.ec2

import future.keywords.contains
import future.keywords.if
import future.keywords.in

approved_instance_types := {
    "t3.micro", "t3.small", "t3.medium",
    "m5.large", "m5.xlarge"
}

deny contains msg if {
    resource := input.resource.aws_instance[name]
    not resource.instance_type in approved_instance_types

    msg := sprintf("EC2 instance '%s' uses unapproved type '%s'", [name, resource.instance_type])
}

Run in CI/CD:

conftest test terraform/*.tf -p policies/

# Example output:
# FAIL - terraform/main.tf - S3 bucket 'logs' must enable encryption
# FAIL - terraform/main.tf - EC2 instance 'web' uses unapproved type 't3.xlarge'

Policies enforce rules automatically. No more "please remember to encrypt S3 buckets" emails.

CIS Benchmark Compliance

Enforce Center for Internet Security benchmarks:

# Require IMDSv2 on EC2 instances (CIS AWS 5.6)
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"  # Require IMDSv2
    http_put_response_hop_limit = 1
  }
}

# Enforce encrypted EBS volumes (CIS AWS 2.2.1)
resource "aws_ebs_encryption_by_default" "enabled" {
  enabled = true
}

# Enforce VPC flow logs (CIS AWS 2.9)
resource "aws_flow_log" "vpc" {
  vpc_id          = aws_vpc.main.id
  traffic_type    = "ALL"
  iam_role_arn    = aws_iam_role.flow_logs.arn
  log_destination = aws_cloudwatch_log_group.flow_logs.arn
}

Validate automatically:

tfsec . --format=json | jq '.results[] | select(.severity=="CRITICAL")'

RBAC For Terraform Operations

Not everyone should run terraform apply on production.

Terraform Cloud RBAC:

resource "tfe_team" "developers" {
  name         = "developers"
  organization = "my-org"
}

resource "tfe_team_access" "developers_staging" {
  team_id      = tfe_team.developers.id
  workspace_id = tfe_workspace.staging.id
  access       = "write"  # Can plan and apply
}

resource "tfe_team_access" "developers_production" {
  team_id      = tfe_team.developers.id
  workspace_id = tfe_workspace.production.id
  access       = "read"   # View only, no apply
}

resource "tfe_team" "platform_engineers" {
  name         = "platform-engineers"
  organization = "my-org"
}

resource "tfe_team_access" "platform_production" {
  team_id      = tfe_team.platform_engineers.id
  workspace_id = tfe_workspace.production.id
  access       = "admin"  # Full control
}

Principle: Developers test in staging. Platform engineers control production. Clear boundaries prevent accidents.

Series Conclusion: What You've Accomplished

You did it. All 12 parts.

What You've Mastered

Foundation (Parts 1-3):

  • Why Infrastructure as Code matters
  • Setting up Terraform and cloud authentication
  • Deploying your first resources

Core Concepts (Parts 4-7):

  • HCL syntax, types, functions, expressions
  • Variables, outputs, state management
  • The core workflow: init → plan → apply → destroy
  • Building reusable modules

Advanced Topics (Parts 8-12):

  • Multi-cloud deployment patterns
  • Team workflows and collaboration
  • Comprehensive testing strategies
  • Security and secrets management
  • Production deployment patterns

You've learned what most engineers never bother to learn. Most people terraform apply from their laptop and hope for the best. You've built bulletproof infrastructure that survives production chaos.

What To Do Next

Immediate Actions:

  1. Apply these patterns to a real project
  2. Build a module library for your organization
  3. Set up CI/CD pipelines for your infrastructure
  4. Run a disaster recovery drill

Further Learning:

  • Terraform Associate Certification - Validate your skills officially
  • Terraform: Up & Running by Yevgeniy Brikman - The definitive book
  • HashiCorp Learn - Official tutorials and workshops
  • Terraform Module Registry - Explore community modules

Advanced Topics:

  • CDK for Terraform (CDKTF) - Write infrastructure in TypeScript/Python/Go
  • Terragrunt - DRY wrapper for complex Terraform configurations
  • Atlantis - Self-hosted Terraform automation for GitHub/GitLab
  • Spacelift/env0 - Enterprise Terraform platforms

Join The Community

  • Terraform Community Forum - discuss.hashicorp.com
  • r/Terraform - Reddit community
  • HashiCorp Community Slack - Get help from experts

Final Thoughts

Infrastructure as Code isn't just a technical skill. It's a mindset.

You've learned to treat infrastructure with the same rigor as application code: version controlled, tested, reviewed, automated.

Most importantly, you've learned that infrastructure should be boring. The best infrastructure is infrastructure you never think about because it just works.

You're not done learning (no one ever is), but you're ready. Ready to build production systems that scale. Ready to handle the 3 AM pages. Ready to survive the chaos.

Go build something amazing.

The cloud is your canvas. Terraform is your brush. And you know how to use it.

Checkpoint: Final Knowledge Check

Test yourself one last time:

  1. What are the essential stages of a production Terraform pipeline?

    • Validation → Security scanning → Policy checks → Plan → Manual approval (prod) → Apply → Notifications
  2. How do you reduce blast radius in Terraform deployments?

    • Separate state files by environment/team, workspace isolation, targeted applies when needed, immutable infrastructure patterns
  3. What's the difference between RTO and RPO in disaster recovery?

    • RTO = Recovery Time Objective (how quickly you can recover). RPO = Recovery Point Objective (how much data loss is acceptable)
  4. Name three cost optimization strategies for Terraform infrastructure.

    • Tag everything, right-size by environment, automate resource scheduling, use spot instances, detect unused resources
  5. How does policy-as-code prevent misconfigurations?

    • OPA/Conftest policies enforce rules (encryption, tagging, approved instance types) and fail CI/CD if violated

If you answered these confidently, you're ready for production.


Series Navigation: ← Part 11: Security & Secrets Management | Part 12 (You are here)


This concludes the "Terraform from Fundamentals to Production" tutorial series. Thank you for following along. May your infrastructure be declarative, your state files intact, and your cloud bills reasonable.

Questions? Drop a comment below. Share your production Terraform stories; we all learn from each other's war stories.


This post is part of the "Terraform from Fundamentals to Production" series. Congratulations on completing all 12 parts!