Table of Contents
- Production-Ready Terraform: The Series Finale
- 📦 Code Examples
- What "Production-Ready" Actually Means
- Building A Bulletproof CI/CD Pipeline
- Blast Radius Reduction: Limiting the Damage
- Strategy 1: Environment Isolation
- Strategy 2: Team-Based Separation
- Strategy 3: Surgical Changes with -target
- Strategy 4: Immutable Infrastructure
- Disaster Recovery: Planning For Catastrophe
- Disaster Scenario 1: State File Corruption
- Disaster Scenario 2: Accidental Destruction
- Disaster Scenario 3: Cloud Provider Outage
- Quarterly Disaster Recovery Drills
- Cost Optimization: Taming The Cloud Bill
- Tag Everything For Cost Attribution
- Right-Size Resources By Environment
- Automated Resource Scheduling
- Use Spot Instances For Stateless Workloads
- Detect Unused Resources
- Compliance & Governance Automation
- Series Conclusion: What You've Accomplished
- Checkpoint: Final Knowledge Check
- Resources & Further Reading
Production-Ready Terraform: The Series Finale
You've made it. Part 12. The finish line.
You started learning Terraform basics 11 parts ago. Now you're building reusable modules, deploying multi-cloud infrastructure, and writing comprehensive tests. You're doing things most engineers never bother to learn.
But here's the reality: Everything you've learned so far gets you to 80%. This final part? This is the 20% that separates hobbyists from professionals.
Production infrastructure doesn't fail gracefully. It fails at 3 AM when your monitoring wakes you up. It fails when someone accidentally runs terraform destroy on the wrong workspace. It fails when AWS us-east-1 goes down for the third time this year.
This part is about making your infrastructure bulletproof. Not perfect (nothing is perfect), but resilient enough to survive the chaos of real-world production.
Let's finish this series strong.
📦 Code Examples
Repository: terraform-hcl-tutorial-series
This Part: Part 12 - Production Patterns
Get the working example:
git clone https://github.com/khuongdo/terraform-hcl-tutorial-series.git
cd terraform-hcl-tutorial-series
git checkout part-12
cd examples/part-12-production/
# Explore production-ready patterns
terraform init
terraform plan
What "Production-Ready" Actually Means
Stop me if you've seen this before:
The "Works On My Machine" Infrastructure:
- Manual terraform apply from your laptop
- No code review process
- State file living on your local disk
- Secrets hardcoded in variables
- Zero monitoring or drift detection
- Changes deployed straight to production
- "Disaster recovery plan" = pray nothing breaks
The "I Can Sleep At Night" Infrastructure:
- Automated CI/CD with approval gates
- Every change code-reviewed and tested
- Remote state with encryption and locking
- Secrets managed via Vault or OIDC
- Comprehensive drift detection and alerting
- Blast radius containment strategies
- Tested disaster recovery playbooks
The difference? The second one doesn't wake you up at 3 AM because someone accidentally nuked your database.
Your Production Readiness Checklist
Before you deploy infrastructure that matters, check these boxes:
State Management:
- Remote backend with encryption enabled
- State locking configured (DynamoDB/Consul)
- State bucket versioning enabled
- Cross-region replication for state backups
- Separate state files per environment
Security:
- No hardcoded credentials anywhere
- Secrets managed via Vault/SOPS/OIDC
- Security scanning in CI/CD (tfsec/Checkov/Trivy)
- Policy enforcement with OPA or Sentinel
- All S3 buckets encrypted
- All EBS volumes encrypted
- IMDSv2 enforced on EC2 instances
CI/CD Pipeline:
- Automated plan on pull requests
- Manual approval gate for production
- Plan artifacts saved for review
- Security scans fail the build
- Notifications on success/failure
- Rollback procedures documented
Disaster Recovery:
- Recovery playbook written and tested
- State restoration tested quarterly
- Multi-region failover configured (if needed)
- RTO and RPO defined and measured
- Chaos engineering drills scheduled
Cost & Compliance:
- All resources tagged with Owner/CostCenter
- Budget alerts configured
- Non-prod resources auto-shutdown enabled
- RBAC configured for terraform operations
- Compliance framework validated (CIS/SOC2/HIPAA)
If you can't check every box, you're not ready. And that's okay; just be honest about your risk.
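Several of the state-management boxes come down to a single backend block. A minimal sketch, assuming an S3 backend (bucket and table names are placeholders):

```hcl
# backend.tf (sketch) - names below are placeholders
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"           # bucket with versioning + replication enabled
    key            = "production/terraform.tfstate" # separate key per environment
    region         = "us-east-1"
    encrypt        = true                           # encryption at rest
    dynamodb_table = "terraform-locks"              # state locking via DynamoDB
  }
}
```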
Building A Bulletproof CI/CD Pipeline
Manual terraform apply doesn't scale. Here's how to automate safely.
The Pipeline Architecture
Every production Terraform deployment should flow through this gauntlet:
┌──────────────┐
│   Git Push   │
└──────┬───────┘
       │
       v
┌────────────────────┐
│ Format & Validate  │ ← Catch syntax errors fast
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ Security Scanning  │ ← tfsec/Trivy/Checkov
│ Policy Checks      │ ← OPA/Conftest
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ terraform plan     │ ← Generate execution plan
│ Save artifact      │ ← For approval review
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ Manual Approval    │ ← Human gate for prod
│ (Prod Only)        │
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ terraform apply    │ ← Execute approved plan
│ Notify team        │ ← Slack/Teams/Email
└────────────────────┘
The golden rule: No human touches production manually. Ever.
Every change, every single one, goes through this pipeline. No exceptions. Not even for "quick fixes."
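You can even codify the rule. A sketch using the integrations/github provider to lock the main branch behind the pipeline (the repository resource and check names below are assumptions; match them to your own workflow jobs):

```hcl
resource "github_branch_protection" "main" {
  repository_id  = github_repository.infra.node_id # hypothetical repo resource
  pattern        = "main"
  enforce_admins = true # no bypassing, not even for admins

  required_status_checks {
    strict   = true
    contexts = ["Validate & Lint", "Security Scanning", "Policy Enforcement"]
  }

  required_pull_request_reviews {
    required_approving_review_count = 1
  }
}
```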
GitHub Actions: The Complete Pipeline
Here's a production-grade workflow that actually works:
name: Terraform CI/CD
on:
pull_request:
paths:
- 'terraform/**'
- '.github/workflows/terraform.yml'
push:
branches:
- main
paths:
- 'terraform/**'
env:
TF_VERSION: '1.5.0'
WORKING_DIR: './terraform'
jobs:
validate:
name: Validate & Lint
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Format Check
working-directory: ${{ env.WORKING_DIR }}
run: terraform fmt -check -recursive
- name: Init (no backend)
working-directory: ${{ env.WORKING_DIR }}
run: terraform init -backend=false
- name: Validate
working-directory: ${{ env.WORKING_DIR }}
run: terraform validate
security:
name: Security Scanning
runs-on: ubuntu-latest
needs: validate
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Trivy Scan
uses: aquasecurity/trivy-action@master
with:
scan-type: 'config'
scan-ref: ${{ env.WORKING_DIR }}
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload to GitHub Security
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
- name: Checkov Scan
uses: bridgecrewio/checkov-action@master
with:
directory: ${{ env.WORKING_DIR }}
framework: terraform
soft_fail: false # Fail build on issues
policy:
name: Policy Enforcement
runs-on: ubuntu-latest
needs: validate
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Conftest
run: |
curl -L https://github.com/open-policy-agent/conftest/releases/download/v0.46.0/conftest_0.46.0_Linux_x86_64.tar.gz | tar xz
sudo mv conftest /usr/local/bin/
- name: Run Policy Checks
working-directory: ${{ env.WORKING_DIR }}
run: conftest test -p ../policies/ *.tf
plan:
name: Terraform Plan
runs-on: ubuntu-latest
needs: [security, policy]
permissions:
contents: read
id-token: write # OIDC authentication
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS (OIDC)
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
aws-region: us-east-1
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Init
working-directory: ${{ env.WORKING_DIR }}
run: terraform init
- name: Plan
id: plan
working-directory: ${{ env.WORKING_DIR }}
run: |
terraform plan -out=tfplan -no-color
terraform show -no-color tfplan > plan.txt
- name: Comment PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const plan = fs.readFileSync('${{ env.WORKING_DIR }}/plan.txt', 'utf8');
const output = `#### Terraform Plan 📋
<details><summary>Show Plan</summary>
\`\`\`hcl
${plan}
\`\`\`
</details>`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: output
});
- name: Save Plan Artifact
uses: actions/upload-artifact@v4
with:
name: tfplan
path: ${{ env.WORKING_DIR }}/tfplan
retention-days: 5
apply:
name: Terraform Apply
runs-on: ubuntu-latest
needs: plan
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
environment:
name: production # Requires manual approval
permissions:
contents: read
id-token: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS (OIDC)
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
aws-region: us-east-1
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Download Plan
uses: actions/download-artifact@v4
with:
name: tfplan
path: ${{ env.WORKING_DIR }}
- name: Init
working-directory: ${{ env.WORKING_DIR }}
run: terraform init
- name: Apply
working-directory: ${{ env.WORKING_DIR }}
run: terraform apply -auto-approve tfplan
- name: Notify Success
if: success()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "β
Terraform deployment succeeded",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Terraform Apply Succeeded*\n${{ github.event.head_commit.message }}"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
- name: Notify Failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "β Terraform deployment failed",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Terraform Apply Failed*\nCheck <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|workflow logs>"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
OIDC Authentication: No long-lived AWS credentials in GitHub Secrets. Temporary credentials issued on-demand.
Security-First: Trivy and Checkov run before planning. If misconfigurations exist, the build fails before you waste time.
Policy Enforcement: Conftest validates organizational rules (e.g., "all S3 buckets must be encrypted").
Manual Approval Gate: Production changes require clicking "Approve" in GitHub UI. No accidents.
Plan Artifacts: The exact plan gets saved, reviewed, then applied. Prevents plan/apply drift.
Team Notifications: Slack alerts on success or failure. No more "did my deployment work?" questions.
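The OIDC setup deserves its own sketch. On the AWS side you create an identity provider and a role that only your repo's main branch can assume (account, org, and repo names are placeholders):

```hcl
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
  # GitHub's published thumbprint; verify against current docs before use
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_actions" {
  name = "GitHubActionsTerraform"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only this repo's main branch may assume the role
          "token.actions.githubusercontent.com:sub" = "repo:my-org/my-repo:ref:refs/heads/main"
        }
      }
    }]
  })
}
```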
GitLab CI Alternative
If you're team GitLab:
# .gitlab-ci.yml
variables:
TF_VERSION: "1.5.0"
TF_ROOT: ${CI_PROJECT_DIR}/terraform
stages:
- validate
- security
- plan
- apply
.terraform_base:
image:
name: hashicorp/terraform:${TF_VERSION}
entrypoint: [""]
before_script:
- cd ${TF_ROOT}
- terraform init
validate:
extends: .terraform_base
stage: validate
script:
- terraform fmt -check -recursive
- terraform validate
security_scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy config --severity HIGH,CRITICAL --exit-code 1 ${TF_ROOT}
policy_check:
stage: security
image: openpolicyagent/conftest:latest
script:
- conftest test -p policies/ ${TF_ROOT}/*.tf
plan:
extends: .terraform_base
stage: plan
script:
- terraform plan -out=tfplan
- terraform show -json tfplan > plan.json
artifacts:
paths:
- ${TF_ROOT}/tfplan
- ${TF_ROOT}/plan.json
expire_in: 1 week
only:
- merge_requests
- main
apply:
extends: .terraform_base
stage: apply
script:
- terraform apply -auto-approve tfplan
dependencies:
- plan
only:
- main
when: manual # Manual trigger required
environment:
name: production
Blast Radius Reduction: Limiting the Damage
When things go wrong (and they will), limit how much can burn.
Strategy 1: Environment Isolation
Bad: Everything in one state file
terraform/
main.tf # Manages dev, staging, AND prod
terraform.tfstate
Run terraform apply and you impact all environments at once. Terrifying.
Good: Separate states per environment
terraform/
environments/
dev/
main.tf
backend.tf
terraform.tfvars
staging/
main.tf
backend.tf
terraform.tfvars
production/
main.tf
backend.tf
terraform.tfvars
Destroying dev? Production doesn't even notice. This is how you sleep at night.
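The isolation lives in each environment's backend.tf: same module code, different state key. A sketch (bucket name is a placeholder):

```hcl
# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "dev/terraform.tfstate" # dev's own state
    region = "us-east-1"
  }
}

# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "production/terraform.tfstate" # completely separate state
    region = "us-east-1"
  }
}
```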
Strategy 2: Team-Based Separation
Split state by ownership:
terraform/
networking/ # Platform team owns VPCs, subnets
databases/ # DBA team owns RDS, backups
applications/ # App teams own compute, LBs
Benefits:
- Smaller blast radius (networking changes don't affect databases)
- Clear ownership boundaries
- Faster plan/apply cycles
- Easier code review
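Separate states still need to share values: the app team has to know which subnets the platform team created. A minimal sketch using terraform_remote_state (bucket, key, and the private_subnet_id output are assumptions):

```hcl
# applications/data.tf - read the networking team's published outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"
  # Consume the output without touching the other team's state
  subnet_id     = data.terraform_remote_state.networking.outputs.private_subnet_id
}
```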
Strategy 3: Surgical Changes with -target
Need to change one resource without replanning everything?
# Apply only to specific resource
terraform apply -target=aws_instance.web_server
# Plan only specific module
terraform plan -target=module.networking
Warning: Don't abuse -target. If you're using it constantly, your state is too large; split it up.
Strategy 4: Immutable Infrastructure
Instead of updating resources in place, replace them:
resource "aws_instance" "web" {
ami = var.ami_id
instance_type = "t3.medium"
  # Force recreation on changes (immutable pattern)
  lifecycle {
    create_before_destroy = true

    # Replace monthly for security hygiene
    # (replace_triggered_by must live inside the lifecycle block)
    replace_triggered_by = [
      time_rotating.monthly_rotation
    ]
  }
}
resource "time_rotating" "monthly_rotation" {
rotation_days = 30
}
Immutable infrastructure reduces drift and security vulnerabilities. You're not patching servers; you're replacing them.
Disaster Recovery: Planning For Catastrophe
Your infrastructure will fail. The question is: how fast can you recover?
Disaster Scenario 1: State File Corruption
The Nightmare: Your terraform.tfstate file is corrupted or deleted.
Recovery Steps:
# 1. Restore from backup
aws s3 cp s3://terraform-state-backups/terraform.tfstate.backup ./terraform.tfstate
# 2. Verify state matches reality
terraform plan
# Should show "No changes" if backup is recent
# 3. If outdated, reconcile manually
terraform apply -refresh-only   # "terraform refresh" is deprecated in modern Terraform
terraform plan
Prevention:
# Enable S3 versioning
resource "aws_s3_bucket_versioning" "state" {
bucket = "my-terraform-state"
versioning_configuration {
status = "Enabled"
}
}
# Cross-region replication
resource "aws_s3_bucket_replication_configuration" "state" {
bucket = "my-terraform-state"
rule {
id = "disaster-recovery"
status = "Enabled"
destination {
bucket = "arn:aws:s3:::terraform-state-backup-us-west-2"
storage_class = "GLACIER"
}
}
}
Test restoration quarterly. If you haven't tested it, it doesn't work.
Disaster Scenario 2: Accidental Destruction
The Nightmare: Someone ran terraform destroy on production. Everything's gone.
Recovery Steps:
# 1. Restore state from backup IMMEDIATELY
aws s3api list-object-versions \
--bucket my-terraform-state \
--prefix production/terraform.tfstate
# 2. Get the version before destruction
aws s3api get-object \
--bucket my-terraform-state \
--key production/terraform.tfstate \
--version-id <VERSION_BEFORE_DESTROY> \
terraform.tfstate
# 3. Re-apply infrastructure
terraform apply -auto-approve
# 4. Run smoke tests, verify health
Prevention:
# IAM policy preventing destroy operations
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": [
"ec2:TerminateInstances",
"rds:DeleteDBInstance",
"s3:DeleteBucket"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:PrincipalTag/Environment": "production"
}
}
}
]
}
Also: Enable Terraform Cloud's deletion protection. Require MFA for production operations.
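Terraform has a native guardrail too: prevent_destroy makes any plan that would delete the resource fail outright. A sketch (most required database arguments elided):

```hcl
resource "aws_db_instance" "prod" {
  identifier     = "prod-primary"
  engine         = "postgres"
  instance_class = "db.r6g.large"
  # ...remaining required arguments elided...

  deletion_protection = true # AWS-side guard against console/API deletion

  lifecycle {
    prevent_destroy = true # terraform destroy now errors instead of deleting
  }
}
```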
Disaster Scenario 3: Cloud Provider Outage
The Nightmare: AWS us-east-1 is down. Your entire infrastructure is unavailable.
Recovery Steps:
# 1. Failover to backup region
cd terraform/environments/production-us-west-2/
terraform apply -auto-approve
# 2. Update DNS (or let Route53 failover do it automatically)
# 3. Monitor backup region capacity
# 4. When primary recovers, fail back
Prevention: Multi-Region Active-Standby
module "primary_region" {
source = "./modules/app"
providers = {
aws = aws.us-east-1
}
region = "us-east-1"
is_primary = true
}
module "backup_region" {
source = "./modules/app"
providers = {
aws = aws.us-west-2
}
region = "us-west-2"
is_primary = false
}
# Automated DNS failover
resource "aws_route53_health_check" "primary" {
fqdn = module.primary_region.endpoint
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 30
}
resource "aws_route53_record" "app" {
zone_id = aws_route53_zone.main.id
name = "app.example.com"
type = "A"
set_identifier = "primary"
failover_routing_policy {
type = "PRIMARY"
}
health_check_id = aws_route53_health_check.primary.id
alias {
name = module.primary_region.load_balancer_dns
zone_id = module.primary_region.load_balancer_zone_id
evaluate_target_health = true
}
}
resource "aws_route53_record" "app_backup" {
zone_id = aws_route53_zone.main.id
name = "app.example.com"
type = "A"
set_identifier = "backup"
failover_routing_policy {
type = "SECONDARY"
}
alias {
name = module.backup_region.load_balancer_dns
zone_id = module.backup_region.load_balancer_zone_id
evaluate_target_health = true
}
}
Quarterly Disaster Recovery Drills
Schedule chaos engineering exercises:
Q1: State File Recovery
- Delete state file intentionally
- Restore from backup
- Verify infrastructure matches
- Measure recovery time
Q2: Resource Deletion Recovery
- Manually delete critical resource via console
- Run terraform apply to recreate
- Verify application recovery
- Document gaps in the process
Q3: Region Failover
- Simulate us-east-1 outage
- Execute multi-region failover
- Measure RTO (Recovery Time Objective)
- Identify bottlenecks
Q4: Complete Rebuild
- Destroy all resources
- Rebuild from code + state backup
- Measure RPO (Recovery Point Objective)
- Update runbooks based on findings
If you haven't tested your disaster recovery plan, you don't have a disaster recovery plan.
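A minimal harness for the Q1 drill, assuming the S3 backend and backup bucket from earlier (paths are placeholders):

```bash
#!/usr/bin/env bash
set -euo pipefail
start=$(date +%s)

# 1. Pull the backup copy of the state
aws s3 cp s3://terraform-state-backups/production/terraform.tfstate.backup ./restored.tfstate

# 2. Overwrite the remote state with the backup
#    (may need -force if the lineage differs)
terraform state push ./restored.tfstate

# 3. Verify: exit code 0 = no drift, 2 = drift found
if terraform plan -detailed-exitcode; then
  echo "State matches infrastructure"
else
  echo "Drift detected (or plan failed): reconcile before sign-off"
fi

echo "Recovery verified in $(( $(date +%s) - start ))s"
```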
Cost Optimization: Taming The Cloud Bill
Cloud costs spiral without discipline. Here's how Terraform helps.
Tag Everything For Cost Attribution
locals {
common_tags = {
Environment = var.environment
ManagedBy = "Terraform"
CostCenter = var.cost_center
Owner = var.owner_email
Project = var.project_name
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_type
tags = merge(
local.common_tags,
{
Name = "web-server-${var.environment}"
}
)
}
# Enforce tagging via policy
resource "aws_organizations_policy" "require_tags" {
name = "RequireTags"
content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Deny"
Action = ["ec2:RunInstances"]
Resource = ["*"]
Condition = {
"Null" = {
"aws:RequestTag/CostCenter" = "true"
"aws:RequestTag/Owner" = "true"
}
}
}
]
})
}
Why this matters: You can't optimize costs you can't attribute. Tags let you answer "Who's spending $50k/month on EC2?"
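Once the tags exist, attribution is a single Cost Explorer query away. A sketch (dates are placeholders):

```bash
# Monthly spend grouped by the CostCenter tag
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=CostCenter
```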
Right-Size Resources By Environment
locals {
# Cost-optimized sizing
instance_types = {
dev = "t3.micro" # ~$7/month
staging = "t3.small" # ~$15/month
prod = "t3.large" # ~$60/month
}
}
resource "aws_instance" "app" {
ami = data.aws_ami.ubuntu.id
instance_type = local.instance_types[var.environment]
# Prevent over-provisioning
lifecycle {
precondition {
condition = contains(keys(local.instance_types), var.environment)
error_message = "Invalid environment. Must be dev, staging, or prod."
}
}
}
Stop paying production prices for dev environments.
Automated Resource Scheduling
Shut down non-production resources nights and weekends:
resource "aws_autoscaling_schedule" "scale_down_evening" {
count = var.environment != "prod" ? 1 : 0
scheduled_action_name = "scale-down-evening"
min_size = 0
max_size = 0
desired_capacity = 0
recurrence = "0 20 * * MON-FRI" # 8 PM weekdays
autoscaling_group_name = aws_autoscaling_group.app.name
}
resource "aws_autoscaling_schedule" "scale_up_morning" {
count = var.environment != "prod" ? 1 : 0
scheduled_action_name = "scale-up-morning"
min_size = 2
max_size = 10
desired_capacity = 2
recurrence = "0 8 * * MON-FRI" # 8 AM weekdays
autoscaling_group_name = aws_autoscaling_group.app.name
}
Potential savings: roughly 65% on dev/staging compute (this schedule runs instances 60 of the 168 hours in a week).
Use Spot Instances For Stateless Workloads
resource "aws_launch_template" "app" {
name_prefix = "app-"
image_id = data.aws_ami.ubuntu.id
instance_type = "t3.large"
instance_market_options {
market_type = "spot"
spot_options {
max_price = "0.03" # ~70% discount vs on-demand
}
}
}
resource "aws_autoscaling_group" "app" {
desired_capacity = 3
max_size = 10
min_size = 1
mixed_instances_policy {
launch_template {
launch_template_specification {
launch_template_id = aws_launch_template.app.id
version = "$Latest"
}
}
instances_distribution {
on_demand_base_capacity = 1 # Keep 1 on-demand
on_demand_percentage_above_base_capacity = 0 # Rest are spot
spot_allocation_strategy = "capacity-optimized"
}
}
}
Spot instances for stateless workloads = massive savings with minimal risk.
Detect Unused Resources
# Find unattached EBS volumes (wasting money)
data "aws_ebs_volumes" "unattached" {
filter {
name = "status"
values = ["available"]
}
}
# Alert if any exist
resource "null_resource" "unused_volume_alert" {
count = length(data.aws_ebs_volumes.unattached.ids) > 0 ? 1 : 0
provisioner "local-exec" {
command = <<-EOT
echo "WARNING: ${length(data.aws_ebs_volumes.unattached.ids)} unused EBS volumes detected"
echo "Potential monthly waste: $${length(data.aws_ebs_volumes.unattached.ids) * 8}"
EOT
}
}
Unused resources are burning money while you sleep.
Compliance & Governance Automation
Enforce organizational policies with code, not spreadsheets.
Policy As Code With OPA
Create a policy library:
# policies/aws_s3_encryption.rego
package terraform.s3
import future.keywords.contains
import future.keywords.if

deny contains msg if {
  resource := input.resource.aws_s3_bucket[name]
  not resource.server_side_encryption_configuration
  msg := sprintf("S3 bucket '%s' must enable encryption", [name])
}

deny contains msg if {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 bucket '%s' cannot be public", [name])
}
# policies/aws_ec2_approved_instances.rego
package terraform.ec2
import future.keywords.contains
import future.keywords.if
import future.keywords.in

approved_instance_types := {
  "t3.micro", "t3.small", "t3.medium",
  "m5.large", "m5.xlarge"
}

deny contains msg if {
  resource := input.resource.aws_instance[name]
  # set membership check ("contains" is for strings; "in" is for sets)
  not resource.instance_type in approved_instance_types
  msg := sprintf("EC2 instance '%s' uses unapproved type '%s'", [name, resource.instance_type])
}
Run in CI/CD:
conftest test terraform/*.tf -p policies/
# Example output:
# FAIL - terraform/main.tf - S3 bucket 'logs' must enable encryption
# FAIL - terraform/main.tf - EC2 instance 'web' uses unapproved type 't3.xlarge'
Policies enforce rules automatically. No more "please remember to encrypt S3 buckets" emails.
CIS Benchmark Compliance
Enforce Center for Internet Security benchmarks:
# Require IMDSv2 on EC2 instances (CIS AWS 5.6)
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.medium"
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # Require IMDSv2
http_put_response_hop_limit = 1
}
}
# Enforce encrypted EBS volumes (CIS AWS 2.2.1)
resource "aws_ebs_encryption_by_default" "enabled" {
enabled = true
}
# Enforce VPC flow logs (CIS AWS 2.9)
resource "aws_flow_log" "vpc" {
vpc_id = aws_vpc.main.id
traffic_type = "ALL"
iam_role_arn = aws_iam_role.flow_logs.arn
log_destination = aws_cloudwatch_log_group.flow_logs.arn
}
Validate automatically:
tfsec . --format=json | jq '.results[] | select(.severity=="CRITICAL")'
RBAC For Terraform Operations
Not everyone should run terraform apply on production.
Terraform Cloud RBAC:
resource "tfe_team" "developers" {
name = "developers"
organization = "my-org"
}
resource "tfe_team_access" "developers_staging" {
team_id = tfe_team.developers.id
workspace_id = tfe_workspace.staging.id
access = "write" # Can plan and apply
}
resource "tfe_team_access" "developers_production" {
team_id = tfe_team.developers.id
workspace_id = tfe_workspace.production.id
access = "read" # View only, no apply
}
resource "tfe_team" "platform_engineers" {
name = "platform-engineers"
organization = "my-org"
}
resource "tfe_team_access" "platform_production" {
team_id = tfe_team.platform_engineers.id
workspace_id = tfe_workspace.production.id
access = "admin" # Full control
}
Principle: Developers test in staging. Platform engineers control production. Clear boundaries prevent accidents.
Series Conclusion: What You've Accomplished
You did it. All 12 parts.
What You've Mastered
Foundation (Parts 1-3):
- Why Infrastructure as Code matters
- Setting up Terraform and cloud authentication
- Deploying your first resources
Core Concepts (Parts 4-7):
- HCL syntax, types, functions, expressions
- Variables, outputs, state management
- The core workflow: init → plan → apply → destroy
- Building reusable modules
Advanced Topics (Parts 8-12):
- Multi-cloud deployment patterns
- Team workflows and collaboration
- Comprehensive testing strategies
- Security and secrets management
- Production deployment patterns
You've learned what most engineers never bother to learn. Most people terraform apply from their laptop and hope for the best. You've built bulletproof infrastructure that survives production chaos.
What To Do Next
Immediate Actions:
- Apply these patterns to a real project
- Build a module library for your organization
- Set up CI/CD pipelines for your infrastructure
- Run a disaster recovery drill
Further Learning:
- Terraform Associate Certification - Validate your skills officially
- Terraform: Up & Running by Yevgeniy Brikman - The definitive book
- HashiCorp Learn - Official tutorials and workshops
- Terraform Module Registry - Explore community modules
Advanced Topics:
- CDK for Terraform (CDKTF) - Write infrastructure in TypeScript/Python/Go
- Terragrunt - DRY wrapper for complex Terraform configurations
- Atlantis - Self-hosted Terraform automation for GitHub/GitLab
- Spacelift/env0 - Enterprise Terraform platforms
Join The Community
- Terraform Community Forum - discuss.hashicorp.com
- r/Terraform - Reddit community
- HashiCorp Community Slack - Get help from experts
Final Thoughts
Infrastructure as Code isn't just a technical skill. It's a mindset.
You've learned to treat infrastructure with the same rigor as application code: version controlled, tested, reviewed, automated.
Most importantly, you've learned that infrastructure should be boring. The best infrastructure is infrastructure you never think about because it just works.
You're not done learning (no one ever is), but you're ready. Ready to build production systems that scale. Ready to handle the 3 AM pages. Ready to survive the chaos.
Go build something amazing.
The cloud is your canvas. Terraform is your brush. And you know how to use it.
Checkpoint: Final Knowledge Check
Test yourself one last time:
What are the essential stages of a production Terraform pipeline?
- Validation → Security scanning → Policy checks → Plan → Manual approval (prod) → Apply → Notifications
How do you reduce blast radius in Terraform deployments?
- Separate state files by environment/team, workspace isolation, targeted applies when needed, immutable infrastructure patterns
What's the difference between RTO and RPO in disaster recovery?
- RTO = Recovery Time Objective (how quickly you can recover). RPO = Recovery Point Objective (how much data loss is acceptable)
Name three cost optimization strategies for Terraform infrastructure.
- Tag everything, right-size by environment, automate resource scheduling, use spot instances, detect unused resources
How does policy-as-code prevent misconfigurations?
- OPA/Conftest policies enforce rules (encryption, tagging, approved instance types) and fail CI/CD if violated
If you answered these confidently, you're ready for production.
Series Navigation: ← Part 11: Security & Secrets Management | Part 12 (You are here)
This concludes the "Terraform from Fundamentals to Production" tutorial series. Thank you for following along. May your infrastructure be declarative, your state files intact, and your cloud bills reasonable.
Questions? Drop a comment below. Share your production Terraform stories; we all learn from each other's war stories.
Resources & Further Reading
Books:
- Terraform: Up & Running (3rd Edition) by Yevgeniy Brikman
- Infrastructure as Code, Patterns and Practices by Rosemary Wang
Community:
- awesome-terraform - Curated list of resources
- Terraform Best Practices
Series navigation:
- Part 1: Why Infrastructure as Code?
- Part 2: Setting Up Terraform
- Part 3: Your First Cloud Resource
- Part 4: HCL Fundamentals
- Part 5: Variables, Outputs & State
- Part 6: Core Terraform Workflow
- Part 7: Modules for Organization
- Part 8: Multi-Cloud Patterns
- Part 9: State Management & Team Workflows
- Part 10: Testing & Validation
- Part 11: Security & Secrets Management
- Part 12: Production Patterns & DevSecOps (You are here)
This post is part of the "Terraform from Fundamentals to Production" series. Congratulations on completing all 12 parts!