Table of Contents
- Production-Ready Terraform: The Series Finale
- 📦 Code Examples
- What "Production-Ready" Actually Means
- Building A Bulletproof CI/CD Pipeline
- Blast Radius Reduction: Limiting the Damage
- Strategy 1: Environment Isolation
- Strategy 2: Team-Based Separation
- Strategy 3: Surgical Changes with -target
- Strategy 4: Immutable Infrastructure
- Disaster Recovery: Planning For Catastrophe
- Disaster Scenario 1: State File Corruption
- Disaster Scenario 2: Accidental Destruction
- Disaster Scenario 3: Cloud Provider Outage
- Quarterly Disaster Recovery Drills
- Cost Optimization: Taming The Cloud Bill
- Tag Everything For Cost Attribution
- Right-Size Resources By Environment
- Automated Resource Scheduling
- Use Spot Instances For Stateless Workloads
- Detect Unused Resources
- Compliance & Governance Automation
- Series Conclusion: What You've Accomplished
- Checkpoint: Final Knowledge Check
- Resources & Further Reading
Production-Ready Terraform: The Series Finale
You've made it. Part 12. The finish line.
You started learning Terraform basics 11 parts ago. Now you're building reusable modules, deploying multi-cloud infrastructure, and writing comprehensive tests. You're doing things most engineers never bother to learn.
But here's the reality: Everything you've learned so far gets you to 80%. This final part? This is the 20% that separates hobbyists from professionals.
Production infrastructure doesn't fail gracefully. It fails at 3 AM when your monitoring wakes you up. It fails when someone accidentally runs terraform destroy on the wrong workspace. It fails when AWS us-east-1 goes down for the third time this year.
This part is about making your infrastructure bulletproof. Not perfect (nothing is perfect), but resilient enough to survive the chaos of real-world production.
Let's finish this series strong.
📦 Code Examples
Repository: terraform-hcl-tutorial-series
This Part: Part 12 - Production Patterns
Get the working example:
git clone https://github.com/khuongdo/terraform-hcl-tutorial-series.git
cd terraform-hcl-tutorial-series
git checkout part-12
cd examples/part-12-production/
# Explore production-ready patterns
terraform init
terraform plan
What "Production-Ready" Actually Means
Stop me if you've seen this before:
The "Works On My Machine" Infrastructure:
- Manual terraform apply from your laptop
- No code review process
- State file living on your local disk
- Secrets hardcoded in variables
- Zero monitoring or drift detection
- Changes deployed straight to production
- "Disaster recovery plan" = pray nothing breaks
The "I Can Sleep At Night" Infrastructure:
- Automated CI/CD with approval gates
- Every change code-reviewed and tested
- Remote state with encryption and locking
- Secrets managed via Vault or OIDC
- Comprehensive drift detection and alerting
- Blast radius containment strategies
- Tested disaster recovery playbooks
The difference? The second one doesn't wake you up at 3 AM because someone accidentally nuked your database.
Your Production Readiness Checklist
Before you deploy infrastructure that matters, check these boxes:
State Management:
- Remote backend with encryption enabled
- State locking configured (DynamoDB/Consul)
- State bucket versioning enabled
- Cross-region replication for state backups
- Separate state files per environment
Security:
- No hardcoded credentials anywhere
- Secrets managed via Vault/SOPS/OIDC
- Security scanning in CI/CD (tfsec/Checkov/Trivy)
- Policy enforcement with OPA or Sentinel
- All S3 buckets encrypted
- All EBS volumes encrypted
- IMDSv2 enforced on EC2 instances
CI/CD Pipeline:
- Automated plan on pull requests
- Manual approval gate for production
- Plan artifacts saved for review
- Security scans fail the build
- Notifications on success/failure
- Rollback procedures documented
Disaster Recovery:
- Recovery playbook written and tested
- State restoration tested quarterly
- Multi-region failover configured (if needed)
- RTO and RPO defined and measured
- Chaos engineering drills scheduled
Cost & Compliance:
- All resources tagged with Owner/CostCenter
- Budget alerts configured
- Non-prod resources auto-shutdown enabled
- RBAC configured for terraform operations
- Compliance framework validated (CIS/SOC2/HIPAA)
If you can't check every box, you're not ready. And that's okay; just be honest about your risk.
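Several of the state-management boxes come down to a single backend block. A minimal sketch, assuming an S3 backend (bucket and table names are placeholders):

```hcl
# backend.tf (sketch) - names below are placeholders
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"           # bucket with versioning + replication enabled
    key            = "production/terraform.tfstate" # separate key per environment
    region         = "us-east-1"
    encrypt        = true                           # encryption at rest
    dynamodb_table = "terraform-locks"              # state locking via DynamoDB
  }
}
```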
Building A Bulletproof CI/CD Pipeline
Manual terraform apply doesn't scale. Here's how to automate safely.
The Pipeline Architecture
Every production Terraform deployment should flow through this gauntlet:
┌──────────────┐
│   Git Push   │
└──────┬───────┘
       │
       v
┌────────────────────┐
│ Format & Validate  │ ← Catch syntax errors fast
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ Security Scanning  │ ← tfsec/Trivy/Checkov
│ Policy Checks      │ ← OPA/Conftest
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ terraform plan     │ ← Generate execution plan
│ Save artifact      │ ← For approval review
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ Manual Approval    │ ← Human gate for prod
│ (Prod Only)        │
└──────┬─────────────┘
       │
       v
┌────────────────────┐
│ terraform apply    │ ← Execute approved plan
│ Notify team        │ ← Slack/Teams/Email
└────────────────────┘
The golden rule: No human touches production manually. Ever.
Every change, every single one, goes through this pipeline. No exceptions. Not even for "quick fixes."
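You can even codify the rule. A sketch using the integrations/github provider to lock the main branch behind the pipeline (the repository resource and check names below are assumptions; match them to your own workflow jobs):

```hcl
resource "github_branch_protection" "main" {
  repository_id  = github_repository.infra.node_id # hypothetical repo resource
  pattern        = "main"
  enforce_admins = true # no bypassing, not even for admins

  required_status_checks {
    strict   = true
    contexts = ["Validate & Lint", "Security Scanning", "Policy Enforcement"]
  }

  required_pull_request_reviews {
    required_approving_review_count = 1
  }
}
```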
GitHub Actions: The Complete Pipeline
Here's a production-grade workflow that actually works:
name: Terraform CI/CD
on:
pull_request:
paths:
- 'terraform/**'
- '.github/workflows/terraform.yml'
push:
branches:
- main
paths:
- 'terraform/**'
env:
TF_VERSION: '1.5.0'
WORKING_DIR: './terraform'
jobs:
validate:
name: Validate & Lint
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Format Check
working-directory: ${{ env.WORKING_DIR }}
run: terraform fmt -check -recursive
- name: Init (no backend)
working-directory: ${{ env.WORKING_DIR }}
run: terraform init -backend=false
- name: Validate
working-directory: ${{ env.WORKING_DIR }}
run: terraform validate
security:
name: Security Scanning
runs-on: ubuntu-latest
needs: validate
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Trivy Scan
uses: aquasecurity/trivy-action@master
with:
scan-type: 'config'
scan-ref: ${{ env.WORKING_DIR }}
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload to GitHub Security
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
- name: Checkov Scan
uses: bridgecrewio/checkov-action@master
with:
directory: ${{ env.WORKING_DIR }}
framework: terraform
soft_fail: false # Fail build on issues
policy:
name: Policy Enforcement
runs-on: ubuntu-latest
needs: validate
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Conftest
run: |
curl -L https://github.com/open-policy-agent/conftest/releases/download/v0.46.0/conftest_0.46.0_Linux_x86_64.tar.gz | tar xz
sudo mv conftest /usr/local/bin/
- name: Run Policy Checks
working-directory: ${{ env.WORKING_DIR }}
run: conftest test -p ../policies/ *.tf
plan:
name: Terraform Plan
runs-on: ubuntu-latest
needs: [security, policy]
permissions:
contents: read
id-token: write # OIDC authentication
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS (OIDC)
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
aws-region: us-east-1
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Init
working-directory: ${{ env.WORKING_DIR }}
run: terraform init
- name: Plan
id: plan
working-directory: ${{ env.WORKING_DIR }}
run: |
terraform plan -out=tfplan -no-color
terraform show -no-color tfplan > plan.txt
- name: Comment PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const plan = fs.readFileSync('${{ env.WORKING_DIR }}/plan.txt', 'utf8');
const output = `#### Terraform Plan 📋
<details><summary>Show Plan</summary>
\`\`\`hcl
${plan}
\`\`\`
</details>`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: output
});
- name: Save Plan Artifact
uses: actions/upload-artifact@v4
with:
name: tfplan
path: ${{ env.WORKING_DIR }}/tfplan
retention-days: 5
apply:
name: Terraform Apply
runs-on: ubuntu-latest
needs: plan
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
environment:
name: production # Requires manual approval
permissions:
contents: read
id-token: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure AWS (OIDC)
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsTerraform
aws-region: us-east-1
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Download Plan
uses: actions/download-artifact@v4
with:
name: tfplan
path: ${{ env.WORKING_DIR }}
- name: Init
working-directory: ${{ env.WORKING_DIR }}
run: terraform init
- name: Apply
working-directory: ${{ env.WORKING_DIR }}
run: terraform apply -auto-approve tfplan
- name: Notify Success
if: success()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "β
Terraform deployment succeeded",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Terraform Apply Succeeded*\n${{ github.event.head_commit.message }}"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
- name: Notify Failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "β Terraform deployment failed",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Terraform Apply Failed*\nCheck <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|workflow logs>"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
OIDC Authentication: No long-lived AWS credentials in GitHub Secrets. Temporary credentials issued on-demand.
Security-First: Trivy and Checkov run before planning. If misconfigurations exist, the build fails before you waste time.
Policy Enforcement: Conftest validates organizational rules (e.g., "all S3 buckets must be encrypted").
Manual Approval Gate: Production changes require clicking "Approve" in GitHub UI. No accidents.
Plan Artifacts: The exact plan gets saved, reviewed, then applied. Prevents plan/apply drift.
Team Notifications: Slack alerts on success or failure. No more "did my deployment work?" questions.
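The OIDC setup deserves its own sketch. On the AWS side you create an identity provider and a role that only your repo's main branch can assume (account, org, and repo names are placeholders):

```hcl
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
  # GitHub's published thumbprint; verify against current docs before use
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_actions" {
  name = "GitHubActionsTerraform"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only this repo's main branch may assume the role
          "token.actions.githubusercontent.com:sub" = "repo:my-org/my-repo:ref:refs/heads/main"
        }
      }
    }]
  })
}
```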
GitLab CI Alternative
If you're team GitLab:
# .gitlab-ci.yml
variables:
TF_VERSION: "1.5.0"
TF_ROOT: ${CI_PROJECT_DIR}/terraform
stages:
- validate
- security
- plan
- apply
.terraform_base:
image:
name: hashicorp/terraform:${TF_VERSION}
entrypoint: [""]
before_script:
- cd ${TF_ROOT}
- terraform init
validate:
extends: .terraform_base
stage: validate
script:
- terraform fmt -check -recursive
- terraform validate
security_scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy config --severity HIGH,CRITICAL --exit-code 1 ${TF_ROOT}
policy_check:
stage: security
image: openpolicyagent/conftest:latest
script:
- conftest test -p policies/ ${TF_ROOT}/*.tf
plan:
extends: .terraform_base
stage: plan
script:
- terraform plan -out=tfplan
- terraform show -json tfplan > plan.json
artifacts:
paths:
- ${TF_ROOT}/tfplan
- ${TF_ROOT}/plan.json
expire_in: 1 week
only:
- merge_requests
- main
apply:
extends: .terraform_base
stage: apply
script:
- terraform apply -auto-approve tfplan
dependencies:
- plan
only:
- main
when: manual # Manual trigger required
environment:
name: production
Blast Radius Reduction: Limiting the Damage
When things go wrong (and they will), limit how much can burn.
Strategy 1: Environment Isolation
Bad: Everything in one state file
terraform/
main.tf # Manages dev, staging, AND prod
terraform.tfstate
Run terraform apply and you impact all environments at once. Terrifying.
Good: Separate states per environment
terraform/
environments/
dev/
main.tf
backend.tf
terraform.tfvars
staging/
main.tf
backend.tf
terraform.tfvars
production/
main.tf
backend.tf
terraform.tfvars
Destroying dev? Production doesn't even notice. This is how you sleep at night.
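The isolation lives in each environment's backend.tf: same module code, different state key. A sketch (bucket name is a placeholder):

```hcl
# environments/dev/backend.tf
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "dev/terraform.tfstate" # dev's own state
    region = "us-east-1"
  }
}

# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "production/terraform.tfstate" # completely separate state
    region = "us-east-1"
  }
}
```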
Strategy 2: Team-Based Separation
Split state by ownership:
terraform/
networking/ # Platform team owns VPCs, subnets
databases/ # DBA team owns RDS, backups
applications/ # App teams own compute, LBs
Benefits:
- Smaller blast radius (networking changes don't affect databases)
- Clear ownership boundaries
- Faster plan/apply cycles
- Easier code review
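Separate states still need to share values: the app team has to know which subnets the platform team created. A minimal sketch using terraform_remote_state (bucket, key, and the private_subnet_id output are assumptions):

```hcl
# applications/data.tf - read the networking team's published outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"
  # Consume the output without touching the other team's state
  subnet_id     = data.terraform_remote_state.networking.outputs.private_subnet_id
}
```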
Strategy 3: Surgical Changes with -target
Need to change one resource without replanning everything?
# Apply only to specific resource
terraform apply -target=aws_instance.web_server
# Plan only specific module
terraform plan -target=module.networking
Warning: Don't abuse -target. If you're using it constantly, your state is too large; split it up.
Strategy 4: Immutable Infrastructure
Instead of updating resources in place, replace them:
resource "aws_instance" "web" {
ami = var.ami_id
instance_type = "t3.medium"
  # Force recreation on changes (immutable pattern)
  lifecycle {
    create_before_destroy = true

    # Replace monthly for security hygiene
    # (replace_triggered_by must live inside the lifecycle block)
    replace_triggered_by = [
      time_rotating.monthly_rotation
    ]
  }
}
resource "time_rotating" "monthly_rotation" {
rotation_days = 30
}
Immutable infrastructure reduces drift and security vulnerabilities. You're not patching servers; you're replacing them.
Disaster Recovery: Planning For Catastrophe
Your infrastructure will fail. The question is: how fast can you recover?
Disaster Scenario 1: State File Corruption
The Nightmare: Your terraform.tfstate file is corrupted or deleted.
Recovery Steps:
# 1. Restore from backup
aws s3 cp s3://terraform-state-backups/terraform.tfstate.backup ./terraform.tfstate
# 2. Verify state matches reality
terraform plan
# Should show "No changes" if backup is recent
# 3. If outdated, reconcile manually
terraform apply -refresh-only   # "terraform refresh" is deprecated in modern Terraform
terraform plan
Prevention:
# Enable S3 versioning
resource "aws_s3_bucket_versioning" "state" {
bucket = "my-terraform-state"
versioning_configuration {
status = "Enabled"
}
}
# Cross-region replication
resource "aws_s3_bucket_replication_configuration" "state" {
bucket = "my-terraform-state"
rule {
id = "disaster-recovery"
status = "Enabled"
destination {
bucket = "arn:aws:s3:::terraform-state-backup-us-west-2"
storage_class = "GLACIER"
}
}
}
Test restoration quarterly. If you haven't tested it, it doesn't work.
Disaster Scenario 2: Accidental Destruction
The Nightmare: Someone ran terraform destroy on production. Everything's gone.
Recovery Steps:
# 1. Restore state from backup IMMEDIATELY
aws s3api list-object-versions \
--bucket my-terraform-state \
--prefix production/terraform.tfstate
# 2. Get the version before destruction
aws s3api get-object \
--bucket my-terraform-state \
--key production/terraform.tfstate \
--version-id <VERSION_BEFORE_DESTROY> \
terraform.tfstate
# 3. Re-apply infrastructure
terraform apply -auto-approve
# 4. Run smoke tests, verify health
Prevention:
# IAM policy preventing destroy operations
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": [
"ec2:TerminateInstances",
"rds:DeleteDBInstance",
"s3:DeleteBucket"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:PrincipalTag/Environment": "production"
}
}
}
]
}
Also: Enable Terraform Cloud's deletion protection. Require MFA for production operations.
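Terraform has a native guardrail too: prevent_destroy makes any plan that would delete the resource fail outright. A sketch (most required database arguments elided):

```hcl
resource "aws_db_instance" "prod" {
  identifier     = "prod-primary"
  engine         = "postgres"
  instance_class = "db.r6g.large"
  # ...remaining required arguments elided...

  deletion_protection = true # AWS-side guard against console/API deletion

  lifecycle {
    prevent_destroy = true # terraform destroy now errors instead of deleting
  }
}
```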
Disaster Scenario 3: Cloud Provider Outage
The Nightmare: AWS us-east-1 is down. Your entire infrastructure is unavailable.
Recovery Steps:
# 1. Failover to backup region
cd terraform/environments/production-us-west-2/
terraform apply -auto-approve
# 2. Update DNS (or let Route53 failover do it automatically)
# 3. Monitor backup region capacity
# 4. When primary recovers, fail back
Prevention: Multi-Region Active-Standby
module "primary_region" {
source = "./modules/app"
providers = {
aws = aws.us-east-1
}
region = "us-east-1"
is_primary = true
}
module "backup_region" {
source = "./modules/app"
providers = {
aws = aws.us-west-2
}
region = "us-west-2"
is_primary = false
}
# Automated DNS failover
resource "aws_route53_health_check" "primary" {
fqdn = module.primary_region.endpoint
type = "HTTPS"
resource_path = "/health"
failure_threshold = 3
request_interval = 30
}
resource "aws_route53_record" "app" {
zone_id = aws_route53_zone.main.id
name = "app.example.com"
type = "A"
set_identifier = "primary"
failover_routing_policy {
type = "PRIMARY"
}
health_check_id = aws_route53_health_check.primary.id
alias {
name = module.primary_region.load_balancer_dns
zone_id = module.primary_region.load_balancer_zone_id
evaluate_target_health = true
}
}
resource "aws_route53_record" "app_backup" {
zone_id = aws_route53_zone.main.id
name = "app.example.com"
type = "A"
set_identifier = "backup"
failover_routing_policy {
type = "SECONDARY"
}
alias {
name = module.backup_region.load_balancer_dns
zone_id = module.backup_region.load_balancer_zone_id
evaluate_target_health = true
}
}
Quarterly Disaster Recovery Drills
Schedule chaos engineering exercises:
Q1: State File Recovery
- Delete state file intentionally
- Restore from backup
- Verify infrastructure matches
- Measure recovery time
Q2: Resource Deletion Recovery
- Manually delete critical resource via console
- Run terraform apply to recreate
- Verify application recovery
- Document gaps in the process
Q3: Region Failover
- Simulate us-east-1 outage
- Execute multi-region failover
- Measure RTO (Recovery Time Objective)
- Identify bottlenecks
Q4: Complete Rebuild
- Destroy all resources
- Rebuild from code + state backup
- Measure RPO (Recovery Point Objective)
- Update runbooks based on findings
If you haven't tested your disaster recovery plan, you don't have a disaster recovery plan.
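A minimal harness for the Q1 drill, assuming the S3 backend and backup bucket from earlier (paths are placeholders):

```bash
#!/usr/bin/env bash
set -euo pipefail
start=$(date +%s)

# 1. Pull the backup copy of the state
aws s3 cp s3://terraform-state-backups/production/terraform.tfstate.backup ./restored.tfstate

# 2. Overwrite the remote state with the backup
#    (may need -force if the lineage differs)
terraform state push ./restored.tfstate

# 3. Verify: exit code 0 = no drift, 2 = drift found
if terraform plan -detailed-exitcode; then
  echo "State matches infrastructure"
else
  echo "Drift detected (or plan failed): reconcile before sign-off"
fi

echo "Recovery verified in $(( $(date +%s) - start ))s"
```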
Cost Optimization: Taming The Cloud Bill
Cloud costs spiral without discipline. Here's how Terraform helps.
Tag Everything For Cost Attribution
locals {
common_tags = {
Environment = var.environment
ManagedBy = "Terraform"
CostCenter = var.cost_center
Owner = var.owner_email
Project = var.project_name
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_type
tags = merge(
local.common_tags,
{
Name = "web-server-${var.environment}"
}
)
}
# Enforce tagging via policy
resource "aws_organizations_policy" "require_tags" {
name = "RequireTags"
content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Deny"
Action = ["ec2:RunInstances"]
Resource = ["*"]
Condition = {
"Null" = {
"aws:RequestTag/CostCenter" = "true"
"aws:RequestTag/Owner" = "true"
}
}
}
]
})
}
Why this matters: You can't optimize costs you can't attribute. Tags let you answer "Who's spending $50k/month on EC2?"
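Once the tags exist, attribution is a single Cost Explorer query away. A sketch (dates are placeholders):

```bash
# Monthly spend grouped by the CostCenter tag
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=CostCenter
```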
Right-Size Resources By Environment
locals {
# Cost-optimized sizing
instance_types = {
dev = "t3.micro" # ~$7/month
staging = "t3.small" # ~$15/month
prod = "t3.large" # ~$60/month
}
}
resource "aws_instance" "app" {
ami = data.aws_ami.ubuntu.id
instance_type = local.instance_types[var.environment]
# Prevent over-provisioning
lifecycle {
precondition {
condition = contains(keys(local.instance_types), var.environment)
error_message = "Invalid environment. Must be dev, staging, or prod."
}
}
}
Stop paying production prices for dev environments.
Automated Resource Scheduling
Shut down non-production resources nights and weekends:
resource "aws_autoscaling_schedule" "scale_down_evening" {
count = var.environment != "prod" ? 1 : 0
scheduled_action_name = "scale-down-evening"
min_size = 0
max_size = 0
desired_capacity = 0
recurrence = "0 20 * * MON-FRI" # 8 PM weekdays
autoscaling_group_name = aws_autoscaling_group.app.name
}
resource "aws_autoscaling_schedule" "scale_up_morning" {
count = var.environment != "prod" ? 1 : 0
scheduled_action_name = "scale-up-morning"
min_size = 2
max_size = 10
desired_capacity = 2
recurrence = "0 8 * * MON-FRI" # 8 AM weekdays
autoscaling_group_name = aws_autoscaling_group.app.name
}
Potential savings: roughly 65% on dev/staging compute (this schedule runs instances 60 of the 168 hours in a week).
Use Spot Instances For Stateless Workloads
resource "aws_launch_template" "app" {
name_prefix = "app-"
image_id = data.aws_ami.ubuntu.id
instance_type = "t3.large"
instance_market_options {
market_type = "spot"
spot_options {
max_price = "0.03" # ~70% discount vs on-demand
}
}
}
resource "aws_autoscaling_group" "app" {
desired_capacity = 3
max_size = 10
min_size = 1
mixed_instances_policy {
launch_template {
launch_template_specification {
launch_template_id = aws_launch_template.app.id
version = "$Latest"
}
}
instances_distribution {
on_demand_base_capacity = 1 # Keep 1 on-demand
on_demand_percentage_above_base_capacity = 0 # Rest are spot
spot_allocation_strategy = "capacity-optimized"
}
}
}
Spot instances for stateless workloads = massive savings with minimal risk.
Detect Unused Resources
# Find unattached EBS volumes (wasting money)
data "aws_ebs_volumes" "unattached" {
filter {
name = "status"
values = ["available"]
}
}
# Alert if any exist
resource "null_resource" "unused_volume_alert" {
count = length(data.aws_ebs_volumes.unattached.ids) > 0 ? 1 : 0
provisioner "local-exec" {
command = <<-EOT
echo "WARNING: ${length(data.aws_ebs_volumes.unattached.ids)} unused EBS volumes detected"
echo "Potential monthly waste: $${length(data.aws_ebs_volumes.unattached.ids) * 8}"
EOT
}
}
Unused resources are burning money while you sleep.
Compliance & Governance Automation
Enforce organizational policies with code, not spreadsheets.
Policy As Code With OPA
Create a policy library:
# policies/aws_s3_encryption.rego
package terraform.s3
import future.keywords.contains
import future.keywords.if

deny contains msg if {
  resource := input.resource.aws_s3_bucket[name]
  not resource.server_side_encryption_configuration
  msg := sprintf("S3 bucket '%s' must enable encryption", [name])
}

deny contains msg if {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 bucket '%s' cannot be public", [name])
}
# policies/aws_ec2_approved_instances.rego
package terraform.ec2
import future.keywords.contains
import future.keywords.if
import future.keywords.in

approved_instance_types := {
  "t3.micro", "t3.small", "t3.medium",
  "m5.large", "m5.xlarge"
}

deny contains msg if {
  resource := input.resource.aws_instance[name]
  # set membership check ("contains" is for strings; "in" is for sets)
  not resource.instance_type in approved_instance_types
  msg := sprintf("EC2 instance '%s' uses unapproved type '%s'", [name, resource.instance_type])
}
Run in CI/CD:
conftest test terraform/*.tf -p policies/
# Example output:
# FAIL - terraform/main.tf - S3 bucket 'logs' must enable encryption
# FAIL - terraform/main.tf - EC2 instance 'web' uses unapproved type 't3.xlarge'
Policies enforce rules automatically. No more "please remember to encrypt S3 buckets" emails.
CIS Benchmark Compliance
Enforce Center for Internet Security benchmarks:
# Require IMDSv2 on EC2 instances (CIS AWS 5.6)
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.medium"
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # Require IMDSv2
http_put_response_hop_limit = 1
}
}
# Enforce encrypted EBS volumes (CIS AWS 2.2.1)
resource "aws_ebs_encryption_by_default" "enabled" {
enabled = true
}
# Enforce VPC flow logs (CIS AWS 2.9)
resource "aws_flow_log" "vpc" {
vpc_id = aws_vpc.main.id
traffic_type = "ALL"
iam_role_arn = aws_iam_role.flow_logs.arn
log_destination = aws_cloudwatch_log_group.flow_logs.arn
}
Validate automatically:
tfsec . --format=json | jq '.results[] | select(.severity=="CRITICAL")'
RBAC For Terraform Operations
Not everyone should run terraform apply on production.
Terraform Cloud RBAC:
resource "tfe_team" "developers" {
name = "developers"
organization = "my-org"
}
resource "tfe_team_access" "developers_staging" {
team_id = tfe_team.developers.id
workspace_id = tfe_workspace.staging.id
access = "write" # Can plan and apply
}
resource "tfe_team_access" "developers_production" {
team_id = tfe_team.developers.id
workspace_id = tfe_workspace.production.id
access = "read" # View only, no apply
}
resource "tfe_team" "platform_engineers" {
name = "platform-engineers"
organization = "my-org"
}
resource "tfe_team_access" "platform_production" {
team_id = tfe_team.platform_engineers.id
workspace_id = tfe_workspace.production.id
access = "admin" # Full control
}
Principle: Developers test in staging. Platform engineers control production. Clear boundaries prevent accidents.
Series Conclusion: What You've Accomplished
You did it. All 12 parts.
What You've Mastered
Foundation (Parts 1-3):
- Why Infrastructure as Code matters
- Setting up Terraform and cloud authentication
- Deploying your first resources
Core Concepts (Parts 4-7):
- HCL syntax, types, functions, expressions
- Variables, outputs, state management
- The core workflow: init → plan → apply → destroy
- Building reusable modules
Advanced Topics (Parts 8-12):
- Multi-cloud deployment patterns
- Team workflows and collaboration
- Comprehensive testing strategies
- Security and secrets management
- Production deployment patterns
You've learned what most engineers never bother to learn. Most people terraform apply from their laptop and hope for the best. You've built bulletproof infrastructure that survives production chaos.
What To Do Next
Immediate Actions:
- Apply these patterns to a real project
- Build a module library for your organization
- Set up CI/CD pipelines for your infrastructure
- Run a disaster recovery drill
Further Learning:
- Terraform Associate Certification - Validate your skills officially
- Terraform: Up & Running by Yevgeniy Brikman - The definitive book
- HashiCorp Learn - Official tutorials and workshops
- Terraform Module Registry - Explore community modules
Advanced Topics:
- CDK for Terraform (CDKTF) - Write infrastructure in TypeScript/Python/Go
- Terragrunt - DRY wrapper for complex Terraform configurations
- Atlantis - Self-hosted Terraform automation for GitHub/GitLab
- Spacelift/env0 - Enterprise Terraform platforms
Join The Community
- Terraform Community Forum - discuss.hashicorp.com
- r/Terraform - Reddit community
- HashiCorp Community Slack - Get help from experts
Final Thoughts
Infrastructure as Code isn't just a technical skill. It's a mindset.
You've learned to treat infrastructure with the same rigor as application code: version controlled, tested, reviewed, automated.
Most importantly, you've learned that infrastructure should be boring. The best infrastructure is infrastructure you never think about because it just works.
You're not done learning (no one ever is), but you're ready. Ready to build production systems that scale. Ready to handle the 3 AM pages. Ready to survive the chaos.
Go build something amazing.
The cloud is your canvas. Terraform is your brush. And you know how to use it.
Checkpoint: Final Knowledge Check
Test yourself one last time:
What are the essential stages of a production Terraform pipeline?
- Validation → Security scanning → Policy checks → Plan → Manual approval (prod) → Apply → Notifications
How do you reduce blast radius in Terraform deployments?
- Separate state files by environment/team, workspace isolation, targeted applies when needed, immutable infrastructure patterns
What's the difference between RTO and RPO in disaster recovery?
- RTO = Recovery Time Objective (how quickly you can recover). RPO = Recovery Point Objective (how much data loss is acceptable)
Name three cost optimization strategies for Terraform infrastructure.
- Tag everything, right-size by environment, automate resource scheduling, use spot instances, detect unused resources
How does policy-as-code prevent misconfigurations?
- OPA/Conftest policies enforce rules (encryption, tagging, approved instance types) and fail CI/CD if violated
If you answered these confidently, you're ready for production.
Series Navigation: ← Part 11: Security & Secrets Management | Part 12 (You are here)
This concludes the "Terraform from Fundamentals to Production" tutorial series. Thank you for following along. May your infrastructure be declarative, your state files intact, and your cloud bills reasonable.
Questions? Drop a comment below. Share your production Terraform stories; we all learn from each other's war stories.
Resources & Further Reading
Books:
- Terraform: Up & Running (3rd Edition) by Yevgeniy Brikman
- Infrastructure as Code, Patterns and Practices by Rosemary Wang
Community:
- awesome-terraform - Curated list of resources
- Terraform Best Practices
Series navigation:
- Part 1: Why Infrastructure as Code?
- Part 2: Setting Up Terraform
- Part 3: Your First Cloud Resource
- Part 4: HCL Fundamentals
- Part 5: Variables, Outputs & State
- Part 6: Core Terraform Workflow
- Part 7: Modules for Organization
- Part 8: Multi-Cloud Patterns
- Part 9: State Management & Team Workflows
- Part 10: Testing & Validation
- Part 11: Security & Secrets Management
- Part 12: Production Patterns & DevSecOps (You are here)
This post is part of the "Terraform from Fundamentals to Production" series. Congratulations on completing all 12 parts!