
Testing & Validation

Your Terraform code works perfectly in your dev account. You run terraform apply, watch the green success messages scroll by, and think you're done.

Then production happens.

Your S3 buckets are wide open to the internet. EC2 instances are running the wrong AMI. That database you thought was encrypted? It's not. And someone just spun up 47 c5.24xlarge instances in Tokyo because there were no guardrails.

The $18,000 AWS bill arrives. Your security team is having a bad day. And you're learning an expensive lesson: testing deployment isn't the same as testing correctness.

Here's the thing nobody tells you: Infrastructure code breaks in ways application code doesn't. A typo in Python gives you a stack trace. A typo in Terraform gives you a publicly exposed database that passes every health check.

This tutorial shows you how to catch disasters before they cost you money, compliance violations, or your job.

📦 Code Examples

Repository: terraform-hcl-tutorial-series
This Part: Part 10 - Testing Examples

Get the working example:

git clone https://github.com/khuongdo/terraform-hcl-tutorial-series.git
cd terraform-hcl-tutorial-series
git checkout part-10
cd examples/part-10-testing/

# Run security scans and tests
tfsec .
terraform init
terraform plan

Why Testing Infrastructure Actually Matters

Let me guess: You're thinking "it's just config files, how bad can it be?"

I'll tell you exactly how bad.

Security breach, take one: Developer copies an S3 bucket config from StackOverflow. Forgets to change acl = "public-read". Customer PII leaks. Company makes headlines. GDPR fine: 4% of annual revenue.

Cost explosion, take two: Junior engineer sets desired_capacity = 50 instead of 5 for an autoscaling group. Nobody notices until Monday morning when AWS has helpfully billed you for 672 hours of compute you didn't need.

Compliance failure, take three: Your SOC 2 audit fails because 30% of your RDS instances aren't encrypted. You swear they were. Turns out someone copy-pasted old Terraform code that didn't enforce encryption.

See the pattern? These aren't bugs. They're successful deployments of bad decisions.

You need three layers of defense:

Layer 1: Static Analysis - Scan your .tf files without deploying anything. Catch obvious mistakes like unencrypted buckets, missing backups, overly permissive security groups. Fast, free, catches 80% of problems.

Layer 2: Policy-as-Code - Enforce your company's rules. "All resources must have a CostCenter tag." "RDS instances can't use db.t2.micro in production." "S3 buckets can't exist in Singapore region." Automated compliance checks.

Layer 3: Integration Tests - Actually deploy infrastructure to a test account and verify it works. Does the EC2 instance boot? Can it reach the database? Is the load balancer health check passing? Expensive but catches what static analysis misses.

You need all three. Here's how to set them up.

Static Analysis: Catch Issues Before Deployment

Static analysis is your first line of defense. It scans Terraform files for known problems without deploying anything. No AWS credentials needed. No costs incurred. Just fast feedback on what's broken.

tfsec: Your Security Scanner

tfsec knows every common Terraform security mistake. Unencrypted S3 buckets. Security groups open to 0.0.0.0/0. Missing CloudTrail logging. IAM policies with wildcard permissions. It's seen it all.

Install it:

# macOS
brew install tfsec

# Linux
curl -s https://raw.githubusercontent.com/aquasecurity/tfsec/master/scripts/install_linux.sh | bash

# Or use Docker if you prefer containers
docker pull aquasec/tfsec

Run it:

tfsec .

That's it. It scans your current directory and tells you everything that's wrong.

Here's what actual output looks like:

───────────────────────────────────────────────────────────────
Result #1 HIGH Bucket does not have encryption enabled
───────────────────────────────────────────────────────────────
  main.tf:12-16
───────────────────────────────────────────────────────────────
   12 | resource "aws_s3_bucket" "data_bucket" {
   13 |   bucket = "my-data-bucket"
   14 |   acl    = "private"
   15 | }
───────────────────────────────────────────────────────────────
  Impact:  The bucket objects could be read if compromised
  Resolution: Configure bucket encryption
  More info: https://tfsec.dev/docs/aws/s3/enable-bucket-encryption
───────────────────────────────────────────────────────────────

See that? You thought setting acl = "private" was enough. tfsec knows better. Private ACL means "restrict access," not "encrypt data." If someone gets your credentials, they can read everything. Encryption protects data at rest even if access controls fail.

Fix it:

resource "aws_s3_bucket" "data_bucket" {
  bucket = "my-data-bucket"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data_bucket" {
  bucket = aws_s3_bucket.data_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

Run tfsec . again. Green output. Problem solved.

What tfsec catches:

  • Unencrypted resources (S3, EBS volumes, RDS databases, EFS file systems)
  • Public exposure (security groups allowing 0.0.0.0/0, S3 buckets with public ACLs)
  • Missing logging (no CloudTrail, VPC flow logs disabled, S3 access logging off)
  • Weak IAM policies (actions set to "*", resources set to "*")
  • Outdated protocols (TLS 1.0 or 1.1 instead of 1.2+)

Sometimes tfsec is wrong:

You're building a public website. The load balancer needs to accept traffic from anywhere. tfsec flags it anyway.

Tell it to back off:

resource "aws_security_group" "allow_http" {
  name = "allow-http"

  #tfsec:ignore:aws-ec2-no-public-ingress-sgr
  ingress {
    description = "Public HTTP - required for ALB serving public website"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

The comment tells tfsec "yes, I know this looks dangerous, I did it on purpose." Always add a description explaining why you're ignoring the check. Future you will thank present you.

Pro move for CI/CD:

tfsec . --format json > tfsec-report.json

JSON output means you can parse results, fail builds on HIGH severity issues, and track metrics over time.
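One way to turn that report into a build gate is a small script that parses the JSON and exits nonzero on blocking findings. A minimal sketch, assuming the report carries a top-level results list with a severity field per finding (true of recent tfsec releases, but check the output of your installed version):

```python
# Severities that should fail the pipeline.
BLOCKING = {"HIGH", "CRITICAL"}

def blocking_findings(report: dict) -> list:
    """Return findings whose severity should block deployment."""
    results = report.get("results") or []
    return [r for r in results if str(r.get("severity", "")).upper() in BLOCKING]

# Demo on an inline sample shaped like a tfsec JSON report
# (in CI you would json.load() the tfsec-report.json file instead):
sample = {
    "results": [
        {"rule_id": "aws-s3-enable-bucket-encryption", "severity": "HIGH"},
        {"rule_id": "aws-s3-enable-versioning", "severity": "LOW"},
    ]
}
bad = blocking_findings(sample)
print(len(bad))  # 1 finding would fail this build
```

In a pipeline you would follow this with `sys.exit(1 if bad else 0)` so the job fails exactly when a blocking finding exists.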

Trivy: The Multi-Tool Scanner

Trivy scans everything: Terraform, Docker images, Kubernetes manifests, even application dependencies. It's slower than tfsec but more comprehensive.

Install it:

# macOS
brew install trivy

# Linux
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo apt-key add -
echo "deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt-get update && sudo apt-get install trivy

Scan Terraform:

trivy config .

Output looks similar to tfsec but includes additional checks:

main.tf (terraform)

LOW: No bucket versioning enabled
════════════════════════════════════════
S3 bucket does not have versioning enabled.
Without versioning, accidental deletes are permanent.
────────────────────────────────────────
 main.tf:12-16
────────────────────────────────────────
 12 ┌ resource "aws_s3_bucket" "data_bucket" {
 13 │   bucket = "my-data-bucket"
 14 │   acl    = "private"
 15 └ }
────────────────────────────────────────

When to use which:

  • Working only with Terraform? Use tfsec. It's faster and Terraform-specific.
  • Scanning containers + Terraform in the same pipeline? Use Trivy. One tool, multiple scan types.

Checkov: The Compliance Police

Checkov (from Bridgecrew, now owned by Palo Alto) has over 1,000 built-in policies mapped to compliance frameworks. HIPAA, PCI-DSS, CIS benchmarks, SOC 2, GDPR - if there's a compliance standard, Checkov checks it.

pip install checkov

checkov -d .

Run only specific checks by ID:

checkov -d . --framework terraform --check CKV_AWS_19,CKV_AWS_21

Checkov's documentation maps each check ID to the compliance controls it covers, so you can see exactly which resources would fail which HIPAA or PCI requirements. No more guessing what auditors will flag.

Example output:

Check: CKV_AWS_19: "Ensure all data stored in the S3 bucket is securely encrypted at rest"
	FAILED for resource: aws_s3_bucket.data_bucket
	File: /main.tf:12-16
	Guide: https://docs.bridgecrew.io/docs/s3_14-data-encrypted-at-rest

Same encryption issue, but now you know it's a HIPAA violation, not just a security best practice.

Policy-as-Code: Enforce Your Rules

Static scanners catch known problems. But they don't know your company's specific policies:

  • "All production EC2 instances must have tag CostCenter"
  • "S3 buckets can't be created in ap-southeast-1 (too expensive)"
  • "RDS instances in production can't be smaller than db.t3.medium"
  • "Security groups can't have descriptions containing the word 'temporary'"

You need Policy-as-Code. Enter Open Policy Agent.

OPA and Conftest

Open Policy Agent (OPA) is a general-purpose policy engine. You write rules in the Rego language. Conftest applies those rules to Terraform plans.

Install Conftest:

# macOS
brew install conftest

# Linux
wget https://github.com/open-policy-agent/conftest/releases/download/v0.45.0/conftest_0.45.0_Linux_x86_64.tar.gz
tar xzf conftest_0.45.0_Linux_x86_64.tar.gz
sudo mv conftest /usr/local/bin/

Create a policy: Require CostCenter tags on everything

Create file policy/tagging.rego:

package main

import rego.v1

# Deny EC2 instances without a CostCenter tag
deny contains msg if {
  rc := input.resource_changes[_]
  rc.type == "aws_instance"
  not rc.change.after.tags.CostCenter
  msg := sprintf("EC2 instance '%s' missing required tag 'CostCenter'", [rc.address])
}

# Deny S3 buckets without an Environment tag
deny contains msg if {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  not rc.change.after.tags.Environment
  msg := sprintf("S3 bucket '%s' missing required tag 'Environment'", [rc.address])
}

Rego looks weird at first. Read it like this: "For each resource change in the plan, if it's an EC2 instance without a CostCenter tag, add a denial message." The plan JSON produced by terraform show -json lists every planned resource under resource_changes, with its post-apply attributes under change.after.

Test the policy:

# Create Terraform plan in JSON format
terraform init
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json

# Run policy check
conftest test tfplan.json

Output if you violate the policy:

FAIL - tfplan.json - main - EC2 instance 'aws_instance.web_server' missing required tag 'CostCenter'
FAIL - tfplan.json - main - S3 bucket 'aws_s3_bucket.data_bucket' missing required tag 'Environment'

2 tests, 0 passed, 0 warnings, 2 failures, 0 exceptions

Pipeline fails. No deployment until you fix it.

Fix the violation:

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name       = "WebServer"
    CostCenter = "Engineering"  # Added
  }
}

resource "aws_s3_bucket" "data_bucket" {
  bucket = "my-data-bucket"

  tags = {
    Environment = "Production"  # Added
  }
}

Run conftest test tfplan.json again. All tests pass. Deploy approved.
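If Rego still feels opaque, it can help to prototype a rule in a general-purpose language before committing it to policy code. A hypothetical Python version of the CostCenter check, assuming the standard plan-JSON layout where planned resources sit under resource_changes with post-apply attributes under change.after:

```python
def missing_cost_center(plan: dict) -> list:
    """Return addresses of EC2 instances planned without a CostCenter tag."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        tags = (rc.get("change", {}).get("after") or {}).get("tags") or {}
        if "CostCenter" not in tags:
            violations.append(rc.get("address"))
    return violations

# Demo on a trimmed-down plan document: one violating instance, one compliant
plan = {
    "resource_changes": [
        {"address": "aws_instance.web_server", "type": "aws_instance",
         "change": {"after": {"tags": {"Name": "WebServer"}}}},
        {"address": "aws_instance.worker", "type": "aws_instance",
         "change": {"after": {"tags": {"CostCenter": "Engineering"}}}},
    ]
}
print(missing_cost_center(plan))  # ['aws_instance.web_server']
```

Once the logic is clear in Python, translating it into a deny rule is mostly mechanical.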

Advanced Policy: Prevent Cost Explosions

Here's a real policy that prevents expensive instance types in non-production environments:

policy/cost.rego:

package main

import rego.v1

# Set of instance types considered too expensive outside production
expensive_instances := {
  "c5.24xlarge",
  "m5.24xlarge",
  "r5.24xlarge",
  "p3.16xlarge",
  "p4d.24xlarge",
}

# Deny expensive instances in dev/staging
deny contains msg if {
  rc := input.resource_changes[_]
  rc.type == "aws_instance"
  rc.change.after.instance_type in expensive_instances
  rc.change.after.tags.Environment != "Production"

  msg := sprintf(
    "Instance '%s' uses expensive type '%s' in non-production (Environment=%s). Use smaller instances for dev/staging.",
    [rc.address, rc.change.after.instance_type, rc.change.after.tags.Environment],
  )
}

Someone tries to launch a c5.24xlarge in staging? Blocked. No exceptions.

Regional Restrictions Policy

Some AWS regions cost more. Prevent accidental deployments:

policy/regions.rego:

package main

import rego.v1

allowed_regions := {
  "us-east-1",
  "us-west-2",
  "eu-west-1",
}

# Note: this reads the region only when it is set as a literal in the
# provider block; a region supplied through a variable shows up in the
# plan JSON as a reference rather than a constant_value.
deny contains msg if {
  region := input.configuration.provider_config.aws.expressions.region.constant_value
  not region in allowed_regions

  msg := sprintf(
    "AWS region '%s' not allowed. Use one of: %v",
    [region, allowed_regions],
  )
}

Try to deploy to ap-southeast-1? Policy says no. Stay in the cheap regions.

Run multiple policies at once:

conftest test tfplan.json -p policy/

Conftest checks every .rego file in the directory. One command, all policies enforced.

Integration Testing with Terratest

Static analysis catches configuration mistakes. But it can't tell you if your infrastructure actually works.

Questions static analysis can't answer:

  • Does the EC2 instance boot successfully?
  • Can the application connect to RDS?
  • Is the load balancer health check passing?
  • Do autoscaling policies trigger correctly under load?

You need integration tests. Real deployments to real AWS accounts.

Terratest (by Gruntwork) deploys your Terraform code, validates it works, then cleans up. All automated. All in Go.

Setup Terratest

Install Go:

# macOS
brew install go

# Linux: download from https://go.dev/dl/

# Verify
go version  # Should be 1.19 or later

Project structure:

your-terraform-project/
├── examples/
│   └── aws-instance/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
└── test/
    └── aws_instance_test.go

The examples/ directory contains minimal working Terraform configs. The test/ directory contains Go tests that deploy those examples and validate them.

Example 1: Test EC2 Instance Deployment

examples/aws-instance/main.tf:

provider "aws" {
  region = var.region
}

resource "aws_instance" "test_instance" {
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name        = var.instance_name
    Environment = "Test"
    ManagedBy   = "Terratest"
  }
}

examples/aws-instance/variables.tf:

variable "region" {
  type    = string
  default = "us-east-1"
}

variable "ami_id" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

variable "instance_name" {
  type = string
}

examples/aws-instance/outputs.tf:

output "instance_id" {
  value = aws_instance.test_instance.id
}

output "public_ip" {
  value = aws_instance.test_instance.public_ip
}

test/aws_instance_test.go:

package test

import (
  "testing"

  "github.com/gruntwork-io/terratest/modules/aws"
  "github.com/gruntwork-io/terratest/modules/terraform"
  "github.com/stretchr/testify/assert"
)

func TestAwsInstance(t *testing.T) {
  t.Parallel()

  // Pick random region to avoid conflicts
  awsRegion := aws.GetRandomStableRegion(t, []string{"us-east-1", "us-west-2"}, nil)

  terraformOptions := &terraform.Options{
    TerraformDir: "../examples/aws-instance",

    Vars: map[string]interface{}{
      "region":        awsRegion,
      "ami_id":        "ami-0c55b159cbfafe1f0",  // Amazon Linux 2
      "instance_name": "terratest-example",
    },
  }

  // Clean up after test (runs even if test fails)
  defer terraform.Destroy(t, terraformOptions)

  // Deploy infrastructure
  terraform.InitAndApply(t, terraformOptions)

  // Validate outputs
  instanceID := terraform.Output(t, terraformOptions, "instance_id")
  assert.NotEmpty(t, instanceID, "Instance ID should not be empty")

  // Validate instance exists in AWS
  instanceIDs := aws.GetEc2InstanceIdsByTag(t, awsRegion, "Name", "terratest-example")
  assert.Contains(t, instanceIDs, instanceID)
}

Initialize Go modules:

cd test/
go mod init github.com/yourusername/terraform-tests
go get github.com/gruntwork-io/terratest/modules/terraform
go get github.com/gruntwork-io/terratest/modules/aws
go get github.com/stretchr/testify/assert

Run the test:

export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"

go test -v -timeout 30m

What happens:

  1. Terratest picks a random AWS region (avoids conflicts)
  2. Runs terraform init and terraform apply
  3. Captures outputs (instance ID, public IP)
  4. Validates instance actually exists in AWS
  5. Runs terraform destroy to clean up
  6. Reports pass/fail

Actual output:

=== RUN   TestAwsInstance
TestAwsInstance 2025-12-30T14:23:10Z Running command terraform with args [init -upgrade=false]
TestAwsInstance 2025-12-30T14:23:12Z Running command terraform with args [apply -auto-approve]
TestAwsInstance 2025-12-30T14:24:45Z Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
TestAwsInstance 2025-12-30T14:24:46Z Running command terraform with args [output -no-color instance_id]
TestAwsInstance 2025-12-30T14:24:47Z i-0abcd1234efgh5678
TestAwsInstance 2025-12-30T14:24:51Z Running command terraform with args [destroy -auto-approve]
TestAwsInstance 2025-12-30T14:25:15Z Destroy complete! Resources: 1 destroyed.
--- PASS: TestAwsInstance (125.43s)
PASS

Your infrastructure deployed successfully, passed validation, and cleaned up. All automated.

Example 2: Test S3 Bucket Compliance

Don't just test that the bucket exists. Test that it's configured correctly.

test/s3_bucket_test.go:

package test

import (
  "testing"

  "github.com/gruntwork-io/terratest/modules/aws"
  "github.com/gruntwork-io/terratest/modules/random"
  "github.com/gruntwork-io/terratest/modules/terraform"
  "github.com/stretchr/testify/assert"
)

func TestS3Bucket(t *testing.T) {
  t.Parallel()

  awsRegion := "us-east-1"
  uniqueID := random.UniqueId()
  bucketName := "terratest-bucket-" + uniqueID

  terraformOptions := &terraform.Options{
    TerraformDir: "../examples/s3-bucket",
    Vars: map[string]interface{}{
      "bucket_name": bucketName,
      "region":      awsRegion,
    },
  }

  defer terraform.Destroy(t, terraformOptions)
  terraform.InitAndApply(t, terraformOptions)

  // Validate bucket exists
  aws.AssertS3BucketExists(t, awsRegion, bucketName)

  // Validate versioning enabled (compliance requirement)
  versioningStatus := aws.GetS3BucketVersioning(t, awsRegion, bucketName)
  assert.Equal(t, "Enabled", versioningStatus)

  // Validate encryption enabled (security requirement). Helper availability
  // varies across Terratest versions; if GetS3BucketEncryption isn't in
  // yours, call GetBucketEncryption yourself via aws.NewS3Client.
  bucketEncryption := aws.GetS3BucketEncryption(t, awsRegion, bucketName)
  assert.NotNil(t, bucketEncryption, "Bucket must have encryption enabled")
}

This test fails if:

  • Bucket doesn't get created
  • Versioning is disabled
  • Encryption is missing

Catches drift between policy and reality.

Example 3: Test Web Application Works

Deploy a complete web stack (EC2 + ALB) and verify the app actually responds:

test/web_app_test.go:

package test

import (
  "fmt"
  "testing"
  "time"

  http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
  "github.com/gruntwork-io/terratest/modules/terraform"
)

func TestWebApp(t *testing.T) {
  t.Parallel()

  terraformOptions := &terraform.Options{
    TerraformDir: "../examples/web-app",
  }

  defer terraform.Destroy(t, terraformOptions)
  terraform.InitAndApply(t, terraformOptions)

  // Get load balancer DNS name from output
  albURL := terraform.Output(t, terraformOptions, "alb_dns_name")
  url := fmt.Sprintf("http://%s", albURL)

  // Validate HTTP 200 response with retries (ALB takes time to warm up)
  http_helper.HttpGetWithRetry(
    t,
    url,
    nil,
    200,
    "Hello, World!",  // Expected response body
    30,               // Max retries
    10*time.Second,   // Wait between retries
  )
}

This catches:

  • Load balancer misconfiguration
  • Wrong target group health checks
  • Security group blocking traffic
  • Application deployment failures

If terraform apply succeeds but the app doesn't respond, the test fails. That's the whole point.

CI/CD Integration

Manual test runs don't scale. You need automated pipelines.

GitHub Actions Pipeline

.github/workflows/terraform-test.yml:

name: Terraform Testing Pipeline

on:
  pull_request:
    paths:
      - '**.tf'
      - '.github/workflows/terraform-test.yml'
  push:
    branches:
      - main

jobs:
  static-analysis:
    name: Static Security Scan
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          soft_fail: false

      - name: Run Trivy
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload results to GitHub Security
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'

  policy-check:
    name: Policy Validation
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Generate plan
        run: |
          terraform init
          terraform plan -out=tfplan.binary
          terraform show -json tfplan.binary > tfplan.json

      - name: Install Conftest
        run: |
          wget https://github.com/open-policy-agent/conftest/releases/download/v0.45.0/conftest_0.45.0_Linux_x86_64.tar.gz
          tar xzf conftest_0.45.0_Linux_x86_64.tar.gz
          sudo mv conftest /usr/local/bin/

      - name: Run policies
        run: conftest test tfplan.json -p policy/

  terratest:
    name: Integration Tests
    runs-on: ubuntu-latest
    needs: [static-analysis, policy-check]
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.20'

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Run Terratest
        working-directory: test
        run: |
          go mod download
          go test -v -timeout 30m

This pipeline:

  1. Runs tfsec and Trivy (fast, catches obvious issues)
  2. Runs OPA policies (enforces your rules)
  3. Only runs Terratest if static checks pass (saves time and money)
  4. Uploads security findings to GitHub Security tab

Every pull request gets tested. No manual intervention required.

Testing Best Practices

1. Follow the Testing Pyramid

Don't over-rely on Terratest. It's slow and costs real money.

Recommended distribution:

  • 70% static analysis (tfsec, Trivy, Checkov) - Fast, free, catches most issues
  • 20% policy validation (OPA/Conftest) - Medium speed, no infrastructure costs
  • 10% integration tests (Terratest) - Slow, expensive, catches what others miss

2. Use Dedicated Test Accounts

Never run Terratest in production accounts. Ever.

Set up AWS Organizations structure:

root-account/
├── production/       # Real customer traffic
├── staging/          # Pre-prod testing
└── testing/          # Terratest sandbox (costs don't matter)

Set billing alerts on the testing account. If someone accidentally spawns 100 EC2 instances, you want to know immediately.

3. Parallel Testing Speeds Things Up

Terratest supports parallel execution:

func TestMultipleRegions(t *testing.T) {
  t.Parallel()  // Runs alongside other t.Parallel() tests

  // Test code...
}

But watch out for AWS service limits. Don't run 50 parallel tests that each create a VPC. You'll hit the VPC limit (5 per region by default) and everything fails.

4. Test Idempotency

Run terraform apply twice. The second run should show zero changes:

func TestIdempotency(t *testing.T) {
  terraformOptions := &terraform.Options{
    TerraformDir: "../examples/vpc",
  }

  defer terraform.Destroy(t, terraformOptions)

  // Apply, then plan again and fail the test if the second plan is not
  // empty. Terratest ships a helper for exactly this check.
  terraform.InitAndApplyAndIdempotent(t, terraformOptions)
}

If the second apply changes things, your Terraform code isn't idempotent. That causes drift and unpredictable behavior.

5. Clean Up Test Resources Automatically

Always use defer terraform.Destroy():

defer terraform.Destroy(t, terraformOptions)  // Runs even if test fails

This prevents orphaned resources. If a test crashes halfway through, Terraform still destroys what it created.

But cleanup can fail. Resource dependencies, manual deletions, rate limits - all cause destroy failures. Monitor with AWS Config or write a cleanup Lambda that runs daily.

6. Use Unique Names to Avoid Conflicts

Multiple developers running tests at the same time? You need unique resource names:

bucketName := "test-bucket-" + random.UniqueId()

Terratest's random.UniqueId() generates unique strings. No more "bucket name already exists" errors when someone else is testing.
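Under the hood this is just a short random suffix. A hypothetical equivalent, useful if you generate test resource names outside Terratest (the helper itself returns a short base-62 string; any collision-resistant suffix works):

```python
import secrets
import string

# Characters allowed in the suffix: letters and digits, safe for most
# AWS resource names (S3 buckets additionally require lowercase).
ALPHABET = string.ascii_letters + string.digits

def unique_name(prefix: str, length: int = 6) -> str:
    """Append a random alphanumeric suffix to avoid name collisions."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{suffix}"

name = unique_name("test-bucket")
print(name)  # e.g. test-bucket-a9X3kQ
```

Six alphanumeric characters give 62^6 (roughly 57 billion) combinations, which is plenty to keep concurrent test runs from colliding.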

7. Mock External APIs in Unit Tests

Don't hit real AWS APIs for every test. Use LocalStack for local AWS emulation:

# docker-compose.yml
services:
  localstack:
    image: localstack/localstack
    environment:
      - SERVICES=s3,dynamodb,sqs,lambda
      - DEFAULT_REGION=us-east-1
    ports:
      - "4566:4566"

Point Terraform at LocalStack:

provider "aws" {
  region                      = "us-east-1"
  access_key                  = "test"
  secret_key                  = "test"
  skip_credentials_validation = true
  skip_metadata_api_check     = true
  skip_requesting_account_id  = true
  s3_use_path_style           = true # LocalStack serves S3 at path-style URLs

  endpoints {
    s3       = "http://localhost:4566"
    dynamodb = "http://localhost:4566"
    sqs      = "http://localhost:4566"
  }
}

Tests run locally, no AWS costs, much faster feedback.

Common Mistakes to Avoid

Mistake 1: Testing in production

I shouldn't have to say this, but: never run destructive tests in production accounts. Use separate AWS accounts for testing.

Mistake 2: Ignoring cleanup failures

Terratest's defer terraform.Destroy() can fail. Resources get orphaned. AWS keeps billing you.

Set up automated cleanup:

# Tag all test resources
tags = {
  ManagedBy = "Terratest"
  TTL       = "24h"
}

# Daily Lambda deletes resources older than 24 hours with ManagedBy=Terratest tag
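The selection logic for that Lambda is simple. A sketch, assuming a simplified version of the EC2 describe-instances shape; the actual boto3 describe/terminate calls are omitted:

```python
from datetime import datetime, timedelta, timezone

def expired_test_instances(instances, now, ttl=timedelta(hours=24)):
    """Return IDs of Terratest-tagged instances running longer than the TTL."""
    expired = []
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if tags.get("ManagedBy") != "Terratest":
            continue  # never touch resources the test framework doesn't own
        if now - inst["LaunchTime"] > ttl:
            expired.append(inst["InstanceId"])
    return expired

# Demo: one stale test instance, one fresh one, one untagged production box
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
fleet = [
    {"InstanceId": "i-stale", "LaunchTime": now - timedelta(hours=30),
     "Tags": [{"Key": "ManagedBy", "Value": "Terratest"}]},
    {"InstanceId": "i-fresh", "LaunchTime": now - timedelta(hours=2),
     "Tags": [{"Key": "ManagedBy", "Value": "Terratest"}]},
    {"InstanceId": "i-prod", "LaunchTime": now - timedelta(days=90), "Tags": []},
]
print(expired_test_instances(fleet, now))  # ['i-stale']
```

Filtering on the ManagedBy tag first is the important part: a cleanup job that matches on age alone will eventually terminate something it shouldn't.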

Mistake 3: Hardcoding AWS credentials

Never commit credentials to Git. Use environment variables or IAM role assumption:

# Environment variables (local development)
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

# IAM roles (CI/CD)
aws sts assume-role --role-arn arn:aws:iam::123456789012:role/TerratestRole

Mistake 4: Slow test feedback loops

Terratest tests can take 15+ minutes. Optimize:

  • Use smallest possible instance types (t3.nano instead of t3.xlarge)
  • Reduce retry timeouts in HTTP checks
  • Run expensive tests only on main branch, not every PR
  • Cache Terraform providers in CI/CD
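For the provider-caching point, one approach is the standard actions/cache step combined with Terraform's TF_PLUGIN_CACHE_DIR variable. A sketch; the cache path and key naming here are illustrative:

```yaml
- name: Cache Terraform providers
  uses: actions/cache@v3
  with:
    path: ~/.terraform.d/plugin-cache
    key: terraform-providers-${{ hashFiles('**/.terraform.lock.hcl') }}

- name: Enable Terraform plugin cache
  run: |
    mkdir -p ~/.terraform.d/plugin-cache
    echo "TF_PLUGIN_CACHE_DIR=$HOME/.terraform.d/plugin-cache" >> "$GITHUB_ENV"
```

With the cache warm, terraform init skips re-downloading providers, which can shave minutes off every test job.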

Mistake 5: No visibility into test costs

Terratest deploys real infrastructure. That costs money. Track it:

  • Tag all test resources with Environment=Testing
  • Use AWS Cost Explorer to filter by tag
  • Set billing alerts ($50/day is reasonable for active testing)

What You've Learned

You now know how to:

  • Catch security issues before deployment with tfsec, Trivy, and Checkov
  • Enforce company-specific policies using OPA and Conftest
  • Test real infrastructure behavior with Terratest
  • Integrate all three testing layers into CI/CD pipelines
  • Balance test coverage vs cost and speed

Checkpoint!

Before moving on, do this:

  1. Run tfsec . on your Terraform code. Fix anything marked HIGH or CRITICAL.
  2. Write one OPA policy that enforces your company's tagging standards.
  3. Create a basic Terratest test that deploys an S3 bucket and validates encryption.
  4. Add tfsec to your CI/CD pipeline so it runs on every PR.

Why this matters: One unencrypted S3 bucket can expose customer data. One missing policy check can cost thousands in wasted EC2 spend. Testing prevents disasters before they reach production.

What's Next

In Part 11: Terraform Cloud & Remote State, you'll learn:

  • Moving from local state to Terraform Cloud
  • Team collaboration workflows (RBAC, policy enforcement, drift detection)
  • Sentinel policies for enterprise governance
  • Cost estimation before you apply changes
  • Private module registry for sharing code across teams

You've mastered testing. Next up: scaling Terraform for teams.

This post is part of the "Terraform from Fundamentals to Production" series.