
You've built a beautiful React, Vue, or Angular single-page application. It's fast, modern, and users love it. But when you share links on Facebook, you see generic previews with no images. Google Search Console shows disappointing rankings. Your marketing team is frustrated because social media posts don't convert.

What went wrong?

The answer lies in how single-page applications (SPAs) render content. Unlike traditional server-rendered websites, SPAs send minimal HTML to the browser and rely on JavaScript to build the page dynamically. This creates a critical problem: social media bots don't execute JavaScript at all, and search engine crawlers execute it late or incompletely, so both often see a near-empty HTML shell with no page-specific content or meta tags.

In this guide, I'll show you how to solve this problem using AWS Lambda@Edge—a serverless solution that detects bots at CloudFront's edge locations and dynamically injects meta tags into HTML responses. This approach is:

  • Framework-agnostic: Works with any SPA (React, Vue, Angular, Svelte)
  • Cost-effective: ~$1-2/month vs $99-249/month for pre-rendering services
  • Low-latency: 8-15ms execution time at the edge
  • Production-ready: Complete with infrastructure-as-code templates
  • Non-invasive: No changes to your existing SPA code

GitHub Repository

Complete working example available: All code from this guide is available in the lambda-edge-spa-seo repository. Star it if you find it useful!

What You'll Learn

By the end of this guide, you'll understand:

  1. Why SPAs fail at SEO and social media sharing
  2. How Lambda@Edge works and when to use it
  3. How to implement bot detection and meta tag injection
  4. Complete infrastructure setup with Terraform
  5. Testing strategies and cost optimization techniques

Let's dive in.

The SPA Problem: Why Bots See Empty HTML

To understand the solution, we need to understand the problem. Let's examine how SPAs differ from traditional server-rendered applications.

How SPAs Work

In a traditional server-rendered application, when a user requests a page, the server generates complete HTML with all content and meta tags:

<!-- Traditional SSR: Server sends complete HTML -->
<!DOCTYPE html>
<html>
<head>
  <title>Awesome Blog Post - My Site</title>
  <meta name="description" content="This is an amazing post about...">
  <meta property="og:title" content="Awesome Blog Post">
  <meta property="og:description" content="This is an amazing post about...">
  <meta property="og:image" content="https://example.com/images/post.jpg">
</head>
<body>
  <h1>Awesome Blog Post</h1>
  <p>This is an amazing post about Lambda@Edge and SEO...</p>
  <p>More content here...</p>
</body>
</html>

In contrast, SPAs serve a minimal HTML shell and load content via JavaScript:

<!-- SPA: Server sends minimal HTML shell -->
<!DOCTYPE html>
<html>
<head>
  <title>My App</title>
  <meta name="description" content="Generic app description">
</head>
<body>
  <div id="root"></div>
  <script src="/bundle.js"></script>
</body>
</html>

The JavaScript bundle then:

  1. Executes in the browser
  2. Fetches data from APIs
  3. Renders content into the <div id="root"> element
  4. Updates meta tags via document.title or libraries like React Helmet

For human users, this works perfectly. The page loads, JavaScript executes, and content appears. But for bots? Not so much.

Why Bots Fail with SPAs

Search engine crawlers and social media bots have significant limitations:

Search Engine Crawlers (Google, Bing, DuckDuckGo):

  • May execute JavaScript, but with strict resource constraints
  • Rendering happens in a queue, can take hours or days
  • Timeout after 5-10 seconds if JavaScript is slow
  • Don't support all modern JavaScript features
  • May miss content loaded after initial render

Social Media Crawlers (Facebook, Twitter, LinkedIn):

  • Do NOT execute JavaScript at all
  • Only read server-sent HTML
  • Look for meta tags in the <head> section
  • Timeout after 2-5 seconds
  • Cache results aggressively

When Facebook's scraper (facebookexternalhit) visits your SPA, it sees:

<html>
<head>
  <title>My App</title>
  <meta name="description" content="Generic app description">
</head>
<body>
  <div id="root"></div>
  <script src="/bundle.js"></script>
</body>
</html>

No og:title, no og:image, no content. Result: broken social preview cards.
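To see the problem from the crawler's side, the sketch below mimics a scraper that never runs JavaScript: it reads Open Graph tags straight out of the raw HTML string. The function name `extractOgTags` and its regex are illustrative assumptions, not how any particular crawler is implemented, and the regex only handles the simple `property` then `content` attribute order used in this guide.

```javascript
// Illustrative only: approximates a crawler that reads raw HTML and never runs JS.
function extractOgTags(html) {
  const tags = {};
  // Matches <meta property="og:..." content="..."> with property before content.
  const pattern = /<meta\s+property=["'](og:[^"']+)["']\s+content=["']([^"']*)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    tags[match[1]] = match[2];
  }
  return tags;
}

// The SPA shell: no Open Graph tags until JavaScript runs.
const spaShell = '<html><head><title>My App</title></head>' +
  '<body><div id="root"></div><script src="/bundle.js"></script></body></html>';

// A server-rendered page: tags are present in the initial HTML.
const serverRendered = '<html><head>' +
  '<meta property="og:title" content="Awesome Blog Post">' +
  '</head><body><h1>Awesome Blog Post</h1></body></html>';

console.log(extractOgTags(spaShell));       // no og:* keys found
console.log(extractOgTags(serverRendered)); // contains og:title
```

Running this against your own deployed `index.html` is a quick way to confirm what a non-rendering bot has to work with.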

Real-World Impact

Here's what happens in practice:

Facebook Sharing:

Generic Preview:
┌─────────────────────────────┐
│ My App                      │
│ example.com                 │
│ Generic app description     │
└─────────────────────────────┘

What You Want:

Rich Preview:
┌─────────────────────────────┐
│ [Beautiful Hero Image]      │
│ Awesome Blog Post           │
│ This is an amazing post... │
│ example.com/blog/post       │
└─────────────────────────────┘

Google Search Results:

  • Wrong page title (shows generic "My App" instead of specific post title)
  • Generic description instead of post-specific content
  • Delayed or missing indexing
  • Lower rankings due to poor content signals

Client-Side Meta Tag Updates Don't Work

Many developers try to fix this with client-side meta tag manipulation:

// React example - DOESN'T WORK FOR BOTS
import { Helmet } from 'react-helmet';

function BlogPost({ post }) {
  return (
    <>
      <Helmet>
        <title>{post.title} - My Blog</title>
        <meta property="og:title" content={post.title} />
        <meta property="og:description" content={post.description} />
        <meta property="og:image" content={post.image} />
      </Helmet>
      <article>
        <h1>{post.title}</h1>
        <p>{post.content}</p>
      </article>
    </>
  );
}

This updates meta tags only after JavaScript executes. Social media crawlers never run JavaScript, and search crawlers may index the page before rendering it, so neither reliably sees these tags.

Traditional Solutions & Limitations

Before Lambda@Edge, you had limited options:

1. Server-Side Rendering (SSR)

  • Frameworks: Next.js, Nuxt.js, SvelteKit, Angular Universal
  • Pros: Best solution, proper SSR for all requests
  • Cons: Requires complete rewrite, high infrastructure costs

2. Pre-rendering Services

  • Services: Prerender.io, Rendertron, Puppeteer-based solutions
  • Pros: Minimal code changes, works with existing SPAs
  • Cons: Expensive ($99-500/month), adds latency, single point of failure

3. Static Site Generation

  • Tools: Gatsby, Hugo, Jekyll, 11ty
  • Pros: Perfect SEO, fast, cheap hosting
  • Cons: Not suitable for dynamic content, build time increases with content

4. Hybrid Approaches

  • Example: Pre-render critical pages, SPA for everything else
  • Pros: Balance between SEO and SPA benefits
  • Cons: Complex to maintain, inconsistent user experience

All these approaches share common problems:

  • High cost (time or money)
  • Complexity in implementation
  • Significant architectural changes required

Understanding Lambda@Edge

Lambda@Edge offers an elegant alternative: intercept bot requests at CloudFront's edge and inject meta tags dynamically, without changing your SPA architecture.

What is Lambda@Edge?

Lambda@Edge is an extension of AWS Lambda that runs serverless functions on CloudFront's global edge network, close to your users, instead of in a single AWS Region.

Key characteristics:

  • Executes JavaScript (Node.js) or Python code
  • Runs at CloudFront edge locations (not centralized)
  • Intercepts CloudFront requests and responses
  • No server management required
  • Pay per execution (no idle costs)

CloudFront Event Types

Lambda@Edge can execute at four points in the CloudFront request/response lifecycle:

┌─────────────┐   Viewer      ┌──────────────┐   Origin      ┌─────────────┐
│   Client    │────Request────>│  CloudFront  │────Request────>│   Origin    │
│  (Browser)  │                │     Edge     │                │ (S3/Server) │
│             │<───Response────│              │<───Response────│             │
└─────────────┘                └──────────────┘                └─────────────┘
                                      ▲ ▲                            ▲ ▲
                                      │ │                            │ │
                          [1] Viewer  │ │ [4] Viewer    [2] Origin  │ │ [3] Origin
                              Request │ └─ Response         Request │ └─ Response

Event Triggers:

  1. Viewer Request (before CloudFront forwards to origin)

    • Use cases: Authentication, A/B testing, URI rewriting
    • Timeout: 5 seconds
    • Memory: 128MB (fixed)
  2. Origin Request (before CloudFront sends to origin, on cache miss)

    • Use cases: Adding headers, custom origin logic, external API calls
    • Timeout: 30 seconds
    • Memory: 128MB by default (configurable up to the standard Lambda limit)
  3. Origin Response (after receiving from origin, before caching)

    • Use cases: Modifying headers before caching, response transformation
    • Timeout: 30 seconds
    • Memory: 128MB by default (configurable up to the standard Lambda limit)
  4. Viewer Response (before returning to client)

    • Use cases: Meta tag injection ← Our approach
    • Timeout: 5 seconds
    • Memory: 128MB (fixed)

Why Viewer-Response for Meta Tag Injection?

For our use case (meta tag injection), viewer-response is optimal:

Consideration        | Viewer-Response                  | Origin-Response
Execution frequency  | Every request (cache hit + miss) | Cache misses only
Cache impact         | Modifies after cache lookup      | Modifies before caching
Latency              | Lower (after cache)              | Higher (on miss only)
Flexibility          | Can modify per request           | Same for all cached
Our use case         | ✅ Perfect                       | ⚠️ Less suitable

Why viewer-response wins:

  • Executes on every request (both cache hits and misses)
  • Can serve different HTML based on User-Agent (bot vs human)
  • Leverages CloudFront caching (98%+ cache hit ratio)
  • Minimal latency since it runs after cache lookup

Lambda@Edge Constraints

Understanding the limits is critical for production implementations:

// Viewer Request/Response Constraints
const LIMITS = {
  memory: "128MB (fixed, cannot increase)",
  timeout: "5 seconds max",
  codeSize: "1MB max (including dependencies)",
  responseSize: "40KB max for a response generated in a viewer trigger (1MB in origin triggers)",
  requestSize: "40KB max headers + body",

  // Important limitations
  noFileSystem: true,              // No /tmp access
  noDynamoDB: "Not recommended",   // High latency from edge
  noExternalAPIs: "Not recommended", // Must complete in 5s
};

Practical implications:

  • DO: Use regex for HTML parsing (fast, lightweight)
  • DO: Embed metadata in function code (no external calls)
  • DO: Use simple string operations
  • DON'T: Use heavy DOM parsers (cheerio, jsdom)
  • DON'T: Call external APIs in viewer events
  • DON'T: Use large npm packages

CloudWatch Logs Location: Unlike regular Lambda, Lambda@Edge logs appear in the edge region where the function executed, not us-east-1. This means logs are distributed globally.

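# The log group name always carries the us-east-1 prefix;
# what changes is the region in which the log group is created.
/aws/lambda/us-east-1.function-name  # created in us-east-1 (US East users)
/aws/lambda/us-east-1.function-name  # created in eu-west-1 (Europe users)
/aws/lambda/us-east-1.function-name  # created in ap-southeast-1 (Asia users)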

Solution Architecture

Now let's design the complete solution. Here's how Lambda@Edge solves the SPA SEO/OGP problem.

High-Level Architecture

┌──────────┐
│  Client  │ (Human User or Bot)
└────┬─────┘
     │ 1. GET /blog/my-post
     │    User-Agent: facebookexternalhit/1.1
┌────────────────────────────────────────┐
│     CloudFront Distribution            │
│     (CDN + Lambda@Edge)                │
│                                        │
│  ┌──────────────────────────────────┐ │
│  │  Cache Layer                     │ │
│  │  - Separate cache per User-Agent│ │
│  │  - 98%+ cache hit ratio          │ │
│  └──────────────────────────────────┘ │
└────┬───────────────────────────────────┘
     │ 2. Trigger: Viewer-Response
┌──────────────────────────────────────────────────┐
│         Lambda@Edge Function                     │
│  ┌────────────────────────────────────────────┐ │
│  │ Step 1: Bot Detection                      │ │
│  │   - Extract User-Agent header              │ │
│  │   - Match: /googlebot|facebookexternalhit/ │ │
│  │   - Result: isBot = true/false             │ │
│  │                                             │ │
│  │ Step 2: If Bot Detected                    │ │
│  │   - Parse request URI: /blog/my-post       │ │
│  │   - Lookup metadata from embedded config   │ │
│  │   - Generate OGP meta tags                 │ │
│  │   - Inject into HTML <head>                │ │
│  │   - Return modified HTML                   │ │
│  │                                             │ │
│  │ Step 3: If Human Detected                  │ │
│  │   - Return original HTML (SPA shell)       │ │
│  │   - JavaScript hydrates normally           │ │
│  └────────────────────────────────────────────┘ │
└────┬─────────────────────────────────────────────┘
     │ 3. Return HTML (modified or original)
┌──────────┐
│  Client  │
│  - Bot: Sees meta tags in HTML <head>
│  - Human: Sees SPA shell, JS hydrates
└──────────┘

Component Architecture

1. S3 Bucket (Static Hosting)

my-spa-bucket/
├── index.html          # SPA shell (entry point)
├── static/
│   ├── js/
│   │   ├── main.chunk.js
│   │   └── vendor.chunk.js
│   ├── css/
│   │   └── main.css
│   └── media/
│       └── logo.png
└── assets/
    └── og-images/      # Open Graph images
        ├── default.jpg
        └── blog-post.jpg

2. CloudFront Distribution

  • Origin: S3 bucket with Origin Access Identity (OAI)
  • Cache behavior: Forward User-Agent header to Lambda@Edge
  • Vary header: Cache separately based on User-Agent pattern
  • Error handling: 404 → 200 /index.html (SPA routing)

3. Lambda@Edge Function (Node.js 20)

// Function structure (simplified)
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const response = event.Records[0].cf.response;

  // 1. Bot detection
  // Optional chaining avoids a crash when the User-Agent header is missing
  const userAgent = request.headers['user-agent']?.[0]?.value || '';
  const isBot = detectBot(userAgent);

  // 2. Return early if not bot
  if (!isBot) return response;

  // 3. Inject meta tags for bots
  const metadata = getMetadata(request.uri);
  const modifiedResponse = injectMetaTags(response, metadata);

  return modifiedResponse;
};

4. Metadata Configuration

// Embedded in Lambda function (zero latency)
const METADATA_MAP = {
  '/': {
    title: 'My SPA - Home',
    description: 'Welcome to my application',
    image: 'https://cdn.example.com/og-home.jpg',
    url: 'https://example.com/'
  },
  '/blog/lambda-edge': {
    title: 'Solving SPA SEO with Lambda@Edge',
    description: 'Learn how to fix SPA SEO issues using AWS Lambda@Edge',
    image: 'https://cdn.example.com/blog/lambda-edge.jpg',
    url: 'https://example.com/blog/lambda-edge'
  }
};

Request Flow: Bot vs Human

Bot Request Flow (Facebook scraper):

1. Facebook bot requests: /blog/my-post
   User-Agent: facebookexternalhit/1.1

2. CloudFront receives request
   - Checks cache (key: URL + User-Agent pattern)
   - Cache MISS (first time)

3. CloudFront fetches from S3
   - Gets index.html (SPA shell)

4. Triggers viewer-response Lambda@Edge
   - Detects bot: isBot = true
   - Looks up metadata for /blog/my-post
   - Injects OGP tags into HTML <head>
   - Returns modified HTML

5. CloudFront returns the modified response
   - Viewer-response changes are applied after the cache, so the cached object stays the unmodified shell

6. Facebook receives HTML with meta tags:
   <meta property="og:title" content="My Post" />
   <meta property="og:image" content="image.jpg" />

7. Facebook scraper parses meta tags
   - Extracts title, description, image
   - Generates rich preview card

Human Request Flow (Chrome browser):

1. User requests: /blog/my-post
   User-Agent: Mozilla/5.0 ... Chrome/120.0

2. CloudFront receives request
   - Checks cache (key: URL + User-Agent pattern)
   - Cache HIT (98% of the time)

3. Triggers viewer-response Lambda@Edge
   - Detects human: isBot = false
   - Returns original HTML (no modification)

4. User receives SPA shell:
   <div id="root"></div>
   <script src="/bundle.js"></script>

5. JavaScript executes
   - Fetches post data from API
   - Renders content
   - Updates meta tags client-side (for display only)

Caching Strategy with Vary Header

CloudFront caches responses based on cache keys. We need separate cache entries for bots vs humans:

// CloudFront cache behavior configuration
{
  "ForwardedValues": {
    "QueryString": false,
    "Headers": ["User-Agent"],  // Forward to Lambda@Edge
    "Cookies": { "Forward": "none" }
  },
  "MinTTL": 3600,     // 1 hour min
  "DefaultTTL": 86400, // 24 hours default
  "MaxTTL": 604800    // 7 days max
}

Cache Key Composition:

Cache Key = URL + User-Agent Pattern

Example cache entries:
1. /blog/post + "bot" → HTML with injected meta tags
2. /blog/post + "browser" → Original SPA shell

Result: Bots get rich HTML, humans get SPA shell
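CloudFront keys on exact header values, so a "User-Agent pattern" implies collapsing the raw header into a small set of classes before it reaches the cache key. Below is a minimal sketch of that normalization, assuming it runs in a viewer-request function and that a custom header (here the hypothetical `x-viewer-class`) is what gets added to the cache key. Neither the header name nor the handler is part of the original setup.

```javascript
// Hypothetical normalization step: collapse any User-Agent into 'bot' or 'browser'.
const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot|slackbot/i;

function classifyViewer(userAgent) {
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent) ? 'bot' : 'browser';
}

// Sketch of a viewer-request handler that tags the request with the class.
// 'x-viewer-class' is an assumed header name, not a CloudFront built-in.
const viewerRequestHandler = async (event) => {
  const request = event.Records[0].cf.request;
  const userAgent = request.headers['user-agent']?.[0]?.value;
  request.headers['x-viewer-class'] = [
    { key: 'X-Viewer-Class', value: classifyViewer(userAgent) },
  ];
  return request;
};
```

With only two possible values, the cache holds at most two variants per URL instead of one per distinct User-Agent string.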

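Cache Hit Ratio:

  • Initial bot request: MISS (origin fetch, then Lambda@Edge injects tags)
  • Subsequent bot requests: HIT (no origin fetch)
  • Human requests: HIT (separate cache entry)
  • Expected cache hit ratio: 98%+

Because the function is attached to the viewer-response event, it still runs on every request, including cache hits. The high hit ratio means only ~2% of requests reach S3, which keeps origin cost and latency low, while Lambda@Edge cost scales with total request volume.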

Performance Characteristics

Latency Breakdown:

Bot Request (First Time - Cache Miss):
┌──────────────────────────────────────┐
│ CloudFront routing:          ~5ms    │
│ S3 fetch (origin):          ~50ms    │
│ Lambda@Edge execution:      ~10ms    │
│   - Bot detection:           1ms     │
│   - Metadata lookup:         0.1ms   │
│   - HTML injection:          8ms     │
│ CloudFront caching:          ~2ms    │
│ ─────────────────────────────────    │
│ Total:                      ~67ms    │
└──────────────────────────────────────┘

Bot Request (Cached):
┌──────────────────────────────────────┐
│ CloudFront routing:          ~5ms    │
│ Cache hit (no origin):        0ms    │
│ Lambda@Edge execution:      ~10ms    │
│ ─────────────────────────────────    │
│ Total:                      ~15ms    │
└──────────────────────────────────────┘

Human Request (Cached):
┌──────────────────────────────────────┐
│ CloudFront routing:          ~5ms    │
│ Cache hit (no origin):        0ms    │
│ Lambda@Edge (no injection):  ~2ms    │
│ ─────────────────────────────────    │
│ Total:                       ~7ms    │
└──────────────────────────────────────┘

Key Metrics:

  • Bot meta tag injection: 10-15ms added latency
  • Human requests: 2-7ms added latency (negligible)
  • Cache hit ratio: 98%+ with proper configuration
  • Lambda@Edge cold start: 20-50ms (rare, 1-2% of requests)

Cost Structure

For a site with 1 million requests/month:

Traffic Distribution:
- Total requests: 1,000,000
- Bot traffic: 5% (50,000 requests)
- Human traffic: 95% (950,000 requests)
- Cache hit ratio: 98%

Lambda@Edge Execution:
- Invocations: 1,000,000 (viewer-response runs on cache hits and misses)
- Average execution: 10ms per request (upper bound; human requests exit early)
- Memory: 128MB (fixed)

Cost Breakdown:
┌─────────────────────────────────────────────┐
│ CloudFront                                  │
│ - Requests: 1M × $0.0075/10k = $0.75        │
│ - Data transfer: ~1GB × $0.085 = $0.09      │
│                                             │
│ Lambda@Edge                                 │
│ - Requests: 1M × $0.60/1M = $0.60           │
│ - Compute: 1M × 10ms × 0.125GB              │
│   = 1,250 GB-s × $0.00005001 = $0.06        │
│                                             │
│ Total Monthly Cost: ~$1.50                  │
└─────────────────────────────────────────────┘

Comparison:
- Lambda@Edge solution: ~$1.50/month
- Prerender.io Basic:    $99/month
- Prerender.io Pro:      $249/month
- Savings: ~98% vs Prerender.io
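The Lambda@Edge portion of this arithmetic can be reproduced with a small helper. This is a sketch: `estimateLambdaEdgeCost` and the `RATES` object are illustrative names, and the rates are the per-request and per-GB-second prices used in this guide, which should be checked against the current AWS pricing page before relying on the result.

```javascript
// Rates are inputs so they can be updated from the current AWS pricing page.
const RATES = {
  perMillionRequests: 0.60,  // USD per 1M Lambda@Edge requests
  perGbSecond: 0.00005001,   // USD, equals $0.00000625125 per 128MB-second
};

function estimateLambdaEdgeCost({ invocations, avgDurationMs, memoryMb }, rates = RATES) {
  const requestCost = (invocations / 1_000_000) * rates.perMillionRequests;
  // GB-seconds = invocations × duration in seconds × memory in GB
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const computeCost = gbSeconds * rates.perGbSecond;
  return { requestCost, gbSeconds, computeCost, total: requestCost + computeCost };
}

// Function runs only on cache misses (2% of 1M requests)
console.log(estimateLambdaEdgeCost({ invocations: 20_000, avgDurationMs: 10, memoryMb: 128 }));

// Function runs on every request
console.log(estimateLambdaEdgeCost({ invocations: 1_000_000, avgDurationMs: 10, memoryMb: 128 }));
```

Comparing the two calls shows how strongly the estimate depends on whether the function is invoked on every request or only on cache misses.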

This architecture delivers production-grade SEO and OGP support for SPAs at minimal cost and latency.

Implementation: Bot Detection

Now let's implement the core logic. Bot detection is the critical first step—we need to identify when a request comes from a search crawler or social media bot.

Bot User-Agent Patterns

Search engines and social platforms identify their crawlers using User-Agent strings. Here are the key patterns:

// Comprehensive bot detection patterns (2026)
const BOT_USER_AGENTS = [
  // Google crawlers
  /googlebot/i,              // Main Googlebot
  /google-inspectiontool/i,  // Google Search Console
  /adsbot-google/i,          // Google Ads bot

  // Other search engines
  /bingbot/i,                // Microsoft Bing
  /slurp/i,                  // Yahoo
  /duckduckbot/i,            // DuckDuckGo
  /baiduspider/i,            // Baidu (China)
  /yandexbot/i,              // Yandex (Russia)
  /sogou/i,                  // Sogou (China)

  // Social media crawlers
  /facebookexternalhit/i,    // Facebook Link Preview
  /facebot/i,                // Facebook Bot (alternative)
  /twitterbot/i,             // Twitter Card Validator
  /linkedinbot/i,            // LinkedIn Preview
  /slackbot/i,               // Slack Link Unfurling
  /discordbot/i,             // Discord Link Embed
  /whatsapp/i,               // WhatsApp Link Preview
  /telegrambot/i,            // Telegram Link Preview

  // Additional important bots
  /pinterestbot/i,           // Pinterest
  /redditbot/i,              // Reddit
  /applebot/i,               // Apple (Siri, Spotlight)
  /ia_archiver/i,            // Alexa
];

/**
 * Detect if the request comes from a bot
 * @param {string} userAgent - User-Agent header value
 * @returns {boolean} - True if bot detected
 */
function isBot(userAgent) {
  if (!userAgent) return false;
  return BOT_USER_AGENTS.some(pattern => pattern.test(userAgent));
}

Detection Strategy

Our detection strategy prioritizes simplicity and reliability:

1. User-Agent Matching (Primary)

  • Fast: Regex test completes in < 1ms
  • Reliable: Bots consistently use identifiable User-Agent strings
  • Comprehensive: Covers 99%+ of legitimate crawlers
  • Easy to extend: Add new patterns as bots emerge

2. CloudFront Headers (Optional)

CloudFront can add device-detection headers:

// Optional: Use CloudFront device detection headers
const headers = request.headers;
const isMobile = headers['cloudfront-is-mobile-viewer']?.[0]?.value === 'true';
const isDesktop = headers['cloudfront-is-desktop-viewer']?.[0]?.value === 'true';

// CloudFront doesn't provide is-bot header by default
// but User-Agent matching is sufficient

Production Implementation

Here's the complete bot detection logic with proper error handling:

/**
 * Lambda@Edge handler for bot detection and meta tag injection
 */
exports.handler = async (event) => {
  const { request, response } = event.Records[0].cf;

  try {
    // Extract User-Agent header
    const userAgentHeader = request.headers['user-agent'];
    const userAgent = userAgentHeader?.[0]?.value || '';

    // Perform bot detection
    const isBotRequest = isBot(userAgent);

    // Log detection result (CloudWatch)
    console.log(JSON.stringify({
      timestamp: new Date().toISOString(),
      path: request.uri,
      userAgent: userAgent.substring(0, 100), // Truncate for logs
      isBot: isBotRequest,
      country: request.headers['cloudfront-viewer-country']?.[0]?.value
    }));

    // If not a bot, return original response immediately
    if (!isBotRequest) {
      return response;
    }

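    // Bot detected: proceed to meta tag injection
    // (next section). Until that is wired in, return the response
    // unchanged so the handler never resolves to undefined.
    return response;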

  } catch (error) {
    console.error('Bot detection error:', error);
    // Always return response on error (fail gracefully)
    return response;
  }
};

/**
 * Bot detection function
 */
function isBot(userAgent) {
  if (!userAgent || typeof userAgent !== 'string') {
    return false;
  }

  return BOT_USER_AGENTS.some(pattern => {
    try {
      return pattern.test(userAgent);
    } catch (err) {
      console.error('Regex test error:', err);
      return false;
    }
  });
}

// Bot patterns (from above)
const BOT_USER_AGENTS = [
  /googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot/i,
  /facebookexternalhit|facebot|twitterbot|linkedinbot|slackbot/i,
  /discordbot|whatsapp|telegrambot|pinterestbot|redditbot|applebot/i
];

Handling Edge Cases

Missing User-Agent:

// Some requests may not have User-Agent header
const userAgent = request.headers['user-agent']?.[0]?.value || '';
if (!userAgent) {
  return false; // Treat as non-bot (default behavior)
}

Malformed User-Agent:

// Wrap regex matching in try-catch
try {
  return BOT_USER_AGENTS.some(pattern => pattern.test(userAgent));
} catch (error) {
  console.error('Bot detection regex error:', error);
  return false; // Fail gracefully, assume non-bot
}

Whitelist for Internal Tools:

// Whitelist certain UAs that match bot patterns but aren't bots
const WHITELIST = [
  /my-company-monitoring-tool/i,
  /internal-link-checker/i
];

function isBot(userAgent) {
  // Check whitelist first
  if (WHITELIST.some(pattern => pattern.test(userAgent))) {
    return false;
  }

  // Then check bot patterns
  return BOT_USER_AGENTS.some(pattern => pattern.test(userAgent));
}

Testing Bot Detection

Local Testing with Sample Events:

// test-bot-detection.js
// Assumes index.js exports isBot alongside the handler,
// e.g. module.exports = { handler, isBot };
const { isBot } = require('./index');

const testCases = [
  {
    name: 'Facebook Bot',
    userAgent: 'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)',
    expectedBot: true
  },
  {
    name: 'Googlebot',
    userAgent: 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    expectedBot: true
  },
  {
    name: 'Chrome Browser',
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
    expectedBot: false
  },
  {
    name: 'Twitter Bot',
    userAgent: 'Twitterbot/1.0',
    expectedBot: true
  }
];

testCases.forEach(test => {
  // Call the detection function directly; no CloudFront event is needed here
  const result = isBot(test.userAgent);
  const pass = result === test.expectedBot;

  console.log(`${pass ? '✓' : '✗'} ${test.name}: ${result} (expected ${test.expectedBot})`);
});
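Testing the full handler, rather than `isBot` alone, needs an event object shaped like the one CloudFront sends. A minimal sketch of a `createMockEvent` helper follows. Its field set is an assumption trimmed to what the handler in this guide reads, and a real event carries more fields.

```javascript
// Builds a minimal viewer-response event in the shape Lambda@Edge passes to handlers.
// Only the fields the handler in this guide reads are included.
function createMockEvent(userAgent, uri = '/') {
  return {
    Records: [{
      cf: {
        config: { eventType: 'viewer-response' },
        request: {
          method: 'GET',
          uri,
          headers: {
            // CloudFront lowercases header names and wraps values in an array.
            'user-agent': [{ key: 'User-Agent', value: userAgent }],
          },
        },
        response: {
          status: '200',
          statusDescription: 'OK',
          headers: {
            'content-type': [{ key: 'Content-Type', value: 'text/html' }],
          },
        },
      },
    }],
  };
}

const event = createMockEvent('Twitterbot/1.0', '/blog/post');
console.log(event.Records[0].cf.request.headers['user-agent'][0].value);
```

The returned object can be passed straight to `handler(event)` in a local test script.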

cURL Testing (after deployment):

# Test with Facebook bot
curl -H "User-Agent: facebookexternalhit/1.1" \
  https://your-cloudfront-domain.com/blog/post \
  -v | grep -i "og:title"

# Test with Googlebot
curl -H "User-Agent: Googlebot/2.1" \
  https://your-cloudfront-domain.com/blog/post \
  -v | grep -i "og:title"

# Test with normal browser (should NOT inject)
curl -H "User-Agent: Mozilla/5.0 Chrome/120.0" \
  https://your-cloudfront-domain.com/blog/post \
  -v | grep -i "og:title"

Performance Optimization

Pre-compile Regex Patterns:

// Compile patterns once at Lambda initialization (cold start)
// Not on every request (warm execution)
const COMBINED_BOT_PATTERN = new RegExp(
  'googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot|' +
  'facebookexternalhit|facebot|twitterbot|linkedinbot|slackbot|' +
  'discordbot|whatsapp|telegrambot|pinterestbot|redditbot|applebot',
  'i'
);

function isBot(userAgent) {
  // Always return a boolean, even for a missing or non-string header value
  return typeof userAgent === 'string' && COMBINED_BOT_PATTERN.test(userAgent);
}

This single regex test is slightly faster than testing multiple patterns, but the difference is negligible (< 0.5ms).

Monitoring Bot Traffic

CloudWatch Logs Insights Query:

fields @timestamp, userAgent, path, isBot
| filter isBot = true
| stats count(*) as requests by userAgent
| sort requests desc
| limit 20

This query shows which bots are accessing your site and how frequently.

Implementation: Meta Tag Injection

Once we've detected a bot, we need to inject Open Graph Protocol (OGP) meta tags into the HTML response. This section covers the complete implementation.

Metadata Configuration

First, define metadata for your routes. We'll embed this in the Lambda function for zero-latency lookups:

/**
 * Metadata map: URL path → OGP metadata
 */
const METADATA_MAP = {
  '/': {
    title: 'My SPA - Home',
    description: 'Welcome to my modern single-page application',
    image: 'https://cdn.example.com/images/og-home.jpg',
    url: 'https://example.com/',
    type: 'website'
  },
  '/blog/lambda-edge-seo': {
    title: 'Solving SPA SEO with Lambda@Edge - My Blog',
    description: 'Learn how to fix SPA SEO and social sharing issues using AWS Lambda@Edge. Complete production guide with code examples.',
    image: 'https://cdn.example.com/images/blog/lambda-edge.jpg',
    url: 'https://example.com/blog/lambda-edge-seo',
    type: 'article'
  },
  '/about': {
    title: 'About Us - My SPA',
    description: 'Learn more about our company, mission, and team',
    image: 'https://cdn.example.com/images/og-about.jpg',
    url: 'https://example.com/about',
    type: 'website'
  }
};

/**
 * Default metadata for unknown routes
 */
const DEFAULT_METADATA = {
  title: 'My SPA Site',
  description: 'A modern single-page application',
  image: 'https://cdn.example.com/images/og-default.jpg',
  url: 'https://example.com',
  type: 'website',
  siteName: 'My SPA'
};

/**
 * Get metadata for a given path
 * @param {string} path - Request URI path
 * @returns {Object} - Metadata object
 */
function getMetadata(path) {
  // Direct match
  if (METADATA_MAP[path]) {
    return { ...DEFAULT_METADATA, ...METADATA_MAP[path] };
  }

  // Pattern matching for dynamic routes
  if (path.startsWith('/blog/')) {
    const slug = path.split('/blog/')[1];
    return {
      ...DEFAULT_METADATA,
      title: `${formatTitle(slug)} - My Blog`,
      description: `Read our post about ${formatTitle(slug)}`,
      image: `https://cdn.example.com/images/blog/${slug}.jpg`,
      url: `https://example.com${path}`,
      type: 'article'
    };
  }

  // Fallback to default
  return DEFAULT_METADATA;
}

/**
 * Format slug into title case
 * @param {string} slug - URL slug (e.g., "my-post-title")
 * @returns {string} - Title case string (e.g., "My Post Title")
 */
function formatTitle(slug) {
  return slug
    .split('-')
    .map(word => word.charAt(0).toUpperCase() + word.slice(1))
    .join(' ');
}
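Exact-match lookups are sensitive to small differences in the path, such as a trailing slash. One option is to canonicalize the path before calling `getMetadata`. The `normalizePath` helper below is a hypothetical addition, sketched under the assumption that `/about` and `/about/` should resolve to the same metadata entry.

```javascript
// Hypothetical helper: canonicalize a request path before the metadata lookup.
function normalizePath(path) {
  if (typeof path !== 'string' || path === '') return '/';
  // Drop any query string or fragment that may have been passed along.
  let clean = path.split(/[?#]/)[0];
  // Collapse repeated slashes into one.
  clean = clean.replace(/\/{2,}/g, '/');
  // Remove a trailing slash, except for the root path itself.
  if (clean.length > 1 && clean.endsWith('/')) clean = clean.slice(0, -1);
  return clean || '/';
}

console.log(normalizePath('/blog/lambda-edge-seo/'));
console.log(normalizePath('//about?ref=twitter'));
```

Usage would then look like `getMetadata(normalizePath(request.uri))`.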

Generating Meta Tags

Create properly formatted OGP meta tags with HTML escaping:

/**
 * Generate OGP meta tags HTML
 * @param {Object} metadata - Metadata object
 * @returns {string} - HTML meta tags
 */
function generateMetaTags(metadata) {
  const {
    title,
    description,
    image,
    url,
    type = 'website',
    siteName = 'My SPA'
  } = metadata;

  // Escape HTML special characters
  const escape = (str) => {
    if (!str) return '';
    return String(str)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#039;');
  };

  // Generate meta tags
  return `
    <!-- Open Graph / Facebook -->
    <meta property="og:type" content="${escape(type)}">
    <meta property="og:url" content="${escape(url)}">
    <meta property="og:title" content="${escape(title)}">
    <meta property="og:description" content="${escape(description)}">
    <meta property="og:image" content="${escape(image)}">
    <meta property="og:site_name" content="${escape(siteName)}">

    <!-- Twitter Card -->
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:url" content="${escape(url)}">
    <meta name="twitter:title" content="${escape(title)}">
    <meta name="twitter:description" content="${escape(description)}">
    <meta name="twitter:image" content="${escape(image)}">

    <!-- Standard Meta Tags -->
    <meta name="description" content="${escape(description)}">
    <title>${escape(title)}</title>
  `.trim();
}
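To make the escaping step concrete, here is a standalone sketch that repeats the same replacements and applies them to a title containing characters that would otherwise break the `content` attribute. The name `escapeHtml` and the sample title are illustrative.

```javascript
// Same replacement order as the escape helper above: '&' must be handled first
// so that entities produced by later replacements are not escaped twice.
function escapeHtml(str) {
  if (!str) return '';
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

const title = 'Tom & "Jerry" <live>';
const tag = `<meta property="og:title" content="${escapeHtml(title)}">`;
console.log(tag);
// <meta property="og:title" content="Tom &amp; &quot;Jerry&quot; &lt;live&gt;">
```

Without escaping, the double quote in the title would terminate the attribute early and the rest of the text would leak into the markup.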

HTML Injection Strategy

Inject meta tags into the HTML <head> section using a simple string search for the closing </head> tag:

/**
 * Inject meta tags into HTML response
 * @param {Object} response - CloudFront response object
 * @param {Object} metadata - Metadata object
 * @returns {Object} - Modified response object
 */
function injectMetaTags(response, metadata) {
  // Check if response is HTML
  const contentType = response.headers['content-type']?.[0]?.value || '';
  if (!contentType.includes('text/html')) {
    return response; // Not HTML, return unchanged
  }

  try {
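    // Get response body. Lambda@Edge does not pass the origin's body to
    // response triggers, so bail out if no body was supplied upstream.
    let html = response.body;
    if (typeof html !== 'string' || html.length === 0) {
      console.warn('Response body not available; skipping injection');
      return response;
    }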

    // Handle base64 encoding (CloudFront may encode)
    const isBase64 = response.bodyEncoding === 'base64';
    if (isBase64) {
      html = Buffer.from(html, 'base64').toString('utf8');
    }

    // Generate meta tags
    const metaTags = generateMetaTags(metadata);

    // Find </head> tag and inject before it
    const headCloseIndex = html.indexOf('</head>');
    if (headCloseIndex === -1) {
      console.warn('No </head> tag found in HTML');
      return response; // Malformed HTML, return unchanged
    }

    // Inject meta tags
    const modifiedHtml =
      html.slice(0, headCloseIndex) +
      metaTags + '\n' +
      html.slice(headCloseIndex);

    // Update response body
    if (isBase64) {
      response.body = Buffer.from(modifiedHtml).toString('base64');
      response.bodyEncoding = 'base64';
    } else {
      response.body = modifiedHtml;
    }

    // Update Content-Length header
    const byteLength = Buffer.byteLength(modifiedHtml, 'utf8');
    response.headers['content-length'] = [{
      key: 'Content-Length',
      value: byteLength.toString()
    }];

    return response;

  } catch (error) {
    console.error('Meta tag injection error:', error);
    return response; // Return original on error
  }
}
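
One limitation of `indexOf('</head>')` is that it is case-sensitive, so a shell that ships `</HEAD>` or `</head >` is returned unchanged. The sketch below shows one possible variant as a pure function; `injectBeforeHeadClose` is a hypothetical name, not part of the function above, and it assumes the first closing head tag in the document is the real one.

```javascript
// Hypothetical helper: insert `tags` before the first closing head tag,
// matching case-insensitively and tolerating whitespace (e.g. "</HEAD >").
function injectBeforeHeadClose(html, tags) {
  const match = /<\/head\s*>/i.exec(html);
  if (!match) return null; // caller decides how to handle a missing </head>
  return html.slice(0, match.index) + tags + '\n' + html.slice(match.index);
}

const shell = '<html><HEAD><title>App</title></HEAD><body><div id="root"></div></body></html>';
const injected = injectBeforeHeadClose(shell, '<meta name="description" content="Demo">');
console.log(injected);
```

Returning `null` for a missing tag keeps the decision (log and pass through, as above) with the caller.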

Complete Lambda@Edge Function

Here's the full production-ready implementation:

'use strict';

// Bot detection patterns
const BOT_USER_AGENTS = [
  /googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot/i,
  /facebookexternalhit|facebot|twitterbot|linkedinbot|slackbot/i,
  /discordbot|whatsapp|telegrambot|pinterestbot|redditbot|applebot/i
];

// Metadata configuration
const METADATA_MAP = {
  '/': {
    title: 'My SPA - Home',
    description: 'Welcome to my modern single-page application',
    image: 'https://cdn.example.com/images/og-home.jpg',
    url: 'https://example.com/'
  },
  '/blog/lambda-edge-seo': {
    title: 'Solving SPA SEO with Lambda@Edge',
    description: 'Learn how to fix SPA SEO using AWS Lambda@Edge',
    image: 'https://cdn.example.com/images/blog/lambda-edge.jpg',
    url: 'https://example.com/blog/lambda-edge-seo',
    type: 'article'
  }
};

const DEFAULT_METADATA = {
  title: 'My SPA Site',
  description: 'A modern single-page application',
  image: 'https://cdn.example.com/images/og-default.jpg',
  url: 'https://example.com',
  type: 'website',
  siteName: 'My SPA'
};

/**
 * Lambda@Edge handler
 */
exports.handler = async (event) => {
  const { request, response } = event.Records[0].cf;

  try {
    // Extract User-Agent
    const userAgentHeader = request.headers['user-agent'];
    const userAgent = userAgentHeader?.[0]?.value || '';

    // Detect bot
    const isBotRequest = isBot(userAgent);

    // Log request
    console.log(JSON.stringify({
      timestamp: Date.now(),
      path: request.uri,
      userAgent: userAgent.substring(0, 100),
      isBot: isBotRequest
    }));

    // Return original response if not bot
    if (!isBotRequest) {
      return response;
    }

    // Get metadata for this path
    const metadata = getMetadata(request.uri);

    // Inject meta tags
    return injectMetaTags(response, metadata);

  } catch (error) {
    console.error('Lambda@Edge error:', error);
    return response; // Always return response on error
  }
};

/**
 * Bot detection
 */
function isBot(userAgent) {
  if (!userAgent || typeof userAgent !== 'string') {
    return false;
  }
  return BOT_USER_AGENTS.some(pattern => pattern.test(userAgent));
}

/**
 * Get metadata for path
 */
function getMetadata(path) {
  if (METADATA_MAP[path]) {
    return { ...DEFAULT_METADATA, ...METADATA_MAP[path] };
  }

  // Dynamic route handling
  if (path.startsWith('/blog/')) {
    const slug = path.split('/blog/')[1];
    return {
      ...DEFAULT_METADATA,
      title: `${formatTitle(slug)} - My Blog`,
      description: `Read about ${formatTitle(slug)}`,
      image: `https://cdn.example.com/images/blog/${slug}.jpg`,
      url: `https://example.com${path}`,
      type: 'article'
    };
  }

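  // Unknown route: keep the default copy but point og:url at the requested page
  return { ...DEFAULT_METADATA, url: `https://example.com${path}` };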
}

/**
 * Format slug to title case
 */
function formatTitle(slug) {
  return slug
    .split('-')
    .map(w => w.charAt(0).toUpperCase() + w.slice(1))
    .join(' ');
}

/**
 * Generate meta tags HTML
 */
function generateMetaTags(metadata) {
  const escape = (str) => {
    if (!str) return '';
    return String(str)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#039;');
  };

  const {
    title,
    description,
    image,
    url,
    type = 'website',
    siteName = 'My SPA'
  } = metadata;

  return `
    <!-- Open Graph / Facebook -->
    <meta property="og:type" content="${escape(type)}">
    <meta property="og:url" content="${escape(url)}">
    <meta property="og:title" content="${escape(title)}">
    <meta property="og:description" content="${escape(description)}">
    <meta property="og:image" content="${escape(image)}">
    <meta property="og:site_name" content="${escape(siteName)}">

    <!-- Twitter Card -->
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:url" content="${escape(url)}">
    <meta name="twitter:title" content="${escape(title)}">
    <meta name="twitter:description" content="${escape(description)}">
    <meta name="twitter:image" content="${escape(image)}">

    <!-- Standard Meta Tags -->
    <meta name="description" content="${escape(description)}">
    <title>${escape(title)}</title>
  `.trim();
}

/**
 * Inject meta tags into HTML
 */
function injectMetaTags(response, metadata) {
  const contentType = response.headers['content-type']?.[0]?.value || '';
  if (!contentType.includes('text/html')) {
    return response;
  }

  try {
    let html = response.body;
    const isBase64 = response.bodyEncoding === 'base64';

    if (isBase64) {
      html = Buffer.from(html, 'base64').toString('utf8');
    }

    const metaTags = generateMetaTags(metadata);
    const headCloseIndex = html.indexOf('</head>');

    if (headCloseIndex === -1) {
      return response;
    }

    const modifiedHtml =
      html.slice(0, headCloseIndex) +
      metaTags + '\n' +
      html.slice(headCloseIndex);

    if (isBase64) {
      response.body = Buffer.from(modifiedHtml).toString('base64');
      response.bodyEncoding = 'base64';
    } else {
      response.body = modifiedHtml;
    }

    const byteLength = Buffer.byteLength(modifiedHtml, 'utf8');
    response.headers['content-length'] = [{
      key: 'Content-Length',
      value: byteLength.toString()
    }];

    return response;

  } catch (error) {
    console.error('Injection error:', error);
    return response;
  }
}
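
The `getMetadata` lookup above matches `request.uri` exactly, so `/blog/lambda-edge-seo/` (trailing slash) misses the map, and `/blog/` yields an empty slug and a title of " - My Blog". A minimal sketch of one way to canonicalize the path first is shown below. `normalizePath` and `blogSlug` are hypothetical helpers, and the slug character set is an assumption you would adapt to your own routes.

```javascript
// Hypothetical helper: canonicalize a CloudFront request URI before lookup.
function normalizePath(uri) {
  let path;
  try {
    path = decodeURIComponent(uri); // request.uri arrives percent-encoded
  } catch (err) {
    path = uri; // malformed escape sequence: fall back to the raw value
  }
  path = path.replace(/\/{2,}/g, '/');          // collapse duplicate slashes
  path = path.replace(/\/index\.html$/i, '/');  // treat /x/index.html as /x/
  if (path.length > 1) path = path.replace(/\/+$/, ''); // drop trailing slash
  return path || '/';
}

// Hypothetical helper: extract a blog slug, rejecting empty or nested segments.
function blogSlug(path) {
  const match = /^\/blog\/([a-z0-9-]+)$/i.exec(path);
  return match ? match[1] : null;
}

console.log(normalizePath('/blog/lambda-edge-seo/'));
console.log(blogSlug(normalizePath('/blog/')));
```

With this in place, the static map would be keyed by normalized paths, and a `null` slug would fall through to the default metadata.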

OGP Image Requirements

For optimal social sharing, follow these image specifications:

// Facebook OGP image best practices (2026)
const OG_IMAGE_SPECS = {
  // Recommended dimensions
  recommended: { width: 1200, height: 630 },
  aspectRatio: '1.91:1',

  // Minimum dimensions
  minimum: { width: 600, height: 315 },

  // Maximum size
  maxFileSize: '8 MB',

  // Supported formats
  formats: ['JPG', 'PNG', 'WebP', 'GIF'],

  // Protocol
  protocol: 'https://', // Required for Facebook

  // Multiple images
  fallback: 'First og:image tag is primary'
};

// Example with structured image properties (secure URL, dimensions, alt text)
const metaTags = `
  <meta property="og:image" content="https://cdn.example.com/primary.jpg">
  <meta property="og:image:secure_url" content="https://cdn.example.com/primary.jpg">
  <meta property="og:image:width" content="1200">
  <meta property="og:image:height" content="630">
  <meta property="og:image:alt" content="Image description for accessibility">
`;
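
The specifications above can also be checked in code before an image URL goes into `METADATA_MAP`. The sketch below is one way to do that; `checkOgImage` is a hypothetical helper, the limits are copied from `OG_IMAGE_SPECS`, and the 5% aspect-ratio tolerance is my own assumption rather than a platform rule.

```javascript
// Limits copied from the OG_IMAGE_SPECS object above.
const MIN_WIDTH = 600;
const MIN_HEIGHT = 315;
const TARGET_RATIO = 1200 / 630; // roughly 1.91
const RATIO_TOLERANCE = 0.05;    // assumption: allow 5% deviation

// Hypothetical helper: return a list of warnings for an image descriptor.
function checkOgImage({ url, width, height }) {
  const warnings = [];
  if (!/^https:\/\//i.test(url)) {
    warnings.push('image URL is not https');
  }
  if (width < MIN_WIDTH || height < MIN_HEIGHT) {
    warnings.push(`image is below ${MIN_WIDTH}x${MIN_HEIGHT}`);
  }
  const ratio = width / height;
  if (Math.abs(ratio - TARGET_RATIO) / TARGET_RATIO > RATIO_TOLERANCE) {
    warnings.push(`aspect ratio ${ratio.toFixed(2)}:1 is far from 1.91:1`);
  }
  return warnings;
}

console.log(checkOgImage({ url: 'https://cdn.example.com/primary.jpg', width: 1200, height: 630 }));
console.log(checkOgImage({ url: 'http://cdn.example.com/small.jpg', width: 400, height: 400 }));
```

Running a check like this in CI is cheaper than discovering a cropped preview in the Facebook debugger.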

This completes the meta tag injection implementation. Next, we'll deploy this infrastructure using Terraform.

Infrastructure as Code with Terraform

Now let's deploy the complete infrastructure using Terraform. This IaC approach ensures reproducible, version-controlled deployments.

Quick Start

Want to jump straight to deployment? Clone the repository and follow the README:

git clone https://github.com/khuongdo/lambda-edge-spa-seo.git
cd lambda-edge-spa-seo

Prerequisites

# Install Terraform (if not already installed)
brew tap hashicorp/tap && brew install hashicorp/tap/terraform  # macOS
# or download from https://www.terraform.io/downloads

# Configure AWS credentials
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and region

# Verify setup
terraform --version
aws sts get-caller-identity

Project Structure

spa-seo-lambda-edge/
├── terraform/
│   ├── main.tf           # Main infrastructure
│   ├── variables.tf      # Input variables
│   ├── outputs.tf        # Output values
│   └── lambda.tf         # Lambda function config
├── lambda/
│   └── index.js          # Lambda@Edge function
└── README.md

Lambda Function Packaging

First, save the Lambda function code:

# Create Lambda function file
mkdir -p lambda
cat > lambda/index.js << 'EOF'
'use strict';

const BOT_USER_AGENTS = [
  /googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot/i,
  /facebookexternalhit|facebot|twitterbot|linkedinbot|slackbot/i,
  /discordbot|whatsapp|telegrambot|pinterestbot|redditbot|applebot/i
];

const METADATA_MAP = {
  '/': {
    title: 'My SPA - Home',
    description: 'Welcome to my modern single-page application',
    image: 'https://cdn.example.com/images/og-home.jpg',
    url: 'https://example.com/'
  },
  '/blog/lambda-edge-seo': {
    title: 'Solving SPA SEO with Lambda@Edge',
    description: 'Learn how to fix SPA SEO using AWS Lambda@Edge',
    image: 'https://cdn.example.com/images/blog/lambda-edge.jpg',
    url: 'https://example.com/blog/lambda-edge-seo',
    type: 'article'
  }
};

const DEFAULT_METADATA = {
  title: 'My SPA Site',
  description: 'A modern single-page application',
  image: 'https://cdn.example.com/images/og-default.jpg',
  url: 'https://example.com',
  type: 'website',
  siteName: 'My SPA'
};

exports.handler = async (event) => {
  const { request, response } = event.Records[0].cf;

  try {
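    const userAgent = request.headers['user-agent']?.[0]?.value || '';
    const isBotRequest = isBot(userAgent);

    // Structured log line; the CloudWatch Logs Insights query in the testing section reads these fields
    console.log(JSON.stringify({
      timestamp: Date.now(), path: request.uri,
      userAgent: userAgent.substring(0, 100), isBot: isBotRequest
    }));

    if (!isBotRequest) {
      return response;
    }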

    const metadata = getMetadata(request.uri);
    return injectMetaTags(response, metadata);

  } catch (error) {
    console.error('Lambda@Edge error:', error);
    return response;
  }
};

function isBot(userAgent) {
  if (!userAgent) return false;
  return BOT_USER_AGENTS.some(pattern => pattern.test(userAgent));
}

function getMetadata(path) {
  if (METADATA_MAP[path]) {
    return { ...DEFAULT_METADATA, ...METADATA_MAP[path] };
  }

  if (path.startsWith('/blog/')) {
    const slug = path.split('/blog/')[1];
    return {
      ...DEFAULT_METADATA,
      title: `${formatTitle(slug)} - My Blog`,
      description: `Read about ${formatTitle(slug)}`,
      image: `https://cdn.example.com/images/blog/${slug}.jpg`,
      url: `https://example.com${path}`,
      type: 'article'
    };
  }

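  // Unknown route: keep the default copy but point og:url at the requested page
  return { ...DEFAULT_METADATA, url: `https://example.com${path}` };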
}

function formatTitle(slug) {
  return slug.split('-').map(w => w.charAt(0).toUpperCase() + w.slice(1)).join(' ');
}

function generateMetaTags(metadata) {
  const escape = (str) => {
    if (!str) return '';
    return String(str)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#039;');
  };

  const { title, description, image, url, type = 'website', siteName = 'My SPA' } = metadata;

  return `
    <!-- Open Graph / Facebook -->
    <meta property="og:type" content="${escape(type)}">
    <meta property="og:url" content="${escape(url)}">
    <meta property="og:title" content="${escape(title)}">
    <meta property="og:description" content="${escape(description)}">
    <meta property="og:image" content="${escape(image)}">
    <meta property="og:site_name" content="${escape(siteName)}">

    <!-- Twitter Card -->
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:url" content="${escape(url)}">
    <meta name="twitter:title" content="${escape(title)}">
    <meta name="twitter:description" content="${escape(description)}">
    <meta name="twitter:image" content="${escape(image)}">

    <!-- Standard Meta Tags -->
    <meta name="description" content="${escape(description)}">
    <title>${escape(title)}</title>
  `.trim();
}

function injectMetaTags(response, metadata) {
  const contentType = response.headers['content-type']?.[0]?.value || '';
  if (!contentType.includes('text/html')) {
    return response;
  }

  try {
    let html = response.body;
    const isBase64 = response.bodyEncoding === 'base64';

    if (isBase64) {
      html = Buffer.from(html, 'base64').toString('utf8');
    }

    const metaTags = generateMetaTags(metadata);
    const headCloseIndex = html.indexOf('</head>');

    if (headCloseIndex === -1) {
      return response;
    }

    const modifiedHtml = html.slice(0, headCloseIndex) + metaTags + '\n' + html.slice(headCloseIndex);

    if (isBase64) {
      response.body = Buffer.from(modifiedHtml).toString('base64');
      response.bodyEncoding = 'base64';
    } else {
      response.body = modifiedHtml;
    }

    response.headers['content-length'] = [{
      key: 'Content-Length',
      value: Buffer.byteLength(modifiedHtml, 'utf8').toString()
    }];

    return response;

  } catch (error) {
    console.error('Injection error:', error);
    return response;
  }
}
EOF

# Package Lambda function
cd lambda
zip -q function.zip index.js
cd ..
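
Before deploying, it can help to exercise the handler locally with a hand-built event. The sketch below builds a minimal event in the `Records[0].cf` shape the handler reads. `makeResponseEvent` is a hypothetical helper, and the `body` field is an assumption made for local testing of `injectMetaTags`; confirm what your trigger actually receives before relying on it.

```javascript
// Hypothetical helper: build a minimal CloudFront response event for local tests.
// The `body` field is an assumption for testing; real events may not carry it.
function makeResponseEvent({ uri = '/', userAgent = '', body = '' } = {}) {
  return {
    Records: [{
      cf: {
        request: {
          uri,
          method: 'GET',
          headers: {
            'user-agent': [{ key: 'User-Agent', value: userAgent }]
          }
        },
        response: {
          status: '200',
          statusDescription: 'OK',
          headers: {
            'content-type': [{ key: 'Content-Type', value: 'text/html; charset=utf-8' }]
          },
          body
        }
      }
    }]
  };
}

const botEvent = makeResponseEvent({
  uri: '/blog/lambda-edge-seo',
  userAgent: 'facebookexternalhit/1.1',
  body: '<html><head></head><body></body></html>'
});
console.log(JSON.stringify(botEvent.Records[0].cf.request, null, 2));

// Usage (from the project root, after lambda/index.js exists):
//   const { handler } = require('./lambda/index.js');
//   handler(botEvent).then((res) => console.log(res.body));
```

Swapping the user agent for a browser string gives you the non-bot path with the same event.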

Complete Terraform Configuration

Create the Terraform files:

# terraform/main.tf

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Lambda@Edge MUST be in us-east-1
provider "aws" {
  region = "us-east-1"
  alias  = "us_east_1"
}

provider "aws" {
  region = var.aws_region
}

# S3 Bucket for SPA static files
resource "aws_s3_bucket" "spa_bucket" {
  bucket = var.s3_bucket_name

  tags = {
    Name        = "SPA Static Assets"
    Environment = var.environment
  }
}

# Block public access
resource "aws_s3_bucket_public_access_block" "spa_bucket" {
  bucket = aws_s3_bucket.spa_bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# CloudFront Origin Access Identity
resource "aws_cloudfront_origin_access_identity" "spa_oai" {
  comment = "OAI for ${var.s3_bucket_name}"
}

# S3 bucket policy - allow CloudFront OAI
resource "aws_s3_bucket_policy" "spa_bucket_policy" {
  bucket = aws_s3_bucket.spa_bucket.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowCloudFrontOAI"
        Effect = "Allow"
        Principal = {
          AWS = aws_cloudfront_origin_access_identity.spa_oai.iam_arn
        }
        Action   = "s3:GetObject"
        Resource = "${aws_s3_bucket.spa_bucket.arn}/*"
      }
    ]
  })
}

# IAM Role for Lambda@Edge
resource "aws_iam_role" "lambda_edge_role" {
  provider = aws.us_east_1
  name     = "${var.project_name}-lambda-edge-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = [
            "lambda.amazonaws.com",
            "edgelambda.amazonaws.com"
          ]
        }
      }
    ]
  })
}

# IAM Policy for Lambda@Edge
resource "aws_iam_role_policy" "lambda_edge_policy" {
  provider = aws.us_east_1
  role     = aws_iam_role.lambda_edge_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      }
    ]
  })
}

# Lambda@Edge Function
resource "aws_lambda_function" "seo_ogp_injector" {
  provider         = aws.us_east_1
  filename         = "${path.module}/../lambda/function.zip"
  function_name    = "${var.project_name}-seo-ogp-injector"
  role             = aws_iam_role.lambda_edge_role.arn
  handler          = "index.handler"
  runtime          = "nodejs20.x"
  publish          = true  # Required for Lambda@Edge
  timeout          = 5
  memory_size      = 128

  source_code_hash = filebase64sha256("${path.module}/../lambda/function.zip")

  tags = {
    Name        = "SEO OGP Injector"
    Environment = var.environment
  }
}

# CloudFront Distribution
resource "aws_cloudfront_distribution" "spa_distribution" {
  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"
  price_class         = var.cloudfront_price_class
  comment             = "${var.project_name} SPA with Lambda@Edge SEO"

  origin {
    domain_name = aws_s3_bucket.spa_bucket.bucket_regional_domain_name
    origin_id   = "S3-${var.s3_bucket_name}"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.spa_oai.cloudfront_access_identity_path
    }
  }

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "S3-${var.s3_bucket_name}"

    forwarded_values {
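      # Origin-facing triggers only see the real User-Agent when it is forwarded.
      # Forwarding also adds it to the cache key, which lowers the cache hit ratio.
      query_string = false
      headers      = ["User-Agent"]  # Used for bot detection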

      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 86400  # 24 hours
    max_ttl                = 604800 # 7 days
    compress               = true

    # Lambda@Edge association
    lambda_function_association {
      event_type   = "viewer-response"
      lambda_arn   = aws_lambda_function.seo_ogp_injector.qualified_arn
      include_body = false  # Only affects request events (exposes the request body)
    }
  }

  # SPA routing: 404 -> index.html
  custom_error_response {
    error_code         = 404
    response_code      = 200
    response_page_path = "/index.html"
  }

  custom_error_response {
    error_code         = 403
    response_code      = 200
    response_page_path = "/index.html"
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
    # For custom domain:
    # acm_certificate_arn      = var.acm_certificate_arn
    # ssl_support_method       = "sni-only"
    # minimum_protocol_version = "TLSv1.2_2021"
  }

  tags = {
    Name        = "${var.project_name} CloudFront"
    Environment = var.environment
  }
}

# terraform/variables.tf

variable "aws_region" {
  description = "AWS region for resources (CloudFront is global)"
  type        = string
  default     = "us-east-1"
}

variable "project_name" {
  description = "Project name for resource naming"
  type        = string
  default     = "spa-seo-lambda-edge"
}

variable "s3_bucket_name" {
  description = "S3 bucket name for SPA static files"
  type        = string
}

variable "environment" {
  description = "Environment (dev, staging, production)"
  type        = string
  default     = "production"
}

variable "cloudfront_price_class" {
  description = "CloudFront price class (PriceClass_100, PriceClass_200, PriceClass_All)"
  type        = string
  default     = "PriceClass_100" # US, Canada, Europe
}

# terraform/outputs.tf

output "cloudfront_domain_name" {
  value       = aws_cloudfront_distribution.spa_distribution.domain_name
  description = "CloudFront distribution domain name"
}

output "cloudfront_distribution_id" {
  value       = aws_cloudfront_distribution.spa_distribution.id
  description = "CloudFront distribution ID (for cache invalidation)"
}

output "s3_bucket_name" {
  value       = aws_s3_bucket.spa_bucket.id
  description = "S3 bucket name"
}

output "lambda_function_arn" {
  value       = aws_lambda_function.seo_ogp_injector.qualified_arn
  description = "Lambda@Edge function ARN with version"
}

output "cloudfront_url" {
  value       = "https://${aws_cloudfront_distribution.spa_distribution.domain_name}"
  description = "Full CloudFront URL"
}

Deployment Steps

# 1. Create terraform.tfvars
cat > terraform/terraform.tfvars << 'EOF'
s3_bucket_name = "my-unique-spa-bucket-name-12345"
project_name   = "my-spa-seo"
environment    = "production"
EOF

# 2. Initialize Terraform
cd terraform
terraform init

# 3. Plan deployment (review changes)
terraform plan

# 4. Apply infrastructure
terraform apply
# Review the plan, type 'yes' to proceed

# 5. Deploy SPA files to S3
# (Replace with your SPA build output directory)
aws s3 sync ../spa-dist s3://my-unique-spa-bucket-name-12345/ --delete

# 6. Invalidate CloudFront cache
DISTRIBUTION_ID=$(terraform output -raw cloudfront_distribution_id)
aws cloudfront create-invalidation \
  --distribution-id "$DISTRIBUTION_ID" \
  --paths "/*"

# 7. Get CloudFront URL
terraform output cloudfront_url

Important Notes

Lambda@Edge Deployment Time:

  • Lambda@Edge replication to all edge locations takes 15-30 minutes
  • During this time, some requests may not have the Lambda@Edge function available
  • Plan deployments accordingly (use blue/green or canary deployments for production)

Updating Lambda Function:

# After modifying lambda/index.js:

# 1. Re-package
cd lambda
zip -q function.zip index.js
cd ..

# 2. Apply Terraform
cd terraform
terraform apply

# 3. Wait for replication (15-30 minutes)

Cost Estimation:

# Use AWS Pricing Calculator
# https://calculator.aws/

# Typical monthly cost for 1M requests:
# - CloudFront: $0.75-1.00
# - Lambda@Edge: $0.40-0.60
# - S3: $0.10-0.20
# Total: ~$1.25-1.80/month

Testing & Validation

After deployment, validate the implementation using multiple methods.

1. Local cURL Testing

# Set your CloudFront domain
DOMAIN="d1234abcd5678.cloudfront.net"

# Test 1: Facebook bot (should inject meta tags)
# -s hides curl's progress meter so only the grep match is printed
curl -s -H "User-Agent: facebookexternalhit/1.1" \
  "https://$DOMAIN/" | grep -i "og:title"

# Expected output:
# <meta property="og:title" content="My SPA - Home">

# Test 2: Googlebot (should inject meta tags)
curl -s -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1)" \
  "https://$DOMAIN/blog/test-post" | grep -i "og:title"

# Test 3: Normal browser (should NOT inject)
curl -s -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0" \
  "https://$DOMAIN/" | grep -i "og:title"

# Expected: No og:title in output (original SPA shell)
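
Grepping for a single tag confirms injection happened but not that the full set is present. As a rough sketch, the helper below pulls every `<meta>` pair out of a response body and reports which expected keys are missing. `extractMetaTags`, `missingTags`, and the `REQUIRED` list are hypothetical names, and the regex assumes the attribute order and double quotes that `generateMetaTags` emits; it is not a general HTML parser.

```javascript
// Hypothetical helper: collect <meta property|name="..." content="..."> pairs.
// Assumes double-quoted attributes with property/name before content.
function extractMetaTags(html) {
  const tags = {};
  const pattern = /<meta\s+(?:property|name)="([^"]+)"\s+content="([^"]*)"\s*\/?>/gi;
  for (const match of html.matchAll(pattern)) {
    tags[match[1]] = match[2];
  }
  return tags;
}

// Keys a bot response is expected to carry, per the generator in this guide.
const REQUIRED = ['og:title', 'og:description', 'og:image', 'og:url', 'twitter:card'];

function missingTags(html) {
  const found = extractMetaTags(html);
  return REQUIRED.filter((key) => !found[key]);
}

const sampleHtml = [
  '<head>',
  '<meta property="og:title" content="My SPA - Home">',
  '<meta name="twitter:card" content="summary_large_image">',
  '</head>'
].join('\n');
console.log(missingTags(sampleHtml));

// Usage against a live distribution (Node 18+, which ships a global fetch):
//   const res = await fetch(`https://${domain}/`, { headers: { 'User-Agent': 'facebookexternalhit/1.1' } });
//   console.log(missingTags(await res.text()));
```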

2. Social Media Validators

Facebook Sharing Debugger:

  1. Visit https://developers.facebook.com/tools/debug/
  2. Enter your CloudFront URL: https://d1234abcd5678.cloudfront.net/blog/post
  3. Click "Scrape Again" to refresh cache
  4. Verify preview shows correct:
    • Title
    • Description
    • Image (1200x630 recommended)

Twitter/X Card Validator:

  1. Visit https://cards-dev.twitter.com/validator
  2. Enter your URL
  3. Verify the card is recognized; if the tool shows no preview, paste the URL into the post composer to see the rendered card
  4. Check that both og:image and twitter:image are present

LinkedIn Post Inspector:

  1. Visit https://www.linkedin.com/post-inspector/
  2. Enter your URL
  3. Inspect the preview
  4. Verify that the title, description, and image match the injected metadata

3. CloudWatch Logs Verification

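# View Lambda@Edge logs. They are written to the AWS region closest to the
# edge location that ran the function; the log group name keeps the
# "us-east-1." prefix in every region.

# US East Coast
aws logs tail /aws/lambda/us-east-1.my-spa-seo-seo-ogp-injector \
  --region us-east-1 \
  --follow

# Europe (same log group name, different region)
aws logs tail /aws/lambda/us-east-1.my-spa-seo-seo-ogp-injector \
  --region eu-west-1 \
  --follow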

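# CloudWatch Logs Insights query for bot detection stats
QUERY_ID=$(aws logs start-query \
  --region us-east-1 \
  --log-group-name /aws/lambda/us-east-1.my-spa-seo-seo-ogp-injector \
  --start-time $(( $(date +%s) - 3600 )) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, userAgent, path, isBot | filter isBot = true | stats count() by userAgent' \
  --query 'queryId' --output text)

# Queries run asynchronously; fetch the results once the status is Complete
aws logs get-query-results --region us-east-1 --query-id "$QUERY_ID"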
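
If you export log lines to a file instead, the same aggregation is easy to reproduce offline. The sketch below counts bot hits per user agent from lines that contain the JSON payload the handler logs. `countBotsByUserAgent` is a hypothetical helper, and the sample lines only illustrate the general shape of Lambda's Node.js log output.

```javascript
// Hypothetical offline counterpart to the Insights query: count bot hits per
// user agent from log lines that contain the handler's JSON payload.
function countBotsByUserAgent(lines) {
  const counts = new Map();
  for (const line of lines) {
    const start = line.indexOf('{');
    if (start === -1) continue; // START/END/REPORT lines carry no JSON
    let entry;
    try {
      entry = JSON.parse(line.slice(start));
    } catch (err) {
      continue; // partial or non-JSON message
    }
    if (entry.isBot !== true) continue;
    counts.set(entry.userAgent, (counts.get(entry.userAgent) || 0) + 1);
  }
  return counts;
}

// Illustrative sample, not real log output.
const sampleLines = [
  'START RequestId: 1a2b Version: 3',
  '2024-01-01T00:00:00.000Z\t1a2b\tINFO\t{"timestamp":1,"path":"/","userAgent":"facebookexternalhit/1.1","isBot":true}',
  '2024-01-01T00:00:01.000Z\t3c4d\tINFO\t{"timestamp":2,"path":"/blog/x","userAgent":"facebookexternalhit/1.1","isBot":true}',
  '2024-01-01T00:00:02.000Z\t5e6f\tINFO\t{"timestamp":3,"path":"/","userAgent":"Mozilla/5.0 Chrome/120.0","isBot":false}',
  'REPORT RequestId: 1a2b Duration: 9.12 ms'
];
console.log([...countBotsByUserAgent(sampleLines)]);
```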

4. Performance Testing

# Measure latency with ApacheBench
ab -n 100 -c 10 \
  -H "User-Agent: facebookexternalhit/1.1" \
  https://$DOMAIN/

# Key metrics to check:
# - Time per request: should be < 100ms (after cache warm-up)
# - Failed requests: should be 0
# - 95th percentile: should be < 150ms
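
ApacheBench prints a percentile table itself, but if you collect timings another way (for example from your own request loop), the 95th percentile is simple to compute. This is a minimal sketch using the nearest-rank definition; `percentile` is a hypothetical helper and the sample latencies are made up for illustration.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of the samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, non-mutating
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds, including two slow outliers.
const latenciesMs = [42, 38, 51, 47, 120, 44, 40, 39, 45, 160];
console.log('p50:', percentile(latenciesMs, 50));
console.log('p95:', percentile(latenciesMs, 95));
```

With only ten samples the 95th percentile is simply the slowest request, which is why a larger `-n` gives a more meaningful tail figure.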

5. Automated Testing Script

#!/bin/bash
# test-lambda-edge.sh

DOMAIN="$1"
if [ -z "$DOMAIN" ]; then
  echo "Usage: ./test-lambda-edge.sh <cloudfront-domain>"
  exit 1
fi

echo "Testing Lambda@Edge deployment on: $DOMAIN"
echo "============================================"

# Test bots (should have meta tags)
echo -e "\n1. Testing Facebook bot..."
RESULT=$(curl -s -H "User-Agent: facebookexternalhit/1.1" https://$DOMAIN/ | grep -c "og:title")
if [ "$RESULT" -gt 0 ]; then
  echo "✓ Facebook bot: Meta tags injected"
else
  echo "✗ Facebook bot: Meta tags NOT found"
fi

echo -e "\n2. Testing Googlebot..."
RESULT=$(curl -s -H "User-Agent: Googlebot/2.1" https://$DOMAIN/ | grep -c "og:title")
if [ "$RESULT" -gt 0 ]; then
  echo "✓ Googlebot: Meta tags injected"
else
  echo "✗ Googlebot: Meta tags NOT found"
fi

# Test human (should NOT have injected meta tags in SPA shell)
echo -e "\n3. Testing human browser..."
RESULT=$(curl -s -H "User-Agent: Mozilla/5.0 Chrome/120.0" https://$DOMAIN/ | grep -c "og:title")
if [ "$RESULT" -eq 0 ]; then
  echo "✓ Human browser: Original SPA shell returned"
else
  echo "⚠ Human browser: Unexpected meta tags found"
fi

echo -e "\n============================================"
echo "Testing complete!"

Run the test script:

chmod +x test-lambda-edge.sh
./test-lambda-edge.sh d1234abcd5678.cloudfront.net

Common Issues & Solutions

| Issue | Cause | Solution |
|---|---|---|
| Meta tags not appearing | User-Agent header not forwarded | Check CloudFront cache behavior forwarded_values.headers = ["User-Agent"] |
| Old content cached | CloudFront cache not invalidated | Run aws cloudfront create-invalidation --distribution-id XXX --paths "/*" |
| Lambda not executing | Function not published | Ensure publish = true in Terraform |
| 403 Access Denied | S3 bucket policy missing | Verify OAI has GetObject permission |
| Logs not visible | Looking in wrong region | Check CloudWatch Logs in edge region closest to you |

Conclusion

You've now implemented a production-grade solution for SPA SEO and social media sharing using AWS Lambda@Edge. This approach delivers:

Key Benefits:

  • Cost-effective: ~$1-2/month vs $99-249/month for pre-rendering services
  • Low-latency: 8-15ms execution time at the edge
  • Framework-agnostic: Works with any SPA (React, Vue, Angular, Svelte)
  • Non-invasive: No changes to existing SPA code
  • Production-ready: Complete with error handling and monitoring

When to Use This Approach:

  • ✅ Existing SPA that can't be easily refactored to SSR
  • ✅ Need SEO and social sharing improvements quickly
  • ✅ Cost-conscious (< $5/month for most sites)
  • ✅ Already using AWS CloudFront
  • ✅ Want framework-agnostic solution

When NOT to Use This:

  • ❌ Building new application → Use SSR framework (Next.js, Nuxt, SvelteKit)
  • ❌ Need real-time dynamic content in meta tags → Consider full SSR
  • ❌ Very complex metadata logic → May need origin-request trigger
  • ❌ Not on AWS → Look at Cloudflare Workers or Vercel Edge Functions

Next Steps

  1. Customize metadata:

    • Update METADATA_MAP in Lambda function
    • Add routes specific to your application
    • Configure fallback metadata
  2. Optimize for your use case:

    • Adjust CloudFront cache TTLs
    • Monitor bot traffic patterns
    • Fine-tune bot detection patterns
  3. Add custom domain (optional):

    # In terraform/main.tf viewer_certificate block:
    viewer_certificate {
      acm_certificate_arn      = "arn:aws:acm:us-east-1:ACCOUNT:certificate/ID"
      ssl_support_method       = "sni-only"
      minimum_protocol_version = "TLSv1.2_2021"
    }
    
    # Add aliases at the top level of aws_cloudfront_distribution (not inside viewer_certificate)
    aliases = ["www.example.com", "example.com"]
    
  4. Set up monitoring:

    • CloudWatch dashboard for Lambda invocations
    • Alerts for error rates > 1%
    • Cost monitoring with AWS Budgets
  5. Consider advanced features:

    • Multi-language support (og:locale)
    • A/B testing different meta tags
    • Dynamic image generation for OG images
    • Structured data (JSON-LD) injection
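
As a starting point for the structured-data idea, the sketch below builds a JSON-LD `<script>` block from the same metadata object used for the meta tags. `generateJsonLd` is a hypothetical helper, and the schema.org types and properties chosen here are assumptions to adapt to your content.

```javascript
// Hypothetical helper: build a JSON-LD <script> block from the metadata object.
// The schema.org types and properties used here are assumptions.
function generateJsonLd(metadata) {
  const data = {
    '@context': 'https://schema.org',
    '@type': metadata.type === 'article' ? 'Article' : 'WebPage',
    headline: metadata.title,
    description: metadata.description,
    image: metadata.image,
    url: metadata.url
  };
  // Escape "<" so a value containing "</script>" cannot close the element early.
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(generateJsonLd({
  type: 'article',
  title: 'Solving SPA SEO with Lambda@Edge',
  description: 'Learn how to fix SPA SEO using AWS Lambda@Edge',
  image: 'https://cdn.example.com/images/blog/lambda-edge.jpg',
  url: 'https://example.com/blog/lambda-edge-seo'
}));
```

The returned string could be appended to the output of `generateMetaTags` so that both land before `</head>` in one pass.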

Resources

Code & Examples:

  • GitHub Repository - Complete working implementation with Terraform, tests, and example SPA

Lambda@Edge offers an elegant solution for SPAs struggling with SEO and social sharing. By intercepting bot traffic at the edge and dynamically injecting meta tags, you get the best of both worlds: modern SPA architecture for users and SEO-friendly responses for crawlers—all without rewriting your application.

If you found this guide helpful, share it on social media to test your OGP implementation!