Why Webhooks Fail in Serverless (and how to fix it)

Serverless architectures (Lambda, Cloud Run, Vercel Functions) are a great fit for many workloads, but they are poorly suited to 'push' webhooks: cold starts and strict provider timeouts lead to lost events and frustrated developers.

The Serverless Problem: Timeout Error

text
# Provider (Stripe/GitHub) waits 10s max
# Your Lambda cold start: 2-15s
# Your function init + processing: 3-8s
# Total: 5-23 seconds
# Result: 504 Gateway Timeout = Lost Webhook

#Why do cold starts kill webhooks?

Most webhook providers (Stripe, GitHub, Shopify) enforce strict timeout windows, usually between 5 and 10 seconds. If your serverless function is 'cold,' the time taken to provision infrastructure and initialize your runtime often exceeds this window. The provider sees a timeout, marks the delivery as failed, and starts its retry backoff. Cold starts range from around a second (lightweight Node.js or Python functions) to 15+ seconds (Java/.NET runtimes, large dependency trees). By the time your function is ready to respond, the provider has already given up.
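You can observe this yourself from module scope: top-level code runs once per container, so state captured there distinguishes cold from warm invocations. A minimal sketch, assuming a Node.js Lambda (the `isColdStart` helper is illustrative, not a platform API):

```javascript
// Module scope runs once per container, at cold start.
// Warm invocations reuse this state, so a counter here
// tells the first (cold) call apart from later ones.
let invocationCount = 0;

function isColdStart() {
  invocationCount += 1;
  return invocationCount === 1; // true only on the first invocation
}

// Inside your handler you might log it:
// console.log(isColdStart() ? 'COLD start' : 'warm invocation');
```

Logging this over a day makes the "consistently cold during low traffic" pattern below easy to confirm in your own metrics.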

#What is the retry death spiral?

When a webhook times out, providers retry with exponential backoff (after 1 minute, 5 minutes, 30 minutes, etc.). If your function is consistently cold, every retry times out too. This creates a death spiral: (1) a webhook arrives, the Lambda cold start times out; (2) a retry arrives 1 minute later, the function is cold again, it times out again; (3) after 3-5 failed attempts, the provider disables your webhook endpoint; (4) you lose all future events until you manually re-enable the endpoint. This is especially common during low-traffic periods, when functions stay cold.

The Death Spiral Timeline

text
10:00 AM - Webhook arrives, Lambda cold (8s startup)
10:00:08 - Lambda ready, but Stripe already timed out @ 10s
10:01:00 - Retry #1, Lambda cold again (8s startup), timeout
10:06:00 - Retry #2, Lambda cold again (8s startup), timeout
10:36:00 - Retry #3, Lambda cold again (8s startup), timeout
11:36:00 - Retry #4, Lambda cold again, timeout
12:00:00 - Stripe disables endpoint: "Too many failures"
Result: All future webhooks lost until manual re-enable

#How does the 'Pull' pattern solve this?

By introducing FetchHook as a buffer, webhook delivery becomes asynchronous. FetchHook (highly available, always 'hot') accepts the webhook in <100ms and returns HTTP 202 immediately. The provider sees success, so no retries are triggered. Your serverless function can then 'pull' the data on its own schedule: when it's warm, when it has capacity, when it's cost-effective. Even better, you can use a cloud scheduler to batch-process webhooks, completely avoiding the cold-start pressure of real-time delivery.

The Solution: Batch Processing in Serverless

javascript
// AWS Lambda triggered by EventBridge every 5 minutes
// AWS Lambda triggered by EventBridge every 5 minutes
export const handler = async () => {
  const response = await fetch(
    'https://api.fetchhook.app/api/v1/stash_stripe',
    { headers: { Authorization: 'Bearer fh_live_xxx' } }
  );

  if (!response.ok) {
    throw new Error(`FetchHook pull failed: ${response.status}`);
  }

  const { events } = await response.json();

  // Nothing accumulated? Exit early, no wasted work
  if (events.length === 0) return;

  // Process all accumulated events in batch
  // Cold start only happens once for the entire batch
  for (const event of events) {
    await processStripeEvent(event.payload);
  }

  console.log(`Processed ${events.length} events in batch`);
};

// No cold start failures
// No timeouts
// No retry spirals

#Is this more expensive than direct webhooks?

Usually it's cheaper, not more expensive. Instead of spinning up a Lambda for every single incoming webhook (which can be thousands during a burst), you spin up one Lambda every 5 minutes to process the entire batch. This dramatically reduces: (1) total Lambda invocations (1 per 5 minutes vs. 1 per webhook), (2) cold start costs (paid once per batch vs. once per webhook), (3) total execution time (batch processing is more efficient). For 1,000 webhooks/day, that's 1,000 Lambda invocations vs. 288 batch invocations, a 71% reduction.

Cost Comparison (AWS Lambda)

text
Scenario: 1,000 Stripe webhooks per day

Direct Push Webhooks:
- Invocations: 1,000/day
- Cold starts: ~500/day (assume 50% cold)
- Avg execution: ~1s (avg of 1.5s cold, 0.5s warm at 50% cold)
- Monthly cost: ~$5-10

FetchHook Pull (5 min batches):
- Invocations: 288/day (every 5 min)
- Cold starts: ~50/day (17% cold, better caching)
- Avg execution: 2s (batch of ~3.5 events)
- Monthly cost: ~$1-2

Savings: 50-80% on compute costs
Bonus: Zero timeout failures, zero lost events
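The invocation arithmetic behind the comparison is easy to check; a quick sketch using the numbers from the scenario above (counts from the example, not measured data):

```javascript
// Invocation counts for the 1,000 webhooks/day scenario
const webhooksPerDay = 1000;

const directInvocations = webhooksPerDay;   // one Lambda per webhook
const batchInvocations = (24 * 60) / 5;     // one Lambda per 5 minutes

const reduction = 1 - batchInvocations / directInvocations;
console.log(batchInvocations, `${Math.round(reduction * 100)}% fewer invocations`);
```

Compute cost savings will be smaller than the raw invocation reduction, since batch runs execute longer, which is why the table above lands at 50-80% rather than 71%.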

#How do I set up scheduled pulls in different platforms?

Every serverless platform has a scheduler. AWS has EventBridge Rules, Google Cloud has Cloud Scheduler, Vercel has Cron Jobs, Azure has Timer Triggers. Configure your function to run every 1-15 minutes (depending on latency requirements), pull from FetchHook, process the batch, and exit. The scheduler keeps your function warm enough to avoid worst-case cold starts, and batch processing amortizes costs.

AWS EventBridge Cron Setup

yaml
# serverless.yml (Serverless Framework)
functions:
  processWebhooks:
    handler: src/webhooks.handler
    events:
      - schedule:
          rate: rate(5 minutes)
          enabled: true
    environment:
      FETCHHOOK_API_KEY: ${env:FETCHHOOK_API_KEY}

# CloudFormation / SAM template
ProcessWebhooksFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: index.handler
    Runtime: nodejs20.x
    Events:
      ScheduledEvent:
        Type: Schedule
        Properties:
          Schedule: 'rate(5 minutes)'

Google Cloud Scheduler Setup

bash
# Create Cloud Scheduler job to trigger Cloud Run
gcloud scheduler jobs create http webhook-processor \
  --schedule="*/5 * * * *" \
  --uri="https://your-service.run.app/process" \
  --http-method=POST \
  --headers="Authorization=Bearer fh_live_xxx"

# Cloud Run service pulls from FetchHook
# No direct webhook endpoint exposed

#What about Vercel Edge Functions and Cloudflare Workers?

Edge functions (Vercel Edge, Cloudflare Workers) have near-zero cold starts (<1ms), so they can handle push webhooks. However, they have strict execution limits (roughly 50ms of CPU time on Cloudflare Workers; a 25-second response window on Vercel Edge) and can't run heavy processing or LLM calls. The pull pattern still makes sense: use FetchHook to buffer webhooks, then pull from a regular serverless function (Node.js runtime) or background job that has time for real processing. Edge functions are great for ingress, terrible for webhook processing.

Hybrid Pattern: Edge Ingress + Serverless Processing

javascript
// Option 1: Use FetchHook for both ingress and pull
// (Recommended - simplest)

// Option 2: Edge function ingress, FetchHook storage, Lambda pull
// Vercel Edge Function (optional optimization)
export const config = { runtime: 'edge' };

export default async function handler(req) {
  // Fast ingress: read the raw body and forward it (with the
  // original headers, including provider signatures) to FetchHook
  const body = await req.text();
  await fetch('https://api.fetchhook.app/in/stash_stripe', {
    method: 'POST',
    body,
    headers: req.headers
  });

  return new Response('OK', { status: 202 });
}

// Then your Lambda pulls and processes slowly
// Best of both: fast ingress, slow processing

Agent Protocol Instruction

For serverless deployments, the optimal integration pattern is to trigger your function via a cloud scheduler (e.g., Google Cloud Scheduler, AWS EventBridge, Vercel Cron) every 1-15 minutes. Pull from FetchHook in batches to amortize cold start costs. Check if events.length > 0 before processing to avoid wasted executions. This pattern reduces costs, eliminates timeout failures, and ensures zero event loss.
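A hedged sketch of that loop with the empty-batch guard made explicit (`fetchBatch` and `processEvent` are illustrative stand-ins for your FetchHook client and your own event handler, not a published API):

```javascript
// Scheduled pull: fetch a batch, exit early if it's empty,
// otherwise process every event and report the count.
async function runScheduledPull(fetchBatch, processEvent) {
  const events = await fetchBatch();
  if (!events || events.length === 0) return 0; // avoid wasted executions
  for (const event of events) {
    await processEvent(event);
  }
  return events.length;
}
```

Wire `runScheduledPull` into whichever scheduler trigger your platform provides (EventBridge, Cloud Scheduler, Vercel Cron), and keep the per-event work inside `processEvent` so a single failure can be retried without re-pulling the batch.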