The 24-Hour Buffer: Architecture of an Ephemeral Webhook Stash
Webhooks are traditionally volatile: if your endpoint misses the push, the event is gone. FetchHook introduces a high-availability buffer designed for AI agents whose processes are short-lived and intermittent.
Persistence Lifecycle
1. Ingress: Webhook accepted (HTTP 202 in ~50ms)
2. Validation: Signature verified at edge
3. Storage: Encrypted in Firestore Subcollection
4. TTL: 24-Hour expiration timer starts
5. Egress: Agent pulls data via API
6. Purge: Data deleted after retrieval (or TTL)
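Putting the lifecycle together, here is a minimal sketch of both ends of the buffer using Python's `requests`. The ingress URL and payload shape are hypothetical placeholders (only the pull endpoint shown later on this page is documented):

```python
import requests

# Hypothetical ingress URL for illustration only; the pull endpoint below
# is the one documented on this page.
INGRESS_URL = "https://api.fetchhook.app/in/stash_abc123"
PULL_URL = "https://api.fetchhook.app/api/v1/stash_abc123"

# Steps 1-4: the provider's webhook is accepted with HTTP 202 and buffered;
# the 24-hour TTL clock starts at this point.
resp = requests.post(INGRESS_URL, json={"event": "push", "repo": "demo"})
assert resp.status_code == 202

# Step 5: any time within the 24-hour window, the agent pulls its events.
events = requests.get(
    PULL_URL, headers={"Authorization": "Bearer fh_xxx"}
).json().get("events", [])
print(f"Pulled {len(events)} buffered events")
```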
# Why an ephemeral buffer?
Persistent databases are for long-term records. Webhooks are 'work to be done.' By using a 24-hour ephemeral buffer, FetchHook ensures that your agents have a reliable window to retrieve their work without the privacy concerns or costs associated with long-term data storage. This design philosophy recognizes that webhooks are transient by nature: they represent events that need to be actioned, not archived.
# Why 24 hours specifically?
24 hours covers the vast majority of real-world agent workflows: (1) Cron jobs running every 15 minutes to 6 hours have plenty of margin, (2) Daily batch processing scripts have a full day to execute, (3) Agents can handle overnight downtime or weekend maintenance, (4) Developers have time to debug and restart failed scripts. If you need longer retention, you're likely building a database problem, not a webhook problem—and should persist events to your own storage after pulling.
Workflow Time Windows
| Workflow Type | Typical Frequency | Buffer Margin |
| ----------------- | ----------------- | ------------- |
| Real-time polling | 10-60 seconds | 1,440x safety |
| Cron jobs | 15 min - 6 hours | 96x-4x safety |
| Daily batch | 24 hours | 1x (tight) |
| Manual scripts | On-demand | 24h window |
✓ Covered: 99% of agent workflows
✗ Not suited for: Multi-day async processes
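The 'Buffer Margin' column is just the 24-hour window divided by the pull interval; a few lines of Python reproduce the figures (using each row's worst-case interval):

```python
# Margin = 24-hour buffer window / pull interval, per the table above.
BUFFER_MINUTES = 24 * 60

workflows = {
    "Real-time polling (60 s)": 1,
    "Cron job (15 min)": 15,
    "Cron job (6 h)": 360,
    "Daily batch (24 h)": 1440,
}

for name, interval_minutes in workflows.items():
    margin = BUFFER_MINUTES / interval_minutes
    print(f"{name}: {margin:,.0f}x margin")
# Real-time polling (60 s): 1,440x margin ... Daily batch (24 h): 1x margin
```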
# How does Firestore Subcollection isolation work?
Our architecture uses the 'Subcollection Anchor' pattern. Every tenant's data is stored in a physically isolated subcollection within Google Cloud Firestore. This ensures zero data leakage between users and allows for millisecond-latency point reads during the 'Pull' phase. The isolation is enforced at the authentication layer: your API key maps to your user ID, which maps to your subcollection. No cross-tenant queries are possible at the database level.
Data Isolation Model
Firestore Path Structure:
/users/{userId}/sources/{sourceId}/events/{eventId}
- Each user has isolated subcollections
- API keys are scoped to specific sourceId
- No shared indexes or cross-tenant queries
- Physical separation at storage layer
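As a sketch of what a tenant-scoped point read along this path could look like with the official `google-cloud-firestore` client (illustrative only; the collection names mirror the path above, but this is not FetchHook's actual code):

```python
from google.cloud import firestore

db = firestore.Client()

def read_event(user_id: str, source_id: str, event_id: str) -> dict | None:
    # The path is anchored under the authenticated user's document, so a
    # request scoped to one user_id can never address another tenant's data.
    snapshot = (
        db.collection("users").document(user_id)
        .collection("sources").document(source_id)
        .collection("events").document(event_id)
        .get()
    )
    return snapshot.to_dict() if snapshot.exists else None
```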
# What happens after 24 hours?
We enforce a strict Time-To-Live (TTL) policy. Once a webhook record exceeds its 24-hour window, the Firestore TTL policy automatically purges the document from the database. This ensures that your 'mailbox' remains clean and your data remains ephemeral. The TTL is calculated from the `created_at` timestamp, not from the last access time. This means if a webhook arrives at 10:00 AM Monday, it expires at 10:00 AM Tuesday regardless of whether you've read it.
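Note that Firestore's native TTL deletes documents based on a timestamp field holding the expiry time, so a buffer like this would typically store an explicit expiry alongside `created_at`. A sketch of the arithmetic, with the field name `expire_at` as an assumption:

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)

def expiry_for(created_at: datetime) -> datetime:
    # Firestore's TTL policy deletes a document once the timestamp in its
    # designated TTL field has passed; storing created_at + 24h in a field
    # such as `expire_at` (name assumed here) yields the fixed window.
    return created_at + TTL

created = datetime(2025, 1, 6, 10, 0, tzinfo=timezone.utc)  # Monday 10:00 AM
print(expiry_for(created))  # 2025-01-07 10:00:00+00:00 -> Tuesday 10:00 AM
```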
# Are events deleted immediately after I pull them?
By default, events are marked as consumed and removed from your next API response when you pull them. However, they remain in the database (subject to 24-hour TTL) for audit and replay purposes. If you need to re-process events, you can use query parameters to fetch previously consumed events within the TTL window. This provides a safety net for debugging and recovery scenarios.
Event Consumption Behavior
```bash
# Default: pull unconsumed events (marks them as consumed)
curl https://api.fetchhook.app/api/v1/stash_abc123 \
  -H "Authorization: Bearer fh_xxx"

# Advanced: pull including consumed events (for replay/debugging);
# the URL is quoted so the shell doesn't interpret the query string
curl "https://api.fetchhook.app/api/v1/stash_abc123?include_consumed=true" \
  -H "Authorization: Bearer fh_xxx"

# Events remain in the DB until the TTL expires,
# but won't appear in default pulls once consumed.
```
# How is webhook data encrypted?
All webhook payloads are encrypted at rest using Google Cloud's encryption infrastructure (AES-256). Data is encrypted before being written to Firestore and decrypted only when you pull via authenticated API request. Additionally, all API traffic is TLS 1.3 encrypted in transit. Your webhook data is never stored in plaintext, logged, or accessible to FetchHook employees.
# When should I persist events to my own database?
Persist to your own storage if: (1) You need audit logs beyond 24 hours, (2) Your workflow has multi-day async steps, (3) You're processing financial/compliance data requiring long-term retention, (4) You need to correlate webhook events with other business data. The pattern: Pull from FetchHook, process the event, then immediately save to your Postgres/MongoDB/etc. FetchHook is the ingress buffer, not the system of record.
Persist-After-Pull Pattern
```python
import json

import psycopg2
import requests

def pull_and_persist():
    # Pull from FetchHook buffer
    events = requests.get(
        "https://api.fetchhook.app/api/v1/stash_123",
        headers={"Authorization": "Bearer fh_xxx"},
    ).json().get("events", [])

    # Persist to your database for long-term storage
    conn = psycopg2.connect("postgresql://...")
    cursor = conn.cursor()
    for event in events:
        cursor.execute(
            """
            INSERT INTO webhook_events
                (event_id, provider, payload, received_at)
            VALUES (%s, %s, %s, %s)
            ON CONFLICT (event_id) DO NOTHING
            """,  # assumes a unique index on event_id, so replayed pulls stay idempotent
            (
                event["id"],
                event["provider"],
                json.dumps(event["payload"]),
                event["created_at"],
            ),
        )
    conn.commit()
    cursor.close()
    conn.close()
    print(f"Persisted {len(events)} events to database")
```