Webhook Delivery Tutorial · Module 08 of 10

Scaling & Multi-Tenant

One tenant's misbehaving webhook shouldn't tank the system. Add isolation, rate limits per customer, and resource fairness. By the end, one customer's failures won't starve others.

~5–7 hrs · Advanced · Isolation focus
What You'll Have at the End

Definition of Done

  • Tenant (customer) ID in the webhooks and events tables.
  • quotas table: max events/sec, max concurrent deliveries, max DLQ size per tenant.
  • POST /events rejects with 429 and a Retry-After header if the tenant's quota is exceeded.
  • GET /quotas/:tenant_id shows usage and limits.
  • Chaos test: one slow tenant doesn't block others' deliveries.
  • Resource isolation documented in design doc.
The Steps

Build It

STEP 1

Add tenant schema and quotas table

Update src/db/migrations.ts:

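-- Note: DEFAULT gen_random_uuid() gives every existing row its own random tenant.
-- Backfill real tenant IDs afterwards, and make sure inserts always supply tenant_id.
ALTER TABLE webhooks ADD COLUMN tenant_id UUID NOT NULL DEFAULT gen_random_uuid();
ALTER TABLE events ADD COLUMN tenant_id UUID NOT NULL DEFAULT gen_random_uuid();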

CREATE TABLE IF NOT EXISTS quotas (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL UNIQUE,
  max_events_per_sec INT DEFAULT 100,
  max_concurrent_deliveries INT DEFAULT 50,
  max_dlq_size INT DEFAULT 100,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
✓ Verify: Migrations run without errors.
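A tenant needs a row in quotas before Step 2 will accept its events (otherwise POST /events answers 404). The following is a minimal sketch of building a parameterized upsert for that row. The names buildQuotaUpsert and QuotaLimits are hypothetical, not part of the tutorial's codebase, and the returned object is only assumed to be passed to pool.query(text, values).

```typescript
// Hypothetical helper: builds a parameterized upsert for the quotas table.
interface QuotaLimits {
  max_events_per_sec: number;
  max_concurrent_deliveries: number;
  max_dlq_size: number;
}

// Mirrors the column defaults declared in the quotas migration.
const DEFAULT_LIMITS: QuotaLimits = {
  max_events_per_sec: 100,
  max_concurrent_deliveries: 50,
  max_dlq_size: 100,
};

function buildQuotaUpsert(
  tenantId: string,
  overrides: Partial<QuotaLimits> = {}
): { text: string; values: (string | number)[] } {
  const limits: QuotaLimits = { ...DEFAULT_LIMITS, ...overrides };

  // Reject zero, negative, or fractional limits before they reach the database.
  const checks: [string, number][] = [
    ['max_events_per_sec', limits.max_events_per_sec],
    ['max_concurrent_deliveries', limits.max_concurrent_deliveries],
    ['max_dlq_size', limits.max_dlq_size],
  ];
  for (const [name, value] of checks) {
    if (!Number.isInteger(value) || value <= 0) {
      throw new RangeError(`${name} must be a positive integer, got ${value}`);
    }
  }

  // ON CONFLICT relies on the UNIQUE constraint on quotas.tenant_id.
  const text = `
    INSERT INTO quotas (tenant_id, max_events_per_sec, max_concurrent_deliveries, max_dlq_size)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (tenant_id) DO UPDATE SET
      max_events_per_sec = EXCLUDED.max_events_per_sec,
      max_concurrent_deliveries = EXCLUDED.max_concurrent_deliveries,
      max_dlq_size = EXCLUDED.max_dlq_size,
      updated_at = NOW()`;

  return {
    text,
    values: [
      tenantId,
      limits.max_events_per_sec,
      limits.max_concurrent_deliveries,
      limits.max_dlq_size,
    ],
  };
}
```

Usage would look like `const q = buildQuotaUpsert(tenantId, { max_dlq_size: 500 }); await pool.query(q.text, q.values);`, assuming pool is the pg pool used elsewhere in the tutorial.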
STEP 2

Implement rate limiting in POST /events

Update src/routes/events.ts:

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

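// Shared Redis client used by every per-tenant rate limiter.
// @upstash/redis talks to the Upstash REST API and needs both a URL and a token;
// fromEnv() reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN.
const redis = Redis.fromEnv();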

router.post('/', async (req: Request, res: Response) => {
  const { type, payload, tenant_id } = req.body;

  // Get tenant quota
  const quotaResult = await pool.query(
    'SELECT * FROM quotas WHERE tenant_id = $1',
    [tenant_id]
  );

  if (quotaResult.rows.length === 0) {
    return res.status(404).json({ error: 'Tenant not found' });
  }

  const quota = quotaResult.rows[0];
  const ratelimit = new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(quota.max_events_per_sec, '1 s'),
    analytics: true,
    prefix: `tenant:${tenant_id}:events`
  });

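  const { success, limit, reset } = await ratelimit.limit(tenant_id);

  if (!success) {
    // reset is a Unix timestamp in milliseconds; Retry-After is in whole seconds.
    const retryAfter = Math.max(1, Math.ceil((reset - Date.now()) / 1000));
    res.set('Retry-After', String(retryAfter));
    return res.status(429).json({
      error: 'Rate limit exceeded',
      limit,
      remaining: 0,
      reset
    });
  }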

  // ... rest of event creation with tenant_id
✓ Verify: Exceeding quota returns 429 with Retry-After header.
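To make the limiter's behavior concrete, here is a simplified, in-memory sliding-window log. It is not how @upstash/ratelimit works internally and it is not a replacement for it (it is single-process and keeps one timestamp per request). It only illustrates counting requests in the trailing window and turning a reset timestamp into a Retry-After value. SlidingWindowLog and retryAfterSeconds are hypothetical names.

```typescript
// Illustrative only: single-process and in-memory. Not the Upstash algorithm.
class SlidingWindowLog {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // nowMs is passed in so the behavior is deterministic and easy to test.
  check(tenantId: string, nowMs: number): { success: boolean; remaining: number; reset: number } {
    const cutoff = nowMs - this.windowMs;
    // Keep only the hits that still fall inside the trailing window.
    const recent = (this.hits.get(tenantId) ?? []).filter((t) => t > cutoff);

    if (recent.length >= this.limit) {
      this.hits.set(tenantId, recent);
      // The next slot frees up when the oldest hit leaves the window.
      const oldest = recent[0] ?? nowMs;
      return { success: false, remaining: 0, reset: oldest + this.windowMs };
    }

    recent.push(nowMs);
    this.hits.set(tenantId, recent);
    return { success: true, remaining: this.limit - recent.length, reset: nowMs + this.windowMs };
  }
}

// Retry-After is expressed in whole seconds: round up and never return less than 1.
function retryAfterSeconds(resetMs: number, nowMs: number): number {
  return Math.max(1, Math.ceil((resetMs - nowMs) / 1000));
}
```

Each tenant gets its own log, so one tenant exhausting its window has no effect on another tenant's count.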
STEP 3

Implement GET /quotas endpoint

Create src/routes/quotas.ts:

import { Router, Request, Response } from 'express';
import pool from '../db/pool';

const router = Router();

// GET /quotas/:tenant_id
router.get('/:tenant_id', async (req: Request, res: Response) => {
  try {
    const quotaResult = await pool.query(
      'SELECT * FROM quotas WHERE tenant_id = $1',
      [req.params.tenant_id]
    );

    if (quotaResult.rows.length === 0) {
      return res.status(404).json({ error: 'Tenant quota not found' });
    }

    const quota = quotaResult.rows[0];

    // Get current usage. COUNT(DISTINCT ...) avoids double counting when an
    // event has several deliveries or DLQ rows.
    // Assumption: dead_letter_queue rows reference their event through an
    // event_id column; adjust the join to match your DLQ schema.
    const usageResult = await pool.query(`
      SELECT
        COUNT(DISTINCT e.id) AS events_sent,
        COUNT(DISTINCT d.id) AS concurrent_deliveries,
        COUNT(DISTINCT dlq.id) AS dlq_count
      FROM events e
      LEFT JOIN deliveries d ON d.event_id = e.id AND d.status = 'processing'
      LEFT JOIN dead_letter_queue dlq ON dlq.event_id = e.id
      WHERE e.tenant_id = $1
    `, [req.params.tenant_id]);

    const usage = usageResult.rows[0];

    res.json({
      tenant_id: quota.tenant_id,
      limits: {
        max_events_per_sec: quota.max_events_per_sec,
        max_concurrent_deliveries: quota.max_concurrent_deliveries,
        max_dlq_size: quota.max_dlq_size
      },
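      current_usage: {
        // events_sent is a lifetime total, not a per-second rate.
        events_sent: parseInt(usage.events_sent, 10),
        concurrent_deliveries: parseInt(usage.concurrent_deliveries, 10),
        dlq_count: parseInt(usage.dlq_count, 10)
      },
      remaining: {
        // No "events" entry: a lifetime count cannot be subtracted from a per-second limit.
        // The Redis limiter in POST /events is the source of truth for the current window.
        deliveries: Math.max(0, quota.max_concurrent_deliveries - parseInt(usage.concurrent_deliveries, 10)),
        dlq: Math.max(0, quota.max_dlq_size - parseInt(usage.dlq_count, 10))
      }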
    });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Failed to fetch quota' });
  }
});

export default router;
✓ Verify: curl http://localhost:3000/quotas/<tenant_id> (substituting a real tenant UUID) returns quota and usage.
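The "remaining" figures in the response are simply limit minus usage. A small sketch of that arithmetic as a pure function, clamped at zero so an over-limit tenant never reports a negative number, might look like this. computeRemaining and both interfaces are hypothetical names; the usage fields are typed as strings because pg returns COUNT results as strings. The sketch leaves out events, since a lifetime event count is not comparable to a per-second limit.

```typescript
// Hypothetical shapes matching the columns used by GET /quotas/:tenant_id.
interface DeliveryLimits {
  max_concurrent_deliveries: number;
  max_dlq_size: number;
}

interface UsageRow {
  concurrent_deliveries: string; // pg returns COUNT(...) as a string
  dlq_count: string;
}

function computeRemaining(
  limits: DeliveryLimits,
  usage: UsageRow
): { deliveries: number; dlq: number } {
  const inFlight = parseInt(usage.concurrent_deliveries, 10);
  const dlqCount = parseInt(usage.dlq_count, 10);
  if (Number.isNaN(inFlight) || Number.isNaN(dlqCount)) {
    throw new TypeError('usage counts must be numeric strings');
  }
  return {
    // Clamp at zero: a tenant that is over its limit has nothing remaining.
    deliveries: Math.max(0, limits.max_concurrent_deliveries - inFlight),
    dlq: Math.max(0, limits.max_dlq_size - dlqCount),
  };
}
```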
STEP 4

Test tenant isolation with chaos

Create src/__tests__/tenant-isolation.test.ts:

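import { randomUUID } from 'node:crypto';

// Hypothetical helpers: the tutorial does not define these. Implement them against
// your own app (register a webhook whose target sleeps, POST /events, poll deliveries).
import { registerWebhook, emitEvents, waitForDeliveries } from './helpers';

describe('Tenant Isolation', () => {
  test('slow webhook of tenant A does not block tenant B', async () => {
    // Create two tenants
    const tenantA = randomUUID();
    const tenantB = randomUUID();

    // Tenant A has a slow webhook (30s response time)
    // Tenant B has a fast webhook (100ms response time)
    await registerWebhook(tenantA, { responseDelayMs: 30_000 });
    await registerWebhook(tenantB, { responseDelayMs: 100 });

    // Emit 100 events for each tenant; A goes first so its deliveries are already in flight
    await emitEvents(tenantA, 100);
    const startB = Date.now();
    await emitEvents(tenantB, 100);

    // Measure the time for all of Tenant B's deliveries to complete
    await waitForDeliveries(tenantB, 100);
    const endB = Date.now();

    // Should be fast (not blocked by Tenant A's slow webhook)
    expect(endB - startB).toBeLessThan(5000);
  }, 60_000);
});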
✓ Verify: Tenant B's deliveries complete quickly even while Tenant A is slow.
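Rate limiting POST /events does not by itself keep one tenant's slow deliveries from occupying every worker slot; the delivery worker also has to respect max_concurrent_deliveries. One possible approach, sketched here for a single-process worker, is a round-robin scheduler with a per-tenant in-flight cap. TenantScheduler and Job are hypothetical names, and a real worker would read the cap from each tenant's quotas row rather than use one shared value.

```typescript
interface Job {
  tenantId: string;
  eventId: string;
}

// Hypothetical sketch: round-robin across tenants with a per-tenant in-flight cap.
class TenantScheduler {
  private queues = new Map<string, Job[]>();
  private inFlight = new Map<string, number>();
  private order: string[] = []; // tenants in first-seen order, used for round-robin
  private cursor = 0;
  private maxPerTenant: number;

  constructor(maxPerTenant: number) {
    this.maxPerTenant = maxPerTenant;
  }

  enqueue(job: Job): void {
    let queue = this.queues.get(job.tenantId);
    if (!queue) {
      queue = [];
      this.queues.set(job.tenantId, queue);
      this.order.push(job.tenantId);
    }
    queue.push(job);
  }

  // Returns the next job to start, or undefined when every tenant is
  // either out of queued work or already at its concurrency cap.
  next(): Job | undefined {
    for (let i = 0; i < this.order.length; i++) {
      const idx = (this.cursor + i) % this.order.length;
      const tenantId = this.order[idx];
      if (tenantId === undefined) continue;
      const queue = this.queues.get(tenantId) ?? [];
      const running = this.inFlight.get(tenantId) ?? 0;
      if (queue.length > 0 && running < this.maxPerTenant) {
        this.inFlight.set(tenantId, running + 1);
        this.cursor = (idx + 1) % this.order.length; // resume after this tenant
        return queue.shift();
      }
    }
    return undefined;
  }

  // Call when a delivery finishes (success or failure) to free the tenant's slot.
  complete(tenantId: string): void {
    const running = this.inFlight.get(tenantId) ?? 0;
    this.inFlight.set(tenantId, Math.max(0, running - 1));
  }
}
```

With this shape, a tenant whose deliveries never finish stays pinned at its cap while other tenants keep receiving slots as their own deliveries complete.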
STEP 5

Document resource isolation strategy

Update docs/design.md:

## Resource Isolation

### Per-Tenant Quotas
- Max events/sec: prevents one tenant from flooding the system
- Max concurrent deliveries: caps in-flight deliveries per tenant so one slow endpoint cannot occupy every worker
- Max DLQ size: prevents one tenant's failures from filling disk

### Implementation
- Rate limiting via Upstash (Redis-backed)
- Sliding window of 1 second, sized by each tenant's max_events_per_sec
- Returns 429 (Too Many Requests) when exceeded
- Retry-After header indicates when quota resets

### Monitoring
- Metrics per tenant_id: events/sec, delivery rate, error rate
- Alert if any tenant exceeds 80% of their quota
- Alert if any tenant's DLQ grows beyond 50% of limit
✓ Verify: Documentation is clear and committed.
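The two alert rules in the Monitoring section (80% of any quota, 50% of the DLQ limit) can be expressed as a small pure function. This is only a sketch: quotaAlerts, TenantUsage, and TenantLimits are hypothetical names, and how the usage numbers are collected (metrics pipeline, SQL, or Redis) is left as an assumption.

```typescript
interface TenantUsage {
  eventsPerSec: number;
  concurrentDeliveries: number;
  dlqCount: number;
}

interface TenantLimits {
  max_events_per_sec: number;
  max_concurrent_deliveries: number;
  max_dlq_size: number;
}

interface QuotaAlert {
  tenantId: string;
  metric: string;
  ratio: number; // usage divided by limit
}

const QUOTA_ALERT_THRESHOLD = 0.8; // "exceeds 80% of their quota"
const DLQ_ALERT_THRESHOLD = 0.5;   // "DLQ grows beyond 50% of limit"

function quotaAlerts(tenantId: string, usage: TenantUsage, limits: TenantLimits): QuotaAlert[] {
  const checks: { metric: string; used: number; limit: number; threshold: number }[] = [
    { metric: 'events_per_sec', used: usage.eventsPerSec, limit: limits.max_events_per_sec, threshold: QUOTA_ALERT_THRESHOLD },
    { metric: 'concurrent_deliveries', used: usage.concurrentDeliveries, limit: limits.max_concurrent_deliveries, threshold: QUOTA_ALERT_THRESHOLD },
    { metric: 'dlq_size', used: usage.dlqCount, limit: limits.max_dlq_size, threshold: DLQ_ALERT_THRESHOLD },
  ];

  const alerts: QuotaAlert[] = [];
  for (const c of checks) {
    if (c.limit <= 0) continue; // skip unset or invalid limits rather than divide by zero
    const ratio = c.used / c.limit;
    // Strictly greater than: being exactly at the threshold does not alert.
    if (ratio > c.threshold) {
      alerts.push({ tenantId, metric: c.metric, ratio });
    }
  }
  return alerts;
}
```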
STEP 6

Commit multi-tenant work

git add -A
git commit -m "feat: add multi-tenant support with rate limiting and isolation

- Tenant ID in webhooks and events
- Quotas table: max events/sec, concurrent deliveries, DLQ size per tenant
- Rate limiting via Ratelimit (Upstash Redis)
- GET /quotas endpoint shows usage vs limits
- POST /events returns 429 when tenant quota exceeded
- Isolation: one tenant's failures don't affect others
- Chaos test verifies isolation"
git push origin main
✓ Verify: git log --oneline shows your commit.
Next Steps

Ready for Module 09?

Your system now isolates tenants properly. Next, you'll harden security: HMAC signing, SSRF prevention, and threat modeling. Head to Module 09.