Ten modules that take you from an empty Git repo to a distributed, resilient webhook service: handling millions of events with reliable delivery, exponential backoff retries, dead-letter queues, and observability. This track exercises async systems, failure modes, and reliability patterns harder than most projects do.
Start with a clear design and a pleasant repo. Webhook delivery is complex; don't let tooling get in the way.
.editorconfig, .gitignore, linter/formatter).
webhooks (registrations), events (immutable log), deliveries (attempts and status).
git clone + one command spins up the server.
/docs/design.md with a sketch of the three main tables.

The contract: let customers register webhooks and emit events. Get this right before adding async machinery.
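To make the three tables and the register/emit contract concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names, the helper functions (`open_db`, `register_webhook`, `emit_event`, `matching_webhooks`), and the choice to store the event-type filter as a JSON list are all assumptions made for illustration; a real service would sit behind HTTP handlers and most likely use Postgres.

```python
import json
import sqlite3
import time
import uuid

# Hypothetical schema for the three main tables.
SCHEMA = """
CREATE TABLE webhooks (
    id TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    secret TEXT NOT NULL,
    event_types TEXT              -- JSON list of types; NULL means all events
);
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,
    timestamp REAL NOT NULL,
    payload TEXT NOT NULL         -- JSON; rows are inserted once, never updated
);
CREATE TABLE deliveries (
    id TEXT PRIMARY KEY,
    webhook_id TEXT NOT NULL REFERENCES webhooks(id),
    event_id TEXT NOT NULL REFERENCES events(id),
    status TEXT NOT NULL DEFAULT 'pending',
    attempts INTEGER NOT NULL DEFAULT 0
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def register_webhook(db, url, secret, event_types=None):
    """Roughly what POST /webhooks would do: store the registration."""
    webhook_id = str(uuid.uuid4())
    types_json = json.dumps(event_types) if event_types is not None else None
    db.execute("INSERT INTO webhooks VALUES (?, ?, ?, ?)",
               (webhook_id, url, secret, types_json))
    return webhook_id


def emit_event(db, event_type, payload, timestamp=None):
    """Roughly what POST /events would do: append to the immutable log."""
    event_id = str(uuid.uuid4())
    ts = time.time() if timestamp is None else timestamp
    db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
               (event_id, event_type, ts, json.dumps(payload)))
    return event_id


def matching_webhooks(db, event_type):
    """Return ids of webhooks whose filter accepts this event type."""
    rows = db.execute("SELECT id, event_types FROM webhooks").fetchall()
    return [wid for wid, types in rows
            if types is None or event_type in json.loads(types)]
```

Keeping the events table append-only means a delivery can always be replayed from the log, which is the property the later retry and dead-letter steps rely on.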
POST /webhooks to register a webhook: URL, secret (for signing), optional filter (event types).
POST /events to emit an event: type, timestamp, payload. Store immediately in the events table (immutable log).
order.created, user.updated).
curl -X POST /webhooks creates a webhook; GET /webhooks/:id returns it.
curl -X POST /events with an event appears in the events table.

The hard part: when you emit an event, every webhook must eventually get it, even if the customer's server is down for an hour.
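Surviving an hour-long outage usually comes down to the retry schedule. The sketch below shows one common choice, exponential backoff with full jitter, plus a small attempt loop. The function names, the defaults (1 second base, 1 hour cap, 8 attempts), and the injected `send` and `sleep` callables are assumptions for illustration, not part of the track's specification.

```python
import random


def backoff_delay(attempt, base=1.0, cap=3600.0, rng=random):
    """Seconds to wait after failed attempt number `attempt` (1-based).

    The ceiling doubles each attempt up to `cap`; the actual delay is drawn
    uniformly from [0, ceiling] ("full jitter") so retries do not synchronize.
    """
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return rng.uniform(0, ceiling)


def deliver_with_retries(send, delivery_id, max_attempts=8, sleep=None, rng=random):
    """Call send(delivery_id) until it returns True or attempts run out.

    Returns (status, attempts), where status is "delivered" or "failed".
    `send` and `sleep` are injected so the loop can be tested without a
    network or real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(delivery_id):
                return "delivered", attempt
        except Exception:
            pass  # a transport error counts as a failed attempt
        if attempt < max_attempts and sleep is not None:
            sleep(backoff_delay(attempt, rng=rng))
    return "failed", max_attempts
```

In a queue-backed worker you would normally re-enqueue the job with the computed delay rather than sleep inside the worker; the loop here only illustrates the schedule and the terminal failed state.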
POST /events, enqueue a delivery job for each matching webhook (use your job queue: BullMQ, Celery, SQS, etc.).
delivery_id in the request. Customers should deduplicate on this.
failed state.

Some webhooks will fail permanently. Don't lose them: analyze, alert, and allow recovery.
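As a rough picture of what "don't lose them" can mean in code, the following in-memory sketch parks exhausted deliveries, lists them page by page, and hands one back to the queue on request. The `DeadLetter` and `DeadLetterQueue` classes and the `enqueue` callback are hypothetical; a real implementation would back this with a database table and expose it over HTTP.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DeadLetter:
    """One delivery that exhausted its retries (hypothetical record shape)."""
    delivery_id: str
    webhook_id: str
    event: dict
    last_error: str
    timestamp: float = field(default_factory=time.time)


class DeadLetterQueue:
    """In-memory stand-in for a dead-letter table."""

    def __init__(self):
        self._items = {}  # delivery_id -> DeadLetter, in insertion order

    def add(self, entry):
        self._items[entry.delivery_id] = entry

    def list_page(self, page=1, per_page=50):
        """Return one page of failed deliveries, oldest first."""
        start = (page - 1) * per_page
        return list(self._items.values())[start:start + per_page]

    def retry(self, delivery_id, enqueue):
        """Move a delivery back to the work queue; False if it is unknown."""
        entry = self._items.pop(delivery_id, None)
        if entry is None:
            return False
        enqueue(entry.delivery_id)
        return True
```

Removing the entry before re-enqueueing keeps a delivery in exactly one place at a time; if the retry fails again, the worker adds it back with a fresh error.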
dead_letter_queue table: deliveries that failed all retries go here.
GET /dlq to list failed deliveries (paginated). Include: webhook_id, event, last_error, timestamp.
POST /dlq/:delivery_id/retry to allow manual retries from the DLQ.
POST /dlq/:delivery_id/retry moves it back to the queue and retries it.

Async systems are slippery. Test them hard: race conditions, timing, failure injection.
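Failure injection does not need a real network. One possible approach, sketched below, is a test double that stands in for a customer server: it fails a configurable number of calls and deduplicates on the delivery id, so a test can assert both that retries eventually succeed and that a redelivery causes no second side effect. `FlakyEndpoint` and the test function are invented for this example.

```python
class FlakyEndpoint:
    """Test double for a customer server: fails the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0
        self.processed = []   # payloads whose side effects were applied
        self._seen = set()    # delivery ids already handled

    def receive(self, delivery_id, payload):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("injected failure")
        if delivery_id in self._seen:
            return 200        # duplicate: acknowledge without reprocessing
        self._seen.add(delivery_id)
        self.processed.append(payload)
        return 200


def test_retry_then_duplicate_is_harmless():
    endpoint = FlakyEndpoint(failures=2)
    payload = {"type": "order.created"}
    attempts, status = 0, None
    while status != 200 and attempts < 5:
        attempts += 1
        try:
            status = endpoint.receive("d-1", payload)
        except ConnectionError:
            continue
    # Simulate a redelivery, as happens when an acknowledgement is lost.
    assert endpoint.receive("d-1", payload) == 200
    assert attempts == 3
    assert endpoint.processed == [payload]
```

The same double can be reused with different `failures` values to cover the always-failing case that should end in the dead-letter queue.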
Push throughput. Can you deliver 10k events/sec to 100k webhooks? Batching, parallelism, and backpressure are key.
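Parallelism and backpressure can be sketched with asyncio alone: a fixed pool of workers caps the number of in-flight deliveries, and a bounded queue makes the producer wait when the workers fall behind. The `deliver_all` function, its defaults, and the `send` coroutine are assumptions for illustration; the numbers that reach 10k events/sec depend on your stack and have to be measured.

```python
import asyncio


async def deliver_all(jobs, send, concurrency=100, queue_size=1000):
    """Run `send(job)` for every job with at most `concurrency` in flight.

    Returns a list of (job, result_or_exception) in completion order.
    """
    queue = asyncio.Queue(maxsize=queue_size)
    results = []

    async def worker():
        while True:
            job = await queue.get()
            try:
                results.append((job, await send(job)))
            except Exception as exc:
                results.append((job, exc))  # record the failure, keep the worker alive
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    for job in jobs:
        # put() waits while the queue is full: this is the backpressure.
        await queue.put(job)
    await queue.join()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

If the producer is an HTTP handler, a full queue is the point where you decide between making the caller wait and buffering the work somewhere durable.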
N delivery jobs (not one-by-one).
k6 or Locust: measure events/sec, p95 latency.
POST /events or buffer?
/docs/perf.

Async systems fail silently. Instrument heavily: logs, metrics, traces. Debugging without visibility is hopeless.
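A first layer of instrumentation might be one structured log line per delivery attempt plus a few in-process metrics. The sketch below assumes a hypothetical `DeliveryMetrics` class and field names; in production you would more likely export counters and histograms to a metrics system than keep latencies in a list.

```python
import json
import logging
from collections import Counter

log = logging.getLogger("webhooks.delivery")  # logger name is an assumption


class DeliveryMetrics:
    """Counts outcomes and keeps latencies for a percentile estimate."""

    def __init__(self):
        self.outcomes = Counter()
        self.latencies_ms = []

    def record(self, delivery_id, webhook_id, status, latency_ms, attempt):
        """Record one attempt and emit it as a single JSON log line."""
        self.outcomes[status] += 1
        self.latencies_ms.append(latency_ms)
        log.info(json.dumps({
            "delivery_id": delivery_id,
            "webhook_id": webhook_id,
            "status": status,
            "latency_ms": latency_ms,
            "attempt": attempt,
        }))

    def p95(self):
        """Nearest-rank 95th percentile of recorded latencies, or None."""
        if not self.latencies_ms:
            return None
        ordered = sorted(self.latencies_ms)
        rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
        return ordered[rank - 1]
```

Logging the delivery id on every attempt is what lets you follow a single delivery across retries when something goes wrong.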
One tenant's misbehaving webhook shouldn't tank the system. Add isolation, rate limits, and resource fairness.
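One way to keep tenants from starving each other is a token bucket per tenant. The sketch below returns the status and headers an endpoint could send, using 429 with a Retry-After header when the bucket is empty. The `TenantRateLimiter` class, the 202 success status, and the injected clock are assumptions for illustration; a multi-process deployment would need shared state such as Redis instead of a local dict.

```python
import math
import time


class TenantRateLimiter:
    """Token bucket per tenant: `rate_per_sec` refill, `burst` capacity."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock      # injectable so tests can control time
        self._buckets = {}      # tenant -> (tokens, last_refill_time)

    def check(self, tenant):
        """Return (status, headers) for one request from `tenant`."""
        now = self.clock()
        tokens, last = self._buckets.get(tenant, (float(self.burst), now))
        # Refill for the elapsed time, never above the burst capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[tenant] = (tokens - 1, now)
            return 202, {}
        self._buckets[tenant] = (tokens, now)
        wait_seconds = (1 - tokens) / self.rate
        return 429, {"Retry-After": str(math.ceil(wait_seconds))}
```

Because each tenant has its own bucket, a tenant that exhausts its allowance gets 429 responses while other tenants continue to be accepted.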
quotas table. Reject POST /events with 429 if over limit.
GET /quotas/:tenant shows current usage and limits.
Retry-After header.

You're posting to customer URLs with sensitive data. Harden it: HMAC signing, SSRF prevention, secrets management.
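Two of these hardening steps fit in a short sketch: HMAC-SHA256 signing of the request body in the `sha256=<hex>` form, and a basic SSRF guard that rejects URLs resolving to non-public addresses. The function names and the HTTPS-only rule are assumptions for illustration; a production guard should also pin the resolved address at connect time, since DNS answers can change between the check and the request.

```python
import hashlib
import hmac
import ipaddress
import socket
from urllib.parse import urlparse


def sign(secret: bytes, body: bytes) -> str:
    """Signature header value for `body`, in the form sha256=<hex digest>."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    """Customer-side check; compare_digest avoids timing side channels."""
    expected = sign(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))


def url_is_allowed(url, resolve=socket.getaddrinfo):
    """Reject non-HTTPS URLs and hosts that resolve to non-public addresses.

    `resolve` is injectable so the check can be tested without DNS.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443)
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    except (OSError, ValueError):
        return False  # unresolvable or unparsable: refuse rather than guess
    # Every resolved address must be globally routable (no loopback,
    # private ranges, or link-local such as cloud metadata endpoints).
    return bool(addresses) and all(ip.is_global for ip in addresses)
```

Signing the raw body bytes, not a re-serialized copy, matters because the customer has to reproduce exactly the same bytes to verify.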
X-Webhook-Signature: sha256=... in every POST. Customers verify with their secret.
X-Webhook-Signature header.

Ship it safely. Deploy, monitor, document, and be ready to debug production issues.
docker-compose.yml for local dev (app + Postgres + Redis + worker).
main deploys to production automatically.
/docs.