Learning Track · 1 of 1

Build a Production-Grade URL Shortener

Eleven modules that take you from an empty Git repo to a deployed, observable, secure service — used as a vehicle to practice every major area covered in the deep dives. Each module is a small, shippable increment with concrete acceptance criteria. Finish the track and you'll have a portfolio-worthy project and working knowledge across the stack.

Backend · API design · Databases · Caching · Auth · CI/CD · Observability · Security
How to Use This Track

Learning by Shipping

Ground rules

  • Build it small, build it real. Each module ends with something running — not just notes.
  • Pick a stack and stick with it. Node/TypeScript + Postgres + Redis is recommended; Python/FastAPI or Go are equally fine.
  • Commit often. Use trunk-based development with short-lived branches. Squash before merge.
  • Don't skip the boring modules. Testing, observability, and CI/CD are where most learners give up — and where most growth happens.
  • When stuck, read the linked deep dive. Each module points back to the relevant chapters of the tour.

The eleven modules

Module 01 · ~2–3 hrs

Foundations & Project Setup

Every good system starts with a one-page design doc and a repo that's pleasant to work in. Don't skip this — bad foundations compound.

Tasks

  • Create a Git repo. Configure .editorconfig, .gitignore, and a linter/formatter (ESLint+Prettier, Ruff+Black, etc.).
  • Pick a stack and scaffold a "hello world" HTTP server.
  • Adopt trunk-based development: protect main, require PRs, use short-lived branches.
  • Write a 1-page design doc: problem, goals, non-goals, sketch of the API, what success looks like.
  • Add a basic README.md: what it is, how to run it locally.

Acceptance criteria

  • After git clone, a single command starts the server.
  • Linter runs clean. Pre-commit hook (or PR check) blocks lint failures.
  • Design doc is committed under /docs and linked from the README.

Start the Module 01 tutorial →

Module 02 · ~4–6 hrs

Core API & Data Modeling

The smallest interesting slice: shorten a URL, redirect to it, delete it. Get the contract right before you add anything else.

Tasks

  • Design endpoints: POST /shorten, GET /:code (302 redirect), DELETE /:code.
  • Pick an ID strategy: random base62, hash-then-truncate, or counter+encode. Discuss collision & guessability tradeoffs in your design doc.
  • Model schema in Postgres: links(id, code, target_url, created_at, owner_id). Write a migration.
  • Validate input: URL format, max length, scheme allow-list (http/https only).
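  • Return proper HTTP status codes: 201 on create, 404 for an unknown code, 410 for a deleted one. Returning 410 means remembering that a code once existed, so deletes need a tombstone (for example a deleted_at column) rather than a hard row delete.

Acceptance criteria
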
  • curl -X POST with a URL returns a short code.
  • Hitting /:code redirects (302) to the original URL.
  • Bad input is rejected with a clear 4xx error and a useful message.
  • Migrations run forward and back cleanly.
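
A minimal sketch of two of the ID strategies above, counter+encode and random base62, written in Python for brevity (the logic ports directly to TypeScript or Go). The alphabet ordering, the function names, and the 7-character default are assumptions of this sketch, not requirements of the track.

```python
import secrets
import string

# 0-9, A-Z, a-z: 62 URL-safe characters. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer, e.g. a database sequence value."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62. Raises ValueError on empty or invalid input."""
    if not code:
        raise ValueError("code must not be empty")
    n = 0
    for ch in code:
        idx = ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid character: {ch!r}")
        n = n * BASE + idx
    return n


def random_code(length: int = 7) -> str:
    """Random code drawn from a CSPRNG. Hard to guess, but collisions are
    possible, so the insert still needs a UNIQUE constraint plus a retry."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Counter-based codes never collide but are trivially enumerable; random codes are hard to guess but can collide. Seven base62 characters give 62^7, roughly 3.5 trillion, possible codes.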

Start the Module 02 tutorial →

Module 03 · ~3–5 hrs

Testing Discipline

If your code isn't tested, future-you will be afraid to change it. Build the habit now while the codebase is small.

Tasks

  • Unit tests for the encoder: round-trips, edge cases, collisions.
  • Integration tests for the API: spin up Postgres in a test container, exercise full request/response.
  • Practice TDD on one new feature (e.g., custom alias / vanity codes): red → green → refactor.
  • Add a coverage threshold to CI (start at 70%, raise as you go).
  • Write one contract test that pins the public API shape.

Acceptance criteria

  • The unit suite runs in under 30s via npm test (or equivalent).
  • Integration tests are isolated — they don't share state across runs.
  • You can describe the testing pyramid and where each test sits on it.
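
One way the TDD exercise on vanity codes might end up after a few red → green cycles, sketched in Python with pytest-style test functions. The validation rule (3 to 32 characters, letters, digits, "-" and "_") and the reserved-word list are assumptions made for illustration; write your own rule down in the design doc first.

```python
import re

# Hypothetical rule for this sketch: 3-32 chars from letters, digits, '-', '_'.
ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")

# Path segments the API already uses, so an alias must not shadow them.
RESERVED = {"shorten", "stats"}


def is_valid_alias(alias: str) -> bool:
    """Return True if the alias matches the rule and is not a reserved word."""
    return bool(ALIAS_RE.fullmatch(alias)) and alias.lower() not in RESERVED


# Each test was written first, watched to fail, then made to pass.
def test_accepts_simple_alias():
    assert is_valid_alias("my-link_01")


def test_rejects_too_short_and_too_long():
    assert not is_valid_alias("ab")
    assert not is_valid_alias("x" * 33)


def test_rejects_path_and_whitespace_characters():
    for bad in ("a/b", "a b", "", "a?b=c"):
        assert not is_valid_alias(bad)


def test_rejects_reserved_words_case_insensitively():
    assert not is_valid_alias("shorten")
    assert not is_valid_alias("Stats")
```

These are unit tests at the base of the pyramid: no database, no HTTP, and fast enough to run on every save.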

Start the Module 03 tutorial →

Module 04 · ~3–4 hrs

Caching & Performance

Reads dominate a shortener — most requests are repeat hits to popular links. Caching is the right hammer.

Tasks

  • Add Redis. Cache-aside on reads: check Redis, fall back to Postgres on a miss, then populate the cache.
  • Choose a sensible TTL. Invalidate on update/delete.
  • Benchmark before vs after with k6 or Locust — measure p50, p95, p99.
  • Discuss in your design doc: cache stampede, thundering herd, negative caching (404s).
  • Add a small in-process LRU in front of Redis for the hottest keys (optional).

Acceptance criteria

  • p95 redirect latency under 20ms on a warm cache (local).
  • Killing Redis degrades performance but the service still serves traffic.
  • Load test report committed to /docs/perf.
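
The read path described above can be sketched as follows. This is an illustration under stated assumptions, not a Redis client: TTLCache is an in-memory stand-in for Redis GET, SET-with-expiry, and DEL, and the names resolve, load_from_db, and the TTL defaults are hypothetical.

```python
import time
from typing import Callable, Optional

NEGATIVE = ""  # sentinel cached for codes that do not exist (negative caching)


class TTLCache:
    """In-memory stand-in for Redis GET / SET with expiry / DEL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def resolve(code: str, cache, load_from_db: Callable[[str], Optional[str]],
            ttl: float = 300.0, negative_ttl: float = 30.0) -> Optional[str]:
    """Cache-aside lookup: cache first, database on a miss, then populate."""
    try:
        cached = cache.get(code)
    except Exception:  # in real code, catch the client's connection errors
        return load_from_db(code)  # cache is down: degrade, do not fail
    if cached is not None:
        return None if cached == NEGATIVE else cached
    target = load_from_db(code)
    try:
        if target is None:
            cache.set(code, NEGATIVE, negative_ttl)  # remember the 404 briefly
        else:
            cache.set(code, target, ttl)
    except Exception:
        pass  # failing to populate the cache must not fail the request
    return target
```

On update or delete, call cache.delete(code) after the database write commits. The short negative TTL keeps repeated lookups of unknown codes off Postgres without hiding a newly created link for long.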

Start the Module 04 tutorial →

Module 05 · ~5–7 hrs

Authentication & Authorization

Anonymous shorteners get abused fast. Add accounts, ownership, and basic rate limits.

Tasks

  • Implement signup/login. Hash passwords with bcrypt or argon2. Never store them in plaintext or with a fast, unsalted hash such as MD5 or SHA-1.
  • Issue JWTs (or signed sessions). Add an Authorization: Bearer middleware.
  • Per-user link ownership: only the owner can delete or update.
  • Rate limiting per user or per API key (token bucket via Redis).
  • Add an admin role for moderation actions.

Acceptance criteria

  • Unauthenticated POST /shorten is rejected (or capped to a strict anon quota).
  • User A cannot delete user B's links — verified by an integration test.
  • Hitting the rate limit returns a 429 with a Retry-After header.
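
The token bucket itself is small. The sketch below keeps state in process memory so it runs standalone; the track asks for Redis, where the same refill-and-take step has to execute atomically (for example in a Lua script). The class name, parameters, and the injectable clock are assumptions of this sketch.

```python
import math
import time
from typing import Callable, Tuple


class TokenBucket:
    """Single-process token bucket. One instance per user or API key."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = capacity  # start full so bursts up to capacity pass
        self._updated_at = clock()

    def try_acquire(self, cost: float = 1.0) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds).

        retry_after_seconds is 0 when allowed; otherwise it is the whole
        number of seconds until enough tokens have refilled, suitable for
        a Retry-After header on a 429 response.
        """
        now = self._clock()
        elapsed = now - self._updated_at
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_sec)
        self._updated_at = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True, 0
        deficit = cost - self._tokens
        return False, math.ceil(deficit / self.refill_per_sec)
```

A middleware would look up the caller's bucket, call try_acquire(), and on a refusal respond with 429 and the returned value in Retry-After.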

Start the Module 05 tutorial →

Module 06 · ~4–6 hrs

Analytics & Background Jobs

Don't block the redirect to write analytics. Push the work off the request path.

Tasks

  • On every redirect, enqueue a click event (BullMQ / Celery / SQS).
  • A worker consumes events, writes to a clicks table (or aggregates into rollups).
  • Expose GET /stats/:code with total clicks, clicks by day, and top referrers and user agents.
  • Handle worker failures: retries with exponential backoff, dead-letter queue.
  • Discuss in design doc: at-least-once vs exactly-once, idempotency.

Acceptance criteria

  • Redirect latency is unaffected by analytics (verified with load test).
  • Killing the worker doesn't lose events — they queue and drain on restart.
  • Stats endpoint returns coherent numbers under load.
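
Libraries such as BullMQ or Celery provide retries and dead-lettering for you, but it helps to see the moving parts once. The sketch below uses an in-memory deque in place of a real queue; backoff_delay, drain, the event shape (a dict with a unique "id"), and the defaults are all hypothetical names chosen for illustration.

```python
import random
import time
from collections import deque


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: bool = True, rng=random.random) -> float:
    """Delay before retrying after failed attempt number `attempt` (1-based).

    Exponential: base * 2**(attempt - 1), capped at `cap`. With jitter, the
    delay is scaled by a random factor in [0, 1) to spread out retries.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay * rng() if jitter else delay


def drain(queue: deque, handler, seen_ids: set, dead_letters: list,
          max_attempts: int = 3, sleep=time.sleep) -> None:
    """Process events with at-least-once semantics.

    seen_ids makes handling idempotent: a redelivered event is skipped.
    In production this would typically be a unique constraint on the
    event id in the clicks table rather than an in-memory set.
    """
    while queue:
        event = queue.popleft()
        if event["id"] in seen_ids:
            continue  # duplicate delivery, already handled
        attempts = event.get("attempts", 0) + 1
        try:
            handler(event)
        except Exception:
            failed = {**event, "attempts": attempts}
            if attempts >= max_attempts:
                dead_letters.append(failed)  # give up: park for inspection
            else:
                # A real queue would schedule a delayed retry instead of
                # blocking the worker loop like this.
                sleep(backoff_delay(attempts))
                queue.append(failed)
            continue
        seen_ids.add(event["id"])
```

Because a crash between handler(event) and recording the id causes a redelivery, the handler's write must itself be safe to repeat. That is the practical meaning of "at-least-once plus idempotency".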

Start the Module 06 tutorial →

Module 07 · ~5–7 hrs

Observability

You can't fix what you can't see. Instrument the service before you scale it.

Tasks

  • Structured logs (JSON) with correlation/trace IDs threaded through every request.
  • Metrics via Prometheus: request rate, error rate, latency histograms, queue depth, cache hit ratio.
  • Distributed tracing with OpenTelemetry — instrument HTTP, DB, Redis, and queue spans.
  • Build one Grafana dashboard with the four golden signals (latency, traffic, errors, saturation).
  • Define SLIs/SLOs: e.g., 99.9% of redirects under 50ms over 30 days. Compute the error budget.

Acceptance criteria

  • You can answer "what was p95 redirect latency at 14:32 UTC?" from your dashboard.
  • One trace shows the full path: HTTP → cache → DB → queue.
  • SLO doc committed to the repo with a clear error budget policy.
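
The error budget for the example SLO above (99.9% of redirects under 50ms over 30 days) is plain arithmetic on request counts. A small sketch, where the function name and the traffic figures in the usage note are assumptions:

```python
from typing import Tuple


def error_budget(slo_target: float, total_requests: int,
                 bad_requests: int) -> Tuple[float, float, float]:
    """Request-based error budget for one SLO window.

    slo_target is a fraction (0.999 for 99.9%). A "bad" request is one that
    violates the SLI, here a redirect slower than the latency threshold.
    Returns (allowed_bad, remaining, fraction_consumed).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    allowed = (1.0 - slo_target) * total_requests
    remaining = allowed - bad_requests
    consumed = bad_requests / allowed if allowed else float("inf")
    return allowed, remaining, consumed
```

With 10 million redirects in the 30-day window, a 99.9% target allows roughly 10,000 slow redirects. If 2,500 have already been slow, about a quarter of the budget is spent, and the error budget policy in your SLO doc should say what happens as that fraction approaches 1.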

Start the Module 07 tutorial →

Module 08 · ~4–6 hrs

Containerization & CI/CD

"It works on my machine" is a smell. Make build, test, and deploy boring and automatic.

Tasks

  • Write a multi-stage Dockerfile. Keep the runtime image small and rootless.
  • docker-compose.yml for local dev (app + Postgres + Redis + worker).
  • GitHub Actions pipeline: lint → typecheck → test → build image → push → deploy preview.
  • Deploy to a real host: Fly.io, Render, Railway, a small VM, or Kubernetes if you're feeling brave.
  • Set up blue/green or rolling deploys with health checks and automatic rollback.

Acceptance criteria

  • Every PR gets a preview URL automatically.
  • Merging to main deploys to production with zero downtime.
  • A bad deploy auto-rolls back and pages no one (because the health check caught it).

Start the Module 08 tutorial →

Module 09 · ~4–5 hrs

Security Hardening

A shortener is a surprisingly juicy attack surface. Threat-model it before someone else does.

Tasks

  • Prevent open redirect abuse: scheme allow-list, deny-list of known-bad domains, optional URL preview page.
  • Prevent SSRF if you fetch the target (e.g., for previews or page titles): block private IPs, cloud metadata endpoints, and the file: scheme.
  • Add security headers: HSTS, CSP, X-Content-Type-Options, Referrer-Policy.
  • Move secrets to a manager (Vault, AWS Secrets Manager, Doppler, or GitHub Actions secrets) — never commit .env.
  • Enable dependency scanning (Dependabot, Snyk) and a SAST tool (CodeQL, Semgrep).
  • Write a 1-page threat model: assets, actors, attack vectors, mitigations.

Acceptance criteria

  • An automated test confirms javascript:, file:, and private-IP targets are rejected.
  • curl -I on the public URL shows expected security headers.
  • Threat model committed and linked from the README.
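
The checks that the first acceptance criterion exercises can be sketched with Python's standard library. The names is_safe_target, is_public_ip, and resolve_host are hypothetical, and resolving DNS at validation time is a design assumption of this sketch. It does not by itself stop DNS rebinding: if you later fetch the target, re-check the address you actually connect to.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_host(host: str) -> list:
    """Resolve a hostname to IP address strings via the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_ip(addr: str) -> bool:
    """True only for addresses that are not private, loopback, link-local
    (which covers 169.254.169.254 metadata endpoints), multicast,
    reserved, or unspecified."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # unparseable, e.g. an IPv6 scope id on older Pythons
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:10.0.0.1 by its IPv4 address
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def is_safe_target(url: str, resolve=resolve_host) -> bool:
    """Scheme allow-list plus a public-address check on every resolved IP."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False  # rejects javascript:, file:, data:, and so on
    host = parts.hostname
    if not host:
        return False
    try:
        addresses = [str(ipaddress.ip_address(host))]  # literal IP: no DNS
    except ValueError:
        try:
            addresses = resolve(host)
        except OSError:
            return False  # unresolvable hosts are rejected
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

In an automated test, pass a fake resolve function so the suite never depends on real DNS, for example is_safe_target("https://internal.example/", resolve=lambda host: ["10.1.2.3"]).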

Start the Module 09 tutorial →

Module 10 · ~5–7 hrs

Scaling & Reliability

Now make it survive failure. The interesting questions aren't "how fast?" but "how does it degrade?"

Tasks

  • Add a Postgres read replica. Route reads to the replica, writes to the primary.
  • Run two app instances behind a load balancer. Make sure sessions/state are externalized.
  • Chaos drills: kill Redis, kill a replica, partition the network. Observe and document the blast radius.
  • Write a runbook for the top three on-call scenarios (DB down, queue backed up, cache cold).
  • Write a postmortem for one of your chaos drills using a blameless template.
  • Discuss in your design doc: where consistency matters and where eventual is fine.

Acceptance criteria

  • Stopping one app instance causes zero failed requests: the instance finishes in-flight requests on SIGTERM while the load balancer drains it.
  • Redis outage → cache misses but service stays up; alarm fires within 1 minute.
  • Runbook + postmortem committed under /docs/ops.

Start the Module 10 tutorial →

Module 11 · ~3–4 hrs

Capstone & Review

The work isn't done until someone else can understand it. Polish, document, demo.

Tasks

  • Run a self code review with fresh eyes. Look for dead code, weak names, missing tests, leaky abstractions.
  • Run a self security review against your threat model. Patch anything you find.
  • Polish the README: what it is, why, architecture diagram, how to run, how to contribute, license.
  • Add an ADR log (/docs/adr) with the 5–10 biggest decisions you made and why.
  • Record a 5-minute demo walking through the architecture, a feature, and the dashboards.
  • Open-source it. Pick a license intentionally.

Acceptance criteria

  • A friend can clone the repo, follow the README, and have it running in under 10 minutes.
  • The architecture diagram matches what's actually deployed.
  • Demo is recorded and linked from the README.

Start the Module 11 tutorial →

After the Track

Where to Go Next

  • Stretch goals on the same project: custom domains, link expiration, QR codes, fraud/abuse detection, an admin dashboard, a CLI client, a mobile app.
  • Branch into systems design. Read Designing Data-Intensive Applications by Kleppmann; sketch how you'd scale your shortener to 100B links.
  • Try a harder track. Webhook delivery service, feature-flag service, or a real-time collaborative tool — each pushes one dimension (async, distribution, real-time) much harder.
  • Write about it. One blog post per module is a great forcing function for understanding.