Learning Track · 3 of 3

Build a Production-Grade Feature-Flag Service

Ten modules that take you from an empty Git repo to a distributed feature-flag platform — serving rules to thousands of clients, supporting real-time updates, progressive rollouts, A/B testing, and analytics. This track focuses on rule evaluation, SDK design, caching, and the operational challenges of feature control at scale.

Rule engines · SDK design · Real-time distribution · Progressive rollouts · Analytics
How to Use This Track

Learning by Shipping Feature Control Systems

Ground rules

  • Think like an SDK user. Every decision you make affects thousands of client apps. Prioritize simplicity and backwards compatibility.
  • Performance matters. Flag evaluation is on the hot path. Caching, pre-computing, and efficient algorithms are non-negotiable.
  • Embrace real-time distribution. WebSockets, Server-Sent Events, or polling: pick one and build it properly. Flag changes can't take hours to propagate.
  • Pick a stack and stick with it. Node/TypeScript, Python/FastAPI, or Go are all good choices.
  • Test the rule engine extensively. Off-by-one errors in targeting can segment your user base wrong.

The ten modules

Module 01 · ~2–3 hrs

Foundations & Project Setup

Start with a clear design. Feature flags look simple; getting rule evaluation right is not. Get the model right from the start.

Tasks

  • Create a Git repo with standard config (.editorconfig, .gitignore, linter/formatter).
  • Scaffold an HTTP server in your chosen language.
  • Write a one-page design doc: problem, goals, flag model (simple boolean? rules? segments?), API sketch, evaluation performance targets.
  • Create database schema: flags (metadata), rules (targeting), segments (user groups).
  • Document your rule evaluation strategy in the design doc.
Acceptance criteria
  • git clone + one command spins up the server.
  • Design doc committed to /docs/design.md with schema sketch.
  • Linter passes. Pre-commit hook blocks failures.
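
One way the flags / rules / segments schema from the task list could be sketched is shown below. This is only an illustration: every table and column name is an assumption, conditions are stored as JSON text for brevity, and SQLite is used so the snippet runs without a database server. The same shape should carry over to Postgres with minor type changes.

```python
import sqlite3

# Hypothetical schema sketch; table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE flags (
    key         TEXT PRIMARY KEY,
    type        TEXT NOT NULL CHECK (type IN ('boolean', 'string')),
    enabled     INTEGER NOT NULL DEFAULT 0,
    variations  TEXT NOT NULL,          -- JSON array of allowed values
    description TEXT NOT NULL DEFAULT ''
);
CREATE TABLE segments (
    key        TEXT PRIMARY KEY,
    conditions TEXT NOT NULL            -- JSON-encoded membership conditions
);
CREATE TABLE rules (
    id         INTEGER PRIMARY KEY,
    flag_key   TEXT NOT NULL REFERENCES flags(key) ON DELETE CASCADE,
    priority   INTEGER NOT NULL,        -- lower number is evaluated first
    conditions TEXT NOT NULL,           -- JSON-encoded condition tree
    variation  TEXT NOT NULL,           -- value served when the rule matches
    UNIQUE (flag_key, priority)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables in an empty database."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # ['flags', 'rules', 'segments']
```

The UNIQUE (flag_key, priority) constraint is one possible way to make rule precedence explicit in the data model instead of relying on insertion order.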
Module 02 · ~4–6 hrs

Core API & Data Model

The contract: manage flags and evaluate them. Start simple — a flag has a name, enabled state, and variations.

Tasks

  • Implement POST /flags to create a flag: name, type (boolean/string), enabled, variations, description.
  • Implement POST /evaluate to evaluate flags: user context (ID, attributes), flag key. Return the value/variation.
  • Implement GET /flags/:key, PATCH /flags/:key, DELETE /flags/:key.
  • Validate input: flag names, context structure, variation format.
  • Document the user context schema with examples (user ID, email, custom attributes).
Acceptance criteria
  • curl -X POST /flags creates a flag; GET /flags/:key returns it.
  • curl -X POST /evaluate with a user context returns a flag value.
  • Updating a flag is reflected immediately in evaluations.
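
The validation task above might look something like the following sketch. The key pattern, the "at least two variations" rule, and the context field names user_id and attributes are assumptions made for illustration, not requirements of the track.

```python
import re
from dataclasses import dataclass

# Assumed naming rule: lowercase letters, digits, '-' and '_', at most 64 chars.
KEY_PATTERN = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")


@dataclass
class Flag:
    key: str
    type: str          # "boolean" or "string"
    enabled: bool
    variations: list
    description: str = ""


def validate_flag(payload: dict) -> Flag:
    """Return a Flag, or raise ValueError describing the first problem found."""
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")
    key = payload.get("name")
    if not isinstance(key, str) or not KEY_PATTERN.match(key):
        raise ValueError("name must use lowercase letters, digits, '-' or '_'")
    flag_type = payload.get("type")
    if flag_type not in ("boolean", "string"):
        raise ValueError("type must be 'boolean' or 'string'")
    variations = payload.get("variations")
    if not isinstance(variations, list) or len(variations) < 2:
        raise ValueError("variations must list at least two values")
    expected = bool if flag_type == "boolean" else str
    if not all(isinstance(v, expected) for v in variations):
        raise ValueError(f"every variation must be a {expected.__name__}")
    enabled = payload.get("enabled", False)
    if not isinstance(enabled, bool):
        raise ValueError("enabled must be true or false")
    return Flag(key, flag_type, enabled, variations, payload.get("description", ""))


def validate_context(context: dict) -> dict:
    """Check the user context shape: a user ID plus optional custom attributes."""
    if not isinstance(context, dict):
        raise ValueError("context must be a JSON object")
    user_id = context.get("user_id")
    if not isinstance(user_id, str) or not user_id:
        raise ValueError("context.user_id must be a non-empty string")
    attributes = context.get("attributes", {})
    if not isinstance(attributes, dict) or not all(isinstance(k, str) for k in attributes):
        raise ValueError("context.attributes must be an object with string keys")
    return {"user_id": user_id, "attributes": attributes}


flag = validate_flag(
    {"name": "new-checkout", "type": "boolean", "enabled": True, "variations": [True, False]}
)
context = validate_context(
    {"user_id": "user-123", "attributes": {"email": "a@example.com", "plan": "pro"}}
)
```

In an HTTP handler, a ValueError from either function would typically map to a 400 response carrying the message.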
Module 03 · ~5–7 hrs

Rule Evaluation Engine

The core: evaluate complex rules (if-then logic, segments, attributes) to decide flag variations.

Tasks

  • Implement a rule model: conditions (AND, OR), operators (equals, in, contains, regex), actions (target variation).
  • Implement segment support: define a segment by rules (e.g., "users in US and signed up this month"), use segments in flag rules.
  • Implement percentage-based targeting: send variation B to X% of users (deterministic by user ID hash).
  • Evaluate flags deterministically: same user + flag = same result every time (for percentages, hash the flag key and user ID into a stable bucket rather than drawing a random number).
  • Discuss in design doc: rule precedence, conflict resolution, fall-back behavior.
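
A minimal sketch of the rule model described in the tasks above follows. The dictionary shapes ("all" for AND, "any" for OR, "segment" for a segment reference, plus the fallback and off_variation fields) are assumptions chosen for illustration, and "first matching rule wins" is just one possible precedence policy. It also assumes rules were validated before they reach the evaluator.

```python
import re
from typing import Any

# Operators from the task list; each takes (actual attribute value, expected value).
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "in": lambda actual, expected: actual in expected,
    "contains": lambda actual, expected: (
        isinstance(actual, str) and isinstance(expected, str) and expected in actual
    ),
    "regex": lambda actual, expected: (
        isinstance(actual, str) and re.search(expected, actual) is not None
    ),
}


def matches(condition: dict, context: dict, segments: dict) -> bool:
    """Evaluate a condition tree against a flat dict of user attributes."""
    if "all" in condition:       # AND: every child must match
        return all(matches(c, context, segments) for c in condition["all"])
    if "any" in condition:       # OR: at least one child must match
        return any(matches(c, context, segments) for c in condition["any"])
    if "segment" in condition:   # reuse a named segment's own conditions
        return matches(segments[condition["segment"]], context, segments)
    actual = context.get(condition["attribute"])
    if actual is None:
        return False             # a missing attribute never matches
    return OPERATORS[condition["op"]](actual, condition["value"])


def evaluate(flag: dict, context: dict, segments: dict) -> Any:
    """First matching rule wins; otherwise serve the flag's fallback value."""
    if not flag["enabled"]:
        return flag["off_variation"]
    for rule in flag["rules"]:   # assumed to be sorted by priority already
        if matches(rule["when"], context, segments):
            return rule["variation"]
    return flag["fallback"]


segments = {"us-users": {"attribute": "country", "op": "equals", "value": "US"}}
flag = {
    "enabled": True,
    "off_variation": "control",
    "fallback": "control",
    "rules": [
        {
            "when": {"all": [
                {"segment": "us-users"},
                {"attribute": "email", "op": "regex", "value": r"@example\.com$"},
            ]},
            "variation": "treatment",
        }
    ],
}

print(evaluate(flag, {"country": "US", "email": "a@example.com"}, segments))  # treatment
print(evaluate(flag, {"country": "DE", "email": "a@example.com"}, segments))  # control
```

Treating a missing attribute as "no match" is a design choice worth recording in the design doc alongside precedence and fall-back behavior.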
Acceptance criteria
  • A flag with rules: create segments, assign variations based on segments + attributes.
  • Evaluate a flag for user A 100 times → same result every time.
  • Percentage targeting: evaluate 1000 distinct user IDs → roughly 50% land in each variation (for a 50% rollout).
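
Deterministic percentage targeting can be sketched as hashing the flag key and user ID into a stable bucket. The function names, the choice of SHA-256, and the 0.01% bucket granularity are assumptions for illustration; any stable, well-distributed hash would serve.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> float:
    """Map (flag, user) to a stable bucket in [0, 100) with 0.01 granularity."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer and fold into 10,000 slots.
    return int.from_bytes(digest[:8], "big") % 10_000 / 100


def in_rollout(flag_key: str, user_id: str, percentage: float) -> bool:
    """True when the user's bucket falls below the rollout percentage."""
    return bucket(flag_key, user_id) < percentage


# Same user + same flag always yields the same answer.
first = in_rollout("new-checkout", "user-123", 50)
repeat = all(in_rollout("new-checkout", "user-123", 50) == first for _ in range(100))
print(repeat)  # True

# Across many distinct users the share should land near the target.
hits = sum(in_rollout("new-checkout", f"user-{i}", 50) for i in range(1000))
print(f"{hits} of 1000 users are in the 50% rollout")
```

Including the flag key in the hash input means a user who falls in the first 10% for one flag is not automatically in the first 10% for every other flag.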
Module 04 · ~5–7 hrs

Client SDK & Integration

SDKs are the interface. Make them simple to use, hard to misuse, and blazingly fast.

Tasks

  • Build a server-side SDK (or multiple: Node, Python, Go). Initialize with the service URL and API key.
  • SDK exposes client.evaluate(flag_key, user_context) → returns the evaluated variation.
  • SDK locally caches the flag configuration in memory.
  • SDK provides is_enabled(flag_key, user_context) for boolean flags and get_variation(flag_key, user_context) for multivariate flags.
  • Write SDK docs: initialization, API, error handling, performance.
Acceptance criteria
  • SDK initialization takes < 100ms.
  • Flag evaluation from memory is < 1ms (local cache).
  • SDK README with working example, error handling guide.
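
The SDK surface described in this module could be sketched roughly as below. To keep the example runnable without a network, the client takes a hypothetical fetch_flags callable instead of a service URL and API key, and a pluggable evaluator function; both are assumptions of this sketch, as is the choice to return a default when anything goes wrong.

```python
from typing import Any, Callable


class FlagClient:
    """Hypothetical server-side SDK client that evaluates from an in-memory cache."""

    def __init__(
        self,
        fetch_flags: Callable[[], dict],
        evaluator: Callable[[dict, dict], Any],
    ) -> None:
        self._fetch_flags = fetch_flags
        self._evaluator = evaluator
        self._cache: dict = fetch_flags()   # one fetch at startup, then local reads

    def refresh(self) -> None:
        """Replace the cached configuration (called by the update mechanism)."""
        self._cache = self._fetch_flags()

    def evaluate(self, flag_key: str, user_context: dict, default: Any = None) -> Any:
        flag = self._cache.get(flag_key)
        if flag is None:
            return default                  # unknown flag: fall back, don't raise
        try:
            return self._evaluator(flag, user_context)
        except Exception:
            return default                  # a flag lookup should never break the caller

    def is_enabled(self, flag_key: str, user_context: dict) -> bool:
        return self.evaluate(flag_key, user_context, default=False) is True

    def get_variation(self, flag_key: str, user_context: dict, default: Any = None) -> Any:
        return self.evaluate(flag_key, user_context, default)


def fake_fetch() -> dict:
    # Stand-in for an HTTP call to the flag service.
    return {"new-checkout": {"enabled": True, "value": True}}


def simple_evaluator(flag: dict, user_context: dict) -> Any:
    # Stand-in for the real rule engine.
    return flag["value"] if flag["enabled"] else False


client = FlagClient(fake_fetch, simple_evaluator)
print(client.is_enabled("new-checkout", {"user_id": "user-123"}))                          # True
print(client.get_variation("missing-flag", {"user_id": "user-123"}, default="control"))   # control
```

Requiring callers to pass a default for get_variation is one way to make the SDK "hard to misuse": the application always has a defined behavior when the flag is missing or evaluation fails.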
Module 05 · ~4–6 hrs

Real-Time Config Updates

SDK caches are stale by default. Push new flag definitions to SDKs in real-time so changes are live immediately.

Tasks

  • Choose a mechanism: WebSockets (push), polling (pull), or Server-Sent Events.
  • On PATCH /flags/:key, push the updated flag config to all connected SDKs.
  • SDK subscribes to updates on initialization; receives flag changes in real-time.
  • Handle network failures: SDK reconnects with exponential backoff if the connection drops.
  • Document your update latency SLO: e.g., 99% of SDKs see new flags within 5 seconds.
Acceptance criteria
  • Update a flag; within 5 seconds, SDK cache is fresh and next evaluation uses the new rules.
  • Disconnect the SDK from the update stream; it keeps serving the last cached value, and the flag doesn't change until the connection is restored.
  • You can describe the failure mode if the update stream is down for an hour.
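
The reconnect-with-exponential-backoff task could be sketched as follows. The base delay, cap, attempt count, and the use of full jitter are assumptions for illustration; connect and sleep are passed in so the logic can be exercised without a real socket or real waiting.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 8,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield reconnect delays: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling   # jitter spreads reconnects so clients don't stampede


def connect_with_retry(connect: Callable[[], object], sleep: Callable[[float], None], **kwargs):
    """Call connect() until it succeeds or the delays run out."""
    for delay in backoff_delays(**kwargs):
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
    raise ConnectionError("update stream unreachable; still serving cached flags")


# With jitter pinned to 1.0 the ceilings are visible directly.
print(list(backoff_delays(rng=lambda: 1.0)))
# [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]

# Simulated stream that fails twice before accepting the connection.
failures = [ConnectionError("down"), ConnectionError("down")]
slept = []


def flaky_connect() -> str:
    if failures:
        raise failures.pop(0)
    return "connected"


result = connect_with_retry(flaky_connect, slept.append)
print(result, len(slept))  # connected 2
```

In a real SDK, sleep would be time.sleep (or an async equivalent), and the SDK would keep answering evaluations from its cache while this loop runs in the background.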
Module 06 · ~5–7 hrs

Testing & Correctness

Rule engines are correctness-critical. Off-by-one errors segment users wrong. Test extensively.

Tasks

  • Unit tests for rule evaluation: all operators (equals, in, contains, regex), segment membership, percentage targeting.
  • Property-based tests (QuickCheck/Hypothesis): random user contexts + rules should always evaluate consistently.
  • Integration tests: test SDK + server together. Update flags, verify SDKs see changes.
  • Edge case tests: empty segments, contradicting rules, malformed attributes, null contexts.
  • Coverage threshold: 85% for evaluation logic.
Acceptance criteria
  • Test suite runs in under 2 minutes.
  • You can describe a testing strategy for the rule engine (especially percentage targeting).
  • A malformed rule doesn't crash the service (graceful error handling).
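
One possible testing strategy for percentage targeting is a statistical check: evaluate many distinct users and assert that the share lands within a tolerance derived from the binomial distribution. The sketch below uses only the standard library rather than Hypothesis or QuickCheck; in_rollout is a stand-in for your own bucketing function, and the sample size and five-sigma tolerance are assumptions.

```python
import hashlib
import math
import random


def in_rollout(flag_key: str, user_id: str, percentage: float) -> bool:
    """Stand-in bucketing function: stable hash folded into 10,000 slots."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100 < percentage


def check_rollout_share(
    percentage: float, users: int = 20_000, sigmas: float = 5.0, seed: int = 0
) -> bool:
    """True when the observed share is within `sigmas` standard deviations of target."""
    rng = random.Random(seed)                     # seeded so failures are reproducible
    flag_key = f"flag-{rng.randrange(10**9)}"
    hits = sum(in_rollout(flag_key, f"user-{i}", percentage) for i in range(users))
    p = percentage / 100
    tolerance = sigmas * math.sqrt(users * p * (1 - p))   # binomial standard deviation
    return abs(hits - users * p) <= tolerance


def check_determinism(samples: int = 1_000, seed: int = 1) -> bool:
    """Random (flag, user, percentage) triples must evaluate the same way twice."""
    rng = random.Random(seed)
    for _ in range(samples):
        flag_key = f"flag-{rng.randrange(100)}"
        user_id = f"user-{rng.randrange(10**6)}"
        percentage = rng.uniform(0, 100)
        if in_rollout(flag_key, user_id, percentage) != in_rollout(flag_key, user_id, percentage):
            return False
    return True


shares_ok = all(check_rollout_share(p) for p in (0, 1, 10, 50, 100))
deterministic = check_determinism()
print(shares_ok, deterministic)
```

The 0% and 100% cases have zero tolerance, so they double as boundary checks for the off-by-one errors this module warns about.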
Module 07 · ~5–7 hrs

Observability & Debugging

Teams need to understand why a flag evaluated to X. Instrument: logs, metrics, traces, evaluation audit logs.

Tasks

  • Structured logs for every evaluation: flag key, user ID, rules matched, variation returned, timestamp.
  • Metrics: evaluation count by flag, error rate, cache hit ratio, update latency.
  • Distributed tracing: instrument the full path (SDK request → server evaluation → cache lookup).
  • Build a Grafana dashboard: flag evaluation rate, error rate, top flags by volume.
  • Expose GET /flags/:key/audit to show evaluation results for a user (debugging tool).
Acceptance criteria
  • You can query logs for all evaluations of flag X in the last hour.
  • Dashboard shows which flags are being used most; error rates per flag.
  • GET /flags/my-flag/audit?user_id=123 shows why the user got variation A.
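
A structured evaluation log entry along the lines of the first task might be sketched like this. The field names, the logger name, and the free-text reason field are assumptions; the record deliberately carries only the user ID rather than the full attribute set.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("flags.evaluation")   # hypothetical logger name


def evaluation_record(
    flag_key: str,
    user_id: str,
    matched_rule: Optional[str],
    variation: object,
    reason: str,
) -> dict:
    """Build one JSON-serializable record per evaluation."""
    return {
        "event": "flag_evaluated",
        "flag_key": flag_key,
        "user_id": user_id,
        "matched_rule": matched_rule,    # None when the fallback was served
        "variation": variation,
        "reason": reason,                # human-readable "why", useful for an audit view
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def log_evaluation(**fields) -> dict:
    """Emit the record as a single JSON line and return it."""
    record = evaluation_record(**fields)
    logger.info(json.dumps(record, sort_keys=True))
    return record


record = log_evaluation(
    flag_key="my-flag",
    user_id="123",
    matched_rule="rule-2",
    variation="A",
    reason="segment us-users matched",
)
```

Storing the same record that is logged gives an audit endpoint something concrete to return when someone asks why a user got variation A.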
Module 08 · ~5–7 hrs

Progressive Rollouts & Canaries

Roll out features to 1%, 10%, 100% without code changes. Schedule rollouts, monitor metrics, auto-rollback if needed.

Tasks

  • Add a rollout_percentage field (0–100) to flags to control what share of users receives the new variation.
  • Implement scheduled rollouts: flag can transition 1% → 10% → 50% → 100% on a schedule.
  • Implement guardrails: if error rate (from metrics) exceeds threshold, auto-rollback to 0%.
  • Expose POST /flags/:key/rollout to manually adjust rollout % with audit logging.
  • Document your rollout strategy: when to use % vs boolean, how to monitor during rollout.
Acceptance criteria
  • Create a flag with a 50% rollout; verify that roughly 50% of users see the new variation (hash bucketing is approximate, so allow a statistical tolerance).
  • Schedule a rollout: 1% at time T, then 50% at time T+1h; verify it happens on schedule.
  • Manually adjust rollout %; audit log shows who changed it and when.
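
Scheduled rollouts and the error-rate guardrail could be modeled as two small pure functions, sketched below. The schedule representation (a list of start time and percentage pairs) and the threshold parameter names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def scheduled_percentage(schedule: list, now: datetime) -> int:
    """Return the percentage of the latest step whose start time has passed."""
    current = 0                                  # before the first step, nobody is in
    for starts_at, percentage in sorted(schedule):
        if starts_at <= now:
            current = percentage
    return current


def effective_percentage(
    schedule: list, now: datetime, error_rate: float, max_error_rate: float
) -> int:
    """Apply the guardrail: drop to 0% while the error rate is over the threshold."""
    if error_rate > max_error_rate:
        return 0
    return scheduled_percentage(schedule, now)


T = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
schedule = [(T, 1), (T + timedelta(hours=1), 50)]

print(scheduled_percentage(schedule, T - timedelta(minutes=1)))    # 0
print(scheduled_percentage(schedule, T + timedelta(minutes=30)))   # 1
print(scheduled_percentage(schedule, T + timedelta(hours=2)))      # 50
print(effective_percentage(schedule, T + timedelta(hours=2), error_rate=0.2, max_error_rate=0.05))  # 0
```

Note that this guardrail is stateless: once the error rate recovers, the schedule resumes. A production auto-rollback would more likely latch at 0% until a person re-enables the rollout, and would write an audit entry when it trips.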
Module 09 · ~4–6 hrs

Security & Compliance

Teams rely on flags to ship safely. Secure it: access control, audit logs, secrets, sensitive data handling.

Tasks

  • Implement role-based access control: admin (full), editor (create/update flags), viewer (read-only).
  • Add audit logging: log every flag change, rollout adjustment, user login. Include who, what, when.
  • Move secrets (API keys, signing keys) to a secrets manager (Vault, AWS Secrets Manager).
  • Never log sensitive user attributes in evaluation audit logs.
  • Write a simple threat model: flag injection (customer sees wrong flag)? Unauthorized rollouts?
Acceptance criteria
  • A read-only user cannot create or modify flags.
  • Audit log shows every flag change: timestamp, user, old value, new value.
  • Threat model committed; key risks documented.
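
Role checks and audit entries can be kept close together so that no change is possible without both. The sketch below assumes a permission naming scheme (flag:read, flag:update, and so on) and an in-memory audit list; both are illustrative stand-ins for whatever your service actually uses.

```python
from datetime import datetime, timezone

# Assumed permission names; roles follow the admin / editor / viewer split above.
ROLE_PERMISSIONS = {
    "admin": {"flag:read", "flag:create", "flag:update", "flag:delete", "user:manage"},
    "editor": {"flag:read", "flag:create", "flag:update"},
    "viewer": {"flag:read"},
}


class Forbidden(Exception):
    """Raised when a role lacks the permission for an action."""


def require(role: str, permission: str) -> None:
    """Raise Forbidden unless the role grants the permission (unknown roles get nothing)."""
    if permission not in ROLE_PERMISSIONS.get(role, set()):
        raise Forbidden(f"role '{role}' may not perform '{permission}'")


def update_flag(flags: dict, audit_log: list, actor: str, role: str, key: str, enabled: bool) -> None:
    """Change a flag's enabled state and record who, what, when, old and new value."""
    require(role, "flag:update")
    old_value = flags[key]
    flags[key] = enabled
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": "flag:update",
        "flag_key": key,
        "old_value": old_value,
        "new_value": enabled,
    })


flags = {"new-checkout": False}
audit_log = []

update_flag(flags, audit_log, actor="dana", role="editor", key="new-checkout", enabled=True)

try:
    update_flag(flags, audit_log, actor="sam", role="viewer", key="new-checkout", enabled=False)
    viewer_blocked = False
except Forbidden:
    viewer_blocked = True

print(flags["new-checkout"], viewer_blocked, len(audit_log))  # True True 1
```

Checking the permission before touching state means a rejected request leaves neither a changed flag nor a misleading audit entry; whether denied attempts should be logged separately is a question for the threat model.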
Module 10 · ~6–8 hrs

Capstone & Production Readiness

Ship it. Deploy, document SDKs, run operational drills, go live.

Tasks

  • Write a multi-stage Dockerfile. Use docker-compose.yml for local dev (app + Postgres).
  • GitHub Actions: lint → test → build → push → deploy.
  • Deploy to production: Fly.io, Render, Railway, or a VM. Set up health checks and graceful rollback.
  • Write two runbooks: "a flag isn't updating" and "SDK connection storm."
  • Publish SDK documentation: GitHub README, code examples, changelog.
  • Write a migration guide for teams moving to the service from another feature-flag tool.
Acceptance criteria
  • Every PR gets a preview URL; main deploys to production automatically.
  • A bad deploy auto-rolls back.
  • Runbooks + SDK docs committed; linked from README.
After the Track

Where to Go Next

  • Stretch goals on the same project: A/B testing framework, experiment analytics, custom attribute types, flag templates, third-party integrations.
  • Read about systems design. Study large-scale feature-flag deployments; sketch how you'd serve 100M evaluations/sec.
  • Try the Webhook Delivery track to practice event-driven systems at scale.
  • Write about it. Blog post per module; specifically on rule evaluation correctness and percentage targeting.