Database Family · Key-Value

Key-Value Stores

A giant hash map you can talk to over the network. Two core operations: GET key and SET key value. Everything else — sorted sets, counters, queues — is built on top. The fastest databases in the world, because they barely look like databases.

O(1) lookup · In-memory · Cache · Session store · Rate limit
Quick Facts

At a Glance

Core Ideas

  • The value is opaque. The store doesn't care what's inside it — bytes, JSON, an image, a serialized object. The key is just a name.
  • No queries. You can't ask "all users in California." You can only ask for keys you know, or scan ranges if the engine supports it.
  • TTL is first-class. Set a key with an expiry; it disappears. The reason these are the default for caches and sessions.
  • In-memory or on-disk. Redis is RAM-first with optional persistence; DynamoDB and FoundationDB are disk-backed with caches.
  • Latency is the product. Sub-millisecond p99 is table stakes.
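The whole contract (opaque values, TTL-driven expiry, O(1) lookup) fits in a few lines. A minimal Python sketch; the class name and lazy-expiry strategy are illustrative, not any real client's API:

```python
import time

class KVStore:
    """Toy in-memory key-value store with first-class TTL."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on read, similar to Redis's passive expiration
            return None
        return value

store = KVStore()
store.set("session:42", b"opaque bytes", ttl=0.05)
assert store.get("session:42") == b"opaque bytes"
time.sleep(0.06)
assert store.get("session:42") is None  # TTL elapsed: the key is simply gone
```

Real engines also expire keys actively in the background; this sketch only reaps on read.
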
The Engines

Who Plays Here

Redis / Valkey

The de facto standard. Strings, hashes, lists, sets, sorted sets, streams, pub/sub. Single-threaded per shard, brutally fast. Valkey is the open-source fork created after Redis's license change.

Memcached

The old guard. Pure cache — no persistence, no data structures, just get/set with TTLs. Multi-threaded, dead simple.

DynamoDB

AWS's managed key-value (and document) store. Predictable latency, autoscaling, global tables. The pricing model rewards good access-pattern design.

Cloudflare KV / Workers KV

Edge key-value — eventually consistent, replicated to every PoP. Reads are fast from anywhere, writes propagate.

etcd / Consul

Strongly consistent KV for configuration and service discovery. Backed by Raft. Powers Kubernetes.

FoundationDB / TiKV

Distributed transactional KV — building blocks for higher-level databases (CockroachDB and TiDB sit on TiKV).

When KV Wins

The Canonical Use Cases

Cache In Front of a Slower Store

The classic. Compute or fetch the result, stash it under a deterministic key with a TTL, serve subsequent reads from RAM. Patterns: cache-aside (app reads cache, falls back to DB), read-through (cache fetches on miss), write-through (cache updates on every write). Mind invalidation — the second-hardest problem in CS.
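The cache-aside read path, sketched with a plain dict standing in for Redis and a made-up `fetch_user_from_db` stub for the slow backing store:

```python
import time

cache = {}       # stands in for Redis: key -> (value, expires_at)
TTL = 60.0
db_calls = []    # instrumentation so we can see misses vs. hits

def fetch_user_from_db(user_id):
    db_calls.append(user_id)  # stub for the slow backing store
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                      # hit: served from the cache
    value = fetch_user_from_db(user_id)      # miss: fall back to the DB
    cache[key] = (value, time.monotonic() + TTL)
    return value

get_user(42)
get_user(42)
assert db_calls == [42]  # the second read never touched the DB
```

Read-through and write-through move this logic from the app into the cache layer, but the shape is the same.
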

Sessions & Tokens

Login session, JWT denylist, OAuth nonce, idempotency keys for payment requests. Short-lived, looked up by ID, written once. TTL handles cleanup automatically.
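Idempotency keys reduce to "SET if not already present, with a TTL". A sketch where the handler, the `charge` callback, and the 24h retention are all illustrative choices:

```python
import time
import uuid

seen = {}  # stands in for Redis: idempotency key -> (response, expires_at)

def handle_payment(idempotency_key, amount, charge):
    """Process a charge at most once per idempotency key (SETNX-style)."""
    now = time.monotonic()
    entry = seen.get(idempotency_key)
    if entry is not None and entry[1] > now:
        return entry[0]         # replayed request: return the stored response
    response = charge(amount)   # first time: actually perform the charge
    seen[idempotency_key] = (response, now + 86400)  # keep for 24h, then TTL cleans up
    return response

calls = []
def charge(amount):
    calls.append(amount)
    return {"status": "ok", "amount": amount}

key = str(uuid.uuid4())
r1 = handle_payment(key, 100, charge)
r2 = handle_payment(key, 100, charge)  # client retry with the same key
assert r1 == r2 and calls == [100]     # charged exactly once
```

In real Redis the check-then-set must be a single `SET key value NX EX ttl` so two racing retries can't both win.
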

Rate Limits & Counters

Atomic INCR on a key like rl:user:42:minute:2026-04-27T15:23, with a TTL. Token bucket, leaky bucket, sliding window — all a few Redis commands. Cassandra's counter columns play here too.
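The fixed-window variant is one INCR per request plus a comparison. Sketched over a dict, with the TTL-based eviction of old window keys elided; `now` is passed in to keep the example deterministic:

```python
counters = {}  # stands in for Redis: per-window key -> count

def allow(user_id, now, limit=5, window=60):
    """Fixed-window rate limiter: INCR the current window's key, compare to the limit."""
    key = f"rl:user:{user_id}:{int(now // window)}"
    counters[key] = counters.get(key, 0) + 1  # atomic INCR in real Redis
    return counters[key] <= limit

assert all(allow("42", now=1000.0) for _ in range(5))
assert not allow("42", now=1000.0)  # sixth request in the same window is rejected
assert allow("42", now=1060.0)      # new window, counter starts fresh
```

Sliding-window and token-bucket variants smooth out the burst allowed at window boundaries, at the cost of a little more state per key.
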

Leaderboards & Real-Time Ranking

Redis sorted sets (ZADD, ZRANGE) give you "top N by score" in O(log N). Ranks, presence, trending lists, matchmaking queues — small data, hot reads, score updates.
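The sorted-set operations are easy to model. The dict-plus-sort below is only a stand-in: Redis keeps a skiplist plus hash so score updates and range reads stay O(log N) rather than re-sorting:

```python
scores = {}  # member -> score, standing in for one Redis sorted set

def zadd(member, score):
    scores[member] = score  # adding and updating are the same operation

def top_n(n):
    """Top-N by score, highest first (what ZREVRANGE 0 n-1 WITHSCORES gives you)."""
    return sorted(scores.items(), key=lambda kv: -kv[1])[:n]

zadd("alice", 3100)
zadd("bob", 2800)
zadd("carol", 4200)
zadd("alice", 3500)  # alice improves her score: just another ZADD
assert top_n(2) == [("carol", 4200), ("alice", 3500)]
```
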

Pub/Sub & Lightweight Queues

Redis Streams and pub/sub for fan-out within a service. Not a Kafka replacement — no long retention, no exactly-once — but for "notify the websocket layer that user 42's data changed," it's perfect.

Feature Flags & Config

etcd / Consul / Cloudflare KV: small bits of config replicated everywhere, read on every request. Strong consistency matters here — flipping a flag should not race with a deploy.

Trade-offs

What You Give Up

  • No queries. Want "all sessions for user 42"? You build a secondary index by hand — a set keyed on user:42:sessions — and keep it in sync.
  • No joins, no transactions across keys (mostly). Redis MULTI/EXEC and Lua scripts give atomicity within one shard; cross-shard is on you.
  • Memory cost. Redis at scale is expensive — every byte is in RAM. Track key size and eviction policy (allkeys-lru, volatile-ttl).
  • Persistence is optional and lossy. RDB snapshots and AOF logs help, but Redis is not your system of record. Treat it as cache, not truth.
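The first trade-off above, a hand-rolled secondary index for "all sessions for user 42", looks like this in sketch form (key layout follows the bullet; the helper names are made up):

```python
# Two hand-maintained structures standing in for Redis keys:
sessions = {}       # "session:<id>" -> session payload
user_sessions = {}  # "user:<id>:sessions" -> set of session ids (the manual index)

def create_session(session_id, user_id, payload):
    sessions[session_id] = payload
    user_sessions.setdefault(user_id, set()).add(session_id)  # keep the index in sync...

def delete_session(session_id, user_id):
    sessions.pop(session_id, None)
    user_sessions.get(user_id, set()).discard(session_id)     # ...on every write path

def sessions_for(user_id):
    return {sid: sessions[sid] for sid in user_sessions.get(user_id, set())}

create_session("s1", 42, {"device": "phone"})
create_session("s2", 42, {"device": "laptop"})
delete_session("s1", 42)
assert list(sessions_for(42)) == ["s2"]
```

Every write path must touch both structures, and nothing but discipline keeps them consistent. That is the cost of "no queries".
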
Operations

Things That Bite

Hot Keys

One key getting a million reads per second. Pins a single shard's CPU while the rest of the cluster idles. Mitigate with client-side caching, request coalescing (single-flight), or sharding the key (counter:0..15).
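Request coalescing (single-flight) is worth seeing concretely: the first caller for a key becomes the leader and hits the backend; concurrent callers for the same key wait and share its result. A threaded Python sketch, loosely modeled on Go's singleflight package:

```python
import threading
import time

class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done event, one-slot result box)

    def do(self, key, fn):
        with self._lock:
            entry = self._inflight.get(key)
            is_leader = entry is None
            if is_leader:
                entry = (threading.Event(), [])
                self._inflight[key] = entry
        done, box = entry
        if is_leader:
            try:
                box.append(fn())  # only the leader hits the backend
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()
        else:
            done.wait()           # followers piggyback on the leader's call
        return box[0]

calls = []
def load():
    calls.append(1)
    time.sleep(0.2)  # slow backend call; followers arrive while it runs
    return "value"

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("hot", load)))
           for _ in range(5)]
threads[0].start()
time.sleep(0.05)  # let the leader get in flight before the others arrive
for t in threads[1:]:
    t.start()
for t in threads:
    t.join()
assert results == ["value"] * 5 and len(calls) == 1  # one backend call served all five
```
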

Big Keys / Big Values

A 50MB list. LRANGE on it blocks the single thread; replicas fall behind on sync. Cap value sizes, paginate large structures, and run redis-cli --bigkeys regularly.

Cache Stampede

A hot key expires; 10,000 requests miss simultaneously and all try to repopulate. The DB goes down. Fix with probabilistic early expiration, locking on miss (single-flight), or stale-while-revalidate.
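Probabilistic early expiration deserves a sketch: each read rolls an exponentially distributed die, and occasionally one request recomputes the value shortly *before* the TTL ends, so the herd never sees a cold key. This follows the XFetch scheme from Vattani, Chierichetti and Lowenstein's "Optimal Probabilistic Cache Stampede Prevention"; the function shape here is my own:

```python
import math
import random

def should_refresh(now, expires_at, compute_time, beta=1.0):
    """Return True if this request should recompute the value early.

    -log(U) for uniform U is an exponential random variable, so the chance
    of an early refresh rises smoothly as expiry approaches, scaled by how
    long a recompute takes (compute_time) and an aggressiveness knob (beta).
    """
    return now - compute_time * beta * math.log(random.random()) >= expires_at

# Already expired: always refresh.
assert should_refresh(now=1001.0, expires_at=1000.0, compute_time=0.1)
# Far from expiry: effectively never refreshes early.
assert not should_refresh(now=0.0, expires_at=1000.0, compute_time=0.1)
```

The other two fixes are structural: single-flight on miss (one repopulator, everyone else waits) and stale-while-revalidate (serve the old value while one request refreshes).
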

Cache Inconsistency

Update DB, then update cache — and the second step fails. Invalidate (DEL) instead of writing through, accept TTLs as the worst-case staleness, or use change-data-capture from the DB to drive cache updates.
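The invalidate-instead-of-write-through pattern in sketch form, again with dicts standing in for the DB and the cache:

```python
cache = {}
db = {}

def update_user(user_id, value):
    db[f"user:{user_id}"] = value        # 1) write the source of truth first
    cache.pop(f"user:{user_id}", None)   # 2) DEL the cache entry; next read repopulates

def get_user(user_id):
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = db[key]             # cache-aside read path
    return cache[key]

db["user:42"] = "old"
assert get_user(42) == "old"   # primed the cache
update_user(42, "new")
assert get_user(42) == "new"   # no stale value survived the write
```

Deleting is safer than writing through because a failed DEL leaves a miss (slow but correct), whereas a failed cache write leaves a stale value that lives until the TTL fires.
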

Pitfalls

Common Mistakes

  • Using Redis as a primary database for data you can't lose. The persistence story is "best effort"; the operational story is "RAM is finite."
  • No TTL. Keys live forever, memory grows forever. Pick a default TTL even for "permanent" caches.
  • KEYS in production. O(N) on the entire keyspace, blocks the server. Use SCAN.
  • Serializing the world. Pickling whole Python objects into Redis values — fast to write, painful to evolve. Stick to language-neutral formats (JSON, MessagePack, Protobuf).
  • Treating DynamoDB like SQL. No GSI for the access pattern means a scan, and scans are slow and expensive. Design the table for the queries.