A cache trades a little staleness for a lot of speed. Done well, it turns a creaking system into a calm one. Done poorly, it produces the kind of bugs that show up only on Black Friday — wrong data served confidently, fast.
Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things."
| Layer | Examples | What it caches |
|---|---|---|
| Browser | HTTP cache, service worker | Static assets, API responses with explicit headers. |
| CDN / edge | Cloudflare, CloudFront, Fastly | HTML, images, JS, JSON — close to the user. |
| Reverse proxy | Varnish, NGINX, HAProxy | Whole responses keyed by URL + headers. |
| Application — local | Caffeine (Java), functools.lru_cache (Python), in-process LRU | Hottest values, no network hop. Per-instance. |
| Application — distributed | Redis, Memcached | Shared across instances. Single source of cached truth. |
| Database | Buffer pool, query cache, materialized views | Pages, query plans, precomputed aggregates. |
| Client SDK | Apollo, React Query, SWR | API responses, normalized for the UI. |
Most production systems have caches at three or four of these layers simultaneously. The hard part isn't adding one — it's reasoning about all of them at once.
Cache-aside: the application checks the cache; on a miss, it reads the source of truth, populates the cache, and returns. On writes, it updates the source and either invalidates or refreshes the cache entry.
Pros: simple, resilient — the system works even if the cache is down. Cons: first request after expiry is slow; risk of stampedes when many readers miss at once.
The default choice. If you don't know which pattern to use, use this one; the worked resolver example at the end of this section shows it in full.
Read-through: the cache itself fetches from the source on a miss, so the application only talks to the cache. Common in client SDKs and in caching libraries with built-in loaders (e.g., Caffeine's LoadingCache).
Pros: cleaner application code. Cons: coupling — the cache becomes a runtime dependency, and configuring loaders adds complexity.
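A minimal in-process read-through sketch. The `ReadThroughCache` class and `db.findUserById` are illustrative names, not a specific library; a real loader cache like Caffeine adds eviction and request coalescing on top of this shape.

```js
// Read-through: callers only ever see get(); the loader is wired in once
// at construction time and runs automatically on a miss.
class ReadThroughCache {
  constructor(loader, ttlMs) {
    this.loader = loader;     // async (key) => value
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  async get(key) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;
    const value = await this.loader(key); // the cache fetches, not the caller
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Usage: reads go through the cache; the app never queries the source directly.
const users = new ReadThroughCache((id) => db.findUserById(id), 30_000);
```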
Write-through: writes go to the cache and the source synchronously. Reads always hit fresh data.
Pros: never stale. Cons: every write pays both costs; no benefit if the data is rarely read.
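A minimal write-through sketch, assuming a hypothetical `db.saveUser` alongside the redis client used in the example below:

```js
// Write-through: one write path updates the source and the cache
// synchronously; the write is only acknowledged once both succeed.
async function saveUser(user) {
  await db.saveUser(user); // source of truth first: if the cache write then
                           // fails, the cache is stale-or-empty, never wrong
  await redis.set(`user:${user.id}`, JSON.stringify(user), { EX: 3600 });
  return user;
}
```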
Write-behind (write-back): writes hit the cache and queue an asynchronous write to the source. Fastest writes, eventual consistency.
Pros: excellent write latency, can absorb bursts. Cons: data loss risk if the cache crashes before flushing — only acceptable for non-critical data.
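A write-behind sketch for a non-critical counter. The flush interval and `db.incrementViewCounts` are illustrative assumptions; the in-memory queue is exactly where the data-loss risk lives.

```js
// Write-behind: acknowledge after the cache write; a background loop flushes
// queued writes to the source. A crash before a flush loses the queued batch.
const pending = [];

async function recordView(linkCode) {
  await redis.incr(`views:${linkCode}`); // fast path: cache only
  pending.push(linkCode);                // remember to persist later
}

setInterval(async () => {
  const batch = pending.splice(0, pending.length);
  if (batch.length === 0) return;
  try {
    await db.incrementViewCounts(batch); // hypothetical batch writer
  } catch (err) {
    pending.push(...batch);              // retry next tick; still lossy on crash
    console.error('write-behind flush failed', err);
  }
}, 1000);
```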
Refresh-ahead: asynchronously refresh entries before they expire, based on access patterns. Hot keys never go cold.
Pros: tail latency stays flat. Cons: you can refresh things no one will ever ask for again — wasted work.
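A refresh-ahead sketch over Redis. The 20% window and the `load` callback are assumptions; the key idea is that the hit path never waits for the refresh.

```js
// Refresh-ahead: on a hit that is close to expiry, serve the cached value
// immediately and refresh it in the background so the hot key never goes cold.
const REFRESH_FRACTION = 0.2; // refresh once <20% of the TTL remains

async function getWithRefreshAhead(key, ttlSec, load) {
  const [value, remaining] = await Promise.all([redis.get(key), redis.ttl(key)]);
  if (value !== null) {
    if (remaining >= 0 && remaining < ttlSec * REFRESH_FRACTION) {
      load(key) // fire-and-forget; readers keep getting the current value
        .then((fresh) => redis.set(key, fresh, { EX: ttlSec }))
        .catch((err) => console.error('refresh-ahead failed', err));
    }
    return value;
  }
  const fresh = await load(key); // cold path: ordinary cache-aside fill
  await redis.set(key, fresh, { EX: ttlSec });
  return fresh;
}
```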
Another invalidation strategy is versioned keys: embed a version in the key (e.g., user:42:v17) so a write bumps the version and old entries are simply unreachable; they die naturally on TTL. Rule of thumb: prefer short TTLs over clever invalidation. The bug you don't write is the bug you don't debug at 2am.
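A sketch of the versioned-key approach, with hypothetical `db` helpers. A write bumps the version counter, which orphans every older entry:

```js
// Versioned keys: reads compose the key from the current version; writes bump
// the version. Old entries are never read again and age out on their TTL.
async function getUserCached(id) {
  const v = (await redis.get(`user:${id}:version`)) ?? '0';
  const key = `user:${id}:v${v}`;         // e.g., user:42:v17
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit);
  const user = await db.findUserById(id); // hypothetical source read
  await redis.set(key, JSON.stringify(user), { EX: 3600 });
  return user;
}

async function updateUser(user) {
  await db.saveUser(user);                     // hypothetical source write
  await redis.incr(`user:${user.id}:version`); // old keys now unreachable
}
```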
Stampede (dogpile): a popular key expires, a thousand concurrent requests miss simultaneously, they all stampede the database, and the database falls over.
Mitigations: single-flight (only one fetch per key in-flight, others wait); probabilistic early expiry (refresh at random points before TTL); soft TTL + hard TTL (serve stale while one process refreshes); jittered TTLs so keys don't expire in lockstep.
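A minimal single-flight sketch: concurrent misses for the same key share one in-flight promise instead of each hitting the database. This dedupes within a single process only.

```js
// Single-flight: at most one fetch per key is in flight at a time;
// everyone else awaits the same promise.
const inFlight = new Map(); // key -> Promise

function singleFlight(key, fetchFn) {
  let p = inFlight.get(key);
  if (!p) {
    p = fetchFn(key).finally(() => inFlight.delete(key));
    inFlight.set(key, p);
  }
  return p;
}

// Usage inside a cache-aside miss path (see the resolver example below):
//   const row = await singleFlight(code, (c) => db.findLinkByCode(c));
```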
Penetration: requests for keys that don't exist anywhere, so every request is a miss that hits the database. Often a sign of probing or a buggy client.
Mitigations: negative caching (cache the 404); bloom filter in front of the cache; rate-limit unknown-key requests.
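A from-scratch Bloom filter sketch (the sizing and hash count are illustrative; real deployments tune them to the key population and acceptable false-positive rate). Seed it with every key that exists; a negative answer is definitive, so unknown keys can be rejected without touching the database.

```js
// Bloom filter: no false negatives, occasional false positives.
class BloomFilter {
  constructor(bits = 1 << 20, hashes = 4) {
    this.bits = bits;
    this.hashes = hashes;
    this.buf = new Uint8Array(bits >>> 3);
  }
  _hash(key, seed) { // FNV-1a variant, seeded once per hash function
    let h = 2166136261 ^ seed;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.bits;
  }
  add(key) {
    for (let i = 0; i < this.hashes; i++) {
      const bit = this._hash(key, i);
      this.buf[bit >>> 3] |= 1 << (bit & 7);
    }
  }
  mightContain(key) {
    for (let i = 0; i < this.hashes; i++) {
      const bit = this._hash(key, i);
      if (!(this.buf[bit >>> 3] & (1 << (bit & 7)))) return false;
    }
    return true; // "probably": false positives fall through to the database
  }
}

// Usage: if (!bloom.mightContain(code)) return null; // definite miss, skip DB
```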
Avalanche: many keys expire at once (often because they were all set together), and the database gets a synchronized spike.
Mitigations: jitter the TTL (baseTTL ± 10%); spread initial population over time; multi-tier caches so only one tier expires at once.
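The jitter helper is small enough to show in full. This sketch implements the baseTTL ± 10% rule above and is the `jitter()` assumed by the resolver example at the end of the section:

```js
// TTL jitter: spread expiries around the base TTL so keys written together
// don't expire together. Returns a whole number of seconds, at least 1.
function jitter(baseSeconds, spread = 0.1) {
  const factor = 1 + (Math.random() * 2 - 1) * spread; // uniform in [1-s, 1+s]
  return Math.max(1, Math.round(baseSeconds * factor));
}
```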
The most insidious failure is silent inconsistency: code paths bypass invalidation, the wrong field is keyed, two services cache differently. The system works, just with slightly wrong numbers.
Mitigations: observability — emit hit ratio, age-of-entry, source-vs-cache mismatches; integration tests that mutate data and assert reads see the change; ship a ?nocache=1 debug toggle.
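One way to make the mismatch metric concrete: shadow-read sampling, reusing the names from the resolver example below. The 1% rate is an assumption chosen to keep the extra database load negligible.

```js
// Shadow reads: on a small sample of cache hits, also read the source of
// truth and compare, emitting a metric when they disagree. Silent
// inconsistency shows up on a dashboard instead of in a support ticket.
const SHADOW_READ_RATE = 0.01; // 1% of hits; illustrative

async function resolveAudited(code) {
  const cached = await redis.get(`link:code:${code}`);
  if (cached !== null && Math.random() < SHADOW_READ_RATE) {
    db.findLinkByCode(code)
      .then((row) => {
        const source = row?.target_url ?? 'MISS';
        if (source !== cached) metrics.inc('cache.mismatch', { family: 'link' });
      })
      .catch(() => {}); // the audit is best-effort by design
  }
  return cached === 'MISS' ? null : cached; // miss path omitted; see resolve()
}
```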
Hard dependence: you optimized so hard that the database now can't handle the cold-cache traffic. When Redis blinks, everything falls.
Mitigations: graceful degradation — on cache failure, log and fall back to source; rate-limit fallback so the DB survives; capacity-test with the cache disabled occasionally.
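A sketch of that fallback path, wrapping the `resolve()` function from the example below behind a crude concurrency cap. The limit is an assumption; capacity-test to find the number your database can actually absorb cold.

```js
// Graceful degradation: if the cache layer errors, fall back to the source,
// but cap concurrent fallbacks so a cold cache can't flatten the database.
let fallbacksInFlight = 0;
const FALLBACK_LIMIT = 50; // illustrative; measure before trusting

async function resolveWithFallback(code) {
  try {
    return await resolve(code); // normal cache-aside path
  } catch (err) {
    console.error('cache unavailable, falling back to source', err);
    if (fallbacksInFlight >= FALLBACK_LIMIT) {
      throw new Error('fallback saturated'); // shed load rather than kill the DB
    }
    fallbacksInFlight++;
    try {
      const row = await db.findLinkByCode(code);
      return row?.target_url ?? null;
    } finally {
      fallbacksInFlight--;
    }
  }
}
```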
Name your keys with a namespace (link:code:abc123, not abc123); future-you will thank you. Prefix keys with a global version (v3:link:…) so a deploy can invalidate everything by bumping a constant. And when monitoring Redis memory, watch used_memory against used_memory_rss; a widening gap signals fragmentation.

The worked example: resolving a short-link code to its target URL. Reads dominate (every redirect is a read), writes are rare, the value is a small immutable string. Textbook cache-aside.
```js
async function resolve(code) {
  const key = `link:code:${code}`;
  const cached = await redis.get(key);
  if (cached !== null) {
    metrics.inc('cache.hit', { family: 'link' });
    return cached === 'MISS' ? null : cached; // negative cache for 404s
  }
  metrics.inc('cache.miss', { family: 'link' });
  const row = await db.findLinkByCode(code);
  const value = row?.target_url ?? 'MISS';
  const ttl = row ? jitter(60 * 60) : 60; // 1h ± jitter for hits, 60s for misses
  await redis.set(key, value, { EX: ttl });
  return row?.target_url ?? null;
}

async function deleteLink(code) {
  await db.deleteByCode(code);
  await redis.del(`link:code:${code}`); // explicit invalidation on write
}
```
Jittered TTL avoids avalanche; negative caching avoids penetration; explicit delete-on-write keeps the cache honest. Add single-flight if you ever see stampede patterns in your hit-ratio dashboard.
Add a cache when the data is hot, immutable-ish, and slow to compute. Otherwise, fix the source first.