Observability & Performance Deep Dive

Caching — Keep the Hot Stuff Close

A cache trades a little staleness for a lot of speed. Done well, it turns a creaking system into a calm one. Done poorly, it produces the kind of bugs that show up only on Black Friday — wrong data served confidently, fast.

Tags: Cache-aside · TTL · Invalidation · Stampede · CDN · Redis

Quick Facts

Basic Concepts

  • Cache: a faster, smaller copy of data that lives closer to the consumer.
  • Hit / miss: the request found the value in cache, or it didn't.
  • TTL (time-to-live): how long an entry stays valid before it expires.
  • Eviction: what to drop when the cache is full — typically LRU, LFU, or TTL-based.
  • Invalidation: proactively removing stale entries when the underlying data changes.
  • Hit ratio: hits ÷ (hits + misses). The KPI for whether the cache is doing its job.

Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things."

Layers

Where Caches Live

  • Browser (HTTP cache, service worker): static assets, API responses with explicit headers.
  • CDN / edge (Cloudflare, CloudFront, Fastly): HTML, images, JS, JSON — close to the user.
  • Reverse proxy (Varnish, NGINX, HAProxy): whole responses keyed by URL + headers.
  • Application — local (Caffeine in Java, functools.lru_cache in Python, in-process LRU): the hottest values, no network hop. Per-instance.
  • Application — distributed (Redis, Memcached): shared across instances. Single source of cached truth.
  • Database (buffer pool, query cache, materialized views): pages, query plans, precomputed aggregates.
  • Client SDK (Apollo, React Query, SWR): API responses, normalized for the UI.

Most production systems have caches at three or four of these layers simultaneously. The hard part isn't adding one — it's reasoning about all of them at once.

Patterns

How Code Talks to a Cache

Cache-aside (lazy loading)

The application checks the cache; on a miss, it reads the source of truth, populates the cache, and returns. On writes, it updates the source and either invalidates or refreshes the cache entry.

Pros: simple, resilient — the system works even if the cache is down. Cons: first request after expiry is slow; risk of stampedes when many readers miss at once.

The default choice. If you don't know which pattern to use, use this one; the worked example at the end of this section implements it.

Read-through

The cache itself fetches from the source on a miss. The application only talks to the cache. Common in client SDKs and in caching libraries with built-in loaders (e.g., Caffeine's LoadingCache).

Pros: cleaner application code. Cons: coupling — the cache becomes a runtime dependency, and configuring loaders adds complexity.
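
A minimal in-process sketch of the idea: the application asks only the cache, and the loader that touches the source is configured once. The names here (ReadThroughCache, the loader signature) are illustrative, not any specific library's API.

class ReadThroughCache {
  constructor(loader, ttlMs) {
    this.loader = loader;       // async (key) => value; the only code that hits the source
    this.ttlMs = ttlMs;
    this.entries = new Map();   // key -> { value, expiresAt }
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > Date.now()) return entry.value;  // hit
    const value = await this.loader(key);                           // miss: the cache fetches
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

Callers only ever call get; swapping the loader swaps the source without touching call sites.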

Write-through

Writes go to the cache and the source synchronously. Reads always hit fresh data.

Pros: never stale. Cons: every write pays both costs; no benefit if the data is rarely read.
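
As a sketch, reusing the redis and db handles from the worked example below (db.saveLink is a hypothetical write method):

async function saveLink(code, targetUrl) {
  await db.saveLink(code, targetUrl);               // source of truth first
  await redis.set(`link:code:${code}`, targetUrl);  // then the cache, before acking the write
}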

Write-behind (write-back)

Writes hit the cache and queue an async write to the source. Fastest writes, eventual consistency.

Pros: excellent write latency, can absorb bursts. Cons: data loss risk if the cache crashes before flushing — only acceptable for non-critical data.
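
A sketch that makes the caveat concrete: the pending queue here is in-memory, so a crash loses whatever hasn't flushed (db.saveLink is again hypothetical).

const pendingWrites = [];   // lost if the process dies before flushing

async function saveLinkWriteBehind(code, targetUrl) {
  await redis.set(`link:code:${code}`, targetUrl);  // fast path: cache only
  pendingWrites.push({ code, targetUrl });          // source write deferred
}

setInterval(async () => {
  while (pendingWrites.length > 0) {
    const write = pendingWrites.shift();
    try {
      await db.saveLink(write.code, write.targetUrl);  // flush to the source
    } catch (err) {
      pendingWrites.unshift(write);                    // keep it; retry on the next tick
      break;
    }
  }
}, 1000);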

Refresh-ahead

Asynchronously refresh entries before they expire, based on access patterns. Hot keys never go cold.

Pros: tail latency stays flat. Cons: you can refresh things no one will ever ask for again — wasted work.
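
A sketch, assuming the same redis client as the worked example below and a caller-supplied load function; the 20% threshold is an arbitrary choice:

async function getRefreshAhead(key, ttlSec, load) {
  const [value, remaining] = await Promise.all([redis.get(key), redis.ttl(key)]);
  if (value !== null) {
    if (remaining < ttlSec * 0.2) {                   // entering the last 20% of its life
      load(key)
        .then((fresh) => redis.set(key, fresh, { EX: ttlSec }))
        .catch(() => {});                             // a failed refresh must not break reads
    }
    return value;                                     // serve the current value immediately
  }
  const fresh = await load(key);                      // cold key: plain cache-aside fallback
  await redis.set(key, fresh, { EX: ttlSec });
  return fresh;
}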

Invalidation

The Hardest Part

  • TTL: set a time, accept staleness up to that long. Simple, robust, dumb in the right way.
  • Explicit invalidation: on write, delete the affected entries. Sharp but easy to miss a path. Test it.
  • Write-through / read-through sidestep the issue but pay other costs.
  • Versioned keys: embed an entity version (user:42:v17) so stale entries are simply unreachable and die naturally on TTL (see the sketch below).
  • Event-based: publish change events; subscribers invalidate. Works across services, but now you've got distributed messaging in the loop.

Rule of thumb: prefer short TTLs over clever invalidation. The bug you don't write is the bug you don't debug at 2am.
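
A sketch of versioned keys, assuming a cheap per-entity version column (db.getUserVersion and db.findUser are hypothetical):

async function getUser(id) {
  const version = await db.getUserVersion(id);   // cheap, indexed read; writes bump it
  const key = `user:${id}:v${version}`;
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);
  const user = await db.findUser(id);
  await redis.set(key, JSON.stringify(user), { EX: 3600 });
  return user;
}

Any write that bumps the version makes every old key unreachable; no delete call required.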

Failure Modes

How Caches Hurt You

Thundering herd / cache stampede

A popular key expires. A thousand concurrent requests miss at once, stampede the database, and the database falls over.

Mitigations: single-flight (only one fetch per key in-flight, others wait); probabilistic early expiry (refresh at random points before TTL); soft TTL + hard TTL (serve stale while one process refreshes); jittered TTLs so keys don't expire in lockstep.
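
A per-process single-flight helper as a sketch; across multiple instances you'd still want a distributed lock or a soft-TTL scheme on top:

const inFlight = new Map();   // key -> the Promise of the fetch already running

function singleFlight(key, fetchFn) {
  const existing = inFlight.get(key);
  if (existing) return existing;                           // join the in-flight fetch
  const p = fetchFn().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;                                                // all concurrent misses await one fetch
}

In the worked example below, the database read would become await singleFlight(key, () => db.findLinkByCode(code)).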

Cache penetration

Requests for keys that don't exist anywhere — all misses, all hit the database. Often a sign of probing or a buggy client.

Mitigations: negative caching (cache the 404); bloom filter in front of the cache; rate-limit unknown-key requests. The worked example below caches misses for 60 seconds.

Cache avalanche

Many keys expire at once (often because they were all set together). Database gets a synchronized spike.

Mitigations: jitter the TTL (baseTTL ± 10%); spread initial population over time; multi-tier caches so only one tier expires at once.
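
The jitter helper is tiny; this sketch is also what the worked example below assumes:

function jitter(baseSeconds, spread = 0.1) {
  const delta = baseSeconds * spread;   // baseTTL ± 10% by default
  return Math.round(baseSeconds - delta + Math.random() * 2 * delta);
}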

Stale data nobody notices

Most insidious. Code paths bypass invalidation, the wrong field is keyed, two services cache differently. The system works — just with slightly wrong numbers.

Mitigations: observability — emit hit ratio, age-of-entry, source-vs-cache mismatches; integration tests that mutate data and assert reads see the change; ship a ?nocache=1 debug toggle.
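
One such integration test, sketched in a Jest-style runner against the worked example's functions plus a hypothetical createLink:

test('delete invalidates the cache', async () => {
  await createLink('abc123', 'https://example.com');
  await resolve('abc123');                      // warm the cache
  await deleteLink('abc123');
  expect(await resolve('abc123')).toBeNull();   // must not serve the stale entry
});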

Cache as a single point of failure

You optimized so hard that the database now can't handle the cold-cache traffic. When Redis blinks, everything falls.

Mitigations: graceful degradation — on cache failure, log and fall back to source; rate-limit fallback so the DB survives; capacity-test with the cache disabled occasionally.
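
A sketch of the fallback: treat a cache error as a miss, count it, and let the (rate-limited) source path serve the read:

async function cacheGetSafe(key) {
  try {
    return await redis.get(key);
  } catch (err) {
    metrics.inc('cache.error');   // alert on a sustained rate, not a blip
    return null;                  // behave like a miss; caller falls through to the source
  }
}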

Picking a Key

Cache Keys Are an API

  • Include everything that affects the value: ID, version, locale, currency, feature flag, schema version.
  • Don't include things that don't (request ID, timestamp) — you'll never get a hit.
  • Namespace by feature: link:code:abc123, not abc123. Future-you will thank you.
  • Embed a global cache version (v3:link:…) so a deploy can invalidate everything by bumping a constant (see the key-builder sketch below).
  • Be conservative with what you cache for logged-in users — per-user keys multiply storage fast.
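
A sketch of a key-builder module: one function per key family, with the global version in a single constant (names are illustrative):

const CACHE_VERSION = 'v3';   // bump on deploy to invalidate everything

function linkKey(code) {
  return `${CACHE_VERSION}:link:code:${code}`;
}

function userProfileKey(id, locale) {
  return `${CACHE_VERSION}:user:${id}:${locale}`;   // locale changes the rendered value
}
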
Observability

What to Measure

  • Hit ratio per key family. A falling ratio is the first sign your data shape changed.
  • Latency p50/p95/p99 for cache reads, source reads, and the combined endpoint. The gap shows what caching buys you.
  • Eviction rate. Sustained evictions mean the cache is too small or the working set too big.
  • Memory usage. Watch for fragmentation in Redis; used_memory vs used_memory_rss.
  • Origin load with vs without cache. Run a controlled drill — disable a small percentage of caching and see what happens.

Worked Example

The URL Shortener Cache

Reads dominate (every redirect is a read), writes are rare, the value is a small immutable string. Textbook cache-aside.

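// Assumes: `redis` is a connected node-redis v4 client, `db` is the data access
// layer, `metrics` emits counters, and jitter() is the TTL helper sketched in
// the avalanche section above.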
async function resolve(code) {
  const key = `link:code:${code}`;
  const cached = await redis.get(key);
  if (cached !== null) {
    metrics.inc('cache.hit', { family: 'link' });
    return cached === 'MISS' ? null : cached;   // negative cache for 404s
  }
  metrics.inc('cache.miss', { family: 'link' });

  const row = await db.findLinkByCode(code);
  const value = row?.target_url ?? 'MISS';
  const ttl  = row ? jitter(60 * 60) : 60;       // 1h ± jitter for hits, 60s for misses
  await redis.set(key, value, { EX: ttl });

  return row?.target_url ?? null;
}

async function deleteLink(code) {
  await db.deleteByCode(code);
  await redis.del(`link:code:${code}`);          // explicit invalidation on write
}

Jittered TTL avoids avalanche; negative caching avoids penetration; explicit delete-on-write keeps the cache honest. Add single-flight if you ever see stampede patterns in your hit-ratio dashboard.

Tradeoffs

When Not to Cache

  • Already fast enough. A cache adds operational complexity. Don't pay it for a 2ms query.
  • Highly personalized data. If every user sees a different value and they only ask once, hit ratio will be near zero.
  • Strong-consistency requirements. Account balances, inventory, auth state — better to make the source fast than to add staleness.
  • Tiny working set. If everything fits in DB memory anyway, the database is your cache.

Add a cache when the data is hot, immutable-ish, and slow to compute. Otherwise, fix the source first.
