Performance & Scalability · Observability & Performance Deep Dive

Scaling

Vertical vs Horizontal

Axis	How	When it fits
Vertical (scale up)	Bigger box — more CPU, RAM, faster disk	Stateful systems (DBs); single-process workloads; quick wins.
Horizontal (scale out)	More boxes behind a load balancer	Stateless services; web tiers; queue workers. Needs idempotency.
Read replicas	Scale reads only; writes to primary	Read-heavy DB workloads; analytics queries.
Sharding	Partition data across nodes by a key	Datasets that outgrow a single primary; high write throughput.
Auto-scaling	Add/remove instances on metrics (CPU, queue depth, RPS)	Spiky workloads; cost-sensitive cloud deployments.
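
The sharding row can be made concrete with a minimal routing sketch. It assumes simple hash-modulo placement, which is only one of several schemes; the function name `shard_for` and the user IDs are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), whose value for
    strings can differ between interpreter runs.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then pick a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical user IDs routed across 4 shards.
for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "-> shard", shard_for(user_id, 4))
```

Modulo placement remaps most keys when the shard count changes, so systems that expect to reshard often prefer consistent hashing or a lookup directory.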

Caching

Closer Is Cheaper

Browser cache → CDN edge → Reverse proxy → App-level cache (Redis / Memcached) → Database cache → Origin DB

Cache-aside (lazy load) — most common; check cache, miss → load DB → fill cache.
Write-through — write hits cache and DB synchronously; consistency at write cost.
Write-behind — write hits cache, async to DB; fast but risk on crash.
TTLs — short for personalized data; long for catalogs; never 0.
Cache invalidation — the hard part. Tag-based or event-driven beats hoping TTL is fast enough.
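
As an illustration of the cache-aside pattern with a TTL, here is a minimal in-process sketch. It assumes a plain dict in place of Redis or Memcached; the `CacheAside` class, the `loader` callback, and the sample data are hypothetical names for this example only.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the source and fill the cache."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader          # stand-in for the database read
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # hit: entry exists and has not expired
        value = self._loader(key)      # miss or expired: go to the source
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Event-driven invalidation: call this when the source row changes."""
        self._store.pop(key, None)


# Hypothetical catalog data standing in for the origin DB.
FAKE_DB = {"sku-1": "Blue mug"}
db_reads = []


def load_from_db(key: str) -> str:
    db_reads.append(key)
    return FAKE_DB[key]


cache = CacheAside(load_from_db, ttl_seconds=300)
cache.get("sku-1")                     # miss, reads the DB
cache.get("sku-1")                     # hit, served from the cache
print("DB reads:", len(db_reads))
```

Calling `invalidate` from the code path that updates the row is the simplest form of the event-driven invalidation mentioned above; the TTL then acts only as a safety net.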

Profiling

Find Hot Spots Before Optimizing

CPU profilers — Linux perf, async-profiler (JVM), py-spy, Go pprof, .NET dotnet-trace.
Memory profilers — heap dumps + analyzers; track allocations not just usage.
SQL profiling — EXPLAIN, slow-query logs, pg_stat_statements.
Continuous profiling — Pyroscope, Parca, Datadog/Granulate run profilers always-on; tie samples to traces.
Flame graphs — vertical = stack depth, horizontal = sample share. Wide bars = where to look.
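
To show what "sample share" means, the sketch below computes each function's inclusive share from folded-stack lines (the `frame;frame;frame count` text format that flame graph tooling consumes). The stack data is invented for illustration, and aggregating by function name is a simplification: a real flame graph draws one bar per stack position.

```python
from collections import Counter
from typing import Dict, Iterable


def frame_shares(folded_lines: Iterable[str]) -> Dict[str, float]:
    """Return, per function, the fraction of samples with that function on the stack."""
    total = 0
    inclusive: Counter = Counter()
    for line in folded_lines:
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        total += count
        # set() so a recursive function is counted once per sample.
        for frame in set(stack.split(";")):
            inclusive[frame] += count
    if total == 0:
        return {}
    return {frame: n / total for frame, n in inclusive.items()}


# Invented samples: 100 in total.
SAMPLES = [
    "main;handle_request;render_template 10",
    "main;handle_request;query_db;parse_rows 60",
    "main;handle_request;query_db 20",
    "main;log_metrics 10",
]

shares = frame_shares(SAMPLES)
for frame, share in sorted(shares.items(), key=lambda item: (-item[1], item[0])):
    print(f"{frame:<16} {share:6.1%}")
```

In this made-up profile `query_db` is on the stack in 80% of samples, so it would be the wide bar worth investigating first.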
"Make it work, make it right, make it fast." Profile first; intuition lies.

Load Testing

Find the Cliff Before Users Do

Tool	Strength
k6	JS scripts, Grafana-friendly, dev-ergonomic.
JMeter	GUI, plugin-rich, mature; XML-heavy.
Gatling	Scala/Java DSL; great reports.
Locust	Python; distributed; readable scenarios.
Artillery / Vegeta / wrk	Lightweight CLIs for quick HTTP load.

Run load tests (sustained), stress tests (find the breakpoint), soak tests (memory leaks under hours of traffic), and spike tests (sudden 10×). Test against prod-like data, not empty DBs.
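
The idea of stepping up load to look for a breakpoint can be sketched with the standard library alone. This is a toy harness, not a substitute for the tools in the table: `run_load` and `fake_endpoint` are hypothetical names, and the "endpoint" is a sleep standing in for an HTTP call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_load(target: Callable[[], object], total_requests: int,
             concurrency: int) -> Tuple[List[float], int]:
    """Call target() total_requests times from a thread pool.

    Returns (sorted latencies in seconds, number of calls that raised).
    """
    def timed_call(_: int) -> Tuple[float, bool]:
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    latencies = sorted(latency for latency, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    return latencies, errors


def fake_endpoint() -> None:
    time.sleep(0.01)  # stand-in for a real request


# Step the concurrency up and watch throughput and latency at each level.
for concurrency in (1, 5, 20):
    started = time.perf_counter()
    latencies, errors = run_load(fake_endpoint, total_requests=40,
                                 concurrency=concurrency)
    elapsed = time.perf_counter() - started
    p50 = latencies[len(latencies) // 2]
    print(f"concurrency={concurrency:>2} rps={40 / elapsed:7.1f} "
          f"p50={p50 * 1000:6.1f} ms max={latencies[-1] * 1000:6.1f} ms "
          f"errors={errors}")
```

Against a real service, the level at which throughput stops rising while latency or errors climb is the cliff the heading refers to.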

Protect the Backend

Rate Limiting, Backpressure, Circuit Breakers

Rate limiting — cap requests per client / IP / API key. Token bucket and leaky bucket are the classics.
Backpressure — when a downstream is slow, slow down the upstream too; bounded queues, not infinite.
Circuit breakers — stop calling a failing dependency for a cooling period; recover progressively.
Bulkheads — partition resources so one bad dependency can't drown the whole process (separate thread pools / clients).
Timeouts everywhere. No timeout = waiting forever on a hung peer.
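Retries with jitter. Clients that retry on the same fixed schedule hit a recovering service in synchronized waves (a thundering herd); randomizing the delay spreads them out.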
Load shedding. Drop low-priority traffic to keep critical paths alive.
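
Of the techniques above, the token bucket is compact enough to sketch in full. The version below is a single-threaded illustration under stated assumptions: the `TokenBucket` and `FakeClock` names are hypothetical, there is no locking, and a production limiter would usually keep one bucket per client, IP, or API key.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock              # injectable so the demo is deterministic
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                     # caller should reject, e.g. HTTP 429


class FakeClock:
    """Manually advanced clock used only for this demonstration."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
bucket = TokenBucket(rate_per_sec=1.0, capacity=3, clock=clock)

burst = [bucket.allow() for _ in range(5)]
print(burst)        # [True, True, True, False, False]

clock.now = 2.0     # two seconds later, two tokens have been refilled
later = [bucket.allow() for _ in range(3)]
print(later)        # [True, True, False]
```

The capacity sets how large a burst is tolerated and the rate sets the sustained limit; tuning them separately is the main reason to prefer a token bucket over a fixed per-second counter.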

Common Pitfalls

Things That Bite

p99, not average. Averages hide the slow tail; p99 is the latency that the slowest 1% of requests meet or exceed, and at high request volume that 1% is a lot of users.
The N+1 query. One outer + N inner DB calls per request — the most common silent killer.
Synchronous fan-out. Calling 10 services one after another costs the sum of all 10 latencies. Issue independent calls in parallel and the cost drops toward that of the slowest call.
Cache stampedes. A hot key expiring → 1000 concurrent reloads. Use single-flight / probabilistic early refresh.
Premature horizontal scaling. A 4× bigger box is often cheaper and simpler than a fleet of small ones.
Optimizing without measurement. "I have a hunch" is how you spend a week speeding up code that runs once a day.
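
A small worked example of the first pitfall, using invented numbers: 98 requests at 20 ms and 2 at 2000 ms. The `percentile` helper is a hypothetical nearest-rank implementation written for this sketch; monitoring systems often interpolate or use histograms instead.

```python
import math
import statistics
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [20] * 98 + [2000] * 2

print(f"mean = {statistics.mean(latencies_ms):.1f} ms")   # mean = 59.6 ms
print(f"p50  = {percentile(latencies_ms, 50)} ms")        # p50  = 20 ms
print(f"p99  = {percentile(latencies_ms, 99)} ms")        # p99  = 2000 ms
```

The mean of 59.6 ms describes no request that actually happened, while the p99 exposes the two-second tail directly.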
