A microservices architecture splits the system into many small services, each owned by one team, each with its own database, each deployed independently. It buys you team autonomy, polyglot stacks, and targeted scaling — at the cost of every problem in distributed systems landing in your lap.
20 teams shipping 20× a day without stepping on each other. The signal that justifies microservices isn't "we want clean code" — it's "we can't ship fast enough together."
Scale the search service to 100 replicas because it's CPU-heavy; keep the user-profile service at 3 because it's barely used. With a monolith you'd scale everything to the bottleneck.
Each team owns a service end to end — code, deploy, on-call. Decisions stay local; the team picks the right tool for its workload (Go for the high-throughput pricing engine, Python for the ML scoring service).
If the recommendations service is down, the homepage can degrade gracefully; the whole product doesn't go down with it. But that isolation has to be engineered: without circuit breakers and timeouts, one slow dependency cascades into a full outage.
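A minimal sketch of that degradation in Go, assuming a hypothetical recommendations endpoint: the homepage call gets a short timeout, and any failure renders the page with an empty panel rather than a 500.

```go
package main

import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

// Hypothetical internal endpoint for the recommendations service.
var recsURL = "http://recommendations.internal/v1/for-user"

// fetchRecommendations returns nil on timeout or error: the caller
// treats nil as "no panel", not as a failed page.
func fetchRecommendations(ctx context.Context, userID string) []string {
	ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
	defer cancel()

	req, _ := http.NewRequestWithContext(ctx, http.MethodGet, recsURL+"?user="+userID, nil)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil // degrade: empty panel, not a 500
	}
	defer resp.Body.Close()

	var recs []string
	if json.NewDecoder(resp.Body).Decode(&recs) != nil {
		return nil
	}
	return recs
}

func homepage(w http.ResponseWriter, r *http.Request) {
	recs := fetchRecommendations(r.Context(), "u-123") // hypothetical user ID
	// Render the page either way; recs may be empty.
	json.NewEncoder(w).Encode(map[string]any{"recommendations": recs})
}

func main() {
	http.HandleFunc("/", homepage)
	http.ListenAndServe(":8080", nil)
}
```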
Networks fail, packets reorder, machines crash mid-call, latency varies by 100×. Every cross-service call needs timeouts, retries (with backoff and jitter), idempotency keys, and circuit breakers. None of this is needed for an in-process function call in a monolith.
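A sketch of that machinery in Go: per-attempt timeouts, exponential backoff with full jitter, and an Idempotency-Key header so repeated requests are safe, assuming the server deduplicates on it. The service URL and key are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// Per-attempt timeout lives on the client so the response body
// stays readable after the call returns.
var client = &http.Client{Timeout: 2 * time.Second}

// callWithRetry retries transient failures (network errors, 5xx)
// with exponential backoff and full jitter.
func callWithRetry(ctx context.Context, url, idempotencyKey string) (*http.Response, error) {
	backoff := 100 * time.Millisecond
	for attempt := 0; attempt < 4; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Idempotency-Key", idempotencyKey)

		resp, err := client.Do(req)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success, or a client error retries won't fix
		}
		if resp != nil {
			resp.Body.Close()
		}

		// Full jitter: sleep a random duration in [0, backoff), then double.
		select {
		case <-time.After(time.Duration(rand.Int63n(int64(backoff)))):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
		backoff *= 2
	}
	return nil, errors.New("all attempts failed")
}

func main() {
	resp, err := callWithRetry(context.Background(), "http://pricing.internal/v1/quote", "order-42")
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

The jitter matters: without it, many clients retry in lockstep and hammer a recovering service in synchronized waves.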
You can't run an ACID transaction across services. To "place an order, charge a card, reserve inventory" you write a saga — a sequence of local transactions with compensating actions if any step fails. Significantly more complex than BEGIN ... COMMIT.
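A minimal saga runner in Go, assuming each step wraps a real service call: execute steps in order, and if one fails, run the compensations of everything already committed, in reverse.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// A saga step pairs a local transaction with a compensating action
// that undoes it if a later step fails.
type step struct {
	name       string
	execute    func(ctx context.Context) error
	compensate func(ctx context.Context) error
}

// runSaga executes steps in order. On failure it compensates completed
// steps in reverse, then reports the original error.
func runSaga(ctx context.Context, steps []step) error {
	var done []step
	for _, s := range steps {
		if err := s.execute(ctx); err != nil {
			for i := len(done) - 1; i >= 0; i-- {
				// In production, compensation failures need their own
				// retries and alerting; here we only log them.
				if cerr := done[i].compensate(ctx); cerr != nil {
					fmt.Printf("compensation %s failed: %v\n", done[i].name, cerr)
				}
			}
			return fmt.Errorf("saga failed at %s: %w", s.name, err)
		}
		done = append(done, s)
	}
	return nil
}

func main() {
	// Hypothetical order flow: each execute/compensate would call a service.
	ok := func(context.Context) error { return nil }
	steps := []step{
		{"place-order", ok, ok},
		{"charge-card", ok, ok},
		{"reserve-inventory", func(context.Context) error { return errors.New("out of stock") }, ok},
	}
	fmt.Println(runSaga(context.Background(), steps))
}
```

Note that compensations are themselves service calls that can fail; workflow engines such as Temporal or AWS Step Functions exist largely to make that reliable.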
Each service brings: a CI pipeline, a Helm chart, dashboards, alerts, runbooks, an on-call rotation, log aggregation, distributed tracing, a service-mesh sidecar, secret rotation, and certificate management. Multiplied by 50 services. You need a platform team — sometimes a platform org.
"Why did this checkout fail?" used to be one stack trace. Now it's: distributed trace across 7 services, correlated logs, possibly a Kafka offset to inspect. Without OpenTelemetry, Jaeger, or similar, you're guessing.
Microservices that must deploy together (because their schemas are coupled) get all of the operational tax with none of the autonomy. Common signs: changing service A breaks service B; releases require a "release train"; nobody can deploy alone. This is the distributed monolith, microservices done wrong.
Services find each other via DNS (Kubernetes service objects), a registry (Consul, Eureka), or a service mesh (Istio, Linkerd). The mesh adds mTLS, traffic shaping, retries, and observability without changing application code.
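In the Kubernetes DNS case, discovery is invisible to the application code; a sketch with hypothetical service and namespace names:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Kubernetes DNS: <service>.<namespace>.svc.cluster.local resolves to
	// the Service's ClusterIP, so discovery is just an HTTP call to a
	// stable name (here, a hypothetical "pricing" service in "shop").
	resp, err := http.Get("http://pricing.shop.svc.cluster.local/v1/quote")
	if err != nil {
		fmt.Println("lookup or call failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```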
A single entry point for clients (web, mobile). Handles auth, rate limiting, request shaping. The Backend-for-Frontend pattern goes further — one gateway per client type, each composing the underlying services into the shape that client needs.
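A toy gateway in Go's standard library, with hypothetical internal hostnames: terminate auth in one place, then reverse-proxy by path prefix. A real BFF would also reshape and compose responses per client.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo forwards requests unchanged to one internal service.
func proxyTo(rawURL string) http.Handler {
	target, _ := url.Parse(rawURL) // hardcoded URLs; error elided
	return httputil.NewSingleHostReverseProxy(target)
}

// withAuth is a stand-in for real token validation at the edge.
func withAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/api/orders/", proxyTo("http://orders.internal:8080"))
	mux.Handle("/api/search/", proxyTo("http://search.internal:8080"))
	http.ListenAndServe(":8080", withAuth(mux))
}
```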
For workflows where the producer doesn't need a synchronous reply, prefer events on a broker. Looser coupling, natural fan-out, easy audit. Sync RPC is better for "give me data right now to render this page."
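A sketch of the publishing side using the third-party segmentio/kafka-go client; broker address, topic name, and payload shape are all assumptions. Inventory, email, and analytics can each subscribe without the producer knowing they exist.

```go
package main

import (
	"context"
	"encoding/json"

	"github.com/segmentio/kafka-go"
)

// publishOrderPlaced emits an event instead of calling consumers directly.
func publishOrderPlaced(ctx context.Context, orderID string) error {
	w := &kafka.Writer{
		Addr:  kafka.TCP("kafka.internal:9092"),
		Topic: "orders.placed",
	}
	defer w.Close()

	payload, _ := json.Marshal(map[string]string{"order_id": orderID})
	return w.WriteMessages(ctx, kafka.Message{
		Key:   []byte(orderID), // same order always lands on the same partition
		Value: payload,
	})
}

func main() {
	_ = publishOrderPlaced(context.Background(), "order-42")
}
```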
Versioned APIs (URL or header), schema registry for events (Avro, Protobuf), backward-compatible changes only, contract tests in CI (Pact). Without this, schema drift turns "independent deploys" into "two-week coordination meetings."
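URL versioning in miniature, with hypothetical payload shapes: v1 keeps working untouched while v2 evolves, and within a version changes stay additive.

```go
package main

import (
	"encoding/json"
	"net/http"
)

type userV1 struct {
	Name string `json:"name"`
}

type userV2 struct {
	Name  string `json:"name"`
	Email string `json:"email,omitempty"` // additive field: old clients ignore it
}

func main() {
	mux := http.NewServeMux()
	// v1 clients keep getting exactly the shape they were built against.
	mux.HandleFunc("/v1/users/42", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(userV1{Name: "Ada"})
	})
	// v2 adds fields; it never renames or removes them within the version.
	mux.HandleFunc("/v2/users/42", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(userV2{Name: "Ada", Email: "ada@example.com"})
	})
	http.ListenAndServe(":8080", mux)
}
```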
Pick microservices when at least these are true: several teams are stepping on each other in one codebase and one deploy queue; components have genuinely different scaling or technology needs; and you can staff the platform work the per-service operational tax demands.
Picking microservices on day one of a startup is the most common over-engineering mistake in the field. Start with a modular monolith; extract services when the pain is concrete, not theoretical.
"You should build a new system with a monolith first, even if you think your system would benefit from microservices later." — Martin Fowler, "MonolithFirst" (2015).