Architectural Styles Deep Dive · 5 of 8

Event-Driven Architecture — Tell, Don't Ask

Services emit events when something happens — "OrderPlaced", "PaymentReceived", "UserSignedUp" — onto a shared bus. Other services subscribe to what they care about and react. Producers don't know who consumes; consumers don't know who produced. The result is loosely coupled systems that fan out naturally and leave a perfect audit trail.

Pub/Sub · Async · Loose Coupling · Eventual Consistency
Quick Facts

What Event-Driven Means

Basic Concepts

  • Event: a fact about something that happened in the past. Past tense, immutable, named for the business meaning ("OrderPlaced", not "InsertOrderRow").
  • Producer: the service that publishes the event. Doesn't know or care who's listening.
  • Consumer: a service that subscribes to a topic and reacts. Each consumer keeps its own offset.
  • Broker: the middleware that routes events. Kafka, RabbitMQ, AWS SNS/SQS, Google Pub/Sub, Azure Service Bus, NATS.
  • Topic / Stream: a named channel events flow through. Often partitioned for parallelism.
  • Event ≠ Command. "OrderPlaced" announces a fact (event); "PlaceOrder" requests an action (command). Confusing them tangles ownership.
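
The vocabulary above can be sketched with a toy in-memory bus. `EventBus` and the `"orders"` topic are invented names for illustration, not any real broker's API; a real system puts a broker like Kafka or RabbitMQ between the two sides.

```python
from collections import defaultdict

class EventBus:
    """Toy broker: topics map to subscriber callbacks."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Producer fires and forgets; it never learns who handled the event.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
received = []

# Two independent consumers react to the same fact.
bus.subscribe("orders", lambda e: received.append(("email", e["id"])))
bus.subscribe("orders", lambda e: received.append(("analytics", e["id"])))

# The producer publishes a past-tense fact ("OrderPlaced"), not a command.
bus.publish("orders", {"type": "OrderPlaced", "id": 42})
```

Note the asymmetry: adding a third consumer is one `subscribe` call; the publish line never changes.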
Why It Wins

What You Buy

Loose Coupling

The order service doesn't know that emails get sent, analytics gets updated, the warehouse picks the package, and the loyalty system credits points. It just publishes "OrderPlaced". Add a new consumer (fraud detection, recommendation training) without touching the producer.

Natural Fan-Out

One event, ten consumers. Each one gets its own copy and processes at its own pace. Slow consumers don't slow down fast ones; they just lag behind on the topic.

Audit & Replay

The event log is a durable history. New consumer? Replay the topic from the start to backfill. Bug fix? Replay since the bad commit to repair downstream state. With Kafka, retention can be effectively infinite.
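Replay can be sketched with a plain list standing in for the retained topic; the event shapes and `build_read_model` are illustrative:

```python
# A durable log: every event ever published, in order.
log = [
    {"type": "OrderPlaced", "order_id": 1, "total": 30},
    {"type": "OrderPlaced", "order_id": 2, "total": 70},
    {"type": "OrderCancelled", "order_id": 1},
]

def build_read_model(events):
    """Fold over the whole history from offset 0 to rebuild state."""
    open_orders = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            open_orders[e["order_id"]] = e["total"]
        elif e["type"] == "OrderCancelled":
            open_orders.pop(e["order_id"], None)
    return open_orders

# A consumer deployed today reconstructs state by replaying from the start.
state = build_read_model(log)  # {2: 70}
```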

Resilience to Downstream Outages

If the email service is down for an hour, "OrderPlaced" events queue up on its topic. When it comes back, it catches up. The order service never knew there was a problem.

The Tax

What You Pay For

Eventual Consistency

"Place an order, then immediately query the order list" may not show the new order if the read model lags the event. Designs that assume read-after-write consistency break here. Mitigations: optimistic UI updates, "your order is processing" states, or a read model with monotonic-read guarantees (a reader never sees state older than what it has already seen).
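
The lag is easy to demonstrate with a queue standing in for the topic and a list for the read model (all names invented for the sketch):

```python
from collections import deque

pending = deque()   # events published but not yet projected
read_model = []     # what "query the order list" returns

def place_order(order_id):
    pending.append({"type": "OrderPlaced", "id": order_id})  # write side commits

def run_projector():
    while pending:
        read_model.append(pending.popleft()["id"])  # read side catches up

place_order(7)
stale = list(read_model)   # [] : the order exists but isn't visible yet
run_projector()
fresh = list(read_model)   # [7] : visible once the projector has run
```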

Ordering Is Hard

Across a partitioned topic, global order is impossible. You get order within a partition. Partition by aggregate key (e.g., order ID) so all events for one entity stay in order — but events across different aggregates can interleave.
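
A sketch of key-based partitioning. The partition count and helper name are invented; the point is the stable hash (not Python's per-process `hash()`), so one aggregate's events always land on one partition:

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable across processes and restarts, unlike the builtin hash().
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every event keyed "order-17" maps to the same partition, so those
# events stay ordered relative to each other. Events for "order-99"
# may land on a different partition and interleave freely.
p = partition_for("order-17")   # always the same value for this key
```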

At-Least-Once Delivery

Most brokers guarantee "at least once", which means consumers will see duplicates. Make consumers idempotent — store the event ID, skip if already processed. "Exactly once" exists in name (Kafka transactions) but is brittle and rare in practice.
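
An idempotent consumer in miniature, assuming every event carries a unique `event_id`. In production the seen-ID set lives in a durable store, ideally updated in the same transaction as the side effect:

```python
processed_ids = set()   # durable in real life; in-memory for the sketch
credited = 0            # loyalty points granted so far

def handle(event):
    global credited
    if event["event_id"] in processed_ids:
        return  # duplicate delivery, safely ignored
    processed_ids.add(event["event_id"])
    credited += event["points"]

evt = {"event_id": "evt-123", "type": "OrderPlaced", "points": 10}
handle(evt)
handle(evt)  # broker redelivers the same event
# credited is 10, not 20
```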

Schema Evolution

An event lives in the topic for years; consumers come and go. Yesterday's events must keep deserializing. Use a schema registry (Confluent, Apicurio) and enforce backward-compatible changes only — add optional fields, never rename or remove without versioning.
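
One way to stay backward compatible, shown with plain JSON rather than a registry: new fields are optional and defaulted, so events written under the old schema still parse. The `currency` field and its default are invented for the sketch:

```python
import json

DEFAULTS = {"currency": "USD"}  # field added in schema v2, optional

def deserialize(raw: str) -> dict:
    event = json.loads(raw)
    return {**DEFAULTS, **event}  # missing fields fall back to defaults

old = '{"type": "OrderPlaced", "order_id": 1, "total": 30}'  # schema v1
new = '{"type": "OrderPlaced", "order_id": 2, "total": 70, "currency": "EUR"}'

v1 = deserialize(old)["currency"]  # "USD": yesterday's event still works
v2 = deserialize(new)["currency"]  # "EUR"
```

A registry adds the enforcement half: it rejects a producer build whose schema would break this contract.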

End-to-End Visibility

"What happens after OrderPlaced?" requires grepping every service to see who subscribes. Distributed tracing across event hops is harder than across HTTP — propagate trace IDs in event headers and instrument both sides.
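
A sketch of trace propagation through headers. The `trace_id` header name is invented (real systems often carry the W3C `traceparent` header), but the mechanic is the same: consumers copy the incoming ID onto everything they emit.

```python
import uuid

def publish(event, headers=None):
    headers = dict(headers or {})
    headers.setdefault("trace_id", str(uuid.uuid4()))  # start a trace if none
    return {"headers": headers, "payload": event}

def handle_and_republish(message, new_event):
    # The consumer forwards the trace_id it received.
    return publish(new_event, {"trace_id": message["headers"]["trace_id"]})

order_placed = publish({"type": "OrderPlaced"})
email_sent = handle_and_republish(order_placed, {"type": "EmailSent"})
# Both hops now share one trace_id and can be stitched into a single trace.
```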

The Outbox Problem

If a service writes to its DB and then publishes an event, what if the publish fails after the DB commits? Inconsistency. The outbox pattern writes the event to an outbox table in the same DB transaction; a separate process reads the outbox and publishes. Solves the dual-write problem.
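
The pattern in miniature, with SQLite standing in for the service's database and a callback standing in for the broker client (table and function names invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
db.execute("""CREATE TABLE outbox
              (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT,
               published INTEGER DEFAULT 0)""")

def place_order(order_id, total):
    with db:  # ONE transaction covers the business row and the event
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("orders", f'{{"type":"OrderPlaced","id":{order_id}}}'))

def relay(publish):
    # A separate process polls unpublished rows and pushes them out.
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

sent = []
place_order(1, 49.99)
relay(lambda topic, payload: sent.append((topic, payload)))
```

If the process dies between commit and publish, the event is still in the outbox; the relay picks it up on the next poll (at least once, so consumers must be idempotent).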

Coordination

Choreography vs Orchestration

Choreography — No Conductor

Each service reacts to events independently. No central controller. Simple to build, hard to see — "what's the full checkout flow?" requires walking every service. Best for small flows with 2–3 steps.

Orchestration — A Workflow Drives It

A workflow engine (Temporal, AWS Step Functions, Camunda, Netflix Conductor) drives the saga step by step. Visible, testable, with built-in retries and compensation. The right call for long-running multi-step flows like onboarding, refunds, KYC. Slightly more central coupling, much more debuggable.

Sagas — Distributed Transactions

You can't BEGIN/COMMIT across services. A saga is a sequence of local transactions, each with a compensating action. If step 4 fails, fire compensations for steps 1–3. Implement choreographed (events) or orchestrated (workflow engine).
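
A saga skeleton: each step pairs a local action with its compensation, and a failure triggers the compensations newest-first. Step names are invented:

```python
def run_saga(steps):
    done = []  # compensations for steps that committed
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):  # unwind in reverse order
                undo()
            return "rolled back"
        done.append(compensate)
    return "committed"

trace = []

def fail_shipping():
    raise RuntimeError("carrier API down")  # step 3 fails

steps = [
    (lambda: trace.append("stock reserved"), lambda: trace.append("stock released")),
    (lambda: trace.append("card charged"),   lambda: trace.append("card refunded")),
    (fail_shipping,                          lambda: None),
]

result = run_saga(steps)
# trace: stock reserved, card charged, card refunded, stock released
```

A workflow engine does the same thing with durability: if the process crashes mid-saga, it resumes or compensates from persisted state.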

Brokers

Pick the Right Pipe

| Broker | Model | Best For |
| --- | --- | --- |
| Apache Kafka | Append-only log, partitioned, persistent | High-throughput event streaming, replay, analytics pipelines. |
| RabbitMQ | Queue + exchange routing | Task queues, request/reply, complex routing. |
| AWS SNS + SQS | Pub/sub fan-out + queues | Decoupling Lambdas and microservices on AWS. |
| Google Pub/Sub | Managed pub/sub | Cloud-native eventing on GCP. |
| NATS / JetStream | Lightweight messaging + persistent streams | Edge, IoT, microservices needing low overhead. |
| Redis Streams | In-memory log | Small-scale eventing alongside an existing Redis. |
Decision

When to Pick Event-Driven

Reach for events when:

  • The producer genuinely doesn't need to know who consumes — fan-out is real, not speculative.
  • Workflows are async, multi-step, and benefit from being decoupled in time.
  • You need a durable audit log of what happened, in order.
  • Downstream services should keep working when one is briefly down.

Don't reach for events when you actually need a synchronous answer ("did the payment go through?"), when consistency must be immediate, or when the team can't yet operate a broker reliably. Sync RPC over HTTP/gRPC is fine — sometimes ideal — for request/response.
