Observability & Performance Deep Dive · 4 of 7

APM Platforms — The Commercial All-In-Ones

Running Prometheus + Loki + Tempo + Alertmanager + Grafana plus an on-call rotation isn't free: somebody has to own it. Commercial platforms collapse the three pillars (logs, metrics, traces) and a few extras (RUM, synthetics, SIEM) into one product with one bill. The bill is famously eye-watering, but for many teams it's still cheaper than hiring a platform engineer.

Side-By-Side

Pick by Strength

Platform | Strongest at | Notable
Datadog | Breadth: logs, metrics, APM, RUM, security, CI visibility | 700+ integrations; the obvious pick if you want one bill. Pricing complexity is itself a feature.
New Relic | Per-user pricing model; full-stack visibility | Repriced in 2020 to ingest + per-user pricing; generous free tier.
Honeycomb | High-cardinality querying; SLO & BubbleUp workflows | Built around traces; Charity Majors's "events, not metrics" philosophy.
Splunk | Logs at extreme scale; enterprise + security (SIEM) | Acquired by Cisco 2024. Famously expensive but extremely powerful search.
Dynatrace | Auto-instrumentation; AI root-cause | OneAgent installs and discovers everything; opinionated & enterprise-priced.
Sumo Logic / Elastic Cloud / Logz.io | Hosted log search | Often paired with self-hosted metrics.
What You Get

Beyond the Three Pillars

  • RUM (Real User Monitoring) — what real browsers/apps experience: TTFB, LCP, JS errors.
  • Synthetics — scripted probes from many regions; "is the login page up from Tokyo?"
  • Profiling — continuous CPU/memory profiling tied to traces.
  • Service Map — auto-discovered topology of who calls whom.
  • SLO management — error budgets, burn-rate alerts, multi-window calculations.
  • Notebooks — investigations as shareable docs with embedded queries.
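
To make the SLO bullet concrete, here is a minimal Python sketch of a multi-window burn-rate check. It is an illustration of the general idea, not any vendor's implementation: the `Window` shape, the function names, and the default threshold of 14.4 (the commonly cited value for a 1-hour/5-minute window pair against a 30-day budget) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts over one lookback window (hypothetical shape)."""
    total: int
    errors: int


def burn_rate(window: Window, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows.

    1.0 means the error budget is being spent exactly on schedule;
    higher values mean it will run out early.
    """
    if window.total == 0:
        return 0.0  # no traffic, nothing burned
    error_budget = 1.0 - slo_target  # e.g. about 0.001 for a 99.9% SLO
    return (window.errors / window.total) / error_budget


def should_page(long_w: Window, short_w: Window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when BOTH windows burn faster than the threshold.

    The long window shows the problem is significant; the short window
    shows it is still happening, so the alert clears soon after recovery.
    The default threshold is an assumption for this sketch.
    """
    return (burn_rate(long_w, slo_target) >= threshold
            and burn_rate(short_w, slo_target) >= threshold)
```

With a 99.9% target, a window of 100,000 requests and 2,000 errors has a 2% error rate against a 0.1% budget, so its burn rate is roughly 20. If the short window has since dropped back under the threshold, `should_page` returns `False` and nobody gets woken up for an incident that is already over.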
Tradeoffs

What to Watch Out For

  • Pricing is ingest-heavy. A chatty service can 10× your bill overnight. Sample, drop, and watch budgets.
  • Vendor lock-in. Vendor-proprietary agents and query languages mean migrations are painful. OTel reduces this — emit OTLP, route anywhere.
  • "Custom metrics" tax. Most vendors charge per unique metric/series; cardinality control still matters.
  • Outage blast radius. If your monitoring vendor goes down during your incident, you're flying blind.
  • Data residency & compliance. Logs may contain PII; pick a region/setup that matches your regulator.
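
Two of the cost levers above (cardinality and sampling) come down to simple arithmetic, sketched below in Python. Both helpers are hypothetical and written for illustration: `series_count` gives the worst-case number of time series one metric can produce, and `keep_trace` shows one common way to sample deterministically by hashing the trace ID. Real agents and collectors ship their own samplers, so treat this as a picture of the idea rather than a drop-in replacement.

```python
import hashlib
from math import prod
from typing import Mapping


def series_count(distinct_values_per_label: Mapping[str, int]) -> int:
    """Worst-case series for one metric: the product of the number of
    distinct values each label can take."""
    return prod(distinct_values_per_label.values())


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic sampling decision for a trace.

    Hashes the trace ID into [0, 1) and keeps the trace if it falls
    below sample_rate. The same trace ID always gets the same answer,
    so every service that sees the trace agrees without coordinating.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < sample_rate


# Illustrative numbers only: one metric with three modest labels.
labels = {"endpoint": 50, "status": 5, "region": 4}
baseline = series_count(labels)                           # 50 * 5 * 4
with_user = series_count({**labels, "user_id": 10_000})   # one bad label
```

Here `baseline` is 1,000 series, while adding a single `user_id` label with 10,000 distinct values pushes the worst case to 10,000,000. On the sampling side, `keep_trace(trace_id, 0.1)` keeps roughly one trace in ten, cutting trace ingest by about 90% while keeping each sampled trace complete.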