Observability & Performance Deep Dive · 6 of 7

Error Tracking — Sentry, Rollbar & Friends

Logs tell you that something happened. Error trackers tell you that the same exception happened 1,432 times across two browser versions, started after release 4.7.1, and affects users in three countries, and they show you the exact stack frame. Sentry pioneered the category in 2008, and most teams now treat error tracking as a separate, mandatory layer alongside logs and metrics.

Sentry · Rollbar · Bugsnag · Source maps · Release tracking · Breadcrumbs
Anatomy

What Makes It More Than "Just Logs"

Basic Concepts

  • Event — one captured exception or message; events are grouped (deduplicated) into issues.
  • Issue — a group of events that share one fingerprint ("the same error"), tracked with a count, first seen, last seen, and affected users.
  • Stack trace — symbolicated, with source code context lines, locals where possible.
  • Breadcrumbs — last N actions (clicks, navigations, network calls) leading up to the error.
  • Release tag — tie errors to the deployment that introduced them.
  • User & session context — who hit it, on what browser, on what device.
  • Source maps — for minified JS, map back to original TS/JSX.
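The relationship between events and issues can be sketched in a few lines. This is a minimal illustration, not any vendor's schema or grouping algorithm: the `Event` and `Issue` shapes, the `fingerprint` function, and the choice to hash only the exception type and frame names are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Event:
    """One captured exception (hypothetical shape, not a real SDK payload)."""
    exc_type: str
    frames: list[str]      # e.g. ["cart.py:add_item", "db.py:save"]
    timestamp: float
    user_id: str
    release: str


@dataclass
class Issue:
    """Aggregate of every event that shares a fingerprint."""
    fingerprint: str
    count: int = 0
    first_seen: float = float("inf")
    last_seen: float = float("-inf")
    users: set[str] = field(default_factory=set)
    releases: set[str] = field(default_factory=set)


def fingerprint(event: Event) -> str:
    """Hash exception type + frames; time, user and release are left out on purpose."""
    raw = "|".join([event.exc_type, *event.frames])
    return hashlib.sha256(raw.encode()).hexdigest()[:12]


def group(events: list[Event]) -> dict[str, Issue]:
    """Fold a stream of events into issues keyed by fingerprint."""
    issues: dict[str, Issue] = {}
    for ev in events:
        fp = fingerprint(ev)
        issue = issues.setdefault(fp, Issue(fingerprint=fp))
        issue.count += 1
        issue.first_seen = min(issue.first_seen, ev.timestamp)
        issue.last_seen = max(issue.last_seen, ev.timestamp)
        issue.users.add(ev.user_id)
        issue.releases.add(ev.release)
    return issues
```

Because the release tag is stored on the issue rather than mixed into the fingerprint, the same error seen in 4.7.0 and 4.7.1 stays one issue, and the earliest release in `releases` points at the deployment that introduced it.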
Players

Side-By-Side

Tool | Strength | Notable
Sentry | Most popular; SDKs for everything; Tracing & Replay add-ons | Source-available self-host option (BSL, later relicensed under FSL); the hosted free tier is limited to 1 user. Session Replay competes with FullStory.
Rollbar | Workflow features & routing | Strong RQL query language; auto-grouping rules.
Bugsnag (SmartBear) | Stability scores tied to releases | Crash-free user % is the headline metric.
Raygun | APM + crash reporting + RUM combined | Smaller player but full stack.
Honeybadger / Airbrake | Lightweight, indie-friendly pricing | Common in Rails / Django shops.
Workflow

From Error to Fix

  1. Exception thrown
  2. SDK captures stack + context
  3. Send to backend
  4. Group into issue
  5. Alert (Slack / email)
  6. Triage / assign / fix
  7. Resolve in next release
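The first five stages of this flow can be sketched with the standard library alone. Everything here is a stand-in chosen for illustration: `SENT` plays the backend, `ALERTS` plays Slack or email, and `capture_exception`, `ingest`, and `add_breadcrumb` are hypothetical helpers, not the API of any real SDK.

```python
import os
import traceback
from collections import deque

BREADCRUMBS = deque(maxlen=5)   # ring buffer: only the last 5 actions survive
SENT = []                       # stand-in for the tracker backend
ISSUES = {}                     # grouping key -> event count
ALERTS = []                     # stand-in for Slack / email


def add_breadcrumb(message):
    BREADCRUMBS.append(message)


def capture_exception(exc, release, user_id):
    """Capture stack + context, then hand the event to the (fake) backend."""
    frames = [
        f"{os.path.basename(f.filename)}:{f.name}"
        for f in traceback.extract_tb(exc.__traceback__)
    ]
    event = {
        "type": type(exc).__name__,
        "message": str(exc),
        "frames": frames,
        "breadcrumbs": list(BREADCRUMBS),   # snapshot at the time of the error
        "release": release,
        "user": user_id,
    }
    SENT.append(event)
    ingest(event)
    return event


def ingest(event):
    """Group by type + frames; alert only the first time an issue appears."""
    key = (event["type"], tuple(event["frames"]))
    ISSUES[key] = ISSUES.get(key, 0) + 1
    if ISSUES[key] == 1:
        ALERTS.append(f"New issue in {event['release']}: {event['type']}")


def checkout(cart):
    add_breadcrumb("click: checkout button")
    return cart["total"]        # raises KeyError for an empty cart


for _ in range(3):
    try:
        checkout({})
    except KeyError as exc:
        capture_exception(exc, release="4.7.1", user_id="u-42")
```

The three identical failures produce three events but a single issue and a single alert, which is the behaviour that separates an error tracker from a log stream. Triage and resolution are human steps and are left out.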
Tradeoffs

What to Watch Out For

  • PII leakage. Stack frames + locals may capture passwords, tokens, full request bodies. Configure scrubbing.
  • Issue volume. A single noisy frontend page can exhaust your event quota. Use sample rates and inbound filters.
  • Bad grouping. The same error can show up as 50 separate "issues" when the message or stack trace contains dynamic parts such as IDs. Use fingerprint rules.
  • Alert routing. A new release that dumps 10k events should produce rate-limited alerts, not page everyone.
  • Source maps in prod. Upload them privately to the tracker; never serve them to users, because they expose your original source.
  • Old issues never get fixed. Auto-resolve on inactivity or use SLAs (e.g., "high-impact issues open > 30d → escalate").
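Several of these mitigations are small enough to sketch. The code below is an illustration under stated assumptions, not a vendor implementation: `SENSITIVE_KEYS` is a hypothetical deny-list, and `scrub`, `should_send`, `normalize_message`, and `AlertThrottle` are made-up names. Real SDKs expose their own hooks for the same jobs (Sentry's Python SDK, for instance, accepts a `before_send` callback and a `sample_rate` option).

```python
import random
import re
import time

SENSITIVE_KEYS = {"password", "token", "authorization", "secret"}  # hypothetical deny-list


def scrub(value):
    """Return a copy with values under sensitive keys replaced; the input is not mutated."""
    if isinstance(value, dict):
        return {
            k: "[Filtered]" if isinstance(k, str) and k.lower() in SENSITIVE_KEYS else scrub(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value


def should_send(sample_rate, rng=random.random):
    """Client-side sampling: keep roughly `sample_rate` of all events."""
    return rng() < sample_rate


_DYNAMIC = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b|\d+",
    re.IGNORECASE,
)


def normalize_message(message):
    """Collapse UUIDs and numbers so 'order 17' and 'order 18' group together."""
    return _DYNAMIC.sub("<id>", message)


class AlertThrottle:
    """Allow at most `limit` alerts per `window` seconds; the rest are counted, not sent."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.window_start = None
        self.sent = 0
        self.suppressed = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start, self.sent = now, 0   # start a fresh window
        if self.sent < self.limit:
            self.sent += 1
            return True
        self.suppressed += 1
        return False
```

Key-based scrubbing only catches fields whose names look sensitive; a password pasted into a free-text field would pass through, so it complements server-side scrubbing rather than replacing it. The throttle uses a fixed window for brevity; a sliding window or token bucket would smooth out bursts at window boundaries.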