A smoke test is a tiny, fast set of checks that answers one question: did the deploy land in a usable state? It runs after every release — sometimes in CI before promotion, sometimes against production immediately after. It catches the failures nothing else can: env config, DNS, broken third-party integrations, missing migrations, the small things that make a release "rolled out but broken."
The wrong DB URL in prod. A missing env var. An IAM role that lost a permission during the last refactor. None of those show up in unit, integration, or even staging E2E — they're prod-config issues. A smoke test that hits a real endpoint in prod surfaces them in seconds.
Stripe API key rotated and not updated. Auth provider's metadata URL changed. SendGrid TLS certificate expired. Smoke against the real third parties catches them before the support tickets start.
The migration ran in staging but not prod (or ran in prod but failed). The new code expects a column that isn't there. A smoke test that exercises a write path lights this up immediately.
The cert renewal failed. The new Kubernetes ingress rule has a typo. The domain forwards to the old version. Smoke from outside the cluster — using a real public DNS name — catches the routing failures internal tests miss.
Every service exposes /health (liveness — am I running?) and /ready (readiness — can I serve traffic? DB connection pool warm? caches hydrated?). Smoke pings both. Health endpoints should actually do something — many teams' health checks return 200 OK while the DB is offline.
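A minimal sketch of that ping, assuming the `/health` and `/ready` paths above and a hypothetical JSON readiness payload like `{"db": "ok", "cache": "ok"}` (your services may report readiness differently):

```python
# Smoke check for liveness and readiness. base_url and the payload
# shape {"db": "ok", "cache": "ok"} are assumptions for illustration.
import json
import urllib.request


def fetch(url: str, timeout: float = 5.0) -> tuple[int, bytes]:
    """GET a URL and return (status_code, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def evaluate_ready(status: int, body: bytes) -> bool:
    """Require 200 AND every reported check passing, so a health
    endpoint that returns 200 while the DB is down still fails."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return all(v == "ok" for v in payload.values())


def smoke_health(base_url: str) -> None:
    status, _ = fetch(f"{base_url}/health")
    assert status == 200, "liveness check failed"
    status, body = fetch(f"{base_url}/ready")
    assert evaluate_ready(status, body), "readiness check failed"
```

Separating `evaluate_ready` from the HTTP call keeps the pass/fail logic inspectable: it is what enforces "health endpoints should actually do something."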
Sign in as a synthetic user; load the dashboard; create-and-delete a test record; verify the response shape. The 5–10 things a real user does in the first minute of using the product.
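One way to structure that first-minute journey is an ordered list of named steps that fails fast, since later steps depend on earlier ones. The step callables (`sign_in`, `load_dashboard`, and so on) are hypothetical hooks into your own API client:

```python
# Run a critical-path journey as ordered, named steps; stop at the
# first failure and report which step broke.
from typing import Callable


def run_journey(steps: list[tuple[str, Callable[[], None]]]) -> tuple[bool, str]:
    """Return (passed, name_of_failed_step_or_empty_string)."""
    for name, step in steps:
        try:
            step()
        except Exception:
            # Fail fast: creating a record is meaningless if sign-in failed.
            return False, name
    return True, ""


# Hypothetical usage with your own client:
# steps = [
#     ("sign_in", lambda: client.sign_in(SYNTHETIC_USER)),
#     ("load_dashboard", lambda: client.get("/dashboard")),
#     ("create_record", lambda: client.post("/records", TEST_RECORD)),
#     ("delete_record", lambda: client.delete(f"/records/{TEST_RECORD_ID}")),
# ]
```

Naming each step means the failure message says "load_dashboard broke," not just "smoke failed."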
One light call to each major external system: payment gateway, auth provider, email service, search index, queue. Catches "their API key is wrong" without spamming them with traffic.
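A sketch of that fan-out, assuming each dependency gets one cheap, hypothetical probe callable (e.g. a balance read against the payment gateway). Unlike the user journey, these checks are independent, so run them all and report every failure:

```python
# One light call per external dependency; collect all failures
# rather than stopping at the first, since they are independent.
from typing import Callable


def ping_dependencies(checks: dict[str, Callable[[], None]]) -> list[str]:
    """Run each probe once; return the names of dependencies that failed."""
    failed = []
    for name, check in checks.items():
        try:
            check()
        except Exception:
            failed.append(name)
    return failed


# Hypothetical usage; each probe is the lightest read you can make:
# checks = {
#     "stripe": lambda: payments_client.get_balance(),
#     "auth": lambda: auth_client.fetch_metadata(),
#     "sendgrid": lambda: email_client.verify_key(),
# }
```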
A trivial endpoint that returns the deployed git SHA. The smoke test verifies the deployed version matches the one that was supposed to ship — catches "the deploy succeeded but the rollout is stuck on old pods."
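A sketch of that check, assuming a hypothetical `/version` endpoint returning the SHA as plain text, and a pipeline that knows the SHA it intended to ship (e.g. from a `GIT_COMMIT` variable):

```python
# Verify the running build is the one the pipeline shipped.
# The /version endpoint and its plain-text response are assumptions.
import urllib.request


def deployed_sha(base_url: str, timeout: float = 5.0) -> str:
    with urllib.request.urlopen(f"{base_url}/version", timeout=timeout) as resp:
        return resp.read().decode().strip()


def verify_version(deployed: str, expected: str) -> bool:
    """True when the deployed SHA matches the expected one.
    Prefix comparison lets short and full SHAs interoperate."""
    deployed, expected = deployed.strip().lower(), expected.strip().lower()
    if not deployed or not expected:
        return False
    return deployed.startswith(expected) or expected.startswith(deployed)
```

If this returns False, the deploy "succeeded" but old pods are still serving — exactly the stuck-rollout case above.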
Deploy to environment → run smoke → on failure, auto-rollback or block promotion to the next stage. This is what makes "deploy on every merge to main" survivable.
Run the same smoke from outside your infrastructure every minute or two — Pingdom, Datadog Synthetics, New Relic, Checkly, Grafana Synthetic Monitoring. They're smoke tests as ongoing observability. Page an on-call when one fails; you'll know about outages before users do.
Before promoting from staging to prod, run smoke against staging. Cheaper than discovering breakage post-prod-deploy.