A smoke test is a tiny, fast set of checks that answers one question: did the deploy land in a usable state? It runs after every release — sometimes in CI before promotion, sometimes against production immediately after. It catches the failures nothing else can: env config, DNS, broken third-party integrations, missing migrations, the small things that make a release "rolled out but broken."
The wrong DB URL in prod. A missing env var. An IAM role that lost a permission during the last refactor. None of those show up in unit, integration, or even staging E2E — they're prod-config issues. A smoke test that hits a real endpoint in prod surfaces them in seconds.
Stripe API key rotated and not updated. Auth provider's metadata URL changed. SendGrid TLS certificate expired. Smoke against the real third parties catches them before the support tickets start.
The migration ran in staging but not prod (or ran in prod but failed). The new code expects a column that isn't there. A smoke test that exercises a write path lights this up immediately.
The cert renewal failed. The new Kubernetes ingress rule has a typo. The domain forwards to the old version. Smoke from outside the cluster — using a real public DNS name — catches the routing failures internal tests miss.
Every service exposes /health (liveness — am I running?) and /ready (readiness — can I serve traffic? DB connection pool warm? caches hydrated?). Smoke pings both. Health endpoints should actually do something — many teams' health checks return 200 OK while the DB is offline.
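A minimal sketch of that ping, assuming the `/health` and `/ready` paths above and a hypothetical JSON readiness payload like `{"db": "ok", "cache": "ok"}` (your services may report readiness differently):

```python
# Smoke check for liveness and readiness. base_url and the payload
# shape {"db": "ok", "cache": "ok"} are assumptions for illustration.
import json
import urllib.request


def fetch(url: str, timeout: float = 5.0) -> tuple[int, bytes]:
    """GET a URL and return (status_code, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def evaluate_ready(status: int, body: bytes) -> bool:
    """Require 200 AND every reported check passing, so a health
    endpoint that returns 200 while the DB is down still fails."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return all(v == "ok" for v in payload.values())


def smoke_health(base_url: str) -> None:
    status, _ = fetch(f"{base_url}/health")
    assert status == 200, "liveness check failed"
    status, body = fetch(f"{base_url}/ready")
    assert evaluate_ready(status, body), "readiness check failed"
```

Separating `evaluate_ready` from the HTTP call keeps the pass/fail logic inspectable: it is what enforces "health endpoints should actually do something."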
Sign in as a synthetic user; load the dashboard; create-and-delete a test record; verify the response shape. The 5–10 things a real user does in the first minute of using the product.
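One way to structure that first-minute journey is an ordered list of named steps that fails fast, since later steps depend on earlier ones. The step callables (`sign_in`, `load_dashboard`, and so on) are hypothetical hooks into your own API client:

```python
# Run a critical-path journey as ordered, named steps; stop at the
# first failure and report which step broke.
from typing import Callable


def run_journey(steps: list[tuple[str, Callable[[], None]]]) -> tuple[bool, str]:
    """Return (passed, name_of_failed_step_or_empty_string)."""
    for name, step in steps:
        try:
            step()
        except Exception:
            # Fail fast: creating a record is meaningless if sign-in failed.
            return False, name
    return True, ""


# Hypothetical usage with your own client:
# steps = [
#     ("sign_in", lambda: client.sign_in(SYNTHETIC_USER)),
#     ("load_dashboard", lambda: client.get("/dashboard")),
#     ("create_record", lambda: client.post("/records", TEST_RECORD)),
#     ("delete_record", lambda: client.delete(f"/records/{TEST_RECORD_ID}")),
# ]
```

Naming each step means the failure message says "load_dashboard broke," not just "smoke failed."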
One light call to each major external system: payment gateway, auth provider, email service, search index, queue. Catches "their API key is wrong" without spamming them with traffic.
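A sketch of that fan-out, assuming each dependency gets one cheap, hypothetical probe callable (e.g. a balance read against the payment gateway). Unlike the user journey, these checks are independent, so run them all and report every failure:

```python
# One light call per external dependency; collect all failures
# rather than stopping at the first, since they are independent.
from typing import Callable


def ping_dependencies(checks: dict[str, Callable[[], None]]) -> list[str]:
    """Run each probe once; return the names of dependencies that failed."""
    failed = []
    for name, check in checks.items():
        try:
            check()
        except Exception:
            failed.append(name)
    return failed


# Hypothetical usage; each probe is the lightest read you can make:
# checks = {
#     "stripe": lambda: payments_client.get_balance(),
#     "auth": lambda: auth_client.fetch_metadata(),
#     "sendgrid": lambda: email_client.verify_key(),
# }
```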
A trivial endpoint that returns the deployed git SHA. The smoke test verifies the deployed version matches the one that was supposed to ship — catches "the deploy succeeded but the rollout is stuck on old pods."
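A sketch of that check, assuming a hypothetical `/version` endpoint returning the SHA as plain text, and a pipeline that knows the SHA it intended to ship (e.g. from a `GIT_COMMIT` variable):

```python
# Verify the running build is the one the pipeline shipped.
# The /version endpoint and its plain-text response are assumptions.
import urllib.request


def deployed_sha(base_url: str, timeout: float = 5.0) -> str:
    with urllib.request.urlopen(f"{base_url}/version", timeout=timeout) as resp:
        return resp.read().decode().strip()


def verify_version(deployed: str, expected: str) -> bool:
    """True when the deployed SHA matches the expected one.
    Prefix comparison lets short and full SHAs interoperate."""
    deployed, expected = deployed.strip().lower(), expected.strip().lower()
    if not deployed or not expected:
        return False
    return deployed.startswith(expected) or expected.startswith(deployed)
```

If this returns False, the deploy "succeeded" but old pods are still serving — exactly the stuck-rollout case above.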
Deploy to environment → run smoke → on failure, auto-rollback or block promotion to the next stage. This is what makes "deploy on every merge to main" survivable.
Run the same smoke from outside your infrastructure every minute or two — Pingdom, Datadog Synthetics, New Relic, Checkly, Grafana Synthetic Monitoring. They're smoke tests as ongoing observability. Page an on-call when one fails; you'll know about outages before users do.
Before promoting from staging to prod, run smoke against staging. Cheaper than discovering breakage post-prod-deploy.