Regression testing is the practice — not so much a separate kind of test as a discipline — of verifying that previously working behavior still works. Every bug fix becomes a permanent test. Every refactor is checked against the suite. Snapshot tests, visual regression tests, and golden-file tests are all tools in this category. The goal is simple: don't ship the same bug twice.
The loop: reproduce the bug as a failing test, in the lowest tier where you can. Make the test pass with a code change. Merge both — the test stays in the suite forever, ensuring the bug can't return.
Why it works: a bug that escaped to production was, by definition, in a code path no test covered. Adding the test covers that path. The next refactor that would have broken it again now fails in CI instead of in production.
Name the test or comment it with the bug ID — // Regression for PROD-1234: total miscalculated when discount is exactly 100%. Six months later, when someone wonders why a particular edge case is asserted, the comment answers them in one line.
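For instance, in Jest-style TypeScript — a minimal sketch, where `applyDiscount` and its module path are hypothetical stand-ins:

```ts
import { test, expect } from "@jest/globals";
import { applyDiscount } from "../src/pricing"; // hypothetical module

// Regression for PROD-1234: total miscalculated when discount is exactly 100%.
test("PROD-1234: a 100% discount brings the total to exactly zero", () => {
  expect(applyDiscount(49.99, 100)).toBe(0);
});
```

Putting the bug ID in the test name also carries it into CI output, so a future failure points straight back at the original incident.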
A bug found in production surfaced, by definition, as an end-to-end failure. But the fix and the test should usually live at the lowest tier that can reproduce it — the unit or integration level — not as another slow E2E test. Otherwise the suite ages into the "ice cream cone" anti-pattern.
Snapshot testing: the first run records the output; subsequent runs compare. A diff fails the test until someone reviews and updates the snapshot. Jest popularized this for React component output, and the pattern is now supported across many ecosystems.
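A minimal sketch with Jest, assuming a hypothetical `renderInvoice` that produces output too complex to assert by hand:

```ts
import { test, expect } from "@jest/globals";
import { renderInvoice } from "../src/invoice"; // hypothetical

test("invoice rendering is stable", () => {
  const output = renderInvoice({ customer: "Acme", items: 3 });
  // First run: Jest records the output under __snapshots__/.
  // Later runs: any diff fails the test until the snapshot is reviewed and updated.
  expect(output).toMatchSnapshot();
});
```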
The trap: "the test fails, run --update" becomes muscle memory. Snapshots get blindly accepted, and the regression-protection value approaches zero. Use them sparingly — for output that's complex, hard to assert by hand, and rarely changes — and review snapshot diffs in PRs as carefully as code.
Approval (golden-file) tests: same idea, more deliberate. The "approved" output lives in a separate file checked into source control. Tools: ApprovalTests (most languages), insta (Rust). Used for compiler output, generated reports, query results: anywhere you have a complex correct answer that's expensive to assert by hand.
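The core mechanic is small enough to hand-roll. A sketch of the idea in plain Jest and Node, with `generateReport` and the paths as hypothetical stand-ins; real tools add diff viewers and an explicit approval step on top:

```ts
import { test, expect } from "@jest/globals";
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { generateReport } from "../src/report"; // hypothetical

test("quarterly report matches the approved file", () => {
  const received = generateReport("2024-Q3");
  const approved = "goldens/quarterly-report.approved.txt";

  if (!existsSync(approved)) {
    // No baseline yet: write a .received file and fail, forcing a human review.
    writeFileSync(approved + ".received", received);
    throw new Error(`No approved output yet; review and rename ${approved}.received`);
  }
  expect(received).toBe(readFileSync(approved, "utf8"));
});
```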
Visual regression tests: capture a screenshot of the rendered UI and compare it with a baseline. Catches CSS breakages, layout shifts, and accessibility regressions that no functional test would notice. Tools: Chromatic, Percy, Applitools, Playwright's built-in screenshot diffing, BackstopJS.
Caveats: font-rendering differences across OSes cause spurious failures — run from a fixed environment (a CI container with consistent fonts). Animations, dynamic content, and dates need to be masked or frozen.
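With Playwright's built-in diffing, a masked screenshot assertion looks roughly like this; the URL and the masked selector are placeholders:

```ts
import { test, expect } from "@playwright/test";

test("dashboard has no visual regressions", async ({ page }) => {
  await page.goto("https://example.com/dashboard"); // placeholder URL
  // First run records dashboard.png as the baseline; later runs diff against it.
  await expect(page).toHaveScreenshot("dashboard.png", {
    // Mask dynamic regions (clocks, live feeds) so they can't cause spurious diffs.
    mask: [page.locator('[data-testid="live-ticker"]')],
    maxDiffPixelRatio: 0.01, // tolerate minor anti-aliasing noise
  });
});
```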
Property-based testing: tools like QuickCheck, Hypothesis, and fast-check generate hundreds of inputs and look for failing ones. When they find a counter-example, they shrink it to a minimal case, which can then be pinned in the suite as a permanent regression input. The bug stays covered: even if the property is later relaxed, the pinned failures still catch attempts to reintroduce it.
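A fast-check sketch, reusing the hypothetical `applyDiscount` from earlier; the pinned example stands in for a counter-example the tool once shrank:

```ts
import { test } from "@jest/globals";
import fc from "fast-check";
import { applyDiscount } from "../src/pricing"; // hypothetical

test("discounted totals are never negative", () => {
  fc.assert(
    fc.property(
      fc.double({ min: 0, max: 10_000, noNaN: true }),
      fc.integer({ min: 0, max: 100 }),
      (total, percent) => applyDiscount(total, percent) >= 0
    ),
    // A previously shrunk counter-example, pinned so it runs on every build.
    { examples: [[49.99, 100]] }
  );
});
```

When a run fails, fast-check prints the shrunk counter-example along with a seed and path to replay it; copying those values into `examples` is what makes the regression permanent.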
Signs of health: the suite catches regressions before they ship more often than not. Most bug fixes land with a test. The suite runs on every PR, quickly enough that nobody resents it. Old tests are refreshed when they go stale, deleted when the behavior they protect is no longer relevant — but not before. The suite size grows roughly linearly with the codebase, not exponentially with accumulated snapshots.
Pair with the rest: unit for the cheapest regressions, integration for wiring regressions, E2E for the production-only ones, smoke for the deploy-time ones.