An end-to-end (E2E) test exercises the full product the way a user would: real browser, real backend, real database, real third-parties (or close approximations). It's the only test that proves the entire flow works. It's also the slowest, flakiest, hardest-to-maintain layer of the pyramid — which is why the answer to "should we add another E2E test?" is usually "no."
Sign up, log in, place an order, search for a product, complete checkout. The handful of flows where breakage means "the product is down." Cover those with E2E and don't apologize for the slowness.
Don't add E2E for every form, every edge case, every error path — those belong in unit and integration tests where they run in milliseconds.
A small E2E suite that runs against production after every deploy: log in, view dashboard, fetch one record. Catches DNS issues, env-config mistakes, broken third-party connections — the production-only failures unit tests can't see. See also: Smoke Testing.
When a customer reports a bug nobody can reproduce in lower environments, an E2E reproducer becomes the regression test that ensures it doesn't come back. See also: Regression Testing.
Playwright: Microsoft's open-source framework. Drives Chromium, Firefox, and WebKit; auto-waits for elements; built-in network interception, video recording, and trace viewer. Bindings for TypeScript, Python, Java, and .NET. As of 2026, the default choice for new browser E2E suites.
Why it won: reliable selectors, decent debugging, multi-browser support, parallel execution, sensible defaults. The auto-wait alone removes a whole class of flakes.
Cypress: Strong UX, great in-browser test runner, large ecosystem. Runs only inside browsers it can instrument (originally Chrome-family; now broader). Widely deployed in JS/TS shops; many teams are still happy with it.
Selenium: The original, and the basis of the W3C WebDriver standard. Still common in Java and .NET shops. Heavier setup than Playwright and more brittle without careful waits. Mature, but rarely the choice for greenfield projects.
Mobile: Appium (cross-platform, WebDriver-based), Maestro (declarative, UX-focused), Detox (React Native). Desktop: Playwright with Electron, WinAppDriver, Squish. Same pyramid logic applies — keep the count small.
For headless services and APIs: Playwright's request API, Postman/Newman, Karate, REST Assured (Java), Bruno, k6 (also a load tool). Skip the browser overhead; assert on JSON.
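A minimal sketch of that browser-free style: hit an endpoint over HTTP and assert on the JSON. The `/api/orders/42` route and its response shape are hypothetical; a tiny local server stands in for the deployed service so the example is self-contained.

```typescript
import { createServer } from "node:http";

// Pure assertion on the JSON shape, reusable across tests.
function validateOrder(body: { id: number; status: string }): boolean {
  return body.id === 42 && body.status.length > 0;
}

// Hypothetical service under test: in a real suite this would be
// the deployed API's base URL, not a local stub.
const server = createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ id: 42, status: "shipped" }));
});

server.listen(0, async () => {
  const { port } = server.address() as { port: number };
  // No browser overhead: plain fetch, then assert on the JSON.
  const res = await fetch(`http://127.0.0.1:${port}/api/orders/42`);
  const body = (await res.json()) as { id: number; status: string };
  if (!validateOrder(body)) throw new Error("order endpoint broke");
  server.close();
});
```

The same shape works with any of the tools listed above; only the assertion syntax changes.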
Don't select by CSS class (it changes on every redesign) or by user-visible text (it changes on translation). Use explicitly added stable test IDs (data-testid attributes) or accessible roles (getByRole('button', {name: 'Submit'})). Playwright's recommended selector strategy leans on accessibility, which is good for users and for tests.
sleep(2) is the worst answer. It's slow when not needed and not enough when it is. Modern frameworks auto-wait for elements to be ready; for everything else, wait on a specific condition (expect(page.locator(...)).toBeVisible(), waitForResponse).
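The "wait on a condition, not a clock" idea can be sketched framework-agnostically. This `waitFor` helper is an assumption (not a real framework API): it polls a predicate until a deadline, returning as soon as the condition holds instead of always burning a fixed sleep.

```typescript
// Poll a condition with a deadline instead of sleeping a fixed time.
async function waitFor(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return; // done the moment the condition holds
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}

// Usage: resolves ~50ms in, not after a worst-case fixed sleep.
let ready = false;
setTimeout(() => { ready = true; }, 50);
waitFor(() => ready, 2000).then(() => console.log("ready"));
```

This is exactly what framework auto-wait does internally; reach for an explicit condition only when the framework can't infer one.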
Tests that depend on the real clock fail every Daylight Saving Time transition. Tests that depend on real third-party uptime fail whenever the third party blinks. Mock the network at the request layer (Playwright's route.fulfill, MSW), freeze the clock, and seed random number generators.
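Freezing the clock can be as simple as injecting it. A framework-agnostic sketch, where `isExpired` and the 30-day TTL are hypothetical: code under test takes a clock function instead of calling Date.now() directly, so a test can pin time to a fixed instant.

```typescript
type Clock = () => number;

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical logic under test: takes an injectable clock,
// defaulting to the real one in production.
function isExpired(createdAtMs: number, clock: Clock = Date.now): boolean {
  return clock() - createdAtMs > 30 * DAY_MS;
}

// Frozen clock: the result never depends on when the test runs,
// so it cannot flake across DST changes or midnight boundaries.
const frozenNow = Date.UTC(2026, 0, 31); // 2026-01-31T00:00:00Z
const fixedClock: Clock = () => frozenNow;

console.log(isExpired(Date.UTC(2026, 0, 30), fixedClock)); // → false (1 day old)
console.log(isExpired(Date.UTC(2025, 10, 1), fixedClock)); // → true (~3 months old)
```

Libraries like Sinon's fake timers do the same thing at the runtime level when injection isn't practical.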
Each E2E test should create its own data and clean up after itself, not lean on a shared "test user." Two parallel runs hitting the same user is a classic flake source. UUID-suffix everything; tear down at the end (or leave it and delete stale data nightly).
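The UUID-suffix pattern is a one-liner. A sketch, where the field names and `example.test` domain are illustrative: every test run mints its own user, so parallel runs can never collide.

```typescript
import { randomUUID } from "node:crypto";

// Mint a unique user per test run instead of sharing one.
function makeTestUser(prefix = "e2e") {
  const id = randomUUID();
  return {
    username: `${prefix}-user-${id}`,
    email: `${prefix}-${id}@example.test`,
  };
}

// Two parallel runs get disjoint data by construction.
const a = makeTestUser();
const b = makeTestUser();
console.log(a.username !== b.username); // → true
```

The `prefix` also makes leftover rows trivially identifiable for the nightly cleanup job.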
A flaky test taken out of the suite is a regression waiting to happen. Run it in a quarantine pool that doesn't fail the build but does report results, fix the flake within a sprint, return it to the main suite. Tests that flake without being investigated are how teams stop trusting CI.
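One way to express a quarantine pool, sketched as a Playwright config fragment. The project names and the `@quarantine` tag convention are assumptions, not a built-in feature: tests tagged `@quarantine` in their title run in a separate project whose failures CI reports but does not gate on.

```typescript
import { defineConfig } from "@playwright/test";

// Split the suite into a gating pool and a report-only pool.
export default defineConfig({
  retries: 2,
  projects: [
    // Gates the build: everything not tagged @quarantine.
    { name: "main", grepInvert: /@quarantine/ },
    // Report-only: CI runs this project but ignores its exit code.
    { name: "quarantine", grep: /@quarantine/ },
  ],
});
```

The CI side (running the quarantine project with failures allowed) is pipeline-specific; the important part is that quarantined tests keep running and keep reporting.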
Playwright's trace viewer, Cypress dashboard, screenshots-on-failure are non-optional. When an E2E fails in CI on someone else's branch, the only useful artifact is "what did the test see at the moment of failure?"
Teams that don't trust their lower tiers often pile every test into E2E — the "ice cream cone" inversion of the pyramid. The result: a suite that is slow to run, flaky by default, and expensive enough to maintain that failures start getting ignored.
Fix: for every E2E test, ask "is there a layer below that could cover the same case?" Push down the pyramid: input validation → unit; backend logic → integration; user flow → E2E. The end state is a few dozen E2E tests covering the critical paths, not a few thousand covering everything.
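Pushing a case down the pyramid often looks like this. A sketch, where `validateEmail` and its regex are hypothetical: input validation is a pure function, so each edge case costs microseconds as a unit assertion instead of seconds as an E2E run through the signup form.

```typescript
// Hypothetical pure validation logic, extracted so it can be
// unit-tested without a browser, backend, or database.
function validateEmail(input: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Edge cases that do NOT belong in the E2E suite:
console.log(validateEmail("a@b.co"));       // → true
console.log(validateEmail("not-an-email")); // → false
console.log(validateEmail("a b@c.d"));      // → false
```

The E2E suite then only needs one happy-path signup to prove the form is wired up; the validation matrix lives down here.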