TDD · Testing Deep Dive

Quick Facts

At a Glance

Basic Concepts

Red: write a test for behavior that doesn't exist yet — watch it fail.
Green: write the simplest code that makes the test pass. Resist the urge to do more.
Refactor: with the test as a safety net, clean up the code (and the test) without changing behavior.
Origin: popularized by Kent Beck in the late 1990s alongside Extreme Programming.
Outcome: a regression suite that grows with the code, plus a design shaped by its own use.

The Loop

Red, Green, Refactor

1. Red — write a failing test

Pick the smallest behavior you don't have yet. Write a test that asserts it. Run it. It must fail, and fail for the right reason (the function doesn't exist, the result is wrong) rather than because of a typo or import error. A test that fails for a reason you didn't expect points to a bug in the test, not to missing behavior.

2. Green — make it pass, however ugly

Write the dumbest code that turns the test green. Hardcode the answer if you have to. The point isn't elegance yet — it's proving the test wires up to real code and the loop is closed. Speed matters here; minutes, not hours.

3. Refactor — clean up with the net

Now that the bar is green, improve the code without changing behavior. Rename, extract, deduplicate. Run the tests after every small step. If a refactor turns the bar red, undo it — don't try to fix it forward.

Refactor the test code too. Tests are first-class code; they rot like everything else.

Why It Works

The Hidden Design Discipline

You design from the outside in. Writing the test first forces you to imagine the API before the implementation. Bad APIs feel awful to test — and the pain shows up before you've built anything.
You build only what's needed. No speculative branches, no "I might need this later" code. The next test pulls the next behavior out.
Every line is covered by a test that justified its existence. Coverage isn't a metric you chase; it's a residue.
You're never far from green. If you break something, the last passing state is one minute behind you.

Worked Example

A Base62 Encoder, TDD-Style

Building the short-code encoder for the URL shortener track, one red-green-refactor cycle at a time.

Cycle 1 — encoding zero

test('encode(0) returns "0"', () => {
  expect(encode(0)).toBe('0');
});

Red: encode doesn't exist yet. Green: const encode = () => '0'; and nothing more. Resist the urge to write the real algorithm.

Cycle 2 — encoding small numbers

test('encode(1) returns "1"', () => expect(encode(1)).toBe('1'));
test('encode(10) returns "a"', () => expect(encode(10)).toBe('a'));
test('encode(61) returns "Z"', () => expect(encode(61)).toBe('Z'));

Now the hardcoded version fails. Implement a real lookup against the alphabet.

Cycle 3 — multi-digit

test('encode(62) returns "10"', () => expect(encode(62)).toBe('10'));
test('encode(123456) round-trips', () => {
  expect(decode(encode(123456))).toBe(123456);
});

Red drives you to the divmod loop. Once green, refactor: extract the alphabet to a constant, share it between encode and decode, add a property test for round-trips on random integers.

Notice what you didn't write: error handling for negatives, performance optimizations, edge cases for huge integers. They'll arrive when a test demands them — not before.

Variations

Flavors of TDD

Flavor	What's different	When it shines
Classic / Chicago	Test the result. Use real collaborators where possible.	Algorithms, pure logic, libraries.
Mockist / London	Test interactions with collaborators via mocks.	Object-heavy code with clear roles and responsibilities.
BDD (Behavior-Driven)	Tests phrased as given/when/then scenarios.	Cross-functional teams; product-readable specs.
ATDD (Acceptance-TDD)	Start at the user-visible behavior, drill in.	Feature work with clear acceptance criteria.
Property-based	Assert invariants over generated inputs, not specific cases.	Encoders, parsers, anything with round-trip or algebraic properties.

Tradeoffs

Where TDD Pays Off — and Where It Doesn't

Great for: pure logic, parsers, encoders, business rules, anything you'll touch repeatedly.
Awkward for: exploratory UI work, glue code that mostly delegates, throwaway scripts.
Slower at first. As a rough expectation, plan to be 10–30% slower for the first few weeks, though the figure varies by team and codebase; the curve flips once the suite is meaningful.
Doesn't replace integration, E2E, or exploratory testing. It's the bottom of the pyramid, not the whole thing.
Test smell ≈ design smell. If a test is hard to write, the design is hard to use. Fix the design.

Common Pitfalls

Ways TDD Goes Wrong

Testing the implementation, not the behavior. If a refactor that preserves behavior breaks tests, the tests are coupled to internals.
Over-mocking. Mocks that mirror the code under test prove only that the code calls itself. Prefer real collaborators where cheap.
Skipping refactor. Red-green-red-green leaves a swamp of duplication. The third step is where the design improves.
Giant tests. A test that can fail for ten different reasons tells you little about what actually broke. Aim for one assertion per behavior.
Slow suite. If the unit suite takes more than a few seconds, the loop breaks down. Push slow tests up the pyramid.
Cargo-cult coverage. 100% coverage with weak assertions is worse than 70% with strong ones.

How to Start

Pick one small new feature, not a refactor of existing code.
Write the test first. If you don't know what to assert, you don't know what you're building.
Make it green with the dumbest possible code. No cleverness yet.
Refactor with the bar green. Run the tests after every change.
Stop when the next test you'd write feels redundant. That's enough.

After a few features, you'll notice you're designing differently — smaller pieces, clearer seams, fewer big-bang debug sessions. That's the real win, not the green bar.
