AI Workflow · 1 of 6

Code Generation

From line completion to whole features — AI now writes most boilerplate, scaffolds tests, and increasingly tackles real product work. The largest, fastest-growing slice of developer AI usage.

Quick Facts

At a Glance

Basic Concepts

  • Three altitudes: autocomplete (lines), chat (functions/files), agent (multi-file features).
  • Context determines quality — the model can only write good code if it sees the surrounding code.
  • Generation isn't "type and ship" — review, test, and iterate are still on you.
  • Models matter: the best frontier model produces materially better code than mid-tier ones.
  • Spec quality = output quality. Vague prompts get vague code.
Modes

The Three Altitudes

1. Inline (Autocomplete)

Ghost-text suggestions as you type. Fast (<200ms), small models (Supermaven, Codeium, Copilot Tab).

2. Chat / Edit

"Add a function that…" — generates a function or refactors a file. Cursor Chat, Copilot Chat, JetBrains AI.

3. Agent / Composer

"Build the checkout flow." Reads files, edits multiple, runs tests, iterates. Claude Code, Cursor Composer, Devin.

Mechanics

What AI Generates Well

High-Confidence Targets
  • Boilerplate — DTOs, controllers, CRUD scaffolding, config files.
  • Unit tests for existing functions — coverage you'd otherwise never get around to writing.
  • Migrations & schema changes — repeating patterns across files.
  • Translations — Python → TypeScript, jQuery → React, Java → Kotlin.
  • Glue code — wiring two libraries together, format converters, parsers.
  • One-off scripts — data cleanup, file renaming, log analysis.
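To make the "glue code" category concrete, here is the kind of well-specified converter a model handles reliably: a minimal Python sketch (the function name is hypothetical).

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Hypothetical example of 'glue code': well-specified input,
    well-specified output, standard libraries only.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)
```

Tasks like this succeed because the contract is unambiguous and the model has seen thousands of variants; prompt clarity matters far more for the medium- and low-confidence targets below.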
Medium-Confidence Targets
  • Feature implementation — when the spec is clear and the codebase is conventional.
  • Refactoring — extracting components, renaming, splitting modules.
  • Bug fixes from a stack trace — paste error + relevant code, get a fix.
  • UI from a screenshot or Figma export.
Low-Confidence (Be Careful)
  • Architecture decisions — agents will happily invent a microservice you didn't need.
  • Cross-cutting concerns not visible in the local context (auth, multi-tenant rules).
  • Performance-critical hot paths — generated code is correct but rarely optimal.
  • Niche libraries / new APIs — model may hallucinate functions that don't exist.
Practice

Prompting Patterns

The Spec-First Pattern

Before diving in, write a short spec; the model will follow it far more faithfully than a one-line ask.

// Bad
"add caching"

// Good
"Add an LRU cache to ProductService.getById:
- Max 1000 entries
- 5-minute TTL
- Cache key: `product:${id}`
- Use the existing Caffeine library; see CacheConfig.java
- Add a unit test that verifies eviction after 1001 inserts"
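Notice how every bullet in the good prompt maps to a line of code or a test. A Python sketch of the cache it pins down (the original spec targets Java's Caffeine; class and method names here are hypothetical):

```python
import time
from collections import OrderedDict

class LRUCache:
    """Hypothetical LRU cache with a per-entry TTL, mirroring the spec
    above: bounded size, 5-minute expiry, least-recently-used eviction."""

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 300):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store: OrderedDict = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: drop and miss
            return None
        self._store.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (time.monotonic() + self.ttl, value)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

The spec's eviction test translates directly: insert max_entries + 1 items and assert the oldest is gone.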
The Reference Pattern

Point the agent at an example. "Build a UserController like ProductController in src/api/products/" gives it the conventions, naming, and test style without you spelling it out.

The Tight Loop

For agents: small task → run tests → review diff → fix or accept. Don't ask for "the whole feature" in one go — break it into steps you can verify in minutes.

The Test-First Pattern (TDD with AI)
  1. You: write the test cases describing desired behavior.
  2. Agent: implement until tests pass.
  3. You: review diff, accept.

Tests act as a contract — they pin down intent better than prose, and the agent gets a fast feedback signal.
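The three steps above can be sketched in miniature. The tests come first and act as the contract; `slugify` is a hypothetical target function:

```python
import re

# Step 1 — you write the tests first; they pin down intent.
def check_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Leading/Trailing  ") == "leading-trailing"
    assert slugify("a--b") == "a-b"

# Step 2 — the agent implements until the tests pass.
def slugify(text: str) -> str:
    text = re.sub(r"[^a-z0-9]+", "-", text.strip().lower())
    return text.strip("-")

# Step 3 — green tests; you review the diff and accept.
check_slugify()
```

Because the agent can rerun `check_slugify` after each attempt, it gets a pass/fail signal in seconds instead of waiting on your review.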

Risks

What Can Go Wrong

Hallucinated APIs

Calls to functions that don't exist. Linters, type checkers, and tests catch most of these.
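A concrete (hypothetical) instance: the model guesses a plausible method name that doesn't exist, and a single test run exposes it.

```python
def strip_bug_prefix(title: str) -> str:
    # Hallucinated API: str has no method `remove_prefix`; the real
    # method (Python 3.9+) is `removeprefix`. Plausible, but wrong.
    return title.remove_prefix("[bug] ")

try:
    strip_bug_prefix("[bug] cache never evicts")
    caught = False
except AttributeError:
    caught = True

assert caught  # one test run is enough to surface the fake call
```

In a statically typed language the compiler catches this class of error before anything runs, which is one reason generated code fares better in typed codebases.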

Plausible Wrong Answers

Code that compiles, looks right, but does the wrong thing. Read every diff.

License Contamination

Verbatim reproduction of GPL-licensed code. Enable Copilot's public-code matching filter or similar guards.

Skill Atrophy

Junior devs miss out on muscle memory. Mix AI work with hand-written code.

Code Sprawl

It's easy to generate far more code than you need. Prune ruthlessly; don't merge code nothing uses.

Security Holes

SQL injection, weak crypto, leaked secrets. SAST + human review still required.
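The classic generated-code hole is string-built SQL. A sketch using Python's sqlite3 (table and data are illustrative) showing the injectable pattern next to the parameterized fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name: str):
    # The pattern generated code sometimes produces: interpolating
    # user input into SQL. A crafted value rewrites the WHERE clause.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver binds the value, never the query.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()

# The injected input matches every row in the unsafe version...
assert find_user_unsafe("' OR '1'='1") == [("alice",)]
# ...and matches nothing when passed as a bound parameter.
assert find_user_safe("' OR '1'='1") == []
```

SAST tools flag the f-string pattern automatically, but a reviewer who knows to look for it remains the cheapest defense.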
