AI Workflow · 2 of 6

Code Review

AI as a second pair of eyes on every pull request: bug spotting, security flags, style nits, and explanations that help juniors ship safely. Best as a complement to human review, not a replacement.

Pull Requests · Static Analysis · Security · Style · Workflow 2
Quick Facts

Basic Concepts

  • Two flavors: AI inside the IDE (review-as-you-write) and AI on the PR (review-on-push).
  • Best at: consistency, explanation, easy-to-miss bugs, security smells, accessibility.
  • Weakest at: business-logic correctness, where it lacks the context to judge.
  • Noise is the killer: too many low-value comments and developers stop reading.
  • Augment, don't replace human review — the human still owns the merge decision.
Landscape

The Major Tools

Tool                                 | Where it runs           | Notes
GitHub Copilot Code Review           | GitHub PRs              | Native to GitHub; comments inline.
CodeRabbit                           | GitHub / GitLab / Azure | Popular standalone reviewer; configurable depth.
Greptile / Codium PR-Agent           | GitHub / GitLab         | Repo-aware; uses embeddings of the codebase.
Sourcery / Cursor BugBot             | IDE / PR                | Lighter-weight, focused on patterns.
Ellipsis                             | GitHub                  | AI reviewer + custom rules.
Snyk Code / GitHub Advanced Security | PR + CI                 | Security-focused (SAST) with AI explanations.
SonarQube + AI CodeFix               | CI / IDE                | Established static analyzer adding AI suggestions.
Custom (Claude Code / Cursor)        | Local / CI script       | Roll your own reviewer with prompts + the diff.
Mechanics

What AI Reviewers Catch

High-Value Catches
  • Null / undefined access the type checker missed.
  • Off-by-one errors in loops & ranges.
  • Race conditions in async code (a frequent catch; see the sketch after this list).
  • SQL injection, XSS, SSRF patterns.
  • Hard-coded secrets & credentials.
  • Missing error handling on async / I/O.
  • Inconsistent naming / style vs surrounding code.
  • Accessibility (a11y) — missing alt text, label-for, focus traps.
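
To make the race-condition item concrete, here is a minimal, self-contained sketch (the bank-balance scenario and names are invented for illustration): a check-then-act sequence split across an await point lets two tasks both pass the check before either acts.

# hypothetical example: check-then-act race across an await point
import asyncio

balance = 100

async def withdraw(amount: int) -> bool:
    global balance
    if balance >= amount:         # check passes for both tasks...
        await asyncio.sleep(0)    # ...because each task suspends here before acting
        balance -= amount
        return True
    return False

async def main():
    results = await asyncio.gather(withdraw(80), withdraw(80))
    print(results, balance)       # can print [True, True] -60

asyncio.run(main())
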
Lower-Value (Often Noise)
  • Style nits a formatter would handle.
  • "Consider extracting a function" on perfectly clear 8-line blocks.
  • Generic "add tests" without specifying which.
  • "Add comments" on self-explanatory code.
  • Performance speculation without measurement.

Tune the reviewer to suppress these — most tools have severity / category filters.

Where It Genuinely Falls Short
  • Business-logic correctness — does this discount rule match policy?
  • Cross-service contracts — the diff doesn't show the consumer.
  • Architecture & long-term maintainability.
  • Performance under real load.
Practice

Getting It to Work

Tune Aggressiveness

Default settings tend to be noisy. Configure (a filtering sketch follows this list):

  • Severity threshold — show only "warning" and above.
  • Categories — disable "style" if you have a formatter.
  • Path filters — skip generated code, vendor, fixtures.
  • Custom rules / prompts — encode your team's conventions.
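
Most hosted tools expose these filters in their settings; if you roll your own reviewer (see the DIY section below), the same tuning is a few lines of Python. A minimal sketch, assuming an invented comment schema; the paths, severity scale, and field names are illustrative, not any tool's real config:

# hypothetical post-filter; schema and names are invented for illustration
import fnmatch

SKIP_PATHS = ["vendor/*", "*_generated.py", "tests/fixtures/*"]
SEVERITY = {"nit": 0, "style": 1, "warning": 2, "error": 3}

def keep(comment: dict) -> bool:
    """Keep only warning-or-above comments on non-excluded paths."""
    if SEVERITY[comment["severity"]] < SEVERITY["warning"]:
        return False
    return not any(fnmatch.fnmatch(comment["path"], p) for p in SKIP_PATHS)

comments = [
    {"path": "src/db.py", "severity": "error", "body": "Possible SQL injection via f-string."},
    {"path": "vendor/lib.py", "severity": "error", "body": "Unused import."},
    {"path": "src/db.py", "severity": "style", "body": "Rename variable x."},
]
print([c["body"] for c in comments if keep(c)])  # only the SQL-injection comment survives
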
The Two-Reviewer Workflow
  1. AI reviewer runs on PR open — author addresses obvious issues before pinging humans.
  2. Human reviewer focuses on intent, architecture, business rules.
  3. AI re-runs after each push.

Net result: humans spend less time on nits, more on the things only humans catch.

Custom CI Reviewer (DIY)

If off-the-shelf doesn't fit, a 50-line GitHub Action can do it: send the diff plus a custom prompt to Claude / GPT and post structured review comments back via the API. Total control over rules, model, and cost. The sketch below assumes the Anthropic Python SDK and the gh CLI; the prompt-file path and env var names are illustrative.

# runnable sketch: Anthropic Python SDK + GitHub CLI, run inside CI
import os, subprocess
import anthropic

OUR_TEAM_REVIEW_GUIDELINES = open(".github/review-prompt.md").read()  # hypothetical path
diff = subprocess.run(["git", "diff", "main...HEAD"], capture_output=True, text=True).stdout
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
review = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=2048,
    system=OUR_TEAM_REVIEW_GUIDELINES,
    messages=[{"role": "user", "content": diff}],
)
subprocess.run(["gh", "pr", "comment", os.environ["PR_NUMBER"], "--body", review.content[0].text])
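
On GitHub-hosted runners the gh CLI is preinstalled; the workflow just needs PR_NUMBER, GH_TOKEN, and ANTHROPIC_API_KEY in its environment. For inline comments rather than one summary comment, swap gh pr comment for GitHub's pull-request reviews REST endpoint.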
Anti-patterns
  • Auto-merging on AI approval — never. Humans must own merges to production.
  • Treating AI comments as required — they're suggestions, not blockers.
  • Reviewing 5,000-line PRs with AI alone — split the PR, then review.