Moon

March 1, 2026

Engineering

🧛 Stop Reading Every Line of Code!

If you keep reviewing things like it's 2016 in 2026, your process is going to break.

Alex Tong · 7 min read

I've been using Claude Code HEAVILY at work. Like, 8 hours a day talking to 6 different robots, working on 6 tickets at the same time. So much so that our team can't keep up with the volume of PRs we're spitting out. The more I use it, the more I'm convinced of this fact:

We're doing code review wrong.

And our process is already breaking in 2026. Our default assumption is outdated:

"A human wrote every line, so a human should read every line."

That world is quickly fading. If you keep reviewing things like it's 2016 in 2026, your process is going to break. Whether you're an engineer, a manager, or a product manager, this problem will affect how you do everything.


We Stopped Reading Assembly, But This Isn't Quite The Same Move

We've been in a similar situation before. When we moved from assembly to higher-level languages, nobody said "hey, we still need to review the entire assembly output." We trusted the abstraction. We were fine reviewing just the higher-level code. We shifted our focus. The shift to AI-generated code is similar, but a hundred times bigger, and it has one critical difference.

We already tolerate a lot of black boxes. You're not reading every npm package you install. Nobody reviews the JavaScript that TypeScript compiles down to. Nobody reads the server configs AWS provisions from your Terraform. You write a SQL query and let the query planner decide how to execute it.

So here's the critical difference. Compilers and transpilers like the ones above are deterministic and exhaustively tested (a few, like CompCert, are even formally verified). LLMs are not. Two identical prompts fed to Claude can produce very different code. Worse, AI agents actively drift from their own instructions as context bloats. Your CLAUDE.md file might say "don't insert default values," but the agent might do it anyway 3,000 tokens in. I don't know if better models will fix this, because it seems fundamental to how these models work.

So how can we fix our code review bottleneck problem? With better guardrails.

Review The Plan, Then Skim The Diff

If implementation is increasingly generated (and it is), the highest-leverage review shifts earlier in the process: plans, constraints, interfaces, architecture, risks, tests.

For most changes, when you're reading the PR summary, you're asking: Does this plan sound correct? Are the constraints right? Are there any edge cases? Do the tests on the PR prove the behavior? That's much higher leverage than reading every single line.

For things like auth, payments, or permission checks, you probably still want targeted line-by-line reading even when the plan looks solid. Why? Because agents drift, and plans won't capture every implementation error or nitpick. Critical code like this always deserves human eyes on every line. I'm not saying we should move to a no-review process; I'm saying not all code deserves the same depth of review. Some code gets a skim. Some code gets every line read. The skill is knowing which is which.

The Review Bottleneck Nobody Wants To Admit

Here's a dirty secret. Code is being written way WAY faster than it can possibly be reviewed by a human.

Teams with high AI adoption merged 98% more PRs, but PR review time increased 91%.

Faros AI analyzed telemetry across 10,000+ developers and confirmed this. You're producing 2x the output, but reviews take 2x as long. And that's before parallelization: engineers running 3–5 sessions across worktrees are generating multiple streams of PRs at once.

Teams that used to push out 10 PRs a week are now staring at 50–100. If your code review stays as "carefully read everything line by line" while Claude turns every engineer into a PR production factory, your org becomes a firehose with a human-sized funnel. Something needs to change.

My Proposed Solution

Tiered review systems.

Tier 1: fully automated. Linting, static analysis, unit tests, security scanning, type checking. No human involved.

Tier 2: peer review for behavior, correctness, and "does this match the intent?"

Tier 3: senior/security review for critical paths (auth, payments, PII, system boundaries). Most changes should never need Tier 3.

Every layer should have AI-assisted review. And here's the key insight: don't let your agent grade its own homework. A separate subagent can catch semantic issues before the code ever reaches human eyes, often in under a minute. Always start a fresh session to review your code.
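Here's a minimal sketch of the "fresh reviewer" idea in Python. It assumes the `claude` CLI is on your PATH and supports non-interactive `-p` (print) mode; the prompt wording is purely illustrative.

```python
import subprocess


def build_review_prompt(diff: str) -> str:
    """Frame the reviewer as an outsider: it did not write this code."""
    return (
        "You did not write this code. Review the following diff for bugs, "
        "silently inserted defaults, and drift from the repo's CLAUDE.md rules. "
        "Reply with a list of findings, or 'LGTM' if there are none.\n\n" + diff
    )


def review_with_fresh_agent(diff: str) -> str:
    """Pipe the diff to a brand-new Claude session in print mode.

    `claude -p` starts a fresh session with no memory of the session that
    generated the code, so it can't grade its own homework.
    """
    result = subprocess.run(
        ["claude", "-p", build_review_prompt(diff)],
        capture_output=True, text=True, timeout=120,
    )
    return result.stdout
```

The important part is the fresh context, not the tooling: any mechanism that hands the diff to an agent with zero memory of writing it gets you the same property.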


Guardrails Become The Product

So, if we stop reading everything line by line (and I'm saying we should), then guardrails stop being nice-to-haves and become must-haves. They become the actual task. AI in its current state has plenty of faults: it can silently insert defaults, name things badly without proper context from you, drift from instructions, and hallucinate outright once your context gets too bloated. You're going to need guardrails to catch all of this.

When I say guardrails, I'm mostly talking about tests. You're going to need way more test coverage than most production companies actually have: unit tests, integration tests, end-to-end tests, and a regression test for every single bug you find. Your tests become the contract of your task. But is there a problem with using AI to generate the tests too?

Why yes, of course. Test quality also needs close review, especially for critical paths. Coverage metrics can totally be gamed by AI. What matters is that your tests prove the dangerous code actually works correctly. You should create a test for every production incident that happens. Similar to advice from Thoriq, one of the principal engineers at Anthropic: always tell Claude about the bugs and gotchas you run into. You should also have security checks running on every PR in your GitHub workflows.
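To make "a test for every incident" concrete, here's a minimal sketch in plain Python. The incident, the `price_of` function, and the ticket number are all hypothetical.

```python
def price_of(item: dict) -> float:
    """Look up a price; a missing price must raise, never silently default to 0."""
    if "price" not in item:
        raise KeyError("price missing; refusing to default to 0")
    return float(item["price"])


# Regression test pinned to a hypothetical incident (INCIDENT-1234): a silent
# 0 default once produced $0 invoices. This test is the contract that the
# failure mode never comes back, no matter who (or what) rewrites price_of.
def test_missing_price_raises_instead_of_defaulting():
    try:
        price_of({"name": "widget"})
    except KeyError:
        return  # correct: the failure is loud
    raise AssertionError("missing price silently defaulted")
```

The test name and comment carry the institutional memory: anyone (human or agent) who breaks it learns exactly which incident they're about to repeat.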

Architecture Decision Records also become super important. Your CLAUDE.md is already a lightweight version of this. The teams I've seen getting the most out of Claude treat CLAUDE.md as a living document: every time a mistake is noticed, they update the file. It becomes institutional memory, a compounding record of lessons learned. Observability gets way more important too. Teaching Claude how to read your logs, traces, metrics, and alerts makes the whole workflow run smoother. If everything is well documented and it's clear what the agent is examining, you'll know when there are bugs, and when and how to fix them.
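As a rough illustration, a "living" CLAUDE.md excerpt might look like this. The rules, names, and dates are invented, not from any real repo:

```markdown
# CLAUDE.md (excerpt)

## Hard rules
- Never insert default values for missing fields; raise instead.
- Every new endpoint needs an integration test before merge.

## Lessons learned
- 2026-02-14: agent renamed `user_id` to `uid` mid-refactor. Keep existing
  identifier names unless the ticket explicitly says otherwise.
- 2026-02-20: agent "fixed" a flaky test by deleting it. Flaky tests get
  quarantined and ticketed, never deleted.
```

Each entry costs one line to add and saves the same mistake from recurring across every future session that reads the file.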

This also changes how junior engineers learn โ€” but that's a whole other conversation that I'll be writing about separately. Short version: the apprenticeship isn't dead, it just looks different now.


TLDR: What I'd Actually Do Next Week


Cap your PRs at ~400 lines or so. Enforce this on GitHub. Allow overrides only through an explicit, visible action, and alert your team whenever someone uses one.
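A sketch of what enforcing the cap could look like as a CI script in Python. The env var names and the `size-override` label are made up; a real setup would read these values from your platform's PR context.

```python
import os
import sys

PR_SIZE_CAP = 400  # total changed lines allowed before the check fails


def pr_too_big(added: int, deleted: int, labels: list[str], cap: int = PR_SIZE_CAP) -> bool:
    """True if the PR exceeds the cap and carries no explicit override label."""
    if "size-override" in labels:  # the override is loud and visible on the PR
        return False
    return added + deleted > cap


if __name__ == "__main__":
    # Illustrative env vars, not real GitHub-provided ones: in practice you'd
    # pull these numbers from the diff or the PR API in your CI job.
    added = int(os.environ.get("PR_ADDED", "0"))
    deleted = int(os.environ.get("PR_DELETED", "0"))
    labels = os.environ.get("PR_LABELS", "").split(",")
    if pr_too_big(added, deleted, labels):
        sys.exit(
            f"PR touches {added + deleted} lines (cap {PR_SIZE_CAP}); "
            "add the 'size-override' label to proceed"
        )
```

Because the override is a label, it shows up on the PR itself, which is exactly the "alert your team" behavior you want.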

Create a risk-tier rubric. Automate low-risk PRs. High-risk PRs, the ones that might modify user data or auth, require senior eyes and targeted diff reading.
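A risk-tier rubric can literally be a small function. This is a sketch with invented path patterns; your repo layout will differ.

```python
from fnmatch import fnmatch

# Hypothetical patterns for Tier 3 paths; adapt to your own repo layout.
TIER_3_PATTERNS = ["*auth*", "*payments*", "*permissions*", "migrations/*"]


def risk_tier(changed_paths: list[str]) -> int:
    """Map a PR's changed files to a review tier: 1 automated, 2 peer, 3 senior."""
    if any(fnmatch(p, pat) for p in changed_paths for pat in TIER_3_PATTERNS):
        return 3  # critical paths: senior/security eyes, line-by-line
    if any(p.endswith((".py", ".ts", ".go")) for p in changed_paths):
        return 2  # behavior-affecting code: peer review of intent and tests
    return 1      # docs, configs, generated files: automated checks only
```

Run this in CI, post the tier as a PR label, and the routing question ("who needs to look at this, and how hard?") answers itself before any human opens the diff.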

Require a thorough PR summary. PR summaries are probably the most important artifact now. I'll be writing a separate article on this, so look out for it.

Make automated checks mandatory. Type checking, security scans, test coverage. All of this must be a requirement now. Not nice-to-have. No exceptions.

Set up AI review during development, not just after it hits GitHub. Get into the habit of spawning a new AI agent in your terminal to review your code right after you write it. Look into Claude's new /simplify command. Never let your agents grade their own homework.

Document your rules so Claude can read them. Put CLAUDE.md files in every single repo with more in-depth rules in the .claude/rules folder. When Claude makes a mistake, always update these files.

Track your PR review queue. Start putting metrics around how long PRs take to review. This is where the bottleneck shows up first, and you should have good observability on it.
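The core metric is simple: how long does a PR sit before its first review? A minimal sketch, assuming you feed it timestamps pulled from your Git host's API:

```python
from datetime import datetime
from statistics import median


def review_wait_hours(opened_at: str, first_review_at: str) -> float:
    """Hours a PR sat before its first review (ISO-8601 timestamps)."""
    opened = datetime.fromisoformat(opened_at)
    reviewed = datetime.fromisoformat(first_review_at)
    return (reviewed - opened).total_seconds() / 3600


def median_review_wait(prs: list[tuple[str, str]]) -> float:
    """Median wait across (opened_at, first_review_at) pairs: queue health in one number."""
    return median(review_wait_hours(o, r) for o, r in prs)
```

Watch the median week over week: if it climbs while merge counts climb, you've found the firehose hitting the human-sized funnel.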

The Bottom Line

The role is shifting from "write code, review code" to "write thorough Jira tickets, translate them into concise yet thorough prompts, define your guardrails, and verify the outcome."

This isn't a downgrade; it's a huge upgrade. But only if you earn it with the right infrastructure. So stop reading every line of code by default. But don't confuse that with blindly trusting everything Claude says. Trust the guardrails, the tests, the gates. Bring human judgment to the things only humans can judge, like product requirements, and keep applying it at the boundaries where the real damage happens.

So don't stop reviewing. Just stop reviewing everything the same way you've done it your whole career.

I've been a software engineer for close to 5 years at both Amazon and The New York Times, recently using Claude extensively in production workflows. These are observations from the trenches, not theoretical predictions. The shift is already happening. Prepare yourselves!
