Great breakdown, thank you. Did you notice any "poor quality" code written by the agent?
Like, the code works but looks like my dog wrote it and it's convoluted and inefficient... Or something of that sort...
Definitely yes. The way I overcome this is by tackling the problem from two sides:
1. Provide a template for the agent that covers how I want the code to be written, and link it to CLAUDE.md.
2. Add a specific section for the Reviewer agent to focus on code complexity.
You can find an example here:
https://github.com/skenklok/ai-dev-utility/blob/main/prompt/pr-review.md
"Engineering review
Identify: bugs, error handling gaps, performance concerns, security issues, logging/observability gaps, API contracts, DB migrations, race conditions.
Verify naming, cohesion, readability, and consistency with existing code.
Analyze Code Complexity & LOC:
Estimate the Lines of Code (LOC) added/modified.
Assess cyclomatic complexity (e.g., deep nesting, complex conditions).
Identify "Hotspots": parts of the code that are too complex and hard to maintain.
Suggest refactoring strategies for these hotspots (e.g., extract method, simplify logic, use design patterns).
"
The shift from "read the code" to "trust the system" is huge. I've been wrestling with this too, and the PRD.json pattern plus the infra-first approach makes a lot of sense. The psychological part is real though: sometimes I still catch myself wanting to review everything even when the CI gates have passed. The explicit subtask specification to prevent checkbox testing is clever; dunno why more people don't do that.
Yes, you don't know how many times I've found myself looking at the tests to make sure those green checks are meaningful and that I can trust them :)