AI Agents Write Code — Who Verifies It?
The speed is real. The trust gap is growing.
Claude Code, Cursor, Devin, GitHub Copilot Workspace — AI coding agents are shipping pull requests in minutes. Complete with confident descriptions: "Adds auth middleware." "Fixes the timeout bug." "No breaking changes."
But there is a catch that gets little attention: these descriptions are generated by the same model that wrote the code. The agent believes what it wrote is correct, so the description reflects that confidence, not an independent assessment of the diff.
You now have a PR that reads perfectly, looks professional, and might be completely wrong about what it actually does.
CI doesn't solve this
CI tells you whether the code compiles and the tests pass. It doesn't tell you whether the PR description matches the diff. A PR can say "adds rate limiting" while the diff adds rate limiting AND silently changes the database connection pool size. CI passes. The reviewer skims the description. The hidden change ships.
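To make the rate-limiting scenario concrete, here is a minimal Python sketch of a check that CI does not perform: list the files a unified diff touches and flag the ones the description never alludes to. The sample description, the diff, the file paths, and the helper names (`changed_files`, `unmentioned_files`) are all hypothetical, and the word-overlap heuristic is only an illustration, not how any particular tool works.

```python
import re

# Hypothetical PR: the description mentions only rate limiting,
# but the diff also shrinks the database connection pool.
DESCRIPTION = "Adds rate limiting"
DIFF = """\
diff --git a/middleware/rate_limit.py b/middleware/rate_limit.py
--- /dev/null
+++ b/middleware/rate_limit.py
@@ -0,0 +1,2 @@
+MAX_REQUESTS_PER_MINUTE = 60
+WINDOW_SECONDS = 60
diff --git a/config/database.py b/config/database.py
--- a/config/database.py
+++ b/config/database.py
@@ -1 +1 @@
-POOL_SIZE = 20
+POOL_SIZE = 5
"""


def changed_files(diff: str) -> list[str]:
    """Return the paths named on the '+++ b/<path>' lines of a unified diff."""
    return re.findall(r"^\+\+\+ b/(\S+)$", diff, flags=re.MULTILINE)


def words(text: str) -> set[str]:
    """Lowercase alphabetic tokens, e.g. 'rate_limit.py' -> {'rate', 'limit', 'py'}."""
    return set(re.findall(r"[a-z]+", text.lower()))


def unmentioned_files(description: str, diff: str) -> list[str]:
    """Files whose path shares no word with the description (naive heuristic)."""
    described = words(description)
    return [path for path in changed_files(diff) if not (words(path) & described)]


if __name__ == "__main__":
    for path in unmentioned_files(DESCRIPTION, DIFF):
        print(f"changed but not mentioned: {path}")
```

On this sample, the only path flagged is `config/database.py`, which is exactly the change a passing CI run and a skimmed description would let through.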
Code review tools like CodeRabbit analyze code quality — style, bugs, best practices. They don't verify claims. They don't check if what the PR says matches what the code does.
The verification layer
What's missing is a system that reads the PR description, extracts every claim, verifies each one against the actual diff, and then scans the diff for everything the description didn't mention.
That's what Vigil does. It reads your PR title and body, extracts claims like "adds auth middleware" and "fixes timeout," then checks the diff to confirm or contradict each one. It also surfaces undocumented changes — new dependencies, environment variables, schema modifications that nobody mentioned.
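The article does not say how Vigil performs these steps, so the following is only a rough Python sketch of the general idea, under loud assumptions: claims are split naively on sentences and line breaks, "verification" is plain keyword matching against added lines, and undocumented changes are detected with a small table of regexes. Every name here (`extract_claims`, `check_claim`, `undocumented_changes`, `SIGNALS`, `STOPWORDS`) is hypothetical; a real system would presumably use something far more robust than substring matching.

```python
import re
from dataclasses import dataclass

# Words too generic to count as evidence for a claim (assumed list).
STOPWORDS = {"a", "an", "the", "and", "to", "no", "add", "adds", "fix", "fixes", "bug", "changes"}

# Assumed signals for changes worth mentioning: regex on added lines,
# plus words that would count as a mention in the PR body.
SIGNALS = {
    "environment variable": (re.compile(r"os\.environ|getenv"), ("env",)),
    "schema change": (re.compile(r"\b(?:ALTER|CREATE|DROP)\s+TABLE\b", re.I),
                      ("schema", "migration", "table")),
}


@dataclass
class ClaimResult:
    claim: str
    supported: bool
    evidence: list[str]


def extract_claims(body: str) -> list[str]:
    """Naively treat each sentence or bullet of the PR body as one claim."""
    parts = (p.strip(" -*\t") for p in re.split(r"[.\n]+", body))
    return [p for p in parts if p]


def added_lines(diff: str) -> list[str]:
    """Lines added by a unified diff, without the leading '+' (file headers skipped)."""
    return [l[1:] for l in diff.splitlines() if l.startswith("+") and not l.startswith("+++")]


def check_claim(claim: str, diff: str) -> ClaimResult:
    """Mark a claim as supported if any of its keywords appears in an added line."""
    keywords = set(re.findall(r"[a-z]+", claim.lower())) - STOPWORDS
    evidence = [l.strip() for l in added_lines(diff) if any(k in l.lower() for k in keywords)]
    return ClaimResult(claim, bool(evidence), evidence)


def undocumented_changes(body: str, diff: str) -> list[str]:
    """Labels of signals present in the diff but never mentioned in the body."""
    body_lower = body.lower()
    lines = added_lines(diff)
    return [
        label
        for label, (pattern, mentions) in SIGNALS.items()
        if any(pattern.search(l) for l in lines) and not any(m in body_lower for m in mentions)
    ]


# Hypothetical PR body and diff for illustration.
BODY = "Adds auth middleware. Fixes the timeout bug."
DIFF = """\
--- /dev/null
+++ b/app/middleware.py
@@ -0,0 +1,3 @@
+import os
+def auth_middleware(request):
+    token = os.environ["AUTH_SECRET"]
"""

if __name__ == "__main__":
    for claim in extract_claims(BODY):
        result = check_claim(claim, DIFF)
        print(("supported  " if result.supported else "unsupported"), "|", result.claim)
    print("undocumented:", undocumented_changes(BODY, DIFF))
```

With this sample input, "Adds auth middleware" finds matching added lines, "Fixes the timeout bug" finds none, and the new environment variable read is reported as undocumented. Keyword overlap can only say that a claim has no visible support in the diff; it cannot prove a claim true, which is why the sketch reports "supported" instead of "verified".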
Why this matters now
When humans wrote all the code, the PR author usually knew what they changed. The description was often incomplete but rarely misleading. With AI agents, the description is always confident and sometimes wrong.
The faster code gets written, the more you need an independent verification layer. Not to replace code review — to complement it with a truthfulness check that nobody has time to do manually at scale.
Code is becoming a commodity. Trust is becoming scarce. Verification is the layer that bridges the gap.