We ran Vigil on our own AI-generated PRs
What happens when you point a PR verification tool at its own codebase? Real findings, real scores, real improvements.
Every PR in the Vigil codebase is written by AI agents — Claude Code, primarily. We use Vigil to verify every single one. Not as a marketing stunt, but because we need to. When your entire codebase is AI-generated, the gap between “what the PR says” and “what the code does” becomes the most important thing to verify.
Here’s what happened when we pointed Vigil at its own PRs.
Three real findings
The score trajectory
Our early PRs scored between 59 and 70. The main culprit: false positives. Template-literal confusion was wrecking JSX diffs (10+ false positives per PR), the credential scan was flagging test files, and the coverage mapper was complaining that Dockerfiles had no tests.
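To see why template literals trip up naive diff analysis, here is a hypothetical sketch (this is illustrative, not Vigil's actual parser): a regex that strips string literals from a diff line before scanning it. Nested backticks inside `${}` interpolation defeat it, so live code gets classified as string content and vice versa.

```typescript
// Hypothetical naive approach: strip string literals before scanning.
// The alternation handles '…', "…", and `…`, but cannot handle nesting.
const stripStrings = (line: string): string =>
  line.replace(/'[^']*'|"[^"]*"|`[^`]*`/g, '""');

// Plain strings are removed as intended:
stripStrings('const msg = "hello";'); // → 'const msg = "";'

// But a template literal with a nested template inside ${...} breaks it:
// the inner backtick ends the match early, so the interpolation code
// survives as "string content" while real string content is stripped.
stripStrings('const cell = `<td>${row.map(c => `${c}`)}</td>`;');
// → 'const cell = ""${c}"";'
```

Every line mangled this way is a candidate for a spurious finding, which is how a single JSX-heavy file can generate double-digit false positives.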
We fixed each issue systematically (PRs #88 and #94). The false-positive rate dropped from ~10 per PR to ~0, and scores climbed from the 60s into the 80s and 90s.
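The shape of the fix for the credential-scan and coverage-mapper noise was path-based filtering. A minimal sketch, with illustrative patterns (not Vigil's actual configuration):

```typescript
// Paths that should be excluded from credential scanning and coverage
// mapping. Patterns here are examples, not Vigil's real rule set.
const SKIP_PATTERNS: RegExp[] = [
  /(^|\/)__tests__\//,        // test directories tripping the credential scan
  /\.(test|spec)\.[jt]sx?$/,  // test files with fake secrets in fixtures
  /(^|\/)Dockerfile$/,        // Dockerfiles have no unit tests to map
];

const shouldScan = (path: string): boolean =>
  !SKIP_PATTERNS.some((p) => p.test(path));

shouldScan('src/scanner/diff.ts');      // → true
shouldScan('src/scanner/diff.test.ts'); // → false
shouldScan('services/api/Dockerfile');  // → false
```

The design choice worth noting: excluding a file from scanning is itself a judgment call, so a real implementation would surface skipped paths in the report rather than silently dropping them.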
What we learned
AI-generated code is syntactically correct but narratively incomplete. The code compiles. The tests pass. But the PR description doesn’t tell the full story. That’s the gap Vigil fills.
False positives matter more than false negatives. A tool that cries wolf gets ignored. We spent more time reducing false positives than adding features — and it was worth it.
Verification is not review. Code review asks “is this code good?” Verification asks “does this PR actually do what it claims?” They’re complementary, not competing.
Try it on your repos
Vigil is free for unlimited repos. Install from the GitHub Marketplace, open a PR, and see what it finds.
Install Vigil