
We ran Vigil on our own AI-generated PRs

What happens when you point a PR verification tool at its own codebase? Real findings, real scores, real improvements.

Every PR in the Vigil codebase is written by AI agents — Claude Code, primarily. We use Vigil to verify every single one. Not as a marketing stunt, but because we need to. When your entire codebase is AI-generated, the gap between “what the PR says” and “what the code does” becomes the most important thing to verify.

Here’s what happened when we pointed Vigil at its own PRs.

Three real findings

Undocumented Changes: Hardcoded redirect URI in auth flow
PR #82 added GitHub OAuth. The PR description mentioned “adds auth flow.” What it didn’t mention: the callback URL was hardcoded to https://keepvigil.dev/api/auth/callback. No environment variable, no configuration. Vigil flagged it as an undocumented change. In a multi-environment setup, this would have broken staging and development.
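The fix is the usual one: read the callback URL from configuration and keep the production value only as a fallback. A minimal sketch (the variable name `OAUTH_CALLBACK_URL` and the helper `getCallbackUrl` are illustrative, not Vigil's actual code):

```typescript
// Hypothetical helper: prefer per-environment configuration over a
// hardcoded production URL, so staging and development keep working.
function getCallbackUrl(env: Record<string, string | undefined>): string {
  return env.OAUTH_CALLBACK_URL ?? "https://keepvigil.dev/api/auth/callback";
}

// In staging you would set OAUTH_CALLBACK_URL to that environment's origin:
const stagingUrl = getCallbackUrl({
  OAUTH_CALLBACK_URL: "https://staging.example.com/api/auth/callback",
});
```

Without the environment variable set, every environment silently redirects OAuth back to production, which is exactly the failure mode the finding pointed at.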
Claims Verifier: Auto-approve creates undocumented GitHub review
PR #92 implemented auto-approve for high-score PRs. The description said “adds auto-approve when score > threshold.” What it didn’t say: it creates a GitHub review with an APPROVE event — a side effect that changes the PR’s merge status. Vigil caught the gap between what was claimed and what the code actually did.
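To make the side effect concrete, here is a sketch of the kind of request PR #92 introduced. The field names match GitHub's "create a review" REST endpoint, but `buildAutoApproveReview`, the threshold logic, and the repo names are hypothetical:

```typescript
// Shape of GitHub's "create a review" request body (REST API).
interface ReviewRequest {
  owner: string;
  repo: string;
  pull_number: number;
  event: "APPROVE" | "REQUEST_CHANGES" | "COMMENT";
  body: string;
}

// Hypothetical: build the review only when the score clears the threshold.
function buildAutoApproveReview(
  score: number,
  threshold: number,
  pullNumber: number,
): ReviewRequest | null {
  if (score <= threshold) return null;
  return {
    owner: "keepvigil", // illustrative owner/repo
    repo: "vigil",
    pull_number: pullNumber,
    event: "APPROVE", // the side effect: this changes the PR's merge status
    body: `Auto-approved: verification score ${score} > ${threshold}`,
  };
}
```

The `event: "APPROVE"` field is the part the PR description never mentioned; it is a state change on the PR, not just a comment.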
Undocumented Changes: Silent redirect page added without mention
PR #93 added i18n support. Buried in the diff: a new redirect page routing /docs to /docs/getting-started. Not mentioned anywhere in the PR description. Small? Yes. The kind of thing that slips through review? Also yes.
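For reference, a redirect like this is a one-object change. The sketch below assumes the site is a Next.js app (the `/api/auth/callback` path above suggests it) and uses the standard `next.config.js` redirect shape; PR #93's actual implementation may differ:

```typescript
// Hypothetical reconstruction of the redirect buried in PR #93's diff.
const docsRedirect = {
  source: "/docs",
  destination: "/docs/getting-started",
  permanent: false, // temporary redirect, so the target can change later
};

// In next.config.js this would be returned from:
// async redirects() { return [docsRedirect]; }
```

Three lines of routing behavior, zero lines of PR description: that asymmetry is what the detector looks for.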

The score trajectory

Before fixes (PRs #81–#86):
- PR #81: 59
- PR #82: 70
- PR #83: 66
- PR #84: 70

After fixes (PRs #91–#93):
- PR #91: 82
- PR #92: 93
- PR #107: 100

Our early PRs scored between 59 and 70. The main culprit: false positives. Template-literal confusion was destroying JSX diffs (10+ false positives per PR), the credential scan was flagging test files, and the coverage mapper was complaining that Dockerfiles had no tests.

We fixed each issue systematically (PRs #88 and #94). False positive rate dropped from ~10/PR to ~0/PR. Scores climbed from the 60s to the 80s and 90s.
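The credential-scan fix, for example, comes down to a path filter. A minimal sketch of the kind of exemption PRs #88 and #94 added; the regex and function name are illustrative, not Vigil's actual code:

```typescript
// Hypothetical filter: paths matching common test-file conventions are
// exempt from the credential scan, since they legitimately contain
// fake secrets and fixtures.
const TEST_FILE = /(\.test\.|\.spec\.|__tests__\/|\/fixtures\/)/;

function shouldScanForCredentials(path: string): boolean {
  return !TEST_FILE.test(path);
}
```

The trade-off is deliberate: a fake key in a test file is a tolerable miss, while flagging every fixture trains reviewers to ignore the tool.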

What we learned

AI-generated code is syntactically correct but narratively incomplete. The code compiles. The tests pass. But the PR description doesn’t tell the full story. That’s the gap Vigil fills.

False positives matter more than false negatives. A tool that cries wolf gets ignored. We spent more time reducing false positives than adding features — and it was worth it.

Verification is not review. Code review asks “is this code good?” Verification asks “does this PR actually do what it claims?” They’re complementary, not competing.

Try it on your repos

Vigil is free for unlimited repos. Install from the GitHub Marketplace, open a PR, and see what it finds.

Install Vigil