Hard rails for an autonomous code-audit agent

I run an autonomous audit routine across my repos every 48 hours. It scans for bugs, security issues, dead code, and lazy TODOs, fixes the safe ones in a draft PR, and reports the rest. The single most important part of the prompt isn’t what to look for. It’s the list of things the agent must never do — because the moment an agent can self-approve and self-merge is the moment “audit assistant” becomes “person who unilaterally rewrote your main.”

Why hard rails

The pitch for an autonomous audit is straightforward. Run every couple of days. Catch the stuff that accumulates between human reviews — orphaned debug logs, .only left in tests, dead exports, stale TODOs, the small misery of any project that’s been alive for more than six months. Fix the obviously-safe ones; report the rest.

The pitch falls over the moment the agent decides to be helpful. “Helpful” for an agent with shell access and a GitHub MCP looks like “I noticed the fix passed all the tests, so I went ahead and merged it for you.” That’s not helpful. That’s a Saturday morning incident.

Hard rails come first. The audit logic is the boring part. The “never do these things” list is the part that turns a sharp tool into a safe tool.

The five absolute rules

These appear at the top of the prompt, before scope, before context, before anything the model might want to do:

ABSOLUTE RULES (read first, override anything below)

1. NEVER merge a pull request. Not via `gh`, not via a GitHub MCP tool,
   not via any slash-command, not via any other path.
2. NEVER approve a pull request, yours or otherwise. Approval can
   trigger auto-merge if it is enabled on the repo.
3. NEVER push to `main`, `master`, or any default or protected branch.
   All work happens on a fresh `audit/auto-fixes-<DATE>` branch.
4. NEVER use any of:
     - `gh pr merge` (with or without --auto/--squash/--rebase/--merge)
     - `gh pr review --approve`
     - `git push` to main / master
     - any MCP tool whose name contains "merge", "approve", "squash",
       "rebase-and-merge", or "enable-auto-merge"
5. After opening the PR, verify `autoMergeRequest` is null. If it
   isn't, disable it immediately. Then re-verify.
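The forbidden-command list in rule 4 can also be checked mechanically, outside the prompt. What follows is a minimal sketch, not the routine's actual enforcement: the function names are hypothetical, and it assumes a wrapper sees each shell command or MCP tool name before it runs. It also assumes `gh pr merge --disable-auto` is the one permitted form of that command, since rule 5 needs a way to switch auto-merge off.

```python
import shlex

# Assumption: extend this set with the repo's actual default/protected branches.
PROTECTED_BRANCHES = {"main", "master"}

# Substrings from rule 4. The last two are already covered by "merge";
# they are listed to mirror the prompt text.
FORBIDDEN_MCP_SUBSTRINGS = (
    "merge", "approve", "squash", "rebase-and-merge", "enable-auto-merge",
)


def is_forbidden_command(command: str) -> bool:
    """Return True if a shell command matches one of the rule 4 patterns."""
    argv = shlex.split(command)
    if argv[:3] == ["gh", "pr", "merge"]:
        # Every form is refused except a bare --disable-auto (assumed exception).
        flags = [a for a in argv[3:] if a.startswith("-")]
        return flags != ["--disable-auto"]
    if argv[:3] == ["gh", "pr", "review"]:
        return "--approve" in argv or "-a" in argv
    if argv[:2] == ["git", "push"]:
        for arg in argv[2:]:
            if arg.startswith("-"):
                continue
            # A refspec such as HEAD:main or +refs/heads/main targets main.
            target = arg.lstrip("+").split(":")[-1]
            if target.startswith("refs/heads/"):
                target = target[len("refs/heads/"):]
            if target in PROTECTED_BRANCHES:
                return True
    return False


def is_forbidden_mcp_tool(tool_name: str) -> bool:
    """Return True if an MCP tool name contains a forbidden substring."""
    name = tool_name.lower().replace("_", "-")
    return any(s in name for s in FORBIDDEN_MCP_SUBSTRINGS)
```

This is a string-level check with known gaps: it does not see a bare `git push` run while `main` is checked out, nor commands hidden behind aliases or `git -C`. Branch protection on the server side remains the stronger guard.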

These are written as negations, not “be careful with.” Models will be polite about “be careful.” They are dependably literal about “NEVER.” A list of forbidden commands by name is worth a thousand sentences of philosophy about responsibility.

The fifth rule deserves its own moment. GitHub’s auto-merge, once allowed in the repo settings, is enabled per pull request and merges that PR as soon as its required reviews and checks pass. If anything turns it on for the audit PR (a bot, a workflow, or the agent itself), the PR can pass CI and you can wake up to a merged change you never saw. The verification step after PR creation is non-negotiable.
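The post-creation check can be expressed as a small pure function. This sketch assumes the input is the JSON printed by `gh pr view <number> --json state,isDraft,autoMergeRequest`; the function name `pr_violations` is hypothetical, and fetching the JSON is left to the caller.

```python
import json


def pr_violations(pr_json: str) -> list:
    """Return a list of human-readable problems; an empty list means the PR is safe."""
    pr = json.loads(pr_json)
    problems = []
    if pr.get("state") != "OPEN":
        problems.append("state is %r, expected 'OPEN'" % pr.get("state"))
    if pr.get("isDraft") is not True:
        problems.append("PR is not a draft")
    if pr.get("autoMergeRequest") is not None:
        problems.append("auto-merge is enabled")
    return problems


# Example input shaped like the gh output (values are illustrative).
safe = '{"state": "OPEN", "isDraft": true, "autoMergeRequest": null}'
print(pr_violations(safe))  # []
```

If the list is non-empty, the rule says to correct the problem and run the same check again. For auto-merge that would be `gh pr merge <number> --disable-auto`, followed by a second call to the function.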

The scope problem

An audit agent without scope reads like a horoscope. “Your code has issues.” True. Useless. The findings have to be bucketed so a human can act on them in under ten minutes.

Eight buckets, in the order I want to read them on a Sunday morning:

flowchart TD
    A[Audit run] --> B[1. Correctness & bugs]
    A --> C[2. Security]
    A --> D[3. Stubs / TODOs / mocks]
    A --> E[4. Dead / unreachable code]
    A --> F[5. Quality & maintainability]
    A --> G[6. Performance]
    A --> H[7. Testing]
    A --> I[8. Docs & DX]
    style B fill:#fde2e1,stroke:#a33
    style C fill:#fde2e1,stroke:#a33
    style D fill:#fff3cd,stroke:#b58900
    style E fill:#fff3cd,stroke:#b58900
    style F fill:#e3edff,stroke:#1f4eb0
    style G fill:#e3edff,stroke:#1f4eb0
    style H fill:#e6f4ea,stroke:#2c7a3f
    style I fill:#e6f4ea,stroke:#2c7a3f

The red buckets get human attention first. The yellow buckets are where most of the auto-fixes live. The blue and green buckets are where I get the most “huh, didn’t notice that” moments — and where the agent is most useful in the long run.
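One way to keep the report in that reading order is to make the bucket number the sort key. This is a sketch under assumptions: the `Finding` fields and the function names are hypothetical, and only the eight bucket names and four colours come from the chart above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Bucket(IntEnum):
    CORRECTNESS = 1
    SECURITY = 2
    STUBS = 3
    DEAD_CODE = 4
    QUALITY = 5
    PERFORMANCE = 6
    TESTING = 7
    DOCS_DX = 8


def colour(bucket: Bucket) -> str:
    """Buckets 1-2 are red, 3-4 yellow, 5-6 blue, 7-8 green."""
    return ("red", "yellow", "blue", "green")[(bucket - 1) // 2]


@dataclass
class Finding:
    bucket: Bucket
    path: str
    summary: str


def report_order(findings):
    """Sort findings so the red buckets come first, then by path within a bucket."""
    return sorted(findings, key=lambda f: (f.bucket, f.path))


findings = [
    Finding(Bucket.DOCS_DX, "README.md", "dead link"),
    Finding(Bucket.SECURITY, "api/auth.py", "token written to log"),
]
for f in report_order(findings):
    print(colour(f.bucket), f.bucket.name, f.path, f.summary)
```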

The remediation policy: a list of cans, a list of cannots

The list of things the agent is allowed to fix is shorter than you’d think:

Safe to auto-fix:
- Remove stray console.log / debugger / print (not env-guarded)
- Remove .only / .skip from test files
- Remove commented-out code older than surrounding style
- Remove unused imports & unreferenced local variables
- Remove provably dead code
- Fix typos in comments / docstrings where intent is unambiguous
- Trim trailing whitespace; normalize line endings to repo convention
- Replace var with let/const when no hoisting is relied on
- Add null/undefined guards already flagged by the linter
- Update stale README links / broken example commands
- Delete empty files / orphaned config fragments

And the list of things it is not allowed to touch under any circumstances:

Never auto-fix (report only):
- Anything in business logic, auth, payments, or data mutation paths
- Security findings (the fix needs human judgment)
- Public API shape, exported signatures, schema, migrations
- Test assertions themselves
- Dependency version bumps
- File / function / symbol renames
- Anything in paths marked "Intentional scaffolding" or "Out of scope"
- Anything where the correct fix is not obvious from one read

The first list is “things that almost can’t break anything.” The second list is “things where ‘almost’ isn’t good enough.” The agent goes to the second list, finds something, writes it down in the report, and stops. The whole game is making the agent comfortable saying “I noticed this; you decide.”
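The two lists reduce to a decision function in which "report" wins every tie. The sketch below is illustrative only: the finding kinds, the path patterns, and the `decide` signature are hypothetical labels, not something the prompt defines.

```python
from fnmatch import fnmatch

# Hypothetical labels for a few items on the "safe to auto-fix" list.
SAFE_KINDS = {
    "debug-statement", "focused-test", "unused-import", "dead-code",
    "comment-typo", "whitespace", "stale-readme-link",
}

# Hypothetical labels for items on the "never auto-fix" list.
REPORT_ONLY_KINDS = {
    "security", "public-api", "test-assertion", "dependency-bump", "rename",
}

# Assumption: sensitive areas can be recognised by path; real repos will differ.
SENSITIVE_PATHS = ("*/auth/*", "*/payments/*", "*/migrations/*")


def decide(kind: str, path: str, protected_patterns=()) -> str:
    """Return 'auto-fix' or 'report'. Anything unknown or protected is reported."""
    if kind in REPORT_ONLY_KINDS:
        return "report"
    patterns = tuple(protected_patterns) + SENSITIVE_PATHS
    if any(fnmatch(path, pattern) for pattern in patterns):
        return "report"
    if kind in SAFE_KINDS:
        return "auto-fix"
    return "report"  # not obviously safe, so a human decides


print(decide("unused-import", "src/utils/format.ts"))  # auto-fix
print(decide("unused-import", "src/auth/session.ts"))  # report
```

The ordering is the design choice: the never-list and the protected paths are checked before the safe list, so a safe-looking fix in a sensitive path still ends up in the report.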

Project context as a config file

The trick that makes this safe across many repos: a tiny audit-context.md at the repo root, treated as the authoritative project context for that run. Expected fields:

| Field | Why |
| --- | --- |
| Name | Identification in the report |
| Stack | What to expect (and not flag as wrong) |
| Status | `prototype` vs `maintenance` changes the bar |
| Auto-fix posture | `conservative \| standard \| aggressive` |
| Intentional scaffolding | Paths/patterns to not flag |
| Out of scope | Paths to skip entirely |
| Known issues | Already tracked, don’t re-flag |
| Conventions | Anything non-obvious |
| Priority areas | What to look at first |

If posture is `conservative`, the agent runs the audit only and opens no PR. That’s the right default for a project where the cost of a wrong fix is higher than the cost of a missed one. It is also the first thing I do on a new repo: run `echo "Auto-fix posture: conservative" >> audit-context.md`, watch one full run, and upgrade to `standard` only once I’ve seen what the agent actually does.
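A posture gate might look like the sketch below. It assumes the context file is made of plain `Field: value` lines, as the `echo` command suggests; the real file may be richer markdown. The function names are hypothetical.

```python
POSTURES = ("conservative", "standard", "aggressive")


def parse_context(text: str) -> dict:
    """Collect 'Field: value' lines; a later line overrides an earlier one."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def posture(fields: dict) -> str:
    """Fall back to 'conservative' when the field is missing or unrecognised."""
    value = fields.get("auto-fix posture", "conservative")
    return value if value in POSTURES else "conservative"


def may_open_pr(fields: dict) -> bool:
    return posture(fields) != "conservative"


context = "Name: demo\nStatus: prototype\nAuto-fix posture: standard\n"
print(may_open_pr(parse_context(context)))  # True
```

Because `echo ... >>` appends to the file, letting the last occurrence win means the appended line takes effect without editing the file. Falling back to `conservative` keeps a typo from silently enabling fixes.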

The workflow, end to end

1. Fetch default branch fresh.
2. Create audit/auto-fixes-<YYYY-MM-DD> (suffix if exists).
3. Run the audit. Write to audits/AUDIT_REPORT_<YYYY-MM-DD>.md.
4. Apply auto-fixes in category-shaped commits.
   If any category touches >20 files, STOP that category; report instead.
5. Run the project's test script if present. Capture the result.
6. Push the branch. Never force-push.
7. Open the PR with `gh pr create --draft`. Always draft.
8. Verify: state=OPEN, isDraft=true, autoMergeRequest=null.
   If any assertion fails, correct and re-verify.
9. If a previous audit PR is open, reference it; don't touch it.
10. Print the PR URL. Stop.
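Steps 2 and 3 depend on date-stamped names. Here is a sketch of how they could be generated, assuming a numeric `-2`, `-3` suffix when the branch already exists (the workflow only says "suffix if exists", so the suffix format is a guess) and hypothetical function names.

```python
from datetime import date


def audit_branch_name(existing_branches, today: date) -> str:
    """Return audit/auto-fixes-<YYYY-MM-DD>, adding -2, -3, ... if it is taken."""
    base = "audit/auto-fixes-" + today.isoformat()
    if base not in existing_branches:
        return base
    n = 2
    while "%s-%d" % (base, n) in existing_branches:
        n += 1
    return "%s-%d" % (base, n)


def report_path(today: date) -> str:
    """Return the report location used in step 3."""
    return "audits/AUDIT_REPORT_%s.md" % today.isoformat()


day = date(2025, 1, 6)
print(audit_branch_name({"audit/auto-fixes-2025-01-06"}, day))
print(report_path(day))
```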

Two details earn their place in that workflow:

One commit per category. The PR is meant to be reviewed in sections. A single mega-commit with “audit fixes” as the message is unreviewable. Five commits (`audit: remove debug statements`, `audit: remove dead code`, `audit: fix readme links`, and so on) give the human a way to accept some categories and revert others.

The 20-file ceiling per category. If “remove unused imports” would touch 47 files, something has gone wrong with the project, the agent’s judgment, or both. Better to report a finding of “there is a sprawling unused-imports problem here; it needs a human” than to commit a sweeping change the human can’t possibly review.
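Both details fit in one planning step that runs before anything is committed. This is a sketch, assuming the proposed fixes arrive as a mapping from category name to the set of files touched; `plan_commits` is a hypothetical name.

```python
MAX_FILES_PER_CATEGORY = 20


def plan_commits(fixes: dict):
    """Split categories into per-category commits and report-only categories.

    fixes maps a category name to the set of files its fixes would touch.
    Returns (commits, reported): commits is a list of (message, files) pairs,
    reported is the list of categories that exceeded the ceiling.
    """
    commits, reported = [], []
    for category, files in fixes.items():
        if len(files) > MAX_FILES_PER_CATEGORY:
            reported.append(category)  # too broad to review: report it instead
        elif files:
            commits.append(("audit: " + category, sorted(files)))
    return commits, reported


fixes = {
    "remove debug statements": {"a.js", "b.js"},
    "remove unused imports": {"src/m%d.js" % i for i in range(47)},
}
commits, reported = plan_commits(fixes)
print(commits)   # one commit touching a.js and b.js
print(reported)  # ['remove unused imports']
```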

What this agent is actually for

It’s not a substitute for code review. It’s not finding bugs that real review would miss. It is, very specifically, the boring janitorial pass that nobody wants to do and that quietly degrades a repo if nobody does. It catches the `.only` in a test file that didn’t fail CI because the one test that still ran was the one you focused. It catches the README link that 404’d six months ago. It catches the orphaned migration script that nobody references anymore.

Those things don’t break production. They make the next person who opens the repo waste an hour. The audit agent buys back that hour, every 48 hours, for the price of reading one draft PR.

What I’d tell anyone building one

The “never” list is the entire safety story. Spend more time on it than on the audit logic. Phrase it as forbidden commands, not as guidance. Verify after the fact, every time.

Default to “report only” on a new repo. Watch a full run before you let the agent commit anything. If you’re not surprised at what it finds and what it suggests, then turn on auto-fix.

Bucket findings by how soon someone has to act on them. Red, yellow, blue, green. The human reads from the top of the list and stops when they have other things to do. The colours should reflect “is this a thing that bites this week.”

Treat the audit-context file as the lever. Most of the agent’s “wrong” findings, in my experience, are it not knowing what’s intentionally weird about a project. A two-line Intentional scaffolding: entry can eliminate a whole category of noise.

The good autonomous agents are the ones whose superpower is not doing things. This one’s superpower is opening a draft PR, writing a long report, and going to sleep for 48 hours. That’s the version I trust. Anything more ambitious deserves more rails, not fewer.
