feat(agents): PR review engine — scope filter, schema, audit trail#716
Draft
0xbulma wants to merge 10 commits into
Draft
feat(agents): PR review engine — scope filter, schema, audit trail#7160xbulma wants to merge 10 commits into
0xbulma wants to merge 10 commits into
Conversation
….agents/ Adopts CHANGED_LINES + ±15 line tolerance, WHAT/FIX schema, DROPPED_FINDINGS audit trail, bundled TS scripts, references/ directory, ci-release-security persona split, versioned frontmatter, and an invariant test — while preserving the Morpho-specific persona inventory and AGENTS.md §X anchorage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- .agents/lib/pr-review-base.md: full frontmatter with kind: engine, version: 0.1.0, disable-model-invocation: true. - Every .agents/personas/*.md: version: 1.0.0 (initial baseline). No behavior change. Sets up the version-tracking contract that .agents/test/pr-review-engine.test.sh (Phase 5) will assert on. TIB: TIB-2026-05-21 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eferences
- .agents/lib/scripts/build-changed-lines.ts: parses `git diff --unified=0`
from stdin, emits CHANGED_LINES map. Handles deletion-only hunks, pure
renames (empty array), and file deletions (absent from map).
- .agents/lib/scripts/validate-findings.ts: applies WHAT/FIX schema check,
file-out-of-scope drop, ±15-line tolerance window, and Markdown
fenced-code-block FP filter in one pass. Emits {kept,dropped,failed,counts}.
- .agents/references/{changed-lines,scope-filter,calibration}.md:
authoritative documentation for the edge cases and the tolerance
rationale. Linked from pr-review-base.md and from the per-agent envelope.
- .agents/lib/pr-review-base.md: Step 3 now pipes `git diff --unified=0`
into build-changed-lines.ts; Step 6.1 routes findings through
validate-findings.ts; Step 5 envelope contract injects CHANGED_LINES.
- biome.json: narrow `!**/lib` exclude to `!packages/**/lib` so the engine
scripts under .agents/lib/scripts/ are linted (build outputs in
packages/*/lib remain excluded).
Both scripts: Node stdlib only, no new deps. Strict TS, NodeNext, zero
`any`. Native Node 24 strips types — no tsx, no transpiler.
TIB: TIB-2026-05-21
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pr-review-base.md Step 5: per-agent contract now mandates `WHAT:` and `FIX:` clauses in every finding's `description`. Missing clauses route to `failed[]` (counted in <FAILED_AGENTS>) — never silently dropped. - pr-review-base.md Step 5: `line` schema spelled out — positive integer inside CHANGED_LINES for the file, or within ±15 lines. - pr-review-base.md Step 6.2: agent-failure list now includes the schema failures (missing WHAT/FIX, malformed shape) routed via Step 6.1. - pr-review-base.md output contract: adds <DROPPED_FINDINGS> (with drop_reason and distance_to_nearest_changed_line) and <DROPPED_COUNTS>. - pr-review-gh.md: renders dropped findings as a collapsible <details> audit-trail block when DROPPED_FINDINGS is non-empty; writes full JSON to /tmp/pr-review-gh-<PR>-dropped.json regardless. - pr-review-local.md: prints a terminal audit-trail block (counters only) between findings and sentinel; writes full JSON to /tmp/. - pr-review-ci.md: writes dropped JSON to /tmp/ for artifact upload; does NOT render in the formal GitHub review body — keeps the verdict tight. TIB: TIB-2026-05-21 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…personas
ci-release-security.md was a single combined persona firing whenever any
of CI / release / dependency files were touched. Split into three
narrowly-triggered personas, each anchored to a discrete §10 sub-rule
cluster:
- ci-security.md (trigger: <HAS_WORKFLOWS>) — workflow injection,
pull_request_target + PR-head checkout, ACL gating,
action pinning, permissions: scopes, secret exposure.
- release-integrity.md (trigger: <HAS_RELEASE>) — publish-flow integrity
(--provenance, org tokens), release-commit signing
& write-token hardening, Changesets / release-bot
wiring.
- dependencies.md (trigger: <HAS_DEPS>) — lockfile drift, new-dep
hygiene (postinstall, typosquats, unpinned ranges),
.npmrc and pnpm-workspace.yaml hardening.
Engine prose:
- pr-review-base.md Step 4: <HAS_CI_RELEASE> retired; replaced with
<HAS_WORKFLOWS>, <HAS_RELEASE>, <HAS_DEPS>.
- pr-review-base.md inputs table: adds <EXCLUDE_AGENTS> (forward-compat
hook for future orchestrators).
- pr-review-base.md Step 5: new sub-step 3 applies the EXCLUDE_AGENTS
filter before launching personas in parallel.
- pr-review-base.md Step 5 inventory: three new conditional rows replace
the single ci-release-security row.
AGENTS.md §10:
- Inventory table: three rows replace the single conditional row.
- Header note: persona-split + bundled-scripts + references/ directory.
- Section title and trailing "Applied by personas" pointer updated to
name all three personas.
- The §10 rules themselves are UNCHANGED — only the inventory tables move.
Cross-persona out-of-scope references in code-quality, style-conventions,
test-coverage, web3-security, and pr-fix.md are updated to point at the
new persona names (no rule changes).
TIB: TIB-2026-05-21 (Phase 4 of 5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Asserts five structural invariants the rest of the engine relies on:
1. Every persona file has parseable frontmatter with the required
fields (name, kind, version, applies). `version` is semver. `kind`
is baseline or conditional. Conditionals also declare `trigger:`.
2. Every conditional persona's `trigger:` flag is computed in
pr-review-base.md Step 4. Catches the easy-to-miss drift where
someone adds a new conditional persona but forgets to wire the
<HAS_*> flag in the engine.
3. The engine (.agents/lib/pr-review-base.md) carries the
`disable-model-invocation: true` + `kind: engine` + semver
`version` frontmatter — the contract that hides it from the
slash-command surface.
4. Both bundled scripts (.agents/lib/scripts/{build-changed-lines,
validate-findings}.ts) exist and parse-execute under native Node.
Cheap smoke test that catches obvious regressions without
spawning the scripts for real-world output.
5. AGENTS.md §10 persona-inventory tables ↔ on-disk personas match
in both directions. No persona orphaned in either place.
Wired as `pnpm test:agents`. Bash-free, Node stdlib only, no test
runner — exits non-zero on any violation. Advisory initially; flip
to a blocking CI step once stable.
TIB: TIB-2026-05-21 (Phase 5 of 5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two operational updates that surfaced during execution of TIB-2026-05-21 (decisions unchanged — TypeScript scripts, deterministic scope filter, audit trail, etc. all land as originally accepted): 1. Drop tsx devDep in favour of native Node 24 type stripping. Node's built-in --experimental-strip-types (stable since 22.6) handles the scripts' subset; no transpiler dep needed. Engine prose, references, and Phase-5 test all invoke `node <script>.ts`. 2. Skip Phase-2 colocated unit tests; rely on the Phase-5 invariant test and dev-time end-to-end smoke runs. Scripts are thin CLI wrappers; adding vitest project-glob wiring for a 2-file directory wasn't worth the config surface. Follow-up TIP can add them if scripts grow. TIB: TIB-2026-05-21 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per maintainer request, remove citations to the specific external
Claude Code review-engine plugin (its GitHub repo URL was named throughout).
The architectural learnings stand on their own — we describe them as
patterns adopted from "a separate Claude Code review-engine plugin"
without naming the source repo.
- Rename: TIB-2026-05-21-pull-claude-skills-learnings-... →
TIB-2026-05-21-pr-review-engine-scope-filter-and-audit-trail.md.
- H1 + body prose: drop the repo namedrops and URL links.
- References section: remove external-repo URLs; add pointers to the
in-repo artifacts the TIB produced (scripts/, references/).
- @0xbulma as author handle is normal attribution and stays.
Decision and content of the TIB are unchanged. Only the framing prose
that identified the external repo by name was scrubbed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Group the engine, agents, references, scripts, and test under a single
directory matching the Anthropic Skills convention (without going full
plugin). Decision unchanged — same engine, same agents, same scope-filter
contract; only the on-disk layout moves.
Old → new:
.agents/lib/pr-review-base.md → .agents/pr-review-engine/SKILL.md
.agents/personas/ → .agents/pr-review-engine/agents/
.agents/references/ → .agents/pr-review-engine/references/
.agents/lib/scripts/ → .agents/pr-review-engine/scripts/
.agents/test/ → .agents/pr-review-engine/test/
Within the engine:
- pr-review-base.md → SKILL.md
- pr-review-engine.test.ts → invariants.test.ts (more accurate name —
it asserts engine invariants, not a
generic unit-test)
Cross-reference updates:
- AGENTS.md §10 backlinks (24 path occurrences) → new paths.
- .agents/commands/pr-review-{ci,gh,local}.md and pr-fix.md → new paths.
- Internal links in agents/*.md (was personas/) — `../../AGENTS.md` →
`../../../AGENTS.md` (one extra level up).
- references/*.md — `../lib/scripts/` → `../scripts/`, `../lib/pr-review-base.md`
→ `../SKILL.md` (now sibling under pr-review-engine/).
- invariants.test.ts internal constants: PERSONAS_DIR → AGENTS_DIR,
BASE_PATH → SKILL_PATH, plus updated REPO_ROOT depth and the regex
that scans AGENTS.md persona-link URLs.
- package.json test:agents → node .agents/pr-review-engine/test/invariants.test.ts
- documentation.md persona body → references new paths.
Engine frontmatter: name pr-review-base → pr-review-engine, version 0.1.0
→ 0.2.0. Persona vocabulary kept in prose (matches AGENTS.md's "Applied by
personas: …" backlinks); only the directory rename adopts the convention
name "agents".
Verification:
$ pnpm test:agents
✓ persona frontmatter (11 files)
✓ conditional triggers wired (3 flags)
✓ engine frontmatter (SKILL.md)
✓ bundled scripts parse (2 files)
✓ AGENTS.md §10 ↔ on-disk personas match (11 files)
Biome clean on the 3 TS files in the new location.
TIB addendum (2026-05-21) added to document the post-hoc layout change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per maintainer request: every script gets a colocated test. Adds 99 cases
across three vitest files, all running under the new pr-review-engine
project glob in vitest.config.ts:
scripts/build-changed-lines.test.ts (12 cases)
- Empty diff → {} ; single + multi hunks → sorted-deduped lists.
- Default count=1 when omitted in the hunk header.
- Deletion-only hunks (+0) → file in map with [].
- Pure renames + rename-with-content → new path captured.
- File deletions (+++ /dev/null) → absent from map.
- New files (post-image lines starting at 1) → captured.
- Multiple files → separate keys.
- DiffParseError class is exported, named, and round-trips offendingLine.
scripts/validate-findings.test.ts (32 cases)
- findFencedBlocks: no fence / matched ``` / matched ~~~ / unclosed /
multiple / indented (≤3 spaces).
- Schema rejections: missing WHAT, missing FIX, invalid severity,
non-positive line, non-integer line, missing description, missing or
empty file, non-object findings. All route to failed[].
- Scope filter: kept / file_out_of_scope / line_pre_existing (with
distance_to_nearest_changed_line) / ±15 adjacent tolerance /
pure-rename short-circuit / path normalization (./ a/ b/ prefixes).
- Markdown fence FP filter: line inside fence / outside / on delimiter
line / non-.md skip / missing-file fallback.
- Counts: severity-by-severity tally, dropped findings excluded.
- FindingsParseError class is exported and named.
test/invariants.test.ts (55 cases — many test.each over 11 personas)
- Converted from a custom node:test-style script to vitest.
- Persona frontmatter: required fields, valid kind, semver version,
conditional iff trigger.
- Conditional trigger flags exist in SKILL.md Step 4 (parameterised).
- Engine frontmatter: kind=engine, disable-model-invocation=true,
semver version, parseable.
- Bundled scripts import cleanly and expose their public surface
(stronger than the prior `node --check` parse-only smoke).
- AGENTS.md §10 ↔ on-disk personas: bidirectional inventory match.
Plumbing:
- vitest.config.ts: new `pr-review-engine` project entry.
- package.json test:agents: `vitest run --project=pr-review-engine`
(was: a one-shot node script invocation). Bare `pnpm test` now also
picks the engine tests up automatically.
TIB addendum (2026-05-21) updated: the prior "tests deferred" addendum
is superseded by this one — colocation rule from AGENTS.md §5 is met.
Verification: 99 / 99 passing, ~200ms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
The in-repo
.agents/PR review automation (caller commands + sharedlib/pr-review-base.md+ 9 personas, anchored toAGENTS.md§10) has been useful but had structural gaps that produced reviewer noise:.mdfenced blocks.ci-release-security) fired whenever any of CI / release / dependency files were touched.TIB-2026-05-21 frames the gaps and the absorption plan, with the explicit goal of keeping the Morpho-specific persona inventory (
morpho-protocol,web3-security, etc.) andAGENTS.md§X anchorage intact — the specificity that makes this reviewer useful and that an off-the-shelf generic agent set cannot replicate.Solution
All five phases of the TIB land here (ship-them-all-in-one PR per request):
version:semver frontmatter on every persona + full engine frontmatter (kind: engine,disable-model-invocation: true) onpr-review-base.md.tsx, no Python):.agents/lib/scripts/build-changed-lines.ts— parsesgit diff --unified=0into aCHANGED_LINESmap. Handles deletion-only hunks, pure renames, file deletions..agents/lib/scripts/validate-findings.ts— applies WHAT/FIX schema check + file-out-of-scope drop + ±15-line tolerance window + Markdown fenced-block FP filter, emits{kept, dropped, failed, counts}..agents/references/{changed-lines,scope-filter,calibration}.md— authoritative documentation for the edge cases and the tolerance rationale.WHAT:/FIX:description schema enforced; malformed routes toFAILED_AGENTS(never silently dropped).<DROPPED_FINDINGS>in the output contract; rendered as a collapsible<details>block inpr-review-gh, terminal counters inpr-review-local, and/tmp/JSON forpr-review-ci(keeps the formal review body tight).ci-release-securitysplit into three narrowly-triggered conditional personas mirroring §10 sub-sections:ci-security(<HAS_WORKFLOWS>),release-integrity(<HAS_RELEASE>),dependencies(<HAS_DEPS>).<HAS_CI_RELEASE>retired.<EXCLUDE_AGENTS>added to the caller→engine contract as a forward-compat hook.AGENTS.md§10 inventory tables updated; rules themselves unchanged..agents/test/pr-review-engine.test.ts(Node stdlib only) asserts five structural invariants: persona frontmatter shape, conditional-trigger ↔ engine-flag wiring, engine frontmatter, script parse-ability,AGENTS.md§10 ↔ on-disk persona match. Wired aspnpm test:agents.Preserved (the "specificity & anchorage"):
applies: AGENTS.md §Xanchorage on every persona<HAS_PROTOCOL_SURFACE>+ Step 4 protocol source-of-truth context (ABI / address / constant excerpts)AGENTS.md§10 rulesNot adopted (out of scope for a TS SDK monorepo): React/Next/Tailwind/styling/accessibility/AI-SDK/runtime-validation personas, SessionStart rubric installer, Claude Code plugin marketplace layout.
Two TIB addenda document operational changes that surfaced during execution: dropped
tsxdevDep in favour of native Node 24 type stripping; deferred Phase-2 unit tests in favour of the Phase-5 invariant test + dev-time end-to-end smoke runs.Verification
Biome clean on the new TS files; strict-TS typecheck clean; end-to-end smoke runs on this PR's actual diff confirm the scope filter behaves correctly (out-of-scope file → dropped, line outside ±15 → dropped with
distance_to_nearest_changed_line, missingWHAT:clause → routed tofailed[]).No published-package contracts affected — no changeset required per AGENTS.md §7.