Skip to content

feat(agents): PR review engine — scope filter, schema, audit trail#716

Draft
0xbulma wants to merge 10 commits into
mainfrom
tib-pr-review-skill-update
Draft

feat(agents): PR review engine — scope filter, schema, audit trail#716
0xbulma wants to merge 10 commits into
mainfrom
tib-pr-review-skill-update

Conversation

@0xbulma
Copy link
Copy Markdown
Collaborator

@0xbulma 0xbulma commented May 21, 2026

Motivation

The in-repo .agents/ PR review automation (caller commands + shared lib/pr-review-base.md + 9 personas, anchored to AGENTS.md §10) has been useful but had structural gaps that produced reviewer noise:

  • Findings on lines the diff didn't touch — the scope filter only dropped by file, so agents flagging pre-existing untyped errors / missing JSDoc on unchanged lines of touched files got through.
  • Free-form descriptions reviewers can't act on — descriptions drifted between "this is a bug" and "consider X."
  • Silent drops — when the scope filter dropped a finding, the user had no signal.
  • Markdown documentation false-positives — agents regularly flagged example code inside .md fenced blocks.
  • Determinism scattered across English prose — diff math and edge cases (renames, deletion-only hunks) re-derived from English at every run.
  • One coarse conditional persona (ci-release-security) fired whenever any of CI / release / dependency files were touched.
  • No versioning on engine or personas.

TIB-2026-05-21 frames the gaps and the absorption plan, with the explicit goal of keeping the Morpho-specific persona inventory (morpho-protocol, web3-security, etc.) and AGENTS.md §X anchorage intact — the specificity that makes this reviewer useful and that an off-the-shelf generic agent set cannot replicate.

Solution

All five phases of the TIB land here (ship-them-all-in-one PR per request):

  • Phase 1version: semver frontmatter on every persona + full engine frontmatter (kind: engine, disable-model-invocation: true) on pr-review-base.md.
  • Phase 2 — Two TypeScript scripts (Node stdlib only — no tsx, no Python):
    • .agents/lib/scripts/build-changed-lines.ts — parses git diff --unified=0 into a CHANGED_LINES map. Handles deletion-only hunks, pure renames, file deletions.
    • .agents/lib/scripts/validate-findings.ts — applies WHAT/FIX schema check + file-out-of-scope drop + ±15-line tolerance window + Markdown fenced-block FP filter, emits {kept, dropped, failed, counts}.
    • .agents/references/{changed-lines,scope-filter,calibration}.md — authoritative documentation for the edge cases and the tolerance rationale.
  • Phase 3WHAT: / FIX: description schema enforced; malformed routes to FAILED_AGENTS (never silently dropped). <DROPPED_FINDINGS> in the output contract; rendered as a collapsible <details> block in pr-review-gh, terminal counters in pr-review-local, and /tmp/ JSON for pr-review-ci (keeps the formal review body tight).
  • Phase 4ci-release-security split into three narrowly-triggered conditional personas mirroring §10 sub-sections: ci-security (<HAS_WORKFLOWS>), release-integrity (<HAS_RELEASE>), dependencies (<HAS_DEPS>). <HAS_CI_RELEASE> retired. <EXCLUDE_AGENTS> added to the caller→engine contract as a forward-compat hook. AGENTS.md §10 inventory tables updated; rules themselves unchanged.
  • Phase 5.agents/test/pr-review-engine.test.ts (Node stdlib only) asserts five structural invariants: persona frontmatter shape, conditional-trigger ↔ engine-flag wiring, engine frontmatter, script parse-ability, AGENTS.md §10 ↔ on-disk persona match. Wired as pnpm test:agents.

Preserved (the "specificity & anchorage"):

  • 8 Morpho-specific baseline personas unchanged
  • applies: AGENTS.md §X anchorage on every persona
  • <HAS_PROTOCOL_SURFACE> + Step 4 protocol source-of-truth context (ABI / address / constant excerpts)
  • AGENTS.md §10 rules

Not adopted (out of scope for a TS SDK monorepo): React/Next/Tailwind/styling/accessibility/AI-SDK/runtime-validation personas, SessionStart rubric installer, Claude Code plugin marketplace layout.

Two TIB addenda document operational changes that surfaced during execution: dropped tsx devDep in favour of native Node 24 type stripping; deferred Phase-2 unit tests in favour of the Phase-5 invariant test + dev-time end-to-end smoke runs.

Verification

$ pnpm test:agents
✓ persona frontmatter (11 files)
✓ conditional triggers wired (3 flags)
✓ engine frontmatter (pr-review-base.md)
✓ bundled scripts parse (2 files)
✓ AGENTS.md §10 ↔ on-disk personas match (11 files)

All PR review engine invariants OK.

Biome clean on the new TS files; strict-TS typecheck clean; end-to-end smoke runs on this PR's actual diff confirm the scope filter behaves correctly (out-of-scope file → dropped, line outside ±15 → dropped with distance_to_nearest_changed_line, missing WHAT: clause → routed to failed[]).

No published-package contracts affected — no changeset required per AGENTS.md §7.

0xbulma and others added 7 commits May 21, 2026 15:39
….agents/

Adopts CHANGED_LINES + ±15 line tolerance, WHAT/FIX schema, DROPPED_FINDINGS
audit trail, bundled TS scripts, references/ directory, ci-release-security
persona split, versioned frontmatter, and an invariant test — while preserving
the Morpho-specific persona inventory and AGENTS.md §X anchorage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- .agents/lib/pr-review-base.md: full frontmatter with kind: engine,
  version: 0.1.0, disable-model-invocation: true.
- Every .agents/personas/*.md: version: 1.0.0 (initial baseline).

No behavior change. Sets up the version-tracking contract that
.agents/test/pr-review-engine.test.sh (Phase 5) will assert on.

TIB: TIB-2026-05-21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eferences

- .agents/lib/scripts/build-changed-lines.ts: parses `git diff --unified=0`
  from stdin, emits CHANGED_LINES map. Handles deletion-only hunks, pure
  renames (empty array), and file deletions (absent from map).
- .agents/lib/scripts/validate-findings.ts: applies WHAT/FIX schema check,
  file-out-of-scope drop, ±15-line tolerance window, and Markdown
  fenced-code-block FP filter in one pass. Emits {kept,dropped,failed,counts}.
- .agents/references/{changed-lines,scope-filter,calibration}.md:
  authoritative documentation for the edge cases and the tolerance
  rationale. Linked from pr-review-base.md and from the per-agent envelope.
- .agents/lib/pr-review-base.md: Step 3 now pipes `git diff --unified=0`
  into build-changed-lines.ts; Step 6.1 routes findings through
  validate-findings.ts; Step 5 envelope contract injects CHANGED_LINES.
- biome.json: narrow `!**/lib` exclude to `!packages/**/lib` so the engine
  scripts under .agents/lib/scripts/ are linted (build outputs in
  packages/*/lib remain excluded).

Both scripts: Node stdlib only, no new deps. Strict TS, NodeNext, zero
`any`. Native Node 24 strips types — no tsx, no transpiler.

TIB: TIB-2026-05-21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pr-review-base.md Step 5: per-agent contract now mandates `WHAT:` and
  `FIX:` clauses in every finding's `description`. Missing clauses route
  to `failed[]` (counted in <FAILED_AGENTS>) — never silently dropped.
- pr-review-base.md Step 5: `line` schema spelled out — positive integer
  inside CHANGED_LINES for the file, or within ±15 lines.
- pr-review-base.md Step 6.2: agent-failure list now includes the schema
  failures (missing WHAT/FIX, malformed shape) routed via Step 6.1.
- pr-review-base.md output contract: adds <DROPPED_FINDINGS> (with
  drop_reason and distance_to_nearest_changed_line) and <DROPPED_COUNTS>.
- pr-review-gh.md: renders dropped findings as a collapsible <details>
  audit-trail block when DROPPED_FINDINGS is non-empty; writes full JSON
  to /tmp/pr-review-gh-<PR>-dropped.json regardless.
- pr-review-local.md: prints a terminal audit-trail block (counters only)
  between findings and sentinel; writes full JSON to /tmp/.
- pr-review-ci.md: writes dropped JSON to /tmp/ for artifact upload; does
  NOT render in the formal GitHub review body — keeps the verdict tight.

TIB: TIB-2026-05-21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…personas

ci-release-security.md was a single combined persona firing whenever any
of CI / release / dependency files were touched. Split into three
narrowly-triggered personas, each anchored to a discrete §10 sub-rule
cluster:

- ci-security.md         (trigger: <HAS_WORKFLOWS>) — workflow injection,
                         pull_request_target + PR-head checkout, ACL gating,
                         action pinning, permissions: scopes, secret exposure.
- release-integrity.md   (trigger: <HAS_RELEASE>) — publish-flow integrity
                         (--provenance, org tokens), release-commit signing
                         & write-token hardening, Changesets / release-bot
                         wiring.
- dependencies.md        (trigger: <HAS_DEPS>) — lockfile drift, new-dep
                         hygiene (postinstall, typosquats, unpinned ranges),
                         .npmrc and pnpm-workspace.yaml hardening.

Engine prose:
- pr-review-base.md Step 4: <HAS_CI_RELEASE> retired; replaced with
  <HAS_WORKFLOWS>, <HAS_RELEASE>, <HAS_DEPS>.
- pr-review-base.md inputs table: adds <EXCLUDE_AGENTS> (forward-compat
  hook for future orchestrators).
- pr-review-base.md Step 5: new sub-step 3 applies the EXCLUDE_AGENTS
  filter before launching personas in parallel.
- pr-review-base.md Step 5 inventory: three new conditional rows replace
  the single ci-release-security row.

AGENTS.md §10:
- Inventory table: three rows replace the single conditional row.
- Header note: persona-split + bundled-scripts + references/ directory.
- Section title and trailing "Applied by personas" pointer updated to
  name all three personas.
- The §10 rules themselves are UNCHANGED — only the inventory tables move.

Cross-persona out-of-scope references in code-quality, style-conventions,
test-coverage, web3-security, and pr-fix.md are updated to point at the
new persona names (no rule changes).

TIB: TIB-2026-05-21 (Phase 4 of 5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Asserts five structural invariants the rest of the engine relies on:

  1. Every persona file has parseable frontmatter with the required
     fields (name, kind, version, applies). `version` is semver. `kind`
     is baseline or conditional. Conditionals also declare `trigger:`.
  2. Every conditional persona's `trigger:` flag is computed in
     pr-review-base.md Step 4. Catches the easy-to-miss drift where
     someone adds a new conditional persona but forgets to wire the
     <HAS_*> flag in the engine.
  3. The engine (.agents/lib/pr-review-base.md) carries the
     `disable-model-invocation: true` + `kind: engine` + semver
     `version` frontmatter — the contract that hides it from the
     slash-command surface.
  4. Both bundled scripts (.agents/lib/scripts/{build-changed-lines,
     validate-findings}.ts) exist and parse-execute under native Node.
     Cheap smoke test that catches obvious regressions without
     spawning the scripts for real-world output.
  5. AGENTS.md §10 persona-inventory tables ↔ on-disk personas match
     in both directions. No persona orphaned in either place.

Wired as `pnpm test:agents`. Bash-free, Node stdlib only, no test
runner — exits non-zero on any violation. Advisory initially; flip
to a blocking CI step once stable.

TIB: TIB-2026-05-21 (Phase 5 of 5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two operational updates that surfaced during execution of TIB-2026-05-21
(decisions unchanged — TypeScript scripts, deterministic scope filter,
audit trail, etc. all land as originally accepted):

1. Drop tsx devDep in favour of native Node 24 type stripping. Node's
   built-in --experimental-strip-types (stable since 22.6) handles the
   scripts' subset; no transpiler dep needed. Engine prose, references,
   and Phase-5 test all invoke `node <script>.ts`.

2. Skip Phase-2 colocated unit tests; rely on the Phase-5 invariant test
   and dev-time end-to-end smoke runs. Scripts are thin CLI wrappers;
   adding vitest project-glob wiring for a 2-file directory wasn't worth
   the config surface. Follow-up TIP can add them if scripts grow.

TIB: TIB-2026-05-21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0xbulma 0xbulma added the enhancement New feature or request label May 21, 2026
@0xbulma 0xbulma self-assigned this May 21, 2026
Per maintainer request, remove citations to the specific external
Claude Code review-engine plugin (its GitHub repo URL was named throughout).
The architectural learnings stand on their own — we describe them as
patterns adopted from "a separate Claude Code review-engine plugin"
without naming the source repo.

- Rename: TIB-2026-05-21-pull-claude-skills-learnings-... →
          TIB-2026-05-21-pr-review-engine-scope-filter-and-audit-trail.md.
- H1 + body prose: drop the repo namedrops and URL links.
- References section: remove external-repo URLs; add pointers to the
  in-repo artifacts the TIB produced (scripts/, references/).
- @0xbulma as author handle is normal attribution and stays.

Decision and content of the TIB are unchanged. Only the framing prose
that identified the external repo by name was scrubbed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0xbulma 0xbulma changed the title feat(agents): pull claude-skills learnings into PR review engine feat(agents): PR review engine — scope filter, schema, audit trail May 21, 2026
0xbulma and others added 2 commits May 21, 2026 16:18
Group the engine, agents, references, scripts, and test under a single
directory matching the Anthropic Skills convention (without going full
plugin). Decision unchanged — same engine, same agents, same scope-filter
contract; only the on-disk layout moves.

Old → new:
  .agents/lib/pr-review-base.md   → .agents/pr-review-engine/SKILL.md
  .agents/personas/               → .agents/pr-review-engine/agents/
  .agents/references/             → .agents/pr-review-engine/references/
  .agents/lib/scripts/            → .agents/pr-review-engine/scripts/
  .agents/test/                   → .agents/pr-review-engine/test/

Within the engine:
  - pr-review-base.md             → SKILL.md
  - pr-review-engine.test.ts      → invariants.test.ts (more accurate name —
                                    it asserts engine invariants, not a
                                    generic unit-test)

Cross-reference updates:
  - AGENTS.md §10 backlinks (24 path occurrences) → new paths.
  - .agents/commands/pr-review-{ci,gh,local}.md and pr-fix.md → new paths.
  - Internal links in agents/*.md (was personas/) — `../../AGENTS.md` →
    `../../../AGENTS.md` (one extra level up).
  - references/*.md — `../lib/scripts/` → `../scripts/`, `../lib/pr-review-base.md`
    → `../SKILL.md` (now sibling under pr-review-engine/).
  - invariants.test.ts internal constants: PERSONAS_DIR → AGENTS_DIR,
    BASE_PATH → SKILL_PATH, plus updated REPO_ROOT depth and the regex
    that scans AGENTS.md persona-link URLs.
  - package.json test:agents → node .agents/pr-review-engine/test/invariants.test.ts
  - documentation.md persona body → references new paths.

Engine frontmatter: name pr-review-base → pr-review-engine, version 0.1.0
→ 0.2.0. Persona vocabulary kept in prose (matches AGENTS.md's "Applied by
personas: …" backlinks); only the directory rename adopts the convention
name "agents".

Verification:
  $ pnpm test:agents
  ✓ persona frontmatter (11 files)
  ✓ conditional triggers wired (3 flags)
  ✓ engine frontmatter (SKILL.md)
  ✓ bundled scripts parse (2 files)
  ✓ AGENTS.md §10 ↔ on-disk personas match (11 files)

Biome clean on the 3 TS files in the new location.

TIB addendum (2026-05-21) added to document the post-hoc layout change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per maintainer request: every script gets a colocated test. Adds 99 cases
across three vitest files, all running under the new pr-review-engine
project glob in vitest.config.ts:

  scripts/build-changed-lines.test.ts (12 cases)
  - Empty diff → {} ; single + multi hunks → sorted-deduped lists.
  - Default count=1 when omitted in the hunk header.
  - Deletion-only hunks (+0) → file in map with [].
  - Pure renames + rename-with-content → new path captured.
  - File deletions (+++ /dev/null) → absent from map.
  - New files (post-image lines starting at 1) → captured.
  - Multiple files → separate keys.
  - DiffParseError class is exported, named, and round-trips offendingLine.

  scripts/validate-findings.test.ts (32 cases)
  - findFencedBlocks: no fence / matched ``` / matched ~~~ / unclosed /
    multiple / indented (≤3 spaces).
  - Schema rejections: missing WHAT, missing FIX, invalid severity,
    non-positive line, non-integer line, missing description, missing or
    empty file, non-object findings. All route to failed[].
  - Scope filter: kept / file_out_of_scope / line_pre_existing (with
    distance_to_nearest_changed_line) / ±15 adjacent tolerance /
    pure-rename short-circuit / path normalization (./ a/ b/ prefixes).
  - Markdown fence FP filter: line inside fence / outside / on delimiter
    line / non-.md skip / missing-file fallback.
  - Counts: severity-by-severity tally, dropped findings excluded.
  - FindingsParseError class is exported and named.

  test/invariants.test.ts (55 cases — many test.each over 11 personas)
  - Converted from a custom node:test-style script to vitest.
  - Persona frontmatter: required fields, valid kind, semver version,
    conditional iff trigger.
  - Conditional trigger flags exist in SKILL.md Step 4 (parameterised).
  - Engine frontmatter: kind=engine, disable-model-invocation=true,
    semver version, parseable.
  - Bundled scripts import cleanly and expose their public surface
    (stronger than the prior `node --check` parse-only smoke).
  - AGENTS.md §10 ↔ on-disk personas: bidirectional inventory match.

Plumbing:
  - vitest.config.ts: new `pr-review-engine` project entry.
  - package.json test:agents: `vitest run --project=pr-review-engine`
    (was: a one-shot node script invocation). Bare `pnpm test` now also
    picks the engine tests up automatically.

TIB addendum (2026-05-21) updated: the prior "tests deferred" addendum
is superseded by this one — colocation rule from AGENTS.md §5 is met.

Verification: 99 / 99 passing, ~200ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant