plan-refine

A small, reusable Claude Code / agent skill that runs a disciplined reflect → critique → improve loop over an artifact — a plan / strategy / doc / prompt / skill / research design / SOP, or a feature/code change — and iterates until the artifact is relatively perfect or a round/time budget runs out.

It's a lightweight, project-grounded variant of Self-Refine (Madaan et al. 2023) and Reflexion (Shinn et al. 2023): a fixed loop skeleton, evidence-anchored critique (read the real project + logs and go pull external evidence from GitHub / the web), an explicit guard against Reflexion's documented degeneration-of-thought failure mode, a visible changelog, and a learnings loop that compounds across runs.

Why it exists

Single-LLM self-critique loops tend to (a) critique in a vacuum, (b) re-litigate the same points round after round ("degeneration-of-thought"), (c) bloat the artifact, and (d) never honestly converge. plan-refine is the SOP that fixes those: each round must produce a substantive change or declare convergence; each round attacks an angle not yet used; reverting a prior change is checked against the changelog to avoid ping-pong; the artifact is kept tight (long checklists live in a reference file, not the SKILL.md); and the stop condition is operationalised, not vague.

The loop (fixed skeleton — see `SKILL.md`)

Step 0  Ground       — read the project + logs (internal); pull GitHub/web evidence (external); read past learnings.
Step 1  Classify + v1 — code change → Mode B; anything else → Mode A; mixed/meta → both. Draft v1 if none given.
Step 2  Critique      — adversarial, from a not-yet-used angle. Each finding = [where] + [what's wrong] + [why/evidence+source] + [proposed fix]. (loads reference/critique-playbook.md)
Step 3  Improve       — apply the findings; bump version; append a changelog entry. Reverting is fine; ping-ponging isn't.
Step 4  Stop check    — stop if: cosmetic-only findings / only re-raising old points / budget hit / 2 rounds no substantive change / "relatively perfect". Else → Step 2. When unsure, bring in a fresh perspective (a different model / a critic agent / an adversarial persona).
Step 5  Wrap up       — append 1–2 lines to the learnings log; output the polished artifact (version header + changelog) + a per-round critique→change summary.

Mode A (non-code artifacts): critique boldly raises alternatives and argues — with evidence — toward the best final version; surfaces a "minimum viable core" each round. Mode B ("change one thing, the whole thing moves" — product self-maintenance): map the blast radius first, fix all the connected parts in the same round, then critique soundness; a huge blast radius is itself a finding. For systems the user depends on daily, output a diff for review rather than refactoring in place.

Full attack-angle checklists, blast-radius mapping method, external-grounding sources, anti-degeneration tactics, prior art, and the learnings log are in reference/critique-playbook.md.

What a run looks like

plan-refine was used to refine itself. Abridged trace (the per-round critique → change summary it emits):

Round 1 (external ammo: Self-Refine / Reflexion / Anthropic skill best-practices)
  critique: it's a self-critique loop with no degeneration-of-thought guard (Reflexion's
            documented failure mode); critique items lack "where + how to fix"; the SKILL.md
            mixes process + long checklists (violates the official "process only, context in
            reference files" guidance); no learnings loop; stop condition is vague; "multi-task"
            isn't really delivered.
  → change: added the degeneration guard (new angle each round + ping-pong check); changed
            the finding format to [where] + [what's wrong] + [why/evidence+source] + [fix];
            split the long checklists into reference/critique-playbook.md; added a learnings
            loop; sharpened the stop condition to 4 parallel tests; broadened Mode A from
            "plans" to "any non-code artifact" + made mode-classification an explicit step.

Round 2
  critique: "swap to a fresh perspective every round" is too heavy; a no-findings first round
            usually means lazy grounding, not a perfect artifact; two reference files is one
            too many.
  → change: made the fresh-perspective swap trigger-based (only when re-raising old points);
            added the lazy-grounding warning; merged the learnings log into the playbook.

Round 3
  critique: self-consistency check — does the skill follow its own rules? (has a changelog ✓,
            a minimal core ✓, mode classification ✓, length ≈ the official guideline ✓)
  → change: none substantive → declared converged (tentative).

Round 4 (fresh-perspective pass, post-"converged" — "what's missing entirely")
  critique: no rounds-count guidance; no large-artifact handling; "time budget" is a fiction
            (the agent can't see a clock); "relatively perfect" was never operationalised;
            the learnings log will grow unbounded; a huge blast radius is itself a finding.
  → change: added all six, concisely; offset the additions by moving the prior-art section
            into the playbook (one cut per add — keep it lean).

Round 5
  critique: only marginal — clarify that reverting a prior change is fine (correction) vs
            ping-ponging (oscillating on a settled point).
  → change: that clarification → converged. (v0.5, "relatively perfect")

The same loop applies to a product strategy, a doc, a prompt, or a feature/code change — see SKILL.md.

Install

Drop this repo's contents into a Claude Code skills directory:

git clone https://github.com/TTTTTToYYYYY/plan-refine.git ~/.claude/skills/plan-refine

(or copy SKILL.md + reference/ into ~/.claude/skills/plan-refine/). Restart Claude Code. Then invoke with /plan-refine, or just say things like "iterate this plan and critique it for half an hour", "upgrade this feature — mind the ripple effects", "polish X until it's basically perfect".

Works with any agent harness that loads SKILL.md-style skill files (Claude Code, OpenClaw, etc.) — it's a plain process document.

Prior art

Self-Refine — Madaan et al. 2023, arXiv:2303.17651, github.com/madaan/self-refine
Reflexion — Shinn et al. 2023, github.com/noahshinn/reflexion
Anthropic — Agent Skill authoring best practices — platform.claude.com/docs/.../agent-skills/best-practices

This skill is the project-grounded, plan-and-code, "relatively-perfect-or-budget" variant of those ideas.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
reference		reference
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

plan-refine

Why it exists

The loop (fixed skeleton — see `SKILL.md`)

What a run looks like

Install

Prior art

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

plan-refine

Why it exists

The loop (fixed skeleton — see SKILL.md)

What a run looks like

Install

Prior art

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

The loop (fixed skeleton — see `SKILL.md`)

Packages