A small, reusable Claude Code / agent skill that runs a disciplined reflect → critique → improve loop over an artifact — a plan / strategy / doc / prompt / skill / research design / SOP, or a feature/code change — and iterates until the artifact is relatively perfect or a round/time budget runs out.
It's a lightweight, project-grounded variant of Self-Refine (Madaan et al. 2023) and Reflexion (Shinn et al. 2023): a fixed loop skeleton, evidence-anchored critique (read the real project + logs and go pull external evidence from GitHub / the web), an explicit guard against Reflexion's documented degeneration-of-thought failure mode, a visible changelog, and a learnings loop that compounds across runs.
Single-LLM self-critique loops tend to (a) critique in a vacuum, (b) re-litigate the same points round after round ("degeneration-of-thought"), (c) bloat the artifact, and (d) never honestly converge. plan-refine is the SOP that fixes those: each round must produce a substantive change or declare convergence; each round attacks an angle not yet used; reverting a prior change is checked against the changelog to avoid ping-pong; the artifact is kept tight (long checklists live in a reference file, not the SKILL.md); and the stop condition is operationalised, not vague.
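The "revert is fine, ping-pong isn't" check against the changelog can be sketched in a few lines. This is only an illustration, not code shipped with the skill: the `Change` record, the `classify_change` helper, and the rule "a second return to an already-abandoned state counts as ping-pong" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical changelog entry: what was changed, and to which state."""
    round_no: int
    target: str  # the part of the artifact that changed
    value: str   # the state that part was set to


def classify_change(changelog: list[Change], proposed: Change) -> str:
    """Label a proposed change as 'new', 'no-op', 'revert', or 'ping-pong'."""
    # Every state this target has been set to so far, in order.
    states = [c.value for c in changelog if c.target == proposed.target]
    if states and states[-1] == proposed.value:
        return "no-op"  # already in that state
    if proposed.value not in states:
        return "new"    # never tried before
    # Assumed rule: the first return to an old state is a legitimate
    # correction; once the target has already gone back once, another
    # return is treated as oscillation on a settled point.
    already_reverted = len(set(states)) < len(states)
    return "ping-pong" if already_reverted else "revert"


log = [
    Change(1, "perspective swap", "every round"),
    Change(2, "perspective swap", "trigger-based"),
]
first = classify_change(log, Change(3, "perspective swap", "every round"))
log.append(Change(3, "perspective swap", "every round"))
second = classify_change(log, Change(4, "perspective swap", "trigger-based"))
print(first, second)  # revert ping-pong
```

In practice the agent performs this check by reading the changelog, not by running code; the sketch only makes the distinction concrete.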
Step 0 Ground — read the project + logs (internal); pull GitHub/web evidence (external); read past learnings.
Step 1 Classify + v1 — code change → Mode B; anything else → Mode A; mixed/meta → both. Draft v1 if none given.
Step 2 Critique — adversarial, from a not-yet-used angle. Each finding = [where] + [what's wrong] + [why/evidence+source] + [proposed fix]. (loads reference/critique-playbook.md)
Step 3 Improve — apply the findings; bump version; append a changelog entry. Reverting is fine; ping-ponging isn't.
Step 4 Stop check — stop if: cosmetic-only findings / only re-raising old points / budget hit / 2 rounds no substantive change / "relatively perfect". Else → Step 2. When unsure, bring in a fresh perspective (a different model / a critic agent / an adversarial persona).
Step 5 Wrap up — append 1–2 lines to the learnings log; output the polished artifact (version header + changelog) + a per-round critique→change summary.
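Steps 2 to 4 above can be read as a small control loop. The sketch below is a rough Python rendering under stated assumptions, not the skill's implementation (the skill is a process document that the agent follows). The names `Finding`, `stop_reason`, `refine`, the `substantive` and `repeat` flags, and the toy critic are all hypothetical. Grounding (Step 0), mode classification (Step 1), the wrap-up (Step 5), and the "relatively perfect" test are left out, and an empty findings list is treated the same as cosmetic-only findings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    """One critique item: [where] + [what's wrong] + [why/evidence] + [fix]."""
    where: str
    whats_wrong: str
    evidence: str
    proposed_fix: str
    substantive: bool = True  # False for cosmetic-only findings
    repeat: bool = False      # True if it re-raises an earlier point


def stop_reason(findings: list[Finding], round_no: int,
                max_rounds: int, stale_rounds: int) -> Optional[str]:
    """Return why the loop should stop, or None to run another round."""
    if not any(f.substantive for f in findings):
        return "cosmetic-only or no findings"
    if all(f.repeat for f in findings):
        return "only re-raising old points"
    if stale_rounds >= 2:
        return "2 rounds without substantive change"
    if round_no >= max_rounds:
        return "budget hit"
    return None


def refine(artifact: str,
           critique: Callable[[str, int], list[Finding]],
           improve: Callable[[str, list[Finding]], str],
           max_rounds: int = 5) -> tuple[str, list[str], str]:
    """Critique, improve, and stop-check until a stop condition fires."""
    changelog: list[str] = []
    version, stale, reason = 1, 0, "budget hit"
    for round_no in range(1, max_rounds + 1):
        findings = critique(artifact, round_no)
        actionable = [f for f in findings if f.substantive and not f.repeat]
        if actionable:
            artifact = improve(artifact, actionable)
            version += 1  # bump version and log the change
            fixes = "; ".join(f.proposed_fix for f in actionable)
            changelog.append(f"v{version} (round {round_no}): {fixes}")
            stale = 0
        else:
            stale += 1
        reason = stop_reason(findings, round_no, max_rounds, stale)
        if reason:
            break
    return artifact, changelog, reason


# Toy critic and improver, purely for demonstration.
def toy_critique(text: str, round_no: int) -> list[Finding]:
    if "TBD" in text:
        return [Finding("line 1", "placeholder left in",
                        "a plan needs a date", "replace TBD with a date")]
    return []


def toy_improve(text: str, findings: list[Finding]) -> str:
    return text.replace("TBD", "ship on Friday")


final, changelog, reason = refine("Plan: TBD", toy_critique, toy_improve)
print(final)      # Plan: ship on Friday
print(changelog)  # ['v2 (round 1): replace TBD with a date']
print(reason)     # cosmetic-only or no findings
```

The budget here is counted in rounds because a round count is something the loop can observe directly.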
Mode A (non-code artifacts): critique boldly raises alternatives and argues — with evidence — toward the best final version; surfaces a "minimum viable core" each round. Mode B ("change one thing, the whole thing moves" — product self-maintenance): map the blast radius first, fix all the connected parts in the same round, then critique soundness; a huge blast radius is itself a finding. For systems the user depends on daily, output a diff for review rather than refactoring in place.
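As a rough illustration of the Step 1 classification rule and of "map the blast radius first", the sketch below picks the mode(s) and walks a dependency map outward from the changed part. It is not the mapping method from reference/critique-playbook.md. The `dependents` map, the file names, and the `HUGE_RADIUS` threshold are invented for the example.

```python
from collections import deque


def classify(touches_code: bool, touches_non_code: bool) -> set[str]:
    """Code change -> Mode B; anything else -> Mode A; mixed -> both."""
    if touches_code and touches_non_code:
        return {"A", "B"}
    return {"B"} if touches_code else {"A"}


def blast_radius(dependents: dict[str, list[str]], changed: str) -> set[str]:
    """Everything reachable from `changed` by following 'depends on me' edges."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(changed)  # report only the connected parts, not the change itself
    return seen


# Hypothetical project: which parts depend on which.
dependents = {
    "auth.py": ["api.py", "cli.py"],
    "api.py": ["docs/api.md"],
    "cli.py": [],
}
HUGE_RADIUS = 10  # assumed threshold; pick one that suits the project

affected = blast_radius(dependents, "auth.py")
print(sorted(affected))  # ['api.py', 'cli.py', 'docs/api.md']
if len(affected) >= HUGE_RADIUS:
    print("finding: the blast radius itself is too large")
```

Every part in `affected` is a candidate for fixing in the same round as the change; the threshold check mirrors the rule that a huge blast radius is itself a finding.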
Full attack-angle checklists, blast-radius mapping method, external-grounding sources, anti-degeneration tactics, prior art, and the learnings log are in reference/critique-playbook.md.
plan-refine was used to refine itself. Abridged trace (the per-round critique → change summary it emits):
Round 1 (external ammo: Self-Refine / Reflexion / Anthropic skill best-practices)
critique: it's a self-critique loop with no degeneration-of-thought guard (Reflexion's
documented failure mode); critique items lack "where + how to fix"; the SKILL.md
mixes process + long checklists (violates the official "process only, context in
reference files" guidance); no learnings loop; stop condition is vague; "multi-task"
isn't really delivered.
→ change: added the degeneration guard (new angle each round + ping-pong check); changed
the finding format to [where] + [what's wrong] + [why/evidence+source] + [fix];
split the long checklists into reference/critique-playbook.md; added a learnings
loop; sharpened the stop condition to 4 parallel tests; broadened Mode A from
"plans" to "any non-code artifact" + made mode-classification an explicit step.
Round 2
critique: "swap to a fresh perspective every round" is too heavy; a no-findings first round
usually means lazy grounding, not a perfect artifact; two reference files is one
too many.
→ change: made the fresh-perspective swap trigger-based (only when re-raising old points);
added the lazy-grounding warning; merged the learnings log into the playbook.
Round 3
critique: self-consistency check — does the skill follow its own rules? (has a changelog ✓,
a minimal core ✓, mode classification ✓, length ≈ the official guideline ✓)
→ change: none substantive → declared converged (tentative).
Round 4 (fresh-perspective pass, post-"converged" — "what's missing entirely")
critique: no rounds-count guidance; no large-artifact handling; "time budget" is a fiction
(the agent can't see a clock); "relatively perfect" was never operationalised;
the learnings log will grow unbounded; a huge blast radius is itself a finding.
→ change: added all six, concisely; offset the additions by moving the prior-art section
into the playbook (one cut per add — keep it lean).
Round 5
critique: only marginal — clarify that reverting a prior change is fine (correction) vs
ping-ponging (oscillating on a settled point).
→ change: that clarification → converged. (v0.5, "relatively perfect")
The same loop applies to a product strategy, a doc, a prompt, or a feature/code change — see SKILL.md.
Drop this repo's contents into a Claude Code skills directory:
git clone https://github.com/TTTTTToYYYYY/plan-refine.git ~/.claude/skills/plan-refine
(Or copy SKILL.md + reference/ into ~/.claude/skills/plan-refine/.) Restart Claude Code. Then invoke it with /plan-refine, or just say something like "iterate this plan and critique it for half an hour", "upgrade this feature, mind the ripple effects", or "polish X until it's basically perfect".
Works with any agent harness that loads SKILL.md-style skill files (Claude Code, OpenClaw, etc.) — it's a plain process document.
- Self-Refine — Madaan et al. 2023, arXiv:2303.17651, github.com/madaan/self-refine
- Reflexion — Shinn et al. 2023, github.com/noahshinn/reflexion
- Anthropic — Agent Skill authoring best practices — platform.claude.com/docs/.../agent-skills/best-practices
This skill is the project-grounded, plan-and-code, "relatively-perfect-or-budget" variant of those ideas.
MIT — see LICENSE.