A verification toolchain for TypeScript. Write ordinary TypeScript with //@ specification annotations. The toolchain generates verifiable code from your TypeScript — either in Dafny or Lean 4 (with Velvet/Loom).
See SPEC.md, DESIGN.md, and GETTING_STARTED.md.
This is a Tech Preview: the core idea is there, but support, semantics, and ergonomics are still evolving.
See our blog post.
Each example and case study is verified in Lean 4 and/or Dafny from the same annotated TypeScript source.
See the internal examples.
See the external case studies:
- collab-todo-lemmascript — collaborative task management web app (React + Supabase) with a verified domain model. Single
domain.tsimported directly by the UI, hooks, and edge functions — no adapter layer. 123 Dafny lemmas (120 in a separatedomain.proofs.dfy): 16-conjunct invariant preserved across 25 single-project + 3 cross-project actions, NoOp completeness/soundness, initialization. Dafny only. - colorwheel-lemmascript — verified color palette generator with mood + harmony constraints. 31 Lean proofs + 18 behavioral properties, 115 Dafny lemmas (invariant preservation, commutativity, NoOp completeness).
- clear-split-lemmascript — greenfield verified expense splitting web app. Conservation theorem, invariant preservation, delta laws — all proven in both Lean (no sorry) and Dafny (56 lemmas).
- github-star-checker-lemmascript — small verified CLI that tracks GitHub star counts across repos and reports per-run deltas. Verifies: per-row diff correctness and
totalDiff == sumDiffs(rows)(via an inductiveSumDiffs_appendlemma); three sign-classified extractors (gainers / losers / unchanged) with soundness, completeness, ordered completeness (gainers appear in the notification in the same order they were listed on the command line), and count/sum equalities against prefix-indexedupTohelpers; conservation theoremdecompose(r)— the three splits partition every row exactly once, andsumDiffs(increases) + sumDiffs(decreases) == totalDiff. 33 Dafny VCs, 0 errors; proof additions include a head/tail bridge (sumDiffs↔sumDiffsUpTo) and two partition-on-n inductions. Dafny only. - equality-game-lemmascript — greenfield verified arithmetic equality card game (React + Tailwind). Sound + complete decision procedure for "can these two card lists be combined into equal expressions":
canEqualize(L, R) ⟺ ∃ eL, eR. eval(eL) == eval(eR) ∧ multiset(leaves(eL)) == multiset(L) ∧ same for R. Algorithm is subset-DP over a bitmaskm ∈ [1, 2^n − 1); the proof composes aPopCountupper/lower bound chain (with stdlibLemmaDivDenominator/LemmaFundamentalDivModConverse), asplitLeft/splitRight↔ imperative-loop connection, aWitnessCombinelemma threading existentialExprwitnesses through the cross-product loops, and aChooseMaskcombinatorial constructor that, given any sub-multiset ofcards, produces the realizing mask. Capped byCompletenessFromMaskCoverage. 753 verification conditions, 0 errors, 0assumes, 0 axioms under--isolate-assertions --verification-time-limit 180. Dafny only. - talktimer-lemmascript — verified talk timer React app, ported from a Dafny-only
talktimer-lemmafittwin. 17-variantActionstate machine + verifiedHistory(undo/redo/preview/commitFrom) all in onedomain.ts— the original Dafny'sDomain refines Kernelabstract-module pattern inlined since LS has no abstract modules. 108 VCs indomain.dfy(invariant preservation) + 123 indomain.proofs.dfy(behavioral lemmas + Kernel round-trip). Dafny only. - node-casbin-lemmascript — brownfield verification of node-casbin. 5 functions verified, 217 existing tests pass. End-to-end correctness and order independence for all 4 effect modes in both Lean and Dafny (39 lemmas).
- hono-lemmascript — brownfield verification of hono's security middleware. Four CVEs covered: IP restriction bypass (CVE-2026-39409) and cookie name bypass (CVE-2026-39410) — 51 Dafny lemmas, cookie verification in-place; plus
serveStatic's URL-encoded directory traversal (CVE-2024-32869) + repeated-slash bypass (CVE-2026-39407), proved as a composition —decode(rawPath)beforecheck(decoded), so a buggy implementation that reordered the steps would fail the proof. First use of//@ assume+//@ havoc-on-assign. Dafny only. - charmchat — brownfield verification of an AI agent orchestration backend.
isEmptyResult(string emptiness predicate, 8 postconditions, <1s) andtopologicalSort(Kahn's algorithm — memory safety, output bounds, completeness via acyclicity ranking witness, termination). Full completeness proof: 23 helper lemmas, 14 opaque ghost predicates, 115 loop invariants; 736 VCs verified under--isolate-assertions --verification-time-limit 600. Key technique: snapshot-based inner invariants (ghost var originalRemDeps := remDeps) replace the mid-iteration SEEN/UNSEEN split so preservation is frame reasoning against a ghost-constant rather than set-subtraction against mutating state. Dafny only. - xyflow-lemmascript — brownfield verification of xyflow's core edge and geometry utilities. 9 functions verified:
addEdge(dedup — never loses edges, adds at most one),reconnectEdge(semantic: under a unique-id precondition, the result is in-place —|result| ≤ |edges|, no insertion — and when a matching edge existed with non-empty new endpoints, the output contains an edge with those endpoints. Uses//@ assumeto characterize destructuring,find, and the constructed edge),connectionExists,getEdgeCenter(midpoint correctness),clamp(bounds),rectToBox/boxToRect(field arithmetic),getBoundsOfBoxes(enclosure),getOverlappingArea(non-negative),areSetsEqual(subset + same size). 14 Dafny proof obligations. Dafny only. - rallly-lemmascript — brownfield verification of rallly's meeting-poll Next.js app. 2 functions:
validateRedirectUrl(in-place — open-redirect predicate; non-undefinedoutputs start with/but not//) andscorePoll(extracted ranking core — length preservation, score bounds, top-choice characterization, score-formula equality, within-poll monotonicity, tiebreaker injectivity). The injectivity proof surfaced a real spec-level constraint on the existing(yes + ifNeedBe) * 1000 + yesencoding: it overflows when an option has ≥ 1000 yes votes. 10 Dafny VCs, 0 errors. Drove four toolchain additions:s.startsWith(),T | nullnullability,\resultnarrowing under==>,Math.max(...arr)spread. Dafny only. - opencode-lemmascript — brownfield verification of opencode's permission system and unified-diff patch parser. Highlights: (1)
Patch.parsePatchcarries conservation loop invariants over local ghost state — a parser bug here would silently corrupt user files when an AI applies a patch, and (2) the permission-engine work mechanically closes opencode bug #26514 (subagents bypassing Plan Mode's file-edit restrictions). 9 functions verified in-place, 0 errors. Dafny only. - pi-lemmascript — brownfield, in-place verification of the context-compaction cut-point selector in pi (the earendil-works agent harness). When the context window fills, pi discards history before a chosen cut; a provider API rejects a retained prefix containing an orphaned
toolResult(a tool result whose tool call was cut away). Both selector functions proven: the cut never lets the kept suffix start with — nor split a tool-use/tool-result run into — an orphaned tool result, even across the backward metadata snap. The no-orphan result forced the session tree's tool-pairing ordering into an explicitrequires. 4 VCs, 0 errors. Drove five toolchain additions, headlined by an opaque fall-through type: a union LemmaScript can't discriminate (here an array-element union of unreachable imports) becomes a single opaquetype— the field stays present so distinct values stay distinct, and with no constructor or tag predicate it can only be passed through, never unsoundly observed. Dafny only. - balanced-match-lemmascript — brownfield, in-place verification of balanced-match, the ~70-line balanced-bracket finder pulled in by
npm,webpack, and most of the JS tooling stack (1B+ downloads/month). The stack-basedrangecore is verified by refinement: a pure recursive specrange_specmirrors the loop one branch per recursive case, so the single equivalencerange == range_spectransfers every property automatically — including an unconditional Dyck body-balance theorem for the interior of every returned pair. 2233 VCs, 0 errors under--isolate-assertions(registered on thedafny-slowtrack). Dafny only. - guardians-lemmascript — greenfield verification of the core safety argument behind Guardians (Erik Meijer, "Guardians of the Agents", CACM Jan 2026), a generate-verify-execute checker for AI-agent workflows. Instead of verifying an app, it proves the agent guardrail itself sound — that a static taint/automaton check over the real recursive workflow AST can never admit an unsafe plan. Highlights: taint over nested conditionals as a sound branch-union over-approximation; per-source provenance with a join (a multi-input tool is tainted if any input was); unbounded loops discharged by a one-step pre-fixpoint (
sat = t0 ‖ bodyTaint(t0)) that bounds taint over any iteration count without iterating to a fixpoint; and a unified capstone (verifyWfSound) — one clean verdict rules out, on every path, both a tainted-data-to-sink leak and a security-automaton error. 54 Dafny obligations, 0 errors. The verified cores are reached from a Guardians-styleWorkflow/Policythrough a thin unverified adapter, differentially tested against the real Python Guardians (used as the oracle, not a porting target). Dafny only. - quorum-lemmascript — greenfield verified when2meet/Doodle-style group scheduler (React + Cloudflare Durable Objects), with one
domain.tsrunning unchanged in the browser, the in-app query, and the server. The standout is that the proof licenses the architecture:countFreeis a homomorphism from participant-list concatenation to integer addition (so the heatmap is order-independent) plus same-participant last-writer-wins convergence — which is exactly what makes the lock-free, no-login, optimistic multi-device backend safe, with the Durable Object and the browser applying the same verifiedapplyOp(server-authoritatively, client-optimistically) with no rollback or operational transform. Also: heatmap is exactly the per-slot count andisBestexactly its argmax; monotonicity; invariant-preserving mutations + op-logreplay; a sparse export codec round-trip; an in-appwhoIsFree(e, s)whose length provably equals the cell's count; and a separategrid.tsproving the(day, time) → slotmap in-range + injective — which makes specific-dates-vs-days-of-the-week pure shell labeling at zero proof cost; and full element-level permutation invariance (heatmapPermInvariant— the heatmap depends only on the multiset of participant rows), which drove theperm(...)spec predicate into LemmaScript itself. 100 Dafny VCs (90 + 10), 0 errors. The aggregate is proven; the React UI, WebSocket/DO I/O, and timezone labeling are the stated trust boundary. Dafny only. - quota-lemmascript — greenfield verified booking app: providers publish a page of limited-capacity featured slots, signed-in users grab them (React + Cloudflare Durable Objects + D1). A deliberate inverse of quorum-lemmascript: where Quorum counts up to a threshold over data partitioned per participant (no conflicts ⇒ optimistic, lock-free, no rollback), Quota's bookings contend for shared inventory, so the load-bearing fact flips from a count to a bound — for every slot
j,confirmedCount(bookings, j) <= slots[j].capacity— and so does the concurrency story: the samedomain.tsruns in the browser and the Durable Object, but server-authoritatively (it never oversells under contention; no optimistic client apply). Proven: no overbooking (invariant preservation acrosstryBook/ cancel), accept-iff-room, an idempotent three-waytryBook(a retry reads as success, not rejection — only "confirmed" mutates), cancellation frees seats, replay determinism, and full order-invariance of availability under contention —confirmedCountPerm/hasRoomPermInvariantshow availability depends only on the multiset of the booking log (any reordering, not just a pairwise swap), via the sameperm(...)predicate Quorum drove into LemmaScript. NDJSON export is built on the verifiedconfirmedOnly. The counting kernel is Quorum'scountFreere-pointed at bookings — total and precondition-free so it composes. 80 Dafny VCs, 0 errors. Trust boundary (stated plainly): auth, the React UI, WebSocket/DO/D1 I/O, email, slot date/time labeling, and abuse/rate-limiting. Dafny only. - henri-lemmascript — a runnable TypeScript coding-agent CLI (a port of henri; multi-provider, Anthropic + AWS Bedrock) whose security- and protocol-critical core is verified and imported directly by the live agent —
decide()gates every real tool call and the conversation invariant is asserted on every turn (it streams end-to-end against Bedrock). Three modules, 48 Dafny VCs, 0 errors. (1) Permission gate: soundness (decide == Allow ⟺ isAllowed), path-traversal containment — auto-allow-in-cwd can never resolve outside cwd, with./..normalization proved in-core so the shell is trusted only toresolve().split('/')— grant monotonicity, andrejectPromptsis deny-only. (2) Conversation protocol: tool-call/result pairing plus the pi-lemmascript-style no-orphaned-tool_resultproperty, proved as an invariant preserved by the loop —wellFormed(msgs + [assistant(calls), tool(makeResults(calls))])— not checked after the fact. (3) Hook/config merge: removal, tool-name uniqueness — a fix (henri concatenated hook tool lists with no dedup, so two hooks could shadow a name), order-independence, and additivity composed cross-module with the gate's monotonicity (merging only grows the allow-sets, which by P3 never revokes anAllow). The merge is verified in place via//@ declare-type Tool { name: string }, shadowing the realTool's function-valuedexecuteso the actualmergeTools(Tool[])is the proof target rather than a parallel model. Dafny only.
Prerequisites: Node.js >= 18. For the Lean backend: elan. For the Dafny backend: Dafny >= 4.x.
Install from npm:
npm install lemmascriptOr from source:
git clone https://github.com/midspiral/LemmaScript.git
cd LemmaScript && npm install && npm run buildLean backend additionally requires the Loom and Velvet forks:
git clone https://github.com/namin/loom.git -b lemma ../loom
git clone https://github.com/namin/velvet.git -b lemma ../velvetnpx lsc gen --backend=dafny src/myModule.ts
npx lsc check --backend=dafny src/myModule.ts
npx lsc regen --backend=dafny src/myModule.tsThe Dafny backend generates two files per TS source: foo.dfy.gen (always regeneratable) and foo.dfy (source of truth, with LLM/user proof additions). The diff between them must be additions-only.
npx lsc gen --backend=lean src/myModule.ts
lake build//@ requires arr.length > 0
//@ ensures \result >= -1 && \result < arr.length
//@ invariant 0 <= i && i <= arr.length
//@ decreases arr.length - i
//@ type i natFor the full surface, see SPEC.md.
| File | Generated? | Purpose |
|---|---|---|
| .ts | — | TypeScript source with //@ annotations |
| .dfy.gen | Yes | Generated Dafny (merge base, always regeneratable) |
| .dfy | Yes (initial) | Annotated Dafny (gen + proof additions) |
| File | Generated? | Purpose |
|---|---|---|
| .ts | — | TypeScript source with //@ annotations |
| .types.lean | Yes | Lean types, namespace Pure defs |
| .spec.lean | No | Ghost definitions, helper lemmas |
| .def.lean | Yes | Velvet method definitions |
| .proof.lean | No | prove_correct with proof tactics |