17 changes: 16 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,25 @@
# Changelog

## 0.0.11-beta.3 — 2026-05-25
## 0.0.11-beta.3 — 2026-05-31

### Features
- Add email-OTP auth wired to the Rust `failproof-api-server` (`/v0/auth/login/request`, `/login/verify`, `/token/refresh`, `/logout`, `/me`). New `failproofai auth --login | --logout | --whoami` CLI subcommand (`src/auth/cli.ts`, dispatched from `bin/failproofai.mjs`) persists tokens to `~/.failproofai/auth.json` at mode `0600` via a shared store (`lib/auth/auth-store.ts` + `lib/auth/api-server-client.ts`); the store auto-refreshes the access token within a 60s leeway window and treats refresh-token reuse / 401 as "wipe local session". Four Next.js API routes (`app/api/auth/{status,login-request,login-verify,logout}/route.ts`) proxy the same flow for the dashboard so the refresh token never reaches the browser — only `{authenticated, user}` does. The "set a reminder" CTA in `/audit`'s `return-section.tsx` now probes `/api/auth/status` on mount and, for un-authed visitors, opens a new `AuthDialog` (`app/audit/_components/auth-dialog.tsx`, styled to match the audit aesthetic: pink corner-glyphs, dashed-frame backdrop, terminal mono inputs, masked OTP entry, live resend countdown, ESC-to-close) that walks email → OTP → "you are <email>" inline; signed-in users get a green "signed in as …" pill under the CTA. Configurable via `FAILPROOF_API_URL` (defaults to `http://localhost:8080`) and `FAILPROOFAI_AUTH_DIR` (defaults to `~/.failproofai`).

- `/audit` polish pass: simplify the "next audit" CTA to `[ install policies ]` copying the bare `failproofai policies --install` command (no longer appends per-slipping-policy short names); fix the `[ share → ]` header button to scroll to the Show-off section reliably by accounting for the sticky in-page `.app-header` height with a manual y-coord scroll + a `scroll-margin-top` fallback on `.showoff`; harden the "make poster" PNG export so the captured archetype frame no longer collides with the sigil / tagline — `show-off-cta.tsx` now `await document.fonts.ready` before capture, applies a `.capturing` class that locks every viewport-clamped font-size and grid column to an absolute value tuned for the 1100px capture width, drops `text-shadow` / `box-shadow` that html2canvas crops unpredictably, and captures with a 12px bleed on each side so the frame's corner accents and box-shadow survive the crop; and expand every archetype in `src/audit/archetypes.ts` from a single hand-written copy block to a multi-variant catalog (4–6 taglines, keyword sets, descriptions, signature blocks, "common in" / "primary risk" / closing lines per archetype, all 8 archetypes covered). A new `pickArchetypeVariant(key, seed)` picker deterministically selects one variant from each list via a djb2-seeded per-field hash mixed with a per-field axis, so the persona blurb stays stable across renders for a given seed but two different projects landing on the same archetype see different copy. `IdentitySection` consumes the resolved variant; the seed flows in from `audit-dashboard.tsx` as the inferred project name.

- Add an in-app `/audit` dashboard that turns the existing `failproofai audit` data into a personality-driven report. The page classifies every audited agent into one of 8 archetypes (`optimist`, `cowboy`, `explorer`, `goldfish`, `paranoid architect`, `precision builder`, `hammer`, `ghost`) via a weighted classifier (`src/audit/archetypes.ts`) that maps every builtin policy + every audit-only detector (47/47 coverage) to an archetype with a tuned weight. A scoring module (`src/audit/scoring.ts`) derives a 0-100 score with S/A/B/C/D/F grade thresholds, a projected-score uplift if every recommended policy were enabled, and a stable synthetic cohort rank. The page composes the following sections: Identity (archetype hero with 8x8 pixel sigil + meta grid), Show-off CTA, Strengths (real numbers derived from the audit), Score + cohort leaderboard with distribution histogram, Findings (per-policy cards with what happened / cost / evidence / fix), Prescribed Policies (with projected-score callout), and a "re-audit in 7 days" return loop. Every audit-only detector is now mapped to its closest real-time builtin policy as the prescribed fix (`findings.ts:DETECTOR_TO_POLICY`) so the report never carries an "audit-only — no real-time policy" framing. New dashboard cache at `~/.failproofai/audit-dashboard.json` (mode `0600`, single slot, helper at `src/audit/dashboard-cache.ts`); `AuditResult` schema bumped to version 2 with new fields `eventsScanned`, `projectsScanned`, `enabledBuiltinNames`. New routes `app/audit/page.tsx`, `app/api/audit/run/route.ts` (POST, in-process `runAudit()` call, module-scoped run lock that returns 409 on concurrent clicks), `app/api/audit/status/route.ts` (GET, drives client polling), and server action `app/actions/get-audit-result.ts` (cache read, mirrors `getHooksConfigAction`'s read-only contract). "Make poster" downloads a 2x PNG of the archetype frame via html2canvas. Navbar gains an Audit entry between Policies and Projects with a slipping-through count chip. 
Existing runtime policy enforcement is untouched — `policy-registry.ts` gets two additive exports (`getAllPolicies` / `setAllPolicies`) used only by the new `replay.ts:restoreReplay()` snapshot/restore so embedding `runAudit()` in a long-running process no longer wipes pre-existing registrations. Ports the brand team's design kit verbatim from `assets/audit/styles.css` (1235 lines, JetBrains Mono + VT323 via Google Fonts, Architype Stedelijk shipped locally under `public/audit/fonts/`).

- Stamp `product: "failproofai-oss"` on every PostHog event across all four telemetry channels — hooks/audit (`trackHookEvent`), server (`trackEvent`), web UI (`captureClientEvent`), and npm-lifecycle install/uninstall (`trackInstallEvent`) — so OSS events stay distinguishable from any future hosted surface. The value lives in a single `POSTHOG_PRODUCT` constant in `src/posthog-key.ts`, reused by the three TypeScript channels; the standalone `scripts/install-telemetry.mjs` inlines the same literal because it can't import the TS module at install time. Honors `FAILPROOFAI_TELEMETRY_DISABLED=1` like all other telemetry (#380).

- Polish pass across `/audit`, `/policies`, and `/projects`: bump base font from `13px → 14.5px` and widen `.report` from `1180px → 1380px` (with `40px` side padding) in `globals.css` so default-zoom readability stops requiring a browser zoom-in; restore `.section` vertical padding to `64px` to match the audit reference. Remove the second in-page audit `<header>` (`app/audit/_components/app-header.tsx` deleted) and all three of its mount sites in `audit-dashboard.tsx` — the global navbar plus per-section masts cover the same chrome without the duplicate `failproof_ai / AUDIT [share →]` strip. Rewrite `score-section.tsx` end-to-end: drop the synthetic cohort leaderboard and replace with a single dashed-frame `.panel` (the new `.score-share-card`) split into two columns — left is the audit score (big tier-colored number, tier badge, progress bar to the next grade band, three stat boxes for missing policies / pts-to-next / est. days-to-fix, plus a top-N policy-status chip strip), right is share (pre-written X / Twitter and LinkedIn templates derived from `score + archetype + missing-count`, `[share on X]`, `[share on LinkedIn]`, and `[download audit card]` that html2canvas-captures the whole panel as a PNG named `failproofai-card-<grade>-<score>.png`). `audit-dashboard.tsx` drops the now-unused `syntheticRank` import / `rank` prop and threads `result` into the score section. Replace `empty-state.tsx` and `run-progress.tsx` with audit-pixel-craft versions: a `.empty-panel` with a pixel-grid sigil, Architype Stedelijk headline, and `.btn-press` CTA replaces the shadcn `Button` + `lucide-react` icon center-card; the running view becomes a terminal-style `.running-panel` (`$ failproofai audit --since 30d ▮` header with a blinking pink cursor, stage list with `✓` / `▮▮` / `○` markers and a per-stage braille spinner, and a marquee `audit-bar-fill` progress bar). 
Persistent **next-audit reminder** added — new `~/.failproofai/next-audit.json` (mode 0600, separate file from `auth.json` so the reminder is independent of token refresh), new `lib/auth/auth-store.ts` helpers (`readReminder` / `writeReminder` / `deleteReminder` / `getReminderFilePath` + `StoredReminder` type), new `app/api/auth/reminder/route.ts` (GET / POST / DELETE, defaults to a 7-day offset, scoped to the active session so a reminder for `a@x.com` is invisible to a CLI-authed `b@x.com`), and `/api/auth/status` now returns `reminder: { next_audit_at, user_email, set_at } | null` alongside the user. `return-section.tsx` flips behavior accordingly: signed in + reminder set → status panel ("next audit set for `<Mon Jun 8> · in 7 days`" + "signed in as `<email>`" + a `[re-audit now]` button next to `[install policies]` and a tiny "clear reminder" link); anon → `[set a reminder]` opens the existing AuthDialog and on successful sign-in writes the reminder automatically; signed in + no reminder → `[set a reminder]` writes it directly with no dialog. The `[re-audit now]` button (also shown to anon users with audit data) reuses the existing `triggerRun` poller and reloads the page once the run completes. No new dependencies; the deleted `app-header.tsx` was a 38-line component with no callers other than the three audit-dashboard mounts.

- Unify the dashboard design system around the brutalist pixel-craft aesthetic that previously lived only in `/audit`. The audit token set (`--bg`, `--ink`, `--accent-pink`, `--accent-green`, `--font-mono` → JetBrains Mono, `--font-display` → Architype Stedelijk / VT323) is now declared once in `app/globals.css`, and every shadcn-style Tailwind alias (`--background`, `--card`, `--foreground`, `--primary`, `--border`, `--radius: 0`, …) is repointed at the audit palette so existing utility classes like `bg-card` / `text-foreground` / `border-border` produce audit visuals across the whole app without rewriting any component markup. The `:root` block, body cross-hatch + grain overlays, JetBrains Mono import, and all canonical chrome classes (`.app-header`, `.h-brand*`, `.btn`, `.btn-press`, `.tabs`, `.tab`, `.section`, `.section-mast`, `.section-h`, `.report`, plus a new reusable `.panel` with pink corner brackets) are promoted to `globals.css`. `app/audit/audit-styles.css` keeps only the audit-page-only widgets (archetype frame, sigil, score grade, leaderboard, findings cards, return hook, auth dialog), so the styles loaded specifically by `/audit` no longer leak into `/policies` or `/projects` on client-side navigation. `app/layout.tsx` drops the `next/font/google` Geist Mono import — fonts now ship via the single CSS `@import url('…JetBrains+Mono…')` in `globals.css`. `components/navbar.tsx` is rewritten around `.app-header` with the pink `▮▮` mark, lowercase Architype wordmark, optional version chip, a current-section eyebrow, and `.tab` links with sharp pink underline on the active route (lucide icons in the bar removed). `app/projects/page.tsx` and its `loading.tsx` are wrapped in the `.report` + `.section` + `.panel` chrome with a green-eyebrow masthead and "your agent footprint." section heading; the inner `ProjectList` component is unchanged and picks up the unified palette automatically. 
`app/policies/hooks-client.tsx` swaps its outer `<div className="min-h-screen bg-background…">` for a `.report` + `.section` shell with audit masthead copy ("what your agents tried." / "what to stop them doing."), replaces the rounded-pill `TabBar` with the global `.tabs` / `.tab` underline tabs, and drops the now-redundant "Back to /projects" link (the new navbar covers cross-page navigation). No functional changes — all 1701 tests pass and the production `next build` succeeds.
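
The deterministic variant picker described in the `/audit` polish entry above can be illustrated roughly as follows. This is a minimal sketch, not the code in `src/audit/archetypes.ts`: the names `djb2`, `VariantCatalog`, and `pickVariantSketch` are hypothetical, and mixing the field name into the hashed string is only one plausible reading of "per-field axis".

```typescript
// Hypothetical sketch: names and mixing strategy are assumptions,
// not the implementation shipped in src/audit/archetypes.ts.

// Classic djb2 string hash (hash * 33 + charCode), kept in unsigned 32-bit range.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

type VariantCatalog = Record<string, readonly string[]>;

// Pick one entry per field. Hashing `seed:field` gives every field its own
// index, so the tagline and the closing line do not have to move together.
function pickVariantSketch(catalog: VariantCatalog, seed: string): Record<string, string> {
  const picked: Record<string, string> = {};
  for (const field of Object.keys(catalog)) {
    const options = catalog[field];
    if (options.length === 0) continue; // nothing to choose from
    picked[field] = options[djb2(seed + ":" + field) % options.length];
  }
  return picked;
}
```

Because the index depends only on the seed and the field name, the same project name always resolves to the same copy, while a different project name on the same archetype will usually land on different entries.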

### Docs
- Extend `docs/cli/auth.mdx` with a "Persistent re-audit reminder" section covering the new `~/.failproofai/next-audit.json` file and the `GET / POST / DELETE /api/auth/reminder` dashboard endpoint that backs the `/audit` `[ set a reminder ]` CTA — including the file shape, the per-email scoping rule, and the 7-day default offset.

- Document the new `failproofai auth --login | --logout | --whoami` subcommand in a dedicated `docs/cli/auth.mdx` page (mirrors the style of `cli/audit.mdx`: usage block, sign-in / sign-out / whoami sections, on-disk `auth.json` shape, env-var table, and a short troubleshooting list for the common `Could not reach the api-server` / `Rate limited` / `Code rejected` cases). Add an Authentication section to `docs/cli/environment-variables.mdx` covering `FAILPROOF_API_URL` (override the api-server base URL) and `FAILPROOFAI_AUTH_DIR` (override where `auth.json` is stored). i18n mirrors left for the translation-sync workflow.
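
The reminder rules documented above (7-day default offset, per-email scoping) can be sketched as below. The field names come from the `reminder` object returned by `/api/auth/status`; treating the timestamps as ISO strings, the exact-match email comparison, and the helper names `buildReminderSketch` / `visibleReminderSketch` are assumptions made for illustration.

```typescript
// Field names follow the changelog; value formats and helpers are hypothetical.
interface StoredReminderSketch {
  next_audit_at: string; // assumed ISO-8601 timestamp
  user_email: string;
  set_at: string; // assumed ISO-8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Build a reminder `offsetDays` after `now` (7 days unless overridden).
function buildReminderSketch(userEmail: string, now: Date, offsetDays = 7): StoredReminderSketch {
  return {
    next_audit_at: new Date(now.getTime() + offsetDays * DAY_MS).toISOString(),
    user_email: userEmail,
    set_at: now.toISOString(),
  };
}

// Per-email scoping: only the session that owns the reminder gets to see it.
function visibleReminderSketch(
  reminder: StoredReminderSketch | null,
  activeEmail: string | null,
): StoredReminderSketch | null {
  if (!reminder || !activeEmail) return null;
  return reminder.user_email === activeEmail ? reminder : null;
}
```

Under these assumptions, a reminder written for `a@x.com` resolves to `null` when the active session belongs to `b@x.com`, which matches the scoping rule in the changelog.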

## 0.0.11-beta.2 — 2026-05-21

### Features
95 changes: 95 additions & 0 deletions __tests__/audit/dashboard-cache.test.ts
@@ -0,0 +1,95 @@
// @vitest-environment node
import { describe, it, expect, beforeEach, afterEach } from "vitest";
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync, statSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import {
  readDashboardCache,
  writeDashboardCache,
  isCacheStale,
} from "../../src/audit/dashboard-cache";
import type { AuditResult } from "../../src/audit/types";

const FAKE_RESULT: AuditResult = {
  version: 2,
  scannedAt: "2026-05-26T00:00:00.000Z",
  scope: { cli: ["claude"], projects: "all", since: null },
  transcripts: { scanned: 5, skipped: 0, errors: 0, durationMs: 100 },
  results: [],
  totals: { hits: 0, projectsWithHits: 0 },
  projectsScanned: ["/home/u/a", "/home/u/b"],
  eventsScanned: 42,
  enabledBuiltinNames: ["block-failproofai-commands"],
};

describe("dashboard cache", () => {
  let tmpHome: string;
  let originalHome: string | undefined;

  beforeEach(() => {
    // Redirect homedir() to a tmp directory by overriding HOME — os.homedir()
    // reads it on every call on POSIX, so the dashboard-cache module sees
    // our tmp path without needing module mocks.
    tmpHome = mkdtempSync(join(tmpdir(), "fpa-audit-cache-test-"));
    originalHome = process.env.HOME;
    process.env.HOME = tmpHome;
  });

  afterEach(() => {
    if (originalHome === undefined) delete process.env.HOME;
    else process.env.HOME = originalHome;
    try { rmSync(tmpHome, { recursive: true, force: true }); } catch { /* ignore */ }
  });

  it("returns null when no cache file exists", () => {
    expect(readDashboardCache()).toBeNull();
  });

  it("round-trips a written entry", () => {
    writeDashboardCache({ since: "7d" }, FAKE_RESULT);
    const entry = readDashboardCache();
    expect(entry).not.toBeNull();
    expect(entry?.params).toEqual({ since: "7d" });
    expect(entry?.result.transcripts.scanned).toBe(5);
    expect(entry?.result.projectsScanned).toEqual(["/home/u/a", "/home/u/b"]);
    expect(typeof entry?.cachedAt).toBe("string");
  });

  it("writes mode 0600 on the file", () => {
    writeDashboardCache({}, FAKE_RESULT);
    const cachePath = join(tmpHome, ".failproofai", "audit-dashboard.json");
    expect(existsSync(cachePath)).toBe(true);
    const mode = statSync(cachePath).mode & 0o777;
    // Some filesystems (FAT, etc.) can't honor mode bits perfectly — just
    // assert no world-readable bit is set.
    expect(mode & 0o004).toBe(0);
  });

  it("returns null for a corrupt JSON cache file", () => {
    const dir = join(tmpHome, ".failproofai");
    mkdirSync(dir, { recursive: true });
    writeFileSync(join(dir, "audit-dashboard.json"), "{ not json", "utf-8");
    expect(readDashboardCache()).toBeNull();
  });

  it("returns null when shape is wrong", () => {
    const dir = join(tmpHome, ".failproofai");
    mkdirSync(dir, { recursive: true });
    writeFileSync(join(dir, "audit-dashboard.json"), JSON.stringify({ foo: 1 }), "utf-8");
    expect(readDashboardCache()).toBeNull();
  });

  it("isCacheStale returns true past the threshold", () => {
    const old = new Date(Date.now() - 60 * 60_000).toISOString(); // 1 hour ago
    expect(isCacheStale(old, 30)).toBe(true);
  });

  it("isCacheStale returns false within the threshold", () => {
    const recent = new Date(Date.now() - 10 * 60_000).toISOString(); // 10 min ago
    expect(isCacheStale(recent, 30)).toBe(false);
  });

  it("isCacheStale treats unparseable timestamps as stale", () => {
    expect(isCacheStale("not-a-date")).toBe(true);
  });
});
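
The three `isCacheStale` tests above pin down its observable behavior: a threshold in minutes, and unparseable timestamps counted as stale. A minimal sketch that would satisfy them is shown here. It is not the code in `src/audit/dashboard-cache.ts`; the name `isCacheStaleSketch` and the 30-minute default are assumptions, since the diff does not show the real default.

```typescript
// Hypothetical sketch; only the behaviors are taken from the tests,
// the default threshold of 30 minutes is an assumption.
function isCacheStaleSketch(cachedAt: string, thresholdMinutes = 30): boolean {
  const cachedMs = Date.parse(cachedAt);
  // Date.parse yields NaN for unparseable input; treat that as stale so the
  // caller falls back to offering a fresh run instead of trusting bad data.
  if (isNaN(cachedMs)) return true;
  return Date.now() - cachedMs > thresholdMinutes * 60_000;
}
```

Failing closed on a bad timestamp keeps a corrupted `cachedAt` from pinning an old result on screen indefinitely.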
53 changes: 52 additions & 1 deletion __tests__/audit/replay.test.ts
@@ -1,6 +1,12 @@
// @vitest-environment node
import { describe, it, expect, beforeEach } from "vitest";
import { resetReplay, replayEvent } from "../../src/audit/replay";
import { resetReplay, replayEvent, initReplay, restoreReplay } from "../../src/audit/replay";
import {
  clearPolicies,
  getAllPolicies,
  registerPolicy,
} from "../../src/hooks/policy-registry";
import { allow } from "../../src/hooks/policy-helpers";
import type { NormalizedToolEvent } from "../../src/audit/types";

function bash(command: string): NormalizedToolEvent {
@@ -50,3 +56,48 @@ describe("replay engine", () => {
    expect(hits.some((h) => h.eventType === "PostToolUse")).toBe(true);
  });
});

describe("replay registry snapshot/restore", () => {
  beforeEach(() => {
    resetReplay();
    clearPolicies();
  });

  it("restoreReplay puts back the pre-init registry", () => {
    registerPolicy(
      "test/custom-marker",
      "test policy",
      async () => allow(),
      { events: ["PreToolUse"] },
    );
    const before = getAllPolicies().map((p) => p.name).sort();
    expect(before).toContain("test/custom-marker");

    initReplay();
    const duringInit = getAllPolicies().map((p) => p.name);
    expect(duringInit).not.toContain("test/custom-marker");
    expect(duringInit.length).toBeGreaterThan(10); // builtins are loaded

    restoreReplay();
    const after = getAllPolicies().map((p) => p.name).sort();
    expect(after).toEqual(before);
  });

  it("restoreReplay is idempotent when called twice", () => {
    registerPolicy(
      "test/another-marker",
      "test policy",
      async () => allow(),
      { events: ["PreToolUse"] },
    );
    initReplay();
    restoreReplay();
    restoreReplay(); // second call should be a no-op
    expect(getAllPolicies().map((p) => p.name)).toContain("test/another-marker");
  });

  it("restoreReplay before initReplay is a no-op", () => {
    expect(() => restoreReplay()).not.toThrow();
    expect(getAllPolicies()).toEqual([]);
  });
});
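
The snapshot/restore contract these tests exercise (restore puts back the pre-init registry, a second restore is a no-op, restore before init does nothing) can be modeled with a small in-memory stand-in. Everything below is a hypothetical sketch: `PolicySketch`, `getAllSketch`, `setAllSketch`, `initReplaySketch`, and `restoreReplaySketch` are illustrative names, and the real `replay.ts` / `policy-registry.ts` may be structured differently.

```typescript
// Hypothetical in-memory stand-ins for the registry and the replay snapshot.
type PolicySketch = { name: string };

let registry: PolicySketch[] = [];
let snapshot: PolicySketch[] | null = null;

function getAllSketch(): PolicySketch[] {
  return registry.slice(); // copy, so callers cannot mutate the registry
}

function setAllSketch(policies: PolicySketch[]): void {
  registry = policies.slice();
}

function initReplaySketch(builtins: PolicySketch[]): void {
  // Snapshot only once, so a repeated init cannot replace the caller's
  // saved policies with the builtins loaded by the first init.
  if (snapshot === null) snapshot = getAllSketch();
  setAllSketch(builtins);
}

function restoreReplaySketch(): void {
  if (snapshot === null) return; // no-op before init and on a second restore
  setAllSketch(snapshot);
  snapshot = null;
}
```

Clearing the snapshot after restoring is what makes the second call harmless: there is nothing left to write back, so the registry stays as the first restore left it.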
24 changes: 24 additions & 0 deletions app/actions/get-audit-result.ts
@@ -0,0 +1,24 @@
"use server";

import { readDashboardCache } from "@/src/audit/dashboard-cache";
import type { AuditResult, RunAuditOptions } from "@/src/audit/types";

export type AuditResultPayload =
  | { status: "cached"; cachedAt: string; params: RunAuditOptions; result: AuditResult }
  | { status: "empty" };

/**
 * Read the dashboard cache. Never triggers a run — `/audit` shows the empty
 * state when there's no cache and lets the user opt in to scanning. Mirrors
 * the read-only ergonomics of `getHooksConfigAction()`.
 */
export async function getAuditResultAction(): Promise<AuditResultPayload> {
  const entry = readDashboardCache();
  if (!entry) return { status: "empty" };
  return {
    status: "cached",
    cachedAt: entry.cachedAt,
    params: entry.params,
    result: entry.result,
  };
}
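
On the consuming side, the `status` discriminant lets a caller branch without null checks. The sketch below uses trimmed local copies of the types so that it stands alone; `AuditResultLite`, `AuditResultPayloadLite`, and `describePayloadSketch` are hypothetical names, and the real `AuditResult` carries more fields than shown.

```typescript
// Trimmed, hypothetical copies of the payload shapes so the example is self-contained.
type AuditResultLite = { totals: { hits: number; projectsWithHits: number } };

type AuditResultPayloadLite =
  | { status: "cached"; cachedAt: string; result: AuditResultLite }
  | { status: "empty" };

// Narrowing on `status` gives typed access to `cachedAt` and `result`
// only in the branch where they exist.
function describePayloadSketch(payload: AuditResultPayloadLite): string {
  switch (payload.status) {
    case "empty":
      return "no audit yet";
    case "cached":
      return `${payload.result.totals.hits} hits, cached at ${payload.cachedAt}`;
  }
}
```

A page component could render the empty state for the first branch and the report for the second, which lines up with the action never starting a scan on its own.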