feat(hive): BBS tree structure + atomic task lock to prevent double-claiming #446
Open
YinZT1 wants to merge 2 commits into
…tion Add parent_id field + /tree endpoint to agent_bbs.py so posts form a task tree rather than a flat board. Frontend rewritten as a left-to-right tree: fixed-width cards, synthesized worker nodes per (task, worker) pair with embedded communication history, and pipeline-style state badges that visualize double-claim races (⚠ N workers claimed / delivered by N workers). Worker and master prompts updated so they post with parent_id pointing to the relevant task / completion / verification, producing a right-extending pipeline TASK → 接单 (claim accepted) → 完成 (done) → master verification → follow-up TASK with no client-side reparenting needed.
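To make the parent_id idea concrete, here is a minimal sketch of how flat post rows could be nested into a tree. It is not the code in agent_bbs.py: the function name build_tree and the dict keys id, parent_id, and content are assumptions chosen for illustration. A post whose parent_id is missing or unknown is treated as a root, so it still shows up as an isolated node.

```python
from collections import defaultdict


def build_tree(posts):
    """Nest flat post dicts into a forest using parent_id (assumed field names)."""
    by_id = {p["id"]: p for p in posts}
    children = defaultdict(list)
    for p in posts:
        parent = p.get("parent_id")
        # Missing or dangling parent_id: treat the post as a root.
        key = parent if parent in by_id else None
        children[key].append(p)

    def attach(post):
        # Copy the post and hang its (recursively built) children on it.
        return {**post, "children": [attach(c) for c in children[post["id"]]]}

    return [attach(p) for p in children[None]]


posts = [
    {"id": 1, "parent_id": None, "content": "TASK-1"},
    {"id": 2, "parent_id": 1, "content": "[接单] TASK-1"},
    {"id": 3, "parent_id": 2, "content": "[完成] TASK-1"},
    {"id": 4, "parent_id": 99, "content": "post with a dangling parent"},
]
tree = build_tree(posts)
```

With this input, post 1 becomes a root with the chain 2 → 3 beneath it, and post 4 becomes a second, isolated root.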
…laim Add claimed_by column to posts + new POST /claim endpoint backed by SQLite row-level atomic UPDATE WHERE claimed_by IS NULL. First worker to claim wins (200); concurrent attempts get 409. Verified atomic under 10-way concurrent claim race in unit smoke test. Worker prompt updated to require /claim before any work — 409 means abandon the task and look elsewhere. Frontend shows 🔒 locked badge on claimed tasks; "LOCK BYPASSED" warning kept as a guard for the case where an LLM ignores the protocol. E2E with 2 workers + 3 parallel tasks (calc/todo/weather): previously double-claimed 2 of 3 tasks under load; now each task is claimed by exactly one worker, no race observed across master verification and follow-up cycles.
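The claim step described above can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not the PR's implementation: the table layout, the function names init_db, claim, and race, and the mapping of "no row updated" to 409 are all hypothetical. The key point is that the check and the write happen in one conditional UPDATE, and SQLite serializes writers, so only one UPDATE can still see claimed_by IS NULL.

```python
import os
import sqlite3
import tempfile
import threading
from collections import Counter


def init_db(db_path):
    """Create a minimal posts table with one unclaimed task (assumed schema)."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT, claimed_by TEXT)"
        )
        conn.execute("INSERT INTO posts (id, content) VALUES (1, 'TASK-1')")
    conn.close()


def claim(db_path, post_id, worker):
    """Try to claim a task post; return 200 if this worker won, else 409."""
    conn = sqlite3.connect(db_path, timeout=10)  # wait for other writers
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE posts SET claimed_by = ? "
                "WHERE id = ? AND claimed_by IS NULL",
                (worker, post_id),
            )
        # rowcount is 1 only for the single UPDATE that matched the NULL row.
        # (A missing post_id also yields 0; a real endpoint might return 404.)
        return 200 if cur.rowcount == 1 else 409
    finally:
        conn.close()


def race(n=10):
    """Let n workers claim post 1 at once; return a Counter of status codes."""
    db_path = os.path.join(tempfile.mkdtemp(), "bbs.db")
    init_db(db_path)
    results = []
    threads = [
        threading.Thread(
            target=lambda w=f"worker-{i}": results.append(claim(db_path, 1, w))
        )
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return Counter(results)
```

Calling race() mirrors the 10-way smoke test: one caller should see 200 and the other nine 409. A read-then-write variant (SELECT claimed_by, then UPDATE) would reopen the race window, which is why the condition lives inside the UPDATE itself.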
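Overview
Adds two layers of structure to the Hive-mode BBS so that multi-worker collaboration is both visible and lockable:
- parent_id + /tree: posts form a task tree instead of a flat board
- /claim atomic lock: a given task can be claimed by only one worker; atomicity is guaranteed by a single SQLite row-level UPDATE
The two are independent but complementary: the tree is the presentation layer (not enforced), while the lock is the coordination layer (a hard constraint enforced by the mechanism). The old endpoints (/post, /poll, /posts, /file/*) behave exactly as before, so the change is backward compatible.
Reproduction experiment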
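Start hive per memory/goal_hive_sop.md: BBS + admin announcement + 2 workers + 1 master, with a 10-minute budget. The admin #1 startup announcement (verbatim, exactly what was used in the runs; replace the path placeholder with your own hive working directory):
The three CLI tasks are independent of one another and can run in parallel; each is roughly 30 seconds to 2 minutes of LLM inference work. For the lock-free variant, just remove the "Worker 行为约束" (Worker behavior constraints) section from the announcement.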
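Initial observations over three rounds
The most dramatic case in round 2 was TASK-2: worker-2 had already posted its [完成] (done) post when, 60 seconds later, worker-1 claimed the same task, redid it, and overwrote worker-2's files. This shows that a purely protocol-level constraint (the SOP originally said "confirm you were the earliest to claim") cannot stop an LLM that fails to police itself: the race window spans the entire task lifecycle, not just the moment the task is posted. In round 3, with /claim added, any worker that tries to claim an already-claimed task gets a 409, and the prompt forces it to abandon that task and look for the next one. The frontend 🔒 badge shows the lock holder.
Multi-scenario validation after adding the lock (5 different task shapes)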
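To confirm that the lock holds up under different load and collaboration patterns, 5 more rounds were run, each with a different scenario:
Across the 5 rounds there were 0 double claims and 0 duplicate [接单] (claim accepted) posts; the per-row SQLite claim lock behaved consistently in every scenario exercised.
One further observation, outside the scope of this PR, is a dispatch fairness issue: a single worker tends to grab multiple tasks. The lock lets the loser exit gracefully after a 409, which is correct; preventing one worker from taking everything is a dispatch-layer concern and is not addressed here.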
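Changed files (3)
- assets/agent_bbs.py: /tree endpoint + claimed_by field + /claim endpoint + frontend HTML rewrite (horizontal tree + synthesized worker cards + embedded history + 🔒 lock badge + ⚠ race warning)
- memory/goal_hive_sop.md, reflect/agent_team_worker.py: /claim prerequisite + parent_id rules
Total: +214 / −52 lines.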
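Known remaining pitfall
The task lock only guarantees "one task, one worker". If two workers each legitimately claim a different task and both tasks modify the same file, file_write will still collide. Typical scenario: master posts task A ("edit utils.py to add add()") and task B ("edit utils.py to add mul()"); both workers legitimately hold their own locks, yet the later file_write still overwrites the earlier one. The task lock cannot stop this kind of conflict, because it lives at the file layer rather than the task layer. This pitfall is left as is and will be handled in a separate follow-up PR.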
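The lost update can be reproduced without any LLM in the loop. The sketch below is a simulation under assumptions: file_write here is a hypothetical stand-in that overwrites the whole file, and the two "workers" are just sequential steps that both start from the same stale copy of utils.py.

```python
import os
import tempfile


def file_write(path, text):
    """Hypothetical stand-in for the tool: overwrite the whole file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def simulate_conflict():
    """Two workers with different task locks edit the same utils.py."""
    path = os.path.join(tempfile.mkdtemp(), "utils.py")
    file_write(path, "# utils\n")

    # Both workers read the file before either one writes.
    with open(path, encoding="utf-8") as f:
        seen_by_a = f.read()
    seen_by_b = seen_by_a

    # Worker A (holds the lock on task A) adds add().
    file_write(path, seen_by_a + "def add(a, b):\n    return a + b\n")
    # Worker B (holds the lock on task B) adds mul() to its stale copy.
    file_write(path, seen_by_b + "def mul(a, b):\n    return a * b\n")

    with open(path, encoding="utf-8") as f:
        return f.read()


final_text = simulate_conflict()
```

The final file contains mul() but not add(): both task locks were respected, and worker A's change was still lost.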
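Testing
- 10-way concurrent /claim on the same task post: 1 wins, 9 get 409; atomicity confirmed (the per-row SQLite claim lock is reliable)
Test plan checklist
- /claim atomicity: when several workers claim the same task concurrently, only 1 wins
Follow-up observation points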
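- A worker may skip /claim and post [接单] directly; the frontend "LOCK BYPASSED" badge is there to detect this protocol violation and make debugging easier
- parent_id is not enforced (a POST without it still succeeds) and relies on prompt guidance; when it is violated, the frontend renders an isolated node, which makes the problem easy to spot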
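Both observation points can also be checked offline against a dump of posts. The following is a rough sketch, not the frontend's actual logic: the field names (id, parent_id, author, content, claimed_by), the function name audit, and the rule "a [接单] post whose author differs from the parent task's claimed_by counts as a bypass" are all assumptions made for illustration.

```python
REPLY_TAGS = ("[接单]", "[完成]")  # claim-accepted and done posts


def audit(posts):
    """Return (bypassed, orphans) as lists of post ids (assumed post schema)."""
    by_id = {p["id"]: p for p in posts}
    bypassed, orphans = [], []
    for p in posts:
        parent = by_id.get(p.get("parent_id"))
        is_reply = p["content"].startswith(REPLY_TAGS)
        if is_reply and parent is None:
            # A reply with no resolvable parent_id shows up as an isolated node.
            orphans.append(p["id"])
        if p["content"].startswith("[接单]") and parent is not None:
            # The author announced a claim without holding the parent's lock.
            if parent.get("claimed_by") != p["author"]:
                bypassed.append(p["id"])
    return bypassed, orphans


posts = [
    {"id": 1, "parent_id": None, "author": "master",
     "content": "TASK-1 calc", "claimed_by": "worker-1"},
    {"id": 2, "parent_id": 1, "author": "worker-1", "content": "[接单] TASK-1"},
    {"id": 3, "parent_id": 1, "author": "worker-2", "content": "[接单] TASK-1"},
    {"id": 4, "parent_id": None, "author": "worker-2", "content": "[完成] TASK-2"},
]
bypassed, orphans = audit(posts)
```

Here post 3 is flagged as a lock bypass (worker-2 announced a claim that worker-1 holds) and post 4 as an orphan (a done post with no parent_id).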