Request
Please expose a supported, low-latency audio stream from Bee, ideally as raw audio frames/chunks or through an equivalent realtime WebSocket, webhook, or API. This would make Bee usable as the front end for personal agents and action automation, not only for retrospective notes and transcripts.
Current limitation
Today the public CLI path appears to be transcript-first:
- bee stream --json --types new-utterance emits utterance events, but the realtime docs describe its delivery as at-most-once, so an event lost in transit is never redelivered.
- bee now --json can backfill, but it is polling-oriented and can lag behind speech.
- In practice, action workflows that wait for processed utterances can miss commands or act too late for a voice-assistant UX.
For my setup, Bee transcribes speech, a VPS ingests it, and a local agent routes explicit wake-word commands like "Hermes ..." to approved actions. This works for slow tasks, but the connection is brittle for Jarvis-style voice control because the system cannot get audio bytes or low-latency transcript deltas directly.
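For illustration, the agent-side gate described above might look roughly like the following minimal sketch. It assumes transcripts arrive as plain strings; the wake-word regex, `route_utterance`, and the `ALLOWED_ACTIONS` table are hypothetical names for this example and are not part of any Bee API.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical allowlist: only these action phrases can ever trigger a handler.
ALLOWED_ACTIONS: Dict[str, Callable[[str], str]] = {
    "lights off": lambda arg: "lights: off",
    "note": lambda arg: f"saved note: {arg}",
}

# The wake word must start the utterance (case-insensitive, optional "," or ":").
WAKE_RE = re.compile(r"^\s*hermes[,:]?\s+(?P<command>.+)$", re.IGNORECASE)


def route_utterance(text: str) -> Optional[str]:
    """Return the handler's result, or None if the utterance is ignored."""
    match = WAKE_RE.match(text)
    if match is None:
        return None  # no wake word: ordinary speech is never acted on
    # Normalize: trim whitespace and trailing sentence punctuation, lowercase.
    command = match.group("command").strip().rstrip(".!?").lower()
    for action, handler in ALLOWED_ACTIONS.items():
        # Accept the exact phrase, or the phrase followed by an argument.
        if command == action or command.startswith(action + " "):
            return handler(command[len(action):].strip())
    return None  # wake word present, but the action is not allowlisted
```

Anything outside the allowlist is dropped rather than guessed at, which keeps the gate safe even if a transcript arrives late or garbled.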
Desired API shape
Any one of these would help:
- WebSocket or webhook delivering PCM/Opus/AAC audio chunks with timestamps.
- Realtime transcript deltas with stable utterance IDs and delivery acknowledgements.
- A local/device stream from the Bee CLI that can be consumed by an agent process.
- Clear latency target and ordering/deduplication semantics.
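To make the ordering/deduplication request concrete, here is a consumer-side sketch. It assumes a hypothetical delta event carrying a stable `utterance_id`, a per-utterance `seq` counter, and an `is_final` flag; none of these fields exist in Bee today. The assembler drops redelivered deltas, buffers out-of-order ones, and emits each finished utterance exactly once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TranscriptDelta:
    """Hypothetical realtime event: one piece of an utterance's transcript."""
    utterance_id: str  # assumed stable across retries and deltas
    seq: int           # assumed 0, 1, 2, ... within one utterance
    text: str
    is_final: bool = False


@dataclass
class DeltaAssembler:
    """Dedupes and reorders deltas so each utterance is rebuilt exactly once."""
    next_seq: dict = field(default_factory=dict)  # utterance_id -> next expected seq
    pending: dict = field(default_factory=dict)   # (utterance_id, seq) -> delta
    text: dict = field(default_factory=dict)      # utterance_id -> text so far

    def receive(self, delta: TranscriptDelta) -> list:
        """Buffer one delta; return any utterances it completes, in order."""
        uid = delta.utterance_id
        expected = self.next_seq.get(uid, 0)
        if delta.seq < expected or (uid, delta.seq) in self.pending:
            return []  # duplicate redelivery: already applied or already buffered
        self.pending[(uid, delta.seq)] = delta
        completed = []
        # Apply every contiguous delta starting at the expected sequence number.
        while (uid, expected) in self.pending:
            d = self.pending.pop((uid, expected))
            self.text[uid] = self.text.get(uid, "") + d.text
            expected += 1
            if d.is_final:
                completed.append(self.text[uid])
        self.next_seq[uid] = expected
        return completed
```

With at-least-once delivery plus acknowledgements on the producer side, the consumer could ack each `(utterance_id, seq)` after `receive` returns (duplicates included), and resends would then be harmless.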
Ideal target: sub-second to a few seconds end-to-end from speech to agent callback. Raw audio access would also allow users to run their own wake-word detection, ASR, latency measurement, and fallback routing without waiting for post-processing.
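To show why timestamps on raw chunks matter for latency measurement, here is a rough sketch. It assumes a hypothetical chunk format (signed 16-bit PCM, 16 kHz mono by default) with a device capture timestamp, and it assumes the device and agent clocks are synchronized (for example via NTP). The 1-second default budget is an arbitrary pick from the "sub-second to a few seconds" range above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmChunk:
    """Hypothetical timestamped raw-audio chunk (signed 16-bit PCM)."""
    captured_at: float     # capture time of the first sample, epoch seconds
    data: bytes
    sample_rate: int = 16_000
    channels: int = 1
    sample_width: int = 2  # bytes per sample (16-bit)

    @property
    def duration(self) -> float:
        """Seconds of audio contained in this chunk."""
        frame_bytes = self.channels * self.sample_width
        return len(self.data) / (frame_bytes * self.sample_rate)

    @property
    def ends_at(self) -> float:
        """Capture time just after the last sample in the chunk."""
        return self.captured_at + self.duration


def end_to_end_latency(chunk: PcmChunk, received_at: float) -> float:
    """Delay from the end of the captured audio to the agent receiving it."""
    return received_at - chunk.ends_at


def within_budget(latency: float, budget_s: float = 1.0) -> bool:
    """True if the latency is non-negative and within the chosen budget."""
    return 0.0 <= latency <= budget_s
```

For example, 3,200 bytes at the default format is 0.1 s of audio; if capture started at t = 100.0 and the agent received the chunk at t = 100.6, the measured latency is about 0.5 s.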
Safety/use case
This is for the owner’s own Bee device and opt-in personal automation. The agent side can keep a wake-word gate plus an allowlist of actions. I am not asking for other users’ data or bypassing consent controls, just an official way to stream my own captured audio or lower-latency speech events into my own automation stack.