A simple voice chat web application using OpenAI's speech-to-speech architecture for real-time voice interaction.
- Real-time API: Low-latency WebRTC streaming for interactive conversations
- Chained API: Speech-to-text → AI → text-to-speech pipeline for reliable processing
- Architecture Comparison: Easy switching between modes to compare performance
- Modern UI: Beautiful, responsive design with voice indicators
- Visual Feedback: Animated voice wave indicators and status updates
- Cross-platform: Works on desktop and mobile browsers
- Real-time mode:
  - WebRTC Integration: Uses OpenAI's Realtime API for low-latency communication
  - Natural Conversation: Speak naturally with an AI assistant
  - Voice Selection: Multiple AI voices, including the newer Realtime API voices
- Chained mode:
  - Reliable Processing: Traditional pipeline approach for consistent results
  - Python Integration: Optional Python-based processing using the OpenAI Agents Python SDK
  - Click-to-Record: Simple recording interface for chained mode
Prerequisites:
- Node.js (v16 or higher)
- OpenAI API key with Realtime API access
Installation:

1. Clone and set up the project:
   ```bash
   git clone <your-repo-url>
   cd voice-chat
   npm install
   ```
2. Configure your OpenAI API key:
   ```bash
   cp .env.example .env
   # Edit .env and add your OpenAI API key
   ```
3. Start the server:
   ```bash
   npm start
   # or, for development with auto-reload:
   npm run dev
   ```
4. Open your browser:
   - Navigate to http://localhost:3000
   - Use the "Voice Chat" tab for conversations
   - Use the "Data Explorer" tab to view saved sessions
For advanced chained processing using the OpenAI Agents Python SDK:

1. Set up the Python environment:
   ```bash
   python setup_python.py
   ```
2. Activate the virtual environment:
   ```bash
   # On Windows
   venv\Scripts\activate
   # On macOS/Linux
   source venv/bin/activate
   ```
3. Start the Python server:
   ```bash
   python python_chained_server.py
   ```
4. Configure the environment: set `USE_PYTHON_CHAINED=true` in your `.env` file to use Python processing.
Usage:
- Visit http://localhost:3000 for AI voice chat.
- Select an API mode:
  - Real-time API: for low-latency interactive conversations
  - Chained API: for reliable processing with click-to-record
- Choose your preferred AI voice.
- Click "Connect" to start the voice session.
- Allow microphone access when your browser prompts for it.
- In Real-time mode: start speaking naturally; the AI responds with voice.
- In Chained mode: click "Start Recording", speak, then click "Stop Recording".
- Use "Disconnect" to end the session.
Mode comparison:
- Real-time API: best for interactive conversations; lower latency
- Chained API: higher latency, but more reliable processing and consistent results

Voice availability:
- Real-time API: Cedar, Marin (recommended) plus Alloy, Echo, Fable, Onyx, Nova, Shimmer (legacy)
- Chained API: Alloy, Echo, Fable, Onyx, Nova, Shimmer (TTS API voices only)
- Note: Cedar and Marin are exclusive to the Realtime API and won't work in chained mode.
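To keep the UI from offering a voice that the selected mode cannot use, the frontend can check the lists above before connecting. A minimal sketch, assuming lowercase voice IDs and treating this README's lists as the source of truth; `isVoiceSupported` and `resolveVoice` are hypothetical helpers, not part of the app or any SDK.

```javascript
// Hypothetical helpers; voice lists copied from this README.
const REALTIME_ONLY_VOICES = ['cedar', 'marin'];
const TTS_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];

const VOICES_BY_MODE = {
  realtime: [...REALTIME_ONLY_VOICES, ...TTS_VOICES],
  chained: TTS_VOICES, // the TTS API does not offer the Realtime-only voices
};

// True when `voice` can be used in `mode` ('realtime' or 'chained').
function isVoiceSupported(mode, voice) {
  const voices = VOICES_BY_MODE[mode];
  if (!voices) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return voices.includes(voice.toLowerCase());
}

// Returns a usable voice for `mode`, falling back when the selected one is unsupported.
function resolveVoice(mode, voice, fallback = 'alloy') {
  return isVoiceSupported(mode, voice) ? voice.toLowerCase() : fallback;
}
```

For example, switching from Real-time to Chained mode with Cedar selected would fall back to Alloy instead of failing at synthesis time.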
This application supports two voice processing architectures:
- Real-time API mode:
  - Frontend: HTML/CSS/JavaScript with the OpenAI Agents Realtime SDK
  - Backend: Express.js server for API key management
  - Voice Processing: WebRTC for real-time audio streaming
  - AI Model: GPT-4o Realtime Preview for natural conversation
- Chained API mode:
  - Frontend: HTML/CSS/JavaScript with a click-to-record interface
  - Backend: Express.js server with chained processing endpoints
  - Voice Processing: speech-to-text → AI → text-to-speech pipeline
  - AI Model: GPT-4o-mini for reliable processing
  - Optional Python: advanced chained processing using the OpenAI Agents Python SDK
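The chained pipeline can be sketched as three sequential calls to OpenAI's REST endpoints: transcription, chat completion, and speech. This is an illustration, not the app's actual server code: `runChainedTurn` is hypothetical, the `whisper-1` and `tts-1` model names are assumptions (the README names only GPT-4o-mini), and it relies on the global `fetch`, `FormData`, and `Blob` available in Node.js 18+. `fetchImpl` is injectable so the function can be exercised without network access.

```javascript
// Hypothetical sketch of one chained turn: STT -> LLM -> TTS.
const OPENAI_BASE = 'https://api.openai.com/v1';

async function runChainedTurn({ apiKey, audio, voice = 'alloy', instructions, fetchImpl = fetch }) {
  const auth = { Authorization: `Bearer ${apiKey}` };

  // 1. Speech-to-text: upload the recorded WebM clip as multipart form data.
  const form = new FormData();
  form.append('file', new Blob([audio], { type: 'audio/webm' }), 'input.webm');
  form.append('model', 'whisper-1'); // assumed STT model
  const sttRes = await fetchImpl(`${OPENAI_BASE}/audio/transcriptions`, {
    method: 'POST', headers: auth, body: form,
  });
  if (!sttRes.ok) throw new Error(`Transcription failed: ${sttRes.status}`);
  const { text: userText } = await sttRes.json();

  // 2. Text generation with the chat model named in this README.
  const chatRes = await fetchImpl(`${OPENAI_BASE}/chat/completions`, {
    method: 'POST',
    headers: { ...auth, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: instructions },
        { role: 'user', content: userText },
      ],
    }),
  });
  if (!chatRes.ok) throw new Error(`Chat completion failed: ${chatRes.status}`);
  const chat = await chatRes.json();
  const replyText = chat.choices[0].message.content;

  // 3. Text-to-speech: the speech endpoint returns MP3 bytes by default.
  const ttsRes = await fetchImpl(`${OPENAI_BASE}/audio/speech`, {
    method: 'POST',
    headers: { ...auth, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'tts-1', voice, input: replyText }), // assumed TTS model
  });
  if (!ttsRes.ok) throw new Error(`Speech synthesis failed: ${ttsRes.status}`);
  const replyAudio = Buffer.from(await ttsRes.arrayBuffer());

  return { userText, replyText, replyAudio };
}
```

Each turn waits on three round trips in sequence, which is why chained mode has higher latency than the single WebRTC stream used in real-time mode.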
Project structure:

```
voice-chat/
├── public/
│   ├── index.html              # Main application page
│   ├── styles.css              # Modern UI styling
│   └── app.js                  # Voice chat application logic
├── server.js                   # Express.js backend server
├── package.json                # Node.js dependencies and scripts
├── requirements.txt            # Python dependencies
├── chained_voice_pipeline.py   # Python chained voice pipeline
├── python_chained_server.py    # Python FastAPI server
├── setup_python.py             # Python environment setup
├── .env.example                # Environment variables template
└── README.md                   # This file
```
Create a `.env` file with:

```
OPENAI_API_KEY=your_openai_api_key_here
PORT=3000
```

- Get an API key from the OpenAI Platform.
- Ensure you have access to the Realtime API.
- Add your key to the `.env` file.
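A small startup check can catch a missing key or a malformed port before the server starts. Sketch only: `loadConfig` is a hypothetical helper that assumes the variables are already in `process.env` (for example, loaded with the `dotenv` package) and also reads the optional `USE_PYTHON_CHAINED` flag described in the Python setup steps.

```javascript
// Hypothetical startup check for the .env values described above.
function loadConfig(env = process.env) {
  const apiKey = env.OPENAI_API_KEY;
  // Reject both a missing key and the unedited placeholder from .env.example.
  if (!apiKey || apiKey === 'your_openai_api_key_here') {
    throw new Error('OPENAI_API_KEY is missing; add it to your .env file');
  }

  // PORT defaults to 3000 when unset or empty.
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    apiKey,
    port,
    // Only the string "true" (any case) enables the Python chained server.
    usePythonChained: (env.USE_PYTHON_CHAINED || '').toLowerCase() === 'true',
  };
}
```

Failing fast here turns a vague "Connection Failed" at runtime into a clear startup error.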
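You can customize the AI assistant in `public/app.js`:

```javascript
// Package import; a plain browser page may load the SDK from a CDN URL instead.
import { RealtimeAgent } from '@openai/agents-realtime';

const agent = new RealtimeAgent({
  name: 'Assistant',
  instructions: 'Your custom instructions here...',
  // Alternatives: 'cedar', 'marin' (Realtime-only), 'echo', 'fable', 'onyx', 'nova', 'shimmer'
  voice: 'alloy',
  interruptible: true
});
```

Modify `public/styles.css` to customize the appearance: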
- Colors and gradients
- Button styles
- Voice wave animations
- Responsive breakpoints
Troubleshooting:
- Microphone Access Denied
  - Ensure your browser has microphone permissions
  - Check browser settings for site permissions
- Connection Failed
  - Verify your OpenAI API key is correct
  - Check that you have Realtime API access
  - Ensure your internet connection is stable
- Audio Issues
  - Check your system audio settings
  - Try refreshing the page
  - Ensure your browser supports WebRTC
Browser support:
- Chrome/Chromium (recommended)
- Firefox
- Safari (with limitations)
- Edge
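Microphone problems usually surface as a rejected `navigator.mediaDevices.getUserMedia()` call. A minimal sketch that maps the standard error names (`NotAllowedError`, `NotFoundError`, `NotReadableError`) to the troubleshooting hints above; `describeMicError` and `requestMicrophone` are hypothetical helpers and the messages are illustrative. Note that browsers expose `mediaDevices` only in secure contexts (HTTPS or localhost).

```javascript
// Hypothetical helper: translate getUserMedia() errors into user-facing hints.
function describeMicError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Microphone access denied: allow microphone permission in your browser site settings.';
    case 'NotFoundError':
      return 'No microphone found: check your system audio input settings.';
    case 'NotReadableError':
      return 'Microphone is busy or unavailable: close other apps using it and refresh the page.';
    default:
      return `Audio error (${err && err.name ? err.name : 'unknown'}): try refreshing the page.`;
  }
}

// Resolves to a MediaStream, or rejects with a readable Error.
async function requestMicrophone() {
  // mediaDevices is undefined outside secure contexts and outside browsers.
  if (!globalThis.navigator?.mediaDevices?.getUserMedia) {
    throw new Error('getUserMedia is unavailable: use a supported browser over HTTPS or localhost.');
  }
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true });
  } catch (err) {
    throw new Error(describeMicError(err));
  }
}
```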
The chained architecture automatically saves conversation data:

```
saved_data/
├── audio/
│   ├── session_2024-01-15T10-30-45-123Z_input.webm
│   └── session_2024-01-15T10-30-45-123Z_output.mp3
└── transcripts/
    └── session_2024-01-15T10-30-45-123Z_transcript.json
```

Each session includes:
- Input Audio: the user's recorded speech (WebM format)
- Output Audio: the AI's generated speech (MP3 format)
- Transcript: the complete conversation with timestamps
- Metadata: voice settings, instructions, and session info
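The example file names look like an ISO-8601 timestamp with `:` and `.` replaced by `-`. Assuming that is how session IDs are derived (the README does not state it explicitly), the paths can be computed as below; `sessionIdFor` and `sessionFilePaths` are hypothetical helpers.

```javascript
// Hypothetical: derive a session ID from a timestamp, e.g.
// 2024-01-15T10:30:45.123Z -> session_2024-01-15T10-30-45-123Z
function sessionIdFor(date = new Date()) {
  // ':' and '.' are replaced so the name is safe on all file systems.
  return `session_${date.toISOString().replace(/[:.]/g, '-')}`;
}

// Builds the three per-session paths shown in the saved_data/ tree.
function sessionFilePaths(sessionId, root = 'saved_data') {
  return {
    inputAudio: `${root}/audio/${sessionId}_input.webm`,
    outputAudio: `${root}/audio/${sessionId}_output.mp3`,
    transcript: `${root}/transcripts/${sessionId}_transcript.json`,
  };
}
```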
To browse saved data:
- Visit http://localhost:3000/ and click the "Data Explorer" tab.
- Browse all saved sessions.
- Click the speaker icon next to each message to play its audio.
- View complete conversation transcripts with individual message playback.
- Alternative: visit http://localhost:3000/saved-data.html for a standalone viewer.
API endpoints:
- `GET /api/saved-data`: list all sessions
- `GET /api/saved-data/:sessionId`: get a specific session
- `GET /api/saved-data/:sessionId/:messageId/:audioType`: play an individual message's audio
- `GET /api/saved-data/:sessionId/:audioType`: legacy audio endpoint (backward compatibility)
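A client such as the Data Explorer can build the endpoint URLs above and fetch the session list. Hypothetical sketch: the helper names, the `BASE_URL` default, and the `'input'`/`'output'` values for `audioType` (inferred from the saved file names) are assumptions, and the JSON shape returned by `/api/saved-data` is left to `server.js`.

```javascript
// Hypothetical client helpers for the saved-data endpoints.
const BASE_URL = 'http://localhost:3000';

// Encode each path segment so IDs containing spaces or slashes stay intact.
const seg = (value) => encodeURIComponent(String(value));

function listSessionsUrl(base = BASE_URL) {
  return `${base}/api/saved-data`;
}

function sessionUrl(sessionId, base = BASE_URL) {
  return `${listSessionsUrl(base)}/${seg(sessionId)}`;
}

// audioType is assumed to be 'input' or 'output', matching the saved file names.
function messageAudioUrl(sessionId, messageId, audioType, base = BASE_URL) {
  return `${sessionUrl(sessionId, base)}/${seg(messageId)}/${seg(audioType)}`;
}

async function listSessions(fetchImpl = fetch, base = BASE_URL) {
  const res = await fetchImpl(listSessionsUrl(base));
  if (!res.ok) throw new Error(`GET /api/saved-data failed: ${res.status}`);
  return res.json(); // response shape is defined by server.js
}

// Browser only: plays one message's audio through an HTMLAudioElement.
function playMessageAudio(sessionId, messageId, audioType) {
  return new Audio(messageAudioUrl(sessionId, messageId, audioType)).play();
}
```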
This application uses the OpenAI Agents Realtime SDK. Key components:
- `RealtimeAgent`: defines the AI assistant's behavior
- `RealtimeSession`: manages the voice conversation
- Event listeners for real-time status updates
Contributing:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details
Acknowledgments:
- OpenAI for the Realtime API and Agents SDK
- WebRTC for real-time audio streaming
- Modern web standards for seamless voice interaction