Project assembled by AI with Apex
Learn how to create and configure ElevenLabs agents, both in general and for this application:
| Tutorial | Description |
|---|---|
| [DOCS] Building Your First ElevenLabs Agent | Complete walkthrough of creating your base conversational AI agent |
| [FILES] Setting Up Knowledge Base (RAG) | Quick 60-second guide to prepare your agent's knowledge base |
| [TOOLS] Creating Agent Tools & Functions | Build your first agent tool for contact detail collection |
| [NOTES] Handling Call Transcripts | Process and manage post-call transcripts effectively |
| [NEW] Advanced Features & Configuration | Explore new features and advanced usage patterns |
A sophisticated multi-provider voice AI web application built with React 19, TypeScript, and support for 8 different voice AI providers. Experience real-time voice conversations with beautiful audio visualizations and a modern glassmorphism UI.
- The very first version featured just the ElevenLabs Widget and was built with Lovable.dev and Cursor
- All revisions to the app since its initial launch were made with Claude Code Plugin Skill 'Apex Spec System': https://github.com/moshehbenavraham/apex-spec-system
- Real-time Voice Conversation: Talk naturally with AI using multiple voice providers
- 8 Voice Providers: ElevenLabs (Widget + SDK), xAI Grok, OpenAI Realtime, Ultravox, Vapi, Retell, and Google Gemini Live
- Audio Visualization: Beautiful 60fps audio visualizer with real-time frequency analysis
- Glassmorphism Design: Modern, premium UI with dark/light theme toggle
- Mobile-First: Responsive design optimized for all devices (375px to 1920px)
- Accessibility: Full keyboard navigation, ARIA support, and respects prefers-reduced-motion
- Voice Selection UI: Choose from multiple voices per provider
- Real-time Transcript: Live conversation transcript with user/AI message differentiation and auto-scroll
- Automatic Reconnection: WebSocket reconnection with exponential backoff (1s, 2s, 4s, 8s, max 30s)
- Function Calling: AI can execute tools like weather lookup, time queries, and calculations
- Connection Status: Visual indicators for connecting, connected, reconnecting states
- Voice Persistence: Selected voice saved to localStorage across sessions
- Docker Support: Full containerization with Docker and docker-compose
- E2E Testing: Playwright-based end-to-end testing with multi-browser support
- Voice Flow Tests: Comprehensive voice connection, transcript, and function calling tests
- 623+ Unit Tests: Extensive test coverage including voice, contexts, hooks, and accessibility
- Configuration Modal: Settings modal for configuring voice providers
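The reconnection behavior above (exponential backoff capped at 30 seconds, with a bounded number of retries) comes down to a simple delay schedule. A minimal sketch in TypeScript, illustrative only and not the project's actual `useReconnection` hook:

```typescript
// Backoff schedule described above: 1s, 2s, 4s, 8s, ... capped at 30s,
// giving up after a fixed number of attempts.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;
const MAX_ATTEMPTS = 10;

/**
 * Delay in milliseconds before reconnect attempt `attempt` (0-based),
 * or null once retries are exhausted (user must reconnect manually).
 */
function reconnectDelay(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

console.log([0, 1, 2, 3, 4, 5].map(reconnectDelay));
// [1000, 2000, 4000, 8000, 16000, 30000]
```

Jitter is often added to schedules like this to avoid thundering-herd reconnects, but the doubling-with-cap shape is the core idea.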
This application supports 8 voice AI providers through a tabbed interface:
| Provider | Status | Backend Required | Description |
|---|---|---|---|
| ElevenLabs Widget | Available | No | Pre-built embed from ElevenLabs CDN with customizable UI |
| ElevenLabs SDK | Available | No | Custom React UI with @elevenlabs/react SDK |
| xAI Grok | Available | Yes | Grok-powered voice assistant with realtime API |
| OpenAI | Available | Yes | GPT-4o realtime voice conversations with server VAD |
| Ultravox | Available | Yes | Low-latency voice AI with call-based WebSocket connections |
| Vapi | Available | No | Voice AI platform with Daily.co WebRTC and public web token |
| Retell | Available | Yes | Retell AI with LiveKit WebRTC and agent dashboard config |
| Gemini Live | Available | Yes | Google Gemini Live with AudioWorklet and 30 HD voices |
```bash
# Add to your .env file (used by both Widget and SDK tabs)
VITE_ELEVENLABS_AGENT_ID=your_agent_id_here
VITE_ELEVENLABS_ENABLED=true      # Enable Widget tab
VITE_ELEVENLABS_SDK_ENABLED=true  # Enable SDK tab
```

```bash
# Server-side environment (xAI requires backend authentication)
XAI_API_KEY=your_xai_api_key_here

# Client-side (enable xAI in frontend)
VITE_XAI_ENABLED=true
VITE_XAI_VOICE=Ara  # Options: Ara, Eve, Leo, Rex, Sal
VITE_API_BASE_URL=http://localhost:3001
```

```bash
# Server-side environment (OpenAI requires backend for ephemeral tokens)
OPENAI_API_KEY=sk-your_openai_api_key_here

# Client-side (enable OpenAI in frontend)
VITE_OPENAI_ENABLED=true
VITE_OPENAI_VOICE=alloy  # Options: alloy, ash, ballad, coral, echo, sage, shimmer, verse
VITE_API_BASE_URL=http://localhost:3001
```

OpenAI uses the Realtime API with ephemeral tokens for secure WebSocket connections. The backend generates short-lived tokens (60s expiry) so your API key is never exposed to the client.

```bash
# Server-side environment (Ultravox requires backend for call creation)
ULTRAVOX_API_KEY=your_ultravox_api_key_here

# Client-side (enable Ultravox in frontend)
VITE_ULTRAVOX_ENABLED=true
VITE_ULTRAVOX_VOICE=Mark
VITE_API_BASE_URL=http://localhost:3001
```

Ultravox uses a call-based model: the backend creates a call via the REST API and returns a WebSocket joinUrl, and the frontend connects using the ultravox-client SDK.

```bash
# Client-side only (Vapi uses a public web token, no backend required)
VITE_VAPI_ENABLED=true
VITE_VAPI_WEB_TOKEN=your_public_web_token_here
VITE_VAPI_ASSISTANT_ID=your_assistant_id_here  # Optional
VITE_VAPI_VOICE=paula          # Default voice
VITE_VAPI_MODEL=gpt-3.5-turbo  # Model selection
```

Vapi uses a frontend-only integration with a public web token. The @vapi-ai/web SDK handles all connection and audio via Daily.co WebRTC; no backend is required.

```bash
# Server-side environment (Retell requires backend for access tokens)
RETELL_API_KEY=key_your_retell_api_key_here

# Client-side (enable Retell in frontend)
VITE_RETELL_ENABLED=true
VITE_RETELL_AGENT_ID=your-retell-agent-id
VITE_API_BASE_URL=http://localhost:3001
```

Retell uses a backend-generated access token for secure WebRTC connections via LiveKit. The agent configuration (voice, LLM, prompts) is managed in the Retell Dashboard; the retell-client-js-sdk handles audio streaming.

```bash
# Server-side environment (Gemini requires backend for ephemeral tokens)
GEMINI_API_KEY=your_gemini_api_key_here

# Client-side (enable Gemini in frontend)
VITE_GEMINI_ENABLED=true
VITE_GEMINI_VOICE=Puck  # Options: Puck, Charon, Kore, Fenrir, Aoede, + 25 more HD voices
VITE_API_BASE_URL=http://localhost:3001
```

Gemini Live uses ephemeral tokens from the backend for secure WebSocket connections. It features AudioWorklet-based audio capture (16kHz) and playback (24kHz), 30 HD voices, a session timer with warnings, and thinking-state visualization. Sessions are limited to 15 minutes.
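Which provider tabs render is driven entirely by the `VITE_*_ENABLED` flags shown above. A minimal sketch of that gating logic, using a hypothetical helper rather than the app's actual `ProviderContext` code (in the real app the values come from `import.meta.env`):

```typescript
// Hypothetical helper: derive the enabled provider tabs from VITE_*_ENABLED flags.
type Provider =
  | "elevenlabs-widget" | "elevenlabs-sdk" | "xai" | "openai"
  | "ultravox" | "vapi" | "retell" | "gemini";

const FLAG_FOR: Record<Provider, string> = {
  "elevenlabs-widget": "VITE_ELEVENLABS_ENABLED",
  "elevenlabs-sdk": "VITE_ELEVENLABS_SDK_ENABLED",
  xai: "VITE_XAI_ENABLED",
  openai: "VITE_OPENAI_ENABLED",
  ultravox: "VITE_ULTRAVOX_ENABLED",
  vapi: "VITE_VAPI_ENABLED",
  retell: "VITE_RETELL_ENABLED",
  gemini: "VITE_GEMINI_ENABLED",
};

// A provider tab is shown only when its flag is exactly the string "true".
function enabledProviders(env: Record<string, string | undefined>): Provider[] {
  return (Object.keys(FLAG_FOR) as Provider[]).filter(
    (p) => env[FLAG_FOR[p]] === "true",
  );
}

// Example: only the ElevenLabs SDK and Vapi tabs are configured.
console.log(
  enabledProviders({ VITE_ELEVENLABS_SDK_ENABLED: "true", VITE_VAPI_ENABLED: "true" }),
);
// -> ["elevenlabs-sdk", "vapi"]
```

Comparing against the string `"true"` matters: Vite exposes env values as strings, so a truthiness check would also enable a provider set to `"false"`.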
- Smooth Tab Transitions: Framer Motion animations for seamless provider switching
- Empty State Guidance: Clear setup instructions when providers aren't configured
- Keyboard Navigation: Arrow keys to navigate tabs, Enter/Space to select
- Mobile Responsive: Horizontal scrolling tabs on smaller screens
- Touch Optimized: 44px minimum touch targets for accessibility
This project includes comprehensive documentation:
- Quick Start Guide - Get up and running in minutes
- Installation & Configuration - Detailed setup instructions
- Deployment Guide - Production deployment for Vercel, Netlify, AWS, Firebase
- Architecture Overview - System design, components, and data flow
- API Integration Guide - Voice SDK integration and best practices
- Voice Features Documentation - Voice orb, audio visualization, and voice interactions
- Mobile Optimization Guide - Touch interactions, PWA features, and mobile performance
- Claude Code Integration Guide - Development commands, architecture overview, and guidelines for Claude Code
- Contributing Guidelines - Development setup, code style, and contribution process
- Code of Conduct - Community standards and guidelines
- Support Guide - Getting help, troubleshooting, and community resources
- Security Policy - Vulnerability reporting and security best practices
- Troubleshooting Guide - Common issues, solutions, and diagnostic tools
- Changelog - Version history and release notes
| Type | Documentation | Description |
|---|---|---|
| [DEPLOY] | Deployment | Production deployment guides |
| [DEMO] | Demo Mode | ngrok demo mode setup |
| [ARCH] | Architecture | Technical system design |
| [VOICE] | Voice Features | Voice AI functionality |
| [MOBILE] | Mobile Guide | Mobile optimization |
| [API] | API Integration | Voice SDK integration guide |
| [HELP] | Troubleshooting | Problem resolution |
| [AI] | Claude Integration | AI assistant development guide |
| [CONTRIB] | Contributing | Development guidelines |
| [SECURE] | Security | Security policies |
- Node.js 18+ and npm (or Bun)
- Modern browser with microphone access
- Voice provider credentials (see provider setup sections above)
```bash
# Clone the repository
git clone <YOUR_GIT_URL>
cd Voice-Agent-PuPuPlatter

# Install dependencies
npm install
# or with Bun
bun install

# Copy environment template
cp .env.example .env
# Edit .env with your provider credentials

# Start development server (frontend only)
npm run dev

# Or start both frontend and backend
npm run dev:all
```

The frontend runs on port 8082 by default. The backend API server runs on port 3001.

```bash
# Frontend only
npm run dev       # http://localhost:8082

# Frontend + Backend (for xAI, OpenAI, Ultravox, Retell)
npm run dev:all   # Frontend: 8082, Backend: 3001

# Backend only
npm run server    # http://localhost:3001
```

Demo mode exposes your local development environment via secure HTTPS tunnels for client demos, mobile testing, and team collaboration.
```bash
# Quick start
npm run demo
```

This starts ngrok tunnels, the frontend, and the backend with automatic CORS configuration. A shareable demo card is displayed:
```
+--------------------------------------------------------------+
|  Voice-Agent-PuPuPlatter - Demo Mode Active                  |
|                                                              |
|  Frontend: https://abc123.ngrok-free.app                     |
|  Backend:  https://def456.ngrok-free.app                     |
|                                                              |
|  Press Ctrl+C to stop                                        |
+--------------------------------------------------------------+
```
Prerequisites:

- Install ngrok: `./scripts/ngrok/install-instructions.sh`
- Authenticate: `ngrok config add-authtoken YOUR_TOKEN`
- Install jq: `sudo apt install jq` (or `brew install jq`)

Optional configuration (in `.env`):

```bash
NGROK_DOMAIN=myapp.ngrok.dev   # Custom domain (paid plans)
NGROK_AUTH_USER=demo           # Basic auth username
NGROK_AUTH_PASS=secretpass     # Basic auth password
```

See `docs/DEMO_MODE.md` for comprehensive documentation, troubleshooting, and provider-specific notes.
- Framework: React 19.2, TypeScript 5.9
- Build Tool: Vite 7.2 with SWC for fast compilation
- Styling: Tailwind CSS 4.1, Framer Motion animations
- Voice AI SDKs:
- @elevenlabs/react v0.12.3
- @vapi-ai/web v2.5.2
- ultravox-client v0.5.0
- retell-client-js-sdk v2.0.7
- @google/genai v1.37.0 (Gemini Live)
- UI Components: Radix UI primitives with shadcn/ui styling
- State Management: React Context, TanStack Query
- Testing: Vitest, React Testing Library, Playwright
- Code Quality: ESLint, Prettier, Husky, lint-staged
- Containerization: Docker, docker-compose
The app is built mobile-first with:
- Touch-optimized controls (44px+ tap targets)
- Responsive breakpoints: 375px -> 768px -> 1024px+
- Thumb-reachable CTAs in bottom 20% of viewport
- Optimized for both portrait and landscape orientations
- Primary: Purple (#7C3AED) to Pink (#EC4899) gradients
- Background: Dark slate (#050714) with glassmorphism overlays
- Glass: Semi-transparent containers with backdrop blur
- Text: High contrast white/slate for accessibility
- Font: Inter (300-700 weights)
- Scale: Mobile-first responsive scaling
- Hierarchy: Clear visual hierarchy with gradient text accents
- Duration: 0.8s for major transitions, 0.2s for interactions
- Easing: `ease-out` for natural motion
- Respect: `prefers-reduced-motion` for accessibility
The project includes a comprehensive test suite:
```bash
# Run tests in watch mode
npm run test

# Run tests once (CI mode)
npm run test:run

# Run tests with UI
npm run test:ui

# Run E2E tests
npm run test:e2e

# Run E2E tests with UI
npm run test:e2e:ui
```

- 623+ tests covering components, contexts, hooks, and utilities
- 28 test files with comprehensive coverage
- Voice configuration tests - Provider voice selection, persistence
- Reconnection tests - Backoff logic, retry limits, connection recovery
- Conversation tests - Message bubbles, transcript panel, auto-scroll
- Function calling tests - Tool definitions, execution, result handling
- ProviderContext tests - Provider selection, localStorage persistence
- ProviderTabs tests - Tab rendering, keyboard navigation, accessibility
- Audio utilities tests - PCM encoding/decoding, base64 conversion
- Unit Tests: Component behavior, hooks, and utility functions
- Accessibility Tests: ARIA labels, keyboard navigation (Arrow keys, Tab, Enter)
- Integration Tests: Provider switching, voice connection flows
- E2E Tests: Full user flows with Playwright (Chromium, Firefox, WebKit)
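The audio-utility tests above cover PCM encoding/decoding. Web Audio delivers Float32 samples in [-1, 1], while the streaming providers expect 16-bit PCM, so the conversion looks roughly like the following sketch (illustrative, not the project's actual `audioUtils`):

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit PCM, clamping out-of-range values.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return pcm;
}

// Inverse: 16-bit PCM back to floats for playback or visualization.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] < 0 ? pcm[i] / 0x8000 : pcm[i] / 0x7fff;
  }
  return out;
}

console.log(floatTo16BitPCM(new Float32Array([0, 1, -1, 2])));
// Int16Array [0, 32767, -32768, 32767]  (the out-of-range 2 is clamped)
```

The asymmetric scale factors (0x8000 for negatives, 0x7fff for positives) keep both -1 and +1 representable without overflow; base64 framing for the wire format would sit on top of this.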
```bash
# Build the application
npm run build

# Preview production build locally
npm run preview
```

```bash
# Build Docker image
npm run docker:build

# Start with docker-compose
npm run docker:up

# Stop containers
npm run docker:down
```

```bash
# Install Vercel CLI
npm i -g vercel

# Deploy
vercel --prod
```

```bash
# Build command: npm run build
# Publish directory: dist
```

- Build the project with `npm run build`
- Upload the `dist` folder contents to your web server
- Ensure HTTPS is configured for microphone access
- Security
  - Configure environment variables on your hosting platform (never commit `.env`)
  - Use a server-side API key proxy for xAI, OpenAI, Ultravox, and Retell
  - Enable HTTPS with a valid SSL certificate
  - Set appropriate CORS origins in the backend
- Backend Server
  - Deploy the Express backend (`server/` directory) separately or as serverless functions
  - Configure `CORS_ORIGIN` to match your frontend URL
  - Ensure API keys are set in server environment variables
- Environment Variables

  ```bash
  # Frontend (build-time)
  VITE_ELEVENLABS_AGENT_ID=your_agent_id
  VITE_ELEVENLABS_ENABLED=true
  VITE_ELEVENLABS_SDK_ENABLED=true
  VITE_XAI_ENABLED=true
  VITE_OPENAI_ENABLED=true
  VITE_ULTRAVOX_ENABLED=true
  VITE_VAPI_ENABLED=true
  VITE_VAPI_WEB_TOKEN=your_web_token
  VITE_RETELL_ENABLED=true
  VITE_RETELL_AGENT_ID=your_retell_agent_id
  VITE_GEMINI_ENABLED=true
  VITE_GEMINI_VOICE=Puck
  VITE_API_BASE_URL=https://your-backend-api.com

  # Backend (runtime)
  ELEVENLABS_API_KEY=sk_xxx
  XAI_API_KEY=xai-xxx
  OPENAI_API_KEY=sk-xxx
  ULTRAVOX_API_KEY=your_ultravox_key
  RETELL_API_KEY=key_xxx
  GEMINI_API_KEY=your_gemini_key
  CORS_ORIGIN=https://your-frontend.com
  ```

- Browser Compatibility
  - Chrome/Edge: Full support
  - Firefox: Full support
  - Safari: Requires a user gesture for AudioContext (handled automatically)
  - Mobile: Works on iOS Safari 15+ and Chrome for Android
- Reconnection Behavior
  - Automatic reconnection on network interruption
  - Exponential backoff: 1s, 2s, 4s, 8s, up to a 30s maximum
  - Maximum 10 retry attempts before giving up
  - User can manually reconnect after max retries
- Audio Processing: 60fps canvas animation with WebAudio API
- Bundle Size: Optimized with Vite tree-shaking
- Loading: Progressive enhancement with loading states
- Accessibility: Full ARIA support and keyboard navigation
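The 60fps visualizer draws bars from AnalyserNode frequency data, which arrives as a Uint8Array of 0-255 magnitudes per frequency bin. The per-frame mapping from bins to normalized bar heights can be sketched as follows (illustrative, not the project's actual visualizer code):

```typescript
// Downsample AnalyserNode byte-frequency data (0-255 per bin) into N bar heights in [0, 1].
function barHeights(freqData: Uint8Array, barCount: number): number[] {
  const binsPerBar = Math.floor(freqData.length / barCount);
  const bars: number[] = [];
  for (let b = 0; b < barCount; b++) {
    let sum = 0;
    for (let i = 0; i < binsPerBar; i++) sum += freqData[b * binsPerBar + i];
    bars.push(sum / binsPerBar / 255); // average magnitude, normalized to [0, 1]
  }
  return bars;
}

// In the browser, freqData would be filled by analyser.getByteFrequencyData(freqData)
// inside a requestAnimationFrame loop before each canvas draw.
console.log(barHeights(new Uint8Array([255, 255, 0, 0]), 2)); // [1, 0]
```

Averaging adjacent bins is one common choice; taking the per-group maximum instead gives punchier bars at the cost of a noisier display.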
- Audio Data: Processed locally and streamed securely to voice providers
- No Storage: Conversations are not stored locally by default
- Permissions: Explicit microphone permission requests
- HTTPS: Required for microphone access in production
- API Keys: Server-side only for providers requiring backend
- Fork the repository
- Create a feature branch: `git checkout -b feat/amazing-feature`
- Commit changes: `git commit -m 'feat: add amazing feature'`
- Push to the branch: `git push origin feat/amazing-feature`
- Open a Pull Request

Commit message conventions:

- `feat:` New features
- `fix:` Bug fixes
- `docs:` Documentation changes
- `style:` Formatting changes
- `refactor:` Code refactoring
- `test:` Adding tests
- `chore:` Maintenance tasks
src/
|-- components/ # Reusable UI components
| |-- voice/ # Voice interaction components
| | |-- VoiceButton.tsx
| | |-- VoiceStatus.tsx
| | |-- VoiceVisualizer.tsx
| | |-- VoiceSelector.tsx
| | |-- VoiceWidget.tsx
| | |-- FunctionCallIndicator.tsx
| | \-- ReconnectionStatus.tsx
| |-- providers/ # Provider-specific components
| | |-- ElevenLabsProvider.tsx
| | |-- OpenAIProvider.tsx
| | |-- XAIProvider.tsx
| | |-- UltravoxProvider.tsx
| | |-- VapiProvider.tsx
| | |-- RetellProvider.tsx
| | \-- GeminiProvider.tsx
| |-- conversation/ # Conversation UI components
| | |-- ConversationPanel.tsx
| | |-- MessageBubble.tsx
| | |-- ElevenLabsConversationPanel.tsx
| | |-- OpenAIConversationPanel.tsx
| | |-- XAIConversationPanel.tsx
| | |-- UltravoxConversationPanel.tsx
| | |-- VapiConversationPanel.tsx
| | \-- GeminiConversationPanel.tsx
| |-- tabs/ # Tab navigation
| | |-- ProviderTabs.tsx
| | \-- ProviderTab.tsx
| |-- settings/ # Settings components
| | |-- ConfigurationDialog.tsx
| | |-- ProviderSettingsPanel.tsx
| | \-- ConnectionDiagnostics.tsx
| |-- ui/ # shadcn/ui components
| |-- BackgroundEffects.tsx
| |-- HeroSection.tsx
| |-- VoiceEnvironment.tsx
| |-- ParticleSystem.tsx
| \-- ThemeToggle.tsx
|-- contexts/ # React contexts
| |-- ThemeContext.tsx
| |-- VoiceContext.tsx # ElevenLabs SDK state
| |-- XAIVoiceContext.tsx
| |-- OpenAIVoiceContext.tsx
| |-- UltravoxVoiceContext.tsx
| |-- VapiVoiceContext.tsx
| |-- GeminiVoiceContext.tsx # Gemini Live state
| \-- ProviderContext.tsx # Active provider selection
|-- hooks/ # Custom React hooks
| |-- useReconnection.ts # WebSocket reconnection with backoff
| |-- useVapiVoice.ts
| |-- useRetellVoice.ts
| |-- useUltravoxVoice.ts
| |-- useOpenAIVoice.ts
| |-- useXAIVoice.ts
| |-- useGeminiVoice.ts # Gemini Live hook
| |-- useAccessibility.ts
| |-- useReducedMotion.ts
| \-- use-toast.ts
|-- lib/ # Utility functions
| |-- utils.ts
| |-- audio/ # Audio processing utilities
| | \-- audioUtils.ts
| |-- gemini/ # Gemini Live utilities
| | |-- audioUtils.ts # PCM encoding/decoding (16kHz/24kHz)
| | |-- audio-recorder.ts # Microphone capture (AudioWorklet)
| | |-- audio-streamer.ts # Audio playback
| | |-- genai-live-client.ts # WebSocket client
| | |-- config.ts # Voice/model configuration
| | \-- types.ts # TypeScript interfaces
| |-- worklets/ # AudioWorklet processors
| | \-- gemini-audio-worklet.ts
| |-- tools/ # Function calling tool definitions
| | \-- toolDefinitions.ts
| \-- vapi.ts # Vapi utilities
|-- pages/ # Page components
| |-- Index.tsx # Main application page
| \-- NotFound.tsx # 404 page
|-- test/ # Test files (28 test files, 623+ tests)
|-- types/ # TypeScript type definitions
| |-- ultravox.ts
| |-- vapi.ts
| |-- retell.ts
| |-- gemini.ts # Gemini types
| \-- voice-provider.ts
\-- server/ # Backend API server
|-- index.js # Express server
\-- routes/ # API routes
Each voice provider has:
- Provider Component (`*Provider.tsx`): UI wrapper with buttons, status, visualizer
- Voice Hook (`use*Voice.ts`): Connection logic, state management, events
- Voice Context (`*VoiceContext.tsx`): Global state for the provider
- Conversation Panel (`*ConversationPanel.tsx`): Real-time transcript display

The tab system:

- `ProviderTabs.tsx` renders all enabled provider tabs
- Tabs are controlled via environment variables (`VITE_*_ENABLED`)
- Smooth Framer Motion transitions between providers
- Keyboard navigation with arrow keys
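Function-calling tools like those in `lib/tools/toolDefinitions.ts` generally pair a declaration the model sees (name, description, JSON-Schema parameters) with a local executor the app runs when the provider emits a function-call event. A hedged sketch of that shape, with hypothetical tool names and fields rather than the project's actual definitions:

```typescript
// Hypothetical tool-definition shape: a schema the model sees plus a local executor.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema sent to the provider
  execute: (args: Record<string, unknown>) => Promise<string>;
}

const tools: ToolDefinition[] = [
  {
    name: "get_current_time",
    description: "Returns the current time as an ISO string",
    parameters: { type: "object", properties: {} },
    execute: async () => new Date().toISOString(),
  },
  {
    name: "add_numbers",
    description: "Adds two numbers",
    parameters: {
      type: "object",
      properties: { a: { type: "number" }, b: { type: "number" } },
      required: ["a", "b"],
    },
    execute: async (args) => String(Number(args.a) + Number(args.b)),
  },
];

// Dispatch a function-call event from the provider back to the matching tool;
// the returned string is what gets sent back to the model as the tool result.
async function handleFunctionCall(
  name: string,
  args: Record<string, unknown>,
): Promise<string> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.execute(args);
}

handleFunctionCall("add_numbers", { a: 2, b: 3 }).then(console.log); // "5"
```

Each provider wires this up differently (tool schemas are registered at session setup, and results are posted back over the live connection), but the declaration-plus-executor pairing is common to all of them.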
- Chrome/Edge: Full feature support
- Firefox: Full support, some animation optimizations may vary
- Safari: WebAudio API requires user gesture for initialization
- Mobile Browsers: Optimized for mobile, background tab throttling may affect audio
- HTTPS Requirement: Microphone access requires HTTPS in production
- Safari WebAudio: May require user interaction before audio processing starts
- Mobile Chrome: Background tab throttling affects audio visualization
- Provider Configuration: Each provider requires specific setup (see Configuration section)
- Verify your provider credentials are correctly configured
- Check browser console for specific error messages
- Ensure microphone permissions are granted
- Verify HTTPS is used in production
- For backend-requiring providers, ensure the backend is running
- Check browser compatibility with Web Audio API
- Ensure microphone permissions are granted
- Verify audio input device is working
- Check for browser tab throttling
- Reduce animation complexity in settings
- Disable particle effects on lower-end devices
- Check for browser-specific optimizations
- Node.js 18+ (or Bun)
- Modern browser with microphone support
- Voice provider credentials
```bash
# Development server
npm run dev          # Frontend only
npm run dev:all      # Frontend + Backend
npm run server       # Backend only

# Build
npm run build        # Production build
npm run build:dev    # Development build
npm run preview      # Preview production build

# Code quality
npm run lint         # Run ESLint
npm run format       # Format with Prettier
npm run format:check # Check formatting

# Testing
npm run test         # Watch mode
npm run test:run     # Single run
npm run test:ui      # Visual UI
npm run test:e2e     # Playwright E2E

# Docker
npm run docker:build # Build image
npm run docker:up    # Start containers
npm run docker:down  # Stop containers
```

- Clone the repository
- Install dependencies with `npm install` or `bun install`
- Copy `.env.example` to `.env` and configure your credentials
- Start the development server with `npm run dev` (or `npm run dev:all` for the full stack)
- Open `http://localhost:8082` in your browser
This project is licensed under the MIT License - see the LICENSE file for details.
- ElevenLabs for conversational AI technology
- OpenAI for GPT-4o Realtime API
- xAI for Grok voice assistant
- Ultravox for low-latency voice AI
- Vapi for voice AI platform
- Retell AI for conversational AI
- Google Gemini for Gemini Live API
- Radix UI for accessible UI primitives
- shadcn/ui for beautiful UI components
- Framer Motion for smooth animations
- Tailwind CSS for utility-first styling
- Vite for fast development and building
Current Version: v1.0.31
[!] Important: For production use, implement proper API key management and server-side authentication for providers requiring backend support.