# FAQ & Troubleshooting

Quick answers for the most common issues developers hit when building with CompositeVoice: setup, browser gotchas, error codes, and their fixes.
## Setup & Installation

### "Cannot find module '@lukeocodes/composite-voice'"

The SDK must be built before examples or consuming apps can import it:

```bash
pnpm install && pnpm build
```
### "Missing API key" or 401 errors

- Copy the example's `sample.env` to `.env` in the same directory.
- Fill in the key, e.g. `ANTHROPIC_API_KEY=sk-ant-...`
- Never prefix keys with `VITE_`: Vite auto-exposes `VITE_*` variables to the browser, leaking your key. The SDK proxy reads plain env vars server-side.
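For example, a minimal `.env` might look like the following (key names other than `ANTHROPIC_API_KEY` are illustrative; check each example's `sample.env` for the exact names it expects):

```bash
# .env, read server-side only; never commit it, never prefix keys with VITE_
ANTHROPIC_API_KEY=sk-ant-...
DEEPGRAM_API_KEY=your-deepgram-key
```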
### Missing peer dependency errors
Some providers require an external SDK:
| Provider | Peer dependency |
|---|---|
| AnthropicLLM | `@anthropic-ai/sdk` (>=0.67.0) |
| OpenAILLM, GroqLLM, GeminiLLM, MistralLLM, OpenAITTS | `openai` (>=6.5.0) |
| DeepgramSTT, DeepgramTTS | None (built-in WebSocket) |
| DeepgramFlux | `@deepgram/sdk` (>=5.0.0) |
| WebLLMLLM | `@mlc-ai/web-llm` (>=0.2.74) |
| AssemblyAISTT | None (built-in WebSocket) |
| ElevenLabsSTT, ElevenLabsTTS | None (built-in WebSocket) |
| CartesiaTTS | None (built-in WebSocket) |
Install what you need:
```bash
npm install @anthropic-ai/sdk  # for AnthropicLLM
npm install openai             # for OpenAILLM, GroqLLM, GeminiLLM, MistralLLM, OpenAITTS
npm install @mlc-ai/web-llm    # for WebLLMLLM
```
## Browser & OS Compatibility

### NativeSTT does nothing / microphone dies silently

The Web Speech API in Chrome sends audio to Google's speech recognition servers, so support varies by browser:
| Browser | NativeSTT | NativeTTS | Why |
|---|---|---|---|
| Chrome / Edge | Works | Works | Full Google services |
| Ungoogled Chromium | No | Works | Google services stripped |
| Brave | No | Works | Google services stripped |
| Firefox | No | Limited | Web Speech API not implemented |
| Safari | Unreliable | Works | Partial, varies by version |
Fix: Switch to a WebSocket-based STT provider (DeepgramSTT, AssemblyAISTT, or ElevenLabsSTT); all three work in every modern browser.
### WebLLM won't load: "WebGPU not available"

WebLLMLLM requires a WebGPU-capable browser. Check first:

```js
if (!navigator.gpu) {
  console.error('WebGPU not supported — use a cloud LLM provider instead');
}
```
Supported in Chrome 113+ and Edge 113+ only.
### Voice sounds different across operating systems
NativeTTS uses whatever voices the OS provides. Quality varies:
- macOS: Select (Enhanced) or (Premium) voices in System Settings > Accessibility > Spoken Content for better quality.
- Windows: Install additional voices via Settings > Time & Language > Speech.
- Linux: Voice quality depends on the speech-dispatcher configuration.
For consistent quality across platforms, use DeepgramTTS, ElevenLabsTTS, or CartesiaTTS.
## Microphone & Audio

### Microphone permission denied
Browsers require a user gesture (click, tap) before granting microphone access. Make sure:
- The user clicks a button before `startListening()` is called.
- The page is served over `localhost` or `https://`; microphone access is blocked on plain `http://`.
- The user hasn't previously denied permission. Click the lock icon in the address bar to reset.

If permission was denied, NativeSTT throws a `ProviderConnectionError`. Other STT providers (DeepgramSTT, etc.) rely on the SDK's internal AudioCapture, which throws a `MicrophonePermissionError`.
### No audio playback / speakers are silent

- Check the system volume isn't muted.
- Check the browser console for `AudioContext` suspension warnings; browsers suspend audio until a user gesture occurs.
- If using DeepgramTTS, verify the voice model is available on your plan. Try `aura-2-thalia-en` (available on the free tier).
### Audio cuts out during TTS playback
By default, CompositeVoice pauses the microphone while TTS is playing to prevent echo. This is expected. After playback finishes, listening resumes automatically. See the turn-taking guide to customize this behaviour.
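The default behaviour can be pictured with a tiny state sketch (purely illustrative; the real SDK drives this from its own playback events, and these function names are not its API):

```typescript
// Illustrative sketch of the default turn-taking behaviour: the mic is
// paused while TTS plays and resumes automatically when playback ends.
function createTurnTaking() {
  let listening = true;
  return {
    onTTSStart: () => { listening = false; }, // pause mic to prevent echo
    onTTSEnd: () => { listening = true; },    // resume after playback
    isListening: () => listening,
  };
}
```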
## Network & Connectivity

### WebSocket connection fails
Common causes:
- Invalid API key: check `.env` for typos and verify the key isn't revoked.
- Corporate VPN/firewall: some networks block outbound WebSocket connections on port 443. Try a different network.
- Proxy not running: in development, the Vite proxy only works during `pnpm dev`. In production, ensure the Express/Node server is running.
### 404 on `/proxy/*` endpoints

- Development: the Vite dev server proxy only works with `pnpm dev`, not after a production build.
- Production: ensure `server.ts` is running and `createExpressProxy` is mounted at the correct path.
### WebSocket timeout during long pauses

DeepgramFlux uses Deepgram's V2 API, which may disconnect after extended silence. Call `sendKeepAlive()` to maintain the connection:

```js
// Send a keep-alive every 8 seconds during idle periods
setInterval(() => stt.sendKeepAlive(), 8000);
```
For other providers, adjust the timeout config (default: 10,000ms).
### WebSocket proxying in Next.js
Standard Vercel/Next.js deployments do not support WebSocket upgrade. If you need WebSocket proxying for DeepgramSTT/TTS, use a custom Next.js server or deploy the proxy separately.
## Speech-to-Text Issues

### Transcripts cut off mid-sentence

Deepgram may split a single utterance into multiple `is_final` segments. The SDK buffers these until `speech_final` fires, but if the endpointing threshold is too aggressive, words get split.
Fix: Increase the endpointing window:

```js
new DeepgramSTT({
  options: {
    endpointing: 500, // ms of silence before end-of-speech (default: 300)
    utteranceEndMs: 1500, // ms before utterance boundary (default: 1000)
  },
});
```
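The buffering described above can be sketched as follows (the `is_final` and `speech_final` flags mirror Deepgram's result payload; the surrounding message shape is simplified):

```typescript
// Simplified sketch of buffering is_final segments until speech_final fires,
// emitting one complete utterance per detected end of speech.
interface FinalFlags {
  transcript: string;
  is_final: boolean;
  speech_final: boolean;
}

function createUtteranceBuffer(onUtterance: (text: string) => void) {
  let parts: string[] = [];
  return (result: FinalFlags): void => {
    if (!result.is_final) return; // ignore interim results
    parts.push(result.transcript);
    if (result.speech_final) {    // end of speech: flush one full utterance
      onUtterance(parts.join(' ').trim());
      parts = [];
    }
  };
}
```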
### Eager pipeline not working: no preflight events

The eager LLM pipeline only works with DeepgramFlux. Check:

- You're using `DeepgramFlux`, not `DeepgramSTT`.
- `eagerEotThreshold` is set; without it, no `EagerEndOfTurn` events fire.
- Your Deepgram account has V2 Flux model access.
```js
new DeepgramFlux({
  options: {
    model: 'flux-general-en',
    eagerEotThreshold: 0.5, // enables preflight signals
    eotThreshold: 0.7,
  },
});
```
### Speech recognition stops mid-session (NativeSTT)

Chrome's Web Speech API sometimes fires `onend` even with `continuous: true`. The SDK auto-restarts recognition (up to 5 retries). If it keeps happening, check the browser console for errors, or switch to a WebSocket-based STT provider.
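The restart behaviour can be approximated like this (a sketch, not the SDK's implementation; `RecognitionLike` stands in for the browser's `SpeechRecognition`):

```typescript
// Sketch: restart recognition whenever it ends unexpectedly, up to a retry
// cap, mirroring the auto-restart behaviour described above.
interface RecognitionLike {
  start(): void;
  onend: (() => void) | null;
}

function withAutoRestart(rec: RecognitionLike, maxRetries = 5) {
  let retries = 0;
  let stopped = false;
  rec.onend = () => {
    if (stopped || retries >= maxRetries) return; // give up past the cap
    retries += 1;
    rec.start(); // Chrome may end recognition despite continuous: true
  };
  rec.start();
  return {
    stop: () => { stopped = true; },
    retryCount: () => retries,
  };
}
```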
## Language Model Issues

### Anthropic: "maxTokens is required"

Unlike OpenAI, the Anthropic API requires `maxTokens` on every request. The SDK defaults to 1024, but for voice apps, shorter is better:

```js
new AnthropicLLM({
  proxyUrl: '/api/proxy/anthropic',
  model: 'claude-haiku-4-5',
  maxTokens: 256, // keep voice responses concise
});
```
### LLM restarts too frequently (eager pipeline)
If the eager pipeline is cancelling and restarting the LLM on every small transcript change:
- Raise `similarityThreshold` (e.g., `0.9`) to tolerate minor word-level changes.
- Set `cancelOnTextChange: false` if small differences are acceptable.
- Speak with natural pauses; the eagerness threshold needs clear end-of-turn signals.
### Rate limiting (429 errors)

The SDK emits an `llm.error` event with the details. To handle:

- Subscribe to `agent.on('llm.error', ...)` and surface a "please wait" message.
- For Anthropic, check your rate limits; upgrade your plan or add retry logic.
- Lower `maxTokens` to reduce token consumption per turn.
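A generic retry helper for this situation might look like the following (illustrative only; the `status` field assumes the error shape thrown by your proxy or HTTP client, not a documented SDK contract):

```typescript
// Retry a request on HTTP 429 with exponential backoff. Any other error,
// or exhausting the retries, rethrows immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (attempt >= maxRetries || err?.status !== 429) throw err;
      const delayMs = baseMs * 2 ** attempt; // 500ms, 1s, 2s with defaults
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```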
### WebLLM: first load is slow (100+ MB download)

The first load downloads model weights to the browser cache. Wire `onLoadProgress` to show a loading indicator:

```js
new WebLLMLLM({
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  onLoadProgress: (progress) => {
    loadingBar.style.width = `${(progress.progress * 100).toFixed(0)}%`;
  },
});
```
Subsequent loads use the cache and are near-instant.
## Text-to-Speech Issues

### OpenAI TTS: response is slow
OpenAITTS is REST-based — the full audio must be generated before playback starts. For lower latency, switch to a streaming TTS provider: DeepgramTTS, ElevenLabsTTS, or CartesiaTTS.
### 4096-character limit (OpenAI TTS)

The OpenAI TTS API has a 4096-character request limit. The SDK handles this automatically, but if your LLM responses are long, consider reducing `maxTokens` or adding "keep responses brief" to your system prompt.
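The SDK handles the split for you, but conceptually the chunking works along these lines (an illustrative sketch, not the SDK's actual implementation; single sentences longer than the limit are not handled here):

```typescript
// Split text into chunks under a length limit, breaking at sentence
// boundaries so each TTS request stays within the 4096-character cap.
function chunkText(text: string, maxLen = 4096): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```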
## Cost Optimization

### Conversation history makes costs grow

Every turn appends to the message history, so input tokens grow over time. Control this with `maxTurns`:

```js
new CompositeVoice({
  providers: [stt, llm, tts],
  conversationHistory: {
    enabled: true,
    maxTurns: 10, // keep last 10 exchanges, drop older ones
  },
});
```
Lower `maxTurns` means lower cost per turn but less context for the AI. Set `maxTurns: 0` for unlimited history (careful with long sessions).
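Conceptually, the trimming works like this (a sketch with an assumed message shape; `maxTurns: 0` keeps everything, matching the behaviour described above):

```typescript
// Keep only the most recent maxTurns exchanges, where one turn is a
// user message plus an assistant message. 0 means unlimited history.
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

function trimHistory(history: Message[], maxTurns: number): Message[] {
  if (maxTurns <= 0) return history;    // unlimited
  return history.slice(-maxTurns * 2);  // drop the oldest exchanges
}
```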
### Choosing the right model for voice
Voice apps prioritize latency over capability. Recommended picks:
| Use case | LLM | Why |
|---|---|---|
| Lowest cost | GroqLLM (llama-3.3-70b-versatile) | Free tier, fastest inference |
| Best quality | AnthropicLLM (claude-haiku-4-5) | Fast, high quality, low cost |
| Full privacy | WebLLMLLM | Runs entirely in the browser |
## Error Codes

All SDK errors extend `CompositeVoiceError` with a `code` and a `recoverable` flag.

| Error Class | Code | Recoverable | Typical cause |
|---|---|---|---|
| `ProviderInitializationError` | `PROVIDER_INIT_ERROR` | No | Missing API key, missing peer dependency |
| `ProviderConnectionError` | `PROVIDER_CONNECTION_ERROR` | Yes | Network issue, service down |
| `ProviderResponseError` | `PROVIDER_RESPONSE_ERROR` | Yes | HTTP error (429 rate limit, 500 server error) |
| `AudioCaptureError` | `AUDIO_CAPTURE_ERROR` | Yes | Device disconnected, stream interrupted |
| `AudioPlaybackError` | `AUDIO_PLAYBACK_ERROR` | Yes | AudioContext failure, decoding error |
| `MicrophonePermissionError` | `MICROPHONE_PERMISSION_DENIED` | No | User denied microphone access |
| `ConfigurationError` | `CONFIGURATION_ERROR` | No | Invalid SDK configuration |
| `InvalidStateError` | `INVALID_STATE_ERROR` | No | Operation in wrong state (e.g., startListening before initialize) |
| `TimeoutError` | `TIMEOUT_ERROR` | Yes | WebSocket or API call exceeded time limit |
| `WebSocketError` | `WEBSOCKET_ERROR` | Yes | Connection drop, send failure |
Recoverable errors are transient; the SDK can retry automatically with `autoRecover: true`. Non-recoverable errors require user action (grant permission, fix config, add API key).
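A handler that branches on the `recoverable` flag might look like this (a sketch using the codes from the table above; the error shape mirrors, but is not imported from, the SDK):

```typescript
// Decide how to react to an SDK error based on its recoverable flag and code.
interface CompositeVoiceErrorLike {
  code: string;
  recoverable: boolean;
  message: string;
}

function describeAction(err: CompositeVoiceErrorLike): string {
  if (err.recoverable) return `transient (${err.code}): retry or enable autoRecover`;
  switch (err.code) {
    case 'MICROPHONE_PERMISSION_DENIED':
      return 'ask the user to grant microphone access';
    case 'PROVIDER_INIT_ERROR':
      return 'check API keys and peer dependencies';
    default:
      return 'fix the configuration and restart';
  }
}
```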
See Error Recovery for detailed handling patterns.
## Production Checklist

Before deploying:

- All API keys in server-side env vars (not `VITE_`-prefixed)
- Using `proxyUrl` on every provider (no `apiKey` in browser code)
- HTTPS enabled (required for microphone access)
- Rate limiting configured at the proxy level
- CORS restricted to your domain
- Spending limits set on provider dashboards
- Error events handled (`agent.error`, `llm.error`, `tts.error`, `transcription.error`)
- Logging enabled with an appropriate level (`'info'` for production, `'debug'` for staging)
- `conversationHistory.maxTurns` set to control cost growth
- Tested in target browsers (especially if using NativeSTT/NativeTTS)
## Still stuck?

- Enable debug logging: `logging: { enabled: true, level: 'debug' }`. The SDK logs every state transition, WebSocket message, and provider call.
- Check the Error Recovery guide for auto-recovery patterns.
- Browse the examples; each one has a troubleshooting section in its README.
- Open an issue on GitHub.