# Getting Started

Install CompositeVoice and build your first voice pipeline in under five minutes.
## Prerequisites
- Node.js 18 or later
- A package manager: npm, pnpm, or yarn
- An Anthropic API key (for the LLM provider)
## Install

```bash
npm install @lukeocodes/composite-voice
```
## Your first voice pipeline

The simplest pipeline uses browser-native speech recognition and synthesis, with no extra API keys beyond Anthropic. `NativeSTT` wraps the Web Speech API's `SpeechRecognition` interface; `NativeTTS` wraps `SpeechSynthesis`. Both work out of the box in Chrome, Edge, and Safari.
```typescript
import {
  CompositeVoice,
  NativeSTT,
  AnthropicLLM,
  NativeTTS,
} from '@lukeocodes/composite-voice';

const voice = new CompositeVoice({
  stt: new NativeSTT(),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
  }),
  tts: new NativeTTS(),
});
```

The `proxyUrl` keeps your Anthropic API key on the server; the browser never sees it. See *Secure your API key with the server proxy* below.
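Web Speech API availability varies across browsers, so you may want to feature-detect before constructing a native pipeline. A minimal sketch; the helper is illustrative and not part of CompositeVoice:

```typescript
// Illustrative helper (not part of CompositeVoice): checks whether the
// environment exposes the browser APIs that NativeSTT and NativeTTS rely on.
// Chrome ships SpeechRecognition under the webkit prefix, so check both names.
function supportsNativeSpeech(w: object = globalThis): boolean {
  const hasRecognition = 'SpeechRecognition' in w || 'webkitSpeechRecognition' in w;
  const hasSynthesis = 'speechSynthesis' in w;
  return hasRecognition && hasSynthesis;
}
```

If the check fails, you can fall back to the cloud providers described later in this guide.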
## Listen for events

CompositeVoice uses an event-driven architecture. Subscribe to the events you care about before calling `start()`.
```typescript
voice.on('agent:stateChange', ({ state }) => {
  console.log('State:', state);
  // idle -> ready -> listening -> thinking -> speaking
});

voice.on('transcription:speechFinal', ({ text }) => {
  console.log('User:', text);
});

voice.on('llm:complete', ({ text }) => {
  console.log('Assistant:', text);
});

voice.on('agent:error', ({ error }) => {
  console.error('Error:', error.message);
});
```
The state machine drives the UI. When the agent enters `listening`, show a recording indicator. When it enters `thinking`, show a loading spinner. When it enters `speaking`, animate the assistant avatar.
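One way to sketch that mapping is a plain lookup table; the state names follow the lifecycle above, while the UI labels are hypothetical:

```typescript
// The state names follow the lifecycle above
// (idle -> ready -> listening -> thinking -> speaking);
// the UI hint values are hypothetical placeholders for your own components.
type AgentState = 'idle' | 'ready' | 'listening' | 'thinking' | 'speaking';

const stateToUi: Record<AgentState, string> = {
  idle: 'hidden',
  ready: 'mic-button',
  listening: 'recording-indicator',
  thinking: 'loading-spinner',
  speaking: 'avatar-animation',
};

function uiFor(state: AgentState): string {
  return stateToUi[state];
}
```

You would then drive rendering from the event handler, e.g. `voice.on('agent:stateChange', ({ state }) => render(uiFor(state)))`, where `render` is your own UI function.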
## Start and stop

```typescript
await voice.start(); // Requests microphone permission, opens connections
// ... user speaks, assistant responds ...
await voice.stop(); // Releases microphone, closes connections
```

`start()` returns a Promise that resolves once the microphone is active and all providers are connected. `stop()` tears everything down and releases all resources.
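Because `start()` acquires the microphone, it is worth guaranteeing that `stop()` runs even if the session errors. A sketch of that pattern; the wrapper is illustrative, not part of the library:

```typescript
// Illustrative wrapper (not part of CompositeVoice): runs stop() in a
// finally block so the microphone is released even if the session throws.
async function withVoiceSession(
  voice: { start(): Promise<void>; stop(): Promise<void> },
  body: () => Promise<void>,
): Promise<void> {
  await voice.start(); // if start() itself fails, there is nothing to release
  try {
    await body();
  } finally {
    await voice.stop();
  }
}
```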
## Secure your API key with the server proxy

The `proxyUrl` pattern keeps API keys server-side. The browser sends requests to your proxy endpoint, and the proxy injects the real API key before forwarding to the upstream provider.
Create an Express server:
```typescript
import express from 'express';
import { createExpressProxy } from '@lukeocodes/composite-voice/proxy';

const app = express();

const proxy = createExpressProxy({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  pathPrefix: '/api/proxy',
});

app.use(proxy.middleware);

const server = app.listen(3000, () => {
  proxy.attachWebSocket(server);
  console.log('Proxy listening on port 3000');
});
```
The browser sends requests to `/api/proxy/anthropic` and the proxy forwards them to `https://api.anthropic.com` with the real key attached. WebSocket providers (Deepgram, ElevenLabs, AssemblyAI, Cartesia) require the `attachWebSocket` call to handle upgrade requests.
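Conceptually, the path rewrite works like this. The sketch below is an illustration of the idea, not the proxy's actual implementation, and the upstream hosts are assumptions; the real middleware also streams request bodies and handles WebSocket upgrades:

```typescript
// Illustrative sketch of the proxy's path rewrite (not the real middleware).
// The upstream hosts here are assumptions for the example.
const upstreams: Record<string, string> = {
  anthropic: 'https://api.anthropic.com',
  deepgram: 'https://api.deepgram.com',
};

function resolveUpstream(path: string, prefix = '/api/proxy'): string | null {
  if (!path.startsWith(prefix + '/')) return null;
  const rest = path.slice(prefix.length + 1); // e.g. "anthropic/v1/messages"
  const [provider, ...tail] = rest.split('/');
  const host = upstreams[provider];
  return host ? `${host}/${tail.join('/')}` : null;
}
```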
## Upgrade to cloud providers

`NativeSTT` and `NativeTTS` work for prototyping. For production-quality speech recognition and synthesis, swap in cloud providers. `DeepgramSTT` and `DeepgramTTS` use low-latency WebSocket connections.
```typescript
import {
  CompositeVoice,
  DeepgramSTT,
  AnthropicLLM,
  DeepgramTTS,
} from '@lukeocodes/composite-voice';

const voice = new CompositeVoice({
  stt: new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
  }),
  tts: new DeepgramTTS({ proxyUrl: '/api/proxy/deepgram' }),
});
```
Update the proxy to include the Deepgram API key:
```typescript
const proxy = createExpressProxy({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  deepgramApiKey: process.env.DEEPGRAM_API_KEY,
  pathPrefix: '/api/proxy',
});
```
Every provider you configure in the proxy gets its own route. Add keys for only the providers you use.
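As an illustration of that key-to-route mapping, the hypothetical helper below derives route paths from a config object shaped like the `createExpressProxy` options above; it is not part of the library:

```typescript
// Hypothetical helper (not part of CompositeVoice): lists the routes a
// proxy config of the shape shown above would expose, one per configured key.
function routesFor(config: { pathPrefix: string; [key: string]: string | undefined }): string[] {
  return Object.keys(config)
    .filter((k) => k.endsWith('ApiKey') && config[k])
    .map((k) => `${config.pathPrefix}/${k.replace(/ApiKey$/, '').toLowerCase()}`);
}
```

With both keys configured as above, this would yield `/api/proxy/anthropic` and `/api/proxy/deepgram`.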
## Next steps
- Configuration — pipeline options, turn-taking, and audio settings
- Providers — all available STT, LLM, and TTS providers
- Events — the full event reference
- Examples — runnable demo apps