# AssemblyAISTT

Add real-time speech recognition to your voice pipeline using AssemblyAI's WebSocket API. Use AssemblyAISTT when you need streaming transcription with word boosting for domain-specific vocabulary and automatic WebSocket reconnection.
## Prerequisites

- An AssemblyAI API key

No peer dependencies are required: AssemblyAISTT connects through a raw WebSocket managed by the SDK's built-in `WebSocketManager`. For production, set up a proxy server so your API key stays server-side.
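The core job of such a proxy is to dial AssemblyAI server-side with the key attached, so the browser only ever sees your own endpoint. A minimal sketch of the two pure pieces a relay needs, assuming an upstream URL of `wss://api.assemblyai.com/v2/realtime/ws` and the `/api/proxy/assemblyai` path used below (both illustrative; the actual proxy server example may differ):

```typescript
// Illustrative upstream endpoint; check AssemblyAI's docs for the current URL.
const UPSTREAM = 'wss://api.assemblyai.com/v2/realtime/ws';

// Map a client request path like '/api/proxy/assemblyai?sample_rate=16000'
// to the upstream AssemblyAI URL, preserving query parameters.
function upstreamUrl(clientPath: string): string {
  const q = clientPath.indexOf('?');
  return q === -1 ? UPSTREAM : UPSTREAM + clientPath.slice(q);
}

// Headers for the server-side upstream connection; the API key is read
// from the server environment and never reaches the browser.
function upstreamHeaders(apiKey: string): Record<string, string> {
  return { Authorization: apiKey };
}
```

A WebSocket server (for example, one built on the `ws` package) would then relay messages in both directions between the client connection and the upstream socket opened with these values.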
## Basic setup

```typescript
import { CompositeVoice, AssemblyAISTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new AssemblyAISTT({
    proxyUrl: '/api/proxy/assemblyai',
    sampleRate: 16000,
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new NativeTTS(),
});

await agent.start();
```
## Configuration options

| Option | Type | Default | Description |
|---|---|---|---|
| `proxyUrl` | `string` | — | URL of your CompositeVoice proxy endpoint (recommended) |
| `apiKey` | `string` | — | AssemblyAI API key (development only) |
| `sampleRate` | `number` | `16000` | Audio sample rate in Hz |
| `language` | `string` | `'en'` | Language code for transcription |
| `wordBoost` | `string[]` | — | Words to prioritize during recognition |
| `interimResults` | `boolean` | `true` | Emit partial transcripts while the user speaks |
| `timeout` | `number` | `10000` | Connection timeout in milliseconds |

See the API reference for the full list.
## Complete example

```typescript
import { CompositeVoice, AssemblyAISTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new AssemblyAISTT({
    proxyUrl: '/api/proxy/assemblyai',
    sampleRate: 16000,
    language: 'en',
    wordBoost: ['CompositeVoice', 'Deepgram', 'AssemblyAI'],
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new NativeTTS({ voiceLang: 'en-US' }),
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:final', (event) => {
  console.log('User said:', event.text);
});

agent.on('response:text', (event) => {
  console.log('Assistant:', event.text);
});

await agent.start();
```
## Tips and gotchas

- Always use a proxy in production. Pass `proxyUrl` instead of `apiKey` so your AssemblyAI key never reaches the browser. The SDK converts `http(s)` to `ws(s)` automatically.
- No peer dependencies. Unlike DeepgramSTT, AssemblyAISTT uses the SDK's built-in `WebSocketManager`: no extra packages to install.
- Word boosting improves accuracy. Pass product names, technical terms, or proper nouns in `wordBoost` so AssemblyAI prioritizes them during recognition.
- Audio is base64-encoded. The provider converts raw `ArrayBuffer` audio into base64 JSON messages (`{ audio_data: "..." }`) before sending. This is handled automatically.
- Automatic reconnection. The `WebSocketManager` reconnects with exponential backoff (up to 5 attempts, 1s initial delay, 30s max delay) if the connection drops.
- No preflight signals. AssemblyAISTT does not emit preflight/eager end-of-turn events. If you need the eager LLM pipeline, use DeepgramSTT instead.
- Graceful disconnect. When you call `disconnect()`, the provider sends a `terminate_session` message to AssemblyAI before closing the WebSocket.
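Two of these behaviors are easy to picture with a short sketch. The functions below are illustrative stand-ins, not SDK APIs: one mimics the base64 wire format the provider produces from a raw audio chunk, the other computes the reconnection delay schedule described above.

```typescript
// Illustrative only: mimics the { audio_data: "<base64>" } wire format
// the provider builds from a raw ArrayBuffer (the SDK does this for you).
function encodeAudioMessage(chunk: ArrayBuffer): string {
  return JSON.stringify({ audio_data: Buffer.from(chunk).toString('base64') });
}

// Illustrative only: the exponential backoff schedule described above
// (1s initial delay, doubling per attempt, capped at 30s).
function backoffDelay(attempt: number, initialMs = 1000, maxMs = 30000): number {
  return Math.min(initialMs * 2 ** attempt, maxMs);
}

// Four bytes of silence become a compact JSON message.
console.log(encodeAudioMessage(new Uint8Array([0, 0, 0, 0]).buffer)); // {"audio_data":"AAAAAA=="}

// The five reconnection attempts wait 1s, 2s, 4s, 8s, then 16s.
console.log([0, 1, 2, 3, 4].map((a) => backoffDelay(a))); // [ 1000, 2000, 4000, 8000, 16000 ]
```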
## Related resources
- Proxy server example — secure your API key server-side
- API reference: AssemblyAISTT
- Providers reference