AssemblyAISTT

Add real-time speech recognition with word boosting to your voice pipeline using AssemblyAI's WebSocket API.

Use AssemblyAISTT when you need real-time transcription with word boosting for domain-specific vocabulary, plus automatic WebSocket reconnection.

Prerequisites

No peer dependencies are required. AssemblyAISTT connects through a raw WebSocket managed by the SDK’s built-in WebSocketManager.

For production, set up a proxy server so your API key stays server-side.

Basic setup

import { CompositeVoice, AssemblyAISTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new AssemblyAISTT({
    proxyUrl: '/api/proxy/assemblyai',
    sampleRate: 16000,
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new NativeTTS(),
});

await agent.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| proxyUrl | string | | URL of your CompositeVoice proxy endpoint (recommended) |
| apiKey | string | | AssemblyAI API key (development only) |
| sampleRate | number | 16000 | Audio sample rate in Hz |
| language | string | 'en' | Language code for transcription |
| wordBoost | string[] | | Words to prioritize during recognition |
| interimResults | boolean | true | Emit partial transcripts while the user speaks |
| timeout | number | 10000 | Connection timeout in milliseconds |

See the API reference for the full list.
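Putting the table together, a fully specified options object might look like the sketch below. Every value is illustrative and spells out the documented defaults; only proxyUrl (or apiKey in development) is required.

```typescript
// Every AssemblyAISTT option from the table above, with the documented
// defaults written out explicitly for reference.
const sttOptions = {
  proxyUrl: '/api/proxy/assemblyai',           // keeps the API key server-side
  sampleRate: 16000,                           // default
  language: 'en',                              // default
  wordBoost: ['CompositeVoice', 'AssemblyAI'], // domain-specific vocabulary
  interimResults: true,                        // default: emit partial transcripts
  timeout: 10000,                              // default: 10s connection timeout
};

console.log(Object.keys(sttOptions).length); // 6
```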

Complete example

import { CompositeVoice, AssemblyAISTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new AssemblyAISTT({
    proxyUrl: '/api/proxy/assemblyai',
    sampleRate: 16000,
    language: 'en',
    wordBoost: ['CompositeVoice', 'Deepgram', 'AssemblyAI'],
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new NativeTTS({ voiceLang: 'en-US' }),
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:final', (event) => {
  console.log('User said:', event.text);
});

agent.on('response:text', (event) => {
  console.log('Assistant:', event.text);
});

await agent.start();

Tips and gotchas

  • Always use a proxy in production. Pass proxyUrl instead of apiKey so your AssemblyAI key never reaches the browser. The SDK converts http(s) to ws(s) automatically.
  • No peer dependencies. Unlike DeepgramSTT, AssemblyAISTT uses the SDK’s built-in WebSocketManager — no extra packages to install.
  • Word boosting improves accuracy. Pass product names, technical terms, or proper nouns in wordBoost so AssemblyAI prioritizes them during recognition.
  • Audio is base64-encoded. The provider converts raw ArrayBuffer audio into base64 JSON messages ({ audio_data: "..." }) before sending. This is handled automatically.
  • Automatic reconnection. The WebSocketManager reconnects with exponential backoff (up to 5 attempts, 1s initial delay, 30s max delay) if the connection drops.
  • No preflight signals. AssemblyAISTT does not emit preflight/eager end-of-turn events. If you need the eager LLM pipeline, use DeepgramSTT instead.
  • Graceful disconnect. When you call disconnect(), the provider sends a terminate_session message to AssemblyAI before closing the WebSocket.
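The audio framing and backoff behaviour described above can be sketched as two small pure functions. These are illustrative helpers, not part of the SDK's public API; they only mirror the message shape ({ audio_data: "..." }) and the documented backoff schedule (1s initial delay, doubling, capped at 30s).

```typescript
// Wrap a raw PCM ArrayBuffer as the base64 JSON message the provider
// sends over the WebSocket (handled automatically by AssemblyAISTT).
function frameAudio(chunk: ArrayBuffer): string {
  const base64 = Buffer.from(chunk).toString('base64');
  return JSON.stringify({ audio_data: base64 });
}

// Exponential backoff delay for reconnect attempt n (0-indexed):
// 1s for the first retry, doubling each time, capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30000);
}

console.log(frameAudio(new Uint8Array([1, 2, 3]).buffer)); // {"audio_data":"AQID"}
console.log([0, 1, 2, 3, 4, 5].map(backoffDelayMs)); // [1000, 2000, 4000, 8000, 16000, 30000]
```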
