DeepgramSTT

Add production-grade real-time speech recognition to your voice pipeline with Deepgram's WebSocket API.

Use DeepgramSTT for production voice pipelines that need high accuracy, word-level timestamps, and wide language/model support via Deepgram’s V1 (Nova) streaming API.

Looking for eager end-of-turn / preflight signals? Use DeepgramFlux instead — it connects to Deepgram’s V2 API and supports the eager LLM pipeline.

Prerequisites

  • A Deepgram API key
  • The @deepgram/sdk peer dependency installed:
npm install @deepgram/sdk

For production, set up a proxy server so your API key stays server-side.

Basic setup

import { CompositeVoice, DeepgramSTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    options: {
      model: 'nova-3',
      smartFormat: true,
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new NativeTTS(),
});

await agent.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| proxyUrl | string | | URL of your CompositeVoice proxy endpoint (recommended) |
| apiKey | string | | Deepgram API key (development only) |
| language | string | 'en-US' | Language code |
| interimResults | boolean | true | Emit partial transcripts while the user speaks |
| options.model | string | 'nova-3' | Transcription model (see model table below) |
| options.smartFormat | boolean | true | Auto-punctuation and formatting |
| options.punctuation | boolean | true | Add punctuation to results |
| options.endpointing | boolean or number | 10 | Milliseconds of silence before end-of-speech (false to disable) |
| options.diarize | boolean | false | Speaker identification (V1 only) |
| options.keywords | string[] | | Boost recognition of specific terms (with optional weight, e.g. 'Deepgram:2') |
| options.vadEvents | boolean | false | Emit SpeechStarted events (V1 only) |
| options.detectEntities | boolean | false | Detect entities in the transcript (V1 only) |
| options.numerals | boolean | false | Convert spoken numbers to digits (V1 only) |
| options.redact | string[] | | Redact sensitive info: 'pci', 'ssn', 'numbers' (V1 only) |
| options.multichannel | boolean | false | Transcribe each audio channel independently (V1 only) |
| options.utterances | boolean | false | Enable utterance segmentation (V1 only) |

See the API reference for the full list.
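As an illustration of how a few of these options combine, here is a plain options object (values are examples only, not recommendations):

```typescript
// Illustrative options object using settings from the table above.
const sttOptions = {
  model: 'nova-3',
  smartFormat: true,
  endpointing: 300, // wait 300 ms of silence before ending speech
  keywords: ['CompositeVoice', 'Deepgram:2'], // 'term:weight' boosts a term more strongly
};
```

Pass this object as the `options` field of the DeepgramSTT config, as in the basic setup above.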

Models

DeepgramSTT uses Deepgram’s V1 (Nova) model family:

| Model | Description |
| --- | --- |
| nova-3 | Latest model, highest accuracy, recommended default |
| nova-3-medical | Optimized for medical terminology |
| nova-2 | Previous generation; use if you need a language not yet in Nova-3 |
| nova-2-* | Domain variants: meeting, finance, conversationalai, voicemail, medical, drivethru, automotive |
| nova | Legacy, not recommended for new projects |

V1 uses an event-streaming model with Results events containing is_final and speech_final flags. Nova-3 delivers the best accuracy across the widest range of languages. Use Nova-2 variants for domain-specific vocabulary.
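A rough, self-contained sketch of that event model (simplified for illustration; the real provider handles this internally, and the types here are assumptions, not the SDK's actual types):

```typescript
// Simplified shape of a V1 Results event.
interface ResultsEvent {
  transcript: string;
  is_final: boolean;     // this segment will not be revised further
  speech_final: boolean; // endpointing detected the end of speech
}

// Buffer finalized segments until speech_final, then emit one utterance.
function makeUtteranceBuffer(onUtterance: (text: string) => void) {
  const segments: string[] = [];
  return (event: ResultsEvent) => {
    if (!event.is_final) return; // interim results are revised later; skip here
    if (event.transcript) segments.push(event.transcript);
    if (event.speech_final) {
      onUtterance(segments.join(' ').trim());
      segments.length = 0; // reset for the next utterance
    }
  };
}
```

This is the same buffering behavior described under Tips and gotchas: multiple is_final segments can arrive before a single speech_final.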

For Flux models (e.g., flux-general-en) with turn-based transcription and eager end-of-turn signals, use the DeepgramFlux provider instead.

Complete example

import { CompositeVoice, DeepgramSTT, AnthropicLLM, DeepgramTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    interimResults: true,
    options: {
      model: 'nova-3',
      smartFormat: true,
      punctuation: true,
      endpointing: 300,
      keywords: ['CompositeVoice'],
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  // eagerLLM requires DeepgramFlux — see the DeepgramFlux guide for eager pipeline setup
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:final', (event) => {
  console.log('User said:', event.text);
});

await agent.start();

Tips and gotchas

  • Always use a proxy in production. Pass proxyUrl instead of apiKey so your Deepgram key never reaches the browser. The SDK converts http(s) to ws(s) automatically.
  • Install the peer dependency. DeepgramSTT dynamically imports @deepgram/sdk at initialization. If the package is missing, you get a clear error with install instructions.
  • Utterance buffering. Deepgram may split one utterance into multiple is_final segments before emitting speech_final. DeepgramSTT buffers these segments and delivers the complete utterance text when speechFinal: true.
  • No preflight signals. DeepgramSTT (V1/Nova) does not emit preflight/eager end-of-turn events. For the eager LLM pipeline, use DeepgramFlux instead.
  • Connection timeout. The WebSocket connection defaults to a 10-second timeout. Adjust with timeout in the config if your network is slow.
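The http(s)-to-ws(s) conversion mentioned in the first tip can be sketched like this (assumed behavior, not the library's actual code; the base argument stands in for the page origin):

```typescript
// Convert an http(s) proxy URL to its ws(s) equivalent.
function toWebSocketUrl(proxyUrl: string, base: string): string {
  const url = new URL(proxyUrl, base); // resolve relative paths against the origin
  url.protocol = url.protocol === 'https:' ? 'wss:' : 'ws:';
  return url.toString();
}
```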

© 2026 CompositeVoice. All rights reserved.
