# DeepgramSTT
Add production-grade real-time speech recognition to your voice pipeline with Deepgram's WebSocket API.
Use DeepgramSTT for production voice pipelines that need high accuracy, word-level timestamps, and wide language/model support via Deepgram’s V1 (Nova) streaming API.
Looking for eager end-of-turn / preflight signals? Use DeepgramFlux instead — it connects to Deepgram’s V2 API and supports the eager LLM pipeline.
## Prerequisites

- A Deepgram API key
- The `@deepgram/sdk` peer dependency installed:

```bash
npm install @deepgram/sdk
```
For production, set up a proxy server so your API key stays server-side.
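Conceptually, the proxy's job is just to forward the client's streaming request to Deepgram and attach the key server-side. A minimal sketch of that mapping (the function name and shape are illustrative, not part of this library or the Deepgram SDK):

```typescript
// Illustrative only: build the upstream Deepgram URL and auth header
// for a proxy route. The client never sees the API key.
function buildUpstream(apiKey: string, search: string) {
  return {
    // Forward the client's query string (model, language, etc.) unchanged
    url: `wss://api.deepgram.com/v1/listen${search}`,
    // Deepgram expects 'Token <key>' authorization, attached server-side
    headers: { Authorization: `Token ${apiKey}` },
  };
}

const upstream = buildUpstream('dg_secret', '?model=nova-3&smart_format=true');
// upstream.url → 'wss://api.deepgram.com/v1/listen?model=nova-3&smart_format=true'
```

See the proxy server example below for a complete, deployable version.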
## Basic setup

```typescript
import { CompositeVoice, DeepgramSTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    options: {
      model: 'nova-3',
      smartFormat: true,
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new NativeTTS(),
});

await agent.start();
```
## Configuration options

| Option | Type | Default | Description |
|---|---|---|---|
| `proxyUrl` | `string` | — | URL of your CompositeVoice proxy endpoint (recommended) |
| `apiKey` | `string` | — | Deepgram API key (development only) |
| `language` | `string` | `'en-US'` | Language code |
| `interimResults` | `boolean` | `true` | Emit partial transcripts while the user speaks |
| `options.model` | `string` | `'nova-3'` | Transcription model (see model table below) |
| `options.smartFormat` | `boolean` | `true` | Auto-punctuation and formatting |
| `options.punctuation` | `boolean` | `true` | Add punctuation to results |
| `options.endpointing` | `boolean \| number` | `10` | Milliseconds of silence before end-of-speech (`false` to disable) |
| `options.diarize` | `boolean` | `false` | Speaker identification (V1 only) |
| `options.keywords` | `string[]` | — | Boost recognition of specific terms, with optional weight (e.g. `'Deepgram:2'`) |
| `options.vadEvents` | `boolean` | `false` | Emit `SpeechStarted` events (V1 only) |
| `options.detectEntities` | `boolean` | `false` | Detect entities in the transcript (V1 only) |
| `options.numerals` | `boolean` | `false` | Convert spoken numbers to digits (V1 only) |
| `options.redact` | `string[]` | — | Redact sensitive info: `'pci'`, `'ssn'`, `'numbers'` (V1 only) |
| `options.multichannel` | `boolean` | `false` | Transcribe each audio channel independently (V1 only) |
| `options.utterances` | `boolean` | `false` | Enable utterance segmentation (V1 only) |
See the API reference for the full list.
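As an illustration, several of these flags combined in one `options` object might look like the following (option names come from the table above; the exact accepted values are assumptions to check against the API reference):

```typescript
// Illustrative configuration only — see the API reference for exact types.
const sttOptions = {
  model: 'nova-3',
  smartFormat: true,
  endpointing: 300,          // wait 300 ms of silence before end-of-speech
  keywords: ['Deepgram:2'],  // boost 'Deepgram' with weight 2
  redact: ['pci', 'ssn'],    // V1 only: strip card numbers and SSNs
  numerals: true,            // V1 only: 'twenty five' -> '25'
};
```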
## Models

DeepgramSTT uses Deepgram’s V1 (Nova) model family:

| Model | Description |
|---|---|
| `nova-3` | Latest model, highest accuracy, recommended default |
| `nova-3-medical` | Optimized for medical terminology |
| `nova-2` | Previous generation; use if you need a language not yet in Nova-3 |
| `nova-2-*` | Domain variants: `meeting`, `finance`, `conversationalai`, `voicemail`, `medical`, `drivethru`, `automotive` |
| `nova` | Legacy, not recommended for new projects |
V1 uses an event-streaming model with `Results` events containing `is_final` and `speech_final` flags. Nova-3 delivers the best accuracy across the widest range of languages. Use Nova-2 variants for domain-specific vocabulary.
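To make the two flags concrete, here is a simplified sketch of how a consumer could assemble segments from `Results` events — this mirrors the buffering DeepgramSTT performs internally, but the class itself is illustrative, not a library API:

```typescript
// Illustrative sketch: accumulate is_final segments until speech_final,
// mirroring the utterance buffering DeepgramSTT does internally.
class UtteranceBuffer {
  private segments: string[] = [];

  // Returns the complete utterance once speech_final arrives, else null.
  push(text: string, isFinal: boolean, speechFinal: boolean): string | null {
    if (!isFinal) return null; // interim result: transcript may still change
    this.segments.push(text);
    if (!speechFinal) return null; // final segment, but utterance continues
    const utterance = this.segments.join(' ');
    this.segments = [];
    return utterance;
  }
}

const buf = new UtteranceBuffer();
buf.push('tell me about', true, false); // → null (buffered)
buf.push('the weather', true, true);    // → 'tell me about the weather'
```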
For Flux models (e.g., `flux-general-en`) with turn-based transcription and eager end-of-turn signals, use the DeepgramFlux provider instead.
## Complete example

```typescript
import { CompositeVoice, DeepgramSTT, AnthropicLLM, DeepgramTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    interimResults: true,
    options: {
      model: 'nova-3',
      smartFormat: true,
      punctuation: true,
      endpointing: 300,
      keywords: ['CompositeVoice'],
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  // eagerLLM requires DeepgramFlux — see the DeepgramFlux guide for eager pipeline setup
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:final', (event) => {
  console.log('User said:', event.text);
});

await agent.start();
```
## Tips and gotchas

- **Always use a proxy in production.** Pass `proxyUrl` instead of `apiKey` so your Deepgram key never reaches the browser. The SDK converts `http(s)` to `ws(s)` automatically.
- **Install the peer dependency.** DeepgramSTT dynamically imports `@deepgram/sdk` at initialization. If the package is missing, you get a clear error with install instructions.
- **Utterance buffering.** Deepgram may split one utterance into multiple `is_final` segments before emitting `speech_final`. DeepgramSTT buffers these segments and delivers the complete utterance text when `speechFinal: true`.
- **No preflight signals.** DeepgramSTT (V1/Nova) does not emit preflight/eager end-of-turn events. For the eager LLM pipeline, use DeepgramFlux instead.
- **Connection timeout.** The WebSocket connection defaults to a 10-second timeout. Adjust with `timeout` in the config if your network is slow.
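The `http(s)` to `ws(s)` conversion mentioned in the tips above amounts to a scheme rewrite on the proxy URL. A standalone equivalent (not the library's actual code, just the same idea) is:

```typescript
// Illustrative: rewrite an absolute http(s) proxy URL to its WebSocket
// scheme, as the SDK does automatically for proxyUrl.
function toWebSocketUrl(proxyUrl: string): string {
  return proxyUrl.replace(/^http/, 'ws'); // http → ws, https → wss
}

toWebSocketUrl('https://example.com/api/proxy/deepgram');
// → 'wss://example.com/api/proxy/deepgram'
```

Relative paths like `/api/proxy/deepgram` have no scheme to rewrite, so in practice the SDK resolves them against the page origin first.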
## Related resources
- Deepgram pipeline example — full Deepgram STT + TTS pipeline
- Eager pipeline example — preflight signals with speculative LLM (uses DeepgramFlux)
- Deepgram options example — explore transcription options
- DeepgramFlux guide — V2 Flux provider with eager end-of-turn signals
- Proxy server example — secure your API key server-side
- API reference: DeepgramSTT
- Providers reference