# DeepgramFlux
Low-latency real-time speech recognition with eager end-of-turn signals for the speculative LLM pipeline.
Use DeepgramFlux for the lowest-latency voice pipelines. It connects to Deepgram’s V2 (Flux) streaming API, which delivers turn-based transcription with eager end-of-turn signals — the key ingredient for the eager LLM pipeline.
## Prerequisites
- A Deepgram API key
- The `@deepgram/sdk` (v5+) peer dependency installed:

  ```bash
  npm install @deepgram/sdk@^5
  ```
For production, set up a proxy server so your API key stays server-side.
## Basic setup
```typescript
import { CompositeVoice, DeepgramFlux, AnthropicLLM, DeepgramTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramFlux({
    proxyUrl: '/api/proxy/deepgram',
    options: {
      model: 'flux-general-en',
      eagerEotThreshold: 0.5,
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,
  },
});

await agent.start();
```
## Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| `proxyUrl` | `string` | — | URL of your CompositeVoice proxy endpoint (recommended) |
| `apiKey` | `string` | — | Deepgram API key (development only) |
| `language` | `string` | `'en-US'` | Language code |
| `interimResults` | `boolean` | `true` | Emit partial transcripts while the user speaks |
| `options.model` | `string` | `'flux-general-en'` | Flux transcription model |
| `options.encoding` | `string` | — | Audio encoding: `'linear16'`, `'linear32'`, `'mulaw'`, `'alaw'`, `'opus'`, `'ogg-opus'` |
| `options.sampleRate` | `number` | — | Audio sample rate in Hz (required when `encoding` is set) |
| `options.eotThreshold` | `number` | `0.7` | Confidence (0.5–0.9) required to confirm end-of-turn |
| `options.eagerEotThreshold` | `number` | — | Confidence (0.3–0.9) to fire `EagerEndOfTurn` (enables eager mode) |
| `options.eotTimeoutMs` | `number` | `5000` | Max ms before forcing end-of-turn regardless of confidence |
| `options.keyterms` | `string[]` | — | Specialized terminology to boost recognition |
| `options.tag` | `string` | — | Label for usage reporting in the Deepgram console |
| `options.mipOptOut` | `boolean` | `false` | Opt out of the Deepgram Model Improvement Program |
See the API reference for the full list.
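Note that `encoding` and `sampleRate` travel together: set both when you feed raw audio rather than letting the browser supply a container format. A minimal sketch, assuming a capture pipeline that produces 16-bit PCM at 16 kHz (the option names come from the table above; the specific values are assumptions about your audio source, not SDK requirements):

```typescript
// Illustrative Flux options for raw PCM input. The encoding and sample-rate
// values are assumptions about the capture pipeline, not SDK defaults.
const fluxOptions = {
  model: 'flux-general-en',
  encoding: 'linear16', // 16-bit little-endian PCM
  sampleRate: 16000,    // required whenever `encoding` is set
  eotThreshold: 0.7,
};
```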
## How Flux differs from DeepgramSTT
DeepgramFlux uses Deepgram’s V2 API (listen.v2), which is fundamentally different from the V1 API used by DeepgramSTT:
| | DeepgramSTT (V1) | DeepgramFlux (V2) |
|---|---|---|
| API | `listen.live` | `listen.v2` |
| Models | Nova-3, Nova-2 | Flux (e.g., `flux-general-en`) |
| Transcription paradigm | Event-streaming (`Results` events) | Turn-based (`TurnInfo` events) |
| Events | `is_final`, `speech_final` | `StartOfTurn`, `Update`, `EagerEndOfTurn`, `TurnResumed`, `EndOfTurn` |
| Preflight signals | No | Yes (`EagerEndOfTurn` → `isPreflight: true`) |
| Eager LLM pipeline | Not supported | Supported |
| Utterance buffering | SDK buffers `is_final` segments until `speech_final` | Turn lifecycle managed by Deepgram |
Use DeepgramFlux when: you want the lowest latency via the eager LLM pipeline, or you prefer the turn-based conversation model.
Use DeepgramSTT when: you need Nova-3’s broader language support, domain-specific models (medical, finance), or V1-specific features like diarization.
## TurnInfo events
Flux delivers transcription through TurnInfo events that map to the CompositeVoice transcription model:
| V2 event | SDK result | Description |
|---|---|---|
| `StartOfTurn` | `isFinal: false` | Speech detected, turn has begun |
| `Update` | `isFinal: false` | Partial transcript update (like interim results) |
| `EagerEndOfTurn` | `isPreflight: true` | Early end-of-turn prediction that triggers the eager LLM |
| `TurnResumed` | `isFinal: false` | User resumed speaking after an eager end-of-turn |
| `EndOfTurn` | `isFinal: true, speechFinal: true` | Confirmed end of utterance that triggers the standard LLM |
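The mapping above can be expressed as a small helper. This is an illustrative sketch only; the `flagsFor` function and `ResultFlags` shape are hypothetical, not part of the library's internals:

```typescript
// Hypothetical helper mirroring the event-to-flags mapping in the table above.
type FluxEvent = 'StartOfTurn' | 'Update' | 'EagerEndOfTurn' | 'TurnResumed' | 'EndOfTurn';

interface ResultFlags {
  isFinal: boolean;
  isPreflight?: boolean;
  speechFinal?: boolean;
}

function flagsFor(event: FluxEvent): ResultFlags {
  switch (event) {
    case 'EagerEndOfTurn':
      // Early prediction: not final, but marked as a preflight result
      return { isFinal: false, isPreflight: true };
    case 'EndOfTurn':
      // Confirmed end of utterance
      return { isFinal: true, speechFinal: true };
    default:
      // StartOfTurn, Update, and TurnResumed are all non-final updates
      return { isFinal: false };
  }
}
```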
## Eager LLM pipeline
The killer feature of DeepgramFlux is the EagerEndOfTurn signal. When Deepgram predicts that the speaker is about to stop talking, it fires this event early — before the final EndOfTurn confirmation. The SDK uses it to start LLM generation speculatively.
Configure the threshold to balance speed vs. accuracy:
```typescript
const stt = new DeepgramFlux({
  proxyUrl: '/api/proxy/deepgram',
  options: {
    model: 'flux-general-en',
    eagerEotThreshold: 0.5, // lower = faster but more false positives
    eotThreshold: 0.7, // higher = more certain before confirming end-of-turn
  },
});
```
Enable the eager pipeline in CompositeVoice:
```typescript
const agent = new CompositeVoice({
  stt,
  llm: new AnthropicLLM({ proxyUrl: '/api/proxy/anthropic' }),
  tts: new DeepgramTTS({ proxyUrl: '/api/proxy/deepgram' }),
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8, // accept if >=80% word overlap
  },
});
```
The similarityThreshold controls how different the final text can be from the preflight text before the speculative response is cancelled. A value of 0.8 means that if 80%+ of the words match (in order), the response is kept. See textSimilarity for details on how similarity is computed.
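As a rough illustration of the idea only (not the SDK's actual `textSimilarity` implementation, which may differ in normalization and weighting), in-order word overlap can be computed as a longest common subsequence over words:

```typescript
// Sketch of in-order word-overlap similarity via longest common subsequence.
// Illustrative only; not the library's textSimilarity implementation.
function wordSimilarity(a: string, b: string): number {
  const wa = a.toLowerCase().split(/\s+/).filter(Boolean);
  const wb = b.toLowerCase().split(/\s+/).filter(Boolean);
  if (wa.length === 0 && wb.length === 0) return 1;
  // dp[i][j] = length of the LCS of wa[0..i) and wb[0..j)
  const dp = Array.from({ length: wa.length + 1 }, () =>
    new Array<number>(wb.length + 1).fill(0)
  );
  for (let i = 1; i <= wa.length; i++) {
    for (let j = 1; j <= wb.length; j++) {
      dp[i][j] = wa[i - 1] === wb[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  // Fraction of words (in order) shared between preflight and final text
  return dp[wa.length][wb.length] / Math.max(wa.length, wb.length);
}
```

Under this sketch, a final transcript of "what is the weather today please" against a preflight of "what is the weather today" scores 5/6 ≈ 0.83, so with a 0.8 threshold the speculative response would be kept.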
## Complete example
```typescript
import { CompositeVoice, DeepgramFlux, AnthropicLLM, DeepgramTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramFlux({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    options: {
      model: 'flux-general-en',
      eagerEotThreshold: 0.5,
      eotThreshold: 0.7,
      eotTimeoutMs: 5000,
      keyterms: ['CompositeVoice'],
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,
  },
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:preflight', (event) => {
  console.log('Eager end-of-turn:', event.text);
});

agent.on('transcription:speechFinal', (event) => {
  console.log('Confirmed:', event.text);
});

await agent.start();
```
## Tips and gotchas
- **Always use a proxy in production.** Pass `proxyUrl` instead of `apiKey` so your Deepgram key never reaches the browser. The SDK converts `http(s)` to `ws(s)` automatically.
- **Install the peer dependency.** DeepgramFlux dynamically imports `@deepgram/sdk` v5 at initialization. If the package is missing, you get a clear error with install instructions.
- **Set `eagerEotThreshold` to enable preflight.** Without this option, DeepgramFlux will not emit `EagerEndOfTurn` events, and the eager LLM pipeline will have no preflight signals to work with.
- **Connection timeout.** The WebSocket connection defaults to a 10-second timeout. Adjust with `timeout` in the config if your network is slow.
- **Keep-alive.** Use `sendKeepAlive()` to prevent the V2 WebSocket from timing out during long pauses.
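The `http(s)` to `ws(s)` conversion mentioned in the first tip amounts to a scheme swap after resolving the proxy URL. A sketch of the idea, where `toWebSocketUrl` is a hypothetical helper, not the SDK's exported API:

```typescript
// Hypothetical sketch of the proxyUrl scheme conversion described above:
// http:// becomes ws://, https:// becomes wss://. Relative proxy URLs like
// '/api/proxy/deepgram' are resolved against the page origin first.
function toWebSocketUrl(proxyUrl: string, origin = 'https://example.com'): string {
  const absolute = new URL(proxyUrl, origin).toString();
  return absolute.replace(/^http/, 'ws');
}
```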
## Related resources
- Eager pipeline example — preflight signals with speculative LLM
- Deepgram pipeline example — full Deepgram STT + TTS pipeline
- Proxy server example — secure your API key server-side
- Pipeline architecture — how the eager pipeline works
- API reference: DeepgramFlux
- API reference: DeepgramFluxConfig
- Providers reference