DeepgramFlux

Low-latency real-time speech recognition with eager end-of-turn signals for the speculative LLM pipeline.

Use DeepgramFlux for the lowest-latency voice pipelines. It connects to Deepgram’s V2 (Flux) streaming API, which delivers turn-based transcription with eager end-of-turn signals — the key ingredient for the eager LLM pipeline.

Prerequisites

  • A Deepgram API key
  • The @deepgram/sdk (v5+) peer dependency installed:
npm install @deepgram/sdk@^5

For production, set up a proxy server so your API key stays server-side.
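To make the proxy idea concrete, here is a minimal sketch of the server-side half: map the browser-facing proxy path to the Deepgram upstream and attach the API key on the server. The helper name, path, and upstream URL are illustrative assumptions, not part of the CompositeVoice API.

```typescript
// Hypothetical server-side helper: the browser only ever sees the proxy
// path; the server dials Deepgram directly with the secret key.
interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
}

function buildDeepgramUpstream(proxyPath: string, apiKey: string): UpstreamRequest {
  if (!proxyPath.startsWith('/api/proxy/deepgram')) {
    throw new Error(`Unexpected proxy path: ${proxyPath}`);
  }
  // Default upstream path is an assumption for this sketch.
  const suffix = proxyPath.slice('/api/proxy/deepgram'.length) || '/v2/listen';
  return {
    url: `wss://api.deepgram.com${suffix}`,
    // The key lives only on the server; it never reaches the browser.
    headers: { Authorization: `Token ${apiKey}` },
  };
}
```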

Basic setup

import { CompositeVoice, DeepgramFlux, AnthropicLLM, DeepgramTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramFlux({
    proxyUrl: '/api/proxy/deepgram',
    options: {
      model: 'flux-general-en',
      eagerEotThreshold: 0.5,
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,
  },
});

await agent.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| proxyUrl | string | | URL of your CompositeVoice proxy endpoint (recommended) |
| apiKey | string | | Deepgram API key (development only) |
| language | string | 'en-US' | Language code |
| interimResults | boolean | true | Emit partial transcripts while the user speaks |
| options.model | string | 'flux-general-en' | Flux transcription model |
| options.encoding | string | | Audio encoding: 'linear16', 'linear32', 'mulaw', 'alaw', 'opus', 'ogg-opus' |
| options.sampleRate | number | | Audio sample rate in Hz (required when encoding is set) |
| options.eotThreshold | number | 0.7 | Confidence (0.5–0.9) required to confirm end-of-turn |
| options.eagerEotThreshold | number | | Confidence (0.3–0.9) to fire EagerEndOfTurn (enables eager mode) |
| options.eotTimeoutMs | number | 5000 | Max ms before forcing end-of-turn regardless of confidence |
| options.keyterms | string[] | | Specialized terminology to boost recognition |
| options.tag | string | | Label for usage reporting in the Deepgram console |
| options.mipOptOut | boolean | false | Opt out of the Deepgram Model Improvement Program |

See the API reference for the full list.
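The table's cross-field constraints can be expressed as a small validation sketch. This mirrors the documented rules (sampleRate required with encoding, the threshold ranges); it is illustrative, not the SDK's actual validation code.

```typescript
// Illustrative validation of the constraints listed in the options table.
interface FluxOptions {
  model?: string;
  encoding?: string;
  sampleRate?: number;
  eotThreshold?: number;
  eagerEotThreshold?: number;
}

function validateFluxOptions(opts: FluxOptions): string[] {
  const errors: string[] = [];
  // sampleRate is required whenever encoding is set.
  if (opts.encoding !== undefined && opts.sampleRate === undefined) {
    errors.push('sampleRate is required when encoding is set');
  }
  // Documented ranges: eotThreshold 0.5-0.9, eagerEotThreshold 0.3-0.9.
  if (opts.eotThreshold !== undefined && (opts.eotThreshold < 0.5 || opts.eotThreshold > 0.9)) {
    errors.push('eotThreshold must be between 0.5 and 0.9');
  }
  if (
    opts.eagerEotThreshold !== undefined &&
    (opts.eagerEotThreshold < 0.3 || opts.eagerEotThreshold > 0.9)
  ) {
    errors.push('eagerEotThreshold must be between 0.3 and 0.9');
  }
  return errors;
}
```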

How Flux differs from DeepgramSTT

DeepgramFlux uses Deepgram’s V2 API (listen.v2), which is fundamentally different from the V1 API used by DeepgramSTT:

| | DeepgramSTT (V1) | DeepgramFlux (V2) |
| --- | --- | --- |
| API | listen.live | listen.v2 |
| Models | Nova-3, Nova-2 | Flux (e.g., flux-general-en) |
| Transcription model | Event-streaming (Results events) | Turn-based (TurnInfo events) |
| Events | is_final, speech_final | StartOfTurn, Update, EagerEndOfTurn, TurnResumed, EndOfTurn |
| Preflight signals | No | Yes (EagerEndOfTurn with isPreflight: true) |
| Eager LLM pipeline | Not supported | Supported |
| Utterance buffering | SDK buffers is_final segments until speech_final | Turn lifecycle managed by Deepgram |
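The V1 buffering behavior in the last row can be sketched as follows: is_final segments accumulate until a speech_final result closes the utterance. The event shape is simplified for illustration and is not the actual Deepgram payload.

```typescript
// Sketch of V1-style utterance buffering (simplified event shape).
interface V1Result {
  transcript: string;
  isFinal: boolean;
  speechFinal: boolean;
}

function bufferUtterances(results: V1Result[]): string[] {
  const utterances: string[] = [];
  let segments: string[] = [];
  for (const r of results) {
    if (!r.isFinal) continue; // interim results are not buffered
    segments.push(r.transcript);
    if (r.speechFinal) {
      // speech_final closes the utterance; flush the buffer.
      utterances.push(segments.join(' ').trim());
      segments = [];
    }
  }
  return utterances;
}
```

With Flux, none of this bookkeeping exists client-side: Deepgram manages the turn lifecycle and emits one EndOfTurn per utterance.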

Use DeepgramFlux when: you want the lowest latency via the eager LLM pipeline, or you prefer the turn-based conversation model.

Use DeepgramSTT when: you need Nova-3’s broader language support, domain-specific models (medical, finance), or V1-specific features like diarization.

TurnInfo events

Flux delivers transcription through TurnInfo events that map to the CompositeVoice transcription model:

| V2 event | SDK result | Description |
| --- | --- | --- |
| StartOfTurn | isFinal: false | Speech detected; the turn has begun |
| Update | isFinal: false | Partial transcript update (like interim results) |
| EagerEndOfTurn | isPreflight: true | Early end-of-turn prediction; triggers the eager LLM |
| TurnResumed | isFinal: false | User resumed speaking after an eager end-of-turn |
| EndOfTurn | isFinal: true, speechFinal: true | Confirmed end of utterance; triggers the standard LLM |
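The mapping in the table can be written down directly. This is an illustrative sketch of the correspondence, not the SDK's internal code; the isFinal: false on EagerEndOfTurn is an assumption consistent with it being a preflight, non-final result.

```typescript
// Illustrative mapping from V2 TurnInfo event names to SDK result flags.
interface SttFlags {
  isFinal: boolean;
  speechFinal?: boolean;
  isPreflight?: boolean;
}

function mapTurnEvent(event: string): SttFlags {
  switch (event) {
    case 'StartOfTurn':
    case 'Update':
    case 'TurnResumed':
      return { isFinal: false };
    case 'EagerEndOfTurn':
      // Preflight: not final, but enough to start speculative LLM work.
      return { isFinal: false, isPreflight: true };
    case 'EndOfTurn':
      return { isFinal: true, speechFinal: true };
    default:
      throw new Error(`Unknown TurnInfo event: ${event}`);
  }
}
```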

Eager LLM pipeline

The killer feature of DeepgramFlux is the EagerEndOfTurn signal. When Deepgram predicts that the speaker is about to stop talking, it fires this event early — before the final EndOfTurn confirmation. The SDK uses it to start LLM generation speculatively.
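The speculative flow can be sketched as a tiny state machine: start generating on EagerEndOfTurn, discard the draft if the user keeps talking (TurnResumed), and commit on EndOfTurn. This is illustrative only; the real pipeline also compares transcripts and cancels in-flight requests.

```typescript
// Minimal sketch of the eager pipeline's turn handling (illustrative).
type EagerState = 'idle' | 'speculating' | 'committed';

function nextEagerState(state: EagerState, event: string): EagerState {
  if (state === 'idle' && event === 'EagerEndOfTurn') return 'speculating';
  // The user resumed speaking: throw the speculative draft away.
  if (state === 'speculating' && event === 'TurnResumed') return 'idle';
  // Confirmed end of turn: commit (keep or regenerate based on similarity).
  if (event === 'EndOfTurn') return 'committed';
  return state;
}
```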

Configure the threshold to balance speed vs. accuracy:

const stt = new DeepgramFlux({
  proxyUrl: '/api/proxy/deepgram',
  options: {
    model: 'flux-general-en',
    eagerEotThreshold: 0.5,  // lower = faster but more false positives
    eotThreshold: 0.7,       // higher = more certain before confirming end-of-turn
  },
});

Enable the eager pipeline in CompositeVoice:

const agent = new CompositeVoice({
  stt,
  llm: new AnthropicLLM({ proxyUrl: '/api/proxy/anthropic' }),
  tts: new DeepgramTTS({ proxyUrl: '/api/proxy/deepgram' }),
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,  // accept if >=80% word overlap
  },
});

The similarityThreshold controls how different the final text can be from the preflight text before the speculative response is cancelled. A value of 0.8 means that if 80%+ of the words match (in order), the response is kept. See textSimilarity for details on how similarity is computed.
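One plausible in-order word-overlap measure looks like this. It is a sketch consistent with the behavior described above; the SDK's actual textSimilarity implementation may differ.

```typescript
// Word overlap that respects order, via longest common subsequence (LCS).
// Returns a ratio in [0, 1]; compare against similarityThreshold.
function wordOverlap(preflight: string, final: string): number {
  const a = preflight.toLowerCase().split(/\s+/).filter(Boolean);
  const b = final.toLowerCase().split(/\s+/).filter(Boolean);
  if (a.length === 0 && b.length === 0) return 1;
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] =
        a[i - 1] === b[j - 1]
          ? dp[i - 1][j - 1] + 1
          : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  // Normalize by the longer transcript so insertions also lower the score.
  return dp[a.length][b.length] / Math.max(a.length, b.length);
}
```

For example, if the preflight text was "book a table for four" and the final text is "book a table for four people", five of six words match in order, so the speculative response survives a 0.8 threshold.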

Complete example

import { CompositeVoice, DeepgramFlux, AnthropicLLM, DeepgramTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramFlux({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    options: {
      model: 'flux-general-en',
      eagerEotThreshold: 0.5,
      eotThreshold: 0.7,
      eotTimeoutMs: 5000,
      keyterms: ['CompositeVoice'],
    },
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  eagerLLM: {
    enabled: true,
    cancelOnTextChange: true,
    similarityThreshold: 0.8,
  },
  conversationHistory: { enabled: true, maxTurns: 10 },
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:preflight', (event) => {
  console.log('Eager end-of-turn:', event.text);
});

agent.on('transcription:speechFinal', (event) => {
  console.log('Confirmed:', event.text);
});

await agent.start();

Tips and gotchas

  • Always use a proxy in production. Pass proxyUrl instead of apiKey so your Deepgram key never reaches the browser. The SDK converts http(s) to ws(s) automatically.
  • Install the peer dependency. DeepgramFlux dynamically imports @deepgram/sdk V5 at initialization. If the package is missing, you get a clear error with install instructions.
  • Set eagerEotThreshold to enable preflight. Without this option, DeepgramFlux will not emit EagerEndOfTurn events, and the eager LLM pipeline will have no preflight signals to work with.
  • Connection timeout. The WebSocket connection defaults to a 10-second timeout. Adjust with timeout in the config if your network is slow.
  • Keep-alive. Use sendKeepAlive() to prevent the V2 WebSocket from timing out during long pauses.
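The http(s) to ws(s) conversion mentioned in the first tip amounts to a protocol swap. A minimal sketch, assuming a plain string rewrite (the SDK's own implementation may differ in details):

```typescript
// Rewrite an http(s) proxy URL to its WebSocket equivalent.
function toWebSocketUrl(url: string): string {
  return url.replace(/^http(s?):/, 'ws$1:');
}
```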

© 2026 CompositeVoice. All rights reserved.
