
Groq

Use Groq's ultra-fast LPU inference as the LLM provider in a CompositeVoice pipeline.

Use GroqLLM when you need the lowest possible LLM latency. Groq’s custom LPU hardware delivers token generation speeds that often exceed 500 tokens per second.

Prerequisites

  • A Groq API key or a CompositeVoice proxy server
  • Install the peer dependency:
npm install openai

Groq’s API is OpenAI-compatible, so the openai package handles all communication.
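To see what that compatibility means in practice, here is a minimal server-side sketch of calling Groq directly with the official openai client by overriding baseURL. GroqLLM wires this up for you internally; the `ask` helper below is illustrative, not part of the library.

```typescript
import OpenAI from 'openai';

// Groq exposes an OpenAI-compatible endpoint, so the standard client
// works unmodified once baseURL points at Groq.
const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: 'https://api.groq.com/openai/v1',
});

// Illustrative helper: send one user message and return the reply text.
async function ask(prompt: string): Promise<string> {
  const completion = await groq.chat.completions.create({
    model: 'llama-3.3-70b-versatile',
    messages: [{ role: 'user', content: prompt }],
  });
  return completion.choices[0]?.message?.content ?? '';
}
```

In a browser you would use proxyUrl instead, so the API key never ships to the client.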

Basic setup

import { CompositeVoice, GroqLLM, NativeSTT, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({ language: 'en-US' }),
  llm: new GroqLLM({
    proxyUrl: '/api/proxy/groq',
    model: 'llama-3.3-70b-versatile',
    systemPrompt: 'You are a concise voice assistant. Keep answers under two sentences.',
  }),
  tts: new NativeTTS(),
});

await agent.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | 'llama-3.3-70b-versatile' | Model identifier. See model variants below. |
| systemPrompt | string | | System-level instructions for the assistant. |
| temperature | number | | Randomness (0 = deterministic, 2 = creative). |
| maxTokens | number | | Maximum tokens per response. |
| topP | number | | Nucleus sampling threshold (0 to 1). |
| stream | boolean | true | Stream tokens incrementally. |
| proxyUrl | string | | CompositeVoice proxy endpoint. Recommended for browsers. |
| groqApiKey | string | | Groq API key. Convenience alias for apiKey. |
| apiKey | string | | Direct API key. groqApiKey takes precedence if both are set. |
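The auth-related options interact, so here is a small sketch of the precedence rules as described above. The resolveGroqAuth function is hypothetical (the real resolution lives inside GroqLLM), and the assumption that proxyUrl wins over a direct key follows from the browser recommendation, not from documented behaviour.

```typescript
interface GroqAuthOptions {
  proxyUrl?: string;
  groqApiKey?: string;
  apiKey?: string;
}

type GroqAuth =
  | { mode: 'proxy'; url: string }
  | { mode: 'direct'; key: string };

// Hypothetical helper illustrating the documented precedence:
// proxyUrl first (keeps keys off the browser), then groqApiKey over apiKey.
function resolveGroqAuth(opts: GroqAuthOptions): GroqAuth {
  if (opts.proxyUrl) return { mode: 'proxy', url: opts.proxyUrl };
  const key = opts.groqApiKey ?? opts.apiKey;
  if (!key) throw new Error('GroqLLM needs proxyUrl, groqApiKey, or apiKey');
  return { mode: 'direct', key };
}
```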

Model variants

| Model | Parameters | Notes |
| --- | --- | --- |
| llama-3.3-70b-versatile | 70B | Default. Strong general-purpose model. |
| mixtral-8x7b-32768 | 8x7B MoE | 32k context window. Good for longer conversations. |
| gemma2-9b-it | 9B | Smaller, faster model from Google. |
| llama-3.1-8b-instant | 8B | Fastest option. Good for simple tasks. |

Check the Groq console for the current model list.

Complete example

import {
  CompositeVoice,
  GroqLLM,
  DeepgramSTT,
  DeepgramTTS,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    options: { model: 'nova-3', smartFormat: true },
  }),
  llm: new GroqLLM({
    proxyUrl: '/api/proxy/groq',
    model: 'llama-3.3-70b-versatile',
    temperature: 0.7,
    maxTokens: 256,
    systemPrompt: 'You are a friendly voice assistant. Answer briefly.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  conversationHistory: { enabled: true, maxTurns: 10 },
  eagerLLM: { enabled: true },
});

await agent.start();

Tips

  • Use Groq model names exactly as listed. Groq model identifiers differ from the upstream model names (e.g., llama-3.3-70b-versatile, not meta-llama/Llama-3.3-70B).
  • Pair with eager LLM for minimum latency. Groq’s fast inference combined with DeepgramFlux preflight signals produces the lowest speech-to-first-token latency.
  • Groq uses the openai peer dependency. You do not need to install groq-sdk.
  • Rate limits apply. Free-tier Groq accounts have token-per-minute limits. Check the Groq docs for current limits.
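When a rate limit is hit, Groq's OpenAI-compatible API returns HTTP 429. A generic wrapper like the sketch below can retry with exponential backoff; withBackoff is an illustrative helper, not part of CompositeVoice.

```typescript
// Hypothetical retry helper: re-run fn after a 429 rate-limit error,
// doubling the delay each attempt (500ms, 1s, 2s, ...).
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit errors, and only up to maxRetries times.
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Wrap individual requests rather than the whole agent, so unrelated errors still surface immediately.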

© 2026 CompositeVoice. All rights reserved.
