Google Gemini

Use Google Gemini models as the LLM provider in a CompositeVoice pipeline.

Use GeminiLLM when you want Google’s Gemini models with their strong multimodal capabilities and competitive performance.

Prerequisites

npm install openai

Google exposes an OpenAI-compatible endpoint for Gemini, so the openai package handles all communication.
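Concretely, that means each turn becomes an ordinary OpenAI-style chat-completions request against Google's endpoint. The sketch below shows roughly what that payload looks like; `buildChatRequest` is a hypothetical helper for illustration, not part of the `@lukeocodes/composite-voice` API, and the real package may shape the request differently.

```typescript
// Sketch of the OpenAI-style chat.completions payload sent to Gemini.
// `buildChatRequest` is illustrative only.
const GEMINI_OPENAI_BASE = 'https://generativelanguage.googleapis.com/v1beta/openai';

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildChatRequest(userText: string, systemPrompt?: string) {
  const messages: ChatMessage[] = [];
  if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });
  messages.push({ role: 'user', content: userText });
  return {
    url: `${GEMINI_OPENAI_BASE}/chat/completions`,
    body: {
      model: 'gemini-2.0-flash',
      messages,
      stream: true, // tokens arrive incrementally, matching the streaming default
    },
  };
}

const req = buildChatRequest('Hello!', 'You are a concise voice assistant.');
```

Because the wire format is the standard chat-completions shape, the `openai` package can talk to it without any Gemini-specific SDK.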

Basic setup

import { CompositeVoice, GeminiLLM, NativeSTT, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({ language: 'en-US' }),
  llm: new GeminiLLM({
    proxyUrl: '/api/proxy/gemini',
    model: 'gemini-2.0-flash',
    systemPrompt: 'You are a concise voice assistant. Keep answers under two sentences.',
  }),
  tts: new NativeTTS(),
});

await agent.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | `'gemini-2.0-flash'` | Model identifier. See model variants below. |
| `systemPrompt` | string | — | System-level instructions for the assistant. |
| `temperature` | number | — | Randomness (0 = deterministic, 2 = creative). |
| `maxTokens` | number | — | Maximum tokens per response. |
| `topP` | number | — | Nucleus sampling threshold (0–1). |
| `stream` | boolean | `true` | Stream tokens incrementally. |
| `proxyUrl` | string | — | CompositeVoice proxy endpoint. Recommended for browsers. |
| `geminiApiKey` | string | — | Gemini API key. Convenience alias for `apiKey`. |
| `apiKey` | string | — | Direct API key. `geminiApiKey` takes precedence if both are set. |
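Since three credential-related options exist, one plausible resolution order is sketched below. The documented part is that `geminiApiKey` wins over `apiKey`; checking `proxyUrl` first is an assumption (it matches the browser recommendation above), and `resolveAuth` is an invented name, not a CompositeVoice export.

```typescript
// Illustrative only: which credential a GeminiLLM instance might use.
// Documented rule: geminiApiKey takes precedence over apiKey.
// Assumption: a configured proxyUrl is preferred over any direct key.
interface GeminiAuthOptions {
  proxyUrl?: string;
  geminiApiKey?: string;
  apiKey?: string;
}

function resolveAuth(opts: GeminiAuthOptions): { mode: 'proxy' | 'key'; value: string } {
  // With a proxy, the browser never handles a key at all.
  if (opts.proxyUrl) return { mode: 'proxy', value: opts.proxyUrl };
  const key = opts.geminiApiKey ?? opts.apiKey;
  if (!key) throw new Error('GeminiLLM needs proxyUrl, geminiApiKey, or apiKey');
  return { mode: 'key', value: key };
}
```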

Model variants

| Model | Speed | Notes |
| --- | --- | --- |
| `gemini-2.0-flash` | Fast | Default. Best for low-latency voice applications. |
| `gemini-1.5-flash` | Fast | Previous-generation flash model. |
| `gemini-1.5-pro` | Slower | Larger context window, higher capability. |

Complete example

import {
  CompositeVoice,
  GeminiLLM,
  DeepgramSTT,
  DeepgramTTS,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    options: { model: 'nova-3', smartFormat: true },
  }),
  llm: new GeminiLLM({
    proxyUrl: '/api/proxy/gemini',
    model: 'gemini-2.0-flash',
    temperature: 0.7,
    maxTokens: 256,
    systemPrompt: 'You are a friendly voice assistant. Answer briefly.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  conversationHistory: { enabled: true, maxTurns: 10 },
});

await agent.start();

Tips

  • Gemini uses Google’s OpenAI-compatible endpoint. The base URL defaults to https://generativelanguage.googleapis.com/v1beta/openai. You do not need to set this manually.
  • gemini-2.0-flash is ideal for voice. It delivers fast inference with good quality for conversational tasks.
  • Gemini uses the openai peer dependency. You do not need to install a Gemini-specific SDK.
  • Google AI Studio keys have a free tier that works well for development and testing. For production, use Vertex AI credentials through a proxy.
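Since the tips recommend keeping credentials behind a proxy, the server side can be as small as attaching the secret key and forwarding the browser's request upstream. A minimal framework-free sketch follows; the `GEMINI_API_KEY` env var and the `forwardToGemini` helper are assumptions for illustration, not part of CompositeVoice.

```typescript
// Sketch of the server-side proxy step: attach the key kept on the server
// and forward the browser's chat request to Google's OpenAI-compatible endpoint.
const UPSTREAM = 'https://generativelanguage.googleapis.com/v1beta/openai/chat/completions';

interface ForwardedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function forwardToGemini(body: unknown, apiKey: string): ForwardedRequest {
  return {
    url: UPSTREAM,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // The key lives only on the server; the browser only sees /api/proxy/gemini.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// e.g. inside a request handler:
// const { url, init } = forwardToGemini(await req.json(), process.env.GEMINI_API_KEY!);
// return fetch(url, init);
```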

© 2026 CompositeVoice. All rights reserved.
