Google Gemini

Use Google Gemini models as the LLM provider in a CompositeVoice pipeline.

Use GeminiLLM when you want Google’s Gemini models with their strong multimodal capabilities and competitive performance.

Prerequisites

npm install openai

Google exposes an OpenAI-compatible endpoint for Gemini, so the openai package handles all communication.
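Concretely, that means each turn becomes an ordinary OpenAI-style chat-completions request against Google's endpoint. The sketch below shows roughly what that payload looks like; `buildChatRequest` is a hypothetical helper for illustration, not part of the `@lukeocodes/composite-voice` API, and the real package may shape the request differently.

```typescript
// Sketch of the OpenAI-style chat.completions payload sent to Gemini.
// `buildChatRequest` is illustrative only.
const GEMINI_OPENAI_BASE = 'https://generativelanguage.googleapis.com/v1beta/openai';

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildChatRequest(userText: string, systemPrompt?: string) {
  const messages: ChatMessage[] = [];
  if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });
  messages.push({ role: 'user', content: userText });
  return {
    url: `${GEMINI_OPENAI_BASE}/chat/completions`,
    body: {
      model: 'gemini-2.0-flash',
      messages,
      stream: true, // tokens arrive incrementally, matching the streaming default
    },
  };
}

const req = buildChatRequest('Hello!', 'You are a concise voice assistant.');
```

Because the wire format is the standard chat-completions shape, the `openai` package can talk to it without any Gemini-specific SDK.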

Basic setup

import { CompositeVoice, GeminiLLM, NativeSTT, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({ language: 'en-US' }),
  llm: new GeminiLLM({
    proxyUrl: '/api/proxy/gemini',
    model: 'gemini-2.0-flash',
    systemPrompt: 'You are a concise voice assistant. Keep answers under two sentences.',
  }),
  tts: new NativeTTS(),
});

await agent.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | `'gemini-2.0-flash'` | Model identifier. See model variants below. |
| `systemPrompt` | string | — | System-level instructions for the assistant. |
| `temperature` | number | — | Randomness (0 = deterministic, 2 = creative). |
| `maxTokens` | number | — | Maximum tokens per response. |
| `topP` | number | — | Nucleus sampling threshold (0–1). |
| `stream` | boolean | `true` | Stream tokens incrementally. |
| `proxyUrl` | string | — | CompositeVoice proxy endpoint. Recommended for browsers. |
| `geminiApiKey` | string | — | Gemini API key. Convenience alias for `apiKey`. |
| `apiKey` | string | — | Direct API key. `geminiApiKey` takes precedence if both are set. |
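Since three credential-related options exist, one plausible resolution order is sketched below. The documented part is that `geminiApiKey` wins over `apiKey`; checking `proxyUrl` first is an assumption (it matches the browser recommendation above), and `resolveAuth` is an invented name, not a CompositeVoice export.

```typescript
// Illustrative only: which credential a GeminiLLM instance might use.
// Documented rule: geminiApiKey takes precedence over apiKey.
// Assumption: a configured proxyUrl is preferred over any direct key.
interface GeminiAuthOptions {
  proxyUrl?: string;
  geminiApiKey?: string;
  apiKey?: string;
}

function resolveAuth(opts: GeminiAuthOptions): { mode: 'proxy' | 'key'; value: string } {
  // With a proxy, the browser never handles a key at all.
  if (opts.proxyUrl) return { mode: 'proxy', value: opts.proxyUrl };
  const key = opts.geminiApiKey ?? opts.apiKey;
  if (!key) throw new Error('GeminiLLM needs proxyUrl, geminiApiKey, or apiKey');
  return { mode: 'key', value: key };
}
```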

Model variants

| Model | Speed | Notes |
| --- | --- | --- |
| `gemini-2.0-flash` | Fast | Default. Best for low-latency voice applications. |
| `gemini-1.5-flash` | Fast | Previous-generation flash model. |
| `gemini-1.5-pro` | Slower | Larger context window, higher capability. |

Complete example

import {
  CompositeVoice,
  GeminiLLM,
  DeepgramSTT,
  DeepgramTTS,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    options: { model: 'nova-3', smartFormat: true },
  }),
  llm: new GeminiLLM({
    proxyUrl: '/api/proxy/gemini',
    model: 'gemini-2.0-flash',
    temperature: 0.7,
    maxTokens: 256,
    systemPrompt: 'You are a friendly voice assistant. Answer briefly.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  conversationHistory: { enabled: true, maxTurns: 10 },
});

await agent.start();

Tips

  • Gemini uses Google’s OpenAI-compatible endpoint. The base URL defaults to https://generativelanguage.googleapis.com/v1beta/openai. You do not need to set this manually.
  • gemini-2.0-flash is ideal for voice. It delivers fast inference with good quality for conversational tasks.
  • Gemini uses the openai peer dependency. You do not need to install a Gemini-specific SDK.
  • Google AI Studio keys have a free tier that works well for development and testing. For production, use Vertex AI credentials through a proxy.
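Since the tips recommend keeping credentials behind a proxy, the server side can be as small as attaching the secret key and forwarding the browser's request upstream. A minimal framework-free sketch follows; the `GEMINI_API_KEY` env var and the `forwardToGemini` helper are assumptions for illustration, not part of CompositeVoice.

```typescript
// Sketch of the server-side proxy step: attach the key kept on the server
// and forward the browser's chat request to Google's OpenAI-compatible endpoint.
const UPSTREAM = 'https://generativelanguage.googleapis.com/v1beta/openai/chat/completions';

interface ForwardedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function forwardToGemini(body: unknown, apiKey: string): ForwardedRequest {
  return {
    url: UPSTREAM,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // The key lives only on the server; the browser only sees /api/proxy/gemini.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// e.g. inside a request handler:
// const { url, init } = forwardToGemini(await req.json(), process.env.GEMINI_API_KEY!);
// return fetch(url, init);
```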

© 2026 CompositeVoice. All rights reserved.
