Google Gemini
Use Google Gemini models as the LLM provider in a CompositeVoice pipeline.
Use GeminiLLM when you want Google’s Gemini models with their strong multimodal capabilities and competitive performance.
Prerequisites
- A Google AI Studio API key or a CompositeVoice proxy server
- Install the peer dependency:
npm install openai
Google exposes an OpenAI-compatible endpoint for Gemini, so the openai package handles all communication.
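To make "OpenAI-compatible" concrete, the sketch below builds the kind of chat-completions request that would be sent to Google's endpoint. It is an illustration, not GeminiLLM's actual implementation: `buildChatRequest`, `ChatOptions`, and the camelCase-to-snake_case mapping (`maxTokens` to `max_tokens`, `topP` to `top_p`) are assumptions modelled on the standard OpenAI chat-completions request fields.

```typescript
// Base URL of Google's OpenAI-compatible endpoint, as quoted in this guide.
const GEMINI_OPENAI_BASE_URL = 'https://generativelanguage.googleapis.com/v1beta/openai';

// Hypothetical option shape mirroring the GeminiLLM options in this guide.
interface ChatOptions {
  model: string;
  systemPrompt?: string;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  stream?: boolean;
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  url: string;
  body: Record<string, unknown>;
}

// Hypothetical helper: turns options plus one user utterance into an
// OpenAI-style chat-completions URL and JSON body.
function buildChatRequest(options: ChatOptions, userText: string): ChatRequest {
  const messages: ChatMessage[] = [];
  if (options.systemPrompt) {
    messages.push({ role: 'system', content: options.systemPrompt });
  }
  messages.push({ role: 'user', content: userText });

  const body: Record<string, unknown> = {
    model: options.model,
    messages,
    stream: options.stream ?? true, // the guide lists streaming as on by default
  };
  // Only include sampling fields the caller actually set.
  if (options.temperature !== undefined) body['temperature'] = options.temperature;
  if (options.maxTokens !== undefined) body['max_tokens'] = options.maxTokens;
  if (options.topP !== undefined) body['top_p'] = options.topP;

  return { url: `${GEMINI_OPENAI_BASE_URL}/chat/completions`, body };
}
```

In practice the openai package assembles and sends this request for you; the sketch only shows why no Gemini-specific SDK is involved.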
Basic setup
import { CompositeVoice, GeminiLLM, NativeSTT, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({ language: 'en-US' }),
  llm: new GeminiLLM({
    proxyUrl: '/api/proxy/gemini',
    model: 'gemini-2.0-flash',
    systemPrompt: 'You are a concise voice assistant. Keep answers under two sentences.',
  }),
  tts: new NativeTTS(),
});

await agent.start();
Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| model | string | 'gemini-2.0-flash' | Model identifier. See model variants below. |
| systemPrompt | string | none | System-level instructions for the assistant. |
| temperature | number | none | Randomness (0 = deterministic, 2 = creative). |
| maxTokens | number | none | Maximum tokens per response. |
| topP | number | none | Nucleus sampling threshold (0 to 1). |
| stream | boolean | true | Stream tokens incrementally. |
| proxyUrl | string | none | CompositeVoice proxy endpoint. Recommended for browsers. |
| geminiApiKey | string | none | Gemini API key. Convenience alias for apiKey. |
| apiKey | string | none | Direct API key. geminiApiKey takes precedence if both are set. |
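The key precedence and numeric ranges in the table above can be sketched as two small helpers. Both `resolveApiKey` and `findRangeErrors` are hypothetical names used for illustration; the guide does not say whether GeminiLLM validates ranges itself, so treat the second helper as an optional pre-flight check under that assumption.

```typescript
// Hypothetical subsets of the GeminiLLM options listed in the table.
interface KeyOptions {
  geminiApiKey?: string;
  apiKey?: string;
}

interface SamplingOptions {
  temperature?: number;
  topP?: number;
}

// Documented rule: geminiApiKey takes precedence if both keys are set.
// Only undefined/null counts as "not set" here; an empty string is returned as-is.
function resolveApiKey(options: KeyOptions): string | undefined {
  return options.geminiApiKey ?? options.apiKey;
}

// Checks values against the ranges stated in the table:
// temperature in [0, 2], topP in [0, 1]. Returns a list of problems (empty if none).
function findRangeErrors(options: SamplingOptions): string[] {
  const errors: string[] = [];
  const { temperature, topP } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    errors.push('temperature must be between 0 and 2');
  }
  if (topP !== undefined && (topP < 0 || topP > 1)) {
    errors.push('topP must be between 0 and 1');
  }
  return errors;
}
```

Running a check like this before constructing the provider surfaces a mistyped value early, rather than as an error from the upstream API in the middle of a conversation.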
Model variants
| Model | Speed | Notes |
|---|---|---|
| gemini-2.0-flash | Fast | Default. Best for low-latency voice applications. |
| gemini-1.5-flash | Fast | Previous generation flash model. |
| gemini-1.5-pro | Slower | Larger context, higher capability. |
Complete example
import {
  CompositeVoice,
  GeminiLLM,
  DeepgramSTT,
  DeepgramTTS,
} from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new DeepgramSTT({
    proxyUrl: '/api/proxy/deepgram',
    language: 'en',
    options: { model: 'nova-3', smartFormat: true },
  }),
  llm: new GeminiLLM({
    proxyUrl: '/api/proxy/gemini',
    model: 'gemini-2.0-flash',
    temperature: 0.7,
    maxTokens: 256,
    systemPrompt: 'You are a friendly voice assistant. Answer briefly.',
  }),
  tts: new DeepgramTTS({
    proxyUrl: '/api/proxy/deepgram',
    voice: 'aura-2-thalia-en',
  }),
  conversationHistory: { enabled: true, maxTurns: 10 },
});

await agent.start();
Tips
- Gemini uses Google’s OpenAI-compatible endpoint. The base URL defaults to https://generativelanguage.googleapis.com/v1beta/openai. You do not need to set this manually.
- gemini-2.0-flash is ideal for voice. It delivers fast inference with good quality for conversational tasks.
- Gemini uses the openai peer dependency. You do not need to install a Gemini-specific SDK.
- Google AI Studio keys are free-tier. They work for development and testing. For production, use Vertex AI credentials through a proxy.
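To illustrate why a proxy keeps the key out of the browser, here is a minimal sketch of the rewriting step a server-side proxy might perform: strip the proxy prefix, point the request at Google's endpoint, and attach the key as a bearer token. This is not the CompositeVoice proxy's implementation; `toUpstream`, `UpstreamTarget`, and the prefix handling are assumptions, and the sketch covers an AI Studio key only, not Vertex AI credentials.

```typescript
// Default base URL quoted in the tips above.
const UPSTREAM_BASE_URL = 'https://generativelanguage.googleapis.com/v1beta/openai';

interface UpstreamTarget {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: maps a browser-facing proxy path (for example
// '/api/proxy/gemini/chat/completions') to the upstream URL and headers.
// The API key is supplied on the server and never reaches the client.
function toUpstream(requestPath: string, proxyPrefix: string, apiKey: string): UpstreamTarget {
  const rest = requestPath.startsWith(proxyPrefix)
    ? requestPath.slice(proxyPrefix.length)
    : null;
  // Reject paths outside the prefix, including look-alikes such as '/api/proxy/geminiX'.
  if (rest === null || (rest !== '' && !rest.startsWith('/'))) {
    throw new Error(`Path ${requestPath} is not under ${proxyPrefix}`);
  }
  return {
    url: `${UPSTREAM_BASE_URL}${rest}`,
    headers: {
      'Content-Type': 'application/json',
      // OpenAI-compatible endpoints expect the key as a bearer token.
      Authorization: `Bearer ${apiKey}`,
    },
  };
}
```

A server route would call this with a key read from its own environment, then forward the request body unchanged and stream the response back to the browser.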
Related
- Providers reference — all LLM providers at a glance
- API reference — full class documentation
- OpenAI Compatible guide — connect custom OpenAI-compatible endpoints