Conversation History

Multi-turn conversation memory — how the SDK maintains context across exchanges.

Overview

CompositeVoice maintains a conversation history that gives the LLM context from previous exchanges. A user utterance and the assistant's response together form a turn. The SDK sends the full history with each LLM request via generateFromMessages().

Configuration

const voice = new CompositeVoice({
  providers: [stt, llm, tts],
  conversationHistory: {
    enabled: true,   // default: false
    maxTurns: 10,    // default: 0 (unlimited)
  },
});
Option                  Type     Default    Description
enabled                 boolean  false      Accumulate turns and send them as context
maxTurns                number   0          Maximum turns to retain; 0 means unlimited
maxTokens               number   undefined  Approximate token budget for history (uses a ceil(text.length / 4) heuristic). When both maxTurns and maxTokens are set, the more restrictive limit wins
preserveSystemMessages  boolean  true       When true, system messages are never removed by turn-based or token-based trimming
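
For reference, here is a configuration sketch exercising all four options together; the values are illustrative, not recommendations:

```typescript
// Illustrative conversationHistory options object; values are examples only.
const historyConfig = {
  enabled: true,                // accumulate turns and send them as context
  maxTurns: 10,                 // keep at most 10 turns (20 messages)
  maxTokens: 2000,              // ~2000-token budget via ceil(text.length / 4)
  preserveSystemMessages: true, // never trim system messages
};
```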

How turns work

Each turn consists of one user message (from STT) and one assistant message (from LLM):

Turn 1: { role: 'user', content: 'What is TypeScript?' }
        { role: 'assistant', content: 'TypeScript is a typed superset of JavaScript...' }

Turn 2: { role: 'user', content: 'How does it compare to Flow?' }
        { role: 'assistant', content: 'Both add static types, but TypeScript...' }

Internally the SDK stores turns as a flat array of LLMMessage objects. One turn equals two messages (user + assistant). When maxTurns is exceeded, the oldest turn (both messages) is dropped.
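
The turn-based trim described above can be sketched as a pure function. This is a hypothetical helper for illustration; the SDK's internal name and exact implementation may differ:

```typescript
interface LLMMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keep at most maxTurns turns, where one turn is two messages
// (user + assistant). Oldest turns are dropped first; 0 means unlimited.
function trimToMaxTurns(history: LLMMessage[], maxTurns: number): LLMMessage[] {
  if (maxTurns <= 0) return history;
  const maxMessages = maxTurns * 2;
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}
```

Slicing from the end keeps message pairs intact, so the history never starts mid-turn with an orphaned assistant message.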

System prompts

Pass a system prompt through the LLM provider’s configuration. The system prompt persists across all turns and is not part of the conversation history array:

const llm = new AnthropicLLM({
  proxyUrl: '/api/proxy/anthropic',
  model: 'claude-haiku-4-5',
  systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
});

When conversation history is enabled, the SDK calls generateFromMessages() with the accumulated messages. The LLM provider prepends the system prompt automatically. The message order sent to the model is:

[system prompt] + [conversation history] + [latest user message]
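
That ordering can be made concrete with a small sketch. The function name is illustrative; the SDK performs this assembly inside the LLM provider:

```typescript
interface LLMMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Assemble the request: system prompt first, then accumulated history,
// then the latest user utterance.
function buildRequestMessages(
  systemPrompt: string,
  history: LLMMessage[],
  latestUserText: string,
): LLMMessage[] {
  return [
    { role: 'system', content: systemPrompt },
    ...history,
    { role: 'user', content: latestUserText },
  ];
}
```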

Message format

The SDK uses the LLMMessage interface:

interface LLMMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

The conversation history array contains only user and assistant messages. The LLM provider handles system messages separately.

Reading history

const history = voice.getHistory();
// Returns a copy — modifications do not affect the internal array
console.log(`${history.length} messages in history`);
for (const msg of history) {
  console.log(`[${msg.role}]: ${msg.content}`);
}

Clearing history

voice.clearHistory();  // removes all stored messages

Use this when switching topics or resetting a conversation without disposing the agent.

Disabling history

const voice = new CompositeVoice({
  providers: [stt, llm, tts],
  conversationHistory: {
    enabled: false,  // each utterance is independent (default)
  },
});

Without history, the SDK calls generate() (single prompt) instead of generateFromMessages(). Each user utterance is treated as a standalone prompt. The LLM has no memory of previous exchanges.
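
The dispatch between the two code paths can be sketched as follows. chooseCall is a hypothetical helper used here only to make the branching concrete:

```typescript
interface LLMMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

type LLMCall =
  | { method: 'generate'; prompt: string }
  | { method: 'generateFromMessages'; messages: LLMMessage[] };

// With history disabled the utterance is a standalone prompt; with history
// enabled the accumulated messages plus the new utterance are sent together.
function chooseCall(
  historyEnabled: boolean,
  history: LLMMessage[],
  userText: string,
): LLMCall {
  if (!historyEnabled) {
    return { method: 'generate', prompt: userText };
  }
  return {
    method: 'generateFromMessages',
    messages: [...history, { role: 'user', content: userText }],
  };
}
```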

Token management

Conversation history grows with each turn. Long conversations may approach the LLM’s context window limit. Set maxTurns to cap memory usage:

maxTurns        Messages retained   Use case
5               10                  Quick Q&A interactions
10              20                  Moderate conversations
20+             40+                 Extended sessions (watch token costs)
0 (unlimited)   all                 You manage context yourself

The SDK trims history before each LLM request. When the message count exceeds maxTurns * 2, it keeps only the most recent maxTurns * 2 messages, dropping whole turns from the front.
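
The maxTokens budget from the configuration table can be enforced the same way, dropping whole turns from the front until the estimate fits. This sketch uses the documented ceil(text.length / 4) heuristic; the function names are illustrative:

```typescript
interface LLMMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Documented heuristic: roughly one token per four characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop whole turns (oldest first) until the estimated total fits the budget.
function trimToTokenBudget(history: LLMMessage[], maxTokens: number): LLMMessage[] {
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (total > maxTokens && trimmed.length >= 2) {
    const [user, assistant] = trimmed.splice(0, 2); // remove the oldest turn
    total -= estimateTokens(user.content) + estimateTokens(assistant.content);
  }
  return trimmed;
}
```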
