CompositeVoice

Defined in: src/CompositeVoice.ts:226

The primary class of the CompositeVoice SDK, orchestrating a complete 5-role audio pipeline from input capture through speech recognition, language model processing, speech synthesis, to audio output.

Remarks

CompositeVoice resolves a flat array of providers into a typed ResolvedPipeline covering five roles (input, stt, llm, tts, output), then wires them together with AudioBufferQueue instances to prevent race conditions and an AudioHeaderCache for WebSocket reconnection scenarios. It manages:

5-role pipeline: Providers are resolved via resolveProviders into a typed pipeline. Multi-role providers (e.g., NativeSTT covering input+stt) use simplified paths without queues.
Race condition fix: An AudioBufferQueue between the input provider and STT buffers audio frames during the WebSocket handshake, then flushes them in order when startDraining() is called.
Provider lifecycle: Initialization, connection, and disposal of all providers, with deduplication for multi-role instances (using Set).
State machines: Four coordinated state machines (audio capture, audio playback, processing, and an orchestrating agent state machine) that derive the high-level agent state (idle, ready, listening, thinking, speaking, error).
Turn-taking: Configurable strategies (auto, conservative, aggressive, detect) control whether audio capture pauses during TTS playback to prevent echo. Pausing stops queue draining, pauses the input provider, and disconnects STT. Resuming restarts draining, resumes input, and reconnects STT, re-injecting the cached audio header.
Conversation history: Optional multi-turn memory with configurable maxTurns, sending accumulated context to the LLM.
Eager LLM pipeline: Speculative generation triggered by DeepgramFlux preflight signals, reducing speech-to-first-token latency. If speech_final arrives with different text, the speculative generation is cancelled and restarted.
Event emission: A rich set of typed events covering every stage of the pipeline, plus wildcard ('*') subscription support.
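The eager LLM behavior can be pictured as a small controller: start a generation when a preflight transcript arrives, then keep the speculative result only if the final transcript matches, otherwise abort it and restart. The sketch below assumes a generic `Generate` function that honors an `AbortSignal`; `EagerController`, `onPreflight`, and `onFinal` are hypothetical names for illustration, not part of the CompositeVoice API, and streaming, history, and TTS are ignored.

```typescript
// Hypothetical generator: returns the LLM reply for `text` and should stop early when `signal` aborts.
type Generate = (text: string, signal: AbortSignal) => Promise<string>;

interface Speculation {
  text: string;
  abort: AbortController;
  result: Promise<string>;
}

// Minimal sketch of eager (speculative) generation; not the SDK's actual implementation.
class EagerController {
  private pending: Speculation | null = null;

  constructor(private readonly generate: Generate) {}

  // Preflight signal: start generating before the final transcript arrives.
  onPreflight(text: string): void {
    this.cancel(this.pending);
    this.pending = this.start(text);
  }

  // Final transcript: reuse the speculative run if the text is unchanged, otherwise restart.
  onFinal(text: string): { reused: boolean; result: Promise<string> } {
    const pending = this.pending;
    this.pending = null;
    if (pending !== null && pending.text === text) {
      return { reused: true, result: pending.result };
    }
    this.cancel(pending);
    return { reused: false, result: this.start(text).result };
  }

  private start(text: string): Speculation {
    const abort = new AbortController();
    return { text, abort, result: this.generate(text, abort.signal) };
  }

  private cancel(spec: Speculation | null): void {
    if (spec === null) return;
    spec.abort.abort();
    // Swallow the rejection of the abandoned run so it is not reported as unhandled.
    spec.result.catch(() => undefined);
  }
}
```

Matching on exact text equality is the simplest reuse policy; a real implementation might normalize whitespace or punctuation before comparing.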

Examples

import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from 'composite-voice';

const agent = new CompositeVoice({
  providers: [
    new NativeSTT(),
    new AnthropicLLM({ model: 'claude-sonnet-4-20250514', systemPrompt: 'You are a helpful assistant.' }),
    new NativeTTS(),
  ],
});

// Subscribe to events before initializing
agent.on('agent.stateChange', ({ state }) => console.log('State:', state));
agent.on('transcription.final', ({ text }) => console.log('User said:', text));
agent.on('llm.chunk', ({ chunk }) => console.log('LLM chunk:', chunk));
agent.on('agent.error', ({ error, context }) => console.error(context, error));

await agent.initialize();
await agent.startListening();

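// The pipeline now runs automatically. With separate providers the flow is:
//   [input] -> InputQueue -> [stt] -> [llm] -> [tts] -> OutputQueue -> [output]
// Multi-role providers such as NativeSTT (input+stt) bypass the corresponding queue.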

// When finished:
await agent.dispose();

Using separate input, STT, LLM, TTS, and output providers with custom queue sizes:

import {
  CompositeVoice, MicrophoneInput, DeepgramSTT, AnthropicLLM,
  DeepgramTTS, BrowserAudioOutput,
} from 'composite-voice';

const agent = new CompositeVoice({
  providers: [
    new MicrophoneInput(),
    new DeepgramSTT({ apiKey: '...' }),
    new AnthropicLLM({ model: 'claude-sonnet-4-20250514' }),
    new DeepgramTTS({ apiKey: '...' }),
    new BrowserAudioOutput(),
  ],
  queue: {
    input: { maxSize: 2000 },
    output: { maxSize: 500 },
  },
});

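Enabling conversation history and the eager LLM pipeline:

import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from 'composite-voice';

const agent = new CompositeVoice({
  providers: [
    new NativeSTT(),
    new AnthropicLLM({ model: 'claude-sonnet-4-20250514' }),
    new NativeTTS(),
  ],
  conversationHistory: { enabled: true, maxTurns: 10 },
  // Eager generation is triggered by preflight signals, such as those from DeepgramFlux.
  eagerLLM: { enabled: true, cancelOnTextChange: true },
});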

await agent.initialize();
await agent.startListening();

// Check history at any time
console.log(agent.getHistory());

// Clear history to reset context
agent.clearHistory();

See

EventEmitter for the underlying event system.
CompositeVoiceConfig for all configuration options.
resolveProviders for the provider resolution algorithm.
AudioBufferQueue for the queue that fixes the race condition.
AudioHeaderCache for header caching on reconnection.

Constructors

Constructor

new CompositeVoice(config): CompositeVoice;

Defined in: src/CompositeVoice.ts:393

Creates a new CompositeVoice instance with the given provider configuration.

Parameters

Parameter	Type	Description
`config`	`CompositeVoiceConfig`	The SDK configuration containing a `providers` array with provider instances, plus optional queue, logging, turn-taking, conversation history, and eager LLM settings.

Returns

CompositeVoice

Remarks

The constructor resolves the providers array into a typed ResolvedPipeline via resolveProviders, creates input and output AudioBufferQueue instances, initializes the AudioHeaderCache, creates the four state machines, and wires up the agent state change listener. It does not initialize providers or start listening — call initialize() and then startListening() to begin the pipeline.

Throws

ConfigurationError Thrown if the required roles are not covered by the providers array, if duplicate roles are found, or if providers fail duck-type validation.
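The duplicate-role and missing-role checks can be sketched as a single pass that maps each role to the provider claiming it. This is a minimal illustration, not the SDK's resolveProviders: the `roles` property, the `RoleProvider` interface, and the locally defined `ConfigurationError` are assumptions, duck-type validation is not modeled, and the sketch assumes all five roles must be covered (the real resolver may relax this).

```typescript
type Role = "input" | "stt" | "llm" | "tts" | "output";
const ROLES: readonly Role[] = ["input", "stt", "llm", "tts", "output"];

// Hypothetical provider shape: each provider lists the roles it covers.
interface RoleProvider {
  name: string;
  roles: readonly Role[];
}

// Local stand-in for the SDK's ConfigurationError.
class ConfigurationError extends Error {}

// Minimal sketch of role resolution; the SDK's resolveProviders may differ.
function resolveRoles(providers: readonly RoleProvider[]): Record<Role, RoleProvider> {
  const byRole = new Map<Role, RoleProvider>();
  for (const provider of providers) {
    for (const role of provider.roles) {
      const existing = byRole.get(role);
      if (existing !== undefined) {
        throw new ConfigurationError(`Duplicate role "${role}": ${existing.name} and ${provider.name}`);
      }
      byRole.set(role, provider);
    }
  }
  // Every role must end up with exactly one provider.
  const missing = ROLES.filter((role) => !byRole.has(role));
  if (missing.length > 0) {
    throw new ConfigurationError(`Missing roles: ${missing.join(", ")}`);
  }
  return Object.fromEntries(byRole) as Record<Role, RoleProvider>;
}
```

A multi-role provider simply appears under several keys of the result, which is why later lifecycle steps need to deduplicate instances.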

Example

const agent = new CompositeVoice({
  providers: [
    new NativeSTT(),
    new AnthropicLLM({ model: 'claude-sonnet-4-20250514' }),
    new NativeTTS(),
  ],
  logging: { enabled: true, level: 'debug' },
});

Accessors

isInputMuted

Get Signature

get isInputMuted(): boolean;

Defined in: src/CompositeVoice.ts:2082

Whether audio input (microphone) is currently muted.

Returns

boolean

isOutputMuted

Get Signature

get isOutputMuted(): boolean;

Defined in: src/CompositeVoice.ts:2075

Whether audio output (TTS playback) is currently muted.

Returns

boolean

Methods

clearHistory()

clearHistory(): void;

Defined in: src/CompositeVoice.ts:2323

Clears all accumulated conversation history.

Returns

void

Remarks

After calling this method, the next LLM request will have no prior context (unless new turns accumulate). This is useful for resetting the conversation topic without disposing the entire agent.

Example

agent.clearHistory();
console.log(agent.getHistory().length); // 0

dispose()

dispose(): Promise<void>;

Defined in: src/CompositeVoice.ts:2396

Disposes of the SDK, releasing all resources including providers, queues, state machines, event listeners, and conversation history.

Returns

Promise<void>

Remarks

This method performs a full teardown in the following order:

Stops listening and speaking if the agent is in those states.
Aborts any in-flight eager LLM generation.
Clears conversation history.
Clears both input and output queues.
Disposes all pipeline providers concurrently, deduplicated so that multi-role providers (e.g., NativeSTT covering input+stt) are only disposed once.
Removes all event listeners from the internal EventEmitter.
Resets and disposes all four state machines.

After disposal, calling any operational method will throw. If the SDK is already disposed, this method logs a warning and returns immediately.

Throws

Throws the underlying error if any disposal step fails.

Example

// Graceful shutdown
await agent.dispose();
console.log(agent.isReady()); // false

getHistory()

getHistory(): LLMMessage[];

Defined in: src/CompositeVoice.ts:2305

Returns a shallow copy of the current conversation history.

Returns

LLMMessage[]

A new array of LLMMessage objects representing the conversation so far.

Remarks

The returned array contains LLMMessage objects with role ('user' or 'assistant') and content fields, in chronological order. If conversation history is not enabled in the configuration, or no turns have occurred yet, an empty array is returned.

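Because the copy is shallow, adding or removing elements does not affect the internal history, but the LLMMessage objects themselves are shared, so mutating a message's fields also changes the stored history.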
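A bounded history with shallow-copy reads can be sketched as follows. `BoundedHistory` and `Message` are hypothetical stand-ins for the SDK's internals and LLMMessage, and the sketch assumes one "turn" means one user message plus one assistant reply, so `maxTurns` keeps the latest `maxTurns * 2` messages.

```typescript
// Stand-in for LLMMessage (the real type may carry extra fields such as modality).
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Minimal sketch of bounded history; assumes one turn = one user message + one assistant reply.
class BoundedHistory {
  private messages: Message[] = [];

  constructor(private readonly maxTurns: number) {}

  addTurn(user: string, assistant: string): void {
    this.messages.push({ role: "user", content: user }, { role: "assistant", content: assistant });
    const maxMessages = this.maxTurns * 2;
    if (this.messages.length > maxMessages) {
      // Drop the oldest messages so only the latest maxTurns turns remain.
      this.messages = this.messages.slice(this.messages.length - maxMessages);
    }
  }

  // Shallow copy: a new array, but the Message objects are shared with the internal store.
  get(): Message[] {
    return this.messages.slice();
  }

  clear(): void {
    this.messages = [];
  }
}
```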

Example

const history = agent.getHistory();
console.log(`${history.length} messages in history`);
for (const msg of history) {
  console.log(`[${msg.role}]: ${msg.content}`);
}

getQueueStats()

getQueueStats(): {
  input: QueueStats;
  output: QueueStats;
};

Defined in: src/CompositeVoice.ts:2251

Returns statistics from both the input and output audio buffer queues.

Returns

{
  input: QueueStats;
  output: QueueStats;
}

An object with input and output QueueStats snapshots.

Name	Type	Defined in
`input`	`QueueStats`	src/CompositeVoice.ts:2251
`output`	`QueueStats`	src/CompositeVoice.ts:2251

Remarks

Provides observability into the pipeline’s buffering behavior. The input queue sits between the AudioInputProvider and the STT provider; the output queue sits between the TTS provider and the AudioOutputProvider.

Stats include current size, total enqueued/dequeued/dropped counts, and the age of the oldest buffered chunk. For multi-role providers where the queue is not actively used, stats will show zero activity.
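The statistics above can be tracked with a few counters and a timestamp per chunk. The sketch below is a hypothetical `StatsQueue`, not the SDK's AudioBufferQueue: it assumes a drop-oldest overflow policy, and only `size` and `totalDropped` mirror field names visible in the example; `totalEnqueued`, `totalDequeued`, and `oldestChunkAgeMs` are assumed names. The injectable clock makes the age calculation testable.

```typescript
interface SketchQueueStats {
  size: number;
  totalEnqueued: number;
  totalDequeued: number;
  totalDropped: number;
  oldestChunkAgeMs: number; // 0 when the queue is empty
}

// Minimal sketch of a bounded audio queue with statistics; assumes drop-oldest on overflow.
class StatsQueue {
  private items: { chunk: ArrayBuffer; enqueuedAt: number }[] = [];
  private enqueued = 0;
  private dequeued = 0;
  private dropped = 0;

  constructor(private readonly maxSize: number, private readonly now: () => number = Date.now) {}

  enqueue(chunk: ArrayBuffer): void {
    if (this.items.length >= this.maxSize) {
      this.items.shift(); // overflow: discard the oldest chunk
      this.dropped++;
    }
    this.items.push({ chunk, enqueuedAt: this.now() });
    this.enqueued++;
  }

  dequeue(): ArrayBuffer | undefined {
    const item = this.items.shift();
    if (item !== undefined) this.dequeued++;
    return item?.chunk;
  }

  getStats(): SketchQueueStats {
    const oldest = this.items[0];
    return {
      size: this.items.length,
      totalEnqueued: this.enqueued,
      totalDequeued: this.dequeued,
      totalDropped: this.dropped,
      oldestChunkAgeMs: oldest === undefined ? 0 : this.now() - oldest.enqueuedAt,
    };
  }
}
```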

Example

const stats = agent.getQueueStats();
console.log(`Input queue: ${stats.input.size} buffered, ${stats.input.totalDropped} dropped`);
console.log(`Output queue: ${stats.output.size} buffered`);

getState()

getState(): AgentState;

Defined in: src/CompositeVoice.ts:2226

Returns the current high-level agent state.

Returns

AgentState

The current AgentState.

Remarks

The agent state is derived by the AgentStateMachine from the three sub-state machines (capture, playback, processing). Possible values are: 'idle', 'ready', 'listening', 'thinking', 'speaking', and 'error'.
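To illustrate what "derived" means here, the function below maps three sub-machine snapshots to one agent state. Both the sub-state names in `SubStates` and the precedence order (errors first, then speaking, thinking, listening, ready) are assumptions chosen for illustration; the AgentStateMachine's actual rules are not documented in this section.

```typescript
type AgentStateSketch = "idle" | "ready" | "listening" | "thinking" | "speaking" | "error";

// Hypothetical sub-machine snapshots; the SDK's real sub-states may differ.
interface SubStates {
  initialized: boolean;
  capture: "stopped" | "capturing" | "error";
  playback: "idle" | "playing" | "error";
  processing: "idle" | "processing" | "error";
}

// One possible precedence rule (an assumption): errors win, then output, then work in progress, then input.
function deriveAgentState(s: SubStates): AgentStateSketch {
  if (s.capture === "error" || s.playback === "error" || s.processing === "error") return "error";
  if (!s.initialized) return "idle";
  if (s.playback === "playing") return "speaking";
  if (s.processing === "processing") return "thinking";
  if (s.capture === "capturing") return "listening";
  return "ready";
}
```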

Example

if (agent.getState() === 'listening') {
  console.log('Agent is currently listening');
}

initialize()

initialize(): Promise<void>;

Defined in: src/CompositeVoice.ts:491

Initializes the SDK by connecting the agent state machine to its sub-machines and initializing all pipeline providers concurrently.

Returns

Promise<void>

Remarks

Call this method once, before startListening or any other operational method. Calling it again logs a warning and returns immediately. On success it emits an 'agent.ready' event and transitions the agent state machine from idle to ready.

Multi-role providers are deduplicated using a Set so that a provider covering both input and stt (e.g., NativeSTT) is only initialized once. All unique providers are initialized concurrently via Promise.all.
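The deduplication step can be sketched in a few lines: collect the resolved pipeline's providers into a Set (which dedupes by object identity) and initialize the unique instances with Promise.all, which rejects as soon as any provider rejects. `Initializable` and `initializeUnique` are hypothetical names used only for this illustration.

```typescript
// Hypothetical provider shape for this sketch.
interface Initializable {
  name: string;
  initialize(): Promise<void>;
}

// Minimal sketch: initialize each distinct provider instance once, concurrently.
async function initializeUnique(pipeline: Record<string, Initializable>): Promise<string[]> {
  // A Set keeps one entry per object, so a provider filling two roles appears once.
  const unique = Array.from(new Set(Object.values(pipeline)));
  // Promise.all runs the initializations concurrently and rejects on the first failure.
  await Promise.all(unique.map((provider) => provider.initialize()));
  return unique.map((provider) => provider.name);
}
```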

Throws

Throws the underlying provider error if any provider’s initialize() method rejects.

Example

const agent = new CompositeVoice({ providers: [stt, llm, tts] });
agent.on('agent.ready', () => console.log('SDK is ready'));
await agent.initialize();

isReady()

isReady(): boolean;

Defined in: src/CompositeVoice.ts:2345

Checks whether the SDK has been successfully initialized.

Returns

boolean

true if the SDK is initialized and operational, false otherwise.

Remarks

Returns true after initialize() has completed successfully, and false before initialization or after dispose() has been called.

Example

if (!agent.isReady()) {
  await agent.initialize();
}

muteInput()

muteInput(): void;

Defined in: src/CompositeVoice.ts:1998

Mute audio input (microphone), pausing speech capture and STT processing.

Returns

void

Remarks

The agent remains initialized and can still receive text input via sendMessage. Call unmuteInput to resume voice capture.

Example

agent.muteInput();
// User can still type via sendMessage()
await agent.sendMessage('Hello');
await agent.unmuteInput(); // resume mic (unmuteInput returns a Promise)

muteOutput()

muteOutput(): void;

Defined in: src/CompositeVoice.ts:2057

Mute audio output (speaker), suppressing TTS playback.

Returns

void

Remarks

LLM responses still stream and emit llm.chunk / llm.complete events, but audio will not be played. This is useful when the user prefers to read responses as text.

See

unmuteOutput

off()

off<T>(event, listener): void;

Defined in: src/CompositeVoice.ts:2166

Removes a previously registered event listener.

Type Parameters

Type Parameter	Description
`T` extends `"transcription.start"` | `"transcription.interim"` | `"transcription.final"` | `"transcription.speechFinal"` | `"transcription.preflight"` | `"transcription.error"` | `"llm.start"` | `"llm.chunk"` | `"llm.complete"` | `"llm.error"` | `"tts.start"` | `"tts.audio"` | `"tts.metadata"` | `"tts.complete"` | `"tts.error"` | `"agent.ready"` | `"agent.stateChange"` | `"agent.error"` | `"audio.capture.start"` | `"audio.capture.stop"` | `"audio.capture.error"` | `"audio.playback.start"` | `"audio.playback.end"` | `"audio.playback.error"` | `"queue.overflow"` | `"queue.stats"`	The event type string, inferred from the `event` argument.

Parameters

Parameter	Type	Description
`event`	`"*"` | `T`	The event type the listener was registered for, or `'*'`.
`listener`	`T` extends `"*"` ? (`event`) => `void` : `EventListenerMap`[`T`]	The exact listener function reference that was passed to on.

Returns

void

Example

const handler = ({ state }: { state: AgentState }) => console.log(state);
agent.on('agent.stateChange', handler);

// Later, remove it manually
agent.off('agent.stateChange', handler);

on()

on<T>(event, listener): () => void;

Defined in: src/CompositeVoice.ts:2120

Registers an event listener for the specified event type.

Type Parameters

Type Parameter	Description
`T` extends `"transcription.start"` | `"transcription.interim"` | `"transcription.final"` | `"transcription.speechFinal"` | `"transcription.preflight"` | `"transcription.error"` | `"llm.start"` | `"llm.chunk"` | `"llm.complete"` | `"llm.error"` | `"tts.start"` | `"tts.audio"` | `"tts.metadata"` | `"tts.complete"` | `"tts.error"` | `"agent.ready"` | `"agent.stateChange"` | `"agent.error"` | `"audio.capture.start"` | `"audio.capture.stop"` | `"audio.capture.error"` | `"audio.playback.start"` | `"audio.playback.end"` | `"audio.playback.error"` | `"queue.overflow"` | `"queue.stats"`	The event type string, inferred from the `event` argument.

Parameters

Parameter	Type	Description
`event`	`"*"` | `T`	The event type to listen for (e.g., `'agent.stateChange'`, `'transcription.final'`), or `'*'` to listen for all events.
`listener`	`T` extends `"*"` ? (`event`) => `void` : `EventListenerMap`[`T`]	The callback function invoked when the event fires. The callback receives the typed event payload matching `T`.

Returns

A function that, when called, removes this listener.

(): void;

Returns

void

Remarks

Supports all typed SDK events as well as the wildcard '*' to receive every event. The returned function can be called to unsubscribe.
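The subscription model (typed listeners, a '*' wildcard, and an unsubscribe function returned from on()) can be sketched with a small emitter. `TinyEmitter` and its two-event `EventPayloads` map are hypothetical stand-ins for the SDK's EventEmitter and EventListenerMap, and the wildcard payload shape `{ type, ...payload }` is an assumption modeled on the `event.type` usage in the wildcard example for this method.

```typescript
// Hypothetical event map; the SDK's EventListenerMap covers many more events.
interface EventPayloads {
  "agent.stateChange": { state: string };
  "llm.chunk": { chunk: string };
}
type EventName = keyof EventPayloads;
// Wildcard listeners receive any payload tagged with its event type.
type AnyEvent = { [K in EventName]: { type: K } & EventPayloads[K] }[EventName];

// Minimal sketch of typed on()/off() with a '*' wildcard and an unsubscribe return value.
class TinyEmitter {
  private listeners = new Map<string, Set<(payload: any) => void>>();

  on(event: "*", listener: (event: AnyEvent) => void): () => void;
  on<K extends EventName>(event: K, listener: (payload: EventPayloads[K]) => void): () => void;
  on(event: string, listener: (payload: any) => void): () => void {
    let set = this.listeners.get(event);
    if (set === undefined) {
      set = new Set();
      this.listeners.set(event, set);
    }
    set.add(listener);
    // The returned closure removes exactly this listener.
    return () => this.off(event, listener);
  }

  off(event: string, listener: (payload: any) => void): void {
    this.listeners.get(event)?.delete(listener);
  }

  emit<K extends EventName>(type: K, payload: EventPayloads[K]): void {
    this.listeners.get(type)?.forEach((listener) => listener(payload));
    this.listeners.get("*")?.forEach((listener) => listener({ type, ...payload }));
  }
}
```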

Example

// Typed event listener
const unsubscribe = agent.on('llm.chunk', ({ chunk, accumulated }) => {
  console.log('LLM chunk:', chunk);
});

// Wildcard listener for logging
agent.on('*', (event) => {
  console.log(`[${event.type}]`, event);
});

// Later, remove the listener
unsubscribe();

See

once for one-time listeners.
off for manual removal.

once()

once<T>(event, listener): () => void;

Defined in: src/CompositeVoice.ts:2144

Registers a one-time event listener that automatically unsubscribes after the first invocation.

Type Parameters

Type Parameter	Description
`T` extends `"transcription.start"` | `"transcription.interim"` | `"transcription.final"` | `"transcription.speechFinal"` | `"transcription.preflight"` | `"transcription.error"` | `"llm.start"` | `"llm.chunk"` | `"llm.complete"` | `"llm.error"` | `"tts.start"` | `"tts.audio"` | `"tts.metadata"` | `"tts.complete"` | `"tts.error"` | `"agent.ready"` | `"agent.stateChange"` | `"agent.error"` | `"audio.capture.start"` | `"audio.capture.stop"` | `"audio.capture.error"` | `"audio.playback.start"` | `"audio.playback.end"` | `"audio.playback.error"` | `"queue.overflow"` | `"queue.stats"`	The event type string, inferred from the `event` argument.

Parameters

Parameter	Type	Description
`event`	`T`	The event type to listen for.
`listener`	`EventListenerMap`[`T`]	The callback function invoked once when the event fires.

Returns

A function that, when called, removes this listener before it fires.

(): void;

Returns

void

Example

agent.once('agent.ready', () => {
  console.log('Agent is ready (this fires only once)');
});

sendMessage()

sendMessage(text): Promise<void>;

Defined in: src/CompositeVoice.ts:1687

Send a text message directly to the LLM, bypassing speech-to-text.

Parameters

Parameter	Type	Description
`text`	`string`	The user’s typed message.

Returns

Promise<void>

Remarks

This allows users to type a message that enters the same pipeline as spoken input. The message is added to conversation history with modality: 'text' so the LLM can distinguish typed input from transcribed speech and adapt its response style accordingly.

If audio output is not muted, the LLM response will also be spoken via TTS. Use muteOutput to get text-only responses.

Example

// Subscribe first so the response is not missed
agent.on('llm.complete', ({ text }) => {
  console.log('Response:', text);
});

// Send a typed message while the agent is listening
await agent.sendMessage('What is the weather like?');

startListening()

startListening(): Promise<void>;

Defined in: src/CompositeVoice.ts:1448

Starts listening for user speech input by wiring the input provider to the STT provider through the input queue and header cache.

Returns

Promise<void>

Remarks

This method transitions the agent from ready (or idle) into the listening state. The exact behavior depends on the pipeline topology:

Multi-role input and stt (e.g., NativeSTT, where one instance covers both roles): a simplified path that only calls connect() on the Live STT provider, which handles its own microphone access internally.
Separate input + STT (e.g., MicrophoneInput + DeepgramSTT): The race condition fix is applied:
1. Wire input.onAudio → headerCache.process → inputQueue.enqueue
2. Start the input provider (begins audio capture)
3. Auto-configure STT encoding/sampleRate from input metadata
4. Connect the STT provider (async WebSocket handshake)
5. Call inputQueue.startDraining(): flush the buffered chunks in order, then switch to pass-through mode

On success, emits an 'audio.capture.start' event.
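The race condition fix in steps 1 to 5 hinges on a queue that buffers while STT is still connecting and then flushes in order. The sketch below is a hypothetical `BufferThenDrainQueue`, not the SDK's AudioBufferQueue; in particular, passing the consumer callback to `startDraining` is an assumption, since the real method's signature is not documented here.

```typescript
// Minimal sketch of a queue that buffers until draining starts, then passes items straight through.
class BufferThenDrainQueue<T> {
  private buffer: T[] = [];
  private sink: ((item: T) => void) | null = null;

  enqueue(item: T): void {
    if (this.sink !== null) {
      this.sink(item); // pass-through mode: forward immediately
    } else {
      this.buffer.push(item); // consumer not ready yet (e.g., STT handshake in progress)
    }
  }

  // Flush everything buffered so far, in order, then switch to pass-through.
  startDraining(sink: (item: T) => void): void {
    const pending = this.buffer;
    this.buffer = [];
    for (const item of pending) sink(item);
    this.sink = sink;
  }

  // Return to buffering mode (e.g., when turn-taking pauses STT).
  stopDraining(): void {
    this.sink = null;
  }

  clear(): void {
    this.buffer = [];
  }
}
```

Chunks captured during the handshake are held rather than dropped, and ordering is preserved because the buffer is flushed before pass-through begins.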

Throws

InvalidStateError Thrown if the agent is not in the ready or idle state.

Throws

Throws the underlying error if STT connection or audio capture fails, after emitting an 'agent.error' event.

Example

await agent.initialize();
await agent.startListening();
// The agent is now transcribing speech in real-time

stopListening()

stopListening(): Promise<void>;

Defined in: src/CompositeVoice.ts:1535

Stops listening for user speech by halting audio capture, clearing the input queue, and disconnecting the STT provider.

Returns

Promise<void>

Remarks

If the agent is not currently in the listening state, this method logs a warning and returns without error. On success, it:

Stops draining the input queue
Clears the input queue
Stops the input provider (for separate input)
Disconnects the STT provider
Resets the header cache
Transitions the capture state machine to stopped
Emits an 'audio.capture.stop' event

Throws

Throws the underlying error if stopping audio capture or disconnecting the STT provider fails.

Example

await agent.startListening();
// ... user finishes speaking ...
await agent.stopListening();

stopSpeaking()

stopSpeaking(): Promise<void>;

Defined in: src/CompositeVoice.ts:1603

Stops the agent from speaking by cancelling TTS playback and disconnecting any Live TTS provider.

Returns

Promise<void>

Remarks

If the agent is not currently in the speaking state, this method returns silently. Otherwise it stops the output provider (clearing its queue), disconnects any Live TTS WebSocket, transitions the playback state machine back to idle, and emits a 'tts.complete' event.

This is useful for implementing “barge-in” behavior where the user interrupts the agent mid-speech.

Throws

Throws the underlying error if stopping playback or disconnecting the TTS provider fails.

Example

agent.on('transcription.interim', async ({ text }) => {
  // Barge-in: stop the agent if the user starts speaking
  if (agent.getState() === 'speaking') {
    await agent.stopSpeaking();
  }
});

unmuteInput()

unmuteInput(): Promise<void>;

Defined in: src/CompositeVoice.ts:2021

Unmute audio input (microphone), resuming speech capture and STT processing.

Returns

Promise<void>

See

muteInput

unmuteOutput()

unmuteOutput(): void;

Defined in: src/CompositeVoice.ts:2067

Unmute audio output (speaker), resuming TTS playback for future responses.

Returns

void

See

muteOutput