CompositeVoice
Defined in: src/CompositeVoice.ts:226
The primary class of the CompositeVoice SDK, orchestrating a complete 5-role audio pipeline from input capture through speech recognition, language model processing, speech synthesis, to audio output.
Remarks
CompositeVoice resolves a flat array of providers into a typed ResolvedPipeline covering five roles (input, stt, llm, tts, output), then wires them together with AudioBufferQueue instances to prevent race conditions and an AudioHeaderCache for WebSocket reconnection scenarios. It manages:
- 5-role pipeline: Providers are resolved via resolveProviders into a typed pipeline. Multi-role providers (e.g., NativeSTT covering input+stt) use simplified paths without queues.
- Race condition fix: An AudioBufferQueue between the input provider and STT buffers audio frames during the WebSocket handshake, then flushes them in order when startDraining() is called.
- Provider lifecycle: Initialization, connection, and disposal of all providers, with deduplication for multi-role instances (using Set).
- State machines: Four coordinated state machines (audio capture, audio playback, processing, and an orchestrating agent state machine) that derive the high-level agent state (idle, ready, listening, thinking, speaking, error).
- Turn-taking: Configurable strategies (auto, conservative, aggressive, detect) that control whether audio capture pauses during TTS playback to prevent echo. Pausing stops queue draining, pauses input, and disconnects STT; resuming restarts draining, resumes input, and reconnects STT with header re-injection.
- Conversation history: Optional multi-turn memory with configurable maxTurns, sending accumulated context to the LLM.
- Eager LLM pipeline: Speculative generation triggered by DeepgramFlux preflight signals, reducing speech-to-first-token latency. If speech_final arrives with different text, the speculative generation is cancelled and restarted.
- Event emission: A rich set of typed events covering every stage of the pipeline, plus wildcard ('*') subscription support.
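The cancel-and-restart behavior described for the eager LLM pipeline can be illustrated with a small standalone sketch. This is not the SDK's internal code: SpeculativeRunner, Generate, onPreflight, and onSpeechFinal are hypothetical names, and the sketch assumes the generation function honors a standard AbortSignal.

```typescript
// Hypothetical sketch; none of these names come from the CompositeVoice SDK.
type Generate = (text: string, signal: AbortSignal) => Promise<string>;

class SpeculativeRunner {
  private readonly generate: Generate;
  private controller: AbortController | null = null;
  private pendingText: string | null = null;
  private pendingResult: Promise<string> | null = null;

  constructor(generate: Generate) {
    this.generate = generate;
  }

  // Preflight signal: start generating before the transcript is final.
  onPreflight(text: string): void {
    this.cancel();
    this.start(text);
  }

  // Final transcript: reuse the speculative run if the text matches,
  // otherwise abort it and start over with the final text.
  onSpeechFinal(text: string): Promise<string> {
    if (this.pendingResult !== null && this.pendingText === text) {
      return this.pendingResult;
    }
    this.cancel();
    return this.start(text);
  }

  private start(text: string): Promise<string> {
    this.controller = new AbortController();
    this.pendingText = text;
    this.pendingResult = this.generate(text, this.controller.signal);
    // Attach a no-op handler so an aborted run is not reported as an
    // unhandled rejection; callers still see it on the promise they hold.
    this.pendingResult.catch(() => undefined);
    return this.pendingResult;
  }

  private cancel(): void {
    this.controller?.abort();
    this.controller = null;
    this.pendingText = null;
    this.pendingResult = null;
  }
}
```

When the preflight text and the final text match, the speculative run is reused and no second request is made. That reuse is where the latency saving would come from.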
Examples
import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from 'composite-voice';
const agent = new CompositeVoice({
providers: [
new NativeSTT(),
new AnthropicLLM({ model: 'claude-sonnet-4-20250514', systemPrompt: 'You are a helpful assistant.' }),
new NativeTTS(),
],
});
// Subscribe to events before initializing
agent.on('agent.stateChange', ({ state }) => console.log('State:', state));
agent.on('transcription.final', ({ text }) => console.log('User said:', text));
agent.on('llm.chunk', ({ chunk }) => process.stdout.write(chunk));
agent.on('agent.error', ({ error, context }) => console.error(context, error));
await agent.initialize();
await agent.startListening();
// The pipeline now runs automatically:
// [input] -> InputQueue -> [stt] -> [llm] -> [tts] -> OutputQueue -> [output]
// When finished:
await agent.dispose();
import {
CompositeVoice, MicrophoneInput, DeepgramSTT, AnthropicLLM,
DeepgramTTS, BrowserAudioOutput,
} from 'composite-voice';
const agent = new CompositeVoice({
providers: [
new MicrophoneInput(),
new DeepgramSTT({ apiKey: '...' }),
new AnthropicLLM({ model: 'claude-sonnet-4-20250514' }),
new DeepgramTTS({ apiKey: '...' }),
new BrowserAudioOutput(),
],
queue: {
input: { maxSize: 2000 },
output: { maxSize: 500 },
},
});
const agent = new CompositeVoice({
providers: [
new NativeSTT(),
new AnthropicLLM({ model: 'claude-sonnet-4-20250514' }),
new NativeTTS(),
],
conversationHistory: { enabled: true, maxTurns: 10 },
eagerLLM: { enabled: true, cancelOnTextChange: true },
});
await agent.initialize();
await agent.startListening();
// Check history at any time
console.log(agent.getHistory());
// Clear history to reset context
agent.clearHistory();
See
- EventEmitter for the underlying event system.
- CompositeVoiceConfig for all configuration options.
- resolveProviders for the provider resolution algorithm.
- AudioBufferQueue for the queue that fixes the race condition.
- AudioHeaderCache for header caching on reconnection.
Constructors
Constructor
new CompositeVoice(config): CompositeVoice;
Defined in: src/CompositeVoice.ts:393
Creates a new CompositeVoice instance with the given provider configuration.
Parameters
| Parameter | Type | Description |
|---|---|---|
| config | CompositeVoiceConfig | The SDK configuration containing a providers array with provider instances, plus optional queue, logging, turn-taking, conversation history, and eager LLM settings. |
Returns
CompositeVoice
Remarks
The constructor resolves the providers array into a typed ResolvedPipeline via resolveProviders, creates input and output AudioBufferQueue instances, initializes the AudioHeaderCache, creates the four state machines, and wires up the agent state change listener. It does not initialize providers or start listening — call initialize() and then startListening() to begin the pipeline.
Throws
ConfigurationError Thrown if the required roles are not covered by the providers array, if duplicate roles are found, or if providers fail duck-type validation.
Example
const agent = new CompositeVoice({
providers: [
new NativeSTT(),
new AnthropicLLM({ model: 'claude-sonnet-4-20250514' }),
new NativeTTS(),
],
logging: { enabled: true, level: 'debug' },
});
Accessors
isInputMuted
Get Signature
get isInputMuted(): boolean;
Defined in: src/CompositeVoice.ts:2082
Whether audio input (microphone) is currently muted.
Returns
boolean
isOutputMuted
Get Signature
get isOutputMuted(): boolean;
Defined in: src/CompositeVoice.ts:2075
Whether audio output (TTS playback) is currently muted.
Returns
boolean
Methods
clearHistory()
clearHistory(): void;
Defined in: src/CompositeVoice.ts:2323
Clears all accumulated conversation history.
Returns
void
Remarks
After calling this method, the next LLM request will have no prior context (unless new turns accumulate). This is useful for resetting the conversation topic without disposing the entire agent.
Example
agent.clearHistory();
console.log(agent.getHistory().length); // 0
dispose()
dispose(): Promise<void>;
Defined in: src/CompositeVoice.ts:2396
Disposes of the SDK, releasing all resources including providers, queues, state machines, event listeners, and conversation history.
Returns
Promise<void>
Remarks
This method performs a full teardown in the following order:
- Stops listening and speaking if the agent is in those states.
- Aborts any in-flight eager LLM generation.
- Clears conversation history.
- Clears both input and output queues.
- Disposes all pipeline providers concurrently, deduplicated so that multi-role providers (e.g., NativeSTT covering input+stt) are only disposed once.
- Removes all event listeners from the internal EventEmitter.
- Resets and disposes all four state machines.
After disposal, calling any operational method will throw. If the SDK is already disposed, this method logs a warning and returns immediately.
Throws
Throws the underlying error if any disposal step fails.
Example
// Graceful shutdown
await agent.dispose();
console.log(agent.isReady()); // false
getHistory()
getHistory(): LLMMessage[];
Defined in: src/CompositeVoice.ts:2305
Returns a shallow copy of the current conversation history.
Returns
A new array of LLMMessage objects representing the conversation so far.
Remarks
The returned array contains LLMMessage objects with role ('user' or 'assistant') and content fields, in chronological order. If conversation history is not enabled in the configuration, or no turns have occurred yet, an empty array is returned.
The array is a copy, so modifications do not affect the internal history.
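The copy semantics and the maxTurns bound can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: BoundedHistory and Message are hypothetical stand-ins, and the sketch assumes one turn is one user message plus one assistant message, which the reference does not confirm.

```typescript
// Simplified stand-in for LLMMessage (hypothetical; only role and content).
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

class BoundedHistory {
  private messages: Message[] = [];
  private readonly maxTurns: number;

  constructor(maxTurns: number) {
    this.maxTurns = maxTurns;
  }

  // Append one turn, then drop the oldest messages beyond the limit.
  // Assumption: a turn is one user message plus one assistant message.
  addTurn(user: string, assistant: string): void {
    this.messages.push(
      { role: 'user', content: user },
      { role: 'assistant', content: assistant },
    );
    const overflow = this.messages.length - this.maxTurns * 2;
    if (overflow > 0) {
      this.messages.splice(0, overflow);
    }
  }

  // Shallow copy: changes to the returned array leave the history intact.
  getHistory(): Message[] {
    return [...this.messages];
  }

  clear(): void {
    this.messages = [];
  }
}
```

Because the copy is shallow, the message objects themselves are shared. Pushing to or popping from the returned array is safe, but mutating a message's content would be visible in the stored history in this sketch.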
Example
const history = agent.getHistory();
console.log(`${history.length} messages in history`);
for (const msg of history) {
console.log(`[${msg.role}]: ${msg.content}`);
}
getQueueStats()
getQueueStats(): {
input: QueueStats;
output: QueueStats;
};
Defined in: src/CompositeVoice.ts:2251
Returns statistics from both the input and output audio buffer queues.
Returns
{
input: QueueStats;
output: QueueStats;
}
An object with input and output QueueStats snapshots.
| Name | Type | Defined in |
|---|---|---|
| input | QueueStats | src/CompositeVoice.ts:2251 |
| output | QueueStats | src/CompositeVoice.ts:2251 |
Remarks
Provides observability into the pipeline’s buffering behavior. The input queue sits between the AudioInputProvider and the STT provider; the output queue sits between the TTS provider and the AudioOutputProvider.
Stats include current size, total enqueued/dequeued/dropped counts, and the age of the oldest buffered chunk. For multi-role providers where the queue is not actively used, stats will show zero activity.
Example
const stats = agent.getQueueStats();
console.log(`Input queue: ${stats.input.size} buffered, ${stats.input.totalDropped} dropped`);
console.log(`Output queue: ${stats.output.size} buffered`);
getState()
getState(): AgentState;
Defined in: src/CompositeVoice.ts:2226
Returns the current high-level agent state.
Returns
The current AgentState.
Remarks
The agent state is derived by the AgentStateMachine from the three sub-state machines (capture, playback, processing). Possible values are: 'idle', 'ready', 'listening', 'thinking', 'speaking', and 'error'.
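One way to picture the derivation is as a pure function over the three sub-states. The sketch below is illustrative only. The sub-state names, the initialized flag, and the priority order are assumptions; the actual rules used by AgentStateMachine are not documented here.

```typescript
type AgentState = 'idle' | 'ready' | 'listening' | 'thinking' | 'speaking' | 'error';

// Hypothetical sub-state names, chosen for illustration only.
interface SubStates {
  initialized: boolean;
  capture: 'idle' | 'active' | 'stopped' | 'error';
  processing: 'idle' | 'processing' | 'error';
  playback: 'idle' | 'playing' | 'error';
}

// Assumed priority: error > speaking > thinking > listening > ready/idle.
function deriveAgentState(s: SubStates): AgentState {
  if (s.capture === 'error' || s.processing === 'error' || s.playback === 'error') {
    return 'error';
  }
  if (!s.initialized) return 'idle';
  if (s.playback === 'playing') return 'speaking';
  if (s.processing === 'processing') return 'thinking';
  if (s.capture === 'active') return 'listening';
  return 'ready';
}
```

Treating the agent state as a derived value means it cannot drift out of sync with the sub-machines, because it is recomputed rather than stored.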
Example
if (agent.getState() === 'listening') {
console.log('Agent is currently listening');
}
initialize()
initialize(): Promise<void>;
Defined in: src/CompositeVoice.ts:491
Initializes the SDK by connecting the agent state machine to its sub-machines and initializing all pipeline providers concurrently.
Returns
Promise<void>
Remarks
This method must be called exactly once before startListening or any other operational method. Calling it a second time logs a warning and returns immediately. On success it emits an 'agent.ready' event and transitions the agent state machine from idle to ready.
Multi-role providers are deduplicated using a Set so that a provider covering both input and stt (e.g., NativeSTT) is only initialized once. All unique providers are initialized concurrently via Promise.all.
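The Set-based deduplication can be sketched in a few lines. Initializable and initializeUnique are hypothetical names used for illustration, and the pipeline is assumed to be a plain role-to-provider map.

```typescript
// Hypothetical minimal provider shape for this sketch.
interface Initializable {
  initialize(): Promise<void>;
}

// Initialize each distinct provider instance once, all concurrently.
// Returns the number of unique instances that were initialized.
async function initializeUnique(
  pipeline: Record<string, Initializable>,
): Promise<number> {
  // A Set compares by object identity, so a provider registered under
  // two roles (e.g. input and stt) appears only once.
  const unique = new Set<Initializable>(Object.values(pipeline));
  await Promise.all(Array.from(unique).map((p) => p.initialize()));
  return unique.size;
}
```

Promise.all rejects as soon as any provider's initialize() rejects, which is consistent with the method surfacing the underlying provider error.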
Throws
Throws the underlying provider error if any provider’s initialize() method rejects.
Example
const agent = new CompositeVoice({ providers: [stt, llm, tts] });
agent.on('agent.ready', () => console.log('SDK is ready'));
await agent.initialize();
isReady()
isReady(): boolean;
Defined in: src/CompositeVoice.ts:2345
Checks whether the SDK has been successfully initialized.
Returns
boolean
true if the SDK is initialized and operational, false otherwise.
Remarks
Returns true after initialize() has completed successfully, and false before initialization or after dispose() has been called.
Example
if (!agent.isReady()) {
await agent.initialize();
}
muteInput()
muteInput(): void;
Defined in: src/CompositeVoice.ts:1998
Mute audio input (microphone), pausing speech capture and STT processing.
Returns
void
Remarks
The agent remains initialized and can still receive text input via sendMessage. Call unmuteInput to resume voice capture.
Example
agent.muteInput();
// User can still type via sendMessage()
await agent.sendMessage('Hello');
await agent.unmuteInput(); // resume mic (unmuteInput returns a Promise)
muteOutput()
muteOutput(): void;
Defined in: src/CompositeVoice.ts:2057
Mute audio output (speaker), suppressing TTS playback.
Returns
void
Remarks
LLM responses still stream and emit llm.chunk / llm.complete events, but audio will not be played. This is useful when the user prefers to read responses as text.
off()
off<T>(event, listener): void;
Defined in: src/CompositeVoice.ts:2166
Removes a previously registered event listener.
Type Parameters
| Type Parameter | Description |
|---|---|
| T extends "transcription.start" \| "transcription.interim" \| "transcription.final" \| "transcription.speechFinal" \| "transcription.preflight" \| "transcription.error" \| "llm.start" \| "llm.chunk" \| "llm.complete" \| "llm.error" \| "tts.start" \| "tts.audio" \| "tts.metadata" \| "tts.complete" \| "tts.error" \| "agent.ready" \| "agent.stateChange" \| "agent.error" \| "audio.capture.start" \| "audio.capture.stop" \| "audio.capture.error" \| "audio.playback.start" \| "audio.playback.end" \| "audio.playback.error" \| "queue.overflow" \| "queue.stats" | The event type string, inferred from the event argument. |
Parameters
| Parameter | Type | Description |
|---|---|---|
| event | "*" \| T | The event type the listener was registered for, or '*'. |
| listener | T extends "*" ? (event) => void : EventListenerMap[T] | The exact listener function reference that was passed to on. |
Returns
void
Example
const handler = ({ state }: { state: AgentState }) => console.log(state);
agent.on('agent.stateChange', handler);
// Later, remove it manually
agent.off('agent.stateChange', handler);
on()
on<T>(event, listener): () => void;
Defined in: src/CompositeVoice.ts:2120
Registers an event listener for the specified event type.
Type Parameters
| Type Parameter | Description |
|---|---|
| T extends "transcription.start" \| "transcription.interim" \| "transcription.final" \| "transcription.speechFinal" \| "transcription.preflight" \| "transcription.error" \| "llm.start" \| "llm.chunk" \| "llm.complete" \| "llm.error" \| "tts.start" \| "tts.audio" \| "tts.metadata" \| "tts.complete" \| "tts.error" \| "agent.ready" \| "agent.stateChange" \| "agent.error" \| "audio.capture.start" \| "audio.capture.stop" \| "audio.capture.error" \| "audio.playback.start" \| "audio.playback.end" \| "audio.playback.error" \| "queue.overflow" \| "queue.stats" | The event type string, inferred from the event argument. |
Parameters
| Parameter | Type | Description |
|---|---|---|
| event | "*" \| T | The event type to listen for (e.g., 'agent.stateChange', 'transcription.final'), or '*' to listen for all events. |
| listener | T extends "*" ? (event) => void : EventListenerMap[T] | The callback function invoked when the event fires. The callback receives the typed event payload matching T. |
Returns
A function that, when called, removes this listener.
(): void;
Returns
void
Remarks
Supports all typed SDK events as well as the wildcard '*' to receive every event. The returned function can be called to unsubscribe.
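The subscribe, unsubscribe, and wildcard pattern can be sketched with a tiny untyped emitter. TinyEmitter is a hypothetical illustration rather than the SDK's EventEmitter. In particular, the shape passed to wildcard listeners ({ type, payload }) is an assumption.

```typescript
type Listener = (payload: unknown) => void;

// Hypothetical minimal emitter; the SDK's EventEmitter is typed per event.
class TinyEmitter {
  private listeners = new Map<string, Set<Listener>>();

  // Register a listener and return a function that removes it again.
  on(event: string, listener: Listener): () => void {
    let set = this.listeners.get(event);
    if (!set) {
      set = new Set<Listener>();
      this.listeners.set(event, set);
    }
    set.add(listener);
    return () => this.off(event, listener);
  }

  // Removal needs the same function reference that was passed to on().
  off(event: string, listener: Listener): void {
    this.listeners.get(event)?.delete(listener);
  }

  emit(event: string, payload: unknown): void {
    // Snapshot first so listeners may unsubscribe while being called.
    for (const l of this.snapshot(event)) l(payload);
    // Wildcard listeners also learn which event fired (assumed shape).
    for (const l of this.snapshot('*')) l({ type: event, payload });
  }

  private snapshot(event: string): Listener[] {
    const set = this.listeners.get(event);
    return set ? Array.from(set) : [];
  }
}
```

Returning the unsubscribe function from on() means callers do not have to keep the listener reference around just to call off() later.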
Example
// Typed event listener
const unsubscribe = agent.on('llm.chunk', ({ chunk, accumulated }) => {
console.log('LLM chunk:', chunk);
});
// Wildcard listener for logging
agent.on('*', (event) => {
console.log(`[${event.type}]`, event);
});
// Later, remove the listener
unsubscribe();
once()
once<T>(event, listener): () => void;
Defined in: src/CompositeVoice.ts:2144
Registers a one-time event listener that automatically unsubscribes after the first invocation.
Type Parameters
| Type Parameter | Description |
|---|---|
| T extends "transcription.start" \| "transcription.interim" \| "transcription.final" \| "transcription.speechFinal" \| "transcription.preflight" \| "transcription.error" \| "llm.start" \| "llm.chunk" \| "llm.complete" \| "llm.error" \| "tts.start" \| "tts.audio" \| "tts.metadata" \| "tts.complete" \| "tts.error" \| "agent.ready" \| "agent.stateChange" \| "agent.error" \| "audio.capture.start" \| "audio.capture.stop" \| "audio.capture.error" \| "audio.playback.start" \| "audio.playback.end" \| "audio.playback.error" \| "queue.overflow" \| "queue.stats" | The event type string, inferred from the event argument. |
Parameters
| Parameter | Type | Description |
|---|---|---|
| event | T | The event type to listen for. |
| listener | EventListenerMap[T] | The callback function invoked once when the event fires. |
Returns
A function that, when called, removes this listener before it fires.
(): void;
Returns
void
Example
agent.once('agent.ready', () => {
console.log('Agent is ready (this fires only once)');
});
sendMessage()
sendMessage(text): Promise<void>;
Defined in: src/CompositeVoice.ts:1687
Send a text message directly to the LLM, bypassing speech-to-text.
Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | The user’s typed message. |
Returns
Promise<void>
Remarks
This allows users to type a message that enters the same pipeline as spoken input. The message is added to conversation history with modality: 'text' so the LLM can distinguish typed input from transcribed speech and adapt its response style accordingly.
If audio output is not muted, the LLM response will also be spoken via TTS. Use muteOutput to get text-only responses.
Example
// Send a typed message while the agent is listening
await agent.sendMessage('What is the weather like?');
// Listen for the response
agent.on('llm.complete', ({ text }) => {
console.log('Response:', text);
});
startListening()
startListening(): Promise<void>;
Defined in: src/CompositeVoice.ts:1448
Starts listening for user speech input by wiring the input provider to the STT provider through the input queue and header cache.
Returns
Promise<void>
Remarks
This method transitions the agent from ready (or idle) into the listening state. The exact behavior depends on the pipeline topology:
- Multi-role input===stt (e.g., NativeSTT): Simplified path that just calls connect() on the Live STT provider. The provider handles its own microphone access internally.
- Separate input + STT (e.g., MicrophoneInput + DeepgramSTT): The race condition fix is applied, in this order:
  - Wire input.onAudio → headerCache.process → inputQueue.enqueue
  - Start the input provider (begins audio capture)
  - Auto-configure STT encoding/sampleRate from input metadata
  - Connect the STT provider (async WebSocket handshake)
  - Call inputQueue.startDraining to flush buffered chunks and switch to pass-through
On success, emits an 'audio.capture.start' event.
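The buffer-then-drain behavior behind the race condition fix can be sketched as a small queue. This is a simplified illustration, not the SDK's AudioBufferQueue. BufferThenDrainQueue is a hypothetical name, and the drop-oldest overflow policy is an assumption.

```typescript
// Hypothetical sketch of a queue that buffers until the consumer is ready.
class BufferThenDrainQueue<T> {
  private buffer: T[] = [];
  private draining = false;
  private readonly sink: (chunk: T) => void;
  private readonly maxSize: number;
  totalDropped = 0;

  constructor(sink: (chunk: T) => void, maxSize: number) {
    this.sink = sink;
    this.maxSize = maxSize;
  }

  // Before draining starts, chunks are held; afterwards they pass through.
  enqueue(chunk: T): void {
    if (this.draining) {
      this.sink(chunk);
      return;
    }
    if (this.buffer.length >= this.maxSize) {
      // Assumed overflow policy: drop the oldest chunk.
      this.buffer.shift();
      this.totalDropped++;
    }
    this.buffer.push(chunk);
  }

  // Called once the consumer (e.g. an open WebSocket) can accept data:
  // flush everything buffered so far in order, then switch to pass-through.
  startDraining(): void {
    this.draining = true;
    for (const chunk of this.buffer.splice(0)) {
      this.sink(chunk);
    }
  }

  stopDraining(): void {
    this.draining = false;
  }
}
```

Capture can begin before the connection is ready without losing the first frames, because audio produced during the handshake is held and then delivered in its original order.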
Throws
InvalidStateError Thrown if the agent is not in the ready or idle state.
Throws
Throws the underlying error if STT connection or audio capture fails, after emitting an 'agent.error' event.
Example
await agent.initialize();
await agent.startListening();
// The agent is now transcribing speech in real-time
stopListening()
stopListening(): Promise<void>;
Defined in: src/CompositeVoice.ts:1535
Stops listening for user speech by halting audio capture, clearing the input queue, and disconnecting the STT provider.
Returns
Promise<void>
Remarks
If the agent is not currently in the listening state, this method logs a warning and returns without error. On success, it:
- Stops draining the input queue
- Clears the input queue
- Stops the input provider (for separate input)
- Disconnects the STT provider
- Resets the header cache
- Transitions the capture state machine to stopped
- Emits an 'audio.capture.stop' event
Throws
Throws the underlying error if stopping audio capture or disconnecting the STT provider fails.
Example
await agent.startListening();
// ... user finishes speaking ...
await agent.stopListening();
stopSpeaking()
stopSpeaking(): Promise<void>;
Defined in: src/CompositeVoice.ts:1603
Stops the agent from speaking by cancelling TTS playback and disconnecting any Live TTS provider.
Returns
Promise<void>
Remarks
If the agent is not currently in the speaking state, this method returns silently. Otherwise it stops the output provider (clearing its queue), disconnects any Live TTS WebSocket, transitions the playback state machine back to idle, and emits a 'tts.complete' event.
This is useful for implementing “barge-in” behavior where the user interrupts the agent mid-speech.
Throws
Throws the underlying error if stopping playback or disconnecting the TTS provider fails.
Example
agent.on('transcription.interim', async ({ text }) => {
// Barge-in: stop the agent if the user starts speaking
if (agent.getState() === 'speaking') {
await agent.stopSpeaking();
}
});
unmuteInput()
unmuteInput(): Promise<void>;
Defined in: src/CompositeVoice.ts:2021
Unmute audio input (microphone), resuming speech capture and STT processing.
Returns
Promise<void>
unmuteOutput()
unmuteOutput(): void;
Defined in: src/CompositeVoice.ts:2067
Unmute audio output (speaker), resuming TTS playback for future responses.
Returns
void