CartesiaTTS
Cartesia TTS provider for low-latency real-time streaming text-to-speech via WebSocket.
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:273
Remarks
This provider establishes a WebSocket connection to the Cartesia TTS API (or a proxy). It uses Cartesia’s context-based streaming protocol, where a context_id links multiple text chunks into a single coherent utterance. The continue flag indicates whether a chunk continues an existing context or starts a new one.
The lifecycle is:
- Construct with CartesiaTTSConfig
- Call initialize() to validate configuration
- Call connect() to open the WebSocket and generate a context ID
- Call sendText() to stream text for synthesis (uses context continuation)
- Call finalize() to send end-of-input and flush remaining audio
- Call disconnect() to close the WebSocket
- Call dispose() to release all resources
Audio flow: Text chunks -> WebSocket -> Cartesia -> Audio chunks -> onAudio callback
Example
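import { CartesiaTTS } from 'composite-voice';
const tts = new CartesiaTTS({
  apiKey: 'cart-xxxxxxxxxxxx',
  voiceId: 'a0e99841-438c-4a64-b679-ae501e7d6091',
  modelId: 'sonic-2',
  outputEncoding: 'pcm_s16le',
  outputSampleRate: 24000,
});
await tts.initialize();
await tts.connect();
// Register the audio callback before sending any text
tts.onAudio((chunk) => {
  // Process audio chunk
});
// Both chunks share one context, so they are synthesized as a single utterance
tts.sendText('Hello, ');
tts.sendText('world!');
await tts.finalize();
await tts.disconnect();
// Release remaining resources (last step of the lifecycle above)
await tts.dispose();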
See
- LiveTTSProvider - The base class this provider extends.
- CartesiaTTSConfig - Configuration options for this provider.
- WebSocketManager - The WebSocket manager used for connection handling.
Extends
LiveTTSProvider
Constructors
Constructor
new CartesiaTTS(config, logger?): CartesiaTTS;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:295
Creates a new CartesiaTTS provider instance.
Parameters
| Parameter | Type | Description |
|---|---|---|
| config | CartesiaTTSConfig | Configuration for the Cartesia TTS provider. The voiceId property is required. |
| logger? | Logger | Optional logger instance for debug and diagnostic output. |
Returns
CartesiaTTS
Example
const tts = new CartesiaTTS({
  apiKey: 'cart-xxxxxxxxxxxx',
  voiceId: 'a0e99841-438c-4a64-b679-ae501e7d6091',
});
Overrides
LiveTTSProvider.constructor
Properties
| Property | Modifier | Type | Default value | Description | Overrides | Inherited from | Defined in |
|---|---|---|---|---|---|---|---|
| audioCallback? | protected | (chunk) => void | undefined | Callback registered by the SDK or consumer to receive audio chunks. Set via onAudio. | - | LiveTTSProvider.audioCallback | src/providers/base/BaseTTSProvider.ts:79 |
| config | public | CartesiaTTSConfig | undefined | TTS-specific provider configuration. | LiveTTSProvider.config | - | src/providers/tts/cartesia/CartesiaTTS.ts:274 |
| initialized | protected | boolean | false | Tracks whether initialize has completed successfully. | - | LiveTTSProvider.initialized | src/providers/base/BaseProvider.ts:97 |
| logger | protected | Logger | undefined | Scoped logger instance for this provider. | - | LiveTTSProvider.logger | src/providers/base/BaseProvider.ts:94 |
| metadataCallback? | protected | (metadata) => void | undefined | Callback registered by the SDK or consumer to receive audio metadata. Set via onMetadata. | - | LiveTTSProvider.metadataCallback | src/providers/base/BaseTTSProvider.ts:85 |
| roles | readonly | readonly ProviderRole[] | undefined | TTS providers cover the 'tts' pipeline role by default. | - | LiveTTSProvider.roles | src/providers/base/BaseTTSProvider.ts:70 |
| type | readonly | ProviderType | undefined | Communication transport this provider uses ('rest' or 'websocket'). | - | LiveTTSProvider.type | src/providers/base/BaseProvider.ts:74 |
Methods
assertReady()
protected assertReady(): void;
Defined in: src/providers/base/BaseProvider.ts:255
Guard that throws if the provider has not been initialized.
Returns
void
Remarks
Call at the start of any method that requires the provider to be ready.
Throws
Error Thrown with a descriptive message when initialized is false.
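The guard pattern described here can be sketched as follows. This is a minimal illustration, not the library's code: SketchProvider, markInitialized(), speak(), and the error message wording are all hypothetical.

```typescript
// Hypothetical stand-in for a provider base class; not the library's BaseProvider.
class SketchProvider {
  protected initialized = false;

  // Throws when the provider has not been initialized yet.
  protected assertReady(): void {
    if (!this.initialized) {
      // Message wording is an assumption made for illustration.
      throw new Error(`${this.constructor.name} is not initialized. Call initialize() first.`);
    }
  }

  // Stand-in for a successful initialize() call.
  markInitialized(): void {
    this.initialized = true;
  }

  // A method that requires readiness calls the guard before doing any work.
  speak(text: string): string {
    this.assertReady();
    return `speaking: ${text}`;
  }
}
```

Placing the guard on the first line of each method keeps the failure close to the caller's mistake instead of surfacing later as an obscure transport error.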
Inherited from
LiveTTSProvider.assertReady
connect()
connect(): Promise<void>;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:402
Connects to the Cartesia WebSocket for real-time TTS streaming.
Returns
Promise<void>
Remarks
Establishes a WebSocket connection and generates a fresh context ID for the session. Auto-reconnect is disabled for TTS sessions since each session is typically short-lived.
This method is idempotent: calling it when already connected is a no-op.
Throws
ProviderConnectionError if the WebSocket connection fails.
Overrides
LiveTTSProvider.connect
disconnect()
disconnect(): Promise<void>;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:694
Disconnects from the Cartesia WebSocket.
Returns
Promise<void>
Remarks
Gracefully closes the WebSocket connection and releases the WebSocketManager instance. Also resets the context ID and chunk tracking state.
Throws
Rethrows any error that occurs during disconnection.
Overrides
LiveTTSProvider.disconnect
dispose()
dispose(): Promise<void>;
Defined in: src/providers/base/BaseProvider.ts:154
Clean up resources and dispose of the provider.
Returns
Promise<void>
Remarks
Delegates to the subclass hook onDispose and resets the initialized flag. If the provider is not initialized, the call is a no-op.
Throws
Rethrows any error raised by onDispose.
Inherited from
LiveTTSProvider.dispose
emitAudio()
protected emitAudio(chunk): void;
Defined in: src/providers/base/BaseTTSProvider.ts:138
Emit a synthesized audio chunk to the registered callback.
Parameters
| Parameter | Type | Description |
|---|---|---|
| chunk | AudioChunk | The audio chunk to emit. |
Returns
void
Remarks
Subclasses call this method for each chunk of audio produced during synthesis. If no callback has been registered the chunk is silently dropped.
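A minimal sketch of the register-then-emit behaviour described above, including the silent drop when no callback is set. SketchAudioChunk and SketchEmitter are hypothetical; the library's AudioChunk type very likely carries different fields.

```typescript
// Hypothetical chunk shape; the library's AudioChunk type may differ.
interface SketchAudioChunk {
  data: ArrayBuffer;
  sequence: number;
}

class SketchEmitter {
  private audioCallback?: (chunk: SketchAudioChunk) => void;

  // Register the consumer callback (plays the role of onAudio).
  onAudio(callback: (chunk: SketchAudioChunk) => void): void {
    this.audioCallback = callback;
  }

  // Forward the chunk if a callback exists; otherwise drop it without error.
  emitAudio(chunk: SketchAudioChunk): void {
    this.audioCallback?.(chunk);
  }
}
```

Because chunks emitted before registration are lost, a consumer would normally register its callback before sending any text.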
Inherited from
LiveTTSProvider.emitAudio
emitMetadata()
protected emitMetadata(metadata): void;
Defined in: src/providers/base/BaseTTSProvider.ts:154
Emit audio metadata to the registered callback.
Parameters
| Parameter | Type | Description |
|---|---|---|
| metadata | AudioMetadata | The audio metadata to emit. |
Returns
void
Remarks
Typically called once at the start of synthesis when the provider knows the output format. If no callback has been registered the metadata is silently dropped.
Inherited from
LiveTTSProvider.emitMetadata
finalize()
finalize(): Promise<void>;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:638
Finalizes the current synthesis session by sending an end-of-input signal.
Returns
Promise<void>
Remarks
Sends an empty transcript with continue: false to signal that no more text will be sent for the current context. Waits up to 2 seconds for any remaining audio to arrive, then resets the context ID for the next utterance.
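The end-of-input signal and bounded wait might look roughly like the sketch below. Everything here is an assumption for illustration: finalizeSketch, waitWithTimeout, the send function, and the audioDrained promise are hypothetical, and how the real provider detects that audio has finished arriving is not stated in this reference.

```typescript
// Resolves when `pending` settles or when `timeoutMs` elapses, whichever comes first.
function waitWithTimeout(
  pending: Promise<void>,
  timeoutMs: number,
): Promise<'settled' | 'timeout'> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve('timeout'), timeoutMs);
    const finish = (): void => {
      clearTimeout(timer);
      resolve('settled');
    };
    // Success and failure both end the wait; this sketch does not rethrow.
    pending.then(finish, finish);
  });
}

// Hypothetical finalize flow: send the end-of-input message, then wait with a cap.
async function finalizeSketch(
  send: (message: string) => void,
  contextId: string,
  audioDrained: Promise<void>,
  timeoutMs = 2000,
): Promise<'settled' | 'timeout'> {
  // Empty transcript with continue: false marks the end of this context.
  send(JSON.stringify({ context_id: contextId, transcript: '', continue: false }));
  return waitWithTimeout(audioDrained, timeoutMs);
}
```

Capping the wait means a stalled connection cannot block the caller indefinitely, at the cost of possibly cutting off audio that arrives after the deadline.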
Throws
Rethrows any error that occurs during finalization.
Overrides
LiveTTSProvider.finalize
getConfig()
getConfig(): TTSProviderConfig;
Defined in: src/providers/base/BaseTTSProvider.ts:165
Get a shallow copy of the current TTS configuration.
Returns
A new TTSProviderConfig object.
Inherited from
LiveTTSProvider.getConfig
initialize()
initialize(): Promise<void>;
Defined in: src/providers/base/BaseProvider.ts:127
Initialize the provider, making it ready for use.
Returns
Promise<void>
Remarks
Calls the subclass hook onInitialize. If the provider has already been initialized the call is a no-op.
Throws
ProviderInitializationError Thrown when onInitialize rejects. The original error is wrapped with the provider class name for diagnostics.
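As a rough sketch of this hook-based flow, the classes below show an idempotent initialize() that wraps hook failures. SketchInitializationError, SketchBase, and CountingProvider are hypothetical stand-ins; the library's ProviderInitializationError may have a different constructor and fields.

```typescript
// Hypothetical error type; stands in for ProviderInitializationError.
class SketchInitializationError extends Error {
  readonly original: unknown;

  constructor(providerName: string, original: unknown) {
    super(`${providerName} failed to initialize`);
    this.name = 'SketchInitializationError';
    this.original = original;
  }
}

class SketchBase {
  protected initialized = false;

  async initialize(): Promise<void> {
    if (this.initialized) return; // already initialized: no-op
    try {
      await this.onInitialize();
    } catch (error) {
      // Wrap the original error with the provider class name for diagnostics.
      throw new SketchInitializationError(this.constructor.name, error);
    }
    this.initialized = true;
  }

  isReady(): boolean {
    return this.initialized;
  }

  // Subclass hook; the default does nothing.
  protected async onInitialize(): Promise<void> {}
}

// Example subclass that counts how often the hook runs.
class CountingProvider extends SketchBase {
  calls = 0;

  protected async onInitialize(): Promise<void> {
    this.calls += 1;
  }
}
```

Calling initialize() twice on a CountingProvider runs the hook only once, which is the no-op behaviour described above.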
Inherited from
LiveTTSProvider.initialize
isReady()
isReady(): boolean;
Defined in: src/providers/base/BaseProvider.ts:178
Check whether the provider has been initialized and is ready.
Returns
boolean
true when initialize has completed successfully and dispose has not yet been called.
Inherited from
LiveTTSProvider.isReady
isWebSocketConnected()
isWebSocketConnected(): boolean;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:722
Checks whether the WebSocket connection to Cartesia is currently active.
Returns
boolean
true if the WebSocket is connected, false otherwise.
onAudio()
onAudio(callback): void;
Defined in: src/providers/base/BaseTTSProvider.ts:109
Register a callback to receive synthesized audio chunks.
Parameters
| Parameter | Type | Description |
|---|---|---|
| callback | (chunk) => void | Function invoked with each AudioChunk. |
Returns
void
Remarks
All TTS providers, regardless of transport, deliver audio through this callback. CompositeVoice registers it during pipeline setup so that audio data flows into the AudioPlayer.
Inherited from
LiveTTSProvider.onAudio
onConfigUpdate()
protected onConfigUpdate(_config): void;
Defined in: src/providers/base/BaseProvider.ts:242
Hook called after updateConfig merges new values.
Parameters
| Parameter | Type | Description |
|---|---|---|
| _config | Partial<BaseProviderConfig> | The partial configuration that was merged. |
Returns
void
Remarks
The default implementation is a no-op. Override in subclasses to react to runtime configuration changes (e.g. reconnect with a new API key).
Inherited from
LiveTTSProvider.onConfigUpdate
onDispose()
protected onDispose(): Promise<void>;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:341
Disposes the provider, disconnecting from the WebSocket and releasing resources.
Returns
Promise<void>
Overrides
LiveTTSProvider.onDispose
onInitialize()
protected onInitialize(): Promise<void>;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:314
Validates configuration and prepares the provider for connection.
Returns
Promise<void>
Throws
ProviderInitializationError if neither apiKey nor proxyUrl is configured.
Throws
ProviderInitializationError if voiceId is not provided.
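The two checks listed above can be sketched as a small validation function. This is illustrative only: SketchConfig covers just the fields named here, validateConfig is a hypothetical helper, and the real provider throws ProviderInitializationError rather than returning a list of problems.

```typescript
// Hypothetical subset of CartesiaTTSConfig; only the fields named above.
interface SketchConfig {
  apiKey?: string;
  proxyUrl?: string;
  voiceId?: string;
}

// Returns the problems found; an empty array means both checks passed.
function validateConfig(config: SketchConfig): string[] {
  const problems: string[] = [];
  // At least one way of reaching the API must be configured.
  if (!config.apiKey && !config.proxyUrl) {
    problems.push('Either apiKey or proxyUrl is required.');
  }
  // A voice must always be selected.
  if (!config.voiceId) {
    problems.push('voiceId is required.');
  }
  return problems;
}
```

A config that supplies only proxyUrl and voiceId passes both checks, since an API key is not needed when a proxy is configured.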
Overrides
LiveTTSProvider.onInitialize
onMetadata()
onMetadata(callback): void;
Defined in: src/providers/base/BaseTTSProvider.ts:124
Register a callback to receive audio metadata.
Parameters
| Parameter | Type | Description |
|---|---|---|
| callback | (metadata) => void | Function invoked with AudioMetadata when available. |
Returns
void
Remarks
Metadata (sample rate, encoding, channels, etc.) helps the AudioPlayer configure playback correctly. Providers may emit metadata once at the start of synthesis but are not required to.
Inherited from
LiveTTSProvider.onMetadata
sendText()
sendText(chunk): void;
Defined in: src/providers/tts/cartesia/CartesiaTTS.ts:588
Sends a text chunk to Cartesia for real-time synthesis.
Parameters
| Parameter | Type | Description |
|---|---|---|
| chunk | string | The text to synthesize into speech. |
Returns
void
Remarks
Each message includes the model ID, voice reference, output format, and a context_id for streaming continuation. The continue flag is false for the first chunk and true for subsequent chunks, allowing Cartesia to maintain prosody across multiple text segments.
Optional parameters (language, speed, emotion) are included when configured.
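To make the continuation behaviour concrete, the sketch below builds the sequence of messages for several chunks that share one context. Only context_id and continue are named in this reference; the remaining field names, the SketchMessage and SketchOptions types, and buildMessages itself are assumptions for illustration, not the provider's actual wire format.

```typescript
// Hypothetical message shape; field names other than context_id and continue are assumed.
interface SketchMessage {
  model_id: string;
  transcript: string;
  voice: { mode: 'id'; id: string };
  output_format: { container: 'raw'; encoding: string; sample_rate: number };
  context_id: string;
  continue: boolean;
  language?: string;
}

// Hypothetical options mirroring the config fields used in the example above.
interface SketchOptions {
  modelId: string;
  voiceId: string;
  encoding: string;
  sampleRate: number;
  language?: string;
}

// Builds one message per chunk, all linked by the same context ID.
function buildMessages(
  chunks: string[],
  contextId: string,
  options: SketchOptions,
): SketchMessage[] {
  return chunks.map((transcript, index): SketchMessage => ({
    model_id: options.modelId,
    transcript,
    voice: { mode: 'id', id: options.voiceId },
    output_format: {
      container: 'raw',
      encoding: options.encoding,
      sample_rate: options.sampleRate,
    },
    context_id: contextId,
    // As described above: false for the first chunk, true for later ones.
    continue: index > 0,
    // Optional parameters are only included when configured.
    ...(options.language ? { language: options.language } : {}),
  }));
}
```

For the chunks 'Hello, ' and 'world!', this yields two messages with the same context_id, the first with continue set to false and the second with continue set to true.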
Overrides
LiveTTSProvider.sendText
updateConfig()
updateConfig(config): void;
Defined in: src/providers/base/BaseProvider.ts:201
Merge partial configuration updates into the current config.
Parameters
| Parameter | Type | Description |
|---|---|---|
| config | Partial<BaseProviderConfig> | A partial configuration object whose keys will overwrite existing values. |
Returns
void
Remarks
After merging, the subclass hook onConfigUpdate is called so providers can react to changed values at runtime.
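The merge-then-notify sequence can be sketched as below. SketchBaseConfig, SketchConfigurable, and RecordingProvider are hypothetical names; the fields of the real BaseProviderConfig are not listed in this reference, so apiKey and debug are assumed here.

```typescript
// Hypothetical config shape; the real BaseProviderConfig may have other fields.
interface SketchBaseConfig {
  apiKey?: string;
  debug?: boolean;
}

class SketchConfigurable {
  private config: SketchBaseConfig;

  constructor(config: SketchBaseConfig) {
    this.config = { ...config };
  }

  // Shallow-merge the update, then notify the subclass hook.
  updateConfig(update: Partial<SketchBaseConfig>): void {
    this.config = { ...this.config, ...update };
    this.onConfigUpdate(update);
  }

  // Return a shallow copy so callers cannot mutate internal state.
  getConfig(): SketchBaseConfig {
    return { ...this.config };
  }

  // Default hook is a no-op; subclasses override it to react to changes.
  protected onConfigUpdate(_update: Partial<SketchBaseConfig>): void {}
}

// Example subclass that records which keys changed.
class RecordingProvider extends SketchConfigurable {
  changedKeys: string[] = [];

  protected onConfigUpdate(update: Partial<SketchBaseConfig>): void {
    this.changedKeys.push(...Object.keys(update));
  }
}
```

Because the hook receives only the partial update, a subclass can tell which keys changed without diffing the whole configuration.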
Inherited from
LiveTTSProvider.updateConfig