OpenAITTS
Convert text to speech using OpenAI's REST API with six voices and two quality tiers.
Use OpenAITTS when you want high-quality speech synthesis via a simple REST call. Each synthesize() request returns the complete audio as a Blob — no WebSocket management required. Choose between tts-1 for speed or tts-1-hd for quality.
Prerequisites
- An OpenAI API key or a CompositeVoice proxy server
- Install the peer dependency:
```sh
npm install openai
```
Basic setup
```typescript
import { CompositeVoice, NativeSTT, AnthropicLLM, OpenAITTS } from '@lukeocodes/composite-voice';

const voice = new CompositeVoice({
  stt: new NativeSTT(),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
  }),
  tts: new OpenAITTS({
    proxyUrl: '/api/proxy/openai',
    model: 'tts-1',
    voice: 'nova',
    responseFormat: 'mp3',
  }),
});

await voice.start();
```
Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| `apiKey` | string | — | OpenAI API key (direct mode) |
| `proxyUrl` | string | — | Proxy server URL (recommended for production) |
| `model` | string | `'tts-1'` | `'tts-1'` (fast, low latency) or `'tts-1-hd'` (higher quality) |
| `voice` | string | `'alloy'` | Voice identifier |
| `responseFormat` | string | `'mp3'` | Output format: mp3, opus, aac, flac, wav |
| `speed` | number | `1.0` | Speech speed multiplier (0.25 to 4.0) |
| `organizationId` | string | — | OpenAI organization ID for billing |
| `baseURL` | string | — | Custom API endpoint (e.g., Azure OpenAI) |
| `maxRetries` | number | `3` | Retry count for failed requests |
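For standalone use outside the pipeline, these options map onto the JSON body of OpenAI's `POST /v1/audio/speech` endpoint. A minimal sketch of that mapping — the `buildSpeechRequest` helper below is illustrative, not part of the library, and its defaults mirror the table above:

```typescript
// Illustrative helper (not part of OpenAITTS): assemble the JSON body for
// OpenAI's POST /v1/audio/speech. Field names (model, voice, input,
// response_format, speed) follow the OpenAI Audio API; defaults mirror
// the configuration table above.
interface SpeechOptions {
  model?: string;
  voice?: string;
  responseFormat?: string;
  speed?: number;
}

function buildSpeechRequest(input: string, opts: SpeechOptions = {}) {
  return {
    model: opts.model ?? 'tts-1',
    voice: opts.voice ?? 'alloy',
    input,
    response_format: opts.responseFormat ?? 'mp3',
    speed: opts.speed ?? 1.0,
  };
}

// Roughly the body produced for the basic setup shown earlier:
const body = buildSpeechRequest('Hello there', { voice: 'nova' });
```

The body is sent as JSON with an `Authorization: Bearer <key>` header, and the response body is the raw audio, which can be wrapped in a Blob.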
Available voices
alloy, echo, fable, onyx, nova, shimmer
Each voice has distinct characteristics. Preview them in the OpenAI TTS guide.
Output formats
| Format | Use case |
|---|---|
| mp3 | Good compression, wide browser support (default) |
| opus | Best for streaming and low latency |
| aac | Optimized for mobile devices |
| flac | Lossless compression |
| wav | Uncompressed, highest quality |
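If you play the returned Blob yourself (for example via `URL.createObjectURL` and an `<audio>` element), its MIME type should match the chosen `responseFormat`. A small illustrative mapping — the `opus` entry assumes OpenAI's Opus output uses an Ogg container:

```typescript
// Illustrative mapping from responseFormat to a playable MIME type.
// Assumption: OpenAI's opus output is delivered in an Ogg container.
const AUDIO_MIME_TYPES: Record<string, string> = {
  mp3: 'audio/mpeg',
  opus: 'audio/ogg; codecs=opus',
  aac: 'audio/aac',
  flac: 'audio/flac',
  wav: 'audio/wav',
};

function mimeTypeFor(responseFormat: string): string {
  return AUDIO_MIME_TYPES[responseFormat] ?? 'application/octet-stream';
}
```

In a browser you might then construct `new Blob([bytes], { type: mimeTypeFor('mp3') })` and hand its object URL to an `Audio` element.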
Complete example
```typescript
import { CompositeVoice, DeepgramSTT, AnthropicLLM, OpenAITTS } from '@lukeocodes/composite-voice';

const tts = new OpenAITTS({
  proxyUrl: '/api/proxy/openai',
  model: 'tts-1-hd',
  voice: 'shimmer',
  responseFormat: 'opus',
  speed: 1.1,
});

const voice = new CompositeVoice({
  stt: new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
  }),
  tts,
});

voice.on('tts:start', () => console.log('Speaking...'));
voice.on('tts:end', () => console.log('Done speaking'));

await voice.start();
```
Model selection
- `tts-1`: Optimized for real-time use. Lower latency, slightly lower audio quality. Best for voice pipelines where responsiveness matters.
- `tts-1-hd`: Higher audio fidelity at the cost of increased latency. Best when audio quality is the priority.
Tips
- OpenAI TTS has a 4096-character limit per request. CompositeVoice handles this automatically in the pipeline, but keep it in mind for standalone use.
- The `opus` format gives the best balance of quality and size for browser playback.
- OpenAITTS is REST-based, not streaming. The full audio Blob is returned after the API processes the entire input. For real-time streaming, consider DeepgramTTS.
- Use `baseURL` to point at Azure OpenAI or any API-compatible endpoint. For CompositeVoice proxy routing, use `proxyUrl` instead.
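If you do call the API outside the pipeline, longer inputs have to be split before synthesis. A minimal sketch of such a splitter — illustrative only; CompositeVoice's own chunking logic may differ:

```typescript
// Illustrative splitter for the 4096-character request limit.
// Prefers sentence boundaries, then word boundaries, then a hard cut.
function chunkText(text: string, maxLen = 4096): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    // lastIndexOf('. ', maxLen - 1) + 1 keeps the period in the chunk.
    let cut = rest.lastIndexOf('. ', maxLen - 1) + 1;
    if (cut === 0) cut = rest.lastIndexOf(' ', maxLen - 1);
    if (cut <= 0) cut = maxLen; // no boundary found: hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be synthesized separately and the resulting audio played back in order.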