
OpenAITTS

Convert text to speech using OpenAI's REST API with six voices and two quality tiers.

Use OpenAITTS when you want high-quality speech synthesis via a simple REST call. Each synthesize() request returns the complete audio as a Blob — no WebSocket management required. Choose between tts-1 for speed or tts-1-hd for quality.
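Under the hood, each synthesize() call maps to a single POST against OpenAI's /v1/audio/speech endpoint. The sketch below shows roughly what that request looks like in direct (apiKey) mode; buildSpeechRequest is a hypothetical helper for illustration, not part of the library.

```typescript
// Sketch of the REST request behind a synthesize() call.
// buildSpeechRequest is a hypothetical helper; the endpoint and
// body fields follow OpenAI's /v1/audio/speech API.
interface SpeechOptions {
  apiKey: string;
  model: 'tts-1' | 'tts-1-hd';
  voice: string;
  input: string;
  responseFormat?: 'mp3' | 'opus' | 'aac' | 'flac' | 'wav';
  speed?: number;
}

function buildSpeechRequest(opts: SpeechOptions): {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
} {
  return {
    url: 'https://api.openai.com/v1/audio/speech',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${opts.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: opts.model,
        voice: opts.voice,
        input: opts.input,
        response_format: opts.responseFormat ?? 'mp3',
        speed: opts.speed ?? 1.0,
      }),
    },
  };
}

// Usage (in a browser):
// const { url, init } = buildSpeechRequest({ apiKey, model: 'tts-1', voice: 'nova', input: 'Hello!' });
// const audio: Blob = await fetch(url, init).then((r) => r.blob());
```

In proxy mode the library sends the same body to your proxyUrl instead, so the API key never reaches the browser.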

Prerequisites

  • An OpenAI API key or a CompositeVoice proxy server
  • Install the peer dependency:
npm install openai

Basic setup

import { CompositeVoice, NativeSTT, AnthropicLLM, OpenAITTS } from '@lukeocodes/composite-voice';

const voice = new CompositeVoice({
  stt: new NativeSTT(),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
  }),
  tts: new OpenAITTS({
    proxyUrl: '/api/proxy/openai',
    model: 'tts-1',
    voice: 'nova',
    responseFormat: 'mp3',
  }),
});

await voice.start();

Configuration options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | OpenAI API key (direct mode) |
| proxyUrl | string | | Proxy server URL (recommended for production) |
| model | string | 'tts-1' | 'tts-1' (fast, low latency) or 'tts-1-hd' (higher quality) |
| voice | string | 'alloy' | Voice identifier |
| responseFormat | string | 'mp3' | Output format: mp3, opus, aac, flac, wav |
| speed | number | 1.0 | Speech speed multiplier (0.25 to 4.0) |
| organizationId | string | | OpenAI organization ID for billing |
| baseURL | string | | Custom API endpoint (e.g., Azure OpenAI) |
| maxRetries | number | 3 | Retry count for failed requests |
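The API only accepts speed values between 0.25 and 4.0, so it can be worth clamping user-supplied values before building a request. clampSpeed below is a hypothetical helper sketched for illustration, not something the library exports.

```typescript
// Clamp a requested playback speed into OpenAI's accepted 0.25-4.0 range.
// Hypothetical helper for illustration only.
function clampSpeed(speed: number): number {
  if (Number.isNaN(speed)) return 1.0; // fall back to the documented default
  return Math.min(4.0, Math.max(0.25, speed));
}
```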

Available voices

alloy, echo, fable, onyx, nova, shimmer

Each voice has distinct characteristics. Preview them in the OpenAI TTS guide.
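If you let users pick a voice, a union type catches typos at compile time and a guard validates values at runtime. This is a hypothetical sketch, assuming the six voices listed above; the library does not export these names.

```typescript
// The six OpenAI voice identifiers as a literal union type.
// Hypothetical helper for illustration only.
type OpenAIVoice = 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer';

const OPENAI_VOICES: readonly OpenAIVoice[] = [
  'alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer',
];

// Runtime guard for values arriving from user input or config files.
function isOpenAIVoice(value: string): value is OpenAIVoice {
  return (OPENAI_VOICES as readonly string[]).includes(value);
}
```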

Output formats

| Format | Use case |
| --- | --- |
| mp3 | Good compression, wide browser support (default) |
| opus | Best for streaming and low latency |
| aac | Optimized for mobile devices |
| flac | Lossless compression |
| wav | Uncompressed, highest quality |
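When you wrap the returned bytes in a Blob yourself for browser playback, the MIME type should match the chosen format. The mapping below is a hypothetical helper under my own assumptions about typical MIME types (e.g., Opus commonly served in an Ogg container), not part of the library.

```typescript
// Map each response format to a MIME type suitable for wrapping the
// returned audio bytes in a Blob. Hypothetical helper for illustration.
const FORMAT_MIME_TYPES: Record<string, string> = {
  mp3: 'audio/mpeg',
  opus: 'audio/ogg', // assumption: Opus delivered in an Ogg container
  aac: 'audio/aac',
  flac: 'audio/flac',
  wav: 'audio/wav',
};

function mimeTypeFor(format: string): string {
  return FORMAT_MIME_TYPES[format] ?? 'audio/mpeg'; // default to mp3's type
}

// Usage (in a browser):
// const blob = new Blob([bytes], { type: mimeTypeFor('opus') });
// new Audio(URL.createObjectURL(blob)).play();
```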

Complete example

import { CompositeVoice, DeepgramSTT, AnthropicLLM, OpenAITTS } from '@lukeocodes/composite-voice';

const tts = new OpenAITTS({
  proxyUrl: '/api/proxy/openai',
  model: 'tts-1-hd',
  voice: 'shimmer',
  responseFormat: 'opus',
  speed: 1.1,
});

const voice = new CompositeVoice({
  stt: new DeepgramSTT({ proxyUrl: '/api/proxy/deepgram' }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
  }),
  tts,
});

voice.on('tts:start', () => console.log('Speaking...'));
voice.on('tts:end', () => console.log('Done speaking'));

await voice.start();

Model selection

  • tts-1 — Optimized for real-time use. Lower latency, slightly lower audio quality. Best for voice pipelines where responsiveness matters.
  • tts-1-hd — Higher audio fidelity at the cost of increased latency. Best when audio quality is the priority.

Tips

  • OpenAI TTS has a 4096-character limit per request. CompositeVoice handles this automatically in the pipeline, but keep it in mind for standalone use.
  • The opus format gives the best balance of quality and size for browser playback.
  • OpenAITTS is REST-based, not streaming. The full audio Blob is returned after the API processes the entire input. For real-time streaming, consider DeepgramTTS.
  • Use baseURL to point at Azure OpenAI or any API-compatible endpoint. For CompositeVoice proxy routing, use proxyUrl instead.
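For standalone use, where CompositeVoice's automatic handling of the 4096-character limit does not apply, text can be pre-split into request-sized chunks. chunkText below is a hypothetical sketch that prefers sentence boundaries and falls back to hard splits; it is not part of the library.

```typescript
// Split text into chunks under OpenAI's 4096-character per-request limit,
// preferring sentence boundaries, then spaces, then hard splits.
// Hypothetical helper; the CompositeVoice pipeline does this for you.
function chunkText(text: string, limit: number = 4096): string[] {
  const chunks: string[] = [];
  let remaining = text.trim();
  while (remaining.length > limit) {
    let cut = remaining.lastIndexOf('. ', limit - 1);
    if (cut !== -1) {
      cut += 1; // keep the period with its sentence
    } else {
      cut = remaining.lastIndexOf(' ', limit - 1);
    }
    if (cut <= 0) cut = limit; // no usable break point: hard split
    chunks.push(remaining.slice(0, cut).trim());
    remaining = remaining.slice(cut).trim();
  }
  if (remaining.length > 0) chunks.push(remaining);
  return chunks;
}
```

Each chunk can then be sent as its own synthesize() request and the resulting audio played back in order.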


© 2026 CompositeVoice. All rights reserved.
