
Provider Matrix

Every provider's products, features, and capabilities at a glance — organized by company.

CompositeVoice supports 11 provider companies across 17 provider classes, plus 4 input/output providers for the 5-role pipeline. This page organizes them by company so you can see everything a single vendor offers.

Audio Input / Output (Pipeline I/O)

These providers handle the input and output roles in the 5-role pipeline. They are not tied to any vendor.

|                 | MicrophoneInput | BufferInput | BrowserAudioOutput | NullOutput |
|---|---|---|---|---|
| Role            | input | input | output | output |
| Environment     | Browser | Node/Bun/Deno | Browser | Node/Bun/Deno |
| Peer dependency | None | None | None | None |
| Description     | Wraps getUserMedia + AudioContext for browser microphone capture | Accepts pushed ArrayBuffer data for server-side pipelines | Wraps AudioContext for browser speaker playback | Silently discards audio — for server-side pipelines |

MicrophoneInput buffers audio frames in the input queue while the STT WebSocket connects, then flushes them in order — no audio is ever lost. BufferInput does the same for programmatic audio sources.

BrowserAudioOutput handles AudioContext resumption and buffers frames in the output queue during speaker setup. NullOutput discards all audio — use it for server-side pipelines where there are no speakers.
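The buffer-then-flush behavior described above can be sketched as a small FIFO. This is illustrative only: the class and method names (`FrameQueue`, `push`, `attach`) are assumptions, not the SDK's actual API.

```typescript
// Illustrative sketch of buffer-then-flush input queueing: frames captured
// while the connection is still opening are held in order, then drained the
// moment a sink (e.g. an open STT WebSocket) attaches.
type Sink = (frame: ArrayBuffer) => void;

class FrameQueue {
  private pending: ArrayBuffer[] = [];
  private sink: Sink | null = null;

  // Called for every captured audio frame, from connect time onward.
  push(frame: ArrayBuffer): void {
    if (this.sink) {
      this.sink(frame); // connection ready: forward immediately
    } else {
      this.pending.push(frame); // still connecting: buffer in order
    }
  }

  // Called once the connection is open; flushes buffered frames in order.
  attach(sink: Sink): void {
    this.sink = sink;
    for (const frame of this.pending) sink(frame);
    this.pending = [];
  }
}
```

The same pattern works in reverse for output: BrowserAudioOutput holds synthesized frames until the speaker path is ready.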

Multi-role providers like NativeSTT (input+stt) and NativeTTS (tts+output) cover multiple pipeline roles. When using them, you do not need separate input or output providers.


Deepgram

|                 | STT (V1) | STT (V2) | TTS |
|---|---|---|---|
| Class           | DeepgramSTT | DeepgramFlux | DeepgramTTS |
| Transport       | WebSocket | WebSocket | WebSocket |
| Streaming       | Yes | Yes | Yes |
| Peer dependency | None | None | None |
| Proxy support   | Yes | Yes | Yes |
| Browser support | All modern browsers | All modern browsers | All modern browsers |
| Default model   | nova-3 | flux-general-en | aura-2-thalia-en |

DeepgramSTT (V1/Nova) features: Interim results, smart formatting, auto-punctuation, speaker diarization, entity detection, keyword boosting, profanity filter, redaction (PCI/SSN), numerals conversion, VAD events, word-level timestamps, configurable endpointing, utterance buffering, multichannel transcription. Models: nova-3 (recommended), nova-3-medical, nova-2 (+ domain variants), nova (legacy).

DeepgramFlux (V2/Flux) features: Turn-based conversation model, eager end-of-turn detection (configurable thresholds: eot_threshold 0.5–0.9, eager_eot_threshold 0.3–0.9), end-of-turn timeout (eot_timeout_ms), keyterm support, word confidence scores. Events: StartOfTurn, EagerEndOfTurn, TurnResumed, EndOfTurn, Update. Models: flux-general-en. Only provider that supports the eager LLM pipeline.
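The relationship between the two thresholds can be sketched as a pure function. This is illustrative only: the real decision is made server-side by Flux, and the function name and default values are assumptions.

```typescript
// Illustrative mapping of an end-of-turn confidence score to a Flux-style
// event, given the two configurable thresholds. A score past the eager
// threshold lets the LLM start early; past the full threshold ends the turn.
type TurnEvent = "EndOfTurn" | "EagerEndOfTurn" | "Update";

function classifyTurn(
  confidence: number,
  eotThreshold = 0.7, // eot_threshold, configurable 0.5-0.9
  eagerEotThreshold = 0.5, // eager_eot_threshold, configurable 0.3-0.9
): TurnEvent {
  if (confidence >= eotThreshold) return "EndOfTurn";
  if (confidence >= eagerEotThreshold) return "EagerEndOfTurn"; // start LLM early
  return "Update"; // keep listening
}
```

If speech resumes after an EagerEndOfTurn, Flux emits TurnResumed and the eager LLM request is cancelled.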

TTS features: Real-time streaming synthesis, linear16/mulaw/alaw encoding, configurable sample rate (8–48 kHz), word-level timing metadata.

TTS models: Aura 2 (recommended — 40 English voices + 10 Spanish voices), Aura 1 (legacy — 12 English voices).

Guides: DeepgramSTT · DeepgramFlux · DeepgramTTS · Examples: 20, 21, 22, 23, 24


Anthropic

|                 | LLM |
|---|---|
| Class           | AnthropicLLM |
| Transport       | HTTP streaming (SSE) |
| Streaming       | Yes |
| Peer dependency | @anthropic-ai/sdk >=0.67.0 |
| Proxy support   | Yes |
| Browser support | All modern browsers |
| Default model   | claude-haiku-4-5 |

LLM features: Streaming via SSE, system prompts extracted to top-level system parameter (Anthropic API convention), maxTokens required (default 1024), AbortSignal cancellation for the eager pipeline, temperature and topP controls.
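The system-prompt extraction can be sketched as a pure transform. This is illustrative: the message shape follows the common role/content convention, and the function name is an assumption, but the split itself matches the Anthropic API requirement that system prompts live in a top-level `system` parameter rather than the messages array.

```typescript
// Illustrative: split chat history into Anthropic's top-level `system`
// string plus the remaining user/assistant messages.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function toAnthropicRequest(messages: ChatMessage[]) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  const rest = messages.filter((m) => m.role !== "system");
  return { system, messages: rest, max_tokens: 1024 }; // max_tokens is required
}
```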

Models: claude-haiku-4-5 (fastest), claude-sonnet-4-6 (balanced), claude-opus-4-6 (most capable).

Guides: AnthropicLLM · Examples: 00, 30, 31


OpenAI

|                 | LLM | TTS |
|---|---|---|
| Class           | OpenAILLM | OpenAITTS |
| Transport       | HTTP streaming | HTTP (REST) |
| Streaming       | Yes | No (batch synthesis) |
| Peer dependency | openai >=6.5.0 | openai >=6.5.0 |
| Proxy support   | Yes | Yes |
| Browser support | All modern browsers | All modern browsers |
| Default model   | (required) | tts-1 |

LLM features: GPT model family, streaming token generation, organizationId for multi-org accounts, temperature/topP/maxTokens controls.

LLM models: gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-3.5-turbo.

TTS features: 6 voices (alloy, echo, fable, onyx, nova, shimmer), quality/speed tradeoff via model selection (tts-1 fast, tts-1-hd quality), 5 output formats (mp3, opus, aac, flac, wav), speed control (0.25–4.0x), 4096 character limit per request, endpoint for Azure OpenAI compatibility.
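The 4096-character limit means long responses must be split across requests. A minimal splitter might look like the following; this is an illustrative sketch, not how the SDK necessarily handles it.

```typescript
// Illustrative: split text into chunks of at most `limit` characters,
// preferring to break at sentence boundaries so each TTS request
// sounds natural on its own.
function splitForTTS(text: string, limit = 4096): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    // Find the last sentence end within the limit; fall back to a hard cut.
    const slice = rest.slice(0, limit);
    const cut = Math.max(
      slice.lastIndexOf(". "),
      slice.lastIndexOf("! "),
      slice.lastIndexOf("? "),
    );
    const end = cut > 0 ? cut + 1 : limit;
    chunks.push(rest.slice(0, end).trim());
    rest = rest.slice(end).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```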

Guides: OpenAILLM · OpenAITTS · Examples: 40, 41, 42


Groq

|                 | LLM |
|---|---|
| Class           | GroqLLM |
| Transport       | HTTP streaming |
| Streaming       | Yes |
| Peer dependency | openai >=6.5.0 |
| Proxy support   | Yes |
| Browser support | All modern browsers |
| Default model   | llama-3.3-70b-versatile |

LLM features: Ultra-fast LPU-based inference (among the lowest-latency hosted LLM options), OpenAI-compatible API, groqApiKey convenience alias, wide range of open-source models.

Models: llama-3.3-70b-versatile, mixtral-8x7b-32768, gemma2-9b-it, llama-3.1-8b-instant.
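The OpenAI-compatible pattern can be sketched with the standard `openai` client pointed at Groq's documented endpoint. This is a configuration sketch, not CompositeVoice's internal wiring; the environment variable name is an assumption.

```typescript
// Illustrative: Groq exposes an OpenAI-compatible API, so the standard
// `openai` client works by overriding baseURL.
import OpenAI from "openai";

const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY, // a Groq key, not an OpenAI key
  baseURL: "https://api.groq.com/openai/v1",
});

const stream = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Say hello." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

The same baseURL-override pattern is what GeminiLLM and MistralLLM rely on, which is why all three share the `openai` peer dependency.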

Guides: GroqLLM · Examples: 60


Google Gemini

|                 | LLM |
|---|---|
| Class           | GeminiLLM |
| Transport       | HTTP streaming |
| Streaming       | Yes |
| Peer dependency | openai >=6.5.0 |
| Proxy support   | Yes |
| Browser support | All modern browsers |
| Default model   | gemini-2.0-flash |

LLM features: OpenAI-compatible endpoint, geminiApiKey convenience alias, auto-configured base URL (generativelanguage.googleapis.com/v1beta/openai).

Models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash.

Guides: GeminiLLM · Examples: 100


Mistral

|                 | LLM |
|---|---|
| Class           | MistralLLM |
| Transport       | HTTP streaming |
| Streaming       | Yes |
| Peer dependency | openai >=6.5.0 |
| Proxy support   | Yes |
| Browser support | All modern browsers |
| Default model   | mistral-small-latest |

LLM features: Strong multilingual support, OpenAI-compatible API, mistralApiKey convenience alias.

Models: mistral-small-latest, mistral-medium-latest, mistral-large-latest.

Guides: MistralLLM · Examples: 110


AssemblyAI

|                 | STT |
|---|---|
| Class           | AssemblyAISTT |
| Transport       | WebSocket |
| Streaming       | Yes |
| Peer dependency | None |
| Proxy support   | Yes |
| Browser support | All modern browsers |
| Default model   | (default real-time model) |

STT features: Interim results, word boosting for domain vocabulary, word-level timestamps and confidence, automatic reconnection with exponential backoff, base64-encoded audio, graceful terminate_session on disconnect, configurable sample rate.
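Exponential backoff reconnection of the kind described can be sketched as a delay schedule. The base delay and cap below are illustrative assumptions, not the provider's actual values.

```typescript
// Illustrative: delay before the nth reconnection attempt, doubling from a
// base delay and capped at a maximum, as in typical exponential backoff.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

In practice a jitter term is often added so that many clients reconnecting at once do not retry in lockstep.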

Guides: AssemblyAISTT · Examples: 70


ElevenLabs

|                 | STT | TTS |
|---|---|---|
| Class           | ElevenLabsSTT | ElevenLabsTTS |
| Transport       | WebSocket | WebSocket |
| Streaming       | Yes | Yes |
| Peer dependency | None | None |
| Proxy support   | Yes | Yes |
| Browser support | All modern browsers | All modern browsers |
| Default model   | scribe_v2_realtime | eleven_turbo_v2_5 |

STT features: Scribe V2 Realtime (~150ms latency), 90+ languages with auto-detection, VAD and manual commit strategies, interim results (partial transcripts), word-level timestamps and confidence, base64-encoded audio, three auth methods (API key, proxy, single-use token), BCP 47 / ISO 639-1 / ISO 639-3 language code auto-mapping, configurable VAD sensitivity, previousText context, zero-retention mode.
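The language-code auto-mapping can be sketched as a small normalizer. This is illustrative only: the actual mapping table and the target format the provider expects are assumptions.

```typescript
// Illustrative: normalize BCP 47 tags (e.g. "en-US") and a few ISO 639-3
// codes to the two-letter ISO 639-1 codes a provider might expect.
const ISO6393_TO_6391: Record<string, string> = {
  eng: "en",
  spa: "es",
  deu: "de",
  fra: "fr",
};

function normalizeLanguage(tag: string): string {
  const primary = tag.toLowerCase().split("-")[0]; // "en-US" -> "en"
  return ISO6393_TO_6391[primary] ?? primary; // "eng" -> "en"
}
```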

TTS features: Voice cloning controls (stability 0–1, similarityBoost 0–1), BOS/EOS stream-input protocol, word-level alignment, 6 output formats (pcm_16000, pcm_22050, pcm_24000, pcm_44100, mp3_44100_128, ulaw_8000), multilingual models.

TTS models: eleven_turbo_v2_5 (fast), eleven_turbo_v2, eleven_multilingual_v2, eleven_monolingual_v1.

Guides: ElevenLabsSTT · ElevenLabsTTS · Examples: 80, 81


Cartesia

|                 | TTS |
|---|---|
| Class           | CartesiaTTS |
| Transport       | WebSocket |
| Streaming       | Yes |
| Peer dependency | None |
| Proxy support   | Yes |
| Browser support | All modern browsers |
| Default model   | sonic-2 |

TTS features: Ultra-low-latency streaming, context-based streaming (context_id + continue flag preserves prosody across chunks), emotion controls (emotion_name:intensity tags), speed multiplier, 4 PCM encodings (s16le, f32le, mulaw, alaw), word-level timestamps, configurable sample rate.
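The context-based streaming protocol can be sketched as a message builder. The `context_id` and `continue` fields follow the convention described above; the function name and overall message shape are illustrative assumptions.

```typescript
// Illustrative: build the sequence of generation requests for one utterance.
// Reusing the same context_id with continue: true lets the service carry
// prosody across chunks; the final message closes the context.
function buildContextMessages(contextId: string, chunks: string[]) {
  return chunks.map((transcript, i) => ({
    context_id: contextId,
    transcript,
    continue: i < chunks.length - 1, // true for all but the last chunk
  }));
}
```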

Models: sonic-2 (latest, lowest latency), sonic, sonic-multilingual.

Guides: CartesiaTTS · Examples: 90


Browser Built-ins

|                 | STT | TTS |
|---|---|---|
| Class           | NativeSTT | NativeTTS |
| Transport       | Web Speech API | SpeechSynthesis API |
| Streaming       | Yes (interim results) | No (managed playback) |
| Peer dependency | None | None |
| Proxy support   | No (no API key needed) | No (no API key needed) |
| Browser support | Chrome, Edge (full); Safari (limited) | All modern browsers |
| Default model   | Browser default | OS default voice |

STT features: Zero dependencies, works offline, 50+ languages via browser, continuous mode, interim results, maxAlternatives, startTimeout, managed audio (browser controls the microphone directly).

TTS features: Zero dependencies, works offline, voice enumeration via getAvailableVoices(), voice selection by name/language, rate/pitch/volume controls, pause/resume/cancel playback, runtime voice switching with setVoice(), managed audio (browser plays directly).

Limitations: NativeSTT needs a Chromium-based browser for full support (Firefox does not implement SpeechRecognition; Safari support is limited). Both use managed audio — the SDK cannot access raw audio streams. No preflight signals. Best for prototyping.

Guides: NativeSTT · NativeTTS · Examples: 00


WebLLM (MLC AI)

|                 | LLM |
|---|---|
| Class           | WebLLMLLM |
| Transport       | WebGPU (in-browser) |
| Streaming       | Yes |
| Peer dependency | @mlc-ai/web-llm >=0.2.74 |
| Proxy support   | No (runs locally) |
| Browser support | Chrome 113+, Edge 113+ (WebGPU required) |
| Default model   | (required — no default) |

LLM features: Fully offline after initial model download, all data stays in the browser, onLoadProgress callback for download UI, chatOpts for engine tuning, engine.interruptGenerate() abort support, no API keys needed.

Example models: Llama-3.2-1B-Instruct-q4f16_1-MLC (~500 MB), Phi-2-q4f16_1-MLC (~1.5 GB).

Guides: WebLLMLLM · Examples: 50


Feature comparison at a glance

| Capability | Providers that support it |
|---|---|
| WebSocket streaming | DeepgramSTT, DeepgramFlux, DeepgramTTS, AssemblyAISTT, ElevenLabsSTT, ElevenLabsTTS, CartesiaTTS |
| Preflight / eager LLM | DeepgramFlux |
| Server proxy | All except NativeSTT, NativeTTS, WebLLMLLM |
| No API key needed | NativeSTT, NativeTTS, WebLLMLLM |
| No peer dependency | NativeSTT, NativeTTS, DeepgramSTT, DeepgramFlux, DeepgramTTS, AssemblyAISTT, ElevenLabsSTT, ElevenLabsTTS, CartesiaTTS |
| Managed audio | NativeSTT, NativeTTS |
| Voice cloning controls | ElevenLabsTTS |
| Emotion controls | CartesiaTTS |
| Word boosting | DeepgramSTT, AssemblyAISTT |
| Keyterm boosting | DeepgramFlux |
| Offline capable | NativeSTT, NativeTTS, WebLLMLLM |
| Speaker diarization | DeepgramSTT |
| Word-level timestamps | DeepgramSTT, DeepgramFlux, AssemblyAISTT, ElevenLabsSTT, DeepgramTTS, CartesiaTTS |
| Language auto-detection | ElevenLabsSTT |
| VAD commit strategy | ElevenLabsSTT |

© 2026 CompositeVoice. All rights reserved.
