Provider Matrix
Every provider's products, features, and capabilities at a glance — organized by company.
CompositeVoice supports 11 provider companies across 17 provider classes, plus 4 input/output providers for the 5-role pipeline. This page organizes them by company so you can see everything a single vendor offers.
Audio Input / Output (Pipeline I/O)
These providers handle the input and output roles in the 5-role pipeline. They are not tied to any vendor.
| | MicrophoneInput | BufferInput | BrowserAudioOutput | NullOutput |
|---|---|---|---|---|
| Role | input | input | output | output |
| Environment | Browser | Node/Bun/Deno | Browser | Node/Bun/Deno |
| Peer dependency | None | None | None | None |
| Description | Wraps getUserMedia + AudioContext for browser microphone capture | Accepts pushed ArrayBuffer data for server-side pipelines | Wraps AudioContext for browser speaker playback | Silently discards audio — for server-side pipelines |
MicrophoneInput buffers audio frames in the input queue while the STT WebSocket connects, then flushes them in order — no audio is ever lost. BufferInput does the same for programmatic audio sources.
BrowserAudioOutput handles AudioContext resumption and buffers frames in the output queue during speaker setup. NullOutput discards all audio — use it for server-side pipelines where there are no speakers.
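The buffering behavior described above can be sketched as a small FIFO queue. This is an illustrative reconstruction of the pattern, not SDK code — `FrameQueue`, `push`, and `markReady` are hypothetical names:

```typescript
// Minimal sketch of the input-queue pattern used by MicrophoneInput and
// BufferInput: frames pushed before the STT socket is ready are queued,
// then flushed in arrival order once the connection opens.
class FrameQueue {
  private pending: ArrayBuffer[] = [];
  private ready = false;

  constructor(private send: (frame: ArrayBuffer) => void) {}

  push(frame: ArrayBuffer): void {
    if (this.ready) {
      this.send(frame); // socket open: forward immediately
    } else {
      this.pending.push(frame); // socket connecting: buffer in order
    }
  }

  // Called when the STT WebSocket opens; drains the queue in FIFO order.
  markReady(): void {
    this.ready = true;
    for (const frame of this.pending) this.send(frame);
    this.pending = [];
  }
}
```

The same queue shape covers the output side: BrowserAudioOutput buffers frames until speaker setup completes, then drains them in order.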
Multi-role providers like NativeSTT (input + stt) and NativeTTS (tts + output) cover multiple pipeline roles. When using them, you do not need separate input or output providers.
Deepgram
| | STT (V1) | STT (V2) | TTS |
|---|---|---|---|
| Class | DeepgramSTT | DeepgramFlux | DeepgramTTS |
| Transport | WebSocket | WebSocket | WebSocket |
| Streaming | Yes | Yes | Yes |
| Peer dependency | None | None | None |
| Proxy support | Yes | Yes | Yes |
| Browser support | All modern browsers | All modern browsers | All modern browsers |
| Default model | nova-3 | flux-general-en | aura-2-thalia-en |
DeepgramSTT (V1/Nova) features: Interim results, smart formatting, auto-punctuation, speaker diarization, entity detection, keyword boosting, profanity filter, redaction (PCI/SSN), numerals conversion, VAD events, word-level timestamps, configurable endpointing, utterance buffering, multichannel transcription. Models: nova-3 (recommended), nova-3-medical, nova-2 (+ domain variants), nova (legacy).
DeepgramFlux (V2/Flux) features: Turn-based conversation model, eager end-of-turn detection (configurable thresholds: eot_threshold 0.5–0.9, eager_eot_threshold 0.3–0.9), end-of-turn timeout (eot_timeout_ms), keyterm support, word confidence scores. Events: StartOfTurn, EagerEndOfTurn, TurnResumed, EndOfTurn, Update. Models: flux-general-en. Only provider that supports the eager LLM pipeline.
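The documented threshold ranges can be validated before opening a connection. A sketch under stated assumptions — `FluxOptions` and `validateFluxOptions` are hypothetical names, only the numeric ranges come from the documentation above:

```typescript
// Illustrative check of DeepgramFlux's documented parameter ranges:
// eot_threshold in [0.5, 0.9], eager_eot_threshold in [0.3, 0.9].
interface FluxOptions {
  eotThreshold?: number;      // maps to eot_threshold
  eagerEotThreshold?: number; // maps to eager_eot_threshold
  eotTimeoutMs?: number;      // maps to eot_timeout_ms
}

function validateFluxOptions(opts: FluxOptions): string[] {
  const errors: string[] = [];
  const { eotThreshold, eagerEotThreshold, eotTimeoutMs } = opts;
  if (eotThreshold !== undefined && (eotThreshold < 0.5 || eotThreshold > 0.9)) {
    errors.push("eot_threshold must be between 0.5 and 0.9");
  }
  if (
    eagerEotThreshold !== undefined &&
    (eagerEotThreshold < 0.3 || eagerEotThreshold > 0.9)
  ) {
    errors.push("eager_eot_threshold must be between 0.3 and 0.9");
  }
  if (eotTimeoutMs !== undefined && eotTimeoutMs <= 0) {
    errors.push("eot_timeout_ms must be positive");
  }
  return errors;
}
```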
TTS features: Real-time streaming synthesis, linear16/mulaw/alaw encoding, configurable sample rate (8–48 kHz), word-level timing metadata.
TTS models: Aura 2 (recommended — 40 English voices + 10 Spanish voices), Aura 1 (legacy — 12 English voices).
Guides: DeepgramSTT · DeepgramFlux · DeepgramTTS · Examples: 20, 21, 22, 23, 24
Anthropic
| | LLM |
|---|---|
| Class | AnthropicLLM |
| Transport | HTTP streaming (SSE) |
| Streaming | Yes |
| Peer dependency | @anthropic-ai/sdk >=0.67.0 |
| Proxy support | Yes |
| Browser support | All modern browsers |
| Default model | claude-haiku-4-5 |
LLM features: Streaming via SSE, system prompts extracted to top-level system parameter (Anthropic API convention), maxTokens required (default 1024), AbortSignal cancellation for the eager pipeline, temperature and topP controls.
Models: claude-haiku-4-5 (fastest), claude-sonnet-4-6 (balanced), claude-opus-4-6 (most capable).
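The system-prompt extraction mentioned above can be sketched as a plain transform. `ChatMessage` and `toAnthropicRequest` are illustrative names; the shape of the result follows the Anthropic Messages API convention (top-level `system`, required `max_tokens`):

```typescript
// Sketch of hoisting "system" role messages into the top-level `system`
// field, leaving only user/assistant turns in `messages`.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function toAnthropicRequest(messages: ChatMessage[], maxTokens = 1024) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  const turns = messages.filter((m) => m.role !== "system");
  return {
    system: system || undefined, // omitted when no system prompt is present
    messages: turns,
    max_tokens: maxTokens, // required by the Anthropic Messages API
  };
}
```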
Guides: AnthropicLLM · Examples: 00, 30, 31
OpenAI
| | LLM | TTS |
|---|---|---|
| Class | OpenAILLM | OpenAITTS |
| Transport | HTTP streaming | HTTP (REST) |
| Streaming | Yes | No (batch synthesis) |
| Peer dependency | openai >=6.5.0 | openai >=6.5.0 |
| Proxy support | Yes | Yes |
| Browser support | All modern browsers | All modern browsers |
| Default model | (required) | tts-1 |
LLM features: GPT model family, streaming token generation, organizationId for multi-org accounts, temperature/topP/maxTokens controls.
LLM models: gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-3.5-turbo.
TTS features: 6 voices (alloy, echo, fable, onyx, nova, shimmer), quality/speed tradeoff via model selection (tts-1 fast, tts-1-hd quality), 5 output formats (mp3, opus, aac, flac, wav), speed control (0.25–4.0x), 4096 character limit per request, endpoint for Azure OpenAI compatibility.
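Longer texts have to be split to respect the 4096-character request limit. A minimal sketch — `chunkForTTS` is a hypothetical helper, not part of the SDK — that prefers breaking at the last space inside each window:

```typescript
// Splits long text into pieces under OpenAI TTS's 4096-character request
// limit, preferring to break at the last space inside each window.
function chunkForTTS(text: string, limit = 4096): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    const window = rest.slice(0, limit);
    const cut = window.lastIndexOf(" ");
    const at = cut > 0 ? cut : limit; // no space found: hard split
    chunks.push(rest.slice(0, at).trimEnd());
    rest = rest.slice(at).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```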
Guides: OpenAILLM · OpenAITTS · Examples: 40, 41, 42
Groq
| | LLM |
|---|---|
| Class | GroqLLM |
| Transport | HTTP streaming |
| Streaming | Yes |
| Peer dependency | openai >=6.5.0 |
| Proxy support | Yes |
| Browser support | All modern browsers |
| Default model | llama-3.3-70b-versatile |
LLM features: Ultra-fast LPU-based inference (lowest latency of any cloud LLM), OpenAI-compatible API, groqApiKey convenience alias, wide range of open-source models.
Models: llama-3.3-70b-versatile, mixtral-8x7b-32768, gemma2-9b-it, llama-3.1-8b-instant.
Guides: GroqLLM · Examples: 60
Google Gemini
| | LLM |
|---|---|
| Class | GeminiLLM |
| Transport | HTTP streaming |
| Streaming | Yes |
| Peer dependency | openai >=6.5.0 |
| Proxy support | Yes |
| Browser support | All modern browsers |
| Default model | gemini-2.0-flash |
LLM features: OpenAI-compatible endpoint, geminiApiKey convenience alias, auto-configured base URL (generativelanguage.googleapis.com/v1beta/openai).
Models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash.
Guides: GeminiLLM · Examples: 100
Mistral
| | LLM |
|---|---|
| Class | MistralLLM |
| Transport | HTTP streaming |
| Streaming | Yes |
| Peer dependency | openai >=6.5.0 |
| Proxy support | Yes |
| Browser support | All modern browsers |
| Default model | mistral-small-latest |
LLM features: Strong multilingual support, OpenAI-compatible API, mistralApiKey convenience alias.
Models: mistral-small-latest, mistral-medium-latest, mistral-large-latest.
Guides: MistralLLM · Examples: 110
AssemblyAI
| | STT |
|---|---|
| Class | AssemblyAISTT |
| Transport | WebSocket |
| Streaming | Yes |
| Peer dependency | None |
| Proxy support | Yes |
| Browser support | All modern browsers |
| Default model | (default real-time model) |
STT features: Interim results, word boosting for domain vocabulary, word-level timestamps and confidence, automatic reconnection with exponential backoff, base64-encoded audio, graceful terminate_session on disconnect, configurable sample rate.
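The exponential-backoff reconnection mentioned above follows a standard shape: the delay doubles per attempt up to a cap. The constants below are illustrative, not the SDK's actual values:

```typescript
// Exponential backoff with a cap: delay doubles each attempt starting
// from baseMs, never exceeding maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Production implementations usually add jitter so many clients reconnecting at once do not hit the server in lockstep.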
Guides: AssemblyAISTT · Examples: 70
ElevenLabs
| | STT | TTS |
|---|---|---|
| Class | ElevenLabsSTT | ElevenLabsTTS |
| Transport | WebSocket | WebSocket |
| Streaming | Yes | Yes |
| Peer dependency | None | None |
| Proxy support | Yes | Yes |
| Browser support | All modern browsers | All modern browsers |
| Default model | scribe_v2_realtime | eleven_turbo_v2_5 |
STT features: Scribe V2 Realtime (~150ms latency), 90+ languages with auto-detection, VAD and manual commit strategies, interim results (partial transcripts), word-level timestamps and confidence, base64-encoded audio, three auth methods (API key, proxy, single-use token), BCP 47 / ISO 639-1 / ISO 639-3 language code auto-mapping, configurable VAD sensitivity, previousText context, zero-retention mode.
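The language-code auto-mapping can be pictured as a small normalizer: BCP 47 tags and ISO 639-3 codes are reduced to the two-letter ISO 639-1 form. The mapping table below is a tiny illustrative subset, and `normalizeLanguageCode` is a hypothetical name:

```typescript
// Sketch of language-code normalization: BCP 47 tags ("en-US") and a few
// ISO 639-3 codes ("eng") are mapped down to an ISO 639-1 code.
const ISO_639_3_TO_1: Record<string, string> = {
  eng: "en",
  spa: "es",
  deu: "de",
  fra: "fr",
};

function normalizeLanguageCode(code: string): string {
  const base = code.toLowerCase().split("-")[0]; // "en-US" -> "en"
  return ISO_639_3_TO_1[base] ?? base;           // "eng"   -> "en"
}
```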
TTS features: Voice cloning controls (stability 0–1, similarityBoost 0–1), BOS/EOS stream-input protocol, word-level alignment, 6 output formats (pcm_16000, pcm_22050, pcm_24000, pcm_44100, mp3_44100_128, ulaw_8000), multilingual models.
TTS models: eleven_turbo_v2_5 (fast), eleven_turbo_v2, eleven_multilingual_v2, eleven_monolingual_v1.
Guides: ElevenLabsSTT · ElevenLabsTTS · Examples: 80, 81
Cartesia
| | TTS |
|---|---|
| Class | CartesiaTTS |
| Transport | WebSocket |
| Streaming | Yes |
| Peer dependency | None |
| Proxy support | Yes |
| Browser support | All modern browsers |
| Default model | sonic-2 |
TTS features: Ultra-low-latency streaming, context-based streaming (context_id + continue flag preserves prosody across chunks), emotion controls (emotion_name:intensity tags), speed multiplier, 4 PCM encodings (s16le, f32le, mulaw, alaw), word-level timestamps, configurable sample rate.
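Context-based streaming can be sketched as message construction: every chunk of one utterance shares a `context_id`, and `continue` stays true until the final message. The exact field names and `toContextMessages` helper are illustrative assumptions, not the wire protocol verbatim:

```typescript
// Illustrative shape of context-based streaming: chunks of one utterance
// share a context_id so the service can preserve prosody across them;
// `continue` is false only on the final chunk.
interface CartesiaChunkMessage {
  context_id: string;
  transcript: string;
  continue: boolean;
}

function toContextMessages(
  contextId: string,
  chunks: string[],
): CartesiaChunkMessage[] {
  return chunks.map((transcript, i) => ({
    context_id: contextId,
    transcript,
    continue: i < chunks.length - 1, // false only on the last chunk
  }));
}
```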
Models: sonic-2 (latest, lowest latency), sonic, sonic-multilingual.
Guides: CartesiaTTS · Examples: 90
Browser Built-ins
| | STT | TTS |
|---|---|---|
| Class | NativeSTT | NativeTTS |
| Transport | Web Speech API | SpeechSynthesis API |
| Streaming | Yes (interim results) | No (managed playback) |
| Peer dependency | None | None |
| Proxy support | No (no API key needed) | No (no API key needed) |
| Browser support | Chrome, Edge (full); Safari (limited) | All modern browsers |
| Default model | Browser default | OS default voice |
STT features: Zero dependencies, works offline, 50+ languages via browser, continuous mode, interim results, maxAlternatives, startTimeout, managed audio (browser controls the microphone directly).
TTS features: Zero dependencies, works offline, voice enumeration via getAvailableVoices(), voice selection by name/language, rate/pitch/volume controls, pause/resume/cancel playback, runtime voice switching with setVoice(), managed audio (browser plays directly).
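Voice selection by name or language boils down to matching against the browser's voice list. A sketch — `VoiceLike` and `pickVoice` are hypothetical names; the minimal structural type stands in for the browser's `SpeechSynthesisVoice` so the logic also runs outside a browser:

```typescript
// Picks a voice by exact name first, then by exact language tag,
// then by language prefix ("en" matches "en-GB").
interface VoiceLike {
  name: string;
  lang: string;
}

function pickVoice(
  voices: VoiceLike[],
  query: { name?: string; lang?: string },
): VoiceLike | undefined {
  if (query.name) {
    const byName = voices.find((v) => v.name === query.name);
    if (byName) return byName;
  }
  if (query.lang) {
    const prefix = query.lang.split("-")[0];
    return (
      voices.find((v) => v.lang === query.lang) ??
      voices.find((v) => v.lang.startsWith(prefix))
    );
  }
  return undefined;
}
```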
Limitations: NativeSTT requires a Chromium-based browser for full support (no Firefox; limited Safari support). Both use managed audio — the SDK cannot access raw audio streams. No preflight signals. Best for prototyping.
Guides: NativeSTT · NativeTTS · Examples: 00
WebLLM (MLC AI)
| | LLM |
|---|---|
| Class | WebLLMLLM |
| Transport | WebGPU (in-browser) |
| Streaming | Yes |
| Peer dependency | @mlc-ai/web-llm >=0.2.74 |
| Proxy support | No (runs locally) |
| Browser support | Chrome 113+, Edge 113+ (WebGPU required) |
| Default model | (required — no default) |
LLM features: Fully offline after initial model download, all data stays in the browser, onLoadProgress callback for download UI, chatOpts for engine tuning, engine.interruptGenerate() abort support, no API keys needed.
Example models: Llama-3.2-1B-Instruct-q4f16_1-MLC (~500 MB), Phi-2-q4f16_1-MLC (~1.5 GB).
Guides: WebLLMLLM · Examples: 50
Feature comparison at a glance
| Capability | Providers that support it |
|---|---|
| WebSocket streaming | DeepgramSTT, DeepgramFlux, DeepgramTTS, AssemblyAISTT, ElevenLabsSTT, ElevenLabsTTS, CartesiaTTS |
| Preflight / eager LLM | DeepgramFlux |
| Server proxy | All except NativeSTT, NativeTTS, WebLLMLLM |
| No API key needed | NativeSTT, NativeTTS, WebLLMLLM |
| No peer dependency | NativeSTT, NativeTTS, DeepgramSTT, DeepgramFlux, DeepgramTTS, AssemblyAISTT, ElevenLabsSTT, ElevenLabsTTS, CartesiaTTS |
| Managed audio | NativeSTT, NativeTTS |
| Voice cloning controls | ElevenLabsTTS |
| Emotion controls | CartesiaTTS |
| Word boosting | DeepgramSTT, AssemblyAISTT |
| Keyterm boosting | DeepgramFlux |
| Offline capable | NativeSTT, NativeTTS, WebLLMLLM |
| Speaker diarization | DeepgramSTT |
| Word-level timestamps | DeepgramSTT, DeepgramFlux, AssemblyAISTT, ElevenLabsSTT, DeepgramTTS, CartesiaTTS |
| Language auto-detection | ElevenLabsSTT |
| VAD commit strategy | ElevenLabsSTT |