NativeSTT

Add speech-to-text to your voice pipeline using the browser's built-in Web Speech API, with no API keys required.

Use NativeSTT for prototyping and demos where you need speech recognition without API keys or external dependencies.

Prerequisites

  • A Chromium-based browser (Chrome, Edge) or Safari. Firefox does not support the Web Speech API, and de-Googled forks (Ungoogled Chromium, Brave) will not work; see Tips and gotchas.
  • Microphone access granted by the user.

No API keys or peer dependencies are required.
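Because support varies by browser, it can help to feature-detect before wiring NativeSTT into your pipeline. A minimal sketch; the helper name is ours, not part of the SDK:

```typescript
// Hypothetical helper (not part of the SDK): detect Web Speech API support.
function hasSpeechRecognition(win: any): boolean {
  // Chrome and Edge expose SpeechRecognition; Safari only webkitSpeechRecognition.
  return Boolean(win?.SpeechRecognition || win?.webkitSpeechRecognition);
}

if (typeof window !== 'undefined' && !hasSpeechRecognition(window)) {
  console.warn('Web Speech API unavailable; use a WebSocket-based STT provider instead.');
}
```

When the check fails, you can fall back to one of the server-backed providers listed in Tips and gotchas.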

Basic setup

import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({
    language: 'en-US',
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new NativeTTS(),
});

await agent.start();

NativeSTT manages its own audio capture. The browser’s SpeechRecognition API accesses the microphone directly, so CompositeVoice skips its internal AudioCapture setup.
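To make that concrete, here is a bare-bones sketch of the raw SpeechRecognition API that NativeSTT wraps. This is illustrative browser code, not SDK internals; the `getRecognitionCtor` helper is ours:

```typescript
// Sketch of the raw browser API that NativeSTT wraps (not SDK code).
// Resolving the constructor is the main cross-browser wrinkle:
function getRecognitionCtor(win: any): (new () => any) | undefined {
  return win?.SpeechRecognition ?? win?.webkitSpeechRecognition;
}

const Ctor = typeof window !== 'undefined' ? getRecognitionCtor(window) : undefined;
if (Ctor) {
  const recognition = new Ctor();
  recognition.lang = 'en-US';        // corresponds to NativeSTT's `language` option
  recognition.continuous = true;     // keep listening between utterances
  recognition.interimResults = true; // partial transcripts while speaking

  recognition.onresult = (event: any) => {
    const result = event.results[event.results.length - 1];
    console.log(result.isFinal ? 'final:' : 'interim:', result[0].transcript);
  };

  // start() triggers the browser's own microphone prompt; no getUserMedia here.
  recognition.start();
}
```

Note that `recognition.start()` is enough to begin capturing audio, which is why the SDK's own AudioCapture is unnecessary.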

Configuration options

Option           Type     Default   Description
language         string   'en-US'   BCP 47 language tag (e.g., 'fr-FR', 'es-ES')
continuous       boolean  true      Keep listening after each utterance ends
interimResults   boolean  true      Emit partial transcripts while the user speaks
maxAlternatives  number   1         Number of recognition alternatives per result
startTimeout     number   5000      Milliseconds to wait for the recognition engine to start

See the API reference for the full list.
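For instance, a constructor call overriding every default in the table above (the values chosen here are arbitrary illustrations):

```typescript
import { NativeSTT } from '@lukeocodes/composite-voice';

const stt = new NativeSTT({
  language: 'fr-FR',      // BCP 47 tag for French
  continuous: false,      // stop after the first utterance
  interimResults: false,  // emit only final transcripts
  maxAlternatives: 3,     // up to three candidate transcriptions per result
  startTimeout: 10000,    // wait up to 10 s for the engine to start
});
```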

Complete example

import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({
    language: 'en-US',
    continuous: true,
    interimResults: true,
    maxAlternatives: 1,
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new NativeTTS({ voiceLang: 'en-US' }),
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:final', (event) => {
  console.log('User said:', event.text);
});

agent.on('response:text', (event) => {
  console.log('Assistant:', event.text);
});

await agent.start();

Tips and gotchas

  • Browser support is limited. Chrome and Edge have full support. Safari offers partial support via webkitSpeechRecognition. Firefox does not support the Web Speech API at all.
  • De-Googled browsers will not work. The Web Speech API in Chromium sends audio to Google’s servers for recognition. Privacy-focused forks like Ungoogled Chromium and Brave strip out Google services, so SpeechRecognition will silently fail. If you use one of these browsers, switch to a WebSocket-based provider like DeepgramSTT, AssemblyAISTT, or ElevenLabsSTT.
  • Microphone permission prompt. NativeSTT pre-checks permission via getUserMedia before starting recognition. If the user denies access, connect() throws a ProviderConnectionError.
  • Turn-taking pauses capture. NativeSTT always appears in the SDK’s alwaysPauseCombinations list, so the microphone pauses during TTS playback regardless of your turn-taking strategy.
  • No preflight signals. NativeSTT does not emit preflight/eager end-of-turn events. If you need the eager LLM pipeline, switch to DeepgramSTT.
  • sendAudio() is a no-op. Because the browser manages audio capture, calling sendAudio() on NativeSTT does nothing.
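The permission pre-check described above can be approximated outside the SDK, for example to warn the user before calling agent.start(). A hedged sketch; the helper is ours, and NativeSTT performs its own check regardless:

```typescript
// Hypothetical pre-check mirroring what NativeSTT does internally (not SDK code).
// getUserMedia rejects (typically with a 'NotAllowedError' DOMException) on denial.
async function canUseMicrophone(
  mediaDevices: { getUserMedia(c: { audio: boolean }): Promise<{ getTracks(): { stop(): void }[] }> },
): Promise<boolean> {
  try {
    const stream = await mediaDevices.getUserMedia({ audio: true });
    stream.getTracks().forEach((track) => track.stop()); // release the mic right away
    return true;
  } catch {
    return false;
  }
}

// In the browser: await canUseMicrophone(navigator.mediaDevices)
```

Stopping the acquired tracks immediately matters; otherwise the tab keeps the microphone indicator lit even though recognition has not started.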

© 2026 CompositeVoice. All rights reserved.
