# NativeSTT
Add speech-to-text to your voice pipeline using the browser's built-in Web Speech API -- no API keys required.
Use NativeSTT for prototyping and demos where you need speech recognition without API keys or external dependencies.
## Prerequisites
- A Chromium-based browser (Chrome, Edge) or Safari. Firefox does not support the Web Speech API. De-Googled forks (Ungoogled Chromium, Brave) will not work — see Tips and gotchas.
- Microphone access granted by the user.
No API keys or peer dependencies are required.
## Basic setup

```typescript
import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({
    language: 'en-US',
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    systemPrompt: 'You are a helpful voice assistant. Keep responses brief.',
  }),
  tts: new NativeTTS(),
});

await agent.start();
```
NativeSTT manages its own audio capture. The browser's `SpeechRecognition` API accesses the microphone directly, so CompositeVoice skips its internal `AudioCapture` setup.
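Because Firefox and some forks lack the underlying API, it is worth feature-detecting before constructing NativeSTT. A minimal sketch (not part of the SDK) — the helper name and the `SpeechWindow` shape are illustrative; Safari exposes the prefixed `webkitSpeechRecognition`:

```typescript
// Illustrative helper: check whether the browser exposes the Web Speech
// API before constructing NativeSTT. `w` mirrors the relevant part of
// the global `window` object.
type SpeechWindow = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

export function hasNativeSpeechRecognition(w: SpeechWindow): boolean {
  // Chrome/Edge expose SpeechRecognition; Safari uses the webkit prefix.
  return Boolean(w.SpeechRecognition ?? w.webkitSpeechRecognition);
}
```

In the browser you would call this as `hasNativeSpeechRecognition(window)` and fall back to a WebSocket-based provider when it returns `false`.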
## Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| `language` | string | `'en-US'` | BCP 47 language tag (e.g., `'fr-FR'`, `'es-ES'`) |
| `continuous` | boolean | `true` | Keep listening after each utterance ends |
| `interimResults` | boolean | `true` | Emit partial transcripts while the user speaks |
| `maxAlternatives` | number | `1` | Number of recognition alternatives per result |
| `startTimeout` | number | `5000` | Milliseconds to wait for the recognition engine to start |
See the API reference for the full list.
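The table above can be read as a defaults-merging sketch. The `NativeSTTOptions` interface and `resolveOptions` helper below are local illustrations of how these defaults compose, not the SDK's exported types:

```typescript
// Illustrative mirror of the options table: every field optional,
// with the documented defaults applied when omitted.
interface NativeSTTOptions {
  language?: string;
  continuous?: boolean;
  interimResults?: boolean;
  maxAlternatives?: number;
  startTimeout?: number;
}

const NATIVE_STT_DEFAULTS: Required<NativeSTTOptions> = {
  language: 'en-US',
  continuous: true,
  interimResults: true,
  maxAlternatives: 1,
  startTimeout: 5000,
};

export function resolveOptions(opts: NativeSTTOptions = {}): Required<NativeSTTOptions> {
  // Later spreads win, so caller-supplied fields override the defaults.
  return { ...NATIVE_STT_DEFAULTS, ...opts };
}
```

For example, `resolveOptions({ language: 'fr-FR' })` keeps all defaults except the language tag.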
## Complete example

```typescript
import { CompositeVoice, NativeSTT, AnthropicLLM, NativeTTS } from '@lukeocodes/composite-voice';

const agent = new CompositeVoice({
  stt: new NativeSTT({
    language: 'en-US',
    continuous: true,
    interimResults: true,
    maxAlternatives: 1,
  }),
  llm: new AnthropicLLM({
    proxyUrl: '/api/proxy/anthropic',
    model: 'claude-haiku-4-5',
    maxTokens: 256,
    systemPrompt: 'You are a helpful voice assistant. Keep responses under two sentences.',
  }),
  tts: new NativeTTS({ voiceLang: 'en-US' }),
  logging: { enabled: true, level: 'info' },
});

agent.on('transcription:final', (event) => {
  console.log('User said:', event.text);
});

agent.on('response:text', (event) => {
  console.log('Assistant:', event.text);
});

await agent.start();
```
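With `interimResults` enabled you typically render a live caption that combines finalized utterances with the latest partial text. A small sketch of that merging logic — the `transcription:final` event is shown above, while an interim counterpart feeding `interim` here is an assumption based on the `interimResults` option:

```typescript
// Illustrative caption builder: join finalized transcripts with the
// current interim (partial) transcript, dropping empty fragments.
export function buildCaption(finals: string[], interim: string): string {
  return [...finals, interim]
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join(' ');
}
```

On each final transcript you would push `event.text` onto `finals` and reset `interim`; on each partial you would replace `interim` and re-render.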
## Tips and gotchas
- **Browser support is limited.** Chrome and Edge have full support. Safari offers partial support via `webkitSpeechRecognition`. Firefox does not support the Web Speech API at all.
- **De-Googled browsers will not work.** The Web Speech API in Chromium sends audio to Google's servers for recognition. Privacy-focused forks like Ungoogled Chromium and Brave strip out Google services, so `SpeechRecognition` will silently fail. If you use one of these browsers, switch to a WebSocket-based provider like DeepgramSTT, AssemblyAISTT, or ElevenLabsSTT.
- **Microphone permission prompt.** NativeSTT pre-checks permission via `getUserMedia` before starting recognition. If the user denies access, `connect()` throws a `ProviderConnectionError`.
- **Turn-taking pauses capture.** NativeSTT always appears in the SDK's `alwaysPauseCombinations` list, so the microphone pauses during TTS playback regardless of your turn-taking strategy.
- **No preflight signals.** NativeSTT does not emit preflight/eager end-of-turn events. If you need the eager LLM pipeline, switch to DeepgramSTT.
- **`sendAudio()` is a no-op.** Because the browser manages audio capture, calling `sendAudio()` on NativeSTT does nothing.
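The silent-failure mode in de-Googled forks is why a `startTimeout` exists: the engine accepts the start call but never actually begins recognizing. The guard can be sketched as a generic promise timeout — names here are illustrative, not SDK API:

```typescript
// Illustrative startTimeout guard: reject if the recognition engine
// never signals that it started within the deadline, instead of
// hanging forever.
export function withStartTimeout<T>(started: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`Recognition engine did not start within ${ms} ms`));
    }, ms);
    started.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

Wrapping the "started" signal this way turns a silent hang into a rejection you can surface to the user or use to trigger a provider fallback.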
## Related resources
- Minimal voice agent example — uses NativeSTT with NativeTTS
- Multi-language example — switch languages at runtime
- API reference: NativeSTT
- Providers reference