AudioInputConfig

Configuration for audio input (microphone capture).

Defined in: src/core/types/audio.ts:87

Remarks

Controls how the SDK captures audio from the user’s microphone. These settings are passed to the browser’s getUserMedia API and the internal audio processing pipeline. The configuration affects audio quality, bandwidth, and compatibility with your chosen STT provider.
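
As a rough illustration of how these settings could translate into a getUserMedia request, the sketch below maps the documented fields onto standard audio track constraints. The helper name toTrackConstraints and both local interfaces are hypothetical and declared here only so the example stands alone; the SDK's internal mapping is not shown on this page and may differ.

```typescript
// Local mirror of the documented AudioInputConfig fields (not imported from the SDK).
interface AudioInputSketch {
  sampleRate: number;
  format: string; // the SDK types this as AudioFormat; plain string is used here for simplicity
  channels?: number;
  chunkDuration?: number;
  echoCancellation?: boolean;
  noiseSuppression?: boolean;
  autoGainControl?: boolean;
}

// Subset of the standard MediaTrackConstraints dictionary, declared locally
// so the sketch compiles without DOM typings.
interface AudioTrackConstraintsSketch {
  sampleRate?: number;
  channelCount?: number;
  echoCancellation?: boolean;
  noiseSuppression?: boolean;
  autoGainControl?: boolean;
}

// Hypothetical helper: picks the capture-related fields and falls back to the
// default values listed in the Properties table when a field is omitted.
function toTrackConstraints(input: AudioInputSketch): AudioTrackConstraintsSketch {
  return {
    sampleRate: input.sampleRate,
    channelCount: input.channels ?? 1,
    echoCancellation: input.echoCancellation ?? true,
    noiseSuppression: input.noiseSuppression ?? true,
    autoGainControl: input.autoGainControl ?? true,
  };
}

const constraints = toTrackConstraints({ sampleRate: 16000, format: 'pcm', channels: 1 });
// In a browser, this object would be passed as:
//   navigator.mediaDevices.getUserMedia({ audio: constraints })
```

Browsers treat bare constraint values as preferences, so the settings actually applied to the captured track can differ from what was requested. Fields such as format and chunkDuration have no getUserMedia equivalent and would presumably be handled by the audio processing pipeline instead.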

Most STT providers work best with 16kHz mono PCM audio. The SDK applies sensible defaults via DEFAULT_AUDIO_INPUT_CONFIG if you do not specify these values.
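
A minimal sketch of how defaults of this kind could be combined with a partial configuration. ASSUMED_DEFAULTS is a hypothetical stand-in for DEFAULT_AUDIO_INPUT_CONFIG: the boolean, channels, and chunkDuration values come from the Properties table, while the sampleRate and format values are assumptions based on the 16kHz PCM recommendation above. resolveAudioInput is likewise an illustrative name, not an SDK function.

```typescript
// Fully resolved shape: every documented field has a concrete value.
interface ResolvedAudioInput {
  sampleRate: number;
  format: string; // the SDK types this as AudioFormat
  channels: number;
  chunkDuration: number;
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// Hypothetical stand-in for DEFAULT_AUDIO_INPUT_CONFIG.
const ASSUMED_DEFAULTS: ResolvedAudioInput = {
  sampleRate: 16000, // assumption: the table lists no default for this field
  format: 'pcm', // assumption: the table lists no default for this field
  channels: 1,
  chunkDuration: 100,
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
};

// Hypothetical helper: a field the caller omits (or sets to undefined)
// falls back to the assumed default.
function resolveAudioInput(overrides: Partial<ResolvedAudioInput> = {}): ResolvedAudioInput {
  return {
    sampleRate: overrides.sampleRate ?? ASSUMED_DEFAULTS.sampleRate,
    format: overrides.format ?? ASSUMED_DEFAULTS.format,
    channels: overrides.channels ?? ASSUMED_DEFAULTS.channels,
    chunkDuration: overrides.chunkDuration ?? ASSUMED_DEFAULTS.chunkDuration,
    echoCancellation: overrides.echoCancellation ?? ASSUMED_DEFAULTS.echoCancellation,
    noiseSuppression: overrides.noiseSuppression ?? ASSUMED_DEFAULTS.noiseSuppression,
    autoGainControl: overrides.autoGainControl ?? ASSUMED_DEFAULTS.autoGainControl,
  };
}

// Only chunkDuration is overridden; everything else falls back to the defaults.
const lowLatency = resolveAudioInput({ chunkDuration: 50 });
```

Using ?? per field rather than object spread means an explicit undefined does not overwrite a default, which is usually the less surprising behavior for optional settings.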

Example

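import { CompositeVoice } from 'composite-voice';

// mySTTProvider, myLLMProvider, and myTTSProvider are placeholders for
// provider instances you have already constructed.
const agent = new CompositeVoice({
  stt: mySTTProvider,
  llm: myLLMProvider,
  tts: myTTSProvider,
  audio: {
    input: {
      sampleRate: 16000, // 16 kHz, the rate most STT providers work best with
      format: 'pcm',
      channels: 1, // mono, recommended for speech
      echoCancellation: true, // keeps TTS playback from being re-transcribed
      noiseSuppression: true, // reduces background noise
    },
  },
});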

Properties

| Property | Type | Default value | Description | Defined in |
| --- | --- | --- | --- | --- |
| autoGainControl? | boolean | true | Whether to enable the browser’s automatic gain control. Remarks: Normalizes microphone input volume, which helps when users speak at varying distances from the microphone. | src/core/types/audio.ts:156 |
| channels? | number | 1 | Number of audio channels. Remarks: Use 1 for mono (recommended for speech) or 2 for stereo. | src/core/types/audio.ts:112 |
| chunkDuration? | number | 100 | Duration of each audio chunk sent to the STT provider, in milliseconds. Remarks: Lower values reduce latency but increase the number of chunks sent. Typical values range from 20 ms to 250 ms. | src/core/types/audio.ts:123 |
| echoCancellation? | boolean | true | Whether to enable the browser’s echo cancellation processing. Remarks: Strongly recommended when using speaker output simultaneously with microphone capture to prevent the TTS audio from being re-transcribed by the STT provider. | src/core/types/audio.ts:134 |
| format | AudioFormat | undefined | Audio format/codec for encoding captured audio. See AudioFormat. | src/core/types/audio.ts:102 |
| noiseSuppression? | boolean | true | Whether to enable the browser’s noise suppression processing. Remarks: Reduces background noise for cleaner transcriptions. Recommended for most environments unless you need raw audio fidelity. | src/core/types/audio.ts:145 |
| sampleRate | number | undefined | Sample rate in Hz for audio capture. Remarks: Common values are 16000 (speech-optimized), 24000, and 48000 (high fidelity). Most STT providers perform best at 16000 Hz. | src/core/types/audio.ts:95 |
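
To make the chunkDuration trade-off concrete, the following sketch works out how many samples and bytes one chunk holds and how many chunks are produced per second. It assumes uncompressed 16-bit PCM (2 bytes per sample), which this page does not state; chunkStats is a hypothetical helper, not part of the SDK.

```typescript
// Assumption: 16-bit linear PCM, i.e. 2 bytes per sample per channel.
const BYTES_PER_SAMPLE = 2;

interface ChunkStats {
  samplesPerChunk: number; // samples per channel in one chunk
  bytesPerChunk: number; // payload size of one chunk across all channels
  chunksPerSecond: number; // how often a chunk is sent
}

// Hypothetical helper: derives chunk size and send rate from the capture settings.
function chunkStats(sampleRate: number, channels: number, chunkDurationMs: number): ChunkStats {
  const samplesPerChunk = Math.round((sampleRate * chunkDurationMs) / 1000);
  const bytesPerChunk = samplesPerChunk * channels * BYTES_PER_SAMPLE;
  const chunksPerSecond = 1000 / chunkDurationMs;
  return { samplesPerChunk, bytesPerChunk, chunksPerSecond };
}

// 16000 Hz mono with the default 100 ms chunk:
// 1600 samples, 3200 bytes, 10 chunks per second.
const defaultChunk = chunkStats(16000, 1, 100);

// Same capture settings with a 20 ms chunk:
// 320 samples, 640 bytes, 50 chunks per second.
const lowLatencyChunk = chunkStats(16000, 1, 20);
```

Under these assumptions the total data rate is the same in both cases (32000 bytes per second); a shorter chunkDuration only splits it into more, smaller messages, which is where the extra per-chunk overhead comes from.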

© 2026 CompositeVoice. All rights reserved.
