WebLLMLLMConfig
Configuration for the WebLLM in-browser LLM provider.
Defined in: src/providers/llm/webllm/WebLLMLLM.ts:92
Remarks
Unlike server-side providers, WebLLM needs no API key or proxy — everything runs client-side via WebGPU. The only required field is model.
Example
```ts
const config: WebLLMLLMConfig = {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  stream: true,
  systemPrompt: 'You are a helpful assistant running locally.',
  onLoadProgress: ({ progress, text }) => {
    console.log(`Loading: ${Math.round(progress * 100)}% - ${text}`);
  },
};
```
See
- LLMProviderConfig for inherited base properties (temperature, maxTokens, systemPrompt, etc.).
- Available WebLLM models
Extends
- LLMProviderConfig
Properties
| Property | Type | Default value | Description | Overrides | Inherited from | Defined in |
|---|---|---|---|---|---|---|
apiKey? | string | undefined | API key or authentication token for the provider. Remarks For client-side usage, consider using a proxy server to keep API keys secure. The SDK provides Express, Next.js, and Node adapters for this purpose. | - | LLMProviderConfig.apiKey | src/core/types/providers.ts:67 |
chatOpts? | Record<string, unknown> | undefined | Override entries from mlc-chat-config.json at engine creation time. Remarks Useful for tuning engine parameters such as context_window_size, prefill_chunk_size, or sliding_window_size without modifying the model’s packaged configuration. Example chatOpts: { context_window_size: 2048, prefill_chunk_size: 1024, } | - | - | src/providers/llm/webllm/WebLLMLLM.ts:145 |
debug? | boolean | false | Whether to enable debug logging for this provider. Remarks When true, the provider emits detailed internal logs. This is separate from the SDK-level LoggingConfig. | - | LLMProviderConfig.debug | src/core/types/providers.ts:86 |
endpoint? | string | undefined | Custom endpoint URL to override the provider’s default API endpoint. Remarks Useful for self-hosted instances, proxy servers, or development environments. | - | LLMProviderConfig.endpoint | src/core/types/providers.ts:75 |
maxTokens? | number | undefined | Maximum number of tokens to generate in the response. Remarks For voice applications, lower values (100-300) help keep responses concise and reduce TTS latency. | - | LLMProviderConfig.maxTokens | src/core/types/providers.ts:589 |
model | string | undefined | WebLLM model identifier. Remarks Must match one of the model IDs supported by @mlc-ai/web-llm. The model weights are downloaded on first use and cached by the browser for subsequent loads. Example 'Llama-3.2-1B-Instruct-q4f16_1-MLC' See Available models | LLMProviderConfig.model | - | src/providers/llm/webllm/WebLLMLLM.ts:105 |
onLoadProgress? | (progress) => void | undefined | Callback fired during model download and WebGPU shader compilation. Remarks Wire this to a progress bar for good UX; initial loads can be 100 MB+. The callback receives a WebLLMLoadProgress object with progress (0–1), timeElapsed (seconds), and a human-readable text description. Example onLoadProgress: ({ progress, text }) => { progressBar.style.width = `${progress * 100}%`; statusLabel.textContent = text; } | - | - | src/providers/llm/webllm/WebLLMLLM.ts:125
stopSequences? | string[] | undefined | Sequences that cause the LLM to stop generating. Remarks When the model generates any of these sequences, generation halts. Useful for controlling response boundaries. | - | LLMProviderConfig.stopSequences | src/core/types/providers.ts:627 |
stream? | boolean | undefined | Whether to stream the LLM response token by token. Remarks When true, the provider yields tokens incrementally via an async iterable. Streaming is essential for low-latency voice applications as it allows TTS to begin synthesizing before the full response is generated. | - | LLMProviderConfig.stream | src/core/types/providers.ts:618 |
systemPrompt? | string | undefined | System prompt providing instructions and context to the LLM. Remarks Sets the behavior and persona of the assistant. For voice applications, include instructions to keep responses brief and conversational. | - | LLMProviderConfig.systemPrompt | src/core/types/providers.ts:608 |
temperature? | number | undefined | Temperature for controlling generation randomness. Remarks Values from 0 (deterministic) to 2 (highly creative). Lower values produce more focused responses; higher values increase variety. | - | LLMProviderConfig.temperature | src/core/types/providers.ts:580 |
timeout? | number | undefined | Request timeout in milliseconds. Remarks Applies to HTTP requests (REST providers) and connection establishment (WebSocket providers). Set to 0 for no timeout. | - | LLMProviderConfig.timeout | src/core/types/providers.ts:95 |
topP? | number | undefined | Top-P (nucleus) sampling parameter. Remarks Limits token selection to the smallest set whose cumulative probability exceeds this value. Values from 0 to 1. Often used as an alternative to temperature. | - | LLMProviderConfig.topP | src/core/types/providers.ts:599 |
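Since model is the only required field, the overall shape of the configuration can be sketched as below. This is an illustrative standalone sketch, not the SDK's actual type declarations: the interface bodies are reconstructed from the property table above, and the isValidConfig helper is purely hypothetical.

```typescript
// Sketch of the load-progress payload described for onLoadProgress.
interface WebLLMLoadProgress {
  progress: number;     // 0–1
  timeElapsed: number;  // seconds
  text: string;         // human-readable status
}

// Sketch of WebLLMLLMConfig (subset of the documented properties).
interface WebLLMLLMConfig {
  model: string; // required: a model ID supported by @mlc-ai/web-llm
  stream?: boolean;
  systemPrompt?: string;
  temperature?: number;
  maxTokens?: number;
  chatOpts?: Record<string, unknown>;
  onLoadProgress?: (progress: WebLLMLoadProgress) => void;
}

// Hypothetical helper: a config is usable iff `model` is a
// non-empty string, since every other field is optional.
function isValidConfig(config: WebLLMLLMConfig): boolean {
  return typeof config.model === 'string' && config.model.length > 0;
}

const config: WebLLMLLMConfig = {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  stream: true,           // stream tokens so TTS can start early
  maxTokens: 200,         // keep voice responses concise
  chatOpts: { context_window_size: 2048 }, // engine override example
  onLoadProgress: ({ progress, text }) => {
    console.log(`Loading: ${Math.round(progress * 100)}% - ${text}`);
  },
};
```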