Wake Word for the Browser

Always-on wake-phrase detection running entirely inside a browser tab, no server, no download-first, no native install. Detects the phrase "Hey Assistant" on 16 kHz mono microphone audio.

Status: shipping, browser-ready. Verified end-to-end via live microphone on Chrome (V8) and Safari (JSC) across Mac and mobile.
Current version: v0.1.1
Target: any modern browser with WebAssembly SIMD128 — Chrome 91+, Firefox 89+, Safari 16.4+, iOS Safari 16.4+, Android Chrome, Edge 91+.
Bundle: ~170 KB WebAssembly runtime + ~100 KB .vxrt model = ~275 KB total download.
License: Apache-2.0 (wrapper sources) · proprietary (compiled runtime, redistribution allowed via this SDK)

Positioning — read this before shipping to production

The three deployment tiers we support, most-secure first:

Tier	SDK	Model delivery	Recommended for
Native	Android / iOS / Linux	Embedded in the app / SDK asset, AES-256-GCM encrypted `.vxrt`	Production consumer apps, embedded devices, offline appliances
Browser (this SDK)	`@voxrt/wake-word-browser` (npm)	Bundled plaintext `.vxrt` — same model bytes as the native SDKs, without the AES-GCM envelope	Free demos, prototypes, marketing landing pages, quick internal tools
Browser + server-gated model _(v0.2+ roadmap)_	Same npm package	`.vxrt` fetched from an auth-token-gated endpoint per session — revocable + auditable	Commercial browser deployments where model IP matters more than device reach

If your product monetises the wake-word capability itself (custom phrase, brand-specific), we recommend the native SDKs. Browsers give a low-friction demo surface and prototyping speed, not maximum-security model distribution.

What is VoxRT?

VoxRT is a from-scratch inference runtime for on-device speech models. No ONNX Runtime, no PyTorch Mobile, no LiteRT — a custom Rust core sized and tuned for streaming voice workloads. This package is the WebAssembly port of that same runtime — same .vxrt model format, same detection schema as our Android (Kotlin VoxrtWakeWordEngine), iOS (Swift VoxrtWakeWordEngine), Linux (voxrt_wake_word_* C ABI), Python, Node, and Go bindings.

Sister browser modules are on the roadmap: voxrt-silero-browser (VAD), voxrt-asr-browser (streaming ASR). All share this runtime.

Model quality

Same wake-word model as the native SDKs (voxrt_wake_word.vxrt v0.1.0). Test split: 5,240 positive utterances + 6,416 hard-negative utterances (isolated "Hey", isolated "Assistant", competitor wake-words like "Hey Siri", phonetic neighbours, arbitrary speech, non-speech audio). All speakers disjoint from train + val.

ROC AUC: 0.9966
Average precision (PR AUC): 0.9899
Default threshold 0.90 hits precision 0.993 / recall 0.982 on the test split.

Full precision/recall/FPR table lives in the Linux SDK README — identical model.

Performance

60-second offline benchmark, WebAssembly with SIMD128, cumulative RTF (wall_time / audio_time) across the actual push path:

Device	Browser	RTF	Native reference (same model, NEON)
MacBook Pro M4	Chrome	0.16 %	—
iPhone 13 Pro Max (A15)	Safari	0.23 %	—
Snapdragon 662 (A73)	Chrome Android	1.17 %	2.1 % (Android SDK, native NEON, HIGH_PERF)
Raspberry Pi Zero 2 W (A53)	—	(native ref only)	5.3 % (Linux SDK, native NEON)

WASM SIMD128 recovers most of the native-NEON perf on the same chip (2.4× penalty vs native NEON on A73), and on modern CPUs (M4 / A15) the browser overhead disappears entirely.

Install

npm install @voxrt/wake-word-browser

Or CDN (both files sit next to each other in the published package):

<script type="module">
  import init, { WakeWordEngine } from
    "https://unpkg.com/@voxrt/[email protected]/voxrt-wake-word-browser.js";
  await init();
  const bytes = new Uint8Array(await (await fetch(
    "https://unpkg.com/@voxrt/[email protected]/voxrt_wake_word.vxrt"
  )).arrayBuffer());
  const engine = WakeWordEngine.fromBytes(bytes);
</script>

The npm package ships everything you need in one install: the WebAssembly runtime, the JavaScript glue + TypeScript types, and the wake-phrase model file (voxrt_wake_word.vxrt, plaintext v2 — the browser build is compiled without a decrypt key, so there's no crypto material anywhere in the shipped binary and no risk to the native SDKs' model IP).

If you want a custom-phrase or brand-specific model, contact [email protected] — those ship through the native SDKs. See the tier table at the top of this README for the security trade-off.

Quick start

The full runnable page lives at examples/live-mic-quickstart/. Minimum you need:

<!doctype html>
<html>
<body>
<button id="start">Listen</button>
<pre id="log"></pre>
<script type="module">
import init, { WakeWordEngine } from
  "./node_modules/@voxrt/wake-word-browser/voxrt-wake-word-browser.js";

document.getElementById("start").addEventListener("click", async () => {
  await init();
  const engine = WakeWordEngine.fromBytes(new Uint8Array(
    await (await fetch("voxrt_wake_word.vxrt")).arrayBuffer()));
  engine.threshold = 0.9;

  const stream = await navigator.mediaDevices.getUserMedia({audio: true});
  const ac = new AudioContext({sampleRate: 16000});
  const src = ac.createMediaStreamSource(stream);
  const proc = ac.createScriptProcessor(512, 1, 1);
  const mute = ac.createGain(); mute.gain.value = 0;
  const pcm = new Int16Array(512);

  proc.addEventListener("audioprocess", (e) => {
    const input = e.inputBuffer.getChannelData(0);
    for (let i = 0; i < input.length; i++) {
      const s = Math.max(-1, Math.min(1, input[i]));
      pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    for (const d of engine.pushPcmI16(pcm)) {
      document.getElementById("log").textContent +=
        `wake! t=${d.timestampSec.toFixed(3)}s score=${d.score.toFixed(4)}\n`;
    }
  });
  src.connect(proc);
  proc.connect(mute);
  mute.connect(ac.destination);
});
</script>
</body>
</html>

Serve it over HTTPS (or localhost) — browsers refuse getUserMedia on plain HTTP for non-loopback origins.

API

class WakeWordEngine {
  static fromBytes(bytes: Uint8Array): WakeWordEngine;

  pushPcmI16(pcm: Int16Array): Detection[];
  pushPcmF32(pcm: Float32Array): Detection[];

  currentScore(): number;
  reset(): void;

  threshold: number;         // default 0.9
  cooldownFrames: number;    // default 100 (= 1.0 s at 10 ms hop)
}

interface Detection {
  frameIndex: number;        // 0-based frame counter (10 ms/frame)
  timestampSec: number;
  score: number;             // sigmoid, [0, 1]
}

function version(): string;

Detection schema is identical to the Kotlin (VoxrtWakeWordDetection), Swift (VoxrtWakeWordDetection), Node (Detection), Python (Detection), and Go (wakeword.Detection) bindings on the other channels.

Tuning

threshold — sigmoid space [0, 1]. Default 0.9. Lower for higher recall (more triggers, more false positives); raise for higher precision.
cooldownFrames — post-detection silence in 10-ms frames. Default 100 = 1.0 s. Prevents multi-fire on a long utterance.

Browser support

WebAssembly SIMD128 is stable in every modern browser:

Engine	Minimum version	Released
Chrome / Edge (V8)	91	May 2021
Firefox	89	June 2021
Safari (JSC) — desktop + iOS	16.4	Mar 2023

If you must support older Safari (<16.4), we don't ship a scalar-fallback build in v0.1.x — the runtime hard-requires SIMD128. Older engines will fail at init().

License

Wrapper sources (voxrt_wake_word_browser crate, wasm-bindgen glue, examples): Apache-2.0.
Compiled runtime (voxrt-wake-word-browser_bg.wasm and its symbol table): proprietary binary. Redistribution is allowed only as an unmodified part of this SDK package. See LICENSE-BINARY.
Wake-phrase weights (voxrt_wake_word.vxrt fetched at runtime): proprietary in-house model, trained on synthetic and licensed speech, no upstream license obligations.

Synced from voxrt-wake-word-browser on 2026-07-02 — the GitHub repo is the source of truth. Found a mismatch? The repo wins.