Custom Wake Word

On-device custom wake word detection

Wake your app from a custom word or phrase, running entirely on the device. No cloud, no streamed microphone audio, and sub-millisecond per-frame inference even on budget Android.

Request a custom wake word →

Overview

A wake word that never leaves the device

A wake-word detector is the always-listening trigger that brings your app to attention. VoxRT runs that detector on-device, so it stays responsive without a network connection and never streams microphone audio to the cloud — the privacy posture enterprise buyers expect.

Models are trained on synthetic data with heavy augmentation, so they hold up against background noise, distance from the mic, and a wide range of accents. A lightweight voice-activity gate keeps power draw low between activations.

The runtime

The runtime is the product

VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.

Custom Wake Word is one product on that runtime, alongside voice activity detection (VAD) and streaming speech recognition (ASR). All three share the same Rust runtime crate and Arm NEON kernel set: the runtime is the product, and the models are what it runs.

Capabilities

What you get

Any phrase or word

Single words, brand names, or multi-word activations — "Hey YourBrand", "OK Product".

Robust in the wild

Holds up against background noise, distance from the mic, and a wide range of accents.

Tunable threshold

Trade precision for recall with one confidence setting tuned to your product.

Battery-aware

Gated by on-device voice activity detection, so it sips power while idle.

Fully on-device

No microphone audio ever leaves the device. No network required.
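
As a rough illustration of how the voice-activity gate and the confidence threshold fit together, the loop below skips the detector whenever the gate reports no speech and fires only when the score clears the threshold. This is a minimal sketch, not the VoxRT API: `run_gated_detector`, `is_speech`, and `wake_score` are hypothetical names, and a real streaming detector would keep state across frames rather than score each frame in isolation.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]  # one short chunk of audio samples (hypothetical type)


def run_gated_detector(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],    # hypothetical cheap voice-activity gate
    wake_score: Callable[[Frame], float],  # hypothetical detector, confidence in 0..1
    threshold: float = 0.9,                # 0.9 is the default listed under Model quality
) -> List[int]:
    """Return the indices of frames where the wake word fired."""
    detections = []
    for index, frame in enumerate(frames):
        if not is_speech(frame):
            continue  # idle path: the detector is never invoked, which saves power
        if wake_score(frame) >= threshold:
            detections.append(index)
    return detections


# Toy stand-ins: a frame counts as "speech" if any sample is non-zero,
# and the confidence score is simply the first sample.
frames = [[0.0], [0.95], [0.4], [0.0], [0.92]]
calls = []


def toy_score(frame: Frame) -> float:
    calls.append(frame)  # record how often the detector actually runs
    return frame[0]


hits = run_gated_detector(frames, lambda f: any(f), toy_score)
print(hits, len(calls))
```

In this toy run the detector is called for only three of the five frames, and raising `threshold` would trade recall for precision in the same way the operating-point table describes.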

Performance

Fast enough to leave always-on

0.015

real-time factor on iPhone 13 Pro Max (about 150 µs of compute per 10 ms audio frame)

~1.5%

of one core during continuous listening

0.021

real-time factor on a Snapdragon 662 (2020 budget Android)

~48K

parameters — tiny enough for microcontrollers
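
The figures above are related by simple arithmetic: a real-time factor is compute time divided by audio duration, so it converts directly into per-frame compute time and into a share of one core. The sketch below works that through. The function names are illustrative, and the Snapdragon 662 per-frame figure assumes the same 10 ms frame as the iPhone measurement, which the page does not state explicitly.

```python
def per_frame_compute_us(rtf: float, frame_ms: float = 10.0) -> float:
    """Compute time spent on each audio frame, in microseconds."""
    return rtf * frame_ms * 1000.0


def core_share_percent(rtf: float) -> float:
    """Share of one CPU core consumed when audio is processed continuously."""
    return rtf * 100.0


iphone_us = per_frame_compute_us(0.015)      # iPhone 13 Pro Max
snapdragon_us = per_frame_compute_us(0.021)  # Snapdragon 662, assuming 10 ms frames
iphone_core = core_share_percent(0.015)

print(f"iPhone 13 Pro Max: {iphone_us:.0f} us per frame, {iphone_core:.1f}% of one core")
print(f"Snapdragon 662:    {snapdragon_us:.0f} us per frame")
```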

Model quality

Measured on a hard test set

The test set contains 5,240 positive utterances and 6,416 hard negatives (isolated "Hey", isolated "Assistant", competitor wake words like "Hey Siri", phonetic neighbours, arbitrary speech, and non-speech audio), with all speakers disjoint from the training and validation sets. ROC AUC is 0.9966 and average precision is 0.9899.

Threshold	Precision	Recall	F1	False-positive rate (FPR)	False positives / hard negatives
0.5	0.864	0.995	0.925	12.8%	822 / 6,416
0.85	0.957	0.987	0.972	3.6%	234 / 6,416
0.9 (default)	0.993	0.982	0.987	0.5%	34 / 6,416
0.95	0.997	0.769	0.868	0.2%	12 / 6,416
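
One way to read the table when tuning a product is to fix a false-positive budget and take the highest-recall threshold that stays within it. The sketch below does that over the four rows above. It is an illustrative selection rule, not how VoxRT tunes operating points; `OperatingPoint` and `pick_threshold` are hypothetical names, and the false-positive rate is recomputed from the published counts.

```python
from typing import List, NamedTuple, Optional


class OperatingPoint(NamedTuple):
    threshold: float
    precision: float
    recall: float
    false_positives: int


HARD_NEGATIVES = 6416  # size of the hard-negative set in the test data

# Rows copied from the table above.
POINTS = [
    OperatingPoint(0.50, 0.864, 0.995, 822),
    OperatingPoint(0.85, 0.957, 0.987, 234),
    OperatingPoint(0.90, 0.993, 0.982, 34),
    OperatingPoint(0.95, 0.997, 0.769, 12),
]


def false_positive_rate(point: OperatingPoint) -> float:
    """Fraction of hard negatives that triggered a detection."""
    return point.false_positives / HARD_NEGATIVES


def pick_threshold(points: List[OperatingPoint], max_fpr: float) -> Optional[OperatingPoint]:
    """Highest-recall operating point whose FPR is within the budget, if any."""
    eligible = [p for p in points if false_positive_rate(p) <= max_fpr]
    return max(eligible, key=lambda p: p.recall) if eligible else None


choice = pick_threshold(POINTS, max_fpr=0.01)  # allow at most 1% false positives
print(choice)
```

With a 1% budget this selects the 0.9 row, which matches the default. A budget tighter than the strictest row returns `None`, signalling that no listed threshold qualifies.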

Footprint

Roughly 2–3 MB in your app

Component	Size
Swift wrapper source (one file)	~7 KB
Native xcframework, compressed (device + simulator)	~19 MB
App-binary delta after extraction + dead-code elimination	2–3 MB
Wake-phrase model (fp16)	~100 KB
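
The model size follows from the parameter count: fp16 stores each weight in 2 bytes. The back-of-the-envelope check below assumes "~48K" means roughly 48,000 parameters and ignores any file header or metadata, so it is an estimate rather than the exact file size.

```python
PARAMETERS = 48_000  # "~48K parameters" (approximate, per the Performance section)
BYTES_PER_FP16 = 2   # IEEE 754 half precision uses 16 bits per value

weight_bytes = PARAMETERS * BYTES_PER_FP16
weight_kb = weight_bytes / 1000  # decimal kilobytes

print(f"{weight_bytes} bytes, about {weight_kb:.0f} KB of weights")
```

That lands at about 96 KB, consistent with the ~100 KB listed for the wake-phrase model.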

How it works

Reference model free, your phrase commercial

The SDK ships a reference wake-phrase model trained 100% on synthetic data. When you need your own phrase, we train a custom model on it and tune the operating point for your acoustic environment.