Custom Wake Word

On-device custom wake word detection

Wake your app with a custom word or phrase, running entirely on the device. No cloud, no streamed microphone audio, and a real-time factor of 0.021 even on a budget Android phone.

Overview

A wake word that never leaves the device

A wake-word detector is the always-listening trigger that brings your app to attention. VoxRT runs that detector on-device, so it stays responsive without a network connection and never streams microphone audio to the cloud — the privacy posture enterprise buyers expect.

Models are trained on synthetic data and heavy augmentation, so they hold up against background noise, distance from the mic, and a wide range of accents. A lightweight voice-activity gate keeps power draw low between activations.
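This page does not describe the augmentation recipe itself. As a rough illustration of one common technique, the sketch below mixes a background-noise clip into clean audio at a chosen signal-to-noise ratio (SNR). `rms` and `mix_at_snr` are hypothetical helpers, not part of VoxRT, and samples are assumed to be plain lists of floats at 16 kHz.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a list of float samples (hypothetical helper)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_snr(clean, noise, snr_db):
    """Mix a random slice of `noise` into `clean` at the requested SNR in dB.

    Hypothetical helper, not a VoxRT API. Since
    SNR_dB = 20 * log10(rms(clean) / rms(scaled_noise)),
    the noise slice is scaled by rms(clean) / (rms(slice) * 10 ** (snr_db / 20)).
    """
    if len(noise) < len(clean):
        raise ValueError("noise clip must be at least as long as the clean clip")
    start = random.randrange(len(noise) - len(clean) + 1)  # random noise offset
    segment = noise[start:start + len(clean)]
    noise_level = rms(segment)
    if noise_level == 0.0:
        raise ValueError("noise slice is silent; cannot reach a finite SNR")
    scale = rms(clean) / (noise_level * 10 ** (snr_db / 20))
    return [c + scale * n for c, n in zip(clean, segment)]

# Example: a 1 s, 440 Hz tone at 16 kHz mixed with white noise at a random SNR.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16_000) for t in range(16_000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(32_000)]
augmented = mix_at_snr(clean, noise, snr_db=random.uniform(0.0, 20.0))
```

Drawing a fresh SNR and noise offset for every training example is one way to cover both quiet and noisy rooms; the 0 to 20 dB range here is an illustrative choice, not VoxRT's setting.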

The runtime

The runtime is the product

VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.

Custom Wake Word is one product on that runtime, alongside VAD and streaming ASR. All three share the same Rust runtime crate and NEON kernel set — the runtime is the product; the models are what it runs.

Capabilities

What you get

Any phrase or word

Single words, brand names, or multi-word phrases such as "Hey YourBrand" or "OK Product".

Robust in the wild

Holds up against background noise, distance from the mic, and a wide range of accents.

Tunable threshold

Trade precision against recall with a single confidence threshold, tuned to your product.

Battery-aware

Gated by on-device voice activity detection, so it sips power while idle.

Fully on-device

No microphone audio ever leaves the device. No network required.
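A minimal sketch of how a voice-activity gate and a confidence threshold can fit together in a streaming loop. It assumes 10 ms frames at 16 kHz (160 samples) and uses a simple energy floor as a stand-in for VoxRT's actual VAD. `detect` and `score_fn` are hypothetical names; `score_fn` stands in for the wake-word model and may keep its own state across frames. The 0.9 default matches the default operating point in the model-quality table on this page.

```python
import math
from typing import Callable, Iterable, List, Sequence

FRAME_SAMPLES = 160       # assumption: 10 ms frames at 16 kHz
DEFAULT_THRESHOLD = 0.9   # default operating point from the model-quality table

def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB (a crude stand-in for a real VAD)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10 * math.log10(mean_sq + 1e-12)

def detect(frames: Iterable[Sequence[float]],
           score_fn: Callable[[Sequence[float]], float],
           threshold: float = DEFAULT_THRESHOLD,
           vad_floor_db: float = -50.0) -> List[int]:
    """Return indices of frames where the wake word fires (hypothetical API).

    Frames quieter than `vad_floor_db` never reach the detector, which is
    where the power saving comes from. Every other frame is scored, and a
    score at or above `threshold` counts as a detection.
    """
    hits = []
    for i, frame in enumerate(frames):
        if frame_energy_db(frame) < vad_floor_db:
            continue  # gate closed: skip detector inference for this frame
        if score_fn(frame) >= threshold:
            hits.append(i)
    return hits
```

Raising `threshold` suppresses more false accepts at the cost of missed activations, which is the same trade-off the model-quality table shows. A production detector would typically also debounce, so that one utterance does not fire on several consecutive frames.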

Performance

Fast enough to leave always-on

  • 0.015 real-time factor on iPhone 13 Pro Max (~150 µs per 10 ms frame)
  • ~1.5% of one core during continuous listening
  • 0.021 real-time factor on a Snapdragon 662 (2020 budget Android)
  • ~48K parameters, small enough for microcontrollers
Model quality

Measured on a hard test set

5,240 positive utterances and 6,416 hard negatives (isolated "Hey", isolated "Assistant", competitor wake words such as "Hey Siri", phonetic neighbours, arbitrary speech, and non-speech audio). No test speaker appears in the training or validation sets. ROC AUC is 0.9966; average precision is 0.9899.

Threshold       Precision   Recall   F1      FPR     False positives
0.5             0.864       0.995    0.925   12.8%   822 / 6,416
0.85            0.957       0.987    0.972   3.7%    234 / 6,416
0.9 (default)   0.993       0.982    0.987   0.5%    34 / 6,416
0.95            0.997       0.769    0.868   0.2%    12 / 6,416
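The F1 and FPR columns can be reproduced from the other columns: F1 is the harmonic mean of precision and recall, and FPR is false positives divided by the 6,416 hard negatives. A short check for the default operating point, using the table's rounded values (the helper names are illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def false_positive_rate(false_positives: int, negatives: int) -> float:
    """Fraction of hard negatives the detector wrongly accepts."""
    return false_positives / negatives

NEGATIVES = 6_416  # hard-negative clips in the test set

# Default operating point (threshold 0.9) from the table.
print(f"F1  = {f1_score(0.993, 0.982):.3f}")              # F1  = 0.987
print(f"FPR = {false_positive_rate(34, NEGATIVES):.1%}")   # FPR = 0.5%
```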
Footprint

Roughly 2–3 MB in your app

  • Swift wrapper source (one file): ~7 KB
  • Native xcframework, compressed (device + simulator): ~19 MB
  • App-binary size increase after extraction and dead-code elimination: 2–3 MB
  • Wake-phrase model (fp16): ~100 KB
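The ~100 KB model size is consistent with the ~48K parameter count, because fp16 stores each weight in 2 bytes. A back-of-the-envelope check, treating 48K as exact and ignoring any file header or metadata (both simplifying assumptions):

```python
import struct

PARAMS = 48_000                      # "~48K parameters", treated as exact here
FP16_BYTES = struct.calcsize("<e")   # IEEE 754 binary16: 2 bytes
FP32_BYTES = struct.calcsize("<f")   # IEEE 754 binary32: 4 bytes

fp16_kb = PARAMS * FP16_BYTES / 1000
fp32_kb = PARAMS * FP32_BYTES / 1000
print(f"fp16 weights: ~{fp16_kb:.0f} KB")   # fp16 weights: ~96 KB
print(f"fp32 weights: ~{fp32_kb:.0f} KB")   # fp32 weights: ~192 KB
```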
How it works

Free reference model; custom phrases are commercial

The SDK ships a reference wake-phrase model trained 100% on synthetic data. When you need your own phrase, we train a custom model on it and tune the operating point for your acoustic environment.

Ship a custom wake word on-device

Tell us your phrase and which devices it has to run on.
