On-device custom wake word detection
Wake your app with a custom word or phrase, running entirely on the device. No cloud, no streamed microphone audio, and single-digit-millisecond inference even on budget Android devices.
A wake word that never leaves the device
A wake-word detector is the always-listening trigger that brings your app to attention. VoxRT runs that detector on-device, so it stays responsive without a network connection and never streams microphone audio to the cloud — the privacy posture enterprise buyers expect.
Models are trained on synthetic data and heavy augmentation, so they hold up against background noise, distance from the mic, and a wide range of accents. A lightweight voice-activity gate keeps power draw low between activations.
The runtime is the product
VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.
Custom Wake Word is one product on that runtime, alongside VAD and streaming ASR. All three share the same Rust runtime crate and NEON kernel set — the runtime is the product; the models are what it runs.
What you get
Any phrase or word
Single words, brand names, or multi-word activations — "Hey YourBrand", "OK Product".
Robust in the wild
Holds up against background noise, distance from the mic, and a wide range of accents.
Tunable threshold
Trade precision for recall with one confidence setting tuned to your product.
Battery-aware
Gated by on-device voice activity detection, so it sips power while idle.
Fully on-device
No microphone audio ever leaves the device. No network required.
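To illustrate how a voice-activity gate and a single confidence threshold fit together, here is a minimal Python sketch. It is not the VoxRT API: `energy_vad`, `toy_score`, and `gated_detections` are hypothetical stand-ins, and the energy-based gate is an assumption chosen only to keep the example self-contained. The 0.9 default mirrors the default threshold in the results table.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def energy_vad(frame: Frame, floor: float = 0.01) -> bool:
    """Hypothetical stand-in gate: mean squared amplitude above a floor."""
    return sum(s * s for s in frame) / len(frame) > floor


def toy_score(frame: Frame) -> float:
    """Hypothetical stand-in for the wake-word model's confidence in [0, 1]."""
    return max(frame)


def gated_detections(
    frames: Sequence[Frame],
    is_speech: Callable[[Frame], bool],
    score: Callable[[Frame], float],
    threshold: float = 0.9,
) -> Tuple[List[int], int]:
    """Return (indices of frames that fired, number of frames actually scored)."""
    hits: List[int] = []
    scored = 0
    for index, frame in enumerate(frames):
        if not is_speech(frame):
            continue  # cheap gate: skip the heavier scorer on silence
        scored += 1
        if score(frame) >= threshold:
            hits.append(index)
    return hits, scored


frames = [[0.0] * 4, [0.5] * 4, [0.9] * 4]
print(gated_detections(frames, energy_vad, toy_score))                 # ([2], 2)
print(gated_detections(frames, energy_vad, toy_score, threshold=0.5))  # ([1, 2], 2)
```

The silent frame never reaches the scorer, which is where the idle power saving comes from. Lowering the threshold lets more frames fire, raising recall at the cost of precision.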
Fast enough to leave always-on
Measured on a hard test set
5,240 positive utterances and 6,416 hard negatives (isolated "Hey", isolated "Assistant", competitor wake words like "Hey Siri", phonetic neighbours, arbitrary speech, non-speech audio), with all speakers disjoint from the training and validation sets. ROC AUC 0.9966, average precision 0.9899.
| Threshold | Precision | Recall | F1 | FPR | False positives |
|---|---|---|---|---|---|
| 0.5 | 0.864 | 0.995 | 0.925 | 12.8% | 822 / 6,416 |
| 0.85 | 0.957 | 0.987 | 0.972 | 3.6% | 234 / 6,416 |
| 0.9 (default) | 0.993 | 0.982 | 0.987 | 0.5% | 34 / 6,416 |
| 0.95 | 0.997 | 0.769 | 0.868 | 0.2% | 12 / 6,416 |
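The precision and FPR columns follow from the recall, the false-positive count, and the test-set sizes. The Python sketch below checks the default row. It assumes true positives can be approximated as recall × 5,240, since the page reports recall rather than raw true-positive counts, and `row_metrics` is a name introduced here for illustration.

```python
def row_metrics(recall, false_positives, n_positives, n_negatives):
    """Derive precision and false-positive rate for one threshold row."""
    # Assumption: true positives are approximated from the rounded recall.
    true_positives = recall * n_positives
    precision = true_positives / (true_positives + false_positives)
    fpr = false_positives / n_negatives
    return precision, fpr


# Default threshold (0.9): recall 0.982, 34 false positives out of 6,416 negatives.
precision, fpr = row_metrics(0.982, 34, 5240, 6416)
print(f"precision={precision:.3f} fpr={fpr:.1%}")  # precision=0.993 fpr=0.5%
```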
Roughly 2–3 MB in your app
- Swift wrapper source (one file): ~7 KB
- Native xcframework, compressed (device + simulator): ~19 MB
- App-binary delta after extraction + dead-code elimination: 2–3 MB
- Wake-phrase model (fp16): ~100 KB
Reference model free, your phrase commercial
The SDK ships a reference wake-phrase model trained entirely on synthetic data. When you need your own phrase, we train a custom model for it and tune the operating point for your acoustic environment.