On-device wake-word · compared

Which on-device wake-word should you ship?

A wake word is the always-on phrase that brings your app to life, such as "Hey YourBrand", detected on the device itself so no audio streams to the cloud. This page compares VoxRT with the other on-device options (Picovoice Porcupine, openWakeWord and the older baselines) in plain terms: how accurate they are, how they're licensed, and whether you can actually ship them on mobile. VoxRT's edge is simple: an MIT runtime and model, a native Android and iOS SDK, and a tiny always-on footprint. Built-in assistant wake words such as "Hey Siri" and "Hey Google" aren't included, because they aren't available to third-party apps as a wake-word SDK.

Why VoxRT

What you get with VoxRT wake-word

Custom phrase tuning

Single words, brand names, or multi-word activations — tuned for your phrase, language, and target devices.

Designed for real-world audio

Tuned against hard negatives and noisy mobile conditions, with thresholds adjustable for your product.

Tunable threshold

Trade precision for recall with one confidence setting tuned to your product.
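
To make the trade-off concrete, here is a minimal sketch of how one confidence threshold moves precision and recall. The scores and labels are made-up illustration data, not VoxRT output, and `precision_recall` is a hypothetical helper rather than part of any SDK.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as detections."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up detector scores: label 1 = wake phrase spoken, 0 = other audio.
scores = [0.98, 0.95, 0.91, 0.72, 0.40, 0.93, 0.55, 0.10]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

for threshold in (0.5, 0.9, 0.94):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold removes false accepts (precision rises) at the cost of more missed activations (recall falls). That is the dial a product team would tune.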

Battery-aware

Gated by on-device voice activity detection, so it sips power while idle.
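
The idea of gating can be sketched as follows. This is an illustration only: it uses a simple RMS energy gate as a stand-in for a real voice activity detector, and `gated_detect`, `rms_gate` and the stub detector are hypothetical names, not VoxRT's API.

```python
import math


def frame_rms(samples):
    """Root-mean-square energy of one audio frame (floats in [-1, 1])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def gated_detect(frames, detector, rms_gate=0.01):
    """Run `detector` only on frames whose energy clears the gate.

    Returns (scores, skipped): gated frames get a score of 0.0 and are
    counted in `skipped`, since the detector never ran on them.
    """
    scores, skipped = [], 0
    for frame in frames:
        if frame_rms(frame) < rms_gate:
            skipped += 1
            scores.append(0.0)
        else:
            scores.append(detector(frame))
    return scores, skipped


# Three 10 ms frames at 16 kHz: silence, loud signal, near-silence.
frames = [[0.0] * 160, [0.5, -0.5] * 80, [0.001] * 160]
stub_detector = lambda frame: 0.42  # stand-in for the real wake-word model
scores, skipped = gated_detect(frames, stub_detector)
print(scores, skipped)
```

The power saving comes from the skipped frames: the more of the idle audio stream the gate rejects, the less often the neural detector has to run.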

Fully on-device

No microphone audio ever leaves the device. No network required — and no per-use fees.

Side by side

The comparison table

Most wake-word vendors publish desktop or Raspberry Pi numbers. VoxRT publishes a measured mobile real-time factor on a Snapdragon 662 and an iPhone 13 Pro Max, the kind of hardware your users actually carry.

Comparison of on-device wake-word engines — VoxRT, Picovoice Porcupine, openWakeWord, Mycroft Precise, Snowboy and PocketSphinx — by vendor-reported accuracy, mobile speed, native mobile SDK, custom-keyword support and license.
| Engine | Accuracy (vendor-reported) | Speed | Native mobile SDK | Custom keyword | License |
|---|---|---|---|---|---|
| VoxRT wake-word v0.1 | ROC AUC 0.9966 [a]; P 0.993 / R 0.982 @ 0.9 | RTF 0.021 on Snapdragon 662 (A73) | ✓ Android + iOS | Any phrase / language; tuned on request | MIT (runtime + weights) |
| Picovoice Porcupine (closed commercial SDK) | 2.7% miss rate [b]; 1 FA / 10 hr, 10 dB SNR, 6-keyword avg | 0.6% CPU on RPi 5 [c]; ~1.8% est. on SD662 | ✓ Android + iOS | Console [d]; paid tier for commercial | Commercial; free plan = eval only |
| openWakeWord (modern OSS) | FA <0.5/hr, FR <5% target [e]; "beats Porcupine" on a small set | Not published; 15–20 models real-time / RPi 3 core | ✗ Python only [e] | Colab + TTS; ~1 hr | Weights CC-BY-NC-SA [f]; code Apache-2.0 |
| Mycroft Precise (legacy baseline, 2019) | Not published | Not published | ✗ Linux / RPi only | Training docs | Apache-2.0 |
| Snowboy (legacy baseline, archived 2020) | 31.9% miss rate [b] | 3.8% CPU on RPi 5 [b] | ✗ | discontinued | Archived [g] |
| PocketSphinx (legacy baseline, CMU-Sphinx) | 52.0% miss rate [b] | 12.1% CPU on RPi 5 [b] | ✗ unofficial ports | Phonetic, no training | BSD-style [g] |

RTF = real-time factor: the fraction of one CPU core needed to keep up with audio in real time; lower is better. Accuracy and speed figures are each vendor's own published numbers, measured on different phrases, datasets and hardware, so they are directionally indicative, not directly comparable (see the methodology note below).

Sources:
[a] Internal VoxRT measurement on a held-out test set: 5,240 positive utterances + 6,416 hard negatives, speakers disjoint from training; default phrase "Hey Assistant"; precision/recall quoted at the 0.9 default threshold.
[b] Picovoice wake-word-benchmark: miss rate at 1 false alarm per 10 hours, 10 dB SNR, six keywords averaged (alexa, computer, jarvis, smart mirror, snowboy, view glass); runtime CPU on Raspberry Pi 5. Figures read from the benchmark's chart images.
[c] Picovoice publishes 0.6% CPU on a Raspberry Pi 5; ~1.8% is a Geekbench-single-core-scaled estimate for a Snapdragon 662 (see methodology). No first-party Android/iOS number is published.
[d] Picovoice Console trains custom keywords; the Free Plan is evaluation-only and commercial deployment requires a paid tier with pricing gated behind sales.
[e] openWakeWord README: a target false-accept rate <0.5/hr and false-reject rate <5%, and an "Alexa" model that beats Porcupine on a small test set the authors say to interpret cautiously; ships as a Python package (a community C++ port exists, not first-party).
[f] openWakeWord code is Apache-2.0, but its pretrained models are CC-BY-NC-SA 4.0 (non-commercial), so shipping them in a paid app requires retraining your own model.
[g] Snowboy (KITT.AI) was archived in 2020 and PocketSphinx is CMU-Sphinx-era; both appear as the OSS baselines in Picovoice's benchmark, not as current shipping options.

For technical evaluators

The technical details

VoxRT's wake-word is a compact neural detector on a mobile-first Rust runtime with a stateless C ABI. It ships today as native Android (JitPack) and iOS (Swift Package) modules, with a published mobile real-time factor, and the model is small enough to target microcontrollers next. Below are the measured numbers and the footprint that lands in your app.

0.9966
ROC AUC on a hard held-out test set (AP 0.9899)
0.021
real-time factor on a Snapdragon 662 — ~2.1% of one core
0.015
real-time factor on iPhone 13 Pro Max — ~150 µs / 10 ms frame
~103 KB
wake-phrase model on disk (~48K parameters)
8.5×
faster than the scalar baseline — NEON-optimized runtime on supported ARM

The 0.021 real-time factor is the HIGH_PERF, affinity-pinned measurement. Default AUTO mode is built for normal app integration and varies with device scheduling and thermal state.
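
As a quick check on how the figures above relate, real-time factor is processing time divided by the duration of the audio processed. The sketch below reproduces the iPhone number from its per-frame time. The per-frame time it derives for the Snapdragon 662 assumes the same 10 ms frame, which the page states only for the iPhone figure.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of audio processed."""
    return processing_seconds / audio_seconds


FRAME_SECONDS = 0.010  # 10 ms frame, as quoted for the iPhone figure

# iPhone 13 Pro Max: ~150 microseconds of compute per 10 ms frame.
iphone_rtf = real_time_factor(150e-6, FRAME_SECONDS)

# Snapdragon 662: RTF 0.021, converted back to per-frame compute time
# (assumption: the same 10 ms frame applies on this device).
sd662_frame_seconds = 0.021 * FRAME_SECONDS

print(f"iPhone RTF: {iphone_rtf:.3f}")
print(f"SD662 per-frame compute: {sd662_frame_seconds * 1e6:.0f} microseconds")
print(f"SD662 share of one core: {0.021 * 100:.1f}%")
```

An RTF of 0.021 therefore means roughly 2.1% of one core is busy while the detector runs continuously.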

What it costs in your app

  • Swift wrapper source (one file): ~7 KB
  • Wake-phrase model (fp16): ~103 KB
  • App-binary delta after extraction + dead-code elimination: 2–3 MB

The wake-phrase model itself is ~103 KB; the 2–3 MB app delta includes the extracted native runtime and packaging overhead.

Engine by engine

How each one really compares

VoxRT

MIT · runtime + weights

VoxRT is the wake-word engine for teams that need a custom, on-device trigger inside a commercial product — without opaque licensing, a cloud dependency, or porting work. MIT runtime and MIT weights, a native Android and iOS SDK today, a ~103 KB model, and a measured mobile real-time factor of 0.021 on a Snapdragon 662. Custom phrases in any language are tuned per customer on request — a paid engagement rather than a self-serve console at v0.1. It wins where a buyer feels pain: license clarity, native mobile shipping, tiny footprint, and published mobile performance.

Picovoice Porcupine

Closed commercial SDK

Picovoice Porcupine is a closed commercial wake-word SDK with broad platform support, a self-serve keyword console, and published benchmark results on six built-in phrases. It doesn't publish an RTF for low-cost Android phones or iPhones, its docs don't state a model size, and commercial custom-keyword deployment requires a paid plan. The useful comparison is practical: Porcupine publishes Raspberry Pi numbers; VoxRT publishes measured Android and iOS mobile RTF, with MIT weights and no commercial runtime lock-in.

openWakeWord

Code Apache-2.0 · weights CC-BY-NC-SA

The modern open-source option, with a clever pretrained-embedding backbone and easy ~1-hour custom training on Colab. The trap is the weights: the pretrained models are CC-BY-NC-SA 4.0 (non-commercial), so you can't ship them in a paid app and would have to train your own model instead. It also ships as a Python package only, with no first-party native mobile SDK.

Legacy baselines

Mycroft Precise · Snowboy · PocketSphinx

Older OSS baselines kept in the table for context only: Mycroft Precise (dormant since 2019), Snowboy (archived 2020) and PocketSphinx (CMU-Sphinx-era). None of them is something you'd ship in a new product today.

Read the numbers carefully

Why the figures aren't apples-to-apples

We'd rather hand you the caveats than a tidy leaderboard. Every accuracy and speed figure above is vendor-self-reported, and no independent academic benchmark covers all of these engines on a common test set.

Accuracy is measured on different phrases. Our ROC AUC 0.9966 is on our custom "Hey Assistant" phrase against a set of hard negatives; Porcupine's 2.7% miss rate is averaged over six built-in keywords; openWakeWord's "beats Porcupine" claim is on its Alexa model on a small dataset its own authors say to read cautiously. Custom-keyword accuracy is consistently lower than built-in for every vendor, so don't read a custom number against a built-in one.
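
To show why a "miss rate at a fixed false-alarm rate" is not interchangeable with a ROC AUC or a precision/recall pair, here is a minimal sketch of how such a figure can be computed. The scores are made-up illustration data, `miss_rate_at_fa_budget` is a hypothetical helper, and this is not Picovoice's benchmark code.

```python
def miss_rate_at_fa_budget(positive_scores, negative_scores, allowed_false_alarms):
    """Miss rate at the lowest threshold that keeps false alarms within budget.

    A detection fires when score > threshold. `negative_scores` holds one
    peak score per candidate false-alarm event in the negative audio.
    """
    ranked = sorted(negative_scores, reverse=True)
    if allowed_false_alarms < len(ranked):
        # At most `allowed_false_alarms` negatives score strictly above
        # the (allowed + 1)-th highest negative score.
        threshold = ranked[allowed_false_alarms]
    else:
        threshold = float("-inf")  # the budget never binds
    misses = sum(1 for s in positive_scores if s <= threshold)
    return threshold, misses / len(positive_scores)


# Budget of 1 false alarm per 10 hours, over 20 hours of negative audio.
negative_hours = 20
hours_per_false_alarm = 10
allowed = negative_hours // hours_per_false_alarm

positives = [0.99, 0.95, 0.90, 0.70, 0.58, 0.92, 0.85, 0.40, 0.96, 0.75]
negatives = [0.97, 0.88, 0.61, 0.30, 0.12]

threshold, miss_rate = miss_rate_at_fa_budget(positives, negatives, allowed)
print(f"threshold={threshold:.2f}  miss rate={miss_rate:.0%}")
```

The result depends on the negative audio, the noise level and the false-alarm budget chosen, which is one reason a miss rate from one benchmark cannot be read against an AUC from another.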

Speed is the axis nobody but us publishes for low-cost Android. Picovoice reports 0.6% CPU on a Raspberry Pi 5; scaling by the Geekbench single-core ratio (~3×) puts that near 1.8% on a Snapdragon 662, against our measured 2.1%. That 0.3-point gap is well inside the run-to-run variation caused by the device's thermal and scheduling state, so the honest read is a near-tie, not an advantage either way. Our 0.021 is a warm, affinity-pinned, high-performance figure; a phone's default mode runs a little higher.
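
The scaling estimate in the paragraph above is simple arithmetic, sketched here. The ~3× ratio is the page's own rounded Geekbench single-core figure, and `scale_cpu_percent` is a hypothetical helper. Linear scaling by a benchmark ratio is a rough approximation, not a measurement.

```python
def scale_cpu_percent(cpu_percent, single_core_ratio):
    """Estimate CPU load on a slower core by linear single-core scaling."""
    return cpu_percent * single_core_ratio


porcupine_rpi5_percent = 0.6  # Picovoice's published Raspberry Pi 5 figure
rpi5_to_sd662_ratio = 3.0     # approximate Geekbench single-core ratio

porcupine_sd662_estimate = scale_cpu_percent(porcupine_rpi5_percent, rpi5_to_sd662_ratio)
voxrt_sd662_measured = 0.021 * 100  # RTF 0.021 expressed as % of one core
gap_points = voxrt_sd662_measured - porcupine_sd662_estimate

print(f"Porcupine, estimated on SD662: {porcupine_sd662_estimate:.1f}%")
print(f"VoxRT, measured on SD662:      {voxrt_sd662_measured:.1f}%")
print(f"Gap: {gap_points:.1f} percentage points")
```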

What's solid and comparable is the rest of the table: license terms, whether there's a first-party native mobile SDK, and whether a vendor publishes any mobile number at all.

FAQ

On-device wake-word, answered

What is the best on-device wake-word engine?

There is no independent third-party benchmark that covers every on-device wake-word engine on a common test set, so any single "best" claim is vendor-self-reported. The practical choice comes down to license, whether there is a native mobile SDK, and published mobile performance. VoxRT ships an MIT runtime and MIT weights with native Android and iOS modules, a ROC AUC of 0.9966 on a hard held-out test set, and a measured real-time factor of 0.021 on a Snapdragon 662.

Can VoxRT detect a custom wake phrase?

Yes. The default reference phrase is "Hey Assistant", and custom phrases in any language are tuned per customer as a paid engagement. VoxRT does not offer a self-serve keyword-training console at v0.1 — custom phrases are delivered as a tuned model rather than generated in a UI.

What is a good real-time factor for a wake-word engine?

Real-time factor is the fraction of one CPU core needed to keep up with audio in real time, so lower is better. VoxRT measures a real-time factor of 0.021 — about 2.1% of one core — on a Snapdragon 662, and 0.015 on an iPhone 13 Pro Max, light enough to leave always-on. Picovoice publishes 0.6% CPU on a Raspberry Pi 5; scaled to a Snapdragon 662 that is roughly 1.8%, so the two are effectively a tie on raw speed.

Put VoxRT wake-word on your device

Tell us your wake phrase, target devices, and latency/battery constraints.

Try now