On-device custom keyword spotting
Detect a fixed vocabulary of spoken commands — play, pause, next, louder, stop — directly on the device. Hands-free control at a fraction of the latency and compute of full speech recognition.
Closed-vocabulary control, done efficiently
When your product only needs to recognize a known set of commands, running a full speech-to-text model is overkill. A keyword spotter is a compact classifier trained on exactly your command set, so it responds faster, uses less battery, and can be more accurate on those words than a general-purpose transcriber.
It runs on the same VoxRT on-device runtime as the rest of the stack, gated by voice activity detection (VAD) so it only fires when there's speech. As with the rest of the stack, no audio leaves the device.
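To make the gating idea concrete, here is a minimal sketch in Rust. It is not VoxRT's API: `EnergyVad`, `gated_classify`, and the energy threshold are hypothetical names and values, and a simple mean-square energy check stands in for whatever voice activity detector the runtime actually uses. The point is only that the classifier is never invoked on frames the gate rejects.

```rust
/// Hypothetical energy-based voice activity gate (illustrative only,
/// not the VoxRT API or its actual VAD model).
pub struct EnergyVad {
    /// Mean-square energy above which a frame counts as speech (assumed value).
    pub threshold: f32,
}

impl EnergyVad {
    /// Returns true when the frame's mean-square energy exceeds the threshold.
    pub fn is_speech(&self, frame: &[f32]) -> bool {
        if frame.is_empty() {
            return false;
        }
        let energy = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        energy > self.threshold
    }
}

/// Runs `classify` only on frames the gate marks as speech.
/// Silent frames return `None`, so the classifier does no work on them.
pub fn gated_classify<F, T>(vad: &EnergyVad, frame: &[f32], classify: F) -> Option<T>
where
    F: FnOnce(&[f32]) -> T,
{
    if vad.is_speech(frame) {
        Some(classify(frame))
    } else {
        None
    }
}
```

In an always-listening loop, each incoming audio frame would pass through `gated_classify`, with the closure standing in for the keyword model.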
The runtime is the product
VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.
Keyword spotting runs on the same runtime as wake word detection, VAD, and streaming ASR: one Rust runtime crate and one NEON kernel set across the whole stack.
What you get
Your vocabulary
A classifier trained on exactly the commands your product needs.
Per-keyword confidence
Tune each command independently for the precision you need.
Lighter than ASR
Lower latency and compute than full transcription for closed-vocabulary control.
Battery-aware
Voice-activity gated for always-listening devices.
On-device & offline
No network round-trips, no audio leaving the device.
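Per-keyword confidence tuning can be pictured as a threshold table applied to the classifier's scores. The sketch below is an assumption-laden illustration, not the shipped interface: `KeywordThresholds`, its methods, and the example threshold values are all hypothetical. A command fires only if its score clears its own threshold, so a disruptive command like "stop" can demand more confidence than "play".

```rust
use std::collections::HashMap;

/// Hypothetical per-keyword decision step (names and values are illustrative).
pub struct KeywordThresholds {
    per_keyword: HashMap<String, f32>,
    /// Threshold used for keywords without an explicit entry.
    fallback: f32,
}

impl KeywordThresholds {
    pub fn new(fallback: f32) -> Self {
        Self { per_keyword: HashMap::new(), fallback }
    }

    /// Sets a dedicated confidence threshold for one keyword.
    pub fn set(&mut self, keyword: &str, threshold: f32) {
        self.per_keyword.insert(keyword.to_string(), threshold);
    }

    fn threshold_for(&self, keyword: &str) -> f32 {
        *self.per_keyword.get(keyword).unwrap_or(&self.fallback)
    }

    /// Returns the highest-scoring keyword whose score meets its own
    /// threshold, or `None` if no keyword qualifies.
    pub fn decide<'a>(&self, scores: &[(&'a str, f32)]) -> Option<&'a str> {
        scores
            .iter()
            .filter(|(kw, score)| *score >= self.threshold_for(kw))
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(kw, _)| *kw)
    }
}
```

Raising one keyword's threshold trades missed detections for fewer false triggers on that command alone, without affecting the others.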
Trained on your command set
Tell us the commands you need to recognize and the devices they have to run on, and we train and tune a keyword model for your product. We'll be in touch within a business day.