Custom Keyword Spotting

On-device custom keyword spotting

Detect a fixed vocabulary of spoken commands — play, pause, next, louder, stop — directly on the device. Hands-free control at a fraction of the latency and compute of full speech recognition.

Overview

Closed-vocabulary control, done efficiently

When your product only needs to recognize a known set of commands, running a full speech-to-text model is overkill. Keyword spotting uses a compact classifier trained on exactly your command set, so it responds faster, uses less battery, and is typically more accurate on those words than a general-purpose transcriber would be.

It runs on the same VoxRT on-device runtime as the rest of the stack and is gated by voice activity detection, so the classifier only runs when speech is present. As with the rest of the stack, no audio leaves the device.

The runtime

The runtime is the product

VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.

Keyword spotting runs on the same runtime as wake word, VAD, and streaming ASR: one Rust runtime crate and one NEON kernel set across the whole stack.

Capabilities

What you get

Your vocabulary

A classifier trained on exactly the commands your product needs.

Per-keyword confidence

Tune the confidence level for each command independently to get the precision you need.

Lighter than ASR

Lower latency and compute than full transcription for closed-vocabulary control.

Battery-aware

Voice-activity gated for always-listening devices.

On-device & offline

No network round-trips, no audio leaving the device.
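
To make the per-keyword confidence and voice-activity gating ideas concrete, here is a minimal Python sketch. It is not the VoxRT API: the `detect` function, the `THRESHOLDS` table, the threshold values, and the shape of the classifier output are all hypothetical, chosen only to illustrate the pattern.

```python
from typing import Callable, Dict, Optional

# Hypothetical per-command thresholds; these values are illustrative only
# and do not come from VoxRT.
THRESHOLDS: Dict[str, float] = {
    "play": 0.80,
    "pause": 0.80,
    "next": 0.85,
    "louder": 0.85,
    "stop": 0.70,  # assumed: a missed "stop" costs more than a false trigger
}


def detect(
    speech_prob: float,
    classify: Callable[[], Dict[str, float]],
    vad_threshold: float = 0.5,
) -> Optional[str]:
    """Return the detected command for one audio window, or None.

    speech_prob: assumed output of a voice activity detector (0.0 to 1.0).
    classify:    assumed keyword classifier, returning a score per command.
    """
    # VAD gate: the classifier is not invoked at all unless speech is likely.
    if speech_prob < vad_threshold:
        return None

    scores = classify()

    # Keep only commands that clear their own threshold. Commands without
    # a configured threshold can never pass.
    passing = {
        word: score
        for word, score in scores.items()
        if score >= THRESHOLDS.get(word, float("inf"))
    }
    if not passing:
        return None

    # Among the passing commands, report the highest-scoring one.
    return max(passing, key=passing.get)
```

In this sketch, raising a command's threshold trades missed detections for fewer false triggers on that command alone, and the gate keeps the classifier idle during silence, which is where the battery savings on an always-listening device would come from.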

How it works

Trained on your command set

Tell us the commands you need to recognize and the devices they have to run on, and we train and tune a keyword model for your product. We'll be in touch within a business day.

Build it on-device with VoxRT

Tell us your command set and which devices it has to run on.
