Voice Activity Detection

On-device voice activity detection

Streaming, on-device detection of when a person starts and stops speaking. It runs Silero v5 weights on the VoxRT runtime, adds about 1.7 MB to app size, and sends no audio off the device.

Overview

The foundational voice primitive

Voice activity detection tells you, frame by frame, whether someone is speaking. It's the building block the rest of a voice pipeline sits on: it gates wake-word and keyword-spotting models so they only run when there's speech, drives barge-in and interruption logic, and stands alone for record-trimming and turn-taking.
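
As an illustration of how frame-by-frame decisions become record-trimming or turn-taking boundaries, here is a minimal sketch that groups per-frame speech probabilities into segments. It is not VoxRT's API: the function name, the `on` / `off` thresholds, and the hangover length are assumptions chosen for the example, and the 32 ms frame length is taken from the performance figures on this page.

```python
from dataclasses import dataclass

FRAME_MS = 32  # frame length quoted in the performance figures on this page


@dataclass
class Segment:
    start_ms: int
    end_ms: int


def segment_speech(probs, on=0.5, off=0.35, hangover_frames=8):
    """Group per-frame speech probabilities into speech segments.

    Hypothetical helper: a segment opens when a frame reaches `on` and
    closes once `hangover_frames` consecutive frames fall below `off`.
    Frames between `off` and `on` keep an open segment alive (hysteresis).
    """
    segments = []
    start = None  # index of the frame that opened the current segment
    quiet = 0     # consecutive below-`off` frames seen inside a segment
    for i, p in enumerate(probs):
        if start is None:
            if p >= on:
                start, quiet = i, 0
        elif p < off:
            quiet += 1
            if quiet >= hangover_frames:
                # Close the segment at the first frame of the quiet run.
                end = i - quiet + 1
                segments.append(Segment(start * FRAME_MS, end * FRAME_MS))
                start = None
        else:
            quiet = 0
    if start is not None:
        # Audio ended mid-segment: drop any trailing quiet run, then close.
        end = len(probs) - quiet
        segments.append(Segment(start * FRAME_MS, end * FRAME_MS))
    return segments
```

The two thresholds keep a segment from flickering open and closed when the probability hovers near a single cutoff, and the hangover stops short pauses inside a sentence from ending a turn. Barge-in logic could react to the moment a segment opens; record-trimming could keep only the audio inside the returned segments.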

VoxRT ships the well-known Silero v5 weights (MIT-licensed) on its own from-scratch inference runtime — the accuracy of a proven model with a tiny binary footprint and no heavyweight ML framework dependency.

The runtime

The runtime is the product

VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.

VAD is the free, open-source showcase of that runtime. It runs Silero v5 at low per-frame latency (about 0.6 ms per 32 ms frame on an iPhone 13 Pro Max) and sits alongside wake word and streaming ASR. All three share the same Rust runtime crate and NEON kernel set.

Capabilities

What you get

Per-frame decisions

Streaming, low-latency speech / no-speech on every audio frame.

Proven model

Silero v5 weights, MIT-licensed — no ONNX Runtime or PyTorch Mobile.

Tiny footprint

~1.7 MB net app-size impact; runs comfortably on budget Android.

Power gate

Gates wake-word and keyword-spotting models so they only run on speech.

Fully on-device

No network, no microphone audio leaving the device.
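
The power-gate idea above can be sketched in a few lines. This is an illustrative pattern only, not VoxRT's interface: `is_speech` and `downstream` are hypothetical callables standing in for a VAD decision and a heavier model such as a wake-word detector, and the pre-roll buffer is an assumption added so the downstream model also sees the frames just before speech onset.

```python
from collections import deque


def gate_frames(frames, is_speech, downstream, preroll=3):
    """Forward frames to `downstream` only around detected speech.

    Hypothetical helper: non-speech frames are held in a short pre-roll
    buffer instead of being processed. When speech is detected, the
    buffered frames are flushed first, then the speech frame is forwarded.
    Returns the number of frames the downstream model actually received.
    """
    history = deque(maxlen=preroll)  # most recent non-speech frames
    forwarded = 0
    for frame in frames:
        if is_speech(frame):
            while history:
                downstream(history.popleft())
                forwarded += 1
            downstream(frame)
            forwarded += 1
        else:
            history.append(frame)
    return forwarded
```

With ten frames of which two contain speech and `preroll=2`, the downstream model receives four frames instead of ten. The saving grows with the share of silence in the stream.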

Performance

Negligible cost per stream

  • 1.85% real-time factor on iPhone 13 Pro Max (~0.6 ms per 32 ms frame)
  • 3.05% real-time factor on a Snapdragon 662
  • ~54 parallel VAD streams on a single core
  • ~1.7 MB net app-size impact
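
The figures above are related by simple arithmetic, sketched here. Real-time factor is processing time divided by audio duration, so it converts directly to per-frame cost and to an ideal ceiling on streams per core. The function names are illustrative, and the ceiling ignores scheduling and memory overhead. The ~54 figure appears to follow from the iPhone number (100 / 1.85); the Snapdragon 662 stream count is derived here and is not a published measurement.

```python
FRAME_MS = 32.0  # frame length quoted alongside the iPhone figure


def per_frame_ms(rtf_percent, frame_ms=FRAME_MS):
    """Processing time per frame implied by a real-time factor in percent."""
    return rtf_percent / 100.0 * frame_ms


def max_streams(rtf_percent):
    """Ideal number of real-time streams one core could sustain."""
    return int(100.0 // rtf_percent)


for device, rtf in [("iPhone 13 Pro Max", 1.85), ("Snapdragon 662", 3.05)]:
    print(f"{device}: {per_frame_ms(rtf):.2f} ms/frame, "
          f"up to {max_streams(rtf)} streams per core")
```

Under these assumptions the iPhone figure works out to about 0.59 ms per frame and 54 streams, and the Snapdragon 662 figure to about 0.98 ms per frame.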

Footprint

About 1.7 MB in your app

  • Swift wrapper source: ~17 KB
  • Native xcframework, compressed (device slice): ~500 KB
  • Silero VAD weights (fp16): 1.2 MB
  • Net app-size impact: ~1.7 MB

Build it on-device with VoxRT

Tell us what you're shipping and which devices it has to run on.
