Voice Activity Detection

On-device voice activity detection

Streaming, on-device detection of when a person starts and stops speaking. Silero v5 weights on the VoxRT runtime — about 1.7 MB of app-size impact, and no audio ever leaves the device.


Overview

The foundational voice primitive

Voice activity detection tells you, frame by frame, whether someone is speaking. It's the building block the rest of a voice pipeline sits on: it gates wake-word and keyword-spotting models so they only run when there's speech, drives barge-in and interruption logic, and stands alone for record-trimming and turn-taking.
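As a rough illustration of how frame-by-frame decisions become "started speaking" and "stopped speaking" events, the sketch below applies two thresholds plus a short hangover to a stream of per-frame speech probabilities. It is written in Python for brevity and is not the VoxRT API: the names `SegmenterConfig` and `speech_segments`, the threshold values, and the hangover length are all assumptions chosen for the example. Only the 32 ms frame duration comes from this page.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

FRAME_MS = 32  # frame duration quoted on this page


@dataclass(frozen=True)
class SegmenterConfig:
    # Illustrative values only; these are assumptions, not VoxRT defaults.
    start_threshold: float = 0.5   # probability needed to open a segment
    end_threshold: float = 0.35    # below this, a frame counts as silence
    hangover_frames: int = 8       # silent frames (~256 ms) needed to close


def speech_segments(probs: Iterable[float],
                    cfg: SegmenterConfig = SegmenterConfig()
                    ) -> Iterator[Tuple[int, int]]:
    """Yield (start_ms, end_ms) for each detected speech segment."""
    start = None     # frame index where the current segment opened
    silent_run = 0   # consecutive silent frames seen inside a segment
    count = 0        # total frames consumed so far
    for i, p in enumerate(probs):
        count = i + 1
        if start is None:
            if p >= cfg.start_threshold:
                start, silent_run = i, 0
        elif p < cfg.end_threshold:
            silent_run += 1
            if silent_run >= cfg.hangover_frames:
                # The segment ends where the run of silence began.
                end = i - silent_run + 1
                yield start * FRAME_MS, end * FRAME_MS
                start = None
        else:
            # Probabilities between the two thresholds keep the segment open.
            silent_run = 0
    if start is not None:
        # Stream ended mid-speech: close at the last non-silent frame.
        yield start * FRAME_MS, (count - silent_run) * FRAME_MS


probs = [0.1] * 3 + [0.9] * 5 + [0.1] * 2 + [0.8] * 4 + [0.1] * 10
print(list(speech_segments(probs)))  # [(96, 448)]
```

The two-frame dip in the middle is shorter than the hangover, so it stays inside a single segment. That is the behaviour turn-taking and record-trimming logic usually want: brief pauses do not split an utterance.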

VoxRT ships the well-known Silero v5 weights (MIT-licensed) on its own from-scratch inference runtime — the accuracy of a proven model with a tiny binary footprint and no heavyweight ML framework dependency.

The runtime

The runtime is the product

VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on phone-class hardware.

VAD is the free, open-source showcase of that runtime: it runs Silero v5 at roughly 0.6 ms per 32 ms frame on an iPhone 13 Pro Max. Wake word and streaming ASR are built on the same foundation, and all three share the same Rust runtime crate and NEON kernel set.

Capabilities

What you get

Per-frame decisions

Streaming, low-latency speech / no-speech on every audio frame.

Proven model

Silero v5 weights, MIT-licensed — no ONNX Runtime or PyTorch Mobile.

Tiny footprint

~1.7 MB net app-size impact; runs comfortably on budget Android devices (3.05% real-time factor on a Snapdragon 662).

Power gate

Gates wake-word and keyword-spotting models so they only run on speech.

Fully on-device

No network, no microphone audio leaving the device.
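The power-gate idea can be sketched as a small wrapper that forwards audio to a more expensive model only around detected speech. This is a hedged Python illustration, not VoxRT code: `gate_frames`, `is_speech`, `downstream`, and the `pre_roll` and `hold` parameters are hypothetical names introduced here. The pre-roll buffer is an assumption about how such a gate is commonly built, so that the downstream model still sees the start of a word.

```python
from collections import deque
from typing import Callable, Iterable, Sequence

Frame = Sequence[float]


def gate_frames(frames: Iterable[Frame],
                is_speech: Callable[[Frame], bool],
                downstream: Callable[[Frame], None],
                pre_roll: int = 3,
                hold: int = 5) -> int:
    """Forward frames to `downstream` only around speech.

    Returns the number of frames forwarded.
    """
    backlog = deque(maxlen=pre_roll)  # most recent frames not yet forwarded
    hold_left = 0                     # trailing frames still to forward
    forwarded = 0
    for frame in frames:
        if is_speech(frame):
            # Flush the pre-roll first so the onset of speech is not lost.
            while backlog:
                downstream(backlog.popleft())
                forwarded += 1
            downstream(frame)
            forwarded += 1
            hold_left = hold
        elif hold_left > 0:
            # Keep the gate open briefly after speech ends.
            downstream(frame)
            forwarded += 1
            hold_left -= 1
        else:
            # Gate closed: remember the frame in case speech starts next.
            backlog.append(frame)
    return forwarded


# Toy stream: 10 silent frames, 2 speech frames, 10 silent frames.
stream = [[0.0]] * 10 + [[1.0]] * 2 + [[0.0]] * 10
seen = []
n = gate_frames(stream, lambda f: f[0] > 0.5, seen.append)
print(n, "of", len(stream), "frames reached the downstream model")  # 10 of 22
```

In this toy stream the downstream model runs on 10 frames (3 pre-roll, 2 speech, 5 hold) instead of 22. On real audio the saving depends on how much of the input is silence.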

Performance

Negligible cost per stream

1.85% real-time factor on iPhone 13 Pro Max (~0.6 ms per 32 ms frame)

3.05% real-time factor on a Snapdragon 662

~54 parallel VAD streams on a single core

~1.7 MB net app-size impact
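The figures above are related by simple arithmetic, which the following Python sketch makes explicit. The function names are illustrative, and the stream count is a theoretical ceiling that assumes inference is the only work on the core. The Snapdragon 662 values it prints are derived from the quoted real-time factor and are not measurements stated on this page.

```python
def real_time_factor(compute_ms: float, frame_ms: float) -> float:
    """Fraction of real time spent on inference for one stream."""
    return compute_ms / frame_ms


def compute_ms_per_frame(rtf: float, frame_ms: float) -> float:
    """Inference time per frame implied by a real-time factor."""
    return rtf * frame_ms


def max_streams_per_core(rtf: float) -> int:
    """Upper bound on concurrent streams one core could sustain."""
    return int(1.0 / rtf)


FRAME_MS = 32.0

# iPhone 13 Pro Max: 1.85% real-time factor
print(round(compute_ms_per_frame(0.0185, FRAME_MS), 3))  # 0.592
print(max_streams_per_core(0.0185))                      # 54

# Snapdragon 662: 3.05% real-time factor (derived values, not quoted)
print(round(compute_ms_per_frame(0.0305, FRAME_MS), 3))  # 0.976
print(max_streams_per_core(0.0305))                      # 32
```

The ~54 streams figure matches the 1.85% real-time factor, since 1 / 0.0185 is about 54. The same calculation for the 3.05% figure gives about 32.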

Footprint

About 1.7 MB in your app

Swift wrapper source: ~17 KB
Native xcframework, compressed (device slice): ~500 KB
Silero VAD weights (fp16): 1.2 MB
Net app-size impact: ~1.7 MB
