The most accurate voice activity detection for German speech. Built on neuroscience research, optimized for real-time telephone conversations.
Enterprise-grade audio processing infrastructure
German's complex sentence structure -- with verb-final clauses, separable prefixes, and compound words -- confuses standard voice activity detectors. They trigger too early, interrupt mid-sentence, or miss turn endings entirely.
HumanVAD combines prosodic analysis (pitch, energy, rhythm) with German-specific semantic analysis to determine when a speaker has truly finished their thought -- not just paused mid-sentence.
Prosodic analysis (40-50%) combined with semantic analysis (50-60%) for maximum accuracy.
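As a rough illustration, the two streams can be combined as a weighted sum of per-stream turn-end probabilities. The sketch below uses the stated 40-50% / 50-60% split; the function and variable names are illustrative assumptions, not HumanVAD's actual internals.

```python
def fuse_scores(prosodic_score: float, semantic_score: float,
                prosodic_weight: float = 0.45) -> float:
    """Combine per-stream turn-end probabilities into one score.

    Hypothetical fusion step: prosodic_weight sits in the stated
    40-50% band, and the semantic stream takes the remainder.
    """
    semantic_weight = 1.0 - prosodic_weight
    return prosodic_weight * prosodic_score + semantic_weight * semantic_score

# A falling pitch contour (high prosodic score) plus a syntactically
# complete German sentence (high semantic score) clears a 0.60
# turn-end threshold with room to spare.
score = fuse_scores(prosodic_score=0.8, semantic_score=0.9)
is_turn_end = score >= 0.60
```

The point of the weighting is that neither stream alone decides: a pause with falling pitch but an unfinished verb-final clause keeps the semantic score low, holding the fused score under the threshold.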
100% accuracy on Banking, Medical, and Restaurant calls. Optimized per domain.
0.077ms latency with 13,061 chunks/sec throughput. Process audio as it arrives.
34KB memory footprint. No GPU required. Runs on embedded devices and edge servers.
Install from PyPI, initialize the detector, and start processing audio. A few lines of code to production-ready German VAD.
from humanvad import ExcellenceVADGerman

# Initialize with a 0.60 turn-end threshold
vad = ExcellenceVADGerman(turn_end_threshold=0.60)

# Process an audio chunk together with its transcript
result = vad.process_audio(
    audio_chunk,
    transcript="Das Hotel hat fünfzig Zimmer."
)

if result['action'] == 'interrupt':
    agent.start_speaking()
HumanVAD is MIT licensed and free forever. Professional support plans fund continued development and research.
Priority issues, email support (24h), integration guidance
Dedicated channel, 4hr/mo consulting, SLA, on-call support
Pay-per-use hosted API at zedigital-humanvad.fly.dev
HumanVAD is specifically optimized for German. The prosodic and semantic analysis models are trained on German telephone conversations. The basic energy-based VAD will work on any language, but the hybrid accuracy advantage is German-specific.
HumanVAD accepts 16kHz PCM audio (16-bit mono). The library includes utilities to convert from other formats. The hosted API accepts WAV, WebM, and MP3 files and handles conversion automatically.
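If you cannot use the library's own conversion utilities, the required format can be approximated from interleaved 16-bit stereo PCM with a naive standard-library sketch like the one below (the function name is ours, not part of HumanVAD). It averages the channels and decimates 48 kHz down to 16 kHz; production code should low-pass filter before decimating.

```python
import array

def to_mono_16k(stereo_48k: array.array) -> array.array:
    """Down-mix interleaved stereo 16-bit PCM at 48 kHz to mono 16 kHz.

    Naive sketch: average the L/R pair of each frame, then keep every
    third sample (48000 / 16000 = 3). No anti-aliasing filter, so use
    HumanVAD's utilities or ffmpeg/sox for real audio.
    """
    mono = array.array('h', (
        (stereo_48k[i] + stereo_48k[i + 1]) // 2
        for i in range(0, len(stereo_48k), 2)
    ))
    return array.array('h', mono[::3])

# Six stereo frames at 48 kHz become two mono samples at 16 kHz.
frames = array.array('h', [100, 200, 101, 201, 102, 202,
                           103, 203, 104, 204, 105, 205])
out = to_mono_16k(frames)  # -> array('h', [150, 153])
```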
Yes. HumanVAD is released under the MIT license. You can use it in any project, commercial or otherwise, without restriction. Attribution is appreciated but not required.
The hosted API at zedigital-humanvad.fly.dev provides the same detection capabilities via REST endpoints. It handles audio format conversion, scales automatically, and requires no local installation. It charges €0.002 per request.
HumanVAD processes audio in chunks as small as 10ms. For the hybrid prosodic + semantic analysis, at least 200ms of audio context is recommended for optimal accuracy. The streaming API accumulates context automatically.
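Context accumulation of this kind can be pictured as a sliding window over incoming chunks. The sketch below (our own class, not part of HumanVAD's API) rolls 10ms chunks of 16kHz audio into a 200ms window, assuming 160 samples per chunk and 3,200 samples of context.

```python
from collections import deque

SAMPLE_RATE = 16_000
CHUNK_MS = 10                                        # smallest supported chunk
CONTEXT_MS = 200                                     # recommended analysis context
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000       # 160 samples
CONTEXT_SAMPLES = SAMPLE_RATE * CONTEXT_MS // 1000   # 3200 samples

class ContextBuffer:
    """Rolls incoming 10 ms chunks into a sliding 200 ms window.

    Hedged sketch of what the streaming API's accumulation could look
    like; a deque with maxlen discards the oldest samples automatically.
    """
    def __init__(self):
        self._samples = deque(maxlen=CONTEXT_SAMPLES)

    def push(self, chunk):
        self._samples.extend(chunk)

    @property
    def ready(self):
        return len(self._samples) >= CONTEXT_SAMPLES

    def window(self):
        return list(self._samples)

buf = ContextBuffer()
for _ in range(20):                  # 20 chunks x 10 ms = 200 ms
    buf.push([0] * CHUNK_SAMPLES)
# buf.ready is now True; buf.window() holds exactly 3200 samples
```

Until the buffer is full, a caller could fall back to prosody-only scoring; once 200ms has accumulated, every new 10ms chunk yields a fresh full-context window.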