A hybrid approach combining prosodic analysis with German-specific semantic understanding for state-of-the-art voice activity detection.
Analyzes the acoustic properties of speech that carry meaning beyond words: pitch contours, energy patterns, speaking rate, and rhythm.
Analyzes the linguistic content of the transcript to determine whether a German sentence is grammatically and semantically complete.
result = vad.detect(audio_frame) # Returns: # {'is_speech': True, 'confidence': 0.94}
result = vad.process_audio(chunk, transcript="...") # Returns: # {'action': 'interrupt'|'wait', 'score': 0.87}
for event in vad.stream(audio_source): if event.turn_ended: handle_turn_end(event)
curl -X POST https://zedigital-humanvad.fly.dev/api/detect \
-F "file=@audio.wav"
# Returns JSON: speech_detected, segments, confidence
from humanvad import ExcellenceVADGerman vad = ExcellenceVADGerman( turn_end_threshold=0.60, prosody_weight=0.45, semantic_weight=0.55 ) # Process streaming audio for chunk in audio_stream: result = vad.process_audio( chunk, transcript=transcriber.get_text() ) if result['action'] == 'interrupt': print(f"Turn ended: {result['score']:.1%}") agent.respond()
# Health check curl https://zedigital-humanvad.fly.dev/api/health # Detect speech in audio file curl -X POST \ https://zedigital-humanvad.fly.dev/api/detect \ -F "file=@recording.wav" # Response: # { # "speech_detected": true, # "segment_count": 2, # "speech_percentage": 74, # "avg_confidence": 0.967, # "processing_time_ms": 0.077 # }
| Feature | HumanVAD | WebRTC VAD | Silero VAD |
|---|---|---|---|
| Accuracy (German) | 96.7% | 82.1% | 89.3% |
| Latency | 0.077ms | 1.2ms | 8.5ms |
| Memory | 34KB | 128KB | 2.1MB |
| German Support | Native | None | Basic |
| Semantic Analysis | Yes | No | No |
| Prosodic Analysis | Yes | No | No |
| Turn-End Detection | Yes | No | No |
| Throughput | 13,061/sec | 1,200/sec | 340/sec |
| Open Source | MIT | BSD | MIT |