The science behind every reading.
Voxavia's audio engine is the moat. Every parameter is calibrated against gold-standard references — Praat, MDVP, and published clinical norms — so the numbers stand up to scrutiny.
- Range: A2 to F♯5
- On target: 86%
- Stability: Good
- Take: 0:08
From microphone to measurement.
Voxavia's pipeline is decoupled from the UI so the same primitives power every drill. Every frame is gated for clarity before it counts.
Permission
Microphone capture via getUserMedia. Browser DSP (echo cancellation, noise suppression, and automatic gain control, or AGC) is deliberately disabled because those filters distort the time-domain signal that YIN-based pitch detection relies on.
Pipeline
MediaStreamSource → AnalyserNode (fftSize 2048) → per-frame Float32 → DSP module → emitted sample. Same plumbing across every feature.
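A minimal sketch of the frame-pulling step described above, not Voxavia's actual module. `AnalyserLike` is a structural stand-in for the two Web Audio `AnalyserNode` members used here, and `makeFrameReader` and `frameRms` are hypothetical names chosen for illustration.

```typescript
// Structural stand-in for the AnalyserNode members this sketch relies on.
interface AnalyserLike {
  fftSize: number;
  getFloatTimeDomainData(buffer: Float32Array): void;
}

type FrameProcessor<T> = (frame: Float32Array, sampleRate: number) => T;

// Hypothetical helper: returns a function that pulls one frame of
// time-domain samples and hands it to a DSP module.
function makeFrameReader<T>(
  analyser: AnalyserLike,
  sampleRate: number,
  process: FrameProcessor<T>
): () => T {
  // One buffer is reused for every frame to avoid per-frame allocation.
  const frame = new Float32Array(analyser.fftSize);
  return () => {
    analyser.getFloatTimeDomainData(frame);
    return process(frame, sampleRate);
  };
}

// Example DSP module: root-mean-square level of a frame.
function frameRms(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// In a browser, the wiring would look roughly like this:
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   analyser.fftSize = 2048;
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   const readLevel = makeFrameReader(analyser, ctx.sampleRate, frameRms);
```

Because the reader only depends on the two analyser members, any DSP module with the same `(frame, sampleRate)` shape can be swapped in without touching the capture code.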
Validity gate
A pitch frame counts only when clarity ≥ 0.9 and 60 Hz ≤ F0 ≤ 1500 Hz. A frame that fails either test is recorded as a gap rather than a pitch value, which keeps glissando and vibrato traces honest.
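The gate can be sketched as a single function. The thresholds come from the text above; the `PitchSample` type and the `gateFrame` name are assumptions made for this example.

```typescript
// A gated frame is either a usable pitch value or an explicit gap.
type PitchSample =
  | { kind: "pitch"; f0: number; clarity: number }
  | { kind: "gap" };

const MIN_CLARITY = 0.9;
const MIN_F0_HZ = 60;
const MAX_F0_HZ = 1500;

// Hypothetical helper: applies the clarity and frequency-range checks.
function gateFrame(f0: number, clarity: number): PitchSample {
  const valid =
    Number.isFinite(f0) &&
    clarity >= MIN_CLARITY &&
    f0 >= MIN_F0_HZ &&
    f0 <= MAX_F0_HZ;
  return valid ? { kind: "pitch", f0, clarity } : { kind: "gap" };
}
```

Storing gaps explicitly, instead of dropping the frame or holding the last value, lets a plot break the line where the signal was unreliable.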
Throttling
React state updates at ~10 fps for live readouts. The graph canvas redraws at 60 fps independently, reading directly from a ref-backed sample buffer so React state never enters the hot path.
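One way to picture the split between the two update rates is a ring buffer that the DSP loop writes to on every frame, plus a throttle in front of the slower readout. This is an illustrative sketch only: `SampleRing` and `makeThrottle` are hypothetical names, and in a React app the ring would presumably live in a ref while `publish` would call a state setter.

```typescript
// Fixed-capacity ring buffer; the canvas can read it on every animation frame.
class SampleRing {
  private readonly data: Float32Array;
  private writeIndex = 0;
  private count = 0;

  constructor(capacity: number) {
    this.data = new Float32Array(capacity);
  }

  push(value: number): void {
    this.data[this.writeIndex] = value;
    this.writeIndex = (this.writeIndex + 1) % this.data.length;
    this.count = Math.min(this.count + 1, this.data.length);
  }

  // Returns the stored samples, oldest first.
  toArray(): number[] {
    const out: number[] = [];
    const start = (this.writeIndex - this.count + this.data.length) % this.data.length;
    for (let i = 0; i < this.count; i++) {
      out.push(this.data[(start + i) % this.data.length]);
    }
    return out;
  }
}

// Forwards a value to `publish` at most once per `minIntervalMs`.
// The clock is passed in so the behaviour is easy to test.
function makeThrottle<T>(minIntervalMs: number, publish: (value: T) => void) {
  let last = -Infinity;
  return (value: T, nowMs: number): void => {
    if (nowMs - last >= minIntervalMs) {
      last = nowMs;
      publish(value);
    }
  };
}
```

With `minIntervalMs` set to 100, the readout updates about 10 times per second no matter how often the DSP loop runs, while the ring keeps every sample for the graph.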
Design decision
Why we turn off AGC, AEC, and noise suppression.
By default, most browsers apply automatic gain control (AGC), acoustic echo cancellation (AEC), and noise suppression to microphone capture. These filters work well for video calls, but they smear the time-domain signal that pitch detection and voice-quality DSP rely on.
Re-enabling them would silently corrupt every measurement Voxavia produces. So we disable them at the getUserMedia constraints level and lean on signal-design choices — calibration, SNR gating, validity windows — to keep noisy rooms honest.
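A sketch of what disabling these filters at the constraints level could look like. `echoCancellation`, `noiseSuppression`, and `autoGainControl` are the standard media track constraint names; the helper names `rawMicConstraints` and `openRawMic` and the local `RawMicConstraints` type are hypothetical.

```typescript
// Plain local type so the sketch does not depend on DOM typings.
interface RawMicConstraints {
  audio: {
    echoCancellation: boolean;
    noiseSuppression: boolean;
    autoGainControl: boolean;
  };
  video: boolean;
}

// Hypothetical helper: constraints asking the browser to leave the signal untouched.
function rawMicConstraints(): RawMicConstraints {
  return {
    audio: {
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false,
    },
    video: false,
  };
}

// Hypothetical helper: opens the microphone when a browser environment is available.
async function openRawMic(): Promise<unknown> {
  const mediaDevices = (globalThis as any).navigator?.mediaDevices;
  if (!mediaDevices) {
    throw new Error("getUserMedia is not available in this environment");
  }
  return mediaDevices.getUserMedia(rawMicConstraints());
}
```

Browsers can treat plain constraint values as preferences, so reading back `getSettings()` on the resulting audio track is one way to confirm which filters actually ended up off.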
The library underneath.
Built once, used everywhere. Every module emits a stream of measurements that feed drills, reports, and personal-best tracking.
Pitch (Pitchy / YIN)
DSP · F0, clarity. The foundation of every pitch-driven feature.
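For readers unfamiliar with YIN, here is a from-scratch sketch of its core steps: the difference function, the cumulative mean normalized difference, an absolute threshold, and parabolic interpolation. It is not Pitchy's API and not Voxavia's code; the `yinPitch` name, the 0.1 threshold, and the choice to report clarity as one minus the normalized difference at the chosen lag are all assumptions.

```typescript
interface PitchEstimate {
  f0: number;      // estimated fundamental frequency in Hz
  clarity: number; // 1 minus the normalized difference at the chosen lag (assumed definition)
}

function yinPitch(
  frame: Float32Array,
  sampleRate: number,
  threshold = 0.1
): PitchEstimate | null {
  const half = Math.floor(frame.length / 2);

  // Step 1: difference function d(tau) over a window of `half` samples.
  const diff = new Float64Array(half);
  for (let tau = 1; tau < half; tau++) {
    let sum = 0;
    for (let i = 0; i < half; i++) {
      const delta = frame[i] - frame[i + tau];
      sum += delta * delta;
    }
    diff[tau] = sum;
  }

  // Step 2: cumulative mean normalized difference d'(tau).
  const cmnd = new Float64Array(half);
  cmnd[0] = 1;
  let running = 0;
  for (let tau = 1; tau < half; tau++) {
    running += diff[tau];
    cmnd[tau] = running > 0 ? (diff[tau] * tau) / running : 1;
  }

  // Step 3: first lag that dips below the threshold, then walk to the local minimum.
  let tau = 2;
  while (tau < half - 1 && cmnd[tau] >= threshold) tau++;
  if (tau >= half - 1) return null; // no periodicity found
  while (tau + 1 < half - 1 && cmnd[tau + 1] < cmnd[tau]) tau++;

  // Step 4: parabolic interpolation around the minimum for sub-sample precision.
  const a = cmnd[tau - 1];
  const b = cmnd[tau];
  const c = cmnd[tau + 1];
  const denom = a - 2 * b + c;
  const shift = denom !== 0 ? (0.5 * (a - c)) / denom : 0;

  return { f0: sampleRate / (tau + shift), clarity: 1 - b };
}
```

The nested loop makes this quadratic in the frame length, which is acceptable for a 2048-sample frame but is one reason production libraries use FFT-based shortcuts.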
CPPS
DSP · Smoothed cepstral peak prominence, a widely used voice-quality scalar.
Jitter
DSP · Cycle-to-cycle variation in pitch period.
Shimmer
DSP · Cycle-to-cycle variation in amplitude.
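Jitter and shimmer share one formula in their common "local" form: the mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage. The sketch below assumes that variant; the text does not say which of the several published variants Voxavia reports, and the function names are hypothetical.

```typescript
// Mean absolute difference between neighbours, relative to the mean, in percent.
function localPerturbationPercent(values: number[]): number {
  if (values.length < 2) return NaN;
  let diffSum = 0;
  for (let i = 1; i < values.length; i++) {
    diffSum += Math.abs(values[i] - values[i - 1]);
  }
  const meanDiff = diffSum / (values.length - 1);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return (100 * meanDiff) / mean;
}

// Local jitter: input is the duration of each glottal cycle.
function jitterLocalPercent(periods: number[]): number {
  return localPerturbationPercent(periods);
}

// Local shimmer: input is the peak amplitude of each glottal cycle.
function shimmerLocalPercent(peakAmplitudes: number[]): number {
  return localPerturbationPercent(peakAmplitudes);
}
```

The hard part in practice is upstream of this function: finding cycle boundaries reliably, which is why both measures are only meaningful on frames that pass a clarity gate.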
HNR
DSP · Harmonics-to-noise ratio.
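One common autocorrelation-based formulation reads HNR off the normalized autocorrelation r at the pitch period: HNR = 10·log10(r / (1 − r)). The sketch below assumes that formulation and leaves out the window corrections a full implementation would apply; whether Voxavia computes HNR this way is not stated, and the function names are hypothetical.

```typescript
// Normalized autocorrelation of a frame at a given lag (in samples).
function normalizedAutocorr(frame: Float32Array, lag: number): number {
  let cross = 0;
  let energyA = 0;
  let energyB = 0;
  for (let i = 0; i + lag < frame.length; i++) {
    cross += frame[i] * frame[i + lag];
    energyA += frame[i] * frame[i];
    energyB += frame[i + lag] * frame[i + lag];
  }
  const norm = Math.sqrt(energyA * energyB);
  return norm > 0 ? cross / norm : 0;
}

// Converts an autocorrelation peak r into a harmonics-to-noise ratio in dB.
// r is clamped away from 0 and 1 so the logarithm stays finite.
function hnrDb(r: number): number {
  const eps = 1e-9;
  const clamped = Math.min(1 - eps, Math.max(eps, r));
  return 10 * Math.log10(clamped / (1 - clamped));
}
```

As a reading aid: r = 0.5 means equal harmonic and noise energy (0 dB), while r = 0.99 corresponds to roughly 20 dB.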
Formants (LPC)
DSP · F1 / F2 / F3 every ~200 ms. Drives vowel-space plotting and singer's-formant scoring.
Vibrato
DSP · Rate (Hz), extent (cents), regularity.
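To make the units concrete: cents measure pitch distance as 1200·log2(f / ref), so 100 cents is one semitone. The sketch below estimates rate and extent from an F0 trace under simple assumptions that are not taken from the source: extent is half the peak-to-peak deviation around the geometric mean pitch, and rate comes from the spacing of upward zero crossings. Regularity is not covered, and all names are hypothetical.

```typescript
// Pitch distance from `ref` to `f` in cents (100 cents = 1 semitone).
function hzToCents(f: number, ref: number): number {
  return 1200 * Math.log2(f / ref);
}

interface VibratoStats {
  rateHz: number;      // oscillations per second
  extentCents: number; // half of the peak-to-peak deviation
}

function vibratoStats(f0Hz: number[], frameRateHz: number): VibratoStats {
  // Centre the trace on its geometric mean so deviations are symmetric in cents.
  const meanLog2 = f0Hz.reduce((sum, f) => sum + Math.log2(f), 0) / f0Hz.length;
  const centreHz = Math.pow(2, meanLog2);
  const cents = f0Hz.map((f) => hzToCents(f, centreHz));

  const extentCents = (Math.max(...cents) - Math.min(...cents)) / 2;

  // Frame indices where the trace crosses zero going upward.
  const crossings: number[] = [];
  for (let i = 1; i < cents.length; i++) {
    if (cents[i - 1] < 0 && cents[i] >= 0) crossings.push(i);
  }

  // Rate = completed cycles between the first and last upward crossing.
  const spanSec =
    crossings.length >= 2
      ? (crossings[crossings.length - 1] - crossings[0]) / frameRateHz
      : 0;
  const rateHz = spanSec > 0 ? (crossings.length - 1) / spanSec : NaN;

  return { rateHz, extentCents };
}
```

A real trace would need smoothing and gap handling before this step, since a single octave error inflates the peak-to-peak extent.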
Stability
DSP · Tremor, drift, deviation on sustained pitches.
Messa-di-voce shape fit
DSP · How well a crescendo→decrescendo amplitude curve matches the target shape.
Passaggio detection
DSP · Break-zone clustering across glide segments.
LTAS
DSP · Long-term average spectrum slope.
MFCC
DSP · Mel-frequency cepstral coefficients (used by accent and vowel-shape work).
GNE
DSP · Glottal-to-noise excitation ratio.
Subharmonic ratio
DSP · Diplophonia / period-doubling detector.
Spectral tilt
DSP · High-frequency rolloff; pairs with mic-distance work.
Singer's formant
DSP · FFT energy ratio in 2.5–3.6 kHz vs the broader band.
Sibilance
DSP · 4–10 kHz band ratio + spectral centroid.
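Both the singer's-formant and sibilance measures rest on the same two building blocks: an energy ratio between frequency bands and a spectral centroid. The sketch below assumes a linear magnitude spectrum as input (the Web Audio `getFloatFrequencyData` call reports decibels, so its output would need converting first). The function names and the choice of reference band are assumptions, since the text only says "the broader band".

```typescript
// Centre frequency of FFT bin `bin`.
function binFrequencyHz(bin: number, sampleRate: number, fftSize: number): number {
  return (bin * sampleRate) / fftSize;
}

// Sum of squared magnitudes for bins whose frequency lies in [loHz, hiHz].
function bandEnergy(
  magnitudes: ArrayLike<number>,
  sampleRate: number,
  fftSize: number,
  loHz: number,
  hiHz: number
): number {
  let energy = 0;
  for (let k = 0; k < magnitudes.length; k++) {
    const f = binFrequencyHz(k, sampleRate, fftSize);
    if (f >= loHz && f <= hiHz) energy += magnitudes[k] * magnitudes[k];
  }
  return energy;
}

// Energy in `band` relative to energy in `reference`; 0 when the reference is silent.
function bandEnergyRatio(
  magnitudes: ArrayLike<number>,
  sampleRate: number,
  fftSize: number,
  band: [number, number],
  reference: [number, number]
): number {
  const total = bandEnergy(magnitudes, sampleRate, fftSize, reference[0], reference[1]);
  if (total <= 0) return 0;
  return bandEnergy(magnitudes, sampleRate, fftSize, band[0], band[1]) / total;
}

// Magnitude-weighted mean frequency of the spectrum.
function spectralCentroidHz(
  magnitudes: ArrayLike<number>,
  sampleRate: number,
  fftSize: number
): number {
  let weighted = 0;
  let total = 0;
  for (let k = 0; k < magnitudes.length; k++) {
    weighted += binFrequencyHz(k, sampleRate, fftSize) * magnitudes[k];
    total += magnitudes[k];
  }
  return total > 0 ? weighted / total : 0;
}
```

For example, `bandEnergyRatio(mags, 44100, 2048, [2500, 3600], [0, 22050])` would give a singer's-formant style ratio against the full spectrum, and `[4000, 10000]` as the band would give a sibilance style ratio.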
Speaking rate
DSP · Approximate words/syllables per minute via onset detection.
Filler detection (beta)
DSP · Detects um, uh, and similar fillers in continuous speech.
Mic calibration
DSP · Per-device dB offset; pink-noise reference.
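A per-device offset can be pictured as the difference between a known reference level and what the microphone reports for it. The sketch below assumes the reference is a recording of a signal, such as pink noise, whose true level is known from elsewhere. The function names and this calibration procedure are illustrative assumptions, not Voxavia's documented method.

```typescript
// RMS level of a frame in dB relative to digital full scale.
// Returns -Infinity for digital silence.
function levelDbfs(frame: ArrayLike<number>): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return 20 * Math.log10(Math.sqrt(sum / frame.length));
}

// Offset that maps this device's dBFS readings onto the reference scale.
function calibrationOffsetDb(referenceFrame: ArrayLike<number>, knownLevelDb: number): number {
  return knownLevelDb - levelDbfs(referenceFrame);
}

// Applies a stored offset to a new measurement.
function calibratedLevelDb(frame: ArrayLike<number>, offsetDb: number): number {
  return levelDbfs(frame) + offsetDb;
}
```

An offset like this only stays valid while automatic gain control is off, since AGC changes the mapping from sound level to sample amplitude from moment to moment.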
Tone player
DSP · Reference-tone synthesis for matching, harmony, and ear-training drills.
Onset detection
DSP · RMS-envelope onset times; feeds DDK, melodic dictation, and rate.
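An RMS-envelope onset detector can be sketched in two steps: compute the RMS level per hop, then mark an onset whenever the level rises above a threshold after having fallen below a lower one. The hop size, both thresholds, and the function names below are assumptions chosen for illustration.

```typescript
// RMS level for each consecutive, non-overlapping block of `hop` samples.
function rmsEnvelope(signal: Float32Array, hop: number): number[] {
  const envelope: number[] = [];
  for (let start = 0; start + hop <= signal.length; start += hop) {
    let sum = 0;
    for (let i = start; i < start + hop; i++) sum += signal[i] * signal[i];
    envelope.push(Math.sqrt(sum / hop));
  }
  return envelope;
}

// Returns onset times in seconds. Two thresholds (hysteresis) stop a level
// that hovers near the trigger point from producing repeated onsets.
function detectOnsets(
  signal: Float32Array,
  sampleRate: number,
  hop = 441,
  onThreshold = 0.05,
  offThreshold = 0.02
): number[] {
  const envelope = rmsEnvelope(signal, hop);
  const onsets: number[] = [];
  let active = false;
  envelope.forEach((level, frame) => {
    if (!active && level >= onThreshold) {
      active = true;
      onsets.push((frame * hop) / sampleRate);
    } else if (active && level < offThreshold) {
      active = false;
    }
  });
  return onsets;
}
```

Counting the onsets in a window and dividing by its duration gives a rough events-per-second figure, which is the kind of number a rate or DDK measure could build on.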
Key suggestion
DSP · Maps a stored vocal range to comfortable song keys.
Fatigue index
DSP · Composite scalar over recent vs baseline jitter, HNR, range, and load.
Whose shoulders we stand on.
The literature and tools we calibrate against. The list is short on purpose — we choose published, widely cited references over novel methods.
YIN
De Cheveigné & Kawahara, 2002. Time-domain autocorrelation pitch detector — Voxavia uses the Pitchy implementation.
Praat
Boersma & Weenink. The open-source phonetics tool we calibrate against.
MDVP
Multi-Dimensional Voice Program. Long-standing clinical norms for jitter, shimmer, HNR.
GRBAS — Hirano 1981
Perceptual self-rating: Grade · Roughness · Breathiness · Asthenia · Strain.
RSI — Belafsky 2002
9-item Reflux Symptom Index for laryngopharyngeal reflux.
VHI — Jacobson 1997
Voice Handicap Index — patient-reported impact.
Trust, but verify.
Open the app, sing a vowel for ten seconds, and see the numbers. If you're a researcher who'd like to compare them against your own toolchain, get in touch.