STT — Speech to Text

Ready — tap the microphone to begin
Tap the microphone to start or stop recording. The transcript is refined in real time as you speak.

File Upload

Drop audio files here or tap to select

wav, mp3, ogg, webm, mp4, flac — multiple files supported

Speaker Enrollment

Enroll voice samples so the system can identify who is speaking during diarization.

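For best results, record 3 to 5 samples per person using different microphones, distances, and environments. Each sample should contain 15 to 30 seconds of natural speech. The system averages all of a person's samples into a single voiceprint, which makes recognition more robust to changes in recording conditions.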
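
The averaging step described above can be sketched roughly as follows. This is a minimal illustration and not the system's actual code: it assumes each enrollment sample has already been converted to a fixed-length embedding vector by some speaker model, and the names `l2_normalize` and `build_voiceprint` are hypothetical. Normalizing each sample before averaging is also an assumption, made here so that a loud or long recording does not dominate the result.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_voiceprint(sample_embeddings):
    """Average several per-sample embeddings into one voiceprint.

    Hypothetical helper: `sample_embeddings` is a list of equal-length
    lists of floats, one per enrolled recording.
    """
    if not sample_embeddings:
        raise ValueError("need at least one enrollment sample")
    dim = len(sample_embeddings[0])
    if any(len(e) != dim for e in sample_embeddings):
        raise ValueError("all embeddings must have the same length")

    # Normalize each sample so every recording contributes equally.
    normalized = [l2_normalize(e) for e in sample_embeddings]

    # Element-wise mean across samples.
    mean = [sum(column) / len(normalized) for column in zip(*normalized)]

    # Re-normalize so the voiceprint is itself a unit vector.
    return l2_normalize(mean)


if __name__ == "__main__":
    # Three toy 3-dimensional "embeddings" for one speaker.
    samples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
    print(build_voiceprint(samples))
```

Recording in varied conditions matters here because the mean keeps what the samples have in common (the voice) and dilutes what differs between them (microphone and room characteristics).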

Or upload an audio file:
Recognition threshold: 0.55
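
One plausible reading of the recognition threshold is sketched below. The page does not say what score the threshold applies to, so this example assumes it is a cosine similarity between a segment's embedding and each enrolled voiceprint; the function names and the "unknown speaker" behavior are likewise hypothetical.

```python
import math

# Default taken from the page; treating it as a cosine-similarity
# cutoff is an assumption made for this sketch.
RECOGNITION_THRESHOLD = 0.55


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(segment_embedding, voiceprints,
                     threshold=RECOGNITION_THRESHOLD):
    """Return (name, score) for the best match, or (None, score) if the
    best score falls below the threshold.

    Hypothetical helper: `voiceprints` maps a speaker name to that
    speaker's enrolled voiceprint vector.
    """
    best_name, best_score = None, -1.0
    for name, voiceprint in voiceprints.items():
        score = cosine_similarity(segment_embedding, voiceprint)
        if score > best_score:
            best_name, best_score = name, score

    if best_score >= threshold:
        return best_name, best_score
    # Below the cutoff: treat the segment as an unknown speaker.
    return None, best_score


if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
    print(identify_speaker([0.9, 0.1], enrolled))
    print(identify_speaker([1.0, 1.0], enrolled, threshold=0.8))
```

Under this reading, raising the threshold makes the system more likely to label a segment as unknown than to misattribute it, while lowering it does the opposite.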