<aside>
Purpose
Zentry AI Assistant aims to deliver a real-time, multilingual, and human-like voice assistant optimized for telephony and institutional automation. Its core purpose is to streamline interactions such as reception calls, helpline responses, and localized support by combining efficient STT, lightweight reasoning, and conversational output.
</aside>
<aside>
Scope
The project focuses on integrating FreeSWITCH for call handling, Whisper (CTranslate2) for high-accuracy speech-to-text, and Phi-3 Mini with RAG for reasoning. It is extensible with Meta MMS models for multilingual support and future TTS integration, ensuring adaptability across education, healthcare, and enterprise use cases. A rough sketch of the STT leg appears just after this callout.
</aside>
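As an illustration only (not the project's actual code), here is a minimal sketch of the Whisper-on-CTranslate2 STT step, assuming the faster-whisper wrapper; the model size, audio file name, and language code are placeholders:

```python
# Minimal sketch of the STT leg using faster-whisper, the CTranslate2
# backend for Whisper. Model size and file path are placeholders.
from faster_whisper import WhisperModel

# int8 quantization keeps the model light enough for edge/CPU deployment
model = WhisperModel("medium", device="cpu", compute_type="int8")

segments, info = model.transcribe("call_audio.wav", language="ml", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```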
<aside>
Key Deliverables
</aside>
<aside>
Background
Voice assistants are often cloud-reliant, expensive, and lack support for local languages. Zentry addresses these gaps by building a fully open-source, lightweight, and edge-deployable system optimized for real-world conditions like noisy phone calls. It leverages proven speech models and fast inference pipelines to provide a privacy-first and highly accurate solution.
</aside>
<aside>
Team
</aside>
<aside>
Milestone Schedule
| Date | Milestone |
| --- | --- |
| May 21 | Set up Asterisk; realised a VM is a real deal |
| May 23 | Surveyed various STT models; checked Whisper Large v3 (too poor on Malayalam) |
| May 24 | Tried vrlsc/whisper medium (good), then found thennal/whisper-medium-ml, a fine-tuned model with a WER of 11 on the Common Voice 11.0 dataset |
| June–July |  |
</aside>
Special mention to Kurian Benoy's talk on Malayalam models in the Whisper event: https://kurianbenoy.com/talks/delft-fastai/index.html#/malayalam-models-in-whisper-event
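For context, one quick way to try the thennal/whisper-medium-ml checkpoint mentioned in the milestones is the Hugging Face transformers ASR pipeline; this is a sketch rather than the project's evaluation setup, and the audio file name is a placeholder:

```python
# Sketch: transcribing a Malayalam clip with the fine-tuned checkpoint
# noted in the milestone table. The file name is a placeholder.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="thennal/whisper-medium-ml",  # Whisper medium fine-tuned for Malayalam
    chunk_length_s=30,                  # chunk long-form audio into 30 s windows
)

print(asr("sample_call.wav")["text"])
```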
Draft of our main architecture
What's happening after STT text transcription
- `mod_audio_fork` forks the live call audio to the Python STT server (Whisper).
- The reply audio is produced as a .wav, and FreeSWITCH plays it back to the caller via `mod_external_media`.
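To make the `mod_audio_fork` step concrete, below is a minimal sketch of a Python WebSocket server that could receive the forked call audio and transcribe it. It assumes `mod_audio_fork` is configured to stream 16 kHz, 16-bit mono L16 PCM over WebSocket; the port, buffering window, and model size are all placeholders:

```python
# Sketch of a WebSocket receiver for audio forked by mod_audio_fork.
# Assumes 16 kHz, 16-bit mono L16 PCM frames; port/buffer size are placeholders.
import asyncio

import numpy as np
import websockets
from faster_whisper import WhisperModel

# int8-quantized Whisper keeps the server light enough for edge hardware
model = WhisperModel("medium", device="cpu", compute_type="int8")

async def handle_call(ws):
    """Accumulate forked PCM frames and transcribe in ~5 s chunks."""
    pcm = bytearray()
    async for message in ws:
        if not isinstance(message, bytes):
            continue  # skip any text (metadata) frames
        pcm.extend(message)
        if len(pcm) >= 2 * 16000 * 5:  # 16-bit samples at 16 kHz, ~5 seconds
            # Convert raw L16 PCM to the float32 waveform Whisper expects
            audio = np.frombuffer(bytes(pcm), dtype=np.int16).astype(np.float32) / 32768.0
            segments, _ = model.transcribe(audio, language="ml")
            print(" ".join(seg.text for seg in segments))
            pcm.clear()

async def main():
    async with websockets.serve(handle_call, "0.0.0.0", 8080):
        await asyncio.Future()  # serve until cancelled

asyncio.run(main())
```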