Project Title:Zentry

Git-Organization:https://github.com/Zentry-org


<aside>

Purpose

Zentry AI Assistant aims to deliver a real-time, multilingual, and human-like voice assistant optimized for telephony and institutional automation. Its core purpose is to streamline interactions such as reception calls, helpline responses, and localized support by combining efficient STT, lightweight reasoning, and conversational output.

</aside>

<aside>

Scope

The project focuses on integrating FreeSWITCH for call handling, Whisper (CTranslate2) for high-accuracy speech-to-text, and Phi-3 Mini with RAG for reasoning. It is extendable with Meta MMS models for multilingual support and future TTS integration, ensuring adaptability across education, healthcare, and enterprise use cases.

</aside>

<aside>

Key Deliverables

<aside>

Background

Voice assistants are often cloud-reliant, expensive, and lack support for local languages. Zentry addresses these gaps by building a fully open-source, lightweight, and edge-deployable system optimized for real-world conditions like noisy phone calls. It leverages proven speech models and fast inference pipelines to provide a privacy-first and highly accurate solution.

</aside>

<aside>

Team

<aside>

Milestone Schedule

Date Milestone
May 21 setting up asterik,realising vm is a real deal
May 23 traversing through various STT LLMs
→checking whisper large v3
[too bad on Malayalam]
May 24 →vrlsc/whisper medium
[good but i found out about thennal/ whisper medium ml]
[thennal fine-tuned model has a WER of 11 on Common Voice 11.0 dataset]
June-July
</aside>

special mention to this guy https://kurianbenoy.com/talks/delft-fastai/index.html?utm_source=chatgpt.com#/malayalam-models-in-whisper-event

Draft of out main architecture

Draft of out main architecture

Whats happening after STT text transcription

Whats happening after STT text transcription

Actual Working!

Phi3 Pipeline

1. Call context builder (middleware)

2. Phi-3 stays lightweight