Vak Sahayak

Next.js 15 · React 19 · Tailwind CSS · LiveKit · Sarvam AI · Groq LLM · Silero VAD

overview.

A working prototype, built as a proposal, for a voice-driven government form-filling service. A citizen selects a service (such as Aadhaar, PAN, or Ration Card), presses a button, and speaks naturally. An AI agent walks through each field conversationally, confirms what it heard, and fills in the on-screen form live. The system runs end-to-end on LiveKit, with Sarvam AI handling Indic speech-to-text and text-to-speech and Groq LLMs driving the conversation.

technical implementation.

Finite State Machine (FSM) Agent Control

Engineered a finite state machine in the LiveKit backend with strict phase transitions ('greeting' → 'collecting' → 'review' → 'submitted'). A cursor enforces the schema's field order during collection, tool execution is gated by the current phase, and any correction the user makes during review forces a fresh read-back before final submission.
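The FSM rules above can be illustrated with a minimal sketch. Only the four phase names come from this write-up; the tool names, `FieldSpec`, and `FormFsm` are hypothetical, and a real agent would call these methods from its LLM tool handlers rather than directly. The cursor only accepts the field it points at, each tool is rejected outside its phase, and a correction invalidates the previous read-back so `submit()` refuses to run until the form is read back again.

```typescript
// Sketch of a phase-gated form-filling FSM. Phase names come from the write-up;
// FieldSpec, the tool names and FormFsm are hypothetical.

type Phase = "greeting" | "collecting" | "review" | "submitted";
type Tool = "start" | "setField" | "readBack" | "correctField" | "submit";

interface FieldSpec {
  id: string;
  label: string;
}

// Which tools the LLM may call in each phase; anything else is rejected.
const ALLOWED: Record<Phase, readonly Tool[]> = {
  greeting: ["start"],
  collecting: ["setField"],
  review: ["readBack", "correctField", "submit"],
  submitted: [],
};

class FormFsm {
  phase: Phase = "greeting";
  private readonly fields: readonly FieldSpec[];
  private cursor = 0; // index of the next field to collect
  private readonly values = new Map<string, string>();
  private readBackCurrent = false; // has the user heard the latest values?

  constructor(fields: readonly FieldSpec[]) {
    this.fields = fields;
  }

  private guard(tool: Tool): void {
    if (!ALLOWED[this.phase].includes(tool)) {
      throw new Error(`tool "${tool}" is not allowed in phase "${this.phase}"`);
    }
  }

  start(): void {
    this.guard("start");
    this.phase = this.fields.length > 0 ? "collecting" : "review";
  }

  // Only the field under the cursor can be filled, which enforces schema order.
  setField(id: string, value: string): void {
    this.guard("setField");
    const expected = this.fields[this.cursor];
    if (!expected || expected.id !== id) {
      throw new Error(`expected field "${expected?.id}", got "${id}"`);
    }
    this.values.set(id, value);
    this.cursor += 1;
    if (this.cursor === this.fields.length) this.phase = "review";
  }

  // Produces the summary the agent speaks back and marks it as heard.
  readBack(): string {
    this.guard("readBack");
    this.readBackCurrent = true;
    return this.fields.map((f) => `${f.label}: ${this.values.get(f.id) ?? ""}`).join("; ");
  }

  // A correction invalidates the previous read-back.
  correctField(id: string, value: string): void {
    this.guard("correctField");
    if (!this.fields.some((f) => f.id === id)) throw new Error(`unknown field "${id}"`);
    this.values.set(id, value);
    this.readBackCurrent = false;
  }

  submit(): Record<string, string> {
    this.guard("submit");
    if (!this.readBackCurrent) throw new Error("read the form back before submitting");
    this.phase = "submitted";
    const out: Record<string, string> = {};
    for (const f of this.fields) out[f.id] = this.values.get(f.id) ?? "";
    return out;
  }
}

// Usage: collect two fields, correct one during review, then submit.
const fsm = new FormFsm([
  { id: "fullName", label: "Full name" },
  { id: "dob", label: "Date of birth" },
]);
fsm.start();
fsm.setField("fullName", "Asha Rao");
fsm.setField("dob", "1990-04-12");
fsm.readBack(); // "Full name: Asha Rao; Date of birth: 1990-04-12"
fsm.correctField("dob", "1990-04-21"); // submit() would now throw until the next readBack()
fsm.readBack();
const submitted = fsm.submit(); // { fullName: "Asha Rao", dob: "1990-04-21" }
```

Keeping the gating in plain code rather than in the prompt means a misbehaving LLM turn surfaces as a rejected tool call instead of a silently skipped field.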

WebRTC Voice Pipeline

Implemented the voice pipeline on LiveKit's WebRTC transport, streaming audio between the Next.js frontend and a self-contained Node.js backend agent so that conversational turn-taking stays close to real time.

Dynamic Schema Injection

Designed a schema-agnostic voice agent. The frontend mints a LiveKit access token and dispatches the agent with the selected form's schema embedded in the job metadata, so new forms can be added without any backend changes.
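Because LiveKit job metadata is a free-form string, one plausible arrangement is for the frontend to send a plain JSON description of the form, from which the agent rebuilds its prompts and validators. The sketch below assumes that arrangement: `FormSchema`, `FieldSchema`, `encodeFormMetadata`, and `decodeFormMetadata` are hypothetical names, and the actual LiveKit dispatch and job-context calls are left out rather than guessed. The agent side treats the string as untrusted input and checks its shape before use.

```typescript
// Hypothetical wire format for a form schema carried in LiveKit job metadata.
type FieldType = "text" | "digits" | "date" | "choice";

interface FieldSchema {
  id: string;
  label: string;
  type: FieldType;
  required: boolean;
  length?: number; // for "digits", e.g. 12 for an Aadhaar number
  options?: string[]; // for "choice"
}

interface FormSchema {
  formId: string;
  title: string;
  fields: FieldSchema[];
}

const FIELD_TYPES: readonly string[] = ["text", "digits", "date", "choice"];

// Frontend: serialize the selected form before dispatching the agent.
function encodeFormMetadata(form: FormSchema): string {
  return JSON.stringify(form);
}

// Checks one field descriptor and copies only the known properties.
function parseField(raw: unknown, index: number): FieldSchema {
  if (typeof raw !== "object" || raw === null) throw new Error(`field ${index} is not an object`);
  const r = raw as Record<string, unknown>;
  const id = r["id"];
  const label = r["label"];
  const type = r["type"];
  if (typeof id !== "string" || typeof label !== "string") {
    throw new Error(`field ${index} needs string id and label`);
  }
  if (typeof type !== "string" || !FIELD_TYPES.includes(type)) {
    throw new Error(`field ${index} has unsupported type`);
  }
  const field: FieldSchema = { id, label, type: type as FieldType, required: r["required"] === true };
  const length = r["length"];
  if (typeof length === "number") field.length = length;
  const options = r["options"];
  if (Array.isArray(options)) field.options = options.map(String);
  return field;
}

// Agent: the metadata string is untrusted input, so check its shape before use.
function decodeFormMetadata(raw: string): FormSchema {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("metadata is not an object");
  const obj = data as Record<string, unknown>;
  const formId = obj["formId"];
  const title = obj["title"];
  const fields = obj["fields"];
  if (typeof formId !== "string" || typeof title !== "string" || !Array.isArray(fields)) {
    throw new Error("metadata needs formId, title and fields");
  }
  return { formId, title, fields: fields.map((f, i) => parseField(f, i)) };
}

// Usage: a hypothetical two-field Aadhaar form making the round trip.
const aadhaarForm: FormSchema = {
  formId: "aadhaar-update",
  title: "Aadhaar Update",
  fields: [
    { id: "fullName", label: "Full name", type: "text", required: true },
    { id: "aadhaarNumber", label: "Aadhaar number", type: "digits", required: true, length: 12 },
  ],
};
const metadata = encodeFormMetadata(aadhaarForm); // placed in the dispatch request
const schema = decodeFormMetadata(metadata); // read inside the agent's job entry point
```

Since the agent only ever sees this description, adding a form means adding a new `FormSchema` object on the frontend, which is the zero-backend-change property described above.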

Indic AI Integration

Integrated Sarvam AI's speech models, which are trained specifically on Indian languages, to handle varied accents and dialects, and paired them with Silero VAD to detect accurately when the user starts and stops speaking.

key features.

Zero-typing conversational form filling over LiveKit's WebRTC architecture
Integration with Sarvam AI for high-accuracy Indic-language speech-to-text and text-to-speech
Groq LLM integration for ultra-low-latency, natural conversational flow
Schema-agnostic voice agent that adapts dynamically to different form metadata
Real-time visual form updates as the user speaks
Support for complex government forms, including Aadhaar, PAN, and Ration Card
Independent frontend and backend agent scaling via LiveKit Cloud

screenshots.

Voice-first assistant for filling government forms.

Choose between traditional typing or voice-first form filling.

Real-time visual form updates as the voice agent listens.

Automated calling capabilities with Ringg API.

Track previously filled and submitted government forms.

Final review and submission of the voice-filled form.

challenges & solutions.

Challenge: Managing unstructured speech for strict form fields

Solution: Used Groq LLMs to turn free-form conversational input into structured JSON that conforms to the dynamic Zod schemas provided by the frontend.
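Extraction of this kind needs a validation step: the LLM's JSON reply is parsed, each field is checked against a rule, and failures come back as messages the agent can turn into a follow-up question. The sketch below uses a small hand-rolled validator in place of Zod so that it has no dependencies; `validateExtraction`, `Rule`, and the field ids are hypothetical, and the normalization rules (stripping spaces and hyphens from spoken digit groups, requiring ISO `YYYY-MM-DD` dates) are assumptions, not the project's documented behavior.

```typescript
// Sketch: validate the JSON an LLM extracts from a transcript.
// A hand-rolled validator stands in for Zod; names and rules are hypothetical.

type Rule =
  | { kind: "text"; minLength: number }
  | { kind: "digits"; length: number }
  | { kind: "date" }; // ISO YYYY-MM-DD

type Result =
  | { ok: true; value: Record<string, string> }
  | { ok: false; errors: string[] };

function validateExtraction(llmOutput: string, rules: Record<string, Rule>): Result {
  let parsed: unknown;
  try {
    parsed = JSON.parse(llmOutput);
  } catch {
    return { ok: false, errors: ["LLM output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["expected a JSON object"] };
  }
  const input = parsed as Record<string, unknown>;
  const value: Record<string, string> = {};
  const errors: string[] = [];

  for (const [id, rule] of Object.entries(rules)) {
    const raw = input[id];
    if (typeof raw !== "string") {
      errors.push(`${id}: missing or not a string`);
      continue;
    }
    const text = raw.trim();
    switch (rule.kind) {
      case "text":
        if (text.length < rule.minLength) errors.push(`${id}: too short`);
        else value[id] = text;
        break;
      case "digits": {
        // Spoken numbers often arrive grouped ("1234 5678 9012"); drop separators.
        const digits = text.replace(/[\s-]/g, "");
        if (!/^\d+$/.test(digits) || digits.length !== rule.length) {
          errors.push(`${id}: expected ${rule.length} digits`);
        } else value[id] = digits;
        break;
      }
      case "date": {
        // Reject malformed strings and impossible dates such as 1990-02-30.
        const d = new Date(`${text}T00:00:00Z`);
        const valid =
          /^\d{4}-\d{2}-\d{2}$/.test(text) &&
          !Number.isNaN(d.getTime()) &&
          d.toISOString().slice(0, 10) === text;
        if (!valid) errors.push(`${id}: expected a valid YYYY-MM-DD date`);
        else value[id] = text;
        break;
      }
    }
  }
  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}

// Usage with a hypothetical three-field rule set.
const rules: Record<string, Rule> = {
  fullName: { kind: "text", minLength: 2 },
  aadhaarNumber: { kind: "digits", length: 12 },
  dateOfBirth: { kind: "date" },
};
const result = validateExtraction(
  '{"fullName":"Asha Rao","aadhaarNumber":"1234 5678 9012","dateOfBirth":"1990-04-12"}',
  rules,
);
// result.ok is true and result.value.aadhaarNumber is "123456789012"
```

Returning a list of errors instead of throwing lets the agent re-ask only for the fields that failed, rather than restarting the whole turn.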

Challenge: Reducing conversation latency

Solution: Deployed the agent on LiveKit Cloud in the ap-south region and routed LLM calls through Groq's LPU inference engine to keep response times low.