Vak Sahayak

Next.js 15, React 19, Tailwind CSS, LiveKit, Sarvam AI, Groq LLM, Silero VAD

overview.

Vak Sahayak is a working prototype, built as a proposal for a voice-driven government form-filling service. A citizen selects a service (such as Aadhaar, PAN, or Ration Card), presses a button, and speaks naturally. An AI agent walks through each field conversationally, confirms what it heard, and fills the on-screen form live. The system runs end-to-end on LiveKit, using Sarvam AI for Indic STT/TTS and Groq LLMs for natural conversation.

technical implementation.

Finite State Machine (FSM) Agent Control

Built a finite state machine in the LiveKit backend agent with strict transitions ('greeting' → 'collecting' → 'review' → 'submitted'). The FSM uses a cursor to enforce the order in which fields are collected, gates tool execution by phase, and guarantees that any correction the user makes during review forces a fresh read-back before final submission.
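
The control flow described above could be sketched roughly as follows. This is a minimal illustration, not the project's code: only the four phases come from the description, while the tool names (`start`, `fill_field`, `read_back`, `correct_field`, `submit`), the `FormState` shape, and the `applyTool` function are hypothetical.

```typescript
// Hypothetical sketch: only the four phase names come from the write-up.
type Phase = "greeting" | "collecting" | "review" | "submitted";

interface FormState {
  phase: Phase;
  fields: string[];                // ordered field names taken from the form schema
  cursor: number;                  // index of the next field the agent may fill
  values: Record<string, string>;
  readBackDone: boolean;           // true only if nothing changed since the last read-back
}

type ToolCall =
  | { tool: "start" }
  | { tool: "fill_field"; field: string; value: string }
  | { tool: "read_back" }
  | { tool: "correct_field"; field: string; value: string }
  | { tool: "submit" };

// Phase gate: each phase lists the only tools it accepts.
const ALLOWED: Record<Phase, ToolCall["tool"][]> = {
  greeting: ["start"],
  collecting: ["fill_field"],
  review: ["read_back", "correct_field", "submit"],
  submitted: [],
};

function createState(fields: string[]): FormState {
  return { phase: "greeting", fields, cursor: 0, values: {}, readBackDone: false };
}

// Pure transition function: returns the next state or throws on an illegal call.
function applyTool(state: FormState, call: ToolCall): FormState {
  if (ALLOWED[state.phase].indexOf(call.tool) === -1) {
    throw new Error(`Tool "${call.tool}" is not allowed in phase "${state.phase}"`);
  }
  switch (call.tool) {
    case "start":
      return { ...state, phase: state.fields.length > 0 ? "collecting" : "review" };
    case "fill_field": {
      // The cursor decides which field is next; the model cannot skip ahead.
      const expected = state.fields[state.cursor];
      if (call.field !== expected) {
        throw new Error(`Expected field "${expected}" but got "${call.field}"`);
      }
      const cursor = state.cursor + 1;
      const phase: Phase = cursor >= state.fields.length ? "review" : "collecting";
      return { ...state, cursor, phase, values: { ...state.values, [call.field]: call.value } };
    }
    case "read_back":
      return { ...state, readBackDone: true };
    case "correct_field": {
      if (!(call.field in state.values)) {
        throw new Error(`Unknown field "${call.field}"`);
      }
      // Any correction invalidates the previous read-back.
      return {
        ...state,
        values: { ...state.values, [call.field]: call.value },
        readBackDone: false,
      };
    }
    case "submit":
      if (!state.readBackDone) {
        throw new Error("Form must be read back after the latest change before submitting");
      }
      return { ...state, phase: "submitted" };
  }
}
```

Keeping the transition function pure means every tool call either yields a new state or fails loudly, so the LLM cannot push the conversation into a phase the FSM did not approve.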

WebRTC Voice Pipeline

Built the voice pipeline on LiveKit, streaming audio over WebRTC between the Next.js frontend and a self-contained Node.js backend agent to keep conversational latency close to real time.

Dynamic Schema Injection

Designed a schema-agnostic Voice Agent. The frontend mints a LiveKit token and dispatches the agent with the selected form's schema embedded in the job metadata, allowing new forms to be added with zero backend changes.
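
One way to picture the metadata hand-off is the sketch below. It assumes the job metadata is a JSON string and leaves out the LiveKit token and dispatch calls entirely. The `FormSchema` and `FieldSpec` shapes, the `version` wrapper, and the function names are hypothetical stand-ins for whatever the project actually sends.

```typescript
// Hypothetical schema shape; the real project may carry different properties.
interface FieldSpec {
  name: string;
  label: string;
  type: "text" | "date" | "number";
  required: boolean;
}

interface FormSchema {
  formId: string;
  title: string;
  fields: FieldSpec[];
}

// Frontend side: serialize the selected form into the job metadata string.
function encodeJobMetadata(schema: FormSchema): string {
  return JSON.stringify({ version: 1, schema });
}

function isFieldSpec(v: unknown): v is FieldSpec {
  if (typeof v !== "object" || v === null) return false;
  const f = v as Record<string, unknown>;
  const type = f["type"];
  return (
    typeof f["name"] === "string" &&
    typeof f["label"] === "string" &&
    (type === "text" || type === "date" || type === "number") &&
    typeof f["required"] === "boolean"
  );
}

// Agent side: parse and validate the metadata before trusting it.
function decodeJobMetadata(metadata: string): FormSchema {
  let parsed: unknown;
  try {
    parsed = JSON.parse(metadata);
  } catch (e) {
    throw new Error("Job metadata is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Job metadata must be a JSON object");
  }
  const schema = (parsed as Record<string, unknown>)["schema"];
  if (typeof schema !== "object" || schema === null) {
    throw new Error("Job metadata has no schema");
  }
  const s = schema as Record<string, unknown>;
  const formId = s["formId"];
  const title = s["title"];
  const fields = s["fields"];
  if (
    typeof formId !== "string" ||
    typeof title !== "string" ||
    !Array.isArray(fields) ||
    !fields.every(isFieldSpec)
  ) {
    throw new Error("Schema in job metadata is malformed");
  }
  return { formId, title, fields };
}

// The agent derives its field order from the schema rather than hard-coding it.
function fieldOrder(schema: FormSchema): string[] {
  return schema.fields.map((f) => f.name);
}

// Usage: a new form is just new data; the agent code stays the same.
const panForm: FormSchema = {
  formId: "pan",
  title: "PAN Card Application",
  fields: [
    { name: "fullName", label: "Full name", type: "text", required: true },
    { name: "dateOfBirth", label: "Date of birth", type: "date", required: true },
  ],
};
const receivedForm = decodeJobMetadata(encodeJobMetadata(panForm));
console.log(fieldOrder(receivedForm));
```

Validating on the agent side matters because the metadata crosses a process boundary: a malformed schema should fail at job start rather than partway through a conversation.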

Indic AI Integration

Integrated Sarvam AI's models specifically trained on Indian languages to handle varied accents and dialects, alongside Silero VAD for accurate voice activity detection.

key features.

  • Zero-typing conversational form filling over LiveKit's WebRTC architecture
  • Integration with Sarvam AI for high-accuracy Indic language Speech-to-Text and Text-to-Speech
  • Groq LLM integration for ultra-low latency, natural conversational flows
  • Schema-agnostic Voice Agent that adapts dynamically to different form metadata
  • Real-time visual form updates as the user speaks
  • Live support for complex government forms including Aadhaar, PAN, and Ration Card
  • Independent frontend and backend agent scaling via LiveKit Cloud

screenshots.

Vak Sahayak Hero

Voice-first assistant for filling government forms.

Two Ways to Fill

Choose between traditional typing or voice-first form filling.

Live Form Filling

Real-time visual form updates as the voice agent listens.

Sarvam AI Integration

Powered by Sarvam AI for accurate Indic speech-to-text.

Ringg Integration

Automated calling capabilities with the Ringg API.

Application History

Track previously filled and submitted government forms.

Form Submission

Final review and submission of the voice-filled form.

challenges & solutions.

Challenge: Managing unstructured speech for strict form fields

Solution: Used Groq LLMs to parse unstructured conversational input into structured JSON that conforms exactly to the dynamic Zod schemas provided by the frontend.
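
The validation step can be illustrated with the sketch below. To keep it dependency-free, a small hand-written validator stands in for Zod; in the project itself, the Zod schema's `safeParse` would presumably play this role. The `FieldRule` shape, `parseLlmReply`, and the sample rules are hypothetical, and the PAN pattern (five letters, four digits, one letter) is included only as an example of a strict field format.

```typescript
// Hypothetical stand-in for a Zod schema: one rule per form field.
interface FieldRule {
  name: string;
  required: boolean;
  pattern?: RegExp; // optional format check, e.g. for ID numbers
}

type ParseResult =
  | { ok: true; values: Record<string, string> }
  | { ok: false; errors: string[] };

// Extract the JSON object from an LLM reply and check it against the rules.
function parseLlmReply(raw: string, rules: FieldRule[]): ParseResult {
  // Models sometimes wrap JSON in chatter, so take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, errors: ["no JSON object found in reply"] };
  }
  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch (e) {
    return { ok: false, errors: ["reply is not valid JSON"] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["reply is not a JSON object"] };
  }
  const obj = data as Record<string, unknown>;
  const values: Record<string, string> = {};
  const errors: string[] = [];

  for (const rule of rules) {
    const v = obj[rule.name];
    if (v === undefined || v === null || v === "") {
      if (rule.required) errors.push(`${rule.name}: missing`);
      continue;
    }
    if (typeof v !== "string") {
      errors.push(`${rule.name}: expected a string`);
      continue;
    }
    if (rule.pattern && !rule.pattern.test(v)) {
      errors.push(`${rule.name}: invalid format`);
      continue;
    }
    values[rule.name] = v;
  }

  // Reject keys the schema does not define, so the model cannot invent fields.
  for (const key of Object.keys(obj)) {
    if (!rules.some((r) => r.name === key)) errors.push(`${key}: unknown field`);
  }

  return errors.length > 0 ? { ok: false, errors } : { ok: true, values };
}

// Usage with illustrative rules.
const panRules: FieldRule[] = [
  { name: "fullName", required: true },
  { name: "pan", required: true, pattern: /^[A-Z]{5}[0-9]{4}[A-Z]$/ },
];
const llmReply = 'Here is the data: {"fullName": "Asha Rao", "pan": "ABCDE1234F"}';
console.log(parseLlmReply(llmReply, panRules));
```

Returning a list of errors instead of throwing lets the agent ask the user again about only the fields that failed.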

Challenge: Reducing conversation latency

Solution: Deployed the agent on LiveKit Cloud in the ap-south region and used Groq's LPU inference engine to keep LLM response times low.