
overview.
This is a working prototype, built as a proposal for a voice-driven government form-filling service. A citizen selects a service (such as Aadhaar, PAN, or Ration Card), presses a button, and speaks naturally. An AI agent walks through each field conversationally, confirms what it heard, and fills the on-screen form live. The system runs end-to-end on LiveKit, using Sarvam AI for Indic STT/TTS and Groq-hosted LLMs for natural conversation.
technical implementation.
Finite State Machine (FSM) Agent Control
Built a finite state machine in the LiveKit backend agent with strict transitions ('greeting' → 'collecting' → 'review' → 'submitted'). The FSM uses a cursor to enforce sequential field order, gates tool execution by phase, and guarantees that any user correction during review forces a fresh read-back before final submission.
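The phase and cursor rules described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the project's actual code: the class name `FormFsm`, its method names, and the `readBackDone` flag are all hypothetical, and the real agent's tools and LiveKit wiring are omitted.

```typescript
// Hypothetical sketch of the FSM rules; names are illustrative assumptions.
type Phase = "greeting" | "collecting" | "review" | "submitted";

class FormFsm {
  phase: Phase = "greeting";
  cursor = 0; // index of the next field that may be filled
  readBackDone = false; // true only if the latest values were read back
  readonly values: Record<string, string> = {};
  private readonly fields: string[];

  constructor(fields: string[]) {
    this.fields = fields;
  }

  // greeting -> collecting (or straight to review if there is nothing to ask)
  start(): void {
    this.requirePhase("greeting");
    this.phase = this.fields.length === 0 ? "review" : "collecting";
  }

  // Allowed only while collecting, and only for the field under the cursor.
  setField(name: string, value: string): void {
    this.requirePhase("collecting");
    const expected = this.fields[this.cursor];
    if (name !== expected) {
      throw new Error(`expected field "${expected}", got "${name}"`);
    }
    this.values[name] = value;
    this.cursor += 1;
    if (this.cursor === this.fields.length) this.phase = "review";
  }

  // Reads every value back to the user and marks the review as fresh.
  readBack(): string {
    this.requirePhase("review");
    this.readBackDone = true;
    return this.fields.map((f) => `${f}: ${this.values[f]}`).join(", ");
  }

  // A correction during review invalidates the previous read-back.
  correct(name: string, value: string): void {
    this.requirePhase("review");
    if (!this.fields.includes(name)) throw new Error(`unknown field "${name}"`);
    this.values[name] = value;
    this.readBackDone = false;
  }

  // review -> submitted, only after an up-to-date read-back.
  submit(): void {
    this.requirePhase("review");
    if (!this.readBackDone) throw new Error("read-back required before submit");
    this.phase = "submitted";
  }

  private requirePhase(p: Phase): void {
    if (this.phase !== p) throw new Error(`not allowed in phase "${this.phase}"`);
  }
}
```

Throwing on an out-of-phase call is one way to gate tools; an agent could instead return the error text to the LLM so it can recover in conversation.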
WebRTC Voice Pipeline
Implemented a voice pipeline on LiveKit that streams audio over WebRTC between the Next.js frontend and a self-contained Node.js backend agent, keeping conversational latency near real time.
Dynamic Schema Injection
Designed a schema-agnostic Voice Agent. The frontend mints a LiveKit token and dispatches the agent with the selected form's schema embedded in the job metadata, allowing new forms to be added with zero backend changes.
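As a rough illustration of the metadata hand-off, the sketch below serializes a form schema to a string on the frontend side and parses it defensively on the agent side. The `FormSchema` and `FieldSpec` shapes and both function names are assumptions made for this example; the actual token minting and agent dispatch calls are deliberately left out.

```typescript
// Hypothetical schema shape; the project's real metadata format may differ.
interface FieldSpec {
  name: string;
  label: string;
  type: "text" | "number" | "date";
}

interface FormSchema {
  formId: string;
  fields: FieldSpec[];
}

// Frontend side: turn the selected form's schema into a metadata string.
function encodeJobMetadata(schema: FormSchema): string {
  return JSON.stringify(schema);
}

// Agent side: parse and check the metadata before trusting it.
function parseJobMetadata(raw: string): FormSchema {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("metadata must be a JSON object");
  }
  const { formId, fields } = data as { formId?: unknown; fields?: unknown };
  if (typeof formId !== "string" || !Array.isArray(fields) || fields.length === 0) {
    throw new Error("metadata needs a formId and at least one field");
  }
  const allowed = ["text", "number", "date"];
  const parsed: FieldSpec[] = fields.map((f, i) => {
    const { name, label, type } = (f ?? {}) as Record<string, unknown>;
    if (
      typeof name !== "string" ||
      typeof label !== "string" ||
      typeof type !== "string" ||
      !allowed.includes(type)
    ) {
      throw new Error(`invalid field at index ${i}`);
    }
    return { name, label, type: type as FieldSpec["type"] };
  });
  return { formId, fields: parsed };
}
```

Because the agent reads the field list from the parsed metadata rather than from its own code, adding a form only requires a new schema object on the frontend.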
Indic AI Integration
Integrated Sarvam AI's models specifically trained on Indian languages to handle varied accents and dialects, alongside Silero VAD for accurate voice activity detection.
key features.
- Zero-typing conversational form filling through LiveKit WebRTC architecture
- Integration with Sarvam AI for high-accuracy Indic language Speech-to-Text and Text-to-Speech
- Groq LLM integration for ultra-low latency, natural conversational flows
- Schema-agnostic Voice Agent that adapts dynamically to different form metadata
- Real-time visual form updates as the user speaks
- Live support for complex government forms including Aadhaar, PAN, and Ration Card
- Independent frontend and backend agent scaling via LiveKit Cloud
screenshots.

Voice-first assistant for filling government forms.

Choose between traditional typing or voice-first form filling.

Real-time visual form updates as the voice agent listens.

Powered by Sarvam AI for accurate Indic speech-to-text.

Automated calling capabilities with Ringg API.

Track previously filled and submitted government forms.

Final review and submission of the voice-filled form.
challenges & solutions.
Challenge: Managing unstructured speech for strict form fields
Solution: Used Groq LLMs to parse unstructured conversational input into structured JSON that conforms to the dynamic Zod schemas provided by the frontend.
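The idea of checking the LLM's JSON before it reaches the form can be sketched as follows. The project validates with Zod; to stay dependency-free, this example substitutes a small hand-written check, so `Rule`, `ParseResult`, and `validateLlmOutput` are hypothetical names and not part of Zod or of the project.

```typescript
// Hand-written stand-in for schema validation; all names are assumptions.
type Rule = { name: string; pattern: RegExp; hint: string };

type ParseResult =
  | { ok: true; values: Record<string, string> }
  | { ok: false; errors: string[] };

// Accepts the raw LLM output only if it is a JSON object whose keys match
// the rules exactly and whose values match each rule's pattern.
// Patterns should not use the `g` flag, since RegExp.test is stateful with it.
function validateLlmOutput(raw: string, rules: Rule[]): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["output must be a JSON object"] };
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];
  const values: Record<string, string> = {};
  for (const rule of rules) {
    const v = obj[rule.name];
    if (typeof v !== "string" || !rule.pattern.test(v)) {
      errors.push(`${rule.name}: ${rule.hint}`);
    } else {
      values[rule.name] = v;
    }
  }
  for (const key of Object.keys(obj)) {
    if (!rules.some((r) => r.name === key)) errors.push(`${key}: unexpected field`);
  }
  return errors.length === 0 ? { ok: true, values } : { ok: false, errors };
}

// Example rule: a PAN is five letters, four digits, then one letter.
const panRules: Rule[] = [
  { name: "pan", pattern: /^[A-Z]{5}[0-9]{4}[A-Z]$/, hint: "must look like ABCDE1234F" },
];
```

On failure, the collected error strings could be fed back to the LLM or spoken to the user as a prompt to repeat the field.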
Challenge: Reducing conversation latency
Solution: Shortened the round trip by deploying the agent on LiveKit Cloud (ap-south region) and using Groq's LPU inference engine for fast LLM responses.