Hackathon Date: March 31, 2026 • 2:00 PM - 7:00 PM IST
Format: Individual
What You’re Building
Build an intelligent Pipecat voice AI pipeline that goes beyond basic conversation. Your pipeline will process a live voice call with our Shilpa agent (Kotak Securities demat account opening) and add four real-time capabilities on top:
- Human Escalation: Detect when the caller needs a human agent and execute a live SIP transfer via FreeSWITCH
- Gender Detection: Infer the caller’s gender from the conversation transcript in real-time and make it available as pipeline metadata
- Language Detection: Detect the caller’s language (Hindi, English, code-mixed) from the transcript in real-time and make it available as pipeline metadata
- Prompt Optimization: Reduce latency and cost through one or more of: prompt compression, faster tool calling, or RAG-based dynamic knowledge injection
Pipeline Requirements
Base Pipeline
Build a working Pipecat pipeline from scratch with:
- STT (any provider)
- LLM (Gemini or any provider)
- TTS (any provider)
- The Shilpa agent prompt loaded and working
Agent
The agent is Shilpa (Kotak Securities demat account opening).
The Four Capabilities
1. Human Escalation
Detect in real-time when the conversation should be handed off to a human agent, then execute the transfer via FreeSWITCH SIP. Detection triggers (at minimum):
- Caller explicitly asks to speak to a human / manager / supervisor
- Caller expresses extreme frustration or anger (repeated objections, raised voice cues in transcript)
- Conversation is stuck in a loop (agent repeating itself, caller not progressing)
- Agent is unable to answer a question outside its domain
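The four triggers above can be sketched as a small rule-based detector. This is a minimal illustration, not a reference solution: the phrase lists, thresholds, and the `EscalationDetector` and `build_transfer_command` names are hypothetical, and the `out_of_domain` flag is assumed to come from the LLM or a separate classifier. The transfer helper only builds a FreeSWITCH `uuid_transfer` API command string; sending it over the event socket and the target extension depend on your own FreeSWITCH setup.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical phrase lists; tune them against real call transcripts.
EXPLICIT_REQUEST = re.compile(
    r"\b(human|real person|manager|supervisor)\b", re.IGNORECASE
)
FRUSTRATION = re.compile(
    r"\b(useless|ridiculous|fed up|stop calling|waste of time)\b", re.IGNORECASE
)


@dataclass
class EscalationDetector:
    frustration_limit: int = 2  # frustrated caller turns before escalating
    repeat_limit: int = 3       # identical recent agent replies that count as a loop
    frustration_hits: int = 0
    recent_agent_turns: deque = field(default_factory=lambda: deque(maxlen=6))

    def on_caller_turn(self, text: str) -> Optional[str]:
        """Return an escalation reason for a caller turn, or None."""
        if EXPLICIT_REQUEST.search(text):
            return "explicit_request"
        if FRUSTRATION.search(text):
            self.frustration_hits += 1
            if self.frustration_hits >= self.frustration_limit:
                return "frustration"
        return None

    def on_agent_turn(self, text: str, out_of_domain: bool = False) -> Optional[str]:
        """Return an escalation reason for an agent turn, or None."""
        if out_of_domain:
            return "out_of_domain"
        normalized = " ".join(text.lower().split())
        self.recent_agent_turns.append(normalized)
        if self.recent_agent_turns.count(normalized) >= self.repeat_limit:
            return "stuck_in_loop"
        return None


def build_transfer_command(call_uuid: str, extension: str,
                           dialplan: str = "XML", context: str = "default") -> str:
    """Build the uuid_transfer command; dialplan/context defaults are assumptions."""
    return f"uuid_transfer {call_uuid} {extension} {dialplan} {context}"
```

A single frustrated phrase does not escalate on its own, which is one way to avoid false positives on mild complaints; an explicit request escalates immediately. The returned reason string can double as the logged escalation reason.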
2. Gender Detection
Infer the caller’s likely gender from the conversation transcript in real-time using NLP/LLM analysis. Use linguistic cues — name mentions, pronoun usage, Hindi gendered verb forms (e.g., “मैं करता हूँ” vs “मैं करती हूँ”). Produce a classification (male, female, unknown) with a confidence score that updates as the conversation progresses.
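As a rough illustration of the Hindi verb-form cue, the sketch below counts first-person gendered verb endings and reports a confidence that moves as evidence accumulates. The suffix lists, the Laplace-smoothed confidence formula, the 0.6 threshold, and the `GenderEstimator` name are all assumptions made for this example; a real detector would likely combine this with name and pronoun cues or an LLM call.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed cue lists (Devanagari and Romanized); extend them for real use.
FIRST_PERSON_AUX = {"हूँ", "हूं", "hoon", "hun", "hu"}
FEM_SUFFIXES = ("ती", "रही", "गयी", "गई", "ti", "rahi", "gayi", "gai")
MASC_SUFFIXES = ("ता", "रहा", "गया", "ta", "raha", "gaya")


@dataclass
class GenderEstimator:
    male: int = 0
    female: int = 0
    min_confidence: float = 0.6  # below this the label stays "unknown"

    def update(self, utterance: str) -> None:
        """Count gendered verb forms that directly precede a first-person auxiliary."""
        tokens = [t.strip("।,.!?'\"").lower() for t in utterance.split()]
        for word, nxt in zip(tokens, tokens[1:]):
            if nxt not in FIRST_PERSON_AUX:
                continue
            if word.endswith(FEM_SUFFIXES):
                self.female += 1
            elif word.endswith(MASC_SUFFIXES):
                self.male += 1

    def result(self) -> Tuple[str, float]:
        """Return (label, confidence); confidence is the smoothed share of the leading label."""
        total = self.male + self.female
        top = max(self.male, self.female)
        confidence = (top + 1) / (total + 2)
        if self.male == self.female or confidence < self.min_confidence:
            return "unknown", confidence
        return ("male" if self.male > self.female else "female"), confidence
```

With no evidence the estimator reports `unknown` at 0.5; one matching cue lifts the confidence to about 0.67, and each further consistent cue raises it, which matches the requirement that confidence updates as the conversation progresses.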
3. Language Detection
Detect the caller’s language in real-time from the transcript. Classify each caller turn as hindi, english, code-mixed, or other. Track the dominant language across the conversation. Handle edge cases like single-word responses and numbers-only responses.
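One possible starting point is a script-and-lexicon heuristic, sketched below. The Romanized Hindi word list is a tiny hypothetical sample, and the choice to let numbers-only turns inherit the dominant language so far is an assumption of this example, not a rule from the brief. `classify_turn` and `LanguageTracker` are illustrative names.

```python
import string
from collections import Counter
from typing import Optional

# Tiny illustrative lexicon of Romanized Hindi words; a real one would be far larger.
ROMAN_HINDI = {
    "haan", "han", "nahi", "nahin", "bolo", "kya", "hai", "hain", "mujhe",
    "chahiye", "karo", "theek", "thik", "accha", "acha", "ji", "aap", "mera",
    "kaise", "kyun", "abhi",
}
PUNCT = string.punctuation + "।"


def classify_turn(text: str) -> Optional[str]:
    """Label one caller turn, or return None when it carries no language signal."""
    hindi = english = other = 0
    for raw in text.split():
        tok = raw.strip(PUNCT).lower()
        if not any(c.isalpha() for c in tok):
            continue  # digits or punctuation only
        if any("\u0900" <= c <= "\u097f" for c in tok):
            hindi += 1  # Devanagari script
        elif tok.isascii():
            if tok in ROMAN_HINDI:
                hindi += 1
            else:
                english += 1
        else:
            other += 1
    if hindi and english:
        return "code-mixed"
    if hindi:
        return "hindi"
    if english:
        return "english"
    if other:
        return "other"
    return None


class LanguageTracker:
    """Tracks per-turn labels and the dominant language across the call."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, text: str) -> str:
        label = classify_turn(text)
        if label is None:
            return self.dominant()  # numbers-only turn: reuse the dominant language
        self.counts[label] += 1
        return label

    def dominant(self) -> str:
        return self.counts.most_common(1)[0][0] if self.counts else "other"
```

Single-word replies such as "ok" still get a label here, but because the tracker counts turns, one short reply does not flip the dominant language on its own.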
4. Prompt Optimization
Reduce LLM latency and/or cost through intelligent prompt engineering at the pipeline level. Implement one or more of: prompt compression (reduce token count while preserving instruction fidelity), fast tool calling (parallel execution, caching, speculative selection), or RAG-based knowledge injection (index agent knowledge into a vector store, retrieve relevant chunks per turn instead of stuffing the full prompt).
Judging Criteria
| Criteria | Weight | What We’re Looking For |
|---|---|---|
| Gender & Language Detection | 30% | Do detectors produce correct results in real-time? Handle edge cases — code-mixed utterances, Romanized Hindi (“haan bolo”), ambiguous gender, single-word responses? Does confidence improve with more turns? Clean processor design with proper error handling and logging? |
| Human Escalation | 30% | Does the detector trigger correctly: no false positives on mild complaints, no misses on explicit requests like “get me a manager”? Does the SIP transfer actually execute via FreeSWITCH? Does the agent deliver a smooth handoff message before transfer? Is the escalation logged with reason, transcript, and metadata (gender, language)? |
| Prompt Optimization | 20% | Is there a measurable improvement in latency, token count, or cost? Is response quality preserved after compression/RAG? Is the approach technically sound, not just truncating the prompt but intelligently reducing it? Before/after metrics shown? |
| Ambition & Polish | 20% | How far beyond the core requirements did you go? Dynamic STT/TTS language switching? Escalation context handoff to human agent screen? Real-time dashboard? Prompt compression + RAG combined? Does the overall system feel production-ready? |
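The Prompt Optimization row asks for before/after metrics. The sketch below shows one way to produce them for per-turn knowledge injection: it retrieves the most relevant knowledge chunk for the caller's question and compares prompt sizes. It is deliberately simplified. Bag-of-words cosine similarity stands in for a real embedding model and vector store, word counts stand in for a real tokenizer, and `CORE` and `KNOWLEDGE` are placeholder text, not the actual Shilpa prompt.

```python
import re
from collections import Counter
from math import sqrt
from typing import List

# Placeholder prompt material, invented for this example only.
CORE = "You are Shilpa, a voice agent for demat account opening."
KNOWLEDGE = [
    "DOCUMENTS: placeholder text about required documents",
    "CHARGES: placeholder text about account charges",
    "SUPPORT: placeholder text about support hours",
]


def tokens(text: str) -> List[str]:
    """Lowercase word tokens; a crude proxy for LLM tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: List[str], query: str, k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query (ties keep original order)."""
    q = Counter(tokens(query))
    ranked = sorted(chunks, key=lambda c: cosine(Counter(tokens(c)), q), reverse=True)
    return ranked[:k]


def build_prompt(core: str, chunks: List[str], query: str, k: int = 1) -> str:
    """Core instructions plus only the retrieved chunks for this turn."""
    return "\n".join([core, *retrieve(chunks, query, k)])


def reduction(before: str, after: str) -> float:
    """Fraction of (proxy) tokens saved by the optimized prompt."""
    return 1 - len(tokens(after)) / len(tokens(before))
```

For example, `reduction("\n".join([CORE, *KNOWLEDGE]), build_prompt(CORE, KNOWLEDGE, "what are the charges"))` gives the share of proxy tokens saved for that turn. The same before/after comparison can be repeated with measured latency and with the provider's real token counts.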
Rules
- Individual work only: this is a solo competition
- Claude Code is allowed and encouraged: use it aggressively
- Any programming language, but Python is required for the pipeline (Pipecat is Python)
- Any STT/TTS/LLM provider: you provision your own keys (Gemini is available)
- No pre-written code: start from scratch at the hackathon start
- Final demo: 5-minute live demo. Run a conversation showing all four capabilities in action
Good luck.