Hackathon Date: March 31, 2026 • 2:00 PM - 7:00 PM IST
Format: Individual
What You’re Building
Build a drop-in voice AI plugin that any client can embed on their website. When a customer clicks it, they can have a live voice conversation with the client’s AI agent, right from the browser. After the conversation ends, your system extracts key entities from the transcript and stores them in a CRM. Think of it as an Intercom chat widget, but for voice AI, with automatic CRM sync.
Architecture
Key Components
- Web Plugin — A JavaScript widget (floating button, sidebar, modal — your design) that captures the user’s microphone audio and plays the agent’s audio response through the browser speaker.
- Voice Transport — Connect to the Pipecat WebSocket, send the user’s microphone audio as PCM 16-bit 16kHz mono, and receive and play the agent’s audio response in the same format.
- Transcription — Transcribe both sides of the conversation (user’s speech and agent’s speech) to build a full transcript. You can use any STT provider or the browser’s built-in Web Speech API.
- Entity Extraction — After the conversation ends, use an LLM (Gemini) to extract structured entities from the transcript based on the agent’s domain.
- CRM Storage — Store the extracted entities. This can be a real CRM API (HubSpot, Salesforce, etc.) or a mock CRM (local database, JSON file, Airtable, Google Sheets — whatever you prefer).
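The audio format named under Voice Transport (PCM 16-bit, 16kHz, mono) can be illustrated with a minimal sketch. It is written in Python for readability; in the widget the same arithmetic would run in the browser. The brief does not state byte order or frame size, so little-endian samples and 20 ms frames are assumptions here, and the function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # from the brief: 16 kHz
BYTES_PER_SAMPLE = 2      # from the brief: 16-bit samples, one channel (mono)
FRAME_MS = 20             # ASSUMPTION: frame length is not specified in the brief


def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit PCM bytes.

    ASSUMPTION: little-endian byte order ("<h"); the brief does not say.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))           # clip out-of-range input
        ints.append(int(round(s * 32767)))   # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def split_frames(pcm, frame_ms=FRAME_MS):
    """Split a PCM byte string into fixed-duration frames for sending."""
    frame_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
```

At these settings one second of audio is 32,000 bytes, and each 20 ms frame is 640 bytes. Browsers usually capture at 44.1 or 48 kHz, so a resampling step before this conversion is likely needed.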
Target Agent
The agent is exposed via a Pipecat WebSocket.
Entity Extraction Reference
Based on the agent’s domain, here are the kinds of entities your system should extract from the conversation transcript.
Milestones
Milestone 1 — Voice Widget
Goal: A working browser-based voice widget that talks to the agent.
Deliverables:
- A web page with an embeddable voice widget (button, sidebar, or modal; your choice)
- Widget captures microphone audio from the browser
- Audio sent to Pipecat WebSocket
- Agent’s audio response played back through browser speakers
- User can have a real multi-turn voice conversation with the agent from the browser
- Visual feedback — user can see when the agent is speaking, when it’s their turn, connection status
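One way to drive the visual feedback deliverable is a small state machine that the UI renders from. The sketch below is illustrative only: the state names, event names, and labels are hypothetical and do not come from Pipecat or the brief.

```python
from enum import Enum


class WidgetState(Enum):
    DISCONNECTED = "disconnected"
    CONNECTING = "connecting"
    LISTENING = "listening"            # the user's turn to speak
    AGENT_SPEAKING = "agent_speaking"  # agent audio is playing


# Text the widget could show for each state (hypothetical wording).
STATUS_LABELS = {
    WidgetState.DISCONNECTED: "Not connected",
    WidgetState.CONNECTING: "Connecting...",
    WidgetState.LISTENING: "Your turn",
    WidgetState.AGENT_SPEAKING: "Agent is speaking",
}

# ASSUMPTION: these event names are invented for the sketch.
TRANSITIONS = {
    (WidgetState.DISCONNECTED, "connect"): WidgetState.CONNECTING,
    (WidgetState.CONNECTING, "socket_open"): WidgetState.LISTENING,
    (WidgetState.LISTENING, "agent_audio_start"): WidgetState.AGENT_SPEAKING,
    (WidgetState.AGENT_SPEAKING, "agent_audio_end"): WidgetState.LISTENING,
}


def next_state(state, event):
    """Return the state after an event; unknown events leave it unchanged."""
    if event == "socket_close":        # a dropped socket wins from any state
        return WidgetState.DISCONNECTED
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it harder for the UI to show "Your turn" while the connection is actually down.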
Milestone 2 — Transcript + Entity Extraction
Goal: After a conversation ends, extract structured entities and store them.
Deliverables:
- Full conversation transcript generated (both user and agent speech)
- Transcript displayed in the widget or a side panel after the call ends
- LLM-based entity extraction that takes the transcript + entity schema → structured JSON output
- Entity extraction works with a configurable schema (not hardcoded to one agent)
- Extracted entities displayed to the user after the call
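The "transcript + entity schema → structured JSON" step can be kept agent-agnostic by generating the prompt from the schema and validating the reply against it. The sketch below deliberately leaves out the LLM call itself; the schema shape (field name mapped to a description) and both function names are assumptions made for illustration.

```python
import json


def build_extraction_prompt(schema, transcript):
    """Build an extraction prompt from a {field_name: description} schema."""
    field_lines = "\n".join(f'- "{name}": {desc}' for name, desc in schema.items())
    return (
        "Extract the following fields from the call transcript.\n"
        "Return one JSON object with exactly these keys; "
        "use null when a value is not stated.\n\n"
        f"Fields:\n{field_lines}\n\nTranscript:\n{transcript}\n"
    )


def parse_entities(raw_reply, schema):
    """Parse the model reply into a dict holding exactly the schema's keys.

    Missing fields become None and unexpected keys are dropped, so later
    code can rely on the shape even when the transcript is messy.
    """
    empty = {name: None for name in schema}
    text = raw_reply.strip()
    # Replies are often wrapped in a Markdown fence; keep the outermost {...}.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return empty
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {name: data.get(name) for name in schema}
```

Switching to a different agent then means passing a different schema dictionary; the prompt builder and the parser stay unchanged.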
Milestone 3 — CRM Integration & Polish
Goal: Store extracted entities in a CRM and polish the experience.
Pick one or more:
- CRM Storage: Push extracted entities to a CRM (real or mock), such as HubSpot, Salesforce, Airtable, Google Sheets, or Notion, via its API
- CRM Dashboard: Simple page showing all past conversations with their extracted entities — searchable, filterable
- Embeddable Script: Package the plugin as a single <script> tag that any website can drop in (like Google Analytics or Intercom)
- Conversation Summary: In addition to entities, generate a human-readable summary of the call
- Multi-language Support: Handle Hindi/English conversations — entity extraction works correctly regardless of language
- Plugin Configurability: A config object where the client specifies agent WebSocket URL, entity schema, CRM endpoint, widget theme/colors
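For the mock CRM and dashboard options, a JSON-file store with a simple text search may be enough for a hackathon. This is a minimal sketch under assumptions: the class name, the record shape, and the optional file path are all hypothetical, and a real CRM integration would replace `save` with calls to that CRM's API.

```python
import json
from pathlib import Path


class MockCRM:
    """In-memory record store with optional JSON-file persistence."""

    def __init__(self, path=None):
        self.path = Path(path) if path else None
        self.records = []
        if self.path and self.path.exists():
            self.records = json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, conversation_id, entities, summary=""):
        """Append one conversation record and persist it if a path is set."""
        record = {"id": conversation_id, "entities": entities, "summary": summary}
        self.records.append(record)
        if self.path:
            # ensure_ascii=False keeps Hindi text readable in the file.
            self.path.write_text(
                json.dumps(self.records, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
        return record

    def search(self, query):
        """Return records whose id, summary, or entity values contain query."""
        needle = query.casefold()

        def values(record):
            yield str(record["id"])
            yield record["summary"]
            for value in record["entities"].values():
                if value is not None:
                    yield str(value)

        return [r for r in self.records
                if any(needle in v.casefold() for v in values(r))]
```

A dashboard page could call `search` for its filter box and render `records` directly when the query is empty.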
Judging Criteria
| Criteria | Weight | What We’re Looking For |
|---|---|---|
| Voice Experience | 30% | Does the browser voice widget work smoothly? Low latency? Clear audio? Good visual feedback? Does it feel like a real phone call in the browser? |
| Entity Extraction Quality | 30% | Are entities extracted accurately? Does it handle messy transcripts, partial information, and multilingual conversations? Is the schema configurable? |
| Integration & Polish | 20% | CRM storage working? Is the plugin embeddable? Is there a dashboard or summary view? |
| Engineering Quality | 20% | Clean code, good abstractions, error handling. Could this be shipped to a client with minimal changes? |