
Building a Voice AI Intake Agent

Most inbound leads hit a contact form and wait. By the time you reply, they've already talked to someone else. We wanted to fix that for ourselves — and eventually offer it to clients.

The idea: a voice AI agent that picks up inbound calls, runs a structured intake conversation, and delivers a qualified brief before the first human meeting. Not a chatbot. Not an IVR tree. An actual conversational agent that listens, asks follow-ups, and summarizes what it learns.

Here's how we evaluated the platforms, designed the conversation, and planned the build.

Platform Comparison

We looked at three developer-focused voice AI platforms: Vapi, Bland.ai, and Retell AI. For our use case — low volume, inbound-triggered, context-gathering calls — the key differentiators were latency, setup complexity, LLM flexibility, and true cost per minute.

| | Vapi | Bland.ai | Retell AI |
|---|---|---|---|
| True Cost/Min | ~$0.13–$0.31 | ~$0.09–$0.15 | ~$0.07–$0.20 |
| Pricing Model | BYO everything: platform, STT, TTS, LLM, and telephony billed separately | Plan-based tiers + per-minute usage | Pay-as-you-go, bundled voice + LLM + telephony |
| Latency | Sub-500ms | ~800ms | Sub-600ms |
| LLM Support | BYO: OpenAI, Claude, Gemini, custom | Built-in, limited choice | BYO: OpenAI, Claude, custom |
| Setup Complexity | High: full stack assembly | Medium: API-first | Low-Medium: visual builder + API |
| Best For | Max flexibility, deep eng resources | High-volume outbound campaigns | Fastest path to a production inbound agent |

Our pick: Retell AI

For a single inbound intake agent, Retell is the best fit. The bundled pricing means no surprise bills from five different vendors. The visual builder plus API means a working agent in days, not weeks. And it supports Claude as the LLM backend, which we already know well.

At low volume (say 30 calls a month averaging 5 minutes each, or 150 minutes of talk time), total cost lands around $17–32/month. That's essentially free for a system that qualifies every inbound lead before you ever pick up the phone.
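The arithmetic is worth spelling out, since cost scales linearly with minutes. A minimal sketch using the approximate per-minute ranges from the comparison table (not quoted prices): 30 calls at 5 minutes is 150 minutes, so Retell's usage alone comes to about $10.50 to $30.00 a month. The sketch models usage only; where the $17–32 estimate runs higher, the difference would presumably come from costs it leaves out, such as fixed monthly fees. `RATE_RANGES`, `monthly_usage_cost`, and `cost_range` are hypothetical names.

```python
# Approximate per-minute ranges (USD) copied from the comparison table above.
# These are rough ranges, not quoted prices; fixed fees are not modeled.
RATE_RANGES = {
    "Vapi": (0.13, 0.31),
    "Bland.ai": (0.09, 0.15),
    "Retell AI": (0.07, 0.20),
}


def monthly_usage_cost(calls: int, avg_minutes: float, rate_per_min: float) -> float:
    """Usage-only cost: total minutes times the per-minute rate."""
    return calls * avg_minutes * rate_per_min


def cost_range(calls: int, avg_minutes: float, low: float, high: float) -> tuple[float, float]:
    """Low and high monthly usage cost, rounded to cents."""
    return (
        round(monthly_usage_cost(calls, avg_minutes, low), 2),
        round(monthly_usage_cost(calls, avg_minutes, high), 2),
    )


if __name__ == "__main__":
    # 30 calls a month at 5 minutes each = 150 minutes.
    for platform, (low, high) in RATE_RANGES.items():
        lo_cost, hi_cost = cost_range(30, 5, low, high)
        print(f"{platform}: ${lo_cost:.2f} to ${hi_cost:.2f} per month")
```

At 150 minutes this prints $19.50 to $46.50 for Vapi, $13.50 to $22.50 for Bland.ai, and $10.50 to $30.00 for Retell AI. Doubling call volume doubles every figure.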

Runner-up: Vapi. If you later need maximum customization or want to offer white-label voice agents to clients, Vapi's flexibility becomes more valuable at scale.

Conversation Flow Design

The goal is to gather enough context so that by the time you get on the real call, you already know what they need, what their environment looks like, and what level of engagement they're expecting. The tone should be warm and curious — not interrogative.

Step 1 — Opening
Introduce & Set Expectations
"Hi, this is the Agentic Studio Labs AI assistant. Thanks for reaching out. I'm here to learn a bit about what you're looking for so we can make the most of your time when we connect with you directly. This should take about 3 to 5 minutes. Sound good?"

The opening discloses that the caller is talking to an AI, sets a time expectation, and gets verbal consent to proceed. If the caller would rather talk to a human, route them straight to calendar booking.

Step 2 — Context Gathering
Who They Are & What Prompted the Call
"Great. To start — what's your name and your company? And what got you interested in exploring AI solutions right now?"

Let them talk. The agent uses active listening cues and follows up naturally. If they mention a specific problem, dig deeper. If they're vague, offer structure: "Are you more focused on automating internal operations, or is this customer-facing?"

Step 3 — Technical Landscape
Understand Their Environment
"This is helpful. Quick question on the tech side — what does your current stack look like? Are you mostly cloud-based? Any specific platforms or tools your team relies on?"

Not everyone will be technical. If they seem unsure, pivot: "No worries on the technical details — we can dig into that together. Do you have an internal IT team, or would you be looking for end-to-end support?"

Step 4 — Scope & Timeline
Urgency, Budget Signals, Decision Process
"Where are you in the process — is this more exploratory, or do you have a timeline in mind for getting something in place?"

Follow up on deadline drivers, who else is involved in the decision, and whether they have a budget range in mind. The budget question is optional — only ask if the conversation naturally goes there.

Step 5 — Confirm & Close
Summarize Back & Schedule
"Let me make sure I've got this right — [agent summarizes key points back]. Does that capture it well?"

After confirmation, offer to book a call directly or send a scheduling link. The agent generates a structured brief and attaches it to the calendar invite.
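One way to keep the agent on this script is to encode the five steps as data and render them into a single system prompt for the LLM. A minimal sketch, assuming the voice platform lets you supply a system prompt (how you attach it, through a visual builder or an API, is not shown); `Step`, `INTAKE_STEPS`, `GLOBAL_RULES`, and `build_system_prompt` are hypothetical names, and the goal wording paraphrases the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One stage of the intake conversation."""
    title: str
    goal: str
    optional_topics: list[str] = field(default_factory=list)


# Hypothetical encoding of the five steps described above.
INTAKE_STEPS = [
    Step("Opening",
         "Say you are an AI assistant, set a 3 to 5 minute expectation, and ask for verbal consent."),
    Step("Context gathering",
         "Get name, company, and what prompted the call. If vague, ask whether the focus "
         "is internal operations or customer-facing."),
    Step("Technical landscape",
         "Learn their stack and key platforms. If unsure, ask whether they have an internal "
         "IT team or need end-to-end support."),
    Step("Scope and timeline",
         "Learn whether this is exploratory or time-bound, the deadline drivers, and who else decides.",
         optional_topics=["budget range"]),
    Step("Confirm and close",
         "Summarize the key points back, confirm, then offer to book a call or send a scheduling link."),
]

# Rules that apply across every step.
GLOBAL_RULES = [
    "Tone: warm and curious, never interrogative.",
    "If the caller asks for a human at any point, stop the intake and offer calendar booking.",
]


def build_system_prompt(steps: list[Step]) -> str:
    """Render global rules plus the ordered steps as plain-text instructions."""
    lines = list(GLOBAL_RULES)
    for number, step in enumerate(steps, start=1):
        lines.append(f"Step {number}: {step.title}. {step.goal}")
        for topic in step.optional_topics:
            lines.append(f"  Optional, only if it comes up naturally: {topic}.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(INTAKE_STEPS))
```

Keeping the steps as data means marking a question optional, or adding a step, is a one-line change rather than a prompt rewrite.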

Post-Call Data Pipeline

This is where the real operational value lives. After each call, the agent generates a structured brief that lands in your system before the meeting happens.

1. Transcript

Full call transcript stored automatically. Searchable and referenceable.

2. Structured Summary

AI-generated brief: name, company, pain points, tech stack, timeline, decision makers, budget signals.

3. Delivery

Pushed via webhook to a Google Doc, your CRM, email, or Slack: wherever you need it before the real call.

4. Calendar

Meeting booked with the structured summary attached to the calendar invite description.
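The structured summary is easiest to trust when it has a fixed schema. A minimal sketch, assuming the LLM is asked to return JSON with the fields listed under "Structured Summary" and that delivery goes to a Slack incoming webhook (which accepts a JSON body with a `text` field). `IntakeBrief`, `parse_brief`, `to_slack_message`, and `post_json` are hypothetical names, and the shape of your voice platform's own post-call webhook payload is not modeled here.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass, field, fields


@dataclass
class IntakeBrief:
    """Fields from the Structured Summary step; name and company are required."""
    name: str
    company: str
    pain_points: list[str] = field(default_factory=list)
    tech_stack: list[str] = field(default_factory=list)
    timeline: str = ""
    decision_makers: list[str] = field(default_factory=list)
    budget_signals: str = ""


def parse_brief(raw_json: str) -> IntakeBrief:
    """Parse the LLM's JSON summary, ignoring keys the schema doesn't define.

    A missing name or company raises TypeError, so bad summaries fail loudly.
    """
    data = json.loads(raw_json)
    known = {f.name for f in fields(IntakeBrief)}
    return IntakeBrief(**{k: v for k, v in data.items() if k in known})


def to_slack_message(brief: IntakeBrief) -> dict:
    """Build a Slack incoming-webhook body with one line per non-empty field."""
    lines = [f"*Intake brief: {brief.name}, {brief.company}*"]
    for key, value in asdict(brief).items():
        if key in ("name", "company") or not value:
            continue  # already in the header, or nothing to report
        if isinstance(value, list):
            value = ", ".join(value)
        lines.append(f"{key.replace('_', ' ').capitalize()}: {value}")
    return {"text": "\n".join(lines)}


def post_json(url: str, body: dict) -> int:
    """POST a JSON body to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In a deployment, the handler for the platform's post-call webhook would extract the summary JSON, call `parse_brief`, then `post_json(slack_webhook_url, to_slack_message(brief))`. Swapping Slack for a CRM or email changes only that last step.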

Example: What You'd See Before Your Call

INTAKE BRIEF — Sarah Chen, Greenfield Health

Contact: Sarah Chen, VP of Operations, Greenfield Health (regional healthcare network, ~200 employees)

What they need: Automate patient intake and appointment scheduling calls. Currently handling ~500 calls/day with a 12-person team. Wants to reduce staffing costs while improving after-hours availability.

Current stack: Epic EHR, AWS-based infrastructure, Twilio for existing phone system. Internal IT team of 3.

Timeline: Board presentation in Q2. Wants a proof of concept by mid-April. Decision involves CTO and CFO.

Budget signals: Current call center costs "north of $40K/month." Open to phased approach.

Suggested talking points: HIPAA compliance approach, Epic integration feasibility, ROI model comparing current staffing vs. AI agent deployment, phased rollout starting with after-hours calls.
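A brief like this one can be dropped into the calendar invite description. A minimal sketch, assuming Google Calendar is the calendar in use (the post doesn't name one): `build_event` returns a dict shaped like a Google Calendar API v3 Event resource, which you could pass as `body=` to `service.events().insert(calendarId="primary", body=event).execute()` with google-api-python-client. `EXAMPLE_BRIEF`, `render_description`, `build_event`, and the 30-minute default are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Abridged from the Greenfield Health example brief above.
EXAMPLE_BRIEF = {
    "contact": "Sarah Chen, VP of Operations, Greenfield Health",
    "what_they_need": "Automate patient intake and appointment scheduling calls.",
    "current_stack": "Epic EHR, AWS-based infrastructure, Twilio",
    "timeline": "Proof of concept by mid-April; CTO and CFO involved.",
    "budget_signals": 'Current call center costs "north of $40K/month."',
}


def render_description(brief: dict[str, str]) -> str:
    """Render each non-empty field as a 'Label: value' paragraph."""
    return "\n\n".join(
        f"{key.replace('_', ' ').capitalize()}: {value}"
        for key, value in brief.items()
        if value
    )


def build_event(brief: dict[str, str], start: datetime, minutes: int = 30) -> dict:
    """Build an event body with the brief in the description."""
    if start.tzinfo is None:
        # An explicit UTC offset lets dateTime stand alone without a timeZone field.
        raise ValueError("start must be timezone-aware")
    end = start + timedelta(minutes=minutes)
    return {
        "summary": f"Intake follow-up: {brief.get('contact', 'New lead')}",
        "description": render_description(brief),
        "start": {"dateTime": start.isoformat()},
        "end": {"dateTime": end.isoformat()},
    }


if __name__ == "__main__":
    event = build_event(EXAMPLE_BRIEF, datetime(2025, 3, 3, 15, 0, tzinfo=timezone.utc))
    print(event["description"])
```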

Interested in a voice AI intake agent for your team?
