THE CALL COMES IN
Water is coming through the kitchen ceiling. The homeowner grabs a phone and dials the only plumber whose magnet is still on the fridge. It's the middle of the night. Two rings. Then a calm voice: "Hi, thanks for calling — tell me what's going on." Nobody got out of bed. The call was already answered.
WHY THIS CALL MATTERS
The missed call is the lost job.
Most missed calls come at night, on weekends, and during emergencies, exactly when the customer is most desperate and most likely to dial the next number instead. TalkTuit answers all of them.
Answers every call, 24/7.
Nights, weekends, holidays. The phone never goes to voicemail again.
Books the job, captures the details.
It understands the caller, schedules the work, and writes down exactly what's needed.
Rushes real emergencies to a human.
Gas, carbon monoxide, flooding, a burst pipe — it gets a live on-call person on the line in seconds.
DIAGRAM 1 OF 5 · THE CAST
Meet the team behind the call.
Six services work together on that 2 AM call. Five are best-in-class outside services (three of them the same kind of AI you've already seen elsewhere), each doing one job a person would understand. The sixth, the Orchestrator, is our own code.
- The Phone Line (Twilio) answers the call and streams the caller's audio in.
- The Ears (Deepgram) turn that speech into text, live.
- The Brain (Claude) understands the text, decides what to say, and fills in one clean record of the call.
- The Mouth (ElevenLabs) turns the brain's words back into a human voice that goes out through the phone line.
- The Orchestrator (our code, built on Pipecat) sits underneath and moves the audio between the ears, brain, and mouth.
- The Memory (DynamoDB) keeps the one structured record the Brain fills in for each call.
HOW THE LINE EVEN CONNECTED
Before the homeowner heard "Hi, thanks for calling," something happened in about half a second. The phone company knocked on a tiny door, a one-line answer came back — "open a live audio stream over here" — and then the real connection took over for the whole call. Two pieces, two very different jobs.
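Assuming the standard Twilio voice webhook flow, that one-line answer is a TwiML document whose `<Connect><Stream>` verb tells Twilio to open a live, two-way audio stream to a WebSocket address. Below is a minimal sketch of the doorbell as a Python AWS Lambda handler, assuming it sits behind a Lambda function URL or an API Gateway proxy integration (both accept the `statusCode`/`headers`/`body` response shape). The `wss://voice.example.com/media` address is a hypothetical placeholder for the long-running container.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical WebSocket address of the long-running container (Phase 2).
STREAM_URL = "wss://voice.example.com/media"


def handler(event, context):
    """Doorbell sketch: answer Twilio's voice webhook with TwiML, then exit.

    The function never sees audio. It returns one short XML document telling
    Twilio to open a two-way media stream to STREAM_URL.
    """
    twiml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(STREAM_URL)} />"
        "</Connect></Response>"
    )
    # Response shape accepted by Lambda function URLs and API Gateway proxy integrations.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/xml"},
        "body": twiml,
    }
```

This sketch ignores the request body and skips validating Twilio's `X-Twilio-Signature` header, which a real deployment should check before answering.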
DIAGRAM 2 OF 5 · THE HANDSHAKE
What happens when someone calls.
You never see this, but it's why the call connects in about half a second and stays connected for the whole conversation.
- The caller dials and reaches Twilio, the phone line.
- Phase 1, the doorbell, lasts only milliseconds. Twilio makes one short request to a small Lambda function (a tiny program that wakes up, answers, and exits). The function replies with a one-line instruction that means "open a live audio line here," then steps out; it is never in the audio path.
- Phase 2, the live call, lasts for the whole conversation, typically about five minutes. Twilio opens a persistent WebSocket connection to a Fargate container (a long-running program that stays alive for the whole call), which streams audio both ways until the caller hangs up.
- When the call ends, one structured record is written to DynamoDB.
- The point: a short stateless request suits a tiny function, while holding a call open for minutes requires a long-running container.
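Phase 2 can be sketched the same way. Twilio Media Streams send JSON text frames over the WebSocket (`connected`, `start`, `media`, `stop`, among others), with caller audio carried as base64-encoded 8 kHz mu-law. Over a bidirectional `<Connect><Stream>`, the container can send `media` frames back, and a `clear` frame drops any bot audio still queued. The hypothetical `CallSession` class below only parses and builds frames. The WebSocket server that would feed it is left out, so treat this as a transport-agnostic sketch rather than the actual container code.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallSession:
    """Phase 2 sketch: per-call state for one Twilio Media Streams connection.

    A real container would run a WebSocket server and pass every text frame
    it receives to handle(); that server is deliberately left out here.
    """

    stream_sid: Optional[str] = None
    call_sid: Optional[str] = None
    inbound_audio: bytearray = field(default_factory=bytearray)
    ended: bool = False

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        event = msg.get("event")
        if event == "start":
            self.stream_sid = msg["start"]["streamSid"]
            self.call_sid = msg["start"]["callSid"]
        elif event == "media":
            # Caller audio: base64-encoded 8 kHz mu-law bytes, to be handed to the ears.
            self.inbound_audio.extend(base64.b64decode(msg["media"]["payload"]))
        elif event == "stop":
            self.ended = True
        # Other events ("connected", "mark", "dtmf", ...) are ignored in this sketch.

    def send_audio(self, mulaw: bytes) -> str:
        """Build a frame that plays bot speech (already 8 kHz mu-law) to the caller."""
        return json.dumps({
            "event": "media",
            "streamSid": self.stream_sid,
            "media": {"payload": base64.b64encode(mulaw).decode("ascii")},
        })

    def clear(self) -> str:
        """Build a frame that discards queued bot audio, e.g. when the caller barges in."""
        return json.dumps({"event": "clear", "streamSid": self.stream_sid})
```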
DIAGRAM 3 OF 5 · THE LIVE LOOP
Inside one live call.
The loop below is one turn. A real call is many turns, typically 15 to 40, so it runs the loop over and over, answering each new thing the caller raises and firing tools (book the job, text the owner, sync the CRM) as it goes. Interrupt it and it stops, like a real conversation and unlike a press-1-for-this phone menu.
Here is a short example call, turn by turn:
- Turn 1. Caller: "My water heater's leaking." Bot: "Oh no, where's it leaking from?"
- Turn 2. Caller: "The basement, it's pooling." Bot: "Got it, that's not an emergency. Let's get you booked."
- Turn 3. Caller: "Can someone come today?" Bot: "I've got a 2–4pm window open today." ↳ books the job
- Turn 4. Caller: "2 works. Do you do drains too?" Bot: "Yes, we handle drains. Anything else?"
- Turn 5. Caller: "That's it, thanks." Bot: "You're all set. I just texted you a confirmation." ↳ texts the owner · logs to the CRM
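Tools fired in this one call: books the job (turn 3), then texts the owner and logs to the CRM (turn 5). Every turn runs the same loop: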
- HEAR: the ears (Deepgram) take in a chunk of speech.
- CHECK FOR EMERGENCY: fast plain code, no AI, runs instantly to see if this is a life-safety situation.
- THINK: the brain (Claude) writes the reply. Routine turns use Haiku, the cheaper, faster everyday tier; emergencies use Sonnet, the smarter tier. Both are tiers of the same brain.
- TOOL: if an action is needed, it may book the job, forward the emergency, text the owner, or sync the result into your CRM (the software that holds your customers and jobs).
- SPEAK: the mouth (ElevenLabs) says the reply.
- The loop returns to HEAR and repeats for the whole call.
- Barge-in: if the caller talks while the bot is speaking, the bot stops mid-sentence and listens.
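A minimal synchronous sketch of that loop, assuming the ears, the emergency check, the brain, the tools, and the mouth are passed in as plain functions. These are hypothetical stand-ins for Deepgram, the keyword check, Claude, the tool layer, and ElevenLabs; this is not Pipecat's API. Barge-in is left out. In a streaming system, it means cancelling the SPEAK step as soon as new caller audio arrives.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Reply:
    text: str
    tools: List[str] = field(default_factory=list)  # hypothetical tool names


def run_call(
    utterances: Iterable[str],
    check: Callable[[str], bool],
    think: Callable[[str, str], Reply],
    run_tool: Callable[[str], None],
    speak: Callable[[str], None],
) -> int:
    """Run HEAR -> CHECK -> THINK -> TOOL -> SPEAK once per caller turn; return the turn count."""
    turns = 0
    for heard in utterances:                          # HEAR: one finished, transcribed utterance
        tier = "sonnet" if check(heard) else "haiku"  # CHECK: plain code picks the tier
        reply = think(tier, heard)                    # THINK: the brain writes the reply
        for tool in reply.tools:                      # TOOL: zero or more actions
            run_tool(tool)
        speak(reply.text)                             # SPEAK: the mouth voices it
        turns += 1
    return turns                                      # utterances ran out: the call ended
```

In a real streaming pipeline these stages overlap rather than run strictly one after another. The sequential version just makes the order of responsibilities explicit.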
THE PART THAT MATTERS AT 2 AM
The homeowner says the word "burst." The receptionist doesn't guess and doesn't shrug it off as a routine booking. A fast check flags it instantly, the smarter brain takes the turn, and it starts dialing a real on-call person. And here's the part that matters most: it does not say "help is coming" until a human is actually on the line. If the first number doesn't pick up, it falls back and keeps trying. It never lies to someone standing in rising water.
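The rule that matters here, never promising help until a human is actually on the line, fits in a few lines of code. The sketch below rests on several assumptions. It uses a hypothetical `dial` function that places one outbound call and reports how it went. It tries an ordered list of numbers, with the on-call person first and backups after. It retries each number a hypothetical number of times. The caller-facing messages are illustrative placeholders, not TalkTuit's actual wording.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class DialResult(Enum):
    ANSWERED = "answered"
    NO_ANSWER = "no-answer"
    BUSY = "busy"


def forward_emergency(
    numbers: Sequence[str],
    dial: Callable[[str], DialResult],
    attempts_per_number: int = 2,
) -> Optional[str]:
    """Try each number in order, retrying each a few times.

    Returns the number where a human answered, or None if nobody did.
    """
    for number in numbers:
        for _ in range(attempts_per_number):
            if dial(number) is DialResult.ANSWERED:
                return number
    return None


def caller_message(engaged: Optional[str]) -> str:
    # The promise of help is only made after a human is confirmed on the line.
    if engaged is not None:
        return "I have our on-call technician on the line now. Help is coming."
    return "I couldn't reach anyone yet. If you are in danger, hang up and call 911."
```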
DIAGRAM 4 OF 5 · THE SAFETY FORK
Emergencies are handled in code, not hope.
A prompt can't route itself to a smarter model, so the emergency decision lives in plain, deterministic code — not in something the AI might get wrong.
- Every caller utterance hits a fast keyword check — plain code, no AI, instant.
- MISS (calm path): routine turns go to Haiku, the cheaper, faster everyday tier of the brain, which is structurally forbidden from declaring an emergency on a guess.
- If Haiku is unsure, the turn escalates and re-runs on the smarter Sonnet model.
- HIT (emergency path): keywords like gas, carbon monoxide, flood, burst pipe, sewage, or no-heat-while-freezing force the turn onto Sonnet, the smarter tier of the brain — the only one allowed to forward an emergency.
- Sonnet forwards the call to a real human, and the forward itself is treated as something that can fail.
- Success: a human is engaged, and only then do we tell the caller help is coming.
- Failure (ring-no-answer or busy): it falls back to 911 or the posted emergency line and retries.
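The fork above can be sketched in plain Python. Everything here is an assumption made for illustration. The regex keyword list is drawn from the examples in the text. The `"haiku"`/`"sonnet"` labels stand in for the two Claude tiers. The hypothetical `call_model` function returns a reply plus an "unsure" flag. The tool names in `ALLOWED_TOOLS` are also hypothetical. "Structurally forbidden" is modeled by never handing the Haiku tier the forwarding tool.

```python
import re
from typing import Callable, Tuple

# Illustrative keyword list taken from the examples in the text.
EMERGENCY_RE = re.compile(
    r"\b(gas|carbon monoxide|flood(?:ed|ing)?|burst|sewage)\b", re.IGNORECASE
)
NO_HEAT_RE = re.compile(r"\bno heat\b", re.IGNORECASE)
FREEZING_RE = re.compile(r"\bfreez", re.IGNORECASE)

# Hypothetical tool names. Only the Sonnet tier is given the forwarding tool,
# so the Haiku tier has no way to forward an emergency.
ALLOWED_TOOLS = {
    "haiku": {"book_job", "text_owner", "sync_crm"},
    "sonnet": {"book_job", "text_owner", "sync_crm", "forward_emergency"},
}


def is_emergency(utterance: str) -> bool:
    """Deterministic keyword check: plain code, no model call."""
    if EMERGENCY_RE.search(utterance):
        return True
    # "No heat" only counts when the caller also mentions freezing.
    return bool(NO_HEAT_RE.search(utterance) and FREEZING_RE.search(utterance))


def handle_turn(
    utterance: str,
    call_model: Callable[[str, str], Tuple[str, bool]],
) -> Tuple[str, str]:
    """Return (tier used, reply text). call_model returns (reply, unsure)."""
    if is_emergency(utterance):                      # HIT: straight to the smarter tier
        reply, _ = call_model("sonnet", utterance)
        return "sonnet", reply
    reply, unsure = call_model("haiku", utterance)   # MISS: calm path
    if unsure:                                       # escalate: re-run the turn on Sonnet
        reply, _ = call_model("sonnet", utterance)
        return "sonnet", reply
    return "haiku", reply
```

In this sketch the keyword list deliberately errs broad ("gas" also matches "gas water heater"). A false positive only costs one Sonnet turn, while a false negative would leave an emergency on the calm path.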
WHY IT NEVER SOUNDS LIKE IT'S THINKING
The fillers hide the thinking time.
Predictable lines, such as the greeting and little fillers like "one moment" or "let me check that," are recorded once and replayed instantly. Only the words that genuinely have to be invented for this caller use the live brain. Because a filler plays while the brain works, there's never dead air.
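Below is a minimal asyncio sketch of the filler trick. It assumes hypothetical async `think` (live brain) and `synthesize` (live voice) functions, plus a `play` function that queues audio for the caller. The filler audio is rendered once ahead of time, so playing it needs no model or voice call. The live reply is scheduled before the filler is queued, so the brain's thinking time overlaps the filler the caller hears.

```python
import asyncio
from typing import Any, Callable, Coroutine

# Hypothetical cache, filled once at startup: filler line -> pre-rendered audio.
FILLER_AUDIO = {
    "one moment": b"<pre-rendered audio: one moment>",
    "let me check that": b"<pre-rendered audio: let me check that>",
}


async def respond(
    think: Callable[[], Coroutine[Any, Any, str]],
    synthesize: Callable[[str], Coroutine[Any, Any, bytes]],
    play: Callable[[bytes], None],
    filler: str = "let me check that",
) -> str:
    """Queue a cached filler immediately while the live brain works on the real reply."""
    live = asyncio.create_task(think())  # schedule the live brain call first
    play(FILLER_AUDIO[filler])           # no model or voice call: instant
    text = await live                    # the caller hears the filler while this runs
    play(await synthesize(text))         # only the caller-specific words are voiced live
    return text
```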
DIAGRAM 5 OF 5 · THE MOAT
Why it gets better every month.
Every call becomes one clean, structured record. Across many shops those records roll up, anonymized, into a per-trade brain that gets sharper as volume grows.
- Each call becomes one structured record whose key fields are fixed dropdowns — trade, problem, urgency, outcome — never free text.
- Across many shops, those identically structured records are pooled after anonymization, so no raw customer data leaves a shop.
- They roll up into one per-trade brain that gets sharper as call volume grows.
- The receptionist is the wedge. The data is the asset.
Because the key fields are fixed dropdowns from call one, every call is comparable across every shop and every month.
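Below is a minimal sketch of the structured record and the anonymized roll-up, using Python enums to stand in for the fixed dropdowns. The specific trades, problems, urgencies, and outcomes are hypothetical, as are the `shop_id`/`caller_phone` fields and the `rollup` function. Free text can't slip in, because `Trade("plumber, I think?")` raises `ValueError`. The roll-up keeps only the dropdown fields, so shop and caller identifiers never reach the per-trade counts.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Tuple


# Hypothetical dropdown values; a real schema would be larger.
class Trade(Enum):
    PLUMBING = "plumbing"
    HVAC = "hvac"


class Problem(Enum):
    LEAK = "leak"
    BURST_PIPE = "burst_pipe"
    DRAIN = "drain"
    NO_HEAT = "no_heat"
    TUNE_UP = "tune_up"


class Urgency(Enum):
    ROUTINE = "routine"
    EMERGENCY = "emergency"


class Outcome(Enum):
    BOOKED = "booked"
    FORWARDED = "forwarded"
    NOT_BOOKED = "not_booked"


@dataclass(frozen=True)
class CallRecord:
    shop_id: str        # identifies the shop: stays with that shop
    caller_phone: str   # raw customer data: stays with that shop
    trade: Trade
    problem: Problem
    urgency: Urgency
    outcome: Outcome


def rollup(records: Iterable[CallRecord]) -> Dict[Trade, Counter]:
    """Anonymized per-trade counts built only from the dropdown fields."""
    by_trade: Dict[Trade, Counter] = {}
    for r in records:
        key: Tuple[Problem, Urgency, Outcome] = (r.problem, r.urgency, r.outcome)
        by_trade.setdefault(r.trade, Counter())[key] += 1
    return by_trade
```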
THE SAME CALL, EVERY NIGHT
Every call gets this. 24/7. Every shop.
The 2 AM burst pipe, the Tuesday tune-up, the Saturday no-heat call — every one of them gets answered on the second ring, booked, and saved as one clean record. No call goes to voicemail. No job gets lost.