2 AM · A PIPE JUST BURST

It's 2 AM and the pipe just burst. Your phone gets answered on the second ring.

TalkTuit is an AI voice receptionist for plumbing and HVAC shops. It answers every call, day or night, in a calm, human-sounding voice. It books the job, captures the details, and the moment it hears a real emergency, it starts dialing a live on-call person. One call in. One clean record out.

Follow one 2 AM call from the first ring to the saved job. Everything else on this page is just that call, explained.

See how the call works: five diagrams, sixty seconds, no hype.

Answers 24/7 · sounds human · updates your customer records · built on AWS

THE CALL COMES IN

Water is coming through the kitchen ceiling. The homeowner grabs a phone and dials the only plumber whose magnet is still on the fridge. It's the middle of the night. Two rings. Then a calm voice: "Hi, thanks for calling — tell me what's going on." Nobody got out of bed. The call was already answered.

WHY THIS CALL MATTERS

The missed call is the lost job.

Most missed calls happen at night, on weekends, and during emergencies: exactly when the customer is most desperate and most likely to dial the next number instead. TalkTuit answers all of them.

Answers every call, 24/7.

Nights, weekends, holidays. The phone never goes to voicemail again.

Books the job, captures the details.

It understands the caller, schedules the work, and writes down exactly what's needed.

Rushes real emergencies to a human.

Gas, carbon monoxide, flooding, a burst pipe: it starts dialing a live on-call person within seconds and keeps trying until someone picks up.

DIAGRAM 1 OF 5 · THE CAST

Meet the team behind the call.

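Six services do the live work of that 2 AM call together. Five are best-in-class outside services, and the ears, brain, and mouth are the same kind of AI you've probably already seen elsewhere. The sixth, the orchestrator, is our own code. Each one does a single job a person would understand.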

[Diagram: the team of six services behind one call: phone line (Twilio), ears (Deepgram), brain (Claude), mouth (ElevenLabs), orchestrator (Pipecat), memory (DynamoDB).]
  1. The Phone Line (Twilio) answers the call and streams the caller's audio in.
  2. The Ears (Deepgram) turn that speech into text, live.
  3. The Brain (Claude) understands the text, decides what to say, and fills in one clean record of the call.
  4. The Mouth (ElevenLabs) turns the brain's words back into a human voice that goes out through the phone line.
  5. The Orchestrator (Pipecat, our code) sits underneath and moves the audio between the ears, brain, and mouth.
  6. The Brain writes one structured record into the Memory (DynamoDB).
Six services. Each does one job a human would understand.
cyan = hear · blue = think · violet = speak · dashed ember = interrupt / can-fail · dotted gold = instant / no-AI · teal = good outcome
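The orchestrator's job, moving each piece of the call from ears to brain to mouth, can be sketched as three stages joined by queues. This is a minimal illustration under stated assumptions, not Pipecat's actual API: `ears`, `brain`, and `mouth` are hypothetical stand-ins for the Deepgram, Claude, and ElevenLabs calls, and a real call streams many small audio frames rather than one item per stage.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List

# A stage turns one item into the next: audio -> text -> reply text -> audio.
Stage = Callable[[Any], Awaitable[Any]]
END_OF_CALL = None  # sentinel passed down the chain when the caller hangs up


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, transform them, and push the results to outbox."""
    while (item := await inbox.get()) is not END_OF_CALL:
        await outbox.put(await stage(item))
    await outbox.put(END_OF_CALL)  # tell the next stage the call is over


async def orchestrate(chunks: Iterable[bytes], ears: Stage, brain: Stage, mouth: Stage) -> List[bytes]:
    """Wire ears -> brain -> mouth with queues and collect the audio sent back."""
    queues = [asyncio.Queue() for _ in range(4)]
    workers = [
        asyncio.create_task(run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate((ears, brain, mouth))
    ]
    for chunk in chunks:  # caller audio arriving from the phone line
        await queues[0].put(chunk)
    await queues[0].put(END_OF_CALL)

    outgoing = []
    while (audio := await queues[3].get()) is not END_OF_CALL:
        outgoing.append(audio)  # audio headed back out through the phone line
    await asyncio.gather(*workers)
    return outgoing


# Hypothetical stand-ins for the three live services.
async def fake_ears(chunk: bytes) -> str:
    return chunk.decode()  # "speech to text"


async def fake_brain(text: str) -> str:
    return f"Got it: {text}"  # "decide what to say"


async def fake_mouth(text: str) -> bytes:
    return text.encode()  # "text to voice"


if __name__ == "__main__":
    print(asyncio.run(orchestrate([b"my pipe burst"], fake_ears, fake_brain, fake_mouth)))
    # prints [b'Got it: my pipe burst']
```

Queues let each service work at its own pace: the ears can keep listening while the brain is still deciding.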

HOW THE LINE EVEN CONNECTED

Before the homeowner heard "Hi, thanks for calling," something happened in about half a second. The phone company knocked on a tiny door, a one-line answer came back — "open a live audio stream over here" — and then the real connection took over for the whole call. Two pieces, two very different jobs.

DIAGRAM 2 OF 5 · THE HANDSHAKE

What happens when someone calls.

You never see this — it's the reason the call connects in about half a second and never drops. (One for the engineers: a short request fits a tiny function; holding a call open for minutes needs a long-running container. That difference is the whole reason both exist.)

[Diagram: the two-connection handshake of a call. Phase 1, the doorbell: Lambda returns a TwiML answer in milliseconds. Phase 2, the live call: a Fargate container holds the call for about five minutes, ending in one DynamoDB record.]
  1. The caller dials and reaches Twilio, the phone line.
  2. Phase 1, the doorbell, lasts only milliseconds: Twilio makes one short request to a small Lambda function (a tiny program that wakes up, answers, and exits), which answers with a one-line instruction that means "open a live audio line here," then steps out and is never in the audio path.
  3. Phase 2, the live call, lasts about five minutes: Twilio opens a persistent connection to a Fargate container (a long-running program that stays alive for the whole call) that holds the call open and streams audio both ways the entire time.
  4. When the call ends, one structured record is written to DynamoDB.
  5. The point: a short stateless request suits a tiny function, while holding a call open for minutes requires a long-running container.
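The doorbell in Phase 1 can be very small. A minimal sketch, assuming Twilio's voice webhook reaches the Lambda through a function URL or an API Gateway proxy integration, and that the live call uses Twilio's bidirectional Media Streams (TwiML `<Connect><Stream>`, which opens a WebSocket to the given `wss://` URL). The `STREAM_URL` environment variable and the `calls.example.com` host are hypothetical.

```python
import os
from xml.sax.saxutils import quoteattr

# Hypothetical address of the Fargate service that holds the live call.
DEFAULT_STREAM_URL = "wss://calls.example.com/twilio"


def build_twiml(stream_url: str) -> str:
    """The one-line answer: 'open a live audio line here'."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(stream_url)} />"  # quoteattr adds quotes and escapes '&'
        "</Connect></Response>"
    )


def lambda_handler(event: dict, context: object) -> dict:
    """Phase 1: answer Twilio's webhook in milliseconds, then exit.

    The Lambda never touches audio; Twilio opens the stream to Fargate itself.
    """
    stream_url = os.environ.get("STREAM_URL", DEFAULT_STREAM_URL)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/xml"},
        "body": build_twiml(stream_url),
    }
```

Because `<Connect>` keeps the call on the stream until the WebSocket closes, the function's work is done the moment it returns, which is why a short-lived Lambda fits this step.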

DIAGRAM 3 OF 5 · THE LIVE LOOP

Inside one live call.

The loop below is one turn. A real call is many turns, 15 to 40 of them, so it runs this loop over and over: answering each new thing the caller raises and firing tools (book the job, text the owner, sync the CRM) as it goes. Interrupt it and it stops, like a real conversation, not a press-1-for-this phone menu.

[Diagram: the live conversation loop inside a call: HEAR → CHECK EMERGENCY → THINK → TOOL? → SPEAK, repeating 15–40 times per call, with a barge-in path from SPEAK back to HEAR.]

Here is one short call, turn by turn, with the tools it fires along the way:

  1. Caller: "My water heater's leaking."
     TalkTuit: "Oh no, where's it leaking from?"
  2. Caller: "The basement, it's pooling."
     TalkTuit: "Got it, that's not an emergency. Let's get you booked."
  3. Caller: "Can someone come today?"
     TalkTuit: "I've got a 2–4pm window open today."
     ↳ books the job
  4. Caller: "2 works. Do you do drains too?"
     TalkTuit: "Yes, we handle drains. Anything else?"
  5. Caller: "That's it, thanks."
     TalkTuit: "You're all set. I just texted you a confirmation."
     ↳ texts the owner · logs to the CRM

Tools fired in this one call:

✓ book_job · ✓ text_owner · ✓ write_crm

  1. HEAR: the ears (Deepgram) take in a chunk of speech.
  2. CHECK FOR EMERGENCY: fast plain code, no AI, runs instantly to see if this is a life-safety situation.
  3. THINK: the brain (Claude) responds — Haiku, the fast everyday tier, for routine turns, and Sonnet, the smarter tier, for emergencies. They are two tiers of the same brain: cheaper and faster versus smarter.
  4. TOOL: if an action is needed, it may book the job, forward the emergency, text the owner, or sync the result into your CRM (the software that holds your customers and jobs).
  5. SPEAK: the mouth (ElevenLabs) says the reply.
  6. The loop returns to HEAR and repeats for the whole call.
  7. Barge-in: if the caller talks while the bot is speaking, the bot stops mid-sentence and listens.
One trip around the loop = one turn. A real call is 15–40 turns, so it answers each thing the caller raises and fires several tools (here: book_job, text_owner, write_crm) along the way.
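Barge-in (step 7 above) amounts to racing two things: the bot's speech and the first sign that the caller is talking. A minimal asyncio sketch, assuming the mouth streams audio in small chunks and something upstream (for example, voice-activity detection on the incoming audio) sets an event when the caller starts speaking. `speak`, `speak_with_barge_in`, and `demo` are hypothetical names; this is not how Pipecat exposes interruptions.

```python
from __future__ import annotations

import asyncio


async def speak(text: str, spoken: list[str]) -> None:
    """Hypothetical mouth: 'plays' one word per audio chunk so it can be cut off."""
    for word in text.split():
        spoken.append(word)
        await asyncio.sleep(0.02)  # stand-in for streaming one chunk of audio


async def speak_with_barge_in(text: str, caller_started: asyncio.Event) -> list[str]:
    """Speak, but stop mid-sentence the moment the caller starts talking."""
    spoken: list[str] = []
    speaking = asyncio.create_task(speak(text, spoken))
    barge_in = asyncio.create_task(caller_started.wait())
    await asyncio.wait({speaking, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    for task in (speaking, barge_in):
        if not task.done():
            task.cancel()  # either stop talking, or stop waiting for a barge-in
    await asyncio.gather(speaking, barge_in, return_exceptions=True)
    return spoken  # the words actually said before stopping


async def demo(interrupt_after: float | None) -> list[str]:
    caller_started = asyncio.Event()
    if interrupt_after is not None:
        # Simulate the caller talking over the bot after a short delay.
        asyncio.get_running_loop().call_later(interrupt_after, caller_started.set)
    return await speak_with_barge_in(
        "I've got a 2 to 4pm window open today, does that work?", caller_started
    )


if __name__ == "__main__":
    print(asyncio.run(demo(None)))  # the full sentence
    print(asyncio.run(demo(0.05)))  # cut off after the first few words
```

The caller never waits for the bot to finish its sentence; the next trip around the loop starts with HEAR.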

THE PART THAT MATTERS AT 2 AM

The homeowner says the word "burst." The receptionist doesn't guess and doesn't shrug it off as a routine booking. A fast check flags it instantly, the smarter brain takes the turn, and it starts dialing a real on-call person. And here's the part that matters most: it does not say "help is coming" until a human is actually on the line. If the first number doesn't pick up, it falls back and keeps trying. It never lies to someone standing in rising water.
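The rule in that paragraph, keep dialing and never claim help is coming until someone answers, fits in a few lines. A minimal sketch: `dial` is a hypothetical hook that reports `"answered"`, `"no-answer"`, or `"busy"` for one attempt, and the ordered `numbers` list (the on-call person first, then fallbacks such as the posted emergency line) is an assumption about how a shop would configure it.

```python
from typing import Callable, Optional, Sequence

# dial() is a hypothetical hook into the phone line; it reports how one attempt went.
DialResult = str  # "answered", "no-answer", or "busy"


def forward_emergency(
    numbers: Sequence[str],
    dial: Callable[[str], DialResult],
    rounds: int = 2,
) -> Optional[str]:
    """Try each number in order (on-call person first, fallbacks after),
    retrying the whole list up to `rounds` times.

    Returns the number that answered, or None if nobody picked up.
    """
    for _ in range(rounds):
        for number in numbers:
            if dial(number) == "answered":
                return number
    return None


def caller_update(reached: Optional[str]) -> str:
    """Only promise help once a human is actually on the line."""
    if reached is not None:
        return "I've got our on-call tech on the line now. Help is coming."
    return "I haven't reached anyone yet, and I'm still trying. Please stay on the line."
```

The promise lives in `caller_update`, which can only say "help is coming" when `forward_emergency` has returned a number that actually answered.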

DIAGRAM 4 OF 5 · THE SAFETY FORK

Emergencies are handled in code, not hope.

A model answering a turn can't reliably decide, mid-answer, to hand itself off to a smarter one, so the emergency decision lives in plain, deterministic code instead of in something the AI might get wrong.

ember = forced / emergency path · teal = human reached / safe · dashed = can fail or escalate
[Diagram: the deterministic emergency safety fork: a fast keyword check sends routine turns to Haiku (MISS) and emergencies to Sonnet (HIT), which forwards to a real human, with fallback and retry if no one answers.]
  1. Every caller utterance hits a fast keyword check — plain code, no AI, instant.
  2. MISS (calm path): routine turns go to Haiku, the fast everyday tier of the brain (cheaper and faster), which is structurally forbidden from declaring an emergency on a guess.
  3. If Haiku is unsure, the turn is re-run on the smarter Sonnet model (the dashed escalation arrow).
  4. HIT (emergency path): keywords like gas, carbon monoxide, flood, burst pipe, sewage, or no-heat-while-freezing force the turn onto Sonnet, the smarter tier of the brain — the only one allowed to forward an emergency.
  5. Sonnet forwards the call to a real human, a step the system treats as one that can fail.
  6. Success: a human is engaged, and only then do we tell the caller help is coming.
  7. Failure (ring-no-answer or busy): it falls back to 911 or the posted emergency line and retries.
A cheap model can never quietly downgrade a gas leak. We never say "help is coming" until a human is actually on.
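The fork above can be sketched in plain Python. A minimal sketch under stated assumptions: the keyword patterns are a hypothetical starting list built from the emergencies named in the diagram, `ask` stands in for whatever calls Claude, `"haiku"` and `"sonnet"` are tier labels rather than real model IDs, and the `unsure` and `wants_forward` flags are an assumed way for the cheap tier to signal that it needs escalation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical keyword list based on the emergencies named above; a real one
# would be longer and tuned per trade. Plain regex: no AI involved.
EMERGENCY_RE = re.compile(
    r"\b(gas|carbon monoxide|co (?:alarm|detector)|flood\w*|burst\w*|sewage)\b",
    re.IGNORECASE,
)
NO_HEAT_RE = re.compile(r"\bno heat\b", re.IGNORECASE)
FREEZING_RE = re.compile(r"\bfreez\w*", re.IGNORECASE)


def is_emergency(utterance: str) -> bool:
    """The deterministic fork: same input, same answer, every time."""
    if EMERGENCY_RE.search(utterance):
        return True
    # "No heat" only counts as an emergency when it's freezing.
    return bool(NO_HEAT_RE.search(utterance) and FREEZING_RE.search(utterance))


@dataclass
class Reply:
    text: str
    wants_forward: bool = False  # the model asked to forward an emergency
    unsure: bool = False         # the model flagged low confidence


# ask(tier, utterance) is a hypothetical stand-in for the real model call.
Ask = Callable[[str, str], Reply]


def run_turn(utterance: str, ask: Ask) -> Tuple[str, Reply]:
    """Route one turn and return (tier used, reply)."""
    if is_emergency(utterance):
        return "sonnet", ask("sonnet", utterance)  # HIT: forced onto the smarter tier
    reply = ask("haiku", utterance)
    if reply.unsure or reply.wants_forward:
        # Haiku may not declare an emergency on a guess: escalate and re-run.
        return "sonnet", ask("sonnet", utterance)
    return "haiku", reply
```

The check is deliberately broad: "gas water heater" trips it too. A false hit only costs one turn on the smarter tier, while a false miss could downgrade a real gas leak, so the code errs toward the expensive side.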

WHY IT NEVER SOUNDS LIKE IT'S THINKING

The fillers hide the thinking time.

Predictable lines, such as the greeting or little fillers like "one moment" and "let me check that," are recorded once and replayed instantly. Only the words that genuinely have to be invented for this caller use the live brain. The instant line plays while the brain works on the rest, so there's never dead air.

Gold = the instant, pre-recorded line that buys time. Plain = the live brain inventing the part that's unique to this caller.
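The filler trick is a small concurrency pattern: start the slow, live reply first, then play a pre-recorded line while it runs. A minimal asyncio sketch, assuming the fillers are voiced once through some text-to-voice call and kept as raw audio; `synthesize`, `play`, and `generate` are hypothetical hooks, not ElevenLabs or Pipecat APIs.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Audio = bytes
Synthesize = Callable[[str], Awaitable[Audio]]  # hypothetical text-to-voice call
Play = Callable[[Audio], Awaitable[None]]       # hypothetical "send to phone line"

# Predictable lines: voiced once, replayed on every call.
FILLER_LINES = [
    "Hi, thanks for calling. Tell me what's going on.",
    "One moment.",
    "Let me check that.",
]


async def prerecord(synthesize: Synthesize) -> Dict[str, Audio]:
    """Voice each predictable line once, before any call comes in."""
    return {line: await synthesize(line) for line in FILLER_LINES}


async def respond(filler: Audio, generate: Callable[[], Awaitable[Audio]], play: Play) -> None:
    """Cover the brain's thinking time with an instant, pre-recorded line."""
    live = asyncio.create_task(generate())  # live brain + voice; starts as soon as we yield
    await play(filler)                      # plays immediately: no dead air
    await play(await live)                  # then the words unique to this caller
```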

DIAGRAM 5 OF 5 · THE MOAT

Why it gets better every month.

Every call becomes one clean, structured record. Across many shops those records roll up, anonymized, into a per-trade brain that gets sharper as volume grows.

[Diagram: why the data compounds into a moat: calls become structured records (trade, problem, urgency, outcome), converge anonymized across shops, and roll up into a per-trade brain that sharpens with volume.]
  1. Each call becomes one structured record whose key fields are fixed dropdowns — trade, problem, urgency, outcome — never free text.
  2. Across many shops, those identical records converge, anonymized, so no raw customer data leaves a shop.
  3. They roll up into one per-trade brain that gets sharper as call volume grows.
  4. The receptionist is the wedge. The data is the asset.

Every call becomes one clean, structured record — the key fields are fixed dropdowns from call one, never free text — so every call is comparable across every shop and every month. Across many shops those records roll up, anonymized, into a per-trade brain that gets smarter with volume.


Aggregates are anonymized — no raw customer data leaves a shop.
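A controlled vocabulary is easiest to enforce in the data types themselves. A minimal sketch with Python enums, assuming illustrative values for each dropdown (the real lists per trade aren't given here); `CallRecord`, `from_model_fields`, and `per_trade_rollup` are hypothetical names. The rollup only drops shop and caller identifiers before counting; real anonymization usually needs more than that, such as minimum group sizes.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Tuple


# Illustrative dropdown values; anything outside these lists is rejected.
class Trade(Enum):
    PLUMBING = "plumbing"
    HVAC = "hvac"


class Problem(Enum):
    LEAK = "leak"
    BURST_PIPE = "burst_pipe"
    DRAIN_CLOG = "drain_clog"
    NO_HEAT = "no_heat"


class Urgency(Enum):
    ROUTINE = "routine"
    EMERGENCY = "emergency"


class Outcome(Enum):
    BOOKED = "booked"
    FORWARDED = "forwarded_to_human"


@dataclass(frozen=True)
class CallRecord:
    shop_id: str       # stays with the shop
    caller_phone: str  # raw customer data: never part of the rollup
    trade: Trade
    problem: Problem
    urgency: Urgency
    outcome: Outcome

    @classmethod
    def from_model_fields(cls, shop_id: str, caller_phone: str, fields: Dict[str, str]) -> "CallRecord":
        """Build a record from the brain's filled-in fields.

        Enum lookup raises ValueError for any value outside the vocabulary,
        so free text like "water everywhere!!" can never be saved.
        """
        return cls(
            shop_id,
            caller_phone,
            Trade(fields["trade"]),
            Problem(fields["problem"]),
            Urgency(fields["urgency"]),
            Outcome(fields["outcome"]),
        )


Key = Tuple[Trade, Problem, Urgency, Outcome]


def per_trade_rollup(records: Iterable[CallRecord]) -> "Counter[Key]":
    """Count calls by the controlled fields only, dropping shop and caller IDs."""
    return Counter((r.trade, r.problem, r.urgency, r.outcome) for r in records)
```

Because every record uses the same fixed keys, two burst-pipe emergencies from two different shops land in the same bucket, which is what makes the records comparable across shops and months.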

THE SAME CALL, EVERY NIGHT

Every call gets this. 24/7. Every shop.

The 2 AM burst pipe, the Tuesday tune-up, the Saturday no-heat call — every one of them gets answered on the second ring, booked, and saved as one clean record. No call goes to voicemail. No job gets lost.

Built on AWS — Fargate runs the live call, DynamoDB keeps the records, Lambda handles the glue. Single-region, pay-as-you-go, so it scales from one shop to many without re-architecting.