HOW IT WORKS

TalkTuit, end to end.

An AI voice receptionist for plumbing and HVAC shops. This page shows how it is built — the parts, how a call flows through them, and why the data compounds — at a high level, by role.

Answers 24/7 · sounds human · updates your customer records · runs in the cloud

DIAGRAM 1 OF 4 · THE CAST

Meet the team behind the call.

Six parts share the live work of each call. Each is a best-in-class service doing one job a person would understand.

[Diagram: the team of six services behind one call. The phone line feeds the ears, the brain, and the mouth; the orchestrator sits beneath them; the brain writes one record into the memory.]
  1. The Phone Line answers the call and streams the caller's audio in.
  2. The Ears (a transcriber) turn that speech into text, live.
  3. The Brain (the model) understands the text, decides what to say, and fills in one clean record of the call.
  4. The Mouth (the voice) turns the brain's words back into a human voice that goes out through the phone line.
  5. The Orchestrator (our code) sits underneath and moves audio and text between the ears, brain, and mouth.
  6. The Brain writes one structured record into the Memory (a database).
Six services. Each does one job a human would understand.
Color key: cyan = hear · blue = think · violet = speak · dashed ember = interrupt / can-fail · dotted gold = instant / no-AI · teal = good outcome
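
The cast above can be read as a set of narrow interfaces with the orchestrator wiring them together. The following is a minimal sketch under assumptions, not the actual implementation: every class and method name (`Ears`, `Brain`, `Mouth`, `Memory`, `Orchestrator`, `run_call`) is hypothetical, the stand-in services are toys, and audio is represented as plain bytes that already contain text.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


class Ears(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Brain(Protocol):
    def reply(self, text: str, record: dict) -> str: ...


class Mouth(Protocol):
    def speak(self, text: str) -> bytes: ...


@dataclass
class Memory:
    """Stand-in for the database: one appended dict per call."""
    records: list = field(default_factory=list)

    def write(self, record: dict) -> None:
        self.records.append(record)


@dataclass
class Orchestrator:
    ears: Ears
    brain: Brain
    mouth: Mouth
    memory: Memory

    def run_call(self, caller_audio: Iterable[bytes]) -> list:
        record: dict = {}
        spoken = []
        for chunk in caller_audio:                   # audio in from the phone line
            text = self.ears.transcribe(chunk)       # speech -> text
            answer = self.brain.reply(text, record)  # decide, fill the record
            spoken.append(self.mouth.speak(answer))  # text -> voice, back out
        self.memory.write(record)                    # one record per call
        return spoken


# Toy stand-ins so the sketch runs without any real service.
class TextEars:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class KeywordBrain:
    def reply(self, text: str, record: dict) -> str:
        if "leak" in text.lower():
            record["problem"] = "leak"
            return "Sorry to hear that. What is the address?"
        record.setdefault("problem", "unknown")
        return "Thanks, we will be in touch."


class TextMouth:
    def speak(self, text: str) -> bytes:
        return text.encode("utf-8")
```

Keeping each role behind its own small interface is one way to let any single service be swapped without touching the others; whether the real system is structured this way is not stated in the source.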

DIAGRAM 2 OF 4 · THE HANDSHAKE

What happens when someone calls.

You never see this — it's the reason the call connects in about half a second and never drops. (One for the engineers: a short request fits a tiny function; holding a call open for minutes needs a long-running container. That difference is the whole reason both exist.)

[Diagram: the two-connection handshake of a call. A left-to-right timeline: a milliseconds-long doorbell phase handled by a tiny function, then a live-call phase of about five minutes held by a long-running server and drawn five times wider, ending in one record written to the database.]
  1. The caller dials and reaches the phone line.
  2. Phase 1, the doorbell, lasts only milliseconds: the phone line makes one short request to a tiny function (a small program that wakes up, answers, and exits), which replies with a one-line instruction that means "open a live audio line here," then steps out and is never in the audio path.
  3. Phase 2, the live call, lasts about five minutes: the phone line opens a persistent connection to a long-running server, which stays alive for the whole call, holds it open, and streams audio both ways the entire time.
  4. When the call ends, one structured record is written to the database.
  5. The point: a short stateless request suits a tiny function, while holding a call open for minutes requires a long-running container.
You never see this, but it is why the call connects in about half a second and never drops. For engineers: a short request fits a tiny function; holding a call open for minutes needs a long-running server.
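
The doorbell step can be sketched as a pure function that returns the one-line setup instruction. The source does not name the carrier or the instruction format, so this sketch assumes a carrier that accepts a TwiML-style XML response (`<Response><Connect><Stream url="..."/></Connect></Response>`); the function name `doorbell` and the URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical address of the long-running server that holds the live call.
LIVE_AUDIO_URL = "wss://calls.example.com/live"


def doorbell(stream_url: str = LIVE_AUDIO_URL) -> str:
    """Answer the carrier's short request with one setup instruction.

    The function keeps no state and never touches audio. It only tells the
    carrier where to open the persistent connection, then it is done.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(doorbell())
```

Because the function is stateless and returns in milliseconds, it fits a small on-demand function; the persistent audio connection it points to is what needs the long-running server.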

DIAGRAM 3 OF 4 · THE LIVE LOOP

Inside one live call.

One turn, step by step. The whole loop runs in about a second, then repeats for the next turn until the caller hangs up.

[Diagram: one turn of the live call as four steps, left to right: 1 HEAR, 2 THINK, 3 ACT, 4 SPEAK, with a loop arrow from step 4 back to step 1.]
  1. Step 1, HEAR: the transcriber turns the caller's speech into text.
  2. Step 2, THINK: the model decides what to do.
  3. Step 3, ACT: it runs an action, such as booking the job, notifying the owner, or updating the records.
  4. Step 4, SPEAK: the voice delivers a human-sounding reply.
  5. The loop returns to step 1 for the next turn and repeats until the caller hangs up.
One turn = steps 1–4, about a second. It repeats for every turn until the caller hangs up.
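
The four steps can be sketched as a loop that runs once per caller turn. This is an illustration under assumptions: the functions `hear`, `think`, `speak`, and `run_call`, the `ACTIONS` table, and the keyword rules are all hypothetical stand-ins for the real transcriber, model, and voice services.

```python
from typing import Callable, Iterable, Optional


# Hypothetical actions the model can request; real ones would call a CRM, etc.
def book_job(args: dict) -> str:
    return f"booked for {args['when']}"


def notify_owner(args: dict) -> str:
    return "owner notified"


ACTIONS: dict[str, Callable[[dict], str]] = {
    "book": book_job,
    "notify": notify_owner,
}


def hear(audio: bytes) -> str:
    """Step 1 stand-in: a real transcriber would turn speech into text."""
    return audio.decode("utf-8")


def think(text: str) -> tuple[Optional[str], dict, str]:
    """Step 2 stand-in: pick an action (or none), its arguments, and a reply."""
    lowered = text.lower()
    if "tomorrow" in lowered:
        return "book", {"when": "tomorrow"}, "You are booked for tomorrow."
    if "emergency" in lowered:
        return "notify", {}, "I am alerting the owner right now."
    return None, {}, "Could you tell me a bit more?"


def speak(text: str) -> bytes:
    """Step 4 stand-in: a real voice model would return synthesized audio."""
    return text.encode("utf-8")


def run_call(turns: Iterable[bytes]) -> list:
    """Run hear -> think -> act -> speak once per caller turn.

    The iterable running out plays the role of the caller hanging up.
    """
    log = []
    for audio in turns:
        text = hear(audio)                                  # 1 HEAR
        action, args, reply = think(text)                   # 2 THINK
        result = ACTIONS[action](args) if action else None  # 3 ACT
        log.append((result, speak(reply)))                  # 4 SPEAK
    return log
```

A dispatch table keyed by action name keeps the ACT step separate from the model's decision, so new actions can be added without changing the loop.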

DIAGRAM 4 OF 4 · THE MOAT

Why it gets better every month.

Every call becomes one clean, structured record. Across many shops those records roll up, anonymized, into a per-trade brain that gets sharper as volume grows.

[Diagram: why the data compounds into a moat. A funnel that widens left to right: individual calls become record cards with the same fields, the cards converge anonymized, and they roll up into one per-trade brain with an upward growth curve.]
  1. Each call becomes one structured record with the same fields every time — trade, problem, urgency, outcome.
  2. Across many shops, those identical records converge, anonymized, so no raw customer data leaves a shop.
  3. They roll up into one per-trade brain that gets sharper as call volume grows.
  4. The receptionist is the wedge. The data is the asset.

Every call becomes one clean, structured record with the same fields every time, so every call is comparable across every shop and every month. Across many shops those records roll up, anonymized, into a per-trade brain that gets smarter with volume.

Aggregates are anonymized — no raw customer data leaves a shop.
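
The record shape and the anonymized roll-up can be sketched as follows. The four shared fields (trade, problem, urgency, outcome) come from the diagram; everything else is an assumption for illustration: `shop_id` and `caller_phone` are hypothetical stand-ins for shop-private data, and simply dropping those fields is a simplification of what real anonymization would require.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    # The four fields named in the diagram, identical on every call.
    trade: str
    problem: str
    urgency: str
    outcome: str
    # Hypothetical shop-private fields that must not leave the shop.
    shop_id: str
    caller_phone: str


def anonymize(record: CallRecord) -> tuple:
    """Keep only the shared fields; drop what identifies a shop or caller."""
    return (record.trade, record.problem, record.urgency, record.outcome)


def roll_up(records: list) -> dict:
    """Count (problem, urgency, outcome) combinations per trade."""
    per_trade: dict = {}
    for rec in records:
        trade, problem, urgency, outcome = anonymize(rec)
        bucket = per_trade.setdefault(trade, Counter())
        bucket[(problem, urgency, outcome)] += 1
    return per_trade
```

Because every record carries the same fields, counts from different shops and different months can be added together directly, which is what makes the aggregate grow more useful with volume.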

IN SHORT

That is the whole system.

A transcriber feeds the model; the model speaks through a voice and writes one structured record — on a server in the cloud. One agent, one call. That is the prototype.

Built on standard cloud infrastructure — a server runs the live call, a database keeps the records, small functions handle the glue. Single-region, pay-as-you-go, so it scales from one shop to many without re-architecting.

ROADMAP

What we build, in order.

This is the order things unlock in, not a schedule. Each phase builds on the one before it.

Order, not dates · single-agent first · the system grows around it

  Phase 0 · System design: lock these first (you are here)

    The handful of decisions that are cheap to make now and expensive to change later. This is the design behind the “How it works” tab, scoped to just the prototype.

    • the call-record shape — exactly what one call captures, the same fields every time (the rich-data spine)
    • the live-call pipeline — transcriber → model → voice, and how each hands off to the next
    • the connect path — the phone webhook plus a server that holds the call open
    • the one CRM connection — which CRM, and which fields we read and write
    • basic logging — so every call leaves a trace we can read back
  Phase 1 · Stand up the parts

    Get each piece from the “How it works” cast working on its own.

    • a phone number on a carrier
    • the transcriber (speech → text)
    • the model (the brain) with its first tools — book a job, look something up
    • the voice (text → human-sounding reply)
    • a server to run the live call, and a database to hold the records
  Phase 2 · Wire the live loop

    Connect the parts into one call that flows end to end.

    • the two-step connect — a tiny function answers, the server holds the call
    • the loop — hear → think → act → speak, repeating every turn
    • a real call flowing both directions, in a natural back-and-forth
  Phase 3 · Capture & book

    Turn the conversation into clean data and a booked job.

    • write one structured record per call — the same fields every time
    • the model fills the fields and books the job
    • push the booking into the one CRM
  Phase 4 · Prove the POC (the demo)

    Place a real call and watch it work.

    • call the number for real
    • the agent answers, handles the call, and books the job
    • a clean record lands in the database and the CRM
    • you watch the whole thing happen live — that is the prototype

That is the prototype. Everything else (more agents, billing, scale, hardening) waits until the POC proves out.