DIAGRAM 1 OF 4 · THE CAST
Meet the team behind the call.
Six parts do the live work of that call together. Each is a best-in-class service doing one job a person would understand.
- The Phone Line answers the call and streams the caller's audio in.
- The Ears (a transcriber) turn that speech into text, live.
- The Brain (the model) understands the text, decides what to say, and fills in one clean record of the call.
- The Mouth (the voice) turns the brain's words back into a human voice that goes out through the phone line.
- The Orchestrator (our code) sits underneath and moves audio and text between the ears, brain, and mouth.
- The Memory (a database) stores the one structured record the brain writes for each call.
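The division of labor above can be sketched as a few small interfaces. This is a minimal illustration, not the production code: every name here (Ears, Brain, Mouth, Orchestrator, handle_audio) is hypothetical, and the stub classes stand in for real transcription, model, and voice services.

```python
from dataclasses import dataclass
from typing import Protocol


class Ears(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Brain(Protocol):
    def reply(self, text: str) -> str: ...


class Mouth(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Orchestrator:
    """Hypothetical glue code: passes data between the three services."""
    ears: Ears
    brain: Brain
    mouth: Mouth

    def handle_audio(self, caller_audio: bytes) -> bytes:
        text = self.ears.transcribe(caller_audio)   # speech -> text
        answer = self.brain.reply(text)             # text -> reply text
        return self.mouth.synthesize(answer)        # reply text -> speech


# Stub services so the sketch runs without any external provider.
class EchoEars:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class CannedBrain:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class TextMouth:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


orchestrator = Orchestrator(EchoEars(), CannedBrain(), TextMouth())
reply_audio = orchestrator.handle_audio(b"my sink is leaking")
```

Because the orchestrator depends only on the three interfaces, any one service could in principle be swapped without touching the other two.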
DIAGRAM 2 OF 4 · THE HANDSHAKE
What happens when someone calls.
You never see this, but it is the reason the call connects in about half a second and stays connected. (One for the engineers: a short request fits a tiny function; holding a call open for minutes needs a long-running container. That difference is the whole reason both exist.)
- The caller dials and reaches the phone line.
- Phase 1, the doorbell, lasts only milliseconds: the phone line makes one short request to a tiny function (a program that wakes up, answers, and exits). The function replies with a one-line instruction that means "open a live audio line here," then steps out and is never in the audio path.
- Phase 2, the live call, lasts about five minutes: the phone line opens a persistent connection to a long-running server, which stays alive for the whole call and streams audio both ways the entire time.
- When the call ends, one structured record is written to the database.
- The point: a short stateless request suits a tiny function, while holding a call open for minutes requires a long-running container.
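The two phases can be contrasted in a short sketch. Everything below is an assumption made for illustration: the names doorbell and LiveCall, the JSON instruction shape, and the placeholder URL are invented, and a real phone provider defines its own instruction format and streaming protocol.

```python
import json


def doorbell(stream_url: str) -> str:
    """Phase 1: stateless. Returns one instruction and keeps nothing open.

    The JSON shape is invented for illustration only.
    """
    return json.dumps({"action": "open_audio_stream", "url": stream_url})


class LiveCall:
    """Phase 2: long-lived state that must survive for the whole call."""

    def __init__(self) -> None:
        self.frames_in = 0
        self.open = True

    def on_audio(self, frame: bytes) -> bytes:
        # A real server would forward the frame to the transcriber and
        # stream synthesized speech back; here we only count and echo.
        self.frames_in += 1
        return frame

    def hang_up(self) -> dict:
        self.open = False
        return {"frames_in": self.frames_in}  # stand-in for the call record


instruction = doorbell("wss://example.invalid/calls")  # placeholder URL
call = LiveCall()
for frame in (b"\x00" * 160, b"\x01" * 160, b"\x02" * 160):
    call.on_audio(frame)
summary = call.hang_up()
```

The function holds no state between requests, so it can exit at once; the LiveCall object accumulates state frame by frame, which is why it needs a process that stays alive.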
DIAGRAM 3 OF 4 · THE LIVE LOOP
Inside one live call.
One turn, step by step. The whole loop runs in about a second, then repeats for the next turn until the caller hangs up.
- Step 1, HEAR: the transcriber turns the caller's speech into text.
- Step 2, THINK: the model decides what to do.
- Step 3, ACT: the model runs an action, such as booking the job, notifying the owner, or updating the records.
- Step 4, SPEAK: the voice speaks the reply, sounding human.
- The loop returns to step 1 for the next turn and repeats until the caller hangs up.
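One way to picture the four steps is as a plain loop over turns. This is a sketch under stated assumptions: the function names, the action table, and the keyword rules are hypothetical stand-ins for the transcriber, the model, and the real actions, and the end of the input list stands in for the caller hanging up.

```python
from typing import Callable, Optional

# Hypothetical action table; the names mirror the examples in the text.
log: list[str] = []
ACTIONS: dict[str, Callable[[str], None]] = {
    "book_job": lambda detail: log.append(f"booked: {detail}"),
    "notify_owner": lambda detail: log.append(f"notified owner: {detail}"),
}


def hear(audio: str) -> str:
    # Stand-in for the transcriber; the "audio" is already text here.
    return audio.strip().lower()


def think(text: str) -> tuple[Optional[str], str]:
    # Stand-in for the model: keyword rules instead of real understanding.
    if "book" in text:
        return "book_job", "Done, you're booked."
    if "urgent" in text:
        return "notify_owner", "I've alerted the owner."
    return None, "Could you tell me more?"


def run_call(turns: list[str]) -> list[str]:
    spoken = []
    for audio in turns:                  # list end stands in for the hang-up
        text = hear(audio)               # 1. HEAR
        action, reply = think(text)      # 2. THINK
        if action is not None:
            ACTIONS[action](text)        # 3. ACT
        spoken.append(reply)             # 4. SPEAK
    return spoken


replies = run_call(["My pipe burst, it's URGENT", "Please book a visit"])
```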
DIAGRAM 4 OF 4 · THE MOAT
Why it gets better every month.
Every call becomes one clean, structured record. Across many shops those records roll up, anonymized, into a per-trade brain that gets sharper as volume grows.
- Each call becomes one structured record with the same fields every time: trade, problem, urgency, outcome.
- Across many shops, those identically structured records are aggregated in anonymized form, so no raw customer data leaves a shop.
- They roll up into one per-trade brain that gets sharper as call volume grows.
- The receptionist is the wedge. The data is the asset.
Because the fields are the same every time, every call is comparable across every shop and every month.
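The roll-up can be illustrated with a small sketch. The four shared fields come from the text; everything else is an assumption: shop_id and caller_phone are hypothetical stand-ins for shop-private data, and simple field-dropping stands in for whatever anonymization the real system applies (a free-text problem field, for example, could need more care).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    # shop_id and caller_phone are hypothetical private fields;
    # the remaining four are the fields named in the text.
    shop_id: str
    caller_phone: str
    trade: str
    problem: str
    urgency: str
    outcome: str


def anonymize(record: CallRecord) -> tuple[str, str, str, str]:
    """Keep only the shared fields; drop what identifies a shop or caller."""
    return (record.trade, record.problem, record.urgency, record.outcome)


def roll_up(records: list[CallRecord]) -> dict[str, Counter]:
    """Count problems per trade, using anonymized rows only."""
    per_trade: dict[str, Counter] = defaultdict(Counter)
    for trade, problem, _urgency, _outcome in map(anonymize, records):
        per_trade[trade][problem] += 1
    return dict(per_trade)


records = [
    CallRecord("shop-a", "555-0100", "plumbing", "burst pipe", "high", "booked"),
    CallRecord("shop-b", "555-0101", "plumbing", "burst pipe", "high", "booked"),
    CallRecord("shop-b", "555-0102", "plumbing", "slow drain", "low", "quoted"),
]
stats = roll_up(records)
```

Because every record has the same fields, calls from different shops land in the same per-trade counter without any per-shop mapping.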
IN SHORT
That is the whole system.
A transcriber feeds the model; the model speaks through a voice and writes one structured record, all on a server in the cloud. One agent, one call. That is the prototype.