CAUM Live Pilot Kit

Monitor one real agent workflow.

CAUM Live gives teams a private-content-safe meter for loops, retry pressure, stagnation, exact cycles, token exposure, and cost exposure while agents run.

No prompts. No completions. No source files. No blocking. No truth scoring.

Pilot Readout

One workflow, 20-100 tasks, observe-only.

LIVE API: OK
TASKS: Real workflow events (20-100)
SIGNAL: Hard alerts only for conservative evidence (Review)
BOUNDARY: Customer owns any action (Observe)

{
  "public_class": "hard_alert",
  "structural_health": "T5",
  "exact_cycle_coverage": 1.0,
  "allowed_to_block": false,
  "raw_content_stored": false
}
structural evidence only

The pilot is deliberately small.

The point is not to prove a universal waste rate. The point is to prove that CAUM can observe one recurring workflow, keep private content out, and surface reviewable structural evidence when the workflow starts looping or retrying.

Pick one workflow

A coding agent, browser agent, support agent, research agent, or internal automation that already runs repeatedly.

Send structural events

Tool family, phase, status, state hash, timestamps, token counters, cost counters, and latency. No raw content.

Review the receipt

CAUM returns structural tiers, public signals, hard-alert evidence, budget exposure percentages, and hash-linked receipts.

Talk in percent first.

Event-level counters can be tiny. A buyer cares about the recurring agent budget. CAUM should frame the pilot as reviewable structural exposure against monthly agent spend, then let the customer prove any realized savings with their own controls.

$562/month agent spend

5% reviewable exposure is about $28/month or $337/year. This is a scenario, not guaranteed savings.

$10,000/month agent spend

5% reviewable exposure is about $500/month or $6,000/year. The customer decides which controls to apply.

$50,000/month agent spend

5% reviewable exposure is about $2,500/month or $30,000/year. CAUM supplies evidence, not a savings guarantee.
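The three scenarios above are the same arithmetic applied to different spend levels. A minimal sketch of that calculation follows; the function name `exposure_scenario` is hypothetical and not part of CAUM, and the 5% figure is the illustrative rate used above, not a measured result.

```python
def exposure_scenario(monthly_spend_usd: float, exposure_pct: float) -> tuple[float, float]:
    """Return (monthly, annual) reviewable exposure in USD.

    This is a scenario calculation only: it scales spend by a percentage
    and says nothing about realized savings.
    """
    monthly = monthly_spend_usd * exposure_pct / 100.0
    return monthly, monthly * 12


if __name__ == "__main__":
    # Spend levels taken from the scenarios in this section.
    for spend in (562, 10_000, 50_000):
        monthly, annual = exposure_scenario(spend, 5)
        print(f"${spend:,}/month at 5%: ${monthly:,.2f}/month, ${annual:,.2f}/year")
```

The page rounds the smallest scenario to whole dollars ($28.10 becomes about $28, $337.20 becomes about $337).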

What the customer sends.

The integration should be boring. CAUM does not need prompts, model outputs, code, files, or business data to measure structural movement.

✓ Event type: tool_call, tool_result, retry, checkpoint, handoff, verification, final.
✓ Coarse tool family or tool label: browser, shell, search, file_read, test, api.
✓ Phase and status: inspect, plan, execute, retry, verify, completed, error.
✓ Local state ID or hash bucket generated by the customer system.
✓ Token, cost, and latency counters when available.
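One way to keep the integration boring is to check each event locally before it leaves the customer system. The sketch below treats the values in the checklist as allowlists, which is an assumption: the page gives them as examples and does not say they are exhaustive, and it does not separate phase values from status values, so both fields share one set here. `check_event` and the field names are hypothetical and mirror the minimal event shown on this page rather than a documented CAUM schema.

```python
# Values copied from the checklist above; treating them as closed sets is an assumption.
EVENT_TYPES = {"tool_call", "tool_result", "retry", "checkpoint", "handoff", "verification", "final"}
TOOL_FAMILIES = {"browser", "shell", "search", "file_read", "test", "api"}
PHASE_OR_STATUS = {"inspect", "plan", "execute", "retry", "verify", "completed", "error"}

# Field names mirror the minimal event example; anything else is rejected
# so raw content cannot ride along by accident.
ALLOWED_KEYS = {
    "event", "tool", "phase", "status", "state_id",
    "input_tokens", "output_tokens", "cost_usd", "latency_ms",
}


def check_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event looks structural-only."""
    problems = []
    extra = set(event) - ALLOWED_KEYS
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    if event.get("event") not in EVENT_TYPES:
        problems.append(f"unknown event type: {event.get('event')!r}")
    if event.get("tool") not in TOOL_FAMILIES:
        problems.append(f"unknown tool family: {event.get('tool')!r}")
    for key in ("phase", "status"):
        if event.get(key) not in PHASE_OR_STATUS:
            problems.append(f"unknown {key}: {event.get(key)!r}")
    return problems
```

An event carrying a `prompt` or `arguments` field would be reported under "unexpected fields" and can be dropped before sending.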

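# Minimal CAUM Live event
# Assumes `session` is an already-open CAUM Live session and
# `task_cost_counter` is the customer's own running cost counter for this task.
session.event(structural_event(
    event="tool_call",
    tool="browser",
    phase="retry",
    status="error",
    state_id="local_hash_bucket_17",  # generated locally, never raw content
    input_tokens=420,
    output_tokens=80,
    cost_usd=task_cost_counter,
    latency_ms=1200,
))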

State IDs should be generated locally. Do not send raw tool arguments, prompts, completions, source files, customer messages, secrets, or PHI/PII.
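A local state ID can be derived without exposing the underlying state. The sketch below is one possible approach, not CAUM's prescribed method: it uses a keyed hash (HMAC-SHA256 from the Python standard library) so the ID cannot be reversed by guessing likely content, then folds the digest into a fixed number of buckets. The function name `local_state_id`, the bucket count, and the `local_hash_bucket_N` label format (borrowed from the example event) are all assumptions.

```python
import hashlib
import hmac


def local_state_id(raw_state: bytes, secret_key: bytes, buckets: int = 1024) -> str:
    """Map raw local state to an opaque bucket label.

    raw_state never leaves the customer system; only the returned label is sent.
    secret_key stays local too, so the label cannot be recomputed by outsiders.
    """
    digest = hmac.new(secret_key, raw_state, hashlib.sha256).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return f"local_hash_bucket_{bucket}"


if __name__ == "__main__":
    key = b"replace-with-a-locally-stored-secret"
    # Identical state maps to the same label, which is what lets repeated
    # states be recognized without the content itself being shared.
    print(local_state_id(b"page=checkout;step=3", key))
    print(local_state_id(b"page=checkout;step=3", key))
```

Fewer buckets reveal less about the state but raise the chance that two different states share a label; the right trade-off depends on the workflow.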

What counts as a successful pilot.

A good pilot is not one where CAUM flags everything. A good pilot shows that normal work stays understandable, controlled retry loops are visible, and review-only signals are not sold as confirmed waste.

✓ At least 20 real tasks stream into CAUM Live.
✓ One controlled retry/loop run produces a conservative structural hard alert.
✓ Returned evidence confirms allowed_to_block=false.
✓ Returned evidence confirms raw private content is not required.
✓ Any savings scenario is shown as a percentage of the customer's monthly agent spend, not as a CAUM guarantee.
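The first four criteria can be checked mechanically against returned receipts. The sketch below assumes each receipt is a JSON object with the fields shown in the sample readout on this page (`public_class`, `allowed_to_block`, `raw_content_stored`); using `raw_content_stored` as the signal for the raw-content criterion is an assumption, and `pilot_readout` is a hypothetical helper, not a CAUM API.

```python
import json


def pilot_readout(task_count: int, receipts: list[dict]) -> dict[str, bool]:
    """Summarize the pilot against the checklist using receipt fields only."""
    return {
        "at_least_20_tasks": task_count >= 20,
        "hard_alert_seen": any(r.get("public_class") == "hard_alert" for r in receipts),
        "never_allowed_to_block": all(r.get("allowed_to_block") is False for r in receipts),
        "no_raw_content_stored": all(r.get("raw_content_stored") is False for r in receipts),
    }


if __name__ == "__main__":
    # Sample receipt copied from the readout shown earlier on this page.
    sample = json.loads("""
    {
      "public_class": "hard_alert",
      "structural_health": "T5",
      "exact_cycle_coverage": 1.0,
      "allowed_to_block": false,
      "raw_content_stored": false
    }
    """)
    for name, passed in pilot_readout(task_count=24, receipts=[sample]).items():
        print(f"{name}: {passed}")
```

The fifth criterion, how savings scenarios are presented, is a matter of reporting and is left to review.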

The honest commercial offer.

CAUM Live is for teams already spending recurring money on agents. PDF Receipt remains the low-friction entry point; Live is the product when a workflow repeats often enough that loop/retry exposure matters.

Entry

Run a PDF Receipt on one historical trace to prove the structural readout is understandable.

Pilot

Instrument one recurring workflow with CAUM Live and review hard alerts, cost counters, and receipts.

Expansion

If the same pattern repeats, monitor more workflows and connect customer-owned review gates or retry ceilings.

Start with one workflow.

Bring one agent trace or one running workflow. CAUM will observe structure only, produce receipts, and keep the claim boundary intact.
