Agent coverage

One structural meter across the agents already doing the work.

CAUM watches runs from coding agents and automation agents for loops, repeated tool cycles, stalled progress, token exposure, and cost exposure. It observes structure only. It does not read prompts, source code, or private content.


Agent families CAUM can observe.

Different agents emit different logs. CAUM normalizes the shape into structural events, then reports the health of the run without scoring the content itself.
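As a rough illustration of that normalization step, the sketch below maps differently shaped log records onto one structural event type. The `StructuralEvent` class, the `normalize` function, and the key aliases are all hypothetical; they are not CAUM's actual schema or code, and real agent logs will use other field names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class StructuralEvent:
    """Structure-only view of one log record (hypothetical shape)."""
    agent: str
    event_type: str
    tool_name: str
    status: str
    tokens: int


# Assumed key aliases for illustration; each real agent needs its own mapping.
FIELD_ALIASES = {
    "event_type": ("event_type", "type", "kind"),
    "tool_name": ("tool_name", "tool", "action"),
    "status": ("status", "outcome", "result"),
    "tokens": ("tokens", "token_count", "usage_tokens"),
}


def first_present(record: Mapping[str, Any], keys: Sequence[str], default: Any) -> Any:
    """Return the value of the first key found in the record, else the default."""
    for key in keys:
        if key in record:
            return record[key]
    return default


def normalize(agent: str, record: Mapping[str, Any]) -> StructuralEvent:
    """Copy only structural fields; prompts and other content are never read."""
    return StructuralEvent(
        agent=agent,
        event_type=str(first_present(record, FIELD_ALIASES["event_type"], "unknown")),
        tool_name=str(first_present(record, FIELD_ALIASES["tool_name"], "unknown")),
        status=str(first_present(record, FIELD_ALIASES["status"], "unknown")),
        tokens=int(first_present(record, FIELD_ALIASES["tokens"], 0)),
    )
```

Because the function copies a fixed set of fields, anything else in the record, such as a prompt or a file body, is dropped by construction rather than by filtering.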

Claude Code

Track shell/tool cycles, repeated edits, long retry chains, context growth, and cost exposure in coding sessions.

tool cycles, session logs, cost

OpenAI Codex

Observe coding-agent runs from event logs, terminal activity, patch attempts, validation loops, and token or cost metadata.

patch loops, tests, live meter

Cursor / Windsurf

Convert editor-agent activity into structural evidence when session exports, tool traces, or automation logs are available.

editor agents, trace upload, review

OpenHands

Measure action/observation rhythm, command repetition, environment retries, and paths where compute stops becoming movement.

action logs, retries, T1-T5

SWE-agent

Summarize issue-solving traces into structural health tiers and hard-alert evidence for exact loops and dead-work pressure.

bench traces, cycles, receipt

LangChain, AutoGen, custom

Send neutral JSON events from your orchestration layer and keep prompts, files, customer data, and model output private.

API, JSON, zero-semantic
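Repeated tool cycles and exact loops, as mentioned in the agent descriptions above, can be found from structure alone. The sketch below is one possible approach, not CAUM's detector: it assumes each step has already been reduced to a short signature string such as `"shell:fail"`, and the function name, `min_repeats`, and `max_period` are illustrative choices.

```python
from typing import Optional, Sequence, Tuple


def find_exact_cycle(
    signatures: Sequence[str],
    min_repeats: int = 3,
    max_period: int = 8,
) -> Optional[Tuple[int, int]]:
    """Return (period, repeats) for the shortest cycle ending the run, or None.

    A cycle of period p means the last p signatures are repeated
    back-to-back at least min_repeats times at the tail of the sequence.
    """
    n = len(signatures)
    for period in range(1, max_period + 1):
        if period * min_repeats > n:
            break  # not enough steps to hold this many repeats
        block = signatures[n - period:]
        repeats = 1
        start = n - 2 * period
        # Walk backwards one block at a time while the blocks stay identical.
        while start >= 0 and signatures[start:start + period] == block:
            repeats += 1
            start -= period
        if repeats >= min_repeats:
            return period, repeats
    return None
```

For example, a run whose tail alternates `"edit:ok"` and `"shell:fail"` three times is reported as a cycle of period 2, while a run of four distinct steps returns `None`. Only the tail is inspected because a loop that the agent has already escaped is less urgent than one still in progress.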

Real validation agents are now running CAUM.

CAUM is being exercised by real non-destructive processes: production website smoke checks, backend tests, trace ingestion, privacy-boundary probes, and a controlled retry loop. The controlled retry loop is expected to trigger an alert; the healthy agents serve as comparison runs.

Latest production validation

April 30, 2026 run against the live API hosted on Railway. Honest read: the controlled retry loop reached T5 and produced a live alert. The non-control agents completed and reached review tiers only, so T4 stays labeled as review-only rather than being reported publicly as waste.

5 real process agents

T5 controlled retry loop

0 private canary leaks

4 comparison workflows

RunPod structural calibration

Calibration run of generated scenarios on May 1, 2026, on a temporary RunPod CPU pod. CAUM evaluated structural scenarios built to test loops, retries, privacy boundaries, exact cycles, and pattern cycles. This is calibration evidence, not a measure of how often these patterns occur in customer runs.

13k generated scenarios

328k structural events

0 labeled FP/FN candidates

0 leaks or control actions

Private content stays outside the meter.

CAUM needs the structure of work, not the meaning of the work. Teams can hash identities, omit prompts, and send only operational signals that describe what happened during the run.

{
  "run_id": "run_042",
  "agent": "claude_code",
  "event_type": "tool_call",
  "tool_name": "shell",
  "status": "retry",
  "tokens": 1840,
  "cost_usd": 0.021,
  "timestamp": "2026-04-30T14:25:00Z"
}
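One way a team might produce such an event on its own side is sketched below. It assumes an allowlist built from the fields in the example above and a keyed hash (HMAC-SHA256) for identities; the function names, the truncation to 16 hex characters, and the choice of HMAC are illustrative assumptions, not a documented CAUM requirement.

```python
import hashlib
import hmac
import json

# Fields taken from the example event above; everything else is dropped.
ALLOWED_FIELDS = (
    "run_id", "agent", "event_type", "tool_name",
    "status", "tokens", "cost_usd", "timestamp",
)


def hash_identity(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash so it stays stable but opaque."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an assumption, not a rule


def to_neutral_event(raw: dict, secret: bytes) -> dict:
    """Keep only allowlisted operational fields and hash the run identity."""
    event = {key: raw[key] for key in ALLOWED_FIELDS if key in raw}
    if "run_id" in event:
        event["run_id"] = hash_identity(str(event["run_id"]), secret)
    return event


def to_json_line(raw: dict, secret: bytes) -> str:
    """Serialize one neutral event as a single JSON line."""
    return json.dumps(to_neutral_event(raw, secret), sort_keys=True)
```

An allowlist is used instead of a blocklist so that a newly added field, such as a prompt or a file path, stays private unless someone deliberately adds it to `ALLOWED_FIELDS`.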

Two ways to start.

Use the CAUM Receipt when you already have logs. Use Live Meter when you want observability during the run. Either path ends with acting on the evidence.

1. Upload

Drop messy JSON, JSONL, or exported trace data into the CAUM Receipt flow and receive a structural audit after the run.

2. Stream

Send live event records from your agent runtime and watch structural health, hard alerts, tokens, and cost exposure.

3. Act

Use CAUM evidence to review compute exposure, tune retries, compare agent workflows, and decide what deserves deeper inspection.
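Before uploading, it can help to check locally what a messy JSONL export actually contains. The sketch below is a stand-alone pre-check, not part of CAUM: it skips lines that are not JSON objects and totals tokens, cost, and the longest consecutive retry chain per run. The field names follow the example event shown earlier; the function names are hypothetical.

```python
import json
from collections import defaultdict


def parse_jsonl(text: str):
    """Return (events, skipped): JSON objects kept, unusable lines counted."""
    events, skipped = [], 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored, not counted as skipped
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(record, dict):
            events.append(record)
        else:
            skipped += 1  # valid JSON, but not an event object
    return events, skipped


def summarize(events):
    """Total tokens and cost per run and track the longest retry chain."""
    totals = defaultdict(
        lambda: {"events": 0, "tokens": 0, "cost_usd": 0.0, "longest_retry_chain": 0}
    )
    current_chain = defaultdict(int)
    for event in events:
        run_id = str(event.get("run_id", "unknown"))
        run = totals[run_id]
        run["events"] += 1
        run["tokens"] += int(event.get("tokens") or 0)
        run["cost_usd"] += float(event.get("cost_usd") or 0.0)
        if event.get("status") == "retry":
            current_chain[run_id] += 1
            run["longest_retry_chain"] = max(
                run["longest_retry_chain"], current_chain[run_id]
            )
        else:
            current_chain[run_id] = 0  # any non-retry event ends the chain
    return dict(totals)
```

A summary like this is only a local sanity check on exposure; it says nothing about whether the retries were justified.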

CAUM is the meter between agent activity and actual progress.

It gives operators a structural receipt for where compute went, without pretending to know whether the agent was semantically right.
