Everything you need to observe your AI agents with CAUM.
Get CAUM observing your agents in under 60 seconds.
pip install caum
Register at caum.systems/pilot or via API:
curl -X POST https://caum-observation-production.up.railway.app/v1/register \
-H "Content-Type: application/json" \
-d '{"email": "you@company.com", "company": "Your Company"}'
Option A: Zero-code wrapper (recommended)
Wrap your existing Anthropic or OpenAI client. Your agent runs exactly the same. CAUM observes automatically.
# Anthropic
import caum, anthropic
caum.init("caum_live_YOUR_KEY")
client = caum.wrap_anthropic(anthropic.Anthropic())
# Use client exactly as before. CAUM observes every tool call.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    messages=[...],
    tools=[...],
)
# OpenAI
import caum, openai
caum.init("caum_live_YOUR_KEY")
client = caum.wrap_openai(openai.OpenAI())
# Use client exactly as before.
response = client.chat.completions.create(
    model="gpt-5",
    messages=[...],
    tools=[...],
)
Option B: Manual observation
Call caum.observe() after each tool call in your agent:
import caum
caum.init("caum_live_YOUR_KEY")
caum.observe("session-1", tool="bash", content="pytest tests/")
caum.observe("session-1", tool="edit", content="Fixed the bug")
result = caum.end("session-1")
print(result["tier"]) # T1-T5
Option C: Raw API (curl)
curl -X POST https://caum-observation-production.up.railway.app/v1/step \
-H "Authorization: Bearer caum_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"session_id": "task-1", "step": 1, "tool": "bash", "content": "pytest tests/"}'
Response:
{
"regime": "EXPLORER",
"failure_risk": 0.23,
"uds": 0.77,
"tier": "T2",
"advisory": null,
"merkle_root": "a7c3e1f9...",
"alert_fired": false,
"observation": "passive"
}
Finalize the session when the agent completes to get the full report:
result = caum.end("session-1", resolved=True)
# or via curl:
# POST /v1/session/session-1/end
CAUM observes and notifies. CAUM never controls agent execution. If CAUM goes down, your agent keeps running. Policy actions are at your discretion.
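Because observation is passive, a failed or slow observation call should not interrupt the agent loop. The following is a minimal sketch of a fail-open helper. `safe_observe` and `observe_fn` are hypothetical names used for illustration, not part of the caum package; `observe_fn` stands in for whichever observation call you use.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("caum_observer")


def safe_observe(observe_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
    """Run an observation call and swallow any error it raises.

    Returns the observation result, or None if the call failed, so the
    agent loop continues whether or not the observation succeeded.
    """
    try:
        return observe_fn(*args, **kwargs)
    except Exception as exc:  # fail open: observation is advisory only
        logger.warning("CAUM observation skipped: %s", exc)
        return None
```

With this helper, `safe_observe(caum.observe, "session-1", tool="bash", content="pytest tests/")` returns `None` instead of raising if the call fails.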
| Regime | Meaning | Your agent is... |
|---|---|---|
| EXPLORER | Healthy | Using diverse tools, making progress |
| GRIND | Degraded | Repeating similar actions |
| LOOP | Critical | Stuck in a repetitive cycle |
| REASONING_LOOP | Critical | Thinking in circles, not acting |
| STAGNATION | Degraded | Same tool, same result, over and over |
Base URL: https://caum-observation-production.up.railway.app
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /v1/register | None | Create account, get API key |
| POST | /v1/step | Bearer | Observe a tool call (core endpoint) |
| GET | /v1/sessions | Bearer | List all sessions |
| GET | /v1/session/{id} | Bearer | Session detail with all steps |
| POST | /v1/session/{id}/end | Bearer | Finalize session, get full report |
| POST | /v1/alerts | Bearer | Create alert rules |
| GET | /v1/alerts/history | Bearer | List fired alerts |
| GET | /v1/usage | Bearer | Usage stats and trial status |
| POST | /v1/checkout | Bearer | Stripe subscription checkout |
| Field | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | Your identifier for this session |
| step | integer | Yes | Step number (1-indexed) |
| tool | string | Yes | Tool name (bash, code_editor, etc.) |
| content | string | No | Tool output summary (improves detection) |
| agent_name | string | No | Agent identifier |
| team | string | No | Team that owns the agent |
| task | string | No | Task description |
| tokens_used | integer | No | LLM tokens consumed |
| cost_usd | float | No | Cost of this step |
| state_hash | string | No | SHA-256 of tool output |
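The field table above can be turned into a small request-body builder. This is a sketch only: `build_step_payload` is a hypothetical helper, and it assumes `state_hash` may be computed over the same string sent as `content` (the table says only "SHA-256 of tool output", so hash the full output instead if you have it).

```python
import hashlib
from typing import Any, Dict, Optional

# Optional fields listed in the /v1/step field table.
_OPTIONAL_FIELDS = {"agent_name", "team", "task", "tokens_used", "cost_usd", "state_hash"}


def build_step_payload(
    session_id: str,
    step: int,
    tool: str,
    content: Optional[str] = None,
    **optional_fields: Any,
) -> Dict[str, Any]:
    """Assemble a /v1/step request body from the documented fields."""
    if step < 1:
        raise ValueError("step is 1-indexed")
    unknown = set(optional_fields) - _OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload: Dict[str, Any] = {"session_id": session_id, "step": step, "tool": tool}
    if content is not None:
        payload["content"] = content
        # Assumption: hash the content string; an explicit state_hash overrides it.
        payload["state_hash"] = hashlib.sha256(content.encode("utf-8")).hexdigest()
    payload.update(optional_fields)
    return payload
```

The resulting dict can be passed as the `json=` argument of an HTTP client when posting to `/v1/step`.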
Minimal wrapper for any Python agent:
import hashlib

import requests


class CaumObserver:
    def __init__(self, api_key, session_id, agent_name=""):
        self.api_key = api_key
        self.session_id = session_id
        self.agent_name = agent_name
        self.step = 0
        self.base = "https://caum-observation-production.up.railway.app"
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def observe(self, tool, content, cost=0):
        """Call after each tool use. Returns regime assessment."""
        self.step += 1
        resp = requests.post(
            f"{self.base}/v1/step",
            headers=self.headers,
            json={
                "session_id": self.session_id,
                "step": self.step,
                "tool": tool,
                "content": content[:500],
                "agent_name": self.agent_name,
                "cost_usd": cost,
                "state_hash": hashlib.sha256(content.encode()).hexdigest()[:16],
            },
            timeout=10,  # do not block the agent if CAUM is unreachable
        )
        return resp.json()

    def end(self):
        """Finalize session. Call when agent completes."""
        resp = requests.post(
            f"{self.base}/v1/session/{self.session_id}/end",
            headers=self.headers,
            timeout=10,
        )
        return resp.json()

# Usage
caum = CaumObserver("caum_live_YOUR_KEY", "fix-auth-bug", "Claude Sonnet")
# After each tool call in your agent loop:
result = caum.observe("bash", "pytest tests/ -v -- 5 passed", cost=0.03)
print(result["regime"]) # EXPLORER
print(result["uds"]) # 0.88
# When done:
report = caum.end()
print(report["tier"]) # T2
Configure alert rules to receive webhooks when CAUM detects degradation:
curl -X POST https://caum-observation-production.up.railway.app/v1/alerts \
-H "Authorization: Bearer caum_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '[{
"condition": "tier>=T4",
"action_type": "webhook",
"destination": "https://your-server.com/caum-alert"
}]'
Example webhook payload:
{
"event": "caum.advisory",
"agent": "Claude Sonnet",
"team": "Backend",
"task": "Fix auth service bug",
"session_id": "fix-auth-bug",
"step": 14,
"regime": "LOOP",
"uds": 0.22,
"tier": "T4",
"total_steps": 14,
"total_cost": 0.42,
"message": "CAUM Advisory: LOOP at step 14, UDS=0.22, Tier=T4",
"dashboard_url": "https://caum.systems/dashboard/fix-auth-bug",
"timestamp": "2026-04-03T..."
}
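On the receiving side, a webhook handler only needs to parse the JSON body and read the fields shown in the payload above. The sketch below uses the standard library only. `summarize_advisory` is a hypothetical helper, and it assumes the body is UTF-8 JSON shaped like the example.

```python
import json
from typing import Optional


def summarize_advisory(raw_body: bytes) -> Optional[str]:
    """Return a one-line summary of a caum.advisory webhook body.

    Returns None for any other event type so callers can ignore it.
    """
    payload = json.loads(raw_body)
    if payload.get("event") != "caum.advisory":
        return None
    return (
        f"[{payload.get('tier', '?')}] {payload.get('agent', 'unknown agent')} "
        f"in {payload.get('regime', '?')} at step {payload.get('step', '?')} "
        f"(UDS={payload.get('uds', '?')}, session={payload.get('session_id', '?')})"
    )
```

Call it from whatever web framework serves your `destination` URL and forward the summary to your own logging or paging system.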
| Type | Destination | Format |
|---|---|---|
| webhook | Any HTTPS URL | JSON POST |
| slack | Slack webhook URL | Block Kit message |
| email | Email address | HTML email via Resend |
| Condition | Triggers when |
|---|---|
| tier>=T4 | Session enters T4 or T5 |
| tier>=T3 | Session enters T3, T4, or T5 |
| tier=T5 | Session enters T5 only |
| regime=LOOP | Agent enters LOOP regime |
| regime=GRIND | Agent enters GRIND regime |
| uds<0.3 | UDS drops below 0.3 |
| uds<0.5 | UDS drops below 0.5 |
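To check which rules would apply to a given step response, the condition strings can be evaluated locally. This is an illustrative sketch, not CAUM's server-side logic, which is not documented here. `condition_matches` is a hypothetical helper that compares against a single step result and does not model the "enters" or "drops below" transition between steps.

```python
import re
from typing import Any, Dict

# Longer operators come first so ">=" is not read as ">" followed by "=".
_CONDITION = re.compile(r"^(tier|regime|uds)\s*(>=|<=|=|<|>)\s*(\S+)$")


def condition_matches(condition: str, step_result: Dict[str, Any]) -> bool:
    """Evaluate one condition string against one /v1/step-style result."""
    match = _CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, op, raw = match.groups()

    if field == "regime":
        if op != "=":
            raise ValueError("regime supports only '='")
        return step_result["regime"] == raw

    if field == "tier":
        # "T4" -> 4; a higher number means a worse tier.
        left, right = int(step_result["tier"][1:]), int(raw[1:])
    else:  # uds
        left, right = float(step_result["uds"]), float(raw)

    return {
        ">=": left >= right,
        "<=": left <= right,
        "=": left == right,
        "<": left < right,
        ">": left > right,
    }[op]
```

For example, `condition_matches("tier>=T4", {"tier": "T5", "regime": "LOOP", "uds": 0.1})` evaluates to `True`.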
CAUM classifies each step into a behavioral regime:
| Regime | Meaning | Healthy? |
|---|---|---|
| WARMING_UP | Agent is reading files and understanding the codebase | Yes |
| EXPLORER | Agent is making diverse, productive tool calls | Yes |
| EXPANSION | Agent is building new code or features | Yes |
| LINEAR_PROGRESS | Steady forward movement on the task | Yes |
| STABLE | Consistent work pattern, no anomalies | Yes |
| GRIND | Repetitive tool use, low diversity, may be stuck | Warning |
| LOOP | Agent is repeating the same actions | No |
| STAGNATION | Agent has stopped making progress | No |
| FUNNEL_TRAP | Agent is narrowing into a dead end | No |
UDS (Unified Detection Score) maps to tiers:
| Tier | UDS Range | Meaning |
|---|---|---|
| T1 | 0.85 - 1.00 | Excellent. Diverse tools, real substance, productive work. |
| T2 | 0.70 - 0.84 | Good. Genuine work with minor structural issues. |
| T3 | 0.50 - 0.69 | Marginal. Some substance but tool monotony or short session. |
| T4 | 0.30 - 0.49 | Degraded. Low substance, repetitive patterns, possible loop. |
| T5 | 0.00 - 0.29 | Critical. Gaming, looping, or no productive work detected. |
UDS is computed as: UDS = 0.30 * TCR + 0.40 * ESR + 0.30 * (SCI * ESR)
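The formula and tier table can be worked through in a few lines. This sketch assumes TCR, ESR, and SCI are each in the range 0 to 1 (their definitions are not given in this section). It also assumes each tier starts at the lower bound shown in the table, so a value such as 0.845 falls in T2. `compute_uds` and `tier_for` are hypothetical helper names.

```python
def compute_uds(tcr: float, esr: float, sci: float) -> float:
    """UDS = 0.30 * TCR + 0.40 * ESR + 0.30 * (SCI * ESR)."""
    return 0.30 * tcr + 0.40 * esr + 0.30 * (sci * esr)


def tier_for(uds: float) -> str:
    """Map a UDS value to T1..T5 using the lower bounds from the tier table."""
    if not 0.0 <= uds <= 1.0:
        raise ValueError("UDS must be between 0 and 1")
    for tier, lower_bound in (("T1", 0.85), ("T2", 0.70), ("T3", 0.50), ("T4", 0.30)):
        if uds >= lower_bound:
            return tier
    return "T5"


# Worked example: 0.30*0.8 + 0.40*0.7 + 0.30*(0.5*0.7) = 0.24 + 0.28 + 0.105 = 0.625
uds = compute_uds(tcr=0.8, esr=0.7, sci=0.5)
tier = tier_for(uds)  # 0.625 falls in the 0.50 - 0.69 band, i.e. T3
```

Because ESR appears in two terms, a low ESR reduces the score through both the 0.40 term and the SCI product term.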
All endpoints except /v1/register require a Bearer token:
Authorization: Bearer caum_live_YOUR_API_KEY
API keys are generated on registration and start with caum_live_.
When the trial expires, the API returns HTTP 402 with an upgrade link:
{
"error": "trial_expired",
"message": "Trial expired. Upgrade at caum.systems/pilot/",
"upgrade_url": "https://caum.systems/pilot/"
}
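A client can surface this case explicitly rather than treating it as a generic failure. The sketch below assumes the 402 body has the shape shown above. `TrialExpired` and `check_trial` are hypothetical names, and the caller is expected to pass in the status code and parsed JSON body from its own HTTP client.

```python
from typing import Any, Dict, Optional


class TrialExpired(Exception):
    """Raised when the API reports an expired trial (HTTP 402)."""

    def __init__(self, message: str, upgrade_url: Optional[str]) -> None:
        super().__init__(message)
        self.upgrade_url = upgrade_url


def check_trial(status_code: int, body: Dict[str, Any]) -> None:
    """Raise TrialExpired for a 402 trial_expired response; otherwise do nothing."""
    if status_code == 402 and body.get("error") == "trial_expired":
        raise TrialExpired(
            body.get("message", "Trial expired."),
            body.get("upgrade_url"),
        )
```

Catching `TrialExpired` in the observation path lets the agent keep running while you log the `upgrade_url` for whoever owns the account.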