Framework-agnostic validation
The key test for any behavioral monitor is whether it works across frameworks, not just the one it was trained on. CAUM was validated on both SWE-agent and OpenHands sessions using the same engine, with no retraining.
The framework-agnostic claim holds: with GPT-4o, the same engine scores d=+1.099, AUC=0.757 on SWE-agent and d=+1.131, AUC=0.852 on OpenHands. The signal is actually stronger on OpenHands, which indicates the engine generalizes across execution environments rather than overfitting to one.
Performance across all tested configurations
| Model | Framework | Cohen's d | AUC | Status |
|---|---|---|---|---|
| GPT-4o | OpenHands | +1.131 | 0.852 | BEST RESULT |
| GPT-4o | SWE-agent | +1.099 | 0.757 | EXCELLENT |
| Llama 3.x | SWE-agent | +0.968 | 0.747 | EXCELLENT |
| Gemini Flash | mini-SWE | +0.804 | 0.722 | EXCELLENT |
| Claude 3.7 | SWE-agent | +0.775 | 0.650 | GOOD |
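Both headline metrics in the table can be computed directly from per-session scores: Cohen's d is the mean difference between the two groups over their pooled standard deviation, and AUC is the probability that a randomly chosen positive session outranks a randomly chosen negative one. A minimal stdlib sketch (the function names and score arrays are illustrative, not CAUM internals):

```python
from statistics import mean, variance

def cohens_d(a, b):
    """Effect size: mean difference over the pooled sample standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
        / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def auc(pos, neg):
    """ROC AUC via pairwise ranking: P(pos score > neg score), ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.852, for example, means that in 85.2% of positive/negative session pairs the monitor ranks the positive one higher.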
Integration with OpenHands
```python
# Works with OpenHands cloud and self-hosted
from caum import ZeroTrustAuditor

aud = ZeroTrustAuditor(
    model_hint="gpt4o",
    framework_hint="openhands",
)

# Wrap the OpenHands event stream
for event in openhands_runtime.events():
    verdict = aud.push(event.action, event.observation)
    if verdict["regime"] == "LOOP":
        notify_team("Loop detected", verdict["severity"])

cert = aud.finalize()
# cert["uds"] health score · cert["tier"] T1–T5 · Ed25519 signed
```
Analyze your OpenHands trajectories
Upload a trajectory JSONL and get a 10-page forensic PDF report. First analysis free with code PIONEER.
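A recorded trajectory can also be replayed locally through the auditor before uploading. A minimal sketch, assuming each JSONL line carries `action` and `observation` fields (the field names and the `stream_trajectory` helper are illustrative; adapt them to your export format):

```python
import json

def stream_trajectory(path, auditor):
    """Replay a trajectory JSONL through an auditor, one event per line.

    The auditor is expected to expose push(action, observation) and
    finalize(), matching the integration snippet above.
    """
    verdicts = []
    with open(path) as fh:
        for line in fh:
            event = json.loads(line)
            verdicts.append(auditor.push(event["action"], event["observation"]))
    return verdicts, auditor.finalize()
```

Replaying locally gives per-event verdicts up front, so only sessions that actually trip a regime need the full forensic report.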