OpenHands Integration

Monitoring for OpenHands

CAUM's strongest validation is on OpenHands sessions. Framework-agnostic behavioral monitoring with zero semantic access.

d = +1.131
Cohen's d · AUC = 0.852 · GPT-4o + OpenHands · Best result across all frameworks tested

Framework-agnostic validation

The key test for any behavioral monitor is whether it works across frameworks — not just the one it was trained on. CAUM was validated on both SWE-agent and OpenHands sessions using the same detection engine, with no retraining.

Framework-agnostic result confirmed: GPT-4o on SWE-agent yields d = +1.099, AUC = 0.757; GPT-4o on OpenHands yields d = +1.131, AUC = 0.852. The signal is actually stronger on OpenHands — the engine generalizes across execution environments.
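For readers less familiar with the metrics: Cohen's d is the standardized difference between the mean monitor scores of the two session classes, and AUC is the probability that a randomly chosen positive session scores above a randomly chosen negative one. A minimal sketch with illustrative scores (not CAUM internals):

```python
import math

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

def auc(pos, neg):
    """Probability a positive outscores a negative (ties count half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical monitor scores for degraded vs. healthy sessions
degraded = [0.9, 0.8, 0.85, 0.7]
healthy = [0.4, 0.5, 0.3, 0.45]
print(cohens_d(degraded, healthy), auc(degraded, healthy))
```

With perfectly separated score distributions like these, AUC is 1.0; the reported 0.852 means the separation on real sessions is strong but not total.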

Performance across all tested configurations

Model         Framework   Cohen's d   AUC     Status
GPT-4o        OpenHands   +1.131      0.852   BEST RESULT
GPT-4o        SWE-agent   +1.099      0.757   EXCELLENT
Llama 3.x     SWE-agent   +0.968      0.747   EXCELLENT
Gemini Flash  mini-SWE    +0.804      0.722   EXCELLENT
Claude 3.7    SWE-agent   +0.775      0.650   GOOD

Integration with OpenHands

# Works with OpenHands cloud and self-hosted
from caum import ZeroTrustAuditor

aud = ZeroTrustAuditor(
    model_hint="gpt4o",
    framework_hint="openhands"
)

# Wrap the OpenHands event stream
for event in openhands_runtime.events():
    verdict = aud.push(event.action, event.observation)
    if verdict["regime"] == "LOOP":
        notify_team("Loop detected", verdict["severity"])

cert = aud.finalize()
# cert["uds"]: overall session health score
# cert["tier"]: rating from T1 to T5
# The certificate is Ed25519-signed for tamper evidence
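CAUM's regime classification is internal, but the kind of degenerate behavior a "LOOP" verdict flags can be approximated by hashing each action/observation pair and watching for repeats within a sliding window. This is a hypothetical stand-in for illustration — the detector name, window, and threshold are assumptions, not CAUM's actual engine:

```python
import hashlib
from collections import deque

def make_loop_detector(window=8, threshold=3):
    """Flag "LOOP" when the same action/observation pair recurs
    `threshold` times within the last `window` steps.
    Illustrative heuristic only, not CAUM's detection engine."""
    recent = deque(maxlen=window)

    def push(action, observation):
        digest = hashlib.sha256(
            f"{action}\x00{observation}".encode()
        ).hexdigest()
        recent.append(digest)
        return "LOOP" if recent.count(digest) >= threshold else "OK"

    return push

push = make_loop_detector()
# An agent re-running the same command on identical output trips
# the detector on the third repetition.
verdicts = [push("ls", "same output") for _ in range(4)]
print(verdicts)  # ['OK', 'OK', 'LOOP', 'LOOP']
```

The real monitor works on behavioral features rather than raw repetition counts, which is why it needs no semantic access to the session content.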

Analyze your OpenHands trajectories

Upload a trajectory JSONL file and get a 10-page forensic PDF report. First analysis free with code PIONEER.