Framework-agnostic validation
The key test for any behavioral monitor is whether it works across frameworks, not just the one it was trained on. CAUM was validated on both SWE-agent and OpenHands sessions using the same monitor, with no retraining.
The framework-agnostic result holds: GPT-4o scores d=+1.099 (AUC=0.757) on SWE-agent and d=+1.131 (AUC=0.852) on OpenHands. The signal is actually stronger on OpenHands, which suggests the monitor generalizes across execution environments.
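Both metrics are standard and easy to recompute from raw per-session scores. A minimal sketch, assuming you have one array of scores for flagged sessions and one for clean sessions (the function names and sample data are illustrative, not part of the CAUM API):

```python
import numpy as np

def cohens_d(pos, neg):
    """Effect size: difference of means over the pooled standard deviation."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    n1, n2 = len(pos), len(neg)
    pooled_sd = np.sqrt(
        ((n1 - 1) * pos.var(ddof=1) + (n2 - 1) * neg.var(ddof=1)) / (n1 + n2 - 2)
    )
    return (pos.mean() - neg.mean()) / pooled_sd

def auc(pos, neg):
    """Mann-Whitney AUC: probability a positive score outranks a negative one."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    wins = sum((p > neg).sum() + 0.5 * (p == neg).sum() for p in pos)
    return wins / (len(pos) * len(neg))
```

A Cohen's d above 1.0 with an AUC above 0.75, as in the table below, means the two score distributions are separated by more than one pooled standard deviation.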
Performance across all tested configurations
| Model | Framework | Cohen's d | AUC | Status |
|---|---|---|---|---|
| GPT-4o | OpenHands | +1.131 | 0.852 | BEST RESULT |
| GPT-4o | SWE-agent | +1.099 | 0.757 | EXCELLENT |
| Llama 3.x | SWE-agent | +0.968 | 0.747 | EXCELLENT |
| Gemini Flash | mini-SWE | +0.804 | 0.722 | EXCELLENT |
| Claude 3.7 | SWE-agent | +0.775 | 0.650 | GOOD |
Integration with OpenHands
```python
# Works with OpenHands cloud and self-hosted
from caum import ZeroTrustAuditor

aud = ZeroTrustAuditor(
    model_hint="gpt4o",
    framework_hint="openhands",
)

# Wrap the OpenHands event stream
for event in openhands_runtime.events():
    signal = aud.push(event.action, event.observation)
    if signal["regime"] == "LOOP":
        notify_team("Loop detected", signal["severity"])

receipt = aud.finalize()
# receipt["tier"]: T1-T5 · evidence grade · hash-linked metadata
```
Analyze your OpenHands trajectories
Upload a trajectory JSONL and get an observation-only structural PDF report. First analysis free with code PIONEER.
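Trajectories already on disk can also be replayed through the auditor offline. A minimal sketch, assuming each JSONL line carries `action` and `observation` fields (the exact keys depend on how your framework exports trajectories; `replay_trajectory` is an illustrative helper, not part of the CAUM API):

```python
import json

def replay_trajectory(path, auditor):
    """Push each recorded step through an auditor and return its final receipt."""
    with open(path) as f:
        for line in f:
            step = json.loads(line)
            # Assumed schema: one JSON object per line with these two keys.
            auditor.push(step.get("action"), step.get("observation"))
    return auditor.finalize()
```

The same `push`/`finalize` loop from the live integration applies unchanged, so an offline batch run and a live session produce receipts of the same shape.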