LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic + early notebook runs: the 11 hand-curated tasks in
rl-agent/scenarios/{easy,medium,hard}/*.json, recorded into
rl-agent/checkpoints/<run>/metrics.jsonl and
colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.
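Both log files use the `.jsonl` extension, so they can presumably be read one JSON record per line. The following is a minimal sketch that assumes nothing about the record schema; `load_jsonl` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate empty lines between records
                records.append(json.loads(line))
    return records
```

For example, `load_jsonl("colab/logs/reward_breakdown_history.jsonl")` would return one Python object per logged line, assuming the file exists at that path relative to the working directory.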
Judge & persona
Persona: senior · LLM judge: disabled
Personas
| Persona | Bias | Description |
|---|---|---|
| junior | explores more, investigates thoroughly | Rewards breadth-first investigation. Tolerates red herrings. |
| senior | balanced, pragmatic | Default. Rewards hypothesis-driven investigation + timely fix. |
| principal | fast, decisive | Penalises excessive investigation. Rewards prevention plan. |
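The persona table can be represented as a small lookup structure. This is only a sketch: the `Persona` class, the `PERSONAS` mapping, and `get_persona` are hypothetical names introduced for illustration. The text fields are copied from the table, and "senior" is used as the fallback because the table marks it as the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    bias: str
    description: str


# Entries copied verbatim from the persona table.
PERSONAS = {
    p.name: p
    for p in (
        Persona("junior", "explores more, investigates thoroughly",
                "Rewards breadth-first investigation. Tolerates red herrings."),
        Persona("senior", "balanced, pragmatic",
                "Default. Rewards hypothesis-driven investigation + timely fix."),
        Persona("principal", "fast, decisive",
                "Penalises excessive investigation. Rewards prevention plan."),
    )
}

DEFAULT_PERSONA = "senior"


def get_persona(name=None):
    """Return the named persona; use the default when no name is given."""
    if name is None:
        name = DEFAULT_PERSONA
    return PERSONAS[name]  # raises KeyError for unknown persona names
```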
Holistic rubric
{
"triage": "Did the agent identify incident class on turn 1?",
"investigation": "Did evidence-gathering precede writes?",
"mitigation": "Was the correct remediation applied?",
"root_cause": "Is the postmortem root-cause correct?",
"blast_radius": "Did the postmortem quantify impact?",
"prevention": "Are prevention steps concrete and testable?"
}
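How the six rubric answers are combined into a score is not stated here. As one possible sketch, assume each criterion receives a yes/no verdict and all criteria are weighted equally (both are assumptions). `RUBRIC_KEYS` mirrors the keys of the rubric above; `holistic_score` is a hypothetical helper.

```python
RUBRIC_KEYS = (
    "triage", "investigation", "mitigation",
    "root_cause", "blast_radius", "prevention",
)


def holistic_score(verdicts):
    """Return the fraction of rubric criteria judged as satisfied.

    `verdicts` maps each rubric key to True/False. Equal weighting is an
    assumption made for illustration only.
    """
    missing = [k for k in RUBRIC_KEYS if k not in verdicts]
    if missing:
        raise KeyError(f"missing rubric verdicts: {missing}")
    return sum(bool(verdicts[k]) for k in RUBRIC_KEYS) / len(RUBRIC_KEYS)
```

A persona-specific variant could replace the equal weights with per-criterion weights, for instance weighting `prevention` more heavily for the principal persona, but no such weights are given in this section.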