LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic + early notebook runs: the 11 hand-curated tasks in rl-agent/scenarios/{easy,medium,hard}/*.json, recorded into rl-agent/checkpoints/<run>/metrics.jsonl and colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.
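
The two log files above are JSON Lines, so a quick way to check what a legacy run recorded is to count records and list the top-level keys that appear. The sketch below assumes only that each non-blank line is one JSON value; the helper name summarize_jsonl is hypothetical, and no field names are assumed because the schema is not described here.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file.

    Hypothetical helper: it makes no assumption about the record schema.
    """
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

For example, summarize_jsonl("colab/logs/reward_breakdown_history.jsonl") would return the number of logged entries and how often each key occurs, which helps confirm that a chart is reading the legacy files rather than the PPO run.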

Judge & persona

Persona: senior · LLM judge: disabled

Personas

Persona    Bias                                    Description
junior     explores more, investigates thoroughly  Rewards breadth-first investigation. Tolerates red herrings.
senior     balanced, pragmatic                     Default. Rewards hypothesis-driven investigation + timely fix.
principal  fast, decisive                          Penalises excessive investigation. Rewards prevention plan.
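
The persona table can be held as a small lookup structure. The sketch below copies the three rows verbatim and falls back to senior, which the table marks as the default; the names Persona, PERSONAS, and get_persona are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    bias: str
    description: str


# Rows copied from the persona table; the container itself is an assumption.
PERSONAS = {
    p.name: p
    for p in (
        Persona("junior", "explores more, investigates thoroughly",
                "Rewards breadth-first investigation. Tolerates red herrings."),
        Persona("senior", "balanced, pragmatic",
                "Default. Rewards hypothesis-driven investigation + timely fix."),
        Persona("principal", "fast, decisive",
                "Penalises excessive investigation. Rewards prevention plan."),
    )
}

DEFAULT_PERSONA = "senior"  # the table labels senior as the default


def get_persona(name=None):
    """Look up a persona by name, falling back to the default when none is given."""
    key = (name or DEFAULT_PERSONA).strip().lower()
    try:
        return PERSONAS[key]
    except KeyError:
        raise ValueError(
            f"unknown persona {name!r}; expected one of {sorted(PERSONAS)}"
        ) from None
```

Calling get_persona() with no argument returns the senior persona, and an unrecognised name raises ValueError instead of silently picking one.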

Holistic rubric

{
  "triage":          "Did the agent identify incident class on turn 1?",
  "investigation":   "Did evidence-gathering precede writes?",
  "mitigation":      "Was the correct remediation applied?",
  "root_cause":      "Is the postmortem root-cause correct?",
  "blast_radius":    "Did the postmortem quantify impact?",
  "prevention":      "Are prevention steps concrete and testable?"
}
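
One way the six rubric questions could be turned into a single number is sketched below. This is an illustration under stated assumptions, not the project's scoring code: it assumes each criterion receives a score in [0, 1] and that all criteria are weighted equally, and neither assumption is stated in the rubric. The names RUBRIC and holistic_score are hypothetical.

```python
# Rubric keys and questions copied from the holistic rubric.
RUBRIC = {
    "triage": "Did the agent identify incident class on turn 1?",
    "investigation": "Did evidence-gathering precede writes?",
    "mitigation": "Was the correct remediation applied?",
    "root_cause": "Is the postmortem root-cause correct?",
    "blast_radius": "Did the postmortem quantify impact?",
    "prevention": "Are prevention steps concrete and testable?",
}


def holistic_score(scores):
    """Average per-criterion scores (assumed range [0, 1], equal weights)."""
    missing = RUBRIC.keys() - scores.keys()
    extra = scores.keys() - RUBRIC.keys()
    if missing or extra:
        raise ValueError(
            f"missing criteria: {sorted(missing)}, unknown criteria: {sorted(extra)}"
        )
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value!r} is outside [0, 1]")
    return sum(scores.values()) / len(RUBRIC)
```

Rejecting missing or unknown keys keeps a partially filled judgement from being averaged as if it were complete. A persona-specific variant would replace the equal weights with per-persona ones, but those values are not given here.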