LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs: the 11 hand-curated tasks in
rl-agent/scenarios/{easy,medium,hard}/*.json, recorded into
rl-agent/checkpoints/<run>/metrics.jsonl and
colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.
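The legacy data can be reloaded outside the dashboard with a short script. The sketch below is not the dashboard's own loader. It assumes you run it from a repository root that contains the paths listed above, and it assumes each .jsonl file holds one JSON object per line. The record schema is not documented here, so every line is kept as a generic dict. The helper names `count_scenarios`, `read_jsonl` and `load_legacy_run` are hypothetical. The `<run>` directory is passed in as a parameter.

```python
import json
from collections import Counter
from pathlib import Path

# Tier folders under rl-agent/scenarios/ (from the legacy note above).
TIERS = ("easy", "medium", "hard")


def count_scenarios(root: Path) -> Counter:
    """Hypothetical helper: count scenario files per tier in root/{easy,medium,hard}/*.json."""
    counts = Counter()
    for tier in TIERS:
        counts[tier] = len(list((root / tier).glob("*.json")))
    return counts


def read_jsonl(path: Path) -> list[dict]:
    """Hypothetical helper: parse a JSON Lines file, skipping blank lines.

    Assumption: one JSON object per line; field names are left untouched
    because the metrics schema is not specified on this page.
    """
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_legacy_run(repo: Path, run: str) -> dict:
    """Hypothetical helper: gather the legacy inputs for one checkpoint run directory."""
    return {
        "scenarios": count_scenarios(repo / "rl-agent" / "scenarios"),
        "metrics": read_jsonl(repo / "rl-agent" / "checkpoints" / run / "metrics.jsonl"),
        "reward_breakdown": read_jsonl(
            repo / "colab" / "logs" / "reward_breakdown_history.jsonl"
        ),
    }
```

For the legacy set, `sum(load_legacy_run(Path("."), run)["scenarios"].values())` should come to 11. A different total suggests that the scenario folders have changed since these charts were recorded.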
IncidentCommander — Overview
Actor-critic PPO training telemetry · run: ppo-v3-hybrid-ollama-bedrock · mode: ppo-v3-hybrid-ollama-bedrock
Reward per update
Grade per update
Mitigation & Root-Cause rate
Tier distribution
Available runs