LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs: the 11 hand-curated tasks in rl-agent/scenarios/{easy,medium,hard}/*.json, with metrics recorded in rl-agent/checkpoints/<run>/metrics.jsonl and colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.

IncidentCommander — Overview

Actor-critic PPO training telemetry · run: ppo-v2-heuristic · mode: heuristic
Episodes: 99
Mean Reward: 1.167
Mean Grade: 0.741
Mitigation Rate: 100.0%
Root Cause Rate: 36.4%
Current Tier: warmup
Current Task: (blank)
Updates Logged: 33
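
The summary figures above are the kind of aggregates that can be derived from a JSONL metrics log. The sketch below shows one way to compute them, assuming one JSON record per episode. The field names (reward, grade, mitigated, root_cause_found) and the summarize helper are hypothetical; the actual schema of metrics.jsonl is not documented here and may differ.

```python
import json
from statistics import mean


def summarize(lines):
    """Aggregate per-episode JSONL records into dashboard-style summary stats.

    Assumes hypothetical fields: reward, grade, mitigated, root_cause_found.
    """
    # Skip blank lines, which are common at the end of JSONL files.
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return {"episodes": 0}
    n = len(records)
    return {
        "episodes": n,
        "mean_reward": round(mean(r["reward"] for r in records), 3),
        "mean_grade": round(mean(r["grade"] for r in records), 3),
        # Rates are reported as percentages with one decimal place.
        "mitigation_rate": round(100.0 * sum(bool(r["mitigated"]) for r in records) / n, 1),
        "root_cause_rate": round(100.0 * sum(bool(r["root_cause_found"]) for r in records) / n, 1),
    }


# Two made-up records, used only to exercise the function.
sample = [
    '{"reward": 1.0, "grade": 0.5, "mitigated": true, "root_cause_found": false}',
    '{"reward": 2.0, "grade": 1.0, "mitigated": true, "root_cause_found": true}',
]
summary = summarize(sample)
print(summary)
```

To run it against a real log, pass an open file handle instead of the sample list, for example summarize(open(path)), after adjusting the field names to whatever the log actually contains.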

Reward per update

Grade per update

Mitigation & Root-Cause rate

Tier distribution

Available runs

ppo-v2-heuristic
ppo-v3-hybrid-ollama-bedrock
ppo-v4-hybrid-ollama-groq