LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs: the 11 hand-curated tasks in rl-agent/scenarios/{easy,medium,hard}/*.json, with metrics recorded to rl-agent/checkpoints/<run>/metrics.jsonl and colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.
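
The scenario layout named above can be inspected with a short script. This is a minimal sketch, assuming each tier directory holds one JSON file per task; `count_scenarios` is a hypothetical helper for illustration, not something taken from the repository.

```python
from pathlib import Path

# Tier directory names as they appear in the path pattern above.
TIERS = ("easy", "medium", "hard")


def count_scenarios(root):
    """Return {tier: number of *.json files} found under root/<tier>/.

    A tier whose directory is missing is reported as 0 rather than raising.
    """
    root = Path(root)
    counts = {}
    for tier in TIERS:
        tier_dir = root / tier
        counts[tier] = len(list(tier_dir.glob("*.json"))) if tier_dir.is_dir() else 0
    return counts


if __name__ == "__main__":
    counts = count_scenarios("rl-agent/scenarios")
    print(counts, "total:", sum(counts.values()))
```

If the legacy set is intact, the per-tier counts would be expected to sum to 11.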

IncidentCommander — Overview

Actor-critic PPO training telemetry · run: ppo-v3-hybrid-ollama-bedrock · mode: ppo-v3-hybrid-ollama-bedrock
Episodes: 36
Mean Reward: 1.322
Mean Grade: 0.631
Mitigation Rate: 69.4%
Root Cause Rate: 36.1%
Current Tier: warmup
Current Task: (none shown)
Updates Logged: 12
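
Summary figures like the ones above could be recomputed from a per-episode JSONL log. The sketch below assumes one JSON object per line with the fields `reward`, `grade`, `mitigated`, and `root_cause_found`; these field names are hypothetical, since the actual schema of metrics.jsonl is not shown here. As an arithmetic aside, the displayed rates are consistent with 25 of 36 and 13 of 36 episodes, though the dashboard does not state the underlying counts.

```python
import json
from statistics import mean


def summarize(lines):
    """Aggregate per-episode records (one JSON object per line).

    Field names are assumed for illustration, not taken from the real log.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    n = len(records)
    if n == 0:
        return {"episodes": 0}
    return {
        "episodes": n,
        "mean_reward": mean(r["reward"] for r in records),
        "mean_grade": mean(r["grade"] for r in records),
        # Rates are the fraction of episodes where the flag is truthy.
        "mitigation_rate": sum(bool(r["mitigated"]) for r in records) / n,
        "root_cause_rate": sum(bool(r["root_cause_found"]) for r in records) / n,
    }


if __name__ == "__main__":
    # Replace <run> with an actual run directory name before using this path.
    with open("rl-agent/checkpoints/<run>/metrics.jsonl") as fh:
        print(summarize(fh))
```

Because `summarize` accepts any iterable of lines, it can be fed an open file handle or a list of strings in a notebook.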

[Charts: Reward per update · Grade per update · Mitigation & Root-Cause rate · Tier distribution]

Available runs: ppo-v2-heuristic · ppo-v3-hybrid-ollama-bedrock · ppo-v4-hybrid-ollama-groq