LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs: the 11 hand-curated tasks in rl-agent/scenarios/{easy,medium,hard}/*.json, with metrics recorded in rl-agent/checkpoints/<run>/metrics.jsonl and colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.

Curriculum controller

Current tier: expert · Source: training snapshot (36 episodes)

Tier composition

Tier          Tasks
warmup        task1, task4, task9
beginner      task1, task2, task4, task5, task9
intermediate  task1–task10
advanced      task2, task3, task5, task6, task7, task8, task10, task11
expert        task3, task6, task7, task8, task10, task11
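
For reference, the tier composition above can be written as a small lookup structure. This is only a sketch: the dict layout and the names TIERS, _tasks, and tiers_containing are illustrative assumptions, not the controller's actual code. The task lists themselves are copied from the table.

```python
# Tier composition transcribed from the table above.
# The dict layout and helper names are illustrative assumptions,
# not the curriculum controller's real data model.
def _tasks(*nums):
    """Build task ids such as 'task3' from their numbers."""
    return [f"task{n}" for n in nums]

TIERS = {
    "warmup": _tasks(1, 4, 9),
    "beginner": _tasks(1, 2, 4, 5, 9),
    "intermediate": _tasks(*range(1, 11)),  # task1 through task10
    "advanced": _tasks(2, 3, 5, 6, 7, 8, 10, 11),
    "expert": _tasks(3, 6, 7, 8, 10, 11),
}

def tiers_containing(task):
    """Return the tier names, in curriculum order, whose pool includes `task`."""
    return [tier for tier, tasks in TIERS.items() if task in tasks]

print(tiers_containing("task9"))   # ['warmup', 'beginner', 'intermediate']
print(tiers_containing("task11"))  # ['advanced', 'expert']
```

Written this way, it is easy to see that the expert pool is a strict subset of the advanced pool, and that task1, task4, and task9 drop out of the curriculum after the intermediate tier.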

Per-task mastery

Task    Mastery
task1   0.84
task10  0.94
task11  0.29
task2   0.34
task3   0.25
task4   0.68
task5   0.15
task6   0.29
task7   0.13
task8   0.28
task9   0.95

Recent episodes

Task    Score  Target  Result
task6   0.350  0.45    fail
task7   0.300  0.45    fail
task8   0.350  0.45    fail
task9   0.850  0.70    pass
task10  0.850  0.45    pass
task11  0.600  0.45    pass
task1   0.850  0.70    pass
task2   0.450  0.45    pass
task3   0.400  0.45    fail
task4   0.600  0.70    fail
task5   0.350  0.45    fail
task6   0.450  0.45    pass
task7   0.400  0.45    fail
task8   0.600  0.45    pass
task9   0.900  0.70    pass
task10  0.850  0.45    pass
task11  0.400  0.45    fail
task1   0.900  0.70    pass
task2   0.350  0.45    fail
task3   0.400  0.45    fail
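
The Result column in the recent-episodes rows is consistent with a simple threshold check. The sketch below assumes that an episode passes when its score is greater than or equal to its target. That rule is inferred from the 20 rows shown (for example, a score of 0.450 against a target of 0.45 is a pass) and is not confirmed by the controller's code. The names EPISODES and result are hypothetical.

```python
# (task, score, target) rows transcribed from the "Recent episodes" table.
EPISODES = [
    ("task6", 0.350, 0.45), ("task7", 0.300, 0.45), ("task8", 0.350, 0.45),
    ("task9", 0.850, 0.70), ("task10", 0.850, 0.45), ("task11", 0.600, 0.45),
    ("task1", 0.850, 0.70), ("task2", 0.450, 0.45), ("task3", 0.400, 0.45),
    ("task4", 0.600, 0.70), ("task5", 0.350, 0.45), ("task6", 0.450, 0.45),
    ("task7", 0.400, 0.45), ("task8", 0.600, 0.45), ("task9", 0.900, 0.70),
    ("task10", 0.850, 0.45), ("task11", 0.400, 0.45), ("task1", 0.900, 0.70),
    ("task2", 0.350, 0.45), ("task3", 0.400, 0.45),
]

def result(score, target):
    """Assumed rule (inferred from the table): pass when score >= target."""
    return "pass" if score >= target else "fail"

passes = sum(result(score, target) == "pass" for _, score, target in EPISODES)
print(f"{passes}/{len(EPISODES)} passed")  # 10/20 passed
```

In these rows the 0.70 target applies only to task1, task4, and task9, which are the warmup-tier tasks. Every other task uses 0.45.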