LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs: the 11 hand-curated tasks in rl-agent/scenarios/{easy,medium,hard}/*.json, recorded into rl-agent/checkpoints/<run>/metrics.jsonl and colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.
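Both log files use the JSON Lines layout (one JSON object per line), so they can be read with the standard library alone. The sketch below is a minimal reader. The helper name `read_jsonl` is hypothetical, and nothing is assumed about the field names inside each record, since the source does not list them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

A call such as `list(read_jsonl("colab/logs/reward_breakdown_history.jsonl"))` would return the records as a list of dictionaries. For the checkpoint metrics, `<run>` has to be replaced with an actual run directory name.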
Curriculum controller
Current tier: expert · Source: training snapshot (36 episodes)
Tier composition
| Tier | Tasks |
| --- | --- |
| warmup | task1, task4, task9 |
| beginner | task1, task2, task4, task5, task9 |
| intermediate | task1–task10 |
| advanced | task2, task3, task5, task6, task7, task8, task10, task11 |
| expert | task3, task6, task7, task8, task10, task11 |
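The tier table can be read as a mapping from tier name to task list. The sketch below encodes it that way and looks up which tiers contain a given task. The names `TIERS` and `tiers_containing` are illustrative and are not taken from the rl-agent code; the range task1–task10 is expanded on the assumption that it is inclusive.

```python
# Tier membership copied from the table above; "intermediate" expands task1..task10.
TIERS = {
    "warmup": ["task1", "task4", "task9"],
    "beginner": ["task1", "task2", "task4", "task5", "task9"],
    "intermediate": [f"task{i}" for i in range(1, 11)],
    "advanced": ["task2", "task3", "task5", "task6", "task7", "task8", "task10", "task11"],
    "expert": ["task3", "task6", "task7", "task8", "task10", "task11"],
}


def tiers_containing(task):
    """Return the tiers (in curriculum order) whose task list includes `task`."""
    return [tier for tier, tasks in TIERS.items() if task in tasks]


print(tiers_containing("task11"))  # ['advanced', 'expert']
print(tiers_containing("task1"))   # ['warmup', 'beginner', 'intermediate']
```

Read this way, task11 appears only in the two hardest tiers, while task1, task4, and task9 drop out of the curriculum after the intermediate tier.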
Per-task mastery
| Task | Mastery |
| --- | --- |
| task1 | 0.84 |
| task2 | 0.34 |
| task3 | 0.25 |
| task4 | 0.68 |
| task5 | 0.15 |
| task6 | 0.29 |
| task7 | 0.13 |
| task8 | 0.28 |
| task9 | 0.95 |
| task10 | 0.94 |
| task11 | 0.29 |
Recent episodes
| Task | Score | Target | Result |
| --- | --- | --- | --- |
| task6 | 0.350 | 0.45 | fail |
| task7 | 0.300 | 0.45 | fail |
| task8 | 0.350 | 0.45 | fail |
| task9 | 0.850 | 0.70 | pass |
| task10 | 0.850 | 0.45 | pass |
| task11 | 0.600 | 0.45 | pass |
| task1 | 0.850 | 0.70 | pass |
| task2 | 0.450 | 0.45 | pass |
| task3 | 0.400 | 0.45 | fail |
| task4 | 0.600 | 0.70 | fail |
| task5 | 0.350 | 0.45 | fail |
| task6 | 0.450 | 0.45 | pass |
| task7 | 0.400 | 0.45 | fail |
| task8 | 0.600 | 0.45 | pass |
| task9 | 0.900 | 0.70 | pass |
| task10 | 0.850 | 0.45 | pass |
| task11 | 0.400 | 0.45 | fail |
| task1 | 0.900 | 0.70 | pass |
| task2 | 0.350 | 0.45 | fail |
| task3 | 0.400 | 0.45 | fail |
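Every row above is consistent with a simple rule: an episode passes when its score is at least the target (for example, task2 passes at exactly 0.450 against 0.45). The sketch below states that inferred rule and checks it against a few rows copied from the table. The function name `episode_result` is hypothetical, and the controller's actual comparison is not shown in the source.

```python
def episode_result(score, target):
    """Inferred rule: 'pass' when score meets or exceeds target, else 'fail'."""
    return "pass" if score >= target else "fail"


# (task, score, target, recorded result) rows copied from the table above.
SAMPLE_ROWS = [
    ("task6", 0.350, 0.45, "fail"),
    ("task9", 0.850, 0.70, "pass"),
    ("task2", 0.450, 0.45, "pass"),
    ("task4", 0.600, 0.70, "fail"),
]

for task, score, target, recorded in SAMPLE_ROWS:
    # Each recomputed result should match what the table recorded.
    assert episode_result(score, target) == recorded, task
```

The targets in the table take two values: 0.70 for task1, task4, and task9 (the warmup-tier tasks) and 0.45 for all others.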