LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs on the 11 hand-curated tasks in
rl-agent/scenarios/{easy,medium,hard}/*.json. Results were recorded to
rl-agent/checkpoints/<run>/metrics.jsonl and
colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.
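The per-task figures in the catalog are aggregates over individual runs. The sketch below shows one way to compute such aggregates from JSONL records. It assumes each line is a JSON object with the hypothetical fields task_id, reward, grade, mitigated and root_cause_correct. The real metrics.jsonl schema is not documented here and may use different names. Reading RC% as "share of runs where the root cause was identified" is also an assumption.

```python
import json
from collections import defaultdict


def summarize_records(lines):
    """Aggregate per-task stats from JSONL lines (any iterable of strings).

    ASSUMPTION: every non-blank line is a JSON object with the hypothetical
    fields "task_id", "reward", "grade", "mitigated" and "root_cause_correct".
    The real metrics.jsonl schema may use different names.
    """
    by_task = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        by_task[record["task_id"]].append(record)

    summary = {}
    for task_id, records in by_task.items():
        n = len(records)
        summary[task_id] = {
            "n": n,
            "mean_reward": sum(r["reward"] for r in records) / n,
            "mean_grade": sum(r["grade"] for r in records) / n,
            # Percentages over the task's runs, in the style of Mit% / RC%.
            "mit_pct": 100.0 * sum(bool(r["mitigated"]) for r in records) / n,
            "rc_pct": 100.0 * sum(bool(r["root_cause_correct"]) for r in records) / n,
        }
    return summary
```

Because the function takes any iterable of strings, an open file handle works directly. For example, call `summarize_records(fh)` inside `with open(path) as fh:`, where path points at a run's metrics.jsonl.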
Per-task performance
11 hand-authored tasks across 3 difficulty tiers (easy, medium, hard).
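The scenario layout rl-agent/scenarios/{easy,medium,hard}/*.json can be checked against the task catalog by counting files per tier. This sketch assumes one task per JSON file directly inside each tier directory and does not parse the files. Under that assumption, the catalog implies 3 easy, 4 medium and 4 hard files. The helper name count_scenarios is hypothetical.

```python
from pathlib import Path

# Tier directory names taken from rl-agent/scenarios/{easy,medium,hard}/*.json.
TIERS = ("easy", "medium", "hard")


def count_scenarios(root="rl-agent/scenarios"):
    """Return {tier: number of *.json files} for each tier directory.

    ASSUMPTION: one scenario (task) per JSON file; subdirectories are not
    searched. A missing tier directory counts as 0.
    """
    base = Path(root)
    counts = {}
    for tier in TIERS:
        tier_dir = base / tier
        counts[tier] = len(list(tier_dir.glob("*.json"))) if tier_dir.is_dir() else 0
    return counts
```

Run from the repository root, `count_scenarios()` should return `{'easy': 3, 'medium': 4, 'hard': 4}` if the one-file-per-task assumption holds.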
[Charts: Mean reward by task; Mean grade by task; Mitigation rate by task (%)]
Task catalog
| ID | Scenario | Difficulty | Target | N | Mean reward | Mean grade | Mitigation % | RC% |
|---|---|---|---:|---:|---:|---:|---:|---:|
| task1 | Redis Connection Pool Exhaustion | easy | 0.80 | 4 | +2.387 | 0.800 | 75% | 100% |
| task2 | Cascading Failure via Payments OOM | medium | 0.45 | 4 | +1.564 | 0.475 | 50% | 0% |
| task3 | Silent Decimal Corruption | hard | 0.20 | 4 | +1.456 | 0.438 | 25% | 0% |
| task4 | Kafka Broker Network Partition | easy | 0.80 | 3 | +2.158 | 0.700 | 33% | 100% |
| task5 | DNS Resolution Failure | medium | 0.45 | 3 | +1.364 | 0.367 | 0% | 0% |
| task6 | TLS Certificate Expiry Cascade | hard | 0.20 | 3 | +1.573 | 0.467 | 33% | 0% |
| task7 | ConfigMap Hot-Reload Race Condition | hard | 0.20 | 3 | +1.271 | 0.333 | 0% | 0% |
| task8 | JWT Secret Rotation Cascade | medium | 0.45 | 3 | +1.480 | 0.450 | 33% | 0% |
| task9 | Invalid Image Tag Deploy | easy | 0.80 | 3 | +2.415 | 0.883 | 100% | 100% |
| task10 | Namespace ResourceQuota Starvation | medium | 0.45 | 3 | +2.293 | 0.850 | 100% | 100% |
| task11 | Liveness Probe Path Regression | hard | 0.20 | 3 | +1.558 | 0.467 | 33% | 0% |
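The catalog reports per-task figures only. A per-tier view can be approximated by weighting each task's Reward and Grade (treated here as per-task means) by its N, and by recovering mitigated-run counts as round(Mit% * N / 100). The table values are already rounded, so these results are approximations. They do not replace re-aggregating the raw metrics.jsonl records. ROWS copies the catalog above, and tier_summary is a hypothetical helper.

```python
from collections import defaultdict

# (ID, difficulty, N, reward, grade, Mit%) copied from the task catalog.
ROWS = [
    ("task1", "easy", 4, 2.387, 0.800, 75),
    ("task2", "medium", 4, 1.564, 0.475, 50),
    ("task3", "hard", 4, 1.456, 0.438, 25),
    ("task4", "easy", 3, 2.158, 0.700, 33),
    ("task5", "medium", 3, 1.364, 0.367, 0),
    ("task6", "hard", 3, 1.573, 0.467, 33),
    ("task7", "hard", 3, 1.271, 0.333, 0),
    ("task8", "medium", 3, 1.480, 0.450, 33),
    ("task9", "easy", 3, 2.415, 0.883, 100),
    ("task10", "medium", 3, 2.293, 0.850, 100),
    ("task11", "hard", 3, 1.558, 0.467, 33),
]


def tier_summary(rows):
    """N-weighted per-tier means plus mitigated-run counts.

    Mitigated runs per task are recovered as round(Mit% * N / 100),
    e.g. 33% of 3 runs -> round(0.99) -> 1.
    """
    acc = defaultdict(lambda: {"n": 0, "reward": 0.0, "grade": 0.0, "mitigated": 0})
    for _task, tier, n, reward, grade, mit_pct in rows:
        a = acc[tier]
        a["n"] += n
        a["reward"] += reward * n  # undo the per-task mean to get a run total
        a["grade"] += grade * n
        a["mitigated"] += round(mit_pct * n / 100)
    return {
        tier: {
            "n": a["n"],
            "mean_reward": a["reward"] / a["n"],
            "mean_grade": a["grade"] / a["n"],
            "mitigated": a["mitigated"],
        }
        for tier, a in acc.items()
    }
```

Weighting by N matters here because task1, task2 and task3 have 4 runs while the other tasks have 3. A plain average of the per-task rows would slightly under-weight those three tasks.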