LEGACY DATASET
These charts come from the kube-sre-gym-style heuristic and early notebook runs: the 11 hand-curated tasks in rl-agent/scenarios/{easy,medium,hard}/*.json, with results recorded in rl-agent/checkpoints/<run>/metrics.jsonl and colab/logs/reward_breakdown_history.jsonl. They do not include the 381-task PPO Kaggle run.

Per-task performance

11 hand-authored tasks across 3 difficulty tiers (easy, medium, hard).

Charts: Mean reward by task; Mean grade by task; Mitigation rate by task (%).

Task catalog

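| ID | Scenario | Difficulty | Target | N | Reward | Grade | Mit% | RC% |
|---|---|---|---|---|---|---|---|---|
| task1 | Redis Connection Pool Exhaustion | easy | 0.80 | 4 | +2.387 | 0.800 | 75% | 100% |
| task2 | Cascading Failure via Payments OOM | medium | 0.45 | 4 | +1.564 | 0.475 | 50% | 0% |
| task3 | Silent Decimal Corruption | hard | 0.20 | 4 | +1.456 | 0.438 | 25% | 0% |
| task4 | Kafka Broker Network Partition | easy | 0.80 | 3 | +2.158 | 0.700 | 33% | 100% |
| task5 | DNS Resolution Failure | medium | 0.45 | 3 | +1.364 | 0.367 | 0% | 0% |
| task6 | TLS Certificate Expiry Cascade | hard | 0.20 | 3 | +1.573 | 0.467 | 33% | 0% |
| task7 | ConfigMap Hot-Reload Race Condition | hard | 0.20 | 3 | +1.271 | 0.333 | 0% | 0% |
| task8 | JWT Secret Rotation Cascade | medium | 0.45 | 3 | +1.480 | 0.450 | 33% | 0% |
| task9 | Invalid Image Tag Deploy | easy | 0.80 | 3 | +2.415 | 0.883 | 100% | 100% |
| task10 | Namespace ResourceQuota Starvation | medium | 0.45 | 3 | +2.293 | 0.850 | 100% | 100% |
| task11 | Liveness Probe Path Regression | hard | 0.20 | 3 | +1.558 | 0.467 | 33% | 0% |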
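
Per-task columns such as N, Reward, Grade, and Mit% could be derived from per-episode JSONL records along the lines of the sketch below. The schema of metrics.jsonl is not shown here, so the field names task_id, reward, grade, and mitigated are assumptions for illustration only, as is the summarize helper. The sample records are made up and are not taken from the real logs.

```python
import json
from collections import defaultdict
from statistics import mean


def summarize(lines):
    """Aggregate per-episode JSONL records into per-task summary rows.

    Assumes each line is a JSON object with the hypothetical keys
    task_id, reward, grade, and mitigated.
    """
    by_task = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        by_task[record["task_id"]].append(record)

    summary = {}
    for task_id, records in by_task.items():
        n = len(records)
        summary[task_id] = {
            "n": n,
            "mean_reward": mean(r["reward"] for r in records),
            "mean_grade": mean(r["grade"] for r in records),
            # Share of episodes flagged as mitigated, as a percentage.
            "mit_pct": 100.0 * sum(bool(r["mitigated"]) for r in records) / n,
        }
    return summary


# Hypothetical records for demonstration only.
sample = [
    '{"task_id": "demo", "reward": 2.0, "grade": 0.5, "mitigated": true}',
    '{"task_id": "demo", "reward": 1.0, "grade": 1.0, "mitigated": false}',
]
demo_summary = summarize(sample)
print(demo_summary["demo"])
```

Because summarize accepts any iterable of lines, an open file handle can be passed in directly once the real field names are known.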