PPO KAGGLE · 381 TASKS
These charts come from the three-shard PPO + LoRA training run on free Kaggle T4 GPUs. Source data: kaggle ran notebooks/shard {1,2,3}/training_kaggle{N}.json, plus the 381 scenarios in rl-agent/scenarios/sim/{easy,medium,hard}/*.json, pre-bundled into rl-agent/showcase_data.json by scripts/build_showcase_data.py. Every number shown is computed from those files.

Adversarial scenarios · PPO coverage

Scenarios with active adversaries, runbook traps, cascading failures, or noisy distractors.
Adversarial scenarios: 25
Total tasks: 381
Coverage: 6.6%
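
The coverage figure is the adversarial count divided by the total task count, shown to one decimal place. A minimal sketch of that arithmetic follows; the function name `coverage_pct` is hypothetical and is not taken from scripts/build_showcase_data.py.

```python
def coverage_pct(adversarial: int, total: int) -> float:
    """Share of adversarial scenarios among all tasks, as a percentage
    rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * adversarial / total, 1)


# Counts taken from the stat cards above.
print(coverage_pct(25, 381))  # 6.6
```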

Definition

Adversarial scenarios add a saboteur agent that re-injects faults on a cooldown, noisy Slack channels designed to actively misdirect the agent, runbook traps where the standard playbook makes things worse, and cascading failures where the loud symptom hides the real root cause. Source files: rl-agent/scenarios/sim/hard/sim_advanced_*.json.
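
One way to pick scenario files by name is a shell-style pattern match on the filename. The sketch below is an assumption about how such a filter could look, not the project's actual selection code; `select_by_pattern` is a hypothetical helper. The patterns are a parameter because the per-scenario table also lists `sim_gen_redherring_*` IDs alongside the `sim_advanced_*` ones.

```python
from fnmatch import fnmatchcase


def select_by_pattern(filenames, patterns=("sim_advanced_*.json",)):
    """Keep the filenames that match at least one shell-style pattern.

    Matching is case-sensitive and preserves the input order.
    """
    return [
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Illustrative filenames built from IDs that appear in the results table.
files = [
    "sim_advanced_trolley_orders_db_001.json",
    "sim_gen_redherring_auth_001.json",
]
print(select_by_pattern(files))
print(select_by_pattern(files, ("sim_advanced_*.json", "sim_gen_redherring_*.json")))
```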

Per-scenario performance

| ID | Title | Difficulty | Category | Mean reward |
| --- | --- | --- | --- | --- |
| sim_gen_redherring_auth_001 | Red-herring on Slack — true cause is leak in auth | medium | Generated · Red Herring | -8.280 |
| sim_gen_redherring_auth_002 | Red-herring on Slack — true cause is leak in auth | medium | Generated · Red Herring | -6.330 |
| sim_gen_redherring_auth_003 | Red-herring on Slack — true cause is leak in auth | medium | Generated · Red Herring | -6.630 |
| sim_gen_redherring_auth_004 | Red-herring on Slack — true cause is leak in auth | medium | Generated · Red Herring | -8.280 |
| sim_gen_redherring_catalog_009 | Red-herring on Slack — true cause is leak in catalog | medium | Generated · Red Herring | -5.880 |
| sim_gen_redherring_catalog_010 | Red-herring on Slack — true cause is leak in catalog | medium | Generated · Red Herring | -6.480 |
| sim_gen_redherring_catalog_011 | Red-herring on Slack — true cause is leak in catalog | medium | Generated · Red Herring | -8.280 |
| sim_gen_redherring_catalog_012 | Red-herring on Slack — true cause is leak in catalog | medium | Generated · Red Herring | -5.430 |
| sim_gen_redherring_checkout_005 | Red-herring on Slack — true cause is leak in checkout | medium | Generated · Red Herring | -6.480 |
| sim_gen_redherring_checkout_006 | Red-herring on Slack — true cause is leak in checkout | medium | Generated · Red Herring | -5.730 |
| sim_gen_redherring_checkout_007 | Red-herring on Slack — true cause is leak in checkout | medium | Generated · Red Herring | -4.830 |
| sim_gen_redherring_checkout_008 | Red-herring on Slack — true cause is leak in checkout | medium | Generated · Red Herring | -5.730 |
| sim_gen_redherring_inventory_017 | Red-herring on Slack — true cause is leak in inventory | medium | Generated · Red Herring | -6.630 |
| sim_gen_redherring_inventory_018 | Red-herring on Slack — true cause is leak in inventory | medium | Generated · Red Herring | -5.880 |
| sim_gen_redherring_inventory_019 | Red-herring on Slack — true cause is leak in inventory | medium | Generated · Red Herring | -4.230 |
| sim_gen_redherring_inventory_020 | Red-herring on Slack — true cause is leak in inventory | medium | Generated · Red Herring | -6.930 |
| sim_gen_redherring_payments_013 | Red-herring on Slack — true cause is leak in payments | medium | Generated · Red Herring | -6.180 |
| sim_gen_redherring_payments_014 | Red-herring on Slack — true cause is leak in payments | medium | Generated · Red Herring | -6.930 |
| sim_gen_redherring_payments_015 | Red-herring on Slack — true cause is leak in payments | medium | Generated · Red Herring | -8.280 |
| sim_gen_redherring_payments_016 | Red-herring on Slack — true cause is leak in payments | medium | Generated · Red Herring | -5.880 |
| sim_advanced_cascade_users_db_001 | Cascade: users_db memory-leak hides behind frontend 504s | hard | Cascading Failure | -7.230 |
| sim_advanced_runbook_trap_postgres_001 | Trap: TXID wraparound — restart corrupts the DB | hard | Runbook Trap | -7.380 |
| sim_advanced_saboteur_duel_001 | 1v1 Duel — Active Saboteur attacks auth_db, then choke-holds the replica | hard | Adversarial Saboteur | -5.805 |
| sim_advanced_slack_redherring_001 | Slack Red-Herring — A frontend dev claims their hotfix broke checkout | hard | Slack Red Herring | -5.655 |
| sim_advanced_trolley_orders_db_001 | Trolley: orders_db index corrupted — rebuild vs restore | hard | Trolley Problem | -6.180 |
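
To compare the medium and hard groups at a glance, the per-scenario rewards can be averaged by difficulty. This is a minimal sketch: the rows are copied from the table above, and `mean_by_difficulty` is a hypothetical helper rather than part of the showcase pipeline.

```python
from collections import defaultdict

# (scenario ID, difficulty, mean reward), copied from the table above.
ROWS = [
    ("sim_gen_redherring_auth_001", "medium", -8.280),
    ("sim_gen_redherring_auth_002", "medium", -6.330),
    ("sim_gen_redherring_auth_003", "medium", -6.630),
    ("sim_gen_redherring_auth_004", "medium", -8.280),
    ("sim_gen_redherring_catalog_009", "medium", -5.880),
    ("sim_gen_redherring_catalog_010", "medium", -6.480),
    ("sim_gen_redherring_catalog_011", "medium", -8.280),
    ("sim_gen_redherring_catalog_012", "medium", -5.430),
    ("sim_gen_redherring_checkout_005", "medium", -6.480),
    ("sim_gen_redherring_checkout_006", "medium", -5.730),
    ("sim_gen_redherring_checkout_007", "medium", -4.830),
    ("sim_gen_redherring_checkout_008", "medium", -5.730),
    ("sim_gen_redherring_inventory_017", "medium", -6.630),
    ("sim_gen_redherring_inventory_018", "medium", -5.880),
    ("sim_gen_redherring_inventory_019", "medium", -4.230),
    ("sim_gen_redherring_inventory_020", "medium", -6.930),
    ("sim_gen_redherring_payments_013", "medium", -6.180),
    ("sim_gen_redherring_payments_014", "medium", -6.930),
    ("sim_gen_redherring_payments_015", "medium", -8.280),
    ("sim_gen_redherring_payments_016", "medium", -5.880),
    ("sim_advanced_cascade_users_db_001", "hard", -7.230),
    ("sim_advanced_runbook_trap_postgres_001", "hard", -7.380),
    ("sim_advanced_saboteur_duel_001", "hard", -5.805),
    ("sim_advanced_slack_redherring_001", "hard", -5.655),
    ("sim_advanced_trolley_orders_db_001", "hard", -6.180),
]


def mean_by_difficulty(rows):
    """Average the reward column separately for each difficulty label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _scenario_id, difficulty, reward in rows:
        sums[difficulty] += reward
        counts[difficulty] += 1
    return {d: sums[d] / counts[d] for d in sums}


for difficulty, mean in sorted(mean_by_difficulty(ROWS).items()):
    print(f"{difficulty}: {mean:.3f}")
```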