PPO KAGGLE · 381 TASKS
These charts come from the 3-shard PPO + LoRA training run we did on free Kaggle T4 GPUs. Source data: the per-shard training files
kaggle ran notebooks/shard {1,2,3}/training_kaggle{N}.json and the 381 scenarios in
rl-agent/scenarios/sim/{easy,medium,hard}/*.json, all pre-bundled into
rl-agent/showcase_data.json by scripts/build_showcase_data.py.
Every visible number is computed from those files.
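To make the data flow concrete, here is a minimal sketch of what a bundling step like this could look like. It is hypothetical and is not the contents of scripts/build_showcase_data.py: it assumes that N in training_kaggle{N}.json equals the shard number, that every file is plain JSON, and that the bundle simply stores the loaded files under two made-up keys, "shards" and "scenarios".

```python
import glob
import json
import os

# Hypothetical sketch, NOT scripts/build_showcase_data.py.
# Assumptions: N in training_kaggle{N}.json equals the shard number, and the
# output keys "shards" and "scenarios" are illustrative names only.


def shard_paths(root: str) -> list[str]:
    # Expand "kaggle ran notebooks/shard {1,2,3}/training_kaggle{N}.json".
    return [
        os.path.join(root, "kaggle ran notebooks", f"shard {n}", f"training_kaggle{n}.json")
        for n in (1, 2, 3)
    ]


def scenario_paths(root: str) -> list[str]:
    # Expand rl-agent/scenarios/sim/{easy,medium,hard}/*.json, keeping the
    # tier order and sorting file names within each tier for stable output.
    paths: list[str] = []
    for tier in ("easy", "medium", "hard"):
        pattern = os.path.join(root, "rl-agent", "scenarios", "sim", tier, "*.json")
        paths.extend(sorted(glob.glob(pattern)))
    return paths


def load_json(path: str):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_bundle(root: str, out_path: str) -> dict:
    # Load every source file and write them into one JSON bundle.
    bundle = {
        "shards": [load_json(p) for p in shard_paths(root)],
        "scenarios": [load_json(p) for p in scenario_paths(root)],
    }
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(bundle, f)
    return bundle
```

With the real repository layout, len(bundle["scenarios"]) would be the 381 tasks in the title. Precomputing one bundle lets the page load a single file rather than fetching hundreds of scenario files in the browser.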
PPO training curves
Every PPO update from every shard. Source: kaggle ran notebooks/shard {1,2,3}/training_kaggle{N}.json.
Wall-clock per update (s)