PPO KAGGLE · 381 TASKS
These charts come from the 3-shard PPO + LoRA training we ran on free Kaggle T4 GPUs. Source data: the per-shard training files kaggle ran notebooks/shard {1,2,3}/training_kaggle{N}.json and the 381 scenarios in rl-agent/scenarios/sim/{easy,medium,hard}/*.json, pre-bundled into rl-agent/showcase_data.json by scripts/build_showcase_data.py. Every number shown is computed from those files.

PPO training curves

Each chart plots every PPO update from all three shards. Source: kaggle ran notebooks/shard {1,2,3}/training_kaggle{N}.json.
Updates / shard: 60
Rollouts / update: 3
Max steps / episode: 12
Total transitions: 6480
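The transition total matches the product of the run settings: 3 shards × 60 updates × 3 rollouts × 12 steps = 6480. A minimal sketch of that arithmetic, assuming (not confirmed by the source) that the figure counts every rollout at the full 12-step limit, which makes it an upper bound if episodes can end early. The function name `total_transitions` is hypothetical.

```python
# Hypothetical helper: assumes "Total transitions" = shards x updates x
# rollouts x max steps, i.e. every episode runs to the 12-step cap.
SHARDS = 3
UPDATES_PER_SHARD = 60
ROLLOUTS_PER_UPDATE = 3
MAX_STEPS_PER_EPISODE = 12


def total_transitions(shards: int, updates: int, rollouts: int, max_steps: int) -> int:
    """Upper bound on environment steps collected across all shards."""
    return shards * updates * rollouts * max_steps


total = total_transitions(SHARDS, UPDATES_PER_SHARD, ROLLOUTS_PER_UPDATE, MAX_STEPS_PER_EPISODE)
print(total)  # 6480
```

A single shard contributes a third of that, 2160 transitions.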

Charts: mean reward, PPO loss, KL divergence, value error, policy loss, wall-clock time per update (s).
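The loss curves track standard PPO quantities. A minimal sketch of how they are commonly computed for one batch, assuming the textbook clipped-surrogate objective with clip_eps=0.2, a value coefficient of 0.5, no entropy bonus, and the simple sample estimate of KL(old ‖ new). These coefficients, the name `ppo_losses`, and the exact loss weighting are assumptions for illustration, not values read from the training notebooks.

```python
import math


def ppo_losses(new_logp, old_logp, advantages, values, returns,
               clip_eps=0.2, value_coef=0.5):
    """Hypothetical per-batch PPO metrics (clipped surrogate, no entropy term).

    new_logp / old_logp: log-probs of the taken actions under the current and
    rollout policies; values / returns: critic predictions and their targets.
    """
    n = len(new_logp)
    policy_terms, kl_terms = [], []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Pessimistic (min) surrogate, negated so it can be minimized.
        policy_terms.append(-min(ratio * adv, clipped * adv))
        # Sample estimate of KL(old || new) for actions drawn from the old policy.
        kl_terms.append(ol - nl)
    policy_loss = sum(policy_terms) / n
    value_error = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    approx_kl = sum(kl_terms) / n
    ppo_loss = policy_loss + value_coef * value_error  # assumed weighting
    return {"ppo_loss": ppo_loss, "policy_loss": policy_loss,
            "value_error": value_error, "kl": approx_kl}


metrics = ppo_losses(new_logp=[0.0, 0.0], old_logp=[0.0, 0.0],
                     advantages=[1.0, -1.0], values=[0.5, 0.5], returns=[1.0, 0.0])
print(metrics)
```

When the policy has not moved (identical log-probs), the KL estimate is zero and the ratio is 1, so the policy loss reduces to the negated mean advantage; a ratio far outside [0.8, 1.2] is clipped, which is why a spike in KL usually shows up before the policy loss stops improving.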