RiffOn - How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

Cursor and Fireworks on training Composer2 with distributed RL: asynchronous training, global clusters, and MoE numerical-mismatch challenges.

Asynchronous RL Sacrifices Algorithmic Purity for Massive GPU Utilization Gains

Cursor and Fireworks intentionally use an asynchronous RL setup in which the model generating experiences can lag slightly behind the model being trained. This "staleness" is an accepted trade-off: training on slightly out-of-date samples costs some algorithmic efficiency, but it keeps expensive GPUs constantly working, and the higher overall throughput more than compensates.
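
To make "staleness" concrete, here is a minimal toy sketch in Python. It is not Cursor's or Fireworks' implementation. The function name `simulate_staleness` and the `sync_every` parameter are hypothetical, and the model assumes the trainer takes one optimizer step per rollout while the sampler receives fresh weights only every `sync_every` steps. Staleness is counted as the number of optimizer steps between the weights that generated a rollout and the weights being trained on it.

```python
def simulate_staleness(num_steps: int, sync_every: int) -> list[int]:
    """Return the staleness of each rollout consumed by the trainer.

    Toy model (assumption): one rollout and one optimizer step per iteration;
    the sampler's weights are refreshed only every `sync_every` steps.
    """
    trainer_version = 0  # incremented after every optimizer step
    sampler_version = 0  # version of the weights the sampler currently holds
    staleness = []

    for _ in range(num_steps):
        # The sampler never waits: it generates with whatever weights it has.
        rollout_version = sampler_version

        # The trainer consumes the rollout, which may come from older weights.
        staleness.append(trainer_version - rollout_version)
        trainer_version += 1

        # Weights reach the sampler only at periodic sync points.
        if trainer_version % sync_every == 0:
            sampler_version = trainer_version

    return staleness


if __name__ == "__main__":
    print(simulate_staleness(num_steps=6, sync_every=3))  # [0, 1, 2, 0, 1, 2]
    print(simulate_staleness(num_steps=4, sync_every=1))  # [0, 0, 0, 0]
```

With `sync_every=1` the loop is fully synchronous and staleness is always zero, but in a real system the sampler would have to pause for every weight update. Larger values of `sync_every` let generation and training overlap, and in this toy model staleness stays bounded by `sync_every - 1`.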