ZeRO-Offload: Training Multi-Billion Parameter Models on a Single GPU
Researchers from the University of California, Merced and Microsoft have introduced ZeRO-Offload, a heterogeneous deep learning training technology that enables training of multi-billion-parameter models on a single GPU without any model refactoring.
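In practice, ZeRO-Offload is exposed through DeepSpeed's configuration rather than changes to model code: optimizer states and gradient work are offloaded to CPU memory while the GPU runs the forward and backward passes. Below is a minimal sketch of such a configuration; the batch size and precision settings are illustrative assumptions, not values from the announcement.

```python
# Sketch of a DeepSpeed config enabling ZeRO-Offload (CPU optimizer offload).
# Batch size and fp16 settings here are illustrative assumptions.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                # partition optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",       # keep optimizer states in CPU memory
            "pin_memory": True,    # pinned host memory for faster transfers
        },
    },
}

# The config is then passed to DeepSpeed's engine initialization, e.g.:
#   model_engine, optimizer, _, _ = deepspeed.initialize(
#       model=model, model_parameters=model.parameters(), config=ds_config)
print(ds_config["zero_optimization"]["offload_optimizer"]["device"])
```

Because the offload is driven entirely by this configuration, existing training scripts keep their model definitions unchanged, which is what "without any model refactoring" refers to.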