Introducing Alpa: A Compiler Architecture for Automated Model-Parallel Distributed Training That Outperforms Hand-Tuned Strategies
A research team from UC Berkeley, Amazon Web Services, Google, Shanghai Jiao Tong University, and Duke University proposes Alpa, a compiler system for distributed deep learning on GPU clusters. Alpa automatically generates parallelization plans that match or outperform hand-tuned model-parallel training systems, even on the models those systems were designed for.
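To make the idea of an automatically generated parallelization plan concrete, the sketch below shows roughly how such a system can be driven from ordinary JAX code. It is an illustrative example only, assuming the open-source alpa package (with its parallelize decorator, alpa.grad, and Ray-backed cluster initialization), Flax, and Optax; the toy model, shapes, and hyperparameters are placeholders and are not taken from the article.

```python
# Minimal sketch: training step written as ordinary single-device JAX code;
# the compiler, not the user, decides how to distribute it.
# Assumptions: alpa, flax, optax installed; a GPU cluster reachable via Ray.
import alpa
import jax
import jax.numpy as jnp
import optax
from flax import linen as nn
from flax.training import train_state

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(1024)(x)
        x = nn.relu(x)
        return nn.Dense(1024)(x)

def create_state(rng):
    model = MLP()
    params = model.init(rng, jnp.ones((1, 1024)))["params"]
    return train_state.TrainState.create(
        apply_fn=model.apply, params=params, tx=optax.adam(1e-3))

@alpa.parallelize
def train_step(state, batch):
    # Alpa traces this step function and compiles a distributed execution
    # plan (intra- and inter-operator parallelism) for the cluster.
    def loss_fn(params):
        out = state.apply_fn({"params": params}, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)
    grads = alpa.grad(loss_fn)(state.params)
    return state.apply_gradients(grads=grads)

alpa.init(cluster="ray")          # connect to the GPU cluster (assumed setup)
state = create_state(jax.random.PRNGKey(0))
batch = {"x": jnp.ones((64, 1024)), "y": jnp.ones((64, 1024))}
state = train_step(state, batch)  # runs the auto-parallelized step
```

The point of the sketch is that train_step stays an ordinary JAX function; under the stated assumptions, the compiler is the component that decides how to shard and pipeline the computation across devices, which is the behavior the headline claim refers to.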