Mixture of experts

by Synced 2024-07-18 8

Revolutionizing Transformers: DeepMind’s PEER Layer and the Power of a Million Experts

A DeepMind research team introduces PEER, a innovative layer design leverages the product key technique for sparse retrieval from an extensive pool of tiny experts (over a million), which unlocks the potential for further scaling transformer models while maintaining computational efficiency.

by Synced 2023-10-30 1

AI Machine Learning & Data Science Research

MoE: Revolutionizing Memory-Efficient Execution of Massive-Scale MoE Models

A research team from Institute of Science and Technology Austria (ISTA) and Neural Magic Inc. introduces the QMoE framework. This innovative framework offers an effective solution for accurately compressing massive MoEs and conducting swift compressed inference, reducing model sizes by 10–20×, achieving less than 1 bit per parameter.

by Synced 2022-01-18 0

AI Machine Learning & Data Science Research

Microsoft’s DeepSpeed-MoE Makes Massive MoE Model Inference up to 4.5x Faster and 9x Cheaper

A Microsoft research team proposes DeepSpeed-MoE, comprising a novel MoE architecture design and model compression technique that reduces MoE model size by up to 3.7x and a highly optimized inference system that provides 7.3x better latency and cost compared to existing MoE inference solutions.