Tag: Mixture of experts

QMoE: Revolutionizing Memory-Efficient Execution of Massive-Scale MoE Models

A research team from the Institute of Science and Technology Austria (ISTA) and Neural Magic Inc. introduces QMoE, a framework that accurately compresses massive MoE models and runs fast inference directly on the compressed representation, reducing model sizes by 10–20× to less than 1 bit per parameter.
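
For a sense of scale, the back-of-the-envelope sketch below shows what those figures imply. The 1.6-trillion-parameter count, the 16-bit uncompressed baseline, and the 0.8 bits-per-parameter compressed rate are illustrative assumptions, not numbers stated above; only the 10–20× reduction and the less-than-1-bit-per-parameter claim come from the article.

```python
# Back-of-the-envelope arithmetic behind "less than 1 bit per parameter".
# Parameter count and exact bit rates are illustrative assumptions.

def model_size_bytes(num_params: float, bits_per_param: float) -> float:
    """Storage needed for num_params weights at bits_per_param bits each."""
    return num_params * bits_per_param / 8

num_params = 1.6e12                              # hypothetical trillion-scale MoE
fp16_size = model_size_bytes(num_params, 16)     # assumed 16-bit uncompressed baseline
qmoe_size = model_size_bytes(num_params, 0.8)    # assumed sub-1-bit compressed rate

print(f"16-bit baseline : {fp16_size / 1e12:.1f} TB")
print(f"sub-1-bit       : {qmoe_size / 1e9:.0f} GB")
print(f"compression     : {fp16_size / qmoe_size:.0f}x")
```

Under these assumptions the script prints a 3.2 TB baseline, a 160 GB compressed size, and a 20× ratio, at the upper end of the quoted 10–20× range.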