Tag: Mixture of Experts

AI Machine Learning & Data Science Research

Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI

Researchers from the University of Texas at Austin and NVIDIA propose an upcycling approach, an innovative training recipe that enables the development of an 8-Expert Top-2 MoE model from Llama 3-8B with less than 1% of the compute typically required for pre-training.
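
The core idea behind upcycling is to reuse the pre-trained dense model's feed-forward weights to initialize every expert, then add a learned router that sends each token to its two highest-scoring experts. The sketch below is a minimal illustration of this pattern in PyTorch, not the authors' implementation: the class name Top2MoE, the toy dimensions, and the loop-based dispatch are assumptions made for clarity.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoE(nn.Module):
    """Illustrative Top-2 mixture-of-experts feed-forward layer.

    Every expert starts as a copy of the dense model's FFN ("upcycling"),
    and a learned router picks the 2 best experts per token.
    """

    def __init__(self, dense_ffn: nn.Module, num_experts: int, d_model: int):
        super().__init__()
        # Upcycling: initialize each expert from the pre-trained dense FFN
        # so training starts from the dense model's existing knowledge.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                    # (num_tokens, num_experts)
        weights, idx = logits.topk(2, dim=-1)      # top-2 experts per token
        weights = F.softmax(weights, dim=-1)       # normalize the 2 scores
        out = torch.zeros_like(x)
        # Naive dispatch loop for readability; real systems batch by expert.
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out


# Example with toy sizes (Llama 3-8B itself uses d_model = 4096):
d_model = 64
dense_ffn = nn.Sequential(
    nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model)
)
moe = Top2MoE(dense_ffn, num_experts=8, d_model=d_model)
tokens = torch.randn(10, d_model)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Because the experts and the rest of the network begin from pre-trained weights rather than random initialization, only the router and the experts' fine-tuning need fresh compute, which is how the recipe stays far below the cost of pre-training from scratch.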