Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI
Researchers from the University of Texas at Austin and NVIDIA propose an upcycling approach, an innovative training recipe that enables the development of an 8-Expert Top-2 MoE model from Llama 3-8B with less than 1% of the compute typically required for pre-training.
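In an upcycled MoE, the dense model's feed-forward weights seed each expert, and a learned router sends every token through its top-2 of the 8 experts, mixing their outputs by the normalized router scores. A minimal NumPy sketch of that top-2 routing step (function names, shapes, and the linear "experts" are illustrative assumptions, not the paper's code):

```python
import numpy as np

def top2_moe_layer(x, gate_w, experts):
    """Route each token to its top-2 experts and mix their outputs.

    x:       (tokens, d_model) token representations
    gate_w:  (d_model, n_experts) router weight matrix (illustrative)
    experts: list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top2[t]]
        probs = np.exp(sel - sel.max())
        probs /= probs.sum()                     # softmax over only the 2 selected scores
        for w, e in zip(probs, top2[t]):
            out[t] += w * experts[e](x[t])       # weighted mix of expert outputs
    return out

# Toy run: 8 random linear "experts", matching the 8-Expert Top-2 setup
rng = np.random.default_rng(0)
d, n_exp = 4, 8
experts = [(lambda W: (lambda v: W @ v))(rng.standard_normal((d, d)))
           for _ in range(n_exp)]
gate_w = rng.standard_normal((d, n_exp))
x = rng.standard_normal((3, d))
y = top2_moe_layer(x, gate_w, experts)
print(y.shape)  # (3, 4)
```

Because only 2 of the 8 experts run per token, the per-token compute stays close to the dense model's while total parameter count grows, which is what makes the recipe cheap relative to pre-training an equally large dense model.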

