Tag: model pruning

AI Machine Learning & Data Science Research

NVIDIA’s Minitron: Compressing Llama 3.1 and Mistral NeMo for Superior Performance in 4B and 8B Models

In a new paper LLM Pruning and Distillation in Practice: The Minitron Approach, an NVIDIA research team presents the Minitron compression strategy, which effectively produces a robust 4B model from Llama 3.1 8B and a cutting-edge Mistral-NeMo-Minitron-8B model derived from Mistral NeMo 12B.
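The Minitron recipe combines structured pruning (shrinking width/depth of the teacher) with knowledge distillation into the smaller student. The sketch below is a hypothetical, simplified illustration of those two ingredients, not NVIDIA's actual pipeline: it width-prunes a toy linear layer by a simple neuron-importance score and computes a standard temperature-softened distillation loss. The function names and the L2-norm importance proxy are assumptions made for illustration.

```python
import numpy as np

def prune_neurons(W, keep):
    """Width-prune a linear layer: keep the `keep` output neurons with the
    largest L2 norm (a simple stand-in for an importance estimate)."""
    importance = np.linalg.norm(W, axis=1)          # one score per output neuron
    kept = np.sort(np.argsort(importance)[-keep:])  # indices of surviving neurons
    return W[kept], kept

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student outputs,
    the usual knowledge-distillation objective."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Toy example: prune a layer with 8 output neurons down to 4 (2x compression),
# then the pruned student would be trained to match the teacher's logits.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
W_small, kept = prune_neurons(W, keep=4)
assert W_small.shape == (4, 4)
```

In the real Minitron setting, importance is estimated from activations on a calibration set and the pruned model is retrained with distillation; the sketch only shows the shape of the two steps.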

AI Machine Learning & Data Science Research

Gem-Miner: Finding Lottery Tickets at Initialization, Beating All Baselines up to 19x Faster

In the new paper Rare Gems: Finding Lottery Tickets at Initialization, a research team from Carnegie Mellon University, MBZUAI, Petuum, Inc. and the University of Wisconsin-Madison proposes GEM-MINER, an algorithm that finds sparse subnetworks at initialization that are trainable to accuracy comparable to or better than that of iterative magnitude pruning (IMP) with warm-up.
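The core object GEM-MINER produces is a binary mask over the weights at initialization: the surviving subnetwork is then trained from its original init values. The paper learns this mask by optimizing gate scores; the sketch below is a deliberately simplified, hypothetical stand-in that builds a mask from weight magnitudes at init, just to show what "a sparse subnetwork at initialization" means concretely. The function name and magnitude criterion are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def mask_at_init(weights, sparsity):
    """Return a binary mask keeping the largest-magnitude weights at
    initialization (a toy stand-in for GEM-MINER's learned gates)."""
    flat = np.abs(weights).ravel()
    k = int(round((1.0 - sparsity) * flat.size))  # number of weights to keep
    threshold = np.sort(flat)[-k]                 # k-th largest magnitude
    return (np.abs(weights) >= threshold).astype(weights.dtype)

rng = np.random.default_rng(1)
W0 = rng.normal(size=(16, 16))         # weights at initialization, before any training
mask = mask_at_init(W0, sparsity=0.9)  # keep only 10% of the weights
sparse_W0 = W0 * mask                  # the subnetwork is trained from these init values
```

Unlike IMP, which trains, prunes, and rewinds repeatedly, a mask found directly at initialization avoids the expensive train-prune cycles, which is where the reported speedup over IMP comes from.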