Model Compression

by Synced 2022-06-09

Microsoft’s XTC Extreme Lightweight Compression Method for Pretrained Transformers Achieves SOTA Results and 50x Smaller Model Sizes

In the new paper “Extreme Compression for Pre-trained Transformers Made Simple and Efficient,” a Microsoft research team introduces XTC, a simple yet effective extreme compression pipeline for pretrained transformers that can achieve state-of-the-art results while reducing model size by 50x.

by Synced 2021-11-18

AI Machine Learning & Data Science Research

Intel’s Prune Once for All Compression Method Achieves SOTA Compression-to-Accuracy Results on BERT

An Intel research team presents Prune Once for All (Prune OFA), a training method that leverages weight pruning and model distillation to produce pretrained transformer-based language models with high sparsity ratios. Applied to BERT, the approach achieves state-of-the-art results in compression-to-accuracy ratio.

by Synced 2021-07-22

AI Machine Learning & Data Science Research

Only Train Once: SOTA One-Shot DNN Training and Pruning Framework

A research team from Microsoft, Zhejiang University, Johns Hopkins University, Georgia Institute of Technology and University of Denver proposes Only-Train-Once (OTO), a one-shot DNN training and pruning framework that produces a slim architecture from a full heavy model without fine-tuning while maintaining high performance.

by Synced 2021-06-17

AI Machine Learning & Data Science Research

Does Knowledge Distillation Really Work? NYU & Google Study Provides Insights on Student Model Fidelity

A research team from New York University and Google Research explores whether knowledge distillation really works, showing that a surprisingly large discrepancy often remains between the predictive distributions of the teacher and student models, even when the student has the capacity to perfectly match the teacher.

by Synced 2021-06-01

AI Machine Learning & Data Science Research

Georgia Tech & Microsoft Reveal ‘Super Tickets’ in Pretrained Language Models: Improving Model Compression and Generalization

A research team from Georgia Tech, Microsoft Research and Microsoft Azure AI studies the collections of “lottery tickets” in extremely over-parametrized models, revealing the generalization performance pattern of winning tickets and proving the existence of “super tickets.”

by Synced 2018-09-14

Research

MIT & Google Propose AutoML for Model Compression

Researchers from MIT, Google, and Xi’an Jiaotong University recently published a paper proposing AutoML for Model Compression (AMC), which leverages reinforcement learning to reduce the time needed for model compression and improve results.