Pretrained-Model

by Synced 2023-07-18

65-Billion-Parameter Large Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-like Base Models Open-Source

Colossal-AI, the world’s largest and most active big model development tool and community, uses LLaMA, currently the most widely used large model, to demonstrate its pretraining solution for 65-billion-parameter models, which improves training speed by 38%.

by Synced 2023-01-19

AI Machine Learning & Data Science Research

BERT-Style Pretraining on Convnets? Peking U, ByteDance & Oxford U’s Sparse Masked Modelling With Hierarchy Leads the Way

In the new paper Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling, a research team from Peking University, ByteDance, and the University of Oxford presents Sparse Masked Modelling with Hierarchy (SparK), the first BERT-style pretraining approach that can be used on convolutional models without any backbone modifications.

by Synced 2022-11-09

AI Machine Learning & Data Science Research

Diffusion Pretraining and Hardware Fine-Tuning Can Be Almost 7X Cheaper! Colossal-AI’s Open Source Solution Accelerates AIGC at a Low Cost

Colossal-AI releases a complete open-source Stable Diffusion pretraining and fine-tuning solution that reduces the pretraining cost by 6.5 times, and the hardware cost of fine-tuning by 7 times, while simultaneously speeding up the processes! The fine-tuning task flow can also be conveniently completed on an RTX 2070/3050 PC.

by Synced 2022-09-08

AI Machine Learning & Data Science Nature Language Tech Research

Using State-Of-The-Art AI Models for Free: Try OPT-175B on Your Cellphone and Laptop

Colossal-AI, a unified deep learning system for the big model era, can efficiently and rapidly deploy large AI model training and inference with just a few lines of code, and promote the low-cost application and implementation of big models.

by Synced 2022-04-28

AI Machine Learning & Data Science Nature Language Tech Research

Adobe’s UDoc Captures Cross-Modal Correlations in a Unified Pretraining Framework to Improve Document Understanding

In the new paper Unified Pretraining Framework for Document Understanding, an Adobe Research and Adobe Document Cloud team presents a unified pretraining framework for document understanding that enables cross-modal connections and highlights relevant information in both visual and textual modalities. UDoc achieves impressive performance on various downstream tasks.

by Synced 2022-01-10

AI Machine Learning & Data Science Research

Counterfactual Memorization in Language Models: Distinguishing Rare from Common Memorization

A team from Google Research, University of Pennsylvania and Cornell University proposes a principled perspective to filter out common memorization for LMs, introducing “counterfactual memorization” to measure the expected change in a model’s prediction and distinguish “rare” (episodic) memorization from “common” (semantic) memorization in neural LMs.

by Synced 2022-01-07

AI Machine Learning & Data Science Research

Baidu’s 10-Billion Scale ERNIE-ViLG Unified Generative Pretraining Framework Achieves SOTA Performance on Bidirectional Vision-Language Generation Tasks

Baidu researchers propose ERNIE-ViLG, a 10-billion parameter scale pretraining framework for bidirectional text-image generation. Pretrained on 145 million (Chinese) image-text pairs, ERNIE-ViLG achieves state-of-the-art performance on both text-to-image and image-to-text generation tasks.

by Synced 2021-11-29

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft’s ‘Florence’ General-Purpose Foundation Model Achieves SOTA Results on Dozens of CV Benchmarks

In the paper A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new SOTA results across a wide range of visual and visual-linguistic benchmarks.

by Synced 2021-11-23

AI Machine Learning & Data Science Research

Microsoft’s DeBERTaV3 Uses ELECTRA-Style Pretraining With Gradient-Disentangled Embedding Sharing to Boost DeBERTa Performance on NLU Tasks

Microsoft releases DeBERTaV3, improving the original DeBERTa model using ELECTRA-style pretraining with gradient-disentangled embedding sharing to achieve better pretraining efficiency and a significant performance jump.

by Synced 2021-11-22

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft Asia’s Swin Transformer V2 Scales the Award-Winning ViT to 3 Billion Parameters and Achieves SOTA Performance on Vision Benchmarks

Microsoft Research Asia has upgraded its Swin Transformer with a new version that scales to three billion parameters, can be trained on images with resolutions up to 1,536 x 1,536, and advances the SOTA on four representative vision benchmarks.

by Synced 2021-10-14

AI Machine Learning & Data Science Research

Google Researchers Explore the Limits of Large-Scale Model Pretraining

A Google Research team conducts a systematic exploration comprising more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with parameters ranging from 10 million to 10 billion, evaluated on more than 20 downstream image recognition tasks, aiming to capture the nonlinear relationships between performance on upstream and downstream tasks.

by Synced 2021-10-04

AI Computer Vision & Graphics Machine Learning & Data Science Research

Debiasing Image Datasets: Oxford University Presents PASS, an ImageNet Replacement for Self-Supervised Pretraining

An Oxford University research team presents PASS, a large (1.28M) image collection excluding humans, created as an ImageNet replacement for self-supervised pretraining without technical, ethical or legal issues.

by Synced 2021-08-19

AI Machine Learning & Data Science Research

100+ Stanford Researchers Publish 200+ Page Paper on the AI Paradigm Shift Introduced by Large-Scale Models

In a 200+ page paper, Percy Liang, Fei-Fei Li, and over 100 other researchers from the Stanford University Center for Research on Foundation Models (CRFM) systematically describe the opportunities and risks of large-scale pretrained “foundation” models. The unique study aims to provide a clearer understanding of how these models work, when and how they fail, and the various capabilities provided by their emergent properties.

by Synced 2021-06-14

AI Machine Learning & Data Science Nature Language Tech Research

Google Researchers Merge Pretrained Teacher LMs Into a Single Multilingual Student LM Via Knowledge Distillation

A Google Research team proposes MergeDistill, a framework for merging multiple pretrained monolingual/multilingual teacher LMs into a single multilingual task-agnostic student LM, leveraging the capabilities of powerful language-specific LMs while remaining multilingual and enabling positive language transfer.

by Synced 2021-06-01

AI Machine Learning & Data Science Research

Georgia Tech & Microsoft Reveal ‘Super Tickets’ in Pretrained Language Models: Improving Model Compression and Generalization

A research team from Georgia Tech, Microsoft Research and Microsoft Azure AI studies the collections of “lottery tickets” in extremely over-parametrized models, revealing the generalization performance pattern of winning tickets and proving the existence of “super tickets.”

by Synced 2021-05-18

AI Machine Learning & Data Science Research

Facebook Transfer Learning Method Boosts Code Autocompletion Accuracy by Over 50%

A research team from Facebook shows how transfer learning enables pretraining on non-IDE, non-autocompletion and different-language example code sequences before fine-tuning on the autocompletion prediction task. The approach improves model accuracy by over 50 percent on very small fine-tuning datasets and by over 10 percent on 50k labelled examples.

by Synced 2021-05-04

AI Machine Learning & Data Science Research

Huawei & Tsinghua U Method Boosts Task-Agnostic BERT Distillation Efficiency by Reusing Teacher Model Parameters

A research team from Huawei Noah’s Ark Lab and Tsinghua University proposes Extract Then Distill (ETD), a generic and flexible strategy for reusing teacher model parameters for efficient and effective task-agnostic distillation that can be applied to student models of any size.

by Synced 2021-04-27

AI Machine Learning & Data Science Nature Language Tech Research

Microsoft & Peking U Researchers Identify ‘Knowledge Neurons’ in Pretrained Transformers, Enabling Fact Editing

A research team from Microsoft Research and Peking University looks inside pretrained transformers to investigate how factual knowledge is stored, proposing a method to identify “knowledge neurons,” which can be used to explicitly update and erase facts.