Tag: Pretrained-Model

AI Machine Learning & Data Science Research

65-Billion-Parameter Large Model Pretraining Accelerated by 38%: Best Practices for Building LLaMA-Like Base Models Open-Sourced

Colossal-AI, the world’s largest and most active big model development tool and community, uses LLaMA, currently the most widely used large model, to demonstrate its pretraining solution for the 65-billion-parameter model, which improves training speed by 38%.

AI Machine Learning & Data Science Research

BERT-Style Pretraining on Convnets? Peking U, ByteDance & Oxford U’s Sparse Masked Modelling With Hierarchy Leads the Way

In the new paper Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling, a research team from Peking University, ByteDance, and the University of Oxford presents Sparse Masked Modelling with Hierarchy (SparK), the first BERT-style pretraining approach that can be used on convolutional models without any backbone modifications.

AI Machine Learning & Data Science Research

Diffusion Pretraining and Hardware Fine-Tuning Can Be Almost 7X Cheaper! Colossal-AI’s Open Source Solution Accelerates AIGC at a Low Cost

Colossal-AI releases a complete open-source Stable Diffusion pretraining and fine-tuning solution that reduces pretraining cost by a factor of 6.5 and the hardware cost of fine-tuning by a factor of 7, while also speeding up both processes. The fine-tuning workflow can also be run on a PC with an RTX 2070/3050 GPU.

AI Machine Learning & Data Science Nature Language Tech Research

Adobe’s UDoc Captures Cross-Modal Correlations in a Unified Pretraining Framework to Improve Document Understanding

In the new paper Unified Pretraining Framework for Document Understanding, an Adobe Research and Adobe Document Cloud team presents UDoc, a unified pretraining framework for document understanding that enables cross-modal connections and highlights relevant information in both the visual and textual modalities. UDoc achieves impressive performance on various downstream tasks.

AI Machine Learning & Data Science Research

Counterfactual Memorization in Language Models: Distinguishing Rare from Common Memorization

A team from Google Research, the University of Pennsylvania, and Cornell University proposes a principled way to filter out common memorization in LMs. The team introduces “counterfactual memorization,” which measures the expected change in a model’s prediction when a given training example is omitted, to distinguish “rare” (episodic) memorization from “common” (semantic) memorization in neural LMs.

AI Machine Learning & Data Science Research

Baidu’s 10-Billion Scale ERNIE-ViLG Unified Generative Pretraining Framework Achieves SOTA Performance on Bidirectional Vision-Language Generation Tasks

Baidu researchers propose ERNIE-ViLG, a 10-billion parameter scale pretraining framework for bidirectional text-image generation. Pretrained on 145 million (Chinese) image-text pairs, ERNIE-ViLG achieves state-of-the-art performance on both text-to-image and image-to-text generation tasks.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft’s ‘Florence’ General-Purpose Foundation Model Achieves SOTA Results on Dozens of CV Benchmarks

In the paper A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new SOTA results across a wide range of visual and visual-linguistic benchmarks.

AI Machine Learning & Data Science Research

100+ Stanford Researchers Publish 200+ Page Paper on the AI Paradigm Shift Introduced by Large-Scale Models

In a 200+ page paper, Percy Liang, Fei-Fei Li, and over 100 other researchers from the Stanford University Center for Research on Foundation Models (CRFM) systematically describe the opportunities and risks of large-scale pretrained “foundation” models. The unique study aims to provide a clearer understanding of how these models work, when and how they fail, and the various capabilities provided by their emergent properties.

AI Machine Learning & Data Science Nature Language Tech Research

Google Researchers Merge Pretrained Teacher LMs Into a Single Multilingual Student LM Via Knowledge Distillation

A Google Research team proposes MergeDistill, a framework for merging multiple pretrained monolingual or multilingual teacher LMs into a single multilingual, task-agnostic student LM that leverages the capabilities of powerful language-specific LMs while remaining multilingual and enabling positive language transfer.

AI Machine Learning & Data Science Research

Facebook Transfer Learning Method Boosts Code Autocompletion Accuracy by Over 50%

A research team from Facebook shows how transfer learning enables pretraining on non-IDE, non-autocompletion, and different-language example code sequences before fine-tuning on the autocompletion prediction task, improving model accuracy by over 50 percent on very small fine-tuning datasets and by over 10 percent on 50k labelled examples.