AI Machine Learning & Data Science Research

Breaking LLMs’ Limits: Upstage AI’s SOLAR 10.7B Shines Bright with Simple Scaling Magic

In the realm of language modeling, recent strides have enabled the creation of large language models (LLMs) with billions of parameters, trained on extensive text corpora and delivering exceptional performance. These advances, however, bring challenges of their own, chiefly the need to keep increasing model size in line with the performance scaling law.

In response to this challenge, in a new paper SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling, an Upstage AI research team introduces depth up-scaling (DUS), a simple yet effective technique for scaling up LLMs; the resulting SOLAR 10.7B surpasses existing open-source state-of-the-art LLMs such as Llama 2 and Mistral 7B.

The team contends that, to achieve a superior performance-to-size ratio, the widely used 7B-sized LLMs should sit near the Pareto-optimal curve. They therefore advocate scaling up these 7B LLMs while leveraging pre-trained weights from base models for a more resource-efficient expansion.

To implement DUS, the team selects a high-performing base model and applies the new method to produce a scaled-up model from its pre-trained weights. Remarkably, a DUS-scaled model integrates seamlessly into the existing training and inference frameworks of base LLMs while retaining high efficiency and efficacy.

The DUS process begins by duplicating the base model, a 32-layer Llama 2 architecture initialized with Mistral 7B pre-trained weights. The team then trims the last 8 layers from the original copy and the first 8 layers from the duplicate, and concatenates the two 24-layer halves. The result is a depth up-scaled model, SOLAR 10.7B, with 48 layers and 10.7 billion parameters, as sketched below.
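To make the layer surgery concrete, here is a minimal sketch in Python. It assumes Hugging Face Transformers' Llama/Mistral-style classes, which expose their decoder blocks as model.model.layers; the constant N_TRIM is illustrative, and a model produced this way would still need the continued pretraining step the paper applies before it performs well.

```python
# Minimal sketch of depth up-scaling (DUS): duplicate a 32-layer base,
# drop the last 8 layers of one copy and the first 8 of the other,
# then stack the two 24-layer halves into a 48-layer model.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

N_TRIM = 8  # layers removed from each copy (the paper's setting)

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
n_layers = len(base.model.layers)  # 32 decoder blocks for Mistral 7B

# Slicing an nn.ModuleList returns a new nn.ModuleList; deepcopy keeps
# the two halves' weights independent for later continued pretraining.
first_half = copy.deepcopy(base.model.layers[: n_layers - N_TRIM])  # layers 0-23
second_half = copy.deepcopy(base.model.layers[N_TRIM:])             # layers 8-31

base.model.layers = nn.ModuleList(list(first_half) + list(second_half))
base.config.num_hidden_layers = len(base.model.layers)  # now 48

base.save_pretrained("solar-style-depth-upscaled")  # roughly 10.7B parameters
```

Depending on the Transformers version, each decoder block also carries a self_attn.layer_idx used for cache bookkeeping during generation, which would need renumbering after the surgery; the sketch omits that detail.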

In their empirical evaluation, the team pits SOLAR 10.7B against other top-performing models on six benchmark tasks. Notably, SOLAR 10.7B surpasses similarly sized pretrained models such as Qwen 14B and Mistral 7B, demonstrating the efficacy of DUS for up-scaling base LLMs. Despite its smaller size, SOLAR 10.7B-Instruct achieves the highest H6 score (the average over the six tasks), even outperforming leading open-source LLMs such as Mixtral 8x7B Instruct v0.1 and Qwen 72B.

To foster collaboration and innovation in natural language processing (NLP), the Upstage AI team releases SOLAR 10.7B under the Apache 2.0 license. This open-source approach enables broader accessibility and application of these models by researchers and developers worldwide.

The paper SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling is on arXiv.


Author: Hecate He | Editor: Chain Zhang


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
