Meta AI’s Llama 2: Open-Sourced LLM with Commercial Rights Reshapes Industry

In a new paper, Llama 2: Open Foundation and Fine-Tuned Chat Models, a Meta AI research team presents and releases Llama 2, a family of pretrained and fine-tuned LLMs, and Llama 2-Chat, a fine-tuned version of Llama 2 optimized for dialogue, paving the way for more responsible LLM development.

Large Language Models (LLMs) have become a cornerstone of modern deep learning, demonstrating an impressive capacity for complex reasoning tasks. Their ability to interact with humans through intuitive chat interfaces has led to their widespread adoption as chatbots by the general public.

However, many existing LLMs require extensive fine-tuning to align with human preferences, a process that is computationally expensive and demands significant manual effort. Furthermore, this process is often opaque and difficult to reproduce, which hinders progress in AI alignment research.

Addressing these challenges, a Meta AI research team introduces and open-sources Llama 2 and Llama 2-Chat in the paper “Llama 2: Open Foundation and Fine-Tuned Chat Models.” The former is a suite of pretrained and fine-tuned LLMs, while the latter is a dialogue-optimized version of Llama 2. Crucially, both models are released under a license that permits commercial use, a significant step toward greater transparency and the development of more responsible, reproducible LLMs.

Both Llama 2 and Llama 2-Chat come in 7B, 13B, and 70B parameter variants. For pretraining, the team starts from an optimized auto-regressive transformer with several modifications. Compared to Llama 1, they performed more robust data cleaning, updated the data mixes, trained on 40% more total tokens, doubled the context length (from 2,048 to 4,096 tokens), and adopted grouped-query attention (GQA) in the larger models to improve inference scalability.
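To make the GQA idea concrete, here is a minimal pure-Python sketch, not Meta's implementation: several query heads share one key/value head, so the key/value cache kept during inference shrinks by the ratio of query heads to K/V heads. The function names, the toy vectors, and the head counts and sizes used in the cache-size comparison are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention for one query vector over cached keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def grouped_query_attention(queries, keys, values):
    # queries: n_heads query vectors (a single decoding position, for brevity).
    # keys/values: n_kv_heads entries, each a list of per-position vectors.
    # Every group of n_heads // n_kv_heads query heads shares one K/V head.
    n_heads, n_kv_heads = len(queries), len(keys)
    assert n_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_heads // n_kv_heads
    return [attend(q, keys[h // group_size], values[h // group_size])
            for h, q in enumerate(queries)]

def kv_cache_elements(n_kv_heads, head_dim, seq_len, n_layers):
    # Cached floats: one K and one V vector per K/V head, position, and layer.
    return 2 * n_kv_heads * head_dim * seq_len * n_layers

# Toy example (hypothetical data): 4 query heads share 2 K/V heads,
# head dim 2, 3 cached positions.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
keys = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],    # K/V head 0: query heads 0 and 1
        [[0.0, 0.0], [1.0, -1.0], [2.0, 0.0]]]   # K/V head 1: query heads 2 and 3
values = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
          [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]]
outputs = grouped_query_attention(queries, keys, values)
print(len(outputs))  # 4: one output vector per query head

# Illustrative sizes (assumed, not from the article): full multi-head attention
# with 64 K/V heads vs. GQA with 8 shared K/V heads.
print(kv_cache_elements(64, 128, 4096, 80) // kv_cache_elements(8, 128, 4096, 80))  # 8
```

Setting the number of K/V heads equal to the number of query heads recovers standard multi-head attention, while a single shared K/V head gives multi-query attention; GQA sits between the two, trading a small amount of flexibility for a much smaller cache.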

The Llama 2 training corpus is a mix of data from publicly available sources and excludes data from Meta’s products or services. Llama 2 keeps most of Llama 1’s pretraining settings and model architecture, including the standard Transformer architecture, pre-normalization with RMSNorm, the SwiGLU activation function, and rotary positional embeddings (RoPE).
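The three architectural components named above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions rather than Llama 2’s actual code: the function names, the toy identity weight matrices, the eps default, and the RoPE base of 10000 (a conventional default) are assumptions, and real implementations operate on batched tensors.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of the activations (no mean
    # subtraction, no bias), then apply a learned per-dimension gain.
    # The eps default here is an assumption.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def silu(z):
    # Numerically stable SiLU (Swish): z * sigmoid(z).
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)

def swiglu_ffn(x, W1, W3, W2):
    # SwiGLU feed-forward block: W2 @ (SiLU(W1 @ x) * (W3 @ x)), without biases.
    gate = [silu(a) for a in matvec(W1, x)]
    up = matvec(W3, x)
    return matvec(W2, [g * u for g, u in zip(gate, up)])

def rope(x, pos, base=10000.0):
    # Rotary positional embedding: rotate each pair of dimensions (i, i+1),
    # i even, by the angle pos * base**(-i/d). Requires an even dimension d.
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Toy usage with hypothetical 2-dimensional identity weights.
I2 = [[1.0, 0.0], [0.0, 1.0]]
x = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
print(swiglu_ffn(x, I2, I2, I2))

# RoPE: the q.k score depends only on the relative offset between positions.
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
print(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 10), rope(k, 8)))  # equal up to rounding
```

The RoPE check illustrates why rotary embeddings suit attention: rotating queries and keys by position-dependent angles makes their dot product a function of relative rather than absolute position.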

For hyperparameters, Meta trains with the AdamW optimizer using β1 = 0.9, β2 = 0.95, and eps = 10⁻⁵. It uses a cosine learning-rate schedule with 2,000 warm-up steps, decaying the final learning rate to 10% of the peak learning rate.
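The optimizer settings and schedule above can be sketched as follows. This is a minimal, hedged illustration, not Meta’s training code: the warm-up is assumed to be linear, the weight-decay default of 0.1 and the peak learning rate and total step count in the usage lines are placeholder assumptions not given in this article, and AdamW is implemented over a plain list of floats for readability.

```python
import math

def lr_at(step, peak_lr, total_steps, warmup_steps=2000, final_ratio=0.1):
    # Linear warm-up (assumed shape) over warmup_steps, then cosine decay
    # down to final_ratio * peak_lr at total_steps.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    min_lr = final_ratio * peak_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

class AdamW:
    """Minimal AdamW over a flat list of float parameters (illustrative only)."""

    def __init__(self, params, beta1=0.9, beta2=0.95, eps=1e-5, weight_decay=0.1):
        # weight_decay=0.1 is an assumed default, not stated in this article.
        self.params = list(params)
        self.beta1, self.beta2, self.eps, self.wd = beta1, beta2, eps, weight_decay
        self.m = [0.0] * len(self.params)  # first-moment (mean) estimates
        self.v = [0.0] * len(self.params)  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, grads, lr):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction for the zero-initialized moment estimates.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Decoupled weight decay: applied to the parameter directly,
            # not folded into the gradient.
            self.params[i] -= lr * (m_hat / (math.sqrt(v_hat) + self.eps)
                                    + self.wd * self.params[i])
        return self.params

# Hypothetical run: peak_lr and total_steps are placeholders.
for step in (0, 1000, 2000, 6000, 10_000):
    print(step, lr_at(step, peak_lr=3e-4, total_steps=10_000))

# One AdamW update on a toy scalar parameter with gradient 2.0.
opt = AdamW([1.0])
print(opt.step([2.0], lr=lr_at(2000, peak_lr=3e-4, total_steps=10_000)))
```

Decoupling the weight decay from the gradient is what distinguishes AdamW from Adam with L2 regularization: the decay term is not rescaled by the adaptive denominator, so every parameter shrinks at the same relative rate.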

The researchers report results for open-source models, including Llama 1, the Llama 2 base models, MosaicML’s MPT, and Falcon, on standard academic benchmarks. The results indicate that Llama 2 outperforms Llama 1.

They also compare Llama 2 with closed-source models. Llama 2 70B is close to GPT-3.5 on MMLU and GSM8K, but there is a significant gap on coding benchmarks. Furthermore, Llama 2 70B matches or outperforms Google’s PaLM (540B) on almost all benchmarks.

Human evaluation results also show that Llama 2-Chat outperforms open-source chat models by a significant margin, and the largest Llama 2-Chat model is competitive with ChatGPT.

The researchers have released Llama 2 and Llama 2-Chat responsibly and say they plan to keep improving the models’ transparency and safety.

The paper Llama 2: Open Foundation and Fine-Tuned Chat Models on

Author: Hecate He | Editor: Chain Zhang
