Entrepreneur Elon Musk made his intentions clear in co-founding OpenAI in 2015: “We must have democratization of AI technology and make it widely available… so it doesn’t get concentrated in the hands of a few.” Unfortunately, things have not exactly worked out that way. While state-of-the-art large language models (LLMs) have achieved epoch-making performance, most remain the protected intellectual property of huge organizations, with limited or no access available to outsiders.

In the new paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model, a research team from EleutherAI introduces GPT-NeoX-20B, which at the time of its release is the largest publicly accessible pretrained general-purpose dense autoregressive LLM. GPT-NeoX-20B has 20 billion parameters, demonstrates powerful few-shot learning capabilities, and significantly outperforms similarly sized GPT-3 and FairSeq models.

Founded in 2020, EleutherAI describes itself as a decentralized collective of volunteer researchers, engineers and developers focused on AI alignment, scaling and open-source AI research. EleutherAI believes LLMs are crucial for the development of a wide variety of AI research fields, and has made GPT-NeoX-20B freely available to the public. In a tweet, NYU Professor Gary Marcus satirically described the release as “Genuinely Open AI.”

GPT-NeoX-20B is an autoregressive transformer decoder model whose architecture largely follows that of OpenAI’s GPT-3. It has 20 billion parameters, 19.9 billion of which are “non-embedding” parameters, the count that scaling-law analyses treat as the relevant measure of model size. It has 44 layers, a hidden dimension size of 6144, and 64 heads. To facilitate research on LLM training dynamics as well as AI safety and mechanistic interpretability, the team also saved partially trained checkpoints of GPT-NeoX-20B at evenly spaced 1000-step intervals.
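The 19.9 billion figure can be sanity-checked from the layer count and hidden size alone. The sketch below uses a common rule of thumb for transformer decoders (roughly 12 × layers × hidden size squared) and assumes a feed-forward block four times wider than the hidden size; the function name is hypothetical, and biases and layer-norm weights are ignored, so this is an approximation rather than the paper's exact accounting.

```python
# Architecture figures quoted in the article.
N_LAYERS = 44
HIDDEN_SIZE = 6144
N_HEADS = 64


def approx_non_embedding_params(n_layers: int, hidden_size: int) -> int:
    """Rule-of-thumb parameter count for a transformer decoder stack.

    Per layer: about 4*d^2 weights for attention (query, key, value and
    output projections) plus 8*d^2 for a feed-forward block with a 4x
    expansion (an assumption here). Biases and layer norms are ignored.
    """
    return 12 * n_layers * hidden_size ** 2


estimate = approx_non_embedding_params(N_LAYERS, HIDDEN_SIZE)
head_dim = HIDDEN_SIZE // N_HEADS  # hidden size split evenly across heads

print(f"approx. non-embedding parameters: {estimate / 1e9:.2f}B")  # 19.93B
print(f"per-head dimension: {head_dim}")  # 96
```

The estimate lands within a fraction of a percent of the reported 19.9 billion, which suggests the quoted depth and width account for essentially all of the non-embedding parameters.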

Although it is based on GPT-3, GPT-NeoX-20B has a number of notable differences: it uses rotary embeddings instead of learned positional embeddings, computes attention and feed-forward (FF) layers in parallel instead of in series, and uses only dense layers to reduce implementation complexity.
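Two of these differences are easy to illustrate in a few lines. The sketch below is not the GPT-NeoX implementation: the pairing of dimensions, the base of 10000 and the scalar stand-ins for the attention and FF sublayers are all assumptions chosen for readability, and layer normalization is omitted. It shows that rotary embeddings make attention scores depend only on the distance between two positions, and how a parallel block differs from a serial one.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i+1]) is rotated by the angle
    position * base**(-i/d). Pairing scheme and base are illustrative.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Both query/key pairs below are 4 positions apart, so their scores match.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]
near = dot(rotate(q, 7), rotate(k, 3))
far = dot(rotate(q, 104), rotate(k, 100))
print(math.isclose(near, far, abs_tol=1e-9))  # True


def serial_block(x, attn, ff):
    """Series layout: the FF sublayer sees the output of attention."""
    h = x + attn(x)
    return h + ff(h)


def parallel_block(x, attn, ff):
    """Parallel layout: both sublayers read the same input and are summed."""
    return x + attn(x) + ff(x)


def toy_attn(v):  # scalar stand-in for an attention sublayer
    return 2.0 * v


def toy_ff(v):  # scalar stand-in for a feed-forward sublayer
    return v + 1.0


print(serial_block(1.0, toy_attn, toy_ff))    # 7.0
print(parallel_block(1.0, toy_attn, toy_ff))  # 5.0
```

Because the parallel layout removes the dependency of the FF sublayer on the attention output, the two sublayers can be computed at the same time, at the cost of producing a different function than the serial layout.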

In their empirical studies, the team employed their open-source EleutherAI Language Model Evaluation Harness as a codebase and evaluated GPT-NeoX-20B on a variety of standard language model evaluation datasets in three main categories: natural language processing tasks, mathematical tasks, and advanced knowledge-based tasks.

In the experiments, GPT-NeoX-20B bettered FairSeq 13B on 22 of 32 natural language evaluation tasks; significantly outperformed GPT-3 and FairSeq models on mathematical tasks; and outperformed GPT-3 in the five-shot setting on the MMMLU (measuring massive multitask language understanding) test. The results also show that GPT-NeoX-20B benefits significantly more from few-shot evaluations than the FairSeq models.
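For readers unfamiliar with the terminology, a "five-shot" evaluation simply places five solved examples in the prompt ahead of the question being scored, while a zero-shot evaluation provides none. The sketch below shows one way such a prompt might be assembled; the function name and the "Q:/A:" template are hypothetical and are not the format used by the Evaluation Harness.

```python
def build_few_shot_prompt(solved_examples, question, k):
    """Prepend k solved examples to the question the model must answer."""
    if k > len(solved_examples):
        raise ValueError("not enough solved examples for the requested k")
    shots = [f"Q: {q}\nA: {a}" for q, a in solved_examples[:k]]
    # The final entry leaves the answer blank for the model to complete.
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


# Made-up arithmetic examples, for illustration only.
examples = [("2 + 2 = ?", "4"), ("7 - 3 = ?", "4"), ("5 * 6 = ?", "30")]

zero_shot = build_few_shot_prompt(examples, "9 + 8 = ?", k=0)
two_shot = build_few_shot_prompt(examples, "9 + 8 = ?", k=2)
print(two_shot)
```

A model that "benefits more from few-shot evaluation" is one whose score rises more sharply as k increases from zero.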

Overall, the study shows that GPT-NeoX-20B can significantly outperform similarly sized GPT-3 and FairSeq models, indicating its potential to democratize LLM research and aid in the development and deployment of new LLMs.

The open-source training/evaluation code and the model weights are available on the project’s GitHub. The paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
