Large language models (LLMs), popularized by the GPT family, have shown impressive language understanding and generation capabilities across diverse domains. Many industry researchers are now exploring ways to improve the task-specific performance of such models and integrate them into their workflows, with US firm Bloomberg emerging as a first mover in the financial domain.
In the new paper BloombergGPT: A Large Language Model for Finance, a research team from Bloomberg and Johns Hopkins University presents BloombergGPT, a 50-billion-parameter language model trained on a 700-billion-token dataset that significantly outperforms existing models on financial tasks.
Bloomberg’s goal was to train an LLM that achieves the best results across a wide range of financial tasks while remaining competitive on general-purpose LLM benchmarks. To this end, the team first drew on Bloomberg’s extensive data sources to compile what they believe to be the largest finance-specific dataset to date, comprising 363 billion tokens. They augmented this with various public datasets to reach a total of 700 billion tokens, which they used to train the 50-billion-parameter BloombergGPT model.
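As a quick back-of-the-envelope check on the data mix described above, the short Python snippet below derives the size and share of the public portion of the corpus. It uses only the article's rounded figures (363 billion finance tokens out of 700 billion in total), so the exact split reported in the paper may differ slightly; the helper name `corpus_mix` is purely illustrative.

```python
# Rounded token counts from the article, in billions of tokens.
FINANCE_TOKENS_B = 363
TOTAL_TOKENS_B = 700

def corpus_mix(finance_b: float, total_b: float) -> dict:
    """Hypothetical helper: split a corpus into domain-specific and public portions."""
    public_b = total_b - finance_b
    return {
        "public_tokens_b": public_b,
        "finance_share": finance_b / total_b,
        "public_share": public_b / total_b,
    }

mix = corpus_mix(FINANCE_TOKENS_B, TOTAL_TOKENS_B)
print(f"public tokens: ~{mix['public_tokens_b']}B")   # public tokens: ~337B
print(f"finance share: {mix['finance_share']:.1%}")   # finance share: 51.9%
print(f"public share:  {mix['public_share']:.1%}")    # public share:  48.1%
```

By these rounded figures, finance-specific text makes up roughly half of the training data, consistent with the stated goal of pairing domain-specific data with general-purpose text.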
BloombergGPT is a decoder-only causal LLM based on the BLOOM (Scao et al., 2022) architecture. It comprises 70 layers of transformer decoder blocks, each with multi-head self-attention, layer normalization, and a feed-forward network with one hidden layer.
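To make the block structure described above concrete, here is a minimal NumPy sketch of a stack of causal (decoder-only) transformer blocks, each with multi-head self-attention, layer normalization, and a one-hidden-layer feed-forward network. It is not BloombergGPT's or BLOOM's actual code: the pre-normalization layout, GELU activation, 4x feed-forward width, toy dimensions and random weights are assumptions chosen for illustration, and the ALiBi position biases that BLOOM-style models add to the attention scores are omitted. The helper names (`decoder_block`, `init_block`, `forward`) are hypothetical.

```python
import numpy as np

# Toy BLOOM-style decoder sketch. Assumptions (not from the article): pre-LN layout,
# GELU activation, 4x feed-forward width, tiny sizes, random weights, no ALiBi biases.

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's feature vector, then apply a learned scale and shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def gelu(x):
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def causal_self_attention(x, w_qkv, w_o, n_heads):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Causal mask: position i may attend only to positions 0..i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Merge heads back to (seq_len, d_model) and apply the output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

def decoder_block(x, p, n_heads):
    # Sublayer 1: multi-head causal self-attention with a residual connection.
    x = x + causal_self_attention(layer_norm(x, p["ln1_g"], p["ln1_b"]),
                                  p["w_qkv"], p["w_o"], n_heads)
    # Sublayer 2: feed-forward network with a single hidden layer, plus residual.
    hidden = gelu(layer_norm(x, p["ln2_g"], p["ln2_b"]) @ p["w_in"])
    return x + hidden @ p["w_out"]

def init_block(rng, d_model, d_ff, scale=0.02):
    return {
        "ln1_g": np.ones(d_model), "ln1_b": np.zeros(d_model),
        "ln2_g": np.ones(d_model), "ln2_b": np.zeros(d_model),
        "w_qkv": rng.normal(0.0, scale, (d_model, 3 * d_model)),
        "w_o": rng.normal(0.0, scale, (d_model, d_model)),
        "w_in": rng.normal(0.0, scale, (d_model, d_ff)),
        "w_out": rng.normal(0.0, scale, (d_ff, d_model)),
    }

def forward(x, blocks, n_heads):
    # Apply the stacked decoder blocks in sequence (BloombergGPT stacks 70).
    for p in blocks:
        x = decoder_block(x, p, n_heads)
    return x

rng = np.random.default_rng(0)
d_model, n_heads, n_layers, seq_len = 64, 4, 3, 10
blocks = [init_block(rng, d_model, 4 * d_model) for _ in range(n_layers)]
x0 = rng.normal(size=(seq_len, d_model))
y = forward(x0, blocks, n_heads)
print(y.shape)  # (10, 64)
```

The causal mask is what makes the model decoder-only: each output position depends only on the current and earlier tokens, so changing the last input token leaves every earlier output unchanged. BloombergGPT stacks 70 such blocks at a far larger width.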
The team trained and evaluated the model on Amazon SageMaker, using Amazon’s proprietary SageMaker Model Parallelism (SMP) library for efficient distributed training.
In their empirical study, the team compared BloombergGPT on finance-specific and general-purpose benchmarks with three open baseline models: GPT-NeoX (Black et al., 2022), OPT66B (Zhang et al., 2022a) and BLOOM176B (Scao et al., 2022). Of these, OPT66B and BLOOM176B are larger than BloombergGPT, while GPT-NeoX, at 20 billion parameters, is smaller.
In the experiments, BloombergGPT achieved the best performance on most financial tasks and comparable or better performance on the general-purpose benchmarks.
“We see tremendous value in having developed the first LLM focused on the financial domain,” says Bloomberg Chief Technology Officer Shawn Edwards. “BloombergGPT will enable us to tackle many new types of applications, while it delivers much higher performance out-of-the-box than custom models for each application, at a faster time-to-market.”
The paper BloombergGPT: A Large Language Model for Finance is on arXiv.
Author: Hecate He | Editor: Michael Sarazen