The continued development of large language models (LLMs), popularized by GPT, has encouraged their expansion into diverse domains and sparked great interest among finance researchers. However, applying LLMs to finance presents hurdles, such as the difficulty of accessing high-quality financial data. Proprietary models like BloombergGPT also offer limited accessibility and transparency regarding the data used to train financial LLMs.
In a new paper titled “FinGPT: Open-Source Financial Large Language Models,” a research team from Columbia University and New York University (Shanghai) presents FinGPT, an end-to-end open-source framework for financial large language models (FinLLMs) that democratizes financial data to encourage researchers and practitioners to develop user-specified FinLLMs, aiming to unlock new opportunities in open finance.
The team summarizes their main contributions as follows:
- Democratization: FinGPT, as an open-source framework, aims to democratize financial data and FinLLMs, uncovering untapped potential in open finance.
- Data-centric approach: Recognizing the significance of data curation, FinGPT adopts a data-centric approach and implements rigorous cleaning and preprocessing methods for handling varied data formats and types, thereby ensuring high-quality data.
- End-to-end framework: FinGPT embraces a full-stack framework for FinLLMs.
The goal of this work is to leverage the strengths of general LLMs like ChatGPT to train and fine-tune a financial LLM, while also providing researchers and practitioners with accessible, transparent, high-quality financial data for training their own FinLLMs.
To achieve this, the team first adopts a data-centric approach to implement data acquisition, cleaning, and preprocessing for a wide variety of data formats and types, including but not limited to Financial News, Company Filings, Social Media Discussions, and Company Announcements, thereby generating high-quality financial data.
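The cleaning and preprocessing step described above can be pictured with a minimal sketch. This is not FinGPT's actual code: the `RawDocument` type, the `clean_text` and `preprocess` helpers, and the specific cleaning rules are all assumptions chosen for illustration, and a real pipeline would likely do much more.

```python
import html
import re
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record type; the article does not describe FinGPT's real schema.
@dataclass(frozen=True)
class RawDocument:
    source: str  # e.g. "news", "filing", "social", "announcement"
    text: str


_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip HTML tags, unescape HTML entities, and collapse whitespace."""
    text = _TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return _WS_RE.sub(" ", text).strip()


def preprocess(docs: Iterable[RawDocument]) -> List[RawDocument]:
    """Clean each document, then drop empty and case-insensitive duplicate texts."""
    seen = set()
    cleaned = []
    for doc in docs:
        text = clean_text(doc.text)
        key = text.lower()
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append(RawDocument(doc.source, text))
    return cleaned
```

Passing a list of `RawDocument` items from different sources to `preprocess` returns one cleaned record per distinct text, which is roughly the shape of input a fine-tuning step would expect.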
In terms of architecture, the FinGPT framework consists of four layers: Data Source, Data Engineering, LLMs, and Applications. The researchers summarize the key features of each layer as follows:
- Data source layer: This layer assures comprehensive market coverage, addressing the temporal sensitivity of financial data through real-time information capture.
- Data engineering layer: Primed for real-time NLP data processing, this layer tackles the inherent challenges of high temporal sensitivity and low signal-to-noise ratio in financial data.
- LLMs layer: Focusing on a range of fine-tuning methodologies, this layer mitigates the highly dynamic nature of financial data, ensuring the model’s relevance and accuracy.
- Application layer: Showcasing practical applications and demos, this layer highlights the potential capability of FinGPT in the financial sector.
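To make the four-layer structure concrete, the sketch below chains one toy function per layer. Every function name is hypothetical, the headlines are made-up sample strings, and the "LLM" is a keyword rule standing in for a fine-tuned model. The labeling task was chosen only for illustration and is not taken from the paper.

```python
from typing import Dict, Iterable, List


def data_source_layer() -> List[str]:
    """Stand-in for real-time capture: returns fixed sample headlines."""
    return [
        "  Company A beats earnings estimates ",
        "",
        "Company B issues profit warning",
    ]


def data_engineering_layer(raw: Iterable[str]) -> List[str]:
    """Normalize whitespace at the edges and drop empty items."""
    return [item.strip() for item in raw if item.strip()]


def llm_layer(texts: Iterable[str]) -> List[Dict[str, str]]:
    """Placeholder for a fine-tuned model: a keyword rule, NOT a real LLM."""
    def label(text: str) -> str:
        lowered = text.lower()
        if "beats" in lowered:
            return "positive"
        if "warning" in lowered:
            return "negative"
        return "neutral"

    return [{"text": t, "label": label(t)} for t in texts]


def application_layer(results: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Toy application: count how many items received each label."""
    counts: Dict[str, int] = {}
    for result in results:
        counts[result["label"]] = counts.get(result["label"], 0) + 1
    return counts


def run_pipeline() -> Dict[str, int]:
    """Pass data through the four layers in order."""
    raw = data_source_layer()
    engineered = data_engineering_layer(raw)
    labeled = llm_layer(engineered)
    return application_layer(labeled)
```

Keeping each layer behind its own function boundary means the placeholder model could be swapped for a real fine-tuned one without touching the data or application code, which is the kind of separation a layered design is meant to provide.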
Moreover, the team applies reinforcement learning from human feedback (RLHF), a key technology in FinGPT, to equip LLMs with the ability to learn individual preferences such as risk-aversion level and investing habits. This supports personalized applications such as robo-advisors and algorithmic trading, enabling effective personalized financial assistants.
Overall, FinGPT provides a more accessible, flexible, and cost-effective solution for training financial LLMs, and it paves the way to open finance.
The code is available on the project’s GitHub. The paper FinGPT: Open-Source Financial Large Language Models is on arXiv.
Author: Hecate He | Editor: Chain Zhang