The recent development of large foundation models such as BERT, GPT-3, and MAE has brought a paradigm shift to AI. Such models benefit from pretraining on big data at scale and have demonstrated game-changing performance and excellent transfer capability on various downstream tasks. The lack of a unified theoretical framework for such foundation models, however, remains an obstacle to their further improvement and extension.
A research team from Sun Yat-sen University and UBTECH addresses this issue in the new paper Big Learning: A Universal Machine Learning Paradigm?, proposing a unified approach for justifying, analyzing, and improving foundation models. The team’s proposed big learning framework can model many-to-all joint/conditional/marginal data distributions and delivers extraordinary data and task flexibilities.
The team summarizes their main contributions as follows:
- Big learning serves as a theoretical platform for justifying, analyzing, and improving big/foundation models, because most of them are implicitly doing (parts of) big learning.
- By modelling many-to-all joint/conditional/marginal data distributions, big learning (i) comprehensively exploits the available data information (thus focusing on the data essence) and delivers the corresponding data capabilities (valuable for, e.g., data completion and flexible counter-factual analysis) and (ii) embraces statistical sharing power to implicitly summarize intrinsic compositional data meta-knowledge within model parameters, enhancing the model’s robustness, adaptability, and generalization capabilities.
- Big learning delivers extraordinary data and task flexibilities by enabling large-scale training with complete/incomplete data on diverse learning tasks across various domains, leading to (i) minimal human interventions in data collection and learning-task specification, (ii) a significantly reduced training-test (or pretraining-finetuning) gap, and (iii) a potential avenue toward true self-learning on the Internet.
The paper first presents the main ideas behind big learning in a simplified unsupervised setting. Here, the researchers observe that conventional models are restricted to limited complete data samples and cannot handle incomplete data. They therefore focus on simultaneously modelling joint, conditional, and marginal distributions, which enables flexible big learning training with complete or incomplete data and the collection of comprehensive data capabilities via “exquisite data exploitation” techniques.
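The simultaneous joint/conditional/marginal modelling described above can be pictured with a small sketch. The task sampler below is an illustrative assumption rather than code from the paper: it randomly splits feature indices into an observed set S and a target set T, so one model is trained across many prediction tasks — S empty with T covering all dimensions corresponds to joint modelling, non-empty S to conditional modelling, and a strict subset of dimensions to a marginal-style task — while incomplete samples simply contribute whatever entries they actually have.

```python
import random

def sample_big_learning_task(dims, rng=random):
    """Randomly split feature indices 0..dims-1 into an observed
    (source) set S and a non-empty target set T, yielding one of the
    many joint/conditional/marginal prediction tasks a big-learning
    model could be trained on. (Hypothetical helper, not from the paper.)
    """
    indices = list(range(dims))
    rng.shuffle(indices)
    cut = rng.randint(0, dims - 1)   # size of the observed set S
    source = sorted(indices[:cut])   # may be empty -> joint modelling
    target = sorted(indices[cut:])   # always non-empty
    return source, target

def mask_incomplete(sample, observed):
    """Keep only the observed entries of a possibly incomplete sample
    (missing entries marked as None are never observed)."""
    return {i: sample[i] for i in observed if sample.get(i) is not None}
```

With this kind of sampler, each training step draws a fresh (S, T) split, so a single set of parameters implicitly summarizes many distributions at once — one plausible reading of the paper's "statistical sharing" argument.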
Building on the flexibility of unsupervised big learning, the researchers then generalize it to other settings such as supervised learning, self-supervised learning, generative learning, and their various combinations. They conclude that big learning can serve as a new universal machine learning paradigm and enable flexible combinations of different machine learning paradigms via knowledge communication.
In their empirical studies, the team evaluated big learning’s effectiveness by training models with both complete and incomplete data on a wide variety of tasks. The results show that big learning can train effectively with minimal to no human intervention in data collection, reduce training-test mismatches, and behave more robustly with respect to the independent and identically distributed (IID) data assumption.
Overall, the work demonstrates that big learning can boost both data and task flexibility, indicating its potential to serve as a new universal machine learning paradigm uniting big data, big/foundation models, and big learning.
The paper Big Learning: A Universal Machine Learning Paradigm? is on arXiv.
Author: Hecate He | Editor: Michael Sarazen