In recent years, large-scale neural networks have ushered in a transformative era in generative modeling. These neural networks possess an unprecedented capacity to capture intricate relationships among numerous variables, opening up exciting possibilities in various domains.
One prominent approach in generative modeling is the use of autoregressive networks, which currently stand as the state of the art for language modeling. These networks excel at discrete data, but their effectiveness wanes in domains such as image generation, where the data are continuous. Conversely, diffusion models have proven highly effective at image generation, yet they struggle to match autoregressive models on discrete data.
In the new paper Bayesian Flow Networks, a NNAISENSE research team introduces a novel class of generative models known as Bayesian Flow Networks (BFNs). BFNs combine Bayesian inference with neural networks in an iterative modeling process, enabling successful application to continuous, discretized, and discrete data while maintaining competitive performance.

The researchers posit that the key to diffusion models’ superiority in image generation lies in their progression from coarse to fine image details as the level of noise decreases. However, a fundamental challenge arises when dealing with discrete data – the noise in the diffusion process is also discrete and thus discontinuous.
The primary motivation behind this work stems from the team’s conviction that a fully continuous transmission process would prove more effective for discrete data. Such an approach would open doors to gradient-based sample guidance and few-step generation techniques, akin to those developed for continuous diffusion models.

Building upon this notion, BFNs operate on the parameters of a data distribution rather than directly on noisy data. In this model architecture, the parameters of a set of independent distributions undergo modification through Bayesian inference in light of noisy data samples. Subsequently, these modified parameters serve as input to a neural network, which outputs a second, interdependent distribution.
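To make this concrete, below is a minimal sketch of the kind of conjugate Bayesian update BFNs apply to continuous data, where each variable's belief is an independent Gaussian with mean mu and precision rho: observing a noisy sample y sent with accuracy alpha tightens the belief in closed form. The function name and the trailing `network` call are illustrative assumptions, not the paper's actual code.

```python
import torch

def bayesian_update_gaussian(mu, rho, y, alpha):
    """Conjugate update of independent Gaussian beliefs N(mu, 1/rho)
    after observing a noisy sample y transmitted with accuracy alpha."""
    rho_new = rho + alpha                      # precisions add
    mu_new = (rho * mu + alpha * y) / rho_new  # precision-weighted mean
    return mu_new, rho_new

# The updated parameters, not the noisy sample itself, are what the
# network sees; it then outputs a second, interdependent distribution, e.g.:
#   output_dist = network(mu_new, t)  # `network` is a placeholder here
```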
The generative procedure in BFNs unfolds iteratively, starting from a simple prior and repeatedly updating the two distributions. This mirrors the reverse process of diffusion models but is conceptually simpler, as it requires no forward process. The researchers derive discrete- and continuous-time loss functions for continuous, discretized, and discrete data, along with the corresponding sample generation procedures.
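As a rough illustration of that iterative procedure for continuous data, the sketch below starts from a standard-normal prior and alternates network prediction, simulated noisy transmission, and the Bayesian update. The `network` callable, the accuracy schedule `alphas`, and the final read-out are all assumptions for illustration, not the paper's exact sampler.

```python
import torch

def generate(network, n_steps, dim, alphas):
    """Hypothetical sketch of BFN-style iterative generation for continuous data."""
    mu = torch.zeros(dim)   # prior mean
    rho = torch.ones(dim)   # prior precision
    for i in range(n_steps):
        t = i / n_steps
        x_hat = network(mu, t)                       # interdependent prediction
        alpha = alphas[i]                            # per-step accuracy
        y = x_hat + torch.randn(dim) / alpha ** 0.5  # simulated sender sample
        prec = rho + alpha                           # Bayesian update of beliefs
        mu = (rho * mu + alpha * y) / prec
        rho = prec
    return network(mu, 1.0)                          # final estimate of the data

# e.g. with a stand-in network that simply echoes its current belief:
# sample = generate(lambda mu, t: mu, n_steps=10, dim=4, alphas=[1.0] * 10)
```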
A notable feature of BFNs is that even for discrete data, the network inputs lie on the probability simplex and are therefore natively differentiable. This opens the door to gradient-based sample guidance and few-step generation techniques in discrete domains such as language modeling. Importantly, the loss function in BFNs directly optimizes data compression and places no restrictions on the network architecture.
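For intuition, here is a sketch of the kind of multiplicative Bayesian update used for categorical beliefs: each noisy observation re-weights the current class probabilities and renormalizes, so the parameters stay on the simplex and the whole step is differentiable. Shapes and naming are assumptions for illustration.

```python
import torch

def bayesian_update_categorical(theta, y):
    """Multiplicative Bayesian update for categorical beliefs on the simplex:
    theta'_k is proportional to theta_k * exp(y_k), computed stably via softmax.
    theta: (..., K) class probabilities; y: (..., K) noisy observations.
    Every op here is differentiable, enabling gradient-based sample guidance."""
    return torch.softmax(theta.log() + y, dim=-1)

# Uniform belief over K = 4 classes, nudged by one noisy observation:
# theta = bayesian_update_categorical(torch.full((4,), 0.25), torch.randn(4))
```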


Empirical studies conducted by the research team demonstrate that BFNs achieve competitive log-likelihoods in image modeling tasks using dynamically binarized MNIST and CIFAR-10 datasets. Furthermore, they outperform all known discrete diffusion models when applied to the text8 character-level language modeling task. The researchers anticipate that this innovative work will inspire fresh perspectives and drive new directions in generative modeling research.
The paper Bayesian Flow Networks is on arXiv.
Author: Hecate He | Editor: Chain Zhang

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.