Tag: diffusion model

AI Machine Learning & Data Science Research

Futureverse’s Universal High-Quality Text-to-Music Generator JEN-1 Makes Significant Advancements

In a new paper JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models, a Futureverse research team presents JEN-1, a universal framework that combines bidirectional and unidirectional modes to generate high-quality music conditioned on either text or music representations.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Shanghai AI Lab, CUHK & Stanford U Extend Personalized Text-to-Image Diffusion Models Into Animation Generators Without Tuning

In a new paper AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, a research team presents AnimateDiff, a general and practical framework that can generate animated images for any personalized text-to-image (T2I) model without extra training or model-specific tuning.

AI Machine Learning & Data Science Research

Huawei’s DiffFit Unlocks the Transferability of Large Diffusion Models to New Domains

In the new paper DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning, a Huawei Noah’s Ark Lab research team introduces DiffFit, a parameter-efficient fine-tuning technique that enables fast adaptation to new domains for diffusion image generation. Compared to full fine-tuning, DiffFit achieves a 2x training speed-up while tuning only ~0.12 percent of the model's parameters.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Oxford U Presents RealFusion: 360° Reconstructions of Any Object from a Single Image

In the new paper RealFusion: 360° Reconstruction of Any Object from a Single Image, an Oxford University research team leverages a diffusion model to generate 360° reconstructions of objects from a single image. Their RealFusion approach achieves state-of-the-art performance on monocular 3D reconstruction benchmarks.

AI Machine Learning & Data Science Natural Language Tech Research

Google’s Imagen Text-to-Image Diffusion Model With Deep Language Understanding Defeats DALL-E 2

In the new paper Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, a Google Brain research team presents Imagen, a text-to-image diffusion model that combines deep language understanding and photorealistic image generation capabilities to achieve a new state-of-the-art FID score of 7.27 on the COCO dataset.