Tag: text to image

AI Machine Learning & Data Science Research

OpenAI & Microsoft’s DALL-E 3 Masters Image Creation Through Enhanced Captions

In a new paper Improving Image Generation with Better Captions, a research team from OpenAI and Microsoft introduces DALL-E 3, a text-to-image generation system trained on highly descriptive synthetic captions. Benchmarked for prompt following, coherence, and aesthetics, it demonstrates a competitive edge over existing counterparts.
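
The core recipe is simple enough to sketch: train the generator on a mixture of original captions and highly descriptive synthetic ones produced by a learned captioner. Below is a minimal Python sketch of that data-mixing step, assuming a captioner callable as a stand-in for the model described in the paper; the function name and interface are illustrative.

```python
import random

def recaption_dataset(pairs, captioner, synthetic_ratio=0.95):
    """Yield (image, caption) training pairs, mostly synthetically captioned.

    pairs:      iterable of (image, original_caption) tuples
    captioner:  callable, image -> detailed descriptive caption (a stand-in
                for the learned captioner described in the paper)
    The paper reports that training on a very high proportion of synthetic
    captions (95% in their runs) improves prompt following.
    """
    for image, original in pairs:
        if random.random() < synthetic_ratio:
            yield image, captioner(image)  # highly descriptive synthetic caption
        else:
            yield image, original          # keep some ground-truth captions
```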

AI Machine Learning & Data Science Research

Microsoft Azure’s Idea2Img: Enabling Automatic Image Design and Generation with Enhanced Image Quality

In a new paper Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, a Microsoft Azure AI research team introduces Idea2Img, a framework that leverages the capabilities of GPT-4V(ision) to automate the process of image design and generation with enhanced image quality.
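
At its core, Idea2Img is a generate-assess-revise loop: GPT-4V(ision) drafts prompts for a text-to-image model, judges the rendered drafts, and feeds its criticism back into the next round. The sketch below captures that loop; the mllm and t2i interfaces are hypothetical stand-ins, not the paper's actual API.

```python
def idea_to_image(idea, mllm, t2i, max_rounds=3):
    """Generate-assess-revise loop in the spirit of Idea2Img.

    mllm: multimodal model wrapper (standing in for GPT-4V calls) with
          hypothetical methods generate_prompts, select_best, give_feedback
    t2i:  text-to-image model, callable prompt -> image
    """
    memory = []        # past (prompt, image, feedback) triples
    best_image = None
    for _ in range(max_rounds):
        # 1. Draft several candidate T2I prompts, informed by past feedback.
        prompts = mllm.generate_prompts(idea, memory)
        # 2. Render a draft image for each candidate prompt.
        drafts = [(p, t2i(p)) for p in prompts]
        # 3. Let the multimodal model pick the draft closest to the idea.
        best_prompt, best_image = mllm.select_best(drafts, idea)
        # 4. Ask for criticism; stop if the model is satisfied.
        feedback = mllm.give_feedback(best_image, idea)
        if feedback is None:
            break
        memory.append((best_prompt, best_image, feedback))
    return best_image
```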

AI Computer Vision & Graphics Machine Learning & Data Science Research

Shanghai AI Lab, CUHK & Stanford U Extend Personalized Text-to-Image Diffusion Models Into Animation Generators Without Tuning

In a new paper AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, a research team from Shanghai AI Laboratory, The Chinese University of Hong Kong and Stanford University presents AnimateDiff, a general and practical framework that turns any personalized text-to-image (T2I) model into an animation generator, without any extra training or model-specific tuning.
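
The trick is architectural: a motion module, trained once on video data, is inserted between the frozen spatial layers of the personalized T2I model, letting frames attend to one another. Here is a minimal PyTorch sketch of such a temporal-attention module; the layer design is illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Illustrative motion module: self-attention across the frame axis,
    inserted between the frozen spatial layers of a personalized T2I UNet."""

    def __init__(self, channels, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Zero-init the output projection so the inserted module starts as an
        # identity mapping and cannot disrupt the pretrained image model.
        nn.init.zeros_(self.attn.out_proj.weight)
        nn.init.zeros_(self.attn.out_proj.bias)

    def forward(self, x):
        # x: (batch * height * width, frames, channels) -- each spatial
        # location attends over the temporal (frame) dimension.
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out  # residual connection
```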

AI Machine Learning & Data Science Natural Language Tech Research

Google’s Imagen Text-to-Image Diffusion Model With Deep Language Understanding Defeats DALL-E 2

In the new paper Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, a Google Brain research team presents Imagen, a text-to-image diffusion model that combines deep language understanding and photorealistic image generation capabilities to achieve a new state-of-the-art FID score of 7.27 on the COCO dataset.
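
Two design choices drive the result: text is encoded by a large frozen language-model encoder (T5-XXL), and images are produced by a cascade of diffusion models (a 64x64 base model followed by 64-to-256 and 256-to-1024 super-resolution stages). The sketch below traces that pipeline; all interfaces are assumed for illustration.

```python
def imagen_sample(prompt, text_encoder, base_model, sr_256, sr_1024):
    """Cascaded text-to-image sampling in the style of Imagen.

    text_encoder: frozen language-model encoder, prompt -> text embeddings
                  (Imagen uses a frozen T5-XXL encoder)
    base_model:   text-conditional diffusion model producing a 64x64 image
    sr_256, sr_1024: text-conditional super-resolution diffusion models
    All interfaces here are assumed for illustration.
    """
    emb = text_encoder(prompt)            # the deep language understanding step
    img_64 = base_model.sample(emb)       # 64x64 base generation
    img_256 = sr_256.sample(img_64, emb)  # 64 -> 256 upsampling
    return sr_1024.sample(img_256, emb)   # 256 -> 1024 upsampling
```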

AI Machine Learning & Data Science Research

OpenAI’s unCLIP Text-to-Image System Leverages Contrastive and Diffusion Models to Achieve SOTA Performance

In the new paper Hierarchical Text-Conditional Image Generation with CLIP Latents, an OpenAI research team combines the advantages of contrastive and diffusion models for text-conditional image generation tasks. Their proposed unCLIP model improves image diversity with minimal loss in photorealism and caption similarity, and produces image quality comparable to the state-of-the-art text-to-image system GLIDE.
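
Hierarchical here means two stages: a prior maps a caption's CLIP text embedding to a plausible CLIP image embedding, and a diffusion decoder then inverts the CLIP image encoder to turn that embedding into pixels. The following sketch illustrates the flow; the callable interfaces are hypothetical stand-ins.

```python
def unclip_generate(caption, clip_text_encoder, prior, decoder):
    """Two-stage hierarchical generation as described for unCLIP.

    clip_text_encoder: caption -> CLIP text embedding (frozen, contrastive)
    prior:   maps the text embedding to a plausible CLIP *image* embedding;
             the paper finds a diffusion prior works best
    decoder: diffusion model that inverts the CLIP image encoder,
             image embedding -> pixels
    Interfaces are hypothetical stand-ins for illustration.
    """
    z_text = clip_text_encoder(caption)
    z_image = prior.sample(z_text)           # predict an image embedding
    return decoder.sample(z_image, caption)  # decode the embedding into an image
```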