From Texts to Kitties: OpenAI’s GPT Language Model Tackles Image Generation

It’s been just three weeks since OpenAI wowed the world with its gigantic 175-billion-parameter GPT-3 language model. Now, the San Francisco-based AI company has caused a new stir on social media, proposing that large transformer-based language models trained on pixel sequences can generate coherent images without the use of labels. The new paper comes from an OpenAI research team that includes Co-founder and Chief Scientist Ilya Sutskever.

The success of unsupervised learning methods and transformer models in natural language processing (NLP) inspired OpenAI researchers to explore this new direction. Can similar models also learn useful representations for images?

Explains OpenAI in a blog post: “We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and image classification accuracy, we show that our best generative model also contains features competitive with top convolutional nets in the unsupervised setting.” Unsupervised learning generally refers to model training that does not require manual data labelling.

AI pioneer and Turing Award honouree Geoffrey Hinton has tweeted that “unsupervised learning of representations is beginning to work quite well without requiring reconstruction.” In the paper A Simple Framework for Contrastive Learning of Visual Representations, which Hinton co-authored, a linear classifier trained on self-supervised representations learned by the SimCLR framework achieves a significant performance leap in image recognition.

One of the vital insights the OpenAI researchers took from transformer models like BERT and GPT-2 is that they are domain agnostic, meaning that they can be directly applied to 1D sequences of any form. The team decided to downsize raw images to a low resolution and reshape them into text-like 1D sequences of pixels, as otherwise the unrolled sequences would be too long to handle. “If we naively trained a transformer on a sequence of length 224² × 3, our attention logits would be tens of thousands of times larger than those used in language models, and even a single layer would not fit on a GPU.”
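
The arithmetic behind that quote can be checked with a short sketch. It is illustrative only: the function names are hypothetical, the 1,024-token language-model context is an assumed reference point (the article does not state one), and the 32 × 32 size is an assumed low resolution chosen just to show the effect of downsizing.

```python
# Illustrative sketch; names are hypothetical and not taken from the iGPT code.

LANGUAGE_CONTEXT_LENGTH = 1024  # assumption: a typical language-model context length


def flatten_image(image):
    """Unroll an H x W image of (r, g, b) pixels into a 1D list in raster order."""
    return [channel for row in image for pixel in row for channel in pixel]


def attention_logit_count(sequence_length):
    """Self-attention scores every position against every other: n * n logits."""
    return sequence_length * sequence_length


full_res_length = 224 * 224 * 3  # one value per colour channel of a 224 x 224 image
low_res_length = 32 * 32 * 3     # assumed low resolution, for comparison only

ratio = attention_logit_count(full_res_length) / attention_logit_count(
    LANGUAGE_CONTEXT_LENGTH
)

print(f"full-resolution sequence length: {full_res_length:,}")
print(f"low-resolution sequence length:  {low_res_length:,}")
print(f"attention logits vs. a {LANGUAGE_CONTEXT_LENGTH}-token context: {ratio:,.0f}x")

# A 2 x 2 toy image becomes a flat sequence of 12 channel values.
tiny_image = [[(0, 0, 0), (255, 0, 0)], [(0, 255, 0), (0, 0, 255)]]
print(flatten_image(tiny_image))
```

Under these assumptions the full-resolution sequence has 150,528 entries and its attention matrix is about 21,600 times larger than that of a 1,024-token context, which is consistent with the "tens of thousands of times" in the quote.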

The team trained a model dubbed iGPT, which uses the same transformer architecture that GPT-2 uses for language and which learned strong image representations as measured by linear probing, fine-tuning, and low-data classification. The approach consists of a pretraining stage completed without labels, followed by a fine-tuning step. For pretraining, the team used one of two pixel-prediction objectives: autoregressive, which is also the GPT-2 pretraining approach, and a BERT-style masked objective. Once the model had learned representations, the team evaluated them with linear probes or fine-tuning.
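
To make the two objectives concrete, here is a simplified sketch of how training examples could be built from a flattened pixel sequence. It is not OpenAI's implementation: the function names, the `MASK` placeholder, and the default 15 percent masking rate are assumptions made for illustration, and a real model would predict a distribution over pixel values rather than store raw pairs.

```python
import random

MASK = -1  # hypothetical placeholder value standing in for a hidden pixel


def autoregressive_examples(pixels):
    """Pair each prefix of the sequence with the pixel that follows it."""
    return [(pixels[:i], pixels[i]) for i in range(1, len(pixels))]


def masked_example(pixels, mask_prob=0.15, seed=0):
    """Hide a random subset of positions; the targets are the hidden values."""
    rng = random.Random(seed)
    masked_positions = {i for i in range(len(pixels)) if rng.random() < mask_prob}
    corrupted = [MASK if i in masked_positions else p for i, p in enumerate(pixels)]
    targets = {i: pixels[i] for i in masked_positions}
    return corrupted, targets


pixels = [12, 200, 37, 90, 90, 5, 251, 64]

# Autoregressive: predict the next pixel from everything before it.
for context, target in autoregressive_examples(pixels)[:3]:
    print(context, "->", target)

# BERT-style: predict the hidden pixels from the visible ones on both sides.
corrupted, targets = masked_example(pixels, mask_prob=0.5)
print(corrupted, targets)
```

The difference is in the context each prediction can see: the autoregressive objective only looks backwards along the sequence, while the masked objective sees pixels on both sides of each gap.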

In experiments on CIFAR-10, the iGPT-L model achieved 96.3 percent accuracy with a linear probe, outperforming a supervised Wide ResNet. It also reached 99.0 percent accuracy with full fine-tuning, matching the top supervised pretrained models. On ImageNet, the larger iGPT-XL model, trained on a mixture of ImageNet and web images, was comparable with self-supervised benchmarks, achieving an accuracy of 72.0 percent.

Even without the guidance of any human-labelled data, iGPT managed to generate a wide range of coherent images. But the researchers note that this performance came at a hefty price: “iGPT-L was trained for roughly 2500 V100-days while a similarly performing MoCo model can be trained in roughly 70 V100-days.”

I was excited and then read "2500 V100-days" 😹! This is pure brute forcing the problem and can you imagine the environmental impact of this. If I understand correctly, with 100 V100 GPUs, it would still take 25 days running 24/7 to train the model. https://t.co/v2t3Nb6ckO
— Alexia Jolicoeur-Martineau (@jm_alexia) June 17, 2020
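
The figures in the quote and the tweet can be reproduced with simple arithmetic. The sketch below assumes perfectly linear scaling across GPUs, which real training runs rarely achieve, and the helper name is hypothetical.

```python
IGPT_L_V100_DAYS = 2500  # figure quoted from the paper
MOCO_V100_DAYS = 70      # figure quoted from the paper


def wall_clock_days(gpu_days, num_gpus):
    """Idealised wall-clock time, assuming work divides evenly across GPUs."""
    return gpu_days / num_gpus


days_on_100_gpus = wall_clock_days(IGPT_L_V100_DAYS, 100)
cost_ratio = IGPT_L_V100_DAYS / MOCO_V100_DAYS

print(f"iGPT-L on 100 V100s: {days_on_100_gpus:.0f} days of continuous training")
print(f"iGPT-L vs. MoCo compute: roughly {cost_ratio:.0f}x")
```

With 100 GPUs the idealised estimate is 25 days, matching the tweet, and the iGPT-L budget is roughly 36 times that of the MoCo model cited by the researchers.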

OpenAI sees the work as a proof-of-concept demonstrating the enormous potential of large transformer language models to learn unsupervised representations in new domains. The drawback is the jaw-dropping compute cost of training the models, which may be a deal-breaker for researchers who don’t have access to a supercomputer.

The paper Generative Pretraining from Pixels is available on the OpenAI project page, and the code can be found on GitHub.

Journalist: Fangyu Cai | Editor: Michael Sarazen
