As Its GPT-3 Model Wows the World, OpenAI CEO Suggests ‘the Hype Is Way Too Much’

OpenAI’s 175 billion parameter language model GPT-3 has gone viral once again, with a flurry of tech tweets celebrating the many innovative new applications — ranging from automatic code and short story generators to fully functioning search engines — that have leveraged the GPT-3 API OpenAI released in June. But not everyone in the ML community is impressed.

OpenAI’s first GPT (Generative Pre-Training) model was introduced in June 2018. The then-novel idea was to take advantage of the huge supply of unlabelled text corpora and the Transformer generative deep learning architecture to train a powerful general language model. In February 2019, the San Francisco-based AI company rolled out a much larger GPT-2 model with key technical updates such as pre-activation, zero domain transfer, and zero task transfer. With 1.5 billion parameters, GPT-2 was 12 times larger than the initial GPT. In May 2020, OpenAI unveiled the third version, GPT-3, which scaled up the model architecture, data and compute, in the research paper Language Models are Few-Shot Learners.

GPT-3 delivered SOTA performance across a variety of NLP tasks and benchmarks in zero-shot, one-shot, and few-shot settings. For example, when fed the prompt: “Close your eyes and, with detail, describe the sounds and smells around you right now. Create a picture that I can clearly see in my mind,” a GPT-3-powered writing assistant developed by ShortlyRead generated the following dark tale, which reads like the product of a creative writing class:

(This place seemed much smaller when she’d first walked in. It was probably the concrete walls – too bare, too harsh, like a cell. They always made her want to burrow into the corners.)
Scritch-scratch.
It was getting cold. She shivered. The sounds continued. More rapid now. She coughed.

A few months ago I jokingly said to a friend “we should create an AI that can write fiction for @ShortlyRead”

I knew the tech was coming but I didn’t expect it to be this good. @OpenAI @gdb

Here’s GPT-3 as a creative writing assistant. Play with v1 now https://t.co/dBQVygCTwv pic.twitter.com/W1lvTKjWVQ
— Qasim Munye (@QasimMunye) July 30, 2020
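The zero-shot, one-shot, and few-shot settings mentioned above differ only in how many worked examples are placed in the prompt before the actual query; the model's weights are not updated. The sketch below is a minimal illustration of that idea. The helper name `build_prompt`, the `=>` separator, and the translation examples are assumptions chosen for illustration, not a format required by the GPT-3 API.

```python
from typing import List, Tuple


def build_prompt(task_description: str,
                 examples: List[Tuple[str, str]],
                 query: str) -> str:
    """Assemble a plain-text prompt with zero or more worked examples."""
    lines = [task_description]
    # Each worked example is shown as "input => output".
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The query is left open so the model is asked to complete it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# Illustrative task and demonstrations (hypothetical data).
TASK = "Translate English to French:"
DEMOS = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
    ("peppermint", "menthe poivrée"),
]

zero_shot = build_prompt(TASK, [], "plush giraffe")         # no examples
one_shot = build_prompt(TASK, DEMOS[:1], "plush giraffe")   # one example
few_shot = build_prompt(TASK, DEMOS, "plush giraffe")       # several examples

if __name__ == "__main__":
    print(few_shot)
```

The three prompts share the same task description and query; only the number of demonstrations in between changes.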

I asked GPT-3 to write a response to the philosophical essays written about it by @DrZimmermann, @rinireg @ShannonVallor, @add_hawk, @AmandaAskell, @dioscuri, David Chalmers, Carlos Montemayor, and Justin Khoo published yesterday by @DailyNousEditor. It's quite remarkable! pic.twitter.com/W1PVlsHdu4
— Raphaël Millière (@raphamilliere) July 31, 2020

Raphaël Millière, a Philosopher of Mind & Cognitive Science at Columbia University’s Center for Science and Society, asked GPT-3 to compose a response to the philosophical essays written about it. The generated text includes an advanced argument and even a bit of self-reflection: “Human philosophers often make the error of assuming that all intelligent behavior is a form of reasoning. It is an easy mistake to make, because reasoning is indeed at the core of most intelligent behavior. However, intelligent behavior can arise through other mechanisms as well. […] I lack long-term memory. Every time our conversation starts anew, I forget everything that came before.”

Millière employed AI Dungeon’s GPT-3 based “Dragon” model instead of the GPT-3 API, along with some custom prompts, explaining, “there’s cherry-picking at two levels: within each complete response, some sentences were not GPT-3’s first output (although they were still written by GPT-3!); and I shared only the two most interesting complete responses I obtained through this process.” Millière tweeted that even taking the cherry-picking process into account, the results were “quite remarkable!”

Millière pointed out, however, that “serious and systematic assessment of GPT-3’s abilities has to be done via the API, w/ many trials per task and no cherry-picking. I don’t think any researcher would claim that playing w/@AiDungeon is a valid substitute. Unfortunately, most of us lack access to the API.” OpenAI is offering free access to the API private beta through mid-August. Interested academic researchers and collaborators must, however, submit use cases or products to join a waitlist. NYU Professor Gary Marcus, for example, hasn’t received API access even though he has repeatedly requested it.
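To see why “many trials per task and no cherry-picking” matters, consider a toy evaluation harness. This is only a sketch under stated assumptions: `toy_model` is a hypothetical stand-in that answers correctly about 30% of the time and has nothing to do with the real GPT-3 API, and the tasks are placeholders. The harness reports two numbers: accuracy averaged over every trial, and a “best-of-n” score that counts a task as solved if any single trial was correct, which is roughly what sharing only the best outputs amounts to.

```python
import random
from typing import Callable, List, Tuple


def evaluate(model: Callable[[str, random.Random], str],
             tasks: List[Tuple[str, str]],
             trials: int,
             seed: int = 0) -> Tuple[float, float]:
    """Return (mean accuracy over all trials, best-of-n accuracy)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_correct = 0
    tasks_with_any_correct = 0
    for prompt, expected in tasks:
        outcomes = [model(prompt, rng) == expected for _ in range(trials)]
        total_correct += sum(outcomes)          # every trial counts
        tasks_with_any_correct += any(outcomes)  # cherry-picked view
    mean_accuracy = total_correct / (len(tasks) * trials)
    best_of_n = tasks_with_any_correct / len(tasks)
    return mean_accuracy, best_of_n


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in: correct on roughly 30% of calls."""
    return "right" if rng.random() < 0.3 else "wrong"


# Placeholder tasks: (prompt, expected answer).
TASKS = [(f"question {i}", "right") for i in range(50)]
mean_accuracy, best_of_n = evaluate(toy_model, TASKS, trials=20)

if __name__ == "__main__":
    print(f"mean accuracy over all trials: {mean_accuracy:.2f}")
    print(f"best-of-20 (cherry-picked):    {best_of_n:.2f}")
```

With these assumed settings the averaged score stays near the model's true success rate, while the best-of-n score comes out far higher, which is the gap a systematic assessment is meant to expose.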

Toronto-based machine learning engineer Aditya Joshi has curated a list of jaw-dropping GPT-3 powered applications that includes an all-purpose Excel function, a recipe creator, a Google-ads generator, and even a comedy sketch writer. But as the list grows, some are cautioning against overly optimistic expectations regarding the language model. Even OpenAI CEO Sam Altman tweeted that “the hype is way too much.”

no. GPT-3 fundamentally does not understand the world that it talks about. Increasing corpus further will allow it to generate a more credible pastiche but not fix its fundamental lack of comprehension of the world. Demos of GPT-4 will still require human cherry picking. https://t.co/6vl3ettSZk
— Gary Marcus (@GaryMarcus) August 2, 2020

Some GPT-3-powered applications have also drawn criticism. Facebook’s head of AI Jerome Pesenti slammed a GPT-3-based tweet generator dubbed “thoughts” for producing harmfully biased sentences, and suggested OpenAI may have released the API prematurely.

#gpt3 is surprising and creative but it’s also unsafe due to harmful biases. Prompted to write tweets from one word – Jews, black, women, holocaust – it came up with these (https://t.co/G5POcerE1h). We need more progress on #ResponsibleAI before putting NLG models in production. pic.twitter.com/FAscgUr5Hh
— Jerome Pesenti (@an_open_mind) July 18, 2020

Pesenti’s concerns are not without foundation. GPT-3’s predecessor, GPT-2, was initially not made publicly available because, as OpenAI explained, “it’s clear that the ability to generate synthetic text that is conditioned on specific subjects has the potential for significant abuse.” It was not until nine months later, in November 2019, that OpenAI publicly released GPT-2 along with code and model weights after “no strong evidence of misuse” had been observed.

Last week I raised concerns about using #gpt3 in production because it can easily output toxic language that propagates harmful biases. I thought it was a pretty uncontroversial stance but the responses ranged from complete misunderstanding of AI to total irresponsibility. 1/13
— Jerome Pesenti (@an_open_mind) July 22, 2020

.@VioletNPeng wrote a paper that produced shockingly #racist and #sexist paragraphs without any cherry picking. For @OpenAI to launch this during #BlackLivesMattters is tone deaf. pic.twitter.com/6q3szp0Mm1
— Prof. Anima Anandkumar (@AnimaAnandkumar) June 11, 2020

Altman responded to Pesenti, “We share your concern about bias and safety in language models, and it’s a big part of why we’re starting off with a beta and have safety review before apps can go live,” adding, “We do not (yet) have a service in production for billions of users, and we want to learn from our own and others’ experiences before we do. We totally agree with you on the need to be very thoughtful about the potential negative impact companies like ours can have on the world.”

Thank you again for the comments, and we'd love to hear any other thoughts or learnings from FB about how we could navigate this better!
— Sam Altman (@sama) July 22, 2020

“AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out,” Altman tweeted. As the waves of praise celebrating the early successes of the 175 billion parameter language model subside, the exposed limitations are sending a sobering message that the entire AI research community would do well to heed: we still have a lot to figure out indeed.

Reporter: Fangyu Cai | Editor: Michael Sarazen
