OpenAI’s 175 billion parameter language model GPT-3 has gone viral once again, with a flurry of tech tweets celebrating the many innovative new applications — ranging from automatic code and short story generators to fully functioning search engines — that have leveraged the GPT-3 API OpenAI released in June. But not everyone in the ML community is impressed.
OpenAI’s first GPT (Generative Pre-Training) model was introduced in June 2018. The then-novel idea was to take advantage of the huge supply of unlabelled text corpora and the Transformer deep learning architecture to train a powerful general language model. In February 2019, the San Francisco-based AI company rolled out a much larger GPT-2 model with key technical updates such as pre-activation, zero domain transfer, and zero task transfer. With 1.5 billion parameters, GPT-2 was 12 times larger than the initial GPT. OpenAI then unveiled the third version, GPT-3, which scaled up the model architecture, data and compute, in its May research paper Language Models are Few-Shot Learners.
GPT-3 delivered SOTA performance across a variety of NLP tasks and benchmarks in zero-shot, one-shot, and few-shot settings. For example, when fed the prompt: “Close your eyes and, with detail, describe the sounds and smells around you right now. Create a picture that I can clearly see in my mind,” a GPT-3-powered writing assistant developed by ShortlyRead generated the following dark tale, which reads like the product of a creative writing class:
(This place seemed much smaller when she’d first walked in. It was probably the concrete walls – too bare, too harsh, like a cell. They always made her want to burrow into the corners.)
It was getting cold. She shivered. The sounds continued. More rapid now. She coughed.
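For readers unfamiliar with the zero-shot, one-shot, and few-shot settings mentioned above: the three differ only in how many solved examples are placed in the prompt ahead of the actual query, and the model's weights are not updated in any of them. The Python sketch below illustrates the idea under stated assumptions. The helper `build_prompt`, the `=>` separator, and the translation task are hypothetical choices made for illustration, not part of OpenAI's API, and no model is actually called.

```python
# Illustrative sketch only: build_prompt and the "=>" layout are assumptions,
# not an OpenAI API. The code just assembles prompt strings.

def build_prompt(task, examples, query):
    """Return a prompt: task description, k solved examples, then the open query."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The final line is left unfinished so a language model would complete it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


TASK = "Translate English to French:"
DEMOS = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]

zero_shot = build_prompt(TASK, [], "cheese")        # no solved examples
one_shot = build_prompt(TASK, DEMOS[:1], "cheese")  # exactly one solved example
few_shot = build_prompt(TASK, DEMOS, "cheese")      # a handful of solved examples

print(few_shot)
```

In each case the resulting string would be sent to the model as ordinary text, and the model's continuation of the final line is taken as its answer.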
Raphaël Millière, a Philosopher of Mind & Cognitive Science at Columbia University’s Center for Science and Society, asked GPT-3 to compose a response to the philosophical essays written about it. The generated text includes an advanced argument and even a bit of self-reflection: “Human philosophers often make the error of assuming that all intelligent behavior is a form of reasoning. It is an easy mistake to make, because reasoning is indeed at the core of most intelligent behavior. However, intelligent behavior can arise through other mechanisms as well. […] I lack long-term memory. Every time our conversation starts anew, I forget everything that came before.”
Millière employed AI Dungeon’s GPT-3 based “Dragon” model instead of the GPT-3 API, along with some custom prompts, explaining, “there’s cherry-picking at two levels: within each complete response, some sentences were not GPT-3’s first output (although they were still written by GPT-3!); and I shared only the two most interesting complete responses I obtained through this process.” Millière tweeted that even taking the cherry-picking process into account, the results were “quite remarkable!”
Millière pointed out, however, that “serious and systematic assessment of GPT-3’s abilities has to be done via the API, w/ many trials per task and no cherry-picking. I don’t think any researcher would claim that playing w/@AiDungeon is a valid substitute. Unfortunately, most of us lack access to the API.” OpenAI is offering free access to the API private beta through mid-August, but interested academic researchers and collaborators must submit use cases or products to join a waitlist. NYU Professor Gary Marcus, for example, has not received API access despite repeated requests.
Toronto-based machine learning engineer Aditya Joshi has curated a list of jaw-dropping GPT-3 powered applications that includes an all-purpose Excel function, a recipe creator, a Google-ads generator, and even a comedy sketch writer. But as the list grows, some are cautioning against overly optimistic expectations regarding the language model. Even OpenAI CEO Sam Altman tweeted that “the hype is way too much.”
Some GPT-3-powered applications have also drawn criticism. Facebook’s head of AI Jerome Pesenti slammed a GPT-3-based tweet generator dubbed “thoughts” for producing harmfully biased sentences, and suggested OpenAI may have released the API prematurely.
Pesenti’s concerns are not without foundation. GPT-3’s predecessor, GPT-2, was initially not made publicly available because, as OpenAI explained, “it’s clear that the ability to generate synthetic text that is conditioned on specific subjects has the potential for significant abuse.” It was not until nine months later, in November 2019, that OpenAI publicly released GPT-2 along with code and model weights after “no strong evidence of misuse” had been observed.
Altman responded to Pesenti, “We share your concern about bias and safety in language models, and it’s a big part of why we’re starting off with a beta and have safety review before apps can go live,” adding, “We do not (yet) have a service in production for billions of users, and we want to learn from our own and others’ experiences before we do. We totally agree with you on the need to be very thoughtful about the potential negative impact companies like ours can have on the world.”
“AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out,” Altman tweeted. As the waves of praise celebrating the early successes of the 175 billion parameter language model subside, the exposed limitations are sending a sobering message that the entire AI research community would do well to heed: we still have a lot to figure out indeed.
Reporter: Fangyu Cai | Editor: Michael Sarazen