OpenAI’s GPT-3 Paper Shares NeurIPS 2020 Best Paper Award With Works from Politecnico di Milano, CMU and UC Berkeley

2020-12-07

OpenAI’s groundbreaking GPT-3 language model paper, a no-regret learning dynamics study from Politecnico di Milano & Carnegie Mellon University, and a UC Berkeley work on data summarization have been named the NeurIPS 2020 Best Paper Award winners. The organizing committee made the announcements this morning, along with their Test of Time Award, to kick off the thirty-fourth Conference on Neural Information Processing Systems.

More than 18,000 participants are anticipated at this year’s virtual gathering. In a blog post, NeurIPS 2020 organizers say they have endeavoured to ensure the virtual event is as accessible as possible for attendees in different time zones and with varied Internet speed and access.

The organizers designed a schedule with two six-hour sessions per day: the first starts at 5am PT and the second at 5pm PT. Paper authors could choose a session to make their presentations compatible with their preferred time zone. The organizers have also enabled users to choose their preferred bandwidth.

Best Paper Award Winners

Language Models are Few-Shot Learners
(NeurIPS link)
Authors: Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Institution: OpenAI

Abstract: We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.

Reasons given by the awards committee: Artificial intelligence systems trained to estimate the likelihood of the next word in a sequence are known as “language models”. Language models were first described in the 1950s as a theoretical construct for connecting the then-new field of information theory with natural language. This paper describes GPT-3, the largest and most sophisticated language model ever constructed. It demonstrates that, if you make a language model accurate enough by using unprecedented amounts of compute and data, it gains the ability to solve a wide variety of tasks without additional training, using only simple, natural language prompts. Example tasks include answering trivia questions, generating essays, determining if a movie review is positive or negative, and translating between French and English. The authors note that GPT-3 is better at some tasks than others, and devote most of the paper to carefully cataloging its strengths and weaknesses. The authors also consider potentially harmful implications of the technology, such as cheap generation of almost undetectable fake news and the model’s tendency to reflect the biases of its training data on sensitive topics such as race, gender, and religion.
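
To make the idea of specifying a task "purely via text interaction" concrete, here is a minimal sketch of how a few-shot prompt for the movie-review example might be assembled. The function name, the "Review:"/"Sentiment:" layout and the sample reviews are illustrative assumptions, not the format used in the paper, and no model is called.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a plain-text prompt: instructions, worked examples, open query.

    The layout is a hypothetical one chosen for illustration only.
    """
    lines = [task_description, ""]
    for review, label in demonstrations:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left unanswered so a language model would complete it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Decide whether each movie review is positive or negative.",
    [
        ("A moving story with superb acting.", "positive"),
        ("Two hours I will never get back.", "negative"),
    ],
    "The plot dragged, but the ending was worth it.",
)
print(prompt)
```

The demonstrations live entirely in the input text, so switching tasks means changing the string rather than retraining any weights.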

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
(NeurIPS link)
Authors: Andrea Celli (Polimi), Alberto Marchesi (Polimi), Gabriele Farina (CMU) and Nicola Gatti (Polimi)
Institutions: Politecnico di Milano and Carnegie Mellon University

Abstract: The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was currently unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

Reasons given by the awards committee: Our decisions impact others and their decisions impact us. To settle on a rational way to behave, we need to cut through this interdependence to reach what economists call an equilibrium. Creating automated procedures for finding equilibria is notoriously difficult. This paper provides the first approach for finding so-called correlated equilibria for general interactions using a learning approach. Correlated equilibria require a trusted external mediator that makes decision recommendations to the decision-makers. The canonical example of a correlated equilibrium is a stoplight. The stoplight tells approaching cars whether it is safe to go. Even in the absence of relevant laws, we should follow the stoplight’s recommendations because we know that everyone can reason that it is in their best interest to do so—driving through the red light is a risky proposition. The paper shows that such equilibria can be arrived at by learning algorithms acting completely independently—no external traffic engineer is needed—even when the decisions involve multiple steps and the decision-makers are partly in the dark about the state of the world. Such an approach could have powerful implications in the modern “gig economy”, where centralized supervision of self-interested actors is the norm.
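
The stoplight example can be checked numerically. The sketch below tests whether a distribution over recommendations is a correlated equilibrium in a simple two-driver, one-shot (normal-form) game. It is not the paper's extensive-form algorithm and involves no learning; the payoff numbers and all names are assumptions chosen for illustration.

```python
from itertools import product

STOP, GO = 0, 1
ACTIONS = (STOP, GO)


def payoff(player, profile):
    """Hypothetical payoffs: a crash if both go, a small gain for crossing alone."""
    me, other = profile[player], profile[1 - player]
    if me == GO and other == GO:
        return -10.0
    if me == GO:
        return 1.0
    return 0.0


def max_deviation_gain(dist):
    """Largest expected gain any driver gets by disobeying a recommendation.

    dist maps action profiles (a0, a1) to probabilities. The distribution is
    a correlated equilibrium when the returned value is <= 0.
    """
    worst = float("-inf")
    for player, rec, dev in product((0, 1), ACTIONS, ACTIONS):
        if dev == rec:
            continue
        gain = 0.0
        for profile, prob in dist.items():
            if profile[player] != rec:
                continue  # only profiles where this recommendation was issued
            deviated = list(profile)
            deviated[player] = dev
            gain += prob * (payoff(player, tuple(deviated)) - payoff(player, profile))
        worst = max(worst, gain)
    return worst


# The light alternates fairly: exactly one driver is told to go.
stoplight = {(GO, STOP): 0.5, (STOP, GO): 0.5}
# A broken light that tells both drivers to go.
always_go = {(GO, GO): 1.0}

print(max_deviation_gain(stoplight))
print(max_deviation_gain(always_go))
```

Under these assumed payoffs, no driver gains by ignoring the fair stoplight, whereas the broken light gives each driver a strong reason to disobey.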

Improved Guarantees and a Multiple-Descent Curve for Column Subset Selection and the Nystrom Method
(NeurIPS link)
Authors: Michał Dereziński, Rajiv Khanna, Michael W. Mahoney
Institution: University of California, Berkeley

Abstract: The Column Subset Selection Problem (CSSP) and the Nystrom method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees which go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the approximation factor as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve. A lower bound we establish shows that this behavior is not an artifact of our analysis, but rather it is an inherent property of the CSSP and Nystrom tasks. Finally, using the example of a radial basis function (RBF) kernel, we show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.

Reasons given by the awards committee: As the availability of large datasets expands, so does society’s dependence on being able to summarize complex data succinctly. Data summarization is the problem of identifying important examples and attributes in data to help characterize it efficiently. It can be used to select a representative subset of gene variants from a genetics dataset or the most informative documents from a text database. Prior work has shown that data summarization is an intractable problem—there are data sets for which no known algorithm can provide a good summary in a reasonable time frame. This paper shows that these analyses are far too pessimistic. The datasets that make the data summarization problem intractable are pathological and, in fact, interpretable summaries can be generated far more cheaply for real-world data. The work suggests that future systems will be able to create data summaries that are accurate, interpretable, and efficiently generated, greatly aiding our ability to absorb and process complex datasets.
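
The quantity at the centre of the abstract, how well k chosen columns compete with the best rank-k approximation, can be computed directly for a small matrix. The sketch below assumes NumPy is available and uses a random matrix and an arbitrary column choice. The function name is hypothetical, and the code only evaluates the approximation factor; it does not implement the paper's selection techniques or reproduce its bounds.

```python
import numpy as np


def cssp_approximation_factor(A, columns):
    """Squared Frobenius error of projecting A onto the chosen columns,
    divided by the error of the best rank-k approximation (k = len(columns))."""
    k = len(columns)
    C = A[:, columns]
    # Least squares gives the projection of every column of A onto span(C).
    coeffs, *_ = np.linalg.lstsq(C, A, rcond=None)
    subset_err = np.linalg.norm(A - C @ coeffs, "fro") ** 2
    # The best rank-k error is the sum of the squared trailing singular values.
    sing = np.linalg.svd(A, compute_uv=False)
    best_err = float(np.sum(sing[k:] ** 2))
    return subset_err / best_err


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))  # illustrative data, not from the paper
factor = cssp_approximation_factor(A, [0, 1, 2])
print(factor)
```

The factor can never fall below 1, because projecting onto k columns yields an approximation of rank at most k. The paper studies how large it can become as k varies.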

Test of Time Award Winner

The Test of Time Award is presented to a paper from 10 years ago that has had a particularly significant and lasting impact on the AI community.

Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
(NeurIPS link)
Authors: Benjamin Recht, Christopher Re, Stephen Wright, Feng Niu
Institution: University of Wisconsin-Madison

Abstract: Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other’s work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.

Reasons given by the awards committee: Machine learning is the problem of turning exemplar data into a model, stored in a computer, that can be used to make decisions or take actions. At the core of modern machine-learning systems is the stochastic gradient method – usually known as “SGD” – which searches the space of possible models to find one that matches up well with the exemplar data. This paper described an implementation of SGD that can be run in parallel across a collection of fast computers, all of them making repeated small changes to the model without any coordination or synchronization. This approach, which the authors dubbed Hogwild!, outperformed alternative parallelization schemes that required synchronization. The paper also presented a theoretical analysis of Hogwild!’s convergence rate, showing that linear speedup in the number of processors could be attained (to within a constant factor) even when a large number of processors were used. The paper has been cited almost 2000 times, attesting to its influence not only on machine learning but also on the fields of computer systems and optimization, both of which contributed to the development and understanding of the Hogwild! approach.
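
The lock-free update pattern can be sketched in a few lines. The toy below is not the authors' implementation: the objective, function name and parameters are assumptions. Because CPython's global interpreter lock largely serializes threads, it illustrates only the unsynchronized access to shared parameters, not the speedup reported in the paper.

```python
import random
import threading


def hogwild_sgd(targets, n_threads=4, steps_per_thread=5000, lr=0.1, seed=0):
    """Minimise sum_j (x[j] - targets[j])**2 with unsynchronised workers.

    Each stochastic gradient touches a single coordinate, so the toy problem
    is sparse in the sense the abstract describes.
    """
    x = [0.0] * len(targets)  # shared parameter vector; no lock protects it

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        for _ in range(steps_per_thread):
            j = rng.randrange(len(targets))
            grad = 2.0 * (x[j] - targets[j])
            # Racy read-modify-write: another worker may overwrite this value.
            x[j] -= lr * grad

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x


targets = [1.0, -2.0, 3.5, 0.0]
solution = hogwild_sgd(targets)
print(solution)
```

Workers may occasionally overwrite each other's progress, yet every write still moves a coordinate toward its target, so the shared vector settles near the minimizer without any coordination.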

NeurIPS 2020 continues through December 12. With 9,467 submitted papers, this has been another record-breaking year for NeurIPS — with 38 percent more paper submissions than 2019. A total of 1,903 papers were accepted, compared to 1,428 last year.

Over the course of the week, participants can virtually join the Expo, where top industry sponsors will provide talks, panels, and demos of academic interest. Tutorials will cover current lines of inquiry while general sessions will include talks, posters, and demonstrations. A full agenda can be found by visiting the NeurIPS conference schedule page.

Reporter: Yuan Yuan | Editor: Michael Sarazen

