Meta AI & Columbia U ‘Squeeze the Juice’ to Turn Bad Responses into Good Labels and Boost Dialogue Model Performance

There is growing interest in the machine learning community in how best to use human feedback to improve the responses and performance of chatbots and other dialogue models. Human feedback in the wild is sparse, however, and typically comes as up/down votes, free-form textual comments, and “gold” corrections, none of which are always reliable or explicit.

In the new paper When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels, a research team from Meta AI and Columbia University proposes JUICER, a framework that effectively utilizes binary and textual human feedback to improve the conversational responses of dialogue models.

The team summarizes their main contributions as follows:

We show that free-form textual feedback is a very useful signal for improving the performance of both a satisfaction classifier to identify good and bad responses and a reply corrector to generate better corrections.
Augmenting training data with reply-corrector-generated corrections works better than only training with existing gold corrections.
Models such as Director (Arora et al., 2022) that utilize both gold/predicted good and bad responses further improve the final dialogue model. Our final best models outperform the baseline BlenderBot 2 model or Director alone.

The proposed JUICER framework consists of three modules: a satisfaction classifier, a reply corrector, and the dialogue model itself. The satisfaction classifier is trained to tell good responses from bad ones and predicts binary satisfaction labels for all unannotated bot responses. The reply corrector then converts the bad responses into good ones. Finally, the dialogue model is re-trained on the refined feedback and the predictions from the previous two steps.
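The three-step flow described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Turn` record, the `juicer_style_augment` function, and the toy classifier and corrector are all hypothetical stand-ins for what would be trained models in the real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    """One logged bot reply (hypothetical record layout)."""
    context: str                      # dialogue history up to the bot reply
    reply: str                        # the bot's reply
    satisfied: Optional[bool] = None  # binary human label, None if unannotated
    feedback: Optional[str] = None    # free-form textual feedback, if any


def juicer_style_augment(
    turns: List[Turn],
    classify: Callable[[Turn], bool],
    correct: Callable[[Turn], str],
) -> List[dict]:
    """Build re-training examples from logged turns (illustrative only)."""
    examples = []
    for turn in turns:
        # Step 1: keep human labels, predict labels for unannotated replies.
        is_good = turn.satisfied if turn.satisfied is not None else classify(turn)
        if is_good:
            examples.append({"context": turn.context, "target": turn.reply, "label": "good"})
        else:
            # Step 2: keep the bad reply as a negative and add a corrected positive.
            examples.append({"context": turn.context, "target": turn.reply, "label": "bad"})
            examples.append({"context": turn.context, "target": correct(turn), "label": "good"})
    # Step 3 (not shown): re-train the dialogue model on these examples.
    return examples


# Toy stand-ins so the sketch runs; real modules would be trained models.
def toy_classifier(turn: Turn) -> bool:
    return "don't know" not in turn.reply.lower()


def toy_corrector(turn: Turn) -> str:
    return f"[corrected using feedback: {turn.feedback or 'none'}]"


logged = [
    Turn("Who wrote Dune?", "Frank Herbert wrote Dune.", satisfied=True),
    Turn("Capital of Peru?", "I don't know.", feedback="It is Lima."),
]
examples = juicer_style_augment(logged, toy_classifier, toy_corrector)
```

Keeping the bad reply as a labeled negative next to its correction reflects the article's point that models such as Director can use both good and bad responses; a plain fine-tuning setup would keep only the "good" examples.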

In their empirical study, the team applied their approach to the baseline 3B-parameter BlenderBot 2 (BB2 3B) dialogue model on the FITS and DEMO datasets. In the experiments, JUICER raised the F1 score from 15.3 to 18.5 on an unseen test set and increased the share of good responses from 33.2 to 41.9 percent in human evaluations, confirming its ability to leverage both good and bad human feedback to improve the overall performance of dialogue models.
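To put the reported numbers in perspective, the short calculation below derives the absolute and relative gains from the figures quoted above. The `gain` helper is a hypothetical convenience function, and the relative percentages are simple arithmetic on the article's numbers rather than values reported in the paper.

```python
def gain(before: float, after: float) -> tuple:
    """Return (absolute gain, relative gain in percent), rounded to one decimal."""
    absolute = after - before
    relative = absolute / before * 100
    return round(absolute, 1), round(relative, 1)


# Figures quoted in the article.
f1_abs, f1_rel = gain(15.3, 18.5)        # F1 on the unseen test set
good_abs, good_rel = gain(33.2, 41.9)    # share of good responses, in percent

print(f"F1: +{f1_abs} points ({f1_rel}% relative)")
print(f"Good responses: +{good_abs} points ({good_rel}% relative)")
```

Under this arithmetic the F1 gain is 3.2 points (about 20.9 percent relative) and the gain in good responses is 8.7 percentage points (about 26.2 percent relative).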

The paper When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels is on arXiv.

Author: Hecate He | Editor: Michael Sarazen


