Large language models (LLMs) continue to make progress in generating contextually and semantically meaningful text, but they still risk producing incoherent, false, unreliable, or even toxic outputs. To address this issue, there is increasing interest in teaching LLMs to refine their own generated outputs.
In the new paper Shepherd: A Critic for Language Model Generation, a Meta AI research team presents Shepherd, a language model explicitly tuned to critique model-generated outputs and to generate feedback suggesting improvements, targeting factuality, logical errors, coherence, and alignment issues.

The team starts by gathering feedback from two communities: Stack Exchange and the Pushshift Reddit Dataset. They restructure the data into question-answer-critique triads, treating a post’s title and sub-title as the question, the corresponding top-level comments as answers, and the replies to those comments as critiques.
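As a rough illustration of this restructuring (the class and field names here are our own, not the paper’s actual data schema), extracting triads from a threaded post might look like this:

```python
from dataclasses import dataclass

@dataclass
class CritiqueTriad:
    """One question-answer-critique training example."""
    question: str   # post title + sub-title
    answer: str     # a top-level comment on the post
    critique: str   # a reply to that top-level comment

def extract_triads(post):
    """Flatten a threaded post into question-answer-critique triads.

    `post` is assumed to expose `.title`, `.subtitle`, and `.comments`,
    and each comment its `.text` and `.replies` -- a hypothetical
    schema for illustration only.
    """
    question = f"{post.title}\n{post.subtitle}"
    triads = []
    for comment in post.comments:          # top-level comments = answers
        for reply in comment.replies:      # replies = candidate critiques
            triads.append(CritiqueTriad(question, comment.text, reply.text))
    return triads
```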
To curate valid critiques, the researchers employ several techniques: 1) keyword filtering to keep critiques that either confirm the answer is largely accurate or point out inaccuracies in it; 2) user edit history to identify cases where the critique led to a refinement of the original answer; 3) additional filters based on community vote scores to further refine the data; 4) keeping only the highest-scoring critique per answer to maintain diversity; 5) a profanity check and the removal of low-score comments to manage offensive language; 6) filtering out URLs, images, and videos to keep the data text-only; 7) identifying and removing comments that break the Q&A format.
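A minimal sketch of a few of these filters is shown below; the keyword lists, score threshold, and function name are our own assumptions, not the paper’s actual implementation:

```python
import re

# Hypothetical keyword lists -- illustrative only, not the paper's actual lists.
CONFIRM_KEYWORDS = ("correct", "accurate", "well said")
ERROR_KEYWORDS = ("wrong", "incorrect", "mistake", "inaccurate")
URL_PATTERN = re.compile(r"https?://\S+")

def keep_critique(critique_text: str, score: int, min_score: int = 2) -> bool:
    """Apply simplified versions of filters 1), 3), and 6) above."""
    text = critique_text.lower()
    # 1) keyword filtering: keep critiques that either confirm the answer
    #    is largely accurate or point out inaccuracies
    if not any(k in text for k in CONFIRM_KEYWORDS + ERROR_KEYWORDS):
        return False
    # 3) community vote score filter (threshold is a guess)
    if score < min_score:
        return False
    # 6) drop critiques containing URLs (images/videos handled similarly)
    if URL_PATTERN.search(critique_text):
        return False
    return True
```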

To ensure high-quality data, the researchers further conduct several postprocessing steps: 1) removing examples with red flags; 2) removing feedback on the error types “Redundancy” and “Consistency with context”; 3) concatenating the feedback for different error types into a single paragraph using natural phrasing, so that examples covering multiple error types read naturally. As a result, they obtain 1,317 high-quality examples in total.
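A simplified sketch of the concatenation step in 3) might look as follows; the connective phrasing is our own guess, not the wording the authors actually used:

```python
def concatenate_feedback(feedback_by_type: dict[str, str]) -> str:
    """Join per-error-type feedback into one natural-language paragraph.

    Drops the two excluded error types and stitches the rest together
    with simple connective phrases (hypothetical wording).
    """
    excluded = {"Redundancy", "Consistency with context"}
    parts = []
    for error_type, feedback in feedback_by_type.items():
        if error_type in excluded:
            continue
        parts.append(f"Regarding {error_type.lower()}: {feedback}")
    return " ".join(parts)

# Example usage
print(concatenate_feedback({
    "Factuality": "The date cited for the treaty is off by a decade.",
    "Redundancy": "The second paragraph repeats the first.",
}))
```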
The team selected LLaMA-7B (Touvron et al., 2023) as the base model for Shepherd. Given a question and the corresponding answer generated by a large language model, Shepherd is trained to critique the answer by identifying errors and providing constructive feedback.
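The paper’s exact prompt template is not reproduced here, so the following serialization of a triad into a fine-tuning string is only a plausible sketch; the “### …” delimiters are a common instruction-tuning convention, not necessarily Shepherd’s actual format:

```python
def format_training_example(question: str, answer: str, critique: str) -> str:
    """Serialize one question-answer-critique triad for fine-tuning.

    The section headers below are hypothetical; during fine-tuning the
    loss would typically be computed only on the feedback tokens.
    """
    return (
        f"### Question:\n{question}\n\n"
        f"### Answer:\n{answer}\n\n"
        f"### Feedback:\n{critique}"
    )
```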

In their empirical study, the team compared Shepherd with several state-of-the-art language models, including Alpaca-7B (Taori et al., 2023), SelFee-7B (Ye et al., 2023), and ChatGPT (GPT-3.5 Turbo). Shepherd outperforms Alpaca and SelFee by a large margin and matches the performance of ChatGPT. The team believes Shepherd will help improve generation quality and reduce hallucinations.
The paper Shepherd: A Critic for Language Model Generation is on arXiv.
Author: Hecate He | Editor: Chain Zhang
