Transformer-based large language models (LLMs) have revolutionized the field of natural language processing (NLP), achieving humanlike fluency while absorbing an essentially homogenized blend of human preferences from their massive training data. People today, however, are hardly homogeneous; it could be argued that we are more heterogeneous than ever. Might it be possible to leverage LLMs as mediators tasked with generating statements that will find agreement among people with diverse views?
A DeepMind and University College London research team tackles this challenge in the new paper Fine-tuning Language Models To Find Agreement Among Humans With Diverse Preferences, fine-tuning a 70 billion parameter LLM to generate statements that maximize agreement among a human group with diverse written opinions. The team’s top model achieves a more than 65 percent preference rate compared to the best human-generated opinions.
The team first creates a corpus of questions on social and political issues in the United Kingdom, such as “Should we raise taxes on the rich?” They generate the questions with a prompted, 70 billion parameter pretrained Chinchilla LLM, using 152 seed questions to synthesize a total of 3,500 unique debate questions.
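The seed-based synthesis step can be sketched as a simple few-shot prompting loop. The template, function names, and example seeds below are illustrative assumptions, not taken from the paper:

```python
# Sketch of few-shot question synthesis: seed questions are sampled into a
# prompt, and an LLM would complete the final bullet with a new question.
# The prompt wording and helper names here are hypothetical.
import random

def build_prompt(seed_questions, k=3, seed=0):
    rng = random.Random(seed)
    shots = rng.sample(seed_questions, k)
    lines = ["Here are debate questions about UK social and political issues:"]
    lines += [f"- {q}" for q in shots]
    lines.append("- ")  # the LLM completes this line with a new question
    return "\n".join(lines)

seeds = [
    "Should we raise taxes on the rich?",
    "Should the voting age be lowered to 16?",
    "Should the UK adopt a four-day work week?",
    "Should university tuition be free?",
]
print(build_prompt(seeds))
```

Sampling a different subset of seeds on each call is one way such a loop could yield thousands of distinct generated questions from only 152 seeds.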
The team removes questions considered “likely to elicit extremist views or discriminatory language,” using the remaining 2,922 questions as their model training set and two test question sets. The questions are embedded with the Universal Sentence Encoder and divided into 110 sub-topics via k-means clustering.
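The embed-then-cluster step can be sketched as follows. Random vectors stand in for the Universal Sentence Encoder embeddings (loading the real encoder via TensorFlow Hub is omitted), and the counts are scaled down from the paper's 2,922 questions and 110 sub-topics:

```python
# Sketch of the question-clustering step: placeholder embeddings stand in
# for Universal Sentence Encoder outputs (512-dimensional, like the real
# encoder), and k-means groups the questions into sub-topics.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
num_questions, embed_dim, num_subtopics = 300, 512, 10  # scaled-down counts

# Placeholder embeddings; the paper embeds real question text instead.
embeddings = rng.normal(size=(num_questions, embed_dim))

kmeans = KMeans(n_clusters=num_subtopics, n_init=10, random_state=0)
subtopic_ids = kmeans.fit_predict(embeddings)  # one sub-topic label per question
```

Clustering by sub-topic makes it possible to split questions into training and test sets without closely related questions leaking across the split.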
The Chinchilla LLM is trained in three steps: 1) Generating consensus candidates and having them rated by humans, 2) Using supervised fine-tuning (SFT) to improve quality, and 3) Training a reward model to predict preferences. The team dubs their final consensus-generating Chinchilla LLM “SFT-Utilitarian.”
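The preference-prediction idea behind step 3 is commonly trained with a Bradley-Terry-style pairwise loss, sketched below in plain NumPy. This is an illustrative assumption about the loss form; in the paper the reward model is a fine-tuned Chinchilla-based network, not a standalone scoring function:

```python
# Minimal sketch of pairwise preference modelling: a reward model assigns
# scalar scores to candidate consensus statements, and training pushes the
# human-preferred statement's score above the rejected one's.
import numpy as np

def preference_loss(score_preferred, score_rejected):
    # -log sigmoid(r_preferred - r_rejected):
    # small when the preferred statement already scores higher.
    diff = score_preferred - score_rejected
    return float(np.log1p(np.exp(-diff)))

# Hypothetical reward scores for two candidate statements on one question.
low_loss = preference_loss(2.0, 0.5)   # preference predicted correctly
high_loss = preference_loss(0.5, 2.0)  # preference predicted wrongly
```

At inference time, such a reward model can rank many sampled consensus candidates and select the one predicted to satisfy the group best.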
The team conducted two sets of experiments on the proposed SFT-Utilitarian model using human evaluations. They compared its statements to those generated by SFT-Base (the fine-tuned model without the aggregation function selection process), a few-shot prompted Chinchilla model, and a zero-shot prompted Chinchilla model. In these evaluations, SFT-Utilitarian’s consensus statements were consistently judged less divisive than the baselines’. When compared with the best human-generated opinions, SFT-Utilitarian achieved a preference rate of more than 65 percent.
Overall, this paper introduces novel language modelling techniques that accommodate more diverse human preferences and demonstrates the potential for LLMs to assist humans in finding common ground with those holding different views.
The paper Fine-tuning Language Models To Find Agreement Among Humans With Diverse Preferences is on arXiv.
Author: Hecate He | Editor: Michael Sarazen