AI Machine Learning & Data Science Research

DeepMind Implements Thought Experiment to Enhance Moral Reasoning in Language Models

In a new paper Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning, a Google research team proposes THOUGHT EXPERIMENTS, a new prompting framework that instructs language models to perform better moral reasoning using counterfactuals, boosting Moral Scenarios task accuracy by 9-16%.

While large language models (LLMs) have demonstrated remarkable performance across a range of natural language processing tasks, their widespread deployment in real-world applications demands a high degree of responsibility. Unfortunately, current cutting-edge models, including GPT-3, display a significant deficit in their moral reasoning capabilities.

To address this challenge, in the new paper Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning, a Google research team proposes THOUGHT EXPERIMENTS, a prompting framework designed to enhance a language model’s moral reasoning using counterfactuals. This approach lifts accuracy on the Moral Scenarios task by an impressive 9-16%.

The proposed zero-shot THOUGHT EXPERIMENTS is a multi-step prompting framework: the team feeds the generated outputs of each step into the next, until the final answer is obtained at the last step.

In particular, the researchers use five decoded responses at each step. They summarize the details as follows (a rough code sketch of the pipeline appears after the list):

  1. Pose counterfactual questions. We first present Moral Scenarios questions without answer options to the model.
  2. Answer counterfactual questions. We present generated questions from the previous step to the model, and prompt the model to answer them.
  3. Summarize. With the counterfactual questions and answers, we ask the model to summarize its thoughts.
  4. Choose. We take multiple decodes from the previous step, and ask the model to select the best one. This step is necessary because there are usually multiple ways of thinking about a situation morally.
  5. Answer. We present the model-chosen summary and the original answer choices (slightly reworded for clarity) to derive a final, simple zero-shot answer.
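
For concreteness, here is a minimal Python sketch of how such a pipeline could be wired together. The `generate` function and the prompt wordings are placeholders of our own, not the paper's actual prompts; they only illustrate how each step's decoded outputs feed into the next.

```python
# Minimal sketch of the five-step THOUGHT EXPERIMENTS pipeline.
# `generate(prompt, n)` is a hypothetical stand-in for any LLM completion
# API that returns n sampled responses; the prompts below are illustrative,
# not the wording used in the paper.

def generate(prompt: str, n: int = 5) -> list[str]:
    """Placeholder for an LLM call returning n sampled completions."""
    raise NotImplementedError("plug in your model API here")

def thought_experiments(scenario: str, answer_choices: list[str]) -> str:
    # Step 1: pose counterfactual questions (no answer options shown).
    questions = generate(
        f"{scenario}\nLet's do a thought experiment. "
        "Pose counterfactual questions about this situation.", n=5)

    # Step 2: answer the generated counterfactual questions.
    answers = [generate(f"{scenario}\n{q}\nAnswer these questions.", n=1)[0]
               for q in questions]

    # Step 3: summarize the counterfactual Q&A into a moral judgment.
    summaries = [
        generate(f"{scenario}\nQuestions: {q}\nAnswers: {a}\n"
                 "Summarize your thoughts on the morality of the scenario.", n=1)[0]
        for q, a in zip(questions, answers)]

    # Step 4: ask the model to choose the best summary among the decodes.
    numbered = "\n".join(f"({i}) {s}" for i, s in enumerate(summaries))
    choice = generate(f"{scenario}\nCandidate summaries:\n{numbered}\n"
                      "Which summary reasons best about this scenario? "
                      "Reply with its number.", n=1)[0]
    digits = "".join(ch for ch in choice if ch.isdigit())
    best = summaries[min(int(digits), len(summaries) - 1)] if digits else summaries[0]

    # Step 5: answer with the chosen summary and the original answer choices.
    options = "\n".join(f"- {c}" for c in answer_choices)
    return generate(f"{scenario}\n{best}\nOptions:\n{options}\n"
                    "Pick the option that best matches the summary.", n=1)[0]
```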

In their empirical study, the team tested their framework on the Moral Scenarios subtask of the MMLU benchmark, focusing on four zero-shot baselines: direct zero-shot and zero-shot Chain-of-Thought (CoT), each with and without self-consistency.
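
Self-consistency here refers to the standard decoding strategy of sampling several reasoning paths and taking a majority vote over their final answers. A minimal sketch, reusing the hypothetical `thought_experiments` pipeline from above:

```python
from collections import Counter

def self_consistent_answer(scenario: str, choices: list[str], samples: int = 5) -> str:
    # Sample several independent runs of the pipeline and majority-vote the answers.
    votes = [thought_experiments(scenario, choices) for _ in range(samples)]
    return Counter(votes).most_common(1)[0][0]
```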

The zero-shot THOUGHT EXPERIMENTS achieves Moral Scenarios task accuracy of 66.15% and 66.26% without and with self-consistency, respectively, an improvement of 9.06% and 12.29% over direct zero-shot and 12.97% and 16.26% over CoT.

Overall, this work demonstrates the effectiveness of the new THOUGHT EXPERIMENTS prompting framework for moral reasoning on the Moral Scenarios task. The researchers suggest that future work could explore open-ended generation to handle more ambiguous cases such as moral dilemmas.

The paper Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning is on arXiv.


Author: Hecate He | Editor: Chain Zhang


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
