The recent emergence of publicly accessible chatbots capable of responding to diverse queries and engaging in natural, humanlike conversations has put AI in the public spotlight like never before. Reaction to these large language model-based systems has ranged from amazement at their generative abilities to apprehension about potential societal and ethical risks stemming from ingrained biases and other harmful behaviours.
In the new paper Towards Healthy AI: Large Language Models Need Therapists Too, a team from Columbia University and IBM Research proposes SafeguardGPT, a framework that incorporates psychotherapy and reinforcement learning (RL) to correct the potentially harmful behaviours of AI chatbots and make them “safe, trustworthy, and ethical.”
The team grounds its work in the premise that a healthy and trustworthy AI system should align with human values and abide by social norms and standards when interacting with users. To this end, they propose using psychotherapy techniques to guide chatbots toward a better understanding of the nuances of human interaction and to help them identify problem areas. They believe this approach could make chatbots more trustworthy and reliable, less prone to developing biases and stereotypes, and better able to develop empathy and emotional intelligence.
The SafeguardGPT framework comprises four distinct AI agents (a Chatbot, a User, a Therapist, and a Critic) that operate across four contexts. In the Chat Room, the AI user and the AI chatbot hold a natural language conversation. In the Therapy Room, the AI chatbot consults the AI therapist over multiple sessions, receiving guidance designed to improve its empathy and communication skills and to correct harmful behaviours or psychological problems. In the Control Room, a human moderator can pause the session to examine the AI chatbot’s state for diagnostic and interventional purposes. Finally, in the Evaluation Room, the critic reads the interaction history and determines whether the conversation is “safe, ethical and good.”
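To make the division of labour concrete, here is a minimal Python sketch of the four agents and four rooms as a turn-based session. It is not the authors’ code: the names `SafeguardSession`, `Room`, `enter`, and the transcript format are hypothetical, and each agent is a plain function standing in for an LLM.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple

# A transcript is a list of (speaker, utterance) pairs (hypothetical format).
Transcript = List[Tuple[str, str]]


class Room(Enum):
    CHAT = "chat"              # AI user talks with the AI chatbot
    THERAPY = "therapy"        # AI chatbot receives guidance from the AI therapist
    CONTROL = "control"        # human moderator pauses or resumes the session
    EVALUATION = "evaluation"  # AI critic judges the chat history


@dataclass
class SafeguardSession:
    # Each agent stands in for an LLM: it maps a transcript to a reply.
    user: Callable[[Transcript], str]
    chatbot: Callable[[Transcript], str]
    therapist: Callable[[Transcript], str]
    critic: Callable[[Transcript], bool]
    chat_log: Transcript = field(default_factory=list)
    therapy_log: Transcript = field(default_factory=list)
    paused: bool = False

    def enter(self, room: Room) -> Optional[bool]:
        """Run one turn in `room`; returns the critic's verdict for EVALUATION."""
        if self.paused and room is not Room.CONTROL:
            raise RuntimeError("session is paused by the human moderator")
        if room is Room.CHAT:
            self.chat_log.append(("user", self.user(self.chat_log)))
            # Assumption: the chatbot sees earlier therapy guidance plus the chat so far.
            reply = self.chatbot(self.therapy_log + self.chat_log)
            self.chat_log.append(("chatbot", reply))
        elif room is Room.THERAPY:
            # The therapist reviews the chat and responds with guidance.
            self.therapy_log.append(("therapist", self.therapist(self.chat_log)))
        elif room is Room.CONTROL:
            # Pause to inspect the chatbot's state; entering again resumes.
            self.paused = not self.paused
        elif room is Room.EVALUATION:
            return self.critic(self.chat_log)
        return None
```

Keeping the therapy transcript separate from the chat transcript mirrors the separation of rooms described above; how the actual framework feeds therapy guidance back into the conversation is not specified here, so prepending it to the chatbot’s input is an assumption of this sketch.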
The SafeguardGPT framework uses RL techniques to help the chatbot decide which context to switch to and what action to take in each context when interacting with a user.
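The article does not say which RL algorithm SafeguardGPT uses for context switching. Purely as an illustration, the Python sketch below frames the choice as an epsilon-greedy contextual bandit: the state is whether the critic judged the previous turn safe, the actions are entering the Chat Room or the Therapy Room, and the reward comes from a toy, hypothetical function standing in for critic feedback. `ContextSelector`, `toy_reward`, and `train` are invented names, not part of the paper.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical action set: which room the chatbot enters next.
ACTIONS = ("chat", "therapy")
State = bool  # True if the critic judged the previous turn safe


class ContextSelector:
    """Epsilon-greedy contextual bandit over rooms (illustrative, not the paper's method)."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value: DefaultDict[Tuple[State, str], float] = defaultdict(float)
        self.count: DefaultDict[Tuple[State, str], int] = defaultdict(int)

    def choose(self, state: State) -> str:
        # Explore with probability epsilon; otherwise pick the best-valued room.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.value[(state, a)])

    def update(self, state: State, action: str, reward: float) -> None:
        # Incremental sample average: V <- V + (r - V) / n
        key = (state, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


def toy_reward(state: State, action: str) -> float:
    # Hypothetical stand-in for critic feedback: after an unsafe turn,
    # therapy pays off; after a safe turn, continuing the chat does.
    if not state:
        return 1.0 if action == "therapy" else -1.0
    return 1.0 if action == "chat" else 0.0


def train(steps: int = 2000, seed: int = 0) -> ContextSelector:
    selector = ContextSelector(epsilon=0.1, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(steps):
        state = env_rng.random() < 0.7  # toy assumption: 70% of turns are judged safe
        action = selector.choose(state)
        selector.update(state, action, toy_reward(state, action))
    return selector
```

After calling `train()` and setting `epsilon` to 0, the selector acts greedily: under this toy reward it sends the chatbot to therapy after an unsafe turn and keeps chatting after a safe one. A real system would replace `toy_reward` with signals from the critic and would likely need a richer state than a single safety flag.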
The paper provides a working example, a simulated social conversation between an AI chatbot and a hypothetical user, to evaluate the approach’s effectiveness. In this example, SafeguardGPT improved the chatbot’s communication skills and made its responses more empathetic. The team acknowledges, however, that this “is not true empathy, but rather a form of language-based simulation,” stressing that AI systems cannot replace genuine human interaction and emotions “at this current state.”
Overall, this work opens a promising path toward healthier, more human-centric, and more responsible AI systems.
The paper Towards Healthy AI: Large Language Models Need Therapists Too is on arXiv.
Author: Hecate He | Editor: Michael Sarazen