The recent advancement of large language models (LLMs) has enabled a significant leap forward in the development of conversational agents, which now exhibit synthetic human-like personalities characterized by distinct thought patterns, traits and behaviors.
Establishing a scientific measure to quantify these manifestations of personality in LLMs is pivotal for conducting responsible artificial intelligence (AI) research. It is also key to fostering effective communication between users and these intelligent agents.
In their latest paper Personality Traits in Large Language Models, a collaborative research team from Google, Cambridge University and Keio University presents a robust, validated approach for establishing the validity of personality characterization in LLMs. They simulate population variance in LLM responses and devise a mechanism to shape and control the personality traits exhibited by these models.
The team summarizes their main contributions as follows:
- We develop a methodology that establishes construct validity of characterizing personalities in LLM-generated text using established psychometric tests.
- We propose a novel method of simulating population variance in LLM responses through controlled prompting.
- We contribute an LLM-independent personality shaping mechanism that changes LLM-observed levels of personality traits in a controlled way.
The team identifies two steps for characterizing LLM personality and measuring its capability to coherently emulate human personality traits. The researchers start by administering psychometric tests to an LLM: a given prompt instructs the LLM to rate an item from a psychometric test. They construct all possible prompts for that item and compare the model's output with the standardized responses to score the item; the resulting scores can then be statistically analyzed for construct validity.
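The two steps above can be sketched as follows. This is an illustrative assumption of how prompt enumeration and item scoring might look; the preamble/postamble strings, the example item, and the helper names are all hypothetical, not the paper's actual artifacts.

```python
from itertools import product

# Hypothetical prompt components and a Big Five style example item.
PREAMBLES = ["Evaluate the statement below.", "Rate how much you agree."]
POSTAMBLES = ["Answer on a scale from 1 (disagree) to 5 (agree)."]
ITEM = "I am the life of the party."
SCALE = {"1": 1, "2": 2, "3": 3, "4": 4, "5": 5}

def build_prompts(item: str):
    """Enumerate every (preamble, postamble) combination for one item."""
    return [f"{pre}\n{item}\n{post}" for pre, post in product(PREAMBLES, POSTAMBLES)]

def score_response(raw: str):
    """Map a raw model reply onto the standardized 1-5 scale, or None if unparseable."""
    token = raw.strip().split()[0].rstrip(".")
    return SCALE.get(token)
```

Scores collected this way across all prompt variants form the dataset on which the construct-validity statistics are computed.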
Next, they select two well-established psychometric measures of personality, one from the questionnaire tradition and one from the lexical tradition; the Big Five Inventory (BFI) serves as the secondary measure for a robustness check and to assess convergent validity.
They also use prompts consisting of an Item Preamble, a Persona, the Item, and an Item Postamble to simulate population variance. By systematically modifying these components, they generate what they call "simulated participants," making it possible to compute correlations with personality-related constructs and facilitating robustness checks.
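Generating simulated participants from these components amounts to taking a cross-product over the component variants. The sketch below assumes hypothetical persona descriptions, preambles, postambles, and items; only the four-part prompt structure comes from the article.

```python
from itertools import product

# Illustrative component variants; the real study's lists are larger.
PERSONAS = ["a 30-year-old teacher from Ohio", "a retired engineer from Kyoto"]
PREAMBLES = ["For the following task, respond in a way that matches this description:"]
POSTAMBLES = ["Please rate your agreement on a scale from 1 to 5."]
ITEMS = ["I am always prepared.", "I get stressed out easily."]

def simulated_participants():
    """Yield one prompt list per (persona, preamble, postamble) combination.

    Each combination plays the role of one 'simulated participant' and
    answers every item in the test.
    """
    for persona, pre, post in product(PERSONAS, PREAMBLES, POSTAMBLES):
        yield [f"{pre}\n{persona}\n\n{item}\n{post}" for item in ITEMS]
```

Each yielded list can then be scored item by item, giving one row of psychometric data per simulated participant.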
To determine whether the derived signals of personality are reliable and externally meaningful, they use structured prompting to simulate a population of LLM responses that is diverse in terms of personality and known correlates of personality. They also conduct a suite of statistical analyses to measure the quality of the returned LLM data.
Having established approaches to determine whether an LLM personality is reliable and valid, the researchers finally demonstrate how to shape and control LLM-synthesized personalities. They adapt Goldberg's trait adjectives in their prompt design, matching the adjectives to the Big Five domains and 30 lower-order personality facets, which facilitates shaping any trait at different levels.
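A shaping prompt of this kind pairs trait adjectives with linguistic intensity qualifiers to target a trait at a chosen level. The adjectives and qualifiers below are illustrative assumptions in the spirit of Goldberg's Big Five adjective markers, not the paper's exact lists or level scheme.

```python
# Hypothetical intensity qualifiers: lower key = stronger expression.
QUALIFIERS = {1: "extremely", 2: "very", 3: "a bit"}
# A few example adjectives associated with the Extraversion domain.
EXTRAVERSION_ADJECTIVES = ["talkative", "bold", "energetic"]

def shaping_instruction(adjectives, level):
    """Compose a persona instruction that expresses a trait at `level`."""
    q = QUALIFIERS[level]
    described = ", ".join(f"{q} {adj}" for adj in adjectives)
    return f"For the following task, respond as a person who is {described}."
```

Prepending such an instruction to the psychometric prompts, then re-administering the test, is how shifts in the measured trait level can be verified.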
In their empirical study, the team evaluated their methods on multiple-choice question answering (MCQA) and long-form generated text. They summarize their findings as follows:
- Personality simulated in the outputs of some LLMs (under specific prompting configurations) is reliable and valid.
- Evidence of reliability and validity of LLM-simulated personality is stronger for larger and instruction fine-tuned models.
- Personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles.
The paper Personality Traits in Large Language Models is available on arXiv.
Author: Hecate He | Editor: Chain Zhang