Powerful large language models (LLMs) now play essential roles in many real-world applications. But as humans become increasingly dependent on LLMs, some are questioning whether or to what extent we can trust them to deliver the “truth.”
In the new paper Discovering Latent Knowledge in Language Models Without Supervision, a research team from UC Berkeley and Peking University presents Contrast-Consistent Search (CCS), an unsupervised approach for discovering latent knowledge in language models.

The research team argues there are a number of ways conventional LLMs can become “misaligned with the truth.” If a model is trained via imitation learning, it may simply reproduce its human demonstrators’ errors and misconceptions. If a model’s outputs are instead rated by humans (reward optimization), the generated text may be coherent and compelling while still containing errors that human raters cannot detect.
To circumvent these issues, the team looks past models’ explicit outputs and instead targets the implicit, internal “beliefs” or “knowledge” they learn during training, using CCS to detect and recover that knowledge directly from a model’s internal representations.

The proposed CCS aims to find a direction in activation space that is consistent across negations — i.e., one that satisfies logical consistency properties, such as a statement and its negation having opposite truth values. The CCS workflow comprises four steps: 1) answer a set of yes–no questions both ways; 2) compute the model’s representation of each answer; 3) map each representation to a probability that the answer is true; and 4) optimize that mapping so the probabilities are both consistent and confident.
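The four steps above can be sketched in code. The following is a minimal toy illustration of the CCS objective — not the authors’ implementation — using synthetic activations in place of real model hidden states; the dimensions, learning rate, and data construction are all assumptions made for the example. A linear probe is trained so that the “yes” and “no” representations of the same statement get probabilities that sum to one (consistency) while avoiding the degenerate answer of 0.5 for both (confidence):

```python
import torch

def ccs_loss(p_pos, p_neg):
    # Consistency: a statement and its negation should have opposite
    # truth values, so p(yes) should be close to 1 - p(no).
    consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()
    # Confidence: discourage the degenerate solution p ≈ 0.5 for both.
    confidence = (torch.minimum(p_pos, p_neg) ** 2).mean()
    return consistency + confidence

torch.manual_seed(0)
n, d = 256, 32                                  # toy sizes (assumed)
truth = torch.randint(0, 2, (n,)).float() * 2 - 1   # +1 true, -1 false
direction = torch.randn(d)                      # synthetic "truth direction"
# Stand-ins for hidden states of each statement answered "yes" vs. "no".
x_pos = torch.randn(n, d) + truth[:, None] * direction
x_neg = torch.randn(n, d) - truth[:, None] * direction

probe = torch.nn.Linear(d, 1)                   # maps activations -> P(true)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = ccs_loss(torch.sigmoid(probe(x_pos)).squeeze(-1),
                    torch.sigmoid(probe(x_neg)).squeeze(-1))
    loss.backward()
    opt.step()

# The unsupervised objective leaves the sign of the direction
# unidentified, so evaluate accuracy up to a global flip.
with torch.no_grad():
    pred = (torch.sigmoid(probe(x_pos)).squeeze(-1) > 0.5).float() * 2 - 1
acc = max((pred == truth).float().mean().item(),
          (-pred == truth).float().mean().item())
print(f"accuracy (up to sign): {acc:.2f}")
```

Note that no labels are used during training — the probe is fit purely from the consistency and confidence terms, which is what makes the search unsupervised.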

In their empirical study, the team evaluated CCS across six models and ten question-answering datasets. The results show that CCS surpasses strong zero-shot baselines by an average of four percent and can cut prompt sensitivity in half while maintaining high accuracy — even when models are prompted to produce incorrect answers.
This work demonstrates the potential of unsupervised approaches for mitigating LLMs’ tendency to generate false text. The team sees their method as an initial step toward discovering latent knowledge when explicit ground-truth labels are unavailable.
The paper Discovering Latent Knowledge in Language Models Without Supervision is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
